Saturday, November 6, 2010

Re: manipulation of multi-byte strings

On 04/11/10 12:09, Luc Hermitte wrote:
> Hello,
>
> I'm in the process of upgrading my scripts to support multi-byte strings.
> I've identified a few needs for now:
> - a mbyte strlen
> - a get-at(pos) operator
>
> Regarding string length, the help for |strlen()| recommends using substitute(). However, strwidth() seems to do the job. Is there a reason it is not the recommended way?
>
> As an alternative to [], matchstr(string, '.\{'.pos.'}\zs.\ze') does the job (for pos > 0). Is there a better way to proceed?
>

If your text includes Chinese, Japanese, Korean, or maybe (I'm less
sure) hard tabs, the results will be different:

* strlen(string) is the number of 8-bit bytes in memory
* strwidth(string) is the number of display cells in a Vim window
* strlen(substitute(string, '.', 'a', 'g')) is the number of logical
  "characters", each of which can be one hard tab (one byte, between one
  and 'tabstop' cells), one ASCII printable character (one byte, one
  cell), one Chinese character (two or sometimes four bytes in GB18030,
  three or four bytes in UTF-8, two cells), etc.
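
To make the three measures concrete, here is a minimal sketch. It assumes 'encoding' is utf-8 and a Vim recent enough to have strwidth() (7.3 or later); the variable names and the sample text are only illustrations. Note that, according to :help strwidth(), a Tab is counted as one cell by strwidth(), and strdisplaywidth() is the function that takes 'tabstop' into account.

```vim
" Sample text: three CJK characters, written with \u escapes so the
" encoding of the script file does not matter.
" Assumption: 'encoding' is utf-8.
let s = "\u65e5\u672c\u8a9e"

" Bytes in memory: 3 characters, 3 bytes each in UTF-8.
let nbytes = strlen(s)

" Display cells: each of these characters is double-width.
let ncells = strwidth(s)

" Logical characters: replace every character with 'a', then count bytes.
let nchars = strlen(substitute(s, '.', 'a', 'g'))

echo nbytes ncells nchars
" With 'encoding' set to utf-8 this echoes: 9 6 3
```

If your Vim has strchars(), it should return the same character count without the substitute() detour.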

For the get-at(pos) operation, you might want to use the {count} argument of matchstr():

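(untested)
    " The (pos+1)th match of '.' is the character at 0-based index pos.
    let elemfound = matchstr(string, '.', 0, pos+1)
    if elemfound == ""
        " not found: pos is past the end of the string
    else
        " found: elemfound holds the whole (possibly multi-byte) character
    endif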
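
The same idea can be wrapped in a small function. This is only a sketch: CharAt() is a hypothetical name, not something that ships with Vim, and the guard for negative positions is an assumption about what a caller would want.

```vim
" Hypothetical helper: return the character at 0-based index a:pos,
" or an empty string when a:pos is out of range.
function! CharAt(str, pos)
  if a:pos < 0
    return ''
  endif
  " {start} is 0 and {count} is a:pos + 1, so the match returned is the
  " (a:pos + 1)th character, however many bytes it occupies.
  return matchstr(a:str, '.', 0, a:pos + 1)
endfunction

" Index 1 is the CJK character, even though it starts at byte 1 and
" spans three bytes when 'encoding' is utf-8.
echo CharAt("a\u65e5b", 1)
" Out of range: echoes an empty line.
echo CharAt("abc", 5)
```

Unlike string[pos], which indexes bytes, this counts characters, so it never returns half of a multi-byte sequence.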

Best regards,
Tony.
--
Nothing is illegal if one hundred businessmen decide to do it.
-- Andrew Young

--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
