Sunday, May 27, 2012

Re: how to match all Chinese chars?

On Sun, May 27, 2012 at 09:53:55AM EDT, William Fugy wrote:

> Question: how to match all Chinese chars?
>
> -----------------------------------
> fenc=utf-16le
> enc=utf-16le
> termencoding=utf-16le
> ------------------------------------
>
> :g/[\%u4e00-\%u9fff]/
> this command doesn't work.
>
> However
> :g/\%u5728/
> could match a single char '在'..
>
> thanks in advance.

It doesn't work here either, even with smaller ranges (Vim 7.2 and UTF-8).

Unless someone comes up with a better idea, you could try using the
characters themselves instead of their code points, but it looks like you
are going to run into another problem: in my environment, ranges appear
to be limited to something like 256 characters. Beyond that you get an
'E16: Invalid range' message.

Unless I missed something, and if you absolutely need to do this, you
could bypass the limitation by breaking up the range like so:

| :g/[一-仿伀-俿倀-儀 ... 鼀-龻]/

This corresponds to ranges:

| \u4e00-\u4eff
| \u4f00-\u4fff
| \u5000-\u50ff
| ..
| \u9f00-\u9fbb¹

Trouble is, this is going to add up to something like 80+ subranges and
may cause you to run into other limitations. I haven't tested the whole
range, only the subranges above (they work here), but if nobody comes up
with a better idea and you choose to go down this path, I would suggest
generating the regex programmatically.
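
A minimal sketch of what generating the regex could look like, in Python
(my choice of language; any scripting language would do). It splits
\u4e00-\u9fbb into subranges of at most 256 characters and joins them into
a single collection. The function names cjk_subranges and vim_collection
are made up for illustration, and the resulting pattern is untested in Vim
beyond the first few subranges mentioned above.

```python
def cjk_subranges(start=0x4E00, end=0x9FBB, step=0x100):
    """Yield (lo, hi) code point pairs, each spanning at most `step` characters."""
    lo = start
    while lo <= end:
        # Clamp the last subrange so it never runs past `end`.
        hi = min(lo + step - 1, end)
        yield lo, hi
        lo = hi + 1


def vim_collection(start=0x4E00, end=0x9FBB, step=0x100):
    """Build a [...] collection from literal characters, one 'a-b' pair per subrange."""
    parts = ["%s-%s" % (chr(lo), chr(hi))
             for lo, hi in cjk_subranges(start, end, step)]
    return "[" + "".join(parts) + "]"


if __name__ == "__main__":
    # Print an ex command that can be pasted into Vim or saved to a file.
    print(":g/" + vim_collection() + "/")
```

Redirecting the output to a file and running it with :source would avoid
pasting 80+ subranges by hand. The step of 0x100 is an assumption based on
the limit I see here; lower it if you still get E16.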

CJ

¹ I think \u4e00-\u9fbb is the correct CJK range

--
WE GET SIGNAL

