Sunday, May 27, 2012

Re: how to match all Chinese chars?

Thank you. I knew of this rough workaround. Nevertheless, in some cases it causes problems, so I have to find out whether there is a proper way to solve it.

On Mon, May 28, 2012 at 10:15 AM, Xell Liu <xell.liu@gmail.com> wrote:
In fact, if your aim is not so strict, i.e. if you only need to
exclude ASCII and the other single-byte (Latin-1) characters, you could
use [^\x00-\xff]. Hope it helps.
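
To illustrate why this workaround is only rough, here is a small sketch in Python rather than Vim. It uses the same character class; the helper name `matches` is made up for the example, and Python's `re` syntax is assumed to treat the class the same way for this purpose. It suggests the class excludes all of U+0000 to U+00FF, but accepts any other non-Latin-1 character, not just Chinese ones.

```python
import re

# Same character class as in the message above, written for Python's re module.
# It matches any character outside U+0000..U+00FF.
NON_LATIN1 = re.compile(r"[^\x00-\xff]")

def matches(text):
    """Return the characters in `text` that the class matches (hypothetical helper)."""
    return NON_LATIN1.findall(text)

if __name__ == "__main__":
    print(matches("a在é"))   # only the Chinese character is outside Latin-1
    print(matches("αβ"))     # Greek letters are matched as well
```

Greek, Cyrillic, punctuation such as curly quotes, and so on would all match, which may be the kind of problem mentioned in the reply.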

On Mon, May 28, 2012 at 12:39 AM, Chris Jones <cjns1989@gmail.com> wrote:
> On Sun, May 27, 2012 at 09:53:55AM EDT, William Fugy wrote:
>
>> Question: how to match all Chinese chars?
>>
>> -----------------------------------
>> fenc=utf-16le
>> enc=utf-16le
>> termencoding=utf-16le
>> ------------------------------------
>>
>> :g/[\%u4e00-\%u9fff]/
>> this command doesn't work.
>>
>> However
>> :g/\%u5728/
>> does match the single char '在'.
>>
>> thanks in advance.
>
> Doesn't work here either even with smaller ranges.. (Vim 7.2 and UTF-8).
>
> Unless someone comes up with a better idea, you could try using the
> characters themselves instead of their code points, but it looks like you
> are going to run into another problem: in my environment, ranges appear
> to be limited to something like 256+ characters. Beyond that you get an
> 'E16: Invalid range' message.
>
> Unless I missed something, and if you absolutely need to do this, you
> could bypass the limitation by breaking up the range like so:
>
> | :g/[一-仿伀-俿倀-儀 ... 鼀-龻]/
>
> This corresponds to ranges:
>
> | \u4e00-\u4eff
> | \u4f00-\u4fff
> | \u5000-\u50ff
> | ..
> | \u9f00-\u9fbb¹
>
> Trouble is, this is going to add up to something like 80+ subranges and
> may cause you to run into other limitations. I haven't tested the whole
> range, only the above (it works here), but if nobody comes up with
> a better idea and you choose to go down this path, I would suggest
> generating the regex programmatically.
>
> CJ
>
> ¹ I think \u4e00-\u9fbb is the correct CJK range
>
> --
> WE GET SIGNAL
>
> --
> You received this message from the "vim_use" maillist.
> Do not top-post! Type your reply below the text you are replying to.
> For more information, visit http://www.vim.org/maillist.php
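
The suggestion above to generate the regex programmatically could be sketched as follows. This is only an illustration in Python, not something posted in the thread: the function names are made up, the chunk size of 256 code points is an assumption based on the subranges reported to work above, and the U+4E00 to U+9FBB bounds are the ones Chris tentatively gave.

```python
# Sketch: build a Vim collection covering U+4E00..U+9FBB from subranges of
# at most 256 code points each, mirroring the hand-written ranges above.

def cjk_subranges(first=0x4E00, last=0x9FBB, chunk=0x100):
    """Yield inclusive (start, end) pairs, each spanning at most `chunk` code points."""
    start = first
    while start <= last:
        end = min(start + chunk - 1, last)
        yield start, end
        start = end + 1

def vim_pattern(first=0x4E00, last=0x9FBB, chunk=0x100):
    """Return a collection such as [一-仿伀-俿...] built from literal characters."""
    parts = ["{}-{}".format(chr(a), chr(b))
             for a, b in cjk_subranges(first, last, chunk)]
    return "[" + "".join(parts) + "]"

if __name__ == "__main__":
    # Prints a :g command that can be pasted into Vim or written to a script.
    print(":g/" + vim_pattern() + "/")
```

With these bounds the generator yields 82 subranges, in line with the "80+" estimate. Whether Vim accepts a collection that long was not tested in the thread, so the chunk size may need adjusting.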
