Sunday, May 27, 2012

Re: how to match all Chinese chars?

On Sun, May 27, 2012 at 09:25:30PM EDT, William Fugy wrote:
> On Mon, May 28, 2012 at 10:15 AM, Xell Liu wrote:


> > > Unless I missed something, and if you absolutely need to do this,
> > > you could bypass the limitation by breaking up the range like so:
> > >
> > > | :g/[一-仿伀-俿倀-儀 ... 鼀-龻]/
> > >
> > Good one! I'll give it a try. But so many characters...

It depends on how much one needs a regex that works in all cases, or
whether something more relaxed can do the job at hand. I was also
thinking that, depending on the particular use case, it might be
possible to have a script create the regex, store it in a variable or
register, and use its contents in interactive commands to simulate
a [:CJK:] character class more conveniently.
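
For what it's worth, here is a minimal sketch of the "have a script
create the regex" idea, written in Python rather than Vim script. The
function name, its parameters and the 256-code-point chunk size are my
own assumptions, chosen only to reproduce the subranges quoted in this
thread. I have not run the full result through Vim.

```python
def cjk_bracket_expr(first=0x4E00, last=0x9FBB, chunk=0x100):
    """Build a bracket expression out of subranges of at most `chunk` code points.

    Hypothetical helper: the defaults mirror the \\u4e00-\\u4eff,
    \\u4f00-\\u4fff, ... \\u9f00-\\u9fbb split discussed in the thread.
    """
    parts = []
    start = first
    while start <= last:
        # End of this subrange: one below the next chunk boundary, capped at `last`.
        end = min((start // chunk + 1) * chunk - 1, last)
        parts.append(f"{chr(start)}-{chr(end)}")
        start = end + 1
    return "[" + "".join(parts) + "]"


if __name__ == "__main__":
    # Prints the whole expression so it can be pasted into a Vim command.
    print(cjk_bracket_expr())
```

One possible way to use the output is to yank it into a register and
insert it on the command line with CTRL-R followed by the register
name, so that the long expression never has to be typed by hand.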

> > > This corresponds to ranges:
> > >
> > > | \u4e00-\u4eff
> > > | \u4f00-\u4fff
> > > | \u5000-\u50ff
> > > | ..
> > > | \u9f00-\u9fbb¹
> > >
> > > Trouble is, this is going to add up to something like 80+ subranges and
> > > may cause you to run into other limitations. I haven't tested the whole
> > > range, only the above (it works here) but if nobody comes up with
> > > a better idea, and you choose to go down this path, I would suggest
> > > generating the regex programmatically.
> > >
> Thank you. Apparently it just has to be done this way. For now I'm
> dealing with this problem in Perl. I hope Vim can accomplish it too.

I don't use Perl but I would have expected it to provide native support
for Unicode blocks. In this instance that would be
'\p{InCJK_Unified_Ideographs}', which corresponds precisely to
U+4E00..U+9FFF.
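
To illustrate what matching on the whole block amounts to, here is a
rough Python sketch. Note that it swaps in a different technique: as
far as I know the standard `re` module has no \p{...} block syntax, so
the sketch spells out the U+4E00..U+9FFF range explicitly. The names
CJK_BLOCK and lines_with_cjk are made up for the example.

```python
import re

# Explicit code-point range standing in for the CJK Unified Ideographs
# block (U+4E00..U+9FFF); this is an assumption-based substitute for a
# \p{...} property, not Perl's own mechanism.
CJK_BLOCK = re.compile("[\u4e00-\u9fff]")


def lines_with_cjk(lines):
    """Return the lines that contain at least one code point in the block."""
    return [line for line in lines if CJK_BLOCK.search(line)]
```

This is roughly what a :g command with the full bracket expression
would select, except that it also covers the tail of the block up to
U+9FFF.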

See this:

> > > ¹ I think \u4e00-\u9fbb is the correct CJK range
> > >
> > Yes, it's accurate.

Sorry, 'correct' was in fact the wrong word. I really meant something
like 'effectively assigned'. \u9fbc-\u9fff do belong to the Unicode
block, but afaict no characters have been assigned there, which makes
it impossible to refer to them by character, only by code point.
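
Since which code points are assigned depends on the Unicode version,
it may be safer to check than to take my word for it. The sketch below
asks Python's bundled Unicode data which code points after U+9FBB are
still unassigned (general category Cn). The helper name unassigned_in
is made up, and the answer will vary with the Unicode version your
Python ships with.

```python
import unicodedata


def unassigned_in(first, last):
    """List code points in [first, last] whose general category is Cn (unassigned)."""
    return [cp for cp in range(first, last + 1)
            if unicodedata.category(chr(cp)) == "Cn"]


if __name__ == "__main__":
    gaps = unassigned_in(0x9FBC, 0x9FFF)
    # The result reflects this interpreter's Unicode data, not Vim's or Perl's.
    print("Unicode data version:", unicodedata.unidata_version)
    print("unassigned:", [f"U+{cp:04X}" for cp in gaps])
```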


Alex Perez is aliveeeeeeee!!!

You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit
