Sunday, May 27, 2012

Re: how to match all Chinese chars?

On Sun, May 27, 2012 at 09:25:30PM EDT, William Fugy wrote:
> On Mon, May 28, 2012 at 10:15 AM, Xell Liu <xell.liu@gmail.com> wrote:

[..]

> > > Unless I missed something, and if you absolutely need to do this,
> > > you could bypass the limitation by breaking up the range like so:
> > >
> > > | :g/[一-仿伀-俿倀-僿 ... 鼀-龻]/
> >
> > Good one! I'll give it a try. But so many characters...

That depends on whether one needs a regex that works in all cases or
whether something more relaxed can do the job at hand. I was also thinking
that, depending on the particular use case, a script could create the
regex and store it in a variable or register, whose contents could then be
used in interactive commands to simulate a [:CJK:] character class more
conveniently.
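
A minimal sketch of that idea, in Python purely for illustration. It
assumes the \uXXXX escapes that Vim accepts inside a [] collection and
assumes register "c" is free; the function names are made up for this
example and nothing here has been tested against the whole range in Vim.

```python
def vim_cjk_collection(first=0x4E00, last=0x9FBB):
    # Build a Vim [] collection made of subranges that each stay inside
    # one 256-code-point page, written with backslash-u escapes.
    parts = []
    for page in range(first >> 8, (last >> 8) + 1):
        lo = max(page << 8, first)
        hi = min((page << 8) | 0xFF, last)
        parts.append(f"\\u{lo:04x}-\\u{hi:04x}")
    return "[" + "".join(parts) + "]"


def vim_let_register(register="c"):
    # Single-quoted Vim strings keep backslashes literal, so the escapes
    # reach the regex engine unchanged. The register name is an assumption.
    return f"let @{register} = '{vim_cjk_collection()}'"


if __name__ == "__main__":
    # Redirect this output to a file and :source it from Vim.
    print(vim_let_register())
```

After sourcing the generated line, typing :g/ followed by CTRL-R c and
a closing / should paste the collection into the command line, which is
about as close to a [:CJK:] class as this approach gets.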

> > > This corresponds to ranges:
> > >
> > > | \u4e00-\u4eff
> > > | \u4f00-\u4fff
> > > | \u5000-\u50ff
> > > | ..
> > > | \u9f00-\u9fbb¹
> > >
> > > Trouble is, this is going to add up to something like 80+ subranges and
> > > may cause you to run into other limitations. I haven't tested the whole
> > > range, only the above (it works here), but if nobody comes up with
> > > a better idea, and you choose to go down this path, I would suggest
> > > generating the regex programmatically.
> >
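
For reference, the subranges quoted above can be enumerated with a few
lines of Python. This is only a sketch: the helper names are invented
here, the 256-code-point width is taken from the quoted ranges, and the
result uses literal characters as in the :g example rather than escapes.

```python
def split_range(start, end, width=0x100):
    # Split [start, end] into (lo, hi) pairs that never cross a
    # width-aligned boundary. width must be a power of two.
    ranges = []
    lo = start
    while lo <= end:
        hi = min(lo | (width - 1), end)
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


def literal_collection(ranges):
    # Join the pairs into a [] collection of literal characters.
    return "[" + "".join(f"{chr(lo)}-{chr(hi)}" for lo, hi in ranges) + "]"


ranges = split_range(0x4E00, 0x9FBB)

if __name__ == "__main__":
    print(len(ranges))  # 82, i.e. the "80+" estimated above
    print(literal_collection(ranges))
```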
> Thank you. Apparently it just has to be done this way. For now I'm
> dealing with this problem in Perl. I hope Vim can accomplish it too.

I don't use Perl, but I would have expected it to provide native support
for Unicode blocks, in this instance '\p{InCJK_Unified_Ideographs}',
which corresponds precisely to U+4E00...U+9FFF.

See this:

http://www.regular-expressions.info/unicode.html
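
Where the regex engine has no block properties, the same block can be
spelled as a plain range. A small sketch using Python's built-in re
module, which has no \p{...} support; the helper names are invented for
illustration.

```python
import re

# The CJK Unified Ideographs block, U+4E00..U+9FFF, written as a range
# because the built-in re module has no block properties.
CJK_UNIFIED = re.compile(r"[\u4e00-\u9fff]")


def has_cjk(text):
    # True if text contains at least one code point from the block.
    return CJK_UNIFIED.search(text) is not None


def cjk_chars(text):
    # All code points from the block, in order of appearance.
    return CJK_UNIFIED.findall(text)
```

Note that this matches every code point in the block, assigned or not,
and nothing outside it (kana, Hangul and the extension blocks are not
covered).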

> > > ¹ I think \u4e00-\u9fbb is the correct CJK range
> >
> > Yes, it's accurate.

Sorry, 'correct' was in fact the wrong word. I really meant something
like 'effectively assigned'. \u9fbc-\u9fff do belong to the Unicode
block, but afaict no characters have been assigned there, which makes it
impossible to refer to them by character, only by code point.
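
Which code points at the end of the block are assigned depends on the
Unicode version in use, so it is worth checking rather than trusting
memory. A hedged sketch using Python's unicodedata module, whose data
may be newer or older than whatever your Vim and fonts know about; the
function name is made up here.

```python
import unicodedata


def unassigned_in(lo, hi):
    # Code points in [lo, hi] whose general category is Cn (unassigned)
    # according to the Unicode data bundled with this Python.
    return [cp for cp in range(lo, hi + 1)
            if unicodedata.category(chr(cp)) == "Cn"]


tail = unassigned_in(0x9FBC, 0x9FFF)

if __name__ == "__main__":
    print(f"Unicode {unicodedata.unidata_version}: "
          f"{len(tail)} unassigned code points in U+9FBC..U+9FFF")
```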

CJ

--
Alex Perez is aliveeeeeeee!!!

--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
