Wednesday, May 30, 2012

Re: how to match all Chinese chars?

On Mon, May 28, 2012 at 05:55:27PM EDT, Tony Mechelynck wrote:

[..]

> However, there is also a limitation in Vim, namely, a collection can
> only match (IIRC) at most 257 different individual characters at the
> same point. 4E00..9FFF alone is already much more than that.

The limit applies to each range of characters (a-z, 0-9, etc.) within
a collection, not to the collection as a whole: a single range can
match at most 256 characters.

Here is, for instance, a valid collection that matches 4096 characters:

| /[一-仿伀-俿倀-僿儀-凿刀-勿匀-叿吀-哿唀-嗿嘀-囿圀-埿堀-壿夀-姿娀-嫿嬀-寿尀-峿崀-巿]

Sub-ranges are: 4e00-4eff ... 5d00-5dff, 256 characters each.

Conversely, the following triggers the 'E16: Invalid range' error:

| /[一-企]

Range is: 4e00-4f01, i.e. 258 characters.
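
To double-check how many characters a given range spans, a minimal
sketch along these lines should do. It assumes Python 3 is at hand,
and range_size is just a made-up helper name:

```python
def range_size(first: str, last: str) -> int:
    """Count the code points in the inclusive range first-last."""
    return ord(last) - ord(first) + 1

# U+4E00..U+4EFF: first sub-range of the valid collection
print(range_size("\u4e00", "\u4eff"))  # 256

# U+4E00..U+4F01: the range that triggers the E16 error
print(range_size("\u4e00", "\u4f01"))  # 258
```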

I generated a similar collection for the entire 4e00-9fff block, split
into 256-character sub-ranges, and apart from the regex causing Vim to
slow down to a crawl on larger files, it appeared to match.
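
For reference, a collection of this kind can be generated rather than
typed. The following is only a sketch, assuming Python 3; the function
name cjk_collection and its parameters are hypothetical, and it does
nothing about the slowdown:

```python
def cjk_collection(start: int = 0x4E00, end: int = 0x9FFF,
                   chunk: int = 256) -> str:
    """Build a Vim collection covering start..end (inclusive),
    split into sub-ranges of at most `chunk` code points each."""
    parts = []
    for lo in range(start, end + 1, chunk):
        hi = min(lo + chunk - 1, end)  # last code point of this sub-range
        parts.append(f"{chr(lo)}-{chr(hi)}")
    # No escaping is done here: fine for CJK ideographs, but not for
    # ranges whose endpoints are special inside [] (']', '\', '^', '-').
    return "[" + "".join(parts) + "]"

pattern = cjk_collection()
print("/" + pattern)  # paste into Vim as a search command
```

With the defaults this yields 82 sub-ranges, 4e00-4eff through
9f00-9fff.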

All the same, there does not appear to be any simple solution other
than this clunky workaround.

Is anything in the works regarding Unicode regex support in a future
release of Vim (8.x)?

CJ

--
WE GET SIGNAL
