Monday, May 28, 2012

Re: how to match all Chinese chars?

Hi Chris!

On Sun, 27 May 2012, Chris Jones wrote:

> On Sun, May 27, 2012 at 09:53:55AM EDT, William Fugy wrote:
>
> > Question: how to match all Chinese chars?
> >
> > -----------------------------------
> > fenc=utf-16le
> > enc=utf-16le
> > termencoding=utf-16le
> > ------------------------------------
> >
> > :g/[\%u4e00-\%u9fff]/
> > this command doesn't work.
> >
> > However
> > :g/\%u5728/
> > could match a single char '在'..
> >
> > thanks in advance.
>
> Doesn't work here either even with smaller ranges.. (Vim 7.2 and UTF-8).
>
> Unless s/o comes up with a better idea, you could try using the
> characters themselves instead of their code points but it looks like you
> are going to run into another problem.. in my environment, ranges appear
> to be limited to something like 256+ characters. Beyond that you get an
> 'E16 Invalid range' message.
>
> Unless I missed something, and if you absolutely need to do this, you
> could bypass the limitation by breaking up the range like so:
>
> | :g/[一-仿伀-俿倀-儀 ... 鼀-龻]/
>
> This corresponds to ranges:
>
> | \u4e00-\u4eff
> | \u4f00-\u4fff
> | \u5000-\u50ff
> | ..
> | \u9f00-\u9fbb¹
>
> Trouble is, this is going to add up to something like 80+ subranges and
> may cause you to run into other limitations. I haven't tested the whole
> range, only the above (it works here) but if nobody comes up with
> a better idea, and you choose to go down this path, I would suggest
> generating the regex programmatically..
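A minimal sketch of that programmatic approach follows. Like Chris's example, it builds one collection from literal characters, in blocks of 256 code points (\u4e00-\u4eff, \u4f00-\u4fff, ...). This differs from the alternation that the script further down produces. BuildCjkCollection is a hypothetical name. The sketch assumes 'encoding' is utf-8, so that nr2char() returns the UTF-8 character. It also assumes your Vim accepts many ranges inside a single [] collection, which Chris only tested partially.

```vim
" Hypothetical helper (not from the thread): build one collection such as
" [一-仿伀-俿...] from sub-ranges of at most 256 code points each.
" Assumes 'encoding' is utf-8 so nr2char() yields the UTF-8 character.
function! BuildCjkCollection(first, last)
  let parts = []
  let lo = a:first
  while lo <= a:last
    " Each piece covers lo .. lo + 255, clipped at the end of the range.
    let hi = min([lo + 255, a:last])
    call add(parts, nr2char(lo) . '-' . nr2char(hi))
    let lo = hi + 1
  endwhile
  return '[' . join(parts, '') . ']'
endfunction
```

For the range in the question, :execute 'g/' . BuildCjkCollection(0x4e00, 0x9fff) . '/' should list the matching lines. 0x4e00 to 0x9fff splits into exactly 82 blocks of 256 code points, in line with Chris's estimate of 80+ sub-ranges.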

I think you can script this. Something like this:

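" Split a collection such as [\%u4e00-\%u9fff] into an alternation of
" smaller collections, because Vim's regexp engine reports
" "E16: Invalid range" when the end of a range is more than 256 above
" its start.
fu! <sid>Collation(start, end, match) "{{{
  " Convert the hex digits captured by RegCollate() to numbers.
  let start = str2nr(a:start, 16)
  let end = str2nr(a:end, 16)
  " Small enough already: keep the original collection unchanged.
  if (end - start) <= 256
    return a:match
  endif
  let patt = '\%('
  while (end - start) > 256
    let temp = start + 256
    let patt .= printf('[\%%u%X-\%%u%X]', start, temp) . '\|'
    let start = temp + 1
  endw
  " The remainder (0 to 256 code points wide) is always appended, so the
  " last character of the range is never dropped.
  let patt .= printf('[\%%u%X-\%%u%X]\)', start, end)
  return patt
endfu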

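" On <F7>, rewrite every [\%uXXXX-\%uYYYY] range on a search (/ or ?)
" or Ex (:) command line into the pattern built by <sid>Collation().
fu! <sid>RegCollate() "{{{
  let cmd = getcmdline()
  if getcmdtype() =~# '[/?:]' && cmd =~# '\[\\%u\x\+-\\%u\x\+\]'
    let cmd = substitute(cmd,
          \ '\[\\%u\(\x\+\)-\\%u\(\x\+\)\]',
          \ '\=<sid>Collation(submatch(1), submatch(2), submatch(0))',
          \ 'g')
  endif
  return cmd
endfu

" <C-\>e replaces the command line with the result of the expression.
cnoremap <f7> <c-\>e<sid>RegCollate()<cr>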


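Put this into a script file and :source it (<sid> only works in a script context), then press <F7> on the command line whenever you have entered a range of more than 256 characters.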
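As an illustration of what the mapping does, here is the command from the question before and after pressing <F7>, worked out by hand from the functions above. Each generated collection spans 257 code points (start to start + 256), so the range splits into 82 pieces. The middle pieces are elided as ... below. This is a command-line fragment, not something to :source.

```vim
" Typed on the command line:
:g/[\%u4e00-\%u9fff]/
" After pressing <F7> (82 collections, middle ones elided):
:g/\%([\%u4E00-\%u4F00]\|[\%u4F01-\%u5001]\|...\|[\%u9F51-\%u9FFF]\)/
```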

regards,
Christian

--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
