Tuesday, May 29, 2012

Re: how to match all Chinese chars?

On Mon, May 28, 2012 at 08:58:58AM EDT, Christian Brabandt wrote:

> Hi Chris!

[..]

> I think, you can script this. Something like this:
>
> fu! <sid>Collation(start, end, match) "{{{
>     let start = '0x'. a:start
>     let end = '0x'. a:end
>     let patt = '\%('
>     if (end - start) < 256
>         return a:match
>     endif
>     while (end - start) > 256
>         let temp = start + 256
>         let patt .= printf('[\%%u%X-\%%u%X]', start, temp)
>         let start = temp + 1
>         if (end - start) > 0
>             let patt .= '\|'
>         endif
>     endw
>     if (end - start) > 0
>         let patt .= printf('[\%%u%X-\%%u%X]', start, end)
>     endif
>     let patt .= '\)'
>     return patt
> endfu
>
> fu! <sid>RegCollate() "{{{
>     let cmd = getcmdline()
>     if getcmdtype() =~# '[/?:]' && cmd =~# '\[\\%u\x\+-\\%u\x\+\]'
>         let cmd = substitute(cmd,
>             \ '\[\\%u\(\x\+\)-\\%u\(\x\+\)\]',
>             \ '\=<sid>Collation(submatch(1), submatch(2), submatch(0))',
>             \ 'g')
>     endif
>     return cmd
> endfu
>
> cnoremap <f7> <c-\>e<sid>RegCollate()<cr>
>
> And then press f7 whenever you have entered a range > 256 items.

With minor adjustments, the script works as intended and splits the huge
[\%u4e00-\%u9fff] range into a very long series of [n-n+256] subranges
of code points, joined as '\|' alternatives.
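
The splitting step on its own can be sketched in a few lines of Vim
script. This is only an illustration, not the adjusted script:
SplitRange() is a hypothetical helper, and the step of 256 simply follows
the [n-n+256] pieces described above.

```vim
" Hypothetical helper: split the inclusive code point range [lo, hi]
" into consecutive pieces whose end is at most a:span above their start.
function! SplitRange(lo, hi, span)
  let pieces = []
  let start = a:lo
  while start <= a:hi
    " Clamp the last piece so it never runs past a:hi.
    let stop = min([start + a:span, a:hi])
    call add(pieces, [start, stop])
    let start = stop + 1
  endwhile
  return pieces
endfunction

" 0x4e00..0x9fff holds 20992 code points; at 257 per piece that is
" 82 pieces, the first being [0x4e00, 0x4f00].
echo len(SplitRange(0x4e00, 0x9fff, 256))
```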

But there is a second problem: at least in my setup (Vim 7.2, UTF-8),
the smaller ranges no longer trigger the 'E16: Invalid range' error, but
they match any ASCII character. Ironically, the only characters they
don't match are the ones they are supposed to match: here, those in the
\%u4e00-\%u9fff range.
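
One possible explanation, going by the list of escapes under
':help /[]': inside a collection the documented code point escape is
\uXXXX (no percent sign), while \%uXXXX is listed for use outside [].
Under that reading, a piece such as [\%u4E00-\%u4F00] would be taken as a
set of ordinary ASCII characters. The following sketch compares the two
forms; it assumes 'encoding' is utf-8 and deliberately states no expected
results, since they may depend on the Vim version.

```vim
" Assumes 'encoding' is utf-8, so nr2char(0x4e00) is the character U+4E00.
let ch = nr2char(0x4e00)

" Form generated by the script: \%u inside a collection.
echo ch =~# '[\%u4E00-\%u4F00]'
" Form listed under :help /[] for use inside a collection.
echo ch =~# '[\u4E00-\u4F00]'
" An ASCII letter tested against the generated form.
echo 'A' =~# '[\%u4E00-\%u4F00]'
```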

That's why I initially suggested that the OP use actual characters for
his ranges instead of code point escapes such as '\%u4e00'.

I guess the script could be modified to translate '\%u4e00' and friends
into actual characters, but after digging a bit into the Vim help, I'm
not so sure. My impression is that you'd need an external tool to do it:
Vim's printf(), for instance, does not support a '%U' conversion type.
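
One built-in that may be worth trying before reaching for an external
tool is nr2char(), which returns the character for a given number using
the current 'encoding'. A minimal sketch, assuming 'encoding' is utf-8;
LiteralRange() is a hypothetical name, and whether the resulting pattern
behaves better than the escaped form is not verified here.

```vim
" Hypothetical helper: build a collection from literal characters
" instead of code point escapes. Assumes 'encoding' is utf-8.
function! LiteralRange(lo, hi)
  return '[' . nr2char(a:lo) . '-' . nr2char(a:hi) . ']'
endfunction

" Use one 257-code-point piece starting at U+4E00 as the search pattern.
let @/ = LiteralRange(0x4e00, 0x4f00)
```

After sourcing this, pressing n searches with the pattern stored in the
search register.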

Thanks,

CJ

--
Have a nice day!

--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
