Friday, August 17, 2012

Searching for combined characters

Hello,
well - this is my first post here after using vim(1) for so long
(somewhere between 10 and 12 years, I've forgotten), so
let me thank the vim(1) developers first -- YYAAAAAAAAAAAAAAAAA!!!

On the unicode@unicode list there was a thread on combining characters
(Why no combining-character form for U+00F8?), and it turns out that
vim(1) isn't capable of performing a normalized search either!?
E.g., given a file


|é
|é
|e

which is (the listing above has the empty lines stripped)

|00000000 0a c3 a9 0a 65 cc 81 0a 65 0a 0a |....e...e..|
|0000000b

then with 'Vi IMproved 7.3 (2010 Aug 15, compiled Jan 7 2011 14:27:00)',
old but never failed, pretty stripped, but very Unicode friendly,

/\%xe9\|e\%u0301\|e

finds the first and the last, and

/[=\%xE9=]

finds the second and the third, which is wrong.
Searching for \%u0301 will find the second, but \.\%u0301 won't.
\Ze will also find the second and the third.
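
For comparison, here is a minimal sketch of what a normalized search would report for this file. It is written in Python rather than Vim script, the helper name normalized_find is made up for illustration, and it only assumes that both the pattern and the text are brought to the same normalization form (NFC by default) before comparing. It says nothing about how vim(1) works internally.

```python
import unicodedata

# The three non-empty lines of the sample file, as in the hex dump:
# precomposed U+00E9, then "e" + combining U+0301, then a plain "e".
lines = ["\u00e9", "e\u0301", "e"]

def normalized_find(needle, haystack_lines, form="NFC"):
    """Return the 1-based numbers of the lines that contain needle
    once both needle and line are normalized to the same form.
    (Hypothetical helper, not part of any Vim or Python API.)"""
    key = unicodedata.normalize(form, needle)
    return [
        number
        for number, text in enumerate(haystack_lines, start=1)
        if key in unicodedata.normalize(form, text)
    ]

print(normalized_find("\u00e9", lines))   # [1, 2]
print(normalized_find("e\u0301", lines))  # [1, 2]
print(normalized_find("e", lines))        # [3]
```

Either spelling of the accented letter finds the first and the second line, and a plain "e" finds only the third. With form="NFD" a plain "e" would match all three lines, because the decomposed text contains a bare "e" code point, so the choice of form matters for substring matching.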

Should I update? Or what is the state of Unicode normalization
support for searching and replacement? Will it be implemented?
Am I missing something?
Thank you and ciao,

--steffen
