Friday, August 17, 2012

Re: Searching for combined characters

On Fri, Aug 17, 2012 at 9:48 PM, Steffen Daode Nurpmeso
<sdaoden@gmail.com> wrote:
> Hello,
> well - this is my first post here after using vim(1) for so long
> (it's something in between 10 and 12 years, i've forgotten), so
> let me thank the vim(1) developers first -- YYAAAAAAAAAAAAAAAAA!!!
>
> On the unicode@unicode list there was a thread on combining characters
> (Why no combining-character form for U+00F8?), and it turns out that
> vim(1) isn't capable of performing a normalized search either!?
> E.g., given a file
>
> |é
> |é
> |e
>
> which is (apart from stripped empty lines)
>
> |00000000 0a c3 a9 0a 65 cc 81 0a 65 0a 0a |....e...e..|
> |0000000b
>
> then with 'Vi IMproved 7.3 (2010 Aug 15, compiled Jan 7 2011 14:27:00)',
> old but never failed, pretty stripped, but very Unicode friendly,
>
> /\%xe9\|e\%u0301\|e
>
> finds the first and the last, and
>
> /[=\%xE9=]
>
> finds the second and the third, which is wrong.
> Searching for \%u0301 will find the second, but \.\%u0301 won't.
> \Ze will also find the second and the third.
>
> Should i update? Or what is the state of Unicode normalization
> support for searching and replacement? Will it be implemented?
> Am i missing something?
> Thank you and ciao,
>
> --steffen


Maybe you're interested in this patch:

---
Patch 7.3.259
Problem: Equivalence classes only work for latin characters.
Solution: Add the Unicode equivalence characters. (Dominique Pelle)
Files: runtime/doc/pattern.txt, src/regexp.c, src/testdir/test44.in,
src/testdir/test44.ok
---

In your example, all 3 lines match with Vim-7.3.633 when I do:

/[[=e=]]

See :help \[==\]
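
For readers who want to see why the three lines in the quoted example differ, here is a minimal sketch in Python. It is not how Vim implements equivalence classes; it only illustrates the underlying Unicode behaviour. The bytes are taken from the hexdump above, and `base_letters` is a hypothetical helper name introduced for this illustration.

```python
import unicodedata

# The three non-empty lines from the hexdump in the quoted message:
# precomposed U+00E9, "e" followed by combining U+0301, and plain "e".
raw = b"\xc3\xa9\n\x65\xcc\x81\n\x65\n"
lines = raw.decode("utf-8").splitlines()

def base_letters(text):
    """Hypothetical helper: decompose to NFD, then drop combining marks (Mn)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")

# NFC composes "e" + U+0301 into U+00E9, so the first two lines become equal.
nfc = [unicodedata.normalize("NFC", s) for s in lines]

# Dropping the combining marks reduces every line to the base letter.
folded = [base_letters(s) for s in lines]

print([len(s) for s in lines])  # code points per line
print(nfc[0] == nfc[1])         # do the first two lines agree after NFC?
print(folded)                   # base letters only
```

Comparing the NFC forms corresponds to a normalized search, which distinguishes the accented lines from the plain "e". Comparing the folded forms is only loosely analogous to the effect of /[[=e=]], which matches all three lines.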

-- Dominique

--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
