Friday, August 17, 2012

Re: Searching for combined characters

Dominique Pellé <dominique.pelle@gmail.com> wrote:

> On Fri, Aug 17, 2012 at 9:48 PM, Steffen Daode Nurpmeso
> <sdaoden@gmail.com> wrote:
>> Hello,
>> well - this is my first post here after using vim(1) for so long
>> (it's something in between 10 and 12 years, i've forgotten), so
>> let me thank the vim(1) developers first -- YYAAAAAAAAAAAAAAAAA!!!
>>
>> On the unicode@unicode list there was a thread on combining characters,
>> (Why no combining-character form for U+00F8?), and it turns out that
>> vim(1) isn't capable of performing a normalized search either!?
>> E.g., given a file
>>
>> |é
>> |é
>> |e
>>
>> which is (except empty lines stripped)
>>
>> |00000000 0a c3 a9 0a 65 cc 81 0a 65 0a 0a |....e...e..|
>> |0000000b
>>
>> then with 'Vi IMproved 7.3 (2010 Aug 15, compiled Jan 7 2011 14:27:00)',
>> old but never failed, pretty stripped, but very Unicode friendly,
>>
>> /\%xe9\|e\%u0301\|e
>>
>> finds the first and the last, and
>>
>> /[=\%xE9=]
>>
>> finds the second and the third, which is wrong.
>> Searching for \%u0301 will find the second, but \.\%u0301 won't.
>> \Ze will also find the second and the third.
>>
>> Should i update? Or what is the state of Unicode normalization
>> support for searching and replacement? Will it be implemented?
>> Am i missing something?
>> Thank you and ciao,
>>
>> --steffen
>
>
> Maybe you're interested in this patch:
>
> ---
> Patch 7.3.259
> Problem: Equivalence classes only work for latin characters.
> Solution: Add the Unicode equivalence characters. (Dominique Pelle)
> Files: runtime/doc/pattern.txt, src/regexp.c, src/testdir/test44.in,
> src/testdir/test44.ok
> ---
>
> In your example, all 3 lines match with Vim-7.3.633 when I do:
>
> /[[=e=]]
>
> See :help \[==\]
>
> -- Dominique

I suppose that you wanted to match only the 1st and 2nd lines,
ignoring only differences in combining characters (rather than
ignoring all diacritics, as my suggestion above does). I'm not sure
we can do that, but it would be a useful addition.

I saw ":help /\Z", which seems close, but it does not do that.
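
To make the desired behaviour concrete, here is a minimal sketch outside
Vim, assuming Python's standard unicodedata module. It is only an
illustration of canonical-equivalence matching, not something Vim does;
the helper name matches_normalized is hypothetical. Both the search text
and each line are put into NFC form before comparing, so the precomposed
U+00E9 and the sequence "e" + U+0301 are treated as the same character,
while a plain "e" is not.

```python
import unicodedata

# The three lines from the example file:
# precomposed é (U+00E9), e + combining acute (U+0065 U+0301), plain e.
lines = ["\u00e9", "e\u0301", "e"]


def matches_normalized(needle: str, haystack: str) -> bool:
    """Return True if needle occurs in haystack once both are in NFC form.

    Hypothetical helper for illustration only.
    """
    nfc_needle = unicodedata.normalize("NFC", needle)
    nfc_haystack = unicodedata.normalize("NFC", haystack)
    return nfc_needle in nfc_haystack


# 1-based line numbers that match a search for é.
hits = [n for n, line in enumerate(lines, start=1)
        if matches_normalized("\u00e9", line)]
print(hits)  # [1, 2]
```

Searching for the decomposed form "e\u0301" gives the same two lines,
because the needle is normalized as well.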

-- Dominique

--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
