Thursday, August 20, 2009

Re: using regexp to search for Unicode code points and properties

I read through the help files on /\%u, but now I have a question about
searching for composing or combining characters.

I have a Cyrillic text, using UTF-8 as the encoding, and the characters
are appearing correctly on the screen.

When I place the cursor on a character and press ga, Vim shows its
decimal (1073), hex (0431), and octal (2061) values. I can then use
/\%u0431 in a search to find this code point.
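
As a sanity check outside Vim, here is a minimal Python sketch showing
that the three numbers reported by ga name the same code point, and how
the /\%u pattern is built from the hex value. The variable names are
only illustrative.

```python
# The three values shown by ga are one code point in different bases.
cp = 1073
assert cp == 0x0431 == 0o2061

# The character at that code point (Cyrillic small letter be).
ch = chr(cp)

# Build the Vim search pattern from the 4-digit hex form.
vim_pattern = rf"/\%u{cp:04x}"

print(ch, f"{cp:d} {cp:04x} {cp:o}", vim_pattern)
```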

When I press g8 on that same character, it shows 'd0 b1'. I understand
that the combining character follows the base character, so 'd0' is the
base and 'b1' is the combining, but how would I search for:
- only the base character (d0), whether there are any combining
characters or not
- only the combining character (b1), attached to any base character
- both the base + combining character (d0+b1)

I've tried /\%ud0b1, /\%uD0B1, /\%ud0/\%ub1, and several others, but
nothing has worked.
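
To check what the two values from g8 correspond to, here is a small
Python sketch, assuming the character is U+0431 as reported by ga. It
prints the UTF-8 bytes of that single code point and its combining
class, and for contrast does the same for a sequence that does contain
a combining character (U+0438 followed by U+0306, chosen only as an
illustration and not taken from my text).

```python
import unicodedata

# The character reported by ga: a single code point, U+0431.
ch = "\u0431"

# Its UTF-8 encoding, formatted as space-separated hex bytes.
hex_bytes = " ".join(f"{b:02x}" for b in ch.encode("utf-8"))

# Canonical combining class: 0 means "not a combining character".
ccc = unicodedata.combining(ch)
name = unicodedata.name(ch)

# Contrast: a base letter followed by a real combining mark
# (illustrative sequence, an assumption for this example).
seq = "\u0438\u0306"
seq_classes = [unicodedata.combining(c) for c in seq]

print(hex_bytes, ccc, name)
print([f"U+{ord(c):04X}" for c in seq], seq_classes)
```

In this sketch 'd0 b1' comes out as the two-byte UTF-8 encoding of the
one code point U+0431, whose combining class is 0, while the contrast
sequence consists of two separate code points.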

Thanks.

Brian
