Thursday, August 20, 2009

Re: using regexp to search for Unicode code points and properties

On 20.08.2009 15:47, Brian Anderson wrote:
> I'm interested in learning how to use regular expressions in Vi(m) to
> search for Unicode code points.
>
> In a book about regexp, it describes how to search for Unicode code
> points by various means, and for various programming languages.
>
> The book describes searching for a specific Unicode code point as \u2122
> or \x{2122}.
>
> From what I've seen in the Vim help files, \u is to identify uppercase
> characters, not Unicode code points, and \x is for hexadecimal digits.
>
> The book also talks about using Unicode property or categories in the
> search. The book indicates there are 30 Unicode categories, grouped into
> 7 super-categories.
> For example, \p{Ll} would find any lowercase letter that has an
> uppercase variant, and \p{Lo} any letter or ideograph that does not have
> lowercase and uppercase variants.
>
> Unicode blocks are defined as \p{IsGreekExtended}. Blocks consist of a
> single range of code points. Example: any code point in the range
> U+0000...U+007F can be matched with \p{InBasicLatin}.
>
> Unicode script is \p{Greek}. Each Unicode code point is part of only one
> Unicode script. So if I wanted to search for any Greek letter, I'd use
> \p{Greek}.
>
> Unicode grapheme is \X or \P{M}. This would be either single codepoints
> (U+00E0 Latin small letter a with grave accent) or combined codepoints
> (U+0061 Latin small letter a + U+0300 combining grave accent).
>
> Help on any of these, either in examples or where to look in the help
> files, welcome.
> [...]

See :help /\%u for how to search for a character by its code point.
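
To give a rough idea of what that help topic covers, here are a few
search patterns. They are only a sketch: the code points are picked for
illustration, and I'm assuming a Vim built with multi-byte support and
'encoding' set to utf-8. As far as I know, Vim's regexp has no
equivalent of the \p{...} properties, scripts or blocks, so a block has
to be written out as an explicit range.

```vim
" Match U+2122 (TRADE MARK SIGN). \%u takes up to four hex digits.
/\%u2122

" Code points above U+FFFF need \%U, which takes up to eight hex digits.
" U+1D11E is MUSICAL SYMBOL G CLEF.
/\%U1d11e

" Inside a [] collection the escape is \u (or \U) without the percent
" sign, so a range can stand in for a block such as Greek Extended
" (U+1F00 to U+1FFF).
/[\u1f00-\u1fff]
```

Outside a collection, plain \u and \x keep the meanings you found in the
help (uppercase character and hex digit), which is why the \% prefix is
needed there. To check which code point a character has, put the cursor
on it and type ga (see :help ga).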


HTH,
Dennis Benzinger

--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_use" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---
