Wednesday, April 17, 2013

Re: substitution with accented characters

On 17/04/13 05:12, Ben Fritz wrote:
> On Tuesday, April 16, 2013 8:52:22 PM UTC-5, andalou wrote:
>> Suppose I have the following text:
>>
>> Diagonalizaci&#243;n de matrices. Formas cuadr&#225;ticas.
>> El Espacio Af&#237;n
>> El problema de la Programaci&#243;n Lineal
>> El Espacio Eucl&#237;deo
>>
>> How can I replace the &#...; with their corresponding characters?
>>
>> I know that &#243; is ó
>> &#225; is á
>> &#237; is í
>>
>> The text can be very large with several of the &#...;
>>
>
> Are you looking for something like this?
>
> :%s/&#\(\d\+\);/\=nr2char(submatch(1))/g
>
> :help sub-replace-expression
> :help submatch()
> :help nr2char()
>
> Depending on your encoding you might need to throw in an iconv() call somewhere.
>

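As a rough illustration of the iconv() remark above, one possibility is
to have nr2char() always produce UTF-8 and then convert the result to
whatever 'encoding' is in effect. This is only a sketch: it assumes a Vim
whose nr2char() accepts the optional second argument and, for most
non-Unicode encodings, one built with the +iconv feature. Characters that
the target encoding cannot represent will not convert cleanly.

```vim
" Build each character as UTF-8 (second argument 1), then convert it
" to the current 'encoding'. With 'encoding' set to utf-8 the
" conversion changes nothing.
:%s/&#\(\d\+\);/\=iconv(nr2char(submatch(1), 1), 'utf-8', &encoding)/g
```
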
You may, if you want, refine this expression (the actual implementation
is left as an exercise to the student) for hex entities such as
&#x4E09; for 三, the CJK ideograph for the number three (Unicode
codepoint U+4E09). Of course, for symbolic entity names you will have to
identify each of them separately, which may require some other approach,
such as working with a function and a Dictionary like { 'lt':'<',
'gt':'>', 'nbsp':"\u00A0", 'amp':'&', 'aacute':'á', 'eacute':'é',
'iacute':'í', 'oacute':'ó', 'uacute':'ú', 'uuml':'ü', 'ntilde':'ñ',
'Aacute':'Á', …etc… }

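For those who would rather skip the exercise, here is one possible
sketch covering decimal, hex and a few named entities. The names
s:entities and DecodeEntity() are my own inventions for illustration,
the table is deliberately incomplete, and the code assumes it is saved
in a file and :source'd, with 'encoding' set to utf-8 so that codepoints
above 255 come out right.

```vim
" Hypothetical table of named entities; extend it as needed.
" "\uXXXX" stores the character according to the current 'encoding'.
let s:entities = {'lt': '<', 'gt': '>', 'amp': '&', 'nbsp': "\u00a0"}
call extend(s:entities, {'aacute': "\u00e1", 'eacute': "\u00e9"})
call extend(s:entities, {'iacute': "\u00ed", 'oacute': "\u00f3"})
call extend(s:entities, {'uacute': "\u00fa", 'uuml': "\u00fc"})
call extend(s:entities, {'ntilde': "\u00f1", 'Aacute': "\u00c1"})

" a:body is the text between '&' and ';',
" e.g. '#243', '#x4E09' or 'oacute'.
function! DecodeEntity(body)
  if a:body =~# '^#\d\+$'
    " Decimal entity: force base 10.
    return nr2char(str2nr(a:body[1:], 10))
  elseif a:body =~? '^#x\x\+$'
    " Hex entity: skip the leading '#x' and parse as base 16.
    return nr2char(str2nr(a:body[2:], 16))
  endif
  " Named entity: look it up; unknown names are put back unchanged.
  return get(s:entities, a:body, '&' . a:body . ';')
endfunction

" Apply it to every entity in the buffer.
:%s/&\(#\d\+\|#[xX]\x\+\|\a\w*\);/\=DecodeEntity(submatch(1))/g
```

Using str2nr() with an explicit base, rather than relying on automatic
String-to-Number conversion, should keep a decimal entity written with a
leading zero from being read as octal.
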
Of course, if your example text is part of an HTML page, all browsers
MUST display the characters correctly even if you leave the &…; entities in.


Best regards,
Tony.
--
An authority is a person who can tell you more about something than you
really care to know.
