Wednesday, May 27, 2015

Re: How to uppercase the non-English characters using Windows-1250 code page?

2015-05-27 19:04 GMT+03:00 Christian Brabandt <cblists@256bit.org>:
> Hi Nikolay!
>
> On Wed, 27 May 2015, Nikolay Pavlov wrote:
>
>> 2015-05-27 14:27 GMT+03:00 Igor Forca <igor2x@gmail.com>:
>> > Hi,
>> > On gVim 7.4 on Windows 7 I have some text, for example:
>> > abcčšž
>> > and I would like to uppercase this word, so the final result should have all letters in uppercase:
>> > ABCČŠŽ
>> >
>> >
>> > TEST 1
>> > 1. Set code pages: :set encoding=utf-8 fileencoding=utf-8
>> > 2. Type in text: abcčšž
>> > 3. Normal mode (go uppercase a word): gUaw
>> > Result is: ABCČŠŽ
>> > Working fine.
>> >
>> >
>> > TEST 2
>> > Note: Clear the text with the dd command before continuing with the next test.
>> > 1. Set code pages: :set encoding=utf-8 fileencoding=cp1250
>> > Repeat steps 2 and 3 from TEST 1.
>> > Result is: ABCČŠŽ
>> > Working fine.
>> >
>> >
>> > TEST 3
>> > Note: Clear the text with the dd command before continuing with the next test.
>> > 1. Set code pages: :set encoding=cp1250 fileencoding=cp1250
>>
>> I can confirm with
>>
>> % echo 'abcčšž' | iconv -t CP1250 > /tmp/enctest
>> % vim -u NONE -i NONE -N --cmd 'set encoding=cp1250' -c 'e ++enc=cp1250 /tmp/enctest' -c 'normal! gUaw' -c 'wqa!'
>> % cat /tmp/enctest| iconv -f CP1250
>> ABCčšž
>
> Interesting. I see the same with:
>
> #v+
> $ cat /tmp/enctest| iconv -f CP1250|tr '[:lower:]' '[:upper:]'

This is not strange. tr behaves somewhat like toupper/tolower (as opposed
to towupper/towlower). In other words, it works only with single bytes:

% echo 'abcабц' | iconv -t CP1251 > /tmp/enctest2
% cat /tmp/enctest2 | LANG=ru_RU.CP1251 tr '[:lower:]' '[:upper:]' | iconv -f CP1251
ABCАБЦ
% cat /tmp/enctest2 | iconv -f CP1251 | tr '[:lower:]' '[:upper:]'
ABCабц

Each of the characters "čšž" occupies more than one byte in UTF-8, so
you need to use a CP1250 locale for this to work.

I cannot find this in `man tr`, but `info tr` says:

> Currently `tr' fully supports only single-byte characters.
> Eventually it will support multibyte characters; when it does, …
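The byte-level point above can be checked with a minimal shell sketch. It assumes a UTF-8 script file, the iconv utility, and a tr that accepts octal escapes. The script counts the bytes of the word in UTF-8 and in CP1250. It then shows that tr in the C locale leaves the UTF-8 bytes of "čšž" untouched. Finally, it uppercases the CP1250 bytes with an explicit byte mapping taken from the Windows-1250 code page table. That hand-written mapping is a hypothetical stand-in for a real CP1250 locale, which may not be installed, and it covers only the three letters used here.

```shell
#!/bin/sh
# Sketch: byte-oriented tr cannot uppercase "čšž" stored as UTF-8,
# but can once the text is in the single-byte CP1250 code page.
# Assumes this file is saved as UTF-8 and iconv(1) is available.
set -eu

word='abcčšž'

# In UTF-8 each of č, š, ž takes two bytes; in CP1250 each takes one.
utf8_len=$(printf '%s' "$word" | wc -c | tr -d ' ')
cp1250_len=$(printf '%s' "$word" | iconv -f UTF-8 -t CP1250 | wc -c | tr -d ' ')
echo "UTF-8 bytes: $utf8_len, CP1250 bytes: $cp1250_len"

# In the C locale [:lower:] is only a-z, so the two-byte UTF-8
# sequences for č, š, ž pass through tr unchanged.
bytewise=$(printf '%s' "$word" | LC_ALL=C tr '[:lower:]' '[:upper:]')
echo "tr on UTF-8 input: $bytewise"

# Hypothetical stand-in for a CP1250 locale: spell out the CP1250
# byte mapping explicitly (octal escapes):
#   č 0xE8 (\350) -> Č 0xC8 (\310)
#   š 0x9A (\232) -> Š 0x8A (\212)
#   ž 0x9E (\236) -> Ž 0x8E (\216)
cp1250_upper=$(printf '%s' "$word" \
  | iconv -f UTF-8 -t CP1250 \
  | LC_ALL=C tr 'a-z\350\232\236' 'A-Z\310\212\216' \
  | iconv -f CP1250 -t UTF-8)
echo "tr on CP1250 input: $cp1250_upper"
```

Under those assumptions, the script prints the byte counts 9 and 6, then "ABCčšž" for the UTF-8 pass and "ABCČŠŽ" for the CP1250 pass. This matches the behaviour described in the thread.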


> ABCčšž
> #v-
>
> Best,
> Christian
> --
> The repeal of §218 is a prerequisite for the liberation
> of women.
> -- Members of the Frauenbund Westberlin, 6 June 1971
>
> --
> --
> You received this message from the "vim_use" maillist.
> Do not top-post! Type your reply below the text you are replying to.
> For more information, visit http://www.vim.org/maillist.php
>
> ---
> You received this message because you are subscribed to the Google Groups "vim_use" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to vim_use+unsubscribe@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

