Saturday, September 9, 2017

Re: Encoding issues with Windows gvim

On Sat, Sep 9, 2017 at 9:26 PM, Joseph L. Casale <jcasale@gmail.com> wrote:
> On Saturday, September 9, 2017 at 12:16:27 PM UTC-6, Tony Mechelynck wrote:
>> In your Windows gvim, at the point where you would be reading your
>> problematic file, do instead
>>
>> :verbose set enc?
>>
>> If the answer is anything other than utf-8, then you cannot display
>> the file in gvim because the UTF-16le of the file cannot be translated
>> into whatever it is that gvim is using to represent characters in
>> memory.
>>
>> See http://vim.wikia.com/wiki/Working_with_Unicode
>
> Hi Tony,
> Executing ":verbose set enc?" showed latin1. To be honest, reading the doc left my minimal understanding of the topic even murkier. The sections of the manual around "*45.4* Editing files with a different encoding" helped; however, I am still unclear.
>
> After setting an appropriate Unicode font in my vimrc (set guifont=courier_new:h11) and opening the file with ":e ++enc=utf-16le utf16.txt", the file was loaded with conversion errors (all upside down question marks). Executing ":set encoding=utf-16le" and reloading yet again with ":e ++enc=utf-16le utf16.txt" worked, I can now view the file?
>
> Why didn't opening the file with "++enc=utf-16le" accomplish all that ":set encoding=utf-16le" did?
>
> Thanks a lot for the help.

Despite its name, ++enc sets 'fileencoding' (telling Vim which charset
is used _on disk_ for that file), not 'encoding' (the charset used for
the data _in Vim memory_); the latter, if you don't change it, is
still set to latin1, which has no representation for Greek letters.
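
To see the two options side by side, something like the following can
be typed in your Windows gvim (a rough sketch; utf16.txt is the file
name from your message, and the values in the comments are what I would
expect, not something I have checked on your machine):

```vim
" Read the file, telling Vim it is stored on disk as UTF-16le.
:e ++enc=utf-16le utf16.txt
" Charset Vim assumes for this file on disk (should report utf-16le).
:setlocal fileencoding?
" Charset Vim uses for the text in memory (latin1 in your case).
:set encoding?
```

If the second answer is still latin1, the Greek text has nowhere to go
in memory, whatever ++enc said about the file on disk.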

When you did ":set enc=utf-16le", Vim actually used utf-8 internally,
because UTF-16le uses a lot of null bytes (one for every codepoint not
greater than U+00FF, for instance spaces, tabs, commas, etc.) and Vim
uses C strings, which those null bytes would terminate. UTF-8, like
UTF-16, can represent every Unicode character, including the Greek text
of your problematic file. Your Linux Vim probably runs in a UTF-8 locale
(something many Linux systems use) which would explain why your Greek
text was immediately readable on Linux.

But if you change 'encoding' while some file (even maybe just a help
file) is already loaded in memory, the non-ASCII data already in memory
may become invalid, because Vim does not convert it. The only safe place
to change 'encoding' is near the top of your vimrc, before any file has
been read, and there are other changes that go with it.
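
A minimal sketch of such a vimrc fragment, along the lines of what the
wiki article suggests (the particular 'fileencodings' list is only one
reasonable choice and is an assumption on my part; adjust it to the
files you actually edit):

```vim
" Put this near the top of the vimrc, before any file is read.
if has("multi_byte")
  " Remember how the keyboard/terminal encodes text before 'encoding' changes.
  if &termencoding == ""
    let &termencoding = &encoding
  endif
  " Use UTF-8 for all text held in memory.
  set encoding=utf-8
  " Default on-disk charset for newly created files.
  setglobal fileencoding=utf-8
  " Charsets to try, in order, when reading an existing file.
  set fileencodings=ucs-bom,utf-8,latin1
endif
```

As far as I know, a UTF-16le file without a byte-order mark will not be
detected by this list, so it would still have to be opened with
":e ++enc=utf-16le utf16.txt".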

Please read the Vim wiki article linked in my previous post; it will
tell you how to do it safely, and explain in detail the differences
between the various encoding-related options that Vim possesses.

Best regards,
Tony.

--
--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php

