Sunday, October 3, 2010

Re: How to set utf-8 locally (for a buffer) on loading the file

On 03/10/10 22:08, esquifit wrote:
> I need a way to set utf-8 encoding only for a particular file, on
> opening it.
>
> I'm using gvim 7.3 on Windows. I have the following situation
> regarding locale and languages:
>
> 1) The OS language is English
> 2) The Windows LANG environment variable is set to DE (because I live
> in Germany).
> 3) The gvim 'installation language' is German. With this I mean that
> the menus are in German.
> 4) My vim locale is English (that is, my _vimrc contains 'lang
> english')
>
> These settings are fine for most purposes. However, I also have a
> special file in which I'm collecting text lines in Japanese. The text
> in the file is not maintained manually but is being written to by
> another application.
>
> A little trial and error shows that, when making the following
> settings *after* the file has been opened for reading, the Japanese
> text is correctly displayed:
>
> set encoding=utf-8
> set guifont:MS_Mincho:h10:cSHIFTJIS
>
> However, this has the undesirable collateral effect that it does not
> only affect the buffer with this particular Japanese text but all open
> and future buffers.
>
> I tried 'fileencoding', which didn't work. From the documentation I
> kind of understand that 'fileencoding' only affects user input after
> the variable was set, which in my case is useless. I don't care about user
> input, I just want the text to show in the correct encoding (utf-8).
> The 'on opening' part can be achieved with an autocommand (I used to
> use a modeline until 7.2, but setting encoding from a modeline seems
> to be forbidden from 7.3).
>
> Does anybody know how to handle this?
> Thank you
> e.
>
>

In order to be able to read any Unicode file, 'encoding' MUST be set to
"utf-8". This is a global option, which determines how Vim represents
ALL files' data in memory, and the best place to set it is in the vimrc,
before any file has been loaded. If you change this option while a file
is already loaded, you're liable to corrupt that file's data, because
Vim converts file text only at reading or writing (and only between disk
and memory), not when you change 'encoding' (and the text is already in
memory). See http://vim.wikia.com/wiki/Working_with_Unicode for details.
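
A minimal sketch of that vimrc placement, assuming a Vim built with
multi-byte support. The has() guard and the 'termencoding' step are
illustrative additions; the latter matters mostly for console Vim rather
than gvim:

```vim
" Near the top of the vimrc, before any file is loaded.
if has('multi_byte')
  " Assumption: keep talking to the terminal/keyboard in the old encoding.
  if empty(&termencoding)
    let &termencoding = &encoding
  endif
  " Global option: how Vim represents ALL buffers in memory.
  set encoding=utf-8
endif
```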

Having 'encoding' set to Unicode does not preclude editing German files
in Latin1, as follows:
- With 'fileencodings' [plural] set to "ucs-bom,utf-8,latin1" (without
the quotes), any Latin1 file containing at least one eszett, umlaut, or
other non-7-bit character will be correctly recognised as Latin1 and
converted back and forth when reading and writing. This conversion is
possible and lossless as long as you don't add non-Latin1-compatible
data by editing.
- If you have ":setglobal fenc=latin1" (without the quotes), e.g. in
your vimrc, every _new_ file will be created in Latin1.
- The only thing you'll have to pay attention to is files containing
only 7-bit ASCII text. Such data is represented identically on disk in
all three of us-ascii, latin1 and utf-8 so it is not an error that Vim
sees them as UTF-8 when reading. As long as they contain only 7-bit
characters, whether Vim writes them as Latin1 or UTF-8 is immaterial:
the disk data is the same. If you add an umlaut or eszett (etc.) by
editing such a file, it stops being "7-bit only". At that moment, you
have a choice: record it as Latin1 (by using ":setlocal fenc=latin1"
[without the quotes] before writing the file), or leave 'fileencoding'
[singular] at utf-8 (if that's what Vim detected when reading the
file), which will write it as UTF-8. As long as that UTF-8 file contains
no codepoint higher than U+00FF, you can still convert it later to
Latin1 by using
:e filename
:setlocal fenc=latin1
" maybe do some editing
:w " or :wq or :x etc.
in a Vim instance where 'encoding' (the global option) is set to "utf-8".
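
The points above can be sketched as a vimrc fragment. This is only an
illustration, under the assumption that 'encoding' has already been set
to utf-8 earlier in the same vimrc:

```vim
" Assumes: set encoding=utf-8 appears earlier in the vimrc.

" Detection order when reading a file: BOM first, then UTF-8,
" then fall back to Latin1.
set fileencodings=ucs-bom,utf-8,latin1

" New files (nothing on disk yet) are created as Latin1.
setglobal fileencoding=latin1
```

To see what Vim chose for the current buffer, ":setlocal fenc?" (without
the quotes) prints that buffer's 'fileencoding'.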

See also
http://vim.wikia.com/wiki/Show_fileencoding_and_bomb_in_the_status_line


Best regards,
Tony.
--
"We don't have to protect the environment -- the Second Coming is at
hand."
-- James Watt

--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
