Wednesday, November 3, 2010

Re: enc,fenc (again!?)

On 03/11/10 22:56, Alessandro Antonello wrote:
>> When loading an existing file (or reloading one which you just edited) which
>> contains only bytes in the range 0-127, it will be detected as UTF-8 without
>> BOM, in preference to Latin1. This is not an error, because these 128
>> characters are represented identically in US-ASCII, Latin1 (and most
>> non-EBCDIC encodings) and UTF-8; and UTF-8 has to come before Latin1 in
>> 'fileencodings', or it would never be detected. As long as you don't add any
>> characters with the high bit set, the file will be read or written exactly
>> the same way if its 'fileencoding' is set to utf-8, latin1, or even
>> us-ascii.
>>
>> See ":help 'fileencodings'" (or ":h 'fencs'" if you're a lazy typist ;-) )
>> for an explanation of how the charset, and BOM if any, of an existing file
>> are detected.
>>
>> If you want to be sure that a given file will be loaded as Latin1 (assuming
>> 'fileencodings' is set to "ucs-bom,utf-8,latin1" -- or maybe
>> "ucs-bom,utf-8,default,latin1": trying UTF-8 once as itself and once as
>> default entails only a negligible performance loss in most cases), make sure
>> that it contains one or more characters in the range 128-255 (maybe by using
>> accented letters, possibly in a string literal or in a comment, or maybe by
>> underlining the top title of a plaintext file by a line of "divided by"
>> signs rather than dashes or equals...), then ":setlocal fenc=latin1" will
>> have it detected as Latin1 after the next time the file is saved.
>>
>
> Hi, Tony.
>
> The problem I was facing was, indeed, caused by the 'hidden' option. It
> was 'nohidden' (the default), so buffers were unloaded when *abandoned* and
> their local variables, like 'fenc', were lost. Using 'set hidden' in my *.vimrc*
> solves everything and now I am able to go from a 'utf-8' buffer to a 'latin1'
> one and get back to the first or second and everything works fine.

You'll still see the problem (but see below) if you edit a Latin1 file,
close Vim, and reopen that file later.

>
> My primary working machine is a PC with Windows XP where 'latin1' is the
> default encoding but right now I am working on a MacBook where 'utf-8' is the
> default encoding. To be able to exchange files and projects without losing
> information I set 'fileencodings=ucs-bom,latin1,default,utf-8' in the *.vimrc*
> on the Mac so 'latin1' takes precedence over 'utf-8'. When needed '++enc=utf-8' is
> used to load 'utf-8' encoded files and everything is working beautifully. Vim
> is just a superb editor!

Since latin1 is an 8-bit encoding, it cannot give a "fail" signal:
fencs=ucs-bom,latin1,default,utf-8 means the same as
fencs=ucs-bom,latin1 i.e. whenever there is no BOM, the file will be
detected as Latin1 because none of the 256 possible byte values, in any
sequence, is invalid for Latin1 -- and if it is actually UTF-8 without
BOM, anything above U+007F will appear as two or more characters of
gibberish.
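
To see this outside Vim, here is a small Python sketch. It is only an
illustration: Python's "latin-1" and "utf-8" codecs stand in for Vim's own
conversion, and the sample strings are made up for the example.

```python
# Latin1 maps each of the 256 byte values to a character, so decoding
# can never raise an error, whatever the byte sequence is.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")

# Pure 7-bit ASCII reads the same under all three encodings.
ascii_bytes = b"plain text"
same = (ascii_bytes.decode("ascii")
        == ascii_bytes.decode("latin-1")
        == ascii_bytes.decode("utf-8"))

# UTF-8 text misread as Latin1: every character above U+007F turns into
# two or more Latin1 characters (here 2 bytes for e-acute, 3 for the euro sign).
utf8_bytes = "\u00e9\u20ac".encode("utf-8")
misread = utf8_bytes.decode("latin-1")

print(len(decoded), same, len(misread))
```

Two characters go in, five come out, and Latin1 never reports a problem.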

In the 'fileencodings' option, "ucs-bom", if present, should be first,
and an 8-bit encoding, if present, should be last (which means that at
most one 8-bit encoding should be used), because anything that comes
after the first 8-bit encoding will never be used.

Setting fencs=ucs-bom,utf-8,latin1 means the following:

1) Is there a BOM at the very start of the file? Then setlocal bomb,
"eat" the BOM, and setlocal the corresponding Unicode 'fileencoding',
otherwise setlocal nobomb and:

2) Are the full contents of the file valid for UTF-8? (and note: 7-bit
ASCII is valid for both UTF-8 and Latin1 and is displayed the same in
both) -- if yes, setlocal fenc=utf-8; otherwise

3) Unconditionally setlocal fenc=latin1
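
The three steps above can be sketched in Python. This is a rough model of
the logic, not Vim's actual implementation: the function name detect(), the
BOM table with its ordering and labels, and the use of Python's codecs are
all assumptions made for the illustration, and only encoding names that
Python itself knows (so not "default") will work in it.

```python
import codecs

# Assumed BOM table; the 4-byte UTF-32 marks are tested before UTF-16
# because the UTF-32 LE BOM starts with the UTF-16 LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32le"),
    (codecs.BOM_UTF32_BE, "utf-32be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
]


def detect(data, fencs=("ucs-bom", "utf-8", "latin1")):
    """Return (fileencoding, bomb) for the raw bytes of a file."""
    for name in fencs:
        if name == "ucs-bom":
            # Step 1: a BOM at the very start decides the encoding.
            for bom, enc in BOMS:
                if data.startswith(bom):
                    return enc, True
            continue
        try:
            # Steps 2 and 3: the first encoding that decodes the whole
            # file without error wins.
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name, False
    return None, False


print(detect(b"plain ASCII"))                # valid UTF-8, so utf-8
print(detect("caf\u00e9".encode("latin-1")))  # invalid UTF-8, falls to latin1
print(detect("caf\u00e9".encode("utf-8"),
             fencs=("ucs-bom", "latin1", "utf-8")))  # latin1 shadows utf-8
```

The last call uses the order from the quoted .vimrc: because Latin1 accepts
anything, the "utf-8" entry after it is never reached.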

>
> Thanks for your help.
>
> Alessandro Antonello
>

My pleasure.

Best regards,
Tony.
--
The pitcher wound up and he flang the ball at the batter. The batter
swang and missed. The pitcher flang the ball again and this time the
batter connected. He hit a high fly right to the center fielder. The
center fielder was all set to catch the ball, but at the last minute
his eyes were blound by the sun and he dropped it.
-- Dizzy Dean

--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
