> Charles Campbell wrote:
>>> I need help deciding whether the following is a useful tip:
>>> http://vim.wikia.com/wiki/Bash_file_encoding_alias
>>>
>>> The heart of the matter is that the tip claims it can use Vim
>>> (on Linux) as a quick way to determine the file encoding of a
>>> particular file. It uses this bash alias (one line):
>>>
>>> alias vimenc='vim -c '\''let $enc =&fileencoding | execute
>>> "!echo Encoding: $enc" | q'\'''
>>>
>>> A usage example is as follows (at bash prompt):
>>>
>>> $ vimenc UTF-16.xml
>>> Encoding: utf-16le
>>> Press ENTER or type command to continue
>>> ...
>>
>> Under linux, the "file" utility identifies the encoding, but
>> it also adds additional information. The following website
>> http://codesnipers.com/?q=node/68 has a nice discussion about
>> determining encoding.
>
> Thanks, but please boldly say what you think about the idea of using
> Vim to detect the file encoding. Is it a worthwhile tip, or is it
> fundamentally misguided because all it is doing is using the user's
> predefined list in 'fileencodings'?
>
> John
>
Well, IMHO the user's 'fileencodings' as set in his vimrc ought to
reflect the user's habits and the charsets he uses most often. There is
also a 3rd-party global plugin (forgot the name) which will attempt to
detect which (mostly East-Asian or Unicode) encoding a file is in by
examining the contents. And if you haven't set anything, a Vim 7.x with
'encoding' set to UTF-8 will start with
fileencodings=ucs-bom,utf-8,default,latin1 which will detect the
following:
- any Unicode file with a BOM, namely
    00 00 FE FF                     UCS-4 aka UTF-32 (big-endian)
    FF FE 00 00                     UCS-4le aka UTF-32le (little-endian)
    FE FF                           UTF-16 (big-endian)
    FF FE (not followed by 00 00)   UTF-16le (little-endian)
    EF BB BF                        UTF-8
- UTF-8 even without a BOM
- if none of the above, try your system locale's default charset
- if all else fails, assume Latin1
which is not a bad set of defaults in the absence of more user-specific
information. (It will detect 7-bit ASCII files as "UTF-8 without BOM",
which is not wrong, since for the 128 US-ASCII codepoints most non-EBCDIC
charsets, including UTF-8, use the same disk representation.)
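
To make that fallback order concrete, here is a rough Python sketch of the same sequence of checks. It only approximates what Vim does internally, it is not Vim's code. The function name guess_encoding and its "default" parameter are made up for illustration, and the locale charset is assumed to be whatever Python's locale.getpreferredencoding() reports.

```python
import locale

# BOM signatures from the list above. The 4-byte marks are checked first so
# that FF FE 00 00 (UCS-4le) is not mistaken for FF FE (UTF-16le).
BOMS = [
    (b"\x00\x00\xfe\xff", "ucs-4"),
    (b"\xff\xfe\x00\x00", "ucs-4le"),
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16"),
    (b"\xff\xfe", "utf-16le"),
]


def guess_encoding(data, default=None):
    """Guess an encoding name for a bytes object (hypothetical helper)."""
    # Step 1 (ucs-bom): any Unicode file that starts with a BOM.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # Step 2 (utf-8): valid UTF-8 without a BOM; pure 7-bit ASCII lands here.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # Step 3 (default): the locale's charset, assumed to be what Python reports.
    default = default or locale.getpreferredencoding(False)
    try:
        data.decode(default)
        return default.lower()
    except (UnicodeDecodeError, LookupError):
        pass
    # Step 4 (latin1): every byte sequence is valid Latin1, so this never fails.
    return "latin1"
```

Something like guess_encoding(open("UTF-16.xml", "rb").read()) would then stand in for the alias, with the same caveat as before: the answer is only as good as the list of candidates being tried.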
Best regards,
Tony.
--
What does "it" mean in the sentence "What time is it?"?
--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php