Thursday, July 29, 2010

Re: Editing files full of NUL characters

On 29/07/10 21:18, Teemu Likonen wrote:
> * 2010-07-29 09:50 (-0700), Bob Weissman wrote:
>
>> Often, the files look like they ought to be text files but are full of
>> NULs. Instead of "Hello", I will see "H^@e^@l^@l^@o^@". Or maybe it's
>> "^@H^@e^@l^@l^@o". I haven't figured out the byte order.
>
> Looks like an UTF-16 encoded file.
>
>> Is there a way to edit these files in gvim such that the ^@'s don't
>> appear onscreen but get written properly when I write the files back?
>
> You could try opening the file with
>
> :e ++enc=utf-16be file.txt
>
> or with ++enc=utf-16le if the byte order wasn't correct.
>
> But Vim should detect the encoding correctly if (1) the file has byte
> order mark (U+FFFE or U+FEFF) in the beginning and (2) you have ucs-bom
> in 'fileencodings' option.
>
> :set fileencodings=ucs-bom,utf-8,default,latin1
>

Yeah:

^@H^@e^@l^@l^@o
is utf-16be aka utf-16 aka unicode
H^@e^@l^@l^@o^@
is utf-16le
þÿ^@H^@e^@l^@l^@o
is utf-16be with BOM and Vim should see Hello
ÿþH^@e^@l^@l^@o^@
is utf-16le with BOM and Vim should see Hello
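To see where these four patterns come from, here is a small Python sketch. Python is just my choice for illustration; Vim does not depend on it. It encodes "Hello" each way and prints the bytes roughly as Vim shows them when the file is read as latin1, with NUL displayed as ^@. The helper name vim_view is made up for this example.

```python
import codecs


def vim_view(data: bytes) -> str:
    """Approximate Vim's display of raw bytes read as latin1.

    latin1 maps every byte 0x00-0xFF to the code point of the same value,
    so 0xFE shows as 'þ' and 0xFF as 'ÿ'. Only NUL is rewritten to ^@;
    other control bytes are left as-is in this sketch.
    """
    return data.decode("latin-1").replace("\x00", "^@")


text = "Hello"
samples = {
    "utf-16be": text.encode("utf-16-be"),
    "utf-16le": text.encode("utf-16-le"),
    # The -be/-le codecs never write a BOM, so prepend it explicitly.
    "utf-16be with BOM": codecs.BOM_UTF16_BE + text.encode("utf-16-be"),
    "utf-16le with BOM": codecs.BOM_UTF16_LE + text.encode("utf-16-le"),
}

for name, data in samples.items():
    print(f"{vim_view(data):<22} is {name}")
```

The NUL falls before each letter in big-endian and after it in little-endian. That is the whole difference Bob was trying to pin down.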

See ":help ++opt". The first example should be read correctly by simply
rereading it with ":e ++enc=utf-16be" (without quotes and with no
filename). The second example works the same way with utf-16le. The last
two should be detected automatically if 'fileencodings' (plural) *starts*
with ucs-bom.
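The "check for a BOM first, then fall back" idea behind putting ucs-bom at the front of 'fileencodings' can be sketched in Python as follows. This is only an illustration, not Vim's actual code. The NUL-counting fallback for BOM-less files is a hypothetical heuristic of my own that Vim does not perform. The names guess_encoding and read_text are made up.

```python
import codecs

# Longest BOMs first: the UTF-32LE BOM (FF FE 00 00) begins with the
# UTF-16LE BOM (FF FE), so it must be tested before it.
BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def guess_encoding(data: bytes) -> tuple[str, int]:
    """Return (codec name, number of BOM bytes to skip)."""
    # Step 1, like having ucs-bom first in 'fileencodings': trust a BOM.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    # Step 2, hypothetical heuristic (not what Vim does): in mostly-ASCII
    # UTF-16 text every other byte is NUL. NULs at even offsets mean the
    # high byte comes first (big-endian); at odd offsets, little-endian.
    even_nuls = data[0::2].count(0)
    odd_nuls = data[1::2].count(0)
    if even_nuls > odd_nuls:
        return "utf-16-be", 0
    if odd_nuls > even_nuls:
        return "utf-16-le", 0
    # Step 3: no BOM and no NUL pattern, so fall back to UTF-8.
    return "utf-8", 0


def read_text(data: bytes) -> str:
    """Decode bytes using guess_encoding(), dropping any BOM."""
    name, skip = guess_encoding(data)
    return data[skip:].decode(name)
```

The NUL-counting step only helps with text that is mostly ASCII. A BOM, or an explicit ++enc= when reading the file, remains the reliable route.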

The (misnamed) byte order mark is always U+FEFF (the Unicode codepoint
0xFEFF, or decimal 65279); its representation varies depending on which
UTF is in use:
  UTF-16be  0xFE 0xFF            (one 16-bit big-endian word)
  UTF-16le  0xFF 0xFE            (one 16-bit little-endian word)
  UTF-8     0xEF 0xBB 0xBF       (three 8-bit bytes)
  UTF-32be  0x00 0x00 0xFE 0xFF  (one 32-bit big-endian doubleword)
  UTF-32le  0xFF 0xFE 0x00 0x00  (one 32-bit little-endian doubleword)
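One quick way to double-check the table is to have Python encode U+FEFF itself and compare the result against the BOM constants in its codecs module. This assumes Python 3, and the check is my own rather than anything Vim does. The last line also shows where U+FFFE comes from: it is what you get when the bytes of a little-endian BOM are read in big-endian order.

```python
import codecs

# The BOM is always the single code point U+FEFF (decimal 65279);
# only its byte representation changes with the encoding form.
BOM = "\ufeff"
assert ord(BOM) == 65279

EXPECTED = [
    ("utf-16-be", codecs.BOM_UTF16_BE),
    ("utf-16-le", codecs.BOM_UTF16_LE),
    ("utf-8", codecs.BOM_UTF8),
    ("utf-32-be", codecs.BOM_UTF32_BE),
    ("utf-32-le", codecs.BOM_UTF32_LE),
]

for codec, constant in EXPECTED:
    encoded = BOM.encode(codec)
    assert encoded == constant
    # Print the bytes in the same 0xNN style as the table.
    print(f"{codec:<10} " + " ".join(f"0x{b:02X}" for b in encoded))

# A little-endian BOM read as big-endian decodes to U+FFFE, a Unicode
# noncharacter, which is why U+FFFE is sometimes mentioned beside the BOM.
assert codecs.BOM_UTF16_LE.decode("utf-16-be") == "\ufffe"
```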


Best regards,
Tony.
--
What we need in this country, instead of Daylight Savings Time, which
nobody really understands anyway, is a new concept called Weekday
Morning Time, whereby at 7 a.m. every weekday we go into a space-
launch-style "hold" for two to three hours, during which it just
remains 7 a.m. This way we could all wake up via a civilized gradual
process of stretching and belching and scratching, and it would still
be only 7 a.m. when we were ready to actually emerge from bed.
-- Dave Barry, "$#$%#^%!^%&@%@!"

--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
