Sunday, January 3, 2010

Re: Windows Registry Editor text files (.REG) in Unicode (?) encoding - displayed as garbage

On 02/12/09 21:54, Frantisek Rysanek wrote:
> Dear everybody,
>
> following up on my Unix-originated habit, I'm using Vim as my
> favourite text editor for anything that resembles plain text in
> Windows. The Windows context menu item "edit with Vim" is very useful
> indeed :-)
>
> And that's where I have a problem with .REG files.
> If the file is encoded as plain ASCII, that's no problem.
> The problem is that Regedit in XP exports Unicode (or what), maybe
> utf-16 (or some other multibyte charset). If I try "edit with Vim" on
> such files, I get a screenful of garbage.
>
> I'm using the basic Windows build of Vim (gvim.exe) that gets
> installed by the "allaround Windows installer" of VIM 7.2, available
> at www.vim.org.
>
> Is there an easy way out? A few dark curses in the command line?
> ":help multibyte" was not much help... I don't even know what
> encoding i should try.
> Or is the default Windows build missing some important compile-time
> features? Some parts of "multibyte" support?
> :version only mentions +multi_byte_ime/dyn , not +multi_byte.
>
> Any ideas are welcome :-)
>
> Frank Rysanek
>

I'm coming to this thread about a month late, which shows how far behind
I am in handling email.

Reading this thread shows that a solution "which seems to work" has been
found by trial and error, but, it seems, not with much understanding of
what was going on. So here goes:

1) IIUC, Windows registry files (*.REG), as exported by Regedit in XP,
are encoded in UTF-16le with BOM.

2) To display (and possibly edit) these files in Vim, you need a Vim
version which is "multi-byte capable" (i.e. compiled with +multi_byte or
+multi_byte_ime/dyn), but that alone is not enough: you also need to
exercise those capabilities by setting 'encoding' (the internal
representation Vim uses for text in memory) to something able to
represent all characters in all files you'll edit. In this case that
means UTF-8.

Note: Setting 'encoding' to any of utf-16, utf-16be, utf-16le, ucs-2,
ucs-2be, ucs-2le, ucs-4, ucs-4be, ucs-4le, utf-32, utf-32be, utf-32le
(some of which are synonymous) will also result in UTF-8 being used
internally. UTF-8 converts losslessly to and from these encodings but,
unlike them, it never uses the byte 0x00 for anything other than the NUL
control character, which C also uses to terminate strings.
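
As a minimal sketch of point 2, the lines below could go near the top of
a vimrc. The has('multi_byte') guard is my own assumption about how to
keep the file harmless on a Vim built without multi-byte support; it is
not something the thread prescribes.

```vim
" Use UTF-8 as the internal representation, but only when this Vim
" was built with multi-byte support.
if has('multi_byte')
  set encoding=utf-8
endif
```

It is best to set 'encoding' early, before any file is loaded, since
changing it does not convert text that is already in a buffer.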

3) The 'fileencodings' option defines the heuristics used to determine
the encoding used on disk within a file. The first value is tried first,
then the next in case of failure, and so on. Since 8-bit encodings
cannot give a "fail" signal, there should be at most one of them, and at
the end. OTOH, ucs-bom (if used) should be first, and in any case before
any other Unicode encoding. IOW:

Bad: set fencs=utf-8,ucs-bom,latin1,cp1252
- The BOM in a UTF-8 file will never be detected and you'll see <FEFF>
at the start of line 1 of any file in UTF-8 with BOM;
- Code page 1252 will never be used because latin1 (which cannot give a
"fail" signal) will be tried before. (Note that ending with
,cp1252,latin1 would be just as bad: in this case it's latin1 which
would never be used.)

Good: set fencs=ucs-bom,utf-8,cp1252

Any encoding not listed in 'fencs' can still be used by forcing it when
opening the file, see ":help ++enc".
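
To check what the heuristics decided, or to bypass them, something like
the following could be typed at the Vim command line. This is only a
sketch: example.reg is a hypothetical file name, and the values
mentioned in the comments assume the file really is UTF-16le with BOM,
as in point 1.

```vim
" Show the on-disk encoding Vim settled on, and whether it found a BOM.
" For a file detected via ucs-bom as UTF-16le, this should report
" fileencoding=utf-16le and bomb.
:setlocal fileencoding? bomb?

" Reload a file with an explicitly forced encoding, ignoring 'fencs'.
:edit ++enc=utf-16le example.reg
```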

For more details, see http://vim.wikia.com/Working_with_Unicode


Best regards,
Tony.
--
Republicans raise dahlias, Dalmatians and eyebrows.
Democrats raise Airedales, kids and taxes.

Democrats eat the fish they catch.
Republicans hang them on the wall.

Republican boys date Democratic girls. They plan to marry Republican
girls, but feel they're entitled to a little fun first.

Democrats make up plans and then do something else.
Republicans follow the plans their grandfathers made.

Republicans consume three-fourths of the rutabaga produced in the USA.
The remainder is thrown out.

Republicans sleep in twin beds -- some even in separate rooms.
That is why there are more Democrats.
-- The Official Rules, as compiled by Paul Dickson

--
You received this message from the "vim_use" maillist.
For more information, visit http://www.vim.org/maillist.php
