Tuesday, August 9, 2011

Re: How to display and remove BOM in utf-8 encoded file

On 09/08/11 13:37, Carlo Trimarchi wrote:
> Hi,
> I developed a website with Vim, working both on linux and windows and
> never had any problems. The other day someone else needed to edit some
> files and tried to use Mac and Windows. Apparently in the files he
> edited there is this Byte-Order Mark. I discovered this only via the
> w3c validator that gave me this warning:
>
> "Byte-Order Mark found in UTF-8 File. The Unicode Byte-Order Mark
> (BOM) in UTF-8 encoded files is known to cause problems for some text
> editors and older browsers. You may want to consider avoiding its use
> until it is better supported."

That message is outdated. The BOM is supported in all Unicode encodings
including UTF-8 by all "reasonably recent" browsers. It is also part of
the HTML standard. Some text editors (such as Notepad, I think) choke on
it, but the answer to that is to use a better editor, such as Vim or
even WordPad, which know about the BOM and handle it correctly, even in
UTF-8.

For some other kinds of text files (most source files and shell scripts,
for instance), it is better to save the file without a BOM, but for
most "web" formats including HTML, CSS, and, I think, XML, XHTML, etc.,
a BOM is no problem and can even be a help (e.g. in case the web server
sets the charset incorrectly or not at all in its Content-Type header).

>
> The only way I could solve the problem was using notepad++ which has
> an option to explicitly save the file without the BOM. Is there a way
> to do the same thing in Vim? Maybe even to display this BOM?
>
> Thanks,
> Carlo
>

To save the file without a BOM:

:setlocal nobomb
:w

To ask Vim if there is a BOM:

:setlocal bomb?

The answer is bomb for "BOM present" or nobomb for "BOM absent".
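
If you need to do the same check and removal outside Vim (say, for a batch of files someone else edited), here is a minimal Python sketch. It is not something Vim does internally; the function names has_utf8_bom and strip_utf8_bom are made up for illustration, and it only handles the UTF-8 BOM (bytes EF BB BF).

```python
import codecs

def has_utf8_bom(data: bytes) -> bool:
    """True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

def strip_utf8_bom(data: bytes) -> bytes:
    """Return the data without a leading UTF-8 BOM; unchanged if there is none."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data
```

To use it on a file, read the file in binary mode, pass the bytes through strip_utf8_bom, and write the result back in binary mode.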

Note that regardless of the state of the 'bomb' option, a BOM can only
exist if the 'fileencoding' is one of UTF-8, UTF-16 (or its UCS-2
subset) or UTF-32 (aka UCS-4). For the latter two, either endianness
will do; for UTF-8, endianness is not relevant. For other 'fileencoding'
values the 'bomb' option is irrelevant.
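
To make the list of encodings concrete, here is a rough Python sketch that guesses which BOM, if any, a file starts with. The names BOM_SIGNATURES and detect_bom are mine, not anything from Vim, and the result is only a heuristic: the bytes FF FE 00 00 could in principle also be UTF-16 little-endian text beginning with a NUL character.

```python
import codecs

# Check the 4-byte UTF-32 signatures before the 2-byte UTF-16 ones:
# the UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
BOM_SIGNATURES = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def detect_bom(data: bytes):
    """Return the encoding name suggested by a leading BOM, or None."""
    for bom, name in BOM_SIGNATURES:
        if data.startswith(bom):
            return name
    return None
```

Passing it the first four bytes of a file is enough, since no BOM is longer than that.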

To display the presence or absence of the BOM on the status line:

see http://vim.wikia.com/wiki/Show_fileencoding_and_bomb_in_the_status_line


Best regards,
Tony.
--
George Orwell was an optimist.

--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
