Saturday, August 13, 2011

Re: How to display and remove BOM in utf-8 encoded file

On 11/08/11 00:03, Alessandro Antonello wrote:
> May I add some observations to this discussion?
>
> A BOM works best when you know your target. I work on a MacBook, which
> uses UTF-8 by default. When I'm working with Objective-C that will be
> compiled using LLVM, there is no problem using a BOM (which is a good thing,
> since the encoding can be easily recognized). But when I'm working with Java,
> doing something for the Android platform, I use ISO-8859-1 because the Google
> guys defined the 'encoding' argument of the 'javac' compiler as 'ASCII' in
> an Ant XML file somewhere.
>
> I also know that PHP doesn't handle a BOM well, so I decided to work with
> PHP in ISO-8859-1 too. But my e-mails are all HTML, formatted as UTF-8 with
> a BOM (edited in Vim), and always display in Firefox, Safari or Chrome with
> no problems.
>
> I believe that the problem with the major browsers comes down to user
> configuration. You can let the browser detect the character set of a page,
> or configure it to use one based on the assumption that you are in a
> Western country (or another part of the world). This causes no problems if
> you don't open pages from other countries. Nowadays it is preferable to let
> the browser handle the encoding itself.
>
> Regards.
>

Yeah, the idea is to know what your file will be used with.

Recently I discovered that when feeding a local *.txt file to SeaMonkey
(or, I suppose, Firefox), it will try to read it as Latin1 unless there
is a BOM. I'm not sure if that depends on my Appearance preferences. Of
course, for a *.txt on my local disk there is no metadata (no HTTP
headers etc.) to tell the MIME type and the encoding to the browser. For
the MIME type, *.txt means text/plain but it could be any charset.
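
To illustrate why the BOM matters when there is no metadata, here is a minimal Python sketch. It is not Gecko's actual detection logic; the function name guess_decode and the Latin-1 fallback are assumptions made for the example. It only shows that the same UTF-8 bytes come out right with a BOM and as mojibake without one.

```python
import codecs


def guess_decode(raw: bytes) -> str:
    """Decode bytes as a metadata-less reader might (hypothetical helper).

    Assumption for this sketch: UTF-8 if a BOM is present, otherwise
    fall back to Latin-1.
    """
    if raw.startswith(codecs.BOM_UTF8):
        # Drop the 3-byte BOM (EF BB BF) and decode the rest as UTF-8.
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    # No BOM: every byte is taken as one Latin-1 character.
    return raw.decode("latin-1")


text = "Fran\u00e7ais"
with_bom = codecs.BOM_UTF8 + text.encode("utf-8")
without_bom = text.encode("utf-8")

print(guess_decode(with_bom))     # read as UTF-8
print(guess_decode(without_bom))  # read as Latin-1, so the c-cedilla is garbled
```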

This means that when I want to display (and possibly print) multilingual
text (say, a *.txt file in French with some Russian and some Hebrew in
it), something Gecko (the display engine used by Firefox, Thunderbird and
SeaMonkey) does better than gvim, I'll have to save the file with a BOM.
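
Outside Vim, one way to save such a file with a BOM is Python's "utf-8-sig" codec. This is only a sketch: the helper names write_with_bom and has_utf8_bom, the sample text and the temporary path are made up for the example.

```python
import codecs
import os
import tempfile


def write_with_bom(path: str, text: str) -> None:
    """Write text as UTF-8 preceded by a BOM (hypothetical helper)."""
    # The "utf-8-sig" codec emits the bytes EF BB BF before the text.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        f.write(text)


def has_utf8_bom(path: str) -> bool:
    """Return True if the file starts with the UTF-8 BOM."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8


# French, Russian and Hebrew greetings, written with escapes so the
# example does not depend on the encoding of this message.
sample = "Bonjour, \u043f\u0440\u0438\u0432\u0435\u0442, \u05e9\u05dc\u05d5\u05dd\n"
path = os.path.join(tempfile.mkdtemp(), "multilingual.txt")
write_with_bom(path, sample)
print(has_utf8_bom(path))
```

In Vim itself, ":setlocal bomb" followed by ":w" should have the same effect on a UTF-8 file.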

OTOH any file starting with #! MUST, as has already been said, be
saved with no BOM, because the shebang is only looked for in the first
two bytes of the file (which, with a UTF-8 BOM, would be 0xEF 0xBB
rather than "#!").
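
The shebang point can be checked with a short Python sketch. The helper names strip_utf8_bom and shebang_visible are hypothetical, and the two-byte comparison is a simplification of what the kernel does when it executes a script.

```python
import codecs


def strip_utf8_bom(raw: bytes) -> bytes:
    """Return raw without a leading UTF-8 BOM, if there is one."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw


def shebang_visible(raw: bytes) -> bool:
    """Simplified check: are the first two bytes of the file '#!'?"""
    return raw[:2] == b"#!"


script = codecs.BOM_UTF8 + b"#!/bin/sh\necho hello\n"

print(shebang_visible(script))                  # BOM hides the shebang
print(shebang_visible(strip_utf8_bom(script)))  # shebang is first again
```

In Vim, ":setlocal nobomb" followed by ":w" should write the file back without the BOM.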


Best regards,
Tony.
--
hundred-and-one symptoms of being an internet addict:
156. You forget your friend's name but not her e-mail address.

--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
