Friday, November 9, 2012

Vim BOMing out

I have some files from an outside organization that contain byte order marks (BOMs). Looking at these files with a hex editor, I can see that the BOM is the one for UTF-8. I don't think I ever configured 'fileencodings' for Vim, but checking the option shows it is set to fileencodings=ucs-bom,utf-8,default,latin1. With this setting, Vim fails to read these files properly. I've seen oddly varying behavior as I try different things, but it usually changes the BOM to indicate UTF-16 (big-endian), and many characters are then displayed incorrectly.
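For reference, the hex-editor check described above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `detect_bom` and the returned labels are made up for illustration, and the BOM byte sequences come from Python's standard `codecs` module, not from anything Vim does internally.

```python
import codecs

# Checked in order. The UTF-32 marks come first because the UTF-32 LE mark
# (FF FE 00 00) begins with the same two bytes as the UTF-16 LE mark (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def detect_bom(data: bytes):
    """Return a label for the BOM at the start of data, or None if there is none.

    detect_bom is a hypothetical helper name used only in this sketch.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

Passing it the first few bytes of a file (for example, `open(path, "rb").read(4)`) is enough to tell a UTF-8 BOM from a UTF-16 one.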

If I change my configuration so that 'encoding' is utf-8, the file is displayed correctly, though the BOM sometimes shows up in hex as UTF-16 (<FE FF>) and other times as UTF-8, rendered as ordinary but odd-looking characters.
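One fact that may be relevant to the two hex views: the BOM is a single code point, U+FEFF, and only its byte serialization differs between encodings. The sketch below uses nothing but Python's built-in codecs to show both byte forms, plus what the UTF-8 bytes look like when they are misread as Latin-1. Whether this is what Vim is doing internally is an assumption, not something confirmed here.

```python
BOM = "\ufeff"  # the byte order mark is the single code point U+FEFF

# The same code point serialized in two encodings.
utf8_bytes = BOM.encode("utf-8")         # EF BB BF
utf16be_bytes = BOM.encode("utf-16-be")  # FE FF

# The UTF-8 bytes interpreted as Latin-1 become three ordinary characters.
misread = utf8_bytes.decode("latin-1")

print(utf8_bytes.hex(" "))
print(utf16be_bytes.hex(" "))
print(misread)
```

The misread form is the three characters ï, » and ¿, which would match a BOM showing up as odd-looking text at the start of a file.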

Since I don't need to send these files back out anywhere and the BOM is just unnecessary junk to me, I've used the hex editor to remove the BOMs, and Vim now behaves normally. But I'm still curious about what is going on with Vim and the BOMs. Can anyone explain why Vim apparently thinks these files are, or should be, UTF-16 when the BOM clearly indicates they're UTF-8? Or perhaps just suggest some better settings so that Vim behaves logically with regard to file encoding?
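The hex-editor cleanup described above could also be scripted. The following is a minimal sketch that removes a leading UTF-8 BOM and leaves everything else untouched; the names `strip_utf8_bom` and `strip_utf8_bom_in_place` are hypothetical, and the in-place variant assumes the file fits comfortably in memory.

```python
import codecs
from pathlib import Path


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def strip_utf8_bom_in_place(path: str) -> bool:
    """Rewrite the file at path without its UTF-8 BOM.

    Returns True if a BOM was removed, False if the file was left unchanged.
    """
    p = Path(path)
    original = p.read_bytes()
    stripped = strip_utf8_bom(original)
    if len(stripped) == len(original):
        return False  # no BOM found, so nothing is written
    p.write_bytes(stripped)
    return True
```

Within Vim itself, the 'bomb' option controls whether a BOM is written with the file, so `:set nobomb` followed by `:w` is another way to drop it once the file has been read correctly.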

  -- Jay
