Friday, January 28, 2011

Re: cp1252 characters when enc=utf-8, fenc=cp1252

On Thu, Jan 27, 2011 at 10:43 PM, Chris Jones <cjns1989@gmail.com> wrote:
> On Sat, Jan 22, 2011 at 05:41:33PM EST, Ben Fritz wrote:
>
> [..]
>
>> I work in Windows XP mostly, I actually have never heard of "xkb" and
>> don't have the slightest idea what its level 3 is. Converting to
>> UTF-8 for this particular file would be OK but doesn't serve much
>> purpose, and most of the files I work with are latin1 and need to stay
>> that way.
>
> Maybe you should set your locale to latin1 instead of UTF-8 and the
> encoding to cp1251?
>

Latin1 has no representation for various characters that I like to
use in my personal notes. However, most of the code I work with is
Latin1. I also want UTF-8 as my 'encoding' so that I can use fancy
multibyte characters in 'listchars', for example, and view the source
of web pages encoded in UTF-8 or other Unicode encodings.
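
For anyone unsure which option is which, here is a small sketch (not part
of my .vimrc, just commands to source or type at the ":" prompt) that shows
the three values involved in the buffer you are looking at:

```vim
" 'encoding' is what Vim uses internally for all buffers and registers.
echo 'internal:' &encoding
" 'fileencoding' is per buffer: how this file was read and will be written.
" An empty value means "same as 'encoding'".
echo 'this buffer:' &fileencoding
" The global value is what newly created buffers start out with.
echo 'new buffers:' &g:fileencoding
```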

I have the following at the top of my .vimrc, right after "set
nocompatible", and it seems to work very well:

if has('multi_byte')
set encoding=utf-8
setglobal fenc=latin1
exec "set listchars=nbsp:\u2423,conceal:\u22ef,tab:\u2595\u2014,trail:\u02d1"
" could append this: ,eol:\u21b2
" but it is a little annoying to have on every single line
set list

" Don't detect utf-8 without a BOM by default: I don't normally use UTF-8,
" and latin1 files that contain only ASCII would otherwise detect as UTF-8.
" Detect cp1252 rather than latin1 so files are read in correctly.
set fileencodings=ucs-bom,cp1252
if has('autocmd')
augroup fenc_detect
au!

" Viewing HTML source is mostly when I need to worry about utf-8.
" Eventually I hope to use a plugin to reload based on the META tag but
" for now just detect it as normal. I don't tend to use non-latin1
" characters in HTML anyway.
autocmd BufReadPre *.{x,}htm{l,} set fileencodings=ucs-bom,utf-8,cp1252
autocmd BufReadPost *.{x,}htm{l,} set fileencodings=ucs-bom,cp1252

" Detect when a buffer should actually be latin1 (i.e. there are no cp1252
" bytes in the buffer). cp1252 is a superset of latin1. See
" http://en.wikipedia.org/wiki/Cp1252 for details.
"
" Since latin1 is a subset of cp1252, this does not ACTUALLY modify the
" buffer, so bypass the modifiable option.
let cp1252_latin1_diff =
\ '\u20AC'. '\u201A'. '\u0192'. '\u201E'. '\u2026'.
\ '\u2020'. '\u2021'. '\u02C6'. '\u2030'. '\u0160'. '\u2039'. '\u0152'.
\ '\u017D'.
\ '\u2018'. '\u2019'. '\u201C'. '\u201D'. '\u2022'.
\ '\u2013'. '\u2014'. '\u02DC'. '\u2122'. '\u0161'. '\u203A'. '\u0153'.
\ '\u017E'. '\u0178'
autocmd BufReadPost * let s:oldmod = &modifiable | if !s:oldmod
\ | setlocal modifiable | endif
autocmd BufReadPost * if &fenc=='cp1252' &&
\ search('['.cp1252_latin1_diff.']', 'nw') == 0 | setlocal fenc=latin1
\ nomodified | endif
autocmd BufReadPost * if !s:oldmod | setlocal nomodifiable | endif
augroup END
endif
endif
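
The three BufReadPost autocmds above could probably also be gathered into
one function, which is easier to read and does not depend on the order in
which the autocmds fire. This is only a sketch and I have not run it as
part of the setup above: the names s:DowngradeToLatin1 and fenc_detect_func
are made up for illustration, and it assumes cp1252_latin1_diff has already
been defined as a global variable by the snippet above.

```vim
" Hypothetical helper: same check as the three autocmds, in one place.
function! s:DowngradeToLatin1()
  if &fileencoding !=# 'cp1252'
    return
  endif
  " Inside [] the regexp engine expands \uXXXX itself, which is why the
  " character list is built from single-quoted (literal) strings.
  if search('[' . g:cp1252_latin1_diff . ']', 'nw') != 0
    return
  endif
  " No cp1252-only characters found: relabel the buffer as latin1.
  " Temporarily lift 'nomodifiable' so that setting 'fenc' is allowed.
  let l:oldmod = &l:modifiable
  setlocal modifiable
  setlocal fileencoding=latin1 nomodified
  let &l:modifiable = l:oldmod
endfunction

augroup fenc_detect_func
  autocmd!
  autocmd BufReadPost * call s:DowngradeToLatin1()
augroup END
```

If both versions are active at once, whichever runs second finds that
'fenc' is no longer cp1252 and does nothing, so the function is best used
as a replacement for the three autocmds, not alongside them.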

--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
