Monday, October 4, 2010

Re: How to set utf-8 locally (for a buffer) on loading the file

On 4 Oct, 15:42, Ben Fritz <fritzophre...@gmail.com> wrote:
>
> You can also set fileencoding manually after a file read, so that you
> can convert it to a different encoding when writing the file. You will
> probably want this new encoding in your fileencodings option so it can
> be detected,

If I set fileencoding manually, I see no changes on the screen. What
exactly does this option control? According to the help: "Sets the
character encoding for the file of this buffer." But honestly, I don't
get it. What does that statement mean? I have a number of fundamental
questions about the subject of (vim and) encoding:

1) As far as I know, a text file stores no information about the
encoding in which its series of bytes makes sense as text. On opening
the file, an editor makes a guess based on the first N bytes, on
certain patterns, etc., but in the end it is always a guess, and
sometimes the editor gets it wrong. Is this right?
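
To make clear what I mean by "a guess", here is a minimal Python
sketch of the kind of detection I imagine. It is not how Vim does it;
the function name guess_encoding and the candidate list are made up
for illustration. It looks for a UTF-8 byte order mark and otherwise
returns the first candidate that decodes without error:

```python
import codecs


def guess_encoding(data, candidates=("utf-8", "shift_jis", "latin-1")):
    """Hypothetical detector: BOM check first, then trial decoding."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding
        return name
    return None


# Bytes written as Shift_JIS are not valid UTF-8, so the second
# candidate is picked.
sjis_guess = guess_encoding("日本".encode("shift_jis"))

# Bytes written as Latin-1 that happen to be valid UTF-8: the guess
# is "utf-8", which is not what the writer intended.
wrong_guess = guess_encoding("Ã©".encode("latin-1"))

print(sjis_guess, wrong_guess)
```

The second case is the one I am worried about: the bytes decode fine,
just not as the text that was originally written.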

2) When a file is loaded from disk into vim, what exactly happens
to the bytes? Is there any option in vim that influences this
process? My guess is that the editor interprets the original sequence
of bytes (as on disk) according to the rules of some character
encoding; for vim, this would be the value of the 'encoding' option.
Is this correct?

3) Based on these rules, the editor knows when to take one or two or
more bytes to build a single *character*, and if more than one, in
which order. From that, the editor has decided which *characters*
(not bytes) the text contains. So for example, the sequence 1A 2B F3
E5 66 could be interpreted as
(1A 2B) (F3) (E5 66) according to encoding 1
(2B 1A) (E5 F3 66) according to encoding 2
where each () group represents a 'character' in the respective
encoding. Thus, according to encoding 1 one would have for example:
"small a", "capital z" and "digit 8", whereas according to encoding 2
one would have "question mark" and "small u umlaut". Is this
description correct?
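
For a concrete version of the made-up example above, here is a small
Python sketch (Python is used only as an illustration; the two
encodings are picked arbitrarily). The same four bytes come out as two
characters under one encoding and four under another, and the same two
bytes give different characters depending on the byte order:

```python
# Four bytes as they would sit on disk.
raw = "日本".encode("shift_jis")

# Interpretation 1: the bytes are grouped in pairs, two characters.
as_sjis = raw.decode("shift_jis")

# Interpretation 2: every byte is its own character, four characters.
as_latin1 = raw.decode("latin-1")

print(len(raw), len(as_sjis), len(as_latin1))

# Byte order also matters: the same two bytes, read in either order.
pair = bytes([0x41, 0x00])
little = pair.decode("utf-16-le")  # 0x0041, the letter A
big = pair.decode("utf-16-be")     # 0x4100, a different character
print(repr(little), repr(big))
```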

4) What decides how the bytes are displayed on the screen? My
understanding is that the font now comes into play; for each
*character*, a glyph is provided by the font, and this is what is
displayed on the screen. Is this description correct? If yes, how can
I make vim interpret the sequence of bytes according to a different
encoding? Is it necessary to reload the file? If I use
'set fileencoding=blah', no change is visible, whereas when I use
':e ++enc=blah', the displayed glyphs do change. This is probably due
to the fact that ':e ++enc' effectively reloads the sequence from disk
(or rereads the original sequence of bytes from memory), and in doing
so it resolves the bytes into characters according to the newly
specified character encoding. On the other hand, 'set
fileencoding=blah' does not seem to reload/reread anything. What is
the effect of this option? I have a couple of ideas, but I would first
like to know the answer to the following question.

5) What happens when I type something on the keyboard? This is a
situation similar to reading from disk; in the end, it is about a
sequence of bytes being inserted at some place in the file; there is
also the need to interpret them as characters and look for glyphs in
some font to represent them (in case the file is being displayed or
printed). In this case too I would expect some option in vim to
control how the bytes sent by my keyboard are to be interpreted. Which
option is it? Is it the current value of 'encoding'? Or of
'fileencoding'? Or of 'termencoding'? And when? Only on terminals, or
also in the GUI? And does it make a difference whether I am on Win32
or on *nix? Or if I use GTK or not? Or if I use Cygwin or not?

6) What happens when the file is written to disk (:w)? My guess is:
after reading the bytes, resolving them into characters, finding a
glyph for each character and displaying it on the screen, the editor
works exclusively 'on characters, not on bytes'. According to this,
when writing back to disk, the editor would then convert the
characters back into bytes according to the rules of the encoding
named by some option. Which option would this be: 'encoding',
'fileencoding', something derived from 'fileencodings', or what?
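
What I mean by converting characters back into bytes can be sketched
in Python as follows. This is only an analogy under my own
assumptions, not a description of Vim's internals; which Vim option
plays the role of the target encoding is exactly my question. The same
characters become different byte sequences depending on the encoding
chosen at write time, and some characters cannot be written in some
encodings at all:

```python
text = "Grüße"  # five characters held in memory

as_latin1 = text.encode("latin-1")  # one byte per character
as_utf8 = text.encode("utf-8")      # two bytes each for ü and ß

print(len(text), len(as_latin1), len(as_utf8))  # 5 5 7

# Reading either result back with the matching encoding restores
# the same characters.
round_trip_ok = (
    as_latin1.decode("latin-1") == text
    and as_utf8.decode("utf-8") == text
)

# Japanese characters have no representation in Latin-1, so the
# conversion fails instead of producing bytes.
try:
    "日本".encode("latin-1")
    japanese_fits_latin1 = True
except UnicodeEncodeError:
    japanese_fits_latin1 = False

print(round_trip_ok, japanese_fits_latin1)
```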

As you see, too many basic questions that cannot be answered with
'fileencoding: Sets the character encoding for the file of this
buffer'.

>
> > > On a related note: is it possible to set different fonts in different
> > > vim windows/tabs within a single application window?
>
> This is not possible. 'guifont' is always global. Can you not find a
> font with all the glyphs of interest to you? Or maybe, keep two
> different shortcuts/aliases for Vim, one for each font you need to
> use, and always use separate Vim instances.

My default font is Courier New 9pt, but when I open the Japanese file,
I have to change the font to something like MS Mincho 12pt or MS
Gothic 12pt, since Courier New does not support Japanese characters
and the size is also too small (for me). As of now, I'm changing the
font with an autocommand on opening the Japanese file, but then I'd
like to restore the previous font once I'm done with this file. I
guess I could :map a key to toggle between both fonts and sizes.

--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
