Monday, April 26, 2010

Re: FAQ question section 11.4

On Mon, 26 Apr 2010, stosss wrote:

> On Mon, Apr 26, 2010 at 4:27 PM, Benjamin R. Haskell wrote:
> > On Mon, 26 Apr 2010, stosss wrote:
> >
> >> I have an ASCII table generated from an HTML file. The curly quotes
> >> show up as 147 and 148. When I type / and then press CTRL and then
> >> v, I get only this: ^. If I type 147 ENTER, I get this: <93>
> >
> > ASCII does not contain 'curly quotes'.  Character 147 (hexadecimal
> > 0x93) in Windows codepage 1252 [CP1252] (the Microsoft 'extension'
> > of Latin-1 [ISO-8859-1]) is the equivalent of Unicode code point
> > U+201C (decimal 8220).
> >
> >> I still don't understand how this works or what I am doing wrong.
> >
> > Your problem is with encodings.  The advice I would give depends on
> > what your ultimate goal is.  (E.g. there are rare cases where I'd
> > recommend keeping the Windows-y encoding.)
> >
> > :help 'encoding'
> > :help 'fileencoding'
>
> I was just reading through the FAQ at the general URL list in my OP of
> this thread. I saw that section previously mentioned. It looked like
> another way to do something that I learned how to do with help from
> this list. I was just trying to figure out how to do what the FAQ
> says. I could not. I read or attempted to read the help sections it
> suggested. Still could not figure it out after messing around with it
> so I posted my original question.
>
> My main objective is to read through the entire FAQ, the help docs in
> Vim, all the mail that comes on the list and experiment with Vim until
> I have learned as much as I can. I don't like working with solutions
> that I don't understand. I will put aside a faster solution and come
> back to it after I have more experience. Then I usually have better
> success figuring it out.

Long answer to my mostly rhetorical question, but I'll follow up:

To restate my answer, then: The original problem stems from using
character 147 (hex 0x93) either in a file that's *not* encoded in CP1252
(the default on American Windows machines, and possibly what a naïve
editor or browser on such a machine produces when it saves 'ASCII') *or*
in a Vim that's not expecting CP1252 data. By default (though I'm not
sure this holds on Windows), I think Vim expects UTF-8 data; UTF-8 is one
way to encode Unicode.

For the setting that controls the former, see :help 'fileencoding'. For
the latter, :help 'encoding'.
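Here's a minimal Python sketch of the same problem outside Vim. It decodes the single byte 147 three ways: as CP1252 it's the opening curly quote, as Latin-1 it's an invisible C1 control character, and as UTF-8 it's invalid on its own. The Latin-1 line is, I believe, what you're seeing when <C-v>147 shows up as <93> in a UTF-8 Vim. The helper names (decode_or_none, cp1252_to_utf8, convert_file) are hypothetical, just illustrating a one-time conversion; inside Vim, I'd expect :e ++enc=cp1252 followed by :set fileencoding=utf-8 and :w to do the same job.

```python
from pathlib import Path


def decode_or_none(data: bytes, codec: str):
    """Return data decoded with codec, or None if the bytes are invalid there."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return None


def cp1252_to_utf8(data: bytes) -> bytes:
    """Decode bytes as CP1252, then re-encode the text as UTF-8."""
    # Python's cp1252 codec raises UnicodeDecodeError for the five byte
    # values CP1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D).
    return data.decode("cp1252").encode("utf-8")


def convert_file(src: Path, dst: Path) -> None:
    """Write a UTF-8 copy of a CP1252 file; both paths come from the caller."""
    dst.write_bytes(cp1252_to_utf8(src.read_bytes()))


raw = bytes([147])  # decimal 147 == hex 0x93
for codec in ("cp1252", "latin-1", "utf-8"):
    text = decode_or_none(raw, codec)
    print(codec, "->", "invalid" if text is None else f"U+{ord(text):04X}")
# cp1252 -> U+201C
# latin-1 -> U+0093   (a C1 control character, not a quote)
# utf-8 -> invalid    (0x93 cannot start a UTF-8 sequence)

print(cp1252_to_utf8(b"\x93quoted\x94"))
# b'\xe2\x80\x9cquoted\xe2\x80\x9d'
```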

The 'ÿ', it turns out, is actually character 255, the maximum value
allowed for a decimal entry after CTRL-V.

See :help i_CTRL-V_digit (as someone else pointed out).
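To make the 'ÿ' concrete, here's a toy Python model of a decimal CTRL-V entry. It is not Vim's code; it only assumes what :help i_CTRL-V_digit describes, namely that at most three decimal digits are read and the value tops out at 255. Under that assumption, typing 8220 after CTRL-V would give character 255 ('ÿ') followed by a literal 0. The function name ctrl_v_decimal is hypothetical.

```python
def ctrl_v_decimal(keys: str) -> tuple[str, str]:
    """Toy model of CTRL-V followed by decimal digits (not Vim's source).

    Assumes at most three decimal digits are read and the value is capped
    at 255. Returns (character inserted, keys left over as ordinary input).
    """
    digits = ""
    for ch in keys[:3]:
        if ch not in "0123456789":
            break
        digits += ch
    if not digits:
        raise ValueError("this toy model only handles decimal digits")
    value = min(int(digits), 255)  # anything above 255 becomes 255 ('ÿ')
    return chr(value), keys[len(digits):]


print(ctrl_v_decimal("147"))   # ('\x93', '')
print(ctrl_v_decimal("8220"))  # ('ÿ', '0')
```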

As I stated before, 8220 is the decimal equivalent of the Unicode code
point U+201C, which can't be entered *in decimal* in vanilla Vim, since a
decimal entry tops out at 255. You can use the hexadecimal version,
though.

For example, to search for an opening 'curly double quote':
/ <C-v> u 2 0 1 c <Enter>
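And a small hypothetical helper (my own, not part of Vim) that spells out those keystrokes for any code point, using the u form (up to four hex digits) or the U form (up to eight hex digits) described in :help i_CTRL-V_digit:

```python
def vim_unicode_keys(codepoint: int) -> str:
    """Hypothetical helper: the CTRL-V keystrokes that enter a code point."""
    if not 0 <= codepoint <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {codepoint}")
    if codepoint <= 0xFFFF:
        return f"<C-v>u{codepoint:04x}"   # 'u': up to four hex digits
    return f"<C-v>U{codepoint:08x}"       # 'U': up to eight hex digits


print(8220 == 0x201C)             # True
print(vim_unicode_keys(8220))     # <C-v>u201c
print(vim_unicode_keys(0x1F600))  # <C-v>U0001f600
```

Padding to the full width means Vim stops reading digits on its own, so a following character that happens to look like a hex digit isn't swallowed into the code point.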

For more on encodings in general:

For a non-Vim explanation: http://en.wikipedia.org/wiki/Windows-1252

For a somewhat tongue-in-cheek, programming-oriented take:
http://www.joelonsoftware.com/articles/Unicode.html

For more, my answer and the others give you some keywords to search
for: character encodings, Latin-1, Unicode, CP1252, multibyte, ASCII.

--
Best,
Ben H
