Sunday, July 18, 2010

Re: Limitations of vim-python interface with respect to character encodings

On 19/07/10 03:20, winterTTr wrote:
>
>
> On Sun, Jul 18, 2010 at 10:02 PM, Tony Mechelynck
> <antoine.mechelynck@gmail.com> wrote:
>
> On 18/07/10 12:25, winterTTr wrote:
>
>
>
> On Sun, Jul 18, 2010 at 3:08 AM, Ted
> <cecinemapasderange@gmail.com> wrote:
>
> Thanks for that guidance; that does fix the problems, including for
> `vim.current.range.append(u'\u2026'.encode('utf-8'))`. It's sort of
> inconvenient to have to do this, though. Is there no way to set the
> default encoding that will be used when sending text from Python to
> Vim? If there is no way to do this with the Python-Vim interface,
> then perhaps that is something that should be worked into a
> higher-level vim module for Python. Does such a thing exist?
>
>
> Until now, I have not seen such a thing; maybe the Vim Python
> interface should be given some improvement. :-)
>
>
> I'm also wondering why there is a need to set the encoding in a Vim
> variable as you've done. As I understand it (perhaps not so well...),
> Vim always works with utf-8 natively, even under Windows.
>
>
> I am not very clear about the relationship between Vim's internal
> encoding and Python Unicode strings.
> When I use Unicode strings directly in Vim, something incorrect always
> comes out, so I found an alternate way, as I said.
> It works now without any problem, although it is not very convenient.
>
> As far as I know, Vim's internal encoding is set by the "encoding"
> global option.
> If you do not set it, gvim "maybe" uses "utf8" as the default, but I am
> not very sure about this.
> And if you set "encoding" to "utf16", Vim should use "utf16" as the
> internal encoding when it reads in a file, as I understand it.
>
>
> On Windows, Vim uses by default whatever encoding is set by your
> locale; but you can change it. If your locale sets UTF-16le, Vim
> will represent the data as UTF-8 in internal memory (because UTF-16
> is not usable for null-byte-terminated C strings) but the default
> for new files will (IIUC) be UTF-16le. Note that when no -le or -be
> qualifier is used, Vim assumes big-endian for UCS-2, UTF-16 and
> UTF-32 (aka UCS-4).
>
>
> Sorry, I am a little puzzled by your explanation.
> > If your locale sets UTF-16le, Vim will represent the data as UTF-8
> > in internal memory.
> Does this mean that on Windows, no matter what "encoding" is set to,
> Vim will always process text data as UTF-8 in internal memory?

No. Vim will use 'encoding' (whose default is set by your locale) to
represent the data in internal memory, except that if 'encoding' is one
of the following

ucs-2 aka ucs-2be aka unicode
ucs-2le
utf-16 aka utf-16be
utf-16le
ucs-4 aka ucs-4be aka utf-32 aka utf-32be
ucs-4le aka utf-32le

which include many null bytes in the representation of codepoints other
than U+0000, and therefore, as I said, are unsuitable for
null-byte-terminated C strings, Vim will use UTF-8, where null bytes
cannot validly appear except as the representation of U+0000, and whose
conversion to and from any of these encodings is trivial and (except in
the case of UTF-8 *to* UCS-2 or UCS-2le when the UTF-8 data includes
codepoints above 0xFFFF) lossless. See ":help 'encoding'", which covers
what I'm explaining here; this particular detail is further down, so
scroll to the paragraph starting «When "unicode"» (without the French
quotes).
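For the Python interface this thread is about, the rule above can be sketched as a small helper that encodes a Unicode string into the bytes Vim holds in memory for a given 'encoding' value. This is a hypothetical illustration, not part of Vim or its `vim` module: the names `WIDE_UNICODE_ENCODINGS`, `vim_internal_encoding` and `to_vim_bytes` are made up, and the sketch assumes that any 'encoding' value outside the list above is also a valid Python codec name (true for common ones such as utf-8, latin1 and cp1252, but not for every Vim encoding name).

```python
# Vim 'encoding' values listed above: their representations contain NUL
# bytes, so Vim keeps buffer text in UTF-8 when one of them is set.
WIDE_UNICODE_ENCODINGS = {
    "ucs-2", "ucs-2be", "unicode", "ucs-2le",
    "utf-16", "utf-16be", "utf-16le",
    "ucs-4", "ucs-4be", "utf-32", "utf-32be",
    "ucs-4le", "utf-32le",
}


def vim_internal_encoding(enc):
    """Return the encoding Vim uses in memory when 'encoding' is `enc`."""
    enc = enc.lower()
    return "utf-8" if enc in WIDE_UNICODE_ENCODINGS else enc


def to_vim_bytes(text, enc):
    """Encode a Unicode string into the bytes Vim holds for 'encoding' `enc`.

    Assumes any non-wide Vim encoding name is also a Python codec name.
    """
    return text.encode(vim_internal_encoding(enc))


# Why the wide encodings are avoided in memory: UTF-16 yields a NUL byte even
# for a plain ASCII letter, whereas UTF-8 yields NUL only for U+0000 itself.
assert "A".encode("utf-16-le") == b"A\x00"
assert "A".encode("utf-8") == b"A"
```

Inside Vim, one might call it as `vim.current.range.append(to_vim_bytes(u'\u2026', vim.eval('&encoding')))`, where `vim.eval('&encoding')` reads the current option value. Whether a given build's Python interface wants bytes or str for buffer text differs between the Python 2 and Python 3 interfaces, so check ":help python" for your version.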

>
> > but the default for new files will (IIUC) be UTF-16le.
> Does this apply to "encoding", "fileencoding", or BOTH?

The default, I said. The Vim default for 'fileencoding' is the empty
string, which means "use the 'encoding' setting". You can change it for
all new files by using ":setglobal fenc=something", or you can change it
for one newly-created file by using ":e ++enc=something filename", but
then you aren't using the defaults anymore, are you?

That default is not used for existing files because of the
'fileencodings' [plural] magic.
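The 'fileencodings' magic can be pictured with another hypothetical, heavily simplified sketch: try each entry in order and keep the first one that decodes the file without error, where "ucs-bom" matches only if the file starts with a byte-order mark. It ignores Vim's "default" entry and the other details described under ":help 'fileencodings'", assumes every entry other than "ucs-bom" is also a Python codec name, and reports BOM-detected files using the canonical names from the list above. The function name `detect_fileencoding` is made up for illustration.

```python
import codecs

# (BOM, Python codec for the bytes after it, name reported for 'fileencoding').
# UTF-32LE is tested before UTF-16LE because its BOM also starts with FF FE.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le", "ucs-4le"),
    (codecs.BOM_UTF32_BE, "utf-32-be", "ucs-4"),
    (codecs.BOM_UTF8, "utf-8", "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le", "utf-16le"),
    (codecs.BOM_UTF16_BE, "utf-16-be", "utf-16"),
]


def detect_fileencoding(data, fileencodings):
    """Return (fenc, text) for the first entry of `fileencodings` that fits.

    Simplified, hypothetical model of Vim's 'fileencodings' handling.
    """
    for name in fileencodings:
        if name == "ucs-bom":
            # Matches only if the file starts with a byte-order mark.
            for bom, codec, fenc in _BOMS:
                if data.startswith(bom):
                    try:
                        return fenc, data[len(bom):].decode(codec)
                    except UnicodeDecodeError:
                        break  # BOM present but body invalid; try next entry
            continue
        try:
            return name, data.decode(name)
        except UnicodeDecodeError:
            continue  # not valid in this encoding; try the next entry
    raise ValueError("no entry in 'fileencodings' could decode the file")
```

With a list like `["ucs-bom", "utf-8", "latin1"]` (close to Vim's default when 'encoding' is a Unicode encoding, minus the "default" entry), a file that is not valid UTF-8 falls through to latin1, which can decode any byte sequence; that is why an 8-bit encoding belongs at the end of the list.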

> It seems to conflict with the former sentence.
> # I think I may not understand what your explanation really means. :-)
>
> As I understand it
> # I am not very clear about how the Vim source code works; I am just
> drawing this conclusion from my experience of using it:
> Vim will use the "encoding" option as the internal encoding when it
> processes text files, and if you do not set it to some value, Vim will
> use the locale encoding according to your system default or terminal
> settings.
> When Vim tries to save a file, it will convert the content from
> "encoding" to "fileencoding" and save the result back to the file if
> possible, or warn about the conversion error.
> Is my understanding correct?
>
> Sorry for my limited understanding; please explain a little more for me :-)

See
:help 'encoding'
:help 'fileencoding'
http://vim.wikia.com/wiki/Working_with_Unicode
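The save-time conversion described in the quoted question (from 'encoding' to 'fileencoding', where an empty 'fileencoding' means "use 'encoding'") can also be sketched in Python. This is a hypothetical model, not Vim's code: `convert_for_write` and its name table are illustrative, it ignores the 'bomb' option (no BOM is ever written), and it assumes names outside the table are Python codec names. Unsuffixed Vim names map to big-endian codecs because, as noted in the quoted message, Vim assumes big-endian when no -le or -be qualifier is given; Python's plain "utf-16" codec would instead prepend a BOM in native byte order.

```python
# Vim encoding names (from the list above) mapped to Python codecs.
# Unsuffixed names are big-endian in Vim, so they map to the "-be" codecs.
_VIM_TO_PYTHON = {
    "ucs-2": "utf-16-be", "ucs-2be": "utf-16-be", "unicode": "utf-16-be",
    "ucs-2le": "utf-16-le",
    "utf-16": "utf-16-be", "utf-16be": "utf-16-be", "utf-16le": "utf-16-le",
    "ucs-4": "utf-32-be", "ucs-4be": "utf-32-be",
    "utf-32": "utf-32-be", "utf-32be": "utf-32-be",
    "ucs-4le": "utf-32-le", "utf-32le": "utf-32-le",
}
# UCS-2 has no surrogate pairs, so it cannot hold codepoints above U+FFFF.
_UCS2_NAMES = {"ucs-2", "ucs-2be", "unicode", "ucs-2le"}


def convert_for_write(buffer_bytes, enc, fenc=""):
    """Hypothetical model of converting buffer text for writing to disk.

    buffer_bytes: text as Vim holds it in memory for 'encoding' == enc.
    fenc: the buffer's 'fileencoding'; empty means "use 'encoding'".
    Raises UnicodeError where Vim would report a conversion error.
    """
    enc = enc.lower()
    target = (fenc or enc).lower()
    # Wide 'encoding' values are held as UTF-8 in memory (see above).
    internal = "utf-8" if enc in _VIM_TO_PYTHON else enc
    text = buffer_bytes.decode(internal)
    if target in _UCS2_NAMES and any(ord(ch) > 0xFFFF for ch in text):
        raise UnicodeError("codepoint above U+FFFF cannot be written as UCS-2")
    # Unlisted names (utf-8, latin1, cp1252, ...) are passed to Python as-is.
    return text.encode(_VIM_TO_PYTHON.get(target, target))
```

For example, `convert_for_write(b"A", "utf-8", "utf-16")` yields `b"\x00A"`, and converting a buffer containing U+1F600 to "ucs-2" raises, which is the lossy UTF-8 to UCS-2 case mentioned earlier.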

>
>
> On Unix, gvim with GTK2 GUI uses UTF-8 when in GUI mode. I "think"
> that Console Vim and non-GTK2 gvim follow your locale like in
> Windows but I'm not sure (my locale, or to be precise, my $LC_CTYPE
> defaulting to $LANG, is set to en_US.UTF-8).
>
>
>
> Cheers
> -Ted
>
>
> Best regards,
> Tony.
> --
> Fortune's Real-Life Courtroom Quote #19:
>
> Q: Doctor, how many autopsies have you performed on dead people?
> A: All my autopsies have been performed on dead people.
>
>

Best regards,
Tony.
--
"I'd love to go out with you, but the last time I went out, I never
came back."

--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
