Sunday, July 25, 2010

Re: Limitations of vim-python interface with respect to character encodings

Yes, very informative. Thanks for that explanation; I had it that vim
would *always* convert to utf-8 for internal use.

Cheers
-Ted

On Jul 18, 11:04 pm, Tony Mechelynck <antoine.mechely...@gmail.com>
wrote:
> On 19/07/10 03:20, winterTTr wrote:
>
> > On Sun, Jul 18, 2010 at 10:02 PM, Tony Mechelynck
> > <antoine.mechely...@gmail.com> wrote:
>
> >     On 18/07/10 12:25, winterTTr wrote:
>
> >         On Sun, Jul 18, 2010 at 3:08 AM, Ted
> >         <cecinemapasdera...@gmail.com> wrote:
>
> >             Thanks for that guidance, that does fix the problems, also
> >             for e.g.
> >             `vim.current.range.append(u'\u2026'.encode('utf-8'))`.  It's
> >             sort of inconvenient to have to do this though; is there no
> >             way to set the default encoding that will be used when
> >             sending text from Python to vim?  If there is no way to do
> >             this with the Python-Vim interface, then perhaps that is
> >             something that should be worked into a higher-level vim
> >             module for Python.  Does such a thing exist?
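
A minimal sketch of the kind of higher-level helper asked about here, not something the Vim-Python interface provides. The name `encode_for_vim` and the set `_UNICODE_FAMILY` are hypothetical. It assumes the interface expects bytes in Vim's in-memory encoding, and that Python knows the codec under the same name Vim uses, which is not true for every 'encoding' value.

```python
# Vim 'encoding' values for which, according to ":help 'encoding'" as
# discussed in this thread, Vim keeps text as UTF-8 in memory.
_UNICODE_FAMILY = {
    "unicode", "ucs-2", "ucs-2be", "ucs-2le",
    "utf-16", "utf-16be", "utf-16le",
    "ucs-4", "ucs-4be", "ucs-4le",
    "utf-32", "utf-32be", "utf-32le",
}


def encode_for_vim(text, vim_encoding):
    """Encode a unicode string to bytes for the given Vim 'encoding' value.

    Hypothetical helper: raises LookupError if Python has no codec with
    that name, and UnicodeEncodeError if the text cannot be represented.
    """
    name = vim_encoding.lower()
    if name in _UNICODE_FAMILY:
        name = "utf-8"
    return text.encode(name)


# Inside Vim (assumption: the Python 2 interface discussed in this thread):
#   import vim
#   vim.current.range.append(encode_for_vim(u'\u2026', vim.eval('&encoding')))
```

The encoding name is passed in as an argument so that the helper can be tried outside Vim as well.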
>
> >         So far I have not seen such a thing; maybe the vim python
> >         interface should be given some improvement. :-)
>
> >             I'm also wondering why there is a need to set the encoding
> >             in a Vim variable as you've done.  As I understand it
> >             (perhaps not so well...), Vim always works with utf-8
> >             natively, even under Windows.
>
> >         I am not very clear about the relationship between vim's
> >         internal encoding and Python unicode strings.
> >         When I use unicode directly in vim, something incorrect always
> >         comes out, so I found the alternate way I described.
> >         It works now without any problem, although it is not very
> >         convenient.
>
> >         As far as I know, the vim internal encoding is set by the
> >         "encoding" global variable.
> >         If you do not set it, gvim "maybe" uses "utf8" as the default,
> >         but I am not very sure about this.
> >         And if you set the encoding to "utf16", vim should use "utf16"
> >         as the internal encoding when it reads in a file, as I
> >         understand it.
>
> >     On Windows, Vim uses by default whatever encoding is set by your
> >     locale; but you can change it. If your locale sets UTF-16le, Vim
> >     will represent the data as UTF-8 in internal memory (because UTF-16
> >     is not usable for null-byte-terminated C strings) but the default
> >     for new files will (IIUC) be UTF-16le. Note that when no -le or -be
> >     qualifier is used, Vim assumes big-endian for UCS-2, UTF-16 and
> >     UTF-32 (aka UCS-4).
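
To make the byte-order point concrete, here is a small sketch in Python. The table `VIM_TO_PYTHON_CODEC` and the function `encode_like_vim` are hypothetical names for illustration, and the table covers only a few of the names Vim accepts. Bare names are mapped to the big-endian codec, following the rule described above.

```python
# Hypothetical mapping from a few Vim encoding names to Python codec names
# with the byte order spelled out. Names without a -le/-be qualifier are
# mapped to big-endian, as described for Vim above.
VIM_TO_PYTHON_CODEC = {
    "utf-16": "utf-16-be",
    "utf-16le": "utf-16-le",
    "utf-32": "utf-32-be",
    "ucs-4": "utf-32-be",
    "ucs-4le": "utf-32-le",
}


def encode_like_vim(text, vim_name):
    """Encode text with the byte order implied by the Vim encoding name."""
    return text.encode(VIM_TO_PYTHON_CODEC[vim_name])


big = encode_like_vim("A", "utf-16")       # high byte first
little = encode_like_vim("A", "utf-16le")  # low byte first
```

Note that Python's own bare "utf-16" codec behaves differently: when encoding it prepends a byte order mark and uses the machine's native byte order, which is why the table spells the order out.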
>
> > Sorry, I am a little puzzled by your explanation.
> >  > If your locale sets UTF-16le, Vim will represent the data as UTF-8 in
> >  > internal memory.
> > Does this mean that on Windows, no matter what "encoding" is set to,
> > vim will always process text data as utf8 in internal memory?
>
> No. Vim will use 'encoding' (whose default is set by your locale) to
> represent the data in internal memory, except that if 'encoding' is one
> of the following
>
>         ucs-2 aka ucs-2be aka unicode
>         ucs-2le
>         utf-16 aka utf-16be
>         utf-16le
>         ucs-4 aka ucs-4be aka utf-32 aka utf-32be
>         ucs-4le aka utf-32le
>
> which include many null bytes in the representation of codepoints other
> than U+0000, and therefore, as I said, are unsuitable for
> null-byte-terminated C strings, Vim will use UTF-8, where null bytes
> cannot validly appear except as the representation of U+0000, and whose
> conversion to and from any of these encodings is trivial and (except in
> the case of UTF-8 *to* UCS-2 or UCS-2le when the UTF-8 data includes
> codepoints above 0xFFFF) lossless. See ":help 'encoding'" which covers
> what I'm explaining here; this particular detail is lower, scroll until
> the paragraph starting «When "unicode"» without the French quotes.
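
The null-byte argument can be checked from Python, independently of Vim. This is only an illustration of the encodings themselves; the helper `fits_in_ucs2` is a hypothetical name, and it tests the lossy case mentioned above (codepoints above 0xFFFF) since Python has no UCS-2 codec.

```python
text = "A\u2026"  # 'A' followed by U+2026 HORIZONTAL ELLIPSIS

utf16 = text.encode("utf-16-le")
utf8 = text.encode("utf-8")

# UTF-16 stores 'A' as the bytes 0x41 0x00, so code that treats the data
# as a null-terminated C string would stop after the first byte.
has_nul_utf16 = b"\x00" in utf16
has_nul_utf8 = b"\x00" in utf8

# Converting UTF-8 -> UTF-16 -> text gives back the original string.
round_trip = utf8.decode("utf-8").encode("utf-16-le").decode("utf-16-le")


def fits_in_ucs2(s):
    """True if every codepoint of s is at most 0xFFFF (hypothetical helper)."""
    return all(ord(ch) <= 0xFFFF for ch in s)
```

Here `has_nul_utf16` is true and `has_nul_utf8` is false, while `fits_in_ucs2("\U00010000")` is false because that codepoint lies outside the 16-bit range.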
>
> >  > but the default for new files will (IIUC) be UTF-16le.
> > Does this apply to "encoding", "fileencoding", or BOTH?
>
> The default, I said. The Vim default for 'fileencoding' is the empty
> string, which means "use the 'encoding' setting". You can change it for
> all new files by using ":setglobal fenc=something", or you can change it
> for one newly-created file by using ":e ++enc=something filename", but
> then you aren't using the defaults anymore, are you?
>
> That default is not used for existing files because of the
> 'fileencodings' [plural] magic.
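
A rough model of the two rules just described, written in Python rather than taken from Vim's source. Both function names are hypothetical. The detection loop assumes that 'fileencodings' is tried in order and the first candidate that converts without error wins; it leaves out details such as byte order mark handling.

```python
def effective_fileencoding(fileencoding, encoding):
    """An empty 'fileencoding' means "use the 'encoding' setting"."""
    return fileencoding or encoding


def detect_encoding(data, fileencodings):
    """Return the first candidate that decodes data without error.

    Returns None when no candidate fits; a caller can then fall back to
    'encoding' through effective_fileencoding("", encoding).
    """
    for name in fileencodings:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None
```

For example, the single byte 0xE9 is not valid UTF-8, so with the candidates `["utf-8", "latin1"]` the loop moves on and settles on `"latin1"`.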
>
> > It seems to conflict with the former sentence.
> > # I think I may not understand what your explanation really means. :-)
>
> > As I understand it,
> > # I am not very familiar with the vim source code; I am only drawing
> > # this conclusion from my experience as a user
> > vim will use the "encoding" variable as the internal encoding when
> > it processes text files, and if you do not set it to some value, vim will
> > use the locale encoding according to your system default or your
> > terminal.
> > When vim saves a file, it will convert the content from
> > "encoding" to "fileencoding" and write the result to the file if
> > possible, or warn about a conversion error.
> > Is my understanding correct?
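
The save step described in this question can be sketched as follows. This models the description in the message, not Vim's actual code; `convert_for_write` is a hypothetical name, and the returned message stands in for whatever warning Vim would show.

```python
def convert_for_write(buffer_bytes, encoding, fileencoding):
    """Convert buffer bytes from 'encoding' to 'fileencoding' for saving.

    Returns (data, None) on success or (None, message) when some character
    cannot be represented in the target encoding.
    """
    target = fileencoding or encoding  # empty 'fileencoding' uses 'encoding'
    text = buffer_bytes.decode(encoding)
    try:
        return text.encode(target), None
    except UnicodeEncodeError as exc:
        return None, "conversion error at character %d" % exc.start
```

Saving a buffer that holds U+2026 with a target of latin1 yields the error branch, since that character has no latin1 representation.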
>
> > Sorry for my slow understanding; please explain a little more for me. :-)
>
> See
>         :help 'encoding'
>         :help 'fileencoding'
>        http://vim.wikia.com/wiki/Working_with_Unicode
>
> >     On Unix, gvim with GTK2 GUI uses UTF-8 when in GUI mode. I "think"
> >     that Console Vim and non-GTK2 gvim follow your locale like in
> >     Windows but I'm not sure (my locale, or to be precise, my $LC_CTYPE
> >     defaulting to $LANG, is set to en_US.UTF-8).
>
> >             Cheers
> >             -Ted
>
> >     Best regards,
> >     Tony.
> >     --
> >     Fortune's Real-Life Courtroom Quote #19:
>
> >     Q:  Doctor, how many autopsies have you performed on dead people?
> >     A:  All my autopsies have been performed on dead people.
>
> Best regards,
> Tony.
> --
> "I'd love to go out with you, but the last time I went out, I never
> came back."

--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
