Rebbe Malachi: Re: bad display of output utf-8 chars

On Thu, Dec 7, 2017 at 6:19 PM, Ni Va <nivaemail@gmail.com> wrote:
> On Thursday, December 7, 2017 at 17:45:19 UTC+1, Tony Mechelynck wrote:
>> On Thu, Dec 7, 2017 at 5:37 PM, Ni Va <nivaemail@gmail.com> wrote:
>> [...]
>> > I understand, and I saw disastrous Chinese results yesterday. :) Luckily I had a 7z of my Vi distribution. :)
>> >
>> > So, if copen does not accept the ++enc modifier, what approach can I take to modify only the tempfile opened by copen?
>>
>> :-( I don't know. Have you tried to have robocopy create it in Windows-1252?
>>
>>
>> Best regards,
>> Tony.
>
> No, I am just reading this for the moment:
> https://cloud.google.com/storage/docs/gsutil/addlhelp/Filenameencodingandinteroperabilityproblems
>
> But the problem should be generic: having the ability to change the encoding for one buffer only.
>
> At the end of my job-launching mechanism, all messages are put in a temp file; this is a use case with robocopy but with many other tools too (Siemens 90' for example).

The charset used by Vim to represent data in memory ('encoding') is
global. For each edit buffer, there is in addition the 'fileencoding'
which Vim uses to remember which charset is used by the file on disk.
That is buffer-local and is set either explicitly by reading the file
with ++enc= or else by means of the 'fileencodings' (plural) option
which defines the heuristic to be used.

A recommended 'encoding' value is utf-8 because that can be translated
losslessly to and from all other charsets: Latin1, UTF-8, UTF-16 (le or
be) and UTF-32 (aka UCS-4, le or be) are handled internally by Vim;
the rest uses a library such as iconv (iconv.dll or libiconv.dll on
Windows with +iconv/dyn, or the iconv library can be linked statically
on any platform when Vim was compiled with +iconv without /dyn).
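
To illustrate the "lossless" point outside Vim, here is a minimal Python
sketch (not Vim's own code). The function name round_trips and the sample
text are made up for the illustration; the charset names are Python codec
names, not Vim option values. It converts file bytes to UTF-8, as Vim
would hold them in memory with 'encoding' set to utf-8, then converts
back and checks that the bytes are unchanged.

```python
def round_trips(text, charset):
    """Return True if text survives disk charset -> UTF-8 -> disk charset."""
    on_disk = text.encode(charset)                        # bytes as stored in the file
    in_memory = on_disk.decode(charset).encode("utf-8")   # UTF-8 form held in memory
    written_back = in_memory.decode("utf-8").encode(charset)
    return written_back == on_disk


# Hypothetical sample containing characters outside 7-bit ASCII.
sample = "décembre, naïve, €"

for charset in ("cp1252", "utf-16-le", "utf-32-be"):
    print(charset, round_trips(sample, charset))
```

The reverse is not guaranteed: text held as UTF-8 can contain characters
that an 8-bit charset such as Windows-1252 cannot represent at all.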

'fileencodings' (the heuristic) is a comma-separated list. Each
charset is tried in turn, until there is one which gives no error.
There should be at most one 8-bit charset and it should come last,
because 8-bit charsets can give no "failure" signal. A recommended
value (and the Vim default if 'encoding' is set to some Unicode value)
is ucs-bom,utf-8,default,latin1 where the "default" encoding (the
system default), which can be for instance some national Far-East
encoding, will be tried if no Unicode BOM is found (that's "ucs-bom")
and if the file is not in UTF-8 (which has very strict rules for what
a valid byte sequence is). You might want to add utf-16le before
"default" if you often use files in UTF-16le without BOM. A result of
this particular heuristic is that files in 7-bit US-ASCII will be
recognized as UTF-8 but that is not an error because the two are
(intentionally) byte-for-byte compatible in the ASCII range which is
0..0x7F.
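
The try-each-in-turn idea can be sketched in a few lines of Python. This
is only an approximation under stated assumptions, not Vim's actual
implementation: the names CANDIDATES, BOMS and detect are invented for
the example, and the "default" entry is left out because the system
charset differs from machine to machine.

```python
import codecs

# Hypothetical stand-in for the 'fileencodings' list, minus "default".
CANDIDATES = ["ucs-bom", "utf-8", "latin1"]

# Byte-order marks for the "ucs-bom" step. UTF-32 comes before UTF-16
# because the UTF-32le BOM starts with the same bytes as the UTF-16le BOM.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect(data, candidates=CANDIDATES):
    """Return (charset, text) for the first candidate that decodes cleanly."""
    for name in candidates:
        if name == "ucs-bom":
            # Only succeeds when the data starts with a known BOM.
            for bom, enc in BOMS:
                if data.startswith(bom):
                    try:
                        return enc, data[len(bom):].decode(enc)
                    except UnicodeDecodeError:
                        break  # BOM present but body invalid: try next candidate
            continue
        try:
            return name, data.decode(name)
        except UnicodeDecodeError:
            continue  # this charset signalled an error: try the next one
    raise ValueError("no candidate charset decoded the data")


print(detect(b"plain ASCII")[0])                                  # utf-8
print(detect("é".encode("latin1"))[0])                            # latin1
print(detect(codecs.BOM_UTF16_LE + "é".encode("utf-16-le"))[0])   # utf-16-le
```

The latin1 step never raises an error, whatever the bytes are, which is
why an 8-bit charset only makes sense as the last entry in the list.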

For details, see http://vim.wikia.com/wiki/Working_with_Unicode most
definitely including the "References" section at the end, which gives
a number of "places of interest" in the Vim online help.

Best regards,
Tony.

--
--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php


Thursday, December 7, 2017
