Tuesday, July 27, 2021

Re: unicode: UTF / UCS

Hi, first of all, you seem to have some misunderstandings about what
UTF-8 and the other Unicode encodings are. If you're interested and
comfortable with low-level things, I advise you to learn exactly what
they are. The relevant portions of the Unicode specification
(unicode.org) are not very long or exceedingly hard to understand, but
maybe you can find a more accessible description.

Most of all, UTF-8 is (normally) absolutely indistinguishable from
plain US-ASCII until you use characters that are not in US-ASCII; so,
for example, most English files will be byte-for-byte identical whether
written in US-ASCII or UTF-8, and a tool such as file has no way to
tell the two apart.
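
To make this concrete, here is a minimal Python sketch (the sample
strings are arbitrary, chosen only for illustration). It encodes a
pure-ASCII string both ways and compares the bytes, then does the same
with a string containing one non-ASCII character:

```python
# Pure-ASCII text produces exactly the same bytes in US-ASCII and UTF-8.
text = "Plain English text, no accents."
ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
same = ascii_bytes == utf8_bytes  # identical byte sequences

# A single non-ASCII character is enough to make the encodings differ:
# "\u00f6" (o with diaeresis) takes two bytes in UTF-8 ...
name = "K\u00f6hler"
utf8_name = name.encode("utf-8")

# ... and cannot be represented in US-ASCII at all.
try:
    name.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(same, len(name), len(utf8_name), ascii_ok)
```

So a file only "becomes" visibly UTF-8 once it contains something
outside the US-ASCII range.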

Then, there are many fairly complex issues in how files are read,
converted and written by the various parts of the system. Vim is an
especially problematic part; I made an attempt at understanding it
in the message
https://www.mail-archive.com/vim_use@googlegroups.com/msg57383.html and
the others in that thread. But you probably won't get much out of it
until you know at least how UTF-8 is encoded.
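
As a rough idea of what "how UTF-8 is encoded" means, the following
Python sketch encodes a single code point by hand and checks the result
against Python's built-in encoder. The function name utf8_encode is
hypothetical, made up for this illustration; it is not taken from Vim
or from the thread linked above.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (hand-rolled sketch)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0x80:
        # 1 byte: 0xxxxxxx (this is exactly US-ASCII)
        return bytes([cp])
    if cp < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


# Compare with the built-in encoder for 1-, 2-, 3- and 4-byte cases.
for ch in ["A", "\u00f6", "\u20ac", "\U0001F600"]:
    assert utf8_encode(ord(ch)) == ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {utf8_encode(ord(ch)).hex(' ')}")
```

Note that the number of bytes varies per character; there is no fixed
"2 bytes per character" in UTF-8.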

Finally, if you really want to be sure of having all your files encoded
in Unicode (in UTF-8 or other encodings), then I applaud you and agree
with your concern, and I suggest the way I do it (yes there actually is
a way):
https://www.mail-archive.com/vim_use@googlegroups.com/msg57385.html .
The BOM mentioned there is a byte sequence that can be placed at the
beginning of text files and will be interpreted by Unicode-aware
software as a sort of invisible declaration that the file is in a
certain Unicode encoding.
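
To show what a BOM looks like at the byte level, here is a small Python
sketch. It only illustrates the byte sequences themselves, under the
assumption that Python's standard codecs are a fair stand-in for
"Unicode-aware software"; it is not the procedure described in the
message linked above, and the sample text is arbitrary.

```python
import codecs

text = "hello"  # pure ASCII, so the payload bytes are the same as US-ASCII

plain = text.encode("utf-8")         # no BOM
with_bom = text.encode("utf-8-sig")  # same bytes, preceded by the UTF-8 BOM

# The UTF-8 BOM is the three bytes EF BB BF.
assert codecs.BOM_UTF8 == b"\xef\xbb\xbf"
assert with_bom == codecs.BOM_UTF8 + plain

# A BOM-aware decoder strips the BOM; a plain UTF-8 decoder keeps it
# as the invisible character U+FEFF at the start of the text.
stripped = with_bom.decode("utf-8-sig")
kept = with_bom.decode("utf-8")

# Python's generic "utf-16" codec also writes a BOM, in native byte order.
utf16 = text.encode("utf-16")
has_utf16_bom = utf16.startswith(codecs.BOM_UTF16)

print(with_bom.hex(" "), repr(stripped), repr(kept), has_utf16_bom)
```

With the BOM in place, even a file whose content is pure ASCII carries
an explicit marker of its encoding.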

By the way, all of this means that it's not ASCII that is "deprecated",
but the various complementary or alternative encodings that were (and
still partly are) used to support non-English characters.

Kind regards,
Gabriele



P.S. I'm not sure I'll be able to reply further in the next few days;
I'm in a complex situation.






'Johannes Köhler' via vim_use wrote:
>
> Beloved vim'er!
>
> until shortly before... I never came up with
> the idea of doing: "thinking about the text file encoding
> of my files@hdd"
>
> I used unicode like a definition at my locales. Still in
> mind that my files are utf-8 encoded.
>
> BUT, after a file crash - during the system play with an
> old ext2 filesystem and GNU tar, I had a file header
> without a file in my inodes. Like a condenser without
> payload :) AND, out of curiosity I probed a bit with vim
> files, and utf-8 (but btrfs) and an up-to-date archlinux.
>
> Then, I realized that there are three encoding views:
> keyboard, display(terminal), vim. Like, decoding pipes to
> an encoded socket. The encoded socket, the file itself,
> works partly inconsistent together with vim, xterm and
> the unixtool file.
>
> Setting: I create a file using xterm console and touch.
> Then, I open it with vim.
>
> Vim: enc & fenc = utf-8
> BUT file -i: us-ascii
>
> The file results with 2-byte per Character, yet like
> us-ascii inside of an unicode container. However, i
> like to have real unicode and not an endianness
> of us-ascii using 2-byte instead of 1-byte.
>
> Then @vim, i change the encoding to ucs-2 with :set fenc=ucs-2. I
> read@vimdoku ucs-2 and utf-8 is similar@linux
> Now :write, vim tells me [converted] and
> file (sometimes) tells me utf-8 like expected. The file
> size increases to 4-byte per character, like expected
> for ucs-4. Then reread @vim, shows me unreadable content.
> I have to ++enc it back to ucs-2. So, inside vim ucs-2 and utf-8 seems
> to be different. And @linux ucs-2 using
> filespace like ucs-4.
>
> Imaginary reasoning: my system wide (or kernel working)
> utf-8 differs from real unicode utf-8 by endianness
> abuse. Maybe because of compatibility...
> That is why the file tool works inconsistent
> (partly tells binary stuff instead of text encoding).
>
> Is there a way to ensure working with true utf-8
> or better utf-16 files? Aim is to work with source
> files in unicode to exclude the deprecated ascii...
>
> Sincerely
> -kefko
>

