Saturday, July 31, 2021

Re: unicode: UTF / UCS

On 27.07.21 21:37, Gabriele F wrote:
> (If I remember correctly) the first versions of Unicode had only a
> 2-byte encoding, so that (part of the) manpage is very old.

"_that would appeal to me_
... unsalaried working for the linux manpage people, to
keep them far off of "muddle-headed", always up-to-date
and referencing...
I ❤ reading manpages"

Is there a global internet group for that, like there is for Vim development?

>> Furthermore, I'd be interested myself in the filesystem
>> behavior and Unicode with UCS-2. Is it possible, in principle,
>> to use a Linux filesystem with a 2-byte Unicode encoding?
>
> I'm not so strong on Linux but filesystems shouldn't have anything to do
> with text file encodings

I had assumed as much, because the Vim option ":set fenc"
suggested file -> (file)system to me...
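
As far as I know, 'fenc' is just short for Vim's 'fileencoding'
option: it controls which bytes Vim writes into the file, not
anything in the filesystem itself. A minimal Python sketch of that
separation (the file name is only an example):

    # Sketch: the encoding lives in the file's bytes; any Linux
    # filesystem just stores and returns those bytes unchanged.
    from pathlib import Path

    p = Path("demo.txt")                      # hypothetical file name
    p.write_text("Grüße", encoding="utf-16")  # 2-byte code units + BOM
    raw = p.read_bytes()                      # filesystem hands back raw bytes
    print(raw[:2])               # b'\xff\xfe' -> BOM (little-endian on x86)
    print(raw.decode("utf-16"))  # the *reader* applies the encoding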

> you're probably thinking too much ahead, these issues have likely
> nothing to do with endianness

I myself aspire to think by inference rather than ahead, but
maybe you meant "stick to the point", in which case I agree ;)

> It's not that simple unfortunately, UTF-16 (let's leave aside UCS-2, it
> shouldn't matter) cannot be assumed to always have two bytes per

UCS: _U_niversal _C_oded Character _S_et

In my mind, UCS is the mathematical object and UTF the
encoding/decoding function applied to it:
magnitude: 16 (32) bit
plurality: character set / coded characters
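
To illustrate the "two bytes per character" point above, a small
Python sketch (just an illustration): code points outside the Basic
Multilingual Plane need a surrogate pair, i.e. four bytes in UTF-16:

    # Sketch: UTF-16 is variable length. Code points above U+FFFF
    # are encoded as a surrogate pair (two 16-bit units = 4 bytes).
    for ch in ("A", "€", "💙"):         # U+0041, U+20AC, U+1F499
        utf16 = ch.encode("utf-16-be")  # big-endian, no BOM, for clarity
        print(f"U+{ord(ch):04X} -> {len(utf16)} bytes: {utf16.hex(' ')}")
    # U+0041 -> 2 bytes: 00 41
    # U+20AC -> 2 bytes: 20 ac
    # U+1F499 -> 4 bytes: d8 3d dc 99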

> All in all, it's nice if you want to understand how things are at the
> lower levels, it's quite fun to know it, but in order to achieve that

I have experienced that "low-level" knowledge makes possible
the inferential thinking needed to discuss things with others...
(without having to dig deep into the science of the
electronics).

For example:

a hardware HDD controller uses a bit buffer for transferring
a bit word to static memory. Such a controller can handle a
defined bit length (normally the bus width).

Assume the data in the HDD's partition tables (e.g. the UID),
used by the operating system, are encoded in 16-bit Unicode.
Well, my inferred thought was that UCS-2 is a hardware
encoding, UTF-8 is for ASCII purposes, UTF-32 is a
high-level programmer attitude, and UTF-16 is the real Unicode.
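
At least the "UTF-8 for ASCII purposes" part can be checked
directly. A small Python sketch: UTF-8 is a strict superset of
ASCII, so pure ASCII text is byte-for-byte identical in both:

    # Sketch: every valid ASCII byte sequence is already valid UTF-8.
    s = "plain ASCII text"
    assert s.encode("ascii") == s.encode("utf-8")  # identical bytes
    # Non-ASCII code points take 2-4 bytes, ASCII stays at 1 byte:
    print("A".encode("utf-8").hex(" "))   # 41
    print("é".encode("utf-8").hex(" "))   # c3 a9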

In the end that would mean the controller is made for 2-byte
units. The old ASCII code needs 7 bits, plus probably one bit
for something else, and UTF-8 then has to work with a
different endianness.
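
On the endianness point, a Python sketch of how it actually enters
the picture: UTF-8 is defined as a plain byte sequence with no byte
order at all, while UTF-16 comes in little- and big-endian flavours,
which is why UTF-16 files often start with a byte-order mark (BOM):

    # Sketch: UTF-8 has no endianness; UTF-16 does.
    s = "Ä"                                # U+00C4
    print(s.encode("utf-8").hex(" "))      # c3 84       same on every machine
    print(s.encode("utf-16-le").hex(" "))  # c4 00       little-endian
    print(s.encode("utf-16-be").hex(" "))  # 00 c4       big-endian
    print(s.encode("utf-16").hex(" "))     # ff fe c4 00 BOM + native order
                                           # (on a little-endian machine)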

And... why should I use the deprecated ASCII scheme on my
system when I can have lots of advantages using UTF-16
(e.g. for control/hash functions)? It feels like UTF-8 is a
"workaround" wrapper for the ASCII scheme...
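
Regarding hash functions, a hedged Python sketch: a hash is computed
over the encoded bytes, so whichever encoding is chosen has to be
fixed beforehand; the digest differs between UTF-8 and UTF-16 for
the same text:

    # Sketch: hashes see bytes, not "text"; the encoding changes the digest.
    import hashlib

    s = "Grüße"
    for enc in ("utf-8", "utf-16-le"):
        digest = hashlib.sha256(s.encode(enc)).hexdigest()
        print(f"{enc:9} {digest[:16]}...")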

And... UCS (in principle) probably builds on the technological
leap from block-oriented sequential access (HDD) to
byte-oriented random-access memory (SSD). Maybe it plays
with the one to four bit octets and the endianness. ASCII
seems to have been developed for the sequential encoding form.

sincerely
-kefko
