Hi Johannes,
On Saturday, 2021-07-31 12:37:08 +0200, 'Johannes Köhler' via vim_use wrote:
> > It's not that simple unfortunately, UTF-16 (let's leave aside UCS-2, it
> > shouldn't matter) cannot be assumed to always have two bytes per
>
> UCS: _Uni_versal _Cod_ed Character Set
>
> In my mind, UCS is the mathematical quantum and UTF the
> encoding/decoding function using this:
> magnitudes: 16(32)bit
> plurality: charset / coded character
You are confusing things.
UCS-4, and UTF-32 as its subset, can hold (respectively encode) all
assigned Unicode characters as direct representations of the
characters' code points.
UCS-2 is a 2-byte fixed-width encoding capable of representing 65536
code points, that is, just the Unicode Basic Multilingual Plane (BMP).
UTF-16 can encode the entire Unicode character range. It is almost
identical to UCS-2 in the first 64k characters, except for the
"escape sequences" (surrogate pairs) it needs to represent characters
of higher planes.
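To make the surrogate pair mechanism concrete, here is a minimal Python sketch. The helper name utf16_code_units is made up for illustration, and the two sample characters are my own choice; the arithmetic is the standard UTF-16 scheme.

```python
def utf16_code_units(code_point):
    """Return the list of 16-bit UTF-16 code units for one code point.

    Hypothetical helper for illustration only; in real code you would
    simply call str.encode("utf-16-be").
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogates are reserved and cannot be encoded")
    if code_point < 0x10000:
        # BMP character: one code unit, same value as in UCS-2.
        return [code_point]
    # Higher planes: split the 20-bit offset into two 10-bit halves.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # high (leading) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trailing) surrogate
    return [high, low]


euro = utf16_code_units(0x20AC)    # U+20AC EURO SIGN, inside the BMP
clef = utf16_code_units(0x1D11E)   # U+1D11E MUSICAL SYMBOL G CLEF, plane 1

print([hex(u) for u in euro])
print([hex(u) for u in clef])
```

So a BMP character takes one 16-bit unit and anything above takes two, which is exactly why UTF-16 cannot be assumed to have two bytes per character.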
> Assuming that the data of the hdd partition tables (e.g.UID),
> used by the operating system, are encoded in 16bit Unicode.
> Well, my inferring thoughts were that UCS-2 is a
> hardware encoding, UTF-8 for ASCII purpose, UTF-32 a
> high level programmer attitude and UTF-16 the real unicode.
That's all nonsense. Really.
> In the end that means, the controller is made for 2-byte.
> The old ASCII code needs 7bit and probably one for
> sth., now than UTF-8 has to work with a different endian.
There is no endianness in UTF-8. Unless your hardware has fewer than
8 bits per word.
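A quick Python sketch of the difference, using the euro sign as an arbitrary sample character. UTF-16 works in 16-bit units, so the two bytes of each unit can be stored in either order; UTF-8 is defined as a sequence of single bytes, so there is nothing to swap.

```python
text = "\u20ac"  # U+20AC EURO SIGN, chosen only as an example

utf16_le = text.encode("utf-16-le")  # little-endian byte order
utf16_be = text.encode("utf-16-be")  # big-endian byte order
utf8 = text.encode("utf-8")          # only one possible byte sequence

print(utf16_le.hex(), utf16_be.hex(), utf8.hex())
```

The two UTF-16 results contain the same bytes in reversed order, while Python has no "utf-8-le" or "utf-8-be" codec at all, because the distinction would be meaningless.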
> And... why should i use a deprecated ASCII scheme
> at my system, when i can have lots of advantage
> using utf-16 (e.g. control/hash functions). It fells
> like utf-8 is a "work around" wrapper for
> the ASCII scheme...
UTF-8 is an efficient encoding that needs only 1 byte per character for
Unicode characters <128 (which are identical to ASCII, itself a subset
of Unicode), whereas UTF-16 needs at least 2 bytes for each
character.
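The size difference is easy to check. A small Python sketch, where the helper name encoded_sizes and the sample strings are my own picks for illustration:

```python
def encoded_sizes(text):
    """Return the encoded length in bytes for three Unicode encodings.

    Hypothetical helper for illustration; the explicit -le codecs are
    used so that no byte order mark is counted.
    """
    return {enc: len(text.encode(enc))
            for enc in ("utf-8", "utf-16-le", "utf-32-le")}


ascii_sizes = encoded_sizes("vim")          # 3 characters, all ASCII
mixed_sizes = encoded_sizes("K\u00f6hler")  # 6 characters, one non-ASCII

print(ascii_sizes)
print(mixed_sizes)

# For pure ASCII text the UTF-8 bytes are the ASCII bytes, unchanged.
same_as_ascii = "vim".encode("utf-8") == "vim".encode("ascii")
```

Pure ASCII text costs exactly one byte per character in UTF-8 and is byte-for-byte valid ASCII, while the same text doubles in UTF-16 and quadruples in UTF-32.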
UTF-16 is a workaround for those who wanted Unicode and started off with
UCS-2 but then realized there's more than just BMP.
Or, UTF-16 is the devil's work:
https://robert.ocallahan.org/2008/01/string-theory_08.html
Eike
--
OpenPGP/GnuPG encrypted mail preferred in all private communication.
GPG key 0x6A6CD5B765632D3A - 2265 D7F3 A7B0 95CC 3918 630B 6A6C D5B7 6563 2D3A
Use LibreOffice! https://www.libreoffice.org/
Sunday, August 1, 2021