Monday, August 2, 2021

Re: unicode: UTF / UCS

As some have said above, UTF-8 is a variable-length encoding: it
encodes the 128 7-bit ASCII characters exactly like us-ascii, and
every codepoint above U+007F in two to four bytes, each of them with
the high bit set. Originally the UCS code space (ISO 10646) was
foreseen to go as far up as U+7FFFFFFF, but when UTF-16 was crafted
and surrogate codepoints were assigned it was decided that codepoints
higher than U+10FFFF would never be assigned (and U+F0000 to U+10FFFF
are "for private use" anyway, i.e. transmitter and receiver have to
agree on their meaning, which is not defined by Unicode). The
Wikipedia pages on the subject (linked below) are well written and I
recommend reading them.
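
To make the byte counts concrete, here is a minimal Python sketch
(Python is only my choice for illustration, and the helper name
utf8_bytes is made up for this example). It prints how many bytes
UTF-8 needs for a few sample codepoints and whether every byte has
its high bit set:

```python
def utf8_bytes(ch):
    """Return the UTF-8 encoding of one character as a list of byte values."""
    return list(ch.encode("utf-8"))

# U+0041 (A), U+00E9 (e acute), U+20AC (euro sign), U+1F600 (an emoji)
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    encoded = utf8_bytes(ch)
    hex_bytes = " ".join(f"{b:02X}" for b in encoded)
    # True when every byte of the sequence has its high bit (0x80) set.
    all_high = all(b & 0x80 for b in encoded)
    print(f"U+{ord(ch):04X}: {len(encoded)} byte(s) [{hex_bytes}] "
          f"high bit everywhere: {all_high}")
```

The ASCII letter comes out as the single byte 41, exactly as in
us-ascii, while the other three samples take two, three and four
bytes respectively.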

The so-called "byte order mark" U+FEFF ZERO WIDTH NO-BREAK SPACE
should more appropriately be called an "encoding mark": it can
discriminate most Unicode encodings and endiannesses from each other,
including UTF-8, which has no byte-order ambiguity. At the head of a
UTF-8 file (e.g. an HTML file or CSS stylesheet, whose syntaxes
expressly support it), it means "This is UTF-8". However, some
programs which expect only US-ASCII will choke if they get a file
headed by a BOM: for instance a #! "executable script" header will
not be recognized if it is preceded by a BOM, so if you want to start
your first line with #!/bin/bash or #!/usr/bin/env python the file
may be in UTF-8 (which encodes the 128 ASCII characters just like
us-ascii) but it must not start with a BOM.
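
As a rough illustration of the "encoding mark" idea, the following
Python sketch guesses the encoding from the first bytes of a file and
strips a UTF-8 BOM so that a #! line comes first again. The names
sniff_bom and strip_utf8_bom are hypothetical helpers written for
this example; the BOM constants come from Python's standard codecs
module.

```python
import codecs

# The UTF-32 LE mark (FF FE 00 00) begins with the UTF-16 LE mark (FF FE),
# so the 4-byte marks have to be tested before the 2-byte ones.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data):
    """Return (encoding, bom_length) guessed from a leading BOM, or (None, 0)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0

def strip_utf8_bom(data):
    """Drop a leading UTF-8 BOM so that '#!' is the very first thing in the file."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

script = codecs.BOM_UTF8 + b"#!/bin/bash\necho hello\n"
print(sniff_bom(script))                          # encoding name and BOM length
print(script.startswith(b"#!"))                   # False: the BOM is in the way
print(strip_utf8_bom(script).startswith(b"#!"))   # True once the BOM is gone
```

A file without any BOM yields (None, 0), which only means "no mark
present", not "not Unicode". In Vim itself, ":setlocal nobomb"
followed by ":w" writes the current buffer without a BOM.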

See:
https://en.wikipedia.org/wiki/Unicode
https://en.wikipedia.org/wiki/UTF-8
and beware that the Microsoft Windows documentation usually says
"Unicode" when what it means is "UTF-16", which represents each
codepoint in one 16-bit word or, for codepoints above U+FFFF, in two
(a "surrogate pair").
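
For the curious, a small Python sketch of the "one or two 16-bit
words" point (utf16_units is a name I made up for this example) shows
the words UTF-16 uses for a given codepoint:

```python
import struct

def utf16_units(ch):
    """Return the 16-bit words that UTF-16 uses for one character."""
    # An explicit byte order is requested so that no BOM is prepended.
    raw = ch.encode("utf-16-be")
    return list(struct.unpack(f">{len(raw) // 2}H", raw))

# U+0041 (A), U+20AC (euro sign), U+1F600 (an emoji outside the 16-bit range)
for ch in ["A", "\u20ac", "\U0001F600"]:
    words = " ".join(f"{u:04X}" for u in utf16_units(ch))
    print(f"U+{ord(ch):04X}: {words}")
```

The first two samples fit in a single word each; U+1F600 comes out as
the pair D83D DE00, both of which lie in the surrogate range
U+D800 to U+DFFF.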

Best regards,
Tony.
