Tuesday, May 24, 2011

Re: [BUG] 'non-empty string' >? '' returns false on amd64 arch

On 25/05/11 02:56, Ivan Krasilnikov wrote:
> Also mb_strnicmp() assumes that lowercase and uppercase characters
> have the same length in UTF-8 representation. This isn't the case.
> Here are a few counterexamples:
>
> $ python -c 'print " ".join(["0x%.2X" % n for n in range(65536) if
> len(unichr(n).encode("utf8")) !=
> len(unichr(n).lower().encode("utf8"))])'
>
> 0x130 0x23A 0x23E 0x1E9E 0x2126 0x212A 0x212B 0x2C62 0x2C64 0x2C6D 0x2C6E 0x2C6F
>
> So I think the UTF-8 part of mb_strnicmp() needs to be completely rewritten.
>
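The one-liner quoted above is Python 2, where `unichr` and the `print` statement exist. A Python 3 port might look like the sketch below. Python 3 refuses to UTF-8-encode lone surrogates, so it has to skip U+D800..U+DFFF. Its `str.lower()` also follows the Unicode version bundled with the interpreter, so the printed list may not match the one quoted exactly. The function name `case_length_mismatches` is hypothetical.

```python
# Python 3 port of the quoted one-liner: list BMP code points whose lowercase
# form has a different UTF-8 byte length than the code point itself.
def case_length_mismatches(limit: int = 0x10000):
    result = []
    for n in range(limit):
        # Lone surrogates cannot be encoded as UTF-8 in Python 3; skip them.
        if 0xD800 <= n <= 0xDFFF:
            continue
        ch = chr(n)
        if len(ch.encode("utf-8")) != len(ch.lower().encode("utf-8")):
            result.append(n)
    return result


if __name__ == "__main__":
    print(" ".join("0x%.2X" % n for n in case_length_mismatches()))
```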

Yes, and in Turkish (i.e. with ":lang ctype tr" and 'casemap' empty), the
case counterparts of I and i (1 byte each) are ı and İ respectively (2
bytes each).
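One way around the equal-length assumption is to decode both strings and fold them one code point at a time. Each side then advances by its own character length rather than a shared byte count. The Python sketch below illustrates the idea and is not Vim's actual fix. The names `fold_char` and `utf8_strnicmp` are hypothetical, and the `turkish` flag is an assumed stand-in for ":lang ctype tr" with 'casemap' empty. Note that `n` counts characters here, whereas Vim's mb_strnicmp() takes a byte count.

```python
# Hypothetical sketch: case-insensitive comparison of UTF-8 byte strings that
# folds one code point at a time, so a character and its case counterpart
# may have different UTF-8 lengths (e.g. KELVIN SIGN, 3 bytes, vs "k", 1 byte).

def fold_char(ch: str, turkish: bool = False) -> str:
    """Fold one character to lower case; the result may differ in length."""
    if turkish:
        # Turkish dotted/dotless i: I <-> ı (U+0131), İ (U+0130) <-> i.
        if ch == "I":
            return "\u0131"
        if ch == "\u0130":
            return "i"
    # Otherwise use Python's lowercase mapping.
    return ch.lower()


def utf8_strnicmp(a: bytes, b: bytes, n: int, turkish: bool = False) -> int:
    """Compare at most n characters (not bytes) of a and b, ignoring case.

    Returns -1, 0 or 1, in the spirit of strncmp().
    """
    sa = a.decode("utf-8")[:n]
    sb = b.decode("utf-8")[:n]
    for ca, cb in zip(sa, sb):
        fa, fb = fold_char(ca, turkish), fold_char(cb, turkish)
        if fa != fb:
            return -1 if fa < fb else 1
    # Every compared character matched: the shorter string sorts first.
    return (len(sa) > len(sb)) - (len(sa) < len(sb))
```

For example, `utf8_strnicmp("\u212a".encode("utf-8"), b"k", 1)` treats the 3-byte KELVIN SIGN and the 1-byte "k" as equal. With `turkish=True`, "I" matches "ı" but no longer matches "i".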


Best regards,
Tony.
--
hundred-and-one symptoms of being an internet addict:
94. Now admit it... How many of you have made "modem noises" into
the phone just to see if it was possible? :-)
