> On Fri, 7 May 2010, surge wrote:
>
>> Seems like unicode characters throw off the "col" function. The column
>> numbering is different between its return and what I see in vi while
>> moving around (I get dashed column numbers like 13-11).
>
> The first number is the column [col('.')]. The second is the virtual
> column [virtcol('.')].
>
> Unicode chars can have widths other than one, e.g.:
> \UFEFF = ZERO WIDTH NO-BREAK SPACE (a.k.a. byte-order mark) has width 0
> \UFF01 = FULLWIDTH EXCLAMATION MARK has width 2
>
> But, I'm also seeing oddness. E.g. the line that displays as:
> <feff>！asdf
>
> (Entered as: ^V u f e f f ^V u f f 0 1 a s d f )
>
> On the BOM, the column is '1' (as expected)
> On the Asian exclamation, though, it's 4-7, and the 'a' shows up as 7-9.
>
> In UTF-8:
> \UFEFF = \xEF \xBB \xBF (3 bytes), shown as <feff> = display width 6 chars
> \UFF01 = \xEF \xBC \x81 (3 bytes), shown as ！ = display width 2 chars
>
> So apparently the first number is bytes, not characters?
>
> :h col() calls the result the 'byte index', so it makes sense, but how
> would one get the character position?
>
You don't get it directly. col() is a byte index and virtcol() counts
display cells. A fullwidth CJK character takes up two cells and (in UTF-8)
three or four bytes. A hard tab is one byte and anywhere from one to
'tabstop' cells. <feff> is three bytes, six cells. <80> is two bytes, four
cells. And so on.

To get the character position, you can temporarily replace every character
from start-of-line up to the cursor by (let's say) a dash: a dash is one
byte and one cell wide, so col() and virtcol() will then both equal the
number of characters. Then undo.
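The byte-versus-character distinction can also be checked outside Vim. What follows is only a rough sketch in Python, not anything Vim does internally: the helper names byte_col and char_col are made up for illustration, the line is assumed to be UTF-8 encoded, and the byte column is assumed to point at the first byte of a character (which is what col('.') reports).

```python
def byte_col(line, char_index):
    """1-based UTF-8 byte column of the character at 0-based char_index.

    Hypothetical helper: mirrors what col() reports for that character.
    """
    return len(line[:char_index].encode("utf-8")) + 1


def char_col(line, byte_column):
    """1-based character position for a 1-based UTF-8 byte column.

    Hypothetical helper: assumes byte_column falls on the first byte of a
    character; otherwise decoding the prefix raises UnicodeDecodeError.
    """
    prefix = line.encode("utf-8")[:byte_column - 1]
    return len(prefix.decode("utf-8")) + 1


# The test line from the quoted message: BOM, fullwidth '!', then "asdf".
line = "\ufeff\uff01asdf"
for i, ch in enumerate(line):
    col = byte_col(line, i)
    print(f"U+{ord(ch):04X}  byte column {col}  "
          f"character column {char_col(line, col)}")
```

For the 'a' in that line, the byte column comes out as 7 (matching the first number seen in Vim) while the character column is 3. Inside Vim the same slice-and-count idea could presumably be written with strpart() on getline('.') followed by strchars(), in versions that provide strchars().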
Best regards,
Tony.
--
Shit makes the flowers grow and that's beautiful