Sunday, July 5, 2015

Two problems with Tamil (Unicode) text

There seem to be two issues with the rendering of Tamil text in gVim 7.4 (running on Windows 7).

1. An issue with combining characters. Here's an example: http://i.imgur.com/2qGNqmB.png showing on the left how the characters are supposed to be rendered, and on the right how Vim ends up rendering them. The left version was rendered by Notepad++ with the same font selected as in Vim (in this case 'Source Code Pro', but the result is the same with 'Courier New', 'DejaVu Sans Mono', or even GNU 'FreeMono').

What is happening in the Vim screenshot is that the combining characters, instead of being combined according to the Unicode rules, are apparently just being overlaid on the base character. This would work for, e.g., French or German accents (AFAIK), but in scripts like Tamil the correct result depends on which combining character is attached to which base character, and "just lay them on top of each other" doesn't really work. Notepad++ seems to manage this combining properly, while gVim on the same computer with the same font doesn't.
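To make the difference concrete, here is a small Python sketch that lists the code points behind an accented Latin letter and a Tamil syllable. The two sample strings are my own picks for illustration and are not taken from the screenshots. It only inspects Unicode character data and says nothing about how Vim or Notepad++ draw the text internally.

```python
import unicodedata

def describe(text):
    """Return (code point, general category, name) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
        for ch in text
    ]

# Illustrative sample: Latin "e" followed by COMBINING ACUTE ACCENT.
# The accent is a non-spacing mark (category Mn), so drawing it on top
# of the base letter gives the expected result.
for row in describe("e\u0301"):
    print(row)

# Illustrative sample: Tamil syllable "ko", i.e. TAMIL LETTER KA followed
# by TAMIL VOWEL SIGN O. The vowel sign is a spacing mark (category Mc)
# that needs room of its own next to the base letter.
for row in describe("\u0B95\u0BCA"):
    print(row)
```

The vowel sign in the second sample is drawn partly before and partly after the consonant, which is why placing it on top of the base character in the same cell cannot give a readable result.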

2. This one I think isn't really Vim's fault, but I'm hoping people here might have a workaround or a suggestion anyway: the Tamil glyphs in fonts, even in fonts that claim to be monospace, never seem to really have a fixed width (again, I've tried all available monospace fonts, plus installing GNU FreeMono for this). On top of this, Vim seems to treat them as monospace anyway: it chooses one fixed width for all of them and ignores whatever extends beyond that width (the red area in the characters here: http://i.imgur.com/qfnDg32.png). "Ignores" not in the sense of not displaying it at all, but in the sense of laying out the next character on top of the protruding part of the current character, overwriting it. This leads to quite incomprehensible words (e.g., the word on the left here: http://i.imgur.com/HK35z0e.png is supposed to look like the letters on the right placed together, but the overlaps make it nearly impossible to read).

To be clear, it's not an input issue - Vim accepts the characters and stores them well enough, and I'm able to view them correctly in other editors. It's only a display-level issue.

Also, FWIW, Vim in a Cygwin terminal seems to have issue 2 but not issue 1 (combining characters). Or rather, it displays them properly until I navigate through the individual characters, at which point they get split up into 'base character' and 'combining character' separately (as if, e.g., ñ were displayed as n◌̃ instead). This seems to be a whole new can of worms, but it at least seems to narrow issue 1 down to Windows gVim's layout engine.
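For reference, the base-plus-combining split described above can be reproduced at the code point level with a short Python sketch. The helper `split_clusters` is a hypothetical name, and its grouping rule (attach every mark to the preceding character) is a deliberate simplification, not the full Unicode grapheme cluster algorithm and not what Vim itself does.

```python
import unicodedata

def split_clusters(text):
    """Group each base character with the combining marks that follow it.

    Simplified assumption: any character whose general category starts
    with "M" (a mark) belongs to the character before it.
    """
    clusters = []
    for ch in unicodedata.normalize("NFD", text):
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

# Precomposed n-with-tilde decomposes into "n" + COMBINING TILDE,
# which should still be shown and navigated as one unit.
print([len(c) for c in split_clusters("\u00F1")])

# Illustrative sample: the word "Tamil" written in Tamil script,
# five code points that form three visible units.
print(len(split_clusters("\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD")))
```

A display that moves the cursor one code point at a time instead of one cluster at a time would show exactly the kind of split described above.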

Any input that sheds light on or improves the situation would be appreciated!

Thanks,
Sundar
