Saturday, January 2, 2010

Re: Printing with utf-8 characters on Windows

Hello Chris,

I've tried printing on Linux (Linux Mint 8, using the Print to PDF printer) and the result is the same as on Windows.

I think this is a bug.

On Wed, Dec 23, 2009 at 3:31 AM, Chris Jones <cjns1989@gmail.com> wrote:
On Sun, Dec 20, 2009 at 11:36:27AM EST, Đức Minh Thái wrote:
> Hello,
> I cannot get utf-8 characters printed correctly. For example:
>
> bột
>
> becomes
>
> bá»™t

U+1ED9   ộ   LATIN SMALL LETTER O WITH CIRCUMFLEX AND DOT BELOW

See:

:help ga

In utf-8, this character is encoded by the following sequence of three
bytes:

0xe1, 0xbb, 0x99

See:

:help g8
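To make the three-byte sequence above concrete, here is a minimal Python sketch. It assumes Python 3, and the helper name `utf8_encode_3byte` is hypothetical. It packs the 16 bits of U+1ED9 into the UTF-8 pattern 1110xxxx 10xxxxxx 10xxxxxx and checks the result against Python's own encoder. It only handles the three-byte range U+0800 to U+FFFF, which is enough for this character.

```python
import unicodedata


def utf8_encode_3byte(cp: int) -> bytes:
    """Hypothetical helper: encode a code point in U+0800..U+FFFF as 3 UTF-8 bytes.

    Layout: 1110xxxx 10xxxxxx 10xxxxxx (16 payload bits).
    Surrogates (U+D800..U+DFFF) are not valid scalar values and are rejected.
    """
    if not (0x0800 <= cp <= 0xFFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("code point needs a different UTF-8 length")
    return bytes([
        0xE0 | (cp >> 12),          # lead byte: top 4 bits
        0x80 | ((cp >> 6) & 0x3F),  # continuation byte: middle 6 bits
        0x80 | (cp & 0x3F),         # continuation byte: low 6 bits
    ])


ch = "\u1ed9"  # the character from 'bột'
print(f"U+{ord(ch):04X}", unicodedata.name(ch))
encoded = utf8_encode_3byte(ord(ch))
print(", ".join(f"0x{b:02x}" for b in encoded))  # 0xe1, 0xbb, 0x99
assert encoded == ch.encode("utf-8")  # agrees with Python's built-in encoder
```

The first line printed is the same name `ga` reports, and the second matches what `g8` shows.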

This is what a utf-8 encoded file with the three characters 'bột'
actually contains:

00000000  62 e1 bb 99 74 0a                                 |b...t.|
00000006

0x62             b   LATIN SMALL LETTER B
0xe1,0xbb,0x99   ộ   LATIN SMALL LETTER O WITH CIRCUMFLEX AND DOT BELOW
0x74             t   LATIN SMALL LETTER T

The final 0x0a is a line feed control character.
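A dump like the one above can be reproduced with a short Python sketch. The `hexdump_c` helper is hypothetical. It imitates the canonical layout of `hexdump -C` (offset, two groups of eight hex bytes, printable ASCII column, then a final line with the total length), but spacing may not match every system's tool exactly.

```python
def hexdump_c(data: bytes) -> str:
    """Hypothetical helper: a hexdump -C style canonical dump of `data`."""
    lines = []
    for off in range(0, len(data), 16):
        chunk = data[off:off + 16]
        hex_part = ""
        for i in range(16):
            hex_part += f"{chunk[i]:02x} " if i < len(chunk) else "   "
            if i == 7:
                hex_part += " "  # extra gap between the two 8-byte halves
        # Bytes outside printable ASCII (including every byte >= 0x80) show as '.'
        ascii_part = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        lines.append(f"{off:08x}  {hex_part} |{ascii_part}|")
    lines.append(f"{len(data):08x}")  # final line: total length in hex
    return "\n".join(lines)


data = "b\u1ed9t\n".encode("utf-8")  # 'bột' followed by a line feed
print(hexdump_c(data))
```

All three bytes of the multi-byte sequence fall outside printable ASCII, which is why the right-hand column reads `b...t.`.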

In Microsoft Windows' cp1252, each of those three bytes is a separate character:

0xe1    á
0xbb    »
0x99    ™

 http://en.wikipedia.org/wiki/Windows-1252

You do not give much detail as to where you see what, but I am probably
not far off the mark assuming that 'bột' is what you see when editing a
utf-8 encoded file in vim, and that 'bá»™t' is what you see on your
printout.

Being unfamiliar with Microsoft Windows, I'm speculating a bit, but it
does look like your printing software is processing the file as if it
were cp1252 rather than utf-8.
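The suspected failure is easy to reproduce in Python: take the utf-8 bytes of 'bột' and decode them as cp1252. This is only a simulation of what the printing path might be doing, not a trace of Vim's actual code.

```python
original = "b\u1ed9t"            # 'bột' as it appears in the editor
raw = original.encode("utf-8")   # b'b\xe1\xbb\x99t', the bytes in the file
misread = raw.decode("cp1252")   # simulate software that assumes cp1252
print(misread)                   # bá»™t

# cp1252 is single-byte, so each byte maps to exactly one character.
for byte, shown in zip(raw, misread):
    print(f"0x{byte:02x} -> U+{ord(shown):04X}")

# Nothing was lost: re-encoding as cp1252 and decoding as utf-8 restores the text.
assert misread.encode("cp1252").decode("utf-8") == original
```

Because cp1252 assigns a character to each of these bytes, nothing raises an error; the text is silently misread, which matches the printout.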

> My printing options are:
>
> set printfont=LMMono10:h10 " This is the LMMono from LaTeX Latin Modern
> set printoptions=number:y
> set printencoding=ucs-2le bomb

If your file is utf-8 encoded, why do you tell Vim to print it as ucs-2?

:h penc-option

In particular, this help file states that:

Code page 1252 print character encoding is used by default on Windows
and OS/2 platforms.
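To see why the print encoding matters, this Python sketch (Python 3.8+ for `bytes.hex` with a separator) compares the bytes of 'bột' under utf-8 and ucs-2le, and then tries cp1252, the Windows default quoted above. Python has no codec named ucs-2, so the sketch assumes UTF-16LE as a stand-in; the two produce identical bytes for characters in the Basic Multilingual Plane, which includes all three here.

```python
text = "b\u1ed9t"  # 'bột'

utf8 = text.encode("utf-8")
# Assumption: UTF-16LE stands in for UCS-2LE (identical for BMP characters).
ucs2le = text.encode("utf-16-le")
print("utf-8   :", utf8.hex(" "))    # 62 e1 bb 99 74
print("ucs-2le :", ucs2le.hex(" "))  # 62 00 d9 1e 74 00

# cp1252 is a single-byte code page with no slot for U+1ED9 at all.
try:
    text.encode("cp1252")
except UnicodeEncodeError as err:
    print("cp1252 cannot represent U+%04X" % ord(text[err.start]))
```

The same three characters produce different byte sequences under each encoding, and cp1252 cannot represent the middle one at all.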

> Please help. Thank you!

I am not familiar with Microsoft Windows, so I don't really have an
answer to your question, but you could try:

:set penc=

or:

:set penc=utf-8

and see if the 'bột' string prints correctly.

My understanding is that, when compiled with the relevant features
(+multi_byte), Vim should be able to process utf-8 encoded files
transparently on any platform, but you may also want to ask Vim to
convert the file.

Take a look at:

:h ++enc
:h ++ff

If that doesn't help, please attach a small sample file so that someone
on the list can come up with something more conclusive.

CJ



--
You received this message from the "vim_use" maillist.
For more information, visit http://www.vim.org/maillist.php



--
Minh Duc Thai - StudentID: 0711040
Faculty of Mathematics and Computer Science
University of Science
Vietnam National University - Ho Chi Minh City
227 Nguyen Van Cu street, District 5, Ho Chi Minh City, Vietnam
