Monday, October 23, 2017

Re: How to handle non-ascii characters?

Barry Gold wrote:
> None of these looks like themselves when I edit the file with vim in a
> cygwin Terminal window. I can search for [^ -~^t] to find the non-ASCII
> characters, then go to the original word document to find out what the
> correct character is. If I had only a few of these, that would be
> enough. But in a longer document, a given non-ASCII can occur hundreds
> of times. So once I've found (e.g.) an emdash, I want to replace _all_
> occurrences with "&mdash;". But I have no way of representing the
> character I want to replace on the command line.

I have a very similar problem to yours and have evolved some fixes that
I use. You've already gotten some replies, but maybe my methods would
help, too.

In my case, I paste content from web pages into Usenet posts and want to
have as much US-ASCII as possible for best readability. To that end I
have a specific vimrc for news that fixes things with map!s. It could
easily be modified into a ':so script' to fix things on command,
or an 'autocmd BufRead *.html' script to fix things on load.

In my vimrc:

autocmd BufRead .article.* :so ~eli/.news_vimrc

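And my news_vimrc looks like this (run through mmencode -q so it stays
ASCII: =E2=80=99 stands for the UTF-8 bytes E2 80 99, =3D is a literal
"=", and a trailing "=" is a soft line break):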

:r! cat ~/.news_vimrc | mmencode -q
" smart quotes
map! =E2=80=99 '
map! =E2=80=98 '
map! =E2=80=9C "
map! =E2=80=9D "
map! =E2=80=B3 "
" ellipsis
map! =E2=80=A6 ...
" n-dash
map! =E2=80=93 --
" m-dash
map! =E2=80=94 --
" U+2212 minus
map! =E2=88=92 -
" U+2010 hyphen
map! =E2=80=90 -
"
" find non-ascii
map <F5> /[^ -~]<cr>
" add mime headers if leaving in non-ascii
map <F6> iContent-Type: text/plain; charset=3D"UTF-8"<cr>MIME-Version: 1.=
0<cr><esc>
map! <F6> Content-Type: text/plain; charset=3D"UTF-8"<cr>MIME-Version: 1.=
0<cr>
" general news settings
set ai sw=3D4 tw=3D72

Basically, I'm suggesting that you take all the characters you find and
want to replace, and save the replacements in a script you can run
easily before looking for new characters that you want to fix.
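
A minimal sketch of what such a fix script might look like in the
':so script' form, covering the same characters as the map!s above. The
file name fix_nonascii.vim is hypothetical, and the sketch assumes
'encoding' is utf-8. It uses Vim's \%uXXXX and [\uXXXX] pattern items,
which name a character by its hex code point, so the script itself stays
pure ASCII:

```vim
" fix_nonascii.vim (hypothetical name); run with  :so ~/fix_nonascii.vim
" The e flag suppresses the error when a character does not occur.
" smart single quotes
%s/[\u2018\u2019]/'/ge
" smart double quotes and double prime
%s/[\u201C\u201D\u2033]/"/ge
" ellipsis
%s/\%u2026/.../ge
" n-dash and m-dash
%s/[\u2013\u2014]/--/ge
" U+2212 minus and U+2010 hyphen
%s/[\u2212\u2010]/-/ge
" For HTML output, escape the ampersand in the replacement,
" because a bare & there means "the whole match":
" %s/\%u2014/\&mdash;/ge
```

Each newly identified character then becomes one more line in the
script, and the same substitutions can be typed directly on the command
line without entering the character itself.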

I use http://qaz.wtf/u/ "Show unicode character" if needed to identify
characters; the plugin might suit you better.
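
As an alternative sketch that stays inside Vim, the built-in ga and g8
commands can identify a character once the cursor is on it. The search
pattern below is an assumption about what counts as "non-ASCII" here
(anything above 7-bit), not part of the setup described above:

```vim
" Jump to the next character outside 7-bit ASCII
" (unlike [^ -~], this does not stop on tabs)
/[^\x00-\x7F]
" Then, in Normal mode with the cursor on that character:
"   ga   prints its code point in decimal, hex and octal
"   g8   prints the bytes of its UTF-8 encoding in hex
```

For an m-dash, g8 reports e2 80 94, which corresponds to the =E2=80=94
form in the listing above, and ga gives the hex value to use after \%u
in a search pattern.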

And I have a long-standing macro:

" Use * to "run" a line from the edit buffer
" Mnemonic: * is executable in "ls -F"
" Uses register y
:map * "yyy@y

If I were you, I would make the commands, test them with *, then 'p'ut
them in the fix script.

That * command is one of three macros I consider essential. The other
two I think are less likely to be universally useful, but anyway:

" Find previous space and split line on it
" Mnemonic: 'S'pace
:map S F r<CR>
"
" Double the character under the cursor
" Mnemonic: fix C code like "if (0 = i) ..."
:map = y p

Elijah
------
can type his entire vimrc from memory, and often does

--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php