On Thu, Dec 15, 2011 at 4:42 PM, Tony Mechelynck <antoine.mechelynck@gmail.com> wrote:
On 15/12/11 22:15, Graham Lawrence wrote:
--How can I find non-printing characters in a text? I do not know which
specific characters I'm looking for, only that two different such
exist. I have tried /Ctrl+V Ctrl+A thru Z to no avail. Others that I
found visually appeared in vim as ~V ~W etc, but /~ would not go to any
of them so the tilde must designate tokens for something else. As the
text was derived from html, I suspect what I'm looking for are those
curly opening and closing double-quotes.
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
For Latin1, the nonprinting characters are 0x00 to 0x1F (Ctrl-@ to Ctrl-_) and 0x7F to 0x9F (Ctrl-? to Ctrl-Alt-_). The following mappings ought to find them (assuming 'magic' and 'nocompatible'):
:map <F4> /[<Bslash>x00-<Bslash>x1F<Bslash>x7F-<Bslash>x9F]<CR>
:map <S-F4> ?[<Bslash>x00-<Bslash>x1F<Bslash>x7F-<Bslash>x9F]<CR>
Note: this considers the space (0x20), the no-break space (0xA0) and the soft hyphen (0xAD) as "printing", and the tab (0x09), carriage return (0x0D) and form feed (0x0C) as "nonprinting"; it also does not regard the end-of-line character (0x0A under Unix, 0x0D followed by 0x0A under Windows, 0x0D under Mac OS 9 or earlier) as part of the line. If your assumptions are different, a more or less trivial modification of the above mappings should suit you.
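As a sketch of one such modification: to treat the tab as "printing", leave 0x09 out of the first range. The keys <F5> and <S-F5> are arbitrary choices for illustration, and the second range is taken to start at 0x7F (Ctrl-?).

```vim
" Same kind of search, but the tab (0x09) is no longer reported.
" <F5> searches forward, <S-F5> searches backward.
:nnoremap <F5> /[<Bslash>x00-<Bslash>x08<Bslash>x0A-<Bslash>x1F<Bslash>x7F-<Bslash>x9F]<CR>
:nnoremap <S-F5> ?[<Bslash>x00-<Bslash>x08<Bslash>x0A-<Bslash>x1F<Bslash>x7F-<Bslash>x9F]<CR>
```

The form feed could be excluded the same way, by splitting the range around 0x0C as well.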
For UTF-8 it's harder since there is a limit (257 or 258 I think) to the number of different characters that a collection can match, and OTOH there are non-printing characters all over the Unicode range, especially if you include "noncharacters", "invalid codepoints", unpaired surrogates (or any surrogates, even paired, if found in other than UTF-16 be or le) and "private-use" codepoints.
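If the suspects are only the curly quotes mentioned in the question, a much narrower search may be enough. A minimal sketch, assuming 'encoding' is utf-8 (the keys <F6> and <F7> are again arbitrary): the first mapping jumps to any character outside 7-bit ASCII, the second only to the curly single and double quotes U+2018, U+2019, U+201C and U+201D.

```vim
" Jump to the next character that is not 7-bit ASCII.
:nnoremap <F6> /[^<Bslash>x00-<Bslash>x7F]<CR>
" Jump to the next curly single or double quote.
:nnoremap <F7> /[<Bslash>u2018<Bslash>u2019<Bslash>u201C<Bslash>u201D]<CR>
```

Neither of these tries to cover every non-printing codepoint in Unicode; they only narrow the search down to the usual leftovers of an HTML conversion.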
To find _only_ invalid UTF-8 bytes (in Latin1 text), use 8g8 in Normal mode.
To find the value of the character under the cursor (as a printable character if it is one, and in decimal, octal and hex), use ga in Normal mode.
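For use in a script, where ga is not convenient, something along these lines might do. This is only a sketch: CharCode is a made-up command name, and it assumes 'encoding' is utf-8 so that char2nr() returns the Unicode codepoint.

```vim
" Hypothetical user command: echo the codepoint of the character under
" the cursor as U+XXXX. On an empty line it reports U+0000.
:command! CharCode echo printf('U+%04X', char2nr(matchstr(getline('.'), '\%' . col('.') . 'c.')))
```

With the cursor on the offending character, :CharCode gives a value that can then be looked up in the Unicode charts.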
The representation ^A ~B |C (usually in blue) used by Vim for characters declared as not part of 'isprint', means Ctrl-A, Ctrl-Alt-B, Alt-C. See the option's help for details.
See
:help /[]
:help /\]
:help map_backslash
:help 8g8
:help ga
:help 'isprint'
http://www.unicode.org/charts/
and in particular
http://www.unicode.org/charts/PDF/U0000.pdf
http://www.unicode.org/charts/PDF/U0080.pdf
(about the latter two, note that Unicode codepoints U+0000 to U+00FF are the 256 characters of Latin1 in the same order).
Best regards,
Tony.
--
Conscience is a mother-in-law whose visit never ends.
-- H. L. Mencken
Many thanks, just what I needed.
Graham