Saturday, November 3, 2012

Re: search and replace using a list of characters to replace

On 03/11/12 20:42, stosss wrote:
> Greetings,
> Just trying to learn
>
> On Sat, Nov 3, 2012 at 3:24 PM, Tim Chase <vim@tim.thechases.com> wrote:
>> On 11/03/12 14:11, Chris Lott wrote:
>>> I have a large text file in which I need to remove all punctuation,
>>> all special characters ("smart quotes") and the like, and a bunch of
>>> selected words.
>>>
>>> Can this be done within Vim?
>>
>> Yes.
>>
>> Oh, you want to know *how*? :-P
>>
>> The smart-quotes are the hardest ones to do, but if you can enter
>> them in vim (or select+yank them, and then paste them into an Ex
>> command using control+R followed by a double-quote), they should be
>> usable:
>>
>> :%s/\([[:punct:]]\+\|“\|”\|selected\|words\)//g
>>
>> Alternatively, you might want to specify what *is* allowed and
>> invert it:
>>
>> :%s/\W\+//g " that's "everything that isn't a Word character"
>> or
>> :%s/[^[:alnum:][:space:]]\+//g "all but alnum & spaces"
>>
>> which you can read about at
>>
>> :help :alnum:
>> :help /\W
>> :help /\|
>>
>
> Asking because I don't know and I don't use smart quotes. What makes
> them so difficult to remove in a s/search/replace/g ?
>
> Aren't they just quotation marks?
>
Well, yes, but most keyboards haven't got them. The "usual" quotation
marks (which I just used) are the same opening and closing, U+0022, and
all keyboards that I know of (even US-ASCII keyboards with no accents)
can easily produce them. Smart quotes can be “smart” or ”smart„ or even
„smart“: there are three characters for smart double quotes, U+201C
upper-6, U+201D upper-9, U+201E lower-9, and they are not the same
opening and closing, though which one is opening and which one is
closing varies by language and sometimes by country. These characters
are not in Latin1, I think they are in none of the ISO-8859 charsets, so
you need some Unicode charset (such as UTF-8) to be able to represent
them, and most keyboards either don't have them, or require some unusual
fingering to type them: on this Linux system with Belgian keyboard
layout, „ isn't available (I paste it from Vim where it has the digraph
:9), and “ ” are AltGr-v and AltGr-b respectively (hold AltGr while
hitting the letter). For single smart quotes it's even harder: ‘
AltGr-Shift-v, ’ AltGr-Shift-b, ‚ (not the comma but the low-9 single
quote) not available. (AltGr is the key right of the space bar on
international keyboards, and if you've got a second plain Alt key there
you might try Alt together with Ctrl).

While I'm here, if you want to select your selected words only as full
words (e.g. "unusual" as a word but not as part of "unusually"; "word"
or "words" but not "worded") you should use \< (zero-length start of
word) and \> (zero-length end of word) as part of your pattern:

:%s/[[:punct:]“”]\|\<unusual\>\|\<words\=\>//g
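
If the list of words is long, the alternation gets tedious to type. A
minimal sketch of building it from a list instead (the variable name
words and its contents are only placeholders, not anything from the
original question):

```vim
" Hypothetical word list; replace with the words you want removed
:let words = ['unusual', 'selected', 'bunch']
" Join into \<\%(unusual\|selected\|bunch\)\> and remove whole-word matches;
" the e flag keeps :s quiet when nothing matches
:execute '%s/\<\%(' . join(words, '\|') . '\)\>//ge'
```

This assumes the words contain no characters that are special in a Vim
pattern (such as / . * [ ] ~ or a backslash); such words would have to
be escaped first, e.g. with escape().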

If you want to remove other kinds of quotes, e.g. « » ‘ ’ ‚ i.e. U+00AB
U+00BB U+2018 U+2019 U+201A, the pattern can easily be extended.
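
One way to extend it without having to type the characters at all is to
name them by code point inside the collection. This is only a sketch,
and it assumes 'encoding' is set to utf-8:

```vim
" \uXXXX inside [] stands for the character with that hex code point:
" U+201C U+201D U+201E (double), U+2018 U+2019 U+201A (single),
" U+00AB U+00BB (French quotes)
:%s/[\u201c\u201d\u201e\u2018\u2019\u201a\u00ab\u00bb]//g
```

The \u notation inside a collection is described under :help /[].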

To type smart quotes in Vim, if you haven't got them on your keyboard, I
recommend using digraphs, they're easy to remember:
“   Ctrl-K " 6   double 6 above
”   Ctrl-K " 9   double 9 above
„   Ctrl-K : 9   double 9 below
‘   Ctrl-K ' 6   single 6 above
’   Ctrl-K ' 9   single 9 above
‚   Ctrl-K . 9   single 9 below
«   Ctrl-K < <   opening French
»   Ctrl-K > >   closing French
see :help digraph.txt; or you can input them by their Unicode codepoint
in hex, see :help i_CTRL-V_digit. («French» quotes are sometimes used in
»German« with the opposite meaning, BTW.)
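
To check which character a digraph or a pasted quote really is, you can
ask Vim for its code point. A small sketch, again assuming 'encoding' is
utf-8:

```vim
" Code point of a character, printed as U+XXXX
:echo printf('U+%04X', char2nr('“'))
" The reverse: build the character from its hex code point
:echo nr2char(0x201E)
" List every digraph Vim knows about
:digraphs
```

In Normal mode, ga shows the same information for the character under
the cursor (see :help ga).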

The substitute above will not remove spaces around the words. You may
(if you want) *follow* this substitute with

:%s/ \{2,}/ /g
to replace every run of two or more spaces by one space, or with
:%s/ *\ze\%( \|$\)//g
if you also want to remove any number of spaces at end of line. To
remove all spaces at begin or end of line but replace them by one space
elsewhere is harder to do in one operation. Hm...
:%s/\%(\%(^\| \)\zs *\)\|\%( *\ze\%( \|$\)\)//
should work I think, but it isn't very elegant.
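
A less clever alternative is to split the job into two substitutes. This
is only a sketch of that two-step approach, not a one-pattern solution:

```vim
" Step 1: delete all spaces at the start or at the end of each line
:%s/^ \+\| \+$//ge
" Step 2: squeeze every remaining run of two or more spaces into one
:%s/ \{2,}/ /ge
```

The e flag only suppresses the "pattern not found" error when a step has
nothing to do.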


Best regards,
Tony.
--
"But officer, I was only trying to gain enough speed so I could coast
to the nearest gas station."

--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
