Tuesday, February 2, 2010

Re: remove and clean CDATA out of xml

bw wrote:
>> :%s/<!\[\[CDATA\[\(\%(\%(]]>\)\@!\_.\)\{-}\)]]>/\=substitute(submatch(1),'<[^>]*>', '', 'g')/g
>
> I have a hard time understanding the \(\%(\%(]]>\)\@!\_.\)\{-}\) part.
> What does it do? What does \% mean? I do understand it will take
> anything in CDATA brackets and run the substitute command over it.

The \%(...\) is a non-capturing group: it groups things the way
\(...\) does, but doesn't create a submatch.

The command breaks down as

:%s/                 substitute

<!\[\[CDATA\[        a literal "<![[CDATA["

\(                   begin capturing
  \%(                begin non-capturing group #1
    \%(              begin non-capturing group #2
      ]]>            a literal "]]>" close tag
    \)               (end non-cap group #2)
    \@!              isn't allowed to match here
    \_.              match any one character incl NL
  \)                 (end non-cap group #1)
  \{-}               as few as possible
\)                   end capture group
]]>                  the literal "]]>" that matches
/                    and replace it with
\=                   the following expression
  substitute(        uh...substitute :)
    submatch(1),     the content of the CDATA
    '<[^>]*>',       all tags and replace them
    '',              with nothing
    'g')             for all of the tags
/g                   for all of the matches on a line
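
If Perl-style regex syntax is more familiar, the same breakdown can be sketched in Python. This is only an illustration of the pieces, not something Vim runs: the name strip_cdata_tags and the sample string are made up, and the opener is spelled "<![[CDATA[" to mirror the pattern as posted (standard XML writes it "<![CDATA[", so adjust if that's what your file contains).

```python
import re

# Rough Python translation of the Vim pattern (illustrative only):
#   \%( ... \)    ->  (?: ... )    non-capturing group
#   \%(]]>\)\@!   ->  (?!\]\]>)    "]]>" isn't allowed to match here
#   \_.           ->  [\s\S]       any one character, including newline
#   \{-}          ->  *?           as few as possible
CDATA = re.compile(r"<!\[\[CDATA\[((?:(?!\]\]>)[\s\S])*?)\]\]>")
TAG = re.compile(r"<[^>]*>")


def strip_cdata_tags(text):
    """Replace each CDATA section with its content, minus any tags."""
    # m.group(1) plays the role of submatch(1) in the Vim expression.
    return CDATA.sub(lambda m: TAG.sub("", m.group(1)), text)


# Hypothetical sample with a newline inside the CDATA section.
sample = "<a><![[CDATA[some <b>bold</b>\ntext]]></a>"
print(strip_cdata_tags(sample))
```

Tags outside the CDATA section are left alone; only the captured content goes through the inner tag-stripping substitution.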

In retrospect, because "]]>" unilaterally closes a CDATA and
you're capturing everything inside, you might be able to simplify
that to just

:%s/<!\[\[CDATA\[\(\_.\{-}\)]]>/\=substitute(submatch(1),'<[^>]*>', '', 'g')/g
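
As a sanity check on that simplification, here is a small Python sketch comparing the two forms. It assumes Python's "*?" behaves like Vim's "\{-}" for this purpose. The names FULL, SIMPLE and strip_tags_in_cdata are made up for the example, and the opener is again spelled "<![[CDATA[" to mirror the pattern as posted.

```python
import re

# Illustrative Python stand-ins for the two Vim patterns.
FULL = re.compile(r"<!\[\[CDATA\[((?:(?!\]\]>)[\s\S])*?)\]\]>")
SIMPLE = re.compile(r"<!\[\[CDATA\[([\s\S]*?)\]\]>")


def strip_tags_in_cdata(pattern, text):
    """Replace each CDATA match with its captured content, tags removed."""
    return pattern.sub(lambda m: re.sub(r"<[^>]*>", "", m.group(1)), text)


# Two CDATA sections on one line: a greedy match would swallow both,
# but the non-greedy quantifier stops at the first "]]>".
sample = "<![[CDATA[<i>one</i>]]> <x/> <![[CDATA[<b>two</b>]]>"
full_result = strip_tags_in_cdata(FULL, sample)
simple_result = strip_tags_in_cdata(SIMPLE, sample)
print(simple_result)
print(full_result == simple_result)
```

On this sample the two patterns give the same result, since the non-greedy match already ends at the first "]]>" and the lookahead has nothing left to guard against.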

HTH,

-tim


--
You received this message from the "vim_use" maillist.
For more information, visit http://www.vim.org/maillist.php
