> I am looking for a way to remove the CDATA and only get the text.
> CURRENT:
> <add>
> <doc>
> <some_title>My title</some_title>
> <content><![[CDATA[
> <p>The <strong>keyword</strong> is nice to have but is not needed to
> include in a solr feed</p><p><table cellspacing="2" cellpadding="2"
> border="1" width="100%"><tbody><tr><td>Étape 1 :</td></tr>
> ]]></content>
> </doc>
> <doc>
> ....
> </doc>
> </add>
>
> WANTED:
> <add>
> <doc>
> <some_title>My title</some_title>
> <content>The keyword is nice to have but is not needed to
> include in a solr feed
what happens to the rest of the content here?
> </content>
> </doc>
> <doc>
> ....
> </doc>
> </add>
>
> any vim tricks to do this?
You might be able to do something like
:%s/<!\[\[CDATA\[\(\%(\%(]]>\)\@!\_.\)\{-}\)]]>/\=substitute(submatch(1), '<[^>]*>', '', 'g')/g
It doesn't post-process XML entities, but otherwise, it worked on
your example...
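
If the entity post-processing matters, the same idea can be sketched outside of Vim. The following is only a rough Python equivalent, not part of the substitute command above: the names strip_cdata, CDATA_RE and TAG_RE are made up for illustration. It assumes that named HTML entities inside the CDATA section should become literal characters, and that &, < and > must then be re-escaped so the result is still valid XML text. The pattern accepts both the "<![[CDATA[" opener as typed in the example and the standard "<![CDATA[".

```python
import html
import re
from xml.sax.saxutils import escape

# Hypothetical helper names; the extra "[" is optional so that both the
# opener from the example ("<![[CDATA[") and the standard "<![CDATA[" match.
CDATA_RE = re.compile(r"<!\[\[?CDATA\[(.*?)\]\]>", re.DOTALL)
TAG_RE = re.compile(r"<[^>]*>")


def strip_cdata(text: str) -> str:
    """Replace each CDATA section with its tag-free, XML-safe text."""

    def _clean(match: re.Match) -> str:
        inner = TAG_RE.sub("", match.group(1))  # drop the embedded HTML tags
        inner = html.unescape(inner)            # e.g. &eacute; becomes é
        return escape(inner)                    # re-escape &, < and > for XML

    return CDATA_RE.sub(_clean, text)


if __name__ == "__main__":
    sample = "<content><![[CDATA[\n<p>The <strong>keyword</strong> is nice</p>\n]]></content>"
    print(strip_cdata(sample))
```

Like the Vim pattern, this uses a naive regex to strip tags, so a ">" inside an attribute value would confuse it.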
-tim
--
You received this message from the "vim_use" maillist.
For more information, visit http://www.vim.org/maillist.php