Monday, February 1, 2010

Re: remove and clean CDATA out of xml

Thanks! That did the job!

On 01/02/2010, Tim Chase <vim@tim.thechases.com> wrote:
> bw wrote:
>> I am looking for a way to remove the CDATA and only get the text.
>> CURRENT:
>> <add>
>> <doc>
>> <some_title>My title</some_title>
>> <content><![[CDATA[
>> <p>The <strong>keyword</strong> is nice to have but is not needed to
>> include in a solr feed</p><p><table cellspacing="2" cellpadding="2"
>> border="1" width="100%"><tbody><tr><td>&#201;tape 1&nbsp;:</td></tr>
>> ]]></content>
>> </doc>
>> <doc>
>> ....
>> </doc>
>> </add>
>>
>> WANTED:
>> <add>
>> <doc>
>> <some_title>My title</some_title>
>> <content>The keyword is nice to have but is not needed to
>> include in a solr feed
>
> what happens to the rest of the content here?
>
>> </content>
>> </doc>
>> <doc>
>> ....
>> </doc>
>> </add>
>>
>> any vim tricks to do this?
>
> You might be able to do something like
>
> :%s/<!\[\[CDATA\[\(\%(\%(]]>\)\@!\_.\)\{-}\)]]>/\=substitute(submatch(1), '<[^>]*>', '', 'g')/g
>
> It doesn't post-process XML entities, but otherwise, it worked on
> your example...
>
> -tim
>
>
>
> --
> You received this message from the "vim_use" maillist.
> For more information, visit http://www.vim.org/maillist.php


--
[Bb](astia{2}n)?\s?[Ww](ak{2}ie)?$
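
For anyone who needs the same cleanup outside Vim, here is a minimal sketch of the idea in Python: find each CDATA section, keep its body, and drop anything that looks like a tag. This is not from the thread. The function name strip_cdata and the decode_entities flag are made up for illustration. The pattern accepts both the doubled-bracket opener from the example above and the standard "<![CDATA[" form. Entity handling (the part the Vim command skips) is optional and rests on the assumption that entities should become literal characters.

```python
import html
import re

# Accepts "<![[CDATA[" (as in the example above) and the standard "<![CDATA[".
# The body is matched non-greedily, so each section stops at its own "]]>".
CDATA_RE = re.compile(r"<!\[\[?CDATA\[(.*?)\]\]>", re.DOTALL)
TAG_RE = re.compile(r"<[^>]*>")


def strip_cdata(xml_text, decode_entities=False):
    """Replace each CDATA section with its body, minus anything tag-like."""

    def _clean(match):
        text = TAG_RE.sub("", match.group(1))
        if decode_entities:
            # Assumption: turn entities such as &nbsp; into literal characters,
            # then re-escape &, < and > so the surrounding XML stays valid.
            text = html.escape(html.unescape(text), quote=False)
        return text

    return CDATA_RE.sub(_clean, xml_text)


if __name__ == "__main__":
    sample = (
        "<content><![[CDATA[\n"
        "<p>The <strong>keyword</strong> is nice to have</p>\n"
        "]]></content>"
    )
    print(strip_cdata(sample))
```

Like the Vim command, this is a regex-level cleanup and not a real XML or HTML parser, so a ">" inside an attribute value would trip it up.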

