Wednesday, August 15, 2012

Pattern matching question

I am working on an HTML file generated by WordPerfect's Publish to HTML, trying to get clean HTML while keeping the formatting fairly close to the original. One thing I'd like to do is delete empty tag pairs, such as this:
<SPAN STYLE="text-decoration: underline"></SPAN>

I'm sure this must be a trivial regex problem for some but I'm apparently missing some key idea. I'm working under Linux, with
VIM - Vi IMproved 7.1 (2007 May 12, compiled Oct 17 2008 18:11:28)
(sorry, not easy to upgrade)
I have tried search patterns along the following lines:
/<SPAN .\{-}><\/SPAN>
I have also tried some 'grouping' possibilities. Whatever I do, my patterns always go beyond the first closing ">" and keep going until they find a later ">" that is directly followed by </SPAN>. For example, if a line in the file looks like this:
<P><SPAN STYLE="font-size: 10pt"> //she age 20, born NS, d/o William and Sophia - image 581, pg.187] (<STRONG>note</STRONG>: <EM>marriages in </EM></SPAN></P>

(that's all on one line)

my pattern selects
<SPAN STYLE="font-size: 10pt"> //she age 20, born NS, d/o William and Sophia - image 581, pg.187] (<STRONG>note</STRONG>: <EM>marriages in </EM></SPAN>

I seem to need something which says that after finding the first (non-greedy) ">" I want the very next "<" character, rather than searching further down the line.
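
A minimal sketch of the behaviour described above, written in Python rather than Vim so it can be run on its own. It assumes that Python's lazy `.*?` behaves like Vim's `.\{-}` for this input, and the variable names are made up for illustration. It shows that a lazy "any character" match can still run past a ">", and that one possible way to pin the match to the very next "<" is to forbid ">" inside the tag:

```python
import re

# The sample line from the post (all on one line in the file).
line = ('<P><SPAN STYLE="font-size: 10pt"> //she age 20, born NS, d/o William and Sophia '
        '- image 581, pg.187] (<STRONG>note</STRONG>: <EM>marriages in </EM></SPAN></P>')
# The empty tag pair that should be deleted.
empty = '<SPAN STYLE="text-decoration: underline"></SPAN>'

# Rough Python counterpart of the Vim pattern  <SPAN .\{-}><\/SPAN>
# "." may match ">", so the lazy part grows until some ">" is followed by </SPAN>.
lazy = re.compile(r'<SPAN .*?></SPAN>')

# Variant in which the attribute part may not contain ">", so the first ">"
# must be immediately followed by </SPAN>.
# Assumed Vim counterpart:  /<SPAN [^>]*><\/SPAN>
strict = re.compile(r'<SPAN [^>]*></SPAN>')

lazy_hit = lazy.search(line)      # over-matches the non-empty SPAN
strict_hit = strict.search(line)  # finds nothing on this line
cleaned = strict.sub('', 'before' + empty + 'after')  # removes the empty pair

print(lazy_hit.group(0))
print(strict_hit)
print(cleaned)
```

The negated character class is only one option; it relies on attribute values never containing a literal ">", which seems to hold for the lines shown here.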

Any help most gratefully received!

Thanks,
John

--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
