Wednesday, December 23, 2020

Re: Substitute pattern over multiple lines

On Wed, Dec 23, 2020 at 05:08:32PM -0600, Tim Chase wrote:
> On 2020-12-23 17:48, John Cordes wrote:
> > I'm seeking help with editing a GEDCOM (genealogy) file. For
> > this I'm using Vim 8.2 in Windows. Here is a segment of text from
> > the file (the language doesn't make sense since I've deleted
> > some internal lines in the NOTEs which aren't relevant to the
> > question):
> >
> > =======================
> > 1 EVEN
> > 2 TYPE tngnote
> > 2 NOTE I have included the children William, Charles, Alice, and
> > with his parents in 1881, and with his widowed mother in 1
> > 3 CONC 891 (e.g. see my online transcription of the 1891 Smiths
> > with James Moser, son of Henry Moser and Mary Henneberry, and his
> > wife Margaret Woodin; however
> > 3 CONC , I have not yet taken this step.
> > 1 BIRT
> > =======================
> >
> > The 2 lines beginning with ^3 CONC are Continuation
> > (CONC=Concatenation) lines.
> >
> > I want to surround the text of the NOTE with a 'div' tag, so that
> > the final result should look like this:
> >
> > =======================
> > 1 EVEN
> > 2 TYPE tngnote
> > 2 NOTE <div class="xxx">I have included the children William,
> > Charles, Alice, and with his parents in 1881, and with his widowed
> > mother in 1891 (e.g. see my online transcription of the 1891
> > Smiths with James Moser, son of Henry Moser and Mary Henneberry,
> > and his wife Margaret Woodin; however, I have not yet taken this
> > step.</div>
> > 1 BIRT
> > =======================
>
> I'd start with this ugly monstrosity:
>
> :%s/^2 \u\{3,} \zs\(.*\n\(\%(\D\|3 CONC \).*\n\)\+\)/\='<div class="xxx">'.substitute(substitute(submatch(1), '\n3 CONC ', '', 'g'), '\n', '', 'g')."<\/div>\n"
>
> (all one line in case it breaks in the mail)
>
> If you only want it to do "2 NOTE" lines, you can change that initial
>
> 2 \u\{3,} \zs
>
> (which does any item that has continuations) to
>
> 2 NOTE \zs
>
> This does join *all* the lines and doesn't re-wrap them, so you'd
> then want a second pass to do the wrapping
>
> :set tw=70
> :g/<div [^>]*>.*<\/div>$/norm gqq
>
> Hope this gives you some ideas to work with.

Yes indeed Tim -- an excellent idea. Thanks very much.
I will attempt to deconstruct your 'monstrosity' somewhat later;
for now I've been trying to get it to work with my situation.

It's a bit more complicated than I first explained. Two aspects:
a) I *do* need to restrict the search to the "2 NOTE" lines, since
various other chunks of lines also have CONC lines; and
b) Sometimes there is a line between the "2 TYPE tngnote" line and
the "2 NOTE" line. The intervening line can look like this:

2 DATE 18 AUG 1776
or this
2 _SDATE 1802

So the lines to change could look like this:

===================
1 EVEN
2 TYPE tngnote
2 _SDATE 1802
2 NOTE The surname of John's wife is not positively established.
However, it is certain that her given name is Elizabeth; evidence
for this comes first from the baptismal records for Rebecca and
Eliza Catherine; these children were born while th
3 CONC e family was in London so the records are available in the
London Metropolitan Archives (the other two children were born in
Sheffield). Henry's baptismal record in Sheffield also has his
parents being John (a skinner) and Elizabeth. The id
3 CONC entification of John's wife specifically with Elizabeth
Coxsey is somewhat tentative, however.
1 EVEN
===================
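
For reference, the intended result (join each NOTE with its CONC lines and wrap the text in a div) can be sketched outside Vim. This is only a minimal Python illustration, not a replacement for the Vim command: the function name wrap_note and the shortened sample are made up for the example, it assumes every GEDCOM record sits on one physical line (no mail wrapping), and it wraps every "2 NOTE" rather than only those that follow "2 TYPE tngnote".

```python
import re


def wrap_note(text, css_class="xxx"):
    """Join each '2 NOTE' line with its '3 CONC' lines and wrap the text in a div.

    Hypothetical helper for illustration; assumes one GEDCOM record per line.
    """
    # A NOTE line followed by zero or more CONC continuation lines.
    pattern = re.compile(r"^2 NOTE (.*(?:\n3 CONC .*)*)$", re.MULTILINE)

    def repl(match):
        # CONC means concatenate: the pieces are joined with no separator.
        body = match.group(1).replace("\n3 CONC ", "")
        return f'2 NOTE <div class="{css_class}">{body}</div>'

    return pattern.sub(repl, text)


# Shortened sample in the shape of the records shown above.
sample = (
    "1 EVEN\n"
    "2 TYPE tngnote\n"
    "2 NOTE with his widowed mother in 1\n"
    "3 CONC 891; however\n"
    "3 CONC , I have not yet taken this step.\n"
    "1 BIRT\n"
)
result = wrap_note(sample)
print(result)
```

The join uses no separator because a CONC line continues the previous line mid-word, as in "1" followed by "891" above.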

This search pattern
/^2 TYPE tngnote.*\n*\(\_^2 .*DATE.*\)*\n\_^2 NOTE

works to find all 3 possibilities: no DATE line, an _SDATE line
or a DATE line.
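
The three cases can be checked with an approximate Python rendering of that search. This is a sketch under assumptions: Vim's \_^ is written as ^ under re.MULTILINE, the DATE group is made non-capturing, and the sample records are invented to match the shapes described above.

```python
import re

# Approximate Python rendering of the Vim search; not the Vim pattern itself.
PATTERN = re.compile(
    r"^2 TYPE tngnote.*\n*(?:^2 .*DATE.*)*\n^2 NOTE",
    re.MULTILINE,
)

# Invented sample records, one per case described in the text.
cases = {
    "no DATE line": "1 EVEN\n2 TYPE tngnote\n2 NOTE text\n",
    "DATE line": "1 EVEN\n2 TYPE tngnote\n2 DATE 18 AUG 1776\n2 NOTE text\n",
    "_SDATE line": "1 EVEN\n2 TYPE tngnote\n2 _SDATE 1802\n2 NOTE text\n",
}

# True where the pattern finds a tngnote TYPE line followed by its NOTE line.
matched = {name: bool(PATTERN.search(text)) for name, text in cases.items()}
print(matched)
```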

I thought I would be able to combine that with your pattern like so:

:%s/^2 TYPE tngnote.*\n*\(\_^2 .*DATE.*\)*\n\_^2 NOTE \zs\(.*\n\(\%(\D\|3 CONC \).*\n\)\+\)/\='<div class="xxx">'.substitute(substitute(submatch(1), '\n3 CONC ', '', 'g'), '\n', '', 'g')."<\/div>\n"

but that is not working. Here's an example of one small chunk of
lines which were transformed by that command:

1 EVEN
2 TYPE tngnote
2 DATE 18 AUG 1776
2 NOTE <div class="xxx">2 DATE 18 AUG 1776</div>
1 EVEN

The command discards the text that had been in the NOTE line altogether and puts the DATE line inside the div instead.

I will keep trying, but more help would be terrific!

Thanks,
John
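
One possible reading of the result above is that the added \(\_^2 .*DATE.*\) group now comes first in the pattern, so submatch(1) refers to the DATE line rather than to the note text. The Python sketch below only illustrates that numbering shift; the regex is an approximate, simplified rendering of the combined Vim pattern and the sample chunk is invented.

```python
import re

# Invented chunk in the shape of the transformed example above.
chunk = (
    "1 EVEN\n"
    "2 TYPE tngnote\n"
    "2 DATE 18 AUG 1776\n"
    "2 NOTE Some note text\n"
    "1 EVEN\n"
)

# Two capturing groups, in the same order as in the combined pattern:
# group 1 is the optional DATE line, group 2 is the text after "2 NOTE ".
combined = re.compile(
    r"^2 TYPE tngnote.*\n*(^2 .*DATE.*)*\n^2 NOTE (.*)$",
    re.MULTILINE,
)

m = combined.search(chunk)
print(m.group(1))  # the DATE line, which is what ended up inside the div
print(m.group(2))  # the note text, which is what was wanted
```

If this reading is right, changing submatch(1) to submatch(2), or making the DATE group non-capturing with \%(...\), may restore the note text. That has not been tested against the actual file.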
