Wednesday, December 23, 2020

Re: Substitute pattern over multiple lines

On 2020-12-23 20:39, John Cordes wrote:
>> I'd start with this ugly monstrosity:
>>
>> :%s/^2 \u\{3,} \zs\(.*\n\(\%(\D\|3 CONC \).*\n\)\+\)/\='<div
>> class="xxx">'.substitute(substitute(submatch(1), '\n3 CONC ', '',
>> 'g'), '\n', '', 'g')."<\/div>\n"
>
> I will attempt to deconstruct your 'monstrosity' somewhat later,

Tweaking it so that it only does NOTE items, not generic
continuations:

:%s/^2 NOTE \zs\(.*\n\%(\%(\D\|3 CONC \).*\n\)\+\)/\='<div
class="xxx">'.substitute(substitute(submatch(1), '\n3 CONC ', '',
'g'), '\n', '', 'g')."<\/div>\n"

Breaking it down so hopefully you can swap parts as you see fit:

:%s/^2 NOTE \zs          On every line starting with "2 NOTE ",
                         start our replacement here (\zs)
\(                       start capturing the note;
                         this will be submatch(1) later
.*                       everything else on that line
\n                       and the newline
\%(                      a non-capturing group for another line that
\%(\D                    starts with either a non-digit
\|                       or
3 CONC                   a literal "3 CONC "
\)                       (end of this OR of things marking a continuation)
.*\n                     followed by the rest of the line
\)                       (end of this continuation-line)
\+                       we can have 1 or more continuation lines
\)                       end the capturing
/                        replace it with
\=                       the result of evaluating this expression
'<div class="xxx">'      the literal opening tag
.                        and then the results of
substitute(              remove all the newlines from the results of
substitute(              removing from
submatch(1),             the whole set of continuation stuff
'\n3 CONC ',             the literal newline-followed-by-"3 CONC "
'',                      and replace them with nothing
'g'                      everywhere
),                       and in that "\n3 CONC "-less text, replace
'\n',                    newlines with
'',                      nothing
'g')                     everywhere
.                        and then tack on
"<\/div>\n"              the literal closing </div> followed by a newline
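
If it helps to see the same logic outside of Vim, here is a rough
Python sketch of what that substitution does. It is only an analogue,
not the command itself: the names NOTE_RE, wrap_note and join_notes are
made up for illustration, and because Python's \D would also match a
newline (Vim's does not), the sketch uses [^\d\n] instead.

```python
import re

# Hypothetical Python analogue of the Vim :s command; not the Vim command itself.
NOTE_RE = re.compile(
    r'^(2 NOTE )'                           # line start + "2 NOTE " (Vim's \zs sits after this)
    r'(.*\n(?:(?:[^\d\n]|3 CONC ).*\n)+)',  # rest of the line, then 1+ continuation lines
    re.MULTILINE,
)

def wrap_note(match):
    body = match.group(2)
    body = body.replace('\n3 CONC ', '')  # rejoin text split across CONC lines
    body = body.replace('\n', '')         # then drop the remaining newlines
    return match.group(1) + '<div class="xxx">' + body + '</div>\n'

def join_notes(text):
    return NOTE_RE.sub(wrap_note, text)

sample = (
    "1 EVEN\n"
    "2 NOTE born while th\n"
    "3 CONC e family was in London\n"
    "1 EVEN\n"
)
print(join_notes(sample), end='')
# 1 EVEN
# 2 NOTE <div class="xxx">born while the family was in London</div>
# 1 EVEN
```

As in the Vim version, a "2 NOTE" line with no continuation line after
it is left alone, since the pattern requires at least one.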

> It's a bit more complicated than I first explained. Two aspects:
> a) I *do* need to search on the "2 NOTE" lines, since there are
> various other chunks of lines with the CONC lines; and
> b) Sometimes the line "2 TYPE tngnote" has a line between it and
> the "2 NOTE". The intervening line can look like this
>
> 2 DATE 18 AUG 1776
> or this
> 2 _SDATE 1802

The substitution command above should only touch "2 NOTE" lines that
have subsequent continuation ("3 CONC") lines, but it does so for
*every* such "2 NOTE". If you need to limit it to just those that
follow a "2 TYPE tngnote" (assuming there aren't any "2 TYPE tngnote"
lines that *don't* have a NOTE following them), you can tweak that
command, changing the initial "%" to

:g/^2 TYPE tngnote//2 NOTE /s/^2 NOTE \zs…

This looks for all the "2 TYPE tngnote" lines, searches forward
(skipping over any DATE/_SDATE lines or other intervening stuff) for
the "2 NOTE " line following it, and then only performs the
substitution on those particular lines.
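
To make that line selection concrete, here is a small Python sketch of
the same idea. The function name note_lines_after_tngnote is
hypothetical, and unlike Vim's /2 NOTE / address it does not wrap
around the end of the file (Vim does when 'wrapscan' is set).

```python
def note_lines_after_tngnote(lines):
    """Return the index of the first '2 NOTE ' line after each '2 TYPE tngnote' line."""
    hits = []
    for i, line in enumerate(lines):
        if line.startswith('2 TYPE tngnote'):
            # Search forward from the next line, like the /2 NOTE / address.
            # The match is unanchored, as in the Vim command.
            for j in range(i + 1, len(lines)):
                if '2 NOTE ' in lines[j]:
                    hits.append(j)
                    break
    return hits

records = [
    "1 EVEN",
    "2 TYPE tngnote",
    "2 _SDATE 1802",
    "2 NOTE first",
    "1 EVEN",
    "2 TYPE other",
    "2 NOTE skipped",
    "1 EVEN",
    "2 TYPE tngnote",
    "2 NOTE second",
]
print(note_lines_after_tngnote(records))  # [3, 9]
```

The NOTE under "2 TYPE other" is never selected, which is the point of
the :g prefix. The same caveat applies as in Vim: a "2 TYPE tngnote"
with no NOTE of its own would pick up the next NOTE further down.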

> So the lines to change could look like this:
>
> ===================
> 1 EVEN
> 2 TYPE tngnote
> 2 _SDATE 1802
> 2 NOTE The surname of John's wife is not positively established.
> However, it is certain that her given name is Elizabeth; evidence
> for this comes first from the baptismal records for Rebecca and
> Eliza Catherine; these children were born while th
> 3 CONC e family was in London so the records are available in the
> London Metropolitan Archives (the other two children were born in
> Sheffield). Henry's baptismal record in Sheffield also has his
> parents being John (a skinner) and Elizabeth. The id
> 3 CONC entification of John's wife specifically with Elizabeth
> Coxsey is somewhat tentative, however.
> 1 EVEN
> ===================
>
> This search pattern
> /^2 TYPE tngnote.*\n*\(\_^2 .*DATE.*\)*\n\_^2 NOTE
>
> works to find all 3 possibilities: no DATE line, an _SDATE line
> or a DATE line.
>
> I thought I would be able to combine that with your pattern like
> so:
>
> :%s/^2 TYPE tngnote.*\n*\(\_^2 .*DATE.*\)*\n\_^2 NOTE
> \zs\(.*\n\(\%(\D\|3 CONC \).*\n\)\+\)/\='<div
> class="xxx">'.substitute(substitute(submatch(1), '\n3 CONC ', '',
> 'g'), '\n', '', 'g')."<\/div>\n"
>
> but that is not working.

I suspect that the problem snuck in with the \(…\) in your added
conditions: that group now comes first, so it is what gets captured as
submatch(1). You can either make it non-capturing by adding a "%"
before the open-paren:

\%(\_^2 .*DATE.*\)

or change the "submatch(1)" to "submatch(2)"

> Here's an example of one small chunk of
> lines which were transformed by that command:
>
> 1 EVEN
> 2 TYPE tngnote
> 2 DATE 18 AUG 1776
> 2 NOTE <div class="xxx">2 DATE 18 AUG 1776</div>
> 1 EVEN

Note that the content here is what you captured in the first group.
:-)
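
The renumbering is not Vim-specific, so here is a minimal Python
sketch of the same effect. The patterns are simplified stand-ins for
yours (Python's (?:…) plays the role of Vim's \%(…\)), and the sample
text is made up.

```python
import re

text = "2 TYPE tngnote\n2 DATE 18 AUG 1776\n2 NOTE some text\n"

# With a capturing group around the optional DATE line, that group is number 1
# and the note text gets pushed to group 2.
capturing = re.compile(
    r'^2 TYPE tngnote.*(?:\n(2 .*DATE.*))?\n2 NOTE (.*)', re.MULTILINE)

# With a non-capturing group, the note text stays in group 1.
non_capturing = re.compile(
    r'^2 TYPE tngnote.*(?:\n2 .*DATE.*)?\n2 NOTE (.*)', re.MULTILINE)

m1 = capturing.search(text)
m2 = non_capturing.search(text)

print(m1.group(1))  # 2 DATE 18 AUG 1776
print(m1.group(2))  # some text
print(m2.group(1))  # some text
```

The first print shows the same DATE line that ended up inside your
<div>; the other two correspond to the two fixes (use the second
group, or stop the first one from capturing).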

Hope this helps get you on the right path,

-tim



