Wednesday, December 23, 2020

Re: Substitute pattern over multiple lines


On Wed, Dec 23, 2020 at 9:31 PM Tim Chase <vim@tim.thechases.com> wrote:
On 2020-12-23 20:39, John Cordes wrote:
>> I'd start with this ugly monstrosity:
>>
>> :%s/^2 \u\{3,} \zs\(.*\n\(\%(\D\|3 CONC \).*\n\)\+\)/\='<div 
>> class="xxx">'.substitute(substitute(submatch(1), '\n3 CONC ', '',
>> 'g'), '\n', '', 'g')."<\/div>\n"
>
>  I will attempt to deconstruct your 'monstrosity' somewhat later,

Tweaking it so that it only does NOTE items, not generic
continuations:

:%s/^2 NOTE \zs\(.*\n\%(\%(\D\|3 CONC \).*\n\)\+\)/\='<div class="xxx">'.substitute(substitute(submatch(1), '\n3 CONC ', '', 'g'), '\n', '', 'g')."<\/div>\n"

Breaking it down so hopefully you can swap parts as you see fit:

:%s/^2 NOTE \zs     On every line starting with "2 NOTE "
                    start our replacement here (\zs)
\(                  start capturing the note
                    this will be submatch(1) later
.*                  everything else on that line
\n                  and the newline
\%(                 a non-capturing group for another line that
\%(\D               starts with either a non-digit
\|                  or
3 CONC              a literal "3 CONC "
\)                  (end of this OR of things marking a continuation)
.*\n                followed by the rest of the line
\)                  (end of this continuation-line)
\+                  we can have 1 or more continuation lines
\)                  end the capturing
/                   replace it with
\=                  the result of evaluating this expression
'<div class="xxx">' the literal opening tag
.                   and then the results of
substitute(         remove all the newlines from the results of
 substitute(        removing from
  submatch(1),      the whole set of continuation stuff
  '\n3 CONC ',      the literal newline-followed-by-"3 CONC "
  '',               and replace them with nothing
  'g'               everywhere
  ),                and in that "\n3 CONC "-less text, replace
 '\n',              newlines with
 '',                nothing
 'g')               everywhere
.                   and then tack on
"<\/div>\n"         the literal closing </div> followed by a newline
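
The two nested substitute() calls can be tried on their own before running the full command. A minimal sketch, assuming a made-up sample string (the variable names "sample" and "joined" are purely illustrative and not part of the command above):

```vim
" Hypothetical sample: a note body split by one "3 CONC " continuation line.
let sample = "born while th\n3 CONC e family was in London\n"
" The inner call removes each newline-plus-"3 CONC "; the outer call then
" removes whatever newlines remain.
let joined = substitute(substitute(sample, '\n3 CONC ', '', 'g'), '\n', '', 'g')
echo '<div class="xxx">' . joined . '</div>'
```

This should echo the sample as a single line wrapped in the div, with "th" and "e family" rejoined into "the family".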

>  It's a bit more complicated than I first explained. Two aspects:
> a) I *do* need to search on the "2 NOTE" lines, since there are
> various other chunks of lines with the CONC lines; and
> b) Sometimes the line "2 TYPE tngnote" has a line between it and
> the "2 NOTE". The intervening line can look like this
>
> 2 DATE 18 AUG 1776
>  or this
> 2 _SDATE 1802

Given the substitution command above, it should only touch "2 NOTE"
lines with subsequent "3 CONC" lines.  It does *every* such "2 NOTE",
so if you need to limit them to just those that immediately follow
"2 TYPE tngnote" (assuming there aren't any "2 TYPE tngnote" that
*don't* have a NOTE immediately following them), you can tweak that
command, changing that initial "%" to

:g/^2 TYPE tngnote//2 NOTE /s/^2 NOTE \zs…

This looks for all the "2 TYPE tngnote" lines, searches forward
(skipping over any DATE/_SDATE lines or other intervening stuff) for
the "2 NOTE " line following it, and then only performs the
substitution on those particular lines.
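
Before letting the :g version modify anything, it may be worth previewing which lines it would land on. A small sketch that keeps the same :g address as above but swaps the substitution for the harmless :# command (print with line number):

```vim
" For each "2 TYPE tngnote" line, search forward to the next "2 NOTE " line
" and print it with its line number. Nothing in the buffer is changed.
:g/^2 TYPE tngnote//2 NOTE /#
```

If a printed line is not a note you expected, a "2 TYPE tngnote" without its own NOTE is probably picking up the note of a later record, which is the assumption mentioned above.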

>  So the lines to change could look like this:
>
> ===================
> 1 EVEN
> 2 TYPE tngnote
> 2 _SDATE 1802
> 2 NOTE The surname of John's wife is not positively established.
> However, it is certain that her given name is Elizabeth; evidence
> for this comes first from the baptismal records for Rebecca and
> Eliza Catherine; these children were born while th
> 3 CONC e family was in London so the records are available in the
> London Metropolitan Archives (the other two children were born in
> Sheffield). Henry's baptismal record in Sheffield also has his
> parents being John (a skinner) and Elizabeth. The id
> 3 CONC entification of John's wife specifically with  Elizabeth
> Coxsey is somewhat tentative, however.
> 1 EVEN
> ===================
>
>  This search pattern
> /^2 TYPE tngnote.*\n*\(\_^2 .*DATE.*\)*\n\_^2 NOTE
>
>  works to find all 3 possibilities: no DATE line, an _SDATE line
> or a DATE line.
>
>  I thought I would be able to combine that with your pattern like
> so:
>
> :%s/^2 TYPE tngnote.*\n*\(\_^2 .*DATE.*\)*\n\_^2 NOTE
> \zs\(.*\n\(\%(\D\|3 CONC \).*\n\)\+\)/\='<div
> class="xxx">'.substitute(substitute(submatch(1), '\n3 CONC ', '',
> 'g'), '\n', '', 'g')."<\/div>\n"
>
>  but that is not working.

I suspect that the problem snuck in by using \(…\) in your added
conditions which captured that as submatch(1).  So you can either
make it non-capturing by adding that "%" before the open-paren:

  \%(\_^2 .*DATE.*\)

or change the "submatch(1)" to "submatch(2)"
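
Applying the first option to John's combined command would give roughly the following. This is an untested sketch: it keeps the TYPE/DATE prefix exactly as quoted, only turns the DATE group into a non-capturing \%(…\) group, and uses the non-capturing continuation group from the NOTE-only version earlier in this message, so that submatch(1) is the note text again. It is meant to be entered as one line, although it may be wrapped here.

```vim
" DATE group is now \%(...\), so the note body is the first capture group.
:%s/^2 TYPE tngnote.*\n*\%(\_^2 .*DATE.*\)*\n\_^2 NOTE \zs\(.*\n\%(\%(\D\|3 CONC \).*\n\)\+\)/\='<div class="xxx">'.substitute(substitute(submatch(1), '\n3 CONC ', '', 'g'), '\n', '', 'g')."<\/div>\n"
```

Trying it first on a copy of the file, or undoing with "u" afterwards, is a cheap way to confirm the DATE and _SDATE lines are left alone.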

> Here's an example of one small chunk of
> lines which were transformed by that command:
>
> 1 EVEN
> 2 TYPE tngnote
> 2 DATE 18 AUG 1776
> 2 NOTE <div class="xxx">2 DATE 18 AUG 1776</div>
> 1 EVEN

Note that the content here is what you captured in the first group.
:-)

Hope this helps get you on the right path,

-tim

 
 This is amazing looking, Tim -- thanks so much! There is a lot for a nearly 80-year-old to unpack here -- it's going to take me a while. :)
  It looks as though you have covered all the bases I want to deal with. 

 Thank you again,
 John
    
