Saturday, November 14, 2020

Re: regex to find where 'sample text' is not followed by 'sample text' a couple of lines down

On 2020-11-14 16:18, Chris Jones wrote:
> On Thu, Nov 12, 2020 at 07:15:07PM EST, Tim Chase wrote:
> > As best I can tell, this should highlight \index{} entries that
> > don't match text in the following N lines (3ish here, though I
> > might have a fenceposting error)
> >
> > /\\index{\zs\(.*\)\ze}\(\%(\n.*\)\{,3\}\1\)\@!
> >
> > At least it passed all the tests I threw at it.
>
> Never heard the term fenceposting... at least in this context.

Fenceposting errors are off-by-one errors (etymologically stemming
from the fact that to put up N fence segments you need N+1
fenceposts). So that "\{,3\}" might include one line too many/few.
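One way to settle the fencepost question is to test it. Below is a rough translation of the lookahead into Python's `re` module (this is Python, not Vim itself). `(?:\n.*){0,3}` stands in for `\%(\n.*\)\{,3\}` and `(?!…)` stands in for `\@!`. The translation relies on `.` not matching a newline in either engine by default. The helper `flags_entry` and its sample text are made up for illustration. It places the indexed word `distance` lines below the `\index{…}` line and reports whether the entry would be highlighted.

```python
import re


def flags_entry(distance: int, window: int = 3) -> bool:
    """Hypothetical helper: return True if an \\index{...} entry whose term
    sits `distance` lines further down would be flagged (highlighted)."""
    # Roughly the Vim pattern /\\index{\zs\(.*\)\ze}\(\%(\n.*\)\{,3\}\1\)\@!
    # (.*) captures the index term. The negative lookahead fails, so the
    # entry is NOT flagged, if the term occurs within `window` lines below.
    pattern = re.compile(r"\\index\{(.*)\}(?!(?:\n.*){0,%d}\1)" % window)
    filler = ["filler"] * (distance - 1)
    text = "\n".join([r"\index{widget}"] + filler + ["a widget here"])
    return pattern.search(text) is not None


for d in range(1, 6):
    print(d, flags_entry(d))
```

With the default window of 3, distances 1 through 3 print False and distances 4 and 5 print True. So "\{,3\}" reaches exactly three lines down. Its zero-repetition case only checks for the term immediately after the closing brace.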

> So I copy/pasted it and it found 3-4 errors for me... cases where
> the regexes that I used to create the '\index etc. tagging from my
> raw text had run into ho-hum... difficulties... and I had to do the
> job manually...

Yeah, it won't catch multi-line \index{…} statements, if you have
any, so you might have to go back and revisit such instances
manually, but you should be able to find them with something like
/\\index{[^}]*$
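If you would rather get a list of every such line at once, here is a rough Python equivalent of that search. Vim's `[^}]` does not match a newline, but Python's does, so the sketch excludes `\n` explicitly. The function name `unclosed_index_lines` is made up for illustration.

```python
import re

# Vim: /\\index{[^}]*$ finds an \index{ with no closing brace before end of line.
# In Python, [^}] would also match "\n", so exclude it explicitly. re.M makes
# $ match at every line end, like Vim's $.
UNCLOSED = re.compile(r"\\index\{[^}\n]*$", re.M)


def unclosed_index_lines(text: str) -> list[int]:
    """Hypothetical helper: 1-based line numbers of \\index{ entries that
    continue onto a following line."""
    return [text.count("\n", 0, m.start()) + 1 for m in UNCLOSED.finditer(text)]


sample = "\\index{one two\nthree}\nplain text\n\\index{four}\n"
print(unclosed_index_lines(sample))  # [1]
```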

> I find your regex very interesting as a case study... all the more so
> because it uses syntax I was not aware of... such as the '\%(...)
> bit that if I read the documentation correctly defines a special
> kind of subgroup... the \{,3} bit that apparently can be used to
> repeat whatever matches in the preceding \(...) - I'm guessing...

The "\%(…\)" is the same as the grouping/capturing "\(…\)"
except, well, it doesn't capture (for later reuse with "\1",
"\2", etc., or in an expression with "submatch(2)"). I find that if I
explicitly use "\(…\)" when I want to capture and "\%(…\)"
everywhere else, it's a lot clearer which groups I intended to capture
and which are just there for grouping purposes. In retrospect, I should
have also used it on the outer grouping, since I don't reuse that
capture:

/\\index{\zs\(.*\)\ze}\%(\%(\n.*\)\{,3\}\1\)\@!
                      ^ ^
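Python's `re` module makes the same distinction, spelling it `(…)` for capturing and `(?:…)` for non-capturing. That makes it an easy way to see what the change buys. The snippet below is a hand translation into Python, not Vim itself. Counting groups shows that only the index term is captured in the revised pattern, while both versions still match the same text.

```python
import re

# Hand translations of the two Vim patterns: \( \) -> ( ), \%( \) -> (?: ),
# \{,3\} -> {0,3}, and \@! on the preceding group -> (?! ).
original = re.compile(r"\\index\{(.*)\}(?!((?:\n.*){0,3}\1))")
revised = re.compile(r"\\index\{(.*)\}(?!(?:(?:\n.*){0,3}\1))")

# The outer group used to capture something that was never reused.
print(original.groups, revised.groups)  # 2 1
```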

So the part up to the first "}" finds the "\index{…}" and captures
the stuff inside for later reuse as "\1". It then has a big group

\%(…\)

that it asserts should not be findable here

\@!

Inside that "assert you can't find this following the \index{…}"
portion, it looks for "a newline followed by anything, up to three
times" followed by the stuff we captured earlier ("\1").

If it finds such a match, it's good, so the "you can't find this \@!"
assertion fails and it doesn't highlight. If it doesn't find such a
match, it's either because it's too far away (try increasing the "3"
to some further distance) or because the text is present but doesn't
match what was inside the \index{…}, which could be because of typos
or because of line-breaks. E.g., if you have

\index{one two three}
one two
three

it will highlight it as a non-match because the line-breaks make the
two different.
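To experiment outside Vim, the whole search can be approximated in Python and run over a file's text to list every suspect entry. This sketch rests on a few assumptions. Python's `re` stands in for Vim's regex engine, and `m.group(1)` plays the role of the `\zs…\ze` highlight. The names `unmatched_index_terms` and `window` are made up. The sample input reuses the line-break example above.

```python
import re


def unmatched_index_terms(text: str, window: int = 3) -> list[tuple[int, str]]:
    """Hypothetical helper: (line number, term) for each \\index{term} whose
    term does not appear verbatim within the next `window` lines."""
    # \\index\{(.*)\}       ~ \\index{\zs\(.*\)\ze}   (group 1 = highlighted part)
    # (?!(?:\n.*){0,N}\1)   ~ \%(\%(\n.*\)\{,N\}\1\)\@!
    pattern = re.compile(r"\\index\{(.*)\}(?!(?:\n.*){0,%d}\1)" % window)
    return [
        (text.count("\n", 0, m.start()) + 1, m.group(1))
        for m in pattern.finditer(text)
    ]


sample = "\n".join([
    r"\index{one two three}",
    "one two",
    "three",
    r"\index{fence}",
    "a fence post",
])
print(unmatched_index_terms(sample))  # [(1, 'one two three')]
```

The "fence" entry passes because its term sits on the next line. "one two three" is flagged because no single line contains it verbatim.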

> Could you explain further?

Hopefully the explanation above makes enough sense that you can start
experimenting with it and feel more confident in your abilities to
use it to bludgeon future problems, bending them to your will. :-D

-tim




--
--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php

---
You received this message because you are subscribed to the Google Groups "vim_use" group.
To unsubscribe from this group and stop receiving emails from it, send an email to vim_use+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/vim_use/20201114191626.348fc8c4%40bigbox.attlocal.net.
