Saturday, November 14, 2020

Re: regex to find where 'sample text' is not followed by 'sample text' a couple of lines down

On Thu, Nov 12, 2020 at 07:15:07PM EST, Tim Chase wrote:
> On 2020-11-12 18:42, Chris Jones wrote:
> > I am proofreading a document where a few words occur on one line
> > and the same exact words are replicated two lines down.
> >
> > Here's a sample:
> >
> > | ```{=latex}
> > | \index{Text that must occur twice}
> > | ```
> > | **2507. Text that must occur twice.** ... etc.
> >
> > I found that it's easy to highlight such occurrences using (e.g.):
> >
> > | /\\index{\(.*\)}\n```\n\*\*\d\+\. \1 " (1)
> >
> > Now I noticed that once in a while the repeated text is not the
> > same as the text inside the curly brackets (i.e. in the \index{...}
> > command).
>
> As best I can tell, this should highlight \index{} entries that don't
> match text in the following N lines (3ish here, though I might have a
> fenceposting error)
>
> /\\index{\zs\(.*\)\ze}\(\%(\n.*\)\{,3\}\1\)\@!
>
> At least it passed all the tests I threw at it.

Never heard the term fenceposting... at least in this context.

I tried to figure out the intended logic behind your regex but was
unable to do so.

So I copy/pasted it, and it found 3-4 errors for me: cases where the
regexes I used to create the \index{...} tagging from my raw text had
run into, ho-hum... difficulties, and I had to do the job manually...

I find your regex very interesting as a case study, all the more so
because it uses syntax I was not aware of, such as the \%(...\) bit,
which, if I read the documentation correctly, defines a special kind of
subgroup, and the \{,3} bit, which apparently repeats whatever matches
the preceding \%(...\) group... I'm guessing...

Could you explain further?
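To make the logic of the pattern concrete, here is a sketch that translates it into Python's `re` syntax. The translation, the helper name `unmatched_index_entries`, and the sample text with a deliberate typo are illustrative assumptions, not something from the thread. Read left to right: capture the text inside `\index{...}`, then assert that it is *not* the case that the captured text appears again, either immediately after the closing brace or anywhere within the next one to three lines (the last `.*` may stop anywhere in a line). Counting the brace line plus up to three more is presumably where the "3ish" and the fencepost (off-by-one) caveat come from. In Vim, `\%(\)` is documented as just like `\(\)` but not counted as a sub-expression, so it does not shift `\1`.

```python
import re

# Python translation of the Vim pattern
#   /\\index{\zs\(.*\)\ze}\(\%(\n.*\)\{,3\}\1\)\@!
#   \(...\)     -> (...)     capturing group 1 (the index text)
#   \%(...\)    -> (?:...)   group that is not counted as \1, \2, ...
#   \{,3\}      -> {0,3}     zero to three repetitions
#   \(...\)\@!  -> (?!...)   negative lookahead
# Vim's \zs and \ze only narrow the highlighted part; here we report group 1.
UNMATCHED_INDEX = re.compile(r"\\index\{(.*)\}(?!(?:\n.*){0,3}\1)")


def unmatched_index_entries(text):
    """Return index texts that do not reappear within the next three lines."""
    return [m.group(1) for m in UNMATCHED_INDEX.finditer(text)]


FENCE = "`" * 3  # avoids writing a literal triple backtick
sample = "\n".join([
    FENCE + "{=latex}",
    r"\index{Text that must occur twice}",
    FENCE,
    "**2507. Text that must occur twice.** ... etc.",
    "",
    FENCE + "{=latex}",
    r"\index{A heading with a typo}",
    FENCE,
    "**2508. A heading with a typpo.** ... etc.",
])

print(unmatched_index_entries(sample))  # ['A heading with a typo']
```

As in the Vim original, the lookahead only asks whether the index text occurs somewhere in that window, so extra words around it on the heading line are not reported.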

> > In order to find them I tried:
> >
> > | /\\index{\(.*\)}\n```\n\*\*\d\+\. \@<!\1 " (2)
> >
> > As I understand it, the '\@<!' means that my search pattern will
> > match everything up to and including the space, followed by
> > something that differs from the current value of the '\1'
> > back-reference.
>
> The first problem there is that the "\@<!" references the atom *before*
> it (a space) rather than the atom *after* it (your \1). However, even if
> you group them, it might not-match if off by even one character. I'd
> have to play with it more to see if there are other nuances that
> would cause issues.

... and so it should. Not sure where this error in my regex crept in
since I originally copy/pasted something I found in some SE issue or
other...
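Why pattern (2) fails: ` \@<!` is a zero-width check that the character just before the current position is not a space. Right after `\.` that is always true, so `\1` is then required to start immediately after the period, and the pattern essentially never matches the sample. The fix is to attach the negation to the back-reference instead; in Vim that would presumably be `\%(\1\)\@!` after the space (untested). The sketch below shows the same idea with Python's `re`, where `(?!\1)` is a negative lookahead; the name `MISMATCHED_HEADING` and the `good`/`bad` strings are made up for illustration.

```python
import re

# Pattern (2) with the negation attached to the back-reference.
# (?!\1) succeeds only if the captured index text does NOT start right
# after "**<digits>. " on the heading line.
# `{3} matches the three-backtick fence line without writing it literally.
MISMATCHED_HEADING = re.compile(r"\\index\{(.*)\}\n`{3}\n\*\*\d+\. (?!\1)")

FENCE = "`" * 3
good = "\n".join([r"\index{Text that must occur twice}", FENCE,
                  "**2507. Text that must occur twice.** ... etc."])
bad = "\n".join([r"\index{Text that must occur twice}", FENCE,
                 "**2507. Text that must occur thrice.** ... etc."])

print(MISMATCHED_HEADING.search(good))          # None: heading repeats the text
print(MISMATCHED_HEADING.search(bad).group(1))  # Text that must occur twice
```

Note that this only checks that the heading *starts with* the index text, so a heading that merely extends it (e.g. with an extra trailing word) is not flagged.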

Thanks,

CJ

--
You received this message from the "vim_use" maillist.

---
You received this message because you are subscribed to the Google Groups "vim_use" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/vim_use/20201114211847.GC9509%40turki.local.
