Monday, December 28, 2015

Re: Apparent regex bug involving `\_.' (and other newline-matching constructs)

On Mon, Dec 28, 2015 at 6:20 AM, Bram Moolenaar <Bram@moolenaar.net> wrote:
>
>
> Brett Stahlman wrote:
>
> > Given a file containing the following 2 lines...
> > 1a3
> > 123xyz
> >
> > ...try the following tests, and note the unexpected results.
> >
> > Case 1.1:
> > call cursor(1, 1)
> > echo searchpos('\%(\([a-z]\)\|\_.\)\{-}xyz', 'pcW')
> > => [1, 1, 2]
> >
> > Case 1.2:
> > call cursor(1, 2)
> > echo searchpos('\%(\([a-z]\)\|\_.\)\{-}xyz', 'pcW')
> > => [2, 1, 1]
> > Question: Why does the \_. not permit earlier match at cursor pos (1, 2)?
> > Note: Clearly, submatch should be 2, not 1, but this error is simply a
> > consequence of the first error: since match doesn't begin on 1st line,
> > the "a" at cursor pos can't be captured.
>
> This is because of the 'c' flag in 'cpoptions'. The Vi-compatible way
> of searching is to start at the first column and skip over the match.
> Then take the first match after the start position.

If that is how it works, I would have expected it to skip the match it
returned for Case 1.1 (starting position 1,1) as well. But perhaps the match
at column 1 was not skipped because of this clause in the help on 'cpo':
"...but not further than the start of the next line"? If so, the help text
isn't very clear for this case. It seems to describe search "continuation",
whereas my tests were isolated searches, each beginning at an arbitrary
buffer position. Also, the term "next line" is a bit misleading: here it
seems to refer to what would have been the next line of a *previous* search.
But I guess the Vi designers didn't want to complicate the implementation by
maintaining the state needed to distinguish a repeated search for the same
pattern (with no intervening cursor movement) from a new search...

>
> > Case 1.3:
> > call cursor(1, 3)
> > echo searchpos('\%(\([a-z]\)\|\_.\)\{-}xyz', 'pcW')
> > => [2, 1, 1]
> > Note: Why isn't a match found at cursor pos (1, 3)?
> >
> > Repeat these tests with a \zs in the pattern, and note how the capture
> > is matched unconditionally...
> >
> > Case 2.1:
> > call cursor(1, 1)
> > echo searchpos('\%(\([a-z]\)\|\_.\)\{-}\zsxyz', 'pcW')
> > => [2, 4, 2]
> >
> > Case 2.2:
> > call cursor(1, 2)
> > echo searchpos('\%(\([a-z]\)\|\_.\)\{-}\zsxyz', 'pcW')
> > => [2, 4, 2]
> >
> > Case 2.3:
> > call cursor(1, 3)
> > echo searchpos('\%(\([a-z]\)\|\_.\)\{-}\zsxyz', 'pcW')
> > => [2, 4, 2]
> > Note: Submatch should be 1, not 2, here. It's as though the \zs forces the
> > capture to match unconditionally.
> >
> > Points to note... Originally, I thought the error had to do with the 'p'
> > flag, but that appears not to be the case: the submatch errors are simply a
> > consequence of the incorrectly determined start locations. Also, it appears
> > the results would have been the same with * as they were with \{-}.
> > Finally, the unexpected behavior is not limited to \_., but is seen even
> > when (e.g.) explicit \n is used.
>
> After removing 'c' from 'cpoptions', does it work as you expect?

Not as I expected, but the first 3 tests, at least, work as I now expect.
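
For anyone who wants to repeat the comparison, here is a minimal sketch that
runs Cases 1.1 to 1.3 with and without 'c' in 'cpoptions'. The helper name
RunCases() and the scratch-buffer setup are my own assumptions, not part of
the original tests. Because it uses script-local variables, save it to a file
and :source it; the results can then be reviewed with :messages.

```vim
" Hypothetical helper (not from the original thread): run the pattern from
" each start column on line 1 and collect the searchpos() results.
function! RunCases(pat) abort
  let l:results = []
  for l:col in [1, 2, 3]
    call cursor(1, l:col)
    call add(l:results, searchpos(a:pat, 'pcW'))
  endfor
  return l:results
endfunction

" Scratch buffer holding the two test lines.
new
setlocal buftype=nofile
call setline(1, ['1a3', '123xyz'])

let s:pat = '\%(\([a-z]\)\|\_.\)\{-}xyz'
let s:save_cpo = &cpo

" Vi-compatible search continuation.
set cpo+=c
echomsg 'with c:    ' . string(RunCases(s:pat))

" Vim's own continuation rule.
set cpo-=c
echomsg 'without c: ' . string(RunCases(s:pat))

" Restore the user's setting and discard the scratch buffer.
let &cpo = s:save_cpo
bwipeout!
```

Appending \zs before the xyz in s:pat gives the second series (Cases 2.1 to
2.3).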

Case 2.3, however, makes no sense to me now. It returns...
=> [2, 4, 2]
...even though there's nothing to match the [a-z]. If I change the "1a3" to
"123", it returns...
=> [2, 4, 1]
...which tells me that the parens were capturing the "a" *before* the start
position, in spite of the 'W' flag prohibiting wrap. So the search must be
starting before the cursor position, most likely at the start of the cursor
line. I would not have expected a forward search with no lookbehind of any
sort to find anything prior to the starting cursor position. But I guess it's
not really finding a match prior to the cursor position, just checking to see
what needs to be skipped? But with &cpo no longer containing 'c', and the 'c'
flag passed to searchpos(), why would it even need this sort of "skip-over"
test prior to the cursor position?
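
The comparison above can be sketched as follows. The helper name
SearchFromCol3() is hypothetical, and the scratch buffer is just a convenient
assumption. The only difference between the two calls is the first line of
the buffer, so any difference in the third element of the result has to come
from text located before the cursor. Save it to a file and :source it.

```vim
" Hypothetical helper: load the given lines into a scratch buffer, start
" at (1, 3) and return the searchpos() result for the \zs pattern.
function! SearchFromCol3(lines) abort
  new
  setlocal buftype=nofile
  call setline(1, a:lines)
  call cursor(1, 3)
  let l:result = searchpos('\%(\([a-z]\)\|\_.\)\{-}\zsxyz', 'pcW')
  bwipeout!
  return l:result
endfunction

let s:save_cpo = &cpo
set cpo-=c

" First line contains a letter before the cursor column.
echomsg '1a3: ' . string(SearchFromCol3(['1a3', '123xyz']))
" First line contains no letter at all.
echomsg '123: ' . string(SearchFromCol3(['123', '123xyz']))

let &cpo = s:save_cpo
```

In my tests the first call gave [2, 4, 2] and the second [2, 4, 1]; results
may differ in other Vim versions.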

Thanks,
Brett S.

