Monday, December 28, 2015

Re: Apparent regex bug involving `\_.' (and other newline-matching constructs)

Brett Stahlman wrote:

> > > Given a file containing the following 2 lines...
> > > 1a3
> > > 123xyz
> > >
> > > ...try the following tests, and note the unexpected results.
> > >
> > > Case 1.1:
> > > call cursor(1, 1)
> > > echo searchpos('\%(\([a-z]\)\|\_.\)\{-}xyz', 'pcW')
> > > => [1, 1, 2]
> > >
> > > Case 1.2:
> > > call cursor(1, 2)
> > > echo searchpos('\%(\([a-z]\)\|\_.\)\{-}xyz', 'pcW')
> > > => [2, 1, 1]
> > > Question: Why does the \_. not permit an earlier match at cursor pos (1, 2)?
> > > Note: Clearly, submatch should be 2, not 1, but this error is simply a
> > > consequence of the first error: since match doesn't begin on 1st line,
> > > the "a" at cursor pos can't be captured.
> >
> > This is because of the 'c' flag in 'cpoptions'. The Vi-compatible way
> > of searching is to start at the first column and skip over the match.
> > Then take the first match after the start position.
>
> If this is how it works, then I would have assumed it would have skipped the
> match it returned for Case 1.1 (at starting position 1,1). But perhaps not
> skipping the match at column 1 had something to do with (from help on 'cpo')
> "...but not further than the start of the next line"? If so, the help text
> isn't very clear in this case. It seems to be describing search
> "continuation", and my tests were for an isolated search beginning at an
> arbitrary buffer position. Also, the term "next line" is a bit misleading:
> in this case, it seems to refer to what would have been the next line of a
> *previous* search. But I guess the Vi designers didn't want to complicate
> the implementation by maintaining the state needed to differentiate between
> a subsequent search for the same pattern without intervening cursor movement
> and a new search...
>
> >
> > > Case 1.3:
> > > call cursor(1, 3)
> > > echo searchpos('\%(\([a-z]\)\|\_.\)\{-}xyz', 'pcW')
> > > => [2, 1, 1]
> > > Note: Why isn't a match found at cursor pos (1, 3)?
> > >
> > > Repeat these tests with a \zs in the pattern, and note how the capture
> > > is matched unconditionally...
> > >
> > > Case 2.1:
> > > call cursor(1, 1)
> > > echo searchpos('\%(\([a-z]\)\|\_.\)\{-}\zsxyz', 'pcW')
> > > => [2, 4, 2]
> > >
> > > Case 2.2:
> > > call cursor(1, 2)
> > > echo searchpos('\%(\([a-z]\)\|\_.\)\{-}\zsxyz', 'pcW')
> > > => [2, 4, 2]
> > >
> > > Case 2.3:
> > > call cursor(1, 3)
> > > echo searchpos('\%(\([a-z]\)\|\_.\)\{-}\zsxyz', 'pcW')
> > > => [2, 4, 2]
> > > Note: Submatch should be 1, not 2, here. It's as though the \zs forces the
> > > capture to match unconditionally.
> > >
> > > Points to note... Originally, I thought the error had to do with the 'p'
> > > flag, but that appears not to be the case: the submatch errors are simply a
> > > consequence of the incorrectly determined start locations. Also, it appears
> > > the results would have been the same with * as they were with \{-}.
> > > Finally, the unexpected behavior is not limited to \_., but is seen even
> > > when (e.g.) explicit \n is used.
> >
> > After removing 'c' from 'cpoptions', does it work as you expect?
>
> Not as I expected, but the first 3 tests, at least, work as I now expect.
>
> Case 3.3, however, makes no sense to me now. It returns...
> => [2, 4, 2]
> ...even though there's nothing to match the [a-z]. If I change the "1a3" to
> "123", it returns...
> => [2, 4, 1]
> ...which tells me that the parens were capturing the "a" *before* the start
> position, in spite of the 'W' flag prohibiting wrap. This tells me that the
> search must be starting before the cursor position, most likely at the start
> of the cursor line. I would not have expected that a forward search with no
> lookbehind of any sort could find anything prior to the starting cursor
> position. But I guess it's not really finding a match prior to the cursor
> position - just checking to see what needs to be skipped? But with &cpo no
> longer containing 'c', and the 'c' flag passed to searchpos(), why would it
> even need this sort of "skip-over" test prior to cursor position?

The search always starts in the first column of the cursor line. When a
match is found that starts before the cursor, another search is done from a
later position. The Vi-compatible way (with 'c' in 'cpo') is to continue
after the end of the matched text. When 'c' is removed from 'cpo', the search
continues from the column after the start of the match.

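With the \zs, the search that starts in the first column reports a match
position (where the \zs is) that is after the start position, so it is
accepted as it is, including any submatch captured before the cursor. Without
the \zs, the reported column would be the first column, which is before the
cursor, so the search would go on.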
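To make the retry loop described above concrete, here is a rough sketch in Python rather than Vim script; it is not Vim's actual code. It assumes Python's `re` module is a close enough stand-in for Vim's regex engine for this example: `[\s\S]` plays the part of `\_.`, and an empty group named `zs` marks where `\zs` would be, since `re` has no `\zs`. The buffer is one string with "\n" between lines, and `sketch_forward_search()` and `searchpos_like()` are hypothetical names. When 'c' is in 'cpo' and the skipped-over match runs into the next line, the sketch assumes the search resumes at the start of that line (one reading of "not further than the start of the next line" in the 'cpo' help).

```python
import re

# Assumed stand-ins for the thread's patterns: [\s\S] acts like \_. and the
# empty group named "zs" marks where \zs would be.
PLAIN = re.compile(r"(?:([a-z])|[\s\S])*?xyz")
WITH_ZS = re.compile(r"(?:([a-z])|[\s\S])*?(?P<zs>)xyz")


def sketch_forward_search(text, cursor, pattern, cpo_c, accept_at_cursor=True):
    """Return (start_offset, group_1) or None; cursor is a 0-based offset."""
    line_start = text.rfind("\n", 0, cursor) + 1
    line_end = text.find("\n", cursor)
    if line_end == -1:
        line_end = len(text)
    col = line_start  # the search always starts in the first column
    while col <= len(text):
        m = pattern.search(text, col)
        if m is None:
            return None
        # The reported start is the \zs position when the pattern has one.
        start = m.start("zs") if "zs" in pattern.groupindex else m.start()
        if start > cursor or (accept_at_cursor and start == cursor):
            # Accepted; group 1 may hold text captured before the cursor.
            return start, m.group(1)
        if cpo_c:
            if m.end() > line_end:
                # Skipping this match would leave the cursor line: assume the
                # search resumes at the start of the next line.
                col = line_end + 1
                continue
            col = max(m.end(), start + 1)  # after the match, +1 if it was empty
        else:
            col = start + 1  # one column after the reported start
        if col >= line_end:
            col = line_end + 1  # nothing left to try on the cursor line
    return None


def searchpos_like(text, lnum, col, pattern, cpo_c):
    """Hypothetical helper mimicking searchpos(pattern, 'pcW') from (lnum, col)."""
    lines = text.split("\n")
    cursor = sum(len(ln) + 1 for ln in lines[: lnum - 1]) + col - 1
    found = sketch_forward_search(text, cursor, pattern, cpo_c)
    if found is None:
        return None
    start, group1 = found
    line = text.count("\n", 0, start) + 1
    column = start - (text.rfind("\n", 0, start) + 1) + 1
    # 'p' flag: number of the first matching sub-pattern plus one, else 1.
    return [line, column, 2 if group1 is not None else 1]
```

With `text = "1a3\n123xyz"`, `searchpos_like(text, 1, 2, PLAIN, cpo_c=True)` returns `[2, 1, 1]` as in Case 1.2, and `searchpos_like(text, 1, 3, WITH_ZS, cpo_c=False)` returns `[2, 4, 2]`: the first attempt, begun in column 1, reports the \zs position on line 2, which is past the cursor, so it is accepted together with the "a" it captured on line 1.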

I can see this is not what you expect or what you want. We can add
another flag to actually start at the search start position.
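A sketch of the difference such a flag could make, again assuming Python's `re` module as a stand-in for Vim's engine: `[\s\S]` plays the part of `\_.`, an empty group named `zs` marks where `\zs` would be, and `search_from_cursor()` is a hypothetical name, not an existing Vim flag or function. Matching begins at the cursor offset itself, so group 1 cannot capture anything before the cursor.

```python
import re

# Assumed stand-in for the thread's \zs pattern.
WITH_ZS = re.compile(r"(?:([a-z])|[\s\S])*?(?P<zs>)xyz")


def search_from_cursor(text, cursor, pattern):
    """Hypothetical start-at-cursor search; returns (start_offset, group_1) or None."""
    m = pattern.search(text, cursor)  # never looks for a match start before cursor
    if m is None:
        return None
    # Report the \zs position when the pattern has one.
    start = m.start("zs") if "zs" in pattern.groupindex else m.start()
    return start, m.group(1)
```

For the "1a3" buffer with the cursor at (1, 3), i.e. offset 2, this returns `(7, None)`: the match is still reported at line 2, column 4, but with no submatch, which is the result expected in the thread.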

--
Computers are not intelligent. They only think they are.

/// Bram Moolenaar -- Bram@Moolenaar.net -- http://www.Moolenaar.net \\\
/// sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\ an exciting new programming language -- http://www.Zimbu.org ///
\\\ help me help AIDS victims -- http://ICCF-Holland.org ///
