Tuesday, April 26, 2011

Re: How to match A but not B

Reply to message «How to match A but not B»,
sent 23:58:07 27 April 2011, Wednesday
by howard Schwartz:

> /^\%(.*PPD\)\@!.*DEBIT
In this case the regex inside `\%(...\)' is processed as usual, but the
position is rejected if it matches.
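
To see the effect outside Vim, here is a rough sketch using Python's `re'
module, where `(?!...)' plays the role of `\%(...\)\@!'. The translation of the
pattern and the sample lines are my own assumptions, not data from the original
question.

```python
import re

# Python analogue (an assumption of this sketch) of the Vim pattern
#   /^\%(.*PPD\)\@!.*DEBIT
no_ppd_then_debit = re.compile(r"^(?!.*PPD).*DEBIT")

# Hypothetical sample lines, invented for illustration.
lines = [
    "DEBIT 12.50 GROCERY",
    "DEBIT 40.00 PPD ELECTRIC",
    "CREDIT 99.00 PPD REFUND",
    "CREDIT 5.00 INTEREST",
]

# A line is kept only if the look-ahead does not find PPD anywhere after
# the start of the line and the rest of the pattern then finds DEBIT.
kept = [line for line in lines if no_ppd_then_debit.search(line)]
print(kept)
```

Only the first line should survive: the second and third contain `PPD', and the
fourth has no `DEBIT'.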

> I was aware of \@!, but the documentation warned that .* will match
> everything including PPD, before the pattern can find PPD and exclude it.
> That is why one must know something about where a string is, to exclude it
> in a pattern match, no?
That does not apply when `.*' is inside `\@!': the regex inside `\@!' is
processed like any other regex, and the effect of `\@!' is that the position is
discarded if it matches. So when `.*' is inside the negative look-ahead group,
the regex engine will backtrack until it finds `PPD', just as it does with
`/^.*PPD/'. If `.*' is outside the group, the engine has no reason to backtrack.
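
The same behaviour can be sketched in Python's `re' module, which I am using
here only as a stand-in for Vim's engine (the sample line is hypothetical, and
`(?!...)' is Python's spelling of `\%(...\)\@!'):

```python
import re

# Hypothetical sample line containing PPD twice.
line = "DEBIT PPD x PPD y"

# Greedy .* first runs to the end of the line, then backs up until PPD fits,
# so the match ends at the last PPD.
greedy = re.match(r".*PPD", line)
print(greedy.group(0))

# The same regex inside a negative look-ahead: it finds PPD, so position 0
# is discarded and there is no match.
rejected = re.match(r"(?!.*PPD)", line)
print(rejected)

# Without PPD the inner regex fails, so the look-ahead accepts the position.
accepted = re.match(r"(?!.*PPD)", "DEBIT x y")
print(accepted is not None)
```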

> /^DEBIT.*\%(PPD\)\@!/
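In this case it will first process `DEBIT.*' and only then check `PPD'. Of
course `PPD' won't match there, as `.*' has already consumed all non-newline
characters, so the look-ahead never rejects anything and every line starting
with `DEBIT' matches. You should use `/^DEBIT\%(.*PPD\)\@!/' here.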
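
A small comparison of the two placements, again sketched with Python's `re'
module as an assumed equivalent of the Vim patterns (sample lines are made up):

```python
import re

# Assumed Python equivalents of the two Vim patterns discussed above.
misplaced = re.compile(r"^DEBIT.*(?!PPD)")   # /^DEBIT.*\%(PPD\)\@!/
corrected = re.compile(r"^DEBIT(?!.*PPD)")   # /^DEBIT\%(.*PPD\)\@!/

# Hypothetical sample lines.
with_ppd = "DEBIT 40.00 PPD ELECTRIC"
without_ppd = "DEBIT 12.50 GROCERY"

# With the look-ahead after .*, the check happens at the end of the line,
# where PPD can never follow, so both lines match.
misplaced_hits = [bool(misplaced.search(s)) for s in (with_ppd, without_ppd)]

# With .* inside the look-ahead, the line containing PPD is rejected.
corrected_hits = [bool(corrected.search(s)) for s in (with_ppd, without_ppd)]

print(misplaced_hits, corrected_hits)
```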

> Using "\@!" is tricky, because there are many places where a pattern
> does not match. "a.*p\@!" will match from an "a" to the end of the
> line, because ".*" can match all characters in the line and the "p"
> doesn't match at the end of the line.
>
> Also the order seems wrong to me, but maybe this does not matter? The
> string, DEBIT always occurs at the beginning of the line, if it occurs at
> all, and the string, PPD always occurs after DEBIT, if it occurs at all.
What order? I do not see anything wrong.

> Would the pattern not be:
>
> /^DEBIT.*\%(PPD\)\@!/
>
> And then there is the problem that .* will match PPD before the exclude
> operator \@! can exclude it?
You are right, and `a.*p\@!' is the same as `a.*\%(p\)\@!', but `\@!' is not an
exclude operator. It is a negative look-ahead, which means `discard the position
if the pattern matches'. The correct regex is `/\v^DEBIT%(.{-}PPD)@!/' (`{-}'
should be a bit faster, as it removes the reason to backtrack).
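
As a hedged illustration, the following Python sketch mirrors the documentation
example and compares a greedy and a non-greedy look-ahead. Python's `.*?' is
used as an assumed counterpart of Vim's `.{-}'; the sample strings are invented,
and the sketch only checks that both forms accept the same lines, not which one
is faster.

```python
import re

# Documentation example: a.*p\@! in Vim, written as a.*(?!p) here.
# .* runs to the end of the line, where no p follows, so everything matches.
doc_example = re.match(r"a.*(?!p)", "apple pie")
print(doc_example.group(0))

# Greedy and non-greedy variants of the corrected pattern.
greedy = re.compile(r"^DEBIT(?!.*PPD)")
lazy = re.compile(r"^DEBIT(?!.*?PPD)")   # analogue of /\v^DEBIT%(.{-}PPD)@!/

# Hypothetical sample lines.
samples = [
    "DEBIT 12.50 GROCERY",
    "DEBIT 40.00 PPD ELECTRIC",
    "DEBIT PPD",
    "CREDIT 5.00 INTEREST",
]

greedy_hits = [bool(greedy.search(s)) for s in samples]
lazy_hits = [bool(lazy.search(s)) for s in samples]
print(greedy_hits, lazy_hits)
```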

Original message:
> Benjamen suggested:
>
> /^\%(.*PPD\)\@!.*DEBIT
>
> I was aware of \@!, but the documentation warned that .* will match
> everything including PPD, before the pattern can find PPD and exclude it.
> That is why one must know something about where a string is, to exclude it
> in a pattern match, no?
>
> To quote the documentation:
>
> Using "\@!" is tricky, because there are many places where a pattern
> does not match. "a.*p\@!" will match from an "a" to the end of the
> line, because ".*" can match all characters in the line and the "p"
> doesn't match at the end of the line.
>
> Also the order seems wrong to me, but maybe this does not matter? The
> string, DEBIT always occurs at the beginning of the line, if it occurs at
> all, and the string, PPD always occurs after DEBIT, if it occurs at all.
> Would the pattern not be:
>
> /^DEBIT.*\%(PPD\)\@!/
>
> And then there is the problem that .* will match PPD before the exclude
> operator \@! can exclude it?
