Monday, October 8, 2012

Re: [Rephrased] Problem with a regular expression in Vim

On Monday, October 8, 2012 1:39:46 PM UTC-5, Chris Jones wrote:
>
> I researched it a little further over the weekend, and eventually, I ran
> into this via a perl forum:
>
> | % echo 'ascii string: "string1", unicode string: "κορδόνι"' | perl -wnE 'say for /"[^"]*"/g'
> | "string1"
> | "κορδόνι"
>
> I don't know perl, but it looks like the match on the two sample strings
> includes the quotes.
>
>
> Now, if you add a capturing group around the [^"]* negated character
> class that matches the actual strings, this is what you get:
>
> | % echo 'ascii string: "string1", unicode string: "κορδόνι"' | perl -wnE 'say for /"([^"]*)"/g'
> | string1
> | κορδόνι
>
> This time the match does _not_ include the quotes.
>

Yes it does. The captured group, accessible as $1, does not include the quotes, but the match itself does. The full match (the equivalent of what gets highlighted in Vim) is accessible in $& (in Vim: \0). The Perl snippet given prints only the captured group because of how the /g flag behaves in list context when the pattern contains a capturing group. See below.

>
> Or, with our sample text:
>
> | % echo 'xxx==aaa==bbbccc==ddd==yyy' | perl -wnE 'say for /==[^=]*==/g'
> | ==aaa==
> | ==ddd==
> |
> | % echo 'xxx==aaa==bbbccc==ddd==yyy' | perl -wnE 'say for /==([^=]*)==/g'
> | aaa
> | ddd
>

In list context, if the pattern contains capturing groups, the match operator /.../g returns a list of the substrings captured by those groups rather than the full matches.

"The /g modifier specifies global pattern matching--that is, matching as many times as possible within the string. How it behaves depends on the context. In list context, it returns a list of the substrings matched by any capturing parentheses in the regular expression. If there are no parentheses, it returns a list of all the matched strings, as if there were parentheses around the whole pattern."

http://perldoc.perl.org/perlop.html#Regexp-Quote-Like-Operators

This Perl is really saying:

for each place where ==([^=]*)== matches, print the captured match
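To check the list-context behaviour quoted above, here is a minimal sketch using the sample text from this thread. The variable names ($text, @full, @captured) are mine, chosen only for illustration.

```perl
use strict;
use warnings;

my $text = 'xxx==aaa==bbbccc==ddd==yyy';

# No capturing group: list-context /g returns each full match.
my @full = $text =~ /==[^=]*==/g;

# One capturing group: list-context /g returns only what the group captured.
my @captured = $text =~ /==([^=]*)==/g;

print "full:     @full\n";       # ==aaa== ==ddd==
print "captured: @captured\n";   # aaa ddd
```

Both patterns match the same stretches of text. Only the list that /g hands back differs.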

Unlike Perl, Vim does not keep captured groups around once a search or substitute command has finished; within those commands they are available only as backreferences. In other words, Perl can do stuff like this:

$mystr =~ /==(.*)==/;
print "$1\n";

Vim cannot.

This is not a regex pattern thing. It's a language thing.

> So, I tried the same approach with Vim:
>
> | xxx==aaa==bbbccc==ddd==yyy
> |
> | /==[^=]*==
> | /==\([^=]*\)==
>
>
> But it doesn't make any difference..
>
> Both regexes match '==aaa==' and '==ddd==' including the quotes.
>

Yes, they both MATCH the delimiters (the == here, the quotes in the earlier example). But the capturing group only CAPTURES the text between them. The same is true in Perl.

Vim HIGHLIGHTS a match as if you're doing this in Perl:

print "$mystr\n" if ($mystr =~ /"([^"]*)"/);

Vim CAPTURES a group as if you're doing this in Perl:

$mystr =~ /"([^"]*)"/;
print "$1\n";

Note that the SAME pattern is used both times, but in a different way.
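The match/capture distinction can be seen from a single match in Perl. This is only a sketch: the input string is a shortened version of the earlier example, and $whole and $group are illustrative names.

```perl
use strict;
use warnings;

my $text = 'ascii string: "string1"';
my ($whole, $group);

if ($text =~ /"([^"]*)"/) {
    $whole = $&;   # the entire match, quotes included (what Vim would highlight)
    $group = $1;   # only the text captured by the parentheses
    print "match:   $whole\n";   # "string1"
    print "capture: $group\n";   # string1
}
```

One pattern, one match, two different things you can ask for afterwards.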

>
> Isn't Vim supposed to mimic perl regexes..?
>

Not really. Vim has its own dialect. Although Vim regex can do a lot of what Perl's can, it's not a 1-1 match.

>
> Or is there something in Vim's regex syntax that would make it work?
>

Not in the regex syntax. But as discussed, it's not Perl's regex syntax that makes it work in Perl either; it's how the regex is applied. Using the /g flag on a match operator in list context gives you all matching substrings (or all captured groups, if the pattern has any).

Vim can do something similar with the matchlist() function if you pass a count to it in a loop until the match fails. I'm not sure if there's a more efficient way to extract all matches or not.

--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php
