Tuesday, June 18, 2013

Re: Dealing with empty strings in regexp.

On 18.06.13 14:51, Paul Isambert wrote:
> The "*" operator should be banned, then!

Does the problem with matching empty strings arise from using "*" when
"+" should be used instead? You are presumably aware that¹:

* = 0 or more of the preceding atom.
+ = 1 or more of the preceding atom.

Thus "(a|b)+" means one or more a or b characters, and cannot match the
empty string. Use "*" instead, and you've instructed it to also match "".
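To see the difference concretely, here is a small sketch using Python's re module. That is an assumption of convenience: re is not a POSIX ERE engine, but for "*", "+", grouping and alternation these particular patterns behave the same way. The groups are written as non-capturing "(?:a|b)" rather than "(a|b)" so that findall() returns whole matches. The helper first_match() is hypothetical, written only for this illustration. The findall() result assumes Python 3.7 or later, where empty matches adjacent to a previous match are reported.

```python
import re

# "+" : one or more a/b characters, so it can never match "".
ONE_OR_MORE = re.compile(r"(?:a|b)+")
# "*" : zero or more a/b characters, so "" is also a valid match.
ZERO_OR_MORE = re.compile(r"(?:a|b)*")


def first_match(pattern, text):
    """Return the text of the leftmost match, or None if there is none."""
    m = pattern.search(text)
    return None if m is None else m.group(0)


print(first_match(ONE_OR_MORE, "xyz"))          # None: no a or b anywhere
print(repr(first_match(ZERO_OR_MORE, "xyz")))   # '': empty match at offset 0
print(first_match(ONE_OR_MORE, "xxab"))         # ab
print(repr(first_match(ZERO_OR_MORE, "xxab")))  # '': succeeds at offset 0, never reaches "ab"
print(ZERO_OR_MORE.findall("xab"))              # ['', 'ab', '']
```

The "*" pattern happily succeeds at the very first position by consuming nothing, which is exactly the empty-string surprise under discussion; switching to "+" removes it.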

There are many regex dialects: enough to fill a fat O'Reilly book, and
enough to make anyone's head hurt. One way to minimise the confusion is
to cultivate fluency in one dialect, and eschew the others.

Having long ago found POSIX BREs annoyingly full of superfluous
backslashes, I've settled on the more concise and powerful POSIX EREs.
Also, "man 7 regex" agrees that BREs are obsolete. (To get away from
obsolete regexes in vim, prefix regexes with "\v". That gives a good
approximation of POSIX EREs, consistent with many *nix utilities, so you
can switch effortlessly from awk, bash, egrep, procmail, etc., to vim
with "\v".)

Erik

¹ In POSIX EREs, and most others, though in some vim modes (including
the default "magic" mode), a bare "+" isn't "magic" and one-or-more must
be written "\+". Those obsolete regex modes are worth avoiding.

--
Leibowitz's Rule:
When hammering a nail, you will never hit your finger if you hold the
hammer with both hands.
