Tuesday, June 18, 2013

Dealing with empty strings in regexp.

Hello all,

The following issue has been recently discussed on the Lua mailing list:
http://lua-users.org/lists/lua-l/2013-04/msg00812.html

(It has also been independently raised on the LuaTeX list:
http://tug.org/pipermail/luatex/2013-June/004418.html)

If I understand correctly, any string can be viewed as having empty
substrings interspersed between its characters. E.g. "abc" is really
"ϵaϵbϵcϵ", where "ϵ" is the empty string. Now, there seem to be two
ways to deal with those empty strings in regexps, especially regarding
the "*" operator:

- The Perl way: "X*" matches as many "X" as possible, and does not
include the following empty string.
- The Python (or sed) way: "X*" matches as many "X" as possible, and
includes the following empty string.
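
To make the picture of interspersed empty strings concrete, here is a
minimal Python sketch (an illustration only, not taken from either
thread). It lists every position where the empty pattern matches in
"abc" and marks each one with "ϵ"; the variable names are arbitrary.

```python
import re

text = "abc"

# The empty pattern matches once at every position, including both ends.
spans = [m.span() for m in re.finditer("", text)]
print(spans)   # [(0, 0), (1, 1), (2, 2), (3, 3)]

# Marking each of those positions with "ϵ" gives the expanded form.
marked = re.sub("", "ϵ", text)
print(marked)  # ϵaϵbϵcϵ
```

The two conventions below differ only in which of these empty matches
are reported when they sit right after a non-empty match.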

Leading empty strings are always included. So, the Perl way gives (I
use Ruby, since I can't speak Perl):

puts 'abc'.gsub(/[ac]*/, '(\0)')
# returns "(a)()b(c)()", really "(ϵa)(ϵ)b(ϵc)(ϵ)"

And the Python way:

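import re
print(re.sub(re.compile('([ac]*)'), '(\\1)', 'abc'))
# prints "(a)b(c)", really "(ϵaϵ)b(ϵcϵ)"
# (Python before 3.7; later versions also replace the empty matches
# that follow a non-empty match)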

(Note that adding "$" to the patterns doesn't change anything.)

Now, VimL works in the Perl way, except that "*" includes the empty
string if it is the last one in the string:

echo substitute('abc', '[ac]*', '(\0)', 'g')
" returns "(a)()b(c)", really "(ϵa)(ϵ)b(ϵcϵ)"

Personally, I find the Perl way quite counter-intuitive, but what I'm
interested in here is whether VimL is consistent or not. I.e.,
shouldn't it clearly work one way or the other?
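
To make the comparison concrete, the three behaviours above can be
reproduced with a small hand-written scanner. This is only a sketch in
Python: the function name wrap_matches and the policy names are
hypothetical, and the "drop_at_end" rule is merely a guess that fits
the single Vim result shown above, not a description of how Vim's
regexp engine actually works.

```python
import re

def wrap_matches(pattern, text, policy):
    """Wrap each match of `pattern` in parentheses, scanning left to right.

    `policy` (hypothetical) decides what happens to an empty match that
    starts exactly where the previous non-empty match ended:
      "keep"        - always emit it (the "Perl way")
      "drop"        - never emit it (the "Python way")
      "drop_at_end" - emit it unless it sits at the end of the text
                      (an assumption that reproduces the Vim output)
    """
    rx = re.compile(pattern)
    out = []
    pos = 0
    prev_end = None  # end offset of the last non-empty match
    while pos <= len(text):
        m = rx.match(text, pos)
        if m is not None and m.end() > pos:
            # Non-empty match: wrap it and continue right after it.
            out.append("(" + m.group(0) + ")")
            pos = prev_end = m.end()
            continue
        if m is not None:
            # Empty match at pos: emit "()" unless the policy drops it.
            adjacent = (pos == prev_end)
            dropped = adjacent and (
                policy == "drop"
                or (policy == "drop_at_end" and pos == len(text))
            )
            if not dropped:
                out.append("()")
        # Copy the current character (nothing at end of text) and advance.
        out.append(text[pos:pos + 1])
        pos += 1
    return "".join(out)

for policy in ("keep", "drop", "drop_at_end"):
    print(policy, wrap_matches("[ac]*", "abc", policy))
# keep (a)()b(c)()
# drop (a)b(c)
# drop_at_end (a)()b(c)
```

Seen this way, the Vim result needs a special case for the end of the
string, which is the inconsistency the question is about.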

Best,
Paul
