Friday, April 1, 2011

Re: Regular Expression Help

Hi,

Alessandro Antonello wrote:
>
> I have a problem building a regular expression for fields in a SQL UPDATE
> statement. The SET clause has the format as follows:
>
> field1 = 'value of field1', field2 = 'field\' 2 value'
>
> I have built a regular expression that can split the name of fields from its
> values. But when a value has an escaped single-quote the regular expression
> fails. I know that it fails because I am using the expression [^'] in between
> the parenthesis. But how to fix this?
>
> This is the regular expression I am using:
>
> /(\w+)\s*=\s*(('[^']*'\s*,)|(\d+\s*,))\s*/i
>
> Could someone help me with that?

there are some problems with your regular expression: first, Vim needs a
lot of backslashes where Perl or other regexp variants do not. You can
fix this by putting \v inside your regexp (see ":help /\v"). As a
consequence you now have to prefix the equal sign with a backslash:

/\v(\w+)\s*\=\s*(('[^']*'\s*,)|(\d+\s*,))\s*/i

Now you want to match single quoted strings which might include a
backlash-escaped single quote. That means "'[^']'*" is not enough;
inside your string everything but single quotes should be allowed, or
alternative a backslash-escaped single quote. This leads us to
"'([^']|\\')*'" for a string:

/\v(\w+)\s*\=\s*(('([^']|\\')*'\s*,)|(\d+\s*,))\s*/i

This still won't match you example, because "\s*," is expected after
every string or number. First lets move them out of the parentheses:

/\v(\w+)\s*\=\s*(('([^']|\\')*')|(\d+))\s*,\s*/i

Now you have to decide: do you want this regular expression to match a
single field or a list of fields. A single field is (almost) easy: just
strip "\s*,\s*" from the end:

/\v(\w+)\s*\=\s*(('([^']|\\')*')|(\d+))/i

Both fields in your example get matched now, but the second match is too
short. This is due to the way alternatives are tested. You can fix this
by swapping "[^']" and "\\'":

/\v(\w+)\s*\=\s*(('(\\'|[^'])*')|(\d+))/i

This correctly matches both fields. Before we continue lets get rid of
some superfluous parentheses. You don't need them around the
alternatives as long as the whole alternation is wrapped with
parentheses:

/\v(\w+)\s*\=\s*('(\\'|[^'])*'|\d+)/i

If you want to match a list of fields you must consider that there might
be no trailing comma after the last field of the list or -- expressed in
a slightly different way -- that every field has a leading comma except
the first one. For a list only one field is needed; more fields are
optional and separated from their predecessors with (optional) whitespace,
a comma, and (again optional) whitespace:

/\v(\w+)\s*\=\s*('(\\'|[^'])*'|\d+)(\s*,\s*\v(\w+)\s*\=\s*('(\\'|[^'])*'|\d+))*

That's it.

Regards,
Jürgen


--
Sometimes I think the surest sign that intelligent life exists elsewhere
in the universe is that none of it has tried to contact us. (Calvin)

--
You received this message from the "vim_use" maillist.
Do not top-post! Type your reply below the text you are replying to.
For more information, visit http://www.vim.org/maillist.php

No comments: