Thursday, August 23, 2012

Re: Vim perl highlighting doesn't understand UTF-8? (vimRE needs perlRE enhancing)

On Thu, 23 Aug 2012, Linda W wrote:

> Benjamin R. Haskell wrote:
>> Possibly. Another possibility is that Perl-syntax gurus use the
>> Perl-specific Vim group: https://groups.google.com/group/vim-perl
> ---
> Possibly, but I doubt it -- since the only posts there since Apr are a
> month old relating to a build problem in vim 7.3 when building in perl
> support.
>
> Given the traffic on that list, anyone who knows how vim does
> perl-syntax would be on this list, as they'd have to have a fairly
> good knowledge of vim -- and that list doesn't have the traffic to
> support such knowledge, by itself.

The author/maintainer of the "current" vim-perl (scare quotes because
it's not the version that ships w/ Vim, AFAIK) is on that list. So,
despite the lack of traffic, it probably does support the amount of Vim
knowledge required.


>> [ed. Trimming interesting, but not-really Vim-related discussion of
>> Perl identifiers]
>
> [...]
>
>> Vim's regular expressions aren't great for character sets that don't
>> fit into (single) 8-bit characters.
>
> Yikes -- it doesn't work at all with UTF-8 -- it's only ASCII compat!
> I didn't know that.
>
> While I've found Vim's RE difficult to memorize all of its special
> cases, I didn't know it was also incompatible with its default text
> mode.

That's not what I stated. It's not that it "doesn't work at all" or
that it's "incompatible". It's that some common shortcuts don't mean
what one might expect. Some of Vim's shortcut character classes are
defined only for ASCII (so they don't line up with their counterparts
in Perl-compatible RE's). The problematic ones used in the Perl syntax
definitions are:

    \w    [0-9A-Za-z_]    word character
    \h    [A-Za-z_]       head of word character

Others that are used in the Perl syntax definitions and are ASCII-only
in Vim, but that I don't think make a difference for Perl:

    \s    [ \t]           whitespace
          (I don't think Perl cares about non-ASCII whitespace)

    \u    [A-Z]           uppercase character
          (only used in perlFiledesc* groups -- I think special file
          descriptors are limited to ASCII in Perl anyway)

    \x    [0-9A-Fa-f]     hexadecimal digit
    \o    [0-7]           octal digit
          (no reason for these to be Unicode anyway)
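
To make the \w / \h mismatch concrete, here is a rough sketch in Python
rather than Vim script. It is only an analogy: the ASCII pattern spells
out the same classes Vim uses for \h and \w, while the second pattern
relies on Python's Unicode-aware \w. The identifier and the helper name
are made up for illustration; none of this is Vim's engine or the
actual Perl syntax file.

```python
import re

# ASCII-only equivalents of Vim's \h followed by \w*, spelled out explicitly.
ASCII_IDENT = re.compile(r"[A-Za-z_][0-9A-Za-z_]*")

# Python's \w is Unicode-aware for str patterns; [^\W\d] means
# "a word character that is not a digit", i.e. a Unicode-aware \h.
UNICODE_IDENT = re.compile(r"[^\W\d]\w*")


def leading_identifier(pattern, text):
    """Return the identifier matched at the start of text, or ''."""
    m = pattern.match(text)
    return m.group(0) if m else ""


name = "caf\u00e9_total"  # hypothetical identifier containing U+00E9
print(leading_identifier(ASCII_IDENT, name))    # stops before the non-ASCII letter
print(leading_identifier(UNICODE_IDENT, name))  # covers the whole identifier
```

The ASCII pattern stops at the first non-ASCII letter, which is the
kind of truncated match that shows up as broken highlighting.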


> I don't suppose now would be a good time to ask, again, for perl-RE
> support as a built-in optional replacement for the vim-RE? :-)
> Only been on my wish list for vim for ~4-5 years, at least...
>
>> I'll take a stab at it if I get some "tuit's", but no promises.
> Sorry, I didn't know it was broken due to the RE-engine not supporting
> it, so please don't throw away good time kludging support for broken
> features, when it's the feature that needs fixing. This sounds like
> an even better reason to add-in such support.

As explained above, it's not the case that the RE engine can't support
what Perl needs. It's just that the way the Perl syntax file is written
prevents it from working for Unicode.
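
A second sketch of that distinction, again in Python as a stand-in and
not Vim itself: the engine and its flags stay the same, and only the
pattern changes. With re.ASCII, Python's \w behaves like Vim's
([0-9A-Za-z_]). The widened class is a deliberately crude assumption
that treats every code point from U+0080 up as a word character, and
the Perl line is invented for the example.

```python
import re

# re.ASCII restricts \w to [0-9A-Za-z_], like Vim's \w.
AS_WRITTEN = re.compile(r"\$\w+", re.ASCII)

# Same engine, same flag: the class is widened by hand so that every
# code point from U+0080 up also counts (crude; it over-accepts).
WIDENED = re.compile(r"\$[\w\u0080-\U0010FFFF]+", re.ASCII)


def scalar_name(pattern, source):
    """Return the first $name the pattern finds in source, or None."""
    m = pattern.search(source)
    return m.group(0) if m else None


line = "my $r\u00e9sum\u00e9 = 1;"  # hypothetical Perl source line
print(scalar_name(AS_WRITTEN, line))  # cut off at the first non-ASCII letter
print(scalar_name(WIDENED, line))     # the full variable name
```

Nothing about the engine changed between the two patterns; only the
character class did.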

I've wanted the ability to swap out RE engines for a while, too. This
might be something I attempt to tackle at some point. (Probably in
queue behind switching out the input-processing layer to support more
key sequences in terminal-vim.)

--
Best,
Ben H

