Wednesday, April 27, 2016

Re: RFE: support POSIX standard and developing RE's

LCD 47 wrote:
On 15 April 2016, Erik Christiansen <dvalin@internode.on.net> wrote:    
On 14.04.16 14:40, Christian Brabandt wrote:      
Am 2016-04-14 12:14, schrieb Erik Christiansen:        
So many unix utilities support POSIX "Modern" EREs, that it is the best standard to conform to.
And that is an argument for what, considering that vi comes from a time where BREs were the default RE dialect?
----
    The argument for that: vim descends from 'vi', which was the visual
editor version of 'ed'.  Use of 'ed' has nearly evaporated; however,
'sed', the stream version of 'ed' (both GNU and early unix utils), DOES offer ERE's as an option, thus answering your question -- i.e. if 'vim'
had stayed current with current versions of its ancestors, it would already
have the option.  'sed', 'grep' and others have done what any living program does -- they evolve.  'vim' has yet to evolve in this area.

Consistent regexes across unix utilities. Perhaps I was not sufficiently explicit in that regard? I note the deep attachment to obsolete BREs expressed above, but the rest of the world has moved on to modern EREs.
BRE's are compatible with ERE's.  If you *only* use BRE syntax, then any
prog using ERE's "should" still work the same for you.  PCRE isn't 100% backwards compatible because they chose to make it slightly easier to use
than ERE's.  Example: supporting '/x' at the end to allow & ignore embedded whitespace for readability.  Second example: "\" before a non-alphanumeric character *ALWAYS* means to take that character as a literal -- thus no special cases and no special rules to remember (except on the fifth Thursday that falls after
a new moon in February... :-) ).
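
One caveat to the compatibility claim is worth checking by hand: characters such as '+', '?', '|', '(' and '{' are ordinary in a BRE but are operators in an ERE, so a pattern that contains them can change meaning between dialects. The sketch below is only an illustration, assuming a POSIX-style grep on the PATH; the variable names are made up for the example.

```shell
#!/bin/sh
# Two input lines: a literal "a+b" and the string "aab".
input=$(printf 'a+b\naab\n')

# BRE (plain grep): '+' is an ordinary character, so only "a+b" matches.
bre_out=$(printf '%s\n' "$input" | grep 'a+b')

# ERE (grep -E): '+' means "one or more", so only "aab" matches.
ere_out=$(printf '%s\n' "$input" | grep -E 'a+b')

echo "BRE matched: $bre_out"
echo "ERE matched: $ere_out"
```

Patterns that stick to characters with the same meaning in both dialects (literals, '.', '*', bracket expressions, '^' and '$' as anchors) should behave the same under either flag.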
O'Reilly's "Mastering Regular Expressions" mentions that "POSIX standardized the workings of over 70 programs, including traditional regex-wielding tools such as awk, ed, egrep, expr, grep, and sed." (And mutt, lex, ...)
[...]
"Consistent regexes across unix utilities"?  You sir are either a troll, or simply have no idea what you're talking about.
----
    And you are an aggressive nerdbutt!  Did you even bother to fact check
your statement before calling names?  The book does say "POSIX -- An attempt at standardization", and it describes how they got most programs that POSIX described
to support BRE's, ERE's or both.  It also brought up the problem of
locales and Unicode.  To date, I believe Perl's RE has the most comprehensive coverage of Unicode of any RE by a wide margin.  You can specify chars by charname, codepoint, or just "typing them in".  If your
favorite RE doesn't handle the basics -- like upper & lower case of
all the characters handled in all the languages included in Unicode -- it doesn't begin to handle the needs of a multi-lingual world.

Take a look here:
https://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines
So why ERE instead of PCRE?  Oniguruma?  RE2?
Actually ERE is at least a standard, and at least has that going for it,
though it keeps looking like PCRE will join the group.  POSIX moves at
a glacial pace except on matters of great unimportance.  Actually POSIX is
really 'dead', as the current entity calling itself POSIX doesn't believe in or adhere to the original POSIX's mission statement of being a declarative body (telling people what is there and the commonalities they can rely on).  The new POSIX's mission is to dumb down the interface and *prescribe* behaviors that weren't there before, but to talk about that would raise my BP by about 20 points and be fairly worthless.

Actually, try something simpler:

$ grep --version
grep (GNU grep) 2.24
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Mike Haertel and others, see <http://git.sv.gnu.org/cgit/grep.git/tree/AUTHORS>.

$ echo 'foo bar' | egrep -o '[[:<:]]bar'
grep: Invalid character class name

What is [[:<:]]?  What standard is it a part of?  Never seen it before.
Um:
perl -we 'use strict; use P;
> my $a="now < there";
> $a =~ /[[:<:]] there/;
> '
POSIX class [:<:] unknown in regex; marked by <-- HERE in m/[[:<:] <-- HERE ] there/ at -e line 3.
----
Oh, you made up your own syntax!  I see.  It's not part of any standard.

egrep doesn't support PCRE's extended character classes -- and it is
fully compliant in its documenting the fact that it only supports
fixed-string matching (grep -F), BRE (grep), ERE (grep -E) and PCRE's (grep -P).

What documentation or standard are you claiming grep isn't following in
regard to its RE engine?

Could you point me at the bug report?
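
For reference, the `[[:<:]]` / `[[:>:]]` spelling of word anchors comes from BSD-derived regex libraries, while GNU grep spells the same idea `\<` and `\>`. A minimal sketch of the GNU form follows, assuming GNU grep (both `-o` and `\<` are extensions rather than POSIX features); the variable names are only for illustration.

```shell
#!/bin/sh
# Assumes GNU grep: \< (start of word) and -o are extensions, not POSIX.

# "bar" as a separate word: the start-of-word anchor matches.
word_out=$(echo 'foo bar' | grep -Eo '\<bar' || true)

# "bar" embedded in "foobar": no word starts at "bar", so nothing is printed.
embedded_out=$(echo 'foobar' | grep -Eo '\<bar' || true)

echo "separate word: [$word_out]"
echo "embedded:      [$embedded_out]"
```

The `|| true` only keeps the script going under `set -e`, since grep exits non-zero when nothing matches.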



That's because GNU grep has its own \< and \> instead of POSIX [[:<:]] and [[:>:]].  Have you considered starting a crusade to convince GNU people to adhere to POSIX conventions, in the name of consistency?

As for Vim: its regexes have features not present in any other language.
Such as?  You could support a vim-compatible RE in perl if you wanted to
write one -- it allows plugins and compatibility.  Can Vim support
PCRE's?  That library is all ready to be plugged in.  So show me how
vim's static and arcane RE syntax is better than, say, perl's.

You can embed code in the middle of a perl RE to handle any matching case (there are also many security provisions that you must comply with to use
such features, but they are there).
People use them, and thousands of plugins and syntax files rely on them.
Yeah -- well, people use PCRE's and millions of people rely on them.
JavaScript's RE was almost entirely derived from perl's when it was
implemented.  Show me one web browser that has built-in support for full
vimscript and vim's RE.

BTW, I use vim every day as my code editor -- so I'm not exactly knocking it -- only your ignorance in claiming how superior it is.


You're asking to break all of them because you _prefer_ something else?
----
    Who is asking anyone to break anything?  I wrote the original note on this topic, and I neither mentioned nor wanted having new RE's replace
the current ones.  I even made suggestions about integration in my previous email, like:


Maybe having "\X" & "\P" for extended and pcre's would be a start,
though I'd _like_ to see a way of choosing different RE's for
use in macros & .vim files (for compat), and a 2nd option for
interactive RE's (thus eliminating the need for the "/[vmMVXP]"
on each search or substitute).



Wake up please. 

Learn how to read before you awaken others to your ignorant state.

-l
