Zsh Mailing List Archive

zsh/bash behavior variance: regex ERE matching



This is just to note that I have observed a behavior variance.  My
proposed solution is to do absolutely nothing, and accept the variance
as "sane in an insane world".

Note that, per my standing practice, I do not put at risk a code-base
which does not belong to me by reading GPL code of a related code-base,
so I still have not read the bash code.  (I like the GPL and use it
elsewhere, but Zsh isn't GPL and it's not my call to risk that, so I
stubbornly refuse to risk it.)  Descriptions of bash are based on
surmise from observed behavior.

Background: when bash copied the Perl-ish `=~` syntax, they declared it
to be an ERE match.  When I saw that Bash had added the `=~` comparison
infix operator, I went "that's a good idea" and did likewise for Zsh;
during on-list discussion at the time, the core maintainers expressed a
preference for closer compatibility with Bash, so I wrote the
`zsh/regex` module to do ERE matching and introduced the `re_match_pcre`
option to let folks map `=~` onto our long-standing `-pcre-match` infix
operator.  (I think Peter chose to make zsh/regex the default always,
which was very sane.)
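
For anyone who wants to see the two engines side by side, here is a
minimal sketch.  The function name `demo_re_match_pcre` is made up for
illustration, and the zsh branch assumes the `zsh/pcre` module is
available in your build.  It is guarded so that running it from a
non-zsh shell just prints a note.

```shell
# Hypothetical helper: toggle RE_MATCH_PCRE and retry the same pattern.
demo_re_match_pcre() {
  if [ -z "${ZSH_VERSION:-}" ]; then
    echo 'skipped: not zsh'
    return 0
  fi
  emulate -L zsh                  # keep option changes local to this function
  local subject='abc123' pattern='\d+'

  unsetopt re_match_pcre          # =~ goes through zsh/regex (system ERE)
  if [[ $subject =~ $pattern ]]; then
    echo 'zsh/regex: match'
  else
    echo 'zsh/regex: no match'
  fi

  setopt re_match_pcre            # =~ now maps onto -pcre-match
  if [[ $subject =~ $pattern ]]; then
    echo 'pcre: match'
  else
    echo 'pcre: no match'
  fi
  return 0
}

demo_re_match_pcre
```

What the first line prints depends on the system regex library; the
second should match wherever PCRE is available.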

Situation: on macOS (10.12.6, Sierra), the regex library is based on
TRE, not on Henry Spencer's library or any other.  Further, re_format(7)
documents a number of features for `REG_ENHANCED` mode, as distinct from
`REG_EXTENDED`.  These are Perl-ish/PCRE-ish features such as `\d` for
`[[:digit:]]` and `(?:whatever)` for non-capturing grouping.

Using Zsh 5.4.2 built from Homebrew, which has no relevant patches, the
`=~` operator in Zsh is picking up features documented as `REG_ENHANCED`
when we only ask for `REG_EXTENDED`.  Homebrew reports that zsh is:

    Built from source on 2018-01-07 at 18:10:37 with: --with-unicode9 --with-gdbm --with-pcre

Specifically, the added features are the two features cited above,
`\d` and `(?:...)`.

So: we ask for ERE, we get ERE+nonstandard.

On the same platform, Bash 4.4.19(1)-release from Homebrew does _NOT_
match with `REG_ENHANCED` features.
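
To reproduce the comparison, something like the following sketch can be
pasted into either shell.  The `probe` function is a made-up helper, not
part of either shell; the patterns are kept in variables so that both
zsh and bash treat them as regexes.  Only the first result is portable;
the other two depend on the regex library, and on some platforms zsh may
print a regex compile error for them rather than silently not matching.

```shell
# Requires zsh or bash (uses [[ ... =~ ... ]]).
# probe LABEL SUBJECT PATTERN: report whether SUBJECT matches PATTERN.
probe() {
  local label=$1 subject=$2 pattern=$3
  if [[ $subject =~ $pattern ]]; then
    printf '%s: match\n' "$label"
  else
    # Also reached if the pattern fails to compile.
    printf '%s: no match\n' "$label"
  fi
}

probe 'ERE [[:digit:]]+'      'abc123' '[[:digit:]]+'   # standard ERE baseline
probe 'enhanced \d+'          'abc123' '\d+'            # REG_ENHANCED-style
probe 'enhanced (?:ab)+c'     'ababc'  '(?:ab)+c'       # REG_ENHANCED-style
```

If the two "enhanced" lines report a match, the shell is getting more
than plain ERE from its regex library.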

Best operating hypothesis is:

 * Darwin userland bug
 * Bash build process has logic to detect broken ERE in system libraries
   and use a GNU ERE implementation (or ships with such always?) so that
   it's immune from bugs like this

Proposed action: nothing
Reason: most folks aren't familiar enough with regexps to know the
variances, and I suspect a non-trivial number of macOS users are
unwittingly relying upon TRE REG_ENHANCED features.  Fixing the
incompatibility (1) risks breaking working user scripts and (2) requires
shipping our own reliable ERE regexp library, and really I just don't
want to go there.

FWIW, I also have lying around somewhere a zsh/re2 module, which uses
Russ Cox's RE2 engine (as popularized by Go).  I suspect that this would
cause more confusion than it would solve.  I think I dropped it part-way
through converting RE_MATCH_PCRE to a compatibility shim: the shim would
edit a zsh-specific parameter which defines the engine to be used, and
which could be set to any of (regex, re2, pcre).  If any of the core
team express interest, I can probably dust that off.

-Phil


