Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: zsh/bash behavior variance: regex ERE matching



2018-03-13 22:40:33 -0400, Phil Pennock:
[...]
> So: we ask for ERE, we get ERE+nonstandard.
> 
> On the same platform, Bash 4.4.19(1)-release from Homebrew does _NOT_
> match with `REG_ENHANCED` features.
[...]

An important note about how bash's =~ works since 3.2 (in 3.1
or with the compat31 option it works more like zsh):

In bash (and to some extent in ksh93 as well, though it's very
buggy there), the shell quoting operators influence the regex
matching, as they do for shell wildcards.

[[ a =~ "." ]] or [[ a =~ \. ]]

actually call regcomp() with a "\." regexp.

To do that, bash needs to parse the regexp, and it does so using
the POSIX ERE syntax. So

[[ a =~ \d ]] is the same as [[ a =~ "d" ]] and calls
regcomp() with "d", while for [[ a =~ '\d' ]], it calls it with
"\\d" (the "\" being shell-quoted results in it being
regexp-escaped).
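
The effect of quoting is easy to check with a standard operator such as
".". This is only a minimal sketch, assuming bash 3.2 or later without
the compat31 option; the variable names are just for illustration.

```shell
#!/usr/bin/env bash
# Assumes bash 3.2+ with compat31 unset.

# Unquoted: "." stays a regex operator and matches any character.
if [[ a =~ . ]]; then unquoted=match; else unquoted=nomatch; fi

# Quoted: bash regexp-escapes the "." before calling regcomp(),
# so only a literal dot can match.
if [[ a =~ "." ]]; then quoted=match; else quoted=nomatch; fi
if [[ . =~ "." ]]; then literal=match; else literal=nomatch; fi

echo "unquoted=$unquoted quoted=$quoted literal=$literal"
```

The same applies to \. written without quotes, since the backslash is
also a shell quoting operator.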

That means that if you want to use extensions, you need to use
variables or other expansions there (which you leave unquoted).

Like:

re='\d'
[[ a =~ $re ]]

for regcomp() to be called with "\d".
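
Whether \d then matches anything depends on the regex library, so here
is a sketch of the same variable idiom that sticks to standard ERE
operators. It assumes bash 3.2 or later; the variable names are made up
for the example.

```shell
#!/usr/bin/env bash
# Assumes bash 3.2+ with compat31 unset.
re='^a.c$'

# Unquoted expansion: the value is passed to regcomp() as a regexp.
if [[ abc =~ $re ]]; then as_regex=match; else as_regex=nomatch; fi

# Quoted expansion: every character is regexp-escaped, so the value
# is searched for as a literal string.
if [[ abc =~ "$re" ]]; then as_literal=match; else as_literal=nomatch; fi
if [[ '^a.c$' =~ "$re" ]]; then literal_self=match; else literal_self=nomatch; fi

echo "as_regex=$as_regex as_literal=$as_literal literal_self=$literal_self"
```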

Note that (?:...) and \d are fine. We're not breaking EREs by
supporting them, as the behaviour for (?:...) and \d is unspecified
in the POSIX ERE specification.

Other regexp implementations have other backward-compatible
extensions. For instance, GNU EREs support \b, \<, \>...

Some incompatibilities I'm aware of between ERE and PCRE (I
don't know if that also applies to those macOS REs):

- In POSIX ERE, [\d] matches on \ and d while it matches on a
  digit in PCRE (see also [\]] and co)
- In POSIX ERE, alternation looks for the longest match, while
  PCRE takes the first alternative that matches.

  $ echo abc | grep -oE 'a|ab'
  ab
  $ echo abc | grep -oP 'a|ab'
  a

  $ [[ abc =~ '(a|ab)' ]]; echo $match
  ab
  $ setopt rematchpcre
  $ [[ abc =~ '(a|ab)' ]]; echo $match
  a
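
The alternation difference can also be observed from bash, which always
goes through regcomp(). This is a sketch assuming bash 3.2 or later and
a system regex library that implements POSIX leftmost-longest matching
(glibc does); the variable names are illustrative.

```shell
#!/usr/bin/env bash
# Assumes the system regcomp()/regexec() follow POSIX ERE semantics.
re='a|ab'
matched=''

# BASH_REMATCH[0] holds the portion of the string matched by the regexp.
if [[ abc =~ $re ]]; then matched=${BASH_REMATCH[0]}; fi

echo "$matched"
```

With a POSIX-conformant library this should print "ab", like the
grep -E and default zsh cases above.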

As long as the regex library does what is required for
POSIX-compliant regular expressions, since we document that =~ does
POSIX ERE, I'd say it doesn't matter which extensions are
implemented on top of the standard.

-- 
Stephane


