Zsh Mailing List Archive

Re: 'case' pattern matching bug with bracket expressions



On Thu, 14 May 2015 14:14:26 +0100
Martijn Dekker <martijn@xxxxxxxx> wrote:
> While writing a cross-platform shell library I've come across a bug in
> the way zsh (in POSIX mode) matches patterns in 'case' statements that
> are at variance with other POSIX shells.
> 
> Normally, zsh considers an empty bracket expression [] a bad pattern
> while other shells ([d]ash, bash, ksh) consider it a negative:
> 
> case abc in ( [] ) echo yes ;; ( * ) echo no ;; esac
> 
> Expected output: no
> Got output: zsh: bad pattern: []

This is the shell language being typically duplicitous and unhelpful.
"]" after a "[" indicates that the "]" is part of the set.  This is
normal; in bash as well as zsh:

  [[ ']' = []] ]] && echo yes

outputs 'yes'.

However, as you've found out, other shells handle the case where there
isn't another ']' later.  Generally there's no harm in this, and in most
cases we could do this (the case below is harder).

Nonetheless, there's a real ambiguity here, so given this and the
following I'd definitely suggest not relying on it if you can avoid
doing so --- use something else to signify an empty string.
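
As a sketch of what "something else" could look like (my own illustration, not something from the original report): a quoted empty string is an unambiguous way to match the empty value in a 'case' statement. The function name classify is hypothetical.

```shell
# Hypothetical helper: classify a value without depending on how
# a given shell treats the pattern [].
classify() {
    case $1 in
        ( '' ) echo empty ;;       # quoted empty string matches only ""
        ( * )  echo non-empty ;;   # everything else
    esac
}

classify ''     # prints: empty
classify abc    # prints: non-empty
```

This avoids bracket expressions altogether, so it should not be affected by the ambiguity discussed here.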

> The same thing does NOT produce an error, but a false positive (!), if
> an extra non-matching pattern with | is added:
> 
> case abc in ( [] | *[!a-z]*) echo yes ;; ( * ) echo no ;; esac

This is the pattern:
 '['                   introducing bracketed expression
   '] | *[!a-z'        characters inside
 ']'                   end of bracketed expression
 '*'                   wildcard.

so it's a set that includes the character 'a' (through the range a-z),
followed by anything, and hence it matches 'abc'.
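
A rough way to see that reading in isolation is to put the whole string in a variable, so the shell's parser never gets to treat the "|" as a separator. This is only a sketch: the helper name matches is made up, and it assumes a POSIX-style shell such as bash or dash, where an unquoted expansion in a 'case' pattern is used as a pattern (native zsh would need GLOB_SUBST or $~2 for that).

```shell
# Hypothetical helper: succeed if $1 matches the glob pattern held in $2.
matches() {
    case $1 in
        ( $2 ) return 0 ;;   # unquoted: the expansion is used as a pattern
        ( * )  return 1 ;;
    esac
}

# The whole string is one pattern here: a bracket expression containing
# "]", " ", "|", "*", "[", "!" and the range a-z, followed by "*".
pat='[] | *[!a-z]*'

for word in abc ']x' '|' 123; do
    if matches "$word" "$pat"; then
        echo "match:    $word"
    else
        echo "no match: $word"
    fi
done
```

With the pattern read as a single set, anything starting with one of the listed characters matches, while a word like 123 does not.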

I'm not really sure we *can* resolve this unambiguously the way you
want.  Is there something that forbids us from interpreting the pattern
that way?  The handling of ']' at the start is mandated, if I've
followed all the logic correctly --- POSIX 2007 Shell and Utilities
2.13.1 says:

[
    If an open bracket introduces a bracket expression as in XBD RE
    Bracket Expression, except that the <exclamation-mark> character (
    '!' ) shall replace the <circumflex> character ( '^' ) in its role
    in a non-matching list in the regular expression notation, it shall
    introduce a pattern bracket expression. A bracket expression
    starting with an unquoted <circumflex> character produces
    unspecified results. Otherwise, '[' shall match the character
    itself.

The language is a little turgid, but I think it's saying "unless
you have ^ or [ just go with the RE rules in [section 9.3.5]".

9.3.5 (in regular expressions) says, amongst a lot of other things:

   The <right-square-bracket> ( ']' ) shall lose its special meaning and
   represent itself in a bracket expression if it occurs first in the
   list (after an initial <circumflex> ( '^' ), if any)

That's a "shall".
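
To make the rule concrete, here is a small sketch of the "]"-first behaviour in a shell pattern, where "!" takes the place of "^". The function name first_bracket_demo is hypothetical, and the outputs noted are what I'd expect from bash.

```shell
# Hypothetical demo: "]" directly after "[" (or after "[!") is a
# literal member of the set rather than the closing bracket.
first_bracket_demo() {
    case $1 in
        ( []a] )  echo in-set ;;       # set contains "]" and "a"
        ( [!]a] ) echo not-in-set ;;   # any single char except "]" and "a"
        ( * )     echo other ;;        # empty or longer than one character
    esac
}

first_bracket_demo ']'   # prints: in-set
first_bracket_demo x     # prints: not-in-set
```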

I haven't read through the "case" doc so there may be some killer reason
why that " | " has to be a case separator and not part of a
square-bracketed expression.  But that would seem to imply some form of
hierarchical parsing in which those characters couldn't occur within a
pattern.

By the way, we don't handle all forms in 9.3.5, e.g. equivalence classes,
so saying "it works like REs" isn't a perfect answer for zsh, either.

pws


