Zsh Mailing List Archive

Re: The big kre zsh bug report



On Thu, Dec 20, 2018 at 2:48 PM Martijn Dekker <martijn@xxxxxxxx> wrote:
>
> However, I think having 'set -u' apply to $(( x )) is "obvious" and
> useful behaviour.

Reasonable.

> > Posix says of the "jobs" command that the status is Running (with a capital R)
> > not "running" with a lower case 'r'.    (Same with Done, ...)

In this instance, I really don't give a damn what Posix says about it.
Zsh follows csh here, and this seems way too trivial to special-case.

> $ zsh --emulate sh -c 'echo $(( $(echo 9) $(echo -) $(echo 2) ))'
> zsh:1: bad math expression: operator expected at `2 '

"echo -" is a special case handled like "echo --" so it echoes nothing
and the operator disappears.  This has nothing to do with math or the
"-" operator.  (Elz has clarified this in his own reply.)
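
A minimal sketch of the same expression with printf in place of echo,
so that the "-" cannot be swallowed as an option marker.  Using printf
here is an assumption of this sketch, not something proposed in the
thread:

```shell
# printf '%s' prints its argument verbatim, so the "-" survives and the
# arithmetic expression expands to "9 - 2".
result=$(( $(printf '%s' 9) $(printf '%s' -) $(printf '%s' 2) ))
echo "$result"
```

In a POSIX-style shell this evaluates 9 - 2, since no word is dropped
before the arithmetic is parsed.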

> > tc-so:Executing command [ zsh --emulate sh -c . ./h-f3; X=1; set -- ; delim_argv "${X+$@}" ]
>
> So the issue is that "${X+$@}" should be removed completely and not
> leave an empty quoted string if X is set, but there are no positional
> parameters.
>
> Looks logical to me: in that the ${X+$@} parameter substitution
> substitutes $@, within quotes, leaving "$@", which is definitely removed
> completely if there are no positional parameters.
>
> But if this is a bug, it's certainly a widespread one!

I'd prefer that this continue to act like bash and ksh than to follow
the abstract spec.
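
For anyone wanting to compare shells on this point, a small probe along
the following lines may help.  count_args is a hypothetical helper
invented for this sketch, and the result of the nested case is
deliberately not asserted, because it is exactly what the shells
disagree on:

```shell
# count_args is a hypothetical helper: it prints how many arguments it got.
count_args() { echo "$#"; }

X=1
set --                            # no positional parameters
plain=$(count_args "$@")          # "$@" alone: always zero arguments
nested=$(count_args "${X+$@}")    # the disputed case: 0 or 1, depending on the shell
echo "plain=$plain nested=$nested"
```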

> > The \ is not removed before var expansion, ${\#} is not ${#}
> > and \# is not a valid var name, nor is \ if this is being parsed
> > as a substring match on ${\}
> > so this should be a syntax error
> > (at least in sh emulation mode).

This one surprised me.  Seems to come down to this code in params.c:

        } else if (inbrace && inull(*s)) {
            /*
             * Handles things like ${(f)"$(<file)"} by skipping
             * the double quotes.  We don't need to know what was
             * actually there; the presence of a String or Qstring
             * is good enough.
             */
            s++;

inull() is expected to match a quote there, but it happens to also
match backslash.  Fix in another thread.

> > Combining length and set/not set operators is not defined in sh,
> > and makes no sense anyway, as ${#x} is always set
>
> I'll leave this for the zsh developers to consider, but personally I
> think we can legitimately consider this an artefact of a zsh extension.

Agree, I'm strongly inclined to ignore this.

> > Next we get similar tests, but this time we are testing $# rather than the
> > length of a var operator...
> >
> > tc-se:dollar_hash[79]: Test of 'set -- a b c; echo ${#:-99}' failed.
> > tc-se:[79] Expected output '3', received '2'
> > tc-se:[79] Full command: <<set -- a b c; echo ${#:-99}>>
> >
> > For $# I am not sure posix requires handling the tests for set/unset
> > (as $# is always set) so I would understand an error here, but not
> > the wrong answer, there are 3 args $# should be 3, it is never unset
> > or null, so the :-99 part should just be noise (or an error).
>
> So what happens is that ${#:-foo} measures the length of whatever
> follows ':-'. Interesting behaviour, but not correct, at least not for
> POSIX mode.

Zsh allows an empty parameter name before ":-" and it is handling that
case before doing the length calculation.

That also explains this one:

> > tc-se:dollar_hash[80]: Test of 'set -- a b c; echo ${#-99}' failed.
> > tc-se:[80] expected exit code 0, got 1

% echo ${-99}
zsh: bad substitution

There was quite a lot of discussion of this on zsh-workers a while
back, as I recall, and it was decided at that time that ${#anything}
always means ${#${anything}}, never ${${#}anything}.

I'm not expecting this to be up for debate again now.
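
A script that really wants "the argument count, with a default" can
avoid the ambiguous parse by copying $# into an ordinary variable
first.  This is only a sketch of a workaround; the variable name n is
made up for the example:

```shell
set -- a b c
n=$#                 # copy the count into an ordinary (hypothetical) variable
echo "${n:-99}"      # the default is never used, since n is set and non-empty
```

Because the expansion now names a regular parameter, there is no "#"
for the shell to interpret as a length operator.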

Same for this one:

> > tc-se:dollar_hash[84]: Test of 'set -- a b c; echo ${#?bogus}' failed.
> > tc-se:[84] expected exit code 0, got 1

This is being treated as ${#${?bogus}}.

> > tc-se:shell_params[13]: Test of 'set -- a b c d; echo ${4294967297}' failed.
> > tc-se:[13] Expected output '', received 'a'
> >
> > This indicates that 32 bit arith overflow occurred, and wasn't detected.
>
> Confirmed (also on ksh93 and dash).

Checking for overflow here seems like a lot of computational expense
for a case that probably only happens in test suites.  Since zsh
implements arrays as actual non-sparse C arrays, memory is going to
explode long before anything manages to assign that many positional
parameters.
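
The reported output is consistent with the index being truncated to 32
bits.  A rough illustration of that arithmetic, assuming a shell whose
$(( )) uses integers wider than 32 bits (the variable names are
invented for the sketch):

```shell
# 2^32 = 4294967296; an index held in an unsigned 32-bit integer keeps
# only the remainder modulo that value.
index=4294967297
wrapped=$(( index % 4294967296 ))
echo "$wrapped"
```

The remainder is 1, which would make ${4294967297} behave like $1 and
explains the 'a' in the test output.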

> > [...] my guess is
> > that the ${ with a \newline between the $ and { is not working as it should.
>
> Confirmed. A bit more experimenting shows that it breaks between '$' and
> '{' and nowhere else. Very unlikely for line continuation to be used
> there in real-life scripts, but still a bug.

Not going to argue with that one.  I suspect it's because the parser
is treating "{" as beginning a brace expansion (e.g., {a,b,c}) at that
point and so encodes it differently.

> > Next ...
> >
> >       check 't=" x";     IFS=" x"; set $t; IFS=":"; r="$*"; IFS=; echo $# $r' '1'
> >
> > I think this means that zsh is not treating non-whitespace IFS characters
> > as field terminators, but as field separators, which is not the way it is
> > supposed to be.  A later failure is because of the same thing I believe.
>
> Confirmed. Modernish identifies this as a shell quirk (QRK_IFSFINAL)
> because the POSIX spec is quite ambiguous on this, so I was unable to
> confirm that it is actually a bug and not a legitimate behaviour variant.

It seems to me that changing this for $=t in zsh native mode might
break a lot of things, so I'll leave it open for discussion as to
whether it's feasible to change it only for emulation mode.  However,
it does differ from the most recent version of bash I have handy.
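
The terminator-versus-separator distinction is easiest to see with a
trailing delimiter.  The following is a sketch only; count_fields is a
hypothetical helper, and ":" stands in for any non-whitespace IFS
character:

```shell
# count_fields is a hypothetical helper: split $1 on ":" and print the
# number of resulting fields.  The ( ) body keeps IFS changes local.
count_fields() (
    IFS=:
    set -f            # no globbing during the unquoted expansion
    set -- $1
    echo "$#"
)

count_fields 'a:b'    # 2 under either reading
count_fields 'a:b:'   # 2 if ":" terminates fields, 3 if it separates them
```

A shell that treats the final ":" as terminating the field "b" reports
two fields for the second call; one that treats it as separating "b"
from a trailing empty field reports three.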

> > Next, zsh apparently does not implement the posix
> > required -h option.   Nor do we, but we at least allow
> > it to be set and cleared ....

For the lazy or very busy reader:

-h  Locate and remember utilities invoked by functions as those
functions are defined (the utilities are normally located when the
function is executed).

This basically means doing the path search at function definition time.
There's some dependence here on the formal definition of "utilities";
as I recall, there are other cases where zsh differs from posix on
using builtins or functions to replace utilities.

> > There are no bad patterns in sh.   Ever.   The literal string
> > '[a-c]' should be removed from the start of the value of var.

This one is going to be very tricky to deal with.  Zsh does not
convert "invalid" patterns into plain strings and then match against
the plain strings; a pattern is either valid, or it is never used for
matching in the first place.  Thus in globbing:

% zsh --emulate sh
$ touch "[a-c]"
$ echo *[a-c\]
*[a-c]
$ bash
bash-4.1$ echo *[a-c\]
[a-c]

I think all the related cases cascade from this.
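
Where the intent is to strip a literal string, a script can sidestep
the question of pattern validity by quoting the pattern.  A minimal
sketch, with made-up variable names:

```shell
var='[a-c]rest'
prefix='[a-c]'            # hypothetical literal prefix to strip
rest=${var#"$prefix"}     # the inner quotes make every character literal
echo "$rest"
```

Since nothing in the quoted prefix is treated as a pattern character,
no shell has to decide whether "[a-c]" is a good or bad bracket
expression.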

> > tc-se:case_matching[147]: Test of 'var='\z'; case ${var} in (${var}) printf M;; (*) printf X;; esac' failed.
> >
> > The word to match is two chars, backslash and z, the pattern is
> > a quoted 'z' (the backslash becomes a quoting character).

I don't think this is going to get fixed.  I went looking for this
test case but didn't find it:

var='"z"'; case ${var} in (${var}) printf M;; (*) printf X;; esac

This also prints M.  If backslash-z should become a quoted z,
shouldn't the above case also become a quoted z?  So that means in
case statements all variable references have to be treated as if they
were ${(Q)var} (to use zsh-speak)?  What if the quotes aren't
balanced?
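
For comparison, a sketch of the unambiguous form, in which the pattern
is quoted so the value is compared as a literal string.  This is not
the test case from the suite, just the variant whose result is not in
dispute:

```shell
var='\z'
case $var in
    ("$var") result=M ;;   # quoted: compared character by character
    (*)      result=X ;;
esac
echo "$result"
```

With the quotes in place the backslash is an ordinary character in the
pattern, so the two-character word matches itself.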

> > This one is excusable, a \ followed by nothing in a pattern is an unspecified
> > case [...]
> > That one too.   Note all 3 of them work the same in bash as we expect,
> > and all 3 still fail with zsh --emulate bash
>
> I'm not sure the zsh authors aim to make emulation modes quite that
> exact, but I'll just leave this here for their consideration.

There isn't really any --emulate bash; it's merely a synonym for
--emulate sh.  And therefore no, it's not intended to be perfect.
(Neither is --emulate ksh, although that has some distinctions from
sh.)

> > tc-se:var_substring_matching[47]: Test of 'var='abc';printf '%s\n' ${var%*}' failed.
> > tc-se:[47] Expected output 'abc', received 'ab'

This one has crept in somewhere along the way: zsh 3.0 and zsh 4.2
both work correctly, but zsh 5.3 does not (I don't presently have access
to anything earlier in 5.x to try).
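
A quick way to check a given shell for this regression, as a sketch
(the variable name trimmed is invented here):

```shell
var='abc'
trimmed=${var%*}     # "%" removes the shortest suffix matching "*", i.e. nothing
printf '%s\n' "$trimmed"
```

The test suite expects the value to come back unchanged; per the report
above, the affected zsh versions drop the last character instead.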

> Various other tests showed that zsh cannot handle file descriptors >9.

That's not true, it just doesn't handle redirection to descriptors >9
without the use of variant syntax.

> > tc-so:Executing command [ zsh --emulate sh -c  set -- a b c d; shift 1 1 ; echo FAILED  ]
> > tc-se:Fail: incorrect exit status: 0, expected: anything else
> > tc-se:stdout:
> > tc-se:FAILED
> > tc-se:
> > tc-se:stderr:
> > tc-se:
> >
> > shift should only take 1 arg, not two ... but most shells do not check that,
> > so this one is perhaps excusable.
>
> I think builtins should always fail on excess arguments since, outside
> of test cases, that is a clear indication something has gone awry in the
> script.

This isn't a bug.  "shift 1 1" means to shift the array $1 by one
position.  Zsh "shift" can even work on multiple arrays at once,
"shift 3 foo bar" means to shift both foo and bar by three.

You could argue that "shift thingthatisnotanarray" should complain,
but it's not about the number of arguments.

> Looks like zsh doesn't like a bare shell assignment as a background job.

This is a bug in native mode too.  Been that way forever.

> > tc-so:Executing command [ zsh --emulate sh -c case in in (esac|cat ]
> >
> > I cannot even begin to imagine what that nonsense parsed as...
> > (but once a case pattern is started with the optional '(' it requires
> > the following ')' to complete it, always.

This is an issue with "-c" and maybe also with a script file input.
If passed to an interactive shell, it's an incomplete parse (the PS2
prompt is printed and the shell waits for more input to finish the
case statement).  So it's not parsing as nonsense, it's parsing as
what it should, and then exiting zero when it hits end-of-file even
though it's still in the middle of a statement.  If I EOF the
interactive shell:

% case in in (esac|cat
case> zsh: parse error near `(esac|cat'

So this has apparently been special-cased to be silent when the shell
is not interactive.
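
The symptom can be checked from outside by looking only at the exit
status.  A sketch, assuming a shell named "sh" is on the PATH;
check_status is a hypothetical helper, and the shell name passed to it
can be swapped for whichever shell is under test:

```shell
# check_status is a hypothetical helper: run a -c string in the given shell
# and print the resulting exit status.
check_status() {
    "$1" -c "$2" >/dev/null 2>&1
    echo "$?"
}

# An unterminated case pattern list should yield a non-zero status.
check_status sh 'case in in (esac|cat'
```

A shell that reports the incomplete statement as a parse error prints a
non-zero status here; the behaviour described above would print 0.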

> > tc-so:Executing command [ zsh --emulate sh -c if if :;then :;fi then :;fi ]

Another one that works up through 4.2 but breaks sometime at or before 5.3.

> $ zsh --emulate sh -c 'if until :; do :; done then :; fi'

This, too.

> > tc-so:Executing command [ zsh --emulate sh -c case x in (|| | ||) ;; esac ]
>
> This was a syntax error in zsh until 5.4.1; 5.4.2 starts accepting it.

I get this accepted by every version of zsh that I test.  It matches
the empty string:

% case '' in (|| | ||) print ok;; esac
ok

> > tc-so:Executing command [ zsh --emulate sh -c wait 1 ]
> > tc-se:Fail: incorrect exit status: 1, expected: 127

Probably easily fixed.
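
For reference, a sketch of the check the test performs, run in a
subshell so that any diagnostic can be discarded:

```shell
# PID 1 is never a child of this shell, so "wait 1" has nothing to wait for.
( wait 1 ) 2>/dev/null
status=$?
echo "$status"
```

The test suite expects 127 here, the status reserved for a process ID
that the shell does not know as one of its children.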


