Zsh Mailing List Archive

Re: Metafication in error messages (Was: [PATCH] unmetafy Re: $var not expanded in ${x?$var})



2024-02-22 14:31:12 -0800, Bart Schaefer:
[...]
> Opinions?

There are two separate things here:

1. whether user-supplied text in messages should be output raw or
   with the non-printable characters given some common visible
   representation (like \n for newline, ^C for 0x03, \M-^C for
   0x83, ^@ for NUL, ^[ for ESC, \uffff for the encoding of
   U+FFFF...) by going through nicezputs().

2. whether, internally in the code, the data should be passed
   "metafied" or not to the zerr* functions.

For 1, IMO, when the error message is generated by zsh, it
should go through nicezputs(). zsh should decide on the
formatting; passing escape sequences through as-is would make
the error hard to understand and diagnose.

For instance, 

$ printf 'uname\r\n' | zsh
zsh: command not found: uname^M

is more useful than

zsh: command not found: uname

where the CR, when sent to a terminal, moves the cursor back to
the leftmost column, so we don't see that the problem is caused
by that extra bogus character.
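
To make the "visible representation" idea concrete, here is a
rough Python sketch of the notation listed in point 1. It is
not zsh's nicezputs() code: the function name visible() is made
up for illustration, it works byte by byte, and it ignores
multibyte characters entirely (so nothing like the \uffff case).

```python
def visible(data: bytes) -> str:
    """Render bytes using the caret / \\M- notation described above.

    Simplified illustration only; not zsh's actual nicezputs().
    """
    out = []
    for b in data:
        if b == 0x0A:
            out.append("\\n")          # newline shown as \n
            continue
        prefix = ""
        if b & 0x80:
            prefix = "\\M-"            # high bit set: \M- prefix
            b &= 0x7F
        if b == 0x7F:
            out.append(prefix + "^?")  # DEL
        elif b < 0x20:
            out.append(prefix + "^" + chr(b + 0x40))  # ^@, ^C, ^[, ^M ...
        else:
            out.append(prefix + chr(b))
    return "".join(out)


print(visible(b"uname\r"))  # uname^M
```

With that, the stray CR is rendered as ^M instead of silently
moving the cursor.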
 
The only case where it should be passed raw is when the error
message is constructed by the user, who is expected to be able
to decide on the formatting.

Like in:

syserror -p $'\e[1;31mERROR\e[m: '
echo ${1?$'multiline\nerror\nmessage'} ${DISPLAY:?$'\e[1;31mNo graphics'}

(syserror likely doesn't use zerrmsg anyway).

For 2, it looks like zerrmsg expects its input metafied and, as
you say, that input is in most cases likely to be metafied
already. Passing it unmetafied would mean either that we couldn't
pass text containing NUL, or that we'd need to pass it as ptr+len.
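
For readers not familiar with metafication, a minimal Python
sketch of the idea follows. The only facts taken from this
thread are that the Meta byte is 0x83 and that the byte after
it is XORed with 0x20. The set of bytes escaped here (just NUL
and 0x83) is a simplification for illustration; zsh itself
escapes more byte values than these two. The function names
mirror zsh's, but this is not its C code.

```python
META = 0x83  # escape byte used by metafication


def metafy(raw: bytes) -> bytes:
    """Escape NUL and Meta as (Meta, byte ^ 0x20). Simplified sketch."""
    out = bytearray()
    for b in raw:
        if b in (0x00, META):
            out += bytes([META, b ^ 0x20])
        else:
            out.append(b)
    return bytes(out)


def unmetafy(meta: bytes) -> bytes:
    """Reverse metafy(): (Meta, x) becomes the single byte x ^ 0x20."""
    out = bytearray()
    it = iter(meta)
    for b in it:
        if b == META:
            nxt = next(it, None)
            # A dangling Meta at the end is kept as-is in this sketch.
            out.append(b if nxt is None else nxt ^ 0x20)
        else:
            out.append(b)
    return bytes(out)


encoded = metafy(b"a\x00b")
print(encoded)            # b'a\x83 b'  (NUL became 0x83 0x20)
print(unmetafy(encoded))  # b'a\x00b'
```

The metafied form never contains a NUL byte, which is what lets
it travel as an ordinary C string instead of ptr+len.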

So what makes most sense to me:

%s remains passed metafied and is output nicezputs'ed
%l same, truncated to the given number of bytes (though
   truncating to a number of characters, or at least not cutting
   in the middle of a character, would be nicer, but maybe
   overkill).
%S also passed metafied, but no nicezputs.
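
As an illustration of "not cutting in the middle of a
character" for %l, here is a small Python sketch. It assumes
the data is valid UTF-8 and already unmetafied; truncate_utf8()
and the 10-byte limit are hypothetical names and values picked
for the example, not anything from the zsh source.

```python
def truncate_utf8(data: bytes, limit: int) -> bytes:
    """Cut data to at most `limit` bytes without splitting a character.

    Assumes valid UTF-8 input; continuation bytes look like 10xxxxxx.
    """
    if len(data) <= limit:
        return data
    end = limit
    # If the first dropped byte is a continuation byte, the cut point is
    # inside a character: back up to the start of that character.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


text = ("|" + "\u00c3" * 6).encode("utf-8")  # '|' plus six U+00C3, 13 bytes
print(len(text))                             # 13
print(truncate_utf8(text, 10).decode())      # | followed by four U+00C3
```

On metafied data the same care would be needed for Meta pairs,
so that a cut never lands between 0x83 and the byte it escapes.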

Now, my previous message showed that there were quite a few
issues with the metafication and possibly with the nicezputs'ing
and/or multibyte handling.

> $ printf '%d\n' $'1+|a\x83 c'
> zsh: bad math expression: operand expected at `|a^@c'

Should have been:

zsh: bad math expression: operand expected at `|a\M-^C c'

The text was passed to zerrmsg unmetafied, so the 0x83 0x20 pair
was then incorrectly unmetafied to NUL (0x20 XOR 0x20) and
rendered by nicezputs as ^@.
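
The arithmetic can be checked with a few lines of Python. This
is only a model of the unmetafy step (a Meta byte 0x83 followed
by x yields x XOR 0x20), written with a regex for brevity; it is
not the zsh code.

```python
import re


def unmetafy(data: bytes) -> bytes:
    """Model of unmetafication: (0x83, x) becomes the byte x ^ 0x20."""
    return re.sub(
        rb"\x83(.)",
        lambda m: bytes([m.group(1)[0] ^ 0x20]),
        data,
        flags=re.DOTALL,
    )


raw = b"|a\x83 c"        # what the user typed, never metafied
print(unmetafy(raw))     # b'|a\x00c'
```

The 0x83 is taken for a Meta byte, the space after it (0x20)
is XORed with 0x20, and the two bytes collapse into one NUL.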

[...]
> $ printf '%d\n' '1+|ÃÃÃÃÃÃ'
> zsh: bad math expression: operand expected at `|\M-C\M-c\M-c\M-c\M-c\M-c\M-^C...'

I picked Ã because it's a letter from the Latin script, so you
can even use it in variable names:

$ zsh -c '(( ÃÃÃÃÃÃ ++ )); typeset -p ÃÃÃÃÃÃ'
typeset -i ÃÃÃÃÃÃ=1

But its UTF-8 encoding happens to contain the 0x83 byte used in
metafication.

$ printf %s Ã | od -An -vtx1
 c3 83

$ printf %s Ã | cat -v
M-CM-^C

Again above, we see the effect of a missing metafication.
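
To see where the \M-C\M-c...\M-^C garbage comes from, here is
the same kind of Python model applied to the six-letter
operand: the UTF-8 bytes are fed, unmetafied, to a step that
treats every 0x83 as Meta. The unmetafy() below is a simplified
stand-in for the real function.

```python
import re


def unmetafy(data: bytes) -> bytes:
    """Model of unmetafication: (0x83, x) becomes the byte x ^ 0x20."""
    return re.sub(
        rb"\x83(.)",
        lambda m: bytes([m.group(1)[0] ^ 0x20]),
        data,
        flags=re.DOTALL,
    )


raw = ("\u00c3" * 6).encode("utf-8")   # six times U+00C3, i.e. c3 83 x 6
garbled = unmetafy(raw)                # wrong: raw was never metafied
print(raw.hex(" "))      # c3 83 c3 83 c3 83 c3 83 c3 83 c3 83
print(garbled.hex(" "))  # c3 e3 e3 e3 e3 e3 83
```

Each 0x83 swallows the 0xc3 that follows it and turns it into
0xe3 (0xc3 XOR 0x20), leaving a lone 0xc3 at the start and a
dangling 0x83 at the end. In the \M- notation that is \M-C,
five times \M-c, then \M-^C, as in the quoted output.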

The error should have been:

zsh: bad math expression: operand expected at `|ÃÃÃÃÃÃ'

Like in:

~$ printf '%d\n' '1+|AAAAAA'
zsh: bad math expression: operand expected at `|AAAAAA'

> 0
> $ ((1+|ÃÃÃÃÃÃ))
> zsh: bad math expression: operand expected at `|ÃÃÃÃ\M-C...'

In that case, the metafication is OK, but the last character is
cut in the middle.

2024-02-22 16:49:20 -0800, Bart Schaefer:
> The changes for that are minimal.  With them, Stephane's math-garbles
> handle the ellipsis more cleanly:
> 
> % printf '%d\n' '1+|ÃÃÃÃÃÃ'
> zsh: bad math expression: operand expected at `|\M-C\M-c\...'
> 0
> % ((1+|ÃÃÃÃÃÃ))
> zsh: bad math expression: operand expected at `|Ã?Ã?Ã?...'

It seems rather worse to me.

-- 
Stephane



