Re: BUG: Initializations of named references with an empty string should trigger an error

The nice thing with TYPESET_TO_UNSET is that it makes a clear distinction between uninitialized and initialized variables. For references, that distinction makes a big difference because uninitialized references are placeholders that behave very differently from initialized references. The Zsh code "typeset -i var; var=1; var=2" can be loosely translated to the C code "int var; var=1; var=2". There is no fundamental difference between the initialization and subsequent assignments; both translate to the same C operation. Things are very different for the Zsh code "typeset -n ref; ref=var; ref=42", which can be loosely translated to the C code "int *ref; ref=&var; *ref=42". In this case the initialization and subsequent assignments translate to very different operations.
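
To make the analogy concrete, here is a minimal C sketch of the two cases. The function names are made up for illustration, and the mapping to Zsh is only the loose translation described above, not a description of how Zsh implements either statement.

```c
#include <assert.h>

/* Loose C counterpart of: typeset -i var; var=1; var=2 */
int integer_case(void)
{
    int var;   /* typeset -i var */
    var = 1;   /* "initialization": a plain store */
    var = 2;   /* later assignment: the same kind of store */
    return var;
}

/* Loose C counterpart of: typeset -n ref; ref=var; ref=42 */
int reference_case(void)
{
    int var = 0;
    int *ref;           /* typeset -n ref: a placeholder, bound to nothing yet */
    ref = &var;         /* ref=var: binds the reference to "var" */
    assert(ref == &var);
    *ref = 42;          /* ref=42: writes through the reference into "var" */
    return var;
}
```

Calling integer_case() returns 2 and reference_case() returns 42. The point is that the second function needs two different operations (binding, then writing through), while the first uses the same store twice.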

What I find absurd and detrimental is the attempt to equate a reference initialized with the empty string with a placeholder. To me this is as absurd as pretending that a C pointer to an int whose value is 0 is the same thing as a null int pointer. It is detrimental because it implies that a statement like "typeset -nu ref=$1" will not necessarily initialize the reference. Countless users will have to figure out why their script behaves weirdly, only to find out that they mistakenly passed in an empty string instead of the desired variable name. This common error would be caught immediately if the "typeset" statement complained about an invalid variable name instead.
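
The C side of that comparison can be sketched as follows. The names (ref_state, classify, and the two demo functions) are hypothetical and only serve the illustration: a pointer to an int that holds 0 is bound to something, whereas a null pointer is bound to nothing.

```c
#include <stddef.h>

/* Hypothetical two-state model of a reference. */
enum ref_state { REF_PLACEHOLDER, REF_BOUND };

/* A pointer is a placeholder only when it is null. */
enum ref_state classify(const int *ref)
{
    return ref == NULL ? REF_PLACEHOLDER : REF_BOUND;
}

/* A pointer to an int whose value is 0: bound, and the referent is 0. */
int demo_pointer_to_zero(void)
{
    int zero = 0;
    int *p = &zero;
    return classify(p) == REF_BOUND && *p == 0;
}

/* A null pointer: not bound to anything, so there is nothing to read. */
int demo_null_pointer(void)
{
    int *p = NULL;
    return classify(p) == REF_PLACEHOLDER;
}
```

Both demo functions return 1. Nobody would treat the two pointers as interchangeable, which is the distinction I would like references to keep.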

Bart, you say that as things stand "typeset -n ref=" is a necessity. I really don't see why, especially given that "typeset -a arr=" is already not accepted. And rightly so! An empty string is obviously neither an empty array nor an array that contains just the empty string.

Can you give a concrete example of something fundamental that would break down if we don't accept "typeset -n ref="?

I really don't see anything fundamental that wouldn't work. I can see that when TYPESET_TO_UNSET is disabled, there would be a slight discrepancy between references and other types of variables because for references "typeset -p" would not include any value for a reference defined with no initialization value, while for other types of variables it includes the type's default value.

I frankly doubt that any user would blame us for this. Who cares about that, and why? There will surely be orders of magnitude more users who spend time debugging a misbehaving "typeset -nu ref=$1" than users who complain that "typeset -n ref; typeset -p ref" doesn't yield an output that includes a default value.

If we really wanted a default value for references, then we should at least adopt something that makes sense. As with arrays, the empty string does not qualify. What would make sense here is a token (not a string) that is more or less the equivalent of NULL in C. Maybe we could adopt "<null>" for that. Then, when TYPESET_TO_UNSET is disabled, "typeset -n ref" would be equivalent to "typeset -n ref=<null>". If "ref" is a reference, you could write "ref=<null>" to turn it back into a placeholder. Like the ( ... ) syntax, the <null> token could only be used on the right-hand side of assignments. So, like ( ... ), it could never be passed as an argument. Thus, "typeset -nu ref=$1" would always either initialize "ref" or fail: if "$1" is the empty string or the string "<null>", the statement would trigger an "invalid variable name" error.
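
The intended behavior for arguments can be sketched in C as below. This is only a model under a loud assumption: it treats a valid target as a plain identifier, which is narrower than what Zsh really accepts as a reference target, and every name in it is hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplifying assumption: a valid target is a plain identifier.
 * Real Zsh reference targets can take more forms than this. */
bool is_plain_identifier(const char *s)
{
    if (s == NULL || *s == '\0')
        return false;                       /* the empty string is not a name */
    if (!(isalpha((unsigned char)*s) || *s == '_'))
        return false;
    for (s++; *s != '\0'; s++)
        if (!(isalnum((unsigned char)*s) || *s == '_'))
            return false;
    return true;
}

enum init_result { INIT_BOUND, INIT_INVALID_NAME };

/* Models "typeset -nu ref=$1": a string argument either binds the
 * reference or is rejected; it can never produce a placeholder. */
enum init_result classify_argument(const char *arg)
{
    return is_plain_identifier(arg) ? INIT_BOUND : INIT_INVALID_NAME;
}
```

Under this model classify_argument("var") yields INIT_BOUND, while both classify_argument("") and classify_argument("<null>") yield INIT_INVALID_NAME, because the string "<null>" is just another invalid name and not the token.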

Currently you can do the following:

typeset -i var
while ( ... ); do
  var=0  # Reset "var" to its default value
  ...
done

You can do the same for strings, floats, and arrays; you just have to use the appropriate default value. For references, however, that is currently not possible, which is further evidence that the empty string is a bogus default value for references. If we adopted the <null> token, the same would also work for references.

Should we adopt <null>? In my opinion, we don't really need a default value for references, so I would rather do without it. However, if we think that a default value is absolutely needed, then yes, we should adopt it. We should not adopt a half-baked default value like the empty string, which causes far more trouble than it solves, but go all the way and adopt a true default value that actually works like one.

Philippe

On Sat, Jun 7, 2025 at 7:20 PM Philippe Altherr <philippe.altherr@xxxxxxxxx> wrote:

> I wasn't going to send this until I'd completed some other patches,
> but Philippe is obviously forging ahead without waiting for me, so
> sending it now.

Just a quick reply to let you know that my latest patches aren't related to this question.

A more complete answer later or tomorrow.

Philippe

On Sat, Jun 7, 2025 at 6:25 PM Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx> wrote:
I wasn't going to send this until I'd completed some other patches,
but Philippe is obviously forging ahead without waiting for me, so
sending it now.

On Wed, May 28, 2025 at 3:58 PM Philippe Altherr
<philippe.altherr@xxxxxxxxx> wrote:
>
> Currently, "typeset -n ref=$varname" may or may not initialize the reference even though it's extremely unlikely that anyone would write such code with the intent that an empty "varname" should indeed not initialize "ref".

It's always possible to write ref=${varname?uninitialized reference}
... my philosophy has been that except for impossible cases (like the
"hides nonexistent" patch), nameref parameters should be treated in
the ways that ordinary parameters are treated.

> Another consequence of the current treatment of empty variable names is that when TYPESET_TO_UNSET is enabled, there are two types of reference placeholders

Yes, but there are also two types of scalars. Drop the "-n" and your
example looks exactly the same.

> This stinks of bad design in my opinion.

Perhaps so, but it's a design we inherited. If the defaults of years
ago had been to behave as if TYPESET_TO_UNSET and TYPESET_SILENT were
in effect, I might feel differently, but as things stand "typeset -n
ref=" is a necessity.

> It's even more sad given the fact that otherwise everything is so nice and tidy with TYPESET_TO_UNSET enabled.

I'm curious now what "everything" you find otherwise tidier about
TYPESET_TO_UNSET.