Re: [PATCH] Restrict named directories to scalar parameters.

This was started before the latest messages from Bart and Mikael, which I tried to integrate as best as I could when I saw them.

PS1="" for zsh -fis

>> Aside: I suggest adding PS1="" to the input of zsh -fis for the Test/K01 patch.
>
> What is the advantage? It doesn't seem to make any difference.

If the test fails, the PS1 prompt is printed approximately 20 times as
the "Error output" from "zsh -fis". Set it to empty, it prints once
(for that assignment itself).

Aha! That's nice. You can even get to zero with "PS1= zsh -fis …". Then no *?* is needed in the expected output. I will send an updated patch.

ND_USERNAME

> The table supports exactly one feature, mapping a string (the name) to
> another string (the path). Things can be added to this table in
> various ways, but there is always just one type of entry in the table.

That's not precisely true because each entry in the table does have a
"flags" field and the ND_USERNAME flag does have semantics similar to
a "type".

I'm aware of ND_USERNAME. There are also dynamic named directories. I hope that we can limit the discussion to named directories created either via "hash -d" or via a parameter, which is already complicated enough.

PM_NAMEDDIR

> A side-effect of the AUTO_NAME_DIRS option is that it makes it possible to turn a hash-based named directory N into a parameter-based one. Indeed, if a parameter named N is defined after the call to "hash", it will override the value specified by the "hash" command and any further updates of the parameter named N will be reflected in the named directory N:

I'm sure I'm repeating myself at this point, but using "hash -d
foo=/bar" and "foo=/bar" will both set the value of ~foo to /bar,
there is no "type of named directory" in play at any point. You are
simply assigning to the same thing multiple times.

It's true that the following 3 commands define the exact same foo entry in the nameddirs table.

hash -d foo=/bar
foo=/bar; : ~foo
setopt autonamedirs; foo=/bar

However, the last two do strictly more than the first one: they also set the PM_NAMEDDIR flag on the foo parameter. From then on, any change to the value of the parameter foo also changes the value of the named directory foo (even if autonamedirs is disabled). AFAICT, this isn't explicitly described in the documentation, but it's an important feature. If one defines JAVA_HOME and turns it into a named directory, one also expects that future changes to JAVA_HOME will be reflected in the named directory. This only works if one of the last two commands is used. From the user's perspective, they create a different kind of named directory than the first command does.
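To make the difference concrete, here is a minimal sketch that can be run from any POSIX shell, assuming zsh is on the PATH. The -i flag is used because, as far as I can tell, parameter assignments only update the nameddirs table in interactive shells. The variable names tracked and static are mine, purely for illustration.

```shell
# Parameter-based: the first use of ~foo flags the parameter foo with
# PM_NAMEDDIR, so the later assignment is reflected in ~foo.
tracked=$(zsh -fic '
  foo=/bar
  : ~foo
  foo=/baz
  print -r -- ~foo
')

# Hash-based: only the table entry exists; the parameter foo is unrelated.
static=$(zsh -fic '
  hash -d foo=/bar
  foo=/baz
  print -r -- ~foo
')

echo "parameter-based: ~foo -> $tracked"
echo "hash-based:      ~foo -> $static"
```

With the current implementation I would expect /baz for the first case and /bar for the second.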

Parameter promotion to named directory

I'm still disgruntled by this notion of "promotion" to a nameddir.
The only thing the PM_NAMEDDIR flag means is to update the nameddir
table when the parameter changes. If its name is not a key in the
table, it's not a nameddir, whether the flag is set or not.

I think that it's accurate to say that the last two commands in the previous section promote the foo parameter to a named directory. Bart points out that such a promotion includes two parts. One is to flag the foo parameter with PM_NAMEDDIR to ensure that each future value change will also trigger a value change of any foo named directory. The second part is to actually create a foo named directory. Both parts are needed for a successful promotion.

Although -- even without the patch I don't follow how the nameref is
"promoted" to a namedir, since you can't initialize a nameref to a
value starting with a slash.

My two patches are actually only concerned with the first part. Technically, their aim is to prevent {named references, non-directory scalars} from being flagged with PM_NAMEDDIR. Or, in user terms, their aim is to prevent {named references, non-directory scalars} from interfering with identically named named directories.

> A better description of the patch workers/54760 is that it prevents the promotion to named directories of (scalar) parameters whose value doesn't start with a "/".

That's already tested in adduserdirs(). What difference does it make
to also test it earlier?

The problem is that if the value is not a directory, adduserdir() removes any existing named directory. The Zsh startup scripts could contain the following sequence of commands:

hash -d foo=/bar # Creates a named directory foo

…

setopt autonamedirs

…

foo=not-a-dir # Removes the named directory foo even though the parameter foo was never meant to designate a named directory

For the same reason, "typeset -n foo=var" currently removes any foo named directory (when autonamedirs is enabled). The two patches prevent these two foo parameters from being flagged with PM_NAMEDDIR and thus also prevent any problematic calls to adduserdir().
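The scalar case can be checked with the following sketch, assuming zsh is on the PATH. The -i flag is used because, as far as I can tell, assignments only touch the nameddirs table in interactive shells. "hash -dm foo" lists the foo entry if it still exists; the variable name entry is mine.

```shell
# Create a hash-based named directory, enable autonamedirs, then assign
# a non-directory value to an unrelated parameter with the same name.
entry=$(zsh -fic '
  hash -d foo=/bar
  setopt autonamedirs
  foo=not-a-dir
  hash -dm foo
')

echo "foo entry after the assignment: ${entry:-<none>}"
```

On an unpatched zsh I would expect <none>; with the patches, the entry foo=/bar should survive.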

The question is ...

setopt autonamedirs
foo=bar # is it important that foo is (or is not yet) linked to the nameddir table?

Yes, I think that it's important that at this point foo is NOT YET linked to the nameddir table. See the discussion of use cases in the next section.

Hash-based vs parameter-based

I think that it's hard to deny that the current implementation gives rise, intentionally or accidentally, to two distinct behaviors (at least when AUTO_NAME_DIRS is disabled). If you run "hash -d foo=/foo1", then ~foo is static and always expands to /foo1. If instead you run "foo=/foo1; : ~foo", then ~foo is dynamic and always expands to the same as $foo. From an end user's perspective, everything works as if there are two kinds of named directories. From an implementation perspective, the distinction isn't encoded in the nameddirs table but in the presence or absence of a parameter flagged with PM_NAMEDDIR.

Even though the distinction is never alluded to in the documentation, and maybe wasn't even planned and arose by accident, it makes sense to me. If one runs "foo=/foo1; : ~foo" and thus relies on the parameter foo to create ~foo, it feels legitimate to ensure that ~foo will always expand to $foo. Whereas if one runs "hash -d foo=/foo1" and thus creates ~foo independently of any foo parameter, it feels legitimate to ignore any current or future foo parameter when expanding ~foo. The good news is that, except for the case where one runs "foo=/foo1; : ~foo" and later "hash -d foo=/foo2", everything works as described here.
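The one colliding case can be reproduced with a sketch like this one, assuming zsh is on the PATH (-i because, as far as I can tell, assignments only update the nameddirs table in interactive shells). The variable name collision is mine.

```shell
# Start parameter-based, then try to make ~foo static with hash -d,
# then assign to the parameter again.
collision=$(zsh -fic '
  foo=/foo1
  : ~foo
  hash -d foo=/foo2
  print -r -- ~foo
  foo=/foo3
  print -r -- ~foo
')

printf '%s\n' "$collision"
```

The first line should be /foo2. If the parameter foo still carries PM_NAMEDDIR at that point, the second line is /foo3, i.e. the assignment silently overrides the "hash -d" entry.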

I take it from Bart's comments about the history of named directories that the creation of these two behaviors is accidental rather than intentional. However, both look useful to me, and I know at least one person who relies heavily on both of them.

A perfect use case for hash-based named directories is to shorten directories that you often use in command lines. That's a fixed set of directories for which you can define a short name with a call to "hash -d". You don't need parameters for these directories since you can always use the ~foo syntax. In fact, this avoids "polluting" the parameter namespace by having a separate namespace for named directories. Here, the expectation is that no parameter will ever modify any of your named directories.

A perfect use case for parameter-based named directories is directory parameters like JAVA_HOME that are used in various scripts. It can be useful to promote these to named directories, for example to ensure that JAVA_HOME will be used in filename abbreviations. Here, the expectation is that ~JAVA_HOME will always expand to the same as $JAVA_HOME.
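As an illustration of the abbreviation use case, here is a small sketch, assuming zsh is on the PATH. The two JAVA_HOME paths are made up and need not exist; "print -D" is used to show how zsh abbreviates a path, and -i is used because, as far as I can tell, named directories are only maintained from parameters in interactive shells. The variable name abbrev is mine.

```shell
# Promote JAVA_HOME with ": ~JAVA_HOME", then change its value and check
# that path abbreviation follows the new value.
abbrev=$(zsh -fic '
  JAVA_HOME=/usr/lib/jvm/default-java
  : ~JAVA_HOME
  print -rD -- /usr/lib/jvm/default-java/bin
  JAVA_HOME=/usr/lib/jvm/other-java
  print -rD -- /usr/lib/jvm/other-java/bin
')

printf '%s\n' "$abbrev"
```

I would expect both lines to be abbreviated with ~JAVA_HOME, since the named directory follows the parameter.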

Parameters like JAVA_HOME are typically defined in Zsh startup scripts. In order to promote them to named directories, one can add a final script that contains commands like ": ~JAVA_HOME". However, that can be tedious if there are many such parameters, and it has a maintenance cost. Instead, one can simply enable autonamedirs before running the startup scripts and optionally disable it afterwards. However, if one also relies on hash-based named directories, then in order to minimize conflicts with them, it becomes important that only parameters that hold a directory get flagged with PM_NAMEDDIR, which is what the two patches try to address.

If we agree that these two behaviors make sense, then it might be worth being a little more explicit about them in the documentation. We could also try to address the case where the two behaviors collide, for example by removing PM_NAMEDDIR from any foo parameter when one runs "hash -d foo=/foo1". Alternatively, the documentation could state that one should not try to mix the two behaviors.

Named references

If foo is a named reference that refers to a scalar parameter bar that contains a directory, then foo could in principle be promoted to a named directory. However, with the current implementation it would be very hard to ensure that ~foo keeps expanding to the same as $foo. The problem is that whenever bar is changed, ~foo would have to change too, but there is nothing that links back from bar to foo or ~foo. That's why I think we should never promote named references, even though in principle it would make sense.

Bart's questions

Consider this sequence:

setopt autonamedirs
dirname=$HOME/mynameddir
unsetopt autonamedirs
hash -d dirname=/tmp

My preference is that whenever you run "hash -d foo=/bar" it creates a hash-based named directory. This would require that "hash -d" removes PM_NAMEDDIR on any existing global foo parameter.

Now assign something to dirname. Describe what should happen
-- when the value begins with a slash
-- when the value does not begin with slash

Both would have no effect on ~dirname.

For "unset dirname", does it behave like the latter case? Why or why not?

Again, no effect because there is no longer any link between dirname and ~dirname.

Now describe what happens in those 3 cases when that last "hash -d" is
"hash -r".

I don't think we would need to change what "hash -r" does. I assume that currently it empties the table but doesn't remove any PM_NAMEDDIR flag. Therefore "dirname=/foo" would set ~dirname to /foo, and "dirname=not-a-dir" as well as "unset dirname" would remove ~dirname.

Now dispense with autonamedirs but use ~dirname before "hash -d" or
after "hash -r".

That should yield the exact same results because already today the following

setopt autonamedirs
dirname=$HOME/mynameddir
unsetopt autonamedirs

is equivalent to

dirname=$HOME/mynameddir

: ~dirname

Both set PM_NAMEDDIR on dirname and initialize ~dirname with $HOME/mynameddir.
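The equivalence can be checked with a sketch along these lines, assuming zsh is on the PATH (-i because, as far as I can tell, assignments only update the nameddirs table in interactive shells). The later assignment to /tmp and the variable names via_option and via_tilde are mine, added only to show that ~dirname keeps following the parameter in both cases.

```shell
# Variant 1: promotion through autonamedirs, which is then disabled again.
via_option=$(zsh -fic '
  setopt autonamedirs
  dirname=$HOME/mynameddir
  unsetopt autonamedirs
  dirname=/tmp
  print -r -- ~dirname
')

# Variant 2: promotion through a first use of ~dirname.
via_tilde=$(zsh -fic '
  dirname=$HOME/mynameddir
  : ~dirname
  dirname=/tmp
  print -r -- ~dirname
')

echo "autonamedirs: ~dirname -> $via_option"
echo "tilde use:    ~dirname -> $via_tilde"
```

Both variants should report /tmp.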

Mikael's proposal

That said, I would prefer using only the hash builtin to modify the
table and that it had nothing to do with parameters whatsoever. The
whole point of having a separate ~ namespace to me is that I can avoid
littering the $ namespace. Maybe while we're messing around in this
area, nobody would mind if I also add an option to disable ~foo
automagically checking if $foo has a path in it and setting ~foo,
since you currently cannot disable this?

This looks like an option that would be much appreciated by people interested in hash-based named directories.

Alternate implementation

I'm starting to wonder whether an alternate implementation would be possible. With my two patches we end up with 3 places with the same "/" check (adduserdir, getnameddir, strsetfn). And they don't even do exactly the same thing: getnameddir uses getsparam() while strsetfn uses the parameter's native value. Couldn't we do better?

Another idea that comes to mind is that instead of a PM_NAMEDDIR flag on parameters, we could maybe have an ND_PARAMBASED flag on named directories. Instead of updating a named directory whenever a PM_NAMEDDIR parameter is changed, we would look up a parameter whenever an ND_PARAMBASED named directory is expanded. For now, it's just an idea. I haven't thought it through at all.

Philippe