Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: A solution to fix hidden references in reference chains



So, speed/space tradeoff.  I guess the question would be how often we
need to find a named reference 100 scopes away vs. how often we need
100 scoperefs at all, and whether the never-freed 800 bytes (?) is
worth worrying about.

Afaik, by default the call stack is limited to 500. With that default, the absolute worst case is a very long running script that at the very beginning creates named references to nested scopes at all levels from 1 to 500 and then never again uses named references. This would create an array of 500 empty linked lists that never get freed. I would argue that nowadays, even such a large waste is not worth worrying about, especially given the fact that it would only occur in an extremely unlikely corner case.

For scripts that don't use references, the overhead is limited to a null array and a size variable. For most scripts, the array would probably never grow beyond one or two dozen elements. To me, this definitely looks like something we don't need to worry about. It's probably only the very rare scripts that use both recursion and references for which the array may grow to larger sizes. However, that's also the scripts where the improved performance of the array of linked lists (vs the linked list of linked lists) would come in handy. Another advantage of the array solution is that it's more straightforward; inserting a linked list in the middle is trivial while with a linked list of linked lists it would require more involved code.

For all these reasons, I would rather stick with the array of linked lists.

> You can "unset" an enclosing parameter and then redefine/reuse it with "typeset -g".

But you can't change a named reference into something else that way.
Even "unset -n" doesn't remove the nameref-ness of the surrounding
scope parameter.

Yep but I was working on a fix for this issue. I have now sent the patch, see workers/54236.

Aside #1:  Should this code in bin_unset have changed along with other
parts of array-initializer removal in workers/53776?
            if ((pm->node.flags & PM_NAMEREF) &&
                (!(pm = resolve_nameref(pm)) || pm->width)) {
                /* warning? */
                continue;
            }
(That is, pm->width now irrelevant here?)

Yes, if/once workers/54198 is committed, the check for pm->width can (and should) be dropped.

Aside #2, noting here so I don't have to refresh my memory again:
% typeset -n positional=3

This used to produce
zsh: invalid variable name: 3

but as of workers/54064 this is allowed:
% typeset -p positional
typeset -n positional=3

with an intent to fix later to be a reference to $3 (currently
$positional always returns empty). 

I have a patch that fixes references to positional parameters. I'll try to send it in the next few days.

> I should probably add some explicit tests for the following cases:
> - Ref added to multiple scope lists
> - Ref added multiple times to the same scope list
> - Ref added to a scope list is now an integer with base=N

Per above, I think this is actually impossible, at least at present?

Once workers/54236 is committed all of that should become possible. I will try to rebase the patch onto workers/54236 and add tests for these cases.

Fyi, here are a few other things I intend to look into:

- In workers/54236, I had to add many checks for PM_UNSET and PM_DECLARED. Most of these would not be needed if whenever a local variable is unset all its flags were cleared and replaced with PM_UNSET. This would guarantee that if a parameter has a type flag (one of PM_ARRAY, PM_INTEGER, PM_NAMEREF, ...) then it's for sure a still alive parameter of that type. Currently, you should always check for the presence of the type flag and the absence of PM_UNSET or the presence of PM_DECLARED, which is rather verbose and very error prone. Do you see any reason why flags should NOT be cleared when a parameter is unset? Afaik, there exists no mechanism that allows undeleting an unset parameter. So I don't see any reason why flags would need to be kept after a parameter is unset.

- Currently PM_UNSET can mean two very different things. If PM_DECLARED is also present, then PM_UNSET means that the parameter was declared with no explicit value (needed when the option TYPESET_TO_UNSET is enabled). Otherwise, PM_UNSET means that the parameter was unset, for example with a call to the builtin "unset". This is rather confusing. I wonder whether we would be better served if PM_DECLARED was replaced with a PM_NULL that means that the parameter has a null value and PM_UNSET would always mean that the parameter was unset. I will give it a try to see what code change this would entail.

- Bart, you once suggested that dereferencing not-yet-initialized references ought to trigger an error. Currently such references need to be handled in several places and, when they are part of assignments, they trigger various kinds of errors/warnings that may not necessarily make much sense to end users. My impression is that things could be simpler and more uniform if dereferencing a not-yet-initialized reference would always trigger an error. I will try to write a patch that does that,

- Some time ago, I started to wonder whether the reference resolution implemented in resolve_nameref_rec should instead be part of fetchvalue. My impression is that it would avoid a number of redundancies. I think that it would allow for a much simpler fix for references to positional parameters than the one that I currently have. I also think that it would fix or make it much easier to fix many of the bugs related to subscripted references (however, I would still vote for not supporting them for now). This would be a rather large refactoring but my impression is that many things would be simpler afterwards. However, that's just my gut feeling. I haven't started working on this and maybe once I do, I will quickly find out that something doesn't work.

Philippe


On Tue, Mar 17, 2026 at 6:51 PM Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx> wrote:
On Mon, Mar 16, 2026 at 6:23 PM Philippe Altherr
<philippe.altherr@xxxxxxxxx> wrote:
>>
>> scoperefs is a C array of LinkList pointers.  It will always be as
>> large as the greatest locallevel (deepest nested scope) in which a
>> nameref has pointed to a nested parameter
>
> One issue with a list of lists is that it can become expensive to find the list of params of a given scope. In the example below each call to "rec" needs to insert the "r" of the enclosing scope into the list of the current scope. That's cheap as that would be the list at the head of the list of lists. However, the last call to "rec" needs to insert "ref" into the list of the scope of "inner". That's expensive because it's behind the 100 lists corresponding to the 100 scopes of "rec".

So, speed/space tradeoff.  I guess the question would be how often we
need to find a named reference 100 scopes away vs. how often we need
100 scoperefs at all, and whether the never-freed 800 bytes (?) is
worth worrying about.

> Another issue with a list of lists arises in case you only want to store non-null/empty lists. Then you need an additional data structure to tell you to which scope each list of params corresponds to.

Yes, you'd need a structure consisting of a node and a level.

> You can "unset" an enclosing parameter and then redefine/reuse it with "typeset -g".

But you can't change a named reference into something else that way.
Even "unset -n" doesn't remove the nameref-ness of the surrounding
scope parameter.

Aside #1:  Should this code in bin_unset have changed along with other
parts of array-initializer removal in workers/53776?
            if ((pm->node.flags & PM_NAMEREF) &&
                (!(pm = resolve_nameref(pm)) || pm->width)) {
                /* warning? */
                continue;
            }
(That is, pm->width now irrelevant here?)

Aside #2, noting here so I don't have to refresh my memory again:
% typeset -n positional=3

This used to produce
zsh: invalid variable name: 3

but as of workers/54064 this is allowed:
% typeset -p positional
typeset -n positional=3

with an intent to fix later to be a reference to $3 (currently
$positional always returns empty).  Now it only happens for "up"
namerefs:
% () { typeset -nu wrong=5 }
(anon): invalid name reference: 5

This is a bit more interesting now that we can't nameref to argv[3].

> The example below shows how the "ref" parameter defined in "scope1" can be added to the list of params to reset of "scope2" AND of "scope3".

Note to self:  Run some variants of this example after applying workers/54196

> I should probably add some explicit tests for the following cases:
> - Ref added to multiple scope lists
> - Ref added multiple times to the same scope list
> - Ref added to a scope list is now an integer with base=N

Per above, I think this is actually impossible, at least at present?

> - Ref added to a scope list was unset (btw, the code should probably explicitly skip params with PM_UNSET)


Messages sorted by: Reverse Date, Date, Thread, Author