Zsh Mailing List Archive Messages sorted by: Reverse Date, Date, Thread, Author

Re: Bug in alias expansion

X-seq: zsh-workers 37128
From: Peter Stephenson <p.stephenson@xxxxxxxxxxx>
To: zsh-workers@xxxxxxx
Subject: Re: Bug in alias expansion
Date: Tue, 17 Nov 2015 17:29:51 +0000
In-reply-to: <20151115200356.0f3a80a2@ntlworld.com>
List-help: <mailto:zsh-workers-help@zsh.org>
List-id: Zsh Workers List <zsh-workers.zsh.org>
List-post: <mailto:zsh-workers@zsh.org>
Mailing-list: contact zsh-workers-help@xxxxxxx; run by ezmlm
Organization: Samsung Cambridge Solution Centre
References: <CAFvQaj4AKN36Ntdxt0rNCqPxWFHtjLMt3zPK65y2Xf9G5iUfWA@mail.gmail.com> <20151115200356.0f3a80a2@ntlworld.com>

On Sun, 15 Nov 2015 20:03:56 +0000
Peter Stephenson <p.w.stephenson@xxxxxxxxxxxx> wrote:
> On Fri, 13 Nov 2015 13:03:52 -0500
> Kynn Jones <kynnjo@xxxxxxxxx> wrote:
> > % typeset -A frobozz
> > % alias -g foo='echo xyz'
> > % frobozz[$(foo)]=9
> > zsh: not an identifier: frobozz[$(fooech9
>...
> This is assuming the tokstr you get back is the same you sent down,
> which it may not because it might need reallocating --- it does indeed
> in this case because we are expanding an alias inside.  It confused me a
> bit that it's only the end of the string (s, from lexbuf.ptr) that's
> being passed back, but that's presumably because it's assuming the start
> doesn't move, which is the (or an) erroneous assumption.  It's possible
> the fix therefore is simply ensuring that both the start and the end of
> the string are pased back and the start passed in forgotten.

I abandoned this after finding out that a whole tower of functions
(is that a collective noun for functions, or only in completion code?)
sitting on top of parse_subscript() is relying on the in-place
tokenisation that was screwing up the string.

We can get rid of the currently visible problems by allocating a new
string for the token and copying back the result to the original.

It's not very efficient, and it still has the problem that if something
funny happens to the length --- and I don't think there's any guarantee
of length preservation built into the lexer even though it happens to
work in the cases we've looked at so far --- then somebody's going to
get hurt.

However, it does remove the immediate problem (is functionally no
worse than before the problem was introduced) and the inefficiency
should be minor unless the subscripts get really huge.

In the error case, we could possibly just not do the copy but move
the end pointer; however, that case doesn't need to be efficient
and I don't want to change more of the logic than is necessary.

diff --git a/Src/lex.c b/Src/lex.c
index 89af961..81904c1 100644
--- a/Src/lex.c
+++ b/Src/lex.c
@@ -1617,7 +1617,7 @@ parsestrnoerr(char **s)
 mod_export char *
 parse_subscript(char *s, int sub, int endchar)
 {
-    int l = strlen(s), err;
+    int l = strlen(s), err, toklen;
     char *t;
 
     if (!*s || *s == endchar)
@@ -1626,18 +1626,34 @@ parse_subscript(char *s, int sub, int endchar)
     untokenize(t = dupstring(s));
     inpush(t, 0, NULL);
     strinbeg(0);
+    /*
+     * Warning to Future Generations:
+     *
+     * This way of passing the subscript through the lexer is brittle.
+     * Code above this for several layers assumes that when we tokenise
+     * the input it goes into the same place as the original string.
+     * However, the lexer may overwrite later bits of the string or
+     * reallocate it, in particular when expanding aliaes.  To get
+     * around this, we copy the string and then copy it back.  This is a
+     * bit more robust but still relies on the underlying assumption of
+     * length preservation.
+     */
     lexbuf.len = 0;
-    lexbuf.ptr = tokstr = s;
+    lexbuf.ptr = tokstr = dupstring(s);
     lexbuf.siz = l + 1;
     err = dquote_parse(endchar, sub);
+    toklen = (int)(lexbuf.ptr - tokstr);
+    DPUTS(toklen > l, "Bad length for parsed subscript");
+    memcpy(s, tokstr, toklen);
     if (err) {
-	err = *lexbuf.ptr;
-	*lexbuf.ptr = '\0';
+	char *strend = s + toklen;
+	err = *strend;
+	*strend = '\0';
 	untokenize(s);
-	*lexbuf.ptr = err;
+	*strend = err;
 	s = NULL;
     } else {
-	s = lexbuf.ptr;
+	s += toklen;
     }
     strinend();
     inpop();
diff --git a/Test/D06subscript.ztst b/Test/D06subscript.ztst
index cffca74..1449236 100644
--- a/Test/D06subscript.ztst
+++ b/Test/D06subscript.ztst
@@ -249,3 +249,20 @@
   string[0]=!
 1:Can't set only element zero of string
 ?(eval):1: string: assignment to invalid subscript range
+
+  typeset -A assoc=(leader topcat officer dibble sidekick choochoo)
+  alias myind='echo leader' myletter='echo 1' myletter2='echo 4'
+  print ${assoc[$(myind)]}
+  print $assoc[$(myind)]
+  print ${assoc[$(myind)][$(myletter)]}${assoc[$(myind)][$(myletter2)]}
+  assoc[$(myind)]='of the gang'
+  print ${assoc[$(myind)]}
+  print $assoc[$(myind)]
+  print $assoc[leader]
+0: Parsing subscript with non-trivial tokenisation
+>topcat
+>topcat
+>tc
+>of the gang
+>of the gang
+>of the gang

Follow-Ups:
- Re: Bug in alias expansion
  - From: Bart Schaefer

References:
- Bug in alias expansion
  - From: Kynn Jones
- Re: Bug in alias expansion
  - From: Peter Stephenson

Messages sorted by: Reverse Date, Date, Thread, Author