Zsh Mailing List Archive

Re: Aliasing separators (Re: grammar triviality with '&&')

X-seq: zsh-workers 34681
From: Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx>
To: zsh-workers@xxxxxxx
Subject: Re: Aliasing separators (Re: grammar triviality with '&&')
Date: Sat, 7 Mar 2015 13:10:08 -0800
In-reply-to: <20150307155252.75848f74@ntlworld.com>
References: <54F33934.2070607@eastlink.ca> <13666281425228233@web7o.yandex.ru> <54F345D3.9010204@eastlink.ca> <D0509295-7DA9-4F18-9E3D-D50C0A756998@larryv.me> <20150302022754.GA7449@xvii.vinc17.org> <CABx2=D8efL3X2tfB+_+VweY2yye6EhaMNbJa3b3jJeVMp=7gaQ@mail.gmail.com> <20150302104619.GC6869@xvii.vinc17.org> <20150302110610.2e2c7e86@pwslap01u.europe.root.pri> <CAH+w=7YoHjN85hqOZVywOfYGZqvU74vZrbE84Ln+V2HQi-6nSA@mail.gmail.com> <20150304144756.GA27231@ypig.lip.ens-lyon.fr> <150304175112.ZM19818@torch.brasslantern.com> <20150305100638.55631238@pwslap01u.europe.root.pri> <150305090720.ZM8441@torch.brasslantern.com> <20150305174011.0be5a31e@pwslap01u.europe.root.pri> <150305174240.ZM8732@torch.brasslantern.com> <20150306094039.3d968c63@pwslap01u.europe.root.pri> <150306112628.ZM9769@torch.brasslantern.com> <20150307155252.75848f74@ntlworld.com>

On Mar 7,  3:52pm, Peter Stephenson wrote:
} Subject: Re: Aliasing separators (Re: grammar triviality with '&&')
}
} On Fri, 6 Mar 2015 11:26:28 -0800
} Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx> wrote:
} > 
} > In practice that means something like "only tokens that can be changed
} > by concatenating with another string using simple lexical pasting, may
} > be aliased."  But that isn't a very satisfying way to say it (and it's
} > not 100% true because of "{" being a reserved word).
} 
} Sure, but I'm not sure that's particularly useful for users.  The rule
} looks something like "you can't alias it if it's one of them things
} where you don't need to put a space after it", or something like that.

Actually, that helped me to put some sort of explanation around my
intuition of how aliasing should work:

 A token may be expanded as an alias only if doing so cannot change the
 lexical interpretation of any tokens that may appear adjacent to it.

This emphasizes that "{this" expanding to "{ foothis", in the example at
the tail of my previous message, is a bug: the interpretation of "this"
has been changed, and the internal inconsistency is manifest when you
examine what has been stored in the history.

Also note "cannot" rather than "does not," which emphasizes why you
can't make an alias for "&&" even if you always write "x && y" rather
than "x&&y", and why you can't alias newline.
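
As a rough illustration of the rule above, here is a minimal sketch in C.
The table and the name may_alias() are hypothetical, made up for this
message, and are not zsh's real token table or API.  The only tokens
listed are the ones named in this thread.  Note that "{" is deliberately
absent: it is a reserved word, so "{this" lexes as a single word.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list of self-delimiting tokens: the lexer recognizes
 * these even with no whitespace around them, so "x&&y" is three tokens.
 * This is an illustration, not zsh's real token table. */
static const char *const self_delimiting[] = {
    "&&", ";", "|", "\n", NULL
};

/* Return true if `tok` could be aliased under the adjacency rule.
 * A self-delimiting token can sit directly against its neighbours, so
 * replacing it could change how those neighbours are lexed. */
bool may_alias(const char *tok)
{
    for (size_t i = 0; self_delimiting[i] != NULL; i++) {
        if (strcmp(tok, self_delimiting[i]) == 0)
            return false;
    }
    return true;
}
```

The test is on what the token *can* do, not on how it was written, which
is why "x && y" with spaces gets the same answer as "x&&y".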

On the other hand, if aliasing were an entirely different stage rather
than occurring during regular lexing -- I think csh worked that way --
then I might be less concerned about this.  That is, for csh I believe
it went something like:

1. Read in the line
2. Break the line into words using whitespace and quoting **
3. Check each word for alias expansion
4. Apply lexical analysis to the result

Steps 2 and 3 are why csh aliases are allowed to make \!:N history
references to the words in the line while being expanded.  Since zsh
does steps 1 and 4 simultaneously, the rules at step 3 have to be
different, but the basic intention of applying the expansion to "words"
was never meant to go away as a result.

** Also, csh broke the line up into commands at ';' '&&' etc. before
applying aliases, so \!:N references don't cross command boundaries.
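
To make the separate-stage idea concrete, here is a small sketch in C of
steps 2 and 3 running before any lexing.  It is written from my
description above, not from csh source.  The alias table and the names
alias_lookup() and expand_line() are hypothetical.  It also simplifies
heavily: it ignores quoting, it skips the splitting into commands
described in the footnote, and it assumes only the first word of the
line is checked.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical alias table, for illustration only. */
struct alias { const char *name; const char *text; };
static const struct alias aliases[] = {
    { "ll", "ls -l" },
    { NULL, NULL }
};

/* Step 3: look one word up in the table; NULL if it is not an alias. */
const char *alias_lookup(const char *word, size_t len)
{
    for (size_t i = 0; aliases[i].name != NULL; i++) {
        if (strlen(aliases[i].name) == len &&
            strncmp(aliases[i].name, word, len) == 0)
            return aliases[i].text;
    }
    return NULL;
}

/* Steps 2 and 3: find the first whitespace-delimited word of `line`,
 * replace it if it is an alias, and copy the rest unchanged to `out`.
 * Returns 0 on success, -1 if `out` is too small.  Step 4 (lexical
 * analysis) would then run on the contents of `out`. */
int expand_line(const char *line, char *out, size_t outsz)
{
    const char *start = line + strspn(line, " \t");
    size_t len = strcspn(start, " \t");
    const char *text = alias_lookup(start, len);
    int n;

    if (text != NULL)
        n = snprintf(out, outsz, "%s%s", text, start + len);
    else
        n = snprintf(out, outsz, "%s", line);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

The point of the sketch is only the ordering: the alias stage sees
whole words and finishes before the lexer looks at anything.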

} > The other minor point is that this slows down lexical analysis a lot.
} > Many more things are going through checkalias(), including in some
} > cases (as you pointed out) every individual character.
} 
} That's not necessarily that minor, actually; do you have numbers?

I "repaired" POSIXALIASES like so ...

diff --git a/Src/lex.c b/Src/lex.c
index 494ea88..33c5288 100644
--- a/Src/lex.c
+++ b/Src/lex.c
@@ -1739,7 +1739,7 @@ checkalias(void)
        return 0;
 
     if (!noaliases && isset(ALIASESOPT) &&
-       (!isset(POSIXALIASES) ||
+       (!isset(POSIXALIASES) || tokstr &&
         !reswdtab->getnode(reswdtab, zshlextext))) {
        char *suf;

... and then ran this:

    repeat 10 do
      time Src/zsh -ic 'autoload +X -m \*'
      time Src/zsh -o posixaliases -ic 'autoload +X -m \*'
    done

The "autoload +X" has the effect of loading the entire completion suite,
which was the largest convenient bolus of lexing/parsing work I could
think to throw at it.  The difference with 30 aliases defined (none
global) was insignificant, so looking up reserved words is not really a
factor.

I then backed out the lex.c changes and compared the old processing to
the new.  There still wasn't much difference.  However, this is with an
unstripped binary compiled for debugging, so it's possible a larger
difference would show up if optimization were enabled.

} > The lexer can certainly be (and was, before, though it was not
} > explicitly stated) smart enough to know that any token that arrives
} > at that point with tokstr == NULL is not in fact something that
} > could be a command and therefore shouldn't be treated as one.
} 
} Not sure what this means.  "(" in command position is a token but is
} effectively a command

Maybe I could express it this way:  A command is something such that if
you prefix it with a precommand modifier, it's still a command.  The
old aliasing code effectively differentiated those kinds of somethings
from other arbitrary tokens that might appear in what the lexer calls
command position.  It didn't do so in an obvious way, but looking for
tokstr == NULL has the equivalent effect.
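
For readers who want the tokstr test spelled out, here is a sketch in C
that mirrors the patched condition from the diff earlier in this
message, with the grouping of && and || made explicit.  The struct and
its field names are hypothetical stand-ins for the globals and option
lookups that the real checkalias() reads; only the shape of the boolean
expression is taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the state the real test reads from globals
 * and option lookups; every field name here is a stand-in. */
struct lex_state {
    bool noaliases;      /* stands in for the noaliases variable */
    bool aliases_opt;    /* stands in for isset(ALIASESOPT) */
    bool posix_aliases;  /* stands in for isset(POSIXALIASES) */
    const char *tokstr;  /* token text; NULL for a bare token */
    bool is_reserved;    /* stands in for the reswdtab lookup succeeding */
};

/* Mirror of the patched condition.  With posix_aliases set, a token
 * with no string, or one that is a reserved word, is not looked up. */
bool alias_lookup_allowed(const struct lex_state *s)
{
    return !s->noaliases && s->aliases_opt &&
           (!s->posix_aliases ||
            (s->tokstr != NULL && !s->is_reserved));
}
```

Note that in this form the tokstr check only applies when POSIXALIASES
is set, which matches the patch as posted.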

} My point is that doesn't really make sense unless you decide that "&&"
} at the start of the line isn't going to be a token *at all*, in the same
} way that "(" has effectively the reverse behaviour.  In other words,
} either you lose the parse error on "&& foo" and treat "&&" as a string
} *within* the lexer when it occurs in command position, or there's no
} case to answer here because the formal distinction you're trying to make
} doesn't actually exist within the shell.

This helps me understand what you were getting at, but I would counter
that the existence of precommand modifiers shows that the shell does in
fact have that distinction -- internally the distinction is at another
level, but from an overall perspective "&&" appears only after commands,
and "(" appears before; neither occurs "in place of" a command.  Except
for the -g case, aliases should only apply to things that really can be
"in place of" a command.

} If you did make that change, making "&&" a string rather than a token
} at the start of the line, then you could alias it willy-nilly.

It'd have to be not just at the start of a line, but after every ";"
or "|" and so on.  But you can't really do that -- it has to be a token
when it is not an alias, otherwise you get "&&: command not found"
instead of the syntax error you should get.

So you'd have to lex it as a string, look to see if it's an alias, and
then if it is not, back up and re-lex it.  Which is sort of what happens
already when it IS an alias, I suppose.
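
The lex-then-back-up sequence just described might look something like
the following sketch in C.  Everything in it is hypothetical: the enum,
the lookup callback type, the two sample lookup functions and
lex_command_word() are invented for illustration and do not correspond
to anything in lex.c.

```c
#include <stddef.h>
#include <string.h>

enum lex_result {
    LEX_ALIAS,     /* word was an alias; caller re-lexes the expansion */
    LEX_OPERATOR,  /* word was re-lexed as the "&&" operator token */
    LEX_STRING     /* ordinary word, stays a string */
};

/* Hypothetical lookup: returns the expansion text, or NULL. */
typedef const char *(*alias_fn)(const char *word);

/* Sample lookup with no aliases defined. */
const char *no_aliases(const char *word)
{
    (void)word;
    return NULL;
}

/* Sample lookup with "&&" aliased to "and" (an invented expansion). */
const char *andand_alias(const char *word)
{
    return strcmp(word, "&&") == 0 ? "and" : NULL;
}

/* Lex `word` in command position as a string first, then fall back. */
enum lex_result lex_command_word(const char *word, alias_fn lookup,
                                 const char **expansion)
{
    const char *text = lookup(word);   /* first pass: treat as string */

    *expansion = NULL;
    if (text != NULL) {
        *expansion = text;
        return LEX_ALIAS;
    }
    if (strcmp(word, "&&") == 0)       /* not an alias: back up, re-lex */
        return LEX_OPERATOR;
    return LEX_STRING;
}
```

With no alias defined, "&&" comes back as LEX_OPERATOR, so the parser
can still report the syntax error rather than "command not found".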

-- 
Barton E. Schaefer

Follow-Ups:
- Re: Aliasing separators (Re: grammar triviality with '&&')
  - From: Vincent Lefevre

References:
- Aliasing separators (Re: grammar triviality with '&&')
  - From: Bart Schaefer
- Re: Aliasing separators (Re: grammar triviality with '&&')
  - From: Peter Stephenson
- Re: Aliasing separators (Re: grammar triviality with '&&')
  - From: Bart Schaefer
- Re: Aliasing separators (Re: grammar triviality with '&&')
  - From: Peter Stephenson
- Re: Aliasing separators (Re: grammar triviality with '&&')
  - From: Bart Schaefer
- Re: Aliasing separators (Re: grammar triviality with '&&')
  - From: Peter Stephenson
