Zsh Mailing List Archive

Re: Aliasing separators (Re: grammar triviality with '&&')



On Fri, 6 Mar 2015 11:26:28 -0800
Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx> wrote:
> On Mar 6,  9:40am, Peter Stephenson wrote:
> } Subject: Re: Aliasing separators (Re: grammar triviality with '&&')
> }
> } OK, to state my basic position (though it's kind of moot --- as I said I
> } don't think anybody really needs the change)  1. tokenisation is part of
> } lexing  2. alias expansion comes between lexing and parsing  3. any
> } result of lexing is game for alias expansion, unless you make stricter
> } rules than zsh already has.  But this discussion isn't really going
> } anywhere.
> 
> Understood, but (prior to 34641) zsh *did* have stricter rules, and (in
> terms of the lexer, not in terms of explaining to end users) the rule
> was very simple:  only STRING tokens are subject to alias expansion.
> 
> In practice that means something like "only tokens that can be changed
> by concatenating with another string using simple lexical pasting, may
> be aliased."  But that isn't a very satisfying way to say it (and it's
> not 100% true because of "{" being a reserved word).

Sure, but I'm not sure that's particularly useful for users.  The rule
looks something like "you can't alias it if it's one of them things
where you don't need to put a space after it", or something like that.
We can still document it somehow, though, so this isn't really
fundamental.
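
For illustration only, the pre-34641 rule Bart describes could be written
as a one-line predicate.  This is a toy sketch, not zsh code: the enum and
the function name are made up, and the token kinds are just stand-ins for
the lexer's real token codes.

```c
#include <stdbool.h>

/* Toy token kinds; hypothetical stand-ins for the lexer's token codes. */
enum toy_token {
    TOY_STRING,   /* an ordinary word such as "ls" */
    TOY_DAMPER,   /* "&&" */
    TOY_NEWLIN,   /* end of line */
    TOY_INPAR     /* "(" */
};

/* The rule as stated above: only STRING tokens are subject to aliasing. */
bool toy_alias_eligible(enum toy_token tok)
{
    return tok == TOY_STRING;
}
```

Under that rule "&&", newline and "(" are never looked up, whatever the
alias table contains.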

> As an aside, using zshlextext isn't really correct either if the real
> intention is to allow aliasing of tokens.  Did you plan to allow the
> aliasing of the NEWLIN token?  Because with 34641,
> 
>     alias $'\n'=...
> 
> does not work, but
> 
>     alias '\n'=...
> 
> actually does create an alias for hitting enter at a blank PS1 prompt.
> The point being that for non-STRING tokens, zshlextext doesn't always
> represent the actual input string.

Yes, (not necessarily here but in general) that has effects in other
cases e.g. text representation of syntactic structures, so it's
certainly something to think about.  The cases where it's different are
weird and wonderful enough that it's not clear you'd want them to work,
but that's a fairly fuzzy target.  There'd certainly be room for more detailed
advice to users (though, to be honest, I'm less convinced than I used to
be about the merits of longer documentation).

> The other minor point is that this slows down lexical analysis a lot.
> Many more things are going through checkalias(), including in some
> cases (as you pointed out) every individual character.

That's not necessarily that minor, actually; do you have numbers?
Tokens are a small fraction of most commands but even there there may be
pathological cases.

(Hmm... entirely separately, what would we gain by optimising the case
where there are no global aliases, which a lot of us don't use, so that
we don't search for them at all?  That looks straightforward --- count
added or removed global aliases, or, less bug-prone, rescan the alias
table when it changes, and only search for aliases if the count is
non-zero or incmdpos or inalmore is set.  The major problem is the
likelihood of an obscure bug in the count rendering global aliases
unusable.)
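
To make the counting idea concrete, here is a rough, self-contained C
sketch.  It is not taken from the zsh source: the counter and all three
function names are invented for illustration, and the incmdpos and
inalmore parameters merely stand in for the lexer state of the same
names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bookkeeping; none of these names exist in zsh itself. */
static int global_alias_count = 0;

/* Call when an alias is added; is_global marks an "alias -g" definition. */
void note_alias_added(bool is_global)
{
    if (is_global)
        global_alias_count++;
}

/* Call when an alias is removed. */
void note_alias_removed(bool is_global)
{
    if (is_global) {
        /* A count drifting below zero is the obscure bug to worry about. */
        assert(global_alias_count > 0);
        global_alias_count--;
    }
}

/*
 * Decide whether the alias table needs to be searched for the current
 * token: only if some global alias exists, or the lexer state says an
 * ordinary alias could apply here.
 */
bool need_alias_lookup(bool incmdpos, bool inalmore)
{
    return global_alias_count != 0 || incmdpos || inalmore;
}
```

With no global aliases defined, a token outside command position skips
the lookup entirely; defining a single global alias turns the lookup
back on for every token.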

> Finally it seems wrong that "setopt POSIXALIASES" disallows aliasing of
> reserved words but (with 34641) still allows aliasing of other special
> tokens.

Yes, that looks like a real bug.

> } On Thu, 5 Mar 2015 17:42:40 -0800
> } Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx> wrote:
> } > torch% &&bar
> } > 
> } > I would argue that the "&&" is NOT "in command position" because in the
> } > normal lexical situation "command position" ENDS just to the left of any
> } > separator.  There's NOTHING in "command position" in that example.
> } 
> } Well, that's not how the lexer actually works.  It's been told it's in
> } command position and it fetches the next token.  So whatever comes at
> } the start of the line *must* be in command position.
> 
> This is curiously flipped around from the previous discussion; now you're
> arguing from the strict lexer POV and I'm talking about what it ought to
> mean to the end user.

I'm under the impression I've been arguing about what actually happens,
which is complicated when different people have different views of it.

> The lexer can certainly be (and was, before, though it was not explicitly
> stated) smart enough to know that any token that arrives at that point
> with tokstr == NULL is not in fact something that could be a command and
> therefore shouldn't be treated as one.

Not sure what this means.  "(" in command position is a token but is
effectively a command meaning "enter a subshell and while you're there
do whatever I tell you next", and is handled as such by the exec.c
chain.

> } Given that, in any case, no one is actually suggesting we change the
> } lexer to do something different with "&&" I don't think I see the
> } relevance anyway.  "&&" is a token and either expanded as an alias or
> } not
> 
> It's relevant to "alias" vs. "alias -g".  If && at the start of the line
> is not in command position, then it doesn't expand unless it has the
> global-alias flag.

My point is that doesn't really make sense unless you decide that "&&"
at the start of the line isn't going to be a token *at all*, in the same
way that "(" has effectively the reverse behaviour.  In other words,
either you lose the parse error on "&& foo" and treat "&&" as a string
*within* the lexer when it occurs in command position, or there's no
case to answer here because the formal distinction you're trying to make
doesn't actually exist within the shell.

If you did make that change, making "&&" a string rather than a token at
the start of the line, then you could alias it willy-nilly.  So I still
don't really see it as relevant to the behaviour of tokens.

I suspect I'm not explaining this point properly.

pws


