Date: Wed, 20 Jan 2016 07:47:54 +0000
From: Daniel Shahaf <d.s@daniel.shahaf.name>
To: Bart Schaefer <schaefer@brasslantern.com>
Cc: Peter Stephenson <p.stephenson@samsung.com>,
	Zsh hackers list <zsh-workers@zsh.org>
Subject: Re: bufferwords() lexes a subshell in a shortloop repeat as a string
X-Seq: zsh-workers 37701

Bart Schaefer wrote on Mon, Jan 18, 2016 at 20:56:04 -0800:
> [Returning to the original topic of this thread ...]
> 
> On Sun, Jan 17, 2016 at 6:25 PM, Daniel Shahaf <d.s@daniel.shahaf.name> wrote:
> > What confuses me is that 'repeat 3 (x)' and 'repeat 3; do (x); done' are
> > split differently. ;-)
> >
> > Shouldn't both of them treat the "(x)" the same way [either both of
> > them considering it one unit, or both of them considering it three units]?
> 
> As Peter said earlier, the (z) flag does nothing but break the string
> into syntactic shell words.  With the exception of "for" loops, which
> are a weird special case because of "for ((...))", it does NOT
> interpret shell keywords to parse any corresponding loop structures.
> It knows a little about assignments and redirections but otherwise
> reads lexical tokens in their most generic possible context; you can
> think of it as having "lex" without "yacc" to drive it.
> 

Okay; so what I was seeing was that bufferwords() knew that a DOLOOP token
is followed by a command position, but not that a REPEAT token is
followed by one more word (the repeat count) and only then by a command
position.

I think REPEAT is the only place where that happens: other reserved
words are followed immediately by a command position with no intervening
words.  (Which is why get_comp_string() sets 'ins' to '2' only for
REPEAT tokens.)
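
To make the "word-after-next" bookkeeping concrete, here is a rough sketch
in Python rather than zsh's C.  Everything in it is hypothetical: the
function name, the keyword sets, and the assumption that the input is
already split into words.  The countdown is only in the spirit of the
'ins' value mentioned above; it does not reproduce how zsh actually uses
'ins'.

```python
# Toy model only: NOT zsh's lexer.  Words arrive pre-split and the
# keyword sets below are deliberately incomplete.
SEPARATORS = {";", "&&", "||", "|", "&"}
OPENERS = {"do", "then", "else", "elif", "if", "while", "until", "{"}


def command_positions(words):
    """Return the indices of the words that sit in command position."""
    positions = []
    # Number of words still to skip before the next command position.
    # 0 means "this word is in command position"; None means "we are
    # inside a command's arguments, no command position is pending".
    until_cmd = 0
    for index, word in enumerate(words):
        if word in SEPARATORS:
            until_cmd = 0  # a separator is followed by a command position
            continue
        if until_cmd == 0:
            positions.append(index)
            if word == "repeat":
                until_cmd = 1  # skip the count; the word after it is a command
            elif word in OPENERS:
                until_cmd = 0  # followed immediately by a command position
            else:
                until_cmd = None  # ordinary command: the rest are arguments
        elif until_cmd is not None:
            until_cmd -= 1  # this is the intervening word after 'repeat'
    return positions


print(command_positions(["repeat", "3", "(x)"]))  # prints [0, 2]
```

In this toy model both 'repeat 3 (x)' and 'repeat 3; do (x); done' end up
with the "(x)" word in command position, which is the consistency I was
after.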

Aside: bufferwords(), get_comp_string(), and the main loop of z-sy-h
(zsh-syntax-highlighting) have something in common: they all drive the
lexer and keep track of a little bit of syntax.  E.g., with this patch
all of them keep track of "if the command word is 'repeat', the
word-after-next is a command word".

> (z) also does not expand aliases, which means that even if it did
> interpret keywords you could trivially break it by aliasing something
> else to expand as "repeat" or vice-versa.  (In fact you can already
> break the magic "for" parsing the same way.)

Don't do that, then :-)

Cheers,

Daniel