Re: why is eval needed?

X-seq: zsh-users 28421
From: Stephane Chazelas <stephane@xxxxxxxxxxxx>
To: Ray Andrews <rayandrews@xxxxxxxxxxx>
Cc: zsh-users@xxxxxxx
Subject: Re: why is eval needed?
Date: Sun, 20 Nov 2022 15:08:48 +0000
Archived-at: <https://zsh.org/users/28421>
In-reply-to: <b2bb08fe-d184-77ce-b754-7ef77b7ad8ce@eastlink.ca>
List-id: <zsh-users.zsh.org>
Mail-followup-to: Ray Andrews <rayandrews@xxxxxxxxxxx>, zsh-users@xxxxxxx
References: <d0ef2035-e8e9-00c5-7f53-b16609d96262@eastlink.ca> <20221119164852.hwujmufa6hn5lotr@chazelas.org> <352823cc-954a-fa79-d830-d69d593b1c02@eastlink.ca> <3fa3f7ff-1733-4730-a62f-dd0e138c3b72@app.fastmail.com> <da8b2618-3c6f-8825-02ca-b8dfba9abc6b@eastlink.ca> <20221120085519.szudhyg5ewrw3b4o@chazelas.org> <b2bb08fe-d184-77ce-b754-7ef77b7ad8ce@eastlink.ca>

2022-11-20 05:47:33 -0800, Ray Andrews:
[...]
> > arguments.
> Yeah I get it.  The first time I 'got it' was when discussing 'aptitude' and
> the various quotings needed.  Again one is creating an invisible level of
> organization imposed on a string of characters.  Where it got confusing was
> when zsh imposed it's own ideas of grouping, making '-L 2' into a single
> entity.

zsh doesn't impose anything. "-L 2" is a string made of 4
characters in any shell or programming language. There's no
programming language where "-L 2" means 2 strings -L and 2.

What may be the source of your confusion is that in the Bourne
shell, there was a second round of splitting applied to unquoted
text in list context. Space is a special character in the syntax
of the shell which is used to delimit command arguments, but
it's also in the default value of the $IFS special variable
which was used to *further* split text into further arguments.

In the Bourne shell (introduced in the late 70s):

edit file

Is, as in every other shell, code that's meant to run
/path/to/edit with "edit" and "file" as separate arguments. But
if $IFS also happens to contain "i", then those two words are
split further into "ed", "t" and "f", "le", so you end up
running "/bin/ed" with "ed", "t", "f" and "le" as arguments.

That splitting also happened on top of parameter expansion.

ed $file

Upon syntax evaluation yields "ed" and "$file" as two separate
tokens, but both "ed" and the contents of $file further
underwent $IFS-splitting. Almost worse, globbing also happened
on top of $file expansion.

That is a very weird feature from a language design point of
view. Anybody who doesn't know what the shells looked like
before the Bourne shell would think Stephen Bourne was out of
his mind.

The thing is, in the original Unix shell (designed in the early
70s on computers that had kilobytes of RAM), there were no
variables. There was a concept of script that could take
arguments, and in the script, you'd refer to them as $1, $2...
the positional parameters (which have survived in modern shells).

Parameter expansion was really crude. It was a bit like aliases:
the contents of the parameter were just expanded in place into
the code being evaluated.

So for instance, if you called your script as:

my-script 'hi;rm -rf /'

And the script did:

echo $1

That would say hi and destroy the system.
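
The Thompson shell itself is not readily at hand, but eval in a
modern shell does a comparable textual substitution, so it can
serve as a rough stand-in. The sketch below is only an
illustration under that assumption: 'echo INJECTED' is a harmless
hypothetical replacement for the destructive command, and "plain"
and "reparsed" are names made up for the example.

```shell
# Hypothetical stand-in for the script's first argument; a harmless
# "echo INJECTED" takes the place of the destructive command.
set -- 'hi;echo INJECTED'

# Modern shells: the result of the expansion is data, it is never
# parsed again as shell code, so the ";" is just a character.
plain=$(echo $1)

# eval parses the expanded text as code again, which resembles the
# old in-place textual substitution: the ";" now ends a command.
reparsed=$(eval "echo $1")

printf 'plain:    %s\n' "$plain"
printf 'reparsed: %s\n' "$reparsed"
```

In the first case the semicolon is printed, in the second the
text after it is run as a separate command.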

It had its uses though.

my-script 'dir1 dir2' 'file3 file4'

With a script that did:

ls $1
rm $2

Would list the dir1 and dir2 directories and remove the file3
and file4 files.

And:

my-script '*.txt'

in a script that did:

cd /foo
rm $1
cd /bar
rm $1

Would remove the files with names ending in .txt in both /foo
and /bar (or the current directory if cd failed... but that's
beside the point).

My understanding is that the Bourne shell's bizarre IFS handling
and the fact that globbing was performed upon parameter
expansion was an attempt to keep some level of backward
compatibility with that Thompson shell. You see similar things
happening in csh from the same era.

The Korn shell (from the early 80s) kept most of that with the
exception that it only did IFS-splitting upon expansions, not on
literal text. 

In:

file=/some/file
edit $file

With IFS=i, $file would be split into /some/f and le, but edit
would stay edit.

The POSIX specification of sh standardised that behaviour so
that's the one found in bash / dash / yash... as well.
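
A minimal sketch of that Korn/POSIX behaviour, assuming it is run
by a POSIX sh such as bash or dash (zsh in its default mode would
not split $file at all). IFS=i is a deliberately odd value chosen
only for the demonstration, and the variable names other than
"file" are made up for the example.

```shell
file=/some/file
IFS=i    # unusual value, used only for this demonstration

# The literal word "edit" is not IFS-split; only the unquoted
# expansion of $file is.
set -- edit $file
unquoted_count=$#
first=$1
second=$2
third=$3

# Quoting the expansion suppresses the splitting.
set -- edit "$file"
quoted_count=$#

unset IFS    # back to the default splitting behaviour

printf 'unquoted: %s words: %s | %s | %s\n' \
  "$unquoted_count" "$first" "$second" "$third"
printf 'quoted:   %s words\n' "$quoted_count"
```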

That nonsense was fixed in most shells written after that
though, starting with rc, the shell of Plan 9 / Unix v10 from
the late 80s, but also zsh (early 90s) or more recently (mid
2000s) fish.

The fact that in sh/ksh/bash:

arg='-L 1'
tree $arg

Calls tree with "-L" and "1" as arguments does not mean that zsh
does some sort of magic "grouping" that bash lacks. It means
that ksh/bash, contrary to zsh, apply that extra layer of
$IFS-splitting inherited from the Bourne shell on top of the
syntax parsing, and the default value of $IFS happens to contain
the space character.

That's why in sh/bash/ksh you almost always need to quote
parameter expansions if you intend to pass the contents of a
variable as an argument to a command.
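
To see the difference, one can count the arguments a command
actually receives. This is only a sketch: count_args is a
hypothetical helper standing in for tree, and the numbers in the
comments assume a POSIX sh such as bash or dash with the default
$IFS. zsh in its default mode passes one argument in both cases.

```shell
# Hypothetical helper: reports how many arguments it was given.
count_args() { echo "$#"; }

arg='-L 1'

# Unquoted: sh/bash/ksh split the expansion on the space in $IFS.
unquoted=$(count_args $arg)

# Quoted: the contents of the variable are passed as one argument.
quoted=$(count_args "$arg")

echo "unquoted: $unquoted argument(s)"   # 2 in sh/bash/ksh
echo "quoted:   $quoted argument(s)"     # 1
```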

See
https://unix.stackexchange.com/questions/171346/security-implications-of-forgetting-to-quote-a-variable-in-bash-posix-shells
for the kind of thing that can happen if you forget.


> Dunno if it's really a thing to be desired but naively one might
> like some way of assembling a command string as if it was at CLI, that is
> *just* a string of characters -- which is how I was looking at it.

If you want to assemble a string and have that string
evaluated as shell code, that's what eval is for. eval evaluates
code written in the shell language. It's a way to dynamically
invoke the language interpreter.

But generally, that's not what you want. Writing correct shell
code dynamically based on external input is very easy to get
wrong. In your case, it rather looks like you want to build up a
list of arguments to pass to a command for which you obviously
need a shell list/array variable.
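
Something along these lines, as a sketch rather than a
definitive recipe. It uses array syntax that both zsh and bash
accept (plain POSIX sh has no arrays). show_args is a
hypothetical stand-in for the real command, and 'my dir' is a
made-up directory name that contains a space.

```shell
# Hypothetical stand-in for a command such as tree: prints each
# argument it receives on its own line, between angle brackets.
show_args() { printf '<%s>\n' "$@"; }

args=(-L 2)         # two separate elements: "-L" and "2"
args+=('my dir')    # appended as ONE element, space included

count=${#args[@]}

# "${args[@]}" expands to one argument per array element.
out=$(show_args "${args[@]}")

printf '%s arguments:\n%s\n' "$count" "$out"
```

In zsh, $args on its own would also expand to one argument per
(non-empty) element; the "${args[@]}" form is used here because
it behaves the same in bash.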

> > tree parses its options. It looks like it does not use the
> > standard getopt() API or the GNU getopt_long() API for that:
> > 
> > $ nm -D =tree | grep -i getopt
> > $ ltrace -e '*opt*@*' tree > /dev/null
> > +++ exited (status 0) +++
> What are you doing there?  I have no 'nm' command here.  No such command in
> Debian repository.

nm is a standard development command (though not -D which AFAIK
is a GNU extension). Part of GNU binutils on GNU systems. Here
used to list the external functions that the utility claims it
needs.

The above shows that tree doesn't use the standard getopt()
function, or that if it does, it embeds a copy in the
executable rather than using the one from the GNU libc.

> > So it must be doing it by hand.
> Yeah, so much in the GNU/Linux world is ad hoc.  Everybody did their own
> thing.  No rules.

Note that tree is not part of the GNU project, it's just a
utility written by some guy and shared with the world. There is
*some* level of consistency among utilities in the GNU
toolchest. There is even such a thing as published GNU coding standards.
See for instance
https://www.gnu.org/prep/standards/html_node/Command_002dLine-Interfaces.html#Command_002dLine-Interfaces

zsh is not part of the GNU project either. Note that zsh also
predates Linux and has been used in a wide variety of GNU and
non-GNU systems.

I'll agree with you that the lack of consistency in API between the
different tools that are available out there can be annoying
(nothing to do with GNU or Linux).

[...]
> figure out.  But why does calling 'eval' fix it?  It seems as if eval
> 'flattens' everything -- one is back to a command line string of characters
> with no imposed grouping.

eval evaluates shell code. I struggle to understand what you
don't understand. Maybe you're thinking too much or too little
of what a "command line string" is. A "command line string"
like:

var='value'; if blah; then echo $(cmd) | tr -d x; fi

is just one line of code in the syntax of the shell programming
language. It's not some magic universal language to talk to the
system.

You can store that code in bits in variables with:

a="'val"
b="ue'; if bl"
c='ah; then ech'
d='o $(cmd) | tr -'
e='d x; fi'

But surely you don't expect

var=$a$b$c$d$e

to have the same effect as running that command.

You can however do:

eval "var=$a$b$c$d$e"

For that string to be passed as code to the shell language
interpreter for it to interpret.
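
A smaller runnable variant of the same idea, as a sketch: the
pieces here are hypothetical and chosen so that the assembled
code only sets two variables, "var" and "count", instead of
running commands that may not exist on your system.

```shell
# Pieces of the code   var='hello'; count=3   stored as plain text.
a="'hel"
b="lo'; coun"
c='t=3'

# Plain assignment: the concatenated text is stored as data,
# quotes and semicolon included. Nothing is parsed as code.
var=$a$b$c
literal=$var

# eval: the same text is handed to the shell parser, so it runs
# as two assignments.
eval "var=$a$b$c"

printf 'without eval: var=%s\n' "$literal"
printf 'with eval:    var=%s count=%s\n' "$var" "$count"
```

Without eval, var ends up holding the quotes and the semicolon.
With eval, var is hello and count is 3.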

[...]
> Yeah, as I was saying, it does seem that 'tree' is very crabby. Why don't
> the GNU people iron these things out?
[...]

Again, tree has nothing to do with the GNU project. There is
also unfortunately not one standard to parse options and
arguments. There is POSIX getopt() but for instance, it doesn't
support long options (neither a la GNU, nor a la X11, nor a la
perl Getopt::Long, nor a la zsh zparseopts...).

-- 
Stephane
