Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Command hashing/autocd bug & possible fixes



First the bug.  It happens on any OS.  It just requires there to be a
command in $PATH whose name collides with a root dir entry such as "usr",
"bin", "dev" or "opt".  The below creates a collision (assumes usual /dev
dir exists), does a setopt AUTOCD, witnesses that working, then failing,
then working again.  It should reproduce with a zsh -f (but should not
be sensitive to much besides unsetopt hashcmds):

  PATH=/tmp:$PATH; touch /tmp/dev; chmod +x /tmp/dev
  setopt AUTOCD
  /dev
  pwd
  whence /dev
  /dev          # errors out with permission denied
  unhash /dev
  /dev          # works again

If you do a 'hash -L | grep /dev' after the whence you can see what is
amiss is that /dev is being put in as a hash key mapping to /tmp//dev.
If you strace you can see that Zsh ends up trying to execve("/dev")
which fails with EPERM, hence the message.  unsetopt hashcmds also
works around/removes the problem.

Note that one might think with setopt PATHDIRS in effect that there
would be an ambiguity about what should happen.  /dev appended to an
element of PATH after all hits a real program, /tmp//dev, with the
usual Unix path collapse rules.  Real programs are supposed to take
priority over AUTOCD.  However, the man pages for PATHDIRS say it
should NOT work for commands that begin with "/", "./" or "../".
So, I think it should either do the AUTOCD or the docs are wrong.
I think the "/", "./", "../" is a simple, good rule and AUTOCD
should go through.

This is not entirely theoretical problem.  LLVM comes with a program
called "opt" & "/opt" is very common.  I don't know how popular AUTOCD
is, but I've always loved it.  A lot of typical root entries are not
implausible names for commands.  Finally, getting misleading permission
denied on an "attempted cd" to a root entry is..worrisome.


Now ideas for a fix.  About these I am less confident as there may be
a tangled web of concerns.

Since it makes little sense to call whence with a term that explicitly
blocks path search, one possible fix is inside Src/exec.c:findcmd():

diff --git a/Src/exec.c b/Src/exec.c
index 042ba065a..e6b20dcb6 100644
--- a/Src/exec.c
+++ b/Src/exec.c
@@ -834,7 +834,7 @@ findcmd(char *arg0, int docopy, int default_path)
        return NULL;
     }
     cn = (Cmdnam) cmdnamtab->getnode(cmdnamtab, arg0);
-    if (!cn && isset(HASHCMDS) && !isrelative(arg0))
+    if (!cn && isset(HASHCMDS) && !isrelative(arg0) && *arg0 != '/')
        cn = hashcmd(arg0, path);
     if ((int) strlen(arg0) > PATH_MAX)
        return NULL;

That tests out fine for my "whence" case, but besides bin_whence(),
findcmd() has 6 other call sites: Src/subst.c:equalsubstr(),
Src/Zle/zle_tricky.c:docomplete() Src/Zle/zle_tricky.c:expandcmdpath(),
Src/Zle/compctl.c:makecomplistpc(), Src/Zle/compctl.c:makecomplistcmd(),
and Src/Zle/compctl.c:addmatch().

It's possible those sites are also buggy OR possible they have reason
to put an absolute path into cmdnametab.  If they have a legitimate
need, findcmd() could grow an allowRooted parameter they could pass.
That's a more involved patch, obviously.

Another possible fix might be to try to stop hashcmd from entering
keys with a leading '/' (or accept an allowRooted parameter).  The
easiest way to do that..

diff --git a/Src/exec.c b/Src/exec.c
index 042ba065a..79ef83c1e 100644
--- a/Src/exec.c
+++ b/Src/exec.c
@@ -940,6 +940,8 @@ hashcmd(char *arg0, char **pp)
     char *s, buf[PATH_MAX+1];
     char **pq;

+    if (*arg0 == '/')
+        return NULL;
     for (; *pp; pp++)
        if (**pp == '/') {
            s = buf;

..also tests out fine for my "whence" example.  That sidesteps the
HASHDIRS optimization only for the (probably rare?) rooted path.
If that's viewed as a real problem, the lower isset(HASHDIRS) code
could be duplicated before the early return.  There are only a few
more call sites to hashcmd besides the 6 indirect calls already
mentioned, Src/utils.c:spckword(), Src/Zle/zle_tricky.c:docomplete(),
and Src/utils.c:execcmd_exec().

Chances seem good that the original concept of cmdnametab explicitly
did not want rooted paths in there, but that somewhere in those 9
other call sites someone cares to allow absolute paths for some reason
or another.  But I do think that once a rooted path is in there, if
it collides with a root directory entry the same name collision logic
will break autocd.

So, finally, it also seems possible to let rooted paths stay and have a
smarter isreallycom():

diff --git a/Src/exec.c b/Src/exec.c
index 042ba065a..701e8c2c7 100644
--- a/Src/exec.c
+++ b/Src/exec.c
@@ -906,6 +906,8 @@ isreallycom(Cmdnam cn)
     else if (!cn->u.name)
        return 0;
     else {
+        if (*cn->node.nam == '/')
+            return 0;
        strcpy(fullnam, *(cn->u.name));
        strcat(fullnam, "/");
        strcat(fullnam, cn->node.nam);

This lets "/dev" be put into cmdnametab, but the smarter isreallycmd()
prevents that from interfering with autocd.  We could also add an
"&& isset(AUTOCD) " to that nam == '/' condition, too.  isreallycmd() is
only called in once place the trycd clause of  execcmd_exec().  This
is the most conservative approach (and, yes, yes, whatever we go
with should all have some comment explaining itself a bit).

Does anyone have any firm opinions on what approach they would like
beyond the goes-without-saying "don't break stuff"?  Should rooted
paths be blocked from cmdnametab outright keeping it "clean"?  Or
should we just workaround their presence?  Both?  Some other approach
entirely?


Messages sorted by: Reverse Date, Date, Thread, Author