Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: behavior of test true -a \( ! -a \)



> On 21/03/2024 11:29 GMT Peter Stephenson <p.w.stephenson@xxxxxxxxxxxx> wrote:
> > On 21/03/2024 11:04 GMT Vincent Lefevre <vincent@xxxxxxxxxx> wrote:
> > On 2024-03-21 10:28:16 +0000, Peter Stephenson wrote:
> > > I haven't had time to go through this completely but I think somewhere
> > > near the root of the issue is this chunk in par_cond_2(), encountered at
> > > the opint we get to the "!":
> > > 
> > >     if (tok == BANG) {
> > > 	/*
> > > 	 * In "test" compatibility mode, "! -a ..." and "! -o ..."
> > > 	 * are treated as "[string] [and] ..." and "[string] [or] ...".
> > > 	 */
> > > 	if (!(n_testargs > 2 && (check_cond(*testargs, "a") ||
> > > 				 check_cond(*testargs, "o"))))
> > > 	{
> > > 	    condlex();
> > > 	    ecadd(WCB_COND(COND_NOT, 0));
> > > 	    return par_cond_2();
> > > 	}
> > >     }
> > > 
> > > in which case it needs yet more logic to decide why we shouldn't treat !
> > > -a as a string followed by a logical "and" in this case.  To be clear,
> > > obviously *I* can see why you want that, the question is teaching the
> > > code without confusing it further.
> > 
> > Perhaps follow the coreutils logic. What matters is that if there is
> > a "(" argument, it tries to look at a matching ")" argument among the
> > following 3 arguments. So, for instance, if it can see
> > 
> >   ( arg2 arg3 )
> > 
> > (possibly with other arguments after the closing parenthesis[*]), it
> > will apply the POSIX test on 4 arguments.
> > 
> > [*] which can make sense if the 5th argument is -a or -o.
> 
> I suppose as long as we only look for ")" when we know there's one to
> match we can probably get away with it without being too clever.  If
> there's a ")" that logically needs to be treated as a string following a
> "(" we're stuck but I think that's fair game.
> 
> Something simple like: if we find a (, look for a matching ), so blindly
> count intervening ('s and )'s regardless of where they occur, and then
> NULL out the matching ) temporarily until we've parsed the expression
> inside.  If we don't find a matching one treat the ( as as a string.

This implements the above.  I don't think any of the subsequent
discussion has any impact on the effectiveness of this.  Because this is
only done when we've already identified a "(" as starting grouping,
looking for a ")" is benign --- failing to find it would mean the pattern
was invalid.  As Vincent already pointed out, assuming the pattern is
valid is a sensible strategy.  The remaining question is which ")" to
match if there are multiple.

I've added a check that "test \( = \)" returns 1.  This isn't affected
because as I said before three arguments are treated specially.
Possibly some more tests for non-pathological cases where parentheses do
grouping in test compatibility mode would be sensible.

There is inevitably a trade off: "test \( \) = \) \)" used to "work"
(test that the two inner closing parentheses were the same string) but
now doesn't (the first closing parenthesis ends the group started by the
first opening parenthesis).  That strikes me as OK as we're making the
less pathological case (the one where parentheses mean just one thing)
work.  However, it is a sign we're right on the edge of sanity, so I'm
not proposing to "fix" this any further.

Feel free to argue that the current behaviour of simply parsing
parentheses in order and blindly trusting the result is actually a
better bet, though I can't frankly see how, myself.  There is an
alternative strategy which is to assume the rightmost closing parenthesis
ends the outermost group.

Also feel free to come up with other pathologies.

pws

diff --git a/Src/parse.c b/Src/parse.c
index 3343656..1505b49 100644
--- a/Src/parse.c
+++ b/Src/parse.c
@@ -2528,10 +2528,39 @@ par_cond_2(void)
     if (tok == INPAR) {
 	int r;
 
+	/*
+	 * Owing to ambiguuities in "test" compatibility mode, it's
+	 * safest to assume the INPAR has a corresponding OUTPAR
+	 * before trying to guess what intervening strings mean.
+	 */
+	char **endargptr = NULL, *endarg = NULL;
+	if (condlex == testlex) {
+	    char **argptr;
+	    int n_inpar = 1;
+
+	    for (argptr = testargs; *argptr; argptr++) {
+		if (!strcmp(*argptr, ")")) {
+		    if (!--n_inpar) {
+			endargptr = argptr;
+			endarg = *argptr;
+			*argptr = NULL;
+			break;
+		    }
+		} else if (!strcmp(*argptr, "(")) {
+		    ++n_inpar;
+		}
+	    }
+	}
+
 	condlex();
 	while (COND_SEP())
 	    condlex();
 	r = par_cond();
+	if (endargptr) {
+	    *endargptr = endarg;
+	    if (testargs == endargptr)
+		condlex();
+	}
 	while (COND_SEP())
 	    condlex();
 	if (tok != OUTPAR)
diff --git a/Test/C02cond.ztst b/Test/C02cond.ztst
index daea5b4..453fa1c 100644
--- a/Test/C02cond.ztst
+++ b/Test/C02cond.ztst
@@ -442,6 +442,14 @@ F:scenario if you encounter it.
 >in conjunction: 3
 ?(eval):6: no such option: invalidoption
 
+  test \( = \)
+1: test compatility mode doesn't do grouping with three arguments
+
+# This becomes [[ -n true && ( -n -a ) ]]
+# The test is to ensure the ! -a is analysed as two arguments.
+  test true -a \( ! -a \)
+1: test compatilibty mode is based on arguments inside parentheses
+
 %clean
   # This works around a bug in rm -f in some versions of Cygwin
   chmod 644 unmodish




Messages sorted by: Reverse Date, Date, Thread, Author