Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: Pattern bug on (a*|)~^(*b)



> On 31/07/2023 12:36 Peter Stephenson <p.w.stephenson@xxxxxxxxxxxx> wrote:
> > On 25/07/2023 14:19 Johan Grande <nahoj@xxxxxxxxx> wrote:
> > In zsh 5.8.1 (x86_64-ubuntu-linux-gnu) with extended_glob,
> > 
> > [[ "ab" = (|a*)~^(*b) ]]
> > 
> > incorrectly (unless I'm mistaken) returns 1.
>
> The fixes I can think of are either to find a point at which to undo the sync
> record for the match that has failed after the event or, perhaps better as it's
> a local change even though it does more work, reset any previous matches before
> the sync point on the next alternative when we mark that as having matched
> (we will only do that if the previous branch failed --- remember [irony
> intended] that a branch succeeds only if all following patterns match too).

I think the latter is good enough.  There's no sign of a slowdown with this
fix in D02glob.ztst, so any resulting problems will have to be complicated
things with nested branches and probably zero-length matches --- not all
of which are well optimised anyway.  I could be wrong, but I think we'll
have to take it on the chin.

Anyway, I couldn't think of any easy way around that, despite some investigation,
so as this does fix the immediate problem, we can go with it.  It's the sort
of self-contained fix I like.

pws

diff --git a/Src/pattern.c b/Src/pattern.c
index 3edda1772..2a1a514fb 100644
--- a/Src/pattern.c
+++ b/Src/pattern.c
@@ -2987,14 +2987,15 @@ patmatch(Upat prog)
 	case P_EXCSYNC:
 	    /* See the P_EXCLUDE code below for where syncptr comes from */
 	    {
-		unsigned char *syncptr;
+		unsigned char *syncstart, *syncptr, *ptr;
 		Upat after;
 		after = P_OPERAND(scan);
 		DPUTS(!P_ISEXCLUDE(after),
 		      "BUG: EXCSYNC not followed by EXCLUDE.");
 		DPUTS(!P_OPERAND(after)->p,
 		      "BUG: EXCSYNC not handled by EXCLUDE");
-		syncptr = P_OPERAND(after)->p + (patinput - patinstart);
+		syncstart = P_OPERAND(after)->p;
+		syncptr = syncstart + (patinput - patinstart);
 		/*
 		 * If we already matched from here, this time we fail.
 		 * See WBRANCH code for story about error count.
@@ -3009,6 +3010,23 @@ patmatch(Upat prog)
 		 * failed anyway.
 		 */
 		*syncptr = errsfound + 1;
+		/*
+		 * Because of backtracking, any match before this point
+		 * can't apply to the current branch we're on so is now
+		 * a failure --- this can happen if, on a previous
+		 * branch, we initially marked a success before failing
+		 * on a later part of the pattern after marking up the
+		 * P_EXCSYNC (even an end anchor will have this effect).
+		 * To make sure we record the current match point
+		 * correctly, mark those down now.
+		 *
+		 * This might have side effects on the efficiency of
+		 * pathological cases involving nested branches.  To
+		 * fix that we'd probably need to record matches on
+		 * different branches separately.
+		 */
+		for (ptr = syncstart; ptr < syncptr; ++ptr)
+		    *ptr = 0;
 	    }
 	    break;
 	case P_EXCEND:
diff --git a/Test/D02glob.ztst b/Test/D02glob.ztst
index 850a535e5..4d88e5c27 100644
--- a/Test/D02glob.ztst
+++ b/Test/D02glob.ztst
@@ -817,6 +817,32 @@
 *>*/glob.tmp/(flip|flop)
 *>*/glob.tmp/(flip|flop)/trailing/components
 
+# The following set test an obscure problem with branches followed by
+# exclusions that shows up when the exclusion matches against
+# something other than the complete test string, hence the complicated
+# double negative.
+  [[ ab = (|a*)~^(*b) ]]
+0:Regression test for exclusion after branches: empty first alternative
+
+  [[ ab = (b|a*)~^(*b) ]]
+0:Regression test for exclusion after branches: non-empty first alternative
+
+  [[ ab = (b*|a*)~^(*b) ]]
+0:Regression test for exclusion after branches: full length first alternative
+
+# Corresponding tests where the exclusion should succeed, so the
+# match fails.  It's hard to know how to provoke bugs here...
+  [[ abc = (|a*)~^(*b) ]]
+1:Regression test for exclusion after branches: failure case 1
+
+  [[ abc = (b|a*)~^(*b) ]]
+1:Regression test for exclusion after branches: failure case 2
+
+  [[ abc = (b*|a*)~^(*b) ]]
+1:Regression test for exclusion after branches: failure case 3
+
+# Careful: extendedglob off from this point.
+
   unsetopt extendedglob
   print -r -- ${(*)=${(@s.+.):-A+B}/(#b)(?)/-${(L)match[1]} ${match[1]}}
 0:the '*' qualfier enables extended_glob for pattern matching




Messages sorted by: Reverse Date, Date, Thread, Author