Mailing-List: contact zsh-workers-help@zsh.org; run by ezmlm
Precedence: bulk
X-No-Archive: yes
List-Id: Zsh Workers List <zsh-workers.zsh.org>
List-Post: <mailto:zsh-workers@zsh.org>
List-Help: <mailto:zsh-workers-help@zsh.org>
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on f.primenet.com.au
X-Spam-Level: 
X-Spam-Status: No, score=-0.9 required=5.0 tests=BAYES_00,MISSING_HEADERS
	autolearn=no autolearn_force=no version=3.4.1
X-Originating-IP: [86.21.161.213]
X-Spam: 0
X-Authority: v=2.1 cv=GYXq5ljL c=1 sm=1 tr=0 a=oo3MgO7t/4XyXFuSzI3dDQ==:117
 a=oo3MgO7t/4XyXFuSzI3dDQ==:17 a=L9H7d07YOLsA:10 a=9cW_t1CCXrUA:10
 a=s5jvgZ67dGcA:10 a=kj9zAlcOel0A:10 a=q2GGsy2AAAAA:8 a=ndM92w6KKhMye6QWw1gA:9
 a=CjuIK1q_8ugA:10 a=z9dJwno5l634igLiVhy-:22
Date: Sun, 5 Jun 2016 20:37:08 +0100
From: Peter Stephenson <p.w.stephenson@ntlworld.com>
Cc: Zsh hackers list <zsh-workers@zsh.org>
Subject: Re: [BUG] Long line makes pattern matching (by //) hog Zsh
Message-ID: <20160605203708.3701c7a2@ntlworld.com>
In-Reply-To: <160605121020.ZM7727@torch.brasslantern.com>
References: <CAKc7PVC=AES1LhY7tYTXrPsefX3CXgtUsxiVbDaxmc5o2iHnVw@mail.gmail.com>
	<160605121020.ZM7727@torch.brasslantern.com>
X-Mailer: Claws Mail 3.11.1 (GTK+ 2.24.28; x86_64-redhat-linux-gnu)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
X-Seq: zsh-workers 38617

0On Sun, 5 Jun 2016 12:10:20 -0700
Bart Schaefer <schaefer@brasslantern.com> wrote:

> On Jun 5,  4:36pm, Sebastian Gniazdowski wrote:
> }
> } 1. not backslash nor slash nor space [^ /\\\\]##
> } 2. not number, slash, backslash, space [^0-9/\\\\ ]##
> } 3. not slash, backslash [^/\\\\]#
> } 4. end of line (#e)
>
> It's in the block in pattern.c:patchmatch() that begins with the
> comment:
> 
> 		/*
> 		 * Lookahead to avoid useless matches. This is not possible
> 		 * with approximation.
> 		 */

That comment happens to be an irrelevance --- that's where it's looking
for something that's an exact match to follow.  There isn't one in this
case.  If there was, we would have latched onto it and we wouldn't need
to try quite so hard rearranging the other elements of the pattern.

The containing block is where it's handling # and ##, so there's no
great surprise it's'= there.

> Specifically, in the "if (no >= min) for (;;) ..." loop, at each charater
> in the input string patmatch() is called recursively to look at the rest
> of the string, which again enters this same loop because the next thing
> is also a one-or-more expression, which calls recursively and again
> enters the loop because the thing after that is a zero-or-more.

The problem is the patterns are pathological.  Each of them can match
the same characters.  So it's spending a lot of time repartitioning the
mathches  between the possibilities of 1. and 2. and 3. in the above.
That's not polynomially bounded.  I'm not sure if it's even
exponentially bounded.

What I'm not sure is if there's a way of improving this without some
special case or, obviously, making the patterns more specific.

pws

