Mailing-List: contact zsh-workers-help@zsh.org; run by ezmlm
Precedence: bulk
X-No-Archive: yes
List-Id: Zsh Workers List <zsh-workers.zsh.org>
List-Post: <mailto:zsh-workers@zsh.org>
List-Help: <mailto:zsh-workers-help@zsh.org>
X-Spam-Checker-Version: SpamAssassin 3.4.1 (2015-04-28) on f.primenet.com.au
X-Spam-Level: 
X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,T_DKIM_INVALID
	autolearn=ham autolearn_force=no version=3.4.1
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=brasslantern-com.20150623.gappssmtp.com; s=20150623;
        h=from:message-id:date:in-reply-to:comments:references:to:subject
         :mime-version;
        bh=vkfokoWdfNja43yUaqwS+4XrmF8jPjlANgEN0TpJ0oE=;
        b=Dp0DWim45qkS+tXwXVZmAYzI31BHVMZRiml32n52yH1yftj5VL6MgHkc7exTJjhPYR
         vigDOJLQK2HUsMe/VHbvnsNnhf2GXQ4tXGoaE9pqmhF+fbMVuuk+8tWRgJHxxFOuevw6
         p0rApxzFC2PSrK3jPDKn3vzGz5wOO7rEqJoBIaJMw3xJMz2ws0UqmpJGJ6hbr1YeRGoG
         rscQdo6FGkp+InkSxianvYlMQJAiq+cE799dTuAW1pk8BXgKGQivc3smTFQmeR2C+0JB
         3WoM8nc0v+LcAg2vCSX7e0k0rLR5d42kqmMVhcJqIgMIWiA79bEVhIef3187pJv0sgT5
         xtng==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20130820;
        h=x-gm-message-state:from:message-id:date:in-reply-to:comments
         :references:to:subject:mime-version;
        bh=vkfokoWdfNja43yUaqwS+4XrmF8jPjlANgEN0TpJ0oE=;
        b=OR8VT3kvQbx2Hvh9pM3a75IAGvQQ+rLwudqm3ht17XarrRns84IxtOeXSjIdFdHzsu
         Xq26y2S/lwsnG3z9gy7bI9F4Hips0i1xmx3oMhiENv2EqsEyvlMFEaH6YakemYse8+VR
         r/cYBGbZ0n0svxEiB7O9U1Xch0IE8/4X48wzNKJYjIq/id8+WbM1DoVoL4syErP6KG5U
         HrVSw7b3KA17dvaK7Hg51pomGEyTVTxQsiLN8pQnJ9YisVN5X2NXZ4oo/kx+gkGa8WeD
         9in43PNEccGWzvXvUeF/uFMSXN1sZVW4WtKxBrwrXhRBM2kDZf96oEnH8TxKm+kxKH/t
         Po1g==
X-Gm-Message-State: ALyK8tLJ2QcN2cPYRwQ+4P8HN0yyljAflxLHWdWlh7eNkdmpz6ckk7sLI+/eoLVcluzhwQ==
X-Received: by 10.66.63.35 with SMTP id d3mr20955421pas.69.1465171650053;
        Sun, 05 Jun 2016 17:07:30 -0700 (PDT)
From: Bart Schaefer <schaefer@brasslantern.com>
Message-Id: <160605170733.ZM8723@torch.brasslantern.com>
Date: Sun, 5 Jun 2016 17:07:33 -0700
In-Reply-To: <20160605203708.3701c7a2@ntlworld.com>
Comments: In reply to Peter Stephenson <p.w.stephenson@ntlworld.com>
        "Re: [BUG] Long line makes pattern matching (by //) hog Zsh" (Jun  5,  8:37pm)
References: <CAKc7PVC=AES1LhY7tYTXrPsefX3CXgtUsxiVbDaxmc5o2iHnVw@mail.gmail.com> 
	<160605121020.ZM7727@torch.brasslantern.com> 
	<20160605203708.3701c7a2@ntlworld.com>
X-Mailer: OpenZMail Classic (0.9.2 24April2005)
To: Zsh hackers list <zsh-workers@zsh.org>
Subject: Re: [BUG] Long line makes pattern matching (by //) hog Zsh
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Seq: zsh-workers 38621

On Jun 5,  8:37pm, Peter Stephenson wrote:
}
} > On Jun 5,  4:36pm, Sebastian Gniazdowski wrote:
} > }
} > } 1. not backslash nor slash nor space [^ /\\\\]##
} > } 2. not number, slash, backslash, space [^0-9/\\\\ ]##
} > } 3. not slash, backslash [^/\\\\]#
} > } 4. end of line (#e)
} 
} The problem is the patterns are pathological.  Each of them can match
} the same characters.

Indeed, given that the first sub-pattern is "at least one" and the
second sub-pattern includes all the same characters as the first, I
think the first ## can be removed without changing what is matched;
i.e.:

    NLIST_COLORING_PATTERN="([^ /\\\\][^0-9/\\\\ ]##[^/\\\\]#(#e))"

And with that pattern the whole test script completes in a couple of
seconds.

} So it's spending a lot of time repartitioning the
} mathches  between the possibilities of 1. and 2. and 3. in the above.
} That's not polynomially bounded.  I'm not sure if it's even
} exponentially bounded.

I tried to work that out and came up with O(n**((n-1)**(n-2))) before
deciding I didn't remember enough of that math to believe it.

The other problem, which I hadn't really thought about before, is that
this is all being applied inside ${str//pat/repl} expression, bounded
only on the right, which means it has to try every position in $str
even when the pattern does NOT match, which for a fair chunk of the
example input it does not.

There may be an optimization possible in the way that igetmatch()
interacts with pattrylen(), so as to skip forward over the longest
NON-match as well as skipping over the longest match.

