Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: [BUG] Long line makes pattern matching (by //) hog Zsh



On Jun 5,  8:37pm, Peter Stephenson wrote:
}
} > On Jun 5,  4:36pm, Sebastian Gniazdowski wrote:
} > }
} > } 1. not backslash nor slash nor space [^ /\\\\]##
} > } 2. not number, slash, backslash, space [^0-9/\\\\ ]##
} > } 3. not slash, backslash [^/\\\\]#
} > } 4. end of line (#e)
} 
} The problem is the patterns are pathological.  Each of them can match
} the same characters.

Indeed, given that the first sub-pattern is "at least one" and the
second sub-pattern includes all the same characters as the first, I
think the first ## can be removed without changing what is matched;
i.e.:

    NLIST_COLORING_PATTERN="([^ /\\\\][^0-9/\\\\ ]##[^/\\\\]#(#e))"

And with that pattern the whole test script completes in a couple of
seconds.

} So it's spending a lot of time repartitioning the
} mathches  between the possibilities of 1. and 2. and 3. in the above.
} That's not polynomially bounded.  I'm not sure if it's even
} exponentially bounded.

I tried to work that out and came up with O(n**((n-1)**(n-2))) before
deciding I didn't remember enough of that math to believe it.

The other problem, which I hadn't really thought about before, is that
this is all being applied inside ${str//pat/repl} expression, bounded
only on the right, which means it has to try every position in $str
even when the pattern does NOT match, which for a fair chunk of the
example input it does not.

There may be an optimization possible in the way that igetmatch()
interacts with pattrylen(), so as to skip forward over the longest
NON-match as well as skipping over the longest match.



Messages sorted by: Reverse Date, Date, Thread, Author