Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: Substitution ${...///} slows down when certain UTF character occurs



On 26 September 2015 at 22:44, Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx> wrote:
> On Sep 26,  2:19pm, Sebastian Gniazdowski wrote:
> }
> } I attach a script that does ${...///} substitution.
>
> I worry that the attachment hasn't come through correctly?  When I
> unpack the base64 into text, I get (in part)
>
> str="c4d5148ca6 ce3a2d24203abfb385 30f5fe85434ae ... 5d468f6"
>
> Is the value of $str supposed to look like that?  So the pattern in
> the ${str//...} replacement never matches?

Yes. I attached the string itself rather than the code that generated it:
# cat /dev/urandom | env LC_CTYPE=C tr -cd 'a-f0-9 ' | head -c 120000

> } It  is very slow for some chars and very fast for others. How to explain
> } and hopefully fix this?
>
> Each time pattryrefs() fails to find a match, it increments the area
> to be searched by one character and then tries the entire pattern
> match again.  So for a 120000-character string, it's doing a non-
> matching search 120000 times.

It's a big plus that it's still practically instant for strings of that
length as long as no unlucky Unicode character is present.
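
The retry behaviour Bart describes can be modelled roughly as below. This
is only a toy sketch in plain shell, not what pattern.c actually does:
count_attempts is a made-up helper, the "pattern" is treated as literal
text, and it counts attempts only; it does not model the multibyte
slowdown.

```shell
# Toy model of an unanchored search: try the pattern at each start
# offset and advance by one character after every failed attempt.
# count_attempts is a hypothetical helper, not a zsh internal.
count_attempts() {
  str=$1 pat=$2 attempts=0
  while [ -n "$str" ]; do
    attempts=$((attempts + 1))
    case $str in
      "$pat"*) break ;;   # literal text found at this offset: stop
    esac
    str=${str#?}          # no match here: drop one character and retry
  done
  echo "$attempts"
}

count_attempts "c4d5148ca6" "zz"   # never matches: prints 10
```

With a pattern that never matches, the number of attempts equals the
length of the string, so the 120000-character string means 120000 failed
attempts in this model.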

> I rewrote your test to use "float SECONDS" + "print $SECONDS" instead
> of forking off subshells for "time" and to use loops so I didn't have
> to comment things in and out.  Observations:
>
> 1. It's only fast for the Yen symbol, which is the only one that does
> not have a byte with the high-order bit set.  This case is avoiding
> this block in pattern.c:

For me (OSX / zsh 5.0.2) it was fast for the characters at even positions
in what I attached, i.e. for ¥, Ł and Ǟ. I didn't think the result could
differ between environments, so I have now run the test on other machines.
Ubuntu 12.10 / zsh 5.0.0 behaves the same. On FreeBSD / zsh 5.1.1-dev-0
(HEAD 50721a1 and 8d5c0c) it's different: the fast characters are ¥ and Ł.
zsh-5.1.1-dev-0 (HEAD 50721a1 and 8d5c0c) on OSX behaves the same as
the FreeBSD case.
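
For anyone repeating the comparison on another machine, the timing
approach quoted above ("float SECONDS" plus a loop over the characters)
might look roughly like this. It is a sketch, not the attached script:
the short str is a stand-in for the 120000-character string, the
character list is only an example, and the ZSH_VERSION guard is my own
addition so that the snippet stays harmless in other shells.

```shell
# In zsh, "float SECONDS" gives sub-second resolution without forking
# a subshell for "time". Other shells keep whole seconds, so skip it there.
[ -n "${ZSH_VERSION:-}" ] && float SECONDS

# Short stand-in for the real 120000-character test string.
str="c4d5148ca6 ce3a2d24203abfb385"

# Loop over the characters to test instead of commenting lines in and out.
for ch in ¥ Ł; do
  start=$SECONDS
  out=${str//$ch/X}      # the pattern never matches, so out equals str
  printf '%s: %s s\n' "$ch" "$((SECONDS - start))"
done
```

With a string this short the times will be close to zero; the difference
between characters should only show up with the full-size input.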

Best regards,
Sebastian Gniazdowski


