Zsh Mailing List Archive

Re: multibyte optimisations

X-seq: zsh-workers 39908
From: Sebastian Gniazdowski <psprint@xxxxxxxxxxxx>
To: zsh-workers@xxxxxxx
Subject: Re: multibyte optimisations
Date: Thu, 10 Nov 2016 06:57:01 -0800
In-reply-to: <20161110134722.06e6dc51@pwslap01u.europe.root.pri>
List-help: <mailto:zsh-workers-help@zsh.org>
List-id: Zsh Workers List <zsh-workers.zsh.org>
List-post: <mailto:zsh-workers@zsh.org>
Mailing-list: contact zsh-workers-help@xxxxxxx; run by ezmlm
References: <CGME20161110103845epcas3p3e7cabeffae723219daafa8d3e6b32f12@epcas3p3.samsung.com> <1478774232.2371010.783342705.69C81F52@webmail.messagingengine.com> <20161110134722.06e6dc51@pwslap01u.europe.root.pri>

On Thu, Nov 10, 2016, at 05:47 AM, Peter Stephenson wrote:
> On Thu, 10 Nov 2016 02:37:12 -0800
> Sebastian Gniazdowski <psprint@xxxxxxxxxxxx> wrote:
> > Other pointed functions seem to be very valid / expected – multibyte
> > functions. They can be optimized if a courageous decision will be made –
> > to do what charnext / pattern.c does:
> > 
> >     if (!(patglobflags & GF_MULTIBYTE) || !(STOUC(*x) & 0x80))
> >         return x + 1;
> > 
> > I.e. to optimize for ASCII as subset of UTF-8 also when calling
> > MB_METACHARLEN, not only for MB_METASTRLEN (recent change).
> 
> These look straightforward and along the same lines as what we already
> do.

I was worried that the multibyte state might be unclear at the point
where the length of a character is requested, but that cannot really
happen. If it did, the loop that advances character by character would
already have a problem, because it would be left in an unclear state
after the previous advance. With this patch the parser runs in 1493 ms
instead of 2148 ms :)
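
For illustration only, here is a minimal standalone sketch of the same
idea, not the actual patch. It assumes a UTF-8 locale, where a byte with
the high bit clear is always a complete one-byte character when read at
a character boundary. The names char_len_fast and count_chars are made
up for this example and are not zsh functions; the slow path uses the
standard mbrlen() rather than zsh's own MB_METACHARLEN machinery.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper (not from zsh): byte length of the character that
 * starts at s, given that avail bytes are readable. */
size_t char_len_fast(const char *s, size_t avail, mbstate_t *st)
{
    assert(s != NULL && st != NULL);
    if (avail == 0)
        return 0;

    /* Fast path: high bit clear means a 7-bit ASCII byte, so the
     * character is exactly one byte and no conversion state is needed. */
    if (!((unsigned char)*s & 0x80))
        return 1;

    /* Slow path: let the C library decode the multibyte sequence. */
    size_t r = mbrlen(s, avail, st);
    if (r == (size_t)-1 || r == (size_t)-2 || r == 0) {
        /* Invalid or incomplete sequence: reset the state and treat
         * the byte as a single character so the caller still advances. */
        memset(st, 0, sizeof *st);
        return 1;
    }
    return r;
}

/* Hypothetical loop that advances character by character, as a parser
 * would. Each call starts at a character boundary, which is why the
 * ASCII shortcut never sees a half-consumed multibyte state. */
size_t count_chars(const char *s, size_t len, mbstate_t *st)
{
    size_t n = 0;
    while (len > 0) {
        size_t step = char_len_fast(s, len, st);
        s += step;
        len -= step;
        n++;
    }
    return n;
}
```

For mostly-ASCII input, nearly every iteration takes the one-comparison
fast path and skips the library call entirely, which is where a saving
of this kind would come from.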

-- 
  Sebastian Gniazdowski
  psprint@xxxxxxxxxxxx

