Zsh Mailing List Archive

Re: multibyte optimisations



On Thu, Nov 10, 2016, at 05:47 AM, Peter Stephenson wrote:
> On Thu, 10 Nov 2016 02:37:12 -0800
> Sebastian Gniazdowski <psprint@xxxxxxxxxxxx> wrote:
> > The other functions pointed out seem entirely legitimate and expected:
> > the multibyte functions. They can be optimized if we make a bold
> > decision: to do what charnext in pattern.c does:
> > 
> >     if (!(patglobflags & GF_MULTIBYTE) || !(STOUC(*x) & 0x80))
> >         return x + 1;
> > 
> > I.e. to optimize for ASCII as a subset of UTF-8 also when calling
> > MB_METACHARLEN, not only for MB_METASTRLEN (recent change).
> 
> These look straightforward and along the same lines as what we already
> do.

I was worried that the multibyte state might not be well defined at the
moment a character's length is requested, but that cannot really happen.
If it did, the loop that advances character by character would already
be in trouble, since it would be left in an ambiguous state after the
previous advance. With this patch the parser runs in 1493 ms instead of
2148 ms :)
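
For illustration only, the idea can be sketched as standalone C. This is
not the patch itself: char_len and count_chars are hypothetical names, and
plain mbrlen() stands in for zsh's metafied multibyte handling, which this
sketch does not attempt to reproduce. It assumes an encoding such as UTF-8
in which every byte with the high bit clear is a complete character.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper (not zsh's MB_METACHARLEN): length in bytes of the
 * character starting at s, with at most avail bytes available.
 */
static size_t char_len(const char *s, size_t avail, mbstate_t *st)
{
    size_t n;

    /* Fast path: high bit clear means a one-byte (ASCII) character. */
    if (!((unsigned char)*s & 0x80))
        return 1;

    /* Slow path: let the C library decode the multibyte sequence. */
    n = mbrlen(s, avail, st);
    if (n == (size_t)-1 || n == (size_t)-2) {
        /* Invalid or incomplete sequence: reset state, consume one byte. */
        memset(st, 0, sizeof *st);
        return 1;
    }
    return n ? n : 1;
}

/* Count characters by advancing one character at a time. */
static size_t count_chars(const char *s)
{
    mbstate_t st;
    size_t left = strlen(s), count = 0;

    memset(&st, 0, sizeof st);
    while (left) {
        size_t len = char_len(s, left, &st);
        s += len;
        left -= len;
        count++;
    }
    return count;
}
```

The fast path never touches the conversion state, which is only safe
because the loop always stands on a character boundary after the previous
advance.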

-- 
  Sebastian Gniazdowski
  psprint@xxxxxxxxxxxx


