Zsh Mailing List Archive Messages sorted by: Reverse Date, Date, Thread, Author

Re: accessing array by index very slow

X-seq: zsh-workers 51690
From: Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx>
To: Zsh hackers list <zsh-workers@xxxxxxx>
Subject: Re: accessing array by index very slow
Date: Sun, 30 Apr 2023 12:26:58 -0700
Archived-at: <https://zsh.org/workers/51690>
In-reply-to: <CAH+w=7Y4DDY9p9BH76jWB-UHyOOCXw81QrGztEfxZsMUDd=soQ@mail.gmail.com>
List-id: <zsh-workers.zsh.org>
References: <20230416090342.dztvzcnzpbyukgfq@chazelas.org> <CAN=4vMouxgQNP=sfrXDhvtUkgLZgkDd6Z3GZX04rHGOhDwihpg@mail.gmail.com> <CAKc7PVAgXMCoFKgVu4Wy=43C=NpC3zECPLirX9O9y1m7QCAY4A@mail.gmail.com> <CAH+w=7bBf7X0CFEGK+H-huj3a9QGiD7-VC3Qc+-hM__ZNQX_Kg@mail.gmail.com> <CAHYJk3SyzKYxTp8Sia8dLRTX_Lkesz80twmsKQroY3q1f4=qBA@mail.gmail.com> <CAH+w=7Y4DDY9p9BH76jWB-UHyOOCXw81QrGztEfxZsMUDd=soQ@mail.gmail.com>

I've been poking around at this, and there are a few serious complications.

The first is that we almost always retrieve arrays via a Value struct,
which contains both a pointer to the source Param and the start/end
indexes for an array slice.  The largest reason array accesses aren't
O(1) is due to bounds checking that those desired slices don't point
outside the array.  We've already optimized bounds checking to avoid
scanning the full array, so unless the same array is going to be
checked repeatedly it'd actually make things slower to populate the
cache.

Second and similarly, the best place to initialize the cache is when
assigning or appending to the array, which makes those operations
become O(n), which seems undesirable.

Third, a lot of array operations either use an array constructed on
the fly (array of all values of a hash table, for example) or fetch
the array from the parameter at a point in the program flow that's
abstracted from the operation that needs to do the bounds checking, so
the cached length is no longer readily available; or a subset of the
source array may already have been copied using (start,end) of the
Value, or by scanning the array for a pattern match, so the cached
value no longer applies.

Fourth, the length of a PM_SPECIAL parameter can't reliably be cached
because the source data may change between accesses.  Not true for all
PM_SPECIAL, of course, but recording which ones can or cannot be is an
obstacle.

There is a "numparamvals" global that's reset whenever a hash is
converted to an array, so that might be co-opted to update on other
array copy operations if we're careful.

References:
- accessing array by index very slow
  - From: Stephane Chazelas
- Re: accessing array by index very slow
  - From: Roman Perepelitsa
- Re: accessing array by index very slow
  - From: Sebastian Gniazdowski
- Re: accessing array by index very slow
  - From: Bart Schaefer
- Re: accessing array by index very slow
  - From: Mikael Magnusson
- Re: accessing array by index very slow
  - From: Bart Schaefer

Messages sorted by: Reverse Date, Date, Thread, Author