Re: Multi-Minute Startup?

X-seq: zsh-users 13140
From: "Eric D. Friedman" <eric_friedman@xxxxxxx>
To: Phil Pennock <zsh-workers+phil.pennock@xxxxxxxxxxxx>
Subject: Re: Multi-Minute Startup?
Date: Thu, 14 Aug 2008 12:02:07 -0700
Cc: Richard Hartmann <richih.mailinglist@xxxxxxxxx>, Dan Nelson <dnelson@xxxxxxxxxxxxxxx>, Aaron Davies <aaron.davies@xxxxxxxxx>, "Benjamin R. Haskell" <zsh@xxxxxxxxxx>, Zsh Users <zsh-users@xxxxxxxxxx>
In-reply-to: <20080814183726.GA2407@xxxxxxxxxxxxxxxxxxxx>
Mailing-list: contact zsh-users-help@xxxxxxxxxx; run by ezmlm
References: <alpine.LNX.1.10.0808070327310.26055@xxxxxxxxxxxxxxxxxxxxx> <c4e763ac0808070152k2846913dn4b637fe9ea275ef2@xxxxxxxxxxxxxx> <alpine.LNX.1.10.0808070509320.26055@xxxxxxxxxxxxxxxxxxxxx> <c4e763ac0808071832j6f7393fay158c7a2485ca41c9@xxxxxxxxxxxxxx> <20080808025823.GB68181@xxxxxxxxxxxxxxxx> <c4e763ac0808072340l398c5209o1cd6ccd77ec08a7f@xxxxxxxxxxxxxx> <alpine.LNX.1.10.0808081201180.26055@xxxxxxxxxxxxxxxxxxxxx> <c4e763ac0808110248g7dedabb9kbe6af57fcb0b8d68@xxxxxxxxxxxxxx> <20080811152909.GA61872@xxxxxxxxxxxxxxxx> <2d460de70808140608t7cc8ca48v4117feb57c97ca3b@xxxxxxxxxxxxxx> <20080814183726.GA2407@xxxxxxxxxxxxxxxxxxxx>

Using a prime number to size a hash table is a way of getting a good distribution of keys across buckets when the bucket for a key is chosen by dividing by the table size and taking the remainder (modulo). You can get a refresher in Knuth, vol. 3, pp. 515-516.
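A minimal sketch of the division method, in Python (the thread itself has no code). The class name `ChainedHashTable` and its methods are hypothetical, and handling collisions by separate chaining is an assumption, not something the message specifies. The line that carries the point is the `% self.size` in `_bucket`.

```python
class ChainedHashTable:
    """Minimal separate-chaining hash table using the division method.

    The bucket for a key is hash(key) % size; size is expected to be prime.
    """

    def __init__(self, size: int = 11):
        self.size = size
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        # Division (modulo) by the table size picks the bucket.
        return self.buckets[hash(key) % self.size]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # replace an existing entry
                return
        bucket.append((key, value))  # otherwise chain a new entry

    def get(self, key, default=None):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return default


if __name__ == "__main__":
    table = ChainedHashTable(size=1601)  # a prime size, as suggested in this thread
    table.put("ls", "/bin/ls")
    print(table.get("ls"))  # /bin/ls
```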

Another, faster approach that you will also see a lot these days (Java's HashMap has used it for a few releases now) is to make the table size a power of two. You then assign keys to buckets with a bitwise AND of the key's hash and one less than that size, which is a mask of all one-bits. This saves the time a division would take (still expensive compared with a bitwise AND, even on modern processors) and also removes the need to find a new prime number every time the table grows, since the table simply doubles.
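A sketch of the power-of-two variant, again in Python with hypothetical function names. It shows that when the size is a power of two, `size - 1` is an all-ones mask and the AND selects the same bucket a modulo would for non-negative hashes, and that growing by doubling keeps the size a power of two.

```python
def is_power_of_two(n: int) -> bool:
    # A positive power of two has exactly one bit set.
    return n > 0 and (n & (n - 1)) == 0


def bucket_mask(key_hash: int, table_size: int) -> int:
    """Pick a bucket with a bitwise AND instead of a division.

    Only valid when table_size is a power of two: table_size - 1 is then a
    mask of all one-bits, and AND-ing keeps the low-order bits of the hash.
    """
    if not is_power_of_two(table_size):
        raise ValueError("table_size must be a power of two")
    return key_hash & (table_size - 1)


def grow(table_size: int) -> int:
    # Doubling keeps the size a power of two, so no prime search is needed.
    return table_size * 2


if __name__ == "__main__":
    size = 16
    for h in (5, 21, 1000, 123456789):
        # For non-negative hashes the mask gives the same bucket as h % size.
        assert bucket_mask(h, size) == h % size
        print(h, "->", bucket_mask(h, size))
    print("grown size:", grow(size))
```

One design consequence: the mask keeps only the low-order bits of the hash, so the high bits never influence the bucket. Java's HashMap compensates by mixing the higher bits of `hashCode()` into the lower bits before masking.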

On Aug 14, 2008, at 11:37 AM, Phil Pennock wrote:

On 2008-08-14 at 15:08 +0200, Richard Hartmann wrote:
On Mon, Aug 11, 2008 at 17:29, Dan Nelson <dnelson@xxxxxxxxxxxxxxx> wrote:
Raise suggested-size to 1601 (a prime number larger than your current
list size with some room to grow).
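A sketch of how one might pick such a size, assuming a hypothetical helper `suggest_prime_size` and an assumed growth factor (`headroom=1.25`) that does not come from the thread. Trial division is plenty fast at these sizes, and it confirms that 1601 is indeed prime.

```python
import math


def is_prime(n: int) -> bool:
    """Trial division; fast enough for table sizes in the thousands."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


def next_prime(n: int) -> int:
    """Smallest prime >= n."""
    while not is_prime(n):
        n += 1
    return n


def suggest_prime_size(count: int, headroom: float = 1.25) -> int:
    """Hypothetical helper: a prime table size with room to grow.

    `headroom` is an assumed growth factor, not a value from the thread.
    """
    return next_prime(math.ceil(count * headroom))


if __name__ == "__main__":
    print(is_prime(1601), next_prime(1600))  # True 1601
```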

Using a prime hints at an interesting reason. What is it?

General hashing algorithm theory. I forget the math (if I ever knew
it); also, there are a lot of heuristics in there ("that seems to work,
let's go with that").

You tend not to be hashing completely random data, and in the absence of
distribution information about the input data, bucketing into a prime
number of slots tends to cause the least pain.
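A small illustration of that point, assuming input with a regularity the table designer didn't plan for: here every hash value is a multiple of 8, as aligned addresses often are. The 50 values are made up for the demonstration. With 16 buckets (which shares the factor 8 with the stride) the keys pile into two buckets; with 17 (prime) they spread across all of them. A prime size only clusters patterns whose stride is a multiple of that prime.

```python
from collections import Counter


def bucket_counts(hashes, table_size):
    """Map each hash to hash % table_size and count keys per occupied bucket."""
    return Counter(h % table_size for h in hashes)


if __name__ == "__main__":
    # Patterned, non-random input: 50 hash values, all multiples of 8.
    hashes = list(range(0, 400, 8))
    for size in (16, 17):
        counts = bucket_counts(hashes, size)
        print(f"{size} buckets: {len(counts)} in use, "
              f"fullest holds {max(counts.values())}")
    # 16 buckets: 2 in use, fullest holds 25
    # 17 buckets: 17 in use, fullest holds 3
```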

You'll find prime-number requirements in many hash bucketing schemes.

-Phil, who should have made more effort to stay awake in that lecture
