Zsh Mailing List Archive Messages sorted by: Reverse Date, Date, Thread, Author

Re: random numbers

X-seq: zsh-users 30353
From: Lawrence Velázquez <larryv@xxxxxxx>
To: zsh-users@xxxxxxx
Subject: Re: random numbers
Date: Mon, 15 Sep 2025 16:10:13 -0400
Archived-at: <https://zsh.org/users/30353>
Feedback-id: iaa214773:Fastmail
In-reply-to: <75f8536d-8916-4b57-b6c8-fe44ec632776@eastlink.ca>
List-id: <zsh-users.zsh.org>
References: <75f8536d-8916-4b57-b6c8-fe44ec632776@eastlink.ca>

It's true that RANDOM is not really random (and Clinton has already
suggested an alternative).  But nothing you said shows that.


On Mon, Sep 15, 2025, at 9:22 AM, Ray Andrews wrote:
> for ((level=1; level<=1000; level++)); do
>
> #First tests:
> var=$RANDOM
>
> #Second tests:
> #var=$(shuf -i 1-1000 -n 1)
>
> [...]
>
> First tests give close to this:
>
> 5 /aWorking/Zsh/Source/Wk/Boneyard/Math 1 % . ./test1
> a1 is: 339
> a2 is: 336
> a3 is: 134
> a4 is: 27
> a5 is: 31
> a6 is: 25
> a7 is: 28
> a8 is: 40
> a9 is: 40
>
> Second tests close to this:
>
> 5 /aWorking/Zsh/Source/Wk/Boneyard/Math 1 % . ./test1
> a1 is: 110
> a2 is: 102
> a3 is: 114
> a4 is: 114
> a5 is: 107
> a6 is: 99
> a7 is: 135
> a8 is: 108
> a9 is: 111

This comparison is invalid.  You're not even testing similar ranges:
RANDOM yields values in [0, 32767], while your shuf command yields
values in [1, 1000].  If you set the ranges properly, you won't see
such a large discrepancy:

	% cat /tmp/rand.zsh
	typeset -A random_1000 random_32767 shuf_1000 shuf_32767

	n=1000

	for ((i = 1; i <= n; ++i))
	do
		# [1, 1000]
		until (((tmp = RANDOM) < 32000))
		do
			:
		done
		((tmp = tmp % 1000 + 1))
		((++random_1000[$tmp[1]]))

		# [0, 32767]
		((tmp = RANDOM))
		((++random_32767[$tmp[1]]))
	done

	gshuf --input-range=1-1000 --repeat --head-count=$n |
		while read -r tmp
		do
			((++shuf_1000[$tmp[1]]))
		done

	gshuf --input-range=0-32767 --repeat --head-count=$n |
		while read -r tmp
		do
			((++shuf_32767[$tmp[1]]))
		done

	printf '         RANDOM      shuf    RANDOM      shuf\n'
	printf 'digit    1-1000    1-1000   0-32767   0-32767\n'

	for i in {0..9}
	do
		printf '%5d%10d%10d%10d%10d\n' \
		       $i \
		       $random_1000[$i] $shuf_1000[$i] \
		       $random_32767[$i] $shuf_32767[$i]
	done
	% ./zsh /tmp/rand.zsh
	         RANDOM      shuf    RANDOM      shuf
	digit    1-1000    1-1000   0-32767   0-32767
	    0         0         0         0         0
	    1       115       111       324       360
	    2       113       110       339       322
	    3       102       105       120       118
	    4       111       119        46        40
	    5       144        91        39        32
	    6        98       111        37        34
	    7       106       109        33        28
	    8       110       126        33        37
	    9       101       118        29        29
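
The skew in the 0-32767 columns follows from the range alone, which
can be checked by counting leading digits exhaustively with no random
numbers involved.  A minimal sketch, assuming zsh (it should also run
in bash); leading_digit_counts is a hypothetical helper, not part of
the script above:

```shell
# Print "digit count" for each leading decimal digit over every
# integer in [min, max].  Hypothetical helper for illustration only.
leading_digit_counts() {
	local min=$1 max=$2 i d
	local -a counts
	for ((i = min; i <= max; ++i)); do
		d=${i:0:1}              # first character, like $tmp[1] above
		((counts[d+1] += 1))    # slot d+1 keeps indexing valid in zsh and bash
	done
	for d in 0 1 2 3 4 5 6 7 8 9; do
		echo "$d ${counts[d+1]:-0}"
	done
}

leading_digit_counts 0 32767
```

Over [0, 32767] this counts 11111 values each for leading digits 1
and 2 (about 33.9% each), 3879 for digit 3 (about 11.8%), and 1111
for each of 4 through 9 (about 3.4% each).  A perfectly uniform
source over that range would show the same pattern.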


> I think i know why, it's because zsh's RANDOM tops out at 32,767, but 
> this insures that the first digit of a string of random numbers isn't 
> random but strongly weighted to 1 and 2, with 3 also weighted.  Is this 
> worth worrying about?  I'd say so.  Perhaps it should max out at 10,000 
> or 100,000 or even offer a user selectable range?

It would be goofy if the shell chose the range of RANDOM just to
eliminate bias with respect to base-10 leading digits.  One could
very well object that such a change would disrupt the current lack
of leading-digit bias for base-8 representations.
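
The base-8 point can be checked the same way.  A minimal sketch,
assuming zsh (it should also run in bash); leading_digit_counts_base
is a hypothetical helper that counts leading digits in an arbitrary
base over [1, max]:

```shell
# Print "digit count" for each nonzero leading digit, in the given
# base, over every integer in [1, max].  Hypothetical helper.
leading_digit_counts_base() {
	local base=$1 max=$2 i n d
	local -a counts
	for ((i = 1; i <= max; ++i)); do
		n=$i
		# Strip low-order digits until only the leading one remains.
		while ((n >= base)); do
			((n /= base))
		done
		((counts[n] += 1))
	done
	for ((d = 1; d < base; ++d)); do
		echo "$d ${counts[d]:-0}"
	done
}

leading_digit_counts_base 8 32767
```

Since 32767 is 8^5 - 1, every nonzero leading octal digit occurs
exactly 4681 times.  Passing 10 as the base instead reproduces the
decimal skew toward 1, 2, and 3.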

A user who generates and uses random numbers needs to account for
bias.  This is not something the generator can do for you.
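
One way to account for it is to generalize the until loop from the
script above: reject draws that fall past the largest multiple of the
target span, then reduce what is left.  A minimal sketch, assuming
zsh (it should also run in bash) and a span of at most 32768;
rand_range is a hypothetical helper:

```shell
# Set REPLY to an integer in [lo, hi], uniform with respect to
# RANDOM's 0..32767 output, by rejection sampling.
# Hypothetical helper; assumes 1 <= hi - lo + 1 <= 32768.
rand_range() {
	local lo=$1 hi=$2 span limit r
	((span = hi - lo + 1))
	# Largest multiple of span that is <= 32768 (32000 for span 1000).
	((limit = 32768 - 32768 % span))
	while :; do
		r=$RANDOM
		((r < limit)) && break    # discard draws that would skew the modulo
	done
	REPLY=$((lo + r % span))
}

rand_range 1 1000
echo $REPLY
```

The result comes back in REPLY rather than on stdout because calling
the function inside $(...) would run it in a subshell, and in zsh
repeated subshells that read RANDOM see the same value unless RANDOM
is referenced or reseeded in the parent in between.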


> Or do we have other tricks for coping with this?

Test better.

https://blog.codinghorror.com/the-danger-of-naivete/


> I find zsh to be quite competent 
> mathematically so having to use an external utility to get really 
> random numbers seems not up to snuff.

The leading-digit bias does not mean that RANDOM is not "really
random" (although it isn't).  And GNU shuf uses an internal PRNG
by default, which isn't "really random" either.


-- 
vq

References:
- random numbers
  - From: Ray Andrews
