Zsh Mailing List Archive
Re: random numbers
It's true that RANDOM is not really random (and Clinton has already
suggested an alternative). But nothing you said shows that.
On Mon, Sep 15, 2025, at 9:22 AM, Ray Andrews wrote:
> for ((level=1; level<=1000; level++)); do
>
> #First tests:
> var=$RANDOM
>
> #Second tests:
> #var=$(shuf -i 1-1000 -n 1)
>
> [...]
>
> First tests give close to this:
>
> 5 /aWorking/Zsh/Source/Wk/Boneyard/Math 1 % . ./test1
> a1 is: 339
> a2 is: 336
> a3 is: 134
> a4 is: 27
> a5 is: 31
> a6 is: 25
> a7 is: 28
> a8 is: 40
> a9 is: 40
>
> Second tests close to this:
>
> 5 /aWorking/Zsh/Source/Wk/Boneyard/Math 1 % . ./test1
> a1 is: 110
> a2 is: 102
> a3 is: 114
> a4 is: 114
> a5 is: 107
> a6 is: 99
> a7 is: 135
> a8 is: 108
> a9 is: 111
This comparison is invalid. You're not even testing similar ranges:
RANDOM yields values from 0 to 32767, while your shuf call draws from
1 to 1000. If you set the ranges properly, you won't see such a large
discrepancy:
% cat /tmp/rand.zsh
typeset -A random_1000 random_32767 shuf_1000 shuf_32767
n=1000

for ((i = 1; i <= n; ++i))
do
    # [1, 1000]
    until (((tmp = RANDOM) < 32000))
    do
        :
    done
    ((tmp = tmp % 1000 + 1))
    ((++random_1000[$tmp[1]]))

    # [0, 32767]
    ((tmp = RANDOM))
    ((++random_32767[$tmp[1]]))
done

gshuf --input-range=1-1000 --repeat --head-count=$n |
while read -r tmp
do
    ((++shuf_1000[$tmp[1]]))
done

gshuf --input-range=0-32767 --repeat --head-count=$n |
while read -r tmp
do
    ((++shuf_32767[$tmp[1]]))
done

printf '         RANDOM      shuf    RANDOM      shuf\n'
printf 'digit    1-1000    1-1000   0-32767   0-32767\n'
for i in {0..9}
do
    printf '%5d%10d%10d%10d%10d\n' \
        $i \
        $random_1000[$i] $shuf_1000[$i] \
        $random_32767[$i] $shuf_32767[$i]
done
% ./zsh /tmp/rand.zsh
         RANDOM      shuf    RANDOM      shuf
digit    1-1000    1-1000   0-32767   0-32767
    0         0         0         0         0
    1       115       111       324       360
    2       113       110       339       322
    3       102       105       120       118
    4       111       119        46        40
    5       144        91        39        32
    6        98       111        37        34
    7       106       109        33        28
    8       110       126        33        37
    9       101       118        29        29
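The `until ... < 32000` loop in the script above is rejection sampling: RANDOM has 32768 possible values, and 32768 is not a multiple of 1000, so a bare `RANDOM % 1000` would make the residues 0 through 767 slightly more likely than the rest. Discarding draws at or above the largest multiple of 1000 (32000) removes that bias. A minimal sketch of the same idea for an arbitrary upper bound follows. The name `rand_upto` is hypothetical, and the sketch assumes a shell whose RANDOM is uniform on 0 to 32767 (zsh or bash).

```shell
# Hypothetical helper: set REPLY to an integer in [1, n], for 1 <= n <= 32768.
# Assumes RANDOM is uniform on [0, 32767], as in zsh and bash.
rand_upto() {
    local n=$1
    # Largest multiple of n that fits in the 32768 possible RANDOM values.
    local limit=$(( 32768 - 32768 % n ))
    REPLY=$RANDOM
    # Reject draws that would over-represent the low residues.
    while [ "$REPLY" -ge "$limit" ]; do
        REPLY=$RANDOM
    done
    REPLY=$(( REPLY % n + 1 ))
}

rand_upto 1000
echo "$REPLY"
```

The result is returned through REPLY rather than printed, because in zsh repeated references to RANDOM inside separate `$(...)` subshells can produce the same value unless RANDOM is also referenced or seeded in the parent shell in between.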
> I think I know why, it's because zsh's RANDOM tops out at 32,767, but
> this ensures that the first digit of a string of random numbers isn't
> random but strongly weighted to 1 and 2, with 3 also weighted. Is this
> worth worrying about? I'd say so. Perhaps it should max out at 10,000
> or 100,000 or even offer a user selectable range?
It would be goofy if the shell chose the range of RANDOM just to
eliminate bias with respect to base-10 leading digits. One could
very well object that such a change would disrupt the current lack
of leading-digit bias for base-8 representations.
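The leading-digit skew can be checked by counting rather than sampling. The sketch below counts how many integers in 1..max have a given leading digit in a given base. The function name `leading_digit_count` is hypothetical, and nothing here depends on RANDOM itself, only on its documented range of 0 to 32767.

```shell
# Hypothetical helper: print how many integers in [1, max] have leading
# digit d when written in the given base (default 10).
leading_digit_count() (
    d=$1
    max=$2
    base=${3:-10}
    count=0
    pow=1
    # Numbers with leading digit d and k further digits lie in
    # [d * base^k, (d + 1) * base^k - 1]; clip each block to max.
    while [ $(( d * pow )) -le "$max" ]; do
        lo=$(( d * pow ))
        hi=$(( (d + 1) * pow - 1 ))
        if [ "$hi" -gt "$max" ]; then
            hi=$max
        fi
        count=$(( count + hi - lo + 1 ))
        pow=$(( pow * base ))
    done
    echo "$count"
)

echo "base 10:"
for d in 1 2 3 4 5 6 7 8 9; do
    printf '%5d%10d\n' "$d" "$(leading_digit_count "$d" 32767)"
done

echo "base 8:"
for d in 1 2 3 4 5 6 7; do
    printf '%5d%10d\n' "$d" "$(leading_digit_count "$d" 32767 8)"
done
```

In base 10 the counts are 11111 for digits 1 and 2, 3879 for digit 3, and 1111 for each of 4 through 9. Out of 32768 values, that is roughly 339, 118, and 34 per thousand draws, which is what both RANDOM and shuf show over 0-32767. In base 8 every digit from 1 to 7 gets 4681 values, since 32768 is exactly 8^5.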
A user who generates and uses random numbers needs to account for
bias. This is not something the generator can do for you.
> Or do we have other tricks for coping with this?
Test better.
https://blog.codinghorror.com/the-danger-of-naivete/
> I find zsh to be quite competent
> mathematically so having to use an external utility to get really
> random numbers seems not up to snuff.
The leading-digit bias does not mean that RANDOM is not "really
random" (although it isn't). And GNU shuf uses an internal PRG
by default, which isn't "really random" either.
--
vq