Monkey Tests For Random Number Generators 1
1 Introduction
Few images invoke the mysteries and ultimate certainties of a sequence of random events as
well as that of the proverbial monkey at a typewriter. Surprisingly, many questions about the
monkey’s literary output—the times between appearances of certain strings, the number of
distinct four-letter words in a million keystrokes, the time needed to spell CAT, for example—
are well suited for assessing both uniformity and independence in the output of a random
number generator (the monkey). Technically, we are concerned with overlapping m-tuples of
successive elements in a random sequence.
For years, in my annual course Computer Methods in Probability and Statistics, I called
these Overlapping m-Tuple Tests. But for the last few years I have used the monkey metaphor.
It seems a better way to stimulate the interests of the students and, by invoking an interesting
image, make them more readily accept the ideas and even feel as though they were their own.
We hope it will have a similar effect on you, the reader—not that I necessarily equate you
with the students (or the monkey).
This article describes some very simple, as well as some quite sophisticated, tests that
shed light on the suitability of certain random number generators. The generators are used
to provide the random keystrokes for our monkey. The keyboards range from the standard 26
upper-case letters to an organ-like keyboard with 1024 keys to the DNA keyboard with four
keys: C,G,A,T.
2 CAT Tests
Now, to business. Start with an idea that provides a very inefficient test, but one that some
random number generators (RNG’s) fail. Our monkey (RNG) has a typewriter with 26 upper-
case letters A,B,...,Z that he strikes at random. (Assume our RNG monkey produces uniform
reals in [0,1), say by means of a procedure UNI(). The integer part of 26.*UNI() provides the
random keystroke.) Now the CAT test: how many keys must the monkey strike until he spells
CAT?
There are $26^3 = 17,576$ possible 3-letter words, so the average number of keystrokes necessary
to produce CAT should be around 17,576, and the time to reach CAT should be very close
to exponentially distributed. Exact and approximate distributions, and more efficient tests, are
below; for now, let’s try this simple CAT test for a few common RNG’s.
The congruential monkey, $I = 69069 \cdot I \bmod 2^{32}$, converting to a real UNI on [0,1), gets
CAT after 13,561 keystrokes, then after 18,263, then another 14,872 strokes produces the third
CAT. Quite satisfactory.
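
The CAT test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the seed is an arbitrary choice (the text does not give one), so the waiting times it prints will not match the figures quoted above, and the helper names `congruential_keys` and `keystrokes_until` are made up for this example.

```python
from itertools import islice

def congruential_keys(seed, alphabet_size=26):
    """Yield keystrokes 0..alphabet_size-1 driven by I = 69069*I mod 2^32."""
    i = seed & 0xFFFFFFFF
    while True:
        i = (69069 * i) & 0xFFFFFFFF      # I = 69069 * I mod 2^32
        uni = i / 2**32                   # uniform real in [0, 1)
        yield int(alphabet_size * uni)    # integer part of 26 * UNI()

def keystrokes_until(word, keys, limit=10_000_000):
    """Count keystrokes drawn from `keys` until `word` is first spelled.

    Returns None if `limit` keystrokes pass without the word appearing.
    """
    target = [ord(ch) - ord("A") for ch in word]
    recent = []                           # the last len(word) keystrokes
    for count, key in enumerate(islice(keys, limit), start=1):
        recent.append(key)
        if len(recent) > len(target):
            recent.pop(0)
        if recent == target:
            return count
    return None

# The seed is an assumption; an odd value is used because the generator is multiplicative.
keys = congruential_keys(seed=12345)
waits = [keystrokes_until("CAT", keys) for _ in range(3)]
print(waits)  # three successive waiting times, each roughly exponential with mean 17,576
```

The `limit` argument is there so that a monkey who can never spell the word returns `None` instead of typing forever.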
Now consider the shift-register (Tausworthe) monkey that produces 31-bit integers by
exclusive or’s, left shift 28 and right shift 3. This is the very generator suggested [8] as a
replacement for congruential generators after discovery of their lattice structure [3]. The shift-register
monkey never spells CAT, even after two million keystrokes. He can’t get KAT, DOG, GOD,
SEX, WOW or ZIG either. But he can get ZAG, and too often—every few thousand keystrokes.
Indeed, it turns out that this monkey was only able to get 7,834 of the possible 17,576 3-letter
words (in a string of 1,000,000 keystrokes), and of course with his limited vocabulary, he gets
those words too often.
Note that inability to get CAT should not be attributed to the equivalent of broken keys on
the typewriter. This monkey still types each of the letters A to Z with the expected frequencies
(and thus would pass a standard test for letter frequency). For example, 26,000 keystrokes
produced 984 C’s, 967 A’s and 1021 T’s, quite satisfactory. Yet continuing the run to 2,600,000
keystrokes failed to produce a single CAT!
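
To see the contrast between letter frequencies and vocabulary for a generator of your own, something like the following sketch may help. Python's `random.Random` is used here purely as a stand-in source of keystrokes (it is not one of the generators discussed), and `distinct_words` is a hypothetical helper name.

```python
import random
from collections import Counter

def distinct_words(keys, word_len=3):
    """Number of distinct overlapping words of length `word_len` in `keys`."""
    seen = set()
    for start in range(len(keys) - word_len + 1):
        seen.add(tuple(keys[start:start + word_len]))
    return len(seen)

rng = random.Random(2024)                      # stand-in generator, arbitrary seed
keys = [rng.randrange(26) for _ in range(26_000)]

counts = Counter(keys)                         # single-letter frequencies
print(min(counts.values()), max(counts.values()))   # each letter expected about 1000 times
print(distinct_words(keys), "distinct 3-letter words out of", 26**3)
```

A generator can look fine on the first line of output and still have a badly restricted vocabulary on the second, which is the point of the example in the text.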
As silly as it seems, this is a very effective and convincing way to show the unsuitability
of certain random number generators. You may easily write a program and try it yourself.
Although details have been lost, I remember using the CAT test to shoot down RNG’s provided
with Apple and Radio Shack computers (TRS80’s ?) when they first came out in the 1970’s.
$$Q_2 = \sum_{i=1}^{676} \frac{(v_i - 2600)^2}{2600},$$
where $v_1, v_2, \ldots, v_{676}$ are the counts for the possible 2-letter words, then the difference, $Q_3 - Q_2$,
is a zero-centered quadratic form in a weak inverse of the covariance matrix of the counts
$x_1, \ldots, x_{17576}$, and is the appropriate (likelihood ratio) test that the x’s came from a normal
distribution with the specified means and covariance matrix.
If the hypothesis is true (the monkey is striking the keys uniformly and independently), then
$Q_3 - Q_2$ will have a chi-square distribution with $26^3 - 26^2$ degrees of freedom (the rank of C).
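
A rough sketch of computing the difference $Q_3 - Q_2$ follows. It rests on two assumptions that are not confirmed by the text above: that $Q_3$ has the same form as $Q_2$ but over the counts of 3-letter words, and that the keystroke string is treated as circular so that n keystrokes give exactly n overlapping words of each length. The function names are invented for this example.

```python
from collections import Counter

def q_statistic(keys, word_len, alphabet_size):
    """Sum over all possible words of (count - expected)^2 / expected."""
    n = len(keys)
    expected = n / alphabet_size**word_len
    wrapped = keys + keys[:word_len - 1]       # assumption: wrap around (circular string)
    counts = Counter(tuple(wrapped[i:i + word_len]) for i in range(n))
    seen_part = sum((c - expected) ** 2 / expected for c in counts.values())
    unseen = alphabet_size**word_len - len(counts)
    return seen_part + unseen * expected       # a word with count 0 contributes `expected`

def overlapping_triple_test(keys, alphabet_size=26):
    """Return (Q3 - Q2, degrees of freedom) for a list of keystrokes."""
    q3 = q_statistic(keys, 3, alphabet_size)
    q2 = q_statistic(keys, 2, alphabet_size)
    return q3 - q2, alphabet_size**3 - alphabet_size**2
```

With 26 letters and 1,757,600 keystrokes the expected pair count is the 2600 used above. Since a chi-square variable with d degrees of freedom has mean d and variance 2d, the returned statistic can be compared with 16,900 give or take a few multiples of about 184.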
What to do? My approach is this: instead of counting frequencies of, say, 4-letter words in
a long string of keystrokes, requiring a memory location—or at least a byte—for each possible
4-letter word, why not just count the presence or absence of each possible word? That requires
a single bit for each possible word, or $26^4 = 456,976$ bits for 4-letter words in an alphabet of 26
letters. That’s about 14,000 computer (32-bit) words, a reasonably-sized array for most high
level languages.
I call these sparse-occupancy tests. Because counting actual frequencies requires arrays too
large, we only count the number of empty cells, that is, in a long string of keystrokes, we use
a bit map to find how many 4- (5-,6- or higher-) letter words are missing. Some interesting
probability theory is required to develop appropriate tests.
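
The bit-map idea can be sketched as follows: one bit per possible word, set the first time the word is seen, with the number of missing words read off at the end. This is a minimal illustration, not the authors' program; `count_missing_words` is a hypothetical name, and the rolling base-26 index is just one convenient way to number the words.

```python
def count_missing_words(keys, word_len=4, alphabet_size=26):
    """Count the words of length `word_len` that never appear in `keys`."""
    total = alphabet_size ** word_len
    bitmap = bytearray((total + 7) // 8)       # one bit per possible word
    index = 0                                  # number of the most recent word
    present = 0
    for position, key in enumerate(keys, start=1):
        # Shift in the new letter and drop the oldest one.
        index = (index * alphabet_size + key) % total
        if position >= word_len:               # a full word is available
            byte, bit = divmod(index, 8)
            if not bitmap[byte] & (1 << bit):
                bitmap[byte] |= 1 << bit
                present += 1
    return total - present

# C, A, T, C, A, T, C spells CATC, ATCA, TCAT, CATC: three distinct words.
print(count_missing_words([2, 0, 19, 2, 0, 19, 2]))   # 26**4 - 3
```

For 4-letter words over 26 letters the bit map occupies 57,122 bytes, in line with the estimate of about 14,000 32-bit words.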
6 OPSO Theory
The OPSO test counts the number of missing 2-letter words in a long string of n random
keystrokes. If $n = 2^{21} = 2,097,152$ and there are $\alpha = 2^{10} = 1024$ letters in the alphabet, then
the number of missing words should average 141,909 with a standard deviation of 290. How is
this determined?
The answer: not easily. At least, the variance is not easy; the mean is easy. To get the
mean, we take advantage of the near lack-of-memory property of the monkey’s output. If he
has not typed a particular word after, say, 1000 keystrokes, then the distribution of the time
remaining until he does has virtually the same distribution as the original. In other words,
the time until the monkey types a particular 2-letter word should be close to exponential, with
mean $\mu = \alpha^2 = 2^{20}$, and the probability he does not type the word within n keystrokes should
be $e^{-n/\mu}$, to considerable accuracy.
To determine that accuracy, we need the true probability that n keystrokes will not produce
a particular 2-letter word. There are two kinds of 2-letter words: AB and AA. The probability
of no AB in n keystrokes is
the coefficient of $z^n$ in the Taylor expansion of
$$\frac{1}{1 - z + p^2 z^2},$$
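
The coefficient of $z^n$ in that expansion can be computed numerically with a short recurrence, which makes it easy to compare the exact probability with the lack-of-memory value $e^{-n/\mu}$. Multiplying the series by the denominator gives $a_n = a_{n-1} - p^2 a_{n-2}$ with $a_0 = a_1 = 1$. The sketch below assumes $p = 1/\alpha$ (each key equally likely); `prob_no_ab` is a name chosen for this example.

```python
import math

def prob_no_ab(n, p):
    """Coefficient of z^n in 1 / (1 - z + p^2 z^2).

    Uses the recurrence a_n = a_{n-1} - p^2 * a_{n-2}, with a_0 = a_1 = 1.
    """
    prev, cur = 1.0, 1.0
    for _ in range(n - 1):
        prev, cur = cur, cur - p * p * prev
    return cur

alpha = 2**10                      # letters in the alphabet
n = 2**21                          # keystrokes
exact = prob_no_ab(n, 1 / alpha)   # assumption: p = 1/alpha
approx = math.exp(-n / alpha**2)   # lack-of-memory approximation e^{-n/mu}
print(exact, approx, alpha**2 * approx)
```

The last printed value, $2^{20} e^{-2}$, is about 141909.33, the easy estimate of the mean number of missing words.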
Then the number of missing words is $x_1 + \cdots + x_{2097152}$, and the expected number of missing
words is
$$E(x_1) + \cdots + E(x_{2097152}).$$
There are $2^{20} - 2^{10}$ 2-letter words of type AB, and $2^{10}$ of type AA. When $n = 2^{21}$, our
expected number of missing 2-letter words is
Thus the easy method for finding the average number of missing 2-letter words from n
keystrokes, $2^{20} e^{-n/2^{20}}$, is quite suitable for practical applications of the OPSO test.
Those not acquainted with methods for developing generating functions and solving recur-
rence equations such as those above may wish to look at the marvelous treatment in the book
[1] developed out of Donald Knuth’s concrete mathematics course at Stanford, in particular,
sections 7.1-7.3 and 8.4.
6.1 Finding the variance
Now for the hard part: the variance of the number of missing 2-letter words. It is the sum
of $\mathrm{cov}(x_i, x_j)$ for i and j each ranging from 1 to $2^{10}$. (Recall that $x_i$ is the indicator variable
for the ith word, 1 if it appears, 0 if not). There are about $2^{20}$ such covariances (actually,
523 × 1023 + 1024). They fall into some 17 different types, with a generating function for each
type. Thus the expected value of $x_i x_j$, with $x_i$ associated with a word such as AB, and $x_j$
associated with CA, requires a different generating function than does that associated with
AA,BA.
If all the different types, and their frequencies, are accounted for, the total yields the required
variance. It is 84368. Thus, with an alphabet of $2^{10}$ letters, the number of missing 2-letter words
in a string of $2^{21}$ keystrokes has mean 141,909 and standard deviation 290.46. It appears to
have a distribution close enough to normal that the statistic $(x - 141909)/290$ is the appropriate
one for the OPSO test, where x is the number of 2-letter words that are missing from the string
of $2^{21}$ keystrokes. A value of $(x - 141909)/290$ in absolute value greater than, say, 3 is cause
for concern. A really good monkey (RNG) would cause concern only a few times in 1000 tests.
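
One run of the OPSO test might be sketched as below. The generator is passed in as a callable returning 32-bit integers. Which 10 bits supply the keystroke is a free choice (the `shift` parameter, defaulting to the top 10 bits), and `opso_run` is a hypothetical name. One byte per word is used instead of one bit, purely to keep the sketch short.

```python
def opso_run(next_uint32, n=2**21, shift=22):
    """Return (missing, z) for one OPSO run of n keystrokes.

    next_uint32: callable returning an integer in [0, 2^32).
    shift: position of the 10-bit field used as the keystroke (an assumption).
    """
    seen = bytearray(2**20)                    # one flag per 2-letter word
    prev = (next_uint32() >> shift) & 1023     # first keystroke
    for _ in range(n - 1):
        key = (next_uint32() >> shift) & 1023
        seen[(prev << 10) | key] = 1           # mark the word (prev, key)
        prev = key
    missing = 2**20 - sum(seen)
    return missing, (missing - 141909) / 290   # mean and sigma from the text
```

As a usage example, `opso_run(lambda: random.getrandbits(32))` (after `import random`) tests Python's built-in generator; repeating the run with different values of `shift` exercises different bit fields of the same generator.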
8 The OQSO Test
OQSO means Overlapping-Quadruples-Sparse-Occupancy—the number of missing 4-letter words
in a long string of keystrokes. We use an alphabet of $\alpha = 2^5$ letters and a string of $n = 2^{21}$
keystrokes. Thus selecting any five bits from the integer produced by the RNG provides the
resulting keystroke for our RNG monkey.
By enumerating the possible kinds of 4-letter words, finding their generating functions and
asymptotic forms for the coefficients of $z^n$, then combining them with the proper frequencies,
the true expected number of missing 4-letter words in a string of $n = 2^{21}$ keystrokes may be
found to better than 40 digits of accuracy. To the first 11 digits, it is µ = 141909.47365 . . ..
The approximation based on the lack-of-memory property yields 141909.33.
The box below gives details of that enumeration: For each type of word, the frequency, the
generating function and the probability that a string of $n = 2^{21}$ keystrokes will not contain
that word. The notation A′ means not-A, X and Y designate any letters of the alphabet.
Total: $\mu = 141909.47365$
Lack-of-memory approximation: $\alpha^4 e^{-n/\alpha^4} = 141909.33$
I don’t know—and doubt that I ever will know—the true variance. There are just too many
kinds of pairs of 4-letter words to undertake finding all the necessary generating functions.
Extensive simulation suggests that using σ = 295 will serve well for testing a random number
generator: the values (missing words − 141909)/295 should look like independent standard normal
variates. The alternative is to take a larger sample and do a t-test for the known mean of
141,909.
trailing segments), find their generating functions, the probability the word will not appear in
$n = 2^{21}$ keystrokes, then combine all those probabilities, with appropriate frequencies.
The result is 141910.5378411, the expected number of missing 10-letter words in a random
DNA segment of $2^{21}$ C’s, G’s, A’s and T’s. The expected value from the lack-of-memory property
is the same as that for OPSO and OQSO, with $\alpha^k = 2^{20}$ and $\lambda = 2$: 141909.
It appears a formidable task to find the exact variance for the DNA test. A t-test may
be used to test that the mean is 141,910. It will require a larger sample than one using
the true variance. But extensive simulation suggests use of a sigma of 339. So a reasonable
implementation of the DNA test is this: Generate $2^{21}$ keystrokes from the alphabet {C,G,A,T}
(using two bits from each random integer), and let x be the number of missing 10-letter words.
Do this, say, 4 times. The resulting values $(x_1 - 141911)/339, \ldots, (x_4 - 141911)/339$ should
look like a sample of 4 independent standard normal variables.
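
The DNA procedure just described could be sketched like this. Each 10-letter word is held as a 20-bit window that is updated two bits at a time. The choice of which two bits to take from each integer (`shift`) and the function names are assumptions made for illustration only.

```python
def dna_missing(next_uint32, n=2**21, shift=0, word_len=10):
    """Number of missing `word_len`-letter words in n keystrokes from {C,G,A,T}."""
    mask = 4**word_len - 1                     # keeps the last word_len letters
    seen = bytearray(4**word_len)              # one flag per possible word
    window = 0
    for position in range(1, n + 1):
        key = (next_uint32() >> shift) & 3     # two bits: one of four letters
        window = ((window << 2) | key) & mask  # slide the window by one letter
        if position >= word_len:
            seen[window] = 1
    return 4**word_len - sum(seen)

def dna_z_values(next_uint32, runs=4):
    """Standardized results of several runs, using the constants in the text."""
    return [(dna_missing(next_uint32) - 141911) / 339 for _ in range(runs)]
```

Calling `dna_z_values(lambda: random.getrandbits(32))` (after `import random`) yields four values that, for a good generator, should resemble independent standard normal draws.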
generators are in [4, 5, 6, 7]
An example: the SWB generator $x_n = x_{n-24} - x_{n-37} - c \bmod 2^{32}$ has period about $2^{1178}$
and passes all the CAT, OPSO, OQSO and DNA tests put to it, for all substrings of its 32-bit
integers. (However, it is not perfect; it fails the birthday-spacing test described in [4], as do
other AWC and SWB generators and lagged-Fibonacci generators using +,– or ⊕.)
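
For readers who want to experiment, here is a minimal sketch of a subtract-with-borrow generator of the form given above. It assumes the usual reading of the recurrence, in which c is the borrow from the previous step: 1 if that subtraction went negative, 0 otherwise. The class name and the seeding from Python's `random` module are illustrative choices, not part of the original description.

```python
import random

class SubtractWithBorrow:
    """x_n = x_{n-24} - x_{n-37} - c mod 2^32, with c the previous borrow."""

    def __init__(self, seeds, carry=0):
        if len(seeds) != 37:
            raise ValueError("need 37 seed values")
        self.state = [s & 0xFFFFFFFF for s in seeds]   # oldest value first
        self.carry = carry
        self.pos = 0                                   # index of x_{n-37}

    def next_uint32(self):
        x_long = self.state[self.pos]                  # x_{n-37}
        x_short = self.state[(self.pos + 13) % 37]     # x_{n-24}
        t = x_short - x_long - self.carry
        if t < 0:
            t += 2**32                                 # reduce mod 2^32
            self.carry = 1                             # remember the borrow
        else:
            self.carry = 0
        self.state[self.pos] = t                       # x_n replaces x_{n-37}
        self.pos = (self.pos + 1) % 37
        return t

seeder = random.Random(1)                              # arbitrary seeding, an assumption
swb = SubtractWithBorrow([seeder.getrandbits(32) for _ in range(37)])
print([swb.next_uint32() for _ in range(3)])
```

The bound method `swb.next_uint32` can be handed to any test that expects a source of 32-bit integers.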
References
[1] Graham, Ronald L., Knuth, Donald E. and Patashnik, Oren, Concrete Mathematics,
Addison-Wesley, Reading, MA, 1989.
[2] Haas, Alexander, The multiple prime random number generator, ACM Transactions on
Mathematical Software, 13, No. 4, 1987.
[3] Marsaglia, George, Random numbers fall mainly in the planes, Proceedings National
Academy Science 61, 25-28, 1968.
[4] Marsaglia, George, Keynote Address: A Current View of Random Number Generators,
Proceedings, Computer Science and Statistics: 16th Symposium on the Interface, Elsevier,
1985
[5] Marsaglia, George and Tsay, L. H., Matrices and the structure of random number sequences,
Linear Algebra and its Applications, 67, 147–156, 1985.
[6] Marsaglia, George, The mathematics of random number generators, Proceedings of Symposia
on Applied Mathematics, 46, 73–89, 1992.
[7] Marsaglia, George and Zaman, Arif, A new class of random number generators, Annals of
Applied Probability, 1, No. 3, 462–480, 1991.
[8] Whittlesey, J. R. B., On the multidimensional uniformity of pseudorandom number
generators, Communications of the ACM, 12, p. 247, 1969.