Inferring Sequences Produced by Pseudo-Random Number Generators
JOAN BOYAR
University of Chicago, Chicago, Illinois
Abstract. In this paper, efficient algorithms are given for inferring sequences produced by certain
pseudo-random number generators. The generators considered are all of the form
$X_i = \sum_{j=1}^{k} \alpha_j \phi_j(X_0, X_1, \ldots, X_{i-1}) \pmod{m}$. In each case, we assume that the functions $\phi_j$ are known and
polynomial time computable, but that the coefficients $\alpha_j$ and the modulus $m$ are unknown. Using this
general method, specific examples of generators having this form, the linear congruential method, linear
congruences with n terms in the recurrence, and quadratic congruences are shown to be cryptographically
insecure.
Categories and Subject Descriptors: E.3 [Data Encryption]; F.2.1 [Analysis of Algorithms and Problem
Complexity]: Numerical Algorithms and Problems; G.3 [Probability and Statistics]: random number
generation.
General Terms: Algorithms, Security
Additional Key Words and Phrases: Cryptography, inductive inference, linear congruential method
1. Introduction
A pseudo-random number generator is considered cryptographically secure if a
cryptanalyst is unable to compute any other segment of the generator’s output
within feasible time and space complexity bounds, even after obtaining long
segments of this output. The first example of a cryptographically secure pseudo-
random number generator is presented in [20]. In proving that his method is
secure, Shamir proves that, given long segments of this generator’s output, the
ability to produce the next number to be output implies the ability to crack the
Rivest-Shamir-Adleman encryption scheme. But, for cryptographic purposes, one
would prefer a stronger definition of security-even given some of the initial bits
in this next number to be output, a cryptanalyst should have no advantage in
guessing the next bits of this same number. It is unknown whether Shamir’s
generator is secure in this stronger sense. In order to solve this problem, Blum and
Micali [2] give a pseudo-random bit generator, a generator that produces only one
bit, rather than an entire number, at each step. The Blum-Micali generator is
cryptographically secure in this stronger sense, assuming the problem of index
finding is intractable. But their method is extremely slow. Pseudo-random bit
generators that are cryptographically strong under other cryptographic assumptions,
This work was supported by an Educational Opportunity Fellowship and by DARPA grant N00039-
82-C-0235.
Author’s address: Department of Computer Science, University of Chicago, Chicago, IL 60637.
Permission to copy without fee all or part of this material is granted provided that the copies are not
made or distributed for direct commercial advantage, the ACM copyright notice and the title of the
publication and its date appear, and notice is given that copying is by permission of the Association for
Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© 1989 ACM 0004-5411/89/0100-0129 $01.50
Journal of the Association for Computing Machinery, Vol. 36, No. 1, January 1989, pp. 129-141
but are also slow, are given in [3] and [8]. Each of these methods performs at least
one multiplication on two very large numbers for each bit of output produced.
Yao [23] has shown that strong generators exist under the assumption that any
one-way permutations exist. Recently Long and Wigderson [14], Vazirani and
Vazirani [22], and Alexi et al. [1] have proved that the generators in [2], [3], and
[8] are strong even if improved slightly to produce $\log_2 n$ bits of output for operations
on $n$-bit numbers. This is a great improvement, but still these generators are quite
slow. This suggests the question of whether any of the fast pseudo-random number
generators commonly in use are also cryptographically secure.
The linear congruential pseudo-random number generators, those of the form
$X_{i+1} = aX_i + b \pmod{m}$, are very fast and widely used in Monte Carlo simulations
and probabilistic algorithms. In this paper, we show that they are not cryptographically
secure, even in the weaker sense mentioned above. Obviously, these generators
are not cryptographically secure when the modulus, $m$, is known. In that case,
one could solve for $\hat{a}$ in the congruence $(X_1 - X_0)\hat{a} \equiv (X_2 - X_1) \pmod{m}$.
Then, the original sequence can be correctly predicted using the generator
$X_{i+1} = \hat{a}X_i + (X_1 - \hat{a}X_0) \pmod{m}$. But what if the modulus is unknown? Linear
congruential pseudo-random number generators and many of the other fast pseudo-
random number generators have the form

    $$X_i = \sum_{j=1}^{k} \alpha_j \phi_j(X_0, X_1, \ldots, X_{i-1}) \pmod{m}.$$

This generator is given $n_0$ initial values, $X_0, X_1, \ldots, X_{n_0-1}$, and is used to predict
all subsequent values $X_{n_0}, X_{n_0+1}, \ldots$. For example, with the linear congruential
method, $n_0 = 1$, so $X_0$ is given and all subsequent values are computed using the
generator. In the next section, we present a general method for inferring certain
sequences in this general form. This method is efficient, according to the following
definition:
Definition. An efficient inference method for predicting a sequence produced by
a generator of the form $X_n = \sum_{j=1}^{k} \alpha_j \phi_j(X_0, X_1, \ldots, X_{n-1}) \pmod{m}$ for $n \ge n_0$ is an
algorithm which
(1) is given the $\phi_j$'s, which we assume can be computed over the integers (i.e., not
reduced modulo $m$) in time polynomial in $\log m$ and $k$,
(2) is not given the coefficients $\alpha_j$ or the modulus $m$,
(3) makes predictions for each element of the sequence, one at a time, from
knowledge of all previous values,
(4) makes a number of mistakes that is bounded by a polynomial in $\log m$, in $k$,
and in $n_0$, and
(5) produces a prediction for a next element of the sequence within a time that is
bounded by a polynomial in $\log m$ and $k$.
Thus, we are assuming that whenever the algorithm makes an incorrect inference,
it is immediately given the correct value. This general method, which is described
in Section 2, is applied in Section 3 to specific generators: the linear congruential
method, linear congruences with n terms in the recurrence, and quadratic
congruences.
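
The known-modulus case described earlier in this introduction can be made concrete with a short sketch. It is an illustration only, not code from the paper: the function names `recover_lcg_known_modulus` and `predict_next` are hypothetical, and the treatment of the case where $X_1 - X_0$ is not invertible modulo $m$ (dividing the congruence through by the common gcd) is an assumption added so that the sketch runs on every input.

```python
from math import gcd


def recover_lcg_known_modulus(x0, x1, x2, m):
    """Return (a_hat, b_hat) consistent with x0, x1, x2 for a KNOWN modulus m.

    Solves (x1 - x0) * a_hat = (x2 - x1) (mod m), then sets
    b_hat = x1 - a_hat * x0, as in the text. Hypothetical helper.
    """
    d1 = (x1 - x0) % m
    d2 = (x2 - x1) % m
    g = gcd(d1, m)                      # gcd(0, m) == m covers a constant sequence
    if d2 % g != 0:
        raise ValueError("not three consecutive outputs of an LCG modulo m")
    reduced = m // g
    if reduced == 1:
        a_hat = 0                       # every multiplier works; pick one
    else:
        # Assumption: three-argument pow with exponent -1 (Python 3.8+).
        a_hat = (d2 // g) * pow(d1 // g, -1, reduced) % reduced
    b_hat = (x1 - a_hat * x0) % m
    return a_hat, b_hat


def predict_next(x, a_hat, b_hat, m):
    """Predict the successor of x with the recovered generator."""
    return (a_hat * x + b_hat) % m
```

The recovered multiplier need not equal the original one; it only has to reproduce the same sequence, which is all the cryptanalyst needs.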
The results in the third section, related to the linear congruential method, were
first reported in [16] and [17], and the other results presented in that section were
reported in the author’s dissertation [18]. Many of the details omitted from this
paper can be found in [18] or [4].
Other researchers have looked at similar problems. Reeds [19] also looks at the
linear congruential method, and assumes, as we do, that the modulus and coefficients
are unknown; but that result, unlike the one presented here, relies on the
assumption that factoring is easy. Lagarias and Reeds [13] have proved a conjecture
made in [18], concerning the unique extrapolation of polynomial recurrences.
Thus the results in Section 3, showing how to predict sequences produced by
quadratic congruences, can be generalized to polynomial congruences, for polynomials
of fixed degree.
where the functions $\phi_j$ are known, but the coefficients $\alpha_j$ and the modulus are
unknown. The generator was used to produce all $X_i$ where $i \ge n_0$ for some known
$n_0$. We also assume that the nonreduced $\phi_j$'s can be computed in time polynomial
in $\log m$ and $k$. Some examples of such generators are the linear congruential
method, $X_i = aX_{i-1} + b \pmod{m}$, linear congruences with n terms in the recurrence,
$X_i = a_1X_{i-1} + a_2X_{i-2} + \cdots + a_nX_{i-n} + a_{n+1} \pmod{m}$, and quadratic congruences,
$X_i = aX_{i-1}^2 + bX_{i-1} + c \pmod{m}$. The methodology presented in this section is
applicable to generators in any of these three forms, because they have the following
extrapolation property.
Definition. A set of functions $\{\phi_j(X_0, X_1, \ldots, X_{i-1}) \mid 1 \le j \le k\}$ has the unique
extrapolation property with length $r$ if for any modulus $m$ (where $m = \infty$ is allowed
and means over $\mathbb{Q}$) and any data $(y_0, y_1, \ldots, y_{n_0+r-1})$, then all generators (*) with
coefficients $\{\alpha_j \mid 1 \le j \le k\}$ such that
where the functions $\phi_j$ are known, we can attempt to predict the sequence produced
by finding some coefficients $\alpha_j'$ and a modulus that, when used with this
recurrence, will produce the same sequence as the original generator. For example,
where $m_{(q+1)} = \infty$. From this point on, to predict any $X_s$, there will always
be a generator of this form to predict the sequence. The first generators will
have $m_s = \infty$.
One of these first generators created will probably be incorrect at some point in
its predictions (this is not necessarily the case, since no mistake would be made on
the constant sequence, for instance). We assume that whenever an incorrect
prediction occurs, the correct value is revealed before the next prediction is made.
The $(k+1)$st prediction error after the first $n_0$ values are seen will give us enough
additional information to calculate a nonzero integer multiple of the modulus.
Then, after we have seen at least $n_0 + r$ values, using this multiple of the modulus,
we can find some possible integer coefficients $\alpha_j'$. In the case of any generator
having the extrapolation property, these coefficients will be the coefficients we are
looking for. (Of course, they may not be the actual coefficients, but, given the same
modulus, they would produce the same sequence as the original generator.) From
this point on, the generator created to predict $X_s$ will be

    $$X_s = \sum_{j=1}^{k} \alpha_j' \phi_j(X_0, X_1, \ldots, X_{s-1}) \pmod{m_s},$$

and $m_s$ will be a multiple of the original modulus. (In fact, as soon as we have
computed a nonzero multiple of the modulus, we can compute some possible
coefficients to use in the prediction process. It is unnecessary to wait for the unique
extrapolation bound. However, the coefficients should then be recomputed after
the unique extrapolation bound is reached.)
In the third step, using these coefficients and the multiple of the modulus, one
can begin predicting the remainder of the sequence. When further errors are made
in predicting the Xi’s, the guess for the modulus is updated so that it divides the
previous guess for the modulus and is still a multiple of the modulus. The fact that
each new modulus divides all previous ones means that any congruence that held
modulo these previous moduli will also hold with the new modulus. Thus the
coefficients being used need not be changed when a new modulus is computed to
maintain consistency with the portion of the sequence seen so far. This consistency
is automatic. If this initial portion of the sequence contains at least $n_0 + r$ elements,
the unique extrapolation bound has been passed, and these coefficients will never
need to be changed. The updates to the modulus are easy to make because $m$
divides

    $$X_s - \sum_{j=1}^{k} \alpha_j' \phi_j(X_0, X_1, \ldots, X_{s-1}),$$

so $\gcd(m_s, X_s - \sum_{j=1}^{k} \alpha_j' \phi_j(X_0, X_1, \ldots, X_{s-1}))$ is also a multiple of $m$ and can be
used for $m_{s+1}$. Each time an error is made in the prediction,
$X_s \not\equiv \sum_{j=1}^{k} \alpha_j' \phi_j(X_0, X_1, \ldots, X_{s-1}) \pmod{m_s}$, so some nontrivial factor is removed
from $m_s$ to produce $m_{s+1}$. Thus the maximum number of errors that could possibly
occur, after $m_{s+1}$ (the first nonzero integer multiple of the modulus) is computed,
is $1 + \log_2(m_{s+1}/m)$.
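
The modulus update in this third step is a single gcd. The following sketch illustrates it under stated assumptions: `update_modulus` is a hypothetical name, `coeffs` stands for the coefficients already found in the second step, and `phi_values` holds the unreduced integer values of the known functions at the current position.

```python
from math import gcd


def update_modulus(m_s, x_s, coeffs, phi_values):
    """Return the next modulus guess after the true value x_s is revealed.

    m_s        -- current guess, a nonzero multiple of the true modulus
    x_s        -- true value of the sequence at position s
    coeffs     -- coefficients alpha'_1..alpha'_k (assumed already found)
    phi_values -- integers phi_1(X_0..X_{s-1}), ..., phi_k(X_0..X_{s-1})
    """
    residual = x_s - sum(a * p for a, p in zip(coeffs, phi_values))
    # The true modulus divides both m_s and the residual, so it divides the gcd.
    # If the prediction was right, the residual is a multiple of m_s and the
    # guess is unchanged; otherwise a nontrivial factor is removed.
    return gcd(m_s, residual)
```

For instance, for the linear congruential generator with $a = 5$, $b = 3$, $m = 17$ and the guess $m_s = 34$, predicting the successor of 5 gives 28 while the true value is 11; the residual is $-17$ and the new guess is 17.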
This completes the overview of the algorithm. Steps 1 and 2 are explained further
in the remainder of this section.
Step 1. Let us consider the problem of finding the first nonzero integer multiple
of the modulus. The method presented here will work for any class of generators
with the form

    $$X_i = \sum_{j=1}^{k} \alpha_j \phi_j(X_0, X_1, \ldots, X_{i-1}) \pmod{m},$$

but in specific cases there may be improvements one could make for the sake of
efficiency.
In order to find a multiple of the modulus $m$, we look at $k$-dimensional and
$(k+1)$-dimensional vectors of consecutive $X_i$'s. These vectors are viewed as
belonging to vector spaces over the field $\mathbb{Q}$ of rationals. Define

    $$V_i = (\phi_1(X_0, X_1, \ldots, X_{i-1}), \phi_2(X_0, X_1, \ldots, X_{i-1}), \ldots, \phi_k(X_0, X_1, \ldots, X_{i-1})).$$

The notation $aV_i$ will be used to denote multiplication of the vector $V_i$ by the
constant $a$. Thus $aV_i = (a\phi_1(X_0, X_1, \ldots, X_{i-1}), a\phi_2(X_0, X_1, \ldots, X_{i-1}), \ldots,
a\phi_k(X_0, X_1, \ldots, X_{i-1}))$. If $V$ and $W$ are vectors, then $V + W$ will be used to denote
the component-wise addition of $V$ and $W$.
LEMMA 1. Let $L = (l_1, l_2, \ldots, l_t)$ be an increasing sequence of integers and let
$l$ be greater than $l_t$. Suppose there exist constants $c_1, c_2, \ldots, c_t$, which may be
rationals, rather than integers, such that $\sum_{i=1}^{t} c_i V_{l_i} = V_l$. Further suppose that when
expressed in lowest terms, $c_i = e_i/f_i$, where $e_i$ and $f_i$ are integers. Let $d$ be the least
common multiple of the $f_i$'s. Then $dX_l - \sum_{i=1}^{t} dc_iX_{l_i}$ is an integer (possibly zero)
multiple of $m$.
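
A small worked instance may help to see how the lemma yields a multiple of the modulus. The sketch below specializes it to the linear congruential method, assuming the vectors take the form $(X_{i-1}, 1)$ so that the value at position $i$ is the inner product of the coefficient vector $(a, b)$ with the $i$-th vector modulo $m$. The function name `multiple_from_lemma` and the restriction to two basis vectors (solved by Cramer's rule) are simplifications for illustration, not the paper's general procedure.

```python
from fractions import Fraction
from math import gcd


def multiple_from_lemma(xs, i1, i2, target):
    """Return an integer multiple (possibly zero) of the hidden LCG modulus.

    xs     -- observed values X_0, X_1, ...
    i1, i2 -- indices of the two basis vectors, with i1 < i2 < target
    target -- index of the vector written as a rational combination of them
    """
    vec = lambda i: (xs[i - 1], 1)          # assumed form of the i-th vector
    (p1, q1), (p2, q2), (pt, qt) = vec(i1), vec(i2), vec(target)
    det = p1 * q2 - p2 * q1
    if det == 0:
        raise ValueError("basis vectors are linearly dependent")
    # Cramer's rule over the rationals: c1*vec(i1) + c2*vec(i2) = vec(target).
    c1 = Fraction(pt * q2 - p2 * qt, det)
    c2 = Fraction(p1 * qt - pt * q1, det)
    # d is the least common multiple of the denominators in lowest terms.
    d = c1.denominator * c2.denominator // gcd(c1.denominator, c2.denominator)
    return d * xs[target] - int(d * c1) * xs[i1] - int(d * c2) * xs[i2]
```

With $a = 5$, $b = 3$, $m = 17$ and $X_0 = 1$ the sequence begins 1, 8, 9, 14, 5. Taking the first two vectors as the basis and the third as the target gives $c_1 = -1/7$, $c_2 = 8/7$, $d = 7$, and the multiple $7 \cdot 14 + 8 - 8 \cdot 9 = 34 = 2 \cdot 17$.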
PROOF. Consider the following r congruences:
then these coefficients will work forever, that is, for all $n \ge n_0$, we have
Thus we would like to solve for $(\alpha_1', \alpha_2', \ldots, \alpha_k')$ in the above system of congruences.
The only problem is that we don’t know the modulus $m$; instead we have
$\hat{m}$, a multiple of $m$. Although this system must have a solution modulo $m$, it may
have no solution modulo $\hat{m}$. Thus we would like to have necessary and sufficient
conditions for a solution to exist, conditions that tell us which factors to remove
from $\hat{m}$ to make the system solvable. Writing this system of congruences in matrix
form, we set
The algorithm for refining $\hat{m}$ and solving for $x$ in the system $Ax \equiv b \pmod{m}$ is
fairly simple. First, we find unimodular matrices $U$ and $V$ such that $A = UEV$,
where $E = [e_1, e_2, \ldots, e_q]$, $E$ is in Smith normal form, and $q = \min(k, r)$. We want
to solve $UEVx \equiv b \pmod{m'}$, where $m'$ is a multiple of $m$ and a divisor of $\hat{m}$. If
we multiply through by $U^{-1}$, which is an integer matrix since $U$ is unimodular,
and set $c = U^{-1}b$, we get $EVx \equiv c \pmod{m'}$. Since $V$ and $x$ contain only integers,
$y = Vx$ also contains only integers. In fact, since $V$ is unimodular, there is a
one-to-one correspondence between $x$ and $y$. Thus $Ax \equiv b \pmod{m'}$ has a
solution if and only if $Ey \equiv c \pmod{m'}$ has a solution (though both may have
many solutions). But solving $Ey \equiv c \pmod{m'}$ is easy when a solution exists
since $E$ is diagonal; just solve for $y_i$ in $e_iy_i \equiv c_i \pmod{m'}$ for all $i$. If there is no
solution to $e_iy_i \equiv c_i \pmod{\hat{m}}$, then $\gcd(\hat{m}, e_i)$ does not divide $c_i$. But since
there is a solution modulo $m$, $\gcd(m, e_i)$ divides $c_i$. Thus, we can safely replace
$\hat{m}$ by $[\hat{m}\gcd(\hat{m}, e_i, c_i)/\gcd(\hat{m}, e_i)]$.
Thus, to find a set of coefficients that correctly predict $X_{n_0}, X_{n_0+1}, \ldots, X_{n_0+r-1}$,
and hence all $X_i$'s, we start with some multiple $\hat{m}$ of $m$. We form the matrix $A$
and the vector $b$ as described above. Then, from $A$, we compute its Smith normal
form $E$, the corresponding unimodular matrices $U$ and $V$, and the vector
$c = U^{-1}b$. Then, until $\gcd(\hat{m}, e_i)$ divides $c_i$ for all $i$, we remove excess
factors from $\hat{m}$ without changing the fact that $m$ divides $\hat{m}$, replacing $\hat{m}$ by
$[\hat{m}\gcd(\hat{m}, e_i, c_i)/\gcd(\hat{m}, e_i)]$. After solving for $y$ in $Ey \equiv c \pmod{\hat{m}}$ (note that
if $e_i = 0$, we set $y_i = c_i/\hat{m}$), we can set $x$ to $V^{-1}y$.
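
The refinement loop and the diagonal solve can be sketched as follows. This covers only the part of the second step that comes after the Smith normal form has been computed: the diagonal entries `e` and the transformed right-hand side `c` are taken as given, and computing the unimodular matrices is outside the sketch. The name `refine_and_solve_diagonal` is hypothetical. Where any residue would solve a congruence, the sketch returns 0; this is a simplification and differs from the choice made in the text for zero diagonal entries.

```python
from math import gcd


def refine_and_solve_diagonal(e, c, m_hat):
    """Shrink m_hat until every e[i]*y[i] = c[i] (mod m_hat) is solvable, then solve.

    e, c  -- diagonal entries and right-hand side (lists of integers)
    m_hat -- positive multiple of the unknown true modulus
    Returns (refined m_hat, list of solutions y).
    """
    changed = True
    while changed:                       # repeat until all divisibility tests pass
        changed = False
        for ei, ci in zip(e, c):
            g = gcd(m_hat, ei)
            if ci % g != 0:
                # Replace m_hat by m_hat * gcd(m_hat, e_i, c_i) / gcd(m_hat, e_i).
                m_hat = m_hat // g * gcd(g, ci)
                changed = True
    ys = []
    for ei, ci in zip(e, c):
        g = gcd(m_hat, ei)
        reduced = m_hat // g
        if reduced == 1:
            ys.append(0)                 # every residue is a solution here
        else:
            # Assumption: three-argument pow with exponent -1 (Python 3.8+).
            inv = pow((ei // g) % reduced, -1, reduced)
            ys.append((ci // g) * inv % reduced)
    return m_hat, ys
```

For example, with true modulus 7, guess $\hat{m} = 42$, and the single congruence $6y \equiv 3$, the guess is refined to 21 and the solution $y = 4$ is returned.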
The algorithm described in this section gives us the following:
THEOREM 2. There is an efficient inference method for predicting any sequence
produced by a generator having the extrapolation property. If, for all $X_i < m$, for all
$j$, and for all $n$, $|\phi_j(X_0, X_1, \ldots, X_{n-1})| \le z$, the maximum number of errors this
inference method makes is no greater than $n_0 + \max(k, r) + 3 + \log_2 k +
(k/2)\log_2(kz^2)$.
For details of the algorithm and a formal proof of its correctness, see [4].
completed. Linear congruences with n terms in the recurrence, though, have such
a special form that one can guarantee that the set $L$ will be the first $s$ vectors for
some $s \le n$, and thus one will have found $L$ before seeing $X_{2n+1}$. After finding $L$,
the inference method makes at most $2 + \log_2 n + (n/2)\log_2(nm^2)$ errors. See [4] for
the improved algorithm and further details.
The linear congruential method, $X_{i+1} = aX_i + b \pmod{m}$, is the special case of
linear congruences with n terms in the recurrence, with $n = 1$. Thus, Theorem 5
implies the following:
THEOREM 6. There is an efficient inference method for predicting any sequence
produced by the linear congruential method. The maximum number of errors this
inference method makes is no greater than $5 + \log_2 m$, including the necessary errors
for predicting $X_0$, $X_1$, and $X_2$.
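
To make the theorem concrete, here is a simplified predictor for the linear congruential method with unknown multiplier, increment and modulus. It is a sketch written for this special case, not the algorithm of [4]: it works with the differences of consecutive values, uses `m_hat = 0` to stand for the modulus "infinity" (conveniently, `gcd(0, r) == abs(r)`), and replaces the Smith-normal-form step with a single scalar congruence. The names `LcgPredictor` and `refine_and_solve` are hypothetical.

```python
from fractions import Fraction
from math import gcd


def refine_and_solve(e, c, m_hat):
    """Shrink m_hat until e*y = c (mod m_hat) is solvable; return (m_hat, y)."""
    while c % gcd(m_hat, e) != 0:
        g = gcd(m_hat, e)
        m_hat = m_hat // g * gcd(g, c)
    g = gcd(m_hat, e)
    reduced = m_hat // g
    if reduced == 1:
        return m_hat, 0
    return m_hat, (c // g) * pow((e // g) % reduced, -1, reduced) % reduced


class LcgPredictor:
    def __init__(self):
        self.xs = []
        self.m_hat = 0                   # 0 stands for "no finite modulus yet"
        self.a_hat = self.b_hat = 0

    def predict(self):
        xs = self.xs
        if len(xs) < 3:
            return xs[-1] if xs else 0   # arbitrary guesses for X_0, X_1, X_2
        y1, y2 = xs[1] - xs[0], xs[2] - xs[1]
        if y1 == 0:
            return xs[-1]                # X_1 = X_0 forces a constant sequence
        if self.m_hat == 0:              # extrapolate over the rationals
            return xs[-1] + Fraction(y2, y1) * (xs[-1] - xs[-2])
        return (self.a_hat * xs[-1] + self.b_hat) % self.m_hat

    def observe(self, x):
        """Reveal the true next value; return True if predict() was right."""
        correct = self.predict() == x
        self.xs.append(x)
        xs = self.xs
        if len(xs) < 4 or xs[1] == xs[0]:
            return correct
        y1, y2 = xs[1] - xs[0], xs[2] - xs[1]
        if self.m_hat == 0:
            # Nonzero exactly when the rational extrapolation failed;
            # always a multiple of the true modulus.
            r = y1 * (xs[-1] - xs[-2]) - y2 * (xs[-2] - xs[-3])
            if r != 0:
                self.m_hat, self.a_hat = refine_and_solve(y1, y2, abs(r))
                self.b_hat = xs[1] - self.a_hat * xs[0]
        elif not correct:
            residual = x - (self.a_hat * xs[-2] + self.b_hat)
            self.m_hat = gcd(self.m_hat, residual)
        return correct
```

Feeding the sequence 1, 8, 9, 14, 5, 11, ... (generated with $a = 5$, $b = 3$, $m = 17$) one value at a time, the sketch errs on the first four values, obtains the multiple 34 from the fourth error, errs once more on 11, and then holds the exact modulus 17.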
In this special case of the linear congruential method, one can prove somewhat
stronger results, including the following theorem [4].
THEOREM 7. The efficient inference method derived from the general method
requires knowledge of an initial segment of length no more than $2 + \lceil \log_2 m \rceil$ before
it is able to compute a nonzero multiple of the modulus and the coefficients, except
in the cases where $Y_1 = \pm Y_2$ or $Y_2 = 0$.
In addition, for sequences produced by the linear congruential method, it is
often possible to take advantage of knowledge of scattered $X_i$ values, rather than
just $\{X_i \mid 0 \le i \le j\}$. See [4] and [18] for details and proofs.
The low-order bits of numbers produced using the linear congruential method
tend to appear much less random than the high-order bits [11]. Knowing this, a
cryptographer using a linear congruential pseudo-random number generator would
probably only use the high-order bits. In [12], Knuth has discussed the problem of
predicting these sequences produced by the linear congruential method. He assumes
that the modulus m is known and is a power of two, but assumes that only the
high-order bits of the numbers generated are actually used. Frieze et al. [7] have a
much faster algorithm than Knuth’s for predicting sequences produced by the
linear congruential method even if half of the low-order bits are unknown, but
they assume that the multiplier, a, and the modulus, m, are known. Hastad and
Shamir [9] have generalized this work, but they also assume that the multiplier
and modulus are known.
In [4] and in [5], we assume that a fixed linear congruential pseudo-random
number generator, $X_{i+1} = aX_i + b \pmod{m}$, is given, but the nonnegative constants
$a$, $b$, and $m$, with $m > \max(1, X_0, a, b)$, are unknown. We further assume that the
low-order t bits of the Xi’s are never known. The problem is to predict, from the
high-order bits of some of the Xi’s, the remainder of the sequence. As in this paper,
we assume that whenever an incorrect prediction occurs, the correct value is
revealed before the next prediction is made, but only the high-order bits are
revealed, not the low-order t bits. We show that, even if some small number t =
O(log log m) of the low-order bits are unused, the sequences produced are still
cryptographically insecure.
Now, let us assume that a fixed quadratic congruence, $X_{i+1} = aX_i^2 + bX_i + c
\pmod{m}$, is given and that $a$, $b$, $c$, and $m$ are unknown. We also assume that the
coefficients are nonnegative integers and that $m > \max(1, X_0, a, b, c)$. In order to
apply the methodology of the previous section, the following theorem that says
that quadratic congruences have the unique extrapolation property will be proved
through the two lemmas following it.
THEOREM 8. The functions $\{\phi_j(X_0, X_1, \ldots, X_i) = X_i^j \mid 0 \le j \le 2\}$ have the unique
extrapolation property with length 3.
As in the case of linear congruences, we obtain the sequence $(Y_k)$ by setting
$Y_k = X_k - X_{k-1}$ for $k \ge 1$. We also look at the sequence $(Z_k)$, which we obtain by
setting $Z_k = X_k + X_{k-1}$ for $k \ge 1$. By subtracting
$X_{k+1} = aX_k^2 + bX_k + c \pmod{m}$
and
$X_k = aX_{k-1}^2 + bX_{k-1} + c \pmod{m}$,
one gets that $X_{k+1} - X_k = a(X_k^2 - X_{k-1}^2) + b(X_k - X_{k-1}) \pmod{m}$, and thus
$Y_{k+1} = aY_kZ_k + bY_k \pmod{m}$. The following lemma reduces the problem of computing
the original $a$, $b$, $c$, and $m$, to that of computing some $\hat{a}$, $\hat{b}$, and $\hat{m}$ such that
$Y_{j+1} \equiv \hat{a}Y_jZ_j + \hat{b}Y_j \pmod{\hat{m}}$, for $j \ge 1$.
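LEMMA 9. Suppose $Y_{k+1} \equiv \hat{a}Y_kZ_k + \hat{b}Y_k \pmod{\hat{m}}$ for $1 \le k \le s$. Then setting
$\hat{c} = X_1 - (\hat{a}X_0^2 + \hat{b}X_0)$ gives $X_{k+1} \equiv \hat{a}X_k^2 + \hat{b}X_k + \hat{c} \pmod{\hat{m}}$ for $0 \le k \le s$.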
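PROOF. Suppose $Y_{j+1} \equiv \hat{a}Y_jZ_j + \hat{b}Y_j \pmod{\hat{m}}$ for $1 \le j \le s$. Let $0 \le k \le s$. Then
we have that
$\hat{a}X_k^2 + \hat{b}X_k + \hat{c} - X_{k+1}$
$= \hat{a}X_k^2 + \hat{b}X_k + (X_1 - (\hat{a}X_0^2 + \hat{b}X_0)) - X_{k+1}$
$= \hat{a}(X_k^2 - X_0^2) + \hat{b}(X_k - X_0) - (X_{k+1} - X_1)$
$= \hat{a}\sum_{j=1}^{k} Y_jZ_j + \hat{b}\sum_{j=1}^{k} Y_j - \sum_{j=1}^{k} Y_{j+1}$
$= \sum_{j=1}^{k} (\hat{a}Y_jZ_j + \hat{b}Y_j - Y_{j+1}) \equiv 0 \pmod{\hat{m}}$.
Therefore, for $0 \le k \le s$, we have that
$X_{k+1} \equiv \hat{a}X_k^2 + \hat{b}X_k + \hat{c} \pmod{\hat{m}}$. □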
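
The difference identity and the constant recovered in Lemma 9 are easy to check numerically. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, the parameters $a = 3$, $b = 5$, $c = 7$, $m = 101$ in the usage note are arbitrary, and the true coefficients and modulus are used in place of inferred ones.

```python
def quadratic_sequence(a, b, c, m, x0, length):
    """Generate X_0, ..., X_{length-1} with X_{k+1} = a*X_k^2 + b*X_k + c (mod m)."""
    xs = [x0]
    while len(xs) < length:
        x = xs[-1]
        xs.append((a * x * x + b * x + c) % m)
    return xs


def difference_sequences(xs):
    """Return dicts Y, Z with Y[k] = X_k - X_{k-1} and Z[k] = X_k + X_{k-1}, k >= 1."""
    Y = {k: xs[k] - xs[k - 1] for k in range(1, len(xs))}
    Z = {k: xs[k] + xs[k - 1] for k in range(1, len(xs))}
    return Y, Z


def recover_c(xs, a_hat, b_hat, m_hat):
    """Constant term from Lemma 9: X_1 - (a_hat*X_0^2 + b_hat*X_0), reduced mod m_hat."""
    return (xs[1] - (a_hat * xs[0] ** 2 + b_hat * xs[0])) % m_hat
```

With `xs = quadratic_sequence(3, 5, 7, 101, 10, 30)`, every `Y[k+1] - (3*Y[k]*Z[k] + 5*Y[k])` is divisible by 101, and `recover_c(xs, 3, 5, 101)` returns the original constant 7.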
In order to find a multiple of the modulus $m$, we can use the method discussed
in Section 2 and the notation of that section. In this case, we have $V_j =
(Y_{j-1}Z_{j-1}, Y_{j-1})$, and $(V_j, Y_j) = (Y_{j-1}Z_{j-1}, Y_{j-1}, Y_j)$. The set $L$ will contain $V_2$ and
$V_3$ unless $V_2 = 0$ or there exists a rational constant $c$ such that $cV_2 = V_3$. If $V_2 = 0$,
then $Y_1 = 0$, so $Y_j = 0$ for all $j \ge 1$. If $cV_2 = V_3$, then either $Y_2 = 0$ or $X_0 = X_2$. If
$Y_2 = 0$, then $Y_j = 0$ for all $j \ge 2$. If $X_0 = X_2$, the sequence is $(X_0, X_1, X_0, X_1, \ldots)$.
These sequences are very distinctive and easy to predict. But unless the sequence
we are trying to predict has one of these special forms, using the methodology in
Section 2, we can begin predicting the sequence before seeing X4. After this, we
make at most one incorrect prediction before we have enough information to
compute a multiple of the modulus. To show that we can use the methodology
from Section 2 to compute coefficients that will work, we need only show that
sequences of the form
$Y_{j+1} \equiv \hat{a}Y_jZ_j + \hat{b}Y_j \pmod{\hat{m}}$
have the extrapolation property. The following lemma tells us that if we can find
coefficients that correctly predict $Y_2$ and $Y_3$ using some multiple of $m$ as the
modulus, then these coefficients can be used to correctly predict all the Yi’s if m is
known.
LEMMA 10. If $\hat{m}$ is a positive integer multiple of $m$, and the following two
congruences hold
$Y_2 \equiv \hat{a}Y_1Z_1 + \hat{b}Y_1 \pmod{\hat{m}}$
$Y_3 \equiv \hat{a}Y_2Z_2 + \hat{b}Y_2 \pmod{\hat{m}}$,
then for all $j \ge 1$, $Y_{j+1} \equiv \hat{a}Y_jZ_j + \hat{b}Y_j \pmod{m}$.
RECEIVED MAY 1985; REVISED FEBRUARY AND JULY 1986, MARCH AND SEPTEMBER 1987, JANUARY 1988;
ACCEPTED JANUARY 1988