
Inferring Sequences Produced by Pseudo-Random Number Generators

JOAN BOYAR
University of Chicago, Chicago, Illinois

Abstract. In this paper, efficient algorithms are given for inferring sequences produced by certain
pseudo-random number generators. The generators considered are all of the form
$X_n = \sum_{j=1}^{k} \alpha_j \phi_j(X_0, X_1, \ldots, X_{n-1}) \pmod m$. In each case, we assume that the
functions $\phi_j$ are known and polynomial time computable, but that the coefficients $\alpha_j$ and the
modulus $m$ are unknown. Using this general method, specific examples of generators having this form
(the linear congruential method, linear congruences with $n$ terms in the recurrence, and quadratic
congruences) are shown to be cryptographically insecure.
Categories and Subject Descriptors: E.3 [Data Encryption]; F.2.1 [Analysis of Algorithms and Problem
Complexity]: Numerical Algorithms and Problems; G.3 [Probability and Statistics]: random number
generation.
General Terms: Algorithms, Security
Additional Key Words and Phrases: Cryptography, inductive inference, linear congruential method

1. Introduction
A pseudo-random number generator is considered cryptographically secure if a
cryptanalyst is unable to compute any other segment of the generator’s output
within feasible time and space complexity bounds, even after obtaining long
segments of this output. The first example of a cryptographically secure pseudo-random
number generator is presented in [20]. In proving that his method is
secure, Shamir proves that, given long segments of this generator's output, the
ability to produce the next number to be output implies the ability to crack the
Rivest-Shamir-Adleman encryption scheme. But, for cryptographic purposes, one
would prefer a stronger definition of security: even given some of the initial bits
in this next number to be output, a cryptanalyst should have no advantage in
guessing the next bits of this same number. It is unknown whether Shamir's
generator is secure in this stronger sense. In order to solve this problem, Blum and
Micali [2] give a pseudo-random bit generator, a generator that produces only one
bit, rather than an entire number, at each step. The Blum-Micali generator is
cryptographically secure in this stronger sense, assuming the problem of index
finding is intractable. But their method is extremely slow. Pseudo-random bit
generators that are cryptographically strong under other cryptographic assumptions,
but are also slow, are given in [3] and [8]. Each of these methods performs at least
one multiplication on two very large numbers for each bit of output produced.
Yao [23] has shown that strong generators exist under the assumption that any
one-way permutations exist. Recently Long and Wigderson [14], Vazirani and
Vazirani [22], and Alexi et al. [1] have proved that the generators in [2], [3], and
[8] are strong even if improved slightly to produce $\log_2 n$ bits of output for operations
on $n$ bit numbers. This is a great improvement, but still these generators are quite
slow. This suggests the question of whether any of the fast pseudo-random number
generators commonly in use are also cryptographically secure.

This work was supported by an Educational Opportunity Fellowship and by DARPA grant N00039-82-C-0235.
Author's address: Department of Computer Science, University of Chicago, Chicago, IL 60637.
Permission to copy without fee all or part of this material is granted provided that the copies are not
made or distributed for direct commercial advantage, the ACM copyright notice and the title of the
publication and its date appear, and notice is given that copying is by permission of the Association for
Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© 1989 ACM 0004-5411/89/0100-0129 $01.50
Journal of the Association for Computing Machinery, Vol. 36, No. 1, January 1989, pp. 129-141

The linear congruential pseudo-random number generators, those of the form
$X_{i+1} = aX_i + b \pmod m$, are very fast and widely used in Monte Carlo simulations
and probabilistic algorithms. In this paper, we show that they are not cryptographically
secure, even in the weaker sense mentioned above. Obviously, these generators
are not cryptographically secure when the modulus, $m$, is known. In that case,
one could solve for $\hat a$ in the congruence $(X_1 - X_0)\hat a \equiv (X_2 - X_1) \pmod m$.
Then, the original sequence can be correctly predicted using the generator
$X_{i+1} = \hat a X_i + (X_1 - \hat a X_0) \pmod m$. But what if the modulus is unknown? Linear
congruential pseudo-random number generators and many of the other fast pseudo-random
number generators have the form

$$X_n = \sum_{j=1}^{k} \alpha_j \phi_j(X_0, X_1, \ldots, X_{n-1}) \pmod m.$$

This generator is given $n_0$ initial values, $X_0, X_1, \ldots, X_{n_0-1}$, and is used to predict
all subsequent values $X_{n_0}, X_{n_0+1}, \ldots$. For example, with the linear congruential
method, $n_0 = 1$, so $X_0$ is given and all subsequent values are computed using the
generator. In the next section, we present a general method for inferring certain
sequences in this general form. This method is efficient, according to the following
definition:
Definition. An efficient inference method for predicting a sequence produced by
a generator of the form $X_n = \sum_{j=1}^{k} \alpha_j \phi_j(X_0, X_1, \ldots, X_{n-1}) \pmod m$ for $n \ge n_0$ is an
algorithm which
(1) is given the $\phi_j$'s, which we assume can be computed over the integers (i.e., not
reduced modulo $m$) in time polynomial in $\log m$ and $k$,
(2) is not given the coefficients $\alpha_j$ or the modulus $m$,
(3) makes predictions for each element of the sequence, one at a time, from
knowledge of all previous values,
(4) makes a number of mistakes that is bounded by a polynomial in $\log m$, in $k$,
and in $n_0$, and
(5) produces a prediction for a next element of the sequence within a time that is
bounded by a polynomial in $\log m$ and $k$.
Thus, we are assuming that whenever the algorithm makes an incorrect inference,
it is immediately given the correct value. This general method, which is described
in Section 2, is applied in Section 3 to specific generators: the linear congruential
method, linear congruences with $n$ terms in the recurrence, and quadratic congruences.
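As a point of comparison, the known-modulus case mentioned earlier in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample parameters ($a = 5$, $b = 3$, $m = 17$) are hypothetical and do not come from the paper. The sketch solves $(X_1 - X_0)\hat a \equiv (X_2 - X_1) \pmod m$ and then iterates $X_{i+1} = \hat a X_i + (X_1 - \hat a X_0) \pmod m$.

```python
from math import gcd

def solve_multiplier(x0, x1, x2, m):
    """Return some a_hat with (x1 - x0) * a_hat ≡ (x2 - x1) (mod m)."""
    d, rhs = (x1 - x0) % m, (x2 - x1) % m
    g = gcd(d, m)
    if rhs % g != 0:
        raise ValueError("values are not consecutive outputs of an LCG mod m")
    n = m // g
    if n == 1:
        # x1 ≡ x0 (mod m): the sequence is constant, any multiplier fits.
        return 0
    # Divide the congruence by g, then invert (d/g) modulo (m/g).
    return (rhs // g) * pow(d // g, -1, n) % n

def predict_all(x0, x1, x2, m, count):
    """Reproduce the first `count` outputs from three observed values."""
    a_hat = solve_multiplier(x0, x1, x2, m)
    b_hat = (x1 - a_hat * x0) % m
    xs = [x0]
    for _ in range(count - 1):
        xs.append((a_hat * xs[-1] + b_hat) % m)
    return xs

# Hypothetical generator X_{i+1} = 5 X_i + 3 (mod 17) started at X_0 = 1.
print(predict_all(1, 8, 9, 17, 7))
```

The three-argument `pow` with exponent `-1` requires Python 3.8 or later. The recovered $\hat a$ need not equal the true multiplier when $\gcd(X_1 - X_0, m) > 1$, but it reproduces the same sequence.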
The results in the third section, related to the linear congruential method, were
first reported in [16] and [17], and the other results presented in that section were
reported in the author's dissertation [18]. Many of the details omitted from this
paper can be found in [18] or [4].
Other researchers have looked at similar problems. Reeds [19] also looks at the
linear congruential method, and assumes, as we do, that the modulus and coefficients
are unknown; but that result, unlike the one presented here, relies on the
assumption that factoring is easy. Lagarias and Reeds [13] have proved a conjecture
made in [18], concerning the unique extrapolation of polynomial recurrences.
Thus the results in Section 3, showing how to predict sequences produced by
quadratic congruences, can be generalized to polynomial congruences, for polynomials
of fixed degree.

2. The General Method


Suppose we are given $(X_0, X_1, \ldots, X_s)$, an initial segment of the sequence produced
by a pseudo-random number generator, and we would like to predict some or all
of the remainder of the sequence. We assume that the generator has the form

$$X_i = \sum_{j=1}^{k} \alpha_j \phi_j(X_0, X_1, \ldots, X_{i-1}) \pmod m, \qquad (*)$$

where the functions $\phi_j$ are known, but the coefficients $\alpha_j$ and the modulus $m$ are
unknown. The generator was used to produce all $X_i$ where $i \ge n_0$ for some known
$n_0$. We also assume that the nonreduced $\phi_j$'s can be computed in time polynomial
in $\log m$ and $k$. Some examples of such generators are the linear congruential
method, $X_i = aX_{i-1} + b \pmod m$, linear congruences with $n$ terms in the recurrence,
$X_i = a_1X_{i-1} + a_2X_{i-2} + \cdots + a_nX_{i-n} + a_{n+1} \pmod m$, and quadratic congruences,
$X_i = aX_{i-1}^2 + bX_{i-1} + c \pmod m$. The methodology presented in this section is
applicable to generators in any of these three forms, because they have the following
extrapolation property.
Definition. A set of functions $\{\phi_j(X_0, X_1, \ldots, X_{i-1}) \mid 1 \le j \le k\}$ has the unique
extrapolation property with length $r$ if, for any modulus $m$ (where $m = +\infty$ is allowed
and means over $\mathbb{Q}$) and any data $(y_0, y_1, \ldots, y_{n_0+r-1})$, all generators (*) with
coefficients $\{\alpha_j \mid 1 \le j \le k\}$ such that

$$\sum_{j=1}^{k} \alpha_j \phi_j(y_0, y_1, \ldots, y_{n_0+l-1}) \equiv y_{n_0+l} \pmod m$$

holds for $0 \le l \le r - 1$ produce the same set of iterates $\{y_i \mid 0 \le i < \infty\}$.


If generators with a given form have this property, then generators that coincide
for the first $n_0 + r$ values modulo an integer $m$ coincide on all values modulo $m$.
So if two generators with the same modulus are given the same initial $n_0$ values
and the first $r$ values they produce are identical, then they will continue to produce
the same values. In the case of the linear congruential method, $r$ is 2,
in the case of linear congruences with $n$ terms in the recurrence, it is $n + 1$, and
in the case of quadratic congruences, it is 3. Thus, in each of these cases it is equal
to $k$, so the number of congruences that must be satisfied is $k$.
Since we know that our generator has the form

$$X_i = \sum_{j=1}^{k} \alpha_j \phi_j(X_0, X_1, \ldots, X_{i-1}) \pmod m,$$

where the functions $\phi_j$ are known, we can attempt to predict the sequence produced
by finding some coefficients $\alpha'_j$ and a modulus that, when used with this
recurrence, will produce the same sequence as the original generator. For example,
$X_n = 6X_{n-1} + 2 \pmod{10}$ produces exactly the same sequence as $X_n = X_{n-1} + 2 \pmod{10}$
if $X_0$ is even. If the first generator was actually used, either the first or the
second could be used to predict the sequence. The inference procedure we give to
find these "guesses" for the coefficients and modulus has three major steps. Step 1
is to find a nonzero multiple of the modulus, Step 2 is to refine this nonzero
multiple of the modulus and to find the coefficients, and Step 3 is to remove the
excess factors from the multiple of the modulus that was refined in the second
step. The extrapolation property will guarantee that the second step need not be
repeated.
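The example of two different coefficient sets producing one sequence is easy to check numerically. The following Python sketch (the helper name `lcg` is an illustrative assumption) compares the two generators for every even seed below 10.

```python
def lcg(a, b, m, x0, count):
    """First `count` values of X_n = a*X_{n-1} + b (mod m), starting at x0."""
    xs = [x0]
    for _ in range(count - 1):
        xs.append((a * xs[-1] + b) % m)
    return xs

# 6x + 2 ≡ x + 2 (mod 10) exactly when 5x ≡ 0 (mod 10), i.e. when x is even,
# and an even x is always followed by another even value.
for x0 in range(0, 10, 2):
    assert lcg(6, 2, 10, x0, 20) == lcg(1, 2, 10, x0, 20)

print(lcg(6, 2, 10, 4, 6))  # [4, 6, 8, 0, 2, 4]
```

For an odd seed the two generators already differ at the first step, which is why the condition on $X_0$ is needed.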
Although we may be unable to find a nonzero multiple of the modulus until we
know a significant portion of the sequence, say $(X_0, X_1, \ldots, X_s)$, we may be able
to begin making accurate predictions much earlier. The information from an initial
segment, $(X_0, X_1, \ldots, X_q)$, with $q \ge n_0$ and $q - n_0$ polynomial in $k$ and in the
length of the $\phi_j$'s, will be used to create a generator that can be used to predict the
values $(X_0, X_1, \ldots, X_t)$, where $t \ge q$. This generator will have a somewhat different
form from the original generator; it will have no modulus and its coefficients may
not be integers. For example, after seeing the initial segment (32, 16, 8) of a
sequence produced by the linear congruential method, one could create the
generator $X_n = (\frac{1}{2})X_{n-1}$. So in our attempt to predict $X_{q+1}$, we create a generator

$$X_{q+1} = \sum_{j=1}^{k} \beta_{j,q+1} \phi_j(X_0, X_1, \ldots, X_q) \pmod{m_{q+1}},$$

where $m_{q+1} = \infty$. From this point on, to predict any $X_s$, there will always
be a generator of this form to predict the sequence. The first generators will
have $m_s = \infty$.
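The rational generator fitted to (32, 16, 8) can be reproduced with exact arithmetic. The sketch below is an illustration only: it assumes the linear congruential form with $\phi_1 = X_{n-1}$ and $\phi_2 = 1$, and the function name is hypothetical. It solves the two equations $X_1 = \beta_1 X_0 + \beta_2$ and $X_2 = \beta_1 X_1 + \beta_2$ over $\mathbb{Q}$.

```python
from fractions import Fraction

def fit_rational_lcg(x0, x1, x2):
    """Solve x1 = a*x0 + b and x2 = a*x1 + b over the rationals.

    Requires x1 != x0; with no modulus the solution is unique.
    """
    a = Fraction(x2 - x1, x1 - x0)
    b = x1 - a * x0
    return a, b

a, b = fit_rational_lcg(32, 16, 8)
prediction = a * 8 + b   # next value predicted by the modulus-free generator
print(a, b, prediction)  # 1/2 0 4
```

If the true sequence continues with a value other than 4, that error is exactly the kind of event that later yields a nonzero multiple of the modulus.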
One of these first generators created will probably be incorrect at some point in
its predictions (this is not necessarily the case, since no mistake would be made on
the constant sequence, for instance). We assume that whenever an incorrect
prediction occurs, the correct value is revealed before the next prediction is made.
The $(k + 1)$st prediction error after the first $n_0$ values are seen will give us enough
additional information to calculate a nonzero integer multiple of the modulus.
Then, after we have seen at least $n_0 + r$ values, using this multiple of the modulus,
we can find some possible integer coefficients $\alpha'_j$. In the case of any generator
having the extrapolation property, these coefficients will be the coefficients we are
looking for. (Of course, they may not be the actual coefficients, but, given the same
modulus, they would produce the same sequence as the original generator.) From
this point on, the generator created to predict $X_s$ will be

$$X_s = \sum_{j=1}^{k} \alpha'_j \phi_j(X_0, X_1, \ldots, X_{s-1}) \pmod{m_s},$$

and $m_s$ will be a multiple of the original modulus. (In fact, as soon as we have
computed a nonzero multiple of the modulus, we can compute some possible
coefficients to use in the prediction process. It is unnecessary to wait for the unique
extrapolation bound. However, the coefficients should then be recomputed after
the unique extrapolation bound is reached.)
In the third step, using these coefficients and the multiple of the modulus, one
can begin predicting the remainder of the sequence. When further errors are made
in predicting the $X_i$'s, the guess for the modulus is updated so that it divides the
previous guess for the modulus and is still a multiple of the modulus. The fact that
each new modulus divides all previous ones means that any congruence that held
modulo these previous moduli will also hold with the new modulus. Thus the
coefficients being used need not be changed when a new modulus is computed to
maintain consistency with the portion of the sequence seen so far. This consistency
is automatic. If this initial portion of the sequence contains at least $n_0 + r$ elements,
the unique extrapolation bound has been passed, and these coefficients will never
need to be changed. The updates to the modulus are easy to make because $m$
divides

$$X_s - \sum_{j=1}^{k} \alpha'_j \phi_j(X_0, X_1, \ldots, X_{s-1}),$$

so $\gcd(m_s, X_s - \sum_{j=1}^{k} \alpha'_j \phi_j(X_0, X_1, \ldots, X_{s-1}))$ is also a multiple of $m$ and can be
used for $m_{s+1}$. Each time an error is made in the prediction,
$X_s \not\equiv \sum_{j=1}^{k} \alpha'_j \phi_j(X_0, X_1, \ldots, X_{s-1}) \pmod{m_s}$, so some nontrivial factor is removed
from $m_s$ to produce $m_{s+1}$. Thus the maximum number of errors that could possibly
occur, after $\hat m$ (the first nonzero integer multiple of the modulus) is computed,
is $1 + \log_2(\hat m/m)$.
This completes the overview of the algorithm. Steps 1 and 2 are explained further
in the remainder of this section.
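The modulus update in the third step can be sketched as follows. This is a minimal illustration, assuming a linear congruential generator whose coefficients have already been recovered and a starting multiple of the modulus chosen by hand; the function name and the numbers are hypothetical.

```python
from math import gcd

def predict_with_refinement(xs, a, b, m_hat):
    """Predict xs[1:] from xs[:-1], shrinking m_hat after every wrong guess.

    m_hat must be a positive multiple of the true modulus, and (a, b) must
    reproduce the sequence modulo the true modulus.
    Returns (final modulus guess, number of prediction errors).
    """
    errors = 0
    for prev, actual in zip(xs, xs[1:]):
        raw = a * prev + b              # value over the integers, not reduced
        if raw % m_hat != actual:       # wrong prediction: true value revealed
            errors += 1
            # m divides (actual - raw), so the gcd is still a multiple of m.
            m_hat = gcd(m_hat, actual - raw)
    return m_hat, errors

# Hypothetical generator X_{i+1} = 5 X_i + 3 (mod 17), starting guess 102 = 6 * 17.
xs = [1, 8, 9, 14, 5, 11, 7]
print(predict_with_refinement(xs, 5, 3, 102))  # (17, 2)
```

Each error removes at least a factor of 2 from the guess, which is where the logarithmic bound on the number of errors comes from.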

Step 1. Let us consider the problem of finding the first nonzero integer multiple
of the modulus. The method presented here will work for any class of generators
with the form

$$X_i = \sum_{j=1}^{k} \alpha_j \phi_j(X_0, X_1, \ldots, X_{i-1}) \pmod m,$$

but in specific cases there may be improvements one could make for the sake of
efficiency.
In order to find a multiple of the modulus $m$, we look at $k$-dimensional and
$(k + 1)$-dimensional vectors of consecutive $X_i$'s. These vectors are viewed as
belonging to vector spaces over the field $\mathbb{Q}$ of rationals. Define

$$V_i = (\phi_1(X_0, X_1, \ldots, X_{i-1}), \phi_2(X_0, X_1, \ldots, X_{i-1}), \ldots, \phi_k(X_0, X_1, \ldots, X_{i-1})).$$

The notation $aV_i$ will be used to denote multiplication of the vector $V_i$ by the
constant $a$. Thus $aV_i = (a\phi_1(X_0, X_1, \ldots, X_{i-1}), a\phi_2(X_0, X_1, \ldots, X_{i-1}), \ldots,
a\phi_k(X_0, X_1, \ldots, X_{i-1}))$. If $V$ and $W$ are vectors, then $V + W$ will be used to denote
the component-wise addition of $V$ and $W$.
LEMMA 1. Let $L = (l_1, l_2, \ldots, l_r)$ be an increasing sequence of integers and let
$l_s$ be greater than $l_r$. Suppose there exist constants $c_1, c_2, \ldots, c_r$, which may be
rationals, rather than integers, such that $\sum_{i=1}^{r} c_i V_{l_i} = V_{l_s}$. Further suppose that when
expressed in lowest terms, $c_i = e_i/f_i$, where $e_i$ and $f_i$ are integers. Let $d$ be the least
common multiple of the $f_i$'s. Then $dX_{l_s} - \sum_{i=1}^{r} dc_i X_{l_i}$ is an integer (possibly zero)
multiple of $m$.
PROOF. Consider the following $r$ congruences:

$$\sum_{j=1}^{k} \alpha_j \phi_j(X_0, X_1, \ldots, X_{l_1-1}) \equiv X_{l_1} \pmod m,$$
$$\vdots$$
$$\sum_{j=1}^{k} \alpha_j \phi_j(X_0, X_1, \ldots, X_{l_r-1}) \equiv X_{l_r} \pmod m.$$

If we multiply congruence $i$ by $dc_i$ and sum, we get
$\sum_{i=1}^{r} [d \sum_{j=1}^{k} \alpha_j c_i \phi_j(X_0, X_1, \ldots, X_{l_i-1})] \equiv \sum_{i=1}^{r} dc_i X_{l_i} \pmod m$.
The left-hand side is equal to
$d \sum_{j=1}^{k} \alpha_j \phi_j(X_0, X_1, \ldots, X_{l_s-1}) \equiv dX_{l_s} \pmod m$.
Thus $\sum_{i=1}^{r} dc_i X_{l_i} \equiv dX_{l_s} \pmod m$, so
$dX_{l_s} - \sum_{i=1}^{r} dc_i X_{l_i} = vm$ for some integer $v$. □
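A small worked instance may help fix the notation of Lemma 1. The numbers are illustrative assumptions, not from the paper: take the linear congruential generator $X_i = 5X_{i-1} + 3 \pmod{17}$ with $X_0 = 1$, so that $(X_0, X_1, X_2, X_3) = (1, 8, 9, 14)$, and take $\phi_1 = X_{i-1}$, $\phi_2 = 1$, so that $V_i = (X_{i-1}, 1)$. With $L = (1, 2)$ and $l_s = 3$:

$$V_1 = (1, 1), \quad V_2 = (8, 1), \quad V_3 = (9, 1) = -\tfrac{1}{7} V_1 + \tfrac{8}{7} V_2,$$

so $c_1 = -\tfrac{1}{7}$, $c_2 = \tfrac{8}{7}$, and $d = 7$. The lemma then gives

$$dX_3 - (dc_1 X_1 + dc_2 X_2) = 7 \cdot 14 - (-1 \cdot 8 + 8 \cdot 9) = 98 - 64 = 34 = 2 \cdot 17,$$

a nonzero multiple of the modulus obtained without knowing $a$, $b$, or $m$.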
We use the above lemma to find a positive multiple of the modulus. First we
need to find an appropriate set $L$ and an integer $l_s$ so that $V_{l_s}$ is a linear combination
of the vectors in $\{V_{l_1}, V_{l_2}, \ldots, V_{l_r}\}$. But to ensure that the multiple we find is
nonzero, we need that $(V_{l_s}, X_{l_s})$ is not in the vector space spanned by the set
$\{(V_{l_1}, X_{l_1}), (V_{l_2}, X_{l_2}), \ldots, (V_{l_r}, X_{l_r})\}$.
To find an appropriate set $L$, we want to choose as large as possible a set $L$ of
indices of linearly independent vectors, from $V_{n_0}, V_{n_0+1}, \ldots$. Since the vectors have
dimension $k$, there will be at most $k$ indices in $L$. Suppose we start with $L = \{n_0\}$
and then look through the other vectors one at a time. The $j$th vector we look at
will be $V_{n_0+j}$. If $V_{n_0+j}$ is not in the vector space spanned by the vectors whose
indices are already in the set $L$, then we add its index to $L$. If it is in the vector
space spanned by the vectors indexed by $L = \{l_1, l_2, \ldots, l_p\}$, then one can easily
find constants $c_1, c_2, \ldots, c_p$, such that $V_{n_0+j} = \sum_{i=1}^{p} c_i V_{l_i}$. Let $d$ be the least
common multiple of the denominators of the $c_i$'s. By the previous lemma, there is
an integer $v$ such that $dX_{n_0+j} - d \sum_{i=1}^{p} c_i X_{l_i} = vm$. We can tentatively assume that
$v = 0$ and predict that $X_{n_0+j} = \sum_{i=1}^{p} c_i X_{l_i}$. If we are wrong, then after the correct value
for $X_{n_0+j}$ is revealed, we have a nonzero multiple of the modulus. So, before finding
$\hat m$, our first nonzero multiple of the modulus, there are at most $n_0 + k + 1$ different
$X_i$'s for which we cannot make a correct prediction. Also note that the computations
performed before the first prediction, after the last prediction, and between any
two predictions are all polynomial in $k$ and $\log_2 m$.
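The search described above can be sketched in Python with exact rational arithmetic. This is an illustrative sketch under stated assumptions, not the paper's implementation: the helper names are hypothetical, the linear solve is plain Gaussian elimination over $\mathbb{Q}$, and `phi` is assumed to map the list of previous values to the tuple $V_i$.

```python
from fractions import Fraction
from functools import reduce
from math import gcd

def solve_in_span(basis, target):
    """Return rationals c with sum(c[i] * basis[i]) == target, or None if
    target lies outside the span. `basis` must be linearly independent."""
    rows, cols = len(target), len(basis)
    # Augmented matrix: column j holds basis[j], the last column holds target.
    M = [[Fraction(basis[j][i]) for j in range(cols)] + [Fraction(target[i])]
         for i in range(rows)]
    r = 0
    for col in range(cols):
        p = next((i for i in range(r, rows) if M[i][col] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        M[r] = [x / M[r][col] for x in M[r]]
        for i in range(rows):
            if i != r and M[i][col] != 0:
                f = M[i][col]
                M[i] = [x - f * y for x, y in zip(M[i], M[r])]
        r += 1
    # A row reading 0 = nonzero means the system has no solution.
    if any(all(x == 0 for x in row[:-1]) and row[-1] != 0 for row in M):
        return None
    return [M[i][-1] for i in range(cols)]

def first_multiple_of_modulus(xs, phi, n0):
    """Scan V_{n0}, V_{n0+1}, ... and return the first nonzero value of
    d*X_i - sum(d*c_j*X_{l_j}), which Lemma 1 shows is a multiple of m."""
    L = []                                    # indices of independent vectors
    for i in range(n0, len(xs)):
        v = phi(xs[:i])                       # V_i depends on X_0 .. X_{i-1}
        c = solve_in_span([phi(xs[:l]) for l in L], v)
        if c is None:                         # V_i is independent: extend L
            L.append(i)
            continue
        d = reduce(lambda a, b: a * b // gcd(a, b),
                   (cj.denominator for cj in c), 1)
        value = d * xs[i] - sum(d * cj * xs[l] for cj, l in zip(c, L))
        if value != 0:                        # the guess v = 0 was wrong
            return abs(int(value))
    return None

# Hypothetical LCG (a=5, b=3, m=17, X_0=1) with phi_1 = X_{i-1}, phi_2 = 1.
xs = [1, 8, 9, 14, 5, 11, 7]
multiple = first_multiple_of_modulus(xs, lambda prev: (prev[-1], 1), n0=1)
print(multiple)  # 34
```

The sketch returns as soon as one nonzero multiple is found; a full predictor would also emit the tentative predictions $\sum c_i X_{l_i}$ along the way.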
Since the number of errors we make in our later predictions depends on the size
of $\hat m$, we must look at its size. Suppose the largest absolute value in any of the
vectors $V_i$ is $z$. Since we are assuming that the time to compute the nonreduced
values of the $\phi_j$'s is polynomial in $\log m$ and $k$, the size of $z$ is also polynomial in
$\log m$ and $k$. Thus, by Cramer's Rule, the absolute value of $d$, the least common
multiple of the $f_i$'s, is no greater than the maximum possible value for the
determinant of a $k$ by $k$ matrix with elements whose absolute values are no greater
than $z$. This is also true of the absolute value of each of the products $dc_i$. Note that
this does not depend on there being $k$ independent vectors since, if there were only
$k' < k$ vectors, one could work with a smaller $k'$ by $k'$ matrix. By Hadamard's
inequality, which states that $|\det(a_{ij})| \le \prod_{1 \le i \le k} (\sum_{1 \le j \le k} a_{ij}^2)^{1/2}$ [11], such a
determinant must have a value less than $(kz^2)^{k/2}$, so $\hat m$ must be less than
$(kz^2)^{k/2}m + k(kz^2)^{k/2}m \le 2km(kz^2)^{k/2}$. Hence the number of incorrect predictions that will be
made after this multiple of the modulus is found will be no greater than
$\log_2(2km(kz^2)^{k/2}/m) = 1 + \log_2 k + (k/2)\log_2(kz^2)$. This makes the maximum
possible number of errors total no greater than $n_0 + \max(k, r) + 3 + \log_2 k +
(k/2)\log_2(kz^2)$, which is polynomial in $k$ and $\log_2 m$.
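To get a feel for the size of this bound, it can be evaluated directly. The function name and the sample parameters below are illustrative assumptions.

```python
from math import log2

def error_bound(n0, k, r, z):
    """Evaluate n0 + max(k, r) + 3 + log2(k) + (k/2) * log2(k * z^2)."""
    return n0 + max(k, r) + 3 + log2(k) + (k / 2) * log2(k * z * z)

# Hypothetical parameters: n0 = 2, k = r = 2, and vector entries below 2^16.
print(error_bound(2, 2, 2, 2 ** 16))  # 41.0
```

The bound grows linearly in the bit length of $z$ and roughly as $k \log k$ in the number of terms, so it stays small even for moduli of cryptographic size.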
Step 2. After finding a nonzero integer multiple of the modulus and seeing
$n_0 + r$ elements of the sequence, one can attempt to find integer coefficients $\alpha'_j$,
which can be used in predicting the sequence produced by the original generator.
Although the $\phi_j$'s used in the pseudo-random number generators may be highly
nonlinear, the problem of finding the coefficients can be solved, in some cases, by
solving a system of linear congruences. Recall that for generators having the
extrapolation property, if one can find coefficients $\alpha'_j$ satisfying

$$\sum_{j=1}^{k} \alpha'_j \phi_j(X_0, X_1, \ldots, X_{n_0+l-1}) \equiv X_{n_0+l} \pmod m \quad \text{for } 0 \le l \le r - 1,$$

then these coefficients will work forever, that is, for all $n \ge n_0$, we have

$$\sum_{j=1}^{k} \alpha'_j \phi_j(X_0, X_1, \ldots, X_{n-1}) \equiv X_n \pmod m.$$

Thus we would like to solve for $(\alpha'_1, \alpha'_2, \ldots, \alpha'_k)$ in the above system of congruences.
The only problem is that we don't know the modulus $m$; instead we have
$\hat m$, a multiple of $m$. Although this system must have a solution modulo $m$, it may
have no solution modulo $\hat m$. Thus we would like to have necessary and sufficient
conditions for a solution to exist, conditions that tell us which factors to remove
from $\hat m$ to make the system solvable. Writing this system of congruences in matrix
form, we set

$$A = \begin{pmatrix}
\phi_1(X_0, X_1, \ldots, X_{n_0-1}) & \phi_2(X_0, X_1, \ldots, X_{n_0-1}) & \cdots & \phi_k(X_0, X_1, \ldots, X_{n_0-1}) \\
\phi_1(X_0, X_1, \ldots, X_{n_0}) & \phi_2(X_0, X_1, \ldots, X_{n_0}) & \cdots & \phi_k(X_0, X_1, \ldots, X_{n_0}) \\
\vdots & \vdots & & \vdots \\
\phi_1(X_0, X_1, \ldots, X_{n_0+r-2}) & \phi_2(X_0, X_1, \ldots, X_{n_0+r-2}) & \cdots & \phi_k(X_0, X_1, \ldots, X_{n_0+r-2})
\end{pmatrix}$$

and $b = (X_{n_0}, X_{n_0+1}, \ldots, X_{n_0+r-1})$. We know that there is a solution to the system
of congruences defined by $Ax \equiv b \pmod m$.
To solve for $x$, we use a normal form for matrices, called Smith normal form,
for H. J. S. Smith [21]. Let $q$ be the minimum of $k$ and $r$. An integer $r$ by $k$ matrix
is in Smith normal form if it is a diagonal matrix with diagonal elements
$e_1, e_2, \ldots, e_q$, such that $e_i$ divides $e_{i+1}$ for $1 \le i < s$, for some $0 \le s \le q$, and $e_i = 0$
for $s + 1 \le i \le q$. We denote such a matrix by $[e_1, e_2, \ldots, e_q]$. For every $r$ by $k$
matrix $A$ over a principal ideal domain $R$ (in this case the integers), there exists an
$r$ by $r$ unimodular matrix $U$ and a $k$ by $k$ unimodular matrix $V$ such that $A = UEV$,
where $E$ is in Smith normal form [15]. A matrix is unimodular if its determinant
is a unit (in this case plus or minus one). The integers $e_1, e_2, \ldots, e_q$ are called the
invariant factors of $A$. There is a polynomial-time algorithm given in [10] for
finding the Smith normal form of a matrix and the unimodular matrices $U$ and $V$.
The algorithm for refining $\hat m$ and solving for $x$ in the system $Ax \equiv b \pmod m$ is
fairly simple. First, we find unimodular matrices $U$ and $V$ such that $A = UEV$,
where $E = [e_1, e_2, \ldots, e_q]$, $E$ is in Smith normal form, and $q = \min(k, r)$. We want
to solve $UEVx \equiv b \pmod{m'}$, where $m'$ is a multiple of $m$ and a divisor of $\hat m$. If
we multiply through by $U^{-1}$, which is an integer matrix since $U$ is unimodular,
and set $c = U^{-1}b$, we get $EVx \equiv c \pmod{m'}$. Since $V$ and $x$ contain only integers,
$y = Vx$ also contains only integers. In fact, since $V$ is unimodular, there is a
one-to-one correspondence between $x$ and $y$. Thus $Ax \equiv b \pmod{m'}$ has a
solution if and only if $Ey \equiv c \pmod{m'}$ has a solution (though both may have
many solutions). But solving $Ey \equiv c \pmod{m'}$ is easy when a solution exists
since $E$ is diagonal; just solve for $y_i$ in $e_i y_i \equiv c_i \pmod{m'}$ for all $i$. If there is no
solution to $e_i y_i \equiv c_i \pmod{\hat m}$, then $\gcd(\hat m, e_i)$ does not divide $c_i$. But since
there is a solution modulo $m$, $\gcd(m, e_i)$ divides $c_i$. Thus, we can safely replace
$\hat m$ by $\hat m \gcd(\hat m, e_i, c_i)/\gcd(\hat m, e_i)$.
Thus, to find a set of coefficients that correctly predict $X_{n_0}, X_{n_0+1}, \ldots, X_{n_0+r-1}$,
and hence all $X_i$'s, we start with some multiple $\hat m$ of $m$. We form the matrix $A$
and the vector $b$ as described above. Then, from $A$, we compute its Smith normal
form $E$, the corresponding unimodular matrices $U$ and $V$, and the vector
$c = U^{-1}b$. Then, until $\gcd(\hat m, e_i)$ divides $c_i$ for all $i$, we remove excess
factors from $\hat m$ without changing the fact that $m$ divides $\hat m$, replacing $\hat m$ by
$\hat m \gcd(\hat m, e_i, c_i)/\gcd(\hat m, e_i)$. After solving for $y$ in $Ey \equiv c \pmod{\hat m}$ (note that
if $e_i = 0$, we set $y_i = c_i/\hat m$), we can set $x$ to $V^{-1}y$.
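The refinement loop on the diagonal system can be sketched as follows. This is an illustration under stated assumptions: the Smith normal form itself is taken as already computed, so the inputs are just the diagonal entries $e_i$ and the transformed right-hand side $c_i$; the function name is hypothetical; and for $e_i \equiv 0$ the sketch picks $y_i = 0$, which is one valid choice among many and not necessarily the paper's convention.

```python
from math import gcd

def refine_and_solve(e, c, m_hat):
    """Shrink m_hat until every e[i]*y[i] ≡ c[i] (mod m_hat) is solvable,
    then return (refined m_hat, one solution y)."""
    changed = True
    while changed:
        changed = False
        for ei, ci in zip(e, c):
            g = gcd(m_hat, ei)
            if ci % g != 0:
                # Replace m_hat by m_hat * gcd(m_hat, e_i, c_i) / gcd(m_hat, e_i).
                m_hat = m_hat * gcd(g, ci) // g
                changed = True
    y = []
    for ei, ci in zip(e, c):
        g = gcd(m_hat, ei)
        n = m_hat // g
        # Divide the congruence by g and invert e_i/g modulo m_hat/g.
        y.append(0 if n == 1 else (ci // g) * pow(ei // g, -1, n) % n)
    return m_hat, y

print(refine_and_solve([4], [2], 12))               # (6, [2])
print(refine_and_solve([1, 3, 0], [5, 6, 14], 28))  # (14, [5, 2, 0])
```

Every replacement strictly decreases the modulus guess, so the loop terminates; recovering $x$ from $y$ still requires the unimodular matrix $V$ from the Smith normal form computation.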
The algorithm described in this section gives us the following:
THEOREM 2. There is an efficient inference method for predicting any sequence
produced by a generator having the extrapolation property. If, for all $X_i < m$, for all
$j$, and for all $n$, $|\phi_j(X_0, X_1, \ldots, X_{n-1})| \le z$, the maximum number of errors this
inference method makes is no greater than $n_0 + \max(k, r) + 3 + \log_2 k +
(k/2)\log_2(kz^2)$.
For details of the algorithm and a formal proof of its correctness, see [4].

3. Application of the General Method to Linear and Quadratic Congruences


The linear congruential pseudo-random number generators, those of the form
$X_{i+1} = aX_i + b \pmod m$, are very fast and widely used in Monte Carlo simulations
and probabilistic algorithms. They are a special case of linear congruences with $n$
terms in the recurrence,
$$X_i = a_1X_{i-1} + a_2X_{i-2} + \cdots + a_nX_{i-n} + a_{n+1} \pmod m$$
and of quadratic congruences,
$$X_{i+1} = aX_i^2 + bX_i + c \pmod m.$$
In this section, we show that the methodology presented in the previous section
can be applied to pseudo-random number generators in both of these forms, thus
showing that these generators are cryptographically insecure.
We assume that a fixed linear congruence with $n$ terms in the congruence,
$X_i = a_1X_{i-1} + a_2X_{i-2} + \cdots + a_nX_{i-n} + a_{n+1} \pmod m$, is given and that, although $n$
is known, $a_1, a_2, \ldots, a_{n+1}$, and $m$ are unknown. We also assume that the coefficients
are all nonnegative and that $m$ is greater than any of these coefficients and
is greater than one. In fact, rather than applying the general method of the previous
section directly to this sequence, we apply it to a similar sequence derived from
this one. We obtain the sequence $(Y_k)$ by setting $Y_k = X_k - X_{k-1}$ for $k \ge 1$. By
subtracting
$$X_k = a_1X_{k-1} + a_2X_{k-2} + \cdots + a_nX_{k-n} + a_{n+1} \pmod m$$
and
$$X_{k-1} = a_1X_{k-2} + a_2X_{k-3} + \cdots + a_nX_{k-n-1} + a_{n+1} \pmod m,$$
one gets that
$$X_k - X_{k-1} \equiv a_1(X_{k-1} - X_{k-2}) + a_2(X_{k-2} - X_{k-3}) + \cdots + a_n(X_{k-n} - X_{k-n-1}) \pmod m,$$
and thus $Y_k \equiv a_1Y_{k-1} + a_2Y_{k-2} + \cdots + a_nY_{k-n} \pmod m$ for $k > n$. The following
lemma reduces the problem of computing the original $a_1, a_2, \ldots, a_{n+1}$, and $m$,
to that of computing any $\hat a_1, \hat a_2, \ldots, \hat a_n$, and $\hat m$ such that $Y_j \equiv (\sum_{i=1}^{n} \hat a_i Y_{j-i})
\pmod{\hat m}$, for $j \ge n + 1$.
LEMMA 3. Suppose $Y_j \equiv (\sum_{i=1}^{n} \hat a_i Y_{j-i}) \pmod{\hat m}$ for $n + 1 \le j \le s$. Then setting
$\hat a_{n+1} = (X_n - \sum_{i=1}^{n} \hat a_i X_{n-i})$ gives $X_k \equiv \hat a_1X_{k-1} + \hat a_2X_{k-2} + \cdots + \hat a_nX_{k-n} + \hat a_{n+1}
\pmod{\hat m}$ for $n \le k \le s$.
PROOF. Suppose $Y_j \equiv (\sum_{i=1}^{n} \hat a_i Y_{j-i}) \pmod{\hat m}$ for $n + 1 \le j \le s$. Let $n \le k \le s$.
Then
$$(\hat a_1X_{k-1} + \hat a_2X_{k-2} + \cdots + \hat a_nX_{k-n} + \hat a_{n+1}) - X_k
= \Big(\sum_{i=1}^{n} \hat a_i X_{k-i}\Big) + \Big(X_n - \sum_{i=1}^{n} \hat a_i X_{n-i}\Big) - X_k$$
$$= \Big(\sum_{i=1}^{n} \hat a_i (X_{k-i} - X_{n-i})\Big) - (X_k - X_n)
= \Big(\sum_{i=1}^{n} \hat a_i \Big(\sum_{l=n+1}^{k} Y_{l-i}\Big)\Big) - \Big(\sum_{l=n+1}^{k} Y_l\Big)$$
$$= \sum_{l=n+1}^{k} \Big(\Big(\sum_{i=1}^{n} \hat a_i Y_{l-i}\Big) - Y_l\Big) \equiv 0 \pmod{\hat m}.$$
Therefore, for $n \le k \le s$,
we have that $X_k \equiv \hat a_1X_{k-1} + \hat a_2X_{k-2} + \cdots + \hat a_nX_{k-n} + \hat a_{n+1} \pmod{\hat m}$. □
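The reduction to the difference sequence, and the recovery of the constant term in Lemma 3, can be checked numerically. The sketch below is illustrative only; the generator parameters and the helper name are assumptions, and the true coefficients stand in for the recovered $\hat a_i$.

```python
def n_term_sequence(coeffs, const, m, seed, count):
    """X_k = a_1 X_{k-1} + ... + a_n X_{k-n} + a_{n+1} (mod m).

    `seed` holds the n initial values X_0 .. X_{n-1}.
    """
    xs = list(seed)
    while len(xs) < count:
        k = len(xs)
        xs.append((sum(a * xs[k - i] for i, a in enumerate(coeffs, 1)) + const) % m)
    return xs

# Hypothetical two-term generator: X_k = 3 X_{k-1} + 5 X_{k-2} + 7 (mod 101).
a, const, m = [3, 5], 7, 101
n = len(a)
xs = n_term_sequence(a, const, m, [10, 20], 12)
ys = {k: xs[k] - xs[k - 1] for k in range(1, len(xs))}   # Y_k for k >= 1

# The differences satisfy the homogeneous recurrence for k > n.
for k in range(n + 1, len(xs)):
    assert (ys[k] - sum(a[i - 1] * ys[k - i] for i in range(1, n + 1))) % m == 0

# Lemma 3: the constant term is recovered from X_n and the coefficients.
const_hat = xs[n] - sum(a[i - 1] * xs[n - i] for i in range(1, n + 1))
for k in range(n, len(xs)):
    assert (xs[k] - sum(a[i - 1] * xs[k - i] for i in range(1, n + 1)) - const_hat) % m == 0
```

Note that `const_hat` is computed over the integers; it agrees with the true constant only modulo $m$, which is all the lemma requires.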
In order to use the method presented in the previous section, we need only show
now that generators of the form $Y_k = (\sum_{j=1}^{n} \hat a_j Y_{k-j}) \pmod{\hat m}$ have the extrapolation
property. The following theorem tells us that if we can find coefficients that correctly
predict $Y_{n+1}, Y_{n+2}, \ldots, Y_{2n}$ using some multiple of $m$ as the modulus, then these
coefficients can be used to correctly predict all the $Y_i$'s if $m$ is known.
THEOREM 4. The functions $\{\phi_j(Y_0, Y_1, \ldots, Y_{i-1}) = Y_{i-j} \mid 1 \le j \le n\}$ have the
unique extrapolation property with length $n$.
PROOF. Suppose there exist two sets of coefficients $\{a_j \mid 1 \le j \le n\}$ and
$\{\hat a_j \mid 1 \le j \le n\}$ such that for $n + 1 \le i \le 2n$, $\sum_{j=1}^{n} a_j Y_{i-j} \equiv \sum_{j=1}^{n} \hat a_j Y_{i-j} \pmod m$. A
proof by induction shows that these generators produce the same set of iterates.
Suppose that they agree on the first $k - 1$ iterates, $Y_1, Y_2, \ldots, Y_{k-1}$, and look at
the $k$th iterate where $k > 2n$. Then
$\sum_{j=1}^{n} a_j Y_{k-j} \equiv \sum_{j=1}^{n} a_j (\sum_{i=1}^{n} \hat a_i Y_{k-j-i}) =
\sum_{i=1}^{n} \hat a_i (\sum_{j=1}^{n} a_j Y_{k-j-i}) \equiv \sum_{i=1}^{n} \hat a_i Y_{k-i} \pmod m$. Therefore, for all $k \ge n + 1$, we have
$\sum_{j=1}^{n} a_j Y_{k-j} \equiv \sum_{j=1}^{n} \hat a_j Y_{k-j} \pmod m$. □
Since these generators have the extrapolation property, we can use the method
presented in the previous section to infer the sequences they produce. It is clear
that $|\phi_j(Y_0, Y_1, \ldots, Y_{i-1})| < m$. Thus, from Theorem 2 of the previous section,
and Lemma 3 and Theorem 4 above, we get the following:
THEOREM 5. There is an efficient inference method for predicting any
sequence produced by a linear congruence with $n$ terms in the recurrence. The
maximum number of errors this inference method makes is no greater than
$2n + 3 + \log_2 n + (n/2)\log_2(nm^2)$.
In the general method, one begins making predictions while finding the set $L$ of
linearly independent vectors, but there is no guarantee as to when the set $L$ will be
completed. Linear congruences with $n$ terms in the recurrence, though, have such
a special form that one can guarantee that the set $L$ will be the first $s$ vectors for
some $s \le n$, and thus one will have found $L$ before seeing $X_{2n+1}$. After finding $L$,
the inference method makes at most $2 + \log_2 n + (n/2)\log_2(nm^2)$ errors. See [4] for
the improved algorithm and further details.
The linear congruential method, $X_{i+1} = aX_i + b \pmod m$, is the special case of
linear congruences with $n$ terms in the recurrence, with $n = 1$. Thus, Theorem 5
implies the following:
THEOREM 6. There is an efficient inference method for predicting any sequence
produced by the linear congruential method. The maximum number of errors this
inference method makes is no greater than $5 + \log_2 m$, including the necessary errors
for predicting $X_0$, $X_1$, and $X_2$.
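The bound in Theorem 6 follows from Theorem 5 by direct substitution of $n = 1$:

$$2n + 3 + \log_2 n + \frac{n}{2}\log_2(nm^2) \Big|_{n=1} = 2 + 3 + \log_2 1 + \frac{1}{2}\log_2(m^2) = 5 + 0 + \log_2 m = 5 + \log_2 m.$$

For a modulus of $2^{31}$, for instance, this is at most 36 errors.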
In this special case of the linear congruential method, one can prove somewhat
stronger results, including the following theorem [4].
THEOREM 7. The efficient inference method derived from the general method
requires knowledge of an initial segment of length no more than $2 + \lceil \log_2 m \rceil$ before
it is able to compute a nonzero multiple of the modulus and the coefficients, except
in the cases where $Y_1 = \pm Y_2$ or $Y_2 = 0$.
In addition, for sequences produced by the linear congruential method, it is
often possible to take advantage of knowledge of scattered $X_i$ values, rather than
just $\{X_i \mid 0 \le i \le j\}$. See [4] and [18] for details and proofs.
The low-order bits of numbers produced using the linear congruential method
tend to appear much less random than the high-order bits [11]. Knowing this, a
cryptographer using a linear congruential pseudo-random number generator would
probably only use the high-order bits. In [12], Knuth has discussed the problem of
predicting these sequences produced by the linear congruential method. He assumes
that the modulus $m$ is known and is a power of two, but assumes that only the
high-order bits of the numbers generated are actually used. Frieze et al. [7] have a
much faster algorithm than Knuth's for predicting sequences produced by the
linear congruential method even if half of the low-order bits are unknown, but
they assume that the multiplier, $a$, and the modulus, $m$, are known. Håstad and
Shamir [9] have generalized this work, but they also assume that the multiplier
and modulus are known.
In [4] and in [5], we assume that a fixed linear congruential pseudo-random
number generator, $X_{i+1} = aX_i + b \pmod m$, is given, but the nonnegative constants
$a$, $b$, and $m$, with $m > \max(1, X_0, a, b)$, are unknown. We further assume that the
low-order $t$ bits of the $X_i$'s are never known. The problem is to predict, from the
high-order bits of some of the Xi’s, the remainder of the sequence. As in this paper,
we assume that whenever an incorrect prediction occurs, the correct value is
revealed before the next prediction is made, but only the high-order bits are
revealed, not the low-order t bits. We show that, even if some small number t =
O(log log m) of the low-order bits are unused, the sequences produced are still
cryptographically insecure.
Now, let us assume that a fixed quadratic congruence, $X_{i+1} = aX_i^2 + bX_i + c
\pmod m$, is given and that $a$, $b$, $c$, and $m$ are unknown. We also assume that the
coefficients are nonnegative integers and that $m > \max(1, X_0, a, b, c)$. In order to
apply the methodology of the previous section, the following theorem, which says
that quadratic congruences have the unique extrapolation property, will be proved
through the two lemmas following it.
Inferring SequencesProduced by Pseudo-Random Number Generators 139
THEOREM 8. Thefunctions (c$#&, X, , . . . , Xi) = X{ IO 5 j 5 2) have the unique
extrapolation property with length 3.
As in the case of linear congruences, we obtain the sequence (Yk) by setting
Y, = Xk - Xk-, for k 2 1. We also look at the sequence (& ), which we obtain by
setting Zk = & + X,-, for k 2 1. By subtracting
Xk+, = aJ$ + b& + c (mod m)
and
Xk = aXfpl + b&-, + c (mod m),
one gets that Xk+, - Xk= a(Xz - Xc-,) + b(& - Xkel) (mod m), and thus
Yk+, = aYkZk + bY, (mod m). The following lemma reduces the problem of com-
puting the original a, b, c, and m, to that of computing some H, 6, and r?zsuch that
&+I s ii&Z, + bYj (mod yiz), forj L 1.
LEMMA 9. Suppose Y+, = a^Y, Z,, + 66 (mod m) for 1 5 k I s. Then setting
2=X, -(hXi + b&)givesX. ~+,~~~:-+~~~+~(mod~)forO~k%s.
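PROOF. Suppose $Y_{j+1} \equiv \hat{a}Y_jZ_j + \hat{b}Y_j \pmod{\hat{m}}$ for $1 \le j \le s$. Let $0 \le k \le s$. Then
we have that
$$\hat{a}X_k^2 + \hat{b}X_k + \hat{c} - X_{k+1} = \hat{a}X_k^2 + \hat{b}X_k + (X_1 - (\hat{a}X_0^2 + \hat{b}X_0)) - X_{k+1}$$
$$= \hat{a}(X_k^2 - X_0^2) + \hat{b}(X_k - X_0) - (X_{k+1} - X_1) = \hat{a}\sum_{j=1}^{k} Y_jZ_j + \hat{b}\sum_{j=1}^{k} Y_j - \sum_{j=1}^{k} Y_{j+1}$$
$$= \sum_{j=1}^{k} (\hat{a}Y_jZ_j + \hat{b}Y_j - Y_{j+1}) \equiv 0 \pmod{\hat{m}}.$$
Therefore, for $0 \le k \le s$, we have that
$X_{k+1} \equiv \hat{a}X_k^2 + \hat{b}X_k + \hat{c} \pmod{\hat{m}}$. □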
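Lemma 9 can be illustrated on a small instance. The Python sketch below is a brute-force check, feasible only for a tiny modulus and not the inference method of the paper: it enumerates every pair $(\hat{a}, \hat{b})$ that satisfies the difference congruences for $1 \le k \le s$, sets $\hat{c}$ as in the lemma, and tests the conclusion. The parameter values, the choice $\hat{m} = m$, and all function names are assumptions made for illustration.

```python
def quadratic_sequence(a, b, c, m, x0, n):
    """Return X_0..X_{n-1} for X_{i+1} = a*X_i^2 + b*X_i + c (mod m)."""
    xs = [x0]
    for _ in range(n - 1):
        x = xs[-1]
        xs.append((a * x * x + b * x + c) % m)
    return xs


def fits_differences(a_hat, b_hat, m_hat, xs, s):
    """Hypothesis of Lemma 9: Y_{k+1} = a_hat*Y_k*Z_k + b_hat*Y_k (mod m_hat), 1 <= k <= s."""
    for k in range(1, s + 1):
        y_k = xs[k] - xs[k - 1]
        z_k = xs[k] + xs[k - 1]
        y_next = xs[k + 1] - xs[k]
        if (y_next - (a_hat * y_k * z_k + b_hat * y_k)) % m_hat != 0:
            return False
    return True


def lemma9_constant(a_hat, b_hat, xs):
    """c_hat = X_1 - (a_hat*X_0^2 + b_hat*X_0), as in the statement of Lemma 9."""
    return xs[1] - (a_hat * xs[0] ** 2 + b_hat * xs[0])


def fits_sequence(a_hat, b_hat, c_hat, m_hat, xs, s):
    """Conclusion of Lemma 9: X_{k+1} = a_hat*X_k^2 + b_hat*X_k + c_hat (mod m_hat), 0 <= k <= s."""
    return all(
        (xs[k + 1] - (a_hat * xs[k] ** 2 + b_hat * xs[k] + c_hat)) % m_hat == 0
        for k in range(0, s + 1)
    )


def check_lemma9(xs, m_hat, s):
    """Return (number of fitting pairs, True if the conclusion held for all of them)."""
    count, all_ok = 0, True
    for a_hat in range(m_hat):
        for b_hat in range(m_hat):
            if fits_differences(a_hat, b_hat, m_hat, xs, s):
                count += 1
                c_hat = lemma9_constant(a_hat, b_hat, xs)
                all_ok = all_ok and fits_sequence(a_hat, b_hat, c_hat, m_hat, xs, s)
    return count, all_ok


# Hypothetical small instance; m_hat is taken equal to the true modulus here.
m, s = 101, 6
xs = quadratic_sequence(5, 7, 11, m, 3, s + 2)
count, all_ok = check_lemma9(xs, m, s)
print(count, all_ok)
```

The value of $\hat{c}$ is left unreduced, as in the lemma; only its residue modulo $\hat{m}$ matters for prediction.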
In order to find a multiple of the modulus $m$, we can use the method discussed
in Section 2 and the notation of that section. In this case, we have $V_j =
(Y_{j-1}Z_{j-1}, Y_{j-1})$, and $(V_j, Y_j) = (Y_{j-1}Z_{j-1}, Y_{j-1}, Y_j)$. The set $L$ will contain $V_2$ and
$V_3$ unless $V_2 = 0$ or there exists a rational constant $c$ such that $cV_2 = V_3$. If $V_2 = 0$,
then $Y_1 = 0$, so $Y_j = 0$ for all $j \ge 1$. If $cV_2 = V_3$, then either $Y_2 = 0$ or $X_0 = X_2$. If
$Y_2 = 0$, then $Y_j = 0$ for all $j \ge 2$. If $X_0 = X_2$, the sequence is $(X_0, X_1, X_0, X_1, \ldots)$.
These sequences are very distinctive and easy to predict. But unless the sequence
we are trying to predict has one of these special forms, using the methodology in
Section 2, we can begin predicting the sequence before seeing $X_4$. After this, we
make at most one incorrect prediction before we have enough information to
compute a multiple of the modulus. To show that we can use the methodology
from Section 2 to compute coefficients that will work, we need only show that
sequences of the form
$$Y_{j+1} \equiv \hat{a}Y_jZ_j + \hat{b}Y_j \pmod{\hat{m}}$$
have the extrapolation property. The following lemma tells us that if we can find
coefficients that correctly predict $Y_2$ and $Y_3$ using some multiple of $m$ as the
modulus, then these coefficients can be used to correctly predict all the $Y_i$'s if $m$ is
known.
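To make concrete how a multiple of the modulus can arise from the extended vectors $(V_j, Y_j)$ above, the Python sketch below uses a standard determinant argument; it is offered as an illustration and is not claimed to be the exact procedure of Section 2. Each row $(Y_{j-1}Z_{j-1}, Y_{j-1}, Y_j)$ satisfies $Y_j - aY_{j-1}Z_{j-1} - bY_{j-1} \equiv 0 \pmod{m}$, so any three such rows are linearly dependent modulo $m$ and their $3 \times 3$ determinant is divisible by $m$. The parameters and function names are assumptions.

```python
from math import gcd


def quadratic_sequence(a, b, c, m, x0, n):
    """Return X_0..X_{n-1} for X_{i+1} = a*X_i^2 + b*X_i + c (mod m)."""
    xs = [x0]
    for _ in range(n - 1):
        x = xs[-1]
        xs.append((a * x * x + b * x + c) % m)
    return xs


def extended_vector(xs, j):
    """Return (V_j, Y_j) = (Y_{j-1}*Z_{j-1}, Y_{j-1}, Y_j) for j >= 2."""
    y_prev = xs[j - 1] - xs[j - 2]
    z_prev = xs[j - 1] + xs[j - 2]
    y_j = xs[j] - xs[j - 1]
    return (y_prev * z_prev, y_prev, y_j)


def det3(r0, r1, r2):
    """Determinant of a 3x3 integer matrix given as three rows (exact integer arithmetic)."""
    return (
        r0[0] * (r1[1] * r2[2] - r1[2] * r2[1])
        - r0[1] * (r1[0] * r2[2] - r1[2] * r2[0])
        + r0[2] * (r1[0] * r2[1] - r1[1] * r2[0])
    )


def modulus_multiple(xs):
    """gcd of the determinants of all windows of three consecutive extended vectors.

    The result is divisible by the true modulus; it may be 0 for degenerate
    sequences and may be a proper multiple of the modulus in general.
    """
    g = 0
    for j in range(2, len(xs) - 2):
        rows = [extended_vector(xs, i) for i in (j, j + 1, j + 2)]
        g = gcd(g, abs(det3(*rows)))
    return g


# Hypothetical parameters, used only to exercise the sketch.
m = 1009
xs = quadratic_sequence(5, 7, 11, m, 3, 12)
g = modulus_multiple(xs)
print(g, g % m)
```

Taking the gcd over several windows tends to shrink the candidate toward the modulus itself, but the sketch only relies on divisibility by $m$.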
LEMMA 10. If $\hat{m}$ is a positive integer multiple of $m$, and the following two
congruences hold
$$Y_2 \equiv \hat{a}Y_1Z_1 + \hat{b}Y_1 \pmod{m}$$
$$Y_3 \equiv \hat{a}Y_2Z_2 + \hat{b}Y_2 \pmod{m},$$
then for all $j \ge 1$, $Y_{j+1} \equiv \hat{a}Y_jZ_j + \hat{b}Y_j \pmod{m}$.

PROOF. The proof will be by induction on $j$. The conclusion obviously holds
for $j = 1$ and $j = 2$. Suppose it holds for $j = k - 1$ and $j = k$. Then, multiplying the
congruence $Y_{k+1} \equiv \hat{a}Y_kZ_k + \hat{b}Y_k \pmod{m}$ through by $(aZ_{k+1} + b)$, we get that
$Y_{k+1}(aZ_{k+1} + b) \equiv \hat{a}Y_kZ_k(aZ_{k+1} + b) + \hat{b}Y_k(aZ_{k+1} + b) \pmod{m}$. This gives us that
$$Y_{k+2} \equiv \hat{a}Z_{k+1}(aY_kZ_k + bY_k) + \hat{b}(aY_kZ_k + bY_k) - b\hat{a}Y_kZ_{k+1} - a\hat{b}Y_kZ_k + b\hat{a}Y_kZ_k + a\hat{b}Y_kZ_{k+1}$$
$$\equiv \hat{a}Z_{k+1}Y_{k+1} + \hat{b}Y_{k+1} + (a\hat{b} - b\hat{a})(Y_kZ_{k+1} - Y_kZ_k) \pmod{m}.$$
We only need to show that $(a\hat{b} - b\hat{a})(Y_k)(Z_{k+1} - Z_k) \equiv 0 \pmod{m}$. Since
$Y_k \equiv \hat{a}Y_{k-1}Z_{k-1} + \hat{b}Y_{k-1} \pmod{m}$, multiplying through by $aZ_k + b$ gives
that $Y_{k+1} \equiv \hat{a}Y_kZ_k + \hat{b}Y_k + (a\hat{b} - b\hat{a})(Y_{k-1})(Z_k - Z_{k-1}) \pmod{m}$, and thus
$(a\hat{b} - b\hat{a})(Y_{k-1})(Z_k - Z_{k-1}) \equiv 0 \pmod{m}$. If we multiply both sides by
$(aZ_{k-1} + b)(a(X_k + X_{k-2}) + b)$, we get
$$0 \equiv (a\hat{b} - b\hat{a})(Y_{k-1})(aZ_{k-1} + b)(X_k - X_{k-2})(a(X_k + X_{k-2}) + b)$$
$$\equiv (a\hat{b} - b\hat{a})(Y_k)(a(X_k^2 - X_{k-2}^2) + b(X_k - X_{k-2}) + c(1 - 1))$$
$$\equiv (a\hat{b} - b\hat{a})(Y_k)(X_{k+1} - X_{k-1})$$
$$\equiv (a\hat{b} - b\hat{a})(Y_k)(Z_{k+1} - Z_k) \pmod{m}. \qquad \Box$$
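Lemma 10 can also be checked exhaustively on a small instance. The Python sketch below enumerates every pair $(\hat{a}, \hat{b})$ modulo $m$ that fits the first two difference congruences and tests whether each such pair also fits all later ones. This brute-force search is only an illustration for a tiny modulus; the parameters and function names are assumptions.

```python
def quadratic_sequence(a, b, c, m, x0, n):
    """Return X_0..X_{n-1} for X_{i+1} = a*X_i^2 + b*X_i + c (mod m)."""
    xs = [x0]
    for _ in range(n - 1):
        x = xs[-1]
        xs.append((a * x * x + b * x + c) % m)
    return xs


def holds_at(a_hat, b_hat, m, xs, j):
    """Test Y_{j+1} = a_hat*Y_j*Z_j + b_hat*Y_j (mod m) at a single index j >= 1."""
    y_j = xs[j] - xs[j - 1]
    z_j = xs[j] + xs[j - 1]
    y_next = xs[j + 1] - xs[j]
    return (y_next - (a_hat * y_j * z_j + b_hat * y_j)) % m == 0


def check_lemma10(xs, m):
    """Return (number of pairs fitting j = 1, 2; True if all of them fit every j)."""
    candidates = [
        (a_hat, b_hat)
        for a_hat in range(m)
        for b_hat in range(m)
        if holds_at(a_hat, b_hat, m, xs, 1) and holds_at(a_hat, b_hat, m, xs, 2)
    ]
    last_j = len(xs) - 2  # largest j for which X_{j+1} is available
    all_ok = all(
        holds_at(a_hat, b_hat, m, xs, j)
        for (a_hat, b_hat) in candidates
        for j in range(1, last_j + 1)
    )
    return len(candidates), all_ok


# Hypothetical small instance.
m = 101
xs = quadratic_sequence(5, 7, 11, m, 3, 30)
count, all_ok = check_lemma10(xs, m)
print(count, all_ok)
```

More than one pair may fit the first two congruences; the lemma says that any such pair predicts the rest of the differenced sequence equally well.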
Since these generators have the extrapolation property, we can use the method
presented in Section 2 to infer the sequences they produce. Note that $|Y_jZ_j| < m^2$.
This gives us the following:
THEOREM 11. There is an efficient inference method for predicting any sequence
produced by a quadratic congruence. The maximum number of errors this inference
method makes is no greater than $9 + 4 \log_2 m$, including the necessary errors for
predicting $X_0$, $X_1$, $X_2$, and $X_3$.
This bound can, in fact, be tightened. Following the argument in Section 2,
giving a bound on $\hat{m}$, one notes that since $|\Phi_2(Y_1, Y_2, \ldots, Y_{j-1})| = |Y_{j-1}| < m$,
the determinant of the matrix in question has absolute value less than $2m^3$, giving
a total number of errors no greater than $10 + 3 \log_2 m$.
Lagarias and Reeds [13] have proved a conjecture made in [18], and have shown
that congruences of the form $X_i = P(X_{i-1}) \pmod{m}$, where $P$ is a polynomial of
degree $d$, have the unique extrapolation property with length $d + 1$. This gives us
the following:
THEOREM 12. There is an efficient inference method for predicting any sequence
produced by a generator of the form $X_i = P(X_{i-1}) \pmod{m}$, where $P$ is an unknown
polynomial of degree $d$.
ACKNOWLEDGMENTS. We would like to thank our thesis advisor, Manuel Blum,
who suggested the problems we discuss in this paper. He was very supportive,
encouraging, and helpful throughout the research. We would also like to thank
Richard Karp and George Bergman for reading the thesis in which part of this
work appeared, and for offering many useful suggestions. Other people we would
like to thank include Faith Fich, Howard Karloff, Jeff Lagarias, Bart Plumstead,
David Shmoys, and Alice Wong, all of whom made many helpful comments. In
addition, we would like to thank the referees for helping improve the exposition
and the bound on the size of the initial guess for the modulus.
REFERENCES
1. ALEXI, W., CHOR, B., GOLDREICH, O., AND SCHNORR, C. P. RSA/Rabin bits are 1/2 +
1/poly(log N) secure. In Proceedings of the 25th IEEE Symposium on Foundations of Computer
Science. IEEE, New York, 1984, pp. 449-457.
2. BLUM, M., AND MICALI, S. How to generate cryptographically strong sequences of pseudo-random
bits. In Proceedings of the 23rd IEEE Symposium on Foundations of Computer Science. IEEE, New
York, 1982, pp. 112-117.
3. BLUM, L., BLUM, M., AND SHUB, M. A simple secure pseudo-random number generator.
In Advances in Cryptography: Proceedings of CRYPTO 82. Plenum Press, New York, 1983,
pp. 61-78.
4. BOYAR, J. Inferring sequences produced by pseudo-random number generators. Tech. Rep. 86-
002. Univ. of Chicago, Chicago, Ill., 1986.
5. BOYAR, J. Missing low order bits in a linear congruential generator. J. Crypt., to appear.
6. FLOYD, R. Nondeterministic algorithms. J. ACM 14, 4 (Oct. 1967), pp. 636-644.
7. FRIEZE, A. M., KANNAN, R., AND LAGARIAS, J. C. Linear congruential generators do not produce
random sequences. In Proceedings of the 25th IEEE Symposium on Foundations of Computer
Science. IEEE, New York, 1984, pp. 480-484.
8. GOLDWASSER, S., MICALI, S., AND TONG, P. Why and how to establish a private code on a public
network. In Proceedings of the 23rd IEEE Symposium on Foundations of Computer Science. IEEE,
New York, 1982, pp. 132-144.
9. HASTAD, J., AND SHAMIR, A. The cryptographic security of truncated linearly related variables. In
Proceedings of the 17th ACM Symposium on Theory of Computing (Providence, R.I., May 6-8).
ACM, New York, 1985, pp. 356-362.
10. KANNAN, R., AND BACHEM, A. Polynomial algorithms for computing the Smith and Hermite
normal forms of an integer matrix. SIAM J. Comput. 8, 4 (1979), 499-507.
11. KNUTH, D. E. Seminumerical Algorithms, The Art of Computer Programming, vol. 2. Addison-
Wesley, Reading, Mass., 1969.
12. KNUTH, D. E. Deciphering a linear congruential encryption. Tech. Rep. 024800. Stanford Univ.,
Stanford, Calif., 1980.
13. LAGARIAS, J. C., AND REEDS, J. Unique extrapolation of polynomial recurrences. SIAM J. Comput.
17, 2 (1988), 342-362.
14. LONG, D. L., AND WIGDERSON, A. How discrete is the discrete log? In Proceedings of the 15th
ACM Symposium on Theory of Computing (Boston, Mass., Apr. 25-27). ACM, New York, 1983,
pp. 413-420.
15. MACLANE, S., AND BIRKHOFF, G. Algebra. The MacMillan Company, New York, 1967.
16. PLUMSTEAD, J. B. Inferring a sequence generated by a linear congruence. In Proceedings of the
23rd IEEE Symposium on Foundations of Computer Science. IEEE, New York, 1982, pp.
153-159.
17. PLUMSTEAD, J. B. Inferring a sequence generated by a linear congruence, abstract. In Advances in
Cryptology: Proceedings of CRYPTO 82. Plenum Press, New York, 1983, pp. 317-319.
18. PLUMSTEAD, J. B. Inferring sequences produced by pseudo-random number generators. Ph.D.
dissertation. Univ. of California, Berkeley, Berkeley, Calif., 1983.
19. REEDS, J. “Cracking” a random number generator. Cryptologia, 1 (Jan. 1977), 20-26.
20. SHAMIR, A. On the generation of cryptographically strong pseudo-random sequences. In 8th
Colloquium on Automata, Languages, and Programming, 1980, 544-550.
21. SMITH, H. J. S. On systems of linear indeterminate equations and congruences. Phil. Trans. Royal
Soc. London, A 151 (1861), 293-326.
22. VAZIRANI, U., AND VAZIRANI, V. Efficient and secure pseudo-random number generation. In
Proceedings of the 25th IEEE Symposium on Foundations of Computer Science. IEEE, New York,
1984, pp. 458-463.
23. YAO, A. Theory and applications of trapdoor functions. In Proceedings of the 23rd IEEE
Symposium on Foundations of Computer Science. IEEE, New York, 1982, pp. 80-91.

RECEIVED MAY 1985; REVISED FEBRUARY AND JULY 1986, MARCH AND SEPTEMBER 1987, JANUARY 1988;
ACCEPTED JANUARY 1988

Journal of the Association for Computing Machinery, Vol. 36, No. 1, January 1989.
