Hence C^{-1} is also diagonal with diagonal elements 1/\sigma_i^2. Inserting into (7-58), we obtain the joint density as a product of n one-dimensional normal densities.
EXAMPLE 7-9 ▶ Using characteristic functions, we shall show that if the random variables x_i are jointly normal with zero mean and E\{x_i x_j\} = C_{ij}, then

E\{x_1 x_2 x_3 x_4\} = C_{12}C_{34} + C_{13}C_{24} + C_{14}C_{23}    (7-61)
Proof. We expand the exponentials on the left and right side of (7-60) and we show explicitly only the terms containing the factor \omega_1\omega_2\omega_3\omega_4:

\exp\Big\{-\frac{1}{2}\sum_{i,j}\omega_i\omega_j C_{ij}\Big\} = \cdots + \frac{1}{8}\Big(\sum_{i,j}\omega_i\omega_j C_{ij}\Big)^2 + \cdots
= \cdots + \frac{8}{8}\,(C_{12}C_{34} + C_{13}C_{24} + C_{14}C_{23})\,\omega_1\omega_2\omega_3\omega_4 + \cdots
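The identity (7-61) is easy to check numerically. Below is a minimal sketch, assuming numpy is available; the covariance matrix C and the sample size are arbitrary illustrative choices, not values from the text.

    import numpy as np

    rng = np.random.default_rng(0)

    # An arbitrary positive-definite covariance matrix (illustrative choice).
    A = rng.standard_normal((4, 4))
    C = A @ A.T

    # Zero-mean jointly normal samples x = (x1, x2, x3, x4).
    x = rng.multivariate_normal(np.zeros(4), C, size=1_000_000)

    lhs = np.mean(x[:, 0] * x[:, 1] * x[:, 2] * x[:, 3])       # E{x1 x2 x3 x4}
    rhs = C[0, 1]*C[2, 3] + C[0, 2]*C[1, 3] + C[0, 3]*C[1, 2]  # (7-61)
    print(lhs, rhs)   # the two numbers agree to within sampling error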
D = \begin{bmatrix} C_{XX} & C_{XY} \\ C_{YX} & C_{YY} \end{bmatrix}
consisting of the 2n^2 + n real parameters E\{x_i x_j\}, E\{y_i y_j\}, and E\{x_i y_j\}. The corresponding characteristic function

\Phi_Z(\Omega) = E\{\exp(j(u_1 x_1 + \cdots + u_n x_n + v_1 y_1 + \cdots + v_n y_n))\}
is an exponential as in (7-60), with

Q = \begin{bmatrix} U & V \end{bmatrix}\begin{bmatrix} C_{XX} & C_{XY} \\ C_{YX} & C_{YY} \end{bmatrix}\begin{bmatrix} U^T \\ V^T \end{bmatrix}

where U = [u_1, \ldots, u_n], V = [v_1, \ldots, v_n], and \Omega = U + jV.
The covariance matrix of the complex vector Z is an n \times n Hermitian matrix

C_{ZZ} = E\{Z^t Z^*\} = C_{XX} + C_{YY} - j(C_{XY} - C_{YX})
with elements E\{z_i z_j^*\}. Thus, C_{ZZ} is specified in terms of n^2 real parameters. From this it follows that, unlike the real case, the density f_Z(Z) of Z cannot in general be determined in terms of C_{ZZ}, because f_Z(Z) is a normal density consisting of 2n^2 + n parameters. Suppose, for example, that n = 1. In this case, Z = z = x + jy is a scalar and C_{zz} = E\{|z|^2\}. Thus, C_{zz} is specified in terms of the single parameter \sigma_z^2 = E\{x^2 + y^2\}. However, f_z(z) = f(x, y) is a bivariate normal density consisting of the three parameters \sigma_x^2, \sigma_y^2, and E\{xy\}. In the following, we present a special class of normal vectors that are statistically determined in terms of their covariance matrix. This class is important in modulation theory (see Sec. 10-3).
Proof. It suffices to prove (7-63) under the stated assumptions; the proof of (7-62) then follows from (7-63) and the Fourier inversion formula.
Normal quadratic forms. Given n independent N(0, 1) random variables z_i, we form the sum of their squares

x = z_1^2 + \cdots + z_n^2

Using characteristic functions, we shall show that the random variable x so formed has a chi-square distribution with n degrees of freedom:

f_x(x) = \gamma x^{n/2-1} e^{-x/2} U(x)
¹N. R. Goodman, "Statistical Analysis Based on Certain Multivariate Complex Distribution," Annals of Mathematical Statistics, 1963, pp. 152-177.
Proof. The random variables z_i^2 have a \chi^2(1) distribution (see page 133); hence their characteristic functions are obtained from (5-109) with m = 1. This yields

\Phi_i(s) = E\{e^{s z_i^2}\} = \frac{1}{\sqrt{1 - 2s}}

From (7-52) and the independence of the random variables z_i^2, it follows therefore that

\Phi_x(s) = \Phi_1(s)\cdots\Phi_n(s) = \frac{1}{\sqrt{(1 - 2s)^n}}
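This distribution is easy to verify by simulation. A minimal sketch, assuming numpy is available; the degrees of freedom n = 5 and the sample size are arbitrary choices:

    import numpy as np
    from math import gamma

    rng = np.random.default_rng(1)
    n = 5                                   # degrees of freedom (arbitrary choice)
    z = rng.standard_normal((200_000, n))
    x = (z**2).sum(axis=1)                  # x = z1^2 + ... + zn^2

    # chi-square density f(x) = x^(n/2-1) e^(-x/2) / (2^(n/2) Gamma(n/2))
    def chi2_pdf(x, n):
        return x**(n/2 - 1) * np.exp(-x/2) / (2**(n/2) * gamma(n/2))

    hist, edges = np.histogram(x, bins=50, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    print(np.max(np.abs(hist - chi2_pdf(centers, n))))  # small: histogram ~ chi2(n)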
It follows similarly that if the random variables x and y are independent, x is \chi^2(m), and y is \chi^2(n), then

z = x + y \quad\text{is}\quad \chi^2(m + n)    (7-64)
Sample variance. Given n i.i.d. N(\eta, \sigma) random variables x_i, we form their sample variance

s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i    (7-65)
as in Example 7-4. We shall show that the random variable (see also Example 8-20)

\frac{(n-1)s^2}{\sigma^2} = \sum_{i=1}^{n}\frac{(x_i - \bar{x})^2}{\sigma^2} \quad\text{is}\quad \chi^2(n-1)    (7-66)

Proof. We use the identity

\sum_{i=1}^{n}\frac{(x_i - \eta)^2}{\sigma^2} = \frac{(n-1)s^2}{\sigma^2} + \Big(\frac{\bar{x} - \eta}{\sigma/\sqrt{n}}\Big)^2

The last term is \chi^2(1) because the random variable \bar{x} is N(\eta, \sigma/\sqrt{n}). Finally, the term on the left side is \chi^2(n) and the proof is complete.
From (7-66) and (5-109) it follows that the mean of the random variable (n-1)s^2/\sigma^2 equals n - 1 and its variance equals 2(n - 1). This leads to the conclusion that

E\{s^2\} = \frac{\sigma^2}{n-1}(n-1) = \sigma^2 \qquad \text{Var}\{s^2\} = \frac{\sigma^4}{(n-1)^2}\,2(n-1) = \frac{2\sigma^4}{n-1}
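These moments are easy to confirm numerically. A minimal sketch, assuming numpy; the parameters n = 8, \eta = 1, \sigma = 2 are arbitrary choices:

    import numpy as np

    rng = np.random.default_rng(2)
    n, eta, sigma = 8, 1.0, 2.0            # arbitrary illustrative parameters
    x = rng.normal(eta, sigma, size=(100_000, n))

    s2 = x.var(axis=1, ddof=1)             # unbiased sample variance (7-65)
    print(s2.mean(), sigma**2)             # ~= sigma^2
    print(s2.var(), 2 * sigma**4 / (n-1))  # ~= 2 sigma^4 / (n-1)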
FIGURE 7-2 [(a) estimating y by the constant c = E\{y\}; (b) the conditional mean \varphi(x) = E\{y \mid x\} over the set \{x = x\}]
we might use as the estimate of y a function c(x) of the random variable x. The resulting problem is the optimum determination of this function.

It might be argued that, if at a certain trial we observe x(\zeta), then we can determine the outcome \zeta of this trial, and hence also the corresponding value y(\zeta) of y. This, however, is not so. The same number x(\zeta) = x is observed for every \zeta in the set \{x = x\} (Fig. 7-2b). If, therefore, this set has many elements and the values of y are different for the various elements of this set, then the observed x(\zeta) does not determine uniquely y(\zeta). However, we know now that \zeta is an element of the subset \{x = x\}. This information reduces the uncertainty about the value of y. In the subset \{x = x\}, the random variable x equals x, and the problem of determining the function c(x) is reduced to the problem of determining the constant c(x). As we noted, if the optimality criterion is the minimization of the MS error, then c(x) must be the average of y in this set. In other words, c(x) must equal the conditional mean of y assuming that x = x.
We shall illustrate with an example. Suppose that the space S is the set of all children in a community and the random variable y is the height of each child. A particular outcome \zeta is a specific child and y(\zeta) is the height of this child. From the preceding discussion it follows that if we wish to estimate y by a number, this number must equal the mean of y. We now assume that each selected child is weighed. On the basis of this observation, the estimate of the height of the child can be improved. The weight is a random variable x; hence the optimum estimate of y is now the conditional mean E\{y \mid x\} of y assuming x = x, where x is the observed weight.
To find the constant c that minimizes the MS error e = E\{(y - c)^2\} = \int_{-\infty}^{\infty}(y - c)^2 f(y)\,dy, we set the derivative with respect to c equal to 0:

\frac{de}{dc} = -\int_{-\infty}^{\infty} 2(y - c) f(y)\, dy = 0
that is, if

c = E\{y\} = \int_{-\infty}^{\infty} y f(y)\, dy    (7-69)

This result is well known from mechanics: The moment of inertia with respect to a point is minimum if the point is the center of gravity.
If y is estimated by a function c(x), the MS error equals

e = E\{[y - c(x)]^2\} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} [y - c(x)]^2 f(x, y)\, dx\, dy    (7-70)
We maintain that e is minimum if

c(x) = E\{y \mid x\} = \int_{-\infty}^{\infty} y f(y \mid x)\, dy    (7-71)

Proof. Since f(x, y) = f(y \mid x) f(x), (7-70) yields

e = \int_{-\infty}^{\infty} f(x)\Big[\int_{-\infty}^{\infty} [y - c(x)]^2 f(y \mid x)\, dy\Big] dx
These integrands are positive. Hence e is minimum if the inner integral is minimum for every x. This integral is of the form (7-68) if c is changed to c(x), and f(y) is changed to f(y \mid x). Hence it is minimum if c(x) equals the integral in (7-69), provided that f(y) is changed to f(y \mid x). The result is (7-71).
Thus the optimum c(x) is the regression line \varphi(x) of Fig. 6-33.

As we noted in the beginning of the section, if y = g(x), then E\{y \mid x\} = g(x); hence c(x) = g(x) and the resulting MS error is 0. This is not surprising because, if x is observed and y = g(x), then y is determined uniquely.
If the random variables x and y are independent, then E\{y \mid x\} = E\{y\} = constant. In this case, knowledge of x has no effect on the estimate of y.
Linear MS Estimation
The solution of the nonlinear MS estimation problem is based on knowledge of the function \varphi(x). An easier problem, using only second-order moments, is the linear MS estimation of y in terms of x. The resulting estimate is not as good as the nonlinear estimate; however, it is used in many applications because of the simplicity of the solution.
The linear estimation problem is the estimation of the random variable y in terms
of a linear function Ax + B of x. The problem now is to find the constants A and B so
as to minimize the MS error
e = E\{[y - (Ax + B)]^2\}    (7-72)

We maintain that e = e_m is minimum if

A = \frac{\mu_{11}}{\mu_{20}} = r\frac{\sigma_y}{\sigma_x} \qquad B = \eta_y - A\eta_x    (7-73)
and

e_m = \mu_{02} - \frac{\mu_{11}^2}{\mu_{20}} = \sigma_y^2(1 - r^2)    (7-74)
For normal random variables, nonlinear and linear MS estimates are identical.
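As a numerical illustration of (7-73) and (7-74), here is a minimal sketch assuming numpy; the linear-plus-noise model for (x, y) is an arbitrary choice:

    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.normal(0, 1, 100_000)
    y = 2*x + rng.normal(0, 0.5, x.size)   # an arbitrary linear-plus-noise model

    r = np.corrcoef(x, y)[0, 1]
    A = r * y.std() / x.std()              # A = r sigma_y / sigma_x   (7-73)
    B = y.mean() - A * x.mean()            # B = eta_y - A eta_x

    e_m = y.var() * (1 - r**2)             # minimum MS error (7-74)
    print(A, B, e_m)
    print(np.mean((y - (A*x + B))**2))     # ~= e_m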
Suppose now that y is estimated in terms of x alone (the homogeneous case), so that e(a) = E\{[y - ax]^2\} and the optimum coefficient is a = E\{xy\}/E\{x^2\} (7-77).

Proof. Clearly, e is minimum if e'(a) = 0; this yields (7-77). We shall give a second proof: We assume that a satisfies (7-77) and we shall show that e is minimum. With \bar{a} an arbitrary constant,

E\{[y - \bar{a}x]^2\} = E\{[(y - ax) + (a - \bar{a})x]^2\}
= E\{(y - ax)^2\} + (a - \bar{a})^2 E\{x^2\} + 2(a - \bar{a})E\{(y - ax)x\}

Here, the last term is 0 by assumption and the second term is positive. From this it follows that E\{[y - \bar{a}x]^2\} \ge E\{(y - ax)^2\}; that is, e is minimum.

MS error. Since (y - ax) \perp x,

e = E\{(y - ax)y\} - E\{(y - ax)ax\} = E\{y^2\} - aE\{xy\}
Risk and loss functions. We conclude with a brief comment on other optimality criteria, limiting the discussion to the estimation of a random variable y by a constant c. We select a function L(x) and we choose c so as to minimize the mean

R = E\{L(y - c)\}

of the random variable L(y - c). The function L(x) is called the loss function and the constant R is called the average risk. The choice of L(x) depends on the applications. If L(x) = x^2, then R = E\{(y - c)^2\} is the MS error and, as we have shown, it is minimum if c = E\{y\}.

If L(x) = |x|, then R = E\{|y - c|\}. We maintain that in this case, c equals the median y_{0.5} of y (see also Prob. 5-32).
Indeed, differentiating R with respect to c, we obtain

\frac{dR}{dc} = \int_{-\infty}^{c} f(y)\, dy - \int_{c}^{\infty} f(y)\, dy = 2F(c) - 1    (7-80)

Hence R is minimum if F(c) = 1/2; that is, if c equals the median of y.    (7-81)

Suppose now that a random variable s is estimated by a linear combination a_1x_1 + \cdots + a_nx_n of n data x_i, the constants a_i chosen so as to minimize the MS error e = E\{[s - (a_1x_1 + \cdots + a_nx_n)]^2\}. Differentiating with respect to each a_i and setting the derivatives equal to 0, we find that the estimation error must be orthogonal to every datum:

E\{[s - (a_1x_1 + \cdots + a_nx_n)]\,x_i\} = 0 \qquad i = 1, \ldots, n    (7-82)

This important result is known also as the projection theorem. Setting i = 1, \ldots, n in (7-82), we obtain the system

R_{11}a_1 + R_{21}a_2 + \cdots + R_{n1}a_n = R_{01}
R_{12}a_1 + R_{22}a_2 + \cdots + R_{n2}a_n = R_{02}
\cdots
R_{1n}a_1 + R_{2n}a_2 + \cdots + R_{nn}a_n = R_{0n}    (7-83)

where R_{ij} = E\{x_i x_j\} and R_{0i} = E\{s x_i\}.
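A minimal numerical sketch of the projection theorem, assuming numpy; the joint statistics of s and the data are arbitrary choices:

    import numpy as np

    rng = np.random.default_rng(4)
    n, N = 3, 200_000
    X = rng.standard_normal((N, n)) @ rng.standard_normal((n, n))  # correlated data
    s = X @ np.array([1.0, -2.0, 0.5]) + rng.standard_normal(N)    # arbitrary target

    R = (X.T @ X) / N                   # R_ij = E{x_i x_j}
    r0 = (X.T @ s) / N                  # R_0i = E{s x_i}
    a = np.linalg.solve(R, r0)          # normal equations (7-83)

    err = s - X @ a
    print((X.T @ err) / N)              # ~ 0: error orthogonal to each datum (7-82)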
FIGURE 7-4 [(a) and (b); part (b) illustrates the Gram-Schmidt construction of (7-100)]
The integral \int s f_s(s \mid X)\, ds = E\{s \mid X\} is the conditional mean (regression surface) of the random variable s assuming X = X.

Since all quantities are positive, it follows that P is minimum if the conditional MS error is minimum. In the above integral, g(X) is constant. Hence the integral is minimum if g(X) is given by (7-89) [see also (7-71)].
The general orthogonality principle. From the projection theorem (7-82) it follows that

E\{[s - \hat{s}](c_1x_1 + \cdots + c_nx_n)\} = 0    (7-92)

for any c_1, \ldots, c_n. This shows that if \hat{s} is the linear MS estimator of s, the estimation error s - \hat{s} is orthogonal to any linear function y = c_1x_1 + \cdots + c_nx_n of the data x_i.
We shall now show that if g(X) is the nonlinear MS estimator of s, the estimation error s - g(X) is orthogonal to any function w(X), linear or nonlinear, of the data x_i:

E\{[s - g(X)]w(X)\} = 0    (7-93)
Normality. Using the material just developed, we shall show that if the random variables s, x_1, \ldots, x_n are jointly normal with zero mean, the linear and nonlinear estimators of s are equal:

\hat{s} = a_1x_1 + \cdots + a_nx_n = g(X) = E\{s \mid X\}    (7-95)

Proof. To prove (7-95), it suffices to show that \hat{s} = E\{s \mid X\}. The random variables s - \hat{s} and x_i are jointly normal with zero mean and orthogonal; hence they are independent. From this it follows that

E\{s - \hat{s} \mid X\} = E\{s - \hat{s}\} = 0 = E\{s \mid X\} - E\{\hat{s} \mid X\}

and since E\{\hat{s} \mid X\} = \hat{s}, (7-95) results.
Conditional densities of normal random variables. We shall use the preceding result to simplify the determination of conditional densities involving normal random variables. The conditional density f_s(s \mid X) of s assuming X is the ratio of two exponentials the exponents of which are quadratics; hence it is normal. To determine it, it suffices, therefore, to find the conditional mean and variance of s. We maintain that

E\{s \mid X\} = \hat{s} \qquad \sigma^2_{s|X} = E\{(s - \hat{s})^2\} = P    (7-96)

The first follows from (7-95). The second follows from the fact that s - \hat{s} is orthogonal to, and therefore independent of, X. We thus conclude that

f_s(s \mid X) = \frac{1}{\sqrt{2\pi P}}\, e^{-(s - \hat{s})^2/2P}    (7-97)
EXAMPLE 7-11 ▶ The random variables x_1 and x_2 are jointly normal with zero mean. We shall determine their conditional density f(x_2 \mid x_1). As we know [see (7-78)],

E\{x_2 \mid x_1\} = ax_1 \qquad a = \frac{R_{12}}{R_{11}}

\sigma^2_{x_2|x_1} = P = E\{(x_2 - ax_1)x_2\} = R_{22} - aR_{12}
EXAMPLE 7-12 ▶ We now wish to find the conditional density f(x_3 \mid x_1, x_2). In this case,

E\{x_3 \mid x_1, x_2\} = a_1x_1 + a_2x_2

where the constants a_1 and a_2 are determined from the system

R_{11}a_1 + R_{12}a_2 = R_{13} \qquad R_{12}a_1 + R_{22}a_2 = R_{23}

Furthermore [see (7-96) and (7-85)],

\sigma^2_{x_3|x_1,x_2} = P = R_{33} - (R_{13}a_1 + R_{23}a_2)

and (7-97) yields

f(x_3 \mid x_1, x_2) = \frac{1}{\sqrt{2\pi P}}\, e^{-(x_3 - a_1x_1 - a_2x_2)^2/2P}
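A minimal numerical sketch of Examples 7-11 and 7-12, assuming numpy; the correlation matrix R below is an arbitrary choice:

    import numpy as np

    # Arbitrary covariance matrix R_ij = E{x_i x_j} of three zero-mean normals.
    R = np.array([[2.0, 0.8, 0.5],
                  [0.8, 1.5, 0.3],
                  [0.5, 0.3, 1.0]])

    # Example 7-11: E{x2 | x1} = a x1 with a = R12 / R11.
    a = R[0, 1] / R[0, 0]
    P = R[1, 1] - a * R[0, 1]              # conditional variance of x2 given x1
    print(a, P)

    # Example 7-12: E{x3 | x1, x2} = a1 x1 + a2 x2 from the 2x2 system.
    a12 = np.linalg.solve(R[:2, :2], R[:2, 2])
    P3 = R[2, 2] - R[:2, 2] @ a12          # sigma^2 of x3 given x1, x2
    print(a12, P3)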
EXAMPLE 7-13 ▶ In this example, we shall find the two-dimensional density f(x_2, x_3 \mid x_1). This involves the evaluation of five parameters [see (6-23)]: two conditional means, two conditional variances, and the conditional covariance of the random variables x_2 and x_3 assuming x_1.

The first four parameters are determined as in Example 7-11:

E\{x_2 \mid x_1\} = \frac{R_{12}}{R_{11}}x_1 \qquad E\{x_3 \mid x_1\} = \frac{R_{13}}{R_{11}}x_1
Thus the determination of the projection \hat{s} of s is simplified if the data x_i are expressed in terms of an orthonormal set of vectors. This is done as follows. We wish to find a set \{z_k\} of n orthonormal random variables z_k linearly equivalent to the data set \{x_k\}. By this we mean that each z_k is a linear function of the elements of the set \{x_k\} and each x_k is a linear function of the elements of the set \{z_k\}. The set \{z_k\} is not unique. We shall determine it using the Gram-Schmidt method (Fig. 7-4b). In this method, each z_k depends only on the first k data x_1, \ldots, x_k. Thus

z_1 = \gamma_1^1 x_1
z_2 = \gamma_1^2 x_1 + \gamma_2^2 x_2
\cdots
z_n = \gamma_1^n x_1 + \cdots + \gamma_n^n x_n    (7-100)
Each z_k must be orthogonal to z_1, \ldots, z_{k-1}; this is a system of k - 1 equations for the k unknowns \gamma_1^k, \ldots, \gamma_k^k. The condition E\{z_k^2\} = 1 yields one more equation.
The system (7-100) can be written in vector form

Z = X\Gamma    (7-102)

where Z is a row vector with elements z_k. Solving for X, we obtain

X = Z\Gamma^{-1} = ZL \qquad x_1 = l_1^1 z_1 \quad x_2 = l_1^2 z_1 + l_2^2 z_2 \quad \cdots    (7-103)
In the above, the matrix \Gamma and its inverse L = \Gamma^{-1} are upper triangular:

\Gamma = \begin{bmatrix} \gamma_1^1 & \gamma_1^2 & \cdots & \gamma_1^n \\ 0 & \gamma_2^2 & \cdots & \gamma_2^n \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & \gamma_n^n \end{bmatrix} \qquad L = \begin{bmatrix} l_1^1 & l_1^2 & \cdots & l_1^n \\ 0 & l_2^2 & \cdots & l_2^n \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & l_n^n \end{bmatrix}
Since E\{z_i z_j\} = \delta[i - j] by construction, we conclude that

E\{Z^t Z\} = I_n = E\{\Gamma^t X^t X \Gamma\} = \Gamma^t E\{X^t X\}\Gamma    (7-104)
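Equation (7-104) says that \Gamma^t R \Gamma = I with R = E\{X^t X\}; an upper-triangular \Gamma with this property can be obtained from the Cholesky factorization of R. A minimal sketch assuming numpy; the matrix R is an arbitrary choice:

    import numpy as np

    # Arbitrary positive-definite correlation matrix R = E{X^t X}.
    R = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])

    Lc = np.linalg.cholesky(R)             # R = Lc Lc^t, Lc lower triangular
    Gamma = np.linalg.inv(Lc).T            # upper triangular, Gamma^t R Gamma = I

    print(np.round(Gamma.T @ R @ Gamma, 10))   # identity matrix, as in (7-104)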
The ith measurement of an unknown constant a can be written as

x_i = a + \nu_i

where the noise components \nu_i are independent random variables with zero mean and variance \sigma^2. This leads to the conclusion that the sample mean

\bar{x} = \frac{x_1 + \cdots + x_n}{n}    (7-109)

of the measurements is a random variable with mean a and variance \sigma^2/n. If, therefore, n is so large that \sigma^2 \ll na^2, then the value \bar{x}(\zeta) of the sample mean \bar{x} in a single performance of the experiment S^n (consisting of n independent measurements) is a satisfactory estimate of the unknown a.
To find a bound of the error in the estimate of a by \bar{x}, we apply (7-108). To be concrete, we assume that n is so large that \sigma^2/na^2 = 10^{-4}, and we ask for the probability that \bar{x} is between 0.9a and 1.1a. The answer is given by (7-108) with \varepsilon = 0.1a:

P\{0.9a < \bar{x} < 1.1a\} \ge 1 - \frac{100\sigma^2}{na^2} = 0.99

Thus, if the experiment is performed n = 10^4\sigma^2/a^2 times, then "almost certainly" in 99 percent of the cases, the estimate \bar{x} of a will be between 0.9a and 1.1a. Motivated by the above, we introduce next various convergence modes involving sequences of random variables.
For a specific \zeta, x_n(\zeta) is a sequence of numbers that might or might not converge. This suggests that the notion of convergence of a random sequence might be given several interpretations:

Convergence almost everywhere (a.e.) We say that the sequence x_n tends to the random variable x almost everywhere if x_n(\zeta) \to x(\zeta) for almost every \zeta (7-112); that is, if

P\{x_n \to x\} = 1    (7-113)

In (7-113), \{x_n \to x\} is an event consisting of all outcomes \zeta such that x_n(\zeta) \to x(\zeta).
Convergence in the MS sense (MS) The sequence x_n tends to the random variable x in the MS sense if

E\{|x_n - x|^2\} \to 0 \quad \text{as } n \to \infty    (7-114)

This is called limit in the mean and it is often written in the form

\text{l.i.m. } x_n = x
Convergence in probability (P) The probability P\{|x - x_n| > \varepsilon\} of the event \{|x - x_n| > \varepsilon\} is a sequence of numbers depending on \varepsilon. If this sequence tends to 0,

P\{|x - x_n| > \varepsilon\} \to 0 \qquad n \to \infty    (7-115)

for any \varepsilon > 0, then we say that the sequence x_n tends to the random variable x in probability (or in measure). This is also called stochastic convergence.
Convergence in distribution (d) We denote by F_n(x) and F(x), respectively, the distributions of the random variables x_n and x. If

F_n(x) \to F(x) \qquad n \to \infty    (7-116)

for every point x of continuity of F(x), then we say that the sequence x_n tends to the random variable x in distribution. We note that, in this case, the sequence x_n(\zeta) need not converge for any \zeta.
Cauchy criterion As we noted, a deterministic sequence x_n converges if it satisfies (7-111). This definition involves the limit x of x_n. The following theorem, known as the Cauchy criterion, establishes conditions for the convergence of x_n that avoid the use of x: If

|x_{n+m} - x_n| \to 0 \quad \text{as } n \to \infty    (7-117)

for any m > 0, then the sequence x_n converges.

The above theorem holds also for random sequences. In this case, the limit must be interpreted accordingly. For example, if

E\{|x_{n+m} - x_n|^2\} \to 0 \quad \text{as } n \to \infty

for every m > 0, then the random sequence x_n converges in the MS sense.
From Tchebycheff's inequality,

P\{|x_n - x| > \varepsilon\} \le \frac{E\{|x_n - x|^2\}}{\varepsilon^2}

If x_n \to x in the MS sense, then for a fixed \varepsilon > 0 the right side tends to 0; hence the left side also tends to 0 as n \to \infty and (7-115) follows. The converse, however, is not necessarily true. If x_n is not bounded, then P\{|x_n - x| > \varepsilon\} might tend to 0 but not E\{|x_n - x|^2\}. If, however, x_n vanishes outside some interval (-c, c) for every n > n_0, then P convergence and MS convergence are equivalent.
It is self-evident that a.e. convergence in (7-112) implies P convergence. We shall give a heuristic argument for why the converse is not true. In Fig. 7-6, we plot the difference |x_n(\zeta) - x(\zeta)| as a function of n; for each outcome \zeta the sequence is drawn as a curve, and each curve represents, thus, |x_n(\zeta) - x(\zeta)|. Convergence in probability means that for a specific n > n_0, only a small percentage of these curves exceed \varepsilon (Fig. 7-6a); it is possible that not even one of them will remain less than \varepsilon for every n > n_0. Convergence a.e., on the other hand, demands that most curves will be below \varepsilon for every n > n_0 (Fig. 7-6b).
FIGURE 7-6 [(a) convergence in probability; (b) convergence a.e.]

The law of large numbers (Bernoulli). In Sec. 3-3 we showed that if the probability of an event A in a given experiment equals p and the number of successes of A in n trials equals k, then

P\left\{\left|\frac{k}{n} - p\right| < \varepsilon\right\} \to 1 \quad \text{as } n \to \infty    (7-118)
We shall reestablish this result as a limit of a sequence of random variables. For this purpose, we introduce the random variables

x_i = \begin{cases} 1 & \text{if } A \text{ occurs at the } i\text{th trial} \\ 0 & \text{otherwise} \end{cases}

We shall show that the sample mean

\bar{x}_n = \frac{x_1 + \cdots + x_n}{n}

of these random variables tends to p in probability as n \to \infty.

Proof. As we know,

E\{x_i\} = E\{\bar{x}_n\} = p

Furthermore, the variance of \bar{x}_n equals pq/n; hence, from Tchebycheff's inequality, P\{|\bar{x}_n - p| > \varepsilon\} \le pq/n\varepsilon^2 \to 0 as n \to \infty, and the proof is complete.
The strong law of large numbers (Borel). It can be shown that \bar{x}_n tends to p not only in probability, but also with probability 1 (a.e.). This result, due to Borel, is known as the strong law of large numbers. The proof will not be given. We give below only a heuristic explanation of the difference between (7-118) and the strong law of large numbers in terms of relative frequencies.
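A minimal simulation sketch of the law of large numbers, assuming numpy; the probability p = 0.3 and the trial counts are arbitrary choices:

    import numpy as np

    rng = np.random.default_rng(5)
    p = 0.3                                   # arbitrary probability of the event A
    for n in (100, 10_000, 1_000_000):
        k = rng.random(n) < p                 # successes of A in n trials
        print(n, k.mean())                    # k/n -> p as n grows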
Ergodicity. Ergodicity is a topic dealing with the relationship between statistical aver-
ages and sample averages. This topic is treated in Sec. 11-1. In the following, we discuss
certain results phrased in the form of limits of random sequences.
Markov's theorem. We are given a sequence x_i of random variables and we form their sample mean

\bar{x}_n = \frac{x_1 + \cdots + x_n}{n}

Clearly, \bar{x}_n is a random variable whose values \bar{x}_n(\zeta) depend on the experimental outcome \zeta. We maintain that, if the random variables x_i are such that the mean \bar{\eta}_n of \bar{x}_n tends to a limit \eta and its variance \bar{\sigma}_n^2 tends to 0 as n \to \infty,

E\{\bar{x}_n\} = \bar{\eta}_n \to \eta \qquad \bar{\sigma}_n^2 = E\{(\bar{x}_n - \bar{\eta}_n)^2\} \to 0    (7-119)

then \bar{x}_n \to \eta in the MS sense.
COROLLARY ▶ (Tchebycheff's condition.) If the random variables x_i are uncorrelated and

\frac{1}{n^2}\sum_{i=1}^{n}\sigma_i^2 \to 0 \quad \text{as } n \to \infty    (7-121)

then

\bar{x}_n \to \eta = \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}E\{x_i\}

in the MS sense.
Proof. It follows from the theorem because, for uncorrelated random variables, the left side of (7-121) equals \bar{\sigma}_n^2.

We note that Tchebycheff's condition (7-121) is satisfied if \sigma_i < K < \infty for every i. This is the case if the random variables x_i are i.i.d. with finite variance. ◀
Khinchin We mention without proof that if the random variables x_i are i.i.d., then their sample mean \bar{x}_n tends to \eta even if nothing is known about their variance. In this case, however, \bar{x}_n tends to \eta in probability only. The following is an application:
EXAMPLE 7-14 ▶ We wish to determine the distribution F(x) of a random variable x defined in a certain experiment. For this purpose we repeat the experiment n times and form the random variables x_i as in (7-12). As we know, these random variables are i.i.d. and their common distribution equals F(x). We next form the random variables

y_i(x) = \begin{cases} 1 & \text{if } x_i \le x \\ 0 & \text{if } x_i > x \end{cases}
where x is a fixed number. The random variables y_i(x) so formed are also i.i.d. and their mean equals

E\{y_i(x)\} = 1 \times P\{y_i = 1\} = P\{x_i \le x\} = F(x)

Applying Khinchin's theorem to y_i(x), we conclude that

\frac{y_1(x) + \cdots + y_n(x)}{n} \to F(x) \quad \text{as } n \to \infty

in probability. Thus, to determine F(x), we repeat the original experiment n times and count the number of times the random variable x is less than x. If this number equals k and n is sufficiently large, then F(x) \simeq k/n. The above is thus a restatement of the relative frequency interpretation (4-3) of F(x) in the form of a limit theorem. ◀
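A minimal sketch of Example 7-14, assuming numpy; the exponential distribution and the evaluation point x = 1.2 are arbitrary choices:

    import numpy as np

    rng = np.random.default_rng(6)
    n = 100_000
    xi = rng.exponential(scale=1.0, size=n)   # n repetitions of the experiment

    x = 1.2                                   # a fixed evaluation point
    k = np.sum(xi <= x)                       # number of times x_i <= x
    print(k / n, 1 - np.exp(-x))              # k/n ~ F(x) = 1 - e^{-x}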
Given n independent random variables x_i, we form their sum x = x_1 + \cdots + x_n. This is a random variable with mean \eta = \eta_1 + \cdots + \eta_n and variance \sigma^2 = \sigma_1^2 + \cdots + \sigma_n^2. The central limit theorem (CLT) states that under certain general conditions, the distribution F(x) of x approaches a normal distribution with the same mean and variance:

F(x) \simeq G\left(\frac{x - \eta}{\sigma}\right)    (7-122)

as n increases. Furthermore, if the random variables x_i are of continuous type, the density f(x) of x approaches a normal density (Fig. 7-7a):

f(x) \simeq \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x - \eta)^2/2\sigma^2}    (7-123)
This important theorem can be stated as a limit: If z = (x - \eta)/\sigma, then

F_z(z) \to G(z) \qquad f_z(z) \to \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2} \quad \text{as } n \to \infty

for the general and for the continuous case, respectively. The proof is outlined later.
The CLT can be expressed as a property of convolutions: The convolution of a
large number of positive functions is approximately a normal function [see (7-51)].
FIGURE 7-7 [(a) the density f(x) and its normal approximation]
The nature of the CLT approximation and the required value of n for a specified error bound depend on the form of the densities f_i(x). If the random variables x_i are i.i.d., the value n = 30 is adequate for most applications. In fact, if the functions f_i(x) are smooth, values of n as low as 5 can be used. The next example is an illustration.
EXAMPLE 7-15 ▶ The random variables x_i are i.i.d. and uniformly distributed in the interval (0, T). We shall compare the density f_x(x) of their sum x with the normal approximation (7-123) for n = 2 and n = 3. In this problem,

\eta_i = \frac{T}{2} \qquad \sigma_i^2 = \frac{T^2}{12} \qquad \eta = n\frac{T}{2} \qquad \sigma^2 = n\frac{T^2}{12}

For n = 2, f(x) is a triangle obtained by convolving a pulse with itself (Fig. 7-8b), and the normal approximation equals \frac{1}{T}\sqrt{3/\pi}\, e^{-3(x-T)^2/T^2}; for n = 3, f(x) consists of three parabolic pieces (Fig. 7-8c), and the approximation equals \frac{1}{T}\sqrt{2/\pi}\, e^{-2(x-1.5T)^2/T^2}. ◀

FIGURE 7-8 [(a) the uniform pulse f_i(x); (b) f(x) for n = 2 with its normal approximation; (c) f(x) for n = 3 with its normal approximation]
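A minimal simulation sketch of Example 7-15, assuming numpy; T = 1 and the sample size are arbitrary choices:

    import numpy as np

    rng = np.random.default_rng(7)
    T, N = 1.0, 1_000_000
    for n in (2, 3):
        x = rng.uniform(0, T, size=(N, n)).sum(axis=1)
        eta, var = n*T/2, n*T**2/12
        hist, edges = np.histogram(x, bins=60, density=True)
        c = 0.5 * (edges[:-1] + edges[1:])
        approx = np.exp(-(c - eta)**2 / (2*var)) / np.sqrt(2*np.pi*var)
        print(n, np.max(np.abs(hist - approx)))   # small even for n = 2, 3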
We give next an illustration in the context of Bernoulli trials. The random variables x_i of Example 7-7 are i.i.d., taking the values 1 and 0 with probabilities p and q respectively; hence their sum x is of lattice type taking the values k = 0, 1, \ldots, n. In this case,

E\{x\} = nE\{x_i\} = np \qquad \sigma^2 = n\,\text{Var}\{x_i\} = npq

Inserting into (7-124), we obtain the approximation

P\{x = k\} \simeq \frac{1}{\sqrt{2\pi npq}}\, e^{-(k - np)^2/2npq}

This shows that the DeMoivre-Laplace theorem (4-90) is a special case of the lattice-type form (7-124) of the central limit theorem.
EXAMPLE 7-16 ▶ A fair coin is tossed six times and x_i is the zero-one random variable associated with the event {heads at the ith toss}. The probability of k heads in six tosses equals

P\{x = k\} = \binom{6}{k}\frac{1}{2^6} = p_k

In Fig. 7-9 we plot p_k and its normal approximation. ◀
ERROR CORRECTION. In the approximation of f(x) by the normal curve N(\eta, \sigma^2), the error

\varepsilon(x) = f(x) - \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2}

results, where we assumed, shifting the origin, that \eta = 0. We shall express this error in terms of the moments

m_n = E\{x^n\}

and the Hermite polynomials H_n(x).
FIGURE 7-9 [the probabilities p_k for k = 0, 1, \ldots, 6 and the approximation \frac{1}{\sqrt{3\pi}}\, e^{-(k-3)^2/3}]
These polynomials are orthogonal with respect to the weight e^{-x^2/2}:

\int_{-\infty}^{\infty} e^{-x^2/2} H_n(x) H_m(x)\, dx = \begin{cases} n!\sqrt{2\pi} & n = m \\ 0 & n \ne m \end{cases}
Hence \varepsilon(x) can be written as a series

\varepsilon(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2}\sum_{k=3}^{\infty} C_k H_k\!\left(\frac{x}{\sigma}\right)    (7-127)
The series starts with k = 3 because the moments of \varepsilon(x) of order up to 2 are 0. The coefficients C_k can be expressed in terms of the moments m_n of x. Equating moments of order n = 3 and n = 4, we obtain [see (5-73)]

f(x) \simeq \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2}\left[1 + \frac{m_3}{6\sigma^3}\left(\frac{x^3}{\sigma^3} - \frac{3x}{\sigma}\right)\right]    (7-128)

and, retaining also the fourth-moment term,

f(x) \simeq \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x^2/2\sigma^2}\left[1 + \frac{m_3}{6\sigma^3}\left(\frac{x^3}{\sigma^3} - \frac{3x}{\sigma}\right) + \frac{m_4 - 3\sigma^4}{24\sigma^4}\left(\frac{x^4}{\sigma^4} - \frac{6x^2}{\sigma^2} + 3\right)\right]    (7-129)
EXAMPLE 7-17 ▶ If the random variables x_i are i.i.d. with density f_i(x) as in Fig. 7-10a, then f(x) consists of three parabolic pieces (see also Example 7-15) and N(0, 1/4) is its normal approximation. Since f(x) is even and m_4 = 13/80 (see Prob. 7-4), (7-129) yields

f(x) \simeq \sqrt{\frac{2}{\pi}}\, e^{-2x^2}\left(1 - \frac{4x^4}{15} + \frac{2x^2}{5} - \frac{1}{20}\right) = \bar{f}(x)

In Fig. 7-10b, we show the error \varepsilon(x) of the normal approximation and the first-order correction error f(x) - \bar{f}(x). ◀
ON THE PROOF OF THE CENTRAL LIMIT THEOREM. We shall justify the approximation (7-123) using characteristic functions. We assume for simplicity that \eta_i = 0. Denoting by \Phi_i(\omega) and \Phi(\omega), respectively, the characteristic functions of the random variables x_i and x = x_1 + \cdots + x_n, we conclude from the independence of x_i that

\Phi(\omega) = \Phi_1(\omega)\cdots\Phi_n(\omega)
FIGURE 7-10 [(a) the density f_i(x) and, for n = 3, f(x) with the normal approximation \sqrt{2/\pi}\, e^{-2x^2}; (b) the errors \varepsilon(x) and f(x) - \bar{f}(x)]
Near the origin, the functions \Psi_i(\omega) = \ln\Phi_i(\omega) can be approximated by a parabola:

\Psi_i(\omega) \simeq -\frac{\sigma_i^2\omega^2}{2} \qquad \Phi_i(\omega) \simeq e^{-\sigma_i^2\omega^2/2} \quad \text{for } |\omega| < \varepsilon    (7-130)

If the random variables x_i are of continuous type, then [see (5-95) and Prob. 5-29]

\Phi_i(0) = 1 \qquad |\Phi_i(\omega)| < 1 \quad \text{for } |\omega| \ne 0    (7-131)

Equation (7-131) suggests that for small \varepsilon and large n, the function \Phi(\omega) is negligible for |\omega| > \varepsilon (Fig. 7-11a). This holds also for the exponential e^{-\sigma^2\omega^2/2} if \sigma \to \infty as in (7-135). From our discussion it follows that

\Phi(\omega) \simeq e^{-\sigma_1^2\omega^2/2}\cdots e^{-\sigma_n^2\omega^2/2} = e^{-\sigma^2\omega^2/2} \quad \text{for all } \omega    (7-132)

in agreement with (7-123).
The exact form of the theorem states that the distribution of the normalized random variable

z = \frac{x_1 + \cdots + x_n}{\sigma} \qquad \sigma^2 = \sigma_1^2 + \cdots + \sigma_n^2

tends to the N(0, 1) distribution G(z) as n \to \infty.

FIGURE 7-11 [(a) \Phi(\omega), negligible for |\omega| > \varepsilon; (b) the periodic \Phi_i(\omega) of a lattice-type random variable]
Hence, for i.i.d. random variables,

\Phi_z(\omega) = \Phi_i^n\!\left(\frac{\omega}{\sigma_i\sqrt{n}}\right)

Expanding the functions \Psi_i(\omega) = \ln\Phi_i(\omega) near the origin, we obtain

\Psi_i(\omega) = -\frac{\sigma_i^2\omega^2}{2} + O(\omega^3)

Hence,

\Psi_z(\omega) = n\Psi_i\!\left(\frac{\omega}{\sigma_i\sqrt{n}}\right) = -\frac{\omega^2}{2} + O\!\left(\frac{1}{\sqrt{n}}\right) \to -\frac{\omega^2}{2} \quad \text{as } n \to \infty    (7-134)

It can be shown that the conclusion holds also for independent but not identically distributed x_i, provided that:
(a) \quad \sigma_1^2 + \cdots + \sigma_n^2 \to \infty \quad \text{as } n \to \infty    (7-135)
(b) There exists a number \alpha > 2 and a finite constant K such that

\int_{-\infty}^{\infty} |x|^{\alpha} f_i(x)\, dx < K < \infty \quad \text{for all } i    (7-136)

These conditions are not the most general. However, they cover a wide range of applications. For example, (7-135) is satisfied if there exists a constant \varepsilon > 0 such that \sigma_i > \varepsilon for all i. Condition (7-136) is satisfied if all densities f_i(x) are 0 outside a finite interval (-c, c) no matter how large.
Lattice type The preceding reasoning can also be applied to discrete-type random variables. However, in this case the functions \Phi_i(\omega) are periodic (Fig. 7-11b) and their product takes significant values only in a small region near the points \omega = 2\pi n/a. Using the approximation (7-132) in each of these regions, we obtain

\Phi(\omega) \simeq \sum_{n} e^{-\sigma^2(\omega - n\omega_0)^2/2} \qquad \omega_0 = \frac{2\pi}{a}    (7-137)

As we can see from (11A-1), the inverse of the above yields (7-124).
The Berry-Esséen theorem³ This theorem states that if

E\{|x_i|^3\} \le c\,\sigma_i^2 \quad \text{for all } i    (7-138)

where c is some constant, then the distribution F(x) of the normalized sum

\bar{x} = \frac{x_1 + \cdots + x_n}{\sigma}

is close to the normal distribution G(x) in the following sense:

|F(x) - G(x)| < \frac{4c}{\sigma}    (7-139)
The central limit theorem is a corollary of (7-139) because (7-139) leads to the conclusion that

F(x) \to G(x) \quad \text{as } \sigma \to \infty    (7-140)

This proof is based on condition (7-138). This condition, however, is not too restrictive. It holds, for example, if the random variables x_i are i.i.d. and their third moment is finite. We note, finally, that whereas (7-140) establishes merely the convergence in distribution of \bar{x} to a normal random variable, (7-139) gives also a bound of the deviation of F(x) from normality.
The central limit theorem for products. Given n independent positive random variables x_i, we form their product:

x = x_1 x_2 \cdots x_n \qquad x_i > 0

Since \ln x = \ln x_1 + \cdots + \ln x_n is a sum of independent random variables, the CLT shows that for large n the density of x is approximately lognormal:

f_x(x) \simeq \frac{1}{\sigma x\sqrt{2\pi}}\, e^{-(\ln x - \eta)^2/2\sigma^2}\, U(x)

where

\eta = \sum_{i=1}^{n} E\{\ln x_i\} \qquad \sigma^2 = \sum_{i=1}^{n} \text{Var}(\ln x_i)
EXAMPLE 7-18 ▶ Suppose that the random variables x_i are uniform in the interval (0, 1). In this case,

E\{\ln x_i\} = \int_0^1 \ln x\, dx = -1 \qquad \text{Var}\{\ln x_i\} = \int_0^1 \ln^2 x\, dx - 1 = 1

hence \eta = -n and \sigma^2 = n; the product x is approximately lognormal with these parameters. ◀
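A minimal simulation sketch of Example 7-18, assuming numpy; n = 50 and the sample size are arbitrary choices:

    import numpy as np

    rng = np.random.default_rng(8)
    n, N = 50, 200_000
    x = rng.uniform(0, 1, size=(N, n)).prod(axis=1)   # products of n uniforms

    lx = np.log(x)                                    # ln x = sum of ln x_i
    print(lx.mean(), -n)                              # ~ eta = -n
    print(lx.var(), n)                                # ~ sigma^2 = n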
In the remainder of this section, we comment briefly on the meaning and generation of random numbers. We start with a simple illustration of the role of statistics in the numerical solution of deterministic problems.
MONTE CARLO INTEGRATION. We wish to evaluate the integral

I = \int_0^1 g(x)\, dx    (7-142)

For this purpose, we introduce a random variable x with uniform distribution in the interval (0, 1) and we form the random variable y = g(x). As we know,

E\{y\} = \int_0^1 g(x) f_x(x)\, dx = \int_0^1 g(x)\, dx    (7-143)

hence \eta_y = I. We have thus expressed the unknown I as the expected value of the random variable y. This result involves only concepts; it does not yield a numerical method for evaluating I. Suppose, however, that the random variable x models a physical quantity in a real experiment. We can then estimate I using the relative frequency interpretation of expected values: We repeat the experiment a large number of times and observe the values x_i of x; we compute the corresponding values y_i = g(x_i) of y and form their average as in (5-51). This yields

I = E\{g(x)\} \simeq \frac{1}{n}\sum_{i} g(x_i)    (7-144)
This suggests the method described next for determining I: The data x_i, no matter how they are obtained, are random numbers; that is, they are numbers having certain properties. If, therefore, we can numerically generate such numbers, we have a method for determining I. To carry out this method, we must reexamine the meaning of random numbers and develop computer programs for generating them.
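A minimal sketch of (7-144), assuming numpy; the integrand g and the sample count are arbitrary choices:

    import numpy as np

    rng = np.random.default_rng(9)
    g = lambda x: np.exp(-x**2)            # arbitrary integrand on (0, 1)

    xi = rng.uniform(0, 1, 1_000_000)      # uniform random numbers x_i
    I = g(xi).mean()                       # I ~ (1/n) sum g(x_i)   (7-144)
    print(I)                               # ~ 0.7468, the value of the integral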
THE DUAL INTERPRETATION OF RANDOM NUMBERS. "What are random numbers? Can they be generated by a computer? Is it possible to generate truly random number sequences?" Such questions do not have a generally accepted answer. The reason is simple. As in the case of probability (see Chap. 1), the term random numbers has two distinctly different meanings. The first is theoretical: Random numbers are mental constructs defined in terms of an abstract model. The second is empirical: Random numbers are sequences of real numbers generated either as physical data obtained from a random experiment or as computer output obtained from a deterministic program. The duality of interpretation of random numbers is apparent in the following extensively quoted definitions⁴:

A sequence of numbers is random if it has every property that is shared by all infinite sequences of independent samples of random variables from the uniform distribution. (J. N. Franklin)

A random sequence is a vague notion embodying the ideas of a sequence in which each term is unpredictable to the uninitiated and whose digits pass a certain number of tests, traditional with statisticians and depending somewhat on the uses to which the sequence is to be put. (D. H. Lehmer)
⁴D. E. Knuth, The Art of Computer Programming, Addison-Wesley, Reading, MA, 1969.
It is obvious that these definitions cannot have the same meaning. Nevertheless, both are used to define random number sequences. To avoid this confusing ambiguity, we shall give two definitions: one theoretical, the other empirical. For these definitions, we shall rely solely on the uses for which random numbers are intended: Random numbers are used to apply statistical techniques to other fields. It is natural, therefore, that they are defined in terms of the corresponding probabilistic concepts and that their properties as physically generated numbers are expressed directly in terms of the properties of real data generated by random experiments.
LEHMER'S ALGORITHM. The simplest and one of the oldest random number generators is the recursion

z_n = a z_{n-1} \bmod m \qquad z_0 = 1 \qquad n \ge 1    (7-146)

where m is a large prime number and a is an integer. Solving, we obtain

z_n = a^n \bmod m    (7-147)

The sequence z_n takes values between 1 and m - 1; hence at least two of its first m values are equal. From this it follows that z_n is a periodic sequence for n > m with period m_0 \le m - 1. A periodic sequence is not, of course, random. However, if for the applications for which it is intended the required number of samples does not exceed m_0, periodicity is irrelevant. For most applications, it is enough to choose for m a number of the order of 10^9 and to search for a constant a such that m_0 = m - 1. A value for m suggested by Lehmer in 1951 is the prime number 2^{31} - 1.
To complete the specification of (7-146), we must assign a value to the multiplier a. Our first condition is that the period m_0 of the resulting sequence z_n equal m - 1.
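A minimal sketch of the recursion (7-146) in Python. The multiplier a = 16807 = 7^5 is the classic full-period choice for m = 2^{31} - 1; it is an assumption here, not a value fixed by the text above:

    M = 2**31 - 1        # Lehmer's prime modulus
    A = 16807            # a common full-period multiplier (an assumption)

    def lehmer(z):
        """One step of z_n = a * z_{n-1} mod m, with z_0 = 1."""
        return (A * z) % M

    z = 1
    for _ in range(5):
        z = lehmer(z)
        print(z, z / M)  # z/M gives numbers roughly uniform in (0, 1)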