2.1 INTRODUCTION
functions and density functions are developed. We then discuss summary measures
(averages or expected values) that frequently prove useful in characterizing
random variables.
Vector-valued random variables (or random vectors, as they are often referred
to) and methods of characterizing them are introduced in Section 2.5.
Various multivariate distribution and density functions that form the basis of
probability models for random vectors are presented.
As electrical engineers, we are often interested in calculating the response
of a system for a given input. Procedures for calculating the details of the
probability model for the output of a system driven by a random input are
developed in Section 2.6.
In Section 2.7, we introduce inequalities for computing probabilities, which
are often very useful in many applications because they require less knowledge
about the random variables. A series approximation to a density function based
on some of its moments is introduced, and an approximation to the distribution
of a random variable that is a nonlinear function of other (known) random vari-
ables is presented.
Convergence of sequences of random variables is the final topic introduced
in this chapter. Examples of convergence are the law of large numbers and the
central limit theorem.
the null set are called finite sets. A set that is not countable is called uncountable,
and a set that is not finite is called an infinite set.
A ⊂ B

or equivalently

B ⊃ A

For any set A in a sample space S,

A ⊂ S,   ∅ ⊂ A,   A ⊂ A
Set Equality. Two arbitrary sets, A and B, are called equal if and only if they
contain exactly the same elements, or equivalently, if A ⊂ B and B ⊂ A.

Union. The union of two sets A and B is written as

A ∪ B

and is the set of all elements that belong to A or belong to B (or to both). The
union of N sets is obtained by repeated application of the foregoing definition
and is denoted by

A_1 ∪ A_2 ∪ ··· ∪ A_N = ⋃_{i=1}^{N} A_i
Intersection. The intersection of two sets A and B is written as

A ∩ B

and is the set of all elements that belong to both A and B. A ∩ B is also written
AB. The intersection of N sets is written as

A_1 ∩ A_2 ∩ ··· ∩ A_N = ⋂_{i=1}^{N} A_i

Two sets A and B are mutually exclusive (disjoint) if they have no elements in
common, that is, if

A ∩ B = AB = ∅

The sets A_1, A_2, . . . are mutually exclusive if

A_i ∩ A_j = ∅   for all i, j, i ≠ j
Commutative Laws.

A ∪ B = B ∪ A
A ∩ B = B ∩ A

Associative Laws.

(A ∪ B) ∪ C = A ∪ (B ∪ C) = A ∪ B ∪ C
(A ∩ B) ∩ C = A ∩ (B ∩ C) = A ∩ B ∩ C

Distributive Laws.

A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
DeMorgan's Laws.

The complement of A ∪ B is Ā ∩ B̄, and the complement of A ∩ B is Ā ∪ B̄.
1. P(S) = 1                                                         (2.1)
2. P(A) ≥ 0 for every event A                                       (2.2)
3. P(A_1 ∪ A_2 ∪ ···) = P(A_1) + P(A_2) + ···                        (2.3)
   if A_i ∩ A_j = ∅ for i ≠ j; the number of events may be infinite
   (∅ is the empty or null set)
P(A) = lim_{n→∞} n_A/n                                              (2.4)

For example, if a coin (fair or not) is tossed n times and heads show up n_H
times, then the probability of heads equals the limiting value of n_H/n.
P(A) = N_A/N                                                        (2.5)

where N_A is the number of outcomes favorable to A and N is the total number
of equally likely outcomes. If we use this definition to find the probability of a
tail when a coin is tossed, we will obtain an answer of 1/2. This answer is correct
when we have a fair coin. If the coin is not fair, then the classical definition will
lead to incorrect values for probabilities. We can take this possibility into account
and modify the definition.
Up face              1      2      3      4      5      6     Total
Relative frequency  .155   .159   .164   .169   .174   .179   1.000
Classical           1/6    1/6    1/6    1/6    1/6    1/6    1.000

They obtained these frequencies by calculating the excess of even over odd in
Longcor's data and supposing that each side of the die is favored in proportion
to the extent that it has more drilled pips than the opposite side. The 6, since
it is opposite the 1, is the most favored.
then

P(A) = P(A ∩ S) = P[A ∩ (A_1 ∪ A_2 ∪ ··· ∪ A_n)]
     = P[(A ∩ A_1) ∪ (A ∩ A_2) ∪ ··· ∪ (A ∩ A_n)]
     = P(A ∩ A_1) + P(A ∩ A_2) + ··· + P(A ∩ A_n)                   (2.10.e)

The sets A_1, A_2, . . . , A_n are said to be mutually exclusive and exhaustive
if Equations 2.10.c and 2.10.d are satisfied.
P(A_1 ∪ A_2 ∪ A_3 ∪ ···) = P(A_1) + P(A_2 Ā_1) + P(A_3 Ā_1 Ā_2) + ···     (2.11)
S_2 of E_2 consists of outcomes b_1, b_2, . . . , b_{n_2}, then the sample space S of the
combined experiment is the Cartesian product of S_1 and S_2. That is

S = S_1 × S_2 = {(a_i, b_j): i = 1, 2, . . . , n_1;  j = 1, 2, . . . , n_2}
= Σ_{i=1}^{n} P(A_i B_i)                                            (2.12)
(2.13)
Given that the event A has occurred, we know that the outcome is in A. There
are N_A outcomes in A. Now, for B to occur given that A has occurred, the
outcome should belong to A and B. There are N_{AB} outcomes in AB. Thus, the
probability of occurrence of B given A has occurred is

P(B|A) = N_{AB}/N_A                                                 (2.14)

One can show that P(B|A) as defined by Equation 2.14 is a probability measure,
that is, it satisfies Equations 2.1, 2.2, and 2.3.
P(A) = Σ_i P(A|B_i)P(B_i)                                           (2.18)
EXAMPLE 2.2.

                                Class of Defect
               B_1 =    B_2 =      B_3 =     B_4 =   B_5 =
Manufacturer   none     critical   serious   minor   incidental   Totals
M_1            124      6          3         1       6            140
M_2            145      2          4         0       9            160
M_3            115      1          2         1       1            120
M_4            101      2          0         5       2            110
Totals         485      11         9         7       18           530

What is the probability of a component selected at random from the 530 com-
ponents (a) being from manufacturer M_2 and having no defects, (b) having a
critical defect, (c) being from manufacturer M_1, (d) having a critical defect given
the component is from manufacturer M_2, (e) being from manufacturer M_1 given
it has a critical defect?
SOLUTION:

(a) This is a joint probability and is found by assuming that each component
    is equally likely to be selected. There are 145 components from M_2
    having no defects out of a total of 530 components. Thus

    P(M_2 B_1) = 145/530

(b) This calls for a marginal probability.

    P(B_2) = P(M_1 B_2) + P(M_2 B_2) + P(M_3 B_2) + P(M_4 B_2)
           = 6/530 + 2/530 + 1/530 + 2/530 = 11/530

    Note that P(B_2) can also be found in the bottom margin of the table,
    that is, P(B_2) = 11/530.
(d) This conditional probability is found by the interpretation that given the
    component is from manufacturer M_2, there are 160 outcomes in the
    space, two of which have critical defects. Thus

    P(B_2|M_2) = 2/160

    or, equivalently,

    P(B_2|M_2) = P(B_2 M_2)/P(M_2) = (2/530)/(160/530) = 2/160

(e)

    P(M_1|B_2) = 6/11
Bayes' Rule. Thomas Bayes applied Equations 2.15 and 2.18 to arrive at
the form

P(B_j|A) = P(A|B_j)P(B_j) / Σ_{i=1}^{n} P(A|B_i)P(B_i)              (2.19)
EXAMPLE 2.3.

one is received as a one is .90. We also assume the probability a zero is transmitted
is .4. Find

SOLUTION: Defining

A = one transmitted
Ā = zero transmitted
B = one received
B̄ = zero received
(a) P(B) = P(B|A)P(A) + P(B|Ā)P(Ā)
         = .90(.6) + .05(.4)
         = .56

(b) Using Bayes' rule, Equation 2.19,

    P(A|B) = P(B|A)P(A)/P(B) = (.90)(.6)/.56 = 27/28
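The same computation is easy to check numerically. The following sketch (in Python, using only the channel probabilities assumed above) evaluates the total probability and Bayes' rule for this example.

```python
# Total probability and Bayes' rule for the binary channel of Example 2.3.
p_one_tx = 0.6            # P(A): a one is transmitted
p_zero_tx = 0.4           # P(A-bar): a zero is transmitted
p_rx1_given_tx1 = 0.90    # P(B|A): one received given one transmitted
p_rx1_given_tx0 = 0.05    # P(B|A-bar): one received given zero transmitted

p_rx1 = p_rx1_given_tx1 * p_one_tx + p_rx1_given_tx0 * p_zero_tx   # Equation 2.18
p_tx1_given_rx1 = p_rx1_given_tx1 * p_one_tx / p_rx1               # Equation 2.19

print(p_rx1)             # 0.56
print(p_tx1_given_rx1)   # 0.964... = 27/28
```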
or when

P(A_i|B_j) = P(A_i)                                                 (2.20.b)

Equation 2.20.a implies Equation 2.20.b and conversely. Observe that statistical
independence is quite different from mutual exclusiveness. Indeed, if A_i and B_j
are mutually exclusive, then P(A_i B_j) = 0 by definition.
2.3 R A N D O M V A R IA B L E S
X^{-1}(I) = {λ ∈ S : X(λ) ∈ I}

P(X = x) = P{λ : X(λ) = x}
P(X ≤ x) = P{λ : X(λ) ≤ x}
P(x_1 < X ≤ x_2) = P{λ : x_1 < X(λ) ≤ x_2}
Figure 2.1 Mapping of the sample space by a random variable.
EXAMPLE 2.4.
Consider the toss of one die. Let the random variable X represent the value of
the up face. The mapping performed by X is shown in Figure 2.1. The values
of the random variable are 1, 2, 3, 4, 5, 6.
1. F_X(-∞) = 0
2. F_X(∞) = 1
3. lim_{ε→0, ε>0} F_X(x + ε) = F_X(x)
4. F_X(x_1) ≤ F_X(x_2)  if x_1 < x_2
5. P[x_1 < X ≤ x_2] = F_X(x_2) - F_X(x_1)
EXAMPLE 2.5.
Consider the toss of a fair die. Plot the distribution function of X where X is a
random variable that equals the number of dots on the up face.
Figure 2.2 Distribution function of the random variable X shown in Figure 2.1.
Joint Distribution Function. We now consider the case where two random
variables are defined on a sample space. For example, both the voltage and
current might be of interest in a certain experiment.
The probability of the joint occurrence of two events such as A and B was
called the joint probability P(A ∩ B). If the event A is the event {X ≤ x} and
the event B is the event {Y ≤ y}, then the joint probability is called the joint
distribution function of the random variables X and Y; that is

F_{X,Y}(x, y) = P{(X ≤ x) ∩ (Y ≤ y)}
EXAMPLE 2.6.

Consider the toss of a fair die. Plot the probability mass function.

P(X = x_i) = 1/6,   x_i = 1, 2, . . . , 6
3. P(X = x_i | Y = y_j) = P(X = x_i, Y = y_j)/P(Y = y_j),   P(Y = y_j) ≠ 0      (2.25)

4. Random variables X and Y are statistically independent if

   P(X = x_i, Y = y_j) = P(X = x_i)P(Y = y_j)   for all i, j                    (2.26)
EXAMPLE 2.7.

Find the joint probability mass function and joint distribution function of X, Y
associated with the experiment of tossing two fair dice where X represents the
number appearing on the up face of one die and Y represents the number
appearing on the up face of the other die.

SOLUTION:

P(X = i, Y = j) = 1/36,   i = 1, 2, . . . , 6;  j = 1, 2, . . . , 6

F_{X,Y}(x, y) = Σ_{i=1}^{x} Σ_{j=1}^{y} 1/36 = xy/36,   x = 1, 2, . . . , 6;  y = 1, 2, . . . , 6

If x and y are not integers and are between 0 and 6, F_{X,Y}(x, y) = F_{X,Y}([x], [y]),
where [x] is the greatest integer less than or equal to x. F_{X,Y}(x, y) = 0 for x < 1
or y < 1. F_{X,Y}(x, y) = 1 for x ≥ 6 and y ≥ 6. F_{X,Y}(x, y) = F_X(x) for y ≥ 6, and
F_{X,Y}(x, y) = F_Y(y) for x ≥ 6.
2.3.3 Expected Values or Averages

The probability mass function (or the distribution function) provides as complete
a description as possible for a discrete random variable. For many purposes this
description is often too detailed. It is sometimes simpler and more convenient
to describe a random variable by a few characteristic numbers or summary
measures that are representative of its probability mass function. These numbers
are the various expected values (sometimes called statistical averages). The ex-
pected value or the average of a function g(X) of a discrete random variable X
is defined as

E{g(X)} = Σ_{i=1}^{n} g(x_i)P(X = x_i)                              (2.28)

It will be seen in the next section that the concept of expected value is valid
for all random variables, not just for discrete random variables. The form of the
average simply appears different for continuous random variables.
Two expected values or moments that are most commonly used for characterizing
a random variable X are its mean μ_X and its variance σ_X². The mean and variance
are defined as

E{X} = μ_X = Σ_{i=1}^{n} x_i P(X = x_i)                              (2.29)

σ_X² = E{(X - μ_X)²} = Σ_{i=1}^{n} (x_i - μ_X)² P(X = x_i)           (2.30)
A useful expected value that gives a measure of dependence between two random
variables X and Y is the correlation coefficient defined as

ρ_XY = E{(X - μ_X)(Y - μ_Y)}/(σ_X σ_Y)                               (2.33)

The numerator of the right-hand side of Equation 2.33 is called the covariance
(σ_XY) of X and Y. The reader can verify that if X and Y are statistically inde-
pendent, then ρ_XY = 0 and that in the case when X and Y are linearly dependent
(i.e., when Y = b + kX), then |ρ_XY| = 1. Observe that ρ_XY = 0 does not imply
statistical independence.
Two random variables X and Y are said to be orthogonal if

E{XY} = 0

where the subscripts denote the distributions with respect to which the expected
values are computed.
One of the important conditional expected values is the conditional mean:

E{X|Y = y_j} = Σ_i x_i P(X = x_i | Y = y_j)

The conditional mean plays an important role in estimating the value of one
random variable given the value of a related random variable, for example, the
estimation of the weight of an individual given the height.
G_X(z) = Σ_{k=0}^{∞} z^k P(X = k)                                    (2.35.a)

1. G_X(1) = Σ_{k=0}^{∞} P(X = k) = 1                                 (2.35.b)

2. If G_X(z) is given, p_k can be obtained from it either by expanding it in a
   power series or from the derivatives of G_X(z).

3. The factorial moments are given by

   C_n = E{X(X - 1)(X - 2) ··· (X - n + 1)} = [d^n G_X(z)/dz^n]_{z=1}        (2.35.d)

From the factorial moments, we can obtain ordinary moments; for example,

μ_X = C_1

and

σ_X² = C_2 + C_1 - C_1²
The reader can verify that the mean and variance of the binomial random variable
are given by (see Problem 2.13)

μ_X = np                                                            (2.38.a)
σ_X² = np(1 - p)                                                    (2.38.b)
then the number of events in a time interval of length T can be shown (see
Chapter 5) to have a Poisson probability mass function of the form

P(X = k) = λ^k e^{-λ}/k!,   k = 0, 1, 2, . . .                       (2.39.a)

where λ = λ'T. The mean and variance of the Poisson random variable are
given by

μ_X = λ                                                             (2.39.b)
σ_X² = λ                                                            (2.39.c)
P(X_1 = x_1, X_2 = x_2, . . . , X_k = x_k)
    = [n!/(x_1! x_2! ··· x_{k-1}! x_k!)] p_1^{x_1} p_2^{x_2} ··· p_k^{x_k}       (2.40)
EXAMPLE 2.8.

(Note that this is similar to Example 2.3. The primary difference is the
use of random variables.)

SOLUTION:

(a) Using Equation 2.24, we have

    P(Y = 1) = P(Y = 1|X = 0)P(X = 0) + P(Y = 1|X = 1)P(X = 1)

    P(Y = 0) = 1 - P(Y = 1)

(b) P(X = 1|Y = 1) = P(Y = 1|X = 1)P(X = 1)/P(Y = 1)
EXAMPLE 2.9.
SOLUTION:
(a) Let X be the random variable representing the number of errors per
    block. Then, X has a binomial distribution

    E{X} = np = (16)(.1) = 1.6

(b) The variance of X is found from Equation 2.38.b:

    σ_X² = np(1 - p) = (16)(.1)(.9) = 1.44

(c) P(X ≥ 5) = 1 - P(X ≤ 4)
            = 1 - Σ_{k=0}^{4} (16 choose k)(0.1)^k (0.9)^{16-k}
            = 0.017
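The tail probability in part (c) can be checked directly; the short Python sketch below evaluates the binomial sum with the parameters n = 16 and p = .1 used in this example.

```python
from math import comb

n, p = 16, 0.1
# P(X >= 5) = 1 - P(X <= 4) for a binomial(16, 0.1) random variable
p_le_4 = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(5))
print(1 - p_le_4)   # approximately 0.017
```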
EXAMPLE 2.10.
The number N o f defects per plate of sheet metal is Poisson with X = 10. The
inspection process has a constant probability of .9 o f finding each defect and
the successes are independent, that is, if M represents the number of found
defects
Find
(a) The joint probability mass function o f M and N.
(b) The marginal probability mass function o f M.
(c) The conditional probability mass function of N given M.
(d) E{M\N}.
(e) E{M } from part (d).
SOLUTION:
(a) P(M = i, N = n) = (n choose i)(.9)^i (.1)^{n-i} · (10)^n e^{-10}/n!,
        i = 0, 1, . . . , n;   n = 0, 1, . . .

(b) P(M = i) = Σ_{n=i}^{∞} [n!/(i!(n - i)!)] (.9)^i (.1)^{n-i} (10)^n e^{-10}/n!

             = [e^{-10}(9)^i/i!] Σ_{n=i}^{∞} 1/(n - i)!

             = (9)^i e^{-9}/i!,   i = 0, 1, . . .

    that is, M has a Poisson distribution with parameter 9.

Thus

E{M|N} = .9N
This may also be found directly using the results of part (b) if these results are
available.
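A quick simulation is one way to check that the marginal pmf of M found in part (b) is Poisson with parameter 9 and that E{M|N} = .9N. The sketch below (using NumPy, with the parameters of this example) is one possible version.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials = 200_000

# N ~ Poisson(10); each defect is found independently with probability 0.9,
# so M given N = n is binomial(n, 0.9).
N = rng.poisson(10, size=n_trials)
M = rng.binomial(N, 0.9)

print(M.mean(), M.var())      # both close to 9, as for a Poisson(9) variable
print(np.mean(M[N == 10]))    # close to E{M | N = 10} = 0.9 * 10 = 9
```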
f_X(x) = dF_X(x)/dx                                                 (2.41)

With this definition the probability that the observed value of X falls in a small
interval of length Δx containing the point x is approximated by f_X(x)Δx. With
such a function, we can evaluate probabilities of events by integration. As with
a probability mass function, there are properties that f_X(x) must have before it
can be used as a density function for a random variable. These properties follow
from Equation 2.41 and the properties of a distribution function.

1. f_X(x) ≥ 0                                                       (2.42.a)
2. ∫_{-∞}^{∞} f_X(x) dx = 1                                         (2.42.b)
Figure 2.4 Distribution function and density function for Example 2.11.
EXAMPLE 2.11.
Resistors are produced that have a nominal value of 10 ohms and are ±10%
resistors. Assume that any possible value of resistance is equally likely. Find the
density and distribution function o f the random variable R, which represents
resistance. Find the probability that a resistor selected at random is between 9.5
and 10.5 ohms.
SOLUTION: The density and distribution functions are shown in Figure 2.4.
Using the distribution function,
P(9.5 ≤ R ≤ 10.5) = (10.5 - 9.5)/(11 - 9) = 1/2

Figure 2.5 Example of a mixed distribution function.
f_{X,Y}(x, y) = ∂²F_{X,Y}(x, y)/∂x ∂y

f_{X,Y}(x, y) ≥ 0

F_{X,Y}(x, y) = ∫_{-∞}^{y} ∫_{-∞}^{x} f_{X,Y}(μ, ν) dμ dν

∫_{-∞}^{∞} ∫_{-∞}^{∞} f_{X,Y}(μ, ν) dμ dν = 1

From the joint probability density function one can obtain marginal proba-
bility density functions f_X(x), f_Y(y), and conditional probability density func-
tions f_{X|Y}(x|y) and f_{Y|X}(y|x) as follows:

f_X(x) = ∫_{-∞}^{∞} f_{X,Y}(x, y) dy                                 (2.43.a)

f_{X,Y}(x, y) = f_X(x)f_Y(y)                                         (2.45)
EXAMPLE 2.12.

F_Y(y) = 0,                           y < 2
       = (1/6) ∫∫ xv dx dv,           2 ≤ y ≤ 4
       = 1,                           y > 4
σ_XY = E{(X - μ_X)(Y - μ_Y)} = ∫∫ (x - μ_X)(y - μ_Y) f_{X,Y}(x, y) dx dy

ρ_XY = E{(X - μ_X)(Y - μ_Y)}/(σ_X σ_Y)                               (2.47.d)

It can be shown that -1 ≤ ρ_XY ≤ 1. Tchebycheff's inequality for a contin-
uous random variable has the same form as given in Equation 2.31.
Conditional expected values involving continuous random variables are defined as

E{g(X)|Y = y} = ∫_{-∞}^{∞} g(x) f_{X|Y}(x|y) dx

If X and Y are independent, then

E{g(X)h(Y)} = E{g(X)}E{h(Y)}                                        (2.49)
It should be noted that the concept of the expected value of a random variable
is equally applicable to discrete and continuous random variables. Also, if gen-
eralized derivatives of the distribution function are defined using the Dirac delta
function δ(x), then discrete random variables have generalized density functions.
For example, the generalized density function of die tossing as given in Example
2.6 is

f_X(x) = Σ_{i=1}^{6} (1/6) δ(x - i)

If this approach is used then, for example, Equations 2.29 and 2.30 are special
cases of Equations 2.47.a and 2.47.b, respectively.
For a continuous random variable (and using δ functions also for a discrete
random variable) this definition leads to

Ψ_X(ω) = E{e^{jωX}} = ∫_{-∞}^{∞} f_X(x) e^{jωx} dx                   (2.50.a)

which is the complex conjugate of the Fourier transform of the pdf of X. Since
|exp(jωx)| ≤ 1,

|Ψ_X(ω)| ≤ 1

Using the inverse Fourier transform, we can obtain f_X(x) from Ψ_X(ω) as

f_X(x) = (1/2π) ∫_{-∞}^{∞} Ψ_X(ω) e^{-jωx} dω

Thus, f_X(x) and Ψ_X(ω) form a Fourier transform pair. The characteristic function
of a random variable has the following properties.

1. E{X^k} = (-j)^k [d^k Ψ_X(ω)/dω^k] at ω = 0                        (2.51.a)
Ψ_{X_1,X_2}(0, 0) = 1

and

E{X_1^m X_2^n} = (-j)^{m+n} [∂^{m+n} Ψ_{X_1,X_2}(ω_1, ω_2)/∂ω_1^m ∂ω_2^n]
                at (ω_1, ω_2) = (0, 0)                               (2.51.c)

The real-valued function M_X(t) = E{exp(tX)} is called the moment generating
function. Unlike the characteristic function, the moment generating function
need not always exist, and even when it exists, it may be defined for only some
values of t within a region of convergence (similar to the existence of the Laplace
transform). If M_X(t) exists, then M_X(t) = Ψ_X(t/j).

We illustrate two uses of characteristic functions.
We illustrate two uses o f characteristic functions.
EXAMPLE 2.13.

X_1 and X_2 are two independent (Gaussian) random variables with means μ_1 and
μ_2 and variances σ_1² and σ_2². The pdfs of X_1 and X_2 have the form

f_{X_i}(x_i) = [1/(√(2π) σ_i)] exp[-(x_i - μ_i)²/(2σ_i²)],   i = 1, 2

SOLUTION:

(a) Ψ_{X_1}(ω) = ∫_{-∞}^{∞} [1/(√(2π) σ_1)] exp[-(x_1 - μ_1)²/(2σ_1²)] exp(jωx_1) dx_1

    Combining the exponents and completing the square gives

    Ψ_{X_1}(ω) = exp[jμ_1ω - σ_1²ω²/2] ∫_{-∞}^{∞} [1/(√(2π) σ_1)] exp[-(x_1 - μ_1')²/(2σ_1²)] dx_1

    where μ_1' = μ_1 + jσ_1²ω.
    The value of the integral in the preceding equation is 1 and hence

    Ψ_{X_1}(ω) = exp[jμ_1ω - σ_1²ω²/2]

    Similarly

    Ψ_{X_2}(ω) = exp[jμ_2ω - σ_2²ω²/2]

The fourth moment of a zero-mean Gaussian random variable with variance σ² is

E{X^4} = 3σ^4

Following the same procedure it can be shown for X a normal random
variable with mean zero and variance σ² that

E{X^n} = 0,                          n = 2k + 1
       = 1·3·5 ··· (n - 1) σ^n,      n = 2k, k an integer
Thus

exp{C_X(ω)} = Ψ_X(ω)

            = 1 + E[X](jω) + E[X²](jω)²/2! + ··· + E[X^n](jω)^n/n! + ···        (2.52.b)

E[X]  = K_1                                                          (2.52.c)
E[X²] = K_2 + K_1²                                                   (2.52.d)
E[X³] = K_3 + 3K_2K_1 + K_1³                                         (2.52.e)
E[X⁴] = K_4 + 4K_3K_1 + 3K_2² + 6K_2K_1² + K_1⁴                      (2.52.f)

μ_X = (b + a)/2                                                      (2.53.b)

σ_X² = (b - a)²/12                                                   (2.53.c)
Gaussian Probability Density Function. One of the most widely used pdfs is
the Gaussian or normal probability density function. This pdf occurs in so many
applications partly because o f a remarkable phenomenon called the central limit
theorem and partly because o f a relatively simple analytical form. The central
limit theorem, to be proved in a later section, implies that a random variable
that is determined by the sum o f a large number o f independent causes tends
to have a Gaussian probability distribution. Several versions o f this theorem
have been proven by statisticians and verified experimentally from data by en
gineers and physicists.
One primary interest in studying the Gaussian pdf is from the viewpoint of
using it to model random electrical noise. Electrical noise in communication
f_X(x) = [1/√(2πσ_X²)] exp[-(x - μ_X)²/(2σ_X²)]                      (2.54)
The family of Gaussian pdfs is characterized by only two parameters, μ_X and
σ_X², which are the mean and variance of the random variable X. In many ap-
plications we will often be interested in probabilities such as

P(X > a) = ∫_a^{∞} [1/√(2πσ_X²)] exp[-(x - μ_X)²/(2σ_X²)] dx

which, with the change of variable z = (x - μ_X)/σ_X, becomes

P(X > a) = ∫_{(a-μ_X)/σ_X}^{∞} (1/√(2π)) exp(-z²/2) dz
Various tables give any of the areas shown in Figure 2.7, so one must observe
which is being tabulated. However, any of the results can be obtained from the
others by using the following relations for the standard (μ = 0, σ = 1) normal
random variable X:

P(X < x) = 1 - Q(x)
P(-a < X < a) = 2P(-a < X < 0) = 2P(0 < X < a)
P(X < 0) = 1/2 = Q(0)
EXAMPLE 2.14.
EXAMPLE 2.15.

The velocity V of the wind at a certain location is a normal random variable with
μ = 2 and σ = 5. Determine P(-3 < V < 8).
SOLUTION:

P(-3 < V < 8) = ∫_{-3}^{8} [1/√(2π(25))] exp[-(v - 2)²/(2(25))] dv

              = ∫_{(-3-2)/5}^{(8-2)/5} (1/√(2π)) exp[-x²/2] dx
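Numerically, this probability can be evaluated through the error function; the sketch below assumes only the parameters μ = 2 and σ = 5 given above.

```python
from math import erf, sqrt

mu, sigma = 2.0, 5.0

def normal_cdf(x, mu, sigma):
    # Distribution function of a normal random variable via the error function
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

# P(-3 < V < 8) for V normal with mu = 2, sigma = 5
print(normal_cdf(8, mu, sigma) - normal_cdf(-3, mu, sigma))   # about 0.726
```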
Bivariate Gaussian pdf. We often encounter the situation when the instantaneous
amplitude of the input signal to a linear system has a Gaussian pdf and
we might be interested in the joint pdf of the amplitude of the input and the
output signals. The bivariate Gaussian pdf is a valid model for describing such
situations. The bivariate Gaussian pdf has the form

f_{X,Y}(x, y) = [1/(2πσ_Xσ_Y√(1 - ρ²))]
               exp{ -[1/(2(1 - ρ²))] [ (x - μ_X)²/σ_X²
               - 2ρ(x - μ_X)(y - μ_Y)/(σ_Xσ_Y) + (y - μ_Y)²/σ_Y² ] }            (2.57)

The reader can verify that the marginal pdfs of X and Y are Gaussian with
means μ_X, μ_Y, and variances σ_X², σ_Y², respectively, and that ρ is the correlation
coefficient of X and Y.
Z = X + jY

E{g(Z)} = ∫_{-∞}^{∞} ∫_{-∞}^{∞} g(x + jy) f_{X,Y}(x, y) dx dy
f_{X_1}(x_1) = ∫∫ ··· ∫ f_{X_1,X_2,...,X_m}(x_1, x_2, . . . , x_m) dx_2 dx_3 ··· dx_m     (m - 1 integrals)

and

f_{X_1,X_2}(x_1, x_2)
    = ∫∫ ··· ∫ f_{X_1,X_2,...,X_m}(x_1, x_2, x_3, . . . , x_m) dx_3 dx_4 ··· dx_m         (2.58)
                                                                       (m - 2 integrals)

Note that the marginal pdf of any subset of the m variables is obtained by
"integrating out" the variables not in the subset.
The conditional density functions are defined as (using m = 4 as an example),

f_{X_1,X_2,X_3|X_4}(x_1, x_2, x_3|x_4) = f_{X_1,X_2,X_3,X_4}(x_1, x_2, x_3, x_4)/f_{X_4}(x_4)          (2.59)

and

f_{X_1,X_2|X_3,X_4}(x_1, x_2|x_3, x_4) = f_{X_1,X_2,X_3,X_4}(x_1, x_2, x_3, x_4)/f_{X_3,X_4}(x_3, x_4)   (2.60)

Expected values are defined as

E{g(X_1, X_2, X_3, X_4)}
    = ∫∫∫∫ g(x_1, x_2, x_3, x_4) f_{X_1,X_2,X_3,X_4}(x_1, x_2, x_3, x_4) dx_1 dx_2 dx_3 dx_4           (2.61)

and the conditional expected value given X_3 = x_3 and X_4 = x_4 is

E{g(X_1, X_2, X_3, X_4)|x_3, x_4}
    = ∫_{-∞}^{∞} ∫_{-∞}^{∞} g(x_1, x_2, x_3, x_4) f_{X_1,X_2|X_3,X_4}(x_1, x_2|x_3, x_4) dx_1 dx_2     (2.62)
Important parameters of the joint distribution are the means and the co-
variances

μ_{X_i} = E{X_i}

and

σ_{X_iX_j} = E{(X_i - μ_{X_i})(X_j - μ_{X_j})}

Note that σ_{X_iX_i} is the variance of X_i. We will use both σ_{X_iX_i} and σ_{X_i}² to denote
the variance of X_i. Sometimes the notations E_{X_i}, E_{X_iX_j}, E_{X_i|X_j} are used to denote
expected values with respect to the marginal distribution of X_i, the joint distri-
bution of X_i and X_j, and the conditional distribution of X_i given X_j, respectively.
We will use subscripted notation for the expectation operator only when there
is ambiguity with the use of unsubscripted notation.
The probability law for random vectors can be specified in a concise form
using the vector notation. Suppose we are dealing with the joint probability law
for m random variables X_1, X_2, . . . , X_m. These m variables can be represented
as components of an m × 1 column vector X,

X = [X_1, X_2, . . . , X_m]^T

where T indicates the transpose of a vector (or matrix). The values of X are
points in the m-dimensional space R^m. A specific value of X is denoted by

x^T = (x_1, x_2, . . . , x_m)

The mean vector is

μ_X = E(X) = [E(X_1), E(X_2), . . . , E(X_m)]^T
and the covariance matrix is

Σ_X = | σ_{X_1X_1}  σ_{X_1X_2}  ···  σ_{X_1X_m} |
      | σ_{X_2X_1}  σ_{X_2X_2}  ···  σ_{X_2X_m} |
      |    ···         ···               ···   |
      | σ_{X_mX_1}  σ_{X_mX_2}  ···  σ_{X_mX_m} |

The covariance matrix describes the second-order relationship between the com-
ponents of the random vector X. The components are said to be "uncorrelated"
when

σ_{X_iX_j} = 0,   i ≠ j

and independent if

f_X(x_1, x_2, . . . , x_m) = f_{X_1}(x_1) f_{X_2}(x_2) ··· f_{X_m}(x_m)

The m-dimensional Gaussian pdf has the form

f_X(x) = [1/((2π)^{m/2} |Σ_X|^{1/2})] exp[-(1/2)(x - μ_X)^T Σ_X^{-1} (x - μ_X)]

where μ_X is the mean vector, Σ_X is the covariance matrix, Σ_X^{-1} is its inverse, |Σ_X|
is the determinant of Σ_X, and X is of dimension m.
Suppose the random vector X is partitioned into two components X_1 and X_2 of
dimensions k × 1 and (m - k) × 1,

X_1 = [X_1, X_2, . . . , X_k]^T,   X_2 = [X_{k+1}, X_{k+2}, . . . , X_m]^T

with the mean vector and covariance matrix partitioned accordingly as

μ_X = | μ_{X_1} |        Σ_X = | Σ_11   Σ_12 |
      | μ_{X_2} |              | Σ_21   Σ_22 |

When X_1 and X_2 are uncorrelated, Σ_12 = Σ_21^T = 0 and the covariance matrix
takes the block-diagonal form

Σ_X = | Σ_11    0  |
      |  0    Σ_22 |
μ_Y = Aμ_X                                                           (2.65.a)
Σ_Y = AΣ_X A^T                                                        (2.65.b)
Properties (1), (3), and (4) state that marginals, conditionals, as well as linear
transformations derived from a multivariate Gaussian distribution all have mul
tivariate Gaussian distributions.
EXAMPLE 2.15.

X = [X_1, X_2, X_3, X_4]^T has a multivariate Gaussian distribution with

μ_X = [2, 1, 1, 0]^T

and

Σ_X = | 6  3  2  1 |
      | 3  4  3  2 |
      | 2  3  4  3 |
      | 1  2  3  3 |

Let

X_1 = | X_1 |        X_2 = | X_3 |
      | X_2 |              | X_4 |

and

Y = | 2X_1       |
    | X_1 + 2X_2 |
    | X_3 + X_4  |
SOLUTION:

(a) X_1 has a bivariate Gaussian distribution with

    μ_{X_1} = | 2 |      and      Σ_{X_1} = | 6  3 |
              | 1 |                         | 3  4 |

(b) Y can be written as

    Y = | 2  0  0  0 |  | X_1 |
        | 1  2  0  0 |  | X_2 |  = AX
        | 0  0  1  1 |  | X_3 |
                        | X_4 |

    Hence Y has a Gaussian distribution with

    μ_Y = Aμ_X = | 4 |
                 | 4 |
                 | 1 |

    and

    Σ_Y = AΣ_X A^T = | 24  24   6 |
                     | 24  34  13 |
                     |  6  13  13 |

(c) The conditional distribution of X_1 given X_2 is Gaussian with conditional mean

    μ_{X_1|X_2} = | 2 | + | 2  1 | | 4  3 |^{-1} | x_3 - 1 |
                  | 1 |   | 3  2 | | 3  3 |      | x_4 - 0 |

                = | x_3 - (2/3)x_4 + 1 |
                  | x_3 - (1/3)x_4     |

    and conditional covariance matrix

    Σ_{X_1|X_2} = | 6  3 | - | 2  1 | | 4  3 |^{-1} | 2  3 |
                  | 3  4 |   | 3  2 | | 3  3 |      | 1  2 |

                = | 14/3  4/3 |
                  |  4/3  5/3 |
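The matrix computations in parts (b) and (c) can be verified numerically. The NumPy sketch below uses the mean vector and covariance matrix of this example together with the standard Gaussian conditioning formulas Σ_11 - Σ_12 Σ_22^{-1} Σ_21.

```python
import numpy as np

mu_x = np.array([2.0, 1.0, 1.0, 0.0])
sigma_x = np.array([[6., 3., 2., 1.],
                    [3., 4., 3., 2.],
                    [2., 3., 4., 3.],
                    [1., 2., 3., 3.]])

# (b) Linear transformation Y = A X
A = np.array([[2., 0., 0., 0.],
              [1., 2., 0., 0.],
              [0., 0., 1., 1.]])
print(A @ mu_x)              # [4. 4. 1.]
print(A @ sigma_x @ A.T)     # [[24 24 6], [24 34 13], [6 13 13]]

# (c) Conditional covariance of X1 = (X_1, X_2) given X2 = (X_3, X_4)
s11, s12 = sigma_x[:2, :2], sigma_x[:2, 2:]
s21, s22 = sigma_x[2:, :2], sigma_x[2:, 2:]
print(s11 - s12 @ np.linalg.inv(s22) @ s21)   # [[14/3 4/3], [4/3 5/3]]
```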
where ω^T = (ω_1, ω_2, . . . , ω_n). From the joint characteristic function, the
moments can be obtained by partial differentiation. For example,

To simplify the illustrative calculations, let us assume that all random variables
have zero means. Then,

Ψ_X(ω_1, ω_2, ω_3, ω_4) = 1 - (1/2)ω^T Σ_X ω + (1/2!)[(1/2)ω^T Σ_X ω]² + R

where R contains terms of ω raised to the sixth and higher power. When we
take the partial derivatives and set ω_1 = ω_2 = ω_3 = ω_4 = 0, the only nonzero
terms come from terms proportional to ω_1ω_2ω_3ω_4 in

(1/8)(ω^T Σ_X ω)² = (1/8)(σ_11ω_1² + σ_22ω_2² + σ_33ω_3² + σ_44ω_4² + 2σ_12ω_1ω_2 + ··· + 2σ_34ω_3ω_4)²

When we square the quadratic term, the only terms proportional to ω_1ω_2ω_3ω_4
will be

Taking the partial derivative of the preceding expression and setting ω = 0,
we have

E{X_1X_2X_3X_4} = E{X_1X_2}E{X_3X_4} + E{X_2X_3}E{X_1X_4}
                + E{X_2X_4}E{X_1X_3}                                 (2.69)

The reader can verify that for the zero mean case
In the analysis of electrical systems we are often interested in finding the prop
erties of a signal after it has been “ processed” by the system. Typical processing
operations include integration, weighted averaging, and limiting. These signal
processing operations may be viewed as transformations of a set of input variables
to a set of output variables. If the input is a set of random variables, then the
output will also be a set of random variables. In this section, we develop tech
niques for obtaining the probability law (distribution) for the set o f output
random variables given the transformation and the probability law for the set
o f input random variables.
The general type of problem we address is the following. Assume that X is
a random variable with ensemble S_X and a known probability distribution. Let
g be a scalar function that maps each x ∈ S_X to y = g(x). The expression

Y = g(X)
Figure 2.8 Transformation of a random variable.
defines a new random variable* as follows (see Figure 2.8). For a given outcome
λ, X(λ) is a number x, and g[X(λ)] is another number specified by g(x). This
number is the value of the random variable Y, that is, Y(λ) = y = g(x). The
ensemble S_Y of Y is the set

S_Y = {y = g(x) : x ∈ S_X}

B = {x : g(x) ∈ C}

P(C) = P(A) = P(B)
*For Y to be a random variable, the function g: X → Y must have the following properties:

2. It must be a Baire function, that is, for every y, the set I_y such that g(x) ≤ y must consist
   of the union and intersection of a countable number of intervals in S_X. Only then is {Y ≤ y}
   an event.
3. The events {λ : g(X(λ)) = ±∞} must have zero probability.
F_Y(y) = P(C) = P(Y ≤ y) = ∫_B f_X(x) dx

P(y < Y ≤ y + Δy) ≈ f_Y(y)Δy = ∫ f_X(x) dx

where the last integral is over the set of x values for which y < g(x) ≤ y + Δy,
which shows that we can derive the density of Y from the density of X.
We will use the principles outlined in the preceding paragraphs to find the
distribution of scalar-valued as well as vector-valued functions of random vari-
ables.
P(Y = y_i) = Σ_{j: g(x_j) = y_i} P(X = x_j)

two roots are x^(1) = +√y and x^(2) = -√y; also see Figure 2.9 for another
example.) We know that

Now if we can find the set of values of x such that y < g(x) ≤ y + Δy, then
we can obtain f_Y(y)Δy from the probability that X belongs to this set. That is

f_Y(y)Δy = P(y < Y ≤ y + Δy) = P(x : y < g(x) ≤ y + Δy)

For the example shown in Figure 2.9, this set consists of the following three
intervals:

x^(1) < x ≤ x^(1) + Δx^(1)
x^(2) + Δx^(2) < x ≤ x^(2)
x^(3) < x ≤ x^(3) + Δx^(3)
where Δx^(1) > 0, Δx^(3) > 0 but Δx^(2) < 0. From the foregoing it follows that

f_Y(y)Δy = f_X(x^(1))Δx^(1) + f_X(x^(2))|Δx^(2)| + f_X(x^(3))Δx^(3)

We can see from Figure 2.9 that the terms in the right-hand side are given by

Δx^(1) = Δy/g'(x^(1))
Δx^(2) = Δy/g'(x^(2))
Δx^(3) = Δy/g'(x^(3))

Hence we conclude that, when we have three roots for the equation y = g(x),

f_Y(y) = Σ_{i=1}^{3} f_X(x^(i))/|g'(x^(i))|                           (2.71)

g'(x) is also called the Jacobian of the transformation and is often denoted by
J(x). Equation 2.71 gives the pdf of the transformed variable Y in terms of the
pdf of X, which is given. The use of Equation 2.71 is limited by our ability to
find the roots of the equation y = g(x). If g(x) is highly nonlinear, then the
solutions of y = g(x) can be difficult to find.
EXAMPLE 2.16.

x^(1) = +√(y - 4)
x^(2) = -√(y - 4)

and hence

f_Y(y) = f_X(x^(1))/|g'(x^(1))| + f_X(x^(2))/|g'(x^(2))|

With

f_X(x) = (1/√(2π)) exp(-x²/2),

we obtain

f_Y(y) = [1/√(2π(y - 4))] exp(-(y - 4)/2),   y > 4
       = 0,                                   y < 4
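A Monte Carlo check is straightforward here. The sketch below assumes, consistent with the roots ±√(y - 4) and the f_X used above, that the transformation is Y = X² + 4 with X a zero-mean, unit-variance Gaussian variable, and compares P(Y ≤ 6) from simulation with the value obtained from the derived pdf.

```python
import numpy as np
from math import erf

rng = np.random.default_rng(1)
x = rng.standard_normal(500_000)
y = x ** 2 + 4.0                 # assumed transformation Y = X**2 + 4

# P(Y <= 6) = P(X**2 <= 2); integrating the derived f_Y from 4 to 6 gives erf(1)
print(np.mean(y <= 6.0))         # simulation estimate
print(erf(1.0))                  # value from the derived density, about 0.8427
```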
EXAMPLE 2.17

Using the pdf of X and the transformation shown in Figures 2.10.a and 2.10.b,
find the distribution of Y.

Figure 2.10 Transformation discussed in Example 2.17.

All the values of x > 1 map to y = 1, so the probability that Y = 1 is equal to
P(X > 1); similarly, all the values of x < -1 map to y = -1, so P(Y = -1) = P(X < -1).
Thus, Y has a mixed distribution with a continuum of values in the interval
(-1, 1) and a discrete set of values from the set {-1, 1}. The continuous
part is characterized by a pdf and the discrete part is characterized by a prob-
ability mass function as shown in Figure 2.10.c.
Y_i = g_i(X_1, X_2, . . . , X_n),   i = 1, 2, . . . , n

Let us start with a mapping of two random variables onto two other random
variables:

Y_1 = g_1(X_1, X_2)
Y_2 = g_2(X_1, X_2)

There are k such regions as shown in Figure 2.11 (k = 3). Each region consists
of a parallelogram, and the area of each parallelogram is equal to Δy_1Δy_2/|J(x_1^(i), x_2^(i))|,
where J is the Jacobian of the transformation,

J(x_1, x_2) = | ∂g_1/∂x_1   ∂g_1/∂x_2 |                              (2.72)
              | ∂g_2/∂x_1   ∂g_2/∂x_2 |

By summing the contribution from all regions, we obtain the joint pdf of Y_1 and
Y_2 as

f_{Y_1,Y_2}(y_1, y_2) = Σ_i f_{X_1,X_2}(x_1^(i), x_2^(i)) / |J(x_1^(i), x_2^(i))|        (2.73)

Using the vector notation, we can generalize this result to the n-variate case as

f_Y(y) = Σ_{i=1}^{k} f_X(x^(i)) / |J(x^(i))|                          (2.74.a)
Suppose we have n random variables with known joint pdf, and we are
interested in the joint pdf of m < n functions of them, say

Y_j = g_j(X_1, X_2, . . . , X_n),   j = 1, 2, . . . , m

We can introduce n - m additional functions

Y_j = g_j(X_1, X_2, . . . , X_n),   j = m + 1, . . . , n

in any convenient way so that the Jacobian is nonzero, compute the joint pdf
of Y_1, Y_2, . . . , Y_n, and then obtain the marginal pdf of Y_1, Y_2, . . . , Y_m by
integrating out Y_{m+1}, . . . , Y_n. If the additional functions are carefully chosen,
then the inverse can be easily found and the resulting integration can be handled,
although often with great difficulty.
EXAMPLE 2.18.

Y_1 = X_1X_2/(X_1 + X_2)
Y_2 = X_2

where the Jacobian of the transformation is

J(x_1, x_2) = | x_2²/(x_1 + x_2)²   x_1²/(x_1 + x_2)² |
              |        0                   1          |

            = x_2²/(x_1 + x_2)²

which, expressed in terms of y_1 and y_2, is

J = (y_2 - y_1)²/y_2²
We are given

f_{X_1,X_2}(x_1, x_2) = 1/4,   9 ≤ x_1 ≤ 11,  9 ≤ x_2 ≤ 11
                      = 0      elsewhere

Thus

f_{Y_1,Y_2}(y_1, y_2) = (1/4) y_2²/(y_2 - y_1)²   for (y_1, y_2) in the mapped region
                      = 0                          elsewhere

We must now find the region in the y_1, y_2 plane that corresponds to the region
9 ≤ x_1 ≤ 11; 9 ≤ x_2 ≤ 11. Figure 2.12 shows the mapping and the resulting
region in the y_1, y_2 plane.
Now to find the marginal density of Y_1, we "integrate out" y_2.

f_{Y_1}(y_1) = ∫_9^{9y_1/(9-y_1)} [y_2²/(4(y_2 - y_1)²)] dy_2,          4 1/2 ≤ y_1 ≤ 4 19/20

            = ∫_{11y_1/(11-y_1)}^{11} [y_2²/(4(y_2 - y_1)²)] dy_2,       4 19/20 ≤ y_1 ≤ 5 1/2

            = 0   elsewhere
Figure 2.12 Transformation of Example 2.18, showing the region 9 ≤ x_1, x_2 ≤ 11 in the
x_1, x_2 plane and its image in the y_1, y_2 plane bounded by the curves y_2 = 9y_1/(9 - y_1)
and y_2 = 11y_1/(11 - y_1).
Carrying out the integration, we obtain

f_{Y_1}(y_1) = -(9 - y_1)/2 + y_1²/(2(9 - y_1)) + y_1 ln[y_1/(9 - y_1)],        4 1/2 ≤ y_1 ≤ 4 19/20

            = (11 - y_1)/2 - y_1²/(2(11 - y_1)) + y_1 ln[(11 - y_1)/y_1],       4 19/20 ≤ y_1 ≤ 5 1/2

            = 0   elsewhere
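Because the limits of integration are easy to get wrong, a simulation check of the marginal pdf of Y_1 is worthwhile. The sketch below assumes, as in the example, that X_1 and X_2 are independent and uniform on [9, 11].

```python
import numpy as np

rng = np.random.default_rng(2)
x1 = rng.uniform(9.0, 11.0, 1_000_000)
x2 = rng.uniform(9.0, 11.0, 1_000_000)
y1 = x1 * x2 / (x1 + x2)

def f_y1(y):
    # First branch of the marginal pdf derived above (4.5 <= y <= 4.95)
    return -(9 - y) / 2 + y**2 / (2 * (9 - y)) + y * np.log(y / (9 - y))

# Compare a histogram estimate with the analytic density on 4.5 < y1 < 4.95
hist, edges = np.histogram(y1, bins=np.linspace(4.5, 4.95, 10), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(hist - f_y1(centers))))   # small; Monte Carlo/binning noise
```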
where the a_ij's and b_i's are all constants. In matrix notation we can write this
transformation as

Y = AX + B                                                           (2.75)

Assuming that A is nonsingular, the inverse transformation is

X = A^{-1}Y - A^{-1}B

Substituting the preceding two equations into Equation 2.71, we obtain the pdf
of Y as

f_Y(y) = f_X(A^{-1}y - A^{-1}B) |det A|^{-1}                          (2.76)
f_Y = f_{X_1} * f_{X_2}                                               (2.77.b)

Thus, the density function of the sum of two independent random variables is
given by the convolution of their densities. This also implies that the charac-
teristic functions are multiplied, and the cumulant generating functions as well
as individual cumulants are summed.
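The convolution in Equation 2.77.b can be carried out numerically as well as analytically. The sketch below convolves two assumed exponential densities (parameters 1 and 2, chosen only for illustration) and compares the result at y = 1 with the closed-form convolution 2(e^{-y} - e^{-2y}).

```python
import numpy as np

# Numerical illustration of Equation 2.77.b: the pdf of Y = X1 + X2 for
# independent X1, X2 is the convolution of the individual pdfs.
dx = 0.001
x = np.arange(0.0, 20.0, dx)
f1 = np.exp(-x)               # exponential pdf with parameter 1 (assumed example)
f2 = 2.0 * np.exp(-2.0 * x)   # exponential pdf with parameter 2

f_sum = np.convolve(f1, f2)[: x.size] * dx    # discrete approximation of f1 * f2
print(np.trapz(f_sum, x))                     # close to 1
print(f_sum[1000], 2.0 * (np.exp(-1.0) - np.exp(-2.0)))   # value at y = 1
```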
EXAMPLE 2.19.

[Graphical convolution example: the figure panels show the two densities, the product
f_{X_1}(0.5 - y_2) f_{X_2}(y_2), and the resulting density of the sum evaluated at y = 0.5.]
EXAMPLE 2.20.

f_Y(y) = ∫_0^y exp(-x_1) · 2 exp[-2(y - x_1)] dx_1
EXAMPLE 2.21.

Now if we define Σ_Y = AΣ_X A^T, then the exponent in the pdf of Y has the form

-(1/2) y^T Σ_Y^{-1} y

which corresponds to a multivariate Gaussian pdf with zero means and a co-
variance matrix of Σ_Y. Hence, we conclude that Y, which is a linear transforma-
tion of a multivariate Gaussian vector X, also has a Gaussian distribution. (Note:
This cannot be generalized for any arbitrary distribution.)
Order Statistics. Ordering, comparing, and finding the minimum and maximum
are typical statistical or data processing operations. We can use the techniques
outlined in the preceding sections for finding the distribution of minimum and
maximum values within a group of independent random variables.
Let X_1, X_2, X_3, . . . , X_n be a group of independent random variables having
a common pdf, f_X(x), defined over the interval (a, b). To find the distribution
of the smallest and largest of these X_i's, let us define the following transformation:

Y_1 = smallest of (X_1, X_2, . . . , X_n)
      ···
Y_n = largest of (X_1, X_2, . . . , X_n)

That is, Y_1 < Y_2 < ··· < Y_n represent X_1, X_2, . . . , X_n when the latter are arranged
in ascending order of magnitude. Then Y_i is called the ith order statistic of the
group. We will now show that the joint pdf of Y_1, Y_2, . . . , Y_n is given by

f_{Y_1,...,Y_n}(y_1, y_2, . . . , y_n) = n! f_X(y_1) f_X(y_2) ··· f_X(y_n),   y_1 < y_2 < ··· < y_n
We shall prove this for n = 3, but the argument can be entirely general.
With n = 3

f_{X_1,X_2,X_3}(x_1, x_2, x_3) = f_X(x_1) f_X(x_2) f_X(x_3)

Y_1 = smallest of (X_1, X_2, X_3)
Y_2 = middle value of (X_1, X_2, X_3)
Y_3 = largest of (X_1, X_2, X_3)

A given set of values x_1, x_2, x_3 may fall into one of the following six possibilities:

x_1 < x_2 < x_3   or   y_1 = x_1,  y_2 = x_2,  y_3 = x_3
x_1 < x_3 < x_2   or   y_1 = x_1,  y_2 = x_3,  y_3 = x_2
x_2 < x_1 < x_3   or   y_1 = x_2,  y_2 = x_1,  y_3 = x_3
x_2 < x_3 < x_1   or   y_1 = x_2,  y_2 = x_3,  y_3 = x_1
x_3 < x_1 < x_2   or   y_1 = x_3,  y_2 = x_1,  y_3 = x_2
x_3 < x_2 < x_1   or   y_1 = x_3,  y_2 = x_2,  y_3 = x_1

Thus there are six inverses. For one of these inverses the Jacobian is

J = | 0  0  1 |
    | 1  0  0 |  = 1
    | 0  1  0 |

The reader can verify that, for all six inverses, the Jacobian has a magnitude of
1, and using Equation 2.71, we obtain the joint pdf of Y_1, Y_2, Y_3 as
3! f_X(y_1) f_X(y_2) f_X(y_3) for y_1 < y_2 < y_3.

Equations 2.78.b and 2.78.c can be used to obtain and analyze the distribution
of the largest and smallest among a group of random variables.
EXAMPLE 2.22.
f_X(x) = a e^{-ax},   x > 0
       = 0,           x < 0
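For this exponential pdf the smallest order statistic Y_1 = min(X_1, . . . , X_n) is itself exponential with parameter na, a standard consequence of the distribution of the minimum. The simulation sketch below, with assumed values a = 2 and n = 5, illustrates it.

```python
import numpy as np

rng = np.random.default_rng(3)
a, n = 2.0, 5

# Y1 = min(X1, ..., Xn) of independent exponential(a) variables is
# exponential with parameter n*a, so E{Y1} = 1/(n*a) and Var{Y1} = 1/(n*a)**2.
x = rng.exponential(scale=1.0 / a, size=(100_000, n))
y1 = x.min(axis=1)
print(y1.mean())    # close to 1/(n*a) = 0.1
print(y1.var())     # close to 1/(n*a)**2 = 0.01
```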
E{h(Y)} = ∫ h(y) f_Y(y) dy

E_Y{h(Y)} = E_X{h(g(X))}

Using the means and covariances, we may be able to approximate the dis-
tribution of Y as discussed in the next section.
Y = g(X_1, . . . , X_n)
Figure 2.15 Simple Monte Carlo simulation.

Figure 2.16 Results of a Monte Carlo simulation (number of samples versus simulated value).
F_{X_i}(x) = 0,             x < 10
           = (x - 10)/10,   10 ≤ x ≤ 20
           = 1,             x > 20

Notice that F_{X_i}^{-1}(u) = 10u + 10. Thus, if the value .250 were the random sample
of U, then the corresponding random sample of X would be 12.5.
The reader is asked to show using Equation 2.71 that if X_i has a density
function and if X_i = F_i^{-1}(U) = g(U), where U is uniformly distributed between
zero and one, then F_i^{-1} is unique and

f_{X_i}(x) = dF_i(x)/dx,   where F_i = (F_i^{-1})^{-1}
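The inverse-transform step is the heart of the simulation: a uniform sample u is mapped to F_{X_i}^{-1}(u). A minimal Python sketch for the uniform [10, 20] distribution above:

```python
import numpy as np

rng = np.random.default_rng(4)
u = rng.uniform(0.0, 1.0, 100_000)

# Inverse transform sampling for F_X(x) = (x - 10)/10 on 10 <= x <= 20:
# F_X^{-1}(u) = 10*u + 10
x = 10.0 * u + 10.0
print(10.0 * 0.250 + 10.0)   # 12.5, the sample quoted in the text
print(x.min(), x.max())      # samples lie in [10, 20]
print(x.mean())              # close to 15, the mean of the uniform distribution
```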
In these cases we use several approximation techniques that yield upper and/or
lower bounds on probabilities.
Y_ε = 1   if |X| ≥ ε
    = 0   if |X| < ε

and thus

E{Y_ε} = P(|X| ≥ ε)

However, X² ≥ ε²Y_ε, so that

E{X²} ≥ ε²P(|X| ≥ ε)   or   P(|X| ≥ ε) ≤ E{X²}/ε²                    (2.82.a)

(Note that the foregoing inequality does not require the complete distribution
of X, that is, it is distribution free.)
Now, if we let X = (Y - μ_Y) and ε = kσ_Y, Equation 2.82.a takes the form

P(|Y - μ_Y| ≥ kσ_Y) ≤ σ_Y²/(kσ_Y)²

or

P(|Y - μ_Y| ≥ kσ_Y) ≤ 1/k²                                           (2.82.b)

Equation 2.82.b gives an upper bound on the probability that a random variable
has a value that deviates from its mean by more than k times its standard
deviation. Equation 2.82.b thus justifies the use of the standard deviation as a
measure of variability for any random variable.
For the Chernoff bound, define

Y_ε = 1   if X ≥ ε
    = 0   if X < ε

Then for any t ≥ 0,

e^{tX} ≥ e^{tε}Y_ε

and, hence,

E{e^{tX}} ≥ e^{tε}P(X ≥ ε)

or

P(X ≥ ε) ≤ e^{-tε}E{e^{tX}},   t ≥ 0

Furthermore, the bound can be tightened by minimizing the right-hand side over t:

P(X ≥ ε) ≤ min_{t≥0} e^{-tε}E{e^{tX}}                                 (2.83)

Equation 2.83 is the Chernoff bound. While the advantage of the Chernoff
bound is that it is tighter than the Tchebycheff bound, the disadvantage of the
Chernoff bound is that it requires the evaluation of E{e^{tX}} and thus requires
more extensive knowledge of the distribution. The Tchebycheff bound does not
require such knowledge of the distribution.
P(A_1 ∪ A_2 ∪ ··· ∪ A_n) ≤ Σ_{i=1}^{n} P(A_i)                         (2.84)
EXAMPLE 2.23.

X_1 and X_2 are independent Gaussian random variables with zero means and unit variances.

(a) Find the Tchebycheff and Chernoff bounds on P(X_1 ≥ 3) and compare
    them with the exact value of P(X_1 ≥ 3).
(b) Find the union bound on P(X_1 ≥ 3 or X_2 ≥ 4) and compare it with the
    actual value.

SOLUTION:

(a) The Tchebycheff bound on P(X_1 ≥ 3) is obtained using Equation 2.82.c as

    P(X_1 ≥ 3) ≤ P(|X_1| ≥ 3) ≤ 1/9

    The Chernoff bound is obtained by minimizing e^{-tε}E{e^{tX_1}} over t ≥ 0. Hence,

    P(X_1 ≥ ε) ≤ e^{-ε²/2}

(b) P(X_1 ≥ 3 or X_2 ≥ 4)
       = P(X_1 ≥ 3) + P(X_2 ≥ 4) - P(X_1 ≥ 3 and X_2 ≥ 4)
       = P(X_1 ≥ 3) + P(X_2 ≥ 4) - P(X_1 ≥ 3)P(X_2 ≥ 4)

    since X_1 and X_2 are independent. The union bound consists of the sum
    of the first two terms of the right-hand side of the preceding equation,
    and the union bound is "off" by the value of the third term. Substituting
    the values of these probabilities, we have

    The union bound is usually very tight when the probabilities involved
    are small and the random variables are independent.
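The quantities in part (a) are easy to tabulate. The sketch below assumes, as in the solution, that X_1 and X_2 are zero-mean, unit-variance Gaussian variables.

```python
from math import erf, exp, sqrt

def q(x):
    # Gaussian tail probability Q(x) = P(X >= x) for a zero-mean, unit-variance X
    return 0.5 * (1.0 - erf(x / sqrt(2.0)))

# Part (a): bounds and exact value for P(X1 >= 3)
print(1.0 / 3**2)           # Tchebycheff bound, 1/9 ~ 0.111
print(exp(-3**2 / 2.0))     # Chernoff bound, exp(-4.5) ~ 0.011
print(q(3.0))               # exact value, ~ 0.00135

# Part (b): union bound versus the exact probability
print(q(3.0) + q(4.0))                      # union bound
print(q(3.0) + q(4.0) - q(3.0) * q(4.0))    # exact value for independent X1, X2
```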
Y = g(X_1, X_2, . . . , X_n)

If Y is represented by its first-order Taylor series expansion about the point μ_1,
μ_2, . . . , μ_n,

Y ≈ g(μ_1, μ_2, . . . , μ_n) + Σ_{i=1}^{n} [∂g/∂X_i](μ_1, μ_2, . . . , μ_n) [X_i - μ_i]

then

μ_Y ≈ g(μ_1, μ_2, . . . , μ_n)

σ_Y² ≈ Σ_{i=1}^{n} Σ_{j=1}^{n} [∂g/∂X_i][∂g/∂X_j] σ_{ij}

where

μ_i = E[X_i]
σ_i² = E[(X_i - μ_i)²]
σ_{ij} = E[(X_i - μ_i)(X_j - μ_j)]

and the partial derivatives are evaluated at (μ_1, μ_2, . . . , μ_n).
EXAMPLE 2.24.

Y = X_1/X_2 + X_3X_4 - X_5²

μ_{X_1} = 10     σ²_{X_1} = 1
μ_{X_2} = 2      σ²_{X_2} = .2
μ_{X_3} = 3      σ²_{X_3} = 1/4
μ_{X_4} = 4      σ²_{X_4} = 1/3
μ_{X_5} = 1      σ²_{X_5} = 1/5

Find approximately (a) μ_Y, (b) σ_Y², and (c) P(Y < 20).

SOLUTION:

(a) μ_Y ≈ 10/2 + (3)(4) - 1² = 16

(b) σ_Y² ≈ (1/2)²(1) + ··· = 11.2

(c) With only five terms in the approximate linear equation, we assume,
    for an approximation, that Y is normal. Thus
where

h(y) = (1/√(2π)) exp(-y²/2)                                          (2.86)

and the basis functions of the expansion, H_j(y), are the Tchebycheff-Hermite
(T-H) polynomials. The first eight T-H polynomials are

H_0(y) = 1
H_1(y) = y
H_2(y) = y² - 1
H_3(y) = y³ - 3y
H_4(y) = y⁴ - 6y² + 3
H_5(y) = y⁵ - 10y³ + 15y
H_6(y) = y⁶ - 15y⁴ + 45y² - 15
H_7(y) = y⁷ - 21y⁵ + 105y³ - 105y
H_8(y) = y⁸ - 28y⁶ + 210y⁴ - 420y² + 105                             (2.87)
1. H_k(y)h(y) = -d[H_{k-1}(y)h(y)]/dy

3. ∫_{-∞}^{∞} H_m(y)H_n(y)h(y) dy = 0,    m ≠ n                       (2.88)
                                  = n!,   m = n
The coefficients of the series expansion are evaluated by multiplying both
sides of Equation 2.85 by H_k(y) and integrating from -∞ to ∞. By virtue of the
orthogonality property given in Equation 2.88, we obtain

C_k = (1/k!) ∫_{-∞}^{∞} H_k(y) f_Y(y) dy

    = (1/k!) [ μ̂_k - (k^(2)/(2·1!)) μ̂_{k-2} + (k^(4)/(2²·2!)) μ̂_{k-4} - ··· ]     (2.89.a)

where

μ̂_m = E{Y^m}

and

k^(m) = k!/(k - m)! = k(k - 1) ··· [k - (m - 1)]

The first eight coefficients follow directly from Equations 2.87 and 2.89.a and
are given by

C_0 = 1
C_1 = μ̂_1
C_2 = (1/2)(μ̂_2 - 1)
C_3 = (1/6)(μ̂_3 - 3μ̂_1)
C_4 = (1/24)(μ̂_4 - 6μ̂_2 + 3)
C_5 = (1/120)(μ̂_5 - 10μ̂_3 + 15μ̂_1)
Substituting Equation 2.89 into Equation 2.85, we obtain the series expansion
for the pdf of a random variable in terms of the moments of the random variable
and the T-H polynomials.
The Gram-Charlier series expansion for the pdf of a random variable X with
mean μ_X and variance σ_X² has the form:

where the coefficients C_i are given by Equation 2.89 with the moments of the
standardized variable (X - μ_X)/σ_X used in place of the moments of Y.
EXAMPLE 2.25.

SOLUTION: With μ_X = 3 and σ_X = 2, the moments of the standardized variable
Z = (X - 3)/2 are

μ̂_1 = 0,   μ̂_2 = 1

μ̂_3 = (μ_3 - 9μ_2 + 27μ_1 - 27)/8 = -.5

μ̂_4 = (μ_4 - 12μ_3 + 54μ_2 - 108μ_1 + 81)/16 = 3.75

C_0 = 1
C_1 = 0
C_2 = 0
C_3 = (1/6)(-.5) = -.08333
C_4 = (1/24)(3.75 - 6 + 3) = .03125

P(Z ≤ 1) ≈ ∫_{-∞}^{1} h(z)[1 + C_3H_3(z) + C_4H_4(z)] dz

         = .8413 + .0833 h(1)H_2(1) - .03125 h(1)H_3(1)
which says that (if only the first and second moments of a random variable are
known) the Gaussian pdf is used as an approximation to the underlying pdf. As
BOUNDS A N D APPROXIMATIONS 87
we add more terms, the higher order terms will force the pdf to take a more
proper shape.
A series of the form given in Equation 2.90 is useful only if it converges
rapidly and the terms can be calculated easily. This is true for the Gram-Charlier
series when the underlying pdf is nearly Gaussian or when the random variable
X is the sum of many independent components. Unfortunately, the Gram-
Charlier series is not uniformly convergent, thus adding more terms does not
guarantee increased accuracy. A rule of thumb suggests four to six terms for
many practical applications.
Q(y) = h(y)[b_1t + b_2t² + b_3t³ + b_4t⁴ + b_5t⁵] + ε(y),   y ≥ 0     (2.91.a)

where

t = 1/(1 + py)                      b_2 = -.356563782
|ε(y)| < 7.5 × 10⁻⁸                 b_3 = 1.781477937
p = .2316419                        b_4 = -1.821255978
b_1 = .319381530                    b_5 = 1.330274429
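Equation 2.91.a is convenient for computation. The sketch below implements the approximation with the constants listed above and compares it with the exact Gaussian tail probability obtained from the error function.

```python
from math import erf, exp, pi, sqrt

def q_exact(y):
    return 0.5 * (1.0 - erf(y / sqrt(2.0)))

def q_approx(y):
    # Rational approximation of Equation 2.91.a with the constants listed above
    p = 0.2316419
    b = [0.319381530, -0.356563782, 1.781477937, -1.821255978, 1.330274429]
    t = 1.0 / (1.0 + p * y)
    h = exp(-y * y / 2.0) / sqrt(2.0 * pi)
    return h * sum(bk * t**(k + 1) for k, bk in enumerate(b))

for y in (0.5, 1.0, 2.0, 3.0):
    print(y, q_exact(y), q_approx(y))   # agreement to about 7 decimal places
```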
t_n → t_0 as n → ∞

converges for every λ ∈ S, then we say that the random sequence converges
everywhere. The limit of each sequence can depend upon λ, and if we denote
the limit by X, then X is a random variable.
Now, there may be cases where the sequence does not converge for every
outcome. In such cases if the set of outcomes for which the limit exists has a
probability of 1, that is, if

P{λ : lim_{n→∞} X_n(λ) = X(λ)} = 1

then we say that the sequence converges almost everywhere or almost surely.
This is written as

P(X_n → X) = 1   as n → ∞

for all x at which F(x) is continuous, then we say that the sequence X_n converges
in distribution to X.

Z_n = Σ_{i=1}^{n} (X_i - μ)/√(nσ²)

Then Z_n has a limiting (as n → ∞) distribution that is Gaussian with mean 0 and
variance 1.
The central limit theorem can be proved as follows. Suppose we assume that
the moment-generating function M(t) of X_k exists for |t| < h. Then the function

m(t) = E{exp[t(X_k - μ)]}

exists for -h < t < h. Furthermore, since X_k has a finite mean and variance,
the first two derivatives of M(t) and hence the derivatives of m(t) exist at t = 0.
Next consider

M_n(τ) = E{exp(τZ_n)}

       = E{ exp[τ(X_1 - μ)/(σ√n)] exp[τ(X_2 - μ)/(σ√n)] ··· exp[τ(X_n - μ)/(σ√n)] }

       = Π_{k=1}^{n} E{exp[τ(X_k - μ)/(σ√n)]}

       = [m(τ/(σ√n))]^n,       -h < τ/(σ√n) < h

Expanding m in a Taylor series with a remainder term, we can write

m(τ/(σ√n)) = 1 + τ²/(2n) + [m''(ξ) - σ²]τ²/(2nσ²),   0 < ξ < τ/(σ√n)

Since m''(t) is continuous at t = 0 and ξ → 0 as n → ∞,

lim_{n→∞} [m''(ξ) - σ²] = 0

and

lim_{n→∞} M_n(τ) = lim_{n→∞} { 1 + τ²/(2n) + o(1/n) }^n
                 = exp(τ²/2)                                         (2.94)
(The last step follows from the familiar calculus formula lim_{n→∞}(1 + a/n)^n =
e^a.) Since exp(τ²/2) is the moment-generating function of a Gaussian random
variable with 0 mean and variance 1, and since the moment-generating function
uniquely determines the underlying pdf at all points of continuity, Equation
2.94 shows that Z_n converges to a Gaussian distribution with 0 mean and vari-
ance 1.
In many engineering applications, the central limit theorem and hence the
Gaussian pdf play an important role. For example, the output of a linear system
is a weighted sum of the input values, and if the input is a sequence o f random
variables, then the output can be approximated by a Gaussian distribution.
Another example is the total noise in a radio link that can be modeled as the
sum o f the contributions from a large number of independent sources. The
central limit theorem permits us to model the total noise by a Gaussian distri
bution.
We had assumed that the X_i's are independent and identically distributed and
that the moment-generating function exists in order to prove the central limit
theorem. The theorem, however, holds under a variety of weaker conditions
(Reference [6]):
The assumption of finite variances, however, is essential for the central limit
theorem to hold.
Finite Sums. The central limit theorem states that an infinite sum, Y, has a
normal distribution. For a finite sum of independent random variables, that is,

Y = Σ_{i=1}^{n} X_i

then

f_Y = f_{X_1} * f_{X_2} * ··· * f_{X_n}

Ψ_Y(ω) = Π_{i=1}^{n} Ψ_{X_i}(ω)

and

C_Y(ω) = Σ_{i=1}^{n} C_{X_i}(ω)

K_{n,Y} = Σ_{i=1}^{n} K_{n,X_i}

μ_Y = Σ_{i=1}^{n} μ_{X_i}

σ_Y² = Σ_{i=1}^{n} σ_{X_i}²

K_{4,Y} = Σ_{i=1}^{n} K_{4,X_i} = Σ_{i=1}^{n} ( E{(X_i - μ_{X_i})⁴} - 3σ_{X_i}⁴ )

For finite sums the normal distribution is often rapidly approached; thus a
Gaussian approximation or a Gram-Charlier approximation is often appropriate.
The following example illustrates the rapid approach to a normal distribution.
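The rapid approach to normality for finite sums is easy to see by simulation. The sketch below uses the sum of ten independent uniform resistances of Problem 2.58 (values 900 to 1100 ohms, an assumption taken from that problem) and compares a tail probability of the sum with its Gaussian approximation.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(5)

# Sum of n = 10 independent uniform(900, 1100) resistances (setup of Problem 2.58).
n = 10
r = rng.uniform(900.0, 1100.0, size=(200_000, n)).sum(axis=1)

mu = n * 1000.0                   # means add
var = n * (200.0 ** 2) / 12.0     # variances (second cumulants) add
print(r.mean(), r.var())          # close to 10000 and 33333

# Compare a tail probability of the sum with its Gaussian approximation
print(np.mean(r < 9800.0))
print(0.5 * (1.0 + erf((9800.0 - mu) / sqrt(2.0 * var))))   # about 0.137
```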
EXAMPLE 2.26.
is, if

P{|X - X_n| > ε} → 0   as n → ∞
for any e > 0, then we say that X„ converges to the random variable X in
probability. This is also called stochastic convergence. An important application
of convergence in probability is the law of large numbers.
X̄_n = (1/n) Σ_{i=1}^{n} X_i                                          (2.95.a)
The law of large numbers can be proved directly by using Tchebycheff’s ine
quality.
E[(X_n - X)²] → 0   as n → ∞                                         (2.96)

If Equation 2.96 holds, then the random variable X is called the mean square
limit of the sequence X_n, and we use the notation

l.i.m. X_n = X

For random sequences the following version of the Cauchy criterion applies:

E{(X_n - X)²} → 0   as n → ∞

if and only if

E{(X_n - X_m)²} → 0   as n, m → ∞
2.9 SUMMARY
The reviews of probability, random variables, distribution function, probabil
ity mass function (for discrete random variables), and probability density
functions (for continuous random variables) were brief, as was the review of
expected value. Four particularly useful expected values were briefly dis
cussed: the characteristic function E{exp(jωX)}; the moment generating func-
tion E{exp(tX)}; the cumulant generating function ln E{exp(tX)}; and the
probability generating function E{z^X} (for non-negative integer-valued random
variables).
The review o f random vectors, that is, vector random variables, extended the
ideas of marginal, joint, and conditional density function to n dimensions,
and vector notation was introduced. Multivariate normal random variables
were emphasized.
2.10 REFERENCES
The material presented in this chapter was intended as a review of probability and random
variables. For additional details, the reader may refer to one of the following books.
Reference [2], particularly Vol. 1, has become a classic text for courses in probability
theory. References [8] and the first edition of [7] are widely used for courses in applied
probability taught by electrical engineering departments. References [1], [3], and [10]
also provide an introduction to probability from an electrical engineering perspective.
Reference [4] is a widely used text for statistics and the first five chapters are an excellent
introduction to probability. Reference [5] contains an excellent treatment of series ap
proximations and cumulants. Reference [6] is written at a slightly higher level and presents
the theory of many useful applications. Reference [9] describes a theory of probable
reasoning that is based on a set of axioms that differs from those used in probability.
[1] A. M. Breipohl, Probabilistic Systems Analysis, John Wiley & Sons, New York,
1970.
[2] W. Feller, An Introduction to Probability Theory and Applications, Vols. I, II,
John Wiley & Sons, New York, 1957, 1967.
[3] C. H. Helstrom, Probability and Stochastic Processes for Engineers, Macmillan,
New York, 1977.
[4] R. V. Hogg and A. T. Craig, Introduction to Mathematical Statistics, Macmillan,
New York, 1978.
[5] M. Kendall and A. Stuart, The Advanced Theory o f Statistics, Vol. 1, 4th ed.,
Macmillan, New York, 1977.
[6] H. L. Larson and B. O. Shubert, Probabilistic Models in Engineering Sciences,
Vol. I, John Wiley & Sons, New York, 1979.
[7] A. Papoulis, Probability, Random Variables and Stochastic Processes, McGraw-
Hill, New York, 1984.
[8] P. Z. Peebles, Jr., Probability, Random Variables, and Random Signal Principles,
2nd ed., McGraw-Hill, New York, 1987.
[9] G. Shafer, A Mathematical Theory o f Evidence, Princeton University Press, Prince
ton, N.J., 1976.
[10] J. B. Thomas, An Introduction to Applied Probability and Random Processes, John
Wiley & Sons, New York, 1971.
2.11 PROBLEMS
2.1 Suppose we draw four cards from an ordinary deck o f cards. Let
A x: an ace on the first draw
2.2 A random experiment consists of tossing a die and observing the number
of dots showing up. Let
A x: number of dots showing up = 3
2.3 A box contains three 100-ohm resistors labeled R_1, R_2, and R_3 and two
    1000-ohm resistors labeled R_4 and R_5. Two resistors are drawn from this
    box without replacement.
Work parts (b), (c), and (d) by counting the outcomes that belong to the
appropriate events.
2.4 With reference to the random experiment described in Problem 2.3, define
the following events.
a. P(A ∪ B ∪ C) = P(A) + P(B) + P(C) - P(AB) - P(BC)
   - P(CA) + P(ABC).
b. P(A|B) = P(A) implies P(B|A) = P(B).
c. P(ABC) = P(A)P(B|A)P(C|AB).
2.6 A_1, A_2, A_3 are three mutually exclusive and exhaustive sets of events as-
    sociated with a random experiment E_1. Events B_1, B_2, and B_3 are mutually

Figure 2.19 Circuit diagram for Problem 2.8.
            B_1      B_2      B_3
    A_1     3/36     *        5/36
    A_2     5/36     4/36     5/36
    A_3     *        6/36     *
    P(B_j)  12/36    14/36    *
2.7 There are two bags containing mixtures o f blue and red marbles. The first
bag contains 7 red marbles and 3 blue marbles. The second bag contains 4
red marbles and 5 blue marbles. One marble is drawn from bag one and
transferred to bag two. Then a marble is taken out of bag two. Given that
the marble drawn from the second bag is red, find the probability that the
color of the marble transferred from the first bag to the second bag was
blue.
2.8 In the diagram shown in Figure 2.19, each switch is in a closed state with
probability p, and in the open state with probability 1 — p. Assuming that
the state o f one switch is independent of the state o f another switch, find
the probability that a closed path can be maintained between A and B
(Note: There are many closed paths between A and B.)
2.9 The probability that a student passes a certain exam is .9, given that he
    studied. The probability that he passes the exam without studying is .2.
    Assume that the probability that the student studies for an exam is .75 (a
    somewhat lazy student). Given that the student passed the exam, what is
    the probability that he studied?
2.10 A fair coin is tossed four times and the faces showing up are observed.
     a. List all the outcomes of this random experiment.
     b. If X is the number of heads in each of the outcomes of this ex-
        periment, find the probability mass function of X.
2.11 Two dice are tossed. Let X be the sum o f the numbers showing up. Find
the probability mass function o f X .
2.13 Show that the mean and variance of a binomial random variable X are
     μ_X = np and σ_X² = npq, where q = 1 - p.
2.14 Show that the mean and variance of a Poisson random variable are μ_X = λ
     and σ_X² = λ.
2.15 The probability mass function of a geometric random variable has the form
     P(X = k) = pq^{k-1},   k = 1, 2, 3, . . . ;   p, q > 0,  p + q = 1.
a. Find the mean and variance of X .
b. Find the probability-generating function o f X.
2.16 Suppose that you are trying to market a digital transmission system (mo-
     dem) that has a bit error probability of 10⁻⁴ and the bit errors are inde-
     pendent. The buyer will test your modem by sending a known message of
     10⁴ digits and checking the received message. If more than two errors
     occur, your modem will be rejected. Find the probability that the customer
     will buy your modem.
            X
     Y      -1      0       1
     -1     1/4     1/8     0
      0     0       1/4     0
      1     0       1/8     1/4

     c. Find ρ_XY.
2.18 Show that the expected value operator has the following properties.
     a. E{a + bX} = a + bE{X}
     b. E{aX + bY} = aE{X} + bE{Y}
     c. Variance of aX + bY = a² Var[X] + b² Var[Y]
        + 2ab Covar[X, Y]
2.20 A thief has been placed in a prison that has three doors. One o f the doors
leads him on a one-day trip, after which he is dumped on his head (which
destroys his memory as to which door he chose). Another door is similar
except he takes a three-day trip before being dumped on his head. The
third door leads to freedom. Assume he chooses a door immediately and
with probability 1/3 when he has a chance. Find his expected number of
days to freedom. {Hint: Use conditional expectation.)
2.21 Consider the circuit shown in Figure 2.20. Let the time at which the ith
switch closes be denoted by X t. Suppose X x, X 2, X 3, X 4 are independent,
identically distributed random variables each with distribution function F.
     As time increases, switches will close until there is an electrical path from
     A to C. Let
     U = time when circuit is first completed from A to B
     V = time when circuit is first completed from B to C
     W = time when circuit is first completed from A to C
     Find the following:
     a. The distribution function of U.

Figure 2.20 Circuit diagram for Problem 2.21.
2.23 Show that the mean and variance of a random variable X having a uniform
     distribution in the interval [a, b] are μ_X = (a + b)/2 and σ_X² = (b - a)²/12.
2.25 X is a zero mean Gaussian random variable with a variance of σ_X². Show
     that
2.26 Show that the characteristic function of a random variable can be expanded
     as

     Ψ_X(ω) = Σ_{k=0}^{n} E{X^k} (jω)^k/k! + R_n

     (Note: The series must be terminated by a remainder term just before the
     first infinite moment, if any exist.)
2.27 a. Show that the characteristic function of the sum of two independent
random variables is equal to the product of the characteristic functions of
the two variables.
b. Show that the cumulant generating function of the sum of two
independent random variables is equal to the sum o f the cumulant gen
erating function of the two variables.
     c. Show that Equations 2.52.c through 2.52.f are correct by equating
     coefficients of like powers of jω in Equation 2.52.b.
     f_X(x) = a/[π(x² + a²)],   a > 0
a. Find the characteristic function of X.
b. Comment about the first two moments of X.
2.33 X and Y are independent zero mean Gaussian random variables with
     variances σ_X² and σ_Y². Let
     Z = (1/2)(X + Y)  and  W = (1/2)(X - Y)
a. Find the joint pdf f z,w (z, w).
b. Find the marginal pdf / Z(z).
c. Are Z and W independent?
     Z = (1/n)[X_1 + X_2 + ··· + X_n]

     is a Gaussian random variable with μ_Z = 0 and σ_Z² = σ²/n. (Use the result
     derived in Problem 2.32.)
2.35 X is a Gaussian random variable with mean 0 and variance σ_X². Find the
     pdf of Y if:
     a. Y = X²
     b. Y = |X|
     c. Y = (1/2)[X + |X|]
     d. Y =  1    if X > σ_X
             X    if |X| ≤ σ_X
            -1    if X < -σ_X
2.38 X_1 and X_2 are two independent random variables with uniform pdfs in the
     interval [0, 1]. Let
     Y_1 = X_1 + X_2  and  Y_2 = X_1 - X_2
     a. Find the joint pdf f_{Y_1,Y_2}(y_1, y_2) and clearly identify the domain
        where this joint pdf is nonzero.
     b. Find ρ_{Y_1Y_2} and E{Y_1|Y_2 = 0.5}.
2.39 X_1 and X_2 are two independent random variables each with the following
     density function:
     f_{X_i}(x) = e^{-x},   x > 0
               = 0,        x < 0

     Let Y_1 = X_1 + X_2 and Y_2 = X_1/(X_1 + X_2).
     Y = Σ_{i=1}^{n} X_i
2.41 X is uniformly distributed in the interval [-π, π]. Find the pdf of
     Y = a sin(X).
" i i i ”
"6 " 2 4 3
i O 2
|J-X = 0 2 x — 4 - ^ 3
I 2 1
8 3 3 1
     Find the mean vector and the covariance matrix of Y = [Y_1, Y_2, Y_3]^T,
     where
     Y_1 = X_1 - X_2
     Y_2 = X_1 + X_2 - 2X_3
     Y_3 = X_1 + X_3
     Show that for any vector v^T = (v_1, v_2, . . . , v_n),

     v^T Σ_X v ≥ 0

     (This is the condition for positive semidefiniteness of a matrix.)
     Y = AX

     where

     A = [v_1, v_2, v_3, . . . , v_n]^T   (n × n)

     and

     Σ_Y = | λ_1           0  |
           |      λ_2         |
           |          ···     |
           |  0           λ_n |
2.48 If U(x) ≥ 0 for all x and U(x) ≥ a > 0 for all x ∈ ξ, where ξ is some
     interval, show that

     P[U(X) ≥ a] ≤ (1/a) E{U(X)}
2.49 Plot the Tchebycheff and Chernoff bounds as well as the exact values for
     P(X > a), a > 0, if X is

2.50 Compare the Tchebycheff and Chernoff bounds on P(Y > a) with exact
     values for the Laplacian pdf
     f_Y(y) = (1/2) exp(-|y|)
     Y = X + N

     where X is the "signal" component and N is the noise. X can have one
     of eight values shown in Figure 2.21, and N has an uncorrelated bivariate
     Gaussian distribution with zero means and variances of 9. The signal X
     and noise N can be assumed to be independent.
     The receiver observes Y and determines an estimated value X̂ of X
     according to the algorithm

     if y ∈ A_i  then  X̂ = x_i

     The decision regions A_i for i = 1, 2, 3, . . . , 8 are illustrated by A_1 in
     Figure 2.21. Obtain an upper bound on P(X̂ ≠ X) assuming that P(X =
     x_i) = 1/8 for i = 1, 2, . . . , 8.

     Hint:

     1. P(X̂ ≠ X) = Σ_{i=1}^{8} P(X̂ ≠ X|X = x_i) P(X = x_i)
Figure 2.21 Signal values and decision regions for Problem 2.51.
2.52 Show that

     (-1)^k d^k h(y)/dy^k = H_k(y)h(y),   k = 1, 2, . . .
2.53 X has a triangular pdf centered in the interval [ - 1 , 1], Obtain a Gram-
Charlier approximation to the pdf o f X that includes the first six moments
o f X and sketch the approximation for values o f X ranging from —2 to 2.
2.54 Let p be the probability of obtaining heads when a coin is tossed. Suppose
we toss the coin N times and form an estimate of p as
     f_{X_1,X_2,...,X_n}(x_1, x_2, . . . , x_n) = Π_{i=1}^{n} f_X(x_i)
     Assume that μ_X = 0 and σ_X² is finite.
     a. Find the mean and variance of (1/n) Σ_{i=1}^{n} X_i
2.56 Show that if the X_i's are of continuous type and independent, then for suffi-
     ciently large n the density of sin(X_1 + X_2 + ··· + X_n) is nearly equal
     to the density of sin(X) where X is a random variable with uniform dis-
     tribution in the interval (-π, π).
2.57 Using the Cauchy criterion, show that a sequence X_n tends to a limit in
     the MS sense if and only if E{X_mX_n} exists as m, n → ∞.
2.58 A box has a large number o f 1000-ohm resistors with a tolerance of ±100
ohms (assume a uniform distribution in the interval 900 to 1100 ohms).
Suppose we draw 10 resistors from this box and connect them in series
PROBLEMS 109
and let R be the resistive value o f the series combination. Using the Gaus
sian approximation for R find
P[9000 < R < 11000]
2.59 Let
2.60 Y is a Gaussian random variable with zero mean and unit variance and

     X_n = sin(Y/n)   if Y > 0
         = cos(Y/n)   if Y ≤ 0

     Discuss the convergence of the sequence X_n. (Does the sequence converge,
     and if so, in what sense?)
2.61 Let Y be the number of dots that show up when a die is tossed, and let
X n = e x p [ - n ( Y - 3)]
Discuss the convergence o f the sequence X n.
2.62 Y is a Gaussian random variable with zero mean and unit variance and
     X_n = exp(-Y/n)
     Discuss the convergence of the sequence X_n.