CHAPTER TWO

Review of Probability and Random Variables

2.1 INTRODUCTION

The purpose of this chapter is to provide a review of probability for those electrical engineering students who have already completed a course in probability. We assume that course covered at least the material that is presented here in Sections 2.2 through 2.4. Thus, the material in these sections is particularly brief and includes very few examples. Sections 2.5 through 2.8 may or may not have been covered in the prerequisite course; thus, we elaborate more in these sections. Those aspects of probability theory and random variables used in later chapters and in applications are emphasized. The presentation in this chapter relies heavily on intuitive reasoning rather than on mathematical rigor. The bulk of the proofs of statements and theorems are left as exercises for the reader to complete. Those wishing a detailed treatment of this subject are referred to several well-written texts listed in Section 2.10.
We begin our review of probability and random variables with an introduction to basic sets and set operations. We then define probability measure and review the two most commonly used probability measures. Next we state the rules governing the calculation of probabilities, present the notion of multiple or joint experiments, and develop the rules governing the calculation of probabilities associated with joint experiments.
The concept of random variable is introduced next. A random variable is characterized by a probabilistic model that consists of (1) the probability space, (2) the set of values that the random variable can have, and (3) a rule for computing the probability that the random variable has a value that belongs to a subset of the set of all permissible values. The use of probability distribution functions and density functions is developed. We then discuss summary measures (averages or expected values) that frequently prove useful in characterizing random variables.
Vector-valued random variables (or random vectors, as they are often referred to) and methods of characterizing them are introduced in Section 2.5. Various multivariate distribution and density functions that form the basis of probability models for random vectors are presented.
As electrical engineers, we are often interested in calculating the response of a system for a given input. Procedures for calculating the details of the probability model for the output of a system driven by a random input are developed in Section 2.6.
In Section 2.7, we introduce inequalities for computing probabilities, which are often very useful in many applications because they require less knowledge about the random variables. A series approximation to a density function based on some of its moments is introduced, and an approximation to the distribution of a random variable that is a nonlinear function of other (known) random variables is presented.
Convergence of sequences of random variables is the final topic introduced in this chapter. Examples of convergence are the law of large numbers and the central limit theorem.

2.2 PROBABILITY

In this section we outline mathematical techniques for describing the results of an experiment whose outcome is not known in advance. Such an experiment is called a random experiment. The mathematical approach used for studying the results of random experiments and random phenomena is called probability theory. We begin our review of probability with some basic definitions and axioms.

2.2.1 Set Definitions


A set is defined to be a collection of elements. Notationally, capital letters A, B, . . . , will designate sets; and the small letters a, b, . . . , will designate elements or members of a set. The symbol ∈ is read as "is an element of," and the symbol ∉ is read "is not an element of." Thus x ∈ A is read "x is an element of A."
Two special sets are of some interest. A set that has no elements is called the empty set or null set and will be denoted by ∅. A set having at least one element is called nonempty. The whole or entire space S is a set that contains all other sets under consideration in the problem.
A set is countable if its elements can be put into one-to-one correspondence with the integers. A countable set that has a finite number of elements and the null set are called finite sets. A set that is not countable is called uncountable. A set that is not finite is called an infinite set.

Subset. Given two sets A and B, the notation

A ⊂ B

or equivalently

B ⊃ A

is read A is contained in B, or A is a subset of B, or B contains A. Thus A is contained in B, or A ⊂ B, if and only if every element of A is an element of B.
There are three results that follow from the foregoing definitions. For an arbitrary set A,

A ⊂ S
∅ ⊂ A
A ⊂ A

Set Equality. Two arbitrary sets, A and B, are called equal if and only if they contain exactly the same elements, or equivalently,

A = B if and only if A ⊂ B and B ⊂ A

Union. The union of two arbitrary sets, A and B, is written as

A ∪ B

and is the set of all elements that belong to A or belong to B (or to both). The union of N sets is obtained by repeated application of the foregoing definition and is denoted by

A₁ ∪ A₂ ∪ ⋯ ∪ A_N = ⋃_{i=1}^{N} A_i
Intersection. The intersection of two arbitrary sets, A and B, is written as

A ∩ B

and is the set of all elements that belong to both A and B. A ∩ B is also written AB. The intersection of N sets is written as

A₁ ∩ A₂ ∩ ⋯ ∩ A_N = ⋂_{i=1}^{N} A_i

Mutually Exclusive. Two sets are called mutually exclusive (or disjoint) if they have no common elements; that is, two arbitrary sets A and B are mutually exclusive if

A ∩ B = AB = ∅

where ∅ is the null set.
The n sets A₁, A₂, . . . , Aₙ are called mutually exclusive if

A_i ∩ A_j = ∅ for all i, j, i ≠ j

Complement. The complement, Ā, of a set A relative to S is defined as the set of all elements of S that are not in A.
Let S be the whole space and let A, B, C be arbitrary subsets of S. The following results can be verified by applying the definitions and verifying that each is a subset of the other. Note that the operator precedence is (1) parentheses, (2) complement, (3) intersection, and (4) union.

Commutative Laws.

A ∪ B = B ∪ A
A ∩ B = B ∩ A

Associative Laws.

(A ∪ B) ∪ C = A ∪ (B ∪ C) = A ∪ B ∪ C
(A ∩ B) ∩ C = A ∩ (B ∩ C) = A ∩ B ∩ C

Distributive Laws.

A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)

DeMorgan’s Laws.

complement of (A ∪ B) = Ā ∩ B̄
complement of (A ∩ B) = Ā ∪ B̄
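As a quick numerical illustration of the identities above, the following Python sketch checks the distributive and DeMorgan's laws for one arbitrary choice of the whole space S and subsets A, B, C (the particular sets are not from the text; they are placeholders for the illustration).

# Minimal check of the set identities using Python's built-in set type.
S = set(range(10))          # whole space (arbitrary illustrative choice)
A = {0, 1, 2, 3}
B = {2, 3, 4, 5}
C = {3, 5, 7}

def complement(X, space=S):
    """Complement of X relative to the whole space."""
    return space - X

# Distributive laws
assert A & (B | C) == (A & B) | (A & C)
assert A | (B & C) == (A | B) & (A | C)

# DeMorgan's laws
assert complement(A | B) == complement(A) & complement(B)
assert complement(A & B) == complement(A) | complement(B)

print("All set identities hold for this example.")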

2.2.2 Sample Space


When applying the concept of sets in the theory of probability, the whole space will consist of elements that are outcomes of an experiment. In this text an experiment is a sequence of actions that produces outcomes (that are not known in advance). This definition of experiment is broad enough to encompass the usual scientific experiment and other actions that are sometimes regarded as observations.
The totality of all possible outcomes is the sample space. Thus, in applications of probability, outcomes correspond to elements and the sample space corresponds to S, the whole space. With these definitions an event may be defined as a collection of outcomes. Thus, an event is a set, or subset, of the sample space. An event A is said to have occurred if the experiment results in an outcome that is an element of A.
For mathematical reasons, one defines a completely additive family of subsets of S to be events, where the class, 𝒮, of sets defined on S is called completely additive if:

1. S ∈ 𝒮

2. If A_k ∈ 𝒮 for k = 1, 2, 3, . . . , then ⋃_{k=1}^{n} A_k ∈ 𝒮 for n = 1, 2, 3, . . .

3. If A ∈ 𝒮, then Ā ∈ 𝒮, where Ā is the complement of A

2.2.3 Probabilities of Random Events


Using the simple definitions given before, we now proceed to define the probabilities (of occurrence) of random events. The probability of an event A, denoted by P(A), is a number assigned to this event. There are several ways in which probabilities can be assigned to outcomes and events that are subsets of the sample space. In order to arrive at a satisfactory theory of probability (a theory that does not depend on the method used for assigning probabilities to events), the probability measure is required to obey a set of axioms.

Definition. A probability measure is a set function whose domain is a completely additive class 𝒮 of events defined on the sample space S such that the measure satisfies the following conditions:
1. P(S) = 1     (2.1)

2. P(A) ≥ 0 for all A ∈ 𝒮     (2.2)

3. P(⋃_{i=1}^{N} A_i) = Σ_{i=1}^{N} P(A_i) if A_i ∩ A_j = ∅ for i ≠ j,
   where N may be infinite
   (∅ is the empty or null set)     (2.3)

A random experiment is completely described by a sample space, a probability measure (i.e., a rule for assigning probabilities), and the class of sets forming the domain set of the probability measure. The combination of these three items is called a probabilistic model.
By assigning numbers to events, a probability measure distributes numbers over the sample space. This intuitive notion has led to the use of probability distribution as another name for a probability measure. We now present two widely used definitions of the probability measure.

Relative Frequency Definition. Suppose that a random experiment is repeated n times. If the event A occurs n_A times, then its probability P(A) is defined as the limit of the relative frequency n_A/n of the occurrence of A. That is

P(A) = lim_{n→∞} n_A/n     (2.4)

For example, if a coin (fair or not) is tossed n times and heads show up n_H times, then the probability of heads equals the limiting value of n_H/n.
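A minimal simulation sketch of this definition: the bias p_true below is an arbitrary choice (not from the text), and the printed relative frequencies settle near it as n grows, in the spirit of Equation 2.4.

import random

random.seed(1)
p_true = 0.3          # arbitrary coin bias chosen for the illustration
for n in (10, 100, 1000, 10000, 100000):
    n_heads = sum(random.random() < p_true for _ in range(n))
    print(f"n = {n:6d}   n_H/n = {n_heads / n:.4f}")
# The ratios n_H/n approach p_true = 0.3 as the number of tosses increases.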

Classical Definition. In this definition, the probability P(A) of an event A is found without experimentation. This is done by counting the total number, N, of the possible outcomes of the experiment, that is, the number of outcomes in S (S is finite). If N_A of these outcomes belong to event A, then P(A) is defined to be

P(A) = N_A/N     (2.5)

If we use this definition to find the probability of a tail when a coin is tossed, we will obtain an answer of 1/2. This answer is correct when we have a fair coin. If the coin is not fair, then the classical definition will lead to incorrect values for probabilities. We can take this possibility into account and modify the definition as: the probability of an event A consisting of N_A outcomes equals the ratio N_A/N provided the outcomes are equally likely to occur.
The reader can verify that the two definitions of probabilities given in the
preceding paragraphs indeed satisfy the axioms stated in Equations 2.1-2.3. The
difference between these two definitions is illustrated by Example 2.1.

EXAMPLE 2.1. (Adapted from Shafer [9]).

DIME-STORE DICE: Willard H. Longcor of Waukegan, Illinois, reported in the late 1960s that he had thrown a certain type of plastic die with drilled pips over one million times, using a new die every 20,000 throws because the die wore down. In order to avoid recording errors, Longcor recorded only whether the outcome of each throw was odd or even, but a group of Harvard scholars who analyzed Longcor's data and studied the effects of the drilled pips in the die guessed that the chances of the six different outcomes might be approximated by the relative frequencies in the following table:

Up face               1      2      3      4      5      6      Total
Relative Frequency   .155   .159   .164   .169   .174   .179    1.000
Classical            1/6    1/6    1/6    1/6    1/6    1/6     1.000

They obtained these frequencies by calculating the excess of even over odd in Longcor's data and supposing that each side of the die is favored in proportion to the extent that it has more drilled pips than the opposite side. The 6, since it is opposite the 1, is the most favored.

2.2.4 Useful Laws of Probability

Using any of the many definitions of probability that satisfies the axioms given in Equations 2.1, 2.2, and 2.3, we can establish the following relationships:

1. If ∅ is the null event, then

   P(∅) = 0     (2.6)

2. For an arbitrary event A,

   P(A) ≤ 1     (2.7)

3. If A ∪ Ā = S and A ∩ Ā = ∅, then Ā is called the complement of A and

   P(Ā) = 1 − P(A)     (2.8)

4. If A is a subset of B, that is, A ⊂ B, then

   P(A) ≤ P(B)     (2.9)

5. P(A ∪ B) = P(A) + P(B) − P(A ∩ B)     (2.10.a)

6. P(A ∪ B) ≤ P(A) + P(B)     (2.10.b)

7. If A₁, A₂, . . . , Aₙ are random events such that

   A_i ∩ A_j = ∅ for i ≠ j     (2.10.c)

   and

   A₁ ∪ A₂ ∪ ⋯ ∪ Aₙ = S     (2.10.d)

   then

   P(A) = P(A ∩ S) = P[A ∩ (A₁ ∪ A₂ ∪ ⋯ ∪ Aₙ)]
        = P[(A ∩ A₁) ∪ (A ∩ A₂) ∪ ⋯ ∪ (A ∩ Aₙ)]
        = P(A ∩ A₁) + P(A ∩ A₂) + ⋯ + P(A ∩ Aₙ)     (2.10.e)

   The sets A₁, A₂, . . . , Aₙ are said to be mutually exclusive and exhaustive if Equations 2.10.c and 2.10.d are satisfied.

8. P(A₁ ∪ A₂ ∪ ⋯ ∪ Aₙ) = P(A₁) + P(Ā₁A₂) + P(Ā₁Ā₂A₃) + ⋯     (2.11)

Proofs of these relationships are left as an exercise for the reader.

2.2.5 Joint, Marginal, and Conditional Probabilities


In many engineering applications we often perform an experiment that consists of many subexperiments. Two examples are the simultaneous observation of the input and output digits of a binary communication system, and simultaneous observation of the trajectories of several objects in space. Suppose we have a random experiment E that consists of two subexperiments E₁ and E₂ (for example, E: toss a die and a coin; E₁: toss a die; and E₂: toss a coin). Now if the sample space S₁ of E₁ consists of outcomes a₁, a₂, . . . , a_{n₁}, and the sample space S₂ of E₂ consists of outcomes b₁, b₂, . . . , b_{n₂}, then the sample space S of the combined experiment is the Cartesian product of S₁ and S₂. That is

S = S₁ × S₂
  = {(a_i, b_j): i = 1, 2, . . . , n₁;  j = 1, 2, . . . , n₂}

We can define probability measures on S₁, S₂, and S = S₁ × S₂. If events A₁, A₂, . . . , Aₙ are defined for the first subexperiment E₁, and the events B₁, B₂, . . . , B_m are defined for the second subexperiment E₂, then A_i ∩ B_j is an event of the total experiment.

Joint Probability. The probability of an event such as A_i ∩ B_j that is the intersection of events from subexperiments is called the joint probability of the event and is denoted by P(A_i ∩ B_j). The abbreviation A_iB_j is often used to denote A_i ∩ B_j.

Marginal Probability. If the events A₁, A₂, . . . , Aₙ associated with subexperiment E₁ are mutually exclusive and exhaustive, then

P(B_j) = P(B_j ∩ S) = P[B_j ∩ (A₁ ∪ A₂ ∪ ⋯ ∪ Aₙ)]
       = Σ_{i=1}^{n} P(A_iB_j)     (2.12)

Since B_j is an event associated with subexperiment E₂, P(B_j) is called a marginal probability.

Conditional Probability. Quite often, the probability of occurrence of event B_j may depend on the occurrence of a related event A_i. For example, imagine a box containing six resistors and one capacitor. Suppose we draw a component from the box. Then, without replacing the first component, we draw a second component. Now, the probability of getting a capacitor on the second draw depends on the outcome of the first draw. For if we had drawn a capacitor on the first draw, then the probability of getting a capacitor on the second draw is zero since there is no capacitor left in the box! Thus, we have a situation where the occurrence of event B_j (a capacitor on the second draw) on the second subexperiment is conditional on the occurrence of event A_i (the component drawn first) on the first subexperiment. We denote the probability of event B_j given that event A_i is known to have occurred by the conditional probability P(B_j|A_i).
An expression for the conditional probability P(B|A) in terms of the joint probability P(AB) and the marginal probabilities P(A) and P(B) can be obtained as follows using the classical definition of probability. Let N_A, N_B, and N_AB be the number of outcomes belonging to events A, B, and AB, respectively, and let N be the total number of outcomes in the sample space. Then,

P(A) = N_A/N,   P(B) = N_B/N,   P(AB) = N_AB/N     (2.13)

Given that the event A has occurred, we know that the outcome is in A. There are N_A outcomes in A. Now, for B to occur given that A has occurred, the outcome should belong to A and B. There are N_AB outcomes in AB. Thus, the probability of occurrence of B given A has occurred is

N_AB/N_A = (N_AB/N)/(N_A/N) = P(AB)/P(A)

The implicit assumption here is that N_A ≠ 0. Based on this motivation we define conditional probability by

P(B|A) = P(AB)/P(A)     (2.14)

One can show that P(B|A) as defined by Equation 2.14 is a probability measure, that is, it satisfies Equations 2.1, 2.2, and 2.3.

Relationships Involving Joint, Marginal, and Conditional Probabilities. The reader can use the results given in Equations 2.12 and 2.14 to establish the following useful relationships.

1. P(AB) = P(A|B)P(B) = P(B|A)P(A)     (2.15)

2. If AB = ∅, then P(A ∪ B|C) = P(A|C) + P(B|C)     (2.16)

3. P(ABC) = P(A)P(B|A)P(C|AB)   (Chain Rule)     (2.17)

4. If B₁, B₂, . . . , B_m are a set of mutually exclusive and exhaustive events, then

   P(A) = Σ_{j=1}^{m} P(A|B_j)P(B_j)     (2.18)

EXAMPLE 2.2.

An examination of records on certain components showed the following results when classified by manufacturer and class of defect:

                                  Class of Defect
Manufacturer   B₁ = none   B₂ = critical   B₃ = serious   B₄ = minor   B₅ = incidental   Totals
M₁                124            6               3             1              6            140
M₂                145            2               4             0              9            160
M₃                115            1               2             1              1            120
M₄                101            2               0             5              2            110
Totals            485           11               9             7             18            530

What is the probability of a component selected at random from the 530 components (a) being from manufacturer M₂ and having no defects, (b) having a critical defect, (c) being from manufacturer M₁, (d) having a critical defect given the component is from manufacturer M₂, (e) being from manufacturer M₁, given it has a critical defect?

SOLUTION:
(a) This is a joint probability and is found by assuming that each component is equally likely to be selected. There are 145 components from M₂ having no defects out of a total of 530 components. Thus

    P(M₂B₁) = 145/530

(b) This calls for a marginal probability.

    P(B₂) = P(M₁B₂) + P(M₂B₂) + P(M₃B₂) + P(M₄B₂)
          = 6/530 + 2/530 + 1/530 + 2/530 = 11/530

    Note that P(B₂) can also be found in the bottom margin of the table, that is, P(B₂) = 11/530.

(c) Directly from the right margin

    P(M₁) = 140/530

(d) This conditional probability is found by the interpretation that given the component is from manufacturer M₂, there are 160 outcomes in the space, two of which have critical defects. Thus

    P(B₂|M₂) = 2/160

    or by the formal definition, Equation 2.14

    P(B₂|M₂) = P(B₂M₂)/P(M₂) = (2/530)/(160/530) = 2/160

(e) P(M₁|B₂) = 6/11

Bayes' Rule. Sir Thomas Bayes applied Equations 2.15 and 2.18 to arrive at the form

P(B_j|A) = P(A|B_j)P(B_j) / Σ_{i=1}^{m} P(A|B_i)P(B_i)     (2.19)

which is used in many applications and particularly in interpreting the impact of additional information A on the probability of some event P(B_j). An example illustrates another application of Equation 2.19, which is called Bayes' rule.

EXAMPLE 2.3.

A binary communication channel is a system that carries data in the form of one of two types of signals, say, either zeros or ones. Because of noise, a transmitted zero is sometimes received as a one and a transmitted one is sometimes received as a zero.
We assume that for a certain binary communication channel, the probability a transmitted zero is received as a zero is .95 and the probability that a transmitted one is received as a one is .90. We also assume the probability a zero is transmitted is .4. Find

(a) Probability a one is received.


(b) Probability a one was transmitted given a one was received.

SOLUTION: Defining

A = one transmitted
Ā = zero transmitted
B = one received
B̄ = zero received

From the problem statement

P(A) = .6,   P(B|A) = .90,   P(B|Ā) = .05

(a) With the use of Equation 2.18

    P(B) = P(B|A)P(A) + P(B|Ā)P(Ā)
         = .90(.6) + .05(.4)
         = .56

(b) Using Bayes' rule, Equation 2.19

    P(A|B) = P(B|A)P(A)/P(B) = (.90)(.6)/.56 = 27/28
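The same calculation is easy to mirror numerically; the short Python sketch below simply restates Example 2.3 using Equations 2.18 and 2.19 (variable names are our own choices, not from the text).

# Reproducing Example 2.3 with the total probability rule and Bayes' rule.
p_one_tx = 0.6                 # P(A): a one is transmitted
p_zero_tx = 1 - p_one_tx       # P(A-bar)
p_rx1_given_tx1 = 0.90         # P(B|A)
p_rx1_given_tx0 = 1 - 0.95     # P(B|A-bar): a transmitted zero received as a one

# (a) Total probability of receiving a one, Equation 2.18
p_rx1 = p_rx1_given_tx1 * p_one_tx + p_rx1_given_tx0 * p_zero_tx

# (b) Bayes' rule, Equation 2.19
p_tx1_given_rx1 = p_rx1_given_tx1 * p_one_tx / p_rx1

print(p_rx1)             # 0.56
print(p_tx1_given_rx1)   # 0.9642857... = 27/28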

Statistical Independence. Suppose that A_i and B_j are events associated with the outcomes of two experiments. Suppose that the occurrence of A_i does not influence the probability of occurrence of B_j, and vice versa. Then we say that the events are statistically independent (sometimes, we say probabilistically independent or simply independent). More precisely, we say that two events A_i and B_j are statistically independent if

P(A_iB_j) = P(A_i)P(B_j)     (2.20.a)

or when

P(A_i|B_j) = P(A_i)     (2.20.b)

Equation 2.20.a implies Equation 2.20.b and conversely. Observe that statistical independence is quite different from mutual exclusiveness. Indeed, if A_i and B_j are mutually exclusive, then P(A_iB_j) = 0 by definition.

2.3 RANDOM VARIABLES

It is often useful to describe the outcome of a random experiment by a number, for example, the number of telephone calls arriving at a central switching station in an hour, or the lifetime of a component in a system. The numerical quantity associated with the outcomes of a random experiment is called loosely a random variable. Different repetitions of the experiment may give rise to different observed values for the random variable. Consider tossing a coin ten times and observing the number of heads. If we denote the number of heads by X, then X takes integer values from 0 through 10, and X is called a random variable.
Formally, a random variable is a function whose domain is the set of outcomes λ ∈ S, and whose range is R₁, the real line. For every outcome λ ∈ S, the random variable assigns a number, X(λ), such that

1. The set {λ: X(λ) ≤ x} is an event for every x ∈ R₁.

2. The probabilities of the events {λ: X(λ) = ∞} and {λ: X(λ) = −∞} equal zero, that is,

   P(X = ∞) = P(X = −∞) = 0

Thus, a random variable maps S onto a set of real numbers S_X ⊂ R₁, where S_X is the range set that contains all permissible values of the random variable. Often S_X is also called the ensemble of the random variable. This definition guarantees that to every set A ⊂ S there corresponds a set T ⊂ R₁ called the image (under X) of A. Also for every (Borel) set T ⊂ R₁ there exists in S the inverse image X⁻¹(T) where

X⁻¹(T) = {λ ∈ S: X(λ) ∈ T}

and this set is an event which has a probability, P[X⁻¹(T)].
We will use uppercase letters to denote random variables and lowercase letters to denote fixed values of the random variable (i.e., numbers).
Thus, the random variable X induces a probability measure on the real line as follows

P(X = x) = P{λ: X(λ) = x}
P(X ≤ x) = P{λ: X(λ) ≤ x}
P(x₁ < X ≤ x₂) = P{λ: x₁ < X(λ) ≤ x₂}

Figure 2.1 Mapping of the sample space by a random variable.

EXAMPLE 2.4.

Consider the toss of one die. Let the random variable X represent the value of the up face. The mapping performed by X is shown in Figure 2.1. The values of the random variable are 1, 2, 3, 4, 5, 6.

2.3.1 Distribution Functions


The probability P(X ≤ x) is also denoted by the function F_X(x), which is called the distribution function of the random variable X. Given F_X(x), we can compute such quantities as P(X > x), P(x₁ ≤ X ≤ x₂), and so on, easily.
A distribution function has the following properties:

1. F_X(−∞) = 0

2. F_X(∞) = 1

3. lim_{ε→0, ε>0} F_X(x + ε) = F_X(x)

4. F_X(x₁) ≤ F_X(x₂) if x₁ < x₂

5. P[x₁ < X ≤ x₂] = F_X(x₂) − F_X(x₁)

EXAMPLE 2.5.

Consider the toss of a fair die. Plot the distribution function of X where X is a random variable that equals the number of dots on the up face.

Figure 2.2 Distribution function of the random variable X shown in Figure 2.1.

SOLUTION: The solution is given in Figure 2.2.

Joint Distribution Function. We now consider the case where two random variables are defined on a sample space. For example, both the voltage and current might be of interest in a certain experiment.
The probability of the joint occurrence of two events such as A and B was called the joint probability P(A ∩ B). If the event A is the event {X ≤ x} and the event B is the event {Y ≤ y}, then the joint probability is called the joint distribution function of the random variables X and Y; that is

F_{X,Y}(x, y) = P{(X ≤ x) ∩ (Y ≤ y)}

From this definition it can be noted that

F_{X,Y}(−∞, −∞) = 0,   F_{X,Y}(−∞, y) = 0,   F_{X,Y}(∞, y) = F_Y(y),
F_{X,Y}(x, −∞) = 0,   F_{X,Y}(∞, ∞) = 1,   F_{X,Y}(x, ∞) = F_X(x)     (2.21)

A random variable may be discrete or continuous. A discrete random variable can take on only a countable number of distinct values. A continuous random variable can assume any value within one or more intervals on the real line. Examples of discrete random variables are the number of telephone calls arriving at an office in a finite interval of time, or a student's numerical score on an examination. The exact time of arrival of a telephone call is an example of a continuous random variable.

2.3.2 Discrete Random Variables and Probability Mass Functions


A discrete random variable X is characterized by a set of allowable values x₁, x₂, . . . , xₙ and the probabilities of the random variable taking on one of these values based on the outcome of the underlying random experiment. The probability that X = x_i is denoted by P(X = x_i) for i = 1, 2, . . . , n, and is called the probability mass function.
The probability mass function of a random variable has the following important properties:

1. P(X = x_i) > 0,   i = 1, 2, . . . , n     (2.22.a)

2. Σ_{i=1}^{n} P(X = x_i) = 1     (2.22.b)

3. P(X ≤ x) = F_X(x) = Σ_{all x_i ≤ x} P(X = x_i)     (2.22.c)

4. P(X = x_i) = lim_{ε→0, ε>0} [F_X(x_i) − F_X(x_i − ε)]     (2.22.d)

Note that there is a one-to-one correspondence between the probability distribution function and the probability mass function as given in Equations 2.22.c and 2.22.d.

EXAMPLE 2.6.

Consider the toss of a fair die. Plot the probability mass function.

SOLUTION: See Figure 2.3.

Figure 2.3 Probability mass function for Example 2.6 (P(X = x_i) = 1/6 for each of the six faces).

Two Random Variables—Joint, Marginal, and Conditional Distributions and Independence. It is of course possible to define two or more random variables on the sample space of a single random experiment or on the combined sample spaces of many random experiments. If these variables are all discrete, then they are characterized by a joint probability mass function. Consider the example of two random variables X and Y that take on the values x₁, x₂, . . . , xₙ and y₁, y₂, . . . , y_m. These two variables can be characterized by a joint probability mass function P(X = x_i, Y = y_j), which gives the probability that X = x_i and Y = y_j.
Using the probability rules stated in the preceding sections, we can prove the following relationships involving joint, marginal, and conditional probability mass functions:

1. P(X ≤ x, Y ≤ y) = Σ_{x_i ≤ x} Σ_{y_j ≤ y} P(X = x_i, Y = y_j)     (2.23)

2. P(X = x_i) = Σ_{j=1}^{m} P(X = x_i, Y = y_j)
             = Σ_{j=1}^{m} P(X = x_i|Y = y_j)P(Y = y_j)     (2.24)

3. P(X = x_i|Y = y_j) = P(X = x_i, Y = y_j)/P(Y = y_j),   P(Y = y_j) ≠ 0     (2.25)

   P(X = x_i|Y = y_j) = P(Y = y_j|X = x_i)P(X = x_i) / Σ_{k=1}^{n} P(Y = y_j|X = x_k)P(X = x_k)   (Bayes' rule)     (2.26)

4. Random variables X and Y are statistically independent if

   P(X = x_i, Y = y_j) = P(X = x_i)P(Y = y_j),   i = 1, 2, . . . , n;  j = 1, 2, . . . , m     (2.27)

EXAMPLE 2.7.

Find the joint probability mass function and joint distribution function of X, Y associated with the experiment of tossing two fair dice where X represents the number appearing on the up face of one die and Y represents the number appearing on the up face of the other die.

SOLUTION:

P(X = i, Y = j) = 1/36,   i = 1, 2, . . . , 6;  j = 1, 2, . . . , 6

F_{X,Y}(x, y) = Σ_{i=1}^{x} Σ_{j=1}^{y} 1/36 = xy/36,   x = 1, 2, . . . , 6;  y = 1, 2, . . . , 6

If x and y are not integers and are between 0 and 6, F_{X,Y}(x, y) = F_{X,Y}([x], [y]), where [x] is the greatest integer less than or equal to x. F_{X,Y}(x, y) = 0 for x < 1 or y < 1. F_{X,Y}(x, y) = 1 for x ≥ 6 and y ≥ 6. F_{X,Y}(x, y) = F_X(x) for y > 6, and F_{X,Y}(x, y) = F_Y(y) for x > 6.

2.3.3 Expected Values or Averages

The probability mass function (or the distribution function) provides as complete a description as possible for a discrete random variable. For many purposes this description is often too detailed. It is sometimes simpler and more convenient to describe a random variable by a few characteristic numbers or summary measures that are representative of its probability mass function. These numbers are the various expected values (sometimes called statistical averages). The expected value or the average of a function g(X) of a discrete random variable X is defined as

E{g(X)} = Σ_{i=1}^{n} g(x_i)P(X = x_i)     (2.28)

It will be seen in the next section that the concept of expected value is valid for all random variables, not just for discrete random variables. The form of the average simply appears different for continuous random variables. Two expected values or moments that are most commonly used for characterizing a random variable X are its mean μ_X and its variance σ_X². The mean and variance are defined as

E{X} = μ_X = Σ_{i=1}^{n} x_i P(X = x_i)     (2.29)

E{(X − μ_X)²} = σ_X² = Σ_{i=1}^{n} (x_i − μ_X)² P(X = x_i)     (2.30)

The square root of variance is called the standard deviation. The mean of a random variable is its average value and the variance of a random variable is a measure of the "spread" of the values of the random variable.
We will see in a later section that when the probability mass function is not known, then the mean and variance can be used to arrive at bounds on probabilities via Tchebycheff's inequality, which has the form

P[|X − μ_X| ≥ kσ_X] ≤ 1/k²     (2.31)

Tchebycheff's inequality can be used to obtain bounds on the probability of finding X outside of the interval μ_X ± kσ_X.
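A small sketch of the bound, using the fair-die random variable of Example 2.6 (the choice of k values is arbitrary): the exact tail probability computed from the pmf never exceeds 1/k².

# Checking Tchebycheff's inequality (Equation 2.31) for the fair-die pmf.
values = [1, 2, 3, 4, 5, 6]
pmf = [1 / 6] * 6

mu = sum(x * p for x, p in zip(values, pmf))                 # 3.5
var = sum((x - mu) ** 2 * p for x, p in zip(values, pmf))    # 35/12
sigma = var ** 0.5

for k in (1.0, 1.5, 2.0):
    prob = sum(p for x, p in zip(values, pmf) if abs(x - mu) >= k * sigma)
    print(f"k = {k}:  P(|X - mu| >= k*sigma) = {prob:.3f}  <=  1/k^2 = {1 / k**2:.3f}")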
The expected value of a function of two random variables is defined as

E{g(X, Y)} = Σ_{i=1}^{n} Σ_{j=1}^{m} g(x_i, y_j)P(X = x_i, Y = y_j)     (2.32)

A useful expected value that gives a measure of dependence between two random variables X and Y is the correlation coefficient defined as

ρ_XY = E{(X − μ_X)(Y − μ_Y)}/(σ_X σ_Y) = σ_XY/(σ_X σ_Y)     (2.33)

The numerator of the right-hand side of Equation 2.33 is called the covariance (σ_XY) of X and Y. The reader can verify that if X and Y are statistically independent, then ρ_XY = 0 and that in the case when X and Y are linearly dependent (i.e., when Y = b + kX), then |ρ_XY| = 1. Observe that ρ_XY = 0 does not imply statistical independence.
Two random variables X and Y are said to be orthogonal if

E{XY} = 0

The relationship between two random variables is sometimes described in terms of conditional expected values, which are defined as

E{g(X, Y)|Y = y_j} = Σ_i g(x_i, y_j)P(X = x_i|Y = y_j)     (2.34.a)

E{g(X, Y)|X = x_i} = Σ_j g(x_i, y_j)P(Y = y_j|X = x_i)     (2.34.b)

The reader can verify that

E{g(X, Y)} = E_{X,Y}{g(X, Y)}
           = E_X{E_{Y|X}[g(X, Y)|X]}     (2.34.c)

where the subscripts denote the distributions with respect to which the expected values are computed.
One of the important conditional expected values is the conditional mean:

E{X|Y = y_j} = Σ_i x_i P(X = x_i|Y = y_j)     (2.34.d)

The conditional mean plays an important role in estimating the value of one random variable given the value of a related random variable, for example, the estimation of the weight of an individual given the height.

Probability Generating Functions. When a random variable takes on values that are uniformly spaced, it is said to be a lattice type random variable. The most common example is one whose values are the nonnegative integers, as in many applications that involve counting. A convenient tool for analyzing probability distributions of nonnegative integer-valued random variables is the probability generating function defined by

G_X(z) = Σ_{k=0}^{∞} z^k P(X = k)     (2.35.a)

The reader may recognize this as the z transform of a sequence of probabilities {p_k}, p_k = P(X = k), except that z⁻¹ has been replaced by z. The probability generating function has the following useful properties:

1. G_X(1) = Σ_{k=0}^{∞} P(X = k) = 1     (2.35.b)

2. If G_X(z) is given, p_k can be obtained from it either by expanding it in a power series or from

   P(X = k) = (1/k!) [d^k G_X(z)/dz^k]|_{z=0}     (2.35.c)

3. The derivatives of the probability generating function evaluated at z = 1 yield the factorial moments C_n, where

   C_n = E{X(X − 1)(X − 2) ⋯ (X − n + 1)}
       = [d^n G_X(z)/dz^n]|_{z=1}     (2.35.d)

From the factorial moments, we can obtain ordinary moments, for example, as

μ_X = C₁

and

σ_X² = C₂ + C₁ − C₁²
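As a sketch of this procedure (assuming the sympy package is available, and using the Poisson pmf of Equation 2.39.a, whose generating function takes the standard form G_X(z) = exp[λ(z − 1)]), the factorial moments and then the mean and variance follow directly:

import sympy as sp

z, lam = sp.symbols('z lam', positive=True)
G = sp.exp(lam * (z - 1))            # probability generating function of a Poisson rv

C1 = sp.diff(G, z, 1).subs(z, 1)     # first factorial moment  -> lam
C2 = sp.diff(G, z, 2).subs(z, 1)     # second factorial moment -> lam**2

mean = sp.simplify(C1)               # mu_X = C1
var = sp.simplify(C2 + C1 - C1**2)   # sigma_X^2 = C2 + C1 - C1^2

print(mean, var)                     # lam lam, matching Equations 2.39.b and 2.39.c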

2.3.4 Examples of Probability Mass Functions


The probability mass functions of some random variables have convenient analytical forms. Several examples are presented. We will encounter these probability mass functions very often in the analysis of communication systems.

The Uniform Probability Mass Function. A random variable X is said to have a uniform probability mass function (or distribution) when

P(X = x_i) = 1/n,   i = 1, 2, 3, . . . , n     (2.36)

The Binomial Probability Mass Function. Let p be the probability of an event A of a random experiment E. If the experiment is repeated n times and the n outcomes are independent, let X be a random variable that represents the number of times A occurs in the n repetitions. The probability that event A occurs k times is given by the binomial probability mass function

P(X = k) = C(n, k) p^k (1 − p)^{n−k},   k = 0, 1, 2, . . . , n     (2.37)

where

C(n, k) = n!/[k!(n − k)!]  and  m! = m(m − 1)(m − 2) ⋯ (3)(2)(1),  0! = 1.

The reader can verify that the mean and variance of the binomial random variable are given by (see Problem 2.13)

μ_X = np     (2.38.a)

σ_X² = np(1 − p)     (2.38.b)

Poisson Probability Mass Function. The Poisson random variable is used to model such things as the number of telephone calls received by an office and the number of electrons emitted by a hot cathode. In situations like these, if we make the following assumptions:

1. The number of events occurring in a small time interval Δt → λ′Δt as Δt → 0.
2. The numbers of events occurring in nonoverlapping time intervals are independent.

then the number of events in a time interval of length T can be shown (see Chapter 5) to have a Poisson probability mass function of the form

P(X = k) = (λ^k/k!) e^{−λ},   k = 0, 1, 2, . . .     (2.39.a)

where λ = λ′T. The mean and variance of the Poisson random variable are given by

μ_X = λ     (2.39.b)

σ_X² = λ     (2.39.c)

Multinomial Probability Mass Function. Another useful probability mass function is the multinomial probability mass function, which is a generalization of the binomial distribution to two or more variables. Suppose a random experiment is repeated n times. On each repetition, the experiment terminates in but one of k mutually exclusive and exhaustive events A₁, A₂, . . . , A_k. Let p_i be the probability that the experiment terminates in A_i and let p_i remain constant throughout the n independent repetitions of the experiment. Let X_i, i = 1, 2, . . . , k, denote the number of times the experiment terminates in event A_i. Then

P(X₁ = x₁, X₂ = x₂, . . . , X_k = x_k) = [n!/(x₁! x₂! ⋯ x_{k−1}! x_k!)] p₁^{x₁} p₂^{x₂} ⋯ p_k^{x_k}     (2.40)

where x₁ + x₂ + ⋯ + x_k = n, p₁ + p₂ + ⋯ + p_k = 1, and x_i = 0, 1, 2, . . . , n. The probability mass function given in Equation 2.40 is called a multinomial probability mass function.
Note that with A₁ = A and A₂ = Ā, p₁ = p, and p₂ = 1 − p, the multinomial probability mass function reduces to the binomial case.
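A short sketch of Equation 2.40 (the function name and the example numbers below are our own illustrative choices, not from the text):

from math import factorial

def multinomial_pmf(counts, probs):
    """Multinomial probability mass function of Equation 2.40.
    counts: occurrences x_1, ..., x_k with sum(counts) = n
    probs:  probabilities p_1, ..., p_k with sum(probs) = 1
    """
    n = sum(counts)
    coeff = factorial(n)
    for x in counts:
        coeff //= factorial(x)
    prob = float(coeff)
    for x, p in zip(counts, probs):
        prob *= p ** x
    return prob

# Hypothetical example: a fair die thrown n = 6 times, each face appearing once.
print(multinomial_pmf([1, 1, 1, 1, 1, 1], [1 / 6] * 6))   # 6!/6**6 ~= 0.0154

# With k = 2 the formula reduces to the binomial pmf of Equation 2.37.
print(multinomial_pmf([2, 4], [0.1, 0.9]))                # C(6,2)(0.1)^2(0.9)^4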
Before we proceed to review continuous random variables, let us look at
three examples that illustrate the concepts described in the preceding sections.

EXAMPLE 2.8.

The input to a binary communication system, denoted by a random variable X, takes on one of two values 0 or 1 with probabilities 3/4 and 1/4, respectively. Due to errors caused by noise in the system, the output Y differs from the input X occasionally. The behavior of the communication system is modeled by the conditional probabilities

P(Y = 1|X = 1) = 3/4   and   P(Y = 0|X = 0) = 7/8

(a) Find P(Y = 1) and P(Y = 0).

(b) Find P(X = 1|Y = 1).

(Note that this is similar to Example 2.3. The primary difference is the use of random variables.)

SOLUTION:
(a) Using Equation 2.24, we have

    P(Y = 1) = P(Y = 1|X = 0)P(X = 0) + P(Y = 1|X = 1)P(X = 1)
             = (1/8)(3/4) + (3/4)(1/4) = 9/32

    P(Y = 0) = 1 − P(Y = 1) = 23/32

(b) Using Bayes' rule, we obtain

    P(X = 1|Y = 1) = P(Y = 1|X = 1)P(X = 1)/P(Y = 1)
                   = (3/4)(1/4)/(9/32) = 2/3

P(X = 1|Y = 1) is the probability that the input to the system is 1 when the output is 1.

EXAMPLE 2.9.

Binary data are transmitted over a noisy communication channel in blocks of 16 binary digits. The probability that a received binary digit is in error due to channel noise is 0.1. Assume that the occurrence of an error in a particular digit does not influence the probability of occurrence of an error in any other digit within the block (i.e., errors occur in various digit positions within a block in a statistically independent fashion).

(a) Find the average (or expected) number of errors per block.
(b) Find the variance of the number of errors per block.
(c) Find the probability that the number of errors per block is greater than or equal to 5.

SOLUTION:
(a) Let X be the random variable representing the number of errors per block. Then, X has a binomial distribution

    P(X = k) = C(16, k)(.1)^k(.9)^{16−k},   k = 0, 1, . . . , 16

    and using Equation 2.38.a

    E{X} = np = (16)(.1) = 1.6

(b) The variance of X is found from Equation 2.38.b:

    σ_X² = np(1 − p) = (16)(.1)(.9) = 1.44

(c) P(X ≥ 5) = 1 − P(X ≤ 4)
            = 1 − Σ_{k=0}^{4} C(16, k)(0.1)^k(0.9)^{16−k}
            = 0.017
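Part (c) is easy to confirm numerically; the few lines below (standard-library Python only) evaluate the binomial tail directly.

from math import comb

# Verifying part (c) of Example 2.9: X ~ binomial(n = 16, p = 0.1).
n, p = 16, 0.1
p_le_4 = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(5))
print(1 - p_le_4)    # P(X >= 5) ~= 0.017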

EXAMPLE 2.10.

The number N of defects per plate of sheet metal is Poisson with λ = 10. The inspection process has a constant probability of .9 of finding each defect and the successes are independent, that is, if M represents the number of found defects

P(M = i|N = n) = C(n, i)(.9)^i(.1)^{n−i},   i ≤ n

Find
(a) The joint probability mass function of M and N.
(b) The marginal probability mass function of M.
(c) The conditional probability mass function of N given M.
(d) E{M|N}.
(e) E{M} from part (d).

SOLUTION:

(a) P(M = i, N = n) = [e^{−10}(10)^n/n!] C(n, i)(.9)^i(.1)^{n−i},   n = 0, 1, . . . ;  i = 0, 1, . . . , n

(b) P(M = i) = Σ_{n=i}^{∞} [e^{−10}(10)^n/n!] [n!/(i!(n − i)!)] (.9)^i(.1)^{n−i}
             = [e^{−10}(9)^i/i!] Σ_{n=i}^{∞} 1/(n − i)!
             = e^{−9}(9)^i/i!,   i = 0, 1, . . .

(c) P(N = n|M = i) = P(M = i, N = n)/P(M = i)
                   = {[e^{−10}(10)^n/n!] C(n, i)(.9)^i(.1)^{n−i}} / [e^{−9}(9)^i/i!]
                   = e^{−1}/(n − i)!,   n = i, i + 1, . . . ;  i = 0, 1, . . .

(d) Using Equation 2.38.a

    E{M|N = n} = .9n

    Thus

    E{M|N} = .9N

(e) E{M} = E_N{E{M|N}} = E_N(.9N) = (.9)E_N{N} = 9

This may also be found directly using the results of part (b) if these results are available.

2.4 CONTINUOUS RANDOM VARIABLES

2.4.1 Probability Density Functions


A continuous random variable can take on more than a countable number of values in one or more intervals on the real line. The probability law for a continuous random variable X is defined by a probability density function (pdf) f_X(x) where

f_X(x) = dF_X(x)/dx     (2.41)

With this definition the probability that the observed value of X falls in a small interval of length Δx containing the point x is approximated by f_X(x)Δx. With such a function, we can evaluate probabilities of events by integration. As with a probability mass function, there are properties that f_X(x) must have before it can be used as a density function for a random variable. These properties follow from Equation 2.41 and the properties of a distribution function.

1. f_X(x) ≥ 0     (2.42.a)

2. ∫_{−∞}^{∞} f_X(x) dx = 1     (2.42.b)

3. P(X ≤ a) = F_X(a) = ∫_{−∞}^{a} f_X(x) dx     (2.42.c)

4. P(a < X ≤ b) = ∫_{a}^{b} f_X(x) dx     (2.42.d)

Furthermore, from the definition of integration, we have

P(X = a) = ∫_{a}^{a} f_X(x) dx = lim_{Δx→0} f_X(a) Δx = 0     (2.42.e)

for a continuous random variable.

Figure 2.4 Distribution function and density function for Example 2.11.

EXAMPLE 2.11.

Resistors are produced that have a nominal value of 10 ohms and are ±10% resistors. Assume that any possible value of resistance is equally likely. Find the density and distribution function of the random variable R, which represents resistance. Find the probability that a resistor selected at random is between 9.5 and 10.5 ohms.

SOLUTION: The density and distribution functions are shown in Figure 2.4. Using the distribution function,

P(9.5 < R ≤ 10.5) = F_R(10.5) − F_R(9.5) = 3/4 − 1/4 = 1/2

or using the density function,

P(9.5 < R ≤ 10.5) = ∫_{9.5}^{10.5} (1/2) dr = (10.5 − 9.5)/2 = 1/2

Mixed Random Variable. It is possible for a random variable to have a distribution function as shown in Figure 2.5. In this case, the random variable and the distribution function are called mixed, because the distribution function consists of a part that has a density function and a part that has a probability mass function.

Figure 2.5 Example of a mixed distribution function.

Two Random Variables—Joint, Marginal, and Conditional Density Functions and Independence. If we have a multitude of random variables defined on one or more random experiments, then the probability model is specified in terms of a joint probability density function. For example, if there are two random variables X and Y, they may be characterized by a joint probability density function f_{X,Y}(x, y). If the joint distribution function, F_{X,Y}, is continuous and has partial derivatives, then a joint density function is defined by

f_{X,Y}(x, y) = ∂²F_{X,Y}(x, y)/∂x ∂y

It can be shown that

f_{X,Y}(x, y) ≥ 0

From the fundamental theorem of integral calculus

F_{X,Y}(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f_{X,Y}(μ, ν) dμ dν

Since F_{X,Y}(∞, ∞) = 1

∫_{−∞}^{∞} ∫_{−∞}^{∞} f_{X,Y}(μ, ν) dμ dν = 1

A joint density function may be interpreted as

lim_{dx→0, dy→0} P[(x < X ≤ x + dx) ∩ (y < Y ≤ y + dy)] = f_{X,Y}(x, y) dx dy

From the joint probability density function one can obtain marginal probability density functions f_X(x), f_Y(y), and conditional probability density functions f_{X|Y}(x|y) and f_{Y|X}(y|x) as follows:

f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy     (2.43.a)

f_Y(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dx     (2.43.b)

f_{X|Y}(x|y) = f_{X,Y}(x, y)/f_Y(y),   f_Y(y) > 0     (2.44.a)

f_{Y|X}(y|x) = f_{X,Y}(x, y)/f_X(x),   f_X(x) > 0     (2.44.b)

f_{Y|X}(y|x) = f_{X|Y}(x|y)f_Y(y) / ∫_{−∞}^{∞} f_{X|Y}(x|y)f_Y(y) dy   (Bayes' rule)     (2.44.c)

Finally, random variables X and Y are said to be statistically independent if

f_{X,Y}(x, y) = f_X(x)f_Y(y)     (2.45)

EXAMPLE 2.12.

The joint density function of X and Y is

f_{X,Y}(x, y) = axy,   1 ≤ x ≤ 3,  2 ≤ y ≤ 4
             = 0      elsewhere

Find a, f_X(x), and F_Y(y).

SOLUTION: Since the area under the joint pdf is 1, we have

1 = ∫_{1}^{3} ∫_{2}^{4} axy dy dx = 24a,   so   a = 1/24

The marginal pdf of X is obtained from Equation 2.43.a as

f_X(x) = ∫_{2}^{4} (1/24)xy dy = (x/24)[8 − 2] = x/4,   1 ≤ x ≤ 3
       = 0   elsewhere

And the distribution function of Y is

F_Y(y) = 0,   y < 2
       = 1,   y > 4
       = ∫_{2}^{y} ∫_{1}^{3} (1/24)xv dx dv = ∫_{2}^{y} (v/6) dv
       = y²/12 − 4/12,   2 ≤ y ≤ 4
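The integrations in Example 2.12 can be checked symbolically; a minimal sketch, assuming the sympy package is available (variable names are our own):

import sympy as sp

x, y, a, v = sp.symbols('x y a v', positive=True)
joint = a * x * y                                    # f_XY(x, y) on 1<=x<=3, 2<=y<=4

# a from the normalization condition (total probability = 1)
a_val = sp.solve(sp.integrate(joint, (x, 1, 3), (y, 2, 4)) - 1, a)[0]
print(a_val)                                         # 1/24

# Marginal pdf of X, Equation 2.43.a
f_x = sp.integrate(joint.subs(a, a_val), (y, 2, 4))
print(sp.simplify(f_x))                              # x/4

# Distribution function of Y for 2 <= y <= 4
f_y = sp.integrate(joint.subs(a, a_val), (x, 1, 3))  # = y/6
F_y = sp.integrate(f_y.subs(y, v), (v, 2, y))        # integrate dummy v from 2 to y
print(sp.simplify(F_y))                              # y**2/12 - 1/3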

Expected Values. As in the case of discrete random variables, continuous random variables can also be described by statistical averages or expected values. The expected values of functions of continuous random variables are defined by

E{g(X, Y)} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y)f_{X,Y}(x, y) dx dy     (2.46)

μ_X = E{X} = ∫_{−∞}^{∞} x f_X(x) dx     (2.47.a)

σ_X² = E{(X − μ_X)²} = ∫_{−∞}^{∞} (x − μ_X)²f_X(x) dx     (2.47.b)

σ_XY = E{(X − μ_X)(Y − μ_Y)}
     = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x − μ_X)(y − μ_Y)f_{X,Y}(x, y) dx dy     (2.47.c)

ρ_XY = E{(X − μ_X)(Y − μ_Y)}/(σ_X σ_Y)     (2.47.d)

It can be shown that −1 ≤ ρ_XY ≤ 1. Tchebycheff's inequality for a continuous random variable has the same form as given in Equation 2.31.
Conditional expected values involving continuous random variables are defined as

E{g(X, Y)|Y = y} = ∫_{−∞}^{∞} g(x, y)f_{X|Y}(x|y) dx     (2.48)

Finally, if X and Y are independent, then

E{g(X)h(Y)} = E{g(X)}E{h(Y)}     (2.49)

It should be noted that the concept of the expected value of a random variable is equally applicable to discrete and continuous random variables. Also, if generalized derivatives of the distribution function are defined using the Dirac delta function δ(x), then discrete random variables have generalized density functions. For example, the generalized density function of die tossing, as given in Example 2.6, is

f_X(x) = (1/6)[δ(x − 1) + δ(x − 2) + δ(x − 3) + δ(x − 4) + δ(x − 5) + δ(x − 6)]

If this approach is used then, for example, Equations 2.29 and 2.30 are special cases of Equations 2.47.a and 2.47.b, respectively.

Characteristic Functions and Moment Generating Functions. In calculus we use a variety of transform techniques to help solve various analysis problems. For example, Laplace and Fourier transforms are used extensively for solving linear differential equations. In probability theory we use two similar "transforms" to aid in the analysis. These transforms lead to the concepts of characteristic and moment generating functions.
The characteristic function Ψ_X(ω) of a random variable X is defined as the expected value of exp(jωX)

Ψ_X(ω) = E{exp(jωX)},   j = √(−1)

For a continuous random variable (and using δ functions also for a discrete random variable) this definition leads to

Ψ_X(ω) = ∫_{−∞}^{∞} f_X(x)exp(jωx) dx     (2.50.a)

which is the complex conjugate of the Fourier transform of the pdf of X. Since |exp(jωx)| ≤ 1,

|Ψ_X(ω)| ≤ ∫_{−∞}^{∞} f_X(x) dx = 1

and hence the characteristic function always exists.



Using the inverse Fourier transform, we can obtain f_X(x) from Ψ_X(ω) as

f_X(x) = (1/2π) ∫_{−∞}^{∞} Ψ_X(ω)exp(−jωx) dω     (2.50.b)

Thus, f_X(x) and Ψ_X(ω) form a Fourier transform pair. The characteristic function of a random variable has the following properties.

1. The characteristic function is unique and determines the pdf of a random variable (except for points of discontinuity of the pdf). Thus, if two continuous random variables have the same characteristic function, they have the same pdf.

2. Ψ_X(0) = 1, and

   E{X^k} = (1/j^k)[d^k Ψ_X(ω)/dω^k]  at ω = 0     (2.51.a)

Equation 2.51.a can be established by differentiating both sides of Equation 2.50.a k times with respect to ω and setting ω = 0.

The concept of characteristic functions can be extended to the case of two or more random variables. For example, the characteristic function of two random variables X₁ and X₂ is given by

Ψ_{X₁,X₂}(ω₁, ω₂) = E{exp(jω₁X₁ + jω₂X₂)}     (2.51.b)

The reader can verify that

Ψ_{X₁,X₂}(0, 0) = 1

and

E{X₁^m X₂^n} = (1/j^{m+n})[∂^{m+n} Ψ_{X₁,X₂}(ω₁, ω₂)/∂ω₁^m ∂ω₂^n]  at (ω₁, ω₂) = (0, 0)     (2.51.c)

The real-valued function M_X(t) = E{exp(tX)} is called the moment generating function. Unlike the characteristic function, the moment generating function need not always exist, and even when it exists, it may be defined for only some values of t within a region of convergence (similar to the existence of the Laplace transform). If M_X(t) exists, then M_X(t) = Ψ_X(t/j).
We illustrate two uses of characteristic functions.

EXAMPLE 2.13.

X₁ and X₂ are two independent Gaussian random variables with means μ₁ and μ₂ and variances σ₁² and σ₂². The pdfs of X₁ and X₂ have the form

f_{X_i}(x_i) = [1/(√(2π) σ_i)] exp[−(x_i − μ_i)²/(2σ_i²)],   i = 1, 2

(a) Find Ψ_{X₁}(ω) and Ψ_{X₂}(ω).

(b) Using Ψ_X(ω), find E{X⁴} where X is a Gaussian random variable with mean zero and variance σ².

(c) Find the pdf of Z = a₁X₁ + a₂X₂.

SOLUTION:

(a) Ψ_{X₁}(ω) = ∫_{−∞}^{∞} [1/(√(2π) σ₁)] exp[−(x₁ − μ₁)²/2σ₁²]exp(jωx₁) dx₁

We can combine the exponents in the previous equation and write the integrand as

exp[jμ₁ω + (σ₁jω)²/2] exp{−[x₁ − (μ₁ + σ₁²jω)]²/2σ₁²}

and hence

Ψ_{X₁}(ω) = exp[jμ₁ω + (σ₁jω)²/2] ∫_{−∞}^{∞} [1/(√(2π) σ₁)] exp[−(x₁ − μ₁′)²/2σ₁²] dx₁

where μ₁′ = μ₁ + σ₁²jω.
The value of the integral in the preceding equation is 1 and hence

Ψ_{X₁}(ω) = exp[jμ₁ω + (σ₁jω)²/2]

Similarly

Ψ_{X₂}(ω) = exp[jμ₂ω + (σ₂jω)²/2]

(b) From part (a) we have

Ψ_X(ω) = exp(−σ²ω²/2)

and from Equation 2.51.a

E{X⁴} = (1/j⁴){fourth derivative of Ψ_X(ω) at ω = 0}
      = 3σ⁴

Following the same procedure it can be shown for X a normal random variable with mean zero and variance σ² that

E{X^n} = 0,                         n = 2k + 1
       = 1 · 3 · 5 ⋯ (n − 1)σ^n,    n = 2k,  k an integer.

(c) Ψ_Z(ω) = E{exp(jωZ)} = E{exp(jω[a₁X₁ + a₂X₂])}
           = E{exp(jωa₁X₁)exp(jωa₂X₂)}
           = E{exp(jωa₁X₁)}E{exp(jωa₂X₂)}

since X₁ and X₂ are independent. Hence,

Ψ_Z(ω) = exp[j(a₁μ₁ + a₂μ₂)ω + (a₁²σ₁² + a₂²σ₂²)(jω)²/2]

which is the characteristic function of a Gaussian random variable with mean a₁μ₁ + a₂μ₂ and variance a₁²σ₁² + a₂²σ₂²; hence Z is Gaussian with these parameters.

Cumulant Generating Function. The cumulant generating function C_X(ω) of X is defined by

C_X(ω) = ln Ψ_X(ω)     (2.52.a)

Thus

exp{C_X(ω)} = Ψ_X(ω)

Using series expansions on both sides of this equation results in

exp[K₁(jω) + K₂(jω)²/2! + ⋯ + K_n(jω)^n/n! + ⋯]
    = 1 + E[X](jω) + E[X²](jω)²/2! + ⋯ + E[X^n](jω)^n/n! + ⋯     (2.52.b)

The cumulants K₁, K₂, . . . , K_n are defined by the identity in ω given in Equation 2.52.b.
Expanding the left-hand side of Equation 2.52.b as the product of the Taylor series expansions of

exp[K₁(jω)],  exp[K₂(jω)²/2!],  . . . ,  exp[K_n(jω)^n/n!]

and equating like powers of ω results in

E[X] = K₁     (2.52.c)
E[X²] = K₂ + K₁²     (2.52.d)
E[X³] = K₃ + 3K₂K₁ + K₁³     (2.52.e)
E[X⁴] = K₄ + 4K₃K₁ + 3K₂² + 6K₂K₁² + K₁⁴     (2.52.f)

The references listed in Section 2.10 contain more information on cumulants. The cumulants are particularly useful when independent random variables are summed because the individual cumulants are directly added.

2.4.2 Examples of Probability Density Functions


We now present three useful models for continuous random variables that will be used later. Several additional models are given in the problems included at the end of the chapter.

Uniform Probability Density Functions. A random variable X is said to have a uniform pdf if

f_X(x) = 1/(b − a),   a < x < b
       = 0          elsewhere     (2.53.a)

The mean and variance of a uniform random variable can be shown to be

μ_X = (b + a)/2     (2.53.b)

σ_X² = (b − a)²/12     (2.53.c)

Gaussian Probability Density Function. One of the most widely used pdfs is the Gaussian or normal probability density function. This pdf occurs in so many applications partly because of a remarkable phenomenon called the central limit theorem and partly because of a relatively simple analytical form. The central limit theorem, to be proved in a later section, implies that a random variable that is determined by the sum of a large number of independent causes tends to have a Gaussian probability distribution. Several versions of this theorem have been proven by statisticians and verified experimentally from data by engineers and physicists.
One primary interest in studying the Gaussian pdf is from the viewpoint of using it to model random electrical noise. Electrical noise in communication systems is often due to the cumulative effects of a large number of randomly moving charged particles and hence the instantaneous value of the noise will tend to have a Gaussian distribution, a fact that can be tested experimentally. (The reader is cautioned that there are examples of noise that cannot be modeled by Gaussian pdfs. Such examples include pulse type disturbances on a telephone line and the electrical noise from nearby lightning discharges.)
The Gaussian pdf shown in Figure 2.6 has the form

f_X(x) = [1/√(2πσ_X²)] exp[−(x − μ_X)²/(2σ_X²)]     (2.54)

Figure 2.6 Gaussian probability density function.

The family of Gaussian pdfs is characterized by only two parameters, μ_X and σ_X², which are the mean and variance of the random variable X. In many applications we will often be interested in probabilities such as

P(X > a) = ∫_{a}^{∞} [1/√(2πσ_X²)] exp[−(x − μ_X)²/(2σ_X²)] dx

By making a change of variable z = (x − μ_X)/σ_X, the preceding integral can be reduced to

P(X > a) = ∫_{(a−μ_X)/σ_X}^{∞} (1/√(2π)) exp(−z²/2) dz

Unfortunately, this integral cannot be evaluated in closed form and requires numerical evaluation. Several versions of the integral are tabulated, and we will use tabulated values (Appendix D) of the Q function, which is defined as

Q(y) = ∫_{y}^{∞} (1/√(2π)) exp(−z²/2) dz,   y ≥ 0     (2.55)

In terms of the values of the Q function we can write P(X > a) as

P(X > a) = Q[(a − μ_X)/σ_X]     (2.56)

Various tables give any of the areas shown in Figure 2.7, so one must observe which is being tabulated. However, any of the results can be obtained from the others by using the following relations for the standard (μ = 0, σ = 1) normal random variable X:

P(X < x) = 1 − Q(x)
P(−a < X < a) = 2P(−a < X < 0) = 2P(0 < X < a)
P(X < 0) = 1/2 = Q(0)
EXAMPLE 2.14.

The voltage X at the output of a noise generator is a standard normal random variable. Find P(X > 2.3) and P(1 ≤ X ≤ 2.3).

SOLUTION: Using one of the tables of standard normal distributions

P(X > 2.3) = Q(2.3) = .011

P(1 ≤ X ≤ 2.3) = 1 − Q(2.3) − [1 − Q(1)] = Q(1) − Q(2.3) ≈ .148



EXAMPLE 2.15.

The velocity V of the wind at a certain location is a normal random variable with μ = 2 and σ = 5. Determine P(−3 < V < 8).

SOLUTION:

P(−3 < V < 8) = ∫_{−3}^{8} [1/√(2π(25))] exp[−(v − 2)²/(2(25))] dv

              = ∫_{(−3−2)/5}^{(8−2)/5} (1/√(2π)) exp(−x²/2) dx

              = 1 − Q(1.2) − [1 − Q(−1)] ≈ .726
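When tables are not at hand, the Q function of Equation 2.55 can be computed from the complementary error function; the sketch below checks Examples 2.14 and 2.15 using only the Python standard library.

from math import erfc, sqrt

def Q(y):
    """Q function of Equation 2.55: Q(y) = 0.5*erfc(y/sqrt(2))."""
    return 0.5 * erfc(y / sqrt(2))

# Example 2.14: standard normal X
print(Q(2.3))              # ~0.011  = P(X > 2.3)
print(Q(1.0) - Q(2.3))     # ~0.148  = P(1 <= X <= 2.3)

# Example 2.15: V normal with mu = 2, sigma = 5
mu, sigma = 2.0, 5.0
print(Q((-3 - mu) / sigma) - Q((8 - mu) / sigma))   # ~0.726 = P(-3 < V < 8)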

Bivariate Gaussian pdf. We often encounter the situation when the instantaneous amplitude of the input signal to a linear system has a Gaussian pdf and we might be interested in the joint pdf of the amplitude of the input and the output signals. The bivariate Gaussian pdf is a valid model for describing such situations. The bivariate Gaussian pdf has the form

f_{X,Y}(x, y) = [1/(2πσ_Xσ_Y√(1 − ρ²))] exp{−[1/(2(1 − ρ²))][(x − μ_X)²/σ_X² − 2ρ(x − μ_X)(y − μ_Y)/(σ_Xσ_Y) + (y − μ_Y)²/σ_Y²]}     (2.57)

The reader can verify that the marginal pdfs of X and Y are Gaussian with means μ_X, μ_Y, and variances σ_X², σ_Y², respectively, and

ρ = ρ_XY = E{(X − μ_X)(Y − μ_Y)}/(σ_Xσ_Y) = σ_XY/(σ_Xσ_Y)

2.4.3 Complex Random Variables


A complex random variable Z is defined in terms of the real random variables X and Y by

Z = X + jY

The expected value of g(Z) is defined as

E{g(Z)} = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(z)f_{X,Y}(x, y) dx dy

Thus the mean, μ_Z, of Z is

μ_Z = E{Z} = E{X} + jE{Y} = μ_X + jμ_Y

The variance, σ_Z², is defined as

σ_Z² = E{|Z − μ_Z|²}

The covariance of two complex random variables Z_m and Z_n is defined by

C_{Z_mZ_n} = E{(Z_m − μ_{Z_m})*(Z_n − μ_{Z_n})}

where * denotes complex conjugate.
2.5 RANDOM VECTORS

In the preceding sections we concentrated on discussing the specification of probability laws for one or two random variables. In this section we shall discuss the specification of probability laws for many random variables (i.e., random vectors). Whereas scalar-valued random variables take on values on the real line, the values of "vector-valued" random variables are points in a real-valued higher (say m) dimensional space (R^m). An example of a three-dimensional random vector is the location of a space vehicle in a Cartesian coordinate system.
The probability law for vector-valued random variables is specified in terms of a joint distribution function

F_{X₁,...,X_m}(x₁, . . . , x_m) = P[(X₁ ≤ x₁) ∩ ⋯ ∩ (X_m ≤ x_m)]

or by a joint probability mass function (discrete case) or a joint probability density function (continuous case). We treat the continuous case in this section, leaving details of the discrete case for the reader.
The joint probability density function of an m-dimensional random vector is the partial derivative of the distribution function and is denoted by

f_{X₁,X₂,...,X_m}(x₁, x₂, . . . , x_m)


From the joint pdf, we can obtain the marginal pdfs as

$$f_{X_1}(x_1) = \underbrace{\int\cdots\int}_{m-1\ \text{integrals}} f_{X_1,X_2,\ldots,X_m}(x_1, x_2, \ldots, x_m)\, dx_2\cdots dx_m$$

and

$$f_{X_1,X_2}(x_1, x_2) = \underbrace{\int\cdots\int}_{m-2\ \text{integrals}} f_{X_1,X_2,\ldots,X_m}(x_1, x_2, x_3, \ldots, x_m)\, dx_3\, dx_4\cdots dx_m \quad (2.58)$$

Note that the marginal pdf of any subset of the m variables is obtained by
"integrating out" the variables not in the subset.
The conditional density functions are defined as (using m = 4 as an example),

$$f_{X_1,X_2,X_3|X_4}(x_1, x_2, x_3|x_4) = \frac{f_{X_1,X_2,X_3,X_4}(x_1, x_2, x_3, x_4)}{f_{X_4}(x_4)} \quad (2.59)$$

and

$$f_{X_1,X_2|X_3,X_4}(x_1, x_2|x_3, x_4) = \frac{f_{X_1,X_2,X_3,X_4}(x_1, x_2, x_3, x_4)}{f_{X_3,X_4}(x_3, x_4)} \quad (2.60)$$

Expected values are evaluated using multiple integrals. For example,

$$E\{g(X_1, X_2, X_3, X_4)\} = \int\!\!\int\!\!\int\!\!\int g(x_1, x_2, x_3, x_4)f_{X_1,X_2,X_3,X_4}(x_1, x_2, x_3, x_4)\, dx_1\, dx_2\, dx_3\, dx_4 \quad (2.61)$$

where g is a scalar-valued function. Conditional expected values are defined, for
example, as

$$E\{g(X_1, X_2, X_3, X_4)|X_3 = x_3, X_4 = x_4\} = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} g(x_1, x_2, x_3, x_4)f_{X_1,X_2|X_3,X_4}(x_1, x_2|x_3, x_4)\, dx_1\, dx_2 \quad (2.62)$$

Important parameters of the joint distribution are the means and the covariances

$$\mu_{X_i} = E\{X_i\}$$

and

$$\sigma_{X_iX_j} = E\{X_iX_j\} - \mu_{X_i}\mu_{X_j}$$

Note that $\sigma_{X_iX_i}$ is the variance of $X_i$. We will use both $\sigma_{X_iX_i}$ and $\sigma_{X_i}^2$ to denote
the variance of $X_i$. Sometimes the notations $E_{X_i}$, $E_{X_iX_j}$, $E_{X_i|X_j}$ are used to denote
expected values with respect to the marginal distribution of $X_i$, the joint distribution
of $X_i$ and $X_j$, and the conditional distribution of $X_i$ given $X_j$, respectively.
We will use subscripted notation for the expectation operator only when there
is ambiguity with the use of unsubscripted notation.
The probability law for random vectors can be specified in a concise form
using the vector notation. Suppose we are dealing with the joint probability law
for m random variables $X_1, X_2, \ldots, X_m$. These m variables can be represented
as components of an m x 1 column vector X,

$$\mathbf{X} = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_m \end{bmatrix} \quad \text{or} \quad \mathbf{X}^T = (X_1, X_2, \ldots, X_m)$$

where T indicates the transpose of a vector (or matrix). The values of X are
points in the m-dimensional space $R^m$. A specific value of X is denoted by

$$\mathbf{x}^T = (x_1, x_2, \ldots, x_m)$$

Then, the joint pdf is denoted by

$$f_{\mathbf{X}}(\mathbf{x}) = f_{X_1,X_2,\ldots,X_m}(x_1, x_2, \ldots, x_m)$$

The mean vector is defined as

$$\boldsymbol{\mu}_{\mathbf{X}} = E(\mathbf{X}) = \begin{bmatrix} E(X_1) \\ E(X_2) \\ \vdots \\ E(X_m) \end{bmatrix}$$

and the "covariance matrix", $\Sigma_{\mathbf{X}}$, an m x m matrix, is defined as

$$\Sigma_{\mathbf{X}} = E\{\mathbf{X}\mathbf{X}^T\} - \boldsymbol{\mu}_{\mathbf{X}}\boldsymbol{\mu}_{\mathbf{X}}^T =
\begin{bmatrix}
\sigma_{X_1X_1} & \sigma_{X_1X_2} & \cdots & \sigma_{X_1X_m} \\
\sigma_{X_2X_1} & \sigma_{X_2X_2} & \cdots & \sigma_{X_2X_m} \\
\vdots & \vdots & & \vdots \\
\sigma_{X_mX_1} & \sigma_{X_mX_2} & \cdots & \sigma_{X_mX_m}
\end{bmatrix}$$

The covariance matrix describes the second-order relationship between the components
of the random vector X. The components are said to be "uncorrelated" when

$$\sigma_{X_iX_j} = \sigma_{ij} = 0, \quad i \ne j$$

and independent if

$$f_{X_1,X_2,\ldots,X_m}(x_1, x_2, \ldots, x_m) = \prod_{i=1}^{m} f_{X_i}(x_i) \quad (2.63)$$

2.5.1 Multivariate Gaussian Distribution


An important extension of the bivariate Gaussian distribution is the multivariate
Gaussian distribution, which has many applications. A random vector X is multivariate
Gaussian if it has a pdf of the form

$$f_{\mathbf{X}}(\mathbf{x}) = \left[(2\pi)^{m/2}|\Sigma_{\mathbf{X}}|^{1/2}\right]^{-1}\exp\left[-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_{\mathbf{X}})^T\Sigma_{\mathbf{X}}^{-1}(\mathbf{x} - \boldsymbol{\mu}_{\mathbf{X}})\right] \quad (2.64)$$

where $\boldsymbol{\mu}_{\mathbf{X}}$ is the mean vector, $\Sigma_{\mathbf{X}}$ is the covariance matrix, $\Sigma_{\mathbf{X}}^{-1}$ is its inverse, $|\Sigma_{\mathbf{X}}|$
is the determinant of $\Sigma_{\mathbf{X}}$, and X is of dimension m.
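A direct implementation of Equation 2.64 is a useful check when working with random vectors. The sketch below (ours) uses NumPy and the explicit inverse for clarity; in practice one would normally factor $\Sigma_{\mathbf{X}}$ instead.

```python
import numpy as np

def mvn_pdf(x, mu, cov):
    """Multivariate Gaussian pdf of Equation 2.64 evaluated at the vector x."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    m = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.inv(cov) @ diff        # (x - mu)^T Sigma^-1 (x - mu)
    norm = (2.0 * np.pi) ** (m / 2) * np.sqrt(np.linalg.det(cov))
    return np.exp(-0.5 * quad) / norm

# two-dimensional example with an arbitrary covariance matrix
print(mvn_pdf([2.5, 1.5], [2.0, 1.0], [[6.0, 3.0], [3.0, 4.0]]))
```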

2.5.2 Properties of the Multivariate Gaussian Distribution


We state next some of the important properties of the multivariate Gaussian
distribution. Proofs o f these properties are given in Reference [6].
1. Suppose X has an m-dimensional multivariate Gaussian distribution. If
we partition X as

$$\mathbf{X} = \begin{bmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \end{bmatrix}, \quad
\mathbf{X}_1 = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_k \end{bmatrix}, \quad
\mathbf{X}_2 = \begin{bmatrix} X_{k+1} \\ X_{k+2} \\ \vdots \\ X_m \end{bmatrix}$$

and

$$\boldsymbol{\mu}_{\mathbf{X}} = \begin{bmatrix} \boldsymbol{\mu}_{\mathbf{X}_1} \\ \boldsymbol{\mu}_{\mathbf{X}_2} \end{bmatrix}, \quad
\Sigma_{\mathbf{X}} = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}$$

where $\boldsymbol{\mu}_{\mathbf{X}_1}$ is k x 1 and $\Sigma_{11}$ is k x k, then $\mathbf{X}_1$ has a k-dimensional
multivariate Gaussian distribution with mean $\boldsymbol{\mu}_{\mathbf{X}_1}$ and covariance $\Sigma_{11}$.

2. If $\Sigma_{\mathbf{X}}$ is a diagonal matrix, that is,

$$\Sigma_{\mathbf{X}} = \begin{bmatrix}
\sigma_{X_1}^2 & 0 & \cdots & 0 \\
0 & \sigma_{X_2}^2 & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & \sigma_{X_m}^2
\end{bmatrix}$$

then the components of X are independent (i.e., uncorrelatedness implies
independence. However, this property does not hold for other distributions).

3. If A is a k x m matrix of rank k, then Y = AX has a k-variate Gaussian
distribution with

$$\boldsymbol{\mu}_{\mathbf{Y}} = A\boldsymbol{\mu}_{\mathbf{X}} \quad (2.65.a)$$

$$\Sigma_{\mathbf{Y}} = A\Sigma_{\mathbf{X}}A^T \quad (2.65.b)$$

4. With a partition of X as in (1), the conditional density of $\mathbf{X}_1$ given $\mathbf{X}_2 = \mathbf{x}_2$
is a k-dimensional multivariate Gaussian with

$$\boldsymbol{\mu}_{\mathbf{X}_1|\mathbf{X}_2} = E[\mathbf{X}_1|\mathbf{X}_2 = \mathbf{x}_2] = \boldsymbol{\mu}_{\mathbf{X}_1} + \Sigma_{12}\Sigma_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_{\mathbf{X}_2}) \quad (2.66.a)$$

and

$$\Sigma_{\mathbf{X}_1|\mathbf{X}_2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} \quad (2.66.b)$$

Properties (1), (3), and (4) state that marginals, conditionals, as well as linear
transformations derived from a multivariate Gaussian distribution all have mul­
tivariate Gaussian distributions.

EXAMPLE 2.15.

Suppose X is four-variate Gaussian with

$$\boldsymbol{\mu}_{\mathbf{X}} = \begin{bmatrix} 2 \\ 1 \\ 1 \\ 0 \end{bmatrix}$$

and

$$\Sigma_{\mathbf{X}} = \begin{bmatrix}
6 & 3 & 2 & 1 \\
3 & 4 & 3 & 2 \\
2 & 3 & 4 & 3 \\
1 & 2 & 3 & 3
\end{bmatrix}$$

Let

$$\mathbf{X}_1 = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}, \qquad \mathbf{X}_2 = \begin{bmatrix} X_3 \\ X_4 \end{bmatrix}$$

(a) Find the distribution of $\mathbf{X}_1$.

(b) Find the distribution of

$$\mathbf{Y} = \begin{bmatrix} 2X_1 \\ X_1 + 2X_2 \\ X_3 + X_4 \end{bmatrix}$$

(c) Find the distribution of $\mathbf{X}_1$ given $\mathbf{X}_2 = (x_3, x_4)^T$.

SOLUTION:
(a) $\mathbf{X}_1$ has a bivariate Gaussian distribution with

$$\boldsymbol{\mu}_{\mathbf{X}_1} = \begin{bmatrix} 2 \\ 1 \end{bmatrix} \quad \text{and} \quad \Sigma_{\mathbf{X}_1} = \begin{bmatrix} 6 & 3 \\ 3 & 4 \end{bmatrix}$$

(b) We can express Y as

$$\mathbf{Y} = \begin{bmatrix} 2 & 0 & 0 & 0 \\ 1 & 2 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix}\begin{bmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \end{bmatrix} = A\mathbf{X}$$

Hence Y has a trivariate Gaussian distribution with

$$\boldsymbol{\mu}_{\mathbf{Y}} = A\boldsymbol{\mu}_{\mathbf{X}} = \begin{bmatrix} 2 & 0 & 0 & 0 \\ 1 & 2 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix}\begin{bmatrix} 2 \\ 1 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 4 \\ 4 \\ 1 \end{bmatrix}$$

and

$$\Sigma_{\mathbf{Y}} = A\Sigma_{\mathbf{X}}A^T = \begin{bmatrix} 24 & 24 & 6 \\ 24 & 34 & 13 \\ 6 & 13 & 13 \end{bmatrix}$$

(c) $\mathbf{X}_1$ given $\mathbf{X}_2 = (x_3, x_4)^T$ has a bivariate Gaussian distribution with

$$\boldsymbol{\mu}_{\mathbf{X}_1|\mathbf{X}_2} = \begin{bmatrix} 2 \\ 1 \end{bmatrix} + \begin{bmatrix} 2 & 1 \\ 3 & 2 \end{bmatrix}\begin{bmatrix} 4 & 3 \\ 3 & 3 \end{bmatrix}^{-1}\begin{bmatrix} x_3 - 1 \\ x_4 - 0 \end{bmatrix}
= \begin{bmatrix} x_3 - \frac{2}{3}x_4 + 1 \\[1mm] x_3 - \frac{1}{3}x_4 \end{bmatrix}$$

and

$$\Sigma_{\mathbf{X}_1|\mathbf{X}_2} = \begin{bmatrix} 6 & 3 \\ 3 & 4 \end{bmatrix} - \begin{bmatrix} 2 & 1 \\ 3 & 2 \end{bmatrix}\begin{bmatrix} 4 & 3 \\ 3 & 3 \end{bmatrix}^{-1}\begin{bmatrix} 2 & 3 \\ 1 & 2 \end{bmatrix}
= \begin{bmatrix} 14/3 & 4/3 \\ 4/3 & 5/3 \end{bmatrix}$$
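The matrix algebra in this example is easy to verify numerically. The following sketch (ours) reproduces parts (b) and (c) with NumPy; the observed value chosen for $(x_3, x_4)$ is arbitrary.

```python
import numpy as np

mu_x = np.array([2.0, 1.0, 1.0, 0.0])
sigma_x = np.array([[6.0, 3.0, 2.0, 1.0],
                    [3.0, 4.0, 3.0, 2.0],
                    [2.0, 3.0, 4.0, 3.0],
                    [1.0, 2.0, 3.0, 3.0]])

# part (b): Y = AX
A = np.array([[2.0, 0.0, 0.0, 0.0],
              [1.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
print(A @ mu_x)              # mean of Y: [4, 4, 1]
print(A @ sigma_x @ A.T)     # covariance of Y

# part (c): X1 = (X_1, X_2) given X2 = (x_3, x_4)
S11, S12 = sigma_x[:2, :2], sigma_x[:2, 2:]
S21, S22 = sigma_x[2:, :2], sigma_x[2:, 2:]
x2 = np.array([2.0, 1.0])    # arbitrary observed value of (x_3, x_4)
cond_mean = mu_x[:2] + S12 @ np.linalg.solve(S22, x2 - mu_x[2:])
cond_cov = S11 - S12 @ np.linalg.solve(S22, S21)
print(cond_mean)
print(cond_cov)              # [[14/3, 4/3], [4/3, 5/3]]
```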

2.5.3 Moments o f Multivariate Gaussian pdf


Although Equation 2.65 gives the moments of a linear combination of multivariate
Gaussian variables, there are many applications where we need to compute
moments such as $E\{X_iX_j\}$, $E\{X_1X_2X_3X_4\}$, and so on. These moments can
be calculated using the joint characteristic function of the multivariate Gaussian
density function, which is defined by

$$\Psi_{\mathbf{X}}(\omega_1, \omega_2, \ldots, \omega_n) = E\{\exp[j(\omega_1X_1 + \omega_2X_2 + \cdots + \omega_nX_n)]\}
= \exp\left(j\boldsymbol{\omega}^T\boldsymbol{\mu}_{\mathbf{X}} - \frac{1}{2}\boldsymbol{\omega}^T\Sigma_{\mathbf{X}}\boldsymbol{\omega}\right) \quad (2.67)$$

where $\boldsymbol{\omega}^T = (\omega_1, \omega_2, \ldots, \omega_n)$. From the joint characteristic function, the
moments can be obtained by partial differentiation. For example,

$$E\{X_1X_2X_3X_4\} = \frac{\partial^4\Psi_{\mathbf{X}}(\omega_1, \omega_2, \omega_3, \omega_4)}{\partial\omega_1\,\partial\omega_2\,\partial\omega_3\,\partial\omega_4}\bigg|_{\boldsymbol{\omega} = \mathbf{0}} \quad (2.68)$$

To simplify the illustrative calculations, let us assume that all random variables
have zero means. Then,

$$\Psi_{\mathbf{X}}(\omega_1, \omega_2, \omega_3, \omega_4) = \exp\left(-\frac{1}{2}\boldsymbol{\omega}^T\Sigma_{\mathbf{X}}\boldsymbol{\omega}\right)$$

Expanding the characteristic function as a power series prior to differentiation,
we have

$$\Psi_{\mathbf{X}}(\omega_1, \omega_2, \omega_3, \omega_4) = 1 - \frac{1}{2}\boldsymbol{\omega}^T\Sigma_{\mathbf{X}}\boldsymbol{\omega} + \frac{1}{8}(\boldsymbol{\omega}^T\Sigma_{\mathbf{X}}\boldsymbol{\omega})^2 + R$$

where R contains terms of $\boldsymbol{\omega}$ raised to the sixth and higher powers. When we
take the partial derivatives and set $\omega_1 = \omega_2 = \omega_3 = \omega_4 = 0$, the only nonzero
terms come from terms proportional to $\omega_1\omega_2\omega_3\omega_4$ in

$$\frac{1}{8}(\boldsymbol{\omega}^T\Sigma_{\mathbf{X}}\boldsymbol{\omega})^2 = \frac{1}{8}\{\sigma_{11}\omega_1^2 + \sigma_{22}\omega_2^2 + \sigma_{33}\omega_3^2 + \sigma_{44}\omega_4^2
+ 2\sigma_{12}\omega_1\omega_2 + 2\sigma_{13}\omega_1\omega_3 + 2\sigma_{14}\omega_1\omega_4
+ 2\sigma_{23}\omega_2\omega_3 + 2\sigma_{24}\omega_2\omega_4 + 2\sigma_{34}\omega_3\omega_4\}^2$$

When we square the quadratic term, the only terms proportional to $\omega_1\omega_2\omega_3\omega_4$
will be

$$\frac{1}{8}\{8\sigma_{12}\sigma_{34}\omega_1\omega_2\omega_3\omega_4 + 8\sigma_{13}\sigma_{24}\omega_1\omega_3\omega_2\omega_4 + 8\sigma_{14}\sigma_{23}\omega_1\omega_4\omega_2\omega_3\}$$

Taking the partial derivative of the preceding expression and setting $\boldsymbol{\omega} = \mathbf{0}$,
we have

$$E\{X_1X_2X_3X_4\} = \sigma_{12}\sigma_{34} + \sigma_{13}\sigma_{24} + \sigma_{14}\sigma_{23}$$
$$= E\{X_1X_2\}E\{X_3X_4\} + E\{X_1X_3\}E\{X_2X_4\} + E\{X_1X_4\}E\{X_2X_3\} \quad (2.69)$$

The reader can verify that for the zero mean case

$$E\{X_1^2X_2^2\} = E\{X_1^2\}E\{X_2^2\} + 2[E\{X_1X_2\}]^2 \quad (2.70)$$
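Equation 2.69 is easy to spot-check by simulation. The sketch below (ours) draws zero-mean multivariate Gaussian samples with an arbitrary covariance matrix and compares the sample average of $X_1X_2X_3X_4$ with $\sigma_{12}\sigma_{34} + \sigma_{13}\sigma_{24} + \sigma_{14}\sigma_{23}$.

```python
import numpy as np

rng = np.random.default_rng(0)
cov = np.array([[6.0, 3.0, 2.0, 1.0],
                [3.0, 4.0, 3.0, 2.0],
                [2.0, 3.0, 4.0, 3.0],
                [1.0, 2.0, 3.0, 3.0]])

x = rng.multivariate_normal(np.zeros(4), cov, size=1_000_000)
empirical = np.mean(x[:, 0] * x[:, 1] * x[:, 2] * x[:, 3])

# Equation 2.69: sigma_12*sigma_34 + sigma_13*sigma_24 + sigma_14*sigma_23
theoretical = cov[0, 1]*cov[2, 3] + cov[0, 2]*cov[1, 3] + cov[0, 3]*cov[1, 2]
print(empirical, theoretical)   # the two values should be close
```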

2.6 TRANSFORMATIONS (FUNCTIONS) OF


RANDOM VARIABLES

In the analysis of electrical systems we are often interested in finding the prop­
erties of a signal after it has been “ processed” by the system. Typical processing
operations include integration, weighted averaging, and limiting. These signal
processing operations may be viewed as transformations of a set of input variables
to a set of output variables. If the input is a set of random variables, then the
output will also be a set of random variables. In this section, we develop tech­
niques for obtaining the probability law (distribution) for the set o f output
random variables given the transformation and the probability law for the set
o f input random variables.
The general type of problem we address is the following. Assume that X is
a random variable with ensemble $S_X$ and a known probability distribution. Let
g be a scalar function that maps each $x \in S_X$ to $y = g(x)$. The expression

$$Y = g(X)$$
Figure 2.8 Transformation of a random variable.

defines a new random variable* as follows (see Figure 2.8). For a given outcome
$\lambda$, $X(\lambda)$ is a number x, and $g[X(\lambda)]$ is another number specified by g(x). This
number is the value of the random variable Y, that is, $Y(\lambda) = y = g(x)$. The
ensemble $S_Y$ of Y is the set

$$S_Y = \{y = g(x) : x \in S_X\}$$

We are interested in finding the probability law for Y.


The method used for identifying the probability law for Y is to equate the
probabilities of equivalent events. Suppose $C \subset S_Y$. Because the function g(x)
maps $S_X \to S_Y$, there is an equivalent subset B, $B \subset S_X$, defined by

$$B = \{x : g(x) \in C\}$$

Now, B corresponds to event A , which is a subset of the sample space S (see


Figure 2.8). It is obvious that A maps to C and hence

$$P(C) = P(A) = P(B)$$

*For Y to be a random variable, the function g : X —> Y must have the following properties:

1. Its domain must include the range o f the random variable X .

2. It must be a Baire function, that is, for every y, the set $I_y$ such that $g(x) \le y$ must consist
of the union and intersection of a countable number of intervals in $S_X$. Only then $\{Y \le y\}$
is an event.
3. The events $\{\lambda : g(X(\lambda)) = \pm\infty\}$ must have zero probability.
Now, suppose that g is a continuous function and $C = (-\infty, y]$. If $B =
\{x : g(x) \le y\}$, then

$$P(C) = P(Y \le y) = F_Y(y) = \int_B f_X(x)\, dx$$

which gives the distribution function of Y in terms of the density function of X.
The density function of Y (if Y is a continuous random variable) can be obtained
by differentiating $F_Y(y)$.
As an alternate approach, suppose $I_y$ is a small interval of length $\Delta y$ containing
the point y. Let $I_x = \{x : g(x) \in I_y\}$. Then, we have

$$P(Y \in I_y) \approx f_Y(y)\,\Delta y = \int_{I_x} f_X(x)\, dx$$

which shows that we can derive the density o f Y from the density of X.
We will use the principles outlined in the preceding paragraphs to find the
distribution of scalar-valued as well as vector-valued functions of random vari­
ables.

2.6.1 Scalar-valued Function o f One Random Variable


Discrete Case. Suppose X is a discrete random variable that can have one
of n values $x_1, x_2, \ldots, x_n$. Let g(x) be a scalar-valued function. Then Y =
g(X) is a discrete random variable that can have one of m, $m \le n$, values
$y_1, y_2, \ldots, y_m$. If g(X) is a one-to-one mapping, then m will be equal to n. However,
if g(x) is a many-to-one mapping, then m will be smaller than n. The probability
mass function of Y can be obtained easily from the probability mass function of
X as

$$P(Y = y_i) = \sum_j P(X = x_j)$$

where the sum is over all values of $x_j$ that map to $y_i$.

Continuous Random Variables. If X is a continuous random variable, then the
pdf of Y = g(X) can be obtained from the pdf of X as follows. Let y be a
particular value of Y and let $x^{(1)}, x^{(2)}, \ldots, x^{(k)}$ be the roots of the equation
y = g(x). That is, $y = g(x^{(1)}) = \cdots = g(x^{(k)})$. (For example, if $y = x^2$, then the
two roots are $x^{(1)} = +\sqrt{y}$ and $x^{(2)} = -\sqrt{y}$; also see Figure 2.9 for another
example.) We know that

$$P(y < Y < y + \Delta y) \approx f_Y(y)\,\Delta y \quad \text{as } \Delta y \to 0$$

Now if we can find the set of values of x such that $y < g(x) < y + \Delta y$, then
we can obtain $f_Y(y)\,\Delta y$ from the probability that X belongs to this set. That is

$$P(y < Y < y + \Delta y) = P[\{x : y < g(x) < y + \Delta y\}]$$

For the example shown in Figure 2.9, this set consists of the following three
intervals:

$$x^{(1)} < x \le x^{(1)} + \Delta x^{(1)}$$
$$x^{(2)} + \Delta x^{(2)} < x \le x^{(2)}$$
$$x^{(3)} < x \le x^{(3)} + \Delta x^{(3)}$$
where $\Delta x^{(1)} > 0$, $\Delta x^{(3)} > 0$ but $\Delta x^{(2)} < 0$. From the foregoing it follows that

$$P(y < Y < y + \Delta y) = P(x^{(1)} < X \le x^{(1)} + \Delta x^{(1)})
+ P(x^{(2)} + \Delta x^{(2)} < X \le x^{(2)})
+ P(x^{(3)} < X \le x^{(3)} + \Delta x^{(3)})$$

We can see from Figure 2.9 that the terms in the right-hand side are given by

$$P(x^{(1)} < X \le x^{(1)} + \Delta x^{(1)}) = f_X(x^{(1)})\,\Delta x^{(1)}$$
$$P(x^{(2)} + \Delta x^{(2)} < X \le x^{(2)}) = f_X(x^{(2)})\,|\Delta x^{(2)}|$$
$$P(x^{(3)} < X \le x^{(3)} + \Delta x^{(3)}) = f_X(x^{(3)})\,\Delta x^{(3)}$$

Since the slope g'(x) is $\Delta y/\Delta x$, we have

$$\Delta x^{(1)} = \Delta y/g'(x^{(1)}), \quad \Delta x^{(2)} = \Delta y/g'(x^{(2)}), \quad \Delta x^{(3)} = \Delta y/g'(x^{(3)})$$

Hence we conclude that, when we have three roots for the equation y = g(x),

$$f_Y(y)\,\Delta y = \frac{f_X(x^{(1)})}{g'(x^{(1)})}\Delta y + \frac{f_X(x^{(2)})}{|g'(x^{(2)})|}\Delta y + \frac{f_X(x^{(3)})}{g'(x^{(3)})}\Delta y$$

Canceling the $\Delta y$ and generalizing the result, we have

$$f_Y(y) = \sum_{i=1}^{k} \frac{f_X(x^{(i)})}{|g'(x^{(i)})|} \quad (2.71)$$

g'(x) is also called the Jacobian of the transformation and is often denoted by
J{x). Equation 2.71 gives the pdf of the transformed variable Y in terms of the
pdf o f X , which is given. The use of Equation 2.71 is limited by our ability to
find the roots o f the equation y = g(x). If g(x) is highly nonlinear, then the
solutions of y = g(x) can be difficult to find.

EXAMPLE 2.16.

Suppose X has a Gaussian distribution with a mean of 0 and variance of 1 and
$Y = X^2 + 4$. Find the pdf of Y.

SOLUTION: $y = g(x) = x^2 + 4$ has two roots:

$$x^{(1)} = +\sqrt{y - 4}, \qquad x^{(2)} = -\sqrt{y - 4}$$

and hence

$$g'(x^{(1)}) = 2\sqrt{y - 4}, \qquad g'(x^{(2)}) = -2\sqrt{y - 4}$$

The density function of Y is given by

$$f_Y(y) = \frac{f_X(x^{(1)})}{|g'(x^{(1)})|} + \frac{f_X(x^{(2)})}{|g'(x^{(2)})|}$$

With $f_X(x)$ given as

$$f_X(x) = \frac{1}{\sqrt{2\pi}}\exp(-x^2/2),$$

we obtain

$$f_Y(y) = \frac{1}{\sqrt{2\pi(y - 4)}}\exp(-(y - 4)/2), \quad y > 4$$
$$= 0, \quad y < 4$$

Note that since $y = x^2 + 4$, and the domain of X is $(-\infty, \infty)$, the domain of Y
is $[4, \infty)$.
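The result of Example 2.16 is easy to verify by simulation. The sketch below (ours) compares the derived pdf with a histogram of transformed samples; a few bins are printed rather than plotted.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.standard_normal(500_000) ** 2 + 4.0          # samples of Y = X^2 + 4

def f_y(y):
    """pdf of Y = X^2 + 4 derived in Example 2.16 (valid for y > 4)."""
    return np.exp(-(y - 4.0) / 2.0) / np.sqrt(2.0 * np.pi * (y - 4.0))

hist, edges = np.histogram(y, bins=50, range=(4.0, 10.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
for c, h in list(zip(centers, hist))[1:6]:           # skip the bin touching y = 4
    print(f"y = {c:5.2f}   histogram = {h:6.3f}   formula = {f_y(c):6.3f}")
```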

EXAMPLE 2.17

Using the pdf of X and the transformation shown in Figure 2.10.a and 2.10.b,
find the distribution o f Y.
Figure 2.10 Transformation discussed in Example 2.17.

SOLUTION: For $-1 < x < 1$, $y = x$ and hence

$$f_Y(y) = f_X(y), \quad -1 < y < 1$$

All the values of $x \ge 1$ map to $y = 1$, so the probability that Y = 1 is equal to
$P(X \ge 1)$; similarly, $P(Y = -1) = P(X \le -1)$.
Thus, Y has a mixed distribution with a continuum of values in the interval
(-1, 1) and a discrete set of values from the set $\{-1, 1\}$. The continuous
part is characterized by a pdf and the discrete part is characterized by a probability
mass function as shown in Figure 2.10.c.

2.6.2 Functions of Several Random Variables


We now attempt to find the joint distribution of n random variables $Y_1, Y_2,
\ldots, Y_n$ given the distribution of n related random variables $X_1, X_2, \ldots, X_n$
and the relationship between the two sets of random variables,

$$Y_i = g_i(X_1, X_2, \ldots, X_n), \quad i = 1, 2, \ldots, n$$

Let us start with a mapping of two random variables onto two other random
variables:

$$Y_1 = g_1(X_1, X_2)$$
$$Y_2 = g_2(X_1, X_2)$$

Suppose $(x_1^{(i)}, x_2^{(i)})$, $i = 1, 2, \ldots, k$ are the k roots of $y_1 = g_1(x_1, x_2)$ and
$y_2 = g_2(x_1, x_2)$. Proceeding along the lines of the previous section, we need to find
the region in the $x_1, x_2$ plane such that

$$y_1 < g_1(x_1, x_2) < y_1 + \Delta y_1$$

and

$$y_2 < g_2(x_1, x_2) < y_2 + \Delta y_2$$

There are k such regions as shown in Figure 2.11 (k = 3). Each region consists
of a parallelogram and the area of each parallelogram is equal to
$\Delta y_1\Delta y_2/|J(x_1^{(i)}, x_2^{(i)})|$, where $J(x_1, x_2)$ is the Jacobian of the transformation defined as

$$J(x_1, x_2) = \begin{vmatrix}
\dfrac{\partial g_1}{\partial x_1} & \dfrac{\partial g_1}{\partial x_2} \\[2mm]
\dfrac{\partial g_2}{\partial x_1} & \dfrac{\partial g_2}{\partial x_2}
\end{vmatrix} \quad (2.72)$$

By summing the contribution from all regions, we obtain the joint pdf of $Y_1$ and
$Y_2$ as

$$f_{Y_1,Y_2}(y_1, y_2) = \sum_{i=1}^{k} \frac{f_{X_1,X_2}(x_1^{(i)}, x_2^{(i)})}{|J(x_1^{(i)}, x_2^{(i)})|} \quad (2.73)$$

Using the vector notation, we can generalize this result to the n-variate case as

$$f_{\mathbf{Y}}(\mathbf{y}) = \sum_{i=1}^{k} \frac{f_{\mathbf{X}}(\mathbf{x}^{(i)})}{|J(\mathbf{x}^{(i)})|} \quad (2.74.a)$$

where $\mathbf{x}^{(i)} = [x_1^{(i)}, x_2^{(i)}, \ldots, x_n^{(i)}]^T$ is the ith solution to $\mathbf{y} = \mathbf{g}(\mathbf{x}) = [g_1(\mathbf{x}), g_2(\mathbf{x}),
\ldots, g_n(\mathbf{x})]^T$, and the Jacobian J is defined by

$$J = \begin{vmatrix}
\dfrac{\partial g_1}{\partial x_1} & \dfrac{\partial g_1}{\partial x_2} & \cdots & \dfrac{\partial g_1}{\partial x_n} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial g_n}{\partial x_1} & \dfrac{\partial g_n}{\partial x_2} & \cdots & \dfrac{\partial g_n}{\partial x_n}
\end{vmatrix} \quad (2.74.b)$$

Suppose we have n random variables with known joint pdf, and we are
interested in the joint pdf of m < n functions of them, say

$$y_i = g_i(x_1, x_2, \ldots, x_n), \quad i = 1, 2, \ldots, m$$

Now, we can define n - m additional functions

$$y_j = g_j(x_1, x_2, \ldots, x_n), \quad j = m + 1, \ldots, n$$

in any convenient way so that the Jacobian is nonzero, compute the joint pdf
of $Y_1, Y_2, \ldots, Y_n$, and then obtain the marginal pdf of $Y_1, Y_2, \ldots, Y_m$ by
integrating out $Y_{m+1}, \ldots, Y_n$. If the additional functions are chosen carefully,
the inverse is easy to find, although the resulting integration can still be difficult.

EXAMPLE 2.18.

Let two resistors, having independent resistances, X 1 and X 2, uniformly distrib­


uted between 9 and 11 ohms, be placed in parallel. Find the probability density
function of resistance Yx o f the parallel combination.

SOLUTION: The resistance of the parallel combination is

$$Y_1 = \frac{X_1X_2}{X_1 + X_2}$$

Introducing the variable

$$Y_2 = X_2$$

and solving for $x_1$ and $x_2$ results in the unique solution

$$x_1 = \frac{y_1y_2}{y_2 - y_1}, \qquad x_2 = y_2$$

Thus, Equation 2.73 reduces to

$$f_{Y_1,Y_2}(y_1, y_2) = \frac{f_{X_1,X_2}\!\left(\dfrac{y_1y_2}{y_2 - y_1},\, y_2\right)}{|J(x_1, x_2)|}$$

where

$$J(x_1, x_2) = \begin{vmatrix}
\dfrac{x_2^2}{(x_1 + x_2)^2} & \dfrac{x_1^2}{(x_1 + x_2)^2} \\[2mm]
0 & 1
\end{vmatrix}
= \frac{x_2^2}{(x_1 + x_2)^2} = \frac{(y_2 - y_1)^2}{y_2^2}$$
We are given

$$f_{X_1,X_2}(x_1, x_2) = f_{X_1}(x_1)f_{X_2}(x_2) = \tfrac{1}{4}, \quad 9 \le x_1 \le 11,\ 9 \le x_2 \le 11$$
$$= 0 \quad \text{elsewhere}$$

Thus

$$f_{Y_1,Y_2}(y_1, y_2) = \frac{1}{4}\,\frac{y_2^2}{(y_2 - y_1)^2}, \quad (y_1, y_2) \text{ in the region shown in Figure 2.12}$$
$$= 0 \quad \text{elsewhere}$$

We must now find the region in the $y_1, y_2$ plane that corresponds to the region
$9 \le x_1 \le 11$; $9 \le x_2 \le 11$. Figure 2.12 shows the mapping and the resulting
region in the $y_1, y_2$ plane, bounded by the curves $y_2 = 9y_1/(9 - y_1)$ and
$y_2 = 11y_1/(11 - y_1)$ and the lines $y_2 = 9$ and $y_2 = 11$.
Now to find the marginal density of $Y_1$, we "integrate out" $y_2$.

$$f_{Y_1}(y_1) = \int_9^{9y_1/(9 - y_1)} \frac{y_2^2}{4(y_2 - y_1)^2}\, dy_2, \quad 4\tfrac{1}{2} \le y_1 \le 4\tfrac{19}{20}$$

$$= \int_{11y_1/(11 - y_1)}^{11} \frac{y_2^2}{4(y_2 - y_1)^2}\, dy_2, \quad 4\tfrac{19}{20} < y_1 \le 5\tfrac{1}{2}$$

$$= 0 \quad \text{elsewhere}$$

Figure 2.12 Transformation of Example 2.18.
Carrying out the integration results in

$$f_{Y_1}(y_1) = \frac{y_1^2}{2(9 - y_1)} - \frac{9 - y_1}{2} + y_1\ln\frac{y_1}{9 - y_1}, \quad 4\tfrac{1}{2} \le y_1 \le 4\tfrac{19}{20}$$

$$= \frac{11 - y_1}{2} - \frac{y_1^2}{2(11 - y_1)} + y_1\ln\frac{11 - y_1}{y_1}, \quad 4\tfrac{19}{20} < y_1 \le 5\tfrac{1}{2}$$

$$= 0 \quad \text{elsewhere}$$

Special case: Linear Transformations. One of the most frequently used types
of transformation is the affine transformation, where each of the new variables
is a linear combination of the old variables plus a constant. That is

$$Y_1 = a_{1,1}X_1 + a_{1,2}X_2 + \cdots + a_{1,n}X_n + b_1$$
$$Y_2 = a_{2,1}X_1 + a_{2,2}X_2 + \cdots + a_{2,n}X_n + b_2$$
$$\vdots$$
$$Y_n = a_{n,1}X_1 + a_{n,2}X_2 + \cdots + a_{n,n}X_n + b_n$$

where the $a_{i,j}$'s and $b_i$'s are all constants. In matrix notation we can write this
transformation as

$$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} =
\begin{bmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\
a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\
\vdots & \vdots & & \vdots \\
a_{n,1} & a_{n,2} & \cdots & a_{n,n}
\end{bmatrix}
\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix} +
\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}$$

$$\mathbf{Y} = A\mathbf{X} + \mathbf{B} \quad (2.75)$$

where A is n x n, and Y, X, and B are n x 1 matrices. If A is nonsingular, then
the inverse transformation exists and is given by

$$\mathbf{X} = A^{-1}\mathbf{Y} - A^{-1}\mathbf{B}$$
The Jacobian of the transformation is

$$J = \begin{vmatrix}
a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\
a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\
\vdots & \vdots & & \vdots \\
a_{n,1} & a_{n,2} & \cdots & a_{n,n}
\end{vmatrix} = |A|$$

Substituting the preceding two equations into Equation 2.71, we obtain the pdf
of Y as

$$f_{\mathbf{Y}}(\mathbf{y}) = f_{\mathbf{X}}(A^{-1}\mathbf{y} - A^{-1}\mathbf{B})\,||A||^{-1} \quad (2.76)$$

Sum of Random Variables. We consider $Y_1 = X_1 + X_2$ where $X_1$ and $X_2$ are
independent random variables. As suggested before, let us introduce an additional
function $Y_2 = X_2$ so that the transformation is given by

$$\begin{bmatrix} Y_1 \\ Y_2 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} X_1 \\ X_2 \end{bmatrix}$$

From Equation 2.76 it follows that

$$f_{Y_1,Y_2}(y_1, y_2) = f_{X_1,X_2}(y_1 - y_2, y_2) = f_{X_1}(y_1 - y_2)f_{X_2}(y_2)$$

since $X_1$ and $X_2$ are independent.

The pdf of $Y_1$ is obtained by integration as

$$f_{Y_1}(y_1) = \int_{-\infty}^{\infty} f_{X_1}(y_1 - y_2)f_{X_2}(y_2)\, dy_2 \quad (2.77.a)$$

The relationship given in Equation 2.77.a is said to be the convolution of $f_{X_1}$
and $f_{X_2}$, which is written symbolically as

$$f_{Y_1} = f_{X_1} * f_{X_2} \quad (2.77.b)$$

Thus, the density function of the sum of two independent random variables is
given by the convolution of their densities. This also implies that the characteristic
functions are multiplied, and the cumulant generating functions as well
as individual cumulants are summed.

EXAMPLE 2.19.

$X_1$ and $X_2$ are independent random variables with identical uniform distributions
in the interval [-1, 1]. Find the pdf of $Y_1 = X_1 + X_2$.

SOLUTION: See Figure 2.13.

Figure 2.13 Convolution of pdfs - Example 2.19.



EXAMPLE 2.20.

Let $Y = X_1 + X_2$ where $X_1$ and $X_2$ are independent, and

$$f_{X_1}(x_1) = \exp(-x_1),\ x_1 \ge 0; \qquad f_{X_2}(x_2) = 2\exp(-2x_2),\ x_2 \ge 0,$$
$$= 0,\ x_1 < 0 \qquad\qquad\qquad = 0,\ x_2 < 0.$$

Find the pdf of Y.

SOLUTION: (See Figure 2.14)

$$f_Y(y) = \int_0^y \exp(-x_1)\,2\exp[-2(y - x_1)]\, dx_1$$
$$= 2\exp(-2y)\int_0^y \exp(x_1)\, dx_1 = 2\exp(-2y)[\exp(y) - 1]$$

$$f_Y(y) = 2[\exp(-y) - \exp(-2y)], \quad y \ge 0$$
$$= 0, \quad y < 0$$
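The convolution in Example 2.20 can also be carried out numerically, which is often the only option when the densities are less convenient. The sketch below (ours) discretizes the two exponential densities, convolves them with NumPy, and compares against the closed-form answer at one point.

```python
import numpy as np

dx = 0.001
x = np.arange(0.0, 20.0, dx)
f1 = np.exp(-x)                    # f_X1(x) = exp(-x),   x >= 0
f2 = 2.0 * np.exp(-2.0 * x)        # f_X2(x) = 2 exp(-2x), x >= 0

# numerical convolution; multiply by dx to approximate the integral
fy = np.convolve(f1, f2)[:len(x)] * dx

y0 = 1.5
print(fy[int(y0 / dx)])                              # numerical value at y = 1.5
print(2.0 * (np.exp(-y0) - np.exp(-2.0 * y0)))       # closed form from Example 2.20
```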

EXAMPLE 2.21.

X has an n-variate Gaussian density function with $E\{X_i\} = 0$ and a covariance
matrix of $\Sigma_{\mathbf{X}}$. Find the pdf of $\mathbf{Y} = A\mathbf{X}$ where A is an n x n nonsingular matrix.

SOLUTION: We are given

$$f_{\mathbf{X}}(\mathbf{x}) = \left[(2\pi)^{n/2}|\Sigma_{\mathbf{X}}|^{1/2}\right]^{-1}\exp\left(-\frac{1}{2}\mathbf{x}^T\Sigma_{\mathbf{X}}^{-1}\mathbf{x}\right)$$

With $\mathbf{x} = A^{-1}\mathbf{y}$, and $J = |A|$, we obtain

$$f_{\mathbf{Y}}(\mathbf{y}) = \left[(2\pi)^{n/2}|\Sigma_{\mathbf{X}}|^{1/2}\right]^{-1}\exp\left(-\frac{1}{2}\mathbf{y}^T(A^{-1})^T\Sigma_{\mathbf{X}}^{-1}A^{-1}\mathbf{y}\right)||A||^{-1}$$

Now if we define $\Sigma_{\mathbf{Y}} = A\Sigma_{\mathbf{X}}A^T$, then the exponent in the pdf of Y has the form

$$-\frac{1}{2}\mathbf{y}^T\Sigma_{\mathbf{Y}}^{-1}\mathbf{y}$$

which corresponds to a multivariate Gaussian pdf with zero means and a covariance
matrix of $\Sigma_{\mathbf{Y}}$. Hence, we conclude that Y, which is a linear transformation
of a multivariate Gaussian vector X, also has a Gaussian distribution. (Note:
This cannot be generalized for any arbitrary distribution.)

Order Statistics. Ordering, comparing, and finding the minimum and maximum
are typical statistical or data processing operations. We can use the techniques
outlined in the preceding sections for finding the distribution of minimum and
maximum values within a group of independent random variables.
Let $X_1, X_2, X_3, \ldots, X_n$ be a group of independent random variables having
a common pdf, $f_X(x)$, defined over the interval (a, b). To find the distribution
of the smallest and largest of these $X_i$'s, let us define the following transformation:

Let $Y_1$ = smallest of $(X_1, X_2, \ldots, X_n)$
$Y_2$ = next $X_i$ in order of magnitude
$\vdots$
$Y_n$ = largest of $(X_1, X_2, \ldots, X_n)$

That is, $Y_1 \le Y_2 \le \cdots \le Y_n$ represent $X_1, X_2, \ldots, X_n$ when the latter are arranged
in ascending order of magnitude. Then $Y_i$ is called the ith order statistic of the
group. We will now show that the joint pdf of $Y_1, Y_2, \ldots, Y_n$ is given by

$$f_{Y_1,Y_2,\ldots,Y_n}(y_1, y_2, \ldots, y_n) = n!\,f_X(y_1)f_X(y_2)\cdots f_X(y_n),$$
$$a < y_1 < y_2 < \cdots < y_n < b$$

We shall prove this for n = 3, but the argument can be entirely general.
With n = 3

$$f_{X_1,X_2,X_3}(x_1, x_2, x_3) = f_X(x_1)f_X(x_2)f_X(x_3)$$

and the transformation is

$Y_1$ = smallest of $(X_1, X_2, X_3)$
$Y_2$ = middle value of $(X_1, X_2, X_3)$
$Y_3$ = largest of $(X_1, X_2, X_3)$

A given set of values $x_1, x_2, x_3$ may fall into one of the following six possibilities:

$x_1 < x_2 < x_3$ or $y_1 = x_1$, $y_2 = x_2$, $y_3 = x_3$
$x_1 < x_3 < x_2$ or $y_1 = x_1$, $y_2 = x_3$, $y_3 = x_2$
$x_2 < x_1 < x_3$ or $y_1 = x_2$, $y_2 = x_1$, $y_3 = x_3$
$x_2 < x_3 < x_1$ or $y_1 = x_2$, $y_2 = x_3$, $y_3 = x_1$
$x_3 < x_1 < x_2$ or $y_1 = x_3$, $y_2 = x_1$, $y_3 = x_2$
$x_3 < x_2 < x_1$ or $y_1 = x_3$, $y_2 = x_2$, $y_3 = x_1$

(Note that $x_1 = x_2$, etc., occur with a probability of 0 since $X_1, X_2, X_3$ are
continuous random variables.)
Thus, we have six or 3! inverses. If we take a particular inverse, say, $y_1 = x_3$,
$y_2 = x_1$, and $y_3 = x_2$, the Jacobian is given by

$$J = \begin{vmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{vmatrix} = 1$$

The reader can verify that, for all six inverses, the Jacobian has a magnitude of
1, and using Equation 2.71, we obtain the joint pdf of $Y_1, Y_2, Y_3$ as

$$f_{Y_1,Y_2,Y_3}(y_1, y_2, y_3) = 3!\,f_X(y_1)f_X(y_2)f_X(y_3), \quad a < y_1 < y_2 < y_3 < b$$


Generalizing this to the case of n variables we obtain

$$f_{Y_1,Y_2,\ldots,Y_n}(y_1, y_2, \ldots, y_n) = n!\,f_X(y_1)f_X(y_2)\cdots f_X(y_n),$$
$$a < y_1 < y_2 < \cdots < y_n < b \quad (2.78.a)$$

The marginal pdf of $Y_n$ is obtained by integrating out $y_1, y_2, \ldots, y_{n-1}$,

$$f_{Y_n}(y_n) = \int_a^{y_n}\int_a^{y_{n-1}}\cdots\int_a^{y_3}\int_a^{y_2} n!\,f_X(y_1)f_X(y_2)\cdots f_X(y_n)\, dy_1\, dy_2\cdots dy_{n-1}$$

The innermost integral on $y_1$ yields $F_X(y_2)$, and the next integral is

$$\int_a^{y_3} F_X(y_2)f_X(y_2)\, dy_2 = \int_a^{y_3} F_X(y_2)\, d[F_X(y_2)] = \frac{[F_X(y_3)]^2}{2}$$

Repeating this process (n - 1) times, we obtain

$$f_{Y_n}(y_n) = n[F_X(y_n)]^{n-1}f_X(y_n), \quad a < y_n < b \quad (2.78.b)$$

Proceeding along similar lines, we can show that

$$f_{Y_1}(y_1) = n[1 - F_X(y_1)]^{n-1}f_X(y_1), \quad a < y_1 < b \quad (2.78.c)$$

Equations 2.78.b and 2.78.c can be used to obtain and analyze the distribution
of the largest and smallest among a group of random variables.

EXAMPLE 2.22.

A peak detection circuit processes 10 identically distributed random samples


and selects as its output the sample with the largest value. Find the pdf of the
peak detector output assuming that the individual samples have the pdf

$$f_X(x) = a e^{-ax}, \quad x > 0$$
$$= 0, \quad x < 0$$

SOLUTION: From Equation 2.78.b, we obtain

$$f_{Y_n}(y) = 10[1 - e^{-ay}]^9\,a e^{-ay}, \quad y \ge 0$$
$$= 0, \quad y < 0$$
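Equation 2.78.b, and the peak-detector result above, can be checked by simulation. In the sketch below (ours) the exponential parameter a is set to 1 purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
a, n = 1.0, 10
samples = rng.exponential(scale=1.0 / a, size=(200_000, n))
peaks = samples.max(axis=1)                      # output of the peak detector

def f_peak(y):
    """pdf of the largest of n i.i.d. exponential(a) samples, Eq. 2.78.b."""
    return n * (1.0 - np.exp(-a * y)) ** (n - 1) * a * np.exp(-a * y)

hist, edges = np.histogram(peaks, bins=40, range=(0.0, 8.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(hist - f_peak(centers))))    # small if the formula is right
```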

Nonlinear Transformations. While it is relatively easy to find the distribution


of Y = g(X ) when g is linear or affine, it is usually very difficult to find the
distribution of Y when g is nonlinear. However, if X is a scalar random variable,
then Equation 2.71 provides a general solution. The difficulties when X is two-
dimensional are illustrated by Example 2.18, and this example suggests the
difficulties when X is more than two-dimensional and g is nonlinear.
For general nonlinear transformations, two approaches are common in prac­
tice. One is the Monte Carlo approach, which is outlined in the next subsection.
The other approach is based upon an approximation involving moments and is
presented in Section 2.7. We mention here that the mean, the variance, and
higher moments of Y can be obtained easily (at least conceptually) as follows.
We start with

$$E\{h(Y)\} = \int h(y)f_Y(y)\, dy$$

However, Y = g(X), and hence we can compute $E\{h(Y)\}$ as

$$E_Y\{h(Y)\} = E_X\{h(g(X))\}$$

Since the right-hand side is a function of X alone, its expected value is

$$E_X\{h(g(\mathbf{X}))\} = \int h(g(\mathbf{x}))f_{\mathbf{X}}(\mathbf{x})\, d\mathbf{x} \quad (2.79)$$

Using the means and covariances, we may be able to approximate the dis­
tribution of Y as discussed in the next section.

Monte Carlo (Synthetic Sampling) Technique. We seek an approximation to


the distribution or pdf of Y when

$$Y = g(X_1, \ldots, X_n)$$

Figure 2.15 Simple Monte Carlo simulation.

It is assumed that $Y = g(X_1, \ldots, X_n)$ is known and that the joint density
$f_{X_1,X_2,\ldots,X_n}$ is known. Now if a sample value of each random variable were known
(say $X_1 = x_{1,1}$, $X_2 = x_{1,2}$, \ldots, $X_n = x_{1,n}$), then a sample value of Y could be
computed [say $y_1 = g(x_{1,1}, x_{1,2}, \ldots, x_{1,n})$]. If another set of sample values were
chosen for the random variables (say $X_1 = x_{2,1}, \ldots, X_n = x_{2,n}$), then
$y_2 = g(x_{2,1}, x_{2,2}, \ldots, x_{2,n})$ could be computed.
Monte Carlo techniques simply consist of computer algorithms for selecting
the samples $x_{i,1}, \ldots, x_{i,n}$, a method for calculating $y_i = g(x_{i,1}, \ldots, x_{i,n})$, which
often is just one or a few lines of code, and a method of organizing and displaying
the results of a large number of repetitions of the procedure.
the results of a large number of repetitions o f the procedure.
Consider the case where the components of X are independent and uniformly
distributed between zero and one. This is a particularly simple example because
computer routines that generate pseudorandom numbers uniformly distributed
between zero and one are widely available. A Monte Carlo program that ap­
proximates the distribution of Y when X is o f dimension 20 is shown in Figure
2.15. The required number of samples is beyond the scope of this introduction.
However, the usual result of a Monte Carlo routine is a histogram, and the
errors of histograms, which are a function of the number of samples, are discussed
in Chapter 8 .
If the random variable $X_i$ is not uniformly distributed between zero and one,
then random sampling is somewhat more difficult. In such cases the following
procedure is used. Select a random sample of U that is uniformly distributed
between 0 and 1. Call this random sample $u_1$. Then $F_{X_i}^{-1}(u_1)$ is the random sample
of $X_i$.
Figure 2.16 Results of a Monte Carlo simulation (histogram of clearance; vertical axis: number of samples).

For example, suppose that X is uniformly distributed between 10 and 20.
Then

$$F_X(x) = 0, \quad x < 10$$
$$= (x - 10)/10, \quad 10 \le x \le 20$$
$$= 1, \quad x > 20$$

Notice $F_X^{-1}(u) = 10u + 10$. Thus, if the value .250 were the random sample
of U, then the corresponding random sample of X would be 12.5.
The reader is asked to show using Equation 2.71 that if $X_i$ has a density
function and if $X_i = F_i^{-1}(U) = g(U)$, where U is uniformly distributed between
zero and one, then $F_i^{-1}$ is unique and

$$f_{X_i}(x) = \frac{dF_i(x)}{dx} \quad \text{where } F_i = (F_i^{-1})^{-1}$$
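The inverse-CDF sampling procedure just described takes only a few lines of code. The sketch below (ours) reproduces the uniform(10, 20) example and adds an exponential case, chosen only because its inverse CDF has a simple closed form.

```python
import numpy as np

rng = np.random.default_rng(3)
u = rng.random(100_000)            # U uniformly distributed between 0 and 1

# uniform between 10 and 20: F^{-1}(u) = 10u + 10
x_uniform = 10.0 * u + 10.0
print(x_uniform.mean())            # should be close to 15

# exponential with parameter a: F(x) = 1 - exp(-a x), so F^{-1}(u) = -ln(1 - u)/a
a = 2.0
x_exp = -np.log(1.0 - u) / a
print(x_exp.mean())                # should be close to 1/a = 0.5
```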

If the random variables $X_i$ are dependent, then the samples of $X_2, \ldots, X_n$
are based upon the conditional density functions $f_{X_2|X_1}, f_{X_3|X_1,X_2}, \ldots$
The results of an example Monte Carlo simulation of a mechanical tolerance
application where Y represents clearance are shown in Figure 2.16. In this case
Y was a somewhat complex trigonometric function o f 41 dimensions on a pro­
duction drawing. The results required an assumed distribution for each o f the
41 individual dimensions involved in the clearance, and all were assumed to be
uniformly distributed between their tolerance limits. This quite nonlinear transformation
produced results that appear normal, and interference, that is, negative
clearance, occurred 71 times in 8000 simulations. This estimate of the
probability of interference was verified by results of the assembly operation.

2.7 BOUNDS AN D APPROXIM ATIONS

In many applications requiring the calculations of probabilities we often face


the following situations:

1. The underlying distributions are not completely specified; only the
means, variances, and some of the higher order moments $E\{(X - \mu_X)^k\}$,
k > 2, are known.
2. The underlying density function is known but integration in closed form
is not possible (example: the Gaussian pdf).

In these cases we use several approximation techniques that yield upper and/or
lower bounds on probabilities.
2.7.1 Tchebycheff Inequality

If only the mean and variance of a random variable X are known, we can obtain
upper bounds on $P(|X| \ge \epsilon)$ using the Tchebycheff inequality, which we prove
now. Suppose X is a random variable, and we define

$$Y_\epsilon = \begin{cases} 1 & \text{if } |X| \ge \epsilon \\ 0 & \text{if } |X| < \epsilon \end{cases}$$

where $\epsilon$ is a positive constant. From the definition of $Y_\epsilon$ it follows that

$$X^2 \ge X^2Y_\epsilon \ge \epsilon^2Y_\epsilon$$

and thus

$$E\{X^2\} \ge E\{X^2Y_\epsilon\} \ge \epsilon^2E\{Y_\epsilon\} \quad (2.80)$$

However,

$$E\{Y_\epsilon\} = 1 \cdot P(|X| \ge \epsilon) + 0 \cdot P(|X| < \epsilon) = P(|X| \ge \epsilon) \quad (2.81)$$

Combining Equations 2.80 and 2.81, we obtain the Tchebycheff inequality as

$$P(|X| \ge \epsilon) \le \frac{1}{\epsilon^2}E[X^2] \quad (2.82.a)$$

(Note that the foregoing inequality does not require the complete distribution
of X, that is, it is distribution free.)
Now, if we let $X = (Y - \mu_Y)$ and $\epsilon = k\sigma_Y$, Equation 2.82.a takes the form

$$P(|Y - \mu_Y| \ge k\sigma_Y) \le \frac{1}{k^2} \quad (2.82.b)$$

or

$$P(|Y - \mu_Y| \ge \epsilon) \le \frac{\sigma_Y^2}{\epsilon^2} \quad (2.82.c)$$



Equation 2.82.b gives an upper bound on the probability that a random variable
has a value that deviates from its mean by more than k times its standard
deviation. Equation 2.82.b thus justifies the use o f the standard deviation as a
measure o f variability for any random variable.

2.7.2 Chernoff Bound

The Tchebycheff inequality often provides a very "loose" upper bound on probabilities.
The Chernoff bound provides a "tighter" bound. To derive the Chernoff
bound, define

$$Y_\epsilon = \begin{cases} 1, & X \ge \epsilon \\ 0, & X < \epsilon \end{cases}$$

Then, for all $t \ge 0$, it must be true that

$$e^{tX} \ge e^{t\epsilon}Y_\epsilon$$

and, hence,

$$E\{e^{tX}\} \ge e^{t\epsilon}E\{Y_\epsilon\} = e^{t\epsilon}P(X \ge \epsilon)$$

or

$$P(X \ge \epsilon) \le e^{-t\epsilon}E\{e^{tX}\}, \quad t \ge 0$$

Furthermore,

$$P(X \ge \epsilon) \le \min_{t \ge 0} e^{-t\epsilon}E\{e^{tX}\} = \min_{t \ge 0}\exp[-t\epsilon + \ln E\{e^{tX}\}] \quad (2.83)$$

Equation 2.83 is the Chernoff bound. While the advantage of the Chernoff
bound is that it is tighter than the Tchebycheff bound, the disadvantage of the
Chernoff bound is that it requires the evaluation of E{e'x } and thus requires
more extensive knowledge o f the distribution. The Tchebycheff bound does not
require such knowledge o f the distribution.
2.7.3 Union Bound

This bound is very useful in approximating the probability of union of events,
and it follows directly from

$$P(A \cup B) = P(A) + P(B) - P(AB) \le P(A) + P(B)$$

since $P(AB) \ge 0$. This result can be generalized as

$$P\left(\bigcup_{i=1}^{n} A_i\right) \le \sum_{i=1}^{n} P(A_i) \quad (2.84)$$
We now present an example to illustrate the use o f these bounds.

EXAMPLE 2.23.

$X_1$ and $X_2$ are two independent Gaussian random variables with $\mu_{X_1} = \mu_{X_2} = 0$
and $\sigma_{X_1}^2 = 1$ and $\sigma_{X_2}^2 = 4$.

(a) Find the Tchebycheff and Chernoff bounds on $P(X_1 \ge 3)$ and compare
them with the exact value of $P(X_1 \ge 3)$.
(b) Find the union bound on $P(X_1 \ge 3$ or $X_2 \ge 4)$ and compare it with the
actual value.

SOLUTION:
(a) The Tchebycheff bound on $P(X_1 \ge 3)$ is obtained using Equation 2.82.c
as

$$P(X_1 \ge 3) \le P(|X_1| \ge 3) \le \tfrac{1}{9} = 0.111$$

To obtain the Chernoff bound we start with

$$E\{e^{tX_1}\} = e^{t^2/2}$$

Hence,

$$P(X_1 \ge \epsilon) \le \min_{t \ge 0}\exp\left[-t\epsilon + \frac{t^2}{2}\right]$$

The minimum value of the right-hand side occurs with $t = \epsilon$ and

$$P(X_1 \ge \epsilon) \le e^{-\epsilon^2/2}$$

Thus, the Chernoff bound on $P(X_1 \ge 3)$ is given by

$$P(X_1 \ge 3) \le e^{-9/2} \approx 0.0111$$

From the tabulated values of the Q( ) function (Appendix D), we
obtain the value of $P(X_1 \ge 3)$ as

$$P(X_1 \ge 3) = Q(3) = .0013$$

Comparison of the exact value with the Chernoff and Tchebycheff


bounds indicates that the Tchebycheff bound is much looser than the
Chernoff bound. This is to be expected since the Tchebycheff bound
does not take into account the functional form of the pdf.

(b) $P(X_1 \ge 3$ or $X_2 \ge 4)$
$= P(X_1 \ge 3) + P(X_2 \ge 4) - P(X_1 \ge 3$ and $X_2 \ge 4)$
$= P(X_1 \ge 3) + P(X_2 \ge 4) - P(X_1 \ge 3)P(X_2 \ge 4)$

since $X_1$ and $X_2$ are independent. The union bound consists of the sum
of the first two terms of the right-hand side of the preceding equation,
and the union bound is "off" by the value of the third term. Substituting
the values of these probabilities, we have

$$P(X_1 \ge 3 \text{ or } X_2 \ge 4) \approx (.0013) + (.0228) - (.0013)(.0228) \approx .02407$$

The union bound is given by

$$P(X_1 \ge 3 \text{ or } X_2 \ge 4) \le P(X_1 \ge 3) + P(X_2 \ge 4) = .0241$$

The union bound is usually very tight when the probabilities involved
are small and the random variables are independent.
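The three bounds of Example 2.23 are compared numerically below in a short sketch (ours); the exact values use $Q(x) = \tfrac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$.

```python
from math import erfc, exp, sqrt

def Q(x):
    return 0.5 * erfc(x / sqrt(2.0))

# P(X1 >= 3) for X1 ~ N(0, 1)
print("Tchebycheff:", 1.0 / 9.0)          # Eq. 2.82.c with sigma = 1, eps = 3
print("Chernoff:   ", exp(-9.0 / 2.0))    # exp(-eps^2/2)
print("Exact:      ", Q(3.0))

# union bound for P(X1 >= 3 or X2 >= 4), with X2 ~ N(0, 4)
p1, p2 = Q(3.0), Q(4.0 / 2.0)
print("Union bound:", p1 + p2)
print("Exact:      ", p1 + p2 - p1 * p2)  # independence used for the exact value
```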

2.7.4 Approximating the Distribution of $Y = g(X_1, X_2, \ldots, X_n)$

A practical approximation based on the first-order Taylor series expansion is
discussed. Consider

$$Y = g(X_1, X_2, \ldots, X_n)$$

If Y is represented by its first-order Taylor series expansion about the point
$\mu_1, \mu_2, \ldots, \mu_n$

$$Y \approx g(\mu_1, \mu_2, \ldots, \mu_n) + \sum_{i=1}^{n} \frac{\partial g}{\partial X_i}(\mu_1, \mu_2, \ldots, \mu_n)\,[X_i - \mu_i]$$

then

$$\mu_Y \approx g(\mu_1, \mu_2, \ldots, \mu_n)$$

$$E[(Y - \mu_Y)^2] \approx \sum_{i=1}^{n}\left[\frac{\partial g}{\partial X_i}(\mu_1, \ldots, \mu_n)\right]^2\sigma_{X_i}^2
+ \sum_{i=1}^{n}\sum_{\substack{j=1 \\ j \ne i}}^{n}\frac{\partial g}{\partial X_i}\frac{\partial g}{\partial X_j}\,\rho_{X_iX_j}\sigma_{X_i}\sigma_{X_j}$$

where

$$\mu_i = E[X_i]$$
$$\sigma_{X_i}^2 = E[(X_i - \mu_i)^2]$$
$$\rho_{X_iX_j} = \frac{E[(X_i - \mu_i)(X_j - \mu_j)]}{\sigma_{X_i}\sigma_{X_j}}$$

If the random variables $X_1, X_2, \ldots, X_n$ are uncorrelated ($\rho_{X_iX_j} = 0$), then
the double sum is zero.
Furthermore, as will be explained in Section 2.8.2, the central limit theorem
suggests that if n is reasonably large, then it may not be too unreasonable to
assume that Y is normal if the $X_i$'s meet certain conditions.

EXAMPLE 2.24.

$$Y = \frac{X_1}{X_2} + X_3X_4 - X_5^2$$

The $X_i$'s are independent.

$$\mu_{X_1} = 10 \qquad \sigma_{X_1}^2 = 1$$
$$\mu_{X_2} = 2 \qquad \sigma_{X_2}^2 = \tfrac{1}{2}$$
$$\mu_{X_3} = 3 \qquad \sigma_{X_3}^2 = \tfrac{1}{4}$$
$$\mu_{X_4} = 4 \qquad \sigma_{X_4}^2 = \tfrac{1}{3}$$
$$\mu_{X_5} = 1 \qquad \sigma_{X_5}^2 = \tfrac{1}{5}$$

Find approximately (a) $\mu_Y$, (b) $\sigma_Y^2$, and (c) $P(Y \le 20)$.

SOLUTION:

(a) $\mu_Y \approx \dfrac{10}{2} + (3)(4) - (1)^2 = 16$

(b) $\sigma_Y^2 \approx \left(\dfrac{1}{2}\right)^2(1) + \left(\dfrac{10}{4}\right)^2\left(\dfrac{1}{2}\right) + (4)^2\left(\dfrac{1}{4}\right) + (3)^2\left(\dfrac{1}{3}\right) + (2)^2\left(\dfrac{1}{5}\right) \approx 11.2$

(c) With only five terms in the approximate linear equation, we assume,
for an approximation, that Y is normal. Thus

$$P(Y \le 20) \approx \int_{-\infty}^{1.2}\frac{1}{\sqrt{2\pi}}\exp(-z^2/2)\, dz = 1 - Q(1.2) = .885$$
2.7.5 Series Approximation of Probability Density Functions

In some applications, such as those that involve nonlinear transformations, it
will not be possible to calculate the probability density functions in closed form.
However, it might be easy to calculate the expected values. As an example,
consider $Y = X^3$. Even if the pdf of Y cannot be specified in analytical form,
it might be possible to calculate $E\{Y^k\} = E\{X^{3k}\}$ for $k \le m$. In the following
paragraphs we present a method for approximating the unknown pdf $f_Y(y)$ of
a random variable Y whose moments $E\{Y^k\}$ are known. To simplify the algebra,
we will assume that $E\{Y\} = 0$ and $\sigma_Y^2 = 1$.
The readers have seen the Fourier series expansion for periodic functions.
A similar series approach can be used to expand probability density functions.
A commonly used and mathematically tractable series approximation is the
Gram-Charlier series, which has the form:

$$f_Y(y) = h(y)\sum_{j=0}^{\infty} C_jH_j(y) \quad (2.85)$$

where

$$h(y) = \frac{1}{\sqrt{2\pi}}\exp(-y^2/2) \quad (2.86)$$

and the basis functions of the expansion, $H_j(y)$, are the Tchebycheff-Hermite
(T-H) polynomials. The first eight T-H polynomials are

$$H_0(y) = 1$$
$$H_1(y) = y$$
$$H_2(y) = y^2 - 1$$
$$H_3(y) = y^3 - 3y$$
$$H_4(y) = y^4 - 6y^2 + 3$$
$$H_5(y) = y^5 - 10y^3 + 15y$$
$$H_6(y) = y^6 - 15y^4 + 45y^2 - 15$$
$$H_7(y) = y^7 - 21y^5 + 105y^3 - 105y$$
$$H_8(y) = y^8 - 28y^6 + 210y^4 - 420y^2 + 105 \quad (2.87)$$

and they have the following properties:

1. $H_k(y)h(y) = -\dfrac{d(H_{k-1}(y)h(y))}{dy}$

2. $H_k(y) - yH_{k-1}(y) + (k - 1)H_{k-2}(y) = 0, \quad k \ge 2$

3. $\displaystyle\int_{-\infty}^{\infty} H_m(y)H_n(y)h(y)\, dy = 0, \ m \ne n; \quad = n!, \ m = n \quad (2.88)$

The coefficients of the series expansion are evaluated by multiplying both
sides of Equation 2.85 by $H_k(y)$ and integrating from $-\infty$ to $\infty$. By virtue of the
orthogonality property given in Equation 2.88, we obtain

$$C_k = \frac{1}{k!}\int_{-\infty}^{\infty} H_k(y)f_Y(y)\, dy
= \frac{1}{k!}\left[\mu_k - \frac{(k)_2}{2 \cdot 1!}\mu_{k-2} + \frac{(k)_4}{2^2 \cdot 2!}\mu_{k-4} - \cdots\right] \quad (2.89.a)$$

where

$$\mu_m = E\{Y^m\}$$

and

$$(k)_m = \frac{k!}{(k - m)!} = k(k - 1)\cdots[k - (m - 1)]$$

The first eight coefficients follow directly from Equations 2.87 and 2.89.a and
are given by

$$C_0 = 1$$
$$C_1 = \mu_1$$
$$C_2 = \tfrac{1}{2}(\mu_2 - 1)$$
$$C_3 = \tfrac{1}{6}(\mu_3 - 3\mu_1)$$
$$C_4 = \tfrac{1}{24}(\mu_4 - 6\mu_2 + 3)$$
$$C_5 = \tfrac{1}{120}(\mu_5 - 10\mu_3 + 15\mu_1)$$
$$C_6 = \tfrac{1}{720}(\mu_6 - 15\mu_4 + 45\mu_2 - 15)$$
$$C_7 = \tfrac{1}{5040}(\mu_7 - 21\mu_5 + 105\mu_3 - 105\mu_1)$$
$$C_8 = \tfrac{1}{40320}(\mu_8 - 28\mu_6 + 210\mu_4 - 420\mu_2 + 105) \quad (2.89.b)$$

Substituting Equation 2.89 into Equation 2.85 we obtain the series expansion
for the pdf of a random variable in terms of the moments of the random variable
and the T-H polynomials.
The Gram-Charlier series expansion for the pdf of a random variable X with
mean $\mu_X$ and variance $\sigma_X^2$ has the form:

$$f_X(x) = \frac{1}{\sigma_X}h\!\left(\frac{x - \mu_X}{\sigma_X}\right)\sum_{j=0}^{\infty} C_jH_j\!\left(\frac{x - \mu_X}{\sigma_X}\right) \quad (2.90)$$

where the coefficients $C_j$ are given by Equation 2.89 with $\mu_k'$ used for $\mu_k$, where
$\mu_k' = E\{[(X - \mu_X)/\sigma_X]^k\}$.
EXAMPLE 2.25.

For a random variable X

$$\mu_1 = 3, \quad \mu_2 = 13, \quad \mu_3 = 59, \quad \mu_4 = 309$$

Find $P(X < 5)$ using four terms of a Gram-Charlier series.

SOLUTION:

$$\sigma_X^2 = E(X^2) - [E(X)]^2 = \mu_2 - \mu_1^2 = 4$$

Converting to the standard normal form with $Z = (X - 3)/2$, the moments of Z are

$$\mu_1' = 0, \qquad \mu_2' = 1$$
$$\mu_3' = \frac{\mu_3 - 9\mu_2 + 27\mu_1 - 27}{8} = -.5$$
$$\mu_4' = \frac{\mu_4 - 12\mu_3 + 54\mu_2 - 108\mu_1 + 81}{16} = 3.75$$

Then for the random variable Z, using Equation 2.89,

$$C_0 = 1, \qquad C_1 = 0, \qquad C_2 = 0$$
$$C_3 = \tfrac{1}{6}(-.5) = -.08333$$
$$C_4 = \tfrac{1}{24}(3.75 - 6 + 3) = .03125$$

Now $P(X < 5) = P(Z < 1)$

$$= \int_{-\infty}^{1} h(z)\left[\sum_j C_jH_j(z)\right] dz$$

$$= \int_{-\infty}^{1}\frac{1}{\sqrt{2\pi}}\exp(-z^2/2)\, dz + \int_{-\infty}^{1}(-.0833)h(z)H_3(z)\, dz
+ \int_{-\infty}^{1} .03125\,h(z)H_4(z)\, dz$$

Using the property (1) of the T-H polynomials yields

$$P(Z < 1) = .8413 + .0833\,h(1)H_2(1) - .03125\,h(1)H_3(1)$$
$$= .8413 + .0833\,\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}\right)(0) - .03125\,\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{1}{2}\right)(-2)$$
$$= .8413 + .0151 = .8564$$
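The four-term Gram-Charlier calculation of Example 2.25 can be automated. The sketch below (ours) builds the coefficients from the standardized moments and integrates the truncated series numerically; it should reproduce $P(X < 5) \approx .856$.

```python
import numpy as np

def h(z):
    return np.exp(-z**2 / 2.0) / np.sqrt(2.0 * np.pi)

def H3(z):
    return z**3 - 3.0 * z

def H4(z):
    return z**4 - 6.0 * z**2 + 3.0

# standardized moments of Z = (X - 3)/2 from Example 2.25
m1, m2, m3, m4 = 0.0, 1.0, -0.5, 3.75
C3 = (m3 - 3.0 * m1) / 6.0
C4 = (m4 - 6.0 * m2 + 3.0) / 24.0

# integrate the truncated series from a large negative value up to z = 1
z = np.linspace(-10.0, 1.0, 200_001)
f = h(z) * (1.0 + C3 * H3(z) + C4 * H4(z))
print(np.trapz(f, z))          # approximately 0.856
```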

Equation 2.90 is a series approximation to the pdf of a random variable X
whose moments are known. If we know only the first two moments, then the
series approximation reduces to

$$f_X(x) \approx \frac{1}{\sqrt{2\pi}\,\sigma_X}\exp[-(x - \mu_X)^2/2\sigma_X^2]$$

which says that (if only the first and second moments of a random variable are
known) the Gaussian pdf is used as an approximation to the underlying pdf. As

we add more terms, the higher order terms will force the pdf to take a more
proper shape.
A series of the form given in Equation 2.90 is useful only if it converges
rapidly and the terms can be calculated easily. This is true for the Gram-Charlier
series when the underlying pdf is nearly Gaussian or when the random variable
X is the sum of many independent components. Unfortunately, the Gram-
Charlier series is not uniformly convergent, thus adding more terms does not
guarantee increased accuracy. A rule of thumb suggests four to six terms for
many practical applications.

2.7.6 Approximations of Gaussian Probabilities


The Gaussian pdf plays an important role in probability theory. Unfortunately,
this pdf cannot be integrated in closed form. Several approximations have been
developed for evaluating

$$Q(y) = \int_y^{\infty}\frac{1}{\sqrt{2\pi}}\exp(-x^2/2)\, dx$$

and are given in the Handbook of Mathematical Functions edited by Abramowitz
and Stegun (pages 931-934). For large values of y, (y > 4), an approximation
for Q(y) is

$$Q(y) \approx \frac{1}{y\sqrt{2\pi}}\exp(-y^2/2) \quad (2.91.a)$$

For 0 < y, the following approximation is excellent as measured by $|\epsilon(y)|$, the
magnitude of the error.

$$Q(y) = h(y)(b_1t + b_2t^2 + b_3t^3 + b_4t^4 + b_5t^5) + \epsilon(y) \quad (2.91.b)$$

where

$$t = \frac{1}{1 + py}, \qquad |\epsilon(y)| < 7.5 \times 10^{-8}, \qquad p = .2316419$$

$$b_1 = .319381530 \qquad b_2 = -.356563782 \qquad b_3 = 1.781477937$$
$$b_4 = -1.821255978 \qquad b_5 = 1.330274429$$
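The polynomial approximation 2.91.b is easy to code and is handy when an erfc routine is not available. The sketch below (ours) implements it and checks it against $\tfrac{1}{2}\,\mathrm{erfc}(y/\sqrt{2})$.

```python
from math import erfc, exp, pi, sqrt

P = 0.2316419
B = (0.319381530, -0.356563782, 1.781477937, -1.821255978, 1.330274429)

def q_approx(y):
    """Approximation 2.91.b to Q(y), valid for y >= 0."""
    t = 1.0 / (1.0 + P * y)
    h = exp(-y * y / 2.0) / sqrt(2.0 * pi)
    poly = sum(b * t ** (k + 1) for k, b in enumerate(B))
    return h * poly

for y in (0.5, 1.0, 2.0, 3.0, 4.0):
    print(y, q_approx(y), 0.5 * erfc(y / sqrt(2.0)))   # error below 7.5e-8
```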
2.8 SEQUENCES OF RANDOM VARIABLES AND CONVERGENCE

One of the most important concepts in mathematical analysis is the concept of


convergence and the existence o f a limit. Fundamental operations of calculus
such as differentiation, integration, and summation of infinite series are defined
by means of a limiting process. The same is true in many engineering applications,
for example, the steady state of a dynamic system or the asymptotic trajectory
of a moving object. It is similarly useful to study the convergence o f random
sequences.
With real continuous functions, we use the notation

$$x(t) \to a \ \text{as}\ t \to t_0 \quad \text{or} \quad \lim_{t \to t_0} x(t) = a$$

to denote that x(t) converges to a as t approaches $t_0$ where t is continuous. The
corresponding statement for t a discrete variable is

$$x(t_n) \to a \ \text{as}\ t_n \to t_0 \quad \text{or} \quad \lim_{t_n \to t_0} x(t_n) = a$$

for any discrete sequence such that

$$t_n \to t_0 \ \text{as}\ n \to \infty$$

With this remark in mind, let us proceed to investigate the convergence of
sequences of random variables, or random sequences. A random sequence is
denoted by $X_1, X_2, \ldots, X_n, \ldots$. For a specific outcome, $\lambda$, $X_n(\lambda) = x_n$ is a
sequence of numbers that might or might not converge. The concept of convergence
of a random sequence may be concerned with the convergence of
individual sequences, $X_n(\lambda) = x_n$, or the convergence of the probabilities of
some sequence of events determined by the entire ensemble of sequences, or
both. Several definitions and criteria are used for determining the convergence
of random sequences, and we present four of these criteria.

2.8.1 Convergence Everywhere and Almost Everywhere


For every outcome $\lambda$, we have a sequence of numbers

$$X_1(\lambda), X_2(\lambda), \ldots, X_n(\lambda), \ldots$$

and hence the random sequence $X_1, X_2, \ldots, X_n$ represents a family of sequences.
If each member of the family converges to a limit, that is, $X_1(\lambda), X_2(\lambda), \ldots$,
converges for every $\lambda \in S$, then we say that the random sequence converges
everywhere. The limit of each sequence can depend upon $\lambda$, and if we denote
the limit by X, then X is a random variable.
Now, there may be cases where the sequence does not converge for every
outcome. In such cases if the set of outcomes for which the limit exists has a
probability of 1, that is, if

$$P\{\lambda : \lim_{n \to \infty} X_n(\lambda) = X(\lambda)\} = 1$$

then we say that the sequence converges almost everywhere or almost surely.
This is written as

$$P\{X_n \to X\} = 1 \quad \text{as } n \to \infty \quad (2.92)$$

2.8.2 Convergence in Distribution and Central Limit Theorem


Let $F_n(x)$ and $F(x)$ denote the distribution functions of $X_n$ and X, respectively.
If

$$F_n(x) \to F(x) \quad \text{as } n \to \infty \quad (2.93)$$

for all x at which F(x) is continuous, then we say that the sequence $X_n$ converges
in distribution to X.

Central Limit Theorem. Let $X_1, X_2, \ldots, X_n$ be a sequence of independent,
identically distributed random variables, each with mean $\mu$ and variance $\sigma^2$. Let

$$Z_n = \sum_{i=1}^{n}(X_i - \mu)/\sqrt{n\sigma^2}$$

Then $Z_n$ has a limiting (as $n \to \infty$) distribution that is Gaussian with mean 0 and
variance 1.
The central limit theorem can be proved as follows. Suppose we assume that
the moment-generating function M(t) of $X_k$ exists for $|t| < h$. Then the function
m(t)

$$m(t) = E\{\exp[t(X_k - \mu)]\} = \exp(-\mu t)M(t)$$

exists for $-h < t < h$. Furthermore, since $X_k$ has a finite mean and variance,
the first two derivatives of M(t) and hence the derivatives of m(t) exist at t =
0. We can use Taylor's formula and expand m(t) as

$$m(t) = m(0) + m'(0)t + \frac{m''(\xi)t^2}{2}, \quad 0 < \xi < t$$
$$= 1 + \frac{\sigma^2t^2}{2} + \frac{[m''(\xi) - \sigma^2]t^2}{2}$$

Next consider

$$M_n(\tau) = E\{\exp(\tau Z_n)\}
= E\left\{\exp\left[\tau\frac{X_1 - \mu}{\sigma\sqrt{n}}\right]\right\}\cdots E\left\{\exp\left[\tau\frac{X_n - \mu}{\sigma\sqrt{n}}\right]\right\}
= \left[m\!\left(\frac{\tau}{\sigma\sqrt{n}}\right)\right]^n, \quad -h < \frac{\tau}{\sigma\sqrt{n}} < h$$

In m(t), replace t by $\tau/(\sigma\sqrt{n})$ to obtain

$$m\!\left(\frac{\tau}{\sigma\sqrt{n}}\right) = 1 + \frac{\tau^2}{2n} + \frac{[m''(\xi) - \sigma^2]\tau^2}{2n\sigma^2}$$

where now $\xi$ is between 0 and $\tau/(\sigma\sqrt{n})$. Accordingly,

$$M_n(\tau) = \left\{1 + \frac{\tau^2}{2n} + \frac{[m''(\xi) - \sigma^2]\tau^2}{2n\sigma^2}\right\}^n, \quad 0 < \xi < \frac{\tau}{\sigma\sqrt{n}}$$

Since m''(t) is continuous at t = 0 and since $\xi \to 0$ as $n \to \infty$, we have

$$\lim_{n \to \infty}[m''(\xi) - \sigma^2] = 0$$

and

$$\lim_{n \to \infty} M_n(\tau) = \lim_{n \to \infty}\left\{1 + \frac{\tau^2}{2n}\right\}^n = \exp(\tau^2/2) \quad (2.94)$$
(The last step follows from the familiar formula of calculus $\lim_{n \to \infty}[1 + a/n]^n = e^a$.)
Since $\exp(\tau^2/2)$ is the moment-generating function of a Gaussian random
variable with 0 mean and variance 1, and since the moment-generating function
uniquely determines the underlying pdf at all points of continuity, Equation
2.94 shows that $Z_n$ converges to a Gaussian distribution with 0 mean and variance 1.
In many engineering applications, the central limit theorem and hence the
Gaussian pdf play an important role. For example, the output of a linear system
is a weighted sum of the input values, and if the input is a sequence o f random
variables, then the output can be approximated by a Gaussian distribution.
Another example is the total noise in a radio link that can be modeled as the
sum o f the contributions from a large number of independent sources. The
central limit theorem permits us to model the total noise by a Gaussian distri­
bution.
We had assumed that the $X_i$'s are independent and identically distributed and
that the moment-generating function exists in order to prove the central limit
theorem. The theorem, however, holds under a variety of weaker conditions
(Reference [6]):

1. The random variables $X_1, X_2, \ldots$, in the original sequence are independent
with the same mean and variance but not identically distributed.
2. $X_1, X_2, \ldots$, are independent with different means, same variance, and
not identically distributed.
3. Assume $X_1, X_2, X_3, \ldots$ are independent and have variances $\sigma_1^2, \sigma_2^2, \sigma_3^2, \ldots$.
If there exist positive constants $\epsilon$ and $\zeta$ such that $\epsilon < \sigma_i^2 < \zeta$
for all i, then the distribution of the standardized sum converges to the
standard Gaussian; this says in particular that the variances must exist
and be neither too large nor too small.

The assumption of finite variances, however, is essential for the central limit
theorem to hold.

Finite Sums. The central limit theorem states that an infinite sum, Y, has a
normal distribution. For a finite sum of independent random variables, that is,

$$Y = \sum_{i=1}^{n} X_i$$

then

$$f_Y = f_{X_1} * f_{X_2} * \cdots * f_{X_n}$$

$$\Psi_Y(\omega) = \prod_{i=1}^{n}\Psi_{X_i}(\omega)$$

and

$$C_Y(\omega) = \sum_{i=1}^{n} C_{X_i}(\omega)$$

where $\Psi$ is the characteristic function and C is the cumulant-generating function.
Also, if $K_j$ is the jth cumulant, where $K_j$ is the coefficient of $(j\omega)^j/j!$ in a power
series expansion of C, then it follows that

$$K_{j,Y} = \sum_{i=1}^{n} K_{j,X_i}$$

and in particular the first cumulant is the mean, thus

$$\mu_Y = \sum_{i=1}^{n}\mu_{X_i}$$

and the second cumulant is the variance

$$\sigma_Y^2 = \sum_{i=1}^{n}\sigma_{X_i}^2$$

and the third cumulant, $K_{3,X}$, is $E\{(X - \mu_X)^3\}$, thus

$$E\{(Y - \mu_Y)^3\} = \sum_{i=1}^{n} E\{(X_i - \mu_{X_i})^3\}$$

and $K_{4,X}$ is $E\{(X - \mu_X)^4\} - 3K_{2,X}^2$, thus

$$K_{4,Y} = \sum_{i=1}^{n} K_{4,X_i} = \sum_{i=1}^{n}\left(E\{(X_i - \mu_{X_i})^4\} - 3K_{2,X_i}^2\right)$$

For finite sums the normal distribution is often rapidly approached; thus a
Gaussian approximation or a Gram-Charlier approximation is often appropriate.
The following example illustrates the rapid approach to a normal distribution.
Figure 2.17 Density and approximation for Example 2.26.

EXAMPLE 2.26.

Find the resistance of a circuit consisting of five independent resistances in series.


All resistances are assumed to have a uniform density function between 1.95
and 2.05 ohms (2 ohms ± 2.5% ). Find the resistance of the series combination
and compare it with the normal approximation.

SOLUTION: The exact density is found by four convolutions of uniform density
functions. The mean value of each resistance is 2 and the standard deviation is
$(20\sqrt{3})^{-1}$. The exact density function of the resistance of the series circuit is
plotted in Figure 2.17 along with the normal density function, which has the
same mean (10) and the same variance (1/240). Note the close correspondence.
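The rapid approach to normality in Example 2.26 is easy to see by simulation. The sketch below (ours) sums five uniform resistances and compares the sample mean, variance, and tail quantiles with those of the normal approximation N(10, 1/240).

```python
import numpy as np
from math import sqrt

rng = np.random.default_rng(4)
r = rng.uniform(1.95, 2.05, size=(500_000, 5)).sum(axis=1)   # series resistance

print(r.mean(), r.var())            # compare with mean 10 and variance 1/240
print(1.0 / 240.0)

# compare empirical and Gaussian 1% / 99% quantiles
sigma = sqrt(1.0 / 240.0)
print(np.quantile(r, [0.01, 0.99]))
print(10.0 - 2.326 * sigma, 10.0 + 2.326 * sigma)   # 2.326 = z_{0.99}
```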

2.8.3 Convergence in Probability (in Measure) and the Law of Large


Numbers
The probability $P\{|X - X_n| > \epsilon\}$ of the event $\{|X - X_n| > \epsilon\}$ is a sequence of
numbers depending on n and $\epsilon$. If this sequence tends to zero as $n \to \infty$, that
is, if

$$P\{|X - X_n| > \epsilon\} \to 0 \quad \text{as } n \to \infty$$

for any $\epsilon > 0$, then we say that $X_n$ converges to the random variable X in
probability. This is also called stochastic convergence. An important application
of convergence in probability is the law of large numbers.

Law of Large Numbers. Assume that $X_1, X_2, \ldots, X_n$ is a sequence of independent
random variables each with mean $\mu$ and variance $\sigma^2$. Then, if we
define

$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \quad (2.95.a)$$

$$\lim_{n \to \infty} P\{|\bar{X}_n - \mu| > \epsilon\} = 0 \quad \text{for each } \epsilon > 0 \quad (2.95.b)$$

The law of large numbers can be proved directly by using Tchebycheff's inequality.

2.8.4 Convergence in Mean Square

A sequence $X_n$ is said to converge in mean square if there exists a random
variable X (possibly a constant) such that

$$E[(X_n - X)^2] \to 0 \quad \text{as } n \to \infty \quad (2.96)$$

If Equation 2.96 holds, then the random variable X is called the mean square
limit of the sequence $X_n$ and we use the notation

$$\text{l.i.m. } X_n = X$$

where l.i.m. is meant to suggest the phrase limit in mean (square) to distinguish
it from the symbol lim for the ordinary limit of a sequence of numbers.
Although the verification of some modes of convergence is difficult to establish,
the Cauchy criterion can be used to establish conditions for mean-square
convergence. For deterministic sequences the Cauchy criterion establishes convergence
of $x_n$ to x without actually requiring the value of the limit, that is, x.
In the deterministic case, $x_n \to x$ if

$$|x_{n+m} - x_n| \to 0 \quad \text{as } n \to \infty \quad \text{for any } m > 0$$

SUM MARY 95

Figure 2.18 Relationship between various modes of convergence.

For random sequences the following version of the Cauchy criterion applies.

E { ( X n - X ) 2} 0 as n —» cc

if and only if

E{\Xn+m — X„\2} —> 0 as for any m > 0 (2.97)

2.8.5 Relationship between Different Forms of Convergence


The relationship between various modes of convergence is shown in Figure 2.18.
If a sequence converges in MS sense, then it follows from the application of
Tchebycheff’ s inequality that the sequence also converges in probability. It can
also be shown that almost everywhere convergence implies convergence in prob­
ability, which in turn implies convergence in distribution.

2.9 SUMMARY
The reviews of probability, random variables, distribution function, probabil­
ity mass function (for discrete random variables), and probability density
functions (for continuous random variables) were brief, as was the review of
expected value. Four particularly useful expected values were briefly discussed:
the characteristic function $E\{\exp(j\omega X)\}$; the moment generating function
$E\{\exp(tX)\}$; the cumulant generating function $\ln E\{\exp(tX)\}$; and the
probability generating function $E\{z^X\}$ (non-negative integer-valued random
variables).

The review o f random vectors, that is, vector random variables, extended the
ideas of marginal, joint, and conditional density function to n dimensions,
and vector notation was introduced. Multivariate normal random variables
were emphasized.

Transformations o f random variables were reviewed. The special cases o f a


function o f one random variable and a sum (or more generally an affine
transformation) of random variables were considered. Order statistics were
considered as a special transformation. The difficulty o f a general nonlinear
transformations was illustrated by an example, and the Monte Carlo tech­
nique was introduced.

We reviewed the following bounds: the Tchebycheff inequality, the Chernoff


bound, and the union bound. We also discussed the Gram-Charlier series ap­
proximation to a density function using moments. Approximating the distribu­
tion of $Y = g(X_1, \ldots, X_n)$ using a linear approximation with the first two
moments was also reviewed. Numerical approximations to the Gaussian distri­
bution function were suggested.

Limit concepts for sequences o f random variables were introduced. Conver­


gence almost everywhere, in distribution, in probability and in mean square
were defined. The central limit theorem and the law o f large numbers were
introduced. Finite sum convergence was also discussed.

These concepts will prove to be essential in our study o f random signals.

2.10 REFERENCES

The material presented in this chapter was intended as a review of probability and random
variables. For additional details, the reader may refer to one of the following books.
Reference [2], particularly Vol. 1, has become a classic text for courses in probability
theory. References [8] and the first edition of [7] are widely used for courses in applied
probability taught by electrical engineering departments. References [1], [3], and [10]
also provide an introduction to probability from an electrical engineering perspective.
Reference [4] is a widely used text for statistics and the first five chapters are an excellent
introduction to probability. Reference [5] contains an excellent treatment of series ap­
proximations and cumulants. Reference [6] is written at a slightly higher level and presents
the theory of many useful applications. Reference [9] describes a theory of probable
reasoning that is based on a set of axioms that differs from those used in probability.
[1] A. M. Breipohl, Probabilistic Systems Analysis, John Wiley & Sons, New York,
1970.
[2] W. Feller, An Introduction to Probability Theory and Applications, Vols. I, II,
John Wiley & Sons, New York, 1957, 1967.
[3] C. H. Helstrom, Probability and Stochastic Processes for Engineers, Macmillan,
New York, 1977.
[4] R. V. Hogg and A. T. Craig, Introduction to Mathematical Statistics, Macmillan,
New York, 1978.

[5] M. Kendall and A. Stuart, The Advanced Theory o f Statistics, Vol. 1, 4th ed.,
Macmillan, New York, 1977.
[6] H. L. Larson and B. O. Shubert, Probabilistic Models in Engineering Sciences,
Vol. I, John Wiley & Sons, New York, 1979.
[7] A. Papoulis, Probability, Random Variables and Stochastic Processes, McGraw-
Hill, New York, 1984.
[8] P. Z. Peebles, Jr., Probability, Random Variables, and Random Signal Principles,
2nd ed., McGraw-Hill, New York, 1987.
[9] G. Shafer, A Mathematical Theory o f Evidence, Princeton University Press, Prince­
ton, N.J., 1976.
[10] J. B. Thomas, An Introduction to Applied Probability and Random Processes, John
Wiley & Sons, New York, 1971.

2.11 PROBLEMS
2.1 Suppose we draw four cards from an ordinary deck o f cards. Let
$A_1$: an ace on the first draw

$A_2$: an ace on the second draw

$A_3$: an ace on the third draw

$A_4$: an ace on the fourth draw.

a. Find $P(A_1 \cap A_2 \cap A_3 \cap A_4)$ assuming that the cards are drawn
with replacement (i.e., each card is replaced and the deck is reshuffled
after a card is drawn and observed).

b. Find $P(A_1 \cap A_2 \cap A_3 \cap A_4)$ assuming that the cards are drawn
without replacement.

2.2 A random experiment consists of tossing a die and observing the number
of dots showing up. Let
A_1: number of dots showing up = 3
A_2: even number of dots showing up
A_3: odd number of dots showing up

a. Find P(A_1) and P(A_1 ∩ A_3).
b. Find P(A_2 ∪ A_3), P(A_2 ∩ A_3), and P(A_1|A_3).
c. Are A_2 and A_3 disjoint?
d. Are A_2 and A_3 independent?

2.3 A box contains three 100-ohm resistors labeled R_1, R_2, and R_3 and two
1000-ohm resistors labeled R_4 and R_5. Two resistors are drawn from this
box without replacement.

a. List all the outcomes of this random experiment. [A typical outcome
may be listed as (R_3, R_5) to represent that R_3 was drawn first followed by
R_5.]

b. Find the probability that both resistors are 100-ohm resistors.


c. Find the probability of drawing one 100-ohm resistor and one 1000-
ohm resistor.

d. Find the probability of drawing a 100-ohm resistor on the first draw


and a 1000-ohm resistor on the second draw.

Work parts (b), (c), and (d) by counting the outcomes that belong to the
appropriate events.
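
Parts (b) through (d) of Problem 2.3 can be checked by direct enumeration of the
20 ordered outcomes. A minimal Python sketch of that counting argument (the
resistor labels follow the problem statement) is:

    from itertools import permutations

    # Resistor labels and their values (ohms), as given in Problem 2.3
    values = {'R1': 100, 'R2': 100, 'R3': 100, 'R4': 1000, 'R5': 1000}

    # All ordered outcomes of drawing two resistors without replacement
    outcomes = list(permutations(values, 2))          # 5 * 4 = 20 outcomes

    both_100  = [o for o in outcomes if values[o[0]] == 100 and values[o[1]] == 100]
    one_each  = [o for o in outcomes if {values[o[0]], values[o[1]]} == {100, 1000}]
    first_100 = [o for o in outcomes if values[o[0]] == 100 and values[o[1]] == 1000]

    print(len(both_100) / len(outcomes))   # part (b)
    print(len(one_each) / len(outcomes))   # part (c)
    print(len(first_100) / len(outcomes))  # part (d)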

2.4 With reference to the random experiment described in Problem 2.3, define
the following events.

A_1: 100-ohm resistor on the first draw
A_2: 1000-ohm resistor on the first draw
B_1: 100-ohm resistor on the second draw
B_2: 1000-ohm resistor on the second draw

a. Find P(A_1 B_1), P(A_2 B_1), and P(A_2 B_2).
b. Find P(A_1), P(A_2), P(B_1|A_1), and P(B_1|A_2). Verify that

P(B_1) = P(B_1|A_1)P(A_1) + P(B_1|A_2)P(A_2).

2.5 Show that:

a. P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(AB) − P(BC)
− P(CA) + P(ABC).
b. P(A|B) = P(A) implies P(B|A) = P(B).
c. P(ABC) = P(A)P(B|A)P(C|AB).

Figure 2.19 Circuit diagram for Problem 2.8.

2.6 A_1, A_2, A_3 are three mutually exclusive and exhaustive sets of events
associated with a random experiment E_1. Events B_1, B_2, and B_3 are mutually
exclusive and exhaustive sets of events associated with a random experiment
E_2. The joint probabilities of occurrence of these events and some marginal
probabilities are listed in the table:

           B_1      B_2      B_3
A_1        3/36      *       5/36
A_2        5/36     4/36     5/36
A_3         *       6/36      *
P(B_j)    12/36    14/36      *

a. Find the missing probabilities (*) in the table.
b. Find P(B_3|A_1) and P(A_1|B_3).
c. Are events A_1 and B_1 statistically independent?

2.7 There are two bags containing mixtures of blue and red marbles. The first
bag contains 7 red marbles and 3 blue marbles. The second bag contains 4
red marbles and 5 blue marbles. One marble is drawn from bag one and
transferred to bag two. Then a marble is taken out of bag two. Given that
the marble drawn from the second bag is red, find the probability that the
color of the marble transferred from the first bag to the second bag was
blue.

2.8 In the diagram shown in Figure 2.19, each switch is in the closed state with
probability p and in the open state with probability 1 − p. Assuming that
the state of one switch is independent of the state of another switch, find
the probability that a closed path can be maintained between A and B.
(Note: There are many closed paths between A and B.)

2.9 The probability that a student passes a certain exam is .9, given that he
studied. The probability that he passes the exam without studying is .2.
Assume that the probability that the student studies for an exam is .75 (a
somewhat lazy student). Given that the student passed the exam, what is
the probability that he studied?

2.10 A fair coin is tossed four times and the faces showing up are observed.
a. List all the outcomes of this random experiment.
b. If X is the number of heads in each of the outcomes of this
experiment, find the probability mass function of X.


2.11 Two dice are tossed. Let X be the sum of the numbers showing up. Find
the probability mass function of X.

2.12 A random experiment can terminate in one of three events A, B, or C
with probabilities 1/2, 1/4, and 1/4, respectively. The experiment is
repeated three times. Find the probability that events A, B, and C each
occur exactly one time.
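
The probability asked for in Problem 2.12 is a multinomial probability,
3!(1/2)(1/4)(1/4). A one-line numerical check, assuming SciPy is available, is:

    from scipy.stats import multinomial

    # Three repetitions; A, B, C occur exactly once each
    p = multinomial.pmf([1, 1, 1], n=3, p=[1/2, 1/4, 1/4])
    print(p)   # 3! * (1/2)*(1/4)*(1/4) = 3/16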

2.13 Show that the mean and variance of a binomial random variable X are
μ_X = np and σ_X^2 = npq, where q = 1 − p.

2.14 Show that the mean and variance of a Poisson random variable are μ_X = λ
and σ_X^2 = λ.

2.15 The probability mass function of a geometric random variable has the form
P(X = k) = p q^(k−1),  k = 1, 2, 3, . . . ;  p, q > 0,  p + q = 1.
a. Find the mean and variance of X.
b. Find the probability-generating function of X.

2.16 Suppose that you are trying to market a digital transmission system (modem)
that has a bit error probability of 10^-4 and the bit errors are independent.
The buyer will test your modem by sending a known message of
10^4 digits and checking the received message. If more than two errors
occur, your modem will be rejected. Find the probability that the customer
will buy your modem.
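
The acceptance event in Problem 2.16 is "two or fewer bit errors in 10^4 independent
bits," which is binomial (and well approximated by a Poisson law with λ = np = 1).
A minimal numerical check, assuming SciPy is available, is:

    from scipy.stats import binom, poisson

    n, p = 10**4, 1e-4
    # P(accept) = P(at most two bit errors in n bits)
    print(binom.cdf(2, n, p))        # exact binomial
    print(poisson.cdf(2, n * p))     # Poisson approximation, lambda = n*p = 1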

2.17 The input to a communication channel is a random variable X and the
output is another random variable Y. The joint probability mass function
of X and Y is listed below:

               X = −1    X = 0    X = 1
    Y = −1      1/4       1/8       0
    Y =  0       0        1/4       0
    Y =  1      1/8        0       1/4

a. Find P(Y = 1|X = 1).
b. Find P(X = 1|Y = 1).
c. Find ρ_XY.

2.18 Show that the expected value operator has the following properties:
a. E{a + bX} = a + bE{X}
b. E{aX + bY} = aE{X} + bE{Y}
c. Var[aX + bY] = a^2 Var[X] + b^2 Var[Y]
+ 2ab Covar[X, Y]

2.19 Show that E_{X,Y}{g(X, Y)} = E_X{E_{Y|X}[g(X, Y)]}, where the subscripts
denote the distributions with respect to which the expected values are
computed.

2.20 A thief has been placed in a prison that has three doors. One of the doors
leads him on a one-day trip, after which he is dumped on his head (which
destroys his memory as to which door he chose). Another door is similar
except that he takes a three-day trip before being dumped on his head. The
third door leads to freedom. Assume that he chooses a door immediately,
each with probability 1/3, whenever he has a chance. Find his expected
number of days to freedom. (Hint: Use conditional expectation.)

2.21 Consider the circuit shown in Figure 2.20. Let the time at which the ith
switch closes be denoted by X_i. Suppose X_1, X_2, X_3, X_4 are independent,
identically distributed random variables, each with distribution function F.
As time increases, switches will close until there is an electrical path from
A to C. Let
U = time when the circuit is first completed from A to B
V = time when the circuit is first completed from B to C
W = time when the circuit is first completed from A to C
Find the following:
a. The distribution function of U.
b. The distribution function of W.
c. If F(x) = x, 0 ≤ x ≤ 1 (i.e., uniform), what are the mean and
variance of X_i, U, and W?
Figure 2.20 Circuit diagram for Problem 2.21.

2.22 Prove the following inequalities:

a. (E{XY})^2 ≤ E{X^2}E{Y^2} (Schwartz or cosine inequality)
b. √(E{(X + Y)^2}) ≤ √(E{X^2}) + √(E{Y^2}) (triangle inequality)

2.23 Show that the mean and variance of a random variable X having a uniform
distribution in the interval [a, b] are μ_X = (a + b)/2 and σ_X^2 = (b − a)^2/12.

2.24 X is a Gaussian random variable with μ_X = 2 and σ_X^2 = 9. Find
P(−4 < X ≤ 5) using tabulated values of Q(·).
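
Writing the probability in terms of Q(z) = 1 − Φ(z), P(−4 < X ≤ 5) =
Q((−4 − μ_X)/σ_X) − Q((5 − μ_X)/σ_X). A short numerical check with SciPy's
standard normal routines (in place of the tables) is sketched below:

    from scipy.stats import norm

    mu, sigma = 2.0, 3.0                     # sigma_X^2 = 9

    def Q(z):
        # Gaussian tail probability Q(z) = 1 - Phi(z)
        return norm.sf(z)

    print(Q((-4 - mu) / sigma) - Q((5 - mu) / sigma))
    # Equivalent direct form:
    print(norm.cdf(5, mu, sigma) - norm.cdf(-4, mu, sigma))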

2.25 X is a zero mean Gaussian random variable with a variance of σ_X^2. Show
that

E{X^n} = σ_X^n [1 · 3 · 5 · · · (n − 1)],   n even
       = 0,                                 n odd

2.26 Show that the characteristic function of a random variable can be expanded
as

Ψ_X(ω) = Σ_{k=0}^{∞} [(jω)^k / k!] E{X^k}

(Note: The series must be terminated by a remainder term just before the
first infinite moment, if any exist.)

2.27 a. Show that the characteristic function of the sum of two independent
random variables is equal to the product of the characteristic functions of
the two variables.
b. Show that the cumulant generating function of the sum of two
independent random variables is equal to the sum of the cumulant
generating functions of the two variables.
c. Show that Equations 2.52.c through 2.52.f are correct by equating
coefficients of like powers of jω in Equation 2.52.b.

2.28 The probability density function of a Cauchy random variable is given by

f_X(x) = a / [π(x^2 + a^2)],   a > 0,  −∞ < x < ∞

a. Find the characteristic function of X.
b. Comment about the first two moments of X.

2.29 The joint pdf of random variables X and Y is

f_{X,Y}(x, y) = 1/2,   0 < x < y,  0 < y < 2

a. Find the marginal pdfs, f_X(x) and f_Y(y).
b. Find the conditional pdfs f_{X|Y}(x|y) and f_{Y|X}(y|x).
c. Find E{X|Y = 1} and E{X|Y = 0.5}.
d. Are X and Y statistically independent?
e. Find ρ_XY.

2.30 The joint pdf of two random variables is

f_{X1,X2}(x_1, x_2) = 1,   0 ≤ x_1 ≤ 1,  0 ≤ x_2 ≤ 1

Let Y_1 = X_1 X_2 and Y_2 = X_1.
a. Find the joint pdf f_{Y1,Y2}(y_1, y_2); clearly indicate the domain of
y_1, y_2.
b. Find f_{Y1}(y_1) and f_{Y2}(y_2).
c. Are Y_1 and Y_2 independent?

2.31 X and Y have a bivariate Gaussian pdf given in Equation 2.57.
a. Show that the marginals are Gaussian pdfs.
b. Find the conditional pdf f_{X|Y}(x|y). Show that this conditional pdf
has a mean

E{X|Y = y} = μ_X + ρ (σ_X/σ_Y)(y − μ_Y)

and a variance

σ_X^2 (1 − ρ^2)

2.32 Let Z = X + Y − c, where X and Y are independent random variables
with variances σ_X^2 and σ_Y^2, and c is a constant. Find the variance of Z
in terms of σ_X, σ_Y, and c.

2.33 X and Y are independent zero mean Gaussian random variables with
variances σ_X^2 and σ_Y^2. Let

Z = (1/2)(X + Y)  and  W = (1/2)(X − Y)

a. Find the joint pdf f_{Z,W}(z, w).
b. Find the marginal pdf f_Z(z).
c. Are Z and W independent?

2.34 X_1, X_2, . . . , X_n are n independent zero mean Gaussian random variables
with equal variances, σ_{X_i}^2 = σ^2. Show that

Z = (1/n)[X_1 + X_2 + · · · + X_n]

is a Gaussian random variable with μ_Z = 0 and σ_Z^2 = σ^2/n. (Use the result
derived in Problem 2.32.)

2.35 X is a Gaussian random variable with mean 0 and variance σ_X^2. Find the
pdf of Y if:
a. Y = X^2
b. Y = |X|
c. Y = (1/2)[X + |X|]
d. Y =  1    if X > σ_X
        X    if |X| ≤ σ_X
       −1    if X < −σ_X

2.36 X is a zero-mean Gaussian random variable with a variance σ_X^2. Let Y =
aX^2.
a. Find the characteristic function of Y, that is, find

Ψ_Y(ω) = E{exp(jωY)} = E{exp(jωaX^2)}

b. Find f_Y(y) by inverting Ψ_Y(ω).

2.37 X_1 and X_2 are two identically distributed independent Gaussian random
variables with zero mean and variance σ^2. Let

R = √(X_1^2 + X_2^2)
and
Θ = tan^{-1}[X_2/X_1]

a. Find f_{R,Θ}(r, θ).
b. Find f_R(r) and f_Θ(θ).
c. Are R and Θ statistically independent?

2.38 X_1 and X_2 are two independent random variables with uniform pdfs in the
interval [0, 1]. Let

Y_1 = X_1 + X_2  and  Y_2 = X_1 − X_2

a. Find the joint pdf f_{Y1,Y2}(y_1, y_2) and clearly identify the domain
where this joint pdf is nonzero.
b. Find ρ_{Y1Y2} and E{Y_1|Y_2 = 0.5}.

2.39 X_1 and X_2 are two independent random variables each with the following
density function:

f_{X_i}(x) = e^{−x},  x > 0
           = 0,       x < 0

Let Y_1 = X_1 + X_2 and Y_2 = X_1/(X_1 + X_2)

a. Find f_{Y1,Y2}(y_1, y_2).
b. Find f_{Y1}(y_1), f_{Y2}(y_2) and show that Y_1 and Y_2 are independent.

2.40 X_1, X_2, X_3, . . . , X_n are n independent Gaussian random variables with
zero means and unit variances. Let

Y = Σ_{i=1}^{n} X_i^2

Find the pdf of Y.

2.41 X is uniformly distributed in the interval [−π, π]. Find the pdf of
Y = a sin(X).

2.42 X is multivariate Gaussian with

μ_X = [ 6, 0, 8 ]^T

Σ_X = [ 1/2  1/4  1/3
        1/4   1   2/3
        1/3  2/3   1  ]

Find the mean vector and the covariance matrix of Y = [Y_1, Y_2, Y_3]^T,
where
Y_1 = X_1 − X_2
Y_2 = X_1 + X_2 − 2X_3
Y_3 = X_1 + X_3

2.43 X is a four-variate Gaussian with

μ_X = [ 0, 0, 0, 0 ]^T

Σ_X = [ 4  3  2  1
        3  4  3  2
        2  3  4  3
        1  2  3  4 ]

Find E{X_1|X_2 = 0.5, X_3 = 1.0, X_4 = 2.0} and the variance of X_1 given
X_2 = X_3 = X_4 = 0.
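
Problem 2.43 can be checked numerically with the standard partitioned-covariance
formulas for jointly Gaussian, zero-mean variables: E{X_1 | X_b = x_b} =
Σ_{1b} Σ_{bb}^{-1} x_b and Var{X_1 | X_b} = Σ_{11} − Σ_{1b} Σ_{bb}^{-1} Σ_{b1},
where the subscript b collects X_2, X_3, X_4. A NumPy sketch (the variable names
are only illustrative) follows:

    import numpy as np

    Sigma = np.array([[4., 3., 2., 1.],
                      [3., 4., 3., 2.],
                      [2., 3., 4., 3.],
                      [1., 2., 3., 4.]])

    S11 = Sigma[0, 0]          # Var(X1)
    S1b = Sigma[0, 1:]         # Cov(X1, [X2, X3, X4])
    Sbb = Sigma[1:, 1:]        # covariance of [X2, X3, X4]

    xb = np.array([0.5, 1.0, 2.0])
    cond_mean = S1b @ np.linalg.solve(Sbb, xb)
    cond_var  = S11 - S1b @ np.linalg.solve(Sbb, S1b)   # does not depend on xb

    print(cond_mean, cond_var)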

2.44 Show that a necessary condition for Σ_X to be a covariance matrix is that

y^T Σ_X y ≥ 0

for all vectors y = [y_1, y_2, . . . , y_n]^T.
(This is the condition for positive semidefiniteness of a matrix.)

2.45 Consider the following 3 × 3 matrices:

A = [ 10  3  1        B = [ 10  5  2        C = [ 10  5  2
       2  5  0               5  3  1               5  3  3
       1  0  2 ]             2  1  2 ]             2  3  2 ]

Which of the three matrices can be covariance matrices?
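
A valid covariance matrix must be symmetric and positive semidefinite, so each
candidate can be screened numerically; a NumPy sketch (using the matrices as
printed above) is:

    import numpy as np

    A = np.array([[10, 3, 1], [2, 5, 0], [1, 0, 2]], dtype=float)
    B = np.array([[10, 5, 2], [5, 3, 1], [2, 1, 2]], dtype=float)
    C = np.array([[10, 5, 2], [5, 3, 3], [2, 3, 2]], dtype=float)

    for name, M in (('A', A), ('B', B), ('C', C)):
        symmetric = np.allclose(M, M.T)
        # eigvalsh assumes a symmetric matrix, so test symmetry first
        psd = symmetric and bool(np.all(np.linalg.eigvalsh(M) >= -1e-12))
        print(name, 'symmetric:', symmetric, 'positive semidefinite:', psd)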

2.46 Suppose X is an n-variate Gaussian with zero means and a covariance
matrix Σ_X. Let λ_1, λ_2, . . . , λ_n be n distinct eigenvalues of Σ_X and let V_1,
V_2, . . . , V_n be the corresponding normalized eigenvectors. Show that

Y = AX

where

A = [V_1, V_2, V_3, . . . , V_n]^T   (n × n)

has an n-variate Gaussian density with zero means and

Σ_Y = diag(λ_1, λ_2, . . . , λ_n)

2.47 X is bivariate Gaussian with

μ_X = [ 0, 0 ]^T    and    Σ_X = [ 3  1
                                   1  3 ]

a. Find the eigenvalues and eigenvectors of Σ_X.
b. Find the transformation Y = [Y_1, Y_2]^T = AX such that the
components of Y are uncorrelated.
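
Part (b) of Problem 2.47 is the eigenvector transformation of Problem 2.46; a NumPy
sketch of the computation is:

    import numpy as np

    Sigma_X = np.array([[3., 1.],
                        [1., 3.]])

    eigvals, eigvecs = np.linalg.eigh(Sigma_X)   # columns of eigvecs are normalized eigenvectors
    A = eigvecs.T                                # rows of A are the eigenvectors

    Sigma_Y = A @ Sigma_X @ A.T                  # covariance of Y = A X
    print(eigvals)       # eigenvalues of Sigma_X (here 2 and 4)
    print(Sigma_Y)       # approximately diagonal -- components of Y are uncorrelated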

2.48 If U(x) ≥ 0 for all x and U(x) ≥ a > 0 for all x ∈ ℰ, where ℰ is some
interval, show that

P[U(X) ≥ a] ≤ (1/a) E{U(X)}

2.49 Plot the Tchebycheff and Chernoff bounds as well as the exact values for
P(X > a), a > 0, if X is

a. Uniform in the interval [0, 1].
b. Exponential, f_X(x) = exp(−x), x > 0.
c. Gaussian with zero mean and unit variance.
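
For the exponential case in part (b), the three quantities can be tabulated directly.
One common way to apply the Tchebycheff inequality to the one-sided event
{X > a} is through P(|X − μ_X| ≥ a − μ_X) ≤ σ_X^2/(a − μ_X)^2 for a > μ_X, and the
Chernoff bound is the minimum over 0 < s < 1 of e^{−sa} E{e^{sX}}, where
E{e^{sX}} = 1/(1 − s) for the unit-mean exponential. A Python sketch along these
lines (the values of a are illustrative) is:

    import numpy as np

    # Exponential with f_X(x) = exp(-x), x > 0:  mean = 1, variance = 1
    for a in (2.0, 3.0, 4.0, 5.0):
        exact       = np.exp(-a)                 # P(X > a)
        tchebycheff = 1.0 / (a - 1.0)**2         # valid for a > 1
        s_opt       = 1.0 - 1.0 / a              # minimizes exp(-s*a)/(1 - s) on 0 < s < 1
        chernoff    = np.exp(-s_opt * a) / (1.0 - s_opt)
        print(a, exact, tchebycheff, chernoff)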
PROBLEMS 107

2.50 Compare the Tchebycheff and Chernoff bounds on P(Y > a) with exact
values for the Laplacian pdf

f_Y(y) = (1/2) exp(−|y|)

2.51 In a communication system, the received signal Y has the form

Y = X + N

where X is the "signal" component and N is the noise. X can have one
of eight values shown in Figure 2.21, and N has an uncorrelated bivariate
Gaussian distribution with zero means and variances of 9. The signal X
and the noise N can be assumed to be independent.
The receiver observes Y and determines an estimated value X̂ of X
according to the algorithm

if Y ∈ A_i then X̂ = x_i

The decision regions A_i for i = 1, 2, 3, . . . , 8 are illustrated in
Figure 2.21. Obtain an upper bound on P(X̂ ≠ X) assuming that P(X =
x_i) = 1/8 for i = 1, 2, . . . , 8.

Hint:
1. P(X̂ ≠ X) = Σ_{i=1}^{8} P(X̂ ≠ X | X = x_i) P(X = x_i)
2. Use the union bound.
Figure 2.21 Signal values and decision regions for Problem 2.51.

2.52 Show that the Tchebycheff-Hermite polynomials satisfy

(−1)^k (d^k/dy^k) h(y) = H_k(y) h(y),   k = 1, 2, . . .

2.53 X has a triangular pdf centered in the interval [−1, 1]. Obtain a Gram-
Charlier approximation to the pdf of X that includes the first six moments
of X and sketch the approximation for values of X ranging from −2 to 2.

2.54 Let p be the probability of obtaining heads when a coin is tossed. Suppose
we toss the coin N times and form an estimate of p as

p̂ = N_H / N

where N_H = number of heads showing up in N tosses. Find the smallest
value of N such that

P[|p̂ − p| ≥ 0.01p] ≤ 0.1

(Assume that the unknown value of p is in the range 0.4 to 0.6.)
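
One conservative route is the Tchebycheff inequality with Var(p̂) = p(1 − p)/N;
a Gaussian (central limit theorem) approximation gives a smaller N. The text's
intended method may be either one, so the sketch below (SciPy assumed available)
evaluates both over the stated range of p:

    import numpy as np
    from scipy.stats import norm

    def n_tchebycheff(p, eps_frac=0.01, delta=0.1):
        # Require p(1-p) / (N * (eps_frac*p)^2) <= delta
        return (1 - p) / (delta * eps_frac**2 * p)

    def n_gaussian(p, eps_frac=0.01, delta=0.1):
        # CLT approximation: require 2 Q(eps_frac * p * sqrt(N / (p(1-p)))) <= delta
        z = norm.isf(delta / 2)              # Q^{-1}(delta/2)
        return z**2 * (1 - p) / (eps_frac**2 * p)

    for p in (0.4, 0.5, 0.6):
        print(p, int(np.ceil(n_tchebycheff(p))), int(np.ceil(n_gaussian(p))))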

2.55 X_1, X_2, . . . , X_n are n independent samples of a continuous random
variable X, that is,

f_{X1,X2, . . . ,Xn}(x_1, x_2, . . . , x_n) = Π_{i=1}^{n} f_X(x_i)

Assume that μ_X = 0 and σ_X^2 is finite.
a. Find the mean and variance of

X̄ = (1/n) Σ_{i=1}^{n} X_i

b. Show that X̄ converges to 0 in MS, that is, l.i.m. X̄ = 0.

2.56 Show that if the X_i's are of continuous type and independent, then for
sufficiently large n the density of sin(X_1 + X_2 + · · · + X_n) is nearly equal
to the density of sin(X), where X is a random variable with uniform
distribution in the interval (−π, π).

2.57 Using the Cauchy criterion, show that a sequence X_n tends to a limit in
the MS sense if and only if E{X_m X_n} exists as m, n → ∞.

2.58 A box has a large number of 1000-ohm resistors with a tolerance of ±100
ohms (assume a uniform distribution in the interval 900 to 1100 ohms).
Suppose we draw 10 resistors from this box and connect them in series

and let R be the resistive value of the series combination. Using the
Gaussian approximation for R, find

P[9000 < R < 11000]
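
Since R is the sum of 10 independent Uniform(900, 1100) values, the central limit
theorem suggests R is approximately Gaussian with μ_R = 10,000 and
σ_R^2 = 10(200)^2/12. The sketch below compares that approximation with a Monte
Carlo estimate (NumPy and SciPy assumed available; sample size illustrative):

    import numpy as np
    from scipy.stats import norm

    n, low, high = 10, 900.0, 1100.0
    mu_R = n * (low + high) / 2                    # 10000
    var_R = n * (high - low)**2 / 12               # 10 * 200^2 / 12

    # Gaussian (CLT) approximation
    approx = norm.cdf(11000, mu_R, np.sqrt(var_R)) - norm.cdf(9000, mu_R, np.sqrt(var_R))

    # Monte Carlo check
    rng = np.random.default_rng(1)
    R = rng.uniform(low, high, size=(200_000, n)).sum(axis=1)
    mc = np.mean((R > 9000) & (R < 11000))

    print(approx, mc)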

2.59 Let

Y_n = (1/n)(X_1 + X_2 + · · · + X_n)

where the X_i, i = 1, 2, . . . , n, are statistically independent and identically
distributed random variables each with a Cauchy pdf.

a. Determine the characteristic function of Y_n.
b. Determine the pdf of Y_n.
c. Consider the pdf of Y_n in the limit as n → ∞. Does the central limit
theorem hold? Explain.

2.60 Y is a Gaussian random variable with zero mean and unit variance and

X_n = sin(Y/n)  if Y > 0
      cos(Y/n)  if Y ≤ 0

Discuss the convergence of the sequence X_n. (Does the sequence converge,
and if so, in what sense?)

2.61 Let Y be the number of dots that show up when a die is tossed, and let

X_n = exp[−n(Y − 3)]

Discuss the convergence of the sequence X_n.

2.62 Y is a Gaussian random variable with zero mean and unit variance and

X_n = exp(−Y/n)

Discuss the convergence of the sequence X_n.
