Notes EC636


Stochastic and Random Processes

(EC636)

by

Dr. Mohamed Elalem

Department of Computer Engineering

University of Tripoli

Notes on Probabilities and Stochastic Processes

Graduate Program of Computer Engineering

http://melalem.com/EC636.php

c M.Elalem
Lecture 1

Experiments, Models, and

Probabilities

1.1 Introduction

• Real word exhibits randomness

– Today’s temperature

– Flip a coin, head or tail (H,T)?

– Walk to a bus station, how long do you wait for the arrival of a bus?

– Transmit a waveform through a channel, which one arrives at the receiver?

– Which one does the receiver identify as the transmitted signal?

• We create models to analyze, since real experiments are generally too complicated; for

example, waiting time depends on the following factors:

– The time of a day (is it rush hour?);

– The speed of each car that passed by while you waited;

– The weight, horsepower, and gear ratio of the bus;

– The psychological profile and work schedule of drivers;

– The status of all road construction within 100 miles of the bus stop.

• It should be apparent that it is too difficult to analyze the effects of all these

factors on the likelihood that you will wait less than 5 minutes for a bus. Therefore,

it is necessary to study and create a model to capture the critical part of the actual

physical experiment.

• Probability theory deals with the study of random phenomena, which under re-

peated experiments yield different outcomes that have certain underlying patterns

about them.

1.1.1 Review of Set Operation

• Event space Ω: sets of outcomes

• Sets constructions for events E ⊂ Ω and F ⊂ Ω

– Union: E ∪ F = {s ∈ Ω : s ∈ E OR s ∈ F };

– Intersection: E ∩ F = {s ∈ Ω : s ∈ E AND s ∈ F };

– Complement: E^c = Ē = {s ∈ Ω : s ∉ E};

– Empty set: Φ = Ωc = {}.

• Only complement operation needs the knowledge of Ω; event space.

1.1.2 Several Definitions

• Disjoint: if A ∩ B = Φ, the empty set, then A and B are said to be mutually exclusive

(M.E), or disjoint.

• Exhaustive: the collection of events satisfies

∪_{i=1}^{∞} Ai = Ω

• A partition of Ω is a collection of mutually exclusive subsets of Ω such that their

union is Ω (Partition is a stronger condition than Exhaustive.):

Ai ∩ Aj = φ for i ≠ j, and ∪_{i=1}^{n} Ai = Ω

1.1.3 De-Morgan’s Law

(A ∪ B)^c = Ā ∩ B̄        (A ∩ B)^c = Ā ∪ B̄        (1.1)
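These set identities are easy to verify numerically. A minimal Python sketch (the particular event space and sets below are arbitrary examples, not part of the notes):

omega = set(range(10))          # example event space of all outcomes
E = {0, 1, 2, 3}
F = {2, 3, 4, 5}

def complement(A):
    return omega - A

# De Morgan: complement of a union is the intersection of complements, and vice versa
assert complement(E | F) == complement(E) & complement(F)
assert complement(E & F) == complement(E) | complement(F)
print("De Morgan's laws hold for this example")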

1.1.4 Sample Space, Events and Probabilities

• Outcome: an outcome of an experiment is any possible observations of that experi-

ment.

• Sample space: the sample space of an experiment is the set of all possible outcomes

of that experiment.

• Event: is a set of outcomes of an experiment.

• Event Space: is a collectively exhaustive, mutually exclusive set of events.

Sample Space and Event Space

– Sample space: contains all the details of an experiment. It is a set of all

outcomes, each outcome s ∈ S. Some example:

∗ coin toss: S = {H, T }

∗ roll pair of dice: S = {(1, 1), · · · , (6, 6)}

∗ component life time: S = {t ∈ [0, ∞)} e.g., lifespan of a light bulb

∗ noise: S = {n(t); t: real}

– Event Space: is a set of events.

Example 1

coin toss 4 times:

The sample space consists of 16 four-letter words, with each letter either h (head)

or t (tail).

Let Bi = outcomes with i heads for i = 0, 1, 2, 3, 4. Each Bi is an event containing

one or more outcomes, say, B1 = {ttth, ttht, thtt, httt} contains four outcomes.

The set B = {B0 , B1 , B2 , B3 , B4 } is an event space. It is not a sample space.

Example 2

Toss two dice; there are 36 elements in the sample space. If we define the event

as the sum of two dice,

Ω = {B2 , B3 , · · · , B12 }

there are 11 elements.

– Practical example: for binary data transmitted through a noisy channel, we are more

interested in the event space.

1.1.5 Probability Defined on Events

Often it is meaningful to talk about at least some of the subsets of S as events, for which

we must have a mechanism to compute their probabilities.

Example 3

Consider the experiment where two coins are simultaneously tossed. The sample space is

S = {γ1 , γ2 , γ3, γ4 } where

γ1 = {H, H} γ2 = {H, T } γ3 = {T, H} γ4 = {T, T }

If we define

A = {γ1 , γ2 , γ3 }

The event of A is the same as “Head has occurred at least once” and qualifies as an event.

Probability measure: each event has a probability, P (E)

1.1.6 Definitions, Axioms and Theorems

• Definitions: establish the logic of probability theory

• Axioms: are facts that we have to accept without proof.

• Theorems are consequences that follow logically from definitions and axioms. Each

theorem has a proof that refers to definitions, axioms, and other theorems.

• There are only three axioms.

For any event A, we assign a number P (A), called the probability of the event A. This

number satisfies the following three conditions that act as the axioms of probability.

1- Probability is a nonnegative number

P (A) ≥ 0 (1.2)

2- Probability of the whole set is unity

P (Ω) = 1 (1.3)

3- For any countable collection A1 , A2 , · · · of mutually exclusive events

P (A1 ∪ A2 ∪ · · · ) = P (A1 ) + P (A2 ) + · · · (1.4)

(Note that (3) states that if A and B are mutually exclusive (M.E.) events, the

probability of their union is the sum of their probabilities.)

We will build our entire probability theory on these axioms.

1.1.7 Some Results Derived from the Axioms

The following conclusions follow from these axioms:

• Since A ∪ Ā = Ω, using (2), we have

P (A ∪ Ā) = P (Ω) = 1

But A ∩ Ā = φ, and using (3),

P (A ∪ Ā) = P (A) + P (Ā) = 1    or    P (Ā) = 1 − P (A)

• Similarly, for any A, A ∩ φ = φ; hence A and φ are mutually exclusive, and it follows that P (A ∪ φ) = P (A) + P (φ).

But A ∪ φ = A, and thus

P (φ) = 0

• Suppose A and B are not mutually exclusive (M.E.) How does one compute P (A ∪ B)?

To compute the above probability, we should re-express (A ∪ B) in terms of M.E. sets

so that we can make use of the probability axioms. From figure below,

A ∪ B = A ∪ ĀB

where A and ĀB are clearly M.E. events. Thus using axiom (3)

P (A ∪ B) = P (A ∪ ĀB) = P (A) + P (ĀB)

To compute P (ĀB), we can express B as

B = B ∩ Ω = B ∩ (A ∪ Ā) = (B ∩ A) ∪ (B ∩ Ā) = BA ∪ BĀ

Thus

P (B) = P (BA) + P (BĀ)

Since BA = AB and BĀ = ĀB are M.E. events, we have

P (ĀB) = P (B) − P (AB)

Therefore

P (A ∪ B) = P (A) + P (B) − P (AB)

• Coin toss revisited:

γ1 = [H, H], γ2 = [H, T ], γ3 = [T, H], γ4 = [T, T ],

Let A = {γ1 , γ2 }: the event that the first coin falls head

Let B = {γ1 , γ3 }: the event that the second coin falls head

P (A ∪ B) = P (A) + P (B) − P (AB) = 1/2 + 1/2 − 1/4 = 3/4

where P (A ∪ B) denotes the event that at least one head appeared.
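The same result can be checked by brute-force enumeration of the four equally likely outcomes; a short Python sketch:

from fractions import Fraction

outcomes = [("H", "H"), ("H", "T"), ("T", "H"), ("T", "T")]
p = Fraction(1, len(outcomes))                 # each outcome has probability 1/4

A = {o for o in outcomes if o[0] == "H"}       # first coin falls head
B = {o for o in outcomes if o[1] == "H"}       # second coin falls head
P = lambda event: p * len(event)

# inclusion-exclusion: P(A ∪ B) = P(A) + P(B) − P(AB)
assert P(A | B) == P(A) + P(B) - P(A & B) == Fraction(3, 4)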

1.1.8 Theorem

For an event space B = {B1 , B2 , · · · } and any event A in the event space, let Ci =

A ∩ Bi . For i ≠ j, the events Ci and Cj are mutually exclusive and

A = C1 ∪ C2 ∪ · · ·        P (A) = Σ_{i=1}^{n} P (Ci )

Example 4

Coin tossing, let A equal the set of outcomes with less than three heads, as A =

{tttt, httt, thtt, ttht, ttth, hhtt, htht, htth, tthh, thth, thht} Let {B0 , B1 , B2 , B3 , B4 } de-

note the event space in which Bi = { outcomes with i heads }. Let Ci = A ∩ Bi (i =

0, 1, 2, 3, 4), the above theorem states that

A = C0 ∪ C1 ∪ C2 ∪ C3 ∪ C4

= (A ∩ B0 ) ∪ (A ∩ B1 ) ∪ (A ∩ B2 ) ∪ (A ∩ B3 ) ∪ (A ∩ B4 )

In this example, Bi ⊂ A, for i = 0, 1, 2. Therefore, A ∩ Bi = Bi for i = 0, 1, 2. Also

for i = 3, 4, A ∩ Bi = φ, so that A = B0 ∪ B1 ∪ B2 , a union of disjoint sets. In words,

this example states that the event less than three heads is the union of the events for

“zero head”, “one head”, and “two heads”.

Example 5

        V      F      D
L      0.3    0.15   0.12
B      0.2    0.15   0.08

A company has a model of telephone usage. It classifies all calls as L (long), B (brief).

It also observes whether calls carry voice(V ), fax (F ), or data(D). The sample space

has six outcomes S = {LV, BV, LD, BD, LF, BF }. The probabilities are given in the

table above. Note that {V, F, D} is an event space corresponding to {B1 , B2 , B3 } in

the previous theorem (and L plays the role of the event A). Thus, we can apply the

theorem to find

P (L) = P (LV ) + P (LD) + P (LF ) = 0.57
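The same computation in a few lines of Python (the table entries are copied from above):

p = {("L", "V"): 0.30, ("L", "F"): 0.15, ("L", "D"): 0.12,
     ("B", "V"): 0.20, ("B", "F"): 0.15, ("B", "D"): 0.08}

# theorem: P(L) = P(LV) + P(LF) + P(LD), summing over the event space {V, F, D}
P_L = sum(prob for (duration, _), prob in p.items() if duration == "L")
print(round(P_L, 2))   # 0.57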

1.1.9 Conditional Probability and Independence

In N independent trials, suppose NA , NB , NAB denote the number of times events A,

B and AB occur respectively. According to the frequency interpretation of probability,

for large N

P (A) = NA /N,        P (B) = NB /N,        P (AB) = NAB /N

Among the NA occurrences of A, only NAB of them are also found among the NB

occurrences of B. Thus the ratio

NAB /NB = (NAB /N)/(NB /N) = P (AB)/P (B)

is a measure of the event A given that B has already occurred. We denote this condi-

tional probability by P (A|B) = Probability of the event A given that B has occurred.

We define

P (A|B) = P (AB)/P (B)        (1.5)

provided P (B) ≠ 0. As we show below, the above definition satisfies all probability

axioms discussed earlier. We have

1. Non-negative:

P (A|B) = P (AB)/P (B) ≥ 0,  since P (AB) ≥ 0 and P (B) > 0

2. Normalization:

P (Ω|B) = P (ΩB)/P (B) = P (B)/P (B) = 1,  since ΩB = B

3. Suppose A ∩ C = φ; then

P (A ∪ C|B) = P ((A ∪ C) ∩ B)/P (B) = P (AB ∪ CB)/P (B)

But AB ∩ CB = φ, hence P (AB ∪ CB) = P (AB) + P (CB), so

P (A ∪ C|B) = P (AB)/P (B) + P (CB)/P (B) = P (A|B) + P (C|B)

satisfying all probability axioms. Thus P (A|B) defines a legitimate probability

measure.

1.1.10 Properties of Conditional Probability

1. If B ⊂ A, then AB = B, and

P (A|B) = P (AB)/P (B) = P (B)/P (B) = 1

since if B ⊂ A, then occurrence of B implies automatic occurrence of the event

A. As an example, let A = { outcome is even}, B = { outcome is 2} in a dice

tossing experiment. Then B ⊂ A and P (A|B) = 1.

2. If A ⊂ B, then AB = A, and

P (A|B) = P (AB)/P (B) = P (A)/P (B) > P (A)

In a dice experiment, A = { outcome is 2}, B = { outcome is even}, so that A ⊂

B. The statement that B has occurred (outcome is even) makes the probability

for “outcome is 2” greater than that without that information.

3. We can use the conditional probability to express the probability of a compli-

cated event in terms of simpler related events: Law of Total Probability. Let

A1 , A2 , · · · , An are pair wise disjoint and their union is Ω. Thus Ai ∩ Aj = φ, and

∪ni=1 Ai = Ω

thus

B = BΩ = B(A1 ∪ A2 ∪ · · · ∪ An ) = BA1 ∪ BA2 ∪ · · · BAn ∪

But Ai ∩ Aj = φ ⇒ BAi ∩ BAj = φ so that

n
X n
X
P (B) = P (BAi ) = P (B|Ai )P (Ai )
i=1 i=1

Next we introduce the notion of “independence” of events.

Independence: A and B are said to be independent events, if

P (AB) = P (A)P (B)

Notice that the above definition is a probabilistic statement, NOT a set theo-

retic notion such as mutual exclusiveness (independent and disjoint are not

synonyms).

1.1.11 More on Independence

– Disjoint events have no common outcomes and therefore P (AB) = 0. In most

cases, independent does not mean disjoint, except P (A) = 0 or P (B) = 0.

– Disjoint leads to probability sum, while independence leads to probability multi-

plication.

– Independent events cannot be mutually exclusive, since P (A) > 0, P (B) > 0, and

A, B independent implies P (AB) > 0, thus the event AB cannot be the null set.

– Suppose A and B are independent, then

P (A|B) = P (AB)/P (B) = P (A)P (B)/P (B) = P (A)

Thus if A and B are independent, the event that B has occurred does not shed

any more light into the event A. It makes no difference to A whether B has

occurred or not.

Example 6

A box contains 6 white and 4 black balls. Remove two balls at random without

replacement. What is the probability that the first one is white and the second

one is black?

Let W1 = “first ball removed is white” and B2 = “second ball removed is black”.

We need to find P (W1 ∩ B2 ) =?.

We have W1 ∩ B2 = W1 B2 = B2 W1 . Using the conditional probability rule,

P (W1 B2 ) = P (B2 W1 ) = P (B2 |W1 )P (W1 )

But

P (W1 ) = 6/(6 + 4) = 6/10 = 3/5

and

P (B2 |W1 ) = 4/(5 + 4) = 4/9

and hence

P (W1 B2 ) = P (B2 |W1 )P (W1 ) = (3/5) · (4/9) = 4/15 ≈ 0.267
Are the events W1 and B2 independent? Our common sense says No. To verify

this we need to compute P (B2). Of course the fate of the second ball very much

depends on that of the first ball. The first ball has two options: W1 = “first ball

is white” or B1 = “first ball is black”. Note that W1 ∩ B1 = φ and W1 ∪ B1 = Ω.

Hence W1 together with B1 form a partition. Thus

P (B2 ) = P (B2 |W1 )P (W1 ) + P (B2 |B1 )P (B1 ) = (4/9) · (3/5) + (3/9) · (4/10) = 2/5

and

P (B2 )P (W1 ) = (2/5) · (3/5) = 6/25 ≠ P (B2 W1 ) = 4/15
As expected, the events W1 and B2 are dependent.
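This example is easy to confirm by enumerating all ordered draws; a brief Python sketch:

from fractions import Fraction
from itertools import permutations

balls = ["W"] * 6 + ["B"] * 4                   # 6 white, 4 black
draws = list(permutations(range(10), 2))        # equally likely ordered pairs, no replacement
p = Fraction(1, len(draws))

W1 = {d for d in draws if balls[d[0]] == "W"}   # first ball white
B2 = {d for d in draws if balls[d[1]] == "B"}   # second ball black
P = lambda ev: p * len(ev)

print(P(W1 & B2))            # 4/15
print(P(B2), P(W1) * P(B2))  # 2/5 versus 6/25, so W1 and B2 are dependent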

1.2 Bayes’ Theorem

Since

P (AB) = P (A|B)P (B)

and similarly

P (B|A) = P (BA)/P (A) = P (AB)/P (A)  ⇒  P (AB) = P (B|A)P (A)

We get

P (A|B)P (B) = P (B|A)P (A)

or

P (A|B) = [P (B|A)/P (B)] · P (A)        (1.6)

The above equation is known as Bayes’ theorem.

Although simple enough, Bayes theorem has an interesting interpretation: P (A)

represents the a-priori probability of the event A. Suppose B has occurred, and

assume that A and B are not independent. How can this new information be

used to update our knowledge about A? Bayes’ rule takes into account the new

information (“B has occurred”) and gives out the a-posteriori probability of A

given B.

We can also view the event B as new knowledge obtained from a fresh experiment.

We know something about A as P (A). The new information is available in terms of

B. The new information should be used to improve our knowledge/understanding

of A. Bayes theorem gives the exact mechanism for incorporating such new in-

formation.

A more general version of Bayes’ theorem involves partition of Ω as

P (Ai |B) = P (B|Ai )P (Ai )/P (B) = P (B|Ai )P (Ai ) / Σ_{j=1}^{n} P (B|Aj )P (Aj )        (1.7)

In above equation, Ai , i = [1, n] represent a set of mutually exclusive events with

associated a-priori probabilities P (Ai ), i = [1, n]. With the new information “B

has occurred”, the information about Ai can be updated by the n conditional

probabilities P (B|Aj ), j = [1, n].

Example 7

Two boxes B1 and B2 contain 100 and 200 light bulbs respectively. The first box

(B1 ) has 15 defective bulbs and the second 5. Suppose a box is selected at random

and one bulb is picked out.

(a) What is the probability that it is defective?

Solution: Note that box B1 has 85 good and 15 defective bulbs. Similarly box

B2 has 195 good and 5 defective bulbs. Let D = “Defective bulb is picked out”.

Then,

P (D|B1 ) = 15/100 = 0.15,        P (D|B2 ) = 5/200 = 0.025
Since a box is selected at random, they are equally likely.

P (B1 ) = P (B2 ) = 1/2

Thus B1 and B2 form a partition, and using Law of Total Probability, we obtain

P (D) = P (D|B1 )P (B1 ) + P (D|B2 )P (B2 ) = 0.15 × 1/2 + 0.025 × 1/2 = 0.0875

Thus, there is about 9% probability that a bulb picked at random is defective.

(b) Suppose we test the bulb and it is found to be defective. What is the proba-

bility that it came from box 1? (P (B1 |D) =?)

P (B1 |D) = P (D|B1 )P (B1 )/P (D) = (0.15 × 0.5)/0.0875 = 0.8571        (1.8)

Notice that initially P (B1) = 0.5; then we picked out a box at random and tested

a bulb that turned out to be defective. Can this information shed some light about

the fact that we might have picked up box 1? From (1.8), P (B1 |D) = 0.8571 > 0.5,

and indeed it is more likely at this point that we must have chosen box 1 in favor

of box 2. (Recall that the defective rate of box 1 is six times that of box 2.)
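The posterior above follows directly from the law of total probability and Bayes' theorem; a minimal Python sketch of the same computation:

priors      = {"B1": 0.5, "B2": 0.5}            # box selected at random
p_defective = {"B1": 15 / 100, "B2": 5 / 200}   # defective rates from the counts

P_D = sum(priors[b] * p_defective[b] for b in priors)               # total probability
posterior = {b: p_defective[b] * priors[b] / P_D for b in priors}   # Bayes' theorem
print(round(P_D, 4), round(posterior["B1"], 4))  # 0.0875 and 0.8571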

Example 8

Suppose you have two coins, one biased, one fair, but you don’t know which coin is

which. Coin 1 is biased. It comes up heads with probability 3/4, while coin 2 will flip

heads with probability 1/2. Suppose you pick a coin at random and flip it. Let Ci

denote the event that coin i is picked. Let H and T denote the possible outcomes of the

flip. Given that the outcome of the flip is a head, what is P [C1 |H], the probability that

you picked the biased coin? Given that the outcome is a tail, what is the probability

P [C1 |T ] that you picked the biased coin?

Solution: First, we construct the sample tree as shown: To find the conditional

probabilities, we see

P (C1 |H) = P (C1 H)/P (H) = P (C1 H)/[P (C1 H) + P (C2 H)] = (3/8)/(3/8 + 1/4) = 3/5

Similarly,

P (C1 |T ) = P (C1 T )/P (T ) = P (C1 T )/[P (C1 T ) + P (C2 T )] = (1/8)/(1/8 + 1/4) = 1/3

As we would expect, we are more likely to have chosen coin 1 when the first flip is

heads but we are more likely to have chosen coin 2 when the first flip is tails.

Lecture 2

Random Variables

2.1 Introduction

Let Ω be the sample space of a probability model, and X a function that maps every ζ ∈ Ω to

a unique point x ∈ R, the set of real numbers. Since the outcome ζ is not certain, neither is the

value X(ζ) = x. Thus if B is some subset of R, we may want to determine the probability

of “X(ζ) ∈ B”. To determine this probability, we can look at the set A = X −1 (B) ⊂ Ω. A

contains all outcomes ζ that map into B under the function X.

Obviously, if the set A = X −1 (B) is an event, the probability of A is well defined; in this

case we can say

probability of the event “X(ζ) ∈ B” = P (X −1 (B)) = P (A)

However, X −1 (B) may not always be an event for every B ⊂ R, thus creating difficulties. The

notion of random variable (RV ) makes sure that the inverse mapping always results in an

event so that we are able to determine the probability for any B ⊂ R.

Random Variable (RV ): A finite single valued function X(·) that maps the set of all

experimental outcomes Ω into the set of real numbers R is said to be a RV , if the set

{ζ|X(ζ) ≤ x} is an event for every x in R.

The random variable X is defined by the function X(ζ) that maps the sample outcome ζ to the

corresponding value of the random variable X. That is

{X = x} = {ζ ∈ Ω|X(ζ) = x}

Since all events have well-defined probabilities, the probability of the event {ζ|X(ζ) ≤ x}

must depend on x. Denote

P {ζ|X(ζ) ≤ x} = FX (x) ≥ 0 (2.1)

The role of the subscript X is only to identify the actual RV . FX (x) is said to be the

Cumulative Distribution Function (CDF) associated with the RV X.

2.1.1 Properties of CDF

FX (+∞) = 1, FX (−∞) = 0

FX (+∞) = P {ζ|X(ζ) ≤ +∞} = P (Ω) = 1

FX (−∞) = P {ζ|X(ζ) ≤ −∞} = P (φ) = 0

if x1 < x2 , then FX (x1 ) ≤ FX (x2 )

If x1 < x2 , then the subset (−∞, x1 ] ⊂ (−∞, x2 ]. Consequently the event {ζ|X(ζ) ≤

x1 } ⊂ {ζ|X(ζ) ≤ x2 }, since X(ζ) ≤ x1 implies X(ζ) ≤ x2 . As a result

FX (x1 ) = P (X(ζ) ≤ x1 ) ≤ P (X(ζ) ≤ x2) = FX (x2 )

implying that the probability distribution function is nonnegative and monotone non-

decreasing.

For all b > a,        FX (b) − FX (a) = P (a < X ≤ b)

To prove this theorem, express the event Eab = {a < X ≤ b} as a part of union of

disjoint events. Starting with the event Eb = {X ≤ b} . Note that Eb can be written

as the union

Eb = {X ≤ b} = {X ≤ a} ∪ {a < X ≤ b} = Ea ∪ Eab

Note also that Ea and Eab are disjoint so that P (Eb) = P (Ea )+P (Eab). Since P (Eb ) =

FX (b) and P (Ea ) = FX (a), we can write FX (b) = FX (a) + P (a < X ≤ b), which

completes the proof.

2.1.2 Additional Properties of a CDF

• If FX (x0 ) = 0 for some x0 , then FX (x) = 0, x ≤ x0 .

This follows, since FX (x0 ) = P (X(ζ) ≤ x0 ) = 0 implies {X(ζ) ≤ x0 } is the null set,

and for any x ≤ x0 , {X(ζ) ≤ x} will be a subset of the null set.

• P {X(ζ) > x} = 1 − FX (x)

We have {X(ζ) ≤ x} ∪ {X(ζ) > x} = Ω, and since the two events are mutually

exclusive, the above equation follows.

• P {x1 < X(ζ) ≤ x2 } = FX (x2 ) − FX (x1 ), x2 > x1

The events {X(ζ) ≤ x1 } and {x1 < X(ζ) ≤ x2 } are mutually exclusive and their union

represents the event {X(ζ) ≤ x2 }.

• P {X(ζ) = x} = FX (x) − FX (x− )

Let x1 = x − ǫ, ǫ > 0, and x2 = x,

lim_{ǫ→0} P {x − ǫ < X(ζ) ≤ x} = FX (x) − lim_{ǫ→0} FX (x − ǫ)

or

P {X(ζ) = x} = FX (x) − FX (x− )

FX (x0+ ), the limit of FX (x) as x → x0 from the right, always exists and equals FX (x0 ).

However the left limit value FX (x0− ) need not equal FX (x0 ). Thus FX (x) need not be

continuous from the left. At a discontinuity point of the distribution, the left and right

limits are different, and

P {X(ζ) = x0 } = FX (x0 ) − FX (x0− )

Thus the only discontinuities of a distribution function are of the jump type. The

CDF is continuous from the right. Keep in mind that the CDF always takes on the

upper value at every jump in staircase.

Example 1

X is a RV such that X(ζ) = c, ζ ∈ Ω. Find FX (x).

Solution: For x < c, {X(ζ) ≤ x} = φ, so that FX (x) = 0; and for x ≥ c, {X(ζ) ≤ x} = Ω, so

that FX (x) = 1. (see Figure 2.1)

Example 2

Figure 2.1: CDF for example 1

Toss a coin with P (T ) = q, so Ω = {H, T }. Suppose the RV X is such that X(T ) = 0, X(H) = 1. Find

FX (x).

Solution:

• For x < 0, {X(ζ) ≤ x} = φ, so that FX (x) = 0.

• For 0 ≤ x < 1, {X(ζ) ≤ x} = {T }, so that FX (x) = P (T ) = q.

• For x ≥ 1, {X(ζ) ≤ x} = {H, T } = Ω, so that FX (x) = 1. (see Figure 2.2)

Figure 2.2: CDF for example 2

• X is said to be a continuous-type RV if its distribution function FX (x) is continuous.

In that case FX (x− ) = FX (x) for all x, therefore, P {X = x} = 0.

• If FX (x) is constant except for a finite number of jump discontinuities(piece-wise con-

stant; step-type), then X is said to be a discrete-type RV . If xi is such a discontinuity

point, then

pi = P {X = xi } = FX (xi ) − FX (xi− )

For above two examples, at a point of discontinuity we get

P {X = c} = FX (c) − FX (c− ) = 1 − 0 = 1

and

P {X = 0} = FX (0) − FX (0− ) = q − 0 = q

Example 3

A fair coin is tossed twice, and let the RV X represent the number of heads. Find FX (x).

Solution: In this case Ω = {HH, HT, T H, T T }, and X(HH) = 2, X(HT ) = 1, X(T H) = 1,

X(T T ) = 0

• For x < 0, {X(ζ) ≤ x} = φ, so that FX (x) = 0.

• For 0 ≤ x < 1, {X(ζ) ≤ x} = {T T }, so that FX (x) = P (T T ) = P (T )P (T ) = 1/4.

• For 1 ≤ x < 2, {X(ζ) ≤ x} = {T T, HT, T H}, so that FX (x) = P (T T, HT, T H) = 3/4.

• For x ≥ 2, {X(ζ) ≤ x} = Ω, so that FX (x) = 1.

(see Figure 2.3) We can also have

P {X = 1} = FX (1) − FX (1− ) = 3/4 − 1/4 = 1/2
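The staircase CDF of this example can be reproduced by summing the point masses; a short Python sketch:

from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=2))        # two fair coin tosses
p = Fraction(1, len(outcomes))
pmf = {}
for o in outcomes:
    heads = o.count("H")
    pmf[heads] = pmf.get(heads, 0) + p

def F(x):
    # right-continuous step CDF: sum of masses at points <= x
    return sum(prob for value, prob in pmf.items() if value <= x)

print(F(-1), F(0), F(1), F(2))      # 0, 1/4, 3/4, 1
print(F(1) - F(0.999))              # jump at x = 1, i.e. P{X = 1} = 1/2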

2.2 Probability Density Function (pdf )

The first derivative of the distribution function FX (x) is called the probability density func-

tion fX (x) of the RV X. Thus

fX (x) = dFX (x)/dx = lim_{∆x→0} [FX (x + ∆x) − FX (x)]/∆x ≥ 0        (2.2)

Figure 2.3: CDF for example 3

Equation (2.2) shows that fX (x) ≥ 0 for all x.

• Discrete RV:if X is a discrete type RV , then its density function has the general

form
fX (x) = Σ_i pi δ(x − xi )

where xi represent the jump-discontinuity points in FX (x). As Fig. 2.4 shows, fX (x)

represents a collection of positive discrete masses, and it is known as the probability

mass function (pmf) in the discrete case.

Figure 2.4: Discrete pmf

• If X is a continuous type RV , fX (x) will be a continuous function,

• We also obtain by integration


FX (x) = ∫_{−∞}^{x} fX (u) du

Since FX (+∞) = 1, this yields

∫_{−∞}^{+∞} fX (u) du = 1

which justifies its name as the density function.

• We also get Figure 2.5


P {x1 < X < x2 } = FX (x2 ) − FX (x1 ) = ∫_{x1}^{x2} fX (x) dx

Thus the area under fX (x) in the interval (x1 , x2 ) represents the probability in the

Figure 2.5: Continuous pdf

above equation.

• Often, RV s are referred by their specific density functions - both in the continuous and

discrete cases - and in what follows we shall list a number of them in each category.

2.3 Continuous-type Random Variables

• Normal (Gaussian): X is said to be normal or Gaussian RV , if

fX (x) = (1/√(2πσ²)) exp[−(x − µ)²/(2σ²)]        (2.3)

This is a bell shaped curve, symmetric around the parameter µ, and its distribution

function is given by

FX (x) = ∫_{−∞}^{x} (1/√(2πσ²)) exp[−(y − µ)²/(2σ²)] dy = Φ((x − µ)/σ)        (2.4)

where Φ(x) = ∫_{−∞}^{x} (1/√(2π)) exp(−y²/2) dy is called the standard normal CDF, and is often

tabulated. Figure 2.6 shows the pdf and cdf of the Normal distribution for different means and

variances.

Figure 2.6: pdf and cdf of Normal distribution for different means and variances

P (a < X < b) = Φ((b − µ)/σ) − Φ((a − µ)/σ)

Q(x) = ∫_{x}^{∞} (1/√(2π)) exp(−y²/2) dy = 1 − Φ(x)

Q(x) is called the standard normal complementary CDF, and Q(x) = 1 − Φ(x). Since

fX (x) depends on the two parameters µ and σ², the notation X ∼ ℵ(µ, σ²) is applied. If

Y = (X − µ)/σ ∼ ℵ(0, 1)        (2.5)

then Y is called the normalized Gaussian RV. Furthermore,

aX + b ∼ ℵ(aµ + b, a²σ²)

i.e., a linear transform of a Gaussian RV is still Gaussian.
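In practice Φ and Q are evaluated from tables or numerically. A minimal sketch using SciPy (the parameter values here are arbitrary examples):

from scipy.stats import norm

mu, sigma = 1.0, 2.0
a, b = 0.0, 3.0

# P(a < X < b) = Φ((b − µ)/σ) − Φ((a − µ)/σ)
print(norm.cdf((b - mu) / sigma) - norm.cdf((a - mu) / sigma))

# Q(x) = 1 − Φ(x); SciPy exposes it as the survival function norm.sf
x = 0.7
print(norm.sf(x), 1 - norm.cdf(x))   # both ≈ 0.2420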

• Uniform: X ∼ U(a, b), a < b, as shown in Figure 2.7, if

fX (x) = { 1/(b − a),  a ≤ x ≤ b;   0,  elsewhere.        (2.6)
Figure 2.7: pdf and cdf of Uniform distribution

• Exponential: X ∼ E(λ) if

fX (x) = { (1/λ) exp(−x/λ),  x ≥ 0;   0,  elsewhere.        (2.7)

Figure 2.8 indicates the pdf and cdf of Exponential distribution for different parameter

λ.

Figure 2.8: pdf and cdf of Exponential distribution

• Chi-Square distribution with n degrees of freedom:

fX (x) = { [1/(σ^n 2^(n/2) Γ(n/2))] x^(n/2−1) exp(−x/(2σ²)),  x ≥ 0;   0,  elsewhere.        (2.8)
When the RV X is defined as X = Σ_{i=1}^{n} Xi², where the Xi , i = 1, · · · , n, are statistically independent

and identically distributed (i.i.d.) Gaussian RV s ∼ ℵ(0, σ²), then X has a chi-square

distribution with n degrees of freedom. Γ(x) is called the Gamma function and is given by

Γ(p) = ∫_{0}^{∞} t^(p−1) e^(−t) dt,  p > 0

with Γ(p) = (p − 1)! when p is a positive integer, and Γ(1/2) = √π.

• Rayleigh: X ∼ R(σ²), as shown in Figure 2.9:

fX (x) = { (x/σ²) exp(−x²/(2σ²)),  x ≥ 0;   0,  elsewhere.        (2.9)

Let Y = X1² + X2², where X1 and X2 ∼ ℵ(0, σ²) are independent. Then Y is chi-square

Figure 2.9: pdf and cdf of Rayleigh distribution

distributed with two degrees of freedom, hence the pdf of Y is

fY (y) = (1/(2σ²)) exp(−y/(2σ²)),  y ≥ 0

Now, suppose we define a new RV as R = √Y ; then R is Rayleigh distributed.
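This construction is easy to check by simulation; a short Python sketch that compares the sample mean of R with the known Rayleigh mean σ√(π/2) (σ below is an arbitrary example value):

import numpy as np

rng = np.random.default_rng(0)
sigma = 2.0
x1 = rng.normal(0, sigma, 100_000)
x2 = rng.normal(0, sigma, 100_000)
r = np.sqrt(x1**2 + x2**2)            # R = sqrt(X1^2 + X2^2) is Rayleigh distributed

print(r.mean(), sigma * np.sqrt(np.pi / 2))   # both ≈ 2.51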

2.4 Discrete-type Random Variables

• Bernoulli: X takes the values of (0, 1), and

P (X = 0) = q, P (X = 1) = p, p = 1 − q.

• Binomial: X ∼ B(n, p):

P (X = k) = (n choose k) p^k q^(n−k),  k = 0, 1, 2, · · · , n

• Poisson: X ∼ P (λ):

P (X = k) = e^(−λ) λ^k/k!,  k = 0, 1, 2, · · ·

• Uniform: X takes values in {1, 2, · · · , n}, and

P (X = k) = 1/n,  k = 1, 2, · · · , n

• Geometric: (number of coin tosses until the first head appears)

P (X = k) = (1 − p)^(k−1) p,  k = 1, 2, · · ·

where the parameter p ∈ (0, 1) is the probability of a head on each toss.

2.4.1 Example of Poisson RV

Example of Poisson Distribution: the probability model of Poisson RV describes phenomena

that occur randomly in time. While the time of each occurrence is completely random,

there is a known average number of occurrences per unit time. For example, the arrival of

information requests at a WWW server, the initiation of a telephone call, etc.

For example, calls arrive at random times at a telephone switching office with an average

of λ = 0.25 calls/second. The pmf of the number of calls that arrive in a T = 2 second

interval is

PK (k) = { (0.5)^k e^(−0.5)/k!,  k = 0, 1, 2, · · · ;   0,  o.w.

2.4.2 Example of Binomial RV

Example of using Binomial Distribution: To communicate one bit of information reliably, we

transmit the same binary symbol 5 times. Thus, “zero” is transmitted as 00000 and “one”

is transmitted as 11111. The receiver detects the correct information if three or more binary

symbols are received correctly. What is the information error probability P (E), if the binary

symbol error probability is q = 0.1?

In this case, we have five trials corresponding to five transmissions. On each trial, the

probability of a success is p = 1 − q = 0.9 (binary symmetric channel). The error event

occurs when the number of successes is strictly less than three:

P (E) = P (S0,5 ) + P (S1,5 ) + P (S2,5 ) = q^5 + 5pq^4 + 10p^2 q^3 ≈ 0.0086

By increasing the number of transmissions (to 5), the probability of error is reduced from

0.1 to about 0.0086.
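The same error probability can be computed directly from the binomial pmf; a brief Python sketch:

from math import comb

q = 0.1                    # binary symbol error probability
p = 1 - q
# decoding fails when fewer than 3 of the 5 symbols are received correctly
P_E = sum(comb(5, k) * p**k * q**(5 - k) for k in range(3))
print(P_E)                 # ≈ 0.00856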

2.4.3 Bernoulli Trial Revisited

Bernoulli trial consists of repeated independent and identical experiments each of which has

only two outcomes A or Ā with P (A) = p and P (Ā) = q. The probability of exactly k

occurrences of A in n such trials is given by Binomial distribution.

Let

Xk = “exactly k occurrences of A in n trials”        (2.10)

Since the number of occurrences of A in n trials must be an integer k = 0, 1, 2, · · · , n, either

X0 or X1 or X2 or · · · or Xn must occur in such an experiment. Thus

P (X0 ∪ X1 ∪ X2 ∪ · · · ∪ Xn ) = 1 (2.11)

But Xi and Xj , i ≠ j, are mutually exclusive. Thus

P (X0 ∪ X1 ∪ X2 ∪ · · · ∪ Xn ) = Σ_{k=0}^{n} P (Xk ) = Σ_{k=0}^{n} (n choose k) p^k q^(n−k)        (2.12)

From the relation

(a + b)^n = Σ_{k=0}^{n} (n choose k) a^k b^(n−k)

the right-hand side of Equation (2.12) equals (p + q)^n = 1, which agrees with Equation (2.11).

For a given n and p, what is the most likely value of k? The most probable value of k is the

number that maximizes the Binomial pmf. To obtain this value, consider the ratio

Pn (k − 1)/Pn (k) = [n! p^(k−1) q^(n−k+1) / ((n − k + 1)!(k − 1)!)] · [(n − k)! k! / (n! p^k q^(n−k))] = [k/(n − k + 1)] · (q/p)

Thus Pn (k) > Pn (k − 1) if k(1 − p) < (n − k + 1)p, or k < (n + 1)p. Thus, Pn (k) as a function

of k increases until

k = (n + 1)p

if it is an integer, or until the largest integer kmax less than (n + 1)p; this value of k represents the

most likely number of successes (or heads) in n trials.

Example 4

In a Bernoulli experiment with n trials, find the probability that the number of occurrences

of A is between k1 and k2 .

Solution: with Xi , i = 0, 1, 2, · · · , n as defined in Equation (2.10), clearly they are mutually

exclusive events. Thus

P = P (“Occurrence of A is between k1 and k2 ”)

  = P (Xk1 ∪ Xk1 +1 ∪ · · · ∪ Xk2 ) = Σ_{k=k1}^{k2} P (Xk ) = Σ_{k=k1}^{k2} (n choose k) p^k q^(n−k)        (2.13)

Example 5

Suppose 5, 000 components are ordered. The probability that a part is defective equals 0.1.

What is the probability that the total number of defective parts does not exceed 400?

Solution: Let

Yk = “k parts are defective among the 5000 components”

using Equation (2.13), the desired probability is given by

P (Y0 ∪ Y1 ∪ · · · ∪ Y400 ) = Σ_{k=0}^{400} P (Yk ) = Σ_{k=0}^{400} (5000 choose k) (0.1)^k (0.9)^(5000−k)

The above equation has too many terms to compute. Clearly, we need a technique to compute

the above term in a more efficient manner.

2.4.4 Binomial Random Variable Approximations

Let X represent a Binomial RV ; then

P (k1 < X < k2 ) = Σ_{k=k1}^{k2} P (Xk ) = Σ_{k=k1}^{k2} (n choose k) p^k q^(n−k)        (2.14)

Since the binomial coefficient (n choose k) = n!/((n − k)! k!) grows quite rapidly with n, it is difficult to
compute Equation (2.14) for large n. In this context, Normal approximation is extremely

useful.

Normal Approximation: (Demoivre-Laplace Theorem) Suppose n → ∞ with p held fixed.



Then for k in the √(npq) neighborhood of np, we can approximate

(n choose k) p^k q^(n−k) ≈ (1/√(2πnpq)) exp[−(k − np)²/(2npq)]        (2.15)

Thus if k1 and k2 in Equation (2.14) are within or around the neighborhood of the interval

(np − √(npq), np + √(npq)), we can approximate the summation in Equation (2.14) by an

integration as

P (k1 < X < k2 ) ≈ ∫_{k1}^{k2} (1/√(2πnpq)) exp[−(x − np)²/(2npq)] dx

               = ∫_{x1}^{x2} (1/√(2π)) exp(−y²/2) dy        (2.16)

where

x1 = (k1 − np)/√(npq),        x2 = (k2 − np)/√(npq)

We can express Equation (2.16) in terms of the normalized integral that has been tabulated

extensively. See Figures 2.11 and 2.12.

erf (x) = (1/√(2π)) ∫_{0}^{x} e^(−y²/2) dy = −erf (−x)        (2.17)

For example, if x1 and x2 are both positive, we obtain

P (x1 < X < x2 ) = erf (x2 ) − erf (x1 )

Example 6

A fair coin is tossed 5, 000 times. Find the probability that the number of heads is between

2, 475 to 2, 525.

Solution: We need P (2475 ≤ X ≤ 2525). Here n is large so that we can use the normal

Figure 2.10: pdf of Gaussian approximation.

approximation. In this case p = 1/2, so that np = 2500 and √(npq) ≃ 35. Since np − √(npq) ≃

2465 and np + √(npq) ≃ 2535, the approximation is valid for k1 = 2475 and k2 = 2525. Thus

P (k1 < X < k2 ) ≈ ∫_{x1}^{x2} (1/√(2π)) exp(−y²/2) dy        (2.18)

here

x1 = (k1 − np)/√(npq) = −5/7,        x2 = (k2 − np)/√(npq) = 5/7

Since x1 < 0, from Figure 2.10, the above probability is given by

P (2475 ≤ X ≤ 2525) = erf (x2 ) − erf (x1 ) = erf (x2 ) + erf (|x1 |) = 2 erf (5/7) ≈ 0.516

where we have used the tabulated value erf (0.7) ≈ 0.258.
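The quality of the approximation can also be checked numerically; a minimal sketch using SciPy:

import numpy as np
from scipy.stats import binom, norm

n, p = 5000, 0.5
mu, sd = n * p, np.sqrt(n * p * (1 - p))

exact = binom.cdf(2525, n, p) - binom.cdf(2474, n, p)             # P(2475 <= X <= 2525)
approx = norm.cdf((2525 - mu) / sd) - norm.cdf((2475 - mu) / sd)  # Equation (2.18)
print(round(exact, 3), round(approx, 3))   # ≈ 0.53 (exact) vs ≈ 0.52 (approximation)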

Figure 2.11: The standard normal CDF Φ(z)

Figure 2.12: The standard normal complementary CDF Q(z)

Lecture 3

Mean, Variance, Characteristic

Function and Transforms of RVs

3.1 Mean of a RV

For a RV X, its pdf fX (x) represents complete information about it. Note that fX (x)

represents very detailed information, and quite often it is desirable to characterize the r.v

in terms of its average behavior. In this context, we will introduce two parameters - mean

and variance - that are universally used to represent the overall properties of the RV and

its pdf . Mean (Expected Value) of a RV X is defined as


X̄ = E[X] = ∫_{−∞}^{∞} x fX (x) dx        (3.1)

If X is a discrete-type RV , then we get


X̄ = E[X] = ∫ x [Σ_i pi δ(x − xi )] dx = Σ_i xi pi ∫ δ(x − xi ) dx = Σ_i xi pi = Σ_i xi P (X = xi )        (3.2)

Mean represents the average (mean) value of the RV in a very large number of trials. For

example

• X ∼ U(a, b) (uniform distribution), then,


E[X] = ∫_{a}^{b} x/(b − a) dx = [1/(b − a)] [x²/2]_{a}^{b} = (a + b)/2

is the midpoint of the interval (a, b).

• X is exponential with parameter λ, then

E[X] = ∫_{0}^{∞} (x/λ) e^(−x/λ) dx = λ ∫_{0}^{∞} y e^(−y) dy = λ

implying that the parameter represents the mean value of the exponential RV .

• X is Poisson with parameter λ, we get


X̄ = E[X] = Σ_{k=0}^{∞} k P (X = k) = Σ_{k=0}^{∞} k e^(−λ) λ^k/k! = e^(−λ) Σ_{k=1}^{∞} λ^k/(k − 1)!

    = λ e^(−λ) Σ_{i=0}^{∞} λ^i/i! = λ e^(−λ) e^λ = λ        (3.3)

Thus the parameter λ also represents the mean of the Poisson RV .

• X is binomial; then its mean is given by

X̄ = E[X] = Σ_{k=0}^{n} k P (X = k) = Σ_{k=0}^{n} k (n choose k) p^k q^(n−k)

    = Σ_{k=1}^{n} [k n!/((n − k)! k!)] p^k q^(n−k) = Σ_{k=1}^{n} [n!/((n − k)!(k − 1)!)] p^k q^(n−k)

    = np Σ_{i=0}^{n−1} [(n − 1)!/((n − i − 1)! i!)] p^i q^(n−i−1) = np (p + q)^(n−1) = np        (3.4)

Thus np represents the mean of the binomial RV .

• For the normal RV ,

X̄ = E[X] = (1/√(2πσ²)) ∫_{−∞}^{∞} x exp[−(x − µ)²/(2σ²)] dx = (1/√(2πσ²)) ∫_{−∞}^{∞} (y + µ) exp[−y²/(2σ²)] dy

    = (1/√(2πσ²)) ∫_{−∞}^{∞} y exp[−y²/(2σ²)] dy + µ (1/√(2πσ²)) ∫_{−∞}^{∞} exp[−y²/(2σ²)] dy = µ        (3.5)

where the first integral in Equation (3.5) is zero and the second is 1. Thus the first

parameter in X ∼ ℵ(µ, σ 2 ) is in fact the mean of the Gaussian RV X.

3.1.1 Mean of a Function of a RV

Given X ∼ fX (x), suppose Y = g(X) defines a new RV with pdf fY (y). Then from the

previous discussion, the new RV Y has a mean µY given by


µY = E[Y ] = ∫_{−∞}^{∞} y fY (y) dy        (3.6)

From above, it appears that to determine E(Y ), we need to determine fY (y). However this

is not the case if only E(Y ) is the quantity of interest. Instead, we can obtain E(Y ) as
µY = E[Y ] = E[g(X)] = ∫_{−∞}^{∞} y fY (y) dy = ∫_{−∞}^{∞} g(x) fX (x) dx        (3.7)

For discrete case

µY = E[Y ] = Σ_i g(xi ) P (X = xi )        (3.8)

Therefore, fY (y) is not required to evaluate E(Y ) for Y = g(X). As an example, we

determine the mean of Y = X 2 , where X is a Poisson RV.


X̄² = E[X²] = Σ_{k=0}^{∞} k² P (X = k) = Σ_{k=0}^{∞} k² e^(−λ) λ^k/k! = e^(−λ) Σ_{k=1}^{∞} k λ^k/(k − 1)!

    = e^(−λ) Σ_{i=0}^{∞} (i + 1) λ^(i+1)/i! = λ e^(−λ) ( Σ_{i=0}^{∞} i λ^i/i! + Σ_{i=0}^{∞} λ^i/i! )

    = λ e^(−λ) ( Σ_{i=1}^{∞} λ^i/(i − 1)! + e^λ ) = λ e^(−λ) ( λ Σ_{m=0}^{∞} λ^m/m! + e^λ )

    = λ e^(−λ) (λ e^λ + e^λ ) = λ² + λ        (3.9)

In general, E(X k ) is known as the k th moment of RV X. Thus if X ∼ P (λ), its second

moment is λ2 + λ.

3.2 Variance of a RV

Mean alone cannot truly represent the pdf of a RV. As an example to illustrate

this, consider two Gaussian RVs X1 ∼ ℵ(0, 1) and X2 ∼ ℵ(0, 10). Both of them have

the same mean. However, as Figure 3.1 shows, their pdf s are quite different. One is more

concentrated around the mean, whereas the other one has a wider spread. Clearly, we need

at least an additional parameter to measure this spread around the mean!

For a RV X with mean µ , X − µ represents the deviation of the RV from its mean.

Since this deviation can be either positive or negative, consider the quantity (X − µ)2 , and

its average value E[(X − µ)2 ] represents the average mean square deviation of X around its

mean. Define

σX² = E[(X − µ)²] > 0        (3.10)

(Plot: two zero-mean Gaussian pdf s with σ² = 1 and σ² = 4.)
Figure 3.1: The impact of the variance on pdf of Normal distribution

With g(X) = (X − µ)2 and using Equation (3.7) we get


Z ∞
2
σX = (X − µ)2 fx (x) dx > 0 (3.11)
−∞

σX² is known as the variance of the RV X, and its square root σX = √(E[(X − µ)²]) is

known as the standard deviation of X. Note that the standard deviation represents the root

mean square spread of the RV X around its mean µ.


Var(X) = σX² = ∫_{−∞}^{∞} (x² − 2xµ + µ²) fX (x) dx

        = ∫_{−∞}^{∞} x² fX (x) dx − 2µ ∫_{−∞}^{∞} x fX (x) dx + µ²

        = E(X²) − µ² = E(X²) − [E(X)]² = X̄² − (X̄)²        (3.12)

• For a Poisson RV, we can obtain that

σX² = E(X²) − [E(X)]² = X̄² − (X̄)² = (λ² + λ) − λ² = λ

Thus for a Poisson RV, mean and variance are both equal to its parameter λ.

• The variance of the normal RV ℵ(µ, σ 2) can be obtained as


Var(X) = E[(X − µ)²] = ∫_{−∞}^{∞} (x − µ)² (1/√(2πσ²)) e^(−(x−µ)²/(2σ²)) dx        (3.13)

To simplify the above integral, we can make use of the identity

∫_{−∞}^{∞} fX (x) dx = ∫_{−∞}^{∞} (1/√(2πσ²)) e^(−(x−µ)²/(2σ²)) dx = 1

which gives
∫_{−∞}^{∞} e^(−(x−µ)²/(2σ²)) dx = √(2π) σ

Differentiating both sides of above with respect to σ, we get


∫_{−∞}^{∞} [(x − µ)²/σ³] e^(−(x−µ)²/(2σ²)) dx = √(2π)

or

∫_{−∞}^{∞} [(x − µ)²/√(2πσ²)] e^(−(x−µ)²/(2σ²)) dx = σ²

which represents the Var(X) in Equation (3.13). Thus for a normal RV ℵ(µ, σ 2 ),

Var(X) = σ 2 ,

therefore the second parameter in ℵ(µ, σ 2 ) in fact represents the variance. As Figure

3.1 shows, the larger the σ, the larger the spread of the pdf around its mean. Thus as

the variance of a RV tends to zero, it will begin to concentrate more and more around

the mean, ultimately behaving like a constant.

3.3 Moments

As remarked earlier, in general

mn = X¯n = E(X n ) n ≥ 1 (3.14)

are known as the moments of the RV X, and

µn = E[(X − µ)n ]

are known as the central moments of X. Clearly, the mean µ = m1 , and the variance

σ² = µ2 . It is easy to relate mn and µn . In fact, for n > 1, using the relation

(a + b)^n = Σ_{k=0}^{n} (n choose k) a^k b^(n−k),

we get

µn = E[(X − µ)^n ] = E[ Σ_{k=0}^{n} (n choose k) X^k (−µ)^(n−k) ]

   = Σ_{k=0}^{n} (n choose k) E(X^k ) (−µ)^(n−k) = Σ_{k=0}^{n} (n choose k) mk (−µ)^(n−k)        (3.15)

Direct calculation is often a tedious procedure to compute the mean and variance, and in

this context, the notion of the characteristic function can be quite helpful.

3.4 Characteristic Function (CF)

The characteristic function of a RV X is defined as


ΦX (ω) = E(e^(jωX) ) = ∫_{−∞}^{∞} e^(jωx) fX (x) dx        (3.16)

Thus ΦX (0) = 1 and |ΦX (ω)| ≤ 1 for all ω.

For discrete RVs the characteristic function is

ΦX (ω) = Σ_k e^(jkω) P (X = k)        (3.17)

• If X ∼ P (λ) (Poisson distribution), then its characteristic function is given by

ΦX (ω) = Σ_{k=0}^{∞} e^(jkω) e^(−λ) λ^k/k! = e^(−λ) Σ_{k=0}^{∞} (λe^(jω))^k/k! = e^(−λ) e^(λe^(jω)) = e^(λ(e^(jω)−1))        (3.18)

• If X is a binomial RV, its characteristic function is given by

ΦX (ω) = Σ_{k=0}^{n} e^(jkω) (n choose k) p^k q^(n−k) = Σ_{k=0}^{n} (n choose k) (pe^(jω))^k q^(n−k) = (pe^(jω) + q)^n        (3.19)

3.4.1 CF and Moment

To illustrate the usefulness of the characteristic function of a RV in computing its moments,

first it is necessary to derive the relationship between them.


ΦX (ω) = E(e^(jωX) ) = E[ Σ_{k=0}^{∞} (jωX)^k/k! ] = Σ_{k=0}^{∞} j^k E(X^k ) ω^k/k!

        = 1 + jE(X)ω + j² E(X²) ω²/2! + · · · + j^k E(X^k ) ω^k/k! + · · ·        (3.20)

where we have used e^λ = Σ_{k=0}^{∞} λ^k/k!. Taking the first derivative of Equation (3.20) with respect

to ω and then setting ω = 0, we get

to ω, and letting it to be equal to zero, we get

∂ΦX (ω)/∂ω |_{ω=0} = jE(X)    or    E(X) = (1/j) ∂ΦX (ω)/∂ω |_{ω=0}        (3.21)

Similarly, the second derivative of Equation (3.20) gives

E(X²) = (1/j²) ∂²ΦX (ω)/∂ω² |_{ω=0}        (3.22)

and repeating this procedure k times, we obtain the k th moment of X to be

E(X^k ) = (1/j^k ) ∂^k ΦX (ω)/∂ω^k |_{ω=0},    k ≥ 1        (3.23)

We can use Equations (3.20)-(3.22) to compute the mean, variance and other higher order

moments of any RV X.

• if X ∼ P (λ), then from Equation (3.18),

∂ΦX (ω)/∂ω = e^(−λ) e^(λe^(jω)) jλe^(jω)        (3.24)

so that from Equation (3.21)

E(X) = λ

which agrees with our earlier derivation in Equation (3.3). Differentiating Equation

(3.24) one more time, we get

∂²ΦX (ω)/∂ω² = e^(−λ) [ e^(λe^(jω)) (jλe^(jω))² + e^(λe^(jω)) j²λe^(jω) ]        (3.25)

so that from Equation (3.22),

E(X 2 ) = λ2 + λ

which again agrees with the result in Equation (3.9). Notice that compared to the tedious

calculations in Equations (3.3) to (3.9), the effort involved in using the CF is

minimal.

• We can use the characteristic function of the binomial RV B(n, p) in Equation (3.19)

to obtain its variance. Direct differentiation gives

∂ΦX (ω)/∂ω = jnpe^(jω) (pe^(jω) + q)^(n−1)        (3.26)

so that from Equation (3.21), E(X) = np, which is the same as previous calculation.

One more differentiation of Equation (3.26) yields

∂²ΦX (ω)/∂ω² = j²np[ e^(jω) (pe^(jω) + q)^(n−1) + (n − 1)pe^(2jω) (pe^(jω) + q)^(n−2) ]        (3.27)

and using Equation (3.22), we obtain the second moment of the binomial r.v to be

E(X 2 ) = np(1 + (n − 1)p) = n2 p2 + npq

Therefore, we obtain the variance of the binomial r.v to be

σX² = E(X²) − [E(X)]² = n²p² + npq − n²p² = npq

• To obtain the characteristic function of the Gaussian r.v, we can make use of the

definition. Thus if X ∼ N(µ, σ²), then

ΦX (ω) = ∫_{−∞}^{∞} e^(jωx) (1/√(2πσ²)) e^(−(x−µ)²/(2σ²)) dx        (let x − µ = y)

        = e^(jµω) (1/√(2πσ²)) ∫_{−∞}^{∞} e^(jωy) e^(−y²/(2σ²)) dy = e^(jµω) (1/√(2πσ²)) ∫_{−∞}^{∞} e^(−(y/(2σ²))(y−2jσ²ω)) dy

        (let y − jσ²ω = z, so that y = z + jσ²ω)

        = e^(jµω) (1/√(2πσ²)) ∫_{−∞}^{∞} e^(−(z+jσ²ω)(z−jσ²ω)/(2σ²)) dz

        = e^(jµω) e^(−σ²ω²/2) (1/√(2πσ²)) ∫_{−∞}^{∞} e^(−z²/(2σ²)) dz = e^(jµω−σ²ω²/2)        (3.28)

Notice that the characteristic function of a Gaussian r.v itself has the “Gaussian” bell

shape. Thus if X ∼ ℵ(0, σ²), then

fX (x) = (1/√(2πσ²)) e^(−x²/(2σ²)),        ΦX (ω) = e^(−σ²ω²/2)

The reverse roles of σ² in fX (x) and ΦX (ω) are noteworthy (σ² vs. 1/σ²).

3.5 Chebychev Inequality

We conclude this section with a bound that estimates the dispersion of the r.v beyond a

certain interval centered around its mean. Since σ 2 measures the dispersion of the RV X

around its mean µ, we expect this bound to depend on σ 2 as well. Consider an interval of

width 2ǫ symmetrically centered around its mean µ shown as in Figure 3.2. What is the

probability that X falls outside this interval? We need

P (|X − µ| ≥ ǫ) =? (3.29)

Figure 3.2: Chebyshev inequality concept

To compute this probability, we can start with the definition of σ²:

σ² = E[(X − µ)²] = ∫_{−∞}^{∞} (x − µ)² fX (x) dx ≥ ∫_{|x−µ|≥ǫ} (x − µ)² fX (x) dx

   ≥ ǫ² ∫_{|x−µ|≥ǫ} fX (x) dx = ǫ² P (|X − µ| ≥ ǫ)        (3.30)
|x−µ|≥ǫ |x−µ|≥ǫ

From Equation (3.30), we obtain the desired probability to be

P (|X − µ| ≥ ǫ) ≤ σ²/ǫ²        (3.31)

Equation (3.31) is known as the Chebyshev inequality. Interestingly, to compute the above

probability bound, the knowledge of fX (x) is not necessary. We only need σ 2 , the variance

of the RV. In particular with ǫ = kσ in Equation (3.31) we obtain

P (|X − µ| ≥ kσ) ≤ 1/k²        (3.32)

Thus with k = 3, we get the probability of X being outside the 3σ interval around its mean

to be at most 1/9 ≈ 0.111 for any RV. Obviously this cannot be a tight bound as it includes all RVs. For

example, in the case of a Gaussian RV, from Table (µ = 0, σ = 1):

P (|X − µ| ≥ 3σ) = 0.0027

which is much smaller than the bound given by Equation (3.32). The Chebyshev inequality always

overestimates the exact probability.

Example 1

If the height X of a randomly chosen adult has expected value E[X] = 5.5 feet and standard

deviation σX = 1 foot, use the Chebyshev inequality to find an upper bound on P (X ≥ 11)

Solution: Since X is nonnegative, the probability that X ≥ 11 can be written as

P [X ≥ 11] = P [X − µX ≥ 11 − µX ] ≤ P [|X − µX | ≥ 5.5]

Now we use the Chebyshev inequality to obtain

P [X ≥ 11] ≤ P [|X − µX | ≥ 5.5] ≤ Var[X]/5.5² = 0.033 ≃ 1/30

We can see that the Chebyshev inequality is a loose bound. In fact, P [X ≥ 11] is orders of

magnitude lower than 1/30. Otherwise, we would expect often to see a person over 11 feet

tall in a group of 30 or more people!

Example 2

If X is uniformly distributed over the interval (0, 10), then, as E[X] = 5, Var(X) = 25/3, it

follows from Chebyshev’s inequality that

P (|X − 5| > 4) ≤ σ²/ǫ² = (25/3) · (1/16) ≃ 0.52
ǫ 3 16

whereas the exact result is

P (|X − 5| > 4) = 0.20

Thus, although Chebyshev's inequality is correct, the upper bound that it provides is not

particularly close to the actual probability.

Similarly, if X is a normal random variable with mean µ and variance σ 2 , Chebyshev’s

inequality states that


P (|X − µ| > 2σ) ≤ 1/4
whereas the actual probability is given by

P (|X − µ| > 2σ) = P (|X − µ|/σ > 2) = 2[1 − Φ(2)] ≃ 0.0456

Chebyshev's inequality is often used as a theoretical tool for proving results.
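The looseness of the bound in the uniform example can also be seen by simulation; a short Python sketch:

import numpy as np

mu, var, eps = 5.0, 25.0 / 3.0, 4.0
print(var / eps**2)                      # Chebyshev bound ≈ 0.52

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 1_000_000)        # X ~ U(0, 10)
print(np.mean(np.abs(x - mu) > eps))     # exact probability ≈ 0.20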

3.6 Functions of a Random Variable

Let X be a RV and suppose g(x) is a function of the variable x. Define

Y = g(X)

Is Y necessarily a RV? If so, what are its CDF FY (y) and pdf fY (y)?

Example 3

Y = aX + b

Solution: Suppose a > 0. Then

FY (y) = P (Y ≤ y) = P (aX + b ≤ y) = P (X ≤ (y − b)/a) = FX ((y − b)/a)

and

fY (y) = (1/a) fX ((y − b)/a)

On the other hand, if a < 0, then

FY (y) = P (Y ≤ y) = P (aX + b ≤ y) = P (X > (y − b)/a) = 1 − FX ((y − b)/a)

and hence

fY (y) = −(1/a) fX ((y − b)/a)

Therefore, we obtain (for all a ≠ 0)

fY (y) = (1/|a|) fX ((y − b)/a)

Example 4

Y = X2

FY (y) = P (Y ≤ y) = P (X 2 ≤ y)

if y < 0, then the event {X 2 ≤ y} = φ and hence

FY (y) = 0 y<0

For y > 0, the event {Y ≤ y} = {X² ≤ y} is equivalent to {x1 < X ≤ x2 }, where x1 = −√y and x2 = √y.

Hence,

FY (y) = P (x1 < X ≤ x2 ) = FX (x2 ) − FX (x1 ) = FX (√y) − FX (−√y),    y > 0

By direct differentiation, we get

fY (y) = { [1/(2√y)] [fX (√y) + fX (−√y)],  y > 0;   0,  o.w.        (3.33)

If fX (x) represents an even function, then Equation (3.33) reduces to

fY (y) = (1/√y) fX (√y) U(y)        (3.34)

In particular if X ∼ ℵ(0, 1), so that

fX (x) = (1/√(2π)) e^(−x²/2)        (3.35)

and substituting this into Equation (3.33) or Equation (3.34), we obtain the pdf of Y = X 2

to be

fY (y) = (1/√(2πy)) e^(−y/2) U(y)        (3.36)

3.6.1 General Approach

As a general approach, given Y = g(X), first sketch the graph y = g(x), and determine the

range space of y. Suppose a < y < b is the range space of y = g(x).

• For y < a, FY (y) = 0

• For y > b, FY (y) = 1

• FY (y) can be nonzero only in a < y < b.

• Next, determine whether there are discontinuities in the range space of y. If so evaluate

P (Y (ζ) = yi ) at these discontinuities.

• In the continuous region of y, use the basic approach

FY (y) = P (g(X) ≤ y)

and determine appropriate events in terms of the RV X for every y. Finally, we must

have FY (y) for −∞ < y < ∞ and obtain

fY (y) = dFY (y)/dy,    a < y < b

However, if Y = g(X) is a continuous function, it is easy to establish a direct procedure to

obtain fY (y). Suppose g(x) is differentiable, with derivative g′(x), and has a finite

number of maxima and minima.

The pdf of Y can be expressed as

fY (y) = Σ_i [1/|dy/dx|_{x=xi }] fX (xi ) = Σ_i [1/|g′(xi )|] fX (xi )        (3.37)

The summation index i in Equation (3.37) depends on y, and for every y the equation

y = g(xi ) must be solved to obtain the total number of solutions at every y, and the actual

solutions x1 , x2 , · · · all in terms of y.

Repeat Example 4 using this approach to verify the answer.

Example 5

Let Y = 1/X, find fY (y).

Solution: Here for every y, x1 = 1/y is the only solution, and

dy/dx = −1/x²,  so that  |dy/dx|_{x=x1 } = 1/x1² = y²

and substituting this into Equation (3.37), we obtain

fY (y) = (1/y²) fX (1/y)        (3.38)
y y

3.6.2 Functions of A Discrete-type RV

Suppose X is a discrete-type RV with

P (X = xi ) = pi , x = x1 , x2 , . . . , xi , · · · ,

and Y = g(X). Clearly Y is also of discrete-type, and when x = xi , yi = g(xi ), and for those

yi ,

P (Y = yi ) = P (X = xi ) = pi , y = y1 , y2 , · · · , yi , · · ·

Example 6

Suppose X ∼ P (λ) so that

P (X = k) = e^(−λ) λ^k/k!,  k = 0, 1, 2, · · ·

Define Y = X 2 + 1. Find the pmf of Y .

Solution: X takes the values 0, 1, 2, · · · , k, · · · , so that Y only takes the values 1, 2, 5, · · · , k 2 +

1, · · · ,

P (Y = k 2 + 1) = P (X = k)

so that for j = k² + 1

P (Y = j) = P (X = √(j − 1)) = e^(−λ) λ^(√(j−1)) / (√(j − 1))!,    j = 1, 2, 5, · · · , k² + 1, · · ·

Lecture 4

Distribution and Density Functions of

Two Random Variables

4.1 Two Random Variables

In many experiments, the observations are expressible not as a single quantity, but as a family

of quantities. For example to record the height and weight of each person in a community

or the number of people and the total income in a family, we need two numbers. Let X and

Y denote two random variables (r.v) based on a probability model (Ω, F, P ). Then
P (x1 < X(ζ) < x2 ) = FX (x2 ) − FX (x1 ) = ∫_{x1}^{x2} fX (x) dx

and

P (y1 < Y (ζ) < y2 ) = FY (y2 ) − FY (y1 ) = ∫_{y1}^{y2} fY (y) dy

What about the probability that the pair of RVs (X, Y ) belongs to an arbitrary region D?

In other words, how does one estimate, for example

P ((x1 < X(ζ) < x2 ) ∩ (y1 < Y (ζ) < y2 )) = ?

Towards this, we define the joint probability distribution function of X and Y to be

FXY (x, y) = P ((X(ζ) ≤ x) ∩ (Y (ζ) ≤ y))

            = P (X ≤ x, Y ≤ y) ≥ 0        (4.1)

where x and y are arbitrary real numbers.

4.1.1 Properties

1.

FXY (−∞, y) = FXY (x, −∞) = 0,        FXY (+∞, +∞) = 1        (4.2)

Since (X(ζ) ≤ −∞, Y (ζ) ≤ y) ⊂ (X(ζ) ≤ −∞), we get

FXY (−∞, y) ≤ P (X(ζ) ≤ −∞) = 0

Similarly, since (X(ζ) ≤ +∞, Y (ζ) ≤ +∞) = Ω, we get FXY (+∞, +∞) = P (Ω) = 1

2.

P (x1 < X(ζ) < x2 , Y (ζ) ≤ y) = FXY (x2 , y) − FXY (x1 , y) (4.3)

P (X(ζ) ≤ x, y1 < Y (ζ) < y2 ) = FXY (x, y2 ) − FXY (x, y1 ) (4.4)

To prove the above equations, we note that for x2 > x1

(X(ζ) ≤ x2 , Y (ζ) ≤ y) = (X(ζ) ≤ x1 , Y (ζ) ≤ y) ∪ (x1 < X(ζ) ≤ x2 , Y (ζ) ≤ y)

and the mutually exclusive property of the events on the right side gives

P (X(ζ) ≤ x2 , Y (ζ) ≤ y) = P (X(ζ) ≤ x1 , Y (ζ) ≤ y) + P (x1 < X(ζ) ≤ x2 , Y (ζ) ≤ y)

which proves Equation (4.3). Similarly one can prove Equation (4.4).

3.

P (x1 < X(ζ) ≤ x2 , y1 < Y (ζ) ≤ y2) = FXY (x2 , y2 ) − FXY (x2 , y1 )

− FXY (x1 , y2 ) + FXY (x1 , y1) (4.5)

This is the probability that (X, Y ) belongs to the rectangle in Figure 4.1. To prove

Figure 4.1: Two dimensional RV.

Equation (4.5), we can make use of the following identity involving mutually exclusive

events on the right side.

(x1 < X(ζ) ≤ x2 , Y (ζ) ≤ y2 ) = (x1 < X(ζ) ≤ x2 , Y (ζ) ≤ y1 )

∪ (x1 < X(ζ) ≤ x2 , y1 < Y (ζ) ≤ y2 )

This gives

P (x1 < X(ζ) ≤ x2 , Y (ζ) ≤ y2 ) = P (x1 < X(ζ) ≤ x2 , Y (ζ) ≤ y1 )

+ P (x1 < X(ζ) ≤ x2 , y1 < Y (ζ) ≤ y2 )

and the desired result in Equation (4.5) follows by making use of Equation (4.3) with

y = y2 and y1 respectively.

4.2 Joint Probability Density Function (Joint pdf )

By definition, the joint pdf of X and Y is given by

fXY (x, y) = ∂²FXY (x, y)/∂x∂y        (4.6)

and hence we obtain the useful formula


Z x Z y
FXY (x, y) = fXY (u, v)dudv (4.7)
−∞ −∞

Using the properties in Section 4.1.1,

∫_{−∞}^{∞} ∫_{−∞}^{∞} fXY (x, y) dx dy = 1        (4.8)

P ((X, Y ) ∈ R0 ) = ∫∫_{(x,y)∈R0 } fXY (x, y) dx dy        (4.9)

4.3 Marginal Statistics

In the context of several RV s, the statistics of each individual ones are called marginal

statistics. Thus FX (x) is the marginal probability distribution function of X, and fX (x) is

the marginal pdf of X. It is interesting to note that all marginal can be obtained from the

joint pdf. In fact

FX (x) = FXY (x, +∞) FY (y) = FXY (+∞, y) (4.10)

Also
fX (x) = ∫_{−∞}^{∞} fXY (x, y) dy,        fY (y) = ∫_{−∞}^{∞} fXY (x, y) dx        (4.11)

To prove Equation (4.10), we can make use of the identity

(X ≤ x) = (X ≤ x) ∩ (Y ≤ +∞)

So that

FX (x) = P (X ≤ x) = P (X ≤ x, Y ≤ ∞) = FXY (x, +∞) (4.12)

To prove Equation (4.11), we can make use of Equation (4.7) and Equation (4.10), which give

FX (x) = FXY (x, +∞) = ∫_{−∞}^{x} ∫_{−∞}^{+∞} fXY (u, y) dy du

and taking the derivative with respect to x, we get

fX (x) = ∫_{−∞}^{+∞} fXY (x, y) dy        (4.13)

If X and Y are discrete RV s, then pij = P (X = xi , Y = yj ) represents their joint pmf , and

their respective marginal pmf s are given by


P (X = xi ) = Σ_j P (X = xi , Y = yj ) = Σ_j pij

and        P (Y = yj ) = Σ_i P (X = xi , Y = yj ) = Σ_i pij        (4.14)

Assuming that P (X = xi , Y = yj ) is written out in the form of a rectangular array, to obtain

P (X = xi ) from Equation (4.14), one needs to add up all the entries in the ith row.

4.3.1 Examples

From Equation (4.11), the joint cdf and/or the joint pdf represent complete information about the

RV s, and their marginal pdf s can be evaluated from the joint pdf . However, given the marginals

alone, it will (most often) not be possible to compute the joint pdf .

Example 1

Given

fXY (x, y) = { c (a constant),  0 < x < y < 1;   0,  o.w.        (4.15)

Obtain the marginal pdf s fX (x) and fY (y).

Solution: It is given that the joint pdf fXY (x, y) is a constant in the shaded region in

Fig. 4.2 We can use Equation (4.16) to determine that constant c.

Figure 4.2: Diagram for the example.

∫_{−∞}^{+∞} ∫_{−∞}^{+∞} fXY (x, y) dx dy = ∫_{y=0}^{1} ( ∫_{x=0}^{y} c dx ) dy = ∫_{y=0}^{1} cy dy = c/2 = 1        (4.16)

Thus c = 2. Moreover
fX (x) = ∫_{−∞}^{∞} fXY (x, y) dy = ∫_{y=x}^{1} 2 dy = 2(1 − x),    0 < x < 1

Similarly
fY (y) = ∫_{−∞}^{∞} fXY (x, y) dx = ∫_{x=0}^{y} 2 dx = 2y,    0 < y < 1

Clearly, in this case given fX (x) and fY (y) as above, it will not be possible to obtain the

original joint pdf in Equation (4.15).

Example 2

X and Y are said to be jointly normal (Gaussian) distributed, if their joint pdf has the

following form:

fXY (x, y) = [1/(2πσX σY √(1 − ρ²))] exp{ −[1/(2(1 − ρ²))] [ (x − µX )²/σX² − 2ρ(x − µX )(y − µY )/(σX σY ) + (y − µY )²/σY² ] },

−∞ < x < ∞,    −∞ < y < ∞,    |ρ| < 1        (4.17)
−∞ < x < ∞ − ∞ < y < ∞, |ρ| < 1 (4.17)

By direct integration, it can be shown that

fX (x) = [1/√(2πσX²)] exp[−(x − µX )²/(2σX²)] ∼ ℵ(µX , σX²)

Similarly

fY (y) = [1/√(2πσY²)] exp[−(y − µY )²/(2σY²)] ∼ ℵ(µY , σY²)

Following the above notation, we will denote Equation (4.17) as ℵ(µX , µY , σX², σY², ρ).

Once again, knowing the marginals in above alone doesn’t tell us everything about the

joint pdf in Equation (4.17). As we show below, the only situation where the marginal

pdf s can be used to recover the joint pdf is when the random variables are statistically

independent.

4.4 Independence of RVs

Definition: The random variables X and Y are said to be statistically independent if

P [(X(ζ) ≤ x) ∩ (Y (ζ) ≤ y)] = P (X(ζ) ≤ x) · P (Y (ζ) ≤ y)

• For continuous RV s,

FXY (x, y) = FX (x)FY (y) (4.18)

or equivalently, if X and Y are independent, then we must have

fXY (x, y) = fX (x)fY (y) (4.19)

• If X and Y are discrete-type RV s then their independence implies

P (X = xi , Y = yj ) = P (X = xi ) · P (Y = yj ) ∀ i, j (4.20)

Equations (4.18)-(4.20) give us the procedure to test for independence. Given fXY (x, y),

obtain the marginal pdf s fX (x) and fY (y) and examine whether one of equations in

(4.18) or (4.20) is valid. If so, the RV s are independent, otherwise they are dependent.

• Returning to Example 1, we observe by direct verification that fXY (x, y) ≠ fX (x) ·

fY (y). Hence X and Y are dependent RV s in that case.

• It is easy to see that such is the case in Example 2 also, unless ρ = 0. In other

words, two jointly Gaussian RV s as in Equation (4.17) are independent if and only if

the fifth parameter ρ = 0.

4.5 Expectation of Functions of RVs

If X and Y are random variables and g(·) is a function of two variables, then

E[g(X, Y )] = Σ_y Σ_x g(x, y) · p(x, y)        (discrete case)

            = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) fXY (x, y) dx dy        (continuous case)        (4.21)

If g(X, Y ) = aX + bY , then we can obtain

E[g(X, Y )] = aE[X] + bE[Y ]

• If X and Y are independent, then for any functions h(·) and g(·)

E[g(X)h(Y )] = E[g(X)] · E[h(Y )]

And

Var(X + Y ) = Var(X) + Var(Y )

Example 3

Random variables X1 and X2 are independent and identically distributed with probability

density function

fX (x) = { 1 − x/2,  0 ≤ x ≤ 2;   0,  o.w.
Find

• (a) The joint pdf fX1 X2 (x1 , x2 )

• (b) The cdf of Z = max(X1 , X2 ).

Solution: (a) Since X1 and X2 are independent,

fX1 X2 (x1 , x2 ) = fX1 (x1 ) · fX2 (x2 ) = { (1 − x1 /2)(1 − x2 /2),  0 ≤ x1 ≤ 2, 0 ≤ x2 ≤ 2;   0,  o.w.

(b) Let FX (x) denote the CDF of both X1 and X2 . The CDF of Z = max(X1 , X2 ) is found

by observing that Z ≤ z iff X1 ≤ z and X2 ≤ z. That is

P (Z ≤ z) = P (X1 ≤ z, X2 ≤ z) = P (X1 ≤ z)P (X2 ≤ z) = [FX (z)]2



FX (x) = ∫_{−∞}^{x} fX (t) dt =
    0,                                   x < 0;
    ∫_{0}^{x} (1 − t/2) dt = x − x²/4,   0 ≤ x ≤ 2;
    1,                                   x > 2.

Thus, for 0 ≤ z ≤ 2,
FZ (z) = (z − z²/4)²

The complete CDF of Z is

FZ (z) =
    0,              z < 0;
    (z − z²/4)²,    0 ≤ z ≤ 2;
    1,              z > 2.
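The result can be checked by simulation (my own sketch). Samples of X1 and X2 are drawn by inverting FX(x) = x − x²/4, and the empirical CDF of Z = max(X1, X2) is compared with [FX(z)]²:

import numpy as np

rng = np.random.default_rng(2)
n = 500_000

def sample_x(size):
    # Inverse-CDF sampling: FX(x) = x - x**2/4 = u  =>  x = 2 - 2*sqrt(1 - u).
    u = rng.random(size)
    return 2.0 - 2.0 * np.sqrt(1.0 - u)

z = np.maximum(sample_x(n), sample_x(n))

for zz in (0.5, 1.0, 1.5):
    print(zz, np.mean(z <= zz), (zz - zz**2 / 4.0) ** 2)   # empirical vs theoretical CDF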

Example 4

Given

fXY (x, y) =
    x y² e^{−y},   0 < y < ∞, 0 < x < 1;
    0,             o.w.
Determine whether X and Y are independent.

Solution:

fX (x) = ∫_{0}^{∞} fXY (x, y) dy = x ∫_{0}^{∞} y² e^{−y} dy = x ∫_{0}^{∞} (−y²) d(e^{−y})

       = x [ −y² e^{−y} |_{0}^{∞} + 2 ∫_{0}^{∞} y e^{−y} dy ] = 2x,        0 < x < 1

Similarly

fY (y) = ∫_{0}^{1} fXY (x, y) dx = (y²/2) e^{−y},        0 < y < ∞

In this case

fXY (x, y) = fX (x) · fY (y)

and hence X and Y are independent random variables.

4.6 Correlation and Covariance

1 Correlation: Given any two RV s X and Y , define

E[X^m Y^n]                                      (m, n)th joint moment

E[XY ] = Corr(X, Y ) = RXY                      correlation of X and Y

E[(X − µX )^m (Y − µY )^n]                      (m, n)th central joint moment

E[(X − µX )(Y − µY )] = Cov(X, Y ) = KXY        covariance of X and Y

2 Covariance: Given any two RV s X and Y , define

Cov(X, Y ) = E[(X − µX )(Y − µY )]

By expanding and simplifying the right side of the above equation, we also get

Cov(X, Y ) = E(XY ) − µX × µY = E(XY ) − E(X)E(Y )

3 Correlation coefficient between X and Y

ρXY = Cov(X, Y ) / √(Var(X) Var(Y )) = Cov(X, Y ) / (σX σY),        −1 ≤ ρXY ≤ 1

Cov(X, Y ) = ρXY σX σY

4 Uncorrelated RV s : If ρXY = 0, then X and Y are said to be uncorrelated RV s. If

X and Y are uncorrelated, then

E(XY ) = E(X)E(Y ) (4.22)

5 Orthogonality X and Y are said to be orthogonal if

E(XY ) = 0

From above, if either X or Y has zero mean, then orthogonality implies uncorrelated-

ness also and vice-versa.

Suppose X and Y are independent RV s,

E(XY ) = E(X)E(Y ),

therefore from Equation (4.22), we conclude that the random variables are uncorrelated.

Thus independence implies uncorrelatedness (ρXY = 0). But the converse is generally not

true.
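A standard counterexample (sketched below; it is my addition, not from the notes) is X ∼ ℵ(0, 1) and Y = X²: Y is completely determined by X, yet Cov(X, Y) = E[X³] = 0, so the pair is uncorrelated but clearly not independent:

import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(1_000_000)
y = x**2                                            # dependent on X by construction

cov_xy = np.mean(x * y) - x.mean() * y.mean()
print(cov_xy)                                       # ~ 0: uncorrelated
# Dependence shows up in higher-order statistics, e.g. E[X^2 Y] != E[X^2] E[Y]:
print(np.mean(x**2 * y), np.mean(x**2) * np.mean(y))   # ~ 3 vs ~ 1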

Example 5

Let Z = aX + bY . Determine the variance of Z in terms of σX , σY and ρXY .

Solution:

µZ = E(Z) = E(aX + bY ) = aµX + bµY

and

σZ² = Var(Z) = E[(Z − µZ)²] = E{ [a(X − µX) + b(Y − µY)]² }

    = a² E[(X − µX)²] + 2ab E[(X − µX)(Y − µY)] + b² E[(Y − µY)²]

    = a² σX² + 2ab ρXY σX σY + b² σY²

In particular if X and Y are independent, then ρXY = 0, and the above equation reduces to

σZ² = a² σX² + b² σY²

Thus the variance of the sum of independent RV s is the sum of their variances (a = b = 1).
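The general formula can be verified by simulation (my sketch, with one arbitrarily assumed parameter set) by constructing correlated Gaussian X and Y with a prescribed ρXY:

import numpy as np

rng = np.random.default_rng(4)
a, b = 2.0, -1.5
s_x, s_y, rho = 1.5, 0.8, 0.6
n = 1_000_000

z1 = rng.standard_normal(n)
z2 = rng.standard_normal(n)
x = 1.0 + s_x * z1                                       # mean 1, std s_x
y = -2.0 + s_y * (rho * z1 + np.sqrt(1 - rho**2) * z2)   # mean -2, std s_y, corr rho with x

z = a * x + b * y
predicted = a**2 * s_x**2 + 2 * a * b * rho * s_x * s_y + b**2 * s_y**2
print(z.var(), predicted)                                # both ~ 6.12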

6 Moments
E[X^m Y^n] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x^m y^n fXY (x, y) dx dy

represents the joint moment of order (m, n) for X and Y .

7 Joint Characteristic Function

Following the one random variable case, we can define the joint characteristic function

between two random variables which will turn out to be useful for moment calculations.

The joint characteristic function between X and Y is defined as


ΦXY (ω1 , ω2 ) = E[ e^{j(Xω1 + Y ω2)} ] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} e^{j(xω1 + yω2)} fXY (x, y) dx dy

From this and the two-dimensional inversion formula for Fourier transforms, it follows

that
fXY (x, y) = (1/4π²) ∫_{−∞}^{∞} ∫_{−∞}^{∞} ΦXY (ω1 , ω2 ) e^{−j(ω1 x + ω2 y)} dω1 dω2

Note that

|ΦXY (ω1 , ω2 )| ≤ ΦXY (0, 0) = 1

If X and Y are independent RV s, then

ΦXY (ω1 , ω2 ) = ΦX (ω1 )ΦY (ω2 )

Also

ΦX (ω) = ΦXY (ω, 0) ΦY (ω) = ΦXY (0, ω)

Convolution

Characteristic functions are useful in determining the pdf of linear combinations of

RV s. If the RV s X and Y are independent and Z = X + Y , then

E[ejωZ ] = E[ejω(X+Y ) ] = E[ejωX ] · E[ejωY ]

Hence,

ΦZ (ω) = ΦX (ω) · ΦY (ω)

It is known that the density of Z equals the convolution of fX (x) and fY (y). From

above, the characteristic function of the convolution of two densities equals the product

of their characteristic functions.

Example 6

X and Y are independent Poisson RV s with parameters λ1 and λ2 respectively. Let Z = X + Y .

Then

ΦZ (ω) = ΦX (ω) · ΦY (ω)

From earlier results

ΦX (ω) = e^{λ1(e^{jω} − 1)},        ΦY (ω) = e^{λ2(e^{jω} − 1)}

so that

ΦZ (ω) = e^{(λ1 + λ2)(e^{jω} − 1)} ∼ P (λ1 + λ2 )

i.e., sum of independent Poisson RV s is also a Poisson random variable.
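This closure property is easy to confirm empirically (my own sketch, with assumed rates λ1 = 2 and λ2 = 3.5): the pmf of X + Y matches the pmf of a single Poisson(λ1 + λ2) variable:

import numpy as np

rng = np.random.default_rng(5)
lam1, lam2 = 2.0, 3.5
n = 1_000_000

z = rng.poisson(lam1, n) + rng.poisson(lam2, n)      # sum of independent Poissons
w = rng.poisson(lam1 + lam2, n)                      # direct Poisson(lam1 + lam2) samples

for k in range(8):
    print(k, np.mean(z == k), np.mean(w == k))       # the two relative frequencies agree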

4.7 Central Limit Theorem

Suppose X1 , X2 , · · · , Xn is a sequence of independent, identically distributed (i.i.d) random

variables, each with mean µ and variance σ 2 . Then the distribution of

Y = (X1 + X2 + · · · + Xn − nµ) / (σ√n),

tends to the standard normal as n → ∞

Y → ℵ(0, 1)

The central limit theorem states that a large sum of independent random variables each

with finite variance tends to behave like a normal random variable. Thus the individual

pdf s become unimportant to analyze the collective sum behavior. If we model the noise

phenomenon as the sum of a large number of independent random variables (e.g.: electron

motion in resistor components), then this theorem allows us to conclude that noise behaves

like a Gaussian RV. This theorem holds for any distribution of the Xi ’s; herein lies its

power.
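The following sketch (mine) illustrates the theorem with uniform(0, 1) summands, which individually look nothing like a Gaussian; the normalized sum nevertheless matches the ℵ(0, 1) CDF closely:

import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(6)
n, trials = 50, 200_000
mu, sigma = 0.5, sqrt(1.0 / 12.0)        # mean and std of a uniform(0, 1) variable

s = rng.random((trials, n)).sum(axis=1)
y = (s - n * mu) / (sigma * sqrt(n))     # normalized sum

def phi(t):                              # standard normal CDF
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

for t in (-1.0, 0.0, 1.0, 2.0):
    print(t, np.mean(y <= t), phi(t))    # empirical vs standard normal probabilities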

Lecture 5

Stochastic Processes

Stochastic means: random

Process means: function of time

• Definition: Stochastic Process: A stochastic process X(t) consists of an experiment

with a probability measure P [·] defined on a sample space S and a function that assigns

a time function x(t, s) to each outcome s in the sample space of the experiment.

• Definition: Sample Function: A sample function x(t, s) is the time function associated

with outcome s of an experiment.

X(t) : name of the stochastic process

t : indicates the time dependence

s : indicates the particular outcome of the experiment        (5.1)

Figure 5.1: Illustration of Stochastic Process.

Figure 5.2: Example: transmit 3 binary digits to the receiver.

5.1 Types of Stochastic Processes

• Discrete Value and Continuous Value Processes: X(t) is a discrete value process if the

set of all possible values of X(t) at all times t is a countable set SX ; otherwise, X(t)

is a continuous value process.

• Discrete Time and Continuous Time Process: The stochastic process X(t) is a discrete

time process if X(t) is defined only for a set of time instants, tn = nT , where T is a

constant and n is an integer; otherwise X(t) is a continuous time process.

• Random variables from random processes: consider a sample function x(t, s), each

x(t1 , s) is a sample value of a random variable. We use X(t1 ) for this random variable.

The notation X(t) can refer to either the random process or the random variable that

corresponds to the value of the random process at time t.

• Example: in the experiment of repeatedly rolling a die, let Xn = X(nT ). What is

the pmf of X3 ?

The random variable X3 is the value of the die roll at time 3. In this case,

PX3 (x) =
    1/6,   x = 1, 2, · · · , 6;
    0,     o.w.

5.2 Independent, Identically Distributed (i.i.d) Random Sequences

An i.i.d. random sequence is a random sequence, Xn , in which

· · · , X−2 , X−1 , X0 , X1 , X2 , · · ·

are i.i.d random variables. An i.i.d random sequence occurs whenever we perform indepen-

dent trials of an experiment at a constant rate. An i.i.d random sequence can be either

discrete value or continuous value. In the discrete case, each random variable Xi has pmf

PXi (x) = PX (x), while in the continuous case, each Xi has pdf fXi (x) = fX (x).

Theorem: Let Xn denote an i.i.d random sequence. For a discrete value process, the sample

vector Xn1 , · · · , Xnk has joint pmf


PXn1 ,··· ,Xnk (x1 , · · · , xk ) = PX (x1 )PX (x2 ) · · · PX (xk ) = ∏_{i=1}^{k} PX (xi )

Otherwise, for a continuous value process, the joint pdf of Xn1 , · · · , Xnk is
fXn1 ,··· ,Xnk (x1 , · · · , xk ) = fX (x1 )fX (x2 ) · · · fX (xk ) = ∏_{i=1}^{k} fX (xi )

5.3 Expected Value and Correlation

• The Expected Value of Process: The expected value of a stochastic process X(t)

is the deterministic function

µX (t) = E[X(t)]

• Autocovariance: the autocovariance function of the stochastic process X(t) is

CX (t, τ ) = Cov[X(t), X(t + τ )]

• Autocorrelation: The autocorrelation function of the stochastic process X(t) is

RX (t, τ ) = E[X(t)X(t + τ )]

• Autocovariance and Autocorrelation of a Random Sequence:

CX [m, k] = Cov[Xm , Xm+k ] = RX [m, k] − E[Xm ]E[Xm+k ]

where m and k are integers. The autocorrelation function of the random sequence Xn

is

RX [m, k] = E[Xm Xm+k ]

Example 1

A fading envelope is sampled at t = nTs to obtain samples a(i). Then

RX (t, Ts ) = (1/N)[a(1)a(2) + a(2)a(3) + · · · + a(N)a(N+1)] = (1/N) Σ_{i=1}^{N} a(i)a(i+1)        (window length N)

similarly,

RX (t, 2Ts ) = (1/N) Σ_{i=1}^{N} a(i)a(i + 2)

and

CX (t, Ts ) = (1/N) Σ_{i=1}^{N} (a(i) − µX )(a(i + 1) − µX )
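A short sketch (mine, using a synthetic zero-mean envelope in place of real fading samples) shows how these windowed estimates are computed in practice:

import numpy as np

rng = np.random.default_rng(7)
w = rng.standard_normal(10_001)
a = w[1:] + 0.5 * w[:-1]                 # assumed samples a(i); true lag-1 correlation is 0.5

def R_hat(a, lag):
    # (1/N) * sum_i a(i) a(i + lag)
    return np.mean(a[:len(a) - lag] * a[lag:])

def C_hat(a, lag):
    mu = a.mean()
    return np.mean((a[:len(a) - lag] - mu) * (a[lag:] - mu))

print(R_hat(a, 1), C_hat(a, 1))          # ~ 0.5 for this synthetic model
print(R_hat(a, 2), C_hat(a, 2))          # ~ 0.0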

Example 2

If R is a random variable, find the expected value of the rectified cosine X(t) = R| cos 2πf t|.

Solution: The expected value of X(t) is

µX (t) = E[R| cos 2πf t|] = E[R] · | cos 2πf t|

Example 3

The input to a digital filter is an i.i.d random sequence · · · , X−1 , X0 , X1 , · · · with E[Xi ] = 0

and V ar[Xi ] = 1. The output is also a random sequence · · · , Y−1 , Y0 , Y1 , · · · . The relation-

ship between the input sequence and output sequence is expressed in the formula

Yn = Xn + Xn−1 for all integer n

Find the expected value function E[Yn ] and autocovariance function CY (m, k) of the output.

Solution: Because Yi = Xi + Xi−1 , we have E[Yi ] = E[Xi ] + E[Xi−1 ] = 0. Before calculating

CY [m, k], we observe that Xn being an i.i.d random sequence with E[Xn ] = 0 and V ar[Xn ] =

1 implies

CX [m, k] = E[Xm Xm+k ] =
    1,   k = 0;
    0,   o.w.

For any integer k, we can write

CY [m, k] = E[Ym Ym+k ] = E[(Xm + Xm−1 ) (Xm+k + Xm+k−1 )]

= E[Xm Xm+k + Xm Xm+k−1 + Xm−1 Xm+k + Xm−1 Xm+k−1 ]

= E[Xm Xm+k ] + E[Xm Xm+k−1 ] + E[Xm−1 Xm+k ] + E[Xm−1 Xm+k−1 ]

= CX [m, k] + CX [m, k − 1] + CX [m − 1, k + 1] + CX [m − 1, k]

We still need to evaluate the above expression for all k. For each value of k, some terms in

the above expression will equal zero since CX [m, k] = 0 for k 6= 0.

When k = 0

CY [m, 0] = CX [m, 0] + CX [m, −1] + CX [m − 1, 1] + CX [m − 1, 0] = 2.

When k = 1

CY [m, 1] = CX [m, 1] + CX [m, 0] + CX [m − 1, 2] + CX [m − 1, 1] = 1.

When k = −1

CY [m, −1] = CX [m, −1] + CX [m, −2] + CX [m − 1, 0] + CX [m − 1, −1] = 1.

When k = 2

CY [m, 2] = CX [m, 2] + CX [m, 1] + CX [m − 1, 3] + CX [m − 1, 2] = 0.

A complete expression for the autocovariance is

CY [m, k] =
    2,   k = 0;
    1,   k = ±1;
    0,   o.w.

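The result is easy to reproduce by simulation (my own sketch): drive the filter Yn = Xn + Xn−1 with zero-mean, unit-variance i.i.d noise and estimate CY[k]:

import numpy as np

rng = np.random.default_rng(8)
x = rng.standard_normal(1_000_000)
y = x[1:] + x[:-1]                        # Y_n = X_n + X_{n-1}

def cov_hat(y, k):
    mu = y.mean()
    return np.mean((y[:len(y) - k] - mu) * (y[k:] - mu))

for k in range(4):
    print(k, cov_hat(y, k))               # expected: 2, 1, 0, 0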
5.4 Stationary Processes

In general, for the stochastic process, X(t), there is a random variable X(t1 ) at every time

instant t1 with pdf fX(t1 ) (x) which depends on t1 . For a special class of random process

known as stationary processes fX(t1 ) (X) does not depend on t1 . That is, for any two time

instants t1 and t1 + τ

fX(t1 ) (x) = fX(t1 +τ ) (x) = fX (x)

5.4.1 Some properties of Stationary Processes

• If X(t) is a stationary process, and for a > 0, then

Y (t) = aX(t) + b is also a stationary process

• If X(t) is a stationary process, the expected value, the autocorrelation, and the auto-

covariance have the following properties for all t

(a) µX(t) = µX

(b) RX (t, τ ) = RX (0, τ ) = RX (τ )

(c) CX (t, τ ) = RX (τ ) − µX² = CX (τ )

Example 4

At the receiver of an AM radio, the received signal contains a cosine carrier signal at the

carrier frequency fc with a random phase θ that is a sample value of the uniform (0, 2π)

random variable. The received carrier signal is

X(t) = A cos(2πfc t + θ)

What are the expected value and autocorrelation of the process X(t)?

Solution: The phase has PDF

fθ (θ) =
    1/(2π),   0 ≤ θ ≤ 2π;
    0,        o.w.

For any fixed angle α and integer k,

E[cos(α + kθ)] = ∫_{0}^{2π} cos(α + kθ) (1/2π) dθ = [ sin(α + kθ)/(2πk) ]_{0}^{2π} = [ sin(α + 2kπ) − sin α ]/(2πk) = 0        (5.2)

We will use the identity cos A cos B = [cos(A−B)+cos(A+B)]/2 to find the autocorrelation:

RX (t, τ ) = E[ A cos(2πfc t + θ) · A cos(2πfc (t + τ ) + θ) ]

           = (A²/2) E[ cos(2πfc τ ) + cos(2πfc (2t + τ ) + 2θ) ]

For α = 2πfc (2t + τ ) and k = 2,

E[cos(2πfc (2t + τ ) + 2θ)] = E[cos(α + kθ)] = 0.

Thus, RX (t, τ ) = (A²/2) cos(2πfc τ ) = RX (τ ).

Therefore, X(t) has the properties of a stationary stochastic process.
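The same conclusion can be checked numerically (my sketch, with assumed values A = 2 and fc = 5): averaging over many realizations of the random phase reproduces E[X(t)] = 0 and RX(t, τ) = (A²/2) cos 2πfc τ, independently of t:

import numpy as np

rng = np.random.default_rng(9)
A, fc = 2.0, 5.0
theta = rng.uniform(0.0, 2.0 * np.pi, 200_000)

t, tau = 0.13, 0.04                       # an arbitrary time and lag
x_t = A * np.cos(2 * np.pi * fc * t + theta)
x_t_tau = A * np.cos(2 * np.pi * fc * (t + tau) + theta)

print(np.mean(x_t))                                           # ~ 0
print(np.mean(x_t * x_t_tau), (A**2 / 2) * np.cos(2 * np.pi * fc * tau))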

5.4.2 Wide Sense Stationary Stochastic Processes(WSS)

X(t) is a wide sense stationary stochastic process if and only if for all t,

E[X(t)] = µx , and RX (t, τ ) = RX (0, τ ) = RX (τ ).

Xn is a wide sense stationary random sequence if and only if for all n,

E[Xn ] = µx , and RX [n, k] = RX [0, k] = RX [k].

In Example 4, we observe that µX (t) = 0 and RX (t, τ ) = (A²/2) cos 2πfc τ . Thus the random

phase carrier X(t) is a wide sense stationary process.

Properties of WSS

The autocorrelation function of a wide sense stationary process has a number of important

properties:

1. RX (0) ≥ 0

2. RX (τ ) = RX (−τ )

3. RX (0) ≥ RX (τ )

RX (0) has an important physical interpretation for electrical engineers.

The average power of a wide sense stationary process X(t) is RX (0) = E[X 2 (t)].

Quiz : Which of the following functions are valid autocorrelation functions?

1. R1 (τ ) = e−|τ |

2. R2 (τ ) = e^{τ²}

3. R3 (τ ) = e^{−τ} cos τ

4. R4 (τ ) = e^{−τ²} sin τ

Example 5

A simple model (in degrees Celsius) for the daily temperature process C(t) is

Cn = 16[1 − cos(2πn/365)] + 4Xn

where X1 , X2 , · · · is an i.i.d random sequence of ℵ(0, 1) random variables.

(a) What is the mean E[Cn ]?

(b) Find the autocovariance function CC [m, k].

Solution:

(a) The expected value of the process is


E[Cn ] = 16 E[1 − cos(2πn/365)] + 4E[Xn ] = 16[1 − cos(2πn/365)]

(b) The autocovariance of Cn is


CC [m, k] = E[ (Cm − 16[1 − cos(2πm/365)]) (Cm+k − 16[1 − cos(2π(m + k)/365)]) ]

          = 16 E[Xm Xm+k ] =
    16,   k = 0;
    0,    o.w.

Example 6

A different model for the above example Cn is given as:

Cn = (1/2) Cn−1 + 4Xn ,

where C0 , X1 , X2 , · · · is an i.i.d random sequence of ℵ(0, 1) random variables.

a) Find the mean and variance of Cn .

b) Find the autocovariance CC [m, k].

Solution:

By repeated application of the recursion Cn = Cn−1 /2 + 4Xn , we obtain


Cn = Cn−2 /4 + 4[ Xn−1 /2 + Xn ]

   = Cn−3 /8 + 4[ Xn−2 /4 + Xn−1 /2 + Xn ]

   = · · ·

   = C0 /2ⁿ + 4[ X1 /2^{n−1} + X2 /2^{n−2} + · · · + Xn ] = C0 /2ⁿ + 4 Σ_{i=1}^{n} Xi /2^{n−i}

a) Since C0 , X1 , X2 , · · · all have zero mean,
E[Cn ] = E[C0 ]/2ⁿ + 4 Σ_{i=1}^{n} E[Xi ]/2^{n−i} = 0

b) The autocovariance is considerably more complicated, and we do not pursue it here.

5.5 Random Signal Processing

Electrical signals are usually represented as sample functions of wide sense stationary stochas-

tic processes. We use probability density functions and probability mass functions to describe

the amplitude characteristics of signals, and we use autocorrelation functions to describe the

time-varying nature of the signals. In practice, equipment uses digital signal processing

to perform many operations on continuous-time signals; such equipment relies on an

analog-to-digital converter to transform a continuous-time signal into a random sequence. An

analog-to-digital converter performs two operations: sampling and quantization. Sampling

with a period Ts seconds transforms a continuous-time process X(t) to a random sequence

Xn = X(nTs ). Quantization transforms the continuous random variable Xn to a discrete

random variable Qn . Here, we ignore quantization and analyze linear filtering of random

processes and random sequences resulting from sampling random processes.

5.5.1 Linear Filtering of a Continuous-Time Stochastic Process

The relationship of the stochastic process w(t) at the output of a linear time invariant (LT I)

filter with impulse response h(t) to the stochastic process v(t) at the input of the filter is

the convolution:
w(t) = ∫_{−∞}^{∞} h(u) v(t − u) du = ∫_{−∞}^{∞} h(t − u) v(u) du.

If the possible inputs to the filter are x(t), sample functions of a stochastic process X(t),

then the outputs, y(t), are sample functions of another stochastic process, Y (t). Because

y(t) is the convolution of x(t) and h(t), we adopt the following notation for the relationship

of Y (t) to X(t):
Y (t) = ∫_{−∞}^{∞} h(u) X(t − u) du = ∫_{−∞}^{∞} h(t − u) X(u) du.

Similarly, the expected value of Y (t) is the convolution of h(t) and E[X(t)].

E[Y (t)] = E[ ∫_{−∞}^{∞} h(u) X(t − u) du ] = ∫_{−∞}^{∞} h(u) E[X(t − u)] du

5.5.2 Some Properties of LTI Systems

If the input to an LT I filter with impulse response h(t) is a W SS process X(t), the output

Y (t) has the following properties:

• Y (t) is also a W SS process with expected value


µY = E[Y (t)] = µX ∫_{−∞}^{∞} h(u) du,

and autocorrelation function


RY (τ ) = ∫_{−∞}^{∞} h(u) ∫_{−∞}^{∞} h(v) RX (τ + u − v) dv du.

• X(t) and Y (t) are jointly wide sense stationary and have input-output cross-correlation
RXY (τ ) = ∫_{−∞}^{∞} h(u) RX (τ − u) du.

• The output autocorrelation is related to the input-output cross-correlation by


RY (τ ) = ∫_{−∞}^{∞} h(−u) RXY (τ − u) du.

Example 7

X(t), a wide sense stationary stochastic process with expected value µ = 10 volts, is the

input to a linear time-invariant filter. The filter impulse response is

h(t) =
    e^{t/0.2},   0 ≤ t ≤ 0.1 sec;
    0,           o.w.

What is the expected value of the filter output process Y (t)?

Solution:

µY = µX ∫_{−∞}^{∞} h(t) dt = 10 ∫_{0}^{0.1} e^{t/0.2} dt = 2(e^{0.5} − 1) ≈ 1.3 volts
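A quick numerical check (mine) of the integral above confirms the value:

import numpy as np

t = np.linspace(0.0, 0.1, 100_001)
h = np.exp(t / 0.2)
integral = np.sum(0.5 * (h[:-1] + h[1:])) * (t[1] - t[0])   # trapezoidal rule
print(10.0 * integral, 2 * (np.exp(0.5) - 1))               # both ~ 1.297 volts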

5.5.3 Linear Filtering of a Random Sequence

The random sequence Xn is obtained by sampling the continuous-time process X(t) at a

rate of 1/Ts samples per second. If X(t) is a wide sense stationary process with expected

value E[X(t)] = µX and autocorrelation RX (τ ), then Xn is a wide sense stationary random

sequence with expected value E[Xn ] = µX and autocorrelation function

RX [k] = RX (kTs )

The impulse response of a discrete-time filter is a sequence hn , n = · · · , −1, 0, 1, · · · , and the

output is a random sequence Yn , related to the input Xn by the discrete-time convolution

Yn = Σ_{i=−∞}^{∞} hi Xn−i

If the input to a discrete-time LT I filter with impulse response hn is a wide sense stationary

random sequence, Xn , the output Yn has the following properties.

• (a) Yn is a wide sense stationary random sequence with expected value

µY = E[Yn ] = µX Σ_{n=−∞}^{∞} hn

and autocorrelation function

RY [n] = Σ_{i=−∞}^{∞} Σ_{j=−∞}^{∞} hi hj RX [n + i − j]

• (b) Yn and Xn are jointly wide sense stationary with input-output cross-correlation

RXY [n] = Σ_{i=−∞}^{∞} hi RX [n − i]

• (c) The output autocorrelation is related to the input-output cross-correlation by



RY [n] = Σ_{i=−∞}^{∞} h−i RXY [n − i]

Example 8

A wide sense stationary random sequence Xn with µX = 1 and autocorrelation function

RX [n] is the input to the order M − 1 discrete-time moving-average filter hn where

hn =
    1/M,   n = 0, · · · , M − 1;
    0,     o.w.

and

RX [n] =
    4,   n = 0;
    2,   n = ±1;
    0,   |n| ≥ 2.

For the case M = 2, find the following properties of the output random sequence Yn : the

expected value µY , the autocorrelation Ry [n], and the variance V ar[Yn ].

Solution:

For this filter with M = 2

µY = µX (h0 + h1 ) = µX = 1.

The autocorrelation of the filter output is

RY [n] = Σ_{i=0}^{1} Σ_{j=0}^{1} (0.25) RX [n + i − j]

       = (0.5)RX [n] + (0.25)RX [n − 1] + (0.25)RX [n + 1] =
    3,     n = 0;
    2,     |n| = 1;
    0.5,   |n| = 2;
    0,     o.w.

To obtain V ar[Yn ], we know that E[Yn²] = RY [0] = 3.

∴ V ar[Yn ] = E[Yn²] − µY² = 2.
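The output autocorrelation can be cross-checked directly from the double-sum formula (a small sketch of mine; it only uses the given hn and RX[n]):

# Check R_Y[n] = sum_i sum_j h_i h_j R_X[n + i - j] for the M = 2 filter.
h = {0: 0.5, 1: 0.5}

def RX(n):
    return {0: 4.0, 1: 2.0, -1: 2.0}.get(n, 0.0)

def RY(n):
    return sum(h[i] * h[j] * RX(n + i - j) for i in h for j in h)

print([RY(n) for n in range(-3, 4)])      # expected [0, 0.5, 2, 3, 2, 0.5, 0]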

5.6 Power Spectral Density of a Continuous-Time Process

As you studied before, the functions g(t) and G(f ) form a Fourier transform pair:

G(f ) = ∫_{−∞}^{∞} g(t) e^{−j2πft} dt,        g(t) = ∫_{−∞}^{∞} G(f ) e^{j2πft} df.

The table below provides a list of Fourier transform pairs.

The power spectral density function of the wide sense stationary stochastic process X(t) is

SX (f ) = lim_{T→∞} (1/2T) E[ |XT (f )|² ] = lim_{T→∞} (1/2T) E[ | ∫_{−T}^{T} X(t) e^{−j2πft} dt |² ].

Physically, Sx (f ) has units of watts/Hz = Joules. Both the autocorrelation function and

the power spectral density function convey information about the time structure of X(t).
SX (f ) = ∫_{−∞}^{∞} RX (τ ) e^{−j2πfτ} dτ,        RX (τ ) = ∫_{−∞}^{∞} SX (f ) e^{j2πfτ} df.

For a wide sense stationary random process X(t), the power spectral density SX (f ) is a

real-valued function with the following properties:

1. SX (f ) ≥ 0, for all f .

2. ∫_{−∞}^{∞} SX (f ) df = E[X²(t)] = RX (0)

3. SX (−f ) = SX (f )

Example 9

A wide sense stationary process X(t) has autocorrelation function RX (τ ) = Ae−b|τ | where

b > 0. Derive the power spectral density function Sx (f ) and calculate the average power

E[X 2 (t)]. To find Sx (f ), we use the above table, since RX (τ ) is of the form ae−a|τ | .

SX (f ) = 2Ab / [ (2πf )² + b² ]

The average power is


E[X²(t)] = RX (0) = A e^{−b|0|} = ∫_{−∞}^{∞} 2Ab / [ (2πf )² + b² ] df = A.

Figure 5.3: Random processes V (t) and W (t) with autocorrelation functions RV (τ ) = e^{−0.5|τ|} and RW (τ ) = e^{−2|τ|} are examples of the process X(t) in the above example. These graphs show RV (τ ) and RW (τ ), the power spectral density functions SV (f ) and SW (f ), and sample paths of V (t) and W (t).

Figure 5.3 displays three graphs for each of two stochastic processes. For each process, the three graphs are the autocorrelation function, the power spectral density function, and one sample function. For both processes, the average power is A = 1 watt. Note W (t) has a

narrower autocorrelation (less dependence between two values of the process with a given

time separation) and a wider power spectral density (more power at higher frequencies) than

V (t). The sample function w(t) fluctuates more rapidly with time than v(t).

5.7 Power Spectral Density of a Random Sequence

The spectral analysis of a random sequence parallels the analysis of a continuous-time pro-

cess. A sample function of a random sequence is an ordered list of numbers. Each number in

the list is a sample value of a random variable. The discrete-time Fourier transform (DTFT)

is a spectral representation of an ordered set of numbers.

The sequence {· · · , X−2 , X−1 , X0 , X1 , X2 , · · · } and the function X(φ) are a discrete-time

Fourier transform (DTFT) pair if



X(φ) = Σ_{n=−∞}^{∞} Xn e^{−j2πφn},        Xn = ∫_{−1/2}^{1/2} X(φ) e^{j2πφn} dφ,

where φ is normalized frequency, f = φfs .

The power spectral density function of the wide sense stationary random sequence Xn is

SX (φ) = Σ_{k=−∞}^{∞} RX [k] e^{−j2πφk},        RX [k] = ∫_{−1/2}^{1/2} SX (φ) e^{j2πφk} dφ.

The properties of the power spectral density function of a random sequence are similar to

the properties of the power spectral density function of a continuous-time stochastic process.

1. SX (φ) ≥ 0, for all φ.

2. ∫_{−1/2}^{1/2} SX (φ) dφ = E[Xn²] = RX [0]

3. SX (−φ) = SX (φ)

4. For any integer n, SX (φ + n) = SX (φ).

Example 10

The wide sense stationary random sequence Xn has zero expected value and autocorrelation

function

RX [n] =
    σ²(2 − |n|)/4,   n = −1, 0, 1;
    0,               o.w.
Derive the power spectral density function of Xn .

Solution:

We have

SX (φ) = Σ_{n=−1}^{1} RX [n] e^{−j2πnφ}

       = σ² [ ((2 − 1)/4) e^{j2πφ} + 2/4 + ((2 − 1)/4) e^{−j2πφ} ]

       = (σ²/2) [ 1 + cos(2πφ) ]
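As a check (my own sketch, taking σ² = 1), evaluating the DTFT sum of RX[n] on a frequency grid reproduces the closed form:

import numpy as np

sigma2 = 1.0
R = {-1: sigma2 / 4, 0: sigma2 / 2, 1: sigma2 / 4}     # RX[n] from the example

phi = np.linspace(-0.5, 0.5, 101)
S = sum(R[n] * np.exp(-1j * 2 * np.pi * phi * n) for n in R)
closed_form = 0.5 * sigma2 * (1 + np.cos(2 * np.pi * phi))
print(np.max(np.abs(S.real - closed_form)))            # ~ 0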
