
MAT205T: Probability Theory

S Vijayakumar
Indian Institute of Information Technology,
Design & Manufacturing, Kancheepuram
Module 7: Expectations

- Expectations and Variance of Standard Random Variables
- Linearity of Expectation
- Independence and Expectation
- Markov and Chebyshev Inequalities
- The Central Limit Theorem
- Laws of Large Numbers
Note

The lectures are mostly based on the textbook:

Sheldon Ross: A First Course in Probability, Pearson.

For details on homework problems and for any further clarifications you may consult this book.
Expectation

Definition
If X is a discrete random variable with probability mass function p(x), then the expectation or
the expected value of X , denoted E (X ), is given by
E(X) = Σ_{x: p(x)>0} x p(x).

That is, the expectation of X is the weighted average of the possible values that X assumes.

Note: The expected value of a random variable is also called the mean or the first moment.
Example
If the pmf of X is given by
p(0) = 1/2 = p(1),

then

E(X) = 0 × 1/2 + 1 × 1/2 = 1/2.

Example
If the pmf of X is given by
p(0) = 1/3 and p(1) = 2/3,

then

E(X) = 0 × 1/3 + 1 × 2/3 = 2/3.
Example: The Indicator Random Variable

Let A be an event and let I be a random variable defined as follows:



I = 1 if A occurs,
    0 if A^c occurs.
Then I is called the indicator random variable of the event A. Its expectation is P(A):

E (I ) = 0 × P(I = 0) + 1 × P(I = 1) = 0 × (1 − P(A)) + 1 × P(A) = P(A).


Example

Let X be the outcome when we roll a fair die. Find E (X ).

Solution:
p(1) = p(2) = p(3) = p(4) = p(5) = p(6) = 1/6.

So,

E(X) = 1 × 1/6 + 2 × 1/6 + 3 × 1/6 + 4 × 1/6 + 5 × 1/6 + 6 × 1/6 = 7/2.
Note

If the experiment is repeated independently n times (for n large) and results in x1, x2, ..., xn, then

(x1 + x2 + ... + xn)/n ≈ E(X).
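As an aside (not part of the slides), a short Python sketch can check this empirically for the fair-die example above: the sample average of many simulated rolls should be close to E(X) = 7/2.

```python
import random

random.seed(0)
n = 100_000                                      # number of independent rolls
rolls = [random.randint(1, 6) for _ in range(n)]
sample_mean = sum(rolls) / n                     # (x1 + x2 + ... + xn) / n

print(sample_mean)                               # close to E(X) = 3.5
```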
Expectation is the Same as the Center of Gravity

- Let X be a random variable with pmf p(xi), i ≥ 1.
- Consider a weightless rod on which weights p(xi) are attached at the locations xi.
- The point about which the rod balances is called the center of gravity of the rod.
- This point also turns out to be the expectation of the random variable.
- For example, for a random variable with pmf

  p(−1) = 0.10, p(0) = 0.25, p(1) = 0.30, p(2) = 0.35,

  the center of gravity of the rod described is at 0.9. It is also the expectation of the random variable.
Expectation of a Function of a Random Variable

Let X denote a random variable that takes on any of the values −1, 0, 1 with respective
probabilities

P(X = −1) = 0.2, P(X = 0) = 0.5, P(X = 1) = 0.3.


Compute E(X²).

Solution: Let Y = X². Then the pmf of Y is given by

P(Y = 0) = P(X = 0) = 0.5

P(Y = 1) = P(X = −1) + P(X = 1) = 0.5

Hence

E(X²) = E(Y) = 0 × 0.5 + 1 × 0.5 = 0.5.

Note: There is a simpler method for computing this expectation!
Expectation of a Function of a Random Variable

Proposition
If X is a discrete random variable that takes on one of the values xi , i ≥ 1, with respective
probabilities p(xi ), then for any real-valued function g
E[g(X)] = Σ_i g(xi) p(xi).

Example
Applying the above proposition to the example in the previous slide, we get

E(X²) = (−1)² × 0.2 + 0² × 0.5 + 1² × 0.3 = 0.5.
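As a small aside, a direct Python rendering of this computation (the pmf below is the one from the example; the helper name is just illustrative):

```python
def expectation_of_g(pmf, g):
    """Compute E[g(X)] = sum of g(x) * p(x) over the support of X."""
    return sum(g(x) * p for x, p in pmf.items())

pmf = {-1: 0.2, 0: 0.5, 1: 0.3}                  # pmf from the example above
print(expectation_of_g(pmf, lambda x: x ** 2))   # 0.5
```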


Expectation of a Function of a Random Variable

Proposition
If X is a discrete random variable that takes on one of the values xi , i ≥ 1, with respective
probabilities p(xi ), then for any real-valued function g
E[g(X)] = Σ_i g(xi) p(xi).
Proof:

Suppose that yj , j ≥ 1, represent the different values of g (xi ), i ≥ 1. Then grouping all the
g (xi ) having the same value gives

Σ_i g(xi) p(xi) = Σ_j Σ_{i: g(xi)=yj} g(xi) p(xi)
               = Σ_j Σ_{i: g(xi)=yj} yj p(xi)
               = Σ_j yj Σ_{i: g(xi)=yj} p(xi)
               = Σ_j yj P[g(X) = yj]
               = E[g(X)]
Linearity of Expectation I

Corollary
If a and b are constants, then
E (aX + b) = aE (X ) + b.

Proof.
E(aX + b) = Σ_{x: p(x)>0} (ax + b) p(x)
          = a Σ_{x: p(x)>0} x p(x) + b Σ_{x: p(x)>0} p(x)
          = aE(X) + b
The nth Moment E[X^n]

Definition
For any random variable X, the quantity E[X^n], n ≥ 1, is called the nth moment of X.

Corollary

E[X^n] = Σ_{x: p(x)>0} x^n p(x).
Generalizations

Proposition
If X and Y have a joint probability mass function p(x, y ), then
E[g(X, Y)] = Σ_y Σ_x g(x, y) p(x, y).
Linearity of Expectation II

Theorem
If X and Y are any random variables, then

E [X + Y ] = E [X ] + E [Y ].

Theorem
If X1 , X2 , . . . , Xn are any random variables, then

E [X1 + X2 + . . . + Xn ] = E [X1 ] + E [X2 ] + . . . + E [Xn ].


Variance

Definition
If X is a random variable with mean µ, then the variance of X , denoted Var(X ), is given by

Var(X) = E[(X − µ)²].


Variance: Alternative Formula

Var(X) = E[(X − µ)²]
       = E[X² − 2µX + µ²]
       = E[X²] − 2µE[X] + µ²
       = E[X²] − 2µ² + µ²
       = E[X²] − µ²

That is,

Var(X) = E[X²] − (E(X))².
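As an aside, a minimal Python check that the defining formula and the alternative formula agree; the pmf used is just an illustrative example.

```python
def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def var_by_definition(pmf):
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())             # E[(X - mu)^2]

def var_by_moments(pmf):
    return sum(x ** 2 * p for x, p in pmf.items()) - mean(pmf) ** 2   # E[X^2] - (E[X])^2

pmf = {-1: 0.2, 0: 0.5, 1: 0.3}
print(var_by_definition(pmf), var_by_moments(pmf))   # both 0.49
```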
The Standard Deviation

Definition
The standard deviation of a random variable X , denoted SD(X ), is given by
SD(X) = √Var(X).
Independence and Expectation

If X and Y are independent random variables, then

E [XY ] = E [X ] E [Y ].

More generally, in this case,

E [g (X ) h(Y )] = E [g (X )] E [h(Y )].
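A Monte Carlo sketch of the first identity (an aside; the distributions of X and Y below are arbitrary illustrative choices): for independent samples, the average of the products should be close to the product of the averages.

```python
import random

random.seed(1)
n = 200_000
xs = [random.gauss(2.0, 1.0) for _ in range(n)]     # X ~ Normal(2, 1)
ys = [random.uniform(0.0, 1.0) for _ in range(n)]   # Y ~ Uniform[0, 1], independent of X

e_xy = sum(x * y for x, y in zip(xs, ys)) / n
e_x, e_y = sum(xs) / n, sum(ys) / n
print(e_xy, e_x * e_y)                              # both close to 2 × 0.5 = 1
```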


Properties of Variance

- Var(aX + b) = a² Var(X).
- If X1, X2, ..., Xn are independent random variables, then

  Var(X1 + X2 + ... + Xn) = Var(X1) + Var(X2) + ... + Var(Xn).


Important Note: The Continuous Case
All of the above concepts and results are defined analogously, and continue to hold, for continuous (and jointly continuous) random variables. For instance:
Definition
The expected value or the expectation of a continuous random variable X with pdf f(x) is given by

E(X) = ∫_{−∞}^{∞} x f(x) dx.

Corollary
The nth moment of a continuous random variable X with pdf f(x) is given by

E[X^n] = ∫_{−∞}^{∞} x^n f(x) dx.
Expectation and Variance of Standard Distributions:
Bernoulli Random Variables

Let X be a Bernoulli random variable with parameter p.


Then its pmf is p(0) = P(X = 0) = 1 − p and p(1) = P(X = 1) = p.

Hence its expectation is

E [X ] = 0 × (1 − p) + 1 × p = p.
Also

E[X²] = 0² × (1 − p) + 1² × p = p.

Hence its variance is

Var(X) = E[X²] − (E(X))² = p − p² = p(1 − p).


Expectation and Variance of Binomial Random Variables

Let X be a binomial random variable with parameters (n, p).


Then X = X1 + X2 + . . . + Xn , where X1 , X2 , . . . , Xn are independent and identically
distributed Bernoulli random variables with parameter p.

Hence, by linearity of expectation,

E [X ] = E [X1 + X2 + . . . + Xn ]
= E [X1 ] + E [X2 ] + . . . + E [Xn ]
= p + p + ... + p
= np
Expectation and Variance of Binomial Random Variables...

And, by independence of X1 , X2 , . . . , Xn ,

Var(X ) = Var(X1 + X2 + . . . + Xn )
= Var(X1 ) + Var(X2 ) + . . . + Var(Xn )
= p(1 − p) + p(1 − p) + . . . + p(1 − p)
= np(1 − p)
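As an aside, a short Python check that the exact binomial pmf yields mean np and variance np(1 − p); n = 10 and p = 0.3 are illustrative choices.

```python
from math import comb

n, p = 10, 0.3
pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

mean = sum(k * pk for k, pk in pmf.items())
var = sum(k**2 * pk for k, pk in pmf.items()) - mean**2
print(mean, n * p)             # 3.0 and 3.0
print(var, n * p * (1 - p))    # 2.1 and 2.1
```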
Expectation and Variance of Poisson Distribution

Let X be a Poisson random variable with parameter λ. Then


E[X] = Σ_{i=0}^{∞} i e^{−λ} λ^i / i!
     = λ e^{−λ} Σ_{i=1}^{∞} λ^{i−1} / (i − 1)!
     = λ e^{−λ} Σ_{j=0}^{∞} λ^j / j!
     = λ
Expectation and Variance of Poisson Random Variables...


E[X²] = Σ_{i=0}^{∞} i² e^{−λ} λ^i / i!
      = λ Σ_{i=1}^{∞} i e^{−λ} λ^{i−1} / (i − 1)!
      = λ Σ_{j=0}^{∞} (j + 1) e^{−λ} λ^j / j!
      = λ [ Σ_{j=0}^{∞} j e^{−λ} λ^j / j! + Σ_{j=0}^{∞} e^{−λ} λ^j / j! ]
      = λ(λ + 1) = λ² + λ

Hence
Var(X) = E[X²] − (E[X])² = λ² + λ − λ² = λ.
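As an aside, a quick numerical check using a truncated Poisson pmf; λ = 4 is an arbitrary illustrative value and the truncation point leaves a negligible tail.

```python
from math import exp, factorial

lam = 4.0
pmf = {i: exp(-lam) * lam**i / factorial(i) for i in range(100)}   # truncated support

mean = sum(i * p for i, p in pmf.items())
var = sum(i**2 * p for i, p in pmf.items()) - mean**2
print(mean, var)   # both close to lambda = 4
```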
Homework

Compute the Expectation and Variance of

(a) the geometric random variable with parameter p (E(X) = 1/p and Var(X) = (1 − p)/p²);

(b) the negative binomial random variable with parameters (r, p) (E(X) = r/p and Var(X) = r(1 − p)/p²).
Expectation and Variance of the Gamma Distribution

Let X be a gamma random variable with parameters (α, λ). Then

E(X) = (1/Γ(α)) ∫_0^∞ x λe^{−λx} (λx)^{α−1} dx
     = (1/(λΓ(α))) ∫_0^∞ e^{−λx} (λx)^α d(λx)
     = Γ(α + 1) / (λΓ(α))
     = α/λ

Homework: Prove that Var(X) = α/λ².
Expectation and Variance of Normal Random Variables

Let X be a normal random variable with parameters (µ, σ²).

Then Z = (X − µ)/σ is the standard normal random variable and hence has parameters (0, 1).
Now,

E[Z] = 0 (Prove!).
So,
Var(Z) = E[Z²] = 1 (Prove!).
Hence
E[X] = E[σZ + µ] = σE[Z] + µ = µ
and
Var(X) = Var(σZ + µ) = σ² Var(Z) = σ².
Linearity of Expectation II: More Applications
Suppose that N people throw their hats into the center of a room. If the hats are mixed up
and each person selects a hat at random, find the expected number of people that select their
own hat.
Solution: Let X denote the number of matches. Then

X = X1 + X2 + . . . + XN ,
where

Xi = 1 if the ith person selects his own hat,
     0 otherwise.
Since, for each i, the ith person is equally likely to select any of the N hats,

P(Xi = 1) = 1/N.

Thus, as each Xi is a Bernoulli random variable,

E[Xi] = P(Xi = 1) = 1/N.

Thus

E[X] = E[X1 + X2 + ... + XN] = E[X1] + ... + E[XN] = N × (1/N) = 1.

Hence, on the average, exactly one person selects his own hat.
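A simulation sketch of this result (an aside, not from the slides): shuffle the hats uniformly at random and count fixed points; the average count stays close to 1 regardless of N.

```python
import random

random.seed(2)
N, trials = 20, 50_000
total_matches = 0
for _ in range(trials):
    hats = list(range(N))
    random.shuffle(hats)      # a uniformly random assignment of hats to people
    total_matches += sum(1 for person, hat in enumerate(hats) if person == hat)

print(total_matches / trials)  # close to 1
```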
Homework: Coupon-Collecting Problem

Suppose that there are N different types of coupons and that each time one obtains a new
coupon it is equally likely to be any of the N types. Find the expected number of coupons one
needs to amass in order to obtain a complete set containing all N types.

Note: The answer is not 1.


Homework: A Random Walk in the Plane

Suppose that a particle, initially at the origin, undergoes a sequence of n steps, each of unit
length and in a completely random direction. Compute E[D²], where D is the distance of the
particle from the origin after the n steps.
Moment Generating Functions

Definition
The moment generating function (mgf) M(t) of a random variable X is defined for all real
values of t by

M(t) = E[e^{tX}]

     = Σ_x e^{tx} p(x)                 if X is discrete with mass function p(x)

     = ∫_{−∞}^{∞} e^{tx} f(x) dx       if X is continuous with density function f(x).
Note

Moment generating functions are so called because all the moments of X can be obtained by
successively differentiating M(t) and evaluating the resulting functions at t = 0:

M′(t) = (d/dt) E[e^{tX}]
      = E[(d/dt) e^{tX}]
      = E[X e^{tX}]

Hence
M′(0) = E[X].
Similarly,
M″(t) = E[X² e^{tX}] and M″(0) = E[X²].
In general,
M^(n)(t) = E[X^n e^{tX}] and M^(n)(0) = E[X^n] (n ≥ 1).
Example

Let X be a Bernoulli random variable with parameter p. Then its mgf is

M(t) = E[e^{tX}] = e^{t·0} p(0) + e^{t·1} p(1) = (1 − p) + pe^t.

So,
M′(t) = pe^t and M″(t) = pe^t.
Hence
E[X] = M′(0) = p and E[X²] = M″(0) = p.
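As an aside, a symbolic sketch of the same computation, assuming SymPy is available: differentiate the Bernoulli mgf and evaluate at t = 0.

```python
import sympy as sp

t, p = sp.symbols('t p')
M = 1 - p + p * sp.exp(t)            # Bernoulli mgf

EX = sp.diff(M, t).subs(t, 0)        # M'(0)
EX2 = sp.diff(M, t, 2).subs(t, 0)    # M''(0)
print(EX, EX2)                       # p p
```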
Example

Let X be a binomial random variable with parameters (n, p). Then

M(t) = E[e^{tX}]
     = Σ_{k=0}^{n} e^{tk} C(n, k) p^k (1 − p)^{n−k}
     = Σ_{k=0}^{n} C(n, k) (pe^t)^k (1 − p)^{n−k}
     = (pe^t + 1 − p)^n
     = (1 − p + pe^t)^n
Independence and Moment Generating Functions

Proposition
If X and Y are independent random variables, then
M_{X+Y}(t) = M_X(t) M_Y(t).

Proof.

M_{X+Y}(t) = E[e^{t(X+Y)}]
           = E[e^{tX + tY}]
           = E[e^{tX} e^{tY}]
           = E[e^{tX}] E[e^{tY}]
           = M_X(t) M_Y(t)
Note

Let X be a binomial random variable with parameters (n, p). Then

X = X1 + . . . + Xn ,
where X1 , . . . , Xn are independent and identically distributed (iid) Bernoulli random variables
with parameter p.
Then the mgf of X is

M_X(t) = M_{X1}(t) M_{X2}(t) ... M_{Xn}(t) = (1 − p + pe^t)^n.

Homework: Compute E [X ] and E [X 2 ] using the mgf above. Hence compute the variance of X .
Moment Generating Function of Standard Distributions

Derive the moment generating functions of the following random variables. Hence compute
their mean and variance.

- Poisson random variable with parameter λ. (M(t) = e^{λ(e^t − 1)}.)
- Geometric random variable with parameter p. (M(t) = pe^t / (1 − (1 − p)e^t).)
- Negative binomial random variable with parameters (r, p). (M(t) = [pe^t / (1 − (1 − p)e^t)]^r.)
Moment Generating Function of Exponential Random Variable

Let X be exponential with parameter λ. Find its mgf.

Solution:

M(t) = E[e^{tX}]
     = ∫_0^∞ e^{tx} λe^{−λx} dx
     = λ ∫_0^∞ e^{−(λ−t)x} dx
     = λ/(λ − t)   for t < λ.
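As an aside, a crude numerical sanity check: approximate E[e^{tX}] by a Riemann sum over the exponential density and compare with λ/(λ − t); λ = 2 and t = 0.5 are illustrative values.

```python
from math import exp

lam, t = 2.0, 0.5          # requires t < lam
dx, upper = 1e-4, 40.0     # step size and truncation point for the Riemann sum
xs = (i * dx for i in range(int(upper / dx)))
mgf_numeric = sum(exp(t * x) * lam * exp(-lam * x) * dx for x in xs)

print(mgf_numeric, lam / (lam - t))   # both close to 1.333...
```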
Homework

Let X be an exponential random variable with parameter λ. Compute E[X] and E[X²] using
the mgf of X. Hence compute the variance of X.
MGFs of the Standard Normal and the Normal Random Variables

MGF of the standard normal:

M_Z(t) = E(e^{tZ}) = ∫_{−∞}^{∞} e^{tz} (1/√(2π)) e^{−z²/2} dz = e^{t²/2} ∫_{−∞}^{∞} (1/√(2π)) e^{−(z−t)²/2} dz = e^{t²/2}.

MGF of the normal random variable with parameters (µ, σ²):

M_X(t) = E(e^{tX}) = E[e^{t(σZ+µ)}] = E[e^{µt} e^{σtZ}] = e^{µt} E[e^{σtZ}] = e^{µt} e^{σ²t²/2} = e^{µt + σ²t²/2}.
Homework

1. Find the mean and variance of the standard normal random variable using its moment
generating function. (Mean: 0. Variance: 1.)
2. Find the mean and variance of the normal random variable with parameters (µ, σ²) using
its moment generating function. (Mean: µ. Variance: σ².)
Moment Generating Function of Standard Distributions

Find the moment generating functions of the following random variables. Hence compute their
mean and variance.

- Uniform random variable over the interval [a, b]. (M(t) = (e^{tb} − e^{ta}) / (t(b − a)).)
- Gamma random variable with parameters (α, λ). (M(t) = (λ/(λ − t))^α.)
Fact

The moment generating function of a random variable uniquely determines the distribution.

Examples
- If the mgf of a random variable X is 1 − p + pe^t, then X must be Bernoulli with parameter p.
- If the mgf of a random variable X is λ/(λ − t), then X must be exponential with parameter λ.
- If the mgf of a random variable X is e^{t²/2}, then X must be the standard normal random variable.
Example

Show that if X and Y are independent normal random variables with respective parameters
(µ1, σ1²) and (µ2, σ2²), then X + Y is normal with parameters (µ1 + µ2, σ1² + σ2²).

Solution:

M_{X+Y}(t) = M_X(t) M_Y(t)
           = e^{µ1 t + σ1² t²/2} e^{µ2 t + σ2² t²/2}
           = e^{(µ1 + µ2)t + (σ1² + σ2²)t²/2}

We recognize the above as the mgf of a normal random variable with parameters
(µ1 + µ2, σ1² + σ2²). From the uniqueness property of mgfs, it now follows that X + Y is
normal with parameters (µ1 + µ2, σ1² + σ2²).
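A simulation sketch of this example (an aside, with illustrative parameter values): the sample mean and variance of X + Y should be close to µ1 + µ2 and σ1² + σ2².

```python
import random, statistics

random.seed(3)
mu1, s1 = 1.0, 2.0      # X ~ Normal(1, 4)
mu2, s2 = -0.5, 1.5     # Y ~ Normal(-0.5, 2.25), independent of X

sums = [random.gauss(mu1, s1) + random.gauss(mu2, s2) for _ in range(200_000)]
print(statistics.mean(sums), mu1 + mu2)            # both close to 0.5
print(statistics.variance(sums), s1**2 + s2**2)    # both close to 6.25
```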
Markov’s Inequality

Proposition
Let X be a non-negative random variable. Then for any value a > 0

P(X ≥ a) ≤ E[X]/a.
Proof

E[X] = Σ_x x P(X = x)
     ≥ Σ_{x ≥ a} x P(X = x)
     ≥ Σ_{x ≥ a} a P(X = x)
     = a P(X ≥ a)

∴ P(X ≥ a) ≤ E[X]/a.
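As an aside, an empirical look at how tight the bound is, using an exponential random variable with mean 1 as an illustrative choice.

```python
import random

random.seed(4)
n = 200_000
xs = [random.expovariate(1.0) for _ in range(n)]    # non-negative, E[X] = 1

for a in (1, 2, 4):
    empirical = sum(x >= a for x in xs) / n
    print(a, empirical, 1 / a)    # empirical P(X >= a) never exceeds the Markov bound E[X]/a
```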
Chebyshev’s Inequality

Proposition
Let X be a random variable with mean µ and variance σ². Then for any value k > 0

P(|X − µ| ≥ k) ≤ σ²/k².
Proof:

Note that (X − µ)² is a non-negative random variable. Hence applying Markov's inequality
with a = k², we obtain

P((X − µ)² ≥ k²) ≤ E[(X − µ)²]/k² = σ²/k².

But (X − µ)² ≥ k² if and only if |X − µ| ≥ k (> 0). Hence the above inequality is equivalent to

P(|X − µ| ≥ k) ≤ σ²/k².
Example

The number of items produced in a factory during a week is a random variable with mean 50.
1. What can be said about the probability that this week’s production will exceed 75?
2. If the variance of a week's production is known to equal 25, what can be said about the
probability that this week’s production will be between 40 and 60?
Solution:

Let X be the number of items that are produced in a week.


1. By Markov's inequality,

   P(X > 75) ≤ E[X]/75 = 50/75 = 2/3.

2. By Chebyshev's inequality,

   P(|X − 50| ≥ 10) ≤ σ²/10² = 25/100 = 1/4.

   Hence

   P(40 < X < 60) = P(|X − 50| < 10) ≥ 1 − 1/4 = 3/4.
Homework

Let X be a normal random variable with parameters (µ, σ²). Using Chebyshev's inequality, find
an upper bound for the probability P(|X − µ| ≥ 2σ). (Answer: 0.25.) Also approximate this
probability using the normal table. (Answer: 0.0456.)

Note: Chebyshev’s inequality is often used as a theoretical tool in proving results.

Homework: If Var(X ) = 0, prove that

P(X = E [X ]) = 1.
Note

Let X1, X2, X3, ..., Xn be a sequence of independent and identically distributed (iid) random
variables, each having a finite mean E[Xi] = µ and a finite variance Var(Xi) = σ². Then

E((X1 + X2 + ... + Xn)/n) = (1/n) × E(X1 + X2 + ... + Xn) = (1/n) × n × µ = µ

and

Var((X1 + X2 + ... + Xn)/n) = (1/n²) × Var(X1 + X2 + ... + Xn) = (1/n²) × n × σ² = σ²/n.
The Weak Law of Large Numbers

Theorem
Let X1, X2, X3, ... be a sequence of independent and identically distributed (iid) random
variables, each having finite mean E[Xi] = µ. Then for any ε > 0

P( |(X1 + X2 + ... + Xn)/n − µ| ≥ ε ) → 0 as n → ∞.
Proof:

Assume that the random variables have a finite variance σ². Note that

E[(X1 + X2 + ... + Xn)/n] = µ and Var((X1 + X2 + ... + Xn)/n) = σ²/n.

Hence, by Chebyshev's inequality,

P( |(X1 + X2 + ... + Xn)/n − µ| ≥ ε ) ≤ σ²/(nε²) → 0 as n → ∞.
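A simulation sketch of the weak law (an aside), using exponential(1) summands as an illustrative choice: the empirical probability that the sample mean deviates from µ = 1 by at least ε shrinks as n grows.

```python
import random

random.seed(5)
mu, eps, trials = 1.0, 0.1, 2_000

for n in (10, 100, 1000):
    deviations = 0
    for _ in range(trials):
        sample_mean = sum(random.expovariate(1.0) for _ in range(n)) / n
        deviations += abs(sample_mean - mu) >= eps
    print(n, deviations / trials)    # decreases toward 0 as n grows
```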
Note

Let X be a random variable with mean µ and variance σ². (X need not be a normal random
variable.)

Consider the random variable

Z = (X − µ)/σ.

We have
E(Z) = 0 and Var(Z) = 1.

(Z need not be the standard normal random variable.)


The Central Limit Theorem

Theorem
Let X1, X2, X3, ... be a sequence of independent and identically distributed (iid) random
variables, each having mean µ and variance σ². Then the distribution of

((X1 + X2 + ... + Xn)/n − µ) / (σ/√n) = (X1 + X2 + ... + Xn − nµ) / (σ√n)

tends to the standard normal as n → ∞. That is, for −∞ < a < ∞,

P( (X1 + X2 + ... + Xn − nµ) / (σ√n) ≤ a ) → (1/√(2π)) ∫_{−∞}^{a} e^{−z²/2} dz as n → ∞.
Proof:

We will prove that the mgf (moment generating function) of the random variable

(X1 + X2 + ... + Xn − nµ) / (σ√n)

tends to the mgf of the standard normal as n → ∞. This implies the theorem (by a lemma).

Let Zi = (Xi − µ)/σ for i = 1, 2, .... Then

Z1 + Z2 + ... + Zn = (X1 + X2 + ... + Xn − nµ)/σ.

Note also that E[Zi] = 0 and Var(Zi) = 1 for all i. Hence E[Zi²] = 1.

Moreover, Z1 , Z2 , . . . are independent.


M_{Zi}(t) = E[e^{tZi}]
          = E[1 + tZi + (t²/2)Zi² + (t³/6)Zi³ + ...]
          = 1 + tE[Zi] + (t²/2)E[Zi²] + (t³/6)E[Zi³] + ...
          = 1 + t²/2 + (t³/6)E[Zi³] + ...

Hence

M_{Zi}(t/√n) = 1 + t²/(2n) + (t³/(6n^{3/2}))E[Zi³] + ...
             ≈ 1 + t²/(2n)   for n large.
Finally,

M_{(Z1+Z2+...+Zn)/√n}(t) = M_{Z1+Z2+...+Zn}(t/√n)
                         = [M_{Z1}(t/√n)]^n
                         ≈ (1 + t²/(2n))^n
                         → e^{t²/2} as n → ∞.
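As an aside, a simulation sketch of the theorem for Uniform[0, 1] summands: the empirical probability that the standardized sum is at most a is compared with the standard normal cdf Φ(a), computed here via math.erf.

```python
import math, random

def phi(a):
    """Standard normal cdf via the error function."""
    return 0.5 * (1 + math.erf(a / math.sqrt(2)))

random.seed(6)
n, trials = 30, 20_000
mu, sigma = 0.5, math.sqrt(1 / 12)              # mean and sd of Uniform[0, 1]
a = 1.0

count = 0
for _ in range(trials):
    s = sum(random.random() for _ in range(n))
    z = (s - n * mu) / (sigma * math.sqrt(n))   # standardized sum
    count += z <= a

print(count / trials, phi(a))    # both close to 0.841
```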
The Strong Law of Large Numbers

Theorem
Let X1 , X2 , X3 , . . . be a sequence of independent and identically distributed (iid) random
variables each having finite mean E [Xi ] = µ. Then, with probability 1,

(X1 + X2 + ... + Xn)/n → µ as n → ∞.
Proof:

We prove the theorem assuming that E[Xi⁴] = K < ∞. Let us first assume that E[Xi] = 0.

Let Sn = X1 + ... + Xn and consider

E[Sn⁴] = E[(X1 + ... + Xn)(X1 + ... + Xn)(X1 + ... + Xn)(X1 + ... + Xn)].

Expanding the RHS, we obtain terms of the forms Xi⁴, Xi³Xj, Xi²Xj², Xi²XjXk, XiXjXkXl.

Since E[Xi] = 0 for all i, we have

E[Xi³Xj] = E[Xi³]E[Xj] = 0

E[Xi²XjXk] = E[Xi²]E[Xj]E[Xk] = 0

E[XiXjXkXl] = 0

Now, for a given pair i and j, there are C(4, 2) = 6 terms that equal Xi²Xj². Hence

E[Sn⁴] = nE[Xi⁴] + 6 C(n, 2) E[Xi²Xj²]
       = nK + 3n(n − 1)E[Xi²]E[Xj²]

Also
0 ≤ Var(Xi²) = E[Xi⁴] − (E[Xi²])².
Hence
(E[Xi²])² ≤ E[Xi⁴] = K.
Hence
E[Sn⁴] ≤ nK + 3n(n − 1)K.
This implies that
E[Sn⁴/n⁴] ≤ K/n³ + 3K/n².
Hence
Σ_{n=1}^{∞} E[Sn⁴/n⁴] ≤ K Σ_{n=1}^{∞} 1/n³ + 3K Σ_{n=1}^{∞} 1/n² < ∞.

Thus
E[ Σ_{n=1}^{∞} Sn⁴/n⁴ ] = Σ_{n=1}^{∞} E[Sn⁴/n⁴] < ∞.

This implies that, with probability 1, Σ_{n=1}^{∞} Sn⁴/n⁴ < ∞.
This implies that Sn⁴/n⁴ → 0 and hence that Sn/n → 0.
This means that (X1 + X2 + ... + Xn)/n → 0.

When the mean µ ≠ 0, we apply the above arguments to Yi = Xi − µ and conclude that
Tn/n → 0 with probability 1, where Tn = Y1 + ... + Yn.
This implies that (X1 + X2 + ... + Xn)/n → µ with probability 1.
Applications of the SLLN

Theorem
Let X, X1, X2, X3, ... be a sequence of independent and identically distributed (iid) random
variables, each having a finite kth moment E[X^k] = µ_k. Then, with probability 1,

(X1^k + X2^k + ... + Xn^k)/n → E[X^k] = µ_k as n → ∞.
Applications of the SLLN

Theorem
Consider an event A with P(A) unknown. Perform the underlying experiment repeatedly and
independently and set Xi = 1 if A occurs in the ith trial and set Xi = 0 otherwise. Then, with
probability 1,
(X1 + X2 + ... + Xn)/n → P(A) as n → ∞.
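This is the principle behind Monte Carlo estimation. As an aside, a small sketch with an illustrative event A (the sum of two fair dice equals 7, so that P(A) = 1/6): the relative frequency of A converges to P(A).

```python
import random

random.seed(7)
n = 500_000
hits = sum(1 for _ in range(n)
           if random.randint(1, 6) + random.randint(1, 6) == 7)   # indicator of the event A

print(hits / n, 1 / 6)    # relative frequency close to P(A) ≈ 0.1667
```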
