MAT205T: Probability Theory
S Vijayakumar
Indian Institute of Information Technology,
Design & Manufacturing, Kancheepuram
Module 7: Expectations
For details on the homework problems and for any further clarifications, you may consult this book.
Expectation
Definition
If X is a discrete random variable with probability mass function p(x), then the expectation or
the expected value of X, denoted E(X), is given by

E(X) = Σ_{x: p(x) > 0} x p(x).

That is, the expectation of X is the weighted average of the possible values that X assumes.
Note: The expected value of a random variable is also called the mean or the first moment.
Example
If the pmf of X is given by

p(0) = p(1) = 1/2,

then

E(X) = 0 × 1/2 + 1 × 1/2 = 1/2.
Example
If the pmf of X is given by

p(0) = 1/3 and p(1) = 2/3,

then

E(X) = 0 × 1/3 + 1 × 2/3 = 2/3.
Example: Rolling a Fair Die
Find E(X), where X is the outcome when we roll a fair die.
Solution:

p(1) = p(2) = p(3) = p(4) = p(5) = p(6) = 1/6.

So,

E(X) = 1 × 1/6 + 2 × 1/6 + 3 × 1/6 + 4 × 1/6 + 5 × 1/6 + 6 × 1/6 = 7/2.
Note
The center of gravity of the rod described above is at 0.9, which is also the expectation of the
random variable.
Expectation of a Function of a Random Variable
Let X denote a random variable that takes on any of the values −1, 0, 1 with respective
probabilities

p(−1) = 0.2, p(0) = 0.5, p(1) = 0.3.

Let Y = X². Then the pmf of Y is

P(Y = 1) = p(−1) + p(1) = 0.5 and P(Y = 0) = p(0) = 0.5.

Hence

E(X²) = E(Y) = 0 × 0.5 + 1 × 0.5 = 0.5.

Note: There is a simpler method for computing this expectation!
Expectation of a Function of a Random Variable
Proposition
If X is a discrete random variable that takes on one of the values x_i, i ≥ 1, with respective
probabilities p(x_i), then for any real-valued function g,

E[g(X)] = Σ_i g(x_i) p(x_i).
Example
Applying the above proposition to the example in the previous slide, we get

E[X²] = (−1)² p(−1) + 0² p(0) + 1² p(1) = 0.5.
Proposition
If X is a discrete random variable that takes on one of the values x_i, i ≥ 1, with respective
probabilities p(x_i), then for any real-valued function g,

E[g(X)] = Σ_i g(x_i) p(x_i).
Proof:
Suppose that y_j, j ≥ 1, represent the different values of g(x_i), i ≥ 1. Then grouping all the
g(x_i) having the same value gives

Σ_i g(x_i) p(x_i) = Σ_j Σ_{i: g(x_i) = y_j} g(x_i) p(x_i)
                  = Σ_j Σ_{i: g(x_i) = y_j} y_j p(x_i)
                  = Σ_j y_j Σ_{i: g(x_i) = y_j} p(x_i)
                  = Σ_j y_j P[g(X) = y_j]
                  = E[g(X)].
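The grouping argument in this proof is easy to verify numerically. The sketch below uses a small pmf on {−1, 0, 1} with probabilities 0.2, 0.5, 0.3 and checks that summing g(x)p(x) directly agrees with first building the pmf of Y = g(X) and then averaging over its values:

```python
from fractions import Fraction

# E[g(X)] two ways: directly via sum of g(x_i) p(x_i) (the proposition),
# and by first deriving the pmf of Y = g(X).
pmf = {-1: Fraction(2, 10), 0: Fraction(5, 10), 1: Fraction(3, 10)}
g = lambda x: x * x

direct = sum(g(x) * p for x, p in pmf.items())

# Group the x's sharing the same value of g(x) to get the pmf of Y.
pmf_y = {}
for x, p in pmf.items():
    pmf_y[g(x)] = pmf_y.get(g(x), 0) + p
via_y = sum(y * p for y, p in pmf_y.items())

assert direct == via_y == Fraction(1, 2)
```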
Linearity of Expectation I
Corollary
If a and b are constants, then

E(aX + b) = aE(X) + b.

Proof.

E(aX + b) = Σ_{x: p(x) > 0} (ax + b) p(x)
          = a Σ_{x: p(x) > 0} x p(x) + b Σ_{x: p(x) > 0} p(x)
          = aE(X) + b.
The nth Moment E[X^n]
Definition
For any random variable X, the quantity E[X^n], n ≥ 1, is called the nth moment of X.
Corollary

E[X^n] = Σ_{x: p(x) > 0} x^n p(x).
Generalizations
Proposition
If X and Y have a joint probability mass function p(x, y), then

E[g(X, Y)] = Σ_y Σ_x g(x, y) p(x, y).
Linearity of Expectation II
Theorem
If X and Y are any random variables, then
E [X + Y ] = E [X ] + E [Y ].
Theorem
If X1, X2, ..., Xn are any random variables, then

E[X1 + X2 + ... + Xn] = E[X1] + E[X2] + ... + E[Xn].

Variance
Definition
If X is a random variable with mean µ, then the variance of X, denoted Var(X), is given by

Var(X) = E[(X − µ)²].

That is,

Var(X) = E[X²] − (E(X))².
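The shortcut Var(X) = E[X²] − (E[X])² is easy to exercise on the fair-die pmf from the earlier example, again in exact arithmetic:

```python
from fractions import Fraction

# Variance of a fair die via the shortcut Var(X) = E[X^2] - (E[X])^2.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
mean = sum(x * p for x, p in pmf.items())              # E[X] = 7/2
second_moment = sum(x * x * p for x, p in pmf.items())  # E[X^2] = 91/6
variance = second_moment - mean ** 2
print(variance)  # 35/12
```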
The Standard Deviation
Definition
The standard deviation of a random variable X, denoted SD(X), is given by

SD(X) = √Var(X).
Independence and Expectation
▶ If X and Y are independent random variables, then

E[XY] = E[X] E[Y].

▶ For constants a and b, Var(aX + b) = a² Var(X).
▶ If X1, X2, ..., Xn are independent random variables, then

Var(X1 + X2 + ... + Xn) = Var(X1) + Var(X2) + ... + Var(Xn).
Corollary
The nth moment of a continuous random variable X with pdf f(x) is given by

E[X^n] = ∫_{−∞}^{∞} x^n f(x) dx.
Expectation and Variance of Standard Distributions:
Bernoulli Random Variables
Let X be a Bernoulli random variable with parameter p. Then

E[X] = 0 × (1 − p) + 1 × p = p.

Also,

E[X²] = 0² × (1 − p) + 1² × p = p.

Hence its variance is

Var(X) = E[X²] − (E[X])² = p − p² = p(1 − p).

Expectation and Variance of Binomial Random Variables
Let X be a binomial random variable with parameters (n, p). Then X can be written as
X = X1 + X2 + ... + Xn, where X1, X2, ..., Xn are independent Bernoulli random variables
with parameter p. Hence

E[X] = E[X1 + X2 + ... + Xn]
     = E[X1] + E[X2] + ... + E[Xn]
     = p + p + ... + p
     = np.
Expectation and Variance of Binomial Random Variables...
And, by independence of X1, X2, ..., Xn,

Var(X) = Var(X1 + X2 + ... + Xn)
       = Var(X1) + Var(X2) + ... + Var(Xn)
       = p(1 − p) + p(1 − p) + ... + p(1 − p)
       = np(1 − p).
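The formulas E[X] = np and Var(X) = np(1 − p) can also be confirmed directly from the binomial pmf, without the Bernoulli decomposition. A small Python check (n = 10, p = 0.3 are arbitrary illustrative values):

```python
from math import comb

# Mean and variance of a binomial(n, p) computed from its pmf,
# compared against the closed forms np and np(1-p).
n, p = 10, 0.3
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
mean = sum(k * q for k, q in enumerate(pmf))
var = sum(k * k * q for k, q in enumerate(pmf)) - mean**2
assert abs(mean - n * p) < 1e-12
assert abs(var - n * p * (1 - p)) < 1e-12
```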
Expectation and Variance of Poisson Distribution
Let X be a Poisson random variable with parameter λ. Then

E[X] = Σ_{i=0}^{∞} i e^{−λ} λ^i / i!
     = λ e^{−λ} Σ_{i=1}^{∞} λ^{i−1} / (i − 1)!
     = λ e^{−λ} Σ_{j=0}^{∞} λ^j / j!
     = λ.
Expectation and Variance of Poisson Random Variables...

E[X²] = Σ_{i=0}^{∞} i² e^{−λ} λ^i / i!
      = λ Σ_{i=1}^{∞} i e^{−λ} λ^{i−1} / (i − 1)!
      = λ Σ_{j=0}^{∞} (j + 1) e^{−λ} λ^j / j!
      = λ [ Σ_{j=0}^{∞} j e^{−λ} λ^j / j! + Σ_{j=0}^{∞} e^{−λ} λ^j / j! ]
      = λ(λ + 1) = λ² + λ.

Hence

Var(X) = E[X²] − (E[X])² = λ² + λ − λ² = λ.
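Both Poisson identities (mean λ and variance λ) can be checked numerically by truncating the infinite sums; λ = 2.5 below is an arbitrary illustrative choice, and 100 terms are far more than enough for it:

```python
from math import exp, factorial

# Truncated Poisson sums: the mean and the variance should both
# come out (numerically) equal to lam.
lam = 2.5
terms = [exp(-lam) * lam**i / factorial(i) for i in range(100)]
mean = sum(i * t for i, t in enumerate(terms))
second = sum(i * i * t for i, t in enumerate(terms))
assert abs(mean - lam) < 1e-9
assert abs(second - mean**2 - lam) < 1e-9
```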
Expectation of Gamma Random Variables
Let X be a gamma random variable with parameters (α, λ), so that its pdf is
f(x) = λ e^{−λx} (λx)^{α−1} / Γ(α) for x ≥ 0. Then

E(X) = (1/Γ(α)) ∫_0^∞ x λ e^{−λx} (λx)^{α−1} dx
     = (1/(λΓ(α))) ∫_0^∞ e^{−λx} (λx)^α d(λx)
     = Γ(α + 1) / (λΓ(α))
     = α/λ.

Homework: Prove that Var(X) = α/λ².
Expectation and Variance of Normal Random Variables
Let X be a normal random variable with parameters (µ, σ²), so that X = σZ + µ, where
Z = (X − µ)/σ is a standard normal random variable and hence has parameters (0, 1).
Now,

E[Z] = 0 (Prove!).

So,

Var(Z) = E[Z²] = 1 (Prove!).

Hence

E[X] = E[σZ + µ] = σE[Z] + µ = µ

and

Var(X) = Var(σZ + µ) = σ² Var(Z) = σ².
Linearity of Expectation II: More Applications
Suppose that N people throw their hats into the center of a room. If the hats are mixed up
and each person selects a hat at random, find the expected number of people that select their
own hat.
Solution: Let X denote the number of matches. Then
X = X1 + X2 + . . . + XN ,
where

Xi = 1 if the ith person selects his own hat, and Xi = 0 otherwise.

Since, for each i, the ith person is equally likely to select any of the N hats,

P(Xi = 1) = 1/N.

Thus, as each Xi is a Bernoulli random variable,

E[Xi] = P(Xi = 1) = 1/N.

Thus

E[X] = E[X1 + X2 + ... + XN] = E[X1] + ... + E[XN] = N × (1/N) = 1.
Hence, on the average, exactly one person selects his own hat.
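The matching result is striking enough to be worth simulating. A Monte Carlo sketch (N = 20, the trial count, and the seed are arbitrary choices):

```python
import random

# Monte Carlo check of the hat-matching problem: the expected number
# of people who get their own hat back is 1, regardless of N.
random.seed(0)
N, trials = 20, 20000
total = 0
for _ in range(trials):
    hats = list(range(N))
    random.shuffle(hats)  # a uniformly random assignment of hats
    total += sum(1 for i, h in enumerate(hats) if i == h)
print(total / trials)  # close to 1
```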
Homework: Coupon-Collecting Problem
Suppose that there are N different types of coupons and that each time one obtains a
coupon it is equally likely to be any of the N types. Find the expected number of coupons one
needs to amass to obtain a complete set containing all N types.

Homework: Random Walk
Suppose that a particle initially at the origin undergoes a sequence of n steps of unit
length, each in a completely random direction. Compute E[D²], where D is the distance of the
particle from the origin after n steps.
Moment Generating Functions
Definition
The moment generating function (mgf) M(t) of a random variable X is defined for all real
values of t by

M(t) = E[e^{tX}]
     = Σ_x e^{tx} p(x)            if X is discrete with mass function p(x),
     = ∫_{−∞}^{∞} e^{tx} f(x) dx  if X is continuous with density function f(x).
Note
Moment generating functions are so called because all the moments of X can be obtained by
successively differentiating M(t) and evaluating the resulting functions at t = 0:

M′(t) = (d/dt) E[e^{tX}]
      = E[(d/dt) e^{tX}]
      = E[X e^{tX}].

Hence

M′(0) = E[X].

Similarly,

M″(t) = E[X² e^{tX}] and M″(0) = E[X²].

In general,

M^(n)(t) = E[X^n e^{tX}] and M^(n)(0) = E[X^n] (n ≥ 1).
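Differentiating the mgf at t = 0 can be imitated numerically. The sketch below applies finite differences to the exponential mgf M(t) = λ/(λ − t) (derived later in these notes) to recover the first two moments, which for an exponential(λ) are 1/λ and 2/λ²:

```python
# Recover moments from an mgf by finite differences at t = 0.
# Example mgf: exponential with parameter lam, M(t) = lam / (lam - t).
lam = 2.0
M = lambda t: lam / (lam - t)

h = 1e-5
first = (M(h) - M(-h)) / (2 * h)            # ~ M'(0)  = E[X]   = 1/lam
second = (M(h) - 2 * M(0) + M(-h)) / h**2   # ~ M''(0) = E[X^2] = 2/lam^2
assert abs(first - 1 / lam) < 1e-6
assert abs(second - 2 / lam**2) < 1e-4
```

The looser tolerance on the second moment reflects the roundoff amplification of second-order finite differences.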
Example
Let X be a binomial random variable with parameters n and p. Then

M(t) = E[e^{tX}]
     = Σ_{k=0}^{n} e^{tk} C(n, k) p^k (1 − p)^{n−k}
     = Σ_{k=0}^{n} C(n, k) (pe^t)^k (1 − p)^{n−k}
     = (pe^t + 1 − p)^n
     = (1 − p + pe^t)^n.
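The algebra above can be spot-checked numerically: the weighted sum Σ e^{tk} C(n, k) p^k (1 − p)^{n−k} should equal (1 − p + pe^t)^n. The values of n, p, and t below are arbitrary illustrative choices:

```python
from math import comb, exp

# Numerical check of the binomial mgf identity.
n, p, t = 8, 0.4, 0.7
lhs = sum(exp(t * k) * comb(n, k) * p**k * (1 - p)**(n - k)
          for k in range(n + 1))
rhs = (1 - p + p * exp(t))**n
assert abs(lhs - rhs) < 1e-9
```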
Independence and Moment Generating Functions
Proposition
If X and Y are independent random variables, then

M_{X+Y}(t) = M_X(t) M_Y(t).

Proof.

M_{X+Y}(t) = E[e^{t(X+Y)}]
           = E[e^{tX+tY}]
           = E[e^{tX} e^{tY}]
           = E[e^{tX}] E[e^{tY}]
           = M_X(t) M_Y(t).
Note
Let X be a binomial random variable with parameters n and p. Then X = X1 + ... + Xn,
where X1, ..., Xn are independent and identically distributed (iid) Bernoulli random variables
with parameter p.
Then the mgf of X is

M_X(t) = M_{X1}(t) M_{X2}(t) ··· M_{Xn}(t) = (1 − p + pe^t)^n.

Homework: Compute E[X] and E[X²] using the mgf above. Hence compute the variance of X.
Moment Generating Function of Standard Distributions
Derive the moment generating functions of the following random variables. Hence compute
their mean and variance.
▶ Poisson random variable with parameter λ. (M(t) = e^{λ(e^t − 1)}.)
▶ Geometric random variable with parameter p. (M(t) = pe^t / (1 − (1 − p)e^t).)
▶ Negative binomial random variable with parameters (r, p). (M(t) = [pe^t / (1 − (1 − p)e^t)]^r.)
Moment Generating Function of Exponential Random Variable
Let X be an exponential random variable with parameter λ.
Solution:

M(t) = E[e^{tX}]
     = ∫_0^∞ e^{tx} λ e^{−λx} dx
     = λ ∫_0^∞ e^{−(λ−t)x} dx
     = λ/(λ − t) for t < λ.
Moment Generating Function of Normal Random Variables
Let X be a normal random variable with parameters (µ, σ²), so that X = σZ + µ with Z
standard normal, and recall that E[e^{σtZ}] = e^{σ²t²/2}. Then

M_X(t) = E(e^{tX}) = E[e^{t(σZ+µ)}] = E[e^{µt} e^{σtZ}] = e^{µt} E[e^{σtZ}] = e^{µt} e^{σ²t²/2} = e^{µt + σ²t²/2}.
Homework
1. Find the mean and variance of the standard normal random variable using its moment
generating function. (Mean: 0. Variance: 1.)
2. Find the mean and variance of the normal random variable with parameters (µ, σ 2 ) using
its moment generating function. (Mean: µ. Variance: σ 2 .)
Moment Generating Function of Standard Distributions
Find the moment generating functions of the following random variables. Hence compute their
mean and variance.
▶ Uniform random variable over the interval [a, b]. (M(t) = (e^{tb} − e^{ta}) / (t(b − a)).)
▶ Gamma random variable with parameters (α, λ). (M(t) = (λ/(λ − t))^α.)
Fact
The moment generating function of a random variable uniquely determines the distribution.
Examples
▶ If the mgf of a random variable X is 1 − p + pe^t, then X must be Bernoulli with
parameter p.
▶ If the mgf of a random variable X is λ/(λ − t), then X must be exponential with parameter λ.
▶ If the mgf of a random variable X is e^{t²/2}, then X must be the standard normal random
variable.
Example
Show that if X and Y are independent normal random variables with respective parameters
(µ1, σ1²) and (µ2, σ2²), then X + Y is normal with parameters (µ1 + µ2, σ1² + σ2²).
Solution:
By independence,

M_{X+Y}(t) = M_X(t) M_Y(t) = e^{µ1 t + σ1² t²/2} e^{µ2 t + σ2² t²/2} = e^{(µ1 + µ2)t + (σ1² + σ2²)t²/2}.

We recognize the above as the mgf of a normal random variable with parameters
(µ1 + µ2, σ1² + σ2²). From the uniqueness property of mgfs, it now follows that X + Y is
normal with parameters (µ1 + µ2, σ1² + σ2²).
Markov’s Inequality
Proposition
Let X be a non-negative random variable. Then for any value a > 0,

P(X ≥ a) ≤ E[X]/a.

Proof (for discrete X; the continuous case is analogous):

E[X] = Σ_x x P(X = x)
     ≥ Σ_{x ≥ a} x P(X = x)
     ≥ Σ_{x ≥ a} a P(X = x)
     = a P(X ≥ a).

∴ P(X ≥ a) ≤ E[X]/a.
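A concrete instance of Markov's inequality, using the fair-die pmf with a = 5 (the bound E[X]/a = 7/10 is far from tight here; the true tail probability is 1/3):

```python
from fractions import Fraction

# Markov's inequality on a fair die: P(X >= 5) <= E[X]/5.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
a = 5
mean = sum(x * p for x, p in pmf.items())             # 7/2
tail = sum(p for x, p in pmf.items() if x >= a)        # P(X >= 5) = 1/3
assert tail <= mean / a
print(tail, mean / a)  # 1/3 vs 7/10
```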
Chebyshev’s Inequality
Proposition
Let X be a random variable with mean µ and variance σ². Then for any value k > 0,

P(|X − µ| ≥ k) ≤ σ²/k².

Proof:
Note that (X − µ)² is a non-negative random variable. Hence applying Markov’s inequality
with a = k², we obtain

P((X − µ)² ≥ k²) ≤ E[(X − µ)²]/k² = σ²/k².

But (X − µ)² ≥ k² if and only if |X − µ| ≥ k (> 0). Hence the above inequality is equivalent to

P(|X − µ| ≥ k) ≤ σ²/k².
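Chebyshev's bound for the fair die with k = 2, again in exact arithmetic (the bound 35/48 comfortably dominates the true tail probability 1/3):

```python
from fractions import Fraction

# Chebyshev's inequality on a fair die: mu = 7/2, sigma^2 = 35/12.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
mu = sum(x * p for x, p in pmf.items())
var = sum(x * x * p for x, p in pmf.items()) - mu**2
k = 2
tail = sum(p for x, p in pmf.items() if abs(x - mu) >= k)  # x in {1, 6}
assert tail <= var / Fraction(k * k)
```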
Example
The number of items produced in a factory during a week is a random variable with mean 50.
1. What can be said about the probability that this week’s production will exceed 75?
2. If the variance of a week’s production is known to equal 25, what can be said about the
probability that this week’s production will be between 40 and 60?
Solution:
1. By Markov’s inequality,

P(X > 75) ≤ E[X]/75 = 50/75 = 2/3.

2. By Chebyshev’s inequality,

P(|X − 50| ≥ 10) ≤ σ²/10² = 25/100 = 1/4.

Hence

P(40 < X < 60) = P(|X − 50| < 10) ≥ 1 − 1/4 = 3/4.
Homework
Let X be a normal random variable with parameters (µ, σ²). Using Chebyshev’s inequality, find
an upper bound for the probability P(|X − µ| ≥ 2σ). (Answer: 0.25) Also approximate this
probability using the normal table. (Answer: 0.0456)

Note
If Var(X) = 0, then

P(X = E[X]) = 1.

That is, a random variable with zero variance is constant with probability 1.
Note
If X1, X2, ..., Xn are independent, each with variance σ², then

Var((X1 + X2 + ... + Xn)/n) = (1/n²) × Var(X1 + X2 + ... + Xn) = (1/n²) × n × σ² = σ²/n.
The Weak Law of Large Numbers
Theorem
Let X1, X2, X3, ... be a sequence of independent and identically distributed (iid) random
variables, each having finite mean E[Xi] = µ. Then for any ε > 0,

P(|(X1 + X2 + ... + Xn)/n − µ| ≥ ε) → 0 as n → ∞.

Proof:
Assume that the random variables have a finite variance σ². Note that

E[(X1 + X2 + ... + Xn)/n] = µ and Var((X1 + X2 + ... + Xn)/n) = σ²/n.

Hence, by Chebyshev’s inequality,

P(|(X1 + X2 + ... + Xn)/n − µ| ≥ ε) ≤ σ²/(nε²) → 0 as n → ∞.
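The weak law is easy to watch in simulation: sample means of iid uniform(0, 1) draws cluster tightly around µ = 0.5 once n is large (the seed and sizes below are arbitrary choices):

```python
import random

# Sample means of iid uniform(0,1) draws concentrating around mu = 0.5,
# in the spirit of the weak law of large numbers.
random.seed(1)

def sample_mean(n):
    return sum(random.random() for _ in range(n)) / n

means = [sample_mean(10000) for _ in range(20)]
assert all(abs(m - 0.5) < 0.02 for m in means)
```

With n = 10000 the standard deviation of each sample mean is about 0.003, so a 0.02 tolerance is a comfortable margin.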
Note
Let X be a random variable with mean µ and variance σ². (X need not be a normal random
variable.)
The Central Limit Theorem
Theorem
Let X1, X2, X3, ... be a sequence of independent and identically distributed (iid) random
variables, each having mean µ and variance σ². Then the distribution of

((X1 + X2 + ... + Xn)/n − µ) / (σ/√n) = (X1 + X2 + ... + Xn − nµ) / (σ√n)

tends to that of the standard normal random variable as n → ∞.
Proof:
We will prove that the mgf (moment generating function) of the random variable

(X1 + X2 + ... + Xn − nµ) / (σ√n)

tends to the mgf of the standard normal as n → ∞. This implies the theorem (by a lemma).
Let Zi = (Xi − µ)/σ for i = 1, 2, .... Then

Z1 + Z2 + ... + Zn = (X1 + X2 + ... + Xn − nµ)/σ.

Note also that E[Zi] = 0 and Var(Zi) = 1 for all i. Hence E[Zi²] = 1. Expanding M_{Zi} in a
Taylor series about 0,

M_{Zi}(t/√n) = 1 + t²/(2n) + (t³/(6n^{3/2})) E[Zi³] + ...
             ≈ 1 + t²/(2n) for n large.

Finally,

M_{(Z1 + Z2 + ... + Zn)/√n}(t) = M_{Z1 + Z2 + ... + Zn}(t/√n)
                               = [M_{Z1}(t/√n)]^n
                               ≈ (1 + t²/(2n))^n
                               → e^{t²/2} as n → ∞.
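The convergence of the standardized sum to a standard normal can be seen empirically. The sketch below uses uniform(0, 1) summands; n, the number of repetitions, and the seed are arbitrary choices:

```python
import random
from math import sqrt

# Standardized sums (S_n - n*mu) / (sigma*sqrt(n)) of iid uniform(0,1)
# draws should be approximately standard normal: mean ~ 0, variance ~ 1,
# and roughly 68% of values in [-1, 1].
random.seed(2)
n, reps = 500, 4000
mu, sigma = 0.5, sqrt(1 / 12)   # mean and sd of uniform(0, 1)

zs = []
for _ in range(reps):
    s = sum(random.random() for _ in range(n))
    zs.append((s - n * mu) / (sigma * sqrt(n)))

m = sum(zs) / reps
v = sum(z * z for z in zs) / reps - m * m
frac = sum(1 for z in zs if -1 <= z <= 1) / reps
assert abs(m) < 0.1 and abs(v - 1) < 0.1 and abs(frac - 0.68) < 0.05
```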
The Strong Law of Large Numbers
Theorem
Let X1, X2, X3, ... be a sequence of independent and identically distributed (iid) random
variables, each having finite mean E[Xi] = µ. Then, with probability 1,

(X1 + X2 + ... + Xn)/n → µ as n → ∞.

Proof:
We prove the theorem assuming that E[Xi⁴] = K < ∞. Let us first assume that E[Xi] = 0, and
write Sn = X1 + X2 + ... + Xn. Expanding Sn⁴ and using independence (every term containing
a factor E[Xi] = 0 vanishes),

E[Sn⁴] = n E[Xi⁴] + 6 C(n, 2) E[Xi² Xj²]
       = nK + 3n(n − 1) E[Xi²] E[Xj²].
Also,

0 ≤ Var(Xi²) = E[Xi⁴] − (E[Xi²])².

Hence

(E[Xi²])² ≤ E[Xi⁴] = K.

Hence

E[Sn⁴] ≤ nK + 3n(n − 1)K.

This implies that

E[Sn⁴/n⁴] ≤ K/n³ + 3K/n².
Hence

Σ_{n=1}^{∞} E[Sn⁴/n⁴] ≤ K Σ_{n=1}^{∞} 1/n³ + 3K Σ_{n=1}^{∞} 1/n² < ∞.
Thus

E[Σ_{n=1}^{∞} Sn⁴/n⁴] = Σ_{n=1}^{∞} E[Sn⁴/n⁴] < ∞.

This implies that, with probability 1, Σ_{n=1}^{∞} Sn⁴/n⁴ < ∞.
This in turn implies that Sn⁴/n⁴ → 0 and hence that Sn/n → 0.
That is, (X1 + X2 + ... + Xn)/n → 0.
When the mean µ ≠ 0, we apply the above arguments to Yi = Xi − µ and conclude that
Tn/n → 0 with probability 1, where Tn = Y1 + ... + Yn.
This implies that (X1 + X2 + ... + Xn)/n → µ with probability 1.
Applications of the SLLN
Theorem
Let X, X1, X2, X3, ... be a sequence of independent and identically distributed (iid) random
variables, each having a finite mean µ. Then, with probability 1,

(X1 + X2 + ... + Xn)/n → E[X] = µ as n → ∞.

Theorem
Consider an event A with P(A) unknown. Perform the underlying experiment repeatedly and
independently, and set Xi = 1 if A occurs in the ith trial and Xi = 0 otherwise. Then, with
probability 1,

(X1 + X2 + ... + Xn)/n → P(A) as n → ∞.
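The second theorem is exactly the frequentist reading of probability: relative frequencies converge to probabilities. A quick simulation with A = "a fair die shows six" (the seed and sample size are arbitrary choices):

```python
import random

# Relative frequency of an event converging to its probability:
# A = "die shows a six", P(A) = 1/6.
random.seed(3)
n = 60000
hits = sum(1 for _ in range(n) if random.randint(1, 6) == 6)
print(hits / n)  # close to 1/6 ~ 0.1667
```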