
2. Bivariate Random Variables

Dave Goldsman
H. Milton Stewart School of Industrial and Systems Engineering
Georgia Institute of Technology

3/2/20

ISYE 6739
1 Introduction
2 Marginal Distributions
3 Conditional Distributions
4 Independent Random Variables
5 Consequences of Independence
6 Random Samples
7 Conditional Expectation
8 Double Expectation
9 Honors Class: First-Step Analysis
10 Honors Class: Random Sums of Random Variables
11 Honors Class: Standard Conditioning Argument
12 Covariance and Correlation
13 Correlation and Causation
14 A Couple of Worked Correlation Examples
15 Some Useful Covariance / Correlation Theorems
16 Moment Generating Functions, Revisited
17 Honors Bivariate Functions of Random Variables
ISYE 6739
Introduction

Lesson 2.1 — Introduction

In this introductory lesson, we’ll cover . . .


What we mean by bivariate (or joint) random variables.
The discrete case.
The continuous case.
Bivariate cdf’s.

In this module, we’ll look at what happens when you consider two random
variables simultaneously.

Example: Choose a person at random. Look at their height and weight


(X, Y ). Obviously, X and Y will be related somehow.

ISYE 6739
Introduction

Discrete Case

Definition: If X and Y are discrete random variables, then (X, Y ) is called a


jointly discrete bivariate random variable.

The joint (or bivariate) pmf is

f (x, y) = P (X = x, Y = y), ∀x, y.

Properties:
0 ≤ f(x, y) ≤ 1.
Σ_x Σ_y f(x, y) = 1.
A ⊆ R² ⇒ P((X, Y) ∈ A) = Σ_{(x,y)∈A} f(x, y).

ISYE 6739
Introduction

Example: 3 sox in a box (numbered 1,2,3). Draw 2 sox at random without


replacement. X = # of the first sock; Y = # of the second sock. The joint
pmf f (x, y) is

f (x, y) X=1 X=2 X=3 P (Y = y)


Y =1 0 1/6 1/6 1/3
Y =2 1/6 0 1/6 1/3
Y =3 1/6 1/6 0 1/3
P (X = x) 1/3 1/3 1/3 1

fX (x) ≡ P (X = x) is the “marginal” pmf of X.

fY (y) ≡ P (Y = y) is the “marginal” pmf of Y .

ISYE 6739
Introduction

By the Law of Total Probability,


P(X = 1) = Σ_{y=1}^{3} P(X = 1, Y = y) = 1/3.

In addition,

P(X ≥ 2, Y ≥ 2) = Σ_{x≥2} Σ_{y≥2} f(x, y) = f(2, 2) + f(2, 3) + f(3, 2) + f(3, 3) = 0 + 1/6 + 1/6 + 0 = 1/3. 2
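As a quick illustration (my own, not from the slides), here is a short Python sketch that stores this joint pmf as an array and recovers the marginals and P(X ≥ 2, Y ≥ 2) by brute-force summation; the array layout and names are just illustrative choices.

import numpy as np

# Rows index y = 1, 2, 3; columns index x = 1, 2, 3.
f = np.array([[0,   1/6, 1/6],
              [1/6, 0,   1/6],
              [1/6, 1/6, 0  ]])

f_X = f.sum(axis=0)      # marginal pmf of X (sum over y): [1/3, 1/3, 1/3]
f_Y = f.sum(axis=1)      # marginal pmf of Y (sum over x): [1/3, 1/3, 1/3]
p   = f[1:, 1:].sum()    # P(X >= 2, Y >= 2) = 1/3
print(f_X, f_Y, p)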

ISYE 6739
Introduction

Continuous Case

Definition: If X and Y are continuous RVs, then (X, Y ) is a jointly


continuous bivariate RV if there exists a magic function f (x, y) such that
f (x, y) ≥ 0, ∀x, y.
∬_{R²} f(x, y) dx dy = 1.
P(A) = P((X, Y) ∈ A) = ∬_A f(x, y) dx dy.

In this case, f (x, y) is called the joint pdf.

If A ⊆ R2 , then P (A) is the volume between f (x, y) and A.

Think of

f (x, y) dx dy ≈ P (x < X < x + dx, y < Y < y + dy).

It’s easy to see how this generalizes the 1-dimensional pdf, f (x).

ISYE 6739
Introduction

Example: Choose a point (X, Y ) at random in the interior of the circle


inscribed in the unit square, e.g., C ≡ {(x − 1/2)² + (y − 1/2)² ≤ 1/4}.

Find the pdf of (X, Y ).

Since the area of the circle is π/4,


f(x, y) = 4/π if (x, y) ∈ C, and 0 otherwise. 2

Application: Toss n darts randomly into the unit square. The probability
that any individual dart will land in the circle is π/4. It stands to reason that
the proportion of darts, p̂n , that land in the circle will be approximately π/4.
So you can use 4p̂n to estimate π!
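Here is a minimal Monte Carlo sketch of this dart-throwing idea (mine, not part of the slides); the number of darts n is an arbitrary choice.

import random

n = 1_000_000
hits = 0
for _ in range(n):
    x, y = random.random(), random.random()      # dart lands uniformly in the unit square
    if (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.25:  # inside the inscribed circle C
        hits += 1
print(4 * hits / n)                              # approximately 3.14159...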

ISYE 6739
Introduction

Example: Suppose that


f(x, y) = 4xy if 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, and 0 otherwise.

Find the probability (volume) of the region 0 ≤ y ≤ 1 − x².

V = ∫_0^1 ∫_0^{1−x²} 4xy dy dx = ∫_0^1 ∫_0^{√(1−y)} 4xy dx dy = 1/3.

Moral: Be careful with limits! 2
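A quick simulation sketch (my own): since f(x, y) = 4xy factors as (2x)(2y), X and Y can be sampled independently, each with pdf 2x on [0, 1], i.e., as the square root of a Uniform(0, 1); the sample size is arbitrary.

import random

n = 1_000_000
count = 0
for _ in range(n):
    x = random.random() ** 0.5    # pdf 2x on [0,1] via inverse cdf (the cdf is x^2)
    y = random.random() ** 0.5    # same marginal, generated independently of x
    if y <= 1 - x * x:            # the region 0 <= y <= 1 - x^2
        count += 1
print(count / n)                  # approximately 1/3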

ISYE 6739
Introduction

Bivariate cdf’s

Definition: The joint (bivariate) cdf of X and Y is


F(x, y) ≡ P(X ≤ x, Y ≤ y), for all x, y.

F(x, y) = Σ_{s≤x} Σ_{t≤y} f(s, t) in the discrete case, and ∫_{−∞}^{y} ∫_{−∞}^{x} f(s, t) ds dt in the continuous case.

Going from cdf’s to pdf’s (continuous case):


1 dimension: f(x) = F′(x) = (d/dx) ∫_{−∞}^{x} f(t) dt.

2 dimensions: f(x, y) = ∂²F(x, y)/∂x∂y = ∂²/∂x∂y ∫_{−∞}^{x} ∫_{−∞}^{y} f(s, t) dt ds.

ISYE 6739
Introduction

Properties:

F (x, y) is non-decreasing in both x and y.

limx→−∞ F (x, y) = limy→−∞ F (x, y) = 0.

limx→∞ F (x, y) = FY (y) = P (Y ≤ y) (“marginal” cdf of Y ).

limy→∞ F (x, y) = FX (x) = P (X ≤ x) (“marginal” cdf of X).

limx→∞ limy→∞ F (x, y) = 1.

F (x, y) is continuous from the right in both x and y.

ISYE 6739
Introduction

Example: Suppose

F(x, y) = 1 − e^{−x} − e^{−y} + e^{−(x+y)} if x ≥ 0, y ≥ 0, and 0 if x < 0 or y < 0.

The marginal cdf of X is


FX(x) = lim_{y→∞} F(x, y) = 1 − e^{−x} if x ≥ 0, and 0 if x < 0.

The joint pdf is


f(x, y) = ∂²F(x, y)/∂x∂y = ∂/∂y (e^{−x} − e^{−y}e^{−x}) = e^{−(x+y)}, if x ≥ 0, y ≥ 0. 2

ISYE 6739
Marginal Distributions

1 Introduction
2 Marginal Distributions
3 Conditional Distributions
4 Independent Random Variables
5 Consequences of Independence
6 Random Samples
7 Conditional Expectation
8 Double Expectation
9 Honors Class: First-Step Analysis
10 Honors Class: Random Sums of Random Variables
11 Honors Class: Standard Conditioning Argument
12 Covariance and Correlation
13 Correlation and Causation
14 A Couple of Worked Correlation Examples
15 Some Useful Covariance / Correlation Theorems
16 Moment Generating Functions, Revisited
17 Honors Bivariate Functions of Random Variables
ISYE 6739
Marginal Distributions

Lesson 2.2 — Marginal Distributions

We’re also interested in the individual (marginal) distributions of X and Y .

Definition: If X and Y are jointly discrete, then the marginal pmf’s of X


and Y are, respectively,
fX(x) = P(X = x) = Σ_y f(x, y)

and

fY(y) = P(Y = y) = Σ_x f(x, y).

ISYE 6739
Marginal Distributions

Example (discrete case): f (x, y) = P (X = x, Y = y).

f (x, y) X=1 X=2 X=3 P (Y = y)


Y = 40 0.01 0.07 0.12 0.2
Y = 60 0.29 0.03 0.48 0.8
P (X = x) 0.3 0.1 0.6 1

By total probability,

P (X = 1) = P (X = 1, Y = any #) = 0.3. 2

ISYE 6739
Marginal Distributions

Example (discrete case): f (x, y) = P (X = x, Y = y).

f (x, y) X=1 X=2 X=3 P (Y = y)


Y = 40 0.06 0.02 0.12 0.2
Y = 60 0.24 0.08 0.48 0.8
P (X = x) 0.3 0.1 0.6 1

Remark: Hmmm. . . . Compared to the last example, this has the same
marginals but different joint distribution! That’s because the joint distribution
contains much more information than just the marginals.

ISYE 6739
Marginal Distributions

Definition: If X and Y are jointly continuous, then the marginal pdf’s of


X and Y are, respectively,
fX(x) = ∫_R f(x, y) dy and fY(y) = ∫_R f(x, y) dx.

Example: f(x, y) = e^{−(x+y)} if x ≥ 0, y ≥ 0, and 0 otherwise.

Then the marginal pdf of X is

fX(x) = ∫_R f(x, y) dy = ∫_0^∞ e^{−(x+y)} dy = e^{−x}, if x ≥ 0. 2

ISYE 6739
Marginal Distributions

Example: f(x, y) = (21/4) x² y if x² ≤ y ≤ 1, and 0 otherwise.

Note the funny limits where the pdf is positive, i.e., x² ≤ y ≤ 1.

fX(x) = ∫_R f(x, y) dy = ∫_{x²}^{1} (21/4) x² y dy = (21/8) x² (1 − x⁴), −1 ≤ x ≤ 1.

fY(y) = ∫_R f(x, y) dx = ∫_{−√y}^{√y} (21/4) x² y dx = (7/2) y^{5/2}, 0 ≤ y ≤ 1. 2

ISYE 6739
Conditional Distributions

1 Introduction
2 Marginal Distributions
3 Conditional Distributions
4 Independent Random Variables
5 Consequences of Independence
6 Random Samples
7 Conditional Expectation
8 Double Expectation
9 Honors Class: First-Step Analysis
10 Honors Class: Random Sums of Random Variables
11 Honors Class: Standard Conditioning Argument
12 Covariance and Correlation
13 Correlation and Causation
14 A Couple of Worked Correlation Examples
15 Some Useful Covariance / Correlation Theorems
16 Moment Generating Functions, Revisited
17 Honors Bivariate Functions of Random Variables
ISYE 6739
Conditional Distributions

Lesson 2.3 — Conditional Distributions

Recall conditional probability: P (A|B) = P (A ∩ B)/P (B) if P (B) > 0.

Suppose that X and Y are jointly discrete RVs. Then if P (X = x) > 0,

P(Y = y|X = x) = P(X = x ∩ Y = y)/P(X = x) = f(x, y)/fX(x).
P (Y = y|X = 2) defines the probabilities on Y given that X = 2.

Definition: If fX (x) > 0, then the conditional pmf/pdf of Y given


X = x is

fY|X(y|x) ≡ f(x, y)/fX(x).

Remark: We usually just write f (y|x) instead of fY |X (y|x).

Remark: Of course, fX|Y(x|y) = f(x|y) = f(x, y)/fY(y).
ISYE 6739
Conditional Distributions

Discrete Example: f (x, y) = P (X = x, Y = y).

f (x, y) X=1 X=2 X=3 fY (y)


Y = 40 0.01 0.07 0.12 0.2
Y = 60 0.29 0.03 0.48 0.8
fX (x) 0.3 0.1 0.6 1

Then, for example,

f(x|y = 60) = f(x, 60)/fY(60) = f(x, 60)/0.8 = 29/80 if x = 1; 3/80 if x = 2; 48/80 if x = 3. 2

ISYE 6739
Conditional Distributions

Old Continuous Example:

f(x, y) = (21/4) x² y, if x² ≤ y ≤ 1.
fX(x) = (21/8) x² (1 − x⁴), if −1 ≤ x ≤ 1.
fY(y) = (7/2) y^{5/2}, if 0 ≤ y ≤ 1.

Then the conditional pdf of Y given X = x is

f(y|x) = f(x, y)/fX(x) = [(21/4) x² y] / [(21/8) x² (1 − x⁴)] = 2y/(1 − x⁴), if x² ≤ y ≤ 1.

ISYE 6739
Conditional Distributions

So, for example,

f(y|1/2) = f(1/2, y)/fX(1/2) = [(21/4)·(1/4)·y] / [(21/8)·(1/4)·(1 − 1/16)] = (32/15) y, if 1/4 ≤ y ≤ 1. 2

Note that 2/(1 − x⁴) is a constant with respect to y, and we can check to see that f(y|x) is a legit conditional pdf:

∫_R f(y|x) dy = ∫_{x²}^{1} 2y/(1 − x⁴) dy = 1. 2

ISYE 6739
Conditional Distributions

Typical Problem: Given fX(x) and f(y|x), find fY(y).

Game Plan: Find f(x, y) = fX(x)f(y|x) and then fY(y) = ∫_R f(x, y) dx.

Example: Suppose fX(x) = 2x, for 0 < x < 1. Given X = x, suppose that Y|x ∼ Unif(0, x). Now find fY(y).

Solution: Y|x ∼ Unif(0, x) implies that f(y|x) = 1/x, for 0 < y < x. So,

f(x, y) = fX(x)f(y|x) = 2x · (1/x) = 2, for 0 < y < x < 1 (still have funny limits).

Thus,

fY(y) = ∫_R f(x, y) dx = ∫_y^1 2 dx = 2(1 − y), 0 < y < 1. 2
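A short simulation sketch (mine, not from the slides) of this hierarchical construction: draw X with pdf 2x, then Y uniform on (0, X); the sample mean of Y should be close to ∫_0^1 y · 2(1 − y) dy = 1/3, consistent with the fY(y) just derived.

import random

n = 1_000_000
total = 0.0
for _ in range(n):
    x = random.random() ** 0.5    # X has pdf 2x on (0, 1)
    y = random.uniform(0.0, x)    # Y | X = x ~ Unif(0, x)
    total += y
print(total / n)                  # approximately 1/3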

ISYE 6739
Independent Random Variables

1 Introduction
2 Marginal Distributions
3 Conditional Distributions
4 Independent Random Variables
5 Consequences of Independence
6 Random Samples
7 Conditional Expectation
8 Double Expectation
9 Honors Class: First-Step Analysis
10 Honors Class: Random Sums of Random Variables
11 Honors Class: Standard Conditioning Argument
12 Covariance and Correlation
13 Correlation and Causation
14 A Couple of Worked Correlation Examples
15 Some Useful Covariance / Correlation Theorems
16 Moment Generating Functions, Revisited
17 Honors Bivariate Functions of Random Variables
ISYE 6739
Independent Random Variables

Lesson 2.4 — Independent Random Variables

Recall that two events are independent if P (A ∩ B) = P (A)P (B).

Then

P(A|B) = P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A).
And similarly, P (B|A) = P (B).

Now we want to define independence for random variables, i.e., the outcome
of X doesn’t influence the outcome of Y (and vice versa).

Definition: X and Y are independent RVs if, for all x and y,

f (x, y) = fX (x)fY (y).

ISYE 6739
Independent Random Variables

Equivalent definitions:

F (x, y) = FX (x)FY (y), ∀x, y


or
P (X ≤ x, Y ≤ y) = P (X ≤ x)P (Y ≤ y), ∀x, y.

If X and Y aren’t independent, then they’re dependent.

Nice, Intuitive Theorem: X and Y are independent if and only if


f (y|x) = fY (y) ∀x, y.

Proof:

f(y|x) = f(x, y)/fX(x) = fX(x)fY(y)/fX(x) = fY(y). 2

Similarly, X and Y independent implies f (x|y) = fX (x).

ISYE 6739
Independent Random Variables

Example (discrete): f (x, y) = P (X = x, Y = y).

f (x, y) X=1 X=2 fY (y)


Y =2 0.12 0.28 0.4
Y =3 0.18 0.42 0.6
fX (x) 0.3 0.7 1

X and Y are independent since f (x, y) = fX (x)fY (y), ∀x, y. 2

ISYE 6739
Independent Random Variables

Example (continuous): Suppose f(x, y) = 6xy², 0 ≤ x ≤ 1, 0 ≤ y ≤ 1.

After some work (which can be avoided by the next theorem), we can derive

fX(x) = 2x, if 0 ≤ x ≤ 1, and
fY(y) = 3y², if 0 ≤ y ≤ 1.

X and Y are independent since f (x, y) = fX (x)fY (y), ∀x, y. 2

ISYE 6739
Independent Random Variables

Easy way to tell if X and Y are independent. . . .

Theorem: X and Y are independent iff f (x, y) = a(x)b(y), ∀x, y, for some
functions a(x) and b(y) (not necessarily pdf’s).

So if f (x, y) factors into separate functions of x and y, then X and Y are


independent.

But if there are funny limits, this messes up the factorization, so in that case,
X and Y will be dependent — watch out!

Example: f(x, y) = 6xy², 0 ≤ x ≤ 1, 0 ≤ y ≤ 1. Take

a(x) = 6x, 0 ≤ x ≤ 1, and b(y) = y², 0 ≤ y ≤ 1.

Thus, X and Y are independent (as above). 2

ISYE 6739
Independent Random Variables

Example: f(x, y) = (21/4) x² y, x² ≤ y ≤ 1.

Funny (non-rectangular) limits make factoring into marginals impossible. Thus, X and Y are not independent. 2

Example: f(x, y) = c/(x + y), 1 ≤ x ≤ 2, 1 ≤ y ≤ 3.

Can’t factor f (x, y) into functions of x and y separately. Thus, X and Y are
not independent. 2

Now that we can figure out if X and Y are independent, what can we do with
that knowledge?

ISYE 6739
Consequences of Independence

1 Introduction
2 Marginal Distributions
3 Conditional Distributions
4 Independent Random Variables
5 Consequences of Independence
6 Random Samples
7 Conditional Expectation
8 Double Expectation
9 Honors Class: First-Step Analysis
10 Honors Class: Random Sums of Random Variables
11 Honors Class: Standard Conditioning Argument
12 Covariance and Correlation
13 Correlation and Causation
14 A Couple of Worked Correlation Examples
15 Some Useful Covariance / Correlation Theorems
16 Moment Generating Functions, Revisited
17 Honors Bivariate Functions of Random Variables
ISYE 6739
Consequences of Independence

Lesson 2.5 — Consequences of Independence

Definition/Theorem (two-dimensional Unconscious Statistician):


Let h(X, Y ) be a function of the RVs X and Y . Then
E[h(X, Y)] = Σ_x Σ_y h(x, y) f(x, y) in the discrete case, and ∫_R ∫_R h(x, y) f(x, y) dx dy in the continuous case.

Theorem: Whether or not X and Y are independent,

E[X + Y ] = E[X] + E[Y ].

ISYE 6739
Consequences of Independence

Proof (continuous case):

E[X + Y] = ∫_R ∫_R (x + y) f(x, y) dx dy (2-D LOTUS)
= ∫_R ∫_R x f(x, y) dx dy + ∫_R ∫_R y f(x, y) dx dy
= ∫_R x [∫_R f(x, y) dy] dx + ∫_R y [∫_R f(x, y) dx] dy
= ∫_R x fX(x) dx + ∫_R y fY(y) dy
= E[X] + E[Y]. 2

ISYE 6739
Consequences of Independence

One can generalize this result to more than two random variables.

Corollary: If X1, X2, . . . , Xn are RVs, then

E[Σ_{i=1}^n Xi] = Σ_{i=1}^n E[Xi].

Proof: Induction. 2

ISYE 6739
Consequences of Independence

Theorem: If X and Y are independent, then E[XY ] = E[X]E[Y ].

Proof (continuous case):

E[XY] = ∫_R ∫_R xy f(x, y) dx dy (2-D LOTUS)
= ∫_R ∫_R xy fX(x) fY(y) dx dy (X and Y are indep)
= [∫_R x fX(x) dx] [∫_R y fY(y) dy]
= E[X]E[Y]. 2

Remark: The above theorem is not necessarily true if X and Y are


dependent. See the upcoming discussion on covariance.

ISYE 6739
Consequences of Independence

Theorem: If X and Y are independent, then

Var(X + Y ) = Var(X) + Var(Y ).

Proof:

Var(X + Y) = E[(X + Y)²] − (E[X + Y])²
= E[X² + 2XY + Y²] − (E[X] + E[Y])²
= E[X²] + 2E[XY] + E[Y²] − {(E[X])² + 2E[X]E[Y] + (E[Y])²}
= E[X²] + 2E[X]E[Y] + E[Y²] − (E[X])² − 2E[X]E[Y] − (E[Y])² (since X and Y are independent)
= E[X²] − (E[X])² + E[Y²] − (E[Y])². 2

Remark: The assumption of independence really is important here. If X and


Y aren’t independent, then the result might not hold!

ISYE 6739
Consequences of Independence

Can generalize. . .

Corollary: If X1, X2, . . . , Xn are independent RVs, then

Var(Σ_{i=1}^n Xi) = Σ_{i=1}^n Var(Xi).

Proof: Induction. 2

Corollary: If X1, X2, . . . , Xn are independent RVs, then

Var(Σ_{i=1}^n ai Xi + b) = Σ_{i=1}^n ai² Var(Xi).

ISYE 6739
Random Samples

1 Introduction
2 Marginal Distributions
3 Conditional Distributions
4 Independent Random Variables
5 Consequences of Independence
6 Random Samples
7 Conditional Expectation
8 Double Expectation
9 Honors Class: First-Step Analysis
10 Honors Class: Random Sums of Random Variables
11 Honors Class: Standard Conditioning Argument
12 Covariance and Correlation
13 Correlation and Causation
14 A Couple of Worked Correlation Examples
15 Some Useful Covariance / Correlation Theorems
16 Moment Generating Functions, Revisited
17 Honors Bivariate Functions of Random Variables
ISYE 6739
Random Samples

Lesson 2.6 — Random Samples

Definition: X1, X2, . . . , Xn form a random sample if

the Xi’s are all independent, and
each Xi has the same pmf/pdf f(x).

Notation: X1, . . . , Xn ∼ iid f(x) (“independent and identically distributed”).

Example/Theorem: Suppose X1, . . . , Xn ∼ iid f(x), with E[Xi] = µ and Var(Xi) = σ². Define the sample mean as X̄ ≡ Σ_{i=1}^n Xi / n. Then

E[X̄] = E[(1/n) Σ_{i=1}^n Xi] = (1/n) Σ_{i=1}^n E[Xi] = (1/n) Σ_{i=1}^n µ = µ.

So the mean of X̄ is the same as the mean of Xi. 2

ISYE 6739
Random Samples

Meanwhile, how about the variance of the sample mean?

Var(X̄) = Var((1/n) Σ_{i=1}^n Xi)
= (1/n²) Var(Σ_{i=1}^n Xi)
= (1/n²) Σ_{i=1}^n Var(Xi) (Xi’s indep)
= (1/n²) Σ_{i=1}^n σ² = σ²/n.

So the mean of X̄ is the same as the mean of Xi , but the variance decreases!
This makes X̄ a great estimator for µ (which is usually unknown in practice);
the result is referred to as the Law of Large Numbers. Stay tuned.
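A quick numerical illustration of my own, assuming Unif(0,1) observations so that µ = 1/2 and σ² = 1/12; the sample size n and replication count are arbitrary choices.

import random
import statistics

n, reps = 25, 20_000
xbars = []
for _ in range(reps):
    sample = [random.random() for _ in range(n)]   # iid Unif(0,1)
    xbars.append(sum(sample) / n)
print(statistics.mean(xbars))       # approximately mu = 0.5
print(statistics.variance(xbars))   # approximately sigma^2/n = (1/12)/25 = 0.0033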

ISYE 6739
Conditional Expectation

1 Introduction
2 Marginal Distributions
3 Conditional Distributions
4 Independent Random Variables
5 Consequences of Independence
6 Random Samples
7 Conditional Expectation
8 Double Expectation
9 Honors Class: First-Step Analysis
10 Honors Class: Random Sums of Random Variables
11 Honors Class: Standard Conditioning Argument
12 Covariance and Correlation
13 Correlation and Causation
14 A Couple of Worked Correlation Examples
15 Some Useful Covariance / Correlation Theorems
16 Moment Generating Functions, Revisited
17 Honors Bivariate Functions of Random Variables
ISYE 6739
Conditional Expectation

Lesson 2.7 — Conditional Expectation

The Next Few Lessons:


Conditional expectation — definition and examples.
“Double” expectation — a very cool theorem.
Honors Class: First-step analysis.
Honors Class: Random sums of random variables.
Honors Class: The standard conditioning argument and its applications.

ISYE 6739
Conditional Expectation

Consider the usual definition of expectation. (E.g., what’s the average weight of a male?)

E[Y] = Σ_y y f(y) in the discrete case, and ∫_R y f(y) dy in the continuous case.

Now suppose we’re interested in the average weight of a 6' tall male.

f(y|x) is the conditional pmf/pdf of Y given X = x.

Definition: The conditional expectation of Y given X = x is

E[Y|X = x] ≡ Σ_y y f(y|x) in the discrete case, and ∫_R y f(y|x) dy in the continuous case.

Note that E[Y |X = x] is a function of x.

ISYE 6739
Conditional Expectation

Discrete Example:

f (x, y) X=0 X=3 fY (y)


Y =2 0.11 0.34 0.45
Y =5 0.00 0.05 0.05
Y = 10 0.29 0.21 0.50
fX (x) 0.40 0.60 1

The unconditional expectation is

E[Y] = Σ_y y fY(y) = 2(0.45) + 5(0.05) + 10(0.50) = 6.15.

ISYE 6739
Conditional Expectation

But conditional on X = 3, we have

f(y|x = 3) = f(3, y)/fX(3) = f(3, y)/0.60 = 34/60 if y = 2; 5/60 if y = 5; 21/60 if y = 10.

So the expectation conditional on X = 3 is

E[Y|X = 3] = Σ_y y f(y|3) = 2(34/60) + 5(5/60) + 10(21/60) = 5.05.

This compares to the unconditional expectation E[Y] = 6.15. So the information that X = 3 pushes the conditional expected value of Y down to 5.05. 2
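For concreteness, here is a small Python sketch of mine that reproduces both numbers directly from the table; the array layout is just an illustrative choice.

import numpy as np

ys = np.array([2, 5, 10])
# Rows index y in (2, 5, 10); columns index x in (0, 3).
f = np.array([[0.11, 0.34],
              [0.00, 0.05],
              [0.29, 0.21]])

E_Y = ys @ f.sum(axis=1)                # unconditional E[Y] = 6.15
f_y_given_3 = f[:, 1] / f[:, 1].sum()   # conditional pmf f(y | x = 3)
E_Y_given_3 = ys @ f_y_given_3          # E[Y | X = 3] = 5.05
print(E_Y, E_Y_given_3)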

ISYE 6739
Conditional Expectation

Old Continuous Example:

f(x, y) = (21/4) x² y, if x² ≤ y ≤ 1.

Recall that

f(y|x) = 2y/(1 − x⁴), if x² ≤ y ≤ 1.

Thus,

E[Y|x] = ∫_R y f(y|x) dy = [2/(1 − x⁴)] ∫_{x²}^{1} y² dy = (2/3) · (1 − x⁶)/(1 − x⁴).

So, e.g., E[Y|X = 0.5] = (2/3) · (63/64)/(15/16) = 0.70. 2

ISYE 6739
Double Expectation

1 Introduction
2 Marginal Distributions
3 Conditional Distributions
4 Independent Random Variables
5 Consequences of Independence
6 Random Samples
7 Conditional Expectation
8 Double Expectation
9 Honors Class: First-Step Analysis
10 Honors Class: Random Sums of Random Variables
11 Honors Class: Standard Conditioning Argument
12 Covariance and Correlation
13 Correlation and Causation
14 A Couple of Worked Correlation Examples
15 Some Useful Covariance / Correlation Theorems
16 Moment Generating Functions, Revisited
17 Honors Bivariate Functions of Random Variables
ISYE 6739
Double Expectation

Lesson 2.8 — Double Expectation

Theorem (double expectation):

E[E(Y |X)] = E[Y ].

Remarks: Yikes, what the heck is this!?

The expected value (averaged over all X’s) of the conditional expected value
(of Y |X) is the plain old expected value (of Y ).

Think of the outside expected value as the expected value of


h(X) = E(Y |X). Then LOTUS miraculously gives us E[Y ].

Believe it or not, sometimes it’s easier to calculate E[Y ] indirectly by using


our double expectation trick.

ISYE 6739
Double Expectation

Proof (continuous case): By the Unconscious Statistician,

E[E(Y|X)] = ∫_R E(Y|x) fX(x) dx
= ∫_R [∫_R y f(y|x) dy] fX(x) dx
= ∫_R ∫_R y f(y|x) fX(x) dx dy
= ∫_R y [∫_R f(x, y) dx] dy
= ∫_R y fY(y) dy
= E[Y]. 2

ISYE 6739
Double Expectation

Old Example: Suppose f(x, y) = (21/4) x² y, if x² ≤ y ≤ 1.

Find E[Y] two ways.

By previous examples, we know that

fX(x) = (21/8) x² (1 − x⁴), if −1 ≤ x ≤ 1,
fY(y) = (7/2) y^{5/2}, if 0 ≤ y ≤ 1, and
E[Y|x] = (2/3) · (1 − x⁶)/(1 − x⁴).

ISYE 6739
Double Expectation

Solution #1 (old, boring way):

E[Y] = ∫_R y fY(y) dy = ∫_0^1 (7/2) y^{7/2} dy = 7/9.

Solution #2 (new, exciting way):

E[Y] = E[E(Y|X)]
= ∫_R E(Y|x) fX(x) dx
= ∫_{−1}^{1} [(2/3) · (1 − x⁶)/(1 − x⁴)] · (21/8) x² (1 − x⁴) dx
= 7/9.

Notice that both answers are the same (good)! 2
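If you would like a machine check, here is a short sympy sketch of mine that evaluates both integrals symbolically; the cancel() call simply removes the common (1 − x⁴) factor before integrating.

import sympy as sp

x, y = sp.symbols('x y')
f_Y = sp.Rational(7, 2) * y ** sp.Rational(5, 2)
f_X = sp.Rational(21, 8) * x ** 2 * (1 - x ** 4)
E_Y_given_x = sp.Rational(2, 3) * (1 - x ** 6) / (1 - x ** 4)

direct = sp.integrate(y * f_Y, (y, 0, 1))                        # Solution #1
double = sp.integrate(sp.cancel(E_Y_given_x * f_X), (x, -1, 1))  # Solution #2
print(direct, double)                                            # 7/9 and 7/9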

ISYE 6739
Honors Class: First-Step Analysis

1 Introduction
2 Marginal Distributions
3 Conditional Distributions
4 Independent Random Variables
5 Consequences of Independence
6 Random Samples
7 Conditional Expectation
8 Double Expectation
9 Honors Class: First-Step Analysis
10 Honors Class: Random Sums of Random Variables
11 Honors Class: Standard Conditioning Argument
12 Covariance and Correlation
13 Correlation and Causation
14 A Couple of Worked Correlation Examples
15 Some Useful Covariance / Correlation Theorems
16 Moment Generating Functions, Revisited
17 Honors Bivariate Functions of Random Variables
ISYE 6739
Honors Class: First-Step Analysis

Lesson 2.9 — Honors Class: First-Step Analysis

Example: “First-step” method to find the mean of Y ∼ Geom(p). Think of Y as the number of coin flips until H appears, where P(H) = p.

Furthermore, consider the first step of the coin flip process, and let X = H or T denote the outcome of the first toss. Based on the result X of this first step, we have

E[Y] = E[E(Y|X)]
= Σ_x E[Y|x] fX(x)
= E[Y|X = T] P(X = T) + E[Y|X = H] P(X = H)
= (1 + E[Y])(1 − p) + (1)(p) (start from scratch if X = T).

Solving, we get E[Y] = 1/p (which is the correct answer)! 2

ISYE 6739
Honors Class: First-Step Analysis

Example: Consider a sequence of coin flips. What is the expected number of


flips Y until “HT” appears for the first time?

Clearly, Y = A + B, where A is the number of flips until the first “H”


appears, and B is the number of subsequent flips until “T” appears for the first
time after the sequence of H’s begins.

For instance, the sequence TTTHHT corresponds to Y = A + B = 4 + 2 = 6.

In any case, it’s obvious that A and B are iid Geom(p = 1/2), so by the
previous example, E[Y ] = E[A] + E[B] = (1/p) + (1/p) = 4. 2

This example didn’t involve first-step analysis (besides using the expected
value of a geometric RV). But the next related example will. . . .

ISYE 6739
Honors Class: First-Step Analysis

Example: Again consider a sequence of coin flips. What is the expected


number of flips Y until “HH” appears for the first time?

For instance, the sequence TTHTTHH corresponds to Y = 7 tries.

Using an enhanced first-step analysis, we see that

E[Y] = E[Y|T] P(T) + E[Y|H] P(H)
= E[Y|T] P(T) + [E[Y|HH] P(HH|H) + E[Y|HT] P(HT|H)] P(H)
= (1 + E[Y])(0.5) + [(2)(0.5) + (2 + E[Y])(0.5)](0.5)
(since we have to start over once we see a T)
= 1.5 + 0.75 E[Y].

Solving, we obtain E[Y ] = 6, which is perhaps surprising given the result


from the previous example. 2
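A brute-force simulation sketch of mine that estimates both expected waiting times; the helper name flips_until and the number of trials are arbitrary.

import random

def flips_until(pattern, trials=100_000):
    total = 0
    for _ in range(trials):
        seq = ""
        while not seq.endswith(pattern):
            seq += random.choice("HT")   # fair coin flips
        total += len(seq)
    return total / trials

print(flips_until("HT"))   # approximately 4
print(flips_until("HH"))   # approximately 6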

ISYE 6739
Honors Class: Random Sums of Random Variables

1 Introduction
2 Marginal Distributions
3 Conditional Distributions
4 Independent Random Variables
5 Consequences of Independence
6 Random Samples
7 Conditional Expectation
8 Double Expectation
9 Honors Class: First-Step Analysis
10 Honors Class: Random Sums of Random Variables
11 Honors Class: Standard Conditioning Argument
12 Covariance and Correlation
13 Correlation and Causation
14 A Couple of Worked Correlation Examples
15 Some Useful Covariance / Correlation Theorems
16 Moment Generating Functions, Revisited
17 Honors Bivariate Functions of Random Variables
ISYE 6739
Honors Class: Random Sums of Random Variables

Lesson 2.10 — Honors Class: Random Sums of Random Variables

Bonus Theorem (expectation of sum of a random number of RVs):

Suppose that X1 , X2 , . . . are independent RVs, all with the same mean.

Also suppose that N is a nonnegative, integer-valued RV that’s independent of the Xi’s. Then

E[Σ_{i=1}^{N} Xi] = E[N] E[X1].

Remark: You have to be very careful here. In particular, note that E[Σ_{i=1}^{N} Xi] ≠ N E[X1], since the LHS is a number and the RHS is random.

ISYE 6739
Honors Class: Random Sums of Random Variables

Proof (cf. Ross): By double expectation,

E[Σ_{i=1}^{N} Xi] = E[ E(Σ_{i=1}^{N} Xi | N) ]
= Σ_{n=1}^{∞} E[Σ_{i=1}^{N} Xi | N = n] P(N = n)
= Σ_{n=1}^{∞} E[Σ_{i=1}^{n} Xi | N = n] P(N = n)
= Σ_{n=1}^{∞} E[Σ_{i=1}^{n} Xi] P(N = n) (N and Xi’s indep)
= Σ_{n=1}^{∞} n E[X1] P(N = n)
= E[X1] Σ_{n=1}^{∞} n P(N = n) = E[N] E[X1]. 2

ISYE 6739
Honors Class: Random Sums of Random Variables

Example: Suppose the number of times we roll a die is N ∼ Pois(10). If Xi denotes the value of the ith toss, then the expected total of all of the rolls is

E[Σ_{i=1}^{N} Xi] = E[N] E[X1] = 10(3.5) = 35. 2

Theorem: Under the same conditions as before,

Var(Σ_{i=1}^{N} Xi) = E[N] Var(X1) + (E[X1])² Var(N).

Proof: See, for instance, Ross. 2
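A simulation sketch of my own for the dice example, checking both the mean and the variance formulas; numpy's Poisson and integer samplers are used, and the seed and replication count are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
N = rng.poisson(10, size=100_000)                 # number of rolls in each replication
totals = np.array([rng.integers(1, 7, size=n).sum() for n in N])
print(totals.mean())   # approximately E[N]E[X1] = 10(3.5) = 35
print(totals.var())    # approximately 10(35/12) + (3.5)^2(10) = 151.67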

ISYE 6739
Honors Class: Standard Conditioning Argument

1 Introduction
2 Marginal Distributions
3 Conditional Distributions
4 Independent Random Variables
5 Consequences of Independence
6 Random Samples
7 Conditional Expectation
8 Double Expectation
9 Honors Class: First-Step Analysis
10 Honors Class: Random Sums of Random Variables
11 Honors Class: Standard Conditioning Argument
12 Covariance and Correlation
13 Correlation and Causation
14 A Couple of Worked Correlation Examples
15 Some Useful Covariance / Correlation Theorems
16 Moment Generating Functions, Revisited
17 Honors Bivariate Functions of Random Variables
ISYE 6739
Honors Class: Standard Conditioning Argument

Lesson 2.11 — Honors Class: Standard Conditioning Argument

Bonus Theorem/Proof (computing probabilities by conditioning):

Let A be some event, and define the RV Y as the following indicator function: Y = 1_A ≡ 1 if A occurs, and 0 otherwise.

Then

E[Y] = Σ_y y fY(y) = P(Y = 1) = P(A).

Similarly, for any RV X, we have

E[Y|X = x] = Σ_y y f(y|x) = P(Y = 1|X = x) = P(A|X = x).

These results suggest an alternative way of calculating P(A). . . .

ISYE 6739
Honors Class: Standard Conditioning Argument

Theorem: If X is a continuous RV (similar result if X is discrete), then

P(A) = ∫_R P(A|X = x) fX(x) dx.

Proof:

P(A) = E[Y] (where we take Y = 1_A)
= E[E(Y|X)] (double expectation)
= ∫_R E[Y|x] fX(x) dx (LOTUS)
= ∫_R P(A|X = x) fX(x) dx (since Y = 1_A). 2

Remark: We call this the “standard conditioning argument.” Yes, it


looks complicated. But sometimes you need to take a step backward to go two
steps forward!

ISYE 6739
Honors Class: Standard Conditioning Argument

Example/Theorem: Suppose X and Y are independent continuous RVs, with pdf fX(·) and cdf FY(·), respectively. Then

P(Y ≤ X) = ∫_R FY(x) fX(x) dx.

Proof: (Actually, there are many proofs.) Let the event A = {Y ≤ X}. Then

P(Y ≤ X) = ∫_R P(Y ≤ X|X = x) fX(x) dx
= ∫_R P(Y ≤ x|X = x) fX(x) dx
= ∫_R P(Y ≤ x) fX(x) dx (X, Y are independent). 2

ISYE 6739
Honors Class: Standard Conditioning Argument

Example: If X ∼ Exp(α) and Y ∼ Exp(β) are independent RVs, then

P(Y ≤ X) = ∫_R FY(x) fX(x) dx
= ∫_0^∞ (1 − e^{−βx}) α e^{−αx} dx
= β/(α + β). 2

Remark: Think of X as the time until the next male driver shows up at a
parking lot (at rate α / hour) and Y as the time for the next female driver (at
rate β / hour). Then P (Y ≤ X) = β/(α + β) is the intuitively reasonable
probability that the next driver to arrive will be female. 2
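A one-line simulation check of mine, with arbitrary rates α = 2 and β = 3 per hour, so that β/(α + β) = 0.6.

import random

alpha, beta = 2.0, 3.0
n = 1_000_000
count = sum(random.expovariate(beta) <= random.expovariate(alpha) for _ in range(n))
print(count / n, beta / (alpha + beta))   # both approximately 0.6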

ISYE 6739
Honors Class: Standard Conditioning Argument

Example/Theorem: Suppose X and Y are independent continuous RVs, with pdf fX(·) and cdf FY(·), respectively. Define the sum Z = X + Y. Then

P(Z ≤ z) = ∫_R FY(z − x) fX(x) dx.

An expression such as the above for P(Z ≤ z) is often called a convolution.

Proof:

P(Z ≤ z) = ∫_R P(X + Y ≤ z|X = x) fX(x) dx
= ∫_R P(Y ≤ z − x|X = x) fX(x) dx
= ∫_R P(Y ≤ z − x) fX(x) dx (X, Y are indep). 2

ISYE 6739
Honors Class: Standard Conditioning Argument

Example: Suppose X, Y ∼ iid Exp(λ), and let Z = X + Y. Then

P(Z ≤ z) = ∫_R FY(z − x) fX(x) dx
= ∫_0^z (1 − e^{−λ(z−x)}) λ e^{−λx} dx (must have x ≥ 0 and z − x ≥ 0)
= 1 − e^{−λz} − λz e^{−λz}, if z ≥ 0.

Thus, the pdf of Z is

(d/dz) P(Z ≤ z) = λ² z e^{−λz}, z ≥ 0.

This turns out to mean that Z ∼ Gamma(2, λ), aka Erlang_2(λ). 2
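A quick simulation sketch of mine comparing the empirical cdf of X + Y at a single point against the formula just derived; λ and z are arbitrary choices.

import math
import random

lam, z = 1.5, 2.0
n = 500_000
count = sum(random.expovariate(lam) + random.expovariate(lam) <= z for _ in range(n))
exact = 1 - math.exp(-lam * z) - lam * z * math.exp(-lam * z)
print(count / n, exact)   # both approximately 0.80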

ISYE 6739
Honors Class: Standard Conditioning Argument

You can do similar kinds of convolutions with discrete RVs. We state the following result without proof (which is straightforward).

Example/Theorem: Suppose X and Y are two independent integer-valued RVs with pmf’s fX(x) and fY(y). Then the pmf of Z = X + Y is

fZ(z) = P(Z = z) = Σ_{x=−∞}^{∞} fX(x) fY(z − x).

ISYE 6739
Honors Class: Standard Conditioning Argument

Example: Suppose X and Y are iid Bern(p). Then the pmf of Z = X + Y is

fZ(z) = Σ_{x=−∞}^{∞} fX(x) fY(z − x)
= fX(0) fY(z) + fX(1) fY(z − 1) (X can only = 0 or 1)
= fX(0) fY(z) 1_{0,1}(z) + fX(1) fY(z − 1) 1_{1,2}(z) (the 1_{·}(z) indicator functions mark the z’s with nonzero fY(·)’s)
= p⁰ q^{1−0} p^z q^{1−z} 1_{0,1}(z) + p¹ q^{1−1} p^{z−1} q^{2−z} 1_{1,2}(z)
= p^z q^{2−z} [1_{0,1}(z) + 1_{1,2}(z)]
= (2 choose z) p^z q^{2−z}, z = 0, 1, 2.

Thus, Z ∼ Bin(2, p), a fond blast from the past! 2

ISYE 6739
Covariance and Correlation

1 Introduction
2 Marginal Distributions
3 Conditional Distributions
4 Independent Random Variables
5 Consequences of Independence
6 Random Samples
7 Conditional Expectation
8 Double Expectation
9 Honors Class: First-Step Analysis
10 Honors Class: Random Sums of Random Variables
11 Honors Class: Standard Conditioning Argument
12 Covariance and Correlation
13 Correlation and Causation
14 A Couple of Worked Correlation Examples
15 Some Useful Covariance / Correlation Theorems
16 Moment Generating Functions, Revisited
17 Honors Bivariate Functions of Random Variables
ISYE 6739
Covariance and Correlation

Lesson 2.12 — Covariance and Correlation

In the next few lessons we’ll cover:


Basic Concepts of Covariance and Correlation
Causation
A Couple of Worked Examples
Some Useful Theorems

Covariance and correlation are measures used to define the degree of


association between X and Y if they don’t happen to be independent.

Definition: The covariance between X and Y is

Cov(X, Y ) ≡ σXY ≡ E[(X − E[X])(Y − E[Y ])].

Remark: Cov(X, X) = E[(X − E[X])2 ] = Var(X).

ISYE 6739
Covariance and Correlation

Remark: If X and Y have positive covariance, then X and Y move “in the
same direction.” Think height and weight.

ISYE 6739
Covariance and Correlation

If X and Y have negative covariance, then X and Y move “in opposite


directions.” Think snowfall and temperature.

ISYE 6739
Covariance and Correlation

If X and Y are independent, then of course they have no association with


each other. In fact, we’ll prove below that independence implies that the
covariance is 0 (but not the other way around).

Example: IBM stock price vs. temperature on Mars are independent — at


least that’s what they want you to believe!

ISYE 6739
Covariance and Correlation

Theorem (easier way to calculate covariance):

Cov(X, Y ) = E[XY ] − E[X]E[Y ].

Proof:

Cov(X, Y) = E[(X − E[X])(Y − E[Y])]
= E[XY − X E[Y] − Y E[X] + E[X]E[Y]]
= E[XY] − E[X]E[Y] − E[Y]E[X] + E[X]E[Y]
= E[XY] − E[X]E[Y]. 2

Theorem: X and Y independent implies Cov(X, Y ) = 0.

Proof: By a previous theorem, X and Y independent implies


E[XY ] = E[X]E[Y ]. Then

Cov(X, Y) = E[XY] − E[X]E[Y] = E[X]E[Y] − E[X]E[Y] = 0. 2

ISYE 6739
Covariance and Correlation

Danger Will Robinson! Cov(X, Y ) = 0 does not imply that X and Y


are independent!!

Example: Suppose X ∼ Unif(−1, 1) and Y = X² (so X and Y are clearly dependent).

But

E[X] = ∫_{−1}^{1} x · (1/2) dx = 0 and
E[XY] = E[X³] = ∫_{−1}^{1} x³ · (1/2) dx = 0,

so

Cov(X, Y) = E[XY] − E[X]E[Y] = 0.
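A simulation sketch of my own illustrating the phenomenon: the sample covariance and correlation of X and X² are both near 0, even though Y is a deterministic function of X.

import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=1_000_000)
y = x ** 2                         # Y is completely determined by X
print(np.cov(x, y)[0, 1])          # sample covariance, approximately 0
print(np.corrcoef(x, y)[0, 1])     # sample correlation, approximately 0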

ISYE 6739
Covariance and Correlation

In fact, here’s a graphical illustration of this zero-correlation dependence


phenomenon, where we’ve actually added some normal noise to Y to make it
look prettier.

ISYE 6739
Covariance and Correlation

Definition: The correlation between X and Y is

ρ = Corr(X, Y) ≡ Cov(X, Y) / √(Var(X)Var(Y)) = σXY / (σX σY).

Remark: Covariance has “square” units; correlation is unitless.

Corollary: X, Y independent implies ρ = 0.

Theorem: It can be shown that −1 ≤ ρ ≤ 1.

ρ ≈ 1 is “high” correlation.
ρ ≈ 0 is “low” correlation.
ρ ≈ −1 is “high” negative correlation.

Example: Height is highly correlated with weight.


Temperature on Mars has low correlation with IBM stock price.

ISYE 6739
Correlation and Causation

1 Introduction
2 Marginal Distributions
3 Conditional Distributions
4 Independent Random Variables
5 Consequences of Independence
6 Random Samples
7 Conditional Expectation
8 Double Expectation
9 Honors Class: First-Step Analysis
10 Honors Class: Random Sums of Random Variables
11 Honors Class: Standard Conditioning Argument
12 Covariance and Correlation
13 Correlation and Causation
14 A Couple of Worked Correlation Examples
15 Some Useful Covariance / Correlation Theorems
16 Moment Generating Functions, Revisited
17 Honors Bivariate Functions of Random Variables
ISYE 6739
Correlation and Causation

Lesson 2.13 — Correlation and Causation

NOTE! Correlation does not necessarily imply causality! This is a


very common pitfall in many areas of data analysis and public discourse.

Example in which correlation does imply causality: Height and


weight are positively correlated, and larger height does indeed tend to cause
greater weight. 2

Example in which correlation does not imply causality: Temperature


and lemonade sales have positive corr, and temp has causal influence on
lemonade sales. Similarly, temp and overheating cars are positively correlated
with a causal relationship. It’s also likely that lemonade sales and overheating
cars are positively correlated, but there’s no causal relationship there. 2

Example of a zero correlation relationship with causality! We’ve


seen that it’s possible for two dependent RVs to be uncorrelated. 2

ISYE 6739
Correlation and Causation

To prove that X causes Y , one must establish that:


X occurred before Y ;
The relationship between X and Y is not completely due to random
chance; and
Nothing else accounts for the relationship (which is violated in the
lemonade sales / overheating cars example above).

These items can often be established via mathematical analysis, statistical analysis of appropriate data, or consultation with appropriate experts.

ISYE 6739
Correlation and Causation

The three examples above seem to give conflicting guidance with respect to
the relationship between correlation and causality. How can we interpret these
findings in a meaningful way? Here are the takeaways:
If the correlation between X and Y is (significantly) nonzero, there is
some type of relationship between the two items, which may or may not
be causal; but this should raise our curiosity.
If the correlation between X and Y is 0, we are not quite out of the
woods with respect to dependence and causality. In order to definitively
rule out a relationship between X and Y , it is always highly
recommended protocol to, at the very least,
Plot data from X and Y against each other to see if there is a nonlinear
relationship, as in the uncorrelated-yet-dependent example.
Consult with appropriate experts.

ISYE 6739
A Couple of Worked Correlation Examples

1 Introduction
2 Marginal Distributions
3 Conditional Distributions
4 Independent Random Variables
5 Consequences of Independence
6 Random Samples
7 Conditional Expectation
8 Double Expectation
9 Honors Class: First-Step Analysis
10 Honors Class: Random Sums of Random Variables
11 Honors Class: Standard Conditioning Argument
12 Covariance and Correlation
13 Correlation and Causation
14 A Couple of Worked Correlation Examples
15 Some Useful Covariance / Correlation Theorems
16 Moment Generating Functions, Revisited
17 Honors Bivariate Functions of Random Variables
ISYE 6739
A Couple of Worked Correlation Examples

Lesson 2.14 — A Couple of Worked Correlation Examples

Discrete Example: Suppose X is the GPA of a UGA student, and Y is


their IQ. Here’s the joint pmf.

f (x, y) X=2 X=3 X=4 fY (y)


Y = 40 0.0 0.2 0.2 0.4
Y = 50 0.1 0.1 0.0 0.2
Y = 60 0.4 0.0 0.0 0.4
fX (x) 0.5 0.3 0.2 1

We’ll spare the details, but here are the relevant calculations. . .

ISYE 6739
A Couple of Worked Correlation Examples

E[X] = Σ_x x fX(x) = 2.7,
E[X²] = Σ_x x² fX(x) = 7.9, and
Var(X) = E[X²] − (E[X])² = 0.61.

Similarly, E[Y] = 50, E[Y²] = 2580, and Var(Y) = 80. Finally,

E[XY] = Σ_x Σ_y x y f(x, y) = 2(40)(0.0) + 3(40)(0.2) + · · · + 4(60)(0.0) = 129,

Cov(X, Y) = E[XY] − E[X]E[Y] = −6.0, and

ρ = Cov(X, Y) / √(Var(X)Var(Y)) = −0.859. 2
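Here is a small numpy sketch of mine that carries out these calculations directly from the pmf table; the array layout is just an illustrative choice.

import numpy as np

xs = np.array([2, 3, 4])
ys = np.array([40, 50, 60])
# Rows index y = 40, 50, 60; columns index x = 2, 3, 4.
f = np.array([[0.0, 0.2, 0.2],
              [0.1, 0.1, 0.0],
              [0.4, 0.0, 0.0]])

f_X, f_Y = f.sum(axis=0), f.sum(axis=1)
E_X, E_Y = xs @ f_X, ys @ f_Y
Var_X = xs ** 2 @ f_X - E_X ** 2
Var_Y = ys ** 2 @ f_Y - E_Y ** 2
E_XY = ys @ f @ xs                          # sum over x and y of x*y*f(x, y)
cov = E_XY - E_X * E_Y
print(cov, cov / np.sqrt(Var_X * Var_Y))    # -6.0 and about -0.859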

ISYE 6739
A Couple of Worked Correlation Examples

Continuous Example: Suppose f(x, y) = 10x²y, 0 ≤ y ≤ x ≤ 1.

fX(x) = ∫_0^x 10x²y dy = 5x⁴, 0 ≤ x ≤ 1,
E[X] = ∫_0^1 5x⁵ dx = 5/6,
E[X²] = ∫_0^1 5x⁶ dx = 5/7,
Var(X) = E[X²] − (E[X])² = 0.01984.

ISYE 6739
A Couple of Worked Correlation Examples

Similarly,

fY(y) = ∫_y^1 10x²y dx = (10/3) y (1 − y³), 0 ≤ y ≤ 1,
E[Y] = 5/9, Var(Y) = 0.04850,
E[XY] = ∫_0^1 ∫_0^x 10x³y² dy dx = 10/21,
Cov(X, Y) = E[XY] − E[X]E[Y] = 0.01323, and
ρ = Cov(X, Y) / √(Var(X)Var(Y)) = 0.4265. 2

ISYE 6739
Some Useful Covariance / Correlation Theorems

1 Introduction
2 Marginal Distributions
3 Conditional Distributions
4 Independent Random Variables
5 Consequences of Independence
6 Random Samples
7 Conditional Expectation
8 Double Expectation
9 Honors Class: First-Step Analysis
10 Honors Class: Random Sums of Random Variables
11 Honors Class: Standard Conditioning Argument
12 Covariance and Correlation
13 Correlation and Causation
14 A Couple of Worked Correlation Examples
15 Some Useful Covariance / Correlation Theorems
16 Moment Generating Functions, Revisited
17 Honors Bivariate Functions of Random Variables
ISYE 6739
Some Useful Covariance / Correlation Theorems

Lesson 2.15 — Some Useful Covariance / Correlation Theorems

Theorem: Var(X + Y ) = Var(X) + Var(Y ) + 2Cov(X, Y ), whether or


not X and Y are independent.

Remark: If X, Y are independent, the covariance term goes away.

Proof: By the work we did on a previous proof,

Var(X + Y) = E[X²] − (E[X])² + E[Y²] − (E[Y])² + 2(E[XY] − E[X]E[Y])
= Var(X) + Var(Y) + 2Cov(X, Y). 2

ISYE 6739
Some Useful Covariance / Correlation Theorems

Theorem:

Var(Σ_{i=1}^n Xi) = Σ_{i=1}^n Var(Xi) + 2 Σ Σ_{i<j} Cov(Xi, Xj).

Proof: Induction.

Corollary: If all Xi’s are independent, then

Var(Σ_{i=1}^n Xi) = Σ_{i=1}^n Var(Xi).

ISYE 6739
Some Useful Covariance / Correlation Theorems

Theorem: Cov(aX, bY + c) = ab Cov(X, Y).

Proof:

Cov(aX, bY + c) = E[aX · (bY + c)] − E[aX]E[bY + c]
= E[abXY] + E[acX] − E[aX]E[bY] − E[aX]E[c]
= ab E[XY] − ab E[X]E[Y] + ac E[X] − ac E[X]
= ab Cov(X, Y). 2

Theorem:

Var(Σ_{i=1}^n ai Xi + c) = Σ_{i=1}^n ai² Var(Xi) + 2 Σ Σ_{i<j} ai aj Cov(Xi, Xj).

Proof: Put the above two results together. 2

ISYE 6739
Some Useful Covariance / Correlation Theorems

Example: Var(X − Y ) = Var(X) + Var(Y ) − 2Cov(X, Y ).

Example: Suppose Var(X) = Var(Y ) = Var(Z) = 10,


Cov(X, Y ) = 3, Cov(X, Z) = −2, and Cov(Y, Z) = 0. Then

Var(X − 2Y + 3Z)
= Var(X) + 4Var(Y ) + 9Var(Z)
−4Cov(X, Y ) + 6Cov(X, Z) − 12Cov(Y, Z)
= 14(10) − 4(3) + 6(−2) − 12(0) = 116. 2

ISYE 6739
Moment Generating Functions, Revisited

1 Introduction
2 Marginal Distributions
3 Conditional Distributions
4 Independent Random Variables
5 Consequences of Independence
6 Random Samples
7 Conditional Expectation
8 Double Expectation
9 Honors Class: First-Step Analysis
10 Honors Class: Random Sums of Random Variables
11 Honors Class: Standard Conditioning Argument
12 Covariance and Correlation
13 Correlation and Causation
14 A Couple of Worked Correlation Examples
15 Some Useful Covariance / Correlation Theorems
16 Moment Generating Functions, Revisited
17 Honors Bivariate Functions of Random Variables
ISYE 6739
Moment Generating Functions, Revisited

Lesson 2.16 — Moment Generating Functions, Revisited

Old Definition: MX(t) ≡ E[e^{tX}] is the moment generating function (mgf) of the RV X.

Old Example: If X ∼ Bern(p), then

MX(t) = E[e^{tX}] = Σ_x e^{tx} f(x) = e^{t·1} p + e^{t·0} q = pe^t + q. 2

Old Example: If X ∼ Exp(λ), then

MX(t) = E[e^{tX}] = ∫_R e^{tx} f(x) dx = λ/(λ − t), if λ > t. 2

Old Theorem (why it’s called the mgf): Under certain technical conditions,

E[X^k] = (d^k/dt^k) MX(t) |_{t=0}, k = 1, 2, . . . .

ISYE 6739
Moment Generating Functions, Revisited

New Theorem (mgf of the sum of independent RVs): Suppose X1, . . . , Xn are independent. Let Y = Σ_{i=1}^n Xi. Then

MY(t) = Π_{i=1}^n MXi(t).

Proof:

MY(t) = E[e^{tY}]
= E[e^{t Σ_i Xi}]
= E[Π_{i=1}^n e^{tXi}]
= Π_{i=1}^n E[e^{tXi}] (Xi’s independent)
= Π_{i=1}^n MXi(t). 2

ISYE 6739
Moment Generating Functions, Revisited

Corollary: If X1, . . . , Xn are iid and Y = Σ_{i=1}^n Xi, then MY(t) = [MX1(t)]^n.

Example: Suppose X1, . . . , Xn ∼ iid Bern(p). Then by a previous example,

MY(t) = [MX1(t)]^n = (pe^t + q)^n.

So what use is a result like this? We can use results such as this with our old
friend. . . .

Old Theorem (identifying distributions): In this class, each distribution has


a unique mgf.

ISYE 6739
Moment Generating Functions, Revisited

Example/Theorem: The sum Y of n iid Bern(p) RVs is the same as a Bin(n, p) RV.

By the previous example and uniqueness, all we need to show is that the mgf of Z ∼ Bin(n, p) matches MY(t) = (pe^t + q)^n. To this end, we have

MZ(t) = E[e^{tZ}]
= Σ_z e^{tz} P(Z = z)
= Σ_{z=0}^n e^{tz} (n choose z) p^z q^{n−z}
= Σ_{z=0}^n (n choose z) (pe^t)^z q^{n−z}
= (pe^t + q)^n (by the Binomial Theorem). 2
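As a sanity check of my own (not on the slides), sympy can differentiate this mgf at t = 0 to recover the Bin(n, p) mean and variance, in the spirit of the Old Theorem above.

import sympy as sp

t, p, n = sp.symbols('t p n', positive=True)
M = (p * sp.exp(t) + 1 - p) ** n        # Bin(n, p) mgf, with q = 1 - p

mean = sp.diff(M, t).subs(t, 0)
second = sp.diff(M, t, 2).subs(t, 0)
print(sp.simplify(mean))                # n*p
print(sp.simplify(second - mean ** 2))  # n*p*(1 - p), the Bin(n, p) variance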

ISYE 6739
Moment Generating Functions, Revisited

Example: You can identify a distribution by its mgf.

MX(t) = ((3/4) e^t + 1/4)^15

implies that X ∼ Bin(15, 0.75). 2

Old Theorem (mgf of a linear function of X): Suppose X has mgf MX(t) and let Y = aX + b. Then MY(t) = e^{tb} MX(at).

Example:

MY(t) = e^{−2t} ((3/4) e^{3t} + 1/4)^15 = e^{bt} (pe^{at} + q)^n = e^{bt} MX(at),

which implies that Y has the same distribution as 3X − 2, where X ∼ Bin(15, 0.75). 2

ISYE 6739
Moment Generating Functions, Revisited

Theorem (Additive property of Binomials): If X1, . . . , Xk are independent, with Xi ∼ Bin(ni, p) (where p is the same for all Xi’s), then

Y ≡ Σ_{i=1}^k Xi ∼ Bin(Σ_{i=1}^k ni, p).

Proof:

MY(t) = Π_{i=1}^k MXi(t) (mgf of independent sum)
= Π_{i=1}^k (pe^t + q)^{ni} (Bin(ni, p) mgf)
= (pe^t + q)^{Σ_{i=1}^k ni}.

This is the mgf of the Bin(Σ_{i=1}^k ni, p), so we’re done. 2

ISYE 6739
Honors Bivariate Functions of Random Variables

1 Introduction
2 Marginal Distributions
3 Conditional Distributions
4 Independent Random Variables
5 Consequences of Independence
6 Random Samples
7 Conditional Expectation
8 Double Expectation
9 Honors Class: First-Step Analysis
10 Honors Class: Random Sums of Random Variables
11 Honors Class: Standard Conditioning Argument
12 Covariance and Correlation
13 Correlation and Causation
14 A Couple of Worked Correlation Examples
15 Some Useful Covariance / Correlation Theorems
16 Moment Generating Functions, Revisited
17 Honors Bivariate Functions of Random Variables
ISYE 6739
Honors Bivariate Functions of Random Variables

Lesson 2.17 — Honors Bivariate Functions of Random Variables

In earlier work, we looked at. . .

Functions of a single variable, e.g., what is the expected value of h(X)? (LOTUS, from Module 2)
What is the distribution of h(X)? (functions of RVs, from Module 2)
And sometimes even functions of two (or more) variables. For example, if the Xi’s are independent, what’s Var(Σ_{i=1}^n Xi)? (earlier in Module 3)
Use a standard conditioning argument to get the distribution of X + Y. (earlier in Module 3)

Goal: Now let’s give a general result on the distribution of functions of two
random variables, the proof of which is beyond the scope of our class.

ISYE 6739
Honors Bivariate Functions of Random Variables

Honors Theorem: Suppose X and Y are continuous RVs with joint pdf
f (x, y), and V = h1 (X, Y ) and W = h2 (X, Y ) are functions of X and Y ,
and
X = k1 (V, W ) and Y = k2 (V, W ),
for suitably chosen inverse functions k1 and k2 .

Then the joint pdf of V and W is

g(v, w) = f(k1(v, w), k2(v, w)) |J(v, w)|,

where |J| is the absolute value of the Jacobian (determinant) of the transformation, i.e.,

J(v, w) = det[ ∂x/∂v  ∂x/∂w ; ∂y/∂v  ∂y/∂w ] = (∂x/∂v)(∂y/∂w) − (∂y/∂v)(∂x/∂w).

ISYE 6739
Honors Bivariate Functions of Random Variables

Corollary: If X and Y are independent, then the joint pdf of V and W is

g(v, w) = fX(k1(v, w)) fY(k2(v, w)) |J(v, w)|.

Remark: These results generalize the 1-D method from Module 2.

You can use this method to find all sorts of cool stuff, e.g., the distribution of
X + Y , X/Y , etc., as well as the joint pdf of any functions of X and Y .

Remark: Although the notation is nasty, the application isn’t really so bad.

ISYE 6739
Honors Bivariate Functions of Random Variables

Example: Suppose X and Y are iid Exp(λ). Find the pdf of X + Y .

We’ll set V = X + Y along with the dummy RV W = X.

This yields

X = W = k1 (V, W ) and Y = V − W = k2 (V, W ).

To get the Jacobian term, we calculate

∂x/∂v = 0, ∂x/∂w = 1, ∂y/∂v = 1, and ∂y/∂w = −1,

so that

|J| = |(∂x/∂v)(∂y/∂w) − (∂y/∂v)(∂x/∂w)| = |0(−1) − 1(1)| = 1.

ISYE 6739
Honors Bivariate Functions of Random Variables

This implies that the joint pdf of V and W is

g(v, w) = f(k1(v, w), k2(v, w)) |J(v, w)|
= f(w, v − w) · 1
= fX(w) fY(v − w) (X and Y independent)
= λe^{−λw} · λe^{−λ(v−w)}, for w > 0 and v − w > 0
= λ² e^{−λv}, for 0 < w < v.

And, finally, we obtain the desired pdf of the sum V (after carefully noting the region of integration),

gV(v) = ∫_R g(v, w) dw = ∫_0^v λ² e^{−λv} dw = λ² v e^{−λv}, for v > 0.

This is the Gamma(2, λ) pdf, which matches our answer from earlier in the
current module. 2

ISYE 6739
Honors Bivariate Functions of Random Variables

Honors Example: Suppose X and Y are iid Unif(0,1). Find the joint pdf of
V = X + Y and W = X/Y .

After some algebra, we obtain

X = VW/(W + 1) = k1(V, W) and Y = V/(W + 1) = k2(V, W).

After more algebra, we calculate

∂x/∂v = w/(w + 1), ∂x/∂w = v/(w + 1)², ∂y/∂v = 1/(w + 1), ∂y/∂w = −v/(w + 1)²,

so that after still more algebra,

|J| = |(∂x/∂v)(∂y/∂w) − (∂y/∂v)(∂x/∂w)| = v/(w + 1)².

ISYE 6739
Honors Bivariate Functions of Random Variables

This implies that the joint pdf of V and W is

g(v, w) = f(k1(v, w), k2(v, w)) |J(v, w)|
= f(vw/(w + 1), v/(w + 1)) · v/(w + 1)²
= fX(vw/(w + 1)) fY(v/(w + 1)) · v/(w + 1)² (X and Y indep)
= 1 · 1 · v/(w + 1)², for 0 < x, y < 1 (since X, Y ∼ Unif(0,1))
= v/(w + 1)², for 0 < x = vw/(w + 1) < 1 and 0 < y = v/(w + 1) < 1
= v/(w + 1)², for 0 < v < 1 + min{1/w, w} and w > 0 (after algebra).

Note that you have to be careful about the limits of v and w, but this thing
really does double integrate to 1! 2

ISYE 6739
Honors Bivariate Functions of Random Variables

We can also get the marginal pdf’s. First of all, for the ratio of the uniforms, we get

gW(w) = ∫_R g(v, w) dv
= ∫_0^{1+min{1/w, w}} v/(w + 1)² dv
= (1 + min{1/w, w})² / [2(w + 1)²]
= 1/2 if w ≤ 1, and 1/(2w²) if w > 1,

which is a little weird-looking and unexpected to me (it’s flat for w ≤ 1, and


then decreases to 0 pretty quickly for w > 1). 2

ISYE 6739
Honors Bivariate Functions of Random Variables

For the pdf of the sum of the uniforms, we have to calculate gV(v) = ∫_R g(v, w) dw. But first we need to deal with some inequality constraints so that we can integrate over the proper region, namely,

0 ≤ v ≤ 1 + min{1/w, w}, 0 ≤ v ≤ 2, and w ≥ 0.

With a little thought, we see that if 0 ≤ v ≤ 1, then there is no constraint on w except for it being positive. On the other hand, if 1 < v ≤ 2, then you can show (it takes a little work) that v − 1 ≤ w ≤ 1/(v − 1). Thus, we have

gV(v) = ∫_0^∞ g(v, w) dw if 0 ≤ v ≤ 1, and ∫_{v−1}^{1/(v−1)} g(v, w) dw if 1 < v ≤ 2
= v if 0 ≤ v ≤ 1, and 2 − v if 1 < v ≤ 2 (after algebra).

This is a Triangle(0,1,2) pdf. Can you see why? Is there an intuitive


explanation for this pdf? 2
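A simulation sketch of mine that spot-checks these marginals: P(W ≤ 1) should be 1/2, and V should follow the Triangle(0,1,2) cdf, which equals v²/2 on [0, 1] and 1 − (2 − v)²/2 on [1, 2]; the seed and sample size are arbitrary.

import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(size=1_000_000)
y = rng.uniform(size=1_000_000)
v, w = x + y, x / y
print(np.mean(w <= 1.0))                     # approximately 1/2
print(np.mean(v <= 0.5), 0.5 ** 2 / 2)       # both approximately 0.125
print(np.mean(v <= 1.5), 1 - 0.5 ** 2 / 2)   # both approximately 0.875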
ISYE 6739
Honors Bivariate Functions of Random Variables

And Now a Word From Our Sponsor. . .

We are finally done with the most-difficult module of the course.


Congratulations and Felicitations!!!

Things will get easier from now on! Happy days are here again!

ISYE 6739
