
2. Bivariate Random Variables

Dave Goldsman
H. Milton Stewart School of Industrial and Systems Engineering
Georgia Institute of Technology

3/2/20

ISYE 6739
1 Introduction
2 Marginal Distributions
3 Conditional Distributions
4 Independent Random Variables
5 Consequences of Independence
6 Random Samples
7 Conditional Expectation
8 Double Expectation
9 Honors Class: First-Step Analysis
10 Honors Class: Random Sums of Random Variables
11 Honors Class: Standard Conditioning Argument
12 Covariance and Correlation
13 Correlation and Causation
14 A Couple of Worked Correlation Examples
15 Some Useful Covariance / Correlation Theorems
16 Moment Generating Functions, Revisited
17 Honors Bivariate Functions of Random Variables
ISYE 6739
Introduction

Lesson 2.1 — Introduction

In this introductory lesson, we’ll cover . . .


What we mean by bivariate (or joint) random variables.
The discrete case.
The continuous case.
Bivariate cdf’s.

In this module, we’ll look at what happens when you consider two random
variables simultaneously.

Example: Choose a person at random. Look at their height and weight


(X, Y ). Obviously, X and Y will be related somehow.

ISYE 6739
Introduction

Discrete Case

Definition: If X and Y are discrete random variables, then (X, Y ) is called a


jointly discrete bivariate random variable.

The joint (or bivariate) pmf is

f (x, y) = P (X = x, Y = y), ∀x, y.

Properties:
0 ≤ f(x, y) ≤ 1.
Σ_x Σ_y f(x, y) = 1.
A ⊆ R² ⇒ P((X, Y) ∈ A) = Σ_{(x,y)∈A} f(x, y).

ISYE 6739
Introduction

Example: 3 sox in a box (numbered 1,2,3). Draw 2 sox at random without


replacement. X = # of the first sock; Y = # of the second sock. The joint
pmf f (x, y) is

f (x, y) X=1 X=2 X=3 P (Y = y)


Y =1 0 1/6 1/6 1/3
Y =2 1/6 0 1/6 1/3
Y =3 1/6 1/6 0 1/3
P (X = x) 1/3 1/3 1/3 1

fX (x) ≡ P (X = x) is the “marginal” pmf of X.

fY (y) ≡ P (Y = y) is the “marginal” pmf of Y .

ISYE 6739
Introduction

By the Law of Total Probability,


P(X = 1) = Σ_{y=1}^{3} P(X = 1, Y = y) = 1/3.

In addition,

P(X ≥ 2, Y ≥ 2) = Σ_{x≥2} Σ_{y≥2} f(x, y) = f(2, 2) + f(2, 3) + f(3, 2) + f(3, 3) = 0 + 1/6 + 1/6 + 0 = 1/3. 2
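As a quick illustration (my own, not from the slides), here is a short Python sketch that stores this joint pmf as an array and recovers the marginals and P(X ≥ 2, Y ≥ 2) by brute-force summation; the array layout and names are just illustrative choices.

import numpy as np

# Rows index y = 1, 2, 3; columns index x = 1, 2, 3.
f = np.array([[0,   1/6, 1/6],
              [1/6, 0,   1/6],
              [1/6, 1/6, 0  ]])

f_X = f.sum(axis=0)      # marginal pmf of X (sum over y): [1/3, 1/3, 1/3]
f_Y = f.sum(axis=1)      # marginal pmf of Y (sum over x): [1/3, 1/3, 1/3]
p   = f[1:, 1:].sum()    # P(X >= 2, Y >= 2) = 1/3
print(f_X, f_Y, p)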

ISYE 6739
Introduction

Continuous Case

Definition: If X and Y are continuous RVs, then (X, Y ) is a jointly


continuous bivariate RV if there exists a magic function f (x, y) such that
f (x, y) ≥ 0, ∀x, y.
∬_{R²} f(x, y) dx dy = 1.
P(A) = P((X, Y) ∈ A) = ∬_A f(x, y) dx dy.

In this case, f (x, y) is called the joint pdf.

If A ⊆ R2 , then P (A) is the volume between f (x, y) and A.

Think of

f (x, y) dx dy ≈ P (x < X < x + dx, y < Y < y + dy).

It’s easy to see how this generalizes the 1-dimensional pdf, f (x).

ISYE 6739
Introduction

Example: Choose a point (X, Y ) at random in the interior of the circle


inscribed in the unit square, e.g., C ≡ {(x − 1/2)² + (y − 1/2)² ≤ 1/4}.

Find the pdf of (X, Y ).

Since the area of the circle is π/4,


f(x, y) = 4/π if (x, y) ∈ C, and 0 otherwise. 2

Application: Toss n darts randomly into the unit square. The probability
that any individual dart will land in the circle is π/4. It stands to reason that
the proportion of darts, p̂n , that land in the circle will be approximately π/4.
So you can use 4p̂n to estimate π!
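Here is a minimal Monte Carlo sketch of this dart-throwing idea (mine, not part of the slides); the number of darts n is an arbitrary choice.

import random

n = 1_000_000
hits = 0
for _ in range(n):
    x, y = random.random(), random.random()      # dart lands uniformly in the unit square
    if (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.25:  # inside the inscribed circle C
        hits += 1
print(4 * hits / n)                              # approximately 3.14159...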

ISYE 6739
Introduction

Example: Suppose that


f(x, y) = 4xy if 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, and 0 otherwise.

Find the probability (volume) of the region 0 ≤ y ≤ 1 − x².

V = ∫_0^1 ∫_0^{1−x²} 4xy dy dx = ∫_0^1 ∫_0^{√(1−y)} 4xy dx dy = 1/3.

Moral: Be careful with limits! 2
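A quick simulation sketch (my own): since f(x, y) = 4xy factors as (2x)(2y), X and Y can be sampled independently, each with pdf 2x on [0, 1], i.e., as the square root of a Uniform(0, 1); the sample size is arbitrary.

import random

n = 1_000_000
count = 0
for _ in range(n):
    x = random.random() ** 0.5    # pdf 2x on [0,1] via inverse cdf (the cdf is x^2)
    y = random.random() ** 0.5    # same marginal, generated independently of x
    if y <= 1 - x * x:            # the region 0 <= y <= 1 - x^2
        count += 1
print(count / n)                  # approximately 1/3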

ISYE 6739
Introduction

Bivariate cdf’s

Definition: The joint (bivariate) cdf of X and Y is


F(x, y) ≡ P(X ≤ x, Y ≤ y), for all x, y.

F(x, y) = Σ_{s≤x} Σ_{t≤y} f(s, t) in the discrete case, and ∫_{−∞}^{y} ∫_{−∞}^{x} f(s, t) ds dt in the continuous case.

Going from cdf’s to pdf’s (continuous case):


1 dimension: f(x) = F′(x) = (d/dx) ∫_{−∞}^{x} f(t) dt.

2 dimensions: f(x, y) = ∂²F(x, y)/∂x∂y = ∂²/∂x∂y ∫_{−∞}^{x} ∫_{−∞}^{y} f(s, t) dt ds.

ISYE 6739
Introduction

Properties:

F (x, y) is non-decreasing in both x and y.

limx→−∞ F (x, y) = limy→−∞ F (x, y) = 0.

limx→∞ F (x, y) = FY (y) = P (Y ≤ y) (“marginal” cdf of Y ).

limy→∞ F (x, y) = FX (x) = P (X ≤ x) (“marginal” cdf of X).

limx→∞ limy→∞ F (x, y) = 1.

F (x, y) is continuous from the right in both x and y.

ISYE 6739
Introduction

Example: Suppose

F(x, y) = 1 − e^{−x} − e^{−y} + e^{−(x+y)} if x ≥ 0, y ≥ 0, and 0 if x < 0 or y < 0.

The marginal cdf of X is


FX(x) = lim_{y→∞} F(x, y) = 1 − e^{−x} if x ≥ 0, and 0 if x < 0.

The joint pdf is


f(x, y) = ∂²F(x, y)/∂x∂y = ∂/∂y (e^{−x} − e^{−y}e^{−x}) = e^{−(x+y)}, if x ≥ 0, y ≥ 0. 2

ISYE 6739
Marginal Distributions

1 Introduction
2 Marginal Distributions
3 Conditional Distributions
4 Independent Random Variables
5 Consequences of Independence
6 Random Samples
7 Conditional Expectation
8 Double Expectation
9 Honors Class: First-Step Analysis
10 Honors Class: Random Sums of Random Variables
11 Honors Class: Standard Conditioning Argument
12 Covariance and Correlation
13 Correlation and Causation
14 A Couple of Worked Correlation Examples
15 Some Useful Covariance / Correlation Theorems
16 Moment Generating Functions, Revisited
17 Honors Bivariate Functions of Random Variables
ISYE 6739
Marginal Distributions

Lesson 2.2 — Marginal Distributions

We’re also interested in the individual (marginal) distributions of X and Y .

Definition: If X and Y are jointly discrete, then the marginal pmf’s of X


and Y are, respectively,
fX(x) = P(X = x) = Σ_y f(x, y)

and

fY(y) = P(Y = y) = Σ_x f(x, y).

ISYE 6739
Marginal Distributions

Example (discrete case): f (x, y) = P (X = x, Y = y).

f (x, y) X=1 X=2 X=3 P (Y = y)


Y = 40 0.01 0.07 0.12 0.2
Y = 60 0.29 0.03 0.48 0.8
P (X = x) 0.3 0.1 0.6 1

By total probability,

P (X = 1) = P (X = 1, Y = any #) = 0.3. 2

ISYE 6739
Marginal Distributions

Example (discrete case): f (x, y) = P (X = x, Y = y).

f (x, y) X=1 X=2 X=3 P (Y = y)


Y = 40 0.06 0.02 0.12 0.2
Y = 60 0.24 0.08 0.48 0.8
P (X = x) 0.3 0.1 0.6 1

Remark: Hmmm. . . . Compared to the last example, this has the same
marginals but different joint distribution! That’s because the joint distribution
contains much more information than just the marginals.

ISYE 6739
Marginal Distributions

Definition: If X and Y are jointly continuous, then the marginal pdf’s of


X and Y are, respectively,
fX(x) = ∫_R f(x, y) dy and fY(y) = ∫_R f(x, y) dx.

Example: f(x, y) = e^{−(x+y)} if x ≥ 0, y ≥ 0, and 0 otherwise.

Then the marginal pdf of X is

fX(x) = ∫_R f(x, y) dy = ∫_0^∞ e^{−(x+y)} dy = e^{−x}, if x ≥ 0. 2

ISYE 6739
Marginal Distributions

Example: f(x, y) = (21/4) x² y if x² ≤ y ≤ 1, and 0 otherwise.

Note the funny limits where the pdf is positive, i.e., x² ≤ y ≤ 1.

fX(x) = ∫_R f(x, y) dy = ∫_{x²}^{1} (21/4) x² y dy = (21/8) x² (1 − x⁴), −1 ≤ x ≤ 1.

fY(y) = ∫_R f(x, y) dx = ∫_{−√y}^{√y} (21/4) x² y dx = (7/2) y^{5/2}, 0 ≤ y ≤ 1. 2

ISYE 6739
Conditional Distributions

1 Introduction
2 Marginal Distributions
3 Conditional Distributions
4 Independent Random Variables
5 Consequences of Independence
6 Random Samples
7 Conditional Expectation
8 Double Expectation
9 Honors Class: First-Step Analysis
10 Honors Class: Random Sums of Random Variables
11 Honors Class: Standard Conditioning Argument
12 Covariance and Correlation
13 Correlation and Causation
14 A Couple of Worked Correlation Examples
15 Some Useful Covariance / Correlation Theorems
16 Moment Generating Functions, Revisited
17 Honors Bivariate Functions of Random Variables
ISYE 6739
Conditional Distributions

Lesson 2.3 — Conditional Distributions

Recall conditional probability: P (A|B) = P (A ∩ B)/P (B) if P (B) > 0.

Suppose that X and Y are jointly discrete RVs. Then if P (X = x) > 0,

P(Y = y|X = x) = P(X = x ∩ Y = y)/P(X = x) = f(x, y)/fX(x).
P (Y = y|X = 2) defines the probabilities on Y given that X = 2.

Definition: If fX (x) > 0, then the conditional pmf/pdf of Y given


X = x is

fY|X(y|x) ≡ f(x, y)/fX(x).

Remark: We usually just write f (y|x) instead of fY |X (y|x).

Remark: Of course, fX|Y(x|y) = f(x|y) = f(x, y)/fY(y).
ISYE 6739
Conditional Distributions

Discrete Example: f (x, y) = P (X = x, Y = y).

f (x, y) X=1 X=2 X=3 fY (y)


Y = 40 0.01 0.07 0.12 0.2
Y = 60 0.29 0.03 0.48 0.8
fX (x) 0.3 0.1 0.6 1

Then, for example,

f(x|y = 60) = f(x, 60)/fY(60) = f(x, 60)/0.8 = 29/80 if x = 1; 3/80 if x = 2; 48/80 if x = 3. 2

ISYE 6739
Conditional Distributions

Old Continuous Example:

f(x, y) = (21/4) x² y, if x² ≤ y ≤ 1.
fX(x) = (21/8) x² (1 − x⁴), if −1 ≤ x ≤ 1.
fY(y) = (7/2) y^{5/2}, if 0 ≤ y ≤ 1.

Then the conditional pdf of Y given X = x is

f(y|x) = f(x, y)/fX(x) = [(21/4) x² y] / [(21/8) x² (1 − x⁴)] = 2y/(1 − x⁴), if x² ≤ y ≤ 1.

ISYE 6739
Conditional Distributions

So, for example,

f(y|1/2) = f(1/2, y)/fX(1/2) = [(21/4)·(1/4)·y] / [(21/8)·(1/4)·(1 − 1/16)] = (32/15) y, if 1/4 ≤ y ≤ 1. 2

Note that 2/(1 − x⁴) is a constant with respect to y, and we can check to see that f(y|x) is a legit conditional pdf:

∫_R f(y|x) dy = ∫_{x²}^{1} 2y/(1 − x⁴) dy = 1. 2

ISYE 6739
Conditional Distributions

Typical Problem: Given fX(x) and f(y|x), find fY(y).

Game Plan: Find f(x, y) = fX(x)f(y|x) and then fY(y) = ∫_R f(x, y) dx.

Example: Suppose fX(x) = 2x, for 0 < x < 1. Given X = x, suppose that Y|x ∼ Unif(0, x). Now find fY(y).

Solution: Y|x ∼ Unif(0, x) implies that f(y|x) = 1/x, for 0 < y < x. So,

f(x, y) = fX(x)f(y|x) = 2x · (1/x) = 2, for 0 < y < x < 1 (still have funny limits).

Thus,

fY(y) = ∫_R f(x, y) dx = ∫_y^1 2 dx = 2(1 − y), 0 < y < 1. 2
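A short simulation sketch (mine, not from the slides) of this hierarchical construction: draw X with pdf 2x, then Y uniform on (0, X); the sample mean of Y should be close to ∫_0^1 y · 2(1 − y) dy = 1/3, consistent with the fY(y) just derived.

import random

n = 1_000_000
total = 0.0
for _ in range(n):
    x = random.random() ** 0.5    # X has pdf 2x on (0, 1)
    y = random.uniform(0.0, x)    # Y | X = x ~ Unif(0, x)
    total += y
print(total / n)                  # approximately 1/3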

ISYE 6739
Independent Random Variables

1 Introduction
2 Marginal Distributions
3 Conditional Distributions
4 Independent Random Variables
5 Consequences of Independence
6 Random Samples
7 Conditional Expectation
8 Double Expectation
9 Honors Class: First-Step Analysis
10 Honors Class: Random Sums of Random Variables
11 Honors Class: Standard Conditioning Argument
12 Covariance and Correlation
13 Correlation and Causation
14 A Couple of Worked Correlation Examples
15 Some Useful Covariance / Correlation Theorems
16 Moment Generating Functions, Revisited
17 Honors Bivariate Functions of Random Variables
ISYE 6739
Independent Random Variables

Lesson 2.4 — Independent Random Variables

Recall that two events are independent if P (A ∩ B) = P (A)P (B).

Then

P(A|B) = P(A ∩ B)/P(B) = P(A)P(B)/P(B) = P(A).
And similarly, P (B|A) = P (B).

Now we want to define independence for random variables, i.e., the outcome
of X doesn’t influence the outcome of Y (and vice versa).

Definition: X and Y are independent RVs if, for all x and y,

f (x, y) = fX (x)fY (y).

ISYE 6739
Independent Random Variables

Equivalent definitions:

F (x, y) = FX (x)FY (y), ∀x, y


or
P (X ≤ x, Y ≤ y) = P (X ≤ x)P (Y ≤ y), ∀x, y.

If X and Y aren’t independent, then they’re dependent.

Nice, Intuitive Theorem: X and Y are independent if and only if


f (y|x) = fY (y) ∀x, y.

Proof:

f(y|x) = f(x, y)/fX(x) = fX(x)fY(y)/fX(x) = fY(y). 2

Similarly, X and Y independent implies f (x|y) = fX (x).

ISYE 6739
Independent Random Variables

Example (discrete): f (x, y) = P (X = x, Y = y).

f (x, y) X=1 X=2 fY (y)


Y =2 0.12 0.28 0.4
Y =3 0.18 0.42 0.6
fX (x) 0.3 0.7 1

X and Y are independent since f (x, y) = fX (x)fY (y), ∀x, y. 2

ISYE 6739
Independent Random Variables

Example (continuous): Suppose f(x, y) = 6xy², 0 ≤ x ≤ 1, 0 ≤ y ≤ 1.

After some work (which can be avoided by the next theorem), we can derive

fX(x) = 2x, if 0 ≤ x ≤ 1, and
fY(y) = 3y², if 0 ≤ y ≤ 1.

X and Y are independent since f (x, y) = fX (x)fY (y), ∀x, y. 2

ISYE 6739
Independent Random Variables

Easy way to tell if X and Y are independent. . . .

Theorem: X and Y are independent iff f (x, y) = a(x)b(y), ∀x, y, for some
functions a(x) and b(y) (not necessarily pdf’s).

So if f (x, y) factors into separate functions of x and y, then X and Y are


independent.

But if there are funny limits, this messes up the factorization, so in that case,
X and Y will be dependent — watch out!

Example: f(x, y) = 6xy², 0 ≤ x ≤ 1, 0 ≤ y ≤ 1. Take

a(x) = 6x, 0 ≤ x ≤ 1, and b(y) = y², 0 ≤ y ≤ 1.

Thus, X and Y are independent (as above). 2

ISYE 6739
Independent Random Variables

Example: f(x, y) = (21/4) x² y, x² ≤ y ≤ 1.

Funny (non-rectangular) limits make factoring into marginals impossible. Thus, X and Y are not independent. 2

Example: f(x, y) = c/(x + y), 1 ≤ x ≤ 2, 1 ≤ y ≤ 3.

Can’t factor f (x, y) into functions of x and y separately. Thus, X and Y are
not independent. 2

Now that we can figure out if X and Y are independent, what can we do with
that knowledge?

ISYE 6739
Consequences of Independence

1 Introduction
2 Marginal Distributions
3 Conditional Distributions
4 Independent Random Variables
5 Consequences of Independence
6 Random Samples
7 Conditional Expectation
8 Double Expectation
9 Honors Class: First-Step Analysis
10 Honors Class: Random Sums of Random Variables
11 Honors Class: Standard Conditioning Argument
12 Covariance and Correlation
13 Correlation and Causation
14 A Couple of Worked Correlation Examples
15 Some Useful Covariance / Correlation Theorems
16 Moment Generating Functions, Revisited
17 Honors Bivariate Functions of Random Variables
ISYE 6739
Consequences of Independence

Lesson 2.5 — Consequences of Independence

Definition/Theorem (two-dimensional Unconscious Statistician):


Let h(X, Y ) be a function of the RVs X and Y . Then
E[h(X, Y)] = Σ_x Σ_y h(x, y) f(x, y) in the discrete case, and ∫_R ∫_R h(x, y) f(x, y) dx dy in the continuous case.

Theorem: Whether or not X and Y are independent,

E[X + Y ] = E[X] + E[Y ].

ISYE 6739
Consequences of Independence

Proof (continuous case):

E[X + Y] = ∫_R ∫_R (x + y) f(x, y) dx dy (2-D LOTUS)
= ∫_R ∫_R x f(x, y) dx dy + ∫_R ∫_R y f(x, y) dx dy
= ∫_R x [∫_R f(x, y) dy] dx + ∫_R y [∫_R f(x, y) dx] dy
= ∫_R x fX(x) dx + ∫_R y fY(y) dy
= E[X] + E[Y]. 2

ISYE 6739
Consequences of Independence

One can generalize this result to more than two random variables.

Corollary: If X1, X2, . . . , Xn are RVs, then

E[Σ_{i=1}^n Xi] = Σ_{i=1}^n E[Xi].

Proof: Induction. 2

ISYE 6739
Consequences of Independence

Theorem: If X and Y are independent, then E[XY ] = E[X]E[Y ].

Proof (continuous case):

E[XY] = ∫_R ∫_R xy f(x, y) dx dy (2-D LOTUS)
= ∫_R ∫_R xy fX(x) fY(y) dx dy (X and Y are indep)
= [∫_R x fX(x) dx] [∫_R y fY(y) dy]
= E[X]E[Y]. 2

Remark: The above theorem is not necessarily true if X and Y are


dependent. See the upcoming discussion on covariance.

ISYE 6739
Consequences of Independence

Theorem: If X and Y are independent, then

Var(X + Y ) = Var(X) + Var(Y ).

Proof:

Var(X + Y) = E[(X + Y)²] − (E[X + Y])²
= E[X² + 2XY + Y²] − (E[X] + E[Y])²
= E[X²] + 2E[XY] + E[Y²] − {(E[X])² + 2E[X]E[Y] + (E[Y])²}
= E[X²] + 2E[X]E[Y] + E[Y²] − (E[X])² − 2E[X]E[Y] − (E[Y])² (since X and Y are independent)
= E[X²] − (E[X])² + E[Y²] − (E[Y])². 2

Remark: The assumption of independence really is important here. If X and


Y aren’t independent, then the result might not hold!

ISYE 6739
Consequences of Independence

Can generalize. . .

Corollary: If X1, X2, . . . , Xn are independent RVs, then

Var(Σ_{i=1}^n Xi) = Σ_{i=1}^n Var(Xi).

Proof: Induction. 2

Corollary: If X1, X2, . . . , Xn are independent RVs, then

Var(Σ_{i=1}^n ai Xi + b) = Σ_{i=1}^n ai² Var(Xi).

ISYE 6739
Random Samples

1 Introduction
2 Marginal Distributions
3 Conditional Distributions
4 Independent Random Variables
5 Consequences of Independence
6 Random Samples
7 Conditional Expectation
8 Double Expectation
9 Honors Class: First-Step Analysis
10 Honors Class: Random Sums of Random Variables
11 Honors Class: Standard Conditioning Argument
12 Covariance and Correlation
13 Correlation and Causation
14 A Couple of Worked Correlation Examples
15 Some Useful Covariance / Correlation Theorems
16 Moment Generating Functions, Revisited
17 Honors Bivariate Functions of Random Variables
ISYE 6739
Random Samples

Lesson 2.6 — Random Samples

Definition: X1, X2, . . . , Xn form a random sample if

the Xi’s are all independent, and
each Xi has the same pmf/pdf f(x).

Notation: X1, . . . , Xn ∼ iid f(x) (“independent and identically distributed”).

Example/Theorem: Suppose X1, . . . , Xn ∼ iid f(x), with E[Xi] = µ and Var(Xi) = σ². Define the sample mean as X̄ ≡ Σ_{i=1}^n Xi / n. Then

E[X̄] = E[(1/n) Σ_{i=1}^n Xi] = (1/n) Σ_{i=1}^n E[Xi] = (1/n) Σ_{i=1}^n µ = µ.

So the mean of X̄ is the same as the mean of Xi. 2

ISYE 6739
Random Samples

Meanwhile, how about the variance of the sample mean?

Var(X̄) = Var((1/n) Σ_{i=1}^n Xi)
= (1/n²) Var(Σ_{i=1}^n Xi)
= (1/n²) Σ_{i=1}^n Var(Xi) (Xi’s indep)
= (1/n²) Σ_{i=1}^n σ² = σ²/n.

So the mean of X̄ is the same as the mean of Xi , but the variance decreases!
This makes X̄ a great estimator for µ (which is usually unknown in practice);
the result is referred to as the Law of Large Numbers. Stay tuned.
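A quick numerical illustration of my own, assuming Unif(0,1) observations so that µ = 1/2 and σ² = 1/12; the sample size n and replication count are arbitrary choices.

import random
import statistics

n, reps = 25, 20_000
xbars = []
for _ in range(reps):
    sample = [random.random() for _ in range(n)]   # iid Unif(0,1)
    xbars.append(sum(sample) / n)
print(statistics.mean(xbars))       # approximately mu = 0.5
print(statistics.variance(xbars))   # approximately sigma^2/n = (1/12)/25 = 0.0033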

ISYE 6739
Conditional Expectation

1 Introduction
2 Marginal Distributions
3 Conditional Distributions
4 Independent Random Variables
5 Consequences of Independence
6 Random Samples
7 Conditional Expectation
8 Double Expectation
9 Honors Class: First-Step Analysis
10 Honors Class: Random Sums of Random Variables
11 Honors Class: Standard Conditioning Argument
12 Covariance and Correlation
13 Correlation and Causation
14 A Couple of Worked Correlation Examples
15 Some Useful Covariance / Correlation Theorems
16 Moment Generating Functions, Revisited
17 Honors Bivariate Functions of Random Variables
ISYE 6739
Conditional Expectation

Lesson 2.7 — Conditional Expectation

The Next Few Lessons:


Conditional expectation — definition and examples.
“Double” expectation — a very cool theorem.
Honors Class: First-step analysis.
Honors Class: Random sums of random variables.
Honors Class: The standard conditioning argument and its applications.

ISYE 6739
Conditional Expectation

Consider the usual definition of expectation. (E.g., what’s the average weight of a male?)

E[Y] = Σ_y y f(y) in the discrete case, and ∫_R y f(y) dy in the continuous case.

Now suppose we’re interested in the average weight of a 6' tall male.

f(y|x) is the conditional pmf/pdf of Y given X = x.

Definition: The conditional expectation of Y given X = x is

E[Y|X = x] ≡ Σ_y y f(y|x) in the discrete case, and ∫_R y f(y|x) dy in the continuous case.

Note that E[Y |X = x] is a function of x.

ISYE 6739
Conditional Expectation

Discrete Example:

f (x, y) X=0 X=3 fY (y)


Y =2 0.11 0.34 0.45
Y =5 0.00 0.05 0.05
Y = 10 0.29 0.21 0.50
fX (x) 0.40 0.60 1

The unconditional expectation is

E[Y] = Σ_y y fY(y) = 2(0.45) + 5(0.05) + 10(0.50) = 6.15.

ISYE 6739
Conditional Expectation

But conditional on X = 3, we have

f(y|x = 3) = f(3, y)/fX(3) = f(3, y)/0.60 = 34/60 if y = 2; 5/60 if y = 5; 21/60 if y = 10.

So the expectation conditional on X = 3 is

E[Y|X = 3] = Σ_y y f(y|3) = 2(34/60) + 5(5/60) + 10(21/60) = 5.05.

This compares to the unconditional expectation E[Y] = 6.15. So the information that X = 3 pushes the conditional expected value of Y down to 5.05. 2
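For concreteness, here is a small Python sketch of mine that reproduces both numbers directly from the table; the array layout is just an illustrative choice.

import numpy as np

ys = np.array([2, 5, 10])
# Rows index y in (2, 5, 10); columns index x in (0, 3).
f = np.array([[0.11, 0.34],
              [0.00, 0.05],
              [0.29, 0.21]])

E_Y = ys @ f.sum(axis=1)                # unconditional E[Y] = 6.15
f_y_given_3 = f[:, 1] / f[:, 1].sum()   # conditional pmf f(y | x = 3)
E_Y_given_3 = ys @ f_y_given_3          # E[Y | X = 3] = 5.05
print(E_Y, E_Y_given_3)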

ISYE 6739
Conditional Expectation

Old Continuous Example:

f(x, y) = (21/4) x² y, if x² ≤ y ≤ 1.

Recall that

f(y|x) = 2y/(1 − x⁴), if x² ≤ y ≤ 1.

Thus,

E[Y|x] = ∫_R y f(y|x) dy = [2/(1 − x⁴)] ∫_{x²}^{1} y² dy = (2/3) · (1 − x⁶)/(1 − x⁴).

So, e.g., E[Y|X = 0.5] = (2/3) · (63/64)/(15/16) = 0.70. 2

ISYE 6739
Double Expectation

1 Introduction
2 Marginal Distributions
3 Conditional Distributions
4 Independent Random Variables
5 Consequences of Independence
6 Random Samples
7 Conditional Expectation
8 Double Expectation
9 Honors Class: First-Step Analysis
10 Honors Class: Random Sums of Random Variables
11 Honors Class: Standard Conditioning Argument
12 Covariance and Correlation
13 Correlation and Causation
14 A Couple of Worked Correlation Examples
15 Some Useful Covariance / Correlation Theorems
16 Moment Generating Functions, Revisited
17 Honors Bivariate Functions of Random Variables
ISYE 6739
Double Expectation

Lesson 2.8 — Double Expectation

Theorem (double expectation):

E[E(Y |X)] = E[Y ].

Remarks: Yikes, what the heck is this!?

The expected value (averaged over all X’s) of the conditional expected value
(of Y |X) is the plain old expected value (of Y ).

Think of the outside expected value as the expected value of


h(X) = E(Y |X). Then LOTUS miraculously gives us E[Y ].

Believe it or not, sometimes it’s easier to calculate E[Y ] indirectly by using


our double expectation trick.

ISYE 6739
Double Expectation

Proof (continuous case): By the Unconscious Statistician,

E[E(Y|X)] = ∫_R E(Y|x) fX(x) dx
= ∫_R [∫_R y f(y|x) dy] fX(x) dx
= ∫_R ∫_R y f(y|x) fX(x) dx dy
= ∫_R y [∫_R f(x, y) dx] dy
= ∫_R y fY(y) dy
= E[Y]. 2

ISYE 6739
Double Expectation

Old Example: Suppose f(x, y) = (21/4) x² y, if x² ≤ y ≤ 1.

Find E[Y] two ways.

By previous examples, we know that

fX(x) = (21/8) x² (1 − x⁴), if −1 ≤ x ≤ 1,
fY(y) = (7/2) y^{5/2}, if 0 ≤ y ≤ 1, and
E[Y|x] = (2/3) · (1 − x⁶)/(1 − x⁴).

ISYE 6739
Double Expectation

Solution #1 (old, boring way):

E[Y] = ∫_R y fY(y) dy = ∫_0^1 (7/2) y^{7/2} dy = 7/9.

Solution #2 (new, exciting way):

E[Y] = E[E(Y|X)]
= ∫_R E(Y|x) fX(x) dx
= ∫_{−1}^{1} [(2/3) · (1 − x⁶)/(1 − x⁴)] · (21/8) x² (1 − x⁴) dx
= 7/9.

Notice that both answers are the same (good)! 2
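If you would like a machine check, here is a short sympy sketch of mine that evaluates both integrals symbolically; the cancel() call simply removes the common (1 − x⁴) factor before integrating.

import sympy as sp

x, y = sp.symbols('x y')
f_Y = sp.Rational(7, 2) * y ** sp.Rational(5, 2)
f_X = sp.Rational(21, 8) * x ** 2 * (1 - x ** 4)
E_Y_given_x = sp.Rational(2, 3) * (1 - x ** 6) / (1 - x ** 4)

direct = sp.integrate(y * f_Y, (y, 0, 1))                        # Solution #1
double = sp.integrate(sp.cancel(E_Y_given_x * f_X), (x, -1, 1))  # Solution #2
print(direct, double)                                            # 7/9 and 7/9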

ISYE 6739
Honors Class: First-Step Analysis

1 Introduction
2 Marginal Distributions
3 Conditional Distributions
4 Independent Random Variables
5 Consequences of Independence
6 Random Samples
7 Conditional Expectation
8 Double Expectation
9 Honors Class: First-Step Analysis
10 Honors Class: Random Sums of Random Variables
11 Honors Class: Standard Conditioning Argument
12 Covariance and Correlation
13 Correlation and Causation
14 A Couple of Worked Correlation Examples
15 Some Useful Covariance / Correlation Theorems
16 Moment Generating Functions, Revisited
17 Honors Bivariate Functions of Random Variables
ISYE 6739
Honors Class: First-Step Analysis

Lesson 2.9 — Honors Class: First-Step Analysis

Example: “First-step” method to find the mean of Y ∼ Geom(p). Think of Y as the number of coin flips until H appears, where P(H) = p.

Furthermore, consider the first step of the coin flip process, and let X = H or T denote the outcome of the first toss. Based on the result X of this first step, we have

E[Y] = E[E(Y|X)]
= Σ_x E[Y|x] fX(x)
= E[Y|X = T] P(X = T) + E[Y|X = H] P(X = H)
= (1 + E[Y])(1 − p) + (1)(p) (start from scratch if X = T).

Solving, we get E[Y] = 1/p (which is the correct answer)! 2

ISYE 6739
Honors Class: First-Step Analysis

Example: Consider a sequence of coin flips. What is the expected number of


flips Y until “HT” appears for the first time?

Clearly, Y = A + B, where A is the number of flips until the first “H”


appears, and B is the number of subsequent flips until “T” appears for the first
time after the sequence of H’s begins.

For instance, the sequence TTTHHT corresponds to Y = A + B = 4 + 2 = 6.

In any case, it’s obvious that A and B are iid Geom(p = 1/2), so by the
previous example, E[Y ] = E[A] + E[B] = (1/p) + (1/p) = 4. 2

This example didn’t involve first-step analysis (besides using the expected
value of a geometric RV). But the next related example will. . . .

ISYE 6739
Honors Class: First-Step Analysis

Example: Again consider a sequence of coin flips. What is the expected


number of flips Y until “HH” appears for the first time?

For instance, the sequence TTHTTHH corresponds to Y = 7 tries.

Using an enhanced first-step analysis, we see that

E[Y] = E[Y|T] P(T) + E[Y|H] P(H)
= E[Y|T] P(T) + [E[Y|HH] P(HH|H) + E[Y|HT] P(HT|H)] P(H)
= (1 + E[Y])(0.5) + [(2)(0.5) + (2 + E[Y])(0.5)](0.5)
(since we have to start over once we see a T)
= 1.5 + 0.75 E[Y].

Solving, we obtain E[Y ] = 6, which is perhaps surprising given the result


from the previous example. 2
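A brute-force simulation sketch of mine that estimates both expected waiting times; the helper name flips_until and the number of trials are arbitrary.

import random

def flips_until(pattern, trials=100_000):
    total = 0
    for _ in range(trials):
        seq = ""
        while not seq.endswith(pattern):
            seq += random.choice("HT")   # fair coin flips
        total += len(seq)
    return total / trials

print(flips_until("HT"))   # approximately 4
print(flips_until("HH"))   # approximately 6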

ISYE 6739
Honors Class: Random Sums of Random Variables

1 Introduction
2 Marginal Distributions
3 Conditional Distributions
4 Independent Random Variables
5 Consequences of Independence
6 Random Samples
7 Conditional Expectation
8 Double Expectation
9 Honors Class: First-Step Analysis
10 Honors Class: Random Sums of Random Variables
11 Honors Class: Standard Conditioning Argument
12 Covariance and Correlation
13 Correlation and Causation
14 A Couple of Worked Correlation Examples
15 Some Useful Covariance / Correlation Theorems
16 Moment Generating Functions, Revisited
17 Honors Bivariate Functions of Random Variables
ISYE 6739
Honors Class: Random Sums of Random Variables

Lesson 2.10 — Honors Class: Random Sums of Random Variables

Bonus Theorem (expectation of sum of a random number of RVs):

Suppose that X1 , X2 , . . . are independent RVs, all with the same mean.

Also suppose that N is a nonnegative, integer-valued RV that’s independent of the Xi’s. Then

E[Σ_{i=1}^{N} Xi] = E[N] E[X1].

Remark: You have to be very careful here. In particular, note that E[Σ_{i=1}^{N} Xi] ≠ N E[X1], since the LHS is a number and the RHS is random.

ISYE 6739
Honors Class: Random Sums of Random Variables

Proof (cf. Ross): By double expectation,

E[Σ_{i=1}^{N} Xi] = E[ E(Σ_{i=1}^{N} Xi | N) ]
= Σ_{n=1}^{∞} E[Σ_{i=1}^{N} Xi | N = n] P(N = n)
= Σ_{n=1}^{∞} E[Σ_{i=1}^{n} Xi | N = n] P(N = n)
= Σ_{n=1}^{∞} E[Σ_{i=1}^{n} Xi] P(N = n) (N and Xi’s indep)
= Σ_{n=1}^{∞} n E[X1] P(N = n)
= E[X1] Σ_{n=1}^{∞} n P(N = n) = E[N] E[X1]. 2

ISYE 6739
Honors Class: Random Sums of Random Variables

Example: Suppose the number of times we roll a die is N ∼ Pois(10). If Xi denotes the value of the ith toss, then the expected total of all of the rolls is

E[Σ_{i=1}^{N} Xi] = E[N] E[X1] = 10(3.5) = 35. 2

Theorem: Under the same conditions as before,

Var(Σ_{i=1}^{N} Xi) = E[N] Var(X1) + (E[X1])² Var(N).

Proof: See, for instance, Ross. 2
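A simulation sketch of my own for the dice example, checking both the mean and the variance formulas; numpy's Poisson and integer samplers are used, and the seed and replication count are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
N = rng.poisson(10, size=100_000)                 # number of rolls in each replication
totals = np.array([rng.integers(1, 7, size=n).sum() for n in N])
print(totals.mean())   # approximately E[N]E[X1] = 10(3.5) = 35
print(totals.var())    # approximately 10(35/12) + (3.5)^2(10) = 151.67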

ISYE 6739
Honors Class: Standard Conditioning Argument

1 Introduction
2 Marginal Distributions
3 Conditional Distributions
4 Independent Random Variables
5 Consequences of Independence
6 Random Samples
7 Conditional Expectation
8 Double Expectation
9 Honors Class: First-Step Analysis
10 Honors Class: Random Sums of Random Variables
11 Honors Class: Standard Conditioning Argument
12 Covariance and Correlation
13 Correlation and Causation
14 A Couple of Worked Correlation Examples
15 Some Useful Covariance / Correlation Theorems
16 Moment Generating Functions, Revisited
17 Honors Bivariate Functions of Random Variables
ISYE 6739
Honors Class: Standard Conditioning Argument

Lesson 2.11 — Honors Class: Standard Conditioning Argument

Bonus Theorem/Proof (computing probabilities by conditioning):

Let A be some event, and define the RV Y as the following indicator function: Y = 1_A ≡ 1 if A occurs, and 0 otherwise.

Then

E[Y] = Σ_y y fY(y) = P(Y = 1) = P(A).

Similarly, for any RV X, we have

E[Y|X = x] = Σ_y y f(y|x) = P(Y = 1|X = x) = P(A|X = x).

These results suggest an alternative way of calculating P(A). . . .

ISYE 6739
Honors Class: Standard Conditioning Argument

Theorem: If X is a continuous RV (similar result if X is discrete), then

P(A) = ∫_R P(A|X = x) fX(x) dx.

Proof:

P(A) = E[Y] (where we take Y = 1_A)
= E[E(Y|X)] (double expectation)
= ∫_R E[Y|x] fX(x) dx (LOTUS)
= ∫_R P(A|X = x) fX(x) dx (since Y = 1_A). 2

Remark: We call this the “standard conditioning argument.” Yes, it


looks complicated. But sometimes you need to take a step backward to go two
steps forward!

ISYE 6739
Honors Class: Standard Conditioning Argument

Example/Theorem: Suppose X and Y are independent continuous RVs, with pdf fX(·) and cdf FY(·), respectively. Then

P(Y ≤ X) = ∫_R FY(x) fX(x) dx.

Proof: (Actually, there are many proofs.) Let the event A = {Y ≤ X}. Then

P(Y ≤ X) = ∫_R P(Y ≤ X|X = x) fX(x) dx
= ∫_R P(Y ≤ x|X = x) fX(x) dx
= ∫_R P(Y ≤ x) fX(x) dx (X, Y are independent). 2

ISYE 6739
Honors Class: Standard Conditioning Argument

Example: If X ∼ Exp(α) and Y ∼ Exp(β) are independent RVs, then

P(Y ≤ X) = ∫_R FY(x) fX(x) dx
= ∫_0^∞ (1 − e^{−βx}) α e^{−αx} dx
= β/(α + β). 2

Remark: Think of X as the time until the next male driver shows up at a
parking lot (at rate α / hour) and Y as the time for the next female driver (at
rate β / hour). Then P (Y ≤ X) = β/(α + β) is the intuitively reasonable
probability that the next driver to arrive will be female. 2
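A one-line simulation check of mine, with arbitrary rates α = 2 and β = 3 per hour, so that β/(α + β) = 0.6.

import random

alpha, beta = 2.0, 3.0
n = 1_000_000
count = sum(random.expovariate(beta) <= random.expovariate(alpha) for _ in range(n))
print(count / n, beta / (alpha + beta))   # both approximately 0.6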

ISYE 6739
Honors Class: Standard Conditioning Argument

Example/Theorem: Suppose X and Y are independent continuous RVs, with pdf fX(·) and cdf FY(·), respectively. Define the sum Z = X + Y. Then

P(Z ≤ z) = ∫_R FY(z − x) fX(x) dx.

An expression such as the above for P(Z ≤ z) is often called a convolution.

Proof:

P(Z ≤ z) = ∫_R P(X + Y ≤ z|X = x) fX(x) dx
= ∫_R P(Y ≤ z − x|X = x) fX(x) dx
= ∫_R P(Y ≤ z − x) fX(x) dx (X, Y are indep). 2

ISYE 6739
Honors Class: Standard Conditioning Argument

Example: Suppose X, Y ∼ iid Exp(λ), and let Z = X + Y. Then

P(Z ≤ z) = ∫_R FY(z − x) fX(x) dx
= ∫_0^z (1 − e^{−λ(z−x)}) λ e^{−λx} dx (must have x ≥ 0 and z − x ≥ 0)
= 1 − e^{−λz} − λz e^{−λz}, if z ≥ 0.

Thus, the pdf of Z is

(d/dz) P(Z ≤ z) = λ² z e^{−λz}, z ≥ 0.

This turns out to mean that Z ∼ Gamma(2, λ), aka Erlang_2(λ). 2
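A quick simulation sketch of mine comparing the empirical cdf of X + Y at a single point against the formula just derived; λ and z are arbitrary choices.

import math
import random

lam, z = 1.5, 2.0
n = 500_000
count = sum(random.expovariate(lam) + random.expovariate(lam) <= z for _ in range(n))
exact = 1 - math.exp(-lam * z) - lam * z * math.exp(-lam * z)
print(count / n, exact)   # both approximately 0.80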

ISYE 6739
Honors Class: Standard Conditioning Argument

You can do similar kinds of convolutions with discrete RVs. We state the following result without proof (which is straightforward).

Example/Theorem: Suppose X and Y are two independent integer-valued RVs with pmf’s fX(x) and fY(y). Then the pmf of Z = X + Y is

fZ(z) = P(Z = z) = Σ_{x=−∞}^{∞} fX(x) fY(z − x).

ISYE 6739
Honors Class: Standard Conditioning Argument

Example: Suppose X and Y are iid Bern(p). Then the pmf of Z = X + Y is

fZ(z) = Σ_{x=−∞}^{∞} fX(x) fY(z − x)
= fX(0) fY(z) + fX(1) fY(z − 1) (X can only = 0 or 1)
= fX(0) fY(z) 1_{0,1}(z) + fX(1) fY(z − 1) 1_{1,2}(z) (the 1_{·}(z) indicator functions mark the z’s with nonzero fY(·)’s)
= p⁰ q^{1−0} p^z q^{1−z} 1_{0,1}(z) + p¹ q^{1−1} p^{z−1} q^{2−z} 1_{1,2}(z)
= p^z q^{2−z} [1_{0,1}(z) + 1_{1,2}(z)]
= (2 choose z) p^z q^{2−z}, z = 0, 1, 2.

Thus, Z ∼ Bin(2, p), a fond blast from the past! 2

ISYE 6739
Covariance and Correlation

1 Introduction
2 Marginal Distributions
3 Conditional Distributions
4 Independent Random Variables
5 Consequences of Independence
6 Random Samples
7 Conditional Expectation
8 Double Expectation
9 Honors Class: First-Step Analysis
10 Honors Class: Random Sums of Random Variables
11 Honors Class: Standard Conditioning Argument
12 Covariance and Correlation
13 Correlation and Causation
14 A Couple of Worked Correlation Examples
15 Some Useful Covariance / Correlation Theorems
16 Moment Generating Functions, Revisited
17 Honors Bivariate Functions of Random Variables
ISYE 6739
Covariance and Correlation

Lesson 2.12 — Covariance and Correlation

In the next few lessons we’ll cover:


Basic Concepts of Covariance and Correlation
Causation
A Couple of Worked Examples
Some Useful Theorems

Covariance and correlation are measures used to define the degree of


association between X and Y if they don’t happen to be independent.

Definition: The covariance between X and Y is

Cov(X, Y ) ≡ σXY ≡ E[(X − E[X])(Y − E[Y ])].

Remark: Cov(X, X) = E[(X − E[X])2 ] = Var(X).

ISYE 6739
Covariance and Correlation

Remark: If X and Y have positive covariance, then X and Y move “in the
same direction.” Think height and weight.

ISYE 6739
Covariance and Correlation

If X and Y have negative covariance, then X and Y move “in opposite


directions.” Think snowfall and temperature.

ISYE 6739
Covariance and Correlation

If X and Y are independent, then of course they have no association with


each other. In fact, we’ll prove below that independence implies that the
covariance is 0 (but not the other way around).

Example: IBM stock price vs. temperature on Mars are independent — at


least that’s what they want you to believe!

ISYE 6739
Covariance and Correlation

Theorem (easier way to calculate covariance):

Cov(X, Y ) = E[XY ] − E[X]E[Y ].

Proof:

Cov(X, Y) = E[(X − E[X])(Y − E[Y])]
= E[XY − X E[Y] − Y E[X] + E[X]E[Y]]
= E[XY] − E[X]E[Y] − E[Y]E[X] + E[X]E[Y]
= E[XY] − E[X]E[Y]. 2

Theorem: X and Y independent implies Cov(X, Y ) = 0.

Proof: By a previous theorem, X and Y independent implies


E[XY ] = E[X]E[Y ]. Then

Cov(X, Y) = E[XY] − E[X]E[Y] = E[X]E[Y] − E[X]E[Y] = 0. 2

ISYE 6739
Covariance and Correlation

Danger Will Robinson! Cov(X, Y ) = 0 does not imply that X and Y


are independent!!

Example: Suppose X ∼ Unif(−1, 1) and Y = X² (so X and Y are clearly dependent).

But

E[X] = ∫_{−1}^{1} x · (1/2) dx = 0 and
E[XY] = E[X³] = ∫_{−1}^{1} x³ · (1/2) dx = 0,

so

Cov(X, Y) = E[XY] − E[X]E[Y] = 0.
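A simulation sketch of my own illustrating the phenomenon: the sample covariance and correlation of X and X² are both near 0, even though Y is a deterministic function of X.

import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=1_000_000)
y = x ** 2                         # Y is completely determined by X
print(np.cov(x, y)[0, 1])          # sample covariance, approximately 0
print(np.corrcoef(x, y)[0, 1])     # sample correlation, approximately 0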

ISYE 6739
Covariance and Correlation

In fact, here’s a graphical illustration of this zero-correlation dependence


phenomenon, where we’ve actually added some normal noise to Y to make it
look prettier.

ISYE 6739
Covariance and Correlation

Definition: The correlation between X and Y is

ρ = Corr(X, Y) ≡ Cov(X, Y) / √(Var(X)Var(Y)) = σXY / (σX σY).

Remark: Covariance has “square” units; correlation is unitless.

Corollary: X, Y independent implies ρ = 0.

Theorem: It can be shown that −1 ≤ ρ ≤ 1.

ρ ≈ 1 is “high” correlation.
ρ ≈ 0 is “low” correlation.
ρ ≈ −1 is “high” negative correlation.

Example: Height is highly correlated with weight.


Temperature on Mars has low correlation with IBM stock price.

ISYE 6739
Correlation and Causation

1 Introduction
2 Marginal Distributions
3 Conditional Distributions
4 Independent Random Variables
5 Consequences of Independence
6 Random Samples
7 Conditional Expectation
8 Double Expectation
9 Honors Class: First-Step Analysis
10 Honors Class: Random Sums of Random Variables
11 Honors Class: Standard Conditioning Argument
12 Covariance and Correlation
13 Correlation and Causation
14 A Couple of Worked Correlation Examples
15 Some Useful Covariance / Correlation Theorems
16 Moment Generating Functions, Revisited
17 Honors Bivariate Functions of Random Variables
ISYE 6739
Correlation and Causation

Lesson 2.13 — Correlation and Causation

NOTE! Correlation does not necessarily imply causality! This is a


very common pitfall in many areas of data analysis and public discourse.

Example in which correlation does imply causality: Height and


weight are positively correlated, and larger height does indeed tend to cause
greater weight. 2

Example in which correlation does not imply causality: Temperature


and lemonade sales have positive corr, and temp has causal influence on
lemonade sales. Similarly, temp and overheating cars are positively correlated
with a causal relationship. It’s also likely that lemonade sales and overheating
cars are positively correlated, but there’s no causal relationship there. 2

Example of a zero correlation relationship with causality! We’ve


seen that it’s possible for two dependent RVs to be uncorrelated. 2

ISYE 6739
Correlation and Causation

To prove that X causes Y , one must establish that:


X occurred before Y ;
The relationship between X and Y is not completely due to random
chance; and
Nothing else accounts for the relationship (which is violated in the
lemonade sales / overheating cars example above).

These items can often be established via mathematical analysis, statistical analysis of appropriate data, or consultation with appropriate experts.

ISYE 6739
Correlation and Causation

The three examples above seem to give conflicting guidance with respect to
the relationship between correlation and causality. How can we interpret these
findings in a meaningful way? Here are the takeaways:
If the correlation between X and Y is (significantly) nonzero, there is
some type of relationship between the two items, which may or may not
be causal; but this should raise our curiosity.
If the correlation between X and Y is 0, we are not quite out of the
woods with respect to dependence and causality. In order to definitively
rule out a relationship between X and Y , it is always highly
recommended protocol to, at the very least,
Plot data from X and Y against each other to see if there is a nonlinear
relationship, as in the uncorrelated-yet-dependent example.
Consult with appropriate experts.

ISYE 6739
A Couple of Worked Correlation Examples

1 Introduction
2 Marginal Distributions
3 Conditional Distributions
4 Independent Random Variables
5 Consequences of Independence
6 Random Samples
7 Conditional Expectation
8 Double Expectation
9 Honors Class: First-Step Analysis
10 Honors Class: Random Sums of Random Variables
11 Honors Class: Standard Conditioning Argument
12 Covariance and Correlation
13 Correlation and Causation
14 A Couple of Worked Correlation Examples
15 Some Useful Covariance / Correlation Theorems
16 Moment Generating Functions, Revisited
17 Honors Bivariate Functions of Random Variables
ISYE 6739
A Couple of Worked Correlation Examples

Lesson 2.14 — A Couple of Worked Correlation Examples

Discrete Example: Suppose X is the GPA of a UGA student, and Y is


their IQ. Here’s the joint pmf.

f (x, y) X=2 X=3 X=4 fY (y)


Y = 40 0.0 0.2 0.2 0.4
Y = 50 0.1 0.1 0.0 0.2
Y = 60 0.4 0.0 0.0 0.4
fX (x) 0.5 0.3 0.2 1

We’ll spare the details, but here are the relevant calculations. . .

ISYE 6739
A Couple of Worked Correlation Examples

E[X] = Σ_x x fX(x) = 2.7,
E[X²] = Σ_x x² fX(x) = 7.9, and
Var(X) = E[X²] − (E[X])² = 0.61.

Similarly, E[Y] = 50, E[Y²] = 2580, and Var(Y) = 80. Finally,

E[XY] = Σ_x Σ_y x y f(x, y) = 2(40)(0.0) + 3(40)(0.2) + · · · + 4(60)(0.0) = 129,

Cov(X, Y) = E[XY] − E[X]E[Y] = −6.0, and

ρ = Cov(X, Y) / √(Var(X)Var(Y)) = −0.859. 2
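Here is a small numpy sketch of mine that carries out these calculations directly from the pmf table; the array layout is just an illustrative choice.

import numpy as np

xs = np.array([2, 3, 4])
ys = np.array([40, 50, 60])
# Rows index y = 40, 50, 60; columns index x = 2, 3, 4.
f = np.array([[0.0, 0.2, 0.2],
              [0.1, 0.1, 0.0],
              [0.4, 0.0, 0.0]])

f_X, f_Y = f.sum(axis=0), f.sum(axis=1)
E_X, E_Y = xs @ f_X, ys @ f_Y
Var_X = xs ** 2 @ f_X - E_X ** 2
Var_Y = ys ** 2 @ f_Y - E_Y ** 2
E_XY = ys @ f @ xs                          # sum over x and y of x*y*f(x, y)
cov = E_XY - E_X * E_Y
print(cov, cov / np.sqrt(Var_X * Var_Y))    # -6.0 and about -0.859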

ISYE 6739
A Couple of Worked Correlation Examples

Continuous Example: Suppose f(x, y) = 10x²y, 0 ≤ y ≤ x ≤ 1.

fX(x) = ∫_0^x 10x²y dy = 5x⁴, 0 ≤ x ≤ 1,
E[X] = ∫_0^1 5x⁵ dx = 5/6,
E[X²] = ∫_0^1 5x⁶ dx = 5/7,
Var(X) = E[X²] − (E[X])² = 0.01984.

ISYE 6739
A Couple of Worked Correlation Examples

Similarly,

fY(y) = ∫_y^1 10x²y dx = (10/3) y (1 − y³), 0 ≤ y ≤ 1,
E[Y] = 5/9, Var(Y) = 0.04850,
E[XY] = ∫_0^1 ∫_0^x 10x³y² dy dx = 10/21,
Cov(X, Y) = E[XY] − E[X]E[Y] = 0.01323, and
ρ = Cov(X, Y) / √(Var(X)Var(Y)) = 0.4265. 2

ISYE 6739
Some Useful Covariance / Correlation Theorems

1 Introduction
2 Marginal Distributions
3 Conditional Distributions
4 Independent Random Variables
5 Consequences of Independence
6 Random Samples
7 Conditional Expectation
8 Double Expectation
9 Honors Class: First-Step Analysis
10 Honors Class: Random Sums of Random Variables
11 Honors Class: Standard Conditioning Argument
12 Covariance and Correlation
13 Correlation and Causation
14 A Couple of Worked Correlation Examples
15 Some Useful Covariance / Correlation Theorems
16 Moment Generating Functions, Revisited
17 Honors Bivariate Functions of Random Variables
ISYE 6739
Some Useful Covariance / Correlation Theorems

Lesson 2.15 — Some Useful Covariance / Correlation Theorems

Theorem: Var(X + Y ) = Var(X) + Var(Y ) + 2Cov(X, Y ), whether or


not X and Y are independent.

Remark: If X, Y are independent, the covariance term goes away.

Proof: By the work we did on a previous proof,

Var(X + Y) = E[X²] − (E[X])² + E[Y²] − (E[Y])² + 2(E[XY] − E[X]E[Y])
= Var(X) + Var(Y) + 2Cov(X, Y). 2

ISYE 6739
Some Useful Covariance / Correlation Theorems

Theorem:

Var(Σ_{i=1}^n Xi) = Σ_{i=1}^n Var(Xi) + 2 Σ Σ_{i<j} Cov(Xi, Xj).

Proof: Induction.

Corollary: If all Xi’s are independent, then

Var(Σ_{i=1}^n Xi) = Σ_{i=1}^n Var(Xi).

ISYE 6739
Some Useful Covariance / Correlation Theorems

Theorem: Cov(aX, bY + c) = ab Cov(X, Y).

Proof:

Cov(aX, bY + c) = E[aX · (bY + c)] − E[aX]E[bY + c]
= E[abXY] + E[acX] − E[aX]E[bY] − E[aX]E[c]
= ab E[XY] − ab E[X]E[Y] + ac E[X] − ac E[X]
= ab Cov(X, Y). 2

Theorem:

Var(Σ_{i=1}^n ai Xi + c) = Σ_{i=1}^n ai² Var(Xi) + 2 Σ Σ_{i<j} ai aj Cov(Xi, Xj).

Proof: Put the above two results together. 2

ISYE 6739
Some Useful Covariance / Correlation Theorems

Example: Var(X − Y ) = Var(X) + Var(Y ) − 2Cov(X, Y ).

Example: Suppose Var(X) = Var(Y ) = Var(Z) = 10,


Cov(X, Y ) = 3, Cov(X, Z) = −2, and Cov(Y, Z) = 0. Then

Var(X − 2Y + 3Z)
= Var(X) + 4Var(Y ) + 9Var(Z)
−4Cov(X, Y ) + 6Cov(X, Z) − 12Cov(Y, Z)
= 14(10) − 4(3) + 6(−2) − 12(0) = 116. 2

ISYE 6739
Moment Generating Functions, Revisited

1 Introduction
2 Marginal Distributions
3 Conditional Distributions
4 Independent Random Variables
5 Consequences of Independence
6 Random Samples
7 Conditional Expectation
8 Double Expectation
9 Honors Class: First-Step Analysis
10 Honors Class: Random Sums of Random Variables
11 Honors Class: Standard Conditioning Argument
12 Covariance and Correlation
13 Correlation and Causation
14 A Couple of Worked Correlation Examples
15 Some Useful Covariance / Correlation Theorems
16 Moment Generating Functions, Revisited
17 Honors Bivariate Functions of Random Variables
ISYE 6739
Moment Generating Functions, Revisited

Lesson 2.16 — Moment Generating Functions, Revisited

Old Definition: MX(t) ≡ E[e^{tX}] is the moment generating function (mgf) of the RV X.

Old Example: If X ∼ Bern(p), then

MX(t) = E[e^{tX}] = Σ_x e^{tx} f(x) = e^{t·1} p + e^{t·0} q = pe^t + q. 2

Old Example: If X ∼ Exp(λ), then

MX(t) = E[e^{tX}] = ∫_R e^{tx} f(x) dx = λ/(λ − t), if λ > t. 2

Old Theorem (why it’s called the mgf): Under certain technical conditions,

E[X^k] = (d^k/dt^k) MX(t) |_{t=0}, k = 1, 2, . . . .

ISYE 6739
Moment Generating Functions, Revisited

New Theorem (mgf of the sum of independent RVs): Suppose X1, . . . , Xn are independent. Let Y = Σ_{i=1}^n Xi. Then

MY(t) = Π_{i=1}^n MXi(t).

Proof:

MY(t) = E[e^{tY}]
= E[e^{t Σ_i Xi}]
= E[Π_{i=1}^n e^{tXi}]
= Π_{i=1}^n E[e^{tXi}] (Xi’s independent)
= Π_{i=1}^n MXi(t). 2

ISYE 6739
Moment Generating Functions, Revisited

Corollary: If X1, . . . , Xn are iid and Y = Σ_{i=1}^n Xi, then MY(t) = [MX1(t)]^n.

Example: Suppose X1, . . . , Xn ∼ iid Bern(p). Then by a previous example,

MY(t) = [MX1(t)]^n = (pe^t + q)^n.

So what use is a result like this? We can use results such as this with our old
friend. . . .

Old Theorem (identifying distributions): In this class, each distribution has


a unique mgf.

ISYE 6739
Moment Generating Functions, Revisited

Example/Theorem: The sum Y of n iid Bern(p) RVs is the same as a Bin(n, p) RV.

By the previous example and uniqueness, all we need to show is that the mgf of Z ∼ Bin(n, p) matches MY(t) = (pe^t + q)^n. To this end, we have

MZ(t) = E[e^{tZ}]
= Σ_z e^{tz} P(Z = z)
= Σ_{z=0}^n e^{tz} (n choose z) p^z q^{n−z}
= Σ_{z=0}^n (n choose z) (pe^t)^z q^{n−z}
= (pe^t + q)^n (by the Binomial Theorem). 2
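As a sanity check of my own (not on the slides), sympy can differentiate this mgf at t = 0 to recover the Bin(n, p) mean and variance, in the spirit of the Old Theorem above.

import sympy as sp

t, p, n = sp.symbols('t p n', positive=True)
M = (p * sp.exp(t) + 1 - p) ** n        # Bin(n, p) mgf, with q = 1 - p

mean = sp.diff(M, t).subs(t, 0)
second = sp.diff(M, t, 2).subs(t, 0)
print(sp.simplify(mean))                # n*p
print(sp.simplify(second - mean ** 2))  # n*p*(1 - p), the Bin(n, p) variance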

ISYE 6739
Moment Generating Functions, Revisited

Example: You can identify a distribution by its mgf.

MX(t) = ((3/4) e^t + 1/4)^15

implies that X ∼ Bin(15, 0.75). 2

Old Theorem (mgf of a linear function of X): Suppose X has mgf MX(t) and let Y = aX + b. Then MY(t) = e^{tb} MX(at).

Example:

MY(t) = e^{−2t} ((3/4) e^{3t} + 1/4)^15 = e^{bt} (pe^{at} + q)^n = e^{bt} MX(at),

which implies that Y has the same distribution as 3X − 2, where X ∼ Bin(15, 0.75). 2

ISYE 6739
Moment Generating Functions, Revisited

Theorem (Additive property of Binomials): If X1, . . . , Xk are independent, with Xi ∼ Bin(ni, p) (where p is the same for all Xi’s), then

Y ≡ Σ_{i=1}^k Xi ∼ Bin(Σ_{i=1}^k ni, p).

Proof:

MY(t) = Π_{i=1}^k MXi(t) (mgf of independent sum)
= Π_{i=1}^k (pe^t + q)^{ni} (Bin(ni, p) mgf)
= (pe^t + q)^{Σ_{i=1}^k ni}.

This is the mgf of the Bin(Σ_{i=1}^k ni, p), so we’re done. 2

ISYE 6739
Honors Bivariate Functions of Random Variables

1 Introduction
2 Marginal Distributions
3 Conditional Distributions
4 Independent Random Variables
5 Consequences of Independence
6 Random Samples
7 Conditional Expectation
8 Double Expectation
9 Honors Class: First-Step Analysis
10 Honors Class: Random Sums of Random Variables
11 Honors Class: Standard Conditioning Argument
12 Covariance and Correlation
13 Correlation and Causation
14 A Couple of Worked Correlation Examples
15 Some Useful Covariance / Correlation Theorems
16 Moment Generating Functions, Revisited
17 Honors Bivariate Functions of Random Variables
ISYE 6739
Honors Bivariate Functions of Random Variables

Lesson 2.17 — Honors Bivariate Functions of Random Variables

In earlier work, we looked at. . .

Functions of a single variable, e.g., what is the expected value of h(X)? (LOTUS, from Module 2)
What is the distribution of h(X)? (functions of RVs, from Module 2)
And sometimes even functions of two (or more) variables. For example, if the Xi’s are independent, what’s Var(Σ_{i=1}^n Xi)? (earlier in Module 3)
Use a standard conditioning argument to get the distribution of X + Y. (earlier in Module 3)

Goal: Now let’s give a general result on the distribution of functions of two
random variables, the proof of which is beyond the scope of our class.

ISYE 6739
Honors Bivariate Functions of Random Variables

Honors Theorem: Suppose X and Y are continuous RVs with joint pdf
f (x, y), and V = h1 (X, Y ) and W = h2 (X, Y ) are functions of X and Y ,
and
X = k1 (V, W ) and Y = k2 (V, W ),
for suitably chosen inverse functions k1 and k2 .

Then the joint pdf of V and W is

g(v, w) = f(k1(v, w), k2(v, w)) |J(v, w)|,

where |J| is the absolute value of the Jacobian (determinant) of the transformation, i.e.,

J(v, w) = det[ ∂x/∂v  ∂x/∂w ; ∂y/∂v  ∂y/∂w ] = (∂x/∂v)(∂y/∂w) − (∂y/∂v)(∂x/∂w).

ISYE 6739
Honors Bivariate Functions of Random Variables

Corollary: If X and Y are independent, then the joint pdf of V and W is

g(v, w) = fX(k1(v, w)) fY(k2(v, w)) |J(v, w)|.

Remark: These results generalize the 1-D method from Module 2.

You can use this method to find all sorts of cool stuff, e.g., the distribution of
X + Y , X/Y , etc., as well as the joint pdf of any functions of X and Y .

Remark: Although the notation is nasty, the application isn’t really so bad.

ISYE 6739
Honors Bivariate Functions of Random Variables

Example: Suppose X and Y are iid Exp(λ). Find the pdf of X + Y .

We’ll set V = X + Y along with the dummy RV W = X.

This yields

X = W = k1 (V, W ) and Y = V − W = k2 (V, W ).

To get the Jacobian term, we calculate

∂x/∂v = 0, ∂x/∂w = 1, ∂y/∂v = 1, and ∂y/∂w = −1,

so that

|J| = |(∂x/∂v)(∂y/∂w) − (∂y/∂v)(∂x/∂w)| = |0(−1) − 1(1)| = 1.

ISYE 6739
Honors Bivariate Functions of Random Variables

This implies that the joint pdf of V and W is

g(v, w) = f(k1(v, w), k2(v, w)) |J(v, w)|
= f(w, v − w) · 1
= fX(w) fY(v − w) (X and Y independent)
= λe^{−λw} · λe^{−λ(v−w)}, for w > 0 and v − w > 0
= λ² e^{−λv}, for 0 < w < v.

And, finally, we obtain the desired pdf of the sum V (after carefully noting the region of integration),

gV(v) = ∫_R g(v, w) dw = ∫_0^v λ² e^{−λv} dw = λ² v e^{−λv}, for v > 0.

This is the Gamma(2, λ) pdf, which matches our answer from earlier in the
current module. 2

ISYE 6739
Honors Bivariate Functions of Random Variables

Honors Example: Suppose X and Y are iid Unif(0,1). Find the joint pdf of
V = X + Y and W = X/Y .

After some algebra, we obtain

X = VW/(W + 1) = k1(V, W) and Y = V/(W + 1) = k2(V, W).

After more algebra, we calculate

∂x/∂v = w/(w + 1), ∂x/∂w = v/(w + 1)², ∂y/∂v = 1/(w + 1), ∂y/∂w = −v/(w + 1)²,

so that after still more algebra,

|J| = |(∂x/∂v)(∂y/∂w) − (∂y/∂v)(∂x/∂w)| = v/(w + 1)².

ISYE 6739
Honors Bivariate Functions of Random Variables

This implies that the joint pdf of V and W is

g(v, w) = f(k1(v, w), k2(v, w)) |J(v, w)|
= f(vw/(w + 1), v/(w + 1)) · v/(w + 1)²
= fX(vw/(w + 1)) fY(v/(w + 1)) · v/(w + 1)² (X and Y indep)
= 1 · 1 · v/(w + 1)², for 0 < x, y < 1 (since X, Y ∼ Unif(0,1))
= v/(w + 1)², for 0 < x = vw/(w + 1) < 1 and 0 < y = v/(w + 1) < 1
= v/(w + 1)², for 0 < v < 1 + min{1/w, w} and w > 0 (after algebra).

Note that you have to be careful about the limits of v and w, but this thing
really does double integrate to 1! 2

ISYE 6739
Honors Bivariate Functions of Random Variables

We can also get the marginal pdf’s. First of all, for the ratio of the uniforms, we get

gW(w) = ∫_R g(v, w) dv
= ∫_0^{1+min{1/w, w}} v/(w + 1)² dv
= (1 + min{1/w, w})² / [2(w + 1)²]
= 1/2 if w ≤ 1, and 1/(2w²) if w > 1,

which is a little weird-looking and unexpected to me (it’s flat for w ≤ 1, and


then decreases to 0 pretty quickly for w > 1). 2

ISYE 6739
Honors Bivariate Functions of Random Variables

For the pdf of the sum of the uniforms, we have to calculate gV(v) = ∫_R g(v, w) dw. But first we need to deal with some inequality constraints so that we can integrate over the proper region, namely,

0 ≤ v ≤ 1 + min{1/w, w}, 0 ≤ v ≤ 2, and w ≥ 0.

With a little thought, we see that if 0 ≤ v ≤ 1, then there is no constraint on w except for it being positive. On the other hand, if 1 < v ≤ 2, then you can show (it takes a little work) that v − 1 ≤ w ≤ 1/(v − 1). Thus, we have

gV(v) = ∫_0^∞ g(v, w) dw if 0 ≤ v ≤ 1, and ∫_{v−1}^{1/(v−1)} g(v, w) dw if 1 < v ≤ 2
= v if 0 ≤ v ≤ 1, and 2 − v if 1 < v ≤ 2 (after algebra).

This is a Triangle(0,1,2) pdf. Can you see why? Is there an intuitive


explanation for this pdf? 2
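A simulation sketch of mine that spot-checks these marginals: P(W ≤ 1) should be 1/2, and V should follow the Triangle(0,1,2) cdf, which equals v²/2 on [0, 1] and 1 − (2 − v)²/2 on [1, 2]; the seed and sample size are arbitrary.

import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(size=1_000_000)
y = rng.uniform(size=1_000_000)
v, w = x + y, x / y
print(np.mean(w <= 1.0))                     # approximately 1/2
print(np.mean(v <= 0.5), 0.5 ** 2 / 2)       # both approximately 0.125
print(np.mean(v <= 1.5), 1 - 0.5 ** 2 / 2)   # both approximately 0.875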
ISYE 6739
Honors Bivariate Functions of Random Variables

And Now a Word From Our Sponsor. . .

We are finally done with the most-difficult module of the course.


Congratulations and Felicitations!!!

Things will get easier from now on! Happy days are here again!

ISYE 6739
