Joint Distributions: The Joint Distribution of Two Discrete Random Variables

LECTURE NOTES NO. 5 (M235)

Joint Distributions
The Joint Distribution of Two Discrete
Random Variables:

Example: An experiment consists of three tosses of a fair coin. The sample space is
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
Let the random variable X be the number of heads.

A random variable is a function that maps outcomes to numbers:
X : S (the sample space) → R (the set of real numbers)

X(HHH)=3, X(HHT)=2, X(HTH)=2, X(HTT)=1,
X(THH)=2, X(THT)=1, X(TTH)=1, X(TTT)=0
Then the possible values of X are 0, 1, 2, and 3.
The probability distribution of X is given by
Table 1
x 0 1 2 3
P(X=x) 1/8 3/8 3/8 1/8
Let the random variable Y be the number of tails that precede the first head. (If no heads come up, we let Y = 3.)

Y(HHH)=0,Y(HHT)=0,Y(HTH)=0,
Y(HTT)=0,Y(THH)=1,Y(THT)=1
Y(TTT)=3,Y(TTH)=2
Then the possible values of Y are 0, 1, 2, and 3.
The probability distribution of Y is given by
Table 2
y 0 1 2 3
P(Y=y) 4/8 2/8 1/8 1/8
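
Tables 1 and 2 can be checked by brute force: list the eight equally likely outcomes and tabulate X and Y. A minimal Python sketch, using only the standard library (the helper names num_heads and tails_before_first_head are just illustrative):

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

outcomes = ["".join(t) for t in product("HT", repeat=3)]  # the 8 outcomes in S
p_outcome = Fraction(1, 8)                                # fair coin: all outcomes equally likely

def num_heads(w):                  # X = number of heads
    return w.count("H")

def tails_before_first_head(w):    # Y = tails preceding the first head (3 if no head appears)
    return w.index("H") if "H" in w else 3

pmf_X, pmf_Y = defaultdict(Fraction), defaultdict(Fraction)
for w in outcomes:
    pmf_X[num_heads(w)] += p_outcome
    pmf_Y[tails_before_first_head(w)] += p_outcome

print(dict(pmf_X))  # Table 1: X takes 0, 1, 2, 3 with probabilities 1/8, 3/8, 3/8, 1/8
print(dict(pmf_Y))  # Table 2: Y takes 0, 1, 2, 3 with probabilities 4/8, 2/8, 1/8, 1/8
```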

Now look at events involving both X and Y; for example, what is P(X = 1 and Y = 1)?

P(X=1 and Y = 1) = P(THT) = 1/8,

because the event “X=1 and Y=1” occurs only if the outcome of the experiment is THT. But notice that this answer, 1/8, cannot be found from Tables 1 and 2.
Notation:
P(X = x and Y = y)= P(X = x, Y = y)
Tables 1 and 2 are fine for computing probabilities for X and Y alone, but from these tables there is no way to know that P(X = 1 and Y = 1) = 1/8.
What do we need?
It seems clear that in this example, the
probabilities in the following table will give
us enough information:

Table 3
                      y
p(x, y)      0      1      2      3
       0     0      0      0     1/8
x      1    1/8    1/8    1/8     0
       2    2/8    1/8     0      0
       3    1/8     0      0      0
The notation “p(x, y)” refers to P(X = x and
Y = y); for example, p(2, 0) = P(X=2 and
Y=0)= 2/8. Notice that the 16 probabilities
in the table add to 1.
The entries in Table 3 amount to the
specification of a function of x and y, p(x,
y)= P(X = x and Y = y). It is called the joint
probability mass function of X and Y.

Table 4
                      y
p(x, y)      0      1      2      3     P(X=x)
       0     0      0      0     1/8     1/8
x      1    1/8    1/8    1/8     0      3/8
       2    2/8    1/8     0      0      3/8
       3    1/8     0      0      0      1/8
P(Y=y)      4/8    2/8    1/8    1/8

Notice that from Table 4 we can recover Tables 1 and 2 (the individual mass functions of X and Y) by adding across the rows and down the columns. In this context the probability mass functions of X and Y are called marginal probability mass functions and are denoted by pX(x) and pY(y).
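
The row and column sums in Table 4 can be reproduced directly from the joint table. A small sketch, using exact fractions so the eighths do not turn into rounding error:

```python
from fractions import Fraction as F

# Joint pmf from Table 3: joint[x][y] = P(X = x, Y = y)
joint = {
    0: {0: F(0),    1: F(0),    2: F(0),    3: F(1, 8)},
    1: {0: F(1, 8), 1: F(1, 8), 2: F(1, 8), 3: F(0)},
    2: {0: F(2, 8), 1: F(1, 8), 2: F(0),    3: F(0)},
    3: {0: F(1, 8), 1: F(0),    2: F(0),    3: F(0)},
}

# Marginal of X: sum each row over y; marginal of Y: sum each column over x
p_X = {x: sum(row.values()) for x, row in joint.items()}
p_Y = {y: sum(joint[x][y] for x in joint) for y in range(4)}

print(p_X)  # marginal of X: 1/8, 3/8, 3/8, 1/8 on x = 0..3 (Table 1)
print(p_Y)  # marginal of Y: 4/8, 2/8, 1/8, 1/8 on y = 0..3 (Table 2)
assert sum(p_X.values()) == 1 and sum(p_Y.values()) == 1
```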

Definition. The joint probability mass function of two discrete random variables X and Y is the function p(x, y), defined for all pairs of real numbers x and y by
p(x, y) = P(X = x and Y = y),
and it satisfies the following conditions:
1. p(x, y) ≥ 0, for all x and y
2. Σ_x Σ_y p(x, y) = 1
3. P(a ≤ X ≤ b, c ≤ Y ≤ d) = Σ_{a ≤ x ≤ b} Σ_{c ≤ y ≤ d} p(x, y)

Theorem: The marginal probability mass functions can be obtained from the joint probability mass function:
P(X = x) = Σ_y P(X = x, Y = y)
P(Y = y) = Σ_x P(X = x, Y = y)

Independence of random variables:
Recall: events A and B are independent ⇔ P(A ∩ B) = P(A) P(B).
Two r.v.'s X and Y will be called independent when every event concerning X and every event concerning Y are independent.
General definition:
X and Y are independent ⇔ P(X ∈ A and Y ∈ B) = P(X ∈ A) P(Y ∈ B),
for any combination A ⊂ R and B ⊂ R.

For discrete variables this is equivalent to:
X and Y are independent ⇔ P(X = x, Y = y) = P(X = x) P(Y = y) for all x and y.
X and Y are dependent if they are not independent.
Theorem:
E[g(X, Y)] = Σ_x Σ_y g(x, y) p(x, y)
e.g.
E(XY) = Σ_x Σ_y x y p(x, y)
It follows that E(X + Y) = E(X) + E(Y).
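
The double sum is easy to evaluate by looping over a joint table. The sketch below applies it to the coin-toss joint pmf (Table 3) and confirms that E(X + Y) = E(X) + E(Y); the helper name expect is illustrative, not standard notation:

```python
from fractions import Fraction as F

# Joint pmf of (X, Y) for the coin-toss example (Table 3); omitted pairs have probability 0
joint = {(0, 3): F(1, 8),
         (1, 0): F(1, 8), (1, 1): F(1, 8), (1, 2): F(1, 8),
         (2, 0): F(2, 8), (2, 1): F(1, 8),
         (3, 0): F(1, 8)}

def expect(g):
    """E[g(X, Y)] = sum over all (x, y) of g(x, y) * p(x, y)."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

E_X = expect(lambda x, y: x)        # 3/2
E_Y = expect(lambda x, y: y)        # 7/8
E_sum = expect(lambda x, y: x + y)  # 19/8
print(E_sum == E_X + E_Y)           # True: E(X + Y) = E(X) + E(Y)
```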


Definition: The conditional probability mass function of X given Y = y, denoted by pX(x|y), is defined by
pX(x|y) = P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y)
for fixed y, provided P(Y = y) > 0.

The conditional expectation is:
E(X | Y = y) = Σ_x x P(X = x | Y = y)

Example: Consider the following joint probability mass function:
                      y
p(x, y)      0      1      2      3
       0    1/8    1/8     0      0
x      1     0     2/8    2/8     0
       2     0      0     1/8    1/8

a. Find P(X + Y = 2).
P(X + Y = 2) = p(0, 2) + p(1, 1) + p(2, 0) = 0 + 2/8 + 0 = 2/8
b. Find P(X > Y).
P(X > Y) = p(1, 0) + p(2, 0) + p(2, 1) = 0 + 0 + 0 = 0
c. Find the marginal probability mass function of X.
P(X = x) = Σ_{y=0}^{3} P(X = x, Y = y)

P(X = 0) = Σ_{y=0}^{3} P(X = 0, Y = y) = p(0,0) + p(0,1) + p(0,2) + p(0,3) = 1/8 + 1/8 + 0 + 0 = 2/8
P(X = 1) = Σ_{y=0}^{3} P(X = 1, Y = y) = p(1,0) + p(1,1) + p(1,2) + p(1,3) = 0 + 2/8 + 2/8 + 0 = 4/8
P(X = 2) = Σ_{y=0}^{3} P(X = 2, Y = y) = p(2,0) + p(2,1) + p(2,2) + p(2,3) = 0 + 0 + 1/8 + 1/8 = 2/8

d. Find the conditional p.m.f. of X given Y = 1.
pX|Y(x | 1) = P(X = x | Y = 1) = P(X = x, Y = 1) / P(Y = 1)

But pY(1) = 3/8. Therefore

pX|Y(x | 1) = p(x, 1) / (3/8),   x = 0, 1, 2

Thus
pX|Y(0 | 1) = p(0, 1) / P(Y = 1) = (1/8) / (3/8) = 1/3
pX|Y(1 | 1) = p(1, 1) / P(Y = 1) = (2/8) / (3/8) = 2/3
pX|Y(2 | 1) = p(2, 1) / P(Y = 1) = 0 / (3/8) = 0

These probabilities can be represented in


the following table:
x 0 1 2
P(X=x|Y=1) 1/3 2/3 0
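
The same numbers come out mechanically: take the y = 1 column of the joint table and divide by its total, P(Y = 1). A minimal sketch:

```python
from fractions import Fraction as F

# Joint pmf of the example; omitted pairs have probability 0
joint = {(0, 0): F(1, 8), (0, 1): F(1, 8),
         (1, 1): F(2, 8), (1, 2): F(2, 8),
         (2, 2): F(1, 8), (2, 3): F(1, 8)}

y0 = 1
p_y = sum(p for (x, y), p in joint.items() if y == y0)         # P(Y = 1) = 3/8
cond = {x: joint.get((x, y0), F(0)) / p_y for x in (0, 1, 2)}  # pX|Y(x | 1)

print(cond)                                   # x = 0, 1, 2 get 1/3, 2/3, 0
print(sum(x * px for x, px in cond.items()))  # E(X | Y = 1) = 2/3
```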

e. Find P(0 ≤ X ≤ 1|Y=1) ?

P(0 ≤ X ≤ 1 | Y = 1) = Σ_{x=0,1} P(X = x | Y = 1)
= p(0 | 1) + p(1 | 1) = 1/3 + 2/3 = 1
f. Are X and Y independent?
X and Y are independent if
P(X = x, Y = y) = pX(x) pY(y) for all x and y.
Check first whether p(0, 0) = pX(0) pY(0):
1/8 ≠ (2/8)(1/8)
So X and Y are not independent.
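
The check can be run over every cell, not just (0, 0). A small sketch that recomputes both marginals and tests the factorization:

```python
from fractions import Fraction as F

# Joint pmf of the example; zero cells omitted
joint = {(0, 0): F(1, 8), (0, 1): F(1, 8),
         (1, 1): F(2, 8), (1, 2): F(2, 8),
         (2, 2): F(1, 8), (2, 3): F(1, 8)}

xs, ys = (0, 1, 2), (0, 1, 2, 3)
p_X = {x: sum(joint.get((x, y), F(0)) for y in ys) for x in xs}
p_Y = {y: sum(joint.get((x, y), F(0)) for x in xs) for y in ys}

independent = all(joint.get((x, y), F(0)) == p_X[x] * p_Y[y] for x in xs for y in ys)
print(independent)  # False: e.g. p(0, 0) = 1/8 but pX(0) * pY(0) = (2/8)(1/8) = 2/64
```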
Joint probability mass functions (more than two variables):
Let X1, …, Xn be discrete random variables. The joint probability mass function is
p(x1, …, xn) = P(X1 = x1, …, Xn = xn),
where x1, …, xn are possible values of X1, …, Xn, respectively. p(x1, …, xn) satisfies the following conditions:
1. p(x1, …, xn) ≥ 0 for all x1, …, xn;
2. Σ_{x1} … Σ_{xn} p(x1, …, xn) = 1.

Independence of Multiple Random Variables:
Let X1, …, Xn be random variables. X1, …, Xn are independent if and only if
P(X1 = x1, …, Xn = xn) = P(X1 = x1) ⋯ P(Xn = xn),
for all (x1, …, xn).
Covariance Function
Measure of Dependency Between Two
Variables
Definition:
Suppose that X and Y are random variables with joint p.m.f. p(x, y) or p.d.f. fX,Y(x, y). The covariance of X and Y is defined by
Cov(X, Y) = E([X − E(X)][Y − E(Y)])
Covariance is a measure of how much two variables change together: it indicates how one variable tends to behave when the other changes. If an increase in one variable tends to come with an increase in the other variable (and a decrease with a decrease), the two variables are said to have positive covariance; they move together in the same direction. The concept of covariance is commonly used when discussing the relationship between two variables.

Example 1:
Suppose first that X and Y have the joint p.m.f. p(x, y) given in the following table:
                      y
p(x, y)      1       2       3      pX(x)
       0    0.02    0.05    0.15    0.22
x      1    0.08    0.24    0.05    0.37
       2    0.23    0.15    0.03    0.41
pY(y)       0.33    0.44    0.23
E(X) = 0(0.22) + 1(0.37) + 2(0.41) = 1.19
E(Y) = 1(0.33) + 2(0.44) + 3(0.23) = 1.90
E(XY) = 0(1)(0.02) + 0(2)(0.05) + 0(3)(0.15) + 1(1)(0.08) + 1(2)(0.24) + 1(3)(0.05) + 2(1)(0.23) + 2(2)(0.15) + 2(3)(0.03) = 1.95
Consequently,
Cov(X, Y) = 1.95 − (1.19)(1.90) = −0.311
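
The same arithmetic can be done by looping over the table. A short sketch (plain floating point, since the table uses decimal probabilities):

```python
# Joint pmf of Example 1: table[x][y] = p(x, y), with x = 0, 1, 2 and y = 1, 2, 3
table = {0: {1: 0.02, 2: 0.05, 3: 0.15},
         1: {1: 0.08, 2: 0.24, 3: 0.05},
         2: {1: 0.23, 2: 0.15, 3: 0.03}}
cells = [(x, y, p) for x, row in table.items() for y, p in row.items()]

E_X = sum(x * p for x, y, p in cells)       # 1.19
E_Y = sum(y * p for x, y, p in cells)       # 1.90
E_XY = sum(x * y * p for x, y, p in cells)  # 1.95

print(round(E_XY - E_X * E_Y, 4))           # Cov(X, Y) = -0.311
```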

Example 2:
The same joint p.m.f. again:
                      y
p(x, y)      1       2       3      pX(x)
       0    0.02    0.05    0.15    0.22
x      1    0.08    0.24    0.05    0.37
       2    0.23    0.15    0.03    0.41
pY(y)       0.33    0.44    0.23

Note: if X = 0, Y is much more likely to be 3 than 1; if X = 2, Y is much more likely to be 1 than 3. We therefore expect negative correlation.
E(X)=1.19
E(Y)=1.9
E(XY)=1.95
Cov(X,Y)=E(XY)-E(X) E(Y)
Cov(X,Y)=1.95-1.19(1.9)=-0.311
Example 3:
                      y
p(x, y)      1       2       3      pX(x)
       0    0.22    0.05    0.05    0.32
x      1    0.18    0.14    0.05    0.37
       2    0.07    0.10    0.14    0.31
pY(y)       0.47    0.29    0.24

Note: if X = 0, Y is much more likely to be 1 than 3; if X = 2, Y is more likely to be 3 than 1. We therefore expect positive correlation.
E(X) = 0.99
E(Y) = 1.77
E(XY) = 1.99
Cov(X, Y) = E(XY) − E(X) E(Y)
Cov(X, Y) = 1.99 − 0.99(1.77) = 0.2377

The covariance indicates the sign of the correlation between the variables:
If cov(X, Y) > 0, the correlation is positive.
If cov(X, Y) < 0, the correlation is negative.

Covariance is a measure of dependence between X and Y; it is zero when they are independent.
We say that X and Y are uncorrelated if cov(X, Y) = 0.
Result:
cov(X, Y) = E(XY) − E(X) E(Y)
Properties of covariance:
1. cov(X, Y) = cov(Y, X)
2. cov(X, X) = var(X)
3. cov(X + a, Y) = cov(X, Y)
4. cov(aX, Y) = a cov(X, Y)
5. cov(X, Y + Z) = cov(X, Y) + cov(X, Z)
6. var(X+Y) = var(X)+var(Y)+2cov(X,Y)
Property 6 generalizes to X1, …, Xn:
var(Σ_{i=1}^{n} Xi) = Σ_{i=1}^{n} var(Xi) + Σ_{i≠j} cov(Xi, Xj)

Properties for independent X and Y:
1. E(XY) = E(X) E(Y)
2. cov(X, Y) = 0
3. var(X + Y) = var(X) + var(Y)
Note 1: E(X + Y) = E(X) + E(Y) always holds, but in general E(XY) ≠ E(X) E(Y).
Note 2: independence implies no correlation, but not vice versa!
The value of the covariance:
- is positive if large values of X in general coincide with large values of Y, and vice versa;
- depends on the unit of measurement.
The correlation coefficient ρ(X, Y) does not depend on the unit of measurement:
Corr(X, Y) = ρ(X, Y) = Cov(X, Y) / √(Var(X) Var(Y)) = Cov(X, Y) / (σX σY)

Properties of ρ(X, Y):
1. −1 ≤ ρ(X, Y) ≤ 1
So ρ(X, Y) is a standardized measure of the linear dependence between X and Y.
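
Continuing with the Example 1 table, a small sketch that standardizes the covariance into ρ(X, Y) and confirms the value lies in [−1, 1]:

```python
import math

# Joint pmf of Example 1: table[x][y] = p(x, y)
table = {0: {1: 0.02, 2: 0.05, 3: 0.15},
         1: {1: 0.08, 2: 0.24, 3: 0.05},
         2: {1: 0.23, 2: 0.15, 3: 0.03}}
cells = [(x, y, p) for x, row in table.items() for y, p in row.items()]

E_X = sum(x * p for x, y, p in cells)
E_Y = sum(y * p for x, y, p in cells)
var_X = sum(x * x * p for x, y, p in cells) - E_X ** 2
var_Y = sum(y * y * p for x, y, p in cells) - E_Y ** 2
cov = sum(x * y * p for x, y, p in cells) - E_X * E_Y

rho = cov / math.sqrt(var_X * var_Y)  # unit-free correlation coefficient
print(round(rho, 3))                  # about -0.544: negative, and within [-1, 1]
```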

Joint distribution of two continuous random variables

If there is a function fX,Y(x, y) such that for every B ⊂ R²
P((X, Y) ∈ B) = ∬_{(x,y) ∈ B} fX,Y(x, y) dx dy,
then X and Y are jointly continuous random variables, and fX,Y(x, y) is called the joint probability density function.
As X and Y are continuous, the probability that (X, Y) falls in a rectangle is
P(a ≤ X ≤ b, c ≤ Y ≤ d) = ∫_c^d ∫_a^b f(x, y) dx dy

Requirements for a joint density function f(x, y):
1. f(x, y) ≥ 0, for all x and y
2. ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y) dx dy = 1
Notation: f(x, y) = fX,Y(x, y)

Example: Let X and Y have joint probability density function
f(x, y) = 4xy for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, and f(x, y) = 0 otherwise.

Marginal probability density functions:
Let X and Y be jointly continuous random variables. The marginal probability density functions of X and Y are
fX(x) = ∫_{−∞}^{∞} fX,Y(x, y) dy
and
fY(y) = ∫_{−∞}^{∞} fX,Y(x, y) dx

Conditional probability density function:
Let X and Y be continuous random variables. The conditional probability density function of X given Y is
fX|Y(x | y) = fX,Y(x, y) / fY(y),
wherever fY(y) > 0.
Independence of Two Random Variables:
Let X and Y be jointly continuous random variables. X and Y are independent if and only if
fX,Y(x, y) = fX(x) fY(y), for all x ∈ R and y ∈ R.

Example: Let X and Y have joint probability density function
f(x, y) = 4xy for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, and f(x, y) = 0 otherwise.

(a) Find P(X ≤ 1/2, Y ≤ 1/4).
P(X ≤ 1/2, Y ≤ 1/4) = ∫_0^{1/4} ∫_0^{1/2} 4xy dx dy
= ∫_0^{1/4} 4y [x²/2]_0^{1/2} dy
= ∫_0^{1/4} (y/2) dy
= (1/2) [y²/2]_0^{1/4} = 1/64
(b) Find the marginal probability density functions of X and Y.
fX(x) = ∫_0^1 4xy dy = 4x ∫_0^1 y dy = 4x [y²/2]_0^1 = 2x,  0 ≤ x ≤ 1
fY(y) = ∫_0^1 4xy dx = 4y ∫_0^1 x dx = 4y [x²/2]_0^1 = 2y,  0 ≤ y ≤ 1

(c) Find the conditional probability density function of X given Y = y, fX|Y(x | y).
fX|Y(x | y) = fX,Y(x, y) / fY(y) = 4xy / 2y = 2x,  0 ≤ x ≤ 1
(d) Are X and Y independent?
Check whether fX,Y(x, y) = fX(x) fY(y):
4xy = (2x)(2y) = 4xy
Then X and Y are independent.


For the expectation we can derive:
E[g(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f(x, y) dx dy
In particular:
E(XY) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x y f(x, y) dx dy
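
The integrals in this example can be checked numerically. A sketch assuming SciPy is installed; note that scipy.integrate.dblquad expects the integrand as a function of (y, x) and integrates the inner (y) variable first:

```python
from scipy.integrate import dblquad, quad

def f(x, y):
    return 4 * x * y  # joint density on the unit square

# (a) P(X <= 1/2, Y <= 1/4): x over [0, 1/2] (outer), y over [0, 1/4] (inner)
p, _ = dblquad(lambda y, x: f(x, y), 0, 0.5, lambda x: 0, lambda x: 0.25)
print(p, 1 / 64)  # both about 0.015625

# (b) marginal of X at a few points: fX(x) = integral of 4xy over y in [0, 1] = 2x
for x0 in (0.2, 0.5, 0.9):
    fx, _ = quad(lambda y: f(x0, y), 0, 1)
    print(fx, 2 * x0)

# E(XY) = 4/9, which equals E(X) E(Y) = (2/3)(2/3) since X and Y are independent
exy, _ = dblquad(lambda y, x: x * y * f(x, y), 0, 1, lambda x: 0, lambda x: 1)
print(exy, 4 / 9)
```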

Joint probability distribution of more than two random variables
Let X1, …, Xn be continuous random variables. The joint probability density function satisfies the following conditions:
1. f(x1, …, xn) ≥ 0 for −∞ < x1, …, xn < ∞;
2. ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} f(x1, …, xn) dx1 ⋯ dxn = 1.

The properties for two variables can be easily extended to 3 or more variables. For example, for n = 3 the marginal density of X1 is obtained by integrating out the other variables:
fX1(x1) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x1, x2, x3) dx2 dx3

Independence of Multiple Random Variables:
Let X1, …, Xn be random variables. X1, …, Xn are independent if and only if
f(x1, …, xn) = f1(x1) ⋯ fn(xn),
for all (x1, …, xn), where fj is the marginal probability distribution (or density) function of Xj.

Result
If X ~ N(µ1, σ12) and Y ~ N(µ2, σ22) and X and Y
are independent, then
X +Y ~ N(µ1+ µ2, σ12 + σ22)
Results:
- E(X1 + … + Xn) = E(X1) + … + E(Xn)
If X1, …, Xn are independent, then
- var(X1 + … + Xn) = var(X1) + … + var(Xn)
- E(∏_{i=1}^{n} Xi) = ∏_{i=1}^{n} E(Xi)
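
A quick Monte Carlo sanity check of these results, assuming NumPy is available; the particular parameter values are chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
mu1, sigma1 = 2.0, 3.0   # X ~ N(2, 9)
mu2, sigma2 = -1.0, 4.0  # Y ~ N(-1, 16), independent of X

n = 1_000_000
x = rng.normal(mu1, sigma1, n)
y = rng.normal(mu2, sigma2, n)
s = x + y

print(s.mean(), mu1 + mu2)                  # sample mean of X + Y vs mu1 + mu2 = 1
print(s.var(), sigma1**2 + sigma2**2)       # sample variance vs 9 + 16 = 25
print((x * y).mean(), x.mean() * y.mean())  # E(XY) close to E(X) E(Y) for independent X and Y
```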
