
2.5 Jointly Distributed Random Variables

2.5.1 Joint Probability Distributions


Instead of considering one random variable X and its probability distribution, it is often
appropriate to consider two random variables X and Y and their joint probability distribution.
If the random variables are discrete, then the joint probability mass function consists of
probability values $P(X = x_i, Y = y_j) = p_{ij} \geq 0$ satisfying
$$\sum_i \sum_j p_{ij} = 1$$

If the random variables are continuous, then the joint probability density function is a
function f (x, y) ≥ 0 satisfying

$$\iint_{\text{state space}} f(x, y)\, dx\, dy = 1$$

The probability that $a \leq X \leq b$ and $c \leq Y \leq d$ is obtained from the joint probability density function as
$$P(a \leq X \leq b,\ c \leq Y \leq d) = \int_{x=a}^{b} \int_{y=c}^{d} f(x, y)\, dy\, dx$$

The joint cumulative distribution function is defined to be


$$F(x, y) = P(X \leq x, Y \leq y)$$
which is
$$F(x, y) = \sum_{i:\, x_i \leq x} \; \sum_{j:\, y_j \leq y} p_{ij}$$

for discrete random variables and


$$F(x, y) = \int_{w=-\infty}^{x} \int_{z=-\infty}^{y} f(w, z)\, dz\, dw$$

for continuous random variables.

Joint Probability Distributions


The joint probability distribution of two random variables X and Y is specified by a set of probability values $P(X = x_i, Y = y_j) = p_{ij}$ for discrete random variables, or a joint probability density function $f(x, y)$ for continuous random variables. In either case, the joint cumulative distribution function is defined to be
$$F(x, y) = P(X \leq x, Y \leq y)$$

The following two examples illustrate jointly distributed random variables.

Example 19 (Air Conditioner Maintenance)  A company that services air conditioner units in residences and office blocks is interested in how to schedule its technicians in the most efficient manner. Specifically, the company is interested in how long a technician takes on a visit to a particular location, and the company recognizes that this mainly depends on the number of air conditioner units at the location that need to be serviced.
If the random variable X, taking the values 1, 2, 3, and 4, is the service time in hours taken
at a particular location, and the random variable Y , taking the values 1, 2, and 3, is the number
of air conditioner units at the location, then these two random variables can be thought of as
jointly distributed.
Suppose that their joint probability mass function pi j is given in Figure 2.58. The figure
indicates, for example, that there is a probability of 0.12 that X = 1 and Y = 1, so that there
is a probability of 0.12 that a particular location chosen at random has one air conditioner
unit that takes a technician one hour to service. Similarly, there is a probability of 0.07 that
a location has three air conditioner units that take four hours to service. Notice that this is a
valid probability mass function since

$$\sum_i \sum_j p_{ij} = 0.12 + 0.08 + \cdots + 0.07 = 1.00$$

FIGURE 2.58  Joint probability mass function for the air conditioner maintenance example
(rows: Y = number of air conditioner units; columns: X = service time in hours)

          X = 1   X = 2   X = 3   X = 4
  Y = 1    0.12    0.08    0.07    0.05
  Y = 2    0.08    0.15    0.21    0.13
  Y = 3    0.01    0.01    0.02    0.07


FIGURE 2.59  Joint cumulative distribution function for the air conditioner maintenance example
(rows: Y = number of air conditioner units; columns: X = service time in hours)

          X = 1   X = 2   X = 3   X = 4
  Y = 1    0.12    0.20    0.27    0.32
  Y = 2    0.20    0.43    0.71    0.89
  Y = 3    0.21    0.45    0.75    1.00

The joint cumulative distribution function


$$F(x, y) = P(X \leq x, Y \leq y) = \sum_{i=1}^{x} \sum_{j=1}^{y} p_{ij}$$

is given in Figure 2.59. For example, the probability that a location has no more than two air
conditioner units that take no more than two hours to service is

$$F(2, 2) = p_{11} + p_{12} + p_{21} + p_{22} = 0.12 + 0.08 + 0.08 + 0.15 = 0.43$$
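As a quick check of this bookkeeping, the cumulative distribution table in Figure 2.59 can be rebuilt from the probability mass function in Figure 2.58 with two cumulative sums. The following is a minimal sketch, assuming numpy is available; the array layout (rows for Y, columns for X) is a choice made here, not part of the text.

```python
import numpy as np

# Joint pmf p_ij from Figure 2.58 (rows: Y = 1, 2, 3; columns: X = 1, 2, 3, 4)
pmf = np.array([
    [0.12, 0.08, 0.07, 0.05],
    [0.08, 0.15, 0.21, 0.13],
    [0.01, 0.01, 0.02, 0.07],
])

# A valid pmf: all entries nonnegative and summing to 1
assert (pmf >= 0).all() and abs(pmf.sum() - 1.0) < 1e-9

# F(x, y) sums p_ij over x_i <= x and y_j <= y: cumulative sums along both axes
cdf = pmf.cumsum(axis=0).cumsum(axis=1)

print(round(cdf[1, 1], 2))   # F(2, 2) = 0.43, matching Figure 2.59
```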

Example 20 (Mineral Deposits)  In order to determine the economic viability of mining in a certain area, a mining company obtains samples of ore from the location and measures their zinc content and their iron content.
Suppose that the random variable X is the zinc content of the ore, taking values between 0.5
and 1.5, and that the random variable Y is the iron content of the ore, taking values between
20.0 and 35.0. Furthermore, suppose that their joint probability density function is

$$f(x, y) = \frac{39}{400} - \frac{17(x - 1)^2}{50} - \frac{(y - 25)^2}{10{,}000}$$
for 0.5 ≤ x ≤ 1.5 and 20.0 ≤ y ≤ 35.0.
The validity of this joint probability density function can be checked by ascertaining that
f (x, y) ≥ 0 within the state space 0.5 ≤ x ≤ 1.5 and 20.0 ≤ y ≤ 35.0, and that
$$\int_{x=0.5}^{1.5} \int_{y=20.0}^{35.0} f(x, y)\, dy\, dx = 1$$

The joint probability density function provides complete information about the joint prob-
abilistic properties of the random variables X and Y . For example, the probability that a
randomly chosen sample of ore has a zinc content between 0.8 and 1.0 and an iron content
between 25 and 30 is
$$\int_{x=0.8}^{1.0} \int_{y=25.0}^{30.0} f(x, y)\, dy\, dx$$

which can be calculated to be 0.092. Consequently only about 9% of the ore at the location
has mineral levels within these limits.
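These integrals are easy to verify numerically. The sketch below assumes scipy is available and uses scipy.integrate.dblquad, whose integrand takes the inner variable (y here) as its first argument; the function name f is our own.

```python
from scipy.integrate import dblquad

def f(y, x):
    """Joint pdf of zinc content x and iron content y on the state space."""
    return 39/400 - 17*(x - 1)**2/50 - (y - 25)**2/10_000

# Total probability over 0.5 <= x <= 1.5, 20.0 <= y <= 35.0 should be 1
total, _ = dblquad(f, 0.5, 1.5, 20.0, 35.0)

# P(0.8 <= X <= 1.0, 25.0 <= Y <= 30.0)
p, _ = dblquad(f, 0.8, 1.0, 25.0, 30.0)

print(round(total, 6), round(p, 3))   # 1.0  0.092
```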

2.5.2 Marginal Probability Distributions


Even though two random variables X and Y may be jointly distributed, if interest is focused on
only one of the random variables, then it is appropriate to consider the probability distribution
of that random variable alone. This is known as the marginal distribution of the random
variable and can be obtained quite simply by summing or integrating the joint probability
distribution over the values of the other random variable.
For example, for two discrete random variables X and Y , the probability values of the
marginal distribution of X are
$$P(X = x_i) = p_{i+} = \sum_j p_{ij}$$

and for two continuous random variables, the probability density function of the marginal
distribution of X is

$$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\, dy$$
where in practice the summation and integration limits can be curtailed at the appropriate
boundaries of the state space. Note that the marginal distribution of a random variable X
and the marginal distribution of a random variable Y do not uniquely determine their joint
distribution.

Marginal Probability Distributions


The marginal distribution of a random variable X is obtained from the joint
probability distribution of two random variables X and Y by summing or integrating
over the values of the random variable Y . The marginal distribution is the individual
probability distribution of the random variable X considered alone.

The expectations and variances of the random variables X and Y can be obtained from
their marginal distributions in the usual manner, as illustrated in the following examples.

Example 19 (Air Conditioner Maintenance)  The marginal probability mass function of X, the time taken to service the air conditioner units at a particular location, is given in Figure 2.60 and is obtained by summing the appropriate values of the joint probability mass function. For example,
$$P(X = 1) = \sum_{j=1}^{3} p_{1j} = 0.12 + 0.08 + 0.01 = 0.21$$

The expected service time is


$$E(X) = \sum_{i=1}^{4} i\, P(X = i) = (1 \times 0.21) + (2 \times 0.24) + (3 \times 0.30) + (4 \times 0.25) = 2.59$$
Since
$$E(X^2) = \sum_{i=1}^{4} i^2\, P(X = i) = (1 \times 0.21) + (4 \times 0.24) + (9 \times 0.30) + (16 \times 0.25) = 7.87$$

FIGURE 2.60  Marginal probability mass functions for the air conditioner maintenance example
(rows: Y = number of air conditioner units; columns: X = service time in hours; the final
column is the marginal distribution of Y and the final row is the marginal distribution of X)

          X = 1   X = 2   X = 3   X = 4   Marginal of Y
  Y = 1    0.12    0.08    0.07    0.05       0.32
  Y = 2    0.08    0.15    0.21    0.13       0.57
  Y = 3    0.01    0.01    0.02    0.07       0.11
  Marginal
  of X     0.21    0.24    0.30    0.25

FIGURE 2.61  Marginal probability mass function of the service time: a bar chart of the probabilities 0.21, 0.24, 0.30, and 0.25 for service times of 1, 2, 3, and 4 hours, with E(X) = 2.59 and σ = 1.08.

FIGURE 2.62  Marginal probability mass function of the number of air conditioner units: a bar chart of the probabilities 0.32, 0.57, and 0.11 for 1, 2, and 3 units, with E(Y) = 1.79 and σ = 0.62.

the variance in the service times is


$$\text{Var}(X) = E(X^2) - (E(X))^2 = 7.87 - 2.59^2 = 1.162$$

The standard deviation is therefore $\sigma = \sqrt{1.162} = 1.08$ hours, or about 65 minutes, as indicated in Figure 2.61.
The marginal probability mass function of Y , the number of air conditioner units at a
particular location, is given in Figure 2.62. Here,
$$P(Y = 1) = \sum_{i=1}^{4} p_{i1} = 0.12 + 0.08 + 0.07 + 0.05 = 0.32$$

FIGURE 2.63  Marginal probability density function of the zinc content, $f_X(x) = \frac{57}{40} - \frac{51(x-1)^2}{10}$ on $0.5 \leq x \leq 1.5$, with E(X) = 1.0 and σ = 0.23.

for example. The expected number of air conditioner units can be calculated to be E(Y ) =
1.79, and the standard deviation is σ = 0.62.
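The row and column sums behind Figures 2.60–2.62 can be reproduced with a few lines of Python. This is a minimal sketch assuming numpy; the pmf array layout matches the earlier sketch.

```python
import numpy as np

pmf = np.array([
    [0.12, 0.08, 0.07, 0.05],
    [0.08, 0.15, 0.21, 0.13],
    [0.01, 0.01, 0.02, 0.07],
])
x_vals = np.array([1, 2, 3, 4])   # service time (hours)
y_vals = np.array([1, 2, 3])      # number of units

marginal_x = pmf.sum(axis=0)      # [0.21, 0.24, 0.30, 0.25], bottom row of Figure 2.60
marginal_y = pmf.sum(axis=1)      # [0.32, 0.57, 0.11], right column of Figure 2.60

E_x = (x_vals * marginal_x).sum()                          # 2.59
sd_x = np.sqrt((x_vals**2 * marginal_x).sum() - E_x**2)    # about 1.08
E_y = (y_vals * marginal_y).sum()                          # 1.79
sd_y = np.sqrt((y_vals**2 * marginal_y).sum() - E_y**2)    # about 0.62
print(round(E_x, 2), round(sd_x, 2), round(E_y, 2), round(sd_y, 2))
```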

Example 20 (Mineral Deposits)  The marginal probability density function of X, the zinc content of the ore, is
$$f_X(x) = \int_{y=20.0}^{35.0} f(x, y)\, dy = \int_{y=20.0}^{35.0} \left( \frac{39}{400} - \frac{17(x-1)^2}{50} - \frac{(y-25)^2}{10{,}000} \right) dy$$
$$= \left[ \frac{39y}{400} - \frac{17y(x-1)^2}{50} - \frac{(y-25)^3}{30{,}000} \right]_{y=20.0}^{y=35.0} = \frac{57}{40} - \frac{51(x-1)^2}{10}$$
for 0.5 ≤ x ≤ 1.5. This is shown in Figure 2.63, and since it is symmetric about the point
x = 1, the expected zinc content is E(X ) = 1. The variance of the zinc content is
$$\text{Var}(X) = E((X - E(X))^2) = \int_{0.5}^{1.5} (x-1)^2 f_X(x)\, dx = \int_{0.5}^{1.5} (x-1)^2 \left( \frac{57}{40} - \frac{51(x-1)^2}{10} \right) dx$$
$$= \left[ \frac{19(x-1)^3}{40} - \frac{51(x-1)^5}{50} \right]_{0.5}^{1.5} = [0.0275] - [-0.0275] = 0.055$$

and the standard deviation is therefore $\sigma = \sqrt{0.055} = 0.23$.
The probability that a sample of ore has a zinc content between 0.8 and 1.0 can be calculated
from the marginal probability density function to be
$$P(0.8 \leq X \leq 1.0) = \int_{0.8}^{1.0} f_X(x)\, dx = \int_{0.8}^{1.0} \left( \frac{57}{40} - \frac{51(x-1)^2}{10} \right) dx$$
$$= \left[ \frac{57x}{40} - \frac{17(x-1)^3}{10} \right]_{0.8}^{1.0} = [1.425] - [1.1536] = 0.2714$$
Consequently about 27% of the ore has a zinc content within these limits.
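The marginal density, the variance, and this probability can all be checked symbolically. The sketch below assumes sympy is available; it reproduces $f_X(x)$, Var(X) = 0.055, and P(0.8 ≤ X ≤ 1.0) ≈ 0.2714.

```python
import sympy as sp

x, y = sp.symbols('x y')
f = sp.Rational(39, 400) - sp.Rational(17, 50)*(x - 1)**2 - (y - 25)**2/sp.Integer(10_000)

# Marginal density of the zinc content: equals 57/40 - 51*(x - 1)**2/10
f_X = sp.integrate(f, (y, 20, 35))

# Variance about the symmetric mean E(X) = 1
var_X = sp.integrate((x - 1)**2 * f_X, (x, sp.Rational(1, 2), sp.Rational(3, 2)))

# P(0.8 <= X <= 1.0)
prob = sp.integrate(f_X, (x, sp.Rational(4, 5), 1))

print(var_X, float(prob))   # 11/200 (= 0.055)  0.2714
```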

FIGURE 2.64  Marginal probability density function of the iron content, $f_Y(y) = \frac{83}{1200} - \frac{(y-25)^2}{10{,}000}$ on $20.0 \leq y \leq 35.0$, with E(Y) = 27.36 and σ = 4.27.

The marginal probability density function of Y , the iron content of the ore, is
$$f_Y(y) = \int_{x=0.5}^{1.5} f(x, y)\, dx = \int_{x=0.5}^{1.5} \left( \frac{39}{400} - \frac{17(x-1)^2}{50} - \frac{(y-25)^2}{10{,}000} \right) dx$$
$$= \left[ \frac{39x}{400} - \frac{17(x-1)^3}{150} - \frac{x(y-25)^2}{10{,}000} \right]_{x=0.5}^{x=1.5} = \frac{83}{1200} - \frac{(y-25)^2}{10{,}000}$$
for 20.0 ≤ y ≤ 35.0. This is shown in Figure 2.64 together with the expected iron content
and the standard deviation of the iron content, which can be calculated to be E(Y ) = 27.36
and σ = 4.27.

2.5.3 Conditional Probability Distributions


If two random variables X and Y are jointly distributed, then it is sometimes useful to consider
the distribution of one random variable conditional on the other random variable having taken
a particular value. Conditional probabilities were discussed in Section 1.4, and they allow
probabilities, or more generally random variable distributions, to be revised following the
observation of a certain event.
If two discrete random variables X and Y are jointly distributed, then the conditional distribution of the random variable X conditional on the event $Y = y_j$ consists of the probability values
$$p_{i|Y=y_j} = P(X = x_i \mid Y = y_j) = \frac{P(X = x_i, Y = y_j)}{P(Y = y_j)} = \frac{p_{ij}}{p_{+j}}$$
where $p_{+j} = P(Y = y_j) = \sum_i p_{ij}$. If two continuous random variables X and Y are jointly distributed, then the conditional distribution of the random variable X conditional on the event Y = y has the probability density function
$$f_{X|Y=y}(x) = \frac{f(x, y)}{f_Y(y)}$$
where the denominator $f_Y(y)$ is the marginal distribution of the random variable Y. Conditional expectations and variances can be calculated in the usual manner from these conditional distributions.

Conditional Probability Distributions


The conditional distribution of a random variable X conditional on a random
variable Y taking a particular value summarizes the probabilistic properties of the
random variable X under the knowledge provided by the value of Y . It consists of the
probability values
$$p_{i|Y=y_j} = P(X = x_i \mid Y = y_j) = \frac{P(X = x_i, Y = y_j)}{P(Y = y_j)} = \frac{p_{ij}}{p_{+j}}$$
for discrete random variables or the probability density function
$$f_{X|Y=y}(x) = \frac{f(x, y)}{f_Y(y)}$$
for continuous random variables, where $f_Y(y)$ is the marginal distribution of the random variable Y.

It is important to recognize the difference between a marginal distribution and a conditional distribution. The marginal distribution for X is the appropriate distribution for the
random variable X when nothing is known about the random variable Y . In contrast, the condi-
tional distribution for X conditional on a particular value y of Y is the appropriate distribution
for the random variable X when the random variable Y is known to take the value y. This
difference is illustrated in the following examples.

Example 19 (Air Conditioner Maintenance)  Suppose that a technician is visiting a location that is known to have three air conditioner units, an event that has a probability of
$$P(Y = 3) = p_{+3} = 0.01 + 0.01 + 0.02 + 0.07 = 0.11$$
The conditional distribution of the service time X consists of the probability values
$$p_{1|Y=3} = P(X = 1 \mid Y = 3) = \frac{p_{13}}{p_{+3}} = \frac{0.01}{0.11} = 0.091$$
$$p_{2|Y=3} = P(X = 2 \mid Y = 3) = \frac{p_{23}}{p_{+3}} = \frac{0.01}{0.11} = 0.091$$
$$p_{3|Y=3} = P(X = 3 \mid Y = 3) = \frac{p_{33}}{p_{+3}} = \frac{0.02}{0.11} = 0.182$$
$$p_{4|Y=3} = P(X = 4 \mid Y = 3) = \frac{p_{43}}{p_{+3}} = \frac{0.07}{0.11} = 0.636$$
These values are shown in Figure 2.65, and they are clearly different from the marginal
distribution of the service time given in Figure 2.61. Conditioning on a location having three
air conditioner units increases the chances of a large service time being required.
The conditional expectation of the service time is
$$E(X \mid Y = 3) = \sum_{i=1}^{4} i\, p_{i|Y=3} = (1 \times 0.091) + (2 \times 0.091) + (3 \times 0.182) + (4 \times 0.636) = 3.36$$
which, as expected, is considerably larger than the “overall” expected service time of 2.59
hours. The difference between these expected values can be interpreted in the following way.

FIGURE 2.65  Conditional probability mass function of the service time when Y = 3: a bar chart of the probabilities 0.091, 0.091, 0.182, and 0.636 for service times of 1, 2, 3, and 4 hours, with E(X | Y = 3) = 3.36.

FIGURE 2.66  Conditional probability density function of the iron content when X = 0.55, $f_{Y|X=0.55}(y) = 0.073 - \frac{(y-25)^2}{3922.5}$, with E(Y | X = 0.55) = 27.14 and σ = 4.14.

If a technician sets off for a location for which the number of air conditioner units is not
known, then the expected service time at the location is 2.59 hours. However, if the technician
knows that there are three air conditioner units at the location that need servicing, then the
expected service time is 3.36 hours.
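Numerically, the conditioning step is just a renormalization of one row of the joint pmf. A minimal sketch, assuming numpy and the same array layout as before:

```python
import numpy as np

pmf = np.array([
    [0.12, 0.08, 0.07, 0.05],
    [0.08, 0.15, 0.21, 0.13],
    [0.01, 0.01, 0.02, 0.07],
])
x_vals = np.array([1, 2, 3, 4])

p_y3 = pmf[2].sum()              # P(Y = 3) = 0.11
cond = pmf[2] / p_y3             # p_{i|Y=3}: [0.091, 0.091, 0.182, 0.636]
E_cond = (x_vals * cond).sum()   # conditional expectation E(X | Y = 3)

print(cond.round(3), round(E_cond, 2))   # [0.091 0.091 0.182 0.636] 3.36
```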

Example 20 (Mineral Deposits)  Suppose that a sample of ore has a zinc content of X = 0.55. What is known about its iron content? The information about the iron content Y is summarized in the conditional probability density function for the iron content, which is
$$f_{Y|X=0.55}(y) = \frac{f(0.55, y)}{f_X(0.55)}$$
where the denominator is the marginal distribution of the zinc content X evaluated at 0.55.
Since
$$f_X(0.55) = \frac{57}{40} - \frac{51(0.55 - 1.00)^2}{10} = 0.39225$$
the conditional probability density function is
$$f_{Y|X=0.55}(y) = \frac{f(0.55, y)}{0.39225} = \frac{39}{400 \times 0.39225} - \frac{17(0.55 - 1.00)^2}{50 \times 0.39225} - \frac{(y - 25)^2}{10{,}000 \times 0.39225}$$
$$= 0.073 - \frac{(y - 25)^2}{3922.5}$$
for 20.0 ≤ y ≤ 35.0. This is shown in Figure 2.66 with the conditional expectation of the
iron content, which can be calculated to be 27.14, and with the conditional standard deviation,
which is 4.14.
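The same conditioning works numerically: divide the joint density by the marginal evaluated at x = 0.55. The sketch below assumes scipy and checks that the conditional density integrates to 1 with mean approximately 27.14.

```python
from scipy.integrate import quad

def f(x, y):
    """Joint pdf of zinc content x and iron content y."""
    return 39/400 - 17*(x - 1)**2/50 - (y - 25)**2/10_000

# Marginal density of the zinc content evaluated at x = 0.55
f_X_055, _ = quad(lambda y: f(0.55, y), 20.0, 35.0)   # 0.39225

def f_cond(y):
    """Conditional density f_{Y|X=0.55}(y) on 20.0 <= y <= 35.0."""
    return f(0.55, y) / f_X_055

total, _ = quad(f_cond, 20.0, 35.0)
mean, _ = quad(lambda y: y * f_cond(y), 20.0, 35.0)
print(round(total, 6), round(mean, 2))   # 1.0  27.14
```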

2.5.4 Independence and Covariance


In the same way that two events A and B were said to be independent in Chapter 1 if they
are “unrelated” to each other, two random variables X and Y are said to be independent if
the value taken by one random variable is “unrelated” to the value taken by the other random
variable. More specifically, the random variables are independent if the distribution of one of
the random variables does not depend upon the value taken by the other random variable.

Independent Random Variables


Two random variables X and Y are defined to be independent if their joint probability
mass function or joint probability density function is the product of their two marginal
distributions. If the random variables are discrete, then they are independent if
$$p_{ij} = p_{i+} p_{+j}$$
for all values of $x_i$ and $y_j$. If the random variables are continuous, then they are independent if
$$f(x, y) = f_X(x) f_Y(y)$$
for all values of x and y. If two random variables are independent, then the probability
distribution of one of the random variables does not depend upon the value taken by
the other random variable.

Notice that if the random variables X and Y are independent, then their conditional
distributions are identical to their marginal distributions. If the random variables are discrete,
this is because
$$p_{i|Y=y_j} = \frac{p_{ij}}{p_{+j}} = \frac{p_{i+} p_{+j}}{p_{+j}} = p_{i+}$$
and if the random variables are continuous, this is because
$$f_{X|Y=y}(x) = \frac{f(x, y)}{f_Y(y)} = \frac{f_X(x) f_Y(y)}{f_Y(y)} = f_X(x)$$
In either case, the conditional distributions do not depend upon the value conditioned upon, and
they are equal to the marginal distributions. This result has the interpretation that knowledge
of the value taken by the random variable Y does not influence the distribution of the random
variable X , and vice versa.
As a simple example of two independent random variables, suppose that X and Y have a
joint probability density function of
$$f(x, y) = 6xy^2$$
for 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1 and f (x, y) = 0 elsewhere. The fact that this joint density
function is a function of x multiplied by a function of y (and that the state spaces of the random
variables [0, 1] do not depend upon each other) immediately indicates that the two random
variables are independent. Specifically, the marginal distribution of X is
$$f_X(x) = \int_{y=0}^{1} 6xy^2\, dy = 2x$$
for 0 ≤ x ≤ 1, and the marginal distribution of Y is
$$f_Y(y) = \int_{x=0}^{1} 6xy^2\, dx = 3y^2$$

FIGURE 2.67  Joint probability mass function and marginal probability mass functions for X and Y in the coin-tossing game

          X = 0   X = 1   X = 2   Marginal of Y
  Y = 0    1/8     1/4     1/8        1/2
  Y = 1    1/8     1/4     1/8        1/2
  Marginal
  of X     1/4     1/2     1/4

FIGURE 2.68  Joint probability mass function and marginal probability mass functions for X and Z in the coin-tossing game

          X = 0   X = 1   X = 2   Marginal of Z
  Z = 0    1/8     1/8      0         1/4
  Z = 1    1/8     1/4     1/8        1/2
  Z = 2     0      1/8     1/8        1/4
  Marginal
  of X     1/4     1/2     1/4

for $0 \leq y \leq 1$. The fact that $f(x, y) = f_X(x) f_Y(y)$ confirms that the random variables X and Y are independent.
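The factorization can be confirmed symbolically. A short sketch, assuming sympy is available:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = 6*x*y**2    # joint pdf on 0 <= x <= 1, 0 <= y <= 1

f_X = sp.integrate(f, (y, 0, 1))   # marginal of X: 2*x
f_Y = sp.integrate(f, (x, 0, 1))   # marginal of Y: 3*y**2

# Independence: the joint pdf equals the product of the marginals
print(f_X, f_Y, sp.simplify(f - f_X * f_Y) == 0)   # 2*x  3*y**2  True
```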

GAMES OF CHANCE Suppose that a fair coin is tossed three times so that there are eight equally likely outcomes,
and that the random variable X is the number of heads obtained in the first and second tosses,
the random variable Y is the number of heads in the third toss, and the random variable Z is
the number of heads obtained in the second and third tosses.
The joint probability mass function of X and Y is given in Figure 2.67 together with the
marginal distributions of X and Y. For example, $P(X = 0, Y = 0) = P(TTT) = 1/8$ and $P(X = 0) = P(TTT) + P(TTH) = 1/4$. It is easy to check that
$$P(X = i, Y = j) = P(X = i)\, P(Y = j)$$
for all values of i = 0, 1, 2 and j = 0, 1, so that the joint probability mass function is equal
to the product of the two marginal probability mass functions. Consequently, the random
variables X and Y are independent, which is not surprising since the outcome of the third coin
toss should be unrelated to the outcomes of the first two coin tosses.
Figure 2.68 shows the joint probability mass function of X and Z together with the marginal distributions of X and Z. For example, $P(X = 1, Z = 1) = P(HTH) + P(THT) = 1/4$.
Notice, however, that
$$P(X = 0, Z = 0) = P(TTT) = \frac{1}{8}$$
$$P(X = 0) = P(TTH) + P(TTT) = \frac{1}{4}$$
and
$$P(Z = 0) = P(HTT) + P(TTT) = \frac{1}{4}$$

so that
$$P(X = 0, Z = 0) \neq P(X = 0)\, P(Z = 0)$$
This result indicates that the random variables X and Z are not independent. In fact, their
dependence is a result of their both depending upon the result of the second coin toss.
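Because the sample space has only eight outcomes, these joint pmfs can be rebuilt by brute-force enumeration. The sketch below uses only the Python standard library; the helper joint_pmf and the lambdas for X, Y, and Z are our own names.

```python
from itertools import product
from fractions import Fraction

outcomes = list(product('HT', repeat=3))   # the 8 equally likely outcomes
p = Fraction(1, 8)

X = lambda w: (w[0] == 'H') + (w[1] == 'H')   # heads in tosses 1 and 2
Y = lambda w: int(w[2] == 'H')                # heads in toss 3
Z = lambda w: (w[1] == 'H') + (w[2] == 'H')   # heads in tosses 2 and 3

def joint_pmf(f, g):
    """Joint pmf of two functions of the three-toss outcome."""
    pmf = {}
    for w in outcomes:
        key = (f(w), g(w))
        pmf[key] = pmf.get(key, Fraction(0)) + p
    return pmf

xy = joint_pmf(X, Y)
xz = joint_pmf(X, Z)
print(xy[(0, 0)])   # 1/8, which equals P(X=0)P(Y=0) = 1/4 * 1/2
print(xz[(0, 0)])   # 1/8, which differs from P(X=0)P(Z=0) = 1/16
```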

The strength of the dependence of two random variables on each other is indicated by
their covariance, which is defined to be
$$\text{Cov}(X, Y) = E((X - E(X))(Y - E(Y)))$$
The covariance can be any positive or negative number, and independent random variables
have a covariance of zero. It is often convenient to calculate the covariance from an alternative
expression
$$\text{Cov}(X, Y) = E((X - E(X))(Y - E(Y))) = E(XY - X E(Y) - E(X) Y + E(X) E(Y))$$
$$= E(XY) - E(X) E(Y) - E(X) E(Y) + E(X) E(Y) = E(XY) - E(X) E(Y)$$

Covariance
The covariance of two random variables X and Y is defined to be
$$\text{Cov}(X, Y) = E((X - E(X))(Y - E(Y))) = E(XY) - E(X) E(Y)$$
The covariance can be any positive or negative number, and independent random
variables have a covariance of 0.

In practice, the most convenient way to assess the strength of the dependence between
two random variables is through their correlation.

Correlation
The correlation between two random variables X and Y is defined to be
$$\text{Corr}(X, Y) = \frac{\text{Cov}(X, Y)}{\sqrt{\text{Var}(X)\, \text{Var}(Y)}}$$
The correlation takes values between −1 and 1, and independent random variables
have a correlation of 0.

Random variables with a positive correlation are said to be positively correlated, and in
such cases there is a tendency for high values of one random variable to be associated with
high values of the other random variable. Random variables with a negative correlation are
said to be negatively correlated, and in such cases there is a tendency for high values of one
random variable to be associated with low values of the other random variable. The strength
of these tendencies increases as the correlation moves further away from 0 to 1 or to −1.
As an illustration of the calculation of a covariance, consider again the simple example
where
$$f(x, y) = 6xy^2$$

for $0 \leq x \leq 1$ and $0 \leq y \leq 1$. The expectation of the random variable X can be calculated from its marginal distribution to be
$$E(X) = \int_{x=0}^{1} x f_X(x)\, dx = \int_{x=0}^{1} 2x^2\, dx = \frac{2}{3}$$
and similarly
$$E(Y) = \int_{y=0}^{1} y f_Y(y)\, dy = \int_{y=0}^{1} 3y^3\, dy = \frac{3}{4}$$
Also,
$$E(XY) = \int_{x=0}^{1} \int_{y=0}^{1} xy f(x, y)\, dy\, dx = \int_{x=0}^{1} \int_{y=0}^{1} 6x^2 y^3\, dy\, dx = \frac{1}{2}$$
so that
$$\text{Cov}(X, Y) = E(XY) - E(X) E(Y) = \frac{1}{2} - \frac{2}{3} \times \frac{3}{4} = 0$$
This result is expected since the random variables X and Y were shown to be independent.
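These three expectations can be verified with symbolic integration. A sketch assuming sympy:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = 6*x*y**2    # joint pdf on the unit square

E_X = sp.integrate(x*f, (x, 0, 1), (y, 0, 1))      # 2/3
E_Y = sp.integrate(y*f, (x, 0, 1), (y, 0, 1))      # 3/4
E_XY = sp.integrate(x*y*f, (x, 0, 1), (y, 0, 1))   # 1/2

print(E_X, E_Y, E_XY, E_XY - E_X*E_Y)   # 2/3 3/4 1/2 0
```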

Example 19 (Air Conditioner Maintenance)  The expected service time is E(X) = 2.59 hours, and the expected number of units serviced is E(Y) = 1.79. In addition,
$$E(XY) = \sum_{i=1}^{4} \sum_{j=1}^{3} i j\, p_{ij} = (1 \times 1 \times 0.12) + (1 \times 2 \times 0.08) + \cdots + (4 \times 3 \times 0.07) = 4.86$$

so that the covariance is

$$\text{Cov}(X, Y) = E(XY) - E(X) E(Y) = 4.86 - (2.59 \times 1.79) = 0.224$$

Since Var(X ) = 1.162 and Var(Y ) = 0.384, the correlation between the service time and the
number of units serviced is
$$\text{Corr}(X, Y) = \frac{\text{Cov}(X, Y)}{\sqrt{\text{Var}(X)\, \text{Var}(Y)}} = \frac{0.224}{\sqrt{1.162 \times 0.384}} = 0.34$$
As expected, the service time and the number of units serviced are not independent but are
positively correlated. This makes sense because there is a tendency for locations with a large
number of air conditioner units to require relatively long service times.
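The same calculation can be carried out directly on the pmf array. A minimal sketch assuming numpy; note that the exact Var(Y) is 0.3859, which the text rounds to 0.384, so the computed correlation comes out near 0.33 against the text's rounded 0.34.

```python
import numpy as np

pmf = np.array([
    [0.12, 0.08, 0.07, 0.05],
    [0.08, 0.15, 0.21, 0.13],
    [0.01, 0.01, 0.02, 0.07],
])
x_vals = np.array([1, 2, 3, 4])
y_vals = np.array([1, 2, 3])

E_x = (x_vals * pmf.sum(axis=0)).sum()                   # 2.59
E_y = (y_vals * pmf.sum(axis=1)).sum()                   # 1.79
E_xy = (np.outer(y_vals, x_vals) * pmf).sum()            # 4.86

cov = E_xy - E_x * E_y                                   # about 0.224
var_x = (x_vals**2 * pmf.sum(axis=0)).sum() - E_x**2     # 1.162
var_y = (y_vals**2 * pmf.sum(axis=1)).sum() - E_y**2     # 0.386 (text: 0.384)
corr = cov / np.sqrt(var_x * var_y)
print(round(cov, 3), round(corr, 2))                     # 0.224  0.33
```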

GAMES OF CHANCE Consider again the tossing of three coins and the random variables X , Y , and Z. It is easy to
check that E(X) = 1 and E(Y ) = 1/2, and that
$$E(XY) = \sum_{i=0}^{2} \sum_{j=0}^{1} i j\, p_{ij} = \left(1 \times 1 \times \frac{1}{4}\right) + \left(2 \times 1 \times \frac{1}{8}\right) = \frac{1}{2}$$

Consequently,
$$\text{Cov}(X, Y) = E(XY) - E(X) E(Y) = \frac{1}{2} - \left(1 \times \frac{1}{2}\right) = 0$$
which is expected because the random variables X and Y are independent.
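This zero covariance, and the nonzero covariance between X and Z, can be checked by enumeration. The following sketch uses the standard library only; the Cov(X, Z) = 1/4 check is an extra illustration not computed in the text.

```python
from itertools import product
from fractions import Fraction

outcomes = list(product('HT', repeat=3))
p = Fraction(1, 8)

X = lambda w: (w[0] == 'H') + (w[1] == 'H')
Y = lambda w: int(w[2] == 'H')
Z = lambda w: (w[1] == 'H') + (w[2] == 'H')

def E(g):
    """Expectation of a function of the three-toss outcome."""
    return sum(p * g(w) for w in outcomes)

cov_xy = E(lambda w: X(w) * Y(w)) - E(X) * E(Y)   # 0: X and Y are independent
cov_xz = E(lambda w: X(w) * Z(w)) - E(X) * E(Z)   # 1/4: X and Z are dependent
print(cov_xy, cov_xz)
```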
