Random Variables and Distributions

2.1. Random objects and random variables

Definition. A random object is a measurable function

$\tilde{x} : (\Omega, \mathcal{F}) \longrightarrow (\Omega', \mathcal{F}')$, where $\Omega$ is a sample space and $\mathcal{F}$ is the $\sigma$-algebra of events.

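Definition. A (real-valued) random variable is a measurable function

$\tilde{x} : (\Omega, \mathcal{F}) \longrightarrow (\mathbb{R}, \mathcal{B})$, where $\Omega$ is a sample space and $\mathcal{F}$ is the $\sigma$-algebra of events.

Similarly, when $\Omega$ is a sample space and $\mathcal{F}$ is the $\sigma$-algebra of events,

$\tilde{x} : (\Omega, \mathcal{F}) \longrightarrow (\overline{\mathbb{R}}, \overline{\mathcal{B}})$ is an extended (real-valued) random variable.

$\tilde{x} : (\Omega, \mathcal{F}) \longrightarrow (\mathbb{R}^n, \mathcal{B})$ is a (real-valued) random vector or a (real-valued) multivariate random variable.

$\tilde{x} : (\Omega, \mathcal{F}) \longrightarrow (\overline{\mathbb{R}}^n, \overline{\mathcal{B}})$ is an extended (real-valued) random vector or an extended (real-valued) multivariate random variable.

A random vector is just a vector of random variables:

$$\tilde{x} = (\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n).$$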

2.2. Probability distributions
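Let $(\Omega, \mathcal{F}, P)$ be a probability space.

Definition. The probability distribution (or distribution) of a random object $\tilde{x} : (\Omega, \mathcal{F}, P) \longrightarrow (\Omega', \mathcal{F}')$ is a probability measure $P_{\tilde{x}}$ on $(\Omega', \mathcal{F}')$ defined by

$$P_{\tilde{x}}(B) = P\left(\tilde{x}^{-1}(B)\right) \quad \text{for all } B \in \mathcal{F}',$$

$$P_{\tilde{x}}(B) = P\{\omega \in \Omega \mid \tilde{x}(\omega) \in B\} = P\{\tilde{x} \in B\} \quad \text{for all } B \in \mathcal{F}'.$$

In integral form,

$$P_{\tilde{x}}(B) = \int_B 1\, dP_{\tilde{x}} \equiv \int_B dP_{\tilde{x}} \equiv \int_{\Omega'} I_B(x)\, dP_{\tilde{x}}(x) \quad \text{for all } B \in \mathcal{F}',$$

$$P_{\tilde{x}}(B) = \int_{\tilde{x}^{-1}(B)} 1\, dP \equiv \int_{\tilde{x}^{-1}(B)} dP \equiv \int_{\Omega} I_{\tilde{x}^{-1}(B)}(\omega)\, dP(\omega) \quad \text{for all } B \in \mathcal{F}'.$$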

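Example: We roll a balanced die, $\Omega = \{1, 2, 3, 4, 5, 6\}$, and consider the random variable $\tilde{x} : (\Omega, 2^{\Omega}, P) \longrightarrow (\mathbb{R}, \mathcal{B})$ defined as

$$\tilde{x}(\omega) = \begin{cases} 1 & \text{if } \omega = 1, 2, 3, 4 \\ 7 & \text{if } \omega = 5, 6. \end{cases}$$

The induced probability $P_{\tilde{x}}$ on $(\mathbb{R}, \mathcal{B})$ (or distribution of $\tilde{x}$) satisfies

$$\begin{aligned}
P_{\tilde{x}}\{1\} &= P\{1, 2, 3, 4\} = 2/3, & P_{\tilde{x}}\{7\} &= P\{5, 6\} = 1/3, \\
P_{\tilde{x}}\{12\} &= P(\emptyset) = 0, & P_{\tilde{x}}(-3, 1) &= P(\emptyset) = 0, \\
P_{\tilde{x}}[-3, 1] &= P\{1, 2, 3, 4\} = 2/3, & P_{\tilde{x}}[5, 8] &= P_{\tilde{x}}\{7\} = 1/3, \\
P_{\tilde{x}}[\pi, \sqrt{13}] &= P(\emptyset) = 0, & P_{\tilde{x}}(-\infty, 12] &= P(\Omega) = 1, \\
P_{\tilde{x}}[10, \infty) &= P(\emptyset) = 0, & P_{\tilde{x}}(1, \infty) &= P\{5, 6\} = 1/3, \\
P_{\tilde{x}}(-\infty, 2] &= P\{1, 2, 3, 4\} = 2/3, & &\text{etc.}
\end{aligned}$$

Moreover, using the properties of probability, we obtain the distribution for all Borel sets in $\mathbb{R}$.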
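The induced probabilities in the die example can be checked mechanically. The following is a minimal sketch, not part of the original notes: the helper name `induced_prob` is hypothetical, and a Borel set $B$ is represented by a membership test.

```python
from fractions import Fraction

# Sample space of a balanced die; each outcome has probability 1/6.
omega = [1, 2, 3, 4, 5, 6]
P = {w: Fraction(1, 6) for w in omega}

def x(w):
    """The random variable of the example: 1 on {1,2,3,4}, 7 on {5,6}."""
    return 1 if w <= 4 else 7

def induced_prob(in_B):
    """P_x(B) = P{w : x(w) in B}; B is passed as a membership test in_B."""
    return sum((P[w] for w in omega if in_B(x(w))), Fraction(0))

print(induced_prob(lambda v: v == 1))        # 2/3
print(induced_prob(lambda v: -3 < v < 1))    # 0
print(induced_prob(lambda v: -3 <= v <= 1))  # 2/3
print(induced_prob(lambda v: v > 1))         # 1/3
```

Exact rational arithmetic is used so that the results can be compared directly with the fractions listed in the example.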
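Definition. The support $\operatorname{supp}(P_{\tilde{x}})$ of the distribution of the random vector $\tilde{x} : (\Omega, \mathcal{F}) \longrightarrow (\mathbb{R}^n, \mathcal{B})$ is the smallest closed subset of $\mathbb{R}^n$ whose complement has probability zero under $P_{\tilde{x}}$,

$$P_{\tilde{x}}\{[\operatorname{supp}(P_{\tilde{x}})]^c\} = 0.$$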

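Definition. Two random objects $\tilde{x}$ and $\tilde{y}$ defined on $(\Omega, \mathcal{F}, P)$ and taking values in $(\Omega', \mathcal{F}')$ are equivalent (or equal) in distribution ($\tilde{x} \overset{d}{=} \tilde{y}$) if they have the same distribution, $P_{\tilde{x}} = P_{\tilde{y}}$.

Example: We toss a balanced coin, $\Omega = \{H, T\}$, and consider $\tilde{x} : (\Omega, 2^{\Omega}, P) \longrightarrow (\mathbb{R}, \mathcal{B})$ and $\tilde{y} : (\Omega, 2^{\Omega}, P) \longrightarrow (\mathbb{R}, \mathcal{B})$ defined as

$$\tilde{x}(\omega) = \begin{cases} -1 & \text{if } \omega = H \\ 1 & \text{if } \omega = T \end{cases} \qquad \text{and} \qquad \tilde{y}(\omega) = \begin{cases} -1 & \text{if } \omega = T \\ 1 & \text{if } \omega = H. \end{cases}$$

Both variables take the values $-1$ and $1$ with probability $1/2$ each. Thus, $\tilde{x} \overset{d}{=} \tilde{y}$.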

An event $A$ is sure if $A = \Omega$.

An event A is almost sure (a.s.) if P (A) = 1.

An event A is negligible if P (A) = 0.

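Definition. We say that two random objects defined on $(\Omega, \mathcal{F}, P)$ and taking values in $(\Omega', \mathcal{F}')$ are equal, $\tilde{x} = \tilde{y}$, if $\tilde{x}(\omega) = \tilde{y}(\omega)$ for all $\omega \in \Omega$.

Definition. We say that two random objects defined on $(\Omega, \mathcal{F}, P)$ and taking values in $(\Omega', \mathcal{F}')$ are equal almost surely (a.s.), $\tilde{x} \overset{a.s.}{=} \tilde{y}$, if

$$P\{\tilde{x} = \tilde{y}\} = P\{\omega \in \Omega \mid \tilde{x}(\omega) = \tilde{y}(\omega)\} = 1,$$

or, equivalently, if

$$P\{\tilde{x} \neq \tilde{y}\} = P\{\omega \in \Omega \mid \tilde{x}(\omega) \neq \tilde{y}(\omega)\} = 0.$$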

Note that the concept of "a.s." is the same as that of "a.e." The only difference is that "a.e." applies to functions defined on measure spaces, whereas "a.s." applies to random objects defined on probability spaces.

Obviously, $\tilde{x} = \tilde{y} \Longrightarrow \tilde{x} \overset{a.s.}{=} \tilde{y}$. Moreover, $\tilde{x} \overset{a.s.}{=} \tilde{y} \Longrightarrow \tilde{x} \overset{d}{=} \tilde{y}$, but the converse is not true (see the coin-toss example above, where $\tilde{x} \overset{d}{=} \tilde{y}$ but $\tilde{x} \overset{a.s.}{\neq} \tilde{y}$ since $P\{\tilde{x} \neq \tilde{y}\} = 1$).
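The gap between equality in distribution and almost sure equality can be seen by direct enumeration of the coin-toss example. This is a small illustrative sketch; the helper name `distribution` is hypothetical.

```python
from fractions import Fraction

omega = ["H", "T"]
P = {"H": Fraction(1, 2), "T": Fraction(1, 2)}

# The two random variables of the coin-toss example.
x = {"H": -1, "T": 1}
y = {"H": 1, "T": -1}

def distribution(rv):
    """Return the induced distribution as {value: probability}."""
    dist = {}
    for w in omega:
        dist[rv[w]] = dist.get(rv[w], Fraction(0)) + P[w]
    return dist

same_distribution = distribution(x) == distribution(y)
prob_differ = sum((P[w] for w in omega if x[w] != y[w]), Fraction(0))
print(same_distribution, prob_differ)  # True 1
```

The two variables share the same distribution, yet they differ at every outcome, so $P\{\tilde{x} \neq \tilde{y}\} = 1$.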
2.3. Distribution function of a random variable

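Note that the distribution $P_{\tilde{x}}$ of a random variable $\tilde{x} : (\Omega, \mathcal{F}, P) \longrightarrow (\mathbb{R}, \mathcal{B})$ is a probability measure on $(\mathbb{R}, \mathcal{B})$ and, thus, is a finite measure.

Therefore, the distribution $P_{\tilde{x}}$ of a random variable $\tilde{x} : (\Omega, \mathcal{F}, P) \longrightarrow (\mathbb{R}, \mathcal{B})$ is a Lebesgue-Stieltjes measure on $\mathbb{R}$ satisfying $P_{\tilde{x}}(\mathbb{R}) = 1$.

Definition. The (cumulative) distribution function (cdf) $F_{\tilde{x}} : \mathbb{R} \longrightarrow \mathbb{R}$ of a random variable $\tilde{x} : (\Omega, \mathcal{F}, P) \longrightarrow (\mathbb{R}, \mathcal{B})$ is the distribution function associated with the distribution $P_{\tilde{x}}$, i.e.,

$$P_{\tilde{x}}(a, b] = P\{a < \tilde{x} \leq b\} = F_{\tilde{x}}(b) - F_{\tilde{x}}(a),$$

where we make the normalization $\lim_{x \to -\infty} F_{\tilde{x}}(x) = 0$.

It follows that

$$P_{\tilde{x}}(-\infty, x] = P\{\tilde{x} \leq x\} = F_{\tilde{x}}(x) - \lim_{t \to -\infty} F_{\tilde{x}}(t) = F_{\tilde{x}}(x)$$

and

$$\lim_{x \to \infty} F_{\tilde{x}}(x) = P_{\tilde{x}}(-\infty, \infty) = P\{\tilde{x} \in \mathbb{R}\} = 1.$$

Thus, the distribution function of a random variable $\tilde{x}$ is (weakly) increasing, right-continuous, and satisfies $\lim_{x \to -\infty} F_{\tilde{x}}(x) = 0$ and $\lim_{x \to \infty} F_{\tilde{x}}(x) = 1$.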
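As an illustration of these properties, the sketch below evaluates the cdf of the die example from Section 2.2 (mass $2/3$ at $1$ and mass $1/3$ at $7$). The function names `cdf` and `interval_prob` are hypothetical.

```python
from fractions import Fraction

# Distribution of the die example: mass 2/3 at 1 and mass 1/3 at 7.
pmf = {1: Fraction(2, 3), 7: Fraction(1, 3)}

def cdf(t):
    """F(t) = P{x <= t}."""
    return sum((p for v, p in pmf.items() if v <= t), Fraction(0))

def interval_prob(a, b):
    """P(a, b] = F(b) - F(a)."""
    return cdf(b) - cdf(a)

print(cdf(0.999), cdf(1), cdf(6.5), cdf(7))  # 0 2/3 2/3 1
print(interval_prob(1, 7))                   # 1/3
```

The jump of $F_{\tilde{x}}$ at $1$ is attained at the point itself (value $2/3$ at $t = 1$, value $0$ just below), which is what right-continuity means for a step function.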

2.4. Discrete random variables

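Definition. A random variable $\tilde{x} : (\Omega, \mathcal{F}) \longrightarrow (\mathbb{R}, \mathcal{B})$ is discrete if its range is countable (either finite or countably infinite).

If $\Omega$ is discrete then $\tilde{x}$ is discrete. The converse is not true.

Let $\{x_1, x_2, \ldots\}$ be the range $\tilde{x}(\Omega)$ of the discrete random variable $\tilde{x}$.

If $\tilde{x}$ is discrete, there is a countable partition $\mathcal{A} = \{A_1, A_2, \ldots\}$ of $\Omega$ such that

$$A_n = \{\omega \in \Omega \mid \tilde{x}(\omega) = x_n\}, \quad \text{for all } x_n \in \tilde{x}(\Omega).$$

Therefore, $A_n = \tilde{x}^{-1}(x_n)$, for all $x_n \in \tilde{x}(\Omega)$.

The $\sigma$-algebra generated by the partition $\mathcal{A}$ is the smallest $\sigma$-algebra that makes the random variable $\tilde{x}$ measurable.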

The distribution of a discrete random variable $\tilde{x}$ (which is said to have a discrete distribution) satisfies:

$$P_{\tilde{x}}\{x_n\} = P\{\tilde{x} = x_n\} = P(A_n), \quad \text{for all } x_n \in \tilde{x}(\Omega).$$

Definition. The probability mass function (pmf) (or just probability function), $f_{\tilde{x}} : \tilde{x}(\Omega) \longrightarrow [0, 1]$, of a discrete random variable $\tilde{x}$ (or of a discrete distribution $P_{\tilde{x}}$) is given by:

$$f_{\tilde{x}}(x) = P\{\tilde{x} = x\} = P_{\tilde{x}}\{x\}, \quad \text{for all } x \in \tilde{x}(\Omega).$$

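Properties of the probability and distribution functions of a discrete random variable:

1. $\sum_{x \in \tilde{x}(\Omega)} f_{\tilde{x}}(x) = 1.$

2. Any function $f : \tilde{x}(\Omega) \longrightarrow [0, 1]$, where $\tilde{x}(\Omega)$ is countable, satisfying $\sum_{x \in \tilde{x}(\Omega)} f(x) = 1$ can serve as a probability function of a discrete distribution.

3. $F_{\tilde{x}}(x) = \sum_{t \leq x} f_{\tilde{x}}(t)$, with $t \in \tilde{x}(\Omega)$.

4. $P_{\tilde{x}}(B) = P\{\tilde{x} \in B\} = \sum_{x \in B} f_{\tilde{x}}(x)$, for all $B \in \mathcal{B}$, with $x \in \tilde{x}(\Omega)$.

5. $f_{\tilde{x}}(x) = F_{\tilde{x}}(x) - \lim_{t \to x^-} F_{\tilde{x}}(t)$, for $x \in \tilde{x}(\Omega)$. In particular, if the range of $\tilde{x}$ can be ordered so that $x_1 < x_2 < \ldots < x_{i-1} < x_i < x_{i+1} < \ldots$, then $f_{\tilde{x}}(x_1) = F_{\tilde{x}}(x_1)$ and $f_{\tilde{x}}(x_i) = F_{\tilde{x}}(x_i) - F_{\tilde{x}}(x_{i-1})$ for $i = 2, 3, \ldots$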

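Example: Let $\tilde{x}$ be the number of heads when tossing 4 balanced coins. Then

$$f_{\tilde{x}}(x) = \begin{cases} 1/16 & \text{for } x = 0 \\ 4/16 & \text{for } x = 1 \\ 6/16 & \text{for } x = 2 \\ 4/16 & \text{for } x = 3 \\ 1/16 & \text{for } x = 4, \end{cases}$$

or, more compactly,

$$f_{\tilde{x}}(x) = \frac{1}{16} \binom{4}{x}, \quad \text{for } x \in \tilde{x}(\Omega) = \{0, 1, 2, 3, 4\}.$$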
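The pmf of the four-coin example, together with properties 3 and 5 above, can be reproduced by enumerating the 16 equally likely outcomes. This is a minimal sketch; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product
from math import comb

# Enumerate the 16 equally likely outcomes of 4 balanced coins.
outcomes = list(product("HT", repeat=4))
pmf = {}
for outcome in outcomes:
    heads = outcome.count("H")
    pmf[heads] = pmf.get(heads, Fraction(0)) + Fraction(1, len(outcomes))

# Compare with the closed form (1/16) * C(4, x).
closed_form = {k: Fraction(comb(4, k), 16) for k in range(5)}

# cdf by accumulating the pmf, then recover the pmf from the jumps of F.
values = sorted(pmf)
cdf, running = {}, Fraction(0)
for v in values:
    running += pmf[v]
    cdf[v] = running
jumps = {values[0]: cdf[values[0]]}
for prev, cur in zip(values, values[1:]):
    jumps[cur] = cdf[cur] - cdf[prev]

print(pmf == closed_form, jumps == pmf, cdf[4])  # True True 1
```

The last line confirms that the enumerated pmf matches the binomial formula and that the successive differences of the cdf give back the pmf.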

[Figures omitted: probability histogram, probability bar chart, and distribution function of $\tilde{x}$.]

2.5. Continuous and absolutely continuous random variables

Definition 1. A random variable $\tilde{x}$ is continuous if its range $\tilde{x}(\Omega)$ is continuous.

Definition 2. A random variable $\tilde{x}$ is continuous if its distribution function $F_{\tilde{x}}$ is continuous, that is, if $P_{\tilde{x}}\{x\} = P\{\tilde{x} = x\} = 0$ for all $x \in \mathbb{R}$.

Continuity according to Definition 2 implies continuity according to Definition 1.

Definition. A random variable $\tilde{x} : (\Omega, \mathcal{F}) \longrightarrow (\mathbb{R}, \mathcal{B})$ is absolutely continuous if its distribution function $F_{\tilde{x}}$ is absolutely continuous, i.e., there exists a Borel measurable function $f_{\tilde{x}} : (\mathbb{R}, \mathcal{B}) \longrightarrow (\overline{\mathbb{R}}, \overline{\mathcal{B}})$ that is integrable with respect to Lebesgue measure such that

$$F_{\tilde{x}}(x) - F_{\tilde{x}}(a) = \int_{[a, x]} f_{\tilde{x}}(t)\, dt, \quad \text{for all } a \in \mathbb{R},\ x \in \mathbb{R}, \text{ with } a \leq x.$$

Absolute continuity implies continuity.

Random variables that are neither discrete nor absolutely continuous

are called "mixed".

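Equivalent definition: A random variable $\tilde{x}$ is absolutely continuous if its distribution $P_{\tilde{x}}$ is absolutely continuous with respect to Lebesgue measure.

Therefore, thanks to the Radon-Nikodym theorem, there exists a Borel measurable function $f_{\tilde{x}} : (\mathbb{R}, \mathcal{B}) \longrightarrow (\overline{\mathbb{R}}, \overline{\mathcal{B}})$ such that

$$P_{\tilde{x}}(B) = \int_B f_{\tilde{x}}(x)\, dx, \quad \text{for all } B \in \mathcal{B}.$$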

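2.6. Density

The Borel measurable function $f_{\tilde{x}} : (\mathbb{R}, \mathcal{B}) \longrightarrow (\overline{\mathbb{R}}, \overline{\mathcal{B}})$ such that

$$P_{\tilde{x}}(B) = \int_B f_{\tilde{x}}(x)\, dx, \quad \text{for all } B \in \mathcal{B},$$

is called the probability density function (pdf) (or density function or just "density") of the random variable $\tilde{x}$ (or of the distribution $P_{\tilde{x}}$).

Since $P_{\tilde{x}}(\mathbb{R}) = 1$, the density function $f_{\tilde{x}}$ is integrable with respect to Lebesgue measure on $(\mathbb{R}, \mathcal{B})$.

Moreover, the density $f_{\tilde{x}}$ is finite a.e. with respect to Lebesgue measure on $(\mathbb{R}, \mathcal{B})$.

The density function $f_{\tilde{x}}$ of the random variable $\tilde{x}$ is the Radon-Nikodym derivative of its distribution with respect to Lebesgue measure, $f_{\tilde{x}} = dP_{\tilde{x}} / dx$.

Note: If $\tilde{x}$ is absolutely continuous, then

$$P_{\tilde{x}}(a, b] = P_{\tilde{x}}(a, b) = P_{\tilde{x}}[a, b] = P_{\tilde{x}}[a, b) = F_{\tilde{x}}(b) - F_{\tilde{x}}(a) = \int_{[a, b]} f_{\tilde{x}}(x)\, dx.$$

Notation: If the random variable $\tilde{x}$ has the distribution $P_{\tilde{x}}$, we write $\tilde{x} \sim P_{\tilde{x}}$, $\tilde{x} \sim F_{\tilde{x}}$, or $\tilde{x} \sim f_{\tilde{x}}$, where $F_{\tilde{x}}$ is the corresponding distribution function and $f_{\tilde{x}}$ is the corresponding probability or density function.

[Figure omitted: $P_{\tilde{x}}[a, b]$ is given by the area under the density $f_{\tilde{x}}$ between $a$ and $b$.]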
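Properties of the density:

1. $\int_{\mathbb{R}} f_{\tilde{x}}(x)\, dx = 1.$

2. $F_{\tilde{x}}(x) = \int_{(-\infty, x]} f_{\tilde{x}}(t)\, dt.$

3. Any non-negative (a.e. w.r.t. Lebesgue measure) Borel measurable function $f : (\mathbb{R}, \mathcal{B}) \longrightarrow (\overline{\mathbb{R}}, \overline{\mathcal{B}})$ satisfying $\int_{\mathbb{R}} f(x)\, dx = 1$ can serve as a density of an absolutely continuous distribution on $(\mathbb{R}, \mathcal{B})$.

4. If $\tilde{x}$ is absolutely continuous, then $f_{\tilde{x}} = F_{\tilde{x}}'$ when the derivative of $F_{\tilde{x}}$ exists. Moreover, the derivative $F_{\tilde{x}}'$ exists a.e. w.r.t. Lebesgue measure. If $f_{\tilde{x}}$ is continuous at $x$ then $F_{\tilde{x}}$ is differentiable at $x$ and $f_{\tilde{x}}(x) = F_{\tilde{x}}'(x)$.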
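Properties 1, 2, and 4 can be checked numerically on a simple density. The sketch below assumes a hypothetical density $f(t) = 2t$ on $(0, 1)$, which does not appear in the notes, and approximates integrals with the midpoint rule and derivatives with a central difference.

```python
def f(t):
    """Hypothetical density: f(t) = 2t on (0, 1), zero elsewhere."""
    return 2.0 * t if 0.0 < t < 1.0 else 0.0

def F(x, steps=10_000):
    """F(x) = integral of f over (-inf, x], by the midpoint rule on [0, x]."""
    upper = min(max(x, 0.0), 1.0)  # f vanishes outside (0, 1)
    h = upper / steps
    return sum(f((k + 0.5) * h) for k in range(steps)) * h

total_mass = F(1.0)  # property 1: should be close to 1

# Property 4: a central difference of F at x0 should be close to f(x0).
x0, eps = 0.4, 1e-4
derivative = (F(x0 + eps) - F(x0 - eps)) / (2 * eps)

print(round(total_mass, 6), round(derivative, 4), f(x0))
```

For this density the cdf is $F(x) = x^2$ on $[0, 1]$, so the derivative at $x_0 = 0.4$ should agree with $f(0.4) = 0.8$ up to rounding error.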

2.7. Random vectors

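A random vector is a measurable function

$$\tilde{x} : (\Omega, \mathcal{F}) \longrightarrow (\mathbb{R}^n, \mathcal{B}).$$

We write $\tilde{x} = (\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n)$ or $\tilde{x} = (\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n)^{\top}$, with $\tilde{x}_i = \pi_i(\tilde{x})$, where $\pi_i : \mathbb{R}^n \longrightarrow \mathbb{R}$ is the projection onto the $i$th coordinate.

The distribution of the random vector $\tilde{x}$ is a probability measure on $(\mathbb{R}^n, \mathcal{B})$ given by

$$P_{\tilde{x}}(B) = P\left(\tilde{x}^{-1}(B)\right) \quad \text{for all } B \in \mathcal{B}(\mathbb{R}^n).$$

The distribution function (cdf) of the random vector $\tilde{x} = (\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n)$, $F_{\tilde{x}} : \mathbb{R}^n \longrightarrow \mathbb{R}$, is given by

$$F_{\tilde{x}}(x_1, x_2, \ldots, x_n) = P\{\tilde{x}_i \leq x_i, \text{ for } i = 1, 2, \ldots, n\}.$$

The distribution function of a random vector $\tilde{x}$ is

(i) increasing,

(ii) right-continuous (right-continuity at $x_0$ means $\lim_{x \to x_0^+} F(x) \equiv F(x_0^+) = F(x_0)$, where $x > x_0 \in \mathbb{R}^n$),

(iii) $F_{\tilde{x}}(x) \to 0$ if at least one of the components $x_i$ of $x \in \mathbb{R}^n$ tends to $-\infty$, and

(iv) $F_{\tilde{x}}(x) \to 1$ if all the components $x_i$, $i = 1, \ldots, n$, of $x \in \mathbb{R}^n$ tend to $\infty$.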

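The random vector $\tilde{x} = (\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n)$ is discrete if its range $\tilde{x}(\Omega)$ is countable.

The probability function (pmf), $f_{\tilde{x}} : \tilde{x}_1(\Omega) \times \tilde{x}_2(\Omega) \times \ldots \times \tilde{x}_n(\Omega) \longrightarrow [0, 1]$, of a discrete random vector $\tilde{x}$ is given by:

$$f_{\tilde{x}}(x) = P_{\tilde{x}}\{x\} = P\{\underbrace{(\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n)}_{\tilde{x}} = \underbrace{(x_1, x_2, \ldots, x_n)}_{x \in \mathbb{R}^n}\} = P\{\tilde{x}_i = x_i, \text{ for } i = 1, 2, \ldots, n\},$$

for all $x \in \tilde{x}_1(\Omega) \times \tilde{x}_2(\Omega) \times \ldots \times \tilde{x}_n(\Omega)$.

Note: $\tilde{x}(\Omega) \subseteq \tilde{x}_1(\Omega) \times \tilde{x}_2(\Omega) \times \ldots \times \tilde{x}_n(\Omega)$.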

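Properties of the probability and distribution functions of a discrete random vector:

1. $\sum_{x \in \tilde{x}(\Omega)} f_{\tilde{x}}(x) = 1$ or $\sum_{x \in \tilde{x}_1(\Omega) \times \tilde{x}_2(\Omega) \times \ldots \times \tilde{x}_n(\Omega)} f_{\tilde{x}}(x) = 1.$

2. $F_{\tilde{x}}(x) = \sum_{t \leqq x} f_{\tilde{x}}(t)$, with $t = (t_1, t_2, \ldots, t_n) \in \tilde{x}(\Omega)$, where $t \leqq x$ means that $t_i \leq x_i$ for $i = 1, 2, \ldots, n$.

3. $P_{\tilde{x}}(B) = P\{\tilde{x} \in B\} = \sum_{x \in B} f_{\tilde{x}}(x)$, for all $B \in \mathcal{B}(\mathbb{R}^n)$.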

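The random vector $\tilde{x} = (\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n)$ (or its distribution) is absolutely continuous if there exists a Borel measurable function $f_{\tilde{x}} : (\mathbb{R}^n, \mathcal{B}) \longrightarrow (\overline{\mathbb{R}}, \overline{\mathcal{B}})$, called the density (pdf), that is integrable with respect to Lebesgue measure on $(\mathbb{R}^n, \mathcal{B})$, such that

$$P_{\tilde{x}}(B) = \int_B f_{\tilde{x}}(x_1, x_2, \ldots, x_n)\, d(x_1, x_2, \ldots, x_n), \quad \text{for all } B \in \mathcal{B}(\mathbb{R}^n).$$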

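Properties of the density of a random vector:

1. $\int_{\mathbb{R}^n} f_{\tilde{x}}(x)\, dx = \int_{\mathbb{R}} \ldots \int_{\mathbb{R}} f_{\tilde{x}}(x_1, \ldots, x_n)\, dx_1 \ldots dx_n = 1$, with $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$.

2. $F_{\tilde{x}}(x) = \int_{(-\infty, x_n]} \int_{(-\infty, x_{n-1}]} \ldots \int_{(-\infty, x_1]} f_{\tilde{x}}(t_1, t_2, \ldots, t_n)\, dt_1\, dt_2 \ldots dt_n.$

3. Any non-negative (a.e. w.r.t. Lebesgue measure) Borel measurable function $f : (\mathbb{R}^n, \mathcal{B}) \longrightarrow (\overline{\mathbb{R}}, \overline{\mathcal{B}})$ satisfying

$$\int_{\mathbb{R}} \ldots \int_{\mathbb{R}} f(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \ldots dx_n = 1$$

can serve as a density of an absolutely continuous distribution on $(\mathbb{R}^n, \mathcal{B})$.

4. If the random vector $\tilde{x} = (\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n)$ is absolutely continuous, then

$$f_{\tilde{x}}(x_1, x_2, \ldots, x_n) = \frac{\partial^n F_{\tilde{x}}(x_1, x_2, \ldots, x_n)}{\partial x_1\, \partial x_2 \ldots \partial x_n}$$

when this $n$th cross partial derivative of $F_{\tilde{x}}$ exists. Moreover, this derivative exists a.e. w.r.t. Lebesgue measure on $(\mathbb{R}^n, \mathcal{B})$.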

2.8. Marginal distributions
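Definition. Let $\tilde{x} = (\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n)$ be a random vector with distribution $P_{\tilde{x}}$. The marginal distribution of $\tilde{x}_i$, for $i = 1, \ldots, n$, is given by

$$P_{\tilde{x}_i}(B) = P_{\tilde{x}}(\mathbb{R} \times \ldots \times B \times \ldots \times \mathbb{R}), \quad \text{for all } B \in \mathcal{B}(\mathbb{R}),$$

where $B$ occupies the $i$th position of the product.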

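Definition. Let $\tilde{x} = (\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n)$ be a discrete random vector with the probability function $f_{\tilde{x}}$. The marginal probability function of $\tilde{x}_i$, for $i = 1, \ldots, n$, is given by

$$f_{\tilde{x}_i}(x_i) = \sum_{x_1 \in \tilde{x}_1(\Omega)} \ldots \sum_{x_{i-1} \in \tilde{x}_{i-1}(\Omega)}\ \sum_{x_{i+1} \in \tilde{x}_{i+1}(\Omega)} \ldots \sum_{x_n \in \tilde{x}_n(\Omega)} f_{\tilde{x}}(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n),$$

for all $x_i \in \tilde{x}_i(\Omega)$.

Definition. Let $\tilde{x} = (\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n)$ be an absolutely continuous random vector with the density $f_{\tilde{x}}$. The marginal density of $\tilde{x}_i$, for $i = 1, \ldots, n$, is given by

$$f_{\tilde{x}_i}(x_i) = \int_{\mathbb{R}} \ldots \int_{\mathbb{R}} f_{\tilde{x}}(x_1, \ldots, x_{i-1}, x_i, x_{i+1}, \ldots, x_n)\, dx_1 \ldots dx_{i-1}\, dx_{i+1} \ldots dx_n,$$

for all $x_i \in \mathbb{R}$.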

Note: From the marginal probability or density functions we can

construct the marginal distributions in the usual way.

Example 1: The discrete random vector $(\tilde{x}, \tilde{y})$, where $\tilde{x}$ is the number of points when rolling a die and $\tilde{y}$ is the number of heads when tossing a coin, has a probability function $f_{\tilde{x}, \tilde{y}}(x, y)$ summarized in the following table:

y \ x      1      2      3      4      5      6      f_y(y)
0          1/12   1/12   1/12   1/12   1/12   1/12   1/2
1          1/12   1/12   1/12   1/12   1/12   1/12   1/2
f_x(x)     1/6    1/6    1/6    1/6    1/6    1/6    1

The marginal probability functions of $\tilde{x}$ and $\tilde{y}$ are summarized in the last row and the last column of the table, respectively.

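Example 2: The absolutely continuous random vector $(\tilde{x}, \tilde{y})$ has the following density:

$$f_{\tilde{x}, \tilde{y}}(x, y) = \begin{cases} \frac{2}{3}(x + 2y) & \text{for } 0 < x < 1 \text{ and } 0 < y < 1 \\ 0 & \text{otherwise.} \end{cases}$$

Marginal densities:

$$f_{\tilde{x}}(x) = \int_{-\infty}^{\infty} f_{\tilde{x}, \tilde{y}}(x, y)\, dy = \int_0^1 \frac{2}{3}(x + 2y)\, dy = \frac{2}{3}\left[xy + y^2\right]_0^1 = \frac{2}{3}(x + 1), \quad \text{for } 0 < x < 1.$$

Therefore,

$$f_{\tilde{x}}(x) = \begin{cases} \frac{2}{3}(x + 1) & \text{for } 0 < x < 1 \\ 0 & \text{otherwise.} \end{cases}$$

Similarly,

$$f_{\tilde{y}}(y) = \int_{-\infty}^{\infty} f_{\tilde{x}, \tilde{y}}(x, y)\, dx = \int_0^1 \frac{2}{3}(x + 2y)\, dx = \frac{2}{3}\left[\frac{x^2}{2} + 2xy\right]_0^1 = \frac{2}{3}\left(\frac{1}{2} + 2y\right) = \frac{1}{3}(1 + 4y), \quad \text{for } 0 < y < 1.$$

Therefore,

$$f_{\tilde{y}}(y) = \begin{cases} \frac{1}{3}(1 + 4y) & \text{for } 0 < y < 1 \\ 0 & \text{otherwise.} \end{cases}$$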
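The two marginal densities of Example 2 can be verified numerically by integrating the joint density over the other variable. The sketch below uses a plain midpoint rule (the helper `integrate` is hypothetical); since the integrand is linear in the integration variable, the rule is exact up to floating-point rounding.

```python
def joint_density(x, y):
    """f(x, y) = (2/3)(x + 2y) on the open unit square, zero elsewhere."""
    return (2.0 / 3.0) * (x + 2.0 * y) if 0 < x < 1 and 0 < y < 1 else 0.0

def integrate(g, a, b, steps=1000):
    """Midpoint rule on [a, b]; exact (up to rounding) for linear integrands."""
    h = (b - a) / steps
    return sum(g(a + (k + 0.5) * h) for k in range(steps)) * h

def marginal_x(x):
    return integrate(lambda y: joint_density(x, y), 0.0, 1.0)

def marginal_y(y):
    return integrate(lambda x: joint_density(x, y), 0.0, 1.0)

x0, y0 = 0.3, 0.8
print(abs(marginal_x(x0) - (2 / 3) * (x0 + 1)) < 1e-9)      # True
print(abs(marginal_y(y0) - (1 / 3) * (1 + 4 * y0)) < 1e-9)  # True
```

The evaluation points $x_0 = 0.3$ and $y_0 = 0.8$ are arbitrary choices inside the unit interval.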

2.9. Independent random variables
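Definition. Let $\tilde{x} = (\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n)$ be a random vector defined on $(\Omega, \mathcal{F}, P)$ with the distribution $P_{\tilde{x}}$ on $(\mathbb{R}^n, \mathcal{B}(\mathbb{R}^n))$. The random variables $\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n$ are said to be independent if, for all collections of sets $B_1, B_2, \ldots, B_n$ belonging to $\mathcal{B}(\mathbb{R})$, we have

$$P\{\tilde{x}_1 \in B_1, \ldots, \tilde{x}_n \in B_n\} = P\{\tilde{x}_1 \in B_1\} \cdot \ldots \cdot P\{\tilde{x}_n \in B_n\},$$

or, equivalently, if the distribution of the random vector $\tilde{x}$ is equal to the product measure of the marginal distributions,

$$P_{\tilde{x}} = P_{\tilde{x}_1} \times \ldots \times P_{\tilde{x}_n} \equiv \prod_{i=1}^{n} P_{\tilde{x}_i}.$$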

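Definition. Let $\tilde{x}_i : (\Omega, \mathcal{F}, P) \longrightarrow (\Omega_i, \mathcal{F}_i)$, for $i = 1, \ldots, n$. The random objects $\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n$ are said to be independent if, for all sets $B_1 \in \mathcal{F}_1, \ldots, B_n \in \mathcal{F}_n$,

$$P\{\tilde{x}_1 \in B_1, \ldots, \tilde{x}_n \in B_n\} = P\{\tilde{x}_1 \in B_1\} \cdot \ldots \cdot P\{\tilde{x}_n \in B_n\}.$$

Equivalent definition. Let $\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n$ be a collection of random objects on the probability space $(\Omega, \mathcal{F}, P)$,

$$\tilde{x}_i : (\Omega, \mathcal{F}) \longrightarrow (\Omega_i, \mathcal{F}_i), \quad \text{for } i = 1, 2, \ldots, n.$$

The random objects $\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n$ are said to be independent if the joint distribution

$$P_{(\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n)} : \bigotimes_{i=1}^{n} \mathcal{F}_i \longrightarrow [0, 1]$$

of these $n$ random objects is equal to the product measure of the marginal distributions,

$$P_{(\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n)} = \prod_{i=1}^{n} P_{\tilde{x}_i},$$

where $P_{\tilde{x}_i} : \mathcal{F}_i \longrightarrow [0, 1]$ is the marginal distribution of the random object $\tilde{x}_i$, $i = 1, \ldots, n$, and $\bigotimes_{i=1}^{n} \mathcal{F}_i$ is the product $\sigma$-algebra.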

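Proposition. Let $\tilde{x}_i : (\Omega, \mathcal{F}, P) \longrightarrow (\Omega_i, \mathcal{F}_i)$, for $i = 1, \ldots, n$, be a collection of independent random objects and $g_i : (\Omega_i, \mathcal{F}_i) \longrightarrow (\Omega_i', \mathcal{F}_i')$, for $i = 1, \ldots, n$, be measurable functions. Then, the random objects $g_i(\tilde{x}_i) : (\Omega, \mathcal{F}, P) \longrightarrow (\Omega_i', \mathcal{F}_i')$, for $i = 1, \ldots, n$, are independent.

Proof. If

$$P\{\tilde{x}_1 \in B_1, \ldots, \tilde{x}_n \in B_n\} = P\{\tilde{x}_1 \in B_1\} \cdot \ldots \cdot P\{\tilde{x}_n \in B_n\},$$

for all sets $B_1 \in \mathcal{F}_1, \ldots, B_n \in \mathcal{F}_n$, then

$$P\left\{\tilde{x}_1 \in g_1^{-1}(B_1'), \ldots, \tilde{x}_n \in g_n^{-1}(B_n')\right\} = P\left\{\tilde{x}_1 \in g_1^{-1}(B_1')\right\} \cdot \ldots \cdot P\left\{\tilde{x}_n \in g_n^{-1}(B_n')\right\},$$

for all sets $B_1' \in \mathcal{F}_1', \ldots, B_n' \in \mathcal{F}_n'$, since $g_1^{-1}(B_1') \in \mathcal{F}_1, \ldots, g_n^{-1}(B_n') \in \mathcal{F}_n$ due to the measurability of $g_i$, for $i = 1, \ldots, n$. Therefore,

$$P\{g_1(\tilde{x}_1) \in B_1', \ldots, g_n(\tilde{x}_n) \in B_n'\} = P\{g_1(\tilde{x}_1) \in B_1'\} \cdot \ldots \cdot P\{g_n(\tilde{x}_n) \in B_n'\},$$

for all sets $B_1' \in \mathcal{F}_1', \ldots, B_n' \in \mathcal{F}_n'$, which proves the independence of the random objects $g_i(\tilde{x}_i) : (\Omega, \mathcal{F}, P) \longrightarrow (\Omega_i', \mathcal{F}_i')$, for $i = 1, \ldots, n$. Q.E.D.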

Proposition. Let $\tilde{x} = (\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n)$ be a random vector with the distribution function $F_{\tilde{x}} : \mathbb{R}^n \longrightarrow [0, 1]$ and let $F_i : \mathbb{R} \longrightarrow [0, 1]$ be the marginal distribution function of $\tilde{x}_i$, for $i = 1, \ldots, n$. Then, the random variables $\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n$ are independent if and only if

$$F_{\tilde{x}}(x_1, \ldots, x_n) = F_1(x_1) \cdot F_2(x_2) \cdot \ldots \cdot F_n(x_n),$$

for all $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$.

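Proposition. Let $\tilde{x} = (\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n)$ be a discrete random vector with the probability function $f_{\tilde{x}} : \tilde{x}_1(\Omega) \times \ldots \times \tilde{x}_n(\Omega) \longrightarrow [0, 1]$ and let $f_i : \tilde{x}_i(\Omega) \longrightarrow [0, 1]$ be the marginal probability function of $\tilde{x}_i$, for $i = 1, \ldots, n$. Then, the random variables $\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n$ are independent if and only if

$$f_{\tilde{x}}(x_1, \ldots, x_n) = f_1(x_1) \cdot f_2(x_2) \cdot \ldots \cdot f_n(x_n),$$

for all $x = (x_1, \ldots, x_n) \in \tilde{x}_1(\Omega) \times \ldots \times \tilde{x}_n(\Omega)$.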
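The factorization criterion for discrete random vectors is easy to test by brute force in the bivariate case. The sketch below applies it to the die-and-coin vector of Example 1 in Section 2.8 and to a hypothetical dependent pair (not taken from the notes) in which the second variable always equals the first. The function names are illustrative.

```python
from fractions import Fraction
from itertools import product

def marginals(joint):
    """Return the two marginal pmfs of a bivariate joint pmf {(x, y): p}."""
    f_x, f_y = {}, {}
    for (x, y), p in joint.items():
        f_x[x] = f_x.get(x, Fraction(0)) + p
        f_y[y] = f_y.get(y, Fraction(0)) + p
    return f_x, f_y

def is_independent(joint):
    """Check f(x, y) = f_x(x) * f_y(y) on the whole product of the ranges."""
    f_x, f_y = marginals(joint)
    return all(joint.get((x, y), Fraction(0)) == f_x[x] * f_y[y]
               for x, y in product(f_x, f_y))

# Die and coin from Example 1 of Section 2.8.
die_coin = {(x, y): Fraction(1, 12) for x in range(1, 7) for y in (0, 1)}
# Hypothetical dependent pair: y always equals x.
copy_pair = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}

print(is_independent(die_coin), is_independent(copy_pair))  # True False
```

Note that the check runs over the full product of the ranges, including points where the joint probability function is zero; this is where the dependent pair fails.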

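Proposition. Let $\tilde{x} = (\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n)$ be an absolutely continuous random vector with the density function $f_{\tilde{x}} : \mathbb{R}^n \longrightarrow \mathbb{R}$ and let $f_i : \mathbb{R} \longrightarrow \mathbb{R}$ be the marginal density function of $\tilde{x}_i$, for $i = 1, \ldots, n$. Then, the random variables $\tilde{x}_1, \tilde{x}_2, \ldots, \tilde{x}_n$ are independent if and only if

$$f_{\tilde{x}}(x_1, \ldots, x_n) = f_1(x_1) \cdot f_2(x_2) \cdot \ldots \cdot f_n(x_n),$$

for all $x = (x_1, \ldots, x_n) \in \mathbb{R}^n$.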

2.10. Generalized conditional probability
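Let $\tilde{x} : (\Omega, \mathcal{F}, P) \longrightarrow (\Omega', \mathcal{F}')$ and let us fix the event $B \in \mathcal{F}$. From the Radon-Nikodym theorem we know that there exists a Borel measurable function $g : (\Omega', \mathcal{F}') \longrightarrow (\mathbb{R}, \mathcal{B})$ such that

$$\underbrace{P(\{\tilde{x} \in A\} \cap B)}_{\lambda(A)} = \int_A g(x)\, dP_{\tilde{x}}(x), \quad \text{for all } A \in \mathcal{F}',$$

since $\lambda \ll P_{\tilde{x}}$. The function $g$ is called the conditional probability of $B$ given $\tilde{x} = x$ and is written as $g(x) = P(B \mid \tilde{x} = x)$. The conditional probability is essentially unique for a given $B \in \mathcal{F}$ (i.e., if there exists another such function $h$, then $g = h$ a.e. $[P_{\tilde{x}}]$). Thus,

$$P(\{\tilde{x} \in A\} \cap B) = \int_A P(B \mid \tilde{x} = x)\, dP_{\tilde{x}}(x),$$

with $P(B \mid \tilde{x} = \cdot) : (\Omega', \mathcal{F}') \longrightarrow (\mathbb{R}, \mathcal{B})$.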

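However, sometimes the conditional probability is viewed as a measure on $(\Omega, \mathcal{F})$,

$$P(\cdot \mid \tilde{x} = x) : \mathcal{F} \longrightarrow \mathbb{R}.$$

Moreover, if $A = \Omega'$, then

$$P\left(\{\tilde{x} \in \Omega'\} \cap B\right) = P(\Omega \cap B) = P(B) = \int_{\Omega'} P(B \mid \tilde{x} = x)\, dP_{\tilde{x}}(x),$$

which is a generalization of the theorem of total probability.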
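In the discrete case the integral reduces to a finite sum, $P(B) = \sum_x P(B \mid \tilde{x} = x)\, P\{\tilde{x} = x\}$. The sketch below checks this for the die random variable of Section 2.2 and a hypothetical event $B$ (an even outcome) chosen only for illustration.

```python
from fractions import Fraction

omega = [1, 2, 3, 4, 5, 6]
P = {w: Fraction(1, 6) for w in omega}
x = {w: (1 if w <= 4 else 7) for w in omega}   # random variable from Section 2.2
B = {2, 4, 6}                                   # hypothetical event: even outcome

def prob(event):
    return sum((P[w] for w in event), Fraction(0))

# In the discrete case P(B | x = v) = P({x = v} ∩ B) / P{x = v}.
p_x = {v: prob([w for w in omega if x[w] == v]) for v in set(x.values())}
cond = {v: prob([w for w in omega if x[w] == v and w in B]) / p_x[v] for v in p_x}

total = sum(cond[v] * p_x[v] for v in p_x)
print(cond[1], cond[7], total == prob(B))  # 1/2 1/2 True
```

Summing the conditional probabilities weighted by the distribution of $\tilde{x}$ recovers $P(B) = 1/2$.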

Note that, if $\tilde{x}$ is an absolutely continuous random variable, then $P(B \mid \tilde{x} = x)$ is a conditional probability given an event ($\{\tilde{x} = x\}$) that has zero probability!

2.11. Conditional distributions

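Definition. Let $(\tilde{x}, \tilde{y})$ be a vector of two random objects $\tilde{x} : (\Omega, \mathcal{F}, P) \longrightarrow (\Omega_x, \mathcal{F}_x)$ and $\tilde{y} : (\Omega, \mathcal{F}, P) \longrightarrow (\Omega_y, \mathcal{F}_y)$, and let $C \in \mathcal{F}_y$ be a fixed measurable set. The conditional distribution of $\tilde{y}$ given $\tilde{x} = x$ is the Borel measurable function $P_{\tilde{y} \mid \tilde{x}}(C \mid \tilde{x} = \cdot) : (\Omega_x, \mathcal{F}_x) \longrightarrow (\mathbb{R}, \mathcal{B})$ given by

$$P_{\tilde{y} \mid \tilde{x}}(C \mid x) = P\{\tilde{y} \in C \mid \tilde{x} = x\}, \quad \text{for all } x \in \Omega_x,$$

which is essentially unique w.r.t. $P_{\tilde{x}}$.

However, sometimes the conditional distribution is viewed as a measure on $(\Omega_y, \mathcal{F}_y)$,

$$P_{\tilde{y} \mid \tilde{x}}(\cdot \mid x) : \mathcal{F}_y \longrightarrow \mathbb{R}.$$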

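Assume that the random vector $(\tilde{x}, \tilde{y})$ is discrete with the probability function $f_{\tilde{x}, \tilde{y}} : \tilde{x}(\Omega) \times \tilde{y}(\Omega) \longrightarrow [0, 1]$. Then, the conditional distribution $P_{\tilde{y} \mid \tilde{x}}(y \mid x) = P\{\tilde{y} = y \mid \tilde{x} = x\}$ must satisfy

$$P\{\tilde{x} \in A, \tilde{y} \in C\} = \sum_{x \in A} P_{\tilde{y} \mid \tilde{x}}(C \mid x)\, \underbrace{P\{\tilde{x} = x\}}_{f_{\tilde{x}}(x)} = \sum_{x \in A} \underbrace{\sum_{y \in C} P_{\tilde{y} \mid \tilde{x}}(y \mid x)}_{P_{\tilde{y} \mid \tilde{x}}(C \mid x)} f_{\tilde{x}}(x), \qquad (8)$$

for all $A \in \mathcal{B}(\mathbb{R})$ and $C \in \mathcal{B}(\mathbb{R})$.

Let us define the conditional distribution $P_{\tilde{y} \mid \tilde{x}}(y \mid x)$ as follows:

$$P_{\tilde{y} \mid \tilde{x}}(y \mid x) = \frac{P\{\tilde{x} = x, \tilde{y} = y\}}{P\{\tilde{x} = x\}} = \frac{f_{\tilde{x}, \tilde{y}}(x, y)}{f_{\tilde{x}}(x)} \equiv f_{\tilde{y} \mid \tilde{x}}(y \mid x),$$

for all $(x, y) \in \tilde{x}(\Omega) \times \tilde{y}(\Omega)$ with $f_{\tilde{x}}(x) > 0$.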


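The function $f_{\tilde{y} \mid \tilde{x}}(\cdot \mid x) : \tilde{y}(\Omega) \longrightarrow [0, 1]$, for all $x \in \tilde{x}(\Omega)$ such that $f_{\tilde{x}}(x) > 0$, is the conditional probability function of $\tilde{y}$ given $\tilde{x} = x$.

The previous definition of the conditional probability function (or conditional distribution) of $\tilde{y}$ given $\tilde{x} = x$ is the right one since the expression (8) becomes

$$P\{\tilde{x} \in A, \tilde{y} \in C\} = \sum_{x \in A} \sum_{y \in C} f_{\tilde{y} \mid \tilde{x}}(y \mid x)\, f_{\tilde{x}}(x) = \sum_{x \in A} \sum_{y \in C} \frac{f_{\tilde{x}, \tilde{y}}(x, y)}{f_{\tilde{x}}(x)} f_{\tilde{x}}(x) = \sum_{x \in A} \sum_{y \in C} f_{\tilde{x}, \tilde{y}}(x, y),$$

for all $A \in \mathcal{B}(\mathbb{R})$ and $C \in \mathcal{B}(\mathbb{R})$.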
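The consistency check in expression (8) can be carried out on a small table. The joint probability function below is hypothetical (it is not taken from the notes) and is chosen only so that the two variables are not independent.

```python
from fractions import Fraction

# Hypothetical joint pmf of (x, y) on {0, 1} x {0, 1, 2}.
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(1, 8), (0, 2): Fraction(1, 4),
    (1, 0): Fraction(1, 4), (1, 1): Fraction(1, 8), (1, 2): Fraction(1, 8),
}

f_x = {}
for (x, y), p in joint.items():
    f_x[x] = f_x.get(x, Fraction(0)) + p

# Conditional pmf f(y | x) = f(x, y) / f_x(x), defined where f_x(x) > 0.
cond = {(y, x): p / f_x[x] for (x, y), p in joint.items() if f_x[x] > 0}

def prob_via_conditional(A, C):
    """Right-hand side of (8): sum over x in A, y in C of f(y | x) f_x(x)."""
    return sum((cond[(y, x)] * f_x[x] for (x, y) in joint if x in A and y in C),
               Fraction(0))

def prob_direct(A, C):
    return sum((p for (x, y), p in joint.items() if x in A and y in C), Fraction(0))

A, C = {0}, {1, 2}
print(prob_via_conditional(A, C), prob_direct(A, C))  # 3/8 3/8
```

Both routes give the same probability, and for each fixed $x$ the conditional values sum to one, as a probability function should.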

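Assume that the random vector $(\tilde{x}, \tilde{y})$ is absolutely continuous with the density $f_{\tilde{x}, \tilde{y}} : \mathbb{R}^2 \longrightarrow \mathbb{R}$. Then, we would like to have an expression like this:

$$P\{\tilde{x} \in A, \tilde{y} \in C\} = \int_A P_{\tilde{y} \mid \tilde{x}}(C \mid x)\, dP_{\tilde{x}}(x) = \int_A P_{\tilde{y} \mid \tilde{x}}(C \mid x)\, f_{\tilde{x}}(x)\, dx = \int_A \underbrace{\left[\int_C f_{\tilde{y} \mid \tilde{x}}(y \mid x)\, dy\right]}_{P_{\tilde{y} \mid \tilde{x}}(C \mid x)} f_{\tilde{x}}(x)\, dx, \qquad (88)$$

for all $A \in \mathcal{B}(\mathbb{R})$ and $C \in \mathcal{B}(\mathbb{R})$.

Let us define the conditional density of $\tilde{y}$ given $\tilde{x} = x$, $f_{\tilde{y} \mid \tilde{x}}(\cdot \mid x) : \mathbb{R} \longrightarrow \mathbb{R}$, for all $x \in \mathbb{R}$ such that $f_{\tilde{x}}(x) > 0$, as follows:

$$f_{\tilde{y} \mid \tilde{x}}(y \mid x) = \frac{f_{\tilde{x}, \tilde{y}}(x, y)}{f_{\tilde{x}}(x)}, \quad \text{for all } (x, y) \in \mathbb{R}^2 \text{ with } f_{\tilde{x}}(x) > 0.$$

The previous definition of the conditional density of $\tilde{y}$ given $\tilde{x} = x$ is the right one since the expression (88) becomes

$$P\{\tilde{x} \in A, \tilde{y} \in C\} = \int_A \int_C f_{\tilde{y} \mid \tilde{x}}(y \mid x)\, f_{\tilde{x}}(x)\, dy\, dx = \int_A \int_C \frac{f_{\tilde{x}, \tilde{y}}(x, y)}{f_{\tilde{x}}(x)} f_{\tilde{x}}(x)\, dy\, dx = \int_A \int_C f_{\tilde{x}, \tilde{y}}(x, y)\, dy\, dx,$$

for all $A \in \mathcal{B}(\mathbb{R})$ and $C \in \mathcal{B}(\mathbb{R})$.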

If the discrete (absolutely continuous) random variables $\tilde{x}$ and $\tilde{y}$ are independent then

$$f_{\tilde{y} \mid \tilde{x}}(y \mid x) = \frac{f_{\tilde{x}, \tilde{y}}(x, y)}{f_{\tilde{x}}(x)} = \frac{f_{\tilde{x}}(x) \cdot f_{\tilde{y}}(y)}{f_{\tilde{x}}(x)} = f_{\tilde{y}}(y), \quad \text{for } f_{\tilde{x}}(x) > 0.$$

That is, the conditional probability function (density function) is equal to the corresponding unconditional probability function (density function).

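Note that from the conditional probability and density functions we can obtain the conditional distribution in the usual way, namely,

$$P_{\tilde{y} \mid \tilde{x}}(C \mid x) = P\{\tilde{y} \in C \mid \tilde{x} = x\} = \sum_{y \in C} f_{\tilde{y} \mid \tilde{x}}(y \mid x), \quad \text{for all } C \in \mathcal{B},$$

in the discrete case, and

$$P_{\tilde{y} \mid \tilde{x}}(C \mid x) = P\{\tilde{y} \in C \mid \tilde{x} = x\} = \int_C f_{\tilde{y} \mid \tilde{x}}(y \mid x)\, dy, \quad \text{for all } C \in \mathcal{B},$$

in the absolutely continuous case, with

$$P_{\tilde{y} \mid \tilde{x}}(C \mid \cdot) : (\mathbb{R}, \mathcal{B}) \longrightarrow (\mathbb{R}, \mathcal{B})$$

or, sometimes,

$$P_{\tilde{y} \mid \tilde{x}}(\cdot \mid x) : \mathcal{B}(\mathbb{R}) \longrightarrow \mathbb{R}.$$

Note again that, if $\tilde{x}$ is an absolutely continuous random variable, then $P_{\tilde{y} \mid \tilde{x}}(C \mid x)$ is a conditional distribution (and, hence, a conditional probability $P\{\tilde{y} \in C \mid \tilde{x} = x\}$) given the event $\{\tilde{x} = x\}$, which has zero probability! This conditional distribution is well defined when the marginal density of $\tilde{x}$ evaluated at $x$, $f_{\tilde{x}}(x)$, is strictly positive.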
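Example: The absolutely continuous random vector $(\tilde{x}, \tilde{y})$ has the following density:

$$f_{\tilde{x}, \tilde{y}}(x, y) = \begin{cases} \frac{2}{3}(x + 2y) & \text{for } 0 < x < 1 \text{ and } 0 < y < 1 \\ 0 & \text{otherwise.} \end{cases}$$

We have already proved that the marginal density of the random variable $\tilde{y}$ is

$$f_{\tilde{y}}(y) = \begin{cases} \frac{1}{3}(1 + 4y) & \text{for } 0 < y < 1 \\ 0 & \text{otherwise.} \end{cases}$$

Therefore, the conditional density of $\tilde{x}$ given $\tilde{y} = y$ is

$$f_{\tilde{x} \mid \tilde{y}}(x \mid y) = \begin{cases} \dfrac{\frac{2}{3}(x + 2y)}{\frac{1}{3}(1 + 4y)} = \dfrac{2x + 4y}{1 + 4y} & \text{for } 0 < x < 1 \\ 0 & \text{otherwise,} \end{cases}$$

for $0 < y < 1$.

Thus, the conditional density of $\tilde{x}$ given $\tilde{y} = 1/4$ is

$$f_{\tilde{x} \mid \tilde{y}}\left(x \,\Big|\, \tfrac{1}{4}\right) = \begin{cases} \dfrac{2x + 1}{2} = \dfrac{1}{2}(2x + 1) & \text{for } 0 < x < 1 \\ 0 & \text{otherwise.} \end{cases}$$

Hence,

$$P_{\tilde{x} \mid \tilde{y}}\left(\left(-\infty, \tfrac{1}{3}\right] \Big|\, \tfrac{1}{4}\right) = P\left\{\tilde{x} \leq \tfrac{1}{3} \,\Big|\, \tilde{y} = \tfrac{1}{4}\right\} = \int_{-\infty}^{1/3} f_{\tilde{x} \mid \tilde{y}}\left(x \,\Big|\, \tfrac{1}{4}\right) dx$$

$$= \int_0^{1/3} \frac{1}{2}(2x + 1)\, dx = \frac{1}{2}\left[x^2 + x\right]_0^{1/3} = \frac{1}{2}\left(\frac{1}{9} + \frac{1}{3}\right) = \frac{1}{2} \cdot \frac{4}{9} = \frac{4}{18} = \frac{2}{9},$$

while

$$P_{\tilde{x}}\left(-\infty, \tfrac{1}{3}\right] = \int_{-\infty}^{1/3} \underbrace{\int_{-\infty}^{\infty} f_{\tilde{x}, \tilde{y}}(x, y)\, dy}_{f_{\tilde{x}}(x)}\, dx = \int_0^{1/3} \underbrace{\int_0^1 \frac{2}{3}(x + 2y)\, dy}_{\frac{2}{3}(x + 1)}\, dx = \frac{7}{27}.$$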
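The two probabilities of this example, $2/9$ and $7/27$, can be reproduced exactly. The sketch below is one possible check, not the method of the notes: it integrates with a midpoint rule in rational arithmetic, which gives exact results here because every integrand is linear in the integration variable. The helper names are hypothetical.

```python
from fractions import Fraction

def joint(x, y):
    """f(x, y) = (2/3)(x + 2y) on the open unit square, zero elsewhere."""
    return Fraction(2, 3) * (x + 2 * y) if 0 < x < 1 and 0 < y < 1 else Fraction(0)

def midpoint(g, a, b, steps=4):
    """Midpoint rule in exact rational arithmetic; exact for linear integrands."""
    h = (b - a) / steps
    return sum((g(a + (2 * k + 1) * h / 2) for k in range(steps)), Fraction(0)) * h

def marginal_y(y):
    return midpoint(lambda x: joint(x, y), Fraction(0), Fraction(1))

def marginal_x(x):
    return midpoint(lambda y: joint(x, y), Fraction(0), Fraction(1))

def conditional_x_given_y(x, y):
    return joint(x, y) / marginal_y(y)

y0, c = Fraction(1, 4), Fraction(1, 3)
conditional_prob = midpoint(lambda x: conditional_x_given_y(x, y0), Fraction(0), c)
unconditional_prob = midpoint(marginal_x, Fraction(0), c)
print(conditional_prob, unconditional_prob)  # 2/9 7/27
```

The conditional and unconditional probabilities of $\{\tilde{x} \leq 1/3\}$ differ, which reflects the fact that $\tilde{x}$ and $\tilde{y}$ are not independent under this density.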

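If we have more than 2 random variables, we can generalize the previous conditional probability and density functions. For instance,

$$f_{\tilde{x}_1, \tilde{x}_3 \mid \tilde{x}_2, \tilde{x}_4}(x_1, x_3 \mid x_2, x_4) = \frac{f_{\tilde{x}_1, \tilde{x}_2, \tilde{x}_3, \tilde{x}_4}(x_1, x_2, x_3, x_4)}{f_{\tilde{x}_2, \tilde{x}_4}(x_2, x_4)},$$

where

$$f_{\tilde{x}_2, \tilde{x}_4}(x_2, x_4) = \sum_{x_1 \in \tilde{x}_1(\Omega)} \sum_{x_3 \in \tilde{x}_3(\Omega)} f_{\tilde{x}_1, \tilde{x}_2, \tilde{x}_3, \tilde{x}_4}(x_1, x_2, x_3, x_4) > 0,$$

or

$$f_{\tilde{x}_2, \tilde{x}_4}(x_2, x_4) = \int_{\mathbb{R}} \int_{\mathbb{R}} f_{\tilde{x}_1, \tilde{x}_2, \tilde{x}_3, \tilde{x}_4}(x_1, x_2, x_3, x_4)\, dx_1\, dx_3 > 0.$$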

