
4. Special Distributions


Consider a random variable/vector $\tilde x : (\Omega, \mathcal{F}, P) \longrightarrow (\mathbb{R}^n, \mathcal{B})$.

A discrete distribution is fully characterized by its probability function,
$f_{\tilde x} : \tilde x(\Omega) \longrightarrow [0,1]$, since
$$P_{\tilde x}(B) = P\{\tilde x \in B\} = \sum_{x \in B} f_{\tilde x}(x), \quad \text{for all } B \in \mathcal{B}.$$

An absolutely continuous distribution is fully characterized by its density
function, $f_{\tilde x} : (\mathbb{R}, \mathcal{B}) \longrightarrow (\mathbb{R}, \mathcal{B})$, since
$$P_{\tilde x}(B) = P\{\tilde x \in B\} = \int_B f_{\tilde x}(x)\,dx, \quad \text{for all } B \in \mathcal{B}.$$

Let $\tilde x$ be an absolutely continuous random variable with density $f_{\tilde x}$.
We assume, without loss of generality, that the Borel set
$A = \{x \in \mathbb{R}^n \mid f_{\tilde x}(x) \neq 0\}$ is open in $\mathbb{R}^n$.
Under this convention, the closure of $A$ is the support $\mathrm{supp}(P_{\tilde x})$
of the distribution of the absolutely continuous random variable $\tilde x$.
4.1. The discrete uniform distribution and the Dirac distribution

The random variable/vector $\tilde x : (\Omega, \mathcal{F}, P) \to (\mathbb{R}^n, \mathcal{B})$ has the
discrete uniform distribution if its probability function is
$$f(x) = \frac{1}{k}, \quad \text{for } x = \underbrace{x_1, \ldots, x_k}_{\tilde x(\Omega)}, \text{ with } x_i \neq x_j \text{ for } i \neq j.$$

If $k = 1$, then $\tilde x$ is a constant.


Example: Consider the discrete random vector $(\tilde x_1, \tilde x_2)$, where $\tilde x_1$ is the
number of points when rolling a die and $\tilde x_2$ is the number of heads when
tossing a coin. Its probability function $f(x_1, x_2)$ is summarized in the
following table:

   x2 \ x1  |   1     2     3     4     5     6
   ---------+------------------------------------
      0     | 1/12  1/12  1/12  1/12  1/12  1/12
      1     | 1/12  1/12  1/12  1/12  1/12  1/12

Therefore,
$$f(x_1, x_2) = \frac{1}{12}, \quad \text{for } (x_1, x_2) \in \{1, 2, 3, 4, 5, 6\} \times \{0, 1\}.$$


The Dirac distribution $P_{\tilde x}$ of the random variable
$\tilde x : (\Omega, \mathcal{F}, P) \to (\mathbb{R}, \mathcal{B})$ satisfies
$$P_{\tilde x}(A) = \begin{cases} 1 & \text{if } a \in A \\ 0 & \text{if } a \notin A, \end{cases}$$
for all $A \in \mathcal{B}$.

Note that $P_{\tilde x}\{a\} = 1$, i.e., the value $a$ is taken almost surely by the
random variable $\tilde x$.

Some people define the density function of the previous Dirac distribution as a
function such that $f_{\tilde x}(x) = \infty$ if $x = a$, $f_{\tilde x}(x) = 0$ if $x \neq a$, and
$\int_{\mathbb{R}} f_{\tilde x}(x)\,dx = 1$.

Obviously, there is no density function with those properties, even though it
can be viewed as the limit of a sequence of strictly positive density functions
converging to zero for all $x \neq a$.

[Portrait: Paul Dirac (1902 - 1984)]


4.2. The Bernoulli, binomial, Pascal, geometric, and hypergeometric
distributions

[Portrait: Jacob Bernoulli (1654 - 1705)]


The random variable $\tilde x : (\Omega, \mathcal{F}, P) \to (\mathbb{R}, \mathcal{B})$ has the Bernoulli
distribution if its probability function (pmf) is
$$f(x; \theta) = \theta^x (1-\theta)^{1-x}, \quad \text{for } x = 0, 1,$$
or
$$f(x; \theta) = \begin{cases} 1-\theta & \text{for } x = 0 \\ \theta & \text{for } x = 1. \end{cases}$$

Mean and variance:
$$\mu = \theta \quad \text{and} \quad \sigma^2 = \theta(1-\theta).$$


The binomial distribution.

The random variable $\tilde x : (\Omega, \mathcal{F}, P) \to (\mathbb{R}, \mathcal{B})$ has a binomial
distribution (or is binomial) if its probability function is
$$b(x; n, \theta) = \binom{n}{x} \theta^x (1-\theta)^{n-x}, \quad \text{for } x = 0, 1, \ldots, n.$$

Motivation: $\theta$ is the probability of a success in each trial. Then,
$b(x; n, \theta)$ gives the probability of $x$ successes in $n$ independent trials.

Note: $b(x; 1, \theta) = \theta^x (1-\theta)^{1-x}$, for $x = 0, 1$, is the Bernoulli pmf.

Obviously, if $\tilde x_1, \ldots, \tilde x_n$ are independently distributed random variables
having a Bernoulli distribution with parameter $\theta$, then their sum
$\tilde S = \sum_{i=1}^n \tilde x_i$ has a binomial distribution with parameters $n$ and $\theta$.

Similarly, if $\tilde x_1, \ldots, \tilde x_m$ are independently distributed random variables
having a binomial distribution with parameters $n_i$ and $\theta$, for $i = 1, \ldots, m$,
then their sum $\tilde y = \sum_{i=1}^m \tilde x_i$ has a binomial distribution with
parameters $\sum_{i=1}^m n_i$ and $\theta$.
Example: Let $\tilde x$ be the number of heads when tossing 4 fair coins.
$$P\{\tilde x = x\} = f_{\tilde x}(x) = \underbrace{\binom{4}{x}\left(\frac{1}{2}\right)^x \left(\frac{1}{2}\right)^{4-x}}_{b(x;\,4,\,1/2)} = \binom{4}{x}\left(\frac{1}{2}\right)^4 = \frac{1}{16}\binom{4}{x}, \quad \text{for } x = \underbrace{0, 1, 2, 3, 4}_{\tilde x(\Omega)},$$
or
$$f_{\tilde x}(x) = b\left(x; 4, \tfrac{1}{2}\right) = \begin{cases}
1/16 = 0.0625 & \text{for } x = 0 \\
4/16 = 0.25 & \text{for } x = 1 \\
6/16 = 0.375 & \text{for } x = 2 \\
4/16 = 0.25 & \text{for } x = 3 \\
1/16 = 0.0625 & \text{for } x = 4.
\end{cases}$$

[Figure: probability histogram, which is symmetric iff $\theta = 1/2$.]

[Figure: probability bar chart, which is symmetric iff $\theta = 1/2$.]


Example: Let $\tilde x$ be the number of heads when tossing 4 coins, where each
coin is unbalanced and the probability of a head is $\theta = 1/3$.
$$P\{\tilde x = x\} = f_{\tilde x}(x) = \underbrace{\binom{4}{x}\left(\frac{1}{3}\right)^x \left(\frac{2}{3}\right)^{4-x}}_{b(x;\,4,\,1/3)}, \quad \text{for } x = \underbrace{0, 1, 2, 3, 4}_{\tilde x(\Omega)},$$
or
$$f_{\tilde x}(x) = b\left(x; 4, \tfrac{1}{3}\right) = \begin{cases}
0.1975 & \text{for } x = 0 \\
0.3951 & \text{for } x = 1 \\
0.2963 & \text{for } x = 2 \\
0.0988 & \text{for } x = 3 \\
0.0123 & \text{for } x = 4.
\end{cases}$$

[Figure: probability histogram, which is not symmetric since $\theta = 1/3$.]

[Figure: probability bar chart, which is not symmetric since $\theta = 1/3$.]
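A quick numerical check of the two examples above (a sketch using scipy, which the original slides do not use):

```python
# Binomial pmfs for the fair and the unbalanced coin examples.
from scipy.stats import binom

# Fair coin: b(x; 4, 1/2) should give 1/16, 4/16, 6/16, 4/16, 1/16.
print([binom.pmf(x, 4, 0.5) for x in range(5)])

# Unbalanced coin: b(x; 4, 1/3) should give 0.1975, 0.3951, 0.2963, 0.0988, 0.0123.
print([round(binom.pmf(x, 4, 1/3), 4) for x in range(5)])
```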


Moment-generating function of a binomial random variable:
$$M_{\tilde x}(t) = [1 + \theta(e^t - 1)]^n.$$

Proof:
$$M_{\tilde x}(t) = E\left(e^{t\tilde x}\right) = \sum_{x=0}^n e^{tx} \underbrace{\binom{n}{x}\theta^x (1-\theta)^{n-x}}_{b(x;\,n,\,\theta)} = \sum_{x=0}^n \binom{n}{x}\left(\theta e^t\right)^x (1-\theta)^{n-x} = [\theta e^t + (1-\theta)]^n = [1 + \theta(e^t - 1)]^n.$$

Then,
$$\mu = M'_{\tilde x}(0) = \left. n[1 + \theta(e^t - 1)]^{n-1}\theta e^t \right|_{t=0} = n\theta$$
and
$$\sigma^2 = E\left(\tilde x^2\right) - [E(\tilde x)]^2 = M''_{\tilde x}(0) - \mu^2$$
$$= \left.\left\{ n(n-1)[1 + \theta(e^t - 1)]^{n-2}\theta^2 e^{2t} + n[1 + \theta(e^t - 1)]^{n-1}\theta e^t \right\}\right|_{t=0} - n^2\theta^2$$
$$= n(n-1)\theta^2 + n\theta - n^2\theta^2 = -n\theta^2 + n\theta = n\theta(1-\theta).$$
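The differentiation above can be verified symbolically; a minimal sketch with sympy (not part of the original slides):

```python
# Differentiate the binomial MGF at t = 0 to recover the mean and variance.
import sympy as sp

t, theta = sp.symbols('t theta')
n = sp.Symbol('n', positive=True)
M = (1 + theta*(sp.exp(t) - 1))**n        # binomial MGF

mu = sp.diff(M, t).subs(t, 0)             # M'(0) = n*theta
second = sp.diff(M, t, 2).subs(t, 0)      # M''(0) = n(n-1)theta^2 + n*theta
var = sp.simplify(second - mu**2)         # n*theta*(1 - theta)
print(sp.simplify(mu), var)
```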
The Pascal (negative binomial or binomial waiting-time) distribution.

[Portrait: Blaise Pascal (1623 - 1662)]


The random variable $\tilde x : (\Omega, \mathcal{F}, P) \to (\mathbb{R}, \mathcal{B})$ has a Pascal
distribution if its probability function is
$$b^*(x; k, \theta) = \binom{x-1}{k-1}\theta^k (1-\theta)^{x-k}, \quad \text{for } x = k, k+1, \ldots$$

Motivation: $\theta$ is the probability of success in each trial. Then,
$b^*(x; k, \theta)$ gives the probability that the $k$th success will occur on the
$x$th independent trial.

Note: $b^*(x; k, \theta) = \theta\, b(k-1; x-1, \theta)$.

Mean, variance, and moment-generating function:
$$\mu = \frac{k}{\theta}, \qquad \sigma^2 = \frac{k(1-\theta)}{\theta^2},$$
$$M_{\tilde x}(t) = \left[\frac{\theta e^t}{1 - (1-\theta)e^t}\right]^k \quad \text{for } t < \underbrace{-\ln(1-\theta)}_{+}.$$


The geometric distribution.

The random variable $\tilde x : (\Omega, \mathcal{F}, P) \to (\mathbb{R}, \mathcal{B})$ has a geometric
distribution if its probability function is
$$g(x; \theta) = \theta(1-\theta)^{x-1}, \quad \text{for } x = 1, 2, \ldots$$

Motivation: $\theta$ is the probability of success in each trial. Then, $g(x; \theta)$
gives the probability that the first success will occur on the $x$th independent
trial.

Note: $g(x; \theta) = b^*(x; 1, \theta)$.

Thus, the moment-generating function is
$$M_{\tilde x}(t) = \frac{\theta e^t}{1 - (1-\theta)e^t} \quad \text{for } t < \underbrace{-\ln(1-\theta)}_{+}.$$

Mean and variance:
$$\mu = M'_{\tilde x}(0) = \frac{1}{\theta} \quad \text{and} \quad \sigma^2 = M''_{\tilde x}(0) - \mu^2 = \frac{1-\theta}{\theta^2}.$$
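A quick numerical sanity check of these formulas (scipy's geom uses the same "first success on trial x" convention as the slides; this is an illustration, not part of the original):

```python
# Geometric mean, variance, and pmf versus the closed-form expressions.
from scipy.stats import geom

theta = 0.25
print(geom.mean(theta), 1/theta)                 # both 4.0
print(geom.var(theta), (1 - theta)/theta**2)     # both 12.0
print(geom.pmf(3, theta), theta*(1 - theta)**2)  # both 0.140625
```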


The hypergeometric distribution.

The random variable $\tilde x : (\Omega, \mathcal{F}, P) \to (\mathbb{R}, \mathcal{B})$ has a hypergeometric
distribution if its probability function is
$$h(x; n, N, a) = \frac{\dbinom{a}{x}\dbinom{N-a}{n-x}}{\dbinom{N}{n}},$$
for $x = 0, 1, \ldots, n$, with $x \leq a$ and $n - x \leq N - a$.

Motivation: Consider a set of $N$ elements of which $a$ are successes and $N - a$
are failures. We choose, without replacement, $n$ of the $N$ elements contained
in the set. Then, $h(x; n, N, a)$ gives the probability of getting $x$ successes.

Mean and variance:
$$\mu = \frac{na}{N} \quad \text{and} \quad \sigma^2 = \frac{na(N-a)(N-n)}{N^2(N-1)}.$$

Note:
$$\lim_{\substack{N \to \infty \\ a/N = \theta}} h(x; n, N, a) = b(x; n, \theta).$$
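The limit in the last note can be illustrated numerically; a sketch (scipy's hypergeom is parameterized as population size, number of successes, draws):

```python
# h(x; n, N, a) approaches b(x; n, theta) as N grows with a/N = theta fixed.
from scipy.stats import hypergeom, binom

n, theta, x = 5, 0.4, 2
for N in (10, 100, 1000, 100000):
    a = int(theta * N)
    print(N, hypergeom.pmf(x, N, a, n), binom.pmf(x, n, theta))
```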


4.3. The multinomial and multivariate hypergeometric distributions

The multinomial distribution.

The random vector $\tilde x = (\tilde x_1, \tilde x_2, \ldots, \tilde x_k) : (\Omega, \mathcal{F}, P) \to (\mathbb{R}^k, \mathcal{B})$ has
the multinomial distribution if its probability function is
$$m(x_1, x_2, \ldots, x_k; n, \theta_1, \theta_2, \ldots, \theta_k) = \binom{n}{x_1, x_2, \ldots, x_k}\theta_1^{x_1}\theta_2^{x_2} \cdots \theta_k^{x_k},$$
for $x_i = 0, 1, \ldots, n$, with $\sum_{i=1}^k x_i = n$ and $\sum_{i=1}^k \theta_i = 1$.

Recall that
$$\binom{n}{x_1, x_2, \ldots, x_k} = \frac{n!}{x_1! \cdot x_2! \cdots x_k!}.$$

Motivation: There are $n$ independent trials permitting $k$ exclusive outcomes,
whose respective probabilities are $\theta_1, \theta_2, \ldots, \theta_k$ (with $\sum_{i=1}^k \theta_i = 1$). We
shall refer to the outcomes as being of the first type, the second type, ..., and
the $k$th type. Then, $m(x_1, x_2, \ldots, x_k; n, \theta_1, \theta_2, \ldots, \theta_k)$ gives the probability
of getting $x_1$ outcomes of the first type, $x_2$ outcomes of the second type, ...,
and $x_k$ outcomes of the $k$th type (with $\sum_{i=1}^k x_i = n$).

Note: $m(x, n-x; n, \theta, 1-\theta) = b(x; n, \theta)$.

Example: Consider a very large population. 50% of the individuals have brown
eyes, 30% have black eyes, and 20% have blue eyes. We pick 8 individuals at
random. The probability of picking 5 individuals with brown eyes, 2 with black
eyes, and 1 with blue eyes is
$$m(5, 2, 1; 8, 0.5, 0.3, 0.2) = \frac{8!}{5!\,2!\,1!} \cdot 0.5^5 \cdot 0.3^2 \cdot 0.2^1 = 0.0945.$$
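The eye-colour example checked with scipy (an assumption; the slides compute the value by hand):

```python
# Multinomial pmf for 5 brown, 2 black, 1 blue out of 8 draws.
from scipy.stats import multinomial

print(multinomial.pmf([5, 2, 1], n=8, p=[0.5, 0.3, 0.2]))  # ~0.0945
```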


The multivariate hypergeometric distribution.

The random vector $\tilde x = (\tilde x_1, \tilde x_2, \ldots, \tilde x_k) : (\Omega, \mathcal{F}, P) \to (\mathbb{R}^k, \mathcal{B})$ has
the multivariate hypergeometric distribution if its probability function is
$$f(x_1, x_2, \ldots, x_k; n, N, a_1, a_2, \ldots, a_k) = \frac{\dbinom{a_1}{x_1}\dbinom{a_2}{x_2} \cdots \dbinom{a_k}{x_k}}{\dbinom{N}{n}},$$
for $x_i = 0, 1, \ldots, n$ and $x_i \leq a_i$, where $\sum_{i=1}^k x_i = n$ and $\sum_{i=1}^k a_i = N$.

Motivation: There is a set of $N$ elements, of which $a_1$ are elements of the first
type, $a_2$ are elements of the second type, ..., and $a_k$ are elements of the $k$th
type (with $\sum_{i=1}^k a_i = N$). We choose, without replacement, $n$ of the $N$
elements in the set. Then $f(x_1, x_2, \ldots, x_k; n, N, a_1, a_2, \ldots, a_k)$ gives the
probability of getting $x_1$ outcomes of the first type, $x_2$ outcomes of the second
type, ..., and $x_k$ outcomes of the $k$th type (with $\sum_{i=1}^k x_i = n$).

Note: $f(x, n-x; n, N, a, N-a) = h(x; n, N, a)$.

Note:
$$\lim_{\substack{N \to \infty \\ a_1/N = \theta_1,\ \ldots,\ a_k/N = \theta_k}} f(x_1, x_2, \ldots, x_k; n, N, a_1, a_2, \ldots, a_k) = m(x_1, x_2, \ldots, x_k; n, \theta_1, \theta_2, \ldots, \theta_k).$$

Example approximating the multinomial with $N = 2000$:
$$f(5, 2, 1; 8, 2000, 1000, 600, 400) = \frac{\dbinom{1000}{5}\dbinom{600}{2}\dbinom{400}{1}}{\dbinom{2000}{8}} = 0.0947 \approx m(5, 2, 1; 8, 0.5, 0.3, 0.2) = 0.0945.$$
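The $N = 2000$ example computed directly with binomial coefficients (an illustration, not part of the original slides):

```python
# Multivariate hypergeometric probability via binomial coefficients.
from math import comb

p = comb(1000, 5) * comb(600, 2) * comb(400, 1) / comb(2000, 8)
print(p)  # ~0.0947, close to the multinomial value 0.0945
```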
4.4. Integration by parts for Lebesgue-Stieltjes integrals

Integration by parts.

From now on, whenever we write the integral of a function w.r.t. a measure, it
should be understood that the function is integrable w.r.t. that measure.

Assume that $F : M \longrightarrow \mathbb{R}$ and $G : M \longrightarrow \mathbb{R}$ are continuously differentiable
functions, where $M$ is an open subset of $\mathbb{R}$, and $[a, b] \subset M$. Then,
$$\int_{[a,b]} G(x)F'(x)\,dx = \underbrace{F(b)G(b) - F(a)G(a)}_{[F(x)G(x)]_a^b} - \int_{[a,b]} G'(x)F(x)\,dx.$$

Note: The interval of integration can be replaced by $(a, b)$, $(a, b]$, or $[a, b)$.

The previous Lebesgue integrals are equal to their Riemann counterparts, as
the functions $G \cdot F'$ and $G' \cdot F$ are continuous on $[a, b]$.

Remember that, if $F$ is a distribution function, then
$$\int_{(a,b]} G(x)\,dF(x) \equiv \int_{(a,b]} G(x)\,d\mu(x),$$
where $\mu$ is the Lebesgue-Stieltjes measure associated with $F$.

Assume that $F : \mathbb{R} \longrightarrow \mathbb{R}$ is a distribution function, $G : M \longrightarrow \mathbb{R}$ is a
continuously differentiable function, where $M$ is an open subset of $\mathbb{R}$, and
$[a, b] \subset M$. Then,
$$\int_{(a,b]} G(x)\,dF(x) = F(b)G(b) - F(a)G(a) - \int_{(a,b]} F(x)G'(x)\,dx. \tag{$\star$}$$

Note that $F(a^+) = F(a)$, because of the right-continuity of $F$.


Since $G$ is continuous on $[a, b]$,
$$\underbrace{\int_{(a,b]} G(x)\,dF(x)}_{\text{Lebesgue-Stieltjes integral}} = \underbrace{\int_a^b G(x)\,dF(x)}_{\text{Riemann-Stieltjes integral}}.$$

The last (Lebesgue) integral in $(\star)$ obviously satisfies
$$\int_{(a,b]} F(x)G'(x)\,dx = \int_{(a,b)} F(x)G'(x)\,dx = \int_{[a,b]} F(x)G'(x)\,dx = \int_{[a,b)} F(x)G'(x)\,dx,$$
since single points have zero Lebesgue measure.

Over the closed interval $[a, b]$:
$$\int_{[a,b]} G(x)\,dF(x) = G(a)\left(F(a) - F(a^-)\right) + F(b)G(b) - F(a)G(a) - \int_{[a,b]} F(x)G'(x)\,dx$$
$$= F(b)G(b) - F(a^-)G(a) - \int_{[a,b]} F(x)G'(x)\,dx.$$

Over the open interval $(a, b)$:
$$\int_{(a,b)} G(x)\,dF(x) = F(b)G(b) - F(a)G(a) - G(b)\left(F(b) - F(b^-)\right) - \int_{(a,b)} F(x)G'(x)\,dx$$
$$= F(b^-)G(b) - F(a)G(a) - \int_{(a,b)} F(x)G'(x)\,dx.$$

Over the half-open interval $[a, b)$:
$$\int_{[a,b)} G(x)\,dF(x) = G(a)\left(F(a) - F(a^-)\right) + \underbrace{F(b^-)G(b) - F(a)G(a) - \int_{(a,b)} F(x)G'(x)\,dx}_{\int_{(a,b)} G(x)\,dF(x)}$$
$$= F(b^-)G(b) - F(a^-)G(a) - \int_{[a,b)} F(x)G'(x)\,dx.$$
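For an absolutely continuous $F$ (so $dF = f\,dx$), formula $(\star)$ can be checked numerically; a minimal sketch with the exponential distribution function and $G(x) = x^2$ (names and values are illustrative):

```python
# Numerical check of (*) : int G dF = F(b)G(b) - F(a)G(a) - int F G' dx.
import numpy as np
from scipy.integrate import quad

F  = lambda x: 1 - np.exp(-x)     # distribution function
f  = lambda x: np.exp(-x)         # its density, so dF = f dx
G  = lambda x: x**2
dG = lambda x: 2*x

a, b = 0.5, 3.0
lhs = quad(lambda x: G(x)*f(x), a, b)[0]
rhs = F(b)*G(b) - F(a)*G(a) - quad(lambda x: F(x)*dG(x), a, b)[0]
print(lhs, rhs)   # both evaluate to the same number
```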


4.5. Lebesgue integration by change of variable: polar coordinates

Assume that (i) $g : M \longrightarrow \mathbb{R}$ is a continuously differentiable function, where
$M$ is an open subset of $\mathbb{R}$, and (ii) the function $g$ restricted to $g^{-1}([a,b])$,
$g : g^{-1}([a,b]) \longrightarrow [a,b]$, where $[a,b] \subset g(M)$ (or $g^{-1}([a,b]) \subset M$), is a
bijective function (or one-to-one correspondence). Let
$f : ([a,b], \mathcal{B}([a,b])) \longrightarrow (\mathbb{R}, \mathcal{B})$ be a Lebesgue integrable function. Then,
$$\int_{[a,b]} f(x)\,dx = \int_{g^{-1}([a,b])} f(\underbrace{g(y)}_{x})\left|g'(y)\right|dy.$$

See the handout for an informal proof.

Note that $g^{-1}([a,b])$ is the interval $\left[g^{-1}(a), g^{-1}(b)\right]$ if $g$ is increasing, or the
interval $\left[g^{-1}(b), g^{-1}(a)\right]$ if $g$ is decreasing.


Assume that (i) $g : M \longrightarrow \mathbb{R}^n$ is a continuously differentiable function, where
$M$ is an open subset of $\mathbb{R}^n$, and (ii) the function $g$ restricted to $g^{-1}(B)$,
$g : g^{-1}(B) \longrightarrow B$, where $B$ is a Borel set in $\mathbb{R}^n$ such that $B \subset g(M)$ (or
$g^{-1}(B) \subset M$), is a bijective function (or one-to-one correspondence).

Let $J_g(y_1, y_2, \ldots, y_n)$ be the Jacobian matrix of the function
$g : \underbrace{(y_1, \ldots, y_n)}_{y \,\in\, \mathbb{R}^n} \longmapsto \underbrace{(g_1(y_1, y_2, \ldots, y_n), \ldots, g_n(y_1, y_2, \ldots, y_n))}_{g(y) \,\in\, \mathbb{R}^n}$,
which is given by
$$J_g(y_1, y_2, \ldots, y_n) = \begin{pmatrix}
\frac{\partial g_1(y)}{\partial y_1} & \frac{\partial g_1(y)}{\partial y_2} & \cdots & \frac{\partial g_1(y)}{\partial y_n} \\
\frac{\partial g_2(y)}{\partial y_1} & \frac{\partial g_2(y)}{\partial y_2} & \cdots & \frac{\partial g_2(y)}{\partial y_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial g_n(y)}{\partial y_1} & \frac{\partial g_n(y)}{\partial y_2} & \cdots & \frac{\partial g_n(y)}{\partial y_n}
\end{pmatrix}.$$

Let $f : (B, \mathcal{B}(B)) \longrightarrow (\mathbb{R}, \mathcal{B})$ be a Lebesgue integrable function. Then,
$$\int_B f(x_1, x_2, \ldots, x_n)\,d(x_1, \ldots, x_n) = \int_{g^{-1}(B)} f(\underbrace{g(y_1, y_2, \ldots, y_n)}_{(x_1, x_2, \ldots, x_n)})\left|J_g(y_1, y_2, \ldots, y_n)\right|d(y_1, \ldots, y_n),$$
where $|J_g(y_1, y_2, \ldots, y_n)|$ is the absolute value of the determinant of
$J_g(y_1, y_2, \ldots, y_n)$.

Alternatively, we can assume that (i) $g : M \longrightarrow \mathbb{R}^n$ is a continuously
differentiable function, where $M$ is an open subset of $\mathbb{R}^n$, and (ii) the function
$g$ restricted to $A$, $g : A \longrightarrow g(A)$, where $A$ is a Borel set in $\mathbb{R}^n$ such that
$A \subset M$, is a bijective function (or one-to-one correspondence). Let
$f : (g(A), \mathcal{B}(g(A))) \longrightarrow (\mathbb{R}, \mathcal{B})$ be a Lebesgue integrable function. Then,
$$\int_{g(A)} f(x_1, x_2, \ldots, x_n)\,d(x_1, \ldots, x_n) = \int_A f(\underbrace{g(y_1, y_2, \ldots, y_n)}_{(x_1, x_2, \ldots, x_n)})\left|J_g(y_1, y_2, \ldots, y_n)\right|d(y_1, \ldots, y_n).$$


A very useful change of variable (or substitution) is the change to polar
coordinates in $\mathbb{R}^2$.

Consider the function
$$g : \mathbb{R}_{++} \times [0, 2\pi) \longrightarrow \mathbb{R}^2,$$
given by
$$(x, y) = g(r, \theta), \quad \text{with} \quad \begin{cases} x = r \cdot \cos\theta \\ y = r \cdot \sin\theta. \end{cases}$$

Note that
$$x^2 + y^2 = r^2\cos^2\theta + r^2\sin^2\theta = r^2.$$

Clearly, $g(\mathbb{R}_{++} \times [0, 2\pi)) = \mathbb{R}^2 \setminus (0,0)$, where $\mathbb{R}^2 \setminus (0,0)$ is the real plane
without the point $(0,0)$, which has zero Lebesgue measure.

Moreover, the function $g$ restricted to $g^{-1}\left(\mathbb{R}^2 \setminus (0,0)\right) = \mathbb{R}_{++} \times [0, 2\pi)$,
$g : \mathbb{R}_{++} \times [0, 2\pi) \longrightarrow \mathbb{R}^2 \setminus (0,0)$, is bijective.

$C$ is a circular region in $\mathbb{R}^2 \setminus (0,0)$ if $g^{-1}(C)$ is a measurable rectangle on
$\mathbb{R}_{++} \times [0, 2\pi)$, that is, if $g^{-1}(C)$ is the Cartesian product of two Borel sets,
$g^{-1}(C) = D_1 \times D_2$, with $D_1 \in \mathcal{B}(\mathbb{R}_{++})$ and $D_2 \in \mathcal{B}([0, 2\pi))$.

Obviously, we can define the inverse function $g^{-1}$ on $\mathbb{R}^2 \setminus (0,0)$ as
$$(r, \theta) = g^{-1}(x, y), \quad \text{with} \quad \begin{cases} r = \left(x^2 + y^2\right)^{1/2} \\ \theta = \arctan\left(\dfrac{y}{x}\right). \end{cases}$$

Moreover,
$$J_g(r, \theta) = \begin{pmatrix} \cos\theta & -r \cdot \sin\theta \\ \sin\theta & r \cdot \cos\theta \end{pmatrix},$$
so that
$$|J_g(r, \theta)| = \left|r\cos^2\theta + r\sin^2\theta\right| = |r| = r.$$


Therefore, by making the change to polar coordinates, we can compute the
following integral over a circular region $C$:
$$\int_C f(x, y)\,d(x, y) = \int_{g^{-1}(C)} f(r\cos\theta, r\sin\theta)\,r\,d(r, \theta) = \int_{D_1}\int_{D_2} f(r\cos\theta, r\sin\theta)\,r\,d\theta\,dr,$$
where we use Fubini's theorem in the last equality since $g^{-1}(C)$ is a
measurable rectangle on $\mathbb{R}_{++} \times [0, 2\pi)$, i.e., it is the Cartesian product of
two Borel sets, $g^{-1}(C) = D_1 \times D_2$, with $D_1 \in \mathcal{B}(\mathbb{R}_{++})$ and $D_2 \in \mathcal{B}([0, 2\pi))$.

Moreover, we can easily compute the following integral over a circular region $C$:
$$\int_C h\left(x^2 + y^2\right)d(x, y) = \int_{g^{-1}(C)} h(r^2)\,r\,d(r, \theta) = \int_{D_1}\int_{D_2} h(r^2)\,r\,d\theta\,dr = \left(\int_{D_1} h(r^2)\,r\,dr\right)\left(\int_{D_2} d\theta\right).$$


Example:
$$\int_0^\infty\int_0^\infty e^{-(x^2+y^2)}\,dx\,dy = \int_{\mathbb{R}_+ \times \mathbb{R}_+} e^{-(x^2+y^2)}\,d(x, y)$$
$$= \left(\int_{\mathbb{R}_{++}} e^{-r^2} r\,dr\right)\left(\int_{[0,\pi/2)} d\theta\right) = \left[\frac{-e^{-r^2}}{2}\right]_0^\infty \cdot [\theta]_0^{\pi/2} = \frac{1}{2} \cdot \frac{\pi}{2} = \frac{\pi}{4}.$$

Note that
$$\int_0^\infty\int_0^\infty e^{-(x^2+y^2)}\,dx\,dy = \left(\int_0^\infty e^{-x^2}\,dx\right)\left(\int_0^\infty e^{-y^2}\,dy\right) = M^2,$$
where $M = \int_0^\infty e^{-x^2}\,dx = \int_0^\infty e^{-y^2}\,dy$.

Therefore,
$$M = \int_0^\infty e^{-x^2}\,dx = \left(\frac{\pi}{4}\right)^{1/2} = \frac{\sqrt{\pi}}{2} \implies \int_{-\infty}^\infty e^{-x^2}\,dx = \sqrt{\pi}.$$
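Both integrals can be confirmed numerically; a sketch using scipy's quadrature routines (an illustration, not part of the original slides):

```python
# Numerical confirmation of the Gaussian integrals computed above.
import numpy as np
from scipy.integrate import quad, dblquad

val2d = dblquad(lambda y, x: np.exp(-(x**2 + y**2)),
                0, np.inf, lambda x: 0, lambda x: np.inf)[0]
val1d = quad(lambda x: np.exp(-x**2), -np.inf, np.inf)[0]
print(val2d, np.pi/4)           # ~0.78539816...
print(val1d, np.sqrt(np.pi))    # ~1.77245385...
```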


4.6. The uniform density

The absolutely continuous random variable $\tilde x : (\Omega, \mathcal{F}, P) \to (\mathbb{R}, \mathcal{B})$ is
uniform (or has a uniform distribution) on $(a, b)$ if its density function is
$$f(x) = \begin{cases} \dfrac{1}{b-a} & \text{for } x \in (a, b) \\[1ex] 0 & \text{otherwise.} \end{cases}$$

The distribution function is
$$F(x) = \begin{cases} 0 & \text{for } x \leq a \\[1ex] \dfrac{x-a}{b-a} & \text{for } x \in (a, b) \\[1ex] 1 & \text{for } x \geq b. \end{cases}$$
Let $a \leq c < d \leq b$. Then,
$$P\{c \leq \tilde x \leq d\} = P_{\tilde x}[c, d] = \frac{d-c}{b-a}$$
(and the probability is the same if either inequality is strict).

Mean:
$$\mu = E(\tilde x) = \int_{-\infty}^\infty x f(x)\,dx = \int_a^b x\left(\frac{1}{b-a}\right)dx = \frac{1}{b-a}\left[\frac{x^2}{2}\right]_a^b = \frac{1}{b-a}\left[\frac{b^2-a^2}{2}\right] = \frac{b+a}{2},$$
where the last equality follows since $(b+a)(b-a) = b^2 - a^2$.

Moreover,
$$E\left(\tilde x^2\right) = \int_{-\infty}^\infty x^2 f(x)\,dx = \int_a^b x^2\left(\frac{1}{b-a}\right)dx = \frac{1}{b-a}\left[\frac{x^3}{3}\right]_a^b = \frac{b^3-a^3}{3(b-a)},$$
so that
$$\mathrm{Var}(\tilde x) = E\left(\tilde x^2\right) - [E(\tilde x)]^2 = \frac{b^3-a^3}{3(b-a)} - \left(\frac{b+a}{2}\right)^2 = \frac{(b-a)^2}{12},$$
where the last equality is obtained after some tedious algebra.

Moment-generating function:
$$M_{\tilde x}(t) = \begin{cases} \dfrac{e^{tb} - e^{ta}}{t(b-a)} & \text{for } t \neq 0 \\[1ex] 1 & \text{for } t = 0. \end{cases}$$
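A quick check of the uniform moments and MGF value (a sketch; scipy parameterizes $U(a, b)$ as `uniform(loc=a, scale=b-a)`, and the numbers are illustrative):

```python
# Uniform mean, variance, and an MGF evaluation versus the formulas above.
import numpy as np
from scipy.stats import uniform

a, b = 2.0, 5.0
U = uniform(loc=a, scale=b - a)
print(U.mean(), (a + b)/2)        # both 3.5
print(U.var(), (b - a)**2/12)     # both 0.75

t = 0.3
mc = np.exp(t * U.rvs(size=1_000_000, random_state=0)).mean()   # Monte Carlo E[e^{tX}]
print(mc, (np.exp(t*b) - np.exp(t*a))/(t*(b - a)))              # approximately equal
```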


4.7. The gamma, exponential, and chi-square distributions

The gamma distribution.

The random variable $\tilde x : (\Omega, \mathcal{F}, P) \to (\mathbb{R}, \mathcal{B})$ has the gamma distribution
if its density is
$$f(x; \alpha, \beta) = \begin{cases} \dfrac{1}{\beta^\alpha\,\Gamma(\alpha)}\, x^{\alpha-1} e^{-x/\beta} & \text{for } x > 0 \\[1ex] 0 & \text{otherwise,} \end{cases}$$
with $\alpha > 0$, $\beta > 0$, and where
$$\Gamma(\alpha) = \int_0^\infty y^{\alpha-1} e^{-y}\,dy, \quad \text{for } \alpha > 0 \quad \text{(the gamma function).}$$

[Figure: densities of the gamma distribution.]

Making the change of variable
$$x = g(y) = \beta y, \quad y = g^{-1}(x) = \frac{x}{\beta}, \quad \text{so that} \quad \frac{dx}{dy} = g'(y) = \beta > 0,$$
we can check that
$$\int_0^\infty f(x; \alpha, \beta)\,dx = \int_0^\infty k x^{\alpha-1} e^{-x/\beta}\,dx = \int_0^\infty k(\beta y)^{\alpha-1} e^{-y}\beta\,dy = k\beta^\alpha \underbrace{\int_0^\infty y^{\alpha-1} e^{-y}\,dy}_{\Gamma(\alpha)} = 1.$$
Therefore,
$$k = \frac{1}{\beta^\alpha\,\Gamma(\alpha)}.$$


Properties of the gamma function $\Gamma(\alpha)$.

(a) $\Gamma(1) = \int_0^\infty e^{-y}\,dy = \left[-e^{-y}\right]_0^\infty = \lim_{y \to \infty}(-e^{-y}) + e^0 = 1$.

(b) $\Gamma(\alpha) = (\alpha-1)\Gamma(\alpha-1)$ for $\alpha > 1$.

Proof. Integrating by parts:
$$\Gamma(\alpha) = \int_0^\infty \underbrace{y^{\alpha-1}}_{F(y)}\underbrace{e^{-y}}_{G'(y)}\,dy = -\left[y^{\alpha-1}e^{-y}\right]_0^\infty - \int_0^\infty \underbrace{(\alpha-1)y^{\alpha-2}}_{F'(y)}\underbrace{\left(-e^{-y}\right)}_{G(y)}\,dy$$
$$= 0 + (\alpha-1)\int_0^\infty y^{\alpha-2}e^{-y}\,dy = (\alpha-1)\Gamma(\alpha-1).$$

Note that (a), (b), and the continuity of the gamma function imply that
$$1 = \lim_{\alpha \to 1^+}\Gamma(\alpha) = \lim_{x\,\equiv\,\alpha-1 \to 0^+}[x \cdot \Gamma(x)] \implies \lim_{x \to 0^+}\Gamma(x) = \infty.$$

(c) $\Gamma(\alpha) = (\alpha-1)!$ when $\alpha$ is a strictly positive integer.

(d) $\Gamma\left(\dfrac{1}{2}\right) = \sqrt{\pi}$.

Proof: See handout 1.A.

Corollary. $\displaystyle\int_0^\infty e^{-\frac{1}{2}z^2}\,dz = \sqrt{\frac{\pi}{2}}$.

Proof: See handout 1.B.

[Figure: the gamma function $\Gamma(\alpha)$ plotted for $\alpha$ between 0 and 4.]
Let $\tilde x$ be a random variable that has a gamma distribution with parameters
$\alpha$ and $\beta$. Then,

(a) $\mu'_r = \dfrac{\beta^r\,\Gamma(\alpha+r)}{\Gamma(\alpha)}$.

Proof: See handout 1.C.

(b) $M_{\tilde x}(t) = (1-\beta t)^{-\alpha}$ for $t < 1/\beta$.

Proof: See handout 1.D.

Corollary.
(i) $\mu = \mu'_1 = \alpha\beta$,
(ii) $\mu'_2 = \alpha(\alpha+1)\beta^2$,
(iii) $\sigma^2 = \mu'_2 - \mu^2 = \alpha\beta^2$.
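These moment formulas are easy to check numerically; a sketch (scipy uses shape $\alpha$ and scale $\beta$; the parameter values are illustrative):

```python
# Gamma moments versus mu'_r = beta**r * Gamma(alpha + r)/Gamma(alpha).
from scipy.stats import gamma as gamma_dist
from scipy.special import gamma as Gamma

alpha, beta = 2.5, 1.5
X = gamma_dist(a=alpha, scale=beta)
print(X.mean(), alpha*beta)          # both 3.75
print(X.var(), alpha*beta**2)        # both 5.625

r = 3
print(X.moment(r), beta**r * Gamma(alpha + r)/Gamma(alpha))  # equal
```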


The exponential distribution.

The random variable $\tilde x : (\Omega, \mathcal{F}, P) \to (\mathbb{R}, \mathcal{B})$ has the exponential
distribution if its density is the gamma density with $\alpha = 1$ and $\beta = \theta$,
$$f(x; \theta) = \begin{cases} \dfrac{1}{\theta}\,e^{-x/\theta} & \text{for } x > 0 \\[1ex] 0 & \text{otherwise.} \end{cases}$$

This distribution is used to model waiting time.

The distribution function is
$$P\{\tilde x \leq x\} = F(x) = \begin{cases} 1 - e^{-x/\theta} & \text{for } x > 0 \\ 0 & \text{otherwise,} \end{cases}$$
which gives the probability of waiting less than $x$ units of time.

Mean and variance:
$$\mu = \alpha\beta = \theta, \qquad \sigma^2 = \alpha\beta^2 = \theta^2.$$

Moment-generating function:
$$M_{\tilde x}(t) = (1-\beta t)^{-\alpha} = \frac{1}{1-\theta t} \quad \text{for } t < 1/\theta.$$
The chi-square ($\chi^2$) distribution.

[Portrait: Karl Pearson (1857 - 1936)]

The random variable $\tilde x : (\Omega, \mathcal{F}, P) \to (\mathbb{R}, \mathcal{B})$ has the $\chi^2$ (chi-square)
distribution if its density is the gamma density with $\alpha = n/2$ and $\beta = 2$,
$$f(x; n) = \begin{cases} \dfrac{1}{2^{n/2}\,\Gamma\left(\frac{n}{2}\right)}\, x^{\frac{n-2}{2}} e^{-x/2} & \text{for } x > 0 \\[1ex] 0 & \text{otherwise.} \end{cases}$$


Mean and variance:
$$\mu = \alpha\beta = n, \qquad \sigma^2 = \alpha\beta^2 = 2n.$$

Moment-generating function:
$$M_{\tilde x}(t) = (1-\beta t)^{-\alpha} = (1-2t)^{-n/2} \quad \text{for } t < 1/2.$$

Notation:
$\tilde x \sim B(n, \theta)$: $\tilde x$ has the binomial distribution.
$\tilde x \sim U(a, b)$: $\tilde x$ has the uniform distribution on $(a, b)$.
$\tilde x \sim G(\alpha, \beta)$: $\tilde x$ has the gamma distribution.
$\tilde x \sim \chi^2_n$: $\tilde x$ has the chi-square distribution with $n$ degrees of freedom.

[Figure: densities of the chi-square distributions with 2, 4, and 6 degrees of freedom.]


4.8. The beta distribution

The random variable $\tilde x : (\Omega, \mathcal{F}, P) \to (\mathbb{R}, \mathcal{B})$ has the beta distribution if
its density is
$$f(x; \alpha, \beta) = \begin{cases} \dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1} & \text{for } x \in (0, 1) \\[1ex] 0 & \text{otherwise,} \end{cases}$$
with $\alpha > 0$, $\beta > 0$.

If $\alpha = 1$ and $\beta = 1$, then the beta distribution becomes the uniform
distribution on $(0, 1)$.

[Figure: densities of the beta distribution.]

Mean and variance:
$$\mu = \frac{\alpha}{\alpha+\beta}, \qquad \sigma^2 = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}.$$
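A quick check of the beta mean and variance (an illustration with arbitrary parameter values):

```python
# Beta mean and variance versus the closed-form expressions above.
from scipy.stats import beta as beta_dist

a_, b_ = 2.0, 3.0
B = beta_dist(a_, b_)
print(B.mean(), a_/(a_ + b_))                          # both 0.4
print(B.var(), a_*b_/((a_ + b_)**2 * (a_ + b_ + 1)))   # both 0.04
```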
4.9. The normal distribution

[Portrait: Carl Friedrich Gauss (1777 - 1855)]

The random variable $\tilde x : (\Omega, \mathcal{F}, P) \to (\mathbb{R}, \mathcal{B})$ has the normal distribution
(or is normal) if its density is
$$n(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \quad \text{with } \sigma > 0, \text{ for all } x \in \mathbb{R}.$$

Notation: $\tilde x \sim N(\mu, \sigma^2)$.

A random variable $\tilde z$ has the standard normal distribution (or is standard
normal) if $\tilde z \sim N(0, 1)$. Thus, the density of a standard normal random
variable is
$$n(z; 0, 1) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}, \quad \text{for all } z \in \mathbb{R}.$$

Normal densities (i.e., densities of normal distributions) are symmetric
around $\mu$.
Properties of the normal distribution $N(\mu, \sigma^2)$.

(1) $\displaystyle\int_{-\infty}^\infty n(x; \mu, \sigma)\,dx = 1$.

Proof: See handout 2.A.

(2) $M_{\tilde x}(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}$. In particular, if $\tilde z \sim N(0, 1)$, then $M_{\tilde z}(t) = e^{t^2/2}$.

Proof: See handout 2.B.

(3)
$$M'_{\tilde x}(t) = (\mu + \sigma^2 t)M_{\tilde x}(t) \implies M'_{\tilde x}(0) = \mu = E(\tilde x).$$
$$M''_{\tilde x}(t) = \sigma^2 M_{\tilde x}(t) + (\mu + \sigma^2 t)^2 M_{\tilde x}(t) \implies M''_{\tilde x}(0) = \sigma^2 + \mu^2 = E(\tilde x^2).$$
$$\mathrm{Var}(\tilde x) = E(\tilde x^2) - [E(\tilde x)]^2 = \sigma^2.$$


Note: If $\mu_{\tilde x}$ and $\sigma_{\tilde x} > 0$ are the mean and the standard deviation,
respectively, of the random variable $\tilde x$, then the "standardized" random
variable $\tilde z = \dfrac{\tilde x - \mu_{\tilde x}}{\sigma_{\tilde x}}$ has $\mu_{\tilde z} = 0$ and $\sigma^2_{\tilde z} = \sigma_{\tilde z} = 1$.

(4) If $\tilde x \sim N(\mu, \sigma^2)$ and $\tilde z = \dfrac{\tilde x - \mu}{\sigma}$, then $\tilde z \sim N(0, 1)$.

Proof: Let $x = g(z) = \mu + \sigma z$, so that $z = g^{-1}(x) = \dfrac{x-\mu}{\sigma}$ and
$g'(z) = \sigma > 0$.

For all $A \in \mathcal{B}$, we have
$$P_{\tilde z}(A) = P\{\tilde z \in A\} = P\left\{g^{-1}(\tilde x) \in A\right\} = P\{\tilde x \in g(A)\}$$
$$= P_{\tilde x}(g(A)) = \int_{g(A)} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} dx = \int_A \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}\sigma\,dz$$
$$= \int_A \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}\,dz = \int_A n(z; 0, 1)\,dz.$$

Therefore, $\tilde z \sim N(0, 1)$. Q.E.D.
The distribution function of the standard normal distribution is tabulated (see
the table at the end of these notes).

[Figure: the table gives the area of the shaded region under the standard
normal density, i.e., the area between 0 and z.]


If $a < 0$, then $N(a) = 1 - N(-a)$.

If $a < b < 0$, then
$$N(b) - N(a) = 1 - N(-b) - [1 - N(-a)] = N(-a) - N(-b).$$

If $a < 0$ and $b > 0$, then
$$N(b) - N(a) = N(b) - [1 - N(-a)] = N(b) + N(-a) - 1.$$

Then, if $\tilde x \sim N(\mu, \sigma^2)$,
$$P\{a \leq \tilde x \leq b\} = P\left\{\frac{a-\mu}{\sigma} \leq \frac{\tilde x - \mu}{\sigma} \leq \frac{b-\mu}{\sigma}\right\} = P\left\{\frac{a-\mu}{\sigma} \leq \tilde z \leq \frac{b-\mu}{\sigma}\right\} = N\left(\frac{b-\mu}{\sigma}\right) - N\left(\frac{a-\mu}{\sigma}\right),$$
where $\tilde z \sim N(0, 1)$ and $N(\cdot)$ is the distribution function of the standard
normal distribution.

Example. If $\tilde x \sim N(\mu, \sigma^2)$ with $\mu = 4$ and $\sigma^2 = 49$, then
$$P\{-2 \leq \tilde x \leq 5\} = P\left\{\frac{-2-4}{7} \leq \frac{\tilde x - \mu}{\sigma} \leq \frac{5-4}{7}\right\} = P\{-0.8571 \leq \tilde z \leq 0.1429\}$$
$$= N(0.1429) - N(-0.8571) = N(0.1429) + N(0.8571) - 1 = 0.5568 + 0.8043 - 1 = 0.3611.$$
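The same probability computed with scipy instead of the table (an illustration, not part of the original slides):

```python
# P{-2 <= x <= 5} for x ~ N(4, 49), i.e., standard deviation 7.
from scipy.stats import norm

print(norm.cdf(5, loc=4, scale=7) - norm.cdf(-2, loc=4, scale=7))  # ~0.3611
```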


4.10. The multivariate normal distribution and its properties

The random vector $\tilde x = (\tilde x_1, \tilde x_2, \ldots, \tilde x_n)^\top : (\Omega, \mathcal{F}, P) \to (\mathbb{R}^n, \mathcal{B})$ has the
multivariate normal distribution ($\tilde x \sim MN(\mu, \Sigma)$) if its density function is
$$f(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}}\exp\left(-\frac{1}{2}(x-\mu)^\top\Sigma^{-1}(x-\mu)\right), \quad \forall x \in \mathbb{R}^n,$$
where
$$x = (x_1, x_2, \ldots, x_n)^\top, \qquad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_{nn} \end{pmatrix},$$
$\Sigma$ is a symmetric positive definite matrix, and $|\Sigma| > 0$ is the (absolute value
of the) determinant of $\Sigma$.
If $n = 1$, then $f(x; \mu, \Sigma) = n(x; \mu, \sigma)$, with $\sigma = \sqrt{\Sigma}$, as $\Sigma \in \mathbb{R}$ in this case.

Properties of the multivariate normal distribution.

(1) The marginal distribution of any sub-vector of the multivariate normally
distributed random vector $(\tilde x_1, \tilde x_2, \ldots, \tilde x_n)^\top$ will be multivariate normally
distributed, and the corresponding sub-vector of $\mu$ and the corresponding
sub-matrix of $\Sigma$ will be the mean vector and the variance-covariance matrix
of that random sub-vector. In particular,
$$\tilde x_i \sim N(\mu_i, \sigma^2_i), \quad \text{where } \sigma^2_i \equiv \sigma_{ii},$$
and
$$\mathrm{Cov}(\tilde x_i, \tilde x_j) = \sigma_{ij}.$$

However, it is not true in general that the joint distribution of (marginally)
normal random vectors/variables is multivariate normal.


(2) Moment-generating function of the multivariate normal distribution:
$$M_{\tilde x}(\underbrace{t_1, t_2, \ldots, t_n}_{t\,\in\,\mathbb{R}^n}) = e^{t^\top\mu + \frac{1}{2}t^\top\Sigma t}.$$

(3) The random vector $\tilde x = (\tilde x_1, \tilde x_2, \ldots, \tilde x_n)^\top$ is multivariate normally
distributed, $\tilde x \sim MN(\mu, \Sigma)$, with $\Sigma$ diagonal (i.e., $\sigma_{ij} = 0$ for all $i \neq j$) if
and only if the random variables $\tilde x_1, \tilde x_2, \ldots, \tilde x_n$ are normally distributed and
independent.

Observe that, for $\Sigma$ diagonal,
$$f(x_1, x_2, \ldots, x_n; \mu, \Sigma) = \prod_{i=1}^n n(x_i; \mu_i, \sigma_i), \quad \text{where } \sigma_i \equiv (\sigma_{ii})^{1/2}.$$

Under multivariate normality, zero covariance implies independence! However,
it is not true that the joint distribution of uncorrelated normally distributed
random variables is always multivariate normal.

Note: When $\Sigma$ is diagonal and $\sigma^2_i$ is the same for all $i$, we say that $\tilde x$ has a
multivariate circular or spherical normal distribution.
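The factorization in Property 3 is easy to see numerically; a minimal sketch (parameter values are illustrative):

```python
# With a diagonal covariance matrix, the joint normal density factors into
# the product of the marginal normal densities.
import numpy as np
from scipy.stats import multivariate_normal, norm

mu = np.array([1.0, -2.0])
Sigma = np.diag([4.0, 9.0])          # diagonal => independent components
x = np.array([0.5, -1.0])

joint = multivariate_normal(mu, Sigma).pdf(x)
product = norm.pdf(x[0], 1.0, 2.0) * norm.pdf(x[1], -2.0, 3.0)
print(joint, product)                # equal
```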
(4) Consider the following partitions of $x$, $\mu$, $\Sigma$, and $\tilde x$:
$$x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \quad x_1 \in \mathbb{R}^{n_1},\ x_2 \in \mathbb{R}^{n_2},\ n_1 + n_2 = n,$$
$$\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}, \qquad \Sigma_{12}^\top = \Sigma_{21},$$
$$\tilde x = \begin{pmatrix} \tilde x_1 \\ \tilde x_2 \end{pmatrix}, \quad \text{where } \tilde x_1 \sim MN(\mu_1, \Sigma_{11}) \text{ and } \tilde x_2 \sim MN(\mu_2, \Sigma_{22}),$$
and $\Sigma_{ij} = \mathrm{Cov}(\tilde x_i, \tilde x_j) = E\left[(\tilde x_i - \mu_i)(\tilde x_j - \mu_j)^\top\right]$ is an $n_i \times n_j$ matrix,
for $i = 1, 2$, $j = 1, 2$.

Then, the conditional density of the random vector $\tilde x_1$ given $\tilde x_2 = x_2$,
$$f_{\tilde x_1|\tilde x_2}\left(x_1|x_2; \mu_{\tilde x_1|\tilde x_2=x_2}, \Sigma_{\tilde x_1|\tilde x_2=x_2}\right) = \frac{f(x; \mu, \Sigma)}{f_{\tilde x_2}(x_2; \mu_2, \Sigma_{22})}, \quad \forall x_1 \in \mathbb{R}^{n_1},$$
is the density of a multivariate normal random vector with mean $\mu_{\tilde x_1|\tilde x_2=x_2}$
and covariance matrix $\Sigma_{\tilde x_1|\tilde x_2=x_2}$.

Moreover, the conditional mean of the random vector $\tilde x_1$ given $\tilde x_2 = x_2$ is
the following:
$$\mu_{\tilde x_1|\tilde x_2=x_2} \equiv E(\tilde x_1|\tilde x_2 = x_2) = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2) \in \mathbb{R}^{n_1}.$$

Note: If $n_1 = n_2 = 1$, then
$$E(\tilde x_1|\tilde x_2 = x_2) = \mu_1 + \frac{\sigma_{12}}{\sigma_{22}}(x_2 - \mu_2).$$


Finally, let
$$\Sigma_{\tilde x_1|\tilde x_2=x_2} \equiv \mathrm{Var}(\tilde x_1|\tilde x_2 = x_2) = E\left([\tilde x_1 - E(\tilde x_1|\tilde x_2 = x_2)][\tilde x_1 - E(\tilde x_1|\tilde x_2 = x_2)]^\top \,\middle|\, \tilde x_2 = x_2\right)$$
be the $n_1 \times n_1$ conditional covariance matrix of the random vector $\tilde x_1$ given
$\tilde x_2 = x_2$. Then,
$$\Sigma_{\tilde x_1|\tilde x_2=x_2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21},$$
which does not depend on the value $x_2$ taken by the random vector $\tilde x_2$.

Note: If $n_1 = n_2 = 1$, then
$$\mathrm{Var}(\tilde x_1|\tilde x_2 = x_2) = \sigma^2_1 - \frac{\sigma^2_{12}}{\sigma_{22}} = \sigma^2_1(1 - \rho^2),$$
so that the random variable $\mathrm{Var}(\tilde x_1|\tilde x_2) : (\Omega, \mathcal{F}) \longrightarrow (\mathbb{R}, \mathcal{B})$ is a constant.
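A sketch of Property 4 for $n_1 = n_2 = 1$: the conditional-mean and conditional-variance formulas checked against a Monte Carlo conditioning exercise (all numbers are illustrative):

```python
# Conditional mean/variance of x1 given x2 ~ x2_star for a bivariate normal.
import numpy as np

mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
x2_star = 3.0

cond_mean = mu[0] + Sigma[0, 1]/Sigma[1, 1]*(x2_star - mu[1])   # 1.8
cond_var  = Sigma[0, 0] - Sigma[0, 1]**2/Sigma[1, 1]            # 1.36

rng = np.random.default_rng(0)
z = rng.multivariate_normal(mu, Sigma, size=2_000_000)
mask = np.abs(z[:, 1] - x2_star) < 0.01      # condition on x2 near 3
print(cond_mean, z[mask, 0].mean())          # approximately equal
print(cond_var, z[mask, 0].var())            # approximately equal
```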
(5) Let $\tilde x = (\tilde x_1, \tilde x_2, \ldots, \tilde x_n)^\top \sim MN(\mu, \Sigma)$ and let $a = (a_1, a_2, \ldots, a_n)^\top$
be a non-zero vector of scalars. Then,
$$a^\top\tilde x = \sum_{i=1}^n a_i\tilde x_i \sim N(a^\top\mu,\ a^\top\Sigma a).$$

Note that in the previous result the vector $\tilde x = (\tilde x_1, \tilde x_2, \ldots, \tilde x_n)^\top$ has to be
multivariate normal. It is not enough that each component of that vector be
normal.

General Proposition. Let $\tilde x = (\tilde x_1, \tilde x_2, \ldots, \tilde x_n)^\top \sim MN(\mu, \Sigma)$ and let
$\tilde y = c + A\tilde x$ be an affine transformation of $\tilde x$, where $c \in \mathbb{R}^m$ is a column
vector and $A$ is an $m \times n$ matrix with $\mathrm{rank}(A) = m \leq n$. Then,
$\tilde y = (\tilde y_1, \tilde y_2, \ldots, \tilde y_m)^\top \sim MN(c + A\mu,\ A\Sigma A^\top)$.

The previous Proposition implies Property 5 (when $m = 1$, $A = a^\top$, and
$c = 0$) and Property 1. For the latter, consider the following example: to
extract the sub-vector $(\tilde x_1, \tilde x_2, \tilde x_4)^\top$ from the random vector
$\tilde x = (\tilde x_1, \tilde x_2, \ldots, \tilde x_n)^\top$, use the vector $c = 0$ and the $3 \times n$ matrix
$$A = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 0 & 1 & 0 & \cdots & 0 \end{pmatrix},$$
which extracts the desired sub-vector directly.


4.11. Multivariate normality and linear models

From linearity to multivariate normality:

Consider the random variable $\tilde y$,
$$\tilde y = a + \underbrace{b^\top\tilde x}_{=\tilde x^\top b} + \tilde\varepsilon = a + \sum_{i=1}^n b_i\tilde x_i + \tilde\varepsilon,$$
where $a \in \mathbb{R}$, $b = (b_1, b_2, \ldots, b_n)^\top \in \mathbb{R}^n$ is a column vector,
$\tilde x = (\tilde x_1, \tilde x_2, \ldots, \tilde x_n)^\top \sim MN(\mu_x, \Sigma_x)$, $\tilde\varepsilon \sim N(0, \sigma^2_\varepsilon)$, and $\tilde\varepsilon$ and $\tilde x$ are
independent.

We know from Property 3 above that the vector $(\tilde x_1, \tilde x_2, \ldots, \tilde x_n, \tilde\varepsilon)^\top$ is
multivariate normal. Thus, from the General Proposition above, the random
variable $\tilde y$ is normal since it is an affine transformation of the random
variables appearing in the multivariate normal random vector
$(\tilde x_1, \tilde x_2, \ldots, \tilde x_n, \tilde\varepsilon)^\top$. Finally, also from the General Proposition above, the
random vector $(\tilde y, \tilde x_1, \tilde x_2, \ldots, \tilde x_n)^\top$ is multivariate normal.

The mean $\mu_y$ and the variance $\sigma^2_y$ of $\tilde y$ are computed as follows:

Mean:
$$\mu_y = E(\tilde y) = E(a + b^\top\tilde x + \tilde\varepsilon) = a + \underbrace{b^\top\mu_x}_{=\mu_x^\top b} + E(\tilde\varepsilon) = a + b^\top\mu_x + 0 = a + b^\top\mu_x. \tag{1}$$

Variance:
$$\sigma^2_y = \mathrm{Var}(\tilde y) = \mathrm{Var}(a + b^\top\tilde x + \tilde\varepsilon) = b^\top\Sigma_x b + \sigma^2_\varepsilon. \tag{2}$$

Therefore, $\tilde y \sim N(\underbrace{a + b^\top\mu_x}_{\mu_y},\ \underbrace{b^\top\Sigma_x b + \sigma^2_\varepsilon}_{\sigma^2_y})$.


Let us compute the conditional expectation of $\tilde y$ given $\tilde x = x$:
$$E(\tilde y|\tilde x = x) = E(a + b^\top\tilde x + \tilde\varepsilon \,|\, \tilde x = x) = a + E(b^\top\tilde x \,|\, \tilde x = x) + E(\tilde\varepsilon|\tilde x = x)$$
$$= a + E(\underbrace{b^\top x}_{=x^\top b}) + E(\tilde\varepsilon) = a + b^\top x,$$
where the third equality holds because $\tilde\varepsilon$ and $\tilde x$ are independent.

Therefore, the conditional expectation $E(\tilde y|\tilde x = x)$ is an affine transformation
of $x$. This agrees with our previous Property 4, according to which
$$E(\tilde y|\tilde x = x) = \mu_y + \Sigma_{y,x}\Sigma_x^{-1}(x - \mu_x) = \left(\mu_y - \Sigma_{y,x}\Sigma_x^{-1}\mu_x\right) + \Sigma_{y,x}\Sigma_x^{-1}x.$$

Thus, the following equalities should be true:
$$a = \mu_y - \Sigma_{y,x}\Sigma_x^{-1}\mu_x \tag{3}$$
and
$$b^\top = \Sigma_{y,x}\Sigma_x^{-1} \quad \text{or} \quad b = \Sigma_x^{-1}\Sigma_{y,x}^\top = \Sigma_x^{-1}\Sigma_{x,y}. \tag{4}$$

We can check that (4) holds since the $1 \times n$ matrix $\Sigma_{y,x}$ satisfies
$$\Sigma_{y,x} = \mathrm{Cov}(\tilde y, \tilde x) = \mathrm{Cov}(a + b^\top\tilde x + \tilde\varepsilon,\ \tilde x) = b^\top\Sigma_x,$$
so that
$$\Sigma_{y,x}\Sigma_x^{-1} = b^\top\Sigma_x\Sigma_x^{-1} = b^\top.$$


To check that (3) holds, note that, from (1) and (4), we get
$$a = \mu_y - b^\top\mu_x = \mu_y - \Sigma_{y,x}\Sigma_x^{-1}\mu_x.$$

We can now compute the conditional variance of $\tilde y$ given $\tilde x = x$:
$$\mathrm{Var}(\tilde y|\tilde x = x) = \mathrm{Var}(a + b^\top\tilde x + \tilde\varepsilon \,|\, \tilde x = x) = \mathrm{Var}(a + b^\top x + \tilde\varepsilon) = \mathrm{Var}(\tilde\varepsilon) = \sigma^2_\varepsilon,$$
which agrees with our previous Property 4, according to which
$$\mathrm{Var}(\tilde y|\tilde x = x) = \sigma^2_y - \Sigma_{y,x}\Sigma_x^{-1}\Sigma_{x,y},$$
and, thus, $\mathrm{Var}(\tilde y|\tilde x = x)$ does not depend on the value $x$ taken by the
random vector $\tilde x$, i.e., the random variable $\mathrm{Var}(\tilde y|\tilde x)$ is a constant.

Therefore, we should have
$$\sigma^2_\varepsilon = \sigma^2_y - \Sigma_{y,x}\Sigma_x^{-1}\Sigma_{x,y}.$$

We can check that the previous equality indeed holds since, from (2) and (4),
we get
$$\sigma^2_\varepsilon = \sigma^2_y - b^\top\Sigma_x b = \sigma^2_y - \Sigma_{y,x}\Sigma_x^{-1}\Sigma_x\Sigma_x^{-1}\Sigma_{x,y} = \sigma^2_y - \Sigma_{y,x}\Sigma_x^{-1}\Sigma_{x,y}.$$


From multivariate normality to linearity:

Assume that the random vector $(\tilde y, \underbrace{\tilde x_1, \tilde x_2, \ldots, \tilde x_n}_{\tilde x^\top})^\top$ is $MN(\mu, \Sigma)$, with
$$\mu = (\mu_y, \underbrace{\mu_1, \mu_2, \ldots, \mu_n}_{\mu_x^\top})^\top \quad \text{and} \quad \Sigma = \begin{pmatrix} \sigma^2_y & \Sigma_{y,x} \\ \Sigma_{x,y} & \Sigma_x \end{pmatrix}.$$

Then, we know from Property 4 above that
$$E(\tilde y|\tilde x = x) = \mu_y + \Sigma_{y,x}\Sigma_x^{-1}(x - \mu_x) = \left(\mu_y - \Sigma_{y,x}\Sigma_x^{-1}\mu_x\right) + \Sigma_{y,x}\Sigma_x^{-1}x,$$
or, equivalently,
$$E(\tilde y|\tilde x) = \left(\mu_y - \Sigma_{y,x}\Sigma_x^{-1}\mu_x\right) + \Sigma_{y,x}\Sigma_x^{-1}\tilde x,$$
so that $E(\tilde y|\tilde x)$ is an affine transformation of $\tilde x$. Thus, since
$\tilde x \sim MN(\mu_x, \Sigma_x)$, the random variable $E(\tilde y|\tilde x)$ is normal, as dictated by the
General Proposition above.

Define the random variable
$$\tilde\varepsilon = \tilde y - E(\tilde y|\tilde x) = \tilde y - \left(\mu_y - \Sigma_{y,x}\Sigma_x^{-1}\mu_x\right) - \Sigma_{y,x}\Sigma_x^{-1}\tilde x.$$

According to the General Proposition above, the random variable $\tilde\varepsilon$ is normal
since it is an affine transformation of the random variables appearing in the
multivariate normal random vector $(\tilde y, \tilde x_1, \tilde x_2, \ldots, \tilde x_n)^\top$. Moreover, also from
the General Proposition above, the random vector $(\tilde x_1, \tilde x_2, \ldots, \tilde x_n, \tilde\varepsilon)^\top$ is
multivariate normal.

Thanks to the theorem of total expectation, we can find the expectation of $\tilde\varepsilon$:
$$E(\tilde\varepsilon) = E[\tilde y - E(\tilde y|\tilde x)] = E(\tilde y) - E[E(\tilde y|\tilde x)] = E(\tilde y) - E(\tilde y) = 0.$$


Moreover, the variance of $\tilde\varepsilon$ is
$$\sigma^2_\varepsilon = \mathrm{Var}(\tilde\varepsilon) = \mathrm{Var}[\tilde y - E(\tilde y|\tilde x)] = \mathrm{Var}\left[\tilde y - \left(\mu_y - \Sigma_{y,x}\Sigma_x^{-1}\mu_x\right) - \Sigma_{y,x}\Sigma_x^{-1}\tilde x\right]$$
$$= \mathrm{Var}(\tilde y) + \mathrm{Var}\left(\Sigma_{y,x}\Sigma_x^{-1}\tilde x\right) - 2\,\mathrm{Cov}\left(\tilde y,\ \Sigma_{y,x}\Sigma_x^{-1}\tilde x\right)$$
$$= \sigma^2_y + \Sigma_{y,x}\Sigma_x^{-1}\Sigma_x\Sigma_x^{-1}\Sigma_{x,y} - 2\Sigma_{y,x}\Sigma_x^{-1}\Sigma_{x,y}$$
$$= \sigma^2_y + \Sigma_{y,x}\Sigma_x^{-1}\Sigma_{x,y} - 2\Sigma_{y,x}\Sigma_x^{-1}\Sigma_{x,y} = \sigma^2_y - \Sigma_{y,x}\Sigma_x^{-1}\Sigma_{x,y}.$$

Therefore, $\tilde\varepsilon \sim N(0, \underbrace{\sigma^2_y - \Sigma_{y,x}\Sigma_x^{-1}\Sigma_{x,y}}_{\sigma^2_\varepsilon})$.

Moreover, from generalizing Exercise 30, part (c), of List 3, we know that
$\tilde\varepsilon = \tilde y - E(\tilde y|\tilde x)$ has zero covariance with $\tilde x$, $\mathrm{Cov}(\tilde\varepsilon, \tilde x) = 0^\top \in \mathbb{R}^n$, and,
thus, from Property 3 above, $\tilde\varepsilon$ and $\tilde x$ are independent.

Note that, from the definition of $\tilde\varepsilon$, we have
$$\tilde y = E(\tilde y|\tilde x) + \tilde\varepsilon = \left(\mu_y - \Sigma_{y,x}\Sigma_x^{-1}\mu_x\right) + \Sigma_{y,x}\Sigma_x^{-1}\tilde x + \tilde\varepsilon.$$

Then, we can define the scalar $a = \mu_y - \Sigma_{y,x}\Sigma_x^{-1}\mu_x$ and the column vector
$b = \Sigma_x^{-1}\Sigma_{x,y} \in \mathbb{R}^n$, so that the previous equation becomes
$$\tilde y = a + \underbrace{b^\top\tilde x}_{=\tilde x^\top b} + \tilde\varepsilon,$$
where the random vector $\tilde x$ and the random variable $\tilde\varepsilon$ are independent, and
$\tilde\varepsilon \sim N(0,\ \sigma^2_y - \Sigma_{y,x}\Sigma_x^{-1}\Sigma_{x,y})$.


We can summarize this section with the following proposition, which is related
to regression analysis:

Proposition. The random vector $(\tilde y, \underbrace{\tilde x_1, \tilde x_2, \ldots, \tilde x_n}_{\tilde x^\top})^\top$ has a multivariate
normal distribution $MN(\mu, \Sigma)$ with
$$\mu = (\mu_y, \underbrace{\mu_1, \mu_2, \ldots, \mu_n}_{\mu_x^\top})^\top \quad \text{and} \quad \Sigma = \begin{pmatrix} \sigma^2_y & \Sigma_{y,x} \\ \Sigma_{x,y} & \Sigma_x \end{pmatrix}$$
if and only if there exist a scalar $a$, a vector of scalars $b = (b_1, b_2, \ldots, b_n)^\top$,
and a random variable $\tilde\varepsilon \sim N(0, \sigma^2_\varepsilon)$ such that
$$\tilde y = a + b^\top\tilde x + \tilde\varepsilon \quad (\text{or } \tilde y^\top = a^\top + \tilde x^\top b + \tilde\varepsilon^\top),$$
where the random vector $\tilde x \sim MN(\mu_x, \Sigma_x)$ and the random variable $\tilde\varepsilon$ are
independent.

Moreover, the following equalities hold:
$$a = \mu_y - \Sigma_{y,x}\Sigma_x^{-1}\mu_x, \qquad b = \Sigma_x^{-1}\Sigma_{x,y}, \qquad \sigma^2_\varepsilon = \sigma^2_y - \Sigma_{y,x}\Sigma_x^{-1}\Sigma_{x,y}.$$
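A sketch of how the regression coefficients of the proposition are computed from a given joint mean vector and covariance matrix (all numbers are made up for illustration):

```python
# Compute a, b, and sigma_eps^2 from (mu_y, mu_x) and the blocks of Sigma.
import numpy as np

mu_y, mu_x = 1.0, np.array([0.5, -1.0])
sigma2_y = 4.0
Sigma_yx = np.array([1.2, 0.6])              # Cov(y, x), a 1 x n block
Sigma_x  = np.array([[2.0, 0.3],
                     [0.3, 1.0]])

b = np.linalg.solve(Sigma_x, Sigma_yx)       # b = Sigma_x^{-1} Sigma_{x,y}
a = mu_y - b @ mu_x                          # a = mu_y - Sigma_{y,x} Sigma_x^{-1} mu_x
sigma2_eps = sigma2_y - Sigma_yx @ b         # sigma_y^2 - Sigma_{y,x} Sigma_x^{-1} Sigma_{x,y}
print(a, b, sigma2_eps)
```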


It is straightforward to generalize the previous proposition when
$\tilde y = (\tilde y_1, \tilde y_2, \ldots, \tilde y_m)^\top$ is a random vector, as follows:

Proposition. The random vector $(\underbrace{\tilde y_1, \tilde y_2, \ldots, \tilde y_m}_{\tilde y^\top}, \underbrace{\tilde x_1, \tilde x_2, \ldots, \tilde x_n}_{\tilde x^\top})^\top$ has a
multivariate normal distribution $MN(\mu, \Sigma)$ with
$$\mu = (\underbrace{\mu_{y_1}, \mu_{y_2}, \ldots, \mu_{y_m}}_{\mu_y^\top}, \underbrace{\mu_{x_1}, \mu_{x_2}, \ldots, \mu_{x_n}}_{\mu_x^\top})^\top \quad \text{and} \quad \Sigma = \begin{pmatrix} \Sigma_y & \Sigma_{y,x} \\ \Sigma_{x,y} & \Sigma_x \end{pmatrix}$$
if and only if there exist a vector $a \in \mathbb{R}^m$, an $n \times m$ matrix $B$, and a random
vector $\tilde\varepsilon = (\tilde\varepsilon_1, \tilde\varepsilon_2, \ldots, \tilde\varepsilon_m)^\top \sim MN(0, \Sigma_\varepsilon)$ such that
$$\tilde y = a + B^\top\tilde x + \tilde\varepsilon \quad (\text{or } \tilde y^\top = a^\top + \tilde x^\top B + \tilde\varepsilon^\top),$$
where the random vectors $\tilde x \sim MN(\mu_x, \Sigma_x)$ and $\tilde\varepsilon$ are independent.

Moreover, the following equalities hold:
$$a = \mu_y - \Sigma_{y,x}\Sigma_x^{-1}\mu_x, \qquad B = \Sigma_x^{-1}\Sigma_{x,y}, \qquad \Sigma_\varepsilon = \Sigma_y - \Sigma_{y,x}\Sigma_x^{-1}\Sigma_{x,y}.$$


Informal proof of the formula of integration by change of variable

We want to compute the following Lebesgue integral:
$$\int_{[a,b]} f(x)\,dx.$$

Note that when we consider an interval $[a, b]$, we must have $a < b$. From now on,
whenever we write the integral of a function w.r.t. the Lebesgue measure, it should be
understood that the function is not only integrable w.r.t. that measure, but also that
it is Riemann integrable, so that
$$\int_{[a,b]} f(x)\,dx = \int_a^b f(x)\,dx. \tag{1}$$

Let $x = g(y)$ and assume that $g$ is differentiable on $g^{-1}([a,b])$. This requirement is
fulfilled if we assume that the function $g : M \longrightarrow \mathbb{R}$ is differentiable and $M$ is an
open subset of $\mathbb{R}$ with $g^{-1}([a,b]) \subset M$ (or, equivalently, with $[a,b] \subset g(M)$).

Consider the inverse function $g^{-1} : [a,b] \longrightarrow \mathbb{R}$, so that $y = g^{-1}(x)$. This inverse
function $g^{-1}$ exists if and only if the function $g$ restricted to $g^{-1}([a,b])$, i.e.,
$g : g^{-1}([a,b]) \longrightarrow [a,b]$, is bijective (or a one-to-one correspondence). That is, $g$
must be strictly increasing ($g' > 0$ a.e.) or strictly decreasing ($g' < 0$ a.e.) on
$g^{-1}([a,b])$. Therefore, if $x = a$ then $y = g^{-1}(a)$, whereas if $x = b$ then $y = g^{-1}(b)$.

From the theory of Riemann integration, recall that
$$\int_a^b f(x)\,dx = F(b) - F(a), \quad \text{where } F' = f \text{ on } [a,b].$$
Therefore,
$$\int_b^a f(x)\,dx = F(a) - F(b) = -\int_a^b f(x)\,dx. \tag{2}$$

Note that the primitive (or antiderivative) of $f(g(y)) \cdot g'(y)$ is $F(g(y))$, as follows
from the chain rule,
$$\frac{dF(g(y))}{dy} = F'(g(y)) \cdot g'(y) = f(g(y)) \cdot g'(y).$$
Therefore,
$$\int_{g^{-1}(a)}^{g^{-1}(b)} f(g(y))g'(y)\,dy = \left[F(g(y))\right]_{g^{-1}(a)}^{g^{-1}(b)} = F(g(g^{-1}(b))) - F(g(g^{-1}(a))) = F(b) - F(a) = \int_a^b f(x)\,dx. \tag{3}$$

On the one hand, if $g' > 0$ a.e. ($g$ is strictly increasing) then $g^{-1}(a) < g^{-1}(b)$
and, thus, $g^{-1}([a,b])$ is the interval $\left[g^{-1}(a), g^{-1}(b)\right]$. On the other hand, if
$g' < 0$ a.e. ($g$ is strictly decreasing) then $g^{-1}(a) > g^{-1}(b)$ and, thus, $g^{-1}([a,b])$
is the interval $\left[g^{-1}(b), g^{-1}(a)\right]$.

If $g' > 0$ a.e., then
$$\int_{g^{-1}(a)}^{g^{-1}(b)} f(g(y))g'(y)\,dy = \int_{[g^{-1}(a),\,g^{-1}(b)]} f(g(y))g'(y)\,dy = \int_{g^{-1}([a,b])} f(g(y))\underbrace{g'(y)}_{>0}\,dy. \tag{4}$$

However, if $g' < 0$ a.e., then
$$\int_{g^{-1}(a)}^{g^{-1}(b)} f(g(y))g'(y)\,dy = -\int_{g^{-1}(b)}^{g^{-1}(a)} f(g(y))g'(y)\,dy = \int_{g^{-1}(b)}^{g^{-1}(a)} f(g(y))\left(-g'(y)\right)dy$$
$$= \int_{[g^{-1}(b),\,g^{-1}(a)]} f(g(y))\left(-g'(y)\right)dy = \int_{g^{-1}([a,b])} f(g(y))\underbrace{\left(-g'(y)\right)}_{>0}\,dy, \tag{5}$$
where the first equality comes from (2).

Combining (1), (3), (4), and (5), we get
$$\int_{[a,b]} f(x)\,dx = \int_a^b f(x)\,dx = \int_{g^{-1}(a)}^{g^{-1}(b)} f(g(y))g'(y)\,dy = \int_{g^{-1}([a,b])} f(g(y))\left|g'(y)\right|dy.$$
The previous formula holds both for $g$ strictly increasing and for $g$ strictly
decreasing.
1. The Gamma Distribution

1.A. $\Gamma\left(\dfrac{1}{2}\right) = \sqrt{\pi}$.

Proof.

1st step: Let $\Gamma(\alpha) = \int_0^\infty y^{\alpha-1}e^{-y}\,dy$. We make the following change of variable:
$$y = g(z) = \frac{1}{2}z^2 \implies \frac{dy}{dz} = g'(z) = z, \quad \text{for } y > 0,\ z > 0.$$
$$\implies \Gamma(\alpha) = \int_0^\infty \frac{1}{2^{\alpha-1}}\,z^{2\alpha-2}e^{-\frac{1}{2}z^2}\,z\,dz = 2^{1-\alpha}\int_0^\infty z^{2\alpha-1}e^{-\frac{1}{2}z^2}\,dz.$$
$$\implies \Gamma\left(\frac{1}{2}\right) = \sqrt{2}\int_0^\infty e^{-\frac{1}{2}z^2}\,dz.$$

2nd step:
$$\left[\Gamma\left(\frac{1}{2}\right)\right]^2 = 2\left(\int_0^\infty e^{-\frac{1}{2}z^2}\,dz\right)\left(\int_0^\infty e^{-\frac{1}{2}x^2}\,dx\right) = 2\int_0^\infty\int_0^\infty e^{-\frac{1}{2}(z^2+x^2)}\,dz\,dx.$$

Let us make the change to polar coordinates:
$$\int_0^\infty\int_0^\infty e^{-\frac{1}{2}(z^2+x^2)}\,dz\,dx = \int_{\mathbb{R}^2_+} e^{-\frac{1}{2}(z^2+x^2)}\,d(z,x) = \int_0^\infty\int_0^{\pi/2} e^{-\frac{1}{2}r^2}\,r\,d\theta\,dr$$
$$= \left(\int_0^\infty e^{-\frac{1}{2}r^2}\,r\,dr\right)\cdot[\theta]_0^{\pi/2} = \left[-e^{-\frac{1}{2}r^2}\right]_0^\infty\cdot\frac{\pi}{2} = \frac{\pi}{2}.$$
$$\implies \left[\Gamma\left(\frac{1}{2}\right)\right]^2 = 2\left(\frac{\pi}{2}\right) = \pi \implies \Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}.$$

###############
1.B. $\displaystyle\int_0^\infty e^{-\frac{1}{2}z^2}\,dz = \sqrt{\frac{\pi}{2}}$.

Proof. Observe that
$$\Gamma\left(\frac{1}{2}\right) = \sqrt{2}\int_0^\infty e^{-\frac{1}{2}z^2}\,dz = \sqrt{\pi},$$
where the first equality comes from step 1 in 1.A and the second one comes from
step 2 in 1.A. Then,
$$\int_0^\infty e^{-\frac{1}{2}z^2}\,dz = \sqrt{\frac{\pi}{2}}.$$

###############

1.C. $\mu'_r = \dfrac{\beta^r\,\Gamma(\alpha+r)}{\Gamma(\alpha)}$.

Proof.
$$\mu'_r = \int_0^\infty x^r\,\frac{1}{\beta^\alpha\Gamma(\alpha)}\,x^{\alpha-1}e^{-x/\beta}\,dx.$$
Making the change of variable
$$x = g(y) = \beta y \iff y = g^{-1}(x) = \frac{x}{\beta}, \quad \text{so that} \quad \frac{dx}{dy} = g'(y) = \beta > 0,$$
$$\mu'_r = \int_0^\infty \beta^r y^r\,\frac{1}{\beta^\alpha\Gamma(\alpha)}\,\beta^{\alpha-1}y^{\alpha-1}e^{-y}\,\beta\,dy = \frac{\beta^r}{\Gamma(\alpha)}\underbrace{\int_0^\infty y^{\alpha+r-1}e^{-y}\,dy}_{\Gamma(\alpha+r)} = \frac{\beta^r\,\Gamma(\alpha+r)}{\Gamma(\alpha)}.$$

###############
1.D. $M_{\tilde x}(t) = (1-\beta t)^{-\alpha}$ if $t < 1/\beta$.

Proof.
$$M_{\tilde x}(t) = \int_0^\infty e^{tx}\,\frac{1}{\beta^\alpha\Gamma(\alpha)}\,x^{\alpha-1}e^{-x/\beta}\,dx = \frac{1}{\beta^\alpha\Gamma(\alpha)}\int_0^\infty x^{\alpha-1}e^{-x\left(\frac{1}{\beta}-t\right)}\,dx.$$

Change of variable:
$$x = g(y) = \frac{y}{\frac{1}{\beta}-t} \iff y = g^{-1}(x) = x\left(\frac{1}{\beta}-t\right),$$
so that
$$\frac{dx}{dy} = g'(y) = \frac{1}{\frac{1}{\beta}-t} > 0, \quad \text{if } t < 1/\beta.$$
$$M_{\tilde x}(t) = \frac{1}{\beta^\alpha\Gamma(\alpha)}\int_0^\infty \left(\frac{y}{\frac{1}{\beta}-t}\right)^{\alpha-1}e^{-y}\,\frac{1}{\frac{1}{\beta}-t}\,dy = \frac{1}{\beta^\alpha\left(\frac{1}{\beta}-t\right)^\alpha}\cdot\frac{1}{\Gamma(\alpha)}\underbrace{\int_0^\infty y^{\alpha-1}e^{-y}\,dy}_{\Gamma(\alpha)}$$
$$= \frac{1}{\beta^\alpha\left(\frac{1}{\beta}-t\right)^\alpha} = \frac{1}{(1-\beta t)^\alpha} = (1-\beta t)^{-\alpha} \quad \text{if } t < 1/\beta.$$

###############
2. The Normal Distribution

2.A. $\displaystyle\int_{-\infty}^\infty n(x; \mu, \sigma)\,dx = 1$.

Proof. Let
$$x = g(z) = \mu + \sigma z \iff z = g^{-1}(x) = \frac{x-\mu}{\sigma} \implies \frac{dx}{dz} = g'(z) = \sigma > 0.$$
Then,
$$\int_{-\infty}^\infty \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\,dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^\infty \frac{1}{\sigma}\,e^{-\frac{1}{2}z^2}\,\sigma\,dz = \frac{2}{\sqrt{2\pi}}\int_0^\infty e^{-\frac{1}{2}z^2}\,dz = \sqrt{\frac{2}{\pi}}\cdot\sqrt{\frac{\pi}{2}} = 1,$$
where the second equality comes from the symmetry of the function $e^{-\frac{1}{2}z^2}$ and
the third equality comes from 1.B.

###############

2.B. $M_{\tilde x}(t) = e^{\mu t + \frac{1}{2}t^2\sigma^2}$.

Proof. We will use in this proof a technique called "completing the square".
$$M_{\tilde x}(t) = \int_{-\infty}^\infty e^{tx}\,\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\,dx = \int_{-\infty}^\infty \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2\sigma^2}\left[-2xt\sigma^2+(x-\mu)^2\right]}\,dx.$$
Observe that
$$-2xt\sigma^2 + (x-\mu)^2 = \left[x - (\mu + t\sigma^2)\right]^2 - 2\mu t\sigma^2 - t^2\sigma^4.$$
Then, the last integral equals
$$e^{\frac{1}{2\sigma^2}\left[2\mu t\sigma^2 + t^2\sigma^4\right]}\underbrace{\int_{-\infty}^\infty \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2\sigma^2}\left[x-(\mu+t\sigma^2)\right]^2}\,dx}_{1} = e^{\mu t + \frac{1}{2}t^2\sigma^2},$$
since $\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2\sigma^2}\left[x-(\mu+t\sigma^2)\right]^2}$ is the normal density function with parameters
$\mu + t\sigma^2$ and $\sigma$, i.e., $n(x; \mu+t\sigma^2, \sigma)$, which integrates to one.
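The MGF formula of 2.B can be confirmed by direct numerical integration; a sketch (parameter values are illustrative):

```python
# E[e^{tX}] for X ~ N(mu, sigma^2) versus exp(mu*t + t^2*sigma^2/2).
import numpy as np
from scipy.integrate import quad

mu, sigma, t = 1.0, 2.0, 0.3
density = lambda x: np.exp(-0.5*((x - mu)/sigma)**2)/(sigma*np.sqrt(2*np.pi))
mgf = quad(lambda x: np.exp(t*x)*density(x), -np.inf, np.inf)[0]
print(mgf, np.exp(mu*t + 0.5*t**2*sigma**2))   # approximately equal
```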
The Standard Normal Distribution Function N(0,1)
(entries give the area under the standard normal density between 0 and z; add 0.5 to obtain N(z))

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990

Also, for z = 4.0, 5.0, and 6.0 the probabilities are 0.49997, 0.4999997, 0.499999999.
