
4. Special Distributions


Consider a random variable/vector $\tilde x : (\Omega, \mathcal{F}, P) \longrightarrow (\mathbb{R}^n, \mathcal{B})$.

A discrete distribution is fully characterized by its probability function,
$f_{\tilde x} : \tilde x(\Omega) \longrightarrow [0,1]$, since
$$P_{\tilde x}(B) = P\{\tilde x \in B\} = \sum_{x \in B} f_{\tilde x}(x), \quad \text{for all } B \in \mathcal{B}.$$

An absolutely continuous distribution is fully characterized by its density
function, $f_{\tilde x} : (\mathbb{R}, \mathcal{B}) \longrightarrow (\mathbb{R}, \mathcal{B})$, since
$$P_{\tilde x}(B) = P\{\tilde x \in B\} = \int_B f_{\tilde x}(x)\,dx, \quad \text{for all } B \in \mathcal{B}.$$

Let $\tilde x$ be an absolutely continuous random variable with density $f_{\tilde x}$.
We assume, without loss of generality, that the Borel set
$A = \{x \in \mathbb{R}^n \mid f_{\tilde x}(x) \neq 0\}$ is open in $\mathbb{R}^n$.
Under this convention, the closure of $A$ is the support $\mathrm{supp}(P_{\tilde x})$
of the distribution of the absolutely continuous random variable $\tilde x$.
4.1. The discrete uniform distribution and the Dirac distribution

The random variable/vector $\tilde x : (\Omega, \mathcal{F}, P) \to (\mathbb{R}^n, \mathcal{B})$ has the
discrete uniform distribution if its probability function is
$$f(x) = \frac{1}{k}, \quad \text{for } x = \underbrace{x_1, \ldots, x_k}_{\tilde x(\Omega)}, \text{ with } x_i \neq x_j \text{ for } i \neq j.$$

If $k = 1$, then $\tilde x$ is a constant.


Example: Consider the discrete random vector $(\tilde x_1, \tilde x_2)$, where $\tilde x_1$ is the
number of points when rolling a die and $\tilde x_2$ is the number of heads when
tossing a coin. Its probability function $f(x_1, x_2)$ is summarized in the
following table:

   x2 \ x1  |   1     2     3     4     5     6
   ---------+------------------------------------
      0     | 1/12  1/12  1/12  1/12  1/12  1/12
      1     | 1/12  1/12  1/12  1/12  1/12  1/12

Therefore,
$$f(x_1, x_2) = \frac{1}{12}, \quad \text{for } (x_1, x_2) \in \{1, 2, 3, 4, 5, 6\} \times \{0, 1\}.$$


The Dirac distribution $P_{\tilde x}$ of the random variable
$\tilde x : (\Omega, \mathcal{F}, P) \to (\mathbb{R}, \mathcal{B})$ satisfies
$$P_{\tilde x}(A) = \begin{cases} 1 & \text{if } a \in A \\ 0 & \text{if } a \notin A, \end{cases}$$
for all $A \in \mathcal{B}$.

Note that $P_{\tilde x}\{a\} = 1$, i.e., the value $a$ is taken almost surely by the
random variable $\tilde x$.

Some people define the density function of the previous Dirac distribution as a
function such that $f_{\tilde x}(x) = \infty$ if $x = a$, $f_{\tilde x}(x) = 0$ if $x \neq a$, and
$\int_{\mathbb{R}} f_{\tilde x}(x)\,dx = 1$.

Obviously, there is no density function with those properties, even though it
can be viewed as the limit of a sequence of strictly positive density functions
converging to zero for all $x \neq a$.

[Portrait: Paul Dirac (1902 - 1984)]


4.2. The Bernoulli, binomial, Pascal, geometric, and hypergeometric
distributions

[Portrait: Jacob Bernoulli (1654 - 1705)]


The random variable $\tilde x : (\Omega, \mathcal{F}, P) \to (\mathbb{R}, \mathcal{B})$ has the Bernoulli
distribution if its probability function (pmf) is
$$f(x; \theta) = \theta^x (1-\theta)^{1-x}, \quad \text{for } x = 0, 1,$$
or
$$f(x; \theta) = \begin{cases} 1-\theta & \text{for } x = 0 \\ \theta & \text{for } x = 1. \end{cases}$$

Mean and variance:
$$\mu = \theta \quad \text{and} \quad \sigma^2 = \theta(1-\theta).$$


The binomial distribution.

The random variable $\tilde x : (\Omega, \mathcal{F}, P) \to (\mathbb{R}, \mathcal{B})$ has a binomial
distribution (or is binomial) if its probability function is
$$b(x; n, \theta) = \binom{n}{x} \theta^x (1-\theta)^{n-x}, \quad \text{for } x = 0, 1, \ldots, n.$$

Motivation: $\theta$ is the probability of a success in each trial. Then,
$b(x; n, \theta)$ gives the probability of $x$ successes in $n$ independent trials.

Note: $b(x; 1, \theta) = \theta^x (1-\theta)^{1-x}$, for $x = 0, 1$, is the Bernoulli pmf.

Obviously, if $\tilde x_1, \ldots, \tilde x_n$ are independently distributed random variables
having a Bernoulli distribution with parameter $\theta$, then their sum
$\tilde S = \sum_{i=1}^n \tilde x_i$ has a binomial distribution with parameters $n$ and $\theta$.

Similarly, if $\tilde x_1, \ldots, \tilde x_m$ are independently distributed random variables
having a binomial distribution with parameters $n_i$ and $\theta$, for $i = 1, \ldots, m$,
then their sum $\tilde y = \sum_{i=1}^m \tilde x_i$ has a binomial distribution with
parameters $\sum_{i=1}^m n_i$ and $\theta$.
Example: Let $\tilde x$ be the number of heads when tossing 4 fair coins.
$$P\{\tilde x = x\} = f_{\tilde x}(x) = \underbrace{\binom{4}{x}\left(\frac{1}{2}\right)^x \left(\frac{1}{2}\right)^{4-x}}_{b(x;\,4,\,1/2)} = \binom{4}{x}\left(\frac{1}{2}\right)^4 = \frac{1}{16}\binom{4}{x}, \quad \text{for } x = \underbrace{0, 1, 2, 3, 4}_{\tilde x(\Omega)},$$
or
$$f_{\tilde x}(x) = b\left(x; 4, \tfrac{1}{2}\right) = \begin{cases}
1/16 = 0.0625 & \text{for } x = 0 \\
4/16 = 0.25 & \text{for } x = 1 \\
6/16 = 0.375 & \text{for } x = 2 \\
4/16 = 0.25 & \text{for } x = 3 \\
1/16 = 0.0625 & \text{for } x = 4.
\end{cases}$$

[Figure: probability histogram, which is symmetric iff $\theta = 1/2$.]

[Figure: probability bar chart, which is symmetric iff $\theta = 1/2$.]


Example: Let $\tilde x$ be the number of heads when tossing 4 coins, where each
coin is unbalanced and the probability of a head is $\theta = 1/3$.
$$P\{\tilde x = x\} = f_{\tilde x}(x) = \underbrace{\binom{4}{x}\left(\frac{1}{3}\right)^x \left(\frac{2}{3}\right)^{4-x}}_{b(x;\,4,\,1/3)}, \quad \text{for } x = \underbrace{0, 1, 2, 3, 4}_{\tilde x(\Omega)},$$
or
$$f_{\tilde x}(x) = b\left(x; 4, \tfrac{1}{3}\right) = \begin{cases}
0.1975 & \text{for } x = 0 \\
0.3951 & \text{for } x = 1 \\
0.2963 & \text{for } x = 2 \\
0.0988 & \text{for } x = 3 \\
0.0123 & \text{for } x = 4.
\end{cases}$$

[Figure: probability histogram, which is not symmetric since $\theta = 1/3$.]

[Figure: probability bar chart, which is not symmetric since $\theta = 1/3$.]
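A quick numerical check of the two examples above (a sketch using scipy, which the original slides do not use):

```python
# Binomial pmfs for the fair and the unbalanced coin examples.
from scipy.stats import binom

# Fair coin: b(x; 4, 1/2) should give 1/16, 4/16, 6/16, 4/16, 1/16.
print([binom.pmf(x, 4, 0.5) for x in range(5)])

# Unbalanced coin: b(x; 4, 1/3) should give 0.1975, 0.3951, 0.2963, 0.0988, 0.0123.
print([round(binom.pmf(x, 4, 1/3), 4) for x in range(5)])
```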


Moment-generating function of a binomial random variable:
$$M_{\tilde x}(t) = [1 + \theta(e^t - 1)]^n.$$

Proof:
$$M_{\tilde x}(t) = E\left(e^{t\tilde x}\right) = \sum_{x=0}^n e^{tx} \underbrace{\binom{n}{x}\theta^x (1-\theta)^{n-x}}_{b(x;\,n,\,\theta)} = \sum_{x=0}^n \binom{n}{x}\left(\theta e^t\right)^x (1-\theta)^{n-x} = [\theta e^t + (1-\theta)]^n = [1 + \theta(e^t - 1)]^n.$$

Then,
$$\mu = M'_{\tilde x}(0) = \left. n[1 + \theta(e^t - 1)]^{n-1}\theta e^t \right|_{t=0} = n\theta$$
and
$$\sigma^2 = E\left(\tilde x^2\right) - [E(\tilde x)]^2 = M''_{\tilde x}(0) - \mu^2$$
$$= \left.\left\{ n(n-1)[1 + \theta(e^t - 1)]^{n-2}\theta^2 e^{2t} + n[1 + \theta(e^t - 1)]^{n-1}\theta e^t \right\}\right|_{t=0} - n^2\theta^2$$
$$= n(n-1)\theta^2 + n\theta - n^2\theta^2 = -n\theta^2 + n\theta = n\theta(1-\theta).$$
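The differentiation above can be verified symbolically; a minimal sketch with sympy (not part of the original slides):

```python
# Differentiate the binomial MGF at t = 0 to recover the mean and variance.
import sympy as sp

t, theta = sp.symbols('t theta')
n = sp.Symbol('n', positive=True)
M = (1 + theta*(sp.exp(t) - 1))**n        # binomial MGF

mu = sp.diff(M, t).subs(t, 0)             # M'(0) = n*theta
second = sp.diff(M, t, 2).subs(t, 0)      # M''(0) = n(n-1)theta^2 + n*theta
var = sp.simplify(second - mu**2)         # n*theta*(1 - theta)
print(sp.simplify(mu), var)
```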
The Pascal (negative binomial or binomial waiting-time) distribution.

[Portrait: Blaise Pascal (1623 - 1662)]


The random variable $\tilde x : (\Omega, \mathcal{F}, P) \to (\mathbb{R}, \mathcal{B})$ has a Pascal
distribution if its probability function is
$$b^*(x; k, \theta) = \binom{x-1}{k-1}\theta^k (1-\theta)^{x-k}, \quad \text{for } x = k, k+1, \ldots$$

Motivation: $\theta$ is the probability of success in each trial. Then,
$b^*(x; k, \theta)$ gives the probability that the $k$th success will occur on the
$x$th independent trial.

Note: $b^*(x; k, \theta) = \theta\, b(k-1; x-1, \theta)$.

Mean, variance, and moment-generating function:
$$\mu = \frac{k}{\theta}, \qquad \sigma^2 = \frac{k(1-\theta)}{\theta^2},$$
$$M_{\tilde x}(t) = \left[\frac{\theta e^t}{1 - (1-\theta)e^t}\right]^k \quad \text{for } t < \underbrace{-\ln(1-\theta)}_{+}.$$


The geometric distribution.

The random variable $\tilde x : (\Omega, \mathcal{F}, P) \to (\mathbb{R}, \mathcal{B})$ has a geometric
distribution if its probability function is
$$g(x; \theta) = \theta(1-\theta)^{x-1}, \quad \text{for } x = 1, 2, \ldots$$

Motivation: $\theta$ is the probability of success in each trial. Then, $g(x; \theta)$
gives the probability that the first success will occur on the $x$th independent
trial.

Note: $g(x; \theta) = b^*(x; 1, \theta)$.

Thus, the moment-generating function is
$$M_{\tilde x}(t) = \frac{\theta e^t}{1 - (1-\theta)e^t} \quad \text{for } t < \underbrace{-\ln(1-\theta)}_{+}.$$

Mean and variance:
$$\mu = M'_{\tilde x}(0) = \frac{1}{\theta} \quad \text{and} \quad \sigma^2 = M''_{\tilde x}(0) - \mu^2 = \frac{1-\theta}{\theta^2}.$$
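A quick numerical sanity check of these formulas (scipy's geom uses the same "first success on trial x" convention as the slides; this is an illustration, not part of the original):

```python
# Geometric mean, variance, and pmf versus the closed-form expressions.
from scipy.stats import geom

theta = 0.25
print(geom.mean(theta), 1/theta)                 # both 4.0
print(geom.var(theta), (1 - theta)/theta**2)     # both 12.0
print(geom.pmf(3, theta), theta*(1 - theta)**2)  # both 0.140625
```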


The hypergeometric distribution.

The random variable $\tilde x : (\Omega, \mathcal{F}, P) \to (\mathbb{R}, \mathcal{B})$ has a hypergeometric
distribution if its probability function is
$$h(x; n, N, a) = \frac{\dbinom{a}{x}\dbinom{N-a}{n-x}}{\dbinom{N}{n}},$$
for $x = 0, 1, \ldots, n$, with $x \leq a$ and $n - x \leq N - a$.

Motivation: Consider a set of $N$ elements of which $a$ are successes and $N - a$
are failures. We choose, without replacement, $n$ of the $N$ elements contained
in the set. Then, $h(x; n, N, a)$ gives the probability of getting $x$ successes.

Mean and variance:
$$\mu = \frac{na}{N} \quad \text{and} \quad \sigma^2 = \frac{na(N-a)(N-n)}{N^2(N-1)}.$$

Note:
$$\lim_{\substack{N \to \infty \\ a/N = \theta}} h(x; n, N, a) = b(x; n, \theta).$$
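The limit in the last note can be illustrated numerically; a sketch (scipy's hypergeom is parameterized as population size, number of successes, draws):

```python
# h(x; n, N, a) approaches b(x; n, theta) as N grows with a/N = theta fixed.
from scipy.stats import hypergeom, binom

n, theta, x = 5, 0.4, 2
for N in (10, 100, 1000, 100000):
    a = int(theta * N)
    print(N, hypergeom.pmf(x, N, a, n), binom.pmf(x, n, theta))
```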


4.3. The multinomial and multivariate hypergeometric distributions

The multinomial distribution.

The random vector $\tilde x = (\tilde x_1, \tilde x_2, \ldots, \tilde x_k) : (\Omega, \mathcal{F}, P) \to (\mathbb{R}^k, \mathcal{B})$ has
the multinomial distribution if its probability function is
$$m(x_1, x_2, \ldots, x_k; n, \theta_1, \theta_2, \ldots, \theta_k) = \binom{n}{x_1, x_2, \ldots, x_k}\theta_1^{x_1}\theta_2^{x_2} \cdots \theta_k^{x_k},$$
for $x_i = 0, 1, \ldots, n$, with $\sum_{i=1}^k x_i = n$ and $\sum_{i=1}^k \theta_i = 1$.

Recall that
$$\binom{n}{x_1, x_2, \ldots, x_k} = \frac{n!}{x_1! \cdot x_2! \cdots x_k!}.$$

Motivation: There are $n$ independent trials permitting $k$ exclusive outcomes,
whose respective probabilities are $\theta_1, \theta_2, \ldots, \theta_k$ (with $\sum_{i=1}^k \theta_i = 1$). We
shall refer to the outcomes as being of the first type, the second type, ..., and
the $k$th type. Then, $m(x_1, x_2, \ldots, x_k; n, \theta_1, \theta_2, \ldots, \theta_k)$ gives the probability
of getting $x_1$ outcomes of the first type, $x_2$ outcomes of the second type, ...,
and $x_k$ outcomes of the $k$th type (with $\sum_{i=1}^k x_i = n$).

Note: $m(x, n-x; n, \theta, 1-\theta) = b(x; n, \theta)$.

Example: Consider a very large population. 50% of the individuals have brown
eyes, 30% have black eyes, and 20% have blue eyes. We pick 8 individuals at
random. The probability of picking 5 individuals with brown eyes, 2 with black
eyes, and 1 with blue eyes is
$$m(5, 2, 1; 8, 0.5, 0.3, 0.2) = \frac{8!}{5!\,2!\,1!} \cdot 0.5^5 \cdot 0.3^2 \cdot 0.2^1 = 0.0945.$$
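The eye-colour example checked with scipy (an assumption; the slides compute the value by hand):

```python
# Multinomial pmf for 5 brown, 2 black, 1 blue out of 8 draws.
from scipy.stats import multinomial

print(multinomial.pmf([5, 2, 1], n=8, p=[0.5, 0.3, 0.2]))  # ~0.0945
```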


The multivariate hypergeometric distribution.

The random vector $\tilde x = (\tilde x_1, \tilde x_2, \ldots, \tilde x_k) : (\Omega, \mathcal{F}, P) \to (\mathbb{R}^k, \mathcal{B})$ has
the multivariate hypergeometric distribution if its probability function is
$$f(x_1, x_2, \ldots, x_k; n, N, a_1, a_2, \ldots, a_k) = \frac{\dbinom{a_1}{x_1}\dbinom{a_2}{x_2} \cdots \dbinom{a_k}{x_k}}{\dbinom{N}{n}},$$
for $x_i = 0, 1, \ldots, n$ and $x_i \leq a_i$, where $\sum_{i=1}^k x_i = n$ and $\sum_{i=1}^k a_i = N$.

Motivation: There is a set of $N$ elements, of which $a_1$ are elements of the first
type, $a_2$ are elements of the second type, ..., and $a_k$ are elements of the $k$th
type (with $\sum_{i=1}^k a_i = N$). We choose, without replacement, $n$ of the $N$
elements in the set. Then $f(x_1, x_2, \ldots, x_k; n, N, a_1, a_2, \ldots, a_k)$ gives the
probability of getting $x_1$ outcomes of the first type, $x_2$ outcomes of the second
type, ..., and $x_k$ outcomes of the $k$th type (with $\sum_{i=1}^k x_i = n$).

Note: $f(x, n-x; n, N, a, N-a) = h(x; n, N, a)$.

Note:
$$\lim_{\substack{N \to \infty \\ a_1/N = \theta_1,\ \ldots,\ a_k/N = \theta_k}} f(x_1, x_2, \ldots, x_k; n, N, a_1, a_2, \ldots, a_k) = m(x_1, x_2, \ldots, x_k; n, \theta_1, \theta_2, \ldots, \theta_k).$$

Example approximating the multinomial with $N = 2000$:
$$f(5, 2, 1; 8, 2000, 1000, 600, 400) = \frac{\dbinom{1000}{5}\dbinom{600}{2}\dbinom{400}{1}}{\dbinom{2000}{8}} = 0.0947 \approx m(5, 2, 1; 8, 0.5, 0.3, 0.2) = 0.0945.$$
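The $N = 2000$ example computed directly with binomial coefficients (an illustration, not part of the original slides):

```python
# Multivariate hypergeometric probability via binomial coefficients.
from math import comb

p = comb(1000, 5) * comb(600, 2) * comb(400, 1) / comb(2000, 8)
print(p)  # ~0.0947, close to the multinomial value 0.0945
```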
4.4. Integration by parts for Lebesgue-Stieltjes integrals

Integration by parts.

From now on, whenever we write the integral of a function w.r.t. a measure, it
should be understood that the function is integrable w.r.t. that measure.

Assume that $F : M \longrightarrow \mathbb{R}$ and $G : M \longrightarrow \mathbb{R}$ are continuously differentiable
functions, where $M$ is an open subset of $\mathbb{R}$, and $[a, b] \subset M$. Then,
$$\int_{[a,b]} G(x)F'(x)\,dx = \underbrace{F(b)G(b) - F(a)G(a)}_{[F(x)G(x)]_a^b} - \int_{[a,b]} G'(x)F(x)\,dx.$$

Note: The interval of integration can be replaced by $(a, b)$, $(a, b]$, or $[a, b)$.

The previous Lebesgue integrals are equal to their Riemann counterparts, as
the functions $G \cdot F'$ and $G' \cdot F$ are continuous on $[a, b]$.

Remember that, if $F$ is a distribution function, then
$$\int_{(a,b]} G(x)\,dF(x) \equiv \int_{(a,b]} G(x)\,d\mu(x),$$
where $\mu$ is the Lebesgue-Stieltjes measure associated with $F$.

Assume that $F : \mathbb{R} \longrightarrow \mathbb{R}$ is a distribution function, $G : M \longrightarrow \mathbb{R}$ is a
continuously differentiable function, where $M$ is an open subset of $\mathbb{R}$, and
$[a, b] \subset M$. Then,
$$\int_{(a,b]} G(x)\,dF(x) = F(b)G(b) - F(a)G(a) - \int_{(a,b]} F(x)G'(x)\,dx. \tag{$\star$}$$

Note that $F(a^+) = F(a)$, because of the right-continuity of $F$.


Since $G$ is continuous on $[a, b]$,
$$\underbrace{\int_{(a,b]} G(x)\,dF(x)}_{\text{Lebesgue-Stieltjes integral}} = \underbrace{\int_a^b G(x)\,dF(x)}_{\text{Riemann-Stieltjes integral}}.$$

The last (Lebesgue) integral in $(\star)$ obviously satisfies
$$\int_{(a,b]} F(x)G'(x)\,dx = \int_{(a,b)} F(x)G'(x)\,dx = \int_{[a,b]} F(x)G'(x)\,dx = \int_{[a,b)} F(x)G'(x)\,dx,$$
since single points have zero Lebesgue measure.

Over the closed interval $[a, b]$:
$$\int_{[a,b]} G(x)\,dF(x) = G(a)\left(F(a) - F(a^-)\right) + F(b)G(b) - F(a)G(a) - \int_{[a,b]} F(x)G'(x)\,dx$$
$$= F(b)G(b) - F(a^-)G(a) - \int_{[a,b]} F(x)G'(x)\,dx.$$

Over the open interval $(a, b)$:
$$\int_{(a,b)} G(x)\,dF(x) = F(b)G(b) - F(a)G(a) - G(b)\left(F(b) - F(b^-)\right) - \int_{(a,b)} F(x)G'(x)\,dx$$
$$= F(b^-)G(b) - F(a)G(a) - \int_{(a,b)} F(x)G'(x)\,dx.$$

Over the half-open interval $[a, b)$:
$$\int_{[a,b)} G(x)\,dF(x) = G(a)\left(F(a) - F(a^-)\right) + \underbrace{F(b^-)G(b) - F(a)G(a) - \int_{(a,b)} F(x)G'(x)\,dx}_{\int_{(a,b)} G(x)\,dF(x)}$$
$$= F(b^-)G(b) - F(a^-)G(a) - \int_{[a,b)} F(x)G'(x)\,dx.$$
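For an absolutely continuous $F$ (so $dF = f\,dx$), formula $(\star)$ can be checked numerically; a minimal sketch with the exponential distribution function and $G(x) = x^2$ (names and values are illustrative):

```python
# Numerical check of (*) : int G dF = F(b)G(b) - F(a)G(a) - int F G' dx.
import numpy as np
from scipy.integrate import quad

F  = lambda x: 1 - np.exp(-x)     # distribution function
f  = lambda x: np.exp(-x)         # its density, so dF = f dx
G  = lambda x: x**2
dG = lambda x: 2*x

a, b = 0.5, 3.0
lhs = quad(lambda x: G(x)*f(x), a, b)[0]
rhs = F(b)*G(b) - F(a)*G(a) - quad(lambda x: F(x)*dG(x), a, b)[0]
print(lhs, rhs)   # both evaluate to the same number
```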


4.5. Lebesgue integration by change of variable: polar coordinates

Assume that (i) $g : M \longrightarrow \mathbb{R}$ is a continuously differentiable function, where
$M$ is an open subset of $\mathbb{R}$, and (ii) the function $g$ restricted to $g^{-1}([a,b])$,
$g : g^{-1}([a,b]) \longrightarrow [a,b]$, where $[a,b] \subset g(M)$ (or $g^{-1}([a,b]) \subset M$), is a
bijective function (or one-to-one correspondence). Let
$f : ([a,b], \mathcal{B}([a,b])) \longrightarrow (\mathbb{R}, \mathcal{B})$ be a Lebesgue integrable function. Then,
$$\int_{[a,b]} f(x)\,dx = \int_{g^{-1}([a,b])} f(\underbrace{g(y)}_{x})\left|g'(y)\right|dy.$$

See the handout for an informal proof.

Note that $g^{-1}([a,b])$ is the interval $\left[g^{-1}(a), g^{-1}(b)\right]$ if $g$ is increasing, or the
interval $\left[g^{-1}(b), g^{-1}(a)\right]$ if $g$ is decreasing.


Assume that (i) $g : M \longrightarrow \mathbb{R}^n$ is a continuously differentiable function, where
$M$ is an open subset of $\mathbb{R}^n$, and (ii) the function $g$ restricted to $g^{-1}(B)$,
$g : g^{-1}(B) \longrightarrow B$, where $B$ is a Borel set in $\mathbb{R}^n$ such that $B \subset g(M)$ (or
$g^{-1}(B) \subset M$), is a bijective function (or one-to-one correspondence).

Let $J_g(y_1, y_2, \ldots, y_n)$ be the Jacobian matrix of the function
$g : \underbrace{(y_1, \ldots, y_n)}_{y \,\in\, \mathbb{R}^n} \longmapsto \underbrace{(g_1(y_1, y_2, \ldots, y_n), \ldots, g_n(y_1, y_2, \ldots, y_n))}_{g(y) \,\in\, \mathbb{R}^n}$,
which is given by
$$J_g(y_1, y_2, \ldots, y_n) = \begin{pmatrix}
\frac{\partial g_1(y)}{\partial y_1} & \frac{\partial g_1(y)}{\partial y_2} & \cdots & \frac{\partial g_1(y)}{\partial y_n} \\
\frac{\partial g_2(y)}{\partial y_1} & \frac{\partial g_2(y)}{\partial y_2} & \cdots & \frac{\partial g_2(y)}{\partial y_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial g_n(y)}{\partial y_1} & \frac{\partial g_n(y)}{\partial y_2} & \cdots & \frac{\partial g_n(y)}{\partial y_n}
\end{pmatrix}.$$

Let $f : (B, \mathcal{B}(B)) \longrightarrow (\mathbb{R}, \mathcal{B})$ be a Lebesgue integrable function. Then,
$$\int_B f(x_1, x_2, \ldots, x_n)\,d(x_1, \ldots, x_n) = \int_{g^{-1}(B)} f(\underbrace{g(y_1, y_2, \ldots, y_n)}_{(x_1, x_2, \ldots, x_n)})\left|J_g(y_1, y_2, \ldots, y_n)\right|d(y_1, \ldots, y_n),$$
where $|J_g(y_1, y_2, \ldots, y_n)|$ is the absolute value of the determinant of
$J_g(y_1, y_2, \ldots, y_n)$.

Alternatively, we can assume that (i) $g : M \longrightarrow \mathbb{R}^n$ is a continuously
differentiable function, where $M$ is an open subset of $\mathbb{R}^n$, and (ii) the function
$g$ restricted to $A$, $g : A \longrightarrow g(A)$, where $A$ is a Borel set in $\mathbb{R}^n$ such that
$A \subset M$, is a bijective function (or one-to-one correspondence). Let
$f : (g(A), \mathcal{B}(g(A))) \longrightarrow (\mathbb{R}, \mathcal{B})$ be a Lebesgue integrable function. Then,
$$\int_{g(A)} f(x_1, x_2, \ldots, x_n)\,d(x_1, \ldots, x_n) = \int_A f(\underbrace{g(y_1, y_2, \ldots, y_n)}_{(x_1, x_2, \ldots, x_n)})\left|J_g(y_1, y_2, \ldots, y_n)\right|d(y_1, \ldots, y_n).$$


A very useful change of variable (or substitution) is the change to polar
coordinates in $\mathbb{R}^2$.

Consider the function
$$g : \mathbb{R}_{++} \times [0, 2\pi) \longrightarrow \mathbb{R}^2,$$
given by
$$(x, y) = g(r, \theta), \quad \text{with} \quad \begin{cases} x = r \cdot \cos\theta \\ y = r \cdot \sin\theta. \end{cases}$$

Note that
$$x^2 + y^2 = r^2\cos^2\theta + r^2\sin^2\theta = r^2.$$

Clearly, $g(\mathbb{R}_{++} \times [0, 2\pi)) = \mathbb{R}^2 \setminus (0,0)$, where $\mathbb{R}^2 \setminus (0,0)$ is the real plane
without the point $(0,0)$, which has zero Lebesgue measure.

Moreover, the function $g$ restricted to $g^{-1}\left(\mathbb{R}^2 \setminus (0,0)\right) = \mathbb{R}_{++} \times [0, 2\pi)$,
$g : \mathbb{R}_{++} \times [0, 2\pi) \longrightarrow \mathbb{R}^2 \setminus (0,0)$, is bijective.

$C$ is a circular region in $\mathbb{R}^2 \setminus (0,0)$ if $g^{-1}(C)$ is a measurable rectangle on
$\mathbb{R}_{++} \times [0, 2\pi)$, that is, if $g^{-1}(C)$ is the Cartesian product of two Borel sets,
$g^{-1}(C) = D_1 \times D_2$, with $D_1 \in \mathcal{B}(\mathbb{R}_{++})$ and $D_2 \in \mathcal{B}([0, 2\pi))$.

Obviously, we can define the inverse function $g^{-1}$ on $\mathbb{R}^2 \setminus (0,0)$ as
$$(r, \theta) = g^{-1}(x, y), \quad \text{with} \quad \begin{cases} r = \left(x^2 + y^2\right)^{1/2} \\ \theta = \arctan\left(\dfrac{y}{x}\right). \end{cases}$$

Moreover,
$$J_g(r, \theta) = \begin{pmatrix} \cos\theta & -r \cdot \sin\theta \\ \sin\theta & r \cdot \cos\theta \end{pmatrix},$$
so that
$$|J_g(r, \theta)| = \left|r\cos^2\theta + r\sin^2\theta\right| = |r| = r.$$


Therefore, by making the change to polar coordinates, we can compute the
following integral over a circular region $C$:
$$\int_C f(x, y)\,d(x, y) = \int_{g^{-1}(C)} f(r\cos\theta, r\sin\theta)\,r\,d(r, \theta) = \int_{D_1}\int_{D_2} f(r\cos\theta, r\sin\theta)\,r\,d\theta\,dr,$$
where we use Fubini's theorem in the last equality since $g^{-1}(C)$ is a
measurable rectangle on $\mathbb{R}_{++} \times [0, 2\pi)$, i.e., it is the Cartesian product of
two Borel sets, $g^{-1}(C) = D_1 \times D_2$, with $D_1 \in \mathcal{B}(\mathbb{R}_{++})$ and $D_2 \in \mathcal{B}([0, 2\pi))$.

Moreover, we can easily compute the following integral over a circular region $C$:
$$\int_C h\left(x^2 + y^2\right)d(x, y) = \int_{g^{-1}(C)} h(r^2)\,r\,d(r, \theta) = \int_{D_1}\int_{D_2} h(r^2)\,r\,d\theta\,dr = \left(\int_{D_1} h(r^2)\,r\,dr\right)\left(\int_{D_2} d\theta\right).$$


Example:
$$\int_0^\infty\int_0^\infty e^{-(x^2+y^2)}\,dx\,dy = \int_{\mathbb{R}_+ \times \mathbb{R}_+} e^{-(x^2+y^2)}\,d(x, y)$$
$$= \left(\int_{\mathbb{R}_{++}} e^{-r^2} r\,dr\right)\left(\int_{[0,\pi/2)} d\theta\right) = \left[\frac{-e^{-r^2}}{2}\right]_0^\infty \cdot [\theta]_0^{\pi/2} = \frac{1}{2} \cdot \frac{\pi}{2} = \frac{\pi}{4}.$$

Note that
$$\int_0^\infty\int_0^\infty e^{-(x^2+y^2)}\,dx\,dy = \left(\int_0^\infty e^{-x^2}\,dx\right)\left(\int_0^\infty e^{-y^2}\,dy\right) = M^2,$$
where $M = \int_0^\infty e^{-x^2}\,dx = \int_0^\infty e^{-y^2}\,dy$.

Therefore,
$$M = \int_0^\infty e^{-x^2}\,dx = \left(\frac{\pi}{4}\right)^{1/2} = \frac{\sqrt{\pi}}{2} \implies \int_{-\infty}^\infty e^{-x^2}\,dx = \sqrt{\pi}.$$
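Both integrals can be confirmed numerically; a sketch using scipy's quadrature routines (an illustration, not part of the original slides):

```python
# Numerical confirmation of the Gaussian integrals computed above.
import numpy as np
from scipy.integrate import quad, dblquad

val2d = dblquad(lambda y, x: np.exp(-(x**2 + y**2)),
                0, np.inf, lambda x: 0, lambda x: np.inf)[0]
val1d = quad(lambda x: np.exp(-x**2), -np.inf, np.inf)[0]
print(val2d, np.pi/4)           # ~0.78539816...
print(val1d, np.sqrt(np.pi))    # ~1.77245385...
```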


4.6. The uniform density

The absolutely continuous random variable $\tilde x : (\Omega, \mathcal{F}, P) \to (\mathbb{R}, \mathcal{B})$ is
uniform (or has a uniform distribution) on $(a, b)$ if its density function is
$$f(x) = \begin{cases} \dfrac{1}{b-a} & \text{for } x \in (a, b) \\[1ex] 0 & \text{otherwise.} \end{cases}$$

The distribution function is
$$F(x) = \begin{cases} 0 & \text{for } x \leq a \\[1ex] \dfrac{x-a}{b-a} & \text{for } x \in (a, b) \\[1ex] 1 & \text{for } x \geq b. \end{cases}$$
Let $a \leq c < d \leq b$. Then,
$$P\{c \leq \tilde x \leq d\} = P_{\tilde x}[c, d] = \frac{d-c}{b-a}$$
(and the probability is the same if either inequality is strict).

Mean:
$$\mu = E(\tilde x) = \int_{-\infty}^\infty x f(x)\,dx = \int_a^b x\left(\frac{1}{b-a}\right)dx = \frac{1}{b-a}\left[\frac{x^2}{2}\right]_a^b = \frac{1}{b-a}\left[\frac{b^2-a^2}{2}\right] = \frac{b+a}{2},$$
where the last equality follows since $(b+a)(b-a) = b^2 - a^2$.

Moreover,
$$E\left(\tilde x^2\right) = \int_{-\infty}^\infty x^2 f(x)\,dx = \int_a^b x^2\left(\frac{1}{b-a}\right)dx = \frac{1}{b-a}\left[\frac{x^3}{3}\right]_a^b = \frac{b^3-a^3}{3(b-a)},$$
so that
$$\mathrm{Var}(\tilde x) = E\left(\tilde x^2\right) - [E(\tilde x)]^2 = \frac{b^3-a^3}{3(b-a)} - \left(\frac{b+a}{2}\right)^2 = \frac{(b-a)^2}{12},$$
where the last equality is obtained after some tedious algebra.

Moment-generating function:
$$M_{\tilde x}(t) = \begin{cases} \dfrac{e^{tb} - e^{ta}}{t(b-a)} & \text{for } t \neq 0 \\[1ex] 1 & \text{for } t = 0. \end{cases}$$
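A quick check of the uniform moments and MGF value (a sketch; scipy parameterizes $U(a, b)$ as `uniform(loc=a, scale=b-a)`, and the numbers are illustrative):

```python
# Uniform mean, variance, and an MGF evaluation versus the formulas above.
import numpy as np
from scipy.stats import uniform

a, b = 2.0, 5.0
U = uniform(loc=a, scale=b - a)
print(U.mean(), (a + b)/2)        # both 3.5
print(U.var(), (b - a)**2/12)     # both 0.75

t = 0.3
mc = np.exp(t * U.rvs(size=1_000_000, random_state=0)).mean()   # Monte Carlo E[e^{tX}]
print(mc, (np.exp(t*b) - np.exp(t*a))/(t*(b - a)))              # approximately equal
```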


4.7. The gamma, exponential, and chi-square distributions

The gamma distribution.

The random variable $\tilde x : (\Omega, \mathcal{F}, P) \to (\mathbb{R}, \mathcal{B})$ has the gamma distribution
if its density is
$$f(x; \alpha, \beta) = \begin{cases} \dfrac{1}{\beta^\alpha\,\Gamma(\alpha)}\, x^{\alpha-1} e^{-x/\beta} & \text{for } x > 0 \\[1ex] 0 & \text{otherwise,} \end{cases}$$
with $\alpha > 0$, $\beta > 0$, and where
$$\Gamma(\alpha) = \int_0^\infty y^{\alpha-1} e^{-y}\,dy, \quad \text{for } \alpha > 0 \quad \text{(the gamma function).}$$

[Figure: densities of the gamma distribution.]

Making the change of variable
$$x = g(y) = \beta y, \quad y = g^{-1}(x) = \frac{x}{\beta}, \quad \text{so that} \quad \frac{dx}{dy} = g'(y) = \beta > 0,$$
we can check that
$$\int_0^\infty f(x; \alpha, \beta)\,dx = \int_0^\infty k x^{\alpha-1} e^{-x/\beta}\,dx = \int_0^\infty k(\beta y)^{\alpha-1} e^{-y}\beta\,dy = k\beta^\alpha \underbrace{\int_0^\infty y^{\alpha-1} e^{-y}\,dy}_{\Gamma(\alpha)} = 1.$$
Therefore,
$$k = \frac{1}{\beta^\alpha\,\Gamma(\alpha)}.$$


Properties of the gamma function $\Gamma(\alpha)$.

(a) $\Gamma(1) = \int_0^\infty e^{-y}\,dy = \left[-e^{-y}\right]_0^\infty = \lim_{y \to \infty}(-e^{-y}) + e^0 = 1$.

(b) $\Gamma(\alpha) = (\alpha-1)\Gamma(\alpha-1)$ for $\alpha > 1$.

Proof. Integrating by parts:
$$\Gamma(\alpha) = \int_0^\infty \underbrace{y^{\alpha-1}}_{F(y)}\underbrace{e^{-y}}_{G'(y)}\,dy = -\left[y^{\alpha-1}e^{-y}\right]_0^\infty - \int_0^\infty \underbrace{(\alpha-1)y^{\alpha-2}}_{F'(y)}\underbrace{\left(-e^{-y}\right)}_{G(y)}\,dy$$
$$= 0 + (\alpha-1)\int_0^\infty y^{\alpha-2}e^{-y}\,dy = (\alpha-1)\Gamma(\alpha-1).$$

Note that (a), (b), and the continuity of the gamma function imply that
$$1 = \lim_{\alpha \to 1^+}\Gamma(\alpha) = \lim_{x\,\equiv\,\alpha-1 \to 0^+}[x \cdot \Gamma(x)] \implies \lim_{x \to 0^+}\Gamma(x) = \infty.$$

(c) $\Gamma(\alpha) = (\alpha-1)!$ when $\alpha$ is a strictly positive integer.

(d) $\Gamma\left(\dfrac{1}{2}\right) = \sqrt{\pi}$.

Proof: See handout 1.A.

Corollary. $\displaystyle\int_0^\infty e^{-\frac{1}{2}z^2}\,dz = \sqrt{\frac{\pi}{2}}$.

Proof: See handout 1.B.

[Figure: the gamma function $\Gamma(\alpha)$ plotted for $\alpha$ between 0 and 4.]
Let $\tilde x$ be a random variable that has a gamma distribution with parameters
$\alpha$ and $\beta$. Then,

(a) $\mu'_r = \dfrac{\beta^r\,\Gamma(\alpha+r)}{\Gamma(\alpha)}$.

Proof: See handout 1.C.

(b) $M_{\tilde x}(t) = (1-\beta t)^{-\alpha}$ for $t < 1/\beta$.

Proof: See handout 1.D.

Corollary.
(i) $\mu = \mu'_1 = \alpha\beta$,
(ii) $\mu'_2 = \alpha(\alpha+1)\beta^2$,
(iii) $\sigma^2 = \mu'_2 - \mu^2 = \alpha\beta^2$.
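These moment formulas are easy to check numerically; a sketch (scipy uses shape $\alpha$ and scale $\beta$; the parameter values are illustrative):

```python
# Gamma moments versus mu'_r = beta**r * Gamma(alpha + r)/Gamma(alpha).
from scipy.stats import gamma as gamma_dist
from scipy.special import gamma as Gamma

alpha, beta = 2.5, 1.5
X = gamma_dist(a=alpha, scale=beta)
print(X.mean(), alpha*beta)          # both 3.75
print(X.var(), alpha*beta**2)        # both 5.625

r = 3
print(X.moment(r), beta**r * Gamma(alpha + r)/Gamma(alpha))  # equal
```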


The exponential distribution.

The random variable $\tilde x : (\Omega, \mathcal{F}, P) \to (\mathbb{R}, \mathcal{B})$ has the exponential
distribution if its density is the gamma density with $\alpha = 1$ and $\beta = \theta$,
$$f(x; \theta) = \begin{cases} \dfrac{1}{\theta}\,e^{-x/\theta} & \text{for } x > 0 \\[1ex] 0 & \text{otherwise.} \end{cases}$$

This distribution is used to model waiting time.

The distribution function is
$$P\{\tilde x \leq x\} = F(x) = \begin{cases} 1 - e^{-x/\theta} & \text{for } x > 0 \\ 0 & \text{otherwise,} \end{cases}$$
which gives the probability of waiting less than $x$ units of time.

Mean and variance:
$$\mu = \alpha\beta = \theta, \qquad \sigma^2 = \alpha\beta^2 = \theta^2.$$

Moment-generating function:
$$M_{\tilde x}(t) = (1-\beta t)^{-\alpha} = \frac{1}{1-\theta t} \quad \text{for } t < 1/\theta.$$
The chi-square ($\chi^2$) distribution.

[Portrait: Karl Pearson (1857 - 1936)]

The random variable $\tilde x : (\Omega, \mathcal{F}, P) \to (\mathbb{R}, \mathcal{B})$ has the $\chi^2$ (chi-square)
distribution if its density is the gamma density with $\alpha = n/2$ and $\beta = 2$,
$$f(x; n) = \begin{cases} \dfrac{1}{2^{n/2}\,\Gamma\left(\frac{n}{2}\right)}\, x^{\frac{n-2}{2}} e^{-x/2} & \text{for } x > 0 \\[1ex] 0 & \text{otherwise.} \end{cases}$$


Mean and variance:
$$\mu = \alpha\beta = n, \qquad \sigma^2 = \alpha\beta^2 = 2n.$$

Moment-generating function:
$$M_{\tilde x}(t) = (1-\beta t)^{-\alpha} = (1-2t)^{-n/2} \quad \text{for } t < 1/2.$$

Notation:
$\tilde x \sim B(n, \theta)$: $\tilde x$ has the binomial distribution.
$\tilde x \sim U(a, b)$: $\tilde x$ has the uniform distribution on $(a, b)$.
$\tilde x \sim G(\alpha, \beta)$: $\tilde x$ has the gamma distribution.
$\tilde x \sim \chi^2_n$: $\tilde x$ has the chi-square distribution with $n$ degrees of freedom.

[Figure: densities of the chi-square distributions with 2, 4, and 6 degrees of freedom.]


4.8. The beta distribution

The random variable $\tilde x : (\Omega, \mathcal{F}, P) \to (\mathbb{R}, \mathcal{B})$ has the beta distribution if
its density is
$$f(x; \alpha, \beta) = \begin{cases} \dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1} & \text{for } x \in (0, 1) \\[1ex] 0 & \text{otherwise,} \end{cases}$$
with $\alpha > 0$, $\beta > 0$.

If $\alpha = 1$ and $\beta = 1$, then the beta distribution becomes the uniform
distribution on $(0, 1)$.

[Figure: densities of the beta distribution.]

Mean and variance:
$$\mu = \frac{\alpha}{\alpha+\beta}, \qquad \sigma^2 = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}.$$
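A quick check of the beta mean and variance (an illustration with arbitrary parameter values):

```python
# Beta mean and variance versus the closed-form expressions above.
from scipy.stats import beta as beta_dist

a_, b_ = 2.0, 3.0
B = beta_dist(a_, b_)
print(B.mean(), a_/(a_ + b_))                          # both 0.4
print(B.var(), a_*b_/((a_ + b_)**2 * (a_ + b_ + 1)))   # both 0.04
```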
4.9. The normal distribution

[Portrait: Carl Friedrich Gauss (1777 - 1855)]

The random variable $\tilde x : (\Omega, \mathcal{F}, P) \to (\mathbb{R}, \mathcal{B})$ has the normal distribution
(or is normal) if its density is
$$n(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \quad \text{with } \sigma > 0, \text{ for all } x \in \mathbb{R}.$$

Notation: $\tilde x \sim N(\mu, \sigma^2)$.

A random variable $\tilde z$ has the standard normal distribution (or is standard
normal) if $\tilde z \sim N(0, 1)$. Thus, the density of a standard normal random
variable is
$$n(z; 0, 1) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}, \quad \text{for all } z \in \mathbb{R}.$$

Normal densities (i.e., densities of normal distributions) are symmetric
around $\mu$.
Properties of the normal distribution $N(\mu, \sigma^2)$.

(1) $\displaystyle\int_{-\infty}^\infty n(x; \mu, \sigma)\,dx = 1$.

Proof: See handout 2.A.

(2) $M_{\tilde x}(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}$. In particular, if $\tilde z \sim N(0, 1)$, then $M_{\tilde z}(t) = e^{t^2/2}$.

Proof: See handout 2.B.

(3)
$$M'_{\tilde x}(t) = (\mu + \sigma^2 t)M_{\tilde x}(t) \implies M'_{\tilde x}(0) = \mu = E(\tilde x).$$
$$M''_{\tilde x}(t) = \sigma^2 M_{\tilde x}(t) + (\mu + \sigma^2 t)^2 M_{\tilde x}(t) \implies M''_{\tilde x}(0) = \sigma^2 + \mu^2 = E(\tilde x^2).$$
$$\mathrm{Var}(\tilde x) = E(\tilde x^2) - [E(\tilde x)]^2 = \sigma^2.$$


Note: If $\mu_{\tilde x}$ and $\sigma_{\tilde x} > 0$ are the mean and the standard deviation,
respectively, of the random variable $\tilde x$, then the "standardized" random
variable $\tilde z = \dfrac{\tilde x - \mu_{\tilde x}}{\sigma_{\tilde x}}$ has $\mu_{\tilde z} = 0$ and $\sigma^2_{\tilde z} = \sigma_{\tilde z} = 1$.

(4) If $\tilde x \sim N(\mu, \sigma^2)$ and $\tilde z = \dfrac{\tilde x - \mu}{\sigma}$, then $\tilde z \sim N(0, 1)$.

Proof: Let $x = g(z) = \mu + \sigma z$, so that $z = g^{-1}(x) = \dfrac{x-\mu}{\sigma}$ and
$g'(z) = \sigma > 0$.

For all $A \in \mathcal{B}$, we have
$$P_{\tilde z}(A) = P\{\tilde z \in A\} = P\left\{g^{-1}(\tilde x) \in A\right\} = P\{\tilde x \in g(A)\}$$
$$= P_{\tilde x}(g(A)) = \int_{g(A)} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} dx = \int_A \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}\sigma\,dz$$
$$= \int_A \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}\,dz = \int_A n(z; 0, 1)\,dz.$$

Therefore, $\tilde z \sim N(0, 1)$. Q.E.D.
The distribution function of the standard normal distribution is tabulated (see
the table at the end of these notes).

[Figure: the table gives the area of the shaded region under the standard
normal density, i.e., the area between 0 and z.]


If $a < 0$, then $N(a) = 1 - N(-a)$.

If $a < b < 0$, then
$$N(b) - N(a) = 1 - N(-b) - [1 - N(-a)] = N(-a) - N(-b).$$

If $a < 0$ and $b > 0$, then
$$N(b) - N(a) = N(b) - [1 - N(-a)] = N(b) + N(-a) - 1.$$

Then, if $\tilde x \sim N(\mu, \sigma^2)$,
$$P\{a \leq \tilde x \leq b\} = P\left\{\frac{a-\mu}{\sigma} \leq \frac{\tilde x - \mu}{\sigma} \leq \frac{b-\mu}{\sigma}\right\} = P\left\{\frac{a-\mu}{\sigma} \leq \tilde z \leq \frac{b-\mu}{\sigma}\right\} = N\left(\frac{b-\mu}{\sigma}\right) - N\left(\frac{a-\mu}{\sigma}\right),$$
where $\tilde z \sim N(0, 1)$ and $N(\cdot)$ is the distribution function of the standard
normal distribution.

Example. If $\tilde x \sim N(\mu, \sigma^2)$ with $\mu = 4$ and $\sigma^2 = 49$, then
$$P\{-2 \leq \tilde x \leq 5\} = P\left\{\frac{-2-4}{7} \leq \frac{\tilde x - \mu}{\sigma} \leq \frac{5-4}{7}\right\} = P\{-0.8571 \leq \tilde z \leq 0.1429\}$$
$$= N(0.1429) - N(-0.8571) = N(0.1429) + N(0.8571) - 1 = 0.5568 + 0.8043 - 1 = 0.3611.$$
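The same probability computed with scipy instead of the table (an illustration, not part of the original slides):

```python
# P{-2 <= x <= 5} for x ~ N(4, 49), i.e., standard deviation 7.
from scipy.stats import norm

print(norm.cdf(5, loc=4, scale=7) - norm.cdf(-2, loc=4, scale=7))  # ~0.3611
```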


4.10. The multivariate normal distribution and its properties

The random vector $\tilde x = (\tilde x_1, \tilde x_2, \ldots, \tilde x_n)^\top : (\Omega, \mathcal{F}, P) \to (\mathbb{R}^n, \mathcal{B})$ has the
multivariate normal distribution ($\tilde x \sim MN(\mu, \Sigma)$) if its density function is
$$f(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2}|\Sigma|^{1/2}}\exp\left(-\frac{1}{2}(x-\mu)^\top\Sigma^{-1}(x-\mu)\right), \quad \forall x \in \mathbb{R}^n,$$
where
$$x = (x_1, x_2, \ldots, x_n)^\top, \qquad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_{nn} \end{pmatrix},$$
$\Sigma$ is a symmetric positive definite matrix, and $|\Sigma| > 0$ is the (absolute value
of the) determinant of $\Sigma$.
If $n = 1$, then $f(x; \mu, \Sigma) = n(x; \mu, \sigma)$, with $\sigma = \sqrt{\Sigma}$, as $\Sigma \in \mathbb{R}$ in this case.

Properties of the multivariate normal distribution.

(1) The marginal distribution of any sub-vector of the multivariate normally
distributed random vector $(\tilde x_1, \tilde x_2, \ldots, \tilde x_n)^\top$ will be multivariate normally
distributed, and the corresponding sub-vector of $\mu$ and the corresponding
sub-matrix of $\Sigma$ will be the mean vector and the variance-covariance matrix
of that random sub-vector. In particular,
$$\tilde x_i \sim N(\mu_i, \sigma^2_i), \quad \text{where } \sigma^2_i \equiv \sigma_{ii},$$
and
$$\mathrm{Cov}(\tilde x_i, \tilde x_j) = \sigma_{ij}.$$

However, it is not true in general that the joint distribution of (marginally)
normal random vectors/variables is multivariate normal.


(2) Moment-generating function of the multivariate normal distribution:
$$M_{\tilde x}(\underbrace{t_1, t_2, \ldots, t_n}_{t\,\in\,\mathbb{R}^n}) = e^{t^\top\mu + \frac{1}{2}t^\top\Sigma t}.$$

(3) The random vector $\tilde x = (\tilde x_1, \tilde x_2, \ldots, \tilde x_n)^\top$ is multivariate normally
distributed, $\tilde x \sim MN(\mu, \Sigma)$, with $\Sigma$ diagonal (i.e., $\sigma_{ij} = 0$ for all $i \neq j$) if
and only if the random variables $\tilde x_1, \tilde x_2, \ldots, \tilde x_n$ are normally distributed and
independent.

Observe that, for $\Sigma$ diagonal,
$$f(x_1, x_2, \ldots, x_n; \mu, \Sigma) = \prod_{i=1}^n n(x_i; \mu_i, \sigma_i), \quad \text{where } \sigma_i \equiv (\sigma_{ii})^{1/2}.$$

Under multivariate normality, zero covariance implies independence! However,
it is not true that the joint distribution of uncorrelated normally distributed
random variables is always multivariate normal.

Note: When $\Sigma$ is diagonal and $\sigma^2_i$ is the same for all $i$, we say that $\tilde x$ has a
multivariate circular or spherical normal distribution.
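The factorization in Property 3 is easy to see numerically; a minimal sketch (parameter values are illustrative):

```python
# With a diagonal covariance matrix, the joint normal density factors into
# the product of the marginal normal densities.
import numpy as np
from scipy.stats import multivariate_normal, norm

mu = np.array([1.0, -2.0])
Sigma = np.diag([4.0, 9.0])          # diagonal => independent components
x = np.array([0.5, -1.0])

joint = multivariate_normal(mu, Sigma).pdf(x)
product = norm.pdf(x[0], 1.0, 2.0) * norm.pdf(x[1], -2.0, 3.0)
print(joint, product)                # equal
```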
(4) Consider the following partitions of $x$, $\mu$, $\Sigma$, and $\tilde x$:
$$x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \quad x_1 \in \mathbb{R}^{n_1},\ x_2 \in \mathbb{R}^{n_2},\ n_1 + n_2 = n,$$
$$\mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}, \qquad \Sigma_{12}^\top = \Sigma_{21},$$
$$\tilde x = \begin{pmatrix} \tilde x_1 \\ \tilde x_2 \end{pmatrix}, \quad \text{where } \tilde x_1 \sim MN(\mu_1, \Sigma_{11}) \text{ and } \tilde x_2 \sim MN(\mu_2, \Sigma_{22}),$$
and $\Sigma_{ij} = \mathrm{Cov}(\tilde x_i, \tilde x_j) = E\left[(\tilde x_i - \mu_i)(\tilde x_j - \mu_j)^\top\right]$ is an $n_i \times n_j$ matrix,
for $i = 1, 2$, $j = 1, 2$.

Then, the conditional density of the random vector $\tilde x_1$ given $\tilde x_2 = x_2$,
$$f_{\tilde x_1|\tilde x_2}\left(x_1|x_2; \mu_{\tilde x_1|\tilde x_2=x_2}, \Sigma_{\tilde x_1|\tilde x_2=x_2}\right) = \frac{f(x; \mu, \Sigma)}{f_{\tilde x_2}(x_2; \mu_2, \Sigma_{22})}, \quad \forall x_1 \in \mathbb{R}^{n_1},$$
is the density of a multivariate normal random vector with mean $\mu_{\tilde x_1|\tilde x_2=x_2}$
and covariance matrix $\Sigma_{\tilde x_1|\tilde x_2=x_2}$.

Moreover, the conditional mean of the random vector $\tilde x_1$ given $\tilde x_2 = x_2$ is
the following:
$$\mu_{\tilde x_1|\tilde x_2=x_2} \equiv E(\tilde x_1|\tilde x_2 = x_2) = \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2) \in \mathbb{R}^{n_1}.$$

Note: If $n_1 = n_2 = 1$, then
$$E(\tilde x_1|\tilde x_2 = x_2) = \mu_1 + \frac{\sigma_{12}}{\sigma_{22}}(x_2 - \mu_2).$$


Finally, let
$$\Sigma_{\tilde x_1|\tilde x_2=x_2} \equiv \mathrm{Var}(\tilde x_1|\tilde x_2 = x_2) = E\left([\tilde x_1 - E(\tilde x_1|\tilde x_2 = x_2)][\tilde x_1 - E(\tilde x_1|\tilde x_2 = x_2)]^\top \,\middle|\, \tilde x_2 = x_2\right)$$
be the $n_1 \times n_1$ conditional covariance matrix of the random vector $\tilde x_1$ given
$\tilde x_2 = x_2$. Then,
$$\Sigma_{\tilde x_1|\tilde x_2=x_2} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21},$$
which does not depend on the value $x_2$ taken by the random vector $\tilde x_2$.

Note: If $n_1 = n_2 = 1$, then
$$\mathrm{Var}(\tilde x_1|\tilde x_2 = x_2) = \sigma^2_1 - \frac{\sigma^2_{12}}{\sigma_{22}} = \sigma^2_1(1 - \rho^2),$$
so that the random variable $\mathrm{Var}(\tilde x_1|\tilde x_2) : (\Omega, \mathcal{F}) \longrightarrow (\mathbb{R}, \mathcal{B})$ is a constant.
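A sketch of Property 4 for $n_1 = n_2 = 1$: the conditional-mean and conditional-variance formulas checked against a Monte Carlo conditioning exercise (all numbers are illustrative):

```python
# Conditional mean/variance of x1 given x2 ~ x2_star for a bivariate normal.
import numpy as np

mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
x2_star = 3.0

cond_mean = mu[0] + Sigma[0, 1]/Sigma[1, 1]*(x2_star - mu[1])   # 1.8
cond_var  = Sigma[0, 0] - Sigma[0, 1]**2/Sigma[1, 1]            # 1.36

rng = np.random.default_rng(0)
z = rng.multivariate_normal(mu, Sigma, size=2_000_000)
mask = np.abs(z[:, 1] - x2_star) < 0.01      # condition on x2 near 3
print(cond_mean, z[mask, 0].mean())          # approximately equal
print(cond_var, z[mask, 0].var())            # approximately equal
```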
(5) Let $\tilde x = (\tilde x_1, \tilde x_2, \ldots, \tilde x_n)^\top \sim MN(\mu, \Sigma)$ and let $a = (a_1, a_2, \ldots, a_n)^\top$
be a non-zero vector of scalars. Then,
$$a^\top\tilde x = \sum_{i=1}^n a_i\tilde x_i \sim N(a^\top\mu,\ a^\top\Sigma a).$$

Note that in the previous result the vector $\tilde x = (\tilde x_1, \tilde x_2, \ldots, \tilde x_n)^\top$ has to be
multivariate normal. It is not enough that each component of that vector be
normal.

General Proposition. Let $\tilde x = (\tilde x_1, \tilde x_2, \ldots, \tilde x_n)^\top \sim MN(\mu, \Sigma)$ and let
$\tilde y = c + A\tilde x$ be an affine transformation of $\tilde x$, where $c \in \mathbb{R}^m$ is a column
vector and $A$ is an $m \times n$ matrix with $\mathrm{rank}(A) = m \leq n$. Then,
$\tilde y = (\tilde y_1, \tilde y_2, \ldots, \tilde y_m)^\top \sim MN(c + A\mu,\ A\Sigma A^\top)$.

The previous Proposition implies Property 5 (when $m = 1$, $A = a^\top$, and
$c = 0$) and Property 1. For the latter, consider the following example: to
extract the sub-vector $(\tilde x_1, \tilde x_2, \tilde x_4)^\top$ from the random vector
$\tilde x = (\tilde x_1, \tilde x_2, \ldots, \tilde x_n)^\top$, use the vector $c = 0$ and the $3 \times n$ matrix
$$A = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 0 & 1 & 0 & \cdots & 0 \end{pmatrix},$$
which extracts the desired sub-vector directly.


4.11. Multivariate normality and linear models

From linearity to multivariate normality:

Consider the random variable $\tilde y$,
$$\tilde y = a + \underbrace{b^\top\tilde x}_{=\tilde x^\top b} + \tilde\varepsilon = a + \sum_{i=1}^n b_i\tilde x_i + \tilde\varepsilon,$$
where $a \in \mathbb{R}$, $b = (b_1, b_2, \ldots, b_n)^\top \in \mathbb{R}^n$ is a column vector,
$\tilde x = (\tilde x_1, \tilde x_2, \ldots, \tilde x_n)^\top \sim MN(\mu_x, \Sigma_x)$, $\tilde\varepsilon \sim N(0, \sigma^2_\varepsilon)$, and $\tilde\varepsilon$ and $\tilde x$ are
independent.

We know from Property 3 above that the vector $(\tilde x_1, \tilde x_2, \ldots, \tilde x_n, \tilde\varepsilon)^\top$ is
multivariate normal. Thus, from the General Proposition above, the random
variable $\tilde y$ is normal since it is an affine transformation of the random
variables appearing in the multivariate normal random vector
$(\tilde x_1, \tilde x_2, \ldots, \tilde x_n, \tilde\varepsilon)^\top$. Finally, also from the General Proposition above, the
random vector $(\tilde y, \tilde x_1, \tilde x_2, \ldots, \tilde x_n)^\top$ is multivariate normal.

The mean $\mu_y$ and the variance $\sigma^2_y$ of $\tilde y$ are computed as follows:

Mean:
$$\mu_y = E(\tilde y) = E(a + b^\top\tilde x + \tilde\varepsilon) = a + \underbrace{b^\top\mu_x}_{=\mu_x^\top b} + E(\tilde\varepsilon) = a + b^\top\mu_x + 0 = a + b^\top\mu_x. \tag{1}$$

Variance:
$$\sigma^2_y = \mathrm{Var}(\tilde y) = \mathrm{Var}(a + b^\top\tilde x + \tilde\varepsilon) = b^\top\Sigma_x b + \sigma^2_\varepsilon. \tag{2}$$

Therefore, $\tilde y \sim N(\underbrace{a + b^\top\mu_x}_{\mu_y},\ \underbrace{b^\top\Sigma_x b + \sigma^2_\varepsilon}_{\sigma^2_y})$.


Let us compute the conditional expectation of $\tilde y$ given $\tilde x = x$:
$$E(\tilde y|\tilde x = x) = E(a + b^\top\tilde x + \tilde\varepsilon \,|\, \tilde x = x) = a + E(b^\top\tilde x \,|\, \tilde x = x) + E(\tilde\varepsilon|\tilde x = x)$$
$$= a + E(\underbrace{b^\top x}_{=x^\top b}) + E(\tilde\varepsilon) = a + b^\top x,$$
where the third equality holds because $\tilde\varepsilon$ and $\tilde x$ are independent.

Therefore, the conditional expectation $E(\tilde y|\tilde x = x)$ is an affine transformation
of $x$. This agrees with our previous Property 4, according to which
$$E(\tilde y|\tilde x = x) = \mu_y + \Sigma_{y,x}\Sigma_x^{-1}(x - \mu_x) = \left(\mu_y - \Sigma_{y,x}\Sigma_x^{-1}\mu_x\right) + \Sigma_{y,x}\Sigma_x^{-1}x.$$

Thus, the following equalities should be true:
$$a = \mu_y - \Sigma_{y,x}\Sigma_x^{-1}\mu_x \tag{3}$$
and
$$b^\top = \Sigma_{y,x}\Sigma_x^{-1} \quad \text{or} \quad b = \Sigma_x^{-1}\Sigma_{y,x}^\top = \Sigma_x^{-1}\Sigma_{x,y}. \tag{4}$$

We can check that (4) holds since the $1 \times n$ matrix $\Sigma_{y,x}$ satisfies
$$\Sigma_{y,x} = \mathrm{Cov}(\tilde y, \tilde x) = \mathrm{Cov}(a + b^\top\tilde x + \tilde\varepsilon,\ \tilde x) = b^\top\Sigma_x,$$
so that
$$\Sigma_{y,x}\Sigma_x^{-1} = b^\top\Sigma_x\Sigma_x^{-1} = b^\top.$$


To check that (3) holds, note that, from (1) and (4), we get
$$a = \mu_y - b^\top\mu_x = \mu_y - \Sigma_{y,x}\Sigma_x^{-1}\mu_x.$$

We can now compute the conditional variance of $\tilde y$ given $\tilde x = x$:
$$\mathrm{Var}(\tilde y|\tilde x = x) = \mathrm{Var}(a + b^\top\tilde x + \tilde\varepsilon \,|\, \tilde x = x) = \mathrm{Var}(a + b^\top x + \tilde\varepsilon) = \mathrm{Var}(\tilde\varepsilon) = \sigma^2_\varepsilon,$$
which agrees with our previous Property 4, according to which
$$\mathrm{Var}(\tilde y|\tilde x = x) = \sigma^2_y - \Sigma_{y,x}\Sigma_x^{-1}\Sigma_{x,y},$$
and, thus, $\mathrm{Var}(\tilde y|\tilde x = x)$ does not depend on the value $x$ taken by the
random vector $\tilde x$, i.e., the random variable $\mathrm{Var}(\tilde y|\tilde x)$ is a constant.

Therefore, we should have
$$\sigma^2_\varepsilon = \sigma^2_y - \Sigma_{y,x}\Sigma_x^{-1}\Sigma_{x,y}.$$

We can check that the previous equality indeed holds since, from (2) and (4),
we get
$$\sigma^2_\varepsilon = \sigma^2_y - b^\top\Sigma_x b = \sigma^2_y - \Sigma_{y,x}\Sigma_x^{-1}\Sigma_x\Sigma_x^{-1}\Sigma_{x,y} = \sigma^2_y - \Sigma_{y,x}\Sigma_x^{-1}\Sigma_{x,y}.$$


From multivariate normality to linearity:

Assume that the random vector $(\tilde y, \underbrace{\tilde x_1, \tilde x_2, \ldots, \tilde x_n}_{\tilde x^\top})^\top$ is $MN(\mu, \Sigma)$, with
$$\mu = (\mu_y, \underbrace{\mu_1, \mu_2, \ldots, \mu_n}_{\mu_x^\top})^\top \quad \text{and} \quad \Sigma = \begin{pmatrix} \sigma^2_y & \Sigma_{y,x} \\ \Sigma_{x,y} & \Sigma_x \end{pmatrix}.$$

Then, we know from Property 4 above that
$$E(\tilde y|\tilde x = x) = \mu_y + \Sigma_{y,x}\Sigma_x^{-1}(x - \mu_x) = \left(\mu_y - \Sigma_{y,x}\Sigma_x^{-1}\mu_x\right) + \Sigma_{y,x}\Sigma_x^{-1}x,$$
or, equivalently,
$$E(\tilde y|\tilde x) = \left(\mu_y - \Sigma_{y,x}\Sigma_x^{-1}\mu_x\right) + \Sigma_{y,x}\Sigma_x^{-1}\tilde x,$$
so that $E(\tilde y|\tilde x)$ is an affine transformation of $\tilde x$. Thus, since
$\tilde x \sim MN(\mu_x, \Sigma_x)$, the random variable $E(\tilde y|\tilde x)$ is normal, as dictated by the
General Proposition above.

Define the random variable
$$\tilde\varepsilon = \tilde y - E(\tilde y|\tilde x) = \tilde y - \left(\mu_y - \Sigma_{y,x}\Sigma_x^{-1}\mu_x\right) - \Sigma_{y,x}\Sigma_x^{-1}\tilde x.$$

According to the General Proposition above, the random variable $\tilde\varepsilon$ is normal
since it is an affine transformation of the random variables appearing in the
multivariate normal random vector $(\tilde y, \tilde x_1, \tilde x_2, \ldots, \tilde x_n)^\top$. Moreover, also from
the General Proposition above, the random vector $(\tilde x_1, \tilde x_2, \ldots, \tilde x_n, \tilde\varepsilon)^\top$ is
multivariate normal.

Thanks to the theorem of total expectation, we can find the expectation of $\tilde\varepsilon$:
$$E(\tilde\varepsilon) = E[\tilde y - E(\tilde y|\tilde x)] = E(\tilde y) - E[E(\tilde y|\tilde x)] = E(\tilde y) - E(\tilde y) = 0.$$


Moreover, the variance of $\tilde\varepsilon$ is
$$\sigma^2_\varepsilon = \mathrm{Var}(\tilde\varepsilon) = \mathrm{Var}[\tilde y - E(\tilde y|\tilde x)] = \mathrm{Var}\left[\tilde y - \left(\mu_y - \Sigma_{y,x}\Sigma_x^{-1}\mu_x\right) - \Sigma_{y,x}\Sigma_x^{-1}\tilde x\right]$$
$$= \mathrm{Var}(\tilde y) + \mathrm{Var}\left(\Sigma_{y,x}\Sigma_x^{-1}\tilde x\right) - 2\,\mathrm{Cov}\left(\tilde y,\ \Sigma_{y,x}\Sigma_x^{-1}\tilde x\right)$$
$$= \sigma^2_y + \Sigma_{y,x}\Sigma_x^{-1}\Sigma_x\Sigma_x^{-1}\Sigma_{x,y} - 2\Sigma_{y,x}\Sigma_x^{-1}\Sigma_{x,y}$$
$$= \sigma^2_y + \Sigma_{y,x}\Sigma_x^{-1}\Sigma_{x,y} - 2\Sigma_{y,x}\Sigma_x^{-1}\Sigma_{x,y} = \sigma^2_y - \Sigma_{y,x}\Sigma_x^{-1}\Sigma_{x,y}.$$

Therefore, $\tilde\varepsilon \sim N(0, \underbrace{\sigma^2_y - \Sigma_{y,x}\Sigma_x^{-1}\Sigma_{x,y}}_{\sigma^2_\varepsilon})$.

Moreover, from generalizing Exercise 30, part (c), of List 3, we know that
$\tilde\varepsilon = \tilde y - E(\tilde y|\tilde x)$ has zero covariance with $\tilde x$, $\mathrm{Cov}(\tilde\varepsilon, \tilde x) = 0^\top \in \mathbb{R}^n$, and,
thus, from Property 3 above, $\tilde\varepsilon$ and $\tilde x$ are independent.

Note that, from the definition of $\tilde\varepsilon$, we have
$$\tilde y = E(\tilde y|\tilde x) + \tilde\varepsilon = \left(\mu_y - \Sigma_{y,x}\Sigma_x^{-1}\mu_x\right) + \Sigma_{y,x}\Sigma_x^{-1}\tilde x + \tilde\varepsilon.$$

Then, we can define the scalar $a = \mu_y - \Sigma_{y,x}\Sigma_x^{-1}\mu_x$ and the column vector
$b = \Sigma_x^{-1}\Sigma_{x,y} \in \mathbb{R}^n$, so that the previous equation becomes
$$\tilde y = a + \underbrace{b^\top\tilde x}_{=\tilde x^\top b} + \tilde\varepsilon,$$
where the random vector $\tilde x$ and the random variable $\tilde\varepsilon$ are independent, and
$\tilde\varepsilon \sim N(0,\ \sigma^2_y - \Sigma_{y,x}\Sigma_x^{-1}\Sigma_{x,y})$.


We can summarize this section with the following proposition, which is related
to regression analysis:

Proposition. The random vector $(\tilde y, \underbrace{\tilde x_1, \tilde x_2, \ldots, \tilde x_n}_{\tilde x^\top})^\top$ has a multivariate
normal distribution $MN(\mu, \Sigma)$ with
$$\mu = (\mu_y, \underbrace{\mu_1, \mu_2, \ldots, \mu_n}_{\mu_x^\top})^\top \quad \text{and} \quad \Sigma = \begin{pmatrix} \sigma^2_y & \Sigma_{y,x} \\ \Sigma_{x,y} & \Sigma_x \end{pmatrix}$$
if and only if there exist a scalar $a$, a vector of scalars $b = (b_1, b_2, \ldots, b_n)^\top$,
and a random variable $\tilde\varepsilon \sim N(0, \sigma^2_\varepsilon)$ such that
$$\tilde y = a + b^\top\tilde x + \tilde\varepsilon \quad (\text{or } \tilde y^\top = a^\top + \tilde x^\top b + \tilde\varepsilon^\top),$$
where the random vector $\tilde x \sim MN(\mu_x, \Sigma_x)$ and the random variable $\tilde\varepsilon$ are
independent.

Moreover, the following equalities hold:
$$a = \mu_y - \Sigma_{y,x}\Sigma_x^{-1}\mu_x, \qquad b = \Sigma_x^{-1}\Sigma_{x,y}, \qquad \sigma^2_\varepsilon = \sigma^2_y - \Sigma_{y,x}\Sigma_x^{-1}\Sigma_{x,y}.$$
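A sketch of how the regression coefficients of the proposition are computed from a given joint mean vector and covariance matrix (all numbers are made up for illustration):

```python
# Compute a, b, and sigma_eps^2 from (mu_y, mu_x) and the blocks of Sigma.
import numpy as np

mu_y, mu_x = 1.0, np.array([0.5, -1.0])
sigma2_y = 4.0
Sigma_yx = np.array([1.2, 0.6])              # Cov(y, x), a 1 x n block
Sigma_x  = np.array([[2.0, 0.3],
                     [0.3, 1.0]])

b = np.linalg.solve(Sigma_x, Sigma_yx)       # b = Sigma_x^{-1} Sigma_{x,y}
a = mu_y - b @ mu_x                          # a = mu_y - Sigma_{y,x} Sigma_x^{-1} mu_x
sigma2_eps = sigma2_y - Sigma_yx @ b         # sigma_y^2 - Sigma_{y,x} Sigma_x^{-1} Sigma_{x,y}
print(a, b, sigma2_eps)
```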


It is straightforward to generalize the previous proposition when
$\tilde y = (\tilde y_1, \tilde y_2, \ldots, \tilde y_m)^\top$ is a random vector, as follows:

Proposition. The random vector $(\underbrace{\tilde y_1, \tilde y_2, \ldots, \tilde y_m}_{\tilde y^\top}, \underbrace{\tilde x_1, \tilde x_2, \ldots, \tilde x_n}_{\tilde x^\top})^\top$ has a
multivariate normal distribution $MN(\mu, \Sigma)$ with
$$\mu = (\underbrace{\mu_{y_1}, \mu_{y_2}, \ldots, \mu_{y_m}}_{\mu_y^\top}, \underbrace{\mu_{x_1}, \mu_{x_2}, \ldots, \mu_{x_n}}_{\mu_x^\top})^\top \quad \text{and} \quad \Sigma = \begin{pmatrix} \Sigma_y & \Sigma_{y,x} \\ \Sigma_{x,y} & \Sigma_x \end{pmatrix}$$
if and only if there exist a vector $a \in \mathbb{R}^m$, an $n \times m$ matrix $B$, and a random
vector $\tilde\varepsilon = (\tilde\varepsilon_1, \tilde\varepsilon_2, \ldots, \tilde\varepsilon_m)^\top \sim MN(0, \Sigma_\varepsilon)$ such that
$$\tilde y = a + B^\top\tilde x + \tilde\varepsilon \quad (\text{or } \tilde y^\top = a^\top + \tilde x^\top B + \tilde\varepsilon^\top),$$
where the random vectors $\tilde x \sim MN(\mu_x, \Sigma_x)$ and $\tilde\varepsilon$ are independent.

Moreover, the following equalities hold:
$$a = \mu_y - \Sigma_{y,x}\Sigma_x^{-1}\mu_x, \qquad B = \Sigma_x^{-1}\Sigma_{x,y}, \qquad \Sigma_\varepsilon = \Sigma_y - \Sigma_{y,x}\Sigma_x^{-1}\Sigma_{x,y}.$$


Informal proof of the formula of integration by change of variable

We want to compute the following Lebesgue integral:
$$\int_{[a,b]} f(x)\,dx.$$

Note that when we consider an interval $[a, b]$, we must have $a < b$. From now on,
whenever we write the integral of a function w.r.t. the Lebesgue measure, it should be
understood that the function is not only integrable w.r.t. that measure, but also that
it is Riemann integrable, so that
$$\int_{[a,b]} f(x)\,dx = \int_a^b f(x)\,dx. \tag{1}$$

Let $x = g(y)$ and assume that $g$ is differentiable on $g^{-1}([a,b])$. This requirement is
fulfilled if we assume that the function $g : M \longrightarrow \mathbb{R}$ is differentiable and $M$ is an
open subset of $\mathbb{R}$ with $g^{-1}([a,b]) \subset M$ (or, equivalently, with $[a,b] \subset g(M)$).

Consider the inverse function $g^{-1} : [a,b] \longrightarrow \mathbb{R}$, so that $y = g^{-1}(x)$. This inverse
function $g^{-1}$ exists if and only if the function $g$ restricted to $g^{-1}([a,b])$, i.e.,
$g : g^{-1}([a,b]) \longrightarrow [a,b]$, is bijective (or a one-to-one correspondence). That is, $g$
must be strictly increasing ($g' > 0$ a.e.) or strictly decreasing ($g' < 0$ a.e.) on
$g^{-1}([a,b])$. Therefore, if $x = a$ then $y = g^{-1}(a)$, whereas if $x = b$ then $y = g^{-1}(b)$.

From the theory of Riemann integration, recall that
$$\int_a^b f(x)\,dx = F(b) - F(a), \quad \text{where } F' = f \text{ on } [a,b].$$
Therefore,
$$\int_b^a f(x)\,dx = F(a) - F(b) = -\int_a^b f(x)\,dx. \tag{2}$$

Note that the primitive (or antiderivative) of $f(g(y)) \cdot g'(y)$ is $F(g(y))$, as follows
from the chain rule,
$$\frac{dF(g(y))}{dy} = F'(g(y)) \cdot g'(y) = f(g(y)) \cdot g'(y).$$
Therefore,
$$\int_{g^{-1}(a)}^{g^{-1}(b)} f(g(y))g'(y)\,dy = \left[F(g(y))\right]_{g^{-1}(a)}^{g^{-1}(b)} = F(g(g^{-1}(b))) - F(g(g^{-1}(a))) = F(b) - F(a) = \int_a^b f(x)\,dx. \tag{3}$$

On the one hand, if $g' > 0$ a.e. ($g$ is strictly increasing) then $g^{-1}(a) < g^{-1}(b)$
and, thus, $g^{-1}([a,b])$ is the interval $\left[g^{-1}(a), g^{-1}(b)\right]$. On the other hand, if
$g' < 0$ a.e. ($g$ is strictly decreasing) then $g^{-1}(a) > g^{-1}(b)$ and, thus, $g^{-1}([a,b])$
is the interval $\left[g^{-1}(b), g^{-1}(a)\right]$.

If $g' > 0$ a.e., then
$$\int_{g^{-1}(a)}^{g^{-1}(b)} f(g(y))g'(y)\,dy = \int_{[g^{-1}(a),\,g^{-1}(b)]} f(g(y))g'(y)\,dy = \int_{g^{-1}([a,b])} f(g(y))\underbrace{g'(y)}_{>0}\,dy. \tag{4}$$

However, if $g' < 0$ a.e., then
$$\int_{g^{-1}(a)}^{g^{-1}(b)} f(g(y))g'(y)\,dy = -\int_{g^{-1}(b)}^{g^{-1}(a)} f(g(y))g'(y)\,dy = \int_{g^{-1}(b)}^{g^{-1}(a)} f(g(y))\left(-g'(y)\right)dy$$
$$= \int_{[g^{-1}(b),\,g^{-1}(a)]} f(g(y))\left(-g'(y)\right)dy = \int_{g^{-1}([a,b])} f(g(y))\underbrace{\left(-g'(y)\right)}_{>0}\,dy, \tag{5}$$
where the first equality comes from (2).

Combining (1), (3), (4), and (5), we get
$$\int_{[a,b]} f(x)\,dx = \int_a^b f(x)\,dx = \int_{g^{-1}(a)}^{g^{-1}(b)} f(g(y))g'(y)\,dy = \int_{g^{-1}([a,b])} f(g(y))\left|g'(y)\right|dy.$$
The previous formula holds both for $g$ strictly increasing and for $g$ strictly
decreasing.
1. The Gamma Distribution

1.A. $\Gamma\left(\dfrac{1}{2}\right) = \sqrt{\pi}$.

Proof.

1st step: Let $\Gamma(\alpha) = \int_0^\infty y^{\alpha-1}e^{-y}\,dy$. We make the following change of variable:
$$y = g(z) = \frac{1}{2}z^2 \implies \frac{dy}{dz} = g'(z) = z, \quad \text{for } y > 0,\ z > 0.$$
$$\implies \Gamma(\alpha) = \int_0^\infty \frac{1}{2^{\alpha-1}}\,z^{2\alpha-2}e^{-\frac{1}{2}z^2}\,z\,dz = 2^{1-\alpha}\int_0^\infty z^{2\alpha-1}e^{-\frac{1}{2}z^2}\,dz.$$
$$\implies \Gamma\left(\frac{1}{2}\right) = \sqrt{2}\int_0^\infty e^{-\frac{1}{2}z^2}\,dz.$$

2nd step:
$$\left[\Gamma\left(\frac{1}{2}\right)\right]^2 = 2\left(\int_0^\infty e^{-\frac{1}{2}z^2}\,dz\right)\left(\int_0^\infty e^{-\frac{1}{2}x^2}\,dx\right) = 2\int_0^\infty\int_0^\infty e^{-\frac{1}{2}(z^2+x^2)}\,dz\,dx.$$

Let us make the change to polar coordinates:
$$\int_0^\infty\int_0^\infty e^{-\frac{1}{2}(z^2+x^2)}\,dz\,dx = \int_{\mathbb{R}^2_+} e^{-\frac{1}{2}(z^2+x^2)}\,d(z,x) = \int_0^\infty\int_0^{\pi/2} e^{-\frac{1}{2}r^2}\,r\,d\theta\,dr$$
$$= \left(\int_0^\infty e^{-\frac{1}{2}r^2}\,r\,dr\right)\cdot[\theta]_0^{\pi/2} = \left[-e^{-\frac{1}{2}r^2}\right]_0^\infty\cdot\frac{\pi}{2} = \frac{\pi}{2}.$$
$$\implies \left[\Gamma\left(\frac{1}{2}\right)\right]^2 = 2\left(\frac{\pi}{2}\right) = \pi \implies \Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}.$$

###############
1.B. $\displaystyle\int_0^\infty e^{-\frac{1}{2}z^2}\,dz = \sqrt{\frac{\pi}{2}}$.

Proof. Observe that
$$\Gamma\left(\frac{1}{2}\right) = \sqrt{2}\int_0^\infty e^{-\frac{1}{2}z^2}\,dz = \sqrt{\pi},$$
where the first equality comes from step 1 in 1.A and the second one comes from
step 2 in 1.A. Then,
$$\int_0^\infty e^{-\frac{1}{2}z^2}\,dz = \sqrt{\frac{\pi}{2}}.$$

###############

1.C. $\mu'_r = \dfrac{\beta^r\,\Gamma(\alpha+r)}{\Gamma(\alpha)}$.

Proof.
$$\mu'_r = \int_0^\infty x^r\,\frac{1}{\beta^\alpha\Gamma(\alpha)}\,x^{\alpha-1}e^{-x/\beta}\,dx.$$
Making the change of variable
$$x = g(y) = \beta y \iff y = g^{-1}(x) = \frac{x}{\beta}, \quad \text{so that} \quad \frac{dx}{dy} = g'(y) = \beta > 0,$$
$$\mu'_r = \int_0^\infty \beta^r y^r\,\frac{1}{\beta^\alpha\Gamma(\alpha)}\,\beta^{\alpha-1}y^{\alpha-1}e^{-y}\,\beta\,dy = \frac{\beta^r}{\Gamma(\alpha)}\underbrace{\int_0^\infty y^{\alpha+r-1}e^{-y}\,dy}_{\Gamma(\alpha+r)} = \frac{\beta^r\,\Gamma(\alpha+r)}{\Gamma(\alpha)}.$$

###############
1.D. $M_{\tilde x}(t) = (1-\beta t)^{-\alpha}$ if $t < 1/\beta$.

Proof.
$$M_{\tilde x}(t) = \int_0^\infty e^{tx}\,\frac{1}{\beta^\alpha\Gamma(\alpha)}\,x^{\alpha-1}e^{-x/\beta}\,dx = \frac{1}{\beta^\alpha\Gamma(\alpha)}\int_0^\infty x^{\alpha-1}e^{-x\left(\frac{1}{\beta}-t\right)}\,dx.$$

Change of variable:
$$x = g(y) = \frac{y}{\frac{1}{\beta}-t} \iff y = g^{-1}(x) = x\left(\frac{1}{\beta}-t\right),$$
so that
$$\frac{dx}{dy} = g'(y) = \frac{1}{\frac{1}{\beta}-t} > 0, \quad \text{if } t < 1/\beta.$$
$$M_{\tilde x}(t) = \frac{1}{\beta^\alpha\Gamma(\alpha)}\int_0^\infty \left(\frac{y}{\frac{1}{\beta}-t}\right)^{\alpha-1}e^{-y}\,\frac{1}{\frac{1}{\beta}-t}\,dy = \frac{1}{\beta^\alpha\left(\frac{1}{\beta}-t\right)^\alpha}\cdot\frac{1}{\Gamma(\alpha)}\underbrace{\int_0^\infty y^{\alpha-1}e^{-y}\,dy}_{\Gamma(\alpha)}$$
$$= \frac{1}{\beta^\alpha\left(\frac{1}{\beta}-t\right)^\alpha} = \frac{1}{(1-\beta t)^\alpha} = (1-\beta t)^{-\alpha} \quad \text{if } t < 1/\beta.$$

###############
2. The Normal Distribution

2.A. $\displaystyle\int_{-\infty}^\infty n(x; \mu, \sigma)\,dx = 1$.

Proof. Let
$$x = g(z) = \mu + \sigma z \iff z = g^{-1}(x) = \frac{x-\mu}{\sigma} \implies \frac{dx}{dz} = g'(z) = \sigma > 0.$$
Then,
$$\int_{-\infty}^\infty \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\,dx = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^\infty \frac{1}{\sigma}\,e^{-\frac{1}{2}z^2}\,\sigma\,dz = \frac{2}{\sqrt{2\pi}}\int_0^\infty e^{-\frac{1}{2}z^2}\,dz = \sqrt{\frac{2}{\pi}}\cdot\sqrt{\frac{\pi}{2}} = 1,$$
where the second equality comes from the symmetry of the function $e^{-\frac{1}{2}z^2}$ and
the third equality comes from 1.B.

###############

2.B. $M_{\tilde x}(t) = e^{\mu t + \frac{1}{2}t^2\sigma^2}$.

Proof. We will use in this proof a technique called "completing the square".
$$M_{\tilde x}(t) = \int_{-\infty}^\infty e^{tx}\,\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}\,dx = \int_{-\infty}^\infty \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2\sigma^2}\left[-2xt\sigma^2+(x-\mu)^2\right]}\,dx.$$
Observe that
$$-2xt\sigma^2 + (x-\mu)^2 = \left[x - (\mu + t\sigma^2)\right]^2 - 2\mu t\sigma^2 - t^2\sigma^4.$$
Then, the last integral equals
$$e^{\frac{1}{2\sigma^2}\left[2\mu t\sigma^2 + t^2\sigma^4\right]}\underbrace{\int_{-\infty}^\infty \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2\sigma^2}\left[x-(\mu+t\sigma^2)\right]^2}\,dx}_{1} = e^{\mu t + \frac{1}{2}t^2\sigma^2},$$
since $\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2\sigma^2}\left[x-(\mu+t\sigma^2)\right]^2}$ is the normal density function with parameters
$\mu + t\sigma^2$ and $\sigma$, i.e., $n(x; \mu+t\sigma^2, \sigma)$, which integrates to one.
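The MGF formula of 2.B can be confirmed by direct numerical integration; a sketch (parameter values are illustrative):

```python
# E[e^{tX}] for X ~ N(mu, sigma^2) versus exp(mu*t + t^2*sigma^2/2).
import numpy as np
from scipy.integrate import quad

mu, sigma, t = 1.0, 2.0, 0.3
density = lambda x: np.exp(-0.5*((x - mu)/sigma)**2)/(sigma*np.sqrt(2*np.pi))
mgf = quad(lambda x: np.exp(t*x)*density(x), -np.inf, np.inf)[0]
print(mgf, np.exp(mu*t + 0.5*t**2*sigma**2))   # approximately equal
```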
The Standard Normal Distribution Function N(0,1)
(entries give the area under the standard normal density between 0 and z; add 0.5 to obtain N(z))

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990

Also, for z = 4.0, 5.0, and 6.0 the probabilities are 0.49997, 0.4999997, 0.499999999.
