
SMSTC (2022/23)
Foundations of Probability
Chapter 5: Important special distributions
The Probability Team (B.Buke@hw.ac.uk)

www.smstc.ac.uk

Contents

5.1 The Binomial, Poisson, and Multinomial distributions
    5.1.1 The Binomial distribution
    5.1.2 The Poisson distribution
    5.1.3 The Multinomial distribution
    5.1.4 Thinning
5.2 The Geometric distribution
5.3 The Uniform distribution
5.4 The Exponential and Gamma distributions
    5.4.1 The Exponential distribution
    5.4.2 Convergence of Geometric distributions to Exponential
    5.4.3 The Gamma distribution
    5.4.4 Relations between Exponential and Gamma distributions
5.5 The Gaussian (Normal) distribution
    5.5.1 The one-dimensional Normal distribution
    5.5.2 Relations between Gamma, Exponential and Normal Distributions
    5.5.3 The Multivariate (Multidimensional) Normal Distribution
    5.5.4 Conditional Normal distribution

5.1 The Binomial, Poisson, and Multinomial distributions


5.1.1 The Binomial distribution
Consider the coin tossing experiment, i.e. consider a sequence ξ1 , ξ2 , . . . of i.i.d. Bernoulli random
variables (see Chapter 2) with

P(ξ1 = 1) = p, P(ξ1 = 0) = 1 − p.

The distribution of the sum

Sn = ξ1 + · · · + ξn

is called Binomial with parameters n and p. We write, in short, Sn ∼ B(n, p). From this we have

ESn = np,    Var Sn = n Var ξ1 = np(1 − p).

Further, since ξn+1 + · · · + ξn+m also has a Binomial distribution, with parameters m and p, and
does not depend on Sn, the sum of two independent Binomial random variables with the same
parameter p is again Binomial. We also have

P(Sn = k) = \binom{n}{k} p^k (1 − p)^{n−k},    k = 0, 1, . . . , n,

and, from the binomial theorem (or from the properties of moment generating functions), the
moment generating function of Sn is given by

MSn(θ) := E e^{θSn} = (p e^θ + 1 − p)^n.
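As a quick numerical sanity check of the mean and variance formulas above, here is a minimal
simulation sketch (assuming numpy is available; the parameter values are illustrative only):

```python
# Simulate many coin-tossing experiments and compare empirical moments of S_n
# with E S_n = np and Var S_n = np(1-p).
import numpy as np

rng = np.random.default_rng(0)
n, p, reps = 50, 0.3, 200_000

xi = rng.random((reps, n)) < p        # each row: n independent Bernoulli(p) variables
S = xi.sum(axis=1)                    # each entry: one realisation of S_n

print(S.mean(), n * p)                # empirical vs. exact mean
print(S.var(), n * p * (1 - p))       # empirical vs. exact variance
```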
Exercise 5.1. Compute the moment generating function of n^{−1/2}(Sn − ESn) and show that, as
n → ∞, it converges to a function of the form e^{cθ^2}. (As you already know, this is the moment
generating function of a Normal (Gaussian) random variable.)
Exercise 5.2. Use Chernoff’s inequality to estimate P(Sn > n(p + x)) for x > 0.

5.1.2 The Poisson distribution


Again suppose that Sn has a Binomial distribution with parameters n and p. By letting p vary
with n and taking limits we obtain a different fundamental law, the Poisson law. Specifically,

Lemma 5.1. If pn = λ/n + o(1/n) as n → ∞, then

P(Sn = k) → (λ^k/k!) e^{−λ},

for all k = 0, 1, 2, . . ..
Proof Indeed,

P(Sn = k) = [n!/(k!(n − k)!)] pn^k (1 − pn)^{n−k}
          = (1/k!) · [n(n − 1) · · · (n − k + 1)/n^k] · (npn)^k · (1 − pn)^n · (1 − pn)^{−k}
          → (1/k!) · 1 · λ^k · e^{−λ} · 1
          = (λ^k/k!) e^{−λ}.                                                              □
This is the Poisson distribution (law) with parameter λ. The Poisson law is fundamental when,
roughly speaking, we deal with counts of independent rare events, either exactly or as a limiting
distribution of the quantity of interest. The Poisson limit of the sum of i.i.d. Bernoulli random
variables in Lemma 5.1 above may be greatly generalised, by allowing the probabilities p to vary
between different Bernoulli variables, and by relaxing the assumption of independence between
Bernoulli variables. The Poisson distribution also arises in the context of Poisson processes,
which are fundamental models for random processes which you will study in the Stochastic
Processes module.
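As a small numerical illustration of Lemma 5.1, the following sketch (standard library only;
the values of λ and n are illustrative) compares the Binomial(n, λ/n) and Poisson(λ)
probability mass functions:

```python
# Compare P(S_n = k) for S_n ~ Binomial(n, lambda/n) with the Poisson(lambda) pmf.
from math import comb, exp, factorial

lam, n = 3.0, 200
p = lam / n
for k in range(8):
    binom_pk = comb(n, k) * p**k * (1 - p)**(n - k)
    pois_pk = lam**k * exp(-lam) / factorial(k)
    print(k, round(binom_pk, 5), round(pois_pk, 5))
```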
Let a random variable (r.v.) X have a Poisson distribution with parameter λ, X ∼ Pois(λ).
Then its moment generating function MX is given by

MX(θ) := E e^{θX} = Σ_{k=0}^{∞} e^{−λ} (λe^θ)^k / k! = e^{λ(e^θ − 1)}.

Differentiating twice, we find


EX = λ, Var X = λ.
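In more detail (a short check by direct differentiation of MX): MX′(θ) = λe^θ e^{λ(e^θ − 1)}, so
EX = MX′(0) = λ; and MX′′(θ) = (λe^θ + λ^2 e^{2θ}) e^{λ(e^θ − 1)}, so E(X^2) = MX′′(0) = λ + λ^2, and
therefore Var X = E(X^2) − (EX)^2 = λ.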

Lemma 5.2. If X1, X2, . . . , Xn are independent Poisson random variables with parameters
λ1, λ2, . . . , λn, then Σ_{k=1}^{n} Xk is Poisson with parameter Σ_{k=1}^{n} λk.

Proof The moment generating function of X1 + · · · + Xn is

M_{X1+···+Xn}(θ) = Π_{k=1}^{n} e^{λk(e^θ − 1)} = e^{(e^θ − 1) Σ_{k=1}^{n} λk},

and this is the moment generating function of the Poisson law with parameter Σ_{k=1}^{n} λk. Then
the result follows since the moment generating function determines the distribution uniquely. □
Next we consider conditional probabilities:

Lemma 5.3. Suppose that X1 , X2 are independent Poisson random variables with parameters
λ1 , λ2 . Then, conditional on X1 + X2 = n, we have that X1 is Binomial with parameters n,
λ1 /(λ1 + λ2 ).

Proof For 0 ≤ k ≤ n,

P(X1 = k | X1 + X2 = n) = P(X1 = k, X2 = n − k) / P(X1 + X2 = n)

                        = [ (λ1^k/k!) e^{−λ1} · (λ2^{n−k}/(n − k)!) e^{−λ2} ] / [ ((λ1 + λ2)^n/n!) e^{−λ1−λ2} ]

                        = \binom{n}{k} (λ1/(λ1 + λ2))^k (λ2/(λ1 + λ2))^{n−k}.                          □

5.1.3 The Multinomial distribution


Generalising Lemma 5.3, we have:

Lemma 5.4. Suppose that X1, X2, . . . , Xd are independent Poisson random variables with pa-
rameters λ1, λ2, . . . , λd, respectively. Then, conditional on Σ_k Xk = n, the random vector
(X1, . . . , Xd) has law given by

P(X1 = n1, . . . , Xd = nd | Σ_k Xk = n) = \binom{n}{n1, . . . , nd} (λ1/λ)^{n1} · · · (λd/λ)^{nd},

where (n1, . . . , nd) are nonnegative integers with sum equal to n, and λ = Σ_k λk, and where

\binom{n}{n1, . . . , nd} = n!/(n1! · · · nd!).

Exercise 5.3. Prove the above lemma.


 
The symbol \binom{n}{n1, . . . , nd} is the multinomial coefficient (see Chapter 1) since it appears in the
algebraic identity known as the multinomial theorem:

(x1 + · · · + xd)^n = Σ \binom{n}{n1, . . . , nd} x1^{n1} · · · xd^{nd}.

The sum is taken over all nonnegative integers (n1 , . . . , nd ), with sum equal to n.
The random variable (X1, . . . , Xd) with values in Z_+^d is said to have a Multinomial distribution
with parameters d, n, p1, . . . , pd (where p1 + · · · + pd = 1, so one of them is superfluous) if

P(X1 = n1, . . . , Xd = nd) = \binom{n}{n1, . . . , nd} p1^{n1} · · · pd^{nd},

where (n1 , . . . , nd ) are nonnegative integers with sum equal to n.


Of course, a multinomial distribution with parameters 2, n, p, 1 − p is a binomial distribution
with parameters n, p.
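A brief simulation sketch (assuming numpy; the parameters are illustrative) of a Multinomial
sample, checking that each marginal count behaves like a Binomial(n, pi) variable with mean n pi:

```python
# Draw Multinomial(n, p_1, ..., p_d) vectors and check the marginal means n * p_i.
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, np.array([0.2, 0.5, 0.3])

counts = rng.multinomial(n, p, size=100_000)   # each row is (n_1, ..., n_d) with sum n
print(counts.mean(axis=0), n * p)              # empirical vs. exact marginal means
```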

5.1.4 Thinning
Suppose that an urn contains n balls. There are d colours available. Let the colours be denoted
by c1 , c2 , . . . , cd . To each ball assign colour ci with probability pi , independently from ball to
ball. Let Sn^i be the number of balls that have colour ci, 1 ≤ i ≤ d. It is easy to see that
(Sn^1, . . . , Sn^d) has the multinomial distribution:

P(Sn^1 = k1, . . . , Sn^d = kd) = \binom{n}{k1, . . . , kd} p1^{k1} · · · pd^{kd},

where (k1, . . . , kd) are nonnegative integers summing up to n. Clearly,

Sn^1 + · · · + Sn^d = n,

so the random variables Sn^1, . . . , Sn^d cannot be independent. The question is:

Suppose that the number of balls is itself a random variable, independent of every-
thing else. Is there a way to choose the law of this random variable so that the above
variables are independent?

To put the problem in precise mathematical terms, let ξ1 , ξ2 , . . . be i.i.d. random colours, i.e.
random variables taking values in {c1 , . . . , cd } such that

P(ξ1 = ci ) = pi , 1 ≤ i ≤ d.

Let

Sn^i := Σ_{k=1}^{n} 1(ξk = ci).

(In physical terms, Sn^i denotes precisely what we talked about earlier using more flowery lan-
guage.) Now, independent of the sequence ξ1, ξ2, . . ., let N be a random variable with values in
Z+. The problem is to find its law so that

SN^1, . . . , SN^d are independent random variables.    (?)

It turns out that this is a characterising property of the Poisson law. We will content ourselves
with proving one direction and only for d = 2.

Lemma 5.5. If N is Poisson with parameter λ, then SN^i is also Poisson with parameter λpi,
i = 1, 2, and these random variables are independent.

Proof Indeed, for any non-negative integers n and m,

P(SN^1 = n, SN^2 = m) = P(N = n + m, SN^1 = n)
                      = P(N = n + m) · P(SN^1 = n | N = n + m)
                      = e^{−λ} λ^{n+m}/(n + m)! · \binom{n+m}{n} p1^n p2^m
                      = (λp1)^n/n! · e^{−λp1} · (λp2)^m/m! · e^{−λp2}.
We observed in the previous chapter that, for discrete random variables, factorization of the
joint probability is sufficient for independence. From the formula above, it is easy to see that
the normalising coefficients are equal to one and that the marginal distributions are Poisson. 
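A simulation sketch of this thinning property for d = 2 (assuming numpy; λ, p1, p2 are
illustrative values):

```python
# N ~ Poisson(lam); each of the N balls gets colour 1 with probability p1, else colour 2.
# The two counts should behave like independent Poisson(lam*p1) and Poisson(lam*p2).
import numpy as np

rng = np.random.default_rng(2)
lam, p1, p2, reps = 5.0, 0.4, 0.6, 200_000

N = rng.poisson(lam, size=reps)
S1 = rng.binomial(N, p1)                 # colour-1 counts, given N
S2 = N - S1                              # colour-2 counts

print(S1.mean(), S1.var(), lam * p1)     # mean and variance both close to lam*p1
print(S2.mean(), S2.var(), lam * p2)     # mean and variance both close to lam*p2
print(np.cov(S1, S2)[0, 1])              # close to 0, consistent with independence
```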

5.2 The Geometric distribution


A random variable X with values in N = {1, 2, . . .} has a Geometric distribution with parameter
p ∈ (0, 1), X ∼ Geom(p), if

P(X = k) = p q^{k−1},    k = 1, 2, . . . ,

where q = 1 − p ∈ (0, 1). Therefore,

P(X > k) = q^k,    k ∈ N.

The geometric distribution has the memoryless property:

Lemma 5.6. For each k ∈ N, the conditional distribution of X − k given {X > k} is the same
as the distribution of X:
P(X − k = n|X > k) = P(X = n).

Think of X as a random time, e.g. the day on which a certain volcano will erupt. The property
above says that if by day k the volcano has not erupted then the remaining time X − k has the
same law as X, no matter how large k is.
Proof Indeed,

P(X > k + n | X > k) = P(X > k + n, X > k)/P(X > k) = P(X > k + n)/P(X > k) = q^{k+n}/q^k = q^n.    □

Exercise 5.4. Show the opposite: if a random variable X with values in N has the memoryless
property, then it is geometric with parameter p = P(X = 1).

By direct calculations, one can obtain the following:

Lemma 5.7. If X has a geometric distribution with parameter p, then

EX = 1/p,    Var X = (1 − p)/p^2    and    E e^{θX} = p e^θ / (1 − (1 − p) e^θ),    for all θ < ln(1/(1 − p)).

Exercise 5.5. Prove the above lemma.



A concrete way to get a geometric random variable is by considering ξ1 , ξ2 , . . . to be independent


identically distributed (i.i.d.) Bernoulli random variables,

P(ξ1 = 1) = 1 − P(ξ1 = 0) = p

and by letting
X = inf{k ≥ 1 : ξk = 1}.
We have P(X < ∞) = 1, so
X = min{k ≥ 1 : ξk = 1}
and
P(X > k) = P(ξ1 = · · · = ξk = 0) = (1 − p)^k,
as required.

Lemma 5.8. Suppose that X1 , X2 , . . . , Xd are independent and geometric random variables.
Then X = min(X1 , . . . , Xd ) is also geometric.

Proof

P(X > k) = P(X1 > k, . . . , Xd > k) = P(X1 > k) · · · P(Xd > k) = q1^k · · · qd^k = (q1 · · · qd)^k,

where qi denotes one minus the parameter of Xi. Hence X is geometric with parameter
1 − q1 · · · qd.    □

5.3 The Uniform distribution


Recall that U has a uniform distribution in the interval [0, 1] if P(U ≤ x) = x, 0 ≤ x ≤ 1. We
write in short U ∼ U (0, 1). More generally,

X has a uniform distribution in [a, b] (X ∼ U [a, b]) if, for all intervals I contained in
[a, b], the probability P(X ∈ I) is proportional to the length of I contained within
[a, b].

Exercise 5.6. If X is uniform in [a, b], then cX + d is uniform in the interval with endpoints
ca + d and cb + d.

Lemma 5.9. Let p1 , . . . , pd be positive numbers adding up to 1. Split the interval [0, 1] into
consecutive intervals I1 , . . . , Id of lengths p1 , . . . , pd , respectively. Let U1 , . . . , Un be i.i.d. uniform
in [0, 1]. Let
Sn^i = Σ_{k=1}^{n} 1(Uk ∈ Ii),    1 ≤ i ≤ d.

Then (Sn^1, . . . , Sn^d) has a multinomial law. In particular, Sn^i is Binomial with parameters n, pi.

Exercise 5.7. Show this last lemma.

Exercise 5.8. Let U1 , . . . , Ud be i.i.d. uniform in [0, 1]. Compute the probability P(U1 < U2 <
· · · < Ud ).

Exercise 5.9. Consider a stick of length 1 and break it into 3 pieces, by choosing the two break
points at random. Find the probability that the 3 smaller sticks can be joined to form a triangle.
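A Monte Carlo sketch (assuming numpy) that can be used to check an answer to Exercise 5.9
numerically:

```python
# Break [0,1] at two independent Uniform(0,1) points and test the triangle inequality.
import numpy as np

rng = np.random.default_rng(3)
reps = 1_000_000

u = np.sort(rng.random((reps, 2)), axis=1)            # ordered break points
a, b, c = u[:, 0], u[:, 1] - u[:, 0], 1.0 - u[:, 1]   # lengths of the three pieces

ok = (a + b > c) & (a + c > b) & (b + c > a)          # triangle inequality for all sides
print(ok.mean())                                      # empirical probability
```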

5.4 The Exponential and Gamma distributions


5.4.1 The Exponential distribution
A random variable T with values in R+ has an Exponential distribution with parameter (or rate)
λ > 0, T ∼ Exp(λ), if

P(T > t) = e^{−λt},    t ≥ 0.

Lemma 5.10. If T is exponential with rate λ, then T has an absolutely continuous distribution
with density

f(t) = λ e^{−λt},    t ≥ 0.

Also, the moment generating function

MT(θ) := E e^{θT} = λ/(λ − θ)

is defined for all θ < λ, and

ET = 1/λ,    Var T = 1/λ^2.

Proof Note that

∫_0^t f(s) ds = 1 − e^{−λt},

showing that f is a density for T. The remaining parts are direct computations. □
An exponentially distributed random variable has a memoryless property which is similar to
that of a geometric distribution:
Lemma 5.11. For all t, s > 0,

P(T − t > s|T > t) = P(T > s).

Proof Indeed,

P(T − t > s | T > t) = P(T > t + s, T > t)/P(T > t) = P(T > t + s)/P(T > t) = e^{−λ(t+s)}/e^{−λt} = e^{−λs}.    □


Lemma 5.12. Let T1 , T2 , . . . , Td be independent exponential random variables with parameters


λ1 , λ2 , . . . , λd , respectively. Then min(T1 , . . . , Td ) is exponential with parameter λ1 + · · · + λd .

Proof
P(min(T1, . . . , Td) > t) = P(T1 > t) · · · P(Td > t) = e^{−λ1 t} · · · e^{−λd t} = e^{−(λ1+···+λd)t}.    □

Whereas an exponential distribution is the natural analogue of a geometric distribution, in that
they are both memoryless, the former also enjoys the important scaling property (with the
almost obvious proof):

If T is exponential with parameter λ then, for any c > 0, cT is exponential with


parameter λ/c. Therefore, 1/λ is the scale parameter of the exponential distribution.
Exercise 5.10. Let T1 , T2 , . . . , Td be independent exponential random variables with a common
parameter 1. Show that
 
law of max(T1, . . . , Td) = law of ( T1 + T2/2 + · · · + Td/d ).

5.4.2 Convergence of Geometric distributions to Exponential


Another relation between the geometric and exponential distributions is the following: Let p be
very small. Let Xp be geometric with parameter p. Consider a scaling of Xp by p, i.e. the random
variable pXp which takes values p, 2p, 3p, . . .. Then the law of pXp converges in distribution to
an exponential law with rate 1:
Lemma 5.13 (Rényi). For Xp geometric with parameter p,

lim_{p→0} P(pXp > t) = e^{−t},    t > 0.

Proof P(pXp > t) = P(Xp > t/p) and, since Xp is integer-valued,

P(Xp > t/p) = P(Xp > ⌊t/p⌋) = (1 − p)^{⌊t/p⌋}.

Since (1 − p)^{1/p} → e^{−1} and p⌊t/p⌋ → t as p → 0, the last expression converges to e^{−t}. □

Exercise 5.11. Show the following relation between exponential and uniform distributions: if
U is uniform in (0, 1), then −(1/λ) ln U is exponential with parameter λ.
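The relation in Exercise 5.11 is also the standard inverse-transform recipe for simulating
exponential random variables. A minimal sketch (assuming numpy; λ is an illustrative value):

```python
# Simulate Exp(lambda) as T = -ln(U)/lambda with U ~ Uniform(0,1) and check the moments.
import numpy as np

rng = np.random.default_rng(4)
lam, reps = 2.0, 500_000

U = rng.random(reps)
T = -np.log(U) / lam

print(T.mean(), 1 / lam)        # empirical vs. exact mean 1/lambda
print(T.var(), 1 / lam**2)      # empirical vs. exact variance 1/lambda^2
```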

5.4.3 The Gamma distribution


The following Γ function is well-known: for β > 0,

Γ(β) = ∫_0^∞ y^{β−1} e^{−y} dy.

Here β is the shape parameter.


The integral cannot be evaluated in closed form for all values of β. However, closed-form
expressions are known when β is a positive integer and when β = n + 1/2, n = 0, 1, . . ..
Exercise 5.12. Using integration by parts, prove that, for β > 1,

Γ(β) = (β − 1)Γ(β − 1). (5.1)

We know that, for β = 1,


Γ(1) = 1.
Therefore, for n = 1, 2, . . ., from (5.1),

Γ(n) = (n − 1)!

Exercise 5.13. Prove that, for β = 1/2,

Γ(1/2) = √π.

Then, again from (5.1), for n = 1, 2, . . .,

Γ(n + 1/2) = √π · (n − 1/2)(n − 3/2) · . . . · (1/2).

We introduce now a random variable with a Gamma distribution. Let β and λ be two positive
constants. A random variable X has a Gamma distribution with parameters β and λ, X ∼
Γ(β, λ), if X has the density function

fX(x) = (λ^β/Γ(β)) x^{β−1} e^{−λx} if x > 0,    and fX(x) = 0 otherwise.

Here β is the shape parameter and λ is the rate (reciprocal scale) parameter.


It is easy to verify that this is a density function. Indeed, it is non-negative and

∫_0^∞ x^{β−1} e^{−λx} dx = (1/λ^β) ∫_0^∞ y^{β−1} e^{−y} dy = Γ(β)/λ^β,

where we used the change of variables y = λx, dx = dy/λ.


Then, we can find the first and the second moments directly.

Exercise 5.14. Prove that

EX = β/λ,    E(X^2) = β(β + 1)/λ^2,

and that

MX(θ) = (1 − θ/λ)^{−β}

for θ < λ, and MX(θ) = ∞ for θ ≥ λ.

5.4.4 Relations between Exponential and Gamma distributions


Clearly, Exp(λ) = Γ(1, λ). More generally, the following result holds:

Lemma 5.14. If ξ1 , . . . ξn are i.i.d. random variables with a common exponential distribution
Exp(λ), then
Sn ≡ ξ1 + . . . + ξn ∼ Γ(n, λ).

Proof This may be done by induction: assuming Sn ∼ Γ(n, λ), for any t > 0,

fSn+1(t) = ∫_0^t fSn(u) fξn+1(t − u) du
         = ∫_0^t (λ^n u^{n−1}/(n − 1)!) e^{−λu} · λ e^{−λ(t−u)} du
         = (λ^{n+1}/(n − 1)!) e^{−λt} ∫_0^t u^{n−1} du
         = (λ^{n+1} t^n / n!) e^{−λt},

which is the density of Γ(n + 1, λ). □


Corollary 5.1. If X1 ∼ Γ(n, λ) and X2 ∼ Γ(m, λ), and if X1 and X2 are independent, then
X1 + X2 ∼ Γ(n + m, λ).

Proof From the previous lemma, one can represent X1 as a sum X1 = ξ1 + . . . + ξn where
ξ1 , . . . , ξn are i.i.d. exponential. Similarly, X2 = ξn+1 +. . .+ξn+m , and X1 +X2 = ξ1 +. . .+ξn+m
is a sum of n + m i.i.d. exponential random variables. 
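A quick simulation sketch consistent with Lemma 5.14 (assuming numpy; n and λ are
illustrative values):

```python
# A sum of n i.i.d. Exp(lambda) variables should have the Gamma(n, lambda) moments
# beta/lambda and beta/lambda^2 with beta = n.
import numpy as np

rng = np.random.default_rng(5)
n, lam, reps = 4, 1.5, 300_000

S = rng.exponential(scale=1 / lam, size=(reps, n)).sum(axis=1)
print(S.mean(), n / lam)        # empirical vs. exact mean beta/lambda
print(S.var(), n / lam**2)      # empirical vs. exact variance beta/lambda^2
```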

Exercise 5.15. Prove the following generalisation of the previous corollary:


Let λ, β1 and β2 be positive numbers. If X1 ∼ Γ(β1 , λ) and X2 ∼ Γ(β2 , λ), and if X1 and X2
are independent, then X1 + X2 ∼ Γ(β1 + β2 , λ).

5.5 The Gaussian (Normal) distribution


5.5.1 The one-dimensional Normal distribution
The standard Normal N(0, 1)

A random variable X has a standard normal distribution, X ∼ N(0, 1), if it has a density function

fX(x) = (1/√(2π)) e^{−x^2/2}.

Note that

∫_{−∞}^{∞} e^{−x^2/2} dx = √(2π).
Exercise 5.16. Prove the last equality.

Since the density function is even, x fX(x) is an odd function and so

EX = 0.
Further,

E(X^2) = (1/√(2π)) ∫_{−∞}^{∞} x^2 e^{−x^2/2} dx
       = (1/√(2π)) ∫_{−∞}^{∞} x · x e^{−x^2/2} dx
       = (1/√(2π)) ( [−x e^{−x^2/2}]_{−∞}^{∞} + ∫_{−∞}^{∞} e^{−x^2/2} dx )
       = 1.

Then

Var X = E(X^2) − (EX)^2 = 1.
Also, for all θ, the moment generating function of X exists and is equal to

MX(θ) = (1/√(2π)) ∫_{−∞}^{∞} e^{θx} e^{−x^2/2} dx = e^{θ^2/2}.
Exercise 5.17. Prove the last equality.

Since the standard normal distribution plays a central role in probability theory and its appli-
cations, special notation has been reserved for its distribution function:

Φ(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−y^2/2} dy.

Here are the basic properties of the function Φ:
(1) Φ(x) = 1 − Φ(−x);
(2) in particular, Φ(0) = 1/2.

The Normal Distribution N(a, σ^2)

Let a be an arbitrary number and σ > 0. A random variable X has a normal distribution with
parameters a and σ^2 if its density function is defined as

fX(x) = (1/(σ√(2π))) e^{−(x−a)^2/(2σ^2)}.

Here a is a shift parameter, and σ is a scale parameter.

Lemma 5.15. If X ∼ N(a, σ^2), then Y = (X − a)/σ ∼ N(0, 1). In other words, if X ∼ N(a, σ^2),
then X may be represented as X = a + σY where Y ∼ N(0, 1).

Proof Let Y = (X − a)/σ. Then

P(Y ≤ y) = P(X ≤ a + σy)
         = (1/(σ√(2π))) ∫_{−∞}^{a+σy} exp( −(z − a)^2/(2σ^2) ) dz
         =(z = a + σv)  (1/√(2π)) ∫_{−∞}^{y} e^{−v^2/2} dv = Φ(y).    □

Corollary 5.2. If X ∼ N(a, σ^2) and Y is as before, then

EX = a + σ EY = a,
E(X^2) = a^2 + σ^2,

and

Var X = σ^2.

Also, the moment generating function MX(θ) of X exists for all θ and is equal to

MX(θ) = exp( aθ + σ^2 θ^2/2 ).
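For the moment generating function, here is a one-line check using the representation X = a + σY
from Lemma 5.15: MX(θ) = E e^{θ(a+σY)} = e^{aθ} E e^{(σθ)Y} = e^{aθ} e^{σ^2θ^2/2} = exp(aθ + σ^2θ^2/2).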

Lemma 5.16. Let X1, X2, . . . , Xn be mutually independent random variables and let Xi ∼
N(ai, σi^2) for all i. Then

X = Σ_{i=1}^{n} Xi ∼ N(a, σ^2),

where

a = Σ ai    and    σ^2 = Σ σi^2.

Proof This follows from the basic properties of moment generating functions:

MX(θ) = MX1(θ) · . . . · MXn(θ) = · · · = e^{aθ + σ^2θ^2/2}.    □

5.5.2 Relations between Gamma, Exponential and Normal Distributions.


We start with the following result.

Lemma 5.17. Let Y ∼ N(0, 1). Then

X = Y^2 ∼ Γ(1/2, 1/2).

Proof For x > 0,

P(X ≤ x) = P(Y^2 ≤ x)
         = P(−√x ≤ Y ≤ √x)
         = (1/√(2π)) ∫_{−√x}^{√x} e^{−z^2/2} dz
         = (2/√(2π)) ∫_0^{√x} e^{−z^2/2} dz
         =(z = √u)  (2/√(2π)) ∫_0^x e^{−u/2} (1/(2√u)) du
         = ∫_0^x ((1/2)^{1/2}/√π) u^{−1/2} e^{−u/2} du.

This is the distribution function of the Γ(1/2, 1/2) distribution. □
Note that the Γ(1/2, 1/2) distribution is also a particular case of a chi-squared distribution
(namely, chi-squared with one degree of freedom), which we will not discuss further here.


Corollary 5.3. If Y1, Y2 are i.i.d. standard Normal random variables, then

Y1^2 + Y2^2 ∼ Exp(1/2).
Exercise 5.18. Prove this corollary.

Comment: The formula above underlies an efficient method for simulating normally distributed
random variables.
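One standard approach along these lines is the Box-Muller method. A minimal sketch (assuming
numpy): R^2 = −2 ln U1 is Exp(1/2), Θ = 2πU2 is an independent uniform angle, and
(R cos Θ, R sin Θ) are then i.i.d. N(0, 1).

```python
# Box-Muller: turn two independent Uniform(0,1) samples into two independent N(0,1) samples.
import numpy as np

rng = np.random.default_rng(6)
reps = 500_000

U1, U2 = rng.random(reps), rng.random(reps)
R = np.sqrt(-2.0 * np.log(U1))           # R^2 ~ Exp(1/2), as in Corollary 5.3
Theta = 2.0 * np.pi * U2                 # independent uniform angle
Z1, Z2 = R * np.cos(Theta), R * np.sin(Theta)

print(Z1.mean(), Z1.var())               # close to 0 and 1
print(Z2.mean(), Z2.var())               # close to 0 and 1
print(np.corrcoef(Z1, Z2)[0, 1])         # close to 0
```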
Exercise 5.19. Try to draw a diagram of relations between standard distributions (discrete
and continuous).

5.5.3 The Multivariate (Multidimensional) Normal Distribution


The Standard MND

For any d ≥ 2, a d-dimensional random vector X = (X1 , . . . , Xd ) has a standard d-dimensional


normal distribution Nd (0, I) if its coordinates are i.i.d. standard normal random variables. Since
we will consider products A × X of matrices A and vectors X, it is convenient to write vectors
in columns:

X = (X1, X2, . . . , Xd)^T.
Here

I = the d × d identity matrix = Cov X

is a matrix of covariances:

Cov(Xi, Xj) = E(Xi Xj) = 1 if i = j, and 0 if i ≠ j,

and

0 = (0, . . . , 0)^T

is a vector of means (expectations):

0 = EXi for all i.

The General MND

Let X ∼ Nd(0, I), let

b = (b1, b2, . . . , bd)^T

be a vector of constants, and let

A = (aij), i, j = 1, . . . , d,

be a d × d matrix with Det(A) ≠ 0. Then

Y = AX + b

has a d-dimensional normal distribution Nd(b, C) (in short: Y ∼ Nd(b, C)).

Comment: The lower index d in Nd (b, C) is usually omitted.

Here

bi = EYi

and C = (cij), i, j = 1, . . . , d, where

cij = Cov(Yi, Yj) = Σ_{l=1}^{d} ail ajl = Σ_{l=1}^{d} ail a^T_{lj},    i, j = 1, . . . , d,

where a^T_{lj} = ajl. Note that C is a symmetric matrix:

cij = cji for all i, j.

Example. Let d = 2,

b = (b1, b2)^T

and

A = [ a11  a12
      a21  a22 ],

where

Det A = a11 a22 − a12 a21 ≠ 0.

Then Y = (Y1, Y2)^T, where

Yi = bi + ai1 X1 + ai2 X2,    i = 1, 2,
EY1 = b1,    EY2 = b2,

and, for i, j = 1, 2,

Cov(Yi, Yj) = E((Yi − EYi)(Yj − EYj))
            = E((ai1 X1 + ai2 X2)(aj1 X1 + aj2 X2))
            = ai1 aj1 + ai2 aj2 + 0 + 0
            = ai1 aj1 + ai2 aj2.

Warning Comment: A common mistake is to say:

“if Xi ∼ N(bi, σi^2), then X = (X1, X2, . . . , Xd)^T is a multivariate normal random vector.”

This is NOT TRUE, in general. Can you suggest a counter-example?
Remark. It follows from the definition of C that

C = A × A^T,

where A^T is the transpose of A. Since A and A^T have the same determinant,

Det(C) = Det(A) · Det(A^T) = (Det(A))^2 ≥ 0,

and

Det(C) > 0 if and only if Det(A) ≠ 0.
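A short simulation sketch of the construction Y = AX + b (assuming numpy; the matrix A and
vector b below are illustrative choices): the sample mean of Y should be close to b and the
sample covariance close to C = A A^T.

```python
# Generate Y = AX + b from i.i.d. standard normals X and compare with b and C = A A^T.
import numpy as np

rng = np.random.default_rng(7)
A = np.array([[2.0, 0.5],
              [0.3, 1.0]])            # a nonsingular 2 x 2 matrix
b = np.array([1.0, -2.0])
reps = 200_000

X = rng.standard_normal((reps, 2))    # rows are i.i.d. N_2(0, I) vectors
Y = X @ A.T + b                       # each row is A x + b for one sample x

print(Y.mean(axis=0), b)              # close to b
print(np.cov(Y.T))                    # close to A @ A.T
print(A @ A.T)
```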

The density of the MND

If X ∼ Nd(b, C), then, for x = (x1, . . . , xd),

fX(x) = 1/((2π)^{d/2} √(Det(C))) · exp( −(1/2) Σ_{i=1}^{d} Σ_{j=1}^{d} gij (xi − bi)(xj − bj) ),        (5.2)

where

G = (gij)_{i,j=1,...,d} = C^{−1}.
Since C is a symmetric matrix, G is symmetric too, i.e. gij = gji for all i, j.
In particular, G is a diagonal matrix if and only if C is, and if and only if

Cov(Xi, Xj) = 0 for all i ≠ j.

On the other hand, we know that random variables with a jointly absolutely continuous distribution
are mutually independent if and only if their joint density function is a product of their marginal
densities. From (5.2), we can see that the latter happens if and only if gij = 0 for all i ≠ j. The
conclusion is:
Let X1 and X2 be two random variables. We know that if they are independent, then they are
uncorrelated, but the converse is not true, in general.
However, if (X1, X2) is a two-dimensional normal random vector and if Cov(X1, X2) = 0, then X1
and X2 are independent.
The following general result holds:

Theorem 5.1. If X = (X1, X2, . . . , Xd)^T is a multivariate normal vector and if Cov(Xi, Xj) = 0
for all i ≠ j, then the random variables X1, X2, . . . , Xd are mutually independent.

5.5.4 Conditional Normal distribution


If it appears that we have been doing a lot of Linear Algebra, that is because we have: dealing with
Normal (Gaussian) random variables (and processes!) is largely a matter of Linear Algebra.
Without proof, we mention the following:

Lemma 5.18. Let (X, Y1, . . . , Yd) be a Normal (1 + d)-dimensional random vector. Then the
conditional law P(X ∈ · | Y1, . . . , Yd) is normal with mean E(X|Y1, . . . , Yd) and deterministic
(non-random) variance. Moreover, E(X|Y1, . . . , Yd) is a linear function of (Y1, . . . , Yd) and is char-
acterised by the fact that X − E(X|Y1, . . . , Yd) is independent of (Y1, . . . , Yd).

Exercise 5.20. Let X, Y be independent standard Normal random variables. Based on the last
part of the lemma above, compute E(X + 2Y |3X − 4Y ).
