SMSTC (2022/23)
Foundations of Probability
Chapter 5: Important special distributions
The Probability Team
www.smstc.ac.uk
Contents
5.1 The Binomial, Poisson, and Multinomial distributions
  5.1.1 The Binomial distribution
  5.1.2 The Poisson distribution
  5.1.3 The Multinomial distribution
  5.1.4 Thinning
5.2 The Geometric distribution
5.3 The Uniform distribution
5.4 The Exponential and Gamma distributions
  5.4.1 The Exponential distribution
  5.4.2 Convergence of Geometric distributions to Exponential
  5.4.3 The Gamma distribution
  5.4.4 Relations between Exponential and Gamma distributions
5.5 The Gaussian (Normal) distribution
  5.5.1 The one-dimensional Normal distribution
  5.5.2 Relations between Gamma, Exponential and Normal distributions
  5.5.3 The Multivariate (Multidimensional) Normal distribution
  5.5.4 Conditional Normal distribution
5.1 The Binomial, Poisson, and Multinomial distributions

5.1.1 The Binomial distribution

Let ξ_1, ξ_2, ... be i.i.d. Bernoulli random variables with

P(\xi_1 = 1) = p, \quad P(\xi_1 = 0) = 1 - p,

and let S_n = ξ_1 + ⋯ + ξ_n. Then S_n has a Binomial distribution with parameters n and p.
Further, since ξ_{n+1} + ⋯ + ξ_{n+m} also has a Binomial distribution, with parameters m and p, and does not depend on S_n, the sum of two independent Binomial random variables with the same parameter p is again Binomial. We also have
P(S_n = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \ldots, n,
and, from the binomial theorem (or from the properties of moment generating functions), the
moment generating function of Sn is given by
M_{S_n}(\theta) := E e^{\theta S_n} = (p e^\theta + 1 - p)^n.
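Indeed, since the ξ_k are independent with E e^{θξ_1} = 1 − p + p e^θ, the properties of moment generating functions give in one line

E e^{\theta S_n} = \prod_{k=1}^n E e^{\theta \xi_k} = (p e^\theta + 1 - p)^n.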
Exercise 5.1. Compute the moment generating function of n^{-1/2}(S_n - ES_n) and show that, as n → ∞, it converges to a function of the form e^{c\theta^2}. (As you already know, this is the moment generating function of a Normal (Gaussian) random variable.)
Exercise 5.2. Use Chernoff’s inequality to estimate P(Sn > n(p + x)) for x > 0.
5.1.2 The Poisson distribution

A random variable X has the Poisson distribution with parameter λ > 0 if

P(X = k) = e^{-\lambda} \frac{\lambda^k}{k!}, \quad k = 0, 1, 2, \ldots,

and its moment generating function is E e^{\theta X} = e^{\lambda(e^\theta - 1)}. If X_1, ..., X_n are independent Poisson random variables with parameters λ_1, ..., λ_n, then, by independence,

E e^{\theta(X_1 + \cdots + X_n)} = \prod_{k=1}^n e^{\lambda_k(e^\theta - 1)} = \exp\Big\{ \Big( \sum_{k=1}^n \lambda_k \Big)(e^\theta - 1) \Big\},

and this is the moment generating function of the Poisson law with parameter \sum_{k=1}^n \lambda_k. Hence the sum of independent Poisson random variables is again Poisson, since the moment generating function determines the distribution uniquely.
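As an illustration (not part of the original notes), here is a minimal Python sketch, assuming NumPy is available and with arbitrary parameter values, comparing the empirical law of X_1 + X_2 with the Poisson law with parameter λ_1 + λ_2:

import math
import numpy as np

rng = np.random.default_rng(0)
lam1, lam2, n = 2.0, 3.0, 100_000
s = rng.poisson(lam1, n) + rng.poisson(lam2, n)   # samples of X1 + X2

lam = lam1 + lam2
for k in range(10):
    empirical = (s == k).mean()
    exact = math.exp(-lam) * lam ** k / math.factorial(k)
    print(f"k={k}: empirical {empirical:.4f}, Poisson(lam1+lam2) pmf {exact:.4f}")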
Next we consider conditional probabilities:
Lemma 5.3. Suppose that X1 , X2 are independent Poisson random variables with parameters
λ1 , λ2 . Then, conditional on X1 + X2 = n, we have that X1 is Binomial with parameters n,
λ1 /(λ1 + λ2 ).
Proof For 0 ≤ k ≤ n,

P(X_1 = k \mid X_1 + X_2 = n) = \frac{P(X_1 = k, X_2 = n-k)}{P(X_1 + X_2 = n)}
= \frac{\dfrac{\lambda_1^k}{k!} e^{-\lambda_1} \, \dfrac{\lambda_2^{n-k}}{(n-k)!} e^{-\lambda_2}}{\dfrac{(\lambda_1 + \lambda_2)^n}{n!} e^{-\lambda_1 - \lambda_2}}
= \binom{n}{k} \left( \frac{\lambda_1}{\lambda_1 + \lambda_2} \right)^k \left( \frac{\lambda_2}{\lambda_1 + \lambda_2} \right)^{n-k}.
Lemma 5.4. Suppose that X_1, X_2, ..., X_d are independent Poisson random variables with parameters λ_1, λ_2, ..., λ_d, respectively. Then, conditional on \sum_k X_k = n, the random vector (X_1, ..., X_d) has law given by

P\Big(X_1 = n_1, \ldots, X_d = n_d \,\Big|\, \sum_k X_k = n\Big) = \binom{n}{n_1, \ldots, n_d} \left( \frac{\lambda_1}{\lambda} \right)^{n_1} \cdots \left( \frac{\lambda_d}{\lambda} \right)^{n_d},

where (n_1, ..., n_d) ranges over all nonnegative integers with sum equal to n, λ = \sum_k λ_k, and

\binom{n}{n_1, \ldots, n_d} = \frac{n!}{n_1! \cdots n_d!}

is the multinomial coefficient.
5.1.3 The Multinomial distribution

The random vector (X_1, ..., X_d) with values in Z_+^d is said to have a Multinomial distribution with parameters d, n, p_1, ..., p_d (where p_1 + ⋯ + p_d = 1, so one of them is superfluous) if

P(X_1 = n_1, \ldots, X_d = n_d) = \binom{n}{n_1, \ldots, n_d} p_1^{n_1} \cdots p_d^{n_d},

for all nonnegative integers (n_1, ..., n_d) with sum equal to n.
5.1.4 Thinning
Suppose that an urn contains n balls. There are d colours available. Let the colours be denoted by c_1, c_2, ..., c_d. To each ball assign colour c_i with probability p_i, independently from ball to ball. Let S_n^i be the number of balls that have colour c_i, 1 ≤ i ≤ d. It is easy to see that (S_n^1, ..., S_n^d) has the multinomial distribution:

P(S_n^1 = k_1, \ldots, S_n^d = k_d) = \binom{n}{k_1, \ldots, k_d} p_1^{k_1} \cdots p_d^{k_d}.

Clearly S_n^1 + ⋯ + S_n^d = n, so the random variables S_n^1, ..., S_n^d cannot be independent. The question is:
Suppose that the number of balls is itself a random variable, independent of every-
thing else. Is there a way to choose the law of this random variable so that the above
variables are independent?
To put the problem in precise mathematical terms, let ξ1 , ξ2 , . . . be i.i.d. random colours, i.e.
random variables taking values in {c1 , . . . , cd } such that
P(ξ1 = ci ) = pi , 1 ≤ i ≤ d.
Let

S_n^i := \sum_{k=1}^n 1(\xi_k = c_i).
(In physical terms, S_n^i denotes precisely what we talked about earlier using more flowery language.) Now, independent of the sequence ξ_1, ξ_2, ..., let N be a random variable with values in Z_+. The problem is to find its law so that

S_N^1, ..., S_N^d are independent random variables. (⋆)
It turns out that this is a characterising property of the Poisson law. We will content ourselves
with proving one direction and only for d = 2.
Lemma 5.5. If N is Poisson with parameter λ, then S_N^i is also Poisson with parameter λp_i, i = 1, 2, and these random variables are independent.
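The following Python sketch (an added illustration, assuming NumPy; the values of λ and p_1 are arbitrary) checks Lemma 5.5 empirically for d = 2: the thinned counts have the right means, and their empirical covariance is near 0, consistent with independence:

import numpy as np

rng = np.random.default_rng(1)
lam, p1, trials = 5.0, 0.3, 100_000

N = rng.poisson(lam, trials)      # total number of balls, Poisson(lam)
S1 = rng.binomial(N, p1)          # balls of colour c1: thinning of N
S2 = N - S1                       # balls of colour c2

print("mean of S1:", S1.mean(), " (lam * p1 =", lam * p1, ")")
print("mean of S2:", S2.mean(), " (lam * (1 - p1) =", lam * (1 - p1), ")")
print("Cov(S1, S2):", np.cov(S1, S2)[0, 1])   # near 0, by Lemma 5.5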
5.2 The Geometric distribution

A random variable X has a Geometric distribution with parameter p ∈ (0, 1) if, with q = 1 − p,

P(X = k) = p q^{k-1}, \quad k = 1, 2, \ldots

Equivalently,

P(X > k) = q^k, \quad k \in \mathbb{N}.
Lemma 5.6. For each k ∈ N, the conditional distribution of X − k given {X > k} is the same
as the distribution of X:
P(X − k = n|X > k) = P(X = n).
Think of X as a random time, e.g. the day on which a certain volcano will erupt. The property
above says that if by day k the volcano has not erupted then the remaining time X − k has the
same law as X, no matter how large k is.
Proof Indeed,

P(X - k = n \mid X > k) = \frac{P(X = k + n)}{P(X > k)} = \frac{p q^{k+n-1}}{q^k} = p q^{n-1} = P(X = n).
Exercise 5.4. Show the converse: if a random variable X with values in N has the memoryless property, then it is Geometric with parameter p = P(X = 1).
We also have

EX = 1/p, \quad \mathrm{Var}\, X = (1-p)/p^2, \quad \text{and} \quad E e^{\theta X} = \frac{p e^\theta}{1 - (1-p) e^\theta} \ \text{ for all } \theta < \ln \frac{1}{1-p}.
A Geometric random variable with parameter p can be constructed from a sequence ξ_1, ξ_2, ... of i.i.d. Bernoulli random variables with

P(\xi_1 = 1) = 1 - P(\xi_1 = 0) = p

by letting

X = \inf\{k \ge 1 : \xi_k = 1\}.

Indeed, we have P(X < ∞) = 1, so

X = \min\{k \ge 1 : \xi_k = 1\}

and

P(X > k) = P(\xi_1 = \cdots = \xi_k = 0) = (1-p)^k,

as required.
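A quick Python sketch (an added illustration, assuming NumPy; p and the truncation horizon are arbitrary choices) of this construction, comparing the empirical tail of the first success time with (1 − p)^k:

import numpy as np

rng = np.random.default_rng(2)
p, trials, horizon = 0.3, 50_000, 200   # horizon long enough: (1-p)^200 ~ 0

xi = rng.random((trials, horizon)) < p  # Bernoulli(p) trials xi_1, xi_2, ...
X = xi.argmax(axis=1) + 1               # first k with xi_k = 1 (1-indexed)

for k in (1, 2, 5):
    print(f"P(X > {k}): empirical {(X > k).mean():.4f}, exact {(1 - p) ** k:.4f}")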
Lemma 5.8. Suppose that X_1, X_2, ..., X_d are independent Geometric random variables with parameters p_1, ..., p_d. Then X = min(X_1, ..., X_d) is also Geometric.
Proof For k ∈ N, by independence,

P(X > k) = P(X_1 > k) \cdots P(X_d > k) = \big((1-p_1) \cdots (1-p_d)\big)^k,

so X is Geometric with parameter 1 − (1 − p_1) ⋯ (1 − p_d).
5.3 The Uniform distribution

X has a uniform distribution in [a, b] (X ∼ U[a, b]) if, for all intervals I contained in [a, b], the probability P(X ∈ I) is proportional to the length of I.
Exercise 5.6. If X is uniform in [a, b], then cX + d is uniform in the interval with endpoints
ca + d and cb + d.
Lemma 5.9. Let p_1, ..., p_d be positive numbers adding up to 1. Split the interval [0, 1] into consecutive intervals I_1, ..., I_d of lengths p_1, ..., p_d, respectively. Let U_1, ..., U_n be i.i.d. uniform in [0, 1]. Let

S_n^i = \sum_{k=1}^n 1(U_k \in I_i), \quad 1 \le i \le d.

Then (S_n^1, ..., S_n^d) has a multinomial law. In particular, S_n^i is Binomial with parameters n, p_i.
Exercise 5.8. Let U1 , . . . , Ud be i.i.d. uniform in [0, 1]. Compute the probability P(U1 < U2 <
· · · < Ud ).
Exercise 5.9. Consider a stick of length 1 and break it into 3 pieces, by choosing the two break
points at random. Find the probability that the 3 smaller sticks can be joined to form a triangle.
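For a numerical check of Exercise 5.9, here is a Monte Carlo sketch in Python (an added illustration, assuming NumPy); compare its output with your answer:

import numpy as np

rng = np.random.default_rng(3)
n = 100_000
u = np.sort(rng.random((n, 2)), axis=1)             # ordered break points
a, b, c = u[:, 0], u[:, 1] - u[:, 0], 1 - u[:, 1]   # the three piece lengths

# The pieces form a triangle iff each one is shorter than 1/2.
ok = (a < 0.5) & (b < 0.5) & (c < 0.5)
print("estimated probability:", ok.mean())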
5.4 The Exponential and Gamma distributions

5.4.1 The Exponential distribution

A random variable T has an Exponential distribution with parameter λ > 0, T ∼ Exp(λ), if

P(T > t) = e^{-\lambda t}, \quad t \ge 0.

Like the Geometric distribution, the Exponential distribution is memoryless: for all s, t ≥ 0,

P(T - t > s \mid T > t) = \frac{P(T > t + s)}{P(T > t)} = \frac{e^{-\lambda(t+s)}}{e^{-\lambda t}} = e^{-\lambda s} = P(T > s).

Moreover, if T_1, ..., T_d are independent Exponential random variables with parameters λ_1, ..., λ_d, then min(T_1, ..., T_d) ∼ Exp(λ_1 + ⋯ + λ_d). Indeed, by independence,

P(\min(T_1, \ldots, T_d) > t) = P(T_1 > t) \cdots P(T_d > t) = e^{-(\lambda_1 + \cdots + \lambda_d) t}.
Whereas the Exponential distribution is the natural analogue of the Geometric distribution, in that they are both memoryless, the former also enjoys an important scaling property (with an almost obvious proof): if T ∼ Exp(λ) and c > 0, then cT ∼ Exp(λ/c), since P(cT > t) = P(T > t/c) = e^{-(\lambda/c)t}.
5.4.2 Convergence of Geometric distributions to Exponential

Let X_p be Geometric with parameter p. For t > 0,

P(pX_p > t) = P(X_p > t/p) = (1-p)^{\lfloor t/p \rfloor}.

Since \lim_{n\to\infty}(1 + n^{-1})^n = e, the last expression converges to e^{-t} as p → 0; that is, pX_p converges in distribution to the Exp(1) law.
Exercise 5.11. Show the following relation between Exponential and uniform distributions: if U is uniform in (0, 1), then −\frac{1}{\lambda} \ln U is Exponential with parameter λ.
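The relation in Exercise 5.11 is precisely the inverse-transform method for sampling the Exponential distribution; a minimal Python sketch (an added illustration, assuming NumPy; λ is arbitrary):

import numpy as np

rng = np.random.default_rng(4)
lam, n = 2.0, 100_000
u = 1.0 - rng.random(n)        # uniform in (0, 1], avoids log(0)
x = -np.log(u) / lam           # -(1/lam) ln U, should be Exp(lam)

print("mean:", x.mean(), " (1/lam =", 1 / lam, ")")
print("P(X > 1):", (x > 1).mean(), " (exp(-lam) =", np.exp(-lam), ")")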
5.4.3 The Gamma distribution

Recall the Gamma function

\Gamma(\beta) = \int_0^\infty x^{\beta - 1} e^{-x} \, dx, \quad \beta > 0,

which satisfies, for every integer n ≥ 1,

\Gamma(n) = (n-1)!
We introduce now a random variable with a Gamma distribution. Let β and λ be two positive constants. A random variable X has a Gamma distribution with parameters β and λ, X ∼ Γ(β, λ), if X has the density function

f_X(x) = \begin{cases} \dfrac{\lambda^\beta}{\Gamma(\beta)} x^{\beta - 1} e^{-\lambda x} & \text{if } x > 0, \\ 0 & \text{otherwise.} \end{cases}
Lemma 5.14. If ξ_1, ..., ξ_n are i.i.d. random variables with a common Exponential distribution Exp(λ), then

S_n \equiv \xi_1 + \ldots + \xi_n \sim \Gamma(n, \lambda).
Corollary 5.1. If X1 ∼ Γ(n, λ) and X2 ∼ Γ(m, λ), and if X1 and X2 are independent, then
X1 + X2 ∼ Γ(n + m, λ).
Proof From the previous lemma, one can represent X_1 as a sum X_1 = ξ_1 + ⋯ + ξ_n where ξ_1, ..., ξ_n are i.i.d. Exponential. Similarly, X_2 = ξ_{n+1} + ⋯ + ξ_{n+m}, independently of X_1, and X_1 + X_2 = ξ_1 + ⋯ + ξ_{n+m} is a sum of n + m i.i.d. Exponential random variables.
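A Python sketch (an added illustration, assuming NumPy; n and λ are arbitrary) of Lemma 5.14: the sum of n i.i.d. Exp(λ) variables should have the Γ(n, λ) mean n/λ and variance n/λ²:

import numpy as np

rng = np.random.default_rng(5)
n_terms, lam, trials = 4, 1.5, 100_000

# xi_1 + ... + xi_n with xi_i i.i.d. Exp(lam); NumPy takes scale = 1/lam.
s = rng.exponential(1 / lam, (trials, n_terms)).sum(axis=1)

print("mean:", s.mean(), " (n/lam =", n_terms / lam, ")")
print("var: ", s.var(), " (n/lam^2 =", n_terms / lam ** 2, ")")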
5.5 The Gaussian (Normal) distribution

5.5.1 The one-dimensional Normal distribution

A random variable X has a standard Normal distribution, X ∼ N(0, 1), if it has the density function

f_X(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}.
Note that

\int_{-\infty}^{\infty} e^{-x^2/2} \, dx = \sqrt{2\pi}.
Exercise 5.16. Prove the last equality.
By symmetry, EX = 0, and integration by parts gives

EX^2 = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} x^2 e^{-x^2/2} \, dx = 1.

Then

\mathrm{Var}\, X = EX^2 - (EX)^2 = 1.
Also, for all θ, the moment generating function of X exists and is equal to

M_X(\theta) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{\theta x} e^{-x^2/2} \, dx = e^{\theta^2/2}.
Exercise 5.17. Prove the last equality.
Since the standard normal distribution plays a central role in probability theory and its appli-
cations, special notation has been reserved for its distribution function:
\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2} \, dy.
Here are the basic properties of the function Φ:
(1) Φ(x) = 1 − Φ(−x);
(2) in particular, Φ(0) = 1/2.
Let a be an arbitrary real number and σ > 0. A random variable X has a Normal distribution with parameters a and σ² if its density function is

f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-a)^2/2\sigma^2}.
Here a is a shift parameter, and σ is a scale parameter.
If Y ∼ N(0, 1), then X = a + σY has this density. Hence

EX = a + \sigma EY = a, \quad E(X^2) = a^2 + \sigma^2,

and

\mathrm{Var}\, X = \sigma^2.
Also, the moment generating function MX (θ) of X exists for all θ and is equal to
M_X(\theta) = \exp\Big\{ a\theta + \frac{\sigma^2 \theta^2}{2} \Big\}.
If X_1, ..., X_n are independent random variables with X_i ∼ N(a_i, σ_i²), i = 1, ..., n, then X = X_1 + ⋯ + X_n ∼ N(a, σ²), where

a = \sum_i a_i \quad \text{and} \quad \sigma^2 = \sum_i \sigma_i^2.

Proof This follows from the basic properties of moment generating functions:

M_X(\theta) = M_{X_1}(\theta) \cdots M_{X_n}(\theta) = \prod_{i=1}^n \exp\Big\{ a_i\theta + \frac{\sigma_i^2 \theta^2}{2} \Big\} = e^{a\theta + \sigma^2\theta^2/2}.
Comment: the formula above is used efficiently in the simulation of normally distributed random variables.
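For instance, one can check the statement above by simulation; a Python sketch (an added illustration, assuming NumPy; the a_i and σ_i are arbitrary):

import numpy as np

rng = np.random.default_rng(6)
a = np.array([1.0, -2.0, 0.5])       # means a_i
sd = np.array([1.0, 2.0, 0.5])       # standard deviations sigma_i
trials = 100_000

x = rng.normal(a, sd, (trials, 3)).sum(axis=1)   # X_1 + X_2 + X_3

print("mean:", x.mean(), " (sum a_i =", a.sum(), ")")
print("var: ", x.var(), " (sum sigma_i^2 =", (sd ** 2).sum(), ")")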
Exercise 5.19. Try to draw a diagram of relations between standard distributions (discrete
and continuous).
5.5.3 The Multivariate (Multidimensional) Normal Distribution

Let X = (X_1, ..., X_d)^T be a vector of i.i.d. standard Normal random variables, so that its vector of means (expectations) is

\mathbf{0} = (0, 0, \ldots, 0)^T.

Given a d × d matrix A = (a_{ij}) and a vector b = (b_1, ..., b_d)^T, the random vector

Y = AX + b

is said to have a multivariate Normal distribution.
Here

b_i = EY_i, \quad i = 1, \ldots, d,

and the covariance matrix of Y is

C = \begin{pmatrix} c_{11} & c_{12} & \ldots & c_{1d} \\ c_{21} & c_{22} & \ldots & c_{2d} \\ \vdots & \vdots & \ddots & \vdots \\ c_{d1} & c_{d2} & \ldots & c_{dd} \end{pmatrix},

where

c_{ij} = \mathrm{Cov}(Y_i, Y_j) = \sum_{l=1}^d a_{il} a_{jl} = \sum_{l=1}^d a_{il} a^T_{lj}, \quad i, j = 1, \ldots, d,

that is, C = AA^T.
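A Python sketch (an added illustration, assuming NumPy; A and b are arbitrary) of this construction: the empirical covariance of Y = AX + b should be close to C = AA^T:

import numpy as np

rng = np.random.default_rng(7)
A = np.array([[2.0, 0.0],
              [1.0, 1.0]])
b = np.array([1.0, -1.0])
trials = 100_000

X = rng.standard_normal((trials, 2))   # rows: i.i.d. standard Normal vectors
Y = X @ A.T + b                        # rows: Y = A X + b

print("empirical mean:", Y.mean(axis=0), " (b =", b, ")")
print("empirical covariance:\n", np.cov(Y.T))
print("A @ A.T:\n", A @ A.T)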
Example. Let d = 2,

b = \begin{pmatrix} b_1 \\ b_2 \end{pmatrix}

and

A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix},

where

\mathrm{Det}\, A = a_{11} a_{22} - a_{12} a_{21} \ne 0.

Then

Y = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix},

where

Y_i = b_i + a_{i1} X_1 + a_{i2} X_2, \quad i = 1, 2,

EY_1 = b_1, \quad EY_2 = b_2,

and, for i, j = 1, 2, c_{ij} = a_{i1} a_{j1} + a_{i2} a_{j2}, i.e. C = A × A^T. Note that

\mathrm{Det}(C) > 0 \text{ if and only if } \mathrm{Det}(A) \ne 0.
If Det(C) > 0, then Y has an absolutely continuous distribution with joint density

f_Y(y) = \frac{1}{(2\pi)^{d/2} (\mathrm{Det}\, C)^{1/2}} \exp\Big\{ -\frac{1}{2} (y - b)^T G (y - b) \Big\}, \qquad (5.2)

where

G = (g_{ij})_{i,j=1,\ldots,d} = C^{-1}.

Since C is a symmetric matrix, G is symmetric too, i.e. g_{ij} = g_{ji} for all i, j. In particular, G is a diagonal matrix if and only if C is, which happens if and only if

\mathrm{Cov}(Y_i, Y_j) = 0 \quad \text{for all } i \ne j.
On the other hand, we know that random variables with joint absolutely continuous distribution
are mutually independent if and only if their joint density function is a product of their marginal
densities. From (5.2), we can see that the latter happens if and only if gij = 0 for all i 6= j. The
conclusion is:
Let X1 and X2 be two random variables. We know that if they are independent, then they are
uncorrelated, but the converse is not true, in general.
But, if (X1 , X2 ) is a two-dimensional normal random vector and if Cov(X1 , X2 ) = 0, then X1
and X2 are independent.
The following general result holds:
Theorem 5.1. If X = (X_1, X_2, ..., X_d)^T is a multivariate Normal vector and if Cov(X_i, X_j) = 0 for all i ≠ j, then the random variables X_1, X_2, ..., X_d are mutually independent.
5.5.4 Conditional Normal distribution

Lemma 5.18. Let (X, Y_1, ..., Y_d) be a Normal (1 + d)-dimensional random vector. Then the conditional law P(X ∈ · | Y_1, ..., Y_d) is Normal with mean E(X | Y_1, ..., Y_d) and deterministic covariance matrix. Moreover, E(X | Y_1, ..., Y_d) is a linear function of (Y_1, ..., Y_d) and is characterised by the fact that X − E(X | Y_1, ..., Y_d) is independent of (Y_1, ..., Y_d).
Exercise 5.20. Let X, Y be independent standard Normal random variables. Based on the last
part of the lemma above, compute E(X + 2Y |3X − 4Y ).
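A numerical sanity check for Exercise 5.20 (an added Python sketch, assuming NumPy), based on the characterisation in Lemma 5.18:

import numpy as np

rng = np.random.default_rng(8)
x, y = rng.standard_normal((2, 100_000))
t, z = x + 2 * y, 3 * x - 4 * y

# By Lemma 5.18, E(T | Z) = c * Z for the constant c making T - c * Z
# uncorrelated with (hence independent of) Z: c = Cov(T, Z) / Var(Z).
c = np.cov(t, z)[0, 1] / z.var()
print("estimated c:", c)   # compare with your exact answer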