Chapter 7: Introduction: 1 Convergence in Distribution
Suppose X_1, X_2, ..., X_n is a random sample from an EXP(θ) distribution. Denote the sample sum by n X̄_n = ∑_{i=1}^n X_i. Then it is well known that n X̄_n ∼ GAM(θ, n). What if X does not have an exponential distribution?
Suppose X_1, X_2, ..., X_n is a random sample on a random variable X which has a distribution defined by the pdf f_X(x). Further, suppose that t_n = t(x_1, ..., x_n) is a function of x_1, ..., x_n such that T_n = t(X_1, ..., X_n) is a random variable. Several special forms of T_n are T_n = X̄_n = (1/n) ∑_{i=1}^n X_i, T_n = S_n² = (1/(n−1)) ∑_{i=1}^n (X_i − X̄_n)², T_n = X_{(1)}, T_n = X_{(n)}, etc. These random variables play a key role in obtaining exact procedures for estimation, confidence intervals, and tests of unknown parameters of the distribution.
In some cases the pdf of Tn is obtained easily, but there are many important cases
where the derivation is not tractable.
In many of these, it is possible to obtain useful approximate results that apply when n
is large. These results are based on the notions of convergence in distribution and limiting
distribution.
1 Convergence in distribution
If the CDF of Y_n is G_n(y) for each n = 1, 2, ..., and if for some CDF G_Y(y) of a random variable Y,

lim_{n→∞} G_n(y) = G_Y(y)

for all values y at which G_Y(y) is continuous, then the sequence Y_1, Y_2, ... is said to converge in distribution to Y, denoted by Y_n →d Y. The distribution corresponding to the CDF G_Y(y) is called the limiting distribution of Y_n.
Statistical Inference, September 9, 2020

Example 7.2.1 of the book Let X_1, ..., X_n be a random sample from a uniform distribution, X ∼ UNIF(0, 1). Then

f_X(x) = 1, 0 < x < 1,

and zero otherwise, so that F_X(x) = x for 0 < x < 1. Let Y_n = X_{n:n}, the largest order statistic. From the results of Chapter 6, it follows that the CDF of Y_n is

G_n(y) = yⁿ, 0 < y < 1, (7.2.3)

with G_n(y) = 0 if y ≤ 0 and G_n(y) = 1 if y ≥ 1. Of course, when 0 < y < 1, yⁿ approaches 0 as n approaches ∞, and when y ≤ 0 or y ≥ 1, G_n(y) is a sequence of constants, so

lim_{n→∞} G_n(y) = 0 if y < 1, and lim_{n→∞} G_n(y) = 1 if y ≥ 1.

This situation is illustrated in Figure 1 (Figure 7.1 in the book), which shows G_n(y) and G_Y(y) for n = 2, 5, and 10. (Recall that a degenerate random variable Y with P[Y = µ] = 1 has MGF M_Y(t) = exp(µt) and Var(Y) = 0.)
Figure 1 (Figure 7.1 in the book): Comparison of the CDFs G_n(y) with the limiting degenerate CDF G_Y(y).
Thus, Y_n = X_{n:n} →d Y, where the random variable Y has a degenerate distribution with P[Y = 1] = 1. In other words, the nth order statistic from a uniform (0, 1) distribution converges in distribution to a degenerate random variable: it has a limiting distribution which degenerates at 1.
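The degeneracy is easy to see numerically. The sketch below is an illustration of my own (the function name, seed, and replication count are arbitrary choices, not from the book); it averages simulated values of Y_n = X_{n:n} and shows them piling up near 1 as n grows.

```python
import random

def mean_max_uniform(n, reps=2000, seed=0):
    """Average of Y_n = X_(n:n) over many replications of a UNIF(0,1) sample of size n."""
    rng = random.Random(seed)
    return sum(max(rng.random() for _ in range(n)) for _ in range(reps)) / reps

# Since E[X_(n:n)] = n/(n+1), the averages should climb toward 1 as n increases.
```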
Example 7.2.2 of the book Let X_1, ..., X_n be a random sample from an exponential distribution, X ∼ EXP(θ). Then

f_X(x) = (1/θ) exp(−x/θ), x > 0, θ > 0,

and zero otherwise, and

F_X(x) = 0 if x ≤ 0, and F_X(x) = 1 − exp(−x/θ) if x > 0.

Further, let Y_n = X_{1:n}, the smallest order statistic. Then it follows that the CDF G_n(y) of Y_n is

G_n(y) = 1 − [1 − F_X(y)]ⁿ = 1 − exp(−ny/θ), y > 0,
and zero if y ≤ 0. We have lim_{n→∞} G_n(y) = 1 if y > 0, because exp(−y/θ) < 1 in this case. The limit is zero if y < 0, and the limit at y = 0 is also zero. Thus

lim_{n→∞} G_n(y) = 0 if y ≤ 0, and lim_{n→∞} G_n(y) = 1 if y > 0.
Observe that the limiting function is not only discontinuous at y = 0, but not even continuous from the right at y = 0, which is a requirement of a CDF. Now, define the CDF of a degenerate random variable Y as

G_Y(y) = 0 if y < 0, and G_Y(y) = 1 if y ≥ 0.

Note that the limiting function lim_{n→∞} G_n(y) and G_Y(y) are equal except at the one point y = 0, but this is not a problem, because the definition of convergence in distribution requires only that the limiting function agree with a CDF at its points of continuity, and y = 0 is the point of discontinuity of G_Y(y).
Thus, X_{1:n} →d Y, where the random variable Y has a degenerate distribution with P[Y = 0] = 1. That is, the first order statistic from an exponential distribution converges in distribution to a degenerate random variable.
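As with the uniform example, this can be checked by simulation. The helper below is an illustrative sketch of my own (names and parameter values are arbitrary); since G_n above is the EXP(θ/n) CDF, the simulated minima should shrink like θ/n.

```python
import random

def mean_min_exponential(n, theta=2.0, reps=2000, seed=0):
    """Average of Y_n = X_(1:n) over replications of an EXP(theta) sample of size n."""
    rng = random.Random(seed)
    # random.expovariate takes the rate 1/theta, so each draw has mean theta.
    return sum(min(rng.expovariate(1.0 / theta) for _ in range(n)) for _ in range(reps)) / reps
```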
2 Stochastic convergence
In the earlier examples we have seen that the limiting distributions are degenerate. But not all limiting distributions are degenerate, as seen in the next example. The following limits
are useful in many problems:
lim_{n→∞} (1 + c/n)^{nb} = exp(cb),

lim_{n→∞} (1 + c/n + d(n)/n)^{nb} = exp(cb), if lim_{n→∞} d(n) = 0.
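The first of these limits is easy to check numerically; the helper below is an illustration of my own, not part of the text.

```python
import math

def power_limit(c, b, n):
    """Finite-n value of (1 + c/n)^(n*b), which should approach exp(c*b) as n grows."""
    return (1.0 + c / n) ** (n * b)
```

For example, power_limit(-1, 1, 10**6) agrees with exp(-1) to several decimal places.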
The Pareto distribution

A random variable X is said to have a Pareto distribution with parameters θ and κ, denoted by X ∼ PAR(θ, κ), if its density is given by

f_X(x; θ, κ) = κ / [θ (1 + x/θ)^{κ+1}], x > 0, θ > 0, κ > 0.
Example 7.2.3 of the book Let X_1, ..., X_n be a random sample from a Pareto distribution, X ∼ PAR(1, 1), and let Y_n = nX_{1:n}. The CDF of X is

F_X(x) = 1 − (1 + x)^{−1} = x/(1 + x), x > 0,

so the CDF of Y_n is

G_n(y) = P[Y_n ≤ y]
       = P[nX_{1:n} ≤ y]
       = P[X_{1:n} ≤ y/n]
       = 1 − [1 − F_X(y/n)]ⁿ
       = 1 − (1 + y/n)^{−n}, y > 0.
Now, taking the limit of G_n(y) as n → ∞, for y > 0, we get

lim_{n→∞} G_n(y) = 1 − lim_{n→∞} (1 + y/n)^{−n} = 1 − exp(−y).
We know that if Y ∼ EXP(1), then

G_Y(y) = 0 if y ≤ 0, and G_Y(y) = 1 − exp(−y) if y > 0.

Now, observe that

lim_{n→∞} G_n(y) = G_Y(y).
This is illustrated in Figure 2 (Figure 7.2 in the book), which shows the graphs of G_Y(y) and G_n(y) for n = 1, 2, and 5. Thus the limiting distribution of nX_{1:n} is exponential, which is a non-degenerate distribution.
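Because PAR(1, 1) has the explicit inverse CDF x = u/(1 − u), the EXP(1) limit can be spot-checked by simulation. The sketch below is my own illustration (function names, seed, and replication count are arbitrary).

```python
import math
import random

def pareto11(rng):
    """One PAR(1,1) draw via the inverse CDF: F(x) = x/(1+x) gives x = u/(1-u)."""
    u = rng.random()
    return u / (1.0 - u)

def sim_prob_below(n, y=1.0, reps=5000, seed=1):
    """Estimate P[n * X_(1:n) <= y] for a PAR(1,1) sample of size n."""
    rng = random.Random(seed)
    return sum(1 for _ in range(reps) if n * min(pareto11(rng) for _ in range(n)) <= y) / reps

# The EXP(1) limit predicts P[Y_n <= y] -> 1 - exp(-y), e.g. about 0.632 at y = 1.
```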
Figure 2 (Figure 7.2 in the book): Comparison of the CDFs G_n(y) with the limiting CDF G_Y(y) = 1 − exp(−y) of EXP(1).
4 Limiting distribution does not exist
The following example shows that a sequence of random variables need not have a limiting
distribution.
Example 7.2.4 of the book Let X_1, ..., X_n be a random sample from a Pareto distribution, X ∼ PAR(1, 1), and let Y_n = X_{n:n}. The CDF of X is

F_X(x) = 1 − (1 + x)^{−1} = x/(1 + x), x > 0,

so the CDF of Y_n is

G_n(y) = P[Y_n ≤ y]
       = P[X_{n:n} ≤ y]
       = [F_X(y)]ⁿ
       = [y/(1 + y)]ⁿ, y > 0,
and zero otherwise. Because y/(1 + y) < 1, we have lim_{n→∞} G_n(y) = 0 for all y > 0, which cannot be a CDF because it does not approach one as y → ∞. Thus Y_n = X_{n:n} has no limiting distribution in this case.
Example 7.2.5 of the book Let X_1, ..., X_n be a random sample from a Pareto distribution, X ∼ PAR(1, 1), and let Y_n = X_{n:n}/n. The CDF of X is

F_X(x) = 1 − (1 + x)^{−1} = x/(1 + x), x > 0,
so the CDF of Y_n is

G_n(y) = P[Y_n ≤ y]
       = P[X_{n:n}/n ≤ y]
       = [F_X(ny)]ⁿ
       = [ny/(1 + ny)]ⁿ
       = [1 + 1/(ny)]^{−n}, y > 0,
and zero otherwise. Now, taking the limit of G_n(y) as n → ∞ (use the result lim_{n→∞} (1 + c/n)^{nb} = exp(cb), with c = 1/y and b = −1), for y > 0, we get

lim_{n→∞} G_n(y) = lim_{n→∞} [1 + 1/(ny)]^{−n} = exp(−1/y), y > 0.
Now, consider the CDF of Y (you can show that it is a CDF) given by

G_Y(y) = 0 if y ≤ 0, and G_Y(y) = exp(−1/y) if y > 0.
Then,

lim_{n→∞} G_n(y) = G_Y(y)

for all values y at which G_Y(y) is continuous. Hence, Y_n = (1/n) X_{n:n} →d Y, where the random variable Y has a non-degenerate distribution with pdf

f_Y(y) = y^{−2} exp(−1/y), y > 0.

The distribution defined above is a special case of the inverse-gamma distribution.
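This limit can be spot-checked the same way as the previous Pareto example; the code below is an illustrative sketch of my own (names and sample sizes are arbitrary).

```python
import math
import random

def pareto11(rng):
    """One PAR(1,1) draw via the inverse CDF: F(x) = x/(1+x) gives x = u/(1-u)."""
    u = rng.random()
    return u / (1.0 - u)

def sim_prob_max_scaled(n, y=1.0, reps=5000, seed=2):
    """Estimate P[X_(n:n)/n <= y] for a PAR(1,1) sample of size n."""
    rng = random.Random(seed)
    return sum(1 for _ in range(reps) if max(pareto11(rng) for _ in range(n)) / n <= y) / reps

# The limit predicts P[Y_n <= y] -> exp(-1/y), e.g. about 0.368 at y = 1.
```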
Example 7.2.6 of the book Let X_1, ..., X_n be a random sample from an exponential distribution, X ∼ EXP(θ). Then

f_X(x) = (1/θ) exp(−x/θ), x > 0, θ > 0,

and zero otherwise, and

F_X(x) = 0 if x ≤ 0, and F_X(x) = 1 − exp(−x/θ) if x > 0.
Further, let Y_n = (1/θ) X_{n:n} − ln n. Then it follows that the CDF G_n(y) of Y_n is

G_n(y) = P[Y_n ≤ y]
       = P[(1/θ) X_{n:n} − ln n ≤ y]
       = P[X_{n:n} ≤ θ(y + ln n)]
       = [1 − exp(−(y + ln n))]ⁿ
       = [1 − exp(−y)/n]ⁿ, y > −ln n.
Now, consider the CDF of Y (you can show that it is a CDF) given by

G_Y(y) = exp(−exp(−y)), −∞ < y < ∞.

Then,

lim_{n→∞} G_n(y) = G_Y(y)

for all values y. Hence, Y_n = (1/θ) X_{n:n} − ln n →d Y, where the random variable Y has a non-degenerate distribution with pdf

f_Y(y) = exp(−exp(−y)) exp(−y), −∞ < y < ∞.

The distribution defined by the above density is called the Gumbel distribution.
Example 7.2.7 of the book Let X_1, ..., X_n be a random sample from a normal distribution, X ∼ N(µ, σ²). Then Y_n = X̄_n ∼ N(µ, σ²/n). Further,

√n (Y_n − µ)/σ ∼ N(0, 1),

which gives

G_n(y) = P[Y_n ≤ y] = P[√n (Y_n − µ)/σ ≤ √n (y − µ)/σ] = Φ(√n (y − µ)/σ), −∞ < y < ∞,
where Φ(·) is the CDF of a standard normal distribution. To make the argument easier to follow, let us write the above expression as an integral,

G_n(y) = P[Y_n ≤ y] = ∫_{−∞}^{√n(y−µ)/σ} φ(z) dz,
where φ(z) is the pdf of a standard normal variable. Now, consider the three cases y < µ, y = µ, and y > µ.

If y < µ, then √n(y − µ)/σ < 0 and √n(y − µ)/σ → −∞ as n → ∞, giving lim_{n→∞} G_n(y) = 0.

If y = µ, then √n(y − µ)/σ = 0 and

lim_{n→∞} G_n(y) = ∫_{−∞}^{0} φ(z) dz = 1/2.

If y > µ, then √n(y − µ)/σ > 0 and √n(y − µ)/σ → ∞ as n → ∞, giving lim_{n→∞} G_n(y) = 1.
Thus,

lim_{n→∞} G_n(y) = 0 if y < µ, 1/2 if y = µ, and 1 if y > µ.
Now, let Y be a degenerate random variable with P(Y = µ) = 1. Then

G_Y(y) = 0 if y < µ, and G_Y(y) = 1 if y ≥ µ.
Note that the limiting function lim_{n→∞} G_n(y) and G_Y(y) are equal except at the one point y = µ, but this is not a problem, because the definition of convergence in distribution requires only that the limiting function agree with a CDF at its points of continuity, and y = µ is the point of discontinuity of G_Y(y).
Thus, X̄_n →d Y, where the random variable Y has a degenerate distribution with P[Y = µ] = 1. That is, the sample mean from a normal distribution with mean µ and variance σ² converges in distribution to a degenerate random variable. This type of convergence, to a degenerate distribution, is also called stochastic convergence.
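The shrinking spread of X̄_n around µ is easy to exhibit numerically. The sketch below is an illustration of my own (parameter values and replication count are arbitrary): the standard deviation of the sample mean across replications should behave like σ/√n.

```python
import random
import statistics

def mean_spread(n, mu=5.0, sigma=2.0, reps=1000, seed=0):
    """Std dev, across replications, of the mean of n draws from N(mu, sigma^2)."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(reps)]
    return statistics.stdev(means)

# Theory predicts sigma / sqrt(n): about 1.0 at n = 4 and 0.2 at n = 100.
```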
6 Limiting distributions - conclusion

We have shown the following:

• nX_{1:n} →d Y ∼ EXP(1), a non-degenerate limiting distribution (Example 7.2.3)

• (1/n) X_{n:n} →d Y, where the random variable Y has a non-degenerate distribution (a special case of the inverse-gamma distribution) with pdf f_Y(y) = y^{−2} exp(−1/y), y > 0 (Example 7.2.5)

• X_{1:n} →d Y, where the random variable Y has a degenerate distribution with P[Y = 0] = 1 (Example 7.2.2)

• (1/θ) X_{n:n} − ln n →d Y, where the random variable Y has a non-degenerate distribution (Gumbel distribution) with the pdf f_Y(y) = exp(−exp(−y)) exp(−y), −∞ < y < ∞ (Example 7.2.6)

Also, we have shown that X̄_n →d Y, where the random variable Y has a degenerate distribution with P[Y = µ] = 1 and X̄_n is the sample mean from a normal distribution with mean µ and variance σ². (Example 7.2.7)
That is, a sequence of random variables may converge in distribution to a degenerate random variable, converge to a non-degenerate random variable, or not converge at all.
7 Approximation using limiting distribution

Recall from Example 7.2.6 that with Y_n = (1/θ) X_{n:n} − ln n we have

G_n(y) = [1 − exp(−y)/n]ⁿ, y > −ln n,

and zero if y ≤ −ln n. Further, Y_n = (1/θ) X_{n:n} − ln n →d Y, where the random variable Y has a non-degenerate distribution (Gumbel distribution) with the CDF and the pdf given by

G_Y(y) = exp(−exp(−y)), −∞ < y < ∞,

and

f_Y(y) = exp(−exp(−y)) exp(−y), −∞ < y < ∞,

respectively.
We now illustrate the accuracy obtained when this limiting CDF is used as an approximation to G_n(y) for large n.

Suppose that the lifetime in months of a certain type of component is a random variable X ∼ EXP(1), and suppose that 10 independent components are connected in a parallel system. The time to failure of the system is T = X_{10:10}, and its CDF satisfies

F_T(t) = P[T ≤ t]
       = P[Y_10 + ln 10 ≤ t]
       = P[Y_10 ≤ t − ln 10]
       ≈ G(t − ln 10),

where Y_10 = X_{10:10} − ln 10 (here θ = 1) and G(y) = exp(−exp(−y)) is the Gumbel CDF. Comparing exact and approximate values:

t:             1      2      5      7
F_T(t):        0.010  0.234  0.935  0.9909
G(t − ln 10):  0.025  0.258  0.935  0.9909
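The table can be reproduced in a few lines; the code below is an illustrative sketch (function names are my own).

```python
import math

def exact_cdf(t, n=10):
    """F_T(t) = P[X_(n:n) <= t] = (1 - e^(-t))^n for n iid EXP(1) lifetimes."""
    return (1.0 - math.exp(-t)) ** n

def gumbel_approx(t, n=10):
    """Limiting-distribution approximation G(t - ln n), with G(y) = exp(-exp(-y))."""
    return math.exp(-math.exp(-(t - math.log(n))))

# The two functions agree closely at t = 5 and t = 7, less so in the lower tail.
```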
In the previous examples, the exact CDF was known for each finite n, and the limiting distribution was obtained directly from this sequence. One advantage of limiting distributions is that it often may be possible to determine the limiting distribution without knowing the exact form of the CDF for finite n. The limiting distribution then may provide a useful approximation when the exact probabilities are not available. One method of accomplishing this is to make use of MGFs. The following theorem (Theorem 7.3.1) is stated without proof.
The Poisson distribution

A discrete random variable X is said to have the Poisson distribution with parameter µ > 0 if it has discrete pdf of the form

f_X(x; µ) = exp(−µ) µˣ / x!, x = 0, 1, 2, . . . .

A special notation that designates that a random variable X has the Poisson distribution with parameter µ is X ∼ POI(µ). The MGF of X is given by

M_X(t) = exp[µ(exp(t) − 1)], −∞ < t < ∞.
It has been shown in Theorem 3.2.3 of the book that if X ∼ BIN(n, p), then for each value x = 0, 1, 2, . . . , as n → ∞ and p → 0 with np = µ held constant,

lim_{n→∞} (n choose x) pˣ (1 − p)^{n−x} = exp(−µ) µˣ / x!.
Example 7.3.1 of the book Let X_1, ..., X_n be a random sample from a Bernoulli distribution, X_i ∼ BIN(1, p), i = 1, ..., n, and consider the sum Y_n = ∑_{i=1}^n X_i, which has a binomial distribution, Y_n ∼ BIN(n, p), with MGF

M_{Y_n}(t) = [1 − p + p exp(t)]ⁿ, −h < t < h, h > 0.
If we let p → 0 as n → ∞ with np = µ held fixed, then p = µ/n and

lim_{n→∞} M_{Y_n}(t) = lim_{n→∞} [1 + µ(exp(t) − 1)/n]ⁿ = exp[µ(exp(t) − 1)],

which is the MGF of a Poisson distribution with mean µ. Thus Y_n →d Y ∼ POI(µ).
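The pmf version of this limit is easy to verify numerically; the helpers below are illustrations of my own, not functions from the book.

```python
import math

def binom_pmf(n, p, x):
    """Binomial point probability C(n, x) p^x (1-p)^(n-x)."""
    return math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)

def poisson_pmf(mu, x):
    """Poisson point probability exp(-mu) mu^x / x!."""
    return math.exp(-mu) * mu ** x / math.factorial(x)

# With np = mu held fixed (say mu = 2), binom_pmf(n, mu/n, x) approaches poisson_pmf(mu, x).
```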
Example 7.3.2 of the book (Bernoulli Law of Large Numbers) Let X_1, ..., X_n be a random sample from a Bernoulli distribution, X_i ∼ BIN(1, p), i = 1, ..., n, and consider the sequence of sample proportions W_n = (1/n) Y_n = (1/n) ∑_{i=1}^n X_i. The MGF of W_n in this case is

M_{W_n}(t) = M_{Y_n}(t/n) = [1 − p + p exp(t/n)]ⁿ.

Expanding exp(t/n) = 1 + t/n + t²/(2n²) + ···, this can be written in the form [1 + pt/n + d(n)/n]ⁿ with lim_{n→∞} d(n) = 0, so applying the limit result above
we obtain

lim_{n→∞} M_{W_n}(t) = lim_{n→∞} [1 + pt/n + d(n)/n]ⁿ = exp(pt),

which is the MGF of a random variable degenerate at p. That is, W_n converges stochastically to p.
Example 7.3.3 of the book Let X_1, ..., X_n be a random sample from a Bernoulli distribution, X_i ∼ BIN(1, p), i = 1, ..., n, and consider the sequence of "standardized" variables

Z_n = (∑_{i=1}^n X_i − np)/√(np(1 − p)) = (Y_n − np)/σ_n = Y_n/σ_n − np/σ_n,

where E(Y_n) = np and Var(Y_n) = np(1 − p) = σ_n². The MGF of Z_n in this case is

M_{Z_n}(t) = exp(−npt/σ_n) [1 − p + p exp(t/σ_n)]ⁿ.
Now, substituting σ_n = √(np(1 − p)) in the first term, we get

first term = pt²/(2σ_n²) − p²t²/(2σ_n²) = (p − p²)t²/(2σ_n²) = p(1 − p)t²/(2np(1 − p)) = t²/(2n),
and

second term = p²t³/(2σ_n³) + p³t³/(2σ_n³) + p³t⁴/(4σ_n⁴) + ···
            = p²t³/(2[np(1 − p)]^{3/2}) + p³t³/(2[np(1 − p)]^{3/2}) + p³t⁴/(4[np(1 − p)]²) + ···
            = (1/n) d(n),

where

d(n) = √p t³/(2√(n(1 − p)³)) + p^{3/2} t³/(2√(n(1 − p)³)) + p t⁴/(4n(1 − p)²) + ··· .
It can be checked that, for a fixed value of p, d(n) → 0 as n → ∞. Now, M_{Z_n}(t) can be written as

M_{Z_n}(t) = [1 + t²/(2n) + d(n)/n]ⁿ.
Finally, applying the result

lim_{n→∞} (1 + c/n + d(n)/n)^{nb} = exp(cb), if lim_{n→∞} d(n) = 0,
we obtain

lim_{n→∞} M_{Z_n}(t) = lim_{n→∞} [1 + t²/(2n) + d(n)/n]ⁿ = exp(t²/2),
which is the MGF of the standard normal distribution, and so Z_n →d Z ∼ N(0, 1). This is an example of a special limiting result known as the Central Limit Theorem.
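Because M_{Z_n}(t) = exp(−npt/σ_n)[1 − p + p exp(t/σ_n)]ⁿ in closed form, the convergence of the MGF to exp(t²/2) can be checked directly. The snippet below is an illustrative sketch of my own (the parameter values are arbitrary).

```python
import math

def mgf_standardized_binomial(n, p, t):
    """Exact MGF of Z_n = (Y_n - np)/sigma_n for Y_n ~ BIN(n, p)."""
    sigma = math.sqrt(n * p * (1.0 - p))
    return math.exp(-n * p * t / sigma) * (1.0 - p + p * math.exp(t / sigma)) ** n

# As n grows, this should approach exp(t**2 / 2), the N(0, 1) MGF.
```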
Let m(t) denote the MGF of X − µ, m(t) = M_{X−µ}(t), and note that m(0) = 1, m′(0) = E(X − µ) = 0, and m″(0) = E(X − µ)² = σ². Expanding m(t) by the Taylor series formula about 0 gives, for some ξ between 0 and t,

m(t) = m(0) + m′(0) t + m″(ξ) t²/2
     = 1 + m″(ξ) t²/2
     = 1 + (m″(ξ) − σ² + σ²) t²/2
     = 1 + σ²t²/2 + (m″(ξ) − σ²) t²/2.
Now we may write

Z_n = (∑_{i=1}^n X_i − nµ)/(√n σ) = ∑_{i=1}^n (X_i − µ)/(√n σ)

and

M_{Z_n}(t) = E[exp(t Z_n)]
           = E[exp(t ∑_{i=1}^n (X_i − µ)/(√n σ))]
           = E[∏_{i=1}^n exp(t(X_i − µ)/(√n σ))]
           = ∏_{i=1}^n E[exp(t(X_i − µ)/(√n σ))]   (∵ X_1, ..., X_n are independent)
           = {E[exp(t(X − µ)/(√n σ))]}ⁿ   (∵ X_1, ..., X_n are identically distributed)
           = [m(t/(√n σ))]ⁿ
           = [1 + t²/(2n) + (m″(ξ) − σ²)t²/(2nσ²)]ⁿ, 0 < ξ < |t|/(√n σ).
Observe that as n → ∞, |t|/(√n σ) → 0, and therefore ξ → 0, and consequently m″(ξ) − σ² → 0. So we take

d(n) = (m″(ξ) − σ²) t²/(2σ²), with lim_{n→∞} d(n) = 0,

and we rewrite M_{Z_n}(t) as

M_{Z_n}(t) = [1 + t²/(2n) + d(n)/n]ⁿ.
Finally, applying the result

lim_{n→∞} (1 + c/n + d(n)/n)^{nb} = exp(cb), if lim_{n→∞} d(n) = 0,

we obtain

lim_{n→∞} M_{Z_n}(t) = lim_{n→∞} [1 + t²/(2n) + d(n)/n]ⁿ = exp(t²/2).
Or equivalently,

lim_{n→∞} G_{Z_n}(z) = Φ(z),

which shows that Z_n →d Z ∼ N(0, 1).
The major application of the CLT is to provide an approximate distribution in cases
where the exact distribution is unknown or intractable.
Example 7.3.4 of the book Let X_1, ..., X_n be a random sample from a uniform distribution, X_i ∼ UNIF(0, 1), i = 1, ..., n, and let Y_n = ∑_{i=1}^n X_i. Because E(X_i) = 1/2 and Var(X_i) = 1/12, we have the approximation

Y_n ∼ N(n/2, n/12) (approximately).
For example, if n = 12, then approximately

Y_12 − 6 ∼ N(0, 1).

This approximation is so close that it often is used to simulate standard normal random numbers in computer applications. Of course, this requires 12 uniform random numbers to be generated to obtain one random number from the standard normal distribution.
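This classical trick can be sketched as follows (an illustration of my own; the replication count is arbitrary):

```python
import random
import statistics

def approx_normal_draws(reps=5000, seed=0):
    """Approximate N(0,1) draws: each is a sum of 12 UNIF(0,1) variates minus 6."""
    rng = random.Random(seed)
    return [sum(rng.random() for _ in range(12)) - 6.0 for _ in range(reps)]

# The draws should have mean near 0 and standard deviation near 1.
```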
1. If we let p → 0 as n → ∞ in such a way that np = µ, for fixed µ > 0, then it has been shown that Y_n →d Y ∼ POI(µ). (Example 7.3.1 of the book)

2. If p is fixed, then W_n = (1/n) Y_n = (1/n) ∑_{i=1}^n X_i converges stochastically to p as n approaches infinity. (Example 7.3.2 of the book)

3. If p is fixed, then Z_n = (Y_n − np)/√(np(1 − p)) →d Z ∼ N(0, 1). (Example 7.3.3 of the book)
Here we will concentrate on the last case, which states that for a fixed value of p a suitably standardized sequence of binomial random variables converges to a standard normal distribution, suggesting a normal approximation. In particular, it suggests that for large n and fixed p, approximately Y_n ∼ N(np, np(1 − p)). This approximation works best when p is close to 0.5, because the binomial distribution is symmetric when p = 0.5. The accuracy required in any approximation depends on the application. One guideline is to use the normal approximation when np > 5 and n(1 − p) > 5, but again this would depend on the accuracy required.
Example 7.4.1 of the book The probability that a basketball player hits a shot is p = 0.5. If he takes 20 shots, what is the probability that he hits at least nine? The exact probability is

P[Y_20 ≥ 9] = 1 − P[Y_20 ≤ 8]
            = 1 − ∑_{y=0}^{8} (20 choose y) (0.5)ʸ (1 − 0.5)^{20−y}
            = 1 − (0.5)²⁰ ∑_{y=0}^{8} (20 choose y)
            = 0.7483.
A normal approximation is

P[Y_20 ≥ 9] = 1 − P[Y_20 ≤ 8]
            = 1 − P[(Y_20 − np)/√(np(1 − p)) ≤ (8 − 10)/√((20)(0.5)(0.5))]   (∵ n = 20, p = 0.5)
            = 1 − P[(Y_20 − np)/√(np(1 − p)) ≤ −2/√5].

Now,

(Y_20 − np)/√(np(1 − p)) ∼ N(0, 1) (approximately),
and therefore

P[Y_20 ≥ 9] ≈ 1 − Φ(−2/√5)
            = 1 − [1 − Φ(2/√5)]
            = Φ(2/√5)
            = Φ(0.894427)
            ≈ 0.814.
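Both numbers can be reproduced with the standard library; the sketch below is my own illustration (`phi` is a small helper built on math.erf, not a book function).

```python
import math

def phi(z):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Exact binomial tail P[Y_20 >= 9] with p = 0.5.
exact = 1.0 - sum(math.comb(20, y) for y in range(9)) * 0.5 ** 20

# Normal approximation without a continuity correction.
approx = phi(2.0 / math.sqrt(5.0))
```

The approximation overshoots the exact value noticeably, which motivates the continuity correction discussed next.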
Because the binomial distribution is discrete and the normal distribution is continuous,
the approximation can be improved by making a continuity correction. In particular,
each binomial probability b(y; n, p) has the same value as the area of a rectangle of height
b(y; n, p) and with the interval [y − 0.5, y + 0.5] as its base, because the length of the base
is one unit. The area of this rectangle can be approximated by the area under the pdf of
Y ∼ N(np, np(1 − p)), which corresponds to fitting a normal distribution with the same
mean and variance as Y_n ∼ BIN(n, p). This is illustrated for the case of n = 20, p = 0.5, and y = 7 in Figure 3, where the exact probability is b(7; 20, 0.5) = (20 choose 7)(0.5)⁷(0.5)¹³ = 0.0739.
The approximation, which is the shaded area in Figure 3, is

P[6.5 ≤ Y ≤ 7.5] = P[(6.5 − 10)/√((20)(0.5)(0.5)) ≤ (Y − np)/√(np(1 − p)) ≤ (7.5 − 10)/√((20)(0.5)(0.5))]
                 = P[−1.56525 ≤ Z ≤ −1.11803]
                 = Φ(−1.11803) − Φ(−1.56525)
                 ≈ 0.073,

where Y ∼ N(np, np(1 − p)) and Z = (Y − np)/√(np(1 − p)) ∼ N(0, 1), with n = 20 and p = 0.5.
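The continuity-corrected value can be checked the same way (an illustrative sketch; `phi` is again a helper built on math.erf, not a book function).

```python
import math

def phi(z):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Exact binomial point probability b(7; 20, 0.5).
exact = math.comb(20, 7) * 0.5 ** 20

# Area under the N(10, 5) density between 6.5 and 7.5.
sd = math.sqrt(20 * 0.5 * 0.5)
approx = phi((7.5 - 10.0) / sd) - phi((6.5 - 10.0) / sd)
```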
The same idea can be used with other binomial probabilities. For example,

P[Y_20 ≥ 9] = 1 − P[Y_20 ≤ 8]
            ≈ 1 − P[(Y − np)/√(np(1 − p)) ≤ (8.5 − 10)/√((20)(0.5)(0.5))]
            = 1 − P[Z ≤ −0.67082]
            = 1 − Φ(−0.67082)
            ≈ 1 − 0.2512
            = 0.7488,

which is much closer to the exact value (0.7483) than the approximation without the continuity correction. The situation is shown in Figure 4.

Figure 3 (Figure 7.3 in the book): Continuity correction for the normal approximation of a binomial probability; the shaded region over [6.5, 7.5] approximates b(7; 20, 0.5) = 0.0739.
Example 7.4.2 of the book Suppose that Y_n ∼ POI(n), where n is a positive integer. From the reproductive property of the Poisson distribution, we know that Y_n has the same distribution as a sum ∑_{i=1}^n X_i, where X_1, ..., X_n are independent, X_i ∼ POI(1). According to the CLT,

Z_n = (Y_n − n)/√n →d Z ∼ N(0, 1),

which suggests the approximation Y_n ∼ N(n, n) for large n. For example, with n = 20, suppose we wish to find P[10 ≤ Y_20 ≤ 30]. The exact value is

P[10 ≤ Y_20 ≤ 30] = ∑_{y=10}^{30} exp(−20)(20)ʸ/y! = 0.982.
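The exact sum and a normal approximation can be compared directly (an illustrative sketch; `phi` is a helper built on math.erf, and the continuity correction is my own addition, not stated in the text).

```python
import math

def phi(z):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Exact Poisson(20) probability of 10 <= Y <= 30.
exact = sum(math.exp(-20.0) * 20.0 ** y / math.factorial(y) for y in range(10, 31))

# N(20, 20) approximation with a continuity correction.
approx = phi((30.5 - 20.0) / math.sqrt(20.0)) - phi((9.5 - 20.0) / math.sqrt(20.0))
```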
Figure 4 (Figure 7.4 in the book): The normal approximation for a binomial distribution.