
Chapter 7: Introduction

Suppose X1, X2, ..., Xn is a random sample on a random variable X which has a N(µ, σ²) distribution. Denote the sample mean by X̄_n = (1/n) Σ_{i=1}^n X_i. Then it is well known that X̄_n ∼ N(µ, σ²/n). What if X does not have a normal distribution?
Suppose X1, X2, ..., Xn is a random sample on a random variable X which has an EXP(θ) distribution. Denote the sample sum by nX̄_n = Σ_{i=1}^n X_i. Then it is well known that nX̄_n ∼ GAM(θ, n). What if X does not have an exponential distribution?
Suppose X1, X2, ..., Xn is a random sample on a random variable X which has a distribution defined by the pdf f_X(x). Further, suppose that t_n = t(x1, ..., xn) is a function of x1, ..., xn such that T_n = t(X1, ..., Xn) is a random variable. Several special forms of T_n are the sample mean T_n = X̄_n = (1/n) Σ_{i=1}^n X_i, the sample variance T_n = S_n² = (1/(n−1)) Σ_{i=1}^n (X_i − X̄_n)², the smallest order statistic T_n = X_(1), the largest order statistic T_n = X_(n), and so on. These random variables play a key role in obtaining exact procedures for estimation, confidence intervals, and tests of unknown parameters of the distribution.
In some cases the pdf of Tn is obtained easily, but there are many important cases
where the derivation is not tractable.
In many of these, it is possible to obtain useful approximate results that apply when n
is large. These results are based on the notions of convergence in distribution and limiting
distribution.

Chapter 7: Sequences of Random Variables

Consider a sequence of random variables Y1, Y2, ... with a corresponding sequence of CDFs G1(y), G2(y), ..., so that for each n = 1, 2, ...,

G_n(y) = P[Y_n ≤ y].

1 Convergence in distribution

If the CDF of Y_n is G_n(y) for each n = 1, 2, ..., and if for some CDF G_Y(y) of a random variable Y,

lim_{n→∞} G_n(y) = G_Y(y)

for all values y at which G_Y(y) is continuous, then the sequence Y1, Y2, ... is said to converge in distribution to Y, denoted by Y_n →d Y. The distribution corresponding to the CDF G_Y(y) is called the limiting distribution of Y_n.
Example 7.2.1 of the book Let X1, ..., Xn be a random sample from a uniform distribution, X ∼ UNIF(0, 1). Then

f_X(x) = 1, 0 < x < 1,
and zero otherwise, and

F_X(x) = 0 if x ≤ 0, x if 0 < x < 1, and 1 if x ≥ 1.
Further, let Y_n = X_{n:n}, the largest order statistic. Then it follows that the CDF G_n(y) of Y_n is

G_n(y) = [F_X(y)]^n = y^n, 0 < y < 1,

and zero if y ≤ 0 and one if y ≥ 1. Of course, when 0 < y < 1, y^n approaches 0 as n approaches ∞, and when y ≤ 0 or y ≥ 1, G_n(y) is a sequence of constants, with respective limits 0 and 1. Thus,

lim_{n→∞} G_n(y) = 0 if y < 1, and 1 if y ≥ 1.

The degenerate random variable


A random variable X is degenerate if, for some constant µ, P(X = µ) = 1. The CDF of X is given by

F_X(x) = 0 if x < µ, and 1 if x ≥ µ.

The moment generating function of X is

M_X(t) = exp(µt),

and

Var(X) = 0.

Now, let Y be a degenerate random variable with P(Y = 1) = 1. Then

G_Y(y) = 0 if y < 1, and 1 if y ≥ 1.

Now, one can check that

lim_{n→∞} G_n(y) = G_Y(y)

for all values y at which G_Y(y) is continuous. This situation is illustrated in Figure 1 (Figure 7.1 in the book), which shows G_n(y) and G_Y(y) for n = 2, 5, and 10.
Figure 1: Comparison of CDFs G n (y) with limiting degenerate CDF GY (y)

Thus, Y_n = X_{n:n} →d Y, where the random variable Y has a degenerate distribution with P[Y = 1] = 1. In other words, the nth order statistic from a uniform (0, 1) distribution converges to a degenerate random variable; that is, the nth order statistic from a uniform (0, 1) distribution has a limiting distribution which degenerates at 1.
Example 7.2.2 of the book Let X1, ..., Xn be a random sample from an exponential distribution, X ∼ EXP(θ). Then

f_X(x) = (1/θ) exp(−x/θ), x > 0, θ > 0,

and zero otherwise, and

F_X(x) = 0 if x ≤ 0, and 1 − exp(−x/θ) if x > 0.

Further, let Y_n = X_{1:n}, the smallest order statistic. Then it follows that the CDF G_n(y) of Y_n is

G_n(y) = 1 − [1 − F_X(y)]^n = 1 − exp(−ny/θ), y > 0,

and zero if y ≤ 0. We have lim_{n→∞} G_n(y) = 1 if y > 0, because 0 < exp(−y/θ) < 1 in this case. Also, notice that the limit at y = 0 is zero, since G_n(0) = 0 for every n. Thus,

lim_{n→∞} G_n(y) = 0 if y ≤ 0, and 1 if y > 0.
Observe that the limiting function is not only discontinuous at y = 0 but also not even continuous from the right at y = 0, and right-continuity is a requirement of a CDF. Now, define the CDF of a degenerate random variable Y as

G_Y(y) = 0 if y < 0, and 1 if y ≥ 0.

Now, note that the limiting function lim_{n→∞} G_n(y) and G_Y(y) are equal except at the one point y = 0, but this is not a problem, because the definition of convergence in distribution requires only that the limiting function agree with a CDF at its points of continuity, and y = 0 is the point of discontinuity of G_Y(y). Thus, X_{1:n} →d Y, where the random variable Y has a degenerate distribution with P[Y = 0] = 1. That is, the first order statistic from an exponential distribution converges to a degenerate random variable.
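A quick simulation along the same lines (again a sketch assuming NumPy; θ = 2 and the cutoff 0.1 are arbitrary illustrative values) shows the smallest order statistic collapsing to 0:

    import numpy as np

    rng = np.random.default_rng(0)
    theta = 2.0
    for n in [5, 50, 500]:
        minima = rng.exponential(theta, size=(100_000, n)).min(axis=1)
        # empirical P[X_{1:n} <= 0.1] versus G_n(0.1) = 1 - exp(-n(0.1)/theta);
        # both approach 1 for any fixed positive cutoff
        print(n, (minima <= 0.1).mean(), 1 - np.exp(-n * 0.1 / theta))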

2 Stochastic convergence

A sequence of random variables Y1, Y2, ... is said to converge stochastically to a constant c if it has a limiting distribution that is degenerate at c.

3 Non-degenerate limiting distributions

In the earlier examples the limiting distributions were degenerate. But not all limiting distributions are degenerate, as seen in the next example. The following limits are useful in many problems:

lim_{n→∞} (1 + c/n)^{nb} = exp(cb),

lim_{n→∞} (1 + c/n + d(n)/n)^{nb} = exp(cb) if lim_{n→∞} d(n) = 0.

The Pareto distribution

A random variable X is said to have a Pareto distribution with parameters θ and κ, denoted by X ∼ PAR(θ, κ), if its density is given by

f_X(x; θ, κ) = κ / [θ(1 + x/θ)^{κ+1}], x > 0, θ > 0, κ > 0.

The CDF is given by

F_X(x; θ, κ) = 1 − (1 + x/θ)^{−κ}, x > 0.

Example 7.2.3 of the book Let X1, ..., Xn be a random sample from a Pareto distribution, X ∼ PAR(1, 1), and let Y_n = nX_{1:n}. The CDF of X is

F_X(x) = 1 − (1 + x)^{−1}, x > 0,

so the CDF of Y_n is

G_n(y) = P[Y_n ≤ y]
       = P[nX_{1:n} ≤ y]
       = P[X_{1:n} ≤ y/n]
       = 1 − [1 − F_X(y/n)]^n
       = 1 − (1 + y/n)^{−n}, y > 0.

Now, taking the limit of G_n(y) as n → ∞, for y > 0, we get

lim_{n→∞} G_n(y) = 1 − lim_{n→∞} (1 + y/n)^{−n} = 1 − exp(−y).
We know that if Y ∼ EXP(1), then

G_Y(y) = 0 if y ≤ 0, and 1 − exp(−y) if y > 0.

Now, observe that

lim_{n→∞} G_n(y) = G_Y(y).

This is illustrated in Figure 2 (Figure 7.2 in the book), which shows the graphs of G_Y(y) and G_n(y) for n = 1, 2, and 5. Thus the limiting distribution of nX_{1:n} is exponential, which is a non-degenerate distribution.

Figure 2: Comparison of CDFs G_n(y) with limiting CDF G_Y(y)
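As a numerical sanity check (a sketch assuming NumPy and SciPy; n = 50 and the seed are arbitrary), one can sample PAR(1, 1) by inverting its CDF and compare nX_{1:n} with EXP(1) via a Kolmogorov-Smirnov statistic:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n = 50
    u = rng.uniform(size=(100_000, n))
    x = u / (1.0 - u)                 # inverse-CDF sampling: F(x) = x/(1+x)
    y = n * x.min(axis=1)             # Y_n = n X_{1:n}
    # a small KS statistic indicates closeness to the EXP(1) CDF 1 - exp(-y)
    print(stats.kstest(y, "expon"))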
4 Limiting distribution does not exist

The following example shows that a sequence of random variables need not have a limiting
distribution.
Example 7.2.4 of the book Let X1, ..., Xn be a random sample from a Pareto distribution, X ∼ PAR(1, 1), and let Y_n = X_{n:n}. The CDF of X is

F_X(x) = 1 − (1 + x)^{−1} = x/(1 + x), x > 0,

so the CDF of Y_n is

G_n(y) = P[Y_n ≤ y] = P[X_{n:n} ≤ y] = [F_X(y)]^n = [y/(1 + y)]^n, y > 0,

and zero otherwise. Because y/(1 + y) < 1, we have lim_{n→∞} G_n(y) = 0 for all y > 0, which cannot be a CDF because it does not approach one as y → ∞.

5 Limiting distribution - some more problems

Example 7.2.5 of the book Let X1, ..., Xn be a random sample from a Pareto distribution, X ∼ PAR(1, 1), and let Y_n = X_{n:n}/n. The CDF of X is

F_X(x) = 1 − (1 + x)^{−1} = x/(1 + x), x > 0,

so the CDF of Y_n is

G_n(y) = P[Y_n ≤ y]
       = P[X_{n:n}/n ≤ y]
       = [F_X(ny)]^n
       = [ny/(1 + ny)]^n
       = [1 + 1/(ny)]^{−n}, y > 0,

and zero otherwise. Now, taking the limit of G_n(y) as n → ∞ for y > 0 (use the result lim_{n→∞} (1 + c/n)^{nb} = exp(cb) with c = 1/y and b = −1), we get

lim_{n→∞} G_n(y) = lim_{n→∞} [1 + 1/(ny)]^{−n} = exp(−1/y), y > 0.
Now, consider the CDF of Y (you can show that it is a CDF) given by

G_Y(y) = 0 if y ≤ 0, and exp(−1/y) if y > 0.

Then,

lim_{n→∞} G_n(y) = G_Y(y)

for all values y at which G_Y(y) is continuous. Hence, Y_n = X_{n:n}/n →d Y, where the random variable Y has a non-degenerate distribution with pdf

f_Y(y) = y^{−2} exp(−1/y), y > 0.

The distribution defined above is a special case of the inverse-gamma distribution.
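The limit can again be checked by simulation (a sketch assuming NumPy; n = 200 and the grid of evaluation points are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200
    u = rng.uniform(size=(100_000, n))
    x = u / (1.0 - u)            # PAR(1,1) samples via inverse CDF
    y = x.max(axis=1) / n        # Y_n = X_{n:n}/n
    for t in [0.5, 1.0, 2.0, 5.0]:
        # empirical P[Y_n <= t] versus the limiting CDF exp(-1/t)
        print(t, (y <= t).mean(), np.exp(-1.0 / t))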
Example 7.2.6 of the book Let X1, ..., Xn be a random sample from an exponential distribution, X ∼ EXP(θ). Then

f_X(x) = (1/θ) exp(−x/θ), x > 0, θ > 0,

and zero otherwise, and

F_X(x) = 0 if x ≤ 0, and 1 − exp(−x/θ) if x > 0.

Further, let Y_n = X_{n:n}/θ − ln n. Then it follows that the CDF G_n(y) of Y_n is

G_n(y) = P[Y_n ≤ y]
       = P[X_{n:n}/θ − ln n ≤ y]
       = P[X_{n:n} ≤ θ(y + ln n)]
       = [F_X(θ(y + ln n))]^n
       = [1 − exp(−(y + ln n))]^n
       = [1 − exp(−y) exp(−ln n)]^n
       = [1 − exp(−y)/n]^n, y > −ln n,

and zero if y ≤ −ln n. Now, taking the limit of G_n(y) as n → ∞, we get

lim_{n→∞} G_n(y) = lim_{n→∞} [1 − exp(−y)/n]^n = exp(−exp(−y)), −∞ < y < ∞.

Now, consider the CDF of Y (you can show that it is a CDF) given by

G_Y(y) = exp(−exp(−y)), −∞ < y < ∞.

Then,

lim_{n→∞} G_n(y) = G_Y(y)

for all values y. Hence, Y_n = X_{n:n}/θ − ln n →d Y, where the random variable Y has a non-degenerate distribution with pdf

f_Y(y) = exp(−exp(−y)) exp(−y), −∞ < y < ∞.

The distribution defined by the above density is called the Gumbel distribution.
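Numerically (a sketch assuming NumPy and SciPy; θ = 2 and n = 100 are arbitrary), the centered maximum can be compared with the standard Gumbel distribution, which SciPy provides as gumbel_r:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    theta, n = 2.0, 100
    x = rng.exponential(theta, size=(100_000, n))
    y = x.max(axis=1) / theta - np.log(n)    # Y_n = X_{n:n}/theta - ln n
    # a small KS statistic indicates closeness to the Gumbel CDF exp(-exp(-y))
    print(stats.kstest(y, "gumbel_r"))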
Example 7.2.7 of the book Let X1, ..., Xn be a random sample from a normal distribution, X ∼ N(µ, σ²). Then Y_n = X̄_n ∼ N(µ, σ²/n). Further,

√n(Y_n − µ)/σ ∼ N(0, 1),

which gives

G_n(y) = P[Y_n ≤ y] = P[√n(Y_n − µ)/σ ≤ √n(y − µ)/σ] = Φ(√n(y − µ)/σ), −∞ < y < ∞,
where Φ(·) is the CDF of a standard normal distribution. To make the explanation easier, let us write the above expression in terms of an integral as

G_n(y) = P[Y_n ≤ y] = ∫_{−∞}^{√n(y−µ)/σ} φ(z) dz,

where φ(z) is the pdf of a standard normal variable. Now, consider three cases: y < µ, y = µ, and y > µ.

If y < µ, then √n(y − µ)/σ < 0 and √n(y − µ)/σ → −∞ as n → ∞, giving

lim_{n→∞} G_n(y) = 0, if y < µ.

If y = µ, then √n(y − µ)/σ = 0 and

lim_{n→∞} G_n(y) = ∫_{−∞}^{0} φ(z) dz = 1/2, if y = µ.

If y > µ, then √n(y − µ)/σ > 0 and √n(y − µ)/σ → ∞ as n → ∞, giving

lim_{n→∞} G_n(y) = 1, if y > µ.
Thus,

lim_{n→∞} G_n(y) = 0 if y < µ, 1/2 if y = µ, and 1 if y > µ.

Now, let Y be a degenerate random variable with P(Y = µ) = 1. Then

G_Y(y) = 0 if y < µ, and 1 if y ≥ µ.

Now, note that the limiting function lim_{n→∞} G_n(y) and G_Y(y) are equal except at the one point y = µ, but this is not a problem, because the definition of convergence in distribution requires only that the limiting function agree with a CDF at its points of continuity, and y = µ is the point of discontinuity of G_Y(y).

Thus, X̄_n →d Y, where the random variable Y has a degenerate distribution with P[Y = µ] = 1. That is, the sample mean from a normal distribution with mean µ and variance σ² converges to a degenerate random variable. Needless to say, this type of convergence is also called stochastic convergence.

6 Limiting distributions - conclusion

We have seen that if X1, ..., Xn is a random sample from a Pareto distribution, X ∼ PAR(1, 1), then

• the limiting distribution of nX_{1:n} is exponential, which is a non-degenerate distribution (Example 7.2.3);

• the limiting distribution of X_{n:n} does not exist (Example 7.2.4);

• X_{n:n}/n →d Y, where the random variable Y has a non-degenerate distribution (inverse-gamma distribution) with pdf f_Y(y) = y^{−2} exp(−1/y), y > 0 (Example 7.2.5).

Also, we have shown that if X1, ..., Xn is a random sample from an exponential distribution, X ∼ EXP(θ), then

• X_{1:n} →d Y, where the random variable Y has a degenerate distribution with P[Y = 0] = 1 (Example 7.2.2);

• X_{n:n}/θ − ln n →d Y, where the random variable Y has a non-degenerate distribution (Gumbel distribution) with pdf f_Y(y) = exp(−exp(−y)) exp(−y), −∞ < y < ∞ (Example 7.2.6).

Also, we have shown that X̄_n →d Y, where the random variable Y has a degenerate distribution with P[Y = µ] = 1 and X̄_n is the sample mean from a normal distribution with mean µ and variance σ² (Example 7.2.7).

That is, a sequence of random variables may converge in distribution to a degenerate random variable, or to a non-degenerate random variable, or may not converge at all.

7 Approximation using limiting distribution

We have seen that if X1, ..., Xn is a random sample from an exponential distribution, X ∼ EXP(θ), then Y_n = X_{n:n}/θ − ln n has the CDF

G_n(y) = [1 − exp(−y)/n]^n, y > −ln n,

and zero if y ≤ −ln n. Further, Y_n = X_{n:n}/θ − ln n →d Y, where the random variable Y has a non-degenerate distribution (Gumbel distribution) with CDF and pdf

G_Y(y) = exp(−exp(−y)), −∞ < y < ∞,

and

f_Y(y) = exp(−exp(−y)) exp(−y), −∞ < y < ∞,

respectively. We now illustrate the accuracy of this limiting CDF when it is used as an approximation to G_n(y) for large n.

Suppose that the lifetime in months of a certain type of component is a random variable X ∼ EXP(1), and suppose that 10 independent components are connected in a parallel system. The time to failure of the system is T = X_{10:10}, and its CDF is

F_T(t) = P[T ≤ t] = [F_X(t)]^10 = [1 − exp(−t)]^10, t > 0.

This CDF is evaluated at t = 1, 2, 5, and 7 months in the table below.

t              1      2      5      7
F_T(t)         0.010  0.234  0.935  0.9909
G(t − ln 10)   0.025  0.258  0.935  0.9909

To approximate these probabilities with the limiting distribution, write

F_T(t) = P[T ≤ t]
       = P[Y_10 + ln 10 ≤ t]
       = P[Y_10 ≤ t − ln 10]
       ≈ G(t − ln 10),

where

G(t − ln 10) = exp(−exp{−(t − ln 10)})
             = exp(−exp(−t) exp(ln 10))
             = exp(−10 exp(−t)).
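The table can be reproduced directly from the two closed-form expressions above (a minimal sketch assuming NumPy is available):

    import numpy as np

    for t in [1.0, 2.0, 5.0, 7.0]:
        exact = (1.0 - np.exp(-t)) ** 10       # F_T(t) = [1 - exp(-t)]^10
        approx = np.exp(-10.0 * np.exp(-t))    # G(t - ln 10) = exp(-10 exp(-t))
        print(t, round(exact, 4), round(approx, 4))

The Gumbel approximation already agrees to three decimals by t = 5.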

Chapter 7: The Central Limit Theorem

In the previous examples, the exact CDF was known for each finite n, and the limiting distribution was obtained directly from this sequence. One advantage of limiting distributions is that it often may be possible to determine the limiting distribution without knowing the exact form of the CDF for finite n. The limiting distribution then may provide a useful approximation when the exact probabilities are not available. One method of accomplishing this is to make use of MGFs. The following theorem (Theorem 7.3.1) is stated without proof.

Theorem 7.3.1 Let Y1, Y2, ... be a sequence of random variables with respective CDFs G1(y), G2(y), ... and MGFs M1(t), M2(t), .... If M(t) is the MGF of a CDF G(y), and if lim_{n→∞} M_n(t) = M(t) for all t in an open interval containing zero, −h < t < h, then lim_{n→∞} G_n(y) = G(y) for all continuity points of G(y).

The Bernoulli distribution

A discrete random variable X is said to have a Bernoulli distribution with parameter θ, denoted by X ∼ BIN(1, θ), if its probability function is given by

f_X(x; θ) = θ^x (1 − θ)^{1−x}, x = 0, 1, 0 < θ < 1.

Further, if X1, ..., Xn are independent Bernoulli random variables, X_i ∼ BIN(1, θ), i = 1, ..., n, then the sum Y = Σ_{i=1}^n X_i has a binomial distribution, Y ∼ BIN(n, θ), defined by the probability function

f_Y(y; n, θ) = C(n, y) θ^y (1 − θ)^{n−y}, y = 0, 1, ..., n, 0 < θ < 1,

where C(n, y) = n!/[y!(n − y)!] is the binomial coefficient. The MGF of Y is given by

M_Y(t) = [1 − θ + θ exp(t)]^n, −h < t < h, h > 0.
The Poisson distribution

A discrete random variable X is said to have the Poisson distribution with parameter µ > 0 if it has discrete pdf of the form

f_X(x; µ) = exp(−µ) µ^x / x!, x = 0, 1, ..., µ > 0.

A special notation that designates that a random variable X has the Poisson distribution with parameter µ is X ∼ POI(µ). The MGF of X is given by

M_X(t) = exp[µ(exp(t) − 1)], −h < t < h, h > 0.

It has been shown in Theorem 3.2.3 of the book that if X ∼ BIN(n, p), then for each value x = 0, 1, 2, ..., as n → ∞ and p → 0 with np = µ held constant,

lim_{n→∞} C(n, x) p^x (1 − p)^{n−x} = exp(−µ) µ^x / x!.

Example 7.3.1 of the book Let X1, ..., Xn be a random sample from a Bernoulli distribution, X_i ∼ BIN(1, p), i = 1, ..., n, and consider the sum Y_n = Σ_{i=1}^n X_i, which has a binomial distribution, Y_n ∼ BIN(n, p), with MGF

M_{Y_n}(t) = [1 − p + p exp(t)]^n, −h < t < h, h > 0.

If we let p → 0 as n → ∞ in such a way that np = µ, for fixed µ > 0, then

M_{Y_n}(t) = [1 − p + p exp(t)]^n
           = [1 − µ/n + (µ/n) exp(t)]^n
           = [1 + (µ/n)(exp(t) − 1)]^n.

Now, using the result

lim_{n→∞} (1 + c/n)^{nb} = exp(cb),

we have

lim_{n→∞} M_{Y_n}(t) = lim_{n→∞} [1 + (µ/n)(exp(t) − 1)]^n = exp[µ(exp(t) − 1)], −h < t < h, h > 0,

which is the MGF of a Poisson distribution with mean µ. Thus Y_n →d Y ∼ POI(µ).
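The convergence of the pmfs themselves can be inspected directly (a sketch assuming SciPy; µ = 3 and the range 0-15 are arbitrary choices):

    from scipy import stats

    mu = 3.0
    for n in [10, 100, 1000]:
        # largest pointwise gap between the BIN(n, mu/n) and POI(mu) pmfs
        err = max(abs(stats.binom.pmf(x, n, mu / n) - stats.poisson.pmf(x, mu))
                  for x in range(16))
        print(n, err)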
Example 7.3.2 of the book (Bernoulli Law of Large Numbers) Let X1, ..., Xn be a random sample from a Bernoulli distribution, X_i ∼ BIN(1, p), i = 1, ..., n, and consider the sequence of sample proportions W_n = Y_n/n = (1/n) Σ_{i=1}^n X_i. The MGF of W_n in this case is

M_{W_n}(t) = E[exp(tW_n)]
           = E[exp((t/n) Y_n)]
           = M_{Y_n}(t/n)
           = [1 − p + p exp(t/n)]^n, −h < t/n < h, h > 0.

Expanding exp(t/n) by the power series expansion of the exponential function,

exp(t/n) = 1 + (t/n)/1! + (t/n)²/2! + (t/n)³/3! + ...,

in the above expression, we get

M_{W_n}(t) = [1 − p + p(1 + (t/n)/1! + (t/n)²/2! + (t/n)³/3! + ...)]^n
           = [1 + pt/n + p(t²/(2n²) + t³/(6n³) + ...)]^n
           = [1 + pt/n + d(n)/n]^n,

where

d(n) = p(t²/(2n) + t³/(6n²) + ...).

It can easily be verified that for a fixed value of p, lim_{n→∞} d(n) = 0. Now, applying the result

lim_{n→∞} (1 + c/n + d(n)/n)^{nb} = exp(cb) if lim_{n→∞} d(n) = 0,

we obtain

lim_{n→∞} M_{W_n}(t) = lim_{n→∞} [1 + pt/n + d(n)/n]^n = exp(pt),

which is the MGF of a degenerate distribution with probability mass concentrated at p, and thus W_n converges stochastically to p as n approaches infinity.
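A short simulation illustrates this stochastic convergence (a sketch assuming NumPy; p = 0.3 and the tolerance 0.02 are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    p = 0.3
    for n in [10, 100, 10_000]:
        w = rng.binomial(n, p, size=100_000) / n     # W_n = Y_n / n
        # P[|W_n - p| > 0.02] shrinks toward 0 as n grows
        print(n, (np.abs(w - p) > 0.02).mean())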

Example 7.3.3 of the book Let X1, ..., Xn be a random sample from a Bernoulli distribution, X_i ∼ BIN(1, p), i = 1, ..., n, and consider the sequence of "standardized" variables

Z_n = (Σ_{i=1}^n X_i − np)/√(np(1 − p)) = (Y_n − np)/σ_n = Y_n/σ_n − np/σ_n,

where E(Y_n) = np and Var(Y_n) = np(1 − p) = σ_n². The MGF of Z_n in this case is

M_{Z_n}(t) = E[exp(tZ_n)]
           = E[exp(tY_n/σ_n − tnp/σ_n)]
           = exp(−npt/σ_n) E[exp((t/σ_n) Y_n)]
           = exp(−npt/σ_n) M_{Y_n}(t/σ_n)
           = {exp(−pt/σ_n) [1 − p + p exp(t/σ_n)]}^n, −h < t/σ_n < h, h > 0.

Expanding exp(−pt/σ_n) and exp(t/σ_n) by the power series expansion of the exponential function,

exp(−pt/σ_n) = 1 − (pt/σ_n)/1! + (pt/σ_n)²/2! − ...

and

exp(t/σ_n) = 1 + (t/σ_n)/1! + (t/σ_n)²/2! + ...,

in the above expression, we get

M_{Z_n}(t) = {[1 − pt/σ_n + p²t²/(2σ_n²) − ...][1 + pt/σ_n + pt²/(2σ_n²) + ...]}^n
           = [1 + (pt²/(2σ_n²) − p²t²/σ_n² + p²t²/(2σ_n²)) + (terms of order σ_n⁻³ and smaller)]^n.

Substituting σ_n² = np(1 − p) in the second-order terms gives

pt²/(2σ_n²) − p²t²/σ_n² + p²t²/(2σ_n²) = (p − p²)t²/(2σ_n²) = p(1 − p)t²/[2np(1 − p)] = t²/(2n),

while the terms of order σ_n⁻³ and smaller can be collected as d(n)/n; each of them carries at least one extra factor 1/σ_n = [np(1 − p)]^{−1/2}, so it can be checked that, for a fixed value of p, d(n) → 0 as n → ∞. Now M_{Z_n}(t) can be written as

M_{Z_n}(t) = [1 + t²/(2n) + d(n)/n]^n.
Finally, applying the result

lim_{n→∞} (1 + c/n + d(n)/n)^{nb} = exp(cb) if lim_{n→∞} d(n) = 0,

we obtain

lim_{n→∞} M_{Z_n}(t) = lim_{n→∞} [1 + t²/(2n) + d(n)/n]^n = exp(t²/2),

which is the MGF of the standard normal distribution, and so Z_n →d Z ∼ N(0, 1). This is an example of a special limiting result known as the Central Limit Theorem.
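The normal limit can be seen empirically (a sketch assuming NumPy and SciPy; p = 0.4 and the evaluation point z = 1 are arbitrary):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    p = 0.4
    for n in [10, 100, 1000]:
        y = rng.binomial(n, p, size=100_000)
        z = (y - n * p) / np.sqrt(n * p * (1 - p))   # standardized binomial
        # empirical P[Z_n <= 1] versus Phi(1) ~ 0.8413
        print(n, (z <= 1.0).mean(), stats.norm.cdf(1.0))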

Theorem 7.3.2 Let X1, X2, ..., Xn be a random sample from a distribution with mean µ and variance σ² < ∞. Then the limiting distribution of

Z_n = (Σ_{i=1}^n X_i − nµ)/(√n σ)

is the standard normal: Z_n →d Z ∼ N(0, 1).

Let m(t) denote the MGF of X − µ, m(t) = M_{X−µ}(t), and note that m(0) = 1, m′(0) = E(X − µ) = 0, and m″(0) = E(X − µ)² = σ². Expanding m(t) by the Taylor series formula about 0 gives, for some ξ between 0 and t,

m(t) = m(0) + m′(0)t + m″(ξ)t²/2
     = 1 + m″(ξ)t²/2
     = 1 + (m″(ξ) − σ² + σ²)t²/2
     = 1 + σ²t²/2 + (m″(ξ) − σ²)t²/2.
Now we may write

Z_n = (Σ_{i=1}^n X_i − nµ)/(√n σ) = Σ_{i=1}^n (X_i − µ)/(√n σ),

and

M_{Z_n}(t) = E[exp(tZ_n)]
           = E[exp(t Σ_{i=1}^n (X_i − µ)/(√n σ))]
           = E[Π_{i=1}^n exp(t(X_i − µ)/(√n σ))]
           = Π_{i=1}^n E[exp(t(X_i − µ)/(√n σ))]    (because X1, ..., Xn are independent)
           = {E[exp((t/(√n σ))(X − µ))]}^n          (because X1, ..., Xn are identically distributed)
           = [m(t/(√n σ))]^n
           = [1 + t²/(2n) + (m″(ξ) − σ²)t²/(2nσ²)]^n, 0 < |ξ| < |t|/(√n σ).

Observe that as n → ∞, |t|/(√n σ) → 0, therefore ξ → 0, and consequently m″(ξ) − σ² → 0. So we take

d(n) = (m″(ξ) − σ²)t²/(2σ²), with lim_{n→∞} d(n) = 0,

and re-write M_{Z_n}(t) as

M_{Z_n}(t) = [1 + t²/(2n) + d(n)/n]^n.
Finally, applying the result

lim_{n→∞} (1 + c/n + d(n)/n)^{nb} = exp(cb) if lim_{n→∞} d(n) = 0,

we obtain

lim_{n→∞} M_{Z_n}(t) = lim_{n→∞} [1 + t²/(2n) + d(n)/n]^n = exp(t²/2).

Or, equivalently,

lim_{n→∞} G_{Z_n}(z) = Φ(z),

which shows that Z_n →d Z ∼ N(0, 1).
The major application of the CLT is to provide an approximate distribution in cases
where the exact distribution is unknown or intractable.
Example 7.3.4 of the book Let X1, ..., Xn be a random sample from a uniform distribution, X_i ∼ UNIF(0, 1), i = 1, ..., n, and let Y_n = Σ_{i=1}^n X_i. Because E(X_i) = 1/2 and Var(X_i) = 1/12, we have the approximation

Y_n ∼ N(n/2, n/12), approximately.

For example, if n = 12, then approximately

Y_12 − 6 ∼ N(0, 1).

This approximation is so close that it often is used to simulate standard normal random
numbers in computer applications. Of course this requires 12 uniform random numbers
to be generated to obtain one random number from the standard normal distribution.
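A minimal sketch of this classical generator (assuming NumPy and SciPy; the seed and sample count are arbitrary):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    # Y_12 - 6, where Y_12 is a sum of 12 independent UNIF(0,1) variables
    z = rng.uniform(0.0, 1.0, size=(100_000, 12)).sum(axis=1) - 6.0
    print(z.mean(), z.std())          # approximately 0 and 1
    print(stats.kstest(z, "norm"))    # close to N(0,1), though not exact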

Chapter 7: Approximations For The Binomial Distribution

Let X1, ..., Xn be a random sample from a Bernoulli distribution, X_i ∼ BIN(1, p), i = 1, ..., n, and consider the sum Y_n = Σ_{i=1}^n X_i, which has a binomial distribution, Y_n ∼ BIN(n, p).

1. If we let p → 0 as n → ∞ in such a way that np = µ, for fixed µ > 0, then it has been shown that Y_n →d Y ∼ POI(µ). (Example 7.3.1 of the book)

2. If p is fixed, then W_n = Y_n/n = (1/n) Σ_{i=1}^n X_i converges stochastically to p as n approaches infinity. (Example 7.3.2 of the book)

3. For a fixed value of p, the sequence of "standardized" variables

Z_n = (Y_n − np)/√(np(1 − p))

converges in distribution to a standard normal variable. (Example 7.3.3 of the book)

Here we will concentrate on the last case, which states that for a fixed value of p a suitably standardized sequence of binomial random variables converges to a standard normal distribution, suggesting a normal approximation. In particular, it suggests that for large n and fixed p, approximately Y_n ∼ N(np, np(1 − p)). This approximation works best when p is close to 0.5, because the binomial distribution is symmetric when p = 0.5. The accuracy required in any approximation depends on the application. One guideline is to use the normal approximation when np > 5 and n(1 − p) > 5, but again this would depend on the accuracy required.
Example 7.4.1 of the book The probability that a basketball player hits a shot is p = 0.5. If he takes 20 shots, what is the probability that he hits at least nine? The exact probability is

P[Y_20 ≥ 9] = 1 − P[Y_20 ≤ 8]
            = 1 − Σ_{y=0}^{8} C(20, y) (0.5)^y (1 − 0.5)^{20−y}
            = 1 − (0.5)^{20} Σ_{y=0}^{8} C(20, y)
            = 0.7483.

A normal approximation is

P[Y_20 ≥ 9] = 1 − P[Y_20 ≤ 8]
            = 1 − P[(Y_20 − np)/√(np(1 − p)) ≤ (8 − 10)/√((20)(0.5)(0.5))]   (n = 20, p = 0.5)
            = 1 − P[(Y_20 − np)/√(np(1 − p)) ≤ −2/√5].

Now,

(Y_20 − np)/√(np(1 − p)) ∼ N(0, 1), approximately,

and therefore

P[Y_20 ≥ 9] ≈ 1 − Φ(−2/√5)
            = 1 − [1 − Φ(2/√5)]
            = Φ(2/√5)
            = Φ(0.894427)
            = 0.8133.
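Both numbers are easy to verify (a minimal sketch assuming SciPy):

    import numpy as np
    from scipy import stats

    n, p = 20, 0.5
    exact = 1 - stats.binom.cdf(8, n, p)        # P[Y_20 >= 9] = 0.7483
    z = (8 - n * p) / np.sqrt(n * p * (1 - p))  # -2/sqrt(5)
    approx = 1 - stats.norm.cdf(z)              # Phi(2/sqrt(5)) ~ 0.81
    print(exact, approx)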
Because the binomial distribution is discrete and the normal distribution is continuous, the approximation can be improved by making a continuity correction. In particular, each binomial probability b(y; n, p) has the same value as the area of a rectangle of height b(y; n, p) and with the interval [y − 0.5, y + 0.5] as its base, because the length of the base is one unit. The area of this rectangle can be approximated by the area under the pdf of Y ∼ N(np, np(1 − p)), which corresponds to fitting a normal distribution with the same mean and variance as Y_n ∼ BIN(n, p). This is illustrated for the case of n = 20, p = 0.5, and y = 7 in Figure 3, where the exact probability is b(7; 20, 0.5) = C(20, 7)(0.5)^7(0.5)^13 = 0.0739. The approximation, which is the shaded area in Figure 3, is

P[6.5 ≤ Y ≤ 7.5] = P[(6.5 − 10)/√((20)(0.5)(0.5)) ≤ (Y − np)/√(np(1 − p)) ≤ (7.5 − 10)/√((20)(0.5)(0.5))]
                 = P[−1.56525 ≤ Z ≤ −1.11803]
                 = Φ(−1.11803) − Φ(−1.56525)
                 = 0.0732,

where Y ∼ N(np, np(1 − p)) and Z = (Y − np)/√(np(1 − p)) ∼ N(0, 1) with n = 20 and p = 0.5.

Figure 3: Continuity correction for normal approximation of a binomial probability

The same idea can be used with other binomial probabilities, such as

P[Y_20 ≥ 9] = 1 − P[Y_20 ≤ 8],

where we approximate P[Y_20 ≤ 8] as

P[Y_20 ≤ 8] ≈ P[Y ≤ 8.5]
            = P[(Y − np)/√(np(1 − p)) ≤ (8.5 − 10)/√((20)(0.5)(0.5))]
            = P[Z ≤ −0.67082]
            = Φ(−0.67082),

and

P[Y_20 ≥ 9] ≈ 1 − P[Y ≤ 8.5] = 1 − Φ(−0.67082) = 0.7486,

which is much closer to the exact value than without the continuity correction. The situation is shown in Figure 4.
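The corrected approximation can be checked the same way (a minimal sketch assuming SciPy):

    import numpy as np
    from scipy import stats

    n, p = 20, 0.5
    sd = np.sqrt(n * p * (1 - p))
    exact = 1 - stats.binom.cdf(8, n, p)                 # 0.7483
    corrected = 1 - stats.norm.cdf((8.5 - n * p) / sd)   # 1 - Phi(-0.67082)
    print(exact, corrected)  # the corrected value is much closer to the exact one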
Example 7.4.2 of the book Suppose that Y_n ∼ POI(n), where n is a positive integer. From the reproductive property of the Poisson distribution, we know that Y_n has the same distribution as a sum Σ_{i=1}^n X_i, where X1, ..., Xn are independent, X_i ∼ POI(1). According to the CLT,

Z_n = (Y_n − n)/√n →d Z ∼ N(0, 1),

which suggests the approximation Y_n ∼ N(n, n), approximately, for large n. For example, with n = 20, suppose we desire to find P[10 ≤ Y_20 ≤ 30]. The exact value is

P[10 ≤ Y_20 ≤ 30] = Σ_{y=10}^{30} exp(−20)(20)^y / y! = 0.982,
Figure 4: The normal approximation for a binomial distribution

and the approximate value is

P[9.5 ≤ Y ≤ 30.5] = P[(9.5 − 20)/√20 ≤ (Y − n)/√n ≤ (30.5 − 20)/√20]   (with (Y − n)/√n ∼ N(0, 1) approximately)
                  = P[−2.34787 ≤ Z ≤ 2.34787]
                  = Φ(2.34787) − Φ(−2.34787)
                  = 2Φ(2.34787) − 1
                  = 2(0.991) − 1
                  = 0.982.
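Again, both values are easy to reproduce (a minimal sketch assuming SciPy):

    import numpy as np
    from scipy import stats

    n = 20
    exact = stats.poisson.cdf(30, n) - stats.poisson.cdf(9, n)  # P[10 <= Y <= 30]
    lo, hi = (9.5 - n) / np.sqrt(n), (30.5 - n) / np.sqrt(n)
    approx = stats.norm.cdf(hi) - stats.norm.cdf(lo)            # continuity-corrected
    print(exact, approx)    # both approximately 0.982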

To compute Φ, you can use an online normal calculator (https://stattrek.com/online-calculator/normal.aspx).
