
Economics 241B

Review of Limit Theorems for Sequences of Random Variables

Convergence in Distribution
The previous definitions of convergence focus on the outcome sequences of a random variable. Convergence in distribution refers directly to the probability distribution of each element of the sequence.

Definition. A sequence of random variables $Y_1, Y_2, \ldots$ is said to converge in distribution to a random variable $Y$ if
$$\lim_{n \to \infty} P(Y_n < c) = P(Y < c)$$
at all $c$ at which $F_Y$ is continuous.


We express this as $Y_n \xrightarrow{D} Y$. An equivalent definition is

Definition. A sequence of random variables $Y_1, Y_2, \ldots$ is said to converge in distribution to $F_Y$ if
$$\lim_{n \to \infty} F_{Y_n}(c) = F_Y(c)$$
at all $c$ at which $F_Y$ is continuous.

We express this as $Y_n \xrightarrow{D} F_Y$. The distribution $F_Y$ is the asymptotic (or limiting) distribution of $Y_n$. Convergence in distribution is simply the pointwise convergence of $F_{Y_n}$ to $F_Y$. (The requirement that $F_Y$ be continuous at all $c$ will hold for all applications in this course, with the exception of binary dependent variables.) In most cases $F_Y$ is a Gaussian distribution, for which we write
$$Y_n \xrightarrow{D} N\left(\mu, \sigma^2\right).$$
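The pointwise convergence of $F_{Y_n}$ to $F_Y$ can be checked by simulation. The sketch below is a minimal illustration, not part of the notes: the Exponential(1) population, the grid of $c$ values, and the sample sizes are all assumptions made for the example. It compares the empirical c.d.f. of the standardized sample mean with the $N(0,1)$ c.d.f. at a few points $c$.

```python
# A minimal sketch: empirical F_{Y_n}(c) versus the N(0,1) c.d.f. F_Y(c),
# where Y_n = sqrt(n) * (sample mean of n Exponential(1) draws - 1).
# The population, the grid of c values, and the sample sizes are
# illustrative choices.
import numpy as np
from math import erf, sqrt

def phi(c):
    """Standard normal c.d.f."""
    return 0.5 * (1.0 + erf(c / sqrt(2.0)))

rng = np.random.default_rng(0)
reps = 20_000
c_grid = [-1.5, 0.0, 1.5]

for n in [5, 50, 500]:
    draws = rng.exponential(1.0, size=(reps, n))
    y_n = np.sqrt(n) * (draws.mean(axis=1) - 1.0)      # mean 1, variance 1
    F_yn = [round((y_n < c).mean(), 3) for c in c_grid]  # empirical F_{Y_n}(c)
    F_y = [round(phi(c), 3) for c in c_grid]             # limit F_Y(c)
    print(n, F_yn, F_y)
```

As $n$ grows, the empirical probabilities line up with the Gaussian c.d.f. values at each point of the grid.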
Convergence in distribution is closely related to convergence in probability, if the limit quantity is changed. In the discussion above, a sequence of random variables was shown to converge (in probability) to a constant. One can also establish that a sequence of random variables converges in probability to a random variable. Recall, for convergence (i.p.) to a constant, the probability that $Y_n$ is in an $\varepsilon$-neighborhood of $\alpha$ must be high, $P(|Y_n - \alpha| < \varepsilon) > 1 - \delta$. That is
$$P\left(\omega : |Y_n(\omega) - \alpha| < \varepsilon\right) > 1 - \delta.$$
For convergence to a random variable $Y$, we need
$$P\left(\omega : |Y_n(\omega) - Y(\omega)| < \varepsilon\right) > 1 - \delta, \qquad (1)$$
that is, for large $n$ we want a high probability that the histogram for $Y_n$ is close to the histogram for $Y$. There is no natural measure of distance here, although we think of $\varepsilon$ as defining the histogram bin width. If the two histograms are arbitrarily close as $n$ increases, then we write
$$Y_n \xrightarrow{P} Y.$$
Because the two histograms are arbitrarily close, it should not be surprising that
$$Y_n \xrightarrow{P} Y \;\Rightarrow\; Y_n \xrightarrow{D} F_Y \quad \left(\text{sometimes written } Y_n \xrightarrow{D} Y\right).$$
Why doesn't the reverse hold? Because of the basic workings of probability spaces, about which we rarely concern ourselves. Because both $Y_n$ and $Y$ are indexed by the same $\omega$ in (1), both quantities are defined on the same probability space. No such requirement is made in establishing convergence in distribution; we simply discuss a sequence of distribution functions. Not only could $Y_n$ and $Y$ be defined on different probability spaces, but for each $n$, $Y_n$ could be defined on a different probability space. Yet the two definitions are nearly equivalent. The Skorokhod construction establishes that if $Y_n \xrightarrow{D} Y$, then it is always possible to construct a sequence $Y_n'$ and a random variable $Y'$ that are defined on the same probability space and have the same distributions as $Y_n$ and $Y$, so that $Y_n' \xrightarrow{P} Y'$.
The extension to a sequence of random vectors is immediate: $Y_n \xrightarrow{D} Y$ if the joint c.d.f. $F_n$ of $Y_n$ converges to the joint c.d.f. $F$ of $Y$ at every continuity point of $F$. Note, however, that unlike other concepts of convergence, for convergence in distribution, element-by-element convergence does not necessarily mean convergence for the vector sequence. That is, "each element of $Y_n$ $\xrightarrow{D}$ the corresponding element of $Y$" does not imply "$Y_n \xrightarrow{D} Y$". A common way to connect scalar convergence and vector convergence in distribution is

Multivariate Convergence in Distribution Theorem: Let $Y_n$ be a sequence of $K$-dimensional random vectors. Then
$$Y_n \xrightarrow{D} Y \;\Longleftrightarrow\; \lambda' Y_n \xrightarrow{D} \lambda' Y$$
for any $K$-dimensional vector of real numbers $\lambda$.
Convergence in Distribution vs. Convergence in Moments
The moments of the limit distribution of $Y_n$ are not necessarily equal to the limit of the moments of $Y_n$. Thus $Y_n \xrightarrow{D} Y$ does not imply that $\lim_{n \to \infty} E(Y_n) = E(Y)$. However

Lemma (convergence in distribution and moments): Let $\alpha_{sn}$ be the $s$-th moment of $Y_n$ and $\lim_{n \to \infty} \alpha_{sn} = \alpha_s$, where $\alpha_s$ is finite (i.e., a real number). Then
$$Y_n \xrightarrow{D} Y \;\Rightarrow\; \alpha_s \text{ is the } s\text{-th moment of } Y.$$
Thus, for example, if the variance of a sequence of random variables (converging in distribution) converges to some finite number, then that number is the variance of the limiting distribution.
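The first claim, that convergence in distribution does not by itself deliver convergence of moments, can be seen in a small simulation. The sketch below uses a contaminated-normal sequence chosen purely for illustration: $Y_n \xrightarrow{D} N(0,1)$, yet $E(Y_n) = n$ diverges, so the lemma's hypothesis of a finite limit of the moments fails.

```python
# A minimal sketch: Y_n equals a N(0,1) draw with probability 1 - 1/n and
# equals n^2 with probability 1/n.  Then Y_n converges in distribution to
# N(0,1), but E(Y_n) = n diverges, so the moments do not converge to the
# moments of the limit.  The mixture is an illustrative choice.
import numpy as np

rng = np.random.default_rng(0)
reps = 1_000_000

for n in [10, 100, 1_000]:
    z = rng.standard_normal(reps)
    contaminate = rng.random(reps) < 1.0 / n
    y_n = np.where(contaminate, float(n) ** 2, z)
    print(n,
          "E(Y_n) ~", round(y_n.mean(), 2),            # grows like n
          "P(Y_n < 0) ~", round((y_n < 0).mean(), 3))  # approaches 0.5
```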
Relation among Modes of Convergence
As noted above, some modes of convergence are weaker than others.
Lemma (relation among the four modes of convergence):
(a) $Y_n \xrightarrow{m.s.} \alpha \;\Rightarrow\; Y_n \xrightarrow{p} \alpha$. So $Y_n \xrightarrow{m.s.} Y \;\Rightarrow\; Y_n \xrightarrow{p} Y$.
(b) $Y_n \xrightarrow{a.s.} \alpha \;\Rightarrow\; Y_n \xrightarrow{p} \alpha$. So $Y_n \xrightarrow{a.s.} Y \;\Rightarrow\; Y_n \xrightarrow{p} Y$.
(c) $Y_n \xrightarrow{p} \alpha \;\Longleftrightarrow\; Y_n \xrightarrow{d} \alpha$.
Part (c) states that if the limiting random variable is a constant (a trivial random
variable) then convergence in distribution is the same as convergence in probabil-
ity.
Three Useful Results
The great use of asymptotic analysis is that we can easily derive the convergence properties of ratios, while we cannot easily derive the expectation of ratios (and so cannot easily do finite-sample analysis). These results, which highlight such features, are essential for large-sample theory.

Lemma (continuous mapping theorem): Suppose that $g(\cdot)$ is a continuous function that does not depend on $n$. Then:
$$Y_n \xrightarrow{P} \alpha \;\Rightarrow\; g(Y_n) \xrightarrow{P} g(\alpha)$$
$$Y_n \xrightarrow{D} Y \;\Rightarrow\; g(Y_n) \xrightarrow{D} g(Y).$$
The basic idea is that because $g(\cdot)$ is continuous, $g(Y_n)$ will be close to $g(\alpha)$ provided that $Y_n$ is close to $\alpha$. Note that $g(Y_n) = n^{1/2} Y_n$ depends on $n$ and is not covered by this proposition. An immediate implication is that the usual arithmetic operations preserve convergence in probability. For example
$$Y_n \xrightarrow{P} \alpha,\; X_n \xrightarrow{P} \beta \;\Rightarrow\; Y_n + X_n \xrightarrow{P} \alpha + \beta$$
$$Y_n \xrightarrow{P} \alpha,\; X_n \xrightarrow{P} \beta \;\Rightarrow\; Y_n X_n \xrightarrow{P} \alpha \beta$$
$$Y_n \xrightarrow{P} \alpha,\; X_n \xrightarrow{P} \beta \;\Rightarrow\; Y_n / X_n \xrightarrow{P} \alpha / \beta \quad \text{provided that } \beta \neq 0$$
$$Y_n \xrightarrow{P} \alpha \;\Rightarrow\; Y_n^{-1} \xrightarrow{P} \alpha^{-1} \quad \text{provided that } \alpha \text{ is invertible (here } Y_n \text{ is a sequence of random matrices).}$$
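These preservation results can be checked with a small simulation. The sketch below is illustrative only: the uniform and exponential populations (so $\alpha = 0.5$ and $\beta = 1$) and the choice $g(y) = e^y$ are assumptions made for the example, and it tracks a single realization of each sample mean as $n$ grows.

```python
# A minimal sketch of the continuous mapping theorem for convergence in
# probability: sample means converge to population means, so sums, ratios,
# and continuous transformations of them converge as well.
import numpy as np

rng = np.random.default_rng(0)

for n in [10, 1_000, 100_000]:
    y_bar = rng.uniform(0, 1, n).mean()        # -> alpha = 0.5
    x_bar = rng.exponential(1.0, n).mean()     # -> beta  = 1.0
    print(n,
          "sum:", round(y_bar + x_bar, 4),     # -> 1.5
          "ratio:", round(y_bar / x_bar, 4),   # -> 0.5
          "exp(Y_n):", round(np.exp(y_bar), 4))  # -> exp(0.5) ~ 1.6487
```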

The next result combines convergence in probability and distribution, and is used repeatedly to derive the limit theory for our estimators.

Lemma (parts a and c are called Slutsky's Theorem):
(a) $Y_n \xrightarrow{P} \alpha$ and $X_n \xrightarrow{D} X \;\Rightarrow\; Y_n + X_n \xrightarrow{D} X + \alpha$
(b) $Y_n \xrightarrow{P} 0$ and $X_n \xrightarrow{D} X \;\Rightarrow\; Y_n' X_n \xrightarrow{P} 0$
(c) $A_n \xrightarrow{P} A$ and $X_n \xrightarrow{D} X \;\Rightarrow\; A_n X_n \xrightarrow{D} A X$
(d) $A_n \xrightarrow{P} A$ and $X_n \xrightarrow{D} X \;\Rightarrow\; X_n' A_n^{-1} X_n \xrightarrow{D} X' A^{-1} X$
Parts (c)-(d) require that $X$ and $A$ be conformable and (for part (d)) that $A$ be invertible.
In particular, if $X \sim N(0, V)$, then $Y_n + X_n \xrightarrow{D} N(\alpha, V)$ and $A_n X_n \xrightarrow{D} N(0, A V A')$. Also, if we set $\alpha = 0$, then part (a) implies
$$Y_n \xrightarrow{P} 0 \text{ and } X_n \xrightarrow{D} X \;\Rightarrow\; Y_n + X_n \xrightarrow{D} X.$$
Let $Z_n = Y_n + X_n$, so $Y_n \xrightarrow{P} 0$ (i.e., $Z_n - X_n \xrightarrow{p} 0$) implies that the asymptotic distribution of $Z_n$ is the same as that of $X_n$. When $Z_n - X_n \xrightarrow{p} 0$ we sometimes say that the two sequences are asymptotically equivalent ($Z_n \overset{a}{\sim} X_n$) and write
$$Z_n = X_n + o_P,$$
where $o_P$ is some suitable random variable ($Y_n$ here) that converges to zero in probability.
We employ the lemma in a standard trick to derive the limit distribution of a sequence of random variables. Suppose we wish to find the limit distribution of $Y_n' X_n$, where we know $X_n \xrightarrow{D} X$ and $Y_n \xrightarrow{P} \alpha$. Then we replace $Y_n$ with $\alpha$, and determine the limit distribution of $\alpha' X_n$. Why does this work? Because $Y_n' X_n \overset{a}{\sim} \alpha' X_n$, which is more easily seen in the equivalent statement
$$Y_n' X_n = \alpha' X_n + o_P,$$
where $o_P$ here is $Y_n' X_n - \alpha' X_n = (Y_n - \alpha)' X_n$, which converges to zero in probability by part (b).
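A simulation makes the trick concrete. The sketch below is illustrative only: the populations, the sample size, and the choices $\alpha = (0.5, 1)'$ and $\Sigma$ are assumptions made for the example. $Y_n$ is a 2-vector of sample means with probability limit $\alpha$, and $X_n = \sqrt{n}\,\bar{Z}_n$ for an independent $N(0, \Sigma)$ sample, so $X_n \xrightarrow{D} X \sim N(0, \Sigma)$; the Monte Carlo variances of $Y_n' X_n$ and $\alpha' X_n$ both approach $\alpha' \Sigma \alpha$.

```python
# A minimal sketch of the replacement trick: Y_n' X_n and alpha' X_n share
# the same limit distribution when Y_n ->P alpha and X_n ->D X ~ N(0, Sigma).
# All population choices below are illustrative.
import numpy as np

rng = np.random.default_rng(0)
reps, n = 10_000, 500
alpha = np.array([0.5, 1.0])                    # plim of Y_n
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])      # variance of the limit X
L = np.linalg.cholesky(Sigma)

vals_trick, vals_exact = [], []
for _ in range(reps):
    y_n = np.array([rng.uniform(0, 1, n).mean(),       # -> 0.5
                    rng.exponential(1.0, n).mean()])    # -> 1.0
    z = rng.standard_normal((n, 2)) @ L.T               # N(0, Sigma) sample
    x_n = np.sqrt(n) * z.mean(axis=0)                   # -> N(0, Sigma)
    vals_trick.append(y_n @ x_n)                        # Y_n' X_n
    vals_exact.append(alpha @ x_n)                      # alpha' X_n

print("Var(Y_n' X_n):     ", round(np.var(vals_trick), 3))
print("Var(alpha' X_n):   ", round(np.var(vals_exact), 3))
print("alpha' Sigma alpha:", round(alpha @ Sigma @ alpha, 3))
```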
To test nonlinear hypotheses, given the asymptotic distribution of the estima-
tor, we need
Lemma (delta method): Let $\{B_n\}_{n \geq 1}$ be a sequence of $K$-dimensional random variables such that
$$B_n \xrightarrow{P} \beta$$
$$n^{1/2}\left(B_n - \beta\right) \xrightarrow{D} N(0, V).$$
Let $g(\cdot): \mathbb{R}^K \to \mathbb{R}^Q$ have continuous first derivatives, with $G(\beta)$ denoting the $Q \times K$ matrix of first derivatives evaluated at $\beta$:
$$G(\beta) = \frac{\partial g(\beta)}{\partial \beta'}.$$
Then
$$n^{1/2}\left(g(B_n) - g(\beta)\right) \xrightarrow{D} N\left(0,\; G(\beta)\, V\, G(\beta)'\right).$$
Proof: A mean-value expansion yields $n^{1/2} g(B_n) = n^{1/2} g(\beta) + n^{1/2} G(\bar{B}_n)(B_n - \beta)$, where $\bar{B}_n$ is the mean value between $B_n$ and $\beta$. Because $B_n \xrightarrow{P} \beta$, $\bar{B}_n \xrightarrow{P} \beta$ and, by continuity, $G(\bar{B}_n) \xrightarrow{P} G(\beta)$. Thus
$$n^{1/2}\left[g(B_n) - g(\beta)\right] = n^{1/2} G(\bar{B}_n)\left(B_n - \beta\right) \xrightarrow{D} G(\beta)\, N(0, V).$$
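A one-dimensional simulation illustrates the lemma. In the sketch below, which is illustrative only, the Exponential(1) population and the transformation $g(b) = b^2$ are assumptions made for the example: $B_n$ is the mean of $n$ i.i.d. Exponential(1) draws, so $\beta = 1$, $V = 1$, $G(\beta) = 2\beta = 2$, and the delta method predicts a limiting variance of $G(\beta)\, V\, G(\beta)' = 4$.

```python
# A minimal sketch of the delta method with K = Q = 1: B_n is the mean of n
# i.i.d. Exponential(1) draws (beta = 1, V = 1) and g(b) = b^2, so the
# predicted limiting variance is G V G' = 2 * 1 * 2 = 4.
import numpy as np

rng = np.random.default_rng(0)
reps, n = 20_000, 400
beta, V = 1.0, 1.0
G = 2.0 * beta

B_n = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
stat = np.sqrt(n) * (B_n ** 2 - beta ** 2)     # sqrt(n) * (g(B_n) - g(beta))
print("simulated variance:          ", round(stat.var(), 3))
print("delta-method variance G V G':", G * V * G)
```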

Viewing Estimators as Sequences of Random Variables


Let $B_n$ be an estimator of $\beta$. If $B_n \xrightarrow{P} \beta$, then $B_n$ is consistent for $\beta$.

The asymptotic bias of an estimator is not an agreed-upon concept. The most common definition is
$$\lim_{n \to \infty} E(B_n) - \beta,$$
so consistency and asymptotic unbiasedness are distinct concepts. Hayashi uses
$$\operatorname{plim}_{n \to \infty} B_n - \beta,$$
so consistency and asymptotic unbiasedness are identical for him.

If, in addition, $n^{1/2}(B_n - \beta) \xrightarrow{D} N(0, V)$, then $B_n$ is asymptotically Gaussian (and $n^{1/2}$-consistent, hence CAN). The asymptotic variance is $V$.
Laws of Large Numbers and Central Limit Theorems
To verify convergence in probability from basic principles, we would need to know the joint distribution of the first $n$ elements in the sequence. To verify convergence almost surely, we would need to know the joint distribution of the entire (infinite-dimensional) sequence. To verify convergence in quadratic mean, we would need to know the first two moments of the finite-sample distribution of the estimator. In general, we do not know any of these distributions. Instead, we turn to laws of large numbers. Laws of large numbers are one method of establishing convergence in probability (or almost surely) to a constant. They are applied to sequences of random variables for which each member of the sequence is an (increasing) sum of random variables, such as the sample average $\bar{Y}_n = n^{-1} \sum_{t=1}^{n} Y_t$. An LLN is strong if the convergence is almost sure and weak if the convergence is in probability.

Chebychev's Weak LLN:
$$\left[\lim_{n \to \infty} E\left(\bar{Y}_n\right) = \mu, \;\; \lim_{n \to \infty} \operatorname{Var}\left(\bar{Y}_n\right) = 0\right] \;\Rightarrow\; \bar{Y}_n \xrightarrow{p} \mu.$$
The following strong LLN assumes that $\{Y_t\}$ is i.i.d., but the variance does not need to be finite.

Kolmogorov's Second Strong LLN: Let $\{Y_t\}$ be i.i.d. with $E Y_t = \mu$. Then $\bar{Y}_n \xrightarrow{a.s.} \mu$.
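The weak LLN can be visualized directly. The sketch below is illustrative only: the Exponential(1) population (so $\mu = 1$) and the tolerance $\varepsilon = 0.05$ are assumptions made for the example. It estimates $P(|\bar{Y}_n - \mu| < \varepsilon)$ by Monte Carlo and shows it approaching one as $n$ grows.

```python
# A minimal sketch of the weak LLN: the sample mean of n i.i.d.
# Exponential(1) draws (mu = 1) falls inside an eps-neighborhood of mu
# with probability approaching one as n grows.
import numpy as np

rng = np.random.default_rng(0)
reps, eps, mu = 1_000, 0.05, 1.0

for n in [10, 100, 1_000, 10_000]:
    y_bar = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
    coverage = (np.abs(y_bar - mu) < eps).mean()
    print(n, "P(|Y_bar_n - mu| < eps) ~", round(coverage, 3))
```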
Central limit theorems are a principal method of establishing convergence in distribution, as they govern the limit behavior of the difference between $\bar{Y}_n$ and $E(\bar{Y}_n)$ (which equals $E Y_t$ if $\{Y_t\}$ is i.i.d.) blown up by $\sqrt{n}$. For i.i.d. sequences, the only CLT we need is

Lindeberg-Levy CLT: Let $\{Y_t\}$ be i.i.d. with $E Y_t = \mu$ and $\operatorname{Var}(Y_t) = \Sigma$. Then
$$\sqrt{n}\left(\bar{Y}_n - \mu\right) = \frac{1}{\sqrt{n}} \sum_{t=1}^{n} \left(Y_t - \mu\right) \xrightarrow{d} N(0, \Sigma).$$
We read the above CLT as: the sequence of random vectors $\sqrt{n}\left(\bar{Y}_n - \mu\right)$ converges in distribution to a random vector whose distribution is $N(0, \Sigma)$. To relate the above CLT for random vectors to the CLT for random scalars, let $\lambda$ be any vector of real numbers with the same dimension as $Y_t$. Now $\{\lambda' Y_t\}$ is a sequence of random variables with $E(\lambda' Y_t) = \lambda' \mu$ and $\operatorname{Var}(\lambda' Y_t) = \lambda' \Sigma \lambda$. The scalar version of Lindeberg-Levy implies
$$\sqrt{n}\left(\lambda' \bar{Y}_n - \lambda' \mu\right) = \lambda' \sqrt{n}\left(\bar{Y}_n - \mu\right) \xrightarrow{d} N\left(0, \lambda' \Sigma \lambda\right).$$
Yet this limit distribution is the distribution of $\lambda' Z$, where $Z \sim N(0, \Sigma)$, hence the multivariate convergence in distribution theorem establishes the multivariate Lindeberg-Levy CLT.
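This Cramér-Wold argument can be checked numerically. The sketch below is illustrative only: the two-component population and the vector $\lambda$ are assumptions made for the example. $Y_t$ has independent Exponential(1) and Uniform(0,1) components, so $\mu = (1, 0.5)'$ and $\Sigma = \operatorname{diag}(1, 1/12)$, and the simulated variance of $\lambda' \sqrt{n}(\bar{Y}_n - \mu)$ approaches $\lambda' \Sigma \lambda$.

```python
# A minimal sketch of the multivariate Lindeberg-Levy CLT via the
# Cramer-Wold device: for a fixed lambda, lambda' sqrt(n) (Y_bar_n - mu)
# is approximately N(0, lambda' Sigma lambda) for large n.
import numpy as np

rng = np.random.default_rng(0)
reps, n = 5_000, 500
mu = np.array([1.0, 0.5])
Sigma = np.diag([1.0, 1.0 / 12.0])
lam = np.array([1.0, -2.0])

y = np.stack([rng.exponential(1.0, size=(reps, n)),    # component 1: Exp(1)
              rng.uniform(0.0, 1.0, size=(reps, n))],  # component 2: U(0,1)
             axis=2)                                    # shape (reps, n, 2)
stat = np.sqrt(n) * (y.mean(axis=1) - mu) @ lam         # lambda' sqrt(n)(Y_bar - mu)
print("simulated variance:  ", round(stat.var(), 4))
print("lambda' Sigma lambda:", round(lam @ Sigma @ lam, 4))
```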
