
ASYMPTOTIC THEORY

Convergence of Deterministic Sequences

A sequence of non-random numbers {aN : N = 1, 2, · · · } converges to its limit a iff ∀ε > 0, ∃Nε such that ∀N > Nε, |aN − a| < ε.

Written as: aN → a as N → ∞.

E.g. If aN = 2 + 1/N, then lim_{N→∞} aN = 2. Thus aN → 2 as N → ∞.
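As a quick numerical check of this ε-Nε definition (a minimal sketch in Python; the threshold Nε = ⌈1/ε⌉ follows from |aN − 2| = 1/N):

```python
import math

# For a_N = 2 + 1/N we have |a_N - 2| = 1/N, which is < eps
# whenever N > 1/eps, so N_eps = ceil(1/eps) is a valid threshold.
for eps in [0.1, 0.01, 0.001]:
    N_eps = math.ceil(1 / eps)
    a_N = 2 + 1 / (N_eps + 1)              # first term beyond the threshold
    print(eps, N_eps, abs(a_N - 2) < eps)  # prints True each time
```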

A sequence {aN : N = 1, 2, · · · } is bounded iff ∃b < ∞ such that |aN| ≤ b ∀N = 1, 2, · · · . Otherwise {aN} is unbounded.

E.g. If aN = (−1)^N, then {aN} does not have a limit (is not convergent), but is bounded by b = 1.
If aN = N^(1/4), {aN} is not bounded. In fact aN → ∞ as N → ∞.

A sequence {aN} is O(N^λ) (i.e., at most of order N^λ) iff {N^(−λ) aN} is bounded.
As a special case, when λ = 0, {aN} is bounded, and then {aN} is O(1).

E.g., if aN = 10 + √N, then {aN} = O(N^(1/2)), as {N^(−1/2) aN} = {10/√N + 1} is bounded by 11.

A sequence {aN} is o(N^λ) iff N^(−λ) aN → 0 as N → ∞.

As a special case, when λ = 0, aN → 0 as N → ∞, and then {aN} is o(1).
E.g. if aN = log N, then {aN} = o(N^λ) ∀λ > 1 (indeed for any λ > 0), since N^(−λ) log N = (log N)/N^λ.

The numerator increases at a decreasing rate while the denominator increases at an increasing rate as N → ∞. So the ratio decreases as N → ∞ and converges to 0 in the limit.
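A quick numerical illustration (a sketch in Python with numpy) of the ratio (log N)/N^λ shrinking toward 0, here with λ = 1/2:

```python
import numpy as np

# (log N)/N^lambda -> 0 as N grows, illustrating log N = o(N^lambda).
lam = 0.5
for N in [10, 10**3, 10**6, 10**9]:
    print(N, np.log(N) / N**lam)
# prints 0.728..., 0.218..., 0.0138..., 0.000655..., decreasing toward 0
```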
Convergence in Probability

Random sequences: Perhaps the best example of a sequence of random variables is the sequence of a particular sample statistic (say, x̄ = (1/n) Σ_{i=1}^n xi) computed for different sample sizes.

A sequence of random variables {xN : N = 1, 2, · · · } converges in probability to the constant a iff ∀ε > 0 and ∀η > 0, ∃Nε,η such that P[|xN − a| > ε] < η whenever N > Nε,η.
It means that the probability that a term sufficiently high up in the sequence deviates from the constant “a” by any finite magnitude is infinitesimally small.

Written as: P[|xN − a| > ε] → 0 as N → ∞, or
xN →p a (xN converges in probability to a), or
plim xN = a (a is the probability limit of xN).

As a special case, when a = 0, {xN} is op(1), i.e., xN →p 0.
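A Monte Carlo sketch (Python with numpy; the exponential distribution with mean 1 is an arbitrary illustrative choice) showing P[|x̄N − µ| > ε] falling toward 0 as N grows:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, eps, reps = 1.0, 0.1, 1000

# Estimate P[|xbar_N - mu| > eps] by simulation for increasing N.
for N in [10, 100, 1000, 10000]:
    xbar = rng.exponential(mu, size=(reps, N)).mean(axis=1)
    print(N, np.mean(np.abs(xbar - mu) > eps))
# The estimated probability shrinks toward 0: xbar_N ->p mu.
```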

A sequence of random variables {xN} is (eventually) bounded in probability iff ∀ε > 0, ∃bε and an integer Nε such that P[|xN| ≥ bε] < ε ∀N > Nε.
It means that the probability that a term sufficiently high up in the sequence lies beyond the range [−bε, bε] can be made arbitrarily small.

When we say a sequence is bounded in probability, we shall mean eventual boundedness.

We say {xN} is Op(1) when {xN} is bounded in probability.


The way to visualize a sequence of random variables is as a sequence of probability densities, one associated with each term.
The notion of convergence in probability then states that the densities associated with terms higher and higher up in the sequence are more and more concentrated around the constant “a” in the domain of x. In the limit, the density is degenerate at “a”.
(Eventual) boundedness in probability suggests that eventually (i.e. except for a finite number of terms in the sequence), for any pre-specified tail-probability the sequence will be contained within the range [−b, b].
The tail-probability is the probability mass lying outside [−b, b] (the shaded tail regions of the density). It can be made as small as desired (indistinguishable from 0 in the limit) and yet we can obtain the bounds.

A sequence that is unbounded in probability means that the probability of an extreme event (far out in the tails) does not vanish, however high up the term may be in the sequence.

The above notions of op(1) and Op(1) may be generalized in tandem with the deterministic cases.

A random sequence {xN : N = 1, 2, · · · } is op(N^δ) for δ ∈ ℝ iff N^(−δ) xN →p 0, i.e., ∀ε > 0 and ∀η > 0, P[|N^(−δ) xN − 0| > ε] < η whenever N > Nε,η.
As a special case of this we get xN →p 0 when δ = 0.
A random sequence {xN : N = 1, 2, · · · } is Op(N^δ) iff ∀ε > 0, ∃bε and Nε such that P[|N^(−δ) xN| ≥ bε] < ε ∀N > Nε; i.e., {N^(−δ) xN} is bounded in probability.
As a special case we get that {xN} is bounded in probability when δ = 0.

A random sequence {xN : N = 1, 2, · · · } is op(aN), where {aN} is a positive non-random sequence, iff xN/aN = op(1); we then write xN = op(aN).

A random sequence {xN : N = 1, 2, · · · } is Op(aN), where {aN} is a positive non-random sequence, iff xN/aN = Op(1); we then write xN = Op(aN).

Now we state some results without proving them.

LEMMA 1: If xN →p a, then xN = Op(1). So if a random sequence converges in probability to any real number, then the sequence is bounded in probability.
This result also holds for a vector xN or a matrix XN.

LEMMA 2:
i) op(1) + op(1) = op(1)
ii) Op(1) + Op(1) = Op(1)
iii) Op(1) · Op(1) = Op(1)
iv) op(1) · Op(1) = op(1)

Since a sequence that is op(1) is also Op(1) (by Lemma 1, with a = 0), we also have op(1) + Op(1) = Op(1) [from (ii)] and op(1) · op(1) = op(1) [from (iv)].

The notions of boundedness and convergence in probability of random sequences extend to vectors and matrices as well.

Let {xN} be a sequence of random K × 1 vectors. Then xN →p a, where a is a K × 1 constant vector, iff xNj →p aj for j = 1, · · · , K.

This in turn is equivalent to ||xN − a|| →p 0, where ||b|| = (b′b)^(1/2) is the Euclidean norm of the vector b.

Let {ZN} be a sequence of random M × K matrices. Then ZN →p B, where B is an M × K constant matrix, iff ||ZN − B|| →p 0, where ||A|| = [tr(A′A)]^(1/2).

LEMMA 3: Let {ZN : N = 1, 2, · · · } be a sequence of J × K random matrices such that ZN = op(1), and let {xN : N = 1, 2, · · · } be a sequence of random J × 1 vectors such that {xN} = Op(1). Then Z′N xN = op(1).
This result follows from the above definitions and parts (i) and (iv) of Lemma 2.
LEMMA 4 (Slutsky’s Theorem): Let g : ℝ^K → ℝ^J be a vector-valued function continuous at some point c ∈ ℝ^K. Let {xN : N = 1, 2, · · · } be a sequence of K × 1 random vectors such that xN →p c. Then,
g(xN) →p g(c) as N → ∞.

In other words, plim g(xN) = g(plim xN), if g(·) is continuous at plim xN.

This theorem gives the plim operator its advantage over the E(·) operator [in general E g(x) ≠ g(E x)], and this is why finite-sample analysis of many estimators is difficult whereas we can still characterize their limiting properties.
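A simulation sketch (Python with numpy; the choice g(x) = 1/x and exponential draws with mean 2 are arbitrary illustrations) of plim g(x̄N) = g(plim x̄N):

```python
import numpy as np

rng = np.random.default_rng(1)
mu = 2.0  # plim of the sample mean xbar_N

# g(x) = 1/x is continuous at mu = 2, so by Slutsky's Theorem
# g(xbar_N) ->p g(mu) = 0.5 (even though E[1/xbar_N] != 1/E[xbar_N]).
for N in [10, 100, 10000]:
    xbar = rng.exponential(mu, size=(1000, N)).mean(axis=1)
    g = 1.0 / xbar
    print(N, np.mean(np.abs(g - 0.5) > 0.05))
# The exceedance probability falls toward 0 as N grows.
```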
A Digression – Probability Space:

Elements of the structure

• A non-empty set Ω of possible outcomes (the sample space),

• a family ℱ of subsets of Ω representing possible events (the event space), and

• a real-valued function P(·) on ℱ such that ∀E ∈ ℱ, P(E) is interpreted as the probability of event E.
E.g. Tossing two coins simultaneously → a random experiment:

→ Ω = {(H, H), (H, T), (T, H), (T, T)}

→ ℱ is a family of subsets of Ω. In this case the power set of Ω can serve as ℱ. Events may be identified as follows:
(i) The event of getting at least one head, A = {(H, H), (H, T), (T, H)}
(ii) The event of getting exactly one head, B = {(H, T), (T, H)}
→ If the coins are fair then for each ω ∈ Ω, P({ω}) = 1/4.
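A small sketch (plain Python) of this two-coin probability space, recovering P(A) = 3/4 and P(B) = 1/2 by summing the probabilities of the equally likely outcomes:

```python
from itertools import product

# Sample space for two fair coins; each outcome has probability 1/4.
omega = list(product("HT", repeat=2))
P = {w: 1 / len(omega) for w in omega}

A = [w for w in omega if "H" in w]           # at least one head
B = [w for w in omega if w.count("H") == 1]  # exactly one head
print(sum(P[w] for w in A))  # 0.75
print(sum(P[w] for w in B))  # 0.5
```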

In general, a structure (Ω, ℱ, P(·)) is a probability space iff ∀A and B ∈ ℱ,
A1. ℱ is a suitable algebra of sets on Ω.
[For a finite probability space, ℱ is a Boolean algebra, i.e., closed under complementation and finite unions and intersections. For a countable probability space, ℱ is a σ-algebra, i.e., closed under complementation and countable unions and intersections. For an uncountable probability space, ℱ is the σ-algebra generated by a family of subsets of Ω.]
A2. P(A) ≥ 0 ∀A ∈ ℱ
A3. P(Ω) = 1
A4. If A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B).

Thus, (Ω, ℱ) is a measurable space and P(·), a suitable measure on (Ω, ℱ) satisfying the four axioms, is a mapping from ℱ to ℝ such that (for countable Ω) if A ∈ ℱ, then P(A) = Σ_{ω∈A} P({ω}). Then (Ω, ℱ, P(·)) is a probability space.

A random variable is a (measurable) mapping from Ω to ℝ.

Let (Ω, ℱ, P(·)) be a probability space. A sequence of events {ΩN : N = 1, 2, · · · } ∈ ℱ is said to occur with probability approaching one (w.p.a.1) iff P(ΩN) → 1 as N → ∞. [This is weaker than occurring “almost surely”, which requires probability exactly one.]

The complement of ΩN, viz. Ω^c_N, can occur for every N, but its chance of occurrence goes to 0 as N → ∞.

COROLLARY 1: Let {ZN : N = 1, 2, · · · } be a sequence of random K × K matrices and let A be a non-random, invertible K × K matrix. If ZN →p A, then
1. ZN⁻¹ exists w.p.a.1
2. ZN⁻¹ →p A⁻¹, or plim ZN⁻¹ = A⁻¹ (in an appropriate sense).

Proof: The determinant is a continuous function on the space of all square matrices. Hence det(ZN) →p det(A) [by Lemma 4].

But det(A) ≠ 0 as A is non-singular. Hence P[det(ZN) ≠ 0] → 1 as N → ∞. [Part 1 proved]

How do we define ZN⁻¹ when ZN may be singular?

Let ΩN be the set of outcomes ω such that ZN(ω) is non-singular. By part 1, P(ΩN) → 1 as N → ∞.

Define a new sequence of matrices by

Z̃N(ω) ≡ ZN(ω) if ω ∈ ΩN,
Z̃N(ω) ≡ IK if ω ∉ ΩN.

Then P(Z̃N = ZN) = P(ΩN) → 1 as N → ∞. Now since ZN →p A, hence Z̃N →p A (by transitivity).
The inverse function is continuous on the space of invertible matrices, so Z̃N⁻¹ →p A⁻¹ [by Lemma 4]. This implies ZN⁻¹ →p A⁻¹, as the fact that ZN can be singular with vanishing probability does not affect the analysis.
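A simulation sketch (Python with numpy; the matrix A and the Op(N^(−1/2)) noise are arbitrary illustrative choices) of ZN⁻¹ converging to A⁻¹:

```python
import numpy as np

rng = np.random.default_rng(2)
A = np.array([[2.0, 1.0], [1.0, 3.0]])  # non-random, invertible
A_inv = np.linalg.inv(A)

# Z_N = A + noise of size O_p(N^(-1/2)), so Z_N ->p A; by Corollary 1,
# Z_N is invertible w.p.a.1 and inv(Z_N) ->p inv(A).
for N in [10, 1000, 100000]:
    Z_N = A + rng.standard_normal((2, 2)) / np.sqrt(N)
    print(N, np.linalg.norm(np.linalg.inv(Z_N) - A_inv))
# The Frobenius-norm error shrinks toward 0.
```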
Convergence in Distribution:

A sequence of random variables {xN : N = 1, 2, · · · } converges in distribution to the continuous random variable x iff FN(ξ) → F(ξ) as N → ∞ ∀ξ ∈ ℝ, where FN is the c.d.f. of xN and F is the (continuous) c.d.f. of x.

Written as: xN →d x.
When x ∼ N(µ, σ²), we say xN →d N(µ, σ²), or xN ∼a N(µ, σ²) [xN is asymptotically normal].

E.g. xN ≡ (SN − Np)/[Np(1 − p)]^(1/2) ∼a N(0, 1), where SN ∼ bin(N, p).

So xN is not required to be continuous for any finite N, but the limiting distribution is continuous.
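A simulation sketch (Python with numpy; p = 0.3 and the evaluation points are arbitrary choices) of the c.d.f. of the standardized binomial approaching the N(0, 1) c.d.f.:

```python
import math
import numpy as np

rng = np.random.default_rng(3)
p = 0.3

def phi(z):  # standard normal c.d.f.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# x_N = (S_N - N p)/sqrt(N p (1 - p)) is discrete for every finite N,
# yet its c.d.f. approaches the continuous N(0,1) c.d.f.
for N in [10, 100, 10000]:
    S_N = rng.binomial(N, p, size=20000)
    x_N = (S_N - N * p) / math.sqrt(N * p * (1 - p))
    print(N, max(abs(np.mean(x_N <= z) - phi(z)) for z in (-1.0, 0.0, 1.0)))
# The largest c.d.f. discrepancy at these points shrinks with N.
```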

A sequence of K × 1 random vectors {xN : N = 1, 2, · · · } converges in distribution to the continuous random vector x iff for every K × 1 non-random vector c with c′c = 1, c′xN →d c′x.

Written as: xN →d x.

E.g.: When x ∼ N(m, V) (i.e. c′x ∼ N(c′m, c′Vc)), then c′xN →d N(c′m, c′Vc) for all c such that c′c = 1 ⇒ xN →d N(m, V).

LEMMA 5: If xN →d x, where x is any K × 1 random vector, then xN = Op(1).
We already know from Lemma 1 a sufficient condition for a random sequence to be bounded in probability. This lemma provides us with yet another sufficient condition for the same.

LEMMA 6: Let {xN} be a sequence of K × 1 random vectors such that xN →d x. If g : ℝ^K → ℝ^J is a continuous function, then g(xN) →d g(x).

This is called the Continuous Mapping Theorem. It is the counterpart of Slutsky’s Theorem for convergence in distribution.

It is extremely useful for finding the asymptotic distribution of a test statistic once the limiting distribution of an estimator is known.
COROLLARY 2: If {zN} is a sequence of K × 1 random vectors such that zN →d N(0, V), then
1. For any K × M non-random matrix A, A′zN →d N(0, A′VA)
2. z′N V⁻¹zN →d χ²_K

The first result is intuitive from the knowledge that A′zN is a linear combination of the elements of zN. As zN →d N(·), hence A′zN →d N(·), the linear transformation being a continuous mapping.

Also z′N V⁻¹zN = (V^(−1/2)zN)′(V^(−1/2)zN). As zN →d N(0, V), hence V^(−1/2)zN →d N(0, IK) by the above logic. Thus (V^(−1/2)zN)′(V^(−1/2)zN) ∼a χ²_K, being asymptotically a sum of squares of K independent standard normal variates.
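A simulation sketch (Python with numpy; the positive definite V and the centered-exponential wi are arbitrary choices) checking that the quadratic form z′N V⁻¹zN has approximately the χ²_2 moments when zN is a CLT-standardized sum:

```python
import numpy as np

rng = np.random.default_rng(4)
K, N, reps = 2, 1000, 2000
V = np.array([[1.0, 0.5], [0.5, 2.0]])  # positive definite var(w_i)
V_inv = np.linalg.inv(V)

# w_i iid with mean 0 and variance V (centered exponentials mixed by the
# Cholesky factor L); z_N = N^(-1/2) sum_i w_i ->d N(0, V) by the CLT,
# so q = z_N' V^{-1} z_N ->d chi-squared with K degrees of freedom.
L = np.linalg.cholesky(V)
w = (rng.exponential(1.0, size=(reps, N, K)) - 1.0) @ L.T
z_N = w.sum(axis=1) / np.sqrt(N)
q = np.einsum('ri,ij,rj->r', z_N, V_inv, z_N)
print(q.mean(), q.var())  # approx K = 2 and 2K = 4, the chi2_K moments
```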
LEMMA 7: Let {xN} and {zN} be sequences of K × 1 random vectors. If zN →d z and xN − zN →p 0, then xN →d z.

This is the asymptotic equivalence lemma. It is extremely useful and frequently used in asymptotic analysis.
Limit Theorems for Random Samples

THEOREM 1: Let {wi : i = 1, 2, · · · } be a sequence of iid G × 1 random vectors such that E(|wig|) < ∞, g = 1, · · · , G. Then the sequence satisfies the Weak Law of Large Numbers (WLLN):
N⁻¹ Σ_{i=1}^N wi →p µw, where µw ≡ E(wi).

This is the familiar result that plim x̄ = µ.
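A sketch (Python with numpy; the G = 2 normal components and their means are arbitrary choices) of the running vector mean along one growing sample settling at µw:

```python
import numpy as np

rng = np.random.default_rng(5)
mu_w = np.array([1.0, -2.0])

# One long iid sample of G = 2 vectors; the running means
# N^{-1} sum_i w_i settle at mu_w componentwise (WLLN).
w = mu_w + rng.standard_normal((10**6, 2)) * np.array([1.0, 3.0])
for N in [100, 10**4, 10**6]:
    print(N, w[:N].mean(axis=0))  # approaches [1.0, -2.0]
```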

THEOREM 2 (Lindeberg-Levy): Let {wi : i = 1, 2, · · · } be a sequence of iid G × 1 random vectors such that E(wig²) < ∞, g = 1, · · · , G, and E(wi) = 0. Then {wi : i = 1, 2, · · · } satisfies the Central Limit Theorem (CLT):
N^(−1/2) Σ_{i=1}^N wi →d N(0, B), where B = var(wi) = E(wi w′i) is positive (semi-)definite.
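A sketch (Python with numpy; the skewed two-component wi with B = I₂ is an arbitrary choice) of N^(−1/2) Σ wi acquiring the N(0, B) shape:

```python
import numpy as np

rng = np.random.default_rng(6)
N, reps = 2000, 2000

# iid w_i with E(w_i) = 0: two independent centered exponential
# components, so B = var(w_i) = I_2 despite the skewness of w_i.
w = rng.exponential(1.0, size=(reps, N, 2)) - 1.0
s = w.sum(axis=1) / np.sqrt(N)   # N^(-1/2) sum_i w_i

print(np.cov(s.T))               # approx B = I_2
print(np.mean(s[:, 0] <= 1.0))   # approx Phi(1) = 0.8413
```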
