
Empirical Research in Economics

Lecture 12

Introduction to Asymptotics

Jana Mareckova and David Preinerstorfer



Overview

1 Consistency

2 Asymptotic Normality

Literature: Appendix C.3 (& Appendix C.2)



Estimator properties in small and large samples

• So far we have talked about properties of estimators in small samples, e.g., unbiasedness.
• In lectures 12 & 13 we now deal with properties in large samples: so-called asymptotic
properties when the sample size N → ∞.
• Recall consistency from Data Analytics I.

• Why are we interested in asymptotics at all?

• We want to know whether our estimator is close to the true value with high probability when N is large. If it does not work even then, we should (i) know that, and (ii) perhaps use a different estimator.

• Distributional approximations for situations where we do not want to impose distributional assumptions.



Consistency

A generic estimator WN , say, of a parameter θ is a random variable with distribution fWN , say.

Note that the distribution fWN depends on θ , which we do not highlight notationally.

Recall: If WN is unbiased, the expectation of the distribution fWN coincides with the true parameter.

Consistency (verbal):
Consistency means that the distribution fWN gets more and more “concentrated” around the true
parameter θ with larger samples, until “at infinity” it concentrates all its mass at θ .



Consistency Graphically
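The figure from the original slide is not reproduced here. As a substitute, a minimal R sketch (settings are illustrative, not from the slides): for standard normal draws the sample mean ȲN is exactly N(0, 1/N), so plotting its density for increasing N shows the distribution piling up around the true value θ = 0.

  # Density of the sample mean of N(0,1) draws for increasing N:
  # the sampling distribution concentrates around the true value 0.
  curve(dnorm(x, mean = 0, sd = 1 / sqrt(5)), from = -2, to = 2,
        xlab = "estimate", ylab = "density", lty = 3, ylim = c(0, 4.5))
  curve(dnorm(x, mean = 0, sd = 1 / sqrt(25)), add = TRUE, lty = 2)
  curve(dnorm(x, mean = 0, sd = 1 / sqrt(100)), add = TRUE, lty = 1)
  legend("topright", legend = c("N = 5", "N = 25", "N = 100"), lty = c(3, 2, 1))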



Consistency Formally

A sequence of estimators WN is called consistent if the sequence of random variables WN
converges in probability to the true value θ; that is, if for every ε > 0,

P(|WN − θ | > ε) → 0, as N → ∞.

Verbally: For every fixed positive number ε, the probability that the estimator deviates from the
true value by more than ε approaches zero as the sample size grows.

In case a sequence WN converges in probability to θ , we write

plim(WN ) = plim WN = θ ;

and shall often drop the parentheses.
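A small Monte Carlo sketch of this definition in R (assumed setup, not from the slides: Exponential(1) data, so µ = 1, and ε = 0.1); the estimated exceedance probability shrinks toward zero as N grows.

  set.seed(1)
  mu  <- 1      # true mean of the Exponential(1) population
  eps <- 0.1
  for (N in c(10, 100, 1000, 10000)) {
    # 2000 Monte Carlo replications of the sample mean based on N draws
    means <- replicate(2000, mean(rexp(N, rate = 1)))
    cat("N =", N, " estimated P(|mean - mu| > eps) =",
        mean(abs(means - mu) > eps), "\n")
  }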



Fundamental properties of plim

Let WN and ZN be such that plim WN = w and plim ZN = z (with non-random limits).

1 If f : R → R is continuous at w, then plim f (WN ) = f (w).


2 plim(WN + ZN ) = w + z.
3 plim(WN ZN ) = wz.
4 plim(WN /ZN ) = w/z, provided z ̸= 0.

The first result is called a “Continuous Mapping Theorem”.

The remaining results (2-4) together are often referred to as “Slutsky’s lemma”.
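A quick numerical sketch of these rules in R (illustrative assumptions: WN is the sample mean of Exponential(1) draws, so w = 1; ZN is the sample mean of independent Uniform(0, 2) draws, so z = 1; f = exp for rule (1)):

  set.seed(2)
  N <- 1e6
  W <- mean(rexp(N, rate = 1))   # plim W_N = 1
  Z <- mean(runif(N, 0, 2))      # plim Z_N = 1
  # close to the limits exp(1), 2, 1 and 1, respectively
  c(mapped = exp(W), sum = W + Z, product = W * Z, ratio = W / Z)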

Recall: For events A and B we have P(A ∪ B) ≤ P(A) + P(B); and if A ⊆ B, then P(A) ≤ P(B).



Proof of (1): Fix ε > 0. By assumption, f is continuous at w. Hence, there exists a δ > 0 such that

|WN − w| ≤ δ implies |f (WN ) − f (w)| ≤ ε.

In other words, we have the relation

{|WN − w| ≤ δ } ⊆ {|f (WN ) − f (w)| ≤ ε}.

Taking complements “reverts” inclusions, thus

{|f (WN ) − f (w)| > ε} ⊆ {|WN − w| > δ }.

Therefore, we arrive at the inequality

P(|f (WN ) − f (w)| > ε) ≤ P(|WN − w| > δ ) → 0,

the convergence following from plim(WN ) = w.



Proof of (2): Fix ε > 0, and use the triangle inequality to conclude

P(|WN + ZN − (w + z)| ≥ ε) ≤ P(|WN − w| + |ZN − z| ≥ ε).

For a ≥ 0 and b ≥ 0 such that a + b ≥ ε, it must hold that a ≥ ε/2 or b ≥ ε/2.

Hence,
{|WN − w| + |ZN − z| ≥ ε} ⊆ {|WN − w| ≥ ε/2} ∪ {|ZN − z| ≥ ε/2},
so that

P(|WN − w| + |ZN − z| ≥ ε) ≤ P(|WN − w| ≥ ε/2) + P(|ZN − z| ≥ ε/2),


which converges to 0 due to plim WN = w and plim ZN = z.



Proof of (3) [not examinable]: Write

4WN ZN = (WN + ZN)² − (WN − ZN)².

The square function is continuous. That plim −ZN = −z follows immediately from the definition.

Therefore, Parts (1) and (2) show that

plim(WN + ZN)² = (w + z)² and plim(WN − ZN)² = (w − z)².

From Part (2) it hence follows that

plim(4WN ZN) = (w + z)² − (w − z)² = 4wz.

Fix ε > 0 and write

P(|WN ZN − wz| > ε) = P(|4WN ZN − 4wz| > 4ε) → 0,

the convergence following from the penultimate display.


Proof of (4) [not examinable]: The function x ↦ 1/x is continuous everywhere except at 0.

Because of z ̸= 0, it follows from Part (1) that

plim 1/ZN = 1/z.

In combination with Part (3) it follows that

plim(WN /ZN) = plim(WN · (1/ZN)) = w · (1/z) = w/z.



Law of large numbers (LLN)

Let Y1, Y2, . . . be i.i.d. random variables with expectation µ. Then the sample average is a
consistent estimator of µ, that is,

plim(N⁻¹ ∑_{i=1}^N Yi) = µ.

Proof: We sketch the proof assuming that Yi has finite variance Var(Y), say. Recall that

E(N⁻¹ ∑_{i=1}^N Yi) = N⁻¹ ∑_{i=1}^N E(Yi) = N⁻¹ Nµ = µ,

and that

Var(N⁻¹ ∑_{i=1}^N Yi) = N⁻² ∑_{i=1}^N Var(Yi) = N⁻¹ Var(Y) → 0.

It hence follows (Data Analytics I, Consistency) that plim(N⁻¹ ∑_{i=1}^N Yi) = µ.



Remarks on the LLN

The Continuous Mapping Theorem implies: plim f(ȲN) = f(µ) for every function f that is continuous at µ.

⇒ The larger the sample, the more accurately we can estimate the population expectation (or
continuous functions thereof).

The law of large numbers holds (though we did not prove this) even if the variance σ² of the
distribution is not finite: ȲN remains consistent for heavy-tailed distributions with a finite
expectation but infinite variance.
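To illustrate, a minimal R sketch (the t distribution with 1.5 degrees of freedom is one such example: its expectation is 0 but its variance is infinite); the sample mean still settles down around 0, albeit slowly:

  set.seed(3)
  # t distribution with 1.5 degrees of freedom: finite mean 0, infinite variance
  for (N in c(1e3, 1e5, 1e6)) {
    cat("N =", N, " sample mean =", mean(rt(N, df = 1.5)), "\n")
  }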



Example, cf. Data Analytics I
We observe a random sample of the variable Yi for N individuals. The expected value of Yi in the
population is E[Yi] = µ. We can use different estimators for this expected value:

ȲA,N = ȲN = (1/N) ∑_{i=1}^N Yi   (1)

ȲB,N = (1/3)(Y1 + Y2 + Y3)   (2)

ȲC,N = (1/(N−1)) ∑_{i=1}^N Yi   (3)

ȲD,N = (1/√N) ∑_{i=1}^N Yi   (4)

Which ones are unbiased? Which ones are consistent for µ ?
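A Monte Carlo sketch in R (assumed setup: Yi ∼ N(µ, 1) with µ = 2; sample size and replication count are arbitrary illustrative choices) that approximates the mean and variance of each estimator's sampling distribution:

  set.seed(4)
  mu <- 2; N <- 500; R <- 5000            # true mean, sample size, replications
  est <- replicate(R, {
    y <- rnorm(N, mean = mu, sd = 1)
    c(A = mean(y),                        # (1) sample average
      B = mean(y[1:3]),                   # (2) average of the first three draws
      C = sum(y) / (N - 1),               # (3) divides by N - 1
      D = sum(y) / sqrt(N))               # (4) divides by sqrt(N)
  })
  rowMeans(est)       # approx. 2, 2, 2.004, 44.7
  apply(est, 1, var)  # B's variance does not shrink with N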



Example Unbiasedness, cf. Data Analytics I

Unbiasedness of ȲB,N :
 
E(ȲB,N) = E[(1/3)(Y1 + Y2 + Y3)] = (1/3)(E[Y1] + E[Y2] + E[Y3]) = (1/3)(µ + µ + µ) = (1/3) · 3µ = µ

Bias of ȲC,N :
E(ȲC,N) = E[(1/(N−1)) ∑_{i=1}^N Yi] = (1/(N−1)) ∑_{i=1}^N E(Yi) = (1/(N−1)) Nµ = (N/(N−1)) µ ≠ µ

Hence, the bias of ȲC,N is (N/(N−1)) µ − µ = µ/(N−1).

Food for thought: Show that ȲD,N is biased.



Example Consistency, cf. Data Analytics I

Consistency of ȲC,N:

plim(ȲC,N) = plim((1/(N−1)) ∑_{i=1}^N Yi) = plim((N/(N−1)) · (1/N) ∑_{i=1}^N Yi)
= plim(N/(N−1)) · plim((1/N) ∑_{i=1}^N Yi) = 1 · µ = µ

Inconsistency of ȲB,N:

plim(ȲB,N) = (1/3)(Y1 + Y2 + Y3),

which is different from µ with positive probability unless σ² = 0.¹

Food for thought: Show that ȲD,N is not consistent.

¹ One can show that a limit in probability is unique up to modifications of probability 0.

Asymptotic normality

Consistency states that the distribution of an estimator concentrates around a point.


Consistency, however, does not tell us much about what the distribution of the estimator
approximately looks like for large N.

Knowing the distribution, at least approximately, is nevertheless necessary to perform hypothesis tests.

Many estimators are asymptotically normally distributed, in the sense that

ZN := (WN − E(WN)) / √Var(WN)   satisfies   P(ZN ≤ z) → Φ(z) as N → ∞, for every z ∈ R,

where Φ is the cdf (cumulative distribution function) of the standard normal distribution.



Central Limit Theorem

Asymptotic normality of most estimators is established using the Central Limit Theorem (CLT):

Let the random variables Y1, Y2, . . . be i.i.d. with expectation µ and variance σ², and denote
ȲN = (1/N) ∑_{i=1}^N Yi. Then

ZN = (ȲN − E(ȲN)) / √Var(ȲN) = √N (ȲN − µ) / σ

satisfies P(ZN ≤ z) → Φ(z), for every z ∈ R, which we also denote as ZN ∼ᵃ N(0, 1).

Verbally: The sample average is approximately normally distributed, regardless of the distribution
from which the observations were drawn (provided its expectation and variance exist).
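A minimal R sketch of the CLT (illustrative setup: skewed Exponential(1) data, for which µ = σ = 1), comparing the empirical distribution of ZN with Φ at a few points:

  set.seed(5)
  N <- 200
  # 5000 replications of the standardized sample mean
  Z <- replicate(5000, sqrt(N) * (mean(rexp(N, rate = 1)) - 1) / 1)
  for (z in c(-2, -1, 0, 1, 2))
    cat("z =", z, " P(Z_N <= z) ~", mean(Z <= z),
        " Phi(z) =", round(pnorm(z), 3), "\n")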



Remarks on the CLT and LLN

The CLT shows that ZN = √N (ȲN − µ)/σ ∼ᵃ N(0, 1).

The LLN “only” shows that plim((ȲN − µ)/σ) = 0.

⇒ Multiplication by √N ensures that the variance stays constant as N → ∞.
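A short simulation sketch of this point (again with Exponential(1) data, so σ² = 1; settings are illustrative): the variance of ȲN − µ vanishes, while that of √N (ȲN − µ) stabilizes near σ².

  set.seed(6)
  for (N in c(10, 100, 1000)) {
    devs <- replicate(4000, mean(rexp(N, rate = 1)) - 1)  # Ybar_N - mu
    cat("N =", N,
        " Var(Ybar - mu) ~", signif(var(devs), 3),
        " Var(sqrt(N) * (Ybar - mu)) ~", signif(N * var(devs), 3), "\n")
  }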



Remarks

We have used two convergence properties for sequences of random variables today.
1 Convergence in probability for consistency.
2 Asymptotic normality, which is a special case of “convergence in distribution”.

How do these concepts relate to each other? I won’t discuss this much beyond pointing out that:

If plim WN = w (limit non-random) and if ZN ∼ᵃ N(0, 1), then

WN ZN ∼ᵃ N(0, w²)   and   WN + ZN ∼ᵃ N(w, 1).

This will be important in the next lecture.
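A simulation sketch of these two rules (illustrative assumptions: WN is the sample mean of Uniform(0, 4) draws, so w = 2; ZN is the standardized mean of an independent Exponential(1) sample, so ZN is approximately N(0, 1)):

  set.seed(7)
  N <- 1000
  sim <- replicate(5000, {
    W <- mean(runif(N, 0, 4))                     # plim W_N = 2
    Z <- sqrt(N) * (mean(rexp(N, rate = 1)) - 1)  # approx. N(0, 1)
    c(product = W * Z, sum = W + Z)
  })
  rowMeans(sim)       # approx. 0 and 2
  apply(sim, 1, var)  # approx. w^2 = 4 and 1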



Notebook time

Notebook12.nb.html (Self-study)
