
Empirical Research in Economics

Lecture 12

Introduction to Asymptotics

Jana Mareckova and David Preinerstorfer



Overview

1 Consistency

2 Asymptotic Normality

Literature: Appendix C.3 (& Appendix C.2)



Estimator properties in small and large samples

• So far we have talked about properties of estimators in small samples, e.g., unbiasedness.
• In lectures 12 & 13 we now deal with properties in large samples: so-called asymptotic
properties when the sample size N → ∞.
• Recall consistency from Data Analytics I.

• Why are we interested in asymptotics at all?

• We want to know whether our estimator is close to the true value with high probability when N is large. If it does not work even then, we should (i) know that, and (ii) perhaps use a different estimator.

• Distributional approximations for situations where we do not want to impose distributional assumptions.



Consistency

A generic estimator WN , say, of a parameter θ is a random variable with distribution fWN , say.

Note that the distribution fWN depends on θ , which we do not highlight notationally.

Recall: If WN is unbiased, the expectation of the distribution fWN coincides with the true parameter.

Consistency (verbal):
Consistency means that the distribution fWN gets more and more “concentrated” around the true
parameter θ with larger samples, until “at infinity” it concentrates all its mass at θ .



Consistency Graphically
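The figure from the original slide is not reproduced here. As a substitute, a minimal R sketch (settings are illustrative, not from the slides): for standard normal draws the sample mean ȲN is exactly N(0, 1/N), so plotting its density for increasing N shows the distribution piling up around the true value θ = 0.

  # Density of the sample mean of N(0,1) draws for increasing N:
  # the sampling distribution concentrates around the true value 0.
  curve(dnorm(x, mean = 0, sd = 1 / sqrt(5)), from = -2, to = 2,
        xlab = "estimate", ylab = "density", lty = 3, ylim = c(0, 4.5))
  curve(dnorm(x, mean = 0, sd = 1 / sqrt(25)), add = TRUE, lty = 2)
  curve(dnorm(x, mean = 0, sd = 1 / sqrt(100)), add = TRUE, lty = 1)
  legend("topright", legend = c("N = 5", "N = 25", "N = 100"), lty = c(3, 2, 1))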



Consistency Formally

A sequence of estimators WN is called consistent if the sequence of random variables WN
converges in probability to the true value θ; that is, if for every ε > 0,

P(|WN − θ | > ε) → 0, as N → ∞.

Verbally: For every fixed positive number ε, the probability that the estimator deviates from the
true value by more than ε approaches zero as the sample size grows.

In case a sequence WN converges in probability to θ , we write

plim(WN ) = plim WN = θ ;

and shall often drop the parentheses.
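A small Monte Carlo sketch of this definition in R (assumed setup, not from the slides: Exponential(1) data, so µ = 1, and ε = 0.1); the estimated exceedance probability shrinks toward zero as N grows.

  set.seed(1)
  mu  <- 1      # true mean of the Exponential(1) population
  eps <- 0.1
  for (N in c(10, 100, 1000, 10000)) {
    # 2000 Monte Carlo replications of the sample mean based on N draws
    means <- replicate(2000, mean(rexp(N, rate = 1)))
    cat("N =", N, " estimated P(|mean - mu| > eps) =",
        mean(abs(means - mu) > eps), "\n")
  }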



Fundamental properties of plim

Let WN and ZN be such that plim WN = w and plim ZN = z (with non-random limits).

1 If f : R → R is continuous at w, then plim f (WN ) = f (w).


2 plim(WN + ZN ) = w + z.
3 plim(WN ZN ) = wz.
4 plim(WN /ZN ) = w/z, provided z ̸= 0.

The first result is called a “Continuous Mapping Theorem”.

The remaining results (2-4) together are often referred to as “Slutsky’s lemma”.
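A quick numerical sketch of these rules in R (illustrative assumptions: WN is the sample mean of Exponential(1) draws, so w = 1; ZN is the sample mean of independent Uniform(0, 2) draws, so z = 1; f = exp for rule (1)):

  set.seed(2)
  N <- 1e6
  W <- mean(rexp(N, rate = 1))   # plim W_N = 1
  Z <- mean(runif(N, 0, 2))      # plim Z_N = 1
  # close to the limits exp(1), 2, 1 and 1, respectively
  c(mapped = exp(W), sum = W + Z, product = W * Z, ratio = W / Z)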

Recall: For events A and B we have P(A ∪ B) ≤ P(A) + P(B); and if A ⊆ B, then P(A) ≤ P(B).



Proof of (1): Fix ε > 0. By assumption, f is continuous at w. Hence, there exists a δ > 0 such that

|WN − w| ≤ δ implies |f (WN ) − f (w)| ≤ ε.

In other words, we have the relation

{|WN − w| ≤ δ } ⊆ {|f (WN ) − f (w)| ≤ ε}.

Taking complements “reverts” inclusions, thus

{|f (WN ) − f (w)| > ε} ⊆ {|WN − w| > δ }.

Therefore, we arrive at the inequality

P(|f (WN ) − f (w)| > ε) ≤ P(|WN − w| > δ ) → 0,

the convergence following from plim(WN ) = w.



Proof of (2): Fix ε > 0, and use the triangle inequality to conclude

P(|WN + ZN − (w + z)| ≥ ε) ≤ P(|WN − w| + |ZN − z| ≥ ε).

For a ≥ 0 and b ≥ 0 such that a + b ≥ ε, it must hold that a ≥ ε/2 or b ≥ ε/2.

Hence,
{|WN − w| + |ZN − z| ≥ ε} ⊆ {|WN − w| ≥ ε/2} ∪ {|ZN − z| ≥ ε/2},
so that

P(|WN − w| + |ZN − z| ≥ ε) ≤ P(|WN − w| ≥ ε/2) + P(|ZN − z| ≥ ε/2),


which converges to 0 due to plim WN = w and plim ZN = z.



Proof of (3) [not examinable]: Write

4WN ZN = (WN + ZN)² − (WN − ZN)².

The square function is continuous. That plim −ZN = −z follows immediately from the definition.

Therefore, Parts (1) and (2) show that

plim(WN + ZN)² = (w + z)² and plim(WN − ZN)² = (w − z)².

From Part (2) it hence follows that

plim(4WN ZN) = (w + z)² − (w − z)² = 4wz.

Fix ε > 0 and write

P(|WN ZN − wz| > ε) = P(|4WN ZN − 4wz| > 4ε) → 0,

the convergence following from the penultimate display.


Proof of (4) [not examinable]: The function x ↦ 1/x is continuous everywhere except at 0.

Because of z ̸= 0, it follows from Part (1) that

plim 1/ZN = 1/z.

In combination with Part (3) it follows that

plim(WN /ZN) = plim(WN · (1/ZN)) = w · (1/z) = w/z.



Law of large numbers (LLN)

Let Y1, Y2, . . . be i.i.d. random variables with expectation µ. Then the sample average is a
consistent estimator of µ, that is,

plim(N⁻¹ ∑_{i=1}^N Yi) = µ.

Proof: We sketch the proof assuming that Yi has finite variance Var(Y), say. Recall that

E(N⁻¹ ∑_{i=1}^N Yi) = N⁻¹ ∑_{i=1}^N E(Yi) = N⁻¹ Nµ = µ,

and that

Var(N⁻¹ ∑_{i=1}^N Yi) = N⁻² ∑_{i=1}^N Var(Yi) = N⁻¹ Var(Y) → 0.

It hence follows (Data Analytics I, Consistency) that plim(N⁻¹ ∑_{i=1}^N Yi) = µ.



Remarks on the LLN

The Continuous Mapping Theorem implies: plim f(ȲN) = f(µ) for every function f that is continuous at µ.

⇒ The larger the sample, the more accurately we can estimate the population expectation (or
continuous functions thereof).

The law of large numbers holds (though we did not prove this) even if the variance σ² of the
distribution is not finite: ȲN remains consistent for heavy-tailed distributions with a finite
expectation but infinite variance.
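To illustrate, a minimal R sketch (the t distribution with 1.5 degrees of freedom is one such example: its expectation is 0 but its variance is infinite); the sample mean still settles down around 0, albeit slowly:

  set.seed(3)
  # t distribution with 1.5 degrees of freedom: finite mean 0, infinite variance
  for (N in c(1e3, 1e5, 1e6)) {
    cat("N =", N, " sample mean =", mean(rt(N, df = 1.5)), "\n")
  }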



Example, cf. Data Analytics I
We observe a random sample of the variable Yi for N individuals. The expected value of Yi in the
population is E[Yi] = µ. We can use different estimators for this expected value:

ȲA,N = ȲN = (1/N) ∑_{i=1}^N Yi   (1)

ȲB,N = (1/3)(Y1 + Y2 + Y3)   (2)

ȲC,N = (1/(N−1)) ∑_{i=1}^N Yi   (3)

ȲD,N = (1/√N) ∑_{i=1}^N Yi   (4)

Which ones are unbiased? Which ones are consistent for µ ?
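A Monte Carlo sketch in R (assumed setup: Yi ∼ N(µ, 1) with µ = 2; sample size and replication count are arbitrary illustrative choices) that approximates the mean and variance of each estimator's sampling distribution:

  set.seed(4)
  mu <- 2; N <- 500; R <- 5000            # true mean, sample size, replications
  est <- replicate(R, {
    y <- rnorm(N, mean = mu, sd = 1)
    c(A = mean(y),                        # (1) sample average
      B = mean(y[1:3]),                   # (2) average of the first three draws
      C = sum(y) / (N - 1),               # (3) divides by N - 1
      D = sum(y) / sqrt(N))               # (4) divides by sqrt(N)
  })
  rowMeans(est)       # approx. 2, 2, 2.004, 44.7
  apply(est, 1, var)  # B's variance does not shrink with N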



Example Unbiasedness, cf. Data Analytics I

Unbiasedness of ȲB,N :
 
E(ȲB,N) = E[(1/3)(Y1 + Y2 + Y3)] = (1/3)(E[Y1] + E[Y2] + E[Y3]) = (1/3)(µ + µ + µ) = (1/3) · 3µ = µ

Bias of ȲC,N :
E(ȲC,N) = E[(1/(N−1)) ∑_{i=1}^N Yi] = (1/(N−1)) ∑_{i=1}^N E(Yi) = (1/(N−1)) Nµ = (N/(N−1)) µ ≠ µ

Hence, the bias of ȲC,N is (N/(N−1)) µ − µ = µ/(N−1).

Food for thought: Show that ȲD,N is biased.



Example Consistency, cf. Data Analytics I

Consistency of ȲC,N:

plim(ȲC,N) = plim((1/(N−1)) ∑_{i=1}^N Yi) = plim((N/(N−1)) · (1/N) ∑_{i=1}^N Yi)
= plim(N/(N−1)) · plim((1/N) ∑_{i=1}^N Yi) = 1 · µ = µ

Inconsistency of ȲB,N:

plim(ȲB,N) = (1/3)(Y1 + Y2 + Y3),

which is different from µ with positive probability unless σ² = 0.¹

Food for thought: Show that ȲD,N is not consistent.

¹ One can show that a limit in probability is unique up to modifications of probability 0.

Asymptotic normality

Consistency states that the distribution of an estimator concentrates around a point.


Consistency, however, does not tell us much about what the distribution of the estimator
approximately looks like for large N.

Knowing the distribution, at least approximately, is nevertheless necessary to perform hypothesis tests.

Many estimators are asymptotically normally distributed, in the sense that

ZN := (WN − E(WN)) / √Var(WN)   satisfies   P(ZN ≤ z) → Φ(z) as N → ∞, for every z ∈ R,

where Φ is the cdf (cumulative distribution function) of the standard normal distribution.



Central Limit Theorem

Asymptotic normality of most estimators is established using the Central Limit Theorem (CLT):

Let the random variables Y1, Y2, . . . be i.i.d. with expectation µ and variance σ², and denote
ȲN = (1/N) ∑_{i=1}^N Yi. Then

ZN = (ȲN − E(ȲN)) / √Var(ȲN) = √N (ȲN − µ) / σ

satisfies P(ZN ≤ z) → Φ(z), for every z ∈ R, which we also denote as ZN ∼ᵃ N(0, 1).

Verbally: The sample average is approximately normally distributed, regardless of the distribution
from which the observations were drawn (provided its expectation and variance exist).
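A minimal R sketch of the CLT (illustrative setup: skewed Exponential(1) data, for which µ = σ = 1), comparing the empirical distribution of ZN with Φ at a few points:

  set.seed(5)
  N <- 200
  # 5000 replications of the standardized sample mean
  Z <- replicate(5000, sqrt(N) * (mean(rexp(N, rate = 1)) - 1) / 1)
  for (z in c(-2, -1, 0, 1, 2))
    cat("z =", z, " P(Z_N <= z) ~", mean(Z <= z),
        " Phi(z) =", round(pnorm(z), 3), "\n")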



Remarks on the CLT and LLN

The CLT shows that ZN = √N (ȲN − µ)/σ ∼ᵃ N(0, 1).

The LLN “only” shows that plim((ȲN − µ)/σ) = 0.

⇒ Multiplication by √N ensures that the variance stays constant as N → ∞.
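A short simulation sketch of this point (again with Exponential(1) data, so σ² = 1; settings are illustrative): the variance of ȲN − µ vanishes, while that of √N (ȲN − µ) stabilizes near σ².

  set.seed(6)
  for (N in c(10, 100, 1000)) {
    devs <- replicate(4000, mean(rexp(N, rate = 1)) - 1)  # Ybar_N - mu
    cat("N =", N,
        " Var(Ybar - mu) ~", signif(var(devs), 3),
        " Var(sqrt(N) * (Ybar - mu)) ~", signif(N * var(devs), 3), "\n")
  }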



Remarks

We have used two convergence properties for sequences of random variables today.
1 Convergence in probability for consistency.
2 Asymptotic normality, which is a special case of “convergence in distribution”.

How do these concepts relate to each other? I won’t discuss this much beyond pointing out that:

If plim WN = w (limit non-random) and if ZN ∼ᵃ N(0, 1), then

WN ZN ∼ᵃ N(0, w²)   and   WN + ZN ∼ᵃ N(w, 1).

This will be important in the next lecture.
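A simulation sketch of these two rules (illustrative assumptions: WN is the sample mean of Uniform(0, 4) draws, so w = 2; ZN is the standardized mean of an independent Exponential(1) sample, so ZN is approximately N(0, 1)):

  set.seed(7)
  N <- 1000
  sim <- replicate(5000, {
    W <- mean(runif(N, 0, 4))                     # plim W_N = 2
    Z <- sqrt(N) * (mean(rexp(N, rate = 1)) - 1)  # approx. N(0, 1)
    c(product = W * Z, sum = W + Z)
  })
  rowMeans(sim)       # approx. 0 and 2
  apply(sim, 1, var)  # approx. w^2 = 4 and 1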



Notebook time

Notebook12.nb.html (Self-study)
