Lecture 12
Introduction to Asymptotics
1 Consistency
2 Asymptotic Normality
• So far we have talked about properties of estimators in small samples, e.g., unbiasedness.
• In lectures 12 & 13 we now deal with properties in large samples: so-called asymptotic
properties when the sample size N → ∞.
• Recall consistency from Data Analytics I.
• We want to know whether our estimator is close to the true value with high probability if N is large. If it is not, even for large N, we should (i) know that, and (ii) perhaps use a different estimator.
A generic estimator WN, say, of a parameter θ is a random variable with distribution fWN, say.
Note that the distribution fWN depends on θ, which we do not highlight notationally.
Recall: If WN is unbiased, the expectation of the distribution fWN coincides with the true parameter.
Consistency (verbal):
Consistency means that the distribution fWN gets more and more “concentrated” around the true parameter θ with larger samples, until “at infinity” it concentrates all its mass at θ.
Consistency (formal):
P(|WN − θ| > ε) → 0, as N → ∞, for every ε > 0.
Verbally: The probability that the absolute difference between the estimator and the true value is greater than any fixed positive number approaches zero asymptotically.
If WN is consistent for θ, we write plim(WN) = plim WN = θ, and call θ the probability limit of WN.
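To make the definition concrete, here is a minimal Monte-Carlo sketch in R (my own illustration, not part of the lecture materials): it estimates P(|WN − θ| > ε) for WN the sample mean of N i.i.d. N(θ, 1) draws, with θ = 2 and ε = 0.1 chosen arbitrarily. The estimated probability should shrink toward zero as N grows.

```r
# Monte Carlo estimate of P(|W_N - theta| > eps), where W_N is the
# sample mean of N i.i.d. N(theta, 1) draws. Consistency predicts
# that this probability vanishes as N grows.
set.seed(123)
theta <- 2       # true parameter (arbitrary choice for illustration)
eps   <- 0.1     # tolerance in the consistency definition
reps  <- 2000    # Monte Carlo replications per sample size

for (N in c(10, 100, 1000, 10000)) {
  W <- replicate(reps, mean(rnorm(N, mean = theta)))   # reps copies of W_N
  cat(sprintf("N = %5d:  P(|W_N - theta| > %.1f) ~ %.3f\n",
              N, eps, mean(abs(W - theta) > eps)))
}
```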
Algebra of probability limits: Let WN and ZN be such that plim WN = w and plim ZN = z (with non-random limits). Then:
1. plim g(WN) = g(w) for every function g that is continuous at w;
2. plim(WN + ZN) = w + z;
3. plim(WN · ZN) = w · z;
4. plim(WN / ZN) = w/z, provided z ≠ 0.
Results 2-4 together are often referred to as “Slutzky’s lemma”.
Proof of 2: Recall that for events A and B we have P(A ∪ B) ≤ P(A) + P(B), and that if A ⊆ B, then P(A) ≤ P(B). Moreover, for a ≥ 0 and b ≥ 0 such that a + b ≥ ε, it must hold that a ≥ ε/2 or b ≥ ε/2. By the triangle inequality, |(WN + ZN) − (w + z)| ≤ |WN − w| + |ZN − z|. Hence,
{|(WN + ZN) − (w + z)| ≥ ε} ⊆ {|WN − w| + |ZN − z| ≥ ε} ⊆ {|WN − w| ≥ ε/2} ∪ {|ZN − z| ≥ ε/2},
so that
P(|(WN + ZN) − (w + z)| ≥ ε) ≤ P(|WN − w| ≥ ε/2) + P(|ZN − z| ≥ ε/2) → 0.
Proof of 3 (sketch): Write WN ZN = [(WN + ZN)² − WN² − ZN²]/2 and combine result 2 with result 1 (the square function is continuous). That plim(−ZN) = −z follows immediately from the definition, so differences WN − ZN are covered as well.
Proof of 4: Since x ↦ 1/x is continuous at every z ≠ 0, results 1 and 3 give
plim(WN / ZN) = plim(WN) · plim(1/ZN) = w · (1/z) = w/z.
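As an informal numerical check of the quotient rule (again my own sketch; the values w = 2 and z = 0.5 are arbitrary), one can take WN and ZN to be sample means of independent samples and watch WN/ZN settle near w/z = 4:

```r
# Numerical illustration of plim(W_N / Z_N) = w / z for sample means.
set.seed(1)
w <- 2; z <- 0.5                     # non-random probability limits, z != 0
for (N in c(10, 1000, 100000)) {
  W_N <- mean(rnorm(N, mean = w))    # plim W_N = w
  Z_N <- mean(rnorm(N, mean = z))    # plim Z_N = z
  cat(sprintf("N = %6d:  W_N / Z_N = %.4f   (target w/z = %.1f)\n",
              N, W_N / Z_N, w / z))
}
```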
Law of large numbers (LLN): Let Yi, i = 1, 2, ..., be a sequence of i.i.d. random variables with expectation µ. Then, the sample average is a consistent estimator of µ, that is,
plim((1/N) ∑_{i=1}^N Yi) = µ.
Proof: We sketch the proof assuming that Yi has finite variance Var(Y), say. Recall
E((1/N) ∑_{i=1}^N Yi) = (1/N) ∑_{i=1}^N E(Yi) = (1/N) · Nµ = µ → µ,
and that
Var((1/N) ∑_{i=1}^N Yi) = (1/N²) ∑_{i=1}^N Var(Yi) = Var(Y)/N → 0.
It hence follows (Data Analytics I, Consistency) that plim((1/N) ∑_{i=1}^N Yi) = µ.
The Continuous Mapping Theorem implies: plim f(ȲN) = f(µ) for every continuous function f.
⇒ The larger the sample, the more accurately we can estimate the population expectation (or continuous functions thereof).
The law of large numbers even states (though we did not prove this) that ȲN remains consistent if the variance σ² of the distribution is not finite (for example, for certain heavy-tailed distributions with finite expectation).
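A quick simulation sketch of this last point (my own example, not from the slides): Student's t-distribution with 2 degrees of freedom has expectation 0 but infinite variance, and its sample average still settles near 0.

```r
# Sample average of t(2) draws: the variance is infinite, but the
# expectation (0) exists, so the sample average still concentrates at 0.
set.seed(42)
for (N in c(100, 10000, 1000000)) {
  cat(sprintf("N = %7d:  sample average = %+.4f\n", N, mean(rt(N, df = 2))))
}
```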
Example: Consider the following candidate estimators of µ based on Y1, ..., YN (N ≥ 3):
ȲA,N = ȲN = (1/N) ∑_{i=1}^N Yi   (1)
ȲB,N = (1/3)(Y1 + Y2 + Y3)   (2)
ȲC,N = (1/(N − 1)) ∑_{i=1}^N Yi   (3)
ȲD,N = (1/√N) ∑_{i=1}^N Yi   (4)
Unbiasedness of ȲB,N:
E(ȲB,N) = E[(1/3)(Y1 + Y2 + Y3)] = (1/3)(E[Y1] + E[Y2] + E[Y3]) = (1/3)(µ + µ + µ) = µ
Bias of ȲC,N:
E(ȲC,N) = E[(1/(N − 1)) ∑_{i=1}^N Yi] = (1/(N − 1)) ∑_{i=1}^N E(Yi) = (1/(N − 1)) · Nµ = (N/(N − 1)) µ ≠ µ
Hence, the bias of ȲC,N is (N/(N − 1))µ − µ = µ/(N − 1).
Consistency of ȲC,N:
plim(ȲC,N) = plim((1/(N − 1)) ∑_{i=1}^N Yi) = plim((N/(N − 1)) · (1/N) ∑_{i=1}^N Yi)
= plim(N/(N − 1)) · plim((1/N) ∑_{i=1}^N Yi) = 1 · µ = µ
Inconsistency of ȲB,N:
plim(ȲB,N) = (1/3)(Y1 + Y2 + Y3),
which is different from µ with positive probability unless σ² = 0.¹
¹One can show that a limit in probability is unique up to modifications of probability 0.
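These findings can be checked by simulation; here is a minimal sketch (assuming N(µ, 1) data with µ = 1, an arbitrary choice for illustration). ȲA,N and ȲC,N concentrate around µ as N grows, ȲB,N keeps a constant spread, and ȲD,N drifts off.

```r
# Monte Carlo comparison of the four estimators of mu under N(1, 1) data.
set.seed(7)
mu <- 1; reps <- 5000

estimates <- function(N) {
  Y <- rnorm(N, mean = mu)
  c(A = mean(Y),               # (1) sample average
    B = mean(Y[1:3]),          # (2) uses only the first three observations
    C = sum(Y) / (N - 1),      # (3) wrong denominator: biased but consistent
    D = sum(Y) / sqrt(N))      # (4) wrong scaling: diverges with N
}

for (N in c(10, 1000)) {
  sims <- replicate(reps, estimates(N))   # 4 x reps matrix of estimates
  cat(sprintf("N = %d:\n", N))
  print(round(rbind(mean = rowMeans(sims), sd = apply(sims, 1, sd)), 3))
}
```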
Asymptotic normality
An estimator WN is called asymptotically normal if the standardized estimator
ZN := (WN − E(WN)) / √Var(WN) satisfies P(ZN ≤ z) → Φ(z) as N → ∞, for every z ∈ R,
where Φ is the cdf (cumulative distribution function) of the standard normal distribution.
Asymptotic normality of most estimators is established using the Central Limit Theorem (CLT):
Let the random variables Yi, i = 1, 2, ..., be i.i.d. with expectation µ and variance σ², and denote ȲN = (1/N) ∑_{i=1}^N Yi. Then,
ZN = (ȲN − E(ȲN)) / √Var(ȲN) = √N (ȲN − µ) / σ
satisfies P(ZN ≤ z) → Φ(z), for every z ∈ R, which we also denote as ZN ∼ᵃ N(0, 1).
Verbally: The sample average is approximately normally distributed in large samples, regardless of the distribution from which it was drawn (provided an expectation and a variance exist).
The CLT shows that ZN = √N (ȲN − µ) / σ ∼ᵃ N(0, 1).
The LLN “only” shows that plim (ȲN − µ)/σ = 0.
⇒ Multiplication by √N ensures that the variance stays constant (here: equal to 1) as N → ∞.
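A minimal simulation sketch of the CLT (my own example: Exp(1) data, so µ = σ = 1 and the parent distribution is clearly non-normal): the standardized sample average is compared with standard normal quantiles.

```r
# CLT illustration: standardized sample averages of Exp(1) draws
# (mu = 1, sigma = 1) should be approximately N(0, 1).
set.seed(99)
N <- 200; reps <- 10000
Z <- replicate(reps, sqrt(N) * (mean(rexp(N, rate = 1)) - 1) / 1)

probs <- c(0.025, 0.25, 0.5, 0.75, 0.975)
print(round(rbind(empirical = quantile(Z, probs),   # simulated quantiles
                  normal    = qnorm(probs)), 3))    # N(0, 1) quantiles

hist(Z, breaks = 50, freq = FALSE, main = "Standardized sample average")
curve(dnorm(x), add = TRUE, lwd = 2)   # overlay the N(0, 1) density
```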
We have used two convergence properties for sequences of random variables today.
1 Convergence in probability for consistency.
2 Asymptotic normality, which is a special case of “convergence in distribution”.
How do these concepts relate to each other? I won’t discuss this much beyond pointing out that:
If plim WN = w (limit non-random) and if ZN ∼ᵃ N(0, 1), then
WN ZN ∼ᵃ N(0, w²)   and   WN + ZN ∼ᵃ N(w, 1).
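A last simulation sketch (my own illustration, with w = 2 and Exp(1) data as above) of the second statement: WN + ZN should look approximately N(w, 1).

```r
# W_N has plim w = 2; Z_N is asymptotically N(0, 1) by the CLT.
# Their sum should then be approximately N(w, 1) for large N.
set.seed(5)
N <- 500; reps <- 10000; w <- 2
S <- replicate(reps, {
  W_N <- mean(rnorm(N, mean = w))                 # plim W_N = w
  Z_N <- sqrt(N) * (mean(rexp(N, rate = 1)) - 1)  # approx N(0, 1)
  W_N + Z_N
})
cat(sprintf("mean(S) = %.3f (target %.1f),  sd(S) = %.3f (target 1)\n",
            mean(S), w, sd(S)))
```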
Notebook12.nb.html (Self-study)