Lecture 8

Advanced Econometrics I
Jürgen Meinecke
Lecture 8 of 12
Research School of Economics, Australian National University
1 / 32
Roadmap
Instrumental Variables Estimation

Two Stage Least Squares (2SLS) Estimator
Large Sample Properties of β̂2SLS
Bias of 2SLS Estimator
Invalid Instruments
2 / 32
We are finally re-connecting with the week 5 lecture
We combined the structural and first stage equations like so:
Yi = Xi0 β + ei
= (π 0 Zi + vi )0 β + ei
= Zi0 λ + wi ,
with λ := πβ and wi := vi0 β + ei

Recall the two reduced form projection coefficients
• regressing Yi on Zi results in λ = E(Zi Zi0 )−1 E(Zi Yi )

• regressing Xi on Zi results in π = E(Zi Zi0 )−1 E(Zi Xi0 )
Let’s recall their dimensions
• dim λ = L × 1
• dim π = L × K
3 / 32
We learned that the projection coefficients λ and π are
identified because they are explicit functions of population
moments
This means we can uniquely estimate them
Practically we treat them as if they were known to us (because
we have faith in uniquely estimating them via analog principle)
In contrast, identification of β is not so easy because λ = πβ
is a system of L equations for K unknowns
Linear algebra tells you that there
• are no solutions or infinitely many solutions if L < K

• is hope for unique solution only if L ≥ K
So let’s only consider L ≥ K
4 / 32
Two sub-cases here
• L=K
then dim π = K × K and if it is invertible then
β = π −1 λ = E(Zi Xi0 )−1 E(Zi Yi )
This solution for β motivates the IV estimator

• L > K then we cannot simply invert, but we can do this:
πβ = λ ⇔ π 0 πβ = π 0 λ
and therefore
β = ( π 0 π ) −1 π 0 λ
This seems straightforward, however the 2SLS estimator is

not (π̂ 0 π̂ )−1 π̂ 0 λ̂
It has a different motivation
5 / 32
Again looking at our structural equation and plugging in the
first stage
Yi = Xi0 β + ei
= (π 0 Zi + vi )0 β + ei
= Zi0 πβ + (vi0 β + ei )
= Zi0 πβ + wi
If you knew π you could define Z̃i0 = Zi0 π and write

Yi = Z̃i0 β + wi ,
where E(Z̃i wi ) = 0
Clearly, OLS would work fine here
6 / 32
Notice that dim Z̃i = dim Xi = K × 1
The corresponding matrix Z̃ := Zπ with
dim Z̃ = dim X = N × K
The OLS estimator is
−1 0
β̂i2SLS := Z̃0 Z̃ Z̃ Y
−1 0
= Z̃0 Z̃

Z̃ Y
−1 0 0
= π 0 Z0 Zπ πZY
This OLS estimator is infeasible because we don’t know π

But we can turn it into a feasible estimator by plugging in the
consistent estimator π̂ := (Z0 Z)−1 Z0 X
And this is indeed what the 2SLS estimator does
7 / 32
Definition (Two Stage Least Squares (2SLS) Estimator)
−1 0 0
β̂2SLS := π̂ 0 Z0 Zπ̂ π̂ Z Y
−1
= X 0 Z (Z0 Z ) −1 Z0 X X 0 Z (Z0 Z ) −1 Z0 Y
In summation  notation:! ! −1 !  −1
N N N
β̂2SLS =  ∑ Xi Zi0 ∑ Zi Zi0 ∑ Zi Xi0  ×
i=1 i=1 i=1
! ! −1 !
N N N
∑ Xi Zi0 ∑ Zi Zi0 ∑ Zi Yi
i=1 i=1 i=1
8 / 32
Three different interpretations of β̂2SLS
Recall PZ := Z(Z0 Z)−1 Z0 is the symmetric and idempotent
projection matrix
Then X̂ := PZ X is the projection of X on Z
It follows −1
β̂2SLS = X0 Z(Z0 Z)−1 Z0 X X 0 Z (Z0 Z ) −1 Z0 Y
−1 0
= X0 PZ X X PZ Y (1)
−1
= (PZ X)0 X (PZ X)0 Y

−1 0
= X̂0 X X̂ Y (2)
−1 0
= X0 PZ PZ X X PZ Y
−1
= (PZ X)0 (PZ X) ( PZ X ) 0 Y
−1 0
= X̂0 X̂ X̂ Y (3)
9 / 32
Equation (1) is the most common matrix representation of β̂2SLS
in textbooks and notes
Equation (2) presents the 2SLS estimator as an IV estimator, it
has the same structure as β̂IV with X̂ used in place of Z
Equation (3) presents the 2SLS estimator as an OLS estimator of
Y on X̂
This third interpretation justifies the use of “two stage least
squares”:
(1) regress X on Z, obtain π̂ = (Z0 Z)−1 Z0 X and X̂ = Zπ̂ = PZ X

(2) regress Y on X̂ and obtain β̂2SLS = (X̂0 X̂)−1 X̂0 Y
10 / 32
Roadmap

Invalid Instruments
11 / 32
Proposition (Consistency of β̂2SLS )
p
β̂2SLS → β.
Proposition (Asymptotic Distribution of β̂2SLS )

−1
Let CXZ = E(Xi Zi0 ), and CZZ = E(Zi Zi0 ), and CZX = E(Zi Xi0 ). Then
√ d
N ( β̂2SLS − β) → N (0, Ω)
where
−1 −1 −1 −1
Ω = (CXZ CZZ CZX )−1 CXZ CZZ E(e2i Zi Zi0 )CZZ CZX (CXZ CZZ CZX )−1
Corollary
−1
Under homoskedasticity, Ω = σe2 (CXZ CZZ CZX )−1 .
12 / 32
Consistent estimators for the asymptotic covariances are
readily obtained by using the analogy principle
So replace population moments by sample moments, because
N
∑ Xi Zi0 /N = CXZ + op (1)
i=1
N
∑ Zi Zi0 /N = CZZ
−1
+ op (1)
i=1
N
∑ Zi Xi0 /N = CZX + op (1)
i=1
N
∑ Zi Zi0 ê2i /N = E(Zi Zi0 e2i ) + op (1)
i=1
where êi := Yi − Xi0 β̂2SLS

The resulting covariance matrix estimator will be consistent
13 / 32
Roadmap

Invalid Instruments
14 / 32
Let’s look at the model
Yi = Xi β + ei
Xi = Zi0 π + vi ,
where Xi is a scalar and dim Zi = L ≥ 1

Let (ei , vi ) ∼ N (0, Σ)
(that is, we assume an exact bivariate normal distribution)
2SLS estimation makes sense here because E(ei Xi ) 6= 0
Our goal here is to study the bias of β̂2SLS
Spoiler alert: β̂2SLS is biased
We want to better understand that bias
15 / 32
We know that
√ 2SLS √1 X0 Z(Z0 Z)−1 Z0 e
N
N β̂ −β = 1 0 0 −1 0
N X Z(Z Z) Z X
Let’s dissect numerator and denominator

We will rely on the following substitution: given X = Zπ + v
• Z0 X = Z0 Zπ + Z0 v
• X 0 Z = π 0 Z0 Z + v0 Z
These can be used in numerator and denominator

Also, to make life easier, let’s pretend that
• Zi
• Ξ : = ∑N 0
i=1 Zi Zi /N
are non-stochastic (we treat them as constants)

16 / 32
Turning first to the denominator
1 0 0 −1 0 1 0 0 2 0 0 1 0 0 −1 0
N X Z(Z Z) Z X = N π Z Zπ + N π Z v + N v Z(Z Z) Z v
Looking at the individual terms

1 0 0 0
N π Z Zπ = π Ξπ = O(1)
4σv2 0 0

2 0 0
N π Z v ∼ N 0, N2
π Z Zπ
= √2N N 0, σv2 π 0 Ξπ

= √2 Op (1) = Op √1
N N

1 0 1 2 2 1 2 1 √1

N v PZ v ∼ N σv χ (L) = N σv Op (1) = Op N = op N
where I use the lemma:

If P ∼ N (0, IN ) then P0 QP ∼ χ2 (tr (Q))
17 / 32
Then turning to the numerator
√1 X0 Z(Z0 Z)−1 Z0 e = √1 π 0 Z0 e + √1 v0 PZ e
N N N
= √1 π 0 Z0 e + √1 σev2 v0 PZ v + √1 v0 PZ w
N N σv N
σev
where I use the projection ei = v
σv2 i
+ wi with E(vi wi ) = 0
Looking at the individual terms
√1 π 0 Z0 e ∼ N 0, σe2 π 0 Ξπ = Op (1)

N

√1 σev2 v0 PZ v ∼ √1 σev χ2 (L) = Op √1
N σv N N

1 0
√ v PZ w = Op √ 1
N N
The last line is justified because v0 PZ w is a well behaved rv with

E (v0 PZ w) = 0 and Var (v0 PZ w) = σv2 σw2 L < ∞, which is not
difficult (but tedious) to show
(vi , wi are independent normal rvs, their product is non-normal)
18 / 32
Now we’ve got the main building blocks in place
Putting things together and applying an asymptotic
approximation from Hahn and Kuersteiner (2002)
√ √1 π 0 Z0 e + √1 v0 PZ e
2SLS N N
N ( β̂ − β) =
π 0 Ξπ + N π 0 Z0 v + op √1
2
N
√1 π 0 Z0 e + √1 v0 PZ e √1 π 0 Z0 e
N N N 2 0 0
≈ − πZv
π 0 Ξπ (π 0 Ξπ )2 N
Big picture:
We want the study the expected value of β̂2SLS
We have done the hard work already, now we can derive the
expected value of the rhs
19 / 32
√
E N ( β̂2SLS − β)
√1 π 0 Z0 e + √1 v0 PZ e √1 π 0 Z0 e
!
N N N 2 0 0
≈E − πZv
π 0 Ξπ (π 0 Ξπ )2 N
1 L 2 1
=√ σev − √ σev
N π Ξπ
0
N π Ξπ
0
1 L−2
=√ σev
N π 0 Ξπ
where I have used two useful facts:

E(χ2 (L)) = L
E π 0 Z0 eπ 0 Z0 v = E π 0 Z0 ev0 Zπ = π 0 Z0 Zπσev = Nπ 0 Ξπσev

20 / 32
We have successfully approximated the bias of the 2SLS
estimator:
1 L−2 L−2 L − 2 σev
E β̂2SLS − β ≈ σev = 0 0 σev = ,
N π 0 Ξπ π Z Zπ µ2 σv2
where µ2 := π 0 Z0 Zπ/σv2 is the concentration parameter

The concentration parameter is a measure of strength of the
instruments
If instruments are weak, in the sense of µ2 ≈ 0, then we suspect
a large bias for the 2SLS estimator
We will pursue this further, both analytically and
computationally
21 / 32
Roadmap

Invalid Instruments
22 / 32
Consider the simple scalar model
Yi = Xi β + ei
Xi = Zi π + vi
In other words: K1 = 0, K2 = L2 = L = 1
Recall that π = E(Xi Zi )/E(Z2i )
What happens when E(Xi Zi ) = 0 so that π = 0?
Let’s label this case invalid instrument
Using Zi as an IV doesn’t make sense because it isn’t one
23 / 32
To make life easy,
! let’s
! assume !
ei 1 ρ
Var |Zi =
vi ρ 1
as well as EZi = 0 and EZ2i = 1

Endogeneity, of course, implies ρ 6= 0
Let’s say, you recognize that Zi isn’t really an IV and you
decide to resort to OLS instead
∑N
i=1 Xi ei N −1 ∑ N
i=1 vi ei p E(vi ei )
β̂OLS − β = = → = ρ 6= 0
N
∑i=1 Xi2 N −1 ∑ N 2
i = 1 vi E(v2i )
So β̂OLS is not consistent, which we knew already

Can the instrument help, although it is invalid?
24 / 32
N −1 ∑ N
i=1 Zi ei p E(Zi ei ) 0
β̂IV − β = N
→ = ,
N ∑i=1 Xi Zi
− 1 E ( X Z
i i ) 0
which is indeterminate
Notice that ! ! !!
1 N Z i ei ξ1 1 ρ
√ ∑
d
→ ∼N 0,
N i=1 Zi vi ξ2 ρ 1
Here Cov(ξ 1 , ξ 2 ) = E(ξ 1 ξ 2 ) = ρ

Then define ξ 0 := ξ 1 − ρξ 2
This makes Cov(ξ 0 , ξ 2 ) = E(ξ 0 ξ 2 ) = 0,
meaning ξ 0 and ξ 2 are independent
(joint normal and zero covariance implies independence)
25 / 32
Let’s take another look now:
√1 ∑N Zi ei
IV N i=1 d ξ1 ξ0
β̂ −β= 1 N
→ = ρ+
√ ∑ XZ ξ2 ξ2
N i=1 i i
(applying the continuous mapping theorem)

The ratio of two independently normally distributed rvs is
Cauchy distributed
The Cauchy distribution is nasty
26 / 32
We’ve learned that using β̂IV when Zi isn’t a valid IV results in
an estimator β̂IV that
• does not converge in probability

• instead converges to a Cauchy distribution
(which has infinite mean and thick tails)
• has a median of β + ρ
Let’s say, you ignore all that and use an IV based t test anyway
What will happen?
27 / 32
What happens to the β̂IV -based t statistic under invalid
instruments?
Recall the generic t statistic that is based on an estimator β̂:
β̂ − β
tβ̂ ( β) =
se( β̂)
Let’s make our lives easy and consider the standard error of β̂IV
under homoskedasticity
The estimator of the asymptotic variance for β̂IV is
∑ N Z2
Var β̂IV |Zi = σ̂e2 N i=1 i
( ∑ i = 1 Xi Zi ) 2
therefore q
IV
∑N 2
i=1 Zi
se( β̂ ) = σ̂e
∑N
i=1 Xi Zi
28 / 32
Notice
N N 2
σ̂e2 = N −1 ∑ (Yi − Xi β̂IV )2 = N −1 ∑ Xi ( β − β̂IV ) + ei
i=1 i=1
N N N
= N −1 ∑ e2i − 2N −1 ∑ Xi ei ( β̂IV − β) + N −1 ∑ Xi2 ( β̂IV − β)2
i=1 i=1 i=1
2
d ξ1 ξ1
→ 1 − 2ρ +
ξ2 ξ2
It follows that
∑Ni=1 Zi ei
β̂IV − β ∑N
√1 ∑N
i = 1 Z i ei
i=1 Xi Zi N
tβ̂IV ( β) = = √ = q
se( β̂IV ) σ̂e ∑Ni=1 Zi
2
1
∑N 2
N
σ̂e N i=1 Zi
∑i=1 Xi Zi
√1
N
∑N
i=1 Zi ei d ξ1
= q →r 2 = : S( ρ )
σ̂e 1 + op (1) 1 − 2ρ ξξ 12 + ξ1
ξ2
29 / 32
Copy and paste last line from previous slide:
d ξ1
tβ̂IV ( β) → r 2 = : S( ρ )
1 − 2ρ ξ 2 + ξξ 12
ξ1
What does this mean?

The t statistic does NOT converge to a normal distribution
So we can’t simply compare it to the ±1.96 cutoffs
The asymptotic distribution of t depends on ρ, a parameter that
we don’t know and cannot estimate
• ρ is the degree of endogeneity
To get more intuition about what’s going on, let’s send ρ to 1

which is the worst possible case of endogeneity
30 / 32
The closer ρ → 1, the closer ξ 1 and ξ 2 will resemble each other
Weird things will happen in the limit case as ρ → 1:
p
• ξ1 → ξ2
p
• σ̂e2 → 0
p
• se( β̂IV ) → 0
• S( ρ ) → ∞
• and ultimately the t statistic converges in probability to ∞
That can’t be good

It means, that you are mechanically rejecting H0 irrespective of
the true value of β
Hansen puts it nicely in his book:
. . . users may incorrectly interpret estimates as precise, despite the
fact that they are useless.
31 / 32
Put slightly differently:
• the t statistic based on β̂IV when instruments are invalid is
deceivingly optimistic
• it tends to be large suggesting a nonzero coefficient
• irrespective of the true value of β
• the large t statistic is merely an artifact of the breakdown
of the asymptotic normal distribution
In the case π = 0, perhaps better to use OLS instead of IV?

Problem: in applications you don’t usually know that π = 0
Anyway, maybe the case π = 0 is too extreme and produces
problems that are too dramatic
Let’s study a case that is less extreme and therefore, maybe, less
dramatic: π 6= 0 but π ≈ 0 (so-called weak instruments)
32 / 32

Lecture 8

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Lecture 8

Uploaded by

Copyright:

Available Formats

Advanced Econometrics I

Instrumental Variables Estimation

with λ := πβ and wi := vi0 β + ei

• regressing Yi on Zi results in λ = E(Zi Zi0 )−1 E(Zi Yi )

Let’s recall their dimensions

• are no solutions or infinitely many solutions if L < K

So let’s only consider L ≥ K

This solution for β motivates the IV estimator

This seems straightforward, however the 2SLS estimator is

If you knew π you could define Z̃i0 = Zi0 π and write

This OLS estimator is infeasible because we don’t know π

(1) regress X on Z, obtain π̂ = (Z0 Z)−1 Z0 X and X̂ = Zπ̂ = PZ X

Instrumental Variables Estimation

Proposition (Asymptotic Distribution of β̂2SLS )

where êi := Yi − Xi0 β̂2SLS

Instrumental Variables Estimation

where Xi is a scalar and dim Zi = L ≥ 1

Let’s dissect numerator and denominator

These can be used in numerator and denominator

are non-stochastic (we treat them as constants)

Looking at the individual terms

where I use the lemma:

The last line is justified because v0 PZ w is a well behaved rv with

where I have used two useful facts:

where µ2 := π 0 Z0 Zπ/σv2 is the concentration parameter

Instrumental Variables Estimation

as well as EZi = 0 and EZ2i = 1

So β̂OLS is not consistent, which we knew already

Here Cov(ξ 1 , ξ 2 ) = E(ξ 1 ξ 2 ) = ρ

(applying the continuous mapping theorem)

• does not converge in probability

What does this mean?

• ρ is the degree of endogeneity

To get more intuition about what’s going on, let’s send ρ to 1

That can’t be good

In the case π = 0, perhaps better to use OLS instead of IV?

You might also like