Professional Documents
Culture Documents
Lecture 8
Lecture 8
Jürgen Meinecke
Lecture 8 of 12
Research School of Economics, Australian National University
1 / 32
Roadmap
2 / 32
We are finally re-connecting with the week 5 lecture
We combined the structural and first stage equations like so:
Yi = Xi0 β + ei
= (π 0 Zi + vi )0 β + ei
= Zi0 λ + wi ,
• dim λ = L × 1
• dim π = L × K
3 / 32
We learned that the projection coefficients λ and π are
identified because they are explicit functions of population
moments
This means we can uniquely estimate them
Practically we treat them as if they were known to us (because
we have faith in uniquely estimating them via analog principle)
In contrast, identification of β is not so easy because λ = πβ
is a system of L equations for K unknowns
Linear algebra tells you that there
4 / 32
Two sub-cases here
• L=K
then dim π = K × K and if it is invertible then
β = π −1 λ = E(Zi Xi0 )−1 E(Zi Yi )
5 / 32
Again looking at our structural equation and plugging in the
first stage
Yi = Xi0 β + ei
= (π 0 Zi + vi )0 β + ei
= Zi0 πβ + (vi0 β + ei )
= Zi0 πβ + wi
where E(Z̃i wi ) = 0
Clearly, OLS would work fine here
6 / 32
Notice that dim Z̃i = dim Xi = K × 1
The corresponding matrix Z̃ := Zπ with
dim Z̃ = dim X = N × K
The OLS estimator is
−1 0
β̂i2SLS := Z̃0 Z̃ Z̃ Y
−1 0
= Z̃0 Z̃
Z̃ Y
−1 0 0
= π 0 Z0 Zπ πZY
7 / 32
Definition (Two Stage Least Squares (2SLS) Estimator)
−1 0 0
β̂2SLS := π̂ 0 Z0 Zπ̂ π̂ Z Y
−1
= X 0 Z (Z0 Z ) −1 Z0 X X 0 Z (Z0 Z ) −1 Z0 Y
In summation notation:! ! −1 ! −1
N N N
β̂2SLS = ∑ Xi Zi0 ∑ Zi Zi0 ∑ Zi Xi0 ×
i=1 i=1 i=1
! ! −1 !
N N N
∑ Xi Zi0 ∑ Zi Zi0 ∑ Zi Yi
i=1 i=1 i=1
8 / 32
Three different interpretations of β̂2SLS
Recall PZ := Z(Z0 Z)−1 Z0 is the symmetric and idempotent
projection matrix
Then X̂ := PZ X is the projection of X on Z
It follows −1
β̂2SLS = X0 Z(Z0 Z)−1 Z0 X X 0 Z (Z0 Z ) −1 Z0 Y
−1 0
= X0 PZ X X PZ Y (1)
−1
= (PZ X)0 X (PZ X)0 Y
−1 0
= X̂0 X X̂ Y (2)
−1 0
= X0 PZ PZ X X PZ Y
−1
= (PZ X)0 (PZ X) ( PZ X ) 0 Y
−1 0
= X̂0 X̂ X̂ Y (3)
9 / 32
Equation (1) is the most common matrix representation of β̂2SLS
in textbooks and notes
Equation (2) presents the 2SLS estimator as an IV estimator, it
has the same structure as β̂IV with X̂ used in place of Z
Equation (3) presents the 2SLS estimator as an OLS estimator of
Y on X̂
This third interpretation justifies the use of “two stage least
squares”:
10 / 32
Roadmap
11 / 32
Proposition (Consistency of β̂2SLS )
p
β̂2SLS → β.
√ d
N ( β̂2SLS − β) → N (0, Ω)
where
−1 −1 −1 −1
Ω = (CXZ CZZ CZX )−1 CXZ CZZ E(e2i Zi Zi0 )CZZ CZX (CXZ CZZ CZX )−1
Corollary
−1
Under homoskedasticity, Ω = σe2 (CXZ CZZ CZX )−1 .
12 / 32
Consistent estimators for the asymptotic covariances are
readily obtained by using the analogy principle
So replace population moments by sample moments, because
N
∑ Xi Zi0 /N = CXZ + op (1)
i=1
N
∑ Zi Zi0 /N = CZZ
−1
+ op (1)
i=1
N
∑ Zi Xi0 /N = CZX + op (1)
i=1
N
∑ Zi Zi0 ê2i /N = E(Zi Zi0 e2i ) + op (1)
i=1
13 / 32
Roadmap
14 / 32
Let’s look at the model
Yi = Xi β + ei
Xi = Zi0 π + vi ,
15 / 32
We know that
√ 2SLS √1 X0 Z(Z0 Z)−1 Z0 e
N
N β̂ −β = 1 0 0 −1 0
N X Z(Z Z) Z X
• Z0 X = Z0 Zπ + Z0 v
• X 0 Z = π 0 Z0 Z + v0 Z
• Zi
• Ξ : = ∑N 0
i=1 Zi Zi /N
= √2N N 0, σv2 π 0 Ξπ
= √2 Op (1) = Op √1
N N
1 0 1 2 2 1 2 1 √1
N v PZ v ∼ N σv χ (L) = N σv Op (1) = Op N = op N
17 / 32
Then turning to the numerator
√1 X0 Z(Z0 Z)−1 Z0 e = √1 π 0 Z0 e + √1 v0 PZ e
N N N
= √1 π 0 Z0 e + √1 σev2 v0 PZ v + √1 v0 PZ w
N N σv N
σev
where I use the projection ei = v
σv2 i
+ wi with E(vi wi ) = 0
Looking at the individual terms
√1 π 0 Z0 e ∼ N 0, σe2 π 0 Ξπ = Op (1)
N
√1 σev2 v0 PZ v ∼ √1 σev χ2 (L) = Op √1
N σv N N
1 0
√ v PZ w = Op √ 1
N N
Big picture:
We want the study the expected value of β̂2SLS
We have done the hard work already, now we can derive the
expected value of the rhs
19 / 32
√
E N ( β̂2SLS − β)
√1 π 0 Z0 e + √1 v0 PZ e √1 π 0 Z0 e
!
N N N 2 0 0
≈E − πZv
π 0 Ξπ (π 0 Ξπ )2 N
1 L 2 1
=√ σev − √ σev
N π Ξπ
0
N π Ξπ
0
1 L−2
=√ σev
N π 0 Ξπ
20 / 32
We have successfully approximated the bias of the 2SLS
estimator:
1 L−2 L−2 L − 2 σev
E β̂2SLS − β ≈ σev = 0 0 σev = ,
N π 0 Ξπ π Z Zπ µ2 σv2
21 / 32
Roadmap
22 / 32
Consider the simple scalar model
Yi = Xi β + ei
Xi = Zi π + vi
In other words: K1 = 0, K2 = L2 = L = 1
Recall that π = E(Xi Zi )/E(Z2i )
What happens when E(Xi Zi ) = 0 so that π = 0?
Let’s label this case invalid instrument
Using Zi as an IV doesn’t make sense because it isn’t one
23 / 32
To make life easy,
! let’s
! assume !
ei 1 ρ
Var |Zi =
vi ρ 1
∑N
i=1 Xi ei N −1 ∑ N
i=1 vi ei p E(vi ei )
β̂OLS − β = = → = ρ 6= 0
N
∑i=1 Xi2 N −1 ∑ N 2
i = 1 vi E(v2i )
24 / 32
N −1 ∑ N
i=1 Zi ei p E(Zi ei ) 0
β̂IV − β = N
→ = ,
N ∑i=1 Xi Zi
− 1 E ( X Z
i i ) 0
which is indeterminate
Notice that ! ! !!
1 N Z i ei ξ1 1 ρ
√ ∑
d
→ ∼N 0,
N i=1 Zi vi ξ2 ρ 1
25 / 32
Let’s take another look now:
√1 ∑N Zi ei
IV N i=1 d ξ1 ξ0
β̂ −β= 1 N
→ = ρ+
√ ∑ XZ ξ2 ξ2
N i=1 i i
26 / 32
We’ve learned that using β̂IV when Zi isn’t a valid IV results in
an estimator β̂IV that
Let’s say, you ignore all that and use an IV based t test anyway
What will happen?
27 / 32
What happens to the β̂IV -based t statistic under invalid
instruments?
Recall the generic t statistic that is based on an estimator β̂:
β̂ − β
tβ̂ ( β) =
se( β̂)
Let’s make our lives easy and consider the standard error of β̂IV
under homoskedasticity
The estimator of the asymptotic variance for β̂IV is
∑ N Z2
Var β̂IV |Zi = σ̂e2 N i=1 i
( ∑ i = 1 Xi Zi ) 2
therefore q
IV
∑N 2
i=1 Zi
se( β̂ ) = σ̂e
∑N
i=1 Xi Zi
28 / 32
Notice
N N 2
σ̂e2 = N −1 ∑ (Yi − Xi β̂IV )2 = N −1 ∑ Xi ( β − β̂IV ) + ei
i=1 i=1
N N N
= N −1 ∑ e2i − 2N −1 ∑ Xi ei ( β̂IV − β) + N −1 ∑ Xi2 ( β̂IV − β)2
i=1 i=1 i=1
2
d ξ1 ξ1
→ 1 − 2ρ +
ξ2 ξ2
It follows that
∑Ni=1 Zi ei
β̂IV − β ∑N
√1 ∑N
i = 1 Z i ei
i=1 Xi Zi N
tβ̂IV ( β) = = √ = q
se( β̂IV ) σ̂e ∑Ni=1 Zi
2
1
∑N 2
N
σ̂e N i=1 Zi
∑i=1 Xi Zi
√1
N
∑N
i=1 Zi ei d ξ1
= q →r 2 = : S( ρ )
σ̂e 1 + op (1) 1 − 2ρ ξξ 12 + ξ1
ξ2
29 / 32
Copy and paste last line from previous slide:
d ξ1
tβ̂IV ( β) → r 2 = : S( ρ )
1 − 2ρ ξ 2 + ξξ 12
ξ1
30 / 32
The closer ρ → 1, the closer ξ 1 and ξ 2 will resemble each other
Weird things will happen in the limit case as ρ → 1:
p
• ξ1 → ξ2
p
• σ̂e2 → 0
p
• se( β̂IV ) → 0
• S( ρ ) → ∞
• and ultimately the t statistic converges in probability to ∞