
Econometrics II

Endogeneity and the GMM


Examples of Endogeneity
Instruments
Generalized Method of Moments (GMM)
Two-Stage Least Squares Estimation (2SLS)
Asymptotic Distribution of the GMM Estimator
Hypothesis Testing
Testing Overidentifying Restrictions
Testing Subsets of Orthogonality Conditions

1 / 50

Endogeneity and the GMM [Hayashi]


Introduction

Consider yi = β1 xi1 + β2 xi2 + ... + βK xiK + εi .

Definition (Endogenous Regressor)


We say that xij (the j-th regressor) is endogenous if

Cov(xij, εi) ≠ 0 (or E(xij εi) ≠ 0).

It follows that E(xi εi) ≠ 0 and E(εi | xi) ≠ 0. If Cov(xij, εi) = 0 (or
E(xij εi) = 0), ∀i, we say that xij (the j-th regressor) is predetermined
or orthogonal to the error term εi.

If the regressors are endogenous we have, under Assumptions
LS1 and LS2,

b = β + ( (1/n) ∑_{i=1}^n xi xi′ )^{-1} (1/n) ∑_{i=1}^n xi εi →p β + Q^{-1} E(xi εi) ≠ β

since E(xi εi) ≠ 0. The term Q^{-1} E(xi εi) is the asymptotic bias.
2 / 50
Endogeneity and the GMM [Hayashi]
Introduction

Example (simple regression model): Consider

yi = β1 + β2 xi2 + εi ,  E(xi2 εi) ≠ 0.

Then under Assumptions LS1 and LS2

b →p β + Q^{-1} E(xi εi)

and

Q^{-1} E(xi εi) = ... = ( −E(xi2) Cov(xi2, εi)/Var(xi2) ,  Cov(xi2, εi)/Var(xi2) )′

(see Exercise 1 of Exercise Sheet 3). If xi2 is endogenous, b1 and b2 are
inconsistent.

3 / 50

Correlation between error terms and regressors

The endogeneity problem is endemic in the social sciences/economics:

Measurement error in the regressors may lead to endogeneity.
Sample selection.
In many cases economists are not interested in estimating the
mean of a variable conditional on regressors; they are interested
in estimating the parameters of models (usually, but not always,
derived from economic theory) which they believe generated
the data. In these types of models we can have endogeneity for
two reasons:
1 important explanatory variables are not included in the regression
model because they cannot be observed;
2 simultaneity.
The instrumental variables (IV) method is the most well-known
method to address endogeneity problems.

4 / 50
Examples of Endogeneity
Errors-in-Variables Bias

Example: We will see that a predetermined regressor necessarily
becomes endogenous when measured with error. This problem is
ubiquitous, particularly in micro data on households. Consider

yi = β1 + β2 xi2* + ui

where xi2* is a predetermined regressor. The variable xi2* is measured
with error: we observe

xi2 = xi2* + vi .

Assume further that
E(xi2* ui) = E(xi2* vi) = E(vi ui) = 0. In terms of the observed
regressor, the regression equation is

yi = β1 + β2 xi2 + εi ,  εi = ui − β2 vi .

Assuming LS1 and LS2 we have after some calculations (see Exercise
7 of Exercise Sheet 3):

b2 →p β2 + Cov(xi2, εi)/Var(xi2) = β2 − β2 E(vi²)/Var(xi2).

5 / 50
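
A quick way to see this attenuation result is by simulation. The following Python/NumPy sketch (all numeric values are illustrative assumptions, not taken from the slides) generates a mismeasured regressor and compares the OLS slope with the probability limit β2 − β2 E(vi²)/Var(xi2):

# Monte Carlo sketch of errors-in-variables (attenuation) bias;
# the parameter values below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, beta1, beta2 = 100_000, 0.5, 1.0
x_star = rng.normal(0.0, 2.0, n)      # true (predetermined) regressor, Var = 4
u = rng.normal(0.0, 1.0, n)           # structural error
v = rng.normal(0.0, 1.0, n)           # measurement error, Var(v) = 1
y = beta1 + beta2 * x_star + u
x = x_star + v                        # observed, mismeasured regressor

b2 = np.polyfit(x, y, 1)[0]           # OLS slope of y on the observed x
plim = beta2 - beta2 * v.var() / x.var()
print(b2, plim)                       # both near 0.8, not beta2 = 1

With this design Var(xi2) = 4 + 1 = 5, so the slope converges to 1 − 1/5 = 0.8 rather than to β2 = 1.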

Examples of Endogeneity
Sample Selection Bias

Example: “Dewey defeats Truman” was an incorrect banner headline
on the front page of the Chicago Daily Tribune on November 3, 1948,
the day after Harry Truman won an upset victory over the
Republican candidate Thomas Dewey in the 1948 presidential
election. The reason the newspaper was mistaken is that its editor
trusted the results of a phone survey. Survey research was then in its
infancy, and few academics realized that a sample of telephone users
was not representative of the general population. Telephones were
not yet widespread, and those who had them tended to be
prosperous and Dewey supporters. There are many types of sample
selection bias: self-selection bias, survivorship bias, etc. (these are
beyond the scope of our syllabus).

6 / 50
Examples of Endogeneity
Sample Selection Bias

Sample selection corresponds to nonrandom sampling from a
cross-sectional population, which is conveniently viewed as follows:
we randomly draw (yi, xi) from the population, but it is not always
(fully) observed. Let si denote a binary selection indicator: si = 1 if
(yi, xi) is observed, si = 0 otherwise. Therefore, our sample consists of
{(yi, xi, si), i = 1, ..., n}, where the value of si determines whether we
observe all elements of (yi, xi, si).
The OLS estimator depends on whether we observe all elements of
(yi, xi, si). Thus

b = ( ∑_{i=1}^n si xi xi′ )^{-1} ∑_{i=1}^n si xi yi = ...
  = β + ( (1/n) ∑_{i=1}^n si xi xi′ )^{-1} (1/n) ∑_{i=1}^n si xi εi

7 / 50

Examples of Endogeneity
Sample Selection Bias

Under LS1 and assuming that (1/n) ∑_{i=1}^n si xi xi′ →p Q positive definite,
then

b →p β + Q^{-1} E(si xi εi).

Therefore, if E(si xi εi) = 0, sample selection can be ignored and b
is consistent; otherwise, if E(si xi εi) ≠ 0, b is inconsistent.

8 / 50
Examples of Endogeneity
Sample Selection Bias

Some cases of interest:

If E(εi | xi) = 0 and si is a deterministic function of xi, then the
assumption E(si xi εi) = 0 holds. In other words, exogenous
sampling occurs when si = h(xi) for some nonrandom function h.
[proof]
If si is correlated with yi or εi, then E(si xi εi) ≠ 0 even if
E(εi | xi) = 0, and b is inconsistent. Therefore, sample
selection based on y, or on variables correlated with y, causes
inconsistency of the OLS estimator (the simulation after this
slide illustrates both cases).

9 / 50
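
The contrast between the two cases can be checked with a short simulation; this is a sketch under illustrative assumptions (the model and all parameter values are invented for the example):

# Sketch: selection through a function of x is harmless,
# selection based on y is not; all values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(0.0, 1.0, n)
eps = rng.normal(0.0, 1.0, n)         # E(eps | x) = 0
beta1, beta2 = 1.0, 2.0
y = beta1 + beta2 * x + eps

def ols_slope(y, x, s):
    # OLS slope of y on (1, x) using only the selected observations (s = 1)
    return np.polyfit(x[s], y[s], 1)[0]

s_x = x > 0                           # s = h(x): exogenous sampling
s_y = y > 1                           # selection based on y (correlated with eps)
print(ols_slope(y, x, s_x))           # close to beta2 = 2 (consistent)
print(ols_slope(y, x, s_y))           # well below 2 (inconsistent)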

Examples of Endogeneity
Sample Selection Bias

Example (case E(si xi εi) ≠ 0): Following the “Dewey defeats
Truman” example, the model with all observations is yi = β1 + εi, where yi = 1
if individual i supports Dewey and yi = 0 if he supports Truman,
and β1 is the proportion of individuals that support Dewey. Because
telephones were not yet widespread, and those who had them
tended to be prosperous and to support the Republican candidate
Dewey, the sample selection variable si was correlated with a
variable (wealth) which was correlated with yi. For this
reason b1 overestimated β1.

10 / 50
Examples of Endogeneity
Sample Selection Bias

Example (case E(si xi εi) = 0): Suppose we wish to estimate the
savings function for all families in a given country, and the
population savings function is a function of income, age (of the
household head) and other variables. However, we only have access
to a survey that included families whose household head was 45
years of age or older. This restricted sampling potentially raises a
sample selection issue because we are interested in the savings function
for all families, but we can obtain a random sample only for a subset
of the population. However, we are in the case si = h(xi), and if
E(εi | xi) = 0 and other regularity conditions hold, the OLS estimator
is consistent.

11 / 50

Examples of Endogeneity
Omitted Variable Bias

Example: Consider the problem of unobserved ability in a wage
equation for working adults. A simple model is

log(wagei) = β1 + β2 educi + β3 abili + ui

where ui is the error term. We put abili into the error term, and we are
left with the simple regression model

log(wagei) = β1 + β2 educi + εi

where εi = β3 abili + ui. OLS will be an inconsistent estimator of β2 if
educi and abili are correlated. In effect,

b2 →p β2 + Cov(educi, εi)/Var(educi) = β2 + β3 Cov(educi, abili)/Var(educi).

12 / 50
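
The omitted-variable bias formula can be checked numerically. In the sketch below (simulated data under illustrative assumptions: β3 = 0.5 and educ positively correlated with abil), the OLS slope that omits ability matches β2 + β3 Cov(educ, abil)/Var(educ):

# Sketch of omitted-variable bias; parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, beta1, beta2, beta3 = 100_000, 1.0, 0.08, 0.5
abil = rng.normal(0.0, 1.0, n)
educ = 12 + 2 * abil + rng.normal(0.0, 2.0, n)   # Cov(educ, abil) = 2
logwage = beta1 + beta2 * educ + beta3 * abil + rng.normal(0.0, 0.5, n)

b2 = np.polyfit(educ, logwage, 1)[0]             # OLS omitting abil
bias = beta3 * np.cov(educ, abil)[0, 1] / educ.var()
print(b2, beta2 + bias)                          # both near 0.08 + 0.125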
Examples of Endogeneity
Omitted Variable Bias

Example (omitted variables, ignoring a common cause): Ice cream
sales (xt2) versus deaths by drowning (yt). Running the regression of
yt on xt2, it was found that the estimate of the coefficient of xt2 is
positive and significant. Does that mean that ice cream consumption
causes drownings? This example fails to recognize the importance of
time of year and temperature (xt3) to ice cream sales. Ice cream is
sold during the hot summer months at a much greater rate than
during colder times, and it is during these hot summer months that
people are more likely to engage in activities involving water, such as
swimming. The increased drowning deaths are simply caused by
more exposure to water-based activities, not ice cream. The stated
conclusion is false. Model: yt = β1 + β2 xt2 + εt, εt = β3 xt3 + errort.
Conclude that

b2 →p β2 + Cov(xt2, εt)/Var(xt2) = ... = β3 Cov(xt2, xt3)/Var(xt2) > 0 (with β2 = 0)

13 / 50

Examples of Endogeneity
Simultaneous Equations Bias

Example: Consider

yi1 = β1 + β2 yi2 + εi
yi2 = α1 + α2 yi1 + ui

where εi and ui are independent. By construction yi1 and yi2 are
endogenous regressors. In fact, it can be proved that

Cov(yi2, εi) = E(yi2 εi) = α2 Var(εi)/(1 − β2 α2) ≠ 0

(see Exercise 2 of Exercise Sheet 3).

14 / 50
Examples of Endogeneity
Simultaneous Equations Bias

Example: The OLS estimator is inconsistent for both β1 and β2. For
example, using the formula in the example on the simple regression
model, we have

b2 →p β2 + Cov(yi2, εi)/Var(yi2) = β2 + α2 Var(εi)/((1 − β2 α2) Var(yi2)).

This phenomenon is known as the simultaneous equations bias or
simultaneity bias, because the regressor and the error term are often
related to each other through a system of simultaneous equations.

15 / 50

Examples of Endogeneity
Simultaneous Equations Bias

Example: A classic example:

Ci = β1 + β2 Yi + εi  (consumption function), 0 < β2 < 1
Yi = Ci + Ii  (GNP identity),

where Cov(εi, Ii) = 0. Under adequate assumptions (see Exercise 3 of
Exercise Sheet 3):

Cov(Yi, εi) = Var(εi)/(1 − β2).

Therefore Yi is an endogenous variable. Additionally,

b2 →p β2 + (1/(1 − β2)) · Var(εi)/Var(Yi).

16 / 50
Instruments

Suppose that xij is endogenous. In order to obtain consistent
estimators of β, we need some additional information. The
information comes by way of a new variable, called an instrument, that
satisfies certain properties.

Definition
zi is an instrumental variable (IV) for xij if
(1) zi is uncorrelated with εi, that is, Cov(zi, εi) = 0 (thus, zi is a
predetermined variable);
(2) zi is correlated with xij after controlling for the effects of the other
exogenous variables in the model.

17 / 50

Instruments

Example: A classic example:

Ci = β1 + β2 Yi + εi  (consumption function), 0 < β2 < 1
Yi = Ci + Ii  (GNP identity),

where Cov(εi, Ii) = 0. Here investment Ii is an instrument for Yi
because it is correlated with Yi and uncorrelated with the error term
εi.

18 / 50
Instruments

Example: Why does this help?

Note that, since Cov(εi, Ii) = 0,

Cov(Ci, Ii) = Cov(β1 + β2 Yi + εi, Ii) = β2 Cov(Yi, Ii).

Therefore

β2 = Cov(Ci, Ii)/Cov(Yi, Ii)

if Cov(Yi, Ii) ≠ 0. To estimate β2 we are going to use the method of
moments principle.

19 / 50

Instruments
The method of moments principle: To estimate a feature of the
population, use the corresponding feature of the sample.
Examples:

Parameter of the population      Estimator
E(yi)                            ȳ (sample mean)
Var(yi)                          S²y (sample variance)
Cov(yi, xi)                      Sy,x (sample covariance)

Example: Therefore an estimator for

β2 = Cov(Ci, Ii)/Cov(Yi, Ii)

is given by

β̂2,IV = S_{Ci,Ii} / S_{Yi,Ii} ,

where S_{Ci,Ii} is the sample covariance between Ci and Ii and S_{Yi,Ii} is the
sample covariance between Yi and Ii.
20 / 50
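
As an illustration, the sketch below simulates the Keynesian model of the previous slides (the parameter values are illustrative assumptions) and computes β̂2,IV = S_{Ci,Ii}/S_{Yi,Ii}, comparing it with the inconsistent OLS slope:

# Method-of-moments IV estimate of beta2 in the consumption function;
# the simulated data-generating process is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)
n, beta1, beta2 = 50_000, 10.0, 0.6
I = rng.normal(20.0, 3.0, n)              # investment, Cov(eps, I) = 0
eps = rng.normal(0.0, 2.0, n)
C = (beta1 + beta2 * I + eps) / (1.0 - beta2)   # solves C = b1 + b2 (C + I) + eps
Y = C + I

b2_ols = np.cov(Y, C)[0, 1] / np.cov(Y, Y)[0, 1]   # OLS: biased upwards
b2_iv = np.cov(C, I)[0, 1] / np.cov(Y, I)[0, 1]    # S_{C,I} / S_{Y,I}
print(b2_ols, b2_iv)                               # b2_iv is close to 0.6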
Generalized Method of Moments (GMM)

Let us now consider the general framework in which we have more


than one regressor and more than one instrument.
The estimator that we are going to use is known as the Generalized
Method of Moments (GMM) estimator and was introduced by Lars Peter
Hansen in 1982. This estimator is applicable in general settings. Here
we show how this estimator can be applied in the Instrumental
Variable model.
In 2013, Lars Peter Hansen was awarded the Nobel Memorial Prize in
Economic Sciences for developing a statistical method (GMM) that
allows the empirical analysis of asset prices.

21 / 50

Generalized Method of Moments (GMM)


Assumptions

Let zi be an L-dimensional vector to be referred to as the vector of
instruments.

Assumption (GMM1 - Linearity)
The equation to be estimated is linear:

yi = xi′ β + εi ,  (i = 1, 2, ..., n),

where xi is a K-dimensional vector of regressors, β is a K-dimensional
coefficient vector and εi is an unobservable error term.

Assumption (GMM2 - S&WD)
{(yi, xi, zi)} is jointly stationary and weakly dependent.

22 / 50
Generalized Method of Moments (GMM)
Assumptions

Assumption (GMM3 - Orthogonality Conditions)
All the L variables in zi are predetermined in the sense that they are all
orthogonal to the current error term: E(zik εi) = 0 for all i and k. This can
be written as

E[zi (yi − xi′ β)] = 0.

Notice: zi should include the “1” (constant). Not only can zi1 = 1 be
considered as an IV variable, but it also guarantees that
E[1 · (yi − xi′ β)] = 0 ⇔ E(εi) = 0.

23 / 50

Generalized Method of Moments (GMM)


Assumptions

Example: Estimating the return to schooling:

log(wagei) = β1 + β2 educi + β3 agei + εi

where educi is considered endogenous and agei is exogenous
(predetermined).
Instruments for educi that have been considered in the literature:
1 mother’s education (motheduci)
2 father’s education (fatheduci)
3 number of siblings (sibsi).

24 / 50
Generalized Method of Moments (GMM)
Assumptions

Example: In terms of the general model,

yi = log(wagei),
xi = (1, educi, agei)′,  β = (β1, β2, β3)′,  K = 3,
zi = (1, motheduci, fatheduci, sibsi, agei)′,  L = 5.

25 / 50

Generalized Method of Moments (GMM)


Assumptions

Assumption (GMM4 - Rank Condition for Identification)
We have L ≥ K and the L × K matrix E(zi xi′) is of full column rank (i.e., its
rank equals K, the number of its columns). We denote this matrix by
Qzx. Let wi = zi εi.

Assumption (GMM5 - {wi} is a MDS with Finite Second Moments)
{wi} is a martingale difference sequence (so E(wi) = 0). The L × L matrix
of cross moments, E(wi wi′), is nonsingular. Let S = Avar(√n w̄), where
w̄ = ∑_{i=1}^n wi / n.

26 / 50
Generalized Method of Moments (GMM)
Assumptions

Remarks:
Assumption GMM5 implies
S = Avar(√n w̄) = lim Var(√n w̄) = E(wi wi′) = E(εi² zi zi′).
Assumptions GMM1-GMM5 imply √n w̄ →d N(0, E(wi wi′)).
[Theorem 5.2.3, Econometria I]
If the instruments include a constant, then this assumption
implies that the error {εi} is a martingale difference sequence
(and a fortiori serially uncorrelated).
A sufficient and perhaps easier to understand condition for
Assumption GMM5 is that

E(εi | εi−1, εi−2, ..., ε1, zi, zi−1, ..., z1) = 0.

It implies the error term is orthogonal not only to the current but
also to the past instruments.
If {wi} is serially correlated, then S does not equal E(wi wi′) and
will take a more complicated form.
27 / 50

Generalized Method of Moments (GMM)

How can we use the Method of Moments to obtain an estimator of β?


By GMM3,

E[zi (yi − xi′ β)] = 0    (1)
⇔ E(zi yi) − E(zi xi′) β = 0.

Notice that E(zi yi) is an (L × 1) vector and E(zi xi′) is an (L × K) matrix;
therefore β can be interpreted as the solution of the system of
equations (1).

28 / 50

Generalized Method of Moments (GMM)

Notice that the method of moments estimator of E(zi yi) is (1/n) ∑_{i=1}^n zi yi
and the method of moments estimator of E(zi xi′) is (1/n) ∑_{i=1}^n zi xi′.
Therefore, the method of moments estimator of β is β̃, the solution of
the following system of equations:

(1/n) ∑_{i=1}^n zi yi − (1/n) ∑_{i=1}^n zi xi′ β̃ = 0.

29 / 50
Generalized Method of Moments (GMM)
Now notice that

(1/n) ∑_{i=1}^n zi yi − (1/n) ∑_{i=1}^n zi xi′ β̃ = 0 ⇔ ∑_{i=1}^n zi xi′ β̃ = ∑_{i=1}^n zi yi ⇔ Z′X β̃ = Z′y,

where

Z = [z1′; z2′; ...; zn′],  X = [x1′; x2′; ...; xn′],  y = (y1, y2, ..., yn)′.

Thus

Z′X β̃ = Z′y,  with Z′X of order (L × K), β̃ (K × 1) and Z′y (L × 1),

is a system of L equations with K unknowns.

If L = K and rank E(zi xi′) = K, then Z′X is invertible (in probability,
for n large enough). Solving Z′X β̃ = Z′y with respect to β̃ gives the
Instrumental Variable Estimator

β̂IV = (Z′X)^{-1} Z′y.

30 / 50
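
A minimal sketch of the just-identified case (L = K), on an invented data-generating design, computes β̂IV = (Z′X)^{-1} Z′y directly:

# IV estimator (Z'X)^{-1} Z'y in the just-identified case L = K = 2;
# the design below is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
z2 = rng.normal(0.0, 1.0, n)                          # instrument
eps = rng.normal(0.0, 1.0, n)
x2 = 0.8 * z2 + 0.5 * eps + rng.normal(0.0, 1.0, n)   # endogenous regressor
X = np.column_stack([np.ones(n), x2])
Z = np.column_stack([np.ones(n), z2])
y = X @ np.array([1.0, 2.0]) + eps

beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)           # (Z'X)^{-1} Z'y
print(beta_iv)                                        # close to [1, 2]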

Generalized Method of Moments (GMM)

It may happen that L > K (there are more orthogonality conditions
than parameters). In principle, it is better to have as many IVs as
possible, so the case L > K is desirable, but then the system
Z′X β̃ = Z′y does not have a solution.
This means that we cannot set Z′X β̃ − Z′y exactly equal to 0.
However, we can at least choose β̃ so that Z′X β̃ − Z′y is as close to 0
as possible. The GMM estimator is based on this idea.

Definition (GMM estimator)
Suppose that Ŵ →p W where W is a symmetric positive definite matrix. The
GMM estimator of β, denoted β̂GMM, is

β̂GMM = arg min_{β̃} J(β̃)

where

J(β̃) = (Z′X β̃ − Z′y)′ Ŵ (Z′X β̃ − Z′y).

31 / 50
Generalized Method of Moments (GMM)

Any positive definite matrix Ŵ is possible, for example Ŵ = I_L.
However, it can be proved that the optimal choice of the weighting
matrix Ŵ is

Ŵ = Ŝ^{-1}, where Ŝ →p S = Var(zi εi) = E(εi² zi zi′)

(this produces, asymptotically, the most efficient estimator). Therefore the
Efficient GMM estimator is

β̂GMM = arg min_{β̃} (Z′X β̃ − Z′y)′ Ŝ^{-1} (Z′X β̃ − Z′y).

The idea of using Ŵ = Ŝ^{-1} is that the components of zi εi that have
greater variability have lower weights in the sums and therefore less
weight in the GMM estimation. This procedure leads to the Efficient
GMM estimator.

32 / 50

Generalized Method of Moments (GMM)

Solving β̂GMM = arg min_{β̃} (Z′X β̃ − Z′y)′ Ŝ^{-1} (Z′X β̃ − Z′y), we have:

Theorem
Under Assumptions GMM2 and GMM4 the first order conditions are
given by

X′Z Ŝ^{-1} (Z′X β̂GMM − Z′y) = 0

and consequently the efficient GMM estimator is given by

β̂GMM = (X′Z Ŝ^{-1} Z′X)^{-1} X′Z Ŝ^{-1} Z′y.

Remark: Note that if L = K, β̂GMM = β̂IV = (Z′X)^{-1} Z′y.

33 / 50
Generalized Method of Moments (GMM)
To calculate the efficient GMM estimator, we need the consistent
estimator Ŝ. This leads us to the following two-step efficient GMM
procedure:
Step 1: Consider the following GMM estimator of β based on
Ŵ = I_L:

β̂ = arg min_{β̃} (Z′X β̃ − Z′y)′ (Z′X β̃ − Z′y) = (X′Z Z′X)^{-1} X′Z Z′y.

This estimator is consistent for β (although
asymptotically inefficient). Compute

Ŝ = (1/n) ∑_{i=1}^n ε̂i² zi zi′,

where ε̂i = yi − xi′ β̂. We can rewrite Ŝ as
Ŝ = ∑_{i=1}^n ε̂i² zi zi′ / n = Z′BZ/n where
B = diag(ε̂1², ε̂2², ..., ε̂n²).
34 / 50

Generalized Method of Moments (GMM)

Step 2: With Ŝ = (1/n) Z′BZ, we obtain (n cancels out in the
expression above)

β̂GMM = (X′Z Ŝ^{-1} Z′X)^{-1} X′Z Ŝ^{-1} Z′y
      = (X′Z (Z′BZ)^{-1} Z′X)^{-1} X′Z (Z′BZ)^{-1} Z′y.

This estimator is called the two-step GMM estimator.

35 / 50
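
The two-step procedure translates directly into code. Below is a sketch on simulated, heteroskedastic data; the overidentified design (L = 3 > K = 2) and all parameter values are illustrative assumptions:

# Two-step efficient GMM: step 1 uses W = I_L, step 2 uses W = S_hat^{-1}.
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
z2, z3 = rng.normal(size=(2, n))                           # two instruments
eps = rng.normal(0.0, 1.0, n) * (1.0 + 0.5 * np.abs(z2))   # heteroskedastic
x2 = 0.7 * z2 + 0.7 * z3 + 0.5 * eps + rng.normal(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x2])                      # K = 2
Z = np.column_stack([np.ones(n), z2, z3])                  # L = 3
y = X @ np.array([1.0, 2.0]) + eps

# Step 1: consistent but inefficient GMM with W = I_L
A = Z.T @ X
b1 = np.linalg.solve(A.T @ A, A.T @ (Z.T @ y))    # (X'ZZ'X)^{-1} X'ZZ'y
e = y - X @ b1
S_hat = (Z * e[:, None] ** 2).T @ Z / n           # (1/n) sum e_i^2 z_i z_i'

# Step 2: efficient GMM with W = S_hat^{-1}
Si = np.linalg.inv(S_hat)
b_gmm = np.linalg.solve(X.T @ Z @ Si @ Z.T @ X, X.T @ Z @ Si @ Z.T @ y)
print(b_gmm)                                      # close to [1, 2]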
Implications of Conditional Homoskedasticity
Assume now:

Assumption (GMM6 - Conditional Homoskedasticity)
E(εi² | zi) = σ².

This assumption implies

S = E(wi wi′) = E(εi² zi zi′) = σ² E(zi zi′) = σ² Qzz.

Its estimator is

Ŝ = σ̂² Z′Z/n,

where σ̂² is a consistent estimator of σ². Now the efficient GMM
becomes:

β̂GMM = (X′Z Ŝ^{-1} Z′X)^{-1} X′Z Ŝ^{-1} Z′y
      = (X′Z (Z′Z)^{-1} Z′X)^{-1} X′Z (Z′Z)^{-1} Z′y.

This estimator is called the two-stage least squares estimator and we will
denote it as β̂2SLS.
36 / 50

Two-Stage Least Squares Estimation (2SLS)


β̂2SLS is called the Two-Stage Least Squares (2SLS) estimator because it can
be computed in two stages.
Let PZ = Z (Z′Z)^{-1} Z′. Note that PZ = PZ′ and PZ PZ = PZ.
Now note also that

β̂2SLS = (X′Z (Z′Z)^{-1} Z′X)^{-1} X′Z (Z′Z)^{-1} Z′y
       = (X′PZ X)^{-1} X′PZ y
       = (X′PZ′ PZ X)^{-1} X′PZ′ y
       = (X̂′X̂)^{-1} X̂′y,

where X̂ = PZ X = Z (Z′Z)^{-1} Z′X.
The jth column of X̂ is x̂j = PZ xj, and thus x̂j is the predicted
vector from regressing xj on Z, j = 1, ..., K.
Therefore X̂ is the predicted matrix from regressing X on Z.
β̂2SLS is the OLS estimator from regressing y on X̂.
37 / 50
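
The sketch below computes β̂2SLS literally in these two stages, on the same kind of simulated design used earlier (an illustrative assumption, not real data):

# 2SLS computed in two stages: X_hat = P_Z X, then OLS of y on X_hat.
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
z2, z3 = rng.normal(size=(2, n))
eps = rng.normal(0.0, 1.0, n)
x2 = 0.7 * z2 + 0.7 * z3 + 0.5 * eps + rng.normal(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x2])
Z = np.column_stack([np.ones(n), z2, z3])
y = X @ np.array([1.0, 2.0]) + eps

X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)            # stage 1: fitted X
b_2sls = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)   # stage 2: OLS on X_hat
print(b_2sls)                                            # close to [1, 2]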
Two-Stage Least Squares Estimation (2SLS)

Therefore, the optimal GMM estimator under homoskedasticity can
be obtained in two stages:
1 At stage 1, regress each column of X on Z to obtain the predicted
matrix, denoted X̂.
2 At stage 2, regress y on X̂ to obtain β̂2SLS.
Therefore, the optimal GMM estimator under homoskedasticity is
usually called the two-stage least squares (2SLS) estimator.

38 / 50

Asymptotic Distribution of the GMM Estimator


Let us go back to the heteroskedastic case:

Theorem (Asymptotic Distribution of the GMM estimator)
(a) (Consistency) Under Assumptions GMM1-GMM4, β̂GMM →p β.
(b) (Asymptotic Normality) If Assumption GMM3 is strengthened to
Assumption GMM5, then

√n (β̂GMM − β) →d N(0, V)

where

V = Avar(√n (β̂GMM − β)) = (Qzx′ S^{-1} Qzx)^{-1},

with Qzx = E(zi xi′) and S = E(wi wi′) = E(εi² zi zi′).
(c) (Consistent Estimate of V) Suppose there is available a consistent
estimator, Ŝ, of S. Then, under Assumption GMM2, V is consistently
estimated by

V̂ = ( (X′Z/n) Ŝ^{-1} (Z′X/n) )^{-1}.
39 / 50
Hypothesis Testing

Theorem (Robust t-ratio and Wald Statistics)
Suppose Assumptions GMM1-GMM5 hold, and suppose there is available a
consistent estimate Ŝ of S. Then
(a) under the null H0: βj = β0j,

t0j = (β̂j − β0j)/σ̂_{β̂j} →d N(0, 1),

where β̂j is the jth element of β̂GMM and σ̂²_{β̂j} is the (j, j) element of V̂/n;
(b) under the null hypothesis H0: Rβ = r, where p is the number of
restrictions and R (p × K) is of full row rank,

W = n (R β̂GMM − r)′ (R V̂ R′)^{-1} (R β̂GMM − r) →d χ²(p).

40 / 50
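
A sketch of these formulas on simulated data (the same illustrative heteroskedastic design as before; the tested values are the true ones, so the statistics should be moderate):

# Robust V_hat, t-ratios and Wald statistic for the efficient GMM estimator.
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
z2, z3 = rng.normal(size=(2, n))
eps = rng.normal(0.0, 1.0, n) * (1.0 + 0.5 * np.abs(z2))
x2 = 0.7 * z2 + 0.7 * z3 + 0.5 * eps + rng.normal(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x2])
Z = np.column_stack([np.ones(n), z2, z3])
y = X @ np.array([1.0, 2.0]) + eps

# two-step efficient GMM, as in the earlier sketch
A = Z.T @ X
b1 = np.linalg.solve(A.T @ A, A.T @ (Z.T @ y))
e = y - X @ b1
Si = np.linalg.inv((Z * e[:, None] ** 2).T @ Z / n)
b = np.linalg.solve(X.T @ Z @ Si @ Z.T @ X, X.T @ Z @ Si @ Z.T @ y)

Qxz = X.T @ Z / n                                 # (X'Z/n)
V_hat = np.linalg.inv(Qxz @ Si @ Qxz.T)           # ((X'Z/n) S^{-1} (Z'X/n))^{-1}
se = np.sqrt(np.diag(V_hat) / n)                  # sigma_hat for each beta_j
t = (b - np.array([1.0, 2.0])) / se               # t-ratios at the true values
R, r = np.array([[0.0, 1.0]]), np.array([2.0])    # H0: beta2 = 2
W = n * (R @ b - r) @ np.linalg.solve(R @ V_hat @ R.T, R @ b - r)
print(t, W)                                       # t ~ N(0,1), W ~ chi2(1)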

Hypothesis Testing
Example: Wage and education data for a sample of men in 1976. Card
(1995) considered the following model:

log(wagei) = β1 + β2 educi + β3 experi + β4 exper²i + β5 blacki + β6 smsai + β7 southi + εi

where smsai = 1 if the individual lived in a Standard Metropolitan
Statistical Area in 1976.

However, Card (1995) considered educ an endogenous variable.


41 / 50
Hypothesis Testing
Example: Let nearc2 = 1 and nearc4 = 1 if he grew up near a 2-year
and a 4-year college, respectively. Card (1995) uses college proximity
(nearc2 and nearc4) as an instrument to identify the returns to
schooling, noting that living close to a college during childhood may
induce some children to go to college but is unlikely to directly affect
the wages earned in their adulthood. Hence, nearc2 and nearc4 are
used as instrumental variables for educ.

42 / 50

Hypothesis Testing
Example: Let’s get the GMM estimator assuming that nearc2 and
nearc4 are instrumental variables:

xi′ = (1, educi, experi, exper²i, blacki, smsai, southi)
zi′ = (1, experi, exper²i, blacki, smsai, southi, nearc2i, nearc4i)

43 / 50
Hypothesis Testing

Example: Let’s obtain the 2SLS estimator

44 / 50

Hypothesis Testing
Example: Let’s now obtain the 2SLS estimator with robust standard
errors (2SLS is consistent and asymptotically normal under
heteroskedasticity, but it is not efficient and we need to use robust
standard errors to make inferences)

45 / 50
Testing Overidentifying Restrictions
Testing all Orthogonality Conditions

Suppose that we would like to test

H0: E[zi (yi − xi′ β)] = 0  vs  H1: E[zi (yi − xi′ β)] ≠ 0.

Theorem (Hansen’s test of Overidentifying Restrictions)
Under Assumptions GMM1-GMM5,

Ĵ = (Z′X β̂GMM − Z′y)′ Ŝ^{-1} (Z′X β̂GMM − Z′y)/n →d χ²(L−K).

If Ĵ is large, it means that either the orthogonality conditions
(Assumption GMM3) or the other assumptions (or both) are
likely to be false.
In the conditionally homoskedastic case with β̂GMM = β̂2SLS and
Ŝ = σ̂² Z′Z/n, the Hansen test is called the Sargan test.
To perform this test we need L > K.
46 / 50
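
A sketch of the Ĵ statistic on simulated data where the overidentifying restriction is true (illustrative design with L = 3, K = 2, so Ĵ →d χ²(1); the 5% critical value is about 3.84):

# Hansen's J statistic for overidentifying restrictions.
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
z2, z3 = rng.normal(size=(2, n))
eps = rng.normal(0.0, 1.0, n)
x2 = 0.7 * z2 + 0.7 * z3 + 0.5 * eps + rng.normal(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x2])
Z = np.column_stack([np.ones(n), z2, z3])
y = X @ np.array([1.0, 2.0]) + eps

A = Z.T @ X
b1 = np.linalg.solve(A.T @ A, A.T @ (Z.T @ y))    # step 1 (W = I_L)
e = y - X @ b1
Si = np.linalg.inv((Z * e[:, None] ** 2).T @ Z / n)
b = np.linalg.solve(X.T @ Z @ Si @ Z.T @ X, X.T @ Z @ Si @ Z.T @ y)

g = Z.T @ (y - X @ b)                             # -(Z'X b - Z'y)
J = (g @ Si @ g) / n                              # Hansen's J
print(J)        # compare with the chi2(1) 5% critical value, 3.84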

Testing Overidentifying Restrictions


Example (continuation): The program provides the J statistic:

H0: E(zi εi) = 0 [the instruments are uncorrelated with the error]

H1: E(zi εi) ≠ 0 [some instruments are endogenous]

. estat overid

Test of overidentifying restriction:

Hansen's J chi2(1) = 2.65321 (p = 0.1033)

We do not reject H0 at the 5% level (that’s good!)


47 / 50
Testing Subsets of Orthogonality Conditions

Consider the partition

zi = (zi1′, zi2′)′,  zi1: L1 rows,  zi2: L − L1 rows.

We want to test H0: E(zi1 εi) = 0 and E(zi2 εi) = 0 vs
H1: E(zi1 εi) = 0 but E(zi2 εi) ≠ 0.
The basic idea is to compare two J statistics from two separate GMM
estimators, one using only the instruments included in zi1 and the
other using also the suspect instruments zi2 in addition to zi1. If the
inclusion of the suspect instruments significantly increases the Ĵ
statistic, that is a good reason for doubting the predeterminedness of
zi2. This restriction is testable if L1 ≥ K (why?).
Remark: These tests are called tests for subsets of orthogonality
conditions by Hayashi. However, they are called tests for additional
orthogonality conditions by other authors.

48 / 50

Testing Subsets of Orthogonality Conditions

Theorem (Testing a Subset of Orthogonality Conditions)
Suppose that the rank condition is satisfied for zi1, so E(zi1 xi′) is of full
column rank. Under Assumptions GMM1-GMM5 and under the null H0,
we have

C ≡ Ĵ − Ĵ1 →d χ²(L−L1),

where in Ĵ we use zi1 and zi2 as instrumental variables and in Ĵ1 just zi1.

49 / 50
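
A sketch of the C statistic under illustrative assumptions (L = 3, L1 = 2 = K, so L − L1 = 1; the suspect instrument z3 is in fact valid, so H0 holds). With L1 = K the restricted model is just identified, so Ĵ1 is numerically zero and C ≈ Ĵ:

# C = J - J1: J uses all instruments (z_i1, z_i2), J1 uses only z_i1.
import numpy as np

def two_step_gmm_J(y, X, Z):
    # two-step efficient GMM; returns the estimate and the J statistic
    n = len(y)
    A = Z.T @ X
    b1 = np.linalg.solve(A.T @ A, A.T @ (Z.T @ y))      # step 1, W = I
    e = y - X @ b1
    Si = np.linalg.inv((Z * e[:, None] ** 2).T @ Z / n)
    b = np.linalg.solve(X.T @ Z @ Si @ Z.T @ X, X.T @ Z @ Si @ Z.T @ y)
    g = Z.T @ (y - X @ b)
    return b, (g @ Si @ g) / n

rng = np.random.default_rng(4)
n = 20_000
z2, z3 = rng.normal(size=(2, n))
eps = rng.normal(0.0, 1.0, n)
x2 = 0.7 * z2 + 0.7 * z3 + 0.5 * eps + rng.normal(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x2])
Z1 = np.column_stack([np.ones(n), z2])    # L1 = 2 >= K: rank condition holds
Z = np.column_stack([Z1, z3])             # adds the suspect instrument z3
y = X @ np.array([1.0, 2.0]) + eps

_, J = two_step_gmm_J(y, X, Z)
_, J1 = two_step_gmm_J(y, X, Z1)
print(J - J1)    # approx chi2(L - L1) = chi2(1) under H0: E(z3 eps) = 0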
Testing Subsets of Orthogonality Conditions

Example: We can use the theorem to test for the endogeneity of educi in
the previous example. Let

zi′ = (1, experi, exper²i, blacki, smsai, southi, nearc2i, nearc4i),

and suppose you want to test H0: E(zi εi) = 0 and E(educi εi) = 0 vs
H1: E(zi εi) = 0 but E(educi εi) ≠ 0.

. estat endogenous

Test of endogeneity (orthogonality conditions)


H0: Variables are exogenous

GMM C statistic chi2(1) = 3.73275 (p = 0.0534)

50 / 50
