
Econometrics II

Endogeneity and the GMM


Examples of Endogeneity
Instruments
Generalized Method of Moments (GMM)
Two-Stage Least Squares Estimation (2SLS)
Asymptotic Distribution of the GMM Estimator
Hypothesis Testing
Testing Overidentifying Restrictions
Testing Subsets of Orthogonality Conditions

1 / 50

Endogeneity and the GMM [Hayashi]


Introduction

Consider yi = β1 xi1 + β2 xi2 + ... + βK xiK + εi .

Definition (Endogenous Regressor)


We say that xij (the j-th regressor) is endogenous if

Cov(xij, εi) ≠ 0 (or E(xij εi) ≠ 0).

It follows that E(xi εi) ≠ 0 and E(εi | xi) ≠ 0. If Cov(xij, εi) = 0 (or
E(xij εi) = 0), ∀i, we say that xij (the j-th regressor) is predetermined
or orthogonal to the error term εi.

If the regressors are endogenous we have, under Assumptions
LS1 and LS2,

b = β + ( (1/n) ∑_{i=1}^n xi xi′ )^{-1} (1/n) ∑_{i=1}^n xi εi →p β + Q^{-1} E(xi εi) ≠ β

since E(xi εi) ≠ 0. The term Q^{-1} E(xi εi) is the asymptotic bias.
2 / 50
Endogeneity and the GMM [Hayashi]
Introduction

Example (simple regression model): Consider

yi = β1 + β2 xi2 + εi ,  E(xi2 εi) ≠ 0.

Then under Assumptions LS1 and LS2

b →p β + Q^{-1} E(xi εi)

and

Q^{-1} E(xi εi) = ... = ( −E(xi2) Cov(xi2, εi)/Var(xi2) ,  Cov(xi2, εi)/Var(xi2) )′

(see Exercise 1 of Exercise Sheet 3). If xi2 is endogenous, b1 and b2 are
inconsistent.

3 / 50

Correlation between error terms and regressors

The endogeneity problem is endemic in the social sciences/economics:

Measurement error in the regressors may lead to endogeneity.
Sample selection.
In many cases economists are not interested in estimating the
mean of a variable conditional on regressors; they are interested
in estimating the parameters of models (usually, but not always,
derived from economic theory) which they believe generated
the data. In these types of models we can have endogeneity for
two reasons:
1 important explanatory variables are not included in the regression
model because they cannot be observed;
2 simultaneity.
The instrumental variables (IV) method is the most well-known
method to address endogeneity problems.

4 / 50
Examples of Endogeneity
Errors-in-Variables Bias

Example: We will see that a predetermined regressor necessarily
becomes endogenous when measured with error. This problem is
ubiquitous, particularly in micro data on households. Consider

yi = β1 + β2 xi2* + ui

where xi2* is a predetermined regressor. The variable xi2* is measured
with error: we observe

xi2 = xi2* + vi .

Assume further that
E(xi2* ui) = E(xi2* vi) = E(vi ui) = 0. In terms of the observed
regressor, the regression equation is

yi = β1 + β2 xi2 + εi ,  εi = ui − β2 vi .

Assuming LS1 and LS2 we have after some calculations (see Exercise
7 of Exercise Sheet 3):

b2 →p β2 + Cov(xi2, εi)/Var(xi2) = β2 − β2 E(vi²)/Var(xi2).

5 / 50
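
A quick way to see this attenuation result is by simulation. The following Python/NumPy sketch (all numeric values are illustrative assumptions, not taken from the slides) generates a mismeasured regressor and compares the OLS slope with the probability limit β2 − β2 E(vi²)/Var(xi2):

# Monte Carlo sketch of errors-in-variables (attenuation) bias;
# the parameter values below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, beta1, beta2 = 100_000, 0.5, 1.0
x_star = rng.normal(0.0, 2.0, n)      # true (predetermined) regressor, Var = 4
u = rng.normal(0.0, 1.0, n)           # structural error
v = rng.normal(0.0, 1.0, n)           # measurement error, Var(v) = 1
y = beta1 + beta2 * x_star + u
x = x_star + v                        # observed, mismeasured regressor

b2 = np.polyfit(x, y, 1)[0]           # OLS slope of y on the observed x
plim = beta2 - beta2 * v.var() / x.var()
print(b2, plim)                       # both near 0.8, not beta2 = 1

With this design Var(xi2) = 4 + 1 = 5, so the slope converges to 1 − 1/5 = 0.8 rather than to β2 = 1.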

Examples of Endogeneity
Sample Selection Bias

Example: “Dewey defeats Truman” was an incorrect banner headline
on the front page of the Chicago Daily Tribune on November 3, 1948,
the day after Harry Truman won an upset victory over the
Republican candidate Thomas Dewey in the 1948 presidential
election. The reason the newspaper was mistaken is that its editor
trusted the results of a phone survey. Survey research was then in its
infancy, and few academics realized that a sample of telephone users
was not representative of the general population. Telephones were
not yet widespread, and those who had them tended to be
prosperous and Dewey supporters. There are many types of sample
selection bias: self-selection bias, survivorship bias, etc. (these are
beyond the scope of our syllabus).

6 / 50
Examples of Endogeneity
Sample Selection Bias

Sample selection corresponds to nonrandom sampling from a
cross-sectional population, which is conveniently viewed as follows:
we randomly draw (yi, xi) from the population, but it is not always
(fully) observed. Let si denote a binary selection indicator: si = 1 if
(yi, xi) is observed, si = 0 otherwise. Therefore, our sample consists of
{(yi, xi, si), i = 1, ..., n}, where the value of si determines whether we
observe all elements of (yi, xi, si).
The OLS estimator depends on whether we observe all elements of
(yi, xi, si). Thus

b = ( ∑_{i=1}^n si xi xi′ )^{-1} ∑_{i=1}^n si xi yi = ...
  = β + ( (1/n) ∑_{i=1}^n si xi xi′ )^{-1} (1/n) ∑_{i=1}^n si xi εi

7 / 50

Examples of Endogeneity
Sample Selection Bias

Under LS1 and assuming that (1/n) ∑_{i=1}^n si xi xi′ →p Q positive definite,
then

b →p β + Q^{-1} E(si xi εi).

Therefore, if E(si xi εi) = 0, sample selection can be ignored and b
is consistent; otherwise, if E(si xi εi) ≠ 0, b is inconsistent.

8 / 50
Examples of Endogeneity
Sample Selection Bias

Some cases of interest:

If E(εi | xi) = 0 and si is a deterministic function of xi, then the
assumption E(si xi εi) = 0 holds. In other words, exogenous
sampling occurs when si = h(xi) for some nonrandom function h.
[proof]
If si is correlated with yi or εi, then E(si xi εi) ≠ 0 even if
E(εi | xi) = 0, and b is inconsistent. Therefore, sample
selection based on y, or on variables correlated with y, causes
inconsistency of the OLS estimator (the simulation after this
slide illustrates both cases).

9 / 50
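
The contrast between the two cases can be checked with a short simulation; this is a sketch under illustrative assumptions (the model and all parameter values are invented for the example):

# Sketch: selection through a function of x is harmless,
# selection based on y is not; all values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.normal(0.0, 1.0, n)
eps = rng.normal(0.0, 1.0, n)         # E(eps | x) = 0
beta1, beta2 = 1.0, 2.0
y = beta1 + beta2 * x + eps

def ols_slope(y, x, s):
    # OLS slope of y on (1, x) using only the selected observations (s = 1)
    return np.polyfit(x[s], y[s], 1)[0]

s_x = x > 0                           # s = h(x): exogenous sampling
s_y = y > 1                           # selection based on y (correlated with eps)
print(ols_slope(y, x, s_x))           # close to beta2 = 2 (consistent)
print(ols_slope(y, x, s_y))           # well below 2 (inconsistent)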

Examples of Endogeneity
Sample Selection Bias

Example (case E(si xi εi) ≠ 0): Following the “Dewey defeats
Truman” example, the model with all observations is yi = β1 + εi, where yi = 1
if individual i supports Dewey and yi = 0 if he supports Truman,
and β1 is the proportion of individuals that support Dewey. Because
telephones were not yet widespread, and those who had them
tended to be prosperous and to support the Republican candidate
Dewey, the sample selection variable si was correlated with a
variable (wealth) which was correlated with yi. For this
reason b1 overestimated β1.

10 / 50
Examples of Endogeneity
Sample Selection Bias

Example (case E(si xi εi) = 0): Suppose we wish to estimate the
savings function for all families in a given country, and the
population savings function is a function of income, age (of the
household head) and other variables. However, we only have access
to a survey that included families whose household head was 45
years of age or older. This restricted sampling potentially raises a
sample selection issue because we are interested in the savings function
for all families, but we can obtain a random sample only for a subset
of the population. However, we are in the case si = h(xi), and if
E(εi | xi) = 0 and other regularity conditions hold, the OLS estimator
is consistent.

11 / 50

Examples of Endogeneity
Omitted Variable Bias

Example: Consider the problem of unobserved ability in a wage
equation for working adults. A simple model is

log(wagei) = β1 + β2 educi + β3 abili + ui

where ui is the error term. We put abili into the error term, and we are
left with the simple regression model

log(wagei) = β1 + β2 educi + εi

where εi = β3 abili + ui. OLS will be an inconsistent estimator of β2 if
educi and abili are correlated. In effect,

b2 →p β2 + Cov(educi, εi)/Var(educi) = β2 + β3 Cov(educi, abili)/Var(educi).

12 / 50
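
The omitted-variable bias formula can be checked numerically. In the sketch below (simulated data under illustrative assumptions: β3 = 0.5 and educ positively correlated with abil), the OLS slope that omits ability matches β2 + β3 Cov(educ, abil)/Var(educ):

# Sketch of omitted-variable bias; parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, beta1, beta2, beta3 = 100_000, 1.0, 0.08, 0.5
abil = rng.normal(0.0, 1.0, n)
educ = 12 + 2 * abil + rng.normal(0.0, 2.0, n)   # Cov(educ, abil) = 2
logwage = beta1 + beta2 * educ + beta3 * abil + rng.normal(0.0, 0.5, n)

b2 = np.polyfit(educ, logwage, 1)[0]             # OLS omitting abil
bias = beta3 * np.cov(educ, abil)[0, 1] / educ.var()
print(b2, beta2 + bias)                          # both near 0.08 + 0.125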
Examples of Endogeneity
Omitted Variable Bias

Example (omitted variables, ignoring a common cause): Ice cream
sales (xt2) versus deaths by drowning (yt). Running the regression of
yt on xt2, it was found that the estimate of the coefficient of xt2 is
positive and significant. Does that mean that ice cream consumption
causes drownings? This example fails to recognize the importance of
time of year and temperature (xt3) to ice cream sales. Ice cream is
sold during the hot summer months at a much greater rate than
during colder times, and it is during these hot summer months that
people are more likely to engage in activities involving water, such as
swimming. The increased drowning deaths are simply caused by
more exposure to water-based activities, not ice cream. The stated
conclusion is false. Model: yt = β1 + β2 xt2 + εt, εt = β3 xt3 + errort.
Conclude that

b2 →p β2 + Cov(xt2, εt)/Var(xt2) = ... = β3 Cov(xt2, xt3)/Var(xt2) > 0 (with β2 = 0)

13 / 50

Examples of Endogeneity
Simultaneous Equations Bias

Example: Consider

yi1 = β1 + β2 yi2 + εi
yi2 = α1 + α2 yi1 + ui

where εi and ui are independent. By construction yi1 and yi2 are
endogenous regressors. In fact, it can be proved that

Cov(yi2, εi) = E(yi2 εi) = α2 Var(εi)/(1 − β2 α2) ≠ 0

(see Exercise 2 of Exercise Sheet 3).

14 / 50
Examples of Endogeneity
Simultaneous Equations Bias

Example: The OLS estimator is inconsistent for both β1 and β2. For
example, using the formula in the example on the simple regression
model, we have

b2 →p β2 + Cov(yi2, εi)/Var(yi2) = β2 + α2 Var(εi)/((1 − β2 α2) Var(yi2)).

This phenomenon is known as the simultaneous equations bias or
simultaneity bias, because the regressor and the error term are often
related to each other through a system of simultaneous equations.

15 / 50

Examples of Endogeneity
Simultaneous Equations Bias

Example: A classic example:

Ci = β1 + β2 Yi + εi  (consumption function), 0 < β2 < 1
Yi = Ci + Ii  (GNP identity),

where Cov(εi, Ii) = 0. Under adequate assumptions (see Exercise 3 of
Exercise Sheet 3):

Cov(Yi, εi) = Var(εi)/(1 − β2).

Therefore Yi is an endogenous variable. Additionally,

b2 →p β2 + (1/(1 − β2)) · Var(εi)/Var(Yi).

16 / 50
Instruments

Suppose that xij is endogenous. In order to obtain consistent
estimators of β, we need some additional information. The
information comes by way of a new variable, called an instrument, that
satisfies certain properties.

Definition
zi is an instrumental variable (IV) for xij if
(1) zi is uncorrelated with εi, that is, Cov(zi, εi) = 0 (thus, zi is a
predetermined variable);
(2) zi is correlated with xij after controlling for the effects of the other
exogenous variables in the model.

17 / 50

Instruments

Example: A classic example:

Ci = β1 + β2 Yi + εi  (consumption function), 0 < β2 < 1
Yi = Ci + Ii  (GNP identity),

where Cov(εi, Ii) = 0. Here investment Ii is an instrument for Yi
because it is correlated with Yi and uncorrelated with the error term
εi.

18 / 50
Instruments

Example: Why does this help?

Note that, since Cov(εi, Ii) = 0,

Cov(Ci, Ii) = Cov(β1 + β2 Yi + εi, Ii) = β2 Cov(Yi, Ii).

Therefore

β2 = Cov(Ci, Ii)/Cov(Yi, Ii)

if Cov(Yi, Ii) ≠ 0. To estimate β2 we are going to use the method of
moments principle.

19 / 50

Instruments
The method of moments principle: To estimate a feature of the
population, use the corresponding feature of the sample.
Examples:

Parameter of the population      Estimator
E(yi)                            ȳ (sample mean)
Var(yi)                          S²y (sample variance)
Cov(yi, xi)                      Sy,x (sample covariance)

Example: Therefore an estimator for

β2 = Cov(Ci, Ii)/Cov(Yi, Ii)

is given by

β̂2,IV = S_{Ci,Ii} / S_{Yi,Ii} ,

where S_{Ci,Ii} is the sample covariance between Ci and Ii and S_{Yi,Ii} is the
sample covariance between Yi and Ii.
20 / 50
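
As an illustration, the sketch below simulates the Keynesian model of the previous slides (the parameter values are illustrative assumptions) and computes β̂2,IV = S_{Ci,Ii}/S_{Yi,Ii}, comparing it with the inconsistent OLS slope:

# Method-of-moments IV estimate of beta2 in the consumption function;
# the simulated data-generating process is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)
n, beta1, beta2 = 50_000, 10.0, 0.6
I = rng.normal(20.0, 3.0, n)              # investment, Cov(eps, I) = 0
eps = rng.normal(0.0, 2.0, n)
C = (beta1 + beta2 * I + eps) / (1.0 - beta2)   # solves C = b1 + b2 (C + I) + eps
Y = C + I

b2_ols = np.cov(Y, C)[0, 1] / np.cov(Y, Y)[0, 1]   # OLS: biased upwards
b2_iv = np.cov(C, I)[0, 1] / np.cov(Y, I)[0, 1]    # S_{C,I} / S_{Y,I}
print(b2_ols, b2_iv)                               # b2_iv is close to 0.6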
Generalized Method of Moments (GMM)

Let us now consider the general framework in which we have more


than one regressor and more than one instrument.
The estimator that we are going to use is known as the Generalized
Method of Moments (GMM) estimator and was introduced by Lars Peter
Hansen in 1982. This estimator is applicable in general settings. Here
we show how this estimator can be applied in the Instrumental
Variable model.
In 2013, Lars Peter Hansen was awarded the Nobel Memorial Prize in
Economic Sciences for developing a statistical method (GMM) that
allows the empirical analysis of asset prices.

21 / 50

Generalized Method of Moments (GMM)


Assumptions

Let zi be an L-dimensional vector to be referred to as the vector of
instruments.

Assumption (GMM1 - Linearity)
The equation to be estimated is linear:

yi = xi′ β + εi ,  (i = 1, 2, ..., n),

where xi is a K-dimensional vector of regressors, β is a K-dimensional
coefficient vector and εi is an unobservable error term.

Assumption (GMM2 - S&WD)
{(yi, xi, zi)} is jointly stationary and weakly dependent.

22 / 50
Generalized Method of Moments (GMM)
Assumptions

Assumption (GMM3 - Orthogonality Conditions)
All the L variables in zi are predetermined in the sense that they are all
orthogonal to the current error term: E(zik εi) = 0 for all i and k. This can
be written as

E[zi (yi − xi′ β)] = 0.

Notice: zi should include the “1” (constant). Not only can zi1 = 1 be
considered as an IV variable, but it also guarantees that
E[1 · (yi − xi′ β)] = 0 ⇔ E(εi) = 0.

23 / 50

Generalized Method of Moments (GMM)


Assumptions

Example: Estimating the return to schooling:

log(wagei) = β1 + β2 educi + β3 agei + εi

where educi is considered endogenous and agei is exogenous
(predetermined).
Instruments for educi that have been considered in the literature:
1 mother’s education (motheduci)
2 father’s education (fatheduci)
3 number of siblings (sibsi).

24 / 50
Generalized Method of Moments (GMM)
Assumptions

Example: In terms of the general model,

yi = log(wagei),
xi = (1, educi, agei)′,  β = (β1, β2, β3)′,  K = 3,
zi = (1, motheduci, fatheduci, sibsi, agei)′,  L = 5.

25 / 50

Generalized Method of Moments (GMM)


Assumptions

Assumption (GMM4 - Rank Condition for Identification)
We have L ≥ K and the L × K matrix E(zi xi′) is of full column rank (i.e., its
rank equals K, the number of its columns). We denote this matrix by
Qzx. Let wi = zi εi.

Assumption (GMM5 - {wi} is a MDS with Finite Second Moments)
{wi} is a martingale difference sequence (so E(wi) = 0). The L × L matrix
of cross moments, E(wi wi′), is nonsingular. Let S = Avar(√n w̄), where
w̄ = ∑_{i=1}^n wi / n.

26 / 50
Generalized Method of Moments (GMM)
Assumptions

Remarks:
Assumption GMM5 implies
S = Avar(√n w̄) = lim Var(√n w̄) = E(wi wi′) = E(εi² zi zi′).
Assumptions GMM1-GMM5 imply √n w̄ →d N(0, E(wi wi′)).
[Theorem 5.2.3, Econometria I]
If the instruments include a constant, then this assumption
implies that the error {εi} is a martingale difference sequence
(and a fortiori serially uncorrelated).
A sufficient and perhaps easier to understand condition for
Assumption GMM5 is that

E(εi | εi−1, εi−2, ..., ε1, zi, zi−1, ..., z1) = 0.

It implies the error term is orthogonal not only to the current but
also to the past instruments.
If {wi} is serially correlated, then S does not equal E(wi wi′) and
will take a more complicated form.
27 / 50

Generalized Method of Moments (GMM)

How can we use the Method of Moments to obtain an estimator of β?


By GMM3,

E[zi (yi − xi′ β)] = 0    (1)
⇔ E(zi yi) − E(zi xi′) β = 0.

Notice that E(zi yi) is an (L × 1) vector and E(zi xi′) is an (L × K) matrix;
therefore β can be interpreted as the solution of the system of
equations (1).

28 / 50

Generalized Method of Moments (GMM)

Notice that the method of moments estimator of E(zi yi) is (1/n) ∑_{i=1}^n zi yi
and the method of moments estimator of E(zi xi′) is (1/n) ∑_{i=1}^n zi xi′.
Therefore, the method of moments estimator of β is β̃, the solution of
the following system of equations:

(1/n) ∑_{i=1}^n zi yi − (1/n) ∑_{i=1}^n zi xi′ β̃ = 0.

29 / 50
Generalized Method of Moments (GMM)
Now notice that

(1/n) ∑_{i=1}^n zi yi − (1/n) ∑_{i=1}^n zi xi′ β̃ = 0 ⇔ ∑_{i=1}^n zi xi′ β̃ = ∑_{i=1}^n zi yi ⇔ Z′X β̃ = Z′y,

where

Z = [z1′; z2′; ...; zn′],  X = [x1′; x2′; ...; xn′],  y = (y1, y2, ..., yn)′.

Thus

Z′X β̃ = Z′y,  with Z′X of order (L × K), β̃ (K × 1) and Z′y (L × 1),

is a system of L equations with K unknowns.

If L = K and rank E(zi xi′) = K, then Z′X is invertible (in probability,
for n large enough). Solving Z′X β̃ = Z′y with respect to β̃ gives the
Instrumental Variable Estimator

β̂IV = (Z′X)^{-1} Z′y.

30 / 50
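
A minimal sketch of the just-identified case (L = K), on an invented data-generating design, computes β̂IV = (Z′X)^{-1} Z′y directly:

# IV estimator (Z'X)^{-1} Z'y in the just-identified case L = K = 2;
# the design below is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
z2 = rng.normal(0.0, 1.0, n)                          # instrument
eps = rng.normal(0.0, 1.0, n)
x2 = 0.8 * z2 + 0.5 * eps + rng.normal(0.0, 1.0, n)   # endogenous regressor
X = np.column_stack([np.ones(n), x2])
Z = np.column_stack([np.ones(n), z2])
y = X @ np.array([1.0, 2.0]) + eps

beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)           # (Z'X)^{-1} Z'y
print(beta_iv)                                        # close to [1, 2]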

Generalized Method of Moments (GMM)

It may happen that L > K (there are more orthogonality conditions
than parameters). In principle, it is better to have as many IVs as
possible, so the case L > K is desirable, but then the system
Z′X β̃ = Z′y does not have a solution.
This means that we cannot set Z′X β̃ − Z′y exactly equal to 0.
However, we can at least choose β̃ so that Z′X β̃ − Z′y is as close to 0
as possible. The GMM estimator is based on this idea.

Definition (GMM estimator)
Suppose that Ŵ →p W where W is a symmetric positive definite matrix. The
GMM estimator of β, denoted β̂GMM, is

β̂GMM = arg min_{β̃} J(β̃)

where

J(β̃) = (Z′X β̃ − Z′y)′ Ŵ (Z′X β̃ − Z′y).

31 / 50
Generalized Method of Moments (GMM)

Any positive definite matrix Ŵ is possible, for example Ŵ = I_L.
However, it can be proved that the optimal choice of the weighting
matrix Ŵ is

Ŵ = Ŝ^{-1}, where Ŝ →p S = Var(zi εi) = E(εi² zi zi′)

(this produces, asymptotically, the most efficient estimator). Therefore the
Efficient GMM estimator is

β̂GMM = arg min_{β̃} (Z′X β̃ − Z′y)′ Ŝ^{-1} (Z′X β̃ − Z′y).

The idea of using Ŵ = Ŝ^{-1} is that the components of zi εi that have
greater variability have lower weights in the sums and therefore less
weight in the GMM estimation. This procedure leads to the Efficient
GMM estimator.

32 / 50

Generalized Method of Moments (GMM)

Solving β̂GMM = arg min_{β̃} (Z′X β̃ − Z′y)′ Ŝ^{-1} (Z′X β̃ − Z′y), we have:

Theorem
Under Assumptions GMM2 and GMM4 the first order conditions are
given by

X′Z Ŝ^{-1} (Z′X β̂GMM − Z′y) = 0

and consequently the efficient GMM estimator is given by

β̂GMM = (X′Z Ŝ^{-1} Z′X)^{-1} X′Z Ŝ^{-1} Z′y.

Remark: Note that if L = K, β̂GMM = β̂IV = (Z′X)^{-1} Z′y.

33 / 50
Generalized Method of Moments (GMM)
To calculate the efficient GMM estimator, we need the consistent
estimator Ŝ. This leads us to the following two-step efficient GMM
procedure:
Step 1: Consider the following GMM estimator of β based on
Ŵ = I_L:

β̂ = arg min_{β̃} (Z′X β̃ − Z′y)′ (Z′X β̃ − Z′y) = (X′Z Z′X)^{-1} X′Z Z′y.

This estimator is consistent for β (although
asymptotically inefficient). Compute

Ŝ = (1/n) ∑_{i=1}^n ε̂i² zi zi′,

where ε̂i = yi − xi′ β̂. We can rewrite Ŝ as
Ŝ = ∑_{i=1}^n ε̂i² zi zi′ / n = Z′BZ/n where
B = diag(ε̂1², ε̂2², ..., ε̂n²).
34 / 50

Generalized Method of Moments (GMM)

Step 2: With Ŝ = (1/n) Z′BZ, we obtain (n cancels out in the
expression above)

β̂GMM = (X′Z Ŝ^{-1} Z′X)^{-1} X′Z Ŝ^{-1} Z′y
      = (X′Z (Z′BZ)^{-1} Z′X)^{-1} X′Z (Z′BZ)^{-1} Z′y.

This estimator is called the two-step GMM estimator.

35 / 50
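
The two-step procedure translates directly into code. Below is a sketch on simulated, heteroskedastic data; the overidentified design (L = 3 > K = 2) and all parameter values are illustrative assumptions:

# Two-step efficient GMM: step 1 uses W = I_L, step 2 uses W = S_hat^{-1}.
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
z2, z3 = rng.normal(size=(2, n))                           # two instruments
eps = rng.normal(0.0, 1.0, n) * (1.0 + 0.5 * np.abs(z2))   # heteroskedastic
x2 = 0.7 * z2 + 0.7 * z3 + 0.5 * eps + rng.normal(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x2])                      # K = 2
Z = np.column_stack([np.ones(n), z2, z3])                  # L = 3
y = X @ np.array([1.0, 2.0]) + eps

# Step 1: consistent but inefficient GMM with W = I_L
A = Z.T @ X
b1 = np.linalg.solve(A.T @ A, A.T @ (Z.T @ y))    # (X'ZZ'X)^{-1} X'ZZ'y
e = y - X @ b1
S_hat = (Z * e[:, None] ** 2).T @ Z / n           # (1/n) sum e_i^2 z_i z_i'

# Step 2: efficient GMM with W = S_hat^{-1}
Si = np.linalg.inv(S_hat)
b_gmm = np.linalg.solve(X.T @ Z @ Si @ Z.T @ X, X.T @ Z @ Si @ Z.T @ y)
print(b_gmm)                                      # close to [1, 2]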
Implications of Conditional Homoskedasticity
Assume now:

Assumption (GMM6 - Conditional Homoskedasticity)
E(εi² | zi) = σ².

This assumption implies

S = E(wi wi′) = E(εi² zi zi′) = σ² E(zi zi′) = σ² Qzz.

Its estimator is

Ŝ = σ̂² Z′Z/n,

where σ̂² is a consistent estimator of σ². Now the efficient GMM
becomes:

β̂GMM = (X′Z Ŝ^{-1} Z′X)^{-1} X′Z Ŝ^{-1} Z′y
      = (X′Z (Z′Z)^{-1} Z′X)^{-1} X′Z (Z′Z)^{-1} Z′y.

This estimator is called the two-stage least squares estimator and we will
denote it as β̂2SLS.
36 / 50

Two-Stage Least Squares Estimation (2SLS)


β̂2SLS is called the Two-Stage Least Squares (2SLS) estimator because it can
be computed in two stages.
Let PZ = Z (Z′Z)^{-1} Z′. Note that PZ = PZ′ and PZ PZ = PZ.
Now note also that

β̂2SLS = (X′Z (Z′Z)^{-1} Z′X)^{-1} X′Z (Z′Z)^{-1} Z′y
       = (X′PZ X)^{-1} X′PZ y
       = (X′PZ′ PZ X)^{-1} X′PZ′ y
       = (X̂′X̂)^{-1} X̂′y,

where X̂ = PZ X = Z (Z′Z)^{-1} Z′X.
The jth column of X̂ is x̂j = PZ xj, and thus x̂j is the predicted
vector from regressing xj on Z, j = 1, ..., K.
Therefore X̂ is the predicted matrix from regressing X on Z.
β̂2SLS is the OLS estimator from regressing y on X̂.
37 / 50
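
The sketch below computes β̂2SLS literally in these two stages, on the same kind of simulated design used earlier (an illustrative assumption, not real data):

# 2SLS computed in two stages: X_hat = P_Z X, then OLS of y on X_hat.
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
z2, z3 = rng.normal(size=(2, n))
eps = rng.normal(0.0, 1.0, n)
x2 = 0.7 * z2 + 0.7 * z3 + 0.5 * eps + rng.normal(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x2])
Z = np.column_stack([np.ones(n), z2, z3])
y = X @ np.array([1.0, 2.0]) + eps

X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)            # stage 1: fitted X
b_2sls = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ y)   # stage 2: OLS on X_hat
print(b_2sls)                                            # close to [1, 2]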
Two-Stage Least Squares Estimation (2SLS)

Therefore, the optimal GMM estimator under homoskedasticity can
be obtained in two stages:
1 At stage 1, regress each column of X on Z to obtain the predicted
matrix, denoted X̂.
2 At stage 2, regress y on X̂ to obtain β̂2SLS.
Therefore, the optimal GMM estimator under homoskedasticity is
usually called the two-stage least squares (2SLS) estimator.

38 / 50

Asymptotic Distribution of the GMM Estimator


Let us go back to the heteroskedastic case:

Theorem (Asymptotic Distribution of the GMM estimator)
(a) (Consistency) Under Assumptions GMM1-GMM4, β̂GMM →p β.
(b) (Asymptotic Normality) If Assumption GMM3 is strengthened to
Assumption GMM5, then

√n (β̂GMM − β) →d N(0, V)

where

V = Avar(√n (β̂GMM − β)) = (Qzx′ S^{-1} Qzx)^{-1},

with Qzx = E(zi xi′) and S = E(wi wi′) = E(εi² zi zi′).
(c) (Consistent Estimate of V) Suppose there is available a consistent
estimator, Ŝ, of S. Then, under Assumption GMM2, V is consistently
estimated by

V̂ = ( (X′Z/n) Ŝ^{-1} (Z′X/n) )^{-1}.
39 / 50
Hypothesis Testing

Theorem (Robust t-ratio and Wald Statistics)
Suppose Assumptions GMM1-GMM5 hold, and suppose there is available a
consistent estimate Ŝ of S. Then
(a) under the null H0: βj = β0j,

t0j = (β̂j − β0j)/σ̂_{β̂j} →d N(0, 1),

where β̂j is the jth element of β̂GMM and σ̂²_{β̂j} is the (j, j) element of V̂/n;
(b) under the null hypothesis H0: Rβ = r, where p is the number of
restrictions and R (p × K) is of full row rank,

W = n (R β̂GMM − r)′ (R V̂ R′)^{-1} (R β̂GMM − r) →d χ²(p).

40 / 50
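
A sketch of these formulas on simulated data (the same illustrative heteroskedastic design as before; the tested values are the true ones, so the statistics should be moderate):

# Robust V_hat, t-ratios and Wald statistic for the efficient GMM estimator.
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
z2, z3 = rng.normal(size=(2, n))
eps = rng.normal(0.0, 1.0, n) * (1.0 + 0.5 * np.abs(z2))
x2 = 0.7 * z2 + 0.7 * z3 + 0.5 * eps + rng.normal(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x2])
Z = np.column_stack([np.ones(n), z2, z3])
y = X @ np.array([1.0, 2.0]) + eps

# two-step efficient GMM, as in the earlier sketch
A = Z.T @ X
b1 = np.linalg.solve(A.T @ A, A.T @ (Z.T @ y))
e = y - X @ b1
Si = np.linalg.inv((Z * e[:, None] ** 2).T @ Z / n)
b = np.linalg.solve(X.T @ Z @ Si @ Z.T @ X, X.T @ Z @ Si @ Z.T @ y)

Qxz = X.T @ Z / n                                 # (X'Z/n)
V_hat = np.linalg.inv(Qxz @ Si @ Qxz.T)           # ((X'Z/n) S^{-1} (Z'X/n))^{-1}
se = np.sqrt(np.diag(V_hat) / n)                  # sigma_hat for each beta_j
t = (b - np.array([1.0, 2.0])) / se               # t-ratios at the true values
R, r = np.array([[0.0, 1.0]]), np.array([2.0])    # H0: beta2 = 2
W = n * (R @ b - r) @ np.linalg.solve(R @ V_hat @ R.T, R @ b - r)
print(t, W)                                       # t ~ N(0,1), W ~ chi2(1)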

Hypothesis Testing
Example: Wage and education data for a sample of men in 1976. Card
(1995) considered the following model:

log(wagei) = β1 + β2 educi + β3 experi + β4 exper²i + β5 blacki + β6 smsai + β7 southi + εi

where smsai = 1 if the individual lived in a Standard Metropolitan
Statistical Area in 1976.

However, Card (1995) considered educ an endogenous variable.


41 / 50
Hypothesis Testing
Example: Let nearc2 = 1 and nearc4 = 1 if he grew up near a 2-year
and a 4-year college, respectively. Card (1995) uses college proximity
(nearc2 and nearc4) as an instrument to identify the returns to
schooling, noting that living close to a college during childhood may
induce some children to go to college but is unlikely to directly affect
the wages earned in their adulthood. Hence, nearc2 and nearc4 are
used as instrumental variables for educ.

42 / 50

Hypothesis Testing
Example: Let’s get the GMM estimator assuming that nearc2 and
nearc4 are instrumental variables:

xi′ = (1, educi, experi, exper²i, blacki, smsai, southi)
zi′ = (1, experi, exper²i, blacki, smsai, southi, nearc2i, nearc4i)

43 / 50
Hypothesis Testing

Example: Let’s obtain the 2SLS estimator

44 / 50

Hypothesis Testing
Example: Let’s now obtain the 2SLS estimator with robust standard
errors (2SLS is consistent and asymptotically normal under
heteroskedasticity, but it is not efficient and we need to use robust
standard errors to make inferences)

45 / 50
Testing Overidentifying Restrictions
Testing all Orthogonality Conditions

Suppose that we would like to test

H0: E[zi (yi − xi′ β)] = 0  vs  H1: E[zi (yi − xi′ β)] ≠ 0.

Theorem (Hansen’s test of Overidentifying Restrictions)
Under Assumptions GMM1-GMM5,

Ĵ = (Z′X β̂GMM − Z′y)′ Ŝ^{-1} (Z′X β̂GMM − Z′y)/n →d χ²(L−K).

If Ĵ is large, it means that either the orthogonality conditions
(Assumption GMM3) or the other assumptions (or both) are
likely to be false.
In the conditionally homoskedastic case with β̂GMM = β̂2SLS and
Ŝ = σ̂² Z′Z/n, the Hansen test is called the Sargan test.
To perform this test we need L > K.
46 / 50
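
A sketch of the Ĵ statistic on simulated data where the overidentifying restriction is true (illustrative design with L = 3, K = 2, so Ĵ →d χ²(1); the 5% critical value is about 3.84):

# Hansen's J statistic for overidentifying restrictions.
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
z2, z3 = rng.normal(size=(2, n))
eps = rng.normal(0.0, 1.0, n)
x2 = 0.7 * z2 + 0.7 * z3 + 0.5 * eps + rng.normal(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x2])
Z = np.column_stack([np.ones(n), z2, z3])
y = X @ np.array([1.0, 2.0]) + eps

A = Z.T @ X
b1 = np.linalg.solve(A.T @ A, A.T @ (Z.T @ y))    # step 1 (W = I_L)
e = y - X @ b1
Si = np.linalg.inv((Z * e[:, None] ** 2).T @ Z / n)
b = np.linalg.solve(X.T @ Z @ Si @ Z.T @ X, X.T @ Z @ Si @ Z.T @ y)

g = Z.T @ (y - X @ b)                             # -(Z'X b - Z'y)
J = (g @ Si @ g) / n                              # Hansen's J
print(J)        # compare with the chi2(1) 5% critical value, 3.84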

Testing Overidentifying Restrictions


Example (continuation): The program provides the J statistic:

H0: E(zi εi) = 0 [the instruments are uncorrelated with the error]

H1: E(zi εi) ≠ 0 [some instruments are endogenous]

. estat overid

Test of overidentifying restriction:

Hansen's J chi2(1) = 2.65321 (p = 0.1033)

We do not reject H0 at the 5% level (that’s good!)


47 / 50
Testing Subsets of Orthogonality Conditions

Consider the partition

zi = (zi1′, zi2′)′,  zi1: L1 rows,  zi2: L − L1 rows.

We want to test H0: E(zi1 εi) = 0 and E(zi2 εi) = 0 vs
H1: E(zi1 εi) = 0 but E(zi2 εi) ≠ 0.
The basic idea is to compare two J statistics from two separate GMM
estimators, one using only the instruments included in zi1 and the
other using also the suspect instruments zi2 in addition to zi1. If the
inclusion of the suspect instruments significantly increases the Ĵ
statistic, that is a good reason for doubting the predeterminedness of
zi2. This restriction is testable if L1 ≥ K (why?).
Remark: These tests are called tests for subsets of orthogonality
conditions by Hayashi. However, they are called tests for additional
orthogonality conditions by other authors.

48 / 50

Testing Subsets of Orthogonality Conditions

Theorem (Testing a Subset of Orthogonality Conditions)
Suppose that the rank condition is satisfied for zi1, so E(zi1 xi′) is of full
column rank. Under Assumptions GMM1-GMM5 and under the null H0,
we have

C ≡ Ĵ − Ĵ1 →d χ²(L−L1),

where in Ĵ we use zi1 and zi2 as instrumental variables and in Ĵ1 just zi1.

49 / 50
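
A sketch of the C statistic under illustrative assumptions (L = 3, L1 = 2 = K, so L − L1 = 1; the suspect instrument z3 is in fact valid, so H0 holds). With L1 = K the restricted model is just identified, so Ĵ1 is numerically zero and C ≈ Ĵ:

# C = J - J1: J uses all instruments (z_i1, z_i2), J1 uses only z_i1.
import numpy as np

def two_step_gmm_J(y, X, Z):
    # two-step efficient GMM; returns the estimate and the J statistic
    n = len(y)
    A = Z.T @ X
    b1 = np.linalg.solve(A.T @ A, A.T @ (Z.T @ y))      # step 1, W = I
    e = y - X @ b1
    Si = np.linalg.inv((Z * e[:, None] ** 2).T @ Z / n)
    b = np.linalg.solve(X.T @ Z @ Si @ Z.T @ X, X.T @ Z @ Si @ Z.T @ y)
    g = Z.T @ (y - X @ b)
    return b, (g @ Si @ g) / n

rng = np.random.default_rng(4)
n = 20_000
z2, z3 = rng.normal(size=(2, n))
eps = rng.normal(0.0, 1.0, n)
x2 = 0.7 * z2 + 0.7 * z3 + 0.5 * eps + rng.normal(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x2])
Z1 = np.column_stack([np.ones(n), z2])    # L1 = 2 >= K: rank condition holds
Z = np.column_stack([Z1, z3])             # adds the suspect instrument z3
y = X @ np.array([1.0, 2.0]) + eps

_, J = two_step_gmm_J(y, X, Z)
_, J1 = two_step_gmm_J(y, X, Z1)
print(J - J1)    # approx chi2(L - L1) = chi2(1) under H0: E(z3 eps) = 0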
Testing Subsets of Orthogonality Conditions

Example: We can use the theorem to test for the endogeneity of educi in
the previous example. Let

zi′ = (1, experi, exper²i, blacki, smsai, southi, nearc2i, nearc4i),

and suppose you want to test H0: E(zi εi) = 0 and E(educi εi) = 0 vs
H1: E(zi εi) = 0 but E(educi εi) ≠ 0.

. estat endogenous

Test of endogeneity (orthogonality conditions)


H0: Variables are exogenous

GMM C statistic chi2(1) = 3.73275 (p = 0.0534)

50 / 50
