
E300 Econometric Methods

Lecture 25:
Generalized Method of Moments

Julius Vainora
University of Cambridge

Michaelmas Term 2022

E300 Econometric methods Lecture 25: Generalized Method of Moments 1 / 23


In this lecture

• Some basics of the generalized method of moments.

• Wooldridge (2001) “Applications of Generalized Method of Moments

Estimation” and Greene 13.4.



Method of Moments (1/2)

• Suppose that the distribution of the data Y1, Y2, . . . , Yn depends on k parameters θ1, θ2, . . . , θk.

• One way to estimate the parameters is to link them to the j-th moments of the random variables, E[Yi^j], for j = 1, . . . , k:

  E[Yi^j] = fj(θ1, θ2, . . . , θk).

• The estimation replaces E[Yi^j] with the sample moments (1/n) Σ_{i=1}^n Yi^j and solves the above equations to obtain the method of moments estimators θ̂1^MM, . . . , θ̂k^MM.

• Example. Consider Yi ∼ i.i.d. N(µ, 1). Then, since E[Yi] = µ,

  µ̂^MM = (1/n) Σ_{i=1}^n Yi = Ȳ.



Method of Moments (2/2)
• Example. Consider an i.i.d. sample y1, . . . , yn from a parametric distribution Fα,β with mean µ = α/β and variance σ² = α/β², where α, β > 0.

• Find method of moments estimators of α and β based on µ and σ².

• From our two moment conditions we get

  µ = α/β and σ² = α/β²  =⇒  α = µ²/σ² and β = µ/σ².

• Hence, possible method of moments estimators are

  α̂ = µ̂²/σ̂²  and  β̂ = µ̂/σ̂²,

  where

  µ̂ = (1/n) Σ_{i=1}^n yi  and  σ̂² = (1/n) Σ_{i=1}^n (yi − µ̂)² = (1/n) Σ_{i=1}^n yi² − µ̂².

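The two-moment example above is easy to check numerically. A minimal sketch, assuming the data come from a Gamma(α, rate β) distribution, which has exactly mean α/β and variance α/β² (the particular distribution and parameter values are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha_true, beta_true = 4.0, 2.0  # shape alpha, rate beta (assumed for illustration)

# Gamma with shape alpha and rate beta has mean alpha/beta and variance alpha/beta^2
y = rng.gamma(shape=alpha_true, scale=1.0 / beta_true, size=100_000)

mu_hat = y.mean()
sigma2_hat = y.var()  # (1/n) * sum((y_i - mu_hat)^2), the MM variance estimator

alpha_hat = mu_hat**2 / sigma2_hat
beta_hat = mu_hat / sigma2_hat
print(alpha_hat, beta_hat)  # close to 4.0 and 2.0
```

Note that `np.var` with its default `ddof=0` is exactly the 1/n sample moment used on the slide, not the unbiased 1/(n − 1) version.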


Generalized Method of Moments (1/2)
• What to do if the number of parameters is less than the number of moments that you link to the parameters?

• Example. Consider Yi ∼ i.i.d. N(θ, θ). We want to estimate θ.

• We have one parameter θ and two moment conditions:

  E[Yi] = θ  and  E[(Yi − E[Yi])²] = θ.

• We can estimate θ using either of the two individual estimators:

  Ȳ = (1/n) Σ_{i=1}^n Yi  or  S² = (1/n) Σ_{i=1}^n (Yi − Ȳ)².

• The asymptotic distributions of Ȳ and S² are

  Ȳ ≈ N(θ, n⁻¹θ),
  S² ≈ N(θ, n⁻¹2θ²).

• Both estimators are consistent and asymptotically normal.

• For values of θ > 0 such that θ > 2θ², that is, θ < 1/2, Ȳ has the higher variance; otherwise S² has the higher variance.
Generalized Method of Moments (2/2)
• Consider a linear combination of the two estimators

  θ̂(λ) = λȲ + (1 − λ)S², λ ∈ (0, 1).

• As λ is a constant, and Ȳ and S² are asymptotically normal,

  λȲ ≈ N(λθ, n⁻¹λ²θ),
  (1 − λ)S² ≈ N((1 − λ)θ, n⁻¹(1 − λ)²2θ²).

• As the mean and variance estimators are independent, by Slutsky's theorem,

  θ̂(λ) ≈ N(λθ + (1 − λ)θ, n⁻¹(λ²θ + 2(1 − λ)²θ²)),
  √n(θ̂(λ) − θ) →d N(0, λ²θ + 2(1 − λ)²θ²).

• Pick λ to minimize the asymptotic variance: λ̂ = 2θ/(1 + 2θ).

• General principle: the Generalized Method of Moments (GMM) tells us how to use the two population moment equations in a manner that minimizes the asymptotic variance of the estimator. It weights Ȳ and S² in an optimal way.
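A quick Monte Carlo check of the optimal combination. This is a sketch: the feasible weight uses the plug-in λ̂ = 2Ȳ/(1 + 2Ȳ), with Ȳ standing in for the unknown θ, and the parameter values are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 0.25, 500, 2000  # theta < 1/2, so S^2 alone beats Ybar alone

est_mean, est_var, est_gmm = [], [], []
for _ in range(reps):
    y = rng.normal(theta, np.sqrt(theta), size=n)
    ybar = y.mean()
    s2 = ((y - ybar) ** 2).mean()
    lam = 2 * ybar / (1 + 2 * ybar)  # plug-in optimal weight
    est_mean.append(ybar)
    est_var.append(s2)
    est_gmm.append(lam * ybar + (1 - lam) * s2)

# the combined estimator should show the smallest sampling variance of the three
print(np.var(est_mean), np.var(est_var), np.var(est_gmm))
```

With θ = 0.25 the theoretical variances are θ/n = 5.0e-4, 2θ²/n = 2.5e-4, and (λ²θ + 2(1 − λ)²θ²)/n ≈ 1.67e-4 at λ* = 1/3, and the simulated variances line up with that ordering.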
OLS as Method of Moments estimator
• Consider a linear regression with k regressors:

  yi = xi′β + εi, i = 1, . . . , n.

• The assumption E[xi εi] = 0 can be rewritten as

  E[xi(yi − xi′β)] = E[xi yi] − E[xi xi′]β = 0.

• Since xi is a k × 1 vector, this gives k population moment equations in k unknowns β1, . . . , βk.

• Replacing the population moments by sample moments, we get

  (1/n) Σ_{i=1}^n xi yi − ((1/n) Σ_{i=1}^n xi xi′) β̂MM = 0.

• The method of moments estimator of β is

  β̂MM = ((1/n) Σ_{i=1}^n xi xi′)⁻¹ (1/n) Σ_{i=1}^n xi yi = β̂OLS.

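Solving the sample moment equations directly reproduces the OLS coefficients. A minimal sketch on simulated data (the data-generating process and all variable names are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(size=n)

# method of moments: solve ((1/n) sum x_i x_i') beta = (1/n) sum x_i y_i
beta_mm = np.linalg.solve(X.T @ X / n, X.T @ y / n)

# ordinary least squares for comparison
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(beta_mm, beta_ols))  # True: the two estimators coincide
```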


Can we improve over OLS?
• Consider a linear regression

  yi = xi′β + εi, i = 1, . . . , n,

  where E[εi | xi] = 0 and (yi, xi) are i.i.d.

• Note that E[εi | xi] = 0 implies many moment conditions.

• For example, for j = 1, . . . , k,

  E[xi(yi − xi′β)] = 0,
  E[xij²(yi − xi′β)] = 0.

• This gives more moment equations than there are unknowns.

• Using GMM we utilize more information and can improve on OLS in the case of heteroskedasticity! (With homoskedasticity and i.i.d. data we know that OLS is BLUE!)
How does GMM work?
• Decide which extra moment conditions to use in addition to the usual “no correlation” condition.

• Do OLS. Calculate the weighting matrix, which is the inverse of a consistent estimator of the variance-covariance matrix of the sample moments used. If there are m moment conditions, the weighting matrix will be m × m.

• The GMM estimator minimizes a quadratic form of the sample moment conditions, where the weighting matrix appears in the quadratic form.

• Such a choice of the weighting matrix is asymptotically optimal in the sense that the asymptotic variance of the estimator is minimized with this choice of the matrix.

• Problems: which conditions to add? GMM is consistent but not unbiased. It may have problems in small samples. Efficiency gains may be small in large samples.
GMM defined formally
• Suppose that the data w1, . . . , wn are i.i.d. and we have moment conditions

  E[g(wi, θ0)] = 0.

• g(wi, θ0) is an m-dimensional function.

• θ0 is a k-dimensional parameter vector, m ≥ k.

• The “variance” of the moment functions g(wi, θ0) is

  E[g(wi, θ0) g(wi, θ0)′] = Ω > 0.

• A GMM estimator is defined as

  θ̂GMM = arg min_θ ((1/n) Σ_{i=1}^n g(wi, θ))′ Ŵn ((1/n) Σ_{i=1}^n g(wi, θ)).

• Ŵn is a weighting matrix such that Ŵn →p W for some W > 0.

• Optimal GMM uses W = Ω⁻¹.


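The definition translates directly into code. A sketch for the earlier N(θ, θ) example, using g(wi, θ) = (Yi − θ, (Yi − Ȳ)² − θ)′ and an identity weighting matrix — both of these choices, and the simulation setup, are assumptions for illustration, not the optimal weighting:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
theta_true = 0.25
y = rng.normal(theta_true, np.sqrt(theta_true), size=5000)

def gbar(theta):
    """Sample average of the two moment functions."""
    g1 = y - theta                    # from E[Y_i] - theta = 0
    g2 = (y - y.mean()) ** 2 - theta  # from E[(Y_i - E[Y_i])^2] - theta = 0
    return np.array([g1.mean(), g2.mean()])

def objective(theta):
    g = gbar(theta)
    return g @ np.eye(2) @ g  # quadratic form with identity weighting matrix

res = minimize_scalar(objective, bounds=(1e-6, 10), method="bounded")
theta_gmm = res.x
print(theta_gmm)  # close to 0.25
```

With the identity weight the objective is (Ȳ − θ)² + (S² − θ)², so the minimizer is simply the average of the two individual estimators; the optimal Ŵn from the previous slides would tilt this toward the lower-variance moment.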
2SLS as GMM: moment conditions
• Consider a linear regression with k regressors

  yi = xi′β + εi.

• The errors εi = yi − xi′β and xi may be correlated, that is,

  E[xi(yi − xi′β)] ≠ 0.

• Suppose there is a set of instruments Z = (z1, . . . , zn)′ such that dim zi = r ≥ k = dim xi, and

  E[zi(yi − xi′β)] = 0.

• We used these conditions to motivate the 2SLS estimator.

• Denote g(wi, β) = zi(yi − xi′β), where wi = (yi, xi, zi); then

  E[g(wi, β)] = 0.



2SLS as GMM: optimal weight
• Let us find Ω = E[g(wi, β) g(wi, β)′].

• Recall that g(wi, β) = zi(yi − xi′β), and hence

  Ω = E[zi(yi − xi′β)² zi′] = E[εi² zi zi′].

• Assuming E[εi² | zi] = σε² (homoskedasticity), we obtain

  Ω = σε² E[zi zi′].

• We can define the weighting matrix as

  Ŵn = σ̂ε⁻² ((1/n) Σ_{i=1}^n zi zi′)⁻¹ = σ̂ε⁻² ((1/n) Z′Z)⁻¹.

• Then, by the LLN and Slutsky’s theorem, we obtain

  Ŵn = σ̂ε⁻² ((1/n) Z′Z)⁻¹ →p (σε² E[zi zi′])⁻¹ = Ω⁻¹.
2SLS as GMM
• The moment conditions are given by

  (1/n) Σ_{i=1}^n g(wi, β) = (1/n) Σ_{i=1}^n zi(yi − xi′β) = (1/n) Z′(Y − Xβ).

• The optimal weighting matrix under homoskedasticity is

  Ŵn = σ̂ε⁻² ((1/n) Z′Z)⁻¹.

• The optimal GMM minimizes the quadratic form

  ((1/n) Σ_{i=1}^n g(wi, β))′ Ŵn ((1/n) Σ_{i=1}^n g(wi, β)).

• This is equivalent to minimizing

  (Y − Xβ)′ Z (Z′Z)⁻¹ Z′(Y − Xβ) = (PZ Y − PZ Xβ)′ (PZ Y − PZ Xβ).

• Recall that the projection matrix PZ = Z(Z′Z)⁻¹Z′ is idempotent.

• But the minimizer of this is β̂2SLS!

• The optimal GMM under homoskedasticity is 2SLS!
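The equivalence can be checked numerically by computing the closed-form minimizer of the GMM quadratic form on simulated data with an endogenous regressor. The instrument setup below is an assumption made purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
z = rng.normal(size=(n, 2))  # two instruments, one endogenous regressor
u = rng.normal(size=n)
x = z @ np.array([1.0, 0.5]) + 0.8 * u + rng.normal(size=n)
eps = u                      # endogeneity: Cov(x_i, eps_i) = 0.8 != 0
beta_true = 2.0
y = beta_true * x + eps

X = x.reshape(-1, 1)
Z = z

# minimizer of (Y - Xb)' Z (Z'Z)^{-1} Z' (Y - Xb): the 2SLS estimator
PZ = Z @ np.linalg.solve(Z.T @ Z, Z.T)
beta_2sls = np.linalg.solve(X.T @ PZ @ X, X.T @ PZ @ y)

print(beta_2sls)  # close to 2.0, while plain OLS would be biased upward here
```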
GMM under heteroskedasticity

• Under heteroskedasticity, GMM is more efficient than 2SLS.

• The gains may be small if heteroskedasticity is not too much of a problem.



Why is GMM useful in econometrics?

• GMM estimators are consistent, asymptotically unbiased, and asymptotically

normally distributed.

• If the weighting matrix Ŵn is appropriately chosen, GMM estimators are also efficient within a wide class of estimators.

• These results hold under very general conditions, and we do not need to assume

normality.

• Almost all other common estimators (including OLS, 2SLS and even ML

estimators) are special cases of GMM.

• We can often derive valid conditional moment restrictions by using economic

theory.



Example: Mid-course Exam 2021/2022 B2 (1/4)

Consider an i.i.d. sample of (yi, xi) for i = 1, . . . , n, where xi is a (k + 1) × 1 vector, and a linear regression yi = xi′β + εi. It is known that E[xi εi] ≠ 0, and that there exist i.i.d. vectors z1, . . . , zn such that E[εi | zi] = 0. Assume that zi is r × 1 with r > k + 1 and includes the exogenous variables from xi. However, E[εi² | zi] = f(zi) is not constant, although you can assume that f(zi) is known.

(a) [3 points] Show that

  E[zi εi / f(zi)] = 0. (1)

Solution. By the Law of Iterated Expectations,

  E[zi εi / f(zi)] = E[zi E[εi | zi] / f(zi)] = 0.



Example: Mid-course Exam 2021/2022 B2 (2/4)
Consider an i.i.d. sample of (yi, xi) for i = 1, . . . , n, where xi is a (k + 1) × 1 vector, and a linear regression yi = xi′β + εi. It is known that E[xi εi] ≠ 0, and that there exist i.i.d. vectors z1, . . . , zn such that E[εi | zi] = 0. Assume that zi is r × 1 with r > k + 1 and includes the exogenous variables from xi. However, E[εi² | zi] = f(zi) is not constant, although you can assume that f(zi) is known.

(b) [4 points] Write down a minimisation problem that a sub-optimal GMM estimator based on (1) solves.

Solution. Let the weighting matrix be simply Ir. Let wi = (yi, xi, zi)′. Then g(wi, β) = zi(yi − xi′β)/f(zi) and

  β̂GMM = arg min_β ((1/n) Σ_{i=1}^n g(wi, β))′ ((1/n) Σ_{i=1}^n g(wi, β)).



Example: Mid-course Exam 2021/2022 B2 (3/4)
(c) [6 points] Write down a minimisation problem that the optimal GMM estimator based on (1) solves. Make sure to find the expression for the optimal weighting matrix.

Solution. We have that, by the Law of Iterated Expectations,

  Ω = E[g(wi, β) g(wi, β)′] = E[εi² zi zi′ / f(zi)²] = E[E[εi² | zi] zi zi′ / f(zi)²] = E[zi zi′ / f(zi)].

Hence, we can estimate the optimal weighting matrix by

  Ŵn = ((1/n) Σ_{i=1}^n zi zi′ / f(zi))⁻¹.

Thus, the optimal GMM estimator based on (1) is

  β̂*GMM = arg min_β ((1/n) Σ_{i=1}^n g(wi, β))′ Ŵn ((1/n) Σ_{i=1}^n g(wi, β)).

Note that since f(·) is known we did not need a two-step procedure where we would first estimate Ŵn.
Example: Mid-course Exam 2021/2022 B2 (4/4)
(d) [7 points] Solve the minimisation problem from (c). Hint: Σ_{i=1}^n zi(yi − xi′β)/f(zi) can be written as Z′Σ⁻¹(Y − Xβ), where Σ = diag(f(z1), . . . , f(zn)).

Solution. Using the hint, we get that β̂*GMM is the minimiser of

  (Y − Xβ)′Σ⁻¹Z (Z′Σ⁻¹Z)⁻¹ Z′Σ⁻¹(Y − Xβ) = (Ỹ − X̃β)′Z̃ (Z̃′Z̃)⁻¹ Z̃′(Ỹ − X̃β)
                                           = (PZ̃ Ỹ − PZ̃ X̃β)′ (PZ̃ Ỹ − PZ̃ X̃β),

where PZ̃ = Z̃(Z̃′Z̃)⁻¹Z̃′, Ỹ = Σ⁻¹ᐟ²Y, X̃ = Σ⁻¹ᐟ²X, Z̃ = Σ⁻¹ᐟ²Z, and Σ⁻¹ᐟ² = diag(f(z1)⁻¹ᐟ², . . . , f(zn)⁻¹ᐟ²). But the minimiser of this expression is simply the 2SLS estimator for the transformed data:

  β̂*GMM = (X̃′PZ̃ X̃)⁻¹ X̃′PZ̃ Ỹ.

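The transformed-data solution from (d) can be sketched in code. Everything below — the form f(zi) = 1 + zi1², the instruments, and the coefficients — is an assumption made for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2000
z = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # r = 3 instruments incl. constant
f = 1.0 + z[:, 1] ** 2                 # known conditional variance f(z_i), assumed form
eps = rng.normal(size=n) * np.sqrt(f)  # heteroskedastic error with E[eps^2 | z] = f(z)
x = z[:, 1] + 0.5 * z[:, 2] + 0.5 * eps + rng.normal(size=n)  # endogenous regressor
X = np.column_stack([np.ones(n), x])   # k + 1 = 2 regressors
beta_true = np.array([1.0, 2.0])
y = X @ beta_true + eps

# transform by Sigma^{-1/2} = diag(f^{-1/2}) and run 2SLS on the transformed data
w = 1.0 / np.sqrt(f)
Yt, Xt, Zt = w * y, w[:, None] * X, w[:, None] * z
PZt = Zt @ np.linalg.solve(Zt.T @ Zt, Zt.T)
beta_gmm = np.linalg.solve(Xt.T @ PZt @ Xt, Xt.T @ PZt @ Yt)

print(beta_gmm)  # close to (1.0, 2.0)
```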


Example: Mid-course Exam 2020/2021 B2 (1/4)

Consider T observations y1, . . . , yT following an AR(2) model yt = φ1 yt−1 + φ2 yt−2 + εt with independent and identically distributed errors εt with zero mean and variance σ² < ∞.

(a) [4 points] Suppose that φ1 = 0.3 and φ2 = −0.4. Is yt stable?

Solution. The autoregressive lag polynomial is Φ(z) = 1 − 0.3z + 0.4z² and has roots 3/8 ± (√151/8)i with modulus √((3/8)² + 151/8²) = √10/2 > 1. Thus, the process is stable.



Example: Mid-course Exam 2020/2021 B2 (2/4)
(b) [6 points] Assume that φ1 and φ2 are such that yt is stable and, hence, stationary. Show that

  γ(1) = φ1 γ(0) + φ2 γ(1) and γ(2) = φ1 γ(1) + φ2 γ(0),

where γ(h) = Cov[yt, yt−h] is the autocovariance function.

Solution. Since yt is stationary,

  E[yt] = φ1 E[yt−1] + φ2 E[yt−2] + E[εt],
  E[yt] = φ1 E[yt] + φ2 E[yt],
  E[yt] = 0.

Hence, γ(h) = E[yt yt−h]. Multiply both sides of the model by yt−1 and take expectations, noting that E[εt yt−1] = 0:

  yt yt−1 = φ1 yt−1² + φ2 yt−2 yt−1 + εt yt−1,
  γ(1) = φ1 γ(0) + φ2 γ(1).

Now multiply both sides by yt−2 and take expectations:

  yt yt−2 = φ1 yt−1 yt−2 + φ2 yt−2² + εt yt−2,
  γ(2) = φ1 γ(1) + φ2 γ(0).


Example: Mid-course Exam 2020/2021 B2 (3/4)

(c) [5 points] Based on the results in (b), find method of moments estimators of φ1 and φ2.

Solution. We may additionally divide the two relations by γ(0) to have an equivalent system in terms of autocorrelations. Then, solving

  ρ(1) = φ1 + φ2 ρ(1) and ρ(2) = φ1 ρ(1) + φ2

as a system of equations gives

  φ̂1 = ρ̂(1)(1 − ρ̂(2)) / (1 − ρ̂(1)²)  and  φ̂2 = (ρ̂(2) − ρ̂(1)²) / (1 − ρ̂(1)²),

where, for h > 0,

  ρ̂(h) = (Σ_{t=h+1}^T yt yt−h) / (Σ_{t=1}^T yt²).

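A sketch of these Yule-Walker-style estimators on a simulated AR(2) path (the simulation setup, including the burn-in, is an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
phi1_true, phi2_true, T = 0.3, -0.4, 20_000

# simulate a stable AR(2) with a burn-in that is discarded
y = np.zeros(T + 200)
for t in range(2, len(y)):
    y[t] = phi1_true * y[t - 1] + phi2_true * y[t - 2] + rng.normal()
y = y[200:]

def rho_hat(h):
    """Sample autocorrelation as defined on the slide."""
    return (y[h:] * y[:-h]).sum() / (y ** 2).sum()

r1, r2 = rho_hat(1), rho_hat(2)
phi1_hat = r1 * (1 - r2) / (1 - r1 ** 2)
phi2_hat = (r2 - r1 ** 2) / (1 - r1 ** 2)
print(phi1_hat, phi2_hat)  # close to 0.3 and -0.4
```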


Example: Mid-course Exam 2020/2021 B2 (4/4)
(d) [5 points] Assume that σ² > 0 is known. Construct a GMM estimator for φ1 and φ2 using more than two moment conditions. You may use an identity matrix as your weighting matrix.

Solution. Continuing in the same manner as in (b), we also get, e.g.,

  γ(3) = φ1 γ(2) + φ2 γ(1) and γ(4) = φ1 γ(3) + φ2 γ(2).

Hence, the moment conditions in terms of autocovariances are g(φ1, φ2) = E[(g1t, g2t, g3t, g4t)′] = 0, where

  g1t = φ1 yt−1² + φ2 yt−2 yt−1 − yt yt−1,
  g2t = φ1 yt−1 yt−2 + φ2 yt−2² − yt yt−2,
  g3t = φ1 yt−1 yt−3 + φ2 yt−2 yt−3 − yt yt−3,
  g4t = φ1 yt−1 yt−4 + φ2 yt−2 yt−4 − yt yt−4.

The sample moments then are

  ĝ(φ1, φ2) = (1/(T − 4)) Σ_{t=5}^T (g1t, g2t, g3t, g4t)′.

Thus,

  (φ̂1, φ̂2)′ = arg min_{φ1, φ2} ĝ(φ1, φ2)′ ĝ(φ1, φ2)

defines a GMM estimator.
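A sketch implementing this four-condition GMM estimator with an identity weighting matrix; the simulation setup is an assumption for illustration, and a numerical optimizer stands in for the (here quadratic, hence closed-form-solvable) minimization:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
phi_true, T = np.array([0.3, -0.4]), 20_000

y = np.zeros(T + 200)
for t in range(2, len(y)):
    y[t] = phi_true[0] * y[t - 1] + phi_true[1] * y[t - 2] + rng.normal()
y = y[200:]

def gbar(phi):
    """Sample average of the four autocovariance moment conditions (t = 5, ..., T)."""
    p1, p2 = phi
    yt = y[4:]  # y_t, aligned so y[3:-1] is y_{t-1}, ..., y[0:-4] is y_{t-4}
    return np.array([
        (p1 * y[3:-1] ** 2 + p2 * y[2:-2] * y[3:-1] - yt * y[3:-1]).mean(),
        (p1 * y[3:-1] * y[2:-2] + p2 * y[2:-2] ** 2 - yt * y[2:-2]).mean(),
        (p1 * y[3:-1] * y[1:-3] + p2 * y[2:-2] * y[1:-3] - yt * y[1:-3]).mean(),
        (p1 * y[3:-1] * y[0:-4] + p2 * y[2:-2] * y[0:-4] - yt * y[0:-4]).mean(),
    ])

def objective(phi):
    g = gbar(phi)
    return g @ g  # identity weighting matrix

res = minimize(objective, x0=np.zeros(2), method="BFGS")
print(res.x)  # close to (0.3, -0.4)
```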
