
Efficient GMM Estimation of a Spatial Autoregressive Model

with Autoregressive Disturbances

Lung-fei Lee∗, Department of Economics, Ohio State University, Columbus, OH 43210, USA. Email: lee.1777@osu.edu

Xiaodong Liu, Department of Economics, Ohio State University, Columbus, OH 43210, USA. Email: liu.470@osu.edu

October 2006

Abstract

In this paper, we extend the GMM framework for the estimation of the mixed-regressive spatial autoregressive model introduced by Lee (2006) to estimate the mixed-regressive spatial autoregressive model with spatial autoregressive disturbances. In contrast to the G2SLS estimation method in Kelejian and Prucha (1998), the proposed GMM estimation is a procedure that estimates the spatial lag coefficient and the spatial correlation parameter of the spatially correlated disturbances simultaneously. The GMM approach has a computational advantage over the conventional maximum likelihood method. The proposed GMM estimators are shown to be consistent and asymptotically normal. The best GMM estimator is derived within the class of optimal GMM estimators based on linear and quadratic moment conditions of the disturbances. The best GMM estimator is asymptotically as efficient as the MLE under normality, asymptotically more efficient than the QMLE otherwise, and is efficient relative to the G2SLSE.

Key Words: Spatial econometrics, spatial autoregressive models, regression, GMM, QMLE, efficiency
JEL Classification: C13, C21, R15
∗ Lee gratefully acknowledges financial support from the National Science Foundation Grant No. SES-0519204 and research assistantship support from the Department of Economics at the Ohio State University.

1 Introduction

In this paper, we consider the estimation of the spatial autoregressive (SAR) model with SAR disturbances. For this model, the most conventional estimation method is quasi-maximum likelihood (QML), which is, however, not easy to implement due to the significant computational complexity of evaluating the determinant of the Jacobian transformation, especially for large sample sizes with general spatial weights matrices.
Another estimation procedure in the existing literature for SAR models with exogenous regressors (also known as mixed-regressive spatial autoregressive (MRSAR) models) and SAR disturbances is the generalized two stage least squares (G2SLS) method suggested by Kelejian and Prucha (1998). The G2SLS estimator has the virtue of computational simplicity, but it is inefficient relative to the maximum likelihood estimator (MLE) under normality, since it estimates the spatial correlation in the disturbances and the other parameters separately. Moreover, the G2SLS method in Kelejian and Prucha (1998) would not be applicable when all exogenous variables in Xn are really irrelevant.
Lee (2001; 2006) proposes a systematic generalized method of moments (GMM) framework for the estimation of SAR models, with or without exogenous regressors. In this paper, we extend the GMM approach to the estimation of the (MR)SAR model with SAR disturbances. We study identification of the model and the asymptotic properties of the proposed GMM estimator (GMME). In contrast to the two or three step 2SLS estimation in Kelejian and Prucha (1998), the coefficients of the spatial regression equation and the spatial lag parameter in the disturbances are estimated simultaneously, so as not to lose efficiency. We show the existence of a distributionally free best GMME (BGMME) within the class of GMMEs based on linear and quadratic moment conditions. The BGMME has the merits of computational simplicity and asymptotic efficiency. The proposed BGMME is asymptotically as efficient as the MLE under normality, asymptotically more efficient than the QMLE otherwise, and is efficient relative to the G2SLSE.
This paper is organized as follows. In Section 2, we formally introduce the MRSAR model with SAR disturbances. We establish identification of the model and propose a GMM estimation approach in Section 3. Section 4 investigates consistency and the asymptotic distributions of the GMMEs, and derives the best selection of moment functions and optimal IVs. All proofs are collected in the appendices. Section 5 provides some Monte Carlo results for the comparison of finite sample properties of the estimators. Section 6 briefly concludes.

2 The MRSAR Model with SAR Disturbances

The MRSAR model with SAR disturbances is

Yn = λWn Yn + Xn β + un ,  un = ρMn un + εn ,   (1)

where the elements εni , i = 1, · · · , n, of the n-dimensional (column) vector εn are i.i.d. with zero mean and variance σ². The model (1) incorporates both a spatial lag Wn Yn and spatially correlated disturbances un in a regression equation. Wn and Mn are n × n spatial weights matrices, which may or may not be the same. Let θ = (ρ, λ, β')' and δ = (λ, β')'. Denote the true parameters that generate the observed sample by θ0 = (ρ0 , λ0 , β0')'. For this model, denote Sn(λ) = In − λWn and Rn(ρ) = In − ρMn. At the true θ0 , write Sn = Sn(λ0) and Rn = Rn(ρ0) for simplicity. Furthermore, denote Gn = Wn Sn^{-1} and Hn = Mn Rn^{-1}, which provide the representations of Wn Yn and Mn un as

Wn Yn = Gn Xn β0 + Gn un  and  Mn un = Hn εn .

Rn is a quasi-transformation of un that removes its spatial correlation. The quasi-transformed equation of (1) is

Yn = (ρ0 Mn + λ0 Wn − ρ0 λ0 Mn Wn)Yn + (In − ρ0 Mn)Xn β0 + εn ,

which is in the form of an MRSAR model with high order spatial lags and with nonlinear constraints on the coefficients. Corresponding to Sn in the first order spatial lag model, define Fn(ρ, λ) = In − ρMn − λWn + ρλMn Wn , and Fn = Fn(ρ0 , λ0) for this high order spatial model. The matrix Fn = Rn Sn transforms Yn into functions of Xn and εn as Fn Yn = Rn Xn β0 + εn . The reduced form equation of (1) is

Yn = Sn^{-1} Xn β0 + Fn^{-1} εn .   (2)
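For concreteness, the reduced form (2) gives a direct way to simulate from the model. The following is a minimal numpy sketch (function and variable names are ours, not part of the paper):

```python
import numpy as np

def simulate_mrsar_sar(W, M, beta0, lam0, rho0, sigma0, rng):
    """Draw (Yn, Xn) from model (1) via the reduced form (2):
    Yn = Sn^{-1} Xn beta0 + Fn^{-1} eps_n, with Sn = I - lam0*W,
    Rn = I - rho0*M and Fn = Rn Sn."""
    n, k = W.shape[0], len(beta0)
    X = rng.standard_normal((n, k))        # exogenous regressors
    eps = sigma0 * rng.standard_normal(n)  # i.i.d. disturbances eps_n
    S = np.eye(n) - lam0 * W
    F = (np.eye(n) - rho0 * M) @ S         # Fn = Rn Sn
    return np.linalg.solve(S, X @ beta0) + np.linalg.solve(F, eps), X
```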

For the estimation of (1), Kelejian and Prucha (1998) suggest a feasible generalized 2SLS (G2SLS) estimation method. Their estimation method involves three steps. In the first step, the coefficient vector δ in (1) is estimated by 2SLS with an IV matrix Qn . With the initial 2SLS estimate of δ from the first step, un can be estimated as a residual from (1), and ρ is then estimated by a method of moments (MOM) originated in Kelejian and Prucha (1999). Let ρ̂n be the consistent MOM estimate of ρ. The G2SLS estimator of δ = (λ, β')' is

δ̂kp,n = [Zn' Rn'(ρ̂n)Qn (Qn'Qn)^{-1} Qn' Rn(ρ̂n)Zn]^{-1} Zn' Rn'(ρ̂n)Qn (Qn'Qn)^{-1} Qn' Rn(ρ̂n)Yn ,   (3)

where Zn = (Wn Yn , Xn). The G2SLS method in (3) would not be applicable to the estimation of (1) when all exogenous variables in Xn are really irrelevant.
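A minimal sketch of the final step (3), assuming the first-step 2SLS estimate and the Kelejian-Prucha (1999) MOM estimate ρ̂n have already been computed (both omitted here; all names are ours):

```python
import numpy as np

def g2sls_delta(Y, X, W, M, Q, rho_hat):
    """G2SLS estimator of delta = (lambda, beta')' in (3), given rho_hat."""
    n = Y.shape[0]
    Z = np.column_stack([W @ Y, X])           # Zn = (Wn Yn, Xn)
    R = np.eye(n) - rho_hat * M               # Rn(rho_hat)
    P_Q = Q @ np.linalg.solve(Q.T @ Q, Q.T)   # projection onto columns of Qn
    RZ, RY = R @ Z, R @ Y
    return np.linalg.solve(RZ.T @ P_Q @ RZ, RZ.T @ P_Q @ RY)
```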
Another unsatisfactory feature of the G2SLS estimator is that the asymptotic distribution of δ̂kp,n does not depend on the asymptotic distribution of the consistent estimate ρ̂n . In a time series model with lagged dependent variables and autoregressive disturbances, such as yt = λy_{t-1} + xt β + ut with ut = ρu_{t-1} + εt , it is known that a feasible GLS estimation of (λ, β')' based on the transformed equation yt − ρ̂y_{t-1} = λ(y_{t-1} − ρ̂y_{t-2}) + (xt − ρ̂x_{t-1})β + εt , with ρ replaced by a consistent estimate ρ̂, is not efficient. A relatively more efficient method for estimating this time series model is to estimate the parameters ρ and λ jointly by the nonlinear least squares method or the Hatanaka two-step method (Hatanaka, 1974). The SAR model with SAR disturbances in (1) includes this dynamic time series model as a special case (with a special spatial weights matrix of a single neighbor for each spatial unit and by setting y0 = 0). Therefore, an efficient estimation method for (1) should take into account the joint estimation of ρ, λ and β. In order to achieve possibly efficient estimates, we suggest the GMM approach, under which the estimates of ρ and λ can be asymptotically correlated.

3 Identification and GMM Estimation

To proceed, we follow the regularity assumptions for GMM estimation specified in Lee (2006), with proper modifications to fit the current model.

Assumption 1 The εni ’s are i.i.d. with zero mean and variance σ0², and a moment of order higher than the fourth exists.

Assumption 2 The elements of Xn are uniformly bounded constants, Xn has full rank k, and limn→∞ (1/n)Xn'Xn exists and is nonsingular.

Assumption 3 The spatial weights matrices {Wn}, {Mn}, and the corresponding {Sn^{-1}} and {Rn^{-1}} are uniformly bounded in both row and column sums in absolute value.¹

The higher than fourth moment condition in Assumption 1 is needed in order to apply a central limit theorem due to Kelejian and Prucha (2001). The nonstochastic Xn and its uniform boundedness condition in Assumption 2 are for analytical tractability. Assumption 3 limits the spatial dependence among the units to a tractable degree and originates in Kelejian and Prucha (1999). It rules out the unit root case (in time series as a special case).

The GMM method for the estimation of the model (1) is based on quadratic moment functions εn'(θ)Pjn εn(θ) (j = 1, · · · , m) with tr(Pjn) = 0, and an n × kx IV matrix Qn constructed from Xn , Wn and Mn , where un(δ) = Sn(λ)Yn − Xn β and εn(θ) = Rn(ρ)un(δ). For analytical tractability, the matrices Pn ’s are assumed to have the same uniform boundedness properties as Wn and Mn .

Assumption 4 The matrices Pn ’s with tr(Pn) = 0 are uniformly bounded in both row and column sums in absolute value, and the elements of Qn are uniformly bounded.
The vector of empirical moments for estimation can be

gn(θ) = (Qn' εn(θ), εn'(θ)P1n εn(θ), · · · , εn'(θ)Pmn εn(θ))'
      = (Qn' Rn(ρ)un(δ), (Rn(ρ)un(δ))'P1n Rn(ρ)un(δ), · · · , (Rn(ρ)un(δ))'Pmn Rn(ρ)un(δ))'.   (4)

At θ0 , gn(θ0) = (εn'Qn , εn'P1n εn , · · · , εn'Pmn εn)' and E(gn(θ0)) = 0. For any feasible θ, the model (1) implies that

E(gn(θ)) = ( (Qn' Rn(ρ)dn(δ))' ,
             dn'(δ)Rn'(ρ)P1n Rn(ρ)dn(δ) + σ0² tr((Fn')^{-1} Fn'(ρ, λ)P1n Fn(ρ, λ)Fn^{-1}) ,
             · · · ,
             dn'(δ)Rn'(ρ)Pmn Rn(ρ)dn(δ) + σ0² tr((Fn')^{-1} Fn'(ρ, λ)Pmn Fn(ρ, λ)Fn^{-1}) )' ,   (5)

where dn(δ) = Sn(λ)Sn^{-1} Xn β0 − Xn β = Xn(β0 − β) + Gn Xn β0 (λ0 − λ).²
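To make the construction of (4) concrete, here is a small numpy sketch of the empirical moment vector (our own names; illustrative only):

```python
import numpy as np

def g_n(theta, Y, X, W, M, Q, P_list):
    """Empirical moments (4): gn = (Q'eps, eps'P1 eps, ..., eps'Pm eps)',
    where eps(theta) = Rn(rho) un(delta) and each Pjn has zero trace."""
    rho, lam, beta = theta[0], theta[1], theta[2:]
    u = Y - lam * (W @ Y) - X @ beta   # un(delta) = Sn(lambda) Yn - Xn beta
    eps = u - rho * (M @ u)            # eps_n(theta) = Rn(rho) un(delta)
    return np.concatenate([Q.T @ eps,
                           [eps @ (P @ eps) for P in P_list]])
```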


¹ A sequence of square matrices {An}, where An = [an,ij], is said to be uniformly bounded in row sums (column sums) in absolute value if the sequence of row sum matrix norms ||An||∞ = max_{i=1,··· ,n} Σ_{j=1}^{n} |an,ij| (column sum matrix norms ||An||1 = max_{j=1,··· ,n} Σ_{i=1}^{n} |an,ij|) is bounded (Horn and Johnson, 1985).
² These moments have been designed to focus on the estimation of θ. If we are also interested in the estimation of σ², it can be estimated by the empirical second moment of the estimated residuals of εn . In Liu et al. (2006), we show that this two-step approach does not lose asymptotic efficiency.

Consider first the identification of θ from the moment equation E(gn(θ)) = 0 of (5). The moment equation Qn' Rn(ρ)dn(δ) = 0 is explicitly Qn' Rn(ρ)(Xn , Gn Xn β0)(β0' − β', λ0 − λ)' = 0, which has the unique solution (λ0 , β0') if Qn' Rn(ρ)(Xn , Gn Xn β0) has full column rank (k + 1) for each possible ρ in its parameter space. It is necessary that Qn has rank equal to or greater than (k + 1), and that (Xn , Gn Xn β0) has full column rank (k + 1). When β0 and λ0 are identified, because Fn(ρ, λ0)Fn^{-1} = Rn(ρ)Rn^{-1} and dn(δ0) = 0, the remaining moment equations from (5) become tr((Rn')^{-1} Rn'(ρ)Pjn Rn(ρ)Rn^{-1}) = 0, j = 1, · · · , m, which are for the identification of ρ0 . These remaining moment equations would identify ρ0 as if un were observable for the spatial process un = ρMn un + εn .

For the case where (Xn , Gn Xn β0) does not have full column rank, because Xn has full column rank k, there exists a constant vector c ≠ 0 such that Gn Xn β0 = Xn c. When Qn' Rn(ρ)Xn has full column rank k for each possible ρ, the moment equations 0 = Qn' Rn(ρ)dn(δ) = Qn' Rn(ρ)Xn((β0 − β) + (λ0 − λ)c) imply that all their solutions satisfy the relation β = β0 + (λ0 − λ)c. This indicates that once λ0 is identified, β0 will be identifiable. Furthermore, the relation of solutions implies that dn(δ) = Xn((β0 − β) + (λ0 − λ)c) = 0 and the remaining moment equations simplify to tr((Fn')^{-1} Fn'(ρ, λ)Pjn Fn(ρ, λ)Fn^{-1}) = 0, for j = 1, · · · , m. These moment equations correspond to similar moment equations of a pure SAR process with SAR disturbances,

Yn = λ0 Wn Yn + un ,  un = ρ0 Mn un + εn .

For this process, εn(θ) = Fn(ρ, λ)Yn = Fn(ρ, λ)Fn^{-1} εn and, hence,

E(εn'(θ)Pjn εn(θ)) = σ0² tr((Fn')^{-1} Fn'(ρ, λ)Pjn Fn(ρ, λ)Fn^{-1}).

For this pure SAR process, identification of ρ0 and λ0 separately would not be possible if Wn = Mn . This pure SAR process implies the transformed process

Yn = ρ0 Mn Yn + λ0 Wn Yn − ρ0 λ0 Mn Wn Yn + εn .

This transformed process is a SAR process with high order spatial lags, Mn Yn , Wn Yn and Mn Wn Yn , and with nonlinear constraints on the spatial effect parameters, namely ρ0 , λ0 and ρ0 λ0 . If Mn = Wn , the transformed equation reduces to Yn = (ρ0 + λ0)Wn Yn − ρ0 λ0 Wn² Yn + εn . In this situation, ρ0 and λ0 cannot be distinguished from each other even though they might be locally identifiable. So without some other restriction such as the inequality constraint ρ0 > λ0 or λ0 > ρ0 , ρ0 and λ0 could only be locally identified. This identification issue is similar to that of a dynamic AR(1) process with AR(1) disturbances in time series.³ Even when Wn is different from Mn , each of these moment equations is a second degree algebraic equation in ρ and λ, and its solution set in the (λ, ρ) plane can be an ellipse, a parabola, a hyperbola, two lines, or one point. It is complex and not illuminating to explore the solution set of each equation. Instead, identification conditions can be derived by investigating some characteristics of the moment equations tr((Fn')^{-1} Fn'(ρ, λ)Pjn Fn(ρ, λ)Fn^{-1}) = 0 for j = 1, · · · , m. As

Fn(ρ, λ) = Fn + (ρ0 − ρ)Mn Sn + (λ0 − λ)Rn Wn + (ρ0 − ρ)(λ0 − λ)Mn Wn ,

it follows that Fn(ρ, λ)Fn^{-1} = In + (ρ0 − ρ)Kρn + (λ0 − λ)Kλn + (ρ0 − ρ)(λ0 − λ)Kρλn , where Kρn = Mn Sn Fn^{-1}, Kλn = Rn Wn Fn^{-1} and Kρλn = Mn Wn Fn^{-1}. Let A^s = A + A' for any square matrix A. (A list of the special notations used in this paper is collected in the Appendix for convenient reference.) It follows that

tr((Fn')^{-1} Fn'(ρ, λ)Pjn Fn(ρ, λ)Fn^{-1})
= αρ,j (ρ0 − ρ) + αλ,j (λ0 − λ) + αρ²,j (ρ0 − ρ)² + αλ²,j (λ0 − λ)² + αρλ,j (ρ0 − ρ)(λ0 − λ)
  + αρ²λ,j (ρ0 − ρ)²(λ0 − λ) + αρλ²,j (ρ0 − ρ)(λ0 − λ)² + αρ²λ²,j (ρ0 − ρ)²(λ0 − λ)²,

for j = 1, · · · , m, where the coefficients are defined as αρ,j = tr(Pjn^s Kρn), αλ,j = tr(Pjn^s Kλn), αρ²,j = tr(Kρn' Pjn Kρn), αλ²,j = tr(Kλn' Pjn Kλn), αρλ,j = tr(Pjn^s Kρλn + Kρn' Pjn^s Kλn), αρ²λ,j = tr(Kρλn' Pjn^s Kρn), αρλ²,j = tr(Kλn' Pjn^s Kρλn) and αρ²λ²,j = tr(Kρλn' Pjn Kρλn). It is apparent that (ρ0 , λ0) is a common solution of these m moment equations. Let αρ be the m-dimensional vector with αρ,j as its jth element; αλ , etc., are defined similarly. The necessary and sufficient condition for this equation system to have the unique solution (ρ0 , λ0) is that the vectors α’s do not have a linear combination with nonlinear coefficients of the form

αρ δ1 + αλ δ2 + αρ² δ1² + αλ² δ2² + αρλ δ1 δ2 + αρ²λ δ1² δ2 + αρλ² δ1 δ2² + αρ²λ² δ1² δ2² = 0,   (6)


3 It is noted that the identification of the MRSAR model with relevant X may not necessarily require W to be
n n
distinct from Mn . This is so for a dynamic AR(1) regression equation with AR(1) disturbances.

for some constants δ1 and δ2 with (δ1 , δ2) ≠ 0. This condition is necessary and sufficient for the identification of the first-order SAR process with first-order SAR disturbances. A sufficient identification condition is that the α’s are linearly independent. As the α’s together form a matrix of dimension m × 8, this sufficient identification condition requires the number of Pn ’s to be at least 8. Weaker sufficient conditions are available. If there exists a solution ρ not equal to ρ0 , one has δ1 ≠ 0 in (6). This implies that either αρ or αρ² is linearly dependent on the other seven α’s. So it is sufficient to identify ρ0 if both αρ and αρ² are linearly independent of the other seven α’s. With ρ0 identified, (6) becomes αλ δ2 + αλ² δ2² = 0. Then λ0 is identified if αλ and αλ² are not linearly dependent. By symmetric arguments, a similar set of sufficient conditions can be stated for the identification of λ0 first and then the identification of ρ0 . As a general principle, the true (ρ0 , λ0) may be identifiable when sufficiently many distinct moment equations are used and their solution sets intersect only at the true parameter vector.⁴ Assumption 5 summarizes some sufficient conditions for the identification of θ0 .

⁴ As the GMM estimation with those moment functions can be rewritten in a nonlinear least squares estimation framework with nonlinearity only in the parameters, a sufficient identification condition can also be derived from the corresponding nonlinear regression equation.
Assumption 5 Either (i) limn→∞ (1/n)Qn' Rn(ρ)(Xn , Gn Xn β0) has full rank (k + 1) for each possible ρ in its parameter space, limn→∞ (1/n)tr(Pjn Hn) ≠ 0 for some j = 1, · · · , m, and

limn→∞ (1/n)(tr(P1n^s Hn), · · · , tr(Pmn^s Hn))'

is linearly independent of limn→∞ (1/n)(tr(Hn' P1n Hn), · · · , tr(Hn' Pmn Hn))'; or (ii) limn→∞ (1/n)Qn' Rn(ρ)Xn has full rank k for each possible ρ in its parameter space, Wn ≠ Mn , and the vectors α’s do not have a linear combination with nonlinear coefficients of the form αρ δ1 + αλ δ2 + αρ² δ1² + αλ² δ2² + αρλ δ1 δ2 + αρ²λ δ1² δ2 + αρλ² δ1 δ2² + αρ²λ² δ1² δ2² = 0, for some constants δ1 and δ2 with (δ1 , δ2) ≠ 0.

When (Xn , Gn Xn β0) does not have full column rank and Wn = Mn , only local identification of (ρ0 , λ0) is possible.
Let Ωn = var(gn(θ0)). The variance matrix Ωn involves the variances and covariances of linear and quadratic forms of εn . For any square matrix A, let vecD(A) = (a11 , · · · , ann)' denote the column vector formed by the diagonal elements of A. Then, E(Qn' εn · εn' Pn εn) = Qn' Σ_{i=1}^{n} Σ_{j=1}^{n} pn,ij E(εn εni εnj) = μ3 Qn' vecD(Pn), where μ3 = E(εni³) is the third moment of εni . Let μ4 = E(εni⁴) be the fourth moment of εni . Lemma B.2 shows that E(εn' Pjn εn · εn' Pln εn) = (μ4 − 3σ0⁴)vecD'(Pjn)vecD(Pln) + σ0⁴ tr(Pjn Pln^s). It follows that

Ωn = [ 0              μ3 Qn' ωnm
       μ3 ωnm' Qn     (μ4 − 3σ0⁴)ωnm' ωnm ] + Vn ,   (7)

with ωnm = [vecD(P1n), · · · , vecD(Pmn)] and

Vn = σ0⁴ [ (1/σ0²)Qn'Qn   0                  · · ·   0
           0              tr(P1n P1n^s)      · · ·   tr(P1n Pmn^s)
           · · ·          · · ·              · · ·   · · ·
           0              tr(Pmn P1n^s)      · · ·   tr(Pmn Pmn^s) ]
   = σ0⁴ [ (1/σ0²)Qn'Qn   0
           0              Δmn ],   (8)

where Δmn = [vec(P1n^s), · · · , vec(Pmn^s)]'[vec(P1n), · · · , vec(Pmn)], by using tr(AB) = vec(A')'vec(B)
for any conformable matrices A and B. When εn is normally distributed, Ωn simplifies to Vn because μ3 = 0 and μ4 = 3σ0⁴. In general, Ωn is nonsingular if and only if both matrices (vec(P1n), · · · , vec(Pmn)) and Qn have full column rank. As the elements of the Pjn ’s and Qn are uniformly bounded by Assumption 4, it is apparent that (1/n)ωnm' Qn , (1/n)ωnm' ωnm and (1/n)Qn'Qn are of order O(1). Furthermore, because Pjn Pln^s is bounded in row or column sums, (1/n)tr(Pjn Pln^s) = O(1). Consequently, (1/n)Ωn = O(1). It is thus meaningful to impose the following conventional regularity condition on the limit of (1/n)Ωn :

Assumption 6 The limit of (1/n)Ωn exists and is a nonsingular matrix.⁵

⁵ This assumption on (1/n)Ωn rules out the large group interactions scenario, which has been analyzed in Lee (2004). Here, for simplicity, we focus on spatial interactions with a few neighbors for each spatial unit.
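For reference, Ωn in (7)-(8) can be assembled directly from Qn , the Pjn ’s and the moments (σ0², μ3 , μ4); a numpy sketch with our own names:

```python
import numpy as np

def Omega_n(Q, P_list, sigma2, mu3, mu4):
    """Variance matrix (7)-(8) of gn(theta0).  Under normality only the
    block-diagonal Vn part survives (mu3 = 0, mu4 = 3*sigma2**2)."""
    kx, m = Q.shape[1], len(P_list)
    omega = np.column_stack([np.diag(P) for P in P_list])   # [vecD(P1),...,vecD(Pm)]
    quad = np.array([[sigma2**2 * np.trace(Pj @ (Pl + Pl.T))
                      for Pl in P_list] for Pj in P_list])  # sigma^4 tr(Pj Pl^s)
    V = np.block([[sigma2 * (Q.T @ Q), np.zeros((kx, m))],
                  [np.zeros((m, kx)), quad]])
    return V + np.block(
        [[np.zeros((kx, kx)), mu3 * (Q.T @ omega)],
         [mu3 * (omega.T @ Q), (mu4 - 3 * sigma2**2) * (omega.T @ omega)]])
```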
The derivatives of gn(θ) in (4) with respect to ρ, λ, and β are

∂gn(θ)/∂θ' = − [ Qn' Mn un(δ)                        Qn' Rn(ρ)Wn Yn                           Qn' Rn(ρ)Xn
                 un'(δ)Mn' P1n^s Rn(ρ)un(δ)          (Wn Yn)'Rn'(ρ)P1n^s Rn(ρ)un(δ)           un'(δ)Rn'(ρ)P1n^s Rn(ρ)Xn
                 · · ·                               · · ·                                    · · ·
                 un'(δ)Mn' Pmn^s Rn(ρ)un(δ)          (Wn Yn)'Rn'(ρ)Pmn^s Rn(ρ)un(δ)           un'(δ)Rn'(ρ)Pmn^s Rn(ρ)Xn ].

Thus, at θ0 , as un = Rn^{-1} εn and Wn Yn = Gn Xn β0 + Gn Rn^{-1} εn ,

Dn = ∂E(gn(θ0))/∂θ' = − [ 0                    Qn' Rn Gn Xn β0      Qn' Rn Xn
                          σ0² tr(P1n^s Hn)     σ0² tr(P1n^s Ḡn)     0
                          · · ·                · · ·                · · ·
                          σ0² tr(Pmn^s Hn)     σ0² tr(Pmn^s Ḡn)     0 ],   (9)

where Ḡn = Rn Gn Rn^{-1}.

4 Asymptotic Distributions of GMMEs and the BGMME

The following proposition provides the asymptotic distribution of a GMME based on a linear transformation of the moment equations, an gn(θ), where an is a matrix with full row rank greater than or equal to the number of unknown parameters (k + 2). The an is assumed to converge to a constant matrix a0 which also has full row rank. This corresponds to Hansen’s GMM setting, which illustrates the optimal weighting issue in the GMM framework. As usual for nonlinear estimation, the parameter space Θ of θ will be taken to be a compact convex set with θ0 in its interior.⁶

Assumption 7 The θ0 is in the interior of the parameter space Θ, which is a compact convex subset of R^{k+2}.

Proposition 1 Under Assumptions 1-5, suppose gn(θ) is given by (4) so that limn→∞ an E(gn(θ)) = 0 has a unique root at θ0 in Θ. Then the GMME θ̂n derived from minθ∈Θ gn'(θ)an' an gn(θ) is a consistent estimator of θ0 , and √n(θ̂n − θ0) →D N(0, Σ), where

Σ = limn→∞ [((1/n)Dn')an' an((1/n)Dn)]^{-1} ((1/n)Dn')an' an((1/n)Ωn)an' an((1/n)Dn) [((1/n)Dn')an' an((1/n)Dn)]^{-1},

with Dn in (9), under the assumption that limn→∞ (1/n)an Dn exists and has full rank (k + 2).

From Proposition 1, with the moment functions gn(θ) in (4), the optimal choice of the weighting matrix an' an is Ωn^{-1} by the generalized Schwartz inequality. As Ωn involves the unknown parameters σ0², μ3 and μ4 , the optimal GMM objective function shall be formulated with a two-step feasible approach, by estimating σ0² as well as μ3 and μ4 consistently. This can be done using estimated residuals of εn from an initial consistent estimate of θ0 .⁷ The Ωn can then be consistently estimated as Ω̂n . The following proposition shows that the feasible optimal GMME with a consistently estimated Ω̂n has the same limiting distribution as the optimal GMME based on Ωn . With the optimal GMM objective function, an overidentification test is available.

⁶ Note that it is unnecessary to require the determinant of Sn(λ) to be positive for each θ in Θ. The property of such a determinant does not play a role in the GMM estimation. In theory, any convex compact set in Assumption 7 will do as long as the other assumptions are satisfied at θ = θ0 .
⁷ The detailed proof is straightforward and is omitted here.

Proposition 2 Under Assumptions 1-6, suppose that (Ω̂n/n)^{-1} − (Ωn/n)^{-1} = op(1). Then the feasible optimal GMME θ̂fo,n derived from minθ∈Θ gn'(θ)Ω̂n^{-1} gn(θ) based on gn(θ) in (4) has the asymptotic distribution

√n(θ̂fo,n − θ0) →D N(0, (limn→∞ (1/n)Dn' Ωn^{-1} Dn)^{-1}).   (10)

Furthermore,

gn'(θ̂n)Ω̂n^{-1} gn(θ̂n) →D χ²((m + kx) − (k + 2)).
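Proposition 2 suggests the following two-step implementation: estimate σ0², μ3 and μ4 from residuals at an initial consistent estimate, then minimize the optimally weighted objective. A sketch built on the g_n and Omega_n helpers above (all names ours; the minimized objective value is the overidentification statistic):

```python
import numpy as np
from scipy.optimize import minimize

def feasible_optimal_gmm(theta_init, Y, X, W, M, Q, P_list):
    """Feasible optimal GMME of Proposition 2."""
    rho, lam, beta = theta_init[0], theta_init[1], theta_init[2:]
    u = Y - lam * (W @ Y) - X @ beta
    eps = u - rho * (M @ u)                 # residuals at the initial estimate
    s2, m3, m4 = np.mean(eps**2), np.mean(eps**3), np.mean(eps**4)
    Om_inv = np.linalg.inv(Omega_n(Q, P_list, s2, m3, m4))
    def obj(th):
        g = g_n(th, Y, X, W, M, Q, P_list)
        return g @ Om_inv @ g
    res = minimize(obj, theta_init, method="Nelder-Mead")
    return res.x, res.fun   # res.fun ~ chi^2((m + kx) - (k + 2)) under the model
```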

Consider now the issue of selecting the best moment functions and the best IV matrix. By transforming the disturbance vector un into εn , which is free of spatial correlation, the model (1) implies that

Rn Yn = λ0 (Rn Wn Yn) + Rn Xn β0 + εn .

This transformed equation can be rewritten in a first-order MRSAR form as

Ȳn = λ0 W̄n Ȳn + X̄n β0 + εn ,   (11)

in terms of the transformed variables Ȳn = Rn Yn and X̄n = Rn Xn , where W̄n = Rn Wn Rn^{-1}. Denote S̄n = In − λ0 W̄n and Ḡn = W̄n S̄n^{-1}. So the equation for W̄n Ȳn is W̄n Ȳn = Ḡn X̄n β0 + Ḡn εn . Note that S̄n = In − λ0 W̄n = Rn Sn Rn^{-1} and Ḡn = W̄n S̄n^{-1} = Rn Wn Rn^{-1}(In − λ0 Rn Wn Rn^{-1})^{-1} = Rn Wn(In − λ0 Wn)^{-1} Rn^{-1} = Rn Gn Rn^{-1}. If an intercept appears in X̄n , we have X̄n = [X̄n*, ln], where ln is an n-dimensional vector of ones; otherwise X̄n* ≡ X̄n . Suppose there are k* columns in X̄n*. Let X̄nj be the jth column of X̄n , and X̄nj* the jth column of X̄n*. Denote X̄nj*d = X̄nj* − (1/n)ln ln' X̄nj*, the deviation of X̄nj* from its sample mean. Let D(A) be a diagonal matrix with diagonal elements being those of A if A is a vector, or the diagonal elements of A if A is a square matrix. Let

Ḡn* = Ḡn − [((η4 − 3) − η3²)/((η4 − 1) − η3²)] D(Ḡn) − [η3/(σ0((η4 − 1) − η3²))] D(Ḡn X̄n β0),

and

Hn* = Hn − [((η4 − 3) − η3²)/((η4 − 1) − η3²)] D(Hn),

where η3 = μ3/σ0³ is the skewness and η4 = μ4/σ0⁴ the kurtosis of the disturbances. Denote by Mn = {θ̂o,n} the class of optimal GMMEs derived from minθ∈Θ gn'(θ)Ωn^{-1} gn(θ).

Proposition 3 Let P1n* = Ḡn* − (1/n)tr(Ḡn*)In , P2n* = Hn* − (1/n)tr(Hn*)In , and Pj+2,n* = D(X̄nj*d) for j = 1, · · · , k*. Let Qn* = (Q1n*, Q2n*, Q3n*) with

Q1n* = X̄n + [η3²/((η4 − 1) − η3²)] (X̄n − (1/n)ln ln' X̄n),

Q2n* = Ḡn X̄n β0 + [η3²/((η4 − 1) − η3²)] (Ḡn X̄n β0 − (1/n)ln ln' Ḡn X̄n β0) − [2σ0 η3/((η4 − 1) − η3²)] (vecD(Ḡn) − (1/n)tr(Ḡn)ln),

and Q3n* = vecD(Hn) − (1/n)tr(Hn)ln . Within the class of optimal GMMEs Mn , the consistent root θ̂b,n derived from minθ∈Θ gn*'(θ)Ωn*^{-1} gn*(θ), where Ωn* = var(gn*(θ0)) and

gn*(θ) = (Qn*, P1n* εn(θ), · · · , Pk*+2,n* εn(θ))' εn(θ),

is the BGMME, with limiting distribution √n(θ̂b,n − θ0) →D N(0, Σb^{-1}), where

Σb = limn→∞ (1/n) [ tr(P2n*s Hn)                                   tr(P1n*s Hn)                                 −[2η3/(σ0((η4 − 1) − η3²))] Q3n*' X̄n
                     tr(P1n*s Hn)                                   σ0^{-2}(Ḡn X̄n β0)'Q2n* + tr(P1n*s Ḡn)       σ0^{-2} Q2n*' X̄n
                     −[2η3/(σ0((η4 − 1) − η3²))] X̄n' Q3n*           σ0^{-2} X̄n' Q2n*                            σ0^{-2} X̄n' Q1n* ].

The moment function [Q3n*, P2n* Rn(ρ)un(δ)]' Rn(ρ)un(δ) is apparently designed for the estimation of the parameter ρ of the spatially correlated disturbances un = ρMn un + εn . Due to the correlation between the linear and quadratic moment functions, it is more involved than the best moment function for estimating the (pure) SAR process un = ρMn un + εn , if un were observable.⁸ The selection of the remaining moment functions, with (P1n*, P3n*, · · · , Pk*+2,n*) and IV matrix (Q1n*, Q2n*), corresponds to the selection of the best moment functions and best IV matrix for the estimation of the transformed first-order MRSAR model in Liu et al. (2006). These two sets of moment functions are, however, not separated from each other but estimate ρ and δ simultaneously.
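To fix ideas, the best IVs and quadratic-moment matrices of Proposition 3 can be assembled as follows (a sketch assuming no intercept in X̄n , so X̄n* = X̄n ; the inputs Ḡn , Hn , X̄n are taken as given at the true parameters, and all names are ours):

```python
import numpy as np

def best_moments(Gbar, H, Xbar, beta0, sigma0, eta3, eta4):
    """Best Q*_n = (Q*_1n, Q*_2n, Q*_3n) and P*_jn of Proposition 3."""
    n = H.shape[0]
    c = (eta4 - 1.0) - eta3**2
    ln = np.ones(n)
    GXb = Gbar @ (Xbar @ beta0)
    # skewness/kurtosis-adjusted G*_n and H*_n
    Gstar = (Gbar - ((eta4 - 3.0) - eta3**2) / c * np.diag(np.diag(Gbar))
             - eta3 / (sigma0 * c) * np.diag(GXb))
    Hstar = H - ((eta4 - 3.0) - eta3**2) / c * np.diag(np.diag(H))
    P = [Gstar - np.trace(Gstar) / n * np.eye(n),    # P*_1n
         Hstar - np.trace(Hstar) / n * np.eye(n)]    # P*_2n
    P += [np.diag(x - x.mean()) for x in Xbar.T]     # P*_{j+2,n} = D(Xbar_j^d)
    Q1 = Xbar + eta3**2 / c * (Xbar - np.outer(ln, Xbar.mean(axis=0)))
    Q2 = (GXb + eta3**2 / c * (GXb - GXb.mean() * ln)
          - 2.0 * sigma0 * eta3 / c * (np.diag(Gbar) - np.trace(Gbar) / n * ln))
    Q3 = np.diag(H) - np.trace(H) / n * ln
    return np.column_stack([Q1, Q2, Q3]), P
```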
Let P1n be the class of constant n × n matrices Pn satisfying Assumption 4. The subclass P2n of P1n consisting of Pn ’s with a zero diagonal is also of interest, as the corresponding GMME is robust against unknown heteroskedasticity (Lin and Lee, 2006). When εn is normally distributed or the Pn ’s are selected from the subclass P2n , the variance matrix Ωn = Vn in (8) is block diagonal. Let 0p×q denote the zero matrix of dimension p × q. This variance matrix and the derivative matrix in (9) together imply the asymptotic precision matrix of θ̂o,n , which is

Dn' Ωn^{-1} Dn = [ An Bn^{-1} An'   02×k
                   0k×2             0k×k ] + [ 01×1        01×(k+1)
                                               0(k+1)×1    (1/σ0²)Cn ],   (12)

where

An = [ tr(P1n^s Hn)   · · ·   tr(Pmn^s Hn)
       tr(P1n^s Ḡn)   · · ·   tr(Pmn^s Ḡn) ],

Bn = [tr(Pjn Pln^s)]_{j,l=1,··· ,m} = (1/2)[tr(Pjn^s Pln^s)]_{j,l=1,··· ,m} ,

and

Cn = (Gn Xn β0 , Xn)'Rn' Qn (Qn'Qn)^{-1} Qn' Rn (Gn Xn β0 , Xn) = (Ḡn X̄n β0 , X̄n)'Qn (Qn'Qn)^{-1} Qn' (Ḡn X̄n β0 , X̄n).

With the asymptotic precision matrix in (12), it follows from the generalized Schwartz inequality that the best selection of Qn shall be (Ḡn X̄n β0 , X̄n), the best selection of Pn ’s from the subclass P2n shall be (Ḡn − D(Ḡn)) and (Hn − D(Hn)), and in the event that εn is normally distributed (or, more generally, η3 = 0 and η4 = 3), (Ḡn − (1/n)tr(Ḡn)In) and (Hn − (1/n)tr(Hn)In) shall be the best selection of Pn ’s from the broader class P1n . The best selections of Qn and of Pn ’s from P1n under normality are special cases of Q1n*, Q2n*, P1n* and P2n* in Proposition 3. When εn is normally distributed, Q1n* = X̄n and Q2n* = Ḡn X̄n β0 as η3 = 0, and, on the other hand, P1n* = Ḡn − (1/n)tr(Ḡn)In and P2n* = Hn − (1/n)tr(Hn)In as Ḡn* and Hn* reduce to Ḡn and Hn . And it follows from arguments in Breusch et al. (1999) that the moment functions [Q3n*, D(X̄n1*d)εn(θ), · · · , D(X̄nk**d)εn(θ)]'εn(θ) are redundant given [Q1n*, Q2n*, P1n* εn(θ), P2n* εn(θ)]'εn(θ) under normality. The best GMM estimator θ̂b,n with Pn ’s from P1n under normality (respectively, from P2n) has the limiting distribution

√n(θ̂b,n − θ0) →D N(0, (limn→∞ (1/n)Σb,n)^{-1}),

⁸ For the pure SAR process, the BGMME is derived from the quadratic moment condition εn' Pn* εn = 0 with Pn* = (Hn − (1/n)tr(Hn)In) − [(η4 − 3)/(η4 − 1)](D(Hn) − (1/n)tr(Hn)In). (Liu et al., 2006)

where

Σb,n = [ tr(Hn^{ds} Hn)     tr(Ḡn^{ds} Hn)                                               0
         tr(Ḡn^{ds} Hn)     tr(Ḡn^{ds} Ḡn) + (1/σ0²)(Ḡn X̄n β0)'(Ḡn X̄n β0)                (1/σ0²)(Ḡn X̄n β0)'X̄n
         0                  (1/σ0²)X̄n' Ḡn X̄n β0                                          (1/σ0²)X̄n' X̄n ],

with An^d = An − (1/n)tr(An)In for any n × n matrix An (respectively, An^d = An − D(An)), and An^{ds} = (An^d)^s.
Similar to the MRSAR model, the BGMME is asymptotically as efficient as the MLE under normality and more efficient than the QMLE otherwise. The reason is as follows. The log-likelihood function of the MRSAR model with SAR disturbances is

ln Ln = −(n/2) ln(2π) − (n/2) ln σ² + ln |Sn(λ)| + ln |Rn(ρ)|
        − (1/(2σ²))[Yn − Sn^{-1}(λ)Xn β]'[Rn(ρ)Sn(λ)]'[Rn(ρ)Sn(λ)][Yn − Sn^{-1}(λ)Xn β],

and the derivatives are

∂ ln Ln/∂β = (1/σ²)(Rn(ρ)Xn)'εn(θ) = (1/σ²)X̄n'(ρ)εn(θ),

∂ ln Ln/∂σ² = −n/(2σ²) + (1/(2σ⁴))εn'(θ)εn(θ),

∂ ln Ln/∂λ = −tr{Rn(ρ)Wn Rn^{-1}(ρ)[Rn(ρ)(In − λWn)Rn^{-1}(ρ)]^{-1}}
             + (1/σ²){Rn(ρ)Wn Rn^{-1}(ρ)[Rn(ρ)(In − λWn)Rn^{-1}(ρ)]^{-1} Rn(ρ)Xn β}'εn(θ)
             + (1/σ²)εn'(θ){Rn(ρ)Wn Rn^{-1}(ρ)[Rn(ρ)(In − λWn)Rn^{-1}(ρ)]^{-1}}εn(θ)
           = −tr(Ḡn(ρ, λ)) + (1/σ²){Ḡn(ρ, λ)X̄n(ρ)β}'εn(θ) + (1/σ²)εn'(θ)Ḡn(ρ, λ)εn(θ),

∂ ln Ln/∂ρ = −tr(Hn(ρ)) + (1/σ²)εn'(θ)Hn(ρ)εn(θ),

where Ȳn(ρ) = Rn(ρ)Yn , X̄n(ρ) = Rn(ρ)Xn , W̄n(ρ) = Rn(ρ)Wn Rn^{-1}(ρ), S̄n(ρ, λ) = Rn(ρ)Sn(λ)Rn^{-1}(ρ), and Ḡn(ρ, λ) = W̄n(ρ)S̄n^{-1}(ρ, λ). The QMLE of σ² is σ̂²ml,n(θ) = (1/n)εn'(θ)εn(θ) for a given value of θ. Substitution of σ̂²ml,n(θ) into the remaining likelihood equations shows that the QMLE is characterized by the moment equations X̄n'(ρ)εn(θ) = 0,

{Ḡn(ρ, λ)X̄n(ρ)β}'εn(θ) + εn'(θ){Ḡn(ρ, λ) − (1/n)tr(Ḡn(ρ, λ))In}εn(θ) = 0,

and

εn'(θ){Hn(ρ) − (1/n)tr(Hn(ρ))In}εn(θ) = 0.

Denote the QMLE of θ by θ̂ml,n . θ̂ml,n is also the root of X̄n'(ρ̂ml,n)εn(θ) = 0,

{Ḡn(ρ̂ml,n , λ̂ml,n)X̄n(ρ̂ml,n)β̂ml,n}'εn(θ) + εn'(θ){Ḡn(ρ̂ml,n , λ̂ml,n) − (1/n)tr(Ḡn(ρ̂ml,n , λ̂ml,n))In}εn(θ) = 0,

and

εn'(θ){Hn(ρ̂ml,n) − (1/n)tr(Hn(ρ̂ml,n))In}εn(θ) = 0,

which are asymptotically equivalent to X̄n' εn(θ) = 0,

{Ḡn X̄n β0}'εn(θ) + εn'(θ){Ḡn − (1/n)tr(Ḡn)In}εn(θ) = 0,

and εn'(θ){Hn − (1/n)tr(Hn)In}εn(θ) = 0, by arguments similar to those used in the proof of the following proposition. In other words, the first order conditions that solve the QMLE are asymptotically equivalent to linear and quadratic moment conditions within the class for which the BGMME is defined.
The G2SLS estimator of Kelejian and Prucha (1998) is shown to be consistent and asymptotically normal with

√n(δ̂kp,n − δ0) →D N(0, σ0² [limn→∞ (1/n)(Gn Xn β0 , Xn)'Rn' Qn (Qn'Qn)^{-1} Qn' Rn (Gn Xn β0 , Xn)]^{-1}).   (13)

The asymptotic variance of the G2SLS estimator in (13) can easily be compared with the asymptotic variance of the optimal GMME in (10) when εn is normally distributed. For this case, the asymptotic variance of θ̂fo,n is the inverse of the asymptotic precision matrix in (12). The corresponding asymptotic variance of the component δ̂fo,n of θ̂fo,n can be derived by the inverse formula of a partitioned matrix, which is

[ Cn/σ0² + ( (An Bn^{-1} An')22 − (An Bn^{-1} An')²21 /(An Bn^{-1} An')11   01×k
             0k×1                                                          0k×k ) ]^{-1}.

As An Bn^{-1} An' is nonnegative definite, it is seen that the asymptotic variance of δ̂fo,n is relatively smaller than the asymptotic variance of δ̂kp,n . Thus, the optimal GMME can be more efficient than the G2SLS estimator. For the same reason, the optimal GMME with the best selected Qn will be more efficient than the best G2SLS estimator in Lee (2003).

The BGMME associated with Pjn* (j = 1, · · · , k* + 2) and Qn* involves the unknown parameters θ0 , σ0², μ3 and μ4 . In practice, with initial consistent estimates θ̂n , σ̂n², μ̂3 and μ̂4 , Pjn* (j = 1, · · · , k* + 2) and Qn* can be replaced by their empirical counterparts P̂jn* = Pjn*(θ̂n , σ̂n², μ̂3 , μ̂4) (j = 1, · · · , k* + 2) and Q̂n* = Qn*(θ̂n , σ̂n², μ̂3 , μ̂4). The corresponding variance matrix Ωn* of the best moment functions can be estimated as Ω̂n* = Ωn*(θ̂n , σ̂n², μ̂3 , μ̂4). The following proposition shows that the feasible BGMME with the moment functions ĝn*(θ) = (Q̂n*, P̂1n* εn(θ), · · · , P̂k*+2,n* εn(θ))'εn(θ) has the same limiting distribution as the corresponding BGMME in Proposition 3.

Proposition 4 Suppose θ̂n , σ̂n², μ̂3 and μ̂4 are √n-consistent estimates of θ0 , σ0², μ3 and μ4 . Let P̂jn* = Pjn*(θ̂n , σ̂n², μ̂3 , μ̂4) (j = 1, · · · , k* + 2), Q̂n* = Qn*(θ̂n , σ̂n², μ̂3 , μ̂4), ĝn*(θ) = (Q̂n*, P̂1n* εn(θ), · · · , P̂k*+2,n* εn(θ))'εn(θ), and Ω̂n* = Ωn*(θ̂n , σ̂n², μ̂3 , μ̂4). Then minθ∈Θ ĝn*'(θ)Ω̂n*^{-1} ĝn*(θ) has a consistent root θ̂fb,n which has the same limiting distribution as θ̂b,n derived from minθ∈Θ gn*'(θ)Ωn*^{-1} gn*(θ).

5 Monte Carlo Study

In the Monte Carlo study, we consider the MRSAR model with SAR disturbances specified as

Yn = λWn Yn + Xn1 β1 + Xn2 β2 + Xn3 β3 + un ,  un = ρMn un + εn ,

where xi1 , xi2 and xi3 are three independently generated standard normal variables, i.i.d. across i, and the εni ’s are independently generated from the following 5 distributions, all of which are scaled to have mean 0 and variance 2:

(a) normal, εni ∼ N(0, 2);
(b) student t, εni = √(6/5) u where u ∼ t(5);
(c) symmetric bimodal mixture normal, εni = √(2/17) u where u ∼ .5N(−4, 1) + .5N(4, 1);
(d) asymmetric bimodal mixture normal, εni = u/3 where u ∼ .5N(−4, 1) + .5N(4, 3);
(e) gamma, εni = u − 2 where u ∼ gamma(2, 1).

To facilitate comparison, the skewness (η3) and kurtosis (η4) of these distributions are, correspondingly: (a) η3 = 0, η4 = 3; (b) η3 = 0, η4 = 9; (c) η3 = 0, η4 ≈ 1.228; (d) η3 ≈ 0.157, η4 ≈ 1.429; and (e) η3 = √2, η4 = 6. The normal distribution is the basis for comparison. The student t and symmetric bimodal mixture normal distributions are introduced to explore the effects of, respectively, leptokurtic (η4 > 3) and platykurtic (η4 < 3) disturbances on the small sample performance of the various estimates. The asymmetric bimodal mixture normal and gamma distributions are introduced to study the effects of skewness and excess kurtosis together. To be specific, the asymmetric bimodal mixture normal specified here corresponds to the case where the disturbances are platykurtic and have a small skewness, and the gamma corresponds to the case where the disturbances are leptokurtic and have a relatively large skewness. Asymptotically, the BGMME is as efficient as the MLE under (a), and is more efficient than the QMLE under (b)-(e).
The number of repetitions is 1,000 for each case in the Monte Carlo experiment. The regressors are randomly redrawn for each repetition. In each case, we report the mean (‘Mean’) and standard deviation (‘SD’) of the empirical distributions of the estimates. To facilitate the comparison of the various estimators, their root mean square errors (‘RMSE’) are also reported. The true parameters are λ0 = 0.2, ρ0 = 0.4, and β10 = 1.0, β20 = 0, β30 = −1.0. The ratio of the variance of xβ0 to the sum of the variances of xβ0 and ε is 0.5. If one ignored the interaction term, this ratio would correspond to R² = 0.5 in a regression equation.

The estimation methods considered are (i) G2SLS, (ii) QML, (iii) GMM1, (iv) GMM2 and (v) BGMM. G2SLS refers to the generalized 2SLS approach in Kelejian and Prucha (1998),⁹ QML refers to the quasi maximum likelihood method,¹⁰ GMM1 refers to the best optimal GMM in P2n , GMM2 refers to the best optimal GMM under normality, and BGMM refers to the feasible best GMM described in Proposition 4. We use the G2SLS estimates as initial estimates to implement the feasible optimal GMMs.
For the spatial weights matrices, we focus on the case Wn = Mn , motivated by specifications in empirical work. First, we consider the weights matrix from the study of crimes across 49 districts in Columbus, Ohio in Anselin (1988). For moderate sample sizes of n = 245 and 490, the corresponding spatial weights matrices are block diagonal with the preceding 49 × 49 matrix as their diagonal blocks. These correspond to panel data of that area for five and ten periods. We denote these weights matrices by WA .
The G2SLS, QML and various GMM estimates of λ0 , ρ0 and β0 with Wn = Mn = WA are reported in Tables 1-5. For sample size n = 245, the 2SLS estimates of λ0 are biased upwards by about 24% and the 2SLS estimates of ρ0 are biased downwards by about 18%, under all disturbance specifications. As the sample size increases to n = 490, the biases in the 2SLS estimates of λ0 and ρ0 reduce to about 10% and 9% respectively. When n = 245, the QML estimates of λ0 are biased upwards by about 3% under distributions (a) and (b), and the QML, GMM1 and GMM2 estimates of ρ0 are biased downwards by about 5%, 3% and 3% respectively, under all disturbance specifications. The other estimates are essentially unbiased for both sample sizes considered. In terms of SD, the G2SLS estimates are almost as good as those of QML, GMM1 and GMM2, under all disturbance specifications. When the disturbances are normally distributed, for sample size n = 245, the QML, GMM1 and GMM2 estimates are better than the BGMM estimates in terms of smaller SD and RMSE. The finite sample performance of the BGMM estimates is as good as that of the other estimates when n = 490. When the disturbances are symmetric and leptokurtic, the BGMM estimates are not better than those of QML, GMM1 and GMM2, even though the latter are not asymptotically efficient. When the disturbances are symmetric and platykurtic, the BGMM estimates are a little better than the other estimates. When the disturbances follow the distributions with η3 ≠ 0, the BGMM estimates have smaller SD and RMSE than the other estimates for n = 490. For example, when the disturbances follow the gamma distribution and n = 490, the percentage reductions in SD of the BGMM estimates of λ0 , ρ0 , β10 , β20 , and β30 relative to those of GMM2 are, respectively, 22.6%, 15.0%, 21.9%, 23.1%, and 22.6%.

⁹ The G2SLS estimator is calculated using sac_gmm.m in the Econometrics Toolbox (version 7) by James P. LeSage. The function options of sac_gmm.m are set to the default values.
¹⁰ The QML estimator is calculated using sac.m in the Econometrics Toolbox (version 7) by James P. LeSage. The function option info.lflag = 0 is set for full computation (instead of approximation), and the other options are set to the default values.
The empirical variance of the diagonal elements of WA², given by v²(WA²) = (1/n)Σ_{i=1}^{n} ((WA²)ii − (1/n)tr(WA²))², is about 0.005, which is quite small. Liu et al. (2006) have shown that, with such a small v²(WA²), the improvement of the BGMME in a SAR process from incorporating a measure of kurtosis into the moment conditions may not be evident.¹¹ To magnify the improvement that is based on kurtosis, we construct a weights matrix with a relatively large v²(Wn²) as follows. When Wn is row-normalized, the diagonal elements of Wn² are weighted averages of the corresponding columns of Wn . We generate 7 × 7 upper triangular matrices An whose non-zero elements in each column are either all (200 + u)’s or all u’s (u ∼ U[0, 1]) with equal probability. We calculate v²(Bn²) for the row-normalized Bn , with Bn = Λn^{-1}(An + An'), where Λn is a diagonal matrix with ith diagonal element Σ_{j=1}^{n}(An,ij + An,ji). We generate 1000 such Bn ’s, and pick the 7 of them with the largest v²(Bn²). In this way, we get a 49 × 49 block diagonal matrix Wn with the 7 row-normalized Bn ’s as the diagonal blocks. This Wn gives v²(Wn²) = 0.145. For sample sizes of n = 245 and 490, the corresponding spatial weights matrices are block diagonal with the preceding 49 × 49 matrix as their diagonal blocks. We denote the weights matrices constructed as above by WC .
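The construction of WC can be reproduced with the following sketch (the random draws are ours, so the realized v²(Wn²) will differ slightly from 0.145):

```python
import numpy as np

def draw_block(rng):
    """One candidate 7x7 block: upper triangular An whose nonzero column
    entries are all (200+u) or all u, u ~ U[0,1]; Bn is the row-normalized
    Lambda^{-1}(An + An')."""
    A = np.zeros((7, 7))
    for j in range(1, 7):
        u = rng.random()
        A[:j, j] = 200.0 + u if rng.random() < 0.5 else u
    B = A + A.T
    return B / B.sum(axis=1, keepdims=True)  # Lambda_ii = sum_j(A_ij + A_ji)

def v2(W):
    """Empirical variance of the diagonal elements of W^2."""
    d = np.diag(W @ W)
    return np.mean((d - d.mean()) ** 2)

rng = np.random.default_rng(0)
best7 = sorted((draw_block(rng) for _ in range(1000)), key=v2)[-7:]
W49 = np.zeros((49, 49))                     # base 49x49 block diagonal matrix
for i, B in enumerate(best7):
    W49[7 * i:7 * (i + 1), 7 * i:7 * (i + 1)] = B
```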
Tables 6-10 report the G2SLS, QML and various GMM estimates of λ0 , ρ0 and β0 with Wn = Mn = WC . When n = 245, the G2SLS estimates of λ0 are biased upwards by 15∼19% and those of ρ0 are biased downwards by about 15%, under all disturbance specifications. The biases of the 2SLS estimates of λ0 and ρ0 reduce to about 10% and 8% respectively as the sample size increases to n = 490. When n = 245, the QML and various GMM estimates of λ0 are slightly biased upwards¹² and those of ρ0 are biased downwards under all disturbance specifications, with the largest absolute bias being about 4%. The biases are even smaller when n = 490. The other estimates are essentially unbiased for both sample sizes considered. In terms of SD, the G2SLS estimates of λ0 and ρ0 are mostly dominated by the other estimates, and the G2SLS estimates of β0 are as good as those of QML, GMM1 and GMM2, under all disturbance specifications, when n = 490. When the disturbances are normally distributed, the BGMM estimates are as good as those of QML and GMM2 in terms of SD and RMSE when n = 490. When the disturbances follow the student t distribution, the GMM1 estimates have the smallest SD when n = 245, and the QML, GMM1 and GMM2 estimates are as good as the BGMM estimates when n = 490. When the disturbances follow the symmetric bimodal mixture normal, the BGMM estimates are a little better than the other estimates. When n = 490, the percentage reductions in SD of the BGMM estimates of λ0 , ρ0 , β10 , β20 , and β30 relative to those of the runner-up GMM2 are, respectively, 6.9%, 8.5%, 3.0%, 5.1%, and 6.2%. When the disturbances follow the distributions with η3 ≠ 0, the BGMM estimates have smaller SD and RMSE than the other estimates for both sample sizes considered. When the disturbances follow the asymmetric bimodal mixture normal and n = 490, the percentage reductions in SD of the BGMM estimates of λ0 , ρ0 , β10 , β20 , and β30 relative to those of the runner-up GMM2 are, respectively, 13.6%, 8.9%, 12.7%, 12.3%, and 14.1%. When the disturbances follow the gamma distribution and n = 490, the percentage reductions in SD of the BGMM estimates of λ0 , ρ0 , β10 , β20 , and β30 relative to the runner-up GMM2 are, respectively, 18.7%, 12.2%, 21.9%, 23.4%, and 23.8%.

¹¹ Liu et al. (2006) show that the improvement of the BGMME in a SAR process depends on the empirical variance of the diagonal elements of Wn(In − λ0 Wn)^{-1}. Given that (In − λ0 Wn) is invertible, the diagonal elements of Wn² are also relevant, as they are the leading terms of the diagonal elements of Wn(In − λ0 Wn)^{-1}, and they are independent of λ0 .
¹² Under distribution (e), the QML and various GMM estimates of λ0 are biased downwards by less than 4%.
In summary, the GMM approach can improve upon the G2SLS method because the former estimates δ = (λ, β')' and the spatial effect parameter ρ in the disturbances jointly. In finite samples, the GMMEs have smaller biases and standard deviations. The Monte Carlo experiments also provide evidence that the BGMMEs improve upon the QMLE when the disturbances are not normally distributed.

6 Conclusion

In this paper, we consider the GMM estimation of the MRSAR model with SAR disturbances. The proposed GMM approach improves upon the G2SLS in Kelejian and Prucha (1998) by jointly estimating all unknown parameters of the model. Among the optimal GMMEs, we show the existence of the BGMME, which is asymptotically as efficient as the MLE under normality and more efficient than the QMLE when the disturbances are not normally distributed. We also present evidence from Monte Carlo experiments that the GMM approach improves upon the finite sample performance of the conventional G2SLS.

A Summary of Notations

D(A) = Diag(A) is a diagonal matrix with diagonal elements being those of A if A is a vector, or the diagonal elements of A if A is a square matrix.
vecD(A) is a column vector formed by the diagonal elements of a square matrix A.
A^s = A + A', where A is a square matrix.
A^d = A − (1/n)tr(A)In or A^d = A − D(A), where A is an n × n matrix.
A^L is a linearly transformed square matrix of A which preserves the uniform boundedness property.
θ = (ρ, λ, β')'; δ = (λ, β')'.
Sn(λ) = In − λWn ; Sn = Sn(λ0).
Gn(λ) = Wn Sn^{-1}(λ); Gn = Gn(λ0).
Rn(ρ) = In − ρMn ; Rn = Rn(ρ0).
Hn(ρ) = Mn Rn^{-1}(ρ) = Mn(In − ρMn)^{-1}; Hn = Hn(ρ0).
Fn(ρ, λ) = Rn(ρ)Sn(λ); Fn = Rn Sn .
un(δ) = Sn(λ)Yn − Xn β; εn(θ) = Rn(ρ)un(δ).
Ȳn(ρ) = Rn(ρ)Yn ; X̄n(ρ) = Rn(ρ)Xn ; W̄n(ρ) = Rn(ρ)Wn Rn^{-1}(ρ).
S̄n(ρ, λ) = Rn(ρ)Sn(λ)Rn^{-1}(ρ); Ḡn(ρ, λ) = W̄n(ρ)S̄n^{-1}(ρ, λ).
Ȳn = Ȳn(ρ0); X̄n = X̄n(ρ0); W̄n = W̄n(ρ0); S̄n = S̄n(ρ0 , λ0); Ḡn = Ḡn(ρ0 , λ0).
ln is an n × 1 vector of ones.
ekj is the jth unit vector in R^k.
If an intercept appears in X̄n , we have X̄n = [X̄n*, ln]. Otherwise X̄n* ≡ X̄n .
X̄nj*d = X̄nj* − (1/n)ln ln' X̄nj* is the deviation of X̄nj* from its sample mean.
Ḡn* = Ḡn − [((η4 − 3) − η3²)/((η4 − 1) − η3²)] D(Ḡn) − [η3/(σ0((η4 − 1) − η3²))] D(Ḡn X̄n β0).
Hn* = Hn − [((η4 − 3) − η3²)/((η4 − 1) − η3²)] D(Hn).

B Some Useful Lemmas

In this appendix, we list some lemmas which are useful for the proofs of the results in the text.

Lemma B.1 Suppose that the elements of the sequences of n-dimensional column vectors {z1n} and {z2n} are uniformly bounded. If {An} are uniformly bounded in either row or column sums in absolute value, then |z1n' An z2n| = O(n).

Proof. Trivial.

Lemma B.2 Suppose that εn1 , · · · , εnn are i.i.d. random variables with zero mean, finite variance σ² and finite fourth moment μ4 . Then, for any two n × n matrices An and Bn ,

E(εn' An εn · εn' Bn εn) = (μ4 − 3σ⁴)vecD'(An)vecD(Bn) + σ⁴[tr(An)tr(Bn) + tr(An Bn^s)],

where Bn^s = Bn + Bn'.

Proof. See Lee (2001).
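Lemma B.2 is easy to check numerically; the following sketch compares simulated and theoretical values for the gamma design used in Section 5 (names and constants ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A, B = rng.random((n, n)), rng.random((n, n))
sigma2 = 2.0                             # var of gamma(2,1) - 2
mu4 = 6.0 * sigma2**2                    # kurtosis of gamma(2,1) is 6
sims = np.mean([((e := rng.gamma(2.0, 1.0, n) - 2.0) @ A @ e) * (e @ B @ e)
                for _ in range(200_000)])
theory = ((mu4 - 3.0 * sigma2**2) * np.diag(A) @ np.diag(B)
          + sigma2**2 * (np.trace(A) * np.trace(B) + np.trace(A @ (B + B.T))))
print(sims, theory)                      # the two values should be close
```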

Lemma B.3 Suppose that {An} are uniformly bounded in both row and column sums in absolute value, and εn1 , · · · , εnn are i.i.d. with zero mean and finite fourth moment. Then, E(εn' An εn) = O(n), var(εn' An εn) = O(n), εn' An εn = Op(n), and (1/n)εn' An εn − (1/n)E(εn' An εn) = op(1).

Proof. See Lee (2001).

Lemma B.4 Suppose that An is an n × n matrix with its column sums uniformly bounded in absolute value, the elements of the n × k matrix Cn are uniformly bounded, and εn1 , · · · , εnn are i.i.d. with zero mean and finite variance σ². Then, (1/√n)Cn' An εn = Op(1) and (1/n)Cn' An εn = op(1). Furthermore, if the limit of (1/n)Cn' An An' Cn exists and is positive definite, then (1/√n)Cn' An εn →D N(0, σ² limn→∞ (1/n)Cn' An An' Cn).

Proof. See Lee (2004).

Lemma B.5 Suppose that {An} is a sequence of symmetric n × n matrices with row and column sums uniformly bounded in absolute value, and bn = (bn1 , · · · , bnn)' is an n-dimensional vector such that supn (1/n)Σ_{i=1}^{n} |bni|^{2+η1} < ∞ for some η1 > 0. εn1 , · · · , εnn are i.i.d. random variables with zero mean and finite variance σ², and the moment E(|ε|^{4+2δ}) exists for some δ > 0. Let σ²_{Qn} be the variance of Qn where Qn = εn' An εn + bn' εn − σ²tr(An). Assume that σ²_{Qn} is bounded away from zero at the rate n. Then, Qn/σ_{Qn} →D N(0, 1).
Proof. See Kelejian and Prucha (2001).


Lemma B.6 Let θ̂n and θ̂n* be, respectively, the minimizers of zn(θ) and zn*(θ) in Θ. Suppose that (1/n)(zn(θ) − z̄n(θ)) converges in probability to zero uniformly in θ ∈ Θ, which is a compact set, and {(1/n)z̄n(θ)} satisfies the uniqueness identification condition at θ0 . If (1/n)(zn(θ) − zn*(θ)) = op(1) uniformly in θ ∈ Θ, then both θ̂n and θ̂n* converge in probability to θ0 .

In addition, suppose that (1/n)∂²zn(θ)/∂θ∂θ' converges in probability to a well-defined limiting matrix, uniformly in θ ∈ Θ, which is nonsingular at θ0 , and (1/√n)∂zn(θ0)/∂θ = Op(1). If (1/n)(∂²zn*(θ)/∂θ∂θ' − ∂²zn(θ)/∂θ∂θ') = op(1) uniformly in θ ∈ Θ and (1/√n)(∂zn*(θ0)/∂θ − ∂zn(θ0)/∂θ) = op(1), then √n(θ̂n − θ0) and √n(θ̂n* − θ0) have the same limiting distribution.

Proof. See Lee (2006).

Lemma B.7 Under Assumption 2, the projectors Xn(Xn'Xn)^{-1}Xn' and In − Xn(Xn'Xn)^{-1}Xn' are uniformly bounded in both row and column sums in absolute value.

Proof. See Lee (2004).

Lemma B.8 Suppose that {||Wn||} and {||Sn^{-1}||}, where || · || is a matrix norm, are bounded. Then {||Sn(λ)^{-1}||}, where Sn(λ) = In − λWn , is uniformly bounded in a neighborhood of λ0 .

Proof. See Lee (2004).

Lemma B.9 Suppose that z1n and z2n are n-dimensional column vectors of constants which are uniformly bounded, the n × n constant matrix An is uniformly bounded in column sums in absolute value, B1n and B2n are uniformly bounded in both row and column sums in absolute value, and εn1 , · · · , εnn are i.i.d. random variables with zero mean and finite second moment. √n(α̂n − α0) = Op(1), where α0 is a p-dimensional vector in the interior of its convex parameter space. The matrix Cn(α̂n) has the expansion

Cn(α̂n) − Cn(α0) = Σ_{i=1}^{m−1} Σ_{j1=1}^{p} · · · Σ_{ji=1}^{p} (α̂n,j1 − αj1,0) · · · (α̂n,ji − αji,0)Kin(α0)
                   + Σ_{j1=1}^{p} · · · Σ_{jm=1}^{p} (α̂n,j1 − αj1,0) · · · (α̂n,jm − αjm,0)Kmn(α̂n),   (14)

for some m ≥ 2, where Cn(α0) and the Kin(α0) are uniformly bounded in both row and column sums in absolute value for i = 1, · · · , m − 1, and Kmn(α) is uniformly bounded in both row and column sums in absolute value, uniformly in a small neighborhood of α0 . Then,

(a) (1/n)z1n'(Cn(α̂n) − Cn(α0))z2n = op(1);

(b) (1/√n)z1n'(Cn(α̂n) − Cn(α0))An εn = op(1);

(c) (1/n)εn' B1n'(Cn(α̂n) − Cn(α0))B2n εn = op(1), if (14) holds for m > 2; and

(d) (1/√n)εn'(Cn(α̂n) − Cn(α0))εn = op(1), if (14) holds for m > 3 with tr(Kin(α0)) = 0 for i = 1, · · · , m − 1.

Proof. See Liu et al. (2006).

Lemma B.10 Suppose that z1n and z2n are n-dimensional column vectors of constants which are uniformly bounded, the n × n constant matrix An is uniformly bounded in column sums in absolute value, the n × n matrices B1n and B2n are uniformly bounded in both row and column sums in absolute value, and εn1 , · · · , εnn are i.i.d. with zero mean and finite fourth moment. Let θ̂n , σ̂n², μ̂3 and μ̂4 be, respectively, √n-consistent estimates of θ0 , σ0², μ3 and μ4 . Let Cn be either Ḡn = Rn Wn(In − λ0 Wn)^{-1} Rn^{-1} or Hn = Mn Rn^{-1}, and let Cn* be either Ḡn*, Hn*, or D(X̄nj*) for j = 1, · · · , k*. Let Ĉn and Ĉn* be the empirical counterparts of Cn and Cn*. Then, under Assumption 3,

(a) (1/n)z1n'(Ĉn' − Cn')^L z2n = op(1), (1/√n)z1n'(Ĉn' − Cn')^L An εn = op(1), (1/n)εn' B1n'(Ĉn − Cn)^L B2n εn = op(1), (1/√n)εn'(Ĉn − Cn)^d εn = op(1);

(b) (1/n)z1n'(Ĉn*' − Cn*')^L z2n = op(1), (1/√n)z1n'(Ĉn*' − Cn*')^L An εn = op(1), (1/n)εn' B1n'(Ĉn* − Cn*)^L B2n εn = op(1), (1/√n)εn'(Ĉn* − Cn*)^d εn = op(1);

(c) (1/n)vecD'(Ĉn − Cn)^L z2n = op(1), (1/√n)vecD'(Ĉn − Cn)^L An εn = op(1); and

(d) (1/n)vecD'(Ĉn* − Cn*)^L z2n = op(1), (1/n)tr[An'(Ĉn* − Cn*)^L] = op(1).
Proof. As Sn − Sn(λ̂n) = (λ̂n − λ0)Wn , it follows that

Sn^{-1}(λ̂n) − Sn^{-1} = Sn^{-1}(λ̂n)[Sn − Sn(λ̂n)]Sn^{-1} = (λ̂n − λ0)Sn^{-1}(λ̂n)Gn .

By induction,

Sn^{-1}(λ̂n) − Sn^{-1} = Σ_{i=1}^{m−1} (λ̂n − λ0)^i Sn^{-1} Gn^i + (λ̂n − λ0)^m Sn^{-1}(λ̂n)Gn^m ,

for any m ≥ 2. Hence, it follows that

(Ĝn − Gn)^L = Σ_{i=1}^{m−1} (λ̂n − λ0)^i (Gn^{i+1})^L + (λ̂n − λ0)^m (Ĝn Gn^m)^L ,   (15)

which conforms to the expansion (14) with p = 1, Kin(λ0) = (Gn^{i+1})^L and Kmn(λ̂n) = (Ĝn Gn^m)^L. Analogously, we have

Rn^{-1}(ρ̂n) − Rn^{-1} = Σ_{i=1}^{m−1} (ρ̂n − ρ0)^i Rn^{-1} Hn^i + (ρ̂n − ρ0)^m Rn^{-1}(ρ̂n)Hn^m ,   (16)

for any m ≥ 2, which implies that

(Ĥn − Hn)^L = Σ_{i=1}^{m−1} (ρ̂n − ρ0)^i (Hn^{i+1})^L + (ρ̂n − ρ0)^m (Ĥn Hn^m)^L .   (17)

(17) conforms to the expansion (14) with p = 1, Kin(ρ0) = (Hn^{i+1})^L and Kmn(ρ̂n) = (Ĥn Hn^m)^L. And let Ḡn = Rn Gn Rn^{-1}. We have

R̂n Ĝn R̂n^{-1} − Rn Gn Rn^{-1} = (R̂n − Rn)Ĝn R̂n^{-1} + Rn Ĝn(R̂n^{-1} − Rn^{-1}) + Rn(Ĝn − Gn)Rn^{-1},

with (R̂n − Rn)Ĝn R̂n^{-1} = (ρ0 − ρ̂n)Mn Ĝn R̂n^{-1}. On the other hand, Rn Ĝn(R̂n^{-1} − Rn^{-1}) can be expanded to the form of (14) by a Taylor expansion, and Rn(Ĝn − Gn)Rn^{-1} can be expanded to the form of (14) by (15). Note that when the transformation ·^L is taken to be the special transformation ·^d, the deterministic parts of the expansions of Rn Ĝn(R̂n^{-1} − Rn^{-1}) and Rn(Ĝn − Gn)Rn^{-1} have a zero trace by construction, as the operation ·^d makes the resulting matrix have a zero trace. Hence (a) follows from Lemma B.9.
For (b), first consider the case where Cn* is either Ḡn* or Hn*. We have

Ḡn* = Ḡn − [((η4 − 3) − η3²)/((η4 − 1) − η3²)] D(Ḡn) − [η3/(σ0((η4 − 1) − η3²))] D(Ḡn X̄n β0)
    = Ḡn − [(σ0²(μ4 − 3σ0⁴) − μ3²)/(σ0²(μ4 − σ0⁴) − μ3²)] D(Ḡn) − [σ0² μ3/(σ0²(μ4 − σ0⁴) − μ3²)] D(Ḡn X̄n β0)
    = Ḡn − [(κ − 2σ0⁶)/κ] D(Ḡn) − (σ0² μ3/κ) D(Ḡn X̄n β0),

and

Hn* = Hn − [((η4 − 3) − η3²)/((η4 − 1) − η3²)] D(Hn) = Hn − [(σ0²(μ4 − 3σ0⁴) − μ3²)/(σ0²(μ4 − σ0⁴) − μ3²)] D(Hn) = Hn − [(κ − 2σ0⁶)/κ] D(Hn),

where κ = σ0²(μ4 − σ0⁴) − μ3². Let κ̂ be κ’s empirical counterpart, and

T1n = [Ĉn − ((κ̂ − 2(σ̂n²)³)/κ̂) D(Ĉn)] − [Cn − ((κ − 2σ0⁶)/κ) D(Cn)]
    = (Ĉn − Cn) − (1 − 2(σ̂n²)³/κ̂) D(Ĉn − Cn) + (2(σ̂n²)³/κ̂ − 2σ0⁶/κ) D(Cn).

As (2(σ̂n²)³/κ̂ − 2σ0⁶/κ) = op(1), it follows from (a) and Lemma B.1 that (1/n)z1n'(T1n)^L z2n = op(1). On the other hand, let

T2n = −(σ̂n² μ̂3/κ̂) D(Ĉn X̄n(ρ̂n)β̂n − Cn X̄n β0) − (σ̂n² μ̂3/κ̂ − σ0² μ3/κ) D(Cn X̄n β0)
    = −(σ̂n² μ̂3/κ̂) D[(Ĉn − Cn)X̄n β0 + Ĉn X̄n(β̂n − β0) + Ĉn(X̄n(ρ̂n) − X̄n)β̂n] − (σ̂n² μ̂3/κ̂ − σ0² μ3/κ) D(Cn X̄n β0),

where Ĉn − Cn takes the general form

Ĉn − Cn = Σ_{i=1}^{m−1} (α̂n − α0)^i Kin(α0) + (α̂n − α0)^m Kmn(α̂n),

as shown in the proof of (a). Therefore,

D[(Ĉn − Cn)X̄n β0] = Σ_{i=1}^{m−1} (α̂n − α0)^i D[Kin(α0)X̄n β0] + (α̂n − α0)^m D[Kmn(α̂n)X̄n β0].

As the conditions in Lemma B.9 are satisfied, it follows that (1/n)z1n' D^L[(Ĉn − Cn)X̄n β0] z2n = op(1) by Lemma B.9. On the other hand, let eni be the ith unit vector in R^n; then

(1/n)z1n' D[Ĉn X̄n(β̂n − β0)]z2n = (1/n)Σ_{i=1}^{n} z1n,i z2n,i eni' Ĉn X̄n(β̂n − β0) = op(1),

as (1/n)Σ_{i=1}^{n} z1n,i z2n,i eni' Ĉn X̄n = Op(1) and β̂n − β0 = op(1). And

Ĉn(X̄n(ρ̂n) − X̄n)β̂n = Ĉn(Rn(ρ̂n) − Rn)Xn β̂n = (ρ0 − ρ̂n)Ĉn Mn Xn β̂n ,

with eni' Ĉn Mn Xn β̂n = Op(1) uniformly in i and ρ0 − ρ̂n = op(1). The remaining term in (1/n)z1n'(T2n)^L z2n is op(1) as (σ̂n² μ̂3/κ̂ − σ0² μ3/κ) = op(1). With similar arguments and the corresponding results in Lemma B.9, the other results in (b) follow when Cn* is either Ḡn* or Hn*. On the other hand, when Cn* = D(X̄nj*), for j = 1, · · · , k*, we have Ĉn* − Cn* = D(Rn(ρ̂n)Xnj*) − D(Rn(ρ0)Xnj*) = (ρ0 − ρ̂n)D(Mn Xnj*). Because (ρ0 − ρ̂n) = op(1) and √n(ρ0 − ρ̂n) = Op(1), the four claims in (b) hold for Cn* = D(X̄nj*) by Lemmas B.1, B.4, B.3, and B.5, respectively.

For (c), as vecD'(Ĉn − Cn)^L = ln' D(Ĉn − Cn)^L, the results follow from (a).
1
For (d), as vec0D (Ĉn∗ − Cn∗ )L = ln0 D(Ĉn∗ − Cn∗ )L , it follows from (b) that 0 ∗
n vecD (Ĉn − Cn∗ )L z2n =
1 0 ∗
op (1). To show that n tr[An (Ĉn − Cn∗ )L ] = op (1), we first consider the case that Cn∗ is either Ḡ∗n or
Hn∗ . As we show in the proof of (a) that Ĉn − Cn conforms to the expansion (14). For m = 2, we
have
Ĉn − Cn = (α̂n − α0 )K1n (α0 ) + (α̂n − α0 )2 K2n (α̂n ).

Hence, it follows

1 1 1
tr[A0n (Ĉn − Cn )L ] = (α̂n − α0 ) tr(A0n K1n
L
(α0 )) + (α̂n − α0 )2 tr(A0n K2n
L
(α̂n )) = op (1),
n n n

1 0 L 1 0 L
because n tr(An K1n (α0 )) = O(1), n tr(An K2n (α̂n )) = Op (1) and (α̂n − α0 ) = op (1). Similarly,

1
tr[A0n D(Ĉn X̄n (ρ̂n )β̂ n − Cn X̄n β 0 )]
n
1
= vec0D (An )[(Ĉn − Cn )X̄n β 0 + Ĉn X̄n (β̂ n − β 0 ) + Ĉn (X̄n (ρ̂n ) − X̄n )β̂ n ] = op (1),
n

because

1
vec0D (An )(Ĉn − Cn )X̄n β 0
n
1 1
= (α̂n − α0 ) vec0D (An )K1n (α0 ) X̄n β 0 + (α̂n − α0 )2 vec0D (An )K2n (α̂n )X̄n β 0 = op (1),
n n

(1/n) vec′D(An)Ĉn X̄n(β̂n − β0) = op(1), and (1/n) vec′D(An)Ĉn(X̄n(ρ̂n) − X̄n)β̂n = (ρ0 − ρ̂n)(1/n) vec′D(An)Ĉn Mn Xn β̂n = op(1). As (1/n) tr[An′ D(Cn)] = O(1), (1/n) tr[An′ D(Cn X̄n β0)] = O(1), and σ̂n^2, μ̂3, κ̂ are consistent estimates, it follows that (1/n) tr[An′(Ĉn* − Cn*)^L] = op(1) when Cn* is either Ḡn* or Hn*. On the other hand, when Cn* = D(X̄nj*), we have (1/n) tr[An′(Ĉn* − Cn*)^L] = (ρ0 − ρ̂n)(1/n) tr[An′ D(Mn Xnj*)^L] = op(1), because (ρ0 − ρ̂n) = op(1) and (1/n) tr[An′ D(Mn Xnj*)^L] = O(1).
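Parts (c) and (d) above rest on the elementary fact that, for any n × n matrix A, vec′D(A) = ln′ D(A). The following minimal numerical check of this fact is illustrative only (it is not from the paper, and the matrix is randomly generated):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))

vecD_A = np.diag(A)        # vecD(A): the column of diagonal elements of A
D_A = np.diag(vecD_A)      # D(A): the diagonal matrix formed from diag(A)
l_n = np.ones(n)           # l_n: the n-vector of ones

assert np.allclose(vecD_A, l_n @ D_A)   # vecD'(A) = l_n' D(A)
```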

Lemma B.11 Suppose that zn is an n-dimensional column vector of constants which are uniformly bounded, the n × n constant matrix An is uniformly bounded in column sums in absolute value, and εn1, · · · , εnn are i.i.d. with zero mean and finite fourth moment. Let θ̂n, σ̂n^2, μ̂3 and μ̂4 be, respectively, √n-consistent estimates of θ0, σ0^2, μ3 and μ4. Let Cn be either Ḡn or Hn, with Ĉn being the empirical counterpart. Let

T1n = X̄n + [η3^2/((η4 − 1) − η3^2)] (X̄n − (1/n) ln ln′ X̄n),

T2n = Cn X̄n β0 + [η3^2/((η4 − 1) − η3^2)] (Cn X̄n β0 − (1/n) ln ln′ Cn X̄n β0),

T3n = [2σ0 η3/((η4 − 1) − η3^2)] vecD(Cn^d),

with T̂1n , T̂2n , and T̂3n being their empirical counterparts. Then, under Assumption 3,

(a) (1/n)(T̂in − Tin)′ zn = op(1), and

(b) (1/√n)(T̂in − Tin)′ An εn = op(1), for i = 1, 2, 3.

Proof. Let κ = σ0^2(μ4 − σ0^4) − μ3^2, with μ3 = η3σ0^3 and μ4 = η4σ0^4, and let κ̂ be the empirical counterpart. We have

T̂1n − T1n = [In + (μ̂3^2/κ̂)(In − (1/n) ln ln′)](X̄n(ρ̂n) − X̄n) + (μ̂3^2/κ̂ − μ3^2/κ)(In − (1/n) ln ln′)X̄n,   (18)

and

T̂2n − T2n = [In + (μ̂3^2/κ̂)(In − (1/n) ln ln′)](Ĉn X̄n(ρ̂n)β̂n − Cn X̄n β0) + (μ̂3^2/κ̂ − μ3^2/κ)(In − (1/n) ln ln′)Cn X̄n β0,

and

T̂3n − T3n = (2(σ̂n^2)^2 μ̂3/κ̂) vecD((Ĉn − Cn)^d) + (2(σ̂n^2)^2 μ̂3/κ̂ − 2σ0^4 μ3/κ) vecD(Cn^d).

Note that X̄n(ρ̂n) − X̄n = (Rn(ρ̂n) − Rn)Xn = (ρ0 − ρ̂n)Mn Xn. Because (1/n)(Mn Xn)′zn = O(1) and μ̂3^2/κ̂ − μ3^2/κ = op(1), it follows that (1/n)(T̂1n − T1n)′zn = op(1). For the first term in (1/n)(T̂2n − T2n)′zn, since (1/n)[(Ĉn − Cn)X̄n(ρ̂n)β̂n]′zn = op(1) by Lemma B.10, (1/n)[Cn(X̄n(ρ̂n) − X̄n)β̂n]′zn = (ρ0 − ρ̂n)(1/n)[Cn Mn Xn β̂n]′zn = op(1), and (β̂n − β0)′(1/n)(Cn X̄n)′zn = op(1), it follows that

(1/n)(Ĉn X̄n(ρ̂n)β̂n − Cn X̄n β0)′zn = (1/n)[(Ĉn − Cn)X̄n(ρ̂n)β̂n + Cn(X̄n(ρ̂n) − X̄n)β̂n + Cn X̄n(β̂n − β0)]′zn = op(1).

On the other hand, the remaining term in (1/n)(T̂2n − T2n)′zn is op(1) because (μ̂3^2/κ̂ − μ3^2/κ) = op(1) and (1/n)[(In − (1/n)ln ln′)Cn X̄n β0]′zn = O(1). For the first term in (1/n)(T̂3n − T3n)′zn, it follows from Lemma B.10 that (1/n)vec′D(Ĉn − Cn)^d zn = op(1). And the remaining term in (1/n)(T̂3n − T3n)′zn is op(1) because ((σ̂n^2)^2 μ̂3/κ̂ − σ0^4 μ3/κ) = op(1) and (1/n)vec′D(Cn^d)zn = O(1). This proves (a).

The first term in (1/√n)(T̂1n − T1n)′An εn is op(1) because

(1/√n)(X̄n(ρ̂n) − X̄n)′An εn = √n(ρ0 − ρ̂n)(1/n)(Mn Xn)′An εn = op(1),

where √n(ρ0 − ρ̂n) = Op(1) and (1/n)(Mn Xn)′An εn = op(1) by Lemma B.4. Similarly, the remaining term in (1/√n)(T̂1n − T1n)′An εn is also op(1). The first term in (1/√n)(T̂2n − T2n)′An εn is op(1) because

(1/√n)(Ĉn X̄n(ρ̂n)β̂n − Cn X̄n β0)′An εn
= β̂n′ Xn′ (1/√n)Rn′(ρ̂n)(Ĉn − Cn)′An εn + √n(ρ0 − ρ̂n)β̂n′ (1/n)(Mn Xn)′Cn′ An εn
+ √n(β̂n − β0)′(1/n)X̄n′ Cn′ An εn.   (19)
n

For the first term of (19), because Rn′(ρ̂n)(Ĉn − Cn)′ can be expanded to the form of (14) by Taylor expansion, it follows from Lemma B.10 and Slutsky's lemma that β̂n′ Xn′ (1/√n)Rn′(ρ̂n)(Ĉn − Cn)′An εn = op(1). And the remaining terms of (19) are op(1), because (1/n)(Mn Xn)′Cn′ An εn = op(1) and (1/n)X̄n′ Cn′ An εn = op(1) by Lemma B.4, and √n(ρ0 − ρ̂n) = Op(1) and √n(β̂n − β0) = Op(1). Similarly, the remaining term in (1/√n)(T̂2n − T2n)′An εn is also op(1). The first term in (1/√n)(T̂3n − T3n)′An εn is op(1) by Lemma B.10, and the second term is also op(1) because √n(2(σ̂n^2)^2 μ̂3/κ̂ − 2σ0^4 μ3/κ) = Op(1) and (1/n)vec′D(Cn^d)An εn = op(1). The desired results follow.

To show that the proposed moment conditions are optimal, we show that adding moment conditions to the optimal ones does not increase the asymptotic efficiency of the GMME, using the conditions for redundancy in Breusch et al. (1999). The definition of redundancy is given as follows. "Let θ̂ be the optimal GMME based on a set of (unconditional) moment conditions E[g1(y, θ)] = 0. Now add some extra moment conditions E[g2(y, θ)] = 0 and let θ̃ be the optimal GMME based on the whole set of moment conditions E[g(y, θ)] ≡ E[g1′(y, θ), g2′(y, θ)]′ = 0. We say that the moment conditions E[g2(y, θ)] = 0 are redundant given the moment conditions E[g1(y, θ)] = 0, or simply that g2 is redundant given g1, if the asymptotic variances of θ̂ and θ̃ are the same." (Breusch et al., 1999, p. 90) For moment conditions E[g(y, θ)] ≡ E[g1′(y, θ), g2′(y, θ)]′ = 0,
let

Ω ≡ E[g(y, θ)g′(y, θ)] = [ Ω11, Ω12 ; Ω21, Ω22 ],

with Ωjl = E[gj(y, θ)gl′(y, θ)] for j, l = 1, 2. And define Dj = E[∂gj(y, θ)/∂θ′] for j = 1, 2. Let the dimensions of g1(y, θ), g2(y, θ) and θ be k1, k2 and p.

Lemma B.12 The following statements are equivalent.

(a) g2 is redundant given g1.

(b) D2 = Ω21 Ω11^{-1} D1.

(c) There exists a k1 × p matrix A such that D1 = Ω11 A and D2 = Ω21 A.

Proof. Breusch et al. (1999), Theorem 1, (A), (C) and (D).
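A small numerical illustration of Lemma B.12 (ours, not from Breusch et al. (1999); all inputs are randomly generated): when condition (b) is imposed by construction, the optimal GMM variance (D′Ω^{-1}D)^{-1} based on (g1′, g2′)′ coincides with the one based on g1 alone.

```python
import numpy as np

rng = np.random.default_rng(1)
k1, k2, p = 4, 3, 2
D1 = rng.standard_normal((k1, p))
A = rng.standard_normal((k1 + k2, k1 + k2))
Omega = A @ A.T + np.eye(k1 + k2)           # a positive definite moment variance
O11, O21 = Omega[:k1, :k1], Omega[k1:, :k1]
D2 = O21 @ np.linalg.solve(O11, D1)         # impose condition (b) of Lemma B.12

D = np.vstack([D1, D2])
V1 = np.linalg.inv(D1.T @ np.linalg.solve(O11, D1))    # asymptotic variance with g1 only
V12 = np.linalg.inv(D.T @ np.linalg.solve(Omega, D))   # variance with (g1', g2')'
print(np.allclose(V1, V12))                            # True: g2 is redundant given g1
```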

Lemma B.13 Let the set of moment conditions to be considered be

E[g(θ)] ≡ E[g1′(θ), g2′(θ), g3′(θ)]′ = 0,

or simply g = (g1′, g2′, g3′)′. Then (g2′, g3′)′ is redundant given g1 if and only if g2 is redundant given g1 and g3 is redundant given g1.

Proof. Breusch et al. (1999), Theorem 2.

C Proofs

Proof of Proposition 1. For consistency, we first show that (1/n)an gn(θ) − (1/n)an E(gn(θ)) converges in probability to zero uniformly in θ ∈ Θ. Let an = (an1, · · · , anm, anx), where anx is a (row) subvector. Then an gn(θ) = εn′(θ)(Σ_{j=1}^m anj Pjn)εn(θ) + anx Qn′εn(θ). Since Sn(λ) = Sn + (λ0 − λ)Wn and Rn(ρ) = Rn + (ρ0 − ρ)Mn, it follows that εn(θ) = dn(θ) + (In + (ρ0 − ρ)Hn)(In + (λ0 − λ)Ḡn)εn, where dn(θ) = (In + (ρ0 − ρ)Hn)[(λ0 − λ)Ḡn X̄n β0 + X̄n(β0 − β)]. It follows that εn′(θ)(Σ_{j=1}^m anj Pjn)εn(θ) = dn′(θ)(Σ_{j=1}^m anj Pjn)dn(θ) + γn(θ) + qn(θ), where γn(θ) = dn′(θ)(Σ_{j=1}^m anj Pjn^s)(In + (ρ0 − ρ)Hn)(In + (λ0 − λ)Ḡn)εn and qn(θ) = εn′(In + (λ0 − λ)Ḡn)′(In + (ρ0 − ρ)Hn)′(Σ_{j=1}^m anj Pjn)(In + (ρ0 − ρ)Hn)(In + (λ0 − λ)Ḡn)εn. The term γn(θ) is linear in εn. Expansion of γn(θ) shows that (1/n)γn(θ) = op(1) uniformly in θ ∈ Θ, by Lemma B.4. The uniform convergence in probability follows because γn(θ) is simply a polynomial of θ and Θ is a bounded set. Similarly,

(1/n)qn(θ) = (ρ0 − ρ)(1/n)εn′Hn′(Σ_{j=1}^m anj Pjn^s)εn + (λ0 − λ)(1/n)εn′Ḡn′(Σ_{j=1}^m anj Pjn^s)εn
+ (ρ0 − ρ)(λ0 − λ)[(1/n)εn′Ḡn′Hn′(Σ_{j=1}^m anj Pjn^s)εn + (1/n)εn′Hn′(Σ_{j=1}^m anj Pjn^s)Ḡn εn]
+ (ρ0 − ρ)^2(λ0 − λ)(1/n)εn′Hn′(Σ_{j=1}^m anj Pjn^s)Hn Ḡn εn
+ (ρ0 − ρ)(λ0 − λ)^2(1/n)εn′Ḡn′(Σ_{j=1}^m anj Pjn^s)Hn Ḡn εn
+ (ρ0 − ρ)^2(1/n)εn′Hn′(Σ_{j=1}^m anj Pjn^s)Hn εn + (λ0 − λ)^2(1/n)εn′Ḡn′(Σ_{j=1}^m anj Pjn^s)Ḡn εn
+ (ρ0 − ρ)^2(λ0 − λ)^2(1/n)εn′Ḡn′Hn′(Σ_{j=1}^m anj Pjn^s)Hn Ḡn εn + (1/n)εn′(Σ_{j=1}^m anj Pjn)εn
= E((1/n)qn(θ)) + op(1),

where

E((1/n)qn(θ)) = (σ0^2/n) Σ_{j=1}^m anj {(ρ0 − ρ)tr(Hn′Pjn^s) + (λ0 − λ)tr(Ḡn′Pjn^s) + (ρ0 − ρ)(λ0 − λ)tr(Ḡn^s Hn′Pjn^s)
+ (ρ0 − ρ)^2(λ0 − λ)tr(Hn′Pjn^s Hn Ḡn) + (ρ0 − ρ)(λ0 − λ)^2 tr(Ḡn′Pjn^s Hn Ḡn)
+ (ρ0 − ρ)^2 tr(Hn′Pjn^s Hn) + (λ0 − λ)^2 tr(Ḡn′Pjn^s Ḡn)
+ (ρ0 − ρ)^2(λ0 − λ)^2 tr(Ḡn′Hn′Pjn^s Hn Ḡn)},   (20)

uniformly in θ ∈ Θ, by Lemma B.3 and E(εn′Pjn εn) = σ0^2 tr(Pjn) = 0 for all j = 1, · · · , m. Consequently,

(1/n)εn′(θ)(Σ_{j=1}^m anj Pjn)εn(θ) = (1/n)an E(gn(θ)) + op(1),

where

(1/n)an E(gn(θ)) = (1/n)dn′(θ)(Σ_{j=1}^m anj Pjn)dn(θ) + E((1/n)qn(θ)).

As gn(θ) is a polynomial of θ and Θ is compact, (1/n)an E(gn(θ)) is uniformly equicontinuous on Θ. The identification condition and the uniform equicontinuity of (1/n)an E(gn(θ)) imply that the identification uniqueness condition for (1/n^2)E(gn(θ))′an′an E(gn(θ)) must be satisfied. The consistency of the GMME θ̂n follows from this uniform convergence and the identification uniqueness condition (White, 1994).
For the asymptotic distribution of θ̂n, by the Taylor expansion of (∂gn′(θ̂n)/∂θ)an′an gn(θ̂n) = 0 at θ0,

√n(θ̂n − θ0) = −[(1/n)(∂gn′(θ̂n)/∂θ)an′an(1/n)(∂gn(θ̄n)/∂θ′)]^{-1}(1/n)(∂gn′(θ̂n)/∂θ)an′an(1/√n)gn(θ0).

As ∂εn(θ)/∂θ′ = −(Mn un(δ), Rn(ρ)Wn Yn, Rn(ρ)Xn), it follows that

∂gn(θ)/∂θ′ = −(Qn, P1n^s εn(θ), · · · , Pmn^s εn(θ))′(Mn un(δ), Rn(ρ)Wn Yn, Rn(ρ)Xn).

(1/n)Qn′Mn un = op(1), and, at ρ0, (1/n)Qn′Rn(ρ0)Wn Yn = (1/n)Qn′Rn Gn Xn β0 + (1/n)Qn′Rn Gn un = (1/n)Qn′Rn Gn Xn β0 + op(1). Explicitly, εn′(θ)Pjn^s Mn un = εn′(θ)Pjn^s Hn εn, where εn(θ) = dn(θ) + (In + (ρ0 − ρ)Hn)(In + (λ0 − λ)Ḡn)εn. Uniform convergence of (1/n)εn′(θ)Pjn^s Mn un follows from Lemma B.3, and (1/n)εn′(θ0)Pjn^s Mn un = (σ0^2/n)tr(Pjn^s Hn) + op(1) at θ0. Similarly, (1/n)εn′(θ)Pjn^s Rn(ρ)Wn Yn = (1/n)εn′(θ)Pjn^s Rn(ρ)Gn Xn β0 + (1/n)εn′(θ)Pjn^s Rn(ρ)Gn un. Uniform convergence follows from Lemmas B.3 and B.4, and, at θ0, (1/n)εn′(θ0)Pjn^s Rn Gn Xn β0 = op(1) and (1/n)εn′(θ0)Pjn^s Rn Gn un = (σ0^2/n)tr(Pjn^s Ḡn) + op(1). Hence, (1/n)εn′(θ0)Pjn^s Rn Wn Yn = (σ0^2/n)tr(Pjn^s Ḡn) + op(1) at θ0. Finally, at θ0, (1/n)εn′(θ0)Pjn^s Rn Xn = op(1). In conclusion, (1/n)∂gn(θ̃n)/∂θ′ = −(1/n)Dn + op(1) with Dn in (9). On the other hand, Lemma B.5 implies that

(1/√n)an gn(θ0) = (1/√n)[εn′(Σ_{j=1}^m anj Pjn)εn + anx Qn′εn] →D N(0, lim_{n→∞}(1/n)an Ωn an′).

The asymptotic distribution of √n(θ̂n − θ0) follows.
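To make the estimator behind Proposition 1 concrete, the following minimal sketch (not the authors' code) simulates a small MRSAR model with SAR disturbances and minimizes a GMM objective built from quadratic and linear moments; the weights matrix, the instrument matrix Qn, the moment matrices P1, P2, and the identity weighting an′an = I are simple illustrative choices.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n = 200
W = rng.random((n, n)); np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)       # row-normalized spatial weights
M = W                                    # take Mn = Wn, as in the Monte Carlo tables
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
lam0, rho0, beta0 = 0.2, 0.4, np.array([1.0, -1.0])

eps = rng.standard_normal(n)
u = np.linalg.solve(np.eye(n) - rho0 * M, eps)            # u_n = rho0 M_n u_n + eps_n
Y = np.linalg.solve(np.eye(n) - lam0 * W, X @ beta0 + u)  # SAR equation in Y_n

def eps_theta(theta):
    rho, lam, beta = theta[0], theta[1], theta[2:]
    u_d = Y - lam * (W @ Y) - X @ beta    # u_n(delta)
    return u_d - rho * (M @ u_d)          # eps_n(theta) = R_n(rho) u_n(delta)

# Quadratic moment matrices with zero trace, plus a simple IV matrix Q_n.
P1 = W                                    # diag(W) = 0, so tr(P1) = 0
P2 = M @ M - np.diag(np.diag(M @ M))      # tr(P2) = 0 by construction
Q = np.column_stack([X, W @ X[:, 1]])

def gbar(theta):
    e = eps_theta(theta)
    return np.concatenate([[e @ P1 @ e, e @ P2 @ e], Q.T @ e]) / n

obj = lambda theta: gbar(theta) @ gbar(theta)   # identity weighting a_n'a_n = I
theta_hat = minimize(obj, x0=np.zeros(4), method="Nelder-Mead",
                     options={"maxiter": 5000}).x
print(theta_hat)    # estimate of (rho, lambda, beta1, beta2)
```

With the optimal weighting of Proposition 2 in place of the identity, the same objective yields the feasible optimal GMME.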

Proof of Proposition 2. The generalized Schwarz inequality implies that the optimal weighting matrix for an′an in Proposition 1 is ((1/n)Ωn)^{-1}. For consistency, consider

(1/n)gn′(θ)Ω̂n^{-1}gn(θ) = (1/n)gn′(θ)Ωn^{-1}gn(θ) + (1/n)gn′(θ)(Ω̂n^{-1} − Ωn^{-1})gn(θ).

With an = ((1/n)Ωn)^{-1/2} in Proposition 1, Assumption 6 implies that a0 = (lim_{n→∞}(1/n)Ωn)^{-1/2} exists. Because a0 is nonsingular, the identification condition of θ0 corresponds to the unique root of lim_{n→∞}E((1/n)gn(θ)) = 0 at θ0, which is satisfied by Assumption 5. Hence, the uniform convergence in probability of (1/n)gn′(θ)Ωn^{-1}gn(θ) to a well-defined limit uniformly in θ ∈ Θ follows by an argument similar to that in the proof of Proposition 1. So it remains to show that (1/n)gn′(θ)(Ω̂n^{-1} − Ωn^{-1})gn(θ) = op(1) uniformly in θ ∈ Θ. Let ‖·‖ be the Euclidean norm for vectors and matrices. Then,

‖(1/n)gn′(θ)(Ω̂n^{-1} − Ωn^{-1})gn(θ)‖ ≤ ((1/n)‖gn(θ)‖)^2 ‖((1/n)Ω̂n)^{-1} − ((1/n)Ωn)^{-1}‖.

To show that this is of probability order op(1) uniformly in θ ∈ Θ, it is sufficient to show that (1/n)‖gn(θ)‖ = Op(1) uniformly in θ ∈ Θ. From the proof of Proposition 1, (1/n)[gn(θ) − E(gn(θ))] = op(1) uniformly in θ ∈ Θ. On the other hand, as E((1/n)qn(θ)) = Σ_{j=1}^m anj E((1/n)εn′(θ)Pjn εn(θ)), it follows from (20) that E((1/n)εn′(θ)Pjn εn(θ)) = O(1) uniformly in θ ∈ Θ. Let dn(θ) = (In + (ρ0 − ρ)Hn)[(λ0 − λ)Ḡn X̄n β0 + X̄n(β0 − β)]. Similarly, (1/n)E(Qn′εn(θ)) = (1/n)Qn′dn(θ) = O(1) uniformly in θ ∈ Θ. These imply that ‖(1/n)E(gn(θ))‖ = O(1) uniformly in θ ∈ Θ. Consequently, (1/n)‖gn(θ)‖ = Op(1) uniformly in θ ∈ Θ, because ‖(1/n)gn(θ)‖ ≤ ‖(1/n)gn(θ) − (1/n)E(gn(θ))‖ + ‖(1/n)E(gn(θ))‖. Therefore, ‖(1/n)gn′(θ)(Ω̂n^{-1} − Ωn^{-1})gn(θ)‖ converges in probability to zero, uniformly in θ ∈ Θ. The consistency of the feasible optimal GMME θ̂fo,n follows.
For the limiting distribution, as (1/n)∂gn(θ̂n)/∂θ′ = −(1/n)Dn + op(1) from the proof of Proposition 1,

√n(θ̂fo,n − θ0) = −[(1/n)(∂gn′(θ̂n)/∂θ)((1/n)Ω̂n)^{-1}(1/n)(∂gn(θ̂n)/∂θ′)]^{-1}(1/n)(∂gn′(θ̂n)/∂θ)((1/n)Ω̂n)^{-1}(1/√n)gn(θ0)
= [(1/n)Dn′((1/n)Ωn)^{-1}(1/n)Dn]^{-1}(1/n)Dn′((1/n)Ωn)^{-1}(1/√n)gn(θ0) + op(1).

The limiting distribution of √n(θ̂fo,n − θ0) follows from this expansion.
For the overidentification test, by the Taylor expansion,

(1/√n)gn(θ̂fo,n) = (1/√n)gn(θ0) + (1/n)(∂gn(θ̄n)/∂θ′)√n(θ̂fo,n − θ0) = (1/√n)gn(θ0) − (1/n)Dn√n(θ̂fo,n − θ0) + op(1) = An(1/√n)gn(θ0) + op(1),

where An = In − (1/n)Dn[(1/n)Dn′((1/n)Ωn)^{-1}(1/n)Dn]^{-1}(1/n)Dn′((1/n)Ωn)^{-1}. Therefore,

gn′(θ̂fo,n)Ω̂n^{-1}gn(θ̂fo,n)
= (1/√n)gn′(θ0)An′((1/n)Ωn)^{-1}An(1/√n)gn(θ0) + op(1)
= (1/√n)gn′(θ0)((1/n)Ωn)^{-1/2}{In − ((1/n)Ωn)^{-1/2}(1/n)Dn[(1/n)Dn′((1/n)Ωn)^{-1}(1/n)Dn]^{-1}(1/n)Dn′((1/n)Ωn)^{-1/2}}((1/n)Ωn)^{-1/2}(1/√n)gn(θ0) + op(1)
→D χ^2((m + kx) − (p + k)),

because (1/√n)gn(θ0) →D N(0, lim_{n→∞}(1/n)Ωn) as in the proof of Proposition 1.
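As an illustration of this statistic, the following sketch computes the test value and its p-value; the inputs are fabricated placeholders (in practice g_hat would be the moment vector at the feasible optimal GMME and Omega_hat a consistent estimate of var(gn(θ0))).

```python
import numpy as np
from scipy.stats import chi2

def overid_test(g_hat, Omega_hat, n_params):
    """J = g' Omega^{-1} g, compared with a chi-square whose degrees of freedom
    are (number of moments) - (number of parameters); in the notation above,
    (m + kx) - (p + k)."""
    J = g_hat @ np.linalg.solve(Omega_hat, g_hat)
    df = g_hat.size - n_params
    return J, chi2.sf(J, df)

# Fabricated example: 6 moment conditions, 4 parameters.
rng = np.random.default_rng(3)
g = 0.1 * rng.standard_normal(6)
print(overid_test(g, np.eye(6), n_params=4))
```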

Proof of Proposition 3. Consider the moment conditions

E[gn*′(θ0), gn′(θ0)]′ = 0,

where gn(θ) is a vector of arbitrary moment functions taking the form of (4). To show the desired results, it is sufficient to show that gn is redundant given gn*, or, equivalently, that there exists an An invariant with Pjn (j = 1, · · · , m) and Qn such that D2 = Ω21 An, according to Lemma B.12 (c), where

D2 = ∂E(gn(θ0))/∂θ′ = −[ 0, Qn′Ḡn X̄n β0, Qn′X̄n ;
σ0^2 tr(P1n^s Hn), σ0^2 tr(P1n^s Ḡn), 0 ;
· · · ;
σ0^2 tr(Pmn^s Hn), σ0^2 tr(Pmn^s Ḡn), 0 ],
and

Ω21 = E(gn(θ0)gn*′(θ0))
= [ σ0^2 Qn′Qn*, μ3 Qn′vecD(P*1n), · · · , μ3 Qn′vecD(P*k*+2,n) ;
μ3 vec′D(P1n)Qn*, σ0^4 tr(P1n^s P*1n), · · · , σ0^4 tr(P1n^s P*k*+2,n) ;
· · · ;
μ3 vec′D(Pmn)Qn*, σ0^4 tr(Pmn^s P*1n), · · · , σ0^4 tr(Pmn^s P*k*+2,n) ]
+ (μ4 − 3σ0^4)[ 0, 0, · · · , 0 ;
0, vec′D(P1n)vecD(P*1n), · · · , vec′D(P1n)vecD(P*k*+2,n) ;
· · · ;
0, vec′D(Pmn)vecD(P*1n), · · · , vec′D(Pmn)vecD(P*k*+2,n) ].

To simplify notation, denote κ = σ0^6[(η4 − 1) − η3^2] = σ0^2(μ4 − σ0^4) − μ3^2. Let

An = −[ 0k×1, 0k×1, σ0^{-2}Ik ;
0, σ0^{-2}, 01×k ;
−2σ0^2μ3/κ, 0, 0 ;
0, σ0^{-2}, 01×k ;
σ0^{-2}, 0, 0 ;
0k*×1, 0k*×1, b ],

where b = (b1′, · · · , bk*′)′ with bj = −(μ3/κ)ekj′ for j = 1, · · · , k*. To check D2 = Ω21 An, the following identities are helpful:
(1) vecD(P*1n) = (2σ0^6/κ)vecD(Ḡn − (tr(Ḡn)/n)In) − (σ0^2μ3/κ)(Ḡn X̄n β0 − (1/n)ln ln′Ḡn X̄n β0),

(2) vecD(P*2n) = (2σ0^6/κ)vecD(Hn − (tr(Hn)/n)In),

(3) vecD(P*j+2,n) = X̄*nj − (1/n)ln ln′X̄*nj, for j = 1, · · · , k*,

(4) Σ_{j=1}^{k*} vecD(P*j+2,n)ekj′ = X̄n − (1/n)ln ln′X̄n.

It follows from identity (1) that

(5) σ0^2 Q*2n + μ3 vecD(P*1n) = σ0^2 Ḡn X̄n β0,

it follows from identity (2) that

(6) (2σ0^6/κ)Q*3n = vecD(P*2n),

and it follows from identity (4) that

(7) Q*1n − (μ3^2/κ)Σ_{j=1}^{k*} vecD(P*j+2,n)ekj′ = X̄n.

For an arbitrary n × n matrix Pn with tr(Pn) = 0, we have:

(8) vec′D(Pn)Q*1n = (σ0^2(μ4 − σ0^4)/κ)vec′D(Pn)X̄n,

(9) μ3 vec′D(Pn)Q*2n + σ0^4 tr(Pn^s P*1n) + (μ4 − 3σ0^4)vec′D(Pn)vecD(P*1n) = σ0^4 tr(Pn^s Ḡn),

(10) −(2σ0^4μ3^2/κ)vec′D(Pn)Q*3n + σ0^4 tr(Pn^s P*2n) + (μ4 − 3σ0^4)vec′D(Pn)vecD(P*2n) = σ0^4 tr(Pn^s Hn), and

(11) σ0^4 tr(Pn^s P*j+2,n) + (μ4 − 3σ0^4)vec′D(Pn)vecD(P*j+2,n) = (μ4 − σ0^4)vec′D(Pn)vecD(P*j+2,n), for j = 1, · · · , k*.
It follows from identity (6) that the (1, 1) block of Ω21 An is zero, from identity (5) that the (1, 2) block of Ω21 An is −Qn′Ḡn X̄n β0, and from identity (7) that the (1, 3) block of Ω21 An is −Qn′X̄n. Identity (10) implies that the (j + 1, 1) blocks of Ω21 An are −σ0^2 tr(Pjn^s Hn) for j = 1, · · · , m, identity (9) implies that the (j + 1, 2) blocks of Ω21 An are −σ0^2 tr(Pjn^s Ḡn) for j = 1, · · · , m, and identities (4), (8) and (11) imply that the remaining blocks of Ω21 An are zero. Therefore, Ω21 An = D2.
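Identity (11) can be seen directly from the fact that P*j+2,n is diagonal, so tr(Pn^s P*j+2,n) = 2 vec′D(Pn)vecD(P*j+2,n), and 2σ0^4 + (μ4 − 3σ0^4) = μ4 − σ0^4. A quick numerical confirmation of this trace fact (an illustrative check of ours, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
P = rng.standard_normal((n, n))
P -= (np.trace(P) / n) * np.eye(n)         # enforce tr(P_n) = 0, as in the lemma
P_star = np.diag(rng.standard_normal(n))   # diagonal, like P*_{j+2,n}

lhs = np.trace((P + P.T) @ P_star)         # tr(P_n^s P*)
rhs = 2 * np.diag(P) @ np.diag(P_star)     # 2 vecD'(P_n) vecD(P*)
assert np.isclose(lhs, rhs)
```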
Furthermore, as gn*(θ) is a special case of gn(θ), and An is invariant with the Pn's and Qn, it follows that D1 = Ω11 An, and hence Ω11^{-1}D1 = An, where Ω11 = Ωn* = var(gn*(θ0)) and

D1 = E(∂gn*(θ0)/∂θ′) = −[ 0, Qn*′Ḡn X̄n β0, Qn*′X̄n ;
σ0^2 tr(P*1n^s Hn), σ0^2 tr(P*1n^s Ḡn), 0 ;
· · · ;
σ0^2 tr(P*k*+2,n^s Hn), σ0^2 tr(P*k*+2,n^s Ḡn), 0 ].

Hence Σb = lim_{n→∞}(1/n)D1′Ω11^{-1}D1 = lim_{n→∞}(1/n)D1′An, where

D1′An = [ tr(P*2n^s Hn), tr(P*1n^s Hn), Σ13 ;
Σ21, σ0^{-2}(Ḡn X̄n β0)′Q*2n + tr(P*1n^s Ḡn), Σ23 ;
−2σ0^2μ3 X̄n′Q*3n/κ, σ0^{-2}X̄n′Q*2n, σ0^{-2}X̄n′Q*1n ]
= [ tr(P*2n^s Hn), tr(P*1n^s Hn), −2σ0^2μ3 Q*3n′X̄n/κ ;
tr(P*1n^s Hn), σ0^{-2}(Ḡn X̄n β0)′Q*2n + tr(P*1n^s Ḡn), σ0^{-2}Q*2n′X̄n ;
−2σ0^2μ3 X̄n′Q*3n/κ, σ0^{-2}X̄n′Q*2n, σ0^{-2}X̄n′Q*1n ],

since D1′An = D1′Ω11^{-1}D1 is symmetric. The desired result follows as κ = σ0^2(μ4 − σ0^4) − μ3^2, μ3 = η3σ0^3, and μ4 = η4σ0^4.

Proof of Proposition 4. We shall show that the objective functions zn*(θ) = ĝn*′(θ)Ω̂n*^{-1}ĝn*(θ) and zn(θ) = gn*′(θ)Ωn*^{-1}gn*(θ) satisfy the conditions in Lemma B.6. If so, the GMME from the minimization of zn*(θ) will have the same limiting distribution as that from the minimization of zn(θ). The differences of zn*(θ) and zn(θ) and of their derivatives involve the differences of ĝn*(θ) and gn*(θ) and of their derivatives. Furthermore, one has to consider the difference of Ω̂n* and Ωn*.
First, consider (1/n)(ĝn*(θ) − gn*(θ)). Explicitly, stacking the blocks,

(1/n)(ĝn*(θ) − gn*(θ)) = [(1/n)(Q̂n* − Qn*)′, (1/n)εn′(θ)(P̂*1n − P*1n), · · · , (1/n)εn′(θ)(P̂*k*+2,n − P*k*+2,n)]εn(θ).

The vector εn(θ) is related to εn as εn(θ) = (In + (ρ0 − ρ)Hn)(In + (λ0 − λ)Ḡn)εn + dn(θ), where dn(θ) = (In + (ρ0 − ρ)Hn)[(λ0 − λ)Ḡn X̄n β0 + X̄n(β0 − β)]. It follows that (1/n)(Q̂n* − Qn*)′εn(θ) = (1/n)(Q̂n* − Qn*)′(In + (ρ0 − ρ)Hn)(In + (λ0 − λ)Ḡn)εn + (1/n)(Q̂n* − Qn*)′dn(θ) = op(1) uniformly in θ ∈ Θ by Lemma B.11. Similarly, it follows that (1/n)εn′(θ)(P̂*jn − P*jn)εn(θ) = op(1), for j = 1, · · · , k* + 2, uniformly in θ ∈ Θ by Lemma B.10. Hence, we conclude that (1/n)(ĝn*(θ) − gn*(θ)) = op(1) uniformly in θ ∈ Θ.
Consider the derivatives of ĝn*(θ) and gn*(θ). As the second derivatives of εn(θ) with respect to θ are zero because εn(θ) is linear in θ, it follows that

∂gn*(θ)/∂θ′ = [Qn*′ ∂εn(θ)/∂θ′ ; εn′(θ)P*1n^s ∂εn(θ)/∂θ′ ; · · · ; εn′(θ)P*k*+2,n^s ∂εn(θ)/∂θ′],

and, blockwise,

∂^2gn*(θ)/∂θ∂θ′ = [0 ; (∂εn′(θ)/∂θ)P*1n^s(∂εn(θ)/∂θ′) ; · · · ; (∂εn′(θ)/∂θ)P*k*+2,n^s(∂εn(θ)/∂θ′)].

The first-order derivative of εn(θ) is ∂εn(θ)/∂θ′ = −(Mn un(δ), Rn(ρ)Wn Yn, Rn(ρ)Xn), where un(δ) = (In − λWn)Yn − Xn β, Rn(ρ) = In − ρMn, and Yn = Sn^{-1}Xn β0 + Sn^{-1}Rn^{-1}εn. It follows from Lemmas B.10 and B.11 that (1/n)(∂ĝn*(θ)/∂θ − ∂gn*(θ)/∂θ) = op(1) and (1/n)(∂^2ĝn*(θ)/∂θ∂θ′ − ∂^2gn*(θ)/∂θ∂θ′) = op(1) uniformly in θ ∈ Θ.
Consider (1/n)(Ω̂n* − Ωn*), where

Ωn* = E[gn*(θ0)gn*′(θ0)] = [ σ0^2 Qn*′Qn*, μ3 Qn*′ω*k*+2 ; μ3 ω*′k*+2 Qn*, σ0^4 Δ*k*+2 + (μ4 − 3σ0^4)ω*′k*+2 ω*k*+2 ],

with ω*k*+2 = [vecD(P*1n), · · · , vecD(P*k*+2,n)] and

Δ*k*+2 = [ tr(P*1n^s P*1n), · · · , tr(P*1n^s P*k*+2,n) ; · · · ; tr(P*k*+2,n^s P*1n), · · · , tr(P*k*+2,n^s P*k*+2,n) ].

First, consider the block matrix σ0^4 Δ*k*+2 + (μ4 − 3σ0^4)ω*′k*+2 ω*k*+2. As P̂*jn is uniformly bounded in column sums in absolute value with probability one, it follows from Lemma B.10 that (1/n)tr(P̂*in^s P̂*jn) − (1/n)tr(P*in^s P*jn) = (1/n)tr[(P̂*in^s − P*in^s)P̂*jn + P*in^s(P̂*jn − P*jn)] = op(1), and (1/n)vec′D(P̂*in)vecD(P̂*jn) − (1/n)vec′D(P*in)vecD(P*jn) = (1/n)vec′D(P̂*in)vecD(P̂*jn − P*jn) + (1/n)vec′D(P̂*in − P*in)vecD(P*jn) = op(1) for i, j = 1, · · · , k* + 2. Hence, (1/n)(σ̂n^2)^2 tr(P̂*in^s P̂*jn) − (1/n)σ0^4 tr(P*in^s P*jn) = (σ̂n^2)^2(1/n)[tr(P̂*in^s P̂*jn) − tr(P*in^s P*jn)] + ((σ̂n^2)^2 − σ0^4)(1/n)tr(P*in^s P*jn) = op(1), and

(1/n)(μ̂4 − 3(σ̂n^2)^2)vec′D(P̂*in)vecD(P̂*jn) − (1/n)(μ4 − 3σ0^4)vec′D(P*in)vecD(P*jn)
= (μ̂4 − 3(σ̂n^2)^2)(1/n)[vec′D(P̂*in)vecD(P̂*jn) − vec′D(P*in)vecD(P*jn)]
+ [(μ̂4 − 3(σ̂n^2)^2) − (μ4 − 3σ0^4)](1/n)vec′D(P*in)vecD(P*jn)
= op(1),

for i, j = 1, · · · , k* + 2.
Next, consider the block matrix μ3 Qn*′ω*k*+2. As elements of Q̂n* are uniformly bounded with probability one, it follows from Lemmas B.10 and B.11 that (1/n)Q̂n*′vecD(P̂*jn) − (1/n)Qn*′vecD(P*jn) = (1/n)Q̂n*′vecD(P̂*jn − P*jn) + (1/n)(Q̂n* − Qn*)′vecD(P*jn) = op(1) for j = 1, · · · , k* + 2. Hence, (1/n)(μ̂3 Q̂n*′vecD(P̂*jn) − μ3 Qn*′vecD(P*jn)) = μ̂3(1/n)(Q̂n*′vecD(P̂*jn) − Qn*′vecD(P*jn)) + (μ̂3 − μ3)(1/n)Qn*′vecD(P*jn) = op(1) for j = 1, · · · , k* + 2.
Lastly, consider the remaining block matrix σ0^2 Qn*′Qn*. As elements of Q̂n* are uniformly bounded with probability one, Lemma B.11 implies that (1/n)(Q̂*in′Q̂*jn − Q*in′Q*jn) = (1/n)[Q̂*in′(Q̂*jn − Q*jn) + (Q̂*in − Q*in)′Q*jn] = op(1) for i, j = 1, 2, 3. Therefore, it follows that (1/n)(σ̂n^2 Q̂n*′Q̂n* − σ0^2 Qn*′Qn*) = σ̂n^2(1/n)(Q̂n*′Q̂n* − Qn*′Qn*) + (σ̂n^2 − σ0^2)(1/n)Qn*′Qn* = op(1). In conclusion, (1/n)Ω̂n* − (1/n)Ωn* = op(1). As the limit of (1/n)Ωn* exists and is a nonsingular matrix, it follows that ((1/n)Ω̂n*)^{-1} − ((1/n)Ωn*)^{-1} = op(1) by the continuous mapping theorem.
Furthermore, because (1/n)(ĝn*(θ) − gn*(θ)) = op(1) and (1/n)[gn*(θ) − E(gn*(θ))] = op(1) uniformly in θ ∈ Θ, and sup_{θ∈Θ}(1/n)|E(gn*(θ))| = O(1) (see the proof of Proposition 2), (1/n)gn*(θ) and (1/n)ĝn*(θ) are stochastically bounded, uniformly in θ ∈ Θ. Similarly, (1/n)∂gn*(θ)/∂θ, (1/n)∂ĝn*(θ)/∂θ, (1/n)∂^2gn*(θ)/∂θ∂θ′ and (1/n)∂^2ĝn*(θ)/∂θ∂θ′ are stochastically bounded, uniformly in θ ∈ Θ.
With the uniform convergence in probability and uniform stochastic boundedness properties, the difference of zn*(θ) and zn(θ) can be investigated. By expansion,

(1/n)(zn*(θ) − zn(θ)) = (1/n)ĝn*′(θ)Ω̂n*^{-1}(ĝn*(θ) − gn*(θ)) + (1/n)gn*′(θ)(Ω̂n*^{-1} − Ωn*^{-1})ĝn*(θ) + (1/n)gn*′(θ)Ωn*^{-1}(ĝn*(θ) − gn*(θ)) = op(1),

uniformly in θ ∈ Θ. Similarly, for each component θl of θ,

(1/n)∂^2zn*(θ)/∂θl∂θ′ − (1/n)∂^2zn(θ)/∂θl∂θ′
= (2/n)[(∂ĝn*′(θ)/∂θl)Ω̂n*^{-1}(∂ĝn*(θ)/∂θ′) + ĝn*′(θ)Ω̂n*^{-1}(∂^2ĝn*(θ)/∂θl∂θ′) − ((∂gn*′(θ)/∂θl)Ωn*^{-1}(∂gn*(θ)/∂θ′) + gn*′(θ)Ωn*^{-1}(∂^2gn*(θ)/∂θl∂θ′))]
= op(1).

Finally, because ((∂ĝn*′(θ0)/∂θ)Ω̂n*^{-1} − (∂gn*′(θ0)/∂θ)Ωn*^{-1}) = op(1) as above, and (1/√n)gn*(θ0) = Op(1) by the central limit theorems in Lemmas B.4 and B.5,

(1/√n)(∂zn*(θ0)/∂θ − ∂zn(θ0)/∂θ)
= 2{(∂ĝn*′(θ0)/∂θ)Ω̂n*^{-1}(1/√n)(ĝn*(θ0) − gn*(θ0)) + ((∂ĝn*′(θ0)/∂θ)Ω̂n*^{-1} − (∂gn*′(θ0)/∂θ)Ωn*^{-1})(1/√n)gn*(θ0)}
= 2(1/n)(∂ĝn*′(θ0)/∂θ)((1/n)Ω̂n*)^{-1}(1/√n)(ĝn*(θ0) − gn*(θ0)) + op(1).

As (1/√n)(ĝn*(θ0) − gn*(θ0)) = op(1) by Lemmas B.10 and B.11, (1/√n)(∂zn*(θ0)/∂θ − ∂zn(θ0)/∂θ) = op(1). The desired result follows from Lemma B.6.

References

Anselin, L. (1988). Spatial Econometrics: Methods and Models, Kluwer Academic Publishers, Dor-
drecht.

Breusch, T., Qian, H., Schmidt, P. and Wyhowski, D. (1999). Redundancy of moment conditions,
Journal of Econometrics 91: 89—111.

Hatanaka, M. (1974). An efficient two-step estimator for the dynamic adjustment model with
autoregressive errors, Journal of Econometrics 2: 199—220.

Horn, R. and Johnson, C. (1985). Matrix Analysis, Cambridge University Press, Cambridge.

Kelejian, H. H. and Prucha, I. R. (1998). A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances, Journal of Real Estate Finance and Economics 17: 99—121.

Kelejian, H. H. and Prucha, I. R. (1999). A generalized moments estimator for the autoregressive
parameter in a spatial model, International Economic Review 40: 509—533.

Kelejian, H. H. and Prucha, I. R. (2001). On the asymptotic distribution of the Moran I test statistic with applications, Journal of Econometrics 104: 219—257.

Lee, L. F. (2001). Generalized method of moments estimation of spatial autoregressive processes.


Manuscript, Department of Economics, Ohio State University.

Lee, L. F. (2003). Best spatial two-stage least squares estimators for a spatial autoregressive model
with autoregressive disturbances, Econometric Reviews 22: 307—335.

Lee, L. F. (2004). Asymptotic distributions of quasi-maximum likelihood estimators for spatial


econometric models, Econometrica 72: 1899—1926.

Lee, L. F. (2006). GMM and 2SLS estimation of mixed regressive, spatial autoregressive models. Working paper, Department of Economics, Ohio State University; forthcoming in Journal of Econometrics.

Lin, X. and Lee, L. F. (2006). GMM estimation of spatial autoregressive models with unknown heteroskedasticity. Working paper, Department of Economics, Ohio State University.

Liu, X., Lee, L. F. and Bollinger, C. (2006). Improved efficient quasi maximum likelihood estimator of
spatial autoregressive models. Working paper, Department of Economics, Ohio State University.

White, H. (1994). Estimation, Inference and Specification Analysis, Cambridge University Press, New York.

Table 1: 2SLSE and GMME of the MRSAR model with SAR disturbances
(normal distribution; Wn = Mn = WA )
true parameters: λ0 = 0.2, ρ0 = 0.4, β 10 = 1.0, β 20 = 0, β 30 = −1.0
G2SLS QML GMM1 GMM2 BGMM
Mean(SD)[RMSE] Mean(SD)[RMSE] Mean(SD)[RMSE] Mean(SD)[RMSE] Mean(SD)[RMSE]
n = 245
λ 0.247(.143)[.151] 0.206(.140)[.140] 0.201(.142)[.142] 0.201(.143)[.143] 0.191(.163)[.164]
ρ 0.328(.148)[.165] 0.377(.149)[.150] 0.387(.146)[.147] 0.387(.149)[.149] 0.400(.162)[.162]
β1 1.002(.092)[.092] 0.999(.092)[.092] 0.997(.092)[.092] 0.997(.092)[.092] 0.994(.095)[.095]
β2 −0.003(.089)[.089] −0.003(.088)[.088] −0.003(.089)[.089] −0.003(.089)[.089] −0.003(.090)[.090]
β3 −1.004(.093)[.093] −1.000(.093)[.093] −0.999(.094)[.094] −0.999(.094)[.094] −0.996(.097)[.097]
n = 490
λ 0.219(.103)[.105] 0.201(.099)[.099] 0.197(.100)[.100] 0.198(.100)[.100] 0.194(.104)[.104]
ρ 0.365(.105)[.111] 0.391(.099)[.100] 0.396(.098)[.098] 0.396(.098)[.098] 0.402(.101)[.101]
β1 1.004(.062)[.062] 1.002(.063)[.063] 1.001(.062)[.062] 1.001(.062)[.062] 1.001(.063)[.063]
β2 −0.001(.064)[.064] −0.001(.064)[.064] −0.001(.064)[.064] −0.001(.064)[.064] −0.001(.065)[.065]
β3 −1.001(.065)[.065] −0.999(.065)[.065] −0.999(.066)[.066] −0.999(.066)[.066] −0.998(.066)[.066]

Table 2: 2SLSE and GMME of the MRSAR model with SAR disturbances
(t distribution; Wn = Mn = WA )
true parameters: λ0 = 0.2, ρ0 = 0.4, β 10 = 1.0, β 20 = 0, β 30 = −1.0
G2SLS QML GMM1 GMM2 BGMM
Mean(SD)[RMSE] Mean(SD)[RMSE] Mean(SD)[RMSE] Mean(SD)[RMSE] Mean(SD)[RMSE]
n = 245
λ 0.250(.140)[.149] 0.208(.142)[.142] 0.203(.141)[.141] 0.202(.143)[.143] 0.195(.153)[.153]
ρ 0.324(.150)[.169] 0.374(.151)[.153] 0.384(.146)[.147] 0.385(.149)[.150] 0.396(.156)[.156]
β1 1.001(.091)[.091] 0.997(.091)[.091] 0.996(.091)[.091] 0.996(.092)[.092] 0.995(.094)[.094]
β2 −0.003(.089)[.089] −0.002(.088)[.088] −0.003(.089)[.089] −0.003(.089)[.089] −0.003(.087)[.087]
β3 −1.004(.093)[.093] −1.000(.094)[.094] −0.999(.094)[.094] −0.999(.094)[.094] −0.997(.093)[.093]
n = 490
λ 0.218(.103)[.105] 0.200(.100)[.100] 0.196(.102)[.102] 0.195(.102)[.103] 0.190(.105)[.106]
ρ 0.364(.105)[.111] 0.391(.101)[.101] 0.397(.100)[.100] 0.397(.100)[.100] 0.405(.103)[.103]
β1 1.003(.063)[.063] 1.001(.063)[.063] 1.000(.063)[.063] 1.000(.063)[.063] 1.000(.063)[.063]
β2 −0.001(.064)[.064] −0.001(.064)[.064] −0.001(.064)[.064] −0.001(.064)[.064] −0.001(.064)[.064]
β3 −1.001(.065)[.065] −0.999(.065)[.065] −0.998(.065)[.065] −0.998(.065)[.065] −0.998(.065)[.065]

Table 3: 2SLSE and GMME of the MRSAR model with SAR disturbances
(symmetric bimodal mixture normal distribution; Wn = Mn = WA )
true parameters: λ0 = 0.2, ρ0 = 0.4, β 10 = 1.0, β 20 = 0, β 30 = −1.0
G2SLS QML GMM1 GMM2 BGMM
Mean(SD)[RMSE] Mean(SD)[RMSE] Mean(SD)[RMSE] Mean(SD)[RMSE] Mean(SD)[RMSE]
n = 245
λ 0.245(.141)[.148] 0.200(.141)[.141] 0.197(.141)[.141] 0.198(.140)[.140] 0.194(.136)[.136]
ρ 0.323(.145)[.164] 0.379(.148)[.149] 0.388(.143)[.143] 0.387(.141)[.142] 0.395(.138)[.139]
β1 1.000(.090)[.090] 0.996(.091)[.091] 0.995(.092)[.092] 0.995(.091)[.092] 0.994(.086)[.086]
β2 −0.002(.089)[.089] −0.002(.088)[.088] −0.002(.089)[.089] −0.002(.089)[.089] −0.002(.082)[.082]
β3 −1.000(.090)[.090] −0.995(.090)[.090] −0.995(.090)[.090] −0.995(.090)[.090] −0.994(.084)[.085]
n = 490
λ 0.220(.104)[.106] 0.202(.101)[.101] 0.197(.104)[.104] 0.197(.103)[.103] 0.198(.099)[.099]
ρ 0.362(.109)[.116] 0.387(.105)[.106] 0.394(.105)[.105] 0.393(.104)[.104] 0.395(.101)[.101]
β1 1.004(.066)[.066] 1.002(.066)[.066] 1.001(.066)[.066] 1.001(.066)[.066] 1.000(.064)[.064]
β2 0.001(.060)[.060] 0.001(.060)[.060] 0.001(.060)[.060] 0.001(.060)[.060] 0.001(.057)[.057]
β3 −1.002(.065)[.065] −1.000(.065)[.065] −0.999(.065)[.065] −0.999(.065)[.065] −0.999(.061)[.061]

Table 4: 2SLSE and GMME of the MRSAR model with SAR disturbances
(asymmetric bimodal mixture normal distribution; Wn = Mn = WA )
true parameters: λ0 = 0.2, ρ0 = 0.4, β 10 = 1.0, β 20 = 0, β 30 = −1.0
G2SLS QML GMM1 GMM2 BGMM
Mean(SD)[RMSE] Mean(SD)[RMSE] Mean(SD)[RMSE] Mean(SD)[RMSE] Mean(SD)[RMSE]
n = 245
λ 0.250(.153)[.161] 0.199(.151)[.151] 0.194(.153)[.153] 0.195(.151)[.151] 0.187(.155)[.155]
ρ 0.315(.155)[.177] 0.379(.156)[.158] 0.389(.152)[.153] 0.387(.150)[.151] 0.400(.150)[.150]
β1 0.999(.098)[.098] 0.995(.098)[.099] 0.994(.099)[.100] 0.994(.099)[.099] 0.992(.089)[.089]
β2 −0.001(.097)[.097] −0.001(.096)[.096] −0.001(.097)[.097] −0.001(.097)[.097] −0.001(.085)[.085]
β3 −0.999(.099)[.099] −0.994(.098)[.098] −0.993(.099)[.099] −0.994(.098)[.099] −0.993(.090)[.090]
n = 490
λ 0.224(.112)[.115] 0.202(.109)[.109] 0.196(.112)[.112] 0.197(.111)[.111] 0.195(.099)[.099]
ρ 0.355(.117)[.125] 0.384(.113)[.114] 0.392(.111)[.112] 0.392(.111)[.111] 0.397(.102)[.102]
β1 1.003(.071)[.071] 1.001(.071)[.071] 1.000(.071)[.071] 1.000(.071)[.071] 1.000(.062)[.062]
β2 0.002(.066)[.066] 0.002(.066)[.066] 0.002(.066)[.066] 0.002(.066)[.066] −0.001(.057)[.057]
β3 −1.001(.071)[.071] −0.999(.071)[.071] −0.998(.071)[.071] −0.998(.071)[.071] −0.998(.061)[.061]

Table 5: 2SLSE and GMME of the MRSAR model with SAR disturbances
(gamma distribution; Wn = Mn = WA )
true parameters: λ0 = 0.2, ρ0 = 0.4, β 10 = 1.0, β 20 = 0, β 30 = −1.0
G2SLS QML GMM1 GMM2 BGMM
Mean(SD)[RMSE] Mean(SD)[RMSE] Mean(SD)[RMSE] Mean(SD)[RMSE] Mean(SD)[RMSE]
n = 245
λ 0.243(.145)[.151] 0.201(.140)[.140] 0.196(.143)[.143] 0.196(.144)[.144] 0.190(.142)[.142]
ρ 0.324(.143)[.162] 0.378(.145)[.146] 0.388(.143)[.143] 0.387(.143)[.144] 0.400(.142)[.142]
β1 1.004(.090)[.090] 1.000(.091)[.091] 0.999(.091)[.091] 0.999(.091)[.091] 0.999(.074)[.074]
β2 −0.001(.090)[.090] −0.001(.090)[.090] −0.001(.090)[.090] −0.001(.090)[.090] 0.003(.069)[.070]
β3 −0.999(.091)[.091] −0.995(.091)[.091] −0.994(.091)[.091] −0.994(.091)[.091] −0.993(.071)[.071]
n = 490
λ 0.220(.104)[.106] 0.200(.101)[.101] 0.196(.101)[.101] 0.196(.102)[.102] 0.194(.079)[.080]
ρ 0.363(.105)[.111] 0.391(.100)[.100] 0.397(.099)[.099] 0.397(.100)[.100] 0.404(.085)[.085]
β1 1.000(.063)[.063] 0.998(.063)[.064] 0.997(.064)[.064] 0.997(.064)[.064] 0.998(.050)[.050]
β2 −0.000(.065)[.065] 0.000(.065)[.065] −0.000(.065)[.065] 0.000(.065)[.065] −0.001(.050)[.050]
β3 −1.001(.062)[.062] −0.998(.062)[.062] −0.998(.063)[.063] −0.998(.062)[.063] −0.998(.048)[.048]

Table 6: 2SLSE and GMME of the MRSAR model with SAR disturbances
(normal distribution; Wn = Mn = WC )
true parameters: λ0 = 0.2, ρ0 = 0.4, β 10 = 1.0, β 20 = 0, β 30 = −1.0
G2SLS QML GMM1 GMM2 BGMM
Mean(SD)[RMSE] Mean(SD)[RMSE] Mean(SD)[RMSE] Mean(SD)[RMSE] Mean(SD)[RMSE]
n = 245
λ 0.237(.108)[.114] 0.208(.104)[.104] 0.208(.106)[.106] 0.209(.102)[.103] 0.204(.106)[.106]
ρ 0.339(.113)[.128] 0.383(.106)[.107] 0.390(.109)[.109] 0.383(.103)[.104] 0.391(.106)[.106]
β1 1.003(.090)[.091] 0.998(.090)[.090] 0.998(.091)[.091] 0.998(.090)[.090] 0.997(.091)[.091]
β2 −0.002(.089)[.089] −0.003(.088)[.088] −0.003(.088)[.088] −0.003(.088)[.088] −0.003(.089)[.089]
β3 −1.007(.093)[.093] −1.002(.093)[.093] −1.002(.093)[.093] −1.002(.093)[.093] −1.000(.095)[.095]
n = 490
λ 0.219(.078)[.080] 0.206(.075)[.076] 0.203(.076)[.076] 0.204(.075)[.075] 0.203(.077)[.077]
ρ 0.366(.084)[.091] 0.388(.075)[.076] 0.393(.077)[.077] 0.389(.075)[.076] 0.393(.076)[.076]
β1 1.005(.063)[.063] 1.002(.063)[.063] 1.002(.063)[.063] 1.002(.063)[.063] 1.002(.064)[.064]
β2 −0.000(.064)[.064] −0.000(.064)[.064] −0.000(.064)[.064] −0.000(.064)[.064] −0.000(.064)[.064]
β3 −1.002(.066)[.066] −1.000(.066)[.066] −0.999(.066)[.066] −1.000(.066)[.066] −1.000(.067)[.067]

Table 7: 2SLSE and GMME of the MRSAR model with SAR disturbances
(t distribution; Wn = Mn = WC )
true parameters: λ0 = 0.2, ρ0 = 0.4, β 10 = 1.0, β 20 = 0, β 30 = −1.0
G2SLS QML GMM1 GMM2 BGMM
Mean(SD)[RMSE] Mean(SD)[RMSE] Mean(SD)[RMSE] Mean(SD)[RMSE] Mean(SD)[RMSE]
n = 245
λ 0.236(.110)[.116] 0.205(.114)[.114] 0.208(.106)[.106] 0.204(.110)[.110] 0.201(.112)[.112]
ρ 0.336(.115)[.132] 0.382(.116)[.118] 0.389(.105)[.106] 0.387(.109)[.110] 0.396(.112)[.112]
β1 1.001(.090)[.090] 0.995(.090)[.090] 0.997(.090)[.090] 0.995(.091)[.091] 0.995(.092)[.092]
β2 −0.002(.089)[.089] −0.002(.088)[.088] −0.002(.088)[.088] −0.002(.089)[.089] −0.003(.087)[.087]
β3 −1.006(.093)[.093] −1.000(.094)[.094] −1.002(.093)[.093] −1.001(.094)[.094] −1.000(.093)[.093]
n = 490
λ 0.219(.078)[.080] 0.202(.079)[.079] 0.204(.076)[.076] 0.203(.076)[.076] 0.201(.077)[.077]
ρ 0.366(.087)[.094] 0.389(.080)[.081] 0.393(.076)[.077] 0.391(.077)[.078] 0.395(.078)[.078]
β1 1.004(.064)[.064] 1.001(.064)[.064] 1.002(.064)[.064] 1.001(.064)[.064] 1.002(.064)[.064]
β2 −0.000(.064)[.064] −0.000(.063)[.063] −0.000(.064)[.064] −0.000(.064)[.064] −0.000(.064)[.064]
β3 −1.003(.065)[.065] −0.999(.065)[.065] −0.999(.065)[.065] −0.999(.066)[.066] −0.999(.065)[.065]

Table 8: 2SLSE and GMME of the MRSAR model with SAR disturbances
(symmetric bimodal mixture normal distribution; Wn = Mn = WC )
true parameters: λ0 = 0.2, ρ0 = 0.4, β 10 = 1.0, β 20 = 0, β 30 = −1.0
G2SLS QML GMM1 GMM2 BGMM
Mean(SD)[RMSE] Mean(SD)[RMSE] Mean(SD)[RMSE] Mean(SD)[RMSE] Mean(SD)[RMSE]
n = 245
λ 0.232(.113)[.117] 0.206(.104)[.104] 0.202(.108)[.108] 0.207(.100)[.101] 0.206(.094)[.094]
ρ 0.342(.117)[.130] 0.384(.107)[.108] 0.393(.111)[.111] 0.381(.102)[.103] 0.382(.094)[.096]
β1 1.002(.091)[.091] 0.998(.091)[.091] 0.996(.092)[.092] 0.998(.091)[.091] 0.997(.085)[.085]
β2 −0.001(.089)[.089] −0.002(.088)[.088] −0.001(.088)[.088] −0.001(.088)[.088] −0.002(.081)[.081]
β3 −1.002(.089)[.089] −0.998(.089)[.089] −0.997(.090)[.090] −0.998(.089)[.089] −0.997(.082)[.082]
n = 490
λ 0.219(.078)[.080] 0.207(.073)[.074] 0.204(.074)[.074] 0.207(.072)[.072] 0.207(.067)[.067]
ρ 0.367(.082)[.089] 0.387(.074)[.075] 0.393(.076)[.076] 0.387(.071)[.072] 0.386(.065)[.067]
β1 1.005(.066)[.066] 1.003(.066)[.066] 1.002(.066)[.066] 1.002(.066)[.066] 1.001(.064)[.064]
β2 0.001(.059)[.059] 0.001(.059)[.059] 0.001(.059)[.059] 0.001(.059)[.059] 0.001(.056)[.056]
β3 −1.004(.065)[.065] −1.002(.065)[.065] −1.001(.065)[.065] −1.001(.065)[.065] −1.002(.061)[.061]

Table 9: 2SLSE and GMME of the MRSAR model with SAR disturbances
(asymmetric bimodal mixture normal distribution; Wn = Mn = WC )
true parameters: λ0 = 0.2, ρ0 = 0.4, β 10 = 1.0, β 20 = 0, β 30 = −1.0
G2SLS QML GMM1 GMM2 BGMM
Mean(SD)[RMSE] Mean(SD)[RMSE] Mean(SD)[RMSE] Mean(SD)[RMSE] Mean(SD)[RMSE]
n = 245
λ 0.237(.122)[.128] 0.206(.114)[.114] 0.202(.117)[.117] 0.206(.111)[.111] 0.202(.102)[.102]
ρ 0.334(.123)[.140] 0.381(.117)[.118] 0.392(.120)[.120] 0.381(.111)[.112] 0.388(.104)[.105]
β1 1.001(.099)[.099] 0.996(.099)[.099] 0.995(.100)[.100] 0.996(.099)[.100] 0.995(.087)[.087]
β2 −0.000(.097)[.097] −0.001(.097)[.097] −0.000(.097)[.097] −0.000(.097)[.097] −0.001(.085)[.085]
β3 −1.001(.097)[.097] −0.997(.097)[.097] −0.995(.099)[.099] −0.996(.098)[.098] −0.997(.086)[.086]
n = 490
λ 0.222(.086)[.089] 0.208(.082)[.082] 0.205(.082)[.082] 0.207(.081)[.081] 0.204(.070)[.070]
ρ 0.362(.089)[.096] 0.385(.081)[.083] 0.391(.082)[.082] 0.386(.079)[.081] 0.391(.072)[.073]
β1 1.004(.071)[.071] 1.002(.071)[.071] 1.001(.071)[.071] 1.001(.071)[.071] 1.001(.062)[.062]
β2 0.002(.065)[.065] 0.002(.065)[.065] 0.002(.065)[.065] 0.002(.065)[.065] 0.000(.057)[.057]
β3 −1.003(.071)[.071] −1.001(.071)[.071] −1.000(.071)[.071] −1.000(.071)[.071] −1.000(.061)[.061]

Table 10: 2SLSE and GMME of the MRSAR model with SAR disturbances
(gamma distribution; Wn = Mn = WC )
true parameters: λ0 = 0.2, ρ0 = 0.4, β 10 = 1.0, β 20 = 0, β 30 = −1.0
G2SLS QML GMM1 GMM2 BGMM
Mean(SD)[RMSE] Mean(SD)[RMSE] Mean(SD)[RMSE] Mean(SD)[RMSE] Mean(SD)[RMSE]
n = 245
λ 0.229(.115)[.119] 0.196(.117)[.117] 0.199(.113)[.113] 0.196(.115)[.115] 0.193(.101)[.101]
ρ 0.338(.117)[.132] 0.388(.115)[.116] 0.395(.115)[.115] 0.391(.113)[.113] 0.403(.106)[.106]
β1 1.005(.091)[.091] 0.999(.091)[.091] 1.000(.091)[.091] 0.999(.091)[.091] 1.000(.072)[.072]
β2 −0.001(.089)[.089] −0.001(.088)[.088] −0.001(.089)[.089] −0.001(.089)[.089] 0.002(.069)[.069]
β3 −1.000(.091)[.091] −0.994(.093)[.093] −0.995(.093)[.093] −0.994(.093)[.093] −0.994(.071)[.071]
n = 490
λ 0.218(.078)[.080] 0.202(.078)[.078] 0.203(.075)[.075] 0.202(.075)[.075] 0.202(.061)[.061]
ρ 0.367(.082)[.088] 0.393(.077)[.077] 0.396(.075)[.075] 0.394(.074)[.074] 0.398(.065)[.065]
β1 1.001(.064)[.064] 0.998(.064)[.064] 0.998(.064)[.064] 0.998(.064)[.064] 0.998(.050)[.050]
β2 0.000(.064)[.064] 0.000(.064)[.064] 0.000(.064)[.064] 0.000(.064)[.064] −0.000(.049)[.049]
β3 −1.001(.063)[.063] −0.998(.063)[.063] −0.998(.063)[.063] −0.998(.063)[.063] −0.999(.048)[.048]
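For reference, each Mean(SD)[RMSE] cell in Tables 1-10 summarizes the Monte Carlo estimates of a single parameter. A minimal sketch of the computation (illustrative only; the draws and the number of replications below are fabricated):

```python
import numpy as np

def mc_summary(estimates, true_value):
    est = np.asarray(estimates)
    mean = est.mean()
    sd = est.std()                                    # spread around the MC mean
    rmse = np.sqrt(((est - true_value) ** 2).mean())  # combines bias and spread
    return mean, sd, rmse

rng = np.random.default_rng(5)
draws = 0.2 + 0.14 * rng.standard_normal(400)  # fabricated estimates of lambda0 = 0.2
print(mc_summary(draws, true_value=0.2))
```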

