Professional Documents
Culture Documents
Yu Et Al (2008) - Quasi-Maximum Likelihood Estimators For Spatial Dynamic Panel Data With Fixed Effects When Both N and T Are Large
Yu Et Al (2008) - Quasi-Maximum Likelihood Estimators For Spatial Dynamic Panel Data With Fixed Effects When Both N and T Are Large
Journal of Econometrics
journal homepage: www.elsevier.com/locate/jeconom
Quasi-maximum likelihood estimators for spatial dynamic panel data with fixed
effects when both n and T are largeI
Jihai Yu a , Robert de Jong b , Lung-fei Lee b,∗
a
Department of Economics, University of Kentucky, United States
b
Department of Economics, Ohio State University, United States
article info a b s t r a c t
Article history: This paper investigates the asymptotic properties of quasi-maximum likelihood estimators for spatial
Received 24 November 2006 dynamic panel data with fixed effects, when both the number of individuals n and the number of time
Received in revised form periods T are large. We consider the case where T is asymptotically large relative to n, the case where
4 August 2008
T is asymptotically proportional to n, and the case where n is asymptotically
√ large relative to T . In the
Accepted 6 August 2008
Available online 13 August 2008
case where T is asymptotically large relative to n, the estimators are nT consistent and asymptotically
normal, with the√limit distribution centered around 0. When n is asymptotically proportional to T , the
JEL classification: estimators are nT consistent and asymptotically normal, but the limit distribution is not centered
C13 around 0; and when n is large relative to T , the estimators are T consistent, and have a degenerate limit
C23
√
distribution. The estimators of the fixed effects are T consistent and asymptotically normal. We also
Keywords: propose a bias correction for our estimators. We show that when T grows faster than n1/3 , the correction
Spatial autoregression will asymptotically eliminate the bias and yield a centered confidence interval.
Dynamic panels © 2008 Elsevier B.V. All rights reserved.
Fixed effects
Maximum likelihood estimation
Quasi-maximum likelihood estimation
Bias correction
theoretical analysis on the asymptotic properties of the ML state consumption growth. As a recursive model, the parameters
estimator (MLE) and the QML estimator (QMLE). The asymptotics including the fixed effects can be estimated by OLS (within
will be based on both n, the cross sectional units, and T , the time estimator). Korniotis (2005) has also considered a bias adjusted
length, go to infinity, or n being a fixed finite integer, while T goes within estimator, which generalizes that in Hahn and Kuersteiner
to infinity. The case with both n and T going to infinity will be the (2002). For the dynamic spatial model considered in this paper,
main interest. as the contemporaneous spatial lag is presented, the QMLEs of
As our model includes the dynamic panel data model without the parameters are nonlinear. Our asymptotic analysis is more
spatial dependence as a special case, estimation issues of the complex, but our assumptions are more general. The asymptotics
dynamic panel data models in the existing econometric literature in Hahn and Kuersteiner (2002) is based on the scenario that n
are relevant. When the time dimension T is fixed, we are likely and T diverge at a proportional rate. Our asymptotic analysis can
to encounter the ‘‘incidental parameters’’ problem discussed in cover this scenario and also scenarios that n may go to infinity
Neyman and Scott (1948). This is because the introduction of fixed faster than T , and vice versa. Following the literature on bias
effects increases the number of parameters to be estimated. In a correction, we have also considered a bias-adjusted estimator for
our QMLE and its asymptotic properties. Monte Carlo experiments
simple dynamic panel data model with fixed effects, the MLE of
are conducted to provide some finite sample properties of the
the autoregressive coefficient, which is also known as the within
estimators. This paper is theoretic and does not provide an
group estimator, is biased and inconsistent when n tends to infinity
empirical application. But it is interesting to note that the empirical
but T is fixed (Nickell, 1981; Hsiao, 1986). To avoid the incidental
study on interregional trade with a historical panel data on Chinese
parameters problem in estimation, alternative estimation methods
rice price by Keller and Shiue (2007) allows own time and spatial
have been introduced. By taking time differences to eliminate the
time lags in addition to a contemporaneous spatial lag in their
fixed effects in either the dynamic equation or the construction spatial model.1
of instrumental variables (IV), Anderson and Hsiao (1981) show This paper is organized as follows. In Section 2, we introduce the
that IV methods can provide consistent estimates. Arellano and model, and explain our estimation method, which is a concentrated
Bond (1991) and Arellano and Bover (1995) generalize Anderson QML estimation. With the law of large numbers and central limit
and Hsiao (1981) with many more IV moments, by exploring theorem for our setting developed in the Appendix, Section 3
all possible time lag values of the dependent variable in each establishes the consistency and asymptotic distributions of MLE
time period. Blundell and Bond (1998) have considered system and QMLE. We also propose an analytical bias correction for our
estimators, including moments of both levels and first differences estimators. We show that when T grows faster than n1/3 , this
in Arellano and Bond (1991) and Arellano and Bover (1995). Bun correction will eliminate the bias, and yield a centered confidence
and Kiviet (2006) derive higher order asymptotic approximation interval. Section 4 concludes the paper. Some useful lemmas and
of the finite sample bias for the system estimator under various proofs are collected in the Appendix.2
circumstances, as both N and T are small or moderately large.
When T is finite, additional IVs can improve the efficiency of 2. The model and concentrated likelihood function
the estimators, even though finite sample biases remain. When
both n and T go to infinity, the incidental parameters issue in 2.1. The model
the MLE becomes less severe as each individual fixed effect can
be consistently estimated. However, Hahn and Kuersteiner (2002) The model considered in this paper is
and Alvarez and Arellano (2003) have found the existence of Ynt = λ0 Wn Ynt + γ0 Yn,t −1 + ρ0 Wn Yn,t −1 + Xnt β0 + cn0 + Vnt ,
asymptotic bias of order O(1/T ) in the MLE of the autoregressive
t = 1, 2, . . . , T , (1)
parameter when both n and T tend to infinity at a proportional rate.
In addition to the MLE, Alvarez and Arellano (2003) also investigate where Ynt = (y1t , y2t , . . . , ynt )0 and Vnt = (v1t , v2t , . . . , vnt )0 are
the asymptotic properties of the IV estimators in Arellano and n × 1 column vectors and vit is i.i.d.. across i and t with zero
Bond (1991). They have found the presence of asymptotic bias of a mean and variance σ02 , Wn is an n × n spatial weights matrix,
similar order to that of the MLE and the IV estimators, due to the which is predetermined and generates the spatial dependence
presence of many moment conditions. The presence of asymptotic between cross sectional units yit , Xnt is an n × kx matrix of
bias is an undesirable feature of these estimates. nonstochastic regressors, and cn0 is n × 1 column vector of fixed
Kiviet (1995), Hahn and Kuersteiner (2002), and Bun and effects. Therefore, the total number of parameters in this model
Carree (2005) have constructed bias corrected estimators for is equal to the number of individuals n plus the dimension of the
the dynamic panel data model, by analytically modifying the common parameters (γ , ρ, β 0 , λ, σ 2 )0 , which is kx + 4.
Define Sn ≡ Sn (λ0 ) = In − λ0 Wn . Then, presuming Sn is
within estimator. Hahn and Kuersteiner (2002) provide a rigorous
invertible and denoting An = Sn−1 (γ0 In + ρ0 Wn ), (1) can be
asymptotic theory for the within estimator and their bias corrected
estimator, when both n and T go to infinity with a same rate. As rewritten as Ynt = An Yn,t −1 +Sn−1 Xnt β0 +Sn−1 cn0 +Sn−1 Vnt . Assuming
an alternative to the analytical bias correction, Hahn and Newey that the infinite sums are well-defined, by continuous substitution,
∞
(2004) have considered also the Jackknife bias reduction approach. X
For the SAR model, Kelejian and Prucha (1998) provide Ynt = Ahn Sn−1 (cn0 + Xn,t −h β0 + Vn,t −h )
h=0
a theoretical foundation for asymptotic analysis for their IV
estimator. Lee (2004) analyzes the asymptotic properties of the = µn + Xnt β0 + Unt , (2)
QMLE. Kapoor et al. (2007) extend their asymptotic analysis of where µn ≡
P∞ h −1
P∞ h −1
IV and method of moments estimators to a spatial panel model P∞ h −1 h=0 An Sn cn0 , Xnt ≡ h=0 An Sn Xn,t −h , and Unt ≡
h=0 An Sn Vn,t −h .
with error components, where T is a fixed finite integer. To the
best knowledge of the authors, there is little analytical work
1 However, error components have not been considered in their empirical
done on estimates of spatial dynamic models, when both n and
T are large, with the exception of Korniotis (2005). The model models and no theoretic properties of the estimates are investigated in the paper.
2 Due to space limitation, at the request of the editor and referees, some of the
considered in Korniotis (2005) is a time-space recursive model
proofs have been condensed and removed. The detailed proofs and intermediate
in that only individual time lag and spatial time lag are present, steps in some derivations can be found in the working paper version of this
but not contemporaneous spatial lag. Fixed effects are included paper. The working paper under the same title is available on the web site:
in the model, and this model has an empirical application to US http://economics.sbs.ohio-state.edu/lee/.
120 J. Yu et al. / Journal of Econometrics 146 (2008) 118–134
2.2. Concentrated likelihood function Assumption 3. Sn (λ) is invertible for all λ ∈ Λ. Furthermore, Λ is
compact and λ0 is in the interior of Λ.
Denote θ = (δ 0 , λ, σ 2 )0 and ζ = (δ 0 , λ, c0n )0 where δ =
(γ , ρ, β 0 )0 . At the true value, θ0 = (δ00 , λ0 , σ02 )0 and ζ0 = Assumption 4. The elements of Xnt are nonstochastic and bound-
(δ00 , λ0 , c0n0 )0 where δ0 = (γ0 , ρ0 , β00 )0 . The likelihood function of 1
PT 0
ed,4 uniformly in n and t. Also, limT →∞ nT t =1 X̃nt X̃nt exists and
(1) is3 is nonsingular.
nT nT
ln Ln,T (θ , cn ) = − ln 2π − ln σ 2 + T ln |Sn (λ)| Assumption 5. Wn is uniformly bounded in row and column sums
2 2 in absolute value (for short, UB).5 Also Sn−1 (λ) is UB, uniformly in
T
1 X 0 λ ∈ Λ.
− Vnt (ζ )Vnt (ζ ), (3)
2σ abs(Ahn ) is UB,6 where [abs(An )]ij = An,ij .
2
P∞
t =1 Assumption 6. h=1
For our analysis of the asymptotic properties of estimators, we 4 If X is allowed to be stochastic and unbounded, appropriate moment
nt
need the following assumptions:
conditions can be imposed instead.
5 We say a (sequence of n × n) matrix P is uniformly bounded in row and
n
Assumption 1. Wn is a constant spatial weights matrix and its
column sums if supn≥1 kPn k∞ < ∞ and supn≥1 kPn k1 < ∞, where kPn k∞ ≡
diagonal elements satisfy wn,ii = 0 for i = 1, 2, . . . , n. Pn Pn
sup1≤i,j≤n j=1 pij,n is the row sum norm and kPn k1 = sup1≤i,j≤n i=1 pij,n is
the column sum norm.
Assumption 2. The disturbances {vit }, i = 1, 2, . . . , n and t = 6 This assumption has effectively ruled out some cases, and, hence, imposed
1, 2, . . . , T , are i.i.d. across i and t with zero mean, variance σ02 limited dependence across units or time series. For example, if λ0n = 1 − 1/n under
and E |vit |4+η < ∞ for some η > 0. n → ∞, it is a near unit root case for a cross sectional spatial autoregressive model
and Sn−1 will not be UB. For spatial dynamic panel model, if λ0 + ρ0 + γ0 = 1, Ynt
might have deterministic trends as well as a nonstationary stochastic component
3 As T is large, we can ignore the influence of the initial condition. When T is fixed, (see Yu et al. (2007) for detail).
we need to specify the initial condition if MLE is used; and we may also consider 7 For the case W is not row normalized but its eigenvalues are real, Λ can be a
n
the estimation by the generalized method of moments where lagged dependent closed interval contained in (−1/ ωn,min , 1/ωn,max ) where ωn,min and ωn,max are
variables can be used as IVs. the minimum and maximum eigenvalues of Wn (Anselin, 1988).
J. Yu et al. / Journal of Econometrics 146 (2008) 118–134 121
γ +ρ ω γ +ρ ω
where Dn = diag{ 10−λ 0ω n1 , . . . , 10−λ 0ω nn } is the eigenvalue matrix 3.2. Distribution of QMLEs
0 n1 0 nn
of An . When |λ0 | + |γ0 | + |ρ0 | < 1, it is easy P∞to show
γ0 +ρ0 ωni
that | 1−λ ω | < 1 for all i = 1, . . . , n. Thus, h
h=0 An =
The asymptotic distribution of the QMLE θ̂nT can be derived
0 ni
∂ ln L (θ̂ )
= Rn (In − Dn ) Rn
P∞ n,T nT
h −1
h=0 Rn Dn Rn is a well defined matrix. −1 −1
from the Taylor expansion of ∂θ
around θ0 . At θ0 , from (35)
Assumption 6 imposes stronger convergence of this series in term and (36), the first order derivative of the concentrated likelihood
of absolute values and assumes UB as n → ∞. Assumption 7 function at θ0 is in (37) of Appendix C, which involves both linear
allows two cases: (i) n → ∞ as T → ∞; (ii) n is fixed as T → ∞. and quadratic functions of Ṽnt . Also, from (2),
Because (ii) is similar to a vector autoregressive (VAR) model with
restricted coefficients, our main interest is in (i); but our analysis ∗
Z̃nt = Z̃nt − (ŪnT ,−1 , Wn ŪnT ,−1 , 0n×kx ), (7)
is applicable to both cases. If Assumption 7 holds, then we say that
˜
= ((X̃ ˜
n, T → ∞ simultaneously. n,t −1 + Un,t −1 ), (Wn X̃n,t −1 + Wn Un,t −1 ), X̃nt )
∗
where Z̃nt
˜
with X̃ n,t −1 = Xn,t −1 − X̄nT ,−1 . Hence, Z̃nt has two compo-
3.1. Consistency of the concentrated estimator θ̂nT ∗
nents: one is Z̃nt , which is uncorrelated with Vnt ; the other is
For the concentrated log likelihood function (4) divided by −(ŪnT ,−1 , Wn ŪnT ,−1 , 0n×kx ), which is correlated with Vnt when
the sample size nT , the corresponding expected value function is t ≤ T − 1.
Qn,T (θ ) = E maxcn nT
1
ln Ln,T (θ , cn ), which is Hence, from the first order condition in (37) and the decompo-
∂ ln Ln,T (θ0 ) ∂ ln L∗n,T (θ0 )
1 1 1 1 sition of Z̃nt in (7), √1 ∂θ
= √1 ∂θ
− ∆nT where
Qn,T (θ ) = E ln Ln,T (θ ) = − ln 2π − ln σ + 2
ln |Sn (λ)| nT nT
nT 2 2 n 1 ∂ ln L∗n,T (θ0 )
√ =
1
T
1 X 0 nT ∂θ
− E Ṽnt (ζ )Ṽnt (ζ ). (5)
2σ 2 nT t =1 T
1 1 X ∗0
σ2 √ Z̃nt Vnt
0 nT t =1
To show the consistency of θ̂nT , we need the following uniform
1 1 X T T
1 1 X 0 0
convergence result. (Gn Z̃nt δ0 ) Vnt + 2 √ ,
(Vnt Gn Vnt − σ0 tr Gn )
∗ 0 2
σ2 √
(8)
0 nT t =1
σ0 nT t =1
Claim 1. Let Θ be any compact parameter space. Then under
T
1 1 X 0
( σ 2
)
p √ V V − n
Assumptions 1–7, nT 1
ln Ln,T (θ ) − Qn,T (θ ) → 0 uniformly in θ ∈ Θ 2σ04 nT t =1
nt nt 0
Theorem 1. Under Assumptions 1–7, if limT →∞ E HnT is nonsingu- symmetric matrix, with µ4 being the fourth moment of vit , where
p Gn,ii is the (i, i) entry of Gn . When Vnt are normally distributed,
lar, θ0 is globally identified and θ̂nT → θ0 .
Ωθ0 ,nT = 0 because µ4 − 3σ04 = 0 for a normal distribution. Denote
Theorem 2. Under Assumptions 1–7, θ0 is globally identified and Σθ0 = limT →∞ Σθ0 ,nT and Ωθ0 = limT →∞ Ωθ0 ,nT , then,
p
θ̂nT → θ0 if ∂ ln L∗n,T (θ0 ) ∂ ln L∗n,T (θ0 )
1 1
limn→∞ 1
ln σ 2 −10 −1
− ln σ (λ)Sn (λ) Sn (λ)
1 2 −1 0 −1
6= 0 for lim E √ ·√ = Σθ0 + Ωθ0 .
n 0 Sn Sn n n T →∞ nT ∂θ nT ∂θ 0
λ 6= λ0 .8 (11)
∂ ln L∗n,T (θ0 )
8 When 1
ln |σ02 Sn0−1 Sn−1 | The asymptotic distribution of √1 can be derived
n is finite, the condition is n
− nT ∂θ
1
n
ln |σ (λ)Sn (λ)Sn (λ)| 6= 0 for λ 6= λ0 .
2
n
0−1 −1
from the central limit theorem for martingale difference arrays
122 J. Yu et al. / Journal of Econometrics 146 (2008) 118–134
! !
∞
1 X
tr Ahn (θ ) Sn (λ)
−1
n h=0
! !
∞
1 X
An (θ ) Sn (λ)
h −1
tr Wn
n h=0
ϕn (θ ) =
! 0kx ×1 !
∞ ∞
1 1 1
X X
γ tr(Gn (λ) An (θ ) Sn (λ)) + ρ tr(Gn (λ)Wn
h −1
An (θ ) Sn (λ)) + tr Gn (λ)
h −1
n n n
h=0 h=0
1
2σ 2
Box I.
d
and we may choose9
→ N 0, Σθ−01 Σθ0 + Ωθ0 Σθ−01 ,
(12) " −1 #
1 ∂ 2 ln Ln,T (θ )
B̂nT = E ϕn (θ ) . (18)
where ϕθ0 ,nT = Σθ−01,nT ϕn is O(1). nT ∂θ ∂θ 0
θ =θ̂nT
When Tn → 0,
√ 9 An asymptotically equivalent alternative way is to replace Σ −1 by the
d θ0 ,nT
nT (θ̂nT − θ0 ) → N (0, Σθ−01 (Σθ0 + Ωθ0 )Σθ−01 ). (13) empirical Hessian matrix of the concentrated log likelihood function.
J. Yu et al. / Journal of Econometrics 146 (2008) 118–134 123
Table 1
Performance of estimators before bias correction
T n θ0 γ ρ β λ σ2
(1) 10 49 θ a
0 Bias −0.0628 −0.0031 −0.0077 −0.0024 −0.1168
SD 0.0322 0.0591 0.0452 0.0477 0.0566
RMSE 0.0733 0.0807 0.0635 0.0667 0.1352
CP 0.5020 0.9430 0.9290 0.9300 0.4530
(2) 10 49 θ0b Bias −0.0701 −0.0080 −0.0111 −0.0105 −0.1193
SD 0.0322 0.0570 0.0453 0.0457 0.0567
RMSE 0.0792 0.0779 0.0641 0.0639 0.1372
CP 0.4050 0.9300 0.9230 0.9370 0.4330
(3) 10 196 θ0a Bias −0.0625 −0.0036 −0.0076 −0.0024 −0.1105
SD 0.0161 0.0304 0.0226 0.0246 0.0285
RMSE 0.0647 0.0417 0.0320 0.0344 0.1146
CP 0.0310 0.9380 0.9260 0.9260 0.0580
(4) 10 196 θ0b Bias −0.0691 −0.0067 −0.0109 −0.0091 −0.1129
SD 0.0160 0.0292 0.0226 0.0236 0.0285
RMSE 0.0710 0.0405 0.0329 0.0322 0.1169
CP 0.0130 0.9300 0.9140 0.9320 0.0530
(5) 50 49 θ0a Bias −0.0121 −0.0018 −0.0008 0.0005 −0.0220
SD 0.0141 0.0260 0.0202 0.0213 0.0280
RMSE 0.0221 0.0350 0.0278 0.0288 0.0433
CP 0.8460 0.9460 0.9370 0.9480 0.8590
(6) 50 49 θ0b Bias −0.0132 −0.0024 −0.0009 −0.0006 −0.0221
SD 0.0139 0.0243 0.0203 0.0201 0.0281
RMSE 0.0224 0.0327 0.0279 0.0269 0.0435
CP 0.8310 0.9530 0.9340 0.9580 0.8570
(7) 50 196 θ0a Bias −0.0122 −0.0002 −0.0004 0.0012 −0.0211
SD 0.0071 0.0134 0.0101 0.0110 0.0140
RMSE 0.0148 0.0182 0.0139 0.0149 0.0271
CP 0.5990 0.9410 0.9450 0.9470 0.6530
(8) 50 196 θ0b Bias −0.0133 −0.0008 −0.0005 0.0004 −0.0212
SD 0.0070 0.0125 0.0101 0.0103 0.0141
RMSE 0.0156 0.0171 0.0140 0.0141 0.0273
CP 0.5040 0.9430 0.9480 0.9480 0.6640
θ0a = (0.2, 0.2, 1, 0.2, 1) and θ0b = (0.3, 0.3, 1, 0.3, 1).
√
We show that when T /n1/3 → ∞, θ̂nT 1
is nT consistent and and the spatial weights matrix we use is a rook matrix. We use
asymptotically centered normal, even when n/T → ∞. For T = 10 and T = 50, and n = 49 and n = 196. For each set
the bias corrected estimator, we need the following additional of generated sample observations, we calculate the MLE θ̂nT and
assumption. evaluate the bias θ̂nT − θ0 ; we then construct the bias corrected
estimator θ̂nT
1
and evaluate the bias θ̂nT
1
− θ0 . We do this 1000 times
h=0 An (θ ) and (θ ) are uniformly
P∞ h h−1
P∞
Assumption 9. h=1 hAn
bounded in either row sum or column sums, uniformly in a to see if the bias is reduced on average by using the analytical bias
i=1 (θ̂nT − θ0 )i with
P1000
neighborhood of θ0 .
1
correction procedure,11 i.e., to compare 1000
Assumption 9 can be justified by Lemma 14. Our result for the
i=1 (θ̂nT − θ0 )i . With two different values of θ0 for each n
1
P1000 1
bias corrected estimator is in Theorem 5. 1000
√ and T , finite sample properties of both estimators are summarized
Theorem 5. If T /n1/3 → ∞, under Assumptions 1–9, nT (θ̂nT
1
− in Table 1 and Table 2, where Table 1 is for the performance
d of the estimators before bias correction and Table 2 is for the
θ0 ) → N (0, Σθ−01 (Σθ0 + Ωθ0 )Σθ−01 ). performance after the bias correction. For each case, we report the
Hence, if T grows faster than n1/3 , the analytical bias correction bias (Bias), standard deviation (SD), root mean square error (RMSE)
will give us estimators that are asymptotically normal and and coverage probability (CP).
centered around θ0 . For the case Tn → k, θ̂nT 1
has removed the We can see that both estimators have some bias, but the
bias corrected estimators reduce those biases which are originally
asymptotic bias ϕθ0 ,nT . Note that T → k implies T /n1/3 → ∞. For
n
√ larger. This is consistent with our asymptotic analysis, because
the case n
T
→ ∞, as long as T /n1/3 → ∞, θ̂nT
1
is nT consistent, the bias corrected estimators will eliminate the bias of order
which is also an improvement upon the T consistency of θ̂nT . O(T −1 ). Also, bias reduction is achieved while there is no significant
Thus, θ̂nT
1
might have better performance in economic applications, increase in the variance of the estimators. Before bias correction,
especially when n is much larger than T . the CPs of the estimators under 95% confidence level have lower
values due to the bias, especially when n is relatively large. After
3.4. Monte Carlo results bias correction, the CPs are close to the specified 95% confidence
We conduct a small Monte Carlo experiment to evaluate the level.
performance of our MLEs and the bias corrected estimators. We For different cases of n and T , we can see that for each given n,
generate samples from (1) and use θ0a = (0.2, 0.2, 1, 0.2, 1)0 , θ0b = when T is larger, the biases of two sets of estimators will be smaller
(0.3, 0.3, 1, 0.3, 1)0 where θ0 = (γ0 , ρ0 , β00 , λ0 , σ02 )0 , and Xnt , cn0 and the variance will be smaller; for each given T , when n is larger,
and Vnt are generated from independent normal distributions10 the biases of two sets of estimators will be nearly the same, but
the variance will be smaller. This is consistent with our theoretical
prediction, because the bias is of the order O(T −1 ) and the variance
10 We generated the spatial panel data with 20 + T periods and then take the last
T periods as our sample. The initial value is generated as N (0, In ) in the simulation.
We have also generated the data with a much longer history 1000+T and the results 11 For n = 196 and T = 50, each iteration takes about 3 s on average using a
are similar. desktop with 4G memory and duo 2.66 GHz CPU.
124 J. Yu et al. / Journal of Econometrics 146 (2008) 118–134
Table 2
Performance of estimators after bias correction
T n θ0 γ ρ β λ σ2
(1) 10 49 θ a
0 Bias −0.0039 −0.0005 −0.0001 −0.0008 −0.0287
SD 0.0338 0.0623 0.0474 0.0483 0.0623
RMSE 0.0467 0.0857 0.0650 0.0671 0.0911
CP 0.9270 0.9260 0.9320 0.9360 0.8600
(2) 10 49 θ0b Bias −0.0038 0.0036 0.0004 −0.0039 −0.0322
SD 0.0337 0.0606 0.0475 0.0459 0.0623
RMSE 0.0470 0.0855 0.0653 0.0642 0.0921
CP 0.9130 0.8970 0.9340 0.9220 0.8510
(3) 10 196 θ0a Bias −0.0040 −0.0011 −0.0000 −0.0009 −0.0217
SD 0.0169 0.0320 0.0237 0.0249 0.0314
RMSE 0.0237 0.0441 0.0322 0.0346 0.0484
CP 0.9120 0.9240 0.9380 0.9270 0.8160
(4) 10 196 θ0b Bias −0.0035 0.0027 0.0003 −0.0037 −0.0250
SD 0.0168 0.0310 0.0237 0.0237 0.0314
RMSE 0.0236 0.0436 0.0322 0.0328 0.0497
CP 0.9110 0.9020 0.9390 0.9370 0.7950
(5) 50 49 θ0a Bias −0.0001 −0.0018 −0.0005 0.0005 −0.0025
SD 0.0143 0.0263 0.0204 0.0213 0.0286
RMSE 0.0197 0.0355 0.0280 0.0289 0.0393
CP 0.9400 0.9460 0.9370 0.9460 0.9300
(6) 50 49 θ0b Bias −0.0002 −0.0019 −0.0004 −0.0002 −0.0026
SD 0.0141 0.0246 0.0205 0.0201 0.0287
RMSE 0.0194 0.0332 0.0280 0.0269 0.0395
CP 0.9410 0.9470 0.9360 0.9570 0.9270
(7) 50 196 θ0a Bias −0.0002 −0.0001 −0.0001 0.0013 −0.0015
SD 0.0071 0.0136 0.0102 0.0110 0.0143
RMSE 0.0097 0.0184 0.0140 0.0149 0.0194
CP 0.9430 0.9380 0.9440 0.9470 0.9430
(8) 50 196 θ0b Bias −0.0003 −0.0003 −0.0001 0.0007 −0.0017
SD 0.0070 0.0127 0.0102 0.0104 0.0144
RMSE 0.0096 0.0173 0.0140 0.0141 0.0195
CP 0.9420 0.9410 0.9450 0.9440 0.9440
θ0a = (0.2, 0.2, 1, 0.2, 1) and θ0b = (0.3, 0.3, 1, 0.3, 1).
Table 3
Performance of estimators under non-normality
T n γ ρ β λ σ2
Before bias correction
(1) 10 49 Bias −0.0606 −0.0065 −0.0064 −0.0032 −0.1162
SD 0.0321 0.0590 0.0451 0.0477 0.1035
RMSE 0.0715 0.0798 0.0637 0.0663 0.1850
CP 0.5230 0.9590 0.9260 0.9330 0.6800
(2) 10 196 Bias −0.0621 −0.0019 −0.0089 −0.0002 −0.1124
SD 0.0161 0.0304 0.0226 0.0246 0.0530
RMSE 0.0644 0.0418 0.0326 0.0349 0.1294
CP 0.0310 0.9400 0.9160 0.9140 0.4390
(3) 50 49 Bias −0.0116 −0.0005 −0.0006 −0.0003 −0.0223
SD 0.0141 0.0260 0.0202 0.0213 0.0561
RMSE 0.0220 0.0351 0.0276 0.0294 0.0798
CP 0.8660 0.9410 0.9480 0.9440 0.9010
(4) 50 196 Bias −0.0121 −0.0009 −0.0001 0.0012 −0.0214
SD 0.0071 0.0134 0.0101 0.0110 0.0282
RMSE 0.0148 0.0182 0.0136 0.0151 0.0427
CP 0.5930 0.9470 0.9540 0.9340 0.8720
Our asymptotic analysis in this paper has focused on the spatial an asymptotic bias of order O( 1n ). So, contrary to the model with
dynamic model with fixed effects, but the remaining disturbances only individual effects, for the model with both individual and time
are i.i.d. across spatial units. We expect that our asymptotic effects, an asymptotic bias of order either O( 1n ) or O( T1 ) exists. Lee
analysis can be easily extended to dynamic panel models with error and Yu (2007) have also constructed a bias-corrected estimator
components, and spatially and serially correlated disturbances. The which can remove such biases, but will require conditions that
spatial panel data model in Baltagi et al. (2007) is an example. both T /n3 and n/T 3 go to zero. The model in this paper with
Their model is a regression panel model with serial correlation only individual effects is of interest in its own as it includes the
and spatial dependence in disturbances: Ynt = Xnt β0 + cn0 + nt scenarios of a fixed finite n or Tn → 0.13 Under such scenarios,
where nt = λ0 Wn nt + Unt and Unt = γ0 Un,t −1 + Vnt . Denote the spatial dynamic model can be regarded as a structural vector
the n-dimensional vector of total disturbances ηnt = cn0 + nt . autoregressive model with restricted coefficients.
The disturbance process implies the structure ηnt = λ0 Wn ηnt + For future research, it may be of interest to model common and
γ0 ηn,t −1 − γ0 λ0 Wn ηn,t −1 + cn0
∗ ∗
+ Vnt , where cn0 = (1 − γ0 )(In − persistent shocks directly as in Phillips and Moon (1999), Phillips
λ0 Wn )cn0 . The process of ηnt is in the form of our dynamic model and Sul (2003) and Pesaran (2006) in a random component or
∗
when cn0 is treated as fixed effects and with nonlinear constraints factor structural framework with the spatial setting. In addition, as
on the spatial and dynamic coefficients. Hence, our theory can be in Korniotis (2007), the model may be extended to accommodate
easily adopted to cover the estimation of this model for the case endogenous control variables. With endogenous control variables,
with T (and n) goes to infinity. a possible estimation method is the generalized method of
The dynamic panel model analyzed in this paper allows moments, if proper instrumental variables can be found. The
individual-invariant, time-varying exogenous variables in the method of maximum likelihood may also be possible if the model is
equation, but it does not incorporate cross-section dependence due expanded into a simultaneous equation system. These extensions
to unobserved macroeconomic variables or shocks. Such a cross- are of interest, as those features can be important in many
section dependence has been considered in some recent panel time macroeconomic applications.14 In addition to the above extension,
series models, e.g., Phillips and Sul (2003) and Pesaran (2006), it may also be of interest to extend the model to incorporate high
among others. As an extension of this paper, Lee and Yu (2007) order contemporaneous spatial lags and spatial time lags. With
have considered the ML estimation of the SDPD model with both high order spatial lags, the ML approach is not computationally
(additive) individual and time fixed effects. By estimating both
the individual and time fixed effects, the asymptotic bias problem 13 Lee and Yu (2007) have found a data transformation approach, which can avoid
becomes more severe. With only individual fixed effects, for the the additional bias caused by the time effects. However, the transformed approach
case that Tn → 0, as shown in this paper, the QMLE of θ is is valid only for spatial weights matrices with row-normalization.
asymptotically normal centered at 0 (without an asymptotic bias). 14 We appreciate referees for pointing out these important features in empirical
However, with both individual and time fixed effects, there exists macroeconomics models.
126 J. Yu et al. / Journal of Econometrics 146 (2008) 118–134
Table 4
Biases (1st row), SDs (2nd row) and RMSEs (3rd row) under different DGPs
σ02
where E ( nT U0nt Wnt ) = = O(1) and
PT P∞
let B1n and B2n be n × n nonstochastic square matrices. We first list 1 0
t =1 n
tr h=1 Pnh Qnh
the basic assumptions needed for those lemmas. σ02
E( )= P̈nh Q̈nh = O( ) where P̈nh and Q̈nh are
1 0
P∞ 0
1
Ū W̄nT
n nT n
tr h=1 T
Assumption A1. The disturbances {vit }, i = 1, 2, . . . , n and t = defined in (20).
1, 2, . . . , T , are i.i.d. across i and t with zero mean, variance σ02 PT PT
1 1
and E |vit |4+η < ∞ for some η > 0. Lemma 8. Under Assumptions A1–A4,
nT t =1
D̃0nt Ũnt =
nT t =1
PT
D̃0nt Unt = Op √1 , and 1
D̄0nt Ūnt = Op √1 .
Assumption A2. Pnh = B1n Pnh
and Qnh = where and B2n Qnh Pnh nT nT t =1 nT
Qnh are the Pn and Qn to the power of h. Furthermore, B1n , B2n ,
Lemma 9. Under Assumptions A1 and A4, for an n × n nonstochastic
h=1 abs(Pn ) and h=1 abs(Qn ) are UB, where [abs(Pn )]ij
P∞ h
P∞ h
=
UB matrix Bn ,
Pn,ij .
!
T T
1 X 0 1 X 0 1
Assumption A3. The elements of n×1 vector Dnt are nonstochastic Vnt Bn Vnt − E Vnt Bn Vnt = Op √ , (24)
and bounded, uniformly in n and t. nT t =1 nT t =1 nT
Assumption A4. n is a nondecreasing function of T . 1 0 1 0 1
V̄nT Bn V̄nT − E V̄nT Bn V̄nT = Op √ , (25)
P∞ n n nT 2
Lemma 1. With Unt and Wnt in (19), ŪnT = h=1 P̈nh Vn,T +1−h and !
PT T T
W̄nT = Q̈nh Vn,T +1−h where 1 X 0 1 X 0 1
h=1
Ṽnt Bn Ṽnt − E Ṽnt Bn Ṽnt = Op √ , (26)
h
nT t =1 nT t =1 nT
1 1X
( )
Pn1 + Pn2 + · · · + Pnh = Png for h ≤ T
where E ( nT ) = σ tr(Bn ) O(1) and
1
PT 0 1 2
T T t =1 Vnt Bn Vnt =
P̈nh =
g =1
(20) n 0
T
1X E( 1 0
V̄ B V̄
n nT n nT
)= 1
nT 0
2
σ
tr Bn ( )=O T .
1
for h > T ,
Pn,h−T +g
T
P∞E ([(Un,t0−1 )i ]2 ) (µ4 −
4
g =1 Lemma 10. P
Under Assumption A1, =
3σ04 ) h=1 j=1 [(Pnh )ij ]4 + 3σ04 [ h=1 (Pnh Pnh )ii ] .
P∞ n
P∞ P∞
and Q̈nh has the same pattern. Furthermore, h=1 P̈nh = h=1 Pnh ,
P∞ P∞ q
and h=1 Q̈nh = h=1 Qnh . Lemma 11. Under Assumptions A1, A2 and A4, (Ū0nT ,−1 V̄nT − E T
n
A1, for t ≥ s, E (Unt Wns ) = 0
q q
Lemma 2. Under Assumption Ū0nT ,−1 V̄nT ) = Op √1 where Tn E Ū0nT ,−1 V̄nT =
σ tr
n 1 2
T n 0
σ02 ( nt Wns ) = σ0 tr
P∞ 0
0 2
P∞ 0 T
h=1 Pn, t − s +h Qnh and E U h=1 Pn,t −s+h Qnh . P∞ q
n
h=1 Pnh + O T3
when T → ∞.
Lemma 3. Under Assumption A1, E (Vnt
0
B1n Vns )(Vng
0
B2n Vnh ) is equal
to (µ4 − 3σ04 ) i=1 B1,ii B2,ii + σ04 (tr B1n × tr B2n + tr B1n B2n +
Pn
Lemma 12. Let B− n denote the lower diagonal matrix constructed
tr B1n B02n ) for t = s = g = h; σ04 tr B1n × tr B2n for t = s 6= g = h; from a symmetric Bn by deleting the diagonal and the upper triangle
σ04 tr(B1n B02n ) for t = g 6= s = h; σ04 tr(B1n B2n ) for t = h =
6 s = g; entries. Under Assumptions A1 and A2, if Bn is UB and Kn is an
and 0 otherwise. n-dimensional nonstochastic vector with all its elements uniformly
bounded, then
Lemma 4. Under Assumption A1, for t ≥ s, i−1
( v ) − σ [ ( )− vec0D (Bn )vecD (Bn )]
1
PT n 2 1 2
tr B2n
P P
(a) nT t =1 i=1 j=1 bnij jt 2 0
∞ X
n
X 0 0
Cov(U0nt Wnt , U0ns Wns ) = (µ4 − 3σ04 ) (Pn0 ,t −s+h Qn,t −s+h )ii 1 T 0 − − 2 − −
−σ ( )] = Op √1nT .
P
= nT
[
t =1 Vnt Bn Bn Vnt 0 tr Bn Bn
h=1 i=1 T n i− 1
( v ) =
1
P P P 1
PT 0 −
" ! ! (b) nT t =1 i=1 kni j=1 bnij jt nT t =1 Kn Bn Vnt =
∞ ∞
.
X X
× (Pnh
0
Qnh )ii + σ04 tr Pnh Pn0 ,t −s+h 0
Qn,t −s+h Qnh Op √1
nT
h=1 h=1 T n i−1
( v )=
1 1
PT
U0n,t −1 B−
P P P
! !# (c) nT t =1 i=1 un,t −1,i j=1 bnij jt nT t =1 n Vnt =
∞ ∞
Op √1 .
X X
+ Qnh Pn0 ,t −s+h 0
Qn,t −s+h Pnh . nT
h=1 h=1 PT Pn PT
1 1 0 √1
(d) nT t =1 i=1 kni un,t −1,i = nT t =1 Kn Un,t −1 = Op nT
Lemma 5. Suppose Bn , Cnh and Dnh are P n × n square matrices with where vecD (Bn ) is the n-dimensional column vector formed by the the
∞ P∞
all elements being non-negative, and Bn , h=1 Cnh and h=1 Dnh are diagonal elements of Bn .
P∞
UB. Then, h=1 Cnh Bn Dnh is UB.
For the central limit theorem that follows, we will consider the
Lemma 6. Under Assumptions A1, A2 and A4, Var ( following form:
t =1 Unt Wnt ) =
PT 0
O(nT ). T
X
U0n,t −1 Vnt + D0nt Vnt + Vnt0 Bn Vnt − σ02 tr Bn
QnT =
Lemma 7. Under Assumptions A1, A2 and A4, t =1
T T
! T X
n
1 X 0 1 X 0 1
X
Unt Wnt − E Unt Wnt = Op √ , (21) = znt ,i ,
nT t =1 nT t =1 nT t =1 i=1
1 0
1 0
1
where Bn is an arbitrary n × n symmetric UB matrix,15 and znt ,i =
ŪnT W̄nT − E = Op √ , (22) (ui,t −1 + dnti )vit + bn,ii (vit2 − σ02 ) + 2( ij−
=1 bn,ij vjt )vit . Then, the
ŪnT W̄nT P 1
n n nT 2
!
T T
1 X 0 1 X 0 1 15 The assumption that B is symmetric is maintained w.l.o.g. since V 0 B V =
Ũnt W̃nt − E Ũnt W̃nt = Op √ , (23) n nt n nt
nT t =1 nT t =1 nT 0
Vnt [(Bn + B0n )/2]Vnt .
128 J. Yu et al. / Journal of Econometrics 146 (2008) 118–134
Fn,t ,i = σ (v11 , v21 , . . . , vn1 , . . . , v1,t −1 , . . . , uniformly in n, t and i (from Lemma 10), it follows that E |znt ,i |2+δ ≤
c1 c2 E |un,t −1,i |2+δ + c1 c4 = O(1) uniformly. Because σQnT 2+δ
=
vn,t −1 , v1t , . . . , vit ), (27)
1+ 2δ
O[(nT ) 1
PT Pn 2+δ 1
], one has 2+δ t =1 i=1 E |znt ,i | = O ,
then E (znt ,i |Fn,t ,i−1 ) = 0 and E (znt ,i |Fn,t −1,n ) = 0. As a convention, σQ
nT
δ
(nT ) 2
define Fn,t ,0 = Fn,t −1,n . Thus, {znt ,i , Fn,t ,i , 1 ≤ t ≤ T , 1 ≤ which goes to zero. This proves (i).
i ≤ n} forms a martingale difference array. To see explicitly To show (ii): Because znt ,i = (un,t −1,i + dnti + 2
Pi−1
bnij vjt )vit +
that this is a difference array, let j = n(t − 1) + i for 1 ≤ j=1
bnii (vit2 − σ02 ), it implies that
i ≤ n and 1 ≤ t ≤ T . Thus, j takes integer values from 1
to J where J = nT . The σ -field Fn,t ,i can be renamed as FJ ,j !2
i−1
and zJj = znt ,i . As E (zJ ,j |FJ ,j−1 ) = 0 because E (znt ,i |Fn,t ,i−1 ) = X
E( 2
znt |Fnt ,i−1 ) = σ 2
un,t −1,i + dnti + 2 bnij vjt
0 and E (znt ,i |Fn,t −1,n ) = 0, {znt ,i , Fn,t ,i } = {zJ ,j , FJ ,j−1 } is a ,i 0
znt ,i j=1
martingale difference array. Denote zJj∗ = znt ∗
,i = σ , where !
Pi−1 QnT i−1
X
znt ,i = (ui,t −1 + dnti )vit + bn,ii (v − σ ) + 2( j=1 bn,ij vjt )vit , we
2
it
2
0 + (µ4 − σ ) 4
0 b2nii + 2µ3 bnii un,t −1,i + dnti + 2 bnij vjt ,
PnT PT Pn j=1
will apply the martingale CLT to j=1 zJj = t =1 i=1 znt ,i . The
1
P T P n 2+δ
sufficient conditions are (i) 2+δ
σQ t =1 i=1 E |znt ,i | → 0 and as E (vit (vit2 − σ02 )) = µ3 and E (vit2 − σ02 )2 = µ4 − σ04 . Therefore,
nT
p
i=1 E (znt ,i |Fnt ,i−1 ) → 1.
T n
(ii) 21 2
P P
T X
n T X
n i−1
σ t =1 X X X
E (znt
2
,i |Fnt ,i−1 ) = σ0
2
(un,t −1,i + 2 bnij vjt )2
QnT
T T T
1 X 0 1 X 0 1 X 0
σ2 Z̃nt Z̃nt Z̃nt Wn Ỹnt Z̃nt Ṽnt (ζ )
t =1
σ 2
t =1
σ t =1
4
1 ∂ ln Ln,T (θ )
2
T T
1 1 X 1 X
(Wn Ỹnt )0 Wn Ỹnt + tr(G2n (λ)) ( ) (ζ )
0
= − ∗ W Ỹ
n nt Ṽnt
nT ∂θ ∂θ 0 nT
σ 2
t =1
σ t =1
4
T
nT 1 X 0
Ṽnt (ζ )Ṽnt (ζ )
∗ ∗ − 4+ 6
2σ σ t =1
Box II.
Appendix B. Lemmas for some statistics in the model Appendix C. Concentrated QMLEs
Denote Znt = (Yn,t −1 , Wn Yn,t −1 , Xnt ), we provide some lemmas C.1. Reduced form of (1)
related to Z̃nt , Z̄nT and Ṽnt , V̄nT .
As Znt = (Yn,t −1 , Wn Yn,t −1 , Xnt ), the reduced form of (1) can be
Lemma 15. Under Assumptions 1–7, for an n × n nonstochastic UB represented as
matrix Bn , Ynt = Sn−1 Znt δ0 + Sn−1 (cn0 + Vnt )
T
1 X 0
T
1 X 0
1
= Znt δ0 + λ0 Gn Znt δ0 + Sn−1 (cn0 + Vnt ), (34)
Z̃nt Bn Z̃nt − E Z̃nt Bn Z̃nt = Op √ , (28)
nT t =1 nT t =1 nT for t = 1, 2, . . . , T because Sn −1
= In + λ0 Gn . This implies that
T T
Ỹnt = Sn−1 Z̃nt δ0 + Sn−1 Ṽnt = Z̃nt δ0 + λ0 Gn Z̃nt δ0 + Sn−1 Ṽnt .
1 X 0 1 X 0 1 (35)
Z̃nt Bn Ṽnt − E Z̃nt Bn Ṽnt = Op √ , (29)
nT t =1 nT t =1 nT
T T C.2. The first and second order conditions
1 X 0 1 X 0 1
Ṽnt Bn Ṽnt − E Ṽnt Bn Ṽnt = Op √ , (30)
nT t =1 nT t =1 nT
For the concentrated likelihood function (4), the first order
derivatives are
Z̃nt Bn Z̃nt is O(1), E
1
PT 0 1
PT 0 1
where E t =1 t =1 Z̃nt Bn Ṽnt is O and
nT nT T
1 ∂ ln Ln,T (θ )
Ṽnt Bn Ṽnt is O(1).
1
PT 0
E t =1 √
nT
nT ∂θ
T
Lemma 16. Under Assumptions 1–7, for an n × n nonstochastic UB 1 1 X 0
matrix Bn , σ2 √ Z̃nt Ṽnt (ζ )
nT t =1
1 1 X T
1 0 1 0 1
, ,
(Wn Ỹnt ) Ṽnt (ζ ) − tr Gn (λ)
0
Z̄nT Bn Z̄nT − E Z̄nT Bn Z̄nT = Op √ (31) = 2√
(36)
n n nT σ nT t =1
T
1 1 X
1 0 1 0 1
, ( 0
(ζ ) (ζ ) σ 2
)
Z̄nT Bn V̄nT − E Z̄nT Bn V̄nT = Op √ (32) √ Ṽ Ṽnt − n
2σ 4 nt
n n nT nT t =1
1 0 1 0 1 and the second order derivatives are given in Box II.
V̄nT Bn V̄nT − E V̄nT Bn V̄nT = Op √ , (33)
n n nT 2 Hence,
1∂ ln Ln,T (θ0 )
where E 1n Z̄nT Bn Z̄nT is O(1), E 1n Z̄nT 1
and E 1n V̄nT
0 0
0
Bn V̄nT is O Bn V̄nT is √
1
T
nT ∂θ
O T
.
T
1 1 X 0
σ2 √ Z̃nt Ṽnt
From (7), Z̃nt = Z̃nt∗ − (ŪnT ,−1 , Wn ŪnT ,−1 , 0) where 0 nT t =1
˜ ˜ T T
Z̃nt∗
= ((X̃n,t −1 + Un,t −1 ), (Wn X̃ n,t −1 + Wn Un,t −1 ), X̃nt ) with
1 1 X 1 1 X 0 0
(Gn Z̃nt δ0 ) Ṽnt + 2 √ , (37)
(Ṽnt Gn Ṽnt − σ0 tr Gn )
0 2
= 2√
˜ σ
0 nT t =1 σ 0 nT t =1
X̃ n,t −1 = Xn,t −1 − X̄nT ,−1 . Hence Znt has two components: one
T
1 1 X 0
∗
is Z̃nt , uncorrelated with Vnt ; the other is −(ŪnT ,−1 Wn ŪnT ,−1 0), ( σ 2
)
√ Ṽ Ṽnt − n
2σ04 nt 0
correlated with Vnt when t ≤ T − 1. Following is a lemma related nT t =1
∗
to Z̃nt and Znt . and the information matrix is equal to the equation in Box III.
(2) (1)
Using Lemma 16, Σθ0 ,nT is O T1 . Hence, Σθ0 ,nT = Σθ0 ,nT +
Lemma 17. Under Assumptions 1–7, for an n × n nonstochastic UB 1
PT PT O T
.
1 1 1
0 ∗0 ∗
matrix Bn , E nT t =1 Z̃nt Bn Z̃nt − E nT t =1 Z̃nt Bn Z̃nt = O T where
is O(1).
1
PT ∗0 ∗
E nT t =1 Z̃nt Bn Z̃nt C.3. The variance of the gradient
Lemma 18. Under Assumptions 1–7, if the elements of cn0 are uni- ∗
From (8), as Z̃nt is uncorrelated with Vnt , we have the equation
t =1 ((Gn cn0 + Gn Znt δ0 )i ,
PT
formly bounded, then the elements of T1 in Box IV.
(Znt )i ) are Op (1) uniformly in n and i, where (Gn cn0 + Gn Znt δ0 )i is For the first matrix, it is equal to Σθ0 ,nT + O T1 using
the ith element of (Gn cn0 + Gn Znt δ0 ) and (Znt )i is the ith row of Znt .
PT ∗
Lemma 17. For the second matrix, E t =1 Z̃nt = 0n×(kx +2) and
130 J. Yu et al. / Journal of Econometrics 146 (2008) 118–134
1 ∂ 2 ln Ln,T (θ0 )
Σθ0 ,nT = −E = Σθ(01,)nT − Σθ(02,)nT where
nT ∂θ ∂θ 0
T
1 X
0
σ 2 nT E Z̃nt Z̃nt ∗ ∗
0 t =1
T T
(1) 1 1 1
Σθ0 ,nT
X X
= (Gn Z̃nt δ0 )0 Z̃nt (Gn Z̃nt δ0 )0 Gn Z̃nt δ0 + tr(G0n Gn ) + tr(G2n )
σ 2 nT E E ∗
0 t =1
σ 2
0 nT t =1
n
1 1
01×(kx +2) tr(Gn )
σ02 n 2σ04
0 ∗ ∗
(kx +2)×(kx +2)
1 E (G V̄ )0 Z̄ 2
E (Gn Z̄nT δ0 )0 Gn V̄nT +
1
tr(G0n Gn ) ∗
(2) n nT nT
and Σθ0 ,nT = σ02 n σ02 n nT
1 0 1 1 1 1
0
E (Gn Z̄nT δ0 )0 V̄nT + tr(Gn )
E Z̄nT V̄nT
σ04 n σ04 n σ02 nT T σ04
Box III.
Box IV.
When Vnt are normally distributed, Ωθ0 ,nT = 0(kx +4)×(kx +4) because for all i, j = 1, 2, . . . , kx + 4.
µ4 − 3σ04 = 0 for a normal distribution. Proof for (38). The detailed expressions of each entry of the
∂ 2 ln LnT (θ )
− (− nT1 ∂ ln∂θL∂θ nT (θ0 )
2
1
difference − nT ∂θ ∂θ 0 0 ) are straightforward
1 ∂ 2 ln LnT (θ) 1 ∂ ln LnT (θ)
2
1 ∂ 2 ln LnT (θ0 )
C.4. About − nT
E ∂θ∂θ 0 , − nT ∂θ∂θ 0
, − nT
E ∂θ∂θ 0 and from Box II. First, 1n tr(G2n (λ) − G2n ) = 1n tr [(Gn (λ̄))3 ](λ − λ0 ) where
1 ∂ ln LnT (θ0 )
2
− nT ∂θ∂θ 0
λ̄ lies between λ and λ0 . As 1n tr [(Gn (λ))3 ] is UB by Lemma A.7
in Lee (2004), 1n tr(G2n (λ) − G2n ) is of the order |λ − λ0 | · O(1).
Denote kθ − θ0 k as the Euclidean norm of θ − θ0 , and Θ1 as a Second, as Ṽnt (ζ ) = Ṽnt − (λ − λ0 )Wn Ỹnt − Z̃nt (δ − δ0 ) and
neighborhood of θ0 . We have Wn Ỹnt = Gn Z̃nt δ0 + Gn Ṽnt , using Lemma 15, all the entries in
the above matrices difference are of the same order as kθ − θ0 k,
1 ∂ 2 ln LnT (θ ) 1 ∂ 2 ln LnT (θ0 )
− − − multiplied by stochastic terms of orders not larger than Op (1).
nT ∂θ ∂θ 0 nT ∂θ ∂θ 0 1 ∂ ln LnT (θ )
− (− nT1 ∂ ln∂θL∂θ
nT (θ0 )
2 2
Hence, − nT ∂θ ∂θ 0 0 ) = kθ − θ0 k · Op (1).
= kθ − θ0 k · Op (1), (38)
∂ 2 ln LnT (θ0 )
Proof for (39). As Σθ0 ,nT = −E nT
1
, all the entries of the
1 ∂ 2 ln LnT (θ0 ) ∂θ ∂θ 0
1
− − Σθ0 ,nT = Op √ , (39)
difference (− 1 ∂ ln LnT (θ0 )
2
) − Σθ0 ,nT have zero means. The detailed
nT ∂θ ∂θ 0 nT nT ∂θ ∂θ 0
J. Yu et al. / Journal of Econometrics 146 (2008) 118–134 131
!
T
expressions of the entries are immediate from Box II evaluated at 1 X p
θ0 . UsingLemma − E 0
Ṽnt (ζ )Ṽnt (ζ ) → 0 uniformly in θ .
15, all the entries in above difference are of the nT t =1
order Op √1 .
nT
To prove Qn,T (θ ) is uniformly equicontinuous in θ in any compact
Proof for (40). Again, all the detailed expressions of entries of the parameter space Θ :
1
difference − nT
∂ 2 ln LnT (θ)
− (− nT1 E ∂ ∂θ∂θ
ln LnT (θ) 2
) follow from Box II. As We have QnT (θ ) = E nT1 ln Ln,T (θ ) = − 12 ln 2π −
∂θ∂θ 0 0
ln σ 2 + ln |Sn (λ)| − 2σ 12 nT E t =1 Ṽnt (ζ )Ṽnt (ζ ). As Ṽnt (ζ ) =
1 1 0
PT
Ṽnt (ζ ) = Ṽnt − (λ − λ0 )Wn Ỹnt − Z̃nt (δ − δ0 ) and Wn Ỹnt = Gn Z̃nt δ0 + 2 n
1
( )( ) tr Gn n 1
where ϕn is O(1) in Box I.
0 0 0
2
tr C n + C n Cn + C n ≥ 0 where C n = G n − n n
I , combined O T3
+ Op T
with the condition that limT →∞ E HnT is nonsingular, we have α2 = √
0 and hence α = 0. Proof of Theorem 3. According to the Taylor expansion, nT (θ̂nT
−1 ∂ ln L∗n,T (θ0 )
∂ 2 ln Ln,T (θ̄nT )
Ṽnt = n(T − 1)σ02 , at θ0 , (5) − θ0 ) = − nT1 √1 − ∆nT where
PT 0
Proof of Theorem 1. As E t =1 Ṽnt ∂θ ∂θ 0
· nT ∂θ
n(T −1)
implies E ln Ln,T (θ0 ) = − 2 ln 2π − nT
nT
ln σ02 + T ln |Sn | − 2 . ∂ ln L∗n,T (θ0 ) d
q q
2 √1
∂θ
→ N (0, Σθ0 + Ωθ0 ), ∆nT = n
T
ϕn + O n
T3
+
σ2 nT
Denote σn2 (λ) = n0 tr(Sn−10 Sn0 (λ)Sn (λ)Sn−1 ). By using Sn (λ)Sn−1 = q
In + (λ0 − λ)Gn for (43), it follows that Op 1
T
with ϕn = O(1) and θ̄nT lies between θ0 and θ̂nT .
1 ∂ ln Ln,T (θ̄nT ) 1 ∂ ln Ln,T (θ̄nT ) 1 ∂ ln Ln,T (θ0 )
2 2 2
1 1 As − nT = − − − +
E ln Ln,T (θ ) − E ln Ln,T (θ0 ) ∂θ ∂θ 0
nT ∂θ ∂θ 0 nT ∂θ ∂θ 0
nT nT ∂ 2 ln L (θ )
n,T 0
1 1 1 − nT1 ∂θ ∂θ 0
− Σθ0 ,nT + Σθ0 ,nT where the first term is
= − (ln σ 2 − ln σ02 ) + ln |Sn (λ)| − ln |Sn |
2 n n
! θ̄nT − θ0 · Op (1) from (38) and the second term is Op √1nT
T
1 1 X T −1 1 ∂ ln Ln,T (θ̄nT )
2
− E Ṽnt (ζ )Ṽnt (ζ ) −
0 from (39), − nT ∂θ ∂θ 0 = θ̄nT − θ 0 · O p (1 ) + O p √1 +
2σ 2 nT t =1 2T nT
Σθ0 ,nT . Because θ̄nT − θ0 = op (1) and Σθ0 ,nT is nonsingular
1 1 ∂ ln Ln,T (θ̄nT )
2
= T1,n (λ, σ ) − 2
T
2 2,n,T
(δ, λ) + o(1) in the limit, − is invertible for large n and T and
2σ nT
−1
∂θ ∂θ 0
√
∂ 2 ln Ln,T (θ̄nT )
where T1,n (λ, σ 2 ) = − 12 (ln σ 2 − ln σ02 ) + 1
ln |Sn (λ)| − 1
ln |Sn | − − nT1 ∂θ ∂θ 0 is Op (1). Then, nT (θ̂nT − θ0 ) = Op (1) ·
n n
(σ (λ) − σ ) and
1
q
2 2
2σ 2 n Op (1) + O n
T
, which implies that
T
1 X n r !!
T2,n,T (δ, λ) = E (Z̃nt (δ0 − δ) + (λ0 − λ)Gn Z̃nt δ0 )0 1 1
nT t =1 θ̂nT − θ0 = Op max , . (44)
o nT T
× (Z̃nt (δ0 − δ) + (λ0 − λ)Gn Z̃nt δ0 ) . √ q −1
Hence, nT (θ̂nT − θ0 ) = Σθ0 ,nT + Op max 1
,
nT T
1
·
Consider the process Ynt = λ0 Wn Ynt + Vnt for a period t, ∂ ln L∗n,T (θ0 )
the log likelihood function of this process is ln Lp,n (λ, σ 2 ) = √1
∂θ
− ∆nT . Using the fact that17
nT
− 2n ln 2π − 2n ln σ 2 + ln |Sn (λ)| − 2σ1 2 (Sn (λ)Ynt )0 Sn (λ)Ynt . Let Ep (·) r !!!−1
be the expectation operator for Ynt based on this process. It follows 1 1
that Ep ( 1n ln Lp,n (λ, σ 2 )) − Ep ( 1n ln Lp,n (λ0 , σ02 )) = − 12 (ln σ 2 − Σθ0 ,nT + Op max ,
nT T
ln σ ) +
2
0
1
n
ln |Sn (λ)| − 1
n
ln |Sn (λ0 )| − 2σ 2 (σ (λ) − σ ), which
1 2
n
2
!!
equals T1,n (λ, σ 2 ). By the information inequality, ln Lp,n (λ, σ 2 ) −
r
1 1
ln Lp,n (λ0 , σ02 ) ≤ 0. Thus, T1,n (λ, σ 2 ) ≤ 0 for any (λ, σ 2 ). Also,
= Σθ−01,nT + Op max , , (45)
nT T
T2,n,T (δ, λ) is a quadratic function of δ and λ. Under the condition
that limT →∞ E HnT is nonsingular, T2,n,T (δ, λ) > 0 whenever √ ∂ ln L∗n,T (θ0 )
we have nT (θ̂nT − θ0 ) = Σθ−01,nT · √1 +
(δ, λ) 6= (δ0 , λ0 ), so, (δ, λ) is globally identified. Given λ0 , σ02 is nT ∂θ
∂ ln L∗n,T (θ0 )
q
the unique maximizer of T1,n (λ0 , σ 2 ). Hence, (δ, λ, σ 2 ) is globally Op max 1
nT
, 1
T
· √1
∂θ
− Σθ−01,nT · ∆nT −
nT
identified. q √
Combined with uniform convergence and equicontinuity in Op max 1
nT
, T1 · ∆nT , which implies that nT (θ̂nT − θ0 ) +
Claim 1, the consistency follows. q
Σθ−01,nT · ∆nT + Op max 1
, 1 ∆nT = (Σθ−01,nT + op (1)) ·
nT T
Proof of Theorem 2. From proof of Theorem 1, 1
nT
E ln Ln,T (θ ) − ∂ ln L∗n,T (θ0 )
1
E ln Ln,T (θ0 ) = T1,n (λ, σ 2 ) − 2σ1 2 T2,n,T (δ, λ) + o(1). When
√1
∂θ
. As Σθ0 = limT →∞ Σθ0 ,nT exists, then using
nT
nT q √
limT →∞ E HnT is singular, δ0 and λ0 cannot be identified from
q q
Claim 2 and ∆nT = n
ϕn + O n
+ Op 1
, nT (θ̂nT −
T2,n,T (δ, λ). Global identification requires that the limit of T T3 T
∂ A (θ )
q
(θ ) ∂θn 0 . As (1) h=0 Ahn (θ ) and h=1 hAhn−1 (θ ) are
q q d
P∞ P∞ P∞
h−1
θ0 ) + n
T
Σθ−01,nT ϕn + Op max n
T3
, 1
T
→ N (0, Σθ−01 (Σθ0 + h=1 hAn
uniformly bounded in either row sum or column sum, uniformly
Ωθ0 )Σθ0 ). −1
in a neighborhood of θ0 , (2) Sn−1 (λ) is UB, also uniformly in λ in a
∂ ln Ln,T (θ ,cn ) neighborhood of λ0 and (3) Wn is UB, it follows that the elements
Proof of Theorem 4. From the first order condition ∂ cn
= ∂ϕn (θ )
of ∂θ 0 will be uniformly bounded in a neighborhood of θ0 . As θ̄nT
Vnt (ζ ), we have ĉnT (θ ) = (Sn (λ)Ynt − Znt δ). As
1
PT 1
PT
∂ϕn (θ̄nT )
σ2 t =1 T t =1
converges in probability to θ0 , elements of ∂θ 0
are Op (1).
Sn Ynt = Znt δ0 + cn0 + Vnt and Sn (λ)Sn−1 = In − (λ − λ0 )Gn , it
implies that ĉnT (θ ) = T1 t =1 ((In − (λ − λ0 )Gn )(Znt δ0 + cn0 + Vnt )
PT
− Znt δ). Hence, for each fixed effect,
References
T
1X
ĉi,nT (θ̂nT ) − ci,0 = − ((Gn cn0 + Gn Znt δ0 )i , (Znt )i ) Alvarez, J., Arellano, M., 2003. The time series and cross-section asymptotics of
T t =1 dynamic panel data estimators. Econometrica 71, 1121–1159.
T Anderson, T.W., Hsiao, C., 1981. Estimation of dynamic models with error
λ̂nT − λ0
1 X n o
components. Journal of the American Statistical Association 76, 598–606.
× + In − (λ̂nT − λ0 )Gn Vnt .
δ̂nT − δ0 T t =1 i Anselin, L., 1988. Spatial Econometrics: Methods and Models. Kluwer Academic, The
Netherlands.
Anselin, L., 2001. Spatial econometrics. In: Baltagi, Badi H. (Ed.), A Companion to
((Gn cn0 + Gn Znt δ0 )i , (Znt )i ) are Op (1) uni-
1
PT
As elements of T t =1 Theoretical Econometrics. Blackwell Publishers Lte, Massachusetts, Chapter 14.
Anselin, L., Bera, A.K., 1998. Spatial dependence in linear regression models with an
q
formly in n and i by Lemma 18 and θ̂nT −θ0 = Op max 1
,
nT T
1
introduction to spatial econometrics. In: Ullah, A., Giles, D.E.A. (Eds.), Handbook
of Applied Economics Statistics. Marcel Dekker, New York.
by Theorem 3, the dominant term of ĉi,nT (θ̂nT ) − ci,0 would be Arellano, M., Bond, O., 1991. Some tests of specification for panel data: Monte Carlo
√ d evidence and an application to employment equations. Review of Economic
vit . So, for each fixed effect, T ĉi,nT (θ̂nT ) − c0 → N (0,
1
PT
T t =1 Studies 58, 277–297.
σ02 ) and they are independent from each other asymptotically. Arellano, M., Bover, O., 1995. Another look at the instrumental-variable estimation
of error-components models. Journal of Econometrics 68, 29–51.
√ Baltagi, B., Song, S.H., Kon, W., 2003. Testing panel data regression models with
Proof
q of Theorem 5. Theorem
q 3qstates dthat nT (θ̂nT − θ0 ) + spatial error correlation. Journal of Econometrics 117, 123–150.
n
T
Σθ−01,nT ϕn + Op max n
T3
, T1 → N (0, Σθ−01 (Σθ0 + Ωθ0 ) Baltagi, B., Song, S.H., Jung, B.C., Kon, W., 2007. Testing for serial correlation, spatial
autocorrelation and random effects using panel data. Journal of Econometrics
√
= θ̂nT + T1 (− nT1 E ∂ ln∂θ∂θ
LnT (θnT ) −1
2 140, 5–51.
Σθ−01 ). As θ̂nT
1
0 ) ϕn (θ̂nT ), nT (θ̂nT 1
− Blundell, R., Bond, S., 1998. Initial conditions and moment restrictions in dynamic
q
d ∂ ln LnT (θ̂nT ) −1
2 panel data models. Journal of Econometrics 87, 115–143.
θ0 ) → N (0, Σθ0 (Σθ0 + Ωθ0 )Σθ0 ) if T ((− nT E ∂θ∂θ 0 ) ϕn
−1 −1 n 1
Bun, M., Carree, M., 2005. Bias-corrected estimation in dynamic panel data models.
p Journal of Business & Economic Statistics 3 (2), 200-210(11).
(θ̂nT ) − Σθ−01,nT ϕn (θ0 )) → 0 and n
T3
→ 0. Assuming that n
T3
→ 0, Bun, M.J.G., Kiviet, J.F., 2006. The effects of dynamic feedbacks and LS and
we are going to prove that MM estimator accuracy in panel data models. Journal of Econometrics 132,
!−1 409–444.
− E ∂ ln LnT (θ̂nT )
Cliff, A.D., Ord, J.K., 1973. Spatial Autocorrelation. Pion Ltd., London.
r 2
n 1 p
ϕn (θ̂nT ) − Σθ−01,nT ϕn (θ0 ) → 0. Cressie, N., 1993. Statistics for Spatial Data. Wiley, New York.
T nT ∂θ ∂θ 0 Gänsler, P., Stute, W., 1977. Wahrscheinlichkeitstheorie. Springer Verlag, New York.
Hahn, J., Kuersteiner, G., 2002. Asymptotically unbiased inference for a dynamic
(46) panel model with fixed effects when both n and T are Large. Econometrica 70
(4), 1639–1657.
∂ 2 ln LnT (θ̂nT ) Hahn, J., Newey, W., 2004. Jackknife and analytical bias reduction for nonlinear
1
From (44) and (45), − nT E ∂θ∂θ 0
= Σθ−01,nT + Op (max( T1 , √1nT )). panel models. Econometrica 72 (4), 1295–1319.
Hence, Horn, R., Johnson, C., 1985. Matrix Algebra. Cambridge University Press.
!−1 Hsiao, C., 1986. Analysis of Panel Data. Cambridge University Press.
∂ ln LnT (θ̂nT ) Kapoor, M., Kelejian, H.H., Prucha, I.R., 2007. Panel data models with spatially
r 2
n 1
− E ϕn (θ̂nT ) − Σθ−01,nT ϕn (θ0 ) correlated error components. Journal of Econometrics 140, 97–130.
T nT ∂θ ∂θ 0 Kelejian, H.H., Prucha, I.R., 1998. A generalized spatial two-stage least squares
r procedure for estimating a spatial autoregressive model with autoregressive
n disturbance. Journal of Real Estate Finance and Economics 17 (1), 99–121.
= Σθ−01,nT ϕn (θ̂nT ) − ϕn (θ0 ) Kelejian, H.H., Prucha, I.R., 2001. On the asymptotic distribution of the Moran I test
T
r statistic with applications. Journal of Econometrics 104, 219–257.
n 1 1 Kelejian, H.H., Robinson, D., 1993. A suggested method of estimation for spatial
+ ϕn (θ̂nT ) × Op max ,√ . interdependent models with autocorrelated errors, and an application to a
T T nT vounty expenditure model. Papers in Regional Science 72, 297–312.
Keller, W., Shiue, C.H., 2007. The origin of spatial interaction. Journal of
As θ̂nT − θ0 = Op max 1
T
, √1 and ϕn (θ0 ) is O(1), according to Econometrics 140, 304–332.
nT Kiviet, J., 1995. On bias, inconsistency, and efficiency of various estimators in
the Taylor expansion of ϕn (θ̂nT ) in Box I around ϕn (θ0 ), to prove dynamic panel data models. Journal of Econometrics 68, 81–126.
∂ϕn (θ̄nT ) Korniotis, G.M., 2005. A dynamic panel estimator with both fixed and spatial effects,
(46) is reduced to prove that elements of < ∞ where ∂θ 0 Manuscript, Department of Finance, University of Notre Dame.
θ̄nT lies between θ̂nT and θ0 . As An (θ ) = Sn (λ)(γ In + ρ Wn ), −1 Korniotis, G.M., 2007. Estimating panel models with internal and external habit
formation. Manuscript, Board of Governors of the Federal Reserve System,
∂ An (θ)
we have ∂γ = Sn−1 (λ), ∂ A∂ρ
n (θ)
= Sn−1 (λ)Wn , ∂ A∂β
n (θ)
i
= 0 for Washington, DC.
∂ An (θ) Lee, L.F., 2004. Asymptotic distributions of quasi-maximum likelihood estimators
i = 1, 2, . . . , kx and ∂λ
= Sn−1 (λ)Wn Sn−1 (λ)(γ In + ρ Wn ). for spatial econometric models. Econometrica 72 (6), 1899–1925.
∂ Ahn (θ) P∞ ∂ Ahn (θ )
(θ ) ∂ A∂θn (θ)
h−1 Lee, L.F., 2007. GMM and 2SLS estimation of mixed regressive, spatial autoregressive
Because18 ∂θ 0 = hAn 0 for h ≥ 1, h=1 ∂θ 0 = models. Journal of Econometrics 137, 489–514.
Lee, L.F, Liu, X., 2007. Efficient GMM estimation of high order spatail autoregressive
models with autoregressive disturbances. Working Paper, Department of
18 This can be proved by mathematical induction. Step (i) For h = 2, ∂ A2n (θ) = Economics, The Ohio State University.
∂λ
∂ An (θ)
An (θ) ∂λ + ∂ A∂λ
n (θ)
An (θ). Using Wn Sn−1 (λ) = Sn−1 (λ)Wn ,
∂ An (θ)
∂λ
An (θ) = Lee, L.F., Yu, J., 2007. A spatial dynamic panel data model with both time and
individual fixed effects. Working Paper, The Ohio State University.
∂ A2n (θ)
∂ A (θ) ∂ Ahn (θ )
An (θ) ∂λ
n
. So, ∂λ
= 2An (θ) ∂ A∂λ
n (θ )
. Step (ii) Suppose ∂λ = hAhn−1 (θ) ∂ A∂λ
n (θ)
, Neyman, J., Scott, E., 1948. Consistent estimates based on partially consistent
∂ Ahn+1 (θ) ∂ An (θ ) ∂ An (θ ) ∂ An (θ) observations. Econometrica 16 (1), 1–32.
then ∂λ
= hAn (θ) ∂λ An (θ) + An (θ) ∂λ = (h + 1)An (θ) ∂λ . Same
h−1 h h
Nickell, S.J., 1981. Biases in dynamic models with fixed effects. Econometrica 49,
∂ Ahn (θ )
arguments can be applied to other components of ∂θ . 1417–1426.
134 J. Yu et al. / Journal of Econometrics 146 (2008) 118–134
Ord, J.K., 1975. Estimation methods for models of spatial interaction. Journal of the Pötscher, B.M., Prucha, I.R., 1997. Dynamic Nonlinear Econometric Models,
American Statistical Association 70, 120–297. Asymptotic Theory. Springer, New York.
Pesaran, M.H., 2006. Estimation and inference in large heterogeneous panels with a Rothenberg, T.J., 1971. Identification in parametric models. Econometrica 39 (3),
multifactor error structure. Econometrica 74, 967–1012. 577–591.
Phillips, P.C.B., Moon, H.R., 1999. Linear regression limit theory for nonstationary
Yu, J., de Jong, R., Lee, L.-F., 2007. Quasi-maximum likelihood estimators for
panel data. Econometrica 67 (5), 1057–1111.
Phillips, P.C.B., Sul, D., 2003. Dynamic panel estimation and homogeneity tests spatial dynamic panel data with fixed effects when both n and T are large: A
under cross section dependence. Econometric Journal 6, 217–259. nonstationary case. Working Paper, The Ohio State University.