Nonstationary Panels, Panel Cointegration, and Dynamic Panels
Badi H. Baltagi
M. Douglas Berg
Richard Blundell
Stephen Bond
Jörg Breitung
Min-Hsien Chiang
Alain Hecq
Nazrul Islam
Chihwa Kao
Heikki Kauppi
Qi Li
Chris Murray
Franz C. Palm
David H. Papell
Peter Pedroni
Aman Ullah
Jean-Pierre Urbain
Frank Windmeijer
Showen Wu
Yong Yin
INTRODUCTION
Badi H. Baltagi, Thomas B. Fomby and R. Carter Hill
Twenty-two years ago, the first special issue on panel data econometrics was
published by the Annales de l'INSEE. This consisted of two volumes
containing a list of who's who in economics and econometrics of panel data
that was edited by Mazodier (1978). Since then, several books on panel data
have been written including the econometric society monograph by Hsiao
(1986), a two volume collection of classic papers on the subject by Maddala
(1993), a Handbook, which in its second edition contained 33 chapters edited
by Matyas & Sevestre (1996) and a textbook by Baltagi (1995a). Several
special issues of journals with a panel data theme have also appeared since
1978; these include Raj & Baltagi (1992), Matyas (1992), Carraro et al.
(1993), Baltagi (1995b), Sevestre (1999) and Banerjee (1999). There have been
nine international conferences on panel data since the first conference at
INSEE; the last one was held at the University of Geneva in June 2000. Panel
data econometrics continues to have an important impact on today's empirical
economics studies. A Journal of Economic Literature search returned 2780
citations using the words "panel data" between 1980 and 2000. This volume is
dedicated to two recent intensive areas of research in the econometrics of panel
data: nonstationary panels and dynamic panels (see the survey chapter by
Baltagi & Kao in this volume). The volume includes eleven refereed chapters on
this subject written by twenty authors. The editors are grateful to the authors
and referees for their cooperation.
The chapter by Baltagi & Kao surveys the nonstationary panels, cointegration in panels and dynamic panels literature. In particular, panel unit root tests
are considered first and several important chapters are reviewed including a
summary of the finite sample properties of these unit root tests obtained from
Nonstationary Panels, Panel Cointegration and Dynamic Panels, Volume 15, pages 1–5.
Copyright 2000 by Elsevier Science Inc.
All rights of reproduction in any form reserved.
ISBN: 0-7623-0688-2
included. Breitung suggests a test statistic that does not employ a bias
adjustment whose power is substantially higher than that of LL or the IPS tests
using Monte Carlo experiments. This chapter also finds that the power of the
LL and IPS tests is sensitive to the specification of the deterministic terms.
The chapter by Kao & Chiang studies the limiting distributions of ordinary
least squares (OLS), fully modified OLS (FMOLS) and dynamic OLS (DOLS)
estimators in a panel cointegrated regression model. This chapter shows that
the OLS, FMOLS and DOLS estimators are all asymptotically normally
distributed. However, the asymptotic distribution of the OLS estimator has a
non-zero mean. Extensive Monte Carlo experiments are performed which show
that the OLS estimator has a non-negligible bias in finite samples, the FMOLS
estimator does not improve on the OLS estimator in general, and the DOLS
estimator outperforms both OLS and FMOLS.
The chapter by Murray & Papell proposes a panel unit root test in the
presence of structural change. In particular, this chapter proposes a unit root
test for non-trending data in the presence of a one-time change in the mean for
a heterogeneous panel. The date of the break is endogenously determined. The
resultant test allows for both serial and contemporaneous correlation, both of
which are often found to be important in the panel unit root context. Murray
& Papell conduct two power experiments for panels of non-trending, stationary
series with a one-time change in means and find that conventional panel unit
root tests generally have very low power. Then they conduct the same
experiment using methods that test for unit roots in the presence of structural
change and find that the power of the test is much improved.
The chapter by Kauppi develops a new limit theory for panel data that may
be cross sectionally heterogeneous in a fairly general way. This limit theory
builds upon the concepts of joint convergence in probability and in distribution
for double indexed processes by Phillips & Moon (1999a). This limit theory is
applied to a panel regression model with regressors that are generated by an
autoregressive process with a root local to unity. The main results are the
following: (i) the usual pooled panel OLS estimator is invalid for inference, (ii)
a bias corrected pooled OLS proves to be $\sqrt{N}T$ consistent with an asymptotic
normal distribution centered on the true parameter value irrespective of
whether the regressors have near or exact unit roots. This positive result holds
only in the special case where the model does not exhibit any deterministic
effects, such as individual intercepts. (iii) The fully modified panel estimator of
Phillips & Moon (1999a) is also subject to severe bias effects if the regressors
are nearly rather than exactly cointegrated. These theoretical results are
confirmed using Monte Carlo results.
REFERENCES
Only references that are not cited later in the volume are given here.
Baltagi, B. H. (1995b). Editor's Introduction: Panel Data. Journal of Econometrics, 68, 1–4.
Banerjee, A. (1999). Panel Data Unit Roots and Cointegration: An Overview. Oxford Bulletin of
Economics and Statistics, 61, 607–629.
Carraro, C., Peracchi, F., & Weber, G. (Eds.) (1993). The Econometrics of Panels and Pseudo
Panels. Journal of Econometrics, 59, 1–211.
Hsiao, C. (1986). Analysis of Panel Data. Cambridge: Cambridge University Press.
Maddala, G. S. (Ed.) (1993). The Econometrics of Panel Data. Vols. 1 and 2. Cheltenham: Edward
Elgar.
Matyas, L. (Ed.) (1992). Modelling Panel Data. Structural Change and Economic Dynamics, 3,
291–384.
Matyas, L., & Sevestre, P. (Eds.) (1996). The Econometrics of Panel Data: Handbook of Theory
and Applications. Dordrecht: Kluwer Academic Publishers.
Mazodier, P. (Ed.) (1978). The Econometrics of Panel Data. Annales de l'INSEE, 30/31.
Raj, B., & Baltagi, B. (1992). Editors' Introduction and Overview: Panel Data Analysis. Empirical
Economics, 17, 1–8.
Sevestre, P. (1999). 1977–1997: Changes and Continuities in Panel Data. Annales d'Économie et
de Statistique, 55–56, 15–25.
NONSTATIONARY PANELS,
COINTEGRATION IN PANELS AND
DYNAMIC PANELS: A SURVEY
Badi H. Baltagi and Chihwa Kao
ABSTRACT
This chapter provides an overview of topics in nonstationary panels: panel
unit root tests, panel cointegration tests, and estimation of panel
cointegration models. In addition it surveys recent developments in
dynamic panel data models.
I. INTRODUCTION
Two important areas in the econometrics of panel data that have received much
attention recently are dynamic panel data¹ and nonstationary panel time series
models.² This special issue focuses on these two topics. With the growing use
of cross-country data over time to study purchasing power parity, growth
convergence and international R&D spillovers, the focus of panel data
econometrics has shifted towards studying the asymptotics of macro panels
with large N (number of countries) and large T (length of the time series) rather
than the usual asymptotics of micro panels with large N and small T. In fact, the
limiting distributions of double indexed integrated processes had to be
developed; see Phillips & Moon (1999a). The fact that T is allowed to increase
to infinity in macro panel data generated two strands of ideas. The first rejected
the homogeneity of the regression parameters implicit in the use of a pooled
Nonstationary Panels, Panel Cointegration and Dynamic Panels, Volume 15, pages 7–51.
Copyright 2000 by Elsevier Science Inc.
All rights of reproduction in any form reserved.
ISBN: 0-7623-0688-2
(1999a). On international R&D spillovers, see Funk (1998) and Kao, Chiang &
Chen (1999). On exchange rate models, see Groen & Kleibergen (1999), and
Groen (1999). On savings and investment models, see Coakley, Kulasi & Smith
(1996) and Moon & Phillips (1998).
The first part of this chapter surveys some of the developments in
nonstationary panel models that have occurred since the mid-1990s. Two
other recent surveys on this subject include Phillips & Moon (1999b) on multi-indexed processes and Banerjee (1999) on panel unit roots and cointegration
tests. We will pay attention to the following three topics: (1) panel unit root
tests, (2) panel cointegration tests, and (3) estimation and inference in the panel
cointegration models. The discussion of each topic will be illustrated by
examples taken from the aforementioned list of references. Section 2 reviews
panel unit root tests, while Section 3 discusses the panel spurious models.
Section 4 considers the panel cointegration tests, while Section 5 discusses
panel cointegration models. Section 6 reviews some recent developments in
dynamic panels and Section 7 gives our conclusion.
A word on notation. We write the integral $\int_0^1 W(s)\,ds$ as $\int W$ when there is no
ambiguity over limits. We define $\Omega^{1/2}$ to be any matrix such that
$\Omega = (\Omega^{1/2})(\Omega^{1/2})'$, use $\Rightarrow$ to denote weak convergence, $\xrightarrow{p}$ to denote convergence
in probability, $I(0)$ and $I(1)$ to signify a time series that is integrated of order
zero and one, respectively, and $W_Z(r) = W(r) - \left[\int W Z'\right]\left[\int Z Z'\right]^{-1} Z(r)$ to denote an
$L_2$ projection residual of $W(r)$ on $Z(r)$.
period January 1973 to February 1986. Breitung & Meyer (1994) applied
various modified DF test statistics to test for unit roots in a panel of contracted
wages negotiated at the firm and industry level for Western Germany over the
period 1972–1987. Quah (1994) suggested a test for unit root in a panel data
model without fixed effects where both N and T go to infinity at the same rate
such that N/T is constant. Levin & Lin (1992) generalized this model to allow
for fixed effects, individual deterministic trends and heterogeneous serially
correlated errors. They assumed that both N and T tend to infinity. However, T
increases at a faster rate than N with $N/T \to 0$. Even though this literature grew
from time series and panel data, the way in which N, the number of cross-section units, and T, the length of the time series, tend to infinity is crucial for
determining asymptotic properties of estimators and tests proposed for
nonstationary panels, see Phillips & Moon (1999a). Several approaches are
possible including: (i) sequential limits where one index, say N, is fixed and T
is allowed to increase to infinity, giving an intermediate limit. Then by letting
N tend to infinity subsequently, a sequential limit theory is obtained. Phillips &
Moon (1999b) argued that these sequential limits are easy to derive and are
helpful in extracting quick asymptotics. However, Phillips and Moon provided
a simple example that illustrates how sequential limits can sometimes give
misleading asymptotic results. (ii) A second approach, used by Quah (1994)
and Levin & Lin (1992) is to allow the two indexes, N and T to pass to infinity
along a specific diagonal path in the two dimensional array. This path can be
determined by a monotonically increasing functional relation of the type
$T = T(N)$ which applies as the index $N \to \infty$. Phillips & Moon (1999b) showed
that the limit theory obtained by this approach is dependent on the specific
functional relation T = T(N) and the assumed expansion path may not provide
an appropriate approximation for a given (T, N) situation. (iii) A third approach
is a joint limit theory allowing both N and T to pass to infinity simultaneously
without placing specific diagonal path restrictions on the divergence. Some
control over the relative rate of expansion may have to be exercised in order to
get definitive results. Phillips & Moon argued that, in general, joint limit theory
is more robust than either sequential limit or diagonal path limit. However, it
is usually more difficult to derive and requires stronger conditions such as the
existence of higher moments that will allow for uniformity in the convergence
arguments. The multi-index asymptotic theory in Phillips & Moon (1999a, b) is
applied to joint limits in which $T, N \to \infty$ and $(T/N) \to \infty$, i.e. to situations where
the time series sample is large relative to the cross-section sample. However,
the general approach given there is also applicable to situations in which
$(T/N) \to 0$, although different limit results will generally obtain in that case.
$$ y_{it} = \rho_i y_{i,t-1} + z_{it}'\gamma + u_{it}, \tag{1} $$
where $z_{it}$ is the deterministic component and $u_{it}$ is a stationary process. $z_{it}$ could
be zero, one, the fixed effects, $\mu_i$, or fixed effect as well as a time trend, $t$. The
Levin & Lin (LL) tests assume that $u_{it}$ are iid$(0, \sigma_u^2)$ and $\rho_i = \rho$ for all $i$. LL are
interested in testing the null hypothesis
$$ H_0 : \rho = 1. \tag{2} $$
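Let
$$ h(t, s) = z_t' \left( \sum_{t=1}^{T} z_t z_t' \right)^{-1} z_s, $$
$$ \tilde{u}_{it} = u_{it} - \sum_{s=1}^{T} h(t, s)\, u_{is}, $$
and
$$ \tilde{y}_{it} = y_{it} - \sum_{s=1}^{T} h(t, s)\, y_{is}. \tag{3} $$
Then we have, with $\hat{\rho}$ the OLS estimator of $\rho$,
$$ \sqrt{N}\,T(\hat{\rho} - 1) = \frac{\dfrac{1}{\sqrt{N}} \sum_{i=1}^{N} \dfrac{1}{T} \sum_{t=1}^{T} \tilde{y}_{i,t-1}\tilde{u}_{it}}{\dfrac{1}{N} \sum_{i=1}^{N} \dfrac{1}{T^2} \sum_{t=1}^{T} \tilde{y}_{i,t-1}^2} $$
and
$$ t_\rho = \frac{(\hat{\rho} - 1)\sqrt{\sum_{i=1}^{N} \sum_{t=1}^{T} \tilde{y}_{i,t-1}^2}}{s_e}, $$
where
$$ s_e^2 = \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} \tilde{u}_{it}^2. $$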
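$$ \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \frac{1}{T} \sum_{t=1}^{T} \tilde{y}_{i,t-1}\tilde{u}_{it} \Rightarrow \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \int W_{iZ}\, dW_{iZ} $$
and
$$ \frac{1}{N} \sum_{i=1}^{N} \frac{1}{T^2} \sum_{t=1}^{T} \tilde{y}_{i,t-1}^2 \Rightarrow \frac{1}{N} \sum_{i=1}^{N} \int W_{iZ}^2, $$
as $T \to \infty$. Next we assume that $\int W_{iZ}\, dW_{iZ}$ and $\int W_{iZ}^2$ are independent across $i$
and have finite second moments. Then it follows that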
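$$ \frac{1}{N} \sum_{i=1}^{N} \int W_{iZ}^2 \xrightarrow{p} E\left[\int W_{iZ}^2\right] $$
and
$$ \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \left( \int W_{iZ}\, dW_{iZ} - E\left[\int W_{iZ}\, dW_{iZ}\right] \right) \Rightarrow N\left(0, \mathrm{Var}\left[\int W_{iZ}\, dW_{iZ}\right]\right) $$
as $N \to \infty$.

$z_{it}$             $E[\int W_{iZ}^2]$     $\mathrm{Var}[\int W_{iZ}^2]$
$0$                  $1/2$                  $1/3$
$1$                  $1/2$                  ?
$\mu_i$              $1/6$                  $1/45$
$(\mu_i, t)$         $1/15$                 $11/6300$
(4)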
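Using (4), Levin & Lin (1992) obtain the following limiting distributions of
$\sqrt{N}\,T(\hat{\rho} - 1)$ and $t_\rho$:

$z_{it}$          $\hat{\rho}$                                                                 $t_\rho$
$0$               $\sqrt{N}\,T(\hat{\rho} - 1) \Rightarrow N(0, 2)$                            $t_\rho \Rightarrow N(0, 1)$
$1$               $\sqrt{N}\,T(\hat{\rho} - 1) \Rightarrow N(0, 2)$                            $t_\rho \Rightarrow N(0, 1)$
$\mu_i$           $\sqrt{N}\,T(\hat{\rho} - 1) + 3\sqrt{N} \Rightarrow N(0, 51/5)$             $\sqrt{1.25}\, t_\rho + \sqrt{1.875N} \Rightarrow N(0, 1)$
$(\mu_i, t)$      $\sqrt{N}\,(T(\hat{\rho} - 1) + 7.5) \Rightarrow N(0, 2895/112)$             $\sqrt{448/277}\,(t_\rho + \sqrt{3.75N}) \Rightarrow N(0, 1)$
(5)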
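For fixed $T$, Harris & Tzavalis (1999) obtain, as $N \to \infty$:

$z_{it}$          limiting distribution
$0$               $\sqrt{N}(\hat{\rho} - 1) \Rightarrow N\left(0, \dfrac{2}{T(T-1)}\right)$
$\mu_i$           $\sqrt{N}\left(\hat{\rho} - 1 + \dfrac{3}{T+1}\right) \Rightarrow N\left(0, \dfrac{3(17T^2 - 20T + 17)}{5(T-1)(T+1)^3}\right)$
$(\mu_i, t)$      $\sqrt{N}\left(\hat{\rho} - 1 + \dfrac{15}{2(T+2)}\right) \Rightarrow N\left(0, \dfrac{15(193T^2 - 728T + 1147)}{112(T+2)^3(T-2)}\right)$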
Harris & Tzavalis (1999) also showed that the assumption that T tends to
infinity at a faster rate than N as in LL rather than T fixed as in the case in micro
panels, yields tests which are substantially undersized and have low power
especially when T is small.
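As a rough illustration of the pooled statistics discussed above, the following Python sketch computes the pooled OLS estimate of $\rho$ for the case without deterministic terms ($z_{it} = 0$) and scales it in two ways: the LL normalization $\sqrt{N}T(\hat{\rho} - 1)$, whose limit is $N(0, 2)$, and the fixed-$T$ normalization of Harris & Tzavalis with variance $2/(T(T-1))$. The function name, the array layout, and the convention that the first column holds the initial observation are assumptions made for this example only; it is a sketch, not the authors' implementation.

```python
import numpy as np


def pooled_unit_root_stats(y):
    """Pooled OLS unit root statistics for z_it = 0 (no deterministic terms).

    y is an (N, T + 1) array: one row per cross-section, columns in time order,
    with the first column treated as the initial observation (an assumption).
    Returns (rho_hat, ll_stat, ht_stat); both statistics are scaled so that
    they are approximately N(0, 1) under the unit root null.
    """
    y = np.asarray(y, dtype=float)
    n, t_obs = y.shape[0], y.shape[1] - 1
    y_lag, y_cur = y[:, :-1], y[:, 1:]
    # Pooled OLS of y_it on y_{i,t-1} without intercept.
    rho_hat = np.sum(y_lag * y_cur) / np.sum(y_lag ** 2)
    # LL scaling: sqrt(N) * T * (rho_hat - 1) => N(0, 2).
    ll_stat = np.sqrt(n) * t_obs * (rho_hat - 1.0) / np.sqrt(2.0)
    # Fixed-T scaling: sqrt(N) * (rho_hat - 1) => N(0, 2 / (T (T - 1))).
    ht_stat = np.sqrt(n) * (rho_hat - 1.0) / np.sqrt(2.0 / (t_obs * (t_obs - 1)))
    return rho_hat, ll_stat, ht_stat


# Usage: a simulated panel of independent random walks starting at zero.
rng = np.random.default_rng(0)
shocks = rng.standard_normal((50, 25))
panel = np.hstack([np.zeros((50, 1)), np.cumsum(shocks, axis=1)])
print(pooled_unit_root_stats(panel))
```

The two scalings differ only by the factor $\sqrt{T/(T-1)}$, which is why the distinction matters mainly when $T$ is small.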
Recently, Frankel & Rose (1996), Oh (1996), and Lothian (1996) tested the
PPP hypothesis using panel data. All of these articles use LL tests and some of
them report evidence supporting the PPP hypothesis. O'Connell (1998),
however, showed that the LL tests suffered from significant size distortion in
the presence of correlation among contemporaneous cross-sectional error
terms. O'Connell highlighted the importance of controlling for cross-sectional
dependence when testing for a unit root in panels of real exchange rates. He
showed that, controlling for cross-sectional dependence, no evidence against
the null of a random walk can be found in panels of up to 64 real exchange
rates.
Virtually all the existing nonstationary panel literature assumes cross-sectional independence. It is true that the assumption of independence across i
is rather strong, but it is needed in order to satisfy the requirement of the
Lindeberg-Levy central limit theorem. Moreover, as pointed out by Quah
(1994), modeling cross-sectional dependence is involved because individual
observations in a cross-section have no natural ordering. Driscoll & Kraay
(1998) presented a simple extension of common nonparametric covariance
matrix estimation techniques which yields standard errors that are robust to
very general forms of spatial and temporal dependence as the time dimension
becomes large. In a recent paper, Conley (1999) presented a spatial model of
dependence among agents using a metric of economic distance that provides
cross-sectional data with a structure similar to time-series data. Conley
proposed a generalized method of moments (GMM) using such dependent data
and a class of nonparametric covariance matrix estimators that allow for a
general form of dependence characterized by economic distance.
B. Im, Pesaran & Shin (1997) Tests
The LL test is restrictive in the sense that it requires $\rho$ to be homogeneous
across i. As Maddala (1999) pointed out, the null may be fine for testing
convergence in growth among countries, but the alternative restricts every
country to converge at the same rate. Im, Pesaran & Shin (1997) (IPS) allow for
a heterogeneous coefficient of $y_{i,t-1}$ and proposed an alternative testing
procedure based on averaging individual unit root test statistics. IPS suggested
an average of the augmented DF (ADF) tests when $u_{it}$ is serially correlated with
different serial correlation properties across cross-sectional units, i.e.
$u_{it} = \sum_{j=1}^{p_i} \varphi_{ij} u_{i,t-j} + \varepsilon_{it}$. Substituting this $u_{it}$ in (1) we get:
$$ y_{it} = \rho_i y_{i,t-1} + \sum_{j=1}^{p_i} \varphi_{ij}\, \Delta y_{i,t-j} + z_{it}'\gamma + \varepsilon_{it}. \tag{6} $$
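$$ \bar{t} = \frac{1}{N} \sum_{i=1}^{N} t_{\rho_i}, \tag{7} $$
where $t_{\rho_i}$ is the individual t-statistic for testing $\rho_i = 1$ in (6), and
$$ t_{\rho_i} \Rightarrow \frac{\int W_{iZ}\, dW_{iZ}}{\left[\int W_{iZ}^2\right]^{1/2}} = t_{iT} \tag{8} $$
as $T \to \infty$. IPS assume that $t_{iT}$ are iid and have finite mean and variance. Then
$$ \frac{\sqrt{N}\left( \dfrac{1}{N} \sum_{i=1}^{N} t_{iT} - E[t_{iT} \mid \rho_i = 1] \right)}{\sqrt{\mathrm{Var}[t_{iT} \mid \rho_i = 1]}} \Rightarrow N(0, 1) \tag{9} $$
as $N \to \infty$, so that
$$ \frac{\sqrt{N}\left( \bar{t} - E[t_{iT} \mid \rho_i = 1] \right)}{\sqrt{\mathrm{Var}[t_{iT} \mid \rho_i = 1]}} \Rightarrow N(0, 1). \tag{10} $$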
hypothesis that these two series contain unit roots. Gerdtham & Löthgren
(1998) claimed that the stationarity found by McCoskey & Selden is driven by
the omission of time trends in their ADF regression in (6). Using the IPS test
with a time trend, Gerdtham & Löthgren found that both HE and GDP are
nonstationary. They concluded that HE and GDP are cointegrated around linear
trends following the results of McCoskey & Kao (1999b).
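The averaging idea behind the IPS t-bar statistic in (7) through (10) can be sketched in Python as follows. This is a simplified illustration under stated assumptions: each individual regression contains an intercept and no lagged differences (plain DF rather than ADF), the panel is balanced, and the mean and variance of the individual statistic under the null are supplied by the caller, since in practice they are taken from tabulated values. The function names `df_t_stat` and `ips_tbar` and the parameters `mean_t` and `var_t` are hypothetical.

```python
import numpy as np


def df_t_stat(y):
    """DF t-statistic for rho = 1 in y_t = rho * y_{t-1} + mu + e_t.

    Implemented as the t-ratio on y_{t-1} in a regression of the first
    difference on the lagged level and a constant (no lagged differences).
    """
    y = np.asarray(y, dtype=float)
    dy, y_lag = np.diff(y), y[:-1]
    x = np.column_stack([y_lag, np.ones_like(y_lag)])
    coef, *_ = np.linalg.lstsq(x, dy, rcond=None)
    resid = dy - x @ coef
    s2 = resid @ resid / (len(dy) - x.shape[1])
    cov = s2 * np.linalg.inv(x.T @ x)
    return coef[0] / np.sqrt(cov[0, 0])


def ips_tbar(panel, mean_t, var_t):
    """Average the individual t-statistics and standardize as in (10).

    mean_t and var_t stand for E[t_iT | rho_i = 1] and Var[t_iT | rho_i = 1];
    they must be supplied by the user (assumption: taken from tables).
    """
    t_stats = np.array([df_t_stat(row) for row in panel])
    t_bar = t_stats.mean()
    z = np.sqrt(len(t_stats)) * (t_bar - mean_t) / np.sqrt(var_t)
    return t_bar, z


# Usage with placeholder moments (illustrative values, not tabulated ones).
rng = np.random.default_rng(1)
walks = np.cumsum(rng.standard_normal((20, 100)), axis=1)
print(ips_tbar(walks, mean_t=-1.5, var_t=0.7))
```

Because each cross-section gets its own regression, the coefficient on the lagged level is free to differ across units, which is the heterogeneity that the LL test rules out.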
C. Combining P-Values Tests
Let $G_{iT_i}$ be a unit root test statistic for the $i$-th group in (1) and assume that as
$T_i \to \infty$, $G_{iT_i} \Rightarrow G_i$. Let $p_i$ be the p-value of a unit root test for cross-section $i$, i.e.
$p_i = F(G_{iT_i})$, where $F(\cdot)$ is the distribution function of the random variable $G_i$.
Maddala & Wu (1999) and Choi (1999a) proposed a Fisher type test
$$ P = -2 \sum_{i=1}^{N} \ln p_i \tag{11} $$
which combines the p-values from unit root tests for each cross-section $i$ to test
for unit root in panel data. $P$ is distributed as $\chi^2$ with $2N$ degrees of freedom as
$T_i \to \infty$ for all $N$. Maddala et al. (1999) argued that the IPS and Fisher tests relax
the restrictive assumption of the LL test that $\rho_i$ is the same under the alternative.
Both the IPS and Fisher tests combine information based on individual unit
root tests. However, the Fisher test has the advantage over the IPS test in that
it does not require a balanced panel. Also, the Fisher test can use different lag
lengths in the individual ADF regressions and can be applied to any other unit
root tests. The disadvantage is that the p-values have to be derived by Monte
Carlo simulations. Choi (1999a) echoes similar advantages of the Fisher test:
(1) the cross-sectional dimension, N, can be either finite or infinite, (2) each
group can have different types of nonstochastic and stochastic components, (3)
the time series dimension, T, can be different for each i, and (4) the alternative
hypothesis would allow some groups to have unit roots while others may not.
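A minimal Python sketch of the Fisher type statistic in (11) is given below. It assumes the individual p-values have already been obtained (for instance by simulation, as noted above) and are independent across cross-sections. Since the reference distribution is chi-square with an even number of degrees of freedom, $2N$, its upper tail has a closed form, so only the standard library is needed. The function name `fisher_panel_test` is hypothetical.

```python
import math


def fisher_panel_test(p_values):
    """Fisher type combination of N independent p-values.

    Returns (P, tail), where P = -2 * sum(ln p_i) and tail is the upper-tail
    probability of a chi-square variable with 2N degrees of freedom at P.
    """
    n = len(p_values)
    stat = -2.0 * sum(math.log(p) for p in p_values)
    half = stat / 2.0
    # For even degrees of freedom 2N:
    # Pr(chi2_{2N} > x) = exp(-x / 2) * sum_{k=0}^{N-1} (x / 2)^k / k!
    tail = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(n))
    return stat, tail


# Usage: p-values from three individual unit root tests (made-up inputs).
print(fisher_panel_test([0.20, 0.04, 0.11]))
```

Nothing in the function requires the individual tests to share a sample length or lag order, which mirrors the flexibility of the Fisher test described above.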
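When $N$ is large, Choi (1999a) proposed a $Z$ test,
$$ Z = \frac{1}{2\sqrt{N}} \sum_{i=1}^{N} \left( -2 \ln p_i - 2 \right) \tag{12} $$
since $E[-2 \ln p_i] = 2$ and $\mathrm{Var}[-2 \ln p_i] = 4$. Assume that the $p_i$'s are iid and
use the Lindeberg-Levy central limit theorem to get
$$ Z \Rightarrow N(0, 1) $$
as $T_i \to \infty$ followed by $N \to \infty$.³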
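The standardization in (12) can be written in a few lines of Python. The sketch assumes independent p-values that are already available; the function name `choi_z` is hypothetical. Each term $-2 \ln p_i$ is centered at its mean of 2 and the sum is divided by its standard deviation $2\sqrt{N}$.

```python
import math


def choi_z(p_values):
    """Standardized sum of -2 ln p_i, using mean 2 and variance 4 per term."""
    n = len(p_values)
    centered = sum(-2.0 * math.log(p) - 2.0 for p in p_values)
    return centered / (2.0 * math.sqrt(n))


# Usage: small p-values push the statistic into the upper tail.
print(choi_z([0.20, 0.04, 0.11, 0.01]))
```

Small p-values make $-2 \ln p_i$ large, so evidence against the unit root null shows up as large positive values of the statistic.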
Choi (1999a) applied the Z test in (12) and the IPS test in (7) to panel data
of real exchange rates and provided evidence in favor of the PPP hypothesis.
Choi claimed that this is due to the improved finite sample power of the Fisher
test. Maddala & Wu (1999) and Maddala et al. (1999) find that the Fisher test
is superior to the IPS test, but they argue that these panel unit root tests still do
not rescue the PPP hypothesis. When allowance is made for the deficiency in
the panel data unit root tests and panel estimation methods, support for PPP
turns out to be weak.
D. Residual Based LM Test
Hadri (1999) proposed a residual based Lagrange Multiplier (LM) test for the
null that the time series for each i are stationary around a deterministic trend
against the alternative of a unit root in panel data. Consider the following
model
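$$ y_{it} = z_{it}'\gamma + r_{it} + \varepsilon_{it}, \tag{13} $$
where $r_{it} = r_{i,t-1} + u_{it}$ is a random walk. This can be written as
$$ y_{it} = z_{it}'\gamma + e_{it}, \tag{14} $$
where
$$ e_{it} = \sum_{j=1}^{t} u_{ij} + \varepsilon_{it}. $$
Let $\hat{e}_{it}$ be the residuals from the regression in (14) and $\hat{\sigma}_e^2$ be the estimate of the
error variance. Also, let $S_{it}$ be the partial sum process of the residuals,
$$ S_{it} = \sum_{j=1}^{t} \hat{e}_{ij}. $$
The LM statistic is
$$ LM = \frac{\dfrac{1}{N} \sum_{i=1}^{N} \dfrac{1}{T^2} \sum_{t=1}^{T} S_{it}^2}{\hat{\sigma}_e^2}, $$
and
$$ \frac{\sqrt{N}\left( LM - E\left[\int W_{iZ}^2\right] \right)}{\sqrt{\mathrm{Var}\left[\int W_{iZ}^2\right]}} \Rightarrow N(0, 1) $$
as $T \to \infty$ followed by $N \to \infty$.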
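The partial-sum construction of the residual based LM statistic can be illustrated with the short Python sketch below. It covers only the simplest case, under assumptions that are not part of the text: the deterministic component is an individual intercept, the panel is balanced, and the error variance is estimated by the pooled mean of squared residuals, which is appropriate only for serially uncorrelated errors. The function name `residual_lm` is hypothetical.

```python
import numpy as np


def residual_lm(y):
    """Residual based LM statistic for the intercept-only case.

    y is an (N, T) array. Residuals are deviations from individual means,
    S_it is their running sum, and sigma2 is the pooled residual variance
    (assumption: no serial correlation in the errors).
    """
    y = np.asarray(y, dtype=float)
    n, t = y.shape
    resid = y - y.mean(axis=1, keepdims=True)
    partial_sums = np.cumsum(resid, axis=1)
    sigma2 = np.sum(resid ** 2) / (n * t)
    return np.mean(np.sum(partial_sums ** 2, axis=1) / t ** 2) / sigma2


# Usage: stationary noise gives small values, random walks give large ones.
rng = np.random.default_rng(2)
stationary = rng.standard_normal((30, 80))
print(residual_lm(stationary), residual_lm(np.cumsum(stationary, axis=1)))
```

The partial sums of stationary residuals stay bounded in the appropriate scaling, whereas under the unit root alternative they grow much faster, which is what drives the statistic upward.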
E. Finite Sample Properties of Unit Root Tests
Extensive simulations have been conducted to explore the finite sample
performance of panel unit root tests, e.g. Karlsson & Löthgren (1999), Im et
al. (1997), Maddala & Wu (1999), and Choi (1999a). Choi (1999a) studied the
small sample properties of IPS t-bar test in (7) and Fisher's test in (11). Choi's
major findings are the following:
(1) The empirical sizes of the IPS and the Fisher tests are reasonably close to
their nominal size 0.05 when N is small. But the Fisher test shows mild size
distortions at N = 100, which is expected from the asymptotic theory.
Overall, the IPS t-bar test has the most stable size.
(2) In terms of the size-adjusted power, the Fisher test seems to be superior to
the IPS t-bar test.
(3) When a linear time trend is included in the model, the power of all tests
decreases considerably.
(15)
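$$ \hat{\beta} = \left[ \sum_{i=1}^{N} \sum_{t=1}^{T} \tilde{x}_{it}\tilde{x}_{it}' \right]^{-1} \left[ \sum_{i=1}^{N} \sum_{t=1}^{T} \tilde{x}_{it}\tilde{y}_{it} \right], \tag{16} $$
where $\tilde{y}_{it}$ is defined in (3) and
$$ \tilde{x}_{it} = x_{it} - \sum_{s=1}^{T} h(t, s)\, x_{is}. $$
$$ \frac{1}{N} \sum_{i=1}^{N} \frac{1}{T^2} \sum_{t=1}^{T} \tilde{x}_{it}\tilde{x}_{it}' \xrightarrow{p} E\left[\int W_{iZ} W_{iZ}'\right] \Omega_{\varepsilon} $$
and
$$ \frac{1}{N} \sum_{i=1}^{N} \frac{1}{T^2} \sum_{t=1}^{T} \tilde{x}_{it}\tilde{y}_{it} \xrightarrow{p} E\left[\int W_{iZ} W_{iZ}'\right] \Omega_{\varepsilon u}. $$
Then we have
$$ \hat{\beta} \xrightarrow{p} \Omega_{\varepsilon}^{-1} \Omega_{\varepsilon u}. \tag{17} $$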
since both eit and xit are I(1). In the panel regression (15) with a large number
of cross-sections, the strong noise of eit is attenuated by pooling the data and
a consistent estimate of $\beta$ can be extracted. The asymptotics of the OLS
estimator are very different from those of the spurious regression in pure time
series. This has an important consequence for residual-based cointegration tests
in panel data, because the null distribution of residual-based cointegration tests
depends on the asymptotics of the OLS estimator. This point is explained
further in the next section.
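The attenuation of noise by pooling can be seen in a small simulation. The Python sketch below generates two independent random walks for each cross-section, so that the long-run covariance between the two innovation sequences is zero, and computes the pooled within (individual-mean-adjusted) OLS slope. The data generating process, the panel dimensions and the function names are all assumptions chosen for illustration.

```python
import numpy as np


def pooled_within_beta(y, x):
    """Pooled OLS slope after removing individual means from y and x."""
    y_dev = y - y.mean(axis=1, keepdims=True)
    x_dev = x - x.mean(axis=1, keepdims=True)
    return np.sum(x_dev * y_dev) / np.sum(x_dev ** 2)


def simulate_spurious_beta(rng, n, t):
    """Slope from regressing one random walk panel on an independent one."""
    y = np.cumsum(rng.standard_normal((n, t)), axis=1)
    x = np.cumsum(rng.standard_normal((n, t)), axis=1)
    return pooled_within_beta(y, x)


# Usage: compare a single time series (n = 1) with a wide panel.
rng = np.random.default_rng(0)
for n in (1, 10, 200):
    print(n, simulate_spurious_beta(rng, n, 100))
```

With a single series the slope is typically far from zero and varies from draw to draw, as in the pure time series spurious regression, while in the wide panel it concentrates near zero.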
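$$ \hat{e}_{it} = \tilde{y}_{it} - \tilde{x}_{it}'\hat{\beta}. $$
In order to test the null hypothesis of no cointegration, the null can be written
as $H_0 : \rho = 1$. The OLS estimate of $\rho$ and the t-statistic are given as:
$$ \hat{\rho} = \frac{\sum_{i=1}^{N} \sum_{t=2}^{T} \hat{e}_{it}\hat{e}_{i,t-1}}{\sum_{i=1}^{N} \sum_{t=2}^{T} \hat{e}_{it}^2} $$
and
$$ t_\rho = \frac{(\hat{\rho} - 1)\sqrt{\sum_{i=1}^{N} \sum_{t=2}^{T} \hat{e}_{i,t-1}^2}}{s_e}, $$
where $s_e^2 = \dfrac{1}{NT} \sum_{i=1}^{N} \sum_{t=2}^{T} (\hat{e}_{it} - \hat{\rho}\hat{e}_{i,t-1})^2$. The DF type test statistics are
$$ DF_\rho = \frac{\sqrt{N}\,T(\hat{\rho} - 1) + 3\sqrt{N}}{\sqrt{10.2}}, $$
$$ DF_t = \sqrt{1.25}\, t_\rho + \sqrt{1.875N}, $$
$$ DF_\rho^* = \frac{\sqrt{N}\,T(\hat{\rho} - 1) + \dfrac{3\sqrt{N}\,\hat{\sigma}_v^2}{\hat{\sigma}_{0v}^2}}{\sqrt{3 + \dfrac{36\hat{\sigma}_v^4}{5\hat{\sigma}_{0v}^4}}}, $$
and
$$ DF_t^* = \frac{t_\rho + \dfrac{\sqrt{6N}\,\hat{\sigma}_v}{2\hat{\sigma}_{0v}}}{\sqrt{\dfrac{\hat{\sigma}_{0v}^2}{2\hat{\sigma}_v^2} + \dfrac{3\hat{\sigma}_v^2}{10\hat{\sigma}_{0v}^2}}}, $$
where $\hat{\sigma}_v^2 = \hat{\Sigma}_u - \hat{\Sigma}_{u\varepsilon}\hat{\Sigma}_{\varepsilon}^{-1}$ and $\hat{\sigma}_{0v}^2 = \hat{\Omega}_u - \hat{\Omega}_{u\varepsilon}\hat{\Omega}_{\varepsilon}^{-1}$. While $DF_\rho$ and $DF_t$ are based
on the strong exogeneity of the regressors and errors, $DF_\rho^*$ and $DF_t^*$ are for the
cointegration with endogenous relationship between regressors and errors. For
the ADF test, we can run the following regression:
$$ \hat{e}_{it} = \rho\,\hat{e}_{i,t-1} + \sum_{j=1}^{p} \vartheta_j\, \Delta\hat{e}_{i,t-j} + v_{itp}. \tag{19} $$
With the null hypothesis of no cointegration, the ADF test statistics can be
constructed as:
$$ ADF = \frac{t_{ADF} + \dfrac{\sqrt{6N}\,\hat{\sigma}_v}{2\hat{\sigma}_{0v}}}{\sqrt{\dfrac{\hat{\sigma}_{0v}^2}{2\hat{\sigma}_v^2} + \dfrac{3\hat{\sigma}_v^2}{10\hat{\sigma}_{0v}^2}}} $$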
(20)
$$ x_{it} = x_{i,t-1} + \varepsilon_{it}, \tag{21} $$
(22)
and
$$ \gamma_{it} = \gamma_{i,t-1} + \theta u_{it}, $$
where $u_{it}$ are i.i.d.$(0, \sigma_u^2)$. The null hypothesis of cointegration is equivalent
to $\theta = 0$.
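The test statistic proposed by McCoskey & Kao (1998) is defined as
follows:
$$ LM = \frac{\dfrac{1}{N} \sum_{i=1}^{N} \dfrac{1}{T^2} \sum_{t=1}^{T} S_{it}^2}{\hat{\sigma}_e^2}, \tag{23} $$
where $S_{it}$ is the partial sum process of the residuals,
$$ S_{it} = \sum_{j=1}^{t} \hat{e}_{ij}, $$
and $\hat{\sigma}_e^2$ is defined in McCoskey and Kao. The asymptotic result for the test is:
$$ \sqrt{N}(LM - \mu_v) \Rightarrow N(0, \sigma_v^2). \tag{24} $$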
The moments, $\mu_v$ and $\sigma_v^2$, can be found through Monte Carlo simulation. The
limiting distribution of LM is then free of nuisance parameters and robust to
heteroskedasticity.
Urban economists have long sought to explain the relationship between
urbanization levels and output. McCoskey & Kao (1999a) revisited this
question and tested the long run stability of a production function including
urbanization using nonstationary panel data techniques. McCoskey and Kao
applied the IPS test and LM in (23) and showed that a long run relationship
between urbanization, output per worker and capital per worker cannot be
rejected for the sample of thirty developing countries or the sample of twenty-two developed countries over the period 1965–1989. They do find, however,
that the sign and magnitude of the impact of urbanization varies considerably
across the countries. These results offer new insights and potential for dynamic
urban models rather than the simple cross-section approach.
C. Pedroni Tests
Pedroni (1997a) also proposed several tests for the null hypothesis of no
cointegration in a panel data model that allows for considerable heterogeneity.
His tests can be classified into two categories. The first set is similar to the tests
discussed above, and involves averaging test statistics for cointegration in the
time series across cross-sections. The second set groups the statistics such that
instead of averaging across statistics, the averaging is done in pieces so that the
limiting distributions are based on limits of piecewise numerator and
denominator terms.
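The first set of statistics as discussed includes a form of the average of the
Phillips & Ouliaris (1990) statistic:
$$ \tilde{Z}_\rho = \sum_{i=1}^{N} \frac{\sum_{t=1}^{T} \left( \hat{e}_{i,t-1}\Delta\hat{e}_{it} - \hat{\lambda}_i \right)}{\sum_{t=1}^{T} \hat{e}_{i,t-1}^2}, \tag{25} $$
where $\hat{e}_{it}$ is estimated from (15), and $\hat{\lambda}_i = \frac{1}{2}(\hat{\sigma}_i^2 - \hat{s}_i^2)$, for which $\hat{\sigma}_i^2$ and $\hat{s}_i^2$ are
individual long-run and contemporaneous variances respectively of the residual
$\hat{e}_{it}$. For his second set of statistics, Pedroni defines four panel test statistics. Let
$\hat{\Omega}_i$ be a consistent estimate of $\Omega_i$, the long-run variance-covariance matrix.
Define $\hat{L}_i$ to be the lower triangular Cholesky composition of $\hat{\Omega}_i$ such that in the
scalar case $\hat{L}_{11i}^2 = \hat{\sigma}_u^2 - \hat{\sigma}_{u\varepsilon}^2 / \hat{\sigma}_{\varepsilon}^2$ is the long-run conditional variance. In
this survey we consider only one of these statistics:
$$ Z_{\hat{t}_{NT}} = \frac{\sum_{i=1}^{N} \hat{L}_{11i}^{-2} \sum_{t=2}^{T} \left( \hat{e}_{i,t-1}\Delta\hat{e}_{it} - \hat{\lambda}_i \right)}{\sqrt{\tilde{\sigma}_{NT}^2 \left( \sum_{i=1}^{N} \hat{L}_{11i}^{-2} \sum_{t=2}^{T} \hat{e}_{i,t-1}^2 \right)}}, \tag{26} $$
where
$$ \tilde{\sigma}_{NT}^2 = \frac{1}{N} \sum_{i=1}^{N} \frac{\hat{\sigma}_i^2}{\hat{L}_{11i}^2}. $$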
It should be noted that Pedroni bases his test on the average of the numerator
and denominator terms respectively, rather than the average for the statistics as
a whole. Using results on convergence of functionals of Brownian motion,
Pedroni finds the following result:
$$ Z_{\hat{t}_{NT}} + 1.73\sqrt{N} \Rightarrow N(0, 0.93). $$
Note that this distribution applies to the model including an intercept and not
including a time trend. Asymptotic results for other model specifications can be
found in Pedroni (1997a). The convergence in distribution is based on
individual convergence of the numerator and denominator terms. What is the
intuition of rejection of the null hypothesis? Using the average of the overall
test statistic allows more ease in interpretation: rejection of the null hypothesis
means that enough of the individual cross-sections have statistics far away
from the means predicted by theory were they to be generated under the null.
Pedroni (1999) derived asymptotic distributions and critical values for
several residual based tests of the null of no cointegration in panels where there
are multiple regressors. The model includes regressions with individual specific
fixed effects and time trends. Considerable heterogeneity is allowed across
individual members of the panel with regard to the associated cointegrating
vectors and the dynamics of the underlying error process. Pedroni (1997b)
showed that for a test of the null of no cointegration, the appropriate weighting
matrix of a GLS based estimator must be constructed using the long run
conditional covariance matrix between individual members of the panel in
order to eliminate nuisance parameters associated with member specific
dynamics. Pedroni (1997b) found that the violation of cross-sectional
independence does not appear to play a significant role for the conclusions in
favor of weak long run PPP provided that one also includes common time
dummies in the regression. Pedroni (2000) also demonstrated how it is possible
to construct a test that can be employed to test whether or not members of a
panel with heterogeneous short run dynamics converge to a single common
steady state.
D. Likelihood-Based Cointegration Test
Larsson, Lyhagen & Löthgren (1998) presented a likelihood-based (LR) panel
test of cointegrating rank in heterogeneous panel models based on the average
of the individual rank trace statistics developed by Johansen (1995). The
proposed LR-bar statistic is very similar to the IPS t-bar statistic in (7) through
(10). In Monte Carlo simulation, Larsson et al. investigated the small sample
properties of the standardized LR-bar statistic. They found that the proposed
test requires a large time series dimension. Even if the panel has a large cross-sectional dimension, the size of the test will be severely distorted.
Groen & Kleibergen (1999) proposed a likelihood-based framework for
cointegrating analysis in panels of a fixed number of vector error correction
models. Maximum likelihood estimators of the cointegrating vectors are
constructed using iterated generalized method of moments (GMM) estimators.
Using these estimators Groen and Kleibergen construct likelihood ratio
statistics, LR(B|A), to test for a common cointegration rank across the
individual vector error correction models, both with heterogeneous and
homogeneous cointegrating vectors. Interestingly, the limiting distribution of
LR(B|A) is invariant to the covariance matrix of the error terms which
implies that LR(B|A) is robust with respect to the choices of covariance
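matrix. Let us define the $LR_s(r \mid k)$ as the summation of the $N$ individual trace
statistics
$$ LR_s(r \mid k) = \sum_{i=1}^{N} LR_i(r \mid k), \tag{27} $$
where, for each $i$,
$$ LR_i(r \mid k) \Rightarrow \mathrm{tr}\left\{ \int dB_{k-r,i} B_{k-r,i}' \left[ \int B_{k-r,i} B_{k-r,i}' \right]^{-1} \int B_{k-r,i}\, dB_{k-r,i}' \right\}. $$
Hence
$$ LR_s(r \mid k) = \sum_{i=1}^{N} LR_i(r \mid k) \Rightarrow \sum_{i=1}^{N} \mathrm{tr}\left\{ \int dB_{k-r,i} B_{k-r,i}' \left[ \int B_{k-r,i} B_{k-r,i}' \right]^{-1} \int B_{k-r,i}\, dB_{k-r,i}' \right\} \tag{28} $$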
as $T \to \infty$ by a continuous mapping theorem. It follows that $LR_s(r \mid k)$ is
asymptotically equivalent to $LR(B \mid A)$ when N is fixed and T is large. This
means that nothing is lost by assuming that the covariance matrix has zero non-diagonal covariances as far as the asymptotics are concerned for the proposed
test statistics in this chapter. More importantly, the tests based on cross-independence like (27) will perform just as well (asymptotically) as the tests
based on cross-dependence such as $LR(B \mid A)$. Groen and Kleibergen
verified that the likelihood-based cointegration tests proposed by Larsson et al.
in (27) are robust with respect to the cross-dependence in panel data. The
(asymptotic) equivalence of $LR_s(r \mid k)$ and $LR(B \mid A)$ found in Groen and
Kleibergen has profound implications for econometricians and applied economists, e.g. there exist tests/estimators based on cross-independence
which are equivalent to tests/estimators based on cross-dependence in
nonstationary panel time series. Define $\overline{LR}(r \mid k)$ to be the average of $LR_i(r \mid k)$:
$$ \overline{LR}(r \mid k) = \frac{1}{N} LR_s(r \mid k) = \frac{1}{N} \sum_{i=1}^{N} LR_i(r \mid k). $$
N(0, 1)
$$ \overline{LR}(B \mid A) = \frac{1}{N} LR(B \mid A). \tag{29} $$
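$$ \overline{LR}(B \mid A) = \frac{1}{N} LR(B \mid A) \Rightarrow \frac{1}{N} \sum_{i=1}^{N} \mathrm{tr}\left\{ \int dB_{k-r,i} B_{k-r,i}' \left[ \int B_{k-r,i} B_{k-r,i}' \right]^{-1} \int B_{k-r,i}\, dB_{k-r,i}' \right\} = \frac{1}{N} \sum_{i=1}^{N} Z_{ki}, $$
where
$$ Z_{ki} = \mathrm{tr}\left\{ \int dB_{k-r,i} B_{k-r,i}' \left[ \int B_{k-r,i} B_{k-r,i}' \right]^{-1} \int B_{k-r,i}\, dB_{k-r,i}' \right\}, $$
as $T \to \infty$. Then
$$ \frac{\dfrac{1}{N} \sum_{i=1}^{N} Z_{ki} - E\left[ \dfrac{1}{N} \sum_{i=1}^{N} Z_{ki} \right]}{\sqrt{\mathrm{Var}\left[ \dfrac{1}{N} \sum_{i=1}^{N} Z_{ki} \right]}} \Rightarrow N(0, 1) $$
as $N \to \infty$ since $B_{k-r,i}$ and $B_{k-r,j}$ are independent for $i \neq j$. It implies that
$$ \frac{\overline{LR}(B \mid A) - E[\overline{LR}(B \mid A)]}{\sqrt{\mathrm{Var}[\overline{LR}(B \mid A)]}} \Rightarrow N(0, 1) $$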
heterogeneous panel data: varying slopes and varying intercepts. Two of the
tests are constructed under the null hypothesis of no cointegration. These tests
are based on the average ADF test and Pedroni's pooled tests in (25) and (26).
The third test is based on the null hypothesis of cointegration which is based
on the McCoskey & Kao LM test in (23). Wu & Yin (1999) performed a similar
comparison for panel tests in which they consider only tests for which the null
hypothesis is that of no cointegration. Wu & Yin compared ADF statistics with
maximum eigenvalue statistics in pooling information on means and p-values
respectively. They found that the average ADF performs better with respect to
power and their maximum eigenvalue based p-value performs better with
regard to size.
The test of the null hypothesis of cointegration was originally proposed in response to the low
power of the tests of the null of no cointegration, especially in the time series
case. Further, in cases where economic theory predicted a long run steady state
relationship, it seemed that a test of the null of cointegration rather than the null
of no cointegration would be appropriate. The results from the Monte Carlo
study showed that the McCoskey & Kao LM test outperforms the other two
tests.
Of the two reasons for the introduction of the test of the null hypothesis of
cointegration, low power and attractiveness of the null, the introduction of the
cross-section dimension of the panel solves one: all of the tests show decent
power when used with panel data. For those applications where the null of
cointegration is more logical than the null of no cointegration, McCoskey &
Kao (1999b), at a minimum, conclude that using the McCoskey & Kao LM test
does not compromise the ability of the researcher in determining the underlying
nature of the data.
Recently, Hall et al. (1999) proposed a new approach based on principal
components analysis to test for the number of common stochastic trends
driving the nonstationary series in a panel data set. The test is consistent even
if there is a mixture of I(0) and I(1) series in the sample. This makes it
unnecessary to pretest the panel for unit roots. It also has the advantage of
solving the problem of dimensionality encountered in large panel data sets.
of these differences have become apparent in recent works by Kao & Chiang
(2000), Phillips & Moon (1999a) and Pedroni (1996). The panel cointegration
models are directed at studying questions that surround long run economic
relationships typically encountered in macroeconomic and financial data. Such
a long run relationship is often predicted by economic theory and it is then of
central interest to estimate the regression coefficients and test whether they
satisfy theoretical restrictions. Kao & Chen (1995) showed that the OLS estimator in
panel cointegrated models is asymptotically normal but still asymptotically
biased. Chen, McCoskey & Kao (1999) investigated the finite sample
properties of the OLS estimator, the t-statistic, the bias-corrected OLS
estimator, and the bias-corrected t-statistic. They found that the bias-corrected
OLS estimator does not improve over the OLS estimator in general. The results
of Chen et al. suggested that alternatives, such as the fully modified (FM)
estimator or dynamic OLS (DOLS) estimator may be more promising in
cointegrated panel regressions. Phillips & Moon (1999a) and Pedroni (1996)
proposed an FM estimator, which can be seen as a generalization of Phillips &
Hansen (1990). In this volume, Kao & Chiang (2000) propose an alternative
approach based on a panel dynamic least squares (DOLS) estimator, which
builds upon the work of Saikkonen (1991) and Stock & Watson (1993).
Next, we provide a brief discussion of the OLS estimation methods in a
panel cointegrated model. Consider the following panel regression:
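$$y_{it} = x_{it}'\beta + z_{it}'\gamma_i + u_{it}, \tag{30}$$

The OLS estimator of β is

$$\hat{\beta}_{OLS} = \left[ \sum_{i=1}^{N}\sum_{t=1}^{T} \tilde{x}_{it}\tilde{x}_{it}' \right]^{-1} \left[ \sum_{i=1}^{N}\sum_{t=1}^{T} \tilde{x}_{it}\tilde{y}_{it} \right]. \tag{31}$$

$$\frac{1}{N}\sum_{i=1}^{N}\frac{1}{T^2}\sum_{t=1}^{T} \tilde{x}_{it}\tilde{x}_{it}' \xrightarrow{p} \lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N} E[\zeta_{2i}], \tag{32}$$

and

$$\frac{1}{N}\sum_{i=1}^{N}\frac{1}{T}\sum_{t=1}^{T} \tilde{x}_{it}\tilde{u}_{it} \Rightarrow \lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N} E[\zeta_{1i}], \tag{33}$$

where

$$\begin{array}{c|c|c}
z_{it} & E[\zeta_{1i}] & E[\zeta_{2i}] \\ \hline
0 & 0 & \tfrac{1}{2}\Omega_{\varepsilon i} \\
\mu_i & -\tfrac{1}{2}\Omega_{\varepsilon u i} + \Delta_{\varepsilon u i} & \tfrac{1}{6}\Omega_{\varepsilon i} \\
(\mu_i, t) & -\tfrac{1}{2}\Omega_{\varepsilon u i} + \Delta_{\varepsilon u i} & \tfrac{1}{15}\Omega_{\varepsilon i}
\end{array}$$

and

$$\Omega_i = \begin{bmatrix} \Omega_{ui} & \Omega_{u\varepsilon i} \\ \Omega_{\varepsilon u i} & \Omega_{\varepsilon i} \end{bmatrix}, \qquad
\Delta_i = \begin{bmatrix} \Delta_{ui} & \Delta_{u\varepsilon i} \\ \Delta_{\varepsilon u i} & \Delta_{\varepsilon i} \end{bmatrix},$$

where $\Omega_i$ is the long-run covariance matrix and $\Delta_i$ is the one-sided long-run covariance. When $z_{it} = \mu_i$,

$$\sqrt{N}\,T(\hat{\beta}_{OLS} - \beta) - \sqrt{N}\,\delta_{NT} \Rightarrow N\left( 0,\; 6\,\Omega_{\varepsilon}^{-1} \left( \lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N} \Omega_{u.\varepsilon i}\,\Omega_{\varepsilon i} \right) \Omega_{\varepsilon}^{-1} \right),$$

where $\Omega_{\varepsilon} = \lim_{N\to\infty} \frac{1}{N}\sum_{i=1}^{N} \Omega_{\varepsilon i}$ and

$$\delta_{NT} = \left[ \frac{1}{N}\sum_{i=1}^{N}\frac{1}{T^2}\sum_{t=1}^{T} (x_{it} - \bar{x}_i)(x_{it} - \bar{x}_i)' \right]^{-1} \left[ \frac{1}{N}\sum_{i=1}^{N} \left( \Omega_{\varepsilon i}^{1/2} \left( \int \tilde{W}_i\, dW_i' \right) \Omega_{\varepsilon i}^{-1/2}\,\Omega_{\varepsilon u i} + \Delta_{\varepsilon u i} \right) \right].$$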
Kao & Chiang (2000) in this volume studied the limiting distributions for the
FM, and DOLS estimators in a cointegrated regression and showed they are
asymptotically normal. Phillips & Moon (1999a) and Pedroni (1996) also
obtained similar results for the FM estimator. The reader is referred to the cited
papers for further details. Kao and Chiang also investigated the finite sample
properties of the OLS, FM, and DOLS estimators. They found that: (i) the OLS
estimator has a non-negligible bias in finite samples, (ii) the FM estimator does
not improve over the OLS estimator in general, and (iii) the DOLS estimator
may be more promising than OLS or FM estimators in estimating the
cointegrated panel regressions.
Choi (1999b) extended Kao & Chiang (2000) to study asymptotic properties
of OLS, Within and GLS estimators for an error component model. The error
component model involves both stationary and nonstationary regressors. Choi's
simulation results indicated that the feasible GLS estimator is more efficient
than the Within estimator. Choi (1999c) studied instrumental variable
estimation for an error component model with stationary and nearly
nonstationary regressors.
Phillips & Moon (1999a) studied various regressions between two panel
vectors that may or may not have cointegrating relations, and presented a
fundamental framework for studying sequential and joint limit theories in
nonstationary panel data. In particular, Phillips and Moon studied regression
limit theory of nonstationary panels when both N and T go to infinity. Their
limit theory allows for both sequential limits, where T → ∞ followed by N → ∞,
and joint limits, where T, N → ∞ simultaneously. Phillips and Moon require that
N/T → 0, so that these results apply for moderate N and large T macro panel
data and not large N and small T micro panel data. The panel models
considered allow for four cases: (i) panel spurious regression, where there is no
time series cointegration, (ii) heterogeneous panel cointegration, where each
individual has its own specific cointegration relation, (iii) homogeneous panel
cointegration where individuals have the same cointegration relation, and (iv)
near-homogeneous panel cointegration, where individuals have slightly
different cointegration relations determined by the value of a localizing
parameter. Phillips & Moon (1999a) investigated these four models and
developed panel asymptotics for regression coefficients and tests using both
sequential and joint limit arguments. In all cases considered the pooled
estimator is consistent and has a normal limiting distribution. In fact, for the
spurious panel regression, Phillips & Moon (1999a) showed that under quite
weak regularity conditions, the pooled least squares estimator of the slope
coefficient is √N-consistent for the long-run average relation parameter β
and has a limiting normal distribution. Also, Moon & Phillips (1998a) showed
that a limiting cross-section regression with time averaged data is also √N-consistent
for β and has a limiting normal distribution. This is different from
the pure time series spurious regression where the limit of the OLS estimator
of β is a nondegenerate random variate that is a functional of Brownian
motions and is therefore not consistent for β. The idea in Phillips & Moon
(1999a) is that independent cross-section data in the panel adds information
and this leads to a stronger overall signal than the pure time series case. Pesaran
& Smith (1995) studied limiting cross-section regressions with time averaged
data and established consistency with restrictive assumptions on the heterogeneous panel model. This differs from Phillips & Moon (1999a) in that the
former use an average of the cointegrating coefficients which is different from
the long run average regression coefficient. This requires the existence of
cointegrating time series relations, whereas the long run average regression
coefficient is defined irrespective of the existence of individual cointegrating
relations and relies only on the long run average variance matrix of the panel.
Phillips & Moon (1999a) also showed that for the homogeneous and near
homogeneous cointegration cases, a consistent estimator of the long run
regression coefficient can be constructed which they call a pooled FM
estimator. They showed that this estimator has faster convergence rate than the
simple cross-section and time series estimators. See also Phillips & Moon
(1999b) for a concise review. In fact, the latter paper also shows how to extend
the above ideas to models with individual effects in the data generating process.
For the panel spurious regression with individual specific deterministic trends,
estimates of the trend coefficients are obtained in the first step and the
detrended data is pooled and used in least squares regression to estimate β in
the second step. Two different detrending procedures are used based on OLS
and GLS regressions. OLS detrending leads to an asymptotically more efficient
estimator of the long run average coefficient in pooled regression than GLS
detrending. Phillips & Moon (1999b) explain that the residuals after time
series GLS detrending have more cross section variation than they do after OLS
detrending and this produces greater variation in the limit distribution of the
pooled regression estimator of the long run average coefficient.
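The contrast between the pure time series spurious regression and the pooled panel regression can be illustrated by a small simulation. The sketch below assumes independent driftless Gaussian random walks for the regressor and the regressand, so that the long-run average relation is zero, and pools without intercepts or trends; the function names, sample sizes and seed are hypothetical choices and not those of Phillips & Moon (1999a).

```python
import random
import statistics


def random_walk(T, rng):
    """Driftless Gaussian random walk of length T."""
    level, path = 0.0, []
    for _ in range(T):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def pooled_slope(N, T, rng):
    """Pooled least squares slope of y on x across N independent units."""
    sxy = sxx = 0.0
    for _ in range(N):
        x = random_walk(T, rng)
        y = random_walk(T, rng)  # independent of x: no cointegration
        sxy += sum(a * b for a, b in zip(x, y))
        sxx += sum(a * a for a in x)
    return sxy / sxx


def slope_spread(N, T=50, reps=200, seed=0):
    """Standard deviation of the pooled slope over Monte Carlo replications."""
    rng = random.Random(seed)
    draws = [pooled_slope(N, T, rng) for _ in range(reps)]
    return statistics.pstdev(draws)
```

Comparing `slope_spread(1)` with `slope_spread(25)` shows the dispersion of the estimator shrinking as cross-section units are added, which is the extra information that the panel dimension contributes.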
Moon & Phillips (1999) investigate the asymptotic properties of the
Gaussian MLE of the localizing parameter in local to unity dynamic panel
regression models with deterministic and stochastic trends. Moon and Phillips
find that for the homogeneous trend model, the Gaussian MLE of the common
localizing parameter is √N-consistent, while for the heterogeneous trends
model, it is inconsistent. The latter inconsistency is due to the presence of an
infinite number of incidental parameters (as N → ∞) for the individual trends.
Unlike the fixed effects dynamic panel data model where this inconsistency due
to the incidental parameter problem disappears as T → ∞, the inconsistency of
the localizing parameter in the Moon and Phillips model persists even when
both N and T go to infinity.
Pesaran, Shin & Smith (1999) derived the asymptotics of a pooled mean
group (PMG) estimator. The PMG estimation constrains the long run
coefficients to be identical, but allows the short run and adjustment coefficients
as well as the error variances to differ across the cross-sectional dimension.
Recently, Binder, Hsiao & Pesaran (2000) considered estimation and
inference in panel vector autoregressions (PVARS) with fixed effects when T is
finite and N is large. A maximum likelihood estimator as well as unit root and
cointegration tests are proposed based on a transformed likelihood function.
This MLE is shown to be consistent and asymptotically normally distributed
irrespective of the unit root and cointegrating properties of the PVAR model.
The tests proposed are based on standard chi-square and normally distributed
statistics. Binder et al. also show that the conventional GMM estimators based
on standard orthogonality conditions break down if the underlying time series
contain unit roots. Monte Carlo evidence is provided which favors MLE over
GMM in small samples.
In this volume, Kauppi (2000) develops a new joint limit theory where the
panel data may be cross-sectionally heterogeneous in a general way. This limit
theory builds upon the concepts of joint convergence in probability and in
distribution for double indexed processes by Phillips & Moon (1999a) and
develops new versions of the law of large numbers and the central limit
theorem that apply in panels where the data may be cross-sectionally
heterogeneous in a fairly general way. Kauppi demonstrates how this joint limit
theory can be applied to derive asymptotics for a panel regression where the
regressors are generated by a local to unit root with heterogeneous localizing
coefficients across cross-sections. Kauppi discusses issues that arise in the
estimation and inference of panel cointegrated regressions with near integrated
regressors. Kauppi shows that a bias corrected pooled OLS for a common
cointegrating parameter has an asymptotic normal distribution centered on the
true value irrespective of whether the regressor has near or exact unit root.
However, if the regression model contains individual effects and/or deterministic trends, then Kauppi's bias corrected pooled OLS still produces
asymptotic bias. Kauppi also shows that the panel FM estimator is subject to
asymptotic bias regardless of how individual effects and/or deterministic trends
are contained if the regressors are nearly rather than exactly integrated. This
indicates that much care should be taken in interpreting empirical results
achieved by the recent panel cointegration methods that assume exact unit roots
when near unit roots are equally plausible.
Nerlove (1999), life cycle labor supply models, see Ziliak (1997), and demand
for gasoline, see Baltagi & Griffin (1997) to mention a few.
It is well known that for typical micro-panels where there are a large number
of firms or individuals (N) observed over a short period of time (T), the fixed
effects (FE) estimator is biased and inconsistent (since T is fixed and N → ∞),
see Nickell (1981) and more recently Kiviet (1995, 1999). Monte Carlo results
have shown that first order asymptotic properties do not necessarily yield
correct inference in finite samples. Therefore, Kiviet (1995) examined higher
order asymptotics which may approximate the actual finite sample properties
more closely and lead to better inference. In fact, Kiviet (1995) considered the
simple dynamic linear panel data model with serially uncorrelated disturbances
and strongly exogenous regressors and derived an approximation for the bias of
the FE estimator. When a consistent estimator of this bias is subtracted from the
original FE estimator, a corrected FE estimator results. This corrected FE
estimator performed well in simulations when compared with eight other
consistent instrumental variable or GMM estimators.4
In macro-panels studying for example long run growth, the data covers a
large number of countries N over a moderate size T. In this case, T is not very
small relative to N. Hence, some researchers may still favor the FE estimator
arguing that its bias may not be large. Judson & Owen (1999) performed some
Monte Carlo experiments for N = 20 or 100 and T = 5, 10, 20 and 30 and found
that the bias in the FE can be sizeable, even when T = 30. The bias of the FE
estimator increases with the coefficient of the lagged dependent variable and decreases with T. But even for T = 30, this bias
could be as much as 20% of the true value of the coefficient of interest. Judson
& Owen (1999) recommend the corrected FE estimator proposed by Kiviet
(1995) as the best choice, GMM being second best and for long panels, the
computationally simpler Anderson & Hsiao (1982) estimator. This last
estimator first differences the data to get rid of the individual effects and then
uses lagged predetermined variables in levels as instruments.5 Arellano & Bond
(1991) proposed GMM procedures that are more efficient than the Anderson &
Hsiao (1982) estimator. Ahn & Schmidt (1995) derive additional nonlinear
moment restrictions not exploited by the Arellano & Bond (1991) GMM
estimator.6 Ahn & Schmidt (1995, 1997) also give a complete count of the set
of orthogonality conditions corresponding to a variety of assumptions imposed
on the disturbances and the initial conditions of the dynamic panel data model.
While many of the moment conditions are nonlinear in the parameters, Ahn &
Schmidt (1997) propose a linearized GMM estimator that is asymptotically as
efficient as the nonlinear GMM estimator. They also provide simple moment
tests of the validity of these nonlinear restrictions. In addition, they investigate
the circumstances under which the optimal GMM estimator is equivalent to a
linear instrumental variable estimator. They find that these circumstances are
quite restrictive and go beyond uncorrelatedness and homoskedasticity of the
errors. Ahn & Schmidt (1995) provide some evidence on the efficiency gains
from the nonlinear moment conditions which provide support for their use in
practice. By employing all these conditions, the resulting GMM estimator is
asymptotically efficient and has the same asymptotic variance as the MLE
under normality. In fact, Hahn (1997) showed that GMM based on an
increasing set of instruments as N → ∞ would achieve the semiparametric
efficiency bound.
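The difference between the FE estimator and the Anderson & Hsiao (1982) approach described above can be sketched in Python. The code simulates a simple autoregressive panel with individual effects, computes the within (FE) estimate, and computes an instrumental variable estimate that first differences the data and uses the level lagged twice as the instrument. The data generating process, the burn-in used to approximate a stationary start, and all parameter values are assumptions made for illustration.

```python
import random


def simulate_panel(N, T, delta, sigma_mu, rng, burn=50):
    """Simulate y_it = delta * y_i,t-1 + mu_i + u_it with u_it ~ N(0, 1)."""
    panel = []
    for _ in range(N):
        mu = rng.gauss(0.0, sigma_mu)
        y = mu / (1.0 - delta)  # start at the individual-specific mean
        series = []
        for t in range(burn + T):
            y = delta * y + mu + rng.gauss(0.0, 1.0)
            if t >= burn:  # keep only the last T observations
                series.append(y)
        panel.append(series)
    return panel


def fixed_effects(panel):
    """Within estimator: regress demeaned y_it on demeaned y_i,t-1."""
    num = den = 0.0
    for y in panel:
        lag, cur = y[:-1], y[1:]
        lag_bar = sum(lag) / len(lag)
        cur_bar = sum(cur) / len(cur)
        for a, b in zip(lag, cur):
            num += (a - lag_bar) * (b - cur_bar)
            den += (a - lag_bar) ** 2
    return num / den


def anderson_hsiao(panel):
    """IV on first differences with y_i,t-2 in levels as the instrument."""
    num = den = 0.0
    for y in panel:
        for t in range(2, len(y)):
            z = y[t - 2]                   # instrument: level lagged twice
            num += z * (y[t] - y[t - 1])   # instrument times differenced y
            den += z * (y[t - 1] - y[t - 2])  # instrument times lagged difference
    return num / den
```

For short T the within estimate tends to fall well below the true coefficient, while the instrumental variable estimate is centered near it at the cost of a larger variance.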
Hahn (1997) considers the asymptotically efficient estimation of the dynamic
panel data model with sequential moment restrictions in an environment with
i.i.d. observations. Hahn (1997) shows that the GMM estimator with an
increasing set of instruments as the sample size grows attains the semiparametric efficiency bound of the model. Hahn (1997) explains how Fourier series
or polynomials may be used as the set of instruments for efficient estimation.
In a limited Monte Carlo comparison, Hahn finds that this estimator has finite sample
properties similar to those of the Keane & Runkle (1992) and/or Schmidt et al.
(1992) estimators when the latter estimators are efficient. In cases where the
latter estimators are not efficient, the Hahn efficient estimator outperforms both
estimators in finite samples.
Recently, Wansbeek & Bekker (1996) considered a simple dynamic panel
data model with no exogenous regressors and disturbances $u_{it}$ and random
effects $\mu_i$ that are independent and normally distributed. They derived an
expression for the optimal instrumental variable estimator, i.e. one with
minimal asymptotic variance. A striking result is the difference in efficiency
between the IV and ML estimators. They find that for regions of the
autoregressive parameter δ which are likely in practice, ML is superior. The
gap between IV (or GMM) and ML can be narrowed down by adding moment
restrictions of the type considered by Ahn & Schmidt (1995). Hence, Wansbeek
& Bekker (1996) find support for adding these nonlinear moment restrictions
and warn against the loss in efficiency as compared with MLE by ignoring
them.
Blundell & Bond (1998) revisit the importance of exploiting the initial
condition in generating efficient estimators of the dynamic panel data model
when T is small. They consider a simple autoregressive panel data model with
no exogenous regressors
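$$y_{it} = \delta y_{i,t-1} + \mu_i + u_{it} \tag{35}$$

only one orthogonality condition given by $E(y_{i1}\Delta u_{i3}) = 0$, so that δ is just-identified. In this case, the first stage IV regression is obtained by running $\Delta y_{i2}$
on $y_{i1}$. Note that this regression can be obtained from (35) evaluated at t = 2 by
subtracting $y_{i1}$ from both sides of this equation, i.e.

$$\Delta y_{i2} = (\delta - 1)y_{i1} + \mu_i + u_{i2}. \tag{36}$$

The least squares estimate $\hat{\pi}$ of the coefficient on $y_{i1}$ in (36) has probability limit

$$\operatorname{plim} \hat{\pi} = (\delta - 1)\,\frac{c}{c + (\sigma_{\mu}^{2}/\sigma_{u}^{2})} \tag{37}$$

where c = (1 − δ)/(1 + δ). The bias term effectively scales the estimated
coefficient on the instrumental variable $y_{i1}$ towards zero. They also find that the
F-statistic of the first stage IV regression converges to $\chi^2_1$ with noncentrality
parameter

$$\tau = \frac{(\sigma_{u}^{2} c)^{2}}{\sigma_{\mu}^{2} + \sigma_{u}^{2} c} \to 0 \quad \text{as } \delta \to 1. \tag{38}$$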
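The weakening of the instrument as the autoregressive parameter approaches one can be made concrete with a short calculation. The sketch below evaluates c = (1 − δ)/(1 + δ), the scaling factor c/(c + σ²_μ/σ²_u) and the noncentrality parameter (σ²_u c)²/(σ²_μ + σ²_u c). The function name and the variance values used in any call are hypothetical.

```python
def first_stage_weakness(delta, sigma_mu2, sigma_u2):
    """Return (c, scale, tau) for the first stage IV regression.

    c     = (1 - delta) / (1 + delta)
    scale = c / (c + sigma_mu2 / sigma_u2), the factor that shrinks the
            first stage coefficient towards zero
    tau   = (sigma_u2 * c)**2 / (sigma_mu2 + sigma_u2 * c), the
            noncentrality parameter of the first stage F-statistic
    """
    c = (1.0 - delta) / (1.0 + delta)
    scale = c / (c + sigma_mu2 / sigma_u2)
    tau = (sigma_u2 * c) ** 2 / (sigma_mu2 + sigma_u2 * c)
    return c, scale, tau
```

Evaluating the function on a grid of values of δ approaching one shows both the scaling factor and the noncentrality parameter shrinking towards zero, which is the weak instrument problem in this setting.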
estimator. However, the system GMM estimator not only improves the
precision but also reduces the finite sample bias. The empirical application
revisits the estimates of the capital and labor coefficients in a Cobb-Douglas
production function considered by Griliches & Mairesse (1998). Using data on
509 R&D performing US manufacturing companies observed over 8 years
(1982–1989), the standard GMM estimator that uses moment conditions on the
first differenced model finds a low estimate of the capital coefficient and low
precision for all coefficients estimated. However, the system GMM estimator
gives reasonable and more precise estimates of the capital coefficient and
constant returns to scale is not rejected. Blundell et al. conclude that a careful
examination of the original series and consideration of the system GMM
estimator can usefully overcome many of the disappointing features of the
standard GMM estimator for dynamic panel models. Hahn (1999) also
examines the role of the initial condition imposed by the Blundell & Bond
(1998) estimator. This is done by numerically comparing the semiparametric
information bounds for the case that incorporates the stationarity of the initial
condition and the case that does not. Hahn (1999) finds that the efficiency gain
can be substantial.
Ziliak (1997) asks the question whether the bias/efficiency trade-off for the
GMM estimator considered by Tauchen (1986) for the time series case is still
binding in panel data where the sample size is normally larger than 500. For
time series data, Tauchen (1986) shows that even for T = 50 or 75 there is a bias/
efficiency trade-off as the number of moment conditions increases. Therefore,
Tauchen recommends the use of sub-optimal instruments in small samples.
This result was also corroborated by Andersen & Sorensen (1996) who argue
that GMM using too few moment conditions is just as bad as GMM using too
many moment conditions. This problem becomes more pronounced with panel
data since the number of moment conditions increases dramatically as the
number of strictly exogenous variables and the number of time series
observations increase. Even though it is desirable from an asymptotic efficiency
point of view to include as many moment conditions as possible, it may be
infeasible or impractical to do so in many cases. For example, for T = 10 and
five strictly exogenous regressors, this generates 500 moment conditions for
GMM. Ziliak (1997) performs an extensive set of Monte Carlo experiments for
a dynamic panel data model and finds that the same trade-off between bias and
efficiency exists for GMM as the number of moment conditions increases, and
that one is better off with sub-optimal instruments. In fact, Ziliak finds that
GMM performs well with suboptimal instruments, but is not recommended for
panel data applications when all the moments are exploited for estimation.7
Ziliak estimates a life cycle labor supply model under uncertainty based on 532
men observed over 10 years of data (1978–1987) from the Panel Study of
Income Dynamics. The sample was restricted to continuously married,
continuously working prime age men aged 22–51 in 1978. These men were
paid an hourly wage or salaried and could not be piece-rate workers or self-employed. Ziliak finds that the downward bias of GMM is quite severe as the
number of moment conditions expands, outweighing the gains in efficiency.
Ziliak reports estimates of the intertemporal substitution elasticity which is the
focal point of interest in the labor supply literature. This measures the
intertemporal changes in hours of work due to an anticipated change in the real
wage. For GMM, this estimate changes from 0.519 to 0.093 when the number
of moment conditions used in GMM is increased from 9 to 212. The standard
error of this estimate drops from 0.36 to 0.07. Ziliak attributes this bias to the
correlation between the sample moments used in estimation and the estimated
weight matrix. Interestingly, Ziliak finds that the forward filter 2SLS estimator
proposed by Keane & Runkle (1992) performs best in terms of the bias/
efficiency trade-off and is recommended. Forward filtering eliminates all forms
of serial correlation while still maintaining orthogonality with the initial
instrument set. Schmidt, Ahn & Wyhowski (1992) argued that filtering is
irrelevant if one exploits all sample moments during estimation. However, in
practice, the number of moment conditions increases with the number of time
periods T and the number of regressors K and can become computationally
intractable. In fact for T = 15 and K = 10, the number of moment conditions for
Schmidt, et al. (1992) is T(T − 1)K/2, which is 1050 restrictions, highlighting the
computational burden of this approach. In addition, Ziliak argues that the
overidentifying restrictions are less likely to be satisfied possibly due to the
weak correlation between the instruments and the endogenous regressors.8 In
this case, the forward filter 2SLS estimator is desirable yielding less bias than
GMM and sizeable gains in efficiency. In fact, for the life cycle labor example,
the forward filter 2SLS estimate of the intertemporal substitution elasticity was
0.135 for 9 moment conditions compared to 0.296 for 212 moment conditions.
The standard error of these estimates dropped from 0.32 to 0.09.
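The growth in the number of moment conditions discussed above is easy to tabulate. The sketch below uses T(T − 1)K/2 for the Schmidt et al. (1992) count as stated in the text, which evaluates to 1050 for T = 15 and K = 10. The rule T²K for strictly exogenous regressors is an assumption chosen because it reproduces the figure of 500 quoted for T = 10 and five regressors; the function names are hypothetical.

```python
def strictly_exogenous_moments(T, K):
    """Assumed count when each of K regressors at each of T dates is
    orthogonal to the disturbance at every date: T * T * K."""
    return T * T * K


def schmidt_et_al_moments(T, K):
    """Count T(T - 1)K / 2 quoted for Schmidt et al. (1992)."""
    return T * (T - 1) * K // 2


# Tabulate how quickly the counts grow with the number of periods.
for periods in (5, 10, 15):
    print(periods,
          strictly_exogenous_moments(periods, 5),
          schmidt_et_al_moments(periods, 10))
```

Both counts grow with the square of T, which is why exploiting all available moments quickly becomes impractical in longer panels.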
The practical problem of not being able to use more moment conditions as
well as the statistical problem of the trade-off between small sample bias and
efficiency prompted Ahn & Schmidt (1999a) to pose the following questions:
Under what conditions can we use a smaller set of moment conditions without
incurring any loss of asymptotic efficiency? In other words, under what
conditions are some moment conditions redundant in the sense that utilizing
them does not improve efficiency? These questions were first dealt with by Im,
Ahn, Schmidt & Wooldridge (1999) who considered panel data models with
strictly exogenous explanatory variables. They argued that, for example, with
ten strictly exogenous time-varying variables and six time periods, the number of moment
conditions available for the random effects (RE) model is 360 and this reduces
to 300 moment conditions for the FE model. GMM utilizing all these moment
conditions leads to an efficient estimator. However, these moment conditions
exceed what the simple RE and FE estimators use. Im et al. (1999) provide the
assumptions under which this efficient GMM estimator reduces to the simpler
FE or RE estimator. In other words, Im et al. (1999) show the redundancy of
the moment conditions that these simple estimators do not use. Ahn & Schmidt
(1999a) provide a more systematic method by which redundant instruments can
be found and generalize this result to models with time-varying individual
effects. However, both papers deal only with strictly exogenous regressors. In
a related paper, Ahn & Schmidt (1999b) consider the cases of strictly and
weakly exogenous regressors. They show that the GMM estimator takes the
form of an instrumental variables estimator if the assumption of no conditional
heteroskedasticity (NCH) holds. Under this assumption, the efficiency of
standard estimators can often be established showing that the moment
conditions not utilized by these estimators are redundant. However, Ahn &
Schmidt (1999b) conclude that the NCH assumption necessarily fails if the full
set of moment conditions for the dynamic panel data model is used. In this
case, there is clearly a need to find modified versions of GMM, with a reduced
set of moment conditions, that lead to estimates with reasonable finite sample
properties.
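The counts of 360 and 300 moment conditions quoted above for ten strictly exogenous regressors (K = 10) and six periods (T = 6) are consistent with a simple counting rule. The rule is stated here as an assumption for illustration, not as the derivation in Im et al. (1999): under RE each regressor at each date is taken to be orthogonal to the disturbance at every date, and under FE one set of TK conditions is lost.

$$T^{2}K = 6^{2} \times 10 = 360, \qquad T(T-1)K = 6 \times 5 \times 10 = 300, \qquad T^{2}K - T(T-1)K = TK = 60.$$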
Crepon, Kramarz & Trognon (1997) argue that for the dynamic panel data
model, when one considers a set of orthogonality conditions, the parameters can
be divided into parameters of interest (like δ) and nuisance parameters (like the
second order terms in the autoregressive error component model). They show
that the elimination of such nuisance parameters using their empirical
counterparts does not entail an efficiency loss when only the parameters of
interest are estimated. In fact, Sevestre and Trognon in chapter 6 of Matyas &
Sevestre (1996) argue that if one is only interested in δ, then one can reduce
the number of orthogonality restrictions without loss in efficiency as far as δ is
concerned. However, the estimates of the other nuisance parameters are not
generally as efficient as those obtained from the full set of orthogonality
conditions.
The Alonso-Borrego & Arellano (1999) paper is also motivated by the finite
sample bias in panel data instrumental variable estimators when the
instruments are weak. The dynamic panel model generates many overidentifying restrictions even for moderate values of T. Also, the number of
instruments increases with T, but the quality of these instruments is often poor
because they tend to be only weakly correlated with first differenced
This model results from Islam's (1995) version of Solow's model on growth
convergence among countries. Wansbeek & Knaap (1999) show that double
differencing gets rid of the individual country effects on the first round of
differencing and the heterogeneous coefficient on the time trend on the
second round of differencing. Modified OLS, IV and GMM methods are
adapted to this model and LIML is suggested as a viable alternative to GMM
to guard against the small sample bias of GMM. Macroeconomic data are
subject to measurement error and Wansbeek & Knaap (1999) show how these
estimators can be modified to account for measurement error that is white
noise. For example, GMM is modified so that it discards the orthogonality
conditions that rely on the absence of measurement error.
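The effect of double differencing can be checked on a noise-free example. The sketch below builds a single country's series with an individual effect and a country-specific linear trend, then differences it twice. The variable names and numerical values are hypothetical, and the dynamics and disturbances of the actual model are omitted to isolate the deterministic terms.

```python
def difference(series):
    """First difference of a list: series[t] - series[t - 1]."""
    return [b - a for a, b in zip(series, series[1:])]


mu_i = 3.0      # hypothetical individual country effect
trend_i = 0.5   # hypothetical country-specific trend coefficient

# Deterministic part only: y_it = mu_i + trend_i * t
y = [mu_i + trend_i * t for t in range(6)]

first = difference(y)       # the individual effect drops out; trend_i remains
second = difference(first)  # the trend coefficient drops out as well

print(first)
print(second)
```

The first difference leaves only the trend coefficient and the second difference removes it, which is why two rounds of differencing are needed when the trend coefficient is heterogeneous.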
Jimenez-Martin (1998) performs Monte Carlo experiments to study the
performance of the Holtz-Eakin (1988) test for the presence of individual
for a dynamic heterogeneous panel data model using Monte Carlo experiments.
Their findings indicate that the mean group estimator performs reasonably well
for large T. However, when T is small, the mean group estimator could be
seriously biased, particularly when N is large relative to T. Pesaran & Zhao
(1999) examine the effectiveness of alternative bias-correction procedures in
reducing the small sample bias of these estimators using Monte Carlo
experiments. An interesting finding is that when the coefficient of the lagged
dependent variable is greater than or equal to 0.8, none of the bias correction
procedures seem to work.
Hsiao, Pesaran & Tahmiscioglu (1999) suggest a Bayesian approach for
estimating the mean parameters of a dynamic heterogeneous panel data model.
The coefficients are assumed to be normally distributed across cross-sectional
units and the Bayes estimator is implemented using Markov Chain Monte
Carlo methods. Hsiao et al. argue that Bayesian methods can be a viable
alternative in the estimation of mean coefficients in dynamic panel data models
even when the initial observations are treated as fixed constants. They establish
the asymptotic equivalence of this Bayes estimator and the mean group
estimator proposed by Pesaran & Smith (1995). The asymptotics are carried
out for both N → ∞ and T → ∞ with N/T → 0. Monte Carlo experiments show that
this Bayes estimator has better sampling properties than other estimators for
both small and moderate size T. Hsiao et al. also caution against the use of the
mean group estimator unless T is sufficiently large relative to N. The bias in the
mean coefficient of the lagged dependent variable appears to be serious when
T is small and the true value of this coefficient is larger than 0.6. Hsiao et al.
apply their methods to estimate the q-investment model using a panel of 273
US firms over the period 1972–1993.
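The mean group estimator referred to above averages unit-by-unit estimates. A minimal sketch, assuming a first-order autoregression with an intercept estimated separately by OLS for each cross-sectional unit, is given below; the function names are hypothetical and no bias correction is applied.

```python
def ols_slope(y):
    """OLS slope from regressing y_t on an intercept and y_{t-1}."""
    lag, cur = y[:-1], y[1:]
    lag_bar = sum(lag) / len(lag)
    cur_bar = sum(cur) / len(cur)
    num = sum((a - lag_bar) * (b - cur_bar) for a, b in zip(lag, cur))
    den = sum((a - lag_bar) ** 2 for a in lag)
    return num / den


def mean_group(panel):
    """Average of the individual slope estimates across units."""
    slopes = [ols_slope(y) for y in panel]
    return sum(slopes) / len(slopes)
```

Because each individual estimate relies only on that unit's T observations, the small-T bias of the individual regressions carries over to the average, in line with the caution about using the mean group estimator when T is small.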
VII. CONCLUSION
This survey gives a brief overview of some of the main results in the
econometrics of nonstationary panels as well as recent developments in
dynamic panels. There has been an immense amount of research in this area
recently with the demand for empirical studies exceeding the supply of
econometric theory developed for these models. As this survey indicates,
several issues have been resolved but a lot remains to be done.
ACKNOWLEDGMENTS
The authors would like to thank R. Carter Hill, M. H. Pesaran and an
anonymous referee for their helpful comments and suggestions. Baltagi was
NOTES
1. A collection of dynamic panel data routines can be found in: http://www.cemfi.es/~arellano/#dpd.
2. Chiang & Kao (2000) have recently put together a fairly comprehensive set of
subroutines, NPT 1.0, for studying nonstationary panel data. NPT 1.0 can be
downloaded from http://web.syr.edu/~cdkao.
3. Testing for cointegration in panel data by combining p-value tests is a
straightforward extension of the testing procedures in this section. For cointegration
tests, the relevant model is equation (15). We let $G_{iT_i}$ be a test for the null of no
cointegration and apply the same tests and asymptotic theory in this section.
4. Kiviet (1999) extends this derivation to the case of weakly exogenous variables
and examines to what degree this order of approximation is determined by the initial
conditions of the dynamic panel model.
5. Arellano (1989) found that using lagged differences of predetermined variables
as instruments is not recommended since it has a singularity point and very large
variances over a significant range of the parameter values.
6. See also Arellano & Bover (1995), chapter 8 of Baltagi (1995) and chapters 6 and
7 of Matyas & Sevestre (1996) for more details.
7. For a Hausman & Taylor (1981) type model, Metcalf (1996) shows that using
fewer instruments may lead to a more powerful Hausman specification test. Asymptotically, more instruments lead to more efficient estimators. However, the asymptotic bias
of the less efficient estimator will also be greater as the null hypothesis of no correlation
is violated. Metcalf argues that if the bias increases at the same rate as the variance (as
the null is violated) for the less efficient estimator, then the power of the Hausman test
will increase. This is due to the fact that the test statistic is linear in variance but
quadratic in bias.
8. See the growing literature on weak instruments by Nelson & Startz (1990),
Bekker (1994), Angrist & Krueger (1995), Bound, Jaeger & Baker (1995) and Staiger
& Stock (1997) to mention a few.
9. An alternative one-step method that achieves the same asymptotic efficiency as
robust GMM or LIML estimators is the maximum empirical likelihood estimation
method, see Imbens (1997). This maximizes a multinomial pseudo-likelihood function
subject to the orthogonality restrictions. These are invariant to normalization because
they are maximum likelihood estimators.
10. Maddala et al. (1997) also provide a unified treatment of classical, Bayes and
empirical Bayes procedures for estimating this model.
REFERENCES
Ahn, S. C., & Schmidt, P. (1995). Efficient Estimation of Models for Dynamic Panel Data. Journal
of Econometrics, 68, 5–27.
Ahn, S. C., & Schmidt, P. (1997). Efficient Estimation of Dynamic Panel Data Models: Alternative
Assumptions and Simplified Estimation. Journal of Econometrics, 76, 309–321.
Ahn, S. C., & Schmidt, P. (1999a). Modified Generalized Instrumental Variables Estimation of
Panel Data Models with Strictly Exogenous Instrumental Variables. In: C. Hsiao, K. Lahiri,
L. F. Lee & M. H. Pesaran (Eds.), Analysis of Panel Data and Limited Dependent Variable
Models (pp. 171–198). Cambridge: Cambridge University Press.
Ahn, S. C., & Schmidt, P. (1999b). Estimation of Linear Panel Data Models Using GMM. In:
Generalized Method of Moments Estimation (pp. 211–247). Cambridge: Cambridge
University Press.
Alonso-Borrego, C., & Arellano, M. (1999). Symmetrically Normalized Instrumental Variable
Estimation Using Panel Data. Journal of Business and Economic Statistics, 17, 36–49.
Alvarez, J., & Arellano, M. (1997). The Time Series and Cross-section Asymptotics of Dynamic
Panel Data Estimators. Working paper, CEMFI, Madrid.
Andersen, T. G., & Sørensen, R. E. (1996). GMM Estimation of a Stochastic Volatility Model: A
Monte Carlo Study. Journal of Business and Economic Statistics, 14, 328–352.
Anderson, T. W., & Hsiao, C. (1982). Formulation and Estimation of Dynamic Models Using
Panel Data. Journal of Econometrics, 18, 47–82.
Andersson, J., & Lyhagen, J. (1999). A Long Memory Panel Unit Root Test: PPP Revisited.
Working paper, Economics and Finance, No. 303, Stockholm School of Economics,
Sweden.
Angrist, J. D., & Krueger, A. B. (1995). Split Sample Instrumental Variable Estimates of Return
to Schooling. Journal of Business and Economic Statistics, 13, 225–235.
Arellano, M. (1989). A Note on the Anderson-Hsiao Estimator for Panel Data. Economics Letters,
31, 337–341.
Arellano, M., & Bond, S. (1991). Some Tests of Specification for Panel Data: Monte Carlo
Evidence and An Application to Employment Equations. Review of Economic Studies, 58,
277–297.
Arellano, M., & Bover, O. (1995). Another Look at the Instrumental Variables Estimation of Error-Component Models. Journal of Econometrics, 68, 29–51.
Baltagi, B. H. (1995). Econometric Analysis of Panel Data. Chichester: Wiley.
Baltagi, B. H., & Griffin, J. M. (1995). A Dynamic Demand Model for Liquor: The Case for
Pooling. Review of Economics and Statistics, 77, 545–553.
Baltagi, B. H., & Griffin, J. M. (1997). Pooled Estimators v.s. Their Heterogeneous Counterparts
in the Context of Dynamic Demand for Gasoline. Journal of Econometrics, 77, 303–327.
Baltagi, B. H., Griffin, J. M. & Xiong, W. (2000). To Pool or Not to Pool: Homogeneous Versus
Heterogeneous Estimators Applied to Cigarette Demand. Review of Economics and
Statistics, 82, 117–126.
Banerjee, A. (1999). Panel Data Unit Roots and Cointegration: An Overview. Oxford Bulletin of
Economics and Statistics, 61, 607–629.
Bekker, P. A. (1994). Alternative Approximations to the Distributions of Instrumental Variables
Estimators. Econometrica, 62, 657–682.
Bernard, A., & Jones, C. (1996). Productivity Across Industries and Countries: Time Series Theory
and Evidence. Review of Economics and Statistics, 78, 135–146.
Bhargava, A., Franzini, L. & Narendranathan, W. (1982). Serial Correlation and Fixed Effects
Models. Review of Economic Studies, 49, 533–549.
Binder, M., Hsiao, C. & Pesaran, M. H. (2000). Estimation and Inference in Short Panel Vector
Autoregressions With Unit Roots and Cointegration. Working paper, Department of
Economics, University of Maryland.
Blundell, R. W., & Bond, S. (1998). Initial Conditions and Moment Restrictions in Dynamic Panel
Data Models. Journal of Econometrics, 87, 115–143.
Blundell, R. W., Bond, S., & Windmeijer, F. (2000). Estimation in Dynamic Panel Data Models:
Improving on the Performance of the Standard GMM Estimator. Advances in Econometrics,
15, forthcoming.
Boumahdi, R., & Thomas, A. (1991). Testing for Unit Roots Using Panel Data. Economics Letters,
37, 77–79.
Bound, J., Jaeger, D. A., & Baker, R. M. (1995). Problems with Instrumental Variables Estimation
When the Correlation Between the Instruments and the Endogenous Explanatory Variables
is Weak. Journal of the American Statistical Association, 90, 443–450.
Breitung, J. (2000). The Local Power of Some Unit Root Tests for Panel Data. Advances in
Econometrics, 15, forthcoming.
Breitung, J., & Meyer, W. (1994). Testing for Unit Roots in Panel Data: Are Wages on Different
Bargaining Levels Cointegrated? Applied Economics, 26, 353–361.
Canzoneri, M. B., Cumby, E. E., & Diba, B. (1999). Relative Labor Productivity and the Real
Exchange Rate in the Long Run: Evidence for a Panel of OECD Countries. Journal of
International Economics, 47, 245–266.
Chen, B., McCoskey, S., & Kao, C. (1999). Estimation and Inference of a Cointegrated Regression
in Panel Data: A Monte Carlo Study. American Journal of Mathematical and Management
Sciences, 19, 75–114.
Chiang, M. H., & Kao, C. (2000). Nonstationary Panel Time Series Using NPT 1.0 – A User
Guide. Manuscript, Center for Policy Research, Syracuse University.
Choi, I. (1999a). Unit Root Tests for Panel Data. Working paper, Department of Economics,
Kookmin University, Korea.
Choi, I. (1999b). Asymptotic Analysis of a Nonstationary Error Component Model. Working paper,
Department of Economics, Kookmin University, Korea.
Choi, I. (1999c). Instrumental Variables Estimation of a Nearly Nonstationary Error Component
Model. Working paper, Department of Economics, Kookmin University, Korea.
Coakley, J., & Fuertes, A. M. (1997). New Panel Unit Root Tests of PPP. Economics Letters, 57,
17–22.
Coakley, J., Kulasi, F., & Smith, R. (1996). Current Account Solvency and the Feldstein-Horioka
Puzzle. Economic Journal, 106, 620–627.
Coe, D., & Helpman, E. (1995). International R&D Spillovers. European Economic Review, 39,
859–887.
Conley, T. G. (1999). GMM Estimation with Cross Sectional Dependence. Journal of
Econometrics, 92, 1–45.
Crepon, B., Kramarz, F., & Trognon, A. (1997). Parameters of Interest, Nuisance Parameters and
Orthogonality Conditions: An Application to Autoregressive Error Components Models.
Journal of Econometrics, 82, 135–156.
Culver, S. E., & Papell, D. H. (1997). Is There a Unit Root in the Inflation Rate? Evidence from
Sequential Break and Panel Data Model. Journal of Applied Econometrics, 35, 155–160.
Driscoll, J. C., & Kraay, A. C. (1998). Consistent Covariance Matrix Estimation with Spatially
Dependent Panel Data. Review of Economics and Statistics, 80, 549–560.
Evans, P., & Karras, G. (1996). Convergence Revisited. Journal of Monetary Economics, 37,
249–265.
Entorf, H. (1997). Random Walks with Drifts: Nonsense Regression and Spurious Fixed-Effect
Estimation. Journal of Econometrics, 80, 287–296.
Frankel, J. A., & Rose, A. K. (1996). A Panel Project on Purchasing Power Parity: Mean Reversion
Within and Between Countries. Journal of International Economics, 40, 209–224.
Funk, M. (1998). Trade and International R&D Spillovers Among OECD Countries. Working
paper, Department of Economics, St. Louis University, St. Louis.
Gerdtham, U. G., & Löthgren, M. (1998). On Stationarity and Cointegration of International
Health Expenditure and GDP. Working paper, Economics and Finance, No. 232,
Stockholm School of Economics, Sweden.
Griliches, Z., & Mairesse, J. (1998). Production Functions: The Search for Identification. In: S.
Strom (Ed.), Essays in Honour of Ragnar Frisch. Econometric Society Monograph Series,
Cambridge: Cambridge University Press.
Groen, J. J. J. (1999). The Monetary Exchange Rate Model as A Long-run Phenomenon. Journal
of International Economics, forthcoming.
Groen, J. J. J., & Kleibergen, F. (1999). Likelihood-Based Cointegration Analysis in Panels of
Vector Error Correction Models. Discussion paper 99055/4, Tinbergen Institute, The
Netherlands.
Hadri, K. (1999). Testing the Null Hypothesis of Stationarity Against the Alternative of a Unit Root
in Panel Data with Serially Correlated Errors. Manuscript, Department of Economics and
Accounting, University of Liverpool, United Kingdom.
Hahn, J. (1997). Efficient Estimation of Panel Data Models With Sequential Moment Restrictions.
Journal of Econometrics, 79, 1–21.
Hahn, J. (1999). How Informative is the Initial Condition in the Dynamic Panel Model with Fixed
Effects? Journal of Econometrics, 93, 309–326.
Hall, S., Lazarova, S., & Urga, G. (1999). A Principal Components Analysis of Common
Stochastic Trends in Heterogeneous Panel Data: Some Monte Carlo Evidence. Oxford
Bulletin of Economics and Statistics, 61, 749–767.
Harris, D., & Inder, B. (1994). A Test of the Null Hypothesis of Cointegration. In: C. P. Hargreaves
(Ed.), Nonstationary Time Series Analysis and Cointegration. New York: Oxford University
Press.
Harris, R. D. F., & Tzavalis, E. (1999). Inference for Unit Roots in Dynamic Panels Where the
Time Dimension is Fixed. Journal of Econometrics, 91, 201–226.
Hausman, J. A., & Taylor, W. E. (1981). Panel Data and Unobservable Individual Effects.
Econometrica, 49, 1377–1398.
Hillier, G. H. (1990). On the Normalization of Structural Equations: Properties of Direction
Estimators. Econometrica, 58, 1181–1194.
Holtz-Eakin, D. (1988). Testing for Individual Effects in Autoregressive Models. Journal of
Econometrics, 39, 297–307.
Hsiao, C. (1986). Analysis of Panel Data. Cambridge: Cambridge University Press.
Hsiao, C., Pesaran, M. H., & Tahmiscioglu, K. (1999). Bayes Estimation of Short-run Coefficients
in Dynamic Panel Data Models. In: C. Hsiao, K. Lahiri, L. F. Lee & M. H. Pesaran (Eds.),
Analysis of Panel Data and Limited Dependent Variable Models (pp. 268–296).
Cambridge: Cambridge University Press.
Im, K. S., Ahn, S. C., Schmidt, P., & Wooldridge, J. M. (1999). Efficient Estimation of Panel Data
Models with Strictly Exogenous Explanatory Variables. Journal of Econometrics, 93,
177–201.
Im, K. S., Pesaran, M. H., & Shin, Y. (1997). Testing for Unit Roots in Heterogeneous Panels.
Manuscript, Department of Applied Economics, University of Cambridge, United
Kingdom.
Maddala, G. S. (1999). On the Use of Panel Data Methods with Cross Country Data. Annales
d'Economie et de Statistique, 55–56, 429–448.
Maddala, G. S., Srivastava, V. K., & Li, H. (1994). Shrinkage Estimators for the Estimation of
Short-run and Long-run Parameters From Panel Data Models. Working paper, Ohio State
University, Ohio.
Maddala, G. S., Trost, R. P., Li, H., & Joutz, F. (1997). Estimation of Short-run and Long-run
Elasticities of Energy Demand from Panel Data Using Shrinkage Estimators. Journal of
Business and Economic Statistics, 15, 90–100.
Maddala, G. S., & Wu, S. (1999). A Comparative Study of Unit Root Tests with Panel Data and
A New Simple Test. Oxford Bulletin of Economics and Statistics, 61, 631–652.
Maddala, G. S., Wu, S., & Liu, P. (2000). Do Panel Data Rescue Purchasing Power Parity (PPP)
Theory? In: J. Krishnakumar & E. Ronchetti (Eds.), Panel Data Econometrics: Future
Directions (pp. 35–51). Amsterdam: North-Holland.
Mátyás, L., & Sevestre, P. (Eds.) (1996). The Econometrics of Panel Data: A Handbook of Theory
and Applications. Dordrecht: Kluwer Academic Publishers.
McCoskey, S., & Kao, C. (1998). A Residual-Based Test of the Null of Cointegration in Panel
Data. Econometric Reviews, 17, 57–84.
McCoskey, S., & Kao, C. (1999a). Testing the Stability of a Production Function with
Urbanization as a Shift Factor: An Application of Non-Stationary Panel Data Techniques.
Oxford Bulletin of Economics and Statistics, 61, 671–690.
McCoskey, S., & Kao, C. (1999b). Comparing Panel Data Cointegration Tests with an Application
of the Twin Deficits Problems. Working paper, Center for Policy Research, Syracuse
University, New York.
McCoskey, S., & Selden, T. (1998). Health Care Expenditures and GDP: Panel Data Unit Root
Test Results. Journal of Health Economics, 17, 369–376.
Metcalf, G. E. (1996). Specification Testing in Panel Data with Instrumental Variables. Journal of
Econometrics, 71, 291–307.
Moon, H. R., & Phillips, P. C. B. (1998). A Reinterpretation of the Feldstein-Horioka Regressions
from a Nonstationary Panel Viewpoint. Working paper, Department of Economics, Yale
University.
Moon, H. R., & Phillips, P. C. B. (1999). Maximum Likelihood Estimation in Panels with
Incidental Trends. Oxford Bulletin of Economics and Statistics, 61, 711–747.
Nelson, C., & Startz, R. (1990). The Distribution of the Instrumental Variables Estimator and Its
t-ratio When the Instrument Is A Poor One. Journal of Business, 63, S125–S140.
Nerlove, M. (1999). Properties of Alternative Estimators of Dynamic Panel Models: An Empirical
Analysis of Cross-country Data for the Study of Economic Growth. In: C. Hsiao, K. Lahiri,
L. F. Lee & M. H. Pesaran (Eds.), Analysis of Panel Data and Limited Dependent Variable
Models (pp. 136–170). Cambridge: Cambridge University Press.
Nickell, S. (1981). Biases in Dynamic Models with Fixed Effects. Econometrica, 49, 1417–1426.
O'Connell, P. G. J. (1998). The Overvaluation of Purchasing Power Parity. Journal of
International Economics, 44, 1–19.
Oh, K. Y. (1996). Purchasing Power Parity and Unit Roots Tests Using Panel Data. Journal of
International Money and Finance, 15, 405–418.
Papell, D. (1997). Searching for Stationarity: Purchasing Power Parity Under the Current Float.
Journal of International Economics, 43, 313–332.
Pedroni, P. (1996). Fully Modified OLS for Heterogeneous Cointegrated Panels and the Case of
Purchasing Power Parity. Working paper, Department of Economics, Indiana University.
Pedroni, P. (1997a). Panel Cointegration: Asymptotic and Finite Sample Properties of Pooled Time
Series Tests with an Application to the PPP Hypothesis. Working paper, Department of
Economics, Indiana University.
Pedroni, P. (1997b). Cross Sectional Dependence in Cointegration Tests of Purchasing Power
Parity in Panels. Working paper, Department of Economics, Indiana University.
Pedroni, P. (1999). Critical Values for Cointegration Tests in Heterogeneous Panels with Multiple
Regressors. Oxford Bulletin of Economics and Statistics, 61, 653–678.
Pedroni, P. (2000). Testing for Convergence to Common Steady States in Nonstationary
Heterogeneous Panels. Working paper, Department of Economics, Indiana University.
Pesaran, M. H., & Smith, R. (1995). Estimating Long-run Relationships From Dynamic
Heterogeneous Panels. Journal of Econometrics, 68, 79–113.
Pesaran, M. H., Shin, Y., & Smith, R. (1999). Pooled Mean Group Estimation of Dynamic
Heterogeneous Panels. Journal of the American Statistical Association, 94, 621–634.
Pesaran, M. H., & Zhao, Z. (1999). Bias Reduction in Estimating Long-run Relationships From
Dynamic Heterogeneous Panels. In: C. Hsiao, K. Lahiri, L. F. Lee & M. H. Pesaran (Eds.),
Analysis of Panels and Limited Dependent Variable Models (pp. 297–322). Cambridge:
Cambridge University Press.
Phillips, P. C. B., & Hansen, B. E. (1990). Statistical Inference in Instrumental Variables
Regression with I(1) Processes. Review of Economic Studies, 57, 99–125.
Phillips, P. C. B., & Moon, H. (1999a). Linear Regression Limit Theory for Nonstationary Panel
Data. Econometrica, 67, 1057–1111.
Phillips, P. C. B., & Moon, H. (1999b). Nonstationary Panel Data Analysis: An Overview of Some
Recent Developments. Econometric Reviews, forthcoming.
Phillips, P. C. B., & Ouliaris, S. (1990). Asymptotic Properties of Residual Based Tests for
Cointegration. Econometrica, 58, 165–193.
Quah, D. (1994). Exploiting Cross Section Variation for Unit Root Inference in Dynamic Data.
Economics Letters, 44, 9–19.
Quah, D. (1996). Empirics for Economic Growth and Convergence. European Economic Review,
40, 1353–1375.
Robertson, D., & Symons, J. (1992). Some Strange Properties of Panel Data Estimators. Journal
of Applied Econometrics, 7, 175–189.
Saikkonen, P. (1991). Asymptotically Efficient Estimation of Cointegrating Regressions.
Econometric Theory, 58, 1–21.
Sala-i-Martin, X. (1996). The Classical Approach to Convergence Analysis. Economic Journal,
106, 1019–1036.
Schmidt, P., Ahn, S. C. & Wyhowski, D. (1992). Comment. Journal of Business and Economic
Statistics, 10, 10–14.
Shin, Y. (1994). A Residual Based Test of the Null of Cointegration Against the Alternative of No
Cointegration. Econometric Theory, 10, 91–115.
Staiger, D., & Stock, J. H. (1997). Instrumental Variables Regression With Weak Instruments.
Econometrica, 65, 557–586.
Stock, J. (1993). A Simple Estimator of Cointegrating Vectors in Higher Order Integrated Systems.
Econometrica, 61, 783–820.
Stock, J., & Watson, M. (1993). A Simple Estimator of Cointegrating Vectors in Higher Order
Integrated Systems. Econometrica, 61, 783–820.
Tauchen, G. (1986). Statistical Properties of Generalized Method of Moments Estimators of
Structural Parameters Obtained From Financial Market Data. Journal of Business and
Economic Statistics, 4, 397–416.
Wansbeek, T. J., & Bekker, P. (1996). On IV, GMM and ML in a Dynamic Panel Data Model.
Economics Letters, 51, 145–152.
Wansbeek, T. J., & Knaap, T. (1999). Estimating a Dynamic Panel Data Model with Heterogenous
Trends. Annales d'Economie et de Statistique, 55–56, 331–349.
Wooldridge, J. M. (1997). Multiplicative Panel Data Models Without the Strict Exogeneity
Assumption. Econometric Theory, 13, 667–678.
Wu, S., & Yin, Y. (1999). Tests for Cointegration in Heterogeneous Panel: A Monte Carlo Study.
Working paper, Department of Economics, State University of New York at Buffalo, New
York.
Wu, Y. (1996). Are Real Exchange Rates Nonstationary? Evidence from a Panel Data Set. Journal
of Money, Credit and Banking, 28, 54–63.
Ziliak, J. P. (1997). Efficient Estimation with Panel Data When Instruments are Predetermined: An
Empirical Comparison of Moment-condition Estimators. Journal of Business and
Economic Statistics, 15, 419–431.
1. INTRODUCTION
Much of the recent literature on dynamic panel data estimation has focused on
providing optimal linear Generalised Method of Moments (GMM) estimators
Nonstationary Panels, Panel Cointegration and Dynamic Panels, Volume 15, pages 53–91.
Copyright © 2000 by Elsevier Science Inc.
All rights of reproduction in any form reserved.
ISBN: 0-7623-0688-2
those provided by the system GMM estimator when the initial conditions
restriction is valid.
The empirical application returns to the Griliches and Mairesse discussion.
The application uses production function data for the U.S. and confirms the
Griliches and Mairesse findings for the capital and labor coefficients in a Cobb-Douglas model. Using the standard first-differenced GMM estimator, the
estimated coefficient on capital is very low and all coefficient estimates have
poor precision. Constant returns to scale is easily rejected. Moreover, an
examination of the individual series suggests that they are highly autoregressive
thus hinting at a weak instruments problem for standard GMM on this data.
These production function results are improved by using the system estimator.
The capital coefficient is now more precise and takes a reasonable value and
constant returns to scale is not rejected. These Monte Carlo and empirical
results indicate that a careful examination of the original series and use of the
system GMM estimator can overcome many of the disappointing features of
the standard GMM estimator in the context of highly persistent series.
(2.1)
$$u_{it} = \eta_i + v_{it} \qquad (2.2)$$
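$$y_{it} = \alpha y_{i,t-1} + u_{it} \qquad (2.3)$$
$$u_{it} = \eta_i + v_{it}$$
for $i = 1, \ldots, N$ and $t = 2, \ldots, T$.² At the outset we will assume that $\eta_i$ and $v_{it}$
have the familiar error components structure in which
$$E(\eta_i) = 0, \quad E(v_{it}) = 0, \quad E(v_{it}\eta_i) = 0 \quad \text{for } i = 1, \ldots, N \text{ and } t = 2, \ldots, T \qquad (2.4)$$
and
$$E(v_{it}v_{is}) = 0 \quad \text{for } i = 1, \ldots, N \text{ and } t \neq s. \qquad (2.5)$$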
In addition there is the standard assumption concerning the initial conditions $y_{i1}$
(see Ahn & Schmidt (1995), for example)
$$E(y_{i1}v_{it}) = 0 \quad \text{for } i = 1, \ldots, N \text{ and } t = 2, \ldots, T. \qquad (2.6)$$
These standard assumptions (2.4), (2.5) and (2.6) imply moment restrictions
that are sufficient to (identify and) estimate $\alpha$ for $T \geq 3$.³
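Further restrictions on the initial conditions define a mean stationary process
as
$$y_{i1} = \frac{\eta_i}{1-\alpha} + \varepsilon_{i1} \quad \text{for } i = 1, \ldots, N \qquad (2.7)$$
and
$$E(\varepsilon_{i1}) = E(\eta_i\varepsilon_{i1}) = 0 \quad \text{for } i = 1, \ldots, N, \qquad (2.8)$$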
for i = 1, . . . , N and t = 2, . . . , T
$$E(\varepsilon_{i1}^2) = \frac{\sigma_v^2}{1-\alpha^2} \quad \text{for } i = 1, \ldots, N.$$
For completeness and to conclude this brief outline of the dynamic error
components model, we consider the biases from the standard panel data
estimators in this model. We consider here the biases found under covariance
stationarity (for more details see Baltagi (1995) and Hsiao (1986)).
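The asymptotic bias of the simple OLS estimator for $\alpha$ in model (2.3) is
given by
$$\operatorname{plim}(\hat\alpha_{OLS} - \alpha) = (1-\alpha)\,\frac{\sigma_\eta^2/\sigma_v^2}{\sigma_\eta^2/\sigma_v^2 + k}, \quad \text{with } k = \frac{1-\alpha}{1+\alpha},$$
where $\sigma_\eta^2 = E(\eta_i^2)$, and therefore the OLS estimator is biased upwards, with
$\alpha < \operatorname{plim}(\hat\alpha_{OLS}) < 1$.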
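The asymptotic bias of the Within Groups estimator for $\alpha$ has been
documented by Nickell (1981) and is given by
$$\operatorname{plim}(\hat\alpha_{WG} - \alpha) = -\,\frac{\dfrac{1+\alpha}{T-1}\left(1 - \dfrac{1}{T}\,\dfrac{1-\alpha^T}{1-\alpha}\right)}{1 - \dfrac{2\alpha}{(1-\alpha)(T-1)}\left(1 - \dfrac{1-\alpha^T}{T(1-\alpha)}\right)},$$
and so, when $\alpha > 0$, $\operatorname{plim}(\hat\alpha_{WG}) < \alpha$.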
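When the model is transformed into first-differences to eliminate the
unobserved individual heterogeneity component $\eta_i$,
$$\Delta y_{it} = \alpha\,\Delta y_{i,t-1} + \Delta u_{it},$$
the asymptotic bias of the OLS estimator is given by
$$\operatorname{plim}(\hat\alpha_{OLSd} - \alpha) = -\,\frac{1+\alpha}{2},$$
and so $\operatorname{plim}(\hat\alpha_{OLSd}) = \dfrac{\alpha - 1}{2} < 0$.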
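The three probability limits above can be tabulated numerically. The sketch below is only an illustration: the function names are hypothetical, the formulas are coded as read from the expressions in this section (under covariance stationarity), and `n_periods` plays the role of $T$ in the Nickell (1981) expression.

```python
def plim_ols_levels(alpha, var_ratio):
    """plim of OLS in levels; var_ratio stands for sigma_eta^2 / sigma_v^2."""
    k = (1.0 - alpha) / (1.0 + alpha)
    return alpha + (1.0 - alpha) * var_ratio / (var_ratio + k)


def plim_within_groups(alpha, n_periods):
    """plim of the Within Groups estimator (Nickell-type bias expression)."""
    t = n_periods
    geo = (1.0 - alpha ** t) / (t * (1.0 - alpha))
    numerator = (1.0 + alpha) / (t - 1) * (1.0 - geo)
    denominator = 1.0 - 2.0 * alpha / ((1.0 - alpha) * (t - 1)) * (1.0 - geo)
    return alpha - numerator / denominator


def plim_ols_differences(alpha):
    """plim of OLS applied to the first-differenced model."""
    return alpha - (1.0 + alpha) / 2.0


if __name__ == "__main__":
    for a in (0.0, 0.5, 0.9):
        print(a, plim_ols_levels(a, 1.0), plim_within_groups(a, 10),
              plim_ols_differences(a))
```

Under these assumptions OLS in levels lies above $\alpha$, while Within Groups and differenced OLS lie below it, so the first two give a rough bracket for a consistent estimate.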
$$W_N = \left(\frac{1}{N}\sum_{i=1}^{N} Z_{di}'\,\Delta\hat u_i\,\Delta\hat u_i'\,Z_{di}\right)^{-1} \qquad (3.2)$$
where $\Delta\hat u_i$ are residuals from an initial consistent estimator. We refer to this as
the two-step GMM estimator.⁵ In the absence of any additional knowledge
about the process for the initial conditions, this estimator is asymptotically
efficient in the class of estimators based on the linear moment conditions (3.1)
(see Hansen (1982) and Chamberlain (1987)).
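A minimal sketch of the two-step calculation for the AR(1) model follows. It is an illustration under stated assumptions, not the authors' code: the simulation design (unit variances, covariance-stationary start), all function names, and the choice of $(\sum_i Z_{di}'Z_{di})^{-1}$ as the initial weight matrix are hypothetical. The $1/N$ scaling in (3.2) is omitted because it cancels in the estimator.

```python
import random


def simulate_ar1_panel(n_units, n_periods, alpha, seed=0):
    """Simulate y_it = alpha*y_{i,t-1} + eta_i + v_it (assumed design)."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_units):
        eta = rng.gauss(0.0, 1.0)
        y = [eta / (1.0 - alpha) + rng.gauss(0.0, (1.0 - alpha ** 2) ** -0.5)]
        for _ in range(n_periods - 1):
            y.append(alpha * y[-1] + eta + rng.gauss(0.0, 1.0))
        panel.append(y)
    return panel


def instrument_rows(y):
    """Rows of Z_di: the equation for period t = 3..T uses y_i1..y_{i,t-2}."""
    n_periods = len(y)
    n_moments = (n_periods - 1) * (n_periods - 2) // 2
    rows, offset = [], 0
    for t in range(3, n_periods + 1):
        row = [0.0] * n_moments
        row[offset:offset + t - 2] = y[:t - 2]
        offset += t - 2
        rows.append(row)
    return rows


def project(rows, vec):
    """Return Z_i' vec for one unit."""
    return [sum(r[j] * v for r, v in zip(rows, vec)) for j in range(len(rows[0]))]


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def gmm_estimate(a, b, weight_inverse):
    """alpha_hat = (a'W b)/(a'W a) with W the inverse of weight_inverse."""
    x = solve(weight_inverse, a)
    return sum(xi * bi for xi, bi in zip(x, b)) / sum(xi * ai for xi, ai in zip(x, a))


def two_step_gmm(panel):
    """Return (initial estimate, two-step estimate) of alpha."""
    units = []
    for y in panel:
        dy = [y[t] - y[t - 1] for t in range(2, len(y))]
        dy_lag = [y[t - 1] - y[t - 2] for t in range(2, len(y))]
        units.append((instrument_rows(y), dy, dy_lag))
    m = len(units[0][0][0])
    a, b = [0.0] * m, [0.0] * m
    zz = [[0.0] * m for _ in range(m)]
    for rows, dy, dy_lag in units:
        za, zb = project(rows, dy_lag), project(rows, dy)
        for j in range(m):
            a[j] += za[j]
            b[j] += zb[j]
        for row in rows:
            for j in range(m):
                if row[j] != 0.0:
                    for k in range(m):
                        zz[j][k] += row[j] * row[k]
    alpha_1 = gmm_estimate(a, b, zz)  # initial consistent estimate
    s = [[0.0] * m for _ in range(m)]
    for rows, dy, dy_lag in units:
        resid = [d - alpha_1 * lag for d, lag in zip(dy, dy_lag)]
        g = project(rows, resid)  # Z_di' * residuals for this unit
        for j in range(m):
            for k in range(m):
                s[j][k] += g[j] * g[k]
    return alpha_1, gmm_estimate(a, b, s)
```

For example, `two_step_gmm(simulate_ar1_panel(3000, 6, 0.5, seed=1))` returns both estimates for a simulated panel with true $\alpha = 0.5$.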
3.2. Homoskedasticity
Ahn & Schmidt (1995) show that additional linear moment conditions are
available if the $v_{it}$ disturbances are homoskedastic through time, i.e. if
$$E(v_{it}^2) = \sigma_i^2 \quad \text{for } t = 2, \ldots, T. \qquad (3.3)$$
(3.4)
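$$\begin{bmatrix}
y_{i2} & 0 & \cdots & 0 \\
-y_{i3} & y_{i3} & \cdots & 0 \\
0 & -y_{i4} & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & y_{i,T-2} \\
0 & 0 & \cdots & -y_{i,T-1}
\end{bmatrix}$$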
4. WEAK INSTRUMENTS
The instruments used in the standard first-differenced GMM estimator become
less informative in two important cases. First, as the value of the autoregressive
parameter $\alpha$ increases towards unity; and second, as the variance of the
individual effects $\eta_i$ increases relative to the variance of $v_{it}$. To examine this
further consider the case with $T = 3$. In this case, the moment conditions
corresponding to the standard GMM estimator reduce to a single orthogonality
condition. The corresponding method of moments estimator reduces to a
simple two stage least squares (2SLS) estimator, with first stage (instrumental
variable) regression
$$\Delta y_{i2} = \pi_d\, y_{i1} + r_i \quad \text{for } i = 1, \ldots, N.$$
For sufficiently high autoregressive parameter $\alpha$ or for sufficiently high relative
variance of the individual effects, the least squares estimate of the reduced form
coefficient $\pi_d$ can be made arbitrarily close to zero. In this case the instrument
$y_{i1}$ is only weakly correlated with $\Delta y_{i2}$. To see this notice that the model (2.3)
implies that
$$\Delta y_{i2} = (\alpha - 1)\,y_{i1} + \eta_i + v_{i2} \quad \text{for } i = 1, \ldots, N. \qquad (4.1)$$
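Under covariance stationarity, the least squares estimate of the coefficient on $y_{i1}$ in (4.1) has probability limit
$$\operatorname{plim}\hat\pi_d = (\alpha - 1)\,\frac{k}{\sigma_\eta^2/\sigma_v^2 + k}; \quad \text{with } k = \frac{1-\alpha}{1+\alpha}. \qquad (4.2)$$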
The bias term effectively scales the estimated coefficient on the instrumental
variable $y_{i1}$ toward zero. We find that $\operatorname{plim}\hat\pi_d \to 0$ as $\alpha \to 1$ or as
$(\sigma_\eta^2/\sigma_v^2) \to \infty$, which are the cases in which the first stage F-statistic is $O_p(1)$.
A graph showing both $\operatorname{plim}\hat\pi_d$ and $\alpha - 1$ against $\alpha$ is given in Fig. 1, for
$\sigma_\eta^2 = \sigma_v^2$, $T = 3$.
Fig. 1. $\operatorname{plim}\hat\pi_d$ and $\alpha - 1$, $\sigma_\eta^2 = \sigma_v^2$, $T = 3$. Source: Blundell & Bond (1998).

We are interested in inferences using this first-differenced instrumental
variable (IV) estimator when $\pi_d$ is local to zero, that is where the instrument $y_{i1}$
is only weakly correlated with $\Delta y_{i2}$. Following Nelson & Startz (1990a, b) and
Staiger & Stock (1997) we characterise this problem of weak instruments using
the concentration parameter. First note that the F-statistic for the first stage
instrumental variable regression converges to a noncentral chi-squared with one
degree of freedom. The concentration parameter is then the corresponding noncentrality
parameter which we label $\tau$ in this case. The IV estimator performs
poorly when $\tau$ approaches zero. Assuming covariance stationarity, $\tau$ has the
following simple characterisation in terms of the parameters of the AR model
$$\tau = \frac{(\sigma_v^2 k)^2}{\sigma_\eta^2 + \sigma_v^2 k}; \quad \text{with } k = \frac{1-\alpha}{1+\alpha}.$$
Fig. 2. Concentration Parameter $\tau$, $\sigma_\eta^2 = \sigma_v^2 = 1$, $T = 3$. Source: Blundell & Bond (1998).
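The two quantities plotted in Figs 1 and 2 are easy to evaluate directly. The sketch below is illustrative only: the function names are hypothetical, and the expression coded for the first-stage probability limit is our reading of (4.2), obtained from model (2.3) under covariance stationarity.

```python
def first_stage_plim(alpha, var_ratio):
    """plim of the first-stage coefficient; var_ratio = sigma_eta^2 / sigma_v^2."""
    k = (1.0 - alpha) / (1.0 + alpha)
    return (alpha - 1.0) * k / (var_ratio + k)


def concentration_parameter(alpha, sigma2_eta=1.0, sigma2_v=1.0):
    """Concentration parameter for the T = 3 first-differenced IV estimator."""
    k = (1.0 - alpha) / (1.0 + alpha)
    return (sigma2_v * k) ** 2 / (sigma2_eta + sigma2_v * k)


if __name__ == "__main__":
    for a in (0.0, 0.5, 0.8, 0.9, 0.99):
        print(a, first_stage_plim(a, 1.0), concentration_parameter(a))
```

Both quantities shrink towards zero as $\alpha$ approaches one, which is the weak instruments case described in the text.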
(5.1)
5.2. Homoskedasticity
Under the homoskedasticity through time restriction (3.3), there is one further
non-linear moment condition available, in addition to (3.1), (3.4) and (5.1) (see
Ahn & Schmidt (1995)). This can be written as
$$E(\bar u_i\,\Delta u_{i3}) = 0 \quad \text{where } \bar u_i = \frac{1}{T-1}\sum_{t=2}^{T} u_{it}. \qquad (5.2)$$
$$E(\eta_i\,\Delta y_{i2}) = 0 \quad \text{for } i = 1, \ldots, N. \qquad (6.1)$$
Notice that, given (2.3)–(2.6) which specifies $y_{i2}$ given $y_{i1}$, assumption (6.1) is
a restriction on the initial conditions process generating $y_{i1}$.⁷
If this initial conditions restriction holds in addition to the standard
assumptions (2.4), (2.5) and (2.6), the following $T - 2$ linear moment
conditions are valid
$$E(u_{it}\,\Delta y_{i,t-1}) = 0; \quad \text{for } t = 3, 4, \ldots, T. \qquad (6.2)$$
$$\Delta y_{it} = \alpha^{t-2}\,\Delta y_{i2} + \sum_{s=0}^{t-3}\alpha^{s}\,\Delta u_{i,t-s}$$
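so that $\Delta y_{it}$ will be uncorrelated with $\eta_i$ if and only if $\Delta y_{i2}$ is uncorrelated with
$\eta_i$. This is precisely the assumption (6.1). To guarantee this, we require the
initial conditions restriction
$$E\left[\left(y_{i1} - \frac{\eta_i}{1-\alpha}\right)\eta_i\right] = 0,$$
which is satisfied under mean stationarity. For $T = 3$ and under covariance stationarity, the first stage regression of $y_{i2}$ on the instrument $\Delta y_{i2}$ has a coefficient $\pi_l$ with
$$\operatorname{plim}\hat\pi_l = \frac{1}{2} \qquad (6.3)$$
and therefore this moment condition stays informative for high values of $\alpha$, in
contrast to the moment condition available for the first-differenced model.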
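The role of the initial conditions restriction can be checked by simulation. The following sketch is an illustration under assumed settings (unit variances, a hypothetical function name): it compares the sample covariance between $\eta_i$ and the first difference of $y$ in period 3 when $y_{i1}$ is drawn from the mean stationary distribution with the case where $y_{i1}$ ignores $\eta_i$ entirely.

```python
import random


def covariance_with_effect(alpha, stationary_start, n_units=20000, seed=0):
    """Sample covariance between eta_i and (y_i3 - y_i2) in a simulated AR(1) panel."""
    rng = random.Random(seed)
    etas, diffs = [], []
    for _ in range(n_units):
        eta = rng.gauss(0.0, 1.0)
        eps = rng.gauss(0.0, (1.0 - alpha ** 2) ** -0.5)
        # Mean stationary start: y_i1 = eta_i/(1 - alpha) + eps_i1.
        # Otherwise y_i1 is unrelated to eta_i (an assumed violation).
        y1 = eta / (1.0 - alpha) + eps if stationary_start else eps
        y2 = alpha * y1 + eta + rng.gauss(0.0, 1.0)
        y3 = alpha * y2 + eta + rng.gauss(0.0, 1.0)
        etas.append(eta)
        diffs.append(y3 - y2)
    mean_eta = sum(etas) / n_units
    mean_diff = sum(diffs) / n_units
    return sum((e - mean_eta) * (d - mean_diff)
               for e, d in zip(etas, diffs)) / n_units


if __name__ == "__main__":
    print(covariance_with_effect(0.5, stationary_start=True))
    print(covariance_with_effect(0.5, stationary_start=False))
```

With the stationary start the covariance is close to zero, so lagged differences are valid instruments for the levels equations; with the non-stationary start it is clearly positive.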
The $0.5(T+1)(T-2)$ linear moment conditions (3.1) and (6.2) comprise the
full set of second-order moment conditions under mean stationarity in
conjunction with the standard assumptions listed in Section 2, and form the
basis for a system GMM estimator which will be discussed in the next section.
However, as this system GMM estimator combines the moment conditions for
the model in first-differences with those for the model in levels, we also
consider a simpler GMM levels estimator, that is based on the
$m_l = 0.5(T-1)(T-2)$ moment conditions
$$E(u_{it}\,\Delta y_{i,t-s}) = 0; \quad \text{for } t = 3, \ldots, T \text{ and } 1 \leq s \leq t-2, \qquad (6.4)$$
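$$Z_{li} = \begin{bmatrix}
\Delta y_{i2} & 0 & 0 & \cdots & 0 & \cdots & 0 \\
0 & \Delta y_{i2} & \Delta y_{i3} & \cdots & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \cdots & \vdots & \cdots & \vdots \\
0 & 0 & 0 & \cdots & \Delta y_{i2} & \cdots & \Delta y_{i,T-1}
\end{bmatrix},$$
and $u_i$ is the $(T-2)$ vector $(u_{i3}, u_{i4}, \ldots, u_{iT})'$. Calculation of the one-step and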
(7.1)
(7.2)
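These moment conditions apply to the stacked residual vector $(\Delta u_i', u_i')'$ with instrument matrix
$$Z_{si} = \begin{bmatrix} Z_{di} & 0 \\ 0 & Z^{p}_{li} \end{bmatrix}
= \begin{bmatrix}
Z_{di} & 0 & 0 & \cdots & 0 \\
0 & \Delta y_{i2} & 0 & \cdots & 0 \\
0 & 0 & \Delta y_{i3} & \cdots & 0 \\
\vdots & \vdots & \vdots & \cdots & 0 \\
0 & 0 & 0 & \cdots & \Delta y_{i,T-1}
\end{bmatrix};$$
with $Z_{di}$ as defined in section 3, and $Z^{p}_{li}$ is the non-redundant subset of $Z_{li}$.
The calculation of the two-step GMM estimator is then analogous to that
described above. Again in this case, unless $\sigma_\eta^2 = 0$, there is no one-step GMM
estimator that is asymptotically equivalent to the two-step estimator, even in the
special case of i.i.d. disturbances.¹¹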
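The system GMM estimator is clearly a combination of the GMM
differenced estimator and a GMM levels estimator that uses only (7.2). This
combination is linear for the system 2SLS estimator which is given by
$$\hat\alpha_s = \left(q_{-1}'Z_s(Z_s'Z_s)^{-1}Z_s'q_{-1}\right)^{-1} q_{-1}'Z_s(Z_s'Z_s)^{-1}Z_s'q, \quad \text{where } q_i = \begin{bmatrix} \Delta y_i \\ y_i \end{bmatrix}.$$
Because
$$q_{-1}'Z_s(Z_s'Z_s)^{-1}Z_s'q_{-1} = \Delta y_{-1}'Z_d(Z_d'Z_d)^{-1}Z_d'\Delta y_{-1} + y_{-1}'Z^{p}_{l}(Z^{p\prime}_{l}Z^{p}_{l})^{-1}Z^{p\prime}_{l}y_{-1}$$
the system 2SLS estimator is equivalent to the linear combination
$$\hat\alpha_s = \gamma\,\hat\alpha_d + (1-\gamma)\,\hat\alpha^{p}_{l},$$
where $\hat\alpha_d$ and $\hat\alpha^{p}_{l}$ are the 2SLS first-differenced and levels estimators
respectively, with the levels estimator utilising only the $T - 2$ moment
conditions (7.2), and
$$\gamma = \frac{\Delta y_{-1}'Z_d(Z_d'Z_d)^{-1}Z_d'\Delta y_{-1}}{\Delta y_{-1}'Z_d(Z_d'Z_d)^{-1}Z_d'\Delta y_{-1} + y_{-1}'Z^{p}_{l}(Z^{p\prime}_{l}Z^{p}_{l})^{-1}Z^{p\prime}_{l}y_{-1}}
= \frac{\hat\pi_d'Z_d'Z_d\hat\pi_d}{\hat\pi_d'Z_d'Z_d\hat\pi_d + \hat\pi_l'Z^{p\prime}_{l}Z^{p}_{l}\hat\pi_l},$$
where $\hat\pi_d$ and $\hat\pi_l$ are the OLS estimates of the first stage regression coefficients
underlying these 2SLS estimators. From (4.2) and (6.3) it follows that $\gamma \to 0$ if
$\alpha \to 1$ and/or $(\sigma_\eta^2/\sigma_v^2) \to \infty$, so all the weight for the system estimator will in
these cases be given to the informative levels moment conditions (7.2).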
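The weighting result can be illustrated for the simplest case $T = 3$, where each equation has a single instrument and all quantities are scalars. The sketch below rests on assumptions not in the text: a simulated panel with unit variances and a covariance-stationary start, and hypothetical function and variable names.

```python
import random


def system_2sls_weights(alpha, n_units=5000, seed=0):
    """Return (alpha_d, alpha_l, alpha_s, gamma) for a simulated T = 3 panel."""
    rng = random.Random(seed)
    c_d = s_d = e_d = 0.0  # differenced equation: instrument y_i1
    c_l = s_l = e_l = 0.0  # levels equation: instrument (y_i2 - y_i1)
    for _ in range(n_units):
        eta = rng.gauss(0.0, 1.0)
        y1 = eta / (1.0 - alpha) + rng.gauss(0.0, (1.0 - alpha ** 2) ** -0.5)
        y2 = alpha * y1 + eta + rng.gauss(0.0, 1.0)
        y3 = alpha * y2 + eta + rng.gauss(0.0, 1.0)
        dy2, dy3 = y2 - y1, y3 - y2
        c_d += y1 * dy2   # instrument times regressor
        s_d += y1 * y1    # instrument squared
        e_d += y1 * dy3   # instrument times dependent variable
        c_l += dy2 * y2
        s_l += dy2 * dy2
        e_l += dy2 * y3
    alpha_d = e_d / c_d           # 2SLS on the differenced equation
    alpha_l = e_l / c_l           # 2SLS on the levels equation
    a_d = c_d * c_d / s_d         # projected regressor sum of squares, differences
    a_l = c_l * c_l / s_l         # projected regressor sum of squares, levels
    gamma = a_d / (a_d + a_l)
    alpha_s = (c_d * e_d / s_d + c_l * e_l / s_l) / (a_d + a_l)  # system 2SLS
    return alpha_d, alpha_l, alpha_s, gamma


if __name__ == "__main__":
    for a in (0.3, 0.95):
        print(a, system_2sls_weights(a))
```

In this design the weight on the differenced estimator is noticeably smaller at $\alpha = 0.95$ than at $\alpha = 0.3$, in line with the limiting behaviour of $\gamma$ described above.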
7.2. Homoskedasticity
In the case where the initial conditions satisfy restriction (6.1) and the $v_{it}$ satisfy
restriction (3.3), Ahn & Schmidt (1995, equation (12b)) show that the $T - 2$
homoskedasticity restrictions (3.4) and (5.2) can be replaced by a set of $T - 2$
moment conditions
$$E(y_{it}u_{it} - y_{i,t-1}u_{i,t-1}) = 0; \quad \text{for } t = 3, \ldots, T,$$
which are all linear in the parameter $\alpha$. The non-linear conditions (5.2) are
again redundant for estimation given (6.1), and the complete set of second
order moment restrictions implied by (2.3)–(2.6), (3.3) and (6.1) can be
implemented as a linear GMM estimator.
Table 1.

                  σ_η²/σ_v² = 1.00          σ_η²/σ_v² = 0.25
          α       SYS       NON-LINEAR      SYS       NON-LINEAR
T = 3     0.0       1.33    n/a               1.33    n/a
          0.3       2.15    n/a               1.89    n/a
          0.5       4.00    n/a               2.91    n/a
          0.8      28.00    n/a              13.10    n/a
          0.9     121.33    n/a              47.91    n/a
T = 4     0.0       1.75    1.67              1.40    1.29
          0.3       2.31    1.91              1.77    1.33
          0.5       3.26    2.10              2.42    1.35
          0.8      13.97    2.42              8.88    1.41
          0.9      55.40    2.54             30.90    1.45

Section 7.1. These asymptotic variance ratios are calculated assuming both
covariance stationarity and homoskedasticity. They are presented for $T = 3$ and
$T = 4$, for two fixed values of $\sigma_\eta^2/\sigma_v^2$, and for a range of values of the
autoregressive parameter $\alpha$. For comparison, we also reproduce from Ahn &
Schmidt (1995) the corresponding asymptotic variance ratios comparing first-differenced
GMM to the non-linear GMM estimator which uses the quadratic
moment conditions (5.1), but not the extra linear moment conditions (6.2). In
the $T = 3$ case there are no quadratic moment restrictions available. These
calculations suggest that exploiting conditions (6.2) can result in dramatic
efficiency gains when $T = 3$, particularly at high values of $\alpha$ and high values of
$\sigma_\eta^2/\sigma_v^2$. These are indeed the cases where we find the instruments used to obtain
the first-differenced estimator to be weak.
In the $T = 4$ case we still find dramatic efficiency gains at high values of $\alpha$.
Comparison to the results for the non-linear GMM estimator also shows that
the gains from exploiting conditions (6.2) can be much larger than the gains
from simply exploiting the non-linear restrictions (5.1).
In the Monte Carlo simulations presented in Section 11 we investigate
whether similar improvements are found in finite samples.
$$y_{it} = \alpha y_{i,t-1} + \beta x_{it} + u_{it}, \quad u_{it} = \eta_i + v_{it} \qquad (9.1)$$
where $x_{it}$ is a scalar. The error components $\eta_i$ and $v_{it}$ again satisfy the conditions
(2.4)–(2.6). The $x_{it}$ process is correlated with the individual effects $\eta_i$ and we
consider three possible correlation structures between the $x_{it}$ process and the $v_{it}$
error process that determine the instruments that can be used to estimate $\alpha$ and
$\beta$.
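First, the $x_{it}$ process is strictly exogenous:
$$E(x_{is}v_{it}) = 0; \quad \text{for } s = 1, \ldots, T; \; t = 2, \ldots, T. \qquad (9.2)$$
Secondly, the $x_{it}$ process is predetermined:
$$E(x_{is}v_{it}) = 0; \quad \text{for } s = 1, \ldots, t; \; t = 2, \ldots, T \qquad (9.3)$$
$$E(x_{is}v_{it}) \neq 0; \quad \text{for } s = t+1, \ldots, T; \; t = 2, \ldots, T$$
and thirdly, the $x_{it}$ process is endogenously determined
$$E(x_{is}v_{it}) = 0; \quad \text{for } s = 1, \ldots, t-1; \; t = 2, \ldots, T \qquad (9.4)$$
$$E(x_{is}v_{it}) \neq 0; \quad \text{for } s = t, \ldots, T; \; t = 2, \ldots, T.$$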
We are especially interested in the case when the xit process is endogenously
determined, which includes simultaneous processes, but also measurement
error.
For the GMM first-differenced estimator, the $0.5(T-1)(T-2)$ moment
conditions (3.1)
$$E(y_{i,t-s}\,\Delta u_{it}) = 0; \quad \text{for } t = 3, \ldots, T \text{ and } 2 \leq s \leq t-1$$
remain valid. When the $x_{it}$ process is strictly exogenous, the following
additional $T(T-2)$ moment conditions are valid
$$E(x_{is}\,\Delta u_{it}) = 0; \quad \text{for } t = 3, \ldots, T \text{ and } 1 \leq s \leq T. \qquad (9.5)$$
When $x_{it}$ is predetermined there are only the $0.5(T+1)(T-2)$ additional
moment conditions
$$E(x_{i,t-s}\,\Delta u_{it}) = 0; \quad \text{for } t = 3, \ldots, T \text{ and } 1 \leq s \leq t-1, \qquad (9.6)$$
(9.7)
For the non-linear GMM estimator, moment conditions (5.1) remain valid,
and no further moment conditions result from the presence of $x_{it}$ variables.
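The counts of moment conditions quoted above can be verified by enumerating the admissible $(t, s)$ pairs. This is a small illustrative sketch with hypothetical names. The endogenous case is an assumption on our part, since the corresponding equation is not legible in this copy: it uses lags of $x$ dated $t-2$ and earlier, the same lag range as for $y$ in (3.1).

```python
def count_moment_conditions(n_periods, x_status):
    """Return (number of y-based, number of x-based) moment conditions
    for the first-differenced equations t = 3..T."""
    T = n_periods
    # Lagged levels of y: 2 <= s <= t - 1, as in (3.1).
    y_count = sum(1 for t in range(3, T + 1) for s in range(2, t))
    if x_status == "strictly_exogenous":
        # All x_is, s = 1..T, for every equation, as in (9.5).
        x_count = sum(1 for t in range(3, T + 1) for s in range(1, T + 1))
    elif x_status == "predetermined":
        # x_{i,t-s} with 1 <= s <= t - 1, as in (9.6).
        x_count = sum(1 for t in range(3, T + 1) for s in range(1, t))
    elif x_status == "endogenous":
        # Assumed: x_{i,t-s} with 2 <= s <= t - 1.
        x_count = sum(1 for t in range(3, T + 1) for s in range(2, t))
    else:
        raise ValueError("unknown x_status: " + x_status)
    return y_count, x_count


if __name__ == "__main__":
    for status in ("strictly_exogenous", "predetermined", "endogenous"):
        print(status, count_moment_conditions(5, status))
```

The enumerated counts agree with the closed forms $0.5(T-1)(T-2)$, $T(T-2)$ and $0.5(T+1)(T-2)$ given in the text.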
For the system GMM estimator, we first consider under what conditions both
$\Delta y_{it}$ and $\Delta x_{it}$ are uncorrelated with $\eta_i$. In order to illustrate this, we specify the
following process for the regressor
$$x_{it} = \rho\,x_{i,t-1} + \tau\eta_i + e_{it}.$$
Thus $\tau \neq 0$ allows the level of $x_{it}$ to be correlated with $\eta_i$, and the covariance
properties between $v_{it}$ and $e_{is}$ determine whether $x_{it}$ is strictly exogenous,
predetermined or endogenously determined. First notice that
$$\Delta x_{it} = \rho^{t-2}\,\Delta x_{i2} + \sum_{s=0}^{t-3}\rho^{s}\,\Delta e_{i,t-s},$$
so that $\Delta x_{it}$ will be correlated with $\eta_i$ if and only if $\Delta x_{i2}$ is correlated with $\eta_i$.
To guarantee $E[\Delta x_{i2}\eta_i] = 0$ we require the initial conditions restriction
$$E\left[\left(x_{i1} - \frac{\tau\eta_i}{1-\rho}\right)\eta_i\right] = 0 \qquad (9.8)$$
t3
t2
yit =
yi2 +
(9.9)
s=0
shows that yit will be correlated with i if and only if yi2 is correlated with
i. To guarantee E[yi2i] = 0 we then require the similar initial conditions
restriction
yi1
i
+ i
1
1
i
=0
(9.10)
which would again be satisfied under stationarity. Thus, there are additional
moment restrictions available for the equations in levels when the yit and xit
processes are both mean stationary.
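The origin of the initial conditions restriction on xi1 can be traced in three lines. This is a sketch under stated assumptions: the regressor follows the first-order process with autoregressive coefficient ρ and loading τ on ηi described above, E[ei2 ηi] = 0, and the symbol σ²η = E[ηi²] is introduced here only for the illustration.

```latex
\begin{aligned}
\Delta x_{i2} &= (\rho - 1)\,x_{i1} + \tau\eta_i + e_{i2}, \\
E[\Delta x_{i2}\,\eta_i] &= (\rho - 1)\,E[x_{i1}\eta_i] + \tau\sigma_\eta^2 = 0
  \iff E[x_{i1}\eta_i] = \frac{\tau\sigma_\eta^2}{1 - \rho}, \\
&\iff E\!\left[\left(x_{i1} - \frac{\tau\eta_i}{1 - \rho}\right)\eta_i\right] = 0 .
\end{aligned}
```

The same argument applied to Δyi2, using the result for xi1, gives the corresponding restriction on yi1.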
Whilst joint stationarity of the means is sufficient to ensure that both Δyit and Δxit
are uncorrelated with ηi, this condition is stronger than is necessary. For
example, if the conditional model (9.1) has generated the yit series for a
sufficiently long time prior to our sample period for any influence of the true
initial conditions to be negligible, then an expression analogous to (9.9) shows
that Δyit will be uncorrelated with ηi provided that Δxit is uncorrelated with ηi,
even if the mean of xit (and hence yit) is time-varying. Moreover we can note
that it is perfectly possible for Δxit to be uncorrelated with ηi in cases where Δyit
is correlated with ηi (for example, when (9.8) holds or τ = 0 but (9.10) is not
satisfied). However, given (9.9), it seems very unlikely that Δyit will be
uncorrelated with ηi in contexts where Δxit is correlated with ηi.
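When both Δyit and Δxit are uncorrelated with ηi, the extra moment
conditions for the GMM system estimator are, as before, (7.2),
$$E(u_{it}\,\Delta y_{i,t-1}) = 0 \quad \text{for } t = 3, \dots, T$$
and
$$E(u_{it}\,\Delta x_{it}) = 0 \quad \text{for } t = 2, \dots, T \tag{9.11}$$
when xit is strictly exogenous or predetermined, or
$$E(u_{it}\,\Delta x_{i,t-1}) = 0 \quad \text{for } t = 3, \dots, T \tag{9.12}$$
when xit is endogenously determined.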
$$Sar_d = \frac{1}{N}\,\Delta\hat{u}' Z_d W_N Z_d' \Delta\hat{u}$$
where WN is the optimal weight matrix as in (3.2) and Δû are the two-step
residuals in the differenced model. In general, under the null that the moment
conditions are valid, Sard is asymptotically chi-squared distributed with md - k
degrees of freedom, where md is the number of moment conditions and k is the
number of estimated parameters.
For the system estimator, the same test is readily defined. Call this test Sars.
A test for the validity of the level moment conditions that are utilised by the
system estimator is then obtained as the difference between Sars and Sard:
$$\text{Dif-Sar} = Sar_s - Sar_d \tag{10.1}$$
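The Sargan statistic and the difference test can be sketched numerically as follows. This is an illustration under assumptions rather than the authors' code: residuals are stored as an N x T array and instruments as an N x T x m array, the weight matrix is taken to be the inverse of the average outer product of the individual moment contributions (the exact form of (3.2) is not shown in this section), and the degrees of freedom of the difference test are taken to be the number of additional moment conditions. All function names are hypothetical.

```python
import numpy as np


def optimal_weight(resid, Z):
    """Inverse of (1/N) * sum_i Z_i' u_i u_i' Z_i (assumed form of W_N)."""
    moments = np.einsum("ntm,nt->nm", Z, resid)  # row i holds Z_i' u_i
    return np.linalg.inv(moments.T @ moments / len(resid))


def sargan_statistic(resid, Z, W):
    """(1/N) * u' Z W Z' u, with Z'u accumulated over individuals and periods."""
    g = np.einsum("ntm,nt->m", Z, resid)
    return float(g @ W @ g) / len(resid)


def dif_sargan(sar_s, sar_d, m_s, m_d):
    """Difference statistic and its assumed degrees of freedom (m_s - m_d)."""
    return sar_s - sar_d, m_s - m_d


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    resid = rng.normal(size=(200, 3))    # placeholder residuals, not real estimates
    Z = rng.normal(size=(200, 3, 5))     # placeholder instruments
    W = optimal_weight(resid, Z)
    print(sargan_statistic(resid, Z, W))
```

A p-value would then be read from the chi-squared distribution with the relevant degrees of freedom.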
$$y_{it} = \alpha y_{i,t-1} + \beta x_{it} + \eta_i + v_{it} \tag{11.1}$$
$$x_{it} = \rho x_{i,t-1} + \tau \eta_i + \theta v_{it} + e_{it} \tag{11.2}$$
with
$$\eta_i \sim N(0, \sigma_\eta^2); \quad v_{it} \sim N(0, \sigma_v^2); \quad e_{it} \sim N(0, \sigma_e^2)$$
and the initial observations are drawn from the covariance stationary
distribution. Although these errors are homoskedastic, we do not consider any
of the additional moment conditions that require homoskedasticity in the
simulated estimators.
We choose the error process parameters in such a way that the xit process is
highly persistent for high values of ρ. Further, xit is positively correlated with
ηi and the value of θ is negative to mimic the effects of measurement error. The
values of the parameters that are kept fixed in the various Monte Carlo
simulations presented below are
$$\beta = 1, \quad \tau = 0.25, \quad \theta = -0.1, \quad \sigma_\eta^2 = 1, \quad \sigma_v^2 = 1, \quad \sigma_e^2 = 0.16.$$
The parameters that are varied in the simulations are the autoregressive
coefficients α and ρ. We consider four designs with α and ρ both taking the
values of 0.5 and 0.95. The case when α = 0.5 and ρ = 0.95 resembles the
production function data that will be analysed in the next section. The sample
size is N = 500, and the simulation results for the various estimators are
presented in Tables 2 and 3 for T = 4 and in Tables 4 and 5 for T = 8.
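A single replication of this kind of design can be sketched as follows. Several things here are assumptions made for the illustration: the model is taken to be a dynamic equation for y with coefficients alpha and beta plus an AR(1) regressor with coefficient rho that loads on the individual effect (tau) and on the equation error (theta); the stationary initial conditions are approximated by starting each series at its individual-specific mean and discarding a burn-in; and only the OLS levels and Within Groups estimators are computed. Function and argument names are hypothetical.

```python
import numpy as np


def simulate_panel(N, T, alpha, rho, beta=1.0, tau=0.25, theta=-0.1,
                   sd_eta=1.0, sd_v=1.0, sd_e=0.4, burn_in=100, seed=0):
    """Return (y, x), each N x T, after discarding burn_in start-up periods."""
    rng = np.random.default_rng(seed)
    eta = rng.normal(0.0, sd_eta, N)
    x = tau * eta / (1.0 - rho)              # start at the individual-specific mean
    y = (beta * x + eta) / (1.0 - alpha)
    ys, xs = [], []
    for t in range(burn_in + T):
        v = rng.normal(0.0, sd_v, N)
        e = rng.normal(0.0, sd_e, N)
        x = rho * x + tau * eta + theta * v + e
        y = alpha * y + beta * x + eta + v
        if t >= burn_in:
            ys.append(y)
            xs.append(x)
    return np.column_stack(ys), np.column_stack(xs)


def ols_and_within_groups(y, x):
    """Coefficients on (y_{t-1}, x_t) from pooled OLS in levels and from Within Groups."""
    Y, Ylag, X = y[:, 1:], y[:, :-1], x[:, 1:]

    def fit(Y, Ylag, X, constant):
        cols = [Ylag.ravel(), X.ravel()]
        if constant:
            cols.append(np.ones(Y.size))
        coef, *_ = np.linalg.lstsq(np.column_stack(cols), Y.ravel(), rcond=None)
        return coef[:2]

    def demean(a):
        return a - a.mean(axis=1, keepdims=True)

    return fit(Y, Ylag, X, True), fit(demean(Y), demean(Ylag), demean(X), False)


if __name__ == "__main__":
    y, x = simulate_panel(N=500, T=4, alpha=0.5, rho=0.5)
    ols, wg = ols_and_within_groups(y, x)
    print("OLS (alpha, beta):", ols, " WG (alpha, beta):", wg)
```

In a design with short T, one would expect the OLS estimate of the autoregressive coefficient to lie above the true value and the Within Groups estimate below it, which is the pattern the reported simulations describe.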
Means, standard deviations and root mean squared errors (RMSE) from
10,000 simulations are tabulated for the OLS levels estimator (OLS), Within
Groups estimator (WG), the GMM first-differenced estimator (DIF), the non-linear GMM estimator (AS),13 the levels GMM estimator (LEV), and the
system GMM estimator (SYS). Thus for the case of estimating the AR(1)
model for xit, DIF uses the moment conditions (3.1); AS uses the moment
conditions (3.1) and (5.1); LEV uses the moment conditions (6.4); and SYS
uses the moment conditions (3.1) and (6.2). The reported results are for the
two-step GMM estimators.

Table 2. T = 4, ρ = 0.5.

                     OLS      WG      DIF     AS      LEV     SYS
ρ          Mean     0.762  -0.036    0.496   0.501   0.502   0.500
           St D     0.017   0.030    0.090   0.075   0.059   0.055
           RMSE     0.263   0.538    0.091   0.075   0.059   0.055
α = 0.5
  α        Mean     0.820   0.010    0.469   0.516   0.512   0.512
           St D     0.011   0.031    0.131   0.095   0.070   0.060
           RMSE     0.320   0.491    0.135   0.096   0.070   0.061
  β        Mean     0.775   0.318    0.915   1.006   1.029   1.015
           St D     0.053   0.080    0.420   0.351   0.336   0.257
           RMSE     0.231   0.687    0.428   0.351   0.337   0.257
α = 0.95
  α        Mean     0.990   0.300    0.350   0.840   0.980   0.979
           St D     0.001   0.032    0.487   0.242   0.029   0.033
           RMSE     0.040   0.651    0.773   0.266   0.042   0.044
  β        Mean     0.583   0.194   -0.195   0.790   1.004   1.000
           St D     0.053   0.075    0.994   0.524   0.289   0.232
           RMSE     0.420   0.809    1.554   0.565   0.289   0.232

Means and standard deviations of 10,000 replications. DIF, AS, LEV and SYS are two-step estimators.

Table 3. T = 4, ρ = 0.95.

                     OLS      WG      DIF     AS      LEV     SYS
ρ          Mean     0.997   0.221    0.472   0.868   0.961   0.953
           St D     0.002   0.032    0.825   0.221   0.144   0.096
           RMSE     0.047   0.729    0.954   0.235   0.145   0.096
α = 0.5
  α        Mean     0.650   0.089    0.466   0.500   0.518   0.514
           St D     0.014   0.031    0.103   0.065   0.053   0.044
           RMSE     0.151   0.412    0.109   0.065   0.056   0.046
  β        Mean     0.830   0.551    0.517   1.021   1.078   1.075
           St D     0.034   0.090    1.438   0.461   0.160   0.153
           RMSE     0.174   0.458    1.522   0.461   0.178   0.170
α = 0.95
  α        Mean     0.962   0.661    0.907   0.936   0.957   0.956
           St D     0.001   0.026    0.104   0.072   0.008   0.010
           RMSE     0.012   0.290    0.112   0.074   0.010   0.011
  β        Mean     0.904   0.465    0.233   0.863   1.020   1.020
           St D     0.026   0.089    1.769   0.853   0.091   0.090
           RMSE     0.100   0.543    1.928   0.864   0.093   0.092

Means and standard deviations of 10,000 replications. DIF, AS, LEV and SYS are two-step estimators.

Table 4. T = 8, ρ = 0.5.

                     OLS      WG      DIF     AS      LEV     SYS
ρ          Mean     0.762   0.265    0.494   0.495   0.503   0.501
           St D     0.012   0.018    0.034   0.025   0.029   0.024
           RMSE     0.262   0.236    0.035   0.026   0.029   0.024
α = 0.5
  α        Mean     0.820   0.311    0.480   0.497   0.523   0.511
           St D     0.007   0.017    0.040   0.029   0.034   0.027
           RMSE     0.320   0.190    0.045   0.029   0.041   0.029
  β        Mean     0.775   0.490    0.930   0.944   1.041   0.997
           St D     0.034   0.045    0.136   0.134   0.157   0.124
           RMSE     0.228   0.512    0.153   0.145   0.162   0.124
α = 0.95
  α        Mean     0.990   0.662    0.548   0.969   0.982   0.979
           St D     0.001   0.016    0.177   0.030   0.007   0.011
           RMSE     0.040   0.289    0.440   0.036   0.032   0.031
  β        Mean     0.581   0.388    0.226   0.972   0.979   0.983
           St D     0.035   0.044    0.356   0.134   0.108   0.101
           RMSE     0.421   0.613    0.852   0.137   0.110   0.103

Means and standard deviations of 10,000 replications. DIF, AS, LEV and SYS are two-step estimators.

Table 5. T = 8, ρ = 0.95.

                     OLS      WG      DIF     AS      LEV     SYS
ρ          Mean     0.997   0.591    0.676   0.903   0.973   0.958
           St D     0.001   0.017    0.222   0.061   0.022   0.031
           RMSE     0.047   0.359    0.350   0.077   0.032   0.032
α = 0.5
  α        Mean     0.650   0.396    0.480   0.508   0.523   0.518
           St D     0.009   0.015    0.033   0.024   0.022   0.021
           RMSE     0.150   0.106    0.039   0.025   0.032   0.028
  β        Mean     0.830   0.796    0.800   1.099   1.084   1.075
           St D     0.022   0.040    0.290   0.125   0.058   0.059
           RMSE     0.171   0.208    0.352   0.159   0.101   0.095
α = 0.95
  α        Mean     0.962   0.882    0.927   0.956   0.957   0.957
           St D     0.001   0.009    0.025   0.007   0.002   0.003
           RMSE     0.012   0.068    0.034   0.009   0.007   0.007
  β        Mean     1.019   0.745    0.615   1.016   1.017   1.019
           St D     0.017   0.040    0.400   0.118   0.028   0.031
           RMSE     0.100   0.258    0.555   0.119   0.033   0.036

Means and standard deviations of 10,000 replications. DIF, AS, LEV and SYS are two-step estimators.
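The three summary statistics reported in Tables 2 to 5 are linked by the identity RMSE² = bias² + (St D)², which gives a quick consistency check on any row. The sketch below applies it to two entries read from Table 2 (the DIF estimate of α in the α = 0.95 design, and the OLS estimate of ρ); the function name `implied_rmse` is hypothetical.

```python
import math


def implied_rmse(mean, std_dev, true_value):
    """RMSE implied by a reported mean and standard deviation."""
    bias = mean - true_value
    return math.sqrt(bias ** 2 + std_dev ** 2)


if __name__ == "__main__":
    # DIF, alpha = 0.95 design: reported mean 0.350, St D 0.487, RMSE 0.773.
    print(round(implied_rmse(0.350, 0.487, 0.95), 3))
    # OLS, rho = 0.5: reported mean 0.762, St D 0.017, RMSE 0.263.
    print(round(implied_rmse(0.762, 0.017, 0.5), 3))
```

Small discrepancies in the third decimal place are to be expected because the tabulated means and standard deviations are themselves rounded.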
Tables 2 and 4 present results for ρ = 0.5. The row labelled ρ presents the
results for the estimates of ρ in model (11.2), where the various GMM
estimators only utilise lagged information on x as instruments, and potential
information from the lagged values of y is not used. Our results for the DIF and
SYS estimators can therefore be compared to those reported in, for example,
Blundell & Bond (1998) and Alonso-Borrego & Arellano (1999). As expected,
the OLS estimates are biased upwards and the WG estimates are biased
downwards. In this experiment where xit is not highly persistent and the
instruments available for the equations in first-differences are not weak, all four
GMM estimators are virtually unbiased. The AS, LEV and SYS estimators all
provide an improvement in precision compared to the standard DIF estimator.
As we would expect from the asymptotic variance ratios in Table 1, there is a
greater gain in precision from using SYS rather than AS at T = 4, although in
Table 4 we can observe that this difference becomes very small at T = 8.
The next two rows in Tables 2 and 4 present the estimation results for α and
β in model (11.1) when α = 0.5 and ρ = 0.5. The OLS estimates for α are biased
upwards, whereas those for β are biased downwards. The WG estimates for α
and β are both biased downwards. Again, as expected, since both the y and x
series have a low degree of persistence, the four GMM estimators perform quite
well in this experiment. The SYS estimator has the smallest RMSE for both
parameters, but the gains are not dramatic at T = 8.
The final two rows in Tables 2 and 4 are for the model with α = 0.95 and
ρ = 0.5. As this makes the y process highly persistent, the DIF estimator suffers
from a serious weak instrument bias, as well as being very imprecise. We can
notice that the DIF estimates of α and β are both biased downwards, in the
direction of the Within Groups estimates. The AS estimator is better behaved,
as a result of exploiting the non-linear moment conditions (5.1). However the
LEV and SYS estimators which exploit the initial conditions restrictions
provide more dramatic gains in precision, particularly for the estimation of α
and particularly in the case with T = 4. With T = 8, the LEV and SYS estimates
of α are biased upwards, in the direction of the OLS estimate, but still dominate
on the RMSE criterion.
Tables 3 and 5 present the results for the cases where the xit process is highly
persistent, with ρ = 0.95. The estimates for ρ show the familiar pattern: OLS is
upward biased, WG is downward biased, and DIF is downward biased towards
= 0.95 and T = 8. The Sars and Dif-Sar tests are considerably oversized in this
case, whereas the Sard test has the correct size.
Fig. 3.
Fig. 4.
Fig. 5.
Fig. 6.
Fig. 7. α = 0.5, ρ = 0.5, T = 4.
Fig. 8. α = 0.5, ρ = 0.95, T = 4.
Fig. 9. α = 0.5, ρ = 0.5, T = 8.
Fig. 10. α = 0.5, ρ = 0.95, T = 8.
Fig. 11. α = 0.95, ρ = 0.5, T = 4.
Fig. 12. α = 0.95, ρ = 0.95, T = 4.
Fig. 13. α = 0.95, ρ = 0.5, T = 8.
Fig. 14. α = 0.95, ρ = 0.95, T = 8.
$$y_{it} = \beta_n n_{it} + \beta_k k_{it} + \gamma_t + (\eta_i + v_{it} + m_{it})$$
$$v_{it} = \rho v_{i,t-1} + e_{it}, \qquad |\rho| < 1 \tag{12.1}$$
where yit is log sales of firm i in year t, nit is log employment, kit is log capital
stock and γt is a year-specific intercept reflecting, for example, a common
technology shock. Of the error components, ηi is an unobserved time-invariant
firm-specific effect, vit is a possibly autoregressive (productivity) shock and mit
reflects serially uncorrelated (measurement) errors. Constant returns to scale
would imply βn + βk = 1, but this is not necessarily imposed.
Interest is in the consistent estimation of the parameters (βn, βk, ρ) when the
number of firms (N) is large and the number of years (T) is fixed. We maintain
that both employment (nit) and capital (kit) are potentially correlated with the
firm-specific effects (ηi), and with both productivity shocks (eit) and
measurement errors (mit).
The model has a dynamic (common factor) representation
$$y_{it} = \beta_n n_{it} - \rho\beta_n n_{i,t-1} + \beta_k k_{it} - \rho\beta_k k_{i,t-1} + \rho y_{i,t-1} + (\gamma_t - \rho\gamma_{t-1}) + (\eta_i(1 - \rho) + e_{it} + m_{it} - \rho m_{i,t-1}) \tag{12.2}$$
or
$$y_{it} = \pi_1 n_{it} + \pi_2 n_{i,t-1} + \pi_3 k_{it} + \pi_4 k_{i,t-1} + \pi_5 y_{i,t-1} + \gamma_t^* + (\eta_i^* + w_{it}) \tag{12.3}$$
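Comparing the common factor representation with the unrestricted one term by term, the five unrestricted coefficients are functions of only three structural parameters, which implies two non-linear restrictions: the coefficient on lagged employment equals minus the product of the coefficients on current employment and lagged output, and similarly for capital. A minimal sketch of this mapping follows; the function names and the numerical values are hypothetical and serve only as an illustration.

```python
def unrestricted_coefficients(beta_n, beta_k, rho):
    """Map the structural parameters to (pi1, pi2, pi3, pi4, pi5)."""
    return (beta_n, -rho * beta_n, beta_k, -rho * beta_k, rho)


def common_factor_gaps(pi):
    """Both entries are zero when pi2 = -pi1*pi5 and pi4 = -pi3*pi5 hold."""
    pi1, pi2, pi3, pi4, pi5 = pi
    return (pi2 + pi1 * pi5, pi4 + pi3 * pi5)


if __name__ == "__main__":
    pi = unrestricted_coefficients(beta_n=0.5, beta_k=0.4, rho=0.6)
    print(pi)
    print(common_factor_gaps(pi))
```

Applied to unrestricted estimates, non-zero gaps indicate how far the estimates are from satisfying the restrictions; a formal test would also need their covariance matrix.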
Table 6.

                  OLS      Within   DIF      DIF      SYS      SYS
                  Levels   Groups   t-2      t-3      t-2      t-3
nt                0.479    0.488    0.513    0.499    0.629    0.472
                 (0.029)  (0.030)  (0.089)  (0.101)  (0.106)  (0.112)
nt-1             -0.423   -0.023    0.073   -0.147   -0.092   -0.278
                 (0.031)  (0.034)  (0.093)  (0.113)  (0.108)  (0.120)
kt                0.235    0.177    0.132    0.194    0.361    0.398
                 (0.035)  (0.034)  (0.118)  (0.154)  (0.129)  (0.152)
kt-1             -0.212   -0.131   -0.207   -0.105   -0.326   -0.209
                 (0.035)  (0.025)  (0.095)  (0.110)  (0.104)  (0.119)
yt-1              0.922    0.404    0.326    0.426    0.462    0.602
                 (0.011)  (0.029)  (0.052)  (0.079)  (0.051)  (0.098)
m1               -2.60    -8.89    -6.21    -4.84    -8.14    -6.53
m2               -2.06    -1.09    -1.36    -0.69    -0.59    -0.35
Sar                                 0.001    0.073    0.000    0.032
Dif-Sar                                               0.001    0.102
βn                0.538    0.488    0.583    0.515    0.773    0.479
                 (0.025)  (0.030)  (0.085)  (0.099)  (0.093)  (0.098)
βk                0.266    0.199    0.062    0.225    0.231    0.492
                 (0.032)  (0.033)  (0.079)  (0.126)  (0.075)  (0.074)
ρ                 0.964    0.512    0.377    0.448    0.509    0.565
                 (0.006)  (0.022)  (0.049)  (0.073)  (0.048)  (0.078)
Comfac            0.000    0.000    0.014    0.711    0.012    0.772
CRS               0.000    0.000    0.000    0.006    0.922    0.641

Asymptotic standard errors in parentheses. Year dummies included in all models. m1 and m2 are
tests for first- and second-order serial correlation, asymptotically N(0, 1). We test the levels
residuals for OLS levels, and the first-differenced residuals in all other columns.
Comfac is a minimum distance test of the non-linear common factor restrictions imposed in the
restricted models. P-values are reported (also for Sar and Dif-Sar). CRS is a Wald test of the
constant returns to scale hypothesis βn + βk = 1 in the restricted models. P-values are reported.
Source: Blundell & Bond (2000).
For the one-step GMM estimators, t - s indicates that levels of the three series (y, n, k) dated
t - s and all observed longer lags are used as instruments for the first-differenced equations. SYS
estimators use lagged differences of the three series dated t - s + 1 as instruments for the levels
equations.
specifically tests the additional moment conditions used in the levels equations
accepts their validity at the 10% level. The system GMM parameter estimates
appear to be reasonable. The estimated coefficient on the lagged dependent
variable is higher than the Within Groups estimate, but well below the OLS
levels estimate. The common factor restrictions are easily accepted, and the
estimate of βk is both higher and better determined than the differenced GMM
estimate. The constant returns to scale restriction is easily accepted in the
system GMM results.16
Blundell & Bond (2000) explore this data in more detail and conclude that
the system GMM estimates in the final column of Table 6 are their preferred
results. In particular they find that the individual series used here are highly
persistent, and that the instruments available for the first-differenced equations
are only weakly correlated with the explanatory variables in first-differences.
This is consistent with the similarity between the first-differenced GMM and
Within Groups results. Blundell & Bond (2000) also find that when constant
returns to scale is imposed on the production function (it is not rejected in the
preferred system GMM results) then the results obtained using the first-differenced GMM estimator become more similar to the system GMM
estimates.
instruments available for the equations in levels, the system GMM estimator
can both greatly improve the precision and greatly reduce the finite sample bias
when these additional moment conditions are valid. Intermediate results are
found for the non-linear GMM estimator considered, which suggests that this
estimator could also be useful in applications with persistent series where the
validity of the initial conditions restrictions required for the system GMM
estimator is rejected.
The empirical application uses company accounts data for the US to estimate
a simple Cobb-Douglas production function. For the standard GMM estimator
that uses moment conditions only for the first-differenced equations, we
confirm the problems noted by Griliches and Mairesse: the estimated
coefficient on capital is very low, all coefficient estimates are imprecise, and
constant returns to scale is easily rejected. We notice that the first-differenced
GMM results are similar to the Within Groups results, which suggests there
may be a problem of weak instruments. This suggestion is consistent with the
persistence of the underlying sales, employment and capital stock series. The
additional moment conditions used by the system GMM estimator are not
rejected in this context, and lead to a marked improvement in the empirical
results.
Taken together, these Monte Carlo and empirical results suggest that careful
consideration of the underlying series and comparisons between different panel
data estimators can be useful in detecting situations where the standard first-differenced GMM estimator is likely to be subject to serious weak instruments
biases. Where appropriate, the use of the system GMM estimator offers a
simple and powerful alternative that can overcome many of the disappointing
features of the standard first-differenced GMM estimator in the context of
highly persistent series.
ACKNOWLEDGMENTS
This research is part of the programme of research at the ESRC Centre for the
Micro-Economic Analysis of Fiscal Policy at IFS. Financial support from the
ESRC is gratefully acknowledged.
NOTES
1. All of the estimators discussed and their properties extend in an obvious fashion
to higher order autoregressive models.
2. Extensions to dynamic models with additional regressors are considered in
Section 9.
3. With T = 3, the absence of serial correlation in vit (2.5) and predetermined initial
conditions (2.6) are required to identify α (in the absence of any strictly exogenous
instruments). With T > 3, α can be identified in the presence of suitably low order
moving average autocorrelation in vit.
4. These estimators are all based on the normalisation (2.3). Alonso-Borrego &
Arellano (1999) consider a symmetrically normalised instrumental variable estimator
based on the normalisation invariance of the standard LIML estimator.
5. As a choice of WN to yield the initial consistent estimator, Arellano & Bond
(1991) suggest
$$W_N = \left(\frac{1}{N}\sum_{i=1}^{N} Z_{di}' H_d Z_{di}\right)^{-1}, \qquad
H_d = \begin{pmatrix}
2 & -1 & 0 & \cdots & 0 \\
-1 & 2 & -1 & \cdots & 0 \\
0 & -1 & 2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 2
\end{pmatrix}$$
which can be calculated in one step. The use of this Hd matrix accounts for the first-order moving average structure in Δuit induced by the first-differencing transformation.
Note that when the vit are i.i.d., the one-step and two-step estimators are asymptotically
equivalent in this model. We follow this suggestion in the Monte Carlo simulations in
Section 11.
6. As shown by Arellano & Bover (1995), OLS on the model transformed to
orthogonal deviations coincides with the Within Groups estimator.
7. In this section we focus only on moment conditions that are valid under
heteroskedasticity. The case with homoskedasticity and assumption (6.1) is considered
in Section 7.2.
8. This corrects the expression for plim $\hat{\alpha}_l$ as given in Blundell and Bond (1998,
p. 125).
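9. As a choice of WN to yield the initial consistent estimator, we use
$$W_N = \left(\frac{1}{N}\sum_{i=1}^{N} Z_{li}' Z_{li}\right)^{-1}$$
$$W_N = \left(\frac{1}{N}\sum_{i=1}^{N} Z_{si}' H_s Z_{si}\right)^{-1}, \qquad
H_s = \begin{pmatrix} H_d & 0 \\ 0 & I_{T-2} \end{pmatrix}$$
$$Z_{nli} = \begin{pmatrix} Z_{di} & 0 \\ 0 & I_{T-3} \end{pmatrix}$$
, then the non-linear moment conditions can be written as
E[Z'nli si] = 0. As an initial weight matrix we use
$$W_N = \left(\frac{1}{N}\sum_{i=1}^{N} Z_{nli}' Z_{nli}\right)^{-1},$$
see Meghir & Windmeijer (1999).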
14. The unrestricted results are computed using DPD98 for GAUSS (see Arellano &
Bond, 1998).
15. The table reports p-values from minimum distance tests of the common factor
restrictions and Wald tests of the constant returns to scale restrictions.
16. One puzzle is that we find little evidence of second-order serial correlation in the
first-differenced residuals (i.e. an MA(1) component in the error term in levels),
although the use of instruments dated t - 2 is strongly rejected. It may be that the eit
productivity shocks are also MA(1), in a way that happens to offset the appearance of
serial correlation that would otherwise result from measurement errors.
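The tridiagonal matrix used in the one-step weight matrix of note 5 is the covariance pattern (up to scale) that first-differencing induces in serially uncorrelated, homoskedastic errors: it equals D D', where D is the first-difference operator. The sketch below builds it that way in plain Python; the function names are hypothetical.

```python
def first_difference_matrix(n_periods):
    """(n_periods - 1) x n_periods matrix whose row i takes the difference of periods i+1 and i."""
    return [[-1 if j == i else 1 if j == i + 1 else 0 for j in range(n_periods)]
            for i in range(n_periods - 1)]


def h_matrix(n_differences):
    """D D' for n_differences first-differenced equations: 2 on the diagonal, -1 next to it."""
    D = first_difference_matrix(n_differences + 1)
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in D] for row_i in D]


if __name__ == "__main__":
    for row in h_matrix(4):
        print(row)
```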
REFERENCES
Ahn, S. C., & Schmidt, P. (1995). Efficient Estimation of Models for Dynamic Panel Data. Journal
of Econometrics, 68, 5-28.
Alonso-Borrego, C., & Arellano, M. (1999). Symmetrically Normalised Instrumental-Variable
Estimation using Panel Data. Journal of Business and Economic Statistics, 17, 36-49.
Anderson, T. W., & Hsiao, C. (1981). Estimation of Dynamic Models with Error Components.
Journal of the American Statistical Association, 76, 598-606.
Arellano, M., & Bond, S. R. (1991). Some Tests of Specification for Panel Data: Monte Carlo
Evidence and an Application to Employment Equations. Review of Economic Studies, 58,
277-297.
Arellano, M., & Bond, S. R. (1998). Dynamic Panel Data Estimation using DPD98 for GAUSS.
http://www.ifs.org.uk/staff/steve_b.shtml.
Arellano, M., & Bover, O. (1995). Another Look at the Instrumental-Variable Estimation of Error-Components Models. Journal of Econometrics, 68, 29-52.
Baltagi, B. H. (1995). Econometric Analysis of Panel Data. Chichester: Wiley.
Bhargava, A., & Sargan, J. D. (1983). Estimating Dynamic Random Effects Models from Panel
Data Covering Short Time Periods. Econometrica, 51, 1635-1659.
Blundell, R. W., & Bond, S. R. (1998). Initial Conditions and Moment Restrictions in Dynamic
Panel Data Models. Journal of Econometrics, 87, 115-143.
Blundell, R. W., & Bond, S. (2000). GMM Estimation with Persistent Panel Data: An Application
to Production Functions. Econometric Reviews, 19(3), 321-340.
Bound, J., Jaeger, D. A., & Baker, R. M. (1995). Problems with Instrumental Variables Estimation
when the Correlation between the Instruments and the Endogenous Explanatory Variable is
Weak. Journal of the American Statistical Association, 90, 443-450.
Chamberlain, G. (1987). Asymptotic Efficiency in Estimation with Conditional Moment
Restrictions. Journal of Econometrics, 34, 305-334.
Davidson, R., & MacKinnon, J. G. (1996). Graphical Methods for Investigating the Size and
Power of Hypothesis Tests. Manchester School, 66, 1-26.
Griliches, Z., & Mairesse, J. (1998). Production Functions: the Search for Identification. In: S.
Strom (Ed.), Essays in Honour of Ragnar Frisch. Econometric Society Monograph Series,
Cambridge: Cambridge University Press.
Hansen, L. P. (1982). Large Sample Properties of Generalised Method of Moment Estimators.
Econometrica, 50, 1029-1054.
Holtz-Eakin, D., Newey, W., & Rosen, H. S. (1988). Estimating Vector Autoregressions with Panel
Data. Econometrica, 56, 1371-1396.
Hsiao, C. (1986). Analysis of Panel Data. Cambridge: Cambridge University Press.
Mairesse, J., & Hall, B. H. (1996). Estimating the Productivity of Research and Development in
French and US Manufacturing Firms: An Exploration of Simultaneity Issues with GMM
Methods. In: K. Wagner & B. Van Ark (Eds), International Productivity Differences and
Their Explanations (pp. 285-315). Elsevier Science.
Meghir, C., & Windmeijer, F. (1999). Moment Conditions for Dynamic Panel Data Models with
Multiplicative Individual Effects in the Conditional Variance. Annales d'Économie et de
Statistique, 55/56, 317-330.
Nelson, C. R., & Startz, R. (1990a). Some Further Results on the Exact Small Sample Properties
of the Instrumental Variable Estimator. Econometrica, 58, 967-976.
Nelson, C. R., & Startz, R. (1990b). The Distribution of the Instrumental Variable Estimator and
Its t-ratio When the Instrument is A Poor One. Journal of Business,
63, S125-S140.
Nickell, S. J. (1981). Biases in Dynamic Models with Fixed Effects. Econometrica, 49,
1417-1426.
Sargan, J. D. (1958). The Estimation of Economic Relationships Using Instrumental Variables.
Econometrica, 26, 329-338.
Staiger, D., & Stock, J. H. (1997). Instrumental Variables Regression with Weak Instruments.
Econometrica, 65, 557-586.
Windmeijer, F. (2000). Efficiency Comparisons for a System GMM Estimator in Dynamic Panel
Data Models. In: R. D. H. Heijmans, D. S. G. Pollock & A. Satorra (Eds), Innovations in
Multivariate Statistical Analysis. A Festschrift for Heinz Neudecker (pp. 175-184). Kluwer
Academic Publishers.
I. INTRODUCTION
In this chapter we develop methods for estimating and testing hypotheses for
cointegrating vectors in dynamic time series panels. In particular we propose
methods based on fully modified OLS principles which are able to
accommodate considerable heterogeneity across individual members of the
panel. Indeed, one important advantage to working with a cointegrated panel
approach of this type is that it allows researchers to selectively pool the long
run information contained in the panel while permitting the short run dynamics
Nonstationary Panels, Panel Cointegration and Dynamic Panels, Volume 15, pages 93-130.
Copyright 2000 by Elsevier Science Inc.
All rights of reproduction in any form reserved.
ISBN: 0-7623-0688-2
PETER PEDRONI
sample mean of the cointegrating vectors. Both of these features of the group
mean estimator are often important in practical applications.
Finally, the implementation of the feasible form of the between dimension
group mean estimator also has advantages over the other estimators in the
presence of heterogeneity of the residual dynamics around the cointegrating
vector. As was demonstrated in Pedroni (1996a), in the presence of such
heterogeneity, the pooled panel FMOLS estimator requires a correction term
that depends on the true cointegrating vector. For a specific null value for a
cointegrating vector, the t-statistic is well defined, but of course this is of little
use per se when one would like to estimate the cointegrating vector. One
solution is to obtain a preliminary estimate of the cointegrating vector using
OLS. However, although the OLS estimator is superconsistent, it still contains
a second order bias in the presence of endogeneity, which is not eliminated
asymptotically. Accordingly, this bias leads to size distortion, which is not
necessarily eliminated even when the sample size grows large in the panel
dimension. Consequently, this type of approach based on a first stage OLS
estimate was not recommended in Pedroni (1996a), and it is not surprising that
Monte Carlo simulations have shown large size distortions for such estimators.
Even when the null hypothesis was imposed without using an OLS estimator,
the size distortions for this type of estimator were large as reported in Pedroni
(1996a). Similarly, Kao & Chiang (1997) also found large size distortions for
such estimators when OLS estimates were used in the first stage for the
correction term. By contrast, the feasible version of the between dimension
group mean based estimator does not suffer from these difficulties, even in the
presence of heterogeneous dynamics. As we will see, the size distortions for
this estimator are minimal, even in panels of relatively modest dimensions.
The remainder of the chapter is structured as follows. In Section 2, we
introduce the econometric models of interest for heterogeneous cointegrated
panels. We then present a number of theoretical results for estimators designed
to be asymptotically unbiased and to provide nuisance parameter free
asymptotic distributions which are standard normal when applied to heterogeneous cointegrated panels and can be used to test hypotheses regarding
common cointegrating vectors in such panels. In Section 3 we study the small
sample properties of these estimators and propose feasible FMOLS statistics
that perform relatively well in realistic panels with heterogeneous dynamics. In
Section 4 we enumerate the algorithm used to construct these statistics and
briefly describe a few examples of their uses. Finally, in Section 5 we offer
conclusions and discuss a number of related issues in the ongoing research on
estimation and inference in cointegrated panels.
$$y_{it} = \alpha_i + \beta x_{it} + \mu_{it}$$
$$x_{it} = x_{i,t-1} + \varepsilon_{it} \tag{1}$$
where the vector error process ξit = (μit, εit)' is stationary with asymptotic
covariance matrix Ωi. Thus, the variables xi, yi are said to cointegrate for each
member of the panel, with cointegrating vector β if yit is integrated of order
one. The term αi allows the cointegrating relationship to include member
specific fixed effects. In keeping with the cointegration literature, we do not
require exogeneity of the regressors. As usual, xi can in general be an m
dimensional vector of regressors, which are not cointegrated with each other. In
this case, we partition ξit = (μit, εit)' so that the first element is a scalar series and
the second element is an m dimensional vector of the differences in the
regressors εit = xit - xi,t-1 = Δxit, so that when we construct
$$\Omega_i = \begin{pmatrix} \Omega_{11i} & \Omega_{21i}' \\ \Omega_{21i} & \Omega_{22i} \end{pmatrix} \tag{2}$$
then Ω11i is the scalar long run variance of the residual μit, and Ω22i is the m × m
long run covariance among the εit, and Ω21i is an m × 1 vector that gives the long
run covariance between the residual μit and each of the εit. However, for
simplicity and convenience of notation, we will refer to xi as univariate in the
remainder of this chapter. Each of the results of this study generalize in an
obvious and straightforward manner to the vector case, unless otherwise
indicated.2
Assumption 1.1 (invariance principle): The process ξit satisfies a multivariate
functional central limit theorem such that the convergence as T → ∞ for the
partial sum
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{[Tr]} \xi_{it} \to B_i(r, \Omega_i)$$
holds for any given member, i, of the panel,
where Bi(r, Ωi) is Brownian motion defined over the real interval r ∈ [0,1], with
asymptotic covariance Ωi.
This assumption indicates that the multivariate functional central limit theorem,
or invariance principle, holds over time for any given member of the panel. This
places very little restriction on the temporal dependency and heterogeneity of
the error process, and encompasses for example a broad class of stationary
ARMA processes. It also allows the serial correlation structure to be different
for individual members of the panel. Specifically, the asymptotic covariance
matrix, Ωi varies across individual members, and is given by
$$\Omega_i \equiv \lim_{T \to \infty} E\left[T^{-1}\left(\sum_{t=1}^{T} \xi_{it}\right)\left(\sum_{t=1}^{T} \xi_{it}'\right)\right],$$
which can also be decomposed as Ωi = Ωoi + Γi + Γi', where Ωoi is the
contemporaneous covariance and Γi is a weighted
sum of autocovariances. The off-diagonal terms of these individual Ω21i
matrices capture the endogenous feedback effect between yit and xit, which is
also permitted to vary across individual members of the panel. For several of
the estimators that we propose, it will be convenient to work with a
triangularization of this asymptotic covariance matrix. Specifically, we will
refer to this lower triangular matrix of Ωi as Li, whose elements are related as
follows
$$L_{11i} = (\Omega_{11i} - \Omega_{21i}^2/\Omega_{22i})^{1/2}, \quad L_{12i} = 0, \quad L_{21i} = \Omega_{21i}/\Omega_{22i}^{1/2}, \quad L_{22i} = \Omega_{22i}^{1/2} \tag{3}$$
Estimation of the asymptotic covariance matrix can be based on any one of a
number of consistent kernel estimators such as the Newey & West (1987)
estimator.
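A kernel estimate of the long run covariance for one member of the panel, followed by the triangularization described above, might be sketched as follows. The assumptions are: a bivariate process (residual and first-differenced regressor) stored as a T x 2 array, Bartlett weights with a user-chosen lag truncation in the spirit of Newey & West (1987), and demeaning before the autocovariances are formed. The function names are hypothetical.

```python
import numpy as np


def long_run_covariance(xi, n_lags):
    """Bartlett-kernel estimate: Gamma_0 + sum_k w_k (Gamma_k + Gamma_k')."""
    xi = np.asarray(xi, dtype=float)
    xi = xi - xi.mean(axis=0)
    T = xi.shape[0]
    omega = xi.T @ xi / T                        # contemporaneous covariance
    for k in range(1, n_lags + 1):
        weight = 1.0 - k / (n_lags + 1.0)
        gamma_k = xi[k:].T @ xi[:-k] / T         # autocovariance at lag k
        omega += weight * (gamma_k + gamma_k.T)
    return omega


def triangularize(omega):
    """Lower triangular L with the elements given in (3), so that L'L = omega."""
    o11, o21, o22 = omega[0, 0], omega[1, 0], omega[1, 1]
    l22 = np.sqrt(o22)
    l21 = o21 / l22
    l11 = np.sqrt(o11 - o21 ** 2 / o22)
    return np.array([[l11, 0.0], [l21, l22]])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    xi = rng.normal(size=(200, 2))               # placeholder data
    omega_hat = long_run_covariance(xi, n_lags=4)
    L = triangularize(omega_hat)
    print(omega_hat)
    print(L.T @ L)
```

With the elements in (3), the product L'L (rather than LL') reproduces the covariance matrix, which the last two printed matrices illustrate.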
Next, for the cross sectional dimension, we will employ the standard panel
data assumption of independence. Hence we have:
Assumption 1.2 (cross sectional independence): The individual processes are
assumed to be independent cross sectionally, so that E[ξit, ξjt] = 0 for all i ≠ j.
time series dimension which can be expected to grow in actuality rather than
the cross sectional dimension, which is in practice fixed. Thus, T → ∞ is in a
sense the true asymptotic feature in which we are interested, and this leads to
statistics which are characterized as sums of i.i.d. Brownian motion
functionals. For practical purposes, however, we would like to be able to
characterize these statistics for the general case in which N is large, and in this
case we take N → ∞ as a convenient benchmark for which to characterize the
distribution, provided that we understand T → ∞ to be the dominant asymptotic
feature of the data.
B. Asymptotic Properties of Panel OLS
Next, we consider the properties of a number of statistics that might be used for
a cointegrated panel as described by (1) under assumptions 1.1 and 1.2
regarding the time series and cross dimensional dependencies in the data. The
first statistic that we examine is a standard panel OLS estimator of the
cointegrating relationship. It is well known that the conventional single
equation OLS estimator for the cointegrating vector is asymptotically biased
and that its standardized distribution is dependent on nuisance parameters
associated with the serial correlation structure of the data, and there is no
reason to believe that this would be otherwise for the panel OLS estimator. The
following proposition confirms this suspicion.4
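Proposition 1.1 (Asymptotic Bias of the Panel OLS Estimator). Consider a
standard panel OLS estimator for the coefficient β of panel (1), under
assumptions 1.1 and 1.2, given as
$$\hat{\beta}_{NT} = \left(\sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it} - \bar{x}_i)^2\right)^{-1} \sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it} - \bar{x}_i)(y_{it} - \bar{y}_i)$$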
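$$\hat\beta^*_{NT} - \beta = \left(\sum_{i=1}^{N}\hat L_{22i}^{-2}\sum_{t=1}^{T}(x_{it}-\bar x_i)^2\right)^{-1}\sum_{i=1}^{N}\hat L_{11i}^{-1}\hat L_{22i}^{-1}\left(\sum_{t=1}^{T}(x_{it}-\bar x_i)\,\mu^*_{it} - T\hat\gamma_i\right)$$
where
$$\mu^*_{it} = \mu_{it} - \frac{\hat L_{21i}}{\hat L_{22i}}\,\Delta x_{it}, \qquad \hat\gamma_i \equiv \hat\Gamma_{21i} + \hat\Omega^o_{21i} - \frac{\hat L_{21i}}{\hat L_{22i}}\left(\hat\Gamma_{22i} + \hat\Omega^o_{22i}\right)$$
and $\hat L_i$ is a lower triangular decomposition of $\hat\Omega_i$ as defined in (2) above. Then,
under assumptions 1.1 and 1.2, the estimator $\hat\beta^*_{NT}$ converges to the true value
at rate $T\sqrt{N}$, and is distributed as
$$T\sqrt{N}\,(\hat\beta^*_{NT} - \beta) \to N(0, v) \quad \text{where} \quad v = \begin{cases} 2 & \text{iff } \bar x_i = \bar y_i = 0 \\ 6 & \text{else} \end{cases}$$
as $T \to \infty$ and $N \to \infty$.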
As the proposition indicates, when proper modifications are made to the
estimator, the corresponding asymptotic distribution will be free of the
nuisance parameters associated with any member specific serial correlation
patterns in the data. Notice also that this fully modified panel OLS estimator is
asymptotically unbiased for both the standard case without intercepts as well as
the fixed effects model with heterogeneous intercepts. The only difference is in
the size of the variance, which is equal to 2 in the standard case, and 6 in the
case with heterogeneous intercepts, both for xit univariate. More generally,
when xit is an m-dimensional vector, the specific values for v will also be a
function of the dimension m. The associated t-statistics, however, will not
depend on the specific values for v, as we shall see.
The fact that this estimator is distributed normally, rather than in terms of
unit root asymptotics as in Phillips & Hansen (1990), derives from the fact that
these unit root distributions are being averaged over the cross sectional
dimension. Specifically, this averaging process produces normal distributions
whose variance depends only on the moments of the underlying Brownian
motion functionals that describe the properties of the integrated variables. This
is achieved by constructing the estimator in a way that isolates the idiosyncratic
components of the underlying Wiener processes to produce sums of standard
and independently distributed Brownian motion whose moments can be
computed algebraically, as the proof of the proposition makes clear. The
estimators $\hat L_{11i}$ and $\hat L_{22i}$, which correspond to the long run standard errors of
the conditional process $\mu_{it}$ and the marginal process $\Delta x_{it}$ respectively, act to purge
the contribution of these idiosyncratic elements to the endogenous feedback
term $\sum_{t=1}^{T}(x_{it}-\bar x_i)\,y^*_{it} - T\hat\gamma_i$.
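The moments referred to here can be checked by simulation. The sketch below, with hypothetical names and sample sizes, approximates the expected values of the scaled sums of squares of a random walk, with and without demeaning. These approximate the standard Brownian motion moments 1/2 and 1/6, whose reciprocals are consistent with the variances 2 and 6 in the proposition.

```python
import numpy as np

def scaled_second_moments(T=200, reps=10000, seed=0):
    """Monte Carlo means of T^-2 * sum x_t^2 for raw and demeaned random walks."""
    rng = np.random.default_rng(seed)
    # each row is an independent random walk with N(0, 1) increments
    x = np.cumsum(rng.standard_normal((reps, T)), axis=1)
    raw = (x ** 2).sum(axis=1) / T ** 2
    demeaned = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1) / T ** 2
    return raw.mean(), demeaned.mean()

m_raw, m_dem = scaled_second_moments()
# the reciprocals should land near 2 (no intercepts) and 6 (fixed effects)
print(1.0 / m_raw, 1.0 / m_dem)
```

Demeaning removes part of the variation of the integrated regressor, which is why the second moment shrinks and the limiting variance rises in the fixed effects case.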
The fact that the variance is larger for the fixed effects model in which
heterogeneous intercepts are included stems from the fact that in the presence
of unit roots, the variation from the cross terms of the sample averages $\bar x_i$ and
$\bar y_i$ grows large over time at the same rate T, so that their effect is not eliminated
asymptotically from the distribution of $T\sqrt{N}\,(\hat\beta^*_{NT} - \beta)$.5 However, since the
contribution to the variance is computable analytically as in the proof of
proposition 1.2, this in itself poses no difficulties for inference. Nevertheless,
upon consideration of these expressions, it also becomes apparent that there
should exist a metric which can directly adjust for this effect in the distribution
and consequently render the distribution standard normal. In fact, as the
following proposition indicates, it is possible to construct a t-statistic from this
fully modified panel OLS estimator whose distribution will be invariant to this
effect.
Corollary 1.2 (Asymptotic Distribution of the Pooled Panel FMOLS t-statistic).
Consider the following t-statistic for the FMOLS panel estimator of
$\beta$ as defined in proposition 1.2 above. Then under the same assumptions as in
proposition 1.2, the statistic is standard normal,
$$t_{\hat\beta^*_{NT}} = (\hat\beta^*_{NT} - \beta)\left(\sum_{i=1}^{N}\hat L_{22i}^{-2}\sum_{t=1}^{T}(x_{it}-\bar x_i)^2\right)^{1/2} \to N(0, 1)$$
$\sum_{t=1}^{T}(x_{it}-\bar x_i)^2$. Finally, note that proposition 1.2 and its corollary 1.2 have been
hypothesized common value for $\beta$ under the null, and $\beta_a$ is some alternative
value for $\beta$ which is also common to all members of the panel. By contrast, the
group mean fully modified t-statistic can be used to test the null hypothesis
$H_o: \beta_i = \beta_o$ for all i versus the alternative hypothesis $H_a: \beta_i \neq \beta_o$ for all i, so that
the values for $\beta_i$ are not necessarily constrained to be homogeneous across
different members under the alternative hypothesis.
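The following proposition gives the precise form of the panel fully modified
OLS t-statistic that we propose and gives its asymptotic distribution.
Proposition 1.3 (Asymptotic Distribution of the Panel FMOLS Group Mean
t-Statistic). Consider the following group mean FMOLS t-statistic for $\beta$ of the
cointegrated panel (1). Then under assumptions 1.1 and 1.2, the statistic is
standard normal, and
$$\bar t_{\hat\beta^*_{NT}} = \frac{1}{\sqrt{N}}\sum_{i=1}^{N}\hat L_{11i}^{-1}\left(\sum_{t=1}^{T}(x_{it}-\bar x_i)^2\right)^{-1/2}\left(\sum_{t=1}^{T}(x_{it}-\bar x_i)\,y^*_{it} - T\hat\gamma_i\right) \to N(0, 1)$$
where
$$y^*_{it} = (y_{it}-\bar y_i) - \frac{\hat L_{21i}}{\hat L_{22i}}\,\Delta x_{it}, \qquad \hat\gamma_i \equiv \hat\Gamma_{21i} + \hat\Omega^o_{21i} - \frac{\hat L_{21i}}{\hat L_{22i}}\left(\hat\Gamma_{22i} + \hat\Omega^o_{22i}\right)$$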
single series estimator, for panels, as N grows large, the effect has the potential
to become first order.
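Another possibility might appear to be to construct the feasible panel
FMOLS estimator for proposition 1.2 in terms of the original data series
$$\hat\beta^*_{NT} = \left(\sum_{i=1}^{N}\hat L_{22i}^{-2}\sum_{t=1}^{T}(x_{it}-\bar x_i)^2\right)^{-1}\sum_{i=1}^{N}\hat L_{11i}^{-1}\hat L_{22i}^{-1}\left(\sum_{t=1}^{T}(x_{it}-\bar x_i)\,y^*_{it} - T\hat\gamma_i\right)$$
where
$$y^*_{it} = (y_{it}-\bar y_i) - \frac{\hat L_{21i}}{\hat L_{22i}}\,\Delta x_{it} + \frac{\hat L_{11i}-\hat L_{22i}}{\hat L_{22i}}\,\beta\,(x_{it}-\bar x_i)$$
and $\hat L_i$ and $\hat\gamma_i$ are defined as before. Then the statistics $T\sqrt{N}\,(\hat\beta^*_{NT} - \beta)$ and $t_{\hat\beta^*_{NT}}$
constructed from this estimator are numerically equivalent to the ones defined
in proposition 1.2 and corollary 1.2.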
This proposition shows why it is difficult to construct a reliable point estimator
based on the naive FMOLS estimator simply by using a transformation of $y^*_{it}$
analogous to the single equation case. Indeed, as the proposition makes
explicit, such an estimator would in general depend on the true value of the
parameter that it is intended to estimate, except in very specialized cases, which
we discuss below. On the other hand, this does not necessarily prohibit the
usefulness of an estimator based on proposition 2.1 for the purposes of testing
a particular hypothesis about a cointegrating relationship in heterogeneous
panels. By using the hypothesized null value for $\beta$ in the expression for $y^*_{it}$,
proposition 2.1 can at least in principle be employed to construct a feasible
FMOLS statistic to test the null hypothesis that $\beta_i = \beta$ for all i. However, as
was reported in Pedroni (1996a), even in this case the small sample
performance of the statistic is often subject to relatively large size distortion.
Proposition 2.1 also provides us with an opportunity to examine the
consequences of ignoring heterogeneity associated with the serial correlation
dynamics for the error process for this type of estimator. In particular, we
notice that the modification involved in this estimator relative to the conventional
time series fully modified OLS estimator differs in two respects. First, it
includes the estimators $\hat L_{11i}$ and $\hat L_{22i}$ that premultiply the numerator and
denominator terms to control for the idiosyncratic serial correlation properties
of individual cross sectional members prior to summing over N. Secondly, and
more importantly, it includes in the transformation of the dependent variable $y^*_{it}$
an additional term $\frac{\hat L_{11i}-\hat L_{22i}}{\hat L_{22i}}\,\beta\,(x_{it}-\bar x_i)$. This term is eliminated only in two
special cases: (1) The elements $L_{11i}$ and $L_{22i}$ are identical for all members of the
panel, and do not need to be indexed by i. This corresponds to the case in which
the serial correlation structure of the data is homogeneous for all members of
the panel. (2) The elements $L_{11i}$ and $L_{22i}$ are perhaps heterogeneous across
members of the panel, but for each member $L_{11i} = L_{22i}$. This corresponds to the case
in which asymptotic variances of the dependent and independent variables are
the same. Conversely, the effect of this term increases as (1) the dynamics
become more heterogeneous for the panel, and (2) as the relative volatility
becomes more different between the variables $x_{it}$ and $y_{it}$ for any individual
members of the panel. For most panels of interest, these are likely to be
important practical considerations. On the other hand, if the data are known to
be relatively homogeneous or simple in their serial correlation structure, the
imprecise estimation of these elements will decrease the attractiveness of this
type of estimator relative to one that implicitly imposes these known
restrictions.
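The role of the additional term can be seen in a short worked derivation. The notation is assumed here: $\mu_{it}$ denotes the regression error of panel (1), $\beta$ the common slope, and $\mu^*_{it} = \mu_{it} - (\hat L_{21i}/\hat L_{22i})\Delta x_{it}$. Substituting the demeaned regression into the transformed variable with the additional term included, and using $\sum_t (x_{it}-\bar x_i) = 0$, gives:

```latex
\begin{align*}
y_{it}-\bar y_i &= \beta\,(x_{it}-\bar x_i) + (\mu_{it}-\bar\mu_i), \\
\sum_{t=1}^{T}(x_{it}-\bar x_i)\,y^*_{it}
  &= \frac{\hat L_{11i}}{\hat L_{22i}}\,\beta\sum_{t=1}^{T}(x_{it}-\bar x_i)^2
   + \sum_{t=1}^{T}(x_{it}-\bar x_i)\,\mu^*_{it}, \\
\hat L_{11i}^{-1}\hat L_{22i}^{-1}\sum_{t=1}^{T}(x_{it}-\bar x_i)\,y^*_{it}
  &= \hat L_{22i}^{-2}\,\beta\sum_{t=1}^{T}(x_{it}-\bar x_i)^2
   + \hat L_{11i}^{-1}\hat L_{22i}^{-1}\sum_{t=1}^{T}(x_{it}-\bar x_i)\,\mu^*_{it}.
\end{align*}
```

Summing the last line over i and dividing by $\sum_i \hat L_{22i}^{-2}\sum_t (x_{it}-\bar x_i)^2$ returns $\beta$ plus the weighted error term, which is the sense in which the two forms coincide. Without the additional term, the coefficient on $\beta$ in the last line would be $\hat L_{11i}^{-1}\hat L_{22i}^{-1}$ rather than $\hat L_{22i}^{-2}$, and the two agree member by member only when $\hat L_{11i} = \hat L_{22i}$.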
B. Monte Carlo Simulation Results
We now study small sample properties in a series of Monte Carlo simulations.
Given the difficulties associated with the feasible versions of the within
dimension pooled panel fully modified OLS estimators discussed in the
previous subsection based on proposition 2.1, it is not surprising that these tend
to exhibit relatively large size distortions in certain scenarios, as reported in
Pedroni (1996a). Kao & Chiang (1997) subsequently also confirmed the poor
small sample properties of the within dimension pooled panel fully modified
estimator based on a version in which a first stage OLS estimate was used for
the adjustment term. Indeed, such results should not be surprising given that the
first stage OLS estimator introduces a second order bias in the presence of
endogeneity, which is not eliminated asymptotically. Consequently, this bias
leads to size distortion for the panel which is not necessarily eliminated even
when the sample size grows large. By contrast, the feasible version of the
between dimension group mean estimator does not require such an adjustment
term even in the presence of heterogeneous serial correlation dynamics, and
does not suffer from the same size distortion.6 Consequently, we focus here on
reporting the small sample Monte Carlo results for the between dimension
group mean estimator and refer readers to Pedroni (1996a) for simulation
results for the feasible versions of the within dimension pooled estimators.
To facilitate comparison with the conventional time series literature, we use
as a starting point a few Monte Carlo simulations analogous to the ones studied
in Phillips & Loretan (1991) and Phillips & Hansen (1990) based on their
original work on FMOLS estimators for conventional time series. Following
these studies, we model the errors for the data generating process in terms of
a vector MA(1) process and consider the consequences of varying certain key
parameters. In particular, for the purposes of the Monte Carlo simulations, we
model our data generating process for the cointegrated panel (1) under
assumptions 1.1 and 1.2 as
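$$y_{it} = \alpha_i + \beta x_{it} + \mu_{it}$$
$$x_{it} = x_{it-1} + \varepsilon_{it}$$
$i = 1, \ldots, N$, $t = 1, \ldots, T$, for which we model the vector error process
$\xi_{it} = (\mu_{it}, \varepsilon_{it})'$ in terms of a vector moving average process given by
$$\xi_{it} = \eta_{it} - \theta_i\,\eta_{it-1}; \qquad \eta_{it} \sim \text{i.i.d. } N(0, \Psi_i) \tag{3}$$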
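A data generating process of this form can be simulated in a few lines. The following is a minimal sketch, assuming the symbols above: the function name `simulate_panel`, the zero initial condition for the regressor, and the parameter values in the usage example are illustrative choices, not the settings of the experiments.

```python
import numpy as np

def simulate_panel(theta, psi, alpha, beta=2.0, T=100, seed=0):
    """Simulate y and x (each N x T) from a cointegrated panel with vector MA(1) errors.

    theta, psi: arrays of shape (N, 2, 2); alpha: array of shape (N,).
    """
    rng = np.random.default_rng(seed)
    N = theta.shape[0]
    y = np.empty((N, T))
    x = np.empty((N, T))
    for i in range(N):
        chol = np.linalg.cholesky(psi[i])
        # rows of eta are i.i.d. N(0, psi_i); one extra row supplies eta_{i0}
        eta = rng.standard_normal((T + 1, 2)) @ chol.T
        # xi_t = eta_t - theta_i eta_{t-1}, written for row vectors
        xi = eta[1:] - eta[:-1] @ theta[i].T
        mu, eps = xi[:, 0], xi[:, 1]
        x[i] = np.cumsum(eps)                 # x_it = x_{i,t-1} + eps_it, x_i0 = 0 assumed
        y[i] = alpha[i] + beta * x[i] + mu
    return y, x

# Illustrative (hypothetical) parameter values for a three-member panel
N = 3
theta = np.tile(np.array([[0.3, 0.4], [0.4, 0.6]]), (N, 1, 1))
psi = np.tile(np.array([[1.0, 0.5], [0.5, 1.0]]), (N, 1, 1))
alpha = np.array([1.0, 2.0, 3.0])
y, x = simulate_panel(theta, psi, alpha, beta=2.0, T=100, seed=1)
```

Heterogeneous dynamics are obtained by passing a different `theta[i]` and `psi[i]` for each member, provided each `psi[i]` is positive definite.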
distributions which are centered around the parameter values set by Phillips &
Loretan (1991), but deviate by up to 0.4 in either direction for the elements of
$\theta_i$ and by up to 0.85 in either direction for $\Psi_{21i}$. Thus, in our first experiment,
the parameters are drawn as follows: $\theta_{11i} \sim U(-0.1, 0.7)$, $\theta_{12i} \sim U(0.0, 0.8)$,
$\theta_{21i} \sim U(0.0, 0.8)$, $\theta_{22i} \sim U(0.2, 1.0)$ and $\Psi_{21i} \sim U(-0.85, 0.85)$. This specification
achieves considerable heterogeneity across individual members and also allows
the key parameters $\theta_{21i}$ and $\Psi_{21i}$ to span the set of values considered in Phillips
and Loretan's study. In this first experiment we restrict the values of $\theta_{21i}$ to span
only the positive set of values considered in Phillips and Loretan for this
parameter. In several cases Phillips and Loretan found negative values for $\theta_{21i}$
to be particularly problematic in terms of size distortion for many of the
conventional test statistics applied to pure time series, and in our subsequent
experiments we also consider the consequences of drawing negative values for
this coefficient. In each case, the asymptotic covariances were estimated
individually for each member i of the cross section using the Newey-West
(1987) estimator. In setting the lag length for the band width, we employ the
data dependent scheme recommended in Newey & West (1994), which is to set
the lag truncation to $4\left(\frac{T}{100}\right)^{2/9}$, where T is the
biases are also very small. Similar results continue to hold in subsequent
experiments with negative moving average coefficients, regardless of the data
generating process for the serial correlation processes. Consequently, the first
thing to note is that these estimators are extremely accurate even in panels with
very heterogeneous serial correlation dynamics, fixed effects and endogenous
regressors.
Of course these findings on bias should not come as a surprise given the
superconsistency results presented in the previous section. Instead, a more
central concern for the purposes of inference are the small sample properties of
the associated t-statistic and the possibility for size distortion. For this, we
consider the performance of the small sample sizes of the test under the null
hypothesis for various nominal sizes based on the asymptotic distribution.
Specifically, the last two columns report the Monte Carlo small sample results
for the nominal 5% and 10% p-values respectively for a two sided test of the
null hypothesis $\beta = 2.0$. As a general rule, we find that the size distortions in
these small samples are remarkably small provided that the time series
dimension, T, is not smaller than the cross sectional dimension, N. This
condition stems primarily from the estimation of the
fixed effects. The number of fixed effects, $\alpha_i$, grows with the N dimension of
the panel. On the other hand, each of these N fixed effects is estimated
consistently as T grows large, so that $\hat\alpha_i - \alpha_i$ goes to zero only as T grows large.
Accordingly, we require T to grow faster than N in order to eliminate this effect
asymptotically for the panel. As a practical consequence, small sample size
distortion tends to be high when N is large relative to T, and decreases as T
becomes large relative to N, which can be anticipated in any fixed effects
model. As we can see from the results in Table I, in cases when N exceeds T,
the size distortions are large, with actual sizes exceeding 30 and 40% when
T = 10 and N grows from 10 to 20 and 30. This represents an unattractive
scenario, since in this case, the tests are likely to report rejections of the null
hypothesis when in fact it is not warranted. However, these represent extreme
cases, as the techniques are designed to deal with the opposite case, where the
T dimension is reasonably large relative to the N dimension. In these cases,
even when the T dimension is only slightly larger than the N dimension, and
even in cases where it is comparable, we find that the size distortion is
remarkably small. For example, in the results reported in Table I we find that
with N = 20, T = 40 the size of the nominal 5% and 10% tests becomes 4.5%
and 9.3% respectively. Similarly, for N = 10, T = 30 the sizes for the Monte
Carlo sample become 6.1% and 11% respectively, and for N = 30, T = 60, they
become 4.7% and 9.6%. As the T dimension grows even larger for a fixed N
dimension, the tests tend to become slightly undersized, with the actual size
becoming slightly smaller than the nominal size. In this case the small sample
tests actually become slightly more conservative than one would anticipate
based on the asymptotic critical values.
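The empirical sizes discussed above are simply rejection frequencies of a two sided test under the null, computed with standard normal critical values. A minimal sketch of that calculation follows; the helper name `empirical_size` and the simulated input are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def empirical_size(tstats, nominal=0.05):
    """Share of simulated t-statistics rejected by a two sided test at the nominal level."""
    crit = NormalDist().inv_cdf(1.0 - nominal / 2.0)   # e.g. about 1.96 for 5%
    return float(np.mean(np.abs(np.asarray(tstats, dtype=float)) > crit))

# Illustrative use: statistics that really are standard normal
rng = np.random.default_rng(0)
draws = rng.standard_normal(10000)
size_5, size_10 = empirical_size(draws, 0.05), empirical_size(draws, 0.10)
```

An empirical size well above the nominal level indicates over-rejection of a true null, while a value below it indicates a conservative test.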
Next, we consider the case in which the values for $\theta_{21i}$ span negative
numbers, and for the experiment reported in Table II of Appendix B we draw
this coefficient from $\theta_{21i} \sim U(-0.8, 0.0)$. Large negative values for moving
average coefficients are well known to create size distortion for such
estimators, and we anticipate this to be a case in which we have higher small
sample distortion. It is interesting to note that in this case the biases for the
point estimate become slightly positive, although as mentioned before, they
continue to be very small. The small sample size distortions follow the same
pattern in that they tend to be largest when T is small relative to N and decrease
as T grows larger. In this case, as anticipated, they tend to be higher than for the
case in which $\theta_{21i}$ spans only positive values. However, the values still fall
within a fairly reasonable range considering that we are dealing with all
negative values for $\theta_{21i}$. For example, with N = 10, T = 100 we have values of
6.3% and 12% for the 5% and 10% nominal sizes respectively. For N = 20,
T = 100 they become 9% and 15.6% respectively. These are still remarkably
small compared to the size distortions reported in Phillips & Loretan (1991) for
the conventional time series case.
Finally, we ran a third experiment in which we allowed the values for $\theta_{21i}$ to
span both positive and negative values so that we draw the values from
$\theta_{21i} \sim U(-0.4, 0.4)$. We consider this to be a fairly realistic case, and this
corresponds closely to the range of moving average coefficients that were
estimated in the purchasing power parity study contained in Pedroni (1996a).
We find the group mean estimator and test statistic to perform very well in this
situation. The Monte Carlo simulation results for this case are reported in Table
III of Appendix B. Whereas the biases for the case with large positive values
of $\theta_{21i}$ in Table I were negative, and for the case with large negative values in
Table II were positive, here we find the biases to be positive and often even
smaller in absolute value than either of the first two cases. Most importantly,
we find the size distortions for the t-statistic to be much smaller here than in the
case where we have exclusively negative values for $\theta_{21i}$. For example, with
N = 30, and T as small as T = 60, we find the nominal 5% and 10% sizes to be
5.4% and 10.5%. Again, generally the small sample sizes for the test are quite
close to the asymptotic nominal sizes provided that the T dimension is not
smaller than the N dimension. Consequently, it appears to be the case that even
when some members of the panel exhibit negative moving average coefficients,
as long as other members exhibit positive values, the distortions tend to be
averaged out so that the small sample sizes for the group mean statistic stay
very close to the asymptotic sizes. Thus, we conclude that in general when the
T dimension is not smaller than the N dimension, the asymptotic normality
result appears to provide a very good benchmark for the sampling distribution
under the null hypothesis, even in relatively small samples with heterogeneous
serial correlation dynamics.
Finally, although power is generally not a concern for such panel tests, since
the power is generally quite high, it is worth mentioning the small sample
power properties of the group mean estimator. Specifically, we experimented
by checking the small sample power of the test against the alternative
hypothesis by generating the 10,000 draws for the DGP associated with case 3
above with $\beta = 1.9$. For the test of the null hypothesis that $\beta = 2.0$ against the
alternative hypothesis that $\beta = 1.9$, we found that the power for the 10% p-value
test reached 100% for N = 10 when T was 40 or more (or 98.2% when T = 30)
and reached 100% for N = 20 when T was 30 or more, and for N = 30 the power
reached 100% already when T was 20 or more. Consequently, considering the
high power and the relatively small size distortion, we find the small sample
properties of the estimator and associated t-statistic to be extremely well
behaved in the cases for which it was designed.
Next, using the elements of $\hat L_i$, the expression for $y^*_{it} = (y_{it}-\bar y_i) - \frac{\hat L_{21i}}{\hat L_{22i}}\,\Delta x_{it}$ can be
constructed from the original data. Then the final step is to construct the cross
product terms between $y^*_{it}$ and $(x_{it}-\bar x_i)$. This is sufficient now to compute either
the point estimators or the associated t-statistics for any of the statistics.
It is worth noting two points here. The difference between the panel within
dimension estimators and the group mean between dimension estimators is in
the way in which the cross product terms are computed. For the within
dimension statistics, the cross product terms are computed by summing over
the T and N dimensions separately for the numerator and the denominator. For
the group mean between dimension statistics, the cross product terms are
computed by summing over the T dimension for the numerator and
denominator separately, and then summing over the N dimension for the entire
ratio. Consequently, the first point to note is that the algorithm as applied to the
group mean estimator describes the same steps that one would take if one were
estimating N different conventional FMOLS estimators and then taking the
average of these. The same is true for the group mean t-statistic. Thus, if one
already has a routine to estimate the conventional time series FMOLS
estimator, then the group mean panel FMOLS estimator is extremely simple
and convenient to estimate. The second point to note is that for the panel
FMOLS within dimension estimator we have used the estimates of $\hat\Omega_i$, $\hat\Gamma_i$, $\hat\Omega^o_i$
and $\hat\gamma_i$ to compute the weighted panel variances. But it is equally feasible to
compute the unweighted panel variances by first averaging the values $\hat\Omega_i$, $\hat\Gamma_i$, $\hat\Omega^o_i$
before applying the transformations. Whether or not the two different
treatments have much consequence for the estimate is likely to depend on how
heterogeneous the values of $\hat\Omega_i$ are across individual members.
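The group mean steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's program: the function names, the Bartlett kernel with the lag rule $4(T/100)^{2/9}$, the normalization $\hat L_{22i}^2 = \hat\Omega_{22i}$ (so that $\hat L_{21i}/\hat L_{22i} = \hat\Omega_{21i}/\hat\Omega_{22i}$), the orientation of the one-sided autocovariance sum, and the centering of the t-statistic at a hypothesized value `beta0` are all choices made for this sketch.

```python
import numpy as np

def lrv(u, lags):
    """Bartlett-kernel estimates (Omega, Gamma, Omega0) for a T x 2 array u."""
    u = u - u.mean(axis=0)
    T = u.shape[0]
    omega0 = u.T @ u / T
    gamma = np.zeros((2, 2))
    for s in range(1, lags + 1):
        # weighted (1/T) * sum_t u_{t-s} u_t'  (assumed orientation)
        gamma += (1.0 - s / (lags + 1.0)) * (u[:-s].T @ u[s:]) / T
    return omega0 + gamma + gamma.T, gamma, omega0

def fmols_member(y, x, lags, beta0=0.0):
    """Time series FMOLS for one member; returns (beta_star, t_stat)."""
    dx = np.diff(x)                       # Delta x_t, available from t = 2
    xd = x[1:] - x[1:].mean()             # demeaning removes the fixed effect
    yd = y[1:] - y[1:].mean()
    T = xd.size
    sxx = xd @ xd
    mu_hat = yd - (xd @ yd) / sxx * xd    # first stage OLS residuals
    omega, gamma, omega0 = lrv(np.column_stack([mu_hat, dx]), lags)
    l22 = np.sqrt(omega[1, 1])
    l21 = omega[1, 0] / l22
    l11 = np.sqrt(omega[0, 0] - l21 ** 2)  # long run conditional standard error
    ratio = l21 / l22
    y_star = yd - ratio * dx
    gam = gamma[1, 0] + omega0[1, 0] - ratio * (gamma[1, 1] + omega0[1, 1])
    beta_star = (xd @ y_star - T * gam) / sxx
    t_stat = (beta_star - beta0) * np.sqrt(sxx) / l11
    return beta_star, t_stat

def group_mean_fmols(Y, X, beta0=0.0, lags=None):
    """Average the member estimates; sum the member t-statistics and scale by sqrt(N)."""
    N, T = Y.shape
    if lags is None:
        lags = int(round(4 * (T / 100.0) ** (2.0 / 9.0)))
    res = np.array([fmols_member(Y[i], X[i], lags, beta0) for i in range(N)])
    return res[:, 0].mean(), res[:, 1].sum() / np.sqrt(N)

# Illustrative use on a hypothetical panel with true slope 2
rng = np.random.default_rng(1)
X = np.cumsum(rng.standard_normal((10, 200)), axis=1)
Y = 1.0 + 2.0 * X + rng.standard_normal((10, 200))
beta_gm, t_gm = group_mean_fmols(Y, X, beta0=2.0)
```

Because each member is handled by an ordinary time series routine, heterogeneous dynamics require no pooled adjustment term; only the final averaging step uses the cross sectional dimension.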
Next, we briefly describe a few examples of the use of these panel FMOLS
estimators. One obvious application is to the exchange rate literature, and in
particular the purchasing power parity literature. Long run absolute or strong
purchasing power parity predicts that nominal exchange rates and aggregate
price ratios among countries should be cointegrated with a unit cointegrating
vector, so that the real exchange rate is stationary. However, panel unit root
tests based on Levin & Lin (1993) have generally found mixed results. See for
example Oh (1996) and Papell (1997) and Wu (1996) among others. On the
other hand, panel cointegration tests based on Pedroni (1995, 1997a) have
generally rejected the null of no cointegration. See for example Canzoneri,
Cumby & Diba (1996), Chinn (1997) and Taylor (1996) among others for
these. By contrast, long run relative or weak purchasing power parity simply
predicts that the nominal exchange rate and aggregate price ratios will be
cointegrated, though not necessarily with a unit cointegrating vector. The panel
FMOLS estimators presented in this paper are an obvious way to distinguish
between these two hypotheses, and Pedroni (1996a, 1999) uses these panel
FMOLS estimators to show that only the relative, weak form of purchasing
power parity holds for a panel of post Bretton Woods period floating exchange
rates. The latter paper contrasts results for both a parametric group mean DOLS
estimator and nonparametric group mean FMOLS estimator for the weak
purchasing power parity test. In a similar spirit, Alexius & Nilson (2000),
Canzoneri, Cumby & Diba (1996), Chinn (1997) apply these panel FMOLS
tests from Pedroni (1996a) to test the Samuelson-Balassa hypothesis that long
run movements of real exchange rates are driven by differences in long run
relative productivities among countries.
Other applications of these panel FMOLS tests have been to the
growth literature. Neusser & Kugler (1998) use the tests from Pedroni (1996a)
to investigate the connection between financial development and growth. Kao,
Chiang & Chen (1999) use a panel FMOLS estimator and compare it to a panel
DOLS estimator to investigate the connection between research and development expenditure and growth. Keller & Pedroni (1999) use the group mean
panel estimator presented in this chapter to study the mechanism by which
imported R&D impacts growth at the industry level and demonstrate the
attractiveness of the more flexible form of the group mean estimator. Canning
& Pedroni (1999) use the same group mean panel FMOLS test as a first step
estimator to construct a test for the direction of long run causality between
public infrastructure and long run growth. Finally Pedroni & Wen (2000) make
use of the group mean panel FMOLS estimator as a first step estimator in an
overlapping generations model to identify the position of the U.S., Japanese
and European economies relative to the golden rule, and the extent to which
social security transfer programs can move economies closer to this position.
This is just a brief summary of the application of these estimators to two
literatures, the exchange rate and growth literatures. Needless to say, many
potential applications exist beyond these two literatures.
whether heterogeneity of the cointegrating vector can be ruled out, and it would
be particularly nice to test the null hypothesis that the cointegrating vectors are
heterogeneous in such panels with heterogeneous dynamics. In this context,
Pedroni (1998) provides a technique that allows one to test such a null
hypothesis against the alternative hypothesis that they are homogeneous and
demonstrates how the technique can be used to test whether convergence in the
Solow growth model occurs to distinct versus common steady states for the
Summers and Heston data set.
Another important issue that is often raised for these types of panels pertains
to the assumption of cross sectional independence as per assumption 1.2 in this
chapter. The standard approach is to use common time dummies, which in
many cases is sufficient to deal with cross sectional dependence. However, in
some cases, common time dummies may not be sufficient, particularly when
the cross sectional dependence is not limited to contemporaneous effects and is
dynamic in nature. Pedroni (1997b) proposes an asymptotic covariance
weighted GLS approach to deal with such dynamic cross sectional dependence
for the case in which the time series dimension is considerably larger than the
cross sectional dimension, and applies the panel fully modified form of the test
to the purchasing power parity hypothesis using monthly OECD exchange rate
data. It is interesting to note, however, that for this particular application, taking
account of such cross sectional dependencies does not appear to impact the
conclusions and it is possible that in many cases cross sectional dependence
does not play as large a role as one might anticipate once common time
dummies have been included, although this remains an open question.
Another important issue is parametric versus non-parametric estimation of
nuisance parameters. Clearly, any of the estimators presented here can be
implemented by taking care of the nuisance parameter effects either
nonparametrically using kernel estimators, or parametrically, as for example
using dynamic OLS corrections. Generally speaking, non-parametric estimation
tends to be more robust, since one does not need to assume a specific
parametric form. On the other hand, since non-parametric estimation relies on
fewer assumptions, it generally requires more data than parametric estimation.
Consequently, for conventional time series tests, when data is limited it is often
worth making specific parametric assumptions. For panels, on the other hand,
the greater abundance of data suggests an opportunity to take advantage of the
greater robustness of nonparametric methods, though ultimately the choice
may simply be a matter of taste. The Monte Carlo simulation results provided
here demonstrate that even in the presence of considerable heterogeneity, nonparametric correction methods do very well for the group mean estimator and
the corresponding t-statistic.
NOTES
1. The results in section 2 and appendix A first appeared in Pedroni (1996a). The
Indiana University working paper series is available at http://www.indiana.edu/ iuecon/
workpaps/
2. In fact the computer program which accompanies this paper also allows one to
implement these tests for any arbitrary number of regressors. It is available upon request
from the author at ppedroni@indiana.edu
3. See Phillips & Moon (1999) for a recent formal study of the regularity conditions
required for the use of sequential limit theory in panel data and a set of conditions under
which sequential limits imply joint limits, including the case in which the long run
variances differ among members of the panel.
4. These results are for the OLS estimator when the variables are cointegrated. A
related stream of the literature studies the properties of the panel OLS estimator when
the variables are not cointegrated and the regression is spurious. See for example Entorf
(1997), Kao (1999), Phillips & Moon (1999) and Pedroni (1993, 1997a) on spurious
regression in nonstationary panels.
5. A separate issue pertains to differences between the sample averages and the true
population means. Since we are treating the asymptotics sequentially, this difference
goes to zero as T grows large prior to averaging over N, and thus does not impact the
limiting distribution. Otherwise, more generally we would require that the ratio N/T
goes to zero as N and T grow large in order to ensure that these differences do not
impact the limiting distribution. We return to this point in the discussion of the small
sample properties in section 3.2.
6. Of course this is not to say that all within dimension estimators will necessarily
suffer from this particular form of size distortion, and it is likely that some forms of the
pooled FMOLS estimator will be better behaved than others. Nevertheless, given the
other attractive features of the between dimension group mean estimator, we focus here
on reporting the very attractive small sample properties of this estimator.
7. I am grateful to an anonymous referee for suggesting this section.
ACKNOWLEDGMENTS
I thank especially Bob Cumby, Bruce Hansen, Roger Moon, Peter Phillips,
Norman Swanson and Pravin Trivedi and two anonymous referees for helpful
comments and suggestions on various earlier versions, and Maria Arbatskaya
for research assistance. The paper has also benefitted from presentations at the
June 1996 North American Econometric Society Summer Meetings, the April
1996 Midwest International Economics Meetings, and workshop seminars at
Rice University-University of Houston, Southern Methodist University, The
Federal Reserve Bank of Kansas City, U. C. Santa Cruz and Washington
University. The current version of the paper was completed while I was a
visitor at the Department of Economics at Cornell University, and I thank the
members of the Department for their generous hospitality. A computer program
which implements these tests is available upon request from the author at
ppedroni@indiana.edu
REFERENCES
Alexius, A., & Nilson, J. (2000). Real Exchange Rates and Fundamentals: Evidence from 15
OECD Countries. Open Economies Review, forthcoming.
Canning, D., & Pedroni, P. (1999). Infrastructure and Long Run Economic Growth. CAE Working
paper, No. 9909, Cornell University.
Canzoneri M., Cumby, R., & Diba, B. (1996). Relative Labor Productivity and the Real Exchange
Rate in the Long Run: Evidence for a Panel of OECD Countries. NBER Working paper No.
5676.
Chinn, M. (1997). Sectoral Productivity, Government Spending and Real Exchange Rates:
Empirical Evidence for OECD Countries. NBER Working paper No. 6017.
Chinn, M., & Johnson, L. (1996). Real Exchange Rate Levels, Productivity and Demand Shocks:
Evidence from a Panel of 14 Countries. NBER Working paper No. 5709.
Entorf, H. (1997). Random Walks and Drifts: Nonsense Regression and Spurious Fixed-Effect
Estimation. Journal of Econometrics, 80, 287–96.
Evans, P., & Karras, G. (1996). Convergence Revisited. Journal of Monetary Economics, 37,
249–265.
Im, K., Pesaran, H., & Shin, Y. (1995). Testing for Unit Roots in Heterogeneous Panels. Working
paper, Department of Economics, University of Cambridge.
Kao, C. (1999). Spurious Regression and Residual-Based Tests for Cointegration in Panel Data.
Journal of Econometrics, 90, 1–44.
Kao, C., & Chen, B. (1995). On the Estimation and Inference of a Cointegrated Regression in
Panel Data When the Cross-section and Time-series Dimensions Are Comparable in
Magnitude. Working paper, Department of Economics, Syracuse University.
Kao, C., & Chiang, M. (1997). On the Estimation and Inference of a Cointegrated Regression In
Panel Data. Working paper, Department of Economics, Syracuse University.
Kao, C., Chiang, M., & Chen, B. (1999). International R&D Spillovers: An Application of
Estimation and Inference in Panel Cointegration. Oxford Bulletin of Economics and
Statistics, 61(4), 691–709.
Keller, W., & Pedroni, P. (1999). Does Trade Affect Growth? Estimating R&D Driven Models of
Trade and Growth at the Industry Level. Working paper, Department of Economics, Indiana
University and University of Texas.
Levin, A., & Lin, F. (1993). Unit Root Tests in Panel Data; Asymptotic and Finite-sample
Properties. Working paper, Department of Economics, U. C. San Diego.
Mark, N., & Sul, D. (1999). A Computationally Simple Cointegration Vector Estimator for Panel
Data. Working paper, Department of Economics, Ohio State University.
Neusser, K., & Kugler, M. (1998). Manufacturing Growth and Financial Development: Evidence
from OECD Countries. Review of Economics and Statistics, 80, 638–646.
Newey, W., & West, K. (1987). A Simple, Positive Semi-Definite, Heteroskedasticity and
Autocorrelation Consistent Covariance Matrix. Econometrica, 55, 703–708.
Newey, W., & West, K. (1994). Autocovariance Lag Selection in Covariance Matrix Estimation.
Review of Economic Studies, 61, 631–653.
Obstfeld M., & Taylor, A. (1996). International Capital-Market Integration over the Long Run:
The Great Depression as a Watershed. Working paper, Department of Economics, U. C.
Berkeley.
Oh, K. (1996). Purchasing Power Parity and Unit Root Tests Using Panel Data. Journal of
International Money and Finance, 15, 405–418.
Papell, D. (1997). Searching for Stationarity: Purchasing Power Parity Under the Current Float.
Journal of International Economics, 43, 313–32.
Pedroni, P. (1993). Panel Cointegration. Chapter 2 in Panel Cointegration, Endogenous Growth
And Business Cycles in Open Economies, Columbia University Dissertation, Ann Arbor,
MI: UMI Publishers.
Pedroni, P. (1995). Panel Cointegration; Asymptotic and Finite Sample Properties of Pooled Time
Series Tests, With an Application to the PPP Hypothesis. Working paper, Department of
Economics, No. 95013, Indiana University.
Pedroni, P. (1996a). Fully Modified OLS for Heterogeneous Cointegrated Panels and the Case of
Purchasing Power Parity. Working paper No. 96020, Department of Economics, Indiana
University.
Pedroni, P. (1996b). Human Capital, Endogenous Growth, & Cointegration for Multi-Country
Panels. Working paper, Department of Economics, Indiana University.
Pedroni, P. (1997a). Panel Cointegration; Asymptotic and Finite Sample Properties of Pooled Time
Series Tests, With an Application to the PPP Hypothesis; New Results. Working paper,
Department of Economics, Indiana University.
Pedroni, P. (1997b). On the Role of Cross Sectional Dependency in Dynamic Panel Unit Root and
Panel Cointegration Exchange Rate Studies. Working paper, Department of Economics,
Indiana University.
Pedroni, P. (1998). Testing for Convergence to Common Steady States in Nonstationary
Heterogeneous Panels. Working paper, Department of Economics, Indiana University.
Pedroni, P. (1999). Purchasing Power Parity Tests in Cointegrated Panels. Working paper,
Department of Economics, Indiana University.
Pedroni, P., & Wen, Y. (2000). Government and Dynamic Efficiency. Working paper, Department
of Economics, Cornell University and Indiana University.
Pesaran, H., & Smith, R. (1995). Estimating Long Run Relationships from Dynamic
Heterogeneous Panels. Journal of Econometrics, 68, 79–114.
Phillips, P., & Durlauf, S. (1986). Multiple Time Series Regressions with Integrated Processes.
Review of Economic Studies, 53, 473–495.
Phillips, P., & Hansen, B. (1990). Statistical Inference in Instrumental Variables Regression with
I(1) Processes. Review of Economic Studies, 57, 99–125.
Phillips, P., & Loretan, M. (1991). Estimating Long-run Economic Equilibria. Review of Economic
Studies, 58, 407–436.
Phillips, P., & Moon, H. (1999). Linear Regression Limit Theory for Nonstationary Panel Data.
Econometrica, 67, 1057–1112.
Quah, D. (1994). Exploiting Cross-Section Variation for Unit Root Inference in Dynamic Data.
Economics Letters, 44, 9–19.
Taylor, A. (1996). International Capital Mobility in History: Purchasing Power Parity in the
Long-Run. NBER Working paper No. 5742.
Wu, Y. (1996). Are Real Exchange Rates Nonstationary? Evidence from a Panel-Data Test. Journal
of Money Credit and Banking, 28, 54–63.
MATHEMATICAL APPENDIX A
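Proposition 1.1: We establish notation here which will be used throughout the
remainder of the appendix. Let $Z_{it} = Z_{it-1} + \xi_{it}$ where $\xi_{it} = (\mu_{it}, \epsilon_{it})'$. Then by
virtue of assumption 1.1 and the functional central limit theorem,

$$T^{-1}\sum_{t=1}^{T} \bar{Z}_{it}\,\xi_{it}' \;\Rightarrow\; \int_{r=0}^{1} \bar{B}(r,\Omega_i)\,dB(r,\Omega_i)' + \Gamma_i + \Omega_i^{o} \tag{A1}$$

$$T^{-2}\sum_{t=1}^{T} \bar{Z}_{it}\,\bar{Z}_{it}' \;\Rightarrow\; \int_{r=0}^{1} \bar{B}(r,\Omega_i)\,\bar{B}(r,\Omega_i)'\,dr \tag{A2}$$

for all i, where $\bar{Z}_{it} = Z_{it} - \bar{Z}_i$ refers to the demeaned discrete time process and
$\bar{B}(r,\Omega_i)$ is demeaned vector Brownian motion with asymptotic covariance $\Omega_i$.
This vector can be decomposed as $\bar{B}(r,\Omega_i) = L_i'\,\bar{W}_i(r)$ where $L_i = \Omega_i^{1/2}$ is the
lower triangular decomposition of $\Omega_i$ and
$\bar{W}_i(r) = \left(W_{1i}(r) - \int_0^1 W_{1i}(r)\,dr,\; W_{2i}(r) - \int_0^1 W_{2i}(r)\,dr\right)'$,
with $W_{1i}$ independent of $W_{2i}$. Under the null hypothesis, the statistic can be
written in these terms as

$$T\sqrt{N}\,(\hat{\beta}_{NT} - \beta) = \frac{\dfrac{1}{\sqrt{N}}\displaystyle\sum_{i=1}^{N}\left[T^{-1}\sum_{t=1}^{T}\bar{Z}_{it}\,\xi_{it}'\right]_{21}}{\dfrac{1}{N}\displaystyle\sum_{i=1}^{N}\left[T^{-2}\sum_{t=1}^{T}\bar{Z}_{it}\,\bar{Z}_{it}'\right]_{22}} \tag{A3}$$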
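Based on (A1), as $T \to \infty$, the bracketed term of the numerator converges to

$$\left[\int_{r=0}^{1} \bar{B}(r,\Omega_i)\,dB(r,\Omega_i)'\right]_{21} + \Gamma_{21i} + \Omega^{o}_{21i} \tag{A4}$$

the first term of which can be decomposed as

$$\left[\int_{r=0}^{1} \bar{B}(r,\Omega_i)\,dB(r,\Omega_i)'\right]_{21} = L_{11i}L_{22i}\left(\int W_{2i}\,dW_{1i} - W_{1i}(1)\int W_{2i}\right) + L_{21i}L_{22i}\left(\int W_{2i}\,dW_{2i} - W_{2i}(1)\int W_{2i}\right) \tag{A5}$$

The second of these terms has a nonzero expected value,

$$E\left[L_{21i}L_{22i}\left(\int W_{2i}\,dW_{2i} - W_{2i}(1)\int W_{2i}\right)\right] = -\frac{1}{2}\,L_{21i}L_{22i} \tag{A6}$$

Thus, given that the asymptotic covariance matrix, $\Omega_i$, must have positive
diagonals, the expected value of the expression (A4) will be zero only if
$L_{21i} = \Gamma_{21i} = \Omega^{o}_{21i} = 0$, which corresponds to strict exogeneity of regressors for all
members of the panel. Finally, even if such strict exogeneity does hold, the
variance of the numerator will still be influenced by the parameters $L_{11i}$, $L_{22i}$
which reflect the idiosyncratic serial correlation patterns in the individual
cross sectional members. Unless these are homogeneous across members of the
panel, they will lead to non-trivial data dependencies in the asymptotic
distribution.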
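The expectation in (A6) can be checked numerically. The sketch below is an added illustration, not part of the original appendix: it discretises a single standard Brownian motion on [0, 1] (standing in for W_2i) and averages the functional ∫W dW − W(1)∫W dr over many draws. The function names, the number of steps and the number of draws are arbitrary choices made for this example; the average should be close to −1/2, which (A6) then scales by L_21i L_22i.

```python
import random


def demeaned_cross_moment(n_steps, rng):
    """One draw of  int W dW - W(1) * int W dr  for a discretised Brownian motion."""
    dt = 1.0 / n_steps
    w = 0.0          # current value of W
    ito_sum = 0.0    # sum of W_{t-1} * dW_t, approximating the Ito integral
    area = 0.0       # left Riemann sum approximating int_0^1 W(r) dr
    for _ in range(n_steps):
        dw = rng.gauss(0.0, dt ** 0.5)
        ito_sum += w * dw
        area += w * dt
        w += dw
    return ito_sum - w * area


def mc_mean(n_draws=4000, n_steps=200, seed=0):
    """Monte Carlo average of the functional over independent draws."""
    rng = random.Random(seed)
    total = sum(demeaned_cross_moment(n_steps, rng) for _ in range(n_draws))
    return total / n_draws


if __name__ == "__main__":
    # Should print a value near -0.5 (up to simulation and discretisation error).
    print(mc_mean())
```

Because the Itô integral has mean zero, the whole bias comes from the demeaning term W(1)∫W dr, whose expectation is ∫ r dr = 1/2.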
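Proposition 1.2: Continuing with the same notation as above, the fully modified
statistic can be written under the null hypothesis as

$$T\sqrt{N}\,(\hat{\beta}^{*}_{NT} - \beta) = \frac{\dfrac{1}{\sqrt{N}}\displaystyle\sum_{i=1}^{N}\hat{L}_{11i}^{-1}\hat{L}_{22i}^{-1}\left[(0,1)\left(T^{-1}\sum_{t=1}^{T}\bar{Z}_{it}\,\xi_{it}'\right)\left(1,\,-\frac{\hat{L}_{21i}}{\hat{L}_{22i}}\right)' - \hat{\gamma}_i\right]}{\dfrac{1}{N}\displaystyle\sum_{i=1}^{N}\hat{L}_{22i}^{-2}\left[T^{-2}\sum_{t=1}^{T}\bar{Z}_{it}\,\bar{Z}_{it}'\right]_{22}} \tag{A7}$$

As $T \to \infty$, the term $(0,1)\left(T^{-1}\sum_{t}\bar{Z}_{it}\,\xi_{it}'\right)\left(1, -\hat{L}_{21i}/\hat{L}_{22i}\right)'$ in the numerator converges to

$$\left[\int_{r=0}^{1}\bar{B}(r,\Omega_i)\,dB(r,\Omega_i)'\right]_{21} - \frac{\hat{L}_{21i}}{\hat{L}_{22i}}\left[\int_{r=0}^{1}\bar{B}(r,\Omega_i)\,dB(r,\Omega_i)'\right]_{22} + \Gamma_{21i} + \Omega^{o}_{21i} - \frac{\hat{L}_{21i}}{\hat{L}_{22i}}\left(\Gamma_{22i} + \Omega^{o}_{22i}\right) \tag{A8}$$

which can be decomposed into the elements of $W_i$ such that

$$\left[\int_{r=0}^{1}\bar{B}(r,\Omega_i)\,dB(r,\Omega_i)'\right]_{21} = L_{11i}L_{22i}\left(\int W_{2i}\,dW_{1i} - W_{1i}(1)\int W_{2i}\right) + L_{21i}L_{22i}\left(\int W_{2i}\,dW_{2i} - W_{2i}(1)\int W_{2i}\right) \tag{A9}$$

$$\left[\int_{r=0}^{1}\bar{B}(r,\Omega_i)\,dB(r,\Omega_i)'\right]_{22} = L_{22i}^{2}\left(\int W_{2i}\,dW_{2i} - W_{2i}(1)\int W_{2i}\right) \tag{A10}$$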
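where the index r has been omitted for notational simplicity. Thus, if a
consistent estimator of $\Omega_i$ is employed, so that $\hat{\Omega}_i \to \Omega_i$ and consequently $\hat{L}_i \to L_i$
and $\hat{\gamma}_i \to \gamma_i$, then

$$\hat{L}_{11i}^{-1}\hat{L}_{22i}^{-1}\left[(0,1)\left(T^{-1}\sum_{t=1}^{T}\bar{Z}_{it}\,\xi_{it}'\right)\left(1,\,-\frac{\hat{L}_{21i}}{\hat{L}_{22i}}\right)' - \hat{\gamma}_i\right] \;\Rightarrow\; \int W_{2i}\,dW_{1i} - W_{1i}(1)\int W_{2i}(r)\,dr \tag{A11}$$

with mean and variance given by

$$E\left[\int W_{2i}\,dW_{1i} - W_{1i}(1)\int W_{2i}\,dr\right] = 0 \tag{A12}$$

$$E\left[\left(\int W_{2i}\,dW_{1i}\right)^{2} - 2\,W_{1i}(1)\int W_{2i}\,dW_{1i}\int W_{2i}\,dr + W_{1i}(1)^{2}\left(\int W_{2i}\,dr\right)^{2}\right] = \frac{1}{2} - 2\left(\frac{1}{3}\right) + \frac{1}{3} = \frac{1}{6} \tag{A13}$$

respectively. Now that this expression has been rendered void of any
idiosyncratic components associated with the original $B(r,\Omega_i)$, then by virtue of
assumption 1.2 and a standard central limit theorem argument,

$$\frac{1}{\sqrt{N}}\sum_{i=1}^{N}\left(\int W_{2i}\,dW_{1i} - W_{1i}(1)\int W_{2i}\,dr\right) \;\Rightarrow\; N\!\left(0, \tfrac{1}{6}\right) \tag{A14}$$

Similarly, for the denominator,

$$\left[\int_{r=0}^{1}\bar{B}(r,\Omega_i)\,\bar{B}(r,\Omega_i)'\,dr\right]_{22} = L_{22i}^{2}\left(\int W_{2i}(r)^{2}\,dr - \left(\int W_{2i}(r)\,dr\right)^{2}\right) \tag{A15}$$

Thus,

$$\hat{L}_{22i}^{-2}\left[T^{-2}\sum_{t=1}^{T}\bar{Z}_{it}\,\bar{Z}_{it}'\right]_{22} \;\Rightarrow\; \int W_{2i}(r)^{2}\,dr - \left(\int W_{2i}(r)\,dr\right)^{2} \tag{A16}$$

$$E\left[\int W_{2i}(r)^{2}\,dr - \left(\int W_{2i}(r)\,dr\right)^{2}\right] = \frac{1}{2} - \frac{1}{3} = \frac{1}{6} \tag{A17}$$

Again, since this expression has been rendered void of any idiosyncratic
components associated with the original $B(r,\Omega_i)$, then by virtue of assumption
1.2 and a standard law of large numbers argument,

$$\frac{1}{N}\sum_{i=1}^{N}\left(\int W_{2i}(r)^{2}\,dr - \left(\int W_{2i}(r)\,dr\right)^{2}\right) \;\to\; \frac{1}{6} \tag{A18}$$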
W1i(r) dr
W1i =
and
T 1/2x i
W2i(r) dr
as
T ,
and
setting
and (A17) the results for the distribution in the case with no estimated
intercepts. In this case the mean given by (A12) remains zero, but the variance
in (A13) becomes $\tfrac{1}{2}$ and the mean in (A17) also becomes $\tfrac{1}{2}$. Thus,
$T\sqrt{N}\,(\hat{\beta}^{*}_{NT} - \beta) \Rightarrow N(0, 2)$ for this case.
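As a sketch of the arithmetic behind this limit (an added illustration that uses only the moments quoted in (A13) and (A17)): the statistic is the ratio of a numerator that is asymptotically normal with variance $v_{num}$ to a denominator that converges to a constant $m_{den}$, so the limiting variance is $v_{num}/m_{den}^{2}$. The symbols $v_{num}$ and $m_{den}$ are introduced here for the illustration only.

$$\frac{v_{num}}{m_{den}^{2}}:\qquad \text{no estimated intercepts: } \frac{1/2}{(1/2)^{2}} = 2, \qquad \text{estimated intercepts: } \frac{1/6}{(1/6)^{2}} = 6 .$$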
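Corollary 1.2: In terms of earlier notation, the statistic can be rewritten as:

$$t_{\hat{\beta}^{*}_{NT}} = \frac{\dfrac{1}{\sqrt{N}}\displaystyle\sum_{i=1}^{N}\hat{L}_{11i}^{-1}\hat{L}_{22i}^{-1}\left[(0,1)\left(T^{-1}\sum_{t=1}^{T}\bar{Z}_{it}\,\xi_{it}'\right)\left(1,\,-\frac{\hat{L}_{21i}}{\hat{L}_{22i}}\right)' - \hat{\gamma}_i\right]}{\sqrt{\dfrac{1}{N}\displaystyle\sum_{i=1}^{N}\hat{L}_{22i}^{-2}\left[T^{-2}\sum_{t=1}^{T}\bar{Z}_{it}\,\bar{Z}_{it}'\right]_{22}}} \tag{A19}$$

Since

$$E\left[\left(\int W_{2i}\,dW_{1i} - W_{1i}(1)\int W_{2i}\right)^{2}\right] = E\left[\int W_{2i}^{2} - \left(\int W_{2i}\right)^{2}\right] \tag{A20}$$

holds both for the demeaned processes and for $W_{2i}$, $W_{1i}$ themselves,
then $t_{\hat{\beta}^{*}_{NT}} \Rightarrow N(0, 1)$ irrespective of whether $\bar{x}_i$, $\bar{y}_i$ are
estimated or not.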
Proposition 1.3: Write the statistic as:
N
t*
NT =
N
i=1
t=1
Z itit
1
L 2
11i (0, 1) T
(T
L 21i
i
L 22i
1/2
it)22
Z itZ
1,
t=1
(A21)
L11iL22i
W2i(r) dr
~ N 0, L11iL22i
W2i(r)2 dr
W2i(r) dr
(A22)
by virtue of the independence of W21i(r) and dW1i(r). Since the second bracketed
term converges to
L22i
W2i(r)2 dr
1/2
(A23)
W2i(r) dr
then, taken together, for L i Li, (A21) becomes a standardized sum of i.i.d.
standard normals regardless of whether or not
W1i,
xit) T i
L 22i
i
=
1
t
=
1
NT =
*
N
T
(xit x i)2
L 222i
i=1
L L
1 1
11i 22i
i=1
t=1
L 11i L 22i
1+
L 22i
t=1
(xit x i)2
(A24)
(xit x i)2
L 2
22i
i=1
t=1
L 11i L 22i
1 1
Since L 2
, the last term in (A24) reduces to , thereby
22i = L11iL22i 1 +
L 22i
APPENDIX B
Table I. Small Sample Performance of Group Mean Panel FMOLS with
Heterogeneous Dynamics
Case 1: θ21i ~ U(0.0, 0.8)

  N     T     bias   std error   5% size   10% size
 10    10    0.058     0.115      0.282     0.362
 10    20    0.018     0.047      0.084     0.145
 10    30    0.009     0.029      0.061     0.110
 10    40    0.006     0.020      0.035     0.076
 10    50    0.004     0.016      0.027     0.062
 10    60    0.003     0.012      0.020     0.049
 10    70    0.002     0.010      0.016     0.044
 10    80    0.002     0.009      0.014     0.040
 10    90    0.002     0.008      0.014     0.038
 10   100    0.001     0.007      0.014     0.037

 20    10    0.034     0.079      0.291     0.378
 20    20    0.012     0.033      0.100     0.166
 20    30    0.006     0.020      0.076     0.132
 20    40    0.004     0.014      0.045     0.093
 20    50    0.003     0.011      0.039     0.081
 20    60    0.003     0.009      0.028     0.066
 20    70    0.002     0.007      0.026     0.059
 20    80    0.002     0.006      0.021     0.055
 20    90    0.002     0.006      0.020     0.050
 20   100    0.001     0.005      0.018     0.052

 30    10    0.049     0.061      0.386     0.470
 30    20    0.017     0.025      0.156     0.234
 30    30    0.009     0.015      0.107     0.177
 30    40    0.006     0.011      0.072     0.133
 30    50    0.004     0.008      0.059     0.118
 30    60    0.003     0.007      0.047     0.096
 30    70    0.003     0.006      0.039     0.086
 30    80    0.002     0.005      0.035     0.073
 30    90    0.002     0.004      0.032     0.077
 30   100    0.002     0.004      0.030     0.076
Notes: Based on 10,000 independent draws of the cointegrated system (1)–(3), with
β = 2.0, α1i ~ U(2.0, 4.0), Ψ11i = Ψ22i = 1.0, Ψ21i ~ U(−0.85, 0.85) and θ11i ~ U(0.1, 0.7),
θ12i ~ U(0.0, 0.8), θ21i ~ U(0.0, 0.8), θ22i ~ U(0.2, 1.0).
Table II. Small Sample Performance of Group Mean Panel FMOLS with
Heterogeneous Dynamics
Case 2: θ21i ~ U(−0.8, 0.0)

  N     T     bias   std error   5% size   10% size
 10    10    0.082     0.132      0.422     0.498
 10    20    0.041     0.058      0.234     0.324
 10    30    0.025     0.037      0.187     0.268
 10    40    0.016     0.027      0.137     0.213
 10    50    0.012     0.021      0.115     0.185
 10    60    0.009     0.017      0.091     0.155
 10    70    0.007     0.014      0.087     0.151
 10    80    0.006     0.012      0.078     0.140
 10    90    0.005     0.011      0.072     0.135
 10   100    0.005     0.010      0.063     0.120

 20    10    0.093     0.092      0.581     0.648
 20    20    0.043     0.042      0.352     0.447
 20    30    0.026     0.027      0.265     0.361
 20    40    0.017     0.020      0.205     0.294
 20    50    0.012     0.015      0.158     0.242
 20    60    0.009     0.012      0.130     0.211
 20    70    0.007     0.010      0.117     0.194
 20    80    0.006     0.009      0.109     0.181
 20    90    0.005     0.008      0.103     0.170
 20   100    0.004     0.007      0.090     0.156

 30    10    0.070     0.071      0.563     0.630
 30    20    0.033     0.032      0.339     0.433
 30    30    0.020     0.020      0.259     0.352
 30    40    0.013     0.015      0.196     0.289
 30    50    0.009     0.011      0.152     0.236
 30    60    0.007     0.009      0.131     0.211
 30    70    0.006     0.008      0.113     0.190
 30    80    0.005     0.007      0.103     0.175
 30    90    0.004     0.006      0.096     0.164
 30   100    0.003     0.005      0.087     0.156
Notes: Based on 10,000 independent draws of the cointegrated system (1)–(3), with
β = 2.0, α1i ~ U(2.0, 4.0), Ψ11i = Ψ22i = 1.0, Ψ21i ~ U(−0.85, 0.85) and θ11i ~ U(0.1, 0.7),
θ12i ~ U(−0.8, 0.0), θ21i ~ U(−0.8, 0.0), θ22i ~ U(0.2, 1.0).
Table III. Small Sample Performance of Group Mean Panel FMOLS with
Heterogeneous Dynamics
Case 3: θ21i ~ U(−0.4, 0.4)

  N     T     bias   std error   5% size   10% size
 10    10    0.009     0.129      0.284     0.367
 10    20    0.011     0.052      0.113     0.179
 10    30    0.008     0.033      0.086     0.150
 10    40    0.005     0.023      0.058     0.113
 10    50    0.004     0.018      0.048     0.093
 10    60    0.003     0.014      0.039     0.083
 10    70    0.002     0.012      0.037     0.077
 10    80    0.002     0.011      0.031     0.072
 10    90    0.002     0.009      0.029     0.068
 10   100    0.001     0.008      0.028     0.062

 20    10    0.028     0.090      0.346     0.430
 20    20    0.014     0.037      0.145     0.222
 20    30    0.009     0.024      0.106     0.179
 20    40    0.006     0.017      0.077     0.138
 20    50    0.004     0.013      0.060     0.114
 20    60    0.003     0.010      0.048     0.093
 20    70    0.002     0.009      0.040     0.085
 20    80    0.002     0.008      0.037     0.083
 20    90    0.001     0.007      0.035     0.079
 20   100    0.001     0.006      0.035     0.078

 30    10    0.008     0.069      0.317     0.402
 30    20    0.006     0.028      0.122     0.194
 30    30    0.004     0.018      0.095     0.155
 30    40    0.003     0.013      0.068     0.122
 30    50    0.002     0.010      0.054     0.105
 30    60    0.001     0.008      0.044     0.088
 30    70    0.001     0.007      0.038     0.082
 30    80    0.001     0.006      0.036     0.076
 30    90    0.001     0.005      0.033     0.073
 30   100    0.001     0.005      0.036     0.074
Notes: Based on 10,000 independent draws of the cointegrated system (1)–(3), with
β = 2.0, α1i ~ U(2.0, 4.0), Ψ11i = Ψ22i = 1.0, Ψ21i ~ U(−0.85, 0.85) and θ11i ~ U(0.1, 0.7),
θ12i ~ U(−0.4, 0.4), θ21i ~ U(−0.4, 0.4), θ22i ~ U(0.2, 1.0).
I. INTRODUCTION
In economics it is often of interest to test whether a set of time series moves
together, that is, whether the series are driven by some common factors. The
vast literature on cointegration has focussed on long-run comovements for
nonstationary time series. More recently, some authors have analyzed the
existence of short-run comovements between stationary time series or between
first differenced cointegrated I(1) series (see Tiao & Tsay, 1989; Engle &
Kozicki, 1993; Gouriéroux & Peaucelle, 1993; Vahid & Engle, 1993; Vahid &
Engle, 1997; Ahn, 1997). Among these approaches, the concept of serial
correlation common features (SCCF hereafter) introduced by Engle & Kozicki
(1993) has proved useful. It means that stationary time series move together
because there exist linear combinations of these variables that yield white noise
processes. These common feature vectors are measures for analyzing short-run
relationships between economic variables suggested by economic theory, such
as relative purchasing power parity (Gouriéroux & Peaucelle, 1993), the permanent
income hypothesis (Campbell & Mankiw, 1990; Jobert, 1995), cross-country
real interest rate differentials (Kugler & Neusser, 1993), real business cycle
models (Issler & Vahid, 1996), convergence of economies (Beine & Hecq,
1997, 1998), and Okun's Law (Candelon & Hecq, 2000).
Nonstationary Panels, Panel Cointegration and Dynamic Panels, Volume 15, pages 131–160.
Copyright © 2000 by Elsevier Science Inc.
All rights of reproduction in any form reserved.
ISBN: 0-7623-0688-2
Serial correlation common features imply the existence of a reduced number
of common dynamic factors explaining short-run comovements in economic
variables. A companion form of the common features models is the common
factor representation which has been used in macroeconomics for some
decades (see e.g. Engle & Watson, 1981; Geweke, 1977; Lumsdaine & Prasad,
1997; Singleton, 1980). Beyond economic considerations, through the reduced-rank
restrictions, the existence of common features is likely to lead to a
reduction of the number of parameters to be estimated. In general, imposing
common cyclical feature restrictions when they are appropriate will induce an
increase in estimation efficiency (Lütkepohl, 1991) and accuracy of forecasts
(Vahid & Issler, 1999).
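The idea of a serial correlation common feature can be illustrated with a small simulation. The sketch below is an added example under assumed values, not the chapter's own design: two stationary series load on a single common dynamic factor (here phi times the lagged second series), with loadings lam and 1. Each series is autocorrelated, but the combination y − lam·z removes the factor and leaves white noise. The names simulate_sccf and acf1 and the parameter values are hypothetical.

```python
import random


def simulate_sccf(T=5000, lam=0.5, phi=0.8, seed=1):
    """Simulate two series sharing one common dynamic factor phi * z_{t-1}."""
    rng = random.Random(seed)
    y, z = [], []
    z_prev = 0.0
    for _ in range(T):
        common = phi * z_prev                    # common dynamic factor
        z_t = common + rng.gauss(0.0, 1.0)       # loading 1 on the factor
        y_t = lam * common + rng.gauss(0.0, 1.0) # loading lam on the factor
        y.append(y_t)
        z.append(z_t)
        z_prev = z_t
    return y, z


def acf1(x):
    """First-order sample autocorrelation."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


if __name__ == "__main__":
    lam = 0.5
    y, z = simulate_sccf(lam=lam)
    combo = [a - lam * b for a, b in zip(y, z)]  # cofeature combination (1, -lam)
    print("acf(y):", acf1(y))
    print("acf(z):", acf1(z))
    print("acf(y - lam*z):", acf1(combo))        # close to zero under SCCF
```

With one common factor driving two series, the loadings form a reduced-rank structure, which is the source of the parameter savings mentioned above.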
As with unit root and cointegration tests, the power of common cyclical
feature procedures may be low in small samples (Beine & Hecq, 1999). The
power of tests might be increased by relying on panel data instead of using only
time series data. Consequently, in this paper we propose to extend these models
by testing for serial correlation common features in a panel data framework. To
avoid confusion, it is worth noting that standard panel data models
with common parameter structures already imply a common feature
structure, namely the one which allows the behavior of N individuals to be pooled.
Note, however, that the assumption of poolability often made in panels may be far
too strong. An investigator may want to test which poolability restrictions are
supported by the data and which restrictions have to be rejected for the panel
data.
We propose to generalize the SCCF approach and apply it to search for
common cyclical features in panel data. In particular, we investigate whether
there exist linear combinations of the variables for individual or entity i which
are white noise for all i, in other words, whether the weights in the linear
combinations are identical across all entities. Developing a methodology to
analyze and test common cyclical features in panel data is of theoretical and
practical importance since common cyclical feature restrictions are less
restrictive than the assumption of identical parameters across individuals
usually made in panel data modeling.
Some purists might not use the term panel for this type of analysis. Indeed,
in the situations we are interested in, N will be relatively small compared to its
value in usual panel data and T is assumed large (with T → ∞ asymptotics).
Many macroeconomic studies deal with 15 to 50 annual observations for 20 to
100 countries, regions, industry levels or big firms. In those cases, the border
between pure panel analysis (N → ∞) and pure time series analysis (T → ∞) is
fuzzy. Far from impoverishing the panel data analysis, taking into account
medium or large size time series raises new interesting issues such as testing
for unit roots or cointegration in panel data (see inter alia Levin & Lin, 1993;
Pesaran & Smith, 1995; Evans & Karras, 1996a; Kao, 1999; Pedroni, 1997a;
Phillips & Moon, 1999b, and Phillips & Moon, 1999a, for the asymptotic
theory, and the recent issue of the Oxford Bulletin of Economics and Statistics,
1999).
The chapter is organized as follows. Section II provides an example of
common features between consumption and income implied by economic
theory and likely to be common to data for different countries. In Section III we
review the concept of serial correlation common features. Section IV extends
it to panel data. As we study differences and similarities in macroeconomic
series for different countries, we concentrate our analysis on the fixed effect
model (see Hsiao, 1986). Section V describes estimation procedures. In Section
VI simulation results are reported. In Section VII we present an empirical
analysis of the liquidity constraint consumption model for 22 OECD countries
and the G7. Section VIII concludes.
(1)
(2)
$y_t = y^P_t + y^T_t$,
which shows that aggregate consumption and income share a common trend $y^P_t$.
Note that because a fraction $\lambda$ of income accrues to individuals who consume
their current income rather than their permanent income, this model has been
labelled the $\lambda$ model by Campbell & Mankiw (1990, 1991). It is also easily seen
that if $\lambda = 0$ we get the permanent income model. In order to stress the common
cycle component let us take the first difference of aggregate consumption
$c_t = c_{1t} + c_{2t}$. By substituting the shares of income in the total income we obtain
$c_t = \lambda y_t + (1 - \lambda) y^P_t$ which in first differences can be written as:
$$\Delta c_t = \lambda\,\Delta y_t + (1 - \lambda)\,\Delta y^P_t . \tag{3}$$
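A short worked step, added here as an illustration: subtracting $\lambda\,\Delta y_t$ from both sides of (3) isolates the part of consumption growth that is not shared with current income growth. If one assumes, as in the standard permanent income setup, that permanent income follows a random walk so that $\Delta y^P_t$ cannot be predicted from the past (an assumption of this sketch), the combination below is white noise and $(1, -\lambda)'$ is a candidate common feature vector.

$$\Delta c_t - \lambda\,\Delta y_t = (1 - \lambda)\,\Delta y^P_t \qquad\Longleftrightarrow\qquad (1,\; -\lambda)\begin{pmatrix}\Delta c_t \\ \Delta y_t\end{pmatrix} = (1 - \lambda)\,\Delta y^P_t .$$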
ct
1
=
+
[21
yt
2
1
22]
ct 1
+
[ 1
yt 1
1
2]
ct 1
1t
+
yt 1
2t
(4)
expression (4) by
yields a white noise. In the less restrictive model, labelled
Weak Form Reduced Rank Structure (WF), , and a linear combination of
first differences in deviation from the long-run equilibrium is a white noise:
ct
[ 1
yt
1
ct 1
yt 1
1t
.
2t
(5)
Formal definitions of the strong and the weak form are given in Hecq, Palm &
Urbain (1997, 2000) and consequences in terms of common cycles as well as
inference issues are analyzed there as well. Notice that Hecq et al. (1997) also
consider the mixed form combining both the strong and the weak form.
Common features relationships give information on short-run comovements.
These relationships may come from economic theory (relative purchasing
power parity, PIH) or from stylized facts (convergence, Real Business Cycle
(RBC) models) and give the dynamic common factor within the system, i.e.
21ct 1 + 22yt 1 in the WF case for instance. The orthogonal complement of
= 0s 2), gives the factor loading of the common
labelled (
the ,
dynamics in the equations, that is = [ 1]
in system (4). Note that these
common dynamic factors should not be confused with common cycles.
(6)
N = 2                      T = 10    T = 25    T = 50    T = 100
Marg   bias_0.5             0.056     0.026     0.011     0.005
       bias_0.75−0.25       0.310     0.155     0.104     0.068
       χ²(2)                14.64      7.56      6.30      4.86
       χ²_ss(2)              6.22      5.20      5.04      4.42
Separ  bias_0.5             0.040     0.027     0.013     0.007
       bias_0.75−0.25       0.441     0.138     0.090     0.059
       χ²(8)                70.98     18.36     10.16      6.66
       χ²_ss(8)              12.8      7.14      6.16      5.14

N = 5                      T = 10    T = 25    T = 50    T = 100
Marg   bias_0.5             0.061     0.025     0.012     0.006
       bias_0.75−0.25       0.299     0.152     0.100     0.069
       χ²(2)                14.14      7.82      6.30      5.58
       χ²_ss(2)              5.86      5.44      5.18      5.04
Separ  bias_0.5                       0.019     0.011     0.007
       bias_0.75−0.25                 0.241     0.087     0.052
       χ²(14)                         99.76     62.88     25.18
       χ²_ss(14)                      35.04     15.26      9.38
T = 25 for N = 2 and for T = 50 for N = 5. However, the dispersion is too high for
smaller sample sizes, and the test statistics reject too often the presence of
two and five common feature vectors, respectively.
These illustrative Monte Carlo results call for an extension to a (possibly
nonstationary) panel common feature analysis.
Let the subscript i = 1, . . . , N indicate the different groups/entities/units,
t = 1, . . . , T denote the sample period and j = 1, . . . , n denote the number of
variables for each group/entity. We assume that the n-dimensional vector of
observed I(1) variables for entity i, Xi, t, is generated by a pi-th order
cointegrated VAR which can be expressed in error-correction form as follows:
pi 1
Xi, t = i + t + i
i Xi, t 1 +
j=1
i = 1, . . . N,
t = 1, . . . , T,
(7)
E(i, t
j, t) = i, j. Note that one could allow for random individual effects in
expression (7). This would lead to an error-component structure of i, t similar
to that used in the panel-data literature.
For system (7), we define a homogeneous SCCF panel model as follows:
Definition 1. A panel model is called an homogeneous panel common feature
model if there exists, i = 1, . . . , N, a (n si) matrix i = j i, j = 1, . . . , N,
ii, t
i Xi, t =
Xt =
Xt 1 +
Xt 1 + ut,
N1 . . . NN
N1 . . . NN
(8)
(9)
nN nN
11
N1
...
...
1N
NN
(10)
When $\Pi^{ur} = 0$, the system (9) is non-cointegrated. The approach presented can
be applied to non-cointegrated systems. Obviously, in such a system, the WF and
SF reduced rank structures are identical.
Without imposing any zero block restrictions, the large unrestricted model
(8) is not estimable in practice. Consequently, restrictions have to be
considered. We first describe cointegrating restrictions before introducing serial
correlation common feature restrictions.
1. Cointegrating Restrictions In A Panel VAR
We first consider restrictions on the long-run matrix $\Pi^{ur}$ in the unrestricted
VECM. Two types (A and B) of sequences of hypotheses naturally arise in
panel data. The hypotheses involved in a sequence can be tested either
sequentially or jointly.
A1: Absence of long-run Granger-causality [see Granger & Lin, 1995]
between the individual subgroups, i.e. $\Pi^{ur}$ is block-diagonal with elements
$\Pi_{ij} = 0$ for $i \neq j$.
A2: Cointegration in absence of long-run Granger-causality, i.e. $\Pi_{ii} = \alpha_i\beta_i'$,
with $\alpha_i$ and $\beta_i$ being $n \times r_i$ matrices of rank $r_i$, $i = 1, \ldots, N$.
A3: Homogeneous panel cointegration, i.e. $\beta_i = \beta_1$, $i = 1, \ldots, N$; $r = N r_1$.
B1: Cointegration, i.e. $\Pi^{ur} = \alpha\beta'$, with $\alpha$ and $\beta$ being $nN \times r$ matrices of rank
r.
B2: Complete separation in cointegration (see Granger & Haldrup, 1997), i.e.
$\alpha$ and $\beta$ are block-diagonal with typical blocks $\alpha_i$ and $\beta_i$ respectively, of rank
$r_i$, such that a typical block of $\alpha\beta'$ is $\alpha_i\beta_i'$ as defined in A2, and $r = \sum_{i=1}^{N} r_i$.
B3: Homogeneous panel cointegration, i.e. $\beta_i = \beta_1$; $i = 1, \ldots, N$; $r = N r_1$.
When the first two sets of restrictions in either sequence hold, the following
restricted structure arises.
0
11 . . . 1N
1
1 0 . . .
Xt =
0
0
Xt 1 +
Xt 1 + ut. (11)
0
0 . . . N
N
N1 . . . NN
When it is appropriate to add a restricted trend in the cointegration space, we
replace Xt 1 by X*t 1 = (X
t 1, t)
. For N fixed, a likelihood ratio statistic for
testing (11) versus (8) can be obtained using the sum of two different
conditional likelihood ratio statistics to test the sets of restrictions {A1, A2} or
0
0
X
+
0
0
Xt 1 + ut.
Xt =
t1
0
0 . . . N*N
0
0 . . . NN N
(12)
As for cointegrating restrictions, this model may be obtained by considering
two of the next three hypotheses under (11).
C1: Serial correlation common features: there exists a (nN s) matrix such
N
that
X
t is an s dimensional white noise, with s = i = 1si.
C2: Absence of short-run Granger-causality between the individual subgroups: ur is block-diagonal, i.e. ij = 0 for i j.
C3: Separation in common features: the matrix is block-diagonal with the
(si n) matrix i being a typical block on the main diagonal, s = Ni = 1si.
C4: Homogeneity of common features: i = 1; i = 1, . . . , N; s = Ns1.
Actually the hypothesis C2 is implicit when one stacks VECMs. Restriction C3
is developed in Hecq, Palm & Urbain (1999) for the SCCF as well as for the
weak form structure. Here again a likelihood ratio for testing model (12) versus
(11) can be obtained as the sum of two conditional likelihood ratio statistics to
test either {C1, C2} or {C2, C3}. This means that we can first test for common
cyclical features under the maintained hypothesis of short-run Granger-non-causality
C2. Alternatively, we can first test for absence of short-run causality
and then test for SCCF since both sequences of restrictions imply separation in
common features. This result is derived from Proposition 3.3 in Hecq, Palm &
Urbain (1999) which states that under separation in cointegration and
block-diagonality of this long-run matrix, the presence of common features implies
that the co-feature matrix is block-diagonal.
V. GMM ESTIMATION
To test for common features in a time series context, we have the choice
between GMM estimators applied to a regression framework and a canonical
correlation procedure based on maximum likelihood (ML) estimation. Both
methods have their advantages and drawbacks. The ML estimation is fully
efficient and likelihood ratio tests are asymptotically most powerful. GMM
estimators can be more easily implemented but they are in general not fully
efficient. In this section we present a GMM estimator that will be used in our
empirical analysis of a bivariate system for consumption and income for the
case where at most one serial correlation common feature vector exists.
For each individual, let us split Xi, t = (yi, t, zi, t)
and let the bivariate DGP be
i zi, t + i, t
yi, t = i + *
(13)
pi 1
k=1
pi 1
yi, t 1 +
(i)
1,k
(i)
2,kzi, t 1 + i, t,
(14)
k=1
where the second equation for zi, t is just one row of the VECM (11), with
normalized cointegrating vector
i = [1, *i ]. Both the ys and the zs are
autocorrelated as the disturbances i, t depend on lagged values of yi, t, zi, t
and on the error correction mechanism. Under the null of serial correlation
common features for individual i, i, t is a white noise process and the
i ].
i = [1, *
normalized SCCF vector is given by
In practice (Vahid & Engle, 1993, 1997), after the cointegration analysis in
the first step, the GMM procedure proceeds as follows. Regress the explanatory
variables $\Delta z_t$ on the whole set of instruments (i.e. lags of $\Delta X_t$ and cointegrating
vectors) in order to obtain the best linear prediction $\Delta\hat{z}_t$. Then regress $\Delta y_t$ on a
constant term and $\Delta\hat{z}_t$. This estimate gives the potential serial correlation
common feature vector for individual i. Finally, one tests for the validity of the
overidentifying restrictions using Hansen's (1982) $\chi^2$ test.
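The steps just described can be sketched in code. What follows is a minimal, hedged illustration and not the authors' implementation: it uses a hypothetical bivariate DGP without an error correction term, one lag of each variable plus a constant as instruments, and the homoskedastic (Sargan-type) form of the overidentification statistic, T·u'P_W u / u'u, rather than a heteroskedasticity-robust Hansen statistic. All function names and parameter values are assumptions made for the example.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def ols_fit(X, y):
    """OLS of y on the columns of X (rows are observations): coefficients, fitted values."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    b = solve(XtX, Xty)
    fitted = [sum(bi * xi for bi, xi in zip(b, r)) for r in X]
    return b, fitted


def tsls_with_j(y, z, W):
    """2SLS of y on (1, z) with instrument rows W, plus a Sargan-type J statistic."""
    _, z_hat = ols_fit(W, z)                                # step 1: predict z from instruments
    (c, beta), _ = ols_fit([[1.0, zh] for zh in z_hat], y)  # step 2: y on constant and z_hat
    u = [yi - c - beta * zi for yi, zi in zip(y, z)]        # residuals use the actual z
    _, u_hat = ols_fit(W, u)                                # project residuals on instruments
    j_stat = len(y) * sum(v * v for v in u_hat) / sum(v * v for v in u)
    df = len(W[0]) - 2                                      # instruments minus parameters
    return beta, j_stat, df


def simulate(T=3000, lam=0.5, phi=0.8, seed=2):
    """Hypothetical DGP: dy and dz share the common factor phi * dz_{t-1}."""
    rng = random.Random(seed)
    dy, dz = [0.0], [0.0]
    for _ in range(T):
        common = phi * dz[-1]
        dz_new = common + rng.gauss(0.0, 1.0)
        dy_new = lam * common + rng.gauss(0.0, 1.0)
        dz.append(dz_new)
        dy.append(dy_new)
    return dy, dz


dy, dz = simulate()
# Instruments: constant and one lag of each variable (an assumed, minimal choice).
W = [[1.0, dy[t - 1], dz[t - 1]] for t in range(1, len(dy))]
beta, j_stat, df = tsls_with_j(dy[1:], dz[1:], W)
print("estimated cofeature coefficient:", beta, " J:", j_stat, " df:", df)
```

Under the null of a common feature the J statistic is compared with a chi-square distribution with df degrees of freedom; in this example there are three instruments and two estimated parameters, so df = 1.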
i
i=1
(2)1/2
~ N(0, 1)
where =
i
(15)
k=1
2
0
0
0 0
N
0 0
s nN
X1t
X2t
XNt
nN 1
1 0 0
2
0
0
=
0 0
N
0 0
s nN
u1t
u2t
uNt
(16)
nN 1
with s =
si and ut = (u
1t, u
2t, . . . , u
Nt)
being IIN(0, ).
i=1
1)ut.
(IN
(17)
i = [Is , *
i
]. Under this normalization, the
loss of generality) as follows
i
system (16) can be expressed as
y1t
y2t
yNt
1
0 0
*
2
0 *
0
0
0
0
0 *
z1t
z2t
zNt
s (nN s)
s 1
t
+
u
(18)
(nN s) 1
or more compactly
(19)
yt = B
zt + vt
observations, we get
Y = Z
Ts
+V
T (nN s)(nN s) s
(20)
Ts
or in vectorized form
y* =
Ts 1
Z*
+ v*
(21)
Ts 1
with y* = vec(Y), v* = vec(V), Z* = diag(Isi Zi) with Zi = [zitl], of
dimension T (n si), with t = 1, . . . , T, l = 1, . . . , n si; and being a vector
i ). Under the homogeneity
with typical i-th subvector being equal to vec( *
(22)
(25)
where the Z* are the projections of Z* on W, can be obtained as a GMM
estimator by selecting S = ITs in (23) and taking W(W
W) 1W
as instrument.
= W(W
* 1W) 1
Similarly, the GLS estimator regressing y* on Z*
1
W
* Z*, with * being the disturbance covariance matrix of the
(multivariate) regression of Z* on W, can be obtained from (23) by taking
S = and using as instruments W(W
* 1W) 1W
* 1 instead of W.
In the empirical analysis in Section VII, we consider a fixed effects model
because in the macroeconomic application, we study the population and not a
sample. Adding fixed effects to the model (21) for the case which we analyze,
e.g. for si = 1, i = 1, . . . , N and n = 2, yields
y = Z [ + Z] + Z*r r + v*,
(26)
(28)
A test for the validity the overidentifying restrictions is obtained using (25) and
is readily seen to be a test for the null hypothesis of C4, e.g. for the null of
homogeneity of common features: i = 1; i = 1, . . . , N, with s = Ns1, si = s1 = 1,
i = 1, . . . , N. In this specific case, the number of degrees of freedom for the
[n(pi 1) + ri (n 1)] +
i=1
yi, t
1
0.25
=
+
(1
zi, t
2
0.5
0.5
(0.6
1
1)
y1, t 1
z1, t 1
y1, t 1
i1, t
+
,
z1, t 1
i2, t
0.3)
1 0.8
0.8 1
ij =
0.7
0.6
0.6
.
0.75
Fig. 1.
Fig. 2.
this additional heteroscedasticity. From this DGP we see that under the
assumption of reduced rank the short run dynamic matrices (for each i) are
simply given by
0.30
0.60
0.15
, while under the alternative we chose to
0.30
0.30
0.60
0.00
.
0.30
We consider three sample sizes, i.e. T = 10, 25 and 50, and five cases for the
number of individuals, i.e. N = 1, 2, 5, 10 and 25. We report the median and the
spread (interquartile range) of the bias of the GMM panel estimator. We also
report the median of the standard deviation of r, GMM. We report the empirical
size (nominal being 5%) as well as the empirical size-adjusted power for overidentifying restrictions test statistics. df denotes the number of degrees of
freedom. Due to the huge computational time required for these simulations,
5,000 replications were used for N = 1, 2, 5; 2,000 for N = 10 and 1,000 for
N = 25.
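The two bias summaries reported in the simulations, the median and the interquartile range across replications, can be computed as in the sketch below. This is an added illustration: the function name bias_summary is hypothetical, and the quartile convention is the default of Python's statistics.quantiles, which may differ slightly from the convention used for the tables.

```python
import statistics


def bias_summary(estimates, true_value):
    """Median bias and interquartile range (Q75 - Q25) of the bias across replications."""
    biases = sorted(e - true_value for e in estimates)
    q1, _, q3 = statistics.quantiles(biases, n=4)  # quartile cut points
    return statistics.median(biases), q3 - q1


if __name__ == "__main__":
    # Toy replication results around a true value of 0.5 (illustrative numbers only).
    med, iqr = bias_summary([0.40, 0.45, 0.50, 0.55, 0.60], 0.50)
    print("median bias:", med, " spread:", iqr)
```

Reporting the median and the interquartile range rather than the mean and variance keeps the summaries well defined even when the estimator has heavy tails in small samples.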
The results are presented in Table 2. One can directly observe that the bias
is small and decreases as T and/or N increase. The accuracy of
estimates, measured both by the spread and the standard deviation of the
                  bias      bias        σ(r,GMM)
                  Median    Q75−Q25     Median     χ²(df)    size    size-adj. power
N = 1    T = 10   0.0123    0.2228      0.156       (2)      7.88         9.90
         T = 25   0.0101    0.1387      0.098       (2)      5.58        19.78
         T = 50   0.0067    0.0944      0.070       (2)      5.54        34.68
N = 2    T = 10   0.0136    0.1817      0.106       (5)      4.98         8.56
         T = 25   0.0069    0.1057      0.079       (5)      6.18        16.58
         T = 50   0.0034    0.0726      0.057       (5)      5.72        31.52
N = 5    T = 10   0.0045    0.1409      0.067      (14)      3.96         7.26
         T = 25   0.0044    0.0751      0.060      (14)      5.68        12.52
         T = 50   0.0021    0.0460      0.047      (14)      5.74        24.82
N = 10   T = 25   0.0022    0.0658      0.046      (29)      4.70        11.00
         T = 50   0.0020    0.0377      0.038      (29)      4.80        21.55
N = 25   T = 50   0.0002    0.0398      0.029      (74)      5.80        13.80
Fig. 3.
series, namely consumption and income variables for the OECD countries. The
picture also pleads in favor of having tools available to model this
information. Lower case c and y denote natural logarithms of C and Y
respectively.
Table 3 reports time series statistics for each country. The first column of
Table 3 lists in alphabetical order, the names of the countries as well as the date
of joining OECD.9 Column 2 gives the quality ranking of the data as presented
in Summers & Heston (1991). It is seen that for the most part, the quality of the
data is reasonable. Columns 3 and 4 give the value of the Augmented Dickey-Fuller
unit root test for respectively consumption and income. All tests are
based on both a constant term and a trend. The number of lags necessary to
whiten the residuals is given in parentheses. Columns 5 and 6 give respectively
the value of the Engle-Granger Augmented Dickey-Fuller cointegrating test for
each country separately and the long-run elasticity of consumption as a
dependent variable. Column 7 gives the order of the VAR(p_i) in levels, where p_i
is determined using the multivariate Hannan-Quinn criterion (HQC). These lags, as
well as the presence of an error correcting mechanism term, will determine the
instruments to be used in common features test statistics.
Table 3.

Country (OECD entry)   Qual.   ADF c_t    ADF y_t    EG         LR elast.   HQC
Australia (1971)       A       1.21(4)    0.93(2)    1.46(1)    0.95        3
Austria (1961)         A       0.82(0)    1.25(2)    3.59(0)*   1.00        1
Belgium (1961)         A       1.43(1)    0.74(1)    2.36(0)    0.94        1
Canada (1961)          A       1.50(1)    1.80(1)    3.89(1)*   1.00        1
Denmark (1961)         A       0.94(0)    0.94(0)    3.69(0)*   0.82        1
Finland (1969)         A       2.48(1)    0.20(2)    1.69(3)    0.98        4
France (1961)          A       0.11(2)    0.04(1)    1.96(0)    0.98        2
Germany (1961)         A       2.18(2)    3.10(2)    1.69(2)    1.07        2
Greece (1961)          A       0.58(0)    0.01(0)    0.79(0)    0.97        1
Iceland (1961)         B+      2.64(1)    2.23(1)    4.52(0)*   1.04        1
Ireland (1961)         A       2.54(1)    2.82(1)    3.76(2)*   0.81        1
Italy (1961)           A       0.61(1)    0.77(1)    1.86(1)    1.09        4
Japan (1964)           A       0.91(0)    0.48(1)    4.75(1)*   0.92        4
Luxembourg (1961)      A       1.45(1)    3.32(4)    2.16(4)    1.34        4
Netherlands (1961)     A       0.71(2)    0.20(2)    3.07(1)    1.08        4
New Zealand (1973)     A       2.26(0)    1.52(0)    5.93(0)*   1.02        1
Norway (1961)          A       1.29(1)    1.76(1)    1.83(1)    0.80        1
Portugal (1961)        A       3.54(3)*   2.95(3)    3.07(3)    0.88        3
Spain (1961)           A       1.25(0)    1.34(0)    2.99(0)    0.94        1
Sweden (1961)          A       0.70(1)    0.30(1)    3.58(1)*   0.81        2
Switzerland (1961)     B+      0.03(4)    1.69(2)    3.28(0)    0.92        2
Turkey (1961)          C       3.26(2)    3.48(0)*   1.73(0)    0.85        1
UK (1961)              A       3.61(1)*   3.62(1)*   2.13(0)    1.04        3
USA (1961)             A       1.75(0)    2.05(0)    4.08(0)*   1.15        2

In Table 3, a * indicates that the individual unit root or cointegration test
statistic rejects the null at the 5% nominal level. It emerges that, except for
Portugal, the UK and Turkey, we cannot reject the unit root hypothesis for
consumption and income. Using the Engle-Granger cointegration test, the null
hypothesis of non-cointegration is rejected for nine countries, with a long-run
elasticity close to 1 (see Note 10). Consequently, we will use the cointegrating vectors
as instruments in six different versions: four homogeneous cases and two
heterogeneous ones. We proceed in two steps. In the first step the cointegrating
vectors are estimated. They are used as instruments in the second step to
estimate the common feature vectors. The results are reported in Table 4.
The homogeneous cases refer to a panel estimation of a common
cointegrating vector, that is parameters are assumed to be the same across
countries and the contemporaneous disturbance correlation across countries
and across variables for a given country is ignored. Absence of short-run
Granger-causality between countries is assumed throughout steps 1 and 2.
NGM
r,GMM
( r,GMM)
Test
df
p-val
*i = 1,
(i)
p* = 1
p* = 2
p* = 3
p* = p*i
0.770
0.769
0.770
3.71
6.14
4.43
0.745
0.660
0.704
0.718
0.050
0.036
0.031
0.036
148.98
173.65
211.27
156.04
65
109
153
93
< 0.001
< 0.001
0.001
< 0.001
GM = 0.979
*i = *
(i)
p* = 1
p* = 2
p* = 3
p* = p*i
0.829
0.804
0.793
5.36
6.54
4.95
0.768
0.670
0.710
0.728
0.051
0.036
0.031
0.036
146.67
176.61
214.06
156.92
65
109
153
93
< 0.001
< 0.001
< 0.001
< 0.001
OLS = 0.939
*i = *
(i)
p* = 1
p* = 2
p* = 3
p* = p*i
0.870
0.837
0.822
5.74
5.12
3.93
0.814
0.687
0.727
0.738
0.050
0.036
0.031
0.036
131.96
170.16
206.93
145.01
65
109
153
93
< 0.001
< 0.001
0.002
< 0.001
*i = LSDV = 0.968
(i)
p* = 1
p* = 2
p* = 3
p* = p*i
0.855
0.821
0.804
6.03
6.25
4.94
0.782
0.677
0.715
0.733
0.051
0.036
0.031
0.036
142.93
175.97
213.50
155.12
65
109
153
93
< 0.001
< 0.001
0.001
< 0.001
j
*i = *
(i,j with
cointegration)
p* = 1
p* = 2
p* = 3
p* = p*i
0.814
0.726
0.755
6.89
6.16
4.46
0.782
0.647
0.696
0.707
0.053
0.036
0.031
0.037
138.45
158.74
210.03
146.50
52
96
140
80
< 0.001
< 0.001
< 0.001
< 0.001
*i = 1
(i with
cointegration)
p* = 1
p* = 2
p* = 3
p* = p*i
0.865
0.784
0.775
1.59
3.89
2.72
0.810
0.682
0.734
0.750
0.056
0.037
0.033
0.040
115.25
144.00
192.33
131.56
52
96
140
80
< 0.001
0.001
0.002
< 0.001
common feature model for both test statistics. Table 5 presents the results for
the G7. The results are similar to those for the panel of 22 countries. However
in several situations we cannot reject the null of one homogeneous common
features vector. In these cases, we imposed the unlikely hypothesis of a
homogeneous cointegrating vector with a lag order uniformly fixed at p* = 3.
Finally, we note the implications for empirical modeling that
follow from a restriction linking the number of variables n to the sum of
cointegrating vectors and common feature vectors. From Vahid & Engle
(1993), Theorem 1, it follows that the common feature space and the
cointegration space are linearly independent. This means that the sum of the
number of common feature vectors (s) and of the number of cointegrating
vectors (r) must be less than or equal to the number of variables (n): r + s ≤ n.
In a panel context under the absence of long and short-run Granger causality,
this has obvious but different implications depending on whether common
features vectors and cointegrating vectors are homogeneous or heterogeneous.
Table 5.
NGM
r,GMM
( r,GMM)
Test
df
p-val
LSDV
*i = 1 = *
(i)
p* = 1
p* = 2
p* = 3
p* = p*i
0.866
0.763
0.755
2.47
2.37
1.81
1.042
0.856
0.872
0.884
0.087
0.060
0.052
0.058
32.83
53.70
67.05
50.84
20
34
48
30
0.035
0.017
0.036
0.010
GM = 1.035
*i = *
(i)
p* = 1
p* = 2
p* = 3
p* = p*i
0.893
0.777
0.766
1.64
1.815
1.49
1.021
0.857
0.878
0.892
0.082
0.060
0.052
0.057
31.51
50.25
62.75
46.22
20
34
48
30
0.048
0.036
0.075*
0.029
OLS = 1.023
*i = *
(i)
p* = 1
p* = 2
p* = 3
p* = p*i
0.882
0.771
0.762
1.75
1.89
1.51
1.036
0.856
0.876
0.890
0.084
0.060
0.052
0.057
32.06
51.11
63.87
47.84
20
34
48
30
0.043
0.030
0.062*
0.021
j
*i = *
i,j with
size cointegration)
p* = 1
p* = 2
p* = 3
p* = p*i
0.818
0.710
0.737
6.02
3.58
2.13
0.894
0.723
0.787
0.800
0.074
0.053
0.047
0.051
49.07
52.66
64.46
46.61
16
30
44
26
*i = *j = 1
(i,j with
size cointegration)
p* = 1
p* = 2
p* = 3
p* = p*i
0.875
0.753
0.764
2.68
2.60
1.66
1.029
0.859
0.894
0.917
0.089
0.062
0.053
0.061
27.69
47.49
60.14
43.97
16
30
44
26
< 0.001
0.006
0.024
0.008
0.034
0.022
0.053*
0.015
VIII. CONCLUSION
In this chapter we extended the serial correlation common feature analysis to
nonstationary panel data models. Concentrating on the fixed effect model,
we defined homogeneous panel common feature models and gave a series of
steps for implementing these tests. We then applied this framework to
the liquidity constraints model for 22 OECD and G7 countries. At
a 5% nominal level, we reject the presence of a panel common feature vector.
From the empirical analysis we can draw several (tentative) conclusions:
First, in a country by country analysis, for slightly less than
50% of the countries in the sample, there is evidence of cointegration between
consumption and income. The cointegrating vector appears to be homogeneous
across these countries with a long-run consumption elasticity close to one.
Second, for the sample of 22 countries, the existence of one homogeneous
SF (SCCF) common feature vector is rejected in most instances when using the
test proposed in (15). For the sample of G7 countries, in several instances, the
occurrence of a homogeneous SF common feature vector is not rejected. Notice
that this restriction is obviously less restrictive when it only applies to seven
countries. However the p-values are quite low and the non rejection of the null
hypothesis occurs when the model might be misspecified in particular because
we have maintained a homogeneous lag length of 3.
Third, the overidentifying restrictions implied by the assumption of a
homogeneous common feature vector are rejected in all instances in the sample
of 22 countries. For the G7 countries, again there is occasionally evidence in
favor of the overidentifying restrictions.
Again, it is not surprising to see that the assumption of homogeneous
common features is rejected more frequently than the assumption of
homogeneous cointegration. In the long run, consumption and income are
closely linked to each other, whereas short-run deviations are generally possible and can
be realized through saving or borrowing.
Our model representation is not, stricto sensu, a dynamic panel because only
a part of the dynamics is common to all individuals. However it does part of the
job. Indeed while no size distortions have been noticed in our Monte Carlo
results, we can increase the power of test statistics, by going a step further
towards dynamic panel data if the null hypothesis of panel common-cyclical
feature model is not rejected. In the opposite case, it is not worth imposing
further common restrictions if the null is rejected. This is a clue for considering
less restrictive models like heterogeneous or group homogeneous models. A
bootstrap procedure could certainly be undertaken to find the distribution. This
is also perhaps the place to choose more flexible models like the non-synchronous common cycle model (Vahid & Engle, 1997) or the weak form
common feature analysis (Hecq, Palm & Urbain, 1997).
ACKNOWLEDGMENT
Support from METEOR through the research project "Dynamic and Nonstationary Panels: Theoretical and Empirical Issues" is gratefully
acknowledged. The authors thank two anonymous referees and the co-editor for useful comments on a previous version of this paper. The usual
disclaimer applies. The GAUSS routines and the data used in
this paper are available from http://www.employ.unimaas.nl/j.urbain
NOTES
1. Note that Vahid & Engle (1997) have extended their framework to the case where
a linear combination is a MA(q) process and not a white noise. They labelled this model
non-synchronous common cycle.
2. The first step checks for the presence of cointegrating relationships and then,
given the estimated cointegration relations, the common feature analysis is carried out
in a second step. An alternative is to use a joint estimation procedure that exploits both
the cointegration and common features restrictions using a switching algorithm (Hansen
& Johansen, 1998; Hecq, 1999).
3. See Anderson and Vahid (1996) for the connection between GMM and canonical
correlation estimators.
4. Complete results are available upon request.
5. The operation is the following. Consider an N-dimensional vector with increment
four, $g = (1, 5, 9, \ldots)'$. We form an $nN \times nN$ matrix $G = gg' \otimes R$, with $R$ an $n \times n$ matrix
with all elements equal to 1. The heteroskedastic disturbance covariance matrix
is then given by the elementwise (Hadamard) product of $G$ with the covariance
matrix given in (10).
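The construction in Note 5 can be sketched numerically. The following is a minimal illustration, assuming NumPy; the function name `hetero_covariance` and the identity matrix used as the base covariance are hypothetical stand-ins, since the covariance matrix of equation (10) is not reproduced here.

```python
import numpy as np


def hetero_covariance(base_cov: np.ndarray, n_units: int, n_vars: int) -> np.ndarray:
    """Scale a base covariance matrix blockwise, following the recipe of Note 5.

    base_cov is a placeholder for the covariance matrix of equation (10).
    """
    # g = (1, 5, 9, ...)': one entry per cross-section unit, increment four
    g = 1.0 + 4.0 * np.arange(n_units)
    # R: n x n matrix of ones, so every element of a unit's block gets the same scale
    r = np.ones((n_vars, n_vars))
    # G = gg' (Kronecker product) R has dimension nN x nN
    big_g = np.kron(np.outer(g, g), r)
    if base_cov.shape != big_g.shape:
        raise ValueError("base_cov must be (n_units*n_vars) x (n_units*n_vars)")
    # Elementwise (Hadamard) product
    return big_g * base_cov


# Illustration with N = 3 units, n = 2 variables and an identity base covariance
scaled = hetero_covariance(np.eye(6), n_units=3, n_vars=2)
print(np.diag(scaled))
```

With an identity base matrix the diagonal of the result is simply the squared entries of g, repeated n times per unit, which makes the heteroskedasticity pattern across units easy to see.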
6. The data may be downloaded via different internet sites such as
http://www.nber.org/pwt56.html or http://datacentre.epas.utoronto.ca:5620/pwt.
7. For computational convenience, we balanced the panel in this study and
excluded Greece and Portugal.
8. We did not consider here a slightly different model in which real government
expenditures are subtracted from output. Indeed, as pointed out by Evans & Karras (1996b),
the model should be extended to take care of the potential substitutability or
complementarity between private and public goods. Without a fine distinction between the
components of government expenditures, it might be desirable to extend the model to
take into account a third variable. It is also possible to consider a simple alternative
model where all public goods are substitutes for private ones by subtracting G from
Y.
9. Other countries have since joined the OECD: Mexico in 1994, the Czech Republic in
1995, and Korea and Poland in 1996. We drop them because the ending
year is 1992 in our data set. Also note that the OECD has its origin in the Organization for
European Economic Cooperation, which grouped European countries. This organization was charged with administering United States aid, under the Marshall Plan, to
reconstruct Europe after World War II. Consequently, for countries that did not
participate at the beginning in this project, homogeneity of cointegration and/or
common features might be rejected for that reason.
10. As noted in Section 4, the main part of the approach presented in this paper also
applies to non-cointegrated systems.
REFERENCES
Ahn, S. K. (1997). Inference of Vector Autoregressive Models with Cointegration and Scalar
Components. Journal of the American Statistical Association, 92, 350–356.
Anderson, H., & Vahid, F. (1996). Testing Multiple Equation Systems for Common Nonlinear
Components. Working paper, Department of Economics, Texas A&M University.
Banerjee, A. (Ed.) (1999). Testing for Unit Roots and Cointegration Using Panel Data: Theory and
Applications. Oxford Bulletin of Economics and Statistics, 61, 607–629.
Baltagi, B. (1995). Econometric Analysis of Panel Data. New York: John Wiley.
Beine, M., & Hecq, A. (1997). Asymmetric Shocks Inside Future EMU. Journal of Economic
Integration, 12, 131–140.
Beine, M., & Hecq, A. (1998). Codependence and Convergence, an Application to the EC
Economies. Journal of Policy Modeling, 20, 403–426.
Beine, M., & Hecq, A. (1999). Inference in Codependence: Some Monte Carlo Results and
Applications. Annales d'Économie et de Statistique, 54, 69–90.
Campbell, J. Y., & Mankiw, N. G. (1990). Permanent Income, Current Income, and Consumption. Journal of Business and Economic Statistics, 8, 265–279.
Campbell, J. Y., & Mankiw, N. G. (1991). The Response of Consumption to Income: A Cross-Country Investigation. European Economic Review, 35, 723–767.
Candelon, B., & Hecq, A. (2000). Stability of the Unemployment-Activity Relationship in a
Codependent System. Applied Economics Letters, forthcoming.
Engle, R. F., & Kozicki, S. (1993). Testing for Common Features (with comments). Journal of
Business and Economic Statistics, 11, 369–395.
Engle, R. F., & Watson, M. W. (1981). A One-Factor Multivariate Time Series Model of
Metropolitan Wages. Journal of the American Statistical Association, 76, 545–565.
Evans, P., & Karras, G. (1996a). Convergence Revisited. Journal of Monetary Economics, 37,
249–265.
Evans, P., & Karras, G. (1996b). Private and Government Consumption With Liquidity
Constraints. Journal of International Money and Finance, 2, 255–266.
Geweke, J. (1977). The Dynamic Factor Analysis of Economic Time Series. In: D. J. Aigner & A.
S. Goldberger (Eds), Latent Variables in Socio-Economic Models. Amsterdam: North-Holland.
Gouriéroux, C., & Peaucelle, I. (1993). Séries codépendantes: application à l'hypothèse de parité
du pouvoir d'achat. In: Macroéconomie, Développements Récents. Economica: Paris.
Granger, C. W. J., & Lin, J. L. (1995). Causality in the Long Run. Econometric Theory, 11,
530–536.
Granger, C. W. J., & Haldrup, N. (1997). Separation in Cointegrated Systems and P-T
Decompositions. Oxford Bulletin of Economics and Statistics, 59, 449–463.
Greene, W. H. (1993). Econometric Analysis. New York: MacMillan.
Groen, J. J., & Kleibergen, F. (1999). Likelihood-Based Cointegration Analysis in Panels of Vector
Error Correction Models. Discussion Paper TI 99055/4, Tinbergen Institute, Erasmus
University Rotterdam.
Hamilton, J. D. (1994). Time Series Analysis. Princeton: Princeton University Press.
Hansen, L. P. (1982). Large Sample Properties of Generalized Method of Moment Estimators.
Econometrica, 50, 1029–1054.
Hansen, P. R., & Johansen, S. (1998). Workbook on Cointegration. Oxford: Oxford University
Press.
Hecq, A. (1999). On the Usefulness of Considering Common Serial Features and Cointegrating
Restrictions. Working paper, University of Maastricht RM/99/017.
Hecq, A., Palm, F. C., & Urbain, J. P. (1997). Testing for Common Cycles in VAR Models with
Cointegration. Working paper, University of Maastricht RM/97/031 (revised 1998).
Hecq, A., Palm, F. C., & Urbain, J. P. (1999). Separation and Weak Exogeneity in Cointegrated
VAR Models with Common Features. mimeo, University of Maastricht.
Hecq, A., Palm, F. C., & Urbain, J. P. (2000). Permanent-Transitory Decomposition in VAR
Models with Cointegration and Common Cycles. Oxford Bulletin of Economics and
Statistics, forthcoming.
Hoogstrate, A. J. (1998). Dynamic Panel Data Models: Theory and Macroeconomic Applications.
Ph.D. Thesis, University of Maastricht.
Hsiao, C. (1986). Analysis of Panel Data. Cambridge: Cambridge University Press.
Im, K. S., Pesaran, M. H., & Shin, Y. (1997). Testing for Unit Roots in Heterogeneous Panels.
mimeo, University of Cambridge.
Issler, J. V., & Vahid, F. (1996). Common Cycles in Macroeconomic Aggregates. mimeo.
Jobert, T. (1995). Tendances et cycles communs à la consommation et au revenu: Implications pour
le modèle de revenu permanent. Economie et Prévision, 121, 19–38.
Johansen, S. (1995). Likelihood-Based Inference in Cointegrated Vector Autoregressive Models.
Oxford: Oxford University Press.
Kugler, P., & Neusser, K. (1993). International Real Interest Rate Equalization: A Multivariate
Time-Series Approach. Journal of Applied Econometrics, 8, 163–174.
Kunst, R., & Neusser, K. (1990). Cointegration in Macroeconomic System. Journal of Applied
Econometrics, 5, 351–365.
Kao, C. (1999). Spurious Regression and Residual-Based Tests for Cointegration in Panel Data.
Journal of Econometrics, 90, 1–44.
Konishi, T., & Granger, C. W. J. (1993). Separation in Cointegrated Systems. Working paper,
Department of Economics, University of California-San Diego.
Levin, A., & Lin, C. F. (1993). Unit Root Tests in Panel Data: Asymptotic and Finite Sample
Properties. Working paper, Department of Economics, University of California-San Diego.
Larsson, R., & Lyhagen, J. (1999). Likelihood-Based Inference in Multivariate Panel Cointegration Models. Working paper 331, Stockholm School of Economics, SSE.
Lumsdaine, R. L., & Prasad, E. (1997). Identifying the Common Components in International
Economic Fluctuations. NBER Working paper 5984.
Lütkepohl, H. (1991). Introduction to Multiple Time Series Models. Berlin: Springer Verlag.
McCoskey, S., & Kao, C. (1998a). A Residual-Based Test of the Null of Cointegration in Panel
Data. Econometric Reviews, 17, 57–84.
McCoskey, S., & Kao, C. (1998b). A Monte Carlo Comparison of Tests for Cointegration in Panel
Data. mimeo.
O'Connell, P. (1998). The Overvaluation of Purchasing Power Parity. Journal of International
Economics, 44, 1–19.
Pedroni, P. (1997a). Fully Modified OLS for Heterogeneous Cointegrated Panels and the Case of
Purchasing Power Parity. Working paper, Department of Economics, Indiana University.
Pedroni, P. (1997b). Cross Sectional Dependence in Cointegration Tests of Purchasing Power
Parity. Working paper, Department of Economics, Indiana University.
Pesaran, M. H., Shin, Y., & Smith, R. P. (1997). Pooled Estimation of Long-Run Relationships in
Dynamic Heterogeneous Panels. Working paper, Department of Economics, University of
Cambridge.
Pesaran, M. H., & Smith, R. P. (1995). Estimating Long-Run Relationships From Dynamic
Heterogeneous Panels. Journal of Econometrics, 68, 79–113.
Phillips, P. C. B., & Moon, H. (1999a). Linear Regression Limit Theory for Nonstationary Panel
Data. Econometrica, 67, 1057–1111.
Phillips, P. C. B., & Moon, H. (1999b). Nonstationary Panel Data Analysis: An Overview of Some
Recent Developments. Econometric Reviews, forthcoming.
Singleton, K. (1980). A Latent Time Series Model of the Cyclical Behavior of Interest Rates.
International Economic Review, 21, 559–575.
Summers, R., & Heston, A. (1991). The Penn World Table (Mark 5): An Expanded Set of
International Comparisons, 1950–1988. Quarterly Journal of Economics, 106, 327–368.
Tiao, G. C., & Tsay, R. S. (1989). Model Specification in Multivariate Time Series. Journal of the
Royal Statistical Society (Series B), 51, 157–213.
Vahid, F., & Engle, R. F. (1993). Common Trends and Common Cycles. Journal of Applied
Econometrics, 8, 341–360.
Vahid, F., & Engle, R. F. (1997). Codependent Cycles. Journal of Econometrics, 80, 199–221.
Vahid, F., & Issler, J. V. (1999). The Importance of Common-Cyclical Features in VAR Analysis:
A Monte-Carlo Study. Presented at ESEM99 in Madrid, Spain.
I. INTRODUCTION
In a panel data set, a variable $y_{it}$ is observed for cross section units $i = 1, \ldots, N$
in $t = 1, \ldots, T$ time periods. A well known problem with such data is
Nonstationary Panels, Panel Cointegration and Dynamic Panels, Volume 15, pages 161–177.
Copyright © 2000 by Elsevier Science Inc.
All rights of reproduction in any form reserved.
ISBN: 0-7623-0688-2
JÖRG BREITUNG
(1)
where the error term $\varepsilon_{it}$ is assumed to be uncorrelated across i and t. In this
model individual heterogeneity is represented by the individual specific
parameters $\alpha_i$, $\beta_i$ and $\sigma_i^2 = E(\varepsilon_{it}^2)$. If there are no further assumptions on the
parameters, then the data for each cross section unit can be analyzed separately
by running N different regressions. In this case, we gain nothing from
pooling the data and, thus, inference may be very inefficient. The other extreme
is to ignore possible heterogeneity altogether and estimate a pooled
regression with $\alpha_1 = \cdots = \alpha_N$, $\beta_1 = \cdots = \beta_N$ and $\sigma_1^2 = \cdots = \sigma_N^2$. Of course,
ignoring heterogeneity in the data may result in biased estimates (e.g. Baltagi
(1995), p. 3f).
Traditional panel data analysis adopts a compromise between these two
extremes and assumes that individual heterogeneity can be represented by an
individual specific intercept $\alpha_i$ alone. Furthermore, one often encounters
additional assumptions on the individual effect $\alpha_i$, for example, that it is
random and uncorrelated with the regressors. The latter model is known as
the random-effects model.
It is not surprising that early work on tests for unit roots in panel data starts
from the Dickey-Fuller type regression with individual specific intercept (e.g.
Breitung (1992)). Levin & Lin (henceforth: LL) (1993) and Im, Pesaran & Shin
(henceforth: IPS) (1997) consider more general models by allowing for
individual specific short run dynamics and time trends.
It is well known that the usual dummy variable estimator (or within-group
estimator) of dynamic models suffers from the so-called Nickell bias (Nickell
1981). The same is true if individual specific time trends are estimated by using
the dummy-variable approach. LL (1993) construct a bias adjusted t-statistic to
test the null hypothesis of a unit root process. Unfortunately, bias adjusted test
statistics for the model with a constant or a time trend suffer from a severe loss
of power. For example, the power of the LL (1993) test without an intercept
(and thus without the need to correct for the Nickell bias) against a stationary
alternative with an autoregressive coefficient of 0.9 is virtually unity for N = 25
and T = 25. For the bias adjusted test statistic in the model with individual
specific intercept (trend), the power against the same alternative drops to 0.45
(0.25). Furthermore IPS (1997) observe a serious size bias if the bias adjusted
LL statistic is augmented with lagged differences.
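The Nickell bias mentioned above can be made visible with a small simulation. The sketch below is only an illustration under assumed settings (a stationary AR(1) panel with coefficient 0.5, N = 500, T = 10, standard normal errors and effects); the function name `within_group_ar1` is hypothetical and none of these design choices come from the chapter.

```python
import numpy as np


def within_group_ar1(rho: float, n_units: int, n_periods: int,
                     burn_in: int = 50, seed: int = 0) -> float:
    """Within-group estimate of rho in y_it = mu_i + rho * y_{i,t-1} + eps_it."""
    rng = np.random.default_rng(seed)
    mu = rng.normal(size=n_units)                 # individual specific intercepts
    total = burn_in + n_periods + 1
    y = np.zeros((n_units, total))
    for t in range(1, total):
        y[:, t] = mu + rho * y[:, t - 1] + rng.normal(size=n_units)
    y = y[:, burn_in:]                            # drop the burn-in periods
    y_lag, y_cur = y[:, :-1], y[:, 1:]
    # Within transformation: subtract each unit's own time mean
    x = y_lag - y_lag.mean(axis=1, keepdims=True)
    z = y_cur - y_cur.mean(axis=1, keepdims=True)
    return float((x * z).sum() / (x * x).sum())


estimate = within_group_ar1(rho=0.5, n_units=500, n_periods=10)
print(f"true rho = 0.5, within-group estimate = {estimate:.3f}")
```

Even with many cross-section units the estimate stays well below the true coefficient when T is small, which is the bias that a bias adjusted statistic tries to correct.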
(2)
t = 1, 2, . . . , T ,
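$$x_{it} = \sum_{k=1}^{p+1} \alpha_{ik}\, x_{i,t-k} + \varepsilon_{it} \tag{3}$$
and $x_{is} = 0$ for $s \le 0$. It is assumed that $\varepsilon_{it}$ is white noise with $E(\varepsilon_{it}^2) = \sigma_i^2$ and
$E|\varepsilon_{it}|^{2+\delta} < \infty$ for all i, t and some $\delta > 0$. Furthermore, $\varepsilon_{it}$ is assumed to be
independent of $\varepsilon_{js}$ for $i \ne j$ and all t and s.
The null hypothesis is that the process is difference stationary, i.e.
$$H_0: \rho_i \equiv \sum_{k=1}^{p+1} \alpha_{ik} - 1 = 0 \quad \text{for all } i. \tag{4}$$
Under the alternative we assume that $y_{it}$ is (trend) stationary, that is,
$\rho_i < 0$ for all i.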
The assumptions concerning it ensure that there exists a functional central
limit theorem such that
[rT]
T 1/2
it iWi(r) ,
t=1
T
2
i
1
it (e.g.
t=1
Phillips & Solo (1992)). The parameter $\bar\sigma_i^2$ is sometimes called the long-run
variance, since it is computed as $2\pi$ times the spectral density at frequency
zero.
LL (1993) suggest a test procedure against the alternative $\rho_1 = \cdots = \rho_N < 0$.
Let $e_{it}$ ($v_{i,t-1}$) denote the residuals from a regression of $\Delta y_{it}$ ($y_{i,t-1}$) on
$1, t, \Delta y_{i,t-1}, \ldots, \Delta y_{i,t-p}$. Furthermore, let $\tilde e_{it} = e_{it}/\sigma_i$ and $\tilde v_{it} = v_{it}/\sigma_i$, where in
practice $\sigma_i^2$ is estimated using the residuals $e_{it}$. The LL test is based on the bias
adjusted t-statistic for $\rho = 0$ in the regression:
$$\tilde e_{it} = \rho\, \tilde v_{i,t-1} + \tilde\varepsilon_{it} .$$
LL (1993) show that under the null hypothesis, the ordinary t-statistic tends to
minus infinity if a constant or a time trend is included in the model. Therefore,
they suggest a bias adjusted test statistic given by
N
i=1
t=1
LL =
(5)
bT
i=1
v 2i, t 1
t=1
b2 =
V dV
(6)
var[ VdV]
E V2
(7)
a = E
and $V \equiv V(r)$ is a detrended Brownian motion. LL (1993) suggest using a nonparametric estimator for the long-run variance $\bar\sigma_i^2$ based on the first differences of the data (see Note 1).
IPS (1997) relax the assumption of a common parameter $\rho$ under the
alternative. Accordingly, model (2) is estimated for each cross section unit
separately, yielding an individual specific Dickey-Fuller t-statistic $\tau_i$. The IPS
statistic is given by:
$$\mathrm{IPS} = N^{-1/2} \sum_{i=1}^{N} [\tau_i - m_T]/\sigma_T ,$$
where $\tau_i$ is the usual augmented Dickey-Fuller t-statistic for cross section unit
i, and $m_T$, $\sigma_T^2$ are small sample analogs of
$$m = E\left[\frac{\int V\,dV}{\sqrt{\int V^2}}\right] \tag{8}$$
$$\sigma^2 = \operatorname{var}\left[\frac{\int V\,dV}{\sqrt{\int V^2}}\right] \tag{9}$$
IPS (1997) provide tables for various values of T and the lag order p. As for the
LL test, these tables assume that the panels are balanced, that is, all cross
section units have the same number of time periods T.
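The standardization behind the IPS statistic can be sketched as follows. This is a minimal illustration, not the IPS implementation: the function name `ips_statistic` is hypothetical, the individual t-statistics are assumed to be computed elsewhere, and the numerical values of the mean and variance used in the example are placeholders rather than tabulated values from IPS (1997).

```python
import math
from typing import Sequence


def ips_statistic(t_stats: Sequence[float], m_t: float, sigma2_t: float) -> float:
    """Standardized average of individual Dickey-Fuller t-statistics.

    t_stats  : one (augmented) Dickey-Fuller t-statistic per cross-section unit
    m_t      : small sample mean of the t-statistic (to be taken from tables)
    sigma2_t : small sample variance of the t-statistic (to be taken from tables)
    """
    n_units = len(t_stats)
    if n_units == 0:
        raise ValueError("at least one t-statistic is required")
    if sigma2_t <= 0:
        raise ValueError("sigma2_t must be positive")
    sigma_t = math.sqrt(sigma2_t)
    # N^{-1/2} * sum_i (tau_i - m_T) / sigma_T
    return sum((tau - m_t) / sigma_t for tau in t_stats) / math.sqrt(n_units)


# Placeholder inputs for illustration only
example_t_stats = [-2.9, -2.1, -3.4, -1.8]
example_value = ips_statistic(example_t_stats, m_t=-2.0, sigma2_t=0.6)
print(example_value)
```

Because the statistic is asymptotically standard normal under the null, large negative values would be compared with a left-tail normal critical value.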
(10)
where
$$x_{it} = \left(1 - \frac{c}{T\sqrt{N}}\right) x_{i,t-1} + \varepsilon_{it}, \qquad c > 0. \tag{11}$$
0c< .
This is an important difference to the asymptotic theory in the usual time series
context, where under the local alternative the limiting process is an Ornstein-Uhlenbeck process (cf. Phillips (1987)).
The probability limits of the tests depend on the parameters i and i. First,
we consider the theoretical value of 2i under the local alternative.
E N T
*BA(c) = lim N
N, T
N
1
1
e itv i, t 1 N
i=1
t=1
( i /i)a
i=1
1
E N 1T 2
v 2i, t 1
i=1
t=1
Note that numerator and denominator are normalized so that both converge to
a fixed limit.
Since
e itv i, t 1 = [i 1 it c/(TN)vi, t 1]vi, t 1
the limit can be written as
N
lim N N
*BA(c) =
N, T
1
E(Ti) a
i=1
b E V 2
cE V 2
,
b
(12)
T
Ti = T
1
i 1 itv i, t 1 .
t=1
It turns out that the limit of the bias adjusted statistic depends on two different
terms on the right hand side of (12). The first term is due to the detrending
Since the result of Lemma 3 is crucial for the local power of the bias adjusted
test, the accuracy of the approximation is investigated in a Monte Carlo
experiment. First, we generate 10,000 realizations of Ti by letting T = 200,
c = 5 and repeat the experiment with various values for N (see Note 2). If Lemma 3 holds,
a regression of the sample means of Ti on $c/\sqrt{N}$ and a constant should yield
an estimate for the intercept close to 0.5 and a slope of roughly 1/15 = 0.067.
Using $N \in \{30, 35, 40, \ldots, 500\}$ the following regression function was
obtained for the 71 realizations:
E(Ti) ≈ 0.495 + 0.0629 $c/\sqrt{N}$ ,
(0.00060)
(0.0016)
where standard errors are given in parentheses. The estimated slope coefficient
is only slightly smaller than 0.067 and, therefore, the approximation in Lemma
3 seems to perform fairly well in finite samples.
Now we present the limiting distribution of the bias adjusted test statistic.
Theorem 2: Consider a sequence of local alternatives given in (10)–(11). If the
estimator for $\bar\sigma_i$ converges weakly to $\sigma_i$, the bias adjusted test statistic is
asymptotically distributed as $N(0, 1)$.
It turns out that the bias adjusted test can fail to have power against the
sequence of local alternatives. This finding suggests that the power may be
improved by a modification that avoids the bias correction altogether. Such a
modified test procedure is suggested in Section IV.
It is important to notice that the test suggested by LL (1993) employs a nonparametric estimator that converges to zero for a stationary alternative. In the
univariate time series context the unit root tests are inconsistent if the long-run
variance is estimated by using the differences of the time series (cf. Phillips &
Ouliaris (1990), Theorem 5.3). Therefore, Phillips & Perron (1988) estimate $\bar\sigma_i^2$
by using the residuals of the autoregression. In a panel data framework,
however, this approach yields a test that has no power against the sequence of
local alternatives.
Finally the local power of the IPS test is investigated. As in the case of the
bias adjusted statistic considered above, the probability limit of the test statistic
depends on two terms. The first term is due to the detrending method and
depends on
T
i 1itv i, t 1
*Ti =
t=1
v 2i, t 1
t=1
(0.0077)
(14)
This approximation can be used to compute the limiting distribution of the IPS
test given in
Theorem 3: For a sequence of local alternatives given in (10)–(11) the IPS test
is asymptotically distributed as (IPS
c , 1), where
IPS
c =
c
lim
E(*Ti)
(c/N)
E
V2
c=0
Again we find that the local power depends on two terms. Our Monte Carlo
experiment suggests that the derivative of E(*Ti) is positive so that the
detrending bias implies a substantial loss of power.
Using 10,000 Monte Carlo replications, the expression $E\sqrt{\int V^2}$ is
estimated as 0.243. Using the value $\sigma^2_{100} = 0.597$, which is taken from the
values reported in IPS (1997), we obtain for the location parameter of the IPS test:
$$c\,(0.212 - 0.243)/\sqrt{0.597} = -0.0401\,c .$$
It turns out that the asymptotic mean function has a relatively small slope of
roughly 0.04 in absolute value, compared with $1/\sqrt{2} = 0.707$ for the case
without deterministic trend (see Theorem 1).
(15)
for all i and t. Imposing further assumptions to rule out degenerate cases it is
possible to show that a t-statistic based on the transformed variables has a
standard normal limiting distribution.
Theorem 4: Let yit be white noise with E(yit) = i, E(yit i)2 = 2i > 0 and
E(yit i)4 < . Under the assumption (15) and
lim E(T 1y*i y*i ) > 0
the statistic
N
UB =
i=1
N
i=1
$$y^*_{it} = s_t\left[\Delta y_{it} - \frac{1}{T-t}\left(\Delta y_{i,t+1} + \cdots + \Delta y_{iT}\right)\right], \qquad t = 1, 2, \ldots, T-1, \tag{16}$$
where $s_t^2 = (T-t)/(T-t+1)$. This transformation is also used in Arellano &
Bover (1995), for example. An important property of this transformation is that
whenever $\Delta y_{it}$ is a white noise process with constant variance, then the same is
true for $y^*_{it}$. Obviously, if $y_{it}$ is a random walk with (individual specific) time
trend, then $y^*_{it}$ has a zero mean and is uncorrelated with $y_{i,t-1}$.
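The Helmert (forward orthogonal deviations) transformation in (16) can be written as a matrix and checked numerically. The sketch below assumes NumPy and reads (16) as acting on a series of T first differences; the function names `helmert_matrix` and `helmert_transform` are hypothetical.

```python
import numpy as np


def helmert_matrix(n_obs: int) -> np.ndarray:
    """(n_obs - 1) x n_obs matrix A of forward orthogonal deviations."""
    a = np.zeros((n_obs - 1, n_obs))
    for t in range(1, n_obs):                 # t = 1, ..., T-1 (1-based, as in (16))
        remaining = n_obs - t                 # T - t future observations
        s_t = np.sqrt(remaining / (remaining + 1.0))
        a[t - 1, t - 1] = s_t                 # weight on the current observation
        a[t - 1, t:] = -s_t / remaining       # minus the mean of the future observations
    return a


def helmert_transform(series: np.ndarray) -> np.ndarray:
    """Apply the transformation along the last axis (time)."""
    series = np.asarray(series, dtype=float)
    return series @ helmert_matrix(series.shape[-1]).T


# A constant drift is removed and the rows of A are orthonormal
A = helmert_matrix(6)
drift_only = helmert_transform(np.full(6, 0.3))
print(np.allclose(A @ A.T, np.eye(5)), np.allclose(drift_only, 0.0))
```

The two checks mirror the properties stated in the text: orthonormal rows preserve the white noise property, and a constant mean in the differences (a linear trend in the levels) is removed.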
The matrix B is chosen such that E(x*it) = 0 and E(y*it x*it) = 0. A possible
transformation with the desired properties is:
t1
yiT .
(17)
x*it = yi, t 1 yi1
T
T
1
1
t=1
variable x*it is adjusted for a time trend. It is easy to verify that in this case y*it
and x*it are uncorrelated. Furthermore, since the transformation matrix A
corresponding to the Helmert transformation (16) satisfies $AA' = I$, we conclude
from Theorem 4 that the t-statistic for $H_0: \rho^* = 0$ in the pooled regression
$$y^*_{it} = \rho^* x^*_{it} + e^*_{it}, \qquad t = 2, 3, \ldots, T-1 \tag{18}$$
has a standard normal limiting distribution.
To compute the local power function of this test statistic we need an
approximation for
T
E(*Ti) = E T
1
y*it x*it
t=1
that is accurate up to $O(N^{-1/2})$. As for the LL and IPS statistics, such an
approximation is obtained by fitting a regression function to the simulated
values of *Ti:
E(*Ti) ≈ 0.0104 − 0.0407 $c/\sqrt{N}$ .
(0.0021)
(0.0104)
(19)
Since the test statistic is constructed to have an expectation of zero under the
null hypothesis, we expect to find a constant close to zero. The estimated
constant is indeed quite small but nevertheless significant. The slope coefficient
is significantly negative so that the test seems to have a local power larger than
the size. The following theorem presents further details on the local power of
the UB statistic.
Theorem 5: For a sequence of local alternatives given in (10)–(11) the UB test
is asymptotically distributed as (UB
c , 1), where
UB
c = c6 lim
E(*Ti)
(c/N)
c=0
It is interesting to compare the local power of the IPS and the UB test. Since
$\sqrt{6} \times 0.0407 \approx 0.0997 > 0.0401$, the UB statistic has a location parameter which is more
than twice as large in absolute value as that of the IPS statistic. Again,
however, we emphasize that this comparison is inappropriate, because the IPS
test is more general than the UB test as it allows for a heterogeneous
autoregressive parameter under the alternative.
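The two location parameters being compared can be recomputed from the reported estimates. The short check below rests on two assumptions: the divisor in the IPS expression is read as the square root of 0.597 (a reading that reproduces the reported 0.0401), and the UB slope is taken as negative, as the text describes. The variable names are hypothetical.

```python
import math

# IPS: (estimated derivative - estimated E sqrt(int V^2)) / sigma_100, per unit of c
ips_slope = (0.212 - 0.243) / math.sqrt(0.597)

# UB: sqrt(6) times the estimated slope of the simulated expectation, per unit of c
ub_slope = math.sqrt(6.0) * (-0.0407)

# Ratio of the absolute location parameters
ratio = abs(ub_slope) / abs(ips_slope)
print(round(ips_slope, 4), round(ub_slope, 4), round(ratio, 2))
```

Under these readings the UB location parameter is roughly two and a half times the IPS one in absolute value, consistent with the statement that it is more than twice as large.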
(20)
and yit = i + it + xit. The initial values of the process are set equal to zero. The
errors are i.i.d. with it ~ N(0, 1). Since all tests are invariant to the parameters
i and i, these parameters are set equal to zero. For the bias and variance
corrections of the LL and IPS tests the tabulated values in LL (1993) and IPS
(1997) are used. To represent a typical regional panel data set, we let T = 30
(years) and N = 20 (countries). All rejection frequencies are computed from
1000 realizations with a nominal significance level of 0.05.
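The mechanics of such a rejection-frequency experiment can be sketched as follows. This is only an illustration under stated assumptions: the data are generated as a pure AR(1) panel with zero initial values and N(0, 1) errors, and the statistic is a simple pooled Dickey-Fuller t-statistic without deterministic terms, not the LL, IPS, or UB statistic of this chapter. All function names are hypothetical.

```python
import numpy as np

def simulate_panel(n_units, n_periods, rho, rng):
    """AR(1) panel with zero initial values and N(0, 1) errors."""
    eps = rng.standard_normal((n_units, n_periods))
    y = np.zeros((n_units, n_periods + 1))   # column 0 holds the initial value
    for t in range(1, n_periods + 1):
        y[:, t] = rho * y[:, t - 1] + eps[:, t - 1]
    return y

def pooled_df_tstat(y):
    """t-statistic of phi in the pooled regression dy_it = phi * y_i,t-1 + e_it."""
    lag = y[:, :-1].ravel()
    diff = np.diff(y, axis=1).ravel()
    phi = lag @ diff / (lag @ lag)
    resid = diff - phi * lag
    sigma2 = resid @ resid / (resid.size - 1)
    return phi / np.sqrt(sigma2 / (lag @ lag))

def rejection_frequency(rho, n_units=20, n_periods=30, n_reps=1000, seed=0):
    """Share of replications in which the left-tailed 5% test rejects."""
    rng = np.random.default_rng(seed)
    critical_value = -1.645                   # standard normal 5% quantile
    stats = np.array([
        pooled_df_tstat(simulate_panel(n_units, n_periods, rho, rng))
        for _ in range(n_reps)
    ])
    return float(np.mean(stats < critical_value))

for rho in (1.00, 0.95, 0.90, 0.80):
    print(rho, rejection_frequency(rho))
```

Replacing `pooled_df_tstat` with another statistic and its critical value gives the same experiment for a different test.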
Table 1 presents the rejection frequencies for the different tests. For p > 0 the
LL test turns out to be quite conservative. This was also observed by IPS (1997)
and, therefore, the values for the mean and variance of this test should also be
tabulated for different augmentation lags. With respect to the power of the test
it turns out that for p = 0 the power of the LL and IPS tests is roughly similar.
For p > 0 the IPS test is more powerful than the LL test, at least if the critical
values of the LL test are not adjusted for different augmentation lags.
The UB statistic suggested in Section IV appears to be substantially more
powerful than the LL and IPS tests. Furthermore the size of the UB test is fairly
robust with respect to the augmentation lag. Notice that for the UB test no
tables are required for different values of p and T.
In the next Monte Carlo experiment we consider the validity of the
theoretical results for the actual power of the test. For this purpose we set
Table 1.
LL
1.00
0.95
0.90
0.80
IPS
UB
LL
p=0
0.025
0.048
0.189
0.801
1.00
0.95
0.90
0.80
0.046
0.076
0.198
0.723
0.045
0.072
0.118
0.365
UB
p=1
0.073
0.127
0.396
0.897
0.005
0.009
0.041
0.277
p=2
0.001
0.001
0.001
0.002
IPS
0.053
0.077
0.152
0.544
0.069
0.213
0.417
0.807
p=3
0.038
0.147
0.260
0.508
0.000
0.000
0.000
0.000
0.040
0.056
0.107
0.257
Note: Empirical sizes computed from 1000 Monte Carlo replications of model (20).
p denotes the number of lagged differences. The nominal size is 0.05.
0.053
0.195
0.266
0.418
the autoregressive parameter to $1 - 20/(T\sqrt{N})$. If the test does not have power against such alternatives, we
expect that the power of the test tends to the size as $N \to \infty$ and $T \to \infty$. In our
Monte Carlo comparison we also include a variant of the LL test that estimates
the long-run variances by using the regression residuals instead of the first
difference of the process. As shown in Section III such a test has a local power
equal to the size. The critical values for this test are computed by Monte Carlo
simulations. The respective test is denoted as LL*.
Table 2 presents the outcome of such a Monte Carlo experiment. As
predicted by Theorem 2, the power of the LL* test is close to the size for all N and
T. All other tests appear to converge to a limit larger than the size, where the
limiting power of the UB test is nearly twice as large as the limiting power of
the IPS test. The original LL test turns out to have power against the local
alternative but the power is substantially smaller than the power of the IPS and
UB statistics.
The findings of the Monte Carlo experiment can be compared to the results
of our theoretical analysis. From Theorem 3 it is expected that the IPS test has
Table 2. Power against local alternatives

   N     T      LL      LL*     IPS     UB
N → ∞ and T → ∞
   25    25    0.378   0.064   0.384   0.668
   50    50    0.269   0.056   0.300   0.660
   70    70    0.210   0.033   0.296   0.608
  100   100    0.170   0.050   0.261   0.579
T fixed, N → ∞
   50    25    0.235   0.038   0.342   0.575
   70    25    0.156   0.038   0.313   0.535
  100    25    0.090   0.028   0.273   0.450
N fixed, T → ∞
   25    50    0.415   0.061   0.419   0.724
   25    70    0.378   0.020   0.421   0.742
   25   100    0.298   0.028   0.402   0.783

Note: This table reports the rejection rates computed from 1000 replications of model (20) with
the autoregressive parameter equal to $1 - 20/(T\sqrt{N})$. The significance level is 0.05. The statistic LL* is constructed similarly to the
LL test but using the residuals from the autoregressions to estimate the long-run variances. For this test the values
for the expectation and variance are computed by additional Monte Carlo simulations.
V. CONCLUSION
In this chapter we have considered the local power of some well known tests
and a new test for unit roots in panel data. We found that the LL and IPS tests
suffer from a severe loss of power if individual specific trends are included.
Therefore, a class of test statistics is suggested that does not employ a bias
adjustment, and it is found that the power of this test is substantially higher than
that of the LL and the IPS tests. Furthermore, it turns out that the LL test is very
sensitive to the augmentation lag. It is therefore recommended to apply tables
for the mean and variance that take into account the lag-augmentation of the
test.
The results further indicate that the power of the tests is very sensitive to the
specification of the deterministic terms. If there is only a constant or a joint
linear trend, then subtracting the first observation yields a very powerful test.
Including individual specific trends when it is unnecessary leads to a dramatic
loss of power. Hence, in practice it is desirable to have a test for a common
deterministic trend against the alternative of individual specific time trends.
As pointed out by a referee, there are other detrending methods that may be
used to construct an improved test procedure. A natural candidate is the quasi
difference detrending suggested by Elliott, Rothenberg & Stock (1996) (see
also Phillips & Xiao (1998)). Unfortunately, it can be shown that a t-statistic
computed from quasi differenced data also suffers from a (Nickell type) bias so
that again a bias correction is required to obtain a reasonable test procedure.
Nevertheless, a test procedure based on quasi differences may perform better
than test procedures with OLS detrending. In this chapter, our strategy is to
avoid the bias term altogether. The comparison of our approach to a test
procedure based on quasi differences is left for future research.
ACKNOWLEDGMENTS
The research for this paper was carried out within the SFB 373 at the Humboldt
University Berlin and the METEOR research project "Dynamic and Nonstationary Panels: Theoretical and Empirical Issues". I thank Carsten Trenkler
and two referees for their helpful comments and suggestions.
NOTES
1. In LL (1993) the test statistic is divided by NT which is computed as the overall
standard deviation of e it. However, since e it is already adjusted for its standard deviation,
we can drop NT when computing the test statistic.
2. I repeated the experiment for different values of c and T. The results turn out to
be fairly robust.
3. Another possibility is to use alternative estimation methods like the Generalized
Method of Moments (GMM). Breitung (1997) applies second differences and obtains a
unit root test without bias adjustment by using an appropriate GMM estimator.
REFERENCES
Arellano, M., & Bover, O. (1995). Another Look at the Instrumental-Variable Estimation of Error-Components Models. Journal of Econometrics, 68, 29–51.
Baltagi, B. H. (1995). Econometric Analysis of Panel Data. Chichester: Wiley and Sons.
Breitung, J. (1992). Dynamische Modelle für die Paneldatenanalyse (Dynamic Models for the
Analysis of Panel Data). PhD dissertation, Haag + Herchen, Frankfurt.
Breitung, J. (1997). Testing for Unit Roots in Panel Data Using a GMM Approach. Statistical
Papers, 38, 253–269.
Breitung, J. (1999). The Local Power of Some Unit Root Tests for Panel Data. SFB 373 Discussion
paper, No. 69–1999, Humboldt University Berlin.
Breitung, J., & Meyer, W. (1994). Testing for Unit Roots in Panel Data: Are Wages on Different
Bargaining Levels Cointegrated? Applied Economics, 26, 353–361.
Cheung, K. S. (1995). Lag Order and Critical Values of the Augmented Dickey-Fuller Test.
Journal of Business and Economic Statistics, 13, 277–280.
Dickey, D. A., & Fuller, W. A. (1979). Distribution of the Estimates for Autoregressive Time Series
With a Unit Root. Journal of the American Statistical Association, 74, 427–431.
Elliott, G., Rothenberg, T. J., & Stock, J. H. (1996). Efficient Tests for an Autoregressive Unit Root.
Econometrica, 64, 813–836.
Hsiao, C. (1986). Analysis of Panel Data. Cambridge: Cambridge University Press.
Im, K. S., Pesaran, M. H., & Shin, Y. (1997). Testing for Unit Roots in Heterogeneous Panels. DAE
Working paper, No 9526, University of Cambridge, revised version.
Kao, C. (1999). Spurious Regression and Residual-based Tests for Cointegration in Panel Data.
Journal of Econometrics, 90, 1–44.
Levin, A., & Lin, C. F. (1993). Unit Root Tests in Panel Data: Asymptotic and Finite-Sample
Properties. Working paper, Department of Economics, University of California San
Diego.
Moon, H. R., & Phillips, P. C. B. (1999). Estimation of Autoregressive Roots Near Unity Using
Panel Data. mimeo, Yale University.
Nickell, S. (1981). Biases in Dynamic Models with Fixed Effects. Econometrica, 49, 1417–1426.
Phillips, P. C. B. (1987). Towards a Unified Asymptotic Theory of Autoregression. Biometrika, 74,
535–48.
Phillips, P. C. B., & Lee, C. C. (1996). Efficiency Gains from Quasi-Differencing Under
Nonstationarity. In: P. M. Robinson & M. Rosenblatt (Eds), Essays in Memory of E. J.
Hannan (pp. 300–314).
Phillips, P. C. B., & Moon, H. R. (1999). Linear Regression Limit Theory for Nonstationary Panel
Data. Econometrica, 67, 1057–1111.
Phillips, P. C. B., & Ouliaris, S. (1990). Asymptotic Properties of Residual Based Tests for
Cointegration. Econometrica, 58, 165–193.
Phillips, P. C. B., & Perron, P. (1988). Testing for a Unit Root in Time Series Regression.
Biometrika, 75, 335–346.
Phillips, P. C. B., & Solo, V. (1992). Asymptotics for Linear Processes. Annals of Statistics, 20,
971–1001.
Phillips, P. C. B., & Xiao, Z. (1998). A Primer on Unit Root Testing. Journal of Economic Surveys,
12, 423–467.
Quah, D. (1994). Exploiting Cross-Section Variation for Unit Root Inference in Dynamic Data.
Economics Letters, 44, 9–19.
Schmidt, P., & Phillips, P. C. B. (1992). LM Test for a Unit Root in the Presence of Deterministic
Trends. Oxford Bulletin of Economics and Statistics, 54, 257–287.
I. INTRODUCTION
Evaluating the statistical properties of data along the time dimension has
proven to be very different from analysis of the cross-section dimension. As
economists have gained access to better data with more observations across
time, understanding these properties has grown increasingly important. An area
of particular concern in time-series econometrics has been the use of nonstationary data. With the desire to study the behavior of cross-sectional data
Nonstationary Panels, Panel Cointegration and Dynamic Panels, Volume 15, pages 179–222.
Copyright 2000 by Elsevier Science Inc.
All rights of reproduction in any form reserved.
ISBN: 0-7623-0688-2
over time and the increasing use of panel data, e.g. the Summers and Heston (1991)
data, one new research area is examining the properties of non-stationary time-series data in panel form. It is an intriguing question to ask: how exactly does
this hybrid style of data combine the statistical elements of traditional cross-sectional analysis and time-series analysis? In particular, what is the correct
way to analyze non-stationarity, the spurious regression problem, and
cointegration in panel data?
Despite the immense interest in testing for unit roots and cointegration in time-series data, not much attention has been paid to testing for unit roots in panel
data. The only theoretical studies we know of in this area are Breitung & Meyer
(1994); Quah (1994); Levin & Lin (1993); Im, Pesaran & Shin (1995); and
Maddala & Wu (1999). Breitung & Meyer (1994) derived the asymptotic
normality of the Dickey-Fuller test statistic for panel data with a large cross-section dimension and a small time-series dimension. Quah (1994) studied a
unit root test for panel data that simultaneously have extensive cross-section
and time-series variation. He showed that the asymptotic distribution for the
proposed test is a mixture of the standard normal and Dickey-Fuller-Phillips
asymptotics. Levin & Lin (1993) derived the asymptotic distributions for unit
roots on panel data and showed that the power of these tests increases
dramatically as the cross-section dimension increases. Im et al. (1995) critiqued
the Levin and Lin panel unit root statistics and proposed alternatives. Maddala
& Wu (1999) provided a comparison of the tests of Im et al. (1995) and Levin
& Lin (1993). They suggested a new test based on the Fisher test.
Recently, some attention has been given to the cointegration tests and
estimation with regression models in panel data, e.g. Kao (1999), McCoskey &
Kao (1998), Pedroni (1996, 1997) and Phillips & Moon (1999). Kao (1999)
studied a spurious regression in panel data, along with asymptotic properties of
the ordinary least squares (OLS) estimator and other conventional statistics.
Kao showed that the OLS estimator is consistent for its true value, but the t-statistic diverges so that inferences about the regression coefficient, $\beta$, are
wrong with a probability that goes to one. Furthermore, Kao examined the
Dickey-Fuller (DF) and the augmented Dickey-Fuller (ADF) tests to test the
null hypothesis of no cointegration in panel data. McCoskey & Kao (1998)
proposed further tests for the null hypothesis of cointegration in panel data.
Pedroni (1997) derived asymptotic distributions for residual-based tests of
cointegration for both homogeneous and heterogeneous panels. Pedroni (1996)
proposed a fully modified estimator for heterogeneous panels. Phillips & Moon
(1999) developed both sequential limit and joint limit theories for nonstationary panel data. Pesaran & Smith (1995) are not directly concerned with
cointegration but do touch on a number of related issues, including the potential
Panel Cointegration
$$y_{it} = \alpha_i + x_{it}'\beta + u_{it}, \tag{1}$$
where $\{y_{it}\}$ are $1 \times 1$, $\beta$ is a $k \times 1$ vector of the slope parameters, $\{\alpha_i\}$ are the
intercepts, and $\{u_{it}\}$ are the stationary disturbance terms. We assume that $\{x_{it}\}$
are $k \times 1$ integrated processes of order one for all $i$, where
$$x_{it} = x_{it-1} + \varepsilon_{it}.$$
Under these specifications, (1) describes a system of cointegrated regressions,
i.e. $y_{it}$ is cointegrated with $x_{it}$. The initialization of this system is $y_{i0} = x_{i0} = O_p(1)$
as $T \to \infty$, for all $i$. The individual constant term $\alpha_i$ can be extended into general
deterministic time trends such as $\alpha_{0i} + \alpha_{1i} t + \cdots + \alpha_{pi} t^p$.
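A panel generated from model (1) can be sketched as below. This is a simplified illustration, assuming mutually independent standard normal innovations and disturbances and zero initial values for the regressors. The general setup allows the two to be correlated. The function name `simulate_cointegrated_panel` and its arguments are hypothetical.

```python
import numpy as np

def simulate_cointegrated_panel(n_units, n_periods, beta, seed=0):
    """Simulate y_it = alpha_i + x_it' beta + u_it with random-walk regressors."""
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    k = beta.size
    alpha = rng.normal(size=n_units)                  # individual intercepts
    eps = rng.normal(size=(n_units, n_periods, k))    # innovations of x_it
    u = rng.normal(size=(n_units, n_periods))         # stationary disturbances
    x = np.cumsum(eps, axis=1)                        # x_it = x_i,t-1 + eps_it, x_i0 = 0
    y = alpha[:, None] + x @ beta + u
    return x, y, alpha, u

x, y, alpha, u = simulate_cointegrated_panel(5, 50, beta=[2.0, -1.0])
print(x.shape, y.shape)
```

The arrays are indexed as (unit, period, regressor), a layout chosen here only for convenience.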
Assumption 1. The asymptotic theory employed in this paper is a sequential
limit theory established by Phillips & Moon (1999) in which $T \to \infty$ is
followed by $N \to \infty$.
Next, we characterize the innovation vector $w_{it} = (u_{it}, \varepsilon_{it}')'$. We assume that $w_{it}$ is
a linear process that satisfies the following assumption.
Assumption 2. For each $i$, we assume:
(a) $w_{it} = \Pi(L)\eta_{it} = \sum_{j=0}^{\infty} \Pi_j \eta_{it-j}$, $\sum_{j=0}^{\infty} j^a \|\Pi_j\| < \infty$, $|\Pi(1)| \neq 0$, for some $a > 1$.
(b) $\eta_{it}$ is i.i.d. with zero mean, variance matrix $\Sigma_\eta$, and finite fourth order
cumulants.
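Assumption 2 implies that (e.g. Phillips & Solo, 1992) the partial sum process
$\frac{1}{\sqrt{T}} \sum_{t=1}^{[Tr]} w_{it}$ satisfies
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{[Tr]} w_{it} \Rightarrow B_i(r) \quad \text{as } T \to \infty, \tag{2}$$
where
$$B_i = \begin{bmatrix} B_{ui} \\ B_{\varepsilon i} \end{bmatrix}.$$
The long-run covariance matrix of $\{w_{it}\}$ is
$$\Omega = \sum_{j=-\infty}^{\infty} E(w_{ij} w_{i0}') = \Pi(1)\Sigma_\eta\Pi(1)' = \Sigma + \Gamma + \Gamma' = \begin{bmatrix} \Omega_u & \Omega_{u\varepsilon} \\ \Omega_{\varepsilon u} & \Omega_\varepsilon \end{bmatrix},$$
where
$$\Gamma = \sum_{j=1}^{\infty} E(w_{ij} w_{i0}') = \begin{bmatrix} \Gamma_u & \Gamma_{u\varepsilon} \\ \Gamma_{\varepsilon u} & \Gamma_\varepsilon \end{bmatrix} \tag{3}$$
and
$$\Sigma = E(w_{i0} w_{i0}') = \begin{bmatrix} \Sigma_u & \Sigma_{u\varepsilon} \\ \Sigma_{\varepsilon u} & \Sigma_\varepsilon \end{bmatrix}. \tag{4}$$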
(5)
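where
$$B_i = \begin{bmatrix} B_{ui} \\ B_{\varepsilon i} \end{bmatrix} = \begin{bmatrix} \Omega_{u.\varepsilon}^{1/2} & \Omega_{u\varepsilon}\Omega_\varepsilon^{-1/2} \\ 0 & \Omega_\varepsilon^{1/2} \end{bmatrix} \begin{bmatrix} V_i \\ W_i \end{bmatrix}, \tag{6}$$
and $\begin{bmatrix} V_i \\ W_i \end{bmatrix} = BM(I)$ is a standardized Brownian motion. Define the one-sided
long-run covariance
$$\Delta = \Sigma + \Gamma = \sum_{j=0}^{\infty} E(w_{ij} w_{i0}')$$
with
$$\Delta = \begin{bmatrix} \Delta_u & \Delta_{u\varepsilon} \\ \Delta_{\varepsilon u} & \Delta_\varepsilon \end{bmatrix}.$$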
Here we assume that panels are homogeneous, i.e. the variances are constant
across the cross-section units. We will relax this assumption in Section 4 to
allow for different variances for different i.
Remark 1. The benefits of using panel data models have been discussed
extensively by Hsiao (1986) and Baltagi (1995), though Hsiao & Baltagi
assume the time dimension is small while the cross-section dimension is large.
However, in international trade, open macroeconomics, urban regional, public
finance, and finance, panel data usually have long time-series and cross-section dimensions. The data of Summers & Heston (1991) are a notable
example.
Remark 2. The advantage of using the sequential limit theory is that it offers
a quick and easy way to derive the asymptotics as demonstrated by Phillips &
Moon (1999). Phillips & Moon also provide detailed treatments of the
connections between the sequential limit theory and the joint limit theory.
Remark 3. If one wants to obtain a consistent estimate of $\beta$ in (1) or wants to
test some restrictions on $\beta$, then an individual time-series regression or a
multiple time-series regression is probably enough. So what are the advantages
of using the (N, T) asymptotics, e.g. sequential asymptotics in Assumption 1,
instead of T asymptotics? One of the advantages is that we can get a normal
approximation of the limit distributions of the estimators and test statistics with
the convergence rate $\sqrt{N}T$. More importantly, the biases of the estimators and
test statistics can be reduced when N and T are large. For example, later in this
paper we will show that the biases of the OLS, FMOLS, and DOLS estimators
in Table 2 were reduced by half when the sample size was changed from
(N = 1, T = 20) to (N = 20, T = 20). However, in order to obtain an asymptotic
normality using the (N, T) asymptotics we need to make some strong
assumptions; for example, in this paper we assume that the error terms are
independent across i.
Remark 4. The results in this chapter require that regressors are not
cointegrated. Assuming that I(1) regressors are not cointegrated with each
other is indeed restrictive. The authors are currently investigating this issue.
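$$\hat\beta_{OLS} = \left[\sum_{i=1}^{N}\sum_{t=1}^{T}(x_{it}-\bar x_i)(x_{it}-\bar x_i)'\right]^{-1}\left[\sum_{i=1}^{N}\sum_{t=1}^{T}(x_{it}-\bar x_i)(y_{it}-\bar y_i)\right]. \tag{7}$$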
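The estimator in (7) is the pooled regression of the within-demeaned $y_{it}$ on the within-demeaned $x_{it}$. A minimal sketch follows, assuming the data are stored as arrays of shape (N, T, k) for the regressors and (N, T) for the dependent variable. The function name `panel_ols` is hypothetical.

```python
import numpy as np

def panel_ols(x, y):
    """Within (fixed-effects) OLS estimator of beta as in (7).

    x has shape (N, T, k); y has shape (N, T).
    """
    xd = x - x.mean(axis=1, keepdims=True)        # x_it - xbar_i
    yd = y - y.mean(axis=1, keepdims=True)        # y_it - ybar_i
    sxx = np.einsum("itk,itl->kl", xd, xd)        # sum over i and t of xd xd'
    sxy = np.einsum("itk,it->k", xd, yd)          # sum over i and t of xd * yd
    return np.linalg.solve(sxx, sxy)

rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=(10, 40, 2)), axis=1)   # I(1) regressors
alpha = rng.normal(size=10)
y = alpha[:, None] + x @ np.array([1.5, -0.5]) + rng.normal(size=(10, 40))
print(panel_ols(x, y))
```

Demeaning within each unit removes the intercepts, so they never need to be estimated explicitly.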
where
N
1
NT =
N
1
T2
i=1
1
t=1
1
N
1/2
i=1
i = Wi Wi.
and W
The normality of the OLS estimator in Theorem 1 comes naturally. When
summing across i, the non-standard asymptotic distribution due to the unit root
in the time dimension is smoothed out. From Theorem 1 we note that there is
an interesting interpretation of the asymptotic covariance matrix, 6 1u., i.e.
1u. can be seen as the long-run noise-to-signal ratio. We also note that
1
2u is due to the endogeneity of the regressor xit, and u is due to the serial
correlation. It can be shown easily that
NT 3 1u + 6 1u.
p
u,
, and
,
u be
which was examined by Kao & Chen (1995). Let
consistent estimates of , u, , and u respectively. Then from (b) in
+
Theorem 1, we can define a bias-corrected OLS, OLS
,
NT
+
= OLS
OLS
T
such that
+
NT( OLS
) N(0, 6 1u.),
where
1
u + 6
1
u.
NT = 3
Chen, McCoskey & Kao (1999) investigated the finite sample properties of the
OLS estimator in (7), the t-statistic, the bias-corrected OLS estimator, and the
bias-corrected t-statistic. They found that the bias-corrected OLS estimator
does not improve over the OLS estimator in general. The results of Chen et al.
suggest that alternatives, such as the FMOLS estimator or the DOLS estimator
(e.g. Saikkonen, 1991; Stock & Watson, 1993) may be more promising in
(9)
u it = uit u it,
yit+ = yit u 1xit,
(10)
and
u
1xit.
(11)
y it+ = yit
Note that
u 1
Ik
uit+
1
=
it
0
uit
,
it
u.
0
0
,
u
)
+u = (
1
u
1
u,
1
= u
are kernel estimates of u and . Therefore, the FMOLS
where u and
estimator is
N
FM =
i=1
t=1
i=1
t=1
1
(12)
(13)
N
i=1
t=1
(14)
w
itw
it
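$$\hat\Omega = \frac{1}{N}\sum_{i=1}^{N}\left\{\frac{1}{T}\sum_{t=1}^{T}\hat w_{it}\hat w_{it}' + \frac{1}{T}\sum_{\tau=1}^{l}\varpi_{\tau l}\sum_{t=\tau+1}^{T}\left(\hat w_{it}\hat w_{it-\tau}' + \hat w_{it-\tau}\hat w_{it}'\right)\right\}, \tag{15}$$
where $\varpi_{\tau l}$ is a weight function or a kernel. Using Phillips & Durlauf (1986)
and sequential limit theory, $\hat\Sigma$ and $\hat\Omega$ can be shown to be consistent for $\Sigma$ and
$\Omega$.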
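A kernel estimator of the long-run covariance of the kind described in (15) can be sketched as follows. The Bartlett weights $1 - \tau/(l+1)$ are an assumption made here (the notes to the tables mention Bartlett windows), as are the array layout (N, T, m) for the residual vectors and the function name `long_run_cov`.

```python
import numpy as np

def long_run_cov(w, lag):
    """Bartlett-kernel long-run covariance averaged over cross-section units.

    w has shape (N, T, m) and holds the residual vectors w_it.
    """
    n, t, _ = w.shape
    # Contemporaneous part: average of w_it w_it' over i and t.
    omega = np.einsum("itk,itl->kl", w, w) / (n * t)
    for tau in range(1, lag + 1):
        weight = 1.0 - tau / (lag + 1.0)          # Bartlett weight (assumption)
        # Autocovariance at displacement tau: average of w_it w_i,t-tau'.
        gamma = np.einsum("itk,itl->kl", w[:, tau:, :], w[:, :-tau, :]) / (n * t)
        omega += weight * (gamma + gamma.T)
    return omega

rng = np.random.default_rng(0)
w = rng.normal(size=(20, 60, 2))
print(long_run_cov(w, lag=5))
```

With the Bartlett weights the estimate is positive semi-definite by construction, which is one common reason for choosing this kernel.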
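Remark 6. The distribution results for $\hat\beta_{FM}$ require that $\sqrt{N}(\hat\Delta^+_{\varepsilon u} - \Delta^+_{\varepsilon u})$ does not
diverge as $N$ grows large. However, $\hat\Delta^+_{\varepsilon u} - \Delta^+_{\varepsilon u}$ may not be small when $T$ is fixed.
It follows that $\sqrt{N}(\hat\Delta^+_{\varepsilon u} - \Delta^+_{\varepsilon u})$ may be non-negligible in panel data with finite
samples.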
Next, we propose a DOLS estimator, $\hat\beta_D$, which uses the past and future values
of $\Delta x_{it}$ as additional regressors. We then show that the limiting distribution of
$\hat\beta_D$ is the same as the FMOLS estimator, $\hat\beta_{FM}$. But first, we need the following
additional assumption:
Assumption 4. The spectral density matrix fww() is bounded away from zero
and full rank for all i, i.e.
fww() IT, [0, ], > 0.
When Assumptions 2 and 4 hold, the process {uit} can be written as (see
Saikkonen, 1991):
uit =
j=
cijit + j + vit
(16)
|| cij || < ,
j=
{vit} is stationary with zero mean, and {vit} and {it} are uncorrelated not only
contemporaneously but also in all lags and leads. In practice, the leads and lags
may be truncated while retaining (16) approximately, so that
q
uit =
cijit + j + v it.
j=q
for all i. This is because {cij} are assumed to be absolutely summable, i.e.
|| cij || < .
j=
|| cij || 0
(17)
|j|>q
for all i.
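We then substitute (16) into (1) to get
$$y_{it} = \alpha_i + x_{it}'\beta + \sum_{j=-q}^{q} c_{ij}'\varepsilon_{it+j} + \dot v_{it},$$
where
$$\dot v_{it} = v_{it} + \sum_{|j|>q} c_{ij}'\varepsilon_{it+j}. \tag{18}$$
Since $\varepsilon_{it} = \Delta x_{it}$, this is the regression
$$y_{it} = \alpha_i + x_{it}'\beta + \sum_{j=-q}^{q} c_{ij}'\Delta x_{it+j} + \dot v_{it}. \tag{19}$$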
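The regression in (19) can be sketched for a scalar regressor as below. This is a simplified illustration under stated assumptions: the lead and lag coefficients are pooled across units rather than unit-specific, the numbers of lags and leads may differ (as in the simulations, which use for example 4 lags and 2 leads), and the function name `panel_dols` is hypothetical.

```python
import numpy as np

def panel_dols(x, y, lags=2, leads=1):
    """Pooled DOLS estimate of beta for a scalar regressor.

    x and y have shape (N, T). Leads and lags of the first difference of x
    are added as regressors and all variables are demeaned within each unit.
    """
    n, t = x.shape
    dx = np.diff(x, axis=1)                     # dx[:, s - 1] = x[:, s] - x[:, s - 1]
    times = np.arange(lags + 1, t - leads)      # periods with all leads and lags available
    blocks, targets = [], []
    for i in range(n):
        cols = [x[i, times]]
        for j in range(-lags, leads + 1):
            cols.append(dx[i, times + j - 1])   # first difference of x at period t + j
        z = np.column_stack(cols)
        blocks.append(z - z.mean(axis=0))       # removes the intercept alpha_i
        targets.append(y[i, times] - y[i, times].mean())
    z_all = np.vstack(blocks)
    y_all = np.concatenate(targets)
    coef, *_ = np.linalg.lstsq(z_all, y_all, rcond=None)
    return coef[0]                              # coefficient on x_it

rng = np.random.default_rng(0)
eps = rng.normal(size=(20, 60))
x = np.cumsum(eps, axis=1)
u = 0.5 * eps + 0.1 * rng.normal(size=(20, 60))   # u_it correlated with the innovation of x
y = rng.normal(size=20)[:, None] + 2.0 * x + u
print(panel_dols(x, y, lags=1, leads=1))
```

In this example the disturbance depends only on the contemporaneous innovation, so a single lag and lead already absorb the endogeneity.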
(20)
iu
i 1it,
u it+ = uit
iu
1/2
1/2
1/2xit)),
i 1xit
y it+ = yit
iu.(iu. xit (i
(21)
1/2 +
iu.
it ,
y*it =
y
(23)
(22)
and
iu. are consistent estimators of i and
i and
where
iu. = iu iui 1iu,
1/2
1/2
respectively. Similar to Pedroni (1996) the correction term,
iu.(iu.
1/2
i xit)), is needed in (22) in the heterogeneous panel. We note that
xit (
1/2
1/2xit) = 0 in the
iu.
(22) will be the same as (11) only if
xit (i
heterogeneous panel. Also (22) requires knowing something about the true .
In practice, in (22) can be replaced by a preliminary OLS, OLS. Therefore,
let
iu
1/2
1/2
1/2xit)) OLS,
i 1xit
y it+ + = yit
iu.(iu. xit (i
and
1/2 + +
iu.
it .
y*it =
y
N
i=1
t=1
1
(x*it x *i )(x*it x *i )
i=1
t=1
iu
(x*it x *i )y*it T*
(24)
where
1/2
iu.
i 1/2
iu =
iu +
*
and
i+u = (
iu
i)
1
iu
i 1
iu.
i 1
i
= iu
Theorem 4. If Assumptions 1–2 and 6–7 hold, then $\sqrt{N}T(\hat\beta^*_{FM} - \beta) \Rightarrow N(0, 6I_k)$.
The DOLS estimator for heterogeneous panels, $\hat\beta^*_D$, can be obtained by
running the following regression:
$$y^*_{it} = \alpha_i + x^{*\prime}_{it}\beta + \sum_{j=-q_i}^{q_i} c_{ij}'\Delta x^*_{it+j} + \dot v^*_{it}, \tag{25}$$
where $\dot v^*_{it}$ is defined similarly as in (18). Note that in (25) different lag
truncations, $q_i$, may have to be used because the error terms are heterogeneous
across $i$. Therefore, we need to assume that $q_i$ tends to infinity with $T$ at a
suitable rate for all $i$:
Assumption 8. $q_i \to \infty$ as $T \to \infty$ such that $q_i^3/T \to 0$, and
$$T^{1/2}\sum_{|j|>q_i}\|c_{ij}\| \to 0 \tag{26}$$
for all $i$.
In the following theorem we show that $\hat\beta^*_D$ also has the same limiting
distribution as $\hat\beta^*_{FM}$.
Theorem 5. If Assumptions 1–2 and 6–8 hold, then $\sqrt{N}T(\hat\beta^*_D - \beta) \Rightarrow N(0, 6I_k)$.
Remark 7. Theorems 4 and 5 show that the limiting distributions of $\hat\beta^*_{FM}$ and
$\hat\beta^*_D$ are free of nuisance parameters.
Remark 8. We now consider a linear hypothesis that involves the elements of
the coefficient vector $\beta$. We show that hypothesis tests constructed using the
FMOLS and DOLS estimators have asymptotic chi-squared distributions. The
null hypothesis has the form:
$$H_0: R\beta = r, \tag{27}$$
(28)
Remark 9. For the heterogeneous panels, a natural statistic using $\hat\beta^*_{FM}$ or $\hat\beta^*_D$
to test the null hypothesis is
$$W^* = \frac{1}{6} N T^2 (R\hat\beta^*_D - r)'[RR']^{-1}(R\hat\beta^*_D - r). \tag{29}$$
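The statistic in (29) is straightforward to compute once an estimate is available. A minimal sketch, assuming the estimate, the restriction matrix $R$, and the vector $r$ are supplied as arrays (the function name `wald_statistic` is hypothetical):

```python
import numpy as np

def wald_statistic(beta_hat, R, r, n_units, n_periods):
    """Wald statistic (1/6) N T^2 (R b - r)' [R R']^{-1} (R b - r)."""
    R = np.atleast_2d(np.asarray(R, dtype=float))
    d = R @ np.asarray(beta_hat, dtype=float) - np.asarray(r, dtype=float)
    return n_units * n_periods**2 / 6.0 * (d @ np.linalg.solve(R @ R.T, d))

# Worked example: N = 2, T = 3, R = I, r = 0 and estimate (1, 2).
# N T^2 / 6 = 3 and the quadratic form is 1 + 4 = 5, so the statistic is 15.
print(wald_statistic([1.0, 2.0], np.eye(2), [0.0, 0.0], 2, 3))
```

The scaling by $1/6$ mirrors the limiting covariance matrix $6I_k$ in Theorems 4 and 5, so no nuisance parameters enter the statistic.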
uit
0.5 0
=
it
0 0.5
uit 1
u*it
0.3
+
+
it 1
*it
21
0.4
0.6
u*it 1
*it 1
with
u*it iid
~N
*it
0
,
0
1 21
21 1
The design in (30) nests several important special cases. First, when
is replaced by
0
0
0
0
0.5 0
0 0.5
0
0
0.5 0
0 0.5
is replaced by
0
, and 21 and 21 are random variable different across i, then the DGP
0
design similar to that of Phillips & Hansen (1990) and Phillips & Loretan
(1991).
$$y_{it} = \alpha_i + x_{it}\beta + u_{it}$$
and
$$x_{it} = x_{it-1} + \varepsilon_{it}$$
for $i = 1, \ldots, N$, $t = 1, \ldots, T$, where
0.4
0.6
uit
u*it
0.3
=
+
it
*it
21
u*it 1
*it 1
(31)
with
u*it iid
~N
*it
0
,
0
1
21
21
1
0.176
(0.044)
0.099
(0.017)
0.069
(0.009)
0.064
(0.025)
0.038
(0.009)
0.027
(0.005)
0.002
(0.015)
0.005
(0.005)
0.004
(0.003)
0.038
(0.012)
0.018
(0.004)
0.011
(0.002)
0.201
(0.049)
0.104
(0.019)
0.070
(0.010)
0.132
(0.038)
0.066
(0.014)
0.044
(0.007)
0.079
(0.027)
0.039
(0.009)
0.026
(0.005)
0.029
(0.016)
0.015
(0.006)
0.009
(0.003)
21 = 0.8
FM
0.007
(0.008)
0.003
(0.002)
0.002
(0.001)
0.001
(0.017)
0.001
(0.005)
0.000
(0.003)
0.001
(0.027)
0.001
(0.027)
0.000
(0.005)
0.001
(0.040)
0.000
(0.013)
0.000
(0.007)
D
0.019
(0.017)
0.009
(0.006)
0.007
(0.003)
0.059
(0.026)
0.029
(0.009)
0.019
(0.005)
0.082
(0.030)
0.041
(0.011)
0.027
(0.006)
0.097
(0.032)
0.049
(0.012)
0.033
(0.007)
OLS
0.036
(0.015)
0.018
(0.005)
0.012
(0.002)
0.019
(0.022)
0.012
(0.008)
0.009
(0.004)
0.068
(0.029)
0.038
(0.011)
0.026
(0.006)
0.113
(0.035)
0.062
(0.013)
0.042
(0.007)
21 = 0.4
FM
0.007
(0.014)
0.003
(0.004)
0.002
(0.002)
0.002
(0.026)
0.001
(0.008)
0.001
(0.008)
0.002
(0.031)
0.001
(0.009)
0.001
(0.005)
0.002
(0.033)
0.001
(0.011)
0.000
(0.006)
D
0.114
(0.034)
0.057
(0.012)
0.038
(0.007)
0.005
(0.016)
0.002
(0.006)
0.001
(0.003)
0.014
(0.013)
0.007
(0.005)
0.005
(0.002)
0.022
(0.011)
0.011
(0.004)
0.007
(0.002)
OLS
0.012
(0.028)
0.011
(0.009)
0.010
(0.005)
0.069
(0.021)
0.035
(0.007)
0.023
(0.004)
0.073
(0.018)
0.037
(0.006)
0.025
(0.003)
0.069
(0.016)
0.036
(0.006)
0.024
(0.003)
21 = 0.8
FM
Means Biases and Standard Deviations of OLS, FMOLS, and DOLS Estimators
0.000
(0.031)
0.000
(0.009)
0.000
(0.005)
0.006
(0.017)
0.003
(0.005)
0.002
(0.003)
0.003
(0.013)
0.001
(0.004)
0.001
(0.002)
0.009
(0.009)
0.004
(0.003)
0.003
(0.002)
D
Note: (a) N = T. (b) A lag length 5 of the Bartlett windows is used for the FMOLS estimator. (c) 4 lags and 2 leads are used for the DOLS estimator.
T = 60
T = 40
21 = 0.8
T = 20
T = 60
T = 40
21 = 0.0
T = 20
T = 60
T = 40
21 = 0.4
T = 20
T = 60
T = 40
21 = 0.8
T = 20
OLS
Table 1.
CHIHWA KAO & MIN-HSIEN CHIANG
Table 2.
(N,T)       OLS             FM(5)           FM(2)           D(4,2)          D(2,1)
(1,20)      0.135 (0.184)   0.104 (0.196)   0.122 (0.189)   0.007 (0.297)   0.031 (0.211)
(1,40)      0.070 (0.093)   0.059 (0.012)   0.065 (0.092)   0.001 (0.106)   0.015 (0.090)
(1,60)      0.047 (0.063)   0.041 (0.064)   0.043 (0.061)   0.001 (0.064)   0.009 (0.057)
(1,120)     0.024 (0.032)   0.023 (0.031)   0.022 (0.031)   0.001 (0.029)   0.004 (0.027)
(20,20)     0.082 (0.030)   0.068 (0.029)   0.075 (0.029)   0.002 (0.031)   0.017 (0.028)
(20,40)     0.042 (0.016)   0.039 (0.015)   0.039 (0.015)   0.001 (0.015)   0.008 (0.014)
(20,60)     0.028 (0.010)   0.027 (0.010)   0.026 (0.009)   0.000 (0.009)   0.006 (0.009)
(20,120)    0.014 (0.005)   0.014 (0.005)   0.013 (0.005)   0.000 (0.005)   0.003 (0.004)
(40,20)     0.081 (0.022)   0.066 (0.021)   0.073 (0.021)   0.001 (0.022)   0.017 (0.019)
(40,40)     0.041 (0.011)   0.038 (0.011)   0.038 (0.011)   0.001 (0.009)   0.008 (0.009)
(40,60)     0.028 (0.007)   0.026 (0.007)   0.025 (0.007)   0.001 (0.007)   0.005 (0.006)
(40,120)    0.014 (0.004)   0.014 (0.004)   0.013 (0.003)   0.000 (0.003)   0.003 (0.004)
(60,20)     0.080 (0.017)   0.067 (0.017)   0.073 (0.017)   0.002 (0.018)   0.016 (0.016)
(60,40)     0.041 (0.009)   0.038 (0.009)   0.038 (0.009)   0.001 (0.008)   0.008 (0.008)
(60,60)     0.027 (0.006)   0.026 (0.006)   0.025 (0.006)   0.001 (0.005)   0.005 (0.005)
(60,120)    0.014 (0.003)   0.014 (0.003)   0.012 (0.003)   0.000 (0.003)   0.003 (0.003)
(120,20)    0.079 (0.012)   0.066 (0.012)   0.072 (0.012)   0.002 (0.012)   0.016 (0.011)
(120,40)    0.041 (0.006)   0.037 (0.006)   0.037 (0.006)   0.001 (0.006)   0.008 (0.005)
(120,60)    0.027 (0.004)   0.026 (0.004)   0.025 (0.004)   0.001 (0.004)   0.005 (0.004)
(120,120)   0.014 (0.002)   0.014 (0.002)   0.013 (0.002)   0.000 (0.002)   0.003 (0.002)
Note: (a) A lag length 5 and 2 of the Bartlett windows are used for the FMOLS(5) and FMOLS(2)
estimators. (b) 4 lags and 2 leads and 2 lags and 1 lead are used for the DOLS(4,2) and DOLS(2,1)
estimators. (c) 21 = 0.4 and 21 = 0.4.
5.594
(1.330)
8.435
(1.382)
10.749
(1.439)
2.377
(1.042)
4.558
(1.071)
6.012
(1.109)
0.145
(0.919)
0.796
(0.888)
1.294
(0.899)
3.694
(1.201)
5.509
(1.243)
7.130
(1.281)
7.247
(1.526)
10.047
(1.484)
12.250
(1.468)
5.425
(1.340)
7.507
(1.302)
9.161
(1.287)
3.927
(1.200)
5.453
(1.173)
6.674
(1.161)
2.067
(1.066)
2.898
(1.050)
3.574
(1.040)
21 = 0.8
FMOLS
0.635
(0.732)
0.948
(0.712)
1.236
(0.737)
0.054
(0.993)
0.001
(0.926)
0.147
(0.927)
0.046
(1.132)
0.017
(1.023)
0.009
(1.009)
0.047
(1.281)
0.004
(1.119)
0.004
(1.093)
DOLS
1.229
(1.084)
1.758
(1.067)
2.188
(1.061)
2.944
(1.241)
4.134
(1.229)
5.070
(1.229)
3.905
(1.334)
5.462
(1.325)
6.676
(1.329)
4.650
(1.393)
6.503
(1.389)
7.937
(1.397)
OLS
2.893
(1.214)
4.041
(1.161)
4.983
(1.143)
1.006
(1.180)
1.684
(1.086)
2.198
(1.065)
3.017
(1.282)
4.401
(1.205)
5.489
(1.197)
4.823
(1.414)
6.833
(1.366)
8.429
(1.377)
21 = 0.4
FMOLS
0.530
(1.107)
0.741
(0.984)
0.913
(0.964)
0.096
(1.342)
0.168
(1.134)
0.199
(1.088)
0.124
(1.402)
0.104
(1.168)
0.126
(1.118)
0.086
(1.423)
0.069
(1.187)
0.084
(1.135)
DOLS
4.495
(1.123)
6.255
(1.088)
7.630
(1.092)
0.277
(0.897)
0.334
(0.885)
0.405
(0.891)
0.925
(0.867)
1.336
(0.856)
1.626
(0.859)
1.758
(0.859)
2.491
(0.847)
3.030
(0.847)
OLS
0.542
(1.209)
1.349
(1.103)
1.975
(1.087)
5.198
(1.503)
7.086
(1.441)
8.556
(1.395)
6.864
(1.642)
9.744
(1.665)
11.966
(1.644)
7.927
(1.719)
11.584
(1.826)
14.402
(1.840)
21 = 0.8
FMOLS
0.013
(1.350)
0.002
(1.160)
0.003
(1.109)
0.439
(1.277)
0.547
(1.104)
0.663
(1.047)
0.277
(1.203)
0.362
(1.054)
0.408
(0.999)
1.049
(1.122)
1.386
(1.006)
1.633
(0.959)
DOLS
Note: (a) N = T. (b) A lag length 5 of the Bartlett windows is used for the FMOLS estimator. (c) 4 lags and 2 leads are used for the DOLS estimator.
T = 60
T = 40
21 = 0.8
T = 20
T = 60
T = 40
21 = 0.0
T = 20
T = 60
T = 40
21 = 0.4
T = 20
T = 60
T = 40
21 = 0.8
T = 20
OLS
Table 3.
Table 4.
(N,T)       OLS             FMOLS(5)        FMOLS(2)        DOLS(4,2)       DOLS(2,1)
(1,20)      1.169 (1.497)   1.264 (2.326)   1.334 (2.031)   0.304 (3.224)   0.232 (2.109)
(1,40)      1.116 (1.380)   1.169 (1.805)   1.232 (1.738)   0.113 (2.086)   0.258 (1.689)
(1,60)      1.090 (1.357)   1.162 (1.692)   1.195 (1.676)   0.071 (1.778)   0.254 (1.554)
(1,120)     1.092 (1.333)   1.239 (1.165)   1.217 (1.652)   0.056 (1.531)   0.234 (1.448)
(20,20)     3.905 (1.334)   3.017 (1.281)   3.156 (1.230)   0.124 (1.402)   0.695 (1.184)
(20,40)     3.934 (1.307)   3.202 (1.206)   3.169 (1.200)   0.114 (1.186)   0.634 (1.099)
(20,60)     3.861 (1.306)   3.202 (1.150)   3.111 (1.191)   0.053 (1.122)   0.677 (1.079)
(20,120)    3.893 (1.312)   3.247 (1.149)   3.141 (1.209)   0.073 (1.078)   0.642 (1.061)
(40,20)     5.439 (1.347)   4.163 (1.269)   4.342 (1.226)   0.088 (1.358)   1.008 (1.169)
(40,40)     5.462 (1.325)   4.401 (1.205)   4.344 (1.197)   0.104 (1.168)   0.928 (1.092)
(40,60)     5.457 (1.328)   4.506 (1.199)   4.339 (1.192)   0.098 (1.121)   0.913 (1.081)
(40,120)    5.469 (1.296)   4.647 (1.190)   4.356 (1.176)   0.106 (1.050)   0.879 (1.033)
(60,20)     6.677 (1.329)   5.097 (1.258)   5.314 (1.208)   0.169 (1.361)   1.179 (1.162)
(60,40)     6.699 (1.323)   5.384 (1.204)   5.309 (1.192)   0.162 (1.169)   1.097 (1.094)
(60,60)     6.676 (1.329)   5.489 (1.197)   5.289 (1.191)   0.126 (1.118)   1.106 (1.074)
(60,120)    6.677 (1.311)   5.656 (1.196)   5.299 (1.182)   0.115 (1.056)   1.083 (1.041)
(120,20)    9.407 (1.350)   7.153 (1.262)   7.446 (1.215)   0.220 (1.348)   1.662 (1.163)
(120,40)    9.418 (1.313)   7.753 (1.171)   7.753 (1.171)   0.193 (1.157)   1.565 (1.085)
(120,60)    9.411 (1.310)   7.717 (1.182)   7.429 (1.174)   0.177 (1.093)   1.549 (1.053)
(120,120)   9.408 (1.315)   7.932 (1.195)   7.432 (1.181)   0.152 (1.057)   1.530 (1.040)
Note: (a) A lag length 5 and 2 of the Bartlett windows are used for the FMOLS(5) and FMOLS(2)
estimators. (b) 4 lags and 2 leads and 2 lags and 1 lead are used for the DOLS(4,2) and DOLS(2,1)
estimators. (c) 21 = 0.4 and 21 = 0.4.
Fig. 1.
Fig. 2.
Fig. 3.
Fig. 4.
Fig. 5.
Fig. 6.
Fig. 7.
Fig. 8.
investigate the sensitivity of the FMOLS estimator with respect to the choice of
the bandwidth length. We extend the experiments by changing the lag length
from 5 to 2 for a Bartlett window. Overall, the results show that changing the
lag length from 5 to 2 does not lead to substantial changes in the biases of the
FMOLS estimator and its t-statistic. However, the biases of the DOLS
estimator and its t-statistic are reduced substantially when the lags and leads are
changed from (2, 1) to (4, 2), as predicted by Theorem 3. The results from
Tables 2 and 4 show that the DOLS method gives different estimates of $\beta$ and
the t-statistic depending on the number of lags and leads we choose. This seems
to be a drawback of the DOLS estimator. Further research is needed on how to
choose the lags and leads for the DOLS estimator in the panel setting.
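The dependence on the lead and lag choice can be explored with a small sketch. The Python code below is an illustration only, not the authors' Gauss program: it assumes a scalar regressor, removes a unit-specific constant and unit-specific leads and lags of the first difference of x from each unit (the role played by the projection $Q_i$ in the Appendix), and then pools across units. The function name `panel_dols` and the toy data are hypothetical.

```python
import numpy as np

def panel_dols(y, x, lags, leads):
    """Pooled DOLS slope for a scalar regressor (illustrative sketch).

    y, x  : arrays of shape (N, T)
    lags  : number of lagged first differences of x in each unit's regression
    leads : number of led first differences of x in each unit's regression
    """
    N, T = y.shape
    num, den = 0.0, 0.0
    for i in range(N):
        dx = np.diff(x[i])                    # dx[s] is the change dated s + 1
        t = np.arange(lags + 1, T - leads)    # dates with every lead and lag available
        # Z_i: constant plus the changes dated t - lags, ..., t + leads
        Z = np.column_stack([np.ones(t.size)] +
                            [dx[t + j - 1] for j in range(-lags, leads + 1)])
        # Partial Z_i out of x_i and y_i, i.e. apply I - Z(Z'Z)^{-1}Z'
        rx = x[i, t] - Z @ np.linalg.lstsq(Z, x[i, t], rcond=None)[0]
        ry = y[i, t] - Z @ np.linalg.lstsq(Z, y[i, t], rcond=None)[0]
        num += rx @ ry
        den += rx @ rx
    return num / den

# Toy data (assumed for illustration): random-walk regressor, endogenous error.
rng = np.random.default_rng(0)
N, T, beta = 20, 60, 2.0
eps = rng.standard_normal((N, T))
x = np.cumsum(eps, axis=1)
u = 0.5 * eps + 0.5 * rng.standard_normal((N, T))
y = rng.standard_normal((N, 1)) + beta * x + u

for q in [(2, 1), (4, 2)]:
    print(q, panel_dols(y, x, *q))
```

Running the loop for (2, 1) and (4, 2) on the same data gives a direct view of how much the point estimate moves with the truncation choice.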
B. ARMA(1, 1) Error Terms
In this section, we look at simulations where the errors are generated by an
ARMA(1, 1) process, as in (30), instead of an MA(1) process, as in (31). One
may object that the MA(1) specification in (31) is unfair to the FMOLS
estimator. One of the reasons why the DOLS performs much better than the
FMOLS lies in the simulation design in (31), which assumes that the error
terms are MA(1) processes. If $(u_{it}, \varepsilon_{it})$ is an MA(1) process, then $u_{it}$ can be
written exactly with three terms, $\varepsilon_{it-1}$, $\varepsilon_{it}$, and $\varepsilon_{it+1}$, and no lag truncation
approximation is required for the DOLS.
Tables 5 and 6 report the performance of OLS, FMOLS, and DOLS and their
t-statistics when the errors are generated by an ARMA(1, 1) process. They
show that the FMOLS estimator and its t-statistic are less biased than the
OLS estimator in most cases but are outperformed by the DOLS. Again, when
21 0.0 and 21 = 0.8 the FMOLS estimator and its t-statistic suffer from severe
biases. On the other hand, DOLS shows less improvement over OLS and
FMOLS here than in Tables 1 and 3. In addition, the good performance of
DOLS may disappear for higher-order ARMA(p, q) error processes.
C. Non-normal Errors
In this section, we conduct an experiment where the error terms are non-normal. The DGP is similar to that of Gonzalo (1994):
0.101
(0.038)
0.052
(0.014)
0.035
(0.008)
0.039
(0.024)
0.020
(0.008)
0.013
(0.004)
0.006
(0.015)
0.003
(0.005)
0.002
(0.003)
0.017
(0.009)
0.008
(0.003)
0.005
(0.001)
0.110
(0.042)
0.052
(0.015)
0.034
(0.008)
0.073
(0.032)
0.034
(0.011)
0.022
(0.006)
0.046
(0.025)
0.021
(0.009)
0.014
(0.005)
0.020
(0.016)
0.008
(0.005)
0.006
(0.003)
21 = 0.8
FM
0.002
(0.007)
0.002
(0.002)
0.001
(0.001)
0.001
(0.015)
0.000
(0.005)
0.001
(0.003)
0.001
(0.024)
0.000
(0.008)
0.000
(0.004)
0.003
(0.037)
0.001
(0.012)
0.000
(0.007)
D
0.016
(0.017)
0.007
(0.006)
0.005
(0.003)
0.035
(0.025)
0.016
(0.008)
0.011
(0.005)
0.045
(0.028)
0.021
(0.010)
0.013
(0.005)
0.049
(0.029)
0.024
(0.010)
0.015
(0.006)
OLS
0.017
(0.013)
0.008
(0.004)
0.005
(0.002)
0.013
(0.022)
0.006
(0.007)
0.004
(0.004)
0.038
(0.027)
0.019
(0.009)
0.012
(0.005)
0.062
(0.020)
0.031
(0.011)
0.021
(0.006)
21 = 0.4
FM
0.003
(0.012)
0.001
(0.004)
0.001
(0.002)
0.001
(0.023)
0.001
(0.008)
0.001
(0.004)
0.000
(0.028)
0.000
(0.009)
0.000
(0.005)
0.000
(0.030)
0.000
(0.010)
0.000
(0.005)
D
0.035
(0.024)
0.016
(0.009)
0.011
(0.005)
0.001
(0.016)
0.001
(0.006)
0.000
(0.003)
0.006
(0.013)
0.002
(0.004)
0.002
(0.002)
0.009
(0.011)
0.004
(0.004)
0.003
(0.002)
OLS
0.012
(0.024)
0.007
(0.009)
0.005
(0.005)
0.034
(0.016)
0.016
(0.005)
0.010
(0.003)
0.037
(0.014)
0.017
(0.004)
0.012
(0.002)
0.036
(0.012)
0.017
(0.004)
0.012
(0.002)
21 = 0.8
FM
Means Biases and Standard Deviations of OLS, FMOLS, and DOLS Estimators
0.000
(0.031)
0.000
(0.009)
0.000
(0.005)
0.003
(0.015)
0.001
(0.005)
0.002
(0.003)
0.001
(0.012)
0.001
(0.004)
0.000
(0.002)
0.003
(0.009)
0.001
(0.003)
0.001
(0.002)
D
Note: (a) N = T. (b) A lag length 5 of the Bartlett windows is used for the FMOLS estimator. (c) 4 lags and 2 leads are used for the DOLS
estimator. (d) The error terms are generated by an ARMA(1,1) process from equation (30).
T = 60
T = 40
21 = 0.8
T = 20
T = 60
T = 40
21 = 0.0
T = 20
T = 60
T = 40
21 = 0.4
T = 20
T = 60
T = 40
21 = 0.8
T = 20
OLS
Table 5.
3.569
(1.323)
4.601
(1.219)
5.22
(1.195)
1.857
(1.106)
2.576
(1.044)
3.179
(1.036)
0.353
(0.952)
0.624
(0.897)
0.827
(0.904)
1.733
(0.933)
2.511
(0.871)
3.270
(0.897)
5.316
(1.929)
7.013
(1.903)
8.437
(1.899)
4.152
(1.762)
5.424
(1.733)
6.521
(1.721)
3.184
(1.644)
4.120
(1.616)
4.952
(1.599)
1.956
(1.529)
2.471
(1.507)
2.966
(1.484)
0.214
(0.663)
0.317
(0.664)
0.428
(0.694)
0.034
(0.956)
0.047
(0.909)
0.058
(0.913)
0.056
(1.132)
0.045
(1.027)
0.034
(1.004)
0.119
(1.290)
0.090
(1.119)
0.068
(1.077)
1.496
(1.589)
1.888
(1.578)
2.267
(1.571)
2.538
(1.769)
3.327
(1.771)
4.131
(1.746)
3.064
(1.867)
4.069
(1.880)
4.899
(1.898)
3.411
(1.924)
4.583
(1.949)
5.523
(1.969)
OLS
1.429
(1.015)
1.917
(1.010)
2.237
(0.999)
0.732
(1.226)
0.967
(1.085)
1.141
(1.021)
1.877
(1.314)
2.346
(1.149)
2.779
(1.114)
2.912
(1.390)
3.580
(1.216)
4.206
(1.178)
0.221
(1.052)
0.294
(0.956)
0.363
(0.941)
0.038
(1.313)
0.075
(1.116)
0.206
(1.118)
0.025
(1.388)
0.011
(1.152)
0.027
(1.096)
0.006
(1.417)
0.009
(1.166)
0.006
(1.111)
DOLS
2.315
(1.577)
3.089
(1.644)
3.736
(1.676)
0.047
(1.498)
0.194
(1.528)
0.064
(1.498)
0.705
(1.454)
1.099
(1.479)
1.343
(1.473)
1.158
(1.426)
1.723
(1.445)
2.097
(1.435)
OLS
21 = 0.4
FMOLS
DOLS
21 = 0.8
FMOLS
0.564
(1.195)
0.876
(1.088)
1.132
(1.062)
2.825
(1.327)
3.557
(1.194)
4.005
(1.096)
3.858
(1.373)
5.034
(1.268)
6.016
(1.211)
4.589
(1.420)
6.144
(1.343)
7.428
(1.294)
21 = 0.8
FMOLS
0.002
(1.551)
0.005
(1.239)
0.003
(1.155)
0.230
(1.276)
0.212
(1.095)
0.693
(1.094)
0.068
(1.208)
0.134
(1.053)
0.144
(1.014)
0.347
(1.139)
0.505
(1.011)
0.603
(0.978)
DOLS
Note: (a) N = T. (b) A lag length 5 of the Bartlett windows is used for the FMOLS estimator. (c) 4 lags and 2 leads are used for the DOLS
estimator. (d) The error terms are generated by an ARMA(1,1) process from equation (30).
T = 60
T = 40
21 = 0.8
T = 20
T = 60
T = 40
21 = 0.0
T = 20
T = 60
T = 40
21 = 0.4
T = 20
T = 60
T = 40
21 = 0.8
T = 20
OLS
Table 6.
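$$
\begin{pmatrix} u_{it} \\ \varepsilon_{it} \end{pmatrix}
= \begin{pmatrix} u^{*}_{it} \\ \varepsilon^{*}_{it} \end{pmatrix}
+ \begin{pmatrix} 0.3 & 0.4 \\ \theta_{21} & 0.6 \end{pmatrix}
\begin{pmatrix} u^{*}_{it-1} \\ \varepsilon^{*}_{it-1} \end{pmatrix},
\qquad (32)
$$
$$
u^{*}_{it} = 0.5\,\varepsilon^{*}_{it} + (1 - 0.5^{2})^{1/2}\, u^{**}_{it},
$$
and
$$
\varepsilon^{*}_{it} = \varepsilon^{**}_{it},
$$
where $u^{**}_{it}$ and $\varepsilon^{**}_{it}$ are independent exponential random variables with
parameter 1. The results from Tables 7-8 show that while the DOLS estimator
performs better in terms of bias, the distribution of the DOLS t-statistic
is far from the asymptotic N(0, 1). The standard deviations of the DOLS
t-statistic are badly underestimated.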
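As a rough illustration of this design, the following Python sketch draws the non-normal innovations and builds the MA(1) errors. It is a sketch under stated assumptions rather than the authors' code: the function name `nonnormal_errors` is hypothetical, the moving-average coefficients are passed as arguments with defaults copied as they appear in the text, and one extra pre-sample draw is used so that the lagged innovation exists for the first period.

```python
import numpy as np

def nonnormal_errors(N, T, theta21, rng, a11=0.3, a12=0.4, a22=0.6):
    """Draw (u, eps), each of shape (N, T), from the non-normal MA(1) design.

    a11, a12, a22 are the fixed entries of the moving-average matrix;
    theta21 is its lower-left entry. All defaults are illustrative.
    """
    # Independent exponential(1) draws; one extra column serves as period 0
    u2 = rng.exponential(1.0, size=(N, T + 1))     # u**
    e2 = rng.exponential(1.0, size=(N, T + 1))     # eps**
    e_star = e2
    u_star = 0.5 * e_star + np.sqrt(1.0 - 0.5 ** 2) * u2
    # Current innovation plus the moving-average matrix times the lagged innovation
    u = u_star[:, 1:] + a11 * u_star[:, :-1] + a12 * e_star[:, :-1]
    eps = e_star[:, 1:] + theta21 * u_star[:, :-1] + a22 * e_star[:, :-1]
    return u, eps

rng = np.random.default_rng(0)
u, eps = nonnormal_errors(N=20, T=40, theta21=0.4, rng=rng)
x = np.cumsum(eps, axis=1)     # integrated regressor built from eps
print(u.shape, x.shape)
```

Because exponential draws have mean one rather than zero, the errors generated this way are not centered; whether to center them is a design choice the text does not spell out, so the sketch leaves them as drawn.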
To summarize the results so far, it would appear that the DOLS estimator is
the best estimator overall, though the standard error for the DOLS t-statistic
shows significant downward bias when the error terms are generated from
non-normal distributions.
D. Heterogeneous Panel
In Sections A-C we compare the small sample properties of the OLS, FMOLS,
and DOLS estimators and conclude that the DOLS estimator and its t-statistic
generally exhibit the least bias. One of the reasons for the poor performance of
the FMOLS estimator in the homogeneous panel is that the FMOLS estimator
needs a kernel estimator for the asymptotic covariance matrix, while the
DOLS does not. By contrast, for the heterogeneous panel both DOLS in (20)
and OLS in (33) use kernel estimators. Consequently, one may expect that the
much better performance of the DOLS estimator in Sections 5A-C is limited to
very specialized cases, e.g. the homogeneous panel. To test this, we now
compare the performance of the OLS, FMOLS, and DOLS estimators for a
heterogeneous panel using Monte Carlo experiments similar to those in Section
5A. The DGP is
$$y_{it} = \alpha_i + x_{it}\beta + u_{it}$$
and
$$x_{it} = x_{it-1} + \varepsilon_{it}$$
for i = 1, . . . , N, t = 1, . . . , T, where
0.011
(0.009)
0.003
(0.002)
0.001
(0.001)
0.008
(0.009)
0.005
(0.004)
0.002
(0.002)
0.010
(0.057)
0.002
(0.014)
0.001
(0.007)
0.022
(0.012)
0.006
(0.003)
0.003
(0.001)
0.005
(0.009)
0.001
(0.002)
0.001
(0.001)
0.002
(0.009)
0.002
(0.004)
0.001
(0.002)
0.012
(0.058)
0.003
(0.014)
0.001
(0.007)
0.011
(0.013)
0.003
(0.003)
0.001
(0.001)
= 0.25
FM
0.000
(0.002)
0.000
(0.001)
0.000
(0.000)
0.001
(0.054)
0.000
(0.013)
0.000
(0.006)
0.001
(0.005)
0.000
(0.001)
0.000
(0.001)
0.000
(0.002)
0.000
(0.000)
0.000
(0.000)
D
0.034
(0.020)
0.009
(0.005)
0.004
(0.002)
0.005
(0.017)
0.001
(0.004)
0.001
(0.002)
0.002
(0.009)
0.000
(0.002)
0.000
(0.001)
0.002
(0.006)
0.001
(0.001)
0.000
(0.001)
OLS
0.049
(0.019)
0.014
(0.005)
0.007
(0.002)
0.007
(0.016)
0.002
(0.004)
0.001
(0.002)
0.008
(0.009)
0.002
(0.002)
0.001
(0.001)
0.007
(0.006)
0.002
(0.001)
0.001
(0.001)
= 0.5
FM
0.001
(0.013)
0.000
(0.003)
0.000
(0.001)
0.001
(0.014)
0.000
(0.003)
0.000
(0.002)
0.000
(0.005)
0.000
(0.001)
0.000
(0.001)
0.000
(0.003)
0.028
(0.001)
0.000
(0.000)
D
0.039
(0.016)
0.012
(0.004)
0.005
(0.002)
0.001
(0.005)
0.000
(0.001)
0.000
(0.001)
0.001
(0.004)
0.000
(0.001)
0.000
(0.000)
0.001
(0.003)
0.000
(0.001)
0.000
(0.000)
OLS
0.008
(0.014)
0.003
(0.004)
0.002
(0.002)
0.005
(0.005)
0.001
(0.001)
0.001
(0.001)
0.005
(0.004)
0.001
(0.001)
0.001
(0.000)
0.004
(0.003)
0.001
(0.001)
0.001
(0.000)
=1
FM
Means Biases and Standard Deviations of OLS, FMOLS, and DOLS Estimators
0.000
(0.013)
0.000
(0.003)
0.000
(0.001)
0.000
(0.003)
0.000
(0.001)
0.000
(0.000)
0.000
(0.002)
0.000
(0.001)
0.000
(0.000)
0.000
(0.002)
0.000
(0.000)
0.000
(0.000)
D
Note: (a) N = T. (b) A lag length 5 of the Bartlett windows is used for the FMOLS estimator. (c) 4 lags and 2 leads are used for the DOLS
estimator. (d) The error terms are non-normal.
T = 60
T = 40
21 = 0.8
T = 20
T = 60
T = 40
21 = 0.0
T = 20
T = 60
T = 40
21 = 0.4
T = 20
T = 60
T = 40
21 = 0.8
T = 20
OLS
Table 7.
1.248
(0.940)
0.892
(0.599)
0.738
(0.488)
0.884
(0.932)
0.787
(0.599)
0.651
(0.488)
0.164
(0.941)
0.106
(0.616)
0.093
(0.505)
1.714
(0.951)
1.249
(0.605)
1.036
(0.492)
0.699
(1.311)
0.717
(1.253)
0.741
(1.267)
0.259
(1.243)
0.587
(1.250)
0.611
(1.264)
0.275
(1.271)
0.282
(1.231)
0.264
(1.248)
1.104
(1.326)
1.134
(1.262)
1.163
(1.274)
0.000
(0.189)
0.001
(0.126)
0.001
(0.102)
0.014
(0.896)
0.013
(0.579)
0.002
(0.477)
0.071
(0.561)
0.007
(0.230)
0.008
(0.188)
0.006
(0.209)
0.002
(0.139)
0.002
(0.113)
2.286
(1.278)
2.368
(1.208)
2.416
(1.214)
0.340
(1.236)
0.347
(1.186)
0.332
(1.193)
0.259
(1.243)
0.268
(1.189)
0.289
(1.197)
0.472
(1.245)
0.484
(1.191)
0.506
(1.199)
OLS
2.528
(0.976)
1.947
(0.633)
1.637
(0.513)
0.398
(0.941)
0.268
(0.611)
0.226
(0.497)
0.884
(0.932)
0.626
(0.599)
0.519
(0.485)
1.055
(0.931)
0.752
(0.597)
0.623
(0.483)
0.035
(0.650)
0.035
(0.446)
0.033
(0.363)
0.031
(0.784)
0.025
(0.509)
0.013
(0.421)
0.071
(0.561)
0.054
(0.363)
0.052
(0.299)
0.039
(0.421)
0.003
(0.276)
0.028
(0.227)
DOLS
2.749
(1.067)
2.946
(0.992)
3.011
(0.981)
0.145
(1.041)
0.141
(0.982)
0.125
(0.978)
0.199
(1.040)
0.213
(0.981)
0.232
(0.978)
0.406
(1.040)
0.424
(0.981)
0.445
(0.979)
OLS
= 0.5
FMOLS
DOLS
= 0.25
FMOLS
0.539
(0.984)
0.598
(0.672)
0.538
(0.554)
0.961
(0.931)
0.685
(0.594)
0.570
(0.478)
1.152
(0.927)
0.831
(0.589)
0.692
(0.474)
1.265
(0.925)
0.918
(0.588)
0.764
(0.472)
=1
FMOLS
0.026
(0.899)
0.008
(0.624)
0.002
(0.525)
0.066
(0.619)
0.053
(0.407)
0.039
(0.337)
0.019
(0.567)
0.016
(0.368)
0.020
(0.304)
0.118
(0.520)
0.096
(0.336)
0.088
(0.276)
DOLS
Note: (a) N = T. (b) A lag length 5 of the Bartlett windows is used for the FMOLS estimator. (c) 4 lags and 2 leads are used for the DOLS
estimator. (d) The error terms are non-normal.
T = 60
T = 40
21 = 0.8
T = 20
T = 60
T = 40
21 = 0.0
T = 20
T = 60
T = 40
21 = 0.4
T = 20
T = 60
T = 40
T = 20
OLS
Table 8.
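$$
\begin{pmatrix} u_{it} \\ \varepsilon_{it} \end{pmatrix}
= \begin{pmatrix} u^{*}_{it} \\ \varepsilon^{*}_{it} \end{pmatrix}
+ \begin{pmatrix} 0.3 & 0.4 \\ \theta_{21} & 0.6 \end{pmatrix}
\begin{pmatrix} u^{*}_{it-1} \\ \varepsilon^{*}_{it-1} \end{pmatrix}
$$
with
$$
\begin{pmatrix} u^{*}_{it} \\ \varepsilon^{*}_{it} \end{pmatrix}
\overset{iid}{\sim} N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 1 & \sigma_{21} \\ \sigma_{21} & 1 \end{pmatrix} \right).
$$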
N
i=1
t=1
(x**
**
**
it x
i )(x**
it x
i )
i=1
t=1
(x**
**
it x
i )(y**
it )
(33)
with x**
**
=
it = wi xit, y**
it = wiyit, x
i
1
T
1
x**
it , and wi = [i ]11. Two FMOLS
t=1
estimators will be considered, one using a lag length of 5 (FMOLS(5)), the
second using a lag length of 2 (FMOLS(2)). Two DOLS estimators are also
considered: DOLS with four lags and two leads, DOLS(4, 2), and DOLS with
two lags and one lead, DOLS(2, 1). The relatively good performance that the
DOLS estimator showed in the homogeneous panel can also be observed in
Table 9. The biases of the OLS and FMOLS estimators are substantial. Again,
the DOLS outperforms the OLS and FMOLS. Note from Table 9 that the FMOLS
always has more bias than the OLS for all N and T except when N = 1. The poor
performance of the FMOLS in heterogeneous panels indicates that the
FMOLS in Section 4 is not recommended in practice. A possible reason for this
poor performance is that the FMOLS has to go through two non-parametric
corrections, as in (22) and (23). The failure of the non-parametric correction
could therefore be very severe for the FMOLS estimator in heterogeneous
panels. Pedroni (1996) proposed several alternative
versions of the FMOLS estimator such as an FMOLS estimator based on the
Table 9.
(N,T)
OLS
*
FM(5)
*
FM(2)
*
D(4,2)
*
D(2,1)
*
(1,20)
0.102
(0.163)
0.052
(0.079)
0.035
(0.052)
0.018
(0.026)
0.025
(0.032)
0.016
(0.014)
0.012
(0.009)
0.006
(0.004)
0.023
(0.024)
0.015
(0.009)
0.013
(0.006)
0.014
(0.004)
0.023
(0.019)
0.015
(0.008)
0.011
(0.005)
0.006
(0.002)
0.022
(0.014)
0.015
(0.006)
0.011
(0.004)
0.006
(0.002)
0.076
(0.319)
0.006
(0.116)
0.004
(0.066)
0.008
(0.027)
0.069
(0.054)
0.041
(0.019)
0.028
(0.011)
0.014
(0.005)
0.089
(0.038)
0.048
(0.013)
0.032
(0.008)
0.014
(0.004)
0.073
(0.031)
0.042
(0.011)
0.029
(0.006)
0.014
(0.003)
0.075
(0.003)
0.042
(0.008)
0.029
(0.004)
0.014
(0.002)
0.008
(0.212)
0.018
(0.084)
0.014
(0.050)
0.009
(0.023)
0.073
(0.034)
0.035
(0.014)
0.023
(0.009)
0.011
(0.004)
0.083
(0.024)
0.039
(0.009)
0.026
(0.006)
0.012
(0.003)
0.074
(0.019)
0.036
(0.008)
0.023
(0.005)
0.011
(0.002)
0.072
(0.022)
0.036
(0.006)
0.024
(0.004)
0.011
(0.002)
0.011
(0.405)
0.001
(0.121)
0.001
(0.071)
0.000
(0.030)
0.000
(0.054)
0.001
(0.020)
0.000
(0.012)
0.000
(0.005)
0.000
(0.038)
0.001
(0.014)
0.000
(0.009)
0.000
(0.003)
0.001
(0.031)
0.001
(0.011)
0.000
(0.007)
0.000
(0.003)
0.001
(0.022)
0.001
(0.008)
0.000
(0.005)
0.000
(0.002)
0.004
(0.264)
0.006
(0.099)
0.005
(0.061)
0.002
(0.029)
0.006
(0.040)
0.004
(0.017)
0.003
(0.011)
0.002
(0.005)
0.007
(0.028)
0.004
(0.012)
0.003
(0.008)
0.002
(0.004)
0.006
(0.023)
0.004
(0.009)
0.003
(0.006)
0.002
(0.003)
0.016
(0.011)
0.004
(0.007)
0.003
(0.004)
0.002
(0.002)
(1,40)
(1,60)
(1,120)
(20,20)
(20,40)
(20,60)
(20,120)
(40,20)
(40,40)
(40,60)
(40,120)
(60,20)
(60,40)
(60,60)
(60,120)
(120,20)
(120,40)
(120,60)
(120,120)
Note: (a) A lag length 5 and 2 of the Bartlett windows are used for the FMOLS(5) and FMOLS(2)
estimators. (b) 4 lags and 2 leads and 2 lags and 1 lead are used for the DOLS(4,2) and DOLS(2,1)
estimators. (c) $\sigma_{21}$ ~ U[-0.8, 0.8] and $\theta_{21}$ ~ U[-0.8, 0.8].
V. CONCLUSION
This chapter discusses limiting distributions for the OLS, FMOLS, and DOLS
estimators in a cointegrated regression. We also investigate the finite sample
properties of the OLS, FMOLS, and DOLS estimators. The results from
Monte Carlo simulations can be summarized as follows: First, for the
homogeneous panel, when the serial correlation parameter, $\theta_{21}$, and the
endogeneity parameter, $\sigma_{21}$, are both negative, the OLS is the most biased
estimator. The OLS is biased in almost all cases for the heterogeneous panel.
Second, the FMOLS is more biased than the OLS when 21 0 and 21 > 0 for
the homogeneous panel. The FMOLS is severely biased for the heterogeneous
panel in almost all trials. This indicates that the failure of the non-parametric
correction is very serious, especially in the heterogeneous panel. Third, DOLS
performs very well in all cases for both the homogeneous and heterogeneous
panels. Increasing the number of leads and lags reduces the bias of the DOLS
substantially. This was predicted by the asymptotic theory in Theorem 3.
Fourth, the sequential limit theory approximates the limit distributions of the
DOLS and its t-statistic very well. All in all, our findings are summarized as
follows:
(i) The OLS estimator has a non-negligible bias in finite samples.
(ii) The FMOLS estimator does not improve over the OLS estimator in
general.
(iii) The FMOLS estimator is complicated by the dependence of the correction
terms upon the preliminary estimator (here we use OLS), which may be
very biased in finite samples with panel data. More seriously, the failure
of the non-parametric correction for the FMOLS in panel data could be
severe. This indicates that the DOLS estimator may be more promising
than the OLS or FMOLS estimators in estimating cointegrated panel
regressions.
Table 10.
(N,T)
OLS
FMOLS(5)
FMOLS(2)
DOLS(4,2)
DOLS(2,1)
(1,20)
0.893
(1.390)
0.861
(1.265)
0.844
(1.233)
0.845
(1.212)
1.221
(1.578)
1.629
(1.344)
1.774
(1.282)
1.957
(1.239)
1.612
(1.640)
2.194
(1.392)
2.417
(1.306)
2.832
(1.234)
1.946
(1.697)
2.715
(1.389)
3.045
(1.328)
3.346
(1.250)
2.675
(1.720)
3.802
(1.408)
4.269
(1.336)
4.715
(1.250)
0.588
(2.473)
0.101
(1.849)
0.095
(1.579)
0.372
(1.336)
2.411
(1.902)
2.899
(1.345)
3.031
(1.195)
3.095
(1.047)
4.381
(1.882)
4.807
(1.341)
4.905
(1.199)
4.886
(1.059)
4.408
(1.884)
5.171
(1.320)
5.361
(1.170)
5.420
(1.033)
6.382
(1.878)
7.399
(1.314)
7.633
(1.162)
7.723
(1.045)
0.058
(1.643)
0.280
(1.331)
0.347
(1.207)
0.459
(1.139)
2.530
(1.192)
2.518
(0.999)
2.508
(0.952)
2.466
(0.907)
4.079
(1.191)
3.969
(1.004)
3.932
(0.960)
3.839
(0.911)
4.474
(1.182)
4.407
(0.976)
4.380
(0.933)
4.281
(0.889)
6.383
(1.169)
6.272
(0.967)
6.209
(0.931)
6.084
(0.897)
0.093
(3.303)
0.009
(1.980)
0.016
(1.729)
0.016
(1.510)
0.010
(1.983)
0.059
(1.485)
0.004
(1.329)
0.046
(1.197)
0.039
(1.987)
0.068
(1.472)
0.007
(1.319)
0.099
(1.181)
0.041
(1.932)
0.110
(1.452)
0.027
(1.307)
0.105
(1.181)
0.073
(1.939)
0.145
(1.444)
0.047
(1.307)
0.136
(1.178)
0.029
(2.156)
0.106
(1.618)
0.119
(1.489)
0.101
(1.405)
0.219
(1.468)
0.271
(1.259)
0.347
(1.184)
0.393
(1.121)
0.365
(1.466)
0.432
(1.233)
0.515
(1.169)
0.608
(1.099)
0.408
(1.449)
0.472
(1.221)
0.572
(1.165)
0.697
(1.099)
0.580
(1.439)
0.683
(1.215)
0.803
(1.165)
0.977
(1.098)
(1,40)
(1,60)
(1,120)
(20,20)
(20,40)
(20,60)
(20,120)
(40,20)
(40,40)
(40,60)
(40,120)
(60,20)
(60,40)
(60,60)
(60,120)
(120,20)
(120,40)
(120,60)
(120,120)
Note: (a) A lag length 5 and 2 of the Bartlett windows are used for the FMOLS(5) and FMOLS(2)
estimators. (b) 4 lags and 2 leads and 2 lags and 1 lead are used for the DOLS(4,2) and DOLS(2,1)
estimators. (c) $\sigma_{21}$ ~ U[-0.8, 0.8] and $\theta_{21}$ ~ U[-0.8, 0.8].
ACKNOWLEDGMENTS
We thank Suzanne McCoskey, Peter Pedroni, Andrew Levin and participants of
the 1998 North American Winter Meetings of the Econometric Society for
helpful comments and Bangtian Chen for his research assistance on an earlier
draft of this chapter. Thanks also go to Denise Paul for correcting my English
and carefully checking the manuscript to enhance its readability. A Gauss
program for this paper can be retrieved from http://web.syr.edu/~cdkao.
Address correspondence to: Chihwa Kao, Center for Policy Research,
426 Eggers Hall, Syracuse University, Syracuse, NY 13244-1020; e-mail:
cdkao@maxwell.syr.edu.
REFERENCES
Baltagi, B. (1995). Econometric Analysis of Panel Data. New York: John Wiley and Sons.
Baltagi, B., & Kao, C. (2000). Nonstationary Panels, Cointegration in Panels and Dynamic Panels:
A Survey. Advances in Econometrics, 15, 7-51.
Breitung, J., & Meyer, W. (1994). Testing for Unit Roots in Panel Data: Are Wages on Different
Bargaining Levels Cointegrated? Applied Economics, 26, 353-361.
Chen, B., McCoskey, S., & Kao, C. (1999). Estimation and Inference of a Cointegrated Regression
in Panel Data: A Monte Carlo Study. American Journal of Mathematical and Management
Sciences, 19, 75-114.
Gonzalo, J. (1994). Five Alternative Methods of Estimating Long-Run Equilibrium Relationships.
Journal of Econometrics, 60, 203-233.
Hsiao, C. (1986). Analysis of Panel Data. Cambridge: Cambridge University Press.
Im, K., Pesaran, H., & Shin, Y. (1995). Testing for Unit Roots in Heterogeneous Panels.
Manuscript, University of Cambridge.
Kao, C. (1999). Spurious Regression and Residual-Based Tests for Cointegration in Panel Data.
Journal of Econometrics, 90, 1-44.
Kao, C., & Chen, B. (1995). On the Estimation and Inference for Cointegration in Panel Data
When the Cross-Section and Time-Series Dimensions are Comparable. Manuscript, Center
for Policy Research, Syracuse University.
Levin, A., & Lin, C. F. (1993). Unit Root Tests in Panel Data: New Results. Discussion paper,
Department of Economics, UC-San Diego.
Maddala, G. S., & Wu, S. (1999). A Comparative Study of Unit Root Tests with Panel Data and
a New Simple Test: Evidence From Simulations and the Bootstrap. Oxford Bulletin of
Economics and Statistics, 61, 631-652.
McCoskey, S., & Kao, C. (1998). A Residual-Based Test of the Null of Cointegration in Panel
Data. Econometric Reviews, 17, 57-84.
Pesaran, H., & Smith, R. (1995). Estimating Long-Run Relationships from Dynamic
Heterogeneous Panels. Journal of Econometrics, 68, 79-113.
Pedroni, P. (1997). Panel Cointegration: Asymptotics and Finite Sample Properties of Pooled Time
Series Tests with an Application to the PPP Hypothesis. Working paper, Department of
Economics, No. 95013, Indiana University.
Pedroni, P. (1996). Fully Modified OLS for Heterogeneous Cointegrated Panels and the Case of
Purchasing Power Parity. Working paper, Department of Economics, No. 9620, Indiana
University.
Phillips, P. C. B., & Durlauf, S. N. (1986). Multiple Time Series Regression with Integrated
Processes. Review of Economic Studies, 53, 473-495.
Phillips, P. C. B., & Hansen, B. E. (1990). Statistical Inference in Instrumental Variables
Regression with I(1) Processes. Review of Economic Studies, 57, 99-125.
Phillips, P. C. B., & Loretan, M. (1991). Estimating Long-Run Economic Equilibria. Review of
Economic Studies, 58, 407-436.
Phillips, P. C. B., & Moon, H. (1999). Linear Regression Limit Theory for Non-stationary Panel
Data. Econometrica, 67, 1057-1111.
Phillips, P. C. B., & Solo, V. (1992). Asymptotics for Linear Processes. Annals of Statistics, 20,
971-1001.
Quah, D. (1994). Exploiting Cross Section Variation for Unit Root Inference in Dynamic Data.
Economics Letters, 44, 9-19.
Saikkonen, P. (1991). Asymptotically Efficient Estimation of Cointegrating Regressions.
Econometric Theory, 58, 1-21.
Summers, R., & Heston, A. (1991). The Penn World Table; An Expanded Set of International
Comparisons 1950-1988. Quarterly Journal of Economics, 106, 327-368.
Stock, J., & Watson, M. (1993). A Simple Estimator of Cointegrating Vectors in Higher Order
Integrated Systems. Econometrica, 61, 783-820.
APPENDIX
Proof of Theorem 3
First we write (19) in vector form:
$$
y_i = e\alpha_i + x_i\beta + Z_{iq}C + \dot{v}_i
    = x_i\beta + Z_i D + \dot{v}_i \quad \text{(say)},
$$
where $y_i$ is a T × 1 vector of $y_{it}$; e is a T × 1 unit vector; $Z_{iq}$ is the T × 2q matrix
of observations on the 2q regressors $\Delta x_{it-q}, \ldots, \Delta x_{it+q}$; $x_i$ is a T × k matrix
of $x_{it}$; C is a 2q × 1 vector of $c_{ij}$; $\dot{v}_i$ is a T × 1 vector of $\dot{v}_{it}$; $Z_i$ is a
T × (2q + 1) matrix, $Z_i = (e, Z_{iq})$; and D is a (2q + 1) × 1 vector of
parameters. Let $Q_i = I - Z_i(Z_i'Z_i)^{-1}Z_i'$. It follows that
$$
(\hat{\beta}_D - \beta) = \left[ \sum_{i=1}^{N} (x_i' Q_i x_i) \right]^{-1}
\left[ \sum_{i=1}^{N} (x_i' Q_i \dot{v}_i) \right].
$$
N
NT( D ) =
1
N
i=1
1
(xiQi xi)
T2
1
=
N
6iT
1
i=1
1
i=1
i=1
1
(xiQiv i)
T
1
N
N
= [6NT] 1[N5NT],
1
where 5NT =
N
1
N
N
5iT
i=1
N
1
1
5iT, 5iT = (xiQiv i), 6NT =
T
N
i=1
1
(xiQi xi)
T2
1
(xiWT xi) + op(1)
T2
Tq
1
(xit x i)(xit x i) + op(1)
= 2
T t=q+1
i,
B iB
and
1
5iT = (xiQiv i)
T
1
= (xiWTv i) + op(1)
T
Tq
1
(xit x i)vit + op(1)
=
T t=q+1
B dBui+ ,
1
(xiQi xi).
T2
1
ee. Then applying
T
1
the multivariate Lindeberg-Levy central limit theorem to
B idBui+ and
N
N
1
i as in Theorem 2, we have
B iB
combining this with the limit of
N i=1
Bi and WT = IT
N
1
N
1
i
B iB
N
i=1
Proof of Theorem 5
N
D ) = 1
NT(*
N
i=1
1
(x*
i Q*
i x*
i)
T2
1
=
N
8iT
i=1
1
1
N
N
i=1
1
(x*
*i )
i Q*
iv
T
1
N
N
7iT
i=1
= [8NT] 1[N7NT],
N
where
8iT =
1
7NT =
N
1
(x*
i Q*
i x*
i).
T2
i=1
1
N
7iT,
1
7iT = (x*
*i),
i Q*
i v
T
1
8NT =
N
i=1
8iT,
and
1
(x*
i Q*
i x*
i)
T2
1
(x*
i W*
T x*
i ) + op(1)
T2
T qi
1
(x*it x *i )(x*it x *i ) + op(1)
= 2
T t=q +1
iW
i,
W
and
1
7iT = (x*
*i)
i Q*
i v
T
1
*i ) + op(1)
= (x*
i WT v
T
T qi
1
(x*it x *i )v*it + op(1)
=
T t=q +1
i
idVi,
W
I. INTRODUCTION
The work of Perron (1989) has inspired extensive research on testing for unit
roots in the presence of structural change. Banerjee, Lumsdaine & Stock
(1992), Zivot & Andrews (1992), and Perron (1997), among many others,
develop tests which allow the break to be determined endogenously, and
Lumsdaine & Papell (1997) extend the tests to allow for two breaks. Starting
with Levin & Lin (1992), much work has also been done on testing for unit
roots in panels, including papers by Im, Pesaran & Shin (1997), Maddala & Wu
(1999), and Bowman (1999).
Nonstationary Panels, Panel Cointegration and Dynamic Panels, Volume 15, pages 223-238.
Copyright 2000 by Elsevier Science Inc.
All rights of reproduction in any form reserved.
ISBN: 0-7623-0688-2
This chapter takes a small step towards combining the two research agendas.
We propose a unit root test for non-trending data in the presence of a one-time
change in the mean for a heterogeneous panel. The date of the break, which is
common across the countries of the panel, is determined endogenously and,
in the additive outlier framework, is assumed to occur instantaneously. The
speed of mean reversion is also common across countries. The intercepts,
coefficients on the break dummy variable, and serial correlation structure,
however, are country specific.
In the context of testing for a unit root in the presence of structural change,
our test is most closely related to the work of Perron & Vogelsang (1992). They
develop a test for a unit root in non-trending data in the presence of a one-time
change in the mean of a single series, with the date of the change determined
endogenously. In the panel unit root context, the most closely related work is
Papell (1997), who utilizes a feasible generalized least squares (SUR) method
which allows for both contemporaneous and heterogeneous serial correlation.
Levin & Lin (1992) and Bowman (1999) show that, in the absence of
structural change, panel unit root tests have good power in moderately sized
samples of 10 or more countries, even with fairly long persistence. We conduct
two power experiments, both involving panels of non-trending, stationary
series with a one-time change in the mean. First, using conventional panel unit
root tests, we find very low power to reject the unit root null. Second, using
tests that incorporate structural change, the power is much improved.
We apply the test to a data set of annual unemployment rates for 17 OECD
countries from 1955 to 1990. Using the panel tests in the presence of structural
change, we find much stronger rejections of unit roots than can be found with
univariate tests that do not incorporate structural change, panel tests that do not
incorporate structural change, or univariate tests that do incorporate structural
change.
with structural change. While our tests are for non-trending data, an extension
to trending data would be straightforward.
The most common tests for unit roots are Augmented Dickey-Fuller (ADF) tests.
ADF tests for non-trending data involve running the following regression:
$$
u_t = \mu + \alpha u_{t-1} + \sum_{i=1}^{k} c_i \Delta u_{t-i} + \varepsilon_t, \qquad (1)
$$
where $u_t$ is the variable of interest. The null hypothesis of a unit root is rejected
if the value of the t-statistic for $\alpha$ (in absolute value) is greater than the
appropriate critical value. While the critical values are non-standard, they are
readily available.$^{1}$
There is substantial evidence that the lag truncation parameter k is best
selected by data-dependent methods rather than by choosing a fixed k a
priori. We follow the method suggested by Campbell & Perron (1991), Hall
(1994), and Ng & Perron (1995). Start with an upper bound kmax on k. If the
t-statistic on the coefficient of the last lag is significant (using the 10% value of
the asymptotic normal distribution, 1.645), then k = kmax. If it is not significant,
then k is lowered by one. This procedure is repeated until the last lag becomes
significant. If no lag is significant, then k is chosen to equal zero.
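The general-to-specific rule just described can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `last_lag_tstat` and `select_lag` are hypothetical, each candidate k is fitted by ordinary least squares on all observations available for that k (the text does not say whether a common sample is used), and the regression is written with the first difference on the left-hand side, which leaves the t-statistic on the last lagged difference unchanged.

```python
import numpy as np

def last_lag_tstat(u, k):
    """t-statistic on the k-th lagged difference in an ADF regression with k lags."""
    du = np.diff(u)                                  # du[s] is the change dated s + 1
    t = np.arange(k + 1, len(u))                     # dates with all k lags available
    X = np.column_stack([np.ones(t.size), u[t - 1]] +
                        [du[t - i - 1] for i in range(1, k + 1)])
    y = du[t - 1]
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    s2 = resid @ resid / (t.size - X.shape[1])       # residual variance
    cov = s2 * np.linalg.inv(X.T @ X)
    return b[-1] / np.sqrt(cov[-1, -1])

def select_lag(u, kmax, crit=1.645):
    """Start at kmax and lower k by one until the last lag is significant."""
    for k in range(kmax, 0, -1):
        if abs(last_lag_tstat(u, k)) > crit:
            return k
    return 0                                         # no lag significant

# Toy series (assumed): a unit root process whose changes depend on their second lag.
rng = np.random.default_rng(0)
e = rng.standard_normal(500)
d = np.zeros(500)
for s in range(2, 500):
    d[s] = 0.8 * d[s - 2] + e[s]
u = np.cumsum(d)
print(select_lag(u, kmax=2))
```

The critical value is exposed as an argument so that a different significance level can be tried without changing the search itself.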
Panel unit root tests in the ADF framework for non-trending data with
heterogeneous intercepts, which are equivalent to including country-specific
dummy variables, involve estimating the following regressions:
$$
u_{jt} = \mu_j + \alpha u_{jt-1} + \sum_{i=1}^{k_j} c_{ji} \Delta u_{jt-i} + \varepsilon_{jt}. \qquad (2)
$$
$$
\tilde{u}_t = \sum_{i=0}^{k} \omega_i DTB_{t-i} + \alpha \tilde{u}_{t-1}
+ \sum_{i=1}^{k} c_i \Delta \tilde{u}_{t-i} + \varepsilon_t, \qquad (4)
$$
where $\tilde{u}_t$ is the estimated residual from equation (3).$^{6}$ TB is the break date,
$DTB_t$ = 1 if t = TB + 1, 0 otherwise, and $DU_t$ = 1 if t > TB, 0 otherwise.$^{7}$
Equations (3) and (4) are estimated sequentially for each break year
TB = k + 2, . . . , T - 1, where T is the number of observations. The break date
is chosen to minimize the t-statistic for $\alpha$, and data-dependent methods are used
to select the lag length k. The null hypothesis of a unit root is rejected if the
t-statistic on $\alpha$ is sufficiently large (in absolute value). The finite sample critical
values of Perron & Vogelsang (1992) can be used to assess the significance of
the unit root statistic.
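The sequential search over break dates can be sketched as follows. The Python code is a hedged illustration for a single series, not the authors' procedure: it assumes that the first step regresses the series on a constant and the level-shift dummy DU (a form that is not shown in this section), it holds the lag length k fixed instead of selecting it by a data-dependent rule, and it writes the second step with the first difference of the residual on the left-hand side, so that the reported statistic is the usual unit root t-statistic. The names `ao_tstat` and `search_break` are hypothetical.

```python
import numpy as np

def ao_tstat(u, tb, k):
    """Unit root t-statistic for one candidate break date tb (1-based) and lag length k."""
    T = len(u)
    tt = np.arange(1, T + 1)                         # 1-based time index, as in the text
    du_dummy = (tt > tb).astype(float)               # DU_t
    dtb = (tt == tb + 1).astype(float)               # DTB_t
    # Step 1 (assumed form): remove a constant and a one-time shift in the mean
    X1 = np.column_stack([np.ones(T), du_dummy])
    resid = u - X1 @ np.linalg.lstsq(X1, u, rcond=None)[0]
    # Step 2: regression on the lagged residual, lagged differences, and break dummies
    d = np.diff(resid)                               # d[p - 1] is the change at position p
    p = np.arange(k + 1, T)                          # 0-based rows with all lags available
    cols = [resid[p - 1]]
    cols += [d[p - i - 1] for i in range(1, k + 1)]
    cols += [dtb[p - i] for i in range(0, k + 1)]
    X2 = np.column_stack(cols)
    X2 = X2[:, np.any(X2 != 0, axis=0)]              # drop dummies falling outside the sample
    y2 = d[p - 1]
    b = np.linalg.lstsq(X2, y2, rcond=None)[0]
    e = y2 - X2 @ b
    s2 = e @ e / (len(y2) - X2.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(X2.T @ X2)[0, 0])
    return b[0] / se

def search_break(u, k):
    """Try TB = k + 2, ..., T - 1 and keep the date with the smallest t-statistic."""
    T = len(u)
    stats = {tb: ao_tstat(u, tb, k) for tb in range(k + 2, T)}
    tb_hat = min(stats, key=stats.get)
    return tb_hat, stats[tb_hat]

# Toy series (assumed): stationary AR(1) around a mean that shifts after period 50.
rng = np.random.default_rng(0)
T = 100
noise = np.zeros(T)
for s in range(1, T):
    noise[s] = 0.5 * noise[s - 1] + rng.standard_normal()
u = 2.0 * (np.arange(1, T + 1) > 50) + noise
print(search_break(u, k=1))
```

The minimized statistic has to be compared with critical values that account for the search over break dates, not with standard Dickey-Fuller values.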
We proceed to construct a test for unit roots in panel data in the presence of
structural change. With heterogeneous intercepts, the panel AO model is
estimated by the following two equations:
Table 1. Finite Sample Critical Values for Panel Unit Root Tests without
Structural Change

                 N = 5     N = 10    N = 15    N = 20
1%    T = 50     5.525     6.964     8.327     9.775
      T = 100    5.272     6.604     7.675     8.683
      T = 200    5.121     6.251     7.234     8.119
5%    T = 50     4.789     6.244     7.603     8.940
      T = 100    4.641     5.923     6.964     7.955
      T = 200    4.512     5.640     6.629     7.512
10%   T = 50     4.452     5.857     7.221     8.528
      T = 100    4.314     5.594     6.621     7.587
      T = 200    4.177     5.317     6.308     7.145
(5)
and
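$$
\tilde{u}_{jt} = \sum_{i=0}^{k_j} \omega_{ji} DTB_{jt-i} + \alpha \tilde{u}_{jt-1}
+ \sum_{i=1}^{k_j} c_{ji} \Delta \tilde{u}_{jt-i} + \varepsilon_{jt}, \qquad (6)
$$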
Table 2. Finite Sample Critical Values for Panel Unit Root Tests with
Structural Change

                 N = 5     N = 10    N = 15    N = 20
1%    T = 50     7.329     9.056     10.940    12.667
      T = 100    6.941     8.658     9.995     11.103
      T = 200    6.915     8.415     9.571     10.672
5%    T = 50     6.613     8.484     10.279    12.011
      T = 100    6.432     8.046     9.461     10.618
      T = 200    6.334     7.852     9.105     10.225
10%   T = 50     6.344     8.203     10.025    11.705
      T = 100    6.113     7.785     9.184     10.361
      T = 200    6.051     7.553     8.815     9.958
it is probable that panel unit root tests will incorrectly find that unemployment
is integrated, rather than stationary around a one-time shift in mean.
Table 4 demonstrates that allowing for a mean shift greatly increases power
relative to Table 3. For all values of $\alpha$ and $\delta$ considered, the power is at least
50%, and often 100%, for a panel of at least 10 countries with at least 100
observations. Indeed, for T = 100, there are only two instances in which the
power is less than 50%, and those occur for the smallest panel considered,
N = 5, and the most persistent value of $\alpha$, 0.95.
Table 3.
5
10
15
20
5
10
15
20
5
10
15
20
= 0.95, = 1.0
50
100
200
0.0004
0.0008
0.0000
0.0000
0.0008
0.0004
0.0000
0.0000
0.0008
0.0000
0.0000
0.0000
5
10
15
20
50
100
200
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
= 0.90, = 0.5
= 0.90, = 1.0
50
100
200
0.0180
0.0116
0.0120
0.0084
0.0560
0.1204
0.2300
0.3084
0.3780
0.8312
0.9608
0.9924
5
10
15
20
50
100
200
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0000
0.0008
= 0.80, = 0.5
= 0.80, = 1.0
50
100
200
0.3652
0.6848
0.8216
0.8732
0.8400
0.9908
0.9992
1.0000
0.9872
1.0000
1.0000
1.0000
5
10
15
20
50
100
200
0.0036
0.0052
0.0052
0.0044
0.0336
0.1784
0.4208
0.6432
0.2052
0.6876
0.9124
0.9872
Table 4.
5
10
15
20
5
10
15
20
5
10
15
20
= 0.95, = 1.0
50
100
200
0.0710
0.0840
0.0810
0.0520
0.2320
0.5160
0.7250
0.8730
0.8460
0.9960
1.0000
1.0000
5
10
15
20
50
100
200
0.0220
0.0160
0.0060
0.0020
0.4130
0.7570
0.8770
0.9570
0.9980
1.0000
1.0000
1.0000
= 0.90, = 0.5
= 0.90, = 1.0
50
100
200
0.2750
0.4730
0.5730
0.6600
0.7790
0.9930
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
5
10
15
20
50
100
200
0.2920
0.5150
0.5600
0.5590
0.9430
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
= 0.80, = 0.5
= 0.80, = 1.0
50
100
200
0.8000
0.9910
0.9990
0.9990
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
5
10
15
20
50
100
200
0.8000
0.8520
0.9960
0.9990
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
1.0000
tests, as in equation (1), for each of the 17 countries in the sample. The results
of the ADF tests are reported in Table 5. We set kmax to 4. Using critical values
from MacKinnon (1991), we find that the null of a unit root cannot be rejected
for any of the series at the 10% level.
Table 5.
Country
Australia
0.437
(1.60)
0.188
(1.26)
0.337
(1.48)
0.819
(1.61)
0.222
(0.82)
0.359
(1.42)
0.176
(1.38)
0.239
(1.19)
0.470
(1.36)
0.597
(2.04)
0.210
(1.91)
0.248
(1.21)
0.435
(1.01)
0.369
(1.85)
0.413
(1.82)
0.391
(1.38)
1.389
(2.14)
0.936
(1.15)
0.915
(1.28)
0.953
(1.40)
0.893
(1.46)
0.993
(0.14)
0.912
(1.26)
0.987
(0.54)
0.929
(1.32)
0.952
(1.28)
0.885
(2.08)
0.883
(2.04)
0.966
(0.96)
0.835
(0.84)
0.945
(2.25 )
0.760
(1.37)
0.947
(1.14)
0.766
(2.16)
Austria
Belgium
Canada
Denmark
Finland
France
Germany
Ireland
Italy
Japan
Netherlands
Norway
Spain
Sweden
U.K.
U.S.A.
1
1
0
4
2
1
1
1
3
3
2
2
3
2
2
0
Note: The critical values for the ADF test, calculated from MacKinnon (1991) with 36
observations, are 3.62 (1%), 2.94 (5%), and 2.61 (10%). Numbers in parentheses are
t-statistics.
One possible reason for the failure of the ADF tests to reject the unit root
hypothesis is the relatively short (36 years) time span of the data.[10] We
investigate this possibility by conducting panel unit root tests, described by
equation (2), to exploit cross-section variability among the 17 unemployment
rates. The results of the panel unit root tests are reported in Table 6.[11] The null
hypothesis of a unit root cannot be rejected, even at the 10% level, either for
the OECD countries as a whole or for smaller panels consisting of European
(13), European Community (EC) (9), European Free Trade Area (EFTA) (4),
Non-European (4), or Non-EC (EFTA plus Non-Europe) (8) countries.[12]
The results for the univariate AO model of equations (3) and (4) are reported
in Table 7. The null hypothesis of a unit root is rejected for Finland, Ireland and
Spain at the 1% level, Belgium, France, Italy and Norway at the 5% level, and
Austria, Canada, Denmark, and the United Kingdom at the 10% level. The
structural breaks are all positive, reflecting the general rise in unemployment
among the OECD countries. The structural break occurs between 1974 and
1976 for nine out of eleven countries for which the unit root null can be
rejected.
The results of the panel unit root tests from equations (5) and (6) that account
for structural change, along with the associated critical values, are reported in
Table 6.

Group         N     Coefficient   t
OECD          17    0.924         6.40
EUROPE        13    0.936         4.73
EC            9     0.941         3.96
NON-EC        8     0.846         4.82
EFTA          4     0.868         3.04
NON-EUROPE    4     0.863         3.52

Critical Values
Group         1%      5%      10%
OECD          10.16   9.00    8.48
EUROPE        8.52    7.58    7.16
EC            7.09    6.28    5.86
NON-EC        6.83    5.99    5.58
EFTA          5.45    4.67    4.27
NON-EUROPE    5.45    4.67    4.27
Table 8.[13] The unit root hypothesis is strongly (at the 1% level) rejected in favor
of stationarity with a one-time break in 1975 for the OECD, European, and EC
countries and a break in 1973 for the non-EC and EFTA countries. For the non-
Table 7. The Additive Outlier Model
Country
Break Year
Australia
1973
1979
Belgium
1975
Canada
1976
Denmark
1975
Finland
1974
France
1975
Germany
1972
Ireland
1976
Italy
1976
Japan
1969
Netherlands
1976
Norway
1986
Spain
1974
Sweden
1964
U.K.
1974
U.S.A.
1974
4.536
(10.61)
1.460
(6.42)
6.908
(13.99)
3.754
(8.17)
5.696
(11.93)
2.885
(8.65)
5.914
(11.81)
3.317
(6.01)
7.287
(8.19)
1.907
(4.20)
0.423
(2.38)
6.662
(10.55)
1.781
(4.91)
11.463
(8.20)
0.334
(2.01)
5.604
(8.82)
2.141
(5.67)
0.609
(3.99)
0.623
(4.33)c
0.404
(4.96)b
0.277
(4.33)c
0.513
(4.34)c
0.227
(6.64)a
0.660
(4.95)b
0.732
(3.63)
0.657
(7.58)a
0.702
(4.75)b
0.783
(3.53)
0.606
(4.06)
0.303
(4.78)b
0.685
(7.61)a
0.536
(3.87)
0.493
(4.60)c
0.251
(4.10)
Austria
2.053
(6.99)
1.704
(13.55)
2.771
(8.70)
5.145
(17.95)
2.557
(8.29)
1.915
(8.61)
2.052
(6.35)
1.417
(3.63)
5.627
(10.14)
4.650
(16.43)
1.653
(12.19)
1.945
(4.94)
2.094
(16.96)
2.400
(2.57)
1.470
(10.40)
2.715
(6.41)
4.840
(19.21)
1
4
3
3
1
4
1
3
3
3
2
1
4
1
4
3
Note: The critical values for the AO model, reported in Perron and Vogelsang (1992), are 5.20
(1%), 4.67 (5%), and 4.33 (10%). Numbers in parentheses are t-statistics. Superscripts a, b, and
c denote rejection of the unit root null at the 1%, 5%, and 10% significance levels respectively.
Table 8.

Group         N     Break Year   Coefficient   t
OECD          17    1975         0.638         21.91a
EUROPE        13    1975         0.651         18.92a
EC            9     1975         0.670         16.15a
NON-EC        8     1973         0.550         10.36a
EFTA          4     1973         0.557         8.45a
NON-EUROPE    4     1975         0.629         5.61

Critical Values
Group         1%      5%      10%
OECD          12.38   11.56   11.16
EUROPE        10.89   10.00   9.63
EC            9.13    8.35    7.97
NON-EC        8.60    8.01    7.66
EFTA          7.18    6.46    6.11
NON-EUROPE    7.18    6.46    6.11

Note: Superscripts a, b, and c denote rejection of the unit root null at the 1%, 5%, and 10%
significance levels respectively.
Europe countries, the unit root null could not be rejected at the 10% level. This
panel, however, consists of only four countries.
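The rejection decisions reported for Table 8 can be checked mechanically. The following is a minimal sketch, not part of the original chapter: the function name `rejection_level` is hypothetical, and the statistics are compared in absolute value because the table prints them without signs.

```python
# Panel t-statistics and critical values as printed in Table 8 (absolute values).
TABLE_8 = {
    "OECD":       (21.91, (12.38, 11.56, 11.16)),
    "EUROPE":     (18.92, (10.89, 10.00, 9.63)),
    "EC":         (16.15, (9.13, 8.35, 7.97)),
    "NON-EC":     (10.36, (8.60, 8.01, 7.66)),
    "EFTA":       (8.45, (7.18, 6.46, 6.11)),
    "NON-EUROPE": (5.61, (7.18, 6.46, 6.11)),
}
LEVELS = ("1%", "5%", "10%")


def rejection_level(stat, critical_values):
    """Return the strictest level at which |stat| exceeds the critical value, else None."""
    for level, cv in zip(LEVELS, critical_values):
        if abs(stat) > abs(cv):
            return level
    return None


decisions = {group: rejection_level(stat, cvs) for group, (stat, cvs) in TABLE_8.items()}
for group, level in decisions.items():
    print(f"{group:<11} {level or 'not rejected at 10%'}")
```

Under these inputs every group except NON-EUROPE rejects at the 1% level, which agrees with the superscripts in the table.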
V. CONCLUSIONS
The purpose of this chapter was to develop and implement panel unit root tests
in the presence of structural change. To that end, we combine methods from
two previously disjoint literatures: testing for a unit root in panels and testing
for a unit root in the presence of structural change. The resultant test allows for
both serial and contemporaneous correlation, both of which are often found to
be important in the panel unit root context.
The motivation for the test comes from the hypothesis that conventional
panel unit root tests, those that do not incorporate structural change, will have
low power if the data are stationary with structural change. While this is well
established in the univariate literature, it is only a conjecture in the panel
context. We investigate this conjecture by conducting power experiments for
panels of non-trending, stationary series with a one-time change in the mean,
and find that conventional panel unit root tests generally have very low power.
We then conduct the same experiments using methods that test for a unit root
in the presence of structural change, and find that the power of the tests is much
improved.
We apply our test to a data set of annual unemployment rates for 17 OECD
countries from 1955 to 1990. For these countries, unit root tests that do not
incorporate structural change, whether univariate or panel, provide no evidence
against the unit root null. While univariate tests that incorporate structural
change do provide some evidence against unit roots, the short span of the data
suggests that power may be problematic. Using our panel test with a one-time
structural change, we find very strong evidence of regime-wise stationarity.
This evidence is both for the full panel and for a number of smaller subpanels.
Our work could be extended in a number of directions. While the test
incorporates a one-time break in non-trending data, extensions to multiple
breaks and/or trending data would be straightforward. Once variety in the
number of breaks, type of breaks, number of countries, and number of
observations is allowed for, the number of possibilities increases rapidly. With
the availability of programs for calculating critical values, we suspect that it
will be more fruitful to develop tests on a case-by-case basis rather than attempt
to achieve generality.[14]
NOTES
1. MacKinnon (1991) shows how to calculate critical values for ADF tests for any
sample size.
2. If the coefficient is not equated across countries, as in Breuer, McNown &
Wallace (2000), the gains in power over univariate methods are much smaller. Im,
Pesaran & Shin (1997) report higher power without equating the coefficient across
countries, but their alternative hypothesis is that one member of the panel, rather
than all members, is stationary.
REFERENCES
Abuaf, N., & Jorion, P. (1990). Purchasing Power Parity in the Long Run. Journal of Finance, 45,
157–174.
Bai, J., & Perron, P. (1998). Estimating and Testing Linear Models with Multiple Structural
Changes. Econometrica, 66, 47–78.
Banerjee, A., Lumsdaine, R. L., & Stock, J. H. (1992). Recursive and Sequential Tests of the Unit
Root and Trend-Break Hypotheses: Theory and International Evidence. Journal of
Business and Economic Statistics, 10, 271–288.
Bowman, D. (1999). Efficient Tests for Autoregressive Unit Roots in Panel Data. IFDP #646,
Board of Governors of the Federal Reserve System.
Breuer, J., McNown, R., & Wallace, M. (2000). The Quest for Purchasing Power Parity With A
Series-Specific Test using Panel Data. Working paper, Department of Economics,
University of South Carolina.
Campbell, J. Y., & Perron, P. (1991). Pitfalls and Opportunities: What Macroeconomists Should
Know About Unit Roots. In: O. J. Blanchard & S. Fischer (Eds), NBER Macroeconomics
Annual (pp. 141–201). Cambridge: MIT Press.
Froot, K. A., & Rogoff, K. (1995). Perspectives on PPP and Long-Run Real Exchange Rates. In:
G. Grossman & K. Rogoff (Eds), Handbook of International Economics, Vol. 3 (pp. 1647–1688).
North Holland: Amsterdam.
Hall, A. R. (1994). Testing for a Unit Root in Time Series with Pretest Data-Based Model
Selection. Journal of Business and Economic Statistics, 12, 461–470.
Im, S., Pesaran, H., & Shin, Y. (1997). Testing for Unit Roots in Heterogeneous Panels. Working
paper, Department of Economics, University of Cambridge.
Layard, R., Nickell, S., & Jackman, R. (1991). Unemployment: Macroeconomic Performance and
The Labour Market. Oxford: Oxford University Press.
Levin, A., & Lin, C. F. (1992). Unit Root Tests in Panel Data: Asymptotic and Finite-Sample
Properties. Discussion paper 92-23, Department of Economics, University of California, San Diego.
Lumsdaine, R. L., & Papell, D. H. (1997). Multiple Trend Breaks and the Unit Root Hypothesis.
Review of Economics and Statistics, 79, 212–218.
Maddala, G. S., & Wu, S. (1999). A Comparative Study of Unit Root Tests with Panel Data and
a New Simple Test. Oxford Bulletin of Economics and Statistics, 61, 631–652.
MacKinnon, J. G. (1991). Critical Values for Cointegration Tests. In: R. F. Engle & C. W. J.
Granger (Eds), Long-Run Economic Relationships: Readings in Cointegration (pp. 267–276).
Oxford: Oxford University Press.
Ng, S., & Perron, P. (1995). Unit Root Tests in ARMA Models with Data Dependent Methods for
the Selection of the Truncation Lag. Journal of the American Statistical Association, 90,
268–281.
O'Connell, P. G. J. (1998). The Overvaluation of Purchasing Power Parity. Journal of
International Economics, 44, 1–20.
Papell, D. H. (1997). Searching for Stationarity: Purchasing Power Parity Under the Current Float.
Journal of International Economics, 43, 313–332.
Papell, D. H. (2000). The Great Appreciation, the Great Depreciation, and the Purchasing Power
Parity Hypothesis. Working paper, Department of Economics, University of Houston.
Papell, D. H., Murray, C. J., & Ghiblawi, H. (2000). The Structure of Unemployment. Review of
Economics and Statistics, 82, 309–315.
Perron, P. (1989). The Great Crash, the Oil Price Shock, and the Unit Root Hypothesis.
Econometrica, 57, 1361–1401.
Perron, P. (1997). Further Evidence on Breaking Trend Functions in Macroeconomic Variables.
Journal of Econometrics, 80, 355–385.
Perron, P., & Vogelsang, T. J. (1992). Non-stationarity and Level Shifts With An Application to
Purchasing Power Parity. Journal of Business and Economic Statistics, 10, 301–320.
Zivot, E., & Andrews, D. W. K. (1992). Further Evidence on the Great Crash, the Oil-Price Shock,
and The Unit Root Hypothesis. Journal of Business and Economic Statistics, 10,
251–270.
I. INTRODUCTION
In the last few years much new research has emerged that develops econometric
methods for panel data where both the numbers of cross section and time series
observations are large. This research is motivated by the increasing availability
of important panel data sets that cover large numbers of different countries,
sectors, and individuals over long periods of time. Many of these data sets
Nonstationary Panels, Panel Cointegration and Dynamic Panels, Volume 15, pages 239–274.
Copyright 2000 by Elsevier Science Inc.
All rights of reproduction in any form reserved.
ISBN: 0-7623-0688-2
HEIKKI KAUPPI
II. THEORY
In panel data limit theory we consider a double indexed process $X_{n,T}$, in which
both $n$ and $T$ tend to infinity. In general, the limit of $X_{n,T}$ depends on the
treatment of the indices $n$ and $T$, and the properties that link the two dimensions
of the process. Phillips & Moon (1999a) discuss different approaches. One
possibility is to allow $n$ and $T$ to pass to infinity along a diagonal path
determined by a monotonically increasing functional relation of the type
$T = T(n)$ as the index $n \to \infty$. This approach simplifies the asymptotic theory by
replacing $X_{n,T}$ with a single indexed process $X_{n,T(n)}$. However, a drawback of this
diagonal path limit theory is that the assumed expansion path $(n, T(n)) \to \infty$
may not provide an appropriate approximation for a given $(n, T)$ situation.
Furthermore, the limit theory is likely to depend on the specific functional
relation $T = T(n)$ that is used in the asymptotic development. Following Phillips
& Moon (1999a) we therefore focus on an alternative approach where $n$ and $T$
are allowed to tend to infinity simultaneously without imposing a specific
diagonal path for the divergence of the indices.
Merely as an auxiliary tool, we also consider a special form of multi-index
asymptotics, called the sequential limit theory. Again, this theory is introduced
by Phillips & Moon (1999a). The general idea of this approach is to derive limit
results in two steps. The first step is to fix one index, say n, and allow the other,
say T, to pass to infinity, giving an intermediate limit. The final limit result is
then obtained by letting n tend to infinity subsequently. While the sequential
limit theory can offer an easy route to a limit result it may give asymptotic
results that are misleading in cases where both indices tend to infinity
simultaneously (see Phillips & Moon (1999b)). Nevertheless, this theory can
often serve as a helpful tool to obtain conjectures about limit results that hold
under the more general joint limit theory.
$$X_{n,T} = \frac{1}{k_n}\sum_{i=1}^{n} Y_{i,T},$$
where the $Y_{i,T}$ are independent random vectors across $i$ and $k_n$ is either $n$ or $\sqrt{n}$.
A typical $Y_{i,T}$ component is a standardized sum of the time series component
of the panel data. Examples are given in the following section. To this end,
suppose we are interested in the probability limit of $X_{n,T} = \frac{1}{n}\sum_{i=1}^{n} Y_{i,T}$. Assume
$Y_{i,T} \Rightarrow Y_i$ as $T \to \infty$ for all $i$. Then, by the independence of $Y_{i,T}$ across $i$ for all $T$,
it follows that $X_{n,T} \Rightarrow X_n$ as $T \to \infty$ for all $n$, where $X_n = \frac{1}{n}\sum_{i=1}^{n} Y_i$. It should
be noticed that one has to assume that the $Y_i$ are defined on the same probability
space for all $i$ so that the sum of the limit random variables $\frac{1}{n}\sum_{i=1}^{n} Y_i$ is well
defined. Applying an appropriate law of large numbers to $X_n = \frac{1}{n}\sum_{i=1}^{n} Y_i$ then gives the
sequential limit of $X_{n,T}$. Let $X = \lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n} E(Y_i)$ and suppose that
$X_n \to_p X$, so that as $T \to \infty$ followed by $n \to \infty$,
$$X_{n,T} \to_p X.$$
This is a sequential probability limit result in the sense defined by Phillips &
Moon (1999a).
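A small simulation can make the double-index average concrete. The sketch below is an illustration under stated assumptions, not taken from the chapter: it takes $Y_{i,T} = T^{-2}\sum_{t=1}^{T} x_{i,t}^2$ for independent Gaussian random walks $x_{i,t}$ started at zero, for which the limit variate as $T \to \infty$ has expectation 1/2, so the panel average should settle near 1/2 when both indices are large. The function name `panel_average` is hypothetical.

```python
import numpy as np


def panel_average(n, T, rng):
    """X_{n,T} = (1/n) * sum_i Y_{i,T}, with Y_{i,T} = T^{-2} * sum_t x_{i,t}^2
    for independent Gaussian random walks x_{i,t} started at zero."""
    steps = rng.standard_normal((n, T))
    x = np.cumsum(steps, axis=1)          # one random walk per row
    y = (x ** 2).sum(axis=1) / T**2       # Y_{i,T}, one value per unit
    return y.mean()


rng = np.random.default_rng(0)
for n, T in [(10, 50), (100, 200), (2000, 200)]:
    print(n, T, round(panel_average(n, T, rng), 4))
```

The printed averages fluctuate noticeably for small $n$ and tighten as $n$ grows, which is the law-of-large-numbers step of the sequential argument.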
In general, the sequential probability limit X of Xn, T is not the same as the
probability limit of Xn, T under joint convergence of the indices n, T and may not
even exist or requires a different normalization. Examples are given in Phillips
& Moon (1999b). Therefore, an interesting question arises: when does the
sequential limit coincide with the joint limit? The following theorem is adopted
from Phillips & Moon (1999a, Theorem 1) and gives sufficient conditions
under which the joint probability limit and the sequential probability limit are
n
(i)
lim supn, T
1
n
i=1
n
1
(ii) lim supn, T
n
||E(Yi, T) E(Yi)|| = 0,
i=1
n
1
(iii) lim supn, T
n
i=1
1
(iv) lim supn
n
1
If limn
n
i=1
i=1
n
1
E(Yi) = X exists and Xn =
n
Yi X as n , then
i=1
Xn, T =
1
n
i=1
Theorem 1 gives fairly general conditions under which a joint probability limit
can be established. However, in many cases it may be rather tedious to verify
all the required conditions (i) through (iv) of the theorem. As shown by
Corollary 1 of Phillips & Moon (1999a) somewhat easier conditions can be
obtained in the special case, where the $Y_{i,T}$ are scaled variates of an iid process.
However, there are certainly various interesting situations where the
heterogeneity of the different panel members arises from other sources so that Corollary
1 of Phillips & Moon (1999a) cannot be applied. Therefore, for dealing with
heterogeneous panels of other types we have designed the following theorem.
The basic idea of Theorem 2 arises from Markov's law of large numbers that
applies in the case of independent variates $Z_i$ satisfying Markov's condition,
$E\|Z_i\|^{1+\delta} \le M < \infty$ for some $\delta > 0$ and for all $i$.
n
1
If limn
n
i=1
n
1
n
i=1
n
Yi, T, say.
n i = 1
(Examples are given in Phillips & Moon (1999a, b).) As to how to obtain
convergence in joint limits as $(T, n \to \infty)$, again, Phillips & Moon (1999a) give
some general results. Their Theorem 2 provides a joint central limit theorem for
$(T, n \to \infty)$ that employs a Lindeberg condition for double indexed processes. In
addition, their Theorem 3 gives a version which applies to iid variates scaled
differently across the cross section. Again, to deal with other types of
heterogeneity across the cross section we have developed the following version of the joint
central limit theorem.
Theorem 3. Suppose that Yi, T are independent scalar variables across i for all
T with E(Yi, T) = 0 and Var(Yi, T) = Vi, T. Assume the following conditions hold:
n
1
(i) limn, T
n
i=1
2+
i, T
(ii) supTE|Y |
Then,
n
Xn, T =
1
n
i=1
III. AN APPLICATION
Most of the recent applications of the new large n, T panel data limit theory have
involved studying and developing estimators and tests for panel cointegrating
regressions where the regressors are integrated of order one. In this section we
analyze problems that arise in these models when the regressors are nearly
rather than exactly integrated of order one. We start by introducing the model
and assumptions.
A. The Model
We focus on the simple two variable panel regression
$$y_{i,t} = \beta x_{i,t} + u_{i,t}, \tag{1}$$
$$x_{i,t} = \rho_i x_{i,t-1} + \varepsilon_{i,t}, \qquad \rho_i = \exp(c_i/T) \approx 1 + \frac{c_i}{T}, \tag{2}$$
$(t = 1, \dots, T,\ i = 1, \dots, n)$,
where the initial values $z_{i,0} = (y_{i,0}, x_{i,0})'$ are iid, $E\|z_{i,0}\|^4 < \infty$, and the errors are
specified below. To this end, notice that if $\rho_i = 1$ (i.e. $c_i = 0$) in (2) for each $i$, then
the $x_{i,t}$ are pure or exact unit root processes and the system given by equations
(1) and (2) coincides with the homogenous panel cointegration regression
studied by Phillips & Moon (1999a) and many others (for a survey, see Phillips
& Moon (1999b)). In these studies the regression coefficient in (1) is called a
cointegrating parameter and it represents a stationary relationship that holds
between yi, t and xi, t for every i. Such a common long-run relationship is often
predicted by economic theory and it is then of central interest to estimate and
test whether it satisfies theoretically sound restrictions. A typical example
involves testing for the existence of a purchasing power parity hypothesis in a
panel of suitably similar countries.
In contrast to the recent panel cointegration literature, we do not restrict
attention to models, where the regressors are generated by exact unit root
processes. Indeed, although most macroeconomic variables analyzed in the
recent panel cointegration studies display strong autocorrelation, there are
seldom strong prior reasons why the autoregressive parameter should be unity.
The problem is aggravated by the fact that unit root tests cannot reliably detect
small deviations from unity. Given this uncertainty about the unit roots, it is of
interest to study problems that arise in the statistical inference about the
regression parameter in (1) when the autoregressive parameters in (2) are close
to rather than exactly equal to one. From earlier literature we know that such
problematic near alternatives are best modeled by the local to unit root
parametrization $\rho_i = \exp(c_i/T) \approx 1 + c_i/T$ in (2) (see e.g. Elliott (1998) and Stock
(1997)). By this device it is possible to obtain asymptotic results that provide
reasonable approximations in cases where the regressors $x_{i,t}$ are stationary but
revert to their means so slowly that the standard fixed $\rho_i$ asymptotics fail to
attain satisfactory accuracy.
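To get a feel for the local to unit root parametrization, one can tabulate the root $\exp(c/T)$ next to its first-order approximation $1 + c/T$. This is a minimal sketch; the function name `local_to_unity_root` and the particular values of $c$ and $T$ are illustrative choices.

```python
import math


def local_to_unity_root(c, T):
    """Autoregressive root exp(c/T) and its first-order approximation 1 + c/T."""
    exact = math.exp(c / T)
    approx = 1.0 + c / T
    return exact, approx


for c in (0.0, -5.0, -10.0):
    for T in (50, 250):
        exact, approx = local_to_unity_root(c, T)
        print(f"c={c:6.1f} T={T:4d} exp(c/T)={exact:.5f} 1+c/T={approx:.5f}")
```

For fixed $c$ the root moves toward one as $T$ grows, and the gap between the two expressions shrinks at the rate $(c/T)^2$.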
We close this section by imposing the following assumption.
Assumption 1. The errors $\xi_{i,t} = (u_{i,t}, \varepsilon_{i,t})'$ are linear processes satisfying the
following conditions:
(a) $\xi_{i,t} = C(L)\eta_{i,t} = \sum_{j=0}^{\infty} C_j \eta_{i,t-j}$, where $\sum_{j=0}^{\infty} j^3\|C_j\| < \infty$,
(b) $\eta_{i,t} = (v_{i,t}, w_{i,t})'$, where $v_{i,t}$ and $w_{i,t}$ are mutually independent and iid across
$i$ and over $t$ with $E(v_{i,t}) = E(w_{i,t}) = 0$, $E(v_{i,t}^2) = E(w_{i,t}^2) = 1$, and
$E(v_{i,t}^4) = E(w_{i,t}^4) = \mu_4 < \infty$ for all $i$ and $t$.
Under Assumption 1 the error process in the system (1) and (2) satisfies the same
conditions as the error process of the homogenous panel cointegration
regression of Phillips & Moon (1999a, Assumptions 8 and 9).
B. Preliminary Analysis
For preliminary insights, we derive sequential limits for the pooled panel OLS
estimator,
$$\hat\beta = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T} x_{i,t}\,y_{i,t}}{\sum_{i=1}^{n}\sum_{t=1}^{T} x_{i,t}^2}. \tag{3}$$
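The pooled estimator in (3) is a ratio of two double sums over the cross section and time. A minimal sketch, assuming the data are held in two arrays of shape (n, T); the function name `pooled_ols` is hypothetical:

```python
import numpy as np


def pooled_ols(x, y):
    """Pooled panel OLS slope without intercept, as in (3).
    x and y are arrays of shape (n, T): one row per cross-section unit."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return (x * y).sum() / (x ** 2).sum()


# Tiny hand-checkable example: y = 2 * x exactly, so the estimate is 2.
x = np.array([[1.0, 2.0, 3.0], [0.5, -1.0, 4.0]])
y = 2.0 * x
print(pooled_ols(x, y))
```

Because no intercept or individual effect is removed, this corresponds to the plain pooled regression; demeaned variants would subtract unit-specific means first.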
Let [Tr] denote the integer part of Tr. From Phillips & Solo (1992), we know
[Tr]
i, t converges weakly
T t = 1
to a two dimensional Brownian motion Bi(r) = (Bui(r), Bi(r)), (0 r 1), with
that under Assumption 1, the partial sum process
j=
= [kl], (k, l = u, ). Furthermore, by the well know limit theory for near
integrated processes (e.g. Phillips (1987, 1988)) as T ,
T
1
T2
Kci(r)2dr,
(4)
t=1
1
T
x2i, t
1
xi, tui, t
Kci(r)dBui(r) + u,
(5)
t=1
matrix =
e(r s)cidBi(s),
j=0
n
1
T( )
n
i=1
1
n
Kci(r) dr
1
Kci(r)dBui(r) + u .
i=1
(6)
This result provides the first step for obtaining sequential asymptotics for (3).
The second step is to derive the limit of the right hand side of (6) as n . For
simplicity assume ci = c for all i. Then, notice that the
with mean zero and variance
Kci(r)dBui(r)
1
= uu
(7)
where the equality follows from well known results for stochastic integrals.
Consequently, we may apply the strong law of large numbers to obtain
n
1
n
i=1
as
Kci(r)dBui(r) 0, as n ,
as
0
Kci(r)2dr =
1
(8)
Kci(r)2dr are
and E
Kci(r)2dr
1
e2(r s)cdsdr, as
T( ) 1/
e2(r s)cdsdr
u
(9)
(10)
where
V =
uu
1
1
r
e2(r s)cdsdr
The latter limiting result essentially follows from the fact that
n
1
n
i=1
Kci(r)dBui(r)
u. One alternative is to use the kernel estimation strategy that is used in the
pooled fully modified (PFM) estimator of Phillips & Moon (1999a). The PFM
estimator will be introduced in the subsequent section and it employes the
= [ kl] and
= [ kl], (k, l = u, ), of and ,
averaged kernel estimators
respectively, defined by
i=
i,
i=1
1
Here i(j) =
T
i=1
(j/K) i(j),
j=T+1
T1
=1
n
T1
=1
n
i,
i=
(j/K) i(j).
(11)
j=0
1 (x)
< . As to
|x|q
applicable lag kernel functions and the choice of the bandwidth parameter K we
follow Phillips & Moon (1999a) and impose the following assumption.
Assumption 2. The lag kernel in (11), evaluated at $j/K$, has Parzen exponent $q > 1/2$, and
the bandwidth parameter $K$ tends to infinity with $K/T \to 0$ and $K^{2q}/T$ tending to a
positive limit, as $T \to \infty$.
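As an illustration of a lag kernel of the kind covered by Assumption 2, the sketch below implements the standard Parzen lag window, whose characteristic exponent is $q = 2$. The function names `parzen` and `kernel_weights` are hypothetical, and the code only produces the weights applied to sample autocovariances at lags $j = 0, 1, \dots$; it does not reproduce the estimators in (11).

```python
def parzen(x):
    """Parzen lag window: 1 - 6x^2 + 6|x|^3 on |x| <= 1/2,
    2(1 - |x|)^3 on 1/2 < |x| <= 1, and 0 elsewhere."""
    a = abs(x)
    if a <= 0.5:
        return 1.0 - 6.0 * a**2 + 6.0 * a**3
    if a <= 1.0:
        return 2.0 * (1.0 - a) ** 3
    return 0.0


def kernel_weights(K, max_lag):
    """Weights parzen(j / K) for lags j = 0, ..., max_lag."""
    return [parzen(j / K) for j in range(max_lag + 1)]


print(kernel_weights(K=4, max_lag=5))
```

With a bandwidth of $K = 1$, every lag $j \ge 1$ receives weight zero, so only the lag-zero autocovariance enters.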
)
Remark 1. Under Assumption 2 the normalized estimation errors n(
) converge in probability to zero. This result was stated in
and n(
Phillips & Moon (1999a, Proof of Theorem 9) and holds as (T, n ) with
n/T 0. This result is employed in the proofs of the theorems given below.
Remark 2. Notice that the kernel estimators defined in (11) are not feasible,
since they employ the unknown errors $\xi_{i,t} = (u_{i,t}, \varepsilon_{i,t})'$. A natural approach to
estimate $u_{i,t}$ and $\varepsilon_{i,t}$ is to use the residuals $\hat u_{i,t} = y_{i,t} - \hat\beta x_{i,t}$, from a preliminary
pooled panel OLS regression, and the differences $\Delta x_{i,t}$, respectively. It is easy
to show that the associated estimation errors for $u_{i,t}$ and $\varepsilon_{i,t}$ are of orders of
magnitude $T^{-1}$ and $T^{-1/2}$, respectively. In view of this and Remark 1 we may
then expect that under the assumptions of this chapter and irrespective of whether
the $x_{i,t}$ in (2) have exact or near unit roots, the use of $\hat u_{i,t}$ and $\Delta x_{i,t}$ in place of
$u_{i,t}$ and $\varepsilon_{i,t}$, respectively, has no effect on the rate of consistency of the kernel
n
=
*
i=1
t=1
i=1
(12)
x2i, t
t=1
we assume that the values of ci are uniformly bounded and such that the
1
n
1
lim
n n
i=1
1
Jci(r) dr = lim
n n
2
i=1
exists and is finite by assumption. The latter condition is not restrictive and
basically means that we assume that the appropriately normalized sample
n
1
second moment of the pooled regressors xi, t, i.e. 2
nT
probability.
i=1
x2i, t, converges in
i=1
Theorem 4. Suppose Assumptions 1 and 2 hold and that data are generated by
(1) and (2) with ci such that supi|ci| c < . Then under joint limits as
(T, n ) with n/T 0
) N(0, V *
nT( *
),
where
V *
=
uu 1
.
xx
n
i=1
t=1
x i, ty i, t nT u
=
*
i=1
(13)
x 2i, t
t=1
T
T
1
1
where y i, t = yi, t y i and x i, t = xi, t x i, with y i =
yi, t and x i =
xi, t,
T t=1
T t=1
respectively.
The asymptotic properties of the estimator in (13) are easily found by
employing the sequential limit theory. To reveal the most essential part of this
exercise note that we have
T
1
T
t=1
x i, tu i, t
(14)
still remove the bias effects that arise from the presence of u on the right hand
side of (14), the remaining term, i.e.
c (r).
mean in comparison with the case in (5), where we had Kci(r) in place of K
i
In fact,
E
i=1
1
n
e(r s)cidsdr
p
c (r)dBu (r)
uxx, as n ,
K
i
i
where xx is given above. In view of this result it is easy to see that the
estimator in (13) is subject to an asymptotic bias, which depends on the
nuisance parameters $c_i$. Unfortunately, no technique is currently available that
would provide consistent estimates for the individual localizing coefficients $c_i$.
Only in the special case where the localizing coefficients are the same across $i$
may we use the cross sectional dimension of the panel to provide consistent
estimates for the common localizing coefficient (see Moon & Phillips (1999)).
This fact opens a possibility for correcting the bias effects. However, such a
correction may be rather complicated and is restricted to cases where the
common $c$ is well below zero (cf. Moon & Phillips (1999)). While it is outside
the scope of this study to consider this matter in more detail, in empirical
applications the special case of a common $c$ is nevertheless hardly realistic.
D. Fully Modified Estimation
We turn to consider the PFM estimator of Phillips & Moon (1999a). The idea
of the PFM estimator is to modify the pooled OLS estimator in (3) by
employing non-parametric corrections in the same way as in the fully modified
OLS (FM-OLS) estimator of Phillips & Hansen (1990). The estimator is
defined by
n
+ =
i=1
t=1
i=1
(15)
x2i, t
t=1
where
1
u
xi, t
yi,+ t = yi, t
(16)
1
u+ = u
u
,
(17)
and
employ the kernel estimators in (11). The equation (16) gives an endogeneity
correction and is similar to that in the FM-OLS estimator of Phillips & Hansen
(1990). The equation (17) gives the contemporaneous and serial correlation
corrections that are needed to remove all the second order bias effects arising
from temporal correlation between ui, t and i, t.
Under the assumption that the regressors xi, t in (2) have exact unit roots the
joint asymptotics of the PFM estimator are determined by Theorem 9 of
Phillips & Moon (1999a). The following theorem shows how this result
changes when the regressors xi, t are generated by the more general class of near
unit root processes. Here we make an additional (technical) assumption that the
values of ci are such that the ci-weighted average of the expected values of
n
1
limn
n
ciE
i=1
Jci(r)2dr = cxx
u 1
,
xx
(18)
1
with u = uu 2u
, and
n
u
Bn, T =
i=1
T
ci / T
T(e
1)
xi, t xi, t 1
t=1
(19)
x2i, t
i=1
t=1
u cxx
.
B =
xx
(20)
The following corollary holds when the assumption of Phillips & Moon
(1999a) about exact unit roots in the regressors $x_{i,t}$ is valid.
Corollary 6. Suppose Assumptions 1 and 2 hold and data are generated by (1)
and (2) with $c_i = 0$ for all $i$. Then under joint limits as $(T, n \to \infty)$ with $n/T \to 0$
$$\sqrt{n}\,T(\hat\beta^+ - \beta) \Rightarrow N(0, 2\Omega_{u.\varepsilon}\Omega_{\varepsilon\varepsilon}^{-1}).$$
It is indeed easy to see that the result of Corollary 6 follows from Theorem 5,
because if $c_i = 0$, then $B_{n,T} = B = 0$, and
$E\int_0^1 J_{c_i}(r)^2\,dr = E\int_0^1 W_i(r)^2\,dr = \frac{1}{2}$,
giving $V_{\beta^+} = 2\Omega_{u.\varepsilon}\Omega_{\varepsilon\varepsilon}^{-1}$. The result of Corollary 6 coincides precisely with that of
Theorem 9 of Phillips & Moon (1999a) and it is illustrative to compare it to
Theorems 4 and 5 above. First, note from Corollary 6 the obvious fact that
when the exact unit root assumption holds, then $\hat\beta^+$ is $\sqrt{n}T$-consistent,
asymptotically normal and unbiased. In addition, note that in this case $\hat\beta^+$ is
generally more efficient than $\hat\beta^*$, because
$\Omega_{u.\varepsilon} = \Omega_{uu} - \Omega_{u\varepsilon}^2\Omega_{\varepsilon\varepsilon}^{-1} \le \Omega_{uu}$. This is the
price that we have to pay, if the autoregressive parameters in (2) happen to be
exactly equal to one and we use the estimator $\hat\beta^*$ instead of $\hat\beta^+$.
However, as Theorem 5 indicates the behavior of the estimator $\hat\beta^+$ is
radically different, if the regressors $x_{i,t}$ are generated by processes with roots
that are only local to one. First, the estimator $\hat\beta^+$ is no longer $\sqrt{n}T$-consistent.
Rather, in order to obtain $\sqrt{n}T$-rate asymptotics, a bias term $B_{n,T}$ given in (19)
has to be subtracted from the estimation error. In fact, in view of the result (b)
of Theorem 5, if the $x_{i,t}$ are near, rather than exact, unit root processes, the
estimator $\hat\beta^+$ is only $T$-consistent and has an asymptotic bias given by $B$ in (20).
If there is no simultaneity in the model, i.e. if $\Omega_{u\varepsilon} = 0$, then the biases disappear
and the PFM estimator is $\sqrt{n}T$-consistent and has an asymptotic normal
distribution with the same variance as that of the serial correlation corrected
pooled OLS estimator.
To see why the biases arise notice first that when an autoregressive parameter
$\rho_i$ in (2) is just nearly one with $c_i$ non-zero, then $\Delta x_{i,t} = \varepsilon_{i,t} + (e^{c_i/T} - 1)x_{i,t-1}$,
where $(e^{c_i/T} - 1) \approx c_i/T$. It is then easy to see that the use of $\Delta x_{i,t}$ in the
endogeneity correction term (16) gives rise to $B_{n,T}$ in (19), which has the limit
given in (20). It is worth noticing that if the nuisance parameters $c_i$ were known,
we could employ a quasi-difference in place of the pure difference $\Delta x_{i,t}$ in (16)
so that the bias term $B_{n,T} = 0$. However, as we already noted above such a
solution is generally infeasible because the localizing coefficients $c_i$ are
unknown and cannot be consistently estimated from the individual time series
$x_{i,t}$.
We close this section by pointing out that the above bias problem also occurs
in cases where the PFM estimator is modified to account for deterministic
effects like individual intercepts in (1). This fact can be easily verified through
sequential asymptotics (for details see Kauppi (1999, pp. 124–125)).
E. Hypothesis Testing
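In this section we consider testing a simple hypothesis $H_0\colon \beta = \beta_0$ against
$H_1\colon \beta \ne \beta_0$. First, in view of Theorem 4 we could use the serial correlation
corrected pooled OLS estimator to obtain the t-test statistic
$$t^* = \sqrt{n}\,T(\hat\beta^* - \beta_0)\left(\frac{\frac{1}{nT^2}\sum_{i=1}^{n}\sum_{t=1}^{T} x_{i,t}^2}{\hat\Omega_{uu}}\right)^{1/2}.$$
In view of Theorem 4 and the result (36) given in its proof in the appendix it
is easy to deduce the following corollary.
Corollary 7. Suppose the assumptions of Theorem 4 hold. Then, under joint
limits as $(T, n \to \infty)$ with $n/T \to 0$, $t^* \Rightarrow N(0, 1)$.
For comparison we will also consider assuming exact unit roots in $x_{i,t}$ and
accordingly employing the PFM estimator based t-test
$$t^+ = \sqrt{n}\,T(\hat\beta^+ - \beta_0)\left(\frac{1}{2\hat\Omega_{u.\varepsilon}\hat\Omega_{\varepsilon\varepsilon}^{-1}}\right)^{1/2},$$
where $\hat\Omega_{\varepsilon\varepsilon}$ and $\hat\Omega_{u.\varepsilon} = \hat\Omega_{uu} - \hat\Omega_{u\varepsilon}^2\hat\Omega_{\varepsilon\varepsilon}^{-1}$ are obtained from the kernel estimators
given in (11) (cf. Phillips & Moon (1999a, Remark (c), p. 1086)).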
1
(b) t + N(0, Vt + ), if u = 0, where Vt + = xx.
2
Part (a) of Corollary 8 states the obvious consequence of Theorem 5 that the
t-test statistic $t^+$ diverges, if the regressors are generated by local to unit root
processes and $\Omega_{u\varepsilon}$ is non-zero. This means that hypothesis tests based on the
PFM estimator are generally severely distorted. The result of part (b) of
Corollary 8 shows that even when there is no simultaneity, i.e. $\Omega_{u\varepsilon} = 0$, the test
does not have the desired standard normal distribution. To illustrate this latter
effect suppose that $c_i = c$ for all $i$. Then, if $\Omega_{u\varepsilon} = 0$, we have

$$V_{t^+} = \frac{2c^2}{e^{2c} - 2c - 1}, \qquad (21)$$

because $E\left(\int_0^1 J_{c_i}(r)^2\,dr\right) = \frac{1}{4c^2}\left(e^{2c} - 2c - 1\right)$ when $c_i = c$. It is easy
to see from (21) that for negative values of $c$, $V_{t^+}$ becomes larger than unity.
For example, for $c = -5$ and $c = -10$, $V_{t^+}$ is approximately equal to 5.55
and 10.53, respectively. Notice that if the usual 5% critical value 1.96 is applied
in the $t^+$-test, then the true asymptotic rejection rates that correspond to $c = -5$
and $c = -10$ are approximately equal to 40.3% and 54.6%, respectively.
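The variance inflation in (21) and the implied rejection rates can be checked numerically. The following is a minimal sketch, assuming only the closed form in (21) and a two-sided test at the 1.96 critical value; the function names are illustrative and not part of the chapter. Values computed this way may differ in the last digit from the rounded figures quoted above.

```python
import math
from statistics import NormalDist


def variance_inflation(c: float) -> float:
    """Asymptotic variance V_{t+} from (21) for a common localizing coefficient c."""
    if c == 0.0:
        return 1.0  # limit of (21) as c -> 0 (exact unit root case)
    # expm1 keeps e^{2c} - 1 accurate for small |c|
    return 2.0 * c ** 2 / (math.expm1(2.0 * c) - 2.0 * c)


def true_rejection_rate(c: float, critical_value: float = 1.96) -> float:
    """P(|Z| > critical_value) when Z ~ N(0, V_{t+})."""
    scaled = critical_value / math.sqrt(variance_inflation(c))
    return 2.0 * (1.0 - NormalDist().cdf(scaled))


if __name__ == "__main__":
    for c in (0.0, -5.0, -10.0):
        print(c, variance_inflation(c), true_rejection_rate(c))
```

The rejection rate grows with |c| because the statistic is referred to N(0, 1) critical values while its actual limiting variance is $V_{t^+} > 1$.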
F. Simulations
In this section, we illustrate the theoretical findings obtained in the previous
section by conducting some simple Monte Carlo experiments. We focus on
investigating the size behavior of the PFM t-test statistic, $t^+$, and that of the
bias corrected t-test, $t^*$. For the experiments we generate artificial data by
employing equations (1) and (2), where we impose $\beta = 1$ in (1). The errors
$\xi_{i,t} = (u_{i,t}, \varepsilon_{i,t})'$ are generated simply by the equation $\xi_{i,t} = \mathrm{chol}(C)\,\eta_{i,t}$, where
$\eta_{i,t} \sim \mathrm{nid}(0, I_2)$ across $i = 1, \ldots, n$, and over $t = 1, \ldots, T$, and chol(C) is the
Cholesky decomposition of the matrix $C = [C_{ij}]$ with $C_{11} = C_{22} = 1$,
$C_{12} = C_{21} = \Omega_{u\varepsilon}$. Thus, we have $E(u_{i,t}) = E(\varepsilon_{i,t}) = 0$, $E(u_{i,t}^2) = E(\varepsilon_{i,t}^2) = 1 = \Omega_{uu} = \Omega_{\varepsilon\varepsilon}$
and $E(u_{i,t}\varepsilon_{i,t}) = \Omega_{u\varepsilon}$. The initial values $y_{i,0}$ and $x_{i,0}$ are set to zeros.
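The data generating mechanism described above can be sketched as follows. This is an illustration only: it assumes that equation (2) has the autoregressive form $x_{i,t} = \rho x_{i,t-1} + \varepsilon_{i,t}$ with $\rho = 1 + c/T$ and that equation (1) is $y_{i,t} = \beta x_{i,t} + u_{i,t}$ without an intercept. The function names and the use of Python's `random` module (rather than the software used for the chapter) are assumptions.

```python
import math
import random


def cholesky_2x2(omega_ue):
    """Lower-triangular L with L L' = [[1, omega_ue], [omega_ue, 1]]."""
    if not -1.0 < omega_ue < 1.0:
        raise ValueError("omega_ue must lie strictly between -1 and 1")
    return ((1.0, 0.0), (omega_ue, math.sqrt(1.0 - omega_ue ** 2)))


def simulate_panel(n, T, c, omega_ue, beta=1.0, seed=0):
    """Return a list of (x, y) series, one pair per panel member."""
    rng = random.Random(seed)
    chol = cholesky_2x2(omega_ue)
    rho = 1.0 + c / T  # common local-to-unity root
    panel = []
    for _ in range(n):
        x_prev = 0.0  # initial value x_{i,0} = 0
        xs, ys = [], []
        for _ in range(T):
            e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            u = chol[0][0] * e1                      # regression error u_{i,t}
            eps = chol[1][0] * e1 + chol[1][1] * e2  # regressor innovation
            x = rho * x_prev + eps  # assumed form of equation (2)
            y = beta * x + u        # assumed form of equation (1), no intercept
            xs.append(x)
            ys.append(y)
            x_prev = x
        panel.append((xs, ys))
    return panel


if __name__ == "__main__":
    data = simulate_panel(n=50, T=250, c=-5.0, omega_ue=0.4)
    print(len(data), len(data[0][0]))
```

Because the Cholesky factor is lower triangular, both errors have unit variance and covariance `omega_ue`, matching the moments stated in the text.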
Table 1 reports percentage rejection rates of the t-tests, $t^+$ and $t^*$,
respectively, when a 5% critical value 1.96 is applied, n = 50, T = 250, and the
local to unit root coefficients are set equal to a common value c, i.e. we use
$\rho_i = \rho = 1 + c/T$ for all i. In computing the long-run covariance estimates in $t^+$
and $t^*$, respectively, we employed the Parzen kernel function and the
bandwidth parameter value K = 1.[2] The columns under c = 0 report results
when an exact unit root assumption holds. In accordance with the analytical
Table 1.

                               c = 0             c = −5            c = −10
$\Omega_{u\varepsilon}$     $t^+$    $t^*$     $t^+$    $t^*$     $t^+$    $t^*$
0                            5.20     4.70     42.10     5.00     52.30     4.20
0.2                          5.30     4.40     89.80     4.30     99.60     5.40
0.4                          6.60     6.80    100.0      4.90    100.0      4.90
0.6                          4.30     4.50    100.0      4.00    100.0      5.60
0.8                          4.30     4.50    100.0      5.80    100.0      4.50

Notes: The columns under $t^+$ and $t^*$ report Monte Carlo rejection rates (in percent) of the
respective t-tests computed by employing long-run covariance estimates that were obtained by
using a Parzen kernel function and a bandwidth parameter value K = 1. A nominal 5% asymptotic
level was applied. In each replication, the data were obtained by using equations (1) and (2) with
$\beta = 1$ and $\rho_i = \rho = 1 + c/T$ in (1) and (2), respectively, initial values zeros, and with the errors
$\xi_{i,t} = (u_{i,t}, \varepsilon_{i,t})'$ generated by the equation $\xi_{i,t} = \mathrm{chol}(C)\,\eta_{i,t}$, where $\eta_{i,t} \sim \mathrm{nid}(0, I_2)$ across
$i = 1, \ldots, n$, and over $t = 1, \ldots, T$, and chol(C) is the Cholesky decomposition of the matrix
$C = [C_{ij}]$ with $C_{11} = C_{22} = 1$, $C_{12} = C_{21} = \Omega_{u\varepsilon}$. Results are based on 1000 replications.
results of the previous section, in this case, the size behavior of the two tests is
good. The columns under c = −5 and c = −10 give rejection rates when the
roots of the regressors are only nearly one. As predicted by Corollary 8, now
the $t^+$-test is very sensitive to deviations from exact unit roots and suffers from
severe size distortions through all values of $\Omega_{u\varepsilon}$. Notice that even when $\Omega_{u\varepsilon} = 0$
the $t^+$-test rejects far in excess of the desired 5% nominal level as was
predicted by the considerations of the previous section. In contrast, as predicted
by Corollary 7 the bias corrected t-test, $t^*$, maintains well the desired size level
through different values of $\Omega_{u\varepsilon}$.
Table 2 reports otherwise similarly computed test results as those of Table 1
except that now n and T are set to 25 and 100, respectively. As is apparent the
results do not change much from those of Table 1. This indicates that our
asymptotic results can provide fairly accurate approximations with sample
sizes that are typical in empirical applications.
Table 3 examines the performance of the bias corrected t-test when the
individual localizing coefficients in the generating mechanisms of the
regressors vary across different panel members. The heterogeneity across panel
members was obtained by using otherwise similarly generated data as in
Tables 1 and 2 except that all the individual specific localizing coefficients $c_i$
were drawn from a uniform distribution on the interval [c, 0]. For example, the
column denoted by (n = 25, T = 100) and c = −10 reports simulation results
Table 2.

                               c = 0             c = −5            c = −10
$\Omega_{u\varepsilon}$     $t^+$    $t^*$     $t^+$    $t^*$     $t^+$    $t^*$
0                            6.80     6.20     37.40     5.50     52.30     5.50
0.2                          6.60     6.10     74.60     4.00     96.50     6.60
0.4                          6.00     5.10     99.20     5.10    100.0      6.20
0.6                          5.40     4.90    100.0      6.20    100.0      5.80
0.8                          5.30     5.80    100.0      5.60    100.0      5.00
Table 3.

(n = 25, T = 100)
$\Omega_{u\varepsilon}$     c = −5    c = −10    c = −5    c = −10
0                            4.82      5.10       5.00      5.18
0.2                          5.80      5.06       6.20      5.00
0.4                          4.96      4.62       5.12      5.34
0.6                          5.42      5.06       5.98      5.46
0.8                          5.18      5.18       5.44      5.92

Notes: The table reports Monte Carlo rejection rates of the $t^*$-test computed in the same way as
in Tables 1 and 2. The data were obtained otherwise similarly as in Tables 1 and 2 except that in
each replication the individual specific localizing coefficients $c_i$ ($i = 1, \ldots, n$) were drawn from a
uniform distribution on the interval [c, 0]. The applied values of c are given at the top of each
column. Results are based on 5000 replications.
regard to the new bias corrected test, which was able to maintain good size
behavior through all the performed experiments. However, it should be pointed
out that our simulation setup here is rather simple and it is likely that some
problems arise in more complicated models. For example, if the data
generating mechanism obeys more general short-run dynamics than
experimented here, then it can be expected that the non-parametric corrections
are subject to somewhat larger (finite sample) estimation errors, which may
weaken the performance of the bias corrected test. Furthermore, an additional
source of estimation error arises when the non-parametric estimators use
estimated values in place of the true values of the errors.
NOTES
1. This is proved by Phillips & Moon (1999a, Theorem 8) when $c_i = 0$ for all i.
Furthermore, a similar result can be proved in the case where the $c_i$ are nonzero by
following the lines given in the proof of Theorem 5 of this chapter.
2. In empirical applications a bandwidth parameter value K = 1 is hardly realistic.
However, in the present simulation setup the actual value of K does not play an
important role, because we use iid errors in the simulations. For example, in all of the
reported cases, essentially similar results were obtained by using the bandwidth
parameter value K = 4.
ACKNOWLEDGMENTS
I would like to thank the two referees for their useful comments and
suggestions. This paper was completed while the author worked at the
Research Department of the Bank of Finland whose hospitality is gratefully
acknowledged. This paper is a part of the research program of the Research
Unit on Economic Structures and Growth (RUESG) at the Department of
Economics at the University of Helsinki. Financial support from the Yrjö
Jahnsson Foundation is appreciated. The usual disclaimer applies.
REFERENCES
Billingsley, P. (1968). Convergence of Probability Measures. New York: John Wiley.
Davidson, J. (1994). Stochastic Limit Theory. Oxford University Press.
Elliott, G. (1998). On The Robustness of Cointegration Methods When Regressors Almost Have
Unit Roots. Econometrica, 66(1), 149–158.
Kauppi, H. (1999). Essays on Econometrics of Cointegration. Research Reports Nro 84,
Dissertationes Oeconomicae, Department of Economics, University of Helsinki.
Moon, H., & Phillips, P. C. B. (1999). Estimation of Autoregressive Roots Near Unity Using
Panel Data. Cowles Foundation Discussion Paper No. 1224, Yale University,
(http://cowles.econ.yale.edu/).
Phillips, P. C. B. (1987). Towards A Unified Asymptotic Theory for Autoregression. Biometrika,
74(3), 535–547.
Phillips, P. C. B. (1988). Regression Theory for Near-integrated Time Series. Econometrica, 56(5),
1021–1043.
Phillips, P. C. B., & Hansen, B. E. (1990). Statistical Inference In Instrumental Variables
Regression With I(1) Processes. Review of Economic Studies, 57, 99–125.
Phillips, P. C. B., & Moon, H. (1999a). Linear Regression Limit Theory for Non-stationary Panel
Data. Econometrica, 67(5), 1057–1111.
Phillips, P. C. B., & Moon, H. (1999b). Non-stationary Panel Data Analysis: An Overview of
Some Recent Developments. Cowles Foundation Discussion Paper No. 1221, Yale
University, (http://cowles.econ.yale.edu/).
Phillips, P. C. B., & Solo, V. (1992). Asymptotics for Linear Processes. The Annals of Statistics,
20(2), 971–1001.
Stock, J. H. (1997). Cointegration, Long-run Comovements, and Long Horizon Forecasting. In: D.
Kreps & K. F. Wallis (Eds), Advances in Econometrics Proceedings of the Seventh World
Congress of the Econometric Society. Cambridge: Cambridge University Press.
Stout, W. F. (1974). Almost Sure Convergence. New York: Academic Press.
White, H. (1984). Asymptotic Theory for Econometricians. Academic Press: San Diego,
California.
APPENDIX
APPENDIX A: PROOF OF THEOREM 2
From the conditions of the theorem we know that $X_{n,T} = \frac{1}{n}\sum_{i=1}^{n} Y_{i,T} \Rightarrow X_n = \frac{1}{n}\sum_{i=1}^{n} Y_i$
as $T \to \infty$ for all fixed n. Since $\sup_T E\|Y_{i,T}\|^{1+\delta} \le M < \infty$ for all i and
because $Y_{i,T} \Rightarrow Y_i$ implies $\|Y_{i,T}\|^{1+\delta} \Rightarrow \|Y_i\|^{1+\delta}$ by the continuous mapping theorem
we also have $E\|Y_i\|^{1+\delta} \le M < \infty$ by Theorem 5.3 of Billingsley (1968) (see also
discussion on p. 33 of Billingsley (1968)). By arguments given in the proof of
Theorem 1 of Phillips & Moon (1999a) we can justify that the $Y_i$ are
independent across i, since the $Y_{i,T}$ are independent across i for all T. Given this
and the fact that $E\|Y_i\|^{1+\delta} \le M < \infty$, we may apply Markov's law of large
numbers to deduce $X_n \to_p X$ as $n \to \infty$ (e.g. White (1984, p. 33)). Furthermore,
p
n
1
n
n
E||Yi, T||
i=1
1
n
i=1
< ,
sup E||Yi, T|| M
T
where the last two inequalities follow from condition (b) of the theorem. Also,
condition (ii) holds, since
n
1
n
i=1
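by condition (a). For condition (iii) we use the fact that

$$E\|Y_{i,T}\|\,1\{\|Y_{i,T}\| > n\epsilon\} \le \frac{1}{(n\epsilon)^{\delta}}\sup_T E\|Y_{i,T}\|^{1+\delta} \le \frac{M}{(n\epsilon)^{\delta}}$$

for all i, where the first inequality follows from
arguments given by Billingsley (1968, p. 32) and the second inequality holds
by condition (b). Now, for any $\epsilon > 0$,

$$\frac{1}{n}\sum_{i=1}^{n} E\|Y_{i,T}\|\,1\{\|Y_{i,T}\| > n\epsilon\} \le \frac{M}{(n\epsilon)^{\delta}},$$

and therefore, condition (iii) follows. Condition (iv) holds by the same
argument as we notice that now

$$E\|Y_i\|\,1\{\|Y_i\| > n\epsilon\} \le \frac{1}{(n\epsilon)^{\delta}}\,E\|Y_i\|^{1+\delta} \le \frac{M}{(n\epsilon)^{\delta}}.$$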
n
Let s
2
n, T
i=1
Yi, T
. Then
sn, T
(22)
i=1
n
lim
n, T
i=1
n
n
n
i=1
Y2i, T
Y2
1 2i, T >
2
sn, T
sn, T
i=1
n 1
s2n, T n
i=1
s2n, T
n
n
n 1
2
sn, T n
s2n, T
n
n
i=1
(23)
By condition (ii) we can always find > 0 such that sup E|Y 2i, T|(1 + ) N < for
T
s2n, T
n
n
n
s2n, T
n
,
(24)
for all i (cf. Billingsley (1968, p. 32)). In view of (23) and (24) and given that
n
s2n, T
= V < we may
condition (i) implies lim 2 = 1/V < (V > 0) and lim
n, T s
n, T n
n, T
now conclude that
n
lim
n, T
i=1
where C = C(1) =
Ck and i, t =
k=0
j=0
j=0
Ck. Under
k=j+1
ji, t j with C
j=
C
j2||C j||2 =
j2
j=0
Cs
< (see
s=j+1
C2
+ C2
w
C
Cw
+ C
wCww
C
Cw
+ C
wCww
uu
=
2
2
Cw
+ Cww
u
u
(27)
For subsequent reference note that the components of i, t = (ui, t, i, t) in (25)
may be written as
(28)
ui, t = C
i, t + C
wwi, t + u i, t 1 u i, t,
i, t = Cw
i, t + Cwwwi, t + i, t 1 i, t,
(29)
where u i, t and i, t are the two components in i, t.
Next, by equation (2)
t
xi, t =
s=1
(30)
f(a)i, t =
e((t s)/T)ciai, s, a = , w,
(31)
s=1
and
t1
(t 1)/T)ci
i, 0 + (1 e
R(x)i, t = e
ci /T
(32)
s=1
For later analysis it is useful to have the following two moment bounds. First,
2
f(a)i,
1
t
sup sup E
sup
i 1tT
T
1tT T
t
(33)
s=1
since e((t s)/T)2 supi|ci| M < (recall that supi|ci| c < ). Second, using the
m
inequality E
m
Xi| m
i=1
i=1
sup sup E(R2(x)i, t) 4 sup e((t 1)/T)2 supi|ci| E(2i, 0) + 4 sup E(2i, t)
i
1tT
1tT
1tT
1
2
1tT T
t1
k=1
t1
s=1
M < .
(34)
To see that (34) holds note that sup1 t T e(t/T)2 supi|ci| e2 supi|ci|, E(2i, t) M by (26),
E(x2i, 0) M (by the initial value condition), T 2(1 esupi|ci| /T)2 = O(1), and by the
Cauchy-Schwartz inequality E|i, k i, s| E(i, k)2E(i, s)2 M, where the latter
inequality follows again from (26).
n
) =
nT( *
n
i=1
t=1
1
n
n
i=1
1
T2
x2i, t
t=1
where n( u u) = op(1), as (n, T ) with n/T 0 (recall Remark 1). It
suffices to show that
1
nT
n
i=1
t=1
and
n
1
nT 2
i=1
(36)
t=1
n
1
nT 2
i=1
t=1
n
1
x = Cw
n
2
i, t
i=1
+ 2Cw Cww
1
n
i=1
1
+ 2Cww
n
i=1
1
T2
2
(
)i, t
+C
t=1
1
n
t=1
1
T2
i=1
1
T2
2
f(w)i,
t
t=1
t=1
1
T2
2
ww
1
n
1
f(w)i, tR(x)i, t +
n
i=1
i=1
1
T2
f( )i, tR(x)i, t
t=1
1
T2
R2(x)i, t
t=1
= C2w
Ib1 + C2wwIb2 + 2Cw
CwwIIb1 + 2Cw
IIb2 + 2CwwIIb3 + IIb4, say.
p
We now show that Cw
2Ib1 + C2wwIb2 xx and IIb1, IIb2, IIb3, IIb4 0 as
(T, n ) so that (36) follows.
Write
n
1
Ib1 =
n
i=1
Yi, T,
(37)
T
1
where Yi, T = 2
T
2
f(
)i,
t. For an application of Theorem 2 observe that Yi, T are
t=1
1
1
E(Yi) =
Jci(r)2dr. We know
1
dsdr and by assumption lim
n n
(r s)2ci
i=1
n
1
n
i=1
For verifying condition (i) let p = 1 + and use the definition of Yi, T in (37)
to obtain
T
1
(E|Yi, T| ) = 2 E
T
p 1/p
2
(
)i, t
t=1
1/p
1
2
T
t=1
e((t s)/T)ci i, s
s=1
2p
1/p
, (38)
where the inequality follows from the Minkowskis inequality and the
definition of f(
)i, t in (31). Now, the e((t s)/T)ci
i, s, (1 s t T), are independent
random variables with zero means and E|e((t s)/T)ci
i, s|2p (esupi|ci|})2 + 2
E|
i, s|2 + 2 M for some M < and some > 0. Therefore, we may apply
Theorem 3.7.8 of Stout (1974, p. 213) to obtain
t
s=1
e((t s)/T)ci i, s
2p
Mt p,
(39)
where M is finite and independent of i. By inserting (39) into (38) and rising
to the power of p = 1 + it is easy to see that E|Yi, T|1 + M so that condition (i)
of Theorem 2 follows. For condition (ii) of Theorem 2 it suffices to note that
T
1
the supremum of the absolute difference between E(Yi, T) = 2
T
1
and E(Yi) =
t=1
e(t q/T)2ci
q=1
m
Xi|2 m
i=1
i=1
n
1
Assumption 1 we have E(IIb1) = 2
E(f(
)i, t /T)2E(f(w)i, t /T)2 =
n T i=1 t=1
1
O
, where the latter equality follows from (33). Second, the use of the
n
2
n
1
E
n
1
T2
i=1
f(a)i, tR(x)i, t
t=1
n
1 1
T n
i=1
1
T
t=1
f(a)i, t
T
E|R(x)i, t|2 = O
T
where the equality follows from (33) and (34). Hence, E|IIb2|, E|IIb3| 0 as
(T, n ). It is also straightforward to do similar calculations with IIb4 that
show E|IIb4| 0 as (T, n ). This completes the proof of (36).
We turn to prove the result in (35). First, use (28) through (30) to write
n
1
nT
i=1 t=1
n
1
T
i=1
t=1
n
i=1
= Ia + IIa, say.
1
T
t=1
(40)
Note that f(a)i, 1 = ai, 1 and f(a)i, t = eci /Tf(a)i, t 1 + ai, t, (ai, t =
i, t, wi, t), t 2, so that we
may write
n
Ia =
1
n
1
T
i=1
n
i=1
t=2
n
(eci /T 1)
T
t=2
i=1
1
T
t=1
n
Ia1 =
where
1
n
Yi, T,
i=1
1
Yi, T =
T
t=2
T
1
E(Y ) = 2
T
2
i, T
2
[Cw
C
2 E(f(
)i, t 1
i, t)2 + C2wwC2
wE(f(w)i, t 1wi, t)2
t=2
2
2
+ Cw
C
w
E(f(
)i, t 1wi, t)2 + C2wwC2
E(f(w)i, t 1
i, t)2]
T
1
= uu 2
T
t=2
t1
e((t 1 s)/T)2ci,
(42)
s=1
where the last equality uses (27) and the fact that E(f(a)i, t 1bi, t)2 =
t1
s=1
Now, we apply Theorem 3. First, note that the Yi, T in (41) are independent
across i for all T with mean zero and variance Vi, T = E(Y2i, T ) in (42). Let
1
Vi = uu
n
1
n
1
Vi, T =
n
i=1
i=1
1
Vi +
n
(Vi, T Vi).
(43)
i=1
Using the fact that $\sup_i |c_i| \le \bar{c} < \infty$ it is straightforward to show that the second
term on the right hand side of (43) tends to zero as $n, T \to \infty$ (see Kauppi (1999,
pp. 135–136)). On the other hand, the first term in (43) has the positive and finite
limit $\Omega_{xx}$. Thus, condition (i) of Theorem 3 holds with $V = \Omega_{uu}\Omega_{xx}$. For
establishing condition (ii) of Theorem 3 recall the definition of $Y_{i,T}$ from (41),
m
1
E|Yi, T| M
E
T
p
f( )i, t 1 i, t
t=2
Xi
p1
E|Xi|p (e.g.
i=1
i=1
1
+ MwwE
T
1
+ M
wE
T
f( )i, t 1wi, t
t=2
f(w)i, t 1wi, t
t=2
1
+ Mw
E
T
f(w)i, t 1 i, t ,
(44)
t=2
where Mab = 4p 1|CwaC
b|p M < (a, b =
, w). Furthermore, by the fact that
i, t
are iid we have
T
f( )i, t 1 i, t
p
=E
T
t1
e((t 1 s)/T)ci i, s i, t
s=1
= E| i, t| E
T
t1
T
p/2
t1
e((t 1 s) /T)ci i, s
s=1
M < ,
(45)
t1
((t 1 s)/T)ci
because |e
2+
| e |ci|} M < , E|
i, t|
supi
M < , and E
i, s
2+
s=1
M(t 1)(2 + )/2 for some M < and for some > 0, where the result with regard
t1
i, s
to E
s=1
2+
an iid sequence is also a martingale difference sequence). Now, given (45) and
the fact that the f(
)i, t 1
i, t, (2 t T) are martingale difference sequences for all
i, we may apply Theorem 3.7.8 of Stout (1974, p. 213) one more time giving
T
T
t=2
f(
)i, t 1
i, t
T
T1
T
p/2
The same arguments show that the other three expectations in (44) are similarly
bounded, and therefore, $\sup_T E|Y_{i,T}|^p = \sup_T E|Y_{i,T}|^{2+\delta} \le M < \infty$ for some $\delta > 0$ and
all i. Hence, the conditions of Theorem 3 hold and we have shown that $I_{a1}$
converges weakly to the distribution given in (35) as $(T, n \to \infty)$. Furthermore,
since $\sup_i |e^{c_i/T} - 1| = O(T^{-1})$, it follows immediately that $I_{a2} \to_p 0$ as $(T, n \to \infty)$.
For Ia3 recall from (27) that u = C
Cw
+ C
wCww so that
n
Ia3 =
1
n
1
T
i=1
n
1
n
t=1
T
1
T
i=1
t=1
k=1
s=k
k=0
s=1
CsCk
CkCs = +
kCk + 1 CC
0.
C
k=0
j = [C
ab, j], (a, b =
, w); we may
Using this in conjunction with the partition C
write
n
IIa =
1
n
i=1
t=1
1
T
xi, t(ui, t 1 u i, t)
j=0
, j + Cww, j + 1C
w, j)
(Cw
, j + 1C
n
1
n
1
T
i=1
t=1
say.
T
1
T
T
1
1
xi, tu i, t 1 = xi, 1u i, 0 +
T
T
t=1
t=2
1
1
xi, tu i, t 1 = xi, 1u i, 0 +
T
T
1
1
= xi, 1u i, 0 + eci /T
T
T
and, thus,
T
1
T
t=1
1
xi, t(ui, t 1 u i, t) =
T
T1
t=1
T1
t=1
+ (e
1
xi, tu i, t +
T
T1
T1
i, t + 1u i, t,
t=1
1
1)
T
T1
xi, tu i, t.
t=1
n
IIa1 =
n
i=1
1
T
T1
i, t + 1u i, t
n
n
+
T
1
1
xi, 1u i, 0
T
n
i=1
, j + Cww, j + 1C
w, j)
(Cw
, j + 1C
j=0
t=1
i=1
xi, t + 1u i, t
t=1
1
1
i, t + 1u i, t + xi, 1u i, 0 xi, Tu i, T
T
T
ci /T
n
1
1
xi, Tu i, T +
T
n
ci /T
(e
i=1
1
1)
T
T1
xi, tu i, t
t=1
, j + Cww, j + 1C
w, j)
(Cw
, j + 1C
j=0
n
, say.
T
n
R1, i, T
=O
1
derived in the
T
n i = 1
proof of Lemma 16 of Phillips & Moon (1999, p. 1107) we have
n
n
1
T
i=1
T1
i, tt + 1
t=1
kCk + 1
C
1
.
T
=O
k=0
(46)
Since IIa1a is the (1, 2) element of the matrix inside the norm on the left hand
p
side of (46), we have IIa1a 0 as (T, n ). Next, by the triangle and CauchySchwartz inequalities
n
i=1
n1
Tn
1
xi, Tu i, T
T
T
i=1
n
T
sup E
1in
xi, T
E|ui, T|2
xi, T
n
,
T
E|ui, T|2 = O
T
where the equality is easily verified by using (26), (30), (33) and (34).
p
p
Therefore, IIa1c 0 as (T, n ) with n/T 0. Obviously, also, IIa1b 0 as
sup|ci|/T
1| and note that
(T, n ) with n/T 0. Finally, for IIa1d, let rT = T|e
n
E|IIa1d| rTE
n
i=1
1
T2
rT
i=1
1
T
n1
Tn
xi, tu i, t rT
t=1
T1
n1
Tn
T1
t=1
xi, t
i=1
T
1
T
E|ui, t|2 = O
T1
t=1
xi, t
T
u i, t
n
,
T
by similar arguments to those used for IIa1c and the fact that rT = O(1).
p
We turn to show that IIa2 0 as (T, n ) with n/T 0. Using (32) write
n
IIa2 =
n
1
T
i=1
n
1
n
i=1
t1
1
T
t=1
((t 1)/T)ci
(e
i, 0 + (1 e
ci /T
t=1
s=1
say.
Here IIa2a 0 as (T, n ) with n/T 0, because IIa2a is identical with the
n
term
1
n
i=1
p. 1105). Finally, the result IIa2b 0 as (T, n ) with n/T 0 follows from
similar arguments as those used for IIa1. Details are straightforward and thus are
omitted. This completes the proof of the theorem.
n
n
i=1
1
1
[xi, t(ui, t
u
xi, t u+ ) + T(eci /T 1)u
xi, txi, t 1]
t=1
,
T
1
x2i, t
n i=1 T 2 t=1
where the denominator has the limit given in (36). Next let u+ = u
1
and note that the nominator in the above estimation error can be
u
written as
1
n
1
n
i=1
1
T
1
[xi, t(ui, t u
i, t) u+ ]
t=1
n
1
1
n( u
u
)
1
n
i=1
1
T
t=1
1
n( u u) + u
n( ),
where the $\sqrt{n}$-normalized estimation errors of the kernel estimators are $o_p(1)$
as $(n, T \to \infty)$ with $n/T \to 0$ (recall Remark 1). Furthermore, using the fact that
$\Delta x_{i,t} = (e^{c_i/T} - 1)x_{i,t-1} + \varepsilon_{i,t}$ we can write
n
1
n
i=1
1
T
t=1
n
1
(xi, txi, t ) =
n
(eci /T 1)
T
i=1
1
+
n
i=1
xi, txi, t 1
t=1
1
T
t=1
where the last equality holds as $(n, T \to \infty)$ and can be proved by applying the
arguments given in the proof of Theorem 4. Thus, for the result in part (a) of
Theorem 5, it suffices that
n
1
n
i=1
1
T
1
[xi, t(ui, t u
i, t) u+ ] N(0, u xx),
t=1
as $(T, n \to \infty)$ with $n/T \to 0$. The details of the proof of this latter result are
similar to those of the proof of (35) and are thus omitted. Finally, note that the
limiting result in part (b) of the theorem follows from lines used in the proof
of (36) and the fact that the arithmetic average of the quantities $c_i E\left(\int_0^1 J_{c_i}(r)^2\,dr\right)$
converges to a finite number cxx.
STATIONARITY TESTS IN
HETEROGENEOUS PANELS
Yong Yin and Shaowen Wu
ABSTRACT
Several stationarity tests in heterogeneous panel data models are
proposed in this chapter. By allowing maximum degree of heterogeneity in
the panel, two different ways of pooling information from independent
tests, the group mean and the Fisher tests, are used to develop the panel
stationarity tests. We consider the case of serially correlated errors in the
level and trend stationary models. The small sample performances of the
tests are investigated via Monte Carlo simulations. The simulation
experiments reveal good small sample performances. In the presence of
serial correlation, either the group mean or the Fisher tests based on
individual KPSS tests with l2 and LMC tests with p = 1 are recommended
for use in empirical work due to their good small sample performances.
I. INTRODUCTION
Dynamic panel data analysis has attracted more and more attention. This is
partly due to the recent availability of large panel data sets. These data sets
usually cover different countries, industries, or regions over relatively long time
spans. They offer new opportunities as well as challenges to the analysis of
dynamic panel data models, especially the heterogeneous panel data models as
researchers usually would anticipate great differences among the cross-section
units in the data.
Nonstationary Panels, Panel Cointegration and Dynamic Panels, Volume 15, pages 275–296.
Copyright © 2000 by Elsevier Science Inc.
All rights of reproduction in any form reserved.
ISBN: 0-7623-0688-2
deterministic trends under the null and different error structures. The tests
should be able to handle serially correlated errors in the models. In the
univariate case, based on a Lagrange Multiplier (LM) test in case of i.i.d.
errors, there are two different extensions to handle the existence of serial
correlation. Kwiatkowski, Phillips, Schmidt & Shin (1992) (KPSS hereafter)
propose to use nonparametric estimation to handle the situation while
Leybourne & McCabe (1994) (LMC hereafter) propose to use augmented
autoregressive components to take care of it. We shall propose panel
stationarity tests utilizing both tests. One type of the tests we propose would be
based on the group mean of the individual test statistics, which can be shown
to have a normal distribution asymptotically after some adjustments are made
to the group mean. The second test is in line with Maddala & Wu (1999). The
idea of the test could be traced back to Fisher (1932), which pools the p-values
from individual tests. We will also design some Monte Carlo experiments to
investigate the small sample performances of the proposed tests.
The rest of the chapter is organized as follows. In Section II we set up
the models for heterogeneous panels and discuss panel stationarity tests. Monte
Carlo simulation designs and results aimed at investigating the small sample
performances of the proposed tests can be found in Section III, and Section IV
concludes.
(1)
$$y_t = r_t + \varepsilon_t \qquad (2)$$
and under the null $y_t$ is level stationary instead of trend stationary.
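$$\hat\sigma_\varepsilon^2 = T^{-1}\sum_{t=1}^{T} e_t^2, \qquad S_t = \sum_{i=1}^{t} e_i, \qquad LM = \sum_{t=1}^{T} S_t^2 / \hat\sigma_\varepsilon^2.$$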
In order to construct the LM test statistic to test the null hypothesis of level
stationarity instead of trend stationarity, we should define $e_t$ as the residuals from
the regression of $y_t$ on an intercept only.
It has been shown that for the trend stationary model, $T^{-2}LM \to_d \int_0^1 V_2(r)^2\,dr$
under the null hypothesis, where $V_2(r)$ is the second-level Brownian bridge
given by $V_2(r) = W(r) + (2r - 3r^2)W(1) + (-6r + 6r^2)\int_0^1 W(s)\,ds$, with $W(r)$
being a Wiener process. For the level stationary model, under the null,
$T^{-2}LM \to_d \int_0^1 V(r)^2\,dr$, where $V(r)$ is the standard Brownian bridge:
$V(r) = W(r) - rW(1)$.
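When the errors are serially correlated, the KPSS statistics replace $\hat\sigma_\varepsilon^2$ with the
long-run variance estimator

$$s^2(l) = T^{-1}\sum_{t=1}^{T} e_t^2 + 2T^{-1}\sum_{s=1}^{l} w(s, l)\sum_{t=s+1}^{T} e_t e_{t-s},$$

where $w(s, l)$ is a weighting function with truncation parameter $l$. Under the null
the resulting statistics converge in distribution to $\int_0^1 V(r)^2\,dr$ and
$\int_0^1 V_2(r)^2\,dr$, respectively, and
both tests are consistent. See KPSS for more details of derivation and proof
along with some simulation results.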
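The computation of an individual KPSS-type statistic can be sketched as follows. This is a minimal illustration rather than the chapter's implementation: it assumes the Bartlett weights $w(s, l) = 1 - s/(l + 1)$ of the original KPSS paper (the simulations in this chapter use a Parzen window instead), and the function name is hypothetical. With `lags=0` it reduces to $T^{-2}LM$.

```python
def kpss_statistic(y, lags=0, trend=False):
    """T^{-2} * sum of squared partial sums of residuals, scaled by s^2(l)."""
    T = len(y)
    mean_y = sum(y) / T
    if trend:
        # OLS residuals from a regression on an intercept and a linear trend
        times = range(1, T + 1)
        mean_t = (T + 1) / 2.0
        slope = (sum((t - mean_t) * (v - mean_y) for t, v in zip(times, y))
                 / sum((t - mean_t) ** 2 for t in times))
        resid = [v - mean_y - slope * (t - mean_t) for t, v in zip(times, y)]
    else:
        # residuals from a regression on an intercept only
        resid = [v - mean_y for v in y]

    # partial sums S_t of the residuals
    partial, running = [], 0.0
    for e in resid:
        running += e
        partial.append(running)

    # long-run variance estimate with Bartlett weights (assumed kernel)
    s2 = sum(e * e for e in resid) / T
    for s in range(1, lags + 1):
        weight = 1.0 - s / (lags + 1.0)
        autocov = sum(resid[t] * resid[t - s] for t in range(s, T))
        s2 += 2.0 * weight * autocov / T

    return sum(p * p for p in partial) / (T ** 2 * s2)
```

Large values of the statistic are evidence against the null of stationarity, so the test is one-sided in the upper tail.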
The KPSS tests handle the serial correlation in a way similar to that of the
Phillips-Perron tests for unit roots. LMC, on the other hand, propose to use the
augmented autoregression to handle serial correlation, which is similar in a way
to that of the Augmented Dickey-Fuller tests for unit roots. Since any
stationary structure can be represented by autoregressive structures, LMC work
with transformed models of (1) and (2). That is, $\Phi(L)y_t = r_t + \beta t + \varepsilon_t$ for trend
stationary models, and $\Phi(L)y_t = r_t + \varepsilon_t$ for level stationary models, where $\Phi(L)$
is a polynomial in the lag operator L.
To construct the test statistics, one should estimate ARIMA(p, 1, 1) models
in order to remove the serial correlation first, and proceed with the whitened
series to get the LM test statistic as if there is no serial correlation. LMC label
the test statistic $s_\alpha$ for the level stationary models and $s_\beta$ for the trend stationary
models. Please see their paper for detailed descriptions and discussions of the
tests. Under the null, $s_\alpha \to_d \int_0^1 V(r)^2\,dr$ and $s_\beta \to_d \int_0^1 V_2(r)^2\,dr$.
LMC argue that their tests are superior to the KPSS tests due to the fact that the
augmented autoregression is used to control for serial correlation. Theoretically,
the LMC tests are more powerful than the KPSS tests because the LMC
test statistics are $O_p(T)$ under the alternative while the KPSS test statistics are
$O_p(T/l)$. This superiority is also shown through Monte Carlo simulation.[3]
The univariate model for testing for stationarity can be readily extended to
the panel data models. Let $y_{it}$, i = 1, . . . , N, t = 1, . . . , T, be the observed N
cross section units over a time span of T for which we want to test for stationarity.
Let us consider the following models.
Level stationarity: $y_{it} = r_{it} + \varepsilon_{it}$   (3)
Trend stationarity: $y_{it} = r_{it} + \beta_i t + \varepsilon_{it}$   (4)
where $r_{it} = r_{i,t-1} + u_{it}$, with the $r_{i0}$'s being fixed constants such that $r_{i0}$ is not
necessarily equal to $r_{j0}$ if $i \neq j$.[4]
Assumption
(i) $E(u_{it}u_{js}) = \sigma_{ui}^2$ if i = j and t = s, and 0 otherwise.
(ii) For each cross-section unit i, $\varepsilon_{it}$ either satisfies the strong mixing
conditions for the functional central limit theorem to hold with long-run
variance $\sigma_{\varepsilon i}^2$, or it can be expressed as a p-th order AR model.
(iii) $E(\varepsilon_{it}u_{js}) = 0$ for all i, j, t, s.
Note that assumption (i) adds heterogeneity to the error structure of $u$ by
allowing heteroskedasticity. Assumption (ii) also allows heteroskedasticity in
$\varepsilon$ while assumption (iii) rules out contemporaneous correlation and states that
$\varepsilon$ and $u$ are uncorrelated within units as well.
Define $q_i = \sigma_{ui}^2/\sigma_{\varepsilon i}^2$, that is, the $q_i$'s are the signal-to-noise ratios in each
cross-section unit. The null hypothesis can be expressed as $H_0: q_i = 0$ for all i. For
level stationary models, under $H_0$, each cross-section unit is stationary around
a level $r_{i0}$, which is not necessarily the same across the units. For trend
stationary models, under $H_0$, each cross-section unit is stationary around a
linear trend $r_{i0} + \beta_i t$, which is also not necessarily the same across the units. The
different levels and linear trends truly reflect the possibility of heterogeneity
across sections. The alternative hypothesis is $H_1: q_i > 0$ for all i. Here, we
introduce heterogeneity by allowing different signal-to-noise ratios across
sections. That is, the signal-to-noise ratios are only required to be greater than
0 but not necessarily to be the same under the alternative.
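To make the role of the signal-to-noise ratio concrete, the level stationary panel model can be simulated as in the sketch below. It is an illustration under stated assumptions: Gaussian errors, a common ratio `q` for all units, and a hypothetical function name; `q = 0` corresponds to the null and `q > 0` to the alternative.

```python
import math
import random


def simulate_stationarity_panel(N, T, q, sigma_eps=1.0, levels=None, seed=0):
    """Simulate y_it = r_it + eps_it with r_it = r_{i,t-1} + u_it.

    q is the (assumed common) signal-to-noise ratio Var(u) / Var(eps);
    levels holds the fixed initial values r_i0, one per unit.
    """
    rng = random.Random(seed)
    sigma_u = math.sqrt(q) * sigma_eps
    if levels is None:
        levels = [0.0] * N
    panel = []
    for i in range(N):
        r = levels[i]  # unit-specific level r_i0
        series = []
        for _ in range(T):
            r += rng.gauss(0.0, sigma_u)  # random walk component
            series.append(r + rng.gauss(0.0, sigma_eps))
        panel.append(series)
    return panel


if __name__ == "__main__":
    null_panel = simulate_stationarity_panel(N=15, T=50, q=0.0)
    alt_panel = simulate_stationarity_panel(N=15, T=50, q=0.001)
    print(len(null_panel), len(alt_panel[0]))
```

Passing different entries in `levels` reproduces the heterogeneity in $r_{i0}$ that the model allows under the null.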
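Let $\eta_{\mu i}$ and $\eta_{\tau i}$ be the individual KPSS test statistics for the i-th unit in the
level and trend stationary models, respectively. Define
$\psi_1 = \int_0^1 V(r)^2\,dr$ and $\psi_2 = \int_0^1 V_2(r)^2\,dr$. We define the group
mean tests as

$$\bar\eta_\mu = \frac{1}{\sqrt{N}}\sum_{i=1}^{N}\frac{\eta_{\mu i} - E(\psi_1)}{\sqrt{\mathrm{Var}(\psi_1)}} \quad \text{and} \quad \bar\eta_\tau = \frac{1}{\sqrt{N}}\sum_{i=1}^{N}\frac{\eta_{\tau i} - E(\psi_2)}{\sqrt{\mathrm{Var}(\psi_2)}}.$$

Similarly, based on the individual LMC test statistics $s_{\alpha i}$ and $s_{\beta i}$,

$$\bar{s}_\alpha = \frac{1}{\sqrt{N}}\sum_{i=1}^{N}\frac{s_{\alpha i} - E(\psi_1)}{\sqrt{\mathrm{Var}(\psi_1)}} \quad \text{and} \quad \bar{s}_\beta = \frac{1}{\sqrt{N}}\sum_{i=1}^{N}\frac{s_{\beta i} - E(\psi_2)}{\sqrt{\mathrm{Var}(\psi_2)}}.$$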
By using the sequential limit theorem, it can be shown that under the null, all
four test statistics have the standard normal distribution asymptotically
under the assumptions spelled out earlier. Note that the sequential limit theorem
requires that T goes to infinity followed by N going to infinity, and the
asymptotic normality can be established by an application of the Lindeberg-Lévy
central limit theorem.[5] The consistency of the tests follows from the consistency of
the univariate tests established in the literature. It should be noted that the tests
are still consistent in the case of a mixed alternative hypothesis in which only
part of the panel is nonstationary while the rest is stationary, as long as
$\lim_{N \to \infty} N_1/N > 0$, where $N_1$ is the number of nonstationary series under the
alternative.
Hadri (1998) used the characteristic function given by Anderson & Darling
(1952) to compute the means and the variances of the limiting distributions of
the individual statistics. For the level stationary
model, the mean is 1/6 and the variance is 1/45 while for the trend stationary
model, the mean is 1/15 and the variance is 11/6300. However, as suggested in
Im, Pesaran & Shin (1997), one can use the mean and the variance of small
sample distributions (in finite T) obtained via simulations to enhance the finite
sample performances of the group mean tests.[6]
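The standardization behind the group mean tests can be sketched as follows, using the asymptotic moments quoted above. The function names are illustrative, the upper-tail p-value reflects the usual one-sided use of stationarity tests, and the optional `moments` argument is an assumed hook for the simulated finite-T moments mentioned in the text.

```python
import math
from statistics import NormalDist

# Asymptotic (mean, variance) of the limiting distributions quoted in the text
ASYMPTOTIC_MOMENTS = {
    "level": (1.0 / 6.0, 1.0 / 45.0),
    "trend": (1.0 / 15.0, 11.0 / 6300.0),
}


def group_mean_statistic(individual_stats, model="level", moments=None):
    """sqrt(N) * (mean of individual statistics - E) / sqrt(Var)."""
    mean, var = moments if moments is not None else ASYMPTOTIC_MOMENTS[model]
    n_units = len(individual_stats)
    average = sum(individual_stats) / n_units
    return math.sqrt(n_units) * (average - mean) / math.sqrt(var)


def upper_tail_p_value(z):
    """P(Z > z) for a standard normal Z."""
    return 1.0 - NormalDist().cdf(z)
```

For example, `upper_tail_p_value(group_mean_statistic(stats, model="trend"))` would give an asymptotic p-value for a list `stats` of individual trend-stationarity statistics.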
The group mean test pools independent individual test statistics to find
evidence on the composite null. In the literature, there is another way to pool
information from individual test to test the composite null, which is due to
Fisher (1932). The idea has been applied to develop panel unit root tests in
Maddala & Wu (1999) and panel cointegration tests in Wu & Yin (1999). Both
the KPSS and the LMC tests can be used to formulate the Fisher tests to test
for stationarity as well. Let $P_i$ be the p-value of the individual test for
stationarity for the i-th unit (using either the KPSS or the LMC test). Define the
Fisher test statistic as $-2\sum_{i=1}^{N} \ln P_i$, which has a $\chi^2$ distribution with
2N degrees of freedom under the null hypothesis that $q_i = 0$ for all i. Note that
tradition in the literature. It should be noted that all our tests are consistent even
when only part of the series are non-stationary under the alternative
as long as the portion of nonstationary units is non-vanishing asymptotically.
Furthermore, we only consider the alternative $H_1: q_i = q = 0.001$ for simplicity.[9]
We consider time dimensions of 25, 50, and 100 and cross sectional
dimensions of 15, 25, 50, and 100. The normal variates are generated by
RNDN function in the matrix programming language GAUSS. We apply the
group mean and Fisher tests based on the LM, KPSS, and LMC tests to each
panel. For each case, the number of iterations is 5,000. For the group mean test,
the mean and the variance of small sample distributions are derived from
100,000 simulations for the corresponding time span and test procedures. For
the Fisher test, the small sample distributions are simulated using 100,000
replications as well.
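Once individual p-values are available (in the experiments they come from the simulated small sample distributions), the Fisher-type pooling is a short computation: minus twice the sum of log p-values is referred to a chi-square distribution with 2N degrees of freedom. The sketch below is illustrative; the function names are assumptions, and the chi-square tail probability uses the closed form that holds for even degrees of freedom.

```python
import math


def fisher_statistic(p_values):
    """-2 * sum(log p_i) over the N individual p-values."""
    if not all(0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    return -2.0 * sum(math.log(p) for p in p_values)


def chi2_sf_even_df(x, n_units):
    """P(chi-square with 2 * n_units degrees of freedom > x).

    For even degrees of freedom the tail probability has the closed form
    exp(-x/2) * sum_{k=0}^{N-1} (x/2)^k / k!.
    """
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, n_units):
        term *= half / k
        total += term
    return math.exp(-half) * total


def fisher_p_value(p_values):
    """Pooled p-value of the Fisher test."""
    return chi2_sf_even_df(fisher_statistic(p_values), len(p_values))
```

With a single unit the pooled p-value equals the individual p-value, which mirrors the N = 1 benchmark case.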
In order to carry out our experiments, we still need to select two parameters.
One is the truncation parameter l in the individual KPSS tests and the other one
is the order of autoregression p in the individual LMC tests. Following
earlier simulation results regarding the univariate KPSS tests in the literature,
we choose $l_1 = \mathrm{int}[4(T/100)^{1/4}]$, $l_2 = \mathrm{int}[8(T/100)^{1/4}]$, and
$l_3 = \mathrm{int}[12(T/100)^{1/4}]$.
Also, following earlier simulation results in the literature, we choose the Parzen
window instead of the Bartlett window used by KPSS as the former performs
better than the latter. For the LMC test, we experiment with p = 1, 2, and 3
following Monte Carlo experiments by LMC.
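The truncation parameters follow the rule int[k(T/100)^{1/4}]. The short sketch below tabulates them for the time dimensions used in the experiments; the multipliers 8 and 12 are as stated for $l_2$ and $l_3$, while the multiplier 4 for $l_1$ is an assumption based on the usual KPSS convention. The function name is illustrative.

```python
def truncation_lag(T, multiplier):
    """Integer part of multiplier * (T / 100)^(1/4)."""
    return int(multiplier * (T / 100.0) ** 0.25)


if __name__ == "__main__":
    for T in (25, 50, 100):
        # multipliers: 4 (assumed for l1), 8 (l2), 12 (l3)
        print(T, [truncation_lag(T, k) for k in (4, 8, 12)])
```

Because the rule grows slowly in T, the three choices differ mainly in how many autocovariances enter the long-run variance estimate at a given sample size.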
Let us first look at the white noise case. In this case i = 0 and the tests based
on the individual LM tests are the appropriate ones to be used. Table 1 presents
the sizes of the group mean and the Fisher tests based on the LM, KPSS, and
LMC tests for the level stationary model. Note that by choosing l = 0 in the
KPSS test or p = 0 in the LMC test, the resulting test statistic is nothing but that
of the LM test. That is why the results for the tests based on the LM test are
listed in the column with the heading of p(l) = 0. We also listed the results for
N = 1 as a benchmark, where the results simply replicate those for the univariate
case. As we can see from the table, the size performances of the panel
stationarity tests are quite satisfactory in this case. In addition the performances
are relatively better as T gets larger. In most cases, the Fisher tests have better
size performances than the group mean tests, especially for larger T and smaller
N. This is not surprising as the Fisher test is an exact test while the group mean
Table 1.
l1
KPSS
l2
l3
p=1
LMC
p=2
p=3
0.047
0.049
0.053
0.055
0.047
0.049
0.051
15
25
50
100
0.061
0.053
0.054
0.046
0.063
0.057
0.054
0.046
0.059
0.058
0.054
0.050
0.063
0.063
0.059
0.051
15
25
50
100
0.050
0.045
0.047
0.043
0.047
0.048
0.050
0.043
0.056
0.051
0.053
0.052
0.046
0.048
0.046
0.041
0.045
0.044
0.047
0.046
0.052
0.053
0.052
0.047
0.047
0.046
0.051
0.047
0.050
0.050
15
25
50
100
0.066
0.066
0.056
0.057
0.059
0.062
0.053
0.054
0.051
0.067
0.059
0.056
0.065
0.070
0.065
0.061
0.058
0.067
0.057
0.055
15
25
50
100
0.052
0.054
0.049
0.051
0.050
0.052
0.045
0.050
0.054
0.053
0.048
0.057
0.049
0.054
0.049
0.048
0.050
0.055
0.055
0.054
0.046
0.053
0.049
0.051
0.051
0.050
0.049
0.048
0.045
0.049
15
25
50
100
0.056
0.057
0.056
0.056
0.058
0.057
0.057
0.058
0.060
0.061
0.062
0.059
0.063
0.062
0.054
0.056
0.069
0.063
0.064
0.059
15
25
50
100
0.047
0.047
0.049
0.053
0.046
0.049
0.051
0.053
0.045
0.050
0.051
0.053
0.049
0.049
0.047
0.052
0.048
0.049
0.053
0.051
N
1
25
50
100
Fisher Test
0.051
0.048
0.052
0.047
0.047
Fisher Test
0.050
0.053
0.043
0.054
0.049
Fisher Test
0.047
0.051
0.051
0.053
0.046
0.051
0.049
0.051
Note:
1. The data generating process is $y_{it} = r_{i0} + \varepsilon_{it}$, with $\varepsilon_{it} \sim$ i.i.d. $N(0, \sigma^2_{\varepsilon i})$.
2. Please see the text for the choices of parameters.
3. $l_i$ is the truncation parameter used in the individual KPSS tests and p is the order of autoregression
in the ARIMA(p,1,1) model used in the individual LMC tests. p(l) = 0 indicates that the individual LM test is used.
test is an asymptotic test (in N). As for the tests based on the KPSS tests with
different lag truncation parameters and the LMC tests with different
autoregression orders, the sizes are also quite close to the nominal size of 5%.
In general, we also observe that the size performances are better for larger T
and the Fisher tests have better size performances in this case.
Table 2 presents the powers of the panel stationarity tests for the level
stationary models. To make things comparable, all the powers are adjusted
according to their true sizes. The powers of the LM based tests clearly demonstrate the
superiority of the panel stationarity tests over their univariate counterparts. When
T = 25, the power of the univariate LM test is only 0.117, while the power
jumps to 0.392 when 15 cross-section units are used, and it is close to 1 (0.954
for the group mean test and 0.952 for the Fisher test) when N = 100. As a matter
of fact, all the powers for T = 100 are 1 and they are close to 1 when T = 50.
The powers of the group mean and the Fisher tests in most cases are almost the
same.
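The size adjustment mentioned above can be sketched as follows, assuming a right-tailed test whose critical value is taken from the empirical distribution of the statistic simulated under the null. The function names and the quantile convention are illustrative assumptions, not the authors' code.

```python
def empirical_critical_value(null_stats, alpha=0.05):
    """Empirical upper-tail critical value from statistics simulated under the null."""
    ordered = sorted(null_stats)
    k = min(len(ordered) - 1, int((1.0 - alpha) * len(ordered)))
    return ordered[k]

def size_adjusted_power(null_stats, alt_stats, alpha=0.05):
    """Rejection frequency under the alternative, using the critical value
    that delivers (approximately) size alpha under the null."""
    cv = empirical_critical_value(null_stats, alpha)
    return sum(1 for s in alt_stats if s > cv) / len(alt_stats)
```

Because the critical value comes from the simulated null distribution rather than from asymptotic tables, tests with different actual sizes can be compared on an equal footing.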
It is documented in the literature that increasing the lag truncation parameter
l in the KPSS tests and the autoregression order p in the LMC tests can reduce
the powers. This is replicated in Table 2 in the entries for N = 1. However,
because the panel stationarity tests are so powerful, the reduction in
power from overestimating l or p is not an issue in some cases, especially for larger T
and N, as in those cases the powers are 1 or close to 1. This is a unique feature
of panel stationarity tests. The reduction in power is smaller for the LMC tests
as p increases than for the KPSS tests as l increases.
The size and power performances of panel stationarity tests in the case of
white noise for the trend stationary models are reported in Tables 3 and 4. We
have similar observations in these two tables. One thing we need to point out
is that in this case the powers are smaller than those of level stationary models,
especially for the case of T = 25 where the powers are much smaller. The
powers are only 0.280 for the group mean test and 0.279 for the Fisher test even
when N = 100, though these represent a roughly fourfold increase over the
univariate case.
Next, let us look at the results for the case of serial correlation. Table 5 gives
the sizes of the panel stationarity tests in this case. Note that size distortions are
expected for the tests based on the LM tests. This can be seen in the table for
the case of N = 1. But the size distortions become much worse as N increases.
As a matter of fact, the actual sizes are close to 1 when N = 100. This is due to
the fact that the size distortions are amplified through pooling the
cross-sectional units, as pointed out in Wu & Yin (1999) for the panel cointegration
tests as well. The size distortions are still quite severe when l1 is used in the
KPSS tests and they become moderate when l2 and l3 are used for T = 50 and
Table 2.
l1
KPSS
l2
l3
p=1
LMC
p=2
p=3
0.117
0.110
0.089
0.074
0.105
0.094
0.086
15
25
50
100
0.392
0.546
0.775
0.954
0.305
0.414
0.630
0.874
0.263
0.341
0.527
0.779
0.233
0.306
0.473
0.727
15
25
50
100
0.384
0.542
0.771
0.952
0.362
0.492
0.719
0.936
0.156
0.236
0.359
0.584
0.302
0.408
0.635
0.873
0.264
0.346
0.526
0.780
0.235
0.308
0.477
0.729
0.302
0.284
0.224
0.277
0.251
0.218
15
25
50
100
0.961
0.995
1.000
1.000
0.939
0.990
1.000
1.000
0.931
0.986
1.000
1.000
0.884
0.969
0.999
1.000
0.835
0.941
0.999
1.000
15
25
50
100
0.960
0.995
1.000
1.000
0.938
0.991
1.000
1.000
0.828
0.946
0.998
1.000
0.932
0.986
1.000
1.000
0.891
0.972
0.999
1.000
0.836
0.944
0.999
1.000
0.583
0.536
0.455
0.566
0.547
0.512
15
25
50
100
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
15
25
50
100
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
N
1
25
50
100
Note:
1. The data generating
it ~ i.i.d.N(0, 2i).
2. See Note 2 in Table 1.
3. See Note 3 in Table 1.
process
Fisher Test
0.271
0.381
0.576
0.835
0.268
Fisher Test
0.908
0.978
1.000
1.000
0.495
is
Fisher Test
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
and
Table 3.
l1
KPSS
l2
l3
p=1
LMC
p=2
p=3
0.052
0.052
0.060
0.054
0.051
0.050
0.054
15
25
50
100
0.065
0.064
0.066
0.062
0.071
0.073
0.064
0.060
0.061
0.067
0.065
0.060
0.065
0.067
0.063
0.063
15
25
50
100
0.057
0.054
0.059
0.054
0.055
0.051
0.057
0.053
0.050
0.051
0.054
0.056
0.055
0.057
0.055
0.053
0.051
0.053
0.056
0.053
0.058
0.059
0.057
0.057
0.046
0.047
0.049
0.046
0.047
0.050
15
25
50
100
0.050
0.060
0.057
0.058
0.055
0.053
0.055
0.054
0.064
0.062
0.068
0.064
0.073
0.073
0.075
0.072
0.068
0.069
0.068
0.074
15
25
50
100
0.049
0.051
0.052
0.054
0.047
0.048
0.049
0.052
0.056
0.055
0.052
0.056
0.049
0.048
0.055
0.056
0.053
0.056
0.061
0.066
0.051
0.053
0.054
0.064
0.046
0.042
0.042
0.043
0.050
0.048
15
25
50
100
0.061
0.057
0.059
0.054
0.062
0.057
0.059
0.053
0.064
0.063
0.062
0.062
0.070
0.065
0.068
0.060
0.074
0.068
0.066
0.059
15
25
50
100
0.052
0.049
0.052
0.048
0.051
0.050
0.051
0.049
0.052
0.053
0.053
0.054
0.053
0.055
0.060
0.054
0.060
0.057
0.057
0.053
N
1
25
50
100
Fisher Test
0.052
0.052
0.055
0.056
0.045
Fisher Test
0.050
0.049
0.049
0.049
0.041
Fisher Test
0.051
0.050
0.051
0.050
0.052
0.049
0.051
0.048
Note:
1. The data generating process is yit = ri0 + it + it, and it ~ i.i.d.N(0, 2i).
2. See Note 2 in Table 1.
3. See Note 3 in Table 1.
Table 4.
l1
KPSS
l2
l3
p=1
LMC
p=2
p=3
0.068
0.060
0.047
0.045
0.061
0.061
0.058
15
25
50
100
0.108
0.144
0.172
0.280
0.091
0.090
0.118
0.185
0.090
0.092
0.109
0.164
0.080
0.083
0.102
0.132
15
25
50
100
0.109
0.144
0.163
0.279
0.103
0.138
0.157
0.257
0.040
0.030
0.027
0.024
0.089
0.090
0.127
0.187
0.086
0.092
0.109
0.169
0.079
0.083
0.098
0.140
0.133
0.124
0.079
0.120
0.106
0.095
15
25
50
100
0.485
0.629
0.867
0.986
0.426
0.576
0.806
0.971
0.374
0.509
0.723
0.937
0.287
0.374
0.564
0.817
0.252
0.317
0.488
0.718
15
25
50
100
0.490
0.631
0.864
0.985
0.427
0.574
0.805
0.968
0.153
0.205
0.311
0.488
0.385
0.518
0.740
0.939
0.293
0.400
0.581
0.833
0.252
0.336
0.509
0.730
0.341
0.317
0.231
0.321
0.272
0.239
15
25
50
100
0.991
1.000
1.000
1.000
0.975
1.000
1.000
1.000
0.978
0.999
1.000
1.000
0.946
0.994
1.000
1.000
0.873
0.974
1.000
1.000
15
25
50
100
0.990
1.000
1.000
1.000
0.972
1.000
1.000
1.000
0.979
0.999
1.000
1.000
0.950
0.995
1.000
1.000
0.888
0.982
1.000
1.000
N
1
25
50
100
Fisher Test
0.067
0.070
0.084
0.108
0.106
Fisher Test
0.325
0.445
0.673
0.898
0.275
Fisher Test
0.937
0.992
1.000
1.000
0.868
0.972
1.000
1.000
Note:
1. The data generating process is yit = rit + it + it, rit = ri,t 1 + it, it ~ i.i.d.N(0, q2i), and
it ~ i.i.d.N(0, 2i).
2. See Note 2 in Table 1.
3. See Note 3 in Table 1.
Table 5.
25
50
100
p(l) = 0
KPSS
l1
l2
l3
LMC
p=1
p=2
p=3
0.079
0.059
0.050
0.047
0.051
0.054
0.058
15
25
50
100
0.532
0.694
0.904
0.993
0.232
0.302
0.433
0.657
0.130
0.144
0.181
0.212
0.150
0.182
0.230
0.314
0.152
0.172
0.221
0.328
15
25
50
100
0.490
0.669
0.897
0.993
0.205
0.270
0.401
0.641
0.028
0.024
0.016
0.012
0.104
0.123
0.156
0.212
0.129
0.160
0.210
0.314
0.129
0.150
0.206
0.328
0.080
0.057
0.046
0.055
0.060
0.059
15
25
50
100
0.551
0.747
0.945
0.999
0.099
0.102
0.117
0.145
0.126
0.144
0.182
0.250
0.138
0.161
0.209
0.286
15
25
50
100
0.517
0.729
0.944
0.999
0.140
0.190
0.279
0.456
0.050
0.050
0.047
0.047
0.077
0.082
0.091
0.116
0.096
0.113
0.155
0.213
0.109
0.137
0.178
0.264
0.094
0.062
0.052
0.052
0.058
0.057
15
25
50
100
0.563
0.783
0.944
0.998
0.077
0.082
0.081
0.087
0.086
0.094
0.096
0.106
0.099
0.114
0.124
0.145
15
25
50
100
0.532
0.773
0.943
0.998
0.109
0.148
0.193
0.293
0.057
0.064
0.066
0.070
0.066
0.075
0.072
0.083
0.074
0.083
0.092
0.112
Fisher Test
0.066
0.067
0.072
0.089
0.050
Fisher Test
0.070
0.077
0.095
0.128
0.053
Fisher Test
0.062
0.071
0.083
0.098
0.052
0.056
0.059
0.062
Note:
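1. The data generating process is $y_{it} = r_{i0} + \varepsilon_{it}$, $\varepsilon_{it} = \rho_i \varepsilon_{i,t-1} + u_{it}$, and $u_{it} \sim$ i.i.d. $N(0, (1 - \rho_i^2)\sigma^2_{\varepsilon i})$.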
2. See Note 2 in Table 1.
3. See Note 3 in Table 1.
100. For the LMC test, the size distortion is still considerable when the
true order of autoregression (p = 1) is used and T = 25. The size distortions
become smaller and moderate when T increases to 50 and 100. Interestingly,
overestimating p in this case increases the size distortions. We can also observe
that the Fisher tests in general have better size performances than the group
mean tests.
Table 6 reports the power performances of the panel stationarity tests in the
presence of serial correlation. The first thing we can notice is that the powers
are lower than those in the white noise case for some combinations of N and
T. The powers are around 60% even when N = 100 and T = 25 for the KPSS tests
with l2 and the LMC tests with p = 1, which have relatively moderate size
distortions. The powers are close to 1 when N is larger than 50 and T = 50 for
these two tests (the group mean and Fisher tests). When T = 100, however, all
the powers are still 1 or very close to 1. In such a case, smaller size distortion
would be the primary criterion for deciding which test to use in practice. The
powers of the KPSS tests with l2 and the LMC tests with p = 1 are almost the
same for most cases, though the results for N = 1 actually indicate that the latter
has an advantage in the univariate case, which agrees with the findings in LMC.
There are almost no differences in the power performances of the group mean
and the Fisher tests.
The size distortions of the panel stationarity tests for the trend stationary
models with serial correlation are presented in Table 7 with size adjusted
powers presented in Table 8. For the size distortions, we have the same
observations as those for the level stationary models. Quite interestingly, the
KPSS tests with l2 have a slight edge over the LMC tests with p = 1 when T = 50
while the situation is reversed when T = 100. But we observe severe negative
size distortions for the KPSS tests with l2 when T = 25. Except for this case, the
size distortions for these two tests are smaller than the corresponding ones in
the level stationary models. The Fisher tests have relatively better size
performances than the group mean tests, especially when the individual LMC
tests are used. As for the adjusted powers, we note only that they are lower
than those for the level stationary models; in other respects the patterns are
much the same. For the KPSS tests with l2 and the
LMC tests with p = 1, the powers are about 70% even when N = 100 for T = 50,
compared with powers of 1 in the same situation for the level stationary
models. The powers are close to 1 when T = 100 and there are more than 25
cross-section units in the panel.
In summary, through Monte Carlo simulations, we found the tests we
proposed have quite satisfactory small sample performances in most cases we
considered. In the absence of serial correlation, the tests based on the LM tests
Table 6.
KPSS
l1
l2
l3
LMC
p=1
p=2
p=3
0.153
0.109
0.095
0.079
0.100
0.095
0.089
15
25
50
100
0.249
0.338
0.489
0.754
0.207
0.250
0.394
0.588
0.174
0.207
0.329
0.532
0.157
0.204
0.302
0.479
15
25
50
100
0.247
0.337
0.484
0.750
0.228
0.316
0.488
0.729
0.163
0.209
0.301
0.466
0.212
0.248
0.394
0.584
0.171
0.207
0.331
0.534
0.161
0.200
0.304
0.490
0.316
0.242
0.197
0.235
0.198
0.183
15
25
50
100
0.886
0.862
0.998
1.000
0.831
0.939
0.996
1.000
0.775
0.912
0.993
1.000
0.712
0.854
0.981
1.000
0.643
0.813
0.967
0.999
15
25
50
100
0.885
0.962
1.000
1.000
0.833
0.941
1.000
1.000
0.673
0.844
1.000
1.000
0.774
0.917
1.000
1.000
0.723
0.858
1.000
1.000
0.651
0.812
1.000
1.000
0.530
0.500
0.429
0.524
0.490
0.471
15
25
50
100
1.000
1.000
1.000
1.000
0.999
1.000
1.000
1.000
1.000
1.000
1.000
1.000
0.999
1.000
1.000
1.000
0.998
1.000
1.000
1.000
15
25
50
100
1.000
1.000
1.000
1.000
0.999
1.000
1.000
1.000
1.000
1.000
1.000
1.000
0.999
1.000
1.000
1.000
0.999
1.000
1.000
1.000
N
1
25
50
100
Fisher Test
0.205
0.269
0.412
0.620
0.219
Fisher Test
0.772
0.910
1.000
1.000
0.468
Fisher Test
0.999
1.000
1.000
1.000
0.996
1.000
1.000
1.000
Note:
1. The data generating process is yit = ri0 + it, rit = ri,t 1 + it, it ~ i.i.d.N(0, q2i), it = ii,t 1 + uit,
and uit ~ i.i.d.N(0, (1 2i)2i).
2. See Note 2 in Table 1.
3. See Note 3 in Table 1.
Table 7.
KPSS
l1
l2
l3
LMC
p=1
p=2
p=3
0.091
0.067
0.044
0.003
0.056
0.057
0.066
15
25
50
100
0.657
0.808
0.975
0.999
0.144
0.151
0.183
0.223
0.156
0.181
0.267
0.377
0.142
0.174
0.245
0.338
15
25
50
100
0.610
0.775
0.966
0.999
0.226
0.292
0.459
0.700
0.003
0.002
0.000
0.000
0.108
0.121
0.149
0.185
0.134
0.158
0.239
0.362
0.127
0.157
0.231
0.333
0.094
0.060
0.040
0.057
0.062
0.061
15
25
50
100
0.758
0.931
0.991
1.000
0.177
0.252
0.332
0.524
0.079
0.091
0.092
0.091
0.134
0.160
0.189
0.251
0.160
0.194
0.237
0.341
15
25
50
100
0.717
0.913
0.988
1.000
0.155
0.224
0.305
0.500
0.017
0.010
0.007
0.002
0.060
0.066
0.072
0.067
0.096
0.118
0.138
0.189
0.120
0.159
0.198
0.297
0.092
0.053
0.041
0.049
0.048
0.054
15
25
50
100
0.789
0.928
0.998
1.000
0.138
0.171
0.259
0.377
0.062
0.056
0.053
0.051
0.076
0.076
0.075
0.074
0.098
0.101
0.115
0.133
15
25
50
100
0.752
0.911
0.997
1.000
0.114
0.148
0.236
0.354
0.046
0.043
0.046
0.046
0.052
0.057
0.055
0.051
0.064
0.077
0.081
0.091
N
1
25
50
100
Fisher Test
0.014
0.012
0.005
0.001
0.051
Fisher Test
0.049
0.050
0.050
0.056
0.044
Fisher Test
0.052
0.054
0.062
0.063
0.039
0.036
0.031
0.027
Note:
1. The data generating process is yit = rit + it + it, it = ii,t 1 + uit, and uit ~ i.i.d.N(0, (1 2i)2i).
2. See Note 2 in Table 1.
3. See Note 3 in Table 1.
Table 8.
KPSS
l1
l2
l3
LMC
p=1
p=2
p=3
0.065
0.060
0.051
0.044
0.064
0.065
0.054
15
25
50
100
0.055
0.088
0.130
0.203
0.059
0.072
0.086
0.120
0.066
0.078
0.100
0.122
0.064
0.071
0.090
0.090
15
25
50
100
0.052
0.088
0.132
0.203
0.067
0.085
0.119
0.178
0.053
0.042
0.037
0.032
0.056
0.072
0.091
0.119
0.061
0.077
0.098
0.121
0.066
0.076
0.090
0.089
0.123
0.109
0.087
0.100
0.090
0.086
15
25
50
100
0.389
0.381
0.693
0.905
0.311
0.381
0.603
0.827
0.240
0.302
0.437
0.674
0.207
0.238
0.367
0.548
0.186
0.190
0.324
0.478
15
25
50
100
0.389
0.377
0.696
0.903
0.312
0.384
0.608
0.829
0.140
0.216
0.270
0.403
0.234
0.312
0.444
0.680
0.203
0.247
0.374
0.554
0.189
0.199
0.333
0.502
0.302
0.264
0.208
0.273
0.236
0.200
15
25
50
100
0.935
0.993
1.000
1.000
0.908
0.984
1.000
1.000
0.881
0.976
1.000
1.000
0.816
0.930
0.995
1.000
0.707
0.855
0.981
1.000
15
25
50
100
0.934
0.993
1.000
1.000
0.902
0.984
1.000
1.000
0.886
0.974
1.000
1.000
0.823
0.939
0.997
1.000
0.735
0.869
0.985
1.000
N
1
25
50
100
Fisher Test
0.072
0.078
0.087
0.106
0.097
Fisher Test
0.252
0.330
0.481
0.694
0.235
Fisher Test
0.849
0.955
0.998
1.000
0.754
0.882
0.990
1.000
Note:
1. The data generating process is yit = rit + it + it, rit = ri,t 1 + it, it ~ i.i.d.N(0, q2i), it =
ii,t 1 + uit, and uit ~ i.i.d.N(0, (1 2i)2i)
2. See Note 2 in Table 1.
3. See Note 3 in Table 1.
have sizes close to the nominal size and powers much higher than the univariate
LM tests. Using the KPSS and LMC tests in this case would not result in much
size distortion, but would result in power losses for some combinations of N
and T, while the powers are already 1 or close to 1 for other combinations of
N and T. In the presence of serial correlation, we found that the tests based on
the KPSS tests with l2 and the LMC tests with p = 1 have relatively good size
performances though there are still moderate to severe size distortions when the
time span is short (T = 25), especially for the trend stationary models. And the
powers of all tests are lower than their counterparts in the white noise case.
Overall, the Fisher tests have better size performances than the group mean
tests while their power performances are almost the same.
IV. CONCLUSION
In this chapter, we developed several tests for stationarity in heterogeneous
panels. We analyzed both level stationary and trend stationary models. Allowing
the maximum degree of heterogeneity in the panel, we considered two
different ways to pool information regarding the null hypothesis from each
cross-section unit: the group mean test and the Fisher test. The group
mean test pools the information of the univariate test statistics while the Fisher
test summarizes the p-values of the individual tests. For the univariate
stationarity tests, we consider the KPSS and LMC tests in the case of serial
correlation. The group mean tests based on the KPSS and LMC tests are
asymptotically normal while the Fisher test statistics follow $\chi^2$ distributions.
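The two pooling schemes summarized above can be sketched as follows. This is a hedged illustration: the Fisher statistic is taken to be the usual -2 times the sum of log p-values, referred to a chi-square distribution with 2N degrees of freedom under independence across units, and the group mean statistic is standardized with a null mean and variance that are assumed to have been obtained by simulation. All function names are hypothetical.

```python
import math

def fisher_statistic(p_values):
    """Fisher combination statistic: -2 * sum of log individual p-values."""
    return -2.0 * sum(math.log(p) for p in p_values)

def chi2_sf_even_df(x, n):
    """Survival function of a chi-square variable with 2n degrees of freedom,
    using the closed form available for even degrees of freedom."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= half / k
        total += term
    return math.exp(-half) * total

def fisher_test(p_values):
    """Return the Fisher statistic and its chi-square(2N) p-value."""
    stat = fisher_statistic(p_values)
    return stat, chi2_sf_even_df(stat, len(p_values))

def group_mean_z(stats, null_mean, null_var):
    """Standardized group mean statistic: sqrt(N) * (average - mean) / sd,
    with the null mean and variance of the individual statistic supplied."""
    n = len(stats)
    return math.sqrt(n) * (sum(stats) / n - null_mean) / math.sqrt(null_var)
```

With a single cross-section unit, `fisher_test` returns the individual p-value unchanged, which mirrors the N = 1 benchmark rows of the tables.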
The small sample performances of the tests were investigated via Monte
Carlo simulation experiments. The results of simulations showed that the tests
we proposed have quite satisfactory size and power performances. In general,
the Fisher type tests have better size performances than the group mean type
tests while they have similar power performances. The tests based on the KPSS
tests with l2 and the LMC tests with p = 1 perform very similarly in terms of
size and power in most cases when there is serial correlation, except for the
short time span (T = 25). The size performances of these two tests are quite
good in the presence of serial correlation when T = 50 and 100. However, there
are still moderate to severe size distortions when T = 25 in the presence of serial
correlation. In such a case, a bootstrap method might be an effective way to
obtain better size performances. This would be an interesting topic for future
research. Based on our simulation results, we recommend using
either the group mean tests or the Fisher tests, based on both the
KPSS tests with l2 and the LMC tests with p = 1, to test for stationarity in
heterogeneous panel data models in empirical work.
ACKNOWLEDGMENTS
We would like to thank Badi Baltagi and three anonymous referees for their
helpful comments. Of course, all remaining errors are ours.
NOTES
1. See, for example, Schwert (1987).
2. See KPSS for all relevant references and derivations of the tests.
3. Please see LMC for the details of this argument. Of course, this supremacy
depends on the correct specification of the LMC model, as pointed out by one
anonymous referee.
4. This means that the intercepts in different cross-section units can be different, one
aspect of the heterogeneous panel.
5. The moment restriction in applying the Lindeberg-Lévy CLT should not be a
problem here because all tests are variants of the LM tests, which are bounded.
6. The small sample distributions of these tests can be derived by simulating series
of given T under the null and applying the given test to the simulated series over a
prespecified number of iterations.
7. In a recent paper, Choi (2000) proposes to standardize the Fisher test statistics as
well. But this is unnecessary unless N is large enough.
8. Please see Maddala & Wu (1999) for a detailed comparison between the group
mean and the Fisher tests.
9. By construction of the tests, the $q_i$'s can be different across the units.
REFERENCES
Anderson, T. W., & Darling, D. A. (1952). Asymptotic Theory of Certain Goodness of Fit
Criteria Based on Stochastic Processes. Annals of Mathematical Statistics, 23, 193-212.
Baltagi, B., & Kao, C. (2000). Nonstationary Panels, Cointegration in Panels and Dynamic Panels:
A Survey. Advances in Econometrics, 15, 7-51.
Choi, I. (1999). Unit Root Tests for Panel Data. Manuscript, Kookmin University.
Fisher, R. A. (1932). Statistical Methods for Research Workers (4th ed.). Edinburgh: Oliver and
Boyd.
Hadri, K. (1998). Testing for Stationarity in Heterogeneous Panel Data. Working paper, School of
Business and Economics, Exeter University.
Im, K. S., Pesaran, M. H., & Shin, Y. (1997). Testing for Unit Roots in Heterogeneous Panels.
Discussion paper, University of Cambridge.
Kao, C. (1999). Spurious Regression and Residual-Based Tests for Cointegration in Panel Data.
Journal of Econometrics, 90, 1-44.
Kwiatkowski, D., Phillips, P. C. B., Schmidt, P., & Shin, Y. (1992). Testing the Null Hypothesis
of Stationarity Against the Alternative of a Unit Root. Journal of Econometrics, 54,
159-178.
Leybourne, S. J., & McCabe, B. P. M. (1994). A Consistent Test for a Unit Root. Journal of
Business and Economic Statistics, 12, 157-166.
Maddala, G. S., & Wu, S. (1999). A Comparative Study of Unit Root Tests with Panel Data and
a New Simple Test. Oxford Bulletin of Economics and Statistics, forthcoming.
McCoskey, S., & Kao, C. (1998). A Residual-Based Test of the Null of Cointegration in Panel
Data. Econometric Reviews, 17, 57-84.
McCoskey, S., & Kao, C. (1997). A Monte Carlo Comparison of Tests for Cointegration in Panel
Data. Working paper, Center for Policy Research and Department of Economics, Syracuse
University.
Newey, W. K., & West, K. D. (1987). A Simple Positive Semi-Definite Heteroskedasticity and
Autocorrelation Consistent Covariance Matrix. Econometrica, 55, 703-708.
Pedroni, P. (1995). Panel Cointegration: Asymptotic and Finite Sample Properties of Pooled Time
Series Tests With an Application to the PPP Hypothesis. Working paper, Department of
Economics, Indiana University.
Pedroni, P. (1997). Panel Cointegration: Asymptotic and Finite Sample Properties of Pooled Time
Series Tests With an Application to the PPP Hypothesis, New Results. Working paper,
Department of Economics, Indiana University.
Phillips, P. C. B., & Perron, P. (1988). Testing For a Unit Root in Time Series Regression.
Biometrika, 75, 335-346.
Wu, S., & Yin, Y. (1999). Tests for Cointegration in Heterogeneous Panel: A Monte Carlo
Comparison. Working paper, Department of Economics, State University of New York at
Buffalo.
INSTRUMENTAL VARIABLE
ESTIMATION OF SEMIPARAMETRIC
DYNAMIC PANEL DATA MODELS:
MONTE CARLO RESULTS ON
SEVERAL NEW AND EXISTING
ESTIMATORS
M. Douglas Berg, Qi Li and Aman Ullah
ABSTRACT
We consider the problem of instrumental variable estimation of semiparametric dynamic panel data models. We propose several new
semiparametric instrumental variable estimators for estimating a dynamic
panel data model. Monte Carlo experiments show that the new estimators
perform much better than the estimators suggested by Li & Stengos (1996)
and Li & Ullah (1998).
I. INTRODUCTION
Economic research has been enriched by the availability of panel data that
measure individual cross-sectional behavior over time. For reviews on the
literature of estimation and inference in parametric panel data models, see
Baltagi (1995), Chamberlain (1984), Hsiao (1986) and Matyas & Sevestre
Nonstationary Panels, Panel Cointegration and Dynamic Panels, Volume 15, pages 297-315.
© 2000 by Elsevier Science Inc.
ISBN: 0-7623-0688-2
(1.1)
(1.2)
When the error $u_{it}$ has a one-way error component structure, i.e. $u_{it} = \mu_i + \nu_{it}$,
then $y_{i,t-1}$ and $u_{it}$ are correlated and instrumental variable methods are needed
to obtain consistent estimation of $\gamma$.
There is a rich literature on how to obtain consistent and efficient estimation
results for parametric dynamic models, see Ahn & Schmidt (1995), Anderson
& Hsiao (1981), Arellano & Bover (1995), Baltagi & Griffin (1998), Pesaran
& Smith (1995) and Kiviet (1995), among others. The consistent and efficient
estimation results for the parametric dynamic panel data model (1.2) depend
crucially on the correct specification of the model. If $\theta(z_{it}) \neq z_{it}'\beta$, parametric
estimation methods based on a misspecified model (1.2) will in general lead to
inconsistent estimation of $\gamma$.
Semiparametric partially linear models have the advantage of not specifying
the functional form of $\theta(\cdot)$. Hence a consistent semiparametric estimator of $\gamma$
based on (1.1) is robust to functional form specification of $\theta(\cdot)$. There is a rich
literature on estimating a partially linear model with independent data using
various non-parametric techniques, e.g. Engle et al. (1986), Robinson (1988),
Stock (1989), Donald & Newey (1994), Li (1996). Also, see Ullah & Roy
(1998), Ullah & Mundra (1998), and Khanna et al. (1999) for the estimation
and applications of static partially linear panel data models. However, little
attention has been paid to dynamic partially linear panel data models. Although
Li & Stengos (1996) and Li & Ullah (1998) discussed how to estimate model
(1.1) by semiparametric instrumental variable methods, no simulations are
reported in those works and hence the finite sample performance of the
estimators proposed in Li & Stengos (1996) and Li & Ullah (1998) is
unknown.1
Li & Stengos (1996) proposed a semiparametric OLS type IV (OLS-IV)
estimator for estimating $\gamma$. When the error follows a one-way error
components structure, the OLS type estimator is not efficient because it
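$$y_{it} = x_{it}'\gamma + \theta(z_{it}) + u_{it}, \qquad (i = 1, \ldots, N;\ t = 1, \ldots, T), \qquad (2.1)$$
$$u_{it} = \mu_i + \nu_{it}, \qquad (2.2)$$
where $\mu_i$ is i.i.d. $(0, \sigma_\mu^2)$, $\nu_{it}$ is i.i.d. $(0, \sigma_\nu^2)$, and $\mu_i$ and $\nu_{jt}$ are uncorrelated for all i
and jt.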
In this chapter we propose a new semiparametric IV-GLS estimator that
fully uses the one-way error component structure. We also propose a
semiparametric IV-within-transformation estimator, which has the advantage of
computational simplicity because it does not require one to estimate the
(2.3)
(2.4)
(2.5)
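where $J_T = e_T e_T'$ is a $T \times T$ matrix with all elements equal to one, $\bar{J}_T = J_T/T$,
$E_T = I_T - \bar{J}_T$ and $\sigma_1^2 = T\sigma_\mu^2 + \sigma_\nu^2$. By noting the facts that $\bar{J}_T E_T = 0$, $\bar{J}_T + E_T = I_T$, and
both $\bar{J}_T$ and $E_T$ are idempotent matrices, it is easy to check that the inverse of
$\Omega$ is given by$^2$
$$\Omega^{-1} = I_N \otimes [(1/\sigma_1^2)\bar{J}_T + (1/\sigma_\nu^2)E_T] \equiv I_N \otimes \Sigma^{-1}, \qquad (2.6)$$
and
$$\Omega^{-1/2} = I_N \otimes [(1/\sigma_1)\bar{J}_T + (1/\sigma_\nu)E_T] \equiv I_N \otimes \Sigma^{-1/2}. \qquad (2.7)$$
The above expressions for $\Omega^{-1}$ and $\Omega^{-1/2}$ will be used in the GLS estimation
procedure discussed below.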
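The inverse can be checked numerically for one cross-section block. The sketch below assumes the standard one-way error component covariance for a single unit, namely the individual-effect variance times a T by T matrix of ones plus the idiosyncratic variance times the identity. The names `var_mu`, `var_nu` and the helper functions are illustrative assumptions.

```python
def error_component_matrices(T, var_mu, var_nu):
    """Build the T x T block covariance, its inverse and inverse square root
    from the spectral form: weights on J_bar (averaging) and E_T (demeaning)."""
    sigma1_sq = T * var_mu + var_nu
    eye = [[1.0 if r == c else 0.0 for c in range(T)] for r in range(T)]
    j_bar = [[1.0 / T] * T for _ in range(T)]
    e_t = [[eye[r][c] - j_bar[r][c] for c in range(T)] for r in range(T)]
    sigma = [[var_mu + var_nu * eye[r][c] for c in range(T)] for r in range(T)]
    sigma_inv = [[j_bar[r][c] / sigma1_sq + e_t[r][c] / var_nu
                  for c in range(T)] for r in range(T)]
    sigma_inv_half = [[j_bar[r][c] / sigma1_sq ** 0.5 + e_t[r][c] / var_nu ** 0.5
                       for c in range(T)] for r in range(T)]
    return eye, sigma, sigma_inv, sigma_inv_half

def matmul(a, b):
    """Plain matrix product for lists of lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]

def max_abs_diff(a, b):
    """Largest absolute elementwise difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
```

Multiplying the block covariance by the candidate inverse should reproduce the identity up to rounding, and squaring the candidate inverse square root should reproduce the inverse.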
A. Some Infeasible Estimators
Equation (2.1) contains an unknown function $\theta(\cdot)$. Following Robinson (1988),
we first eliminate $\theta(\cdot)$. Taking the expectation of (2.1) conditional on
$z_{it}$ and then subtracting it from (2.1) leads to
$$y_{it} - E(y_{it}|z_{it}) = (x_{it} - E(x_{it}|z_{it}))'\gamma + u_{it} \qquad (2.8)$$
$$\overset{\mathrm{def}}{=} v_{it}'\gamma + u_{it}. \qquad (2.9)$$
Equation (2.9) no longer contains the unknown function $\theta(\cdot)$. Note that $v_{it}$
and $u_{it}$ are correlated because $v_{it}$ contains $y_{i,t-1}$ and $u_{it}$ contains the random
individual effects $\mu_i$. Suppose there exists a $q \times 1$ ($q \geq p$) instrumental variable $\eta_{it}$
that is correlated with $x_{it}$ and uncorrelated with $u_{it}$; then we can use
$w_{it} \overset{\mathrm{def}}{=} \eta_{it} - E(\eta_{it}|z_{it})$ as an IV for $v_{it}$. For example, consider a simple case where both
$x_{it}$ and $z_{it}$ are scalars with $x_{it} = y_{i,t-1}$ and $z_{it}$ is strictly exogenous; then one can
choose $\eta_{it} = z_{i,t-1}$ as an instrument for $y_{i,t-1}$.
In vector-matrix notation, an (infeasible) IV-OLS estimator of $\gamma$ based on
(2.9) is (see White (1984, 1987) for a discussion on IV estimation)
$$\hat\gamma_{IVO} = (v'ww'v)^{-1}v'ww'(y - E(y|z)) = \gamma + (v'ww'v)^{-1}v'ww'u. \qquad (2.10)$$
When the model is just identified, i.e. p = q, and if we assume that $w'v$ is
invertible, then $\hat\gamma_{IVO}$ becomes
$$\hat\gamma_{IVO} = (w'v)^{-1}(v'w)^{-1}v'ww'(y - E(y|z)) = (w'v)^{-1}w'(y - E(y|z)). \qquad (2.11)$$
The above IV-OLS estimator is not efficient because it ignores the error
component variance structure. Li and Ullah (1998) suggested estimating $\gamma$ by
$$\hat\gamma = (v'w(w'\Omega^{-1}w)^{-1}w'v)^{-1}v'w(w'\Omega^{-1}w)^{-1}w'(y - E(y|z)). \qquad (2.12)$$
However, when q = p and if we assume that the square matrices $v'w$ and
$w'\Omega^{-1}w$ are both invertible, then we have from (2.12)
$$\hat\gamma = (w'v)^{-1}(w'\Omega^{-1}w)(v'w)^{-1}v'w(w'\Omega^{-1}w)^{-1}w'(y - E(y|z)) = (w'v)^{-1}w'(y - E(y|z)) = \hat\gamma_{IVO},$$
that is, $\hat\gamma$ reduces to the IV-OLS estimator of (2.11) when the model is just
identified. Therefore, the IV estimator $\hat\gamma$ also ignores the variance component
structure when the model is just identified.
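The cancellation in the just-identified case can be illustrated in the scalar case (p = q = 1), where every cross-product is a number. The sketch below is an assumption-laden toy: `y_dev` stands for y minus its conditional mean given z, the scalar `a` stands in for the weighting term in the middle of the Li and Ullah estimator, and all function names are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))

def iv_ols(v, w, y_dev):
    """General IV-OLS form (v'w w'v)^{-1} v'w w'y_dev for scalar v and w."""
    vw = dot(v, w)
    return (vw * dot(w, y_dev)) / (vw * vw)

def iv_weighted(v, w, y_dev, a):
    """Same form with a scalar weight a inserted as (v'w a^{-1} w'v)^{-1} v'w a^{-1} w'y_dev."""
    vw = dot(v, w)
    return (vw / a * dot(w, y_dev)) / (vw / a * vw)

def iv_just_identified(v, w, y_dev):
    """Reduced form (w'v)^{-1} w'y_dev."""
    return dot(w, y_dev) / dot(w, v)
```

In this scalar setting the weight cancels from numerator and denominator, so all three functions return the same number, which is the point made in the text for the just-identified case.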
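A new IV-GLS estimator that fully uses the one-way error component
structure is given by
$$\hat\gamma_{IVG} = (v'\Omega^{-1}w(w'\Omega^{-1}w)^{-1}w'\Omega^{-1}v)^{-1}v'\Omega^{-1}w(w'\Omega^{-1}w)^{-1}w'\Omega^{-1}(y - E(y|z))$$
$$= \gamma + (v'\Omega^{-1}w(w'\Omega^{-1}w)^{-1}w'\Omega^{-1}v)^{-1}v'\Omega^{-1}w(w'\Omega^{-1}w)^{-1}w'\Omega^{-1}u. \qquad (2.13)$$
$\hat\gamma_{IVG}$ of (2.13) is an optimal IV estimator as discussed in White (1984, 1987).
When the model is just identified, i.e. p = q, and if we assume that both
$w'\Omega^{-1}v$ and $w'\Omega^{-1}w$ are invertible, then $\hat\gamma_{IVG}$ of (2.13) becomes
$$\hat\gamma_{IVG} = (w'\Omega^{-1}v)^{-1}(w'\Omega^{-1}w)(v'\Omega^{-1}w)^{-1}v'\Omega^{-1}w(w'\Omega^{-1}w)^{-1}w'\Omega^{-1}(y - E(y|z))$$
$$= (w'\Omega^{-1}v)^{-1}w'\Omega^{-1}(y - E(y|z)), \qquad (2.14)$$
which is different from $\hat\gamma_{IVO}$ of (2.11). Note that one can transform the model
by premultiplying y, v and w by $\Omega^{-1/2}$. Denote $y^* = \Omega^{-1/2}y$, $v^* = \Omega^{-1/2}v$ and
$w^* = \Omega^{-1/2}w$, then the IV-GLS estimator of (2.13) is simply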
(2.16)
The proof of (2.16) is similar to the proof of lemma 3 of Li and Ullah and
is therefore omitted here.
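Next we propose a simple IV estimator based on the within transformation.
The within-type estimator has the advantage of being computationally simple: it only
requires least squares regression on the within-transformed variables. Define
$\xi_{it} = E(y_{it}|z_{it})$ and define the within-transformed variables $\tilde{y}_{it} = y_{it} - \bar{y}_{i\cdot}$,
$\tilde{\xi}_{it} = \xi_{it} - \bar{\xi}_{i\cdot}$, $\tilde{v}_{it} = v_{it} - \bar{v}_{i\cdot}$ and $\tilde{w}_{it} = w_{it} - \bar{w}_{i\cdot}$, where
$\bar{y}_{i\cdot} = \sum_{s=1}^{T} y_{is}/T$, and $\bar{\xi}_{i\cdot}$, $\bar{v}_{i\cdot}$ and $\bar{w}_{i\cdot}$
are similarly defined. The IV-Within estimator is given by
$$\hat\gamma_{IVW} = (\tilde{v}'\tilde{w}\tilde{w}'\tilde{v})^{-1}\tilde{v}'\tilde{w}\tilde{w}'(\tilde{y} - \tilde{\xi}). \qquad (2.17)$$
When the model is just identified, we have
$$\hat\gamma_{IVW} = (\tilde{w}'\tilde{v})^{-1}(\tilde{v}'\tilde{w})^{-1}\tilde{v}'\tilde{w}\tilde{w}'(\tilde{y} - \tilde{\xi}) = (\tilde{w}'\tilde{v})^{-1}\tilde{w}'(\tilde{y} - \tilde{\xi}). \qquad (2.18)$$
The within-type estimator has the advantage of being computationally simple
because it does not require one to estimate the error covariance matrix $\Omega$.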
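A minimal sketch of the within transformation and the resulting just-identified IV estimator in the scalar case. The panel is assumed to be stored as a list of N rows of length T, `y_dev` stands for the dependent variable net of its conditional mean given z, and the function names are hypothetical.

```python
def within_transform(panel):
    """Subtract each unit's time average from its observations.
    panel: list of N lists, each holding T observations for one unit."""
    return [[x - sum(row) / len(row) for x in row] for row in panel]

def flatten(panel):
    """Stack the N rows into one list of length N*T."""
    return [x for row in panel for x in row]

def iv_within(y_dev, v, w):
    """Just-identified IV estimator on within-transformed scalar variables:
    (w~'v~)^{-1} w~'y~, where ~ denotes deviations from unit means."""
    y_t = flatten(within_transform(y_dev))
    v_t = flatten(within_transform(v))
    w_t = flatten(within_transform(w))
    num = sum(a * b for a, b in zip(w_t, y_t))
    den = sum(a * b for a, b in zip(w_t, v_t))
    return num / den
```

Because unit means are removed, any additive individual effect drops out of the transformed data, which is why no variance components need to be estimated.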
B. Feasible Estimators
The estimators $\hat\gamma_{IVO}$, $\hat\gamma_{IVG}$ and $\hat\gamma_{IVW}$ discussed above are not feasible, because the
conditional mean functions E(y|z), E(x|z) and E(w|z), as well as $\Omega$, are unknown.
The feasible estimators can be obtained by replacing the unknown conditional
mean functions by their non-parametric estimators, such as the non-parametric
kernel estimators, and replacing $\sigma_1^2$ and $\sigma_\nu^2$ by consistent estimators of them.
Following Robinson (1988), we use a kernel estimation method to estimate
the unknown conditional expectations. Specifically we denote the kernel
estimators of $f(z_{it})$, $E(y_{it}|z_{it})$, $E(x_{it}|z_{it})$, $E(w_{it}|z_{it})$ by $\hat{f}_{it}$, $\hat{y}_{it}$, $\hat{x}_{it}$ and $\hat{w}_{it}$, respectively,
where
$$\hat{f}_{it} = \frac{1}{NTh^d}\sum_{j}\sum_{s} K_{it,js}, \qquad (2.19)$$
$$\hat{y}_{it} = \frac{1}{NTh^d}\sum_{j}\sum_{s} y_{js}K_{it,js} / \hat{f}_{it}, \qquad (2.20)$$
$$\hat{x}_{it} = \frac{1}{NTh^d}\sum_{j}\sum_{s} x_{js}K_{it,js} / \hat{f}_{it}, \qquad (2.21)$$
and
$$\hat{w}_{it} = \frac{1}{NTh^d}\sum_{j}\sum_{s} w_{js}K_{it,js} / \hat{f}_{it}, \qquad (2.22)$$
where $K_{it,js} = K((z_{it} - z_{js})/h)$, $K(\cdot)$ is the kernel function and h is the smoothing
parameter.
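The kernel estimators of the density and of a conditional mean can be sketched as below for a scalar z (d = 1), with the pooled NT observations stored in flat lists. The Gaussian kernel is an assumption made only for illustration, since the text leaves the kernel function unspecified, and the function names are hypothetical.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as an assumed kernel function."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kernel_density_and_mean(z, y, h):
    """Return the kernel density estimate of z and the kernel estimate of
    E(y|z) at every pooled sample point, for scalar z and bandwidth h."""
    n = len(z)
    f_hat, y_hat = [], []
    for zi in z:
        k = [gaussian_kernel((zi - zj) / h) for zj in z]
        f = sum(k) / (n * h)
        f_hat.append(f)
        y_hat.append(sum(kj * yj for kj, yj in zip(k, y)) / (n * h) / f)
    return f_hat, y_hat
```

The same function applied to x or to the instrument in place of y gives the other conditional mean estimates, and the residuals from these fits are the feasible counterparts of the infeasible deviations.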
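Note that when $x_{it} = y_{i,t-1}$, we have
$$\hat{x}_{it} = \hat{E}(y_{i,t-1}|z_{it}) = (NTh^d)^{-1}\sum_{j}\sum_{s} y_{j,s-1}K_{it,js} / \hat{f}_{it}, \qquad (2.23)$$
which is different from $\hat{y}_{i,t-1} = \hat{E}(y_{i,t-1}|z_{i,t-1}) = (NTh^d)^{-1}\sum_{j}\sum_{s} y_{j,s-1}K_{it-1,js-1} / \hat{f}_{i,t-1}$.
We estimate $v_{it} \equiv x_{it} - E(x_{it}|z_{it})$ by $x_{it} - \hat{x}_{it}$ and we estimate $w_{it} \equiv \eta_{it} - E(\eta_{it}|z_{it})$
by $\eta_{it} - \hat\eta_{it}$, where
$$\hat\eta_{it} = (NTh^d)^{-1}\sum_{j}\sum_{s} \eta_{js}K_{it,js} / \hat{f}_{it}. \qquad (2.24)$$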
(x x )
where
1
1(y y ),
( )[( ) 1( )] 1( )
1
(2.26)
1
(2.27)
with
$$\hat\Sigma^{-1} = (1/\hat\sigma_\nu^2)E_T + (1/\hat\sigma_1^2)\bar{J}_T, \qquad (2.28)$$
(2.29)
= T + ,
= u (IN J T)u/N,
(2.30)
2
2
1
2
2
2
and $\hat{u}$ is of dimension $n \times 1$ with a typical element given by
$$\hat{u}_{it} = y_{it} - \hat{y}_{it} - (x_{it} - \hat{x}_{it})'\hat\gamma_{IVO}. \qquad (2.31)$$
(2.32)
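For the feasible semiparametric IV within estimator, we will use the same
tilde notation to denote the feasible quantity to avoid introducing too much new
notation. For example, we use $\tilde{v}_{it}$ to denote the kernel estimator of $v_{it} - \bar{v}_{i\cdot}$. Recall
that $v_{it} = x_{it} - E(x_{it}|z_{it})$. Hence we have
$$\tilde{v}_{it} = (x_{it} - \hat{x}_{it}) - \frac{1}{T}\sum_{s=1}^{T}(x_{is} - \hat{x}_{is}). \qquad (2.33)$$
Similarly, recalling that $w_{it} = \eta_{it} - E(\eta_{it}|z_{it})$ and $\xi_{it} = E(y_{it}|z_{it})$, we have
$$\tilde{w}_{it} = (\eta_{it} - \hat\eta_{it}) - \frac{1}{T}\sum_{s=1}^{T}(\eta_{is} - \hat\eta_{is}), \qquad (2.34)$$
and
$$\tilde{\xi}_{it} = \hat\xi_{it} - \frac{1}{T}\sum_{s=1}^{T}\hat\xi_{is}, \qquad (2.35)$$
where $\hat\xi_{it} = \hat{y}_{it}$ of (2.20).
$\tilde{y}_{it}$ remains the same as $\tilde{y}_{it} = y_{it} - \bar{y}_{i\cdot}$. With the notation given in (2.33) to (2.35),
we obtain the feasible semiparametric IV-Within estimator,
$$\hat\gamma_{IVW} = (\tilde{v}'\tilde{w}\tilde{w}'\tilde{v})^{-1}\tilde{v}'\tilde{w}\tilde{w}'(\tilde{y} - \tilde{\xi}). \qquad (2.36)$$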
In the next section we compare the finite sample performances of the new
estimators proposed in this paper with those suggested by Li & Stengos (1996)
and Li & Ullah (1998) via Monte Carlo simulations.
(2.37)
(2.38)
(2.39)
W = [vv] 1v y,
(2.40)
s=1
yis.
s=1
Estimators (I)–(III) do not use instrumental variables and hence are
expected to have large bias because they ignore the fact that $y_{i,t-1}$ and $u_{it}$ are
correlated. However, they are also expected to have smaller variances
than the IV estimators. Therefore, for small and moderate samples,
their mean square errors (MSE) are not necessarily larger than those of the semiparametric IV estimators. Of course, when the sample size is sufficiently large, we
expect the semiparametric IV estimators to have smaller MSE because, after all,
they are consistent estimators, while the non-IV estimators are inconsistent.
The bias of the non-IV estimators will not die out as the sample size increases.
We report estimated bias, standard deviation (Std) and root mean square
errors (Rmse) for all the estimators. These are computed via
M
= M1
Bias( )
j=1
( j ),
= M1
Std( )
j=1
2
( j Mean( ))
1/2
and
= {M 1
Rmse( )
j=1
is the estimated value of at the jth replication. We use M = 2000 in all the
simulations. We choose T = 6 and N = 50, 100, 200, 500.
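The three summary measures can be computed from the replications as in the following sketch. It assumes the definitions described above: bias is the average deviation from the true value, Std uses the divisor M (not M - 1) around the Monte Carlo mean, and Rmse uses deviations from the true value. The function name and the toy numbers are hypothetical.

```python
import math


def monte_carlo_summary(estimates, true_value):
    """Return (bias, std, rmse) over M Monte Carlo replications."""
    m = len(estimates)
    mean_estimate = sum(estimates) / m
    bias = sum(e - true_value for e in estimates) / m
    std = math.sqrt(sum((e - mean_estimate) ** 2 for e in estimates) / m)
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / m)
    return bias, std, rmse


# Four made-up replications of an estimator whose true value is 0.5.
bias, std, rmse = monte_carlo_summary([0.4, 0.6, 0.5, 0.7], true_value=0.5)
```

With these definitions the identity Rmse^2 = Bias^2 + Std^2 holds exactly, which is why an estimator with large bias but small variance can still have the smaller Rmse in moderate samples.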
The simulation results are given in Tables 1 and 2. The smallest Rmse for
each case (for a given N and ) is shown as boldface number(s). The
simulations results are qualitatively similar for = 0, = 0.5 and = 1.
Therefore, we only report the cases of = 0 and = 1 to save space.
Table 1 reports the result for = 0. From Table 1 we see that the non-IV
estimators: OLS, GLS and W have large bias because these estimators ignore
the fact that yit 1 is correlated with uit. However, these non-IV estimators all
have smaller standard deviations (or variances) than the semiparametric IV
estimators.
When N is small ($N \le 100$) and with small to moderate values of $\rho$ ($\rho \le 0.5$),
GLS has the smallest Rmse among all the estimators.
For $N \le 100$ with $\rho = 0.8$, GLS is no longer the best because of the large bias
due to the strong individual effects. In this case IVG and IVW have the smallest
Rmse.
For N = 200 and N = 500 and for small $\rho = 0.2$, IVO has the smallest Rmse.
But for larger values of $\rho$ ($\rho = 0.5, 0.8$), IVG and IVW become the best in terms of
the Rmse criterion.
For $N \le 100$ and $\rho \le 0.5$, GLS has the smallest Rmse. However, for $\rho = 0.8$, the
bias in GLS is very large and hence its Rmse is much larger than those of the IV
estimators. IVG and IVW have the smallest Rmse for $\rho = 0.8$.
As N increases, the bias in OLS, GLS and W remains of the same order, as
expected. The variances of the IV estimators decrease as N increases, and as a
result, the IV estimators dominate the non-IV estimators when $N \ge 200$. For
$\rho = 0.2$, the IV-OLS estimator has the smallest Rmse. For $\rho = 0.5$ and $\rho = 0.8$, the
IV-GLS and IV-Within estimators have much smaller Rmse than the
IV-OLS estimator. The IV-OLS estimator ignores the one-way error
component structure. Hence when the individual effects are large, the performance
of IV-OLS is expected to be worse than that of the IV-GLS estimator.
We observe, as expected, that the bias of the non-IV estimators increases as $\rho$
increases.
We also observe that the Rmse for the IV-OLS estimator remains the same for
different values of $\rho$, while for the IV-GLS and IV-Within estimators, the Rmse
decreases as $\rho$ increases.
Next, we observe that the results of Table 2 are very similar to those of Table
1. That is, the results are not sensitive to the functional form of the nonparametric component.
This is as expected because all the estimators are semiparametric and hence
they are robust to the functional form specification of this component.
The DGP given in (2.37) is a just identified model. We have also conducted
some simulations for over identified model. In particular, we consider the
following model
Table 1. The case of = 0.

                  $\rho = 0.2$                $\rho = 0.5$                $\rho = 0.8$
Estimator    Bias     Std    Rmse      Bias     Std    Rmse      Bias     Std    Rmse

N = 50
OLS         0.193   0.045   0.198     0.352   0.030   0.353     0.442   0.016   0.442
GLS         0.103   0.056   0.117     0.099   0.059   0.115     0.310   0.040   0.313
W           0.241   0.058   0.248     0.213   0.057   0.220     0.136   0.061   0.149
IVO         0.019   0.290   0.291     0.042   0.329   0.331     0.128   2.39    2.39
IVG         0.006   0.215   0.215     0.008   0.171   0.171     0.012   0.111   0.112
IVW         0.005   0.225   0.225     0.009   0.174   0.174     0.013   0.111   0.112

N = 100
OLS         0.196   0.031   0.199     0.354   0.021   0.355     0.443   0.011   0.443
GLS         0.104   0.039   0.111     0.100   0.040   0.108     0.312   0.027   0.313
W           0.243   0.041   0.246     0.220   0.040   0.223     0.154   0.042   0.160
IVO         0.008   0.139   0.139     0.023   0.158   0.159     0.049   0.528   0.530
IVG         0.006   0.146   0.146     0.007   0.117   0.117     0.009   0.077   0.077
IVW         0.006   0.150   0.151     0.008   0.118   0.118     0.010   0.076   0.077

N = 200
OLS         0.198   0.021   0.200     0.356   0.015   0.356     0.444   0.008   0.444
GLS         0.105   0.027   0.108     0.100   0.029   0.104     0.312   0.020   0.312
W           0.244   0.029   0.246     0.224   0.029   0.226     0.166   0.029   0.168
IVO         0.004   0.097   0.097     0.010   0.101   0.101     0.016   0.106   0.107
IVG         0.004   0.103   0.103     0.005   0.083   0.083     0.007   0.054   0.055
IVW         0.005   0.105   0.105     0.006   0.084   0.084     0.007   0.054   0.055

N = 500
OLS         0.199   0.014   0.200     0.357   0.009   0.357     0.444   0.005   0.444
GLS         0.105   0.017   0.106     0.100   0.018   0.101     0.311   0.013   0.311
W           0.245   0.019   0.245     0.227   0.018   0.228     0.176   0.018   0.177
IVO         0.001   0.058   0.058     0.003   0.057   0.057     0.004   0.057   0.058
IVG         0.006   0.065   0.065     0.006   0.052   0.053     0.006   0.034   0.034
IVW         0.006   0.066   0.067     0.006   0.053   0.053     0.006   0.034   0.034
Table 2. The case of = 1.

                  $\rho = 0.2$                $\rho = 0.5$                $\rho = 0.8$
Estimator    Bias     Std    Rmse      Bias     Std    Rmse      Bias     Std    Rmse

N = 50
OLS         0.190   0.045   0.196     0.348   0.031   0.350     0.438   0.016   0.439
GLS         0.104   0.055   0.117     0.092   0.058   0.109     0.298   0.041   0.301
W           0.237   0.058   0.244     0.208   0.057   0.216     0.132   0.059   0.144
IVO         0.021   0.301   0.302     0.045   0.341   0.344     0.168   3.53    3.53
IVG         0.006   0.215   0.215     0.008   0.171   0.172     0.012   0.112   0.112
IVW         0.005   0.225   0.225     0.009   0.174   0.174     0.013   0.111   0.112

N = 100
OLS         0.194   0.031   0.196     0.351   0.021   0.352     0.440   0.012   0.440
GLS         0.104   0.039   0.111     0.094   0.040   0.102     0.299   0.028   0.301
W           0.238   0.041   0.242     0.214   0.040   0.218     0.148   0.041   0.153
IVO         0.008   0.139   0.139     0.023   0.156   0.158     0.042   0.243   0.246
IVG         0.006   0.146   0.146     0.007   0.117   0.118     0.009   0.077   0.077
IVW         0.006   0.150   0.150     0.008   0.118   0.119     0.010   0.077   0.077

N = 200
OLS         0.196   0.021   0.197     0.353   0.015   0.353     0.441   0.008   0.441
GLS         0.104   0.027   0.108     0.093   0.029   0.097     0.298   0.021   0.299
W           0.240   0.029   0.241     0.218   0.028   0.220     0.158   0.028   0.161
IVO         0.004   0.097   0.097     0.010   0.101   0.101     0.016   0.106   0.107
IVG         0.004   0.103   0.103     0.005   0.083   0.083     0.007   0.054   0.055
IVW         0.005   0.105   0.105     0.006   0.084   0.084     0.007   0.054   0.055

N = 500
OLS         0.197   0.013   0.197     0.353   0.009   0.353     0.441   0.005   0.441
GLS         0.105   0.017   0.106     0.092   0.018   0.094     0.297   0.013   0.298
W           0.240   0.019   0.241     0.221   0.018   0.222     0.167   0.018   0.168
IVO         0.001   0.058   0.058     0.003   0.057   0.057     0.004   0.057   0.058
IVG         0.006   0.065   0.065     0.006   0.052   0.053     0.006   0.034   0.035
IVW         0.006   0.066   0.067     0.006   0.053   0.053     0.006   0.034   0.035
ACKNOWLEDGMENTS
We would like to thank a referee and Badi Baltagi for very useful comments
that greatly improved the paper. Q. Li's research is supported by Natural
NOTES
1. Li & Ullah (1998) reported some Monte Carlo results on a static semiparametric
panel data model. They also proposed two semiparametric instrumental variable
estimators for a semiparametric dynamic panel data model, but they did not conduct any
Monte Carlo simulations on the dynamic model.
2. The use of the simple spectral decomposition method to derive the inverse of the
error covariance matrix was proposed by Wansbeek & Kapteyn (1982, 1983).
REFERENCES
Ahn, S. C., & Schmidt, P. (1995). Efficient Estimation of Models for Dynamic Panel Data. Journal
of Econometrics, 68, 5–27.
Anderson, T. W., & Hsiao, C. (1981). Estimation of Dynamic Models With Error Components.
Journal of the American Statistical Association, 76, 598–606.
Arellano, M., & Bover, O. (1995). Another Look at The Instrumental Variable Estimation of Error
Components Models. Journal of Econometrics, 68, 28–51.
Baltagi, B. H. (1995). Econometric Analysis of Panel Data. New York: Wiley.
Baltagi, B. H., & Griffin, J. M. (1997). Pooled Estimators vs. Their Heterogeneous Counterparts
in The Context of Dynamic Demand for Gasoline. Journal of Econometrics, 77, 303–327.
Chamberlain, G. (1984). Panel Data. In: Z. Griliches & M. Intriligator (Eds), Handbook of
Econometrics (pp. 1247–1318), Vol. II. Amsterdam: North Holland.
Donald, S. G., & Newey, W. K. (1994). Series Estimation of Semilinear Regression. Journal of
Multivariate Analysis, 50, 30–40.
Engle, R. F., Granger, C. W. J., Rice, J., & Weiss, A. (1986). Semiparametric Estimates of The
Relationship Between Weather and Electricity Sales. Journal of the American Statistical
Association, 81, 310–320.
Hsiao, C. (1986). Analysis of Panel Data. Econometric Society monograph No. 11.
Cambridge: Cambridge University Press.
Khanna, M., Mundra, K., & Ullah, A. (1999). Parametric and Semiparametric Estimation of The
Effect of Firm Attributes on Efficiency: The Electricity Generating Sector in India. Journal
of International Trade and Economic Development, forthcoming.
Kiviet, J. F. (1995). On Bias, Inconsistency and Efficiency of Some Estimators in Dynamic Panel
Data Models. Journal of Econometrics, 68, 53–78.
Li, Q. (1996). On The Root-n-consistent Semiparametric Estimation of Partially Linear Models.
Economics Letters, 51, 277–285.
Li, Q., & Hsiao, C. (1998). Testing Serial Correlation in Semiparametric Panel Data Models.
Journal of Econometrics, 87, 207–237.
Li, Q., & Stengos, T. (1996). Semiparametric Estimation of Partially Linear Panel Data Models.
Journal of Econometrics, 71, 389–397.
Li, Q., & Ullah, A. (1998). Estimating partially linear models with one-way error components.
Econometric Reviews, 17, 145–166.
Matyas, L., & Sevestre, P. (1992). The Econometrics of Panel Data. Dordrecht: Kluwer, 2nd
edition.
Pesaran, M. H., & Smith, R. (1995). Estimation of Long-run Relationship From Dynamic
Heterogeneous Panels. Journal of Econometrics, 68, 79–114.
Robinson, P. M. (1988). Root-N-consistent Semiparametric Regression. Econometrica, 56,
931–954.
Stock, J. H. (1989). Nonparametric Policy Analysis. Journal of the American Statistical
Association, 84, 567–575.
Ullah, A., & Roy, N. (1998). Nonparametric and Semiparametric Econometrics of Panel Data. In:
A. Ullah and D. E. A. Giles (Eds), Handbook on Applied Economic Statistics (pp. 579–604),
Ch. 17. Marcel Dekker.
Ullah, A., & Mundra, K. (1999). Semiparametric Panel Data Estimation: An Application to
Immigrants' Homelink Effect on U.S. Producer Trade Flows. Working paper 15, Department
of Economics, University of California at Riverside.
Wansbeek, T. J., & Kapteyn, A. (1982). A Simple Way to Obtain the Spectral Decomposition of
Variance Components Models for Balanced Data. Communications in Statistics, A11,
2105–2112.
Wansbeek, T. J., & Kapteyn, A. (1983). A Note on Spectral Decomposition and Maximum
Likelihood Estimation of ANOVA Models With Balanced Data. Statistics and Probability
Letters, 1, 213–215.
White, H. (1984). Asymptotic Theory for Econometricians. New York: Academic Press.
White, H. (1986). Instrumental Variables Analogs of Generalized Least Squares Estimator. In: R. S.
Mariano (Ed.), Advances in Statistical Analysis and Statistical Computing (pp. 173–277),
Vol. 1. New York: JAI Press.
APPENDIX
/** This is a GAUSS program using Monte Carlo simulation to examine the finite
sample performances of some semiparametric instrumental variable estimators
in a semiparametric dynamic panel data model, written by M. Douglas Berg **/
output file = c:\gauss\doug\work1.out reset;
format /rd 8,3;
n = 100; T = 6;
T00 = 30; T0 = T + T00 + 1; NT = N*T;
nr = 500;
@ number of replication @
lamt = 0.5; b1 = 1; b2 = 0; sig2 = 10;
rho = 0.8;
sigmu2 = rho*sig2;
signu2 = (1-rho)*sig2; sigmu = sqrt(sigmu2);
signu = sqrt(signu2); s1_5 = sqrt(t*sigmu2 + signu2);
sv_5 = signu;
@ true parameter values @
ycz = zeros(nt,1); y1cz = ycz; z1cz = ycz; fz = ycz;
@ Generate y @
@ Li-Ullah, Li-Stengos IV @
lam1[i1,.] = inv(w1v'*xxv)*w1v'*yyv;            @ IV-OLS estimator @
lam3[i1,.] = inv(xxv'*xxv)*xxv'*yyv;            @ Semi-OLS estimator @
u01 = yyv - xxv*lam1[i1,.];                     @ IV-OLS residuals @
u03 = yyv - xxv*lam3[i1,.];                     @ Semi-OLS residuals @
Jbt = ones(t,t)/t;                              @ averaging matrix @
Et = eye(t) - Jbt;                              @ within (deviation) matrix @
@ variance components from the IV-OLS residuals @
u11 = Et*( (reshape(u01,n,t))' );
u11 = reshape( u11',nt,1 );
sv2 = u11'*u11/(n*(t-1));
u22 = Jbt*( (reshape(u01,n,t))' );
u22 = reshape( u22',nt,1 );
smu2 = u22'*u22/n;
s12 = sv2 + t*smu2;
sv_1 = sqrt( sv2 );
s1_1 = sqrt( s12 );
@ variance components from the Semi-OLS residuals @
u11 = Et*( (reshape(u03,n,t))' );
u11 = reshape( u11',nt,1 );
sv2 = u11'*u11/(n*(t-1));
u22 = Jbt*( (reshape(u03,n,t))' );
u22 = reshape( u22',nt,1 );
smu2 = u22'*u22/n;
s12 = sv2 + t*smu2;
sv_3 = sqrt( sv2 );
s1_3 = sqrt( s12 );
@ GLS and within transformation matrices @
At_1 = Jbt/s1_1 + Et/sv_1;
At_3 = Jbt/s1_3 + Et/sv_3;
At_5 = Jbt/s1_5 + Et/sv_5;
At_w = Et;
yyn_1 = At_1*( (reshape(yyv,n,t))' );
yyn_3 = At_3*( (reshape(yyv,n,t))' );
yyn_6 = At_w*( (reshape(yyv,n,t))' );
xxn_1 = At_1*( (reshape(xxv,n,t))' );
xxn_3 = At_3*( (reshape(xxv,n,t))' );
xxn_6 = At_w*( (reshape(xxv,n,t))' );
w1n_w = At_w*( (reshape(w1v,n,t))' );
w1n = At_1*( (reshape(w1v,n,t))' );
yyv_1 = reshape(yyn_1',nt,1);
yyv_3 = reshape(yyn_3',nt,1);
yyv_6 = reshape(yyn_6',nt,1);
xxv_1 = reshape(xxn_1',nt,1);
xxv_3 = reshape(xxn_3',nt,1);
w1v_w = reshape(w1n_w',nt,1);
xxv_6 = reshape(xxn_6',nt,1);
w1v = reshape(w1n',nt,1);
lam1n[i1,.] = inv(w1v'*xxv_1)*w1v'*yyv_1;       @ IV-GLS estimator @
lam3n[i1,.] = inv(xxv_3'*xxv_3)*xxv_3'*yyv_3;   @ Semi-GLS estimator @
lam4n[i1,.] = inv(w1v_w'*xxv_6)*w1v_w'*yyv_6;   @ IV-Within estimator @
lam6n[i1,.] = inv(xxv_6'*xxv_6)*xxv_6'*yyv_6;   @ Semi-Within est. @
i1 = i1 + 1;
endo;
Bias1 = meanc( lam1 - lamt );                   @ Bias @
Bias3 = meanc( lam3 - lamt );
rmse1 = sqrt( meanc( (lam1-lamt)^2 ) );         @ Root-MSE @
rmse3 = sqrt( meanc( (lam3-lamt)^2 ) );
std1 = stdc(lam1);                              @ Standard Dev. @
std3 = stdc(lam3);
Bias1n = meanc( lam1n - lamt );
Bias3n = meanc( lam3n - lamt );
Bias4n = meanc( lam4n - lamt );
Bias6n = meanc( lam6n - lamt );
rmse1n = sqrt( meanc( (lam1n-lamt)^2 ) );
rmse3n = sqrt( meanc( (lam3n-lamt)^2 ) );
rmse4n = sqrt( meanc( (lam4n-lamt)^2 ) );
rmse6n = sqrt( meanc( (lam6n-lamt)^2 ) );
std1n = stdc(lam1n);
std3n = stdc(lam3n);
std4n = stdc(lam4n);
std6n = stdc(lam6n);
print "********************************************************";
print "IVO1, bias1, std1, rmse1 = " bias1 std1 rmse1;
print "OLS, bias3, std3, rmse3 = " bias3 std3 rmse3;
print "********************************************************";
print "IVG1, bias1n, std1n, rmse1n = " bias1n std1n rmse1n;
print "GLS, bias3n, std3n, rmse3n = " bias3n std3n rmse3n;
print "********************************************************";
print "With1, bias4n, std4n, rmse4n = " bias4n std4n rmse4n;
Nonstationary Panels, Panel Cointegration and Dynamic Panels, Volume 15, pages 317–339.
Copyright 2000 by Elsevier Science Inc.
All rights of reproduction in any form reserved.
ISBN: 0-7623-0688-2
NAZRUL ISLAM
I. INTRODUCTION
One of the issues around which the recent growth literature has evolved is that
of convergence. This refers to the idea that, because of diminishing returns to
capital, poorer economies should grow faster and catch up with the richer ones.
Statistically, convergence is therefore interpreted as a negative correlation
between the initial level of income and the subsequent growth rate.
Accordingly, a popular method for testing the convergence hypothesis has been
to run growth-initial level regressions or growth-convergence regressions,
where subsequent growth rates are regressed on initial levels of income.
For a long time, growth-convergence regressions were estimated using cross-section data. However, recently researchers have drawn attention to the fact that
the growth-convergence equation actually represents a dynamic panel data
model, and that by ignoring the individual effects, cross-section estimation courts
omitted variable bias (OVB). Thus, Islam (1993, 1995) argues for using panel
procedures to overcome this bias and in particular implements Chamberlain's
(1982, 1983) Minimum Distance (MD) procedure to estimate the equation.
Knight et al. (1993) make similar arguments and also use the Minimum
Distance procedure to produce similar results. Islam, in addition, presents
results from the Least Squares with Dummy Variables (LSDV) procedure.
Since these initial works, panel estimation of the growth-convergence
equation has spread considerably. For example, Lee, Pesaran & Smith (1997,
1998) consider maximum likelihood estimation of the growth-convergence
equation using panel data. Caselli et al. (1996) emphasize the problem of
endogeneity in this equation and use the Arellano-Bond GMM panel procedure
to overcome the problem. Barro (1997) and Barro & Sala-i-Martin (1995) use
pooled estimation on panel data sets. Lee et al. (1998) also present evidence on
panel estimation of the growth convergence equation.
The panel estimates presented in these papers generally differ from
corresponding cross-section estimates. However, they also differ among
themselves. Nerlove (1999) highlights this by using a variety of panel
estimators to estimate the growth-convergence equation and compiling the
results. Similar findings were presented earlier in Islam (1993). This creates a
problem of choosing among various panel estimators. Unfortunately, theoretical properties of dynamic panel data estimators are generally asymptotic and
often equivalent. This creates the necessity of Monte Carlo studies to ascertain
the small sample properties of these estimators. However, Monte Carlo studies
are more useful when they are customized to the specification and the data set
that are used in actual estimation. Although many researchers have recently
presented Monte Carlo evidence on small sample properties of dynamic panel
estimators, studies focusing on the growth-convergence equation and using the
Summers-Heston (1988, 1991) data set are rare.
This chapter tries to help fill this gap. The study focuses on those
estimators that have been used so far to estimate the growth-convergence
equation. Accordingly, the estimators included are: least squares with dummy
variables (LSDV); the two instrumental variable estimators of Anderson &
Hsiao (1981, 1982), namely AH(l), based on level instruments, and AH(d),
based on difference instruments; the minimum distance (MD) estimator,
suggested by Chamberlain (1982, 1983); and the one-step (ABGMM1) and
two-step (ABGMM2) generalized method of moments estimators proposed by
Arellano & Bond (1991). In addition, the exercise includes simultaneous
equations (SE) estimators such as the two stage least squares estimator (2SLS),
the three stage least squares estimator (3SLS), and the generalized three stage
least squares estimator (G3SLS). To complete the picture, the study also
includes the (pooled) ordinary least squares (OLS) estimator, which ignores the
individual effects.
The two main parameters of the model are the dynamic adjustment
parameter $\gamma$ (attached to the lagged dependent variable) and $\beta$, the parameter
of the exogenous variable. The Monte Carlo results show that the OLS
estimates of $\gamma$ are, as expected, positively biased, and the magnitude of this bias
averages to about seventeen percent of the true parameter value. For most of the
panel estimators, the direction of bias is negative, with only the AH(d)
estimator providing some exceptions. The bias is small for the AH(d), the
LSDV, and the MD estimators, ranging between five and six percent. The bias
of the 2SLS, 3SLS, and G3SLS estimates of $\gamma$ ranges between eight and ten
percent. The largest bias is observed for the ABGMM estimators, averaging to
twenty-two percent. The AH(l) estimator performs so poorly that we refrain
from reporting its results.
The results regarding root mean square error (RMSE) demonstrate a similar
pattern. The average RMSE as percentage of the true value of $\gamma$ proves to be
seventeen percent for the OLS estimator. For the LSDV and the MD estimators,
this percentage ranges between six and seven. For the AH(d), 2SLS, 3SLS, and
G3SLS estimators, it ranges between ten and twenty. This percentage is the
highest for the ABGMM estimators, ranging between forty and forty-six
percent.
With regard to $\beta$, the bias of the OLS estimates is again positive, but now
averages much higher, to forty-eight percent of the parameter value. The
direction of bias of the panel estimates of $\beta$ is quite mixed. However, panel
estimates of $\beta$ are on average quite close to the true parameter value. The
magnitude of the algebraic average of the bias for the 2SLS, 3SLS, LSDV and
the MD estimators remains under one percent. For AH(d) and G3SLS it ranges
between one and two percent. For the ABGMM estimates, this percentage is
higher but still within five to seven percent.
bias of the ABGMM estimators. Kiviet (1995) reports good performance of his
bias-corrected LSDV estimator. On the other hand, Wansbeek & Knaap (1998)
report better performance of a covariance-corrected instrumental variable
estimator and their LIML estimator. Baltagi & Kao (2000) in this volume give
an extensive survey of recent developments in dynamic panel data models.
These studies have illuminated the small sample properties of various
dynamic panel estimators. However, most of these studies do not focus on any
particular model or data set. Ziliak's (1997) study is probably an exception;
it focuses on a labor supply model and uses the PSID data. However, it is
known that Monte Carlo results are more useful when the exercise is
customized to the model whose estimation is in question and when the
simulations are conducted on the basis of the data set that is actually used for
estimation of the model. From this point of view there exists a void regarding
the growth-convergence equation. Monte Carlo evidence on small sample
performance of panel data estimators in estimating this equation is rare.
This chapter tries to fill this gap to some extent. It focuses
exclusively on the growth-convergence equation and bases the simulations on
the Summers-Heston data set that has been widely used in estimating this
equation. This focus also guides the choice of estimators to be included in the
study. The main feature of the growth-convergence equation is that the
exogenous variable of the model is correlated with the individual country
effects. This implies that panel estimators that rely on the uncorrelated random-effects assumption are not suitable for estimation of this equation. On the other
hand, estimators that highlight this correlation, such as the Minimum Distance
estimator of Chamberlain, may play an important role in estimating it. The
study also considers several different generation mechanisms for the random
error term, and it considers estimation of the equation in several different
samples that have figured widely in the recent growth literature. Because of its
customized nature, the results of this study should be directly useful for
empirical growth researchers.
\[ y_{it} = \gamma y_{i,t-1} + \beta x_{i,t-1} + \eta_t + \mu_i + v_{it}. \tag{1} \]
Here $y_{it}$ represents log of per capita GDP of country i at time t, $y_{i,t-1}$ is the same
lagged by one period, and $x_{i,t-1}$ is the difference in log of investment and
population growth rate variables of country i at time $t-1$. Finally, $\mu_i$ and $\eta_t$ are
individual and time effect terms, and $v_{it}$ is the transitory error which varies
across both individual and time. In this set up, $(t-1)$ and $t$ denote initial and
subsequent periods of time, respectively. The derivation of this equation
proceeds from the Cobb-Douglas aggregate production function, $Y_t = K_t^{\alpha}(A_tL_t)^{1-\alpha}$,
where Y, K, and L are output, capital, and labor respectively, and A
is the labor-augmenting technology which grows exponentially at the
exogenous rate g. The derivation yields the following correspondence between
the coefficients of equation (1) and the structural parameters of the production
function:
\[ \gamma = e^{-\lambda\tau} \tag{2} \]
\[ \beta = (1 - e^{-\lambda\tau}) \frac{\alpha}{1-\alpha} \tag{3} \]
\[ \mu_i = (1 - e^{-\lambda\tau}) \ln A_{0i} \tag{4} \]
\[ \eta_t = g(t_2 - e^{-\lambda\tau} t_1). \tag{5} \]
Here $\tau$ is the length of time between $t_2$ and $t_1$, where $t_2$ and $t_1$ correspond to $t$
and $(t-1)$ of equation (1), respectively. The parameter $\lambda$ is known as the rate of
convergence and is given by $\lambda = (1-\alpha)(n + g + \delta)$, where n is the exponential
growth rate of L, and $\delta$ is the rate of depreciation of capital.
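The correspondence between the regression coefficients and the structural parameters can be inverted numerically, as in the sketch below. It assumes the standard mapping in which the coefficient on lagged income equals exp(-lambda * tau) and the coefficient on the exogenous variable equals (1 - exp(-lambda * tau)) * alpha / (1 - alpha). The function name is hypothetical, the coefficient values are the NONOIL values of 0.7886 and 0.1641 from Table 1 of this chapter, and the five-year span tau = 5 is an assumption made only for illustration.

```python
import math


def implied_structural_parameters(gamma, beta, tau):
    """Invert gamma = exp(-lam * tau) and beta = (1 - gamma) * alpha / (1 - alpha).

    Returns (lam, alpha): the implied rate of convergence and the
    implied capital exponent of the Cobb-Douglas production function.
    """
    lam = -math.log(gamma) / tau
    ratio = beta / (1.0 - gamma)      # equals alpha / (1 - alpha)
    alpha = ratio / (1.0 + ratio)
    return lam, alpha


# Illustrative inputs; tau = 5 (five-year spans) is an assumption.
lam, alpha = implied_structural_parameters(gamma=0.7886, beta=0.1641, tau=5.0)
```

Under these inputs the implied rate of convergence is a little under five percent per year and the implied capital exponent is roughly 0.44, which shows how small changes in the estimated lagged coefficient translate into the structural parameters.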
An important issue regarding this model is specification of the individual
effect term $\mu_i$. Equation (4) shows that $\mu_i$ basically stands for $A_{0i}$. Mankiw,
Romer & Weil (1992, p. 6) define $A_{0i}$ as follows: "The $A_{0i}$ term reflects not just
technology but resource endowments, climate, institutions, and so on; it may
therefore differ across countries." From this definition, it is obvious that $A_{0i}$ is
correlated with $x_{i,t-1}$, which represents savings and fertility behavior in an
economy. Thus equation (1) represents a dynamic panel data model with
correlated effects. This shows why random-effects estimators are not
appropriate for the growth-convergence equation.
However, there are different ways to specify the correlation between $\mu_i$ and
$x_{i,t-1}$. Mundlak (1971) proposes a simple specification whereby $\mu_i$ is a function
of $\bar x_i$, the time mean of $x_{i,t-1}$. This is however restrictive and renders the random
effects model equivalent to the fixed effects model, provided the transitory
error term is serially uncorrelated. Hence, a more general specification is
preferable. Following Chamberlain, we adopt the following specification of
$\mu_i$:
i = 0 + 1x` i0 + 2xi1 + + TxT 1 + i,
(6)
composite residuals (t + i + vit). In the second step, these residuals are
regressed on xits and year dummies to get estimates of s and ts. The
residuals from this second step regression give estimates of (i + vit)s. We can
denote these as uits. The third step consists of estimating the parameters of the
MA(1) and AR(1) models from the estimated values of uits. We use
Chamberlains Minimum Distance estimation procedure to do this and get
estimated values of , , and the corresponding values of
and .4
In growth-convergence studies, three different samples have been frequently
used. Following Mankiw et al. (1992), these samples are often referred to as the
NONOIL, INTER, and OECD. Of these, the OECD is the smallest and consists
of 22 OECD countries. The NONOIL is the largest and consists of most of the
sizable countries of the world for which oil extraction is not the dominant
economic activity. This sample consists of 96 countries. Finally, the INTER is
an intermediate sample comprised of all those countries included in the
NONOIL sample except those for which data quality is not satisfactory. This
sample consists of 74 countries.
Table 1 gives the values of the parameters that belong to the first and second
set. These are also the parameters that remain the same under different
generation mechanisms of vit.
Certain aspects of these parameter values are worth noting. First, there seems
to be some agreement across samples regarding direction in which xits of
different years relate to the individual effect term i. This is reflected in similar
signs of ts across samples. However, this agreement is not complete. Second,
the way different time periods affect the growth process differs across samples.
Table 1. Common Parameter Values
Parameter
NONOIL
INTER
OECD
0
1
2
3
4
5
70
75
80
85
0.7886
0.1641
1.3334
0.0028
0.1200
0.1243
0.0267
0.2277
0.0171
0.0156
0.0067
0.0669
0.7925
0.1732
1.3588
0.1927
0.1098
0.1644
0.1286
0.1715
0.0093
0.0015
0.0218
0.0523
0.6294
0.0954
2.8986
0.5863
0.6354
0.0702
0.6355
0.3484
0.0680
0.0827
0.1295
0.1238
This is revealed by the signs of ts in different samples. There are some
differences in this regard between the NONOIL and the INTER samples.
However, the difference between these two samples on the one hand, and the
OECD, on the other, proves to be more significant.
Next we turn to the parameter values that differ with the three different
generation mechanisms of vit. The estimated values of these parameters are
compiled in Table 2.
Several things may be noted from this Table. First, the largest estimated
values of and are about 0.2 and 0.3, respectively. This indicates that any
serial dependence that vit may have in the actual data is of fairly low order.5
This in turn suggests that the relative performance of different estimators may
not vary widely across different ways of modeling of vit. Second, variance of
the individual country effect term remains quite stable under alternative
generating schemes of vit in all different samples. Third, the estimate of the
variance of vit also remains very similar across the samples. Fourth, the relative
values of and v suggest that variation in the individual effect term i
account for a significant part of the overall variation in the data.
C. Data Generation
Once the parameter values are available, data generation can begin. It proceeds
through the following steps. First of all, values of xits are constructed from the
Table 2.
Parameter
Uncorrelated vit
0.1054
0.1281
v
0.2037
0.1179
0.1225
0.1153
v
0.2994
0.1227
0.1183
0.1171
INTER
OECD
0.0872
0.0139
0.0300
0.0762
0.1250
0.0990
0.1010
0.0980
0.1125
0.0302
0.0742
0.0300
0.1787
0.0943
0.0995
0.0927
0.1394
0.0319
0.0742
0.0316
MA(1) vit
AR(1) vit
Summers-Heston data set in the way described above.6 This data set also
provides the initial values, y0i. We assume that all disturbance terms have
normal distribution.7 The second step differs for different models of vit. For the
uncorrelated model, random values of vit and i are generated using
distributions N(0, 2v) and N(0, 2), respectively. These values of vit and i are
then combined with the given values of yi,t 1 and xi,t 1, and the parameter
values in Table 1 to produce yit. For the first period, y0is serve as the yi,t 1s. For
the subsequent periods, the value of yit serves as the lagged value of y for
generating yi,t + 1. The process continues till the last (T-th) period is reached. For
the MA(1) model, i is again generated using distribution N(0, 2). However,
generation of vit now requires generation of
it from the distribution N(0, 2
).
These values of
it are then combined with the values of to produce the vits.
Generation of vits for the AR(1) proceeds in analogous manner.
Once the data are generated, estimation can proceed. We now turn to the
estimation results.
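The recursive data generation described above can be sketched as follows. This is an illustration under stated assumptions rather than the author's program: the individual and time effects are passed in as given numbers, the MA(1) and AR(1) errors are built from normal shocks with a single coefficient `coef`, and the AR(1) recursion is started from one initial shock. All function and argument names are hypothetical.

```python
import random


def generate_errors(n_periods, sigma, scheme="UC", coef=0.0, rng=random):
    """Draw a transitory error series under one of three schemes.

    "UC"  : serially uncorrelated normal errors
    "MA1" : v_t = e_t + coef * e_{t-1}
    "AR1" : v_t = coef * v_{t-1} + e_t, started from one initial shock
    """
    shocks = [rng.gauss(0.0, sigma) for _ in range(n_periods + 1)]
    if scheme == "UC":
        return shocks[1:]
    if scheme == "MA1":
        return [shocks[t] + coef * shocks[t - 1] for t in range(1, n_periods + 1)]
    if scheme == "AR1":
        series = [shocks[0]]
        for t in range(1, n_periods + 1):
            series.append(coef * series[-1] + shocks[t])
        return series[1:]
    raise ValueError("unknown scheme: " + scheme)


def generate_series(y0, x_lagged, gamma, beta, effect, time_effects, errors):
    """Build y recursively: each generated y serves as the next lagged value."""
    y, previous = [], y0
    for t in range(len(x_lagged)):
        current = (gamma * previous + beta * x_lagged[t]
                   + time_effects[t] + effect + errors[t])
        y.append(current)
        previous = current
    return y


# Deterministic check of the recursion with all effects and errors set to zero.
demo = generate_series(1.0, [2.0, 2.0], 0.5, 1.0, 0.0, [0.0, 0.0], [0.0, 0.0])
```

Passing a seeded `random.Random` instance as `rng` makes a replication reproducible, and repeating the two calls for every country and replication yields the simulated panels.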
present tables showing bias and root mean square error (RMSE) in relative
form, i.e. as percentage of the true parameter value.9 Tables 3 and 4 provide the
relative magnitudes of bias, and Tables 5 and 6 show the relative magnitudes of
root mean square error for the estimates of and , respectively.
These Tables indicate that the relative performance of the estimators varies
across samples and vit generation mechanisms (DGM). To convey an overall
picture, we therefore compute the (algebraic) average of the bias and RMSE for
each estimator. These are row-averages and are presented in the last column of
the Tables. We will first describe the results in terms of these averages and then
consider the inter-sample and inter-DGM variations.
Beginning with the coefficient on $y_{i,t-1}$, we may first consider results regarding bias. Table 3
shows that the OLS estimates of this coefficient are, as expected, positively biased, and this
bias averages seventeen percent. The panel estimates, on the other hand
and as expected, are negatively biased. The only exception in this regard is the
AH(d) estimator, which displays a small positive bias when $v_{it}$ is generated under
the uncorrelated (UC) scheme. However, the average bias is negative for this
estimator too. We refrain from reporting results for the AH(l) estimator because
of its very poor performance. (We will come to this issue shortly.) Among the
panel estimators, the bias is smallest for the AH(d), the LSDV, and the MD
estimators, ranging between five and six percent in absolute value. These are followed by the SE
estimators, for which this bias ranges between eight and ten percent. The largest
bias, about twenty-two percent, is associated with the ABGMM estimators.
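The distinction between an algebraic average and an average of magnitudes can be illustrated with the AH(d) row of Table 3. In the sketch below the magnitudes are those reported in the table, while the signs are assigned following the description in the text (positive under the UC scheme, negative otherwise); that sign assignment is an assumption of this illustration.

```python
# AH(d) relative bias (percent) by sample and generation mechanism.
# Magnitudes from Table 3; signs assumed from the verbal description.
ah_d_bias = {
    "NONOIL": {"UC": 0.4, "MA(1)": -14.5, "AR(1)": -15.9},
    "INTER": {"UC": 0.2, "MA(1)": -9.5, "AR(1)": -10.0},
    "OECD": {"UC": 0.6, "MA(1)": -1.2, "AR(1)": -1.6},
}

values = [b for by_dgm in ah_d_bias.values() for b in by_dgm.values()]
algebraic_average = sum(values) / len(values)                   # signs kept
average_magnitude = sum(abs(b) for b in values) / len(values)   # signs dropped

print(round(algebraic_average, 1))   # -5.7
print(round(average_magnitude, 1))   # 6.0
```

With these signs the algebraic average has the magnitude of the 5.7 percent row average reported for AH(d), whereas ignoring the signs would give about 6.0 percent.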
Table 5 shows that the RMSE in estimating the coefficient on $y_{i,t-1}$ has a similar pattern. The
average RMSE for the OLS estimator stands at seventeen percent. For the
LSDV and the MD estimators, this ratio lies between six and seven percent. For
the AH(d) estimator the ratio averages eleven percent. For the SE estimators,
this ratio lies between thirteen and twenty percent. For the ABGMM estimators,
this ratio is about forty percent or higher.
Looking at the bias results for the coefficient on $x_{i,t-1}$ (Table 4), we see that the OLS estimates are
again severely biased upwards, with the bias now averaging forty-eight
percent. The direction of bias of the panel estimators is mixed. But the panel
procedures yield estimates that are on average quite close to the true parameter
values. The absolute value of this bias for the panel estimators ranges from
under one to seven percent. Within this range, however, the LSDV, the MD, the
2SLS, and the 3SLS estimators perform better, with average bias being less
than one percent. Next come the AH(d) and the G3SLS estimators, with
bias between one and two percent. The largest biases, ranging between
five and seven percent, are recorded for the ABGMM estimators.
The smallness of the average biases of the panel estimates of this coefficient is, however,
swamped by large variances of the Monte Carlo distributions. This finds
Table 3.

                    NONOIL                   INTER                    OECD             Row
Estimator       UC   MA(1)   AR(1)      UC   MA(1)   AR(1)      UC   MA(1)   AR(1) Average
OLS           14.8    14.6    14.8    15.2    15.2    15.4    21.5    20.9    21.2    17.1
LSDV           8.0     8.2     7.9     8.4     9.3     8.0     1.6     1.7     1.4     6.1
AH(l)           nr      nr      nr      nr      nr      nr      nr      nr      nr      nr
AH(d)          0.4    14.5    15.9     0.2     9.5    10.0     0.6     1.2     1.6     5.7
AGMM1         10.7    10.4    10.6    44.4    49.5    43.4     9.5     8.6     8.3    21.7
AGMM2          9.7    10.1    10.2    47.3    49.5    44.4     8.6     8.2     8.5    21.8
2SLS           9.3     9.3     8.6     3.1     3.1     2.8    18.8    15.8    17.1     9.8
3SLS           4.5     3.3     6.5     5.8     7.9     5.3    12.7    12.2    12.2     7.8
G3SLS          6.0     5.4     5.2     8.3    10.1     8.8    19.9    16.5    13.4    10.4
MD             6.7     6.9     6.4     6.9     7.9     6.7     1.3     1.1     1.2     5.0

Notes:
1. The true values of the parameter differ across samples and are provided in Table 1.
2. Row Average is the algebraic average of the numbers in the row.
3. NONOIL, INTER, and OECD are different samples, and UC, MA, and AR refer to the Uncorrelated, Moving Average, and Autoregressive
generation mechanisms of the transitory error $v_{it}$.
4. nr stands for Not Reported, because these numbers generally prove to be too large.
Table 4.

                    NONOIL                   INTER                    OECD             Row
Estimator       UC   MA(1)   AR(1)      UC   MA(1)   AR(1)      UC   MA(1)   AR(1) Average
OLS           31.4    32.1    31.6    11.8    11.7    11.1   100.0    99.9   100.5    47.8
LSDV           1.0     0.7     0.3     0.5     1.4     1.1     1.5     0.7     2.1     0.1
AH(l)           nr      nr      nr      nr      nr      nr      nr      nr      nr      nr
AH(d)          0.4     2.2     4.0     0.6     1.1     1.0     1.7     2.5     0.4     1.5
AGMM1         13.7    14.4    14.5     7.5    16.3    26.4     7.3     5.4     2.6     6.9
AGMM2          3.9     5.2    14.7     3.1    22.1    34.5    17.3    19.9     1.5     5.3
2SLS           2.3     2.7     1.9     2.7     2.0     2.5     0.8     3.9     3.9     0.9
3SLS           0.2     0.2     2.0     2.0     2.3     9.3     5.1     4.5     8.9     0.8
G3SLS          1.3     2.4     1.7     2.0     2.2     8.7    14.2     8.1     2.0     1.4
MD             0.2     0.7     0.8     1.0     0.5     0.0     0.6     0.6     1.3    0.03

Notes:
1. The true values of the parameter differ across samples and are provided in Table 1.
2. Row Average is the algebraic average of the numbers in the row.
3. NONOIL, INTER, and OECD are different samples, and UC, MA, and AR refer to the Uncorrelated, Moving Average, and Autoregressive
generation mechanisms of the transitory error $v_{it}$.
4. nr stands for Not Reported, because these numbers generally prove to be too large.
Table 5.

                    NONOIL                   INTER                    OECD             Row
Estimator       UC   MA(1)   AR(1)      UC   MA(1)   AR(1)      UC   MA(1)   AR(1) Average
OLS           15.0    14.8    14.9    15.3    15.3    15.3    22.3    21.7    22.0    17.4
LSDV           8.5     8.7     8.5     8.9     9.9     8.7     3.5     3.6     3.6     7.1
AH(l)           nr      nr      nr      nr      nr      nr      nr      nr      nr      nr
AH(d)          8.3    16.6    17.6     5.4    13.9    13.3     7.3     7.2     7.5    10.8
AGMM1         27.7    27.5    26.7    64.8    70.4    65.9    24.3    21.9    23.7    39.2
AGMM2         29.6    29.3    28.9    79.7    84.9    77.1    32.9    29.1    31.0    46.9
2SLS          12.1    12.6    12.0     5.1     5.4     5.0    24.3    21.3    23.0    13.4
3SLS           8.5    14.7    10.4     9.6    11.1     8.9    28.4    28.4    23.6    16.0
G3SLS         10.0    18.1     8.7    11.9    13.8    12.6    40.9    37.6    29.3    20.3
MD             7.4     7.8     7.4     7.6     8.7     7.6     3.0     3.1     3.2     6.2

Notes:
1. The true values of the parameter differ across samples and are provided in Table 1.
2. Row Average is the algebraic average of the numbers in the row.
3. NONOIL, INTER, and OECD are different samples, and UC, MA, and AR refer to the Uncorrelated, Moving Average, and Autoregressive
generation mechanisms of the transitory error $v_{it}$.
4. nr stands for Not Reported, because these numbers generally prove to be too large.
Table 6.

                    NONOIL                   INTER                    OECD             Row
Estimator       UC   MA(1)   AR(1)      UC   MA(1)   AR(1)      UC   MA(1)   AR(1) Average
OLS           34.6    35.2    34.7    18.8    18.1    17.7   117.4   116.6   116.0    56.6
LSDV          12.8    15.3    15.4    12.4    14.5    14.3    40.1    44.9    43.8    23.7
AH(l)           nr      nr      nr      nr      nr      nr      nr      nr      nr      nr
AH(d)         19.9    20.1    19.1    20.3    21.2    18.1    64.9    60.2    62.1    34.0
AGMM1        147.0   151.9   145.3   148.0   153.8   143.6   243.5   237.9   226.5   177.5
AGMM2        169.6   169.7   165.1   187.7   205.2   181.9   306.8   284.6   284.1   217.2
2SLS          17.1    18.4    17.5    19.5    20.8    19.3    58.3    54.7    57.9    31.5
3SLS          13.7    21.9    16.1    16.5    15.6    17.8    67.2    64.9    82.5    35.1
G3SLS         17.5    28.2    15.4    17.8    18.5    23.5   119.4   111.1   149.6    55.7
MD            13.3    15.4    15.8    12.6    15.1    14.4    40.5    44.6    45.6    24.1

Notes:
1. The true values of the parameter differ across samples and are provided in Table 1.
2. Row Average is the algebraic average of the numbers in the row.
3. NONOIL, INTER, and OECD are different samples, and UC, MA, and AR refer to the Uncorrelated, Moving Average, and Autoregressive
generation mechanisms of the transitory error $v_{it}$.
4. nr stands for Not Reported, because these numbers generally prove to be too large.
reflection in the large relative RMSE values reported in Table 6. The ratio of
RMSE to the true value of the coefficient on $x_{i,t-1}$ for the OLS estimator stands at fifty-seven percent.
For most of the panel estimators this ratio is much lower. For the LSDV and the
MD estimators, this ratio is close to twenty-four percent. For the AH(d), the
2SLS, and the 3SLS estimators, the ratio lies between thirty-two and thirty-five
percent. The G3SLS estimator displays a higher ratio, fifty-six percent, which
is close to that observed for the OLS estimator. For the ABGMM estimators,
however, this ratio ranges from 178 to 217 percent, which is much higher than
that for the OLS.
These results show that the OLS estimation of the growth-convergence
equation is very likely to produce significantly biased estimates. The
performance of the panel estimators, on the other hand, varies. The LSDV and
the MD estimators perform well. The SE estimators come next in performance.
The AH estimators display very contrasting performance. The AH(l) estimator
performs so poorly that we refrain from presenting its results. On the other hand,
the AH(d) estimator sometimes performs better than the SE estimators. The
ABGMM estimators are found to display large bias and RMSE.
These results agree with recent Monte Carlo evidence produced by other
researchers in other contexts. For example, several studies have reported bias of
the ABGMM estimators. Other studies have reported good small sample
performance of the LSDV estimator. These results imply that OLS
estimation of the growth-convergence equation should be avoided. Indiscriminate
use of panel estimators is also fraught with danger. However, a judicious
choice of panel estimator can yield better estimates of the parameters of the
growth-convergence equation. Empirical growth researchers can make use of
this possibility.
Beyond these results of immediate concern, the study brings out several
general points. The first of these concerns the contrasting performance of the
AH estimators. Both these estimators rely on the assumption of orthogonality
of lagged values of $y$ to $v_{it}$. This assumption holds only when $v_{it}$ is serially uncorrelated.
Therefore, one would expect both these estimators to perform well when $v_{it}$ is
serially uncorrelated, and both of them to perform poorly when $v_{it}$ follows
either the AR(1) or the MA(1) pattern. However, as the numbers in the Tables
show, the AH(d) performs relatively well under all generation
mechanisms of $v_{it}$ and for all samples, while the performance of AH(l) is found
to be unsatisfactory under all generation mechanisms of $v_{it}$ and for all
samples, particularly for the NONOIL and the INTER samples. The
explanation, as it turns out, lies in the difference in the degree of correlation of
the instruments with the instrumented variables. It is found that $(y_{i,t-2} - y_{i,t-3})$,
the instrument used by the AH(d), is strongly correlated with the explanatory
variable $(y_{i,t-1} - y_{i,t-2})$, while $y_{i,t-2}$, the instrument used by the AH(l), is very
poorly correlated with $(y_{i,t-1} - y_{i,t-2})$. This poor correlation is reflected in
astronomically large standard errors for the AH(l) estimates. These
results reconfirm that instruments need to be sufficiently correlated with
the instrumented variable (in addition to being uncorrelated with the error), and
highlight the importance of the research on estimation with weak instruments (see note 10).
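The instrument-strength comparison discussed here can be checked on any balanced panel with a short diagnostic. The sketch below is not the author's code; the function name and the pooling of observations over units and periods are assumptions made for illustration. It computes the correlation of the differenced regressor $(y_{i,t-1} - y_{i,t-2})$ with the level instrument $y_{i,t-2}$ and with the differenced instrument $(y_{i,t-2} - y_{i,t-3})$.

```python
import numpy as np


def instrument_correlations(y):
    """Pooled correlations of the differenced regressor with two instruments.

    y is an N x T array (T >= 4) ordered by time within each row.
    Returns (corr with level instrument, corr with differenced instrument).
    """
    y = np.asarray(y, dtype=float)
    target = (y[:, 2:] - y[:, 1:-1]).ravel()      # y_{t-1} - y_{t-2}
    level_iv = y[:, 1:-1].ravel()                 # y_{t-2}, as used by AH(l)
    diff_iv = (y[:, 1:-1] - y[:, :-2]).ravel()    # y_{t-2} - y_{t-3}, as used by AH(d)
    corr_level = np.corrcoef(target, level_iv)[0, 1]
    corr_diff = np.corrcoef(target, diff_iv)[0, 1]
    return corr_level, corr_diff
```

Which of the two correlations is larger depends on the data at hand, so the function only reports them; it does not presuppose the ranking found in this study.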
A second point concerns the performance of the ABGMM estimators as well
as the AH(d) estimator. The performance of these estimators does not vary
much over the three generation mechanisms of $v_{it}$. This is particularly true with
regard to estimation of the coefficient on $y_{i,t-1}$. This is somewhat surprising because these estimators
depend rather heavily for their validity on orthogonality of lagged values of $y_{it}$
to $v_{it}$, and this orthogonality is violated when $v_{it}$ follows either an AR or an MA
scheme. It is true that the order of serial correlation is low. However, one would
expect some effect of the serial correlation given that it nullifies the validity of so
many instruments. Actually, the AH(d) estimator does show some sensitivity
with respect to the generation scheme of $v_{it}$. Why the ABGMM estimators do
not display similar sensitivity is an intriguing question.
The third point relates to the variation of performance of the estimators
across samples. The overall picture portrayed above is based on averages
over samples and DGMs. Looking at inter-sample variation, however, it is
difficult to establish a pattern. For example, going by the results on bias of
the estimated coefficient on $y_{i,t-1}$, the performance of the OLS estimator deteriorates for the OECD
sample when compared with that for either the NONOIL or the INTER sample.
However, in the case of the LSDV and the MD estimators, the opposite is true. The
ABGMM and the SE estimators show yet another kind of contrast. The
performance of the ABGMM estimators deteriorates for the INTER sample in
comparison with that for either the NONOIL or the OECD sample. In the case of
the SE estimators, the opposite is true. The contrasting performance of the
ABGMM and the SE estimators may not be entirely surprising in view of the
fact that while the former depend on lagged $y_{it}$s as instruments, the SE
estimators rely entirely on the $x_{it}$s.
The fourth point concerns the relative performance of simple and sophisticated
versions of generically similar estimators. The averaged RMSE values
presented in Tables 5 and 6 show that the simpler 2SLS estimator outperforms
the 3SLS and the G3SLS. Similarly, in terms of these averaged values, the
ABGMM1 outperforms the ABGMM2 (see note 11). This highlights the fact that
sophisticated estimators requiring estimated weighting matrices may not
necessarily perform better than their simpler counterparts that do not
require such matrices. Estimation of these weighting matrices creates
additional scope for noise to enter the estimation process, and that may nullify
the potential gain.
The final point concerns the performance of the LSDV estimator. As is
known, for a dynamic panel data model, the LSDV estimator is inconsistent in the
direction of N. It is true that the LSDV estimator is consistent in the direction of T.
However, T in this study is too small to make one a priori hopeful of the benefit
of T-asymptotics. The results of this chapter regarding LSDV estimates show
that even theoretically inconsistent estimators can have good small sample
properties. This reinforces the importance of Monte Carlo studies.
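For readers who want to experiment with the LSDV estimator on simulated data, a minimal sketch follows. It implements the standard within (fixed-effects) transformation followed by least squares; the function name and the two-regressor layout (lagged $y$ and one $x$) are assumptions of this illustration and do not reproduce the exact specification used in the study.

```python
import numpy as np


def lsdv_dynamic(y, y_lag, x):
    """LSDV (within) estimates for y_it = g*y_{i,t-1} + b*x_it + mu_i + v_it.

    All inputs are N x T arrays aligned by unit and period.
    Returns the pair (g, b).
    """
    def within(a):
        a = np.asarray(a, dtype=float)
        return a - a.mean(axis=1, keepdims=True)   # remove unit-specific means

    lhs = within(y).ravel()
    rhs = np.column_stack([within(y_lag).ravel(), within(x).ravel()])
    coef, *_ = np.linalg.lstsq(rhs, lhs, rcond=None)
    return coef[0], coef[1]
```

When the transitory error is absent, the within transformation removes the individual effects exactly and the true coefficients are recovered; with noise and small T the estimate of the lagged-dependent-variable coefficient is subject to the well-known downward bias, which is what the Monte Carlo comparison quantifies.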
V. CONCLUDING REMARKS
The issue of small sample properties of dynamic panel estimators is important.
Both substantive and methodological conclusions often depend on attention
given to this issue. For example, Caselli et al. (1996) reject the Solow model
based on their results from estimation of the growth-convergence equation
using a variant of the ABGMM estimator. The small sample bias of this
estimator reported in this and other studies may raise the question whether such
a rejection was too quick. Also, the estimation results prompt the authors to
abandon the strictly model-based specification in favor of an extended version
that includes a variety of variables based on heuristic reasoning. From a
methodological point of view, this is a throwback to the earlier stage of cross-country
growth research when specifications used to be informal, and the
coefficients of the regressions did not have exact correspondence with the
structural parameters of the production function. One of the great merits of
Mankiw, Romer & Weil (1992) and Barro & Sala-i-Martin (1992) was to put
an end to this stage. Methodologically, therefore, a return to informal
specifications may not be the ideal thing to do. A more satisfactory solution is
perhaps to adopt a two-stage analysis, with the first stage adhering to the
formal, model-based specification and yielding unbiased estimates of parameters and productivity. The second stage may focus on the role of the heuristic
variables in explaining productivity differences. However, this requires
attention to the issue of small sample performance of the estimator used in the
first stage.
NOTES
1. For a derivation of the growth-convergence equation, see Barro & Sala-i-Martin
(1992, 1995), Mankiw, Romer & Weil (1992), and Mankiw (1995). For conversion of
the growth-convergence equation into a dynamic panel data model, see Islam (1993,
1995).
2. For discussions of many of these new estimators, see Baltagi (1995) and Hsiao
(1986).
3. This is value of that has been used in Islam (1993, 1995), Knight et al. (1993),
Caselli et al. (1996) and in several other papers.
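4. For example, for the MA(1) model, this starts by noticing that $E(u_i u_i')$ has the
following structure:
\[
E(u_i u_i') =
\begin{pmatrix}
\sigma_\mu^2 + (1+\theta^2)\sigma_\epsilon^2 & \sigma_\mu^2 + \theta\sigma_\epsilon^2 & \sigma_\mu^2 & \sigma_\mu^2 & \sigma_\mu^2 \\
\sigma_\mu^2 + \theta\sigma_\epsilon^2 & \sigma_\mu^2 + (1+\theta^2)\sigma_\epsilon^2 & \sigma_\mu^2 + \theta\sigma_\epsilon^2 & \sigma_\mu^2 & \sigma_\mu^2 \\
\sigma_\mu^2 & \sigma_\mu^2 + \theta\sigma_\epsilon^2 & \sigma_\mu^2 + (1+\theta^2)\sigma_\epsilon^2 & \sigma_\mu^2 + \theta\sigma_\epsilon^2 & \sigma_\mu^2 \\
\sigma_\mu^2 & \sigma_\mu^2 & \sigma_\mu^2 + \theta\sigma_\epsilon^2 & \sigma_\mu^2 + (1+\theta^2)\sigma_\epsilon^2 & \sigma_\mu^2 + \theta\sigma_\epsilon^2 \\
\sigma_\mu^2 & \sigma_\mu^2 & \sigma_\mu^2 & \sigma_\mu^2 + \theta\sigma_\epsilon^2 & \sigma_\mu^2 + (1+\theta^2)\sigma_\epsilon^2
\end{pmatrix}
\]
where $u_i = (u_{i1}, u_{i2}, \ldots, u_{iT})'$, and $T = 5$. As expected, $E(u_i u_i')$ has three parameters,
namely $\sigma_\mu$, $\sigma_\epsilon$, and $\theta$ (the standard deviations of the individual effect and of the
MA innovation, and the MA coefficient). The sample analog of this covariance matrix is obtained from
$\frac{1}{N}\sum_i \hat{u}_i \hat{u}_i'$, where $\hat{u}_i = (\hat{u}_{i1}, \ldots, \hat{u}_{iT})'$, and the $\hat{u}_{it}$s are obtained from the second step. There
are $T(T + 1)/2 = 15$ distinct elements in this sample covariance matrix, which are (nonlinear)
functions of the three underlying parameters $\sigma_\mu$, $\sigma_\epsilon$, and $\theta$. Estimates of $\sigma_\mu$, $\sigma_\epsilon$,
and $\theta$ can be obtained from these 15 elements using the MD estimation framework.
For details, see Chamberlain (1982, 1983). An analogous procedure is followed for the
AR(1) model to obtain the estimates of the corresponding three parameters. Estimation of $\sigma_v$ and $\sigma_\mu$ for the
UC case is easier.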
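To make the structure in note 4 concrete, the following sketch builds the implied MA(1) covariance matrix and recovers the three parameters by matching averaged moments. This moment-matching step is a simplified stand-in chosen for illustration; it is not the minimum distance (MD) estimator of Chamberlain used in the study, which weights all 15 distinct elements. Function and argument names are hypothetical.

```python
import numpy as np


def ma1_covariance(sigma_mu2, sigma_eps2, theta, periods=5):
    """Covariance matrix of u_it = mu_i + eps_it + theta * eps_{i,t-1}."""
    omega = np.full((periods, periods), float(sigma_mu2))   # mu_i enters every cell
    idx = np.arange(periods)
    omega[idx, idx] += (1.0 + theta ** 2) * sigma_eps2       # main diagonal
    omega[idx[:-1], idx[1:]] += theta * sigma_eps2           # first superdiagonal
    omega[idx[1:], idx[:-1]] += theta * sigma_eps2           # first subdiagonal
    return omega


def match_ma1_moments(s):
    """Recover (sigma_mu2, sigma_eps2, theta) from a covariance matrix s."""
    s = np.asarray(s, dtype=float)
    periods = s.shape[0]
    far = np.mean([s[i, j] for i in range(periods)
                   for j in range(i + 2, periods)])          # estimates sigma_mu2
    a = np.mean(np.diag(s, k=1)) - far                       # theta * sigma_eps2
    b = np.mean(np.diag(s)) - far                            # (1 + theta^2) * sigma_eps2
    if a == 0.0:
        return far, b, 0.0
    r = a / b
    if abs(r) > 0.5:
        raise ValueError("moments are not compatible with an MA(1) process")
    theta = (1.0 - np.sqrt(1.0 - 4.0 * r ** 2)) / (2.0 * r)  # invertible root
    return far, a / theta, theta
```

Applied to an exact matrix such as `ma1_covariance(0.5, 1.0, 0.4)`, the second function returns the parameters used to build it; applied to a sample covariance matrix it gives rough starting values that a proper MD step could refine.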
5. Perhaps also of interest is that the value of both and are the largest in the
NONOIL sample and the smallest in the OECD sample, with the values for the INTER
sample being in between.
6. For further details on construction of the $x_{it}$s, see Islam (1995).
7. In this study we have limited ourselves to parametric distributions of the
disturbance term. In principle it is possible to do away with parametric assumptions. We
leave this as a future task.
8. To save space, we do not provide detailed description of the estimators. Many of
these are well known. For the rest, the interested reader can see the cited references. An
appendix containing the description of the estimators is also available from the author
upon request.
9. In this chapter we report only the summary results. The detailed results are in a set
of Appendix Tables, which are available upon request.
10. See for example Nelson & Startz (1990), Staiger & Stock (1997), and Wang &
Zivot (1998).
11. To be sure, this ranking does not hold for every sample and every DGM. For
example in the NONOIL sample, regardless of the DGM, results from the 3SLS and the
G3SLS estimators seem to be better than that from the 2SLS. For the INTER sample,
however, the 2SLS seems to perform better than either the 3SLS or the G3SLS. In case
of the OECD sample, the situation is less clear cut. In terms of the mean of the Monte
Carlo distribution, the 3SLS and the G3SLS fare better than the 2SLS, though not in
terms of dispersion. On the other hand, in the OECD sample, the Monte Carlo
distributions for the 2SLS estimator have very large standard deviation. One reason for
deterioration of performance of the 3SLS and the G3SLS estimators in the INTER and
the OECD samples, when compared to that in the NONOIL sample, may lie in sample
size. The sizes of the former samples are smaller than that of the latter. Since the
superiority of the 3SLS and the G3SLS over the 2SLS estimator is an asymptotic result,
a larger sample size may help this result to surface.
ACKNOWLEDGMENTS
I would like to thank Professor Chamberlain, Professor Jorgenson, and
Professor Guido Imbens for their guidance to my work on this paper. Initial
versions of this chapter were presented in seminars at Harvard University and
Emory University. Comments of the participants of these seminars are greatly
appreciated. I would like to extend my sincere thanks to the three referees and
the editor, Professor Badi Baltagi, for their comments and suggestions that led
to significant improvement of this chapter. All remaining errors are mine.
REFERENCES
Ahn, S. C., & Schmidt, P. (1995). Efficient Estimation of Models for Dynamic Panel Data. Journal
of Econometrics, 68, 5-27.
Ahn, S. C., & Schmidt, P. (1997). Efficient Estimation of Dynamic Panel Models: Alternative
Assumptions and Simplified Estimation. Journal of Econometrics, 76, 309-321.
Ahn, S. C., & Schmidt, P. (1999). Estimation of Linear Panel Data Models Using GMM. In:
L. Matyas (Ed.), Generalized Method of Moments Estimation. Cambridge: Cambridge
University Press.
Alonso-Borrego, C., & Arellano, M. (1999). Symmetrically Normalized Instrumental-Variable
Estimation Using Panel Data. Journal of Business and Economic Statistics, 17, 36-49.
Anderson, T. W., & Hsiao, C. (1981). Estimation of Dynamic Models with Error Components.
Journal of the American Statistical Association, 76, 598-606.
Anderson, T. W., & Hsiao, C. (1982). Formulation and Estimation of Dynamic Models Using
Panel Data. Journal of Econometrics, 18, 47-82.
Arellano, M., & Bond, S. (1991). Some Tests of Specification for Panel Data: Monte Carlo
Evidence and an Application to Employment Equations. The Review of Economic Studies,
58, 277-297.
Arellano, M., & Bover, O. (1995). Another Look at the Instrumental Variable Estimation of Error
Components Models. Journal of Econometrics, 68, 29-52.
Balestra, P., & Nerlove, M. (1966). Pooling Cross-section and Time Series Data in the Estimation
of a Dynamic Model: The Demand for Natural Gas. Econometrica, 34, 585-612.
Baltagi, B. H. (1995). Econometric Analysis of Panel Data. New York: John Wiley and Sons.
Baltagi, B. H., & Kao, C. (2000). Non-stationary Panels, Cointegration in Panels, & Dynamic
Panels: A Survey. Advances in Econometrics, 15 (this volume).
Barro, R. (1997). Determinants of Economic Growth: A Cross-country Empirical Study.
Cambridge: MIT Press.
Barro, R., & Sala-i-Martin, X. (1992). Convergence. Journal of Political Economy, 100(2),
223-251.
Barro, R., & Sala-i-Martin, X. (1995). Economic Growth. Boston: McGraw Hill.
Bekker, P. A. (1994). Alternative Approximations to the Distributions of Instrumental Variable
Estimators. Econometrica, 62, 657-681.
Blundell, R., & Bond, S. (1998). Initial Conditions and Moment Restrictions in Dynamic Panel
Data Models. Journal of Econometrics, 87, 115-143.
Caselli, F., Esquivel, G., & Lefort, F. (1996). Reopening the Convergence Debate: A New Look
at Cross-country Growth Empirics. Journal of Economic Growth, 1(3), 363-390.
Chamberlain, G. (1982). Multivariate Regression Models for Panel Data. Journal of Econometrics,
18, 5-46.
Chamberlain, G. (1983). Panel Data. In: Z. Griliches & M. Intriligator (Eds), Handbook of
Econometrics (pp. 1247-1318), Vol. II. North-Holland.
Hahn, J. (1999). How Informative is the Initial Condition in the Dynamic Panel Model with Fixed
Effects? Journal of Econometrics, 93, 309-326.
Harris, M. N., & Matyas, L. (1996). A Comparative Analysis of Different Estimators for Dynamic
Panel Data Models. Working paper: 04/96, Department of Econometrics and Business
Statistics, Monash University.
Harris, M., Longmire, R., & Matyas, L. (1996). Robustness of Estimators for Dynamic Panel Data
Models to Misspecification. Working paper No. 14/96, Department of Econometrics and
Business Statistics, Monash University.
Hsiao, C. (1986). Analysis of Panel Data. Cambridge: Cambridge University Press.
Islam, N. (1993). Estimation of Dynamic Models from Panel Data. Unpublished Ph.D.
Dissertation, Department of Economics, Harvard University.
Islam, N. (1995). Growth Empirics: A Panel Data Approach. Quarterly Journal of Economics, CX,
1127-1170.
Judson, R. A., & Owen, A. L. (1997). Estimating Dynamic Panel Data Models: Practical Guide
for Macroeconomists. Board of Governors of the Federal Reserve System, Finance and
Economics Discussion Paper Series 1997/03.
Kiviet, J. (1995). On Bias, Inconsistency, & Efficiency of Various Estimators in Dynamic Panel
Data Models. Journal of Econometrics, 68, 53-78.
Knight, M., Loayza, N., & Villanueva, D. (1993). Testing for Neoclassical Theory of Growth. IMF
Staff Papers, 40(3), 512-541.
Lee, K., Pesaran, H., & Smith, R. (1997). Growth and Convergence in a Multi-Country Empirical
Stochastic Growth Model. Journal of Applied Econometrics, 12, 357-392.
Lee, K., Pesaran, H., & Smith, R. (1998). Growth Empirics: A Panel Data Approach: A
Comment. Quarterly Journal of Economics, CXIII, 319-323.
Lee, M., Longmire, R., Matyas, L., & Harris, M. (1998). Growth Convergence: Some Panel
Evidence. Applied Economics, 30, 907-912.
Mankiw, N. G. (1995). The Growth of Nations. Brookings Papers on Economic Activity, 1,
275-310.
Mankiw, N. G., Romer, D., & Weil, D. (1992). A Contribution to the Empirics of Growth.
Quarterly Journal of Economics, CVII, 407-437.
Matyas, L. (Ed.) (1999). Generalized Method of Moments Estimation. Cambridge: Cambridge
University Press.
Mundlak, Y. (1971). On the Pooling of Time Series and Cross-section Data. Econometrica, XXXVI,
69-85.
Nelson, C. R., & Startz, R. (1990). Some Further Results on the Exact Small Sample Properties
of the Instrumental Variables Estimator. Econometrica, 58, 967-976.