Nonstationary Panels, Panel Cointegration, and Dynamic Panels

LIST OF CONTRIBUTORS
Badi H. Baltagi
Texas A&M University, Department of

Economics, College Station, TX 77843-4228,
USA. E-mail: badi@econ.tamu.edu
M. Douglas Berg
Department of Economics and International

Business, Sam Houston State University,
Huntsville, TX 77341, USA
Richard Blundell
Institute for Fiscal Studies and University

College London, UK.
E-mail: r.blundell@ucl.ac.uk
Stephen Bond
Institute for Fiscal Studies and Nuffield

College, Oxford, UK.
E-mail: steve.bond@nuffield.ox.ac.uk
Jörg Breitung
Humboldt University Berlin, Institute of

Statistics and Econometrics, Spandauer
Strasse 1, D-10178 Berlin, Germany.
Fax: + 49.30.2093.5712;
E-mail: breitung@wiwi.hu-berlin.de
Min-Hsien Chiang
National Cheng-Kung University, Institute of

International Business, Tainan, Taiwan.
Fax: 886-6-2766459;
E-mail: mchiang@mail.ncku.edu.tw
Alain Hecq
University Maastricht, Department of

Quantitative Economics, P.O. Box 616, 6200
MD Maastricht, The Netherlands.
Fax: + 31-43-388 48 74
Nazrul Islam
Department of Economics, Emory University,

Atlanta, GA 30322-2240, USA.
Fax: 404-727-4639;
E-mail: nislam@emory.edu
Chihwa Kao
Syracuse University, Center for Policy

Research, Syracuse, NY 13244-1020, USA.
Fax: 315-443-1081;
E-mail: cdkao@maxwell.syr.edu
Heikki Kauppi
University of Helsinki, Department of

Economics, P.O. Box 54 (Unioninkatu 37),
FIN-00014 University of Helsinki, Finland.
Fax: + 358-9-1917980;
E-mail: heikki.kauppi@helsinki.fi
Qi Li
Department of Economics, Texas A&M

University, College Station, TX 77843 and
Department of Economics, University of
Guelph, Guelph, Ontario, N1G 2W1 Canada.
E-mail: qi@econ.tamu.edu
Chris Murray

Houston, Houston, TX 77204-5882, USA.
Fax: (713) 743-3798;
E-mail: cjmurray@uh.edu
Franz C. Palm

Fax: + 31-43-388 48 74
David H. Papell

Houston, Houston, TX 77204-5882, USA.
Fax: (713) 743-3798;
E-mail: dpapell@uh.edu
Peter Pedroni
Indiana University, Department of

Economics, Bloomington, IN 47405, USA.
E-mail: ppedroni@indiana.edu
Aman Ullah

California, Riverside, CA 92521, USA
Jean-Pierre Urbain

Fax: + 31-43-388 48 74;
E-mail: j.urbain@ke.unimaas.nl
Frank Windmeijer
Institute for Fiscal Studies, 7 Ridgmount

Street, London WC1E 7AE, UK.
Fax: + 44.(0)20.7323.4780;
E-mail: f.windmeijer@ifs.org.uk
Showen Wu
Department of Finance and Managerial

Economics, State University of New York at
Buffalo, Buffalo, NY 14260, USA
Yong Yin
Department of Economics, State University

of New York at Buffalo, Buffalo, NY 14260,
USA. Fax: 716-645-2127;
E-mail: yyin@buffalo.edu
INTRODUCTION
Badi H. Baltagi, Thomas B. Fomby and R. Carter Hill
Twenty-two years ago, the first special issue on panel data econometrics was
published by the Annales de l'INSEE. It consisted of two volumes, edited by
Mazodier (1978), containing a who's who in the economics and econometrics of
panel data. Since then, several books on panel data
have been written, including the Econometric Society monograph by Hsiao
(1986), a two-volume collection of classic papers on the subject by Maddala
(1993), a handbook edited by Matyas & Sevestre (1996), which in its second
edition contained 33 chapters, and a textbook by Baltagi (1995a). Several
special issues of journals with a panel data theme have also appeared since
1978; these include Raj & Baltagi (1992), Matyas (1992), Carraro et al.
(1993), Baltagi (1995b), Sevestre (1999) and Banerjee (1999). There have been
nine international conferences on panel data since the first conference at
INSEE; the most recent was held at the University of Geneva in June 2000. Panel
data econometrics continues to have an important impact on today's empirical
economics studies. A Journal of Economic Literature search returned 2780
citations using the words "panel data" between 1980 and 2000. This volume is
dedicated to two recent, intensive areas of research in the econometrics of panel
data: nonstationary panels and dynamic panels; see the survey chapter by
Baltagi & Kao in this volume. The volume includes eleven refereed chapters on
this subject written by twenty authors. The editors are grateful to the authors
and referees for their cooperation.
The chapter by Baltagi & Kao surveys the nonstationary panels, cointegration in panels and dynamic panels literature. In particular, panel unit root tests
are considered first and several important chapters are reviewed including a
summary of the finite sample properties of these unit roots tests obtained from
Nonstationary Panels, Panel Cointegration and Dynamic Panels, Volume 15, pages 1–5.
Copyright © 2000 by Elsevier Science Inc.
All rights of reproduction in any form reserved.
ISBN: 0-7623-0688-2
extensive simulations. Also, spurious regressions in panel data are considered

followed by panel cointegration tests with a summary of the finite sample
properties of these cointegration tests using Monte Carlo experiments. Next,
estimation and inference in panel cointegration models is considered and the
chapter concludes with a review of recent developments in dynamic panel data
models that have occurred over the last five years.
The chapter by Blundell, Bond & Windmeijer reviews recent developments
in the estimation of dynamic panel data models using generalized method of
moments (GMM). In particular, this chapter focuses on the system GMM
estimator derived by Blundell & Bond (1998) which relies on relatively mild
restrictions on the initial condition process. This system GMM estimator
encompasses the GMM estimator based on the non-linear moment conditions
available in the dynamic error components model. Monte Carlo experiments
and asymptotic variance calculations show that this extended GMM estimator
can offer considerable efficiency gains in situations where the first differenced
GMM estimator performs poorly.
The chapter by Pedroni develops methods for estimating and testing
hypotheses for cointegrating vectors in dynamic panels. In particular, this
chapter proposes methods based on fully modified OLS principles which
account for considerable heterogeneity across individual members of the panel.
The asymptotic properties of various estimators are compared based on pooling
along the within and between dimensions of the panel. Monte Carlo
simulations show that the group mean estimator is well behaved even in
relatively small samples under a variety of scenarios.
The chapter by Hecq, Palm & Urbain extends the concept of serial
correlation common features analysis to nonstationary panel data models. This
analysis is motivated both by the need to study and test for common structures
and comovements in panel data with autocorrelation present and by an increase
in efficiency due to pooling. The authors propose sequential testing procedures
and test their performance using a small scale Monte Carlo. Concentrating
upon the fixed effects model, they define homogeneous panel common feature
models and give a series of steps to implement these tests. These tests are used
to investigate the liquidity constraints model for 22 OECD and G7 countries.
The presence of a panel common feature vector is rejected at the 5% nominal
level.
The chapter by Breitung studies the local power of panel unit root test
statistics against a sequence of local power alternatives. In particular, this
chapter finds that the Levin & Lin (1993) (LL) and Im, Pesaran & Shin (1997)
(IPS) tests suffer from severe loss of power if individual specific trends are
included. Breitung suggests a test statistic that does not employ a bias
adjustment whose power is substantially higher than that of LL or the IPS tests
using Monte Carlo experiments. This chapter also finds that the power of the
LL and IPS tests is sensitive to the specification of the deterministic terms.
The chapter by Kao & Chiang studies the limiting distributions of ordinary
least squares (OLS), fully modified OLS (FMOLS) and dynamic OLS (DOLS)
estimators in a panel cointegrated regression model. This chapter shows that
the OLS, FMOLS and DOLS estimators are all asymptotically normally
distributed. However, the asymptotic distribution of the OLS estimator has a
non-zero mean. Extensive Monte Carlo experiments are performed which show
that the OLS estimator has a non-negligible bias in finite samples, the FMOLS
estimator does not improve on the OLS estimator in general, and the DOLS
estimator outperforms both OLS and FMOLS.
The chapter by Murray & Papell proposes a panel unit roots test in the
presence of structural change. In particular, this chapter proposes a unit root
test for non-trending data in the presence of a one-time change in the mean for
a heterogeneous panel. The date of the break is endogenously determined. The
resultant test allows for both serial and contemporaneous correlation, both of
which are often found to be important in the panel unit roots context. Murray
& Papell conduct two power experiments for panels of non-trending, stationary
series with a one-time change in means and find that conventional panel unit
root tests generally have very low power. Then they conduct the same
experiment using methods that test for unit roots in the presence of structural
change and find that the power of the test is much improved.
The chapter by Kauppi develops a new limit theory for panel data that may
be cross sectionally heterogeneous in a fairly general way. This limit theory
builds upon the concepts of joint convergence in probability and in distribution
for double indexed processes by Phillips & Moon (1999a). This limit theory is
applied to a panel regression model with regressors that are generated by an
autoregressive process with a root local to unity. The main results are the
following: (i) the usual pooled panel OLS estimator is invalid for inference, (ii)
a bias corrected pooled OLS proves to be $\sqrt{N}T$ consistent with an asymptotic
normal distribution centered on the true parameter value irrespective of
whether the regressors have near or exact unit roots. This positive result holds
only in the special case where the model does not exhibit any deterministic
effects, such as individual intercepts. (iii) The fully modified panel estimator of
Phillips & Moon (1999a) is also subject to severe bias effects if the regressors
are nearly rather than exactly cointegrated. These theoretical results are
confirmed using Monte Carlo results.
The chapter by Yin & Wu proposes stationarity tests for a heterogeneous

panel data model. The authors consider the case of serially correlated errors in
the level and trend stationary models. The proposed panel tests utilize the
Kwiatkowski, Phillips, Schmidt & Shin (1992) test and the Leybourne &
McCabe (1994) test from the time series literature. Two different ways of
pooling information from the independent tests are used. In particular, the
group mean and the Fisher type tests are used to develop the panel stationarity
tests. Monte Carlo experiments are performed that reveal good small sample
performance in terms of size and power.
The chapter by Berg, Li & Ullah considers the problem of estimating a
semiparametric partially linear dynamic panel data model with disturbances
that follow a one-way error component structure. Two new semiparametric
instrumental variable (IV) estimators are proposed for the coefficient of the
parametric component. These are shown to be more efficient than the ones
suggested by Li & Stengos (1996) and Li & Ullah (1998) because they make
full use of the error component structure. This is confirmed using Monte Carlo
experiments.
The chapter by Islam conducts a Monte Carlo study to investigate the small
sample properties of dynamic panel data estimators. Although there are
extensive Monte Carlo studies on this subject, this study customizes the design
to the estimation of the growth convergence equation using the Summers-Heston data. Islam concludes that the OLS estimation of the
growth-convergence equation is likely to give misleading results. At the same
time, indiscriminate use of panel estimators is risky and one should make
judicious choice of panel estimators.
REFERENCES
Only references that are not cited later in the volume are given here.
Baltagi, B. H. (1995b). Editor's Introduction: Panel Data. Journal of Econometrics, 68, 1–4.
Banerjee, A. (1999). Panel Data Unit Roots and Cointegration: An Overview. Oxford Bulletin of
Economics and Statistics, 61, 607–629.
Carraro, C., Peracchi, F., & Weber, G. (Eds.) (1993). The Econometrics of Panels and Pseudo
Panels. Journal of Econometrics, 59, 1–211.
Hsiao, C. (1986). Analysis of Panel Data. Cambridge: Cambridge University Press.
Maddala, G. S. (Ed.) (1993). The Econometrics of Panel Data. Vols. 1 and 2. Cheltenham: Edward
Elgar.
Matyas, L. (Ed.) (1992). Modelling Panel Data. Structural Change and Economic Dynamics, 3,
291–384.
Matyas, L., & Sevestre, P. (Eds.) (1996). The Econometrics of Panel Data: Handbook of Theory
and Applications. Dordrecht: Kluwer Academic Publishers.
Mazodier, P. (Ed.) (1978). The Econometrics of Panel Data. Annales de l'INSEE, 30/31.
Raj, B., & Baltagi, B. (1992). Editors' Introduction and Overview: Panel Data Analysis. Empirical
Economics, 17, 1–8.
Sevestre, P. (1999). 1977–1997: Changes and Continuities in Panel Data. Annales d'Économie et
de Statistique, 55–56, 15–25.
NONSTATIONARY PANELS,
COINTEGRATION IN PANELS AND
DYNAMIC PANELS: A SURVEY
Badi H. Baltagi and Chihwa Kao
ABSTRACT
This chapter provides an overview of topics in nonstationary panels: panel
unit root tests, panel cointegration tests, and estimation of panel
cointegration models. In addition it surveys recent developments in
dynamic panel data models.
I. INTRODUCTION
Two important areas in the econometrics of panel data that have received much
attention recently are dynamic panel data$^1$ and nonstationary panel time series
models.$^2$ This special issue focuses on these two topics. With the growing use
of cross-country data over time to study purchasing power parity, growth
convergence and international R&D spillovers, the focus of panel data
econometrics has shifted towards studying the asymptotics of macro panels
with large N (number of countries) and large T (length of the time series) rather
than the usual asymptotics of micro panels with large N and small T. In fact, the
limiting distributions of double indexed integrated processes had to be
developed, see Phillips & Moon (1999a). The fact that T is allowed to increase
to infinity in macro panel data generated two strands of ideas. The first rejected
the homogeneity of the regression parameters implicit in the use of a pooled
regression model in favor of heterogeneous regressions, i.e. one for each

country, see Pesaran & Smith (1995), Im, Pesaran & Shin (1997), Lee, Pesaran
& Smith (1997), Pesaran, Shin & Smith (1999) and Pesaran & Zhao (1999) to
mention a few. This literature critically relies on T being large to estimate each
country's regression separately. Another strand of literature applied time series
procedures to panels, worrying about non-stationarity, spurious regressions and
cointegration. Adding the cross-section dimension to the time-series dimension
offers an advantage in the testing for nonstationarity and cointegration. The
hope of the econometrics of nonstationary panel data is to combine the best of
both worlds: the method of dealing with nonstationary data from the time series
and the increased data and power from the cross-section. The addition of the
cross-section dimension, under certain assumptions, can act as repeated draws
from the same distribution. Thus, as the time and cross-section dimensions
increase, panel test statistics and estimators can be derived which converge in
distribution to normally distributed random variables.
However, the use of such panel data methods is not without its critics; see
Maddala, Wu & Liu (2000), who argue that panel data unit root tests do not
rescue purchasing power parity (PPP). In fact, the results on PPP with panels
are mixed, depending on the group of countries studied, the period of study and
the type of unit root test used. More damaging is the argument by Maddala et
al. that for PPP, panel data tests are the wrong answer to the low power of unit
root tests in single time series. After all, the null hypothesis of a single unit root
is different from the null hypothesis of a panel unit root for the PPP hypothesis.
Using the same line of criticism, Maddala (1999) argued that panel unit root
tests did not help settle the question of growth convergence among countries.
However, it was useful in spurring much needed research into dynamic panel
data models. Quah (1996) likewise argued that the basic issue of whether
poor countries catch up with the rich can never be answered by the use of
traditional panels. Instead, Quah suggested formulating and estimating models
of income dynamics.
One can find numerous applications of time series methods applied to panels
in recent years. Examples from the purchasing power parity literature include
Bernard & Jones (1996), Jorion & Sweeney (1996), MacDonald (1996), Oh
(1996), Wu (1996), Coakley & Fuertes (1997), Culver & Papell (1997), Papell
(1997), O'Connell (1998), Choi (1999a), Andersson & Lyhagen (1999), and
Canzoneri, Cumby & Diba (1999) to mention a few. On health care
expenditures, see McCoskey & Selden (1998), and Gerdtham & Löthgren
(1998). On growth and convergence, see Islam (1995), Evans & Karas (1996),
Sala-i-Martin (1996), Lee, Pesaran & Smith (1997), and McCoskey & Kao
(1999a). On international R&D spillovers, see Funk (1998) and Kao, Chiang &
Chen (1999). On exchange rate models, see Groen & Kleibergen (1999), and
Groen (1999). On savings and investment models, see Coakley, Kulasi & Smith
(1996) and Moon & Phillips (1998).
The first part of this chapter surveys some of the developments in
nonstationary panel models that have occurred since the mid-1990s. Two
other recent surveys on this subject are Phillips & Moon (1999b) on multi-indexed processes and Banerjee (1999) on panel unit roots and cointegration
tests. We will pay attention to the following three topics: (1) panel unit root
tests, (2) panel cointegration tests, and (3) estimation and inference in the panel
cointegration models. The discussion of each topic will be illustrated by
examples taken from the aforementioned list of references. Section 2 reviews
panel unit root tests, while Section 3 discusses the panel spurious models.
Section 4 considers the panel cointegration tests, while Section 5 discusses
panel cointegration models. Section 6 reviews some recent developments in
dynamic panels and Section 7 gives our conclusion.
A word on notation. We write the integral $\int_0^1 W(s)\,ds$ as $\int W$ when there is no
ambiguity over limits. We define $\Omega^{1/2}$ to be any matrix such that
$\Omega = (\Omega^{1/2})(\Omega^{1/2})'$, use $\Rightarrow$ to denote weak convergence, $\xrightarrow{p}$ to denote convergence
in probability, I(0) and I(1) to signify a time series that is integrated of order
zero and one, respectively, and $W_Z(r) = W(r) - [\int W Z'][\int Z Z']^{-1} Z(r)$ to denote an
$L_2$ projection residual of $W(r)$ on $Z(r)$.
II. PANEL UNIT ROOTS TESTS

Testing for unit roots in time series studies is now common practice among
applied researchers and has become an integral part of econometric courses.
However, testing for unit roots in panels is recent, see Levin & Lin (1992), Im,
Pesaran & Shin (1997), Harris & Tzavalis (1999), Maddala & Wu (1999), Choi
(1999a), and Hadri (1999). Exceptions are Bhargava et al. (1982), Boumahdi
& Thomas (1991), Breitung & Meyer (1994), and Quah (1994). Bhargava et
al. proposed a test for random walk residuals in a dynamic model with fixed
effects. They suggested a modified Durbin-Watson (DW) statistic based on
fixed effects residuals and two other test statistics based on differenced OLS
residuals. In typical micro panels with $N \to \infty$, they recommended their
modified DW statistic. Boumahdi & Thomas (1991) proposed a generalization
of the Dickey-Fuller (DF) test for unit roots in panel data to assess the
efficiency of the French capital market using 140 French stock prices over the
period January 1973 to February 1986. Breitung & Meyer (1994) applied
various modified DF test statistics to test for unit roots in a panel of contracted
wages negotiated at the firm and industry level for Western Germany over the
period 1972–1987. Quah (1994) suggested a test for unit root in a panel data
model without fixed effects where both N and T go to infinity at the same rate
such that N/T is constant. Levin & Lin (1992) generalized this model to allow
for fixed effects, individual deterministic trends and heterogeneous serially
correlated errors. They assumed that both N and T tend to infinity. However, T
increases at a faster rate than N with $N/T \to 0$. Even though this literature grew
from time series and panel data, the way in which N, the number of cross-section units, and T, the length of the time series, tend to infinity is crucial for
determining asymptotic properties of estimators and tests proposed for
nonstationary panels, see Phillips & Moon (1999a). Several approaches are
possible including: (i) sequential limits where one index, say N, is fixed and T
is allowed to increase to infinity, giving an intermediate limit. Then by letting
N tend to infinity subsequently, a sequential limit theory is obtained. Phillips &
Moon (1999b) argued that these sequential limits are easy to derive and are
helpful in extracting quick asymptotics. However, Phillips and Moon provided
a simple example that illustrates how sequential limits can sometimes give
misleading asymptotic results. (ii) A second approach, used by Quah (1994)
and Levin & Lin (1992) is to allow the two indexes, N and T to pass to infinity
along a specific diagonal path in the two dimensional array. This path can be
determined by a monotonically increasing functional relation of the type
T = T(N) which applies as the index $N \to \infty$. Phillips & Moon (1999b) showed
that the limit theory obtained by this approach is dependent on the specific
functional relation T = T(N) and the assumed expansion path may not provide
an appropriate approximation for a given (T, N) situation. (iii) A third approach
is a joint limit theory allowing both N and T to pass to infinity simultaneously
without placing specific diagonal path restrictions on the divergence. Some
control over the relative rate of expansion may have to be exercised in order to
get definitive results. Phillips & Moon argued that, in general, joint limit theory
is more robust than either sequential limit or diagonal path limit. However, it
is usually more difficult to derive and requires stronger conditions such as the
existence of higher moments that will allow for uniformity in the convergence
arguments. The multi-index asymptotic theory in Phillips & Moon (1999a, b) is
applied to joint limits in which $T, N \to \infty$ and $(T/N) \to \infty$, i.e. to situations where
the time series sample is large relative to the cross-section sample. However,
the general approach given there is also applicable to situations in which
$(T/N) \to 0$, although different limit results will generally obtain in that case.
A. Levin & Lin (1992) Tests

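Consider the model
$$y_{it} = \rho_i y_{i,t-1} + z_{it}'\gamma_i + u_{it}, \qquad i = 1, \ldots, N;\; t = 1, \ldots, T, \tag{1}$$
where $z_{it}$ is the deterministic component and $u_{it}$ is a stationary process. $z_{it}$ could
be zero, one, the fixed effects, $\mu_i$, or fixed effect as well as a time trend, $t$. The
Levin & Lin (LL) tests assume that $u_{it}$ are $iid(0, \sigma_u^2)$ and $\rho_i = \rho$ for all $i$. LL are
interested in testing the null hypothesis
$$H_0 : \rho = 1 \tag{2}$$
against the alternative hypothesis
$$H_a : \rho < 1.$$
Let $\hat\rho$ be the OLS estimator of $\rho$ in (1) and define
$$z_t = (z_{1t}, \ldots, z_{Nt})',$$
$$h(t, s) = z_t' \left( \sum_{t=1}^{T} z_t z_t' \right)^{-1} z_s,$$
$$\tilde u_{it} = u_{it} - \sum_{s=1}^{T} h(t, s)\, u_{is},$$
and
$$\tilde y_{it} = y_{it} - \sum_{s=1}^{T} h(t, s)\, y_{is}. \tag{3}$$
Then we have
$$\sqrt{N}\, T(\hat\rho - 1) = \frac{\dfrac{1}{\sqrt{N}} \sum_{i=1}^{N} \dfrac{1}{T} \sum_{t=1}^{T} \tilde y_{i,t-1} \tilde u_{it}}{\dfrac{1}{N} \sum_{i=1}^{N} \dfrac{1}{T^2} \sum_{t=1}^{T} \tilde y_{i,t-1}^2}$$
and the corresponding t-statistic, under the null hypothesis, is given by
$$t_\rho = \frac{(\hat\rho - 1) \sqrt{\sum_{i=1}^{N} \sum_{t=1}^{T} \tilde y_{i,t-1}^2}}{s_e}$$
where
$$s_e^2 = \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} \tilde u_{it}^2.$$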
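As an illustration of the pooled estimator and its t-statistic, the following minimal sketch treats the simplest case with no deterministic component ($z_{it} = 0$), so that no projection on $z_t$ is needed. The function name `pooled_df` and the toy panel are hypothetical. Two choices are assumptions rather than part of the text: OLS residuals stand in for the unobserved errors in $s_e^2$, and the divisor is the number of usable observations.

```python
import math

def pooled_df(panel):
    """Pooled Dickey-Fuller regression without deterministic terms (z_it = 0).

    panel: list of N series, each a list [y_i0, y_i1, ..., y_iT].
    Returns (rho_hat, t_rho).
    """
    # Stack the (y_{i,t-1}, y_it) pairs across all cross-section units.
    pairs = [(y[t - 1], y[t]) for y in panel for t in range(1, len(y))]
    sum_lag_sq = sum(lag * lag for lag, _ in pairs)
    rho_hat = sum(lag * cur for lag, cur in pairs) / sum_lag_sq
    # Assumption: OLS residuals replace the unobserved errors in s_e^2,
    # and the divisor is the number of usable observations (N*T).
    s2 = sum((cur - rho_hat * lag) ** 2 for lag, cur in pairs) / len(pairs)
    t_rho = (rho_hat - 1.0) * math.sqrt(sum_lag_sq) / math.sqrt(s2)
    return rho_hat, t_rho

# Toy panel with N = 2 units and T = 2 usable observations each.
rho_hat, t_rho = pooled_df([[1.0, 2.0, 4.0], [1.0, 3.0, 9.0]])
print(rho_hat, t_rho)
```

With fixed effects or trends, the series would first be projected on the deterministic terms as in (3) before the same ratio is formed.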
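Assume that there exists a scaling matrix $D_T$ and piecewise continuous
function $Z(r)$ such that
$$D_T^{-1} z_{[Tr]} \to Z(r)$$
uniformly for $r \in [0, 1]$. For a fixed $N$, we have
$$\frac{1}{\sqrt{N}} \sum_{i=1}^{N} \frac{1}{T} \sum_{t=1}^{T} \tilde y_{i,t-1} \tilde u_{it} \Rightarrow \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \int W_{iZ}\, dW_{iZ}$$
and
$$\frac{1}{N} \sum_{i=1}^{N} \frac{1}{T^2} \sum_{t=1}^{T} \tilde y_{i,t-1}^2 \Rightarrow \frac{1}{N} \sum_{i=1}^{N} \int W_{iZ}^2,$$
as $T \to \infty$. Next we assume that $\int W_{iZ}\, dW_{iZ}$ and $\int W_{iZ}^2$ are independent across $i$
and have finite second moments. Then it follows that
$$\frac{1}{N} \sum_{i=1}^{N} \int W_{iZ}^2 \xrightarrow{p} E\left[\int W_{iZ}^2\right]$$
and
$$\frac{1}{\sqrt{N}} \sum_{i=1}^{N} \left( \int W_{iZ}\, dW_{iZ} - E\left[\int W_{iZ}\, dW_{iZ}\right] \right) \Rightarrow N\left(0, Var\left[\int W_{iZ}\, dW_{iZ}\right]\right)$$
as $N \to \infty$ by a law of large numbers and the Lindeberg-Levy central limit
theorem. The following moments are taken from Levin & Lin (1992):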
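$$
\begin{array}{c|cccc}
z_{it} & E\left[\int W_{iZ}\, dW_{iZ}\right] & Var\left[\int W_{iZ}\, dW_{iZ}\right] & E\left[\int W_{iZ}^2\right] & Var\left[\int W_{iZ}^2\right] \\
\hline
0 & 0 & 1/2 & 1/2 & 1/3 \\
1 & 0 & 1/3 & 1/2 & ? \\
\mu_i & -1/2 & 1/12 & 1/6 & 1/45 \\
(\mu_i, t) & -1/2 & 1/60 & 1/15 & 11/6300
\end{array}
\tag{4}
$$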
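Using (4), Levin & Lin (1992) obtain the following limiting distributions of
$\sqrt{N}\, T(\hat\rho - 1)$ and $t_\rho$:
$$
\begin{array}{c|ll}
z_{it} & \hat\rho & t_\rho \\
\hline
0 & \sqrt{N}\, T(\hat\rho - 1) \Rightarrow N(0, 2) & t_\rho \Rightarrow N(0, 1) \\
1 & \sqrt{N}\, T(\hat\rho - 1) \Rightarrow N(0, 2) & t_\rho \Rightarrow N(0, 1) \\
\mu_i & \sqrt{N}\, T(\hat\rho - 1) + 3\sqrt{N} \Rightarrow N\left(0, \frac{51}{5}\right) & \sqrt{1.25}\, t_\rho + \sqrt{1.875 N} \Rightarrow N(0, 1) \\
(\mu_i, t) & \sqrt{N}\left(T(\hat\rho - 1) + 7.5\right) \Rightarrow N\left(0, \frac{2895}{112}\right) & \sqrt{\frac{448}{277}}\left(t_\rho + \sqrt{3.75 N}\right) \Rightarrow N(0, 1)
\end{array}
\tag{5}
$$
Sequential limit theory, i.e. $T \to \infty$ followed by $N \to \infty$, is used to derive the
limiting distributions in (5). In case $u_{it}$ is stationary, the asymptotic distributions
of $\hat\rho$ and $t_\rho$ need to be modified due to the presence of serial correlation.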
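The centering terms in (5) can be checked against the moments in (4) with a few lines of exact arithmetic. The sketch below is an illustration only, not part of the Levin & Lin derivation. The dictionary keys are informal labels introduced here, and the numerator mean is taken to be $-1/2$ in the rows with fixed effects. The check covers the ratio of means and, for $z_{it} = 0$, the limiting variance. The variances in the rows with fixed effects and trends depend on the joint behaviour of numerator and denominator and are not reproduced.

```python
from fractions import Fraction as F

# Moments of the numerator and denominator functionals as listed in (4).
# The keys are informal labels for the deterministic specification z_it.
moments = {
    "none":      {"E_num": F(0),     "V_num": F(1, 2),  "E_den": F(1, 2)},
    "intercept": {"E_num": F(-1, 2), "V_num": F(1, 12), "E_den": F(1, 6)},
    "trend":     {"E_num": F(-1, 2), "V_num": F(1, 60), "E_den": F(1, 15)},
}

def centering(m):
    """Ratio of the means: the value around which T*(rho_hat - 1) settles."""
    return m["E_num"] / m["E_den"]

for name, m in moments.items():
    print(name, centering(m))

# With a mean-zero numerator (z_it = 0), replacing the denominator by its
# mean gives the limiting variance Var[num] / E[den]^2.
var_none = moments["none"]["V_num"] / moments["none"]["E_den"] ** 2
print(var_none)
```

The ratios $-3$ and $-7.5$ are the negatives of the corrections $3\sqrt{N}$ and $7.5\sqrt{N}$ added in (5), and the variance of 2 matches the first row.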
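Harris & Tzavalis (1999) also derived unit root tests for (1) with
$z_{it} = \{0\}$, $\{\mu_i\}$, or $\{(\mu_i, t)\}$ when the time dimension of the panel, $T$, is fixed.
This is the typical case for micro panel studies. The main results are:
$$
\begin{array}{c|l}
z_{it} & \text{limiting distribution of } \hat\rho \\
\hline
0 & \sqrt{N}(\hat\rho - 1) \Rightarrow N\left(0, \dfrac{2}{T(T - 1)}\right) \\
\mu_i & \sqrt{N}\left(\hat\rho - 1 + \dfrac{3}{T + 1}\right) \Rightarrow N\left(0, \dfrac{3(17T^2 - 20T + 17)}{5(T - 1)(T + 1)^3}\right) \\
(\mu_i, t) & \sqrt{N}\left(\hat\rho - 1 + \dfrac{15}{2(T + 2)}\right) \Rightarrow N\left(0, \dfrac{15(193T^2 - 728T + 1147)}{112(T + 2)^3(T - 2)}\right)
\end{array}
$$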
Harris & Tzavalis (1999) also showed that the assumption that T tends to
infinity at a faster rate than N as in LL rather than T fixed as in the case in micro
panels, yields tests which are substantially undersized and have low power
especially when T is small.
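The fixed-T results of Harris & Tzavalis can be turned into a standardized statistic directly. The sketch below is a plain transcription of the three formulas quoted above. The function name and the `spec` labels are hypothetical, and the way T is counted (with or without the initial observation) is an assumption that should be checked against the original paper before use.

```python
import math

def harris_tzavalis_z(rho_hat, n, t, spec="none"):
    """Standardize a pooled rho_hat with the fixed-T mean and variance.

    spec: "none" (z_it = 0), "fe" (fixed effects),
          "trend" (fixed effects plus time trend; needs t > 2).
    """
    if spec == "none":
        shift = 0.0
        var = 2.0 / (t * (t - 1))
    elif spec == "fe":
        shift = 3.0 / (t + 1)
        var = 3.0 * (17 * t**2 - 20 * t + 17) / (5.0 * (t - 1) * (t + 1) ** 3)
    elif spec == "trend":
        shift = 15.0 / (2.0 * (t + 2))
        var = 15.0 * (193 * t**2 - 728 * t + 1147) / (112.0 * (t + 2) ** 3 * (t - 2))
    else:
        raise ValueError("unknown spec")
    # Under the null the returned value is approximately N(0, 1) for large N.
    return math.sqrt(n) * (rho_hat - 1.0 + shift) / math.sqrt(var)

# Hypothetical example: N = 25 units, T = 5, fixed effects, rho_hat = 1.
print(harris_tzavalis_z(1.0, 25, 5, "fe"))
```

Since the alternative is $\rho < 1$, large negative values of the standardized statistic count as evidence against the unit root null.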
Recently, Frankel & Rose (1996), Oh (1996), and Lothian (1996) tested the
PPP hypothesis using panel data. All of these articles use LL tests and some of
them report evidence supporting the PPP hypothesis. O'Connell (1998),
however, showed that the LL tests suffered from significant size distortion in
the presence of correlation among contemporaneous cross-sectional error
terms. O'Connell highlighted the importance of controlling for cross-sectional
dependence when testing for a unit root in panels of real exchange rates. He
showed that, controlling for cross-sectional dependence, no evidence against
the null of a random walk can be found in panels of up to 64 real exchange
rates.
Virtually all the existing nonstationary panel literature assumes cross-sectional independence. It is true that the assumption of independence across i
is rather strong, but it is needed in order to satisfy the requirement of the
Lindeberg-Levy central limit theorem. Moreover, as pointed out by Quah
(1994), modeling cross-sectional dependence is involved because individual
observations in a cross-section have no natural ordering. Driscoll & Kraay
(1998) presented a simple extension of common nonparametric covariance
matrix estimation techniques which yields standard errors that are robust to
very general forms of spatial and temporal dependence as the time dimension
becomes large. In a recent paper, Conley (1999) presented a spatial model of
dependence among agents using a metric of economic distance that provides
cross-sectional data with a structure similar to time-series data. Conley
proposed a generalized method of moments (GMM) using such dependent data
and a class of nonparametric covariance matrix estimators that allow for a
general form of dependence characterized by economic distance.
B. Im, Pesaran & Shin (1997) Tests
The LL test is restrictive in the sense that it requires $\rho$ to be homogeneous
across $i$. As Maddala (1999) pointed out, the null may be fine for testing
convergence in growth among countries, but the alternative restricts every
country to converge at the same rate. Im, Pesaran & Shin (1997) (IPS) allow for
a heterogeneous coefficient of $y_{i,t-1}$ and proposed an alternative testing
procedure based on averaging individual unit root test statistics. IPS suggested
an average of the augmented DF (ADF) tests when $u_{it}$ is serially correlated with
different serial correlation properties across cross-sectional units, i.e.
$u_{it} = \sum_{j=1}^{p_i} \varphi_{ij} u_{i,t-j} + \varepsilon_{it}$. Substituting this $u_{it}$ in (1) we get:
$$y_{it} = \rho_i y_{i,t-1} + \sum_{j=1}^{p_i} \varphi_{ij} \Delta y_{i,t-j} + z_{it}'\gamma_i + \varepsilon_{it}. \tag{6}$$
The null hypothesis is
$$H_0 : \rho_i = 1$$
for all $i$ and the alternative hypothesis is
$$H_a : \rho_i < 1$$
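for at least one $i$. The IPS t-bar statistic is defined as the average of the
individual ADF statistics as
$$\bar t = \frac{1}{N} \sum_{i=1}^{N} t_{\rho_i}, \tag{7}$$
where $t_{\rho_i}$ is the individual t-statistic for testing $H_0 : \rho_i = 1$ in (6). It is known that
for a fixed $N$
$$t_{\rho_i} \Rightarrow \frac{\int W_{iZ}\, dW_{iZ}}{\left[\int W_{iZ}^2\right]^{1/2}} = t_{iT} \tag{8}$$
as $T \to \infty$. IPS assume that $t_{iT}$ are iid and have finite mean and variance. Then
$$\frac{\sqrt{N}\left(\dfrac{1}{N} \sum_{i=1}^{N} t_{iT} - E[t_{iT} \mid \rho_i = 1]\right)}{\sqrt{Var[t_{iT} \mid \rho_i = 1]}} \Rightarrow N(0, 1) \tag{9}$$
as $N \to \infty$ by the Lindeberg-Levy central limit theorem. Hence
$$t_{IPS} = \frac{\sqrt{N}\left(\bar t - E[t_{iT} \mid \rho_i = 1]\right)}{\sqrt{Var[t_{iT} \mid \rho_i = 1]}} \Rightarrow N(0, 1) \tag{10}$$
as $T \to \infty$ followed by $N \to \infty$ sequentially. The values of $E[t_{iT} \mid \rho_i = 1]$ and
$Var[t_{iT} \mid \rho_i = 1]$ have been computed by IPS via simulations for different values
of $T$ and $p_i$'s.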
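The standardization in (10) is simple once the individual ADF t-statistics are in hand. The sketch below shows only that last step. The function name is hypothetical, and the numbers passed in are placeholders chosen for easy arithmetic, not values from the IPS tables. In an application `mean_null` and `var_null` would be looked up from those tables for the relevant $T$ and lag orders.

```python
import math

def ips_statistic(t_stats, mean_null, var_null):
    """Standardized t-bar: sqrt(N) * (t_bar - E[t_iT]) / sqrt(Var[t_iT])."""
    n = len(t_stats)
    t_bar = sum(t_stats) / n  # equation (7)
    return math.sqrt(n) * (t_bar - mean_null) / math.sqrt(var_null)

# Hypothetical inputs: four individual ADF t-statistics and placeholder
# moments (NOT the tabulated IPS values).
t_ips = ips_statistic([-2.0, -1.0, -3.0, -2.0], mean_null=-1.5, var_null=1.0)
print(t_ips)
```

Because the average is taken over units, each $t_{\rho_i}$ may come from an ADF regression with its own lag order $p_i$, which is the heterogeneity the IPS procedure is designed to allow.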
In this volume, Breitung (2000) studies the local power of LL and IPS tests
statistics against a sequence of local alternatives. Breitung finds that the LL and
IPS tests suffer from a dramatic loss of power if individual specific trends are
included. This is due to the bias correction that also removes the mean under
the sequence of local alternatives. The simulation results indicate that the
power of LL and IPS tests is very sensitive to the specification of the
deterministic terms.
McCoskey & Selden (1998) applied the IPS test for unit roots in per
capita national health care expenditures (HE) and gross domestic product
(GDP) for a panel of OECD countries. McCoskey & Selden rejected the null
hypothesis that these two series contain unit roots. Gerdtham & Löthgren
(1998) claimed that the stationarity found by McCoskey & Selden is driven by
the omission of time trends in their ADF regression in (6). Using the IPS test
with a time trend, Gerdtham & Löthgren found that both HE and GDP are
nonstationary. They concluded that HE and GDP are cointegrated around linear
trends following the results of McCoskey & Kao (1999b).
C. Combining P-Values Tests
Let $G_{iT_i}$ be a unit root test statistic for the $i$-th group in (1) and assume that as
$T_i \to \infty$, $G_{iT_i} \Rightarrow G_i$. Let $p_i$ be the p-value of a unit root test for cross-section $i$, i.e.
$p_i = F(G_{iT_i})$, where $F(\cdot)$ is the distribution function of the random variable $G_i$.
Maddala & Wu (1999) and Choi (1999a) proposed a Fisher type test
$$P = -2 \sum_{i=1}^{N} \ln p_i \tag{11}$$
which combines the p-values from unit root tests for each cross-section $i$ to test
for unit root in panel data. $P$ is distributed as $\chi^2$ with $2N$ degrees of freedom as
$T_i \to \infty$ for all $N$. Maddala et al. (1999) argued that the IPS and Fisher tests relax
the restrictive assumption of the LL test that $\rho_i$ is the same under the alternative.
Both the IPS and Fisher tests combine information based on individual unit
root tests. However, the Fisher test has the advantage over the IPS test in that
it does not require a balanced panel. Also, the Fisher test can use different lag
lengths in the individual ADF regressions and can be applied to any other unit
root tests. The disadvantage is that the p-values have to be derived by Monte
Carlo simulations. Choi (1999a) echoes similar advantages of the Fisher test:
(1) the cross-sectional dimension, N, can be either finite or infinite, (2) each
group can have different types of nonstochastic and stochastic components, (3)
the time series dimension, T, can be different for each i, and (4) the alternative
hypothesis would allow some groups to have unit roots while others may not.
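The Fisher statistic in (11) can be computed from the individual p-values alone, which is why it accommodates unbalanced panels and different tests per unit. The following minimal sketch uses a hypothetical function name and assumes the individual p-values are already available and independent. It relies on the closed form of the chi-square upper tail for even degrees of freedom, so no statistical library is needed.

```python
import math

def fisher_panel_test(p_values):
    """Return (P, panel p-value) for P = -2 * sum(ln p_i) ~ chi-square(2N)."""
    n = len(p_values)
    stat = -2.0 * sum(math.log(p) for p in p_values)
    # For even degrees of freedom 2N the chi-square upper tail is
    # exp(-x/2) * sum_{k=0}^{N-1} (x/2)^k / k!.
    half = stat / 2.0
    term, tail = 1.0, 1.0
    for k in range(1, n):
        term *= half / k
        tail += term
    return stat, math.exp(-half) * tail

# Hypothetical p-values from unit root tests on three cross-section units.
stat, p_panel = fisher_panel_test([0.04, 0.20, 0.11])
print(stat, p_panel)
```

Small individual p-values make $P$ large, so the panel null of a unit root in every series is rejected in the upper tail.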
When N is large, Choi (1999a) proposed a Z test,

$$Z = \frac{1}{2\sqrt{N}} \sum_{i=1}^{N} \left(-2 \ln p_i - 2\right) \tag{12}$$

since $E[-2 \ln p_i] = 2$ and $\mathrm{Var}[-2 \ln p_i] = 4$. Assume that the $p_i$'s are iid and
use the Lindeberg-Levy central limit theorem to get

$$Z \Rightarrow N(0, 1)$$

as $T_i \to \infty$ followed by $N \to \infty$.³
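The standardization behind the Z test in (12) can be sketched in a few lines of Python. The function name `choi_z_test` is hypothetical, and the use of a right-tail normal probability is an assumption made here: small individual p-values push each term $-2 \ln p_i$ above its mean of 2, so large positive Z is read as evidence against the null.

```python
import math


def choi_z_test(p_values):
    """Return (Z, right_tail_p) for Z = sum(-2 ln p_i - 2) / (2 sqrt(N))."""
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    n = len(p_values)
    # Each term -2 ln p_i has mean 2 and variance 4 under the null,
    # so centring by 2 and scaling by 2 sqrt(N) gives unit variance.
    z = sum(-2.0 * math.log(p) - 2.0 for p in p_values) / (2.0 * math.sqrt(n))
    # Upper-tail probability of a standard normal variate.
    right_tail = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, right_tail


if __name__ == "__main__":
    # Illustrative p-values only.
    print(choi_z_test([0.04, 0.20, 0.11, 0.60]))
```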
Choi (1999a) applied the Z test in (12) and the IPS test in (7) to panel data
of real exchange rates and provided evidence in favor of the PPP hypothesis.
Choi claimed that this is due to the improved finite sample power of the Fisher
test. Maddala & Wu (1999) and Maddala et al. (1999) find that the Fisher test
is superior to the IPS test, but they argue that these panel unit root tests still do
not rescue the PPP hypothesis. When allowance is made for the deficiency in
the panel data unit root tests and panel estimation methods, support for PPP
turns out to be weak.
D. Residual Based LM Test
Hadri (1999) proposed a residual based Lagrange Multiplier (LM) test for the
null that the time series for each i are stationary around a deterministic trend
against the alternative of a unit root in panel data. Consider the following
model
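$$y_{it} = z_{it}'\gamma + r_{it} + \varepsilon_{it} \tag{13}$$

where $z_{it}$ is the deterministic component, $r_{it}$ is a random walk

$$r_{it} = r_{i,t-1} + u_{it},$$

$u_{it} \sim iid(0, \sigma_u^2)$ and $\varepsilon_{it}$ is a stationary process. (13) can be written as

$$y_{it} = z_{it}'\gamma + e_{it} \tag{14}$$

where

$$e_{it} = \sum_{j=1}^{t} u_{ij} + \varepsilon_{it}.$$

Let $\hat e_{it}$ be the residuals from the regression in (14) and $\hat\sigma_e^2$ be the estimate of the
error variance. Also, let $S_{it}$ be the partial sum process of the residuals,

$$S_{it} = \sum_{j=1}^{t} \hat e_{ij}.$$

Then the LM statistic is

$$LM = \frac{\frac{1}{N}\sum_{i=1}^{N} \frac{1}{T^2} \sum_{t=1}^{T} S_{it}^2}{\hat\sigma_e^2}.$$

It can be shown that

$$LM \xrightarrow{p} E\left[\int W_{iZ}^2\right]$$

as $T \to \infty$ followed by $N \to \infty$ provided $E\left[\int W_{iZ}^2\right] < \infty$. Also,

$$\frac{\sqrt{N}\left(LM - E\left[\int W_{iZ}^2\right]\right)}{\sqrt{\mathrm{Var}\left[\int W_{iZ}^2\right]}} \Rightarrow N(0, 1)$$

as $T \to \infty$ followed by $N \to \infty$.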
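The computation of the LM statistic can be sketched as follows for the simplest case, in which the deterministic component is an individual intercept only, so that the residuals are the demeaned series. This is an illustration under stated assumptions, not Hadri's own implementation: the function names are hypothetical, the error variance is estimated by the pooled mean of squared residuals (which ignores serial correlation), and the default centring and scaling constants 1/6 and 1/45 are the mean and variance of the integrated squared demeaned Brownian functional assumed here for the intercept-only case.

```python
import math


def hadri_lm(panel):
    """LM statistic for a balanced panel with individual intercepts only.

    panel: list of N series, each a list of T observations.
    """
    n = len(panel)
    t_len = len(panel[0])
    sum_sq_resid = 0.0
    sum_scaled_partial = 0.0
    for series in panel:
        if len(series) != t_len:
            raise ValueError("this sketch assumes a balanced panel")
        mean = sum(series) / t_len
        resid = [y - mean for y in series]
        sum_sq_resid += sum(e * e for e in resid)
        partial, acc = 0.0, 0.0
        for e in resid:
            partial += e             # S_it, the partial sum of residuals
            acc += partial * partial
        sum_scaled_partial += acc / t_len ** 2
    sigma2_e = sum_sq_resid / (n * t_len)  # assumed variance estimator
    return (sum_scaled_partial / n) / sigma2_e


def standardize_lm(lm, n, mean=1.0 / 6.0, variance=1.0 / 45.0):
    """sqrt(N) * (LM - mean) / sqrt(variance); the moments are assumptions."""
    return math.sqrt(n) * (lm - mean) / math.sqrt(variance)
```

Large positive standardized values count against the stationarity null, because a unit root component makes the partial sums grow faster than $T$.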
E. Finite Sample Properties of Unit Root Tests
Extensive simulations have been conducted to explore the finite sample
performance of panel unit root tests, e.g. Karlsson & Löthgren (1999), Im et
al. (1997), Maddala & Wu (1999), and Choi (1999a). Choi (1999a) studied the
small sample properties of the IPS t-bar test in (7) and Fisher's test in (11). Choi's
major findings are the following:
(1) The empirical sizes of the IPS and the Fisher tests are reasonably close to
their nominal size 0.05 when N is small. But the Fisher test shows mild size
distortions at N = 100, which is expected from the asymptotic theory.
Overall, the IPS t-bar test has the most stable size.
(2) In terms of the size-adjusted power, the Fisher test seems to be superior to
the IPS t-bar test.
(3) When a linear time trend is included in the model, the power of all tests
decreases considerably.
III. SPURIOUS REGRESSION IN PANEL DATA

Entorf (1997) studied spurious fixed effects regressions when the true model
involves independent random walks with and without drifts. Entorf found that
for $T \to \infty$ and N finite, the nonsense regression phenomenon holds for spurious
fixed effects models and inference based on t-values can be highly misleading.
Kao (1999) and Phillips & Moon (1999a) derived the asymptotic distributions
of the least squares dummy variable estimator and various conventional
statistics from the spurious regression in panel data.
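Consider a spurious regression model for all i using panel data:

$$y_{it} = x_{it}'\beta + z_{it}'\gamma + e_{it}, \tag{15}$$

where

$$x_{it} = x_{i,t-1} + \varepsilon_{it},$$

and $e_{it}$ is I(1). The OLS estimator of $\beta$ is

$$\hat\beta = \left[\sum_{i=1}^{N}\sum_{t=1}^{T} \tilde x_{it}\tilde x_{it}'\right]^{-1} \left[\sum_{i=1}^{N}\sum_{t=1}^{T} \tilde x_{it}\tilde y_{it}\right], \tag{16}$$

where $\tilde y_{it}$ is defined in (3) and

$$\tilde x_{it} = x_{it} - \sum_{s=1}^{T} h(t, s)\, x_{is}.$$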
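It is known that if a time-series regression for a given i is performed in model
(15), the OLS estimator of $\beta$ is spurious. It is easy to see that

$$\frac{1}{N}\sum_{i=1}^{N}\frac{1}{T^2}\sum_{t=1}^{T} \tilde x_{it}\tilde x_{it}' \xrightarrow{p} E\left[\int W_{iZ}W_{iZ}'\right]\Omega_\varepsilon$$

and

$$\frac{1}{N}\sum_{i=1}^{N}\frac{1}{T^2}\sum_{t=1}^{T} \tilde x_{it}\tilde y_{it} \xrightarrow{p} E\left[\int W_{iZ}W_{iZ}'\right]\Omega_{\varepsilon u}$$

by a sequential limit theory, where

$$\begin{array}{c|c}
z_{it} & E\left[\int W_{iZ}W_{iZ}'\right] \\ \hline
0 & \frac{1}{2} I_k \\
\{\mu_i\} & \frac{1}{6} I_k \\
\{\mu_i, t\} & \frac{1}{15} I_k
\end{array}$$

Then we have

$$\hat\beta \xrightarrow{p} \Omega_\varepsilon^{-1}\Omega_{\varepsilon u}. \tag{17}$$

(17) shows that the OLS estimator of $\beta$, $\hat\beta$, is consistent for its true value,
$\Omega_\varepsilon^{-1}\Omega_{\varepsilon u}$. This is due to the fact that the noise, $e_{it}$, is as strong as the signal, $x_{it}$,
since both $e_{it}$ and $x_{it}$ are I(1). In the panel regression (15) with a large number
of cross-sections, the strong noise of $e_{it}$ is attenuated by pooling the data and
a consistent estimate of $\beta$ can be extracted. The asymptotics of the OLS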
estimator are very different from those of the spurious regression in pure time
series. This has an important consequence for residual-based cointegration tests
in panel data, because the null distribution of residual-based cointegration tests
depends on the asymptotics of the OLS estimator. This point is explained
further in the next section.
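The attenuation of noise by pooling can be illustrated with a small simulation. The sketch below is an assumption-laden illustration rather than a replication of any cited experiment: it generates independent Gaussian random walks for $y_{it}$ and $x_{it}$ (so the long-run covariance between them is zero), removes individual means, and compares the pooled least squares slope with the slopes from the N separate time-series regressions. The function names, the sample sizes and the seed are arbitrary choices.

```python
import random


def random_walk(rng, length):
    """Driftless Gaussian random walk of the given length."""
    level, path = 0.0, []
    for _ in range(length):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path


def demean(series):
    mean = sum(series) / len(series)
    return [v - mean for v in series]


def simulate_spurious_panel(n_units=200, n_periods=50, seed=0):
    """Return (pooled_slope, unit_slopes) for independent I(1) panels."""
    rng = random.Random(seed)
    num_total, den_total = 0.0, 0.0
    unit_slopes = []
    for _ in range(n_units):
        x = demean(random_walk(rng, n_periods))
        y = demean(random_walk(rng, n_periods))
        num = sum(a * b for a, b in zip(x, y))
        den = sum(a * a for a in x)
        unit_slopes.append(num / den)  # spurious time-series slope
        num_total += num
        den_total += den
    return num_total / den_total, unit_slopes


if __name__ == "__main__":
    pooled, slopes = simulate_spurious_panel()
    print("pooled slope:", pooled)
    print("range of unit slopes:", min(slopes), max(slopes))
```

In runs of this kind the individual slopes are widely dispersed, as the time-series spurious regression theory predicts, while the pooled slope lies close to zero, the value implied by a zero long-run covariance.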
IV. PANEL COINTEGRATION TESTS

A. Kao Tests
Kao (1999) presented two types of cointegration tests in panel data, the DF and
ADF type tests. The DF type tests from Kao can be calculated from the
estimated residuals in (15) as:
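$$\hat e_{it} = \rho\,\hat e_{i,t-1} + v_{it}, \tag{18}$$

where

$$\hat e_{it} = \tilde y_{it} - \tilde x_{it}'\hat\beta.$$

In order to test the null hypothesis of no cointegration, the null can be written
as $H_0: \rho = 1$. The OLS estimate of $\rho$ and the t-statistic are given as:

$$\hat\rho = \frac{\sum_{i=1}^{N}\sum_{t=2}^{T} \hat e_{it}\hat e_{i,t-1}}{\sum_{i=1}^{N}\sum_{t=2}^{T} \hat e_{i,t-1}^2}$$

and

$$t_\rho = \frac{(\hat\rho - 1)\sqrt{\sum_{i=1}^{N}\sum_{t=2}^{T} \hat e_{i,t-1}^2}}{s_e},$$

where $s_e^2 = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=2}^{T} (\hat e_{it} - \hat\rho\,\hat e_{i,t-1})^2$. Kao proposed the following four DF type
tests by assuming $z_{it} = \{\mu_i\}$:

$$DF_\rho = \frac{\sqrt{N}T(\hat\rho - 1) + 3\sqrt{N}}{\sqrt{10.2}},$$

$$DF_t = \sqrt{1.25}\,t_\rho + \sqrt{1.875N},$$

$$DF_\rho^* = \frac{\sqrt{N}T(\hat\rho - 1) + \dfrac{3\sqrt{N}\hat\sigma_v^2}{\hat\sigma_{0v}^2}}{\sqrt{3 + \dfrac{36\hat\sigma_v^4}{5\hat\sigma_{0v}^4}}},$$

and

$$DF_t^* = \frac{t_\rho + \dfrac{\sqrt{6N}\hat\sigma_v}{2\hat\sigma_{0v}}}{\sqrt{\dfrac{\hat\sigma_{0v}^2}{2\hat\sigma_v^2} + \dfrac{3\hat\sigma_v^2}{10\hat\sigma_{0v}^2}}},$$

where $\hat\sigma_v^2 = \hat\Sigma_u - \hat\Sigma_{u\varepsilon}\hat\Sigma_\varepsilon^{-1}\hat\Sigma_{\varepsilon u}$ and $\hat\sigma_{0v}^2 = \hat\Omega_u - \hat\Omega_{u\varepsilon}\hat\Omega_\varepsilon^{-1}\hat\Omega_{\varepsilon u}$. While $DF_\rho$ and $DF_t$ are based
on the strong exogeneity of the regressors and errors, $DF_\rho^*$ and $DF_t^*$ are for the
cointegration with endogenous relationship between regressors and errors. For
the ADF test, we can run the following regression:

$$\hat e_{it} = \rho\,\hat e_{i,t-1} + \sum_{j=1}^{p} \vartheta_j \Delta\hat e_{i,t-j} + v_{itp}. \tag{19}$$

With the null hypothesis of no cointegration, the ADF test statistics can be
constructed as:

$$ADF = \frac{t_{ADF} + \dfrac{\sqrt{6N}\hat\sigma_v}{2\hat\sigma_{0v}}}{\sqrt{\dfrac{\hat\sigma_{0v}^2}{2\hat\sigma_v^2} + \dfrac{3\hat\sigma_v^2}{10\hat\sigma_{0v}^2}}}$$

where $t_{ADF}$ is the t-statistic of $\rho$ in (19). The asymptotic distributions of $DF_\rho$,
$DF_t$, $DF_\rho^*$, $DF_t^*$, and ADF converge to a standard normal distribution N(0, 1) by
the sequential limit theory.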
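The two DF type statistics that do not require long-run variance estimates can be computed directly from the estimated residuals. The following Python sketch assumes a balanced panel and takes the residuals $\hat e_{it}$ as given; the function name `kao_df_tests` and the dictionary keys are hypothetical. The starred statistics and the ADF version are omitted because they depend on estimates of $\hat\sigma_v^2$ and $\hat\sigma_{0v}^2$, whose construction is not spelled out in the text.

```python
import math


def kao_df_tests(residuals):
    """Compute rho_hat, t_rho, DF_rho and DF_t from panel residuals.

    residuals: list of N lists, each holding the T estimated residuals
    of one cross-section (balanced panel assumed).
    """
    n = len(residuals)
    t_len = len(residuals[0])
    if any(len(e) != t_len for e in residuals):
        raise ValueError("this sketch assumes a balanced panel")
    # Pooled regression of e_hat[t] on e_hat[t-1], t = 2, ..., T.
    num = sum(e[t] * e[t - 1] for e in residuals for t in range(1, t_len))
    den = sum(e[t - 1] ** 2 for e in residuals for t in range(1, t_len))
    rho_hat = num / den
    ssr = sum((e[t] - rho_hat * e[t - 1]) ** 2
              for e in residuals for t in range(1, t_len))
    s_e = math.sqrt(ssr / (n * t_len))
    t_rho = (rho_hat - 1.0) * math.sqrt(den) / s_e
    df_rho = (math.sqrt(n) * t_len * (rho_hat - 1.0)
              + 3.0 * math.sqrt(n)) / math.sqrt(10.2)
    df_t = math.sqrt(1.25) * t_rho + math.sqrt(1.875 * n)
    return {"rho_hat": rho_hat, "t_rho": t_rho,
            "DF_rho": df_rho, "DF_t": df_t}
```

Both statistics are compared with standard normal critical values, in line with the limiting distribution stated above.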
B. Residual Based LM Test
McCoskey & Kao (1998) derived a residual-based test for the null of
cointegration rather than the null of no cointegration in panels. This test is an
extension of the LM test and the locally best invariant (LBI) test for an MA unit
root in the time series literature, see Harris & Inder (1994) and Shin (1994).
Under the null, the asymptotics no longer depend on the asymptotic properties
of the estimated spurious regression; rather, the asymptotics of the estimation
of a cointegrated relationship are needed. For models which allow the
cointegrating vector to change across the cross-sectional observations, the
asymptotics depend merely on the time series results as each cross-section is
estimated independently. For models with common slopes, the estimation is
done jointly and therefore the asymptotic theory is based on the joint estimation
of a cointegrated relationship in panel data.
For the residual based test of the null of cointegration, it is necessary to use
an efficient estimation technique of cointegrated variables. In the time series
literature a variety of methods have been shown to be efficient asymptotically.
These include the fully modified (FM) estimator of Phillips & Hansen (1990)
and the dynamic least squares (DOLS) estimator as proposed by Saikkonen
(1991) and Stock & Watson (1993). For panel data, Kao & Chiang (2000)
showed that both the FM and DOLS methods can produce estimators which are
asymptotically normally distributed with zero means.
The model presented allows for varying slopes and intercepts:

$$y_{it} = \alpha_i + x_{it}'\beta_i + e_{it}, \tag{20}$$

$$x_{it} = x_{i,t-1} + \varepsilon_{it}, \tag{21}$$

$$e_{it} = \gamma_{it} + u_{it}, \tag{22}$$

and

$$\gamma_{it} = \gamma_{i,t-1} + \theta u_{it},$$

where $u_{it}$ are i.i.d. $(0, \sigma_u^2)$. The null hypothesis of cointegration is equivalent
to $\theta = 0$.
The test statistic proposed by McCoskey & Kao (1998) is defined as
follows:

$$LM = \frac{\frac{1}{N}\sum_{i=1}^{N}\frac{1}{T^2}\sum_{t=1}^{T} S_{it}^2}{\hat\sigma_e^2} \tag{23}$$

where $S_{it}$ is the partial sum process of the residuals,

$$S_{it} = \sum_{j=1}^{t} \hat e_{ij}$$

and $\hat\sigma_e^2$ is defined in McCoskey and Kao. The asymptotic result for the test is:

$$\sqrt{N}(LM - \mu_v) \Rightarrow N(0, \sigma_v^2). \tag{24}$$

The moments, $\mu_v$ and $\sigma_v^2$, can be found through Monte Carlo simulation. The
limiting distribution of LM is then free of nuisance parameters and robust to
heteroskedasticity.
Urban economists have long sought to explain the relationship between
urbanization levels and output. McCoskey & Kao (1999a) revisited this
question and tested the long run stability of a production function including
urbanization using nonstationary panel data techniques. McCoskey and Kao
applied the IPS test and LM in (23) and showed that a long run relationship
between urbanization, output per worker and capital per worker cannot be
rejected for the sample of thirty developing countries or the sample of twenty-two
developed countries over the period 1965-1989. They do find, however,
that the sign and magnitude of the impact of urbanization vary considerably
across the countries. These results offer new insights and potential for dynamic
urban models rather than the simple cross-section approach.
C. Pedroni Tests
Pedroni (1997a) also proposed several tests for the null hypothesis of no
cointegration in a panel data model that allows for considerable heterogeneity.
His tests can be classified into two categories. The first set is similar to the tests
discussed above, and involves averaging test statistics for cointegration in the
time series across cross-sections. The second set groups the statistics such that
instead of averaging across statistics, the averaging is done in pieces so that the
limiting distributions are based on limits of piecewise numerator and
denominator terms.
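The first set of statistics as discussed includes a form of the average of the
Phillips & Ouliaris (1990) statistic:

$$\tilde Z_\rho = \sum_{i=1}^{N} \frac{\sum_{t=1}^{T} \left(\hat e_{i,t-1}\Delta\hat e_{it} - \hat\lambda_i\right)}{\sum_{t=1}^{T} \hat e_{i,t-1}^2} \tag{25}$$

where $\hat e_{it}$ is estimated from (15), and $\hat\lambda_i = \frac{1}{2}(\hat\sigma_i^2 - \hat s_i^2)$, for which $\hat\sigma_i^2$ and $\hat s_i^2$ are
individual long-run and contemporaneous variances respectively of the residual
$\hat e_{it}$. For his second set of statistics, Pedroni defines four panel test statistics. Let
$\hat\Omega_i$ be a consistent estimate of $\Omega_i$, the long-run variance-covariance matrix.
Define $\hat L_i$ to be the lower triangular Cholesky decomposition of $\hat\Omega_i$ such that in the
scalar case $\hat L_{22i} = \hat\sigma_\varepsilon$ and $\hat L_{11i} = \hat\sigma_u^2 - \hat\sigma_{u\varepsilon}^2/\hat\sigma_\varepsilon^2$ is the long-run conditional variance. In
this survey we consider only one of these statistics:

$$Z_{\hat t_{\hat\rho NT}} = \frac{\sum_{i=1}^{N}\sum_{t=2}^{T} \hat L_{11i}^{-2}\left(\hat e_{i,t-1}\Delta\hat e_{it} - \hat\lambda_i\right)}{\sqrt{\tilde\sigma_{NT}^2 \left(\sum_{i=1}^{N}\sum_{t=2}^{T} \hat L_{11i}^{-2}\hat e_{i,t-1}^2\right)}} \tag{26}$$

where

$$\tilde\sigma_{NT}^2 = \frac{1}{N}\sum_{i=1}^{N} \frac{\hat\sigma_i^2}{\hat L_{11i}^2}.$$

It should be noted that Pedroni bases his test on the average of the numerator
and denominator terms respectively, rather than the average for the statistics as
a whole. Using results on convergence of functionals of Brownian motion,
Pedroni finds the following result:

$$Z_{\hat t_{\hat\rho NT}} + 1.73\sqrt{N} \Rightarrow N(0, 0.93).$$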
Note that this distribution applies to the model including an intercept and not
including a time trend. Asymptotic results for other model specifications can be
found in Pedroni (1997a). The convergence in distribution is based on
individual convergence of the numerator and denominator terms. What is the
intuition of rejection of the null hypothesis? Using the average of the overall
test statistic allows more ease in interpretation: rejection of the null hypothesis
means that enough of the individual cross-sections have statistics far away
from the means predicted by theory were they to be generated under the null.
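To make the averaging across cross-sections concrete, the group statistic in (25) can be sketched as below. This is an illustration under assumptions, not Pedroni's code: the function name `pedroni_group_rho` is hypothetical, the correction terms $\hat\lambda_i$ are supplied by the caller because the text does not specify how the long-run variances are estimated, and the time sums start at the second observation since a lagged residual is needed.

```python
def pedroni_group_rho(residuals, lambdas):
    """Sum over i of sum_t(e[t-1] * de[t] - lambda_i) / sum_t(e[t-1]^2).

    residuals: list of N residual series (lengths may differ across i).
    lambdas:   list of N correction terms, one per cross-section.
    """
    if len(residuals) != len(lambdas):
        raise ValueError("one correction term is needed per cross-section")
    total = 0.0
    for e, lam in zip(residuals, lambdas):
        # de[t] = e[t] - e[t-1]; each term is corrected by lambda_i.
        num = sum(e[t - 1] * (e[t] - e[t - 1]) - lam
                  for t in range(1, len(e)))
        den = sum(e[t - 1] ** 2 for t in range(1, len(e)))
        total += num / den
    return total
```

Each cross-section contributes its own ratio, so a few strongly mean-reverting units can move the sum well away from the value expected under the null, which matches the interpretation given above.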
Pedroni (1999) derived asymptotic distributions and critical values for
several residual based tests of the null of no cointegration in panels where there
are multiple regressors. The model includes regressions with individual specific
fixed effects and time trends. Considerable heterogeneity is allowed across
individual members of the panel with regards to the associated cointegrating
vectors and the dynamics of the underlying error process. Pedroni (1997b)
showed that for tests of the null of no cointegration, the appropriate weighting
matrix of a GLS based estimator must be constructed using the long run
conditional covariance matrix between individual members of the panel in
order to eliminate nuisance parameters associated with member specific
dynamics. Pedroni (1997b) found that the violation of cross-sectional
independence does not appear to play a significant role for the conclusions in
favor of weak long run PPP provided that one also includes common time
dummies in the regression. Pedroni (2000) also demonstrated how it is possible
to construct a test that can be employed to test whether or not members of a
panel with heterogeneous short run dynamics converge to a single common
steady state.
D. Likelihood-Based Cointegration Test
Larsson, Lyhagen & Löthgren (1998) presented a likelihood-based (LR) panel
test of cointegrating rank in heterogeneous panel models based on the average
of the individual rank trace statistics developed by Johansen (1995). The
proposed LR-bar statistic is very similar to the IPS t-bar statistic in (7) through
(10). In Monte Carlo simulation, Larsson et al. investigated the small sample
properties of the standardized LR-bar statistic. They found that the proposed
test requires a large time series dimension. Even if the panel has a large cross-sectional
dimension, the size of the test will be severely distorted when the time series dimension is small.
Groen & Kleibergen (1999) proposed a likelihood-based framework for
cointegrating analysis in panels of a fixed number of vector error correction
models. Maximum likelihood estimators of the cointegrating vectors are
constructed using iterated generalized method of moments (GMM) estimators.
Using these estimators Groen and Kleibergen construct likelihood ratio
statistics, LR(B|A), to test for a common cointegration rank across the
individual vector error correction models, both with heterogeneous and
homogeneous cointegrating vectors. Interestingly, the limiting distribution of
LR(B|A) is invariant to the covariance matrix of the error terms which
implies that LR(B|A) is robust with respect to the choices of covariance
matrix. Let us define $LR_s(r \mid k)$ as the summation of the N individual trace
statistics

$$LR_s(r \mid k) = \sum_{i=1}^{N} LR_i(r \mid k) \tag{27}$$

where $LR_i(r \mid k)$ is Johansen's likelihood ratio statistic for the i-th cross-section, so that

$$LR_i(r \mid k) \Rightarrow \mathrm{tr}\left\{\int dB_{k-r,i}B_{k-r,i}' \left[\int B_{k-r,i}B_{k-r,i}'\right]^{-1} \int B_{k-r,i}\,dB_{k-r,i}'\right\}$$

as $T \to \infty$. Now for a fixed N, it is clear that

$$LR_s(r \mid k) = \sum_{i=1}^{N} LR_i(r \mid k) \Rightarrow \sum_{i=1}^{N} \mathrm{tr}\left\{\int dB_{k-r,i}B_{k-r,i}' \left[\int B_{k-r,i}B_{k-r,i}'\right]^{-1} \int B_{k-r,i}\,dB_{k-r,i}'\right\} \tag{28}$$

as $T \to \infty$ by a continuous mapping theorem. It follows that $LR_s(r \mid k)$ is
asymptotically equivalent to LR(B | A) when N is fixed and T is large. This
means that nothing is lost by assuming that the covariance matrix has zero non-diagonal
covariances as far as the asymptotics are concerned for the proposed
test statistics in this chapter. More importantly, the tests based on the cross-independence
like (27) will perform just as well (asymptotically) as the tests
based on the cross-dependence such as LR(B | A). Groen and Kleibergen
verified that the likelihood-based cointegration tests proposed by Larsson et al.
in (27) are robust with respect to the cross-dependence in panel data. The
(asymptotic) equivalence of $LR_s(r \mid k)$ and LR(B | A) found in Groen and
Kleibergen has profound implications for econometricians and applied economists, e.g. there exist tests/estimators based on the cross-independence
which are equivalent to tests/estimators based on the cross-dependence in
nonstationary panel time series. Define $\overline{LR}(r \mid k)$ to be the average of $LR_i(r \mid k)$:

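$$\overline{LR}(r \mid k) = \frac{1}{N} LR_s(r \mid k) = \frac{1}{N}\sum_{i=1}^{N} LR_i(r \mid k).$$

It can be shown that

$$\frac{\overline{LR}(r \mid k) - E[\overline{LR}(r \mid k)]}{\sqrt{\mathrm{Var}[\overline{LR}(r \mid k)]}} \Rightarrow N(0, 1)$$

as $T \to \infty$ followed by $N \to \infty$ by a continuous mapping theorem and a central
limit theorem provided $E[\overline{LR}(r \mid k)]$ and $\mathrm{Var}[\overline{LR}(r \mid k)]$ are bounded. Define

$$\overline{LR}(B \mid A) = \frac{1}{N} LR(B \mid A). \tag{29}$$

For a fixed N, it is easy to show that

$$\overline{LR}(B \mid A) = \frac{1}{N} LR(B \mid A) \Rightarrow \frac{1}{N}\sum_{i=1}^{N} \mathrm{tr}\left\{\int dB_{k-r,i}B_{k-r,i}' \left[\int B_{k-r,i}B_{k-r,i}'\right]^{-1} \int B_{k-r,i}\,dB_{k-r,i}'\right\} = \frac{1}{N}\sum_{i=1}^{N} Z_{ki}$$

as $T \to \infty$, where

$$Z_{ki} = \mathrm{tr}\left\{\int dB_{k-r,i}B_{k-r,i}' \left[\int B_{k-r,i}B_{k-r,i}'\right]^{-1} \int B_{k-r,i}\,dB_{k-r,i}'\right\}.$$

Then

$$\frac{\frac{1}{N}\sum_{i=1}^{N} Z_{ki} - E\left[\frac{1}{N}\sum_{i=1}^{N} Z_{ki}\right]}{\sqrt{\mathrm{Var}\left[\frac{1}{N}\sum_{i=1}^{N} Z_{ki}\right]}} \Rightarrow N(0, 1)$$

as $N \to \infty$ since $B_{k-r,i}$ and $B_{k-r,j}$ are independent for $i \neq j$. It implies that

$$\frac{\overline{LR}(B \mid A) - E[\overline{LR}(B \mid A)]}{\sqrt{\mathrm{Var}[\overline{LR}(B \mid A)]}} \Rightarrow N(0, 1)$$

as $T \to \infty$ followed by $N \to \infty$. The above discussion indicates that $\overline{LR}(r \mid k)$ and
$\overline{LR}(B \mid A)$ are also equivalent when T and N are large.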
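The standardization of the average trace statistic can be sketched as follows. The individual trace statistics are assumed to come from separate Johansen procedures, and the moments of the limiting variable $Z_k$ are assumed to be supplied by the user (in practice they are tabulated by simulation); the function name `standardized_lr_bar` and both moment arguments are hypothetical. With independent cross-sections the variance of the average is the variance of $Z_k$ divided by N, which gives the $\sqrt{N}$ scaling used below.

```python
import math


def standardized_lr_bar(trace_stats, mean_z, var_z):
    """Return (LR_bar - E[Z_k]) / sqrt(Var[Z_k] / N).

    trace_stats: the N individual trace statistics LR_i(r | k).
    mean_z, var_z: assumed asymptotic mean and variance of Z_k.
    """
    n = len(trace_stats)
    if n == 0 or var_z <= 0.0:
        raise ValueError("need at least one statistic and a positive variance")
    lr_bar = sum(trace_stats) / n
    return math.sqrt(n) * (lr_bar - mean_z) / math.sqrt(var_z)
```

The result is compared with standard normal critical values, following the limiting distribution given above.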
Groen & Kleibergen (1999) applied LR(B | A) to a data set of exchange
rates and appropriate monetary fundamentals. They found strong evidence for
the validity of the monetary exchange rate model within a panel of vector
error correction models for three major European countries, whereas the results
based on individual vector error correction models for each of these countries
separately are less supportive.
E. Finite Sample Properties
McCoskey & Kao (1999b) conducted Monte Carlo experiments to compare the
size and power of different residual based tests for cointegration in
heterogeneous panel data: varying slopes and varying intercepts. Two of the
tests are constructed under the null hypothesis of no cointegration. These tests
are based on the average ADF test and Pedroni's pooled tests in (25) and (26).
The third test, the McCoskey & Kao LM test in (23), is constructed under the null
hypothesis of cointegration. Wu & Yin (1999) performed a similar
comparison for panel tests in which they consider only tests for which the null
hypothesis is that of no cointegration. Wu & Yin compared ADF statistics with
maximum eigenvalue statistics in pooling information on means and p-values
respectively. They found that the average ADF performs better with respect to
power and their maximum eigenvalue based p-value performs better with
regards to size.
The test of the null hypothesis of cointegration was originally proposed in response to the low
power of the tests of the null of no cointegration, especially in the time series
case. Further, in cases where economic theory predicted a long run steady state
relationship, it seemed that a test of the null of cointegration rather than the null
of no cointegration would be appropriate. The results from the Monte Carlo
study showed that the McCoskey & Kao LM test outperforms the other two
tests.
Of the two reasons for the introduction of the test of the null hypothesis of
cointegration, low power and attractiveness of the null, the introduction of the
cross-section dimension of the panel solves one: all of the tests show decent
power when used with panel data. For those applications where the null of
cointegration is more logical than the null of no cointegration, McCoskey &
Kao (1999b), at a minimum, conclude that using the McCoskey & Kao LM test
does not compromise the ability of the researcher in determining the underlying
nature of the data.
Recently, Hall et al. (1999) proposed a new approach based on principal
components analysis to test for the number of common stochastic trends
driving the nonstationary series in a panel data set. The test is consistent even
if there is a mixture of I(0) and I(1) series in the sample. This makes it
unnecessary to pretest the panel for unit root. It also has the advantage of
solving the problem of dimensionality encountered in large panel data sets.
V. ESTIMATION AND INFERENCE IN PANEL COINTEGRATION MODELS

This section discusses the issues that arise in estimation and inference of
cointegrated panel regression models. The asymptotic properties of the
estimators of the regression coefficients and the associated statistical tests are
different from those of the time series cointegration regression models. Some
of these differences have become apparent in recent works by Kao & Chiang
(2000), Phillips & Moon (1999a) and Pedroni (1996). The panel cointegration
models are directed at studying questions that surround long run economic
relationships typically encountered in macroeconomic and financial data. Such
a long run relationship is often predicted by economic theory and it is then of
central interest to estimate the regression coefficients and test whether they
satisfy theoretical restrictions. Kao & Chen (1995) showed that the OLS estimator in
panel cointegrated models is asymptotically normal but still asymptotically
biased. Chen, McCoskey & Kao (1999) investigated the finite sample
properties of the OLS estimator, the t-statistic, the bias-corrected OLS
estimator, and the bias-corrected t-statistic. They found that the bias-corrected
OLS estimator does not improve over the OLS estimator in general. The results
of Chen et al. suggested that alternatives, such as the fully modified (FM)
estimator or dynamic OLS (DOLS) estimator may be more promising in
cointegrated panel regressions. Phillips & Moon (1999a) and Pedroni (1996)
proposed a FM estimator, which can be seen as a generalization of Phillips &
Hansen (1990). In this volume, Kao & Chiang (2000) propose an alternative
approach based on a panel dynamic least squares (DOLS) estimator, which
builds upon the work of Saikkonen (1991) and Stock & Watson (1993).
Next, we provide a brief discussion of the OLS estimation methods in a
panel cointegrated model. Consider the following panel regression:
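$$y_{it} = x_{it}'\beta + z_{it}'\gamma_i + u_{it}, \tag{30}$$

where $\{y_{it}\}$ are $1 \times 1$, $\beta$ is a $k \times 1$ vector of the slope parameters, $z_{it}$ is the
deterministic component, and $\{u_{it}\}$ are the stationary disturbance terms. We
assume that $\{x_{it}\}$ are $k \times 1$ integrated processes of order one for all i, where

$$x_{it} = x_{i,t-1} + \varepsilon_{it}.$$

Under these specifications, (30) describes a system of cointegrated regressions,
i.e. $y_{it}$ is cointegrated with $x_{it}$. The OLS estimator of $\beta$ is

$$\hat\beta_{OLS} = \left[\sum_{i=1}^{N}\sum_{t=1}^{T} \tilde x_{it}\tilde x_{it}'\right]^{-1}\left[\sum_{i=1}^{N}\sum_{t=1}^{T} \tilde x_{it}\tilde y_{it}\right]. \tag{31}$$

It is easy to show that

$$\frac{1}{N}\sum_{i=1}^{N}\frac{1}{T^2}\sum_{t=1}^{T} \tilde x_{it}\tilde x_{it}' \xrightarrow{p} \lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N} E[\zeta_{2i}], \tag{32}$$

and

$$\frac{1}{N}\sum_{i=1}^{N}\frac{1}{T}\sum_{t=1}^{T} \tilde x_{it}\tilde u_{it} \xrightarrow{p} \lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N} E[\zeta_{1i}] \tag{33}$$

using sequential limit theory, where

$$\begin{array}{c|c|c}
z_{it} & E[\zeta_{1i}] & E[\zeta_{2i}] \\ \hline
0 & 0 & \frac{1}{2}\Omega_{\varepsilon i} \\
\{\mu_i\} & -\frac{1}{2}\Omega_{\varepsilon ui} + \Delta_{\varepsilon ui} & \frac{1}{6}\Omega_{\varepsilon i} \\
\{\mu_i, t\} & -\frac{1}{2}\Omega_{\varepsilon ui} + \Delta_{\varepsilon ui} & \frac{1}{15}\Omega_{\varepsilon i}
\end{array}$$

and

$$\Omega_i = \begin{bmatrix} \Omega_{ui} & \Omega_{u\varepsilon i} \\ \Omega_{\varepsilon ui} & \Omega_{\varepsilon i} \end{bmatrix}$$

is the long-run covariance matrix of $(u_{it}, \varepsilon_{it}')'$, also

$$\Delta_i = \begin{bmatrix} \Delta_{ui} & \Delta_{u\varepsilon i} \\ \Delta_{\varepsilon ui} & \Delta_{\varepsilon i} \end{bmatrix}$$

is the one-sided long-run covariance. For example, when $z_{it} = \{\mu_i\}$, we get

$$\sqrt{N}T(\hat\beta_{OLS} - \beta) - \sqrt{N}\delta_{NT} \Rightarrow N\left(0,\; 6\,\Omega_\varepsilon^{-1}\left(\lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N} \Omega_{u.\varepsilon i}\Omega_{\varepsilon i}\right)\Omega_\varepsilon^{-1}\right),$$

where $\Omega_\varepsilon = \lim_{N\to\infty}\frac{1}{N}\sum_{i=1}^{N}\Omega_{\varepsilon i}$ and

$$\delta_{NT} = \left[\frac{1}{N}\sum_{i=1}^{N}\frac{1}{T^2}\sum_{t=1}^{T} (x_{it} - \bar x_i)(x_{it} - \bar x_i)'\right]^{-1}\left[\frac{1}{N}\sum_{i=1}^{N}\left(\Omega_{\varepsilon i}^{1/2}\left(\int \tilde W_i\,dW_i'\right)\Omega_{\varepsilon i}^{-1/2}\Omega_{\varepsilon ui} + \Delta_{\varepsilon ui}\right)\right].$$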
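For the case $z_{it} = \{\mu_i\}$ with a single regressor, the pooled OLS estimator in (31) is the familiar within estimator: each series is demeaned by cross-section and the cross-products are pooled. The Python sketch below assumes a scalar $x_{it}$; the function name `pooled_within_ols` is hypothetical, and the bias correction $\delta_{NT}$ is not computed because it requires long-run covariance estimates that are outside the scope of this illustration.

```python
def pooled_within_ols(x_panel, y_panel):
    """Pooled OLS slope after removing individual means (scalar regressor).

    x_panel, y_panel: lists of N series; series i of x and y must have
    the same length, but lengths may differ across i.
    """
    num, den = 0.0, 0.0
    for x, y in zip(x_panel, y_panel):
        if len(x) != len(y):
            raise ValueError("x and y must have equal length within a unit")
        x_bar = sum(x) / len(x)
        y_bar = sum(y) / len(y)
        # Accumulate demeaned cross-products over time and across units.
        num += sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
        den += sum((a - x_bar) ** 2 for a in x)
    return num / den


if __name__ == "__main__":
    # Noise-free toy data with slope 2 and unit intercepts 5 and -3.
    x_data = [[0.0, 1.0, 3.0], [2.0, 2.0, 5.0]]
    y_data = [[5.0, 7.0, 11.0], [1.0, 1.0, 7.0]]
    print(pooled_within_ols(x_data, y_data))
```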
Kao & Chiang (2000) in this volume studied the limiting distributions for the
FM and DOLS estimators in a cointegrated regression and showed they are
asymptotically normal. Phillips & Moon (1999a) and Pedroni (1996) also
obtained similar results for the FM estimator. The reader is referred to the cited
papers for further details. Kao and Chiang also investigated the finite sample
properties of the OLS, FM, and DOLS estimators. They found that: (i) the OLS
estimator has a non-negligible bias in finite samples, (ii) the FM estimator does
not improve over the OLS estimator in general, and (iii) the DOLS estimator
may be more promising than OLS or FM estimators in estimating the
cointegrated panel regressions.
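The idea behind a panel DOLS regression can be sketched as follows. The specification used here is an assumption in the spirit of Saikkonen (1991) and Stock & Watson (1993), not a transcription of the Kao & Chiang estimator: the cointegrating regression with a scalar regressor is augmented with q leads and lags of $\Delta x_{it}$, individual means are removed, and the augmented regression is estimated by pooled least squares. The function names and the default q = 1 are hypothetical choices, and a small Gaussian elimination routine is included so that the example needs only the standard library.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def panel_dols(x_panel, y_panel, q=1):
    """Slope on x_it from y on x, dx[t-q..t+q] and unit fixed effects."""
    k = 2 * q + 2  # x plus (2q + 1) differenced terms
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for x, y in zip(x_panel, y_panel):
        t_len = len(x)
        dx = [0.0] + [x[t] - x[t - 1] for t in range(1, t_len)]  # dx[0] unused
        rows = [[x[t]] + [dx[t + j] for j in range(-q, q + 1)]
                for t in range(q + 1, t_len - q)]
        targets = [y[t] for t in range(q + 1, t_len - q)]
        m = len(rows)
        if m == 0:
            raise ValueError("series too short for the chosen q")
        col_means = [sum(r[c] for r in rows) / m for c in range(k)]
        y_mean = sum(targets) / m
        for r, y_val in zip(rows, targets):
            d = [r[c] - col_means[c] for c in range(k)]  # within transform
            for i in range(k):
                xty[i] += d[i] * (y_val - y_mean)
                for j in range(k):
                    xtx[i][j] += d[i] * d[j]
    return solve_linear_system(xtx, xty)[0]
```

The leads and lags are meant to absorb the correlation between the regression error and the innovations of the regressor, which is the source of the second-order bias in pooled OLS. The choice of q is left to the user in this sketch.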
Choi (1999b) extended Kao & Chiang (2000) to study asymptotic properties
of OLS, Within and GLS estimators for an error component model. The error
component model involves both stationary and nonstationary regressors. Choi's
simulation results indicated that the feasible GLS estimator is more efficient
than the Within estimator. Choi (1999c) studied instrumental variable
estimation for an error component model with stationary and nearly
nonstationary regressors.
Phillips & Moon (1999a) studied various regressions between two panel
vectors that may or may not have cointegrating relations, and presented a
fundamental framework for studying sequential and joint limit theories in
nonstationary panel data. In particular, Phillips and Moon studied regression
limit theory of nonstationary panels when both N and T go to infinity. Their
limit theory allows for both sequential limits, where $T \to \infty$ followed by $N \to \infty$,
and joint limits, where $T, N \to \infty$ simultaneously. Phillips and Moon require that
$N/T \to 0$, so that these results apply for moderate N and large T macro panel
data and not large N and small T micro panel data. The panel models
considered allow for four cases: (i) panel spurious regression, where there is no
time series cointegration, (ii) heterogeneous panel cointegration, where each
individual has its own specific cointegration relation, (iii) homogeneous panel
cointegration where individuals have the same cointegration relation, and (iv)
near-homogeneous panel cointegration, where individuals have slightly
different cointegration relations determined by the value of a localizing
parameter. Phillips & Moon (1999a) investigated these four models and
developed panel asymptotics for regression coefficients and tests using both
sequential and joint limit arguments. In all cases considered the pooled
estimator is consistent and has a normal limiting distribution. In fact, for the
spurious panel regression, Phillips & Moon (1999a) showed that under quite
weak regularity conditions, the pooled least squares estimator of the slope
coefficient is $\sqrt{N}$-consistent for the long-run average relation parameter $\beta$
and has a limiting normal distribution. Also, Moon & Phillips (1998a) showed
that a limiting cross-section regression with time averaged data is also $\sqrt{N}$-consistent
for $\beta$ and has a limiting normal distribution. This is different from
the pure time series spurious regression where the limit of the OLS estimator
of $\beta$ is a nondegenerate random variate that is a functional of Brownian
motions and is therefore not consistent for $\beta$. The idea in Phillips & Moon
(1999a) is that independent cross-section data in the panel adds information
and this leads to a stronger overall signal than the pure time series case. Pesaran
& Smith (1995) studied limiting cross-section regressions with time averaged
data and established consistency with restrictive assumptions on the heterogeneous panel model. This differs from Phillips & Moon (1999a) in that the
former use an average of the cointegrating coefficients which is different from
the long run average regression coefficient. This requires the existence of
cointegrating time series relations, whereas the long run average regression
coefficient is defined irrespective of the existence of individual cointegrating
relations and relies only on the long run average variance matrix of the panel.
Phillips & Moon (1999a) also showed that for the homogeneous and near
homogeneous cointegration cases, a consistent estimator of the long run
regression coefficient can be constructed which they call a pooled FM
estimator. They showed that this estimator has faster convergence rate than the
simple cross-section and time series estimators. See also Phillips & Moon
(1999b) for a concise review. In fact, the latter paper also shows how to extend
the above ideas to models with individual effects in the data generating process.
For the panel spurious regression with individual specific deterministic trends,
estimates of the trend coefficients are obtained in the first step and the
detrended data are pooled and used in least squares regression to estimate $\beta$ in
the second step. Two different detrending procedures are used based on OLS
and GLS regressions. OLS detrending leads to an asymptotically more efficient
estimator of the long run average coefficient in pooled regression than GLS
detrending. Phillips & Moon (1999b) explain that the residuals after time
series GLS detrending have more cross section variation than they do after OLS
detrending and this produces greater variation in the limit distribution of the
pooled regression estimator of the long run average coefficient.
Moon & Phillips (1999) investigate the asymptotic properties of the
Gaussian MLE of the localizing parameter in local to unity dynamic panel
regression models with deterministic and stochastic trends. Moon and Phillips
find that for the homogeneous trend model, the Gaussian MLE of the common
localizing parameter is $\sqrt{N}$-consistent, while for the heterogeneous trends
model, it is inconsistent. The latter inconsistency is due to the presence of an
infinite number of incidental parameters (as $N \to \infty$) for the individual trends.
Unlike the fixed effects dynamic panel data model where this inconsistency due
to the incidental parameter problem disappears as $T \to \infty$, the inconsistency of
the localizing parameter in the Moon and Phillips model persists even when
both N and T go to infinity.
Pesaran, Shin & Smith (1999) derived the asymptotics of a pooled mean
group (PMG) estimator. The PMG estimation constrains the long run
coefficients to be identical, but allows the short run and adjustment coefficients
as well as the error variances to differ across the cross-sectional dimension.
Recently, Binder, Hsiao & Pesaran (2000) considered estimation and
inference in panel vector autoregressions (PVARS) with fixed effects when T is
finite and N is large. A maximum likelihood estimator as well as unit root and
cointegration tests are proposed based on a transformed likelihood function.
This MLE is shown to be consistent and asymptotically normally distributed
irrespective of the unit root and cointegrating properties of the PVAR model.
The tests proposed are based on standard chi-square and normally distributed
statistics. Binder et al. also show that the conventional GMM estimators based
on standard orthogonality conditions break down if the underlying time series
contain unit roots. Monte Carlo evidence is provided which favors MLE over
GMM in small samples.
In this volume, Kauppi (2000) develops a new joint limit theory where the
panel data may be cross-sectionally heterogeneous in a general way. This limit
theory builds upon the concepts of joint convergence in probability and in
distribution for double indexed processes by Phillips & Moon (1999a) and
develops new versions of the law of large numbers and the central limit
theorem that apply in panels where the data may be cross-sectionally
heterogeneous in a fairly general way. Kauppi demonstrates how this joint limit
theory can be applied to derive asymptotics for a panel regression where the
regressors are generated by a local to unit root with heterogeneous localizing
coefficients across cross-sections. Kauppi discusses issues that arise in the
estimation and inference of panel cointegrated regressions with near integrated
regressors. Kauppi shows that a bias corrected pooled OLS for a common
cointegrating parameter has an asymptotic normal distribution centered on the
true value irrespective of whether the regressor has near or exact unit root.
However, if the regression model contains individual effects and/or deterministic trends, then Kauppi's bias corrected pooled OLS still produces
asymptotic bias. Kauppi also shows that the panel FM estimator is subject to
asymptotic bias regardless of how individual effects and/or deterministic trends
are contained if the regressors are nearly rather than exactly integrated. This
indicates that much care should be taken in interpreting empirical results
achieved by the recent panel cointegration methods that assume exact unit roots
when near unit roots are equally plausible.
Kao et al. (1999) apply the asymptotic theory of panel cointegration
developed by Kao & Chiang (2000) to the Coe & Helpman (1995) international
R&D spillover regression. Using a sample of 21 OECD countries and Israel,
they re-examine the effects of domestic and foreign R&D capital stocks on total
factor productivity of these countries. They find that OLS with bias-correction,
the fully modified (FM) and the dynamic OLS (DOLS) estimators produce
different predictions about the impact of foreign R&D on total factor
productivity (TFP). However, all the estimators support the result that domestic
R&D is related to TFP. Kao et al.'s empirical results indicate that the estimated
coefficients in Coe and Helpman's regressions are subject to estimation
bias. Given the superiority of the DOLS over FM as suggested by Kao &
Chiang (2000), Kao et al. leaned towards rejecting the Coe and Helpman
hypothesis that international R&D spillovers are trade related.
Funk (1998) examined the relationship between trade patterns and
international R&D spillovers among the OECD countries using the panel
cointegration methods developed by Kao (1999), Kao & Chiang (2000), and
Pesaran, Shin & Smith (1999). Using randomly simulated bilateral trade
patterns, Funk found that the choice of weights used in constructing foreign
R&D stocks is informative of the avenue of spillover transmission when panel
cointegration methods are employed. A re-examination of the relationship
between import patterns and R&D spillovers found no evidence to link the
patterns of R&D spillovers to the patterns of imports. Funk found strong
evidence indicating that exporters receive substantial R&D spillovers from
their customers.
VI. DYNAMIC PANEL DATA MODELS

This section surveys recent developments in dynamic panel data models. The
dynamic panel data regression is characterized by two sources of persistence
over time: autocorrelation due to the presence of a lagged dependent variable
among the regressors, and individual effects characterizing the heterogeneity
among the individuals:
$$y_{it} = \delta y_{i,t-1} + x_{it}'\beta + \mu_i + u_{it} \tag{34}$$
for $i = 1, 2, \ldots, N$ and $t = 1, 2, \ldots, T$. Here $\delta$ is a scalar, $x_{it}$ is $k \times 1$, $\mu_i$ denotes the
$i$-th individual's effect and $u_{it}$ is the remainder disturbance. Basic introductions
to this topic are found in Hsiao (1986), Baltagi (1995) and Matyas & Sevestre
(1996). Applications using this model are too many to enumerate. These
include employment equations, see Arellano & Bond (1991), liquor demand,
see Baltagi & Griffin (1995), growth convergence, see Islam (1995) and
Nerlove (1999), life cycle labor supply models, see Ziliak (1997), and demand
for gasoline, see Baltagi & Griffin (1997) to mention a few.
It is well known that for typical micro-panels where there are a large number
of firms or individuals (N) observed over a short period of time (T), the fixed
effects (FE) estimator is biased and inconsistent (since T is fixed and $N \to \infty$),
see Nickell (1981) and more recently Kiviet (1995, 1999). Monte Carlo results
have shown that first order asymptotic properties do not necessarily yield
correct inference in finite samples. Therefore, Kiviet (1995) examined higher
order asymptotics which may approximate the actual finite sample properties
more closely and lead to better inference. In fact, Kiviet (1995) considered the
simple dynamic linear panel data model with serially uncorrelated disturbances
and strongly exogenous regressors and derived an approximation for the bias of
the FE estimator. When a consistent estimator of this bias is subtracted from the
original FE estimator, a corrected FE estimator results. This corrected FE
estimator performed well in simulations when compared with eight other
consistent instrumental variable or GMM estimators.$^{4}$
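The inconsistency of the FE estimator for fixed T can be illustrated with a small simulation. The following is a minimal sketch, not a replication of any study cited above: it assumes model (34) without exogenous regressors, standard normal individual effects and disturbances, and a true autoregressive coefficient of 0.5. The function names `simulate_panel` and `within_estimator` are hypothetical.

```python
import numpy as np

def simulate_panel(n, t, delta, rng, burn_in=50):
    """Simulate y_it = delta * y_i,t-1 + mu_i + u_it with N(0, 1) effects and errors."""
    mu = rng.standard_normal(n)
    y = mu / (1.0 - delta)              # start each unit at its own long-run mean
    path = np.empty((n, t + 1))
    for s in range(burn_in + t + 1):
        y = delta * y + mu + rng.standard_normal(n)
        if s >= burn_in:                # keep only the last t + 1 periods
            path[:, s - burn_in] = y
    return path                         # columns hold y_i0, ..., y_iT

def within_estimator(path):
    """Fixed effects (within) estimate of delta from an (N, T + 1) array."""
    y_lag, y_cur = path[:, :-1], path[:, 1:]
    # Subtract individual time means from both sides to sweep out mu_i.
    y_lag_dm = y_lag - y_lag.mean(axis=1, keepdims=True)
    y_cur_dm = y_cur - y_cur.mean(axis=1, keepdims=True)
    return float((y_lag_dm * y_cur_dm).sum() / (y_lag_dm ** 2).sum())

rng = np.random.default_rng(0)
delta_true = 0.5
for t in (5, 10, 30):
    estimates = [within_estimator(simulate_panel(500, t, delta_true, rng))
                 for _ in range(20)]
    print(f"T = {t:2d}: mean FE estimate = {np.mean(estimates):.3f} "
          f"(true value {delta_true})")
```

Under these assumptions the average FE estimate should lie below the true value, with the gap shrinking as T grows while N is held fixed.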
In macro-panels studying for example long run growth, the data covers a
large number of countries N over a moderate size T. In this case, T is not very
small relative to N. Hence, some researchers may still favor the FE estimator
arguing that its bias may not be large. Judson & Owen (1999) performed some
Monte Carlo experiments for N = 20 or 100 and T = 5, 10, 20 and 30 and found
that the bias in the FE can be sizeable, even when T = 30. The bias of the FE
estimator increases with $\delta$ and decreases with T. But even for T = 30, this bias
could be as much as 20% of the true value of the coefficient of interest. Judson
& Owen (1999) recommend the corrected FE estimator proposed by Kiviet
(1995) as the best choice, GMM being second best and for long panels, the
computationally simpler Anderson & Hsiao (1982) estimator. This last
estimator first differences the data to get rid of the individual effects and then
uses lagged predetermined variables in levels as instruments.$^{5}$ Arellano & Bond
(1991) proposed GMM procedures that are more efficient than the Anderson &
Hsiao (1982) estimator. Ahn & Schmidt (1995) derive additional nonlinear
moment restrictions not exploited by the Arellano & Bond (1991) GMM
estimator.$^{6}$ Ahn & Schmidt (1995, 1997) also give a complete count of the set
of orthogonality conditions corresponding to a variety of assumptions imposed
on the disturbances and the initial conditions of the dynamic panel data model.
While many of the moment conditions are nonlinear in the parameters, Ahn &
Schmidt (1997) propose a linearized GMM estimator that is asymptotically as
efficient as the nonlinear GMM estimator. They also provide simple moment
tests of the validity of these nonlinear restrictions. In addition, they investigate
the circumstances under which the optimal GMM estimator is equivalent to a
linear instrumental variable estimator. They find that these circumstances are
quite restrictive and go beyond uncorrelatedness and homoskedasticity of the
errors. Ahn & Schmidt (1995) provide some evidence on the efficiency gains
from the nonlinear moment conditions which provide support for their use in
practice. By employing all these conditions, the resulting GMM estimator is
asymptotically efficient and has the same asymptotic variance as the MLE
under normality. In fact, Hahn (1997) showed that GMM based on an
increasing set of instruments as $N \to \infty$ would achieve the semiparametric
efficiency bound.
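The Anderson & Hsiao (1982) idea described above (first difference to remove the individual effects, then instrument with a lagged level) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the model has no exogenous regressors, the single instrument for the lagged difference is the level dated two periods back, and the function names are hypothetical.

```python
import numpy as np

def simulate_ar1_panel(n, t, delta, rng, burn_in=50):
    """Simulate y_it = delta * y_i,t-1 + mu_i + u_it; returns an (N, T + 1) array."""
    mu = rng.standard_normal(n)
    y = mu / (1.0 - delta)
    columns = []
    for s in range(burn_in + t + 1):
        y = delta * y + mu + rng.standard_normal(n)
        if s >= burn_in:
            columns.append(y)
    return np.column_stack(columns)     # columns hold y_i0, ..., y_iT

def anderson_hsiao(path):
    """Simple IV estimate of delta in the first-differenced equation."""
    dy = np.diff(path, axis=1)          # dy[:, j] is the difference y_{j+1} - y_j
    dep = dy[:, 1:]                     # differenced dependent variable, t = 2..T
    reg = dy[:, :-1]                    # lagged difference, the endogenous regressor
    inst = path[:, :-2]                 # level y_{i,t-2}, the instrument
    return float((inst * dep).sum() / (inst * reg).sum())

rng = np.random.default_rng(0)
panel = simulate_ar1_panel(5000, 6, 0.5, rng)
print(f"Anderson-Hsiao estimate: {anderson_hsiao(panel):.3f} (true value 0.5)")
```

The Arellano & Bond (1991) GMM estimator uses all available lagged levels rather than this single instrument, which is the source of its efficiency gain.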
Hahn (1997) considers the asymptotically efficient estimation of the dynamic
panel data model with sequential moment restrictions in an environment with
i.i.d. observations. Hahn (1997) shows that the GMM estimator with an
increasing set of instruments as the sample size grows attains the semiparametric efficiency bound of the model. Hahn (1997) explains how Fourier series
or polynomials may be used as the set of instruments for efficient estimation.
In a limited Monte Carlo comparison, Hahn finds that this estimator has finite
sample properties similar to those of the Keane & Runkle (1992) and/or Schmidt et al.
(1992) estimators when the latter estimators are efficient. In cases where the
latter estimators are not efficient, the Hahn efficient estimator outperforms both
estimators in finite samples.
Recently, Wansbeek & Bekker (1996) considered a simple dynamic panel
data model with no exogenous regressors and disturbances $u_{it}$ and random
effects $\mu_i$ that are independent and normally distributed. They derived an
expression for the optimal instrumental variable estimator, i.e. one with
minimal asymptotic variance. A striking result is the difference in efficiency
between the IV and ML estimators. They find that for regions of the
autoregressive parameter which are likely in practice, ML is superior. The
gap between IV (or GMM) and ML can be narrowed down by adding moment
restrictions of the type considered by Ahn & Schmidt (1995). Hence, Wansbeek
& Bekker (1996) find support for adding these nonlinear moment restrictions
and warn against the loss in efficiency as compared with MLE by ignoring
them.
Blundell & Bond (1998) revisit the importance of exploiting the initial
condition in generating efficient estimators of the dynamic panel data model
when T is small. They consider a simple autoregressive panel data model with
no exogenous regressors
$$y_{it} = \delta y_{i,t-1} + \mu_i + u_{it} \tag{35}$$
with $E(\mu_i) = 0$, $E(u_{it}) = 0$ and $E(\mu_i u_{it}) = 0$ for $i = 1, 2, \ldots, N$; $t = 1, 2, \ldots, T$.
Blundell & Bond (1998) focus on the case where $T = 3$ and therefore there is
only one orthogonality condition given by $E(y_{i1} \Delta u_{i3}) = 0$, so that $\delta$ is just-identified. In this case, the first stage IV regression is obtained by running $\Delta y_{i2}$
on $y_{i1}$. Note that this regression can be obtained from (35) evaluated at $t = 2$ by
subtracting $y_{i1}$ from both sides of this equation, i.e.
$$\Delta y_{i2} = (\delta - 1) y_{i1} + \mu_i + u_{i2} \tag{36}$$
Since we expect $E(y_{i1}\mu_i) > 0$, $(\hat{\delta} - 1)$ will be biased upwards with
$$\operatorname{plim}(\hat{\delta} - 1) = (\delta - 1)\,\frac{c}{c + (\sigma_\mu^2/\sigma_u^2)} \tag{37}$$
where $c = (1 - \delta)/(1 + \delta)$. The bias term effectively scales the estimated
coefficient on the instrumental variable $y_{i1}$ towards zero. They also find that the
F-statistic of the first stage IV regression converges to $\chi_1^2$ with noncentrality
parameter
$$\tau = \frac{(\sigma_u^2 c)^2}{\sigma_\mu^2 + \sigma_u^2 c} \to 0 \quad \text{as } \delta \to 1 \tag{38}$$
As $\tau \to 0$, the instrumental variable estimator performs poorly. Hence, Blundell
and Bond attribute the bias and the poor precision of the first difference GMM
estimator to the problem of weak instruments described in Nelson & Startz
(1990) and Staiger & Stock (1997) and characterize this weak IV by its
concentration parameter $\tau$.
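To see how quickly the instrument weakens, the two quantities in this argument can be evaluated numerically. The sketch below assumes the expressions take the form $c = (1-\delta)/(1+\delta)$, a scaling factor $c / (c + \sigma_\mu^2/\sigma_u^2)$, and a noncentrality parameter $(\sigma_u^2 c)^2 / (\sigma_\mu^2 + \sigma_u^2 c)$; the symbol names and the function `weak_instrument_summary` are illustrative choices, not taken from Blundell & Bond (1998).

```python
def weak_instrument_summary(delta, var_ratio, sigma2_u=1.0):
    """Return (c, scaling factor, noncentrality parameter).

    var_ratio is the assumed ratio sigma2_mu / sigma2_u.
    """
    c = (1.0 - delta) / (1.0 + delta)
    scaling = c / (c + var_ratio)       # factor multiplying (delta - 1) in the plim
    sigma2_mu = var_ratio * sigma2_u
    tau = (sigma2_u * c) ** 2 / (sigma2_mu + sigma2_u * c)
    return c, scaling, tau

for delta in (0.0, 0.5, 0.9, 0.99):
    c, scaling, tau = weak_instrument_summary(delta, var_ratio=1.0)
    print(f"delta = {delta:4.2f}: c = {c:.4f}, scaling = {scaling:.4f}, tau = {tau:.6f}")
```

Both the scaling factor and the noncentrality parameter shrink towards zero as the autoregressive coefficient approaches one, which is the sense in which lagged levels become weak instruments for first differences.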
Next, Blundell & Bond (1998) show that an additional mild stationarity
restriction on the initial conditions process allows the use of an extended
system GMM estimator that uses lagged differences of yit as instruments for
equations in levels, in addition to lagged levels of yit as instruments for
equations in first differences, see Arellano & Bover (1995). The system GMM
estimator is shown to have dramatic efficiency gains over the basic first
difference GMM as $\delta \to 1$ and $(\sigma_\mu^2/\sigma_u^2)$ increases. In fact, for $T = 4$ and $(\sigma_\mu^2/\sigma_u^2) = 1$,
the asymptotic variance ratio of the first difference GMM estimator to
this system GMM estimator is 1.75 for $\delta = 0$ and increases to 3.26 for $\delta = 0.5$
and 55.4 for $\delta = 0.9$. This clearly demonstrates that the levels restrictions
suggested by Arellano & Bover (1995) remain informative in cases where first
differenced instruments become weak. Things improve for first difference
GMM as T increases. However, with short T and persistent series, the Blundell
and Bond findings support the use of the extra moment conditions. These
results are reviewed and corroborated in Blundell, Bond & Windmeijer (2000)
in this volume, using Monte Carlo experiments as well as an empirical
example. In fact, simulations that include the weakly exogenous covariates find
large finite sample bias and very low precision for the standard first differenced
estimator. However, the system GMM estimator not only improves the
precision but also reduces the finite sample bias. The empirical application
revisits the estimates of the capital and labor coefficients in a Cobb-Douglas
production function considered by Griliches & Mairesse (1998). Using data on
509 R&D performing US manufacturing companies observed over 8 years
(1982-1989), the standard GMM estimator that uses moment conditions on the
first differenced model finds a low estimate of the capital coefficient and low
precision for all coefficients estimated. However, the system GMM estimator
gives reasonable and more precise estimates of the capital coefficient and
constant returns to scale is not rejected. Blundell et al. conclude that a careful
examination of the original series and consideration of the system GMM
estimator can usefully overcome many of the disappointing features of the
standard GMM estimator for dynamic panel models. Hahn (1999) also
examines the role of the initial condition imposed by the Blundell & Bond
(1998) estimator. This is done by numerically comparing the semiparametric
information bounds for the case that incorporates the stationarity of the initial
condition and the case that does not. Hahn (1999) finds that the efficiency gain
can be substantial.
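The two instrument sets discussed above can be made concrete for a single individual. The sketch below is a simplified illustration for the pure autoregressive model with no covariates: it assumes lagged levels dated t-2 and earlier instrument the differenced equations, and the single lagged difference dated t-1 instruments each levels equation. The function names are hypothetical and this is not the code used in the cited papers.

```python
import numpy as np

def difference_gmm_instruments(y_i):
    """Instrument matrix for the first-differenced equations t = 3, ..., T.

    y_i holds y_i1, ..., y_iT. Row t uses the lagged levels y_i1, ..., y_i,t-2.
    """
    T = len(y_i)
    n_cols = (T - 1) * (T - 2) // 2     # 1 + 2 + ... + (T - 2) instruments
    Z = np.zeros((T - 2, n_cols))
    col = 0
    for row, t in enumerate(range(3, T + 1)):
        lags = y_i[: t - 2]             # levels dated t-2 and earlier
        Z[row, col: col + len(lags)] = lags
        col += len(lags)
    return Z

def system_gmm_level_instruments(y_i):
    """Extra instruments for the levels equations t = 3, ..., T.

    Each levels equation is instrumented by the lagged difference dated t-1.
    """
    dy = np.diff(y_i)                   # dy[j] is the difference dated j + 2
    return np.diag(dy[: len(y_i) - 2])

y = np.array([1.0, 2.0, 4.0, 7.0])      # a toy series with T = 4
print(difference_gmm_instruments(y))
print(system_gmm_level_instruments(y))
```

With T = 3 the first matrix collapses to the single element $y_{i1}$, matching the just-identified case described earlier in this section.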
Ziliak (1997) asks the question whether the bias/efficiency trade-off for the
GMM estimator considered by Tauchen (1986) for the time series case is still
binding in panel data where the sample size is normally larger than 500. For
time series data, Tauchen (1986) shows that even for T = 50 or 75 there is a bias/
efficiency trade-off as the number of moment conditions increases. Therefore,
Tauchen recommends the use of sub-optimal instruments in small samples.
This result was also corroborated by Andersen & Sorensen (1996) who argue
that GMM using too few moment conditions is just as bad as GMM using too
many moment conditions. This problem becomes more pronounced with panel
data since the number of moment conditions increases dramatically as the
number of strictly exogenous variables and the number of time series
observations increase. Even though it is desirable from an asymptotic efficiency
point of view to include as many moment conditions as possible, it may be
infeasible or impractical to do so in many cases. For example, for T = 10 and
five strictly exogenous regressors, this generates 500 moment conditions for
GMM. Ziliak (1997) performs an extensive set of Monte Carlo experiments for
a dynamic panel data model and finds that the same trade-off between bias and
efficiency exists for GMM as the number of moment conditions increases, and
that one is better off with sub-optimal instruments. In fact, Ziliak finds that
GMM performs well with suboptimal instruments, but is not recommended for
panel data applications when all the moments are exploited for estimation.$^{7}$
Ziliak estimates a life cycle labor supply model under uncertainty based on 532
men observed over 10 years of data (1978-1987) from the Panel Study of
Income Dynamics. The sample was restricted to continuously married,
continuously working prime age men aged 22-51 in 1978. These men were
paid an hourly wage or salaried and could not be piece-rate workers or self-employed. Ziliak finds that the downward bias of GMM is quite severe as the
number of moment conditions expands, outweighing the gains in efficiency.
Ziliak reports estimates of the intertemporal substitution elasticity which is the
focal point of interest in the labor supply literature. This measures the
intertemporal changes in hours of work due to an anticipated change in the real
wage. For GMM, this estimate changes from 0.519 to 0.093 when the number
of moment conditions used in GMM is increased from 9 to 212. The standard
error of this estimate drops from 0.36 to 0.07. Ziliak attributes this bias to the
correlation between the sample moments used in estimation and the estimated
weight matrix. Interestingly, Ziliak finds that the forward filter 2SLS estimator
proposed by Keane & Runkle (1992) performs best in terms of the bias/
efficiency trade-off and is recommended. Forward filtering eliminates all forms
of serial correlation while still maintaining orthogonality with the initial
instrument set. Schmidt, Ahn & Wyhowski (1992) argued that filtering is
irrelevant if one exploits all sample moments during estimation. However, in
practice, the number of moment conditions increases with the number of time
periods T and the number of regressors K and can become computationally
intractable. In fact for $T = 15$ and $K = 10$, the number of moment conditions for
Schmidt et al. (1992) is $T(T-1)K/2 = 1050$, which leaves 1040 overidentifying restrictions after estimating the $K = 10$ coefficients, highlighting the
computational burden of this approach. In addition, Ziliak argues that the
overidentifying restrictions are less likely to be satisfied possibly due to the
weak correlation between the instruments and the endogenous regressors.$^{8}$ In
this case, the forward filter 2SLS estimator is desirable yielding less bias than
GMM and sizeable gains in efficiency. In fact, for the life cycle labor example,
the forward filter 2SLS estimate of the intertemporal substitution elasticity was
0.135 for 9 moment conditions compared to 0.296 for 212 moment conditions.
The standard error of these estimates dropped from 0.32 to 0.09.
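The moment-condition counts quoted in this paragraph can be checked with simple arithmetic. The sketch below rests on two assumptions that the text does not spell out: that the figure of 500 arises from using each of the K strictly exogenous regressors in all T periods as an instrument in each of the T periods (T x T x K), and that the Schmidt et al. (1992) count is T(T-1)K/2. The function names are hypothetical.

```python
def strictly_exogenous_moment_count(T, K):
    """Assumed count: K regressors observed in T periods, used in each of T equations."""
    return T * T * K

def schmidt_ahn_wyhowski_moment_count(T, K):
    """Assumed count T(T - 1)K / 2 for sequential moment conditions."""
    return T * (T - 1) * K // 2

print(strictly_exogenous_moment_count(T=10, K=5))
print(schmidt_ahn_wyhowski_moment_count(T=15, K=10))
```

For T = 15 and K = 10 the second formula gives 1050 conditions, or 1040 net of the ten estimated coefficients; reading the figure of 1040 as a count of overidentifying restrictions is an interpretation, not something the text states.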
The practical problem of not being able to use more moment conditions as
well as the statistical problem of the trade-off between small sample bias and
efficiency prompted Ahn & Schmidt (1999a) to pose the following questions:
Under what conditions can we use a smaller set of moment conditions without
incurring any loss of asymptotic efficiency? In other words, under what
conditions are some moment conditions redundant in the sense that utilizing
them does not improve efficiency? These questions were first dealt with by Im,
Ahn, Schmidt & Wooldridge (1999) who considered panel data models with
strictly exogenous explanatory variables. They argued that, for example, with
ten strictly exogenous time-varying variables and six time periods, the number of moment
conditions available for the random effects (RE) model is 360, and this reduces
to 300 moment conditions for the FE model. GMM utilizing all these moment
conditions leads to an efficient estimator. However, these moment conditions
exceed what the simple RE and FE estimators use. Im et al. (1999) provide the
assumptions under which this efficient GMM estimator reduces to the simpler
FE or RE estimator. In other words, Im et al. (1999) show the redundancy of
the moment conditions that these simple estimators do not use. Ahn & Schmidt
(1999a) provide a more systematic method by which redundant instruments can
be found and generalize this result to models with time-varying individual
effects. However, both papers deal only with strictly exogenous regressors. In
a related paper, Ahn & Schmidt (1999b) consider the cases of strictly and
weakly exogenous regressors. They show that the GMM estimator takes the
form of an instrumental variables estimator if the assumption of no conditional
heteroskedasticity (NCH) holds. Under this assumption, the efficiency of
standard estimators can often be established showing that the moment
conditions not utilized by these estimators are redundant. However, Ahn &
Schmidt (1999b) conclude that the NCH assumption necessarily fails if the full
set of moment conditions for the dynamic panel data model is used. In this
case, there is clearly a need to find modified versions of GMM, with a reduced
set of moment conditions, that lead to estimates with reasonable finite sample
properties.
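The counts of 360 and 300 in the Im et al. (1999) example follow from a simple counting rule. As a worked check, assume that every regressor in every period is a valid instrument for each available equation: T equations in levels under RE, and T - 1 transformed equations under FE, with K = 10 and T = 6.

```latex
% Assumed counting rule: K regressors in each of T periods instrument
% every available equation.
\begin{align*}
\text{RE (} T \text{ equations in levels):} \quad
  & T \cdot T \cdot K = 6 \cdot 6 \cdot 10 = 360, \\
\text{FE (} T-1 \text{ transformed equations):} \quad
  & (T-1) \cdot T \cdot K = 5 \cdot 6 \cdot 10 = 300, \\
\text{difference:} \quad
  & T \cdot K = 6 \cdot 10 = 60 .
\end{align*}
```

The 60 conditions lost in moving from RE to FE correspond to the one equation that is used up in removing the individual effect.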
Crepon, Kramarz & Trognon (1997) argue that for the dynamic panel data
model, when one considers a set of orthogonality conditions, the parameters can
be divided into parameters of interest (like $\delta$) and nuisance parameters (like the
second order terms in the autoregressive error component model). They show
that the elimination of such nuisance parameters using their empirical
counterparts does not entail an efficiency loss when only the parameters of
interest are estimated. In fact, Sevestre and Trognon in chapter 6 of Matyas &
Sevestre (1996) argue that if one is only interested in $\delta$, then one can reduce
the number of orthogonality restrictions without loss in efficiency as far as $\delta$ is
concerned. However, the estimates of the other nuisance parameters are not
generally as efficient as those obtained from the full set of orthogonality
conditions.
The Alonso-Borrego & Arellano (1999) paper is also motivated by the finite
sample bias in panel data instrumental variable estimators when the
instruments are weak. The dynamic panel model generates many overidentifying restrictions even for moderate values of T. Also, the number of
instruments increases with T, but the quality of these instruments is often poor
because they tend to be only weakly correlated with first differenced
endogenous variables that appear in the equation. Limited information
maximum likelihood (LIML) is strongly preferred to 2SLS if the number of
instruments gets large as the sample size tends to infinity. Hillier (1990)
showed that the alternative normalization rules adopted by LIML and 2SLS are
at the root of their different sampling behavior. Hillier (1990) also showed that
a symmetrically normalized 2SLS estimator has properties similar to those of
LIML. Following Hillier (1990), Alonso-Borrego & Arellano (1999) derive a
symmetrically normalized GMM (SNM) and compare it with ordinary GMM
and LIML analogues by means of simulations. Monte Carlo and empirical
results show that GMM can exhibit large biases when the instruments are poor,
while LIML and SNM remain unbiased. However, LIML and SNM always had
a larger interquartile range than GMM. For $T = 4$, $N = 100$, $\sigma_\mu^2 = 0.2$ and $\sigma_u^2 = 1$,
the bias for $\delta = 0.5$ was 6.9% for GMM, 1.7% for SNM and 1.7% for LIML.
This bias increases to 17.8% for GMM, 3.7% for SNM and 4.1% for LIML for
$\delta = 0.8$.
Alvarez & Arellano (1997) studied the asymptotic properties of FE, one-step
GMM and non-robust LIML for a first-order autoregressive model when both N
and T tend to infinity with $(T/N) \to c$ for $0 \le c < 2$. For $T < N$, GMM bias is
always smaller than FE bias and LIML bias is smaller than the other two. In a fixed
T framework, GMM and LIML are asymptotically equivalent, but as T
increases, LIML has a smaller asymptotic bias than GMM. These results
provide some theoretical support for LIML over GMM.$^{9}$
Wansbeek & Knaap (1999) consider a simple dynamic panel data model
with a time trend and heterogeneous coefficients on the lagged dependent
variable and the time trend, i.e.
$$y_{it} = \delta_i y_{i,t-1} + \xi_i t + \mu_i + u_{it} \tag{39}$$
This model results from Islam's (1995) version of Solow's model on growth
convergence among countries. Wansbeek & Knaap (1999) show that double
differencing gets rid of the individual country effects ($\mu_i$) on the first round of
differencing and the heterogeneous coefficient on the time trend ($\xi_i$) on the
second round of differencing. Modified OLS, IV and GMM methods are
adapted to this model and LIML is suggested as a viable alternative to GMM
to guard against the small sample bias of GMM. Macroeconomic data are
subject to measurement error and Wansbeek & Knaap (1999) show how these
estimators can be modified to account for measurement error that is white
noise. For example, GMM is modified so that it discards the orthogonality
conditions that rely on the absence of measurement error.
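The effect of double differencing can be written out explicitly. The derivation below is a sketch that assumes the model has the form $y_{it} = \delta_i y_{i,t-1} + \xi_i t + \mu_i + u_{it}$, where the symbols $\delta_i$, $\xi_i$ and $\mu_i$ are notation chosen here for the heterogeneous autoregressive coefficient, the trend coefficient and the country effect.

```latex
% First difference: mu_i drops out, and xi_i t - xi_i (t - 1) = xi_i.
% Second difference: the remaining constant xi_i drops out as well.
\begin{align*}
\Delta y_{it}   &= \delta_i \, \Delta y_{i,t-1} + \xi_i + \Delta u_{it}, \\
\Delta^2 y_{it} &= \delta_i \, \Delta^2 y_{i,t-1} + \Delta^2 u_{it},
\qquad \Delta^2 u_{it} = u_{it} - 2u_{i,t-1} + u_{i,t-2}.
\end{align*}
```

Because the twice-differenced disturbance involves $u_{it}$, $u_{i,t-1}$ and $u_{i,t-2}$, levels of $y$ dated $t-3$ or earlier remain uncorrelated with it when the $u_{it}$ are serially uncorrelated and there is no measurement error.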
Jimenez-Martin (1998) performs Monte Carlo experiments to study the
performance of the Holtz-Eakin (1988) test for the presence of individual
heterogeneity effects in dynamic small T unbalanced panel data models. The
design of the experiment includes both endogenous and time-invariant
regressors in addition to the lagged dependent variable. The test behaves
correctly for a moderate autoregressive coefficient. However, when this
coefficient approaches unity, the presence of an additional regressor sharply
affects the power and the size of the test. The power of this test is higher when
the variance of the specific effects increases (they are easier to detect), when
the sample size increases, when the data set is balanced (for a given number of
cross-section units) and when the regressors are strictly exogenous.
A. Heterogeneous Dynamic Panel Data Models
The fundamental assumption underlying pooled homogeneous parameter
models has been called into question. Robertson & Symons (1992) warned
about the bias from pooled estimators when the estimated model is dynamic
and homogeneous when in fact the true model is static and heterogeneous.
Pesaran & Smith (1995) argued in favor of dynamic heterogeneous models
when N and T are large. In this case, pooled homogeneous estimators are
inconsistent whereas an average estimator of heterogeneous parameters can
lead to consistent estimates as N and T tend to infinity. Maddala, Srivastava &
Li (1994) argued that shrinkage estimators are superior to either heterogeneous
or homogeneous parameter estimates especially for prediction purposes. In
fact, Maddala, Trost, Li & Joutz (1997) considered the problem of estimating
short run and long run elasticities of residential demand for electricity and
natural gas for each of 49 states over the period 1970-1990.$^{10}$ They conclude
that individual heterogeneous state estimates were hard to interpret and had the
wrong signs. Pooled data regressions were not valid because the hypothesis of
homogeneity of the coefficients was rejected. They recommend shrinkage
estimators if one is interested in obtaining elasticity estimates for each state
since these give more reliable results.
Baltagi & Griffin (1997) compare short run and long run estimates as well
as forecasts for pooled homogeneous, individual heterogeneous and shrinkage
estimators of a dynamic demand model for gasoline across 18 OECD countries
over the period 1960-1990. Based on one, five and ten year forecasts and
plausibility of the short run and long run elasticity estimates, the results are in
favor of pooling. Similar results were obtained for a dynamic model for
cigarette demand across 46 states over the period 1963-1992, see Baltagi,
Griffin & Xiong (2000).
In chapter 8 of Matyas & Sevestre (1996), Pesaran, Smith & Im investigated
the small sample properties of various estimators of the long run coefficients
for a dynamic heterogeneous panel data model using Monte Carlo experiments.
Their findings indicate that the mean group estimator performs reasonably well
for large T. However, when T is small, the mean group estimator could be
seriously biased, particularly when N is large relative to T. Pesaran & Zhao
(1999) examine the effectiveness of alternative bias-correction procedures in
reducing the small sample bias of these estimators using Monte Carlo
experiments. An interesting finding is that when the coefficient of the lagged
dependent variable is greater than or equal to 0.8, none of the bias correction
procedures seem to work.
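The mean group estimator and its small T bias can be illustrated with a short simulation. This is a minimal sketch under assumptions chosen for illustration only: each unit follows its own AR(1) process with an intercept, the autoregressive coefficients are drawn uniformly around a mean of 0.5, and the estimator is the simple average of unit-by-unit OLS estimates. The function names are hypothetical.

```python
import numpy as np

def simulate_heterogeneous_panel(n, t, rng, mean_delta=0.5, half_width=0.2, burn_in=50):
    """Simulate y_it = delta_i * y_i,t-1 + mu_i + u_it with unit-specific delta_i."""
    deltas = rng.uniform(mean_delta - half_width, mean_delta + half_width, size=n)
    mu = rng.standard_normal(n)
    y = mu / (1.0 - deltas)
    path = np.empty((n, t + 1))
    for s in range(burn_in + t + 1):
        y = deltas * y + mu + rng.standard_normal(n)
        if s >= burn_in:
            path[:, s - burn_in] = y
    return path, deltas

def mean_group_estimate(path):
    """Average of unit-by-unit OLS estimates of the lagged dependent variable coefficient."""
    estimates = []
    for y in path:
        X = np.column_stack([np.ones(len(y) - 1), y[:-1]])   # intercept and lag
        coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
        estimates.append(coef[1])
    return float(np.mean(estimates))

rng = np.random.default_rng(0)
for t in (5, 20, 100):
    panel, _ = simulate_heterogeneous_panel(200, t, rng)
    print(f"T = {t:3d}: mean group estimate = {mean_group_estimate(panel):.3f} "
          f"(mean coefficient 0.5)")
```

Under these assumptions the averaged estimate should fall well below the mean coefficient for small T and approach it as T grows, in line with the Monte Carlo findings summarized above.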
Hsiao, Pesaran & Tahmiscioglu (1999) suggest a Bayesian approach for
estimating the mean parameters of a dynamic heterogeneous panel data model.
The coefficients are assumed to be normally distributed across cross-sectional
units and the Bayes estimator is implemented using Markov Chain Monte
Carlo methods. Hsiao et al. argue that Bayesian methods can be a viable
alternative in the estimation of mean coefficients in dynamic panel data models
even when the initial observations are treated as fixed constants. They establish
the asymptotic equivalence of this Bayes estimator and the mean group
estimator proposed by Pesaran & Smith (1995). The asymptotics are carried
out for both $N \to \infty$ and $T \to \infty$ with $\sqrt{N}/T \to 0$. Monte Carlo experiments show that
this Bayes estimator has better sampling properties than other estimators for
both small and moderate size T. Hsiao et al. also caution against the use of the
mean group estimator unless T is sufficiently large relative to N. The bias in the
mean coefficient of the lagged dependent variable appears to be serious when
T is small and the true value of this coefficient is larger than 0.6. Hsiao et al.
apply their methods to estimate the q-investment model using a panel of 273
US firms over the period 1972-1993.
VII. CONCLUSION
This survey gives a brief overview of some of the main results in the
econometrics of nonstationary panels as well as recent developments in
dynamic panels. There has been an immense amount of research in this area
recently with the demand for empirical studies exceeding the supply of
econometric theory developed for these models. As this survey indicates,
several issues have been resolved but a lot remains to be done.
ACKNOWLEDGMENTS
The authors would like to thank R. Carter Hill, M. H. Pesaran and an
anonymous referee for their helpful comments and suggestions. Baltagi was
funded by the Advanced Research Program, Texas Higher Education Board.
Kao was supported by a grant from the Chiang Ching-kuo Foundation for
International Scholarly Exchange.
NOTES
1. A collection of dynamic panel data routines can be found in: http://www.cemfi.es/~arellano/#dpd.
2. Chiang & Kao (2000) have recently put together a fairly comprehensive set of
subroutines, NPT 1.0, for studying nonstationary panel data. NPT 1.0 can be
downloaded from http://web.syr.edu/~cdkao.
3. Testing for cointegration in panel data by combining p-values tests is a
straightforward extension of the testing procedures in this section. For cointegration
tests, the relevant model is equation (15). We let $G_{iT_i}$ be a test for the null of no
cointegration and apply the same tests and asymptotic theory in this section.
4. Kiviet (1999) extends this derivation to the case of weakly exogenous variables
and examines to what degree this order of approximation is determined by the initial
conditions of the dynamic panel model.
5. Arellano (1989) found that using lagged differences of predetermined variables
as instruments is not recommended since it has a singularity point and very large
variances over a significant range of the parameter values.
6. See also Arellano & Bover (1995), chapter 8 of Baltagi (1995) and chapters 6 and
7 of Matyas & Sevestre (1996) for more details.
7. For a Hausman & Taylor (1981) type model, Metcalf (1996) shows that using
fewer instruments may lead to a more powerful Hausman specification test. Asymptotically, more instruments lead to more efficient estimators. However, the asymptotic bias
of the less efficient estimator will also be greater as the null hypothesis of no correlation
is violated. Metcalf argues that if the bias increases at the same rate as the variance (as
the null is violated) for the less efficient estimator, then the power of the Hausman test
will increase. This is due to the fact that the test statistic is linear in variance but
quadratic in bias.
8. See the growing literature on weak instruments by Nelson & Startz (1990),
Bekker (1994), Angrist & Krueger (1995), Bound, Jaeger & Baker (1995) and Staiger
& Stock (1997) to mention a few.
9. An alternative one-step method that achieves the same asymptotic efficiency as
robust GMM or LIML estimators is the maximum empirical likelihood estimation
method, see Imbens (1997). This maximizes a multinomial pseudo-likelihood function
subject to the orthogonality restrictions. These are invariant to normalization because
they are maximum likelihood estimators.
10. Maddala et al. (1997) also provide a unified treatment of classical, Bayes and
empirical Bayes procedures for estimating this model.
REFERENCES
Ahn, S. C., & Schmidt, P. (1995). Efficient Estimation of Models for Dynamic Panel Data. Journal
of Econometrics, 68, 5-27.
Ahn, S. C., & Schmidt, P. (1997). Efficient Estimation of Dynamic Panel Data Models: Alternative
Assumptions and Simplified Estimation. Journal of Econometrics, 76, 309-321.
Ahn, S. C., & Schmidt, P. (1999a). Modified Generalized Instrumental Variables Estimation of
Panel Data Models with Strictly Exogenous Instrumental Variables. In: C. Hsiao, K. Lahiri,
L. F. Lee & M. H. Pesaran (Eds.), Analysis of Panel Data and Limited Dependent Variable
Models (pp. 171-198). Cambridge: Cambridge University Press.
Ahn, S. C., & Schmidt, P. (1999b). Estimation of Linear Panel Data Models Using GMM. In:
Generalized Method of Moments Estimation (pp. 211-247). Cambridge: Cambridge
University Press.
Alonso-Borrego, C., & Arellano, M. (1999). Symmetrically Normalized Instrumental Variable
Estimation Using Panel Data. Journal of Business and Economic Statistics, 17, 36-49.
Alvarez, J., & Arellano, M. (1997). The Time Series and Cross-section Asymptotics of Dynamic
Panel Data Estimators. Working paper, CEMFI, Madrid.
Andersen, T. G., & Sørensen, R. E. (1996). GMM Estimation of a Stochastic Volatility Model: A
Monte Carlo Study. Journal of Business and Economic Statistics, 14, 328-352.
Anderson, T. W., & Hsiao, C. (1982). Formulation and Estimation of Dynamic Models Using
Panel Data. Journal of Econometrics, 18, 47-82.
Andersson, J., & Lyhagen, J. (1999). A Long Memory Panel Unit Root Test: PPP Revisited.
Working paper, Economics and Finance, No. 303, Stockholm School of Economics,
Sweden.
Angrist, J. D., & Krueger, A. B. (1995). Split Sample Instrumental Variable Estimates of Return
to Schooling. Journal of Business and Economic Statistics, 13, 225-235.
Arellano, M. (1989). A Note on the Anderson-Hsiao Estimator for Panel Data. Economics Letters,
31, 337-341.
Arellano, M., & Bond, S. (1991). Some Tests of Specification for Panel Data: Monte Carlo
Evidence and An Application to Employment Equations. Review of Economic Studies, 58,
277-297.
Arellano, M., & Bover, O. (1995). Another Look at the Instrumental Variables Estimation of Error-Component Models. Journal of Econometrics, 68, 29-51.
Baltagi, B. H. (1995). Econometric Analysis of Panel Data. Chichester: Wiley.
Baltagi, B. H., & Griffin, J. M. (1995). A Dynamic Demand Model for Liquor: The Case for
Pooling. Review of Economics and Statistics, 77, 545-553.
Baltagi, B. H., & Griffin, J. M. (1997). Pooled Estimators vs. Their Heterogeneous Counterparts
in the Context of Dynamic Demand for Gasoline. Journal of Econometrics, 77, 303-327.
Baltagi, B. H., Griffin, J. M., & Xiong, W. (2000). To Pool or Not to Pool: Homogeneous Versus
Heterogeneous Estimators Applied to Cigarette Demand. Review of Economics and
Statistics, 82, 117-126.
Banerjee, A. (1999). Panel Data Unit Roots and Cointegration: An Overview. Oxford Bulletin of
Bekker, P. A. (1994). Alternative Approximations to the Distributions of Instrumental Variables
Estimators. Econometrica, 62, 657-682.
Bernard, A., & Jones, C. (1996). Productivity Across Industries and Countries: Time Series Theory
and Evidence. Review of Economics and Statistics, 78, 135-146.
Bhargava, A., Franzini, L., & Narendranathan, W. (1982). Serial Correlation and Fixed Effects
Models. Review of Economic Studies, 49, 533-549.
Binder, M., Hsiao, C., & Pesaran, M. H. (2000). Estimation and Inference in Short Panel Vector
Autoregressions With Unit Roots and Cointegration. Working paper, Department of
Economics, University of Maryland.
Blundell, R. W., & Bond, S. (1998). Initial Conditions and Moment Restrictions in Dynamic Panel
Data Models. Journal of Econometrics, 87, 115–143.
Blundell, R. W., Bond, S., & Windmeijer, F. (2000). Estimation in Dynamic Panel Data Models:
Improving on the Performance of the Standard GMM Estimator. Advances in Econometrics,
15, forthcoming.
Boumahdi, R., & Thomas, A. (1991). Testing for Unit Roots Using Panel Data. Economics Letters,
37, 77–79.
Bound, J., Jaeger, D. A., & Baker, R. M. (1995). Problems with Instrumental Variables Estimation
When the Correlation Between the Instruments and the Endogenous Explanatory Variables
is Weak. Journal of the American Statistical Association, 90, 443–450.
Breitung, J. (2000). The Local Power of Some Unit Root Tests for Panel Data. Advances in
Econometrics, 15, forthcoming.
Breitung, J., & Meyer, W. (1994). Testing for Unit Roots in Panel Data: Are Wages on Different
Bargaining Levels Cointegrated? Applied Economics, 26, 353–361.
Canzoneri, M. B., Cumby, E. E., & Diba, B. (1999). Relative Labor Productivity and the Real
Exchange Rate in the Long Run: Evidence for a Panel of OECD Countries. Journal of
International Economics, 47, 245–266.
Chen, B., McCoskey, S., & Kao, C. (1999). Estimation and Inference of a Cointegrated Regression
in Panel Data: A Monte Carlo Study. American Journal of Mathematical and Management
Sciences, 19, 75–114.
Chiang, M. H., & Kao, C. (2000). Nonstationary Panel Time Series Using NPT 1.0 A User
Guide. Manuscript, Center for Policy Research, Syracuse University.
Choi, I. (1999a). Unit Root Tests for Panel Data. Working paper, Department of Economics,
Kookmin University, Korea.
Choi, I. (1999b). Asymptotic Analysis of a Nonstationary Error Component Model. Working paper,
Department of Economics, Kookmin University, Korea.
Choi, I. (1999c). Instrumental Variables Estimation of a Nearly Nonstationary Error Component
Model. Working paper, Department of Economics, Kookmin University, Korea.
Coakley, J., & Fuertes, A. M. (1997). New Panel Unit Root Tests of PPP. Economics Letters, 57,
17–22.
Coakley, J., Kulasi, F., & Smith, R. (1996). Current Account Solvency and the Feldstein-Horioka
Puzzle. Economic Journal, 106, 620–627.
Coe, D., & Helpman, E. (1995). International R&D Spillovers. European Economic Review, 39,
859–887.
Conley, T. G. (1999). GMM Estimation with Cross Sectional Dependence. Journal of
Econometrics, 92, 1–45.
Crepon, B., Kramarz, F., & Trognon, A. (1997). Parameters of Interest, Nuisance Parameters and
Orthogonality Conditions: An Application to Autoregressive Error Components Models.
Journal of Econometrics, 82, 135–156.
Culver, S. E., & Papell, D. H. (1997). Is There a Unit Root in the Inflation Rate? Evidence from
Sequential Break and Panel Data Model. Journal of Applied Econometrics, 35, 155–160.
Driscoll, J. C., & Kraay, A. C. (1998). Consistent Covariance Matrix Estimation with Spatially
Dependent Panel Data. Review of Economics and Statistics, 80, 549–560.
Evans, P., & Karras, G. (1996). Convergence Revisited. Journal of Monetary Economics, 37,
249–265.
Entorf, H. (1997). Random Walks with Drifts: Nonsense Regression and Spurious Fixed-Effect
Estimation. Journal of Econometrics, 80, 287–296.
Frankel, J. A., & Rose, A. K. (1996). A Panel Project on Purchasing Power Parity: Mean Reversion
Within and Between Countries. Journal of International Economics, 40, 209–224.
Funk, M. (1998). Trade and International R&D Spillovers Among OECD Countries. Working
paper, Department of Economics, St. Louis University, St. Louis.
Gerdtham, U. G., & Löthgren, M. (1998). On Stationarity and Cointegration of International
Health Expenditure and GDP. Working paper, Economics and Finance, No. 232,
Stockholm School of Economics, Sweden.
Griliches, Z., & Mairesse, J. (1998). Production Functions: The Search for Identification. In: S.
Strom (Ed.), Essays in Honour of Ragnar Frisch. Econometric Society Monograph Series,
Cambridge: Cambridge University Press.
Groen, J. J. J. (1999). The Monetary Exchange Rate Model as A Long-run Phenomenon. Journal
of International Economics, forthcoming.
Groen, J. J. J., & Kleibergen, F. (1999). Likelihood-Based Cointegration Analysis in Panels of
Vector Error Correction Models. Discussion paper 99055/4, Tinbergen Institute, The
Netherlands.
Hadri, K. (1999). Testing the Null Hypothesis of Stationarity Against the Alternative of a Unit Root
in Panel Data with Serially Correlated Errors. Manuscript, Department of Economics and
Accounting, University of Liverpool, United Kingdom.
Hahn, J. (1997). Efficient Estimation of Panel Data Models With Sequential Moment Restrictions.
Hahn, J. (1999). How Informative is the Initial Condition in the Dynamic Panel Model with Fixed
Effects? Journal of Econometrics, 93, 309–326.
Hall, S., Lazarova, S., & Urga, G. (1999). A Principal Components Analysis of Common
Stochastic Trends in Heterogeneous Panel Data: Some Monte Carlo Evidence. Oxford
Bulletin of Economics and Statistics, 61, 749–767.
Harris, D., & Inder, B. (1994). A Test of the Null Hypothesis of Cointegration. In: C. P. Hargreaves
(Ed.), Nonstationary Time Series Analysis and Cointegration. New York: Oxford University
Press.
Harris, R. D. F., & Tzavalis, E. (1999). Inference for Unit Roots in Dynamic Panels Where the
Time Dimension is Fixed. Journal of Econometrics, 91, 201–226.
Hausman, J. A., & Taylor, W. E. (1981). Panel Data and Unobservable Individual Effects.
Econometrica, 49, 1377–1398.
Hillier, G. H. (1990). On the Normalization of Structural Equations: Properties of Direction
Holtz-Eakin, D. (1988). Testing for Individual Effects in Autoregressive Models. Journal of
Hsiao, C., Pesaran, M. H., & Tahmiscioglu, K. (1999). Bayes Estimation of Short-run Coefficients
in Dynamic Panel Data Models. In: C. Hsiao, K. Lahiri, L. F. Lee & M. H. Pesaran (Eds.),
Analysis of Panel Data and Limited Dependent Variable Models (pp. 268–296).
Im, K. S., Ahn, S. C., Schmidt, P., & Wooldridge, J. M. (1999). Efficient Estimation of Panel Data
Models with Strictly Exogenous Explanatory Variables. Journal of Econometrics, 93,
177–201.
Im, K. S., Pesaran, M. H., & Shin, Y. (1997). Testing for Unit Roots in Heterogeneous Panels.
Manuscript, Department of Applied Economics, University of Cambridge, United
Kingdom.
Imbens, G. (1997). One-Step Estimators for Over-identified Generalized Method of Moments
Models. Review of Economic Studies, 64, 359–383.
Islam, N. (1995). Growth Empirics: A Panel Data Approach. Quarterly Journal of Economics, 110,
1127–1170.
Jimenez-Martin, S. (1998). On the Testing of Heterogeneity Effects in Dynamic Unbalanced Panel
Data Models. Economics Letters, 58, 157–163.
Johansen, S. (1995). Likelihood-Based Inference in Cointegrated Vector Autoregressive Models.
Oxford: Oxford University Press.
Jorion, P., & Sweeney, R. (1996). Mean Reversion in Real Exchange Rates: Evidence and
Implications for Forecasting. Journal of International Money and Finance, 15, 535–550.
Judson, R. A., & Owen, A. L. (1999). Estimating Dynamic Panel Data Models: A Guide for
Macroeconomists. Economics Letters, 65, 9–15.
Kao, C. (1999). Spurious Regression and Residual-Based Tests for Cointegration in Panel Data.
Kao, C., & Chiang, M. H. (2000). On the Estimation and Inference of a Cointegrated Regression
in Panel Data. Advances in Econometrics, 15, forthcoming.
Kao, C., & Chen, B. (1995). On the Estimation and Inference for Cointegration in Panel Data
when the Cross-Section and Time-Series Dimensions are Comparable. Manuscript, Center
for Policy Research, Syracuse University.
Kao, C., Chiang, M. H., & Chen, B. (1999). International R&D Spillovers: An Application of
Estimation and Inference in Panel Cointegration. Oxford Bulletin of Economics and
Karlsson, S., & Löthgren, M. (1999). On the Power and Interpretation of Panel Unit Root Tests.
Working paper, Economics and Finance, No. 299, Stockholm School of Economics,
Sweden.
Kauppi, H. (2000). Panel Data Limit Theory and Asymptotic Analysis of a Panel Regression with
Near Integrated Regressors. Advances in Econometrics, 15, forthcoming.
Keane, M. P., & Runkle, D. E. (1992). On the Estimation of Panel-data Models with Serial
Correlation When Instruments are Not Strictly Exogenous. Journal of Business and
Economic Statistics, 10, 1–9.
Kiviet, J. F. (1995). On Bias, Inconsistency and Efficiency of Some Estimators in Dynamic Panel
Kiviet, J. F. (1999). Expectations of Expansions for Estimators in a Dynamic Panel Data Model:
Some Results for Weakly Exogenous Regressors. In: C. Hsiao, K. Lahiri, L. F. Lee & M.
H. Pesaran (Eds.), Analysis of Panel Data and Limited Dependent Variable Models
(pp. 199–225). Cambridge: Cambridge University Press.
Larsson, R., Lyhagen, J., & Löthgren, M. (1998). Likelihood-Based Cointegration Tests In
Heterogeneous Panels. Working paper, Economics and Finance, No. 250, Stockholm
School of Economics, Sweden.
Lee, K., Pesaran, M. H., & Smith, R. (1997). Growth and Convergence in a Multi-Country
Empirical Stochastic Solow Model. Journal of Applied Econometrics, 12, 357–392.
Levin, A., & Lin, C. F. (1992). Unit Root Test in Panel Data: Asymptotic and Finite Sample
Properties. Discussion paper No. 92-93, University of California at San Diego.
Lothian, J. R. (1996). Multi-Country Evidence on the Behavior of Purchasing Power Parity Under
the Current Float. Journal of International Money and Finance, 16, 19–35.
MacDonald, R. (1996). Panel Unit Root Tests and Real Exchange Rates. Economics Letters, 50,
7–11.
Maddala, G. S. (1999). On the Use of Panel Data Methods with Cross Country Data. Annales
d'Économie et de Statistique, 55–56, 429–448.
Maddala, G. S., Srivastava, V. K., & Li, H. (1994). Shrinkage Estimators for the Estimation of
Short-run and Long-run Parameters From Panel Data Models. Working paper, Ohio State
University, Ohio.
Maddala, G. S., Trost, R. P., Li, H., & Joutz, F. (1997). Estimation of Short-run and Long-run
Elasticities of Energy Demand from Panel Data Using Shrinkage Estimators. Journal of
Business and Economic Statistics, 15, 90–100.
Maddala, G. S., & Wu, S. (1999). A Comparative Study of Unit Root Tests with Panel Data and
A New Simple Test. Oxford Bulletin of Economics and Statistics, 61, 631–652.
Maddala, G. S., Wu, S., & Liu, P. (2000). Do Panel Data Rescue Purchasing Power Parity (PPP)
Theory? In: J. Krishnakumar & E. Ronchetti (Eds.), Panel Data Econometrics: Future
Directions (pp. 35–51). Amsterdam: North-Holland.
Mátyás, L., & Sevestre, P. (Eds.) (1996). The Econometrics of Panel Data: A Handbook of Theory
and Applications. Dordrecht: Kluwer Academic Publishers.
McCoskey, S., & Kao, C. (1998). A Residual-Based Test of the Null of Cointegration in Panel
Data. Econometric Reviews, 17, 57–84.
McCoskey, S., & Kao, C. (1999a). Testing the Stability of a Production Function with
Urbanization as a Shift Factor: An Application of Non-Stationary Panel Data Techniques.
Oxford Bulletin of Economics and Statistics, 61, 671–690.
McCoskey, S., & Kao, C. (1999b). Comparing Panel Data Cointegration Tests with an Application
of the Twin Deficits Problems. Working paper, Center for Policy Research, Syracuse
University, New York.
McCoskey, S., & Selden, T. (1998). Health Care Expenditures and GDP: Panel Data Unit Root
Test Results. Journal of Health Economics, 17, 369–376.
Metcalf, G. E. (1996). Specification Testing in Panel Data with Instrumental Variables. Journal of
Moon, H. R., & Phillips, P. C. B. (1998). A Reinterpretation of the Feldstein-Horioka Regressions
from a Nonstationary Panel Viewpoint. Working paper, Department of Economics, Yale
University.
Moon, H. R., & Phillips, P. C. B. (1999). Maximum Likelihood Estimation in Panels with
Incidental Trends. Oxford Bulletin of Economics and Statistics, 61, 711–747.
Nelson, C., & Startz, R. (1990). The Distribution of the Instrumental Variables Estimator and Its
t-ratio When the Instrument Is A Poor One. Journal of Business, 63, S125-S140.
Nerlove, M. (1999). Properties of Alternative Estimators of Dynamic Panel Models: An Empirical
Analysis of Cross-country Data for the Study of Economic Growth. In: C. Hsiao, K. Lahiri,
L. F. Lee & M.H. Pesaran (Eds.), Analysis of Panel Data and Limited Dependent Variable
Models (pp. 136–170). Cambridge: Cambridge University Press.
Nickell, S. (1981). Biases in Dynamic Models with Fixed Effects. Econometrica, 49, 1417–1426.
O'Connell, P. G. J. (1998). The Overvaluation of Purchasing Power Parity. Journal of
Oh, K. Y. (1996). Purchasing Power Parity and Unit Roots Tests Using Panel Data. Journal of
International Money and Finance, 15, 405–418.
Papell, D. (1997). Searching for Stationarity: Purchasing Power Parity Under the Current Float.
Journal of International Economics, 43, 313–332.
Pedroni, P. (1996). Fully Modified OLS for Heterogeneous Cointegrated Panels and the Case of
Purchasing Power Parity. Working paper, Department of Economics, Indiana University.
Pedroni, P. (1997a). Panel Cointegration: Asymptotic and Finite Sample Properties of Pooled Time
Series Tests with an Application to the PPP Hypothesis. Working paper, Department of
Economics, Indiana University.
Pedroni, P. (1997b). Cross Sectional Dependence in Cointegration Tests of Purchasing Power
Parity in Panels. Working paper, Department of Economics, Indiana University.
Pedroni, P. (1999). Critical Values for Cointegration Tests in Heterogeneous Panels with Multiple
Regressors. Oxford Bulletin of Economics and Statistics, 61, 653–678.
Pedroni, P. (2000). Testing for Convergence to Common Steady States in Nonstationary
Heterogeneous Panels. Working paper, Department of Economics, Indiana University.
Pesaran, M. H., & Smith, R. (1995). Estimating Long-run Relationships From Dynamic
Heterogeneous Panels. Journal of Econometrics, 68, 79–113.
Pesaran, M. H., Shin, Y., & Smith, R. (1999). Pooled Mean Group Estimation of Dynamic
Heterogeneous Panels. Journal of the American Statistical Association, 94, 621–634.
Pesaran, M. H., & Zhao, Z. (1999). Bias Reduction in Estimating Long-run Relationships From
Dynamic Heterogeneous Panels. In: C. Hsiao, K. Lahiri, L. F. Lee & M. H. Pesaran (Eds.),
Analysis of Panels and Limited Dependent Variable Models (pp. 297–322). Cambridge:
Cambridge University Press.
Phillips, P. C. B., & Hansen, B. E. (1990). Statistical Inference in Instrumental Variables
Regression with I(1) Processes. Review of Economic Studies, 57, 99–125.
Phillips, P. C. B., & Moon, H. (1999a). Linear Regression Limit Theory for Nonstationary Panel
Data. Econometrica, 67, 1057–1111.
Phillips, P. C. B., & Moon, H. (1999b). Nonstationary Panel Data Analysis: An Overview of Some
Recent Developments. Econometric Reviews, forthcoming.
Phillips, P. C. B., & Ouliaris, S. (1990). Asymptotic Properties of Residual Based Tests for
Cointegration. Econometrica, 58, 165–193.
Quah, D. (1994). Exploiting Cross Section Variation for Unit Root Inference in Dynamic Data.
Economics Letters, 44, 9–19.
Quah, D. (1996). Empirics for Economic Growth and Convergence. European Economic Review,
40, 1353–1375.
Robertson, D., & Symons, J. (1992). Some Strange Properties of Panel Data Estimators. Journal
of Applied Econometrics, 7, 175–189.
Saikkonen, P. (1991). Asymptotically Efficient Estimation of Cointegrating Regressions.
Econometric Theory, 58, 1–21.
Sala-i-Martin, X. (1996). The Classical Approach to Convergence Analysis. Economic Journal,
106, 1019–1036.
Schmidt, P., Ahn, S. C. & Wyhowski, D. (1992). Comment. Journal of Business and Economic
Shin, Y. (1994). A Residual Based Test of the Null of Cointegration Against the Alternative of No
Cointegration. Econometric Theory, 10, 91–115.
Staiger, D., & Stock, J. H. (1997). Instrumental Variables Regression With Weak Instruments.
Stock, J. (1993). A Simple Estimator of Cointegrating Vectors in Higher Order Integrated Systems.
Stock, J., & Watson, M. (1993). A Simple Estimator of Cointegrating Vectors in Higher Order
Integrated Systems. Econometrica, 61, 783–820.
Tauchen, G. (1986). Statistical Properties of Generalized Method of Moments Estimators of
Structural Parameters Obtained From Financial Market Data. Journal of Business and
Economic Statistics, 4, 397–416.
Wansbeek, T. J., & Bekker, P. (1996). On IV, GMM and ML in a Dynamic Panel Data Model.
Economics Letters, 51, 145–152.
Wansbeek, T. J., & Knaap, T. (1999). Estimating a Dynamic Panel Data Model with Heterogenous
Trends. Annales d'Économie et de Statistique, 55–56, 331–349.
Wooldridge, J. M. (1997). Multiplicative Panel Data Models Without the Strict Exogeneity
Assumption. Econometric Theory, 13, 667–678.
Wu, S., & Yin, Y. (1999). Tests for Cointegration in Heterogeneous Panel: A Monte Carlo Study.
Working paper, Department of Economics, State University of New York at Buffalo, New
York.
Wu, Y. (1996). Are Real Exchange Rates Nonstationary? Evidence from a Panel Data Set. Journal
of Money, Credit and Banking, 28, 54–63.
Ziliak, J. P. (1997). Efficient Estimation with Panel Data When Instruments are Predetermined: An
Empirical Comparison of Moment-condition Estimators. Journal of Business and
ESTIMATION IN DYNAMIC PANEL
DATA MODELS: IMPROVING ON THE
PERFORMANCE OF THE STANDARD
GMM ESTIMATOR
Richard Blundell, Stephen Bond and Frank Windmeijer
ABSTRACT
This chapter reviews developments to improve on the poor performance of
the standard GMM estimator for highly autoregressive panel series. It
considers the use of the system GMM estimator that relies on relatively
mild restrictions on the initial condition process. This system GMM
estimator encompasses the GMM estimator based on the non-linear
moment conditions available in the dynamic error components model and
has substantial asymptotic efficiency gains. Simulations that include
weakly exogenous covariates find large finite sample biases and very low
precision for the standard first differenced estimator. The use of the system
GMM estimator not only greatly improves the precision but also greatly
reduces the finite sample bias. An application to panel production function
data for the U.S. is provided and confirms these theoretical and
experimental findings.
ISBN: 0-7623-0688-2
1. INTRODUCTION
Much of the recent literature on dynamic panel data estimation has focused on
providing optimal linear Generalised Method of Moments (GMM) estimators
under relatively weak auxiliary assumptions about the exogeneity of the
covariate processes and the properties of the heterogeneity and error term
processes. A standard approach is to first-difference the equation to remove
permanent unobserved heterogeneity, and to use lagged levels of the series as
instruments for the predetermined and endogenous variables in first-differences
(see Anderson & Hsiao (1981), Holtz-Eakin, Newey & Rosen (1988) and
Arellano & Bond (1991)). However, in dynamic panel data models where the
series are highly autoregressive and the number of time series observations is
moderately small, this standard GMM estimator has been found to have large
finite sample bias and poor precision in simulation studies (see the
experimental evidence and theoretical discussions in Ahn & Schmidt (1995)
and Alonso-Borrego & Arellano (1999), for example).
The poor performance of the standard GMM panel data estimator is also
reflected in empirical experience with estimation on relatively short panels with
highly persistent data. To quote from the extensive review of production
function estimation by Griliches & Mairesse (1998), one of the original
applications for panel data estimation: "In empirical practice, the application
of panel methods to micro-data produced rather unsatisfactory results: low and
often insignificant capital coefficients and unreasonably low estimates of
returns to scale." One simple explanation of these findings in the production
function context is that lagged levels of the series provide weak instruments for
first-differenced variables in this case (see Blundell & Bond (2000)).
One response to these findings has been to consider the use of further
moment conditions that have improved properties for the estimates of the
parameters of interest. For example, Ahn & Schmidt (1995) consider the non-linear moment conditions implied by the standard error components
formulation and show that asymptotic variance ratios can be considerably
improved. Blundell & Bond (1998) consider alternative estimators that require
further restrictions on the initial conditions process, designed to improve the
properties of the standard first-differenced instrumental variables estimator.
This also provides the motivation for the discussion in this chapter. The idea
is to consider the performance of a system GMM estimator that relies on
relatively mild restrictions on the initial condition process to improve the
performance of the GMM estimator in the dynamic panel data context. The
material presented draws extensively from the existing literature. For example,
Arellano & Bover (1995) and Blundell & Bond (1998) show that mean
stationarity in an AR(1) panel data model is sufficient to justify the use of
lagged differences of the dependent variable as instruments for equations in
levels, in addition to lagged levels as instruments for equations in first-differences. This result naturally extends to models with weakly exogenous
covariates. The Monte Carlo simulations and asymptotic variance calculations
reported in this paper show that this extended GMM estimator can offer
considerable efficiency gains in the situations where the standard first-differenced GMM estimator performs poorly. Given this restriction on the
initial conditions, the system GMM estimator is also shown to encompass the
GMM estimator based on the non-linear moment conditions available in the
dynamic error components model (see Ahn & Schmidt (1995)). The system
GMM estimator has substantial asymptotic efficiency gains relative to this non-linear GMM estimator, and these are reflected in their finite sample
properties.
The chapter is organised in the following way. The next section reviews the
standard error components structure for a linear dynamic panel data model and
lays out the underlying assumptions. Recalling that Within Groups, GLS and
OLS on the levels and first-differenced models all suffer from bias even when
the cross-section dimension is large, this section also briefly considers the
biases that occur for standard panel data estimators in dynamic models. Section
3 then presents the linear GMM estimator for this model that uses lagged
information to instrument current differences in a first-differenced specification. The following section then outlines the problem of weak instruments in
this case. Following the discussion in Ahn & Schmidt (1995), Section 5
considers the use of further non-linear moment conditions that are implied by
the model outlined in Section 2. Section 6 derives a linear moment restriction
for the levels model using initial condition restrictions and this is then
incorporated into the full system GMM estimator. Asymptotic variance
comparisons among these various GMM estimators are given in Section 8. The
detailed discussion in these earlier sections uses an AR(1) model and the
extension to a multivariate setting is presented in Section 9. Finally, before
moving to the Monte Carlo results and empirical application, over-identification tests are reviewed.
The Monte Carlo results presented in Section 11 are the first in the literature
to consider the properties of these GMM estimators in dynamic models with
weakly exogenous regressors. As this is perhaps the most common case in
empirical applications, these results have important bearing on applied work.
The analysis finds both a large bias and very low precision for the standard
first-differenced estimator when the individual series are highly autoregressive.
The use of the system GMM estimator not only greatly improves the precision
but also greatly reduces the finite sample bias. Exploiting the non-linear
moment conditions also provides significant gains compared to the standard
first-differenced GMM estimator, but these gains are much less dramatic than
those provided by the system GMM estimator when the initial conditions
restriction is valid.
The empirical application returns to the Griliches and Mairesse discussion.
The application uses production function data for the U.S. and confirms the
Griliches and Mairesse findings for the capital and labor coefficients in a Cobb-Douglas model. Using the standard first-differenced GMM estimator, the
estimated coefficient on capital is very low and all coefficient estimates have
poor precision. Constant returns to scale is easily rejected. Moreover, an
examination of the individual series suggests that they are highly autoregressive,
thus hinting at a weak instruments problem for standard GMM on these data.
These production function results are improved by using the system estimator.
The capital coefficient is now more precise and takes a reasonable value and
constant returns to scale is not rejected. These Monte Carlo and empirical
results indicate that a careful examination of the original series and use of the
system GMM estimator can overcome many of the disappointing features of
the standard GMM estimator in the context of highly persistent series.
2. DYNAMIC MODELS AND THE BIASES FROM
STANDARD PANEL DATA ESTIMATORS
To analyse the properties of estimators of the parameters in linear dynamic
panel data models we consider an autoregressive panel data model of the form
\[ y_{it} = \alpha y_{i,t-1} + \beta' x_{it} + u_{it} \tag{2.1} \]
\[ u_{it} = \eta_i + v_{it} \tag{2.2} \]
for $i = 1, \ldots, N$ and $t = 2, \ldots, T$, where $\eta_i + v_{it}$ is the usual error components
decomposition of the error term; $N$ is large, $T$ is fixed and $|\alpha| < 1$.¹ This model
specification is sufficient to cover most of the standard cases encountered in
linear dynamic panel applications. Allowing the inclusion of $x_{i,t-1}$ provides the
autoregressive panel data model
\[ y_{it} = \alpha y_{i,t-1} + \beta_1' x_{it} + \beta_2' x_{i,t-1} + \eta_i + v_{it} \]
which has the corresponding common factor restricted ($\beta_2 = -\alpha\beta_1$) form
\[ y_{it} = \beta_1' x_{it} + f_i + \xi_{it}, \]
with $\xi_{it} = \alpha\xi_{i,t-1} + v_{it}$ and $\eta_i = (1 - \alpha) f_i$.
In our Monte Carlo study and application to panel data production function
equations presented in Sections 11 and 12 we allow for the inclusion of $x_{it}$
regressors, but for the evaluation of the various estimators we use an AR(1)
model with unobserved individual-specific effects
\[ y_{it} = \alpha y_{i,t-1} + u_{it}, \qquad u_{it} = \eta_i + v_{it} \tag{2.3} \]
for $i = 1, \ldots, N$ and $t = 2, \ldots, T$.² At the outset we will assume that $\eta_i$ and $v_{it}$
have the familiar error components structure in which
\[ E(\eta_i) = 0, \quad E(v_{it}) = 0, \quad E(v_{it}\eta_i) = 0 \quad \text{for } i = 1, \ldots, N \text{ and } t = 2, \ldots, T \tag{2.4} \]
and
\[ E(v_{it}v_{is}) = 0 \quad \text{for } i = 1, \ldots, N \text{ and } t \neq s. \tag{2.5} \]
In addition there is the standard assumption concerning the initial conditions $y_{i1}$
(see Ahn & Schmidt (1995), for example)
\[ E(y_{i1}v_{it}) = 0 \quad \text{for } i = 1, \ldots, N \text{ and } t = 2, \ldots, T. \tag{2.6} \]
These standard assumptions (2.4), (2.5) and (2.6) imply moment restrictions
that are sufficient to (identify and) estimate $\alpha$ for $T \geq 3$.³
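Further restrictions on the initial conditions define a mean stationary process
as
\[ y_{i1} = \frac{\eta_i}{1 - \alpha} + \varepsilon_{i1} \quad \text{for } i = 1, \ldots, N \tag{2.7} \]
and
\[ E(\varepsilon_{i1}) = E(\eta_i\varepsilon_{i1}) = 0 \quad \text{for } i = 1, \ldots, N, \tag{2.8} \]
and a covariance stationary process by further specifying
\[ E(v_{it}^2) = \sigma_v^2 \quad \text{for } i = 1, \ldots, N \text{ and } t = 2, \ldots, T \]
\[ E(\varepsilon_{i1}^2) = \frac{\sigma_v^2}{1 - \alpha^2} \quad \text{for } i = 1, \ldots, N. \]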
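As a hedged illustration of the data generating process implied by (2.3) together with the covariance stationary initial condition (2.7)-(2.8), the following sketch draws a panel in plain Python. The function name `simulate_panel`, the normality of all shocks and the default parameter values are assumptions made for illustration only; the chapter itself restricts only first and second moments.

```python
import random


def simulate_panel(n, t_periods, alpha, sigma_eta=1.0, sigma_v=1.0, seed=0):
    """Draw y_i1, ..., y_iT for n individuals from the AR(1) error
    components model with a covariance stationary start.

    Normal shocks are an illustrative assumption, not part of the model.
    """
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        eta = rng.gauss(0.0, sigma_eta)
        # Initial deviation eps_i1 has variance sigma_v^2 / (1 - alpha^2).
        eps1 = rng.gauss(0.0, sigma_v / (1.0 - alpha ** 2) ** 0.5)
        y = [eta / (1.0 - alpha) + eps1]
        for _ in range(t_periods - 1):
            # y_it = alpha * y_{i,t-1} + eta_i + v_it
            y.append(alpha * y[-1] + eta + rng.gauss(0.0, sigma_v))
        panel.append(y)
    return panel
```

Under these assumptions every period has the same variance, $\sigma_\eta^2/(1-\alpha)^2 + \sigma_v^2/(1-\alpha^2)$, which gives a simple way to check a simulated panel.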
For completeness and to conclude this brief outline of the dynamic error
components model, we consider the biases from the standard panel data
estimators in this model. We consider here the biases found under covariance
stationarity (for more details see Baltagi (1995) and Hsiao (1986)).
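The asymptotic bias of the simple OLS estimator for $\alpha$ in model (2.3) is
given by
\[ \operatorname{plim}(\hat{\alpha}_{OLS} - \alpha) = (1 - \alpha)\,\frac{\sigma_\eta^2/\sigma_v^2}{\sigma_\eta^2/\sigma_v^2 + k}, \quad \text{with } k = \frac{1 - \alpha}{1 + \alpha}, \]
where $\sigma_\eta^2 = E(\eta_i^2)$, and therefore the OLS estimator is biased upwards, with
$\alpha < \operatorname{plim}(\hat{\alpha}_{OLS}) < 1$.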
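The OLS bias expression can be evaluated directly. The short sketch below does so; the function name `ols_plim` and the argument `var_ratio` (standing for $\sigma_\eta^2/\sigma_v^2$) are labels chosen here for illustration and do not appear in the chapter.

```python
def ols_plim(alpha, var_ratio):
    """Probability limit of pooled OLS in the AR(1) model under
    covariance stationarity; var_ratio = sigma_eta^2 / sigma_v^2."""
    k = (1.0 - alpha) / (1.0 + alpha)
    return alpha + (1.0 - alpha) * var_ratio / (var_ratio + k)


# Illustrative grid: the plim lies strictly between alpha and 1
# whenever the individual effects have positive variance.
for alpha in (0.0, 0.5, 0.8, 0.9):
    print(alpha, ols_plim(alpha, 1.0))
```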
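The asymptotic bias of the Within Groups estimator for $\alpha$ has been
documented by Nickell (1981) and is given by
\[ \operatorname{plim}(\hat{\alpha}_{WG} - \alpha) = -\,\frac{\dfrac{1 + \alpha}{T - 1}\left(1 - \dfrac{1}{T}\,\dfrac{1 - \alpha^{T}}{1 - \alpha}\right)}{1 - \dfrac{2\alpha}{(1 - \alpha)(T - 1)}\left(1 - \dfrac{1 - \alpha^{T}}{T(1 - \alpha)}\right)}, \]
and so, when $\alpha > 0$, $\operatorname{plim}(\hat{\alpha}_{WG}) < \alpha$.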
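A small sketch of the Within Groups expression follows. It assumes the standard Nickell (1981) form of the formula, with `T` entering exactly as it does in that expression; the function name `within_groups_plim` is hypothetical.

```python
def within_groups_plim(alpha, T):
    """Probability limit of the Within Groups estimator using the
    Nickell (1981) bias expression (T as it appears in that formula)."""
    ratio = (1.0 - alpha ** T) / (T * (1.0 - alpha))
    numerator = (1.0 + alpha) / (T - 1) * (1.0 - ratio)
    denominator = 1.0 - 2.0 * alpha / ((1.0 - alpha) * (T - 1)) * (1.0 - ratio)
    return alpha - numerator / denominator


# The downward bias shrinks roughly like (1 + alpha) / (T - 1) as T grows.
for T in (2, 3, 5, 10, 50):
    print(T, within_groups_plim(0.5, T))
```

For `T = 2` the expression collapses to a bias of $-(1+\alpha)/2$, the same value as the first-differenced OLS bias given next, which is a convenient sanity check.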
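When the model is transformed into first-differences to eliminate the
unobserved individual heterogeneity component $\eta_i$,
\[ \Delta y_{it} = \alpha\,\Delta y_{i,t-1} + \Delta u_{it}, \]
the asymptotic bias of the OLS estimator is given by
\[ \operatorname{plim}(\hat{\alpha}_{OLSd} - \alpha) = -\,\frac{1 + \alpha}{2}, \]
and so $\operatorname{plim}(\hat{\alpha}_{OLSd}) = \dfrac{\alpha - 1}{2} < 0$.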
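The first-differenced OLS result can be checked by simulation. The sketch below is only an illustration under assumed normal shocks, unit variances and a covariance stationary start; the function name `fd_ols_estimate` and all default values are choices made here, not taken from the chapter.

```python
import random


def fd_ols_estimate(n=20000, t_periods=5, alpha=0.5, seed=1):
    """Pooled OLS of Delta y_it on Delta y_{i,t-1} in a simulated
    covariance stationary AR(1) panel (normal shocks assumed)."""
    rng = random.Random(seed)
    sxy = 0.0
    sxx = 0.0
    for _ in range(n):
        eta = rng.gauss(0.0, 1.0)
        y = [eta / (1.0 - alpha) + rng.gauss(0.0, (1.0 - alpha ** 2) ** -0.5)]
        for _ in range(t_periods - 1):
            y.append(alpha * y[-1] + eta + rng.gauss(0.0, 1.0))
        dy = [y[t] - y[t - 1] for t in range(1, t_periods)]
        for t in range(1, len(dy)):
            sxy += dy[t] * dy[t - 1]
            sxx += dy[t - 1] ** 2
    return sxy / sxx


estimate = fd_ols_estimate()
print("simulated:", estimate, "theoretical plim:", (0.5 - 1.0) / 2.0)
```

With a large cross-section the simulated coefficient should sit close to $(\alpha - 1)/2$, which is negative even though the true $\alpha$ is positive.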
3. A FIRST-DIFFERENCED GMM ESTIMATOR

3.1. The Standard Moment Conditions
In the absence of any further restrictions on the process generating the initial
conditions, the autoregressive error components model (2.3)–(2.6) implies the
following $m_d = 0.5(T - 1)(T - 2)$ orthogonality conditions which are linear in
the $\alpha$ parameter
\[ E(y_{i,t-s}\,\Delta u_{it}) = 0 \quad \text{for } t = 3, \ldots, T \text{ and } 2 \leq s \leq t - 1, \tag{3.1} \]
where $\Delta u_{it} = u_{it} - u_{i,t-1}$. These depend only on the assumed absence of serial
correlation in the time varying disturbances $v_{it}$, together with the restriction
(2.6).
The moment restrictions in (3.1) can be expressed more compactly as
\[ E(Z_{di}'\,\Delta u_i) = 0, \]
where $Z_{di}$ is the $(T - 2) \times m_d$ matrix given by
\[ Z_{di} = \begin{bmatrix} y_{i1} & 0 & 0 & \cdots & 0 & \cdots & 0 \\ 0 & y_{i1} & y_{i2} & \cdots & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & y_{i1} & \cdots & y_{i,T-2} \end{bmatrix}, \]
and $\Delta u_i$ is the $(T - 2)$ vector $(\Delta u_{i3}, \Delta u_{i4}, \ldots, \Delta u_{iT})'$.
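The block structure of the instrument matrix is easy to get wrong by one period, so a minimal sketch of its construction may help. The helper name `build_zd` is hypothetical; it takes one individual's series $(y_{i1}, \ldots, y_{iT})$ as a Python list and returns the $(T-2) \times m_d$ matrix as a list of rows.

```python
def build_zd(y):
    """Instrument matrix for the first-differenced equations of one
    individual: the row for period t holds y_i1, ..., y_{i,t-2}."""
    T = len(y)
    m_d = (T - 1) * (T - 2) // 2
    rows = []
    col = 0
    for t in range(3, T + 1):      # differenced equation for period t
        row = [0.0] * m_d
        n_inst = t - 2             # instruments available at period t
        row[col:col + n_inst] = y[:n_inst]
        col += n_inst
        rows.append(row)
    return rows


# With T = 4 there are m_d = 3 moment conditions:
# period 3 uses y_i1; period 4 uses y_i1 and y_i2.
for row in build_zd([1.0, 2.0, 3.0, 4.0]):
    print(row)
```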
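The Generalised Method of Moments (GMM) estimator based on these
moment conditions minimises the quadratic distance $\Delta u' Z_d W_N Z_d' \Delta u$ for some
metric $W_N$, where $Z_d'$ is the $m_d \times N(T - 2)$ matrix $(Z_{d1}', Z_{d2}', \ldots, Z_{dN}')$ and $\Delta u'$
is the $N(T - 2)$ vector $(\Delta u_1', \Delta u_2', \ldots, \Delta u_N')$. This gives the GMM estimator for
$\alpha$ as
\[ \hat{\alpha}_d = (\Delta y_{-1}' Z_d W_N Z_d' \Delta y_{-1})^{-1}\, \Delta y_{-1}' Z_d W_N Z_d' \Delta y, \]
where $\Delta y_i'$ is the $(T - 2)$ vector $(\Delta y_{i3}, \Delta y_{i4}, \ldots, \Delta y_{iT})$, $\Delta y_{i,-1}'$ is the $(T - 2)$
vector $(\Delta y_{i2}, \Delta y_{i3}, \ldots, \Delta y_{i,T-1})$, and $\Delta y$ and $\Delta y_{-1}$ are stacked across individuals in the same way as $\Delta u$.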
Alternative choices for the weights $W_N$ give rise to a set of GMM estimators
based on the moment conditions in (3.1), all of which are consistent for large
$N$ and finite $T$, but which differ in their asymptotic efficiency.⁴ In general the
optimal weights are given by
\[ W_N = \left( \frac{1}{N} \sum_{i=1}^{N} Z_{di}'\, \widehat{\Delta u}_i\, \widehat{\Delta u}_i'\, Z_{di} \right)^{-1} \tag{3.2} \]
where $\widehat{\Delta u}_i$ are residuals from an initial consistent estimator. We refer to this as
the two-step GMM estimator.⁵ In the absence of any additional knowledge
about the process for the initial conditions, this estimator is asymptotically
efficient in the class of estimators based on the linear moment conditions (3.1)
(see Hansen (1982) and Chamberlain (1987)).
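The estimator and the two-step weights in (3.2) can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the authors' implementation: the simulated data use normal shocks and a covariance stationary start, the function names are hypothetical, and the first-step weight matrix uses the familiar Arellano & Bond (1991) choice built from a matrix with 2 on the diagonal and -1 on the first off-diagonals, which is not spelled out in this section.

```python
import numpy as np


def simulate(n, T, alpha, rng):
    """Covariance stationary AR(1) panel with unit variances (assumed)."""
    eta = rng.normal(size=n)
    y = np.empty((n, T))
    y[:, 0] = eta / (1 - alpha) + rng.normal(size=n) / np.sqrt(1 - alpha ** 2)
    for t in range(1, T):
        y[:, t] = alpha * y[:, t - 1] + eta + rng.normal(size=n)
    return y


def instruments(y_i):
    """Z_di: row r (period t = r + 3) holds y_i1, ..., y_{i,t-2}."""
    T = y_i.size
    Z = np.zeros((T - 2, (T - 1) * (T - 2) // 2))
    col = 0
    for r in range(T - 2):
        Z[r, col:col + r + 1] = y_i[:r + 1]
        col += r + 1
    return Z


def gmm_ar1(y):
    """Return (one-step, two-step) first-differenced GMM estimates."""
    n, T = y.shape
    dy = np.diff(y, axis=1)                  # Delta y_i2, ..., Delta y_iT
    dep, lag = dy[:, 1:], dy[:, :-1]         # periods t = 3, ..., T
    Z = [instruments(y[i]) for i in range(n)]
    zx = sum(Z[i].T @ lag[i] for i in range(n))
    zy = sum(Z[i].T @ dep[i] for i in range(n))

    def solve(W):
        return (zx @ W @ zy) / (zx @ W @ zx)

    # First step: assumed Arellano-Bond weights based on H.
    H = 2 * np.eye(T - 2) - np.eye(T - 2, k=1) - np.eye(T - 2, k=-1)
    W1 = np.linalg.inv(sum(Zi.T @ H @ Zi for Zi in Z) / n)
    a1 = solve(W1)

    # Second step: weights (3.2) from first-step residuals.
    res = dep - a1 * lag
    moments = [Z[i].T @ res[i] for i in range(n)]
    W2 = np.linalg.inv(sum(np.outer(g, g) for g in moments) / n)
    return a1, solve(W2)


y = simulate(4000, 5, 0.3, np.random.default_rng(0))
print(gmm_ar1(y))
```

With a moderate $\alpha$ and a large cross-section both estimates should be close to the true value; the sections that follow explain why this deteriorates as $\alpha$ approaches one.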
3.2. Homoskedasticity
Ahn & Schmidt (1995) show that additional linear moment conditions are
available if the $v_{it}$ disturbances are homoskedastic through time, i.e. if
\[ E(v_{it}^2) = \sigma_i^2 \quad \text{for } t = 2, \ldots, T. \tag{3.3} \]
This implies $T - 3$ orthogonality restrictions of the form
\[ E(y_{i,t-2}\,\Delta u_{i,t-1} - y_{i,t-1}\,\Delta u_{it}) = 0 \quad \text{for } t = 4, \ldots, T \tag{3.4} \]
and allows a further $T - 3$ columns to be added to the instrument matrix $Z_{di}$.
The additional columns $Z_{hi}$ are
\[ Z_{hi} = \begin{bmatrix} y_{i2} & 0 & 0 & \cdots & 0 \\ -y_{i3} & y_{i3} & 0 & \cdots & 0 \\ 0 & -y_{i4} & y_{i4} & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & y_{i,T-2} \\ 0 & 0 & 0 & \cdots & -y_{i,T-1} \end{bmatrix}. \]
Calculation of the one-step and two-step GMM estimators then proceeds
exactly as described above.
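To make the layout of the additional columns concrete, the sketch below builds them for one individual directly from the moment conditions (3.4): the column for period $t$ places $y_{i,t-2}$ in the row of the period $t-1$ equation and $-y_{i,t-1}$ in the row of the period $t$ equation. The helper name `build_zh` is hypothetical.

```python
def build_zh(y):
    """Extra instrument columns implied by homoskedasticity through time.

    y = [y_i1, ..., y_iT]; returns a (T-2) x (T-3) matrix as a list of
    rows, with rows indexed by the differenced equations t = 3, ..., T.
    """
    T = len(y)
    rows = [[0.0] * (T - 3) for _ in range(T - 2)]
    for j in range(T - 3):             # column j corresponds to t = j + 4
        rows[j][j] = y[j + 1]          # y_{i,t-2} multiplies Delta u_{i,t-1}
        rows[j + 1][j] = -y[j + 2]     # -y_{i,t-1} multiplies Delta u_{it}
    return rows


# T = 5 gives T - 3 = 2 extra columns.
for row in build_zh([1.0, 2.0, 3.0, 4.0, 5.0]):
    print(row)
```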
4. WEAK INSTRUMENTS
The instruments used in the standard first-differenced GMM estimator become
less informative in two important cases. First, as the value of the autoregressive
parameter $\alpha$ increases towards unity; and second, as the variance of the
individual effects $\eta_i$ increases relative to the variance of $v_{it}$. To examine this
further consider the case with $T = 3$. In this case, the moment conditions
corresponding to the standard GMM estimator reduce to a single orthogonality
condition. The corresponding method of moments estimator reduces to a
simple two stage least squares (2SLS) estimator, with first stage (instrumental
variable) regression
\[ \Delta y_{i2} = \pi_d\, y_{i1} + r_i \quad \text{for } i = 1, \ldots, N. \]
For sufficiently high autoregressive parameter $\alpha$ or for sufficiently high relative
variance of the individual effects, the least squares estimate of the reduced form
coefficient $\pi_d$ can be made arbitrarily close to zero. In this case the instrument
$y_{i1}$ is only weakly correlated with $\Delta y_{i2}$. To see this notice that the model (2.3)
implies that
\[ \Delta y_{i2} = (\alpha - 1)\, y_{i1} + \eta_i + v_{i2} \quad \text{for } i = 1, \ldots, N. \tag{4.1} \]
The least squares estimator of ( 1) in (4.1) is generally biased upwards,

towards zero, since we expect E(yi1i) > 0. Assuming covariance stationarity
d is given by
and letting
2 = var(i) and
2v = var(vit), the plim of
plim
d = ( 1)
+k
2

2
v
; with k =
1
.
1+
(4.2)
The bias term effectively scales the estimated coefficient on the instrumental
variable yi1 toward zero. We find that plim
d 0 as 1 or as (
2/
2v ) ,
which are the cases in which the first stage F-statistic is Op(1). A graph showing
both plim d and 1 against is given in Fig. 1, for
2 =
2v , T = 3.
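The limit in (4.2) can be checked by simulation. The sketch below is an illustration only: it assumes normally distributed errors, normalises the variance of $v_{it}$ to one, draws $y_{i1}$ from the covariance stationary distribution, and compares the least squares slope from the first stage regression with the closed form. The function names are hypothetical.

```python
import math
import random

def plim_pi_d(alpha, var_ratio):
    """Closed form in (4.2); var_ratio is sigma_eta^2 / sigma_v^2."""
    k = (1.0 - alpha) / (1.0 + alpha)
    return (alpha - 1.0) * k / (var_ratio + k)

def simulated_pi_d(alpha, var_ratio, n, seed=0):
    """Least squares slope from regressing Delta y_i2 on y_i1 (no intercept)."""
    rng = random.Random(seed)
    sd_eta = math.sqrt(var_ratio)                  # sigma_v is normalised to 1
    sd_start = 1.0 / math.sqrt(1.0 - alpha ** 2)   # stationary s.d. of the AR part
    sxy = sxx = 0.0
    for _ in range(n):
        eta = rng.gauss(0.0, sd_eta)
        y1 = eta / (1.0 - alpha) + rng.gauss(0.0, sd_start)
        y2 = alpha * y1 + eta + rng.gauss(0.0, 1.0)
        sxy += y1 * (y2 - y1)
        sxx += y1 * y1
    return sxy / sxx

for a in (0.0, 0.5, 0.8, 0.95):
    print(a, plim_pi_d(a, 1.0))
print(simulated_pi_d(0.5, 1.0, n=100000))
```

The closed form shrinks towards zero much faster than $\alpha - 1$ does, which is the sense in which the instrument $y_{i1}$ becomes weak.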
Fig. 1. plim $\hat{\pi}_d$ and $\alpha - 1$, $\sigma_\eta^2 = \sigma_v^2$, T = 3. Source: Blundell & Bond (1998).

We are interested in inferences using this first-differenced instrumental
variable (IV) estimator when $\pi_d$ is local to zero, that is where the
instrument $y_{i1}$ is only weakly correlated with $\Delta y_{i2}$. Following
Nelson & Startz (1990a, b) and Staiger & Stock (1997) we characterise this
problem of weak instruments using the concentration parameter. First note that
the F-statistic for the first stage instrumental variable regression converges
to a noncentral chi-squared with one degree of freedom. The concentration
parameter is then the corresponding noncentrality parameter which we label
$\tau$ in this case. The IV estimator performs poorly when $\tau$ approaches
zero. Assuming covariance stationarity, $\tau$ has the following simple
characterisation in terms of the parameters of the AR model

$$\tau = \frac{(\sigma_v^2 k)^2}{\sigma_\eta^2 + \sigma_v^2 k}; \quad \text{with } k = \frac{1-\alpha}{1+\alpha}.$$
The performance of the standard GMM differenced estimator in this AR(1)
specification can therefore be seen to deteriorate as $\alpha \to 1$, as well
as for decreasing values of $\sigma_v^2$ and for increasing values of
$\sigma_\eta^2$. To illustrate this further Fig. 2 provides a plot of $\tau$
against $\alpha$ for the case $\sigma_\eta^2 = \sigma_v^2 = 1$, T = 3.
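The behaviour of the concentration parameter can be tabulated directly from its closed form. The following short sketch simply evaluates that expression on a hypothetical grid of autoregressive parameters; the function name and the grid are illustrative choices.

```python
def concentration_parameter(alpha, var_eta, var_v):
    """Closed-form concentration parameter for the T = 3 first stage."""
    k = (1.0 - alpha) / (1.0 + alpha)
    return (var_v * k) ** 2 / (var_eta + var_v * k)

alphas = [0.0, 0.2, 0.4, 0.6, 0.8, 0.9]
taus = [concentration_parameter(a, 1.0, 1.0) for a in alphas]
for a, tau in zip(alphas, taus):
    print(f"alpha = {a:.1f}  tau = {tau:.4f}")
```

The values fall monotonically as the autoregressive parameter rises, and for a given autoregressive parameter they also fall when the variance of the individual effects is raised or the variance of $v_{it}$ is lowered.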
Blundell & Bond (2000) note that the finite sample bias of the first-differenced GMM estimator for the AR(1) model with weak instruments is
likely to be in the direction of the Within Groups estimator. This is because the
(one-step) first-differenced GMM estimator coincides with a 2SLS estimator
based on the orthogonal deviations transformation of Arellano & Bover
(1995), and 2SLS estimators are biased in the direction of OLS in the presence
of weak instruments (see, for example, Bound, Jaeger & Baker (1995)).6 We
explore the finite sample behaviour of the first-differenced GMM estimator
further in Section 11 below.
Fig. 2. Concentration parameter $\tau$, $\sigma_\eta^2 = \sigma_v^2 = 1$, T = 3. Source: Blundell & Bond (1998).
5. NON-LINEAR MOMENT CONDITIONS

5.1. Standard Assumptions
The standard assumptions (2.4), (2.5) and (2.6) also imply non-linear moment
conditions which are not exploited by the standard linear first-differenced
GMM estimator described in Section 3.1. Ahn & Schmidt (1995) show that
there are a further $T - 3$ non-linear moment conditions, which can be written
as

$$E(u_{it}\Delta u_{i,t-1}) = 0; \quad \text{for } t = 4, 5, \ldots, T \qquad (5.1)$$

and which could be expected to improve efficiency. These conditions relate
directly to the absence of serial correlation in $v_{it}$ and do not require
homoskedasticity. Thus, under the standard assumptions, the complete set of
second-order moment conditions available is (3.1) and (5.1). Asymptotic
efficiency comparisons reported in Ahn & Schmidt (1995) confirm that these
non-linear moments are particularly informative in cases where $\alpha$ is
close to unity and/or where $\sigma_\eta^2/\sigma_v^2$ is high.
Under the homoskedasticity through time restriction (3.3), there is one further
non-linear moment condition available, in addition to (3.1), (3.4) and (5.1) (see
Ahn & Schmidt (1995)). This can be written as

$$E(\bar{u}_i \Delta u_{i3}) = 0 \quad \text{where } \bar{u}_i = \frac{1}{T-1} \sum_{t=2}^{T} u_{it}. \qquad (5.2)$$
Thus, under the homoskedasticity assumption in addition to the standard

assumptions, the complete set of moment conditions available comprises the
linear conditions (3.1) and (3.4), and the non-linear conditions (5.1) and (5.2).
6. INITIAL CONDITIONS AND A LEVELS GMM ESTIMATOR

In addition to the standard assumptions set out in Section 2, we now consider
the additional assumption

$$E(\eta_i \Delta y_{i2}) = 0 \quad \text{for } i = 1, \ldots, N. \qquad (6.1)$$

Notice that, given (2.3)–(2.6) which specifies $y_{i2}$ given $y_{i1}$,
assumption (6.1) is a restriction on the initial conditions process generating
$y_{i1}$.7
If this initial conditions restriction holds in addition to the standard
assumptions (2.4), (2.5) and (2.6), the following $T - 2$ linear moment
conditions are valid

$$E(u_{it}\Delta y_{i,t-1}) = 0; \quad \text{for } t = 3, 4, \ldots, T. \qquad (6.2)$$

Moreover, given the standard assumptions, these linear moment conditions
imply the $T - 3$ non-linear moment conditions given in (5.1), and render these
non-linear conditions redundant for estimation. Thus the complete set of
second order moment restrictions implied by (2.3)–(2.6) and (6.1) can be
implemented as a linear GMM estimator.
To consider when the first-differences $\Delta y_{it}$ are uncorrelated with
the individual effects, notice that for the AR(1) model (2.3)

$$\Delta y_{it} = \alpha^{t-2} \Delta y_{i2} + \sum_{s=0}^{t-3} \alpha^{s} \Delta u_{i,t-s}$$

so that $\Delta y_{it}$ will be uncorrelated with $\eta_i$ if and only if
$\Delta y_{i2}$ is uncorrelated with $\eta_i$. This is precisely the assumption
(6.1). To guarantee this, we require the initial conditions restriction

$$E\left[ \left( y_{i1} - \frac{\eta_i}{1-\alpha} \right) \eta_i \right] = 0,$$

which is satisfied under mean stationarity of the $y_{it}$ process, as defined
by (2.3)–(2.8).
To show that the moment conditions (6.2) remain informative when $\alpha$
approaches unity or $\sigma_\eta^2/\sigma_v^2$ becomes large, we again consider
the case of T = 3. Here we can use one equation in levels

$$y_{i3} = \alpha y_{i2} + \eta_i + v_{i3}$$

for which the instrument available is $\Delta y_{i2}$, and the first stage
regression is

$$y_{i2} = \pi_l \Delta y_{i2} + r_i.$$

In this case, assuming covariance stationarity, the plim $\hat{\pi}_l$ is given by8

$$\mathrm{plim}\, \hat{\pi}_l = \frac{1}{2} \qquad (6.3)$$

and therefore this moment condition stays informative for high values of
$\alpha$, in contrast to the moment condition available for the
first-differenced model.
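The value in (6.3) can be verified in a few lines. The derivation below is a sketch that assumes the covariance stationary representation $y_{it} = \eta_i/(1-\alpha) + \varepsilon_{it}$, where $\varepsilon_{it}$ is a zero-mean AR(1) process uncorrelated with $\eta_i$, with variance $\gamma_0 = \sigma_v^2/(1-\alpha^2)$ and first autocovariance $\alpha\gamma_0$. The symbols $\varepsilon_{it}$ and $\gamma_0$ are introduced here purely for illustration.

```latex
\begin{align*}
\operatorname{cov}(y_{i2}, \Delta y_{i2})
  &= \operatorname{var}(\varepsilon_{i2}) - \operatorname{cov}(\varepsilon_{i2}, \varepsilon_{i1})
   = (1-\alpha)\gamma_0, \\
\operatorname{var}(\Delta y_{i2})
  &= 2\gamma_0 - 2\alpha\gamma_0 = 2(1-\alpha)\gamma_0, \\
\operatorname{plim}\hat{\pi}_l
  &= \frac{(1-\alpha)\gamma_0}{2(1-\alpha)\gamma_0} = \frac{1}{2}.
\end{align*}
```

The individual effect drops out of $\Delta y_{i2}$ under mean stationarity, so neither $\alpha$ nor the variance ratio appears in the limit.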
The $0.5(T + 1)(T - 2)$ linear moment conditions (3.1) and (6.2) comprise the
full set of second-order moment conditions under mean stationarity in
conjunction with the standard assumptions listed in Section 2, and form the
basis for a system GMM estimator which will be discussed in the next section.
However, as this system GMM estimator combines the moment conditions for
the model in first-differences with those for the model in levels, we also
consider a simpler GMM levels estimator, that is based on the
$m_l = 0.5(T - 1)(T - 2)$ moment conditions

$$E(u_{it}\Delta y_{i,t-s}) = 0; \quad \text{for } t = 3, \ldots, T \text{ and } 1 \le s \le t - 2, \qquad (6.4)$$

that relate only to the equations in levels. These can be expressed as

$$E(Z_{li}' u_i) = 0,$$

where $Z_{li}$ is the $(T - 2) \times m_l$ matrix given by

$$Z_{li} = \begin{pmatrix}
\Delta y_{i2} & 0 & 0 & \cdots & 0 & \cdots & 0 \\
0 & \Delta y_{i2} & \Delta y_{i3} & \cdots & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & \Delta y_{i2} & \cdots & \Delta y_{i,T-1}
\end{pmatrix},$$

and $u_i$ is the $(T - 2)$ vector $(u_{i3}, u_{i4}, \ldots, u_{iT})'$.
Calculation of the one-step and two-step GMM estimators then proceeds in a
similar way to that described above. In this case though, unless
$\sigma_\eta^2 = 0$, there is no one-step GMM estimator that is asymptotically
equivalent to the two-step estimator, even in the special case of i.i.d.
disturbances.9
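The layout of the levels instrument matrix is easy to misread, so a small constructive sketch may help. The function below is hypothetical and uses plain Python lists: for one individual it places the lagged first-differences available for the levels equation in period t in their own block of columns, as in (6.4).

```python
def levels_instruments(y_i):
    """Return the levels instrument matrix (list of rows) for y_i1..y_iT."""
    t = len(y_i)
    dy = [y_i[s] - y_i[s - 1] for s in range(1, t)]   # dy[0] is Delta y_i2
    m_l = (t - 1) * (t - 2) // 2                       # number of moment conditions
    rows, col = [], 0
    for period in range(3, t + 1):
        n_inst = period - 2            # Delta y_i2 .. Delta y_{i,period-1}
        row = [0.0] * m_l
        row[col:col + n_inst] = dy[:n_inst]
        rows.append(row)
        col += n_inst
    return rows

z = levels_instruments([1.0, 2.0, 4.0, 7.0, 11.0])    # hypothetical series, T = 5
for row in z:
    print(row)
```

For T = 5 this gives three rows (periods 3, 4 and 5) and six columns, matching the count $0.5(T - 1)(T - 2)$.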
7. A SYSTEM GMM ESTIMATOR

7.1. The Optimal Combination of Differenced and Levels Estimators
Calculation of the GMM estimator using the full set of linear moment
conditions (3.1) and (6.2) can be based on a stacked system comprising all
$T - 2$ equations in first-differences and the $T - 2$ equations in levels
corresponding to periods 3, . . . , T, for which instruments are observed. The
$m_s = 0.5(T + 1)(T - 2)$ moment conditions are10

$$E(y_{i,t-s}\Delta u_{it}) = 0; \quad \text{for } t = 3, \ldots, T \text{ and } 2 \le s \le t - 1 \qquad (7.1)$$

$$E(u_{it}\Delta y_{i,t-1}) = 0; \quad \text{for } t = 3, \ldots, T. \qquad (7.2)$$

These can be expressed as

$$E(Z_{si}' p_i) = 0,$$

where

$$p_i = \begin{pmatrix} \Delta u_i \\ u_i \end{pmatrix}; \quad
Z_{si} = \begin{pmatrix} Z_{di} & 0 \\ 0 & Z_{li}^{p} \end{pmatrix}
= \begin{pmatrix}
Z_{di} & 0 & 0 & \cdots & 0 \\
0 & \Delta y_{i2} & 0 & \cdots & 0 \\
0 & 0 & \Delta y_{i3} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \Delta y_{i,T-1}
\end{pmatrix};$$

with $Z_{di}$ as defined in section 3, and $Z_{li}^{p}$ is the non-redundant
subset of $Z_{li}$. The calculation of the two-step GMM estimator is then
analogous to that described above. Again in this case, unless
$\sigma_\eta^2 = 0$, there is no one-step GMM estimator that is asymptotically
equivalent to the two-step estimator, even in the special case of i.i.d.
disturbances.11
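The system GMM estimator is clearly a combination of the GMM
differenced estimator and a GMM levels estimator that uses only (7.2). This
combination is linear for the system 2SLS estimator which is given by

$$\hat{\alpha}_s = \left( q_{-1}' Z_s (Z_s' Z_s)^{-1} Z_s' q_{-1} \right)^{-1} q_{-1}' Z_s (Z_s' Z_s)^{-1} Z_s' q,$$

where

$$q_i = \begin{pmatrix} \Delta y_i \\ y_i \end{pmatrix}.$$

Because

$$q_{-1}' Z_s (Z_s' Z_s)^{-1} Z_s' q_{-1} = \Delta y_{-1}' Z_d (Z_d' Z_d)^{-1} Z_d' \Delta y_{-1} + y_{-1}' Z_l^{p} (Z_l^{p\prime} Z_l^{p})^{-1} Z_l^{p\prime} y_{-1}$$

the system 2SLS estimator is equivalent to the linear combination

$$\hat{\alpha}_s = \gamma \hat{\alpha}_d + (1 - \gamma) \hat{\alpha}_l^{p},$$

where $\hat{\alpha}_d$ and $\hat{\alpha}_l^{p}$ are the 2SLS first-differenced
and levels estimators respectively, with the levels estimator utilising only
the $T - 2$ moment conditions (7.2), and

$$\gamma = \frac{\Delta y_{-1}' Z_d (Z_d' Z_d)^{-1} Z_d' \Delta y_{-1}}{\Delta y_{-1}' Z_d (Z_d' Z_d)^{-1} Z_d' \Delta y_{-1} + y_{-1}' Z_l^{p} (Z_l^{p\prime} Z_l^{p})^{-1} Z_l^{p\prime} y_{-1}}
= \frac{\hat{\pi}_d' Z_d' Z_d \hat{\pi}_d}{\hat{\pi}_d' Z_d' Z_d \hat{\pi}_d + \hat{\pi}_l' Z_l^{p\prime} Z_l^{p} \hat{\pi}_l},$$

where $\hat{\pi}_d$ and $\hat{\pi}_l$ are the OLS estimates of the first stage
regression coefficients underlying these 2SLS estimators. From (4.2) and (6.3)
it follows that $\gamma \to 0$ if $\alpha \to 1$ and/or
$(\sigma_\eta^2/\sigma_v^2) \to \infty$, so all the weight for the system
estimator will in these cases be given to the informative levels moment
conditions (7.2).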
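The linear-combination result can be illustrated in the T = 3 case, where each equation has a single instrument and all the matrix products reduce to scalar sums. The sketch below is an illustration under stated assumptions (normal errors, unit variance for $v_{it}$, stationary initial conditions); the function name and the return convention are hypothetical, and the weight on the differenced estimator is called `gamma` in the code.

```python
import math
import random

def system_2sls_t3(alpha, var_eta=1.0, n=20000, seed=0):
    """Return (alpha_d, alpha_l, alpha_s, gamma) for a simulated T = 3 panel."""
    rng = random.Random(seed)
    sd_eta = math.sqrt(var_eta)
    sd_start = 1.0 / math.sqrt(1.0 - alpha ** 2)   # sigma_v normalised to 1
    zd_x = zd_y = zd_z = 0.0    # sums of y1*dy2, y1*dy3, y1*y1
    zl_x = zl_y = zl_z = 0.0    # sums of dy2*y2, dy2*y3, dy2*dy2
    for _ in range(n):
        eta = rng.gauss(0.0, sd_eta)
        y1 = eta / (1.0 - alpha) + rng.gauss(0.0, sd_start)
        y2 = alpha * y1 + eta + rng.gauss(0.0, 1.0)
        y3 = alpha * y2 + eta + rng.gauss(0.0, 1.0)
        dy2, dy3 = y2 - y1, y3 - y2
        zd_x += y1 * dy2
        zd_y += y1 * dy3
        zd_z += y1 * y1
        zl_x += dy2 * y2
        zl_y += dy2 * y3
        zl_z += dy2 * dy2
    a_d = zd_x ** 2 / zd_z      # x'Z(Z'Z)^{-1}Z'x for the differenced equation
    a_l = zl_x ** 2 / zl_z      # the same quantity for the levels equation
    alpha_d = zd_y / zd_x       # 2SLS on the differenced equation
    alpha_l = zl_y / zl_x       # 2SLS on the levels equation
    alpha_s = (zd_x * zd_y / zd_z + zl_x * zl_y / zl_z) / (a_d + a_l)
    gamma = a_d / (a_d + a_l)
    return alpha_d, alpha_l, alpha_s, gamma

low = system_2sls_t3(alpha=0.5)
high = system_2sls_t3(alpha=0.95)
print(low)
print(high)
```

The system estimate equals the weighted average of the two component estimates up to rounding error, and the weight on the differenced estimator is much smaller in the persistent design.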
In the case where the initial conditions satisfy restriction (6.1) and the
$v_{it}$ satisfy restriction (3.3), Ahn & Schmidt (1995, equation (12b)) show
that the $T - 2$ homoskedasticity restrictions (3.4) and (5.2) can be replaced
by a set of $T - 2$ moment conditions

$$E(y_{it}u_{it} - y_{i,t-1}u_{i,t-1}) = 0; \quad \text{for } t = 3, \ldots, T,$$

which are all linear in the parameter $\alpha$. The non-linear conditions (5.2)
are again redundant for estimation given (6.1), and the complete set of second
order moment restrictions implied by (2.3)–(2.6), (3.3) and (6.1) can be
implemented as a linear GMM estimator.
8. ASYMPTOTIC VARIANCE COMPARISONS

To quantify the gains in asymptotic efficiency that result from exploiting the
linear moment conditions (6.2), Table 1 reports the ratio of the asymptotic
variance of the standard first-differenced GMM estimator described in Section
3.1 to the asymptotic variance of the system GMM estimator described in
Section 7.1. These asymptotic variance ratios are calculated assuming both
covariance stationarity and homoskedasticity. They are presented for T = 3 and
T = 4, for two fixed values of $\sigma_\eta^2/\sigma_v^2$, and for a range of
values of the autoregressive parameter $\alpha$. For comparison, we also
reproduce from Ahn & Schmidt (1995) the corresponding asymptotic variance
ratios comparing first-differenced GMM to the non-linear GMM estimator which
uses the quadratic moment conditions (5.1), but not the extra linear moment
conditions (6.2). In the T = 3 case there are no quadratic moment restrictions
available. These calculations suggest that exploiting conditions (6.2) can
result in dramatic efficiency gains when T = 3, particularly at high values of
$\alpha$ and high values of $\sigma_\eta^2/\sigma_v^2$. These are indeed the
cases where we find the instruments used to obtain the first-differenced
estimator to be weak.

Table 1. Asymptotic Variance Ratios

                    $\sigma_\eta^2/\sigma_v^2$ = 1.00     $\sigma_\eta^2/\sigma_v^2$ = 0.25
T    $\alpha$          SYS    NON-LINEAR                   SYS    NON-LINEAR
3    0.0              1.33       n/a                      1.33       n/a
3    0.3              2.15       n/a                      1.89       n/a
3    0.5              4.00       n/a                      2.91       n/a
3    0.8             28.00       n/a                     13.10       n/a
3    0.9            121.33       n/a                     47.91       n/a
4    0.0              1.75      1.67                      1.40      1.29
4    0.3              2.31      1.91                      1.77      1.33
4    0.5              3.26      2.10                      2.42      1.35
4    0.8             13.97      2.42                      8.88      1.41
4    0.9             55.40      2.54                     30.90      1.45

Source: Blundell & Bond (1998)

In the T = 4 case we still find dramatic efficiency gains at high values of
$\alpha$. Comparison to the results for the non-linear GMM estimator also shows
that the gains from exploiting conditions (6.2) can be much larger than the
gains from simply exploiting the non-linear restrictions (5.1).
In the Monte Carlo simulations presented in Section 11 we investigate
whether similar improvements are found in finite samples.
9. MULTIVARIATE DYNAMIC PANEL DATA MODELS

In this section the dynamic panel data model with additional regressors is
considered.12 In particular, we focus on the model

$$y_{it} = \alpha y_{i,t-1} + \beta x_{it} + u_{it}, \qquad u_{it} = \eta_i + v_{it} \qquad (9.1)$$

where $x_{it}$ is a scalar. The error components $\eta_i$ and $v_{it}$ again
satisfy the conditions (2.4)–(2.6). The $x_{it}$ process is correlated with the
individual effects $\eta_i$ and we consider three possible correlation
structures between the $x_{it}$ process and the $v_{it}$ error process that
determine the instruments that can be used to estimate $\alpha$ and $\beta$.
First, the $x_{it}$ process is strictly exogenous:

$$E(x_{is}v_{it}) = 0; \quad \text{for } s = 1, \ldots, T; \; t = 2, \ldots, T. \qquad (9.2)$$

Secondly, the $x_{it}$ process is weakly exogenous, or predetermined

$$E(x_{is}v_{it}) = 0; \quad \text{for } s = 1, \ldots, t; \; t = 2, \ldots, T \qquad (9.3)$$
$$E(x_{is}v_{it}) \ne 0; \quad \text{for } s = t + 1, \ldots, T; \; t = 2, \ldots, T$$

and thirdly, the $x_{it}$ process is endogenously determined

$$E(x_{is}v_{it}) = 0; \quad \text{for } s = 1, \ldots, t - 1; \; t = 2, \ldots, T \qquad (9.4)$$
$$E(x_{is}v_{it}) \ne 0; \quad \text{for } s = t, \ldots, T; \; t = 2, \ldots, T.$$

We are especially interested in the case when the $x_{it}$ process is
endogenously determined, which includes simultaneous processes, but also
measurement error.
For the GMM first-differenced estimator, the $0.5(T - 1)(T - 2)$ moment
conditions (3.1)

$$E(y_{i,t-s}\Delta u_{it}) = 0; \quad \text{for } t = 3, \ldots, T \text{ and } 2 \le s \le t - 1$$

remain valid. When the $x_{it}$ process is strictly exogenous, the following
additional $T(T - 2)$ moment conditions are valid

$$E(x_{is}\Delta u_{it}) = 0; \quad \text{for } t = 3, \ldots, T \text{ and } 1 \le s \le T. \qquad (9.5)$$

When $x_{it}$ is predetermined there are only the $0.5(T + 1)(T - 2)$
additional moment conditions

$$E(x_{i,t-s}\Delta u_{it}) = 0; \quad \text{for } t = 3, \ldots, T \text{ and } 1 \le s \le t - 1, \qquad (9.6)$$

whereas when $x_{it}$ is endogenously determined only the following
$0.5(T - 1)(T - 2)$ additional moment conditions are valid

$$E(x_{i,t-s}\Delta u_{it}) = 0; \quad \text{for } t = 3, \ldots, T \text{ and } 2 \le s \le t - 1. \qquad (9.7)$$
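The three counts of additional moment conditions follow from enumerating, for each differenced equation, which observations of the regressor are admissible as instruments. The sketch below does that enumeration directly; the function name and the string labels are hypothetical.

```python
def count_x_instruments(t_periods, exogeneity):
    """Count the moment conditions in (9.5), (9.6) or (9.7) for t = 3..T."""
    count = 0
    for t in range(3, t_periods + 1):
        if exogeneity == "strict":
            last = t_periods        # x_i1 .. x_iT, as in (9.5)
        elif exogeneity == "predetermined":
            last = t - 1            # x_i1 .. x_{i,t-1}, as in (9.6)
        elif exogeneity == "endogenous":
            last = t - 2            # x_i1 .. x_{i,t-2}, as in (9.7)
        else:
            raise ValueError(exogeneity)
        count += last
    return count

for label in ("strict", "predetermined", "endogenous"):
    print(label, [count_x_instruments(T, label) for T in (4, 8)])
```

Each step down in exogeneity removes one instrument per differenced equation, which is why the predetermined and endogenous counts differ by exactly $T - 2$.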
For the non-linear GMM estimator, moment conditions (5.1) remain valid,
and no further moment conditions result from the presence of xit variables.
For the system GMM estimator, we first consider under what conditions both
$\Delta y_{it}$ and $\Delta x_{it}$ are uncorrelated with $\eta_i$. In order to
illustrate this, we specify the following process for the regressor

$$x_{it} = \rho x_{i,t-1} + \tau \eta_i + e_{it}.$$

Thus $\tau \ne 0$ allows the level of $x_{it}$ to be correlated with $\eta_i$,
and the covariance properties between $v_{it}$ and $e_{is}$ determine whether
$x_{it}$ is strictly exogenous, predetermined or endogenously determined. First
notice that

$$\Delta x_{it} = \rho^{t-2} \Delta x_{i2} + \sum_{s=0}^{t-3} \rho^{s} \Delta e_{i,t-s},$$

so that $\Delta x_{it}$ will be correlated with $\eta_i$ if and only if
$\Delta x_{i2}$ is correlated with $\eta_i$. To guarantee
$E[\Delta x_{i2}\eta_i] = 0$ we require the initial conditions restriction

$$E\left[ \left( x_{i1} - \frac{\tau \eta_i}{1-\rho} \right) \eta_i \right] = 0 \qquad (9.8)$$

which is satisfied under mean stationarity of the $x_{it}$ process.
Given this restriction, writing $\Delta y_{it}$ as

$$\Delta y_{it} = \alpha^{t-2} \Delta y_{i2} + \sum_{s=0}^{t-3} \alpha^{s} (\beta \Delta x_{i,t-s} + \Delta u_{i,t-s}) \qquad (9.9)$$

shows that $\Delta y_{it}$ will be correlated with $\eta_i$ if and only if
$\Delta y_{i2}$ is correlated with $\eta_i$. To guarantee
$E[\Delta y_{i2}\eta_i] = 0$ we then require the similar initial conditions
restriction

$$E\left[ \left( y_{i1} - \frac{\beta \dfrac{\tau \eta_i}{1-\rho} + \eta_i}{1-\alpha} \right) \eta_i \right] = 0 \qquad (9.10)$$

which would again be satisfied under stationarity. Thus, there are additional
moment restrictions available for the equations in levels when the $y_{it}$ and
$x_{it}$ processes are both mean stationary.
Whilst jointly stationary means are sufficient to ensure that both
$\Delta y_{it}$ and $\Delta x_{it}$ are uncorrelated with $\eta_i$, this
condition is stronger than is necessary. For example, if the conditional model
(9.1) has generated the $y_{it}$ series for sufficiently long time prior to our
sample period for any influence of the true initial conditions to be
negligible, then an expression analogous to (9.9) shows that $\Delta y_{it}$
will be uncorrelated with $\eta_i$ provided that $\Delta x_{it}$ is
uncorrelated with $\eta_i$, even if the mean of $x_{it}$ (and hence $y_{it}$)
is time-varying. Moreover we can note that it is perfectly possible for
$\Delta x_{it}$ to be uncorrelated with $\eta_i$ in cases where
$\Delta y_{it}$ is correlated with $\eta_i$ (for example, when (9.8) holds or
$\tau = 0$ but (9.10) is not satisfied). However, given (9.9), it seems very
unlikely that $\Delta y_{it}$ will be uncorrelated with $\eta_i$ in contexts
where $\Delta x_{it}$ is correlated with $\eta_i$.
When both $\Delta y_{it}$ and $\Delta x_{it}$ are uncorrelated with $\eta_i$,
the extra moment conditions for the GMM system estimator are, as before, (7.2),

$$E(u_{it}\Delta y_{i,t-1}) = 0; \quad \text{for } t = 3, \ldots, T$$

and

$$E(u_{it}\Delta x_{it}) = 0; \quad \text{for } t = 2, \ldots, T \qquad (9.11)$$

in the case where $x_{it}$ is strictly exogenous or predetermined; or

$$E(u_{it}\Delta x_{i,t-1}) = 0; \quad \text{for } t = 3, \ldots, T, \qquad (9.12)$$

when $x_{it}$ is endogenously determined. Therefore, when for example $x_{it}$
is endogenous, the GMM system estimator is based on the moment conditions
(7.1), (9.7), (7.2) and (9.12).
10. TESTS OF OVERIDENTIFYING RESTRICTIONS

The standard test for testing the validity of the moment conditions used in the
GMM estimation procedure is the Sargan test of overidentifying restrictions
(see Sargan (1958) and the development for GMM in Hansen (1982)). For the
GMM estimator in the first-differenced model this test statistic is given by

$$Sar_d = \frac{1}{N} \widehat{\Delta u}' Z_d W_N Z_d' \widehat{\Delta u}$$

where $W_N$ is the optimal weight matrix as in (3.2) and $\widehat{\Delta u}$
are the two-step residuals in the differenced model. In general, under the null
that the moment conditions are valid, $Sar_d$ is asymptotically chi-squared
distributed with $m_d - k$ degrees of freedom, where $m_d$ is the number of
moment conditions and k is the number of estimated parameters.
For the system estimator, the same test is readily defined. Call this test
$Sar_s$. A test for the validity of the level moment conditions that are
utilised by the system estimator is then obtained as the difference between
$Sar_s$ and $Sar_d$:

$$\text{Dif-Sar} = Sar_s - Sar_d \qquad (10.1)$$

and Dif-Sar is asymptotically chi-squared distributed with $m_s - m_d$ degrees
of freedom under the null that the level moment conditions are valid.
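The arithmetic of the difference test can be sketched as follows. This is an illustration only: the chi-squared upper-tail probability is computed with standard-library functions for integer degrees of freedom, the input statistics are hypothetical numbers rather than results from the chapter, and a negative difference (which can occur in finite samples) is simply assigned a p-value of one, which is an assumption of this sketch.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df >= 1."""
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)                 # exact for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))      # exact for df = 1
    while a < df / 2.0:                          # add two degrees of freedom per step
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def dif_sargan(sar_s, sar_d, m_s, m_d):
    """Return (statistic, degrees of freedom, p-value) for the test in (10.1)."""
    stat = sar_s - sar_d
    df = m_s - m_d
    p_value = 1.0 if stat <= 0.0 else chi2_sf(stat, df)
    return stat, df, p_value

# Hypothetical statistics for an AR(1) model with T = 4: m_d = 3, m_s = 5.
print(dif_sargan(sar_s=6.4, sar_d=3.1, m_s=5, m_d=3))
```

The degrees of freedom of the difference test do not depend on the number of estimated parameters, because the same parameters are estimated under both sets of moment conditions.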
11. MONTE CARLO RESULTS

This section illustrates the performance of the various estimators, as discussed
above, for a dynamic multivariate panel data model. In particular, the effect of
weak instruments and the potential gains from exploiting initial conditions
restrictions are investigated.
The model specification is

$$y_{it} = \alpha y_{i,t-1} + \beta x_{it} + \eta_i + v_{it} \qquad (11.1)$$

$$x_{it} = \rho x_{i,t-1} + \tau \eta_i + \theta v_{it} + e_{it} \qquad (11.2)$$

with

$$\eta_i \sim N(0, \sigma_\eta^2); \quad v_{it} \sim N(0, \sigma_v^2); \quad e_{it} \sim N(0, \sigma_e^2)$$

and the initial observations are drawn from the covariance stationary
distribution. Although these errors are homoskedastic, we do not consider any
of the additional moment conditions that require homoskedasticity in the
simulated estimators.
We choose the error process parameters in such a way that the $x_{it}$ process
is highly persistent for high values of $\rho$. Further, $x_{it}$ is positively
correlated with $\eta_i$ and the value of $\theta$ is negative to mimic the
effects of measurement error. The values of the parameters that are kept fixed
in the various Monte Carlo simulations presented below are

$$\beta = 1, \quad \tau = 0.25, \quad \theta = -0.1, \quad \sigma_\eta^2 = 1, \quad \sigma_v^2 = 1, \quad \sigma_e^2 = 0.16.$$

The parameters that are varied in the simulations are the autoregressive
coefficients $\alpha$ and $\rho$. We consider four designs with $\alpha$ and
$\rho$ both taking the values of 0.5 and 0.95. The case when $\alpha = 0.5$ and
$\rho = 0.95$ resembles the production function data that will be analysed in
the next section. The sample size is N = 500, and the simulation results for
the various estimators are presented in Tables 2 and 3 for T = 4 and in
Tables 4 and 5 for T = 8.
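A data generating process of this kind can be sketched in a few lines. The code below is an illustration rather than the authors' simulation code: the default parameter values follow the design described above, with the coefficient on $v_{it}$ in the regressor equation taken to be negative, and the draw from the covariance stationary distribution is approximated by discarding a burn-in period, which is an assumption of this sketch. The function name and arguments are hypothetical.

```python
import random

def simulate_design(n, t, alpha, rho, beta=1.0, tau=0.25, theta=-0.1,
                    sd_eta=1.0, sd_v=1.0, sd_e=0.4, burn_in=200, seed=0):
    """Return (y, x, eta): y and x are n lists holding t observations each."""
    rng = random.Random(seed)
    y_all, x_all, eta_all = [], [], []
    for _ in range(n):
        eta = rng.gauss(0.0, sd_eta)
        x = y = 0.0
        y_i, x_i = [], []
        for period in range(burn_in + t):
            v = rng.gauss(0.0, sd_v)
            e = rng.gauss(0.0, sd_e)                  # sd 0.4 gives variance 0.16
            x = rho * x + tau * eta + theta * v + e   # regressor equation (11.2)
            y = alpha * y + beta * x + eta + v        # outcome equation (11.1)
            if period >= burn_in:                     # keep only the last t periods
                y_i.append(y)
                x_i.append(x)
        y_all.append(y_i)
        x_all.append(x_i)
        eta_all.append(eta)
    return y_all, x_all, eta_all

y, x, eta = simulate_design(n=500, t=4, alpha=0.5, rho=0.95)
cov_x_eta = sum(x_i[0] * e_i for x_i, e_i in zip(x, eta)) / len(eta)
print(len(y), len(y[0]), cov_x_eta > 0)
```

The sample covariance between the regressor and the individual effect is positive, as the design intends.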
Means, standard deviations and root mean squared errors (RMSE) from
10,000 simulations are tabulated for the OLS levels estimator (OLS), Within
Groups estimator (WG), the GMM first-differenced estimator (DIF), the
non-linear GMM estimator (AS),13 the levels GMM estimator (LEV), and the
system GMM estimator (SYS). Thus for the case of estimating the AR(1)
model for $x_{it}$, DIF uses the moment conditions (3.1); AS uses the moment
conditions (3.1) and (5.1); LEV uses the moment conditions (6.4); and SYS
uses the moment conditions (3.1) and (6.2). The reported results are for the
two-step GMM estimators.

Table 2. Monte Carlo results, T = 4, $\rho$ = 0.5, $\beta$ = 1, N = 500

Parameter             Stat       OLS       WG      DIF       AS      LEV      SYS
$\rho$                Mean     0.762   -0.036    0.496    0.501    0.502    0.500
                      St D     0.017    0.030    0.090    0.075    0.059    0.055
                      RMSE     0.263    0.538    0.091    0.075    0.059    0.055
$\alpha$ = 0.5:
$\alpha$              Mean     0.820    0.010    0.469    0.516    0.512    0.512
                      St D     0.011    0.031    0.131    0.095    0.070    0.060
                      RMSE     0.320    0.491    0.135    0.096    0.070    0.061
$\beta$               Mean     0.775    0.318    0.915    1.006    1.029    1.015
                      St D     0.053    0.080    0.420    0.351    0.336    0.257
                      RMSE     0.231    0.687    0.428    0.351    0.337    0.257
$\alpha$ = 0.95:
$\alpha$              Mean     0.990    0.300    0.350    0.840    0.980    0.979
                      St D     0.001    0.032    0.487    0.242    0.029    0.033
                      RMSE     0.040    0.651    0.773    0.266    0.042    0.044
$\beta$               Mean     0.583    0.194   -0.195    0.790    1.004    1.000
                      St D     0.053    0.075    0.994    0.524    0.289    0.232
                      RMSE     0.420    0.809    1.554    0.565    0.289    0.232

Means and standard deviations of 10,000 replications. DIF, AS, LEV and SYS are two-step estimators.

Table 3. Monte Carlo results, T = 4, $\rho$ = 0.95, $\beta$ = 1, N = 500

Parameter             Stat       OLS       WG      DIF       AS      LEV      SYS
$\rho$                Mean     0.997    0.221    0.472    0.868    0.961    0.953
                      St D     0.002    0.032    0.825    0.221    0.144    0.096
                      RMSE     0.047    0.729    0.954    0.235    0.145    0.096
$\alpha$ = 0.5:
$\alpha$              Mean     0.650    0.089    0.466    0.500    0.518    0.514
                      St D     0.014    0.031    0.103    0.065    0.053    0.044
                      RMSE     0.151    0.412    0.109    0.065    0.056    0.046
$\beta$               Mean     0.830    0.551    0.517    1.021    1.078    1.075
                      St D     0.034    0.090    1.438    0.461    0.160    0.153
                      RMSE     0.174    0.458    1.522    0.461    0.178    0.170
$\alpha$ = 0.95:
$\alpha$              Mean     0.962    0.661    0.907    0.936    0.957    0.956
                      St D     0.001    0.026    0.104    0.072    0.008    0.010
                      RMSE     0.012    0.290    0.112    0.074    0.010    0.011
$\beta$               Mean     0.904    0.465    0.233    0.863    1.020    1.020
                      St D     0.026    0.089    1.769    0.853    0.091    0.090
                      RMSE     0.100    0.543    1.928    0.864    0.093    0.092

Table 4. Monte Carlo results, T = 8, $\rho$ = 0.5, $\beta$ = 1, N = 500

Parameter             Stat       OLS       WG      DIF       AS      LEV      SYS
$\rho$                Mean     0.762    0.265    0.494    0.495    0.503    0.501
                      St D     0.012    0.018    0.034    0.025    0.029    0.024
                      RMSE     0.262    0.236    0.035    0.026    0.029    0.024
$\alpha$ = 0.5:
$\alpha$              Mean     0.820    0.311    0.480    0.497    0.523    0.511
                      St D     0.007    0.017    0.040    0.029    0.034    0.027
                      RMSE     0.320    0.190    0.045    0.029    0.041    0.029
$\beta$               Mean     0.775    0.490    0.930    0.944    1.041    0.997
                      St D     0.034    0.045    0.136    0.134    0.157    0.124
                      RMSE     0.228    0.512    0.153    0.145    0.162    0.124
$\alpha$ = 0.95:
$\alpha$              Mean     0.990    0.662    0.548    0.969    0.982    0.979
                      St D     0.001    0.016    0.177    0.030    0.007    0.011
                      RMSE     0.040    0.289    0.440    0.036    0.032    0.031
$\beta$               Mean     0.581    0.388    0.226    0.972    0.979    0.983
                      St D     0.035    0.044    0.356    0.134    0.108    0.101
                      RMSE     0.421    0.613    0.852    0.137    0.110    0.103

Table 5. Monte Carlo results, T = 8, $\rho$ = 0.95, $\beta$ = 1, N = 500

Parameter             Stat       OLS       WG      DIF       AS      LEV      SYS
$\rho$                Mean     0.997    0.591    0.676    0.903    0.973    0.958
                      St D     0.001    0.017    0.222    0.061    0.022    0.031
                      RMSE     0.047    0.359    0.350    0.077    0.032    0.032
$\alpha$ = 0.5:
$\alpha$              Mean     0.650    0.396    0.480    0.508    0.523    0.518
                      St D     0.009    0.015    0.033    0.024    0.022    0.021
                      RMSE     0.150    0.106    0.039    0.025    0.032    0.028
$\beta$               Mean     0.830    0.796    0.800    1.099    1.084    1.075
                      St D     0.022    0.040    0.290    0.125    0.058    0.059
                      RMSE     0.171    0.208    0.352    0.159    0.101    0.095
$\alpha$ = 0.95:
$\alpha$              Mean     0.962    0.882    0.927    0.956    0.957    0.957
                      St D     0.001    0.009    0.025    0.007    0.002    0.003
                      RMSE     0.012    0.068    0.034    0.009    0.007    0.007
$\beta$               Mean     0.902    0.745    0.615    1.016    1.017    1.019
                      St D     0.017    0.040    0.400    0.118    0.028    0.031
                      RMSE     0.100    0.258    0.555    0.119    0.033    0.036
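The three summary statistics are linked by a simple identity, which is useful when reading the tables: the squared RMSE equals the squared bias plus the variance across replications. The sketch below computes the three statistics for a hypothetical list of estimates, assuming the RMSE is taken around the true parameter value and the standard deviation uses the number of replications as divisor.

```python
import math

def summarise(estimates, true_value):
    """Mean, standard deviation and RMSE of a list of Monte Carlo estimates."""
    r = len(estimates)
    mean = sum(estimates) / r
    sd = math.sqrt(sum((b - mean) ** 2 for b in estimates) / r)
    rmse = math.sqrt(sum((b - true_value) ** 2 for b in estimates) / r)
    return mean, sd, rmse

draws = [0.42, 0.55, 0.47, 0.51, 0.38]        # hypothetical estimates of 0.5
mean, sd, rmse = summarise(draws, 0.5)
bias = mean - 0.5
print(abs(rmse ** 2 - (bias ** 2 + sd ** 2)) < 1e-12)
```

For instance, a mean of 0.990 with a standard deviation of 0.001 around a true value of 0.95 implies an RMSE of about 0.040, so an estimator can be very precise and still perform poorly on this criterion.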
Tables 2 and 4 present results for $\rho = 0.5$. The row labelled $\rho$
presents the results for the estimates of $\rho$ in model (11.2), where the
various GMM estimators only utilise lagged information on x as instruments, and
potential information from the lagged values of y is not used. Our results for
the DIF and SYS estimators can therefore be compared to those reported in, for
example, Blundell & Bond (1998) and Alonso-Borrego & Arellano (1999). As
expected, the OLS estimates are biased upward and the WG estimates are biased
downwards. In this experiment where $x_{it}$ is not highly persistent and the
instruments available for the equations in first-differences are not weak, all
four GMM estimators are virtually unbiased. The AS, LEV and SYS estimators all
provide an improvement in precision compared to the standard DIF estimator.
As we would expect from the asymptotic variance ratios in Table 1, there is a
greater gain in precision from using SYS rather than AS at T = 4, although in
Table 4 we can observe that this difference becomes very small at T = 8.
The next two rows in Tables 2 and 4 present the estimation results for
$\alpha$ and $\beta$ in model (11.1) when $\alpha = 0.5$ and $\rho = 0.5$. The
OLS estimates for $\alpha$ are biased upwards, whereas those for $\beta$ are
biased downwards. The WG estimates for $\alpha$ and $\beta$ are both biased
downwards. Again, as expected, since both the y and x series have a low degree
of persistence, the four GMM estimators perform quite well in this experiment.
The SYS estimator has the smallest RMSE for both parameters, but the gains are
not dramatic at T = 8.
The final two rows in Tables 2 and 4 are for the model with $\alpha = 0.95$ and
$\rho = 0.5$. As this makes the y process highly persistent, the DIF estimator
suffers from a serious weak instrument bias, as well as being very imprecise.
We can notice that the DIF estimates of $\alpha$ and $\beta$ are both biased
downwards, in the direction of the Within Groups estimates. The AS estimator is
better behaved, as a result of exploiting the non-linear moment conditions
(5.1). However the LEV and SYS estimators which exploit the initial conditions
restrictions provide more dramatic gains in precision, particularly for the
estimation of $\alpha$ and particularly in the case with T = 4. With T = 8, the
LEV and SYS estimates of $\alpha$ are biased upwards, in the direction of the
OLS estimate, but still dominate on the RMSE criterion.
Tables 3 and 5 present the results for the cases where the $x_{it}$ process is
highly persistent, with $\rho = 0.95$. The estimates for $\rho$ show the
familiar pattern: OLS is upward biased, WG is downward biased, and DIF is
downward biased towards WG as a result of weak instruments. The AS estimator
provides a substantial improvement in both bias and precision. However the LEV
and SYS estimators provide more dramatic gains, particularly when T = 4.
When $\alpha = 0.5$, the DIF estimator estimates $\alpha$ quite well, but the
DIF estimate of $\beta$ is very imprecise, biased downwards and on average very
similar to the WG estimate of $\beta$. The AS, LEV and SYS estimates of
$\alpha$ are all close to the true value. The AS estimates of $\beta$ are much
less biased than DIF but still imprecise, particularly at T = 4. The LEV and
SYS estimates of $\beta$ show a little finite-sample bias, but again dominate
in terms of RMSE. This experiment is intended to capture salient features of
the production function data we consider in Section 12, notably a highly
persistent explanatory variable that is measured with error, and a significant
autoregressive parameter that is not close to one. The simulation results
confirm that the system GMM estimator has reasonable properties in this
context.
When both $\alpha$ and $\rho$ are equal to 0.95 the estimators display a
similar pattern. One surprise is that the LEV and SYS estimators actually
estimate both parameters better than in the experiments with $\alpha = 0.5$,
and the gain from using either of these estimators compared to AS is rather
more striking in this case. Also the DIF estimator now estimates $\alpha$ quite
well (though not $\beta$); this may be because by increasing $\alpha$ whilst
keeping the variance of $\eta_i$ and $v_{it}$ fixed, we have greatly increased
the variance of the $y_{it}$ series.
To investigate the size properties of the Sargan test of overidentifying
restrictions, we present in Figures 3–14 p-value plots (see Davidson &
MacKinnon, 1996) for the Sargan test statistics for the DIF and SYS GMM
estimators. We also present the p-value plots for the Dif-Sar statistic as defined
in (10.1), testing the validity of the additional levels moment conditions
exploited by the SYS estimator.
The x-axis of the p-value plots represents the nominal size using the
asymptotic critical values of the corresponding chi-squared distributions; the
y-axis represents the actual size of the test statistics in the experiments.
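The construction of such a plot can be sketched as follows. Given the asymptotic p-values of a test statistic across Monte Carlo replications, the actual size at each nominal size is the fraction of replications whose p-value does not exceed that nominal size. The function name and the inputs below are hypothetical.

```python
import random

def p_value_plot(p_values, nominal_sizes):
    """Return (nominal size, actual rejection frequency) pairs."""
    n = len(p_values)
    return [(size, sum(p <= size for p in p_values) / n) for size in nominal_sizes]

# Hypothetical p-values from a correctly sized test are uniform on [0, 1].
rng = random.Random(0)
uniform_p = [rng.random() for _ in range(10000)]
points = p_value_plot(uniform_p, [0.01, 0.05, 0.10, 0.25])
for nominal, actual in points:
    print(nominal, actual)
```

For a correctly sized test the points lie close to the 45 degree line; a test that over-rejects produces points above it.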
Figures 3–6 are the p-value plots for the Sargan tests for the GMM
estimators in the univariate model for $x_{it}$, (11.2). When $\rho = 0.5$, the
distributions of the test statistics are all very close to the asymptotic
distribution, with a slight over-rejection when T = 8. When the series are
persistent, $\rho = 0.95$, the tests over-reject, especially for larger T, with
the Dif-Sar test having the largest size distortion when T = 4.
Figures 7–14 present the p-value plots for the Sargan test statistics for the
multivariate dynamic panel data model (11.1). These appear to be well behaved
in the case with $\alpha = 0.5$ and $\rho = 0.5$. In general, the Dif-Sar test
is oversized when either y or x or both are persistent. An interesting case is
when $\alpha = 0.5$, $\rho = 0.95$ and T = 8. The $Sar_s$ and Dif-Sar tests are
considerably oversized in this case, whereas the $Sar_d$ test has the correct
size.
Fig. 3. p-value plot, $\rho$ = 0.5, T = 4.
Fig. 4. p-value plot, $\rho$ = 0.95, T = 4.
Fig. 5. p-value plot, $\rho$ = 0.5, T = 8.
Fig. 6. p-value plot, $\rho$ = 0.95, T = 8.
Fig. 7. $\alpha$ = 0.5, $\rho$ = 0.5, T = 4.
Fig. 8. $\alpha$ = 0.5, $\rho$ = 0.95, T = 4.
Fig. 9. $\alpha$ = 0.5, $\rho$ = 0.5, T = 8.
Fig. 10. $\alpha$ = 0.5, $\rho$ = 0.95, T = 8.
Fig. 11. $\alpha$ = 0.95, $\rho$ = 0.5, T = 4.
Fig. 12. $\alpha$ = 0.95, $\rho$ = 0.95, T = 4.
Fig. 13. $\alpha$ = 0.95, $\rho$ = 0.5, T = 8.
Fig. 14. $\alpha$ = 0.95, $\rho$ = 0.95, T = 8.
12. AN APPLICATION: THE COBB-DOUGLAS PRODUCTION FUNCTION

As Griliches and Mairesse (1998) have argued, the estimation of production
functions has highlighted the poor performance of standard GMM estimators
for short panels. Here we use the problem of estimating production function
parameters to evaluate the practical significance of the alternative estimators
reviewed in this chapter. In particular attention is focused on the estimation of
the Cobb-Douglas production function

$$y_{it} = \beta_n n_{it} + \beta_k k_{it} + \gamma_t + (\eta_i + v_{it} + m_{it})$$
$$v_{it} = \rho v_{i,t-1} + e_{it}, \qquad |\rho| < 1 \qquad (12.1)$$
$$e_{it}, m_{it} \sim MA(0),$$

where $y_{it}$ is log sales of firm i in year t, $n_{it}$ is log employment,
$k_{it}$ is log capital stock and $\gamma_t$ is a year-specific intercept
reflecting, for example, a common technology shock. Of the error components,
$\eta_i$ is an unobserved time-invariant firm-specific effect, $v_{it}$ is a
possibly autoregressive (productivity) shock and $m_{it}$ reflects serially
uncorrelated (measurement) errors. Constant returns to scale would imply
$\beta_n + \beta_k = 1$, but this is not necessarily imposed.
Interest is in the consistent estimation of the parameters
$(\beta_n, \beta_k, \rho)$ when the number of firms (N) is large and the number
of years (T) is fixed. We maintain that both employment ($n_{it}$) and capital
($k_{it}$) are potentially correlated with the firm-specific effects
($\eta_i$), and with both productivity shocks ($e_{it}$) and measurement errors
($m_{it}$).
The model has a dynamic (common factor) representation

$$y_{it} = \beta_n n_{it} - \rho\beta_n n_{i,t-1} + \beta_k k_{it} - \rho\beta_k k_{i,t-1} + \rho y_{i,t-1} + (\gamma_t - \rho\gamma_{t-1}) + (\eta_i(1-\rho) + e_{it} + m_{it} - \rho m_{i,t-1}) \qquad (12.2)$$

or

$$y_{it} = \pi_1 n_{it} + \pi_2 n_{i,t-1} + \pi_3 k_{it} + \pi_4 k_{i,t-1} + \pi_5 y_{i,t-1} + \gamma_t^{*} + (\eta_i^{*} + w_{it}) \qquad (12.3)$$

subject to two non-linear (common factor) restrictions $\pi_2 = -\pi_1\pi_5$
and $\pi_4 = -\pi_3\pi_5$. Given consistent estimates of the unrestricted
parameter vector $\pi = (\pi_1, \pi_2, \pi_3, \pi_4, \pi_5)$ and
$\mathrm{var}(\pi)$, these restrictions can be (tested and) imposed using
minimum distance to obtain the restricted parameter vector
$(\beta_n, \beta_k, \rho)$. Notice that $w_{it} = e_{it} \sim MA(0)$ if there
are no measurement errors ($\mathrm{var}(m_{it}) = 0$), and
$w_{it} \sim MA(1)$ otherwise.
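The mapping between the restricted and unrestricted parameters, and the two common factor restrictions, can be written down directly. The sketch below is only an illustration of that mapping with hypothetical parameter values; it does not implement the minimum distance estimator itself, and the function names are assumptions.

```python
def unrestricted_from_structural(beta_n, beta_k, rho):
    """Coefficients on n_t, n_{t-1}, k_t, k_{t-1}, y_{t-1} implied by (12.2)."""
    return (beta_n, -rho * beta_n, beta_k, -rho * beta_k, rho)

def comfac_violations(pi):
    """Deviations from the two common factor restrictions; both are zero if they hold."""
    pi1, pi2, pi3, pi4, pi5 = pi
    return (pi2 + pi1 * pi5, pi4 + pi3 * pi5)

# Hypothetical structural parameters: the implied coefficients satisfy the restrictions.
pi_implied = unrestricted_from_structural(beta_n=0.6, beta_k=0.3, rho=0.5)
print(pi_implied, comfac_violations(pi_implied))

# Hypothetical unrestricted estimates: the restrictions need not hold exactly.
print(comfac_violations((0.5, -0.1, 0.2, -0.1, 0.4)))
```

A minimum distance step would choose the three restricted parameters to make the implied coefficients as close as possible to the unrestricted estimates, weighting by the inverse of their estimated variance matrix.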
12.1. Data and Results

The data used is a balanced panel of 509 R&D-performing U.S. manufacturing
companies observed for 8 years, 1982–89. These data were kindly made
available to us by Bronwyn Hall, and are similar to those used in Mairesse &
Hall (1996), although the sample of 509 firms used here is larger than the final
sample of 442 firms used in Mairesse & Hall (1996). Capital stock and
employment are measured at the end of the firm's accounting year, and sales is
used as a proxy for output. Further details of the data construction can be found
in Mairesse & Hall (1996).
Table 6 reports results for the basic production function, not imposing
constant returns to scale, for a range of estimators. We report results for both
the unrestricted model (12.3) and the restricted model (12.1), where the
common factor restrictions are tested and imposed using minimum distance.14
We report results here for the one-step GMM estimators, for which inference
based on the asymptotic variance matrix has been found to be more reliable
than for the (asymptotically) more efficient two-step estimator. Simulations
suggest that the loss in precision that results from not using the optimal weight
matrix is unlikely to be large (cf. Blundell & Bond, 1998).
As expected in the presence of firm-specific effects, OLS levels appears to
give an upwards-biased estimate of the coefficient on the lagged dependent
variable, whilst Within Groups appears to give a downwards-biased estimate of
this coefficient. Note that even using OLS, we reject the hypothesis that
$\rho = 1$, and even using Within Groups we reject the hypothesis that
$\rho = 0$. Although the pattern of signs on current and lagged regressors in
the unrestricted models is consistent with the AR(1) error-component
specification, the common factor restrictions are rejected for both these
estimators. They also reject constant returns to scale.15
The validity of lagged levels dated $t-2$ as instruments in the
first-differenced equations is clearly rejected by the Sargan test of overidentifying
restrictions. This is consistent with the presence of measurement errors.
Instruments dated $t-3$ (and earlier) are accepted, and the test of common
factor restrictions is easily passed in these first-differenced GMM results.
However the estimated coefficient on the lagged dependent variable is barely
higher than the Within Groups estimate. Indeed the differenced GMM
parameter estimates are all very close to the Within Groups results. The
estimate of $\beta_k$ is low and statistically weak, and the constant returns to scale
restriction is rejected.
Table 6. Production Function Estimates

              OLS       Within    DIF       DIF       SYS       SYS
              Levels    Groups    t-2       t-3       t-2       t-3

$n_t$         0.479     0.488     0.513     0.499     0.629     0.472
              (0.029)   (0.030)   (0.089)   (0.101)   (0.106)   (0.112)
$n_{t-1}$     0.423     0.023     0.073     0.147     0.092     0.278
              (0.031)   (0.034)   (0.093)   (0.113)   (0.108)   (0.120)
$k_t$         0.235     0.177     0.132     0.194     0.361     0.398
              (0.035)   (0.034)   (0.118)   (0.154)   (0.129)   (0.152)
$k_{t-1}$     0.212     0.131     0.207     0.105     0.326     0.209
              (0.035)   (0.025)   (0.095)   (0.110)   (0.104)   (0.119)
$y_{t-1}$     0.922     0.404     0.326     0.426     0.462     0.602
              (0.011)   (0.029)   (0.052)   (0.079)   (0.051)   (0.098)

m1            2.60      8.89      6.21      4.84      8.14      6.53
m2            2.06      1.09      1.36      0.69      0.59      0.35
Sar                               0.001     0.073     0.000     0.032
Dif-Sar                                               0.001     0.102

$\beta_n$     0.538     0.488     0.583     0.515     0.773     0.479
              (0.025)   (0.030)   (0.085)   (0.099)   (0.093)   (0.098)
$\beta_k$     0.266     0.199     0.062     0.225     0.231     0.492
              (0.032)   (0.033)   (0.079)   (0.126)   (0.075)   (0.074)
$\rho$        0.964     0.512     0.377     0.448     0.509     0.565
              (0.006)   (0.022)   (0.049)   (0.073)   (0.048)   (0.078)

Comfac        0.000     0.000     0.014     0.711     0.012     0.772
CRS           0.000     0.000     0.000     0.006     0.922     0.641

Asymptotic standard errors in parentheses. Year dummies included in all models. m1 and m2 are
tests for first- and second-order serial correlation, asymptotically N(0, 1). We test the levels
residuals for OLS levels, and the first-differenced residuals in all other columns.
Comfac is a minimum distance test of the non-linear common factor restrictions imposed in the
restricted models. P-values are reported (also for Sar and Dif-Sar). CRS is a Wald test of the
constant returns to scale hypothesis $\beta_n + \beta_k = 1$ in the restricted models. P-values are reported.
For the one-step GMM estimators, $t-s$ indicates that levels of the three series (y, n, k) dated
$t-s$ and all observed longer lags are used as instruments for the first-differenced equations. SYS
estimators use lagged differences of the three series dated $t-s+1$ as instruments for the levels
equations.
Source: Blundell & Bond (2000).

The validity of lagged levels dated $t-3$ (and earlier) as instruments in the
first-differenced equations, combined with lagged first-differences dated $t-2$
as instruments in the levels equations, appears to be marginal in the system
GMM estimator. However we have seen that these tests do have some tendency
to overreject in samples of this size. Moreover the Dif-Sar statistic that
specifically tests the additional moment conditions used in the levels equations
accepts their validity at the 10% level. The system GMM parameter estimates
appear to be reasonable. The estimated coefficient on the lagged dependent
variable is higher than the Within Groups estimate, but well below the OLS
levels estimate. The common factor restrictions are easily accepted, and the
estimate of $\beta_k$ is both higher and better determined than the differenced GMM
estimate. The constant returns to scale restriction is easily accepted in the
system GMM results.16
Blundell & Bond (2000) explore these data in more detail and conclude that
the system GMM estimates in the final column of Table 6 are their preferred
results. In particular they find that the individual series used here are highly
persistent, and that the instruments available for the first-differenced equations
are only weakly correlated with the explanatory variables in first-differences.
This is consistent with the similarity between the first-differenced GMM and
Within Groups results. Blundell & Bond (2000) also find that when constant
returns to scale is imposed on the production function (it is not rejected in the
preferred system GMM results), the results obtained using the
first-differenced GMM estimator become more similar to the system GMM
estimates.
13. SUMMARY AND CONCLUSIONS

The aim of this chapter has been to review developments in the recent literature
which have tried to improve on the poor performance of the standard
first-differenced GMM estimator for highly autoregressive panel series by using
additional moment conditions. In particular, we discuss the use of the system
GMM estimator that relies on relatively mild restrictions on the initial
conditions process. This system GMM estimator encompasses the GMM
estimator based on the non-linear moment conditions available in the dynamic
error components model and has substantial asymptotic efficiency gains
relative to this non-linear GMM estimator. The chapter systematically sets out
the assumptions required and moment conditions used by each estimator and
provides a Monte Carlo simulation comparison as well as an application to
production function estimation.
The simulation results are the first in the literature to consider the properties
of these GMM estimators in dynamic models with endogenous regressors. Our
analysis suggests that similar issues arise in this case to those that have been
found in previous Monte Carlo studies for the AR(1) model. In particular, we
find both a large bias and very low precision for the standard first-differenced
estimator when the individual series are highly persistent. By exploiting
instruments available for the equations in levels, the system GMM estimator
can both greatly improve the precision and greatly reduce the finite sample bias
when these additional moment conditions are valid. Intermediate results are
found for the non-linear GMM estimator considered, which suggests that this
estimator could also be useful in applications with persistent series where the
validity of the initial conditions restrictions required for the system GMM
estimator is rejected.
The empirical application uses company accounts data for the US to estimate
a simple Cobb-Douglas production function. For the standard GMM estimator
that uses moment conditions only for the first-differenced equations, we
confirm the problems noted by Griliches and Mairesse: the estimated
coefficient on capital is very low, all coefficient estimates are imprecise, and
constant returns to scale is easily rejected. We notice that the first-differenced
GMM results are similar to the Within Groups results, which suggests there
may be a problem of weak instruments. This suggestion is consistent with the
persistence of the underlying sales, employment and capital stock series. The
additional moment conditions used by the system GMM estimator are not
rejected in this context, and lead to a marked improvement in the empirical
results.
Taken together, these Monte Carlo and empirical results suggest that careful
consideration of the underlying series and comparisons between different panel
data estimators can be useful in detecting situations where the standard
first-differenced GMM estimator is likely to be subject to serious weak instruments
biases. Where appropriate, the use of the system GMM estimator offers a
simple and powerful alternative that can overcome many of the disappointing
features of the standard first-differenced GMM estimator in the context of
highly persistent series.
ACKNOWLEDGMENTS
This research is part of the programme of research at the ESRC Centre for the
Micro-Economic Analysis of Fiscal Policy at IFS. Financial support from the
ESRC is gratefully acknowledged.
NOTES
1. All of the estimators discussed and their properties extend in an obvious fashion
to higher order autoregressive models.
2. Extensions to dynamic models with additional regressors are considered in
Section 9.
3. With $T = 3$, the absence of serial correlation in $v_{it}$ (2.5) and predetermined initial
conditions (2.6) are required to identify $\alpha$ (in the absence of any strictly exogenous
instruments). With $T > 3$, $\alpha$ can be identified in the presence of suitably low order
moving average autocorrelation in $v_{it}$.
4. These estimators are all based on the normalisation (2.3). Alonso-Borrego &
Arellano (1999) consider a symmetrically normalised instrumental variable estimator
based on the normalisation invariance of the standard LIML estimator.
5. As a choice of $W_N$ to yield the initial consistent estimator, Arellano & Bond
(1991) suggest
$$W_N = \left( \frac{1}{N} \sum_{i=1}^{N} Z_{di}' H_d Z_{di} \right)^{-1}$$
where $H_d$ is the $(T-2) \times (T-2)$ matrix given by
$$H_d = \begin{pmatrix}
2 & -1 & 0 & \cdots & 0 \\
-1 & 2 & -1 & \cdots & 0 \\
0 & -1 & 2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 2
\end{pmatrix}$$
which can be calculated in one step. The use of this $H_d$ matrix accounts for the
first-order moving average structure in $\Delta u_{it}$ induced by the first-differencing transformation.
Note that when the $v_{it}$ are i.i.d., the one-step and two-step estimators are asymptotically
equivalent in this model. We follow this suggestion in the Monte Carlo simulations in
Section 11.
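The construction in note 5 can be sketched in a few lines. The example below builds the tridiagonal matrix described there and the corresponding one-step weight matrix from a list of per-individual instrument matrices. The function names are hypothetical, and the instrument matrices are treated as generic arrays with T - 2 rows; how they are assembled from lagged levels is not shown here.

```python
import numpy as np

def h_d(T):
    """(T-2) x (T-2) matrix with 2 on the diagonal and -1 on the first off-diagonals."""
    m = T - 2
    return 2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)

def one_step_weight(Z_list, T):
    """Inverse of (1/N) * sum_i Z_i' H_d Z_i, with each Z_i of shape (T-2, q)."""
    H = h_d(T)
    A = sum(Z.T @ H @ Z for Z in Z_list) / len(Z_list)
    return np.linalg.inv(A)

# The same matrix arises as D D', where D takes first differences of T-1 values;
# this is the covariance pattern of first-differenced i.i.d. errors with unit variance.
T = 5
D = np.diff(np.eye(T - 1), axis=0)
H_from_differencing = D @ D.T
```

Because the weight matrix does not depend on any estimated parameters, it is available before any estimation takes place, which is why the resulting estimator can be computed in one step.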
6. As shown by Arellano & Bover (1995), OLS on the model transformed to
orthogonal deviations coincides with the Within Groups estimator.
7. In this section we focus only on moment conditions that are valid under
heteroskedasticity. The case with homoskedasticity and assumption (6.1) is considered
in Section 7.2.
8. This corrects the expression for plim $\hat{\alpha}_l$ as given in Blundell and Bond (1998,
p. 125).
9. As a choice of $W_N$ to yield the initial consistent estimator, we use
$$W_N = \left( \frac{1}{N} \sum_{i=1}^{N} Z_{li}' Z_{li} \right)^{-1}$$
in the Monte Carlo simulations reported below.

10. The use of moment conditions $E(u_{it} \Delta y_{i,t-s}) = 0$ for $s > 1$ can be shown to be
redundant, given (7.1) and (7.2). For balanced panels, the $T-2$ equations in levels may
be replaced by a single levels equation for period $T$, with (7.2) replaced by the
equivalent moment conditions $E(u_{iT} \Delta y_{i,T-s}) = 0$ for $s = 1, \ldots, T-2$. However this
approach does not extend easily to the case of unbalanced panels.
11. For an analysis of the potential loss in efficiency due to specific choices of the
initial weight matrix for these system estimators, see Windmeijer (2000). As a choice
of $W_N$ to yield the initial consistent estimator, we use
$$W_N = \left( \frac{1}{N} \sum_{i=1}^{N} Z_{si}' H_s Z_{si} \right)^{-1}$$
in our Monte Carlo simulations, where $H_s$ is the matrix
$$H_s = \begin{pmatrix} H_d & 0 \\ 0 & I_{T-2} \end{pmatrix},$$
$I_{T-2}$ is the $(T-2)$ identity matrix and $H_d$ is defined in Section 3.

12. Here we only consider moment conditions that do not require any homoskedasticity assumptions.
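13. Define $s_i = [u_{i3} - u_{i2}, \ldots, u_{iT} - u_{iT-1}, u_{i4}(u_{i3} - u_{i2}), \ldots, u_{iT}(u_{iT-1} - u_{iT-2})]'$ and
$$Z_{nli} = \begin{pmatrix} Z_{di} & 0 \\ 0 & I_{T-3} \end{pmatrix},$$
then the non-linear moment conditions can be written as
$E[Z_{nli}' s_i] = 0$. As an initial weight matrix we use
$$W_N = \left( \frac{1}{N} \sum_{i=1}^{N} Z_{nli}' Z_{nli} \right)^{-1},$$
see Meghir & Windmeijer (1999).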
14. The unrestricted results are computed using DPD98 for GAUSS (see Arellano &
Bond, 1998).
15. The table reports p-values from minimum distance tests of the common factor
restrictions and Wald tests of the constant returns to scale restrictions.
16. One puzzle is that we find little evidence of second-order serial correlation in the
first-differenced residuals (i.e. an MA(1) component in the error term in levels),
although the use of instruments dated $t-2$ is strongly rejected. It may be that the $e_{it}$
productivity shocks are also MA(1), in a way that happens to offset the appearance of
serial correlation that would otherwise result from measurement errors.
REFERENCES
Alonso-Borrego, C., & Arellano, M. (1999). Symmetrically Normalised Instrumental-Variable
Estimation using Panel Data. Journal of Business and Economic Statistics, 17, 36–49.
Anderson, T. W., & Hsiao, C. (1981). Estimation of Dynamic Models with Error Components.
Journal of the American Statistical Association, 76, 598–606.
Arellano, M., & Bond, S. R. (1991). Some Tests of Specification for Panel Data: Monte Carlo
Evidence and an Application to Employment Equations. Review of Economic Studies, 58,
277–297.
Arellano, M., & Bond, S. R. (1998). Dynamic Panel Data Estimation using DPD98 for GAUSS.
http://www.ifs.org.uk/staff/steve_b.shtml.
Arellano, M., & Bover, O. (1995). Another Look at the Instrumental-Variable Estimation of
Error-Components Models. Journal of Econometrics, 68, 29–52.
Baltagi, B. H. (1995). Econometric Analysis of Panel Data. Chichester: Wiley.
Bhargava, A., & Sargan, J. D. (1983). Estimating Dynamic Random Effects Models from Panel
Data Covering Short Time Periods. Econometrica, 51, 1635–1659.
Blundell, R. W., & Bond, S. R. (1998). Initial Conditions and Moment Restrictions in Dynamic
Panel Data Models. Journal of Econometrics, 87, 115–143.
Blundell, R. W., & Bond, S. (2000). GMM Estimation with Persistent Panel Data: An Application
to Production Functions. Econometric Reviews, 19(3), 321–340.
Bound, J., Jaeger, D. A., & Baker, R. M. (1995). Problems with Instrumental Variables Estimation
when the Correlation between the Instruments and the Endogenous Explanatory Variable is
Weak. Journal of the American Statistical Association, 90, 443–450.
Chamberlain, G. (1987). Asymptotic Efficiency in Estimation with Conditional Moment
Restrictions. Journal of Econometrics, 34, 305–334.
Davidson, R., & MacKinnon, J. G. (1996). Graphical Methods for Investigating the Size and
Power of Hypothesis Tests. Manchester School, 66, 1–26.
Griliches, Z., & Mairesse, J. (1998). Production Functions: the Search for Identification. In: S.
Strom (Ed.), Essays in Honour of Ragnar Frisch. Econometric Society Monograph Series,
Hansen, L. P. (1982). Large Sample Properties of Generalised Method of Moment Estimators.
Holtz-Eakin, D., Newey, W., & Rosen, H. S. (1988). Estimating Vector Autoregressions with Panel
Mairesse, J., & Hall, B. H. (1996). Estimating the Productivity of Research and Development in
French and US Manufacturing Firms: An Exploration of Simultaneity Issues with GMM
Methods. In: K. Wagner & B. Van Ark (Eds), International Productivity Differences and
Their Explanations (pp. 285–315). Elsevier Science.
Meghir, C., & Windmeijer, F. (1999). Moment Conditions for Dynamic Panel Data Models with
Multiplicative Individual Effects in the Conditional Variance. Annales d'Économie et de
Statistique, 55/56, 317–330.
Nelson, C. R., & Startz, R. (1990a). Some Further Results on the Exact Small Sample Properties
of the Instrumental Variable Estimator. Econometrica, 58, 967–976.
Nelson, C. R., & Startz, R. (1990b). The Distribution of the Instrumental Variable Estimator and
Its t-ratio When the Instrument is A Poor One. Journal of Business,
63, S125–S140.
Nickell, S. J. (1981). Biases in Dynamic Models with Fixed Effects. Econometrica, 49,
1417–1426.
Sargan, J. D. (1958). The Estimation of Economic Relationships Using Instrumental Variables.
Staiger, D., & Stock, J. H. (1997). Instrumental Variables Regression with Weak Instruments.
Windmeijer, F. (2000). Efficiency Comparisons for a System GMM Estimator in Dynamic Panel
Data Models. In: R. D. H. Heijmans, D. S. G. Pollock & A. Satorra (Eds), Innovations in
Multivariate Statistical Analysis. A Festschrift for Heinz Neudecker (pp. 175–184). Kluwer
Academic Publishers.
FULLY MODIFIED OLS FOR HETEROGENEOUS COINTEGRATED PANELS
Peter Pedroni
ABSTRACT
This chapter uses fully modified OLS principles to develop new methods
for estimating and testing hypotheses for cointegrating vectors in dynamic
panels in a manner that is consistent with the degree of cross sectional
heterogeneity that has been permitted in recent panel unit root and panel
cointegration studies. The asymptotic properties of various estimators are
compared based on pooling along the within and between dimensions
of the panel. By using Monte Carlo simulations to study the small sample
properties, the group mean estimator is shown to behave well even in
relatively small samples under a variety of scenarios.
I. INTRODUCTION
In this chapter we develop methods for estimating and testing hypotheses for
cointegrating vectors in dynamic time series panels. In particular we propose
methods based on fully modified OLS principles which are able to
accommodate considerable heterogeneity across individual members of the
panel. Indeed, one important advantage to working with a cointegrated panel
approach of this type is that it allows researchers to selectively pool the long
run information contained in the panel while permitting the short run dynamics
ISBN: 0-7623-0688-2
and fixed effects to be heterogeneous among different members of the panel.
An important convenience of the fully modified approach that we propose here
is that in addition to producing asymptotically unbiased estimators, it also
produces nuisance parameter free standard normal distributions. In this way,
inferences can be made regarding common long run relationships which are
asymptotically invariant to the considerable degree of short run heterogeneity
that is prevalent in the dynamics typically associated with panels that are
composed of aggregate national data.
A. Nonstationary Panels and Heterogeneity
Methods for nonstationary time series panels, including unit root and
cointegration tests, have been gaining increased acceptance in a number of
areas of empirical research. Early examples include Canzoneri, Cumby & Diba
(1996), Chinn & Johnson (1996), Chinn (1997), Evans & Karras (1996),
Neusser & Kugler (1998), Obstfeld & Taylor (1996), Oh (1996), Papell (1997),
Pedroni (1996b), Taylor (1996) and Wu (1996), with many more since. These
studies have for the most part been limited to applications which simply ask
whether or not particular series appear to contain unit roots or are cointegrated.
In many applications, however, it is also of interest to ask whether or not
common cointegrating vectors take on particular values. In this case, it would
be helpful to have a technique that allows one to test such hypotheses about the
cointegrating vectors in a manner that is consistent with the very general degree
of cross sectional heterogeneity that is permitted in such panel unit root and
panel cointegration tests.
In general, the extension of conventional nonstationary methods such as unit
root and cointegration tests to panels with both cross section and time series
dimensions holds considerable promise for empirical research considering the
abundance of data which is available in this form. In particular, such methods
provide an opportunity for researchers to exploit some of the attractive
theoretical properties of nonstationary regressions while addressing in a natural
and direct manner the small sample problems that have in the past often
hindered the practical success of these methods. For example, it is well known
that superconsistent rates of convergence associated with many of these
methods can provide empirical researchers with an opportunity to circumvent
more traditional exogeneity requirements in time series regressions. Yet the low
power of many of the associated statistics has often impeded the ability to take
full advantage of these properties in small samples. By allowing data to be
pooled in the cross sectional dimension, nonstationary panel methods have the
potential to improve upon these small sample limitations. Conversely, the use
of nonstationary time series asymptotics provides an opportunity to make panel
methods more amenable to pooling aggregate level data by allowing
researchers to selectively pool the long run information contained in the panel,
while allowing the short run dynamics to be heterogeneous among different
members of the panel.
Initial methodological work on nonstationary panels focused on testing for
unit roots in univariate panels. Quah (1994) derived standard normal
asymptotic distributions for testing unit roots in homogeneous panels as both
the time series and cross sectional dimensions grow large. Levin & Lin (1993)
derived distributions under more general conditions that allow for heterogeneous fixed effects and time trends. More recently, Im, Pesaran & Shin (1995)
study the small sample properties of unit root tests in panels with
heterogeneous dynamics and propose alternative tests based on group mean
statistics. In practice however, empirical work often involves relationships
within multivariate systems. Toward this end, Pedroni (1993, 1995) studies the
properties of spurious regressions and residual based tests for the null of no
cointegration in dynamic heterogeneous panels. This chapter continues this line
of research by proposing a convenient method for estimating and testing
hypotheses about common cointegrating vectors in a manner that is consistent
with the degree of heterogeneity permitted in these panel unit root and panel
cointegration studies.
In particular, we address here two key sources of cross member
heterogeneity that are particularly important in dealing with dynamic
cointegrated panels. One such source of heterogeneity manifests itself in the
familiar fixed effects form. These reflect differences in mean levels among the
variables of different individual members of the panel and we model these by
including individual specific intercepts. The second key source of heterogeneity in such panels comes from differences in the way that individuals respond
to short run deviations from equilibrium cointegrating vectors that develop in
response to stochastic disturbances. In keeping with earlier panel unit root and
panel cointegration papers, we model this form of heterogeneity by allowing
the associated serial correlation properties of the error processes to vary across
individual members of the panel.
B. Related Literature
Since the original version of this paper, Pedroni (1996a),1 many more papers
have contributed to our understanding of hypothesis testing in cointegrating
panels. For example, Kao & Chiang (1997) extended their original paper on the
least squares dummy variable model in cointegrated panels, Kao & Chen
(1995), to include a comparison of the small sample properties of a dynamic
OLS estimator with other estimators including a FMOLS estimator similar to
Pedroni (1996a). Specifically, Kao & Chiang (1997) demonstrated that a panel
dynamic OLS estimator has the same asymptotic distribution as the type of
panel FMOLS estimator derived in Pedroni (1996a) and showed that the small
sample size distortions for such an estimator were often smaller than certain
forms of the panel FMOLS estimator. The asymptotic theory in these earlier
papers was generally based on sequential limit arguments (allowing the
sample sizes T and N to grow large sequentially), whereas Phillips & Moon
(1999) subsequently provided a rigorous and more general study of the limit
theory in nonstationary panel regressions under joint convergence (allowing T
and N to grow large concurrently). Phillips & Moon (1999) also provided a set
of regularity conditions under which convergence in sequential limits implies
convergence in joint limits, and considered these properties in the context of a
FMOLS estimator, although they do not specifically address the small sample
properties of feasible versions of the estimators. More recently, Mark & Sul
(1999) also study a similar form of the panel dynamic OLS estimator first
proposed by Kao & Chiang (1997). They compare the small sample properties
of a weighted versus unweighted version of the estimator and find that the
unweighted version generally exhibits smaller size distortion than the weighted
version.
In this chapter we report new small sample results for the group mean panel
FMOLS estimator that was originally proposed in Pedroni (1996a). An
advantage of the group mean estimator over the other pooled panel FMOLS
estimators proposed in Pedroni (1996a) is that the t-statistic for this
estimator allows for a more flexible alternative hypothesis. This is because the
group mean estimator is based on the so-called between dimension of the
panel, while the pooled estimators are based on the within dimension of the
panel. Accordingly, the group mean panel FMOLS provides a consistent test of
a common value for the cointegrating vector under the null hypothesis against
values of the cointegrating vector that need not be common under the
alternative hypothesis, while the pooled within dimension estimators do not.
Furthermore, as Pesaran & Smith (1995) argue in the context of OLS
regressions, when the true slope coefficients are heterogeneous, group mean
estimators provide consistent point estimates of the sample mean of the
heterogeneous cointegrating vectors, while pooled within dimension estimators
do not. Rather, as Phillips & Moon (1999) demonstrate, when the true
cointegrating vectors are heterogeneous, pooled within dimension estimators
provide consistent point estimates of the average regression coefficient, not the
sample mean of the cointegrating vectors. Both of these features of the group
mean estimator are often important in practical applications.
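The distinction between the two ways of pooling can be made concrete with a small numerical sketch. It uses plain OLS slopes rather than the fully modified estimators, and noise-free data with deterministic regressors, so it only illustrates how averaging along the between dimension differs from pooling along the within dimension when slopes are heterogeneous. All names and numbers below are assumptions chosen for the example.

```python
import numpy as np

def group_mean_slope(y, x):
    """Average of member-by-member OLS slopes; y and x have shape (N, T)."""
    xd = x - x.mean(axis=1, keepdims=True)
    yd = y - y.mean(axis=1, keepdims=True)
    slopes = (xd * yd).sum(axis=1) / (xd ** 2).sum(axis=1)
    return slopes.mean()

def pooled_within_slope(y, x):
    """Single OLS slope after removing member-specific means (fixed effects)."""
    xd = x - x.mean(axis=1, keepdims=True)
    yd = y - y.mean(axis=1, keepdims=True)
    return (xd * yd).sum() / (xd ** 2).sum()

# Hypothetical panel: four members with different slopes; the last member's
# regressor varies five times as much as the others.
T = 50
betas = np.array([0.5, 1.0, 1.5, 2.0])
scales = np.array([1.0, 1.0, 1.0, 5.0])
alphas = np.array([[0.3], [-1.0], [2.0], [0.0]])
x = scales[:, None] * np.arange(T, dtype=float)
y = alphas + betas[:, None] * x          # exact relationship, no noise

gm = group_mean_slope(y, x)              # simple average of the four slopes
pw = pooled_within_slope(y, x)           # slopes weighted by regressor variation
```

In this construction the group mean slope equals the simple average of the individual slopes, 1.25, whereas the pooled within slope weights each member by the variation in its regressor and is pulled towards the slope of the fourth member.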
Finally, the implementation of the feasible form of the between dimension
group mean estimator also has advantages over the other estimators in the
presence of heterogeneity of the residual dynamics around the cointegrating
vector. As was demonstrated in Pedroni (1996a), in the presence of such
heterogeneity, the pooled panel FMOLS estimator requires a correction term
that depends on the true cointegrating vector. For a specific null value for a
cointegrating vector, the t-statistic is well defined, but of course this is of little
use per se when one would like to estimate the cointegrating vector. One
solution is to obtain a preliminary estimate of the cointegrating vector using
OLS. However, although the OLS estimator is superconsistent, it still contains
a second order bias in the presence of endogeneity, which is not eliminated
asymptotically. Accordingly, this bias leads to size distortion, which is not
necessarily eliminated even when the sample size grows large in the panel
dimension. Consequently, this type of approach based on a first stage OLS
estimate was not recommended in Pedroni (1996a), and it is not surprising that
Monte Carlo simulations have shown large size distortions for such estimators.
Even when the null hypothesis was imposed without using an OLS estimator,
the size distortions for this type of estimator were large as reported in Pedroni
(1996a). Similarly, Kao & Chiang (1997) also found large size distortions for
such estimators when OLS estimates were used in the first stage for the
correction term. By contrast, the feasible version of the between dimension
group mean based estimator does not suffer from these difficulties, even in the
presence of heterogeneous dynamics. As we will see, the size distortions for
this estimator are minimal, even in panels of relatively modest dimensions.
The remainder of the chapter is structured as follows. In Section 2, we
introduce the econometric models of interest for heterogeneous cointegrated
panels. We then present a number of theoretical results for estimators designed
to be asymptotically unbiased and to provide nuisance parameter free
asymptotic distributions which are standard normal when applied to heterogeneous cointegrated panels and can be used to test hypotheses regarding
common cointegrating vectors in such panels. In Section 3 we study the small
sample properties of these estimators and propose feasible FMOLS statistics
that perform relatively well in realistic panels with heterogeneous dynamics. In
Section 4 we enumerate the algorithm used to construct these statistics and
briefly describe a few examples of their uses. Finally, in Section 5 we offer
conclusions and discuss a number of related issues in the ongoing research on
estimation and inference in cointegrated panels.
II. ASYMPTOTIC RESULTS FOR FULLY MODIFIED OLS IN HETEROGENEOUS COINTEGRATED PANELS
In this section we study asymptotic properties of cointegrating regressions in
dynamic panels with common cointegrating vectors and suggest how a fully
modified OLS estimator can be constructed to deal with complications
introduced by the presence of parameter heterogeneity in the dynamics and
fixed effects across individual members. We begin, however, by discussing the
basic form of a cointegrating regression in such panels and the problems
associated with unmodified OLS estimators.
A. Cointegrating Regressions in Heterogeneous Panels
Consider the following cointegrated system for a panel of $i = 1, \ldots, N$
members,
$$y_{it} = \alpha_i + \beta x_{it} + \mu_{it}$$
$$x_{it} = x_{it-1} + \epsilon_{it}$$
(1)
where the vector error process $\xi_{it} = (\mu_{it}, \epsilon_{it})'$ is stationary with asymptotic
covariance matrix $\Omega_i$. Thus, the variables $x_i$, $y_i$ are said to cointegrate for each
member of the panel, with cointegrating vector $\beta$ if $y_{it}$ is integrated of order
one. The term $\alpha_i$ allows the cointegrating relationship to include member
specific fixed effects. In keeping with the cointegration literature, we do not
require exogeneity of the regressors. As usual, $x_i$ can in general be an $m$
dimensional vector of regressors, which are not cointegrated with each other. In
this case, we partition $\xi_{it} = (\mu_{it}, \epsilon_{it}')'$ so that the first element is a scalar series and
the second element is an $m$ dimensional vector of the differences in the
regressors $\epsilon_{it} = x_{it} - x_{it-1} = \Delta x_{it}$, so that when we construct
$$\Omega_i = \begin{pmatrix} \Omega_{11i} & \Omega_{21i}' \\ \Omega_{21i} & \Omega_{22i} \end{pmatrix}$$
(2)
then $\Omega_{11i}$ is the scalar long run variance of the residual $\mu_{it}$, and $\Omega_{22i}$ is the $m \times m$
long run covariance among the $\epsilon_{it}$, and $\Omega_{21i}$ is an $m \times 1$ vector that gives the long
run covariance between the residual $\mu_{it}$ and each of the $\epsilon_{it}$. However, for
simplicity and convenience of notation, we will refer to $x_i$ as univariate in the
remainder of this chapter. Each of the results of this study generalizes in an
obvious and straightforward manner to the vector case, unless otherwise
indicated.2
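To fix ideas, the system in (1) can be simulated for a univariate regressor. The sketch below is only one possible data generating process: the MA(1) error dynamics, the range of the member-specific MA coefficients and the 0.5 loading that makes the regressor endogenous are hypothetical choices, not the design used in the Monte Carlo experiments of this chapter.

```python
import numpy as np

def simulate_panel(N, T, beta, rng):
    """Simulate y_it = alpha_i + beta * x_it + mu_it with x_it a random walk."""
    alpha = rng.normal(size=N)                # member-specific fixed effects
    theta = rng.uniform(-0.4, 0.4, size=N)    # hypothetical member-specific MA(1) coefficients
    e = rng.normal(size=(N, T + 1, 2))        # white-noise innovations
    # Stationary, serially correlated errors whose dynamics differ across members.
    ma = e[:, 1:, :] + theta[:, None, None] * e[:, :-1, :]
    eps = ma[:, :, 1]                         # innovations to the regressor
    mu = ma[:, :, 0] + 0.5 * eps              # correlated with eps: regressor is not exogenous
    x = np.cumsum(eps, axis=1)                # x_it = x_{i,t-1} + eps_it, starting from zero
    y = alpha[:, None] + beta * x + mu
    return y, x, alpha, mu, eps

y, x, alpha, mu, eps = simulate_panel(N=5, T=100, beta=2.0, rng=np.random.default_rng(1))
```

Each member shares the cointegrating slope but has its own intercept and its own short run dynamics, which is the form of heterogeneity discussed in the text.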
In order to explore the asymptotic properties of estimators as both the cross
sectional dimension, N, and the time series dimension, T, grow large, we will
make assumptions similar in spirit to Pedroni (1995) regarding the degree of
dependency across both these dimensions. In particular, for the time series
dimension, we will assume that the conditions of the multivariate functional
central limit theorems used in Phillips & Durlauf (1986) and Park & Phillips
(1988), hold for each member of the panel as the time series dimension grows
large. Thus, we have
Assumption 1.1 (invariance principle): The process $\xi_{it}$ satisfies a multivariate
functional central limit theorem such that the convergence as $T \to \infty$ for the
partial sum $\frac{1}{\sqrt{T}} \sum_{t=1}^{[Tr]} \xi_{it} \to B_i(r, \Omega_i)$ holds for any given member, $i$, of the panel,
where $B_i(r, \Omega_i)$ is Brownian motion defined over the real interval $r \in [0,1]$, with
asymptotic covariance $\Omega_i$.
This assumption indicates that the multivariate functional central limit theorem,
or invariance principle, holds over time for any given member of the panel. This
places very little restriction on the temporal dependency and heterogeneity of
the error process, and encompasses for example a broad class of stationary
ARMA processes. It also allows the serial correlation structure to be different
for individual members of the panel. Specifically, the asymptotic covariance
matrix, i varies across individual members, and is given by i
limT E[T 1(Tt = 1it)(Tt = 1it)], which can also be decomposed as i =
oi + i + i, where oi is the contemporaneous covariance and i is a weighted
sum of autocovariances. The off-diagonal terms of these individual 21i
matrices capture the endogenous feedback effect between yit and xit, which is
also permitted to vary across individual members of the panel. For several of
the estimators that we propose, it will be convenient to work with a
triangularization of this asymptotic covariance matrix. Specifically, we will
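refer to this lower triangular matrix of $\Omega_i$ as $L_i$, whose elements are related as
follows
$$L_{11i} = (\Omega_{11i} - \Omega_{21i}^2/\Omega_{22i})^{1/2}, \quad L_{12i} = 0, \quad L_{21i} = \Omega_{21i}/\Omega_{22i}^{1/2}, \quad L_{22i} = \Omega_{22i}^{1/2} \qquad (2)$$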
Estimation of the asymptotic covariance matrix can be based on any one of a
number of consistent kernel estimators such as the Newey & West (1987)
estimator.
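As a small illustration of the triangularization, the following Python sketch computes the four elements from a given 2 x 2 long run covariance matrix. The function name `triangularize` and the numerical matrix are hypothetical choices made for this example only; in practice the input would be a kernel estimate of the asymptotic covariance.

```python
import math

def triangularize(omega):
    """Return (L11, L12, L21, L22) for a 2x2 long run covariance matrix omega."""
    (o11, _o12), (o21, o22) = omega
    l22 = math.sqrt(o22)                    # L22 = Omega22^(1/2)
    l21 = o21 / l22                         # L21 = Omega21 / Omega22^(1/2)
    l11 = math.sqrt(o11 - o21 ** 2 / o22)   # L11 = (Omega11 - Omega21^2/Omega22)^(1/2)
    return l11, 0.0, l21, l22

# Illustrative (made-up) long run covariance matrix.
omega = [[2.0, 0.5], [0.5, 1.0]]
l11, l12, l21, l22 = triangularize(omega)

# With L = [[L11, 0], [L21, L22]], the product L'L reproduces omega.
rebuilt = [[l11 * l11 + l21 * l21, l21 * l22],
           [l21 * l22, l22 * l22]]
```

The final lines check the algebra: with these elements the product L'L returns the original matrix, so $L_{11i}^2$ is the long run variance of the first element conditional on the second.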
Next, for the cross sectional dimension, we will employ the standard panel
data assumption of independence. Hence we have:
Assumption 1.2 (cross sectional independence): The individual processes are
assumed to be independent cross sectionally, so that $E[\xi_{it}, \xi_{jt}] = 0$ for all $i \neq j$.
More generally, the asymptotic covariance matrix for a panel of dimension
$N \times T$ is block diagonal with the ith diagonal block given by the asymptotic
covariance for member i.
This type of assumption is typical of our panel data approach, and we will be
using this condition in the formal derivation of the asymptotic distribution of
our panel cointegration statistics. For panels that exhibit common disturbances
that are shared across individual members, it will be convenient to capture this
form of cross sectional dependency by the use of a common time dummy,
which is a fairly standard panel data technique. For panels with even richer
cross sectional dependencies, one might think of estimating a full non-diagonal
$N \times N$ matrix of $\Omega_{ij}$ elements, and then premultiplying the errors by this matrix
in order to achieve cross sectional independence. This would require the time
series dimension to grow much more quickly than the cross sectional
dimension, and in most cases one hopes that a common time dummy will
suffice.
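One common way to implement a common time dummy in a balanced panel is to subtract the cross sectional average of each period from every member's series before estimation. The sketch below assumes that interpretation; the function name and the toy data are hypothetical and are not taken from the chapter.

```python
def remove_common_time_effects(panel):
    """Subtract the period-t cross sectional mean from panel[i][t] (balanced panel)."""
    n = len(panel)
    t_len = len(panel[0])
    # Cross sectional average for each time period.
    period_means = [sum(panel[i][t] for i in range(n)) / n for t in range(t_len)]
    return [[panel[i][t] - period_means[t] for t in range(t_len)] for i in range(n)]

# Toy panel with N = 2 members and T = 3 periods (made-up numbers).
demeaned = remove_common_time_effects([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]])
```

After this transformation each period's cross sectional mean is zero, so any disturbance shared equally by all members in a given period has been removed.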
While the derivations of most of the asymptotic results of this chapter are
relegated to the mathematical appendix, it is worth discussing briefly here how
we intend to make use of assumptions 1.1 and 1.2 in providing asymptotic
distributions for the panel statistics that we consider in the next two
subsections. In particular, we will employ here simple and somewhat informal
sequential limit arguments by first evaluating the limits as the T dimension
grows large for each member of the panel in accordance with assumption 1.1
and then evaluating the sums of these statistics as the N dimension grows large
under the independence assumption of 1.2.3 In this manner, as N grows large
we obtain standard distributions as we average the random functionals for each
member that are obtained in the initial step as a consequence of letting T grow
large. Consequently, we view the restriction that first $T \to \infty$ and then $N \to \infty$ as
a relatively strong restriction that ensures these conditions, and it is possible
that in many circumstances a weaker set of restrictions that allow N and T to
grow large concurrently, but with restrictions on the relative rates of growth,
might deliver similar results. In general, for heterogeneous error processes,
such restrictions on the rate of growth of N relative to T can be expected to
depend in part on the rate of convergence of the particular kernel estimators
used to eliminate the nuisance parameters, and we can expect that our iterative
$T \to \infty$ and then $N \to \infty$ requirements proxy for the fact that in practice our
asymptotic approximations will be more accurate in panels with relatively large
T dimensions as compared to the N dimension. Alternatively, under a more
pragmatic interpretation, one can simply think of letting $T \to \infty$ for fixed N
reflect the fact that typically for the panels in which we are interested, it is the
time series dimension which can be expected to grow in actuality rather than
the cross sectional dimension, which is in practice fixed. Thus, $T \to \infty$ is in a
sense the true asymptotic feature in which we are interested, and this leads to
statistics which are characterized as sums of i.i.d. Brownian motion
functionals. For practical purposes, however, we would like to be able to
characterize these statistics for the general case in which N is large, and in this
case we take $N \to \infty$ as a convenient benchmark for which to characterize the
distribution, provided that we understand $T \to \infty$ to be the dominant asymptotic
feature of the data.
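The sequential limit argument can be summarized schematically as follows. The notation $Z_{i,T}$ for a member-specific statistic and $Z_i$ for its limiting Brownian motion functional is introduced here only for illustration, and the sketch assumes that the $Z_i$ are i.i.d. across members with mean zero and finite variance.

```latex
\frac{1}{\sqrt{N}} \sum_{i=1}^{N} Z_{i,T}
\;\xrightarrow{\;T \to \infty\;}\;
\frac{1}{\sqrt{N}} \sum_{i=1}^{N} Z_{i}
\;\xrightarrow{\;N \to \infty\;}\;
N\!\left(0,\ \mathrm{Var}(Z_i)\right)
```

The first arrow uses assumption 1.1 member by member, and the second applies a standard central limit theorem across members under the independence of assumption 1.2.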
B. Asymptotic Properties of Panel OLS
Next, we consider the properties of a number of statistics that might be used for
a cointegrated panel as described by (1) under assumptions 1.1 and 1.2
regarding the time series and cross sectional dependencies in the data. The
first statistic that we examine is a standard panel OLS estimator of the
cointegrating relationship. It is well known that the conventional single
equation OLS estimator for the cointegrating vector is asymptotically biased
and that its standardized distribution is dependent on nuisance parameters
associated with the serial correlation structure of the data, and there is no
reason to believe that this would be otherwise for the panel OLS estimator. The
following proposition confirms this suspicion.4
Proposition 1.1 (Asymptotic Bias of the Panel OLS Estimator). Consider a
standard panel OLS estimator for the coefficient $\beta$ of panel (1), under
assumptions 1.1 and 1.2, given as
$$\hat{\beta}_{NT} = \left( \sum_{i=1}^{N} \sum_{t=1}^{T} (x_{it} - \bar{x}_i)^2 \right)^{-1} \sum_{i=1}^{N} \sum_{t=1}^{T} (x_{it} - \bar{x}_i)(y_{it} - \bar{y}_i)$$
where $\bar{x}_i$ and $\bar{y}_i$ refer to the individual specific means. Then,
(a) The estimator is asymptotically biased and its asymptotic distribution will
be dependent on nuisance parameters associated with the dynamics of the
underlying processes.
(b) Only for the special case in which the regressors are strictly exogenous and
the dynamics are homogeneous across members of the panel can valid
inferences be made from the standardized distribution of $\hat{\beta}_{NT}$ or its
associated t-statistic.
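For concreteness, the panel OLS estimator of proposition 1.1 can be sketched in a few lines of Python. The function name `panel_ols` and the list-of-lists data layout (one list per member) are assumptions made for this illustration.

```python
def panel_ols(y, x):
    """Pooled within-dimension OLS slope; y[i] and x[i] are the series for member i."""
    num = 0.0
    den = 0.0
    for yi, xi in zip(y, x):
        x_bar = sum(xi) / len(xi)   # individual specific mean of x
        y_bar = sum(yi) / len(yi)   # individual specific mean of y
        num += sum((a - x_bar) * (b - y_bar) for a, b in zip(xi, yi))
        den += sum((a - x_bar) ** 2 for a in xi)
    return num / den

# Noise-free toy panel with slope 2 and member specific intercepts 1 and -4.
x_demo = [[0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 2.0, 5.0]]
y_demo = [[1.0 + 2.0 * v for v in x_demo[0]], [-4.0 + 2.0 * v for v in x_demo[1]]]
beta_hat = panel_ols(y_demo, x_demo)
```

Because the individual means are removed member by member, the heterogeneous intercepts drop out and the toy example recovers the common slope. The bias described in the proposition arises only once serially correlated, endogenous errors are present.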
As the proof of proposition 1.1 given in the appendix makes clear, the source
of the problem stems from the endogeneity of the regressors under the usual
assumptions regarding cointegrated systems. While an exogeneity assumption
is common in many treatments of cross sectional panels, for dynamic
cointegrated panels such strict exogeneity is by most standards not acceptable.
It is stronger than the standard exogeneity assumption for static panels, as it
implies the absence of any dynamic feedback from the regressors at all
frequencies. Clearly, the asymptotic bias and data dependency induced by
the endogenous feedback effect cannot be expected to diminish in the
context of such panels, and Kao & Chen (1995) document this bias for a panel
of cointegrated time series for the special case in which the dynamics are
homogeneous. For the conventional time series case, a number of methods have
been devised to deal with the consequences of such endogenous feedback
effects, and in what follows we develop an approach for cointegrated panels
based on fully modified OLS principles similar in spirit to those used by
Phillips & Hansen (1990).
C. Pooled Fully Modified OLS Estimators for Heterogeneous Panels
Phillips & Hansen (1990) proposed a semi-parametric correction to the OLS
estimator which eliminates the second order bias induced by the endogeneity of
the regressors. The same principle can also be applied to the panel OLS
estimator that we have explored in the previous subsection. The key difference
in constructing our estimator for the panel data case will be to account for the
heterogeneity that is present in the fixed effects as well as in the short run
dynamics. These features lead us to modify the form of the standard single
equation fully modified OLS estimator. We will also find that the presence of
fixed effects has the potential to alter the asymptotic distributions in a nontrivial manner.
The following proposition establishes an important preliminary result which
facilitates intuition for the role of heterogeneity and the consequences of
dealing with both temporal and cross sectional dimensions for fully modified
OLS estimators.
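Proposition 1.2 (Asymptotic Distribution of the Pooled Panel FMOLS
Estimator). Consider a panel FMOLS estimator for the coefficient $\beta$ of panel
(1) given by
$$\hat{\beta}^*_{NT} - \beta = \left( \sum_{i=1}^{N} \hat{L}_{22i}^{-2} \sum_{t=1}^{T} (x_{it} - \bar{x}_i)^2 \right)^{-1} \sum_{i=1}^{N} \hat{L}_{11i}^{-1} \hat{L}_{22i}^{-1} \left( \sum_{t=1}^{T} (x_{it} - \bar{x}_i) \mu^*_{it} - T \hat{\gamma}_i \right)$$
where
$$\mu^*_{it} = \mu_{it} - \frac{\hat{L}_{21i}}{\hat{L}_{22i}} \Delta x_{it}, \qquad \hat{\gamma}_i \equiv \hat{\Gamma}_{21i} + \hat{\Omega}^o_{21i} - \frac{\hat{L}_{21i}}{\hat{L}_{22i}} (\hat{\Gamma}_{22i} + \hat{\Omega}^o_{22i})$$
and $\hat{L}_i$ is a lower triangular decomposition of $\hat{\Omega}_i$ as defined in (2) above. Then,
under assumptions 1.1 and 1.2, the estimator $\hat{\beta}^*_{NT}$ converges to the true value
at rate $T\sqrt{N}$, and is distributed as
$$T\sqrt{N}(\hat{\beta}^*_{NT} - \beta) \to N(0, v) \quad \text{where} \quad v = \begin{cases} 2 & \text{iff } \bar{x}_i = \bar{y}_i = 0 \\ 6 & \text{else} \end{cases}$$
as $T \to \infty$ and $N \to \infty$.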
As the proposition indicates, when proper modifications are made to the
estimator, the corresponding asymptotic distribution will be free of the
nuisance parameters associated with any member specific serial correlation
patterns in the data. Notice also that this fully modified panel OLS estimator is
asymptotically unbiased for both the standard case without intercepts as well as
the fixed effects model with heterogeneous intercepts. The only difference is in
the size of the variance, which is equal to 2 in the standard case, and 6 in the
case with heterogeneous intercepts, both for xit univariate. More generally,
when xit is an m-dimensional vector, the specific values for v will also be a
function of the dimension m. The associated t-statistics, however, will not
depend on the specific values for v, as we shall see.
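As a heuristic check on the two values of v (a sketch, not the appendix proof), suppose each member's limit is driven by a standard Brownian motion $W_i$, that the averaged denominator converges to the expected value $m$ of the relevant functional, and that the numerator is mean zero with variance $m$. These are simplifying assumptions introduced here for illustration. Then $v = m / m^2 = 1/m$:

```latex
\begin{aligned}
\text{no intercepts:}\quad
  m &= E\int_0^1 W_i(r)^2\,dr = \int_0^1 r\,dr = \tfrac{1}{2},
  & v &= 2,\\
\text{fixed effects:}\quad
  m &= E\int_0^1 \Big(W_i(r) - \int_0^1 W_i(s)\,ds\Big)^2 dr
     = \tfrac{1}{2} - \tfrac{1}{3} = \tfrac{1}{6},
  & v &= 6.
\end{aligned}
```

The term $\tfrac{1}{3}$ is $E\big(\int_0^1 W_i(s)\,ds\big)^2$, which is the contribution of the estimated individual means.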
The fact that this estimator is distributed normally, rather than in terms of
unit root asymptotics as in Phillips & Hansen (1990), derives from the fact that
these unit root distributions are being averaged over the cross sectional
dimension. Specifically, this averaging process produces normal distributions
whose variance depends only on the moments of the underlying Brownian
motion functionals that describe the properties of the integrated variables. This
is achieved by constructing the estimator in a way that isolates the idiosyncratic
components of the underlying Wiener processes to produce sums of standard
and independently distributed Brownian motion whose moments can be
computed algebraically, as the proof of the proposition makes clear. The
estimators $\hat{L}_{11i}$ and $\hat{L}_{22i}$, which correspond to the long run standard errors of
the conditional process $\mu_{it}$ and the marginal process $\Delta x_{it}$ respectively, act to purge
the contribution of these idiosyncratic elements to the endogenous feedback
and serial correlation adjusted statistic $\sum_{t=1}^{T} (x_{it} - \bar{x}_i) y^*_{it} - T \hat{\gamma}_i$.
The fact that the variance is larger for the fixed effects model in which
heterogeneous intercepts are included stems from the fact that in the presence
of unit roots, the variation from the cross terms of the sample averages $\bar{x}_i$ and
$\bar{y}_i$ grows large over time at the same rate T, so that their effect is not eliminated
asymptotically from the distribution of $T\sqrt{N}(\hat{\beta}^*_{NT} - \beta)$.5 However, since the
contribution to the variance is computable analytically as in the proof of
proposition 1.2, this in itself poses no difficulties for inference. Nevertheless,
upon consideration of these expressions, it also becomes apparent that there
should exist a metric which can directly adjust for this effect in the distribution
and consequently render the distribution standard normal. In fact, as the
following proposition indicates, it is possible to construct a t-statistic from this
fully modified panel OLS estimator whose distribution will be invariant to this
effect.
Corollary 1.2 (Asymptotic Distribution of the Pooled Panel FMOLS
t-statistic). Consider the following t-statistic for the FMOLS panel estimator of
$\beta$ as defined in proposition 1.2 above. Then under the same assumptions as in
proposition 1.2, the statistic is standard normal,
$$t_{\hat{\beta}^*_{NT}} = (\hat{\beta}^*_{NT} - \beta) \left( \sum_{i=1}^{N} \hat{L}_{22i}^{-2} \sum_{t=1}^{T} (x_{it} - \bar{x}_i)^2 \right)^{1/2} \to N(0, 1)$$
as $T \to \infty$ and $N \to \infty$ for both the standard model without intercepts as well as
the fixed effects model with heterogeneous estimated intercepts.
Again, as the derivation in the appendix makes apparent, because the numerator
of the fully modified estimator $\hat{\beta}^*_{NT}$ is a sum of mixture normals with zero mean
whose variance depends only on the properties of the Brownian motion
functionals associated with the quadratic $\sum_{t=1}^{T} (x_{it} - \bar{x}_i)^2$, the t-statistic
constructed using this expression will be asymptotically standard normal. This is
regardless of the value of v associated with the distribution of $T\sqrt{N}(\hat{\beta}^*_{NT} - \beta)$
and so will also not depend on the dimensionality of $x_{it}$ in the general vector
case.
Note, however, that in contrast to the conventional single equation case
studied by Phillips & Hansen (1990), in order to ensure that the distribution of
this t-statistic is free of nuisance parameters when applied to heterogeneous
panels, the usual asymptotic variance estimator of the denominator is replaced
with the estimator $\hat{L}_{22i}^{-2}$. By construction, this corresponds to an estimator of the
asymptotic variance of the differences for the regressors and can be estimated
accordingly. This is in contrast to the t-statistic for the conventional single
equation fully modified OLS, which uses an estimator for the conditional
asymptotic variance from the residuals of the cointegrating regression. This
distinction may appear puzzling at first, but it stems from the fact that in
heterogeneous panels the contribution from the conditional variance of the
residuals is idiosyncratic to the cross sectional member, and must be adjusted
for directly in the construction of the numerator of the $\hat{\beta}^*_{NT}$ estimator itself
before averaging over cross sections. Thus, the conditional variance has already
been implicitly accounted for in the construction of $\hat{\beta}^*_{NT}$, and all that is required
is that the variance from the marginal process $\Delta x_{it}$ be purged from the quadratic
$\sum_{t=1}^{T} (x_{it} - \bar{x}_i)^2$. Finally, note that proposition 1.2 and its corollary 1.2 have been
specified in terms of a transformation, $\mu^*_{it}$, of the true residuals. In Section 3 we
will consider various strategies for specifying these statistics in terms of
observables and consider the small sample properties of the resulting feasible
statistics.
D. A Group Mean Fully Modified OLS t-Statistic
Before proceeding to the small sample properties, we first consider one
additional asymptotic result that will be of use. Recently Im, Pesaran & Shin
(1995) have proposed using a group mean statistic to test for unit roots in panel
data. They note that under certain circumstances, panel unit root tests may
suffer from the fact that the pooled variance estimators need not necessarily be
asymptotically independent of the pooled numerator and denominator terms of
the fixed effects estimator. Notice, however, that the fully modified panel OLS
statistics in proposition 1.2 and corollary 1.2 here have been constructed
without the use of a pooled variance estimator. Rather, the statistics of the
numerator and denominator have been purged of any influence from the
nuisance parameters prior to summing over N. Furthermore, since asymptotically
the distribution for the numerator is centered around zero, the covariance
between the summed terms of the numerator and denominator also does not play
a role in the asymptotic distribution of $T\sqrt{N}(\hat{\beta}^*_{NT} - \beta)$ or $t_{\hat{\beta}^*_{NT}}$ as it would
otherwise.
Nevertheless, it is also interesting to consider the possibility of a fully
modified OLS group mean statistic in the present context. In particular, the
group mean t-statistic is useful because it allows one to entertain a somewhat
broader class of hypotheses under the alternative. Specifically, we can think of
the distinction as follows. The t-statistic for the true panel estimator as
described in corollary 1.2 can be used to test the null hypothesis $H_o: \beta_i = \beta_o$ for
all i versus the alternative hypothesis $H_a: \beta_i = \beta_a \neq \beta_o$ for all i, where $\beta_o$ is the
hypothesized common value for $\beta$ under the null, and $\beta_a$ is some alternative
value for $\beta$ which is also common to all members of the panel. By contrast, the
group mean fully modified t-statistic can be used to test the null hypothesis
$H_o: \beta_i = \beta_o$ for all i versus the alternative hypothesis $H_a: \beta_i \neq \beta_o$ for all i, so that
the values for $\beta$ are not necessarily constrained to be homogeneous across
different members under the alternative hypothesis.
The following proposition gives the precise form of the panel fully modified
OLS t-statistic that we propose and gives its asymptotic distributions.
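Proposition 1.3 (Asymptotic Distribution of the Panel FMOLS Group Mean
t-Statistic). Consider the following group mean FMOLS t-statistic for $\beta$ of the
cointegrated panel (1). Then under assumptions 1.1 and 1.2, the statistic is
standard normal, and
$$\bar{t}_{\hat{\beta}^*_{NT}} = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} \hat{L}_{11i}^{-1} \left( \sum_{t=1}^{T} (x_{it} - \bar{x}_i)^2 \right)^{-1/2} \left( \sum_{t=1}^{T} (x_{it} - \bar{x}_i) y^*_{it} - T \hat{\gamma}_i \right) \to N(0, 1)$$
where
$$y^*_{it} = (y_{it} - \bar{y}_i) - \frac{\hat{L}_{21i}}{\hat{L}_{22i}} \Delta x_{it}, \qquad \hat{\gamma}_i \equiv \hat{\Gamma}_{21i} + \hat{\Omega}^o_{21i} - \frac{\hat{L}_{21i}}{\hat{L}_{22i}} (\hat{\Gamma}_{22i} + \hat{\Omega}^o_{22i})$$
and $\hat{L}_i$ is a lower triangular decomposition of $\hat{\Omega}_i$ as defined in (2) above, as
$T \to \infty$ and $N \to \infty$ for both the standard model without intercepts as well as the
fixed effects model with heterogeneous intercepts.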
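The group mean statistic can be sketched in Python as follows. The function name and argument layout are hypothetical; the sketch assumes that the transformed series $y^*_{it}$ and the member specific quantities $\hat{L}_{11i}$ and $\hat{\gamma}_i$ have already been computed and are supplied by the caller.

```python
import math

def group_mean_fmols_t(x, y_star, l11, gamma):
    """Group mean FMOLS t-statistic from per-member inputs.

    x[i], y_star[i]: series for member i; l11[i], gamma[i]: member specific scalars.
    """
    n = len(x)
    total = 0.0
    for xi, ysi, l11i, gi in zip(x, y_star, l11, gamma):
        t_len = len(xi)
        x_bar = sum(xi) / t_len
        dev = [v - x_bar for v in xi]                   # x_it minus individual mean
        quad = sum(d * d for d in dev)                  # sum of squared deviations
        cross = sum(d * ys for d, ys in zip(dev, ysi))  # sum of dev * y*_it
        # Each member is standardized before averaging over the N dimension.
        total += (cross - t_len * gi) / (l11i * math.sqrt(quad))
    return total / math.sqrt(n)

# Toy inputs (made-up numbers) for a single member.
t_bar = group_mean_fmols_t([[0.0, 1.0, 2.0]], [[1.0, 2.0, 4.0]], [2.0], [0.5])
```

The design point worth noting is that each member's contribution is scaled by its own long run quantities before the sum over members, which is what distinguishes the between dimension statistic from the pooled one.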
Note that the asymptotic distribution of this group mean statistic is also
invariant to whether or not the standard model without intercepts or the fixed
effects model with heterogeneous intercepts has been estimated. Just as with
the previous t-statistic of corollary 1.2, the asymptotic distribution of this panel
group mean t-statistic will also be independent of the dimensionality of xit for
the more general vector case. Thus, we have presented two different types of
t-statistics, a pooled panel OLS based fully modified t-statistic based on the
within dimension of the panel, and a group mean fully modified OLS
t-statistic based on the between dimension of the panel, both of which are
asymptotically unbiased, free of nuisance parameters, and invariant to whether
or not idiosyncratic fixed effects have been estimated. Furthermore, we have
characterized the asymptotic distribution of the fully modified panel OLS
estimator itself, which is also asymptotically unbiased and free of nuisance
parameters, although in this case one should be aware that while the
distribution will be a centered normal, the variance will depend on whether
heterogeneous intercepts have been estimated and on the dimensionality of the
vector of regressors. In the remainder of this chapter we investigate the small
sample properties of feasible statistics associated with these asymptotic results
and discuss examples of their application.
III. SMALL SAMPLE PROPERTIES OF FEASIBLE PANEL FULLY MODIFIED OLS STATISTICS
In this section we investigate the small sample properties of the pooled and
group mean panel FMOLS estimators that were developed in the previous
section. We discuss two alternative feasible estimators associated with the
panel FMOLS estimators of proposition 1.2 and its t-statistic, which were
defined only in terms of the true residuals. While these estimators perform
reasonably well in idealized situations, more generally, size distortions for
these estimators have the potential to be fairly large in small samples, as was
reported in Pedroni (1996a). By contrast, we find that the group mean test
statistics do very well and exhibit relatively little size distortion, even in
relatively small panels and in the presence of substantial cross sectional
heterogeneity of the error process associated with the dynamics around the
cointegrating vector. Consequently, after discussing some of the basic
properties of the feasible versions of the pooled estimators and the associated
difficulties for small samples, we focus here on reporting the small sample
properties of the group mean test statistics, which are found to do extremely
well provided that the time series dimension is not smaller than the cross
sectional dimension.
A. General Properties of the Feasible Estimators
First, before reporting the results for the between dimension group mean test
statistic, we discuss the general properties of various feasible forms of the
within dimension pooled panel fully modified OLS statistics and consider the
consequences of these properties in small samples. One obvious candidate for
a feasible estimator based on proposition 1.2 would be to simply construct the
statistic in terms of estimated residuals, which can be obtained from the initial
N single equation OLS regressions associated with the cointegrating regression
for (1). Since the single equation OLS estimator is superconsistent, one might
hope that this produces a reasonably well behaved statistic for the panel
FMOLS estimator. The potential problem with this reasoning stems from the
fact that although the OLS regression is superconsistent it is also asymptotically biased in general. While this is a second order effect for the conventional
single series estimator, for panels, as N grows large, the effect has the potential
to become first order.
Another possibility might appear to be to construct the feasible panel
FMOLS estimator for proposition 1.2 in terms of the original data series
$y^*_{it} = (y_{it} - \bar{y}_i) - \frac{\hat{L}_{21i}}{\hat{L}_{22i}} \Delta x_{it}$ along the lines of how it is often done for the
conventional single series case. However, this turns out to be correct only in
very specialized cases. More generally, for heterogeneous panels, this will
introduce an asymptotic bias which depends on the true value of the
cointegrating relationship and the relative volatility of the series involved in the
regression. The following makes this relationship precise.
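Proposition 2.1 (Regarding Feasible Pooled Panel FMOLS). Under the
conditions of proposition 1.2 and corollary 1.2, consider the panel FMOLS
estimator for the coefficient $\beta$ of panel (1) given by
$$\hat{\beta}^*_{NT} = \left( \sum_{i=1}^{N} \hat{L}_{22i}^{-2} \sum_{t=1}^{T} (x_{it} - \bar{x}_i)^2 \right)^{-1} \sum_{i=1}^{N} \hat{L}_{11i}^{-1} \hat{L}_{22i}^{-1} \left( \sum_{t=1}^{T} (x_{it} - \bar{x}_i) y^*_{it} - T \hat{\gamma}_i \right)$$
where
$$y^*_{it} = (y_{it} - \bar{y}_i) - \frac{\hat{L}_{21i}}{\hat{L}_{22i}} \Delta x_{it} + \frac{\hat{L}_{11i} - \hat{L}_{22i}}{\hat{L}_{22i}} \beta (x_{it} - \bar{x}_i)$$
and $\hat{L}_i$ and $\hat{\gamma}_i$ are defined as before. Then the statistics $T\sqrt{N}(\hat{\beta}^*_{NT} - \beta)$ and $t_{\hat{\beta}^*_{NT}}$
constructed from this estimator are numerically equivalent to the ones defined
in proposition 1.2 and corollary 1.2.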
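The equivalence can be traced in two lines. The sketch below assumes the panel (1) takes the form $y_{it} = \alpha_i + \beta x_{it} + \mu_{it}$, writes $\bar{\mu}_i$ for the time average of $\mu_{it}$, and uses $\mu^*_{it} = \mu_{it} - (\hat{L}_{21i}/\hat{L}_{22i}) \Delta x_{it}$ for the transformed residual; it is a reader's verification rather than the appendix proof.

```latex
\begin{aligned}
y^*_{it} &= \frac{\hat{L}_{11i}}{\hat{L}_{22i}}\,\beta\,(x_{it} - \bar{x}_i) + \mu^*_{it} - \bar{\mu}_i,
\qquad \sum_{t=1}^{T} (x_{it} - \bar{x}_i)\,\bar{\mu}_i = 0, \\
\hat{L}_{11i}^{-1}\hat{L}_{22i}^{-1} \sum_{t=1}^{T} (x_{it} - \bar{x}_i)\, y^*_{it}
&= \beta\,\hat{L}_{22i}^{-2} \sum_{t=1}^{T} (x_{it} - \bar{x}_i)^2
 + \hat{L}_{11i}^{-1}\hat{L}_{22i}^{-1} \sum_{t=1}^{T} (x_{it} - \bar{x}_i)\,\mu^*_{it}.
\end{aligned}
```

Summing the second line over i and dividing by $\sum_{i} \hat{L}_{22i}^{-2} \sum_{t} (x_{it} - \bar{x}_i)^2$ leaves $\beta$ plus the same residual based term that defines the estimator of proposition 1.2, which is why the additional term involving $\beta$ is needed.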
This proposition shows why it is difficult to construct a reliable point estimator
based on the naive FMOLS estimator simply by using a transformation of y*it
analogous to the single equation case. Indeed, as the proposition makes
explicit, such an estimator would in general depend on the true value of the
parameter that it is intended to estimate, except in very specialized cases, which
we discuss below. On the other hand, this does not necessarily prohibit the
usefulness of an estimator based on proposition 2.1 for the purposes of testing
a particular hypothesis about a cointegrating relationship in heterogeneous
panels. By using the hypothesized null value for $\beta$ in the expression for $y^*_{it}$,
proposition 2.1 can at least in principle be employed to construct feasible
FMOLS statistics to test the null hypothesis that $\beta_i = \beta$ for all i. However, as
was reported in Pedroni (1996a), even in this case the small sample
performance of the statistic is often subject to relatively large size distortion.
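A minimal sketch of such a test, assuming the long run covariance elements and the correction term for each member are supplied by the caller (for example from a kernel estimator), might look as follows. The function name, the argument layout, and the choice to drop the first observation lost to differencing are assumptions of this illustration.

```python
import math

def feasible_pooled_fmols(x, y, l11, l21, l22, gamma, beta0):
    """Return (beta_star, t_stat) using the hypothesized value beta0 inside y*."""
    num = 0.0
    den = 0.0
    for xi, yi, l11i, l21i, l22i, gi in zip(x, y, l11, l21, l22, gamma):
        dx = [xi[t] - xi[t - 1] for t in range(1, len(xi))]
        xs, ys = xi[1:], yi[1:]   # assumption: drop the observation lost to differencing
        t_len = len(xs)
        x_bar = sum(xs) / t_len
        y_bar = sum(ys) / t_len
        dev = [v - x_bar for v in xs]
        # Transformed dependent variable with the null value substituted for beta.
        y_star = [(yv - y_bar) - (l21i / l22i) * d
                  + ((l11i - l22i) / l22i) * beta0 * dv
                  for yv, d, dv in zip(ys, dx, dev)]
        cross = sum(dv * ysv for dv, ysv in zip(dev, y_star))
        num += (cross - t_len * gi) / (l11i * l22i)
        den += sum(dv * dv for dv in dev) / (l22i * l22i)
    beta_star = num / den
    return beta_star, (beta_star - beta0) * math.sqrt(den)

# Noise-free toy panel with true slope 2 and heterogeneous (made-up) L elements.
x_demo = [[0.0, 1.0, 3.0, 2.0, 5.0], [1.0, 0.0, 2.0, 4.0, 3.0]]
y_demo = [[1.0 + 2.0 * v for v in x_demo[0]], [-3.0 + 2.0 * v for v in x_demo[1]]]
beta_star, t_stat = feasible_pooled_fmols(
    x_demo, y_demo, [1.5, 0.8], [0.0, 0.0], [1.0, 2.0], [0.0, 0.0], 2.0)
```

In the noise-free example the estimator returns the hypothesized value exactly, even though the two members have different long run variances, which illustrates the role of the additional term in $y^*_{it}$.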
Proposition 2.1 also provides us with an opportunity to examine the
consequences of ignoring heterogeneity associated with the serial correlation
dynamics for the error process for this type of estimator. In particular, we
notice that the modification involved in this estimator relative to the conventional
time series fully modified OLS estimator differs in two respects. First, it
includes the estimators $\hat{L}_{11i}$ and $\hat{L}_{22i}$ that premultiply the numerator and
denominator terms to control for the idiosyncratic serial correlation properties
of individual cross sectional members prior to summing over N. Secondly, and
more importantly, it includes in the transformation of the dependent variable $y^*_{it}$
an additional term $\frac{\hat{L}_{11i} - \hat{L}_{22i}}{\hat{L}_{22i}} \beta (x_{it} - \bar{x}_i)$. This term is eliminated only in two
special cases: (1) The elements $L_{11i}$ and $L_{22i}$ are identical for all members of the
panel, and do not need to be indexed by i. This corresponds to the case in which
the serial correlation structure of the data is homogeneous for all members of
the panel. (2) The elements $L_{11i}$ and $L_{22i}$ are perhaps heterogeneous across
members of the panel, but for each member $L_{11i} = L_{22i}$. This corresponds to the case
in which asymptotic variances of the dependent and independent variables are
the same. Conversely, the effect of this term increases as (1) the dynamics
become more heterogeneous for the panel, and (2) the relative volatility
becomes more different between the variables $x_{it}$ and $y_{it}$ for any individual
members of the panel. For most panels of interest, these are likely to be
important practical considerations. On the other hand, if the data are known to
be relatively homogeneous or simple in their serial correlation structure, the
imprecise estimation of these elements will decrease the attractiveness of this
type of estimator relative to one that implicitly imposes these known
restrictions.
B. Monte Carlo Simulation Results
We now study small sample properties in a series of Monte Carlo simulations.
Given the difficulties associated with the feasible versions of the within
dimension pooled panel fully modified OLS estimators discussed in the
previous subsection based on proposition 2.1, it is not surprising that these tend
to exhibit relatively large size distortions in certain scenarios, as reported in the
Pedroni (1996a). Kao & Chiang (1997) subsequently also confirmed the poor
small sample properties of the within dimension pooled panel fully modified
estimator based on a version in which a first stage OLS estimate was used for
the adjustment term. Indeed, such results should not be surprising given that the
first stage OLS estimator introduces a second order bias in the presence of
endogeneity, which is not eliminated asymptotically. Consequently, this bias
leads to size distortion for the panel which is not necessarily eliminated even
when the sample size grows large. By contrast, the feasible version of the
between dimension group mean estimator does not require such an adjustment
term even in the presence of heterogeneous serial correlation dynamics, and
does not suffer from the same size distortion.6 Consequently, we focus here on
reporting the small sample Monte Carlo results for the between dimension
group mean estimator and refer readers to Pedroni (1996a) for simulation
results for the feasible versions of the within dimension pooled estimators.
To facilitate comparison with the conventional time series literature, we use
as a starting point a few Monte Carlo simulations analogous to the ones studied
in Phillips & Loretan (1991) and Phillips & Hansen (1990) based on their
original work on FMOLS estimators for conventional time series. Following
these studies, we model the errors for the data generating process in terms of
a vector MA(1) process and consider the consequences of varying certain key
parameters. In particular, for the purposes of the Monte Carlo simulations, we
model our data generating process for the cointegrated panel (1) under
assumptions 1.1 and 1.2 as
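$$y_{it} = \alpha_i + \beta x_{it} + \mu_{it}$$
$$x_{it} = x_{it-1} + \epsilon_{it}$$
$i = 1, \ldots, N$, $t = 1, \ldots, T$, for which we model the vector error process
$\xi_{it} = (\mu_{it}, \epsilon_{it})'$ in terms of a vector moving average process given by
$$\xi_{it} = \eta_{it} - \theta_i \eta_{it-1}; \quad \eta_{it} \sim \text{i.i.d. } N(0, \Psi_i) \qquad (3)$$
where $\theta_i$ is a $2 \times 2$ coefficient matrix and $\Psi_i$ is a $2 \times 2$ contemporaneous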
covariance matrix. In order to accommodate the potentially heterogeneous
nature of these dynamics among different members of the panel, we have
indexed these parameters by the subscript i. We will then allow these
parameters to be drawn from uniform distributions according to the particular
experiment. Likewise, for each of the experiments we draw the fixed effects $\alpha_i$
from a uniform distribution, such that $\alpha_i \sim U(2.0, 4.0)$.
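A minimal sketch of this data generating process for a single member, using only the Python standard library, is given below. The function name, the zero starting value for the regressor, the extra pre-sample innovation, the unit innovation variances, and the numerical values in the usage lines are all assumptions made for illustration.

```python
import math
import random

def simulate_member(T, alpha, beta, theta, psi21, rng):
    """Simulate one panel member; returns (y, x, mu, eps), each of length T."""
    def draw_eta():
        # Innovation with covariance [[1, psi21], [psi21, 1]] (unit variances assumed).
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        return (z1, psi21 * z1 + math.sqrt(1.0 - psi21 ** 2) * z2)

    eta_prev = draw_eta()   # pre-sample innovation for the MA(1) term
    x_prev = 0.0            # assumed starting value of the random walk regressor
    y, x, mu, eps = [], [], [], []
    for _ in range(T):
        eta = draw_eta()
        # Vector MA(1): xi_t = eta_t - theta * eta_{t-1}
        mu_t = eta[0] - (theta[0][0] * eta_prev[0] + theta[0][1] * eta_prev[1])
        eps_t = eta[1] - (theta[1][0] * eta_prev[0] + theta[1][1] * eta_prev[1])
        x_t = x_prev + eps_t
        y.append(alpha + beta * x_t + mu_t)
        x.append(x_t)
        mu.append(mu_t)
        eps.append(eps_t)
        eta_prev, x_prev = eta, x_t
    return y, x, mu, eps

rng = random.Random(0)                # fixed seed so the sketch is reproducible
alpha_i = rng.uniform(2.0, 4.0)       # fixed effect drawn from U(2.0, 4.0)
theta_i = [[0.3, 0.4], [0.4, 0.6]]    # illustrative MA coefficient matrix
y, x, mu, eps = simulate_member(50, alpha_i, 2.0, theta_i, 0.5, rng)
```

A full experiment would repeat this for each of the N members with member specific draws of the MA coefficients and the innovation covariance, and then repeat the whole panel over many replications.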
We consider first as a benchmark case an experiment which captures much
of the richness of the error process studied in Phillips & Loretan (1991) and yet
also permits considerable heterogeneity among individual members of the
panel. In their study, Phillips & Loretan (1991), following Phillips & Hansen
(1990), fix the following parameters $\theta_{11i} = 0.3$, $\theta_{12i} = 0.4$, $\theta_{22i} = 0.6$,
$\Psi_{11i} = \Psi_{22i} = 1.0$, $\beta = 2.0$ and then permit $\theta_{21i}$ and $\Psi_{21i}$ to vary. The coefficient
$\theta_{21i}$ is particularly interesting since a non-zero value for this parameter reflects
an absence of even weak exogeneity for the regressors in the cointegrating
regression associated with (1), and is captured by the term $L_{21i}$ in the panel
FMOLS statistics. For our heterogeneous panel, we therefore set
$\Psi_{11i} = \Psi_{22i} = 1.0$, $\beta = 2.0$ and draw the remaining parameters from uniform
distributions which are centered around the parameter values set by Phillips &
Loretan (1991), but deviate by up to 0.4 in either direction for the elements of
$\theta_i$ and by up to 0.85 in either direction for $\Psi_{21i}$. Thus, in our first experiment,
the parameters are drawn as follows: $\theta_{11i} \sim U(-0.1, 0.7)$, $\theta_{12i} \sim U(0.0, 0.8)$,
$\theta_{21i} \sim U(0.0, 0.8)$, $\theta_{22i} \sim U(0.2, 1.0)$ and $\Psi_{21i} \sim U(-0.85, 0.85)$. This specification
achieves considerable heterogeneity across individual members and also allows
the key parameters $\theta_{21i}$ and $\Psi_{21i}$ to span the set of values considered in Phillips
and Loretan's study. In this first experiment we restrict the values of $\theta_{21i}$ to span
only the positive set of values considered in Phillips and Loretan for this
parameter. In several cases Phillips and Loretan found negative values for $\theta_{21i}$
to be particularly problematic in terms of size distortion for many of the
conventional test statistics applied to pure time series, and in our subsequent
experiments we also consider the consequences of drawing negative values for
this coefficient. In each case, the asymptotic covariances were estimated
individually for each member i of the cross section using the Newey-West
(1987) estimator. In setting the lag length for the band width, we employ the
data dependent scheme recommended in Newey & West (1994), which is to set
the lag truncation to the nearest integer given by $K = 4 \left( \frac{T}{100} \right)^{2/9}$, where T is the
number of sample observations over time. Since we consider small sample
results for panels ranging in dimension from T = 10 to T = 100 by increments of
10, this implies that the lag truncation ranges from 2 to 4. For the cross
sectional dimension, we consider small sample results for N = 10, N = 20 and
N = 30 for each of these values of T.
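The lag truncation rule and a Bartlett kernel long run covariance estimate of the Newey-West type can be sketched as follows. The function names are hypothetical, and the sketch assumes a mean-zero bivariate series supplied as a list of pairs; it is not the code used for the reported simulations.

```python
def lag_truncation(T):
    """Nearest integer to 4 * (T / 100) ** (2 / 9)."""
    return round(4.0 * (T / 100.0) ** (2.0 / 9.0))

def newey_west(u, K):
    """Bartlett kernel long run covariance of a mean-zero 2-vector series u[t] = (a, b)."""
    T = len(u)

    def autocov(k):
        # Sample autocovariance matrix at lag k, scaled by 1/T.
        g = [[0.0, 0.0], [0.0, 0.0]]
        for t in range(k, T):
            for r in range(2):
                for c in range(2):
                    g[r][c] += u[t][r] * u[t - k][c] / T
        return g

    omega = autocov(0)   # contemporaneous covariance
    for k in range(1, K + 1):
        w = 1.0 - k / (K + 1.0)   # Bartlett weight
        g = autocov(k)
        for r in range(2):
            for c in range(2):
                omega[r][c] += w * (g[r][c] + g[c][r])
    return omega

# Lag truncation for T = 10, 20, ..., 100.
lags = [lag_truncation(T) for T in range(10, 101, 10)]
# lags == [2, 3, 3, 3, 3, 4, 4, 4, 4, 4]
```

The list confirms that the truncation lag moves from 2 at T = 10 to 4 at T = 100, and the estimate returned by `newey_west` is symmetric by construction.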
Results for the first experiment, with $\theta_{21i} \sim U(0.0, 0.8)$ are reported in Table
I of Appendix B. The first column of results reports the bias of the point
estimator and the second column reports the associated standard error of the
sampling distribution. Clearly, the biases are small at 0.058 even in extreme
cases when both the N and T dimensions are as small as N = 10, T = 10 and
become minuscule as the T dimension grows larger. At N = 10, T = 30 the bias
is already down to 0.009, and at T = 100 it goes to 0.001. This should be
anticipated, since the estimators are superconsistent and converge at rate $T\sqrt{N}$,
so that even for relatively small dimensions the estimators are extremely
precise. Furthermore, the Monte Carlo simulations confirm that the bias is
reduced more quickly with respect to growth in the T dimension than with
respect to growth in the N dimension. For example, the biases are much smaller
for T = 30, N = 10 than for T = 10, N = 30 for all of the experiments. The
standard errors in column two confirm that the sampling variance around these
biases is also very small. Similar results continue to hold in subsequent
experiments with negative moving average coefficients, regardless of the data
generating process for the serial correlation processes. Consequently, the first
thing to note is that these estimators are extremely accurate even in panels with
very heterogeneous serial correlation dynamics, fixed effects and endogenous
regressors.
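The two summary measures discussed here can be computed from a set of Monte Carlo replications as in the following sketch. It assumes that the reported standard error is the sample standard deviation of the point estimates across replications; the function name and the toy numbers are hypothetical.

```python
import statistics

def summarize_replications(estimates, true_beta):
    """Return (bias, standard error) of point estimates across replications."""
    bias = statistics.mean(estimates) - true_beta
    std_error = statistics.stdev(estimates)   # sample standard deviation
    return bias, std_error

# Toy replications (made-up numbers) around a true value of 2.0.
bias, std_error = summarize_replications([1.9, 2.0, 2.1], 2.0)
```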
Of course these findings on bias should not come as a surprise given the
superconsistency results presented in the previous section. Instead, a more
central concern for the purposes of inference is the small sample behavior of
the associated t-statistic and the possibility for size distortion. For this, we
consider the small sample sizes of the test under the null
hypothesis for various nominal sizes based on the asymptotic distribution.
Specifically, the last two columns report the Monte Carlo small sample results
for the nominal 5% and 10% p-values respectively for a two sided test of the
null hypothesis $\beta = 2.0$. As a general rule, we find that the size distortions in
these small samples are remarkably small provided that the time series
dimension, T, is not smaller than the cross sectional dimension, N. The reason
for this condition stems primarily from the estimation of the
fixed effects. The number of fixed effects, $\alpha_i$, grows with the N dimension of
the panel. On the other hand, each of these N fixed effects is estimated
consistently as T grows large, so that $\hat{\alpha}_i - \alpha_i$ goes to zero only as T grows large.
Accordingly, we require T to grow faster than N in order to eliminate this effect
asymptotically for the panel. As a practical consequence, small sample size
distortion tends to be high when N is large relative to T, and decreases as T
becomes large relative to N, which can be anticipated in any fixed effects
model. As we can see from the results in Table I, in cases when N exceeds T,
the size distortions are large, with actual sizes exceeding 30 and 40% when
T = 10 and N grows from 10 to 20 and 30. This represents an unattractive
scenario, since in this case, the tests are likely to report rejections of the null
hypothesis when in fact it is not warranted. However, these represent extreme
cases, as the techniques are designed to deal with the opposite case, where the
T dimension is reasonably large relative to the N dimension. In these cases,
even when the T dimension is only slightly larger than the N dimension, and
even in cases where it is comparable, we find that the size distortion is
remarkably small. For example, in the results reported in Table I we find that
with N = 20, T = 40 the size of the nominal 5% and 10% tests becomes 4.5%
and 9.3% respectively. Similarly, for N = 10, T = 30 the sizes for the Monte
Carlo sample become 6.1% and 11% respectively, and for N = 30, T = 60, they
become 4.7% and 9.6%. As the T dimension grows even larger for a fixed N
dimension, the tests tend to become slightly undersized, with the actual size
becoming slightly smaller than the nominal size. In this case the small sample
tests actually become slightly more conservative than one would anticipate
based on the asymptotic critical values.
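The empirical sizes discussed above are rejection frequencies: the share of Monte Carlo draws in which the two sided test rejects at the asymptotic critical value. The following is a minimal sketch of that calculation, assuming the simulated t-statistics are already stored in an array and that critical values come from the standard normal distribution. The function name `empirical_size` is hypothetical and is not part of the program that accompanies the chapter.

```python
import numpy as np
from statistics import NormalDist


def empirical_size(t_stats, nominal=0.05):
    """Share of simulated t-statistics rejected by a two sided test.

    t_stats : t-statistics generated under the null hypothesis
    nominal : nominal size of the test, for example 0.05 or 0.10
    """
    # Two sided critical value from the asymptotic N(0, 1) distribution.
    crit = NormalDist().inv_cdf(1.0 - nominal / 2.0)
    t_stats = np.asarray(t_stats, dtype=float)
    return float(np.mean(np.abs(t_stats) > crit))
```

A value above the nominal size indicates an oversized test, and a value below it indicates a conservative one.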
Next, we consider the case in which the values for $\theta_{21i}$ span negative
numbers, and for the experiment reported in Table II of Appendix B we draw
this coefficient from $\theta_{21i} \sim U(-0.8, 0.0)$. Large negative values for moving
average coefficients are well known to create size distortion for such
estimators, and we anticipate this to be a case in which we have higher small
sample distortion. It is interesting to note that in this case the biases for the
point estimate become slightly positive, although as mentioned before, they
continue to be very small. The small sample size distortions follow the same
pattern in that they tend to be largest when T is small relative to N and decrease
as T grows larger. In this case, as anticipated, they tend to be higher than for the
case in which $\theta_{21i}$ spans only positive values. However, the values still fall
within a fairly reasonable range considering that we are dealing with all
negative values for $\theta_{21i}$. For example, with N = 10, T = 100 we have values of
6.3% and 12% for the 5% and 10% nominal sizes respectively. For N = 20,
T = 100 they become 9% and 15.6% respectively. These are still remarkably
small compared to the size distortions reported in Phillips & Loretan (1991) for
the conventional time series case.
Finally, we ran a third experiment in which we allowed the values for $\theta_{21i}$ to
span both positive and negative values so that we draw the values from
$\theta_{21i} \sim U(-0.4, 0.4)$. We consider this to be a fairly realistic case, and this
corresponds closely to the range of moving average coefficients that were
estimated in the purchasing power parity study contained in Pedroni (1996a).
We find the group mean estimator and test statistic to perform very well in this
situation. The Monte Carlo simulation results for this case are reported in Table
III of Appendix B. Whereas the biases for the case with large positive values
of $\theta_{21i}$ in Table I were negative, and for the case with large negative values in
Table II were positive, here we find the biases to be positive and often even
smaller in absolute value than either of the first two cases. Most importantly,
we find the size distortions for the t-statistic to be much smaller here than in the
case where we have exclusively negative values for $\theta_{21i}$. For example, with
N = 30, and T as small as T = 60, we find the nominal 5% and 10% sizes to be
5.4% and 10.5%. Again, generally the small sample sizes for the test are quite
close to the asymptotic nominal sizes provided that the T dimension is not
smaller than the N dimension. Consequently, it appears to be the case that even
when some members of the panel exhibit negative moving average coefficients,
as long as other members exhibit positive values, the distortions tend to be
averaged out so that the small sample sizes for the group mean statistic stay
very close to the asymptotic sizes. Thus, we conclude that in general when the
T dimension is not smaller than the N dimension, the asymptotic normality
result appears to provide a very good benchmark for the sampling distribution
under the null hypothesis, even in relatively small samples with heterogeneous
serial correlation dynamics.
Finally, although power is generally not a concern for such panel tests, since
the power is generally quite high, it is worth mentioning the small sample
power properties of the group mean estimator. Specifically, we experimented
by checking the small sample power of the test against the alternative
hypothesis by generating the 10,000 draws for the DGP associated with case 3
above with $\beta = 1.9$. For the test of the null hypothesis that $\beta = 2.0$ against the
alternative hypothesis that $\beta = 1.9$, we found that the power for the 10% p-value
test reached 100% for N = 10 when T was 40 or more (or 98.2% when T = 30)
and reached 100% for N = 20 when T was 30 or more, and for N = 30 the power
reached 100% already when T was 20 or more. Consequently, considering the
high power and the relatively small size distortion, we find the small sample
properties of the estimator and associated t-statistic to be extremely well
behaved in the cases for which it was designed.
IV. ESTIMATION ALGORITHM AND SOME EXAMPLES OF APPLICATIONS [7]
In this section we describe the algorithm for computing the panel FMOLS
estimators and their associated test statistics and then discuss a few examples
of their use. In summary, we can compute any one of the desired statistics by
performing the following steps:
1. Estimate the panel regression and collect the residuals. Specifically one
should estimate the desired panel cointegration regression, making sure to
include any desired intercepts, or common time dummies in the regression,
and then collect the residuals $\hat{\mu}_{it}$ for each of the members of the panel. If the
slopes are homogeneous, the common time dummy effects can be
eliminated more simply by first subtracting the cross sectional average in each
time period prior to estimating the regression. Thus, construct $y_{it} - \bar{y}_t$, $x_{it} - \bar{x}_t$ for each
variable, where $\bar{y}_t = N^{-1} \sum_{i=1}^{N} y_{it}$, $\bar{x}_t = N^{-1} \sum_{i=1}^{N} x_{it}$, prior to estimating the
regression, and prior to the following steps.
2. Estimate the long run covariances and autocovariances of the errors. Use
the estimated residuals from part (1) plus the differences of each of the
regressors to construct a vector error series $\hat{\xi}_{it} = (\hat{\mu}_{it}, \epsilon_{it}')'$, with $\epsilon_{it} = \Delta x_{it}$. Note that the
second element is a vector of dimension m, where m corresponds to the
number of regressors. Now use any long run covariance matrix estimator,
such as the Newey-West (1987) estimator, to estimate the elements of the
long run covariance $\Omega_i$ and the autocovariances $\Gamma_i$. This can be done by
applying the estimator to the entire (m + 1) vector $\hat{\xi}_{it}$ to produce an
$(m + 1) \times (m + 1)$ long run covariance matrix and autocovariance matrix.
The elements of $\Omega_i$ and $\Gamma_i$ then correspond to partitions of the
$(m + 1) \times (m + 1)$ long run covariance matrix and autocovariance matrix
respectively. Specifically, the far upper left scalar element of the
$(m + 1) \times (m + 1)$ long run covariance matrix corresponds to $\Omega_{11i}$. The lower
$m \times m$ partition corresponds to $\Omega_{22i}$, which is an $m \times m$ matrix representing
the long run covariance among the regressors, and the remaining m elements
in the column below the far upper left scalar element correspond to $\Omega_{21i}$.
Since the covariance matrix is symmetric, $\Omega_{12i} = \Omega_{21i}'$. The same mapping
applies to the partitions of the $(m + 1) \times (m + 1)$ autocovariance matrix and
the elements of $\Gamma_i$, except that unlike $\Omega_i$, the autocovariance matrix $\Gamma_i$ is not
symmetric, so $\Gamma_{12i} \neq \Gamma_{21i}'$, and these elements must be extracted from the
corresponding column and row partitions separately. Once $\hat{\Omega}_i$ has been
constructed, apply a Cholesky style triangularization to obtain the elements
of the matrix $\hat{L}_i$. Finally, we will use an estimate of the standard
contemporaneous covariance matrix, $\hat{\Omega}^0_i$, for the elements of $\hat{\xi}_{it}$,
similarly partitioned.
3. Construct the estimator. Now we have all of the pieces required to construct
the estimators. Each estimator uses a serial correlation correction term, $\hat{\gamma}_i$,
which can be constructed from the pieces obtained in part (2) above, as
\[
\hat{\gamma}_i \equiv \hat{\Gamma}_{21i} + \hat{\Omega}^0_{21i} - \frac{\hat{L}_{21i}}{\hat{L}_{22i}} \left( \hat{\Gamma}_{22i} + \hat{\Omega}^0_{22i} \right)
\]
Next, using the elements of $\hat{L}_i$, the expression for
\[
y^*_{it} = (y_{it} - \bar{y}_i) - \frac{\hat{L}_{21i}}{\hat{L}_{22i}} \Delta x_{it}
\]
can be constructed from the original data. Then the final step is to construct the cross
product terms between $y^*_{it}$ and $(x_{it} - \bar{x}_i)$. This is sufficient now to compute either
the point estimators or the associated t-statistics for any of the statistics.
It is worth noting two points here. The difference between the panel within
dimension estimators and the group mean between dimension estimators is in
the way in which the cross product terms are computed. For the within
dimension statistics, the cross product terms are computed by summing over
the T and N dimensions separately for the numerator and the denominator. For
the group mean between dimension statistics, the cross product terms are
computed by summing over the T dimension for the numerator and
denominator separately, and then summing over the N dimension for the entire
ratio. Consequently, the first point to note is that the algorithm as applied to the
group mean estimator describes the same steps that one would take if one were
estimating N different conventional FMOLS estimators and then taking the
average of these. The same is true for the group mean t-statistic. Thus, if one
already has a routine to estimate the conventional time series FMOLS
estimator, then the group mean panel FMOLS estimator is extremely simple
and convenient to estimate. The second point to note is that for the panel
FMOLS within dimension estimator we have used the estimates of $\Omega_i$, $\Gamma_i$, $\Omega^0_i$
and $\gamma_i$ to compute the weighted panel variances. But it is equally feasible to
compute the unweighted panel variances by first averaging the values $\hat{\Omega}_i$, $\hat{\Gamma}_i$, $\hat{\Omega}^0_i$
before applying the transformations. Whether or not the two different
treatments have much consequence for the estimate is likely to depend on how
heterogeneous the values of $\hat{\Omega}_i$ are across individual members.
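The three steps of the algorithm can be sketched in code for the group mean estimator. The example below is a minimal illustration under stated assumptions, not the author's program: it handles a single regressor (m = 1) with member-specific intercepts, uses a Bartlett kernel as the long run covariance estimator, and, when no lag length is supplied, assumes the truncation rule 4(T/100)^(2/9). The function names `long_run_cov`, `fmols_member`, and `group_mean_fmols` are hypothetical. The triangularization is computed directly so that the ratio L21/L22 equals the ratio of the long run covariance between the errors to the long run variance of the regressor innovation.

```python
import numpy as np


def long_run_cov(xi, lags):
    """Bartlett-kernel estimates from a (T, k) error series.

    Returns (omega, gamma, omega0): the long run covariance, the
    kernel-weighted sum of autocovariances, and the contemporaneous covariance.
    """
    xi = xi - xi.mean(axis=0)
    T = xi.shape[0]
    omega0 = xi.T @ xi / T
    gamma = np.zeros_like(omega0)
    for s in range(1, lags + 1):
        weight = 1.0 - s / (lags + 1.0)
        # Entry (a, b) averages xi[t - s, a] * xi[t, b].
        gamma += weight * (xi[:-s].T @ xi[s:]) / T
    return omega0 + gamma + gamma.T, gamma, omega0


def fmols_member(y, x, beta0=0.0, lags=None):
    """FMOLS slope and t-statistic (H0: beta = beta0) for one panel member."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)                      # differences of the regressor
    y, x = y[1:], x[1:]                  # align levels with the differences
    T = len(y)
    if lags is None:
        lags = int(4 * (T / 100.0) ** (2.0 / 9.0))   # assumed truncation rule
    yd, xd = y - y.mean(), x - x.mean()  # removes the member-specific intercept
    # Step 1: residuals from the OLS cointegrating regression.
    mu = yd - (xd @ yd) / (xd @ xd) * xd
    # Step 2: long run covariance pieces for the error vector (mu, dx).
    omega, gamma, omega0 = long_run_cov(np.column_stack([mu, dx]), lags)
    L22 = np.sqrt(omega[1, 1])
    L21 = omega[1, 0] / L22
    L11 = np.sqrt(omega[0, 0] - L21 ** 2)
    # Step 3: correction term, modified dependent variable, cross products.
    gamma_hat = gamma[1, 0] + omega0[1, 0] - (L21 / L22) * (gamma[1, 1] + omega0[1, 1])
    y_star = yd - (L21 / L22) * dx
    sxx = xd @ xd
    beta = (xd @ y_star - T * gamma_hat) / sxx
    t_stat = (beta - beta0) * np.sqrt(sxx) / L11
    return beta, t_stat


def group_mean_fmols(Y, X, beta0=0.0, lags=None):
    """Average the member slopes; scale the sum of member t-statistics by sqrt(N)."""
    results = [fmols_member(y, x, beta0, lags) for y, x in zip(Y, X)]
    betas = np.array([r[0] for r in results])
    t_stats = np.array([r[1] for r in results])
    return betas.mean(), t_stats.sum() / np.sqrt(len(results))
```

Here `Y` and `X` are sequences holding one time series per panel member. Because each member is handled by `fmols_member` on its own, the sketch mirrors the point made above that the group mean estimator amounts to running N conventional FMOLS regressions and averaging the results.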
Next, we briefly describe a few examples of the use of these panel FMOLS
estimators. One obvious application is to the exchange rate literature, and in
particular the purchasing power parity literature. Long run absolute or strong
purchasing power parity predicts that nominal exchange rates and aggregate
price ratios among countries should be cointegrated with a unit cointegrating
vector, so that the real exchange rate is stationary. However, panel unit root
tests based on Levin & Lin (1993) have generally found mixed results. See for
example Oh (1996) and Papell (1997) and Wu (1996) among others. On the
other hand, panel cointegration tests based on Pedroni (1995, 1997a) have
generally rejected the null of no cointegration. See for example Canzoneri,
Cumby & Diba (1996), Chinn (1997) and Taylor (1996) among others for
these. By contrast, long run relative or weak purchasing power parity simply
predicts that the nominal exchange rate and aggregate price ratios will be
cointegrated, though not necessarily with a unit cointegrating vector. The panel
FMOLS estimators presented in this paper are an obvious way to distinguish
between these two hypotheses, and Pedroni (1996a, 1999) uses these panel
FMOLS estimators to show that only the relative, weak form of purchasing
power parity holds for a panel of post Bretton Woods period floating exchange
rates. The latter paper contrasts results for both a parametric group mean DOLS
estimator and nonparametric group mean FMOLS estimator for the weak
purchasing power parity test. In a similar spirit, Alexius & Nilson (2000),
Canzoneri, Cumby & Diba (1996), Chinn (1997) apply these panel FMOLS
tests from Pedroni (1996a) to test the Samuelson-Balassa hypothesis that long
run movements of real exchange rates are driven by differences in long run
relative productivities among countries.
Other applications of these panel FMOLS tests have been to the
growth literature. Neusser & Kugler (1998) use the tests from Pedroni (1996a)
to investigate the connection between financial development and growth. Kao,
Chiang & Chen (1999) use a panel FMOLS estimator and compare it to a panel
DOLS estimator to investigate the connection between research and development expenditure and growth. Keller & Pedroni (1999) use the group mean
panel estimator presented in this chapter to study the mechanism by which
imported R&D impacts growth at the industry level and demonstrate the
attractiveness of the more flexible form of the group mean estimator. Canning
& Pedroni (1999) use the same group mean panel FMOLS test as a first step
estimator to construct a test for the direction of long run causality between
public infrastructure and long run growth. Finally Pedroni & Wen (2000) make
use of the group mean panel FMOLS estimator as a first step estimator in an
overlapping generations model to identify the position of the U.S., Japanese
and European economies relative to the golden rule, and the extent to which
social security transfer programs can move economies closer to this position.
This is just a brief summary of the application of these estimators to two
literatures, the exchange rate and growth literatures. Needless to say, many
potential applications exist beyond these two literatures.
V. DISCUSSION OF FURTHER RESEARCH AND CONCLUDING REMARKS
We have explored in this chapter methods for testing and making inferences
about cointegrating vectors in heterogeneous panels based on fully modified
OLS principles. When properly constructed to take account of potential
heterogeneity in the idiosyncratic dynamics and fixed effects associated with
such panels, the asymptotic distributions for these estimators can be made to be
centered around the true value and will be free of nuisance parameters.
Furthermore, based on Monte Carlo simulations we have shown that in
particular the t-statistic constructed from the between dimension group mean
estimator performs very well in that it exhibits relatively little small sample
size distortion. To date, the techniques developed in this study have been
employed successfully in a number of applications, and it will be interesting to
see if the panel FMOLS methods developed in this paper fare equally well in
other scenarios.
The area of research and application of nonstationary panel methods is
rapidly expanding, and we take this opportunity to remark on a few further
issues of current and future research as they relate to the subject of this chapter.
As we have already discussed, the between dimension group mean estimator
has an advantage over the within dimension pooled estimators presented in this
chapter in that it permits a more flexible alternative hypothesis that allows for
heterogeneity of the cointegrating vector. In many cases it is not known a priori
whether heterogeneity of the cointegrating vector can be ruled out, and it would
be particularly nice to test the null hypothesis that the cointegrating vectors are
heterogeneous in such panels with heterogeneous dynamics. In this context,
Pedroni (1998) provides a technique that allows one to test such a null
hypothesis against the alternative hypothesis that they are homogeneous and
demonstrates how the technique can be used to test whether convergence in the
Solow growth model occurs to distinct versus common steady states for the
Summers and Heston data set.
Another important issue that is often raised for these types of panels pertains
to the assumption of cross sectional independence as per assumption 1.2 in this
chapter. The standard approach is to use common time dummies, which in
many cases is sufficient to deal with cross sectional dependence. However, in
some cases, common time dummies may not be sufficient, particularly when
the cross sectional dependence is not limited to contemporaneous effects and is
dynamic in nature. Pedroni (1997b) proposes an asymptotic covariance
weighted GLS approach to deal with such dynamic cross sectional dependence
for the case in which the time series dimension is considerably larger than the
cross sectional dimension, and applies the panel fully modified form of the test
to the purchasing power parity hypothesis using monthly OECD exchange rate
data. It is interesting to note, however, that for this particular application, taking
account of such cross sectional dependencies does not appear to impact the
conclusions and it is possible that in many cases cross sectional dependence
does not play as large a role as one might anticipate once common time
dummies have been included, although this remains an open question.
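When slopes are homogeneous, including common time dummies is equivalent to subtracting the cross sectional average from every series in each period, which is the demeaning described in step 1 of the algorithm in section IV. The following is a minimal sketch of that transformation, assuming the panel is stored as an array with one row per member and one column per time period. The function name `remove_common_time_effects` is hypothetical.

```python
import numpy as np


def remove_common_time_effects(panel):
    """Subtract the cross sectional mean at each time period.

    panel : array of shape (N, T), one row per panel member
    """
    panel = np.asarray(panel, dtype=float)
    # Averaging over axis 0 gives one mean per time period.
    return panel - panel.mean(axis=0, keepdims=True)
```

Any disturbance that hits all members equally in a given period drops out of the transformed data, which is why this step only addresses contemporaneous, common dependence and not the dynamic dependence discussed above.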
Another important issue is parametric versus non-parametric estimation of
nuisance parameters. Clearly, any of the estimators presented here can be
implemented by taking care of the nuisance parameter effects either
nonparametrically using kernel estimators, or parametrically, as for example
using dynamic OLS corrections. Generally speaking, non-parametric estimation tends to be more robust, since one does not need to assume a specific
parametric form. On the other hand, since non-parametric estimation relies on
fewer assumptions, it generally requires more data than parametric estimation.
Consequently, for conventional time series tests, when data is limited it is often
worth making specific parametric assumptions. For panels, on the other hand,
the greater abundance of data suggests an opportunity to take advantage of the
greater robustness of nonparametric methods, though ultimately the choice
may simply be a matter of taste. The Monte Carlo simulation results provided
here demonstrate that even in the presence of considerable heterogeneity, nonparametric correction methods do very well for the group mean estimator and
the corresponding t-statistic.
NOTES
1. The results in section 2 and appendix A first appeared in Pedroni (1996a). The
Indiana University working paper series is available at http://www.indiana.edu/~iuecon/workpaps/
2. In fact the computer program which accompanies this paper also allows one to
implement these tests for any arbitrary number of regressors. It is available upon request
from the author at ppedroni@indiana.edu
3. See Phillips & Moon (1999) for a recent formal study of the regularity conditions
required for the use of sequential limit theory in panel data and a set of conditions under
which sequential limits imply joint limits, including the case in which the long run
variances differ among members of the panel.
4. These results are for the OLS estimator when the variables are cointegrated. A
related stream of the literature studies the properties of the panel OLS estimator when
the variables are not cointegrated and the regression is spurious. See for example Entorf
(1997), Kao (1999), Phillips & Moon (1999) and Pedroni (1993, 1997a) on spurious
regression in nonstationary panels.
5. A separate issue pertains to differences between the sample averages and the true
population means. Since we are treating the asymptotics sequentially, this difference
goes to zero as T grows large prior to averaging over N, and thus does not impact the
limiting distribution. Otherwise, more generally we would require that the ratio N/T
goes to zero as N and T grow large in order to ensure that these differences do not
impact the limiting distribution. We return to this point in the discussion of the small
sample properties in section 3.2.
6. Of course this is not to say that all within dimension estimators will necessarily
suffer from this particular form of size distortion, and it is likely that some forms of the
pooled FMOLS estimator will be better behaved than others. Nevertheless, given the
other attractive features of the between dimension group mean estimator, we focus here
on reporting the very attractive small sample properties of this estimator.
7. I am grateful to an anonymous referee for suggesting this section.
ACKNOWLEDGMENTS
I thank especially Bob Cumby, Bruce Hansen, Roger Moon, Peter Phillips,
Norman Swanson and Pravin Trivedi and two anonymous referees for helpful
comments and suggestions on various earlier versions, and Maria Arbatskaya
for research assistance. The paper has also benefitted from presentations at the
June 1996 North American Econometric Society Summer Meetings, the April
1996 Midwest International Economics Meetings, and workshop seminars at
Rice University-University of Houston, Southern Methodist University, The
Federal Reserve Bank of Kansas City, U. C. Santa Cruz and Washington
University. The current version of the paper was completed while I was a
visitor at the Department of Economics at Cornell University, and I thank the
members of the Department for their generous hospitality. A computer program
which implements these tests is available upon request from the author at
ppedroni@indiana.edu
REFERENCES
Alexius, A., & Nilson, J. (2000). Real Exchange Rates and Fundamentals: Evidence from 15
OECD Countries. Open Economies Review, forthcoming.
Canning, D., & Pedroni, P. (1999). Infrastructure and Long Run Economic Growth. CAE Working
paper, No. 9909, Cornell University.
Canzoneri M., Cumby, R., & Diba, B. (1996). Relative Labor Productivity and the Real Exchange
Rate in the Long Run: Evidence for a Panel of OECD Countries. NBER Working paper No.
5676.
Chinn, M. (1997). Sectoral Productivity, Government Spending and Real Exchange Rates:
Empirical Evidence for OECD Countries. NBER Working paper No. 6017.
Chinn, M., & Johnson, L. (1996). Real Exchange Rate Levels, Productivity and Demand Shocks:
Evidence from a Panel of 14 Countries. NBER Working paper No. 5709.
Entorf, H. (1997). Random Walks and Drifts: Nonsense Regression and Spurious Fixed-Effect
Estimation. Journal of Econometrics, 80, 287-96.
Evans, P., & Karras, G. (1996). Convergence Revisited. Journal of Monetary Economics, 37,
249-265.
Im, K., Pesaran, H., & Shin, Y. (1995). Testing for Unit Roots in Heterogeneous Panels. Working
paper, Department of Economics, University of Cambridge.
Kao, C., & Chen, B. (1995). On the Estimation and Inference of a Cointegrated Regression in
Panel Data When the Cross-section and Time-series Dimensions Are Comparable in
Magnitude. Working paper, Department of Economics, Syracuse University.
Kao, C., & Chiang, M. (1997). On the Estimation and Inference of a Cointegrated Regression In
Panel Data. Working paper, Department of Economics, Syracuse University.
Kao, C., Chiang, M., & Chen, B. (1999). International R&D Spillovers: An Application of
Estimation and Inference in Panel Cointegration. Oxford Bulletin of Economics and
Statistics, 61(4), 691-709.
Keller, W., & Pedroni, P. (1999). Does Trade Affect Growth? Estimating R&D Driven Models of
Trade and Growth at the Industry Level. Working paper, Department of Economics, Indiana
University and University of Texas.
Levin, A., & Lin, F. (1993). Unit Root Tests in Panel Data; Asymptotic and Finite-sample
Properties. Working paper, Department of Economics, U. C. San Diego.
Mark, N., & Sul, D. (1999). A Computationally Simple Cointegration Vector Estimator for Panel
Data. Working paper, Department of Economics, Ohio State University.
Neusser, K., & Kugler, M. (1998). Manufacturing Growth and Financial Development: Evidence
from OECD Countries. Review of Economics and Statistics, 80, 638-646.
Newey, W., & West, K. (1987). A Simple, Positive Semi-Definite, Heteroskedasticity and
Autocorrelation Consistent Covariance Matrix. Econometrica, 55, 703-708.
Newey, W., & West, K. (1994). Autocovariance Lag Selection in Covariance Matrix Estimation.
Review of Economic Studies, 61, 631-653.
Obstfeld M., & Taylor, A. (1996). International Capital-Market Integration over the Long Run:
The Great Depression as a Watershed. Working paper, Department of Economics, U. C.
Berkeley.
Oh, K. (1996). Purchasing Power Parity and Unit Root Tests Using Panel Data. Journal of
International Money and Finance, 15, 405-418.
Papell, D. (1997). Searching for Stationarity: Purchasing Power Parity Under the Current Float.
Pedroni, P. (1993). Panel Cointegration. Chapter 2 in Panel Cointegration, Endogenous Growth
And Business Cycles in Open Economies, Columbia University Dissertation, Ann Arbor,
MI: UMI Publishers.
Pedroni, P. (1995). Panel Cointegration; Asymptotic and Finite Sample Properties of Pooled Time
Series Tests, With an Application to the PPP Hypothesis. Working paper, Department of
Economics, No. 95013, Indiana University.
Pedroni, P. (1996a). Fully Modified OLS for Heterogeneous Cointegrated Panels and the Case of
Purchasing Power Parity. Working paper No. 96020, Department of Economics, Indiana
University.
Pedroni, P. (1996b). Human Capital, Endogenous Growth, & Cointegration for Multi-Country
Panels. Working paper, Department of Economics, Indiana University.
Pedroni, P. (1997a). Panel Cointegration; Asymptotic and Finite Sample Properties of Pooled Time
Series Tests, With an Application to the PPP Hypothesis; New Results. Working paper,
Department of Economics, Indiana University.
Pedroni, P. (1997b). On the Role of Cross Sectional Dependency in Dynamic Panel Unit Root and
Panel Cointegration Exchange Rate Studies. Working paper, Department of Economics,
Indiana University.
Pedroni, P. (1998). Testing for Convergence to Common Steady States in Nonstationary
Heterogeneous Panels. Working paper, Department of Economics, Indiana University.
Pedroni, P. (1999). Purchasing Power Parity Tests in Cointegrated Panels. Working paper,
Pedroni, P., & Wen, Y. (2000). Government and Dynamic Efficiency. Working paper, Department
of Economics, Cornell University and Indiana University.
Pesaran, H., & Smith, R. (1995). Estimating Long Run Relationships from Dynamic
Phillips, P., & Durlauf, S. (1986). Multiple Time Series Regressions with Integrated Processes.
Review of Economic Studies, 53, 473-495.
Phillips, P., & Hansen, B. (1990). Statistical Inference in Instrumental Variables Regression with
I(1) Processes. Review of Economic Studies, 57, 99-125.
Phillips, P., & Loretan, M. (1991). Estimating Long-run Economic Equilibria. Review of Economic
Studies, 58, 407-436.
Phillips, P., & Moon, H. (1999). Linear Regression Limit Theory for Nonstationary Panel Data.
Quah, D. (1994). Exploiting Cross-Section Variation for Unit Root Inference in Dynamic Data.
Taylor, A. (1996). International Capital Mobility in History: Purchasing Power Parity in the Long-Run. NBER Working paper No. 5742.
Wu, Y. (1996). Are Real Exchange Rates Nonstationary? Evidence from a Panel-Data Test. Journal
of Money Credit and Banking, 28, 54-63.
MATHEMATICAL APPENDIX A
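Proposition 1.1: We establish notation here which will be used throughout the
remainder of the appendix. Let $Z_{it} = Z_{it-1} + \xi_{it}$ where $\xi_{it} = (\mu_{it}, \epsilon_{it})'$. Then by
virtue of assumption 1.1 and the functional central limit theorem,
\[
T^{-1} \sum_{t=1}^{T} \tilde{Z}_{it} \xi_{it}' \to \int_{r=0}^{1} \tilde{B}(r, \Omega_i) \, dB(r, \Omega_i)' + \Gamma_i + \Omega^0_i \tag{A1}
\]
\[
T^{-2} \sum_{t=1}^{T} \tilde{Z}_{it} \tilde{Z}_{it}' \to \int_{r=0}^{1} \tilde{B}(r, \Omega_i) \tilde{B}(r, \Omega_i)' \, dr \tag{A2}
\]
for all i, where $\tilde{Z}_{it} = Z_{it} - \bar{Z}_i$ refers to the demeaned discrete time process and
$\tilde{B}(r, \Omega_i)$ is demeaned vector Brownian motion with asymptotic covariance $\Omega_i$.
This vector can be decomposed as $\tilde{B}(r, \Omega_i) = L_i' \tilde{W}_i(r)$ where $L_i = \Omega_i^{1/2}$ is the
lower triangular decomposition of $\Omega_i$ and
\[
\tilde{W}(r) = \left( W_1(r) - \int_0^1 W_1(r) \, dr, \; W_2(r) - \int_0^1 W_2(r) \, dr \right)'
\]
is a vector of demeaned standard Brownian motion,
with $W_{1i}$ independent of $W_{2i}$. Under the null hypothesis, the statistic can be
written in these terms as
\[
T\sqrt{N}(\hat{\beta}_{NT} - \beta) = \frac{\frac{1}{\sqrt{N}} \sum_{i=1}^{N} \left( T^{-1} \sum_{t=1}^{T} \tilde{Z}_{it} \xi_{it}' \right)_{21}}{\frac{1}{N} \sum_{i=1}^{N} \left( T^{-2} \sum_{t=1}^{T} \tilde{Z}_{it} \tilde{Z}_{it}' \right)_{22}} \tag{A3}
\]
Based on (A1), as $T \to \infty$, the bracketed term of the numerator converges to
\[
\left( \int_{r=0}^{1} \tilde{B}(r, \Omega_i) \, dB(r, \Omega_i)' \right)_{21} + \Gamma_{21i} + \Omega^0_{21i} \tag{A4}
\]
the first term of which can be decomposed as
\[
\left( \int_{r=0}^{1} \tilde{B}(r, \Omega_i) \, dB(r, \Omega_i)' \right)_{21}
= L_{11i} L_{22i} \left( \int W_{2i} \, dW_{1i} - W_{1i}(1) \int W_{2i} \right)
+ L_{21i} L_{22i} \left( \int W_{2i} \, dW_{2i} - W_{2i}(1) \int W_{2i} \right) \tag{A5}
\]
In order for the distribution of the estimator to be unbiased, it will be necessary
that the expected value of the expression in (A4) be zero. But although the
expected value of the first bracketed term in (A5) is zero, the expected value of
the second bracketed term is given as
\[
E\left[ L_{21i} L_{22i} \left( \int W_{2i} \, dW_{2i} - W_{2i}(1) \int W_{2i} \right) \right] = -\frac{1}{2} L_{21i} L_{22i} \tag{A6}
\]
Thus, given that the asymptotic covariance matrix, $\Omega_i$, must have positive
diagonals, the expected value of the expression (A4) will be zero only if
$L_{21i} = \Gamma_{21i} = \Omega^0_{21i} = 0$, which corresponds to strict exogeneity of regressors for all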
members of the panel. Finally, even if such strict exogeneity does hold, the
variance of the numerator will still be influenced by the parameters L11i, L22i
which reflect the idiosyncratic serial correlation patterns in the individual
cross sectional members. Unless these are homogeneous across members of the
panel, they will lead to non-trivial data dependencies in the asymptotic
distribution.
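As a check on (A6), the expectation can be worked out from two standard moments of Brownian motion. The steps below are an added sketch rather than part of the original proof, writing $W$ for $W_{2i}$:

```latex
\[
E\left[ \int_0^1 W \, dW \right] = E\left[ \tfrac{1}{2} \left( W(1)^2 - 1 \right) \right] = 0,
\qquad
E\left[ W(1) \int_0^1 W(r) \, dr \right] = \int_0^1 E\left[ W(1) W(r) \right] dr = \int_0^1 r \, dr = \tfrac{1}{2},
\]
\[
\text{so that} \quad
E\left[ \int_0^1 W \, dW - W(1) \int_0^1 W(r) \, dr \right] = 0 - \tfrac{1}{2} = -\tfrac{1}{2}.
\]
```

Multiplying by $L_{21i} L_{22i}$ gives the term in (A6), which vanishes only when $L_{21i} = 0$.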
Proposition 1.2: Continuing with the same notation as above, the fully modified
statistic can be written under the null hypothesis as

N
NT ) =
TN(*
N
1
L 1
11iL22i (0,1)
Z itit
i=1
1,
t=1
1
N
L 21i
i
L 22i
2
22i
i=1
(A7)
it
Z itZ
t=1
22
Thus, based on (A1), as T , the bracketed term of the numerator converges

to
r=0
i) dB(r,
i)
B(r,
21
L 21i
L 22i
i) dB(r,
i)
B(r,
r=0
+ 21i + o21i
L 21i
( 22i + o22i)
L 22i
i such that
which can be decomposed into the elements of W
22
(A8)
124
PETER PEDRONI
i) dB(r,
i)
B(r,
r=0

= L11iL22i
21
+ L21iL22i
i) dB(r,
i)
B(r,
r=0
W2i dW1i W1i(1)

W2i
W2i dW2i W2i(1)
= L222i
W2i
W2idW2i W2i(1)
22
W2i
(A9)
(A10)
where the index r has been omitted for notational simplicity. Thus, if a
i i and consequently L i Li
consistent estimator of i is employed, so that
and i , then

T
L L
1 1
11i 22i
(0,1)(T
t=1
L 21i
i
Z itit) 1,
L 22i
1
W2i(r) dW1i(r) W1i(1)
(A11)
W2i(r) dr
where the mean and variance of this expression are given by

W2i dW1i
W2idW1i W1i(1)
2W1i(1)

W2idr
(A12)
W2idr = 0

2
W2idW1i + W1i(1)2
1
1
1 1
= 2
+ =
2
3
3 6
W2idr
(A13)
respectively. Now that this expression has been rendered void of any
i), then by virtue of
idiosyncratic components associated with the original B(r,
assumption 1.2 and a standard central limit theorem argument,

N
1
N
i=1
W2i(r) dr N( 0, 1/6) (A14)
125
as N . Next, consider the bracketed term of the denominator of (A3), which

based on (A1), as T , converges to
i)B(r,
i)
B(r,
r=0
22
Thus,
= L222i
(T
it)
Z itZ
22
t=1
W2i(r) dr
(A15)
W2i(r) dr
(A16)

T
2
22i
W2i(r)2 dr
W2i(r)2dr
which has finite variance, and a mean given by
W2i(r)2dr

2
W2i(r) dr
1 1 1
= =
2 3 6
(A18)
Again, since this expression has been rendered void of any idiosyncratic
components associated with the original $B(r,\Omega_i)$, then by virtue of assumption
1.2 and a standard law of large numbers argument,

$$\frac{1}{N}\sum_{i=1}^{N}\left(\int_0^1 W_{2i}(r)^2\,dr - \left(\int_0^1 W_{2i}(r)\,dr\right)^2\right) \to \frac{1}{6} \tag{A18}$$

as $N \to \infty$. Thus, by iterated weak convergence and an application of the
continuous mapping theorem, $T\sqrt{N}(\hat\beta^*_{NT} - \beta) \to N(0, 6)$ for this case where
heterogeneous intercepts have been estimated. Next, recognizing that
$T^{-1/2}\bar y_i \to \int W_{1i}(r)\,dr$ and $T^{-1/2}\bar x_i \to \int W_{2i}(r)\,dr$ as $T \to \infty$, and setting
$\int W_{1i} = \int W_{2i} = 0$ for the case where $\bar y_i = \bar x_i = 0$ gives as a special case of (A13)
and (A17) the results for the distribution in the case with no estimated
intercepts. In this case the mean given by (A12) remains zero, but the variance
in (A13) becomes 1/2 and the mean in (A17) also becomes 1/2. Thus,
$T\sqrt{N}(\hat\beta^*_{NT} - \beta) \to N(0, 2)$ for this case.
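The moments behind the N(0, 6) limit can be checked numerically. The following is only a rough simulation sketch, not part of the original derivation: it approximates $W_{1i}$ and $W_{2i}$ by independent Gaussian random walks and estimates the variance in (A13) and the mean in (A17), both of which should be close to 1/6. The function name `simulate_moments` and the values of `T`, `reps` and `seed` are illustrative assumptions.

```python
import random

def simulate_moments(T=100, reps=4000, seed=0):
    """Monte Carlo estimates of the (A13) variance and the (A17) mean (demeaned case)."""
    rng = random.Random(seed)
    num_sq, den = 0.0, 0.0
    for _ in range(reps):
        s, path, e1 = 0.0, [], []
        for _ in range(T):
            s += rng.gauss(0.0, 1.0)        # random walk standing in for W2
            path.append(s)
            e1.append(rng.gauss(0.0, 1.0))  # independent increments standing in for dW1
        mean = sum(path) / T
        # discrete analogue of  int W2 dW1 - W1(1) int W2
        num = sum((x - mean) * e for x, e in zip(path, e1)) / T
        num_sq += num * num
        # discrete analogue of  int W2^2 - (int W2)^2
        den += sum((x - mean) ** 2 for x in path) / T ** 2
    return num_sq / reps, den / reps

var_num, mean_den = simulate_moments()
# implied asymptotic variance of the ratio: (1/6) / (1/6)^2 = 6
ratio = var_num / mean_den ** 2
print(var_num, mean_den, ratio)
```

Dividing the numerator variance by the squared denominator mean gives the variance of the limiting normal distribution, which is why 1/6 over (1/6)^2 yields 6 here and 1/2 over (1/2)^2 yields 2 without estimated intercepts.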
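Corollary 1.2: In terms of earlier notation, the statistic can be rewritten as:

$$t_{\hat\beta^*_{NT}} = \frac{\frac{1}{\sqrt{N}}\sum_{i=1}^{N} \hat L_{11i}^{-1}\hat L_{22i}^{-1}\left( (0,1)\left(T^{-1}\sum_{t=1}^{T}\tilde Z_{it}\xi_{it}'\right)\left(1, -\frac{\hat L_{21i}}{\hat L_{22i}}\right)' - \hat\gamma_i \right)}{\sqrt{\frac{1}{N}\sum_{i=1}^{N}\hat L_{22i}^{-2}\left(T^{-2}\sum_{t=1}^{T}\tilde Z_{it}\tilde Z_{it}'\right)_{22}}} \tag{A19}$$

where the numerator converges to the same expression as in proposition 1.2,
and the root term of the denominator converges to the same value as in
proposition 1.2. Since the distribution of the numerator is centered around zero,
the asymptotic distribution of $t_{\hat\beta^*_{NT}}$ will simply be the distribution of the
numerator divided by the square root of this value from the denominator.
Since

$$E\left[\left(\int W_{2i}\,dW_{1i}\right)^2 - 2W_{1i}(1)\int W_{2i}\,dW_{1i}\int W_{2i} + W_{1i}(1)^2\left(\int W_{2i}\right)^2\right] = E\left[\int W_{2i}^2 - \left(\int W_{2i}\right)^2\right] \tag{A20}$$

by (A13) and (A17) regardless of whether or not $\int W_{1i}$, $\int W_{2i}$ are set to zero,
then $t_{\hat\beta^*_{NT}} \to N(0, 1)$ irrespective of whether $\bar x_i$, $\bar y_i$ are
estimated or not.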
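Proposition 1.3: Write the statistic as:

$$\bar t_{\hat\beta^*_{NT}} = \frac{1}{\sqrt{N}}\sum_{i=1}^{N} \hat L_{11i}^{-1}\left( (0,1)\left(T^{-1}\sum_{t=1}^{T}\tilde Z_{it}\xi_{it}'\right)\left(1, -\frac{\hat L_{21i}}{\hat L_{22i}}\right)' - \hat\gamma_i \right)\left(T^{-2}\sum_{t=1}^{T}\tilde Z_{it}\tilde Z_{it}'\right)_{22}^{-1/2} \tag{A21}$$

Then the first bracketed term converges to

$$L_{11i}L_{22i}\left(\int W_{2i}(r)\,dW_{1i}(r) - W_{1i}(1)\int W_{2i}(r)\,dr\right) \sim N\left(0,\ L_{11i}^2L_{22i}^2\left(\int W_{2i}(r)^2\,dr - \left(\int W_{2i}(r)\,dr\right)^2\right)\right) \tag{A22}$$

by virtue of the independence of $W_{2i}(r)$ and $dW_{1i}(r)$. Since the second bracketed
term converges to

$$L_{22i}^{-1}\left(\int W_{2i}(r)^2\,dr - \left(\int W_{2i}(r)\,dr\right)^2\right)^{-1/2} \tag{A23}$$

then, taken together, for $\hat L_i \to L_i$, (A21) becomes a standardized sum of i.i.d.
standard normals regardless of whether or not $\int W_{1i}$, $\int W_{2i}$ are set to zero,
and thus $\bar t_{\hat\beta^*_{NT}} \to N(0, 1)$ by a standard central limit theorem argument
irrespective of whether $\bar x_i$, $\bar y_i$ are estimated or not.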
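Proposition 2.1: Insert the expression for $y^*_{it}$ into the numerator and use
$y_{it} - \bar y_i = \beta(x_{it} - \bar x_i) + \mu_{it}$ to give

$$\hat\beta^*_{NT} = \frac{\sum_{i=1}^{N}\hat L_{11i}^{-1}\hat L_{22i}^{-1}\left(\sum_{t=1}^{T}(x_{it} - \bar x_i)\left(\mu_{it} - \frac{\hat L_{21i}}{\hat L_{22i}}\Delta x_{it}\right) - T\hat\gamma_i\right)}{\sum_{i=1}^{N}\hat L_{22i}^{-2}\sum_{t=1}^{T}(x_{it} - \bar x_i)^2} + \frac{\sum_{i=1}^{N}\hat L_{11i}^{-1}\hat L_{22i}^{-1}\left(1 + \frac{\hat L_{11i} - \hat L_{22i}}{\hat L_{22i}}\right)\beta\sum_{t=1}^{T}(x_{it} - \bar x_i)^2}{\sum_{i=1}^{N}\hat L_{22i}^{-2}\sum_{t=1}^{T}(x_{it} - \bar x_i)^2} \tag{A24}$$

Since $\hat L_{22i}^{-2} = \hat L_{11i}^{-1}\hat L_{22i}^{-1}\left(1 + \frac{\hat L_{11i} - \hat L_{22i}}{\hat L_{22i}}\right)$, the last term in (A24) reduces to $\beta$, thereby
giving the desired result.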
APPENDIX B
Table I. Small Sample Performance of Group Mean Panel FMOLS with
Heterogeneous Dynamics
Case 1: $\theta_{21i}$ ~ U(0.0, 0.8)

  N     T     bias    std error   5% size   10% size
 10    10    0.058     0.115      0.282      0.362
       20    0.018     0.047      0.084      0.145
       30    0.009     0.029      0.061      0.110
       40    0.006     0.020      0.035      0.076
       50    0.004     0.016      0.027      0.062
       60    0.003     0.012      0.020      0.049
       70    0.002     0.010      0.016      0.044
       80    0.002     0.009      0.014      0.040
       90    0.002     0.008      0.014      0.038
      100    0.001     0.007      0.014      0.037
 20    10    0.034     0.079      0.291      0.378
       20    0.012     0.033      0.100      0.166
       30    0.006     0.020      0.076      0.132
       40    0.004     0.014      0.045      0.093
       50    0.003     0.011      0.039      0.081
       60    0.003     0.009      0.028      0.066
       70    0.002     0.007      0.026      0.059
       80    0.002     0.006      0.021      0.055
       90    0.002     0.006      0.020      0.050
      100    0.001     0.005      0.018      0.052
 30    10    0.049     0.061      0.386      0.470
       20    0.017     0.025      0.156      0.234
       30    0.009     0.015      0.107      0.177
       40    0.006     0.011      0.072      0.133
       50    0.004     0.008      0.059      0.118
       60    0.003     0.007      0.047      0.096
       70    0.003     0.006      0.039      0.086
       80    0.002     0.005      0.035      0.073
       90    0.002     0.004      0.032      0.077
      100    0.002     0.004      0.030      0.076

Notes: Based on 10,000 independent draws of the cointegrated system (1)–(3), with
$\beta$ = 2.0, $\alpha_{1i}$ ~ U(2.0, 4.0), $\Psi_{11i} = \Psi_{22i}$ = 1.0, $\Psi_{21i}$ ~ U(−0.85, 0.85) and $\theta_{11i}$ ~ U(0.1, 0.7),
$\theta_{12i}$ ~ U(0.0, 0.8), $\theta_{21i}$ ~ U(0.0, 0.8), $\theta_{22i}$ ~ U(0.2, 1.0).
Table II. Small Sample Performance of Group Mean Panel FMOLS with
Heterogeneous Dynamics
Case 2: $\theta_{21i}$ ~ U(−0.8, 0.0)

  N     T     bias    std error   5% size   10% size
 10    10    0.082     0.132      0.422      0.498
       20    0.041     0.058      0.234      0.324
       30    0.025     0.037      0.187      0.268
       40    0.016     0.027      0.137      0.213
       50    0.012     0.021      0.115      0.185
       60    0.009     0.017      0.091      0.155
       70    0.007     0.014      0.087      0.151
       80    0.006     0.012      0.078      0.140
       90    0.005     0.011      0.072      0.135
      100    0.005     0.010      0.063      0.120
 20    10    0.093     0.092      0.581      0.648
       20    0.043     0.042      0.352      0.447
       30    0.026     0.027      0.265      0.361
       40    0.017     0.020      0.205      0.294
       50    0.012     0.015      0.158      0.242
       60    0.009     0.012      0.130      0.211
       70    0.007     0.010      0.117      0.194
       80    0.006     0.009      0.109      0.181
       90    0.005     0.008      0.103      0.170
      100    0.004     0.007      0.090      0.156
 30    10    0.070     0.071      0.563      0.630
       20    0.033     0.032      0.339      0.433
       30    0.020     0.020      0.259      0.352
       40    0.013     0.015      0.196      0.289
       50    0.009     0.011      0.152      0.236
       60    0.007     0.009      0.131      0.211
       70    0.006     0.008      0.113      0.190
       80    0.005     0.007      0.103      0.175
       90    0.004     0.006      0.096      0.164
      100    0.003     0.005      0.087      0.156

Notes: Based on 10,000 independent draws of the cointegrated system (1)–(3), with
$\beta$ = 2.0, $\alpha_{1i}$ ~ U(2.0, 4.0), $\Psi_{11i} = \Psi_{22i}$ = 1.0, $\Psi_{21i}$ ~ U(−0.85, 0.85) and $\theta_{11i}$ ~ U(0.1, 0.7),
$\theta_{12i}$ ~ U(−0.8, 0.0), $\theta_{21i}$ ~ U(−0.8, 0.0), $\theta_{22i}$ ~ U(0.2, 1.0).
Table III. Small Sample Performance of Group Mean Panel FMOLS with
Heterogeneous Dynamics
Case 3: $\theta_{21i}$ ~ U(−0.4, 0.4)

  N     T     bias    std error   5% size   10% size
 10    10    0.009     0.129      0.284      0.367
       20    0.011     0.052      0.113      0.179
       30    0.008     0.033      0.086      0.150
       40    0.005     0.023      0.058      0.113
       50    0.004     0.018      0.048      0.093
       60    0.003     0.014      0.039      0.083
       70    0.002     0.012      0.037      0.077
       80    0.002     0.011      0.031      0.072
       90    0.002     0.009      0.029      0.068
      100    0.001     0.008      0.028      0.062
 20    10    0.028     0.090      0.346      0.430
       20    0.014     0.037      0.145      0.222
       30    0.009     0.024      0.106      0.179
       40    0.006     0.017      0.077      0.138
       50    0.004     0.013      0.060      0.114
       60    0.003     0.010      0.048      0.093
       70    0.002     0.009      0.040      0.085
       80    0.002     0.008      0.037      0.083
       90    0.001     0.007      0.035      0.079
      100    0.001     0.006      0.035      0.078
 30    10    0.008     0.069      0.317      0.402
       20    0.006     0.028      0.122      0.194
       30    0.004     0.018      0.095      0.155
       40    0.003     0.013      0.068      0.122
       50    0.002     0.010      0.054      0.105
       60    0.001     0.008      0.044      0.088
       70    0.001     0.007      0.038      0.082
       80    0.001     0.006      0.036      0.076
       90    0.001     0.005      0.033      0.073
      100    0.001     0.005      0.036      0.074

Notes: Based on 10,000 independent draws of the cointegrated system (1)–(3), with
$\beta$ = 2.0, $\alpha_{1i}$ ~ U(2.0, 4.0), $\Psi_{11i} = \Psi_{22i}$ = 1.0, $\Psi_{21i}$ ~ U(−0.85, 0.85) and $\theta_{11i}$ ~ U(0.1, 0.7),
$\theta_{12i}$ ~ U(−0.4, 0.4), $\theta_{21i}$ ~ U(−0.4, 0.4), $\theta_{22i}$ ~ U(0.2, 1.0).
TESTING FOR COMMON CYCLICAL

FEATURES IN NONSTATIONARY
PANEL DATA MODELS
Alain Hecq, Franz C. Palm and Jean-Pierre Urbain
ABSTRACT
In this chapter we extend the concept of serial correlation common
features to panel data models. This analysis is motivated both by the need
to develop a methodology to systematically study and test for common
structures and comovements in panel data with autocorrelation present
and by an increase in efficiency coming from pooling procedures. We
propose sequential testing procedures and study their properties in a small
scale Monte Carlo analysis. Finally, we apply the framework to the well
known permanent income hypothesis for 22 OECD countries,
1950–1992.
I. INTRODUCTION
In economics it is often of interest to test whether a set of time series moves
together, that is whether the series are driven by some common factors. The
vast literature on cointegration has focussed on long-run comovements for
nonstationary time series. More recently, some authors have analyzed the
existence of short-run comovements between stationary time series or between
first differenced cointegrated-I(1) series (see Tiao & Tsay, 1989; Engle &
Kozicki, 1993; Gouriéroux & Peaucelle, 1993; Vahid & Engle, 1993; Vahid &
Engle, 1997; Ahn, 1997). Among these approaches, the concept of serial
correlation common features (SCCF hereafter) introduced by Engle & Kozicki
(1993) appeared to be useful. It means that stationary time series move together
as there exist linear combinations of these variables that yield white noise
processes. These common feature vectors are measures for analyzing short-run
relationships between economic variables suggested by economic theory such
as relative purchasing power parity (Gouriéroux & Peaucelle, 1993), permanent
income hypothesis (Campbell & Mankiw, 1990, Jobert, 1995), cross-country
real interest rate differentials (Kugler & Neusser, 1993), real business cycle
models (Issler & Vahid, 1996), convergence of economies (Beine & Hecq,
1997, 1998), Okun's Law (Candelon & Hecq, 2000).
ISBN: 0-7623-0688-2
Serial correlation common features imply the existence of a reduced number
of common dynamic factors explaining short-run comovements in economic
variables. A companion form of the common features models is the common
factor representation which has been used in macroeconomics for some
decades (see e.g. Engle & Watson, 1981; Geweke, 1977; Lumsdaine & Prasad,
1997; Singleton, 1980). Beyond economic considerations, through the reduced-rank
restrictions, the existence of common features is likely to lead to a
reduction of the number of parameters to be estimated. In general, imposing
common cyclical feature restrictions when they are appropriate will induce an
increase in estimation efficiency (Lütkepohl, 1991) and accuracy of forecasts
(Vahid & Issler, 1999).
Also as for unit roots and cointegration tests, the power of common cyclical
feature procedures may be low for small samples (Beine & Hecq, 1999). The
power of tests might be increased by relying on panel data instead of using only
time series data. Consequently, in this paper we propose to extend these models
by testing for serial correlation common features in a panel data framework. In
order to avoid confusion, it is worth noticing that standard panel data models
with common parameter structures obviously already imply a common feature
structure, namely the one which allows to pool the behavior of N individuals.
Notice that the assumption of poolability often made in panels may be often far
too strong. An investigator may want to test which poolability restrictions are
supported by the data and which restrictions have to be rejected for the panel
data.
We propose to generalize the SCCF approach and apply it to search for
common cyclical features in panel data. In particular, we investigate whether
there exist linear combinations of the variables for individual or entity i which
are white noise for all i, in other words, which weights in the linear
combinations are identical across all entities. Developing a methodology to
analyze and test common cyclical features in panel data is of theoretical and
practical importance since common cyclical feature restrictions are less
restrictive than the assumption of identical parameters across individuals
usually made in panel data modeling.
Some purists might not speak of panels for this type of analysis. Indeed,
in the situations we are interested in, N will be relatively small compared to its
value in usual panel data and T is assumed large (with $T \to \infty$ asymptotics).
Many macroeconomic studies deal with 15 to 50 annual observations for 20 to
100 countries, regions, industry levels or big firms. In those cases, the border
between pure panel analysis ($N \to \infty$) and pure time series analysis ($T \to \infty$) is
fuzzy. Far from impoverishing the panel data analysis, taking into account
medium or large size time series raises new interesting issues such as testing
for unit roots or cointegration in panel data (see inter alia Levin & Lin, 1993;
Pesaran & Smith, 1995; Evans & Karas, 1996a; Kao, 1999; Pedroni, 1997a;
Phillips & Moon, 1999b, and Phillips & Moon, 1999a, for the asymptotic
theory, and the recent issue of the Oxford Bulletin of Economics and Statistics,
1999).
The chapter is organized as follows. Section II provides an example of
common features between consumption and income implied by economic
theory and likely to be common to data for different countries. In Section III we
review the concept of serial correlation common features. Section IV extends
it to panel data. As we study differences and similarities in macroeconomic
series for different countries, we concentrate our analysis on the fixed effect
model (see Hsiao, 1986). Section V describes estimation procedures. In Section
VI simulation results are reported. In Section VII we present an empirical
analysis of the liquidity constraint consumption model for 22 OECD countries
and the G7. Section VIII concludes.
II. AN EXAMPLE OF COMMON FEATURES

To further motivate this chapter, consider the permanent income hypothesis
(PIH hereafter) and the heterogeneous consumer model proposed by Campbell
& Mankiw (1990, 1991). These authors consider two groups of agents who
receive a disposable income $y_{1t}$ and $y_{2t}$ in fixed proportions of the total income
respectively, such that $y_{1t} = \lambda y_t$, $y_{2t} = (1 - \lambda)y_t$ and $y_t = y_{1t} + y_{2t}$. Agents in the first
group are subject to liquidity constraints. Therefore, they consume their current
income while agents in the second group consume their permanent income. We
get the following system:

$$\begin{aligned} c_{1t} &= y_{1t} = \lambda y_t \\ c_{2t} &= y^P_{2t} = (1 - \lambda)y^P_t \\ y_{1t} &= y^P_{1t} + y^T_{1t} \\ y_{2t} &= y^P_{2t} + y^T_{2t}, \end{aligned} \tag{1}$$

where $c_{it}$ is the consumption of agent i and $y^P_{it}$ and $y^T_{it}$ are the permanent and
transitory component of income of the agent i which are assumed to be I(1) and
I(0), respectively. Aggregating over agents we get $c_t = y^P_{1t} + y^T_{1t} + y^P_{2t} = y^P_t + y^T_{1t}$, and
thus:

$$\begin{aligned} c_t &= y^P_t + \lambda y^T_t \\ y_t &= y^P_t + y^T_t, \end{aligned} \tag{2}$$

which shows that aggregate consumption and income share a common trend $y^P_t$.
Note that because a fraction $\lambda$ of income accrues to individuals who consume
their current income rather than their permanent income, this model has been
labelled the $\lambda$ model by Campbell & Mankiw (1990, 1991). It is also easily seen
that if $\lambda = 0$ we get the permanent income model. In order to stress the common
cycle component let us take the first difference of aggregate consumption
$c_t = c_{1t} + c_{2t}$. By substituting the shares of income in the total income we obtain
$c_t = \lambda y_t + (1 - \lambda)y^P_t$ which in first differences can be written as:

$$\Delta c_t = \lambda\Delta y_t + (1 - \lambda)\Delta y^P_t. \tag{3}$$

Consequently, assuming that the permanent income is a martingale, the
consumption function can be tested by the regression $\Delta c_t = \lambda\Delta y_t + (1 - \lambda)\varepsilon_t$.
However, $\varepsilon_t$ is a martingale difference which is not orthogonal by construction
to $\Delta y_t$. Therefore this equation cannot be consistently estimated by OLS but
instrumental variables (IV) estimators are appropriate.
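The inconsistency of OLS in the regression of consumption growth on income growth can be illustrated by a small simulation. This is a hedged sketch under assumptions that are not in the chapter: permanent income is a Gaussian random walk, transitory income is white noise with standard deviation 2, the share of liquidity-constrained agents is set to 0.4, and the only instrument is lagged income growth (the chapter also argues for using the cointegrating vector). The helper `slope` is hypothetical.

```python
import random

def slope(x, y, z=None):
    """Slope of y on x with a constant; simple IV slope if an instrument z is given."""
    z = x if z is None else z
    n = len(x)
    mx, my, mz = sum(x) / n, sum(y) / n, sum(z) / n
    czy = sum((a - mz) * (b - my) for a, b in zip(z, y))
    czx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    return czy / czx

rng = random.Random(1)
lam, T = 0.4, 5000            # assumed share of current-income consumers, sample size
yT_prev = 0.0
dy, dc = [], []
for _ in range(T):
    eps = rng.gauss(0.0, 1.0)   # innovation of permanent income (martingale difference)
    yT = rng.gauss(0.0, 2.0)    # transitory income, white noise for simplicity
    d_y = eps + (yT - yT_prev)  # income growth = permanent shock + change in transitory part
    dy.append(d_y)
    dc.append(lam * d_y + (1.0 - lam) * eps)  # equation (3) with a martingale permanent income
    yT_prev = yT

ols = slope(dy[1:], dc[1:])                # biased: eps is correlated with income growth
iv = slope(dy[1:], dc[1:], z=dy[:-1])      # lagged income growth as instrument
print(ols, iv)
```

In this design the OLS slope overstates the share parameter because the permanent income shock enters both the regressor and the error, while the IV slope stays close to the assumed value of 0.4.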
With a few exceptions such as Vahid & Engle (1993) and Jobert (1995), most
empirical studies do not take the cointegrating vector into account as a valid
instrument when testing equation (3) using IV estimates, and therefore may be
subject to an omitted variable problem. Vahid & Engle (1993) made the
connection with the common feature hypothesis that $\varepsilon_t$ is a white noise¹ with
$[1\ \ -\lambda]'$ the associated normalized common features vector. Empirical studies
have shown that $\lambda$ is usually significantly different from zero with coefficients
in the range 0.3 to 0.5 for most countries. Therefore in order to test for the
existence of one short-run relationship common to a set of countries and to
improve the power of common feature tests, a pooled common features test in
panel seems appropriate. The use of the cross-section dimension in the
estimation could also give rise to substantial efficiency gains.
III. COMMON FEATURES IN TIME SERIES

In the context of time series analysis, serial correlation common features means
that there exist linear combinations of (stationary) economic time series which
are white noise processes. Consider a cointegrated VAR of order p = 2, with
reduced rank autoregressive coefficient matrix, written in its VECM form, for
consumption and income, for t = 1, ..., T:

$$\begin{bmatrix}\Delta c_t \\ \Delta y_t\end{bmatrix} = \begin{bmatrix}\mu_1 \\ \mu_2\end{bmatrix} + \begin{bmatrix}\delta \\ 1\end{bmatrix}\begin{bmatrix}\gamma_{21} & \gamma_{22}\end{bmatrix}\begin{bmatrix}\Delta c_{t-1} \\ \Delta y_{t-1}\end{bmatrix} + \begin{bmatrix}\alpha \\ 1\end{bmatrix}\begin{bmatrix}\beta_1 & \beta_2\end{bmatrix}\begin{bmatrix}c_{t-1} \\ y_{t-1}\end{bmatrix} + \begin{bmatrix}\varepsilon_{1t} \\ \varepsilon_{2t}\end{bmatrix} \tag{4}$$

where $\mu_1$ and $\mu_2$ are constant drift terms, $[\varepsilon_{1t}, \varepsilon_{2t}]'$ is a bivariate white noise
process with non-singular covariance matrix $\Omega$. $(-\beta_2/\beta_1)$ is the long-run income
elasticity if one chooses consumption as normalized variable. A distinction
could be made at this stage between a strong and a weak form reduced rank
structure, as put forward by Hecq, Palm & Urbain (1997, 2000). The Strong
Form Reduced Rank Structure (SF) is the original formulation proposed by
Engle & Kozicki (1993) in which long-run and short-run matrices share the
same left null space. It corresponds to $\delta = \alpha$ in system (4). In this case, there
exists a common feature vector $\tilde\delta = [1\ \ -\delta]'$ such that premultiplying
expression (4) by $\tilde\delta'$ yields a white noise. In the less restrictive model, labelled
Weak Form Reduced Rank Structure (WF), $\delta \neq \alpha$, and a linear combination of
first differences in deviation from the long-run equilibrium is a white noise:

$$\begin{bmatrix}1 & -\delta\end{bmatrix}\left(\begin{bmatrix}\Delta c_t \\ \Delta y_t\end{bmatrix} - \begin{bmatrix}\mu_1 \\ \mu_2\end{bmatrix} - \begin{bmatrix}\alpha \\ 1\end{bmatrix}\begin{bmatrix}\beta_1 & \beta_2\end{bmatrix}\begin{bmatrix}c_{t-1} \\ y_{t-1}\end{bmatrix}\right) = \begin{bmatrix}1 & -\delta\end{bmatrix}\begin{bmatrix}\varepsilon_{1t} \\ \varepsilon_{2t}\end{bmatrix}. \tag{5}$$

Formal definitions of the strong and the weak form are given in Hecq, Palm &
Urbain (1997, 2000) and consequences in terms of common cycles as well as
inference issues are analyzed there as well. Notice that Hecq et al. (1997) also
consider the mixed form combining both the strong and the weak form.
Common features relationships give information on short-run comovements.
These relationships may come from economic theory (relative purchasing
power parity, PIH) or from stylized facts (convergence, Real Business Cycle
(RBC) models) and give the dynamic common factor within the system, i.e.
$\gamma_{21}\Delta c_{t-1} + \gamma_{22}\Delta y_{t-1}$ in the WF case for instance. The orthogonal complement of
$\tilde\delta$, labelled $\tilde\delta_\perp$ ($\tilde\delta'\tilde\delta_\perp = 0$), gives the factor loading of the common
dynamics in the equations, that is $\tilde\delta_\perp = [\delta\ \ 1]'$ in system (4). Note that these
common dynamic factors should not be confused with common cycles.
Common cycles are defined in a specific trend-cycle decomposition as the
stationary part of the time series left after removing permanent components.
Vahid & Engle (1993) show that the existence of s common feature vectors (of
the SCCF or SF type) leads to n − s common cycles in the multivariate
Beveridge-Nelson decomposition. Vahid & Engle (1997) extend this definition
to nonsynchronous cycles. Hecq, Palm & Urbain (2000) propose a Beveridge-Nelson
decomposition for the WF that allows for a reduced number of common
cycles. Note that the latter weak form reduced rank structure will in the sequel
not be explicitly considered as we want to focus on the extension of the
standard serial correlation common feature analysis to panel data. We use the
terms common dynamic features, common cyclical features and common
dynamic factors as synonyms to denote reduced rank structures in the short-run
dynamics of the first-differenced VAR or the VECM.
In this simple bivariate model (4), the serial correlation common feature
hypothesis may also be written in terms of moment conditions such as:

$$E[(\Delta c_t - \delta\Delta y_t)\cdot W_t] = 0, \tag{6}$$

where $E[\cdot]$ is the expectation operator, $[1\ \ -\delta]'$ is the normalized common
feature vector and $W_t = \{1, \Delta c_{t-1}, \ldots, \Delta c_{t-k}, \Delta y_{t-1}, \ldots, \Delta y_{t-l}, z_{t-1}\}$
is a set of instruments consisting of a constant term, the lags of both
variables and the deviation from the long-run relationship
$z_{t-1} \equiv c_{t-1} - (-\beta_2/\beta_1)y_{t-1}$.
Adopting a two-step approach,2 there are two obvious ways to test for SCCF.
The first way is to carry out a canonical correlation analysis between
consumption and income on the one hand and the set of instruments on the
other hand. The non-significant squared canonical correlations reveal the
existence of linear combinations which yield white noise processes. Alternatively, one can use generalized method of moments type estimators
exploiting the moment condition (6). A test of overidentifying restrictions
implied by (6) is a test of serial correlation common features. The use of
canonical correlation estimation has the advantage that results do not rely on
the choice of the normalization of the moment conditions. Moreover, it is more
convenient when we test for the number of common feature vectors. In this
paper we treat the problem in a GMM framework for several reasons. Firstly,
we have at most one common feature vector in a bivariate system. Secondly,
this framework may be more easily extended to panel data models. Finally,
normalization imposed on IV by selecting one variable as having a coefficient
equal to one leads to an increase of the power of the test compared with those
based on canonical correlations.3
IV. EXTENSION TO PANEL DATA MODELS

Frequently, analyses comparing for instance the PIH with the $\lambda$ model
concentrate on one country, very often the USA. In order to motivate the
generality of the theory, some authors extend their empirical investigation to
several countries (Campbell & Mankiw, 1991; Evans & Karas, 1996b).
However it is difficult to claim that results for different countries are
uncorrelated. Since it is not possible to construct a pure time series model with
relatively few observations for a large number of individuals, such as a VAR
model with 2N endogenous variables in a bivariate case, alternatives must be
found.
One solution would be to analyze the system under separation in common
features (Hecq, Palm & Urbain, 1999), an extension to separation in
cointegration (Granger & Haldrup, 1997; Konishi & Granger, 1993). Under
separation in common features, the common feature matrix is block-diagonal
with blocks corresponding to one individual i only. Treating the issue in the
complete system with separation in common features avoids losing efficiency
compared to an analysis of the marginal model for individual i since separation
does not require block-diagonality of the disturbance covariance matrix. This
solution is however difficult to implement for more than two or three
individuals. We illustrate this point via a small Monte Carlo experiment, of
which the precise specification will be given in Section VI. Consider a DGP
made out of bivariate systems similar to (4), with equal loading coefficients on
the short-run and long-run terms (the SCCF hypothesis), for
respectively two and five individuals. The only cross-sectional relations are due
to a non-diagonal disturbance covariance matrix. Complete separation in
cointegration, in common features as well as absence of bidirectional short-run
Granger causality are thus maintained. Using a standard canonical correlation
framework (see inter alia Hecq, Palm & Urbain, 1997) we perform a serial
correlation common feature analysis in the marginal model for the first
individual, ignoring the disturbance cross-correlations. Alternatively, under
separation in common features, we test the number (s = 2 or s = 5) of common
feature vectors for each individual in the complete system. We then constrain
the common feature space to be block-diagonal (see Hecq, Palm & Urbain,
1999) and estimate the vector for the first individual.
In Table 1, we report for 5,000 replications the median and the spread
(interquartile range) of the bias, $\chi^2$ test statistics for the overidentifying
restrictions implied by the presence of common features as well as a small
sample adjusted version (Hecq, 1999). Although separation in common
features holds at the level of the DGP, some efficiency loss, as measured by the
spread, is observed in the marginal model compared to the full system for
T = 25 for N = 2 and for T = 50 for N = 5. However the dispersion is too high for
smaller sample size and test statistics reject too often the presence of
respectively two and five common feature vectors.

Table 1. Monte Carlo Results
(Separated vs. Marginal Systems)

N = 2                       T = 10    T = 25    T = 50    T = 100
Marg    bias_0.5             0.056     0.026     0.011     0.005
        bias_0.75−0.25       0.310     0.155     0.104     0.068
        χ²(2)               14.64      7.56      6.30      4.86
        χ²ss(2)              6.22      5.20      5.04      4.42
Separ   bias_0.5             0.040     0.027     0.013     0.007
        bias_0.75−0.25       0.441     0.138     0.090     0.059
        χ²(8)               70.98     18.36     10.16      6.66
        χ²ss(8)             12.8       7.14      6.16      5.14

N = 5                       T = 10    T = 25    T = 50    T = 100
Marg    bias_0.5             0.061     0.025     0.012     0.006
        bias_0.75−0.25       0.299     0.152     0.100     0.069
        χ²(2)               14.14      7.82      6.30      5.58
        χ²ss(2)              5.86      5.44      5.18      5.04
Separ   bias_0.5               –       0.019     0.011     0.007
        bias_0.75−0.25         –       0.241     0.087     0.052
        χ²(14)                 –      99.76     62.88     25.18
        χ²ss(14)               –      35.04     15.26      9.38

These illustrative Monte Carlo results call for an extension to a (possibly
nonstationary) panel common feature analysis.
Let the subscript i = 1, ..., N indicate the different groups/entities/units,
t = 1, ..., T denote the sample period and j = 1, ..., n index the
variables for each group/entity. We assume that the n-dimensional vector of
observed I(1) variables for entity i, $X_{i,t}$, is generated by a $p_i$-th order
cointegrated VAR which can be expressed in error-correction form as follows:

$$\Delta X_{i,t} = \mu_i + \tau_t + \alpha_i\beta_i' X_{i,t-1} + \sum_{j=1}^{p_i - 1}\Gamma_{i,j}\Delta X_{i,t-j} + \varepsilon_{i,t}, \qquad i = 1, \ldots, N,\quad t = 1, \ldots, T, \tag{7}$$

where $\mu_i$ denotes fixed individual effects, $\tau_t$ denotes a vector of deterministic
time effects, $\alpha_i$ and $\beta_i$ are $n \times r_i$ matrices of full column rank with $r_i$ being the
cointegrating rank ($r_i < n$) and $\varepsilon_{i,t}$ is a disturbance. The vector
$\varepsilon_t = (\varepsilon_{1,t}', \ldots, \varepsilon_{N,t}')'$ is an $nN \times 1$ dimensional homoskedastic Gaussian mean innovation
process relative to $\mathcal{X}_{t-1} = \{X_{i,t-j},\ i = 1, \ldots, N;\ j < t\}$ with non-singular
contemporaneous covariance matrix $\Omega$, the (i, j)-th block of which is
$E(\varepsilon_{i,t}\varepsilon_{j,t}') = \Omega_{i,j}$. Note that one could allow for random individual effects in
expression (7). This would lead to an error-component structure of $\varepsilon_{i,t}$ similar
to that used in the panel-data literature.
For system (7), we define a homogeneous SCCF panel model as follows:
Definition 1. A panel model is called a homogeneous panel common feature
model if there exists, $\forall i = 1, \ldots, N$, a ($n \times s_i$) matrix $\tilde\delta_i = \tilde\delta_j$, $\forall i, j = 1, \ldots, N$,
whose columns span the individual co-feature space, such that
$\tilde\delta_i'\Delta X_{i,t} = \tilde\delta_i'\varepsilon_{i,t}$
is an $s_i$-dimensional white noise process for each individual.

This definition applies to the case where the individual co-feature matrices $\tilde\delta_i$,
and hence their column ranks $s_i$, are the same across all individuals. A typical
dynamic panel data model with fixed effects $\mu_i$ and deterministic time effects
$\tau_t$ arises as a special case of (7) when the parameters $\alpha_i$, $\beta_i$, $\Gamma_{i,j}$ and $\Omega_i$ are the
same across entities i (see e.g. Hoogstrate, 1998). In order to clarify the nature
of the hypotheses underlying the panel common feature restrictions, in the next
subsection, following Groen & Kleibergen (1999) for panel cointegration, we
consider a model resulting from sequentially testing and imposing restrictions
on a high dimensional unrestricted VECM.
A. A Panel VECM Representation
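We are interested in testing for cointegration and common serial features with
respect to n I(1) time series in vector $X_{i,t}$ within a dynamic model for N
individuals i. Without loss of generality, we consider a large VECM with one
lag in the first differences, i.e. a VAR with two lags in levels. The
generalization to higher order dynamics is immediate by substituting $\Gamma_{ij}$ by $\Gamma_{ij}(L)$
in (8) but it makes the notation heavy. We consider the model without any time
dummies for the sake of simplicity. For t = 1, ..., T we may write the
nN-dimensional system as:

$$\Delta X_t = \begin{bmatrix}\Pi_{11} & \cdots & \Pi_{1N} \\ \vdots & & \vdots \\ \Pi_{N1} & \cdots & \Pi_{NN}\end{bmatrix}X_{t-1} + \begin{bmatrix}\Gamma_{11} & \cdots & \Gamma_{1N} \\ \vdots & & \vdots \\ \Gamma_{N1} & \cdots & \Gamma_{NN}\end{bmatrix}\Delta X_{t-1} + u_t, \tag{8}$$

where $\Delta X_t = (\Delta X_{1,t}' \ldots \Delta X_{N,t}')'$, $u_t = (u_{1,t}' \ldots u_{N,t}')'$ and
$X_{t-1} = (X_{1,t-1}' \ldots X_{N,t-1}')'$ are vectors of dimension $nN \times 1$, or more concisely

$$\Delta X_t = \Pi^{ur}X_{t-1} + \Gamma^{ur}\Delta X_{t-1} + u_t, \tag{9}$$

where $\Pi^{ur}$ and $\Gamma^{ur}$ are $nN \times nN$ matrices and $u_t = \mu + \varepsilon_t$, $\mu = (\mu_1', \ldots, \mu_N')'$,
$\varepsilon_t = (\varepsilon_{1,t}', \ldots, \varepsilon_{N,t}')'$ are $nN \times 1$ vectors with $\varepsilon_t \sim N(0, \Omega)$, where

$$\underset{nN \times nN}{\Omega} = \begin{bmatrix}\Omega_{11} & \cdots & \Omega_{1N} \\ \vdots & & \vdots \\ \Omega_{N1} & \cdots & \Omega_{NN}\end{bmatrix}. \tag{10}$$

When $\Pi^{ur} = 0$, the system (9) is non-cointegrated. The approach presented can
be applied to non-cointegrated systems. Obviously, in such a system, the WF and
SF reduced rank structures are identical.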
Without imposing any zero block restrictions, the large unrestricted model
(8) is not estimable in practice. Consequently, restrictions have to be
considered. We first describe cointegrating restrictions before introducing serial
correlation common feature restrictions.
1. Cointegrating Restrictions In A Panel VAR
We first consider restrictions on the long-run matrix $\Pi^{ur}$ in the unrestricted
VECM. Two types (A and B) of sequences of hypotheses naturally arise in
panel data. The hypotheses involved in a sequence can be tested either
sequentially or jointly.
A1: Absence of long-run Granger-causality [see Granger & Lin, 1995]
between the individual subgroups, i.e. $\Pi^{ur}$ is block-diagonal with elements
$\Pi_{ij} = 0$ for $i \neq j$.
A2: Cointegration in absence of long-run Granger-causality, i.e. $\Pi_{ii} = \alpha_i\beta_i'$,
with $\alpha_i$ and $\beta_i$ being $n \times r_i$ matrices of rank $r_i$, i = 1, ..., N.
A3: Homogeneous panel cointegration, i.e. $\beta_i = \beta_1$, i = 1, ..., N; $r = Nr_1$.
B1: Cointegration, i.e. $\Pi^{ur} = \alpha\beta'$, with $\alpha$ and $\beta$ being $nN \times r$ matrices of rank
r.
B2: Complete separation in cointegration (see Granger & Haldrup, 1997), i.e.
$\alpha$ and $\beta$ are block-diagonal with typical blocks $\alpha_i$ and $\beta_i$ respectively, of rank
$r_i$, such that a typical block of $\alpha\beta'$ is $\alpha_i\beta_i'$ as defined in A2, and $r = \sum_{i=1}^{N} r_i$.
B3: Homogeneous panel cointegration, i.e. $\beta_i = \beta_1$; i = 1, ..., N; $r = Nr_1$.
When the first two sets of restrictions in either sequence hold, the following
restricted structure arises.

$$\Delta X_t = \begin{bmatrix}\alpha_1 & 0 & \cdots & 0 \\ 0 & \ddots & & \vdots \\ 0 & \cdots & 0 & \alpha_N\end{bmatrix}\begin{bmatrix}\beta_1' & 0 & \cdots & 0 \\ 0 & \ddots & & \vdots \\ 0 & \cdots & 0 & \beta_N'\end{bmatrix}X_{t-1} + \begin{bmatrix}\Gamma_{11} & \cdots & \Gamma_{1N} \\ \vdots & & \vdots \\ \Gamma_{N1} & \cdots & \Gamma_{NN}\end{bmatrix}\Delta X_{t-1} + u_t. \tag{11}$$

When it is appropriate to add a restricted trend in the cointegration space, we
replace $X_{t-1}$ by $X^*_{t-1} = (X_{t-1}', t)'$. For N fixed, a likelihood ratio statistic for
testing (11) versus (8) can be obtained using the sum of two different
conditional likelihood ratio statistics to test the sets of restrictions {A1, A2} or
{B1, B2}. Next, homogeneity of panel cointegration can be tested using a
likelihood ratio test. A decomposition similar to {A1, A2} is proposed by
Groen & Kleibergen (1999). The main problem with this approach is that under
A1, that is absence of long-run Granger-causality, the usual tests have an
unknown asymptotic distribution, as the possible presence of cointegration
interferes with the block-diagonality of $\Pi^{ur}$. On the other hand, once the
cointegrating rank in the unrestricted VECM has been fixed, a test statistic with
separation as the null hypothesis has a $\chi^2$ asymptotic distribution. It is
worthwhile to mention that although model (11) looks rather specific, it is less
restrictive than the models used in the dynamic panel literature, where quite
frequently, in addition to separation in cointegration, the same parameter
structure is assumed to hold across individuals (see inter alia the overview in
Phillips & Moon, 1999b). Occasionally, complete separation is relaxed to
requiring $\beta$ to be block-diagonal leaving $\alpha$ unrestricted (Larsson & Lyhagen,
1999).
2. Common Feature Restrictions
Imposing serial correlation common feature and short-run Granger-non-causality
restrictions, system (11) becomes:

$$\Delta X_t = \begin{bmatrix}\tilde\delta_{1\perp}\alpha_1^*\beta_1' & 0 & \cdots & 0 \\ 0 & \ddots & & \vdots \\ 0 & \cdots & 0 & \tilde\delta_{N\perp}\alpha_N^*\beta_N'\end{bmatrix}X_{t-1} + \begin{bmatrix}\tilde\delta_{1\perp}\Gamma_1^* & 0 & \cdots & 0 \\ 0 & \ddots & & \vdots \\ 0 & \cdots & 0 & \tilde\delta_{N\perp}\Gamma_N^*\end{bmatrix}\Delta X_{t-1} + u_t. \tag{12}$$

As for cointegrating restrictions, this model may be obtained by considering
two of the next three hypotheses under (11).
C1: Serial correlation common features: there exists a ($nN \times s$) matrix $\tilde\delta$ such
that $\tilde\delta'\Delta X_t$ is an s-dimensional white noise, with $s = \sum_{i=1}^{N} s_i$.
C2: Absence of short-run Granger-causality between the individual subgroups:
$\Gamma^{ur}$ is block-diagonal, i.e. $\Gamma_{ij} = 0$ for $i \neq j$.
C3: Separation in common features: the matrix $\tilde\delta'$ is block-diagonal with the
($s_i \times n$) matrix $\tilde\delta_i'$ being a typical block on the main diagonal, $s = \sum_{i=1}^{N} s_i$.
C4: Homogeneity of common features: $\tilde\delta_i = \tilde\delta_1$; i = 1, ..., N; $s = Ns_1$.
Actually the hypothesis C2 is implicit when one stacks VECMs. Restriction C3
is developed in Hecq, Palm & Urbain (1999) for the SCCF as well as for the
weak form structure. Here again a likelihood ratio for testing model (12) versus
(11) can be obtained as the sum of two conditional likelihood ratio statistics to
test either {C1, C2} or {C2, C3}. This means that we can first test for common
cyclical features under the maintained hypothesis of short-run Granger-non-causality
C2. Alternatively, we can first test for absence of short-run causality
and then test for SCCF since both sequences of restrictions imply separation in
common features. This result is derived from Proposition 3.3 in Hecq, Palm &
Urbain (1999) which states that under separation in cointegration and
block-diagonality of the long-run matrix, the presence of common features implies
that the co-feature matrix is block-diagonal.
V. GMM ESTIMATION
To test for common features in a time series context, we have the choice
between GMM estimators applied to a regression framework and a canonical
correlation procedure based on maximum likelihood (ML) estimation. Both
methods have their advantages and drawbacks. The ML estimation is fully
efficient and likelihood ratio tests are asymptotically most powerful. GMM
estimators can be more easily implemented but they are in general not fully
efficient. In this section we present a GMM estimator that will be used in our
empirical analysis of a bivariate system for consumption and income for the
case where at most one serial correlation common feature vector exists.
For each individual, let us split $X_{i,t} = (y_{i,t}, z_{i,t})'$ and let the bivariate DGP be
$$\Delta y_{i,t} = \mu_i + \tilde\delta_i^{*}\, \Delta z_{i,t} + \nu_{i,t} \qquad (13)$$
$$\Delta z_{i,t} = \alpha_i\,(y_i - \beta_i^{*} z_i)_{t-1} + \sum_{k=1}^{p_i-1} \gamma^{(i)}_{1,k}\, \Delta y_{i,t-k} + \sum_{k=1}^{p_i-1} \gamma^{(i)}_{2,k}\, \Delta z_{i,t-k} + \varepsilon_{i,t}, \qquad (14)$$
where the second equation, for $\Delta z_{i,t}$, is just one row of the VECM (11), with
normalized cointegrating vector $\beta_i' = [1, -\beta_i^{*}]$. Both the $\Delta y$'s and the $\Delta z$'s are
autocorrelated as the disturbances $\nu_{i,t}$ depend on lagged values of $\Delta y_{i,t}$, $\Delta z_{i,t}$
and on the error correction mechanism. Under the null of serial correlation
common features for individual $i$, $\nu_{i,t}$ is a white noise process and the
normalized SCCF vector is given by $\tilde\delta_i' = [1, -\tilde\delta_i^{*}]$.
In practice (Vahid & Engle, 1993, 1997), after the cointegration analysis in
the first step, the GMM procedure proceeds as follows. Regress the explanatory
variables $\Delta z_t$ on the whole set of instruments (i.e. lags of $\Delta X_t$ and cointegrating
vectors) in order to obtain the best linear prediction $\Delta \hat z_t$. Then regress $\Delta y_t$ on a
constant term and $\Delta \hat z_t$. This estimate gives the potential serial correlation
common feature vector $\tilde\delta_i$. Finally, one tests for the validity of the
overidentifying restrictions using Hansen's (1982) $\chi^2$ test.
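The two regressions and the overidentification test just described can be sketched for a single individual as follows. This is only an illustrative Python sketch, not the GAUSS routines used for the chapter: the function name `sccf_gmm_single` and its arguments are hypothetical, and the test statistic is computed in its Sargan form, which assumes homoskedastic white-noise disturbances under the null.

```python
import numpy as np

def sccf_gmm_single(dy, dz, W):
    """Two-step IV estimate of the co-feature coefficient for one individual.

    dy : (T,) first difference of the normalized variable
    dz : (T,) or (T, m) first differences of the remaining variables
    W  : (T, k) instruments (lagged differences, lagged long-run relations)
    Returns (coefficients incl. constant, J statistic, degrees of freedom).
    """
    dy = np.asarray(dy, dtype=float)
    dz = np.asarray(dz, dtype=float).reshape(len(dy), -1)
    T = len(dy)
    const = np.ones((T, 1))
    X = np.hstack([const, dz])                 # regressors: constant and dz
    Zi = np.hstack([const, np.asarray(W, dtype=float)])  # instruments incl. constant
    # Step 1: best linear prediction of the regressors given the instruments
    X_hat = Zi @ np.linalg.lstsq(Zi, X, rcond=None)[0]
    # Step 2: regress dy on the constant and the predicted dz
    coef = np.linalg.lstsq(X_hat, dy, rcond=None)[0]
    resid = dy - X @ coef                      # residuals use the actual dz
    # Overidentification statistic (Sargan form, homoskedasticity assumed):
    # T times the R-squared of the residuals projected on the instruments
    fitted = Zi @ np.linalg.lstsq(Zi, resid, rcond=None)[0]
    J = T * (fitted @ fitted) / (resid @ resid)
    df = Zi.shape[1] - X.shape[1]              # instruments minus parameters
    return coef, J, df
```

With one explanatory variable and k instruments the sketch returns k - 1 degrees of freedom, which matches $n(p_i - 1) + r_i - (n - 1)$ for n = 2. The statistic would then be compared with a $\chi^2$ critical value.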
A. Heterogeneous Independent Case

When the observations on individuals are assumed cross-sectionally independent, a joint test for the existence of one individual-specific (heterogeneous)
common feature vector can be obtained by computing the $\chi^2$-statistics for the
SCCF restrictions for each individual [$\chi_i \sim \chi^2(\nu_i)$], with the same number of
variables for each $i$ but with the possibility of having different dynamics and
the presence or not of cointegrating vectors. The number of degrees of freedom
is then given by $\nu_i = n(p_i - 1) + r_i - (n - 1)$ since $s_i$ equals one. Using the
standard central limit theorem for large $N$, we then have
$$\frac{\sum_{i=1}^{N} \chi_i - \nu}{(2\nu)^{1/2}} \sim N(0, 1), \quad \text{where } \nu = \sum_{k=1}^{N} \nu_k. \qquad (15)$$
This procedure is however not appropriate in the presence of cross-correlation,
a phenomenon pointed out inter alia by O'Connell (1998) in the case of panel
unit root tests. The size distortions increase with N and with the cross-correlation. While these distortions are DGP-dependent, we observe empirical
sizes of about 20% (nominal size = 5%) for T = 25 and N = 10 as well as for
T = 50 and N = 25 using a Monte Carlo experiment similar to the one presented
in Section VI (see Note 4).
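As a small illustration of the standardization in (15), the following Python sketch pools individual statistics under the independence assumption. The function name `pooled_normal_stat` is hypothetical, and the sketch relies only on the fact that a $\chi^2(\nu_i)$ variable has mean $\nu_i$ and variance $2\nu_i$.

```python
import math

def pooled_normal_stat(chi2_stats, dfs):
    """Standardized sum of independent chi-square statistics.

    chi2_stats : individual SCCF test statistics, one per individual
    dfs        : their degrees of freedom nu_i
    """
    nu = sum(dfs)                      # total degrees of freedom
    # the sum has mean nu and variance 2*nu when the individuals are independent
    return (sum(chi2_stats) - nu) / math.sqrt(2.0 * nu)
```

For example, two individuals with statistics 4 and 6 and two degrees of freedom each give (10 - 4)/sqrt(8), to be compared with a standard Normal critical value. As the text stresses, this is not appropriate when the individuals are cross-correlated.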
B. Homogeneous and Heterogeneous Dependent Case
In most cases disturbances across individuals i will be at least contemporaneously correlated, i.e. some $\Sigma_{ij} \neq 0$ for $i \neq j$, and/or $\Sigma_{ii}$ is non-diagonal
for some i. For instance, when testing for PPP in panel data, contemporaneous
disturbance correlation arises because one country must serve as a benchmark.
Also, for a given country, consumption and income cannot be
assumed independent. One way to deal with this cross-country correlation is to
incorporate a common time dummy in the panel. This solution was pursued by
Pedroni (1997b) in the context of panel cointegration tests, but it appears that
time dummies do not capture all the correlation; see O'Connell (1998). Another
solution, which we use here, is to account for cross-correlation by using GLS or SUR
type corrections. These corrections require that T > N, and the asymptotics we
consider are mainly based on $T \to \infty$ while N is fixed or at least grows at a
lower rate than T.
Assuming that all the variables in levels are I(1), we first test for each
individual i the existence of a cointegrating relationship using standard time
series-based procedures. In case the null hypothesis of no cointegration can
be rejected, the cointegration vector(s) are then considered as known in the
subsequent analyses. An alternative to the time series based cointegration
analysis is to rely on a test procedure designed for cointegrated panel models,
a procedure which could possibly be more powerful. The asymptotic arguments
used in panel cointegration analysis are however mainly based on large N-asymptotics and independence across units while we are here dealing with
fixed N cases allowing for dependence across the units. Existing Monte Carlo
simulations furthermore reveal (see inter alia McCoskey & Kao, 1998b;
Pedroni, 1997b) the occurrence of some problems when cross-correlation
exists. Moreover, the properties of common feature test statistics will be
affected by the outcome of the cointegration analysis. Indeed, if one
erroneously imposes an identical homogeneous cointegrating matrix $\beta_i^{*}$ for all
i, while for some j cointegration does not hold or holds with a cointegrating
matrix different from $\beta_i^{*}$, the probability of rejecting the SCCF restrictions will
tend to increase.
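Before presenting the GMM estimator, we present the model under common
features in general terms. Under separation C3, the model (11) can be written
as
$$
\underbrace{\begin{pmatrix}
\tilde\delta_1' & 0 & \cdots & 0 \\
0 & \tilde\delta_2' & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & \tilde\delta_N'
\end{pmatrix}}_{s \times nN}
\underbrace{\begin{pmatrix} \Delta X_{1t} \\ \Delta X_{2t} \\ \vdots \\ \Delta X_{Nt} \end{pmatrix}}_{nN \times 1}
=
\underbrace{\begin{pmatrix}
\tilde\delta_1' & 0 & \cdots & 0 \\
0 & \tilde\delta_2' & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & \tilde\delta_N'
\end{pmatrix}}_{s \times nN}
\underbrace{\begin{pmatrix} u_{1t} \\ u_{2t} \\ \vdots \\ u_{Nt} \end{pmatrix}}_{nN \times 1}
\qquad (16)
$$
with $s = \sum_{i=1}^{N} s_i$ and $u_t = (u_{1t}', u_{2t}', \dots, u_{Nt}')'$ being IIN(0, $\Sigma$).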
Under the homogeneity assumption C4, the model (16) specializes to
become
$$(I_N \otimes \tilde\delta_1')\, \Delta X_t = (I_N \otimes \tilde\delta_1')\, u_t. \qquad (17)$$
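As in (13) and (14), we partition the vector $\Delta X_{it}$ as $[\Delta y_{it}', \Delta z_{it}']'$, where $\Delta y_{it}$ and
$\Delta z_{it}$ are $s_i \times 1$ and $(n - s_i) \times 1$ subvectors. The matrix $\tilde\delta_i$ is normalized (without
loss of generality) as follows: $\tilde\delta_i' = [I_{s_i}, -\tilde\delta_i^{*\prime}]$. Under this normalization, the
system (16) can be expressed as
$$
\underbrace{\begin{pmatrix} \Delta y_{1t} \\ \Delta y_{2t} \\ \vdots \\ \Delta y_{Nt} \end{pmatrix}}_{s \times 1}
=
\underbrace{\begin{pmatrix}
\tilde\delta_1^{*\prime} & 0 & \cdots & 0 \\
0 & \tilde\delta_2^{*\prime} & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & \tilde\delta_N^{*\prime}
\end{pmatrix}}_{s \times (nN - s)}
\underbrace{\begin{pmatrix} \Delta z_{1t} \\ \Delta z_{2t} \\ \vdots \\ \Delta z_{Nt} \end{pmatrix}}_{(nN - s) \times 1}
+ \underbrace{\tilde\delta' u_t}_{s \times 1}
\qquad (18)
$$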
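or more compactly
$$\Delta y_t = B' \Delta z_t + v_t \qquad (19)$$
with $\Delta y_t = [\Delta y_{1t}', \dots, \Delta y_{Nt}']'$, $B' = \mathrm{diag}(\tilde\delta_i^{*\prime})$, $\Delta z_t = [\Delta z_{1t}', \dots, \Delta z_{Nt}']'$, $v_t = \tilde\delta' u_t$,
$\tilde\delta' = \mathrm{diag}(\tilde\delta_i')$. Transposing (19) and writing the model for a sample of T
observations, we get
$$\underset{T \times s}{Y} = \underset{T \times (nN - s)}{Z}\ \underset{(nN - s) \times s}{B} + \underset{T \times s}{V} \qquad (20)$$
or in vectorized form
$$\underset{Ts \times 1}{y^*} = \underset{Ts \times (ns - \sum_i s_i^2)}{Z^*}\ \underset{(ns - \sum_i s_i^2) \times 1}{\delta} + \underset{Ts \times 1}{v^*} \qquad (21)$$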
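with $y^* = \mathrm{vec}(Y)$, $v^* = \mathrm{vec}(V)$, $Z^* = \mathrm{diag}(I_{s_i} \otimes Z_i)$ with $Z_i = [\Delta z_{itl}]$, of
dimension $T \times (n - s_i)$, with $t = 1, \dots, T$, $l = 1, \dots, n - s_i$; and $\delta$ being a vector
with typical i-th subvector being equal to $\mathrm{vec}(\tilde\delta_i^{*})$. Under the homogeneity
assumption C4, $\tilde\delta_i^{*} = \tilde\delta_1^{*}$, $i = 1, \dots, N$, $s = N s_1$, the system (21) specializes to
become
$$y^* = Z_r^* \delta_r + v^* \qquad (22)$$
with the $[T N s_1 \times s_1(n - s_1)]$ matrix
$$Z_r^* = \begin{pmatrix} I_{s_1} \otimes Z_1 \\ I_{s_1} \otimes Z_2 \\ \vdots \\ I_{s_1} \otimes Z_N \end{pmatrix}$$
and the $[s_1(n - s_1) \times 1]$ vector $\delta_r = \mathrm{vec}(\tilde\delta_1^{*})$.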
The vectors of parameters $\delta$ and $\delta_r$ can be estimated by GMM provided we
have a $(Ts \times k)$ matrix of instrumental variables W such that $E[W' v^*] = 0$ and k is
equal to or larger than the number of unknown parameters in $\delta$ (or $\delta_r$).
The GMM estimator solving $W' v^* = 0$ using the weighting matrix S is given
by
$$\hat\delta_{GMM} = [Z^{*\prime} W S^{-1} W' Z^*]^{-1} Z^{*\prime} W S^{-1} W' y^*. \qquad (23)$$
The optimal weighting matrix is $S = W' \Omega W$, where $\Omega = E[v^* v^{*\prime}] = I_T \otimes \Sigma_v$,
$\Sigma_v = \tilde\delta' \Sigma \tilde\delta$. When $\Sigma$ is unknown, it will have to be replaced by a consistent
estimate. The asymptotic covariance matrix of $\hat\delta_{GMM}$ with optimal weighting
matrix S is given by
$$\mathrm{Var}(\hat\delta_{GMM}) = [Z^{*\prime} W (W' \Omega W)^{-1} W' Z^*]^{-1}. \qquad (24)$$
Under homogeneity C4, $\delta_r$ can be estimated by expression (23) replacing $Z^*$
by $Z_r^*$. When the number of instruments k is strictly larger than the number of
parameters $\delta$ (or $\delta_r$) to be estimated, these overidentifying restrictions can be
tested using the well-known minimum distance criterion
$$\min_{\delta}\ (v^{*\prime} W)(W' \Omega W)^{-1}(W' v^*), \qquad (25)$$
which has an asymptotic $\chi^2$-distribution with the number of degrees of freedom
being equal to k minus the number of estimated parameters.
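Expressions (23) to (25) translate almost line by line into matrix code. The following Python sketch is one possible transcription under the assumption that the stacked arrays are already available; the function name `gmm_estimate` and the argument `Omega` are hypothetical, and `Omega` must be the disturbance covariance matrix (or a consistent estimate) for the criterion to be asymptotically $\chi^2$.

```python
import numpy as np

def gmm_estimate(y, Z, W, Omega=None):
    """Linear GMM estimator with weighting matrix S = W' Omega W.

    y     : (m,) stacked dependent variable
    Z     : (m, q) stacked regressors
    W     : (m, k) instruments, k >= q
    Omega : (m, m) disturbance covariance; identity if omitted (an assumption)
    Returns (estimate, covariance as in (24), criterion (25), degrees of freedom).
    """
    y = np.asarray(y, dtype=float)
    Z = np.asarray(Z, dtype=float)
    W = np.asarray(W, dtype=float)
    if Omega is None:
        Omega = np.eye(len(y))
    S = W.T @ Omega @ W                          # weighting matrix
    ZW = Z.T @ W
    A = ZW @ np.linalg.solve(S, W.T @ Z)         # Z'W S^{-1} W'Z
    b = ZW @ np.linalg.solve(S, W.T @ y)         # Z'W S^{-1} W'y
    delta = np.linalg.solve(A, b)                # expression (23)
    cov = np.linalg.inv(A)                       # expression (24)
    g = W.T @ (y - Z @ delta)                    # sample moments W'v
    J = float(g @ np.linalg.solve(S, g))         # criterion (25) at the estimate
    df = W.shape[1] - Z.shape[1]                 # k minus number of parameters
    return delta, cov, J, df
```

In the just-identified case with W = Z and an identity `Omega`, the sketch reduces to OLS and the criterion is zero, which is a convenient sanity check.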
Some remarks on the choice of the instruments have to be made. We can
determine the order $p_i$ of the VAR for each country i using for instance
information criteria. The lagged first differences $\Delta X_{i,t-j}$, $j = 1, \dots, p_i - 1$, and
the lagged long-run relations can be used to yield $n(p_i - 1) + r_i$ instruments $W_i$
for $Z_i$ in (19) and taking $W = \mathrm{diag}(\iota_{T s_i}, W_i)$ where $r_i$ is the cointegrating rank of
individual i. As is well-known, the OLS estimator regressing $y^*$ on $\hat Z^*$,
where the $\hat Z^*$ are the projections of $Z^*$ on W, can be obtained as a GMM
estimator by selecting $S = I_{Ts}$ in (23) and taking $W(W'W)^{-1}W'$ as instrument.
Similarly, the GLS estimator regressing $y^*$ on $\tilde Z^* = W(W' \Omega_*^{-1} W)^{-1} W' \Omega_*^{-1} Z^*$, with $\Omega_*$ being the disturbance covariance matrix of the
(multivariate) regression of $Z^*$ on W, can be obtained from (23) by taking
$S = \Omega$ and using as instruments $W(W' \Omega_*^{-1} W)^{-1} W' \Omega_*^{-1}$ instead of W.
In the empirical analysis in Section VII, we consider a fixed effects model
because in the macroeconomic application, we study the population and not a
sample. Adding fixed effects to the model (21) for the case which we analyze,
i.e. for $s_i = 1$, $i = 1, \dots, N$ and $n = 2$, yields
$$y^* = Z_\mu \mu\ [+\, Z_\lambda \lambda] + Z_r^* \delta_r + v^*, \qquad (26)$$
where $Z_\mu = \iota_T \otimes I_N$ and $Z_\lambda = I_T \otimes \iota_N$, with $\iota_T$ and $\iota_N$ being unit vectors of
dimension T and N respectively. Let $J_N$ denote an $N \times N$ matrix of ones, so
$Z_\lambda Z_\lambda' = I_T \otimes J_N$ and the projection on $Z_\lambda$ is $I_T \otimes \bar J_N$ with $\bar J_N = J_N / N$. This
matrix averages over individuals. Also define time means by $Z_\mu Z_\mu' = J_T \otimes I_N$; 
the projection on $Z_\mu$ is $\bar J_T \otimes I_N$. It is shown in Baltagi (1995, p. 28) that
$$\hat\delta_{r,GMM} = (\tilde Z_r^{*\prime} Q \Omega^{-1} Q \tilde Z_r^{*})^{-1} \tilde Z_r^{*\prime} Q \Omega^{-1} Q y, \qquad (27)$$
where $Q = I_{NT} - \bar J_T \otimes I_N$ for the model with only individual effects and
$Q = I_T \otimes I_N - \bar J_T \otimes I_N - I_T \otimes \bar J_N + \bar J_T \otimes \bar J_N$ when time dummies are present. The
estimator (27) with $\tilde Z_r^{*} = W(W' \Omega_*^{-1} W)^{-1} W' \Omega_*^{-1} Z_r^*$ will be indicated as the
GLS-LSDV estimator. When the matrix $\Omega$ is replaced by the identity matrix, a
less-efficient estimator arises which will be denoted as the LSDV estimator.
The asymptotic covariance matrix of $\hat\delta_{r,GMM}$ with optimal weighting matrix S is
then given by
$$\mathrm{Var}(\hat\delta_{r,GMM}) = [Z_r^{*\prime} Q W (W' \Omega W)^{-1} W' Q Z_r^{*}]^{-1}. \qquad (28)$$
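The within transformation Q used above is easy to build with Kronecker products. The Python sketch below is illustrative only: the function name `within_projector` is hypothetical, and it assumes the observations are stacked with time as the slow index and individuals as the fast index, which is the ordering implied by a dummy matrix of the form "unit vector of length T, Kronecker, identity of order N".

```python
import numpy as np

def within_projector(T, N, time_effects=False):
    """Matrix Q that sweeps out individual (and optionally time) effects."""
    Jbar_T = np.full((T, T), 1.0 / T)    # averaging matrix over time
    Jbar_N = np.full((N, N), 1.0 / N)    # averaging matrix over individuals
    # individual effects only: subtract individual time means
    Q = np.eye(T * N) - np.kron(Jbar_T, np.eye(N))
    if time_effects:
        # two-way case: also subtract period means and add back the grand mean
        Q = Q - np.kron(np.eye(T), Jbar_N) + np.kron(Jbar_T, Jbar_N)
    return Q
```

Applying Q to the individual dummy matrix (and, in the two-way case, to the time dummy matrix) gives zero, and Q is idempotent, so premultiplying the data by Q is equivalent to including the dummies.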
A test for the validity of the overidentifying restrictions is obtained using (25) and
is readily seen to be a test for the null hypothesis of C4, i.e. for the null of
homogeneity of common features: $\tilde\delta_i = \tilde\delta_1$, $i = 1, \dots, N$, with $s = N s_1$, $s_i = s_1 = 1$,
$i = 1, \dots, N$. In this specific case, the number of degrees of freedom for the
overidentifying restrictions test (25) is given by
$$\sum_{i=1}^{N} [n(p_i - 1) + r_i - (n - 1)] + (n - 1)(N - 1),$$
where n, $p_i$, $r_i$ are respectively the number of variables, the
number of lags and the number of cointegrating relations for each i. Note that
the factor $(n - 1)(N - 1)$ arises as a consequence of the pooled estimation of
the common feature vector. Imposing a common co-feature vector actually
decreases by $(n - 1)(N - 1)$ the number of parameters to be estimated.
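The degrees-of-freedom count can be checked with a few lines of arithmetic. The Python sketch below (the function name `overid_df` is hypothetical) evaluates the formula for the case of one common feature vector per individual. With n = 2, one lagged difference and one cointegrating relation per individual it gives 2, 5, 14, 29 and 74 for N = 1, 2, 5, 10 and 25, which are the values reported in the df column of Table 2.

```python
def overid_df(n, lagged_diffs, ranks):
    """Degrees of freedom of the pooled overidentifying restrictions test.

    n            : number of variables per individual
    lagged_diffs : list of p_i - 1 (number of lagged differences) per individual
    ranks        : list of cointegrating ranks r_i per individual
    """
    N = len(lagged_diffs)
    # individual contributions: instruments minus individual parameters
    individual = sum(n * p + r - (n - 1) for p, r in zip(lagged_diffs, ranks))
    # pooling saves (n - 1)(N - 1) parameters
    return individual + (n - 1) * (N - 1)
```

The same function reproduces the df entries of the empirical tables when the lag length is fixed uniformly, for instance 65 for 22 countries with one lagged difference and one cointegrating relation each.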
More generally, one could naturally extend the analysis (in the case n > 2)
and consider similar analyses for $s_1 = 1, \dots, n - 1$. Sequentially testing, for
$s_1 = 1, \dots, n - 1$, the validity of the underlying overidentifying restrictions
with (25) provides a direct way to test the number of common co-features in
a GMM set-up, provided we first properly normalize the co-feature matrix as
above. A somewhat similar use of GMM for the detection of the dimension of
the common feature space, albeit in a pure time series context, is discussed in
Vahid & Engle (1997).
In the next section, we evaluate the merits of this analysis (for $s_i = s_1 = 1$,
$i = 1, \dots, N$) in a small Monte Carlo experiment.
VI. MONTE CARLO SIMULATIONS

In this section we present some illustrative Monte Carlo evidence on the
usefulness of the common feature test statistic (25) presented above for panel
data. The data are generated as if there exists a huge VECM with both common
feature and cointegrating restrictions. Under the null of reduced rank structures,
the bivariate DGP for each of N individuals assumes the existence of one
cointegrating vector and of a single common feature vector. It has the form:
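$$
\begin{pmatrix} \Delta y_{i,t} \\ \Delta z_{i,t} \end{pmatrix}
= \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}
+ \begin{pmatrix} 0.25 \\ 0.5 \end{pmatrix} \begin{pmatrix} 1 & -1 \end{pmatrix}
\begin{pmatrix} y_{i,t-1} \\ z_{i,t-1} \end{pmatrix}
+ \begin{pmatrix} 0.5 \\ 1 \end{pmatrix} \begin{pmatrix} 0.6 & 0.3 \end{pmatrix}
\begin{pmatrix} \Delta y_{i,t-1} \\ \Delta z_{i,t-1} \end{pmatrix}
+ \begin{pmatrix} \varepsilon_{i1,t} \\ \varepsilon_{i2,t} \end{pmatrix},
$$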
where the $\mu$'s are generated from uniform distributions $\mu_1 \sim U(0, 0.3)$, $\mu_2 \sim U(-0.25, 0.15)$
so that $E(\mu_1) = 0.15$ and $E(\mu_2) = -0.05$. The normalized
common feature vector is $\tilde\delta = (1, -0.5)'$ and the normalized cointegration
vector is simply $\beta = (1, -1)'$. For each individual i, $(\varepsilon_{i1,t}, \varepsilon_{i2,t})'$ is bivariate
Gaussian with covariance matrix $\Sigma_{ii}$. The cross-contemporaneous correlation
matrices between individual i and j are all equal to $\Sigma_{ij}$ so that the panel VECM
covariance matrix $\Sigma$ is given by (10) with
$$\Sigma_{ii} = \begin{pmatrix} 1 & 0.8 \\ 0.8 & 1 \end{pmatrix}, \qquad \Sigma_{ij} = \begin{pmatrix} 0.7 & 0.6 \\ 0.6 & 0.75 \end{pmatrix}.$$
We have added a heterogeneous structure increasing with N (see Note 5).
Figures 1 and 2 illustrate a realization of the DGP for 10 individuals and two
variables and then they compare processes with (Fig. 2) and without (Fig. 1)
this additional heteroscedasticity.
Fig. 1. A Realization of the DGP for 10 Individuals.
Fig. 2. A Realization of the DGP with Additional Heteroscedasticity.
From this DGP we see that under the
assumption of reduced rank the short run dynamic matrices (for each i) are
simply given by
$$\begin{pmatrix} 0.30 & 0.15 \\ 0.60 & 0.30 \end{pmatrix},$$
while under the alternative we chose to arbitrarily fix one element to zero:
$$\begin{pmatrix} 0.30 & 0.00 \\ 0.60 & 0.30 \end{pmatrix}.$$
We consider three sample sizes, i.e. T = 10, 25 and 50, and five cases for the
number of individuals, i.e. N = 1, 2, 5, 10 and 25. We report the median and the
spread (interquartile range) of the bias of the GMM panel estimator. We also
report the median of the standard deviation of $\hat\delta_{r,GMM}$. We report the empirical
size (nominal being 5%) as well as the empirical size-adjusted power for the overidentifying restrictions test statistics. df denotes the number of degrees of
freedom. Due to the huge computational time required for these simulations,
5,000 replications were used for N = 1, 2, 5; 2,000 for N = 10 and 1,000 for
N = 25.
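To make the design concrete, the DGP under the null can be simulated along the following lines. This Python sketch is not the GAUSS code used for the reported experiments: the names are hypothetical, the burn-in length is an arbitrary choice, the additional heteroskedasticity of Note 5 is left out, and the panel covariance matrix is assembled on the assumption that (10) has $\Sigma_{ii}$ on the block diagonal and $\Sigma_{ij}$ in every off-diagonal block.

```python
import numpy as np

ALPHA = np.array([0.25, 0.5])               # adjustment coefficients
BETA = np.array([1.0, -1.0])                # normalized cointegrating vector
GAMMA = np.outer([0.5, 1.0], [0.6, 0.3])    # reduced-rank short-run matrix
DELTA = np.array([1.0, -0.5])               # normalized co-feature vector

def panel_covariance(N):
    """Block covariance with S_ii on the diagonal and S_ij elsewhere (assumed form)."""
    S_ii = np.array([[1.0, 0.8], [0.8, 1.0]])
    S_ij = np.array([[0.7, 0.6], [0.6, 0.75]])
    return np.kron(np.eye(N), S_ii - S_ij) + np.kron(np.ones((N, N)), S_ij)

def simulate_panel(T, N, rng, burn=50):
    """Simulate levels X of shape (T, N, 2) from the bivariate VECM for N individuals."""
    mu = np.column_stack([rng.uniform(0.0, 0.3, N), rng.uniform(-0.25, 0.15, N)])
    eps = rng.multivariate_normal(np.zeros(2 * N), panel_covariance(N), size=T + burn)
    eps = eps.reshape(T + burn, N, 2)
    X = np.zeros((T + burn + 2, N, 2))
    for t in range(2, T + burn + 2):
        ec = X[t - 1] @ BETA                # error-correction term y - z, shape (N,)
        d_lag = X[t - 1] - X[t - 2]         # lagged first differences, shape (N, 2)
        dX = mu + np.outer(ec, ALPHA) + d_lag @ GAMMA.T + eps[t - 2]
        X[t] = X[t - 1] + dX
    return X[-T:]                           # drop start-up values and burn-in
```

Since `DELTA @ GAMMA` and `DELTA @ ALPHA` are both zero, the combination of the two first differences with weights (1, -0.5) is a constant plus white noise in every simulated individual, which is the common feature restriction being tested.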
The results are presented in Table 2. One can directly observe that the bias
is small and decreases when T and/or N increase. The accuracy of
estimates, measured both by the spread and the standard deviation of the
estimate, also increases with T and/or N. We interpret these illustrative findings
as evidence in favor of the pooling estimator. No substantial size distortions are
observed. Note that the values of N we have retained in these simulations are
clearly too small to assess the validity of a central limit theorem based on large
N asymptotics.

Table 2. Monte Carlo Results (GMM estimation and test statistic)

N    T    bias (median)   bias (Q75-Q25)   $\sigma(\hat\delta_{r,GMM})$ (median)   $\chi^2$ df   size (%)   size-adj. power (%)
1    10   0.0123          0.2228           0.156                                   (2)           7.88       9.90
1    25   0.0101          0.1387           0.098                                   (2)           5.58       19.78
1    50   0.0067          0.0944           0.070                                   (2)           5.54       34.68
2    10   0.0136          0.1817           0.106                                   (5)           4.98       8.56
2    25   0.0069          0.1057           0.079                                   (5)           6.18       16.58
2    50   0.0034          0.0726           0.057                                   (5)           5.72       31.52
5    10   0.0045          0.1409           0.067                                   (14)          3.96       7.26
5    25   0.0044          0.0751           0.060                                   (14)          5.68       12.52
5    50   0.0021          0.0460           0.047                                   (14)          5.74       24.82
10   25   0.0022          0.0658           0.046                                   (29)          4.70       11.00
10   50   0.0020          0.0377           0.038                                   (29)          4.80       21.55
25   50   0.0002          0.0398           0.029                                   (74)          5.80       13.80
VII. EMPIRICAL ANALYSIS

The data we use are taken from the Penn World Tables Mark 5.6 (see Summers
& Heston, 1991, and Note 6). Thanks to the homogeneity in their definition, these data are
extremely useful and have been extensively used in the empirical literature.
However the data are certainly not free of measurement errors because the price
to pay for obtaining long series of homogeneous data for more than 150
countries is the reliance on a set of hypotheses, approximations and
interpolations. Because of both the quality of the data as well as the underlying
theoretical motivation, we limit our analysis to 22 OECD countries for the
sample period 1950-1992 (up to 1991 for Greece and 1990 for Portugal; see Note 7). The
data extracted are Y = RGDPL: Real GDP per capita (Laspeyres index) in
1985 international prices and C = (PWT variable C: Real Consumption share of GDP in
1985 international prices) × Y/100. This last operation is necessary to get
consumption in level and not as a percentage of income (see Note 8). Figure 3 plots the 44
series, namely consumption and income variables for the OECD countries.
Fig. 3. Consumption and Output Series for the 22 OECD Countries.
The picture also pleads in favor of having tools available to model this
information. Lower case c and y denote natural logarithms of C and Y
respectively.
Table 3 reports time series statistics for each country. The first column of
Table 3 lists in alphabetical order, the names of the countries as well as the date
of joining OECD.9 Column 2 gives the quality ranking of the data as presented
in Summers & Heston (1991). It is seen that for the most part, the quality of the
data is reasonable. Columns 3 and 4 give the value of the Augmented Dickey-Fuller unit root test for respectively consumption and income. All tests are
based on both a constant term and a trend. The number of lags necessary to
whiten the residuals is given in parentheses. Columns 5 and 6 give respectively
the value of the Engle-Granger Augmented Dickey-Fuller cointegrating test for
each country separately and the long-run elasticity of consumption as a
dependent variable. Column 7 gives the order of the VAR(pi) in level, where pi
is determined using multivariate Hannan-Quinn (HQC) criteria. These lags, as
well as the presence of an error correcting mechanism term, will determine the
instruments to be used in common features test statistics.
In Table 3, a * indicates that individual unit root or cointegration test
statistics reject the null at a 5% nominal level. It emerges that, except for
Portugal, UK and Turkey, we cannot reject the unit root hypothesis for
consumption and income. Using the Engle-Granger cointegration test, the null
hypothesis of non-cointegration is rejected for nine countries with long-run
elasticity $\beta_i^{*}$ close to 1 (see Note 10). Consequently, we will use the cointegrating vectors
as instruments in six different versions: four homogeneous cases and two
heterogeneous ones. We proceed in two steps. In the first step the cointegrating
vectors are estimated. They are used as instruments in the second step to
estimate the common feature vectors. The results are reported in Table 4.

Table 3. Time Series Statistics (Individual countries)

Country (OECD entry)   Qual.   ADF c_t    ADF y_t    EG         $\beta_i^{*}$   HQC
Australia (1971)       A       1.21(4)    0.93(2)    1.46(1)    0.95            3
Austria (1961)         A       0.82(0)    1.25(2)    3.59(0)*   1.00            1
Belgium (1961)         A       1.43(1)    0.74(1)    2.36(0)    0.94            1
Canada (1961)          A       1.50(1)    1.80(1)    3.89(1)*   1.00            1
Denmark (1961)         A       0.94(0)    0.94(0)    3.69(0)*   0.82            1
Finland (1969)         A       2.48(1)    0.20(2)    1.69(3)    0.98            4
France (1961)          A       0.11(2)    0.04(1)    1.96(0)    0.98            2
Germany (1961)         A       2.18(2)    3.10(2)    1.69(2)    1.07            2
Greece (1961)          A       0.58(0)    0.01(0)    0.79(0)    0.97            1
Iceland (1961)         B+      2.64(1)    2.23(1)    4.52(0)*   1.04            1
Ireland (1961)         A       2.54(1)    2.82(1)    3.76(2)*   0.81            1
Italy (1961)           A       0.61(1)    0.77(1)    1.86(1)    1.09            4
Japan (1964)           A       0.91(0)    0.48(1)    4.75(1)*   0.92            4
Luxembourg (1961)      A       1.45(1)    3.32(4)    2.16(4)    1.34            4
Netherlands (1961)     A       0.71(2)    0.20(2)    3.07(1)    1.08            4
New Zealand (1973)     A       2.26(0)    1.52(0)    5.93(0)*   1.02            1
Norway (1961)          A       1.29(1)    1.76(1)    1.83(1)    0.80            1
Portugal (1961)        A       3.54(3)*   2.95(3)    3.07(3)    0.88            3
Spain (1961)           A       1.25(0)    1.34(0)    2.99(0)    0.94            1
Sweden (1961)          A       0.70(1)    0.30(1)    3.58(1)*   0.81            2
Switzerland (1961)     B+      0.03(4)    1.69(2)    3.28(0)    0.92            2
Turkey (1961)          C       3.26(2)    3.48(0)*   1.73(0)    0.85            1
UK (1961)              A       3.61(1)*   3.62(1)*   2.13(0)    1.04            3
USA (1961)             A       1.75(0)    2.05(0)    4.08(0)*   1.15            2
The homogeneous cases refer to a panel estimation of a common
cointegrating vector, that is, parameters are assumed to be the same across
countries and the contemporaneous disturbance correlation across countries
and across variables for a given country is ignored. Absence of short-run
Granger-causality between countries is assumed throughout steps 1 and 2.
Because most panel cointegration test statistics assume independence across
individuals, we cannot, strictly speaking, rely on these panel cointegration test
statistics. However, because the estimator of the cointegrating vectors is still
consistent, we use them to get estimates for four different cases.
As Table 3 shows, even when the absence of cointegration is not rejected, the
elasticity is close to one. We first analyze a version in which we assume that
there exists a homogeneous cointegrating relationship for all the countries
with a coefficient $\beta^{*}$ equal to one (see upper panel of Table 4). Similar results
are obtained using Johansen's MLE based procedure.
A second panel cointegration test uses the group mean estimator (GM) of
Pesaran et al. (1997). This means that we average cointegrating vectors over
the 22 individuals.
Table 4. Common Features within 22 OECD Countries

            $\hat\delta_{r,GM}$  $N_{GM}$  $\hat\delta_{r,GMM}$  $\sigma(\hat\delta_{r,GMM})$  Test    df   p-val

$\beta_i^{*} = 1$ (∀i)
p* = 1      0.770                3.71      0.745                 0.050                         148.98  65   < 0.001
p* = 2      0.769                6.14      0.660                 0.036                         173.65  109  < 0.001
p* = 3      0.770                4.43      0.704                 0.031                         211.27  153  0.001
p* = p*_i                                  0.718                 0.036                         156.04  93   < 0.001

$\beta_i^{*} = \hat\beta^{*}_{GM} = 0.979$ (∀i)
p* = 1      0.829                5.36      0.768                 0.051                         146.67  65   < 0.001
p* = 2      0.804                6.54      0.670                 0.036                         176.61  109  < 0.001
p* = 3      0.793                4.95      0.710                 0.031                         214.06  153  < 0.001
p* = p*_i                                  0.728                 0.036                         156.92  93   < 0.001

$\beta_i^{*} = \hat\beta^{*}_{OLS} = 0.939$ (∀i)
p* = 1      0.870                5.74      0.814                 0.050                         131.96  65   < 0.001
p* = 2      0.837                5.12      0.687                 0.036                         170.16  109  < 0.001
p* = 3      0.822                3.93      0.727                 0.031                         206.93  153  0.002
p* = p*_i                                  0.738                 0.036                         145.01  93   < 0.001

$\beta_i^{*} = \hat\beta^{*}_{LSDV} = 0.968$ (∀i)
p* = 1      0.855                6.03      0.782                 0.051                         142.93  65   < 0.001
p* = 2      0.821                6.25      0.677                 0.036                         175.97  109  < 0.001
p* = 3      0.804                4.94      0.715                 0.031                         213.50  153  0.001
p* = p*_i                                  0.733                 0.036                         155.12  93   < 0.001

$\beta_i^{*} \neq \beta_j^{*}$ (∀i, j with cointegration)
p* = 1      0.814                6.89      0.782                 0.053                         138.45  52   < 0.001
p* = 2      0.726                6.16      0.647                 0.036                         158.74  96   < 0.001
p* = 3      0.755                4.46      0.696                 0.031                         210.03  140  < 0.001
p* = p*_i                                  0.707                 0.037                         146.50  80   < 0.001

$\beta_i^{*} = 1$ (∀i with cointegration)
p* = 1      0.865                1.59      0.810                 0.056                         115.25  52   < 0.001
p* = 2      0.784                3.89      0.682                 0.037                         144.00  96   0.001
p* = 3      0.775                2.72      0.734                 0.033                         192.33  140  0.002
p* = p*_i                                  0.750                 0.040                         131.56  80   < 0.001
A third alternative uses the usual OLS estimator.
The last one allows for intercept heterogeneity and is the usual LSDV
estimator.
Note that the pooled FM-OLS estimator proposed by Pedroni (1997a), which
assumes independence across units, gives a point estimate of 0.971 for the 22
OECD countries and 1.021 for the G7 countries, the latter being not
significantly different from one. Both results are very close to those obtained
with the LSDV and OLS estimators so that the results of the common cyclical
feature analysis obtained with Pedroni's FM-OLS estimator are not reported.
For the two heterogeneous cases we impose cointegration for the nine
countries for which the Engle-Granger ADF test is significant. In step 2, we
take as instruments the cointegrating vectors of the countries for which we reject
the null. Notice that Phillips-Hansen Fully Modified OLS estimation was also
used to test formally the assumption of unit long-run elasticity. The null of unit
long-run elasticity was formally rejected in all cases of cointegration but for
three (Austria, Canada and New Zealand). Two different cases are considered:
(a) For the nine countries we take the estimated value of $\beta_i^{*}$ given by the long-run regression.
(b) We fix these values to 1.
The maximum lag length for a country is four, so that p* = (p - 1) = 0, 1, 2 or 3
for some countries. The following cases are considered:
(a) p* is fixed uniformly to respectively 1, 2, 3.
(b) p* is fixed to the value determined using the HQ criterion.
Note that over-estimating the lag length will certainly reduce the power of the
test statistics (Beine & Hecq, 1999). The results of the two panel common
feature statistics are presented. For the heterogeneous cases, the first two
columns present the group mean estimates (denoted by $\hat\delta_{r,GM}$) as well as the
value of the Normal test statistics ($N_{GM}$) in (15) which tests for the significance
of one common feature vector. The next columns present the value of the common
feature elasticity for the homogeneous dependent case (denoted by $\hat\delta_{r,GMM}$), the
associated standard errors, denoted by $\sigma(\hat\delta_{r,GMM})$, as well as the value of the test
of the overidentifying restrictions implied by the common feature vector
(column labelled Test), asymptotically $\chi^2(df)$ under the null, with the column df
indicating the degrees of freedom of these statistics. The final column labelled
p-val reports the associated p-values. Note that in the second step, we always
assume the occurrence of nonzero contemporaneous disturbance correlation.
It appears that the estimated coefficients $\hat\delta_{r,GMM}$ and $\hat\delta_{r,GM}$ are too high
compared with a priori expectations. Moreover we reject the null of a panel
common feature model for both test statistics. Table 5 presents the results for
the G7. The results are similar to those for the panel of 22 countries. However
in several situations we cannot reject the null of one homogeneous common
feature vector. In these cases, we imposed the unlikely hypotheses of a
homogeneous cointegrating vector with a lag order uniformly fixed to p* = 3.
Finally, we want to point out the implications for empirical modeling that
follow from a restriction between the number of variables n and the sum of
cointegrating vectors and common feature vectors. From Vahid & Engle
(1993), Theorem 1, it follows that the common feature space and the
cointegration space are linearly independent. This means that the sum of the
number of common feature vectors (s) and of the number of cointegrating
vectors (r) should be less than or equal to the number of variables (n): $r + s \leq n$.
In a panel context under the absence of long and short-run Granger causality,
this has obvious but different implications depending on whether common
feature vectors and cointegrating vectors are homogeneous or heterogeneous.
Table 5. Common Features within G7 Countries

            $\hat\delta_{r,GM}$  $N_{GM}$  $\hat\delta_{r,GMM}$  $\sigma(\hat\delta_{r,GMM})$  Test    df   p-val

$\beta_i^{*} = 1 = \hat\beta^{*}_{LSDV}$ (∀i)
p* = 1      0.866                2.47      1.042                 0.087                         32.83   20   0.035
p* = 2      0.763                2.37      0.856                 0.060                         53.70   34   0.017
p* = 3      0.755                1.81      0.872                 0.052                         67.05   48   0.036
p* = p*_i                                  0.884                 0.058                         50.84   30   0.010

$\beta_i^{*} = \hat\beta^{*}_{GM} = 1.035$ (∀i)
p* = 1      0.893                1.64      1.021                 0.082                         31.51   20   0.048
p* = 2      0.777                1.815     0.857                 0.060                         50.25   34   0.036
p* = 3      0.766                1.49      0.878                 0.052                         62.75   48   0.075*
p* = p*_i                                  0.892                 0.057                         46.22   30   0.029

$\beta_i^{*} = \hat\beta^{*}_{OLS} = 1.023$ (∀i)
p* = 1      0.882                1.75      1.036                 0.084                         32.06   20   0.043
p* = 2      0.771                1.89      0.856                 0.060                         51.11   34   0.030
p* = 3      0.762                1.51      0.876                 0.052                         63.87   48   0.062*
p* = p*_i                                  0.890                 0.057                         47.84   30   0.021

$\beta_i^{*} \neq \beta_j^{*}$ (∀i, j with cointegration)
p* = 1      0.818                6.02      0.894                 0.074                         49.07   16   < 0.001
p* = 2      0.710                3.58      0.723                 0.053                         52.66   30   0.006
p* = 3      0.737                2.13      0.787                 0.047                         64.46   44   0.024
p* = p*_i                                  0.800                 0.051                         46.61   26   0.008

$\beta_i^{*} = \beta_j^{*} = 1$ (∀i, j with cointegration)
p* = 1      0.875                2.68      1.029                 0.089                         27.69   16   0.034
p* = 2      0.753                2.60      0.859                 0.062                         47.49   30   0.022
p* = 3      0.764                1.66      0.894                 0.053                         60.14   44   0.053*
p* = p*_i                                  0.917                 0.061                         43.97   26   0.015
A misspecification of the number of homogeneous cointegrating vectors may

for instance too heavily constrain the dimension of the homogeneous common
feature space and lead to flawed inference regarding the existence of common
features.
A last remark seems in order. Although we can formally reject the existence
of a common homogeneous co-feature relation in this OECD data set, one
should be aware that our results do not per se imply the absence of SCCF for
some of the countries taken individually.
VIII. CONCLUSION
In this chapter we extended the serial correlation common feature analysis to
nonstationary panel data models. Concentrating upon the fixed effect model,
we defined homogeneous panel common feature models. We gave a series of
steps allowing one to implement these tests. We then applied this framework to
investigate the liquidity constraints model for 22 OECD and G7 countries. At
a 5% nominal level, we reject the presence of a panel common feature vector.
From the empirical analysis we can draw several (tentative) conclusions:
First, in a country by country analysis, for slightly less than
50% of the countries in the sample there is evidence of cointegration between
consumption and income. The cointegrating vector appears to be homogeneous
across these countries with a long-run consumption elasticity close to one.
Second, for the sample of 22 countries, the existence of one homogeneous
SF (SCCF) common feature vector is rejected in most instances when using the
test proposed in (15). For the sample of G7 countries, in several instances, the
occurrence of a homogeneous SF common feature vector is not rejected. Notice
that this restriction is obviously less restrictive when it only applies to seven
countries. However the p-values are quite low and the non-rejection of the null
hypothesis occurs when the model might be misspecified, in particular because
we have maintained a homogeneous lag length of 3.
Third, the overidentifying restrictions implied by the assumption of a
homogeneous common feature vector are rejected in all instances in the sample
of 22 countries. For the G7 countries, again there is occasionally evidence in
favor of the overidentifying restrictions.
Again, it is not surprising to see that the assumption of homogeneous
common features is rejected more frequently than the assumption of
homogeneous cointegration. In the long run consumption and income are
closely linked to each other, whereas short-run deviations are generally possible and can
be realized through saving or borrowing.
Our model representation is not stricto sensu a dynamic panel because only
a part of the dynamics is common to all individuals. However it does part of the
job. Indeed while no size distortions have been noticed in our Monte Carlo
results, we can increase the power of test statistics by going a step further
towards dynamic panel data if the null hypothesis of a panel common-cyclical
feature model is not rejected. In the opposite case, it is not worth imposing
further common restrictions if the null is rejected. This is a clue for considering
less restrictive models like heterogeneous or group homogeneous models. A
bootstrap procedure could certainly be undertaken to find the distribution. This
is also perhaps the place to choose more flexible models like the non-synchronous common cycle model (Vahid & Engle, 1997) or the weak form
common feature analysis (Hecq, Palm & Urbain, 1997).
ACKNOWLEDGMENT
Support from METEOR through the research project "Dynamic and Nonstationary Panels: Theoretical and Empirical Issues" is gratefully
acknowledged. The authors want to thank two anonymous referees and the co-editor for useful comments on a previous version of this paper. The usual
disclaimer applies. The GAUSS routines and the data that have been used in
this paper are available from http://www.employ.unimaas.nl/j.urbain
NOTES
1. Note that Vahid & Engle (1997) have extended their framework to the case where
a linear combination is a MA(q) process and not a white noise. They labelled this model
non-synchronous common cycle.
2. The first step checks for the presence of cointegrating relationships and then,
given the estimated cointegration relations, the common feature analysis is carried out
in a second step. An alternative is to use a joint estimation procedure that exploits both
the cointegration and common features restrictions using a switching algorithm (Hansen
& Johansen, 1998; Hecq, 1999).
3. See Anderson and Vahid (1996) for the connection between GMM and canonical
correlation estimators.
4. Complete results are available upon request.
5. The operation is the following. Consider an N-dimensional vector with increment
four, $g = (1, 5, 9, \dots)'$. We form an $nN \times nN$ matrix $G = gg' \otimes R$ with R an $n \times n$ matrix
with all elements equal to 1. Then the heteroskedastic disturbance covariance matrix
$\Sigma^{*}$ is given by $\Sigma^{*} = \Sigma \odot G$, with $\Sigma$ given in (10) and $\odot$ the elementwise product or
Hadamard product.
6. The data may be downloaded via different internet sites such as
http://www.nber.org/pwt56.html or http://datacentre.epas.utoronto.ca:5620/pwt.
7. For computational convenience, we have balanced the panel in this study and we
did not consider either Greece or Portugal.
8. We did not consider here a slightly different model in which real government
expenditures are subtracted from output. Indeed, as raised by Evans & Karas (1996b),
the model should be extended to take care of the potential substitutability or
complementarity between private and public goods. Without a fine distinction of the
components of government expenditures, it might be desirable to extend the model to
take into account a third variable. It is also possible to consider a simple alternative
model where all the public goods are substitutable for private ones by subtracting G from
Y.
9. Other countries joined the OECD later. This was the case of the Czech Republic in
1995, Korea in 1996, Poland in 1996 and Mexico in 1994. We drop them because the ending
year is 1992 in our data set. Also note that the OECD has its origin in the Organization for
European Economic Cooperation, which grouped European countries. This organization
was charged with administering United States aid, under the Marshall Plan, to
reconstruct Europe after World War II. Consequently, for countries that did not
participate in this project at the beginning, homogeneity of cointegration and/or
common features might be rejected for that reason.
10. As noted in Section 4, the main part of the approach presented in this paper also
applies to non-cointegrated systems.
REFERENCES
Ahn, S. K. (1997). Inference of Vector Autoregressive Models with Cointegration and Scalar
Components. Journal of the American Statistical Association, 92, 350-356.
Anderson, H., & Vahid, F. (1996). Testing Multiple Equation Systems for Common Nonlinear
Components. Working paper, Department of Economics, Texas A&M University.
Banerjee, A. (Ed.) (1999). Testing for Unit Roots and Cointegration Using Panel Data: Theory and
Applications. Oxford Bulletin of Economics and Statistics, 61, 607-629.
Baltagi, B. (1995). Econometric Analysis of Panel Data. New York: John Wiley.
Beine, M., & Hecq, A. (1997). Asymmetric Shocks Inside Future EMU. Journal of Economic
Integration, 12, 131-140.
Beine, M., & Hecq, A. (1998). Codependence and Convergence, an Application to the EC
Economies. Journal of Policy Modeling, 20, 403-426.
Beine, M., & Hecq, A. (1999). Inference in Codependence: Some Monte Carlo Results and
Applications. Annales d'Economie et de Statistique, 54, 69-90.
Campbell, J. Y., & Mankiw, N. G. (1990). Permanent Income, Current Income, and Consumption.
Journal of Business and Economic Statistics, 8, 265-279.
Campbell, J. Y., & Mankiw, N. G. (1991). The Response of Consumption to Income: A
Cross-Country Investigation. European Economic Review, 35, 723-767.
Candelon, B., & Hecq, A. (2000). Stability of the Unemployment-Activity Relationship in a
Codependent System. Applied Economics Letters, forthcoming.
Engle, R. F., & Kozicki, S. (1993). Testing for Common Features (with comments). Journal of
Engle, R. F., & Watson, M. W. (1981). A One-Factor Multivariate Time Series Model of
Metropolitan Wages. Journal of the American Statistical Association, 76, 545-565.
Evans, P., & Karras, G. (1996a). Convergence Revisited. Journal of Monetary Economics, 37,
249-265.
Evans, P., & Karras, G. (1996b). Private and Government Consumption With Liquidity
Constraints. Journal of International Money and Finance, 2, 255-266.
Geweke, J. (1977). The Dynamic Factor Analysis of Economic Time Series. In: D. J. Aigner & A.
S. Goldberger (Eds), Latent Variables in Socio-Economic Models. Amsterdam: North-Holland.
Gouriéroux, C., & Peaucelle, I. (1993). Séries codépendantes: application à l'hypothèse de parité
du pouvoir d'achat. In: Macroéconomie, Développements Récents. Economica: Paris.
Granger, C. W. J., & Lin, J. L. (1995). Causality in the Long Run. Econometric Theory, 11,
530-536.
Granger, C. W. J., & Haldrup, N. (1997). Separation in Cointegrated Systems and P-T
Decompositions. Oxford Bulletin of Economics and Statistics, 59, 449-463.
Greene, W. H. (1993). Econometric Analysis. New York: MacMillan.
Groen, J. J., & Kleibergen, F. (1999). Likelihood-Based Cointegration Analysis in Panels of Vector
Error Correction Models. Discussion Paper TI 99-055/4, Tinbergen Institute, Erasmus
University Rotterdam.
Hamilton, J. D. (1994). Time Series Analysis. Princeton: Princeton University Press.
Hansen, L. P. (1982). Large Sample Properties of Generalized Method of Moment Estimators.
Hansen, P. R., & Johansen, S. (1998). Workbook on Cointegration. Oxford: Oxford University
Press.
Hecq, A. (1999). On the Usefulness of Considering Common Serial Features and Cointegrating
Restrictions. Working paper, University of Maastricht RM/99/017.
Hecq, A., Palm, F. C., & Urbain, J. P. (1997). Testing for Common Cycles in VAR Models with
Cointegration. Working paper, University of Maastricht RM/97/031 (revised 1998).
Hecq, A., Palm, F. C., & Urbain, J. P. (1999). Separation and Weak Exogeneity in Cointegrated
VAR Models with Common Features. mimeo, University of Maastricht.
Hecq, A., Palm, F. C., & Urbain, J. P. (2000). Permanent-Transitory Decomposition in VAR
Models with Cointegration and Common Cycles. Oxford Bulletin of Economics and
Statistics, forthcoming.
Hoogstrate, A. J. (1998). Dynamic Panel Data Models: Theory and Macroeconomic Applications.
Ph.D. Thesis, University of Maastricht.
Im, K. S., Pesaran, M. H., & Shin, Y. (1997). Testing for Unit Roots in Heterogeneous Panels.
mimeo, University of Cambridge.
Issler, J. V., & Vahid, F. (1996). Common Cycles in Macroeconomic Aggregates. mimeo.
Jobert, T. (1995). Tendances et cycles communs à la consommation et au revenu: Implications pour
le modèle de revenu permanent. Economie et Prévision, 121, 19-38.
Johansen, S. (1995). Likelihood-Based Inference in Cointegrated Vector Autoregressive Models.
Oxford: Oxford University Press.
Kugler, P., & Neusser, K. (1993). International Real Interest Rate Equalization: A Multivariate
Time-Series Approach. Journal of Applied Econometrics, 8, 163-174.
Kunst, R., & Neusser, K. (1990). Cointegration in Macroeconomic System. Journal of Applied
Konishi, T., & Granger, C. W. J. (1993). Separation in Cointegrated Systems. Working paper,
Department of Economics, University of California-San Diego.
Levin, A., & Lin, C. F. (1993). Unit Root Tests in Panel Data: Asymptotic and Finite Sample
Properties. Working paper, Department of Economics, University of California-San Diego.
Larsson, R., & Lyhagen, J. (1999). Likelihood-Based Inference in Multivariate Panel
Cointegration Models. Working paper 331, Stockholm School of Economics, SSE.
Lumsdaine, R. L., & Prasad, E. (1997). Identifying the Common Components in International
Economic Fluctuations. NBER Working paper 5984.
Lütkepohl, H. (1991). Introduction to Multiple Time Series Models. Berlin: Springer Verlag.
McCoskey, S., & Kao, C. (1998a). A Residual-Based Test of the Null of Cointegration in Panel
McCoskey, S., & Kao, C. (1998b). A Monte Carlo Comparison of Tests for Cointegration in Panel
Data. mimeo.
O'Connell, P. (1998). The Overvaluation of Purchasing Power Parity. Journal of International
Economics, 44, 1-19.
Pedroni, P. (1997a). Fully Modified OLS for Heterogeneous Cointegrated Panels and the Case of
Purchasing Power Parity. Working paper, Department of Economics, Indiana University.
Pedroni, P. (1997b). Cross Sectional Dependence in Cointegration Tests of Purchasing Power
Parity. Working paper, Department of Economics, Indiana University.
Pesaran, M. H., Shin, Y., & Smith, R. P. (1997). Pooled Estimation of Long-Run Relationships in
Dynamic Heterogeneous Panels. Working paper, Department of Economics, University of
Cambridge.
Pesaran, M. H., & Smith, R. P. (1995). Estimating Long-Run Relationships From Dynamic
Heterogeneous Panels. Journal of Econometrics, 68, 79-113.
Phillips, P. C. B., & Moon, H. (1999a). Linear Regression Limit Theory for Nonstationary Panel
Phillips, P. C. B., & Moon, H. (1999b). Nonstationary Panel Data Analysis: An Overview of Some
Recent Developments. Econometric Reviews, forthcoming.
Singleton, K. (1980). A Latent Time Series Model of the Cyclical Behavior of Interest Rates.
International Economic Review, 21, 559-575.
Summers, R., & Heston, A. (1991). The Penn World Table (Mark 5): An Expanded Set of
International Comparisons, 1950-1988. Quarterly Journal of Economics, 106, 327-368.
Tiao, G. C., & Tsay, R. S. (1989). Model Specification in Multivariate Time Series. Journal of
Royal Statistical Society (series B), 51, 157-213.
Vahid, F., & Engle, R. F. (1993). Common Trends and Common Cycles. Journal of Applied
Econometrics, 8, 341-360.
Vahid, F., & Engle, R. F. (1997). Codependent Cycles. Journal of Econometrics, 80, 199-221.
Vahid, F., & Issler, J. V. (1999). The Importance of Common-Cyclical Features in VAR Analysis:
A Monte-Carlo Study. Presented at ESEM99 in Madrid, Spain.
THE LOCAL POWER OF SOME UNIT ROOT TESTS FOR PANEL DATA
Jörg Breitung
ABSTRACT
To test the hypothesis of a difference stationary time series against a trend
stationary alternative, Levin & Lin (1993) and Im, Pesaran & Shin (1997)
suggest bias adjusted t-statistics. Such corrections are necessary to
account for the nonzero mean of the t-statistic in the case of an OLS
detrending method. In this chapter the local power of panel unit root
statistics against a sequence of local alternatives is studied. It is shown
that the local power of the test statistics is affected by two different terms.
The first term represents the asymptotic effect on the bias due to the
detrending method and the second term is the usual location parameter of
the limiting distribution under the sequence of local alternatives. It is
argued that both terms can offset each other so that the test has no power
against the sequence of local alternatives. These results suggest constructing
test statistics based on alternative detrending methods. We
consider a class of t-statistics that do not require a bias correction. The
results of a Monte Carlo experiment suggest that avoiding the bias can
improve the power of the test substantially.
I. INTRODUCTION
In a panel data set, a variable yit is observed for cross section units i = 1, . . . , N
in t = 1, . . . , T time periods. A well known problem with such data is
ISBN: 0-7623-0688-2
unobserved heterogeneity (e.g. Hsiao (1986) and Baltagi (1995)). In a
univariate time series context heterogeneity may result in individual specific
mean and short run dynamics. For illustration consider an autoregressive
process of the form
$$ y_{it} = \mu_i + \alpha_i y_{i,t-1} + \varepsilon_{it} , \tag{1} $$
where the error term $\varepsilon_{it}$ is assumed to be uncorrelated across $i$ and $t$. In this
model individual heterogeneity is represented by the individual specific
parameters $\mu_i$, $\alpha_i$ and $\sigma_i^2 = E(\varepsilon_{it}^2)$. If there are no further assumptions on the
parameters, then the data for each cross section unit can be analyzed separately
by running N different regressions. In this case, we take no advantage from
pooling the data and, thus, inference may be very inefficient. The other extreme
is that we ignore a possible heterogeneity altogether and estimate a pooled
regression with $\mu_1 = \cdots = \mu_N$, $\alpha_1 = \cdots = \alpha_N$ and $\sigma_1^2 = \cdots = \sigma_N^2$. Of course,
ignoring heterogeneity in the data may result in biased estimates (e.g. Baltagi
(1995) p. 3f).
Traditional panel data analysis adopts a compromise between these two
extremes and assumes that individual heterogeneity can be represented by an
individual specific intercept $\mu_i$ alone. Furthermore, one often encounters
additional assumptions on the individual effect $\mu_i$, for example, that it is
random and uncorrelated with the regressors. The latter model is known as the
random-effects model.
It is not surprising that early work on tests for unit roots in panel data starts
from the Dickey-Fuller type regression with individual specific intercept (e.g.
Breitung (1992)). Levin & Lin (henceforth: LL) (1993) and Im, Pesaran & Shin
(henceforth: IPS) (1997) consider more general models by allowing for
individual specific short run dynamics and time trends.
It is well known that the usual dummy variable estimator (or within-group
estimator) of dynamic models suffers from the so-called Nickell bias (Nickell
1981). The same is true if individual specific time trends are estimated by using
the dummy-variable approach. LL (1993) construct a bias adjusted t-statistic to
test the null hypothesis of a unit root process. Unfortunately, bias adjusted test
statistics for the model with a constant or a time trend suffer from a severe loss
of power. For example, the power of the LL (1993) test without an intercept
(and thus without the need to correct for the Nickell bias) against a stationary
alternative with an autoregressive coefficient of 0.9 is virtually unity for N = 25
and T = 25. For the bias adjusted test statistic in the model with individual
specific intercept (trend), the power against the same alternative drops to 0.45
(0.25). Furthermore IPS (1997) observe a serious size bias if the bias adjusted
LL statistic is augmented with lagged differences.
If there is only a constant in the model, the problem is easily resolved by

subtracting the first observation instead of the mean. As argued in Schmidt &
Phillips (1992), the first observation is the best estimator of the constant under
the hypothesis of a random walk. Furthermore, subtracting the first observation
instead of the mean avoids the Nickell bias and, therefore, the test does not
require a bias correction (cf. Breitung & Meyer (1994)). To study the
asymptotic properties we compare the local power of the bias adjusted test
statistics. Our analysis demonstrates that the local power of the test depends on
two different terms. The first term represents the asymptotic effect on the bias
due to the detrending method and the second term is the usual location
parameter of the limiting distribution under the sequence of local alternatives.
It is shown that if the long-run variances are estimated consistently, both terms
cancel out each other so that the test statistic is centered around zero under the
local alternative. Levin & Lin (1993) suggest estimating the long-run
variances by using a non-parametric estimator computed from the first
differences of the series. An attractive property of this approach is that under
the alternative the non-parametric estimator tends to zero so that the resulting
test statistic has power against the sequence of local alternatives. A class of
t-statistics is suggested that does not require a bias correction. These tests are based
on the t-statistic from a simple least-squares regression of transformed
variables and it is shown that the limiting distribution of these tests is standard
normal. The results of our Monte Carlo experiments suggest that avoiding the
detrending bias may improve the power of the test substantially.
The rest of this chapter is organized as follows. In Section II the details of
the test statistics are given. The local power of the tests is analyzed in Section
III. In Section IV a class of t-statistics is suggested in order to avoid the
detrending bias. Since the tests are based on asymptotic properties, it is
interesting to consider the relative performance of the tests in small samples.
This problem is studied in Section V by using Monte Carlo simulations.
Furthermore, the actual power against a sequence of local alternatives is
investigated by means of Monte Carlo simulations. Section VI offers some
conclusions and makes suggestions for further research.
Finally, a word on the notational conventions applied in this chapter. A
standard Brownian motion is written as $W_i(r)$. Although there are different
Brownian motions for different cross section units $i$, we sometimes drop the
index $i$ for convenience. This has no consequences for the final results since
they depend on the expectation of the stochastic functionals. Furthermore, if
there is no risk of misunderstanding, we drop the limits and the argument $r$ (or
$dr$). For example, the term $\int_0^1 r W_i(r)\,dr$ will be economically written as $\int rW$. A
detrended Brownian motion is represented as
$V(r) \equiv V = W - \int W - 12(r - 1/2)\int (r - 1/2)W$.
As usual in this kind of literature we use $[a]$ to indicate the integer part of $a$.
The proofs of the lemmas and theorems can be found in the working paper version
(Breitung 1999).
II. THE TEST STATISTICS

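Assume that the variable $y_{it}$ can be represented as
$$ y_{it} = \mu_i + \beta_i t + x_{it} , \qquad t = 1, 2, \ldots, T , \tag{2} $$
where $x_{it}$ is generated by the autoregressive process
$$ x_{it} = \sum_{k=1}^{p+1} \alpha_{ik} x_{i,t-k} + \varepsilon_{it} \tag{3} $$
and $x_{is} = 0$ for $s \le 0$. It is assumed that $\varepsilon_{it}$ is white noise with $E(\varepsilon_{it}^2) = \sigma_i^2$ and
$E|\varepsilon_{it}|^{2+\delta} < \infty$ for all $i$, $t$ and some $\delta > 0$. Furthermore $\varepsilon_{it}$ is assumed to be
independent of $\varepsilon_{js}$ for $i \ne j$ and all $t$ and $s$.
The null hypothesis is that the process is difference stationary, i.e.
$$ H_0: \rho_i \equiv \sum_{k=1}^{p+1} \alpha_{ik} - 1 = 0 \quad \text{for all } i = 1, \ldots, N . \tag{4} $$
Under the alternative we assume that $y_{it}$ is (trend) stationary, that is, $\rho_i < 0$ for
all $i$.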
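The assumptions concerning $\varepsilon_{it}$ ensure that there exists a functional central
limit theorem such that
$$ T^{-1/2} \sum_{t=1}^{[rT]} \Delta x_{it} \Rightarrow \bar\sigma_i W_i(r) , $$
where $W_i(r)$ is a Brownian motion, $\bar\sigma_i^2 = \lim_{T\to\infty} E(T\,\overline{\Delta x}_i^{\,2})$ and
$\overline{\Delta x}_i = T^{-1}\sum_{t=1}^{T} \Delta x_{it}$ (e.g. Phillips & Solo (1992)). The parameter $\bar\sigma_i^2$ is
sometimes called the long-run variance, since it is computed as $2\pi$ times the
spectral density at frequency zero.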
LL (1993) suggest a test procedure against the alternative
$\rho_1 = \cdots = \rho_N < 0$.
Let $e_{it}$ ($v_{i,t-1}$) denote the residuals from a regression of $\Delta y_{it}$ ($y_{i,t-1}$) on
$1, t, \Delta y_{i,t-1}, \ldots, \Delta y_{i,t-p}$. Furthermore, let $\tilde e_{it} = e_{it}/\sigma_i$ and $\tilde v_{it} = v_{it}/\sigma_i$, where in
practice $\sigma_i^2$ is estimated using the residuals $e_{it}$. The LL test is based on the bias
adjusted t-statistic for $\rho = 0$ in the regression:
$$ \tilde e_{it} = \rho\,\tilde v_{i,t-1} + \nu_{it} . $$
LL (1993) show that under the null hypothesis, the ordinary t-statistic tends to
minus infinity if a constant or a time trend is included in the model. Therefore,
they suggest a bias adjusted test statistic given by
$$ LL = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T} \left[\tilde e_{it}\tilde v_{i,t-1} - (\bar\sigma_i/\sigma_i)\,a_T\right]}{b_T \sqrt{\sum_{i=1}^{N}\sum_{t=1}^{T} \tilde v_{i,t-1}^2}} \tag{5} $$
where $a_T$ and $b_T$ are the small sample analogs of
$$ a = E\left[\int V\,dV\right] \tag{6} $$
$$ b^2 = \frac{\mathrm{var}\left[\int V\,dV\right]}{E\left[\int V^2\right]} \tag{7} $$
and $V \equiv V(r)$ is a detrended Brownian motion. LL (1993) suggest using a
non-parametric estimator for $\bar\sigma_i^2$ based on the first differences of the data.1
IPS (1997) relax the assumption of a common parameter $\rho$ under the
alternative. Accordingly, model (2) is estimated for each cross section unit
separately, yielding an individual specific Dickey-Fuller t-statistic $\tau_i$. The IPS
statistic is given by:
$$ IPS = N^{-1/2} \sum_{i=1}^{N} [\tau_i - m_T]/\varsigma_T , $$
where $\tau_i$ is the usual augmented Dickey-Fuller t-statistic for cross section unit
$i$, and $m_T$, $\varsigma_T^2$ are small sample analogs of
$$ m = E\left[\frac{\int V\,dV}{\sqrt{\int V^2}}\right] \tag{8} $$
$$ \varsigma^2 = \mathrm{var}\left[\frac{\int V\,dV}{\sqrt{\int V^2}}\right] \tag{9} $$
IPS (1997) provide tables for various values of $T$ and the lag order $p$. As for the
LL test, these tables assume that the panels are balanced, that is, all cross
section units have the same number of time periods $T$.
III. LOCAL POWER

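In this section we study the local power of alternative test procedures. The
sequence of local alternatives is given by
$$ y_{it} = \mu_i + \beta_i t + x_{it} , \tag{10} $$
where
$$ x_{it} = \left(1 - \frac{c}{T\sqrt{N}}\right) x_{i,t-1} + \varepsilon_{it} , \qquad c > 0 . \tag{11} $$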
To analyze the asymptotic behavior of the tests, it is important to specify the
relationship between $N$ and $T$ (see Phillips & Moon (1999)). For our analysis
it is convenient to apply sequential limits denoted by $(T, N \to \infty)_{seq}$, wherein
$T \to \infty$ is followed by $N \to \infty$. Although such an asymptotic framework is more
restrictive than using a joint limit and requires moment conditions that are
difficult to verify (see IPS (1997)), we follow Kao (1999), Moon & Phillips
(1999) and others and apply a sequential limit. Whether our results continue to
hold for a joint limit theory is an interesting problem for future research.
We will further assume that the initial value yi0 is fixed or stochastic with a
finite variance. When the initial conditions are allowed to go into the remote
past, the initial condition plays a role in the limiting distribution of the process
(e.g. Phillips & Lee (1996)). In what follows, however, we will neglect such
complications in order to keep the analysis reasonably simple.
In the following Lemma we state the important fact that under the local
alternative the limiting process of $x_{it}$ is the same as under the null hypothesis.
Lemma 1: Under the local alternative given in (10)-(11) and a sequential limit
$(T, N \to \infty)_{seq}$ we have
$$ T^{-1/2} x_{i,[rT]} \Rightarrow \bar\sigma_i W_i(r) , \qquad 0 \le c < \infty . $$
This is an important difference to the asymptotic theory in the usual time series
context, where under the local alternative the limiting process is an
Ornstein-Uhlenbeck process (cf. Phillips (1987)).
The probability limits of the tests depend on the parameters $\sigma_i$ and $\bar\sigma_i$. First,
we consider the theoretical value of $\bar\sigma_i^2$ under the local alternative.
Lemma 2: Under the local alternative (10)-(11) we have
$$ \bar\sigma_i^2 = \lim_{T\to\infty} E(T^{-1} x_{iT}^2) = \sigma_i^2 . $$
In what follows we derive the main result by assuming that $\bar\sigma_i^2$ is estimated
consistently for all values of $c \ge 0$.
First, we present the local power in a model without any deterministics. In
this case no bias adjustment is required and the test can be based on the usual
t-statistic of the pooled sample (Quah 1994).
Theorem 1: Under the sequence of local alternatives given in (10)-(11) with
$\mu_i = 0$ and $\beta_i = 0$, the t-statistic for $\rho = 0$ in the pooled regression
$\Delta y_{it} = \rho\,y_{i,t-1} + \varepsilon_{it}$ is asymptotically distributed as $N(-c/\sqrt{2}, 1)$.
In Breitung (1999) it is shown that the same local power is obtained if the
individual mean $\mu_i$ is removed by subtracting the first observation or if in
addition a common time trend $\beta_1 = \beta_2 = \ldots = \beta_N$ is assumed.
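Theorem 1 can be checked informally by simulation. The following Python sketch is not part of the original study: the function names, the seed, and the choices N = 100, T = 50, c = 1 and 400 replications are illustrative assumptions. It draws panels from (10)-(11) with zero intercepts and trends, unit error variance and zero initial values, computes the pooled t-statistic, and compares its Monte Carlo mean with the asymptotic location parameter. Only rough agreement should be expected in finite samples.

```python
import numpy as np


def simulate_panel(n_units, n_periods, c, rng):
    """Draw y_it = x_it from (11) with mu_i = beta_i = 0, x_i0 = 0, N(0,1) errors."""
    alpha = 1.0 - c / (n_periods * np.sqrt(n_units))  # local-to-unity coefficient
    eps = rng.standard_normal((n_units, n_periods))
    y = np.zeros((n_units, n_periods + 1))            # column 0 holds y_i0 = 0
    for t in range(1, n_periods + 1):
        y[:, t] = alpha * y[:, t - 1] + eps[:, t - 1]
    return y


def pooled_t_statistic(y):
    """t-statistic for rho = 0 in the pooled regression dy_it = rho * y_{i,t-1} + e_it."""
    dy = np.diff(y, axis=1).ravel()
    y_lag = y[:, :-1].ravel()
    rho_hat = (y_lag @ dy) / (y_lag @ y_lag)
    resid = dy - rho_hat * y_lag
    s2 = (resid @ resid) / (resid.size - 1)           # residual variance, one regressor
    return rho_hat / np.sqrt(s2 / (y_lag @ y_lag))


def mean_t_statistic(c, n_units=100, n_periods=50, n_rep=400, seed=0):
    """Monte Carlo mean of the pooled t-statistic under the local alternative."""
    rng = np.random.default_rng(seed)
    stats = [pooled_t_statistic(simulate_panel(n_units, n_periods, c, rng))
             for _ in range(n_rep)]
    return float(np.mean(stats))


if __name__ == "__main__":
    c = 1.0
    print("simulated mean:", mean_t_statistic(c))
    print("asymptotic mean -c/sqrt(2):", -c / np.sqrt(2.0))
```

Because the autoregressive coefficient depends on both N and T, the loop over time periods is kept explicit so that the data generating process can be read directly against (11).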
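Next we consider the bias corrected test statistics. Under the local alternative
the bias adjusted (BA) statistic due to LL (1993) converges to the limit
$$ \mu^*_{BA}(c) = \lim_{N,T\to\infty} \frac{E\left(N^{-1/2} T^{-1} \sum_{i=1}^{N}\sum_{t=1}^{T} \tilde e_{it}\tilde v_{i,t-1}\right) - N^{-1/2}\sum_{i=1}^{N} (\bar\sigma_i/\sigma_i)\,a}{b\,\sqrt{E\left(N^{-1} T^{-2} \sum_{i=1}^{N}\sum_{t=1}^{T} \tilde v_{i,t-1}^2\right)}} $$
Note that numerator and denominator are normalized so that both converge to
a fixed limit.
Since
$$ \tilde e_{it}\tilde v_{i,t-1} = \left[\sigma_i^{-1}\varepsilon_{it} - c/(T\sqrt{N})\,\tilde v_{i,t-1}\right]\tilde v_{i,t-1} $$
the limit can be written as
$$ \mu^*_{BA}(c) = \lim_{N,T\to\infty} \frac{\sqrt{N}\left[N^{-1}\sum_{i=1}^{N} E(\xi_{Ti}) - a\right]}{b\,\sqrt{E\int V^2}} - \frac{c\,\sqrt{E\int V^2}}{b} , \tag{12} $$
where we use $\bar\sigma_i/\sigma_i = 1$ under the local alternative and
$$ \xi_{Ti} = T^{-1} \sum_{t=1}^{T} \sigma_i^{-1}\varepsilon_{it}\tilde v_{i,t-1} . $$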
It turns out that the limit of the bias adjusted statistic depends on two different
terms on the right hand side of (12). The first term is due to the detrending
method represented by the statistic $\xi_{Ti}$. The second term is proportional to
$\sqrt{E\int V^2}$ and is similar to the usual location parameter in the asymptotic
distribution under the null hypothesis. For example, in the simple regression
model $y_t = \beta x_t + u_t$ with stationary variables, the location parameter is
proportional to $\sqrt{E(x_t^2)}$.
It is important to notice that the expectation of $\xi_{Ti}$ enters the test statistic with
the factor $\sqrt{N}$ and, therefore, for the asymptotic analysis the expectation must
be determined with an accuracy up to $O(N^{-1/2})$. The following Lemma provides
an approximation of this expectation that is sufficient for our purpose.
Lemma 3: Under the local alternative given in (10)-(11) the asymptotic
expectation of $\xi_{Ti}$ is given by
$$ \lim_{T\to\infty} E(\xi_{Ti}) = -0.5 + (1/15)\,c/\sqrt{N} + o(N^{-1/2}) . $$
Since the result of Lemma 3 is crucial for the local power of the bias adjusted
test, the accuracy of the approximation is investigated in a Monte Carlo
experiment. First, we generate 10,000 realizations of $\xi_{Ti}$ by letting $T = 200$,
$c = 5$ and repeat the experiment with various values for $N$.2 If Lemma 3 holds,
a regression of the sample means of $\xi_{Ti}$ on $c/\sqrt{N}$ and a constant should yield
an estimate for the intercept close to $-0.5$ and a slope of roughly $1/15 = 0.067$.
Using $N \in \{30, 35, 40, \ldots, 500\}$ the following regression function was
obtained for the 71 realizations:
$$ E(\xi_{Ti}) \approx \underset{(0.00060)}{-0.495} + \underset{(0.0016)}{0.0629}\,c/\sqrt{N} , $$
where standard errors are given in parentheses. The estimated slope coefficient
is only slightly smaller than 0.067 and, therefore, the approximation in Lemma
3 seems to perform fairly well in finite samples.
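As a complement to the Monte Carlo check, the finite-sample expectation of $\xi_{Ti}$ can also be evaluated without simulation. The sketch below is not taken from the chapter: it assumes $p = 0$, $\sigma_i = 1$ and $x_{i0} = 0$, writes the detrended lagged level as $Mx_{-1}$ with $M = I - P$ and $P$ the projection on a constant and a trend, and uses $E(\varepsilon' M x_{-1}) = -\mathrm{tr}(PC)$, where $C$ collects the covariances between lagged levels and errors. The function name and the argument `kappa`, which stands in for $c/\sqrt{N}$, are hypothetical.

```python
import numpy as np


def expected_xi(n_periods, kappa):
    """Exact E(xi_Ti) for sigma_i = 1 and x_i0 = 0, with alpha = 1 - kappa / T.

    kappa plays the role of c / sqrt(N) in (11).
    """
    T = n_periods
    alpha = 1.0 - kappa / T
    t = np.arange(1, T + 1)
    Z = np.column_stack([np.ones(T), t])               # regressors 1 and t
    P = Z @ np.linalg.solve(Z.T @ Z, Z.T)              # projection onto (1, t)
    lag = t[:, None] - 1 - t[None, :]                  # (t - 1) - s
    # C[t, s] = E(x_{t-1} * eps_s) = alpha**(t-1-s) if s <= t-1, else 0
    C = np.where(lag >= 0, alpha ** np.clip(lag, 0, None), 0.0)
    # E(eps' M x_lag) = tr(M C) = -tr(P C), because C has a zero diagonal
    return -np.trace(P @ C) / T


if __name__ == "__main__":
    T, kappa = 200, 0.5                                # e.g. c = 5 and N = 100
    intercept = expected_xi(T, 0.0)
    slope = (expected_xi(T, kappa) - intercept) / kappa
    print("E(xi) at c = 0:", intercept)                # Lemma 3 suggests about -0.5
    print("slope in c/sqrt(N):", slope)                # Lemma 3 suggests about 1/15
```

For $c = 0$ the trace reduces to $(T - 2)/2$, so the exact expectation is $-(T-2)/(2T)$, which equals $-0.495$ for $T = 200$ and agrees with the estimated intercept reported above.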
Now we present the limiting distribution of the bias adjusted test statistic.
Theorem 2: Consider a sequence of local alternatives given in (10)-(11). If the
estimator for $\bar\sigma_i$ converges weakly to $\sigma_i$, the bias adjusted test statistic is
asymptotically distributed as $N(0, 1)$.
It turns out that the bias adjusted test can fail to have power against the
sequence of local alternatives. This finding suggests that the power may be
improved by a modification that avoids the bias correction altogether. Such a
modified test procedure is suggested in Section IV.
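The cancellation behind Theorem 2 can be traced through (12). The values $a = -1/2$ and $E\int V^2 = 1/15$ used below are not stated explicitly in the text; they are assumed here as the values for a detrended Brownian motion that are consistent with the intercept and slope in Lemma 3. With $\bar\sigma_i/\sigma_i = 1$, Lemma 3 gives

$$ \sqrt{N}\left[N^{-1}\sum_{i=1}^{N} E(\xi_{Ti}) - a\right] = \sqrt{N}\left[-0.5 + \frac{c}{15\sqrt{N}} + o(N^{-1/2}) + 0.5\right] \to \frac{c}{15} , $$

so that

$$ \mu^*_{BA}(c) = \frac{c/15}{b\sqrt{1/15}} - \frac{c\sqrt{1/15}}{b} = \frac{c}{b\sqrt{15}} - \frac{c}{b\sqrt{15}} = 0 . $$

Under these assumptions the detrending term and the usual location term have the same magnitude and opposite signs for every $c$.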
It is important to notice that the test suggested by LL (1993) employs a
non-parametric estimator that converges to zero for a stationary alternative. In the
univariate time series context the unit root tests are inconsistent if the long-run
variance is estimated by using the differences of the time series (cf. Phillips &
Ouliaris (1990), Theorem 5.3). Therefore, Phillips & Perron (1988) estimate $\bar\sigma_i^2$
by using the residuals of the autoregression. In a panel data framework,
however, this approach yields a test that has no power against the sequence of
local alternatives.
Finally the local power of the IPS test is investigated. As in the case of the
bias adjusted statistic considered above, the probability limit of the test statistic
depends on two terms. The first term is due to the detrending method and
depends on
$$ \xi^*_{Ti} = \frac{\sum_{t=1}^{T} \sigma_i^{-1}\varepsilon_{it}\tilde v_{i,t-1}}{\sqrt{\sum_{t=1}^{T} \tilde v_{i,t-1}^2}} $$
Since this statistic is a ratio of correlated random variables, the analytic
evaluation of this bias is very complicated. To obtain a suitable approximation
we apply a simulation technique similar to the one used to check the
reliability of Lemma 3. Using the same setup as before the following
approximation is found for the expectation of $\xi^*_{Ti}$:
$$ E(\xi^*_{Ti}) \approx \underset{(0.0030)}{-2.151} + \underset{(0.0077)}{0.212}\,c/\sqrt{N} \tag{14} $$
This approximation can be used to compute the limiting distribution of the IPS
test given in
Theorem 3: For a sequence of local alternatives given in (10)-(11) the IPS test
is asymptotically distributed as $N(\mu_c^{IPS}, 1)$, where
$$ \mu_c^{IPS} = \frac{c}{\varsigma}\left[\lim_{T\to\infty} \left.\frac{\partial E(\xi^*_{Ti})}{\partial (c/\sqrt{N})}\right|_{c=0} - E\sqrt{\int V^2}\,\right] $$
Again we find that the local power depends on two terms. Our Monte Carlo
experiment suggests that the derivative of $E(\xi^*_{Ti})$ is positive so that the
detrending bias implies a substantial loss of power.
Using 10,000 Monte Carlo replications, the expression $E\sqrt{\int V^2}$ is
estimated as 0.243. Using the value $\varsigma_{100} = 0.597$, which is taken from the
values reported in IPS (1997), we obtain:
$$ \mu_c^{IPS} = c\,(0.212 - 0.243)/0.597 = -0.0401c . $$
It turns out that the asymptotic mean function has a relatively small slope of
roughly $-0.04$ compared to the slope of $-1/\sqrt{2} = -0.707$ for the case
without deterministic trend (see Theorem 1).
IV. TEST STATISTICS WITHOUT BIAS ADJUSTMENT

From the local power analysis we found that bias corrections used for the LL
and IPS tests may imply a severe loss of power. It is therefore desirable to avoid
the bias term when constructing the t-statistics. For the case that the model
includes only a constant, such an unbiased statistic is easily obtained by
subtracting the first observation instead of the individual mean. This is the
approach used in Breitung & Meyer (1994). In this section we consider a class
of test statistics that do not involve a bias term.3
To facilitate the exposition we will assume that the data are generated by an
AR(1) process and, thus, no augmentation with lagged differences is needed.
For higher order processes, $\Delta y_{it}$ and $y_{i,t-1}$ are replaced by the residuals from the
regressions of $\Delta y_{it}$ and $y_{i,t-1}$ on $\Delta y_{i,t-1}, \ldots, \Delta y_{i,t-p}$. Furthermore, to correct for
individual specific variances, the series are adjusted as in the case of the LL
statistic.
The idea is to transform the variables $\Delta y_{it}$ and $y_{i,t-1}$ such that the usual
regression t-statistic can be used to test the unit root hypothesis. For this
purpose we define the $T \times 1$ vectors $y_i = [\Delta y_{i1}, \ldots, \Delta y_{iT}]'$ and $x_i =
[y_{i0}, \ldots, y_{i,T-1}]'$. In order to construct the test statistic we use the transformed
vectors $y^*_i = Ay_i = [y^*_{i1}, \ldots, y^*_{iT}]'$ and $x^*_i = Bx_i = [x^*_{i1}, \ldots, x^*_{iT}]'$ such that
$$ E(y^*_{it} x^*_{it}) = 0 \tag{15} $$
for all $i$ and $t$. Imposing further assumptions to rule out degenerate cases it is
possible to show that a t-statistic based on the transformed variables has a
standard normal limiting distribution.
Theorem 4: Let $\Delta y_{it}$ be white noise with $E(\Delta y_{it}) = \beta_i$, $E(\Delta y_{it} - \beta_i)^2 = \sigma_i^2 > 0$ and
$E(\Delta y_{it} - \beta_i)^4 < \infty$. Under the assumption (15) and
$$ \lim_{T\to\infty} E(T^{-1} y^{*\prime}_i y^*_i) > 0 $$
$$ \lim_{T\to\infty} E(T^{-1} x^{*\prime}_i AA' x^*_i) > 0 $$
the statistic
$$ UB = \frac{\sum_{i=1}^{N} \sigma_i^{-2}\, y^{*\prime}_i x^*_i}{\sqrt{\sum_{i=1}^{N} \sigma_i^{-2}\, x^{*\prime}_i AA' x^*_i}} $$
has a standard normal limiting distribution as $(T, N \to \infty)_{seq}$.
A simple way to satisfy assumption (15) is to use an upper triangular matrix $A$,
where the elements of each row sum to zero. In other words, only the present
and future observations are used to transform the differences $\Delta y_{it}$. A well
known example for such a transformation is the Helmert transformation given
by
$$ y^*_{it} = s_t\left[\Delta y_{it} - \frac{1}{T-t}\,(\Delta y_{i,t+1} + \cdots + \Delta y_{iT})\right] , \qquad t = 1, 2, \ldots, T-1 , \tag{16} $$
where $s_t^2 = (T-t)/(T-t+1)$. This transformation is also used in Arellano &
Bover (1995), for example. An important property of this transformation is that
whenever $\Delta y_{it}$ is a white noise process with constant variance, then the same is
true for $y^*_{it}$. Obviously, if $y_{it}$ is a random walk with (individual specific) time
trend, then $y^*_{it}$ has a zero mean and is uncorrelated with $y_{i,t-1}$.
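The properties claimed for the Helmert transformation can be verified numerically. The following Python sketch builds the matrix that applies (16) to a vector of first differences; the function name is hypothetical, and treating the matrix as having $T-1$ rows (one for each $t = 1, \ldots, T-1$) is an assumption made here for illustration.

```python
import numpy as np


def helmert_matrix(n_periods):
    """(T-1) x T matrix A whose row t applies transformation (16) to [dy_1, ..., dy_T]."""
    T = n_periods
    A = np.zeros((T - 1, T))
    for t in range(1, T):                       # t = 1, ..., T-1 as in (16)
        s_t = np.sqrt((T - t) / (T - t + 1))
        A[t - 1, t - 1] = s_t                   # weight on the current observation
        A[t - 1, t:] = -s_t / (T - t)           # equal weights on the T - t future ones
    return A


if __name__ == "__main__":
    A = helmert_matrix(6)
    print(np.allclose(A.sum(axis=1), 0.0))      # rows sum to zero
    print(np.allclose(A, np.triu(A)))           # no weight on past observations
    print(np.allclose(A @ A.T, np.eye(5)))      # orthonormal rows
```

Rows that sum to zero remove an individual specific drift from the differences, and orthonormal rows are what keeps a white noise input white noise after the transformation.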
The matrix $B$ is chosen such that $E(x^*_{it}) = 0$ and $E(y^*_{it} x^*_{it}) = 0$. A possible
transformation with the desired properties is:
$$ x^*_{it} = y_{i,t-1} - y_{i1} - \frac{t-1}{T}\, y_{iT} . \tag{17} $$
Note that $T^{-1} y_{iT} = T^{-1} \sum_{t=1}^{T} \Delta y_{it}$ is an estimator of $\beta_i$ and, thus, the transformed
variable $x^*_{it}$ is adjusted for a time trend. It is easy to verify that in this case $y^*_{it}$
and $x^*_{it}$ are uncorrelated. Furthermore, since the transformation matrix $A$
corresponding to the Helmert transformation (16) satisfies $AA' = I$ we conclude
from Theorem 4 that the t-statistic for $H_0: \rho^* = 0$ in the pooled regression
$$ y^*_{it} = \rho^* x^*_{it} + e^*_{it} , \qquad t = 2, 3, \ldots, T-1 \tag{18} $$
has a standard normal limiting distribution.
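A minimal Python sketch of the statistic based on (16)-(18) is given below. It is an illustration under stated assumptions rather than the routine used in the study: the data follow an AR(1) process so that no lag augmentation is needed, $\sigma_i = 1$ is treated as known, the simulated random walks have no drift and start at zero, and all function names, the seed and the sample sizes are hypothetical.

```python
import numpy as np


def forward_deviations(dy):
    """Apply (16) to each row of an (N, T) array of differences; returns (N, T-1)."""
    T = dy.shape[1]
    t = np.arange(1, T)                                        # t = 1, ..., T-1
    future_sum = dy[:, ::-1].cumsum(axis=1)[:, ::-1][:, 1:]    # dy_{t+1} + ... + dy_T
    s_t = np.sqrt((T - t) / (T - t + 1))
    return s_t * (dy[:, :-1] - future_sum / (T - t))


def ub_statistic(y):
    """Pooled t-type statistic from (18) for an (N, T+1) array y_i0, ..., y_iT.

    Assumes AR(1) data (no lag augmentation) and known sigma_i = 1.
    """
    T = y.shape[1] - 1
    t = np.arange(1, T)                                        # t = 1, ..., T-1
    y_star = forward_deviations(np.diff(y, axis=1))            # y*_it from (16)
    x_star = y[:, :T - 1] - y[:, [1]] - (t - 1) / T * y[:, [T]]  # x*_it from (17)
    y_star, x_star = y_star[:, 1:], x_star[:, 1:]              # keep t = 2, ..., T-1
    return np.sum(y_star * x_star) / np.sqrt(np.sum(x_star ** 2))


def simulate_null(n_units, n_periods, n_rep, seed=0):
    """UB statistics for driftless Gaussian random walks with y_i0 = 0."""
    rng = np.random.default_rng(seed)
    out = np.empty(n_rep)
    for r in range(n_rep):
        eps = rng.standard_normal((n_units, n_periods))
        y = np.concatenate([np.zeros((n_units, 1)), eps.cumsum(axis=1)], axis=1)
        out[r] = ub_statistic(y)
    return out


if __name__ == "__main__":
    stats = simulate_null(n_units=100, n_periods=30, n_rep=1000)
    print("mean:", stats.mean(), "variance:", stats.var())
```

Under the null hypothesis the simulated statistics should have a mean near zero and a variance near one, without any tabulated mean or variance adjustment.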
To compute the local power function of this test statistic we need an
approximation for
$$ E(\zeta^*_{Ti}) = E\left(T^{-1} \sum_{t=1}^{T} y^*_{it} x^*_{it}\right) $$
that is accurate up to $O(N^{-1/2})$. As for the LL and IPS statistic, such an
approximation is obtained by fitting a regression function to the simulated
values of $\zeta^*_{Ti}$:
$$ E(\zeta^*_{Ti}) \approx \underset{(0.0021)}{0.0104} - \underset{(0.0104)}{0.0407}\,c/\sqrt{N} . \tag{19} $$
Since the test statistic is constructed to have an expectation of zero under the
null hypothesis, we expect to find a constant close to zero. The estimated
constant is indeed quite small but nevertheless significant. The slope coefficient
is significantly negative so that the test seems to have a local power larger than
the size. The following theorem presents further details on the local power of
the UB statistic.
Theorem 5: For a sequence of local alternatives given in (10)-(11) the UB test
is asymptotically distributed as $N(\mu_c^{UB}, 1)$, where
$$ \mu_c^{UB} = c\sqrt{6}\, \lim_{T\to\infty} \left.\frac{\partial E(\zeta^*_{Ti})}{\partial (c/\sqrt{N})}\right|_{c=0} $$
It is interesting to compare the local power of the IPS and the UB test. Since
$\sqrt{6} \times 0.0407 > 0.0401$, the UB statistic has a location parameter which is more
than twice as large in absolute value compared to the IPS statistic. Again,
however, we emphasize that this comparison is inappropriate, because the IPS
test is more general than the UB test as it allows for a heterogeneous
autoregressive parameter under the alternative.
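The comparison can be made explicit with the reported estimates. Taking the fitted slope in (19) as the derivative in Theorem 5, which is an approximation since (19) is a regression estimate,

$$ \mu_c^{UB} \approx \sqrt{6} \times (-0.0407)\,c \approx -0.0997\,c , \qquad \mu_c^{IPS} \approx -0.0401\,c , $$

so that $|\mu_c^{UB}/\mu_c^{IPS}| \approx 0.0997/0.0401 \approx 2.49$.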
V. SMALL SAMPLE PROPERTIES

The asymptotic properties of the tests do not depend on the number of lagged
differences that are used to account for higher order autoregressive models.
However, as noted by IPS (1997) for a small number of time periods T, the null
distribution may be substantially affected by the augmentation lag. They
therefore present tables for the mean and the variance of $\tau_i$ that depend on the
type of deterministics (constant/trend), the number of time periods T and the
augmentation lag p.
From the usual Dickey-Fuller test for univariate time series it is known that
the power of the test deteriorates substantially with an increasing augmentation
lag. It is therefore expected that the power of panel unit root tests is also
affected by the choice of the augmentation lag.
To study the robustness of the size and power of the tests considered in the
previous sections we generate time series according to the process
173
xit = xi, t 1 + it
(20)
and yit = i + it + xit. The initial values of the process are set equal to zero. The
errors are i.i.d. with it ~ N(0, 1). Since all tests are invariant to the parameters
i and i, these parameters are set equal to zero. For the bias and variance
corrections of the LL and IPS tests the tabulated values in LL (1993) and IPS
(1997) are used. To represent a typical regional panel data set, we let T = 30
(years) and N = 20 (countries). All rejection frequencies are computed from
1000 realizations with a nominal significance level of 0.05.
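The data generating process described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `simulate_panel`, the symbol `rho` for the autoregressive parameter, and the seed handling are hypothetical and not taken from the original study. The intercept and trend parameters are set to zero, as in the text, so the observed series coincides with the autoregressive component.

```python
import random


def simulate_panel(n_units, n_periods, rho, seed=0):
    """Simulate y[i][t], t = 0..n_periods, from an AR(1) with N(0, 1) errors.

    The deterministic terms are zero, so y_it equals x_it, and every
    series starts at zero.
    """
    rng = random.Random(seed)
    panel = []
    for _ in range(n_units):
        x = [0.0]  # initial value set equal to zero
        for _ in range(n_periods):
            x.append(rho * x[-1] + rng.gauss(0.0, 1.0))
        panel.append(x)
    return panel


# One replication with T = 30 and N = 20, as in the design above.
panel = simulate_panel(n_units=20, n_periods=30, rho=0.95)
```

A full size and power experiment would repeat this draw 1000 times and apply each test statistic to every replication.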
Table 1 presents the rejection frequencies for the different tests. For p > 0 the
LL test turns out to be quite conservative. This was also observed by IPS (1997)
and, therefore, the values for the mean and variance of this test should also be
tabulated for different augmentation lags. With respect to the power of the test
it turns out that for p = 0 the power of the LL and IPS tests are roughly similar.
For p > 0 the IPS test is more powerful than the LL test, at least if the critical
values of the LL test are not adjusted for different augmentation lags.
The UB statistic suggested in Section IV appears to be substantially more
powerful than the LL and IPS tests. Furthermore the size of the UB test is fairly
robust with respect to the augmentation lag. Notice that for the UB test no
tables are required for different values of p and T.
In the next Monte Carlo experiment we consider the validity of the
theoretical results for the actual power of the test. For this purpose we set
Table 1.
LL

1.00
0.95
0.90
0.80
IPS
UB
LL
p=0
0.025
0.048
0.189
0.801

1.00
0.95
0.90
0.80
Empirical size and power for T = 30 and N = 20
0.046
0.076
0.198
0.723
0.045
0.072
0.118
0.365
UB
p=1
0.073
0.127
0.396
0.897
0.005
0.009
0.041
0.277
p=2
0.001
0.001
0.001
0.002
IPS
0.053
0.077
0.152
0.544
0.069
0.213
0.417
0.807
p=3
0.038
0.147
0.260
0.508
0.000
0.000
0.000
0.000
0.040
0.056
0.107
0.257
Note: Empirical sizes computed from 1000 Monte Carlo replications of model (20).
p denotes the number of lagged differences. The nominal size is 0.05.
0.053
0.195
0.266
0.418
$\rho = 1 - 20/(T\sqrt{N})$, where $\rho$ denotes the autoregressive parameter in (20). If the test does not have power against such alternatives, we
expect that the power of the test tends to the size as $N \to \infty$ and $T \to \infty$. In our
Monte Carlo comparison we also include a variant of the LL test that estimates
the long-run variances by using the regression residuals instead of the first
difference of the process. As shown in Section III such a test has a local power
equal to the size. The critical values for this test are computed by Monte Carlo
simulations. The respective test is denoted as LL*.
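To see how close to unity the local alternative is in this experiment, the autoregressive parameter can be evaluated for the sample sizes used. The sketch below assumes the form $1 - c/(T\sqrt{N})$ with $c = 20$; the function name `local_rho` is hypothetical.

```python
import math


def local_rho(c, n_units, n_periods):
    """Autoregressive parameter under the local alternative 1 - c / (T * sqrt(N))."""
    return 1.0 - c / (n_periods * math.sqrt(n_units))


for n, t in [(25, 25), (50, 50), (70, 70), (100, 100)]:
    print(n, t, round(local_rho(20.0, n, t), 4))
```

For N = T = 25 the parameter is 0.84, while for N = T = 100 it is 0.98, so the alternative shrinks toward the unit root as both dimensions grow.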
Table 2 presents the outcome of such a Monte Carlo experiment. As
predicted by Theorem 2, the power of the LL* test is close to the size for all N and
T. All other tests appear to converge to a limit larger than the size, where the
limiting power of the UB test is nearly twice as large as the limiting power of
the IPS test. The original LL test turns out to have power against the local
alternative but the power is substantially smaller than the power of the IPS and
UB statistics.
Table 2. Power against local alternatives

    N      T      LL      LL*     IPS     UB
  N → ∞ and T → ∞
    25     25    0.378   0.064   0.384   0.668
    50     50    0.269   0.056   0.300   0.660
    70     70    0.210   0.033   0.296   0.608
   100    100    0.170   0.050   0.261   0.579
  T fixed, N → ∞
    50     25    0.235   0.038   0.342   0.575
    70     25    0.156   0.038   0.313   0.535
   100     25    0.090   0.028   0.273   0.450
  N fixed, T → ∞
    25     50    0.415   0.061   0.419   0.724
    25     70    0.378   0.020   0.421   0.742
    25    100    0.298   0.028   0.402   0.783

Note: This table reports the rejection rates computed from 1000 replications of model (20) with
$\rho = 1 - 20/(T\sqrt{N})$. The significance level is 0.05. The statistic LL* is constructed similarly to the
LL test but using the residuals from the autoregressions to estimate $\sigma_i^2$. For this test the values
for the expectation and variance are computed by additional Monte Carlo simulations.

The findings of the Monte Carlo experiment can be compared to the results
of our theoretical analysis. From Theorem 3 it is expected that the IPS test has
a limiting power of $\Phi(-1.645 + 20 \times 0.0401) = 0.199$, where $\Phi(\cdot)$ denotes the
c.d.f. of the standard normal distribution. The empirical power for N = 100 and
T = 100 is 0.261, which is higher than the predicted power based on Theorem
3. This may be due to the simulation error when using (14). An analogous
calculation using the results for the UB statistic yields a limiting power of
$\Phi(-1.645 + 20 \times 0.0997) = 0.636$. Since the empirical power for N = 100 and
T = 100 is 0.579, the value derived from Theorem 5 using (19) tends to be too
high.
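The two limiting powers quoted above can be checked numerically. The sketch below evaluates the standard normal c.d.f. at the one-sided 5% critical value shifted by $c$ times the location parameter; the function names are hypothetical, and the location parameters 0.0401 and 0.0997 are the values given in the text.

```python
import math


def std_normal_cdf(z):
    """Standard normal c.d.f. computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def limiting_power(c, location, critical_value=-1.645):
    """Limiting power of a one-sided 5% test under the local alternative."""
    return std_normal_cdf(critical_value + c * location)


ips_power = limiting_power(20.0, 0.0401)  # IPS test
ub_power = limiting_power(20.0, 0.0997)   # UB test
print(round(ips_power, 3), round(ub_power, 3))
```

Both results agree with the values reported in the text up to rounding in the third decimal.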
Finally it is interesting to note that the power of the tests appears to
deteriorate with fixed T and increasing N. For the LL test the local power seems
to tend slowly to the size as T is fixed and $N \to \infty$.
V. CONCLUSION
In this chapter we have considered the local power of some well known tests
and a new test for unit roots in panel data. We found that the LL and IPS tests
suffer from a severe loss of power if individual specific trends are included.
Therefore, a class of test statistics is suggested that does not employ a bias
adjustment and it is found that the power of this test is substantially higher than
the LL and the IPS tests. Furthermore, it turns out that the LL test is very
sensitive to the augmentation lag. It is therefore recommended to apply tables
for the mean and variance that take into account the lag-augmentation of the
test.
The results further indicate that the power of the tests is very sensitive to the
specification of the deterministic terms. If there is only a constant or a joint
linear trend, then subtracting the first observation yields a very powerful test.
Including individual specific trends when it is unnecessary leads to a dramatic
loss of power. Hence, in practice it is desirable to have a test for a common
deterministic trend against the alternative of individual specific time trends.
As pointed out by a referee, there are other detrending methods that may be
used to construct an improved test procedure. A natural candidate is the quasi
difference detrending suggested by Elliott, Rothenberg & Stock (1996) (see
also Phillips & Xiao (1998)). Unfortunately, it can be shown that a t-statistic
computed from quasi differenced data also suffers from a (Nickell type) bias so
that again a bias correction is required to obtain a reasonable test procedure.
Nevertheless, a test procedure based on quasi differences may perform better
than test procedures with OLS detrending. In this chapter, our strategy is to
avoid the bias term altogether. The comparison of our approach to a test
procedure based on quasi differences is left for future research.
ACKNOWLEDGMENTS
The research for this paper was carried out within the SFB 373 at the Humboldt
University Berlin and the METEOR research project Dynamic and Nonstationary Panels: Theoretical and Empirical Issues. I thank Carsten Trenkler
and two referees for their helpful comments and suggestions.
NOTES
1. In LL (1993) the test statistic is divided by NT which is computed as the overall
standard deviation of e it. However, since e it is already adjusted for its standard deviation,
we can drop NT when computing the test statistic.
2. I repeated the experiment for different values of c and T. The results turn out to
be fairly robust.
3. Another possibility is to use alternative estimation methods like the Generalized
Method of Moments (GMM). Breitung (1997) applies second differences and obtains a
unit root test without bias adjustment by using an appropriate GMM estimator.
REFERENCES
Arellano, M., & Bover, O. (1995). Another Look at the Instrumental-Variable Estimation of Error-Components Models. Journal of Econometrics, 68, 29–51.
Baltagi, B. H. (1995). Econometric Analysis of Panel Data. Chichester: Wiley and Sons.
Breitung, J. (1992). Dynamische Modelle für die Paneldatenanalyse (Dynamic Models for the
Analysis of Panel Data). PhD dissertation, Haag + Herchen, Frankfurt.
Breitung, J. (1997). Testing for Unit Roots in Panel Data Using a GMM Approach. Statistical
Papers, 38, 253–269.
Breitung, J. (1999). The Local Power of Some Unit Root Tests for Panel Data. SFB 373 Discussion
paper, No. 69–1999, Humboldt University Berlin.
Cheung, K. S. (1995). Lag Order and Critical Values of the Augmented Dickey-Fuller Test.
Journal of Business and Economic Statistics, 13, 277–280.
Dickey, D. A., & Fuller, W. A. (1979). Distribution of the Estimates for Autoregressive Time Series
With a Unit Root. Journal of the American Statistical Association, 74, 427–431.
Elliott, G., Rothenberg, T. J., & Stock, J. H. (1996). Efficient Tests for an Autoregressive Unit Root.
Im, K. S., Pesaran, M. H., & Shin, Y. (1997). Testing for Unit Roots in Heterogeneous Panels. DAE
Working paper, No 9526, University of Cambridge, revised version.
Kao, C. (1999). Spurious Regression and Residual-based Tests for Cointegration in Panel Data.
Levin, A., & Lin, C. F. (1993). Unit Root Tests in Panel Data: Asymptotic and Finite-Sample
Properties. Working paper, Department of Economics, University of California San
Diego.
Moon, H. R., & Phillips, P. C. B. (1999). Estimation of Autoregressive Roots Near Unity Using
Panel Data. mimeo, Yale University.
Nickell, S. (1981). Biases in Dynamic Models with Fixed Effects. Econometrica, 49, 1417–1426.
Phillips, P. C. B. (1987). Towards a Unified Asymptotic Theory of Autoregression. Biometrika, 74,
535–48.
Phillips, P. C. B., & Lee, C. C. (1996). Efficiency Gains from Quasi-Differencing Under
Nonstationarity. In: P. M. Robinson & M. Rosenblatt (Eds), Essays in Memory of E. J.
Hannan (pp. 300–314).
Phillips, P. C. B., & Moon, H. R. (1999). Linear Regression Limit Theory for Nonstationary Panel
Phillips, P. C. B., & Ouliaris, S. (1990). Asymptotic Properties of Residual Based Tests for
Cointegration. Econometrica, 58, 165–193.
Phillips, P. C. B., & Perron, P. (1988). Testing for a Unit Root in Time Series Regression.
Biometrika, 75, 335–346.
Phillips, P. C. B., & Solo, V. (1992). Asymptotics for Linear Processes. Annals of Statistics, 20,
971–1001.
Phillips, P. C. B., & Xiao, Z. (1998). A Primer on Unit Root Testing. Journal of Economic Surveys,
12, 423–467.
Quah, D. (1994). Exploiting Cross-Section Variation for Unit Root Inference in Dynamic Data.
Schmidt, P., & Phillips, P. C. B. (1992). LM Test for a Unit Root in the Presence of Deterministic
Trends. Oxford Bulletin of Economics and Statistics, 54, 257–287.
ON THE ESTIMATION AND INFERENCE OF A COINTEGRATED REGRESSION IN PANEL DATA
Chihwa Kao and Min-Hsien Chiang
ABSTRACT
In this chapter, we study the asymptotic distributions for ordinary least
squares (OLS), fully modified OLS (FMOLS), and dynamic OLS (DOLS)
estimators in cointegrated regression models in panel data. We show that
the OLS, FMOLS, and DOLS estimators are all asymptotically normally
distributed. However, the asymptotic distribution of the OLS estimator is
shown to have a non-zero mean. Monte Carlo results illustrate the
sampling behavior of the proposed estimators and show that (1) the OLS
estimator has a non-negligible bias in finite samples, (2) the FMOLS
estimator does not improve over the OLS estimator in general, and (3) the
DOLS outperforms both the OLS and FMOLS estimators.
I. INTRODUCTION
Evaluating the statistical properties of data along the time dimension has
proven to be very different from analysis of the cross-section dimension. As
economists have gained access to better data with more observations across
time, understanding these properties has grown increasingly important. An area
of particular concern in time-series econometrics has been the use of non-stationary data. With the desire to study the behavior of cross-sectional data
over time and the increasing use of panel data, e.g. Summers and Heston (1991)
data, one new research area is examining the properties of non-stationary time-series data in panel form. It is an intriguing question to ask: how exactly does
this hybrid style of data combine the statistical elements of traditional cross-sectional analysis and time-series analysis? In particular, what is the correct
way to analyze non-stationarity, the spurious regression problem, and
cointegration in panel data?
ISBN: 0-7623-0688-2
Despite the immense interest in testing for unit roots and cointegration in time-series data, not much attention has been paid to testing for unit roots in panel
data. The only theoretical studies we know of in this area are Breitung & Meyer
(1994); Quah (1994); Levin & Lin (1993); Im, Pesaran & Shin (1995); and
Maddala & Wu (1999). Breitung & Meyer (1994) derived the asymptotic
normality of the Dickey-Fuller test statistic for panel data with a large cross-section dimension and a small time-series dimension. Quah (1994) studied a
unit root test for panel data that simultaneously have extensive cross-section
and time-series variation. He showed that the asymptotic distribution for the
proposed test is a mixture of the standard normal and Dickey-Fuller-Phillips
asymptotics. Levin & Lin (1993) derived the asymptotic distributions for unit
roots on panel data and showed that the power of these tests increases
dramatically as the cross-section dimension increases. Im et al. (1995) critiqued
the Levin and Lin panel unit root statistics and proposed alternatives. Maddala
& Wu (1999) provided a comparison of the tests of Im et al. (1995) and Levin
& Lin (1993). They suggested a new test based on the Fisher test.
Recently, some attention has been given to the cointegration tests and
estimation with regression models in panel data, e.g. Kao (1999), McCoskey &
Kao (1998), Pedroni (1996, 1997) and Phillips & Moon (1999). Kao (1999)
studied a spurious regression in panel data, along with asymptotic properties of
the ordinary least squares (OLS) estimator and other conventional statistics.
Kao showed that the OLS estimator is consistent for its true value, but the t-statistic diverges so that inferences about the regression coefficient, $\beta$, are
wrong with a probability that goes to one. Furthermore, Kao examined the
Dickey-Fuller (DF) and the augmented Dickey-Fuller (ADF) tests to test the
null hypothesis of no cointegration in panel data. McCoskey & Kao (1998)
proposed further tests for the null hypothesis of cointegration in panel data.
Pedroni (1997) derived asymptotic distributions for residual-based tests of
cointegration for both homogeneous and heterogeneous panels. Pedroni (1996)
proposed a fully modified estimator for heterogeneous panels. Phillips & Moon
(1999) developed both sequential limit and joint limit theories for nonstationary panel data. Pesaran & Smith (1995) are not directly concerned with
cointegration but do touch on a number of related issues, including the potential
problems of homogeneity misspecification for cointegrated panels. See the
survey paper by Baltagi & Kao (2000) in this volume.
This chapter makes two main contributions. First, it adds to the literature by
suggesting a computationally simpler dynamic OLS (DOLS) estimator in panel
cointegrated regression models. Second, it provides a serious study of the finite
sample properties of the OLS, fully modified OLS (FMOLS), and DOLS
estimators.
Section 2 introduces the model and assumptions. Section 3 develops the
asymptotic theory for the OLS, FMOLS and DOLS estimators. Section 4 gives
the limiting distributions of the FMOLS and DOLS estimators for heterogeneous panels. Section 5 presents Monte Carlo results to illustrate the finite
sample properties of the OLS, FMOLS, and DOLS estimators. Section 6
summarizes the findings. The proofs of Theorems 1, 2, and 4 are not presented
since the proofs can be found in Phillips & Moon (1999) and Pedroni (1997).
The appendix contains the proofs of Theorems 3 and 5.
A word on notation. We write the integral $\int_0^1 W(s)\,ds$ as $\int W$ when there is no
ambiguity over limits. We define $\Omega^{1/2}$ to be any matrix such that $\Omega = (\Omega^{1/2})(\Omega^{1/2})'$.
We use $\|A\|$ to denote $\{\mathrm{tr}(A'A)\}^{1/2}$, $|A|$ to denote the determinant
of $A$, $\Rightarrow$ to denote weak convergence, $\xrightarrow{p}$ to denote convergence in probability,
$[x]$ to denote the largest integer $\le x$, $I(0)$ and $I(1)$ to signify a time series that
is integrated of order zero and one, respectively, and $BM(\Omega)$ to denote
Brownian motion with the covariance matrix $\Omega$.
II. THE MODEL AND ASSUMPTIONS

Consider the following fixed effect panel regression:
$$y_{it} = \alpha_i + x_{it}'\beta + u_{it}, \quad i = 1, \ldots, N, \; t = 1, \ldots, T, \tag{1}$$
where $\{y_{it}\}$ are $1 \times 1$, $\beta$ is a $k \times 1$ vector of the slope parameters, $\{\alpha_i\}$ are the
intercepts, and $\{u_{it}\}$ are the stationary disturbance terms. We assume that $\{x_{it}\}$
are $k \times 1$ integrated processes of order one for all $i$, where
$$x_{it} = x_{it-1} + \varepsilon_{it}.$$
Under these specifications, (1) describes a system of cointegrated regressions,
i.e. $y_{it}$ is cointegrated with $x_{it}$. The initialization of this system is $y_{i0} = x_{i0} = O_p(1)$
as $T \to \infty$ for all $i$. The individual constant term $\alpha_i$ can be extended into general
deterministic time trends such as $\alpha_{0i} + \alpha_{1i} t + \cdots + \alpha_{pi} t^p$.
Assumption 1. The asymptotic theory employed in this paper is a sequential
limit theory established by Phillips & Moon (1999) in which $T \to \infty$ is
followed by $N \to \infty$.
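Next, we characterize the innovation vector $w_{it} = (u_{it}, \varepsilon_{it}')'$. We assume that $w_{it}$ is
a linear process that satisfies the following assumption.
Assumption 2. For each $i$, we assume:
(a) $w_{it} = \Pi(L)\eta_{it} = \sum_{j=0}^{\infty} \Pi_j \eta_{it-j}$, $\sum_{j=0}^{\infty} j^a \|\Pi_j\| < \infty$, $|\Pi(1)| \neq 0$, for some $a > 1$.
(b) $\eta_{it}$ is i.i.d. with zero mean, variance matrix $\Sigma_\eta$, and finite fourth order
cumulants.
Assumption 2 implies that (e.g. Phillips & Solo, 1992) the partial sum process
$\frac{1}{\sqrt{T}} \sum_{t=1}^{[Tr]} w_{it}$ satisfies the following multivariate invariance principle:
$$\frac{1}{\sqrt{T}} \sum_{t=1}^{[Tr]} w_{it} \Rightarrow B_i(r) = BM_i(\Omega) \quad \text{as } T \to \infty \text{ for all } i, \tag{2}$$
where
$$B_i = \begin{pmatrix} B_{ui} \\ B_{\varepsilon i} \end{pmatrix}.$$
The long-run covariance matrix of $\{w_{it}\}$ is given by
$$\Omega = \sum_{j=-\infty}^{\infty} E(w_{ij} w_{i0}') = \Pi(1)\Sigma_\eta\Pi(1)' = \Sigma + \Gamma + \Gamma' = \begin{pmatrix} \Omega_u & \Omega_{u\varepsilon} \\ \Omega_{\varepsilon u} & \Omega_\varepsilon \end{pmatrix},$$
where
$$\Gamma = \sum_{j=1}^{\infty} E(w_{ij} w_{i0}') = \begin{pmatrix} \Gamma_u & \Gamma_{u\varepsilon} \\ \Gamma_{\varepsilon u} & \Gamma_\varepsilon \end{pmatrix} \tag{3}$$
and
$$\Sigma = E(w_{i0} w_{i0}') = \begin{pmatrix} \Sigma_u & \Sigma_{u\varepsilon} \\ \Sigma_{\varepsilon u} & \Sigma_\varepsilon \end{pmatrix} \tag{4}$$
are partitioned conformably with $w_{it}$.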

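Assumption 3. $\Omega_\varepsilon$ is non-singular, i.e. $\{x_{it}\}$ are not cointegrated.
Define
$$\Omega_{u.\varepsilon} = \Omega_u - \Omega_{u\varepsilon}\Omega_\varepsilon^{-1}\Omega_{\varepsilon u}. \tag{5}$$
Then, $B_i$ can be rewritten as
$$B_i = \begin{pmatrix} B_{ui} \\ B_{\varepsilon i} \end{pmatrix} = \begin{pmatrix} \Omega_{u.\varepsilon}^{1/2} & \Omega_{u\varepsilon}\Omega_\varepsilon^{-1/2} \\ 0 & \Omega_\varepsilon^{1/2} \end{pmatrix} \begin{pmatrix} V_i \\ W_i \end{pmatrix}, \tag{6}$$
where $(V_i, W_i')' = BM(I)$ is a standardized Brownian motion. Define the one-sided
long-run covariance
$$\Delta = \Sigma + \Gamma = \sum_{j=0}^{\infty} E(w_{ij} w_{i0}')$$
with
$$\Delta = \begin{pmatrix} \Delta_u & \Delta_{u\varepsilon} \\ \Delta_{\varepsilon u} & \Delta_\varepsilon \end{pmatrix}.$$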
Here we assume that panels are homogeneous, i.e. the variances are constant
across the cross-section units. We will relax this assumption in Section 4 to
allow for different variances for different i.
Remark 1. The benefits of using panel data models have been discussed
extensively by Hsiao (1986) and Baltagi (1995), though Hsiao and Baltagi
assume the time dimension is small while the cross-section dimension is large.
However, in international trade, open macroeconomics, urban regional, public
finance, and finance, panel data usually have long time-series and cross-section dimensions. The data of Summers & Heston (1991) are a notable
example.
Remark 2. The advantage of using the sequential limit theory is that it offers
a quick and easy way to derive the asymptotics as demonstrated by Phillips &
Moon (1999). Phillips & Moon also provide detailed treatments of the
connections between the sequential limit theory and the joint limit theory.
Remark 3. If one wants to obtain a consistent estimate of $\beta$ in (1) or wants to
test some restrictions on $\beta$, then an individual time-series regression or a
multiple time-series regression is probably enough. So what are the advantages
of using the $(N, T)$ asymptotics, e.g. sequential asymptotics in Assumption 1,
instead of $T$ asymptotics? One of the advantages is that we can get a normal
approximation of the limit distributions of the estimators and test statistics with
the convergence rate $\sqrt{N}T$. More importantly, the biases of the estimators and
test statistics can be reduced when $N$ and $T$ are large. For example, later in this
paper we will show that the biases of the OLS, FMOLS, and DOLS estimators
in Table 2 were reduced by half when the sample size was changed from
(N = 1, T = 20) to (N = 20, T = 20). However, in order to obtain an asymptotic
normality using the $(N, T)$ asymptotics we need to make some strong
assumptions; for example, in this paper we assume that the error terms are
independent across $i$.
Remark 4. The results in this chapter require that regressors are not
cointegrated. Assuming that I(1) regressors are not cointegrated with each
other is indeed restrictive. The authors are currently investigating this issue.
III. OLS, FMOLS, AND DOLS ESTIMATORS

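Let us first study the limiting distribution of the OLS estimator for equation (1).
The OLS estimator of $\beta$ is
$$\hat\beta_{OLS} = \left[ \sum_{i=1}^{N} \sum_{t=1}^{T} (x_{it} - \bar{x}_i)(x_{it} - \bar{x}_i)' \right]^{-1} \left[ \sum_{i=1}^{N} \sum_{t=1}^{T} (x_{it} - \bar{x}_i)(y_{it} - \bar{y}_i) \right]. \tag{7}$$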
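For a single regressor ($k = 1$), the within estimator in (7) reduces to a ratio of two double sums. The following is a minimal sketch under that assumption; the function name `within_ols` and the list-of-lists data layout are hypothetical.

```python
def within_ols(x, y):
    """Within (fixed effects) OLS slope for a scalar regressor.

    x and y are lists with one inner list of T observations per unit.
    """
    numerator = 0.0
    denominator = 0.0
    for x_i, y_i in zip(x, y):
        x_bar = sum(x_i) / len(x_i)
        y_bar = sum(y_i) / len(y_i)
        for x_it, y_it in zip(x_i, y_i):
            numerator += (x_it - x_bar) * (y_it - y_bar)
            denominator += (x_it - x_bar) ** 2
    return numerator / denominator


# Two units with different intercepts and a common slope of 2.
x_demo = [[0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 2.0, 5.0]]
y_demo = [[5.0 + 2.0 * v for v in x_demo[0]], [-1.0 + 2.0 * v for v in x_demo[1]]]
beta_demo = within_ols(x_demo, y_demo)
```

Because each unit is demeaned separately, the unit-specific intercepts drop out and the common slope is recovered exactly in this noise-free example.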
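All the limits in Theorems 1–6 are taken as $T \to \infty$ followed by $N \to \infty$
sequentially from Assumption 1. First, we present the following theorem:
Theorem 1. If Assumptions 1–3 hold, then
(a) $T(\hat\beta_{OLS} - \beta) \xrightarrow{p} -3\Omega_\varepsilon^{-1}\Omega_{\varepsilon u} + 6\Omega_\varepsilon^{-1}\Delta_{\varepsilon u}$,
(b) $\sqrt{N}T(\hat\beta_{OLS} - \beta) - \sqrt{N}\delta_{NT} \Rightarrow N(0, 6\Omega_\varepsilon^{-1}\Omega_{u.\varepsilon})$,
where
$$\delta_{NT} = \left[ \frac{1}{N} \sum_{i=1}^{N} \frac{1}{T^2} \sum_{t=1}^{T} (x_{it} - \bar{x}_i)(x_{it} - \bar{x}_i)' \right]^{-1} \left[ \frac{1}{N} \sum_{i=1}^{N} \left\{ \Omega_\varepsilon^{1/2} \left( \int \tilde{W}_i \, dW_i' \right) \Omega_\varepsilon^{-1/2} \Omega_{\varepsilon u} + \Delta_{\varepsilon u} \right\} \right]$$
and $\tilde{W}_i = W_i - \int W_i$.
The normality of the OLS estimator in Theorem 1 comes naturally. When
summing across $i$, the non-standard asymptotic distribution due to the unit root
in the time dimension is smoothed out. From Theorem 1 we note that there is
an interesting interpretation of the asymptotic covariance matrix, $6\Omega_\varepsilon^{-1}\Omega_{u.\varepsilon}$, i.e.
$\Omega_\varepsilon^{-1}\Omega_{u.\varepsilon}$ can be seen as the long-run noise-to-signal ratio. We also note that
$-\frac{1}{2}\Omega_{\varepsilon u}$ is due to the endogeneity of the regressor $x_{it}$, and $\Delta_{\varepsilon u}$ is due to the serial
correlation. It can be shown easily that
$$\delta_{NT} \xrightarrow{p} -3\Omega_\varepsilon^{-1}\Omega_{\varepsilon u} + 6\Omega_\varepsilon^{-1}\Delta_{\varepsilon u}.$$
If $w_{it} = (u_{it}, \varepsilon_{it}')'$ are i.i.d., then
$$\delta_{NT} \xrightarrow{p} 3\Sigma_\varepsilon^{-1}\Sigma_{\varepsilon u},$$
which was examined by Kao & Chen (1995). Let $\hat\Omega_\varepsilon$, $\hat\Omega_{\varepsilon u}$, $\hat\Delta_\varepsilon$, and $\hat\Delta_{\varepsilon u}$ be
consistent estimates of $\Omega_\varepsilon$, $\Omega_{\varepsilon u}$, $\Delta_\varepsilon$, and $\Delta_{\varepsilon u}$ respectively. Then from (b) in
Theorem 1, we can define a bias-corrected OLS, $\hat\beta_{OLS}^{+}$,
$$\hat\beta_{OLS}^{+} = \hat\beta_{OLS} - \frac{\hat\delta_{NT}}{T}$$
such that
$$\sqrt{N}T(\hat\beta_{OLS}^{+} - \beta) \Rightarrow N(0, 6\Omega_\varepsilon^{-1}\Omega_{u.\varepsilon}),$$
where
$$\hat\delta_{NT} = -3\hat\Omega_\varepsilon^{-1}\hat\Omega_{\varepsilon u} + 6\hat\Omega_\varepsilon^{-1}\hat\Delta_{\varepsilon u}.$$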
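For a scalar regressor the bias correction is a one-line adjustment. The sketch below assumes the estimated bias term takes the form $(-3\hat\Omega_{\varepsilon u} + 6\hat\Delta_{\varepsilon u})/\hat\Omega_\varepsilon$ and that the long-run covariance estimates are supplied by the caller; the function and argument names are hypothetical.

```python
def bias_corrected_ols(beta_ols, n_periods, omega_e, omega_eu, delta_eu):
    """Bias-corrected OLS slope for a scalar regressor.

    omega_e  : long-run variance of the regressor innovations
    omega_eu : long-run covariance between regressor innovations and errors
    delta_eu : one-sided long-run covariance between the same two series
    """
    delta_nt = (-3.0 * omega_eu + 6.0 * delta_eu) / omega_e
    return beta_ols - delta_nt / n_periods


# Illustrative numbers only: the estimated bias term is 1.8, so with
# T = 20 the correction subtracts 0.09 from the OLS estimate.
beta_plus = bias_corrected_ols(2.1, 20, omega_e=1.0, omega_eu=0.4, delta_eu=0.5)
```

The correction shrinks at rate $1/T$, which matches the order of the OLS bias in part (a) of Theorem 1.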
Chen, McCoskey & Kao (1999) investigated the finite sample properties of the
OLS estimator in (7), the t-statistic, the bias-corrected OLS estimator, and the
bias-corrected t-statistic. They found that the bias-corrected OLS estimator
does not improve over the OLS estimator in general. The results of Chen et al.
suggest that alternatives, such as the FMOLS estimator or the DOLS estimator
(e.g. Saikkonen, 1991; Stock & Watson, 1993) may be more promising in
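cointegrated panel regressions. Thus, we begin our study by examining the
limiting distribution of the FMOLS estimator, $\hat\beta_{FM}$. The FMOLS estimator is
constructed by making corrections for endogeneity and serial correlation to the
OLS estimator $\hat\beta_{OLS}$ in (7). Define
$$u_{it}^{+} = u_{it} - \Omega_{u\varepsilon}\Omega_\varepsilon^{-1}\varepsilon_{it}, \tag{8}$$
$$\hat{u}_{it}^{+} = u_{it} - \hat\Omega_{u\varepsilon}\hat\Omega_\varepsilon^{-1}\varepsilon_{it}, \tag{9}$$
$$y_{it}^{+} = y_{it} - \Omega_{u\varepsilon}\Omega_\varepsilon^{-1}\Delta x_{it}, \tag{10}$$
and
$$\hat{y}_{it}^{+} = y_{it} - \hat\Omega_{u\varepsilon}\hat\Omega_\varepsilon^{-1}\Delta x_{it}. \tag{11}$$
Note that
$$\begin{pmatrix} u_{it}^{+} \\ \varepsilon_{it} \end{pmatrix} = \begin{pmatrix} 1 & -\Omega_{u\varepsilon}\Omega_\varepsilon^{-1} \\ 0 & I_k \end{pmatrix} \begin{pmatrix} u_{it} \\ \varepsilon_{it} \end{pmatrix},$$
which has the long-run covariance matrix
$$\begin{pmatrix} \Omega_{u.\varepsilon} & 0 \\ 0 & \Omega_\varepsilon \end{pmatrix},$$
where $I_k$ is a $k \times k$ identity matrix. The endogeneity correction is achieved by
modifying the variable $y_{it}$ in (1) with the transformation
$$\hat{y}_{it}^{+} = y_{it} - \hat\Omega_{u\varepsilon}\hat\Omega_\varepsilon^{-1}\Delta x_{it} = \alpha_i + x_{it}'\beta + u_{it} - \hat\Omega_{u\varepsilon}\hat\Omega_\varepsilon^{-1}\Delta x_{it}.$$
The serial correlation correction term has the form
$$\hat\Delta_{\varepsilon u}^{+} = (\hat\Delta_{\varepsilon u}, \hat\Delta_\varepsilon) \begin{pmatrix} 1 \\ -\hat\Omega_\varepsilon^{-1}\hat\Omega_{\varepsilon u} \end{pmatrix} = \hat\Delta_{\varepsilon u} - \hat\Delta_\varepsilon\hat\Omega_\varepsilon^{-1}\hat\Omega_{\varepsilon u},$$
where $\hat\Delta_{\varepsilon u}$ and $\hat\Delta_\varepsilon$ are kernel estimates of $\Delta_{\varepsilon u}$ and $\Delta_\varepsilon$. Therefore, the FMOLS
estimator is
$$\hat\beta_{FM} = \left[ \sum_{i=1}^{N} \sum_{t=1}^{T} (x_{it} - \bar{x}_i)(x_{it} - \bar{x}_i)' \right]^{-1} \left[ \sum_{i=1}^{N} \left( \sum_{t=1}^{T} (x_{it} - \bar{x}_i)\hat{y}_{it}^{+} - T\hat\Delta_{\varepsilon u}^{+} \right) \right]. \tag{12}$$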
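The two corrections can be made concrete for a scalar regressor. The sketch below is an illustration under stated assumptions rather than the authors' implementation: the long-run covariance estimates are passed in by the caller, the function name `fmols` and the data layout (one list per unit, indexed $t = 0, \ldots, T$) are hypothetical, and with $k = 1$ the quantities $\Omega_{u\varepsilon}$ and $\Omega_{\varepsilon u}$ coincide.

```python
def fmols(x, y, omega_e, omega_ue, delta_e, delta_eu):
    """Fully modified OLS slope for a scalar regressor.

    x[i][t] and y[i][t] hold observations t = 0..T for unit i; the
    estimator uses t = 1..T so that the first difference of x exists.
    """
    ratio = omega_ue / omega_e
    # Serial correlation correction term.
    delta_plus = delta_eu - delta_e * ratio
    numerator = 0.0
    denominator = 0.0
    for x_i, y_i in zip(x, y):
        levels = x_i[1:]
        diffs = [x_i[t] - x_i[t - 1] for t in range(1, len(x_i))]
        # Endogeneity correction applied to the dependent variable.
        y_plus = [y_i[t] - ratio * diffs[t - 1] for t in range(1, len(y_i))]
        x_bar = sum(levels) / len(levels)
        numerator += sum((a - x_bar) * b for a, b in zip(levels, y_plus))
        numerator -= len(levels) * delta_plus
        denominator += sum((a - x_bar) ** 2 for a in levels)
    return numerator / denominator


# With no endogeneity and no serial correlation the estimator reduces
# to within OLS, so a noise-free slope of 2 is recovered.
x_fm = [[0.0, 1.0, 3.0, 2.0, 5.0], [1.0, 0.0, 2.0, 4.0, 3.0]]
y_fm = [[3.0 + 2.0 * v for v in x_fm[0]], [7.0 + 2.0 * v for v in x_fm[1]]]
beta_fm = fmols(x_fm, y_fm, omega_e=1.0, omega_ue=0.0, delta_e=1.0, delta_eu=0.0)
```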
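Now, we state the limiting distribution of $\hat\beta_{FM}$.
Theorem 2. If Assumptions 1–3 hold, then $\sqrt{N}T(\hat\beta_{FM} - \beta) \Rightarrow N(0, 6\Omega_\varepsilon^{-1}\Omega_{u.\varepsilon})$.
It can be shown easily that the limiting distribution of $\hat\beta_{FM}$ becomes
$$\sqrt{N}T(\hat\beta_{FM} - \beta) \Rightarrow N(0, 2\Omega_\varepsilon^{-1}\Omega_{u.\varepsilon}) \tag{13}$$
by the exclusion of the individual-specific intercept, $\alpha_i$.
Remark 5. Once the estimates of $w_{it}$, $\hat{w}_{it}$, were obtained, we used
$$\hat\Sigma = \frac{1}{NT} \sum_{i=1}^{N} \sum_{t=1}^{T} \hat{w}_{it}\hat{w}_{it}' \tag{14}$$
to estimate $\Sigma$. $\Omega$ was estimated by
$$\hat\Omega = \frac{1}{N} \sum_{i=1}^{N} \left\{ \frac{1}{T} \sum_{t=1}^{T} \hat{w}_{it}\hat{w}_{it}' + \frac{1}{T} \sum_{\tau=1}^{l} \varpi_{\tau l} \sum_{t=\tau+1}^{T} (\hat{w}_{it}\hat{w}_{it-\tau}' + \hat{w}_{it-\tau}\hat{w}_{it}') \right\}, \tag{15}$$
where $\varpi_{\tau l}$ is a weight function or a kernel. Using Phillips & Durlauf (1986)
and sequential limit theory, $\hat\Sigma$ and $\hat\Omega$ can be shown to be consistent for $\Sigma$ and
$\Omega$.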
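The estimators in (14) and (15) can be sketched for the bivariate case of one error series and one regressor innovation series. The sketch assumes Bartlett weights $1 - \tau/(l+1)$, which is one common choice of kernel and is an assumption here; the function name `long_run_covariances` and the data layout are hypothetical. It also returns the one-sided long-run covariance built from the same weighted autocovariances.

```python
def long_run_covariances(w, lag):
    """Pooled kernel estimates of the covariance, the long-run covariance,
    and the one-sided long-run covariance, each as a 2x2 list.

    w[i][t] is a pair (u_hat, e_hat) of residuals for unit i at time t.
    Bartlett weights 1 - tau / (lag + 1) are assumed.
    """
    n_units = len(w)
    sigma = [[0.0, 0.0], [0.0, 0.0]]
    gamma = [[0.0, 0.0], [0.0, 0.0]]  # weighted sum of lagged autocovariances
    for w_i in w:
        t_len = len(w_i)
        for a in range(2):
            for b in range(2):
                sigma[a][b] += sum(w_t[a] * w_t[b] for w_t in w_i) / (n_units * t_len)
                for tau in range(1, lag + 1):
                    weight = 1.0 - tau / (lag + 1.0)
                    acov = sum(w_i[t][a] * w_i[t - tau][b] for t in range(tau, t_len)) / t_len
                    gamma[a][b] += weight * acov / n_units
    omega = [[sigma[a][b] + gamma[a][b] + gamma[b][a] for b in range(2)] for a in range(2)]
    delta = [[sigma[a][b] + gamma[a][b] for b in range(2)] for a in range(2)]
    return sigma, omega, delta


# Tiny hand-checkable example with one unit and three periods.
w_demo = [[(1.0, 2.0), (-1.0, 0.0), (2.0, -2.0)]]
sigma_hat, omega_hat, delta_hat = long_run_covariances(w_demo, lag=1)
```

Setting `lag=0` switches off the autocovariance terms, in which case all three estimates coincide.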
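Remark 6. The distribution results for $\hat\beta_{FM}$ require that $\sqrt{N}(\hat\Delta_{\varepsilon u} - \Delta_{\varepsilon u})$ does not
diverge as $N$ grows large. However, $\hat\Delta_{\varepsilon u} - \Delta_{\varepsilon u}$ may not be small when $T$ is fixed.
It follows that $\sqrt{N}(\hat\Delta_{\varepsilon u} - \Delta_{\varepsilon u})$ may be non-negligible in panel data with finite
samples.
Next, we propose a DOLS estimator, $\hat\beta_D$, which uses the past and future values
of $\Delta x_{it}$ as additional regressors. We then show that the limiting distribution of
$\hat\beta_D$ is the same as the FMOLS estimator, $\hat\beta_{FM}$. But first, we need the following
additional assumption:
Assumption 4. The spectral density matrix $f_{ww}(\lambda)$ is bounded away from zero
and full rank for all $i$, i.e.
$$f_{ww}(\lambda) \ge \alpha I_T, \quad \lambda \in [0, \pi], \quad \alpha > 0.$$
When Assumptions 2 and 4 hold, the process $\{u_{it}\}$ can be written as (see
Saikkonen, 1991):
$$u_{it} = \sum_{j=-\infty}^{\infty} c_{ij}'\varepsilon_{it+j} + v_{it} \tag{16}$$
for all $i$, where
$$\sum_{j=-\infty}^{\infty} \|c_{ij}\| < \infty,$$
$\{v_{it}\}$ is stationary with zero mean, and $\{v_{it}\}$ and $\{\varepsilon_{it}\}$ are uncorrelated not only
contemporaneously but also in all lags and leads. In practice, the leads and lags
may be truncated while retaining (16) approximately, so that
$$u_{it} = \sum_{j=-q}^{q} c_{ij}'\varepsilon_{it+j} + \dot{v}_{it}$$
for all $i$. This is because $\{c_{ij}\}$ are assumed to be absolutely summable, i.e.
$$\sum_{j=-\infty}^{\infty} \|c_{ij}\| < \infty.$$
We also need to require that $q$ tends to infinity with $T$ at a suitable rate:
Assumption 5. $q \to \infty$ as $T \to \infty$ such that $q^3/T \to 0$, and
$$T^{1/2} \sum_{|j|>q} \|c_{ij}\| \to 0 \tag{17}$$
for all $i$.
We then substitute (16) into (1) to get
$$y_{it} = \alpha_i + x_{it}'\beta + \sum_{j=-q}^{q} c_{ij}'\varepsilon_{it+j} + \dot{v}_{it},$$
where
$$\dot{v}_{it} = v_{it} + \sum_{|j|>q} c_{ij}'\varepsilon_{it+j}. \tag{18}$$
Therefore, we obtain the DOLS of $\beta$, $\hat\beta_D$, by running the following
regression:
$$y_{it} = \alpha_i + x_{it}'\beta + \sum_{j=-q}^{q} c_{ij}'\Delta x_{it+j} + \dot{v}_{it}. \tag{19}$$
Next, we show that $\hat\beta_D$ has the same limiting distribution as $\hat\beta_{FM}$ in
Theorem 2.
Theorem 3. If Assumptions 1–5 hold, then $\sqrt{N}T(\hat\beta_D - \beta) \Rightarrow N(0, 6\Omega_\varepsilon^{-1}\Omega_{u.\varepsilon})$.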
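The DOLS regression can be sketched for a scalar regressor as a within regression of $y_{it}$ on $x_{it}$ and on leads and lags of its first difference. Two simplifications are assumptions of this sketch and not part of the text: the lead and lag coefficients are treated as common across units, and the numbers of lags and leads may differ. The helper names `solve` and `dols` are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def dols(x, y, lags, leads):
    """Pooled DOLS slope for a scalar regressor with unit fixed effects.

    x[i][t] and y[i][t] hold observations t = 0..T for unit i.
    """
    k = 1 + lags + leads + 1  # x_it plus differences at j = -lags..leads
    xtx = [[0.0] * k for _ in range(k)]
    xty = [0.0] * k
    for x_i, y_i in zip(x, y):
        # dx[t] = x_t - x_{t-1}; index 0 is a placeholder that is never read.
        dx = [None] + [x_i[t] - x_i[t - 1] for t in range(1, len(x_i))]
        rows, ys = [], []
        for t in range(1 + lags, len(x_i) - leads):
            rows.append([x_i[t]] + [dx[t + j] for j in range(-lags, leads + 1)])
            ys.append(y_i[t])
        means = [sum(r[c] for r in rows) / len(rows) for c in range(k)]
        y_mean = sum(ys) / len(ys)
        for r, y_val in zip(rows, ys):
            d = [r[c] - means[c] for c in range(k)]  # within transformation
            for a in range(k):
                xty[a] += d[a] * (y_val - y_mean)
                for b in range(k):
                    xtx[a][b] += d[a] * d[b]
    return solve(xtx, xty)[0]
```

A call such as `dols(x, y, lags=4, leads=2)` would use four lags and two leads of the differenced regressor; the returned value is the coefficient on the level of the regressor.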
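IV. HETEROGENEOUS PANELS

This chapter so far assumes that the panel data are homogeneous. The
substantial heterogeneity exhibited by actual data in the cross-sectional
dimension may restrict the practical applicability of the FMOLS and DOLS
estimators. Also, the estimators in Sections 2 and 3 are not easily extended to
cases of broader cross-sectional heterogeneity since the variances and biases
are specified in terms of the asymptotic covariance parameters that are assumed
to be shared cross-sectionally.
In this section, we propose an alternative representation of the panel FMOLS
estimator for heterogeneous panels. Before we discuss the FMOLS estimator
we need the following assumptions:
Assumption 6. We assume the panels are heterogeneous, i.e. $\Gamma_i$, $\Omega_i$ and $\Sigma_i$ are
varied for different $i$. We also assume the invariance principle in (2), (16), and
(17) in Assumption 5 still holds.
Let
$$x_{it}^{*} = \hat\Omega_{i\varepsilon}^{-1/2} x_{it}, \tag{20}$$
$$\hat{u}_{it}^{+} = u_{it} - \hat\Omega_{iu\varepsilon}\hat\Omega_{i\varepsilon}^{-1}\varepsilon_{it},$$
$$u_{it}^{*} = \hat\Omega_{iu.\varepsilon}^{-1/2}\hat{u}_{it}^{+}, \tag{21}$$
$$\hat{y}_{it}^{+} = y_{it} - \hat\Omega_{iu\varepsilon}\hat\Omega_{i\varepsilon}^{-1}\Delta x_{it} - \hat\Omega_{iu.\varepsilon}^{1/2}\left(\hat\Omega_{iu.\varepsilon}^{-1/2} x_{it}' - (\hat\Omega_{i\varepsilon}^{-1/2} x_{it})'\right)\beta, \tag{22}$$
and
$$y_{it}^{*} = \hat\Omega_{iu.\varepsilon}^{-1/2}\hat{y}_{it}^{+}, \tag{23}$$
where $\hat\Omega_{i\varepsilon}$ and $\hat\Omega_{iu.\varepsilon}$ are consistent estimators of $\Omega_{i\varepsilon}$ and
$\Omega_{iu.\varepsilon} = \Omega_{iu} - \Omega_{iu\varepsilon}\Omega_{i\varepsilon}^{-1}\Omega_{i\varepsilon u}$,
respectively. Similar to Pedroni (1996) the correction term,
$\hat\Omega_{iu.\varepsilon}^{1/2}(\hat\Omega_{iu.\varepsilon}^{-1/2} x_{it}' - (\hat\Omega_{i\varepsilon}^{-1/2} x_{it})')\beta$, is needed in (22) in the heterogeneous panel. We note that
(22) will be the same as (11) only if $\hat\Omega_{iu.\varepsilon}^{-1/2} x_{it}' - (\hat\Omega_{i\varepsilon}^{-1/2} x_{it})' = 0$ in the
heterogeneous panel. Also (22) requires knowing something about the true $\beta$.
In practice, $\beta$ in (22) can be replaced by a preliminary OLS, $\hat\beta_{OLS}$. Therefore,
let
$$\hat{y}_{it}^{++} = y_{it} - \hat\Omega_{iu\varepsilon}\hat\Omega_{i\varepsilon}^{-1}\Delta x_{it} - \hat\Omega_{iu.\varepsilon}^{1/2}\left(\hat\Omega_{iu.\varepsilon}^{-1/2} x_{it}' - (\hat\Omega_{i\varepsilon}^{-1/2} x_{it})'\right)\hat\beta_{OLS},$$
and
$$y_{it}^{*} = \hat\Omega_{iu.\varepsilon}^{-1/2}\hat{y}_{it}^{++}.$$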
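Assumption 7. $\Omega_{i\varepsilon}$ is not singular for all $i$.
Then, we define the FMOLS estimator for heterogeneous panels as
$$\hat\beta_{FM}^{*} = \left[ \sum_{i=1}^{N} \sum_{t=1}^{T} (x_{it}^{*} - \bar{x}_i^{*})(x_{it}^{*} - \bar{x}_i^{*})' \right]^{-1} \left[ \sum_{i=1}^{N} \left( \sum_{t=1}^{T} (x_{it}^{*} - \bar{x}_i^{*}) y_{it}^{*} - T\hat\Delta_{i\varepsilon u}^{*} \right) \right], \tag{24}$$
where
$$\hat\Delta_{i\varepsilon u}^{*} = \hat\Omega_{iu.\varepsilon}^{-1/2}\hat\Omega_{i\varepsilon}^{-1/2}\hat\Delta_{i\varepsilon u}^{+}$$
and
$$\hat\Delta_{i\varepsilon u}^{+} = (\hat\Delta_{i\varepsilon u}, \hat\Delta_{i\varepsilon}) \begin{pmatrix} 1 \\ -\hat\Omega_{i\varepsilon}^{-1}\hat\Omega_{i\varepsilon u} \end{pmatrix} = \hat\Delta_{i\varepsilon u} - \hat\Delta_{i\varepsilon}\hat\Omega_{i\varepsilon}^{-1}\hat\Omega_{i\varepsilon u}.$$
Theorem 4. If Assumptions 1–2 and 6–7 hold, then $\sqrt{N}T(\hat\beta_{FM}^{*} - \beta) \Rightarrow N(0, 6I_k)$.
The DOLS estimator for heterogeneous panels, $\hat\beta_D^{*}$, can be obtained by
running the following regression:
$$y_{it}^{*} = \alpha_i + x_{it}^{*\prime}\beta + \sum_{j=-q_i}^{q_i} c_{ij}'\Delta x_{it+j}^{*} + \dot{v}_{it}^{*}, \tag{25}$$
where $\dot{v}_{it}^{*}$ is defined similarly as in (18). Note that in (25) different lag
truncations, $q_i$, may have to be used because the error terms are heterogeneous
across $i$. Therefore, we need to assume that $q_i$ tends to infinity with $T$ at a
suitable rate for all $i$:
Assumption 8. $q_i \to \infty$ as $T \to \infty$ such that $q_i^3/T \to 0$, and
$$T^{1/2} \sum_{|j|>q_i} \|c_{ij}\| \to 0 \tag{26}$$
for all $i$.
In the following theorem we show that $\hat\beta_D^{*}$ also has the same limiting
distribution as $\hat\beta_{FM}^{*}$.
Theorem 5. If Assumptions 1–2 and 6–8 hold, then $\sqrt{N}T(\hat\beta_D^{*} - \beta) \Rightarrow N(0, 6I_k)$.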
Remark 7. Theorems 4 and 5 show that the limiting distributions of $\hat\beta_{FM}^{*}$ and
$\hat\beta_D^{*}$ are free of nuisance parameters.
Remark 8. We now consider a linear hypothesis that involves the elements of
the coefficient vector $\beta$. We show that hypothesis tests constructed using the
FMOLS and DOLS estimators have asymptotic chi-squared distributions. The
null hypothesis has the form:
$$H_0: R\beta = r, \tag{27}$$
where $r$ is an $m \times 1$ known vector and $R$ is a known $m \times k$ matrix describing the
restrictions. A natural test statistic of the Wald test using $\hat\beta_{FM}$ or $\hat\beta_D$ for
homogeneous panels is
$$W = \frac{1}{6} N T^2 (R\hat\beta_D - r)' \left[ R\hat\Omega_\varepsilon^{-1}\hat\Omega_{u.\varepsilon} R' \right]^{-1} (R\hat\beta_D - r). \tag{28}$$
Remark 9. For the heterogeneous panels, a natural statistic using $\hat\beta_{FM}^{*}$ or $\hat\beta_D^{*}$
to test the null hypothesis is
$$W^{*} = \frac{1}{6} N T^2 (R\hat\beta_D^{*} - r)' \left[ RR' \right]^{-1} (R\hat\beta_D^{*} - r). \tag{29}$$
It is clear that $W$ and $W^{*}$ converge in distribution to a chi-squared random
variable with $m$ degrees of freedom, $\chi_m^2$, as $T \to \infty$ followed by $N \to \infty$
sequentially under the null hypothesis. Hence, we establish the following
results:
$$W \Rightarrow \chi_m^2,$$
and
$$W^{*} \Rightarrow \chi_m^2.$$
Because the FMOLS and the DOLS estimators have the same asymptotic
distributions, it is easy to verify that the Wald statistics based on the FMOLS
estimator share the same limiting distributions as those based on the DOLS
estimator.
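For a single restriction on a single coefficient ($k = m = 1$, $R = 1$), the Wald statistic collapses to a scalar expression whose p-value can be read from the chi-squared distribution with one degree of freedom. The sketch below works under that assumption; the function name `wald_scalar` is hypothetical, and for the heterogeneous-panel statistic both variance arguments would simply be set to one.

```python
import math


def wald_scalar(beta_hat, r, n_units, n_periods, omega_e=1.0, omega_u_e=1.0):
    """Wald statistic and p-value for H0: beta = r with a scalar regressor.

    omega_e   : long-run variance of the regressor innovations
    omega_u_e : conditional long-run variance of the errors
    """
    variance = 6.0 * omega_u_e / omega_e
    w = n_units * n_periods ** 2 * (beta_hat - r) ** 2 / variance
    # Upper tail of chi-squared(1): P(|Z| > sqrt(w)) for standard normal Z.
    p_value = math.erfc(math.sqrt(w / 2.0))
    return w, p_value


w_stat, p_val = wald_scalar(2.01, 2.0, n_units=20, n_periods=20)
```

In this illustrative call the estimate differs from the hypothesized value by 0.01 with N = T = 20, which gives a statistic of about 0.13 and no evidence against the null.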
V. MONTE CARLO SIMULATIONS

The ultimate goal of this Monte Carlo study is to compare the sample
properties of OLS, FMOLS, and DOLS for two models: a homogeneous panel
and a heterogeneous panel. The simulations were performed on a Sun
SparcServer 1000 and an Ultra Enterprise 3000. GAUSS 3.2.31 and COINT 2.0
were used to perform the simulations. Random numbers for error terms,
$(u_{it}^{*}, \varepsilon_{it}^{*})$, for Sections 5 A, B and D, were generated by the GAUSS procedure
RNDNS. At each replication, we generated an N(T + 1,000) length of random
numbers and then split it into N series so that each series had the same mean
and variance. The first 1,000 observations were discarded for each series. $\{u_{it}^{*}\}$
and $\{\varepsilon_{it}^{*}\}$ were constructed with $u_{i0} = 0$ and $\varepsilon_{i0} = 0$.
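In order to compare the performance of the OLS, FMOLS, and DOLS
estimators, the following data generating process (DGP) was used:
$$y_{it} = \alpha_i + x_{it}\beta + u_{it} \tag{30}$$
and
$$x_{it} = x_{it-1} + \varepsilon_{it},$$
where $(u_{it}, \varepsilon_{it})'$ follows an ARMA(1, 1) process:
$$\begin{pmatrix} u_{it} \\ \varepsilon_{it} \end{pmatrix} = \begin{pmatrix} 0.5 & 0 \\ 0 & 0.5 \end{pmatrix} \begin{pmatrix} u_{it-1} \\ \varepsilon_{it-1} \end{pmatrix} + \begin{pmatrix} u_{it}^{*} \\ \varepsilon_{it}^{*} \end{pmatrix} + \begin{pmatrix} 0.3 & -0.4 \\ \theta_{21} & 0.6 \end{pmatrix} \begin{pmatrix} u_{it-1}^{*} \\ \varepsilon_{it-1}^{*} \end{pmatrix}$$
with
$$\begin{pmatrix} u_{it}^{*} \\ \varepsilon_{it}^{*} \end{pmatrix} \overset{iid}{\sim} N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \sigma_{21} \\ \sigma_{21} & 1 \end{pmatrix} \right).$$
The design in (30) nests several important special cases. First, when
$\begin{pmatrix} 0.5 & 0 \\ 0 & 0.5 \end{pmatrix}$ is replaced by $\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$
and $\sigma_{21}$ is constant across $i$, then the DGP becomes the
homogeneous panel in Section 5A. Second, when
$\begin{pmatrix} 0.5 & 0 \\ 0 & 0.5 \end{pmatrix}$ is replaced by $\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$,
and $\theta_{21}$ and $\sigma_{21}$ are random variables different across $i$, then the DGP
is the heterogeneous panel in Section 5D.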

A. Homogeneous Panel
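To compare the performance of the OLS, FMOLS, and DOLS estimators for
the homogeneous panel we conducted Monte Carlo experiments based on a
design similar to that of Phillips & Hansen (1990) and Phillips & Loretan
(1991):
$$y_{it} = \alpha_i + x_{it}\beta + u_{it}$$
and
$$x_{it} = x_{it-1} + \varepsilon_{it}$$
for $i = 1, \ldots, N$, $t = 1, \ldots, T$, where
$$\begin{pmatrix} u_{it} \\ \varepsilon_{it} \end{pmatrix} = \begin{pmatrix} u_{it}^{*} \\ \varepsilon_{it}^{*} \end{pmatrix} + \begin{pmatrix} 0.3 & -0.4 \\ \theta_{21} & 0.6 \end{pmatrix} \begin{pmatrix} u_{it-1}^{*} \\ \varepsilon_{it-1}^{*} \end{pmatrix} \tag{31}$$
with
$$\begin{pmatrix} u_{it}^{*} \\ \varepsilon_{it}^{*} \end{pmatrix} \overset{iid}{\sim} N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \sigma_{21} \\ \sigma_{21} & 1 \end{pmatrix} \right).$$
We generated $\alpha_i$ from a uniform distribution, $U[0, 10]$, and set $\beta = 2$. From
Theorems 1–3 we know that the asymptotic results depend upon variances and
covariances of the errors $u_{it}$ and $\varepsilon_{it}$. The design in (31) is a good one since the
endogeneity of the system is controlled by only two parameters, $\theta_{21}$ and $\sigma_{21}$. We
allowed $\theta_{21}$ and $\sigma_{21}$ to vary and considered values of {0.8, 0.4, 0.0, −0.8} for
$\theta_{21}$ and {−0.8, −0.4, 0.4} for $\sigma_{21}$.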
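One replication of the homogeneous design can be sketched as follows. Several details are assumptions of this sketch and should be checked against the original design: the moving-average matrix is taken to have first row (0.3, `ma12`) with `ma12 = -0.4` and second row ($\theta_{21}$, 0.6), the correlated normal innovations are built from two independent draws, and no burn-in period is discarded. The function name `simulate_homogeneous_panel` and its arguments are hypothetical.

```python
import math
import random


def simulate_homogeneous_panel(n_units, n_periods, theta21, sigma21,
                               beta=2.0, ma12=-0.4, seed=0):
    """Simulate x[i][t] and y[i][t], t = 0..n_periods, with MA(1) errors.

    ma12 is the (1, 2) element of the moving-average matrix and is an
    assumed value. sigma21 must lie in [-1, 1].
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_units):
        alpha = rng.uniform(0.0, 10.0)  # intercept drawn from U[0, 10]
        u_star_prev = 0.0               # lagged innovations start at zero
        e_star_prev = 0.0
        x = [0.0]
        y = [alpha]
        for _ in range(n_periods):
            u_star = rng.gauss(0.0, 1.0)
            # Unit variance innovation with correlation sigma21 to u_star.
            e_star = sigma21 * u_star + math.sqrt(1.0 - sigma21 ** 2) * rng.gauss(0.0, 1.0)
            u = u_star + 0.3 * u_star_prev + ma12 * e_star_prev
            e = e_star + theta21 * u_star_prev + 0.6 * e_star_prev
            x.append(x[-1] + e)
            y.append(alpha + beta * x[-1] + u)
            u_star_prev, e_star_prev = u_star, e_star
        xs.append(x)
        ys.append(y)
    return xs, ys


x_sim, y_sim = simulate_homogeneous_panel(20, 20, theta21=0.4, sigma21=-0.4)
```

Looping this function over replications and over the grid of parameter values, and applying each estimator to the simulated data, would give the kind of bias and standard deviation summaries reported in the tables.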
The estimate of the long-run covariance matrix in (15) was obtained by using
the procedure KERNEL in COINT 2.0 with a Bartlett window. The lag
truncation number was set arbitrarily at five. Results with other kernels, such as
Parzen and quadratic spectral kernels, are not reported, because no essential
differences were found for most cases.
Next, we recorded the results from our Monte Carlo experiments that
examined the finite-sample properties of the OLS estimator, $\hat\beta_{OLS}$; the FMOLS
estimator, $\hat\beta_{FM}$; and the DOLS estimator, $\hat\beta_D$. The results we report are based on
10,000 replications and are summarized in Tables 1–4 and Figures 1–8. The
FMOLS estimator was obtained by using a Bartlett window of lag length five
as in (15). Four lags and two leads were used for the DOLS estimator.
Table 1 reports the Monte Carlo means and standard deviations (in
parentheses) of $(\hat\beta_{OLS} - \beta)$, $(\hat\beta_{FM} - \beta)$, and $(\hat\beta_D - \beta)$ for sample sizes
T = N = (20, 40, 60). The biases of the OLS estimator, $\hat\beta_{OLS}$, decrease at a rate of
T. For example, with 21 = 0.8 and 21 = 0.8, the bias at T = 20 is 0.201 and at
T = 40 is 0.104. Also, the biases increase in 21 (with 21 > 0) and decrease in
21.
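As a rough check of the rate-T claim, one can compare the two OLS biases quoted in this paragraph (magnitudes as printed). If the bias is proportional to 1/T, doubling T from 20 to 40 should roughly halve it:

$$\frac{0.201}{0.104} \approx 1.93 \approx \frac{40}{20} = 2.$$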
0.176
(0.044)
0.099
(0.017)
0.069
(0.009)
0.064
(0.025)
0.038
(0.009)
0.027
(0.005)
0.002
(0.015)
0.005
(0.005)
0.004
(0.003)
0.038
(0.012)
0.018
(0.004)
0.011
(0.002)
0.201
(0.049)
0.104
(0.019)
0.070
(0.010)
0.132
(0.038)
0.066
(0.014)
0.044
(0.007)
0.079
(0.027)
0.039
(0.009)
0.026
(0.005)
0.029
(0.016)
0.015
(0.006)
0.009
(0.003)
21 = 0.8
FM
0.007
(0.008)
0.003
(0.002)
0.002
(0.001)
0.001
(0.017)
0.001
(0.005)
0.000
(0.003)
0.001
(0.027)
0.001
(0.027)
0.000
(0.005)
0.001
(0.040)
0.000
(0.013)
0.000
(0.007)
D
0.019
(0.017)
0.009
(0.006)
0.007
(0.003)
0.059
(0.026)
0.029
(0.009)
0.019
(0.005)
0.082
(0.030)
0.041
(0.011)
0.027
(0.006)
0.097
(0.032)
0.049
(0.012)
0.033
(0.007)
OLS
0.036
(0.015)
0.018
(0.005)
0.012
(0.002)
0.019
(0.022)
0.012
(0.008)
0.009
(0.004)
0.068
(0.029)
0.038
(0.011)
0.026
(0.006)
0.113
(0.035)
0.062
(0.013)
0.042
(0.007)
21 = 0.4
FM
0.007
(0.014)
0.003
(0.004)
0.002
(0.002)
0.002
(0.026)
0.001
(0.008)
0.001
(0.008)
0.002
(0.031)
0.001
(0.009)
0.001
(0.005)
0.002
(0.033)
0.001
(0.011)
0.000
(0.006)
D
0.114
(0.034)
0.057
(0.012)
0.038
(0.007)
0.005
(0.016)
0.002
(0.006)
0.001
(0.003)
0.014
(0.013)
0.007
(0.005)
0.005
(0.002)
0.022
(0.011)
0.011
(0.004)
0.007
(0.002)
OLS
0.012
(0.028)
0.011
(0.009)
0.010
(0.005)
0.069
(0.021)
0.035
(0.007)
0.023
(0.004)
0.073
(0.018)
0.037
(0.006)
0.025
(0.003)
0.069
(0.016)
0.036
(0.006)
0.024
(0.003)
21 = 0.8
FM
Means Biases and Standard Deviations of OLS, FMOLS, and DOLS Estimators
0.000
(0.031)
0.000
(0.009)
0.000
(0.005)
0.006
(0.017)
0.003
(0.005)
0.002
(0.003)
0.003
(0.013)
0.001
(0.004)
0.001
(0.002)
0.009
(0.009)
0.004
(0.003)
0.003
(0.002)
D
Note: (a) N = T. (b) A lag length 5 of the Bartlett windows is used for the FMOLS estimator. (c) 4 lags and 2 leads are used for the DOLS estimator.
T = 60
T = 40
21 = 0.8
T = 20
T = 60
T = 40
21 = 0.0
T = 20
T = 60
T = 40
21 = 0.4
T = 20
T = 60
T = 40
21 = 0.8
T = 20
OLS
Table 1.
194
Panel Cointegration
Table 2.
195
Means Biases and Standard Deviations of OLS, FMOLS, and DOLS

Estimators for Different N and T
(N,T)
OLS
FM(5)
FM(2)
D(4,2)
D(2,1)
(1,20)
0.135
(0.184)
0.070
(0.093)
0.047
(0.063)
0.024
(0.032)
0.082
(0.030)
0.042
(0.016)
0.028
(0.010)
0.014
(0.005)
0.081
(0.022)
0.041
(0.011)
0.028
(0.007)
0.014
(0.004)
0.080
(0.017)
0.041
(0.009)
0.027
(0.006)
0.014
(0.003)
0.079
(0.012)
0.041
(0.006)
0.027
(0.004)
0.014
(0.002)
0.104
(0.196)
0.059
(0.012)
0.041
(0.064)
0.023
(0.031)
0.068
(0.029)
0.039
(0.015)
0.027
(0.010)
0.014
(0.005)
0.066
(0.021)
0.038
(0.011)
0.026
(0.007)
0.014
(0.004)
0.067
(0.017)
0.038
(0.009)
0.026
(0.006)
0.014
(0.003)
0.066
(0.012)
0.037
(0.006)
0.026
(0.004)
0.014
(0.002)
0.122
(0.189)
0.065
(0.092)
0.043
(0.061)
0.022
(0.031)
0.075
(0.029)
0.039
(0.015)
0.026
(0.009)
0.013
(0.005)
0.073
(0.021)
0.038
(0.011)
0.025
(0.007)
0.013
(0.003)
0.073
(0.017)
0.038
(0.009)
0.025
(0.006)
0.012
(0.003)
0.072
(0.012)
0.037
(0.006)
0.025
(0.004)
0.013
(0.002)
0.007
(0.297)
0.001
(0.106)
0.001
(0.064)
0.001
(0.029)
0.002
(0.031)
0.001
(0.015)
0.000
(0.009)
0.000
(0.005)
0.001
(0.022)
0.001
(0.009)
0.001
(0.007)
0.000
(0.003)
0.002
(0.018)
0.001
(0.008)
0.001
(0.005)
0.000
(0.003)
0.002
(0.012)
0.001
(0.006)
0.001
(0.004)
0.000
(0.002)
0.031
(0.211)
0.015
(0.090)
0.009
(0.057)
0.004
(0.027)
0.017
(0.028)
0.008
(0.014)
0.006
(0.009)
0.003
(0.004)
0.017
(0.019)
0.008
(0.009)
0.005
(0.006)
0.003
(0.004)
0.016
(0.016)
0.008
(0.008)
0.005
(0.005)
0.003
(0.003)
0.016
(0.011)
0.008
(0.005)
0.005
(0.004)
0.003
(0.002)
(1,40)
(1,60)
(1,120)
(20,20)
(20,40)
(20,60)
(20,120)
(40,20)
(40,40)
(40,60)
(40,120)
(60,20)
(60,40)
(60,60)
(60,120)
(120,20)
(120,40)
(120,60)
(120,120)
Note: (a) A lag length 5 and 2 of the Bartlett windows are used for the FMOLS(5) and FMOLS(2)
estimators. (b) 4 lags and 2 leads and 2 lags and 1 lead are used for the DOLS(4,2) and DOLS(2,1)
estimators. (c) 21 = 0.4 and 21 = 0.4.
5.594
(1.330)
8.435
(1.382)
10.749
(1.439)
2.377
(1.042)
4.558
(1.071)
6.012
(1.109)
0.145
(0.919)
0.796
(0.888)
1.294
(0.899)
3.694
(1.201)
5.509
(1.243)
7.130
(1.281)
7.247
(1.526)
10.047
(1.484)
12.250
(1.468)
5.425
(1.340)
7.507
(1.302)
9.161
(1.287)
3.927
(1.200)
5.453
(1.173)
6.674
(1.161)
2.067
(1.066)
2.898
(1.050)
3.574
(1.040)
21 = 0.8
FMOLS
0.635
(0.732)
0.948
(0.712)
1.236
(0.737)
0.054
(0.993)
0.001
(0.926)
0.147
(0.927)
0.046
(1.132)
0.017
(1.023)
0.009
(1.009)
0.047
(1.281)
0.004
(1.119)
0.004
(1.093)
DOLS
1.229
(1.084)
1.758
(1.067)
2.188
(1.061)
2.944
(1.241)
4.134
(1.229)
5.070
(1.229)
3.905
(1.334)
5.462
(1.325)
6.676
(1.329)
4.650
(1.393)
6.503
(1.389)
7.937
(1.397)
OLS
2.893
(1.214)
4.041
(1.161)
4.983
(1.143)
1.006
(1.180)
1.684
(1.086)
2.198
(1.065)
3.017
(1.282)
4.401
(1.205)
5.489
(1.197)
4.823
(1.414)
6.833
(1.366)
8.429
(1.377)
21 = 0.4
FMOLS
0.530
(1.107)
0.741
(0.984)
0.913
(0.964)
0.096
(1.342)
0.168
(1.134)
0.199
(1.088)
0.124
(1.402)
0.104
(1.168)
0.126
(1.118)
0.086
(1.423)
0.069
(1.187)
0.084
(1.135)
DOLS
4.495
(1.123)
6.255
(1.088)
7.630
(1.092)
0.277
(0.897)
0.334
(0.885)
0.405
(0.891)
0.925
(0.867)
1.336
(0.856)
1.626
(0.859)
1.758
(0.859)
2.491
(0.847)
3.030
(0.847)
OLS
Means Biases and Standard Deviations of t-statistics
0.542
(1.209)
1.349
(1.103)
1.975
(1.087)
5.198
(1.503)
7.086
(1.441)
8.556
(1.395)
6.864
(1.642)
9.744
(1.665)
11.966
(1.644)
7.927
(1.719)
11.584
(1.826)
14.402
(1.840)
21 = 0.8
FMOLS
0.013
(1.350)
0.002
(1.160)
0.003
(1.109)
0.439
(1.277)
0.547
(1.104)
0.663
(1.047)
0.277
(1.203)
0.362
(1.054)
0.408
(0.999)
1.049
(1.122)
1.386
(1.006)
1.633
(0.959)
DOLS
Note: (a) N = T. (b) A lag length 5 of the Bartlett windows is used for the FMOLS estimator. (c) 4 lags and 2 leads are used for the DOLS estimator.
T = 60
T = 40
21 = 0.8
T = 20
T = 60
T = 40
21 = 0.0
T = 20
T = 60
T = 40
21 = 0.4
T = 20
T = 60
T = 40
21 = 0.8
T = 20
OLS
Table 3.
196
Panel Cointegration
Table 4.
197
Means Biases and Standard Deviations of t-statistics for Different

N and T
(N,T)
OLS
FMOLS(5)
FMOLS(2)
DOLS(4,2)
DOLS(2,1)
(1,20)
1.169
(1.497)
1.116
(1.380)
1.090
(1.357)
1.092
(1.333)
3.905
(1.334)
3.934
(1.307)
3.861
(1.306)
3.893
(1.312)
5.439
(1.347)
5.462
(1.325)
5.457
(1.328)
5.469
(1.296)
6.677
(1.329)
6.699
(1.323)
6.676
(1.329)
6.677
(1.311)
9.407
(1.350)
9.418
(1.313)
9.411
(1.310)
9.408
(1.315)
1.264
(2.326)
1.169
(1.805)
1.162
(1.692)
1.239
(1.165)
3.017
(1.281)
3.202
(1.206)
3.202
(1.150)
3.247
(1.149)
4.163
(1.269)
4.401
(1.205)
4.506
(1.199)
4.647
(1.190)
5.097
(1.258)
5.384
(1.204)
5.489
(1.197)
5.656
(1.196)
7.153
(1.262)
7.753
(1.171)
7.717
(1.182)
7.932
(1.195)
1.334
(2.031)
1.232
(1.738)
1.195
(1.676)
1.217
(1.652)
3.156
(1.230)
3.169
(1.200)
3.111
(1.191)
3.141
(1.209)
4.342
(1.226)
4.344
(1.197)
4.339
(1.192)
4.356
(1.176)
5.314
(1.208)
5.309
(1.192)
5.289
(1.191)
5.299
(1.182)
7.446
(1.215)
7.753
(1.171)
7.429
(1.174)
7.432
(1.181)
0.304
(3.224)
0.113
(2.086)
0.071
(1.778)
0.056
(1.531)
0.124
(1.402)
0.114
(1.186)
0.053
(1.122)
0.073
(1.078)
0.088
(1.358)
0.104
(1.168)
0.098
(1.121)
0.106
(1.050)
0.169
(1.361)
0.162
(1.169)
0.126
(1.118)
0.115
(1.056)
0.220
(1.348)
0.193
(1.157)
0.177
(1.093)
0.152
(1.057)
0.232
(2.109)
0.258
(1.689)
0.254
(1.554)
0.234
(1.448)
0.695
(1.184)
0.634
(1.099)
0.677
(1.079)
0.642
(1.061)
1.008
(1.169)
0.928
(1.092)
0.913
(1.081)
0.879
(1.033)
1.179
(1.162)
1.097
(1.094)
1.106
(1.074)
1.083
(1.041)
1.662
(1.163)
1.565
(1.085)
1.549
(1.053)
1.530
(1.040)
(1,40)
(1,60)
(1,120)
(20,20)
(20,40)
(20,60)
(20,120)
(40,20)
(40,40)
(40,60)
(40,120)
(60,20)
(60,40)
(60,60)
(60,120)
(120,20)
(120,40)
(120,60)
(120,120)
estimators. (c) 21 = 0.4 and 21 = 0.4.
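Fig. 1. Distribution of biases of Estimators with N = 40, T = 20.
Fig. 2. Distribution of t-statistics with N = 40, T = 20.
Fig. 3. Distribution of biases of Estimators with N = 40, T = 40.
Fig. 4. Distribution of t-statistics with N = 40, T = 40.
Fig. 5. Distribution of biases of Estimators with N = 40, T = 60.
Fig. 6. Distribution of t-statistics with N = 40, T = 60.
Fig. 7. Distribution of biases of Estimators with N = 40, T = 120.
Fig. 8. Distribution of t-statistics with N = 40, T = 120.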
While we expected the OLS estimator to be biased, we expected the FMOLS

estimator to produce much better estimates. However, it is noticeable that the
FMOLS estimator has a downward bias when 21 0 and an upward bias when
21 < 0. In general, the FMOLS estimator, $\hat{\beta}_{FM}$, presents the same degree of difficulty with bias as does the OLS estimator, $\hat{\beta}_{OLS}$. For example, while the FMOLS estimator, $\hat{\beta}_{FM}$, reduces the bias substantially and outperforms $\hat{\beta}_{OLS}$ when $\theta_{21} > 0$ and $\sigma_{21} < 0$, the opposite is true when $\theta_{21} > 0$ and $\sigma_{21} > 0$. Likewise, when $\theta_{21} = -0.8$, $\hat{\beta}_{FM}$ is less biased than $\hat{\beta}_{OLS}$ for values of $\sigma_{21} = 0.8$. Yet, for values of $\sigma_{21} = 0.4$, the bias in $\hat{\beta}_{OLS}$ is less than the bias in $\hat{\beta}_{FM}$. There seems to be little to choose between $\hat{\beta}_{OLS}$ and $\hat{\beta}_{FM}$ when $\theta_{21} < 0$. This is probably due to the failure of the non-parametric correction procedure in the presence of a negative serial correlation of the errors, i.e. a negative MA value, $\theta_{21} < 0$. Finally, for the cases where $\theta_{21} = 0.0$, $\hat{\beta}_{FM}$ outperforms $\hat{\beta}_{OLS}$ when $\sigma_{21} < 0$. On the other hand, $\hat{\beta}_{FM}$ is more biased than $\hat{\beta}_{OLS}$ when $\sigma_{21} > 0$.
In contrast, the results in Table 1 show that the DOLS, $\hat{\beta}_{D}$, is distinctly
superior to the OLS and FMOLS estimators for all cases in terms of the mean
biases. It was noticeable that the FMOLS leads to a significant bias. Clearly, the
DOLS outperformed both the OLS and FMOLS estimators. The FMOLS
estimator is also complicated by the dependence of the correction in (11) and
(12) upon the preliminary estimator (here we use OLS), which may be biased
in finite samples. The DOLS differs from the FMOLS estimator in that the
DOLS requires no initial estimation and no non-parametric correction.
It is important to know the effects of the variations in panel dimensions on the results, since actual panel data have a wide variety of cross-section and time-series dimensions. Table 2 considers 20 different combinations of N and T, with N ranging from 1 to 120 and T from 20 to 120, with $\theta_{21} = 0.4$ and $\sigma_{21} = 0.4$. First, we notice that the cross-section dimension has a significant effect on the biases of $\hat{\beta}_{OLS}$, $\hat{\beta}_{FM}$, and $\hat{\beta}_{D}$ when N is increased from 1 to 20. However, when N is increased from 20 to 40 and beyond, there is little effect on the biases of $\hat{\beta}_{OLS}$, $\hat{\beta}_{FM}$, and $\hat{\beta}_{D}$. From this it seems that in practice the T dimension must exceed the N dimension, especially for the OLS and FMOLS estimators, in order to get a good approximation of the limiting distributions of the estimators. For example, for each of the estimators in Table 2, the reported bias is substantially less for (T = 120, N = 40) than it is for either (T = 40, N = 40) or (T = 40, N = 120). The results in Table 2 again confirm the superiority of the DOLS. The largest bias in the DOLS with four lags and two leads, DOLS(4, 2), is less than or equal to 0.02 for all cases except at N = 1 and T = 20, which can be compared with a simulation standard error (in parentheses) that is less than
0.007 when N 20 and, T 60, confirming the accuracy of the DOLS(4, 2). The
estimates from DOLS with two lags and one lead, DOLS(2, 1), start off slightly biased at N = 1 and T = 20, and converge to an almost unbiased coefficient estimate at N = 20 and T = 40. The biases of DOLS(2, 1) move in the opposite direction to those of DOLS(4, 2).
Figures 1, 3, 5 and 7 display estimated pdfs for the estimators for $\theta_{21} = 0.4$ and $\sigma_{21} = 0.4$ with N = 40 (T = 20 in Figure 1, T = 40 in Figure 3, T = 60 in Figure 5 and T = 120 in Figure 7). In Figure 1, with N = 40 and T = 20, the DOLS is much better centered than the OLS and FMOLS. In Figures 3, 5 and 7, the biases of the OLS and FMOLS are reduced as T increases, but the DOLS still dominates the OLS and FMOLS.
Monte Carlo means and standard deviations of the t-statistic, $t_{\beta = \beta_0}$, are given in Table 3. Here, the OLS t-statistic is the conventional t-statistic as printed by standard statistical packages, and the FMOLS and DOLS t-statistics are those of the corresponding estimators. With all values of $\theta_{21}$ and $\sigma_{21}$, the DOLS(4, 2) t-statistic is well approximated by a standard N(0, 1), as suggested by the asymptotic results. The DOLS(4, 2) t-statistic is much closer to the standard normal density than the OLS t-statistic and the FMOLS t-statistic. When $\theta_{21} > 0$ and $\sigma_{21} < 0$, the OLS t-statistic is more heavily biased than the FMOLS t-statistic. Again, when $\theta_{21} > 0$ and $\sigma_{21} > 0$, the opposite is true. Even when $\theta_{21} = 0$, the FMOLS t-statistic is not well approximated by a standard N(0, 1). The OLS t-statistic performs better than
the FMOLS t-statistic when 21 = 0.8 and 21 > 0 and when 21 0.4 and 21 =
0.8, but not in other cases. The FMOLS t-statistic in general does not perform
better than the OLS t-statistic.
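The "conventional" OLS t-statistic referred to above can be illustrated with a short sketch for a scalar regressor. It assumes the usual homoskedastic standard error from the within (individual-demeaned) regression; the function name and the degrees-of-freedom convention are choices made here for illustration.

```python
import numpy as np

def within_ols_tstat(y, x, beta0):
    """Within-OLS slope and conventional t-statistic for H0: beta = beta0.

    y, x : (N, T) arrays. Individual means are removed before the regression.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    N, T = x.shape
    yd = y - y.mean(axis=1, keepdims=True)
    xd = x - x.mean(axis=1, keepdims=True)
    sxx = np.sum(xd * xd)
    beta_hat = np.sum(xd * yd) / sxx
    resid = yd - beta_hat * xd
    # Degrees of freedom: N*T observations, N intercepts, one slope.
    s2 = np.sum(resid ** 2) / (N * T - N - 1)
    se = np.sqrt(s2 / sxx)
    return beta_hat, (beta_hat - beta0) / se
```

In a Monte Carlo study, the means and standard deviations reported in a table like Table 3 would be `np.mean` and `np.std` of the second return value across replications.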
Table 4 shows that both the OLS t-statistic and the FMOLS t-statistic become more negatively biased as the dimension of cross-section N increases. The heavily negative biases of the FMOLS t-statistic in Tables 3–4 again indicate the poor performance of the FMOLS estimator. For the DOLS(4, 2), the biases decrease rapidly and the standard errors converge to 1.0 as T increases. Similar to Table 2, we observe from Table 4 that for the DOLS t-statistic the T dimension is more important than the N dimension in reducing the biases of the t-statistics. However, the improvement of the DOLS t-statistic is rather marginal as T increases.
Figures 2, 4, 6 and 8 display estimated pdfs for the t-statistics for $\theta_{21} = 0.4$ and $\sigma_{21} = 0.4$ with N = 40 (T = 20 in Figure 2, T = 40 in Figure 4, T = 60 in Figure 6 and T = 120 in Figure 8). The figures show clearly that the DOLS t-statistic is well approximated by a standard N(0, 1), especially as T increases. From the results in Tables 2 and 4 and Figures 1–8 we note that the sequential limit theory approximates the limiting distributions of the DOLS and its t-statistic very well.
It is known that when the length of the time series is short the estimate $\hat{\Omega}$ in (15) may be sensitive to the length of the bandwidth. In Tables 2 and 4, we first investigate the sensitivity of the FMOLS estimator with respect to the choice of length of the bandwidth. We extend the experiments by changing the lag length from 5 to 2 for a Bartlett window. Overall, the results show that changing the lag length from 5 to 2 does not lead to substantial changes in biases for the FMOLS estimator and its t-statistic. However, the biases of the DOLS estimator and its t-statistic are reduced substantially when the lags and leads are changed from (2, 1) to (4, 2), as predicted from Theorem 3. The results from Tables 2 and 4 show that the DOLS method gives different estimates of $\beta$ and the t-statistic depending on the number of lags and leads we choose. This seems to be a drawback of the DOLS estimator. Further research is needed on how to choose the lags and leads for the DOLS estimator in the panel setting.
B. ARMA(1, 1) Error Terms
In this section, we look at simulations where, instead of the errors being generated by an MA(1) process, as in (31), the errors are generated by an ARMA(1, 1) process, as in (30). One may object that the MA(1) specification in (31) is unfair to the FMOLS estimator. One of the reasons why the performance of the DOLS is much better than that of the FMOLS lies in the simulation design in (31), which assumes that the error terms are MA(1) processes. If $(u_{it}, \varepsilon_{it})$ is an MA(1) process, then $u_{it}$ can be written exactly with three terms, $\varepsilon_{it-1}$, $\varepsilon_{it}$, and $\varepsilon_{it+1}$, and no lag truncation approximation is required for the DOLS.
Tables 5 and 6 report the performance of OLS, FMOLS, and DOLS and their t-statistics when the errors are generated by an ARMA(1, 1) process. Tables 5 and 6 show that the FMOLS estimator and its t-statistic are less biased than the OLS estimator in most cases but are outperformed by the DOLS. Again, when
21 0.0 and 21 = 0.8 the FMOLS estimator and its t-statistic suffer from severe
biases. On the other hand, we observe that DOLS shows less improvement over OLS and FMOLS than it does in Tables 1 and 3. However, the good performance of DOLS may disappear for higher-order ARMA(p, q) error processes.
C. Non-normal Errors
In this section, we conduct an experiment where the error terms are non-normal. The DGP is similar to that of Gonzalo (1994):
0.101
(0.038)
0.052
(0.014)
0.035
(0.008)
0.039
(0.024)
0.020
(0.008)
0.013
(0.004)
0.006
(0.015)
0.003
(0.005)
0.002
(0.003)
0.017
(0.009)
0.008
(0.003)
0.005
(0.001)
0.110
(0.042)
0.052
(0.015)
0.034
(0.008)
0.073
(0.032)
0.034
(0.011)
0.022
(0.006)
0.046
(0.025)
0.021
(0.009)
0.014
(0.005)
0.020
(0.016)
0.008
(0.005)
0.006
(0.003)
21 = 0.8
FM
0.002
(0.007)
0.002
(0.002)
0.001
(0.001)
0.001
(0.015)
0.000
(0.005)
0.001
(0.003)
0.001
(0.024)
0.000
(0.008)
0.000
(0.004)
0.003
(0.037)
0.001
(0.012)
0.000
(0.007)
D
0.016
(0.017)
0.007
(0.006)
0.005
(0.003)
0.035
(0.025)
0.016
(0.008)
0.011
(0.005)
0.045
(0.028)
0.021
(0.010)
0.013
(0.005)
0.049
(0.029)
0.024
(0.010)
0.015
(0.006)
OLS
0.017
(0.013)
0.008
(0.004)
0.005
(0.002)
0.013
(0.022)
0.006
(0.007)
0.004
(0.004)
0.038
(0.027)
0.019
(0.009)
0.012
(0.005)
0.062
(0.020)
0.031
(0.011)
0.021
(0.006)
21 = 0.4
FM
0.003
(0.012)
0.001
(0.004)
0.001
(0.002)
0.001
(0.023)
0.001
(0.008)
0.001
(0.004)
0.000
(0.028)
0.000
(0.009)
0.000
(0.005)
0.000
(0.030)
0.000
(0.010)
0.000
(0.005)
D
0.035
(0.024)
0.016
(0.009)
0.011
(0.005)
0.001
(0.016)
0.001
(0.006)
0.000
(0.003)
0.006
(0.013)
0.002
(0.004)
0.002
(0.002)
0.009
(0.011)
0.004
(0.004)
0.003
(0.002)
OLS
0.012
(0.024)
0.007
(0.009)
0.005
(0.005)
0.034
(0.016)
0.016
(0.005)
0.010
(0.003)
0.037
(0.014)
0.017
(0.004)
0.012
(0.002)
0.036
(0.012)
0.017
(0.004)
0.012
(0.002)
21 = 0.8
FM
0.000
(0.031)
0.000
(0.009)
0.000
(0.005)
0.003
(0.015)
0.001
(0.005)
0.002
(0.003)
0.001
(0.012)
0.001
(0.004)
0.000
(0.002)
0.003
(0.009)
0.001
(0.003)
0.001
(0.002)
D
Note: (a) N = T. (b) A lag length 5 of the Bartlett windows is used for the FMOLS estimator. (c) 4 lags and 2 leads are used for the DOLS
estimator. (d) The error terms are generated by an ARMA(1,1) process from equation (30).
T = 60
T = 40
21 = 0.8
T = 20
T = 60
T = 40
21 = 0.0
T = 20
T = 60
T = 40
21 = 0.4
T = 20
T = 60
T = 40
21 = 0.8
T = 20
OLS
Table 5.
Panel Cointegration
209
3.569
(1.323)
4.601
(1.219)
5.22
(1.195)
1.857
(1.106)
2.576
(1.044)
3.179
(1.036)
0.353
(0.952)
0.624
(0.897)
0.827
(0.904)
1.733
(0.933)
2.511
(0.871)
3.270
(0.897)
5.316
(1.929)
7.013
(1.903)
8.437
(1.899)
4.152
(1.762)
5.424
(1.733)
6.521
(1.721)
3.184
(1.644)
4.120
(1.616)
4.952
(1.599)
1.956
(1.529)
2.471
(1.507)
2.966
(1.484)
0.214
(0.663)
0.317
(0.664)
0.428
(0.694)
0.034
(0.956)
0.047
(0.909)
0.058
(0.913)
0.056
(1.132)
0.045
(1.027)
0.034
(1.004)
0.119
(1.290)
0.090
(1.119)
0.068
(1.077)
1.496
(1.589)
1.888
(1.578)
2.267
(1.571)
2.538
(1.769)
3.327
(1.771)
4.131
(1.746)
3.064
(1.867)
4.069
(1.880)
4.899
(1.898)
3.411
(1.924)
4.583
(1.949)
5.523
(1.969)
OLS
1.429
(1.015)
1.917
(1.010)
2.237
(0.999)
0.732
(1.226)
0.967
(1.085)
1.141
(1.021)
1.877
(1.314)
2.346
(1.149)
2.779
(1.114)
2.912
(1.390)
3.580
(1.216)
4.206
(1.178)
0.221
(1.052)
0.294
(0.956)
0.363
(0.941)
0.038
(1.313)
0.075
(1.116)
0.206
(1.118)
0.025
(1.388)
0.011
(1.152)
0.027
(1.096)
0.006
(1.417)
0.009
(1.166)
0.006
(1.111)
DOLS
2.315
(1.577)
3.089
(1.644)
3.736
(1.676)
0.047
(1.498)
0.194
(1.528)
0.064
(1.498)
0.705
(1.454)
1.099
(1.479)
1.343
(1.473)
1.158
(1.426)
1.723
(1.445)
2.097
(1.435)
OLS
21 = 0.4
FMOLS
DOLS
21 = 0.8
FMOLS
0.564
(1.195)
0.876
(1.088)
1.132
(1.062)
2.825
(1.327)
3.557
(1.194)
4.005
(1.096)
3.858
(1.373)
5.034
(1.268)
6.016
(1.211)
4.589
(1.420)
6.144
(1.343)
7.428
(1.294)
21 = 0.8
FMOLS
0.002
(1.551)
0.005
(1.239)
0.003
(1.155)
0.230
(1.276)
0.212
(1.095)
0.693
(1.094)
0.068
(1.208)
0.134
(1.053)
0.144
(1.014)
0.347
(1.139)
0.505
(1.011)
0.603
(0.978)
DOLS
estimator. (d) The error terms are generated by an ARMA(1,1) process from equation (30).
T = 60
T = 40
21 = 0.8
T = 20
T = 60
T = 40
21 = 0.0
T = 20
T = 60
T = 40
21 = 0.4
T = 20
T = 60
T = 40
21 = 0.8
T = 20
OLS
Table 6.
210
Panel Cointegration
211

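$$\begin{pmatrix} u_{it} \\ \varepsilon_{it} \end{pmatrix} = \begin{pmatrix} u^*_{it} \\ \varepsilon^*_{it} \end{pmatrix} + \begin{pmatrix} 0.3 & 0.4 \\ \theta_{21} & 0.6 \end{pmatrix} \begin{pmatrix} u^*_{it-1} \\ \varepsilon^*_{it-1} \end{pmatrix}, \tag{32}$$
$$u^*_{it} = 0.5\,\varepsilon^*_{it} + (1 - 0.5^2)^{1/2}\, u^{**}_{it},$$
and
$$\varepsilon^*_{it} = \varepsilon^{**}_{it},$$
where $u^{**}_{it}$ and $\varepsilon^{**}_{it}$ are independent exponential random variables with a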
parameter 1. The results from Tables 7–8 show that while the DOLS estimator performs better in terms of the biases, the distribution of the DOLS t-statistic is far from the asymptotic N(0, 1). The standard deviations of the DOLS t-statistic are badly underestimated.
To summarize the results so far, it would appear that the DOLS estimator is the best estimator overall, though the standard error for the DOLS t-statistic shows significant downward bias when the error terms are generated from non-normal distributions.
D. Heterogeneous Panel
In Sections A–C we compare the small sample properties of the OLS, FMOLS, and DOLS estimators and conclude that the DOLS estimator and its t-statistic generally exhibit the least bias. One of the reasons for the poor performance of the FMOLS estimator in the homogeneous panel is that the FMOLS estimator needs to use a kernel estimator for the asymptotic covariance matrix, while the DOLS does not. By contrast, for the heterogeneous panel both DOLS in (20) and OLS in (33) use kernel estimators. Consequently, one may expect that the much better performance of the DOLS estimator in Sections 5A–C is limited to only very specialized cases, e.g. in the homogeneous panel. To test this, we now compare the performance of the OLS, FMOLS, and DOLS estimators for a heterogeneous panel using Monte Carlo experiments similar to those in Section 5A. The DGP is
$$y_{it} = \alpha_i + \beta x_{it} + u_{it}$$
and
$$x_{it} = x_{it-1} + \varepsilon_{it}$$
for i = 1, . . . , N, t = 1, . . . , T, where
0.011
(0.009)
0.003
(0.002)
0.001
(0.001)
0.008
(0.009)
0.005
(0.004)
0.002
(0.002)
0.010
(0.057)
0.002
(0.014)
0.001
(0.007)
0.022
(0.012)
0.006
(0.003)
0.003
(0.001)
0.005
(0.009)
0.001
(0.002)
0.001
(0.001)
0.002
(0.009)
0.002
(0.004)
0.001
(0.002)
0.012
(0.058)
0.003
(0.014)
0.001
(0.007)
0.011
(0.013)
0.003
(0.003)
0.001
(0.001)
= 0.25
FM
0.000
(0.002)
0.000
(0.001)
0.000
(0.000)
0.001
(0.054)
0.000
(0.013)
0.000
(0.006)
0.001
(0.005)
0.000
(0.001)
0.000
(0.001)
0.000
(0.002)
0.000
(0.000)
0.000
(0.000)
D
0.034
(0.020)
0.009
(0.005)
0.004
(0.002)
0.005
(0.017)
0.001
(0.004)
0.001
(0.002)
0.002
(0.009)
0.000
(0.002)
0.000
(0.001)
0.002
(0.006)
0.001
(0.001)
0.000
(0.001)
OLS
0.049
(0.019)
0.014
(0.005)
0.007
(0.002)
0.007
(0.016)
0.002
(0.004)
0.001
(0.002)
0.008
(0.009)
0.002
(0.002)
0.001
(0.001)
0.007
(0.006)
0.002
(0.001)
0.001
(0.001)
= 0.5
FM
0.001
(0.013)
0.000
(0.003)
0.000
(0.001)
0.001
(0.014)
0.000
(0.003)
0.000
(0.002)
0.000
(0.005)
0.000
(0.001)
0.000
(0.001)
0.000
(0.003)
0.028
(0.001)
0.000
(0.000)
D
0.039
(0.016)
0.012
(0.004)
0.005
(0.002)
0.001
(0.005)
0.000
(0.001)
0.000
(0.001)
0.001
(0.004)
0.000
(0.001)
0.000
(0.000)
0.001
(0.003)
0.000
(0.001)
0.000
(0.000)
OLS
0.008
(0.014)
0.003
(0.004)
0.002
(0.002)
0.005
(0.005)
0.001
(0.001)
0.001
(0.001)
0.005
(0.004)
0.001
(0.001)
0.001
(0.000)
0.004
(0.003)
0.001
(0.001)
0.001
(0.000)
=1
FM
0.000
(0.013)
0.000
(0.003)
0.000
(0.001)
0.000
(0.003)
0.000
(0.001)
0.000
(0.000)
0.000
(0.002)
0.000
(0.001)
0.000
(0.000)
0.000
(0.002)
0.000
(0.000)
0.000
(0.000)
D
estimator. (d) The error terms are non-normal.
T = 60
T = 40
21 = 0.8
T = 20
T = 60
T = 40
21 = 0.0
T = 20
T = 60
T = 40
21 = 0.4
T = 20
T = 60
T = 40
21 = 0.8
T = 20
OLS
Table 7.
212
1.248
(0.940)
0.892
(0.599)
0.738
(0.488)
0.884
(0.932)
0.787
(0.599)
0.651
(0.488)
0.164
(0.941)
0.106
(0.616)
0.093
(0.505)
1.714
(0.951)
1.249
(0.605)
1.036
(0.492)
0.699
(1.311)
0.717
(1.253)
0.741
(1.267)
0.259
(1.243)
0.587
(1.250)
0.611
(1.264)
0.275
(1.271)
0.282
(1.231)
0.264
(1.248)
1.104
(1.326)
1.134
(1.262)
1.163
(1.274)
0.000
(0.189)
0.001
(0.126)
0.001
(0.102)
0.014
(0.896)
0.013
(0.579)
0.002
(0.477)
0.071
(0.561)
0.007
(0.230)
0.008
(0.188)
0.006
(0.209)
0.002
(0.139)
0.002
(0.113)
2.286
(1.278)
2.368
(1.208)
2.416
(1.214)
0.340
(1.236)
0.347
(1.186)
0.332
(1.193)
0.259
(1.243)
0.268
(1.189)
0.289
(1.197)
0.472
(1.245)
0.484
(1.191)
0.506
(1.199)
OLS
2.528
(0.976)
1.947
(0.633)
1.637
(0.513)
0.398
(0.941)
0.268
(0.611)
0.226
(0.497)
0.884
(0.932)
0.626
(0.599)
0.519
(0.485)
1.055
(0.931)
0.752
(0.597)
0.623
(0.483)
0.035
(0.650)
0.035
(0.446)
0.033
(0.363)
0.031
(0.784)
0.025
(0.509)
0.013
(0.421)
0.071
(0.561)
0.054
(0.363)
0.052
(0.299)
0.039
(0.421)
0.003
(0.276)
0.028
(0.227)
DOLS
2.749
(1.067)
2.946
(0.992)
3.011
(0.981)
0.145
(1.041)
0.141
(0.982)
0.125
(0.978)
0.199
(1.040)
0.213
(0.981)
0.232
(0.978)
0.406
(1.040)
0.424
(0.981)
0.445
(0.979)
OLS
= 0.5
FMOLS
DOLS
= 0.25
FMOLS
0.539
(0.984)
0.598
(0.672)
0.538
(0.554)
0.961
(0.931)
0.685
(0.594)
0.570
(0.478)
1.152
(0.927)
0.831
(0.589)
0.692
(0.474)
1.265
(0.925)
0.918
(0.588)
0.764
(0.472)
=1
FMOLS
0.026
(0.899)
0.008
(0.624)
0.002
(0.525)
0.066
(0.619)
0.053
(0.407)
0.039
(0.337)
0.019
(0.567)
0.016
(0.368)
0.020
(0.304)
0.118
(0.520)
0.096
(0.336)
0.088
(0.276)
DOLS
estimator. (d) The error terms are non-normal.
T = 60
T = 40
21 = 0.8
T = 20
T = 60
T = 40
21 = 0.0
T = 20
T = 60
T = 40
21 = 0.4
T = 20
T = 60
T = 40
T = 20
OLS
Table 8.

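$$\begin{pmatrix} u_{it} \\ \varepsilon_{it} \end{pmatrix} = \begin{pmatrix} u^*_{it} \\ \varepsilon^*_{it} \end{pmatrix} + \begin{pmatrix} 0.3 & 0.4 \\ \theta_{21} & 0.6 \end{pmatrix} \begin{pmatrix} u^*_{it-1} \\ \varepsilon^*_{it-1} \end{pmatrix}$$
with
$$\begin{pmatrix} u^*_{it} \\ \varepsilon^*_{it} \end{pmatrix} \overset{iid}{\sim} N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \sigma_{21} \\ \sigma_{21} & 1 \end{pmatrix} \right).$$
As in Section A, we generated $\alpha_i$ from a uniform distribution, U[0, 10], and set $\beta = 2$. In this section, we allowed $\theta_{21}$ and $\sigma_{21}$ to be random in order to generate the heterogeneous panel, i.e. both $\theta_{21}$ and $\sigma_{21}$ are generated from a uniform distribution, U[−0.8, 0.8]. We hold these values fixed in simulations. An estimate of $\Omega_i = \Sigma_i + \Gamma_i + \Gamma_i'$, $\hat{\Omega}_i$, was obtained by COINT 2.0 with a Bartlett window. The lag truncation number was set at 5.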
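The heterogeneous design can be sketched as follows: the unit-specific parameters are drawn once from U[−0.8, 0.8] and then reused in every replication, while the errors are redrawn each time. The function names are hypothetical, and the MA(1) matrix entries are copied as printed in the text; this is an illustration under those assumptions rather than the authors' code.

```python
import numpy as np

def draw_heterogeneous_parameters(N, seed=0):
    """Draw unit-specific (theta21_i, sigma21_i) from U[-0.8, 0.8], once."""
    rng = np.random.default_rng(seed)
    theta21 = rng.uniform(-0.8, 0.8, size=N)
    sigma21 = rng.uniform(-0.8, 0.8, size=N)
    return theta21, sigma21

def simulate_heterogeneous_errors(theta21, sigma21, T, rng):
    """Simulate MA(1) errors (u, eps), each (N, T), unit by unit."""
    N = len(theta21)
    u = np.empty((N, T))
    eps = np.empty((N, T))
    for i in range(N):
        cov = np.array([[1.0, sigma21[i]], [sigma21[i], 1.0]])
        Theta = np.array([[0.3, 0.4], [theta21[i], 0.6]])
        # T + 1 innovations so the first lag exists at t = 1.
        e = rng.multivariate_normal(np.zeros(2), cov, size=T + 1)
        w = e[1:] + e[:-1] @ Theta.T
        u[i], eps[i] = w[:, 0], w[:, 1]
    return u, eps
```

Keeping the parameter draw separate from the error simulation mirrors the statement that the parameter values are held fixed across simulations.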
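The three estimators considered are the FMOLS, DOLS, and the OLS, where the OLS is defined as
$$\hat{\beta}^*_{OLS} = \left[ \sum_{i=1}^{N} \sum_{t=1}^{T} (x^{**}_{it} - \bar{x}^{**}_{i})(x^{**}_{it} - \bar{x}^{**}_{i})' \right]^{-1} \left[ \sum_{i=1}^{N} \sum_{t=1}^{T} (x^{**}_{it} - \bar{x}^{**}_{i})(y^{**}_{it}) \right] \tag{33}$$
with $x^{**}_{it} = w_i x_{it}$, $y^{**}_{it} = w_i y_{it}$, $\bar{x}^{**}_{i} = \frac{1}{T} \sum_{t=1}^{T} x^{**}_{it}$, and $w_i = [\hat{\Omega}_i]_{11}$. Two FMOLS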
estimators will be considered, one using the lag length of 5 (FMOLS(5)), the
second using the lag length of 2 (FMOLS(2)). Two DOLS estimators are also
considered: DOLS with four lags and two leads, DOLS(4, 2) and DOLS with
two lags and one lead, DOLS(2, 1). The relatively good performance of the
DOLS estimator in a homogeneous panel can also be observed in Table 9. The
biases of the OLS and FMOLS estimators are substantial. Again, the DOLS
outperforms the OLS and FMOLS. Note from Table 9 that the FMOLS always
has more bias than the OLS for all N and T except when N = 1. The poor
performance of the FMOLS in the heterogenous panels indicates that the
FMOLS in Section 4 is not recommended in practice. A possible reason for the
poor performance of the FMOLS in heterogenous panels is that it has to go
through two non-parametric corrections, as in (22) and (23). Therefore the
failure of the non-parametric correction could be very severe for the FMOLS
estimator in heterogenous panels. Pedroni (1996) proposed several alternative
versions of the FMOLS estimator such as an FMOLS estimator based on the
Panel Cointegration
Table 9.
215
Means Biases and Standard Deviations of OLS, FMOLS, and DOLS

Estimators for Different N and T in a Heterogeneous Panel
(N,T)
OLS
*
FM(5)
*
FM(2)
*
D(4,2)
*
D(2,1)
*
(1,20)
0.102
(0.163)
0.052
(0.079)
0.035
(0.052)
0.018
(0.026)
0.025
(0.032)
0.016
(0.014)
0.012
(0.009)
0.006
(0.004)
0.023
(0.024)
0.015
(0.009)
0.013
(0.006)
0.014
(0.004)
0.023
(0.019)
0.015
(0.008)
0.011
(0.005)
0.006
(0.002)
0.022
(0.014)
0.015
(0.006)
0.011
(0.004)
0.006
(0.002)
0.076
(0.319)
0.006
(0.116)
0.004
(0.066)
0.008
(0.027)
0.069
(0.054)
0.041
(0.019)
0.028
(0.011)
0.014
(0.005)
0.089
(0.038)
0.048
(0.013)
0.032
(0.008)
0.014
(0.004)
0.073
(0.031)
0.042
(0.011)
0.029
(0.006)
0.014
(0.003)
0.075
(0.003)
0.042
(0.008)
0.029
(0.004)
0.014
(0.002)
0.008
(0.212)
0.018
(0.084)
0.014
(0.050)
0.009
(0.023)
0.073
(0.034)
0.035
(0.014)
0.023
(0.009)
0.011
(0.004)
0.083
(0.024)
0.039
(0.009)
0.026
(0.006)
0.012
(0.003)
0.074
(0.019)
0.036
(0.008)
0.023
(0.005)
0.011
(0.002)
0.072
(0.022)
0.036
(0.006)
0.024
(0.004)
0.011
(0.002)
0.011
(0.405)
0.001
(0.121)
0.001
(0.071)
0.000
(0.030)
0.000
(0.054)
0.001
(0.020)
0.000
(0.012)
0.000
(0.005)
0.000
(0.038)
0.001
(0.014)
0.000
(0.009)
0.000
(0.003)
0.001
(0.031)
0.001
(0.011)
0.000
(0.007)
0.000
(0.003)
0.001
(0.022)
0.001
(0.008)
0.000
(0.005)
0.000
(0.002)
0.004
(0.264)
0.006
(0.099)
0.005
(0.061)
0.002
(0.029)
0.006
(0.040)
0.004
(0.017)
0.003
(0.011)
0.002
(0.005)
0.007
(0.028)
0.004
(0.012)
0.003
(0.008)
0.002
(0.004)
0.006
(0.023)
0.004
(0.009)
0.003
(0.006)
0.002
(0.003)
0.016
(0.011)
0.004
(0.007)
0.003
(0.004)
0.002
(0.002)
(1,40)
(1,60)
(1,120)
(20,20)
(20,40)
(20,60)
(20,120)
(40,20)
(40,40)
(40,60)
(40,120)
(60,20)
(60,40)
(60,60)
(60,120)
(120,20)
(120,40)
(120,60)
(120,120)
estimators. (c) 21 ~ U[0.8,0.8] and 21 ~ U[0.8,0.8].
transformation of the estimated residuals and a group-mean based FMOLS

estimator. It would be interesting to study further the issues of estimation and
inference in heterogenous panels. However, it goes beyond the scope of this
chapter.
From Table 10, we note that the DOLS t-statistics tend to have heavier tails
than predicted by the asymptotic distribution theory, though the bias of the
DOLS t-statistic is much lower than those of the OLS and FMOLS t-statistics.
It appears that the DOLS still is the best estimator overall in a heterogeneous
panel.
V. CONCLUSION
This chapter discusses limiting distributions for the OLS, FMOLS, and DOLS estimators in a cointegrated regression. We also investigate the finite sample properties of the OLS, FMOLS, and DOLS estimators. The results from Monte Carlo simulations can be summarized as follows. First, for the homogeneous panel, when the serial correlation parameter, $\theta_{21}$, and the endogeneity parameter, $\sigma_{21}$, are both negative, the OLS is the most biased estimator. The OLS is biased in almost all cases for the heterogeneous panel. Second, the FMOLS is more biased than the OLS when $\theta_{21} \geq 0$ and $\sigma_{21} > 0$ for the homogeneous panel. The FMOLS is severely biased for the heterogeneous panel in almost all trials. This indicates that the failure of the non-parametric correction is very serious, especially in the heterogeneous panel. Third, DOLS performs very well in all cases for both the homogeneous and heterogeneous panels. Increasing the number of leads and lags reduces the bias of the DOLS substantially, as predicted by the asymptotic theory in Theorem 3. Fourth, the sequential limit theory approximates the limit distributions of the DOLS and its t-statistic very well. All in all, our findings are summarized as follows:
(i) The OLS estimator has a non-negligible bias in finite samples.
(ii) The FMOLS estimator does not improve over the OLS estimator in
general.
(iii) The FMOLS estimator is complicated by the dependence of the correction
terms upon the preliminary estimator (here we use OLS), which may be
very biased in finite samples with panel data. More seriously, the failure
of the non-parametric correction for the FMOLS in panel data could be
severe. This indicates that the DOLS estimator may be more promising
than the OLS or FMOLS estimators in estimating cointegrated panel
regressions.
Table 10. Means Biases and Standard Deviations of t-statistics for Different N and T in a Heterogeneous Panel

(N,T)       OLS             FMOLS(5)        FMOLS(2)        DOLS(4,2)       DOLS(2,1)
(1,20)      0.893 (1.390)   0.588 (2.473)   0.058 (1.643)   0.093 (3.303)   0.029 (2.156)
(1,40)      0.861 (1.265)   0.101 (1.849)   0.280 (1.331)   0.009 (1.980)   0.106 (1.618)
(1,60)      0.844 (1.233)   0.095 (1.579)   0.347 (1.207)   0.016 (1.729)   0.119 (1.489)
(1,120)     0.845 (1.212)   0.372 (1.336)   0.459 (1.139)   0.016 (1.510)   0.101 (1.405)
(20,20)     1.221 (1.578)   2.411 (1.902)   2.530 (1.192)   0.010 (1.983)   0.219 (1.468)
(20,40)     1.629 (1.344)   2.899 (1.345)   2.518 (0.999)   0.059 (1.485)   0.271 (1.259)
(20,60)     1.774 (1.282)   3.031 (1.195)   2.508 (0.952)   0.004 (1.329)   0.347 (1.184)
(20,120)    1.957 (1.239)   3.095 (1.047)   2.466 (0.907)   0.046 (1.197)   0.393 (1.121)
(40,20)     1.612 (1.640)   4.381 (1.882)   4.079 (1.191)   0.039 (1.987)   0.365 (1.466)
(40,40)     2.194 (1.392)   4.807 (1.341)   3.969 (1.004)   0.068 (1.472)   0.432 (1.233)
(40,60)     2.417 (1.306)   4.905 (1.199)   3.932 (0.960)   0.007 (1.319)   0.515 (1.169)
(40,120)    2.832 (1.234)   4.886 (1.059)   3.839 (0.911)   0.099 (1.181)   0.608 (1.099)
(60,20)     1.946 (1.697)   4.408 (1.884)   4.474 (1.182)   0.041 (1.932)   0.408 (1.449)
(60,40)     2.715 (1.389)   5.171 (1.320)   4.407 (0.976)   0.110 (1.452)   0.472 (1.221)
(60,60)     3.045 (1.328)   5.361 (1.170)   4.380 (0.933)   0.027 (1.307)   0.572 (1.165)
(60,120)    3.346 (1.250)   5.420 (1.033)   4.281 (0.889)   0.105 (1.181)   0.697 (1.099)
(120,20)    2.675 (1.720)   6.382 (1.878)   6.383 (1.169)   0.073 (1.939)   0.580 (1.439)
(120,40)    3.802 (1.408)   7.399 (1.314)   6.272 (0.967)   0.145 (1.444)   0.683 (1.215)
(120,60)    4.269 (1.336)   7.633 (1.162)   6.209 (0.931)   0.047 (1.307)   0.803 (1.165)
(120,120)   4.715 (1.250)   7.723 (1.045)   6.084 (0.897)   0.136 (1.178)   0.977 (1.098)

estimators. (c) 21 ~ U[0.8,0.8] and 21 ~ U[0.8,0.8].
ACKNOWLEDGMENTS
We thank Suzanne McCoskey, Peter Pedroni, Andrew Levin and participants of
the 1998 North American Winter Meetings of the Econometric Society for
helpful comments and Bangtian Chen for his research assistance on an earlier
draft of this chapter. Thanks also go to Denise Paul for correcting my English
and carefully checking the manuscript to enhance its readability. A Gauss
program for this paper can be retrieved from http://web.syr.edu/~cdkao.
Address correspondence to: Chihwa Kao, Center for Policy Research,
426 Eggers Hall, Syracuse University, Syracuse, NY 13244-1020; e-mail:
cdkao@maxwell.syr.edu.
REFERENCES
Baltagi, B. (1995). Econometric Analysis of Panel Data. New York: John Wiley and Sons.
Baltagi, B., & Kao, C. (2000). Nonstationary Panels, Cointegration in Panels and Dynamic Panels:
A Survey. Advances in Econometrics, 15, 7–51.
Chen, B., McCoskey, S., & Kao, C. (1999). Estimation and Inference of a Cointegrated Regression
in Panel Data: A Monte Carlo Study. American Journal of Mathematical and Management
Sciences, 19, 75–114.
Gonzalo, J. (1994). Five Alternative Methods of Estimating Long-Run Equilibrium Relationships.
Im, K., Pesaran, H., & Shin, Y. (1995). Testing for Unit Roots in Heterogeneous Panels.
Manuscript, University of Cambridge.
Kao, C., & Chen, B. (1995). On the Estimation and Inference for Cointegration in Panel Data
When the Cross-Section and Time-Series Dimensions are Comparable. Manuscript, Center
for Policy Research, Syracuse University.
Levin, A., & Lin, C. F. (1993). Unit Root Tests in Panel Data: New Results. Discussion paper,
Department of Economics, UC-San Diego.
a New Simple Test: Evidence From Simulations and the Bootstrap. Oxford Bulletin of
Pesaran, H., & Smith, R. (1995). Estimating Long-Run Relationships from Dynamic Heterogeneous Panels. Journal of Econometrics, 68, 79–113.
Pedroni, P. (1997). Panel Cointegration: Asymptotics and Finite Sample Properties of Pooled Time
Series Tests with an Application to the PPP Hypothesis. Working paper, Department of
Economics, No. 95013, Indiana University.
Pedroni, P. (1996). Fully Modified OLS for Heterogeneous Cointegrated Panels and the Case of
Purchasing Power Parity. Working paper, Department of Economics, No. 9620, Indiana
University.
Phillips, P. C. B., & Durlauf, S. N. (1986). Multiple Time Series Regression with Integrated
Processes. Review of Economic Studies, 53, 473–495.
Phillips, P. C. B., & Hansen, B. E. (1990). Statistical Inference in Instrumental Variables
Regression with I(1) Processes. Review of Economic Studies, 57, 99–125.
Phillips, P. C. B., & Loretan, M. (1991). Estimating Long-Run Economic Equilibria. Review of
Economic Studies, 58, 407–436.
Phillips, P. C. B., & Moon, H. (1999). Linear Regression Limit Theory for Non-stationary Panel
Phillips, P. C. B., & Solo, V. (1992). Asymptotics for Linear Processes. Annals of Statistics, 20,
971–1001.
Quah, D. (1994). Exploiting Cross Section Variation for Unit Root Inference in Dynamic Data.
Saikkonen, P. (1991). Asymptotically Efficient Estimation of Cointegrating Regressions.
Econometric Theory, 7, 1–21.
Summers, R., & Heston, A. (1991). The Penn World Table; An Expanded Set of International
Comparisons 1950–1988. Quarterly Journal of Economics, 106, 327–368.
Stock, J., & Watson, M. (1993). A Simple Estimator of Cointegrating Vectors in Higher Order
Integrated Systems. Econometrica, 61, 783–820.
APPENDIX
Proof of Theorem 3
First we write (19) in vector form:
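$$y_i = e\alpha_i + x_i\beta + Z_{iq}C + \dot{v}_i = x_i\beta + Z_iD + \dot{v}_i \quad \text{(say)},$$
where $y_i$ is a $T \times 1$ vector of $y_{it}$; $e$ is a $T \times 1$ unit vector; $Z_{iq}$ is the $T \times 2q$ matrix
of observations on the $2q$ regressors $\Delta x_{it-q}, \ldots, \Delta x_{it+q}$; $x_i$ is a $T \times k$ matrix
of $x_{it}$; $C$ is a $(2q) \times 1$ vector of $c_{ij}$; $\dot{v}_i$ is a $T \times 1$ vector of $\dot{v}_{it}$; $Z_i$ is a
$T \times (2q + 1)$ matrix, $Z_i = (e, Z_{iq})$; and $D$ is a $(2q + 1) \times 1$ vector of
parameters. Let $Q_i = I - Z_i(Z_i'Z_i)^{-1}Z_i'$. It follows that
$$(\hat{\beta}_D - \beta) = \left[\sum_{i=1}^{N} (x_i'Q_ix_i)\right]^{-1} \left[\sum_{i=1}^{N} (x_i'Q_i\dot{v}_i)\right].$$
We rescale $(\hat{\beta}_D - \beta)$ by $\sqrt{N}T$ to get
$$\sqrt{N}T(\hat{\beta}_D - \beta) = \left[\frac{1}{N}\sum_{i=1}^{N} \frac{1}{T^2}(x_i'Q_ix_i)\right]^{-1} \left[\sqrt{N}\,\frac{1}{N}\sum_{i=1}^{N} \frac{1}{T}(x_i'Q_i\dot{v}_i)\right]$$
$$= \left[\frac{1}{N}\sum_{i=1}^{N} \xi_{6iT}\right]^{-1} \left[\sqrt{N}\,\frac{1}{N}\sum_{i=1}^{N} \xi_{5iT}\right] = [\xi_{6NT}]^{-1}[\sqrt{N}\,\xi_{5NT}],$$
where
$$\xi_{5NT} = \frac{1}{N}\sum_{i=1}^{N} \xi_{5iT}, \quad \xi_{5iT} = \frac{1}{T}(x_i'Q_i\dot{v}_i), \quad \xi_{6NT} = \frac{1}{N}\sum_{i=1}^{N} \xi_{6iT}, \quad \text{and} \quad \xi_{6iT} = \frac{1}{T^2}(x_i'Q_ix_i).$$
Observe that from Saikkonen (1991)
$$\xi_{6iT} = \frac{1}{T^2}(x_i'Q_ix_i) = \frac{1}{T^2}(x_i'W_Tx_i) + o_p(1) = \frac{1}{T^2}\sum_{t=q+1}^{T-q} (x_{it} - \bar{x}_i)(x_{it} - \bar{x}_i)' + o_p(1) \Rightarrow \int \tilde{B}_{\varepsilon i}\tilde{B}_{\varepsilon i}',$$
and
$$\xi_{5iT} = \frac{1}{T}(x_i'Q_i\dot{v}_i) = \frac{1}{T}(x_i'W_T\dot{v}_i) + o_p(1) = \frac{1}{T}\sum_{t=q+1}^{T-q} (x_{it} - \bar{x}_i)\dot{v}_{it} + o_p(1) \Rightarrow \int \tilde{B}_{\varepsilon i}\,dB_{ui}^{+},$$
as $T \to \infty$ for all $i$, where $\tilde{B}_{\varepsilon i} = B_{\varepsilon i} - \int B_{\varepsilon i}$ and $W_T = I_T - \frac{1}{T}ee'$. Then applying
the multivariate Lindeberg-Levy central limit theorem to $\frac{1}{\sqrt{N}}\sum_{i=1}^{N} \int \tilde{B}_{\varepsilon i}\,dB_{ui}^{+}$ and
combining this with the limit of $\frac{1}{N}\sum_{i=1}^{N} \int \tilde{B}_{\varepsilon i}\tilde{B}_{\varepsilon i}'$ as in Theorem 2, we have
$$\left[\frac{1}{N}\sum_{i=1}^{N} \int \tilde{B}_{\varepsilon i}\tilde{B}_{\varepsilon i}'\right]^{-1} \left[\frac{1}{\sqrt{N}}\sum_{i=1}^{N} \int \tilde{B}_{\varepsilon i}\,dB_{ui}^{+}\right] \Rightarrow N(0, 6\Omega_{\varepsilon}^{-1}\Omega_{u.\varepsilon})$$
as $N \to \infty$. It follows that using the sequential limit theory
$$\sqrt{N}T(\hat{\beta}_D - \beta) \Rightarrow N(0, 6\Omega_{\varepsilon}^{-1}\Omega_{u.\varepsilon})$$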
as required.
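
The pooled DOLS estimator analysed in this proof can be sketched numerically. The Python snippet below is only an illustration under stated assumptions, not the Gauss program mentioned in the acknowledgments: it assumes a single regressor, a balanced panel stored as NumPy arrays of shape (N, T), and a hypothetical function name `dols_pooled`. For each unit it partials out a constant together with the leads and lags j = -q, ..., q of the differenced regressor (the role played by $Q_i$ above) and then pools the cross products over units.

```python
import numpy as np

def dols_pooled(y, x, q):
    """Pooled DOLS slope for a single regressor; y and x have shape (N, T)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    N, T = y.shape
    t = np.arange(q + 1, T - q)            # dates for which all leads and lags exist
    num = 0.0
    den = 0.0
    for i in range(N):
        dx = np.diff(x[i])                 # dx[s] = x[s+1] - x[s], the change at date s+1
        # Z_i: a constant plus leads and lags of the differenced regressor.
        Z = np.column_stack([np.ones(len(t))] +
                            [dx[t + j - 1] for j in range(-q, q + 1)])
        # Multiplying by Q_i = I - Z(Z'Z)^{-1}Z' equals taking OLS residuals on Z.
        xr = x[i, t] - Z @ np.linalg.lstsq(Z, x[i, t], rcond=None)[0]
        yr = y[i, t] - Z @ np.linalg.lstsq(Z, y[i, t], rcond=None)[0]
        num += xr @ yr
        den += xr @ xr
    return num / den

if __name__ == "__main__":
    # Simulated cointegrated panel with an error correlated with the regressor innovations.
    rng = np.random.default_rng(0)
    N, T, beta = 20, 100, 2.0
    eps = rng.standard_normal((N, T))
    x = np.cumsum(eps, axis=1)
    u = 0.5 * eps + rng.standard_normal((N, T))
    y = rng.standard_normal((N, 1)) + beta * x + u
    print(dols_pooled(y, x, q=2))
```

The simulated design (true slope 2, unit-specific intercepts, contemporaneous endogeneity) is an assumption made only to exercise the function; it is not the Monte Carlo design used in the chapter.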

Proof of Theorem 5
The proof is the same as that of Theorem 3. First, similar to Theorem 3, we

write (25) in vector form:
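$$y_i^* = e\alpha_i + x_i^*\beta + Z_{iq}^*C + \dot{v}_i^* = x_i^*\beta + Z_i^*D + \dot{v}_i^* \quad \text{(say)},$$
and define $y_i^*$, $e$, $Z_{iq}^*$, $x_i^*$, $C$, $\dot{v}_i^*$, $Z_i^*$, $Z_i$, $D$, and $Q_i^*$ as in the proof of Theorem 3.
Then we have:
$$\sqrt{N}T(\hat{\beta}_D^* - \beta) = \left[\frac{1}{N}\sum_{i=1}^{N} \frac{1}{T^2}(x_i^{*\prime}Q_i^*x_i^*)\right]^{-1} \left[\sqrt{N}\,\frac{1}{N}\sum_{i=1}^{N} \frac{1}{T}(x_i^{*\prime}Q_i^*\dot{v}_i^*)\right]$$
$$= \left[\frac{1}{N}\sum_{i=1}^{N} \xi_{8iT}\right]^{-1} \left[\sqrt{N}\,\frac{1}{N}\sum_{i=1}^{N} \xi_{7iT}\right] = [\xi_{8NT}]^{-1}[\sqrt{N}\,\xi_{7NT}],$$
where
$$\xi_{7NT} = \frac{1}{N}\sum_{i=1}^{N} \xi_{7iT}, \quad \xi_{7iT} = \frac{1}{T}(x_i^{*\prime}Q_i^*\dot{v}_i^*), \quad \xi_{8NT} = \frac{1}{N}\sum_{i=1}^{N} \xi_{8iT}, \quad \text{and} \quad \xi_{8iT} = \frac{1}{T^2}(x_i^{*\prime}Q_i^*x_i^*).$$
Observe that from Assumption 8, we have
$$\xi_{8iT} = \frac{1}{T^2}(x_i^{*\prime}Q_i^*x_i^*) = \frac{1}{T^2}(x_i^{*\prime}W_T^*x_i^*) + o_p(1) = \frac{1}{T^2}\sum_{t=q_i+1}^{T-q_i} (x_{it}^* - \bar{x}_i^*)(x_{it}^* - \bar{x}_i^*)' + o_p(1) \Rightarrow \int \tilde{W}_i\tilde{W}_i',$$
and
$$\xi_{7iT} = \frac{1}{T}(x_i^{*\prime}Q_i^*\dot{v}_i^*) = \frac{1}{T}(x_i^{*\prime}W_T\dot{v}_i^*) + o_p(1) = \frac{1}{T}\sum_{t=q_i+1}^{T-q_i} (x_{it}^* - \bar{x}_i^*)\dot{v}_{it}^* + o_p(1) \Rightarrow \int \tilde{W}_i\,dV_i,$$
as $T \to \infty$ for all $i$. The remainder of the proof follows that of Theorem 3.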
TESTING FOR UNIT ROOTS IN PANELS

IN THE PRESENCE OF STRUCTURAL
CHANGE WITH AN APPLICATION TO
OECD UNEMPLOYMENT
Christian J. Murray and David H. Papell
ABSTRACT
There has been extensive research on testing for unit roots in the presence
of structural change and on testing for unit roots in panels. This chapter
takes a small step towards combining the two research agendas. We
propose a unit root test for non-trending data in the presence of a one-time change in the mean for a heterogeneous panel. The date of the break
is determined endogenously. We perform simulations to investigate the
power of the test, and apply the test to a data set of annual unemployment
rates for 17 OECD countries from 1955 to 1990.
I. INTRODUCTION
The work of Perron (1989) has inspired extensive research on testing for unit
roots in the presence of structural change. Banerjee, Lumsdaine & Stock
(1992), Zivot & Andrews (1992), and Perron (1997), among many others,
develop tests which allow the break to be determined endogenously and
Lumsdaine & Papell (1997) extend the tests to allow for two breaks. Starting
with Levin & Lin (1992), much work has also been done on testing for unit
ISBN: 0-7623-0688-2
roots in panels, including papers by Im, Pesaran & Shin (1997), Maddala & Wu
(1999), and Bowman (1999).
This chapter takes a small step towards combining the two research agendas.
We propose a unit root test for non-trending data in the presence of a one-time
change in the mean for a heterogeneous panel. The date of the break, which is
common across the countries of the panel, is determined endogenously and,
in the additive outlier framework, is assumed to occur instantaneously. The
speed of mean reversion is also common across countries. The intercepts,
coefficients on the break dummy variable, and serial correlation structure,
however, are country specific.
In the context of testing for a unit root in the presence of structural change,
our test is most closely related to the work of Perron & Vogelsang (1992). They
develop a test for a unit root in non-trending data in the presence of a one-time
change in the mean of a single series, with the date of the change determined
endogenously. In the panel unit root context, the most closely related work is
Papell (1997), who utilizes a feasible generalized least squares (SUR) method
which allows for both contemporaneous and heterogeneous serial correlation.
Levin & Lin (1992) and Bowman (1999) show that, in the absence of
structural change, panel unit root tests have good power in moderately sized
samples of 10 or more countries, even with fairly long persistence. We conduct
two power experiments, both involving panels of non-trending, stationary
series with a one-time change in the mean. First, using conventional panel unit
root tests, we find very low power to reject the unit root null. Second, using
tests that incorporate structural change, the power is much improved.
We apply the test to a data set of annual unemployment rates for 17 OECD
countries from 1955 to 1990. Using the panel tests in the presence of structural
change, we find much stronger rejections of unit roots than can be found with
univariate tests that do not incorporate structural change, panel tests that do not
incorporate structural change, or univariate tests that do incorporate structural
change.
II. PANEL UNIT ROOT TESTS IN THE PRESENCE OF

STRUCTURAL CHANGE
In this section, we develop panel unit root tests in the presence of structural
change. We first discuss conventional Augmented Dickey-Fuller (ADF) unit
root tests, panel unit root tests which do not incorporate structural change, and
single-equation unit root tests with structural change, and then describe how to
combine elements from the latter two tests to construct a panel unit root test
with structural change. While our tests are for non-trending data, an extension
to trending data would be straightforward.
The most common tests for unit roots are Augmented Dickey-Fuller tests.
ADF tests for non-trending data involve running the following regression:

$$u_t = \mu + \alpha u_{t-1} + \sum_{i=1}^{k} c_i \Delta u_{t-i} + \varepsilon_t, \tag{1}$$
where $u_t$ is the variable of interest. The null hypothesis of a unit root is rejected
if the value of the t-statistic for $\alpha$ (in absolute value) is greater than the
appropriate critical value. While the critical values are non-standard, they are
readily available.1
There is substantial evidence that the lag truncation parameter k is best
selected according to data-dependent methods rather than choosing a fixed k a
priori. We follow the method suggested by Campbell & Perron (1991), Hall
(1994), and Ng & Perron (1995). Start with an upper bound kmax on k. If the t-statistic
on the coefficient of the last lag is significant (using the 10% critical value of
the asymptotic normal distribution, 1.645), then k = kmax. If it is not significant, then
k is lowered by one. This procedure is repeated until the last lag becomes
significant. If no lag is significant, then k is chosen to equal zero.
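
The general-to-specific rule just described can be sketched in a few lines. The Python code below is an illustration under stated assumptions rather than the authors' program: it takes regression (1) with the level of the series on the left-hand side, so that the coefficient on the lagged level is the sum of the AR coefficients and the unit root statistic is the t-ratio for that coefficient equal to one; the function names (`adf_design`, `ols`, `select_lag`, `adf_tstat`) are hypothetical.

```python
import numpy as np

def adf_design(u, k):
    """Regressors for u_t = mu + alpha*u_{t-1} + sum_i c_i*du_{t-i} + e_t."""
    u = np.asarray(u, dtype=float)
    du = np.diff(u)                      # du[s] = u[s+1] - u[s]
    t = np.arange(k + 1, len(u))         # first usable (0-based) observation is k+1
    cols = [np.ones(len(t)), u[t - 1]]
    cols += [du[t - i - 1] for i in range(1, k + 1)]   # lagged first differences
    return np.column_stack(cols), u[t]

def ols(X, y):
    """OLS coefficients and conventional standard errors."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, se

def select_lag(u, kmax, crit=1.645):
    """Lower k from kmax until the last lagged difference is significant."""
    for k in range(kmax, 0, -1):
        X, y = adf_design(u, k)
        beta, se = ols(X, y)
        if abs(beta[-1] / se[-1]) > crit:
            return k
    return 0

def adf_tstat(u, k):
    """Estimate of alpha and the t-statistic for the hypothesis alpha = 1."""
    X, y = adf_design(u, k)
    beta, se = ols(X, y)
    return beta[1], (beta[1] - 1.0) / se[1]
```

A typical call would be `k = select_lag(u, kmax=4)` followed by `adf_tstat(u, k)`; the resulting statistic is then compared with ADF critical values such as those of MacKinnon (1991).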
Panel unit root tests in the ADF framework for non-trending data with
heterogeneous intercepts, which are equivalent to including country-specific
dummy variables, involve estimating the following regressions:

$$u_{jt} = \mu_j + \alpha u_{jt-1} + \sum_{i=1}^{k_j} c_{ji} \Delta u_{jt-i} + \varepsilon_{jt}. \tag{2}$$
The subscript j = 1, . . . , N indexes the elements of the panel which, for

convenience of exposition, we will call countries. While Levin & Lin (1992)
show that imposing homogeneous intercepts results in substantial increases in
power, there is rarely any support for such a restriction in practice.
We estimate equation (2) by feasible generalized least squares (SUR), with
the coefficient $\alpha$ equated across countries and the lag length kj set equal to the
value chosen by the single equation models described in equation (1).2 This
method accounts for contemporaneous and serial correlation, both of which are
often important in practice.3 In Papell (1997), this method is used to investigate
purchasing power parity.
The critical values for panel unit root tests computed by Levin & Lin (1992)
do not incorporate serial correlation in the disturbances. While, if the number
of observations is large enough, the panel ADF statistic converges to the
asymptotic distribution of the panel Dickey-Fuller statistic with no serial

correlation, this is a serious problem in samples of the size normally used,
especially when the recursive t-statistic method is used to select the lag
length.
Using Monte Carlo methods, we compute finite sample critical values for our
test statistics which account for both serial correlation and cross correlation in
the residuals. First, we generate unit root series for panels of 5, 10, 15, and 20
countries with 50, 100, and 200 observations. We then fit autoregressive (AR)
models to the first differences of each series, using the Schwarz criterion to
choose the optimal model, and then treat the optimal estimated AR models as
the true data generating process for the errors of each of the series. For each
panel, we construct pseudo samples using the optimal AR models with iid
$N(0, \sigma^2)$ errors where $\sigma^2$ is the estimated innovation variance of the optimal AR
model.4 We then integrate the AR models to get the data in levels. Our test
statistic is the t-statistic on $\alpha$ in equation (2), with the lag length kj for each
series chosen by univariate methods as described above. The critical values for
the finite sample distributions, obtained from 10,000 replications, are reported
in Table 1.
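
The logic of simulating finite sample critical values can be illustrated with a deliberately stripped-down sketch. The Python code below makes strong simplifying assumptions that the chapter does not: innovations are i.i.d. standard normal with no serial or cross correlation, no lagged differences are included, and the pooled coefficient is estimated by OLS with country-specific intercepts instead of SUR. Its output will therefore not reproduce Table 1; it only shows how the quantiles of the simulated t-statistic vary with N and T. The function names are hypothetical. The 50 discarded start-up observations follow note 4.

```python
import numpy as np

def pooled_df_tstat(u):
    """t-statistic for alpha = 1 in u_jt = mu_j + alpha*u_{j,t-1} + e_jt (pooled OLS)."""
    y = u[:, 1:]
    x = u[:, :-1]
    # Country-specific intercepts are removed by demeaning within each country.
    yd = y - y.mean(axis=1, keepdims=True)
    xd = x - x.mean(axis=1, keepdims=True)
    alpha = (xd * yd).sum() / (xd ** 2).sum()
    resid = yd - alpha * xd
    n_units, n_obs = y.shape
    dof = n_units * n_obs - n_units - 1
    s2 = (resid ** 2).sum() / dof
    se = np.sqrt(s2 / (xd ** 2).sum())
    return (alpha - 1.0) / se

def critical_values(n_units, n_obs, reps, levels=(0.01, 0.05, 0.10), seed=0):
    """Simulated lower-tail quantiles of the pooled statistic under the unit root null."""
    rng = np.random.default_rng(seed)
    stats = np.empty(reps)
    for r in range(reps):
        e = rng.standard_normal((n_units, n_obs + 50))
        u = np.cumsum(e, axis=1)[:, 50:]     # integrate, then discard the first 50 observations
        stats[r] = pooled_df_tstat(u)
    return {lv: float(np.quantile(stats, lv)) for lv in levels}

if __name__ == "__main__":
    for n in (5, 10):
        print(n, critical_values(n, 50, reps=400))
```

The simulated quantiles are negative; they are comparable in magnitude to tabulated critical values only in the loose sense that the same qualitative dependence on N appears.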
We now discuss univariate tests for a unit root in the presence of structural
change for non-trending data, using the methods of Perron & Vogelsang (1992).
Additive Outlier (AO) models, where the structural change occurs instantaneously, are estimated by the following two equations:5
$$u_t = \mu + \delta DU_t + \tilde{u}_t, \tag{3}$$
and
$$\tilde{u}_t = \sum_{i=0}^{k} \omega_i DTB_{t-i} + \alpha \tilde{u}_{t-1} + \sum_{i=1}^{k} c_i \Delta\tilde{u}_{t-i} + \varepsilon_t, \tag{4}$$
where $\tilde{u}_t$ is the estimated residual from equation (3).6 TB is the break date,
DTBt = 1 if t = TB + 1, 0 otherwise, and DUt = 1 if t > TB, 0 otherwise.7
Equations (3) and (4) are estimated sequentially for each break year
TB = k + 2, . . . , T − 1, where T is the number of observations. The break date
is chosen to minimize the t-statistic for $\alpha$, and data-dependent methods are used
to select the lag length k. The null hypothesis of a unit root is rejected if the t-statistic
on $\alpha$ is sufficiently large (in absolute value). The finite sample critical
values of Perron & Vogelsang (1992) can be used to assess the significance of
the unit root statistic.
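
The sequential search over break dates can be sketched as follows. This Python illustration rests on several assumptions that are not taken from the chapter: the lag length k is held fixed instead of being chosen by the data-dependent rule, impulse dummies that would fall outside the estimation sample are simply dropped, the statistic is computed as the t-ratio for the coefficient on the lagged residual equal to one, and the function names `ao_tstat` and `ao_search` are hypothetical.

```python
import numpy as np

def ao_tstat(u, tb, k):
    """Two-step AO regressions for break date tb (1-based); returns the t-statistic."""
    u = np.asarray(u, dtype=float)
    T = len(u)
    time = np.arange(1, T + 1)
    du_dummy = (time > tb).astype(float)        # DU_t = 1 if t > TB
    dtb = (time == tb + 1).astype(float)        # DTB_t = 1 if t = TB + 1
    # Step 1: remove the mean and the shift in mean.
    X1 = np.column_stack([np.ones(T), du_dummy])
    res = u - X1 @ np.linalg.lstsq(X1, u, rcond=None)[0]
    # Step 2: autoregression on the residuals with impulse dummies.
    d_res = np.diff(res)                        # d_res[s] = res[s+1] - res[s]
    s = np.arange(k + 1, T)                     # usable (0-based) observations
    cols = [res[s - 1]]                         # lagged residual comes first
    cols += [dtb[s - i] for i in range(0, k + 1)]
    cols += [d_res[s - i - 1] for i in range(1, k + 1)]
    X2 = np.column_stack(cols)
    X2 = X2[:, np.any(X2 != 0.0, axis=0)]       # drop dummies outside the sample
    y2 = res[s]
    beta = np.linalg.lstsq(X2, y2, rcond=None)[0]
    e = y2 - X2 @ beta
    s2 = e @ e / (len(y2) - X2.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(X2.T @ X2)[0, 0])
    return (beta[0] - 1.0) / se

def ao_search(u, k):
    """Evaluate TB = k+2, ..., T-1 and return (break date, minimum t-statistic)."""
    T = len(u)
    stats = {tb: ao_tstat(u, tb, k) for tb in range(k + 2, T)}
    tb_hat = min(stats, key=stats.get)
    return tb_hat, stats[tb_hat]
```

For a series `u`, `ao_search(u, k=1)` returns the break date that minimizes the statistic together with the statistic itself, which would then be compared with the Perron & Vogelsang (1992) critical values.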
We proceed to construct a test for unit roots in panel data in the presence of
structural change. With heterogeneous intercepts, the panel AO model is
estimated by the following two equations:
$$u_{jt} = \mu_j + \delta_j DU_{jt} + \tilde{u}_{jt}, \tag{5}$$
and
$$\tilde{u}_{jt} = \sum_{i=0}^{k_j} \omega_{ji} DTB_{jt-i} + \alpha \tilde{u}_{jt-1} + \sum_{i=1}^{k_j} c_{ji} \Delta\tilde{u}_{jt-i} + \varepsilon_{jt}, \tag{6}$$
where $\tilde{u}_{jt}$ are the residuals from (5), DTBjt = 1 if t = TB + 1, 0 otherwise, DUjt = 1
if t > TB, 0 otherwise, and j = 1, . . . , N indexes the countries. Using the Monte
Carlo methods described above, with 2500 replications, we compute finite
sample critical values for our test statistic, the t-statistic on $\alpha$ in equation (6).8

Table 1. Finite Sample Critical Values for Panel Unit Root Tests without Structural Change

1%
N       T = 50    T = 100   T = 200
5       5.525     5.272     5.121
10      6.964     6.604     6.251
15      8.327     7.675     7.234
20      9.775     8.683     8.119

5%
N       T = 50    T = 100   T = 200
5       4.789     4.641     4.512
10      6.244     5.923     5.640
15      7.603     6.964     6.629
20      8.940     7.955     7.512

10%
N       T = 50    T = 100   T = 200
5       4.452     4.314     4.177
10      5.857     5.594     5.317
15      7.221     6.621     6.308
20      8.528     7.587     7.145

III. POWER OF PANEL UNIT ROOT TESTS

Finite sample critical values for panel unit root tests, which incorporate lag
selection, are presented in Table 1. Critical values for panel unit root tests with
structural change are presented in Table 2. As mentioned earlier, we allow for
panels of 5, 10, 15, and 20 countries (N), with 50, 100, and 200 observations (T).
In selecting the lag length, kmax is set to 4, 8, and 12 for T = 50, 100, and 200
respectively. Tables 1 and 2 reveal three properties of panel unit root statistics.
An increase in T leads to a decrease in the absolute value of the critical value
of the unit root statistic, whereas an increase in N increases its absolute value.
Also, allowing for structural change increases the absolute value of the panel
unit root statistic.
We now focus on the power of the t-statistic on $\alpha$ in equations (3) and (4) and
equations (5) and (6). The range of $\alpha$ (the sum of the AR coefficients) we
consider is 0.95, 0.90, and 0.80. We consider mean shifts, $\delta$, of 0.5 and 1.0. In
the following empirical application, these values correspond to a one-half and
full percentage point increase in the unemployment rate. We set the break date
in the middle of the sample, i.e. TB = T/2.9
Tables 3 and 4 present the finite sample power of panel unit root tests
without and with structural change, respectively. The AR length is again chosen
by the Schwarz criterion. The number of repetitions used for Table 3 is 2500,
while 1000 repetitions are used for Table 4. The upper bound on the standard
error of rejection frequencies in Table 4 is 0.016.
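
The bound quoted here is consistent with the usual binomial standard error of a rejection frequency, sqrt(p(1 - p)/R), which is largest at p = 0.5. The short Python check below is an illustrative calculation under that assumption; the function name is hypothetical.

```python
import math

def max_rejection_se(replications):
    """Largest possible standard error of a rejection frequency (attained at p = 0.5)."""
    return math.sqrt(0.25 / replications)

print(round(max_rejection_se(1000), 3))   # 0.016, the bound for 1000 repetitions
print(round(max_rejection_se(2500), 3))   # 0.01, the corresponding bound for 2500 repetitions
```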
Table 3 documents the generally poor power of panel unit root tests which
fail to allow for a shift in mean which is indeed present. For the alternative
closest to the null, $\alpha$ = 0.95 and $\delta$ = 0.5, power is essentially zero. Holding $\delta$
constant, power monotonically increases as $\alpha$ is lowered to 0.90 and 0.80, but
it is only for the latter case where we begin to see decent power for a reasonable
amount of data. Holding $\alpha$ constant, increasing $\delta$ monotonically reduces power.
This is consistent with Perron's (1989) finding that for a stationary time series,
a larger mean shift increases the probability of spuriously finding a unit root.
This is problematic in the context of our following empirical example. A value
of $\delta$ = 1 corresponds to a small (one percentage point), permanent change in the mean
unemployment rate. Our results suggest that if $\alpha$ is close to but less than one,
it is probable that panel unit root tests will incorrectly find that unemployment
is integrated, rather than stationary around a one-time shift in mean.

Table 2. Finite Sample Critical Values for Panel Unit Root Tests with Structural Change

1%
N       T = 50    T = 100   T = 200
5       7.329     6.941     6.915
10      9.056     8.658     8.415
15      10.940    9.995     9.571
20      12.667    11.103    10.672

5%
N       T = 50    T = 100   T = 200
5       6.613     6.432     6.334
10      8.484     8.046     7.852
15      10.279    9.461     9.105
20      12.011    10.618    10.225

10%
N       T = 50    T = 100   T = 200
5       6.344     6.113     6.051
10      8.203     7.785     7.553
15      10.025    9.184     8.815
20      11.705    10.361    9.958

Table 4 demonstrates that allowing for a mean shift greatly increases power
relative to Table 3. For all values of $\alpha$ and $\delta$ considered, the power is at least
50%, and often 100%, for a panel of at least 10 countries with at least 100
observations. Indeed, for T = 100, there are only two instances in which the
power is less than 50%, and those occur for the smallest panel considered,
N = 5, and the most persistent value of $\alpha$, 0.95.

Table 3. Power of Panel Unit Root Tests without Structural Change

$\alpha$ = 0.95, $\delta$ = 0.5
N       T = 50    T = 100   T = 200
5       0.0004    0.0008    0.0008
10      0.0008    0.0004    0.0000
15      0.0000    0.0000    0.0000
20      0.0000    0.0000    0.0000

$\alpha$ = 0.95, $\delta$ = 1.0
N       T = 50    T = 100   T = 200
5       0.0000    0.0000    0.0000
10      0.0000    0.0000    0.0000
15      0.0000    0.0000    0.0000
20      0.0000    0.0000    0.0000

$\alpha$ = 0.90, $\delta$ = 0.5
N       T = 50    T = 100   T = 200
5       0.0180    0.0560    0.3780
10      0.0116    0.1204    0.8312
15      0.0120    0.2300    0.9608
20      0.0084    0.3084    0.9924

$\alpha$ = 0.90, $\delta$ = 1.0
N       T = 50    T = 100   T = 200
5       0.0000    0.0000    0.0000
10      0.0000    0.0000    0.0000
15      0.0000    0.0000    0.0000
20      0.0000    0.0000    0.0008

$\alpha$ = 0.80, $\delta$ = 0.5
N       T = 50    T = 100   T = 200
5       0.3652    0.8400    0.9872
10      0.6848    0.9908    1.0000
15      0.8216    0.9992    1.0000
20      0.8732    1.0000    1.0000

$\alpha$ = 0.80, $\delta$ = 1.0
N       T = 50    T = 100   T = 200
5       0.0036    0.0336    0.2052
10      0.0052    0.1784    0.6876
15      0.0052    0.4208    0.9124
20      0.0044    0.6432    0.9872

Table 4. Power of Panel Unit Root Tests with Structural Change

$\alpha$ = 0.95, $\delta$ = 0.5
N       T = 50    T = 100   T = 200
5       0.0710    0.2320    0.8460
10      0.0840    0.5160    0.9960
15      0.0810    0.7250    1.0000
20      0.0520    0.8730    1.0000

$\alpha$ = 0.95, $\delta$ = 1.0
N       T = 50    T = 100   T = 200
5       0.0220    0.4130    0.9980
10      0.0160    0.7570    1.0000
15      0.0060    0.8770    1.0000
20      0.0020    0.9570    1.0000

$\alpha$ = 0.90, $\delta$ = 0.5
N       T = 50    T = 100   T = 200
5       0.2750    0.7790    1.0000
10      0.4730    0.9930    1.0000
15      0.5730    1.0000    1.0000
20      0.6600    1.0000    1.0000

$\alpha$ = 0.90, $\delta$ = 1.0
N       T = 50    T = 100   T = 200
5       0.2920    0.9430    1.0000
10      0.5150    1.0000    1.0000
15      0.5600    1.0000    1.0000
20      0.5590    1.0000    1.0000

$\alpha$ = 0.80, $\delta$ = 0.5
N       T = 50    T = 100   T = 200
5       0.8000    1.0000    1.0000
10      0.9910    1.0000    1.0000
15      0.9990    1.0000    1.0000
20      0.9990    1.0000    1.0000

$\alpha$ = 0.80, $\delta$ = 1.0
N       T = 50    T = 100   T = 200
5       0.8000    1.0000    1.0000
10      0.8520    1.0000    1.0000
15      0.9960    1.0000    1.0000
20      0.9990    1.0000    1.0000

IV. EMPIRICAL EXAMPLE: UNIT ROOTS IN UNEMPLOYMENT
We use annual series of unemployment for 17 OECD countries from 1955 to
1990. The source of the data is Layard, Nickell & Jackman (1991). We do not
update the data past 1990. Unemployment rates rose sharply, especially in
Europe, during the early 1990s. In Papell, Murray & Ghiblawi (2000), the
single equation methods of Bai & Perron (1998) detect considerable evidence
of multiple structural changes with unemployment data extended through 1997.
Testing for unit roots in panels with multiple structural changes, however, is
well beyond the scope of this chapter. Our empirical results, therefore, should
be interpreted as an illustration of the techniques rather than as an economic
analysis of postwar unemployment.
The first step in our investigation is to test for unit roots using methods that
do not account for structural change. The objective of this exercise is to provide
a benchmark for our later results. We run Augmented Dickey-Fuller (ADF)
tests, as in equation (1), for each of the 17 countries in the sample. The results
of the ADF tests are reported in Table 5. We set kmax to 4. Using critical values
from MacKinnon (1991), we find that the null of a unit root cannot be rejected
for any of the series at the 10% level.
Table 5. Augmented Dickey-Fuller Tests

Country        $\mu$             $\alpha$
Australia      0.437 (1.60)    0.936 (1.15)
Austria        0.188 (1.26)    0.915 (1.28)
Belgium        0.337 (1.48)    0.953 (1.40)
Canada         0.819 (1.61)    0.893 (1.46)
Denmark        0.222 (0.82)    0.993 (0.14)
Finland        0.359 (1.42)    0.912 (1.26)
France         0.176 (1.38)    0.987 (0.54)
Germany        0.239 (1.19)    0.929 (1.32)
Ireland        0.470 (1.36)    0.952 (1.28)
Italy          0.597 (2.04)    0.885 (2.08)
Japan          0.210 (1.91)    0.883 (2.04)
Netherlands    0.248 (1.21)    0.966 (0.96)
Norway         0.435 (1.01)    0.835 (0.84)
Spain          0.369 (1.85)    0.945 (2.25)
Sweden         0.413 (1.82)    0.760 (1.37)
U.K.           0.391 (1.38)    0.947 (1.14)
U.S.A.         1.389 (2.14)    0.766 (2.16)

Lag lengths k, in the order listed in the source: 1, 1, 0, 4, 2, 1, 1, 1, 3, 3, 2, 2, 3, 2, 2, 0.

Note: The critical values for the ADF test, calculated from MacKinnon (1991) with 36
observations, are 3.62 (1%), 2.94 (5%), and 2.61 (10%). Numbers in parentheses are
t-statistics.
One possible reason for the failure of the ADF tests to reject the unit root
hypothesis is the relatively short (36 years) time span of the data.10 We
investigate this possibility by conducting panel unit root tests, described by
equation (2), to exploit cross-section variability among the 17 unemployment
rates. The results of the panel unit root tests are reported in Table 6.11 The null
hypothesis of a unit root cannot be rejected, at even the 10% level, either for
the OECD countries as a whole or for smaller panels consisting of European
(13), European Community (EC) (9), European Free Trade Area (EFTA) (4),
Non-European (4), or Non-EC (EFTA plus Non-Europe) (8) countries.12

Table 6. Panel Unit Root Tests

Group          N     $\alpha$      t
OECD           17    0.924    6.40
EUROPE         13    0.936    4.73
EC             9     0.941    3.96
NON-EC         8     0.846    4.82
EFTA           4     0.868    3.04
NON-EUROPE     4     0.863    3.52

Critical Values
Group          1%       5%       10%
OECD           10.16    9.00     8.48
EUROPE         8.52     7.58     7.16
EC             7.09     6.28     5.86
NON-EC         6.83     5.99     5.58
EFTA           5.45     4.67     4.27
NON-EUROPE     5.45     4.67     4.27

The results for the univariate AO model of equations (3) and (4) are reported
in Table 7. The null hypothesis of a unit root is rejected for Finland, Ireland and
Spain at the 1% level, Belgium, France, Italy and Norway at the 5% level, and
Austria, Canada, Denmark, and the United Kingdom at the 10% level. The
structural breaks are all positive, reflecting the general rise in unemployment
among the OECD countries. The structural break occurs between 1974 and
1976 for nine out of eleven countries for which the unit root null can be
rejected.

Table 7. The Additive Outlier Model

Country        Break Year    $\mu$               $\delta$                $\alpha$
Australia      1973          2.053 (6.99)     4.536 (10.61)     0.609 (3.99)
Austria        1979          1.704 (13.55)    1.460 (6.42)      0.623 (4.33)c
Belgium        1975          2.771 (8.70)     6.908 (13.99)     0.404 (4.96)b
Canada         1976          5.145 (17.95)    3.754 (8.17)      0.277 (4.33)c
Denmark        1975          2.557 (8.29)     5.696 (11.93)     0.513 (4.34)c
Finland        1974          1.915 (8.61)     2.885 (8.65)      0.227 (6.64)a
France         1975          2.052 (6.35)     5.914 (11.81)     0.660 (4.95)b
Germany        1972          1.417 (3.63)     3.317 (6.01)      0.732 (3.63)
Ireland        1976          5.627 (10.14)    7.287 (8.19)      0.657 (7.58)a
Italy          1976          4.650 (16.43)    1.907 (4.20)      0.702 (4.75)b
Japan          1969          1.653 (12.19)    0.423 (2.38)      0.783 (3.53)
Netherlands    1976          1.945 (4.94)     6.662 (10.55)     0.606 (4.06)
Norway         1986          2.094 (16.96)    1.781 (4.91)      0.303 (4.78)b
Spain          1974          2.400 (2.57)     11.463 (8.20)     0.685 (7.61)a
Sweden         1964          1.470 (10.40)    0.334 (2.01)      0.536 (3.87)
U.K.           1974          2.715 (6.41)     5.604 (8.82)      0.493 (4.60)c
U.S.A.         1974          4.840 (19.21)    2.141 (5.67)      0.251 (4.10)

Lag lengths k, in the order listed in the source: 1, 4, 3, 3, 1, 4, 1, 3, 3, 3, 2, 1, 4, 1, 4, 3.

Note: The critical values for the AO model, reported in Perron and Vogelsang (1992), are 5.20
(1%), 4.67 (5%), and 4.33 (10%). Numbers in parentheses are t-statistics. Superscripts a, b, and
c denote rejection of the unit root null at the 1%, 5%, and 10% significance levels respectively.

The results of the panel unit root tests from equations (5) and (6) that account
for structural change, along with the associated critical values, are reported in
Table 8.13 The unit root hypothesis is strongly (at the 1% level) rejected in favor
of stationarity with a one-time break in 1975 for the OECD, European, and EC
countries and a break in 1973 for the non-EC and EFTA countries. For the non-Europe
countries, the unit root null could not be rejected at the 10% level. This
panel, however, consists of only four countries.

Table 8. Panel Unit Root Tests with Structural Change

Group          N     Break Year    $\alpha$      t
OECD           17    1975          0.638    21.91a
EUROPE         13    1975          0.651    18.92a
EC             9     1975          0.670    16.15a
NON-EC         8     1973          0.550    10.36a
EFTA           4     1973          0.557    8.45a
NON-EUROPE     4     1975          0.629    5.61

Critical Values
Group          1%       5%       10%
OECD           12.38    11.56    11.16
EUROPE         10.89    10.00    9.63
EC             9.13     8.35     7.97
NON-EC         8.60     8.01     7.66
EFTA           7.18     6.46     6.11
NON-EUROPE     7.18     6.46     6.11

Note: Superscripts a, b, and c denote rejection of the unit root null at the 1%, 5%, and 10%
significance levels respectively.
V. CONCLUSIONS
The purpose of this chapter was to develop and implement panel unit root tests
in the presence of structural change. To that end, we combine methods from
two previously disjoint literatures: testing for a unit root in panels and testing
for a unit root in the presence of structural change. The resultant test allows for
both serial and contemporaneous correlation, both of which are often found to
be important in the panel unit root context.
The motivation for the test comes from the hypothesis that conventional
panel unit root tests, those that do not incorporate structural change, will have
low power if the data are stationary with structural change. While this is well
established in the univariate literature, it is only a conjecture in the panel
context. We investigate this conjecture by conducting power experiments for
panels of non-trending, stationary series with a one-time change in the mean,
and find that conventional panel unit root tests generally have very low power.
We then conduct the same experiments using methods that test for a unit root
in the presence of structural change, and find that the power of the tests is much
improved.
We apply our test to a data set of annual unemployment rates for 17 OECD
countries from 1955 to 1990. For these countries, unit root tests that do not
incorporate structural change, whether univariate or panel, provide no evidence
against the unit root null. While univariate tests that incorporate structural
change do provide some evidence against unit roots, the short span of the data
suggests that power may be problematic. Using our panel test with a one-time
structural change, we find very strong evidence of regime-wise stationarity.
This evidence is both for the full panel and for a number of smaller subpanels.
Our work could be extended in a number of directions. While the test
incorporates a one-time break in non-trending data, extensions to multiple
breaks and/or trending data would be straightforward. Once variety in the
number of breaks, type of breaks, number of countries, and number of
observations are allowed for, the number of possibilities increases rapidly. With
the availability of programs for calculating critical values, we suspect that it
will be more fruitful to develop tests on a case-by-case basis rather than attempt
to achieve generality.14
NOTES
1. MacKinnon (1991) shows how to calculate critical values for ADF tests for any
sample size.
2. If the coefficient $\alpha$ is not equated across countries, as in Breuer, McNown &
Wallace (2000), the gains in power over univariate methods are much smaller. Im,
Pesaran & Shin (1997) report higher power without equating $\alpha$ across countries, but
their alternative hypothesis is that one member of the panel, rather than all members, is
stationary.
3. If there is no serial correlation (k = 0), or if the k's and c's are constrained to be
equal across countries, as in O'Connell (1998), the FGLS estimator can be iterated to
achieve maximum likelihood. These restrictions, however, rarely (if ever) hold in
practice.
4. For all of the critical value calculations, we generate 50 more observations than
are reported, and then discard the first 50 observations.
5. Innovational outlier models, where the structural change occurs gradually, can
also be estimated.
6. As explained by Perron & Vogelsang (1992), the dummy variables $DTB_{t-i}$ are
included to ensure that the t-statistic on $\alpha$ in equation (4) has the same asymptotic
distribution as in the IO model and is invariant to the value of k.
7. The dummy variable DTBt is included to allow for a change in the mean under the
null.
8. Abuaf and Jorion (1990) conduct panel unit root tests which allow for structural
change, but the time of the break is assumed to be known a priori.
9. The results in Tables 3 and 4 are qualitatively unchanged for TB = T/4 or 3T/4.
10. Froot & Rogoff (1995) show that, if a variable follows a stationary AR(1) process
with a half life of three years, it would take 72 years of annual data to reject the unit
root null using the 5% Dickey-Fuller critical value.
11. The critical values, also reported in Table 6, are calculated for the exact number
of countries and observations in each of the panels, using the Monte Carlo methods
described above.
12. The members of the EC (included in our data) are Belgium, Denmark, France,
Germany, Ireland, Italy, Netherlands, Spain, and the United Kingdom. The EFTA
countries are Austria, Finland, Norway, and Sweden.
13. The critical values are calculated for the exact number of countries and
observations in each of the panels, using the Monte Carlo methods described above.
14. An example is Papell (2000), who develops a panel unit root test in the presence
of three breaks in the slope, but none in the intercept, of the trend function, with further
restrictions imposed for consistency with purchasing power parity.
REFERENCES
Abuaf, N., & Jorion, P. (1990). Purchasing Power Parity in the Long Run. Journal of Finance, 45,
157–174.
Bai, J., & Perron, P. (1998). Estimating and Testing Linear Models with Multiple Structural
Changes. Econometrica, 66, 47–78.
Banerjee, A., Lumsdaine, R. L., & Stock, J. H. (1992). Recursive and Sequential Tests of the Unit
Root and Trend-Break Hypotheses: Theory and International Evidence. Journal of
Bowman, D. (1999). Efficient Tests for Autoregressive Unit Roots in Panel Data. IFDP #646,
Board of Governors of the Federal Reserve System.
Breuer, J., McNown, R., & Wallace, M. (2000). The Quest for Purchasing Power Parity With A
Series-Specific Test using Panel Data. Working paper, Department of Economics,
University of South Carolina.
Campbell, J. Y., & Perron, P. (1991). Pitfalls and Opportunities: What Macroeconomists Should
Know About Unit Roots. In: O. J. Blanchard & S. Fischer (Eds), NBER Macroeconomics
Annual (pp. 141–201). Cambridge: MIT Press.
Froot, K. A., & Rogoff, K. (1995). Perspectives on PPP and Long-Run Real Exchange Rates. In:
G. Grossman & K. Rogoff (Eds), Handbook of International Economics, Vol. 3 (pp. 1647–1688).
Amsterdam: North-Holland.
Hall, A. R. (1994). Testing for a Unit Root in Time Series with Pretest Data-Based Model
Selection. Journal of Business and Economic Statistics, 12, 461–470.
Im, K. S., Pesaran, M. H., & Shin, Y. (1997). Testing for Unit Roots in Heterogeneous Panels. Working
paper, Department of Economics, University of Cambridge.
Layard, R., Nickell, S., & Jackman, R. (1991). Unemployment: Macroeconomic Performance and
The Labour Market. Oxford: Oxford University Press.
Levin, A., & Lin, C. F. (1992). Unit Root Tests in Panel Data: Asymptotic and Finite-Sample
Properties. Discussion paper 92-23, Department of Economics, University of California, San Diego.
Lumsdaine, R. L., & Papell, D. H. (1997). Multiple Trend Breaks and the Unit Root Hypothesis.
Review of Economics and Statistics, 79, 212–218.
a New Simple Test. Oxford Bulletin of Economics and Statistics, 61, 631–652.
MacKinnon, J. G. (1991). Critical Values for Cointegration Tests. In: R. F. Engle & C. W. J.
Granger (Eds), Long-Run Economic Relationships: Readings in Cointegration (pp. 267–276).
Oxford: Oxford University Press.
Ng, S., & Perron, P. (1995). Unit Root Tests in ARMA Models with Data Dependent Methods for
the Selection of the Truncation Lag. Journal of the American Statistical Association, 90,
268–281.
O'Connell, P. G. J. (1998). The Overvaluation of Purchasing Power Parity. Journal of
Papell, D. H. (1997). Searching for Stationarity: Purchasing Power Parity Under the Current Float.
Papell, D. H. (2000). The Great Appreciation, the Great Depreciation, and the Purchasing Power
Parity Hypothesis. Working paper, Department of Economics, University of Houston.
Papell, D. H., Murray, C. J., & Ghiblawi, H. (2000). The Structure of Unemployment. Review of
Perron, P. (1989). The Great Crash, the Oil Price Shock, and the Unit Root Hypothesis.
Perron, P. (1997). Further Evidence on Breaking Trend Functions in Macroeconomic Variables.
Perron, P., & Vogelsang, T. J. (1992). Non-stationarity and Level Shifts With An Application to
Purchasing Power Parity. Journal of Business and Economic Statistics, 10, 301–320.
Zivot, E., & Andrews, D. W. K. (1992). Further Evidence on the Great Crash, the Oil-Price Shock,
and The Unit Root Hypothesis. Journal of Business and Economic Statistics, 10,
251–270.
PANEL DATA LIMIT THEORY AND ASYMPTOTIC ANALYSIS OF A PANEL
REGRESSION WITH NEAR INTEGRATED REGRESSORS
Heikki Kauppi
ABSTRACT
This chapter develops a new limit theory for panel data with large
numbers of cross section, n, and time series, T, observations. The results
apply when n and T tend to infinity simultaneously and provide useful tools
for establishing convergence in probability and in distribution in cases
where the panel data may be cross sectionally heterogeneous in a fairly
general way. We demonstrate how the new theory can be applied to derive
asymptotics for a panel regression where the regressors are generated by a
local to unit root process with localizing coefficients that are heterogeneous
across cross section units.
I. INTRODUCTION
In the last few years much new research has emerged that develops econometric
methods for panel data where both the numbers of cross section and time series
observations are large. This research is motivated by the increasing availability
of important panel data sets that cover large numbers of different countries,
sectors, and individuals over long periods of time. Many of these data sets
ISBN: 0-7623-0688-2
consist of macroeconomic variables that display characteristics resembling
those generated by integrated processes. Accordingly, standard panel methods
cannot be applied to these data, and an appropriate method has to take into
account the possible strong persistence of the data. Therefore, particular
techniques have been developed for testing for unit roots and cointegration in
panel data and for statistical analysis of panel regressions with integrated
regressors. Typical empirical applications of these methods involve estimation
and testing for the existence of long-run relationships between international
financial series such as relative prices and spot and future exchange rates.
The purpose of this chapter is to develop a new panel data limit theory that
can be applied to derive asymptotics for a variety of interesting estimators and
test statistics in the context of models for panel data with large cross sectional
dimension, n, and time series dimension, T. Our new theory assumes that n and
T tend to infinity simultaneously and builds upon the concepts of joint
convergence in probability and in distribution for double indexed processes
developed by Phillips & Moon (1999a). The contribution of the chapter is to
develop new versions of the law of large numbers and the central limit theorem
that apply in panels where the data may be cross sectionally heterogeneous in a
fairly general way.
We demonstrate the usefulness of the new theory in an application where we
study asymptotic inference in a panel regression in which the regressors are
generated by an autoregressive process with a root local to unity. In this
framework, both the regression errors and the errors that drive the autoregressive regressors are specified by a general linear process. The model then
deviates from the previously analyzed panel cointegration regressions only in
that the autoregressive parameters in the regressors are not necessarily exactly
equal to one but rather may be just within a range of near alternatives to unity.
This generalization of earlier models is motivated by the fact that in most
empirical questions in macroeconomics and finance where the new panel
cointegration methods are applied an assumption about exact unit roots can be
considerably uncertain. Given that near unit roots are known to result in severe
inferential problems for the usual time series cointegration methods it is
important to examine related problems in the context of panel data analysis.
Our application of the panel asymptotics reveals the following. First, owing to
biases induced by serial correlation in the errors, the usual pooled panel OLS
estimator is invalid for inference. Second, a corrected version of this estimator
proves to be $\sqrt{n}\,T$-consistent with an asymptotic normal distribution centered on the true
regression parameter irrespective of whether the regressors have near or exact unit
roots. Unfortunately, this positive result only holds in the special case where
the model does not exhibit any deterministic effects, such as individual
intercepts. Third, we derive asymptotics for the pooled panel
fully modified estimator of Phillips & Moon (1999a), who assumed exact unit
roots. The asymptotic results show that this estimator is subject to severe bias
effects if the regressors are nearly rather than exactly integrated. Our
theoretical findings are illustrated by small sample simulations. Overall, the
analysis indicates that near unit roots are in general likely to result in
insuperable inferential problems even in the context of panel data analysis.
The organization of the chapter is as follows. The new limit theorems are
given in Section II. Section III presents the applications of the panel
asymptotics, while concluding remarks are given in Section IV. Proofs of the
theorems are in the appendix.
II. THEORY
In panel data limit theory we consider a double indexed process $X_{n,T}$, in which
both n and T tend to infinity. In general, the limit of $X_{n,T}$ depends on the
treatment of the indices n and T, and the properties that link the two dimensions
of the process. Phillips & Moon (1999a) discuss different approaches. One
possibility is to allow n and T to pass to infinity along a diagonal path
determined by a monotonically increasing functional relation of the type
T = T(n) as the index $n \to \infty$. This approach simplifies the asymptotic theory by
replacing $X_{n,T}$ with a single indexed process $X_{n,T(n)}$. However, a drawback of this
diagonal path limit theory is that the assumed expansion path $(n, T(n)) \to \infty$
may not provide an appropriate approximation for a given (n, T) situation.
Furthermore, the limit theory is likely to depend on the specific functional
relation T = T(n) that is used in the asymptotic development. Following Phillips
& Moon (1999a) we therefore focus on an alternative approach where n and T
are allowed to tend to infinity simultaneously without imposing a specific
diagonal path for the divergence of the indices.
Merely as an auxiliary tool, we also consider a special form of multi-index
asymptotics, called the sequential limit theory. Again, this theory is introduced
by Phillips & Moon (1999a). The general idea of this approach is to derive limit
results in two steps. The first step is to fix one index, say n, and allow the other,
say T, to pass to infinity, giving an intermediate limit. The final limit result is
then obtained by letting n tend to infinity subsequently. While the sequential
limit theory can offer an easy route to a limit result it may give asymptotic
results that are misleading in cases where both indices tend to infinity
simultaneously (see Phillips & Moon (1999b)). Nevertheless, this theory can
often serve as a helpful tool to obtain conjectures about limit results that hold
under the more general joint limit theory.
In this section, we consider a general double indexed process of the form
$$X_{n,T} = \frac{1}{k_n}\sum_{i=1}^{n} Y_{i,T},$$
where the $Y_{i,T}$ are independent random vectors across i and $k_n$ is either $n$ or $\sqrt{n}$.
A typical $Y_{i,T}$ component is a standardized sum of the time series component
of the panel data. Examples are given in the following section. To this end,
suppose we are interested in the probability limit of $X_{n,T} = \frac{1}{n}\sum_{i=1}^{n} Y_{i,T}$. Assume
$Y_{i,T} \Rightarrow Y_i$ as $T \to \infty$ for all i. Then, by the independence of $Y_{i,T}$ across i for all T,
it follows that $X_{n,T} \Rightarrow X_n$ as $T \to \infty$ for all n, where $X_n = \frac{1}{n}\sum_{i=1}^{n} Y_i$. Here it should
be noticed that one has to assume that the $Y_i$ are defined on the same probability
space for all i so that the sum of the limit random variables $\frac{1}{n}\sum_{i=1}^{n} Y_i$ is well
defined on the same probability space. This can be justified as shown by
Phillips & Moon (1999a, Appendix B). By allowing $n \to \infty$ and applying an
appropriate law of large numbers to $X_n = \frac{1}{n}\sum_{i=1}^{n} Y_i$ we may then find the
sequential limit of $X_{n,T}$. Let $X = \lim_{n} \frac{1}{n}\sum_{i=1}^{n} E(Y_i)$ exist and be finite. Then,
$X_n \to_p X$ so that as $T \to \infty$ followed by $n \to \infty$,
$$X_{n,T} \to_p X.$$
This is a sequential probability limit result in the sense defined by Phillips &
Moon (1999a).
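The two-step logic above can be illustrated numerically. The following is a minimal sketch, not part of the chapter: it assumes the particular choice $Y_{i,T} = T^{-2}\sum_{t} x_{i,t}^2$ with $x_{i,t}$ a Gaussian random walk, for which $E(Y_{i,T}) = (T+1)/(2T)$ and the limit variate has mean 1/2. The function names `scaled_sum_of_squares`, `panel_average` and `exact_mean` are hypothetical.

```python
import random


def scaled_sum_of_squares(T, rng):
    """Y_{i,T} = T^{-2} * sum_t x_t^2 for a driftless Gaussian random walk x_t."""
    x = 0.0
    total = 0.0
    for _ in range(T):
        x += rng.gauss(0.0, 1.0)  # x_t = x_{t-1} + e_t, e_t ~ N(0, 1)
        total += x * x
    return total / (T * T)


def panel_average(n, T, seed=0):
    """X_{n,T} = n^{-1} * sum_i Y_{i,T}, with units drawn independently."""
    rng = random.Random(seed)
    return sum(scaled_sum_of_squares(T, rng) for _ in range(n)) / n


def exact_mean(T):
    """E(Y_{i,T}) = T^{-2} * sum_{t=1}^{T} t = (T + 1) / (2 T), tending to 1/2."""
    return (T + 1) / (2.0 * T)


if __name__ == "__main__":
    for n, T in [(50, 50), (200, 100), (800, 200)]:
        print(n, T, round(panel_average(n, T, seed=1), 4), exact_mean(T))
```

For moderate n and T the simulated average should sit close to 1/2, which is the value suggested by first letting T grow and then averaging over i. The simulation cannot by itself establish that the joint limit coincides with the sequential one.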
In general, the sequential probability limit X of $X_{n,T}$ is not the same as the
probability limit of $X_{n,T}$ under joint convergence of the indices n, T; the joint
limit may not even exist, or may require a different normalization. Examples are given in Phillips
& Moon (1999b). Therefore, an interesting question arises: when does the
sequential limit coincide with the joint limit? The following theorem is adapted
from Phillips & Moon (1999a, Theorem 1) and gives sufficient conditions
under which the joint probability limit and the sequential probability limit are
identical. Hereafter, we denote by $(n, T \to \infty)$ the joint limit as $T \to \infty$ and $n \to \infty$
simultaneously. Also, note that below $\Rightarrow$ denotes weak convergence of the
associated probability measure, $\|A\|$ is the usual notation for the Euclidean
norm $\mathrm{tr}(A'A)^{1/2}$ of a matrix A, $1\{\cdot\}$ denotes an indicator function, and
$\limsup_{n,T} x_{n,T}$ signifies the superior limit of a sequence $\{x_{n,T}\}$ when joint
convergence is considered.
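Theorem 1. Suppose the random $(k \times 1)$ vectors $Y_{i,T}$ are independent across i
for all T and integrable. Assume that $Y_{i,T} \Rightarrow Y_i$ as $T \to \infty$ for all i. Let the
following conditions hold:
(i) $\limsup_{n,T} \frac{1}{n}\sum_{i=1}^{n} E\|Y_{i,T}\| < \infty$,
(ii) $\limsup_{n,T} \frac{1}{n}\sum_{i=1}^{n} \|E(Y_{i,T}) - E(Y_i)\| = 0$,
(iii) $\limsup_{n,T} \frac{1}{n}\sum_{i=1}^{n} E\|Y_{i,T}\|\,1\{\|Y_{i,T}\| > n\epsilon\} = 0$ for all $\epsilon > 0$,
(iv) $\limsup_{n} \frac{1}{n}\sum_{i=1}^{n} E\|Y_i\|\,1\{\|Y_i\| > n\epsilon\} = 0$ for all $\epsilon > 0$.
If $\lim_{n} \frac{1}{n}\sum_{i=1}^{n} E(Y_i) = X$ exists and $X_n = \frac{1}{n}\sum_{i=1}^{n} Y_i \to_p X$ as $n \to \infty$, then
$$X_{n,T} = \frac{1}{n}\sum_{i=1}^{n} Y_{i,T} \to_p X \quad \text{as } (T, n \to \infty).$$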
Theorem 1 gives fairly general conditions under which a joint probability limit
can be established. However, in many cases it may be rather tedious to verify
all the required conditions (i) through (iv) of the theorem. As shown by
Corollary 1 of Phillips & Moon (1999a) somewhat easier conditions can be
obtained in the special case where the $Y_{i,T}$ are scaled variates of an iid process.
However, there are certainly various interesting situations where the heterogeneity
of the different panel members arises from other sources so that Corollary
1 of Phillips & Moon (1999a) cannot be applied. Therefore, for dealing with
heterogeneous panels of other types we have designed the following theorem.
The basic idea of Theorem 2 arises from Markov's law of large numbers that
applies in the case of independent variates $Z_i$ satisfying Markov's condition,
$E\|Z_i\|^{1+\delta} \le M < \infty$ for some $\delta > 0$ and for all i.
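Theorem 2. Suppose that the random $(k \times 1)$ vectors $Y_{i,T}$ are independent
across i for all T and integrable. Assume that $Y_{i,T} \Rightarrow Y_i$ as $T \to \infty$ for all i. Let
the following conditions hold:
(a) $\sup_i \|E(Y_{i,T}) - E(Y_i)\| \to 0$ as $T \to \infty$,
(b) $\sup_T E\|Y_{i,T}\|^{1+\delta} \le M < \infty$ for some $\delta > 0$ and for all i.
If $\lim_{n} \frac{1}{n}\sum_{i=1}^{n} E(Y_i) = X$ exists, then
$$\frac{1}{n}\sum_{i=1}^{n} Y_{i,T} \to_p X \quad \text{as } (T, n \to \infty).$$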
We turn to consider conditions under which we can obtain convergence in
distribution as $(n, T \to \infty)$. As in the case of the probability limit, we can often
easily derive a sequential weak convergence result for $X_{n,T} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} Y_{i,T}$, say.
(Examples are given in Phillips & Moon (1999a, b).) As to how to obtain
convergence in joint limits as $(T, n \to \infty)$, again, Phillips & Moon (1999a) give
some general results. Their Theorem 2 provides a joint central limit theorem for
$(T, n \to \infty)$ that employs a Lindeberg condition for double indexed processes. In
addition, their Theorem 3 gives a version which applies to iid variates scaled
differently across cross section. Again, to deal with other types of heterogeneities
across cross section we have developed the following version of the joint
central limit theorem.
Theorem 3. Suppose that $Y_{i,T}$ are independent scalar variables across i for all
T with $E(Y_{i,T}) = 0$ and $\mathrm{Var}(Y_{i,T}) = V_{i,T}$. Assume the following conditions hold:
(i) $\lim_{n,T} \frac{1}{n}\sum_{i=1}^{n} V_{i,T} = V$ is finite and positive,
(ii) $\sup_T E|Y_{i,T}|^{2+\delta} \le M < \infty$ for some $\delta > 0$ and for all i.
Then,
$$X_{n,T} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} Y_{i,T} \Rightarrow N(0, V) \quad \text{as } (T, n \to \infty).$$
The basic idea of Theorem 3 is to employ a Lyapunov condition to guarantee
that the Lindeberg condition holds. The corresponding vector case can be
handled by using Theorem 3 and the Cramér-Wold device.
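The step from a Lyapunov-type moment bound to a Lindeberg-type condition can be written out in one line. The display below is a sketch of the standard argument, not the chapter's proof. It assumes the Lindeberg-type quantity takes the form shown on the left, and that $\sup_T E|Y_{i,T}|^{2+\delta} \le M < \infty$ for some $\delta > 0$ and all i. The exact form of the Lindeberg condition used by Phillips & Moon (1999a) may differ in detail.

```latex
% Lyapunov-type moment bound implies a Lindeberg-type condition (scalar case).
% On the event {|Y| > \epsilon \sqrt{n}} we have Y^2 <= |Y|^{2+\delta} / (\epsilon \sqrt{n})^{\delta}.
\begin{align*}
\frac{1}{n}\sum_{i=1}^{n} E\!\left[Y_{i,T}^{2}\,\mathbf{1}\{|Y_{i,T}| > \epsilon\sqrt{n}\}\right]
 &\le \frac{1}{n}\sum_{i=1}^{n} \frac{E|Y_{i,T}|^{2+\delta}}{(\epsilon\sqrt{n})^{\delta}} \\
 &\le \frac{M}{\epsilon^{\delta}\, n^{\delta/2}} \;\longrightarrow\; 0
 \quad \text{as } n \to \infty, \text{ uniformly in } T .
\end{align*}
```

Because the bound does not depend on T, it holds along any path on which n and T diverge together, which is what a joint limit requires.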
III. AN APPLICATION
Most of the recent applications of the new large n, T panel data limit theory have
involved studying and developing estimators and tests for panel cointegrating
regressions where the regressors are integrated of order one. In this section we
analyze problems that arise in these models when the regressors are nearly
rather than exactly integrated of order one. We start by introducing the model
and assumptions.
A. The Model
We focus on the simple two variable panel regression
$$y_{i,t} = \beta x_{i,t} + u_{i,t}, \tag{1}$$
$$x_{i,t} = \rho_i x_{i,t-1} + \varepsilon_{i,t}, \qquad \rho_i = \exp(c_i/T) \approx 1 + \frac{c_i}{T}, \tag{2}$$
(t = 1, . . . , T, i = 1, . . . , n),
where the initial values $z_{i,0} = (y_{i,0}, x_{i,0})'$ are iid, $E\|z_{i,0}\|^4 < \infty$, and the errors are
specified below. To this end, notice that if $\rho_i = 1$ (i.e. $c_i = 0$) in (2) for each i, then
the $x_{i,t}$ are pure or exact unit root processes and the system given by equations
(1) and (2) coincides with the homogeneous panel cointegration regression
studied by Phillips & Moon (1999a) and many others (for a survey, see Phillips
& Moon (1999b)). In these studies the regression coefficient $\beta$ in (1) is called a
cointegrating parameter and it represents a stationary relationship that holds
between $y_{i,t}$ and $x_{i,t}$ for every i. Such a common long-run relationship is often
predicted by economic theory and it is then of central interest to estimate $\beta$ and
test whether it satisfies theoretically sound restrictions. A typical example
involves testing for the existence of purchasing power parity in a
panel of suitably similar countries.
In contrast to the recent panel cointegration literature, we do not restrict
attention to models, where the regressors are generated by exact unit root
processes. Indeed, although most macroeconomic variables analyzed in the
recent panel cointegration studies display strong autocorrelation, there are
seldom strong prior reasons why the autoregressive parameter should be unity.
The problem is aggravated by the fact that unit root tests cannot reliably detect
small deviations from unity. Given this uncertainty about the unit roots, it is of
interest to study problems that arise in the statistical inference about the
regression parameter in (1) when the autoregressive parameters in (2) are close
to rather than exactly equal to one. From earlier literature we know that such
problematic near alternatives are best modeled by the local to unit root
parametrization $\rho_i = \exp(c_i/T) \approx 1 + c_i/T$ in (2) (see e.g. Elliott (1998) and Stock
(1997)). By this device it is possible to obtain asymptotic results that provide
reasonable approximations in cases where the regressors $x_{i,t}$ are stationary but
revert to their means so slowly that the standard fixed $\rho_i$ asymptotics fail to
attain satisfactory accuracy.
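To make the local to unity device concrete, the sketch below generates one regressor path from the autoregression in (2) and reports the implied half life of a shock. It is an illustration only: Gaussian iid innovations and a zero initial value are assumptions made here for simplicity (the chapter allows general linear processes), and the function names are hypothetical.

```python
import math
import random


def near_integrated_path(T, c, seed=0, x0=0.0):
    """Simulate x_t = rho * x_{t-1} + e_t with rho = exp(c / T) and e_t ~ N(0, 1)."""
    rho = math.exp(c / T)
    rng = random.Random(seed)
    path = []
    x = x0
    for _ in range(T):
        x = rho * x + rng.gauss(0.0, 1.0)
        path.append(x)
    return rho, path


def half_life(rho):
    """Periods needed for a shock to decay by half; infinite when rho >= 1."""
    if rho >= 1.0:
        return math.inf
    return math.log(0.5) / math.log(rho)


if __name__ == "__main__":
    for T in (50, 100, 400):
        rho, _ = near_integrated_path(T, c=-5.0)
        # With rho = exp(c / T) the half life equals T * ln(2) / |c|.
        print(T, round(rho, 4), round(half_life(rho), 2))
```

Under this parametrization the half life grows in proportion to T, so mean reversion stays slow relative to the sample length however large T becomes. This is the sense in which fixed-root asymptotics can be a poor guide.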
We close this section by imposing the following assumption.
Assumption 1. The errors $\xi_{i,t} = (u_{i,t}, \varepsilon_{i,t})'$ are linear processes satisfying the
following conditions:
(a) $\xi_{i,t} = C(L)\eta_{i,t} = \sum_{j=0}^{\infty} C_j \eta_{i,t-j}$, where $\sum_{j=0}^{\infty} j^{3}\|C_j\| < \infty$,
(b) $\eta_{i,t} = (\nu_{i,t}, w_{i,t})'$, where $\nu_{i,t}$ and $w_{i,t}$ are mutually independent and iid across
i and over t with $E(\nu_{i,t}) = E(w_{i,t}) = 0$, $E(\nu_{i,t}^2) = E(w_{i,t}^2) = 1$, and $E(\nu_{i,t}^4) =
E(w_{i,t}^4) = \mu_4 < \infty$ for all i and t.
Under Assumption 1 the error process in the system (1) and (2) satisfies the same
conditions as the error process of the homogeneous panel cointegration
regression of Phillips & Moon (1999a, Assumptions 8 and 9).
B. Preliminary Analysis
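For preliminary insights, we derive sequential limits for the pooled panel OLS
estimator,
$$\hat\beta = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T} x_{i,t}\,y_{i,t}}{\sum_{i=1}^{n}\sum_{t=1}^{T} x_{i,t}^{2}}. \tag{3}$$
Let [Tr] denote the integer part of Tr. From Phillips & Solo (1992), we know
that under Assumption 1, the partial sum process $\frac{1}{\sqrt{T}}\sum_{t=1}^{[Tr]} \xi_{i,t}$ converges weakly
to a two dimensional Brownian motion $B_i(r) = (B_{ui}(r), B_{\varepsilon i}(r))'$, $(0 \le r \le 1)$, with
the long-run covariance matrix $\Omega = \sum_{j=-\infty}^{\infty} E(\xi_{i,j}\xi_{i,0}')$, which we partition
$\Omega = [\Omega_{kl}]$, $(k, l = u, \varepsilon)$. Furthermore, by the well known limit theory for near
integrated processes (e.g. Phillips (1987, 1988)) as $T \to \infty$,
$$\frac{1}{T^{2}}\sum_{t=1}^{T} x_{i,t}^{2} \Rightarrow \int_0^1 K_{c_i}(r)^{2}\,dr, \tag{4}$$
$$\frac{1}{T}\sum_{t=1}^{T} x_{i,t}\,u_{i,t} \Rightarrow \int_0^1 K_{c_i}(r)\,dB_{ui}(r) + \Lambda_{u\varepsilon}, \tag{5}$$
where $\Lambda_{u\varepsilon}$ is a non-diagonal element of the one sided long-run covariance
matrix $\Lambda = \sum_{j=0}^{\infty} E(\xi_{i,j}\xi_{i,0}') = [\Lambda_{kl}]$, $(k, l = u, \varepsilon)$, and $K_{c_i}(r) = \int_0^r e^{(r-s)c_i}\,dB_{\varepsilon i}(s)$,
$(0 \le r \le 1)$, is an Ornstein-Uhlenbeck process. Given (4) and (5) we may deduce
for fixed n as $T \to \infty$,
$$T(\hat\beta - \beta) \Rightarrow \left[\frac{1}{n}\sum_{i=1}^{n}\int_0^1 K_{c_i}(r)^{2}\,dr\right]^{-1}\left[\frac{1}{n}\sum_{i=1}^{n}\left(\int_0^1 K_{c_i}(r)\,dB_{ui}(r) + \Lambda_{u\varepsilon}\right)\right]. \tag{6}$$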
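This result provides the first step for obtaining sequential asymptotics for (3).
The second step is to derive the limit of the right hand side of (6) as $n \to \infty$. For
simplicity assume $c_i = c$ for all i. Then, notice that the $\int_0^1 K_{c_i}(r)\,dB_{ui}(r)$ are iid
with mean zero and variance
$$E\left(\int_0^1 K_{c_i}(r)\,dB_{ui}(r)\right)^{2} = \Omega_{uu}\,\Omega_{\varepsilon\varepsilon}\int_0^1\!\!\int_0^r e^{2(r-s)c}\,ds\,dr < \infty, \tag{7}$$
where the equality follows from well known results for stochastic integrals.
Consequently, we may apply the strong law of large numbers to obtain
$$\frac{1}{n}\sum_{i=1}^{n}\int_0^1 K_{c_i}(r)\,dB_{ui}(r) \to_{as} 0, \quad \text{as } n \to \infty,$$
where $\to_{as}$ denotes almost sure convergence. Furthermore, the $\int_0^1 K_{c_i}(r)^{2}\,dr$ are
also iid,
$$E\int_0^1 K_{c_i}(r)^{2}\,dr = \Omega_{\varepsilon\varepsilon}\int_0^1\!\!\int_0^r e^{2(r-s)c}\,ds\,dr > 0, \tag{8}$$
and $E\left(\int_0^1 K_{c_i}(r)^{2}\,dr\right)^{2} < \infty$. Thus, we may deduce that the denominator on the
right hand side of (6) converges almost surely to $\Omega_{\varepsilon\varepsilon}\int_0^1\!\int_0^r e^{2(r-s)c}\,ds\,dr$, as
$n \to \infty$. In view of these results, we may now conclude that as $T \to \infty$ followed
by $n \to \infty$,
$$T(\hat\beta - \beta) \to_p \left[\Omega_{\varepsilon\varepsilon}\int_0^1\!\!\int_0^r e^{2(r-s)c}\,ds\,dr\right]^{-1}\Lambda_{u\varepsilon}. \tag{9}$$
This result indicates that although $\hat\beta$ is consistent it is subject to a second order
bias effect arising from temporal correlation between the system errors $u_{i,t}$ and
$\varepsilon_{i,t}$. Note that if $\rho_i = 1$ in (2), the bias term in (9) still exists and actually
becomes equal to $2\Lambda_{u\varepsilon}/\Omega_{\varepsilon\varepsilon}$. In contrast, if $\Lambda_{u\varepsilon} = 0$, there is no asymptotic bias in
the estimation error of $\hat\beta$ irrespective of the values of the localizing parameters
$c_i$ in (2). In fact, if $\Lambda_{u\varepsilon} = 0$, we obtain the sequential weak convergence result
$$\sqrt{n}\,T(\hat\beta - \beta) \Rightarrow N(0, V), \tag{10}$$
where
$$V = \Omega_{uu}\left[\Omega_{\varepsilon\varepsilon}\int_0^1\!\!\int_0^r e^{2(r-s)c}\,ds\,dr\right]^{-1}.$$
The latter limiting result essentially follows from the fact that
$\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\int_0^1 K_{c_i}(r)\,dB_{ui}(r)$
is asymptotically normally distributed with zero mean and variance given in
(7).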
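The double integral that appears in (7) through (10) has a simple closed form, $\int_0^1\!\int_0^r e^{2(r-s)c}\,ds\,dr = (e^{2c} - 1 - 2c)/(4c^2)$, which tends to 1/2 as c tends to zero. The sketch below evaluates it and the implied second order bias of pooled OLS under a common localizing parameter. The function names and the arguments `lam_ue` and `omega_ee` (standing for the one sided long-run covariance between the two errors and the long-run variance of the regressor innovation) are hypothetical, chosen only for this illustration.

```python
import math


def ou_second_moment(c):
    """g(c) = int_0^1 int_0^r exp(2 (r - s) c) ds dr = (exp(2c) - 1 - 2c) / (4 c^2)."""
    if abs(c) < 1e-4:
        # Taylor expansion around c = 0 avoids catastrophic cancellation.
        return 0.5 + c / 3.0 + c * c / 6.0
    return (math.expm1(2.0 * c) - 2.0 * c) / (4.0 * c * c)


def pooled_ols_bias(lam_ue, omega_ee, c):
    """Sequential limit of T * (beta_hat - beta): lam_ue / (omega_ee * g(c))."""
    return lam_ue / (omega_ee * ou_second_moment(c))


if __name__ == "__main__":
    for c in (0.0, -1.0, -5.0, -20.0):
        print(c, ou_second_moment(c), pooled_ols_bias(0.3, 1.5, c))
```

At c = 0 the bias equals twice the ratio of the two long-run quantities, in line with the exact unit root case discussed above. For more negative c the function g(c) shrinks, so the same amount of temporal correlation translates into a larger limiting bias.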
C. Serial Correlation Corrected Estimation
In view of the above analysis we may conjecture that the asymptotics in (10)
can be attained even when $\Lambda_{u\varepsilon} \ne 0$ provided that we have a suitable estimator for
$\Lambda_{u\varepsilon}$. One alternative is to use the kernel estimation strategy that is used in the
pooled fully modified (PFM) estimator of Phillips & Moon (1999a). The PFM
estimator will be introduced in the subsequent section and it employs the
averaged kernel estimators $\hat\Omega = [\hat\Omega_{kl}]$ and $\hat\Lambda = [\hat\Lambda_{kl}]$, $(k, l = u, \varepsilon)$, of $\Omega$ and $\Lambda$,
respectively, defined by
$$\hat\Omega = \frac{1}{n}\sum_{i=1}^{n}\hat\Omega_i, \quad \hat\Omega_i = \sum_{j=-T+1}^{T-1}\omega(j/K)\,\hat\Gamma_i(j), \qquad \hat\Lambda = \frac{1}{n}\sum_{i=1}^{n}\hat\Lambda_i, \quad \hat\Lambda_i = \sum_{j=0}^{T-1}\omega(j/K)\,\hat\Gamma_i(j). \tag{11}$$
Here $\hat\Gamma_i(j) = \frac{1}{T}\sum \xi_{i,t+j}\,\xi_{i,t}'$, where the summation is over $1 \le t, t + j \le T$, while
$\omega(j/K)$ is a lag kernel for which $\omega(0) = 1$, $\omega(x) = \omega(-x)$, $\int_{-\infty}^{\infty}\omega(x)^{2}\,dx < \infty$, and
with Parzen's exponent $q \in (0, \infty)$ such that $k_q = \lim_{x \to 0}\frac{1 - \omega(x)}{|x|^{q}} < \infty$. As to
applicable lag kernel functions and the choice of the bandwidth parameter K we
follow Phillips & Moon (1999a) and impose the following assumption.
Assumption 2. The lag kernel $\omega(j/K)$ in (11) has Parzen exponent $q > \frac{1}{2}$, and
the bandwidth parameter K tends to infinity with $K/T \to 0$ and $K^{2q}/T \to \epsilon > 0$, as
$T \to \infty$.
Remark 1. Under Assumption 2 the normalized estimation errors $\sqrt{n}(\hat\Omega - \Omega)$
and $\sqrt{n}(\hat\Lambda - \Lambda)$ converge in probability to zero. This result was stated in
Phillips & Moon (1999a, Proof of Theorem 9) and holds as $(T, n \to \infty)$ with
$n/T \to 0$. This result is employed in the proofs of the theorems given below.
Remark 2. Notice that the kernel estimators defined in (11) are not feasible,
since they employ the unknown errors $\xi_{i,t} = (u_{i,t}, \varepsilon_{i,t})'$. A natural approach to
estimate $u_{i,t}$ and $\varepsilon_{i,t}$ is to use the residuals $\hat u_{i,t} = y_{i,t} - \hat\beta x_{i,t}$, from a preliminary
pooled panel OLS regression, and the differences $\Delta x_{i,t}$, respectively. It is easy
to show that the associated estimation errors for $u_{i,t}$ and $\varepsilon_{i,t}$ are of orders of
magnitude $T^{-1}$ and $T^{-1/2}$, respectively. In view of this and Remark 1 we may
then expect that under the assumptions of this chapter and irrespective of whether
the $x_{i,t}$ in (2) have exact or near unit roots, the use of $\hat u_{i,t}$ and $\Delta x_{i,t}$ in place of
$u_{i,t}$ and $\varepsilon_{i,t}$, respectively, has no effect on the rate of consistency of the kernel
estimators in (11). However, following Phillips & Moon (1999a), we proceed
by working with the true errors $\xi_{i,t}$, since we want to avoid any further
technical complications that might arise in an asymptotic analysis where the
kernel estimators in (11) use the estimates $\hat u_{i,t}$ and $\Delta x_{i,t}$ in place of $u_{i,t}$ and $\varepsilon_{i,t}$,
respectively.
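The kernel estimators in (11) can be sketched for a single off-diagonal element as follows. This is an illustration under stated assumptions rather than the chapter's implementation: the Bartlett kernel is chosen here only because it is simple (its Parzen exponent is q = 1), the inputs are plain lists of error series, and all function names are hypothetical.

```python
def bartlett(x):
    """Bartlett lag kernel: 1 - |x| on [-1, 1] and zero outside."""
    return max(0.0, 1.0 - abs(x))


def cross_autocov(a, b, j):
    """(1/T) * sum_t a[t + j] * b[t], over indices with 0 <= t, t + j < T."""
    T = len(a)
    lo, hi = max(0, -j), min(T, T - j)
    return sum(a[t + j] * b[t] for t in range(lo, hi)) / T


def one_sided_lrcov(a, b, K, kernel=bartlett):
    """Kernel-weighted sum of cross autocovariances over lags j = 0, ..., T - 1."""
    T = len(a)
    return sum(kernel(j / K) * cross_autocov(a, b, j) for j in range(0, T))


def two_sided_lrcov(a, b, K, kernel=bartlett):
    """Kernel-weighted sum over lags j = -T + 1, ..., T - 1."""
    T = len(a)
    return sum(kernel(j / K) * cross_autocov(a, b, j) for j in range(-T + 1, T))


def panel_average(series_a, series_b, K, estimator):
    """Average a per-unit estimator over the n cross-section units."""
    n = len(series_a)
    return sum(estimator(a, b, K) for a, b in zip(series_a, series_b)) / n
```

Calling `panel_average(u, e, K, one_sided_lrcov)` with `u` and `e` as lists of per-unit error series would give an averaged one sided estimate of the kind used for the bias correction. Replacing the true errors by residuals and first differences, as Remark 2 describes, gives a feasible version.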
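Now we are ready to define a robust estimator for $\beta$,
$$\hat\beta^{*} = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T} x_{i,t}\,y_{i,t} - nT\,\hat\Lambda_{u\varepsilon}}{\sum_{i=1}^{n}\sum_{t=1}^{T} x_{i,t}^{2}}, \tag{12}$$
where $\hat\Lambda_{u\varepsilon}$ is given in (11). The estimator in (12) is called a serial correlation
corrected pooled panel estimator.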
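The correction amounts to subtracting nT times the estimated one sided long-run covariance from the pooled cross product before dividing by the pooled sum of squares. A minimal sketch follows, assuming a balanced panel stored as lists of per-unit lists. The argument `lam_ue` stands for an externally supplied estimate of that covariance, and the function names are hypothetical.

```python
def pooled_ols(x, y):
    """Pooled OLS slope without intercept: sum_i sum_t x*y / sum_i sum_t x^2."""
    num = sum(xt * yt for xi, yi in zip(x, y) for xt, yt in zip(xi, yi))
    den = sum(xt * xt for xi in x for xt in xi)
    return num / den


def corrected_pooled_ols(x, y, lam_ue):
    """Serial correlation corrected slope: (sum x*y - n*T*lam_ue) / sum x^2."""
    n = len(x)
    T = len(x[0])  # balanced panel assumed: every unit has T observations
    num = sum(xt * yt for xi, yi in zip(x, y) for xt, yt in zip(xi, yi))
    den = sum(xt * xt for xi in x for xt in xi)
    return (num - n * T * lam_ue) / den


if __name__ == "__main__":
    x = [[1.0, 2.0], [3.0, 4.0]]
    y = [[2.0, 4.0], [6.0, 8.0]]
    print(pooled_ols(x, y), corrected_pooled_ols(x, y, lam_ue=0.5))
```

With a zero correction the two functions agree, which mirrors the observation that no adjustment is needed when the errors carry no temporal cross correlation.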
We turn to establish the joint asymptotics of the new estimator in (12). Let
$J_{c_i}(r) = \int_0^r e^{(r-s)c_i}\,dW_i(s)$, where $W_i(r)$ is a standard Brownian motion. Hereafter,
we assume that the values of $c_i$ are uniformly bounded and such that the
arithmetic mean of the expected values of $\int_0^1 J_{c_i}(r)^{2}\,dr$ converges to a positive
finite number, i.e.
$$\lim_{n}\frac{1}{n}\sum_{i=1}^{n} E\int_0^1 J_{c_i}(r)^{2}\,dr = \lim_{n}\frac{1}{n}\sum_{i=1}^{n}\int_0^1\!\!\int_0^r e^{2(r-s)c_i}\,ds\,dr = \Phi_{xx}$$
exists and is finite by assumption. The latter condition is not restrictive and
basically means that we assume that the appropriately normalized sample
second moment of the pooled regressors $x_{i,t}$, i.e. $\frac{1}{nT^{2}}\sum_{i=1}^{n}\sum_{t=1}^{T} x_{i,t}^{2}$, converges in
probability.
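Theorem 4. Suppose Assumptions 1 and 2 hold and that data are generated by
(1) and (2) with $c_i$ such that $\sup_i |c_i| \le \bar c < \infty$. Then under joint limits as
$(T, n \to \infty)$ with $n/T \to 0$
$$\sqrt{n}\,T(\hat\beta^{*} - \beta) \Rightarrow N(0, V^{*}),$$
where
$$V^{*} = \frac{\Omega_{uu}\,\Omega_{\varepsilon\varepsilon}^{-1}}{\Phi_{xx}}.$$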
As is apparent from Theorem 4 the serial correlation corrected pooled panel
OLS estimator has indeed very desirable properties. It is $\sqrt{n}\,T$-consistent,
asymptotically normal and free of asymptotic biases irrespective of whether the
regressors $x_{i,t}$ in (2) have exact or near unit roots in their generating
mechanisms. This is a remarkable improvement that can be gained if panel
data are used, since none of the existing time series estimators for cointegrating
parameters can achieve these features. Rather, as shown e.g. by Elliott (1998)
the time series cointegration regression estimators tend to suffer from second
order biases unless the regressors are generated by exact unit root processes,
and these biases lead to severe size distortions in hypothesis testing. In contrast,
we will show below that by the use of the serial correlation corrected pooled
panel OLS estimator we can achieve robust inferences in fairly general
situations where individual regressors may have roots that vary heterogeneously
within a range of values near one.
Unfortunately, the situation turns out less hopeful if the panel regression in
(1) includes individual intercepts or if the data exhibit linear or higher order
time trends. While there is a natural way to modify the new serial correlation
corrected pooled OLS estimator to take these effects into account, it turns out
that in these cases near unit roots result in nuisance parameters that produce
bias effects in the asymptotics of the estimator. To see why this happens
suppose the regression in (1) includes an intercept that may vary across
individuals. This suggests the use of demeaned data in the formula of the
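estimator. Accordingly, modify (12) to the form
$$\hat\beta^{*} = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T} \tilde x_{i,t}\,\tilde y_{i,t} - nT\,\hat\Lambda_{u\varepsilon}}{\sum_{i=1}^{n}\sum_{t=1}^{T} \tilde x_{i,t}^{2}}, \tag{13}$$
where $\tilde y_{i,t} = y_{i,t} - \bar y_i$ and $\tilde x_{i,t} = x_{i,t} - \bar x_i$, with $\bar y_i = \frac{1}{T}\sum_{t=1}^{T} y_{i,t}$ and $\bar x_i = \frac{1}{T}\sum_{t=1}^{T} x_{i,t}$,
respectively.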
The asymptotic properties of the estimator in (13) are easily found by
employing the sequential limit theory. To reveal the most essential part of this
exercise note that we have
$$\frac{1}{T}\sum_{t=1}^{T} \tilde x_{i,t}\,\tilde u_{i,t} \Rightarrow \int_0^1 \tilde K_{c_i}(r)\,dB_{ui}(r) + \Lambda_{u\varepsilon}, \tag{14}$$
where $\tilde K_{c_i}(r)$ is a demeaned Ornstein-Uhlenbeck process defined by $\tilde K_{c_i}(r) =
K_{c_i}(r) - \int_0^1 K_{c_i}(s)\,ds$. Now, while the temporal correlation correction in (13) can
still remove the bias effects that arise from the presence of $\Lambda_{u\varepsilon}$ on the right hand
side of (14), the remaining term, i.e. $\int_0^1 \tilde K_{c_i}(r)\,dB_{ui}(r)$, no longer has a zero
mean, in contrast with the case in (5), where we had $K_{c_i}(r)$ in place of $\tilde K_{c_i}(r)$.
In fact,
E
c (r)dBu (r) = u

K
i
i
and we thus obtain

i=1
1
n

e(r s)cidsdr
p
c (r)dBu (r)
uxx, as n ,
K
i
i
where xx is given above. In view of this result it is easy to see that the
estimator in (13) is subject to an asymptotic bias, which depends on the
nuisance parameters $c_i$. Unfortunately, no technique is currently available that
would provide consistent estimates for the individual localizing coefficients $c_i$.
Only in the special case where the localizing coefficients are the same across i
may we use the cross sectional dimension of the panel to provide consistent
estimates for the common localizing coefficient (see Moon & Phillips (1999)).
This fact opens a possibility for correcting the bias effects. However, such a
correction may be rather complicated and is restricted to cases where the
common c is well below zero (cf. Moon & Phillips (1999)). While it is beyond
the scope of this study to consider this matter in more detail, in empirical
applications the special case of a common c is in any event hardly realistic.
D. Fully Modified Estimation
We turn to consider the PFM estimator of Phillips & Moon (1999a). The idea
of the PFM estimator is to modify the pooled OLS estimator in (3) by
employing non-parametric corrections in the same way as in the fully modified
OLS (FM-OLS) estimator of Phillips & Hansen (1990). The estimator is
defined by
$$\hat\beta^{+} = \frac{\sum_{i=1}^{n}\sum_{t=1}^{T} x_{i,t}\,y_{i,t}^{+} - nT\,\hat\Lambda_{u\varepsilon}^{+}}{\sum_{i=1}^{n}\sum_{t=1}^{T} x_{i,t}^{2}}, \tag{15}$$
where
$$y_{i,t}^{+} = y_{i,t} - \hat\Omega_{u\varepsilon}\,\hat\Omega_{\varepsilon\varepsilon}^{-1}\,\Delta x_{i,t}, \tag{16}$$
and
$$\hat\Lambda_{u\varepsilon}^{+} = \hat\Lambda_{u\varepsilon} - \hat\Omega_{u\varepsilon}\,\hat\Omega_{\varepsilon\varepsilon}^{-1}\,\hat\Lambda_{\varepsilon\varepsilon} \tag{17}$$
employ the kernel estimators in (11). The equation (16) gives an endogeneity
correction and is similar to that in the FM-OLS estimator of Phillips & Hansen
(1990). The equation (17) gives the contemporaneous and serial correlation
corrections that are needed to remove all the second order bias effects arising
from temporal correlation between $u_{i,t}$ and $\varepsilon_{i,t}$.
Under the assumption that the regressors $x_{i,t}$ in (2) have exact unit roots the
joint asymptotics of the PFM estimator are determined by Theorem 9 of
Phillips & Moon (1999a). The following theorem shows how this result
changes when the regressors $x_{i,t}$ are generated by the more general class of near
unit root processes. Here we make an additional (technical) assumption that the
values of $c_i$ are such that the $c_i$-weighted average of the expected values of
$\int_0^1 J_{c_i}(r)^{2}\,dr$ converges to a finite number, i.e.
$$\lim_{n}\frac{1}{n}\sum_{i=1}^{n} c_i\,E\int_0^1 J_{c_i}(r)^{2}\,dr = \Phi_{cxx}$$
exists and is finite by assumption.
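Theorem 5. Suppose the assumptions of Theorem 4 hold. Then under joint
limits as $(T, n \to \infty)$ with $n/T \to 0$
(a) $\sqrt{n}\,T(\hat\beta^{+} - \beta) - \sqrt{n}\,B_{n,T} \Rightarrow N(0, V^{+})$,
(b) $T(\hat\beta^{+} - \beta) \to_p B$,
where
$$V^{+} = \frac{\Omega_{u.\varepsilon}\,\Omega_{\varepsilon\varepsilon}^{-1}}{\Phi_{xx}}, \tag{18}$$
with $\Omega_{u.\varepsilon} = \Omega_{uu} - \Omega_{u\varepsilon}^{2}\,\Omega_{\varepsilon\varepsilon}^{-1}$, and
$$B_{n,T} = -\Omega_{u\varepsilon}\,\Omega_{\varepsilon\varepsilon}^{-1}\,\frac{\sum_{i=1}^{n}\sum_{t=1}^{T} T(e^{c_i/T} - 1)\,x_{i,t}\,x_{i,t-1}}{\sum_{i=1}^{n}\sum_{t=1}^{T} x_{i,t}^{2}}, \tag{19}$$
$$B = -\Omega_{u\varepsilon}\,\Omega_{\varepsilon\varepsilon}^{-1}\,\frac{\Phi_{cxx}}{\Phi_{xx}}. \tag{20}$$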
The following corollary holds when the assumption of Phillips & Moon
(1999a) about exact unit roots in the regressors $x_{i,t}$ is valid.
Corollary 6. Suppose Assumptions 1 and 2 hold and data are generated by (1)
and (2) with $c_i = 0$ for all i. Then under joint limits as $(T, n \to \infty)$ with $n/T \to 0$
$$\sqrt{n}\,T(\hat\beta^{+} - \beta) \Rightarrow N(0, 2\,\Omega_{u.\varepsilon}\,\Omega_{\varepsilon\varepsilon}^{-1}).$$
It is indeed easy to see that the result of Corollary 6 follows from Theorem 5,
because if $c_i = 0$, then $B_{n,T} = B = 0$, and $E\int_0^1 J_{c_i}(r)^{2}\,dr = E\int_0^1 W_i(r)^{2}\,dr = \frac{1}{2}$,
giving $V^{+} = 2\,\Omega_{u.\varepsilon}\,\Omega_{\varepsilon\varepsilon}^{-1}$. The result of Corollary 6 coincides precisely with that of
Theorem 9 of Phillips & Moon (1999a) and it is illustrative to compare it to
Theorems 4 and 5 above. First, note from Corollary 6 the obvious fact that
when the exact unit root assumption holds, then $\hat\beta^{+}$ is $\sqrt{n}\,T$-consistent,
asymptotically normal and unbiased. In addition, note that in this case $\hat\beta^{+}$ is
generally more efficient than $\hat\beta^{*}$, because $\Omega_{u.\varepsilon} = \Omega_{uu} - \Omega_{u\varepsilon}^{2}\,\Omega_{\varepsilon\varepsilon}^{-1} \le \Omega_{uu}$. This is the
price that we have to pay if the autoregressive parameters in (2) happen to be
exactly equal to one and we use the estimator $\hat\beta^{*}$ instead of $\hat\beta^{+}$.
However, as Theorem 5 indicates the behavior of the estimator $\hat\beta^{+}$ is
radically different if the regressors $x_{i,t}$ are generated by processes with roots
that are only local to one. First, the estimator $\hat\beta^{+}$ is no longer $\sqrt{n}\,T$-consistent.
Rather, in order to obtain $\sqrt{n}\,T$-rate asymptotics, a bias term $B_{n,T}$ given in (19)
has to be subtracted from the estimation error. In fact, in view of the result (b)
of Theorem 5, if the $x_{i,t}$ are near, rather than exact, unit root processes, the
estimator $\hat\beta^{+}$ is only T-consistent and has an asymptotic bias given by B in (20).
If there is no simultaneity in the model, i.e. if $\Omega_{u\varepsilon} = 0$, then the biases disappear
and the PFM estimator is $\sqrt{n}\,T$-consistent and has an asymptotic normal
distribution with the same variance as that of the serial correlation corrected
pooled OLS estimator.
To see why the biases arise, notice first that when an autoregressive parameter
$\rho_i$ in (2) is just nearly one with $c_i$ non-zero, then
$\Delta x_{i,t} = \varepsilon_{i,t} + (e^{c_i/T} - 1)\,x_{i,t-1}$,
where $(e^{c_i/T} - 1) \approx c_i/T$. It is then easy to see that the use of $\Delta x_{i,t}$ in the
endogeneity correction term (16) gives rise to $B_{n,T}$ in (19), which has the limit
given in (20). It is worth noticing that if the nuisance parameters $c_i$ were known,
we could employ a quasi-difference in place of the pure difference $\Delta x_{i,t}$ in (16)
so that the bias term $B_{n,T} = 0$. However, as we already noted above, such a
solution is generally infeasible because the localizing coefficients $c_i$ are
unknown and cannot be consistently estimated from the individual time series
$x_{i,t}$.
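The mechanism behind the bias can be checked numerically. The following is a minimal sketch in plain Python, not code from the chapter: the function name `simulate_near_unit_root`, the seed and the values c = -5, T = 250 are illustrative assumptions. It simulates one near unit root path, verifies that the first difference equals the innovation plus the term $(e^{c/T} - 1)x_{t-1}$, and measures how close $e^{c/T} - 1$ is to $c/T$.

```python
import math
import random


def simulate_near_unit_root(c, T, seed=0):
    """Simulate x_t = exp(c/T) * x_{t-1} + eps_t with x_0 = 0 (illustrative setup)."""
    rng = random.Random(seed)
    rho = math.exp(c / T)
    x = [0.0]   # x[0] is the initial value
    eps = []    # eps[t-1] is the innovation entering x[t]
    for _ in range(T):
        e = rng.gauss(0.0, 1.0)
        eps.append(e)
        x.append(rho * x[-1] + e)
    return x, eps


c, T = -5.0, 250
x, eps = simulate_near_unit_root(c, T)
drift = math.exp(c / T) - 1.0

# Largest deviation from the identity  dx_t = eps_t + (exp(c/T) - 1) * x_{t-1}.
max_gap = max(
    abs((x[t] - x[t - 1]) - (eps[t - 1] + drift * x[t - 1]))
    for t in range(1, T + 1)
)

# Error of the first-order approximation exp(c/T) - 1 ~ c/T.
approx_error = abs(drift - c / T)
```

The identity holds up to floating point error, so a pure difference of $x_{i,t}$ always carries the extra term proportional to $x_{i,t-1}$ unless c = 0.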
We close this section by pointing out that the above bias problem also occurs
in cases where the PFM estimator is modified to account for deterministic
effects like individual intercepts in (1). This fact can be easily verified through
sequential asymptotics (for details see Kauppi (1999, pp. 124–125)).
E. Hypothesis Testing
In this section we consider testing a simple hypothesis $H_0: \beta = \beta_0$ against
$H_1: \beta \ne \beta_0$. First, in view of Theorem 4 we could use the serial correlation
corrected pooled OLS estimator to obtain the t-test statistic

$$t^{*} = \sqrt{n}\,T(\hat\beta^{*} - \beta_0)\,\sqrt{\frac{\frac{1}{nT^{2}}\sum_{i=1}^{n}\sum_{t=1}^{T} x_{i,t}^{2}}{\hat\Omega_{uu}}}.$$

In view of Theorem 4 and the result (36) given in its proof in the appendix it
is easy to deduce the following corollary.
Corollary 7. Suppose the assumptions of Theorem 4 hold. Then, under joint
limits as $(T, n \to \infty)$ with $n/T \to 0$, $t^{*} \Rightarrow N(0, 1)$.
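For comparison we will also consider assuming exact unit roots in $x_{i,t}$ and
accordingly employing the PFM estimator based t-test

$$t^{+} = \sqrt{n}\,T(\hat\beta^{+} - \beta_0)\,\frac{1}{\sqrt{2\,\hat\Omega_{u.\varepsilon}\,\hat\Omega_{\varepsilon\varepsilon}^{-1}}},$$

where $\hat\Omega_{\varepsilon\varepsilon}$ and $\hat\Omega_{u.\varepsilon} = \hat\Omega_{uu} - \hat\Omega_{u\varepsilon}^{2}\,\hat\Omega_{\varepsilon\varepsilon}^{-1}$ are obtained from the kernel estimators
given in (11) (cf. Phillips & Moon (1999a, Remark (c), p. 1086)).
Corollary 8. Suppose the assumptions of Theorem 5 hold. Then, under joint
limits as $(T, n \to \infty)$ with $n/T \to 0$
(a) $t^{+}$ diverges, if $\Omega_{u\varepsilon} \ne 0$ and $B \ne 0$, where B is given in (20);
(b) $t^{+} \Rightarrow N(0, V_{t^{+}})$, if $\Omega_{u\varepsilon} = 0$, where $V_{t^{+}} = \tfrac{1}{2}\,\Omega_{xx}^{-1}$.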
Part (a) of Corollary 8 states the obvious consequence of Theorem 5 that the t-test
statistic $t^{+}$ diverges, if the regressors are generated by local to unit root
processes and $\Omega_{u\varepsilon}$ is non-zero. This means that hypothesis tests based on the
PFM estimator are generally severely distorted. The result of part (b) of
Corollary 8 shows that even when there is no simultaneity, i.e. $\Omega_{u\varepsilon} = 0$, the test
does not have the desired standard normal distribution. To illustrate this latter
effect suppose that $c_i = c$ for all i. Then, if $\Omega_{u\varepsilon} = 0$, we have

$$V_{t^{+}} = \frac{2c^{2}}{e^{2c} - 2c - 1}, \qquad (21)$$

because $E\int_0^1 J_{c_i}(r)^2\,dr = \int_0^1\int_0^r e^{2(r-s)c}\,ds\,dr = (e^{2c} - 2c - 1)/(4c^{2})$ for all i. It is easy
to see from (21) that for negative values of c, $V_{t^{+}}$ becomes larger than unity.
For example, for c = −5 and c = −10, $V_{t^{+}}$ is approximately equal to 5.55
and 10.53, respectively. Notice that if the usual 5% critical value 1.96 is applied
in the $t^{+}$-test, then the true asymptotic rejection rates that correspond to c = −5
and c = −10 are approximately equal to 40.3% and 54.6%, respectively.
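Formula (21) and the implied rejection rates are easy to evaluate numerically. The sketch below is an illustration only; the function names `v_t_plus`, `normal_cdf` and `asymptotic_rejection_rate` are hypothetical and not part of the chapter. It assumes a common localizing coefficient c, no simultaneity, and a two-sided test that compares an $N(0, V_{t^{+}})$ statistic with the standard normal critical value.

```python
import math


def v_t_plus(c):
    """Asymptotic variance of t+ from (21): 2c^2 / (exp(2c) - 2c - 1)."""
    if abs(c) < 1e-6:
        return 1.0  # limiting value as c -> 0 (exact unit root)
    # expm1(2c) = exp(2c) - 1 keeps the denominator accurate for small |c|.
    return 2.0 * c * c / (math.expm1(2.0 * c) - 2.0 * c)


def normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def asymptotic_rejection_rate(c, crit=1.96):
    """P(|t+| > crit) when t+ is N(0, V_t+) but crit is an N(0, 1) critical value."""
    sd = math.sqrt(v_t_plus(c))
    return 2.0 * (1.0 - normal_cdf(crit / sd))


for c in (0.0, -5.0, -10.0):
    print(c, round(v_t_plus(c), 2), round(100.0 * asymptotic_rejection_rate(c), 1))
```

The variance grows roughly linearly in |c| for strongly negative c, which is why the over-rejection worsens as the roots move away from unity.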
F. Simulations
In this section, we illustrate the theoretical findings obtained in the previous
section by conducting some simple Monte Carlo experiments. We focus on
investigating the size behavior of the PFM t-test statistic, $t^{+}$, and that of the
bias corrected t-test, $t^{*}$. For the experiments we generate artificial data by
employing equations (1) and (2), where we impose $\beta = 1$ in (1). The errors
$\xi_{i,t} = (u_{i,t}, \varepsilon_{i,t})'$ are generated simply by equation $\xi_{i,t} = \mathrm{chol}(C)\,\eta_{i,t}$, where
$\eta_{i,t} \sim nid(0, I_2)$ across i = 1, . . . , n, and over t = 1, . . . , T, and chol(C) is the
Cholesky decomposition of the matrix $C = [C_{ij}]$ with $C_{11} = C_{22} = 1$,
$C_{12} = C_{21} = \Omega_{u\varepsilon}$. Thus, we have $E(u_{i,t}) = E(\varepsilon_{i,t}) = 0$, $E(u_{i,t}^2) = E(\varepsilon_{i,t}^2) = 1 = \Omega_{uu} = \Omega_{\varepsilon\varepsilon}$
and $E(u_{i,t}\varepsilon_{i,t}) = \Omega_{u\varepsilon}$. The initial values $y_{i,0}$ and $x_{i,0}$ are set to zeros.
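The data generating mechanism described above can be sketched as follows. This is a hedged illustration in plain Python rather than the code used for the reported experiments: the names `cholesky_2x2` and `simulate_panel` are hypothetical, and the lower-triangular Cholesky factor is assumed so that the generated errors have covariance matrix C.

```python
import math
import random


def cholesky_2x2(omega_ue):
    """Lower-triangular L with L L' = [[1, omega_ue], [omega_ue, 1]] (assumed convention)."""
    return ((1.0, 0.0), (omega_ue, math.sqrt(1.0 - omega_ue ** 2)))


def simulate_panel(n, T, c, omega_ue, beta=1.0, seed=0):
    """Generate y = beta * x + u and x_t = (1 + c/T) * x_{t-1} + eps_t with x_0 = 0."""
    rng = random.Random(seed)
    L = cholesky_2x2(omega_ue)
    rho = 1.0 + c / T
    panel = []
    for _ in range(n):
        x_prev = 0.0
        ys, xs = [], []
        for _ in range(T):
            e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            u = L[0][0] * e1                   # first error component
            eps = L[1][0] * e1 + L[1][1] * e2  # second component, correlated with u
            x = rho * x_prev + eps
            ys.append(beta * x + u)
            xs.append(x)
            x_prev = x
        panel.append((ys, xs))
    return panel


panel = simulate_panel(n=20, T=200, c=-5.0, omega_ue=0.4, seed=1)
```

Each element of `panel` holds the series for one cross-section unit; the sample covariance of the implied errors should be close to the chosen value of `omega_ue`.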
Table 1 reports percentage rejection rates of the t-tests, $t^{+}$ and $t^{*}$,
respectively, when a 5% critical value 1.96 is applied, n = 50, T = 250, and the
local to unit root coefficients are set equal to a common value c, i.e. we use
$\rho_i = \rho = 1 + c/T$ for all i. In computing the long-run covariance estimates in $t^{+}$
and $t^{*}$, respectively, we employed the Parzen kernel function and the
bandwidth parameter value K = 1.[2] The columns under c = 0 report results
when an exact unit root assumption holds. In accordance with the analytical
results of the previous section, in this case, the size behavior of the two tests is
good. The columns under c = −5 and c = −10 give rejection rates when the
roots of the regressors are only nearly one. As predicted by Corollary 8, now
the $t^{+}$-test is very sensitive to deviations from exact unit roots and suffers from
severe size distortions through all values of $\Omega_{u\varepsilon}$. Notice that even when $\Omega_{u\varepsilon} = 0$
the $t^{+}$-test rejects far in excess of the desired 5% nominal level, as was
predicted by the considerations of the previous section. In contrast, as predicted
by Corollary 7, the bias corrected t-test, $t^{*}$, maintains the desired size level well
through different values of $\Omega_{u\varepsilon}$.

Table 1. Monte Carlo results with n = 50 and T = 250

                              c = 0              c = −5             c = −10
$\Omega_{u\varepsilon}$     $t^{+}$   $t^{*}$    $t^{+}$   $t^{*}$    $t^{+}$   $t^{*}$
0                            5.20      4.70       42.10     5.00       52.30     4.20
0.2                          5.30      4.40       89.80     4.30       99.60     5.40
0.4                          6.60      6.80      100.0      4.90      100.0      4.90
0.6                          4.30      4.50      100.0      4.00      100.0      5.60
0.8                          4.30      4.50      100.0      5.80      100.0      4.50

Notes: The columns under $t^{+}$ and $t^{*}$ report Monte Carlo rejection rates (in percent) of the respective t-tests
computed by employing long-run covariance estimates that were achieved by using a Parzen
kernel function and a bandwidth parameter value K = 1. A nominal 5% asymptotic level was
applied. In each replication, the data were obtained by using equations (1) and (2) with $\beta = 1$ and
$\rho_i = \rho = 1 + c/T$ in (1) and (2), respectively, initial values zeros, and with the errors $\xi_{i,t} = (u_{i,t}, \varepsilon_{i,t})'$
generated by equation $\xi_{i,t} = \mathrm{chol}(C)\,\eta_{i,t}$, where $\eta_{i,t} \sim nid(0, I_2)$ across i = 1, . . . , n, and over
t = 1, . . . , T, and chol(C) is the Cholesky decomposition of the matrix $C = [C_{ij}]$ with $C_{11} = C_{22} = 1$,
$C_{12} = C_{21} = \Omega_{u\varepsilon}$. Results are based on 1000 replications.
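The size distortion in the row without simultaneity can be reproduced with a deliberately simplified experiment. The sketch below is not the procedure behind Table 1: it assumes iid errors with $\Omega_{uu} = \Omega_{\varepsilon\varepsilon} = 1$ and $\Omega_{u\varepsilon} = 0$ treated as known, so that no kernel estimation is needed and both estimators are replaced by plain pooled OLS. The function name `rejection_rates`, the smaller sample sizes and the replication count are illustrative assumptions. The statistic labelled `t_plus` is scaled with the exact unit root variance 2, while `t_star` is scaled with the observed sum of squares of the regressor.

```python
import math
import random


def rejection_rates(n, T, c, reps=200, crit=1.96, seed=0):
    """Monte Carlo rejection rates of two t-tests of the true beta (simplified setup)."""
    rng = random.Random(seed)
    rho = 1.0 + c / T
    rej_plus = rej_star = 0
    for _ in range(reps):
        sxu = sxx = 0.0
        for _ in range(n):
            x = 0.0
            for _ in range(T):
                x = rho * x + rng.gauss(0.0, 1.0)  # near unit root regressor
                u = rng.gauss(0.0, 1.0)            # regression error, independent of x
                sxu += x * u
                sxx += x * x
        scaled = math.sqrt(n) * T * (sxu / sxx)    # sqrt(n) * T * (beta_hat - beta)
        t_plus = scaled / math.sqrt(2.0)           # assumes exact unit roots
        t_star = scaled * math.sqrt(sxx / (n * T * T))
        rej_plus += abs(t_plus) > crit
        rej_star += abs(t_star) > crit
    return rej_plus / reps, rej_star / reps


rate_plus, rate_star = rejection_rates(n=20, T=100, c=-5.0)
```

With c = -5 the first rate should be far above the nominal 5% level while the second stays close to it, which is the pattern of the first row of Table 1.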
Table 2 reports otherwise similarly computed test results as those of Table 1
except that now n and T are set to 25 and 100, respectively. As is apparent, the
results do not change much from those of Table 1. This indicates that our
asymptotic results can provide fairly accurate approximations with sample
sizes that are typical in empirical applications.

Table 2. Monte Carlo results with n = 25 and T = 100

                              c = 0              c = −5             c = −10
$\Omega_{u\varepsilon}$     $t^{+}$   $t^{*}$    $t^{+}$   $t^{*}$    $t^{+}$   $t^{*}$
0                            6.80      6.20       37.40     5.50       52.30     5.50
0.2                          6.60      6.10       74.60     4.00       96.50     6.60
0.4                          6.00      5.10       99.20     5.10      100.0      6.20
0.6                          5.40      4.90      100.0      6.20      100.0      5.80
0.8                          5.30      5.80      100.0      5.60      100.0      5.00

Notes: See the notes of Table 1.

Table 3 examines the performance of the bias corrected t-test when the
individual localizing coefficients in the generating mechanisms of the
regressors vary across different panel members. The heterogeneity across panel
members was obtained by using otherwise similarly generated data as in
Tables 1 and 2 except that all the individual specific localizing coefficients $c_i$
were drawn from a uniform distribution on the interval [c, 0]. For example, the
column denoted by (n = 25, T = 100) and c = −10 reports simulation results
based on an experiment where the autoregressive coefficients $\rho_i$ $(= 1 + c_i/T)$
across different panel members vary uniformly within the range [0.9, 1]. A
comparison of the results of Table 3 to those of Tables 1 and 2 clearly indicates
that the bias corrected t-test behaves equally well whether the $x_{i,t}$ have
homogeneous or heterogeneous localizing coefficients.
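The heterogeneous design can be illustrated with a few lines of plain Python. This is a sketch under the stated design only; the function name `draw_autoregressive_coefficients` and the seed are illustrative assumptions. It draws the localizing coefficients from the uniform distribution on [c, 0] and maps them to autoregressive coefficients $1 + c_i/T$.

```python
import random


def draw_autoregressive_coefficients(n, T, c, seed=0):
    """Draw c_i ~ U[c, 0] for each panel member and return rho_i = 1 + c_i / T."""
    rng = random.Random(seed)
    localizing = [rng.uniform(c, 0.0) for _ in range(n)]
    return [1.0 + ci / T for ci in localizing]


# Design of the (n = 25, T = 100), c = -10 column: the roots lie in [0.9, 1].
rhos = draw_autoregressive_coefficients(n=25, T=100, c=-10.0)
print(min(rhos), max(rhos))
```

For the (n = 50, T = 250) columns the same draw gives roots much closer to one, since the interval for $\rho_i$ shrinks to [1 + c/250, 1].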
Table 3. Monte Carlo results on the bias corrected test when localizing coefficients are heterogeneous

                              (n = 50, T = 250)       (n = 25, T = 100)
$\Omega_{u\varepsilon}$     c = −5     c = −10      c = −5     c = −10
0                            4.82       5.10         5.00       5.18
0.2                          5.80       5.06         6.20       5.00
0.4                          4.96       4.62         5.12       5.34
0.6                          5.42       5.06         5.98       5.46
0.8                          5.18       5.18         5.44       5.92

Notes: The table reports Monte Carlo rejection rates (in percent) of the $t^{*}$-test computed in the same way as
in Tables 1 and 2. The data were obtained otherwise similarly as in Tables 1 and 2 except that in
each replication the individual specific localizing coefficients $c_i$ (i = 1, . . . , n) were drawn from a
uniform distribution on the interval [c, 0]. The applied values of c are given at the top of each
column. Results are based on 5000 replications.

In view of the above reported simulation experiments we may conclude that
near unit roots indeed result in severe size distortions to hypothesis tests based
on the PFM estimator. On the other hand, the results are fairly promising with
regard to the new bias corrected test, which was able to maintain good size
behavior through all the performed experiments. However, it should be pointed
out that our simulation setup here is rather simple and it is likely that some
problems arise in more complicated models. For example, if the data
generating mechanism obeys more general short-run dynamics than those
experimented with here, then it can be expected that the non-parametric corrections
are subject to somewhat larger (finite sample) estimation errors, which may
weaken the performance of the bias corrected test. Furthermore, an additional
source of estimation error arises when the non-parametric estimators use
estimated values in place of the true values of the errors.
IV. CONCLUDING REMARKS

This chapter developed new panel data limit theory that can be used in
obtaining convergence in probability and in distribution when there is
heterogeneity across panel members and the cross sectional and time series
dimensions of the data tend to infinity simultaneously. The new theory was
applied to study asymptotics of a panel regression in which the regressors were
generated by a local to unit root process with cross sectionally heterogeneous
localizing coefficients. The application demonstrated that a serial correlation
corrected pooled panel OLS estimator yields $\sqrt{n}\,T$-consistent and asymptotically normal estimates that are centered on the true parameter value
irrespective of whether the regressors are nearly or exactly integrated. While
this desirable result holds only in the special case without deterministic effects,
our asymptotic analysis also indicated that the panel fully modified estimator is
subject to asymptotic biases even in this simple case, if the regressors are
nearly rather than exactly integrated. Therefore, much care should be taken in
interpreting results achieved by the recent panel cointegration methods that
assume exact unit roots when near unit roots are equally plausible.
NOTES
1. This is proved by Phillips & Moon (1999a, Theorem 8) when $c_i = 0$ for all i.
Furthermore, a similar result can be proved in the case where the $c_i$ are nonzero by
following the lines given in the proof of Theorem 5 of this chapter.
2. In empirical applications a bandwidth parameter value K = 1 is hardly realistic.
However, in the present simulation setup the actual value of K does not play an
important role, because we use iid errors in the simulations. For example, in all of the
reported cases, essentially similar results were obtained by using the bandwidth
parameter value K = 4.
ACKNOWLEDGMENTS
I would like to thank the two referees for their useful comments and
suggestions. This paper was completed while the author worked at the
Research Department of the Bank of Finland whose hospitality is gratefully
acknowledged. This paper is a part of the research program of the Research
Unit on Economic Structures and Growth (RUESG) at the Department of
Economics at the University of Helsinki. Financial support from the Yrjö
Jahnsson Foundation is appreciated. The usual disclaimer applies.
REFERENCES
Billingsley, P. (1968). Convergence of Probability Measures. New York: John Wiley.
Davidson, J. (1994). Stochastic Limit Theory. Oxford: Oxford University Press.
Elliott, G. (1998). On the Robustness of Cointegration Methods When Regressors Almost Have Unit Roots. Econometrica, 66(1), 149–158.
Kauppi, H. (1999). Essays on Econometrics of Cointegration. Research Reports Nro 84, Dissertationes Oeconomicae, Department of Economics, University of Helsinki.
Moon, H., & Phillips, P. C. B. (1999). Estimation of Autoregressive Roots Near Unity Using Panel Data. Cowles Foundation Discussion Paper No. 1224, Yale University (http://cowles.econ.yale.edu/).
Phillips, P. C. B. (1987). Towards a Unified Asymptotic Theory for Autoregression. Biometrika, 74(3), 535–547.
Phillips, P. C. B. (1988). Regression Theory for Near-integrated Time Series. Econometrica, 56(5), 1021–1043.
Phillips, P. C. B., & Hansen, B. E. (1990). Statistical Inference in Instrumental Variables Regression With I(1) Processes. Review of Economic Studies, 57, 99–125.
Phillips, P. C. B., & Moon, H. (1999a). Linear Regression Limit Theory for Non-stationary Panel Data. Econometrica, 67(5), 1057–1111.
Phillips, P. C. B., & Moon, H. (1999b). Non-stationary Panel Data Analysis: An Overview of Some Recent Developments. Cowles Foundation Discussion Paper No. 1221, Yale University (http://cowles.econ.yale.edu/).
Phillips, P. C. B., & Solo, V. (1992). Asymptotics for Linear Processes. The Annals of Statistics, 20(2), 971–1001.
Stock, J. H. (1997). Cointegration, Long-run Comovements, and Long Horizon Forecasting. In: D. Kreps & K. F. Wallis (Eds), Advances in Econometrics: Proceedings of the Seventh World Congress of the Econometric Society. Cambridge: Cambridge University Press.
Stout, W. F. (1974). Almost Sure Convergence. New York: Academic Press.
White, H. (1984). Asymptotic Theory for Econometricians. San Diego, California: Academic Press.
APPENDIX
APPENDIX A: PROOF OF THEOREM 2

From the conditions of the theorem we know that

$$X_{n,T} = \frac{1}{n}\sum_{i=1}^{n} Y_{i,T} \Rightarrow X_n = \frac{1}{n}\sum_{i=1}^{n} Y_i$$

as $T \to \infty$ for all fixed n. Since $\sup_T E\|Y_{i,T}\|^{1+\delta} \le M < \infty$ for all i and
because $Y_{i,T} \Rightarrow Y_i$ implies $\|Y_{i,T}\|^{1+\delta} \Rightarrow \|Y_i\|^{1+\delta}$ by the continuous mapping theorem
we also have $E\|Y_i\|^{1+\delta} \le M < \infty$ by Theorem 5.3 of Billingsley (1968) (see also
discussion on p. 33 of Billingsley (1968)). By arguments given in the proof of
Theorem 1 of Phillips & Moon (1999a) we can justify that the $Y_i$ are
independent across i, since the $Y_{i,T}$ are independent across i for all T. Given this
and the fact that $E\|Y_i\|^{1+\delta} \le M < \infty$, we may apply Markov's law of large
numbers to deduce $X_n \to_p X$ as $n \to \infty$ (e.g. White (1984, p. 33)). Furthermore,
if we establish conditions (i) through (iv) of Theorem 1, then $X_{n,T} \to_p X$ as
$(T, n \to \infty)$.
First, condition (i) holds, since

$$\frac{1}{n}\sum_{i=1}^{n} E\|Y_{i,T}\| \le \frac{1}{n}\sum_{i=1}^{n} \sup_T E\|Y_{i,T}\| \le M < \infty,$$

where the last two inequalities follow from condition (b) of the theorem. Also,
condition (ii) holds, since

$$\frac{1}{n}\sum_{i=1}^{n} \|E(Y_{i,T}) - E(Y_i)\| \le \sup_i \|E(Y_{i,T}) - E(Y_i)\| \to 0, \quad \text{as } T \to \infty,$$

by condition (a). For condition (iii) we use the fact that

$$E\|Y_{i,T}\|\,1\{\|Y_{i,T}\| > n\epsilon\} \le \frac{1}{(n\epsilon)^{\delta}}\sup_T E\|Y_{i,T}\|^{1+\delta} \le \frac{M}{(n\epsilon)^{\delta}}$$

for all i, where the first inequality follows from
arguments given by Billingsley (1968, p. 32) and the second inequality holds
by condition (b). Now, for any $\epsilon > 0$,

$$\frac{1}{n}\sum_{i=1}^{n} E\|Y_{i,T}\|\,1\{\|Y_{i,T}\| > n\epsilon\} \le \frac{M}{(n\epsilon)^{\delta}},$$

and therefore, condition (iii) follows. Condition (iv) holds by the same
argument as we notice that now

$$E\|Y_i\|\,1\{\|Y_i\| > n\epsilon\} \le \frac{1}{(n\epsilon)^{\delta}}\,E\|Y_i\|^{1+\delta} \le \frac{M}{(n\epsilon)^{\delta}}.$$
APPENDIX B: PROOF OF THEOREM 3

n
Let s
2
n, T
Vi, T and define i, n, T =
i=1
Yi, T
. Then
sn, T
i, n, T N(0, 1), as (T, n ),
(22)
i=1
by Theorem 2 of Phillips & Moon (1999a), if the Lindeberg condition

n
lim
n, T
E[2i, n, T1{|i, n, T| > }] = 0, > 0,
i=1

n
Yi, T N(0, V) as (T, n ). It

n i = 1
remains to verify the above Lindeberg condition.
We have for given > 0,
holds. Given condition (i), (22) implies

n

n
E[2i, n, T1{2i, n, T > }] =
i=1
Y2i, T
Y2
1 2i, T >
2
sn, T
sn, T
i=1
n 1
s2n, T n
E Y 2i, T1 Y 2i, T >
i=1
s2n, T
n
n
n 1
2
sn, T n
s2n, T
n
n
sup E Y 2i, T1 Y 2i, T >
i=1
(23)
By condition (ii) we can always find > 0 such that sup E|Y 2i, T|(1 + ) N < for
T
all i. Given this we obtain

sup E Y 2i, T1 Y 2i, T >

T
s2n, T
n
n

n
s2n, T

n
,

(24)
for all i (cf. Billingsley (1968, p. 32)). In view of (23) and (24) and given that
n
s2n, T
= V < we may
condition (i) implies lim 2 = 1/V < (V > 0) and lim
n, T s
n, T n
n, T
now conclude that

n
lim
n, T
E[2i, n, T1{2i, n, T > }] = 0,
i=1
so that the Lindeberg condition follows.
APPENDIX C: PROOF OF THEOREM 4

We start by giving some intermediate results that we will use repeatedly in the
main part of the proof given below. First, just as in Phillips & Moon (1999a,
Lemma 2), based on Phillips and Solo (1992), we decompose the i, t as
(25)
i, t = Ci, t + i, t 1 i, t,

where C = C(1) =

Ck and i, t =
k=0
j=0
j=0
Ck. Under
k=j+1
Assumption 1(a), C is finite and

ji, t j with C
j=
C
j2||C j||2 =
j2
j=0
Cs
< (see
s=j+1
Phillips & Moon (1999a, p. 1083)). It follows that

(26)
E|| i, t||2 M < .
We partition C = [Cab], (a, b = , w), so that the long-run covariance matrix
= CC =
C2 + C2 w
C Cw + C wCww

C Cw + C wCww
uu
=
2
2
Cw + Cww
u
u

(27)
For subsequent reference note that the components of i, t = (ui, t, i, t) in (25)
may be written as
(28)
ui, t = C i, t + C wwi, t + u i, t 1 u i, t,
i, t = Cw i, t + Cwwwi, t + i, t 1 i, t,
(29)
where u i, t and i, t are the two components in i, t.
Next, by equation (2)

t
xi, t =
s=1
e((t s)/T)cii, s + e(t/T)cixi, 0
and using (29) we can write this as

xi, t = Cw f( )i, t + Cww f(w)i, t + R(x)i, t,
where we have used the notation
(30)
f(a)i, t =
e((t s)/T)ciai, s, a = , w,
(31)
s=1
and

t1
(t 1)/T)ci
i, 0 + (1 e
R(x)i, t = e
ci /T
e((t 1 s)/T)ci i, s i, t + e(t/T)cixi, 0.
(32)
s=1
For later analysis it is useful to have the following two moment bounds. First,

2
f(a)i,
1
t
sup sup E
sup
i 1tT
T
1tT T

t
e((t s)/T)2 supi|ci| M < ,
(33)
s=1
since e((t s)/T)2 supi|ci| M < (recall that supi|ci| c < ). Second, using the

m
inequality E

m
Xi| m
i=1
E|Xi|2 (e.g. Davidson (1994, p. 140)) and the fact
i=1
i, t are iid across i we obtain
sup sup E(R2(x)i, t) 4 sup e((t 1)/T)2 supi|ci| E(2i, 0) + 4 sup E(2i, t)
i
1tT
1tT
1tT
+ 4( sup e(t/T)2 supi|ci|)E(x2i, 0)

1tT
1
2
1tT T
+ 4T 2(1 esupi|ci| /T)2 sup

t1
k=1
t1
e((2t 2 k s)/T)2 supi|ci|E|i, s i, k|
s=1
M < .
(34)
To see that (34) holds note that sup1 t T e(t/T)2 supi|ci| e2 supi|ci|, E(2i, t) M by (26),
E(x2i, 0) M (by the initial value condition), T 2(1 esupi|ci| /T)2 = O(1), and by the
Cauchy-Schwartz inequality E|i, k i, s| E(i, k)2E(i, s)2 M, where the latter
inequality follows again from (26).
We turn to give the completing steps of the proof of Theorem 4. Write

n
) =
nT( *
n
i=1
(xi, tui, t u) n( u u)
t=1
1
n

n
i=1
1
T2
x2i, t
t=1
where n( u u) = op(1), as (n, T ) with n/T 0 (recall Remark 1). It
suffices to show that
1
nT

n
i=1
t=1
(xi, tui, t u) N(0, uuxx), as (T, n ) with n/T 0,

(35)
and

n
1
nT 2
i=1
x2i, t xx, as (T, n ).
(36)
t=1
To prove (36) use (30) to write

n
1
nT 2
i=1
t=1

n
1
x = Cw
n
2
i, t
i=1
+ 2Cw Cww
1
n
i=1
1
+ 2Cww
n
i=1
1
T2
2
( )i, t
+C
t=1
1
n
t=1
1
T2
i=1
1
T2
2
f(w)i,
t
t=1
f( )i, t f(w)i, t + 2Cw
t=1
1
T2
2
ww
1
n
1
f(w)i, tR(x)i, t +
n
i=1
i=1
1
T2
f( )i, tR(x)i, t
t=1
1
T2
R2(x)i, t
t=1
= C2w Ib1 + C2wwIb2 + 2Cw CwwIIb1 + 2Cw IIb2 + 2CwwIIb3 + IIb4, say.
p
We now show that Cw 2Ib1 + C2wwIb2 xx and IIb1, IIb2, IIb3, IIb4 0 as
(T, n ) so that (36) follows.
Write

n
1
Ib1 =
n
i=1
Yi, T,
(37)

T
1
where Yi, T = 2
T
2
f( )i,
t. For an application of Theorem 2 observe that Yi, T are
t=1

1
independent across i for all T and as T , Yi, T Yi =

1
E(Yi) =
Jci(r)2dr. We know
1
dsdr and by assumption lim
n n
(r s)2ci
i=1
e(r s)2cidsdr = xx
exists. Therefore, if the conditions (i) and (ii) of Theorem 2 hold,

n
1
n
Yi, T xx as (T, n ).
i=1
For verifying condition (i) let p = 1 + and use the definition of Yi, T in (37)
to obtain

T
1
(E|Yi, T| ) = 2 E
T
p 1/p
2
( )i, t
t=1
1/p
1
2
T
t=1
e((t s)/T)ci i, s
s=1

2p
1/p
, (38)
where the inequality follows from Minkowski's inequality and the
definition of f( )i, t in (31). Now, the e((t s)/T)ci i, s, (1 s t T), are independent
random variables with zero means and E|e((t s)/T)ci i, s|2p (esupi|ci|})2 + 2
E| i, s|2 + 2 M for some M < and some > 0. Therefore, we may apply
Theorem 3.7.8 of Stout (1974, p. 213) to obtain

t
s=1
e((t s)/T)ci i, s
2p
Mt p,
(39)
where M is finite and independent of i. By inserting (39) into (38) and raising
to the power of p = 1 + it is easy to see that E|Yi, T|1 + M so that condition (i)
of Theorem 2 follows. For condition (ii) of Theorem 2 it suffices to note that

T
1
the supremum of the absolute difference between E(Yi, T) = 2
T

1
and E(Yi) =
t=1
e(t q/T)2ci
q=1
e(r s)2cidsdr tends to zero uniformly in i as T (this follows
since $\sup_i |c_i| \le \bar{c} < \infty$; for details see Kauppi (1999, pp. 135–136)).
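The convergence used for condition (ii) can be checked numerically for a single value of the localizing coefficient. The sketch below is an illustration only; the function names `closed_form` and `discrete_mean` are hypothetical. It compares the double sum $T^{-2}\sum_{t=1}^{T}\sum_{q=1}^{t} e^{2c(t-q)/T}$ with the double integral $\int_0^1\int_0^r e^{2(r-s)c}\,ds\,dr = (e^{2c} - 2c - 1)/(4c^2)$.

```python
import math


def closed_form(c):
    """Value of the double integral of exp(2(r-s)c) over 0 <= s <= r <= 1."""
    if abs(c) < 1e-6:
        return 0.5  # limiting value for an exact unit root
    return (math.expm1(2.0 * c) - 2.0 * c) / (4.0 * c * c)


def discrete_mean(c, T):
    """Finite-T counterpart: (1/T^2) * sum_{t=1}^{T} sum_{q=1}^{t} exp(2c(t-q)/T)."""
    total = 0.0
    for t in range(1, T + 1):
        for q in range(1, t + 1):
            total += math.exp(2.0 * c * (t - q) / T)
    return total / (T * T)


c = -5.0
gap_small_T = abs(discrete_mean(c, 50) - closed_form(c))
gap_large_T = abs(discrete_mean(c, 400) - closed_form(c))
```

The gap shrinks as T grows, and because the integrand is bounded whenever the $c_i$ are bounded, the same kind of bound applies uniformly across panel members.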
Obviously the above analysis remains the same if we replace i, t in the

definition of Yi, T in (37) with wi, t implying that Ib2 has the same limit as Ib1.
2
2
2
2
+ Cww
= we therefore see that Cw
Ib1 + Cww
Ib2
Noticing from (27) that Cw
converges in probability to xx as desired.
p
We turn to prove that IIb1, IIb2, IIb3, IIb4 0 as (T, n ) by showing that
E(IIb1)2, E|IIb2|, E|IIb3|, E|IIb4| 0 as (T, n ). First, by the inequality

m
Xi|2 m
i=1
E|Xi|2 (e.g. Davidson, 1994, p. 140) and condition (b) of
i=1

n
1
Assumption 1 we have E(IIb1) = 2
E(f( )i, t /T)2E(f(w)i, t /T)2 =
n T i=1 t=1
1
O
, where the latter equality follows from (33). Second, the use of the
n
2

triangle and Cauchy–Schwarz inequalities shows that

n
1
E
n
1
T2
i=1
f(a)i, tR(x)i, t
t=1

n
1 1
T n
i=1
1
T
t=1
f(a)i, t
T

E|R(x)i, t|2 = O
T
where the equality follows from (33) and (34). Hence, E|IIb2|, E|IIb3| 0 as
(T, n ). It is also straightforward to do similar calculations with IIb4 that
show E|IIb4| 0 as (T, n ). This completes the proof of (36).
We turn to prove the result in (35). First, use (28) through (30) to write

n
1
nT
(xi, tui, t u)
i=1 t=1
n
1
T
i=1
(Cw f( )i, t + Cww f(w)i, t)(C i, t + C wwi, t) u
t=1
n
i=1
= Ia + IIa, say.
1
T
t=1
[xi, t(ui, t 1 u i, t) + R(x)i, t(C i, t + C wwi, t)] + u u
(40)
Note that f(a)i, 1 = ai, 1 and f(a)i, t = eci /Tf(a)i, t 1 + ai, t, (ai, t = i, t, wi, t), t 2, so that we
may write

n
Ia =
1
n
1
T
i=1
n
i=1
(Cw f( )i, t 1 + Cww f(w)i, t 1)(C i, t + C wwi, t)
t=2
n
(eci /T 1)
T
(Cw f( )i, t 1 + Cww f(w)i, t 1)(C i, t + C wwi, t)
t=2
i=1
1
T
[(Cw i, t + Cwwwi, t)(C i, t + C wwi, t) u] = Ia1 + Ia2 + Ia3,

say.
t=1
To consider the asymptotic properties of Ia1 write

n
Ia1 =
where
1
n
Yi, T,
i=1
1
Yi, T =
T
[Cw C f( )i, t 1 i, t + CwwC wf(w)i, t 1wi, t
t=2
+ Cw C w f( )i, t 1wi, t + CwwC f(w)i, t 1 i, t].

Since the summands in Yi, T are uncorrelated over t and the four terms in the
square brackets in (41) are mutually uncorrelated for all t it follows that

T
1
E(Y ) = 2
T
2
i, T
2
[Cw
C 2 E(f( )i, t 1 i, t)2 + C2wwC2 wE(f(w)i, t 1wi, t)2
t=2
2
2
+ Cw
C w
E(f( )i, t 1wi, t)2 + C2wwC2 E(f(w)i, t 1 i, t)2]

T
1
= uu 2
T
t=2
t1
e((t 1 s)/T)2ci,
(42)
s=1
where the last equality uses (27) and the fact that E(f(a)i, t 1bi, t)2 =

t1
e((t 1 s)/T)2ci (a, b = , w).
s=1
Now, we apply Theorem 3. First, note that the Yi, T in (41) are independent
across i for all T with mean zero and variance Vi, T = E(Y2i, T ) in (42). Let

1
Vi = uu
e(r s)2cidsdr and write

n
1
n
1
Vi, T =
n
i=1
i=1
1
Vi +
n
(Vi, T Vi).
(43)
i=1
Using the fact that supi|ci| c < it is straightforward to show that the second
term on the right hand side of (43) tends to zero as $n, T \to \infty$ (see Kauppi (1999,
pp. 135–136)). On the other hand, the first term in (43) has the positive and finite
limit xx. Thus, condition (i) of Theorem 3 holds with V = uuxx. For
establishing condition (ii) of Theorem 3 recall the definition of Yi, T from (41),

m
let p = 2 + and apply the inequality E

Davidson (1994, p. 140)) to obtain
T
1
E|Yi, T| M E
T
p
f( )i, t 1 i, t
t=2
Xi
p1
E|Xi|p (e.g.
i=1
i=1
1
+ MwwE
T
1
+ M wE
T
f( )i, t 1wi, t
t=2
f(w)i, t 1wi, t
t=2
1
+ Mw E
T
f(w)i, t 1 i, t ,
(44)
t=2
where Mab = 4p 1|CwaC b|p M < (a, b = , w). Furthermore, by the fact that i, t
are iid we have
T
f( )i, t 1 i, t

p
=E
T
t1
e((t 1 s)/T)ci i, s i, t
s=1
= E| i, t| E
T
t1
T
p/2
t1
e((t 1 s) /T)ci i, s
s=1
M < ,
(45)

t1
((t 1 s)/T)ci
because |e
2+
| e |ci|} M < , E| i, t|
supi
M < , and E
i, s
2+
s=1
M(t 1)(2 + )/2 for some M < and for some > 0, where the result with regard

t1
i, s
to E
s=1
2+
follows from Theorem 3.7.8 of Stout (1974, p. 213) (note that
an iid sequence is also a martingale difference sequence). Now, given (45) and
the fact that the f( )i, t 1 i, t, (2 t T) are martingale difference sequences for all
i, we may apply Theorem 3.7.8 of Stout (1974, p. 213) one more time giving

T
T
t=2

f( )i, t 1 i, t
T
T1
T
p/2
M < for all i.
The same arguments show that the other three expectations in (44) are similarly
bounded, and therefore, supTE|Yi, T|p = supTE|Yi, T|2 + M < for some > 0 and
all i. Hence, the conditions of Theorem 3 hold and we have shown that Ia1
converges weakly to the distribution given in (35) as (T, n ). Furthermore,
p
since supi|eci /T 1| = O(T 1), it follows immediately that Ia2 0 as (T, n ).
For Ia3 recall from (27) that u = C Cw + C wCww so that

n
Ia3 =
1
n
1
T
i=1
n
1
n
[(Cw i, t + Cwwwi, t)(C i, t + C wwi, t) (C Cw + C wCww)]
t=1
T
1
T
i=1
[C Cw ( 2i, t 1) + CwwC w(w2i, t 1)
t=1
+ (Cw C w + CwwC ) i, twi, t] 0 as (T, n ),

where the probability limit follows because the summands in the square
brackets are iid with zero mean and finite second order moment across both i
and t.
The remaining step in the proof of Theorem 4 is to show that IIa in (40) is
asymptotically negligible. First, in the same way as in the proof of Lemma 16
of Phillips & Moon (1999a, p. 1105) we may decompose the one-sided long-run
covariance matrix
=+

k=1
s=k
k=0
s=1
CsCk

CkCs = +
kCk + 1 CC
0.
C
k=0
j = [C
ab, j], (a, b = , w); we may
Using this in conjunction with the partition C
write

n
IIa =
1
n
i=1
t=1
1
T
xi, t(ui, t 1 u i, t)
j=0
, j + Cww, j + 1C
w, j)
(Cw , j + 1C

n
1
n
1
T
i=1
R(x)i, t(C i, t + C wwi, t) + (C w , 0C + C ww, 0C w)} = IIa1 + IIa2,
t=1
say.
For IIa1 note that we can write

T
1
T

T
1
1
xi, tu i, t 1 = xi, 1u i, 0 +
T
T
t=1
t=2
1
1
xi, tu i, t 1 = xi, 1u i, 0 +
T
T
1
1
= xi, 1u i, 0 + eci /T
T
T
and, thus,

T
1
T
t=1
1
xi, t(ui, t 1 u i, t) =
T

T1
t=1
T1
t=1
+ (e
1
xi, tu i, t +
T

T1
T1
i, t + 1u i, t,
t=1
1
1)
T

T1
xi, tu i, t.
t=1
In view of this expression we get

n
IIa1 =
n
i=1
1
T
T1
i, t + 1u i, t
n
n
+
T
1
1
xi, 1u i, 0
T
n
i=1
, j + Cww, j + 1C
w, j)
(Cw , j + 1C
j=0

t=1
i=1
xi, t + 1u i, t
t=1
1
1
i, t + 1u i, t + xi, 1u i, 0 xi, Tu i, T
T
T
ci /T

n
1
1
xi, Tu i, T +
T
n
ci /T
(e
i=1
1
1)
T

T1
xi, tu i, t
t=1
, j + Cww, j + 1C
w, j)
(Cw , j + 1C
j=0

= IIa1a + IIa1b + IIa1c + IIa1d + O
n
, say.
T
n
As a counterpart to the result E
R1, i, T
=O
1
derived in the
T
n i = 1
proof of Lemma 16 of Phillips & Moon (1999a, p. 1107) we have

n
n
1
T
i=1
T1

i, tt + 1
t=1
kCk + 1
C
1
.
T
=O
k=0
(46)
Since $II_{a1a}$ is the (1, 2) element of the matrix inside the norm on the left hand
side of (46), we have $II_{a1a} \to_p 0$ as $(T, n \to \infty)$. Next, by the triangle and Cauchy–Schwarz inequalities

n
i=1
n1
Tn
1
xi, Tu i, T
T
T
i=1
n
T
sup E
1in
xi, T
E|ui, T|2

xi, T
n
,
T
E|ui, T|2 = O
T
where the equality is easily verified by using (26), (30), (33) and (34).
p
p
Therefore, IIa1c 0 as (T, n ) with n/T 0. Obviously, also, IIa1b 0 as
sup|ci|/T
1| and note that
(T, n ) with n/T 0. Finally, for IIa1d, let rT = T|e

n
E|IIa1d| rTE
n
i=1
1
T2
rT
i=1
1
T
n1
Tn
xi, tu i, t rT
t=1
T1
n1
Tn
T1
t=1
xi, t
i=1
T
1
T
E|ui, t|2 = O
T1
t=1
xi, t
T
u i, t
n
,
T
by similar arguments to those used for IIa1c and the fact that rT = O(1).
p
We turn to show that IIa2 0 as (T, n ) with n/T 0. Using (32) write

n
IIa2 =
n
1
T
i=1
n
1
n
i=1

t1
1
T
i, t(C i, t + C wwi, t) (C w , 0C + C ww, 0C w)
t=1
((t 1)/T)ci
(e
i, 0 + (1 e
ci /T
t=1
(C i, t + C wwi, t) = IIa2a + IIa2b,
e((t 1 s)/T)ci i, s + e(t/T)cixi, 0)
s=1
say.
Here IIa2a 0 as (T, n ) with n/T 0, because IIa2a is identical with the

n
term
1
n
i=1
R3, i, T in the proof of Lemma 16 of Phillips & Moon (1999a,
p. 1105). Finally, the result IIa2b 0 as (T, n ) with n/T 0 follows from
similar arguments as those used for IIa1. Details are straightforward and thus are
omitted. This completes the proof of the theorem.
APPENDIX D: PROOF OF THEOREM 5

The proof follows from the same arguments as the proof of Theorem 4. To see
the main lines write
nT( + ) nBn, T

n
n
i=1
1
1
[xi, t(ui, t
u

xi, t u+ ) + T(eci /T 1)u
xi, txi, t 1]
t=1

,
T
1
x2i, t
n i=1 T 2 t=1
where the denominator has the limit given in (36). Next let u+ = u
1
and note that the numerator in the above estimation error can be
u
written as
1

n
1
n
i=1
1
T
1
[xi, t(ui, t u
i, t) u+ ]
t=1

n
1
1
n( u

u
)
1
n
i=1
1
T
(xi, txi, t )
t=1
1
n( u u) + u

n( ),
where the n-normalized estimation errors of the kernel estimators are op(1)
as (n, T ) with n/T 0 (recall Remark 1). Furthermore, using the fact that
xi, t = (eci /T 1)xi, t 1 + i, t we can write

n
1
n
i=1
1
T
t=1

n
1
(xi, txi, t ) =
n
(eci /T 1)
T
i=1
1
+
n
i=1
xi, txi, t 1
t=1
1
T
(xi, ti, t ) = Op(1),
t=1
where the last equality holds as (n, T ) and can be proved by applying the
arguments given in the proof of Theorem 4. Thus, for the result in part (a) of
Theorem 5, it suffices that

n
1
n
i=1
1
T
1
[xi, t(ui, t u
i, t) u+ ] N(0, u xx),
t=1
as (T, n ) with n/T 0. The details of the proof of this latter result are
similar to those of the proof of (35) and are thus omitted. Finally, note that the
limiting result in part (b) of the theorem follows from lines used in the proof
of (36) and the fact that the arithmetic average of the quantities ciE(01 Jci(r)2dr)
converges to a finite number cxx.
STATIONARITY TESTS IN
HETEROGENEOUS PANELS
Yong Yin and Shaowen Wu
ABSTRACT
Several stationarity tests in heterogeneous panel data models are
proposed in this chapter. By allowing the maximum degree of heterogeneity in
the panel, two different ways of pooling information from independent
tests, the group mean and the Fisher tests, are used to develop the panel
stationarity tests. We consider the case of serially correlated errors in the
level and trend stationary models. The small sample performances of the
tests are investigated via Monte Carlo simulations. The simulation
experiments reveal good small sample performances. In the presence of
serial correlation, either the group mean or the Fisher tests based on
individual KPSS tests with l2 and LMC tests with p = 1 are recommended
for use in empirical work due to their good small sample performances.
I. INTRODUCTION
Dynamic panel data analysis has attracted more and more attention. This is
partly due to the recent availability of large panel data sets. These data sets
usually cover different countries, industries, or regions over relatively long time
spans. They offer new opportunities as well as challenges to the analysis of
dynamic panel data models, especially the heterogeneous panel data models as
researchers usually would anticipate great differences among the cross-section
units in the data.
ISBN: 0-7623-0688-2
Along with the development of univariate non-stationary time series
analysis, researchers have also shown more interest in analyzing non-stationary
panel data. So far, various methods have been proposed to test for unit roots
and cointegration, along with methods of estimating cointegrating systems in the
context of panel data; see Baltagi & Kao (2000) for an up-to-date survey in this
volume. The biggest advantage of the panel data approach is the
increased effective sample size, which can raise the power
of statistical tests and the efficiency of estimation methods compared with
their univariate counterparts. However, extending univariate methods of
handling non-stationary data to the context of panel data raises the question of
heterogeneity as well.
The early development of dynamic panel data analysis mainly deals with the
homogeneous models. But the availability of panel data sets such as the Penn
World Table raises the issue of plausibility of the homogeneous assumption.
The parameters as well as dynamic structures of different cross-section units
might be different. Hence, it is necessary to develop methods for investigating
non-stationarity in heterogeneous panel data models. A heterogeneous panel
data model refers to the situation in which both the error term
structures and the slopes can differ across the units. This is quite
different from the usual fixed-effects (random-effects) models.
Several papers deal with tests for unit roots and
cointegration in heterogeneous panels; see, for example, Im,
Pesaran & Shin (1997) and Maddala & Wu (1999) for panel unit root tests, and
Pedroni (1995, 1997), Kao (1999), McCoskey & Kao (1997, 1998), and Wu &
Yin (1999) for panel cointegration tests. Baltagi & Kao (2000) give a
complete survey of this subject as well. As in the univariate case, it would be
interesting to test for unit roots by using stationarity as the null. Not only does
this complement the conventional unit root tests that use non-stationarity as the
null, but it also incorporates the moving average structure
that seems to be a common empirical feature, especially for macroeconomic
data.1 Thus, it is quite natural to develop stationarity tests for the heterogeneous
panel.
However, panel stationarity tests have not yet received serious attention in
the literature. Stationarity tests for residuals have been developed in
McCoskey & Kao (1998) as residual-based tests for the null of cointegration
in panel data models. Hadri (1998) addresses panel stationarity tests
directly. However, he considers only models with i.i.d. errors and only
homogeneous deterministic trends under the null hypothesis.
In this chapter, we shall develop some stationarity tests in heterogeneous
panel data models. The models we consider will allow both heterogeneous
deterministic trends under the null and different error structures. The tests
should be able to handle serially correlated errors in the models. In the
univariate case, based on a Lagrange Multiplier (LM) test in case of i.i.d.
errors, there are two different extensions to handle the existence of serial
correlation. Kwiatkowski, Phillips, Schmidt & Shin (1992) (KPSS hereafter)
propose to use nonparametric estimation to handle the situation while
Leybourne & McCabe (1994) (LMC hereafter) propose to use augmented
autoregressive components to take care of it. We shall propose panel
stationarity tests utilizing both tests. One type of the tests we propose would be
based on the group mean of the individual test statistics, which can be shown
to have a normal distribution asymptotically after some adjustments are made
to the group mean. The second test is in line with Maddala & Wu (1999). The
idea of the test could be traced back to Fisher (1932), which pools the p-values
from individual tests. We will also design some Monte Carlo experiments to
investigate the small sample performances of the proposed tests.
The rest of the chapter is organized as follows. In Section II we set up
the models for the heterogeneous panel and discuss the panel stationarity tests. Monte
Carlo simulation designs and results aimed at investigating the small sample
performances of the proposed tests can be found in Section III, and Section IV
concludes.
II. TESTS FOR STATIONARITY IN THE

HETEROGENEOUS PANELS
The basic model for testing for trend stationarity in the univariate time series
is as follows:

$$y_t = r_t + \beta t + \varepsilon_t \qquad (1)$$

where $r_t$ is a random walk:

$$r_t = r_{t-1} + \nu_t .$$

It is assumed that $\varepsilon_t \sim \mathrm{iid}(0, \sigma_\varepsilon^2)$, $\nu_t \sim \mathrm{iid}(0, \sigma_\nu^2)$, and $\varepsilon_t$ and $\nu_t$ are independent.
The initial value $r_0$ is treated as fixed and serves as an intercept. The
null of stationarity is simply $\sigma_\nu^2 = 0$. Under the null, $y_t$ is trend stationary
because $\varepsilon_t$ is assumed to be stationary. Define $q = \sigma_\nu^2/\sigma_\varepsilon^2$; $q$ is the so-called
signal-to-noise ratio in structural time series models. The null can be specified
as $H_0: q = 0$ as well. If $\beta = 0$, the model reduces to

$$y_t = r_t + \varepsilon_t \qquad (2)$$

and under the null $y_t$ is level stationary instead of trend stationary.
The statistic considered in the literature is both the one-sided LM test
statistic and the locally best invariant (LBI) test statistic under the stronger
assumption that the errors are normal.2 Let $e_t$ be the residuals from the regression
of $y_t$ on a linear time trend. Define $\hat\sigma_\varepsilon^2 = \sum_{t=1}^{T} e_t^2/T$ and the partial sum
process of the residuals $S_t = \sum_{i=1}^{t} e_i$. Then the LM test statistic is

$$LM = \sum_{t=1}^{T} S_t^2 / \hat\sigma_\varepsilon^2 .$$
In order to construct the LM test statistic to test the null hypothesis of level
stationarity instead of trend stationarity, we should define $e_t$ as the residuals from
the regression of $y_t$ on an intercept only.

It has been shown that for the trend stationary model, $T^{-2} LM \xrightarrow{d} \int_0^1 V_2(r)^2\,dr$
under the null hypothesis, where $V_2(r)$ is the second-level Brownian bridge
given by $V_2(r) = W(r) + (2r - 3r^2)W(1) + (-6r + 6r^2)\int_0^1 W(s)\,ds$, with $W(r)$
being a Wiener process. For the level stationary model, under the null,
$T^{-2} LM \xrightarrow{d} \int_0^1 V(r)^2\,dr$, where $V(r)$ is the standard Brownian bridge:
$V(r) = W(r) - rW(1)$.
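As a rough illustration of the statistic just described, the following Python sketch computes the normalized LM statistic $T^{-2}\sum_t S_t^2/\hat\sigma_\varepsilon^2$ from a single series. The function names `ols_residuals` and `lm_statistic` are hypothetical, and the regression is coded by hand instead of relying on a statistics library; it is a sketch of the formula, not the authors' GAUSS implementation.

```python
def ols_residuals(y, trend=False):
    """Residuals from regressing y on an intercept (level case) or on an
    intercept and a linear time trend (trend case)."""
    T = len(y)
    y_bar = sum(y) / T
    if not trend:
        return [v - y_bar for v in y]
    t_bar = (T + 1) / 2.0  # mean of t = 1, ..., T
    sxy = sum((t - t_bar) * (v - y_bar) for t, v in enumerate(y, start=1))
    sxx = sum((t - t_bar) ** 2 for t in range(1, T + 1))
    slope = sxy / sxx
    return [v - y_bar - slope * (t - t_bar) for t, v in enumerate(y, start=1)]


def lm_statistic(y, trend=False):
    """Normalized LM statistic: T^{-2} * sum_t S_t^2 / sigma_hat^2."""
    e = ols_residuals(y, trend)
    T = len(e)
    sigma2_hat = sum(v * v for v in e) / T
    partial_sum, sum_sq = 0.0, 0.0
    for v in e:
        partial_sum += v            # S_t = e_1 + ... + e_t
        sum_sq += partial_sum ** 2
    return sum_sq / (T ** 2 * sigma2_hat)
```

Large values of the statistic count as evidence against stationarity; the critical values come from the Brownian bridge functionals given above or from simulation.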

There are two ways to incorporate serial correlation into the basic univariate
models. One way is due to KPSS and the other one is due to LMC. In KPSS,
the models are still (1) and (2), with the modification that $\varepsilon_t$ can be serially
correlated in any form. The usual specification is that $\varepsilon_t$ satisfies the strong
mixing regularity conditions of Phillips & Perron (1988). Under such
conditions, the normalized numerators of the LM test statistics will converge to
the corresponding Brownian bridges associated with the long-run variance $\sigma^2$
of $\varepsilon_t$. So the effort is concentrated on how to get a consistent estimator
of $\sigma^2$. KPSS consider the Newey & West (1987) consistent estimator $s^2(l)$,
which is based on nonparametric estimation:

$$s^2(l) = T^{-1}\sum_{t=1}^{T} e_t^2 + 2T^{-1}\sum_{s=1}^{l} w(s, l) \sum_{t=s+1}^{T} e_t e_{t-s} .$$

This estimator depends on the choice of a spectral
window $w(s, l)$ along with the truncation parameter $l$.

KPSS use the Bartlett window and recommend choosing $l = o(T^{1/2})$. The
resulting test statistics are labeled $\hat\eta_\mu$ for level stationary models and $\hat\eta_\tau$ for
trend stationary models, with $\hat\eta_\mu\,(\hat\eta_\tau) = T^{-2}\sum_{t=1}^{T} S_t^2/s^2(l)$, where both $S_t$ and $s^2(l)$
depend on $e_t$, which is the residual from the regression of $y_t$ on an intercept only
for the level stationary models and on a linear trend for the trend stationary
models. It has also been proved that $\hat\eta_\mu \xrightarrow{d} \int_0^1 V(r)^2\,dr$ and
$\hat\eta_\tau \xrightarrow{d} \int_0^1 V_2(r)^2\,dr$, and that
both tests are consistent. See KPSS for more details of derivation and proof
along with some simulation results.
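The long-run variance correction can be sketched in a few lines of Python. The sketch below assumes the Bartlett window $w(s, l) = 1 - s/(l+1)$ that the text attributes to KPSS, and takes the regression residuals $e_t$ as given; the function names are hypothetical.

```python
def bartlett_long_run_variance(e, l):
    """Newey-West type estimator s^2(l) with Bartlett weights."""
    T = len(e)
    s2 = sum(v * v for v in e) / T
    for s in range(1, l + 1):
        weight = 1.0 - s / (l + 1.0)                       # w(s, l)
        autocov = sum(e[t] * e[t - s] for t in range(s, T))
        s2 += 2.0 * weight * autocov / T
    return s2


def kpss_statistic(e, l):
    """KPSS-type statistic T^{-2} * sum_t S_t^2 / s^2(l) from residuals e."""
    T = len(e)
    partial_sum, sum_sq = 0.0, 0.0
    for v in e:
        partial_sum += v
        sum_sq += partial_sum ** 2
    return sum_sq / (T ** 2 * bartlett_long_run_variance(e, l))
```

With `l = 0` the estimator collapses to the plain residual variance, so the statistic reduces to the normalized LM statistic for i.i.d. errors.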
The KPSS tests handle serial correlation in a way similar to the
Phillips-Perron tests for unit roots. LMC, on the other hand, propose to use an
augmented autoregression to handle serial correlation, which is similar
to the approach of the Augmented Dickey-Fuller tests for unit roots. Since any
stationary structure can be represented by autoregressive structures, LMC work
with transformed versions of models (1) and (2). That is, $\Phi(L) y_t = r_t + \beta t + \varepsilon_t$ for trend
stationary models, and $\Phi(L) y_t = r_t + \varepsilon_t$ for level stationary models, where $\Phi(L)$
is a polynomial in the lag operator $L$.
To construct the test statistics, one should estimate ARIMA(p, 1, 1) models
in order to remove the serial correlation first, and proceed with the whitened
series to get the LM test statistic as if there is no serial correlation. LMC label
the test statistic $\hat s_\alpha$ for the level stationary models and $\hat s_\beta$ for the trend stationary
models. Please see their paper for detailed descriptions and discussions of the
tests. They also show that under the null $\hat s_\alpha \xrightarrow{d} \int_0^1 V(r)^2\,dr$ and
$\hat s_\beta \xrightarrow{d} \int_0^1 V_2(r)^2\,dr$.
LMC argue that their tests are superior to the KPSS tests due to the fact that the
augmented autoregression is used to control for serial correlation. Theoretically,
the LMC tests are more powerful than the KPSS tests because the LMC
test statistics are $O_p(T)$ under the alternative while the KPSS test statistics are
$O_p(T/l)$. This superiority is also shown through Monte Carlo simulation.3
The univariate model for testing for stationarity can be readily extended to
panel data models. Let $y_{it}$, $i = 1, \ldots, N$, $t = 1, \ldots, T$, be the observed $N$
cross-section units over a time span of $T$ for which we want to test for stationarity.
Let us consider the following models.

Level stationarity: $y_{it} = r_{it} + \varepsilon_{it}$ (3)

Trend stationarity: $y_{it} = r_{it} + \beta_i t + \varepsilon_{it}$ (4)

where $r_{it} = r_{i,t-1} + \nu_{it}$, with the $r_{i0}$'s being fixed constants such that $r_{i0}$ is not
necessarily equal to $r_{j0}$ if $i \neq j$.4
Assumption
(i) $E(\nu_{it}) = 0$, and $E(\nu_{it}\nu_{js}) = \sigma_{\nu i}^2$ if $i = j$ and $t = s$, and $0$ otherwise.
(ii) For each cross-section unit $i$, $\varepsilon_{it}$ either satisfies the strong mixing
conditions for a functional central limit theorem to hold, with long-run
variance $\sigma_{\varepsilon i}^2$, or can be expressed as a $p$-th order AR model.
(iii) $E(\varepsilon_{it}\nu_{js}) = 0$ for all $i, j, t, s$.

Note that assumption (i) adds heterogeneity to the error structure of $\nu$ by
allowing heteroskedasticity. Assumption (ii) also allows heteroskedasticity in
$\varepsilon$, while assumption (iii) rules out contemporaneous correlation and states that
$\varepsilon$ and $\nu$ are uncorrelated within units as well.
Define $q_i = \sigma_{\nu i}^2/\sigma_{\varepsilon i}^2$; that is, the $q_i$'s are the signal-to-noise ratios of the
cross-section units. The null hypothesis can be expressed as $H_0: q_i = 0$ for all $i$. For
level stationary models, under $H_0$, each cross-section unit is stationary around
a level $r_{i0}$, which is not necessarily the same across the units. For trend
stationary models, under $H_0$, each cross-section unit is stationary around a
linear trend $r_{i0} + \beta_i t$, which is also not necessarily the same across the units. The
different levels and linear trends reflect the possibility of heterogeneity
across sections. The alternative hypothesis is $H_1: q_i > 0$ for all $i$. Here, we
introduce heterogeneity by allowing different signal-to-noise ratios across
sections. That is, under the alternative the signal-to-noise ratios are only required
to be greater than 0, not to be the same.
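Let $\hat\eta_{\mu i}$ and $\hat\eta_{\tau i}$ be the individual KPSS test statistics for the $i$-th unit. Define
$\xi_1 = \int_0^1 V(r)^2\,dr$ and $\xi_2 = \int_0^1 V_2(r)^2\,dr$. We can construct the standardized group
mean tests as

$$\bar\eta_\mu = \frac{\sqrt{N}\left(\frac{1}{N}\sum_{i=1}^{N}\hat\eta_{\mu i} - E(\xi_1)\right)}{\sqrt{\mathrm{Var}(\xi_1)}} \quad \text{for level stationary models}$$

and

$$\bar\eta_\tau = \frac{\sqrt{N}\left(\frac{1}{N}\sum_{i=1}^{N}\hat\eta_{\tau i} - E(\xi_2)\right)}{\sqrt{\mathrm{Var}(\xi_2)}} \quad \text{for trend stationary models.}$$

Similarly, let $\hat s_{\alpha i}$ and $\hat s_{\beta i}$ be the individual LMC test statistics for the $i$-th unit.
Define the standardized group mean tests as

$$\bar s_\alpha = \frac{\sqrt{N}\left(\frac{1}{N}\sum_{i=1}^{N}\hat s_{\alpha i} - E(\xi_1)\right)}{\sqrt{\mathrm{Var}(\xi_1)}} \quad \text{for level stationary models}$$

and

$$\bar s_\beta = \frac{\sqrt{N}\left(\frac{1}{N}\sum_{i=1}^{N}\hat s_{\beta i} - E(\xi_2)\right)}{\sqrt{\mathrm{Var}(\xi_2)}} \quad \text{for trend stationary models.}$$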
By using the sequential limit theorem, it can be shown that under the null, all
four test statistics have the standard normal distribution asymptotically
under the assumptions spelled out earlier. Note that the sequential limit theorem
requires that $T$ goes to infinity followed by $N$ going to infinity, and the
asymptotic normality can be established by an application of the Lindeberg-Lévy central
limit theorem.5 The consistency of the tests follows from the consistency of
the univariate tests established in the literature. It should be noted that the tests
are still consistent in the case of a mixed alternative hypothesis in which only
part of the panel is nonstationary while the rest is stationary, as long as
$\delta = \lim_{N\to\infty} N_1/N > 0$, where $N_1$ is the number of nonstationary series under the
alternative.
Hadri (1998) used the characteristic function given by Anderson & Darling
(1952) to compute the means and the variances of the limiting distributions
$\int_0^1 V(r)^2\,dr$ and $\int_0^1 V_2(r)^2\,dr$. For the level stationary
model, the mean is 1/6 and the variance is 1/45, while for the trend stationary
model, the mean is 1/15 and the variance is 11/6300. However, as suggested in
Im, Pesaran & Shin (1997), one can use the mean and the variance of the small
sample distributions (at finite $T$) obtained via simulations to enhance the finite
sample performances of the group mean tests.6
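The standardization can be sketched as follows, using the asymptotic moments quoted above as defaults. The function name `group_mean_statistic` and the `moments` override (for simulated finite-$T$ moments) are hypothetical conveniences, not part of the chapter.

```python
import math

# Asymptotic (mean, variance) of the limiting distributions quoted in the text.
ASYMPTOTIC_MOMENTS = {
    "level": (1.0 / 6.0, 1.0 / 45.0),
    "trend": (1.0 / 15.0, 11.0 / 6300.0),
}


def group_mean_statistic(individual_stats, model="level", moments=None):
    """Standardized group mean: sqrt(N) * (average - mean) / sqrt(variance).

    `moments` may be a (mean, variance) pair obtained by simulation for the
    relevant T; otherwise the asymptotic moments are used.
    """
    mean, variance = moments if moments is not None else ASYMPTOTIC_MOMENTS[model]
    n_units = len(individual_stats)
    average = sum(individual_stats) / n_units
    return math.sqrt(n_units) * (average - mean) / math.sqrt(variance)
```

Because the individual tests reject for large values, the standardized statistic would be compared with an upper-tail standard normal critical value (for example 1.645 at the 5% level).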
The group mean test pools independent individual test statistics to find
evidence on the composite null. In the literature, there is another way to pool
information from individual test to test the composite null, which is due to
Fisher (1932). The idea has been applied to develop panel unit root tests in
Maddala & Wu (1999) and panel cointegration tests in Wu & Yin (1999). Both
the KPSS and the LMC tests can be used to formulate the Fisher tests to test
for stationarity as well. Let $P_i$ be the p-value of the individual test for
stationarity for the $i$-th unit (using either the KPSS or the LMC test). Define the
Fisher test statistic as $\lambda = -2\sum_{i=1}^{N} \log P_i$.7 Then $\lambda$ has a $\chi^2$ distribution with
$2N$ degrees of freedom under the null hypothesis that $q_i = 0$ for all $i$. Note that
the validity of the $\chi^2$ distribution depends on the accuracy of the distributions
from which the $P_i$'s are derived, and thus it does not rely on asymptotics in $N$,
whereas the group mean test does. On the other hand, the small sample
distribution is usually unknown, so it is necessary to get the small sample
distributions via simulations to enhance the small sample performance of the
Fisher tests.8
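A minimal sketch of the Fisher combination is given below. It assumes the individual p-values are already available (for example from simulated small sample distributions) and uses the closed-form survival function of a $\chi^2$ variable with an even number of degrees of freedom, so no external library is needed. The function names are hypothetical.

```python
import math


def fisher_statistic(p_values):
    """Fisher statistic: -2 * sum of log p-values."""
    return -2.0 * sum(math.log(p) for p in p_values)


def chi2_sf_even_df(x, n_units):
    """P(chi2 with 2 * n_units degrees of freedom > x).

    For even degrees of freedom the survival function has the closed form
    exp(-x/2) * sum_{k=0}^{n_units-1} (x/2)^k / k!.
    """
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, n_units):
        term *= half / k
        total += term
    return math.exp(-half) * total


def fisher_test(p_values):
    """Return the Fisher statistic and its chi-square p-value."""
    stat = fisher_statistic(p_values)
    return stat, chi2_sf_even_df(stat, len(p_values))
```

With a single unit the combined p-value equals the individual p-value, which is a convenient sanity check.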
III. MONTE CARLO SIMULATION RESULTS

In this section, we design Monte Carlo simulation experiments to
investigate the small sample properties of the panel stationarity tests
proposed in the previous section. The object of the simulations is to shed light
on the relative small sample performances of the various tests. As we have seen, we
can use either the KPSS or the LMC tests to handle serial correlation. For
each univariate stationarity test, we can use either the group mean test or the
Fisher test to formulate the panel version. As illustrated in Maddala & Wu
(1999) and Wu & Yin (1999), in many of the cases they considered, the performances
of the group mean and Fisher tests are very similar to each other. However, we
still need to investigate this for stationarity tests. As for the univariate KPSS and
LMC tests, LMC established the small sample superiority of their tests. But
whether this superiority carries over to the panel tests based on the
individual LMC test remains a question, which can be answered by simulation
experiments.
The basic models for simulations are models (3) and (4) with $r_{it} = r_{i,t-1} + \nu_{it}$,
where $\nu_{it} \sim \mathrm{iid}\,N(0, q_i\sigma_i^2)$. The models for $\varepsilon_{it}$ are $\varepsilon_{it} = \rho_i\varepsilon_{i,t-1} + u_{it}$, where
$u_{it} \sim \mathrm{iid}\,N(0, (1 - \rho_i^2)\sigma_i^2)$. Hence when $\rho_i = 0$, the $\varepsilon_{it}$'s are i.i.d. within each unit, while
the $\varepsilon_{it}$'s are serially correlated within each unit when $\rho_i \neq 0$.
These two models are extensions of the standard univariate models for
stationarity to panel data. The introduction of different $\beta_i$, $\sigma_i^2$, $r_{i0}$ and $\rho_i$ is to
allow the largest degree of heterogeneity. For this purpose, we set the
parameters as follows:
$\beta_i \sim U[0, 1]$, $\sigma_i^2 \sim U[0.5, 1.5]$, $r_{i0} \sim U[0, 5]$,
$\rho_i = 0$ for the i.i.d. case
and
$\rho_i \sim U[0.1, 0.3]$ for the case of serial correlation,
where $U$ denotes the uniform distribution.
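The data generating process above can be sketched in Python as follows. This is an illustrative reading of the design, not the authors' GAUSS code: the function names are hypothetical, a common $q$ is used for all units, and the initial value of the AR(1) error is drawn from its stationary distribution, which is an assumption the text does not state.

```python
import math
import random


def simulate_unit(T, q, rho, trend, rng):
    """One series y_t = r_t + beta*t + eps_t with random-walk r_t, AR(1) eps_t."""
    beta = rng.uniform(0.0, 1.0) if trend else 0.0
    sigma2 = rng.uniform(0.5, 1.5)
    r = rng.uniform(0.0, 5.0)                          # r_{i0}
    sd_nu = math.sqrt(q * sigma2)                      # random-walk innovation
    sd_u = math.sqrt((1.0 - rho ** 2) * sigma2)        # AR(1) innovation
    eps = rng.gauss(0.0, math.sqrt(sigma2))            # assumed stationary start
    y = []
    for t in range(1, T + 1):
        r += rng.gauss(0.0, sd_nu)                     # q = 0 keeps r fixed at r_{i0}
        eps = rho * eps + rng.gauss(0.0, sd_u)
        y.append(r + beta * t + eps)
    return y


def simulate_panel(N, T, q=0.0, serial=False, trend=False, seed=None):
    """Panel of N independent units; q = 0 gives the null of stationarity."""
    rng = random.Random(seed)
    panel = []
    for _ in range(N):
        rho = rng.uniform(0.1, 0.3) if serial else 0.0
        panel.append(simulate_unit(T, q, rho, trend, rng))
    return panel
```

For instance, `simulate_panel(15, 25, q=0.001, serial=True)` would generate one panel under the alternative with serially correlated errors.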
The null hypothesis is specified as $q_i = 0$ for all $i$. For the alternative
hypothesis, we only consider the case where all $q_i$'s are positive, following the
tradition in the literature. It should be noted that all our tests are consistent even
when only part of the series are non-stationary under the alternative,
as long as the portion of nonstationary units is non-vanishing asymptotically.
Furthermore, we only consider the alternative $H_1: q_i = q = 0.001$ for simplicity.9
We consider time dimensions of 25, 50, and 100 and cross-sectional
dimensions of 15, 25, 50, and 100. The normal variates are generated by the
RNDN function in the matrix programming language GAUSS. We apply the
group mean and Fisher tests based on the LM, KPSS, and LMC tests to each
panel. For each case, the number of iterations is 5,000. For the group mean test,
the mean and the variance of the small sample distributions are derived from
100,000 simulations for the corresponding time span and test procedures. For
the Fisher test, the small sample distributions are simulated using 100,000
replications as well.
In order to carry out our experiments, we still need to select two parameters.
One is the truncation parameter $l$ in the individual KPSS tests and the other one
is the order of autoregression $p$ in the individual LMC tests. Following
earlier simulation results regarding the univariate KPSS tests in the literature,
we experiment with $l_1 = \mathrm{int}[4(T/100)^{1/4}]$, $l_2 = \mathrm{int}[8(T/100)^{1/4}]$, and
$l_3 = \mathrm{int}[12(T/100)^{1/4}]$, where int[ ] returns the integer part of the argument.
Also, following earlier simulation results in the literature, we choose the Parzen
window instead of the Bartlett window used by KPSS, as the former performs
better than the latter. For the LMC test, we experiment with p = 1, 2, and 3
following Monte Carlo experiments by LMC.
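For the three sample sizes used in the experiments, the truncation rules int[c(T/100)^{1/4}] with c = 4, 8, 12 can be evaluated with a short Python sketch (the function name `truncation_lags` is hypothetical):

```python
def truncation_lags(T):
    """Return (l1, l2, l3) = int[c * (T / 100)^(1/4)] for c = 4, 8, 12."""
    scale = (T / 100.0) ** 0.25
    return tuple(int(c * scale) for c in (4, 8, 12))


for T in (25, 50, 100):
    print(T, truncation_lags(T))
# 25 (2, 5, 8)
# 50 (3, 6, 10)
# 100 (4, 8, 12)
```

So with T = 25 the "moderate" choice $l_2$ already uses five lags, which helps explain why long truncation lags cost power in short panels.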
Let us first look at the white noise case. In this case i = 0 and the tests based
on the individual LM tests are the appropriate ones to be used. Table 1 presents
the sizes of the group mean and the Fisher tests based on the LM, KPSS, and
LMC tests for the level stationary model. Note that by choosing l = 0 in the
KPSS test or p = 0 in the LMC test, the resulting test statistic is nothing but that
of the LM test. That is why the results for the tests based on the LM test are
listed in the column with the heading of p(l) = 0. We also listed the results for
N = 1 as a benchmark, where the results simply replicate those for the univariate
case. As we can see from the table, the size performances of the panel
stationarity tests are quite satisfactory in this case. In addition the performances
are relatively better as T gets larger. In most cases, the Fisher tests have better
size performances than the group mean tests, especially for larger T and smaller
N. This is not surprising as the Fisher test is an exact test while the group mean
test is an asymptotic test (in N). As for the tests based on the KPSS tests with
different lag truncation parameters and the LMC tests with different
autoregression orders, the sizes are also quite close to the nominal size of 5%.
In general, we also observe that the size performances are better for larger T
and the Fisher tests have better size performances in this case.

Table 1. Sizes of Panel Stationarity Tests: Level Stationary Model, White Noise

  T    N   p(l)=0  KPSS l1  KPSS l2  KPSS l3  LMC p=1  LMC p=2  LMC p=3
 25    1    0.047    0.049    0.053    0.055    0.047    0.049    0.051
       Group Mean Test
      15    0.061    0.057    0.061    0.059    0.063    0.059    0.063
      25    0.053    0.057    0.053    0.053    0.057    0.058    0.063
      50    0.054    0.055    0.055    0.055    0.054    0.054    0.059
     100    0.046    0.047    0.050    0.053    0.046    0.050    0.051
       Fisher Test
      15    0.050    0.047    0.051    0.056    0.046    0.045    0.052
      25    0.045    0.048    0.048    0.051    0.048    0.044    0.053
      50    0.047    0.050    0.052    0.053    0.046    0.047    0.052
     100    0.043    0.043    0.047    0.052    0.041    0.046    0.047
 50    1    0.047    0.046    0.047    0.051    0.047    0.050    0.050
       Group Mean Test
      15    0.066    0.059    0.058    0.064    0.051    0.065    0.058
      25    0.066    0.062    0.060    0.058    0.067    0.070    0.067
      50    0.056    0.053    0.050    0.054    0.059    0.065    0.057
     100    0.057    0.054    0.057    0.061    0.056    0.061    0.055
       Fisher Test
      15    0.052    0.050    0.050    0.054    0.049    0.050    0.046
      25    0.054    0.052    0.053    0.053    0.054    0.055    0.053
      50    0.049    0.045    0.043    0.048    0.049    0.055    0.049
     100    0.051    0.050    0.054    0.057    0.048    0.054    0.051
100    1    0.051    0.050    0.049    0.049    0.048    0.045    0.049
       Group Mean Test
      15    0.056    0.058    0.057    0.059    0.060    0.063    0.069
      25    0.057    0.057    0.058    0.061    0.061    0.062    0.063
      50    0.056    0.057    0.058    0.054    0.062    0.054    0.064
     100    0.056    0.058    0.056    0.055    0.059    0.056    0.059
       Fisher Test
      15    0.047    0.046    0.047    0.046    0.045    0.049    0.048
      25    0.047    0.049    0.051    0.051    0.050    0.049    0.049
      50    0.049    0.051    0.051    0.049    0.051    0.047    0.053
     100    0.053    0.053    0.053    0.051    0.053    0.052    0.051

Note:
1. The data generating process is $y_{it} = r_{i0} + \varepsilon_{it}$, and $\varepsilon_{it} \sim$ i.i.d. $N(0, \sigma_i^2)$.
2. Please see text for choices of parameters.
3. $l_i$ is the truncation parameter used in the individual KPSS test and p is the order of autoregression
in the ARIMA(p,1,1) model used in the individual LMC test. p(l) = 0 indicates that the individual LM test is used.
Table 2 presents the powers of the panel stationarity tests for the level
stationary models. To make things comparable, all the powers are adjusted
according to their true sizes. The powers of the LM based tests clearly demonstrate the
superiority of the panel stationarity tests over their univariate counterparts. When
T = 25, the power of the univariate LM test is only 0.117, while the power
jumps to 0.392 when 15 cross-section units are used, and it is close to 1 (0.954
for the group mean test and 0.952 for the Fisher test) when N = 100. As a matter
of fact, all the powers for T = 100 are 1 and they are close to 1 when T = 50.
The powers of the group mean and the Fisher tests in most cases are almost the
same.
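Size adjustment of this kind is commonly implemented by replacing the nominal critical value with an empirical quantile of the statistics simulated under the null. The Python sketch below illustrates that idea under the assumption of an upper-tail test; it is one plausible reading of "adjusted according to their true sizes", and the function names are hypothetical.

```python
import math


def empirical_critical_value(null_stats, level=0.05):
    """Order statistic of the null draws used as the upper-tail critical value."""
    ordered = sorted(null_stats)
    index = int(math.ceil((1.0 - level) * len(ordered))) - 1
    return ordered[index]


def size_adjusted_power(null_stats, alt_stats, level=0.05):
    """Share of alternative draws exceeding the empirical null critical value."""
    critical = empirical_critical_value(null_stats, level)
    return sum(1 for s in alt_stats if s > critical) / len(alt_stats)
```

By construction the rejection frequency under the null is then at most the chosen level, so power comparisons across tests are not driven by differences in size.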
It is documented in the literature that increasing the lag truncation parameter
l in the KPSS tests and the autoregression order p in the LMC tests can reduce
power. This is replicated in Table 2 in the entries for N = 1. However,
owing to the high power of the panel stationarity tests, the reduction in
power from overestimating l or p is not an issue in some cases, especially for larger T
and N, as in those cases the powers are 1 or close to 1. This is a unique feature
of panel stationarity tests. The reduction in power is smaller for the LMC tests
as p increases than for the KPSS tests as l increases.
The size and power performances of the panel stationarity tests in the case of
white noise for the trend stationary models are reported in Tables 3 and 4. We
have similar observations in these two tables. One thing we need to point out
is that in this case the powers are smaller than those of the level stationary models,
especially for T = 25, where the powers are much smaller. The
powers are only 0.280 for the group mean test and 0.279 for the Fisher test even
when N = 100, though these represent a nearly fourfold increase over the
univariate case.
Next, let us look at the results for the case of serial correlation. Table 5 gives
the sizes of the panel stationarity tests in this case. Note that size distortions are
expected for the tests based on the LM tests. This can be seen in the table for
the case of N = 1. But the size distortions become much worse as N increases.
As a matter of fact, the actual sizes are close to 1 when N = 100. This is due to
the fact that the size distortions are amplified through pooling the cross-sectional
units, as pointed out in Wu & Yin (1999) for the panel cointegration
tests as well. The size distortions are still quite severe when $l_1$ is used in the
KPSS tests and they become moderate when $l_2$ and $l_3$ are used for T = 50 and
Table 2. Size Adjusted Powers of Panel Stationarity Tests: Level Stationary Model, White Noise

  T    N   p(l)=0  KPSS l1  KPSS l2  KPSS l3  LMC p=1  LMC p=2  LMC p=3
 25    1    0.117    0.110    0.089    0.074    0.105    0.094    0.086
       Group Mean Test
      15    0.392    0.365    0.272    0.183    0.305    0.263    0.233
      25    0.546    0.491    0.383    0.262    0.414    0.341    0.306
      50    0.775    0.712    0.576    0.377    0.630    0.527    0.473
     100    0.954    0.936    0.834    0.612    0.874    0.779    0.727
       Fisher Test
      15    0.384    0.362    0.271    0.156    0.302    0.264    0.235
      25    0.542    0.492    0.381    0.236    0.408    0.346    0.308
      50    0.771    0.719    0.576    0.359    0.635    0.526    0.477
     100    0.952    0.936    0.835    0.584    0.873    0.780    0.729
 50    1    0.302    0.284    0.268    0.224    0.277    0.251    0.218
       Group Mean Test
      15    0.961    0.939    0.903    0.815    0.931    0.884    0.835
      25    0.995    0.990    0.977    0.944    0.986    0.969    0.941
      50    1.000    1.000    1.000    0.998    1.000    0.999    0.999
     100    1.000    1.000    1.000    1.000    1.000    1.000    1.000
       Fisher Test
      15    0.960    0.938    0.908    0.828    0.932    0.891    0.836
      25    0.995    0.991    0.978    0.946    0.986    0.972    0.944
      50    1.000    1.000    1.000    0.998    1.000    0.999    0.999
     100    1.000    1.000    1.000    1.000    1.000    1.000    1.000
100    1    0.583    0.536    0.495    0.455    0.566    0.547    0.512
       Group Mean Test
      15    1.000    1.000    1.000    1.000    1.000    1.000    1.000
      25    1.000    1.000    1.000    1.000    1.000    1.000    1.000
      50    1.000    1.000    1.000    1.000    1.000    1.000    1.000
     100    1.000    1.000    1.000    1.000    1.000    1.000    1.000
       Fisher Test
      15    1.000    1.000    1.000    1.000    1.000    1.000    1.000
      25    1.000    1.000    1.000    1.000    1.000    1.000    1.000
      50    1.000    1.000    1.000    1.000    1.000    1.000    1.000
     100    1.000    1.000    1.000    1.000    1.000    1.000    1.000

Note:
1. The data generating process is $y_{it} = r_{it} + \varepsilon_{it}$, $r_{it} = r_{i,t-1} + \nu_{it}$, $\nu_{it} \sim$ i.i.d. $N(0, q\sigma_i^2)$, and
$\varepsilon_{it} \sim$ i.i.d. $N(0, \sigma_i^2)$.
2. See Note 2 in Table 1.
Table 3. Sizes of Panel Stationarity Test Based on Group Mean: Trend Stationary Model, White Noise

  T    N   p(l)=0  KPSS l1  KPSS l2  KPSS l3  LMC p=1  LMC p=2  LMC p=3
 25    1    0.052    0.052    0.060    0.054    0.051    0.050    0.054
       Group Mean Test
      15    0.065    0.064    0.058    0.057    0.071    0.061    0.065
      25    0.064    0.058    0.059    0.062    0.073    0.067    0.067
      50    0.066    0.063    0.060    0.062    0.064    0.065    0.063
     100    0.062    0.060    0.058    0.060    0.060    0.060    0.063
       Fisher Test
      15    0.057    0.055    0.052    0.050    0.055    0.051    0.058
      25    0.054    0.051    0.052    0.051    0.057    0.053    0.059
      50    0.059    0.057    0.055    0.054    0.055    0.056    0.057
     100    0.054    0.053    0.056    0.056    0.053    0.053    0.057
 50    1    0.046    0.047    0.045    0.049    0.046    0.047    0.050
       Group Mean Test
      15    0.050    0.055    0.056    0.065    0.064    0.073    0.068
      25    0.060    0.053    0.053    0.060    0.062    0.073    0.069
      50    0.057    0.055    0.053    0.056    0.068    0.075    0.068
     100    0.058    0.054    0.050    0.059    0.064    0.072    0.074
       Fisher Test
      15    0.049    0.047    0.050    0.056    0.049    0.053    0.051
      25    0.051    0.048    0.049    0.055    0.048    0.056    0.053
      50    0.052    0.049    0.049    0.052    0.055    0.061    0.054
     100    0.054    0.052    0.049    0.056    0.056    0.066    0.064
100    1    0.046    0.042    0.041    0.042    0.043    0.050    0.048
       Group Mean Test
      15    0.061    0.062    0.060    0.058    0.064    0.070    0.074
      25    0.057    0.057    0.057    0.055    0.063    0.065    0.068
      50    0.059    0.059    0.060    0.056    0.062    0.068    0.066
     100    0.054    0.053    0.056    0.055    0.062    0.060    0.059
       Fisher Test
      15    0.052    0.051    0.051    0.052    0.052    0.053    0.060
      25    0.049    0.050    0.050    0.049    0.053    0.055    0.057
      50    0.052    0.051    0.051    0.051    0.053    0.060    0.057
     100    0.048    0.049    0.050    0.048    0.054    0.054    0.053

Note:
1. The data generating process is $y_{it} = r_{i0} + \beta_i t + \varepsilon_{it}$, and $\varepsilon_{it} \sim$ i.i.d. $N(0, \sigma_i^2)$.
Table 4. Size Adjusted Powers of Panel Stationarity Test: Trend Stationary Model, White Noise

  T    N   p(l)=0  KPSS l1  KPSS l2  KPSS l3  LMC p=1  LMC p=2  LMC p=3
 25    1    0.068    0.060    0.047    0.045    0.061    0.061    0.058
       Group Mean Test
      15    0.108    0.106    0.069    0.040    0.091    0.090    0.080
      25    0.144    0.128    0.074    0.034    0.090    0.092    0.083
      50    0.172    0.159    0.085    0.031    0.118    0.109    0.102
     100    0.280    0.245    0.116    0.027    0.185    0.164    0.132
       Fisher Test
      15    0.109    0.103    0.067    0.040    0.089    0.086    0.079
      25    0.144    0.138    0.070    0.030    0.090    0.092    0.083
      50    0.163    0.157    0.084    0.027    0.127    0.109    0.098
     100    0.279    0.257    0.108    0.024    0.187    0.169    0.140
 50    1    0.133    0.124    0.106    0.079    0.120    0.106    0.095
       Group Mean Test
      15    0.485    0.426    0.327    0.159    0.374    0.287    0.252
      25    0.629    0.576    0.450    0.212    0.509    0.374    0.317
      50    0.867    0.806    0.677    0.320    0.723    0.564    0.488
     100    0.986    0.971    0.901    0.508    0.937    0.817    0.718
       Fisher Test
      15    0.490    0.427    0.325    0.153    0.385    0.293    0.252
      25    0.631    0.574    0.445    0.205    0.518    0.400    0.336
      50    0.864    0.805    0.673    0.311    0.740    0.581    0.509
     100    0.985    0.968    0.898    0.488    0.939    0.833    0.730
100    1    0.341    0.317    0.275    0.231    0.321    0.272    0.239
       Group Mean Test
      15    0.991    0.975    0.938    0.874    0.978    0.946    0.873
      25    1.000    1.000    0.993    0.972    0.999    0.994    0.974
      50    1.000    1.000    1.000    0.999    1.000    1.000    1.000
     100    1.000    1.000    1.000    1.000    1.000    1.000    1.000
       Fisher Test
      15    0.990    0.972    0.937    0.868    0.979    0.950    0.888
      25    1.000    1.000    0.992    0.972    0.999    0.995    0.982
      50    1.000    1.000    1.000    1.000    1.000    1.000    1.000
     100    1.000    1.000    1.000    1.000    1.000    1.000    1.000

Note:
1. The data generating process is $y_{it} = r_{it} + \beta_i t + \varepsilon_{it}$, $r_{it} = r_{i,t-1} + \nu_{it}$, $\nu_{it} \sim$ i.i.d. $N(0, q\sigma_i^2)$, and
$\varepsilon_{it} \sim$ i.i.d. $N(0, \sigma_i^2)$.
Table 5. Sizes of Panel Stationarity Tests: Level Stationary Model, Serial Correlation

  T    N   p(l)=0  KPSS l1  KPSS l2  KPSS l3  LMC p=1  LMC p=2  LMC p=3
 25    1    0.079    0.059    0.050    0.047    0.051    0.054    0.058
       Group Mean Test
      15    0.532    0.232    0.074    0.032    0.130    0.150    0.152
      25    0.694    0.302    0.076    0.025    0.144    0.182    0.172
      50    0.904    0.433    0.079    0.017    0.181    0.230    0.221
     100    0.993    0.657    0.087    0.012    0.212    0.314    0.328
       Fisher Test
      15    0.490    0.205    0.066    0.028    0.104    0.129    0.129
      25    0.669    0.270    0.067    0.024    0.123    0.160    0.150
      50    0.897    0.401    0.072    0.016    0.156    0.210    0.206
     100    0.993    0.641    0.089    0.012    0.212    0.314    0.328
 50    1    0.080    0.057    0.050    0.046    0.055    0.060    0.059
       Group Mean Test
      15    0.551    0.162    0.081    0.058    0.099    0.126    0.138
      25    0.747    0.208    0.090    0.052    0.102    0.144    0.161
      50    0.945    0.300    0.103    0.049    0.117    0.182    0.209
     100    0.999    0.472    0.132    0.048    0.145    0.250    0.286
       Fisher Test
      15    0.517    0.140    0.070    0.050    0.077    0.096    0.109
      25    0.729    0.190    0.077    0.050    0.082    0.113    0.137
      50    0.944    0.279    0.095    0.047    0.091    0.155    0.178
     100    0.999    0.456    0.128    0.047    0.116    0.213    0.264
100    1    0.094    0.062    0.053    0.052    0.052    0.058    0.057
       Group Mean Test
      15    0.563    0.130    0.080    0.065    0.077    0.086    0.099
      25    0.783    0.169    0.083    0.063    0.082    0.094    0.114
      50    0.944    0.210    0.091    0.065    0.081    0.096    0.124
     100    0.998    0.307    0.104    0.066    0.087    0.106    0.145
       Fisher Test
      15    0.532    0.109    0.062    0.052    0.057    0.066    0.074
      25    0.773    0.148    0.071    0.056    0.064    0.075    0.083
      50    0.943    0.193    0.083    0.059    0.066    0.072    0.092
     100    0.998    0.293    0.098    0.062    0.070    0.083    0.112

Note:
1. The data generating process is $y_{it} = r_{i0} + \varepsilon_{it}$, $\varepsilon_{it} = \rho_i\varepsilon_{i,t-1} + u_{it}$, and $u_{it} \sim$ i.i.d. $N(0, (1 - \rho_i^2)\sigma_i^2)$.
100. For the LMC test, the size distortion is still considerable even when the
true order of autoregression (p = 1) is used when T = 25. The size distortions
become moderate when T increases to 50 and 100. Interestingly,
overestimating p in this case increases the size distortions. We can also observe
that the Fisher tests in general have better size performances than the group
mean tests.
Table 6 reports the power performances of the panel stationarity tests in the
presence of serial correlation. The first thing to notice is that the powers
are lower than those in the white noise case for some combinations of N and
T. The powers are around 60% even when N = 100 and T = 25 for the KPSS tests
with $l_2$ and the LMC tests with p = 1, which have relatively moderate size
distortions. The powers are close to 1 when N is larger than 50 and T = 50 for
these two tests (the group mean and Fisher tests). When T = 100, however, all
the powers are still 1 or very close to 1. In such a case, smaller size distortion
would be the primary criterion for deciding which test to use in practice. The
powers of the KPSS tests with $l_2$ and the LMC tests with p = 1 are almost the
same in most cases, though the results for N = 1 indicate that the latter
has an advantage in the univariate case, which agrees with the findings in LMC.
There are almost no differences in the power performances of the group mean
and the Fisher tests.
The size distortions of the panel stationarity tests for the trend stationary
models with serial correlation are presented in Table 7, with size adjusted
powers presented in Table 8. For the size distortions, we have the same
observations as those for the level stationary models. Quite interestingly, the
KPSS tests with $l_2$ have a slight edge over the LMC tests with p = 1 when T = 50,
while the situation is reversed when T = 100. But we observe severe negative
size distortions for the KPSS tests with $l_2$ when T = 25. Except for this case, the
size distortions for these two tests are smaller than the corresponding ones in
the level stationary models. The Fisher tests have relatively better size
performances than the group mean tests, especially when the individual LMC
tests are used. As for the adjusted powers, we only need to report that the
powers are lower than those for the level stationary models, since the results are
otherwise much the same. For the KPSS tests with $l_2$ and the
LMC tests with p = 1, the powers are about 70% even when N = 100 for T = 50,
compared with powers of 1 in the same situation for the level stationary
models. The powers are close to 1 when T = 100 and there are more than 25
cross-section units in the panel.
In summary, through Monte Carlo simulations, we found that the tests we
proposed have quite satisfactory small sample performance in most of the cases we
considered. In the absence of serial correlation, the tests based on the LM tests
Table 6.
Size Adjusted Powers of Panel Stationarity Tests:Level Stationary

Model, Serial Correlation
p(l) = 0
KPSS
l1
l2
l3
LMC
p=1
p=2
p=3
0.153
0.109
0.095
0.079
0.100
0.095
0.089
15
25
50
100
0.249
0.338
0.489
0.754
Group Mean Test

0.234
0.211
0.161
0.319
0.264
0.210
0.478
0.400
0.306
0.724
0.619
0.474
0.207
0.250
0.394
0.588
0.174
0.207
0.329
0.532
0.157
0.204
0.302
0.479
15
25
50
100
0.247
0.337
0.484
0.750
0.228
0.316
0.488
0.729
0.163
0.209
0.301
0.466
0.212
0.248
0.394
0.584
0.171
0.207
0.331
0.534
0.161
0.200
0.304
0.490
0.316
0.242
0.197
0.235
0.198
0.183
15
25
50
100
0.886
0.862
0.998
1.000
0.831
0.939
0.996
1.000
Group Mean Test

0.761
0.656
0.904
0.841
0.992
0.980
1.000
0.999
0.775
0.912
0.993
1.000
0.712
0.854
0.981
1.000
0.643
0.813
0.967
0.999
15
25
50
100
0.885
0.962
1.000
1.000
0.833
0.941
1.000
1.000
0.673
0.844
1.000
1.000
0.774
0.917
1.000
1.000
0.723
0.858
1.000
1.000
0.651
0.812
1.000
1.000
0.530
0.500
0.429
0.524
0.490
0.471
15
25
50
100
1.000
1.000
1.000
1.000
0.999
1.000
1.000
1.000
Group Mean Test

0.999
0.996
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
1.000
0.999
1.000
1.000
1.000
0.998
1.000
1.000
1.000
15
25
50
100
1.000
1.000
1.000
1.000
0.999
1.000
1.000
1.000
1.000
1.000
1.000
1.000
0.999
1.000
1.000
1.000
0.999
1.000
1.000
1.000
N
1
25
50
100
Fisher Test
0.205
0.269
0.412
0.620
0.219
Fisher Test
0.772
0.910
1.000
1.000
0.468
Fisher Test
0.999
1.000
1.000
1.000
0.996
1.000
1.000
1.000
Note:
1. The data generating process is yit = ri0 + it, rit = ri,t 1 + it, it ~ i.i.d.N(0, q2i), it = ii,t 1 + uit,
and uit ~ i.i.d.N(0, (1 2i)2i).
Table 7.
Sizes of Panel Stationarity Tests:Trend Stationary Model, Serial

Correlation
p(l) = 0
KPSS
l1
l2
l3
LMC
p=1
p=2
p=3
0.091
0.067
0.044
0.003
0.056
0.057
0.066
15
25
50
100
0.657
0.808
0.975
0.999
Group Mean Test

0.252
0.016
0.007
0.314
0.001
0.004
0.495
0.005
0.000
0.723
0.001
0.000
0.144
0.151
0.183
0.223
0.156
0.181
0.267
0.377
0.142
0.174
0.245
0.338
15
25
50
100
0.610
0.775
0.966
0.999
0.226
0.292
0.459
0.700
0.003
0.002
0.000
0.000
0.108
0.121
0.149
0.185
0.134
0.158
0.239
0.362
0.127
0.157
0.231
0.333
0.094
0.060
0.040
0.057
0.062
0.061
15
25
50
100
0.758
0.931
0.991
1.000
0.177
0.252
0.332
0.524
Group Mean Test

0.058
0.019
0.053
0.011
0.048
0.006
0.048
0.001
0.079
0.091
0.092
0.091
0.134
0.160
0.189
0.251
0.160
0.194
0.237
0.341
15
25
50
100
0.717
0.913
0.988
1.000
0.155
0.224
0.305
0.500
0.017
0.010
0.007
0.002
0.060
0.066
0.072
0.067
0.096
0.118
0.138
0.189
0.120
0.159
0.198
0.297
0.092
0.053
0.041
0.049
0.048
0.054
15
25
50
100
0.789
0.928
0.998
1.000
0.138
0.171
0.259
0.377
Group Mean Test

0.059
0.042
0.061
0.039
0.069
0.032
0.066
0.026
0.062
0.056
0.053
0.051
0.076
0.076
0.075
0.074
0.098
0.101
0.115
0.133
15
25
50
100
0.752
0.911
0.997
1.000
0.114
0.148
0.236
0.354
0.046
0.043
0.046
0.046
0.052
0.057
0.055
0.051
0.064
0.077
0.081
0.091
N
1
25
50
100
Fisher Test
0.014
0.012
0.005
0.001
0.051
Fisher Test
0.049
0.050
0.050
0.056
0.044
Fisher Test
0.052
0.054
0.062
0.063
0.039
0.036
0.031
0.027
Note:
1. The data generating process is yit = rit + it + it, it = ii,t 1 + uit, and uit ~ i.i.d.N(0, (1 2i)2i).
Table 8.
Size Adjusted Powers of Panel Stationarity Tests:Trend Stationary

Model, Serial Correlation
p(l) = 0
KPSS
l1
l2
l3
LMC
p=1
p=2
p=3
0.065
0.060
0.051
0.044
0.064
0.065
0.054
15
25
50
100
0.055
0.088
0.130
0.203
Group Mean Test

0.068
0.076
0.054
0.089
0.076
0.044
0.122
0.088
0.037
0.185
0.106
0.036
0.059
0.072
0.086
0.120
0.066
0.078
0.100
0.122
0.064
0.071
0.090
0.090
15
25
50
100
0.052
0.088
0.132
0.203
0.067
0.085
0.119
0.178
0.053
0.042
0.037
0.032
0.056
0.072
0.091
0.119
0.061
0.077
0.098
0.121
0.066
0.076
0.090
0.089
0.123
0.109
0.087
0.100
0.090
0.086
15
25
50
100
0.389
0.381
0.693
0.905
0.311
0.381
0.603
0.827
Group Mean Test

0.240
0.139
0.324
0.213
0.489
0.274
0.699
0.391
0.240
0.302
0.437
0.674
0.207
0.238
0.367
0.548
0.186
0.190
0.324
0.478
15
25
50
100
0.389
0.377
0.696
0.903
0.312
0.384
0.608
0.829
0.140
0.216
0.270
0.403
0.234
0.312
0.444
0.680
0.203
0.247
0.374
0.554
0.189
0.199
0.333
0.502
0.302
0.264
0.208
0.273
0.236
0.200
15
25
50
100
0.935
0.993
1.000
1.000
0.908
0.984
1.000
1.000
Group Mean Test

0.853
0.767
0.958
0.909
0.998
0.993
1.000
1.000
0.881
0.976
1.000
1.000
0.816
0.930
0.995
1.000
0.707
0.855
0.981
1.000
15
25
50
100
0.934
0.993
1.000
1.000
0.902
0.984
1.000
1.000
0.886
0.974
1.000
1.000
0.823
0.939
0.997
1.000
0.735
0.869
0.985
1.000
N
1
25
50
100
Fisher Test
0.072
0.078
0.087
0.106
0.097
Fisher Test
0.252
0.330
0.481
0.694
0.235
Fisher Test
0.849
0.955
0.998
1.000
0.754
0.882
0.990
1.000
Note:
1. The data generating process is yit = rit + it + it, rit = ri,t 1 + it, it ~ i.i.d.N(0, q2i), it =
ii,t 1 + uit, and uit ~ i.i.d.N(0, (1 2i)2i)
have sizes close to the nominal size and powers much higher than the univariate
LM tests. Using the KPSS and LMC tests in this case would not result in much
size distortion, but would result in power losses for some combinations of N
and T, while the powers are already 1 or close to 1 for other combinations of
N and T. In the presence of serial correlation, we found that the tests based on
the KPSS tests with l2 and the LMC tests with p = 1 have relatively good size
performance, though there are still moderate to severe size distortions when the
time span is short (T = 25), especially for the trend stationary models. The
powers of all tests are lower than their counterparts in the white noise case.
Overall, the Fisher tests have better size performances than the group mean
tests while their power performances are almost the same.
IV. CONCLUSION
In this chapter, we developed several tests for stationarity in heterogeneous
panels. We analyzed both level stationary and trend stationary models.
Allowing for the maximum degree of heterogeneity in the panel, we considered two
different ways to pool the information about the null hypothesis from the
cross-section units: the group mean test and the Fisher test. The group
mean test pools the univariate test statistics, while the Fisher
test combines the p-values of the individual tests. For the univariate
stationarity tests, we considered the KPSS and LMC tests in the case of serial
correlation. The group mean tests based on the KPSS and LMC tests are
asymptotically normal, while the Fisher test statistics follow $\chi^2$ distributions.
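The two pooling schemes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cross-section units are taken to be independent, rejection is in the upper tail, the Fisher statistic is the usual $-2\sum_i \ln p_i$ referred to a $\chi^2$ distribution with 2N degrees of freedom, and the function names as well as the inputs `mu` and `sigma` (the mean and standard deviation of the individual statistic under the null) are hypothetical placeholders rather than values taken from this chapter.

```python
import math


def fisher_test(p_values):
    """Fisher statistic -2 * sum(ln p_i) and its chi-square(2N) upper-tail p-value."""
    n = len(p_values)
    stat = -2.0 * sum(math.log(p) for p in p_values)
    # For even degrees of freedom 2N the chi-square tail has a closed form:
    # P(X > x) = exp(-x/2) * sum_{k=0}^{N-1} (x/2)^k / k!
    half = stat / 2.0
    term, tail = 1.0, 0.0
    for k in range(n):
        tail += term
        term *= half / (k + 1)
    return stat, math.exp(-half) * tail


def group_mean_test(stats, mu, sigma):
    """Standardized average of N individual statistics with a normal upper-tail p-value.

    mu and sigma are assumed inputs: the null mean and standard deviation
    of a single individual statistic.
    """
    n = len(stats)
    z = math.sqrt(n) * (sum(stats) / n - mu) / sigma
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return z, p_value
```

With a single unit the Fisher p-value reduces to the individual p-value, which is a convenient sanity check on the tail formula.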
The small sample performance of the tests was investigated via Monte
Carlo simulation experiments. The simulation results showed that the tests
we proposed have quite satisfactory size and power. In general,
the Fisher type tests have better size than the group mean type
tests, while their power is similar. The tests based on the KPSS
tests with l2 and the LMC tests with p = 1 perform very similarly in terms of
size and power in most cases when there is serial correlation, except for the
short time span (T = 25). The size of these two tests is quite
good in the presence of serial correlation when T = 50 and 100. However, there
are still moderate to severe size distortions when T = 25 in the presence of serial
correlation. In such a case, a bootstrap method might be an effective way to
obtain better size. This would be an interesting topic for future
research. Based on our simulation results, we recommend using
either the group mean tests or the Fisher tests, based on the
KPSS tests with l2 or the LMC tests with p = 1, to test for stationarity in
heterogeneous panel data models in empirical work.
ACKNOWLEDGMENTS
We would like to thank Badi Baltagi and three anonymous referees for their
helpful comments. Of course, all remaining errors are ours.
NOTES
1. See, for example, Schwert (1987).
2. See KPSS for all relevant references and derivations of the tests.
3. Please see LMC for the details of this argument. Of course, this supremacy
depends on the correct specification of the LMC model, as pointed out by one
anonymous referee.
4. This means that the intercepts in different cross-section units can be different, one
aspect of the heterogeneous panel.
5. The moment restriction in applying the Lindeberg-Lévy CLT should not be a
problem here because all tests are variants of the LM tests, which are bounded.
6. The small sample distributions of these tests can be derived by simulating series
of given T under the null and applying the given test to the simulated series over a prespecified number of iterations.
7. In a recent paper, Choi (2000) proposes to standardize the Fisher test statistics as
well. But this is unnecessary unless N is large enough.
8. Please see Maddala & Wu (1999) for a detailed comparison between the group
mean and the Fisher tests.
9. By construction of the tests, the $q_i$'s can be different across the units.
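The simulation procedure described in note 6 can be sketched as follows. This is a hedged illustration, not the code used for the tables: it assumes the level stationary case with i.i.d. standard normal data under the null, uses the standard KPSS statistic with a Bartlett-kernel long-run variance, and the function names and the `lags`, `reps` and `seed` arguments are hypothetical.

```python
import random


def kpss_level_stat(y, lags=0):
    """KPSS statistic for level stationarity with Bartlett lag truncation `lags`."""
    T = len(y)
    mean = sum(y) / T
    e = [v - mean for v in y]              # residuals from a regression on a constant
    running, partial_sums = 0.0, []
    for v in e:
        running += v
        partial_sums.append(running)
    # Long-run variance: sample variance plus Bartlett-weighted autocovariances.
    lrv = sum(v * v for v in e) / T
    for k in range(1, lags + 1):
        weight = 1.0 - k / (lags + 1.0)
        gamma_k = sum(e[t] * e[t - k] for t in range(k, T)) / T
        lrv += 2.0 * weight * gamma_k
    return sum(s * s for s in partial_sums) / (T * T * lrv)


def simulated_critical_value(T, reps=2000, level=0.05, lags=0, seed=0):
    """Upper-tail critical value of the statistic for a given T, obtained by simulation."""
    rng = random.Random(seed)
    stats = sorted(
        kpss_level_stat([rng.gauss(0.0, 1.0) for _ in range(T)], lags)
        for _ in range(reps)
    )
    index = min(reps - 1, int((1.0 - level) * reps))
    return stats[index]
```

The same loop yields simulated p-values for the Fisher test if, instead of a single quantile, one records the share of simulated statistics exceeding the observed one.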
REFERENCES
Anderson, T. W., & Darling, D. A. (1952). Asymptotic Theory of Certain Goodness of Fit
Criteria Based on Stochastic Processes. Annals of Mathematical Statistics, 23, 193–212.
Baltagi, B., & Kao, C. (2000). Nonstationary Panels, Cointegration in Panels and Dynamic Panels:
A Survey. Advances in Econometrics, 15, 7–51.
Choi, I. (1999). Unit Root Tests for Panel Data. Manuscript, Kookmin University.
Fisher, R. A. (1932). Statistical Methods for Research Workers (4th ed.). Edinburgh: Oliver and
Boyd.
Hadri, K. (1998). Testing for Stationarity in Heterogeneous Panel Data. Working paper, School of
Business and Economics, Exeter University.
Im, K. S., Pesaran, M. H., & Shin, Y. (1997). Testing for Unit Roots in Heterogeneous Panels.
Discussion paper, University of Cambridge.
Kao, C. (1999). Spurious Regression and Residual-Based Tests for Cointegration in Panel Data.
Kwiatkowski, D., Phillips, P. C. B., Schmidt, P., & Shin, Y. (1992). Testing the Null Hypothesis
of Stationarity Against the Alternative of a Unit Root. Journal of Econometrics, 54,
91115.
Leybourne, S. J., & McCabe, B. P. M. (1994). A Consistent Test for a Unit Root. Journal of
a New Simple Test. Oxford Bulletin of Economics and Statistics, forthcoming.
McCoskey, S., & Kao, C. (1997). A Monte Carlo Comparison of Tests for Cointegration in Panel
Data. Working paper, Center for Policy Research and Department of Economics, Syracuse
University.
Newey, W. K., & West, K. D. (1987). A Simple Positive Semi-Definite Heteroskedasticity and
Autocorrelation Consistent Covariance Matrix. Econometrica, 55, 703–708.
Pedroni, P. (1995). Panel Cointegration: Asymptotic and Finite Sample Properties of Pooled Time
Series Tests With an Application to the PPP Hypothesis. Working paper, Department of
Economics, Indiana University.
Pedroni, P. (1997). Panel Cointegration: Asymptotic and Finite Sample Properties of Pooled Time
Series Tests With an Application to the PPP Hypothesis, New Results. Working paper,
Phillips, P. C. B., & Perron, P. (1988). Testing For a Unit Root in Time Series Regression.
Biometrika, 75, 335–346.
Wu, S., & Yin, Y. (1999). Tests for Cointegration in Heterogeneous Panel: A Monte Carlo
Comparison. Working paper, Department of Economics, State University of New York at
Buffalo.
INSTRUMENTAL VARIABLE
ESTIMATION OF SEMIPARAMETRIC
DYNAMIC PANEL DATA MODELS:
MONTE CARLO RESULTS ON
SEVERAL NEW AND EXISTING
ESTIMATORS
M. Douglas Berg, Qi Li and Aman Ullah
ABSTRACT
We consider the problem of instrumental variable estimation of semiparametric dynamic panel data models. We propose several new
semiparametric instrumental variable estimators for estimating a dynamic
panel data model. Monte Carlo experiments show that the new estimators
perform much better than the estimators suggested by Li & Stengos (1996)
and Li & Ullah (1998).
I. INTRODUCTION
Economic research has been enriched by the availability of panel data that
measure individual cross-sectional behavior over time. For reviews on the
literature of estimation and inference in parametric panel data models, see
Baltagi (1995), Chamberlain (1984), Hsiao (1986) and Matyas & Sevestre
2000 by Elsevier Science Inc.
ISBN: 0-7623-0688-2
(1996). Recently, semiparametric modeling and estimation have attracted much
attention among statisticians and econometricians. One popular semiparametric
model is the partially linear model. In this chapter we consider the problem of
estimating a semiparametric dynamic panel data model which includes the
following model as a special case:
\[ y_{it} = \gamma y_{it-1} + \theta(z_{it}) + u_{it}, \tag{1.1} \]
where the functional form of $\theta(\cdot)$ is unspecified. Therefore (1.1) is a
semiparametric dynamic panel data model. When $\theta(\cdot)$ has a known form, say
$\theta(z_{it}) = z_{it}'\beta$, we obtain a parametric dynamic panel data model:
\[ y_{it} = \gamma y_{it-1} + z_{it}'\beta + u_{it}. \tag{1.2} \]
When the error $u_{it}$ has a one-way error component structure, i.e. $u_{it} = \mu_i + \nu_{it}$,
then $y_{it-1}$ and $u_{it}$ are correlated and instrumental variable methods are needed
to obtain consistent estimation of $\gamma$.
There is a rich literature on how to obtain consistent and efficient estimation
results for parametric dynamic models, see Ahn & Schmidt (1995), Anderson
& Hsiao (1981), Arellano & Bover (1995), Baltagi & Griffin (1998), Pesaran
& Smith (1995) and Kiviet (1995), among others. The consistent and efficient
estimation results for the parametric dynamic panel data model (1.2) depend
crucially on the correct specification of the model. If $\theta(z_{it}) \neq z_{it}'\beta$, parametric
estimation methods based on a misspecified model (1.2) will in general lead to
inconsistent estimation of $\gamma$.
Semiparametric partially linear models have the advantage of not specifying
the functional form of $\theta(\cdot)$. Hence a consistent semiparametric estimator of $\gamma$
based on (1.1) is robust to functional form specification of $\theta(\cdot)$. There is a rich
literature on estimating a partially linear model with independent data using
various non-parametric techniques, e.g. Engle et al. (1986), Robinson (1988),
Stock (1989), Donald & Newey (1994), Li (1996). Also, see Ullah & Roy
(1998), Ullah & Mundra (1998), and Khanna et al. (1999) for the estimation
and applications of static partially linear panel data models. However, little
attention has been paid to dynamic partially linear panel data models. Although
Li & Stengos (1996) and Li & Ullah (1998) discussed how to estimate model
(1.1) by semiparametric instrumental variable methods, no simulations are
reported in those works and hence the finite sample performance of the
estimators proposed in Li & Stengos (1996) and Li & Ullah (1998) is
unknown.$^{1}$
Li & Stengos (1996) proposed a semiparametric OLS-type IV (IV-OLS)
estimator for estimating $\gamma$. When the error follows a one-way error
components structure, the OLS-type estimator is not efficient because it
ignores this error structure. Li & Ullah (1998) therefore proposed a
semiparametric GLS-type IV (IV-GLS) estimator. However, the IV-GLS
estimator in Li & Ullah (1998) did not make full use of the one-way error
component structure. In fact, when the model is just identified, their
semiparametric IV-GLS estimator reduces to a semiparametric IV-OLS
estimator and hence it is inefficient in the sense that the one-way error
component structure is not utilized in constructing the estimator. In this chapter
we propose a new semiparametric IV-GLS estimator and a new semiparametric
IV-Within estimator that are more efficient than the ones considered in
Li & Stengos (1996) and Li & Ullah (1998). We then use Monte Carlo
experiments to examine the finite sample performance of the new semiparametric
estimators and some existing estimators (e.g. Li & Stengos (1996) and
Li & Ullah (1998)). Our simulation results show that the new estimators
perform substantially better than the existing ones.
The chapter is organized as follows. Section 2 first reviews the semiparametric estimators of Li & Stengos (1996), and Li & Ullah (1998). We then
propose some new estimators. Section 3 reports Monte Carlo simulations to
compare the relative performance of the various estimators. Finally, Section 4
concludes the chapter.
II. THE MODEL

We consider a slightly more general semiparametric dynamic panel data model
than (1.1) considered in the introduction:
\[ y_{it} = x_{it}'\gamma + \theta(z_{it}) + u_{it}, \qquad (i = 1, \ldots, N;\ t = 1, \ldots, T), \tag{2.1} \]
where $x_{it}$ is of dimension $p \times 1$, $\gamma$ is a $p \times 1$ unknown parameter, $z_{it}$ is of
dimension $d$, and $\theta(\cdot)$ is an unknown smooth function. We assume that the first
element of $x_{it}$ is $y_{it-1}$ so that model (2.1) is a semiparametric dynamic panel data
model. We are mainly interested in obtaining accurate estimation of $\gamma$.
We consider the case that the error $u_{it}$ follows a one-way error components
specification,
\[ u_{it} = \mu_i + \nu_{it}, \tag{2.2} \]
where $\mu_i$ is i.i.d. $(0, \sigma_\mu^2)$, $\nu_{it}$ is i.i.d. $(0, \sigma_\nu^2)$, and $\mu_i$ and $\nu_{jt}$ are uncorrelated for all $i$
and $jt$.
In this chapter we propose a new semiparametric IV-GLS estimator that
fully uses the one-way error component structure. We also propose a
semiparametric IV-within-transformation estimator, which has the advantage of
computational simplicity because it does not require one to estimate the
variance components. We then employ Monte Carlo simulations to investigate
the finite sample performance of our proposed semiparametric IV estimators
and compare them with some existing estimators.
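GLS-type estimators require knowledge of the error variance structure. In vector
notation, the one-way error component model of (2.2) has the following form,
\[ u = (I_N \otimes e_T)\mu + \nu, \tag{2.3} \]
where $e_T$ is a column of ones of dimension $T$, $\mu = (\mu_1, \mu_2, \ldots, \mu_N)'$ is of
dimension $N \times 1$, $u$ and $\nu$ are both of dimension $NT \times 1$ with
$u = (u_{11}, \ldots, u_{1T}, \ldots, u_{N1}, \ldots, u_{NT})'$ and $\nu$ is similarly defined. The covariance matrix of $u$ is
\[ \Omega = E(uu') = \sigma_\mu^2 (I_N \otimes J_T) + \sigma_\nu^2 I_{NT} \tag{2.4} \]
\[ \phantom{\Omega} = I_N \otimes [\sigma_1^2 \bar{J}_T + \sigma_\nu^2 E_T] \equiv I_N \otimes \Sigma, \tag{2.5} \]
where $J_T = e_T e_T'$ is a $T \times T$ matrix with all elements equal to one, $\bar{J}_T = J_T/T$,
$E_T = I_T - \bar{J}_T$ and $\sigma_1^2 = T\sigma_\mu^2 + \sigma_\nu^2$. By noting the facts that $\bar{J}_T E_T = 0$, $\bar{J}_T + E_T = I_T$, and
both $\bar{J}_T$ and $E_T$ are idempotent matrices, it is easy to check that the inverse of
$\Omega$ is given by$^{2}$
\[ \Omega^{-1} = I_N \otimes [(1/\sigma_1^2)\bar{J}_T + (1/\sigma_\nu^2)E_T] \equiv I_N \otimes \Sigma^{-1}, \tag{2.6} \]
and
\[ \Omega^{-1/2} = I_N \otimes [(1/\sigma_1)\bar{J}_T + (1/\sigma_\nu)E_T] \equiv I_N \otimes \Sigma^{-1/2}. \tag{2.7} \]
The above expressions for $\Omega^{-1}$ and $\Omega^{-1/2}$ will be used in the GLS estimation
procedure discussed below.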
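The inverse formulas can be checked numerically on a single $T \times T$ block, since the covariance matrix is block diagonal with identical blocks. The sketch below is illustrative only; the helper names are hypothetical and the variance values used in the check are arbitrary.

```python
def sigma_power(T, s2_mu, s2_nu, power=1.0):
    """Return Sigma**power for Sigma = sigma_1^2 * Jbar_T + sigma_nu^2 * E_T.

    Because Jbar_T and E_T are orthogonal idempotent matrices summing to I_T,
    any power of Sigma is obtained by raising the two scalar weights to that power.
    """
    s2_1 = T * s2_mu + s2_nu
    a, b = s2_1 ** power, s2_nu ** power
    return [
        [a / T + b * ((1.0 if r == c else 0.0) - 1.0 / T) for c in range(T)]
        for r in range(T)
    ]


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


def max_abs_diff(A, B):
    """Largest absolute elementwise difference between two matrices."""
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))
```

Multiplying `sigma_power(T, s2_mu, s2_nu, 1.0)` by `sigma_power(T, s2_mu, s2_nu, -1.0)` should give the identity matrix up to rounding, and squaring the power -0.5 matrix should reproduce the inverse.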
A. Some Infeasible Estimators
Equation (2.1) contains an unknown function $\theta(\cdot)$. Following Robinson (1988),
we first eliminate $\theta(\cdot)$. Taking the conditional expectation of (2.1) given
$z_{it}$ and then subtracting it from (2.1) leads to
\[ y_{it} - E(y_{it}|z_{it}) = (x_{it} - E(x_{it}|z_{it}))'\gamma + u_{it} \equiv v_{it}'\gamma + u_{it}, \tag{2.8} \]
where $v_{it} \overset{\text{def}}{=} x_{it} - E(x_{it}|z_{it})$. In vector-matrix notation we have
\[ y - E(y|z) = v\gamma + u, \tag{2.9} \]
where $y$, $E(y|z)$ and $u$ are all $NT \times 1$ vectors with typical elements given by $y_{it}$,
$E(y_{it}|z_{it})$ and $u_{it}$, respectively, and $v$ is of dimension $NT \times p$ with typical row
given by $v_{it}' = (x_{it} - E(x_{it}|z_{it}))'$.
Equation (2.9) no longer contains the unknown function $\theta(\cdot)$. Note that $v_{it}$
and $u_{it}$ are correlated because $v_{it}$ contains $y_{it-1}$ and $u_{it}$ contains the random
individual effects $\mu_i$. Suppose there exists a $q \times 1$ ($q \geq p$) instrumental variable $\eta_{it}$
that is correlated with $x_{it}$ and uncorrelated with $u_{it}$; then we can use
$w_{it} \overset{\text{def}}{=} \eta_{it} - E(\eta_{it}|z_{it})$ as an IV for $v_{it}$. For example, consider a simple case where both
$x_{it}$ and $z_{it}$ are scalars with $x_{it} = y_{it-1}$ and $z_{it}$ strictly exogenous; then one can
choose $\eta_{it} = z_{it-1}$ as an instrument for $y_{it-1}$.
In vector-matrix notation, an (infeasible) IV-OLS estimator of $\gamma$ based on
(2.9) is (see White (1984, 1987) for a discussion on IV estimation)
\[ \bar{\gamma}_{IVO} = (v'ww'v)^{-1}v'ww'(y - E(y|z)) = \gamma + (v'ww'v)^{-1}v'ww'u. \tag{2.10} \]
When the model is just identified, i.e. $p = q$, and if we assume that $w'v$ is
invertible, then $\bar{\gamma}_{IVO}$ becomes
\[ \bar{\gamma}_{IVO} = (w'v)^{-1}(v'w)^{-1}v'ww'(y - E(y|z)) = (w'v)^{-1}w'(y - E(y|z)). \tag{2.11} \]
The above IV-OLS estimator is not efficient because it ignores the error
component variance structure. Li & Ullah (1998) suggested estimating $\gamma$ by
\[ \bar{\gamma} = (v'w(w'\Omega^{-1}w)^{-1}w'v)^{-1}\, v'w(w'\Omega^{-1}w)^{-1}w'(y - E(y|z)). \tag{2.12} \]
However, when $q = p$ and if we assume that the square matrices $v'w$ and
$w'\Omega^{-1}w$ are both invertible, then we have from (2.12)
\[ \bar{\gamma} = (w'v)^{-1}(w'\Omega^{-1}w)(v'w)^{-1}v'w(w'\Omega^{-1}w)^{-1}w'(y - E(y|z)) = (w'v)^{-1}w'(y - E(y|z)) = \bar{\gamma}_{IVO}, \]
that is, $\bar{\gamma}$ reduces to the IV-OLS estimator of (2.11) when the model is just
identified. Therefore, the IV estimator $\bar{\gamma}$ also ignores the variance component
structure when the model is just identified.
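A new IV-GLS estimator that fully uses the one-way error component
structure is given by
\[ \bar{\gamma}_{IVG} = (v'\Omega^{-1}w(w'\Omega^{-1}w)^{-1}w'\Omega^{-1}v)^{-1}v'\Omega^{-1}w(w'\Omega^{-1}w)^{-1}w'\Omega^{-1}(y - E(y|z)) \]
\[ \phantom{\bar{\gamma}_{IVG}} = \gamma + (v'\Omega^{-1}w(w'\Omega^{-1}w)^{-1}w'\Omega^{-1}v)^{-1}v'\Omega^{-1}w(w'\Omega^{-1}w)^{-1}w'\Omega^{-1}u. \tag{2.13} \]
$\bar{\gamma}_{IVG}$ of (2.13) is an optimal IV estimator as discussed in White (1984, 1987).
When the model is just identified, i.e. $p = q$, and if we assume that both
$w'\Omega^{-1}v$ and $w'\Omega^{-1}w$ are invertible, then $\bar{\gamma}_{IVG}$ of (2.13) becomes
\[ \bar{\gamma}_{IVG} = (w'\Omega^{-1}v)^{-1}(w'\Omega^{-1}w)(v'\Omega^{-1}w)^{-1}v'\Omega^{-1}w(w'\Omega^{-1}w)^{-1}w'\Omega^{-1}(y - E(y|z)) = (w'\Omega^{-1}v)^{-1}w'\Omega^{-1}(y - E(y|z)), \tag{2.14} \]
which is different from $\bar{\gamma}_{IVO}$ of (2.11). Note that one can transform the model
by premultiplying $y$, $v$ and $w$ by $\Omega^{-1/2}$. Denote $y^* = \Omega^{-1/2}y$, $v^* = \Omega^{-1/2}v$ and
$w^* = \Omega^{-1/2}w$; then the IV-GLS estimator of (2.13) is simply
\[ \bar{\gamma}_{IVG} = (w^{*\prime}v^*)^{-1}(w^{*\prime}w^*)(v^{*\prime}w^*)^{-1}v^{*\prime}w^*(w^{*\prime}w^*)^{-1}w^{*\prime}(y^* - E(y^*|z)), \tag{2.15} \]
which is easier to compute since it does not require one to invert an $NT \times NT$
matrix.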
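The just-identified algebra can be illustrated with scalar $x_{it}$ and a scalar instrument ($p = q = 1$), where every cross product is a number. The following Python sketch is an assumption-laden illustration: `r` stands for $y - E(y|z)$ and is treated as known (the infeasible setting), the data are stacked by individual, the variance components are supplied by the user, and all function names are hypothetical.

```python
def omega_inv_times(x, N, T, s2_mu, s2_nu):
    """Apply Omega^{-1} = I_N (x) [(1/sigma_1^2) Jbar_T + (1/sigma_nu^2) E_T] to a stacked vector."""
    s2_1 = T * s2_mu + s2_nu
    out = []
    for i in range(N):
        block = x[i * T:(i + 1) * T]
        m = sum(block) / T
        # Jbar_T keeps the individual mean, E_T keeps deviations from it.
        out.extend(m / s2_1 + (b - m) / s2_nu for b in block)
    return out


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def iv_ols(v, w, r):
    """Just-identified IV-OLS, equation (2.11)."""
    return dot(w, r) / dot(w, v)


def iv_li_ullah(v, w, r, N, T, s2_mu, s2_nu):
    """Estimator (2.12) written out term by term for scalar v and w."""
    w_oi_w = dot(w, omega_inv_times(w, N, T, s2_mu, s2_nu))
    vw = dot(v, w)
    return (vw * dot(w, r) / w_oi_w) / (vw * vw / w_oi_w)


def iv_gls(v, w, r, N, T, s2_mu, s2_nu):
    """Just-identified IV-GLS, equation (2.14)."""
    oi_w = omega_inv_times(w, N, T, s2_mu, s2_nu)
    return dot(oi_w, r) / dot(oi_w, v)
```

On arbitrary data `iv_li_ullah` coincides with `iv_ols`, while `iv_gls` generally gives a different value, which is the point made in the text about the just-identified case.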
Let $n = NT$; then under the conditions (i) $w'u/n \overset{p}{\to} 0$ ($w$ is a legitimate IV),
(ii) $v'\Omega^{-1}w/n \overset{p}{\to} A$, and (iii) $w'\Omega^{-1}w/n \overset{p}{\to} B$, a positive definite matrix, one can
show that
\[ \sqrt{n}(\bar{\gamma}_{IVG} - \gamma) \overset{d}{\to} N(0, (AB^{-1}A')^{-1}). \tag{2.16} \]
The proof of (2.16) is similar to the proof of Lemma 3 of Li & Ullah (1998) and
is therefore omitted here.
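Next we propose a simple IV estimator based on the within transformation.
The within-type estimator has the advantage of being computationally simple: it only
requires least squares regression on the within-transformed variables. Define
$\xi_{it} = E(y_{it}|z_{it})$ and define the within-transformed variables $\tilde{y}_{it} = y_{it} - \bar{y}_{i\cdot}$,
$\tilde{\xi}_{it} = \xi_{it} - \bar{\xi}_{i\cdot}$, $\tilde{v}_{it} = v_{it} - \bar{v}_{i\cdot}$ and $\tilde{w}_{it} = w_{it} - \bar{w}_{i\cdot}$,
where $\bar{y}_{i\cdot} = \sum_{s=1}^{T} y_{is}/T$, and $\bar{\xi}_{i\cdot}$, $\bar{v}_{i\cdot}$ and $\bar{w}_{i\cdot}$
are similarly defined. The IV-Within estimator is given by
\[ \bar{\gamma}_{IVW} = (\tilde{v}'\tilde{w}\tilde{w}'\tilde{v})^{-1}\tilde{v}'\tilde{w}\tilde{w}'(\tilde{y} - \tilde{\xi}). \tag{2.17} \]
When the model is just identified, we have
\[ \bar{\gamma}_{IVW} = (\tilde{w}'\tilde{v})^{-1}(\tilde{v}'\tilde{w})^{-1}\tilde{v}'\tilde{w}\tilde{w}'(\tilde{y} - \tilde{\xi}) = (\tilde{w}'\tilde{v})^{-1}\tilde{w}'(\tilde{y} - \tilde{\xi}). \tag{2.18} \]
The within-type estimator has the advantage of being computationally simple
because it does not require one to estimate the error variance $\Omega$.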
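A minimal Python sketch of the within transformation and the just-identified estimator (2.18) for scalar variables follows. It is an illustration only: `r` stands for $y - \xi$ and is treated as known, data are stacked by individual, and the function names are hypothetical.

```python
def within_transform(x, N, T):
    """Subtract each individual's time average from its T observations."""
    out = []
    for i in range(N):
        block = x[i * T:(i + 1) * T]
        m = sum(block) / T
        out.extend(b - m for b in block)
    return out


def iv_within(v, w, r, N, T):
    """Just-identified IV-Within estimator (2.18) for scalar v and w."""
    vt = within_transform(v, N, T)
    wt = within_transform(w, N, T)
    rt = within_transform(r, N, T)
    num = sum(a * b for a, b in zip(wt, rt))
    den = sum(a * b for a, b in zip(wt, vt))
    return num / den
```

Because the transformation removes anything that is constant over time for an individual, adding an arbitrary individual effect to `r` leaves the estimate unchanged, which is why no variance components are needed.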
B. Feasible Estimators
The estimators $\bar{\gamma}_{IVO}$, $\bar{\gamma}_{IVG}$ and $\bar{\gamma}_{IVW}$ discussed above are not feasible, because the
conditional mean functions $E(y|z)$, $E(x|z)$ and $E(w|z)$, as well as $\Omega$, are unknown.
Feasible estimators can be obtained by replacing the unknown conditional
mean functions by their nonparametric estimators, such as nonparametric
kernel estimators, and replacing $\sigma_1^2$ and $\sigma_\nu^2$ by consistent estimators.
Following Robinson (1988), we use a kernel estimation method to estimate
the unknown conditional expectations. Specifically, we denote the kernel
estimators of $f(z_{it})$, $E(y_{it}|z_{it})$, $E(x_{it}|z_{it})$, $E(w_{it}|z_{it})$ by $\hat{f}_{it}$, $\hat{y}_{it}$, $\hat{x}_{it}$ and $\hat{w}_{it}$, respectively,
where
\[ \hat{f}_{it} = \frac{1}{NTh^d}\sum_{j}\sum_{s} K_{it,js}, \tag{2.19} \]
\[ \hat{y}_{it} = \frac{1}{NTh^d}\sum_{j}\sum_{s} y_{js}K_{it,js}/\hat{f}_{it}, \tag{2.20} \]
\[ \hat{x}_{it} = \frac{1}{NTh^d}\sum_{j}\sum_{s} x_{js}K_{it,js}/\hat{f}_{it}, \tag{2.21} \]
and
\[ \hat{w}_{it} = \frac{1}{NTh^d}\sum_{j}\sum_{s} w_{js}K_{it,js}/\hat{f}_{it}, \tag{2.22} \]
where $K_{it,js} = K((z_{it} - z_{js})/h)$, $K(\cdot)$ is the kernel function and $h$ is the smoothing
parameter.
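The kernel estimators (2.19) and (2.20) can be sketched for a scalar $z$ ($d = 1$) as below. The Gaussian kernel is an assumption made for the illustration, since the text does not fix a particular $K(\cdot)$; the observations are pooled over $i$ and $t$ into one list of length $NT$, and the function names are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as an assumed choice of K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_density(z, h):
    """Kernel density estimate of f(z) at every pooled observation, as in (2.19)."""
    n = len(z)
    return [sum(gaussian_kernel((zi - zj) / h) for zj in z) / (n * h) for zi in z]


def kernel_regression(y, z, h):
    """Kernel estimate of E(y | z) at every pooled observation, as in (2.20)."""
    n = len(z)
    f_hat = kernel_density(z, h)
    return [
        sum(yj * gaussian_kernel((zi - zj) / h) for yj, zj in zip(y, z)) / (n * h) / fi
        for zi, fi in zip(z, f_hat)
    ]
```

The same function applied to $x$ or to the instrument gives the estimators in (2.21) and (2.22); the residuals `y[k] - kernel_regression(y, z, h)[k]` are the building blocks of the feasible estimators.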
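Note that when $x_{it} = y_{it-1}$, we have
\[ \hat{x}_{it} = \hat{E}(y_{it-1}|z_{it}) = (NTh^d)^{-1}\sum_{j}\sum_{s} y_{js-1}K_{it,js}/\hat{f}_{it}, \tag{2.23} \]
which is different from $\hat{y}_{it-1} = \hat{E}(y_{it-1}|z_{it-1}) = (NTh^d)^{-1}\sum_{j}\sum_{s} y_{js-1}K_{it-1,js-1}/\hat{f}_{it-1}$.
We estimate $v_{it} \equiv x_{it} - E(x_{it}|z_{it})$ by $x_{it} - \hat{x}_{it}$ and we estimate $w_{it} \equiv \eta_{it} - E(\eta_{it}|z_{it})$
by $\eta_{it} - \hat{\eta}_{it}$, where
\[ \hat{\eta}_{it} = (NTh^d)^{-1}\sum_{j}\sum_{s} \eta_{js}K_{it,js}/\hat{f}_{it} \tag{2.24} \]
is the kernel estimator of $E(\eta_{it}|z_{it})$.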
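In vector-matrix notation, the feasible IV-OLS estimator of $\gamma$ is obtained
from (2.10) by replacing $E(y_{it}|z_{it})$, $v_{it} = x_{it} - E(x_{it}|z_{it})$ and $w_{it} = \eta_{it} - E(\eta_{it}|z_{it})$ by
their kernel estimators $\hat{y}_{it}$, $x_{it} - \hat{x}_{it}$ and $\eta_{it} - \hat{\eta}_{it}$, respectively,
\[ \hat{\gamma}_{IVO} = [(x - \hat{x})'(\eta - \hat{\eta})(\eta - \hat{\eta})'(x - \hat{x})]^{-1}(x - \hat{x})'(\eta - \hat{\eta})(\eta - \hat{\eta})'(y - \hat{y}). \tag{2.25} \]
Similarly, we have
\[ \hat{\gamma}_{IVG} = \{(x - \hat{x})'\hat{\Omega}^{-1}(\eta - \hat{\eta})[(\eta - \hat{\eta})'\hat{\Omega}^{-1}(\eta - \hat{\eta})]^{-1}(\eta - \hat{\eta})'\hat{\Omega}^{-1}(x - \hat{x})\}^{-1} \]
\[ \qquad \times\, (x - \hat{x})'\hat{\Omega}^{-1}(\eta - \hat{\eta})[(\eta - \hat{\eta})'\hat{\Omega}^{-1}(\eta - \hat{\eta})]^{-1}(\eta - \hat{\eta})'\hat{\Omega}^{-1}(y - \hat{y}), \tag{2.26} \]
where $\hat{\Omega}^{-1}$ is a consistent estimator of $\Omega^{-1}$ given by
\[ \hat{\Omega}^{-1} = I_N \otimes \hat{\Sigma}^{-1}, \tag{2.27} \]
with
\[ \hat{\Sigma}^{-1} = (1/\hat{\sigma}_\nu^2)E_T + (1/\hat{\sigma}_1^2)\bar{J}_T, \tag{2.28} \]
\[ \hat{\sigma}_\nu^2 = \hat{u}'(I_N \otimes E_T)\hat{u}/[N(T - 1)], \tag{2.29} \]
\[ \hat{\sigma}_1^2 = T\hat{\sigma}_\mu^2 + \hat{\sigma}_\nu^2, \tag{2.30} \]
\[ \hat{\sigma}_1^2 = \hat{u}'(I_N \otimes \bar{J}_T)\hat{u}/N, \tag{2.31} \]
and $\hat{u}$ is of dimension $n \times 1$ with a typical element given by
\[ \hat{u}_{it} = y_{it} - \hat{y}_{it} - (x_{it} - \hat{x}_{it})'\hat{\gamma}_{IVO}. \tag{2.32} \]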
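The variance component estimators built from the quadratic forms $\hat{u}'(I_N \otimes E_T)\hat{u}$ and $\hat{u}'(I_N \otimes \bar{J}_T)\hat{u}$ can be computed without forming any Kronecker product, because both reduce to sums over individual blocks. The sketch below assumes residuals stacked by individual; the function name is hypothetical, and the implied $\hat{\sigma}_\mu^2$ is returned without truncation, so it can be negative in small samples.

```python
def variance_components(u_hat, N, T):
    """Return (sigma_nu^2, sigma_1^2, sigma_mu^2) estimates from stacked residuals."""
    within_ss = 0.0    # u'(I_N (x) E_T)u: squared deviations from individual means
    between_ss = 0.0   # u'(I_N (x) Jbar_T)u: T times the squared individual means
    for i in range(N):
        block = u_hat[i * T:(i + 1) * T]
        m = sum(block) / T
        within_ss += sum((b - m) ** 2 for b in block)
        between_ss += T * m * m
    s2_nu = within_ss / (N * (T - 1))
    s2_1 = between_ss / N
    s2_mu = (s2_1 - s2_nu) / T   # implied by sigma_1^2 = T*sigma_mu^2 + sigma_nu^2
    return s2_nu, s2_1, s2_mu
```

Residuals that are constant within each individual load entirely on the between part, while residuals that average to zero within each individual load entirely on the within part.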
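For the feasible semiparametric IV-Within estimator, we will use the same
tilde notation to denote the feasible quantities to avoid introducing too much new
notation. For example, we use $\tilde{v}_{it}$ to denote the kernel estimator of $v_{it} - \bar{v}_{i\cdot}$. Recall
that $v_{it} = x_{it} - E(x_{it}|z_{it})$. Hence we have
\[ \tilde{v}_{it} = (x_{it} - \hat{x}_{it}) - \frac{1}{T}\sum_{s=1}^{T}(x_{is} - \hat{x}_{is}). \tag{2.33} \]
Similarly, recalling that $w_{it} = \eta_{it} - E(\eta_{it}|z_{it})$ and $\xi_{it} = E(y_{it}|z_{it})$, we have
\[ \tilde{w}_{it} = (\eta_{it} - \hat{\eta}_{it}) - \frac{1}{T}\sum_{s=1}^{T}(\eta_{is} - \hat{\eta}_{is}), \tag{2.34} \]
and
\[ \tilde{\xi}_{it} = \hat{\xi}_{it} - \frac{1}{T}\sum_{s=1}^{T}\hat{\xi}_{is}, \tag{2.35} \]
where $\hat{\xi}_{it} = \hat{y}_{it}$ is the kernel estimator of $E(y_{it}|z_{it})$.
$\tilde{y}_{it}$ remains the same as $\tilde{y}_{it} = y_{it} - \bar{y}_{i\cdot}$. With the notation given in (2.33) to (2.35),
we obtain the feasible semiparametric IV-Within estimator,
\[ \hat{\gamma}_{IVW} = (\tilde{v}'\tilde{w}\tilde{w}'\tilde{v})^{-1}\tilde{v}'\tilde{w}\tilde{w}'(\tilde{y} - \tilde{\xi}). \tag{2.36} \]
In the next section we compare the finite sample performance of the new
estimators proposed in this chapter with those suggested by Li & Stengos (1996)
and Li & Ullah (1998) via Monte Carlo simulations.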
III. MONTE CARLO RESULTS

We use the following data generating process (DGP):
yit = yit 1 + zit + z2it + i + it
= yit 1 + (zit) + i + it,
(2.37)
where zit is independent and uniformly distributed in the interval of

[ 3,3], it is i.i.d. N(0,1). We choose = 0.5, = 0, 0.5, 1. We fix total
variance of 2 + 2 = 10 and vary = 2/(2 + 2) to be 0.2, 0.5, 0.8. We
choose it = zit 1 as IV for yit 1.
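For comparison we also compute the following non-IV semiparametric
estimators:
(I) A semiparametric OLS estimator given by
\[ \hat{\gamma}_{OLS} = [(x - \hat{x})'(x - \hat{x})]^{-1}(x - \hat{x})'(y - \hat{y}). \tag{2.38} \]
(II) A semiparametric GLS estimator defined by
\[ \hat{\gamma}_{GLS} = [(x - \hat{x})'\hat{\Omega}^{-1}(x - \hat{x})]^{-1}(x - \hat{x})'\hat{\Omega}^{-1}(y - \hat{y}). \tag{2.39} \]
(III) A semiparametric within estimator
\[ \hat{\gamma}_{W} = [\tilde{v}'\tilde{v}]^{-1}\tilde{v}'\tilde{y}, \tag{2.40} \]
where $\tilde{v}_{it} = x_{it} - \hat{x}_{it} - (1/T)\sum_{s=1}^{T}(x_{is} - \hat{x}_{is})$ is the same as defined in (2.33) and
$\tilde{y}_{it} = y_{it} - (1/T)\sum_{s=1}^{T} y_{is}$.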
Estimators (I)–(III) do not use instrumental variables and hence are
expected to have large bias because they ignore the fact that $y_{it-1}$ and $u_{it}$ are
correlated. However, they are also expected to have smaller variances
than the IV estimators. Therefore, for small and moderate samples,
their mean square error (MSE) is not necessarily larger than that of the semiparametric
IV estimators. Of course, when the sample size is sufficiently large, we
expect the semiparametric IV estimators to have smaller MSE because, after all,
they are consistent estimators, while the non-IV estimators are inconsistent.
The bias of the non-IV estimators will not die out as the sample size increases.
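We report the estimated bias, standard deviation (Std) and root mean square
error (Rmse) for all the estimators. These are computed via
\[ \mathrm{Bias}(\hat{\gamma}) = M^{-1}\sum_{j=1}^{M}(\hat{\gamma}_j - \gamma), \qquad \mathrm{Std}(\hat{\gamma}) = \Big\{M^{-1}\sum_{j=1}^{M}(\hat{\gamma}_j - \mathrm{Mean}(\hat{\gamma}))^2\Big\}^{1/2} \]
and
\[ \mathrm{Rmse}(\hat{\gamma}) = \Big\{M^{-1}\sum_{j=1}^{M}(\hat{\gamma}_j - \gamma)^2\Big\}^{1/2}, \]
where $M$ is the number of replications and $\hat{\gamma}_j$
is the estimated value of $\gamma$ at the $j$th replication. We use M = 2000 in all the
simulations. We choose T = 6 and N = 50, 100, 200, 500.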
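The three summary measures can be computed from a list of replication estimates as in the short Python sketch below. The function name is hypothetical, and the divisor is $M$ throughout, following the formulas above, so that the squared Rmse equals the squared bias plus the squared Std.

```python
import math


def mc_summary(estimates, true_value):
    """Bias, Std and Rmse of Monte Carlo estimates around a known true value."""
    M = len(estimates)
    mean = sum(estimates) / M
    bias = mean - true_value
    std = math.sqrt(sum((g - mean) ** 2 for g in estimates) / M)
    rmse = math.sqrt(sum((g - true_value) ** 2 for g in estimates) / M)
    return bias, std, rmse
```

The decomposition of the squared Rmse into squared bias and variance is what lets a biased but low-variance non-IV estimator beat a consistent IV estimator in small samples.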
The simulation results are given in Tables 1 and 2. The smallest Rmse for
each case (for a given N and ) is shown as boldface number(s). The
simulations results are qualitatively similar for = 0, = 0.5 and = 1.
Therefore, we only report the cases of = 0 and = 1 to save space.
Table 1 reports the result for = 0. From Table 1 we see that the non-IV
estimators: OLS, GLS and W have large bias because these estimators ignore
the fact that yit 1 is correlated with uit. However, these non-IV estimators all
have smaller standard deviations (or variances) than the semiparametric IV
estimators.
When N is small ($N \leq 100$) and the variance ratio $\rho = \sigma_\mu^2/(\sigma_\mu^2 + \sigma_\nu^2)$ is small to moderate ($\rho \leq 0.5$),
GLS has the smallest Rmse among all the estimators.
For $N \leq 100$ with $\rho = 0.8$, GLS is no longer the best because of the large bias
due to the strong individual effects. In this case IVG and IVW have the smallest
Rmse.
For N = 200 and N = 500 and for small $\rho = 0.2$, IVO has the smallest Rmse.
But for larger values of $\rho$ ($\rho = 0.5, 0.8$), IVG and IVW become the best in terms of
the Rmse criterion.
For $N \leq 100$ and $\rho \leq 0.5$, GLS has the smallest Rmse. However, for $\rho = 0.8$, the
bias in GLS is very large and hence its Rmse is much larger than that of the IV
estimators. IVG and IVW have the smallest Rmse for $\rho = 0.8$.
As N increases, the bias in OLS, GLS and W remains of the same order, as
expected. The variances of the IV estimators decrease as N increases and, as a
result, the IV estimators dominate the non-IV estimators when $N \geq 200$. For
$\rho = 0.2$, the IV-OLS estimator has the smallest Rmse. For $\rho = 0.5$ and $\rho = 0.8$, the
IV-GLS and IV-Within estimators have much smaller Rmse than the
IV-OLS estimator. The IV-OLS estimator ignores the one-way error
component structure. Hence when the individual effects are large, the
performance of IV-OLS is expected to be worse than that of the IV-GLS estimator.
We observe, as expected, that the bias of the non-IV estimators increases as $\rho$
increases.
We also observe that the Rmse of the IV-OLS estimator remains the same for
different values of $\rho$, while for the IV-GLS and IV-Within estimators the Rmse
decreases as $\rho$ increases.
Next, we observe that the results of Table 2 are very similar to those of Table
1. That is, the results are not sensitive to the functional form of $\theta(z_{it})$.
This is as expected because all the estimators are semiparametric and hence
they are robust to functional form specifications of $\theta(\cdot)$.
The DGP given in (2.37) is a just identified model. We have also conducted
some simulations for over identified model. In particular, we consider the
following model
Table 1.
OLS
GLS
W
IVO
IVG
IVW
OLS
GLS
W
IVO
IVG
IVW
OLS
GLS
W
IVO
IVG
IVW
OLS
GLS
W
IVO
IVG
IVW
The case of = 0.
Rmse
N = 50
= 0.5
Bias
Std
Rmse
Bias
0.198
0.117
0.248
0.291
0.215
0.225
0.352
0.099
0.213
0.042
0.008
0.009
0.030
0.059
0.057
0.329
0.171
0.174
0.353
0.115
0.220
0.331
0.171
0.174
0.442
0.310
0.136
0.128
0.012
0.013
= 0.2
Bias
Std
Rmse
N = 100
= 0.5
Bias
Std
Rmse
Bias
0.196
0.104
0.243
0.008
0.006
0.006
0.199
0.111
0.246
0.139
0.146
0.151
0.354
0.100
0.220
0.023
0.007
0.008
0.021
0.040
0.040
0.158
0.117
0.118
0.355
0.108
0.223
0.159
0.117
0.118
0.443
0.312
0.154
0.049
0.009
0.010
= 0.2
Bias
Std
Rmse
N = 200
= 0.5
Bias
Std
Rmse
Bias
0.198
0.105
0.244
0.004
0.004
0.005
0.200
0.108
0.246
0.097
0.103
0.105
0.356
0.100
0.224
0.010
0.005
0.006
0.015
0.029
0.029
0.101
0.083
0.084
0.356
0.104
0.226
0.101
0.083
0.084
0.444
0.312
0.166
0.016
0.007
0.007
= 0.2
Bias
Std
Rmse
N = 500
= 0.5
Bias
Std
Rmse
Bias
0.199
0.105
0.245
0.001
0.006
0.006
0.200
0.106
0.245
0.058
0.065
0.067
0.357
0.100
0.227
0.003
0.006
0.006
0.357
0.101
0.228
0.057
0.053
0.053
0.444
0.311
0.176
0.004
0.006
0.006
= 0.2
Bias
Std
0.193
0.103
0.241
0.019
0.006
0.005
0.045
0.056
0.058
0.290
0.215
0.225
0.031
0.039
0.041
0.139
0.146
0.150
0.021
0.027
0.029
0.097
0.103
0.105
0.014
0.017
0.019
0.058
0.065
0.066
0.009
0.018
0.018
0.057
0.052
0.053
= 0.8
Std
0.016
0.040
0.061
2.39
0.111
0.111
= 0.8
Std
0.011
0.027
0.042
0.528
0.077
0.076
= 0.8
Std
0.008
0.020
0.029
0.106
0.054
0.054
= 0.8
Std
0.005
0.013
0.018
0.057
0.034
0.034
Rmse
0.442
0.313
0.149
2.39
0.112
0.112
Rmse
0.443
0.313
0.160
0.530
0.077
0.077
Rmse
0.444
0.312
0.168
0.107
0.055
0.055
Rmse
0.444
0.311
0.177
0.058
0.034
0.034
Table 2.
OLS
GLS
W
IVO
IVG
IVW
OLS
GLS
W
IVO
IVG
IVW
OLS
GLS
W
IVO
IVG
IVW
OLS
GLS
W
IVO
IVG
IVW
The case of = 1.
Rmse
N = 50
= 0.5
Bias
Std
Rmse
Bias
0.196
0.117
0.244
0.302
0.215
0.225
0.348
0.092
0.208
0.045
0.008
0.009
0.031
0.058
0.057
0.341
0.171
0.174
0.350
0.109
0.216
0.344
0.172
0.174
0.438
0.298
0.132
0.168
0.012
0.013
= 0.2}
Bias
Std
Rmse
N = 100
= 0.5
Bias
Std
Rmse
Bias
0.194
0.104
0.238
0.008
0.006
0.006
0.196
0.111
0.242
0.139
0.146
0.150
0.351
0.094
0.214
0.023
0.007
0.008
0.021
0.040
0.040
0.156
0.117
0.118
0.352
0.102
0.218
0.158
0.118
0.119
0.440
0.299
0.148
0.042
0.009
0.010
= 0.2
Bias
Std
Rmse
N = 200
= 0.5
Bias
Std
Rmse
Bias
0.196
0.104
0.240
0.004
0.004
0.005
0.197
0.108
0.241
0.097
0.103
0.105
0.353
0.093
0.218
0.010
0.005
0.006
0.015
0.029
0.028
0.101
0.083
0.084
0.353
0.097
0.220
0.101
0.083
0.084
0.441
0.298
0.158
0.016
0.007
0.007
= 0.2
Bias
Std
Rmse
N = 500
= 0.5
Bias
Std
Rmse
Bias
0.197
0.105
0.240
0.001
0.006
0.006
0.197
0.106
0.241
0.058
0.065
0.067
0.353
0.092
0.221
0.003
0.006
0.006
0.353
0.094
0.222
0.057
0.053
0.053
0.441
0.297
0.167
0.004
0.006
0.006
= 0.2
Bias
Std
0.190
0.104
0.237
0.021
0.006
0.005
0.045
0.055
0.058
0.301
0.215
0.225
0.031
0.039
0.041
0.139
0.146
0.150
0.021
0.027
0.029
0.097
0.103
0.105
0.013
0.017
0.019
0.058
0.065
0.066
0.009
0.018
0.018
0.057
0.052
0.053
= 0.8
Std
0.016
0.041
0.059
3.53
0.112
0.111
= 0.8
Std
0.012
0.028
0.041
0.243
0.077
0.077
= 0.8
Std
0.008
0.021
0.028
0.106
0.054
0.054
= 0.8
Std
0.005
0.013
0.018
0.057
0.034
0.034
Rmse
0.439
0.301
0.144
3.53
0.112
0.112
Rmse
0.440
0.301
0.153
0.246
0.077
0.077
Rmse
0.441
0.299
0.161
0.107
0.055
0.055
Rmse
0.441
0.298
0.168
0.058
0.035
0.035
yit = yi,t 1 + z1,it + 1z1,it + z2,it + 2z2,it + i + it

= yi,t 1 + (z1,it,z2,it) + i + it.
(2.41)
The simulation results for the above over-identified model lead to the same
conclusion as those for the just-identified model. To save space, we therefore do
not report the results for the over-identified case; they are available
from the authors upon request.
IV. CONCLUDING REMARKS

In this chapter we consider the problem of estimating a semiparametric
partially linear panel data model with errors that have a one-way error
components structure. We propose two new semiparametric IV estimators for
the coefficient of the parametric component, and we argue that the new
semiparametric estimators are more efficient than the ones suggested by Li &
Stengos (1996) and Li & Ullah (1998) because the new estimators make full
use of the one-way error components structure. The Monte Carlo simulation
results confirm our theoretical analysis.
Throughout the chapter we assume the existence of random individual
effects. In practice one may want to test the existence of random individual
effects. For this purpose one can use the test statistic suggested by Li & Hsiao
(1998) for testing the null of no random individual effects in a partially linear
dynamic panel data model.
Also, in this chapter we consider only the case in which the individual effect is
random. We now briefly discuss the case of fixed effects semiparametric partially linear
models. The model is the same as given in (2.1) and (2.2) except that now we
assume the individual effect is a fixed effect rather than a random effect. The
semiparametric IVOLS and IVGLS estimators that either ignore the fixed
effects or treat the fixed effects as random effects will not lead to consistent
estimates, for the same reason as in the parametric regression model
case. However, the semiparametric within estimator, which wipes out the
individual effects whether they are fixed or random, remains a consistent estimator
in the case of a fixed effect model.
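The statement that the within transformation removes the individual effects, whether they are fixed or random, can be checked numerically. The following Python sketch is an illustration only and is not taken from the authors' GAUSS program; the function name `within_transform` and the toy panel dimensions are assumptions. It demeans each individual's series, which is the same as applying the projection matrix E_T = I_T - J_T/T.

```python
import numpy as np

def within_transform(v, n, t):
    """Demean an (n*t,) vector, stacked individual by individual, within each individual."""
    panel = v.reshape(n, t)
    return (panel - panel.mean(axis=1, keepdims=True)).reshape(n * t)

rng = np.random.default_rng(0)
n, t = 4, 6                               # toy panel dimensions (hypothetical)
mu = rng.normal(size=n)                   # individual effects; fixed or random makes no difference here
effects = np.repeat(mu, t)                # each mu_i repeated over the t periods
noise = rng.normal(size=n * t)            # idiosyncratic part of the error

# The same operation written with the projection matrix E_T = I_T - J_T / T
E_T = np.eye(t) - np.ones((t, t)) / t
via_matrix = ((effects + noise).reshape(n, t) @ E_T).reshape(n * t)

# The individual effects are annihilated; only the demeaned noise survives
print(np.allclose(within_transform(effects, n, t), 0.0))
print(np.allclose(within_transform(effects + noise, n, t), via_matrix))
```

Because the effects vanish identically, no assumption about their correlation with the regressors is needed, which is why the within estimator stays consistent in the fixed effects case.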
Our Monte Carlo results of Section 3 show that the within semiparametric
estimator IVW performs quite well relative to other estimators. Therefore, we
recommend its use in practice.
ACKNOWLEDGMENTS
We would like to thank a referee and Badi Baltagi for very useful comments
that greatly improved the paper. Q. Li's research is supported by the Natural
Sciences and Engineering Research Council of Canada, the Social Sciences
and Humanities Research Council of Canada, the Ontario Premier's Research
Excellence Awards, and the Bush program in economics on public policy. A. Ullah
thanks the Academic Senate of UCR for the research support.
NOTES
1. Li & Ullah (1998) reported some Monte Carlo results on a static semiparametric
panel data model. They also proposed two semiparametric instrumental variable
estimators for a semiparametric dynamic panel data model, but they did not conduct any
Monte Carlo simulations on the dynamic model.
2. The simple spectral decomposition method for deriving the inverse of the error
covariance matrix was proposed by Wansbeek & Kapteyn (1982, 1983).
REFERENCES
Anderson, T. W., & Hsiao, C. (1981). Estimation of Dynamic Models With Error Components.
Journal of American Statistical Association, 76, 598–606.
Arellano, M., & Bover, O. (1995). Another Look at The Instrumental Variable Estimation of Error
Components Models. Journal of Econometrics, 68, 29–51.
Baltagi, B. H. (1995). Econometric Analysis of Panel Data. New York: Wiley.
Baltagi, B. H., & Griffin, J. M. (1997). Pooled Estimators vs. Their Heterogeneous Counterparts
in The Context of Dynamic Demand for Gasoline. Journal of Econometrics, 77, 303–327.
Chamberlain, G. (1984). Panel Data. In: Z. Griliches & M. Intriligator (Eds), Handbook of
Econometrics (pp. 1247–1318), Vol. II. Amsterdam: North Holland.
Donald, S. G., & Newey, W. K. (1994). Series Estimation of Semilinear Regression. Journal of
Multivariate Analysis, 50, 30–40.
Engle, R. F., Granger, C. W. J., Rice, J., & Weiss, A. (1986). Semiparametric Estimates of The
Relationship Between Weather and Electricity Sales. Journal of the American Statistical
Association, 81, 310–320.
Hsiao, C. (1986). Analysis of Panel Data. Econometric Society monograph No. 11. New York:
Khanna, M., Mundra, K., & Ullah, A. (1999). Parametric and Semiparametric Estimation of The
Effect of Firm Attributes on Efficiency: The Electricity Generating Sector in India. Journal
of International Trade and Economic Development, forthcoming.
Kiviet, J. F. (1995). On Bias, Inconsistency and Efficiency of Some Estimators in Dynamic Panel
Li, Q. (1996). On The Root-n-consistent Semiparametric Estimation of Partially Linear Models.
Li, Q., & Hsiao, C. (1998). Testing Serial Correlation in Semiparametric Panel Data Models.
Li, Q., & Stengos, T. (1996). Semiparametric Estimation of Partially Linear Panel Data Models.
Li, Q., & Ullah, A. (1998). Estimating partially linear models with one-way error components.
Econometric Reviews, 17, 145–166.
Matyas, L., & Sevestre, P. (1992). The Econometrics of Panel Data. Dordrecht: Kluwer, 2nd
edition.
Pesaran, M. H., & Smith, R. (1995). Estimation of Long-run Relationship From Dynamic
Robinson, P. M. (1988). Root-N-consistent Semiparametric Regression. Econometrica, 56,
931–954.
Stock, J. H. (1989). Nonparametric Policy Analysis. Journal of the American Statistical
Association, 84, 567–575.
Ullah, A., & Roy, N. (1998). Nonparametric and Semiparametric Econometrics of Panel Data. In:
A. Ullah and D. E. A. Giles (Eds), Handbook on Applied Economic Statistics (pp. 579–604),
Ch. 17. Marcel Dekker.
Ullah, A., & Mundra, K. (1999). Semiparametric Panel Data Estimation: An Application to
Immigrants' Homelink Effect on U.S. Producer Trade Flows. Working paper 15, Department
of Economics, University of California at Riverside.
Wansbeek, T. J., & Kapteyn, A. (1982). A Simple Way to Obtain the Spectral Decomposition of
Variance Components Models for Balanced Data. Communications in Statistics, A11,
2105–2112.
Wansbeek, T. J., & Kapteyn, A. (1983). A Note on Spectral Decomposition and Maximum
Likelihood Estimation of ANOVA Models With Balanced Data. Statistics and Probability
Letters, 1, 213–215.
White, H. (1984). Asymptotic Theory for Econometricians. New York: Academic Press.
White, H. (1986). Instrumental Variables Analogs of Generalized Least Squares Estimator. R. S.
Mariano (Ed.), Advances in Statistical Analysis and Statistical Computing (pp. 173–277),
Vol. 1. New York: JAI Press.
APPENDIX
/** This is a GAUSS program using Monte Carlo simulation to examine the finite
sample performances of some semiparametric instrumental variable estimators
in a semiparametric dynamic panel data model, written by M. Douglas Berg **/
output file = c:\gauss\doug\work1.out reset;
format /rd 8,3;
n = 100; T = 6;
T00 = 30; T0 = T + T00 + 1; NT = N*T;
nr = 500;                                   @ number of replications @
lamt = 0.5; b1 = 1; b2 = 0; sig2 = 10;
rho = 0.8;
sigmu2 = rho*sig2;
signu2 = (1-rho)*sig2; sigmu = sqrt(sigmu2);
signu = sqrt(signu2); s1_5 = sqrt(t*sigmu2 + signu2);
sv_5 = signu;                               @ true parameter values @
ycz = zeros(nt,1); y1cz = ycz; z1cz = ycz; fz = ycz;
kel = zeros(nt,1); lam1 = zeros(nr,1);
lam3 = lam1; lam1n = lam1; lam3n = lam1; lam4n = lam1;
lam6n = lam1; y0 = zeros(n,t0);
rndseed 7893450;
i1 = 1; do while i1 <= nr;                  @ Monte Carlo simulation loop @
  z0 = 2*sqrt(3)*rndu(n,t0) - sqrt(3);
  u0 = rndn(n,t0); mu = rndn(n,1);
  i2 = 2; do while i2 <= t0;                @ Generate y @
    y0[.,i2] = lamt*y0[.,i2-1] + b1*z0[.,i2]
      + b2*z0[.,i2]^2 + signu*u0[.,i2] + sigmu*mu;
    i2 = i2 + 1;
  endo;
  y = y0[.,T00 + 1:T00 + T];
  y1 = y0[.,T00:T00 + T-1];
  z = z0[.,T00 + 1:T00 + T];
  z1 = z0[.,T00:T00 + T-1];
  yv = reshape( y, nt, 1 );
  y1v = reshape( y1, nt, 1 );
  zv = reshape( z, nt, 1 );
  z1v = reshape( z1, nt, 1 );
  hz = stdc(zv)*(nt^(-1/5));                @ bandwidths @
  hz1 = stdc(z1v)*(nt^(-1/5));
  zvh = zv/hz; z1vh = z1v/hz1;
  i3 = 1; do while i3 <= nt;                @ Nonparametric Estimation Loop @
    zd = zvh[i3,.]' - zvh';
    z1d = z1vh[i3,.]' - z1vh';
    kelz = prodc( (exp(-0.5*zd^2)) )/sqrt(2*pi);
    kelz1 = prodc( (exp(-0.5*z1d^2)) )/sqrt(2*pi);
    ycz[i3,.] = yv'*kelz/(nt*hz);
    y1cz[i3,.] = y1v'*kelz/(nt*hz);
    z1cz[i3,.] = z1v'*kelz/(nt*hz);
    fz[i3,.] = sumc( kelz )/(nt*hz);
    i3 = i3 + 1;
  endo;
  w1v = z1v - z1cz./fz;
  xxv = y1v - y1cz./fz;
  yyv = yv - ycz./fz;
  @ Li-Ullah, Li-Stengos IV @
  lam1[i1,.] = inv(w1v'*xxv)*w1v'*yyv;      @ IV-OLS estimator @
  lam3[i1,.] = inv(xxv'*xxv)*xxv'*yyv;      @ Semi-OLS estimator @
  u01 = yyv - xxv*lam1[i1,.];
  u03 = yyv - xxv*lam3[i1,.];
  Jbt = ones(t,t)/t;
  Et = eye(t) - Jbt;
  u11 = Et*( (reshape( u01,n,t))' );
  u11 = reshape( u11,nt,1 );
  sv2 = u11'*u11/(n*(t-1));
  u22 = Jbt*( (reshape(u01,n,t))' );
  u22 = reshape( u22,nt,1 );                @ stacked so that u22'*u22 is a scalar @
  smu2 = u22'*u22/n;
  s12 = sv2 + t*smu2;
  sv_1 = sqrt( sv2 );
  s1_1 = sqrt( s12 );
  u11 = Et*( (reshape( u03,n,t))' );
  u11 = reshape( u11,nt,1 );
  sv2 = u11'*u11/(n*(t-1));
  u22 = Jbt*( (reshape(u03,n,t))' );
  u22 = reshape( u22,nt,1 );
  smu2 = u22'*u22/n;
  s12 = sv2 + t*smu2;
  sv_3 = sqrt( sv2 );
  s1_3 = sqrt( s12 );
  At_1 = Jbt/s1_1 + Et/sv_1;
  At_3 = Jbt/s1_3 + Et/sv_3;
  At_5 = Jbt/s1_5 + Et/sv_5;
  At_w = Et;
  yyn_1 = At_1*( (reshape(yyv,n,t))' );
  yyn_3 = At_3*( (reshape(yyv,n,t))' );
  yyn_6 = At_w*( (reshape(yyv,n,t))' );
  xxn_1 = At_1*( (reshape(xxv,n,t))' );
  xxn_3 = At_3*( (reshape(xxv,n,t))' );
  xxn_6 = At_w*( (reshape(xxv,n,t))' );
  w1n_w = At_w*( (reshape(w1v,n,t))' );
  w1n = At_1*( (reshape(w1v,n,t))' );
  yyv_1 = reshape(yyn_1,nt,1);
  yyv_3 = reshape(yyn_3,nt,1);
  yyv_6 = reshape(yyn_6,nt,1);
  xxv_1 = reshape(xxn_1,nt,1);
  xxv_3 = reshape(xxn_3,nt,1);
  xxv_6 = reshape(xxn_6,nt,1);
  w1v_w = reshape(w1n_w,nt,1);
  w1v = reshape(w1n,nt,1);
  lam1n[i1,.] = inv(w1v'*xxv_1)*w1v'*yyv_1;       @ IV-GLS estimator @
  lam3n[i1,.] = inv(xxv_3'*xxv_3)*xxv_3'*yyv_3;   @ Semi-GLS estimator @
  lam4n[i1,.] = inv(w1v_w'*xxv_6)*w1v_w'*yyv_6;   @ IV-Within estimator @
  lam6n[i1,.] = inv(xxv_6'*xxv_6)*xxv_6'*yyv_6;   @ Semi-Within est. @
  i1 = i1 + 1;
endo;
Bias1 = meanc( lam1 - lamt );               @ Bias @
Bias3 = meanc( lam3 - lamt );
rmse1 = sqrt( meanc( (lam1-lamt)^2 ) );     @ Root-MSE @
rmse3 = sqrt( meanc( (lam3-lamt)^2 ) );
std1 = stdc(lam1);                          @ Standard Dev. @
std3 = stdc(lam3);
Bias1n = meanc( lam1n - lamt );
rmse1n = sqrt( meanc( (lam1n-lamt)^2 ) );
std1n = stdc(lam1n);
@ remaining summary statistics, computed in the same way as for lam1 @
Bias3n = meanc( lam3n - lamt );
rmse3n = sqrt( meanc( (lam3n-lamt)^2 ) );
std3n = stdc(lam3n);
Bias4n = meanc( lam4n - lamt );
rmse4n = sqrt( meanc( (lam4n-lamt)^2 ) );
std4n = stdc(lam4n);
Bias6n = meanc( lam6n - lamt );
rmse6n = sqrt( meanc( (lam6n-lamt)^2 ) );
std6n = stdc(lam6n);
print "********************************************************";
print "IVO1, bias1, std1, rmse1 = " bias1 std1 rmse1;
print "OLS, bias3, std3, rmse3 = " bias3 std3 rmse3;
print "********************************************************";
print "IVG1, bias1n, std1n, rmse1n = " bias1n std1n rmse1n;
print "GLS, bias3n, std3n, rmse3n = " bias3n std3n rmse3n;
print "********************************************************";
print "With1, bias4n, std4n, rmse4n = " bias4n std4n rmse4n;
print "With, bias6n, std6n, rmse6n = " bias6n std6n rmse6n;
print "********************************************************";
end;
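The nonparametric step of the program, in which each variable is replaced by its deviation from a kernel estimate of its conditional mean given z, can be sketched in Python as follows. This is a hedged illustration rather than a translation of the GAUSS code: the function name `kernel_conditional_mean` and the test data are assumptions, while the Gaussian kernel and the bandwidth of the form std(z) times (sample size)^(-1/5) follow the program.

```python
import numpy as np

def kernel_conditional_mean(target, z):
    """Nadaraya-Watson estimate of E[target | z] at every sample point (Gaussian kernel)."""
    m = z.shape[0]
    h = z.std(ddof=1) * m ** (-1.0 / 5.0)       # rule-of-thumb bandwidth
    d = (z[:, None] - z[None, :]) / h           # all pairwise scaled differences
    k = np.exp(-0.5 * d ** 2) / np.sqrt(2.0 * np.pi)
    # Ratio of kernel-weighted sum to kernel density; common scale factors cancel
    return (k @ target) / k.sum(axis=1)

rng = np.random.default_rng(0)
z = rng.uniform(-np.sqrt(3), np.sqrt(3), size=200)   # same support as z0 in the program
y = np.sin(z) + 0.1 * rng.normal(size=200)           # hypothetical nonlinear relation
y_resid = y - kernel_conditional_mean(y, z)          # analogue of yyv = yv - ycz./fz
```

The residualized variables then enter ordinary OLS or IV formulas, which is what the lines computing lam1 and lam3 do.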
SMALL SAMPLE PERFORMANCE OF

DYNAMIC PANEL DATA ESTIMATORS
IN ESTIMATING THE
GROWTH-CONVERGENCE EQUATION:
A MONTE CARLO STUDY
Nazrul Islam
ABSTRACT
This chapter conducts a Monte Carlo investigation into small sample
properties of some of the dynamic panel data estimators that have been
applied to estimate the growth-convergence equation using the Summers-Heston data set. The results show that the OLS estimation of this equation
is likely to yield seriously upward biased estimates. However, indiscriminate use of panel estimators is also risky, because some of them display
large bias and mean square error. Yet, there are panel estimators that have
much smaller bias and mean square error. Through a judicious choice of
panel estimators it is therefore possible to obtain better estimates of the
parameters of the growth-convergence equation. The growth researchers
may make use of this potential.
I. INTRODUCTION
One of the issues around which the recent growth literature has evolved is that
of convergence. This refers to the idea that, because of diminishing returns to
capital, poorer economies should grow faster and catch up with the richer ones.
Statistically, convergence is therefore interpreted as a negative correlation
between the initial level of income and the subsequent growth rate.
Accordingly, a popular method for testing the convergence hypothesis has been
to run growth-initial level regressions or growth-convergence regressions,
where subsequent growth rates are regressed on initial levels of income.
For a long time, growth-convergence regressions were estimated using cross-section data. However, recently researchers have drawn attention to the fact that
the growth-convergence equation actually represents a dynamic panel data
model, and by ignoring the individual effects, cross-section estimation courts
omitted variable bias (OVB). Thus, Islam (1993, 1995) argues for using panel
procedures to overcome this bias and in particular implements Chamberlain's
(1982, 1983) Minimum Distance (MD) procedure to estimate the equation.
Knight et al. (1993) make similar arguments and also use the Minimum
Distance procedure to produce similar results. Islam, in addition, presents
results from the Least Squares with Dummy Variables (LSDV) procedure.
Since these initial works, panel estimation of the growth-convergence
equation has spread considerably. For example, Lee, Pesaran & Smith (1997,
1998) consider maximum likelihood estimation of the growth-convergence
equation using panel data. Caselli et al. (1996) emphasize the problem of
endogeneity in this equation and use the Arellano-Bond GMM panel procedure
to overcome the problem. Barro (1997) and Barro & Sala-i-Martin (1995) use
pooled estimation on panel data sets. Lee et al. (1998) also present evidence on
panel estimation of the growth convergence equation.
The panel estimates presented in these papers generally differ from
corresponding cross-section estimates. However, they also differ among
themselves. Nerlove (1999) highlights this by using a variety of panel
estimators to estimate the growth-convergence equation and compiling the
results. Similar findings were presented earlier in Islam (1993). This creates a
problem of choosing among various panel estimators. Unfortunately, theoretical properties of dynamic panel data estimators are generally asymptotic and
often equivalent. This creates the necessity of Monte Carlo studies to ascertain
the small sample properties of these estimators. However, Monte Carlo studies
are more useful when they are customized to the specification and the data set
that are used in actual estimation. Although many researchers have recently
presented Monte Carlo evidence on small sample properties of dynamic panel
estimators, studies focusing on the growth-convergence equation and using the
Summers-Heston (1988, 1991) data set are rare.
This chapter tries to help fill this gap. The study focuses on those
estimators that have been used so far to estimate the growth-convergence
equation. Accordingly, the estimators included are: least squares with dummy
variables (LSDV); the two instrumental variable estimators of Anderson &
Hsiao (1981, 1982), namely AH(l), based on level instruments, and AH(d),
based on difference instruments; the minimum distance (MD) estimator,
suggested by Chamberlain (1982, 1983); and the one-step (ABGMM1) and
two-step (ABGMM2) generalized method of moments estimators proposed by
Arellano & Bond (1991). In addition, the exercise includes simultaneous
equations (SE) estimators such as the two stage least squares estimator (2SLS),
the three stage least squares estimator (3SLS), and the generalized three stage
least squares estimator (G3SLS). To complete the picture, the study also
includes the (pooled) ordinary least squares (OLS) estimator, which ignores the
individual effects.
The two main parameters of the model are the dynamic adjustment
parameter $\gamma$ (attached to the lagged dependent variable) and $\beta$, the parameter
of the exogenous variable. The Monte Carlo results show that the OLS
estimates of $\gamma$ are, as expected, positively biased, and the magnitude of this bias
averages about seventeen percent of the true parameter value. For most of the
panel estimators, the direction of bias is negative, with only the AH(d)
estimator providing some exceptions. The bias is small for the AH(d), the
LSDV, and the MD estimators, ranging between five and six percent. The bias
of the 2SLS, 3SLS, and G3SLS estimates of $\gamma$ ranges between eight and ten
percent. The largest bias is observed for the ABGMM estimators, averaging
twenty-two percent. The AH(l) estimator performs so poorly that we refrain
from reporting its results.
The results regarding root mean square error (RMSE) demonstrate a similar
pattern. The average RMSE as a percentage of the true value of $\gamma$ proves to be
seventeen percent for the OLS estimator. For the LSDV and the MD estimators,
this percentage ranges between six and seven. For the AH(d), 2SLS, 3SLS, and
G3SLS estimators, it ranges between ten and twenty. This percentage is the
highest for the ABGMM estimators, ranging between forty and forty-six
percent.
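The bias and RMSE percentages quoted here are summaries of Monte Carlo draws relative to the true parameter value. A minimal Python sketch of such a summary is given below; the function name `mc_summary` and the four draws are hypothetical and are not results from this study.

```python
import numpy as np

def mc_summary(estimates, true_value):
    """Bias, standard deviation and RMSE of Monte Carlo estimates, also as percent of the truth."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - true_value
    std = est.std(ddof=0)                        # population form, so rmse^2 = bias^2 + std^2
    rmse = np.sqrt(np.mean((est - true_value) ** 2))
    return {
        "bias": bias,
        "std": std,
        "rmse": rmse,
        "bias_pct": 100.0 * bias / true_value,
        "rmse_pct": 100.0 * rmse / true_value,
    }

# Hypothetical draws of an estimator whose true value is 1.0
summary = mc_summary([0.9, 1.1, 1.0, 1.2], true_value=1.0)
print(summary)
```

The decomposition rmse^2 = bias^2 + std^2 explains why an estimator with a small bias can still have a large RMSE when its variance is large.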
With regard to $\beta$, the bias of the OLS estimates is again positive, but now
averages much higher, at forty-eight percent of the parameter value. The
direction of bias of the panel estimates of $\beta$ is quite mixed. However, panel
estimates of $\beta$ are on average quite close to the true parameter value. The
magnitude of the algebraic average of the bias for the 2SLS, 3SLS, LSDV and
the MD estimators remains under one percent. For AH(d) and G3SLS it ranges
between one and two percent. For the ABGMM estimates, this percentage is
higher but still within five to seven percent.
The RMSE results for $\beta$ display a similar ranking of performance.
However, the smallness of bias in estimation of $\beta$ is largely nullified by the large
variance of the estimates. As a result, the RMSE values for $\beta$ are in general
much higher than for $\gamma$. For a good number of panel estimators, which
include AH(d), 2SLS, and 3SLS, the RMSE remains under thirty-five percent of
the true value of $\beta$. For the LSDV and the MD, this percentage is under
twenty-five. However, for G3SLS, this percentage is fifty-six. For the
ABGMM it is around two hundred percent. For the OLS the ratio is fifty-six
percent.
The results indicate that the OLS estimation of the growth-convergence
equation is very likely to give considerably biased results. However,
indiscriminate use of panel estimators is risky too. Yet, there are panel
estimators that have much smaller bias and RMSE than the OLS. Hence, a
judicious choice of panel estimator has the potential to yield much better
estimates of the parameters of the growth convergence equation. Growth
researchers may make use of this potential.
In addition to the above, several general points emerge from this study. First,
the performances of the two AH estimators contrast sharply. The source of this
contrast lies in the different degrees of correlation of the instruments with the
instrumented variables. This highlights the importance of research into
estimation with weak instruments. Second, a comparison of the ABGMM1
results with those of ABGMM2, and of the 2SLS results with those of either 3SLS or
G3SLS, shows that simpler estimators not requiring estimated weighting
matrices may perform better than sophisticated estimators that do require such
matrices. Use of estimated weighting matrices creates an avenue for unwarranted
noise to enter into estimation. Third, increasing the number of instruments may
not necessarily improve estimation results. This is revealed by the poor
performance of the ABGMM estimators compared to that of AH(d). Fourth,
theoretically inconsistent estimators can display good small sample performance. The performance of the LSDV estimator, which is inconsistent in the
direction of N, illustrates this. Finally, the results of this chapter are in general
agreement with other recent Monte Carlo studies, which have also reported
large bias of the ABGMM estimators and better performance of the LSDV
estimator.
The discussion of the chapter is organized as follows. Section 2 reviews
previous Monte Carlo studies of dynamic panel estimators and specifies the
objectives of the current study. Section 3 presents the model and discusses the
data generation processes. Section 4 presents the results. Section 5 contains
some concluding remarks.
II. PREVIOUS MONTE CARLO STUDIES

Much of the recent empirical research on growth has revolved around
estimation of the growth-convergence equation. A close inspection of this
equation shows that it is actually a dynamic panel data model.1 A cross-section
estimation of the equation therefore suffers from omitted variable bias. This has
led to panel estimation. Different panel estimators have however produced
different results. Theoretical properties of many of these estimators are
asymptotic and equivalent. Hence, Monte Carlo evidence is necessary to gauge
which of these estimates are more acceptable.
The issue of small sample properties of dynamic panel estimators is not new.
Earlier, the gas demand study by Balestra & Nerlove (1966) also raised this
issue. This led Nerlove to conduct several Monte Carlo studies. Nerlove (1967)
considers a simple auto-regressive model with no exogenous variable and
compares the performance of the OLS, LSDV, MLE, and several variants of the
GLS estimator in estimating the model. In Nerlove (1971), the dynamic panel
model is extended to include an exogenous variable. This allows consideration
of instrumental variable (IV) estimator with lagged values of the exogenous
variable as instrument. It also allows having another variant of the two-stage
GLS. Overall, Nerlove's Monte Carlo results favor the GLS estimators over
other estimators.
Since Nerlove's work, there have been significant developments in the field
of dynamic panel data estimators.2 Among these is the introduction of the
Anderson & Hsiao (1981, 1982) instrumental variable estimators that use
further lagged values of the dependent variable as instruments. Arellano &
Bond (1991) carry this idea further and propose using all lagged variables
(provided they qualify) as instruments within a GMM framework. Ahn &
Schmidt (1995, 1997, 1999), Arellano & Bover (1995), Blundell & Bond
(1998), Hahn (1999), Wansbeek & Knaap (1998), and Ziliak (1997) suggest
various extensions and modifications of the Arellano-Bond GMM estimator
(ABGMM). On the other hand, Kiviet (1995) and Wansbeek & Knaap (1998)
propose modifications of the LSDV and LIML estimators, respectively.
Many of the recent works offer Monte Carlo evidence too. Thus Arellano &
Bond (1991) perform a Monte Carlo study to compare primarily the small
sample properties of their GMM estimators with corresponding properties of
the Anderson-Hsiao estimators. According to their results, the GMM estimators
perform better than the Anderson-Hsiao IV estimators, though not so much in
terms of bias as in terms of dispersion. However, simulation studies of Alonso-Borrego & Arellano (1999), Kiviet (1995), Harris & Matyas (1996), Judson &
Owen (1997), Wansbeek & Knaap (1998), and Ziliak (1997) report significant
bias of the ABGMM estimators. Kiviet (1995) reports good performance of his
bias-corrected LSDV estimator. On the other hand, Wansbeek & Knaap (1998)
report better performance of a covariance-corrected instrumental variable
estimator and their LIML estimator. Baltagi & Kao (2000) in this volume give
an extensive survey of recent developments in dynamic panel data models.
These studies have illuminated the small sample properties of various
dynamic panel estimators. However, most of these studies do not focus on any
particular model or data set. Ziliak's (1997) study is probably an exception:
it focuses on a labor supply model and uses the PSID data. However, it is
known that Monte Carlo results are more useful when the exercise is
customized to the model whose estimation is in question and when the
simulations are conducted on the basis of the data set that is actually used for
estimation of the model. From this point of view there exists a void regarding
the growth-convergence equation. Monte Carlo evidence on small sample
performance of panel data estimators in estimating this equation is rare.
This chapter tries to fill this gap to some extent. It focuses
exclusively on the growth-convergence equation and bases the simulations on
the Summers-Heston data set that has been widely used in estimating this
equation. This focus also guides the choice of estimators to be included in the
study. The main feature of the growth-convergence equation is that the
exogenous variable of the model is correlated with the individual, country
effects. This implies that panel estimators that rely on the uncorrelated random-effects assumption are not suitable for estimation of this equation. On the other
hand, estimators that highlight this correlation, such as the Minimum Distance
estimator of Chamberlain, may play an important role in estimating it. The
study also considers several different generation mechanisms for the random
error term, and it considers estimation of the equation in several different
samples that have widely figured in the recent growth literature. Because of its
customized nature, the results of this study should be directly useful for the
empirical growth researchers.
III. MODEL, PARAMETER VALUES, AND DATA GENERATION
A. The Model
The dynamic panel data model that arises in the convergence literature is as
follows:
$$y_{it} = \gamma y_{i,t-1} + \beta x_{i,t-1} + \mu_i + \eta_t + v_{it}. \tag{1}$$
Here $y_{it}$ represents the log of per capita GDP of country $i$ at time $t$, $y_{i,t-1}$ is the same
lagged by one period, and $x_{i,t-1}$ is the difference between the logs of the investment and
population growth rate variables of country $i$ at time $t-1$. Finally, $\mu_i$ and $\eta_t$ are
individual and time effect terms, and $v_{it}$ is the transitory error, which varies
across both individuals and time. In this setup, $(t-1)$ and $t$ denote the initial and
subsequent periods of time, respectively. The derivation of this equation
proceeds from the Cobb-Douglas aggregate production function,
$Y_t = K_t^{\alpha}(A_t L_t)^{1-\alpha}$, where $Y$, $K$, and $L$ are output, capital, and labor respectively, and $A$
is the labor-augmenting technology, which grows exponentially at the
exogenous rate $g$. The derivation yields the following correspondence between
the coefficients of equation (1) and the structural parameters of the production
function:
$$\gamma = e^{-\lambda\tau} \tag{2}$$
$$\beta = (1 - e^{-\lambda\tau})\,\frac{\alpha}{1-\alpha} \tag{3}$$
$$\mu_i = (1 - e^{-\lambda\tau}) \ln A_{0i} \tag{4}$$
$$\eta_t = g\,(t_2 - e^{-\lambda\tau} t_1). \tag{5}$$
Here $\tau$ is the length of time between $t_2$ and $t_1$, where $t_2$ and $t_1$ correspond to $t$
and $(t-1)$ of equation (1), respectively. The parameter $\lambda$ is known as the rate of
convergence and is given by $\lambda = (1-\alpha)(n + g + \delta)$, where $n$ is the exponential
growth rate of $L$, and $\delta$ is the rate of depreciation of capital.
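The correspondence between the regression coefficients and the structural parameters can be inverted to recover the implied rate of convergence and capital share. The Python sketch below assumes the standard forms $\gamma = e^{-\lambda\tau}$ and $\beta = (1-\gamma)\,\alpha/(1-\alpha)$; the function names are hypothetical, and the inputs 0.7886 and 0.1641 are taken to be the NONOIL values of the autoregressive and slope parameters in Table 1, which is an assumption about how that table is laid out.

```python
import math

def convergence_rate(gamma, tau):
    """Invert gamma = exp(-lambda * tau) for the rate of convergence lambda."""
    return -math.log(gamma) / tau

def capital_share(gamma, beta):
    """Invert beta = (1 - gamma) * alpha / (1 - alpha) for the capital share alpha."""
    ratio = beta / (1.0 - gamma)          # equals alpha / (1 - alpha)
    return ratio / (1.0 + ratio)

tau = 5                                   # five-year panels
lam = convergence_rate(0.7886, tau)       # implied annual rate of convergence
alpha = capital_share(0.7886, 0.1641)     # implied capital share
print(round(lam, 4), round(alpha, 3))
```

With these inputs the implied rate of convergence is a little under five percent per year, which shows how strongly the structural conclusions depend on the estimate of the autoregressive parameter.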
An important issue regarding this model is the specification of the individual
effect term $\mu_i$. Equation (4) shows that $\mu_i$ basically stands for $A_{0i}$. Mankiw,
Romer & Weil (1992, p. 6) define $A_{0i}$ as follows: "The $A_{0i}$ term reflects not just
technology but resource endowments, climate, institutions, and so on; it may
therefore differ across countries." From this definition, it is obvious that $A_{0i}$ is
correlated with $x_{i,t-1}$, which represents savings and fertility behavior in an
economy. Thus equation (1) represents a dynamic panel data model with
correlated effects. This shows why random-effects estimators are not
appropriate for the growth-convergence equation.
However, there are different ways to specify the correlation between $\mu_i$ and
$x_{i,t-1}$. Mundlak (1971) proposes a simple specification whereby $\mu_i$ is a function
of $\bar{x}_i$, the time mean of $x_{i,t-1}$. This is, however, restrictive and renders the random
effects model equivalent to the fixed effects model, provided the transitory
error term is serially uncorrelated. Hence, a more general specification is
preferable. Following Chamberlain, we adopt the following specification of
$\mu_i$:
i = 0 + 1x` i0 + 2xi1 + + TxT 1 + i,
(6)
where i distributed as N(0, 2). Viewed as a linear predictor, this does not

involve any restriction. Viewed as a conditional expectation function, the only
restriction is linearity.
Almost all researchers have used the Summers-Heston data set to estimate
the growth-convergence equation. This data set has yearly data. However, it is
generally believed that yearly data are not suitable for studying growth,
because business fluctuations are likely to play a larger role in such
data. Most of the panel studies have used five-year averages/panels for
estimation of the model. Accordingly, the value of $\tau$ in this study is set to
five.3
B. Parameter Values
Considered in full, the model presented in equations (1) and (6) has three sets
of parameters. The first consists of the auto-regressive parameter $\gamma$ and the
slope parameter $\beta$. These are the main parameters of interest. The second set
consists of 0, 1, . . . . , T , which arise from specification of the individual
effect term i. In addition, this set includes the time effect terms, ts. The third
set consists of parameters which govern the error terms vit and i.
An important issue in data generation is the specification of the transitory error
term $v_{it}$. A value of $\tau$ of five implies that the $v_{it}$ are five years apart. However, some
possibility of serial correlation in $v_{it}$ still remains. Accordingly, we allow for the
following three possibilities:
1. UC (serially Uncorrelated) process: vit ~ N(0, 2v).
2. MA (1) process: vit = it + i,t 1, with ~ N(0, 2 ).
3. AR (1) process: vit = vi,t 1 + , with ~ N(0, 2 ).
There are two reasons for limiting the order of the MA and AR processes to one.
First, given that the $v_{it}$ are five calendar years apart, orders greater than one are
not very plausible theoretically. Second, even if such higher orders cannot be
ruled out theoretically, the limited value of T does not make them very feasible.
The data used in this chapter range from 1960 to 1985. With $\tau$ equal to five, this
implies five cross-sections in the panel, i.e. T equals five.
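The three generation mechanisms for the transitory error can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `simulate_errors`, the argument name `coef` (standing for the MA or AR coefficient), the innovation standard deviation `sigma`, and the burn-in length are all hypothetical, and the coefficient value 0.2 is illustrative only.

```python
import numpy as np

def simulate_errors(n, t, process="UC", coef=0.0, sigma=1.0, burn=50, seed=0):
    """Draw an (n, t) array of transitory errors under a UC, MA(1) or AR(1) process."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(scale=sigma, size=(n, t + burn))    # normal innovations
    if process == "UC":
        v = eps                                          # serially uncorrelated
    elif process == "MA1":
        v = eps.copy()
        v[:, 1:] += coef * eps[:, :-1]                   # v_t = eps_t + coef * eps_{t-1}
    elif process == "AR1":
        v = np.zeros_like(eps)
        v[:, 0] = eps[:, 0]
        for s in range(1, t + burn):
            v[:, s] = coef * v[:, s - 1] + eps[:, s]     # v_t = coef * v_{t-1} + eps_t
    else:
        raise ValueError("process must be 'UC', 'MA1' or 'AR1'")
    return v[:, burn:]                                   # drop the burn-in periods

v_uc = simulate_errors(96, 5, "UC")
v_ma = simulate_errors(96, 5, "MA1", coef=0.2)
v_ar = simulate_errors(96, 5, "AR1", coef=0.2)
```

Setting `coef` to zero collapses both the MA(1) and AR(1) processes to the uncorrelated case, which gives a simple check on the generator.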
With regard to the parameter values for which to conduct the simulations, we
again follow the principle of customization and let the data determine them.
The following three-step procedure is employed for this purpose. In the first step, we obtain
consistent estimates of $\gamma$ and $\beta$. This is done by an instrumental variable (IV)
regression based on the first-differenced model, using lagged values of $x_{it}$ as
instruments. These consistent estimates of $\gamma$ and $\beta$ are used to compute the
composite residuals $(\eta_t + \mu_i + v_{it})$. In the second step, these residuals are
regressed on the $x_{it}$ and year dummies to get estimates of the coefficients of
equation (6) and of the time effects $\eta_t$. The
residuals from this second-step regression give estimates of the sum of the error
term of equation (6) and $v_{it}$. We can
denote these as $u_{it}$. The third step consists of estimating the parameters of the
MA(1) and AR(1) models from the estimated values of $u_{it}$. We use
Chamberlain's Minimum Distance estimation procedure to do this and get
estimated values of , , and the corresponding values of and .4
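The first step of this procedure, an IV regression on the first-differenced model, can be sketched in Python as follows. The sketch rests on several assumptions that are not in the text: time effects are omitted, the instrument set is just-identified with $x_{i,t-2}$ instrumenting the differenced lagged dependent variable, the function name `fd_iv` is hypothetical, and the simulated data and true values 0.8 and 0.15 are illustrative only.

```python
import numpy as np

def fd_iv(y, x):
    """First-difference IV for y_it = g*y_{i,t-1} + b*x_{i,t-1} + mu_i + v_it.

    y and x are (n, T+1) arrays whose columns are periods 0..T.
    """
    dy = y[:, 1:] - y[:, :-1]                      # column k holds y_{k+1} - y_k
    dx = x[:, 1:] - x[:, :-1]
    dep = dy[:, 1:].ravel()                        # dy_t for t = 2..T
    regs = np.column_stack([dy[:, :-1].ravel(),    # dy_{t-1}, endogenous after differencing
                            dx[:, :-1].ravel()])   # dx_{t-1}
    inst = np.column_stack([x[:, :-2].ravel(),     # x_{t-2}, assumed instrument for dy_{t-1}
                            dx[:, :-1].ravel()])   # dx_{t-1} instruments itself
    coef = np.linalg.solve(inst.T @ regs, inst.T @ dep)
    resid = dep - regs @ coef
    return coef, resid, inst

# Simulated panel with effects correlated with x (illustrative values only)
rng = np.random.default_rng(1)
n, T = 500, 5
g_true, b_true = 0.8, 0.15
mu = rng.normal(size=n)
x = 0.5 * mu[:, None] + rng.normal(size=(n, T + 1))
y = np.empty((n, T + 1))
y[:, 0] = mu / (1.0 - g_true) + rng.normal(size=n)
for s in range(1, T + 1):
    y[:, s] = g_true * y[:, s - 1] + b_true * x[:, s - 1] + mu + rng.normal(size=n)

coef, resid, inst = fd_iv(y, x)
print("estimated (gamma, beta):", coef)
```

Differencing removes the individual effect, and in the just-identified case the IV residuals are exactly orthogonal to the instruments. Residuals in levels, computed with the estimated coefficients, would then feed the second-step regression described in the text.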
In growth-convergence studies, three different samples have been frequently
used. Following Mankiw et al. (1992), these samples are often referred to as the
NONOIL, INTER, and OECD. Of these, the OECD is the smallest and consists
of 22 OECD countries. The NONOIL is the largest and consists of most of the
sizable countries of the world for which oil extraction is not the dominant
economic activity. This sample consists of 96 countries. Finally, the INTER is
an intermediate sample comprised of all those countries included in the
NONOIL sample except those for which data quality is not satisfactory. This
sample consists of 74 countries.
Table 1 gives the values of the parameters that belong to the first and second
set. These are also the parameters that remain the same under different
generation mechanisms of vit.
Certain aspects of these parameter values are worth noting. First, there seems
to be some agreement across samples regarding the direction in which the xits of
different years relate to the individual effect term μi. This is reflected in similar
signs of the λts across samples. However, this agreement is not complete. Second,
the way different time periods affect the growth process differs across samples.
Table 1. Common Parameter Values

Parameter    NONOIL    INTER     OECD
γ            0.7886    0.7925    0.6294
β            0.1641    0.1732    0.0954
λ0           1.3334    1.3588    2.8986
λ1           0.0028    0.1927    0.5863
λ2           0.1200    0.1098    0.6354
λ3           0.1243    0.1644    0.0702
λ4           0.0267    0.1286    0.6355
λ5           0.2277    0.1715    0.3484
η70          0.0171    0.0093    0.0680
η75          0.0156    0.0015    0.0827
η80          0.0067    0.0218    0.1295
η85          0.0669    0.0523    0.1238

This is revealed by the signs of the ηts in different samples. There are some
differences in this regard between the NONOIL and the INTER samples.
However, the difference between these two samples on the one hand, and the
OECD, on the other, proves to be more significant.
Next we turn to the parameter values that differ with the three different
generation mechanisms of vit. The estimated values of these parameters are
compiled in Table 2.
Several things may be noted from this Table. First, the largest estimated
values of θ and ρ, the MA(1) and AR(1) coefficients, are about 0.2 and 0.3,
respectively. This indicates that any
serial dependence that vit may have in the actual data is of fairly low order.5
This in turn suggests that the relative performance of different estimators may
not vary widely across different ways of modeling vit. Second, the variance of
the individual country effect term remains quite stable under alternative
generating schemes of vit in all the samples. Third, the estimate of the
variance of vit also remains very similar across the samples. Fourth, the relative
values of σμ and σv suggest that variation in the individual effect term μi
accounts for a significant part of the overall variation in the data.
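To see why coefficients of about 0.2 and 0.3 imply only weak serial dependence, the standard variance and autocorrelation formulas can be worked out. The block below is a sketch that assumes the MA(1) process is written as v = ε + θ·(lagged ε) and the AR(1) process as v = ρ·(lagged v) + ε; the symbol names θ and ρ are assumptions of this illustration.

```latex
\begin{align*}
\text{MA(1):}\quad v_{it} &= \varepsilon_{it} + \theta\,\varepsilon_{i,t-1}, &
\operatorname{Var}(v_{it}) &= (1+\theta^{2})\,\sigma_{\varepsilon}^{2}, &
\operatorname{Corr}(v_{it}, v_{i,t-1}) &= \frac{\theta}{1+\theta^{2}}
  \approx \frac{0.2}{1.04} \approx 0.19, \\
\text{AR(1):}\quad v_{it} &= \rho\,v_{i,t-1} + \varepsilon_{it}, &
\operatorname{Var}(v_{it}) &= \frac{\sigma_{\varepsilon}^{2}}{1-\rho^{2}}, &
\operatorname{Corr}(v_{it}, v_{i,t-k}) &= \rho^{k}
  \approx 0.3,\ 0.09,\ \dots \quad (k = 1, 2, \dots)
\end{align*}
```

Under these assumptions the MA(1) error is uncorrelated beyond one lag, and the AR(1) autocorrelation falls below 0.1 after two lags, so errors that are two or more five-year periods apart are close to uncorrelated.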
Table 2. Parameter Values for Different Generating Mechanisms of vit

Parameter           NONOIL    INTER     OECD
Uncorrelated vit
σv                  0.1054    0.0872    0.0300
σμ                  0.1281    0.0139    0.0762
MA(1) vit
θ                   0.2037    0.1250    0.1125
σv                  0.1179    0.0990    0.0302
σμ                  0.1225    0.1010    0.0742
σε                  0.1153    0.0980    0.0300
AR(1) vit
ρ                   0.2994    0.1787    0.1394
σv                  0.1227    0.0943    0.0319
σμ                  0.1183    0.0995    0.0742
σε                  0.1171    0.0927    0.0316

C. Data Generation
Once the parameter values are available, data generation can begin. It proceeds
through the following steps. First of all, values of xits are constructed from the
Summers-Heston data set in the way described above.6 This data set also
provides the initial values, y0i. We assume that all disturbance terms are
normally distributed.7 The second step differs for different models of vit. For the
uncorrelated model, random values of vit and μi are generated from the
distributions N(0, σv²) and N(0, σμ²), respectively. These values of vit and μi are
then combined with the given values of yi,t−1 and xi,t−1, and the parameter
values in Table 1, to produce yit. For the first period, the y0is serve as the yi,t−1s. For
the subsequent periods, the value of yit serves as the lagged value of y for
generating yi,t+1. The process continues until the last (T-th) period is reached. For
the MA(1) model, μi is again generated from the distribution N(0, σμ²). However,
generation of vit now requires generation of εit from the distribution N(0, σε²).
These values of εit are then combined with the value of θ to produce the vits.
Generation of the vits for the AR(1) model proceeds in an analogous manner.
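The recursion just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the chapter's simulation program: the function name generate_panel and its arguments are hypothetical, μi is drawn independently of the xits (the role of the λ parameters is not modeled), the MA(1) process is taken to be v = ε + coef·(lagged ε), and the pre-sample values of ε and v are simple start-up choices.

```python
import random


def generate_panel(y0, x, gamma, beta, eta, sigma_mu, sigma_eps,
                   scheme="UC", coef=0.0, rng=None):
    """Generate y_it = gamma*y_{i,t-1} + beta*x_{i,t-1} + mu_i + eta_t + v_it.

    y0:  list of N initial values y_0i
    x:   N lists, each holding the T lagged regressor values x_{i,t-1}
    eta: list of T time effects
    scheme: "UC" (uncorrelated), "MA", or "AR"; coef is the MA or AR coefficient.
    For "UC", sigma_eps plays the role of the standard deviation of v_it.
    Returns N lists of T generated values.
    """
    rng = rng or random.Random(0)
    T = len(eta)
    panel = []
    for i, y_prev in enumerate(y0):
        mu = rng.gauss(0.0, sigma_mu)
        eps_prev = rng.gauss(0.0, sigma_eps)  # assumed pre-sample shock for MA(1)
        v_prev = eps_prev                     # assumed pre-sample start for AR(1)
        ys = []
        for t in range(T):
            eps = rng.gauss(0.0, sigma_eps)
            if scheme == "MA":
                v = eps + coef * eps_prev
            elif scheme == "AR":
                v = coef * v_prev + eps
            else:
                v = eps
            y = gamma * y_prev + beta * x[i][t] + mu + eta[t] + v
            ys.append(y)
            # The generated y_it becomes the lagged value for the next period.
            y_prev, eps_prev, v_prev = y, eps, v
        panel.append(ys)
    return panel
```

One replication would call this once per sample with the corresponding parameter values, and the whole exercise would repeat the call with fresh random draws for each replication.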
Once the data are generated, estimation can proceed. We now turn to the
estimation results.
IV. SIMULATION RESULTS

Given a certain number of cross-sections available (i.e. given T), different panel
data estimators can make use of different numbers of these cross-sections at the
final stage of estimation. In simulation, therefore, it is possible to adopt two
different approaches. One is to keep the actual number of cross-sections used
by the estimators the same by generating varying number of cross-sections for
different estimators. The other is to keep the number of cross-sections
generated the same and let the number of cross-sections actually used in the final
stage of estimation vary across estimators. It is the second situation that
a researcher faces in actual practice. In order to conform to this real situation,
we adopt the second approach. In our particular case, there are five cross-sections
available, namely for 1965, 1970, 1975, 1980, and 1985, and T is five.
We let the actual number of cross-sections used by individual estimators
vary.8
As is known, not all panel estimators are geared to estimation of all the
parameters of the model. Because of this, and also in order not to clutter the
presentation with too many numerical results, we focus here only on results
regarding γ and β. The simulation results presented in this chapter are based on
one thousand replications. In most cases, the Monte Carlo distributions
stabilized with only one hundred replications. Hence increasing the number of
replications any further was not necessary.
The two criteria that are usually used in judging the performance of an estimator
are bias and mean square error (MSE). In order to make assessment easy, we
present tables showing bias and root mean square error (RMSE) in relative
form, i.e. as a percentage of the true parameter value.9 Tables 3 and 4 provide the
relative magnitudes of bias, and Tables 5 and 6 show the relative magnitudes of
root mean square error for the estimates of γ and β, respectively.
These Tables indicate that the relative performance of the estimators varies
across samples and vit generation mechanisms (DGM). To convey an overall
picture, we therefore compute the (algebraic) average of the bias and RMSE for
each estimator. These are row-averages and are presented in the last column of
the Tables. We will first describe the results in terms of these averages and then
consider the inter-sample and inter-DGM variations.
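The two summary measures and the row averages can be computed as in the following sketch. The function names are illustrative, and the input is assumed to be the list of estimates of one parameter obtained across the Monte Carlo replications for a given sample and generation mechanism.

```python
import math


def relative_bias_rmse(estimates, true_value):
    """Bias and RMSE of Monte Carlo estimates, as percentages of the true value."""
    n = len(estimates)
    bias = sum(e - true_value for e in estimates) / n
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    scale = 100.0 / abs(true_value)
    return bias * scale, math.sqrt(mse) * scale


def row_average(values):
    """Algebraic (signed) average over the sample/DGM cells of one estimator."""
    return sum(values) / len(values)
```

Because the row average is algebraic, positive and negative biases in different cells offset each other, so an estimator can show a small average bias while being noticeably biased in individual cells. The relative RMSE does not suffer from this cancellation.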
Beginning with γ, we may first consider the results regarding bias. Table 3
shows that the OLS estimates of γ are, as expected, positively biased, and this
bias averages seventeen percent. The panel estimates of γ, on the other hand
and as expected, are negatively biased. The only exception in this regard is the
AH(d) estimator, which displays a small positive bias when vit is generated under
the uncorrelated (UC) scheme. However, the average bias is negative for this
estimator too. We refrain from reporting results for the AH(l) estimator because
of its very poor performance. (We will come to this issue shortly.) Among the
panel estimators, the bias is smaller for the AH(d), the LSDV, and the MD
estimators, ranging between five and six percent. These are followed by the SE
estimators, for which this bias ranges between eight and ten percent. The largest
bias, about twenty-two percent, is associated with the ABGMM estimators.
Table 5 shows that the RMSE in estimating γ has a similar pattern. The
average RMSE for the OLS estimator stands at seventeen percent. For the
LSDV and the MD estimators, this ratio lies between six and seven percent. For
the AH(d) estimator the ratio averages eleven percent. For the SE estimators,
this ratio lies between thirteen and twenty percent. For the ABGMM estimators,
this ratio ranges from thirty-nine to forty-seven percent.
Looking at the bias results for β (Table 4), we see that the OLS estimates are
again severely biased upwards, with the bias now averaging forty-eight
percent. The direction of bias of the panel estimators is mixed. But the panel
procedures yield estimates that are on average quite close to the true parameter
values. The absolute value of this bias for the panel estimators ranges from
under one to seven percent. Within this range, however, the LSDV, the MD, the
2SLS, and the 3SLS estimators perform better, with the average bias being less
than one percent. Next come the AH(d) and the G3SLS estimators, with a
bias ranging between one and two percent. The largest biases, ranging between
five and seven percent, are recorded for the ABGMM estimators.
The smallness of the average biases of the panel estimates of β is however
swamped by large variances of the Monte Carlo distributions. This finds
reflection in the large relative RMSE values reported in Table 6. The ratio of
RMSE to true value of β for the OLS estimator stands at fifty-seven percent.
For most of the panel estimators this ratio is much lower. For the LSDV and the
MD estimators, this ratio is close to twenty-four percent. For the AH(d), the
2SLS, and 3SLS estimators, the ratio lies between thirty-two and thirty-five
percent. The G3SLS estimator displays a higher ratio, fifty-six percent, which
is close to that observed for the OLS estimator. For the ABGMM estimators,
however, this ratio ranges from 178 to 217 percent, which is much higher than
that for the OLS.

Table 3. Bias as Percentage of True Parameter Value
For γ in the model: yit = γyi,t−1 + βxi,t−1 + μi + ηt + vit

           NONOIL                  INTER                   OECD                    Row
Estimator  UC      MA(1)   AR(1)   UC      MA(1)   AR(1)   UC      MA(1)   AR(1)   Average
OLS        14.8    14.6    14.8    15.2    15.2    15.4    21.5    20.9    21.2    17.1
LSDV       8.0     8.2     7.9     8.4     9.3     8.0     1.6     1.7     1.4     6.1
AH(l)      nr      nr      nr      nr      nr      nr      nr      nr      nr      nr
AH(d)      0.4     14.5    15.9    0.2     9.5     10.0    0.6     1.2     1.6     5.7
AGMM1      10.7    10.4    10.6    44.4    49.5    43.4    9.5     8.6     8.3     21.7
AGMM2      9.7     10.1    10.2    47.3    49.5    44.4    8.6     8.2     8.5     21.8
2SLS       9.3     9.3     8.6     3.1     3.1     2.8     18.8    15.8    17.1    9.8
3SLS       4.5     3.3     6.5     5.8     7.9     5.3     12.7    12.2    12.2    7.8
G3SLS      6.0     5.4     5.2     8.3     10.1    8.8     19.9    16.5    13.4    10.4
MD         6.7     6.9     6.4     6.9     7.9     6.7     1.3     1.1     1.2     5.0

Notes:
1. The true values of γ are different for different samples and are provided in Table 1.
2. Row Average is the algebraic average of the numbers in the row.
3. The NONOIL, INTER, and OECD are different samples, and UC, MA, and AR refer to the Uncorrelated, Moving Average, and Autoregressive
generation mechanisms of the transitory error vit.
4. nr stands for Not Reported, because these numbers generally prove to be too large.

Table 4. Bias as Percentage of True Parameter Value (for β)

           NONOIL                  INTER                   OECD                    Row
Estimator  UC      MA(1)   AR(1)   UC      MA(1)   AR(1)   UC      MA(1)   AR(1)   Average
OLS        31.4    32.1    31.6    11.8    11.7    11.1    100.0   99.9    100.5   47.8
LSDV       1.0     0.7     0.3     0.5     1.4     1.1     1.5     0.7     2.1     0.1
AH(l)      nr      nr      nr      nr      nr      nr      nr      nr      nr      nr
AH(d)      0.4     2.2     4.0     0.6     1.1     1.0     1.7     2.5     0.4     1.5
AGMM1      13.7    14.4    14.5    7.5     16.3    26.4    7.3     5.4     2.6     6.9
AGMM2      3.9     5.2     14.7    3.1     22.1    34.5    17.3    19.9    1.5     5.3
2SLS       2.3     2.7     1.9     2.7     2.0     2.5     0.8     3.9     3.9     0.9
3SLS       0.2     0.2     2.0     2.0     2.3     9.3     5.1     4.5     8.9     0.8
G3SLS      1.3     2.4     1.7     2.0     2.2     8.7     14.2    8.1     2.0     1.4
MD         0.2     0.7     0.8     1.0     0.5     0.0     0.6     0.6     1.3     0.03

Table 5. Root MSE as Percentage of True Parameter Value (for γ)

           NONOIL                  INTER                   OECD                    Row
Estimator  UC      MA(1)   AR(1)   UC      MA(1)   AR(1)   UC      MA(1)   AR(1)   Average
OLS        15.0    14.8    14.9    15.3    15.3    15.3    22.3    21.7    22.0    17.4
LSDV       8.5     8.7     8.5     8.9     9.9     8.7     3.5     3.6     3.6     7.1
AH(l)      nr      nr      nr      nr      nr      nr      nr      nr      nr      nr
AH(d)      8.3     16.6    17.6    5.4     13.9    13.3    7.3     7.2     7.5     10.8
AGMM1      27.7    27.5    26.7    64.8    70.4    65.9    24.3    21.9    23.7    39.2
AGMM2      29.6    29.3    28.9    79.7    84.9    77.1    32.9    29.1    31.0    46.9
2SLS       12.1    12.6    12.0    5.1     5.4     5.0     24.3    21.3    23.0    13.4
3SLS       8.5     14.7    10.4    9.6     11.1    8.9     28.4    28.4    23.6    16.0
G3SLS      10.0    18.1    8.7     11.9    13.8    12.6    40.9    37.6    29.3    20.3
MD         7.4     7.8     7.4     7.6     8.7     7.6     3.0     3.1     3.2     6.2

Table 6. Root MSE as Percentage of True Parameter Value (for β)

           NONOIL                  INTER                   OECD                    Row
Estimator  UC      MA(1)   AR(1)   UC      MA(1)   AR(1)   UC      MA(1)   AR(1)   Average
OLS        34.6    35.2    34.7    18.8    18.1    17.7    117.4   116.6   116.0   56.6
LSDV       12.8    15.3    15.4    12.4    14.5    14.3    40.1    44.9    43.8    23.7
AH(l)      nr      nr      nr      nr      nr      nr      nr      nr      nr      nr
AH(d)      19.9    20.1    19.1    20.3    21.2    18.1    64.9    60.2    62.1    34.0
AGMM1      147.0   151.9   145.3   148.0   153.8   143.6   243.5   237.9   226.5   177.5
AGMM2      169.6   169.7   165.1   187.7   205.2   181.9   306.8   284.6   284.1   217.2
2SLS       17.1    18.4    17.5    19.5    20.8    19.3    58.3    54.7    57.9    31.5
3SLS       13.7    21.9    16.1    16.5    15.6    17.8    67.2    64.9    82.5    35.1
G3SLS      17.5    28.2    15.4    17.8    18.5    23.5    119.4   111.1   149.6   55.7
MD         13.3    15.4    15.8    12.6    15.1    14.4    40.5    44.6    45.6    24.1

Notes:
1. The true values of β are different for different samples and are provided in Table 1.
2. Row Average is the algebraic average of the numbers in the row.

These results show that the OLS estimation of the growth-convergence
equation is very likely to produce significantly biased estimates. The
performance of the panel estimators, on the other hand, varies. The LSDV and
the MD estimators perform well. The SE estimators come next in performance.
The AH estimators display very contrasting performance. The AH(l) estimator
performs so poorly that we refrain from presenting its results. On the other hand,
the AH(d) estimator sometimes performs better than the SE estimators. The
ABGMM estimators are found to display large bias and RMSE.
These results agree with recent Monte Carlo evidence produced by other
researchers in other contexts. For example, several studies have reported bias of
the ABGMM estimators. Other studies have reported good small sample
performance of the LSDV estimator. These results imply that OLS
estimation of the growth-convergence equation should be avoided. Indiscriminate
use of panel estimators is also fraught with danger. However, a judicious
choice of panel estimator can yield better estimates of the parameters of the
growth-convergence equation. Empirical growth researchers can make use of
this possibility.
Beyond these results of immediate concern, the study brings out several
general points. The first of these concerns the contrasting performance of the
AH estimators. Both these estimators rely on the assumption of orthogonality
of lagged yit to vit. This assumption holds only when vit is serially uncorrelated.
Therefore, one would expect both these estimators to perform well when vit is
serially uncorrelated, and both of them to perform poorly when vit follows
either the AR(1) or the MA(1) pattern. However, as the numbers in the Tables
show, the AH(d) performs relatively well under all different generation
mechanisms of vit and for all samples, while the performance of AH(l) is found
to be unsatisfactory under all different generation mechanisms of vit and for all
samples, particularly for the NONOIL and the INTER samples. The
explanation, as it turns out, lies in the difference in the degree of correlation of
the instruments with the instrumented variables. It is found that (yi,t−2 − yi,t−3),
the instrument used by the AH(d), is strongly correlated with the explanatory
variable (yi,t−1 − yi,t−2), while yi,t−2, the instrument used by the AH(l), is very
poorly correlated with (yi,t−1 − yi,t−2). This poor correlation finds reflection in
astronomically large standard errors for the AH(l) estimates. These
results reconfirm that instruments need to be sufficiently correlated with
the instrumented variable (in addition to being uncorrelated with the error), and
highlight the importance of the research on estimation with weak instruments.10
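A simple way to check instrument relevance of this kind is to compute the pooled correlation between each candidate instrument and the instrumented variable. The sketch below is only a diagnostic under stated assumptions: the function names are hypothetical, each unit's series is assumed to be stored as yi0, ..., yiT, and observations are pooled over all units and all periods for which the required lags exist. It makes no claim about which instrument will turn out stronger in any particular data set.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)


def ah_instrument_correlations(y):
    """Correlation of (y_{t-1} - y_{t-2}) with the two AH instruments.

    y: N lists, each holding y_i0, ..., y_iT.
    Returns (corr with level instrument y_{t-2},
             corr with difference instrument y_{t-2} - y_{t-3}).
    """
    regressor, level_iv, diff_iv = [], [], []
    for series in y:
        for t in range(3, len(series)):
            regressor.append(series[t - 1] - series[t - 2])
            level_iv.append(series[t - 2])
            diff_iv.append(series[t - 2] - series[t - 3])
    return pearson(regressor, level_iv), pearson(regressor, diff_iv)
```

A correlation close to zero for one of the instruments would signal the weak-instrument problem described above, and one would then expect very large standard errors for the corresponding IV estimates.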
A second point concerns the performance of the ABGMM estimators as well
as the AH(d) estimator. The performance of these estimators does not vary that
much over the three generation mechanisms of vit. This is particularly true with
regard to estimation of γ. This is somewhat surprising because these estimators
depend rather heavily for their validity on orthogonality of lagged values of yit
to vit, and this orthogonality is violated when vit follows either an AR or a MA
scheme. It is true that the order of serial correlation is low. However, one would
expect some effect of the serial correlation given that it nullifies validity of so
many instruments. Actually, the AH(d) estimator does show some sensitivity
with respect to the generation scheme of vit. Why the ABGMM estimators do
not display similar sensitivity is an intriguing question.
The third point relates to the variation of performance of the estimators
across samples. The overall picture portrayed above is on the basis of average
over samples and DGMs. Looking at inter-sample variation, however, it is
difficult to establish a pattern. For example, going by the results on bias of
estimated γ, the performance of the OLS estimator deteriorates for the OECD
when compared with that for either the NONOIL or the INTER samples.
However, in case of the LSDV and the MD estimators, the opposite is true. The
ABGMM and the SE estimators show a yet different kind of contrast. The
performance of the ABGMM estimators deteriorates for the INTER sample in
comparison with that for either the NONOIL or the OECD samples. In case of
the SE estimators, the opposite is true. The contrasting performance of the
ABGMM and the SE estimators may not be entirely surprising in view of the
fact that while the former depends on lagged yits as instruments, the SE
estimators rely entirely on the xits.
The fourth point concerns the relative performance of simple and sophisticated
versions of generically similar estimators. The averaged RMSE values
presented in Tables 5 and 6 show that the simpler 2SLS estimator outperforms
the 3SLS and the G3SLS. Similarly, in terms of these averaged values, the
ABGMM1 outperforms the ABGMM2.11 This highlights the fact that
sophisticated estimators requiring estimated weighting matrices may not
necessarily perform better than their simpler counterparts that do not
require such matrices. Estimation of these weighting matrices creates
additional scope for noise to enter the estimation process, and that may nullify
the potential gain.
The final point concerns the performance of the LSDV estimator. As is
known, for a dynamic panel data model, the LSDV is inconsistent in the
direction of N. It is true that the LSDV estimator is consistent in the direction of T.
However, T in this study is too small to make one a priori hopeful of the benefit
of T-asymptotics. The results of this chapter regarding LSDV estimates show
that even theoretically inconsistent estimators can have good small sample
properties. This reinforces the importance of Monte Carlo studies.
V. CONCLUDING REMARKS
The issue of small sample properties of dynamic panel estimators is important.
Both substantive and methodological conclusions often depend on attention
given to this issue. For example, Caselli et al. (1996) reject the Solow model
based on their results from estimation of the growth-convergence equation
using a variant of the ABGMM estimator. The small sample bias of this
estimator reported in this and other studies may raise the question whether such
a rejection was too quick. Also, the estimation results prompt the authors to
abandon the strictly model-based specification in favor of an extended version
that includes a variety of variables based on heuristic reasoning. From a
methodological point of view, this is a throwback to the earlier stage of
cross-country growth research when specifications used to be informal, and the
coefficient of the regressions did not have exact correspondence with the
structural parameters of the production function. One of the great merits of
Mankiw, Romer & Weil (1992) and Barro & Sala-i-Martin (1992) was to put
an end to this stage. Methodologically, therefore, a return to informal
specifications may not be the ideal thing to do. A more satisfactory solution is
perhaps to adopt a two-stage analysis, with the first stage adhering to the
formal, model-based specification and yielding unbiased estimates of parameters and productivity. The second stage may focus on the role of the heuristic
variables in explaining productivity differences. However, this requires
attention to the issue of small sample performance of the estimator used in the
first stage.
NOTES
1. For a derivation of the growth-convergence equation, see Barro & Sala-i-Martin
(1992, 1995), Mankiw, Romer & Weil (1992), and Mankiw (1995). For conversion of
the growth-convergence equation into a dynamic panel data model, see Islam (1993,
1995).
2. For discussions of many of these new estimators, see Baltagi (1995) and Hsiao
(1986).
3. This is the value of τ that has been used in Islam (1993, 1995), Knight et al. (1993),
Caselli et al. (1996) and in several other papers.
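4. For example, for the MA(1) model, this starts by noticing that E(uiui′) has the
following structure:

\[
E(u_i u_i') =
\begin{pmatrix}
\sigma_\mu^2 + (1+\theta^2)\sigma_\varepsilon^2 & \sigma_\mu^2 + \theta\sigma_\varepsilon^2 & \sigma_\mu^2 & \sigma_\mu^2 & \sigma_\mu^2 \\
\sigma_\mu^2 + \theta\sigma_\varepsilon^2 & \sigma_\mu^2 + (1+\theta^2)\sigma_\varepsilon^2 & \sigma_\mu^2 + \theta\sigma_\varepsilon^2 & \sigma_\mu^2 & \sigma_\mu^2 \\
\sigma_\mu^2 & \sigma_\mu^2 + \theta\sigma_\varepsilon^2 & \sigma_\mu^2 + (1+\theta^2)\sigma_\varepsilon^2 & \sigma_\mu^2 + \theta\sigma_\varepsilon^2 & \sigma_\mu^2 \\
\sigma_\mu^2 & \sigma_\mu^2 & \sigma_\mu^2 + \theta\sigma_\varepsilon^2 & \sigma_\mu^2 + (1+\theta^2)\sigma_\varepsilon^2 & \sigma_\mu^2 + \theta\sigma_\varepsilon^2 \\
\sigma_\mu^2 & \sigma_\mu^2 & \sigma_\mu^2 & \sigma_\mu^2 + \theta\sigma_\varepsilon^2 & \sigma_\mu^2 + (1+\theta^2)\sigma_\varepsilon^2
\end{pmatrix}
\]

where ui = (ui1, ui2, . . . , uiT)′, and T = 5. As expected, E(uiui′) has three parameters,
namely σμ, σε, and θ. The sample analog of this covariance matrix is obtained from
(1/N) Σi ûiûi′, where ûi = (ûi1, . . . , ûiT)′, and the ûits are obtained from the second step. There
are T(T + 1)/2 = 15 distinct elements in this sample covariance matrix, which are
(nonlinear) functions of the three underlying parameters σμ, σε, and θ. Estimates of σμ, σε,
and θ can be obtained from these 15 elements using the MD estimation framework.
For details, see Chamberlain (1982, 1983). An analogous procedure is followed for the
AR(1) model to obtain the estimates of σμ, σε, and ρ. Estimation of σv and σμ for the
UC case is easier.
5. Perhaps also of interest is that the values of both θ and ρ are the largest in the
NONOIL sample and the smallest in the OECD sample, with the values for the INTER
sample being in between.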
6. For further details on construction of the xits, see Islam (1995).
7. In this study we have limited ourselves to parametric distributions of the
disturbance term. In principle it is possible to do away with parametric assumptions. We
leave this as a future task.
8. To save space, we do not provide detailed description of the estimators. Many of
these are well known. For the rest, the interested reader can see the cited references. An
appendix containing the description of the estimators is also available from the author
upon request.
9. In this chapter we report only the summary results. The detailed results are in a set
of Appendix Tables, which are available upon request.
10. See for example Nelson & Startz (1990), Staiger & Stock (1997), and Wang &
Zivot (1998).
11. To be sure, this ranking does not hold for every sample and every DGM. For
example, in the NONOIL sample, regardless of the DGM, results from the 3SLS and the
G3SLS estimators seem to be better than those from the 2SLS. For the INTER sample,
however, the 2SLS seems to perform better than either the 3SLS or the G3SLS. In the case
of the OECD sample, the situation is less clear cut. In terms of the mean of the Monte
Carlo distribution, the 3SLS and the G3SLS fare better than the 2SLS, though not in
terms of dispersion. On the other hand, in the OECD sample, the Monte Carlo
distributions for the 2SLS estimator have very large standard deviations. One reason for the
deterioration of performance of the 3SLS and the G3SLS estimators in the INTER and
the OECD samples, when compared to that in the NONOIL sample, may lie in sample
size. The sizes of the former samples are smaller than that of the latter. Since the
superiority of the 3SLS and the G3SLS over the 2SLS estimator is an asymptotic result,
a larger sample size may help this result to surface.
ACKNOWLEDGMENTS
I would like to thank Professor Chamberlain, Professor Jorgenson, and
Professor Guido Imbens for their guidance to my work on this paper. Initial
versions of this chapter were presented in seminars at Harvard University and
Emory University. Comments of the participants of these seminars are greatly
appreciated. I would like to extend my sincere thanks to the three referees and
the editor, Professor Badi Baltagi, for their comments and suggestions that led
to significant improvement of this chapter. All remaining errors are mine.
REFERENCES
Ahn, S. C., & Schmidt, P. (1997). Efficient Estimation of Dynamic Panel Models: Alternative
Assumptions and Simplified Estimation. Journal of Econometrics, 76, 309–321.
Ahn, S. C., & Schmidt, P. (1999). Estimation of Linear Panel Data Models Using GMM. In:
L. Matyas (Ed.), Generalized Method of Moments Estimation. Cambridge: Cambridge
University Press.
Alonso-Borrego, C., & Arellano, M. (1999). Symmetrically Normalized Instrumental-Variable
Estimation Using Panel Data. Journal of Business and Economic Statistics, 17, 36–49.
Anderson, T. W., & Hsiao, C. (1981). Estimation of Dynamic Models with Error Components.
Journal of American Statistical Association, 76, 598–606.
Anderson, T. W., & Hsiao, C. (1982). Formulation and Estimation of Dynamic Models Using
Panel Data. Journal of Econometrics, 18, 47–82.
Arellano, M., & Bond, S. (1991). Some Tests of Specification for Panel Data: Monte Carlo
Evidence and an Application to Employment Equations. The Review of Economic Studies,
58, 277–297.
Arellano, M., & Bover, O. (1995). Another Look at the Instrumental Variable Estimation of Error
Components Models. Journal of Econometrics, 68, 29–52.
Balestra, P., & Nerlove, M. (1966). Pooling Cross-section and Time Series Data in the Estimation
of a Dynamic Model: The Demand of Natural Gas. Econometrica, 34, 585–612.
Baltagi, B. H. (1995). Econometric Analysis of Panel Data. New York: John Wiley and Sons.
Baltagi, B. H., & Kao, C. (2000). Non-stationary Panels, Cointegration in Panels, & Dynamic
Panels: A Survey. Advances in Econometrics, 15 (this volume).
Barro, R. (1997). Determinants of Economic Growth: A Cross-country Empirical Study.
Cambridge: MIT Press.
Barro, R., & Sala-i-Martin, X. (1992). Convergence. Journal of Political Economy, 100(2),
223–251.
Barro, R., & Sala-i-Martin, X. (1995). Economic Growth. Boston: McGraw Hill.
Bekker, P. A. (1994). Alternative Approximations to the Distributions of Instrumental Variable
Blundell, R., & Bond, S. (1998). Initial Conditions and Moment Restrictions in Dynamic Panel
Caselli, F., Esquivel, G., & Lefort, F. (1996). Reopening the Convergence Debate: A New Look
at Cross-country Growth Empirics. Journal of Economic Growth, 1(3), 363–390.
Chamberlain, G. (1982). Multivariate Regression Models for Panel Data. Journal of Econometrics,
18, 5–46.
Chamberlain, G. (1983). Panel Data. In: Z. Griliches & M. Intriligator (Eds), Handbook of
Econometrics (pp. 1247–1318), Vol. II. North-Holland.
Hahn, J. (1999). How Informative is the Initial Condition in the Dynamic Panel Model with Fixed
Effects? Journal of Econometrics, 93, 309–326.
Harris, M. N., & Matyas, L. (1996). A Comparative Analysis of Different Estimators for Dynamic
Panel Data Models. Working paper: 04/96, Department of Econometrics and Business
Statistics, Monash University.
Harris, M., Longmire, R., & Matyas, L. (1996). Robustness of Estimators for Dynamic Panel Data
Models to Misspecification. Working paper No. 14/96, Department of Econometrics and
Business Statistics, Monash University.
Islam, N. (1993). Estimation of Dynamic Models from Panel Data. Unpublished Ph.D.
Dissertation, Department of Economics, Harvard University.
Islam, N. (1995). Growth Empirics: A Panel Data Approach. Quarterly Journal of Economics, CX,
1127–1170.
Judson, R. A., & Owen, A. L. (1997). Estimating Dynamic Panel Data Models: Practical Guide
for Macroeconomists. Board of Governors of the Federal Reserve System, Finance and
Economics Discussion Paper Series 1997/03.
Kiviet, J. (1995). On Bias, Inconsistency, & Efficiency of Various Estimators in Dynamic Panel
Knight, M., Loayza, N., & Villanueva, D. (1993). Testing for Neoclassical Theory of Growth. IMF
Staff Papers, 40(3), 512–541.
Lee, K., Pesaran, H., & Smith, R. (1997). Growth and Convergence in a Multi-Country Empirical
Stochastic Growth Model. Journal of Applied Econometrics, 12, 357–392.
Lee, K., Pesaran, H., & Smith, R. (1998). Growth Empirics: A Panel Data Approach - A
Comment. Quarterly Journal of Economics, CXIII, 319–323.
Lee, M., Longmire, R., Matyas, L., & Harris, M. (1998). Growth Convergence: Some Panel
Evidence. Applied Economics, 30, 907–912.
Mankiw, N. G. (1995). The Growth of Nations. Brookings Papers on Economic Activity, 1,
275–310.
Mankiw, N. G., Romer, D., & Weil, D. (1992). A Contribution to the Empirics of Growth.
Quarterly Journal of Economics, CVII, 407–437.
Matyas, L. (Ed.) (1999). Generalized Method of Moments Estimation. Cambridge: Cambridge
University Press.
Mundlak, Y. (1971). On the Pooling of Time Series and Cross-section Data. Econometrica, XXXVI,
69–85.
Nelson, C. R., & Startz, R. (1990). Some Further Results on the Exact Small Sample Properties
of the Instrumental Variables Estimator. Econometrica, 58, 967–976.
Nerlove, M. (1967). Experimental Evidence on the Estimation of Dynamic Economic Relations
from a Time Series of Cross-sections. Economic Studies Quarterly, 18, 42–74.
Nerlove, M. (1971). Further Evidence on the Estimation of Dynamic Economic Relations from a
Time Series of Cross-sections. Econometrica, 39, 383–396.
Nerlove, M. (1999). Properties of Alternative Estimators of Dynamic Panel Models: An Empirical
Analysis of Cross-country Data for the Study of Economic Growth. In: C. Hsiao, K. Lahiri,
L. Lee & M. Pesaran (Eds), Analysis of Panel and Limited Dependent Variable Models.
Nickell, S. (1979). Biases in Dynamic Models with Fixed Effects. Econometrica, 49, 1399–1416.
Staiger, D., & Stock, J. H. (1997). Instrumental Variable Regressions with Weak Instruments.
Summers, R., & Heston, A. (1988). A New Set of International Comparisons of Real Product and
Price Levels Estimates for 130 Countries, 1950–85. Review of Income and Wealth, XXXIV,
1–26.
Summers, R., & Heston, A. (1991). The Penn World Table (Mark 5): An Expanded Set of
International Comparisons, 1950–1988. Quarterly Journal of Economics, 106, 327–368.
Wang, J., & Zivot, E. (1998). Inference on Structural Parameters in Instrumental Variables
Regression with Weak Instruments. Econometrica, 66(6), 1389–1404.
Wansbeek, T. J., & Knaap, T. (1998). Estimating a Dynamic Panel Data Model with Heterogenous
Trends. Working paper, Department of Economics, University of Groningen.
Ziliak, J. P. (1997). Efficient Estimation with Panel Data When Instruments are Predetermined: An
Empirical Comparison of Moment-Condition Estimators. Journal of Business and

Nonstationary Panels, Panel Cointegration, and Dynamic Panels PDF

Uploaded by

Copyright:

Available Formats

You might also like

Nonstationary Panels, Panel Cointegration, and Dynamic Panels PDF

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Nonstationary Panels, Panel Cointegration, and Dynamic Panels PDF

Uploaded by

Copyright:

Available Formats

LIST OF CONTRIBUTORS

Texas A&M University, Department of

Department of Economics and International

Institute for Fiscal Studies and University

Institute for Fiscal Studies and Nuffield

Humboldt University Berlin, Institute of

National Cheng-Kung University, Institute of

University Maastricht, Department of

Department of Economics, Emory University,

Syracuse University, Center for Policy

University of Helsinki, Department of

Department of Economics, Texas A&M

Department of Economics, University of

University Maastricht, Department of

Department of Economics, University of

Indiana University, Department of

Department of Economics, University of

University Maastricht, Department of

Institute for Fiscal Studies, 7 Ridgmount

Department of Finance and Managerial

Department of Economics, State University

