Professional Documents
Culture Documents
Grouped Patterns of Heterogeneity in Panel Data
Grouped Patterns of Heterogeneity in Panel Data
1. INTRODUCTION
THERE IS AMPLE EVIDENCE that workers, firms, or countries differ in many
dimensions that are unobservable to the econometrician. In practice, applied
researchers face a trade-off between using flexible approaches to model un-
observed heterogeneity, and building parsimonious specifications that are well
adapted to the data at hand. The goal of this paper is to propose a flexible yet
parsimonious approach to allow for unobserved heterogeneity in a panel data
context.
A common approach is to model heterogeneity as unit-specific, time-
invariant fixed-effects. Fixed-effects approaches are attractive as they allow for
unrestricted correlation between unobserved effects and covariates. However,
in models with as many parameters as individual units, estimates of common
parameters are subject to an “incidental parameter” bias that may be substan-
tial in short panels (Nickel (1981)), and the fixed-effects themselves are often
poorly estimated. In addition, standard fixed-effects approaches are arguably
restrictive as they assume that unobserved heterogeneity is constant over time.
This paper proposes a framework that allows for clustered time patterns of
unobserved heterogeneity that are common within groups of individuals. The
group-specific time patterns and individual group membership are left unre-
1
We thank a co-editor and two anonymous referees for their help in improving the paper.
We also thank Daron Acemoglu, Daniel Aloise, Dante Amengual, Manuel Arellano, Jushan Bai,
Alan Bester, Fabio Canova, Victor Chernozhukov, Denis Chetverikov, David Dorn, Kirill Evdoki-
mov, Iván Fernández-Val, Lars Hansen, Jim Heckman, Han Hong, Bo Honoré, Andrea Ichino,
Jacques Mairesse, Mónica Martínez-Bravo, Serena Ng, Taisuke Otsu, Eleonora Patacchini, Peter
Phillips, Jean-Marc Robin, Enrique Sentana, Martin Weidner, and seminar participants at various
venues for useful comments. Support from the European Research Council/ERC Grant agree-
ment n0 263107 and from the Ministerio de Economía y Competitividad Grant ECO2011-26342
is gratefully acknowledged. All errors are our own.
stricted, and are estimated from the data. In particular, as in fixed-effects, our
time-varying specification allows for general forms of covariates endogeneity.
The main assumption is that the number of distinct individual time patterns of
unobserved heterogeneity is relatively small.
A simple linear model with grouped patterns of heterogeneity takes the fol-
lowing form:
(1) yit = xit θ + αgi t + vit i = 1 N t = 1 T
where the covariates xit are contemporaneously uncorrelated with vit , but may
be arbitrarily correlated with the group-specific unobservables αgi t . The group
membership variables gi ∈ {1 G} and the group-specific time effects αgt ,
for g ∈ {1 G}, are unrestricted. Units in the same group share the same
time profile αgt (e.g., all i such that gi = 1 share the profile α1t ). The number
of groups G is to be set or estimated by the researcher. Beyond model (1),
we study in detail two extensions: a model with additive time-invariant fixed-
effects ηi in addition to the time-varying grouped effects αgi t , and a model with
group-specific coefficients θgi .
Potential applications of model (1) and its extensions include social interac-
tion models for panel data where group-level interactions are subsumed in αgi t
(Blume, Brock, Durlauf, and Ioannides (2011)), or tests of full risk-sharing in
village economies (Townsend (1994)). Unlike most applications of social inter-
actions and risk-sharing, our approach allows to estimate the reference groups,
under the assumption that group membership remains constant over time. In
a different perspective, grouped patterns of heterogeneity can be useful to
model interdependence across individual units over time. Compared with ex-
isting spatial dependence models for panel data (e.g., Sarafidis and Wansbeek
(2012)), model (1) allows the researcher to estimate the spatial weights matrix.
Our estimator, which we will refer to as “grouped fixed-effects” (GFE), is
based on an optimal grouping of the N cross-sectional units, according to a
least squares criterion. Units whose time profiles of outcomes—net of the ef-
fect of covariates—are most similar are grouped together in estimation. In the
absence of covariates in model (1), the estimation problem coincides with the
standard minimum sum-of-squares partitioning problem, and a simple com-
putational method is given by the “kmeans” algorithm (Forgy (1965), Steinley
(2006)). We take advantage of recent advances in the clustering literature to
build fast and reliable computational routines.2
We derive the statistical properties of the grouped fixed-effects estimator in
an asymptotic where N and T tend to infinity simultaneously. In our frame-
work, N can grow substantially faster than T , in contrast with models with
unit-specific fixed-effects. While fixed-effects estimators generally suffer from
an O(1/T ) bias as N/T tends to a constant (Arellano and Hahn (2007)), we
2
Stata codes, which make use of a Fortran executable program, are available as Supplemental
Material (Bonhomme and Manresa (2015)).
14680262, 2015, 3, Downloaded from https://onlinelibrary.wiley.com/doi/10.3982/ECTA11319 by Erasmus University Rotterdam Universiteitsbibliotheek, Wiley Online Library on [08/04/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
GROUPED PATTERNS OF HETEROGENEITY 1149
show that the GFE estimator is consistent and asymptotically normal as N/T ν
tends to zero for some ν > 0, provided groups are well separated and errors vit
satisfy suitable tail and dependence conditions. This property, which has also
been noted in models with time-invariant discrete heterogeneity (Hahn and
Moon (2010)), is a consequence of group classification improving very quickly
as the number of time periods increases. In particular, our results provide a
formal justification for clustering methods.3
As the two dimensions of the panel diverge, the GFE estimator is asymp-
totically equivalent to the infeasible least squares estimator with known popu-
lation groups. As a consequence, in a large-T perspective standard errors are
unaffected by the fact that group membership has been estimated. In short
panels, however, group misclassification may contribute to the finite-sample
dispersion of the estimator. For this reason, we also study the properties of
the GFE estimator for fixed T as N tends to infinity. In a Monte Carlo exer-
cise calibrated to the empirical application, we provide evidence that using the
GFE estimator in combination with an estimate of its fixed-T variance yields
reliable inference for the population parameters.
We use our approach to study the effect of income on democracy in a panel
of countries that spans the last part of the twentieth century. In an influen-
tial paper, Acemoglu, Johnson, Robinson, and Yared (2008) found that the
positive association between income and democracy disappears when control-
ling for additive country- and time-effects. They interpreted the country fixed-
effects as reflecting long-run, historical factors that have shaped the political
and economic development of countries.
In the context of this application, the grouped fixed-effects model allows for
time-varying unobservables in a period that is characterized by a large number
of transitions to democracy, and it is well suited to deal with the short length
of the panel (T = 7). Grouped patterns are also consistent with the empir-
ical observation that regime types and transitions tend to cluster in time and
space (e.g., Gleditsch and Ward (2006), Ahlquist and Wibbels (2012)). An early
conceptual framework was laid out in Huntington’s (1991) work on the “third
wave of democracy,” which argues that international and regional factors—
such as the influence of the Catholic Church or the European Union—may
have induced grouped patterns of democratization. We find robust evidence of
heterogeneous, group-specific paths of democratization in the data.
3
Related results in statistics include Pollard (1981, 1982) on minimum sum-of-squares parti-
tioning, and Bryant and Williamson (1978) in a class of likelihood models. To our knowledge, ours
is the first paper to establish conditions under which the clustering estimator in kmeans becomes
consistent as N and T tend to infinity.
14680262, 2015, 3, Downloaded from https://onlinelibrary.wiley.com/doi/10.3982/ECTA11319 by Erasmus University Rotterdam Universiteitsbibliotheek, Wiley Online Library on [08/04/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
1150 S. BONHOMME AND E. MANRESA
4
See the monographs by McLachlan and Peel (2000) and Frühwirth-Schnatter (2006) for re-
cent advances in this area.
5
In this sense, our approach is reminiscent of sparsity assumptions in the literature on high-
dimensional modeling (e.g., Tibshirani (1996)).
6
Group models and clustering approaches have also been used to search for “convergence
clubs” in the empirical growth literature; see, for example, Canova (2004), and Phillips and Sul
(2007). See also Sun (2005). Similar techniques have been proposed in the statistical analysis of
network data (e.g., Bickel and Chen (2009), Choi, Wolfe, and Airoldi (2012)).
14680262, 2015, 3, Downloaded from https://onlinelibrary.wiley.com/doi/10.3982/ECTA11319 by Erasmus University Rotterdam Universiteitsbibliotheek, Wiley Online Library on [08/04/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
GROUPED PATTERNS OF HETEROGENEITY 1151
N
T
2
(2) ( α
θ γ) = argmin yit − xit θ − αgi t
(θαγ)∈Θ×AGT ×ΓG i=1 t=1
N
T
2
(4) ( α) = argmin
θ yit − xit θ − αgi (θα)t
(θα)∈Θ×AGT i=1 t=1
where gi (
gi (θ α) is given by (3). The GFE estimate of gi is then simply θ
α).
Unlike standard finite mixture modeling, which specifies the group proba-
bilities as parametric or semiparametric functions of observed covariates (e.g.,
McLachlan and Peel (2000)), grouped fixed-effects leaves group membership
unrestricted. In the Supplemental Material, we show that the GFE estimator
maximizes the pseudo-likelihood of a mixture-of-normals model, where the
mixing probabilities are unrestricted and individual-specific. In this perspec-
tive, the grouped fixed-effects approach may be seen as a point of contact be-
tween finite mixtures and fixed-effects.
14680262, 2015, 3, Downloaded from https://onlinelibrary.wiley.com/doi/10.3982/ECTA11319 by Erasmus University Rotterdam Universiteitsbibliotheek, Wiley Online Library on [08/04/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
1152 S. BONHOMME AND E. MANRESA
N
T
2
(8) ( α
θ γ) = argmin yit − xit θgi − αgi t
(θαγ)∈ΘG ×AGT ×ΓG i=1 t=1
Nonlinear Models
Grouped patterns of heterogeneity may be introduced in nonlinear models
as well. A general M-estimator formulation based on a data-dependent func-
tion mit (·) is as follows:
N
T
(9) ( α
θ γ) = argmin mit (θ αt gi )
(θαγ)∈ΘG ×AGT ×ΓG i=1 t=1
7
A GFE estimator in a likelihood setup is obtained by taking mit (θ αt gi ) =
− ln f (yit |xit ; θ αt gi ), where f (·) denotes a parametric density function.
14680262, 2015, 3, Downloaded from https://onlinelibrary.wiley.com/doi/10.3982/ECTA11319 by Erasmus University Rotterdam Universiteitsbibliotheek, Wiley Online Library on [08/04/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
GROUPED PATTERNS OF HETEROGENEITY 1153
statistical properties of GFE in nonlinear models exceeds the scope of this pa-
per.
2.2. Computation
One can see, from (4), that the grouped fixed-effects estimator minimizes a
piecewise-quadratic function, where the partition of the parameter space is de-
fined by the different values of
gi (θ α), for i = 1 N. However, the number
of partitions of N units into G groups increases steeply with N, making exhaus-
tive search virtually impossible.
The following algorithm uses a simple iterative strategy to minimize (4).
ALGORITHM 1—Iterative:
1. Let (θ(0) α(0) ) ∈ Θ × AGT be some starting value. Set s = 0.
2. Compute for all i ∈ {1 N}:
T
2
(10) gi(s+1) = argmin yit − xit θ(s) − α(s)
gt
g∈{1G}
t=1
3. Compute:
N
N
2
(11) θ(s+1) α(s+1) = argmin yit − xit θ − αg(s+1) t
i
(θα)∈Θ×AGT i=1 t=1
Algorithm 1 alternates between two steps. In the “assignment” step, each in-
dividual unit i is assigned to the group gi whose vector of time effects is closest
(in a Euclidean sense) to her vector of residuals yit − xit θ. In the “update” step,
θ and α are computed using an OLS regression that controls for interactions of
group indicators and time dummies.8 The objective function is non-increasing
in the number of iterations, and numerical convergence is typically very fast.
However, the solution depends on the chosen starting values. Drawing start-
ing values at random and selecting the solution that yields the lowest objective
provides a practical approach in low-dimensional problems.
For larger-scale problems, we take advantage of the close connection be-
tween (4) and the well-studied kmeans clustering algorithm (Forgy (1965)),
and develop a more efficient computational routine, Algorithm 2, which ex-
ploits recent advances in data clustering (Hansen, Mladenović, and Moreno
8
As written, the solution of the algorithm may have empty groups. A simple modification
consists in re-assigning one individual unit to every empty group, as in Hansen and Mladenović
(2001). Note that doing so automatically decreases the objective function.
14680262, 2015, 3, Downloaded from https://onlinelibrary.wiley.com/doi/10.3982/ECTA11319 by Erasmus University Rotterdam Universiteitsbibliotheek, Wiley Online Library on [08/04/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
1154 S. BONHOMME AND E. MANRESA
3. ASYMPTOTIC PROPERTIES
In this section, we characterize the asymptotic properties of the grouped
fixed-effects estimator as N and T tend to infinity in model (1). Extensions of
the main theorems for models (5) and (7) can be found in the Supplemental
Material.
where gi0 ∈ {1 G} denotes group membership, and where the 0 superscripts
refer to true parameter values. We assume for now that the number of groups
G = G0 is known, and we defer the discussion on estimation of the number of
groups until the end of this section.
Let (θα) be the infeasible version of the GFE estimator where group mem-
bership gi , instead of being estimated, is fixed to its population counterpart gi0 :
N
T
2
(13) ( α) = argmin
θ yit − xit θ − αgi0 t
(θα)∈Θ×AGT i=1 t=1
This is the least squares estimator in the pooled regression of yit on xit and the
interactions of population group dummies and time dummies.
The main result of this section provides conditions under which estimated
groups converge to their population counterparts, and the GFE estimator de-
fined in (2) is asymptotically equivalent to the infeasible least squares estimator
(
θα), when N and T tend to infinity and N/T ν → 0 for some ν > 0. In par-
ticular, this allows T to grow considerably more slowly than N (when ν 1).
Before discussing the general case of model (12), we provide an intuition in a
simple case.
14680262, 2015, 3, Downloaded from https://onlinelibrary.wiley.com/doi/10.3982/ECTA11319 by Erasmus University Rotterdam Universiteitsbibliotheek, Wiley Online Library on [08/04/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
GROUPED PATTERNS OF HETEROGENEITY 1155
We assume that α01 < α02 . The properties of GFE are different when group sep-
aration fails (e.g., when α01 = α02 ), as we discuss below.
In finite samples, there is a nonzero probability that estimated and popula-
tion group membership do not coincide. Specifically, it follows from (3) that
the probability of misclassifying into group 2 an individual who belongs to
group 1 is
0
Pr gi α = 2|gi0 = 1
T
T
0 2
0 2
= Pr α1 + vit − α2 <
0
α1 + vit − α01
t=1 t=1
α −α
0 0
= Pr vi > 2 1
2
That is,
0 √ α02 − α01
(15) Pr
gi α = 2|gi0 = 1 = 1 − T
2σ
where denotes the standard normal c.d.f.
For fixed T ,
gi (α0 ) is inconsistent as N tends to infinity, because only the ith
observation is informative about gi0 . As a result, α generally suffers from an in-
cidental parameter bias and is inconsistent. Nevertheless, (15) implies that the
group misclassification probability tends to zero at an exponential rate, which
intuitively means that the incidental parameter problem vanishes very rapidly
as T increases.
Extending the analysis of model (14) to a more general setup raises two main
challenges. First, consistency is not straightforward to establish since, as N and
T tend to infinity, both the number of group membership variables gi and the
number of group-specific time effects αgt tend to infinity, causing an inciden-
tal parameter problem in both dimensions.9 Second, the argument leading to
the exponential rate of convergence in (15) relies on i.i.d. normal errors. In or-
der to bound tail probabilities under more general conditions, approximations
based on a central limit theorem are not sufficient.
9
Note that the class of models considered in a recent paper by Hahn and Moon (2010) only
covers time-invariant discrete unobserved heterogeneity. So their results do not apply here.
14680262, 2015, 3, Downloaded from https://onlinelibrary.wiley.com/doi/10.3982/ECTA11319 by Erasmus University Rotterdam Universiteitsbibliotheek, Wiley Online Library on [08/04/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
1156 S. BONHOMME AND E. MANRESA
3.2. Consistency
Consider the following assumptions.
1
N T
N 0
i=1 1{gi =g}1{gi =g}xit
10
Formally: xg∧gt = N 0 . Note that xg∧gt depends on the grouping γ =
i=1 1{gi =g}1{gi =
g}
{g1 gN }, although we leave that dependence implicit for conciseness. In fact, Theorem 1
below remains true if, in Assumption 1(g), the average xg0 ∧gi t is replaced by the linear projec-
i
tion of xit on the group indicators 1{gi0 = 1}, . . . , 1{gi0 = G}, 1{gi = 1}, . . . , 1{gi = G}, all of them
interacted with time dummies.
14680262, 2015, 3, Downloaded from https://onlinelibrary.wiley.com/doi/10.3982/ECTA11319 by Erasmus University Rotterdam Universiteitsbibliotheek, Wiley Online Library on [08/04/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
GROUPED PATTERNS OF HETEROGENEITY 1157
αgi t − α0g0 t → 0
NT i=1 t=1 i
ASSUMPTION 2: N
(a) For all g ∈ {1 G}: plimN→∞ N1 i=1 1{gi0 = g} = πg > 0.
T
(b) For all (g g) ∈ {1 G}2 such that g = g: plimT →∞ T1 t=1 (α0gt − α0gt )2 =
cgg > 0.
d
(c) There exist constants a > 0 and d1 > 0 and a sequence α[t] ≤ e−at 1
such that, for all i ∈ {1 N} and (g g) ∈ {1 G}2 such that g =
g, {vit }t ,
{αgt − αgt }t , and {(αgt − αgt )vit }t are strongly mixing processes with mixing coeffi-
0 0 0 0
11
Assumption 1(g) is interestingly related to Assumption A in Bai (2009).
N T
12
To see this, let us suppose that xit ∼ N (0 1) for simplicity. Then maxγ∈ΓG i=1 t=1 x2g0 ∧gi t
i
is the maximum of |ΓG | ≤ GN random variables drawn from a χ2DT distribution, where D ≤ G2 ,
so that Assumption 1(g) is satisfied.
13
Note that α[t] is a conventional notation for strong mixing coefficients. We use this notation
here, in the hope that this does not generate confusion with the group-specific time effects αgt .
14680262, 2015, 3, Downloaded from https://onlinelibrary.wiley.com/doi/10.3982/ECTA11319 by Erasmus University Rotterdam Universiteitsbibliotheek, Wiley Online Library on [08/04/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
1158 S. BONHOMME AND E. MANRESA
14
It is possible to relax Assumptions 2(c)–2(d) and assume that vit and α0gt − α0gt are strongly
mixing with a polynomial decay rate, and that the marginal distribution of vit has polynomial tails,
that is, that α[t] ≤ at −d1 , and Pr(|vit | > m) ≤ m−d2 for some constants a ≥ 1, d1 > 1, and d2 > 2.
It may then be shown that θ− θ = op (T −q ), provided that (dd11+1)d
+d2
2
> 4q + 1.
14680262, 2015, 3, Downloaded from https://onlinelibrary.wiley.com/doi/10.3982/ECTA11319 by Erasmus University Rotterdam Universiteitsbibliotheek, Wiley Online Library on [08/04/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
GROUPED PATTERNS OF HETEROGENEITY 1159
and
(17)
θ= θ + op T −δ and
(18) αgt =
αgt + op T −δ for all g t
ASSUMPTION 3:
(a) For all i j, and t: E(xjt vit ) = 0.
(b) There exist positive definite matrices Σθ and Ωθ such that
1
N T
1
N N T T
N T d
t=1 (xit − xgi0 t )vit → N (0 Ωθ ).
1
(c) As N and T tend to infinity: √NT i=1
N N
(d) For all (g t): limN→∞ N1 i=1 j=1 E(1{gi0 = g}1{gj0 = g}vit vjt ) = ωgt > 0.
N d
(e) For all (g t), and as N and T tend to infinity: √1N i=1 1{gi0 = g}vit →
N (0 ωgt ).
Assumptions 3(a)–3(c) imply that the least squares estimator θ has a stan-
dard asymptotic distribution. Assumption 3(a) is satisfied if xit are strictly ex-
ogenous or predetermined and observations are independent across units. As
a special case, lagged outcomes may thus be included in xit (although the
assumption does not allow for spatial lags such as yi−1t ). Similarly, Assump-
tions 3(d)–3(e) ensure that αgt has a standard asymptotic distribution.
The following result is a direct consequence of Theorem 2.
1 2
N T
1
(21)
αgi t − α0g0 t = Op
NT i=1 t=1 i N
These properties contrast with those of estimators that allow for unit-specific
fixed-effects in combination with time fixed-effects. Given the interactive struc-
ture of model (12), “interactive fixed-effects” estimators are particularly rele-
vant in our context. The interactive fixed-effects estimator of θ0 , as fixed-effects
estimators in other settings, has a O(1/T ) bias in general when N/T → c > 0;
see Theorem 3 in Bai (2009). In addition, the conditions for root-N consis-
tency of the time-varying factors require that N/T 2 → 0; see Theorem 1 in
Bai (2003).16 Lastly, when using interactive fixed-effects, the components α0g0 t
√ √ i
are estimated at a rate of min( N T ); see Theorem 3 in Bai (2003). These
properties suggest that, when a grouped structure is a reasonable assumption,
GFE may be better suited than interactive fixed-effects in panels of moderate
length. Simulations calibrated to the empirical application, summarized below,
are in line with this theoretical discussion.
Inference
The large-N T asymptotic analysis above provides conditions under which
group membership estimation does not affect inference. In the Supplemental
N N T
t=1 E(1{gi = g}1{gj = g}vit vjt ) = O(1), in addition to
15 1 0 0
Equation (21) holds if NT i=1 j=1
the conditions of Corollary 1. See the Supplemental Material for a proof.
16
Theorem 1 in Bai and Ng (2002) does not rely on this condition, but yields a rate of
√ √
min( N T ).
14680262, 2015, 3, Downloaded from https://onlinelibrary.wiley.com/doi/10.3982/ECTA11319 by Erasmus University Rotterdam Universiteitsbibliotheek, Wiley Online Library on [08/04/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
GROUPED PATTERNS OF HETEROGENEITY 1161
Simulation Evidence
In order to assess the finite-sample performance of the GFE estimator, we
conduct several exercises on simulated data. The designs mimic the cross-
country data set that we use in the empirical application (N = 90, T = 7). We
find small probabilities of group misclassification (less than 10% when G = 3
and G = 5), and moderate biases on common parameters. Moreover, when
comparing the GFE estimator to the interactive fixed-effects estimator on a
simulated data set with grouped heterogeneity, we find that the latter has large
biases and imprecisely estimated components of unobserved heterogeneity. Fi-
nally, we compare different inference methods, and conclude that estimators
14680262, 2015, 3, Downloaded from https://onlinelibrary.wiley.com/doi/10.3982/ECTA11319 by Erasmus University Rotterdam Universiteitsbibliotheek, Wiley Online Library on [08/04/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
1162 S. BONHOMME AND E. MANRESA
of the fixed-T variance lead to more reliable inference for the population pa-
rameters. Details and additional exercises can be found in the Supplementary
Material.
FIGURE 1.—Coefficients of income and lagged democracy. Note: Balanced panel from
Acemoglu et al. (2008). The x-axis shows the number of groups G used in estimation, the y-axis
reports parameter values. 95%-confidence intervals clustered at the country level are shown in
dashed lines. Confidence intervals are based on bootstrapped standard errors (100 replications).
Details on the computation are provided in the Supplemental Material.
17
All data in this section are taken from the files of Acemoglu et al. (2008): http://economics.
mit.edu/files/5000.
18
All estimates are computed using Algorithm 2. We performed extensive checks of numerical
accuracy, some of which are described in the Supplemental Material. Stata codes to replicate the
results are available as Supplemental Material.
14680262, 2015, 3, Downloaded from https://onlinelibrary.wiley.com/doi/10.3982/ECTA11319 by Erasmus University Rotterdam Universiteitsbibliotheek, Wiley Online Library on [08/04/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
1164 S. BONHOMME AND E. MANRESA
shows that the implied cumulative income effect θ2 /(1 − θ1 ) sharply decreases
from 025 in OLS to 010 for G = 5, and remains almost constant as G in-
creases further. The left and middle panels show that this pattern is mostly
driven by a decrease in the coefficient of lagged democracy. This is consistent
with unobserved country heterogeneity being positively correlated with lagged
democracy, causing an upward bias in OLS.
Note that, though statistically significant, the cumulative income effect is
quantitatively small. Moreover, we show in the Supplemental Material that
the association between income and democracy disappears in a specification
that combines both time-varying grouped effects and time-invariant country-
specific effects, as in model (5). Hence, in this specification which nests the one
in Acemoglu et al. (2008), the income effect is not statistically different from
zero.
Grouped Patterns
The GFE estimates of the unobserved determinants of democracy reveal
heterogeneous, time-varying patterns. The upper panel of Figure 2 shows the
estimates of group membership by country on a World map, when G = 4. The
bottom panel shows the parameter estimates αgt , and average democracy and
lagged log-GDP per capita by group over time. In the Supplemental Mate-
rial, we report the results of specifications with different choices of number
of groups and variables used, as well as estimates that also take into account
time-invariant country-specific fixed-effects or heterogeneity in coefficients.
The qualitative classification shown on the map is remarkably robust across
these different specifications.
Figure 2 shows that two of the four groups experience stable paths of democ-
racy over time, albeit at very different levels, while the other two show upward-
sloping profiles. Group 1, which we refer to as the “high-democracy” group,
mostly contains high-income, high-democracy countries. It includes the United
States and Canada, most of Continental Europe, Japan, and Australia, but
also India and Costa Rica. Group 2, which we refer to as “low-democracy,”
mostly includes low-income, low-democracy countries: a large share of North
and Central Africa, China, and Iran, among others. Groups 1 and 2, which
together account for 59 of the 90 countries, are broadly consistent with an
additive fixed-effects representation, as their grouped effects α1t and
α2t are
approximately parallel over time. In addition, the graph of average income by
group shows that group membership is strongly correlated with log-GDP per
capita, consistently with the presence of an upward omitted variable bias in the
cross-sectional regression of democracy on income.
While the first two groups of countries are consistent with a fixed-effects
model, the other two are not. Group 3 (“early transition”) experiences a
marked increase in democracy in the first part of the sample period: its mean
Freedom House score increases from 020 in 1970 to almost 090 in 1990. This
14680262, 2015, 3, Downloaded from https://onlinelibrary.wiley.com/doi/10.3982/ECTA11319 by Erasmus University Rotterdam Universiteitsbibliotheek, Wiley Online Library on [08/04/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
GROUPED PATTERNS OF HETEROGENEITY 1165
FIGURE 2.—Patterns of heterogeneity, G = 4. Note: See the notes to Figure 1. On the bot-
tom panel, the left graph shows the group-specific time effects αgt . The other two graphs show
the group-specific averages of democracy and lagged log-GDP per capita, respectively. Calendar
years (1970–2000) are shown on the x-axis. Light solid lines correspond to Group 1 (“high-democ-
racy”), dark solid lines to Group 2 (“low-democracy”), light dashed lines to Group 3 (“early
transition”), and dark dashed lines to Group 4 (“late transition”). The top panel shows group
membership. The list of countries by group is given in the Supplemental Material.
group includes a large share of Latin America, Greece, Spain and Portugal,
Thailand, and South Korea, in total 13 countries, with an intermediate level of
GDP per capita. Group 4 (“late transition”) makes a later transition to democ-
racy: its average Freedom House score increases from 020 to 075 between
1985 and 2000. This group includes 18 countries, among which are a large part
of West and South Africa, Chile, Romania, and Philippines. These are low-
income countries, whose GDP per capita is similar on average to that of the
“low-democracy” group (Group 2).
Note that the time patterns and group membership in Figure 2 are estimated
from the panel data, and not driven by modeling assumptions other than the
grouped structure. In particular, nothing in our framework imposes that time
14680262, 2015, 3, Downloaded from https://onlinelibrary.wiley.com/doi/10.3982/ECTA11319 by Erasmus University Rotterdam Universiteitsbibliotheek, Wiley Online Library on [08/04/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
1166 S. BONHOMME AND E. MANRESA
patterns are smooth over time. Moreover, group membership is not assumed
to have a particular spatial structure, so the geographic correlation apparent
on the map is purely a result of estimation.
Discussion
Overall, the evidence obtained suggests that the effect of income on democ-
racy is perhaps zero, or in any case quantitatively small, in line with the con-
clusions of Acemoglu et al. (2008). At the same time, our analysis highlights
the presence of clustering in the evolution of political outcomes: while a sub-
stantial share of the world seems to have experienced stable parallel political
patterns during the period, roughly one third of the sample has experienced
steep upward transitions.
These results raise interesting questions: which factors explain democratic
transitions, and importantly, why we observe groups of countries making tran-
sitions at similar points in time. In the Supplemental Material, we present a
first attempt at explaining why these groups of countries have evolved so dif-
ferently, by regressing group membership on various determinants (including
historical measures). We see our framework as providing a starting point to
assess how well different theories of democratization fit the political and eco-
nomic evolution of countries over time.
5. CONCLUSION
Grouped fixed-effects (GFE) offers a flexible yet parsimonious approach to
model unobserved heterogeneity. The approach delivers estimates of common
regression parameters, together with interpretable estimates of group-specific
time patterns and group membership. The framework allows for strictly ex-
ogenous covariates and lagged outcomes. It also easily accommodates unit-
specific fixed-effects in addition to the time-varying grouped patterns, and
grouped heterogeneity in coefficients. Importantly, the relationship between
group membership and observed covariates is left unrestricted.
The GFE approach should be useful in applications where time-varying
grouped effects may be present in the data. As a first example, the empirical
analysis of the evolution of democracy shows evidence of a clustering of po-
litical regimes and transitions. More generally, GFE should be well-suited in
difference-in-difference designs, as a way to relax parallel trend assumptions.
Other potential applications include models of social interactions and spatial
dependence where the reference groups or the spatial weights matrix are esti-
mated from the panel data.
The extension to nonlinear models is a natural next step. While it is possible
to define GFE estimators in more general models (see, e.g., equation (9)), the
analysis raises statistical challenges. One area of applications is static or dy-
namic discrete choice modeling, where a discrete specification of unobserved
14680262, 2015, 3, Downloaded from https://onlinelibrary.wiley.com/doi/10.3982/ECTA11319 by Erasmus University Rotterdam Universiteitsbibliotheek, Wiley Online Library on [08/04/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
GROUPED PATTERNS OF HETEROGENEITY 1167
(A.1)
Q(θ α γ) = yit − xit θ − αgi t
NT i=1 t=1
N
T
2
(θ α γ) = 1
Q vit + xit θ0 − θ + α0g0 t − αgi t
NT i=1 t=1 i
PROOF:
N
T
Q (θ α γ) = 2
(θ α γ) − Q vit xit θ0 − θ + α0g0 t − αgi t
NT i=1 t=1 i
2
N T
= vit xit θ0 − θ
NT i=1 t=1
2 2
N T N T
2
N T
1 1
N T G N T
1
T
where, by Assumption 1(a), T t=1 α2gt is uniformly bounded. Now, note that
2
1 1
T N
1{gi = g}vit
T t=1 N i=1
1
N N T
1 1
N N T
≤ 2 vit vjt
N i=1 j=1 T t=1
14680262, 2015, 3, Downloaded from https://onlinelibrary.wiley.com/doi/10.3982/ECTA11319 by Erasmus University Rotterdam Universiteitsbibliotheek, Wiley Online Library on [08/04/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
GROUPED PATTERNS OF HETEROGENEITY 1169
1 1
N N T
≤ E(vit vjt )
N 2 i=1 j=1 T t=1
1 1
N N T
+ 2
vit vjt − E(vit vjt )
N i=1 j=1 T t=1
N N T
By Assumption 1(e): N12 i=1 j=1 | T1 t=1 E(vit vjt )| ≤ M
N
. Moreover, by the CS
inequality,
2
1 1
N N T
where plimNT →∞
ρ = ρ > 0.
1
N T
N
T
0 2
Q θ 0 α0 γ 0 = 1
(θ α γ) − Q xit θ − θ + α0g0 t − αgi t
NT i=1 t=1 i
≥ θ0 − θ Σ(γ) θ0 − θ
≥ min θ0 − θ Σ(γ) θ0 − θ
γ∈ΓG
2
ρ(γ) θ0 − θ
≥ min
γ∈ΓG
14680262, 2015, 3, Downloaded from https://onlinelibrary.wiley.com/doi/10.3982/ECTA11319 by Erasmus University Rotterdam Universiteitsbibliotheek, Wiley Online Library on [08/04/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
1170 S. BONHOMME AND E. MANRESA
where minγ∈ΓG
ρ(γ) is asymptotically bounded away from zero by Assump-
tion 1(g). Q.E.D.
To show that θ is consistent for θ0 , note that, by Lemma A.1 and by the
definition of the GFE estimator,
(A.2) (
Q θ γ) = Q
α (θ γ ) + op (1) ≤ Q
α θ0 α0 γ 0 + op (1)
=Q θ0 α0 γ 0 + op (1)
1 0 0
N T
1 2
N T
≤ xit 2 × θ0 −
θ
NT i=1 t=1
1
N T
+ 4 sup |αt | × xit × θ0 −
θ
αt ∈A NT i=1 t=1
dH (a b) = max
2
max min (agt − bgt )2
g∈{1G} T
g∈{1G}
t=1
1
T
LEMMA B.3: Let Assumptions 1(a)–1(g), and 2(a)–2(b) hold. Then, as N and
T tend to infinity,
p
dH α α0 → 0
1 2 p
T
= 1 gi = g min
αgt − α0gt
N i=1 g∈{1G} T
t=1
Now,
1 0 2
N T
≤ 1 gi = g αgi t − α0gt
NT i=1 t=1
1 2
N T
≤ αgi t − α0g0 t
NT i=1 t=1 i
19
Note that group separation (Assumption 2(b)) is assumed to show Lemma B.3. Proving con-
sistency of the group-specific time effects absent this assumption would require different argu-
ments.
14680262, 2015, 3, Downloaded from https://onlinelibrary.wiley.com/doi/10.3982/ECTA11319 by Erasmus University Rotterdam Universiteitsbibliotheek, Wiley Online Library on [08/04/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
1172 S. BONHOMME AND E. MANRESA
1 2
T
where the right-hand side of this inequality converges in probability to (cgg )1/2
by Assumption 2(b) and equation (B.3). It thus follows that, with probability
approaching 1, σ(g) = σ( g) for all g = g. Thus σ admits a well-defined in-
verse σ −1 .
Now, with probability approaching 1, we have, for all g ∈ {1 G},
1 1 2
T T
0 2
min
αgt − αgt ≤
αgt − α0σ −1 (g)t
g∈{1G} T T t=1
t=1
1 2 p
T
1 2 p
T
ασ(g)t − α0gt → 0
T t=1
LEMMA B.4: For η > 0 small enough, we have, for all δ > 0 and as N and T
tend to infinity,
1
N
PROOF: Note that, from the definition of gi (·), we have, for all g ∈
{1 G},
T
2
T
2
1 gi (θ α) = g ≤ 1 yit − xit θ − αgt ≤ yit − xit θ − αgi0 t
t=1 t=1
so
1 1 0
N G N
1
gi (θ α) = gi0 = 1 gi = g 1
gi (θ α) = g
N i=1 g=1
N i=1
G
1
N
where
Zig (θ α) = 1 gi0 = g
T
2
T
2
×1 yit − xit θ − αgt ≤ yit − xit θ − αgi0 t
t=1 t=1
We start by bounding Zig (θ α), for all (θ α) ∈ Nη , by a quantity that does
not depend on (θ α). To proceed note that, for all (θ α) and all i,
T
0
Zig (θ α) = 1 gi = g 1 (αgi0 t − αgt )
t=1
αgt + αgi0 t
× vit + xit θ0 − θ + α0g0 t − ≤0
i 2
T
≤ max 1 (αgt − αgt )
g=g
t=1
αgt + αgt
× vit + x θ0 − θ + α0gt −
it ≤0
2
14680262, 2015, 3, Downloaded from https://onlinelibrary.wiley.com/doi/10.3982/ECTA11319 by Erasmus University Rotterdam Universiteitsbibliotheek, Wiley Online Library on [08/04/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
1174 S. BONHOMME AND E. MANRESA
T
αgt + αgt
AT = (αgt − αgt ) vit + xit θ0 − θ + α0gt −
t=1
2
T
0 α0gt + α0gt
− αgt − α0gt vit + α0gt −
t=1
2
As we have
T
T
0
AT ≤ (αgt − αgt )vit − αgt − α0gt vit
t=1 t=1
T
+ (αgt − αgt )xit θ0 − θ
t=1
T
αgt + αgt
+ (αgt − αgt ) α0gt −
t=1
2
T
0 α0gt + α0gt
− αgt − αgt αgt −
0 0
t=1
2
1/2
1 2 1
T T
√ √ √
AT ≤ T C1 η v + T C2 η xit + T C3 η
T t=1 it T t=1
Zig (θ α)
T
0 α0gt + α0gt
≤ max 1 αgt − α0gt vit + α0gt −
g=g
t=1
2
1/2
1 2 1
T T
√ √ √
≤ T C1 η v + T C2 η xit + T C3 η
T t=1 it T t=1
14680262, 2015, 3, Downloaded from https://onlinelibrary.wiley.com/doi/10.3982/ECTA11319 by Erasmus University Rotterdam Universiteitsbibliotheek, Wiley Online Library on [08/04/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
GROUPED PATTERNS OF HETEROGENEITY 1175
Noting that the right-hand side of this inequality does not depend on (θ α),
ig , where
it follows that sup(θα)∈Nη Zig (θ α) ≤ Z
T
1 0 2
T
As a result,
1 1
N N G
(B.5) sup 1
gi (θ α) = gi0 ≤ ig
Z
(θα)∈Nη N N
i=1 i=1 g=1
√
Fix M > max( M M ∗ ), where M and M ∗ are given by Assumptions 1
√
and 2(e), respectively. Note that E(vit2 ) ≤ M. We have, using standard prob-
ability algebra and for all g,
T
1 0
T
2
(B.6) ig = 1) ≤
Pr(Z Pr α0gt − α0gt vit ≤ − αgt − α0gt
g=g t=1
2 t=1
1/2
1 2
T
√
+ T C1 η v
T t=1 it
1
T
√ √
+ T C2 η xit + T C3 η
T t=1
1
T
≤ Pr xit ≥ M
g=g
T t=1
1 0 1 2
T T
2 c g
g
+ Pr α − αgt ≤0
+ Pr v ≥M
T t=1 gt 2 T t=1 it
T
cgg √
+ Pr α0gt − α0gt vit ≤ −T + T C1 η M
t=1
4
√ √
+ T C2 ηM + T C3 η
14680262, 2015, 3, Downloaded from https://onlinelibrary.wiley.com/doi/10.3982/ECTA11319 by Erasmus University Rotterdam Universiteitsbibliotheek, Wiley Online Library on [08/04/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
1176 S. BONHOMME AND E. MANRESA
To end the proof of Lemma B.4, we rely on the use of exponential inequali-
ties for dependent processes. Specifically, we use the following result, which is
a direct consequence of Theorem 6.2 in Rio (2000).
LEMMA B.5: Let zt be a strongly mixing process with zero mean, with strong
d
mixing coefficients α[t] ≤ e−at 1 , and with tail probabilities Pr(|zt | > z) ≤
d
e1−(z/b) 2 , where a, b, d1 , and d2 are positive constants. Then, for all z > 0, we
have, for all δ > 0,
1
T
T →∞
δ
T Pr zt ≥ z → 0
T t=1
PROOF: Let s2 = supt≥1 ( s≥1 |E(zt zs )|). Note that s2 < ∞ under the con-
dition of Lemma B.5.20 Let also d = dd1+d d2
. By evaluating inequality (1.7) in
1 2
Merlevède, Peligrad, and Rio (2011) at λ = T z4 and r = T 1/2 , we obtain that
there exists a constant f > 0 independent of T such that, for all z > 0 and
T ≥ 1,
1
T 2 −(1/2)T 1/2
1/2 z
Pr zt ≥ z ≤ 4 1 + T
T t=1 16s2
d
16f z
+ exp −a T 1/2
z 4b
Lemma B.5 directly follows. Q.E.D.
We now bound the last three terms on the right-hand side of (B.6).
T
• By Assumptions 1(a) and 2(b), we have limT →∞ T1 t=1 E[(α0gt − α0gt )2 ] =
cgg . So for T large enough, we have
1 0 2 2cgg
T
E αgt − α0gt ≥
T t=1 3
Applying Lemma B.5 to zt = (α0gt − α0gt )2 − E[(α0gt − α0gt )2 ], which satisfies ap-
propriate mixing and tail conditions by Assumptions 1(a) and 2(c), and taking
c g
z = g
6
, yields, for all δ > 0 and as T tends to infinity,
1 0 2 cgg
T
Pr αgt − α0gt ≤ = o T −δ
T t=1 2
20
This is a consequence of the fact that | Cov(X Y )| can be bounded in terms of the strong
mixing coefficient α[X Y ] and the quantile functions QX and QY (see, e.g., Theorem 1.1 in Rio
(2000)).
14680262, 2015, 3, Downloaded from https://onlinelibrary.wiley.com/doi/10.3982/ECTA11319 by Erasmus University Rotterdam Universiteitsbibliotheek, Wiley Online Library on [08/04/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
GROUPED PATTERNS OF HETEROGENEITY 1177
• Similarly, for the third term on the right-hand √ side of (B.6), applying
− M yields
Lemma B.5 to zt = vit2 − E(vit2 ) and taking z = M
1 2
T
Pr = o T −δ
vit ≥ M
T t=1
for all δ > 0. Note that {vit2 }t is strongly mixing as {vit }t is strongly mixing by
Assumption 2(c).
• Lastly, to bound the fourth term on the right-hand side of (B.6), we denote
as c the minimum of cgg over all g = g and we take
2
c
(B.7) η≤
+ C2 M
8(C1 M + C3 )
Now, by Assumption 2(c), the process {(α0gt − α0gt )vit }t has zero mean, and is
strongly mixing with faster-than-polynomial decay rate. Moreover, for all i, t,
and m > 0,
0 m
Pr αgt − αgt vit > m ≤ Pr |vit | >
0
2 sup |αt |
αt ∈A
so {(α0gt − α0gt )vit }t also satisfies the tail condition of Assumption 2(d), albeit
with a different constant b > 0 instead of b > 0.
c g
Lastly, applying Lemma B.5 with zt = (α0gt − α0gt )vit and taking z = g 8
yields
1 0
T
cgg
(B.8) Pr αgt − α0gt vit ≤ − = o T −δ
T t=1 8
Note that the above upper bounds on the probabilities do not depend on i
and g. Combining results, we thus obtain, using (B.6) and Assumption 2(e),
14680262, 2015, 3, Downloaded from https://onlinelibrary.wiley.com/doi/10.3982/ECTA11319 by Erasmus University Rotterdam Universiteitsbibliotheek, Wiley Online Library on [08/04/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
1178 S. BONHOMME AND E. MANRESA
1
N G
(B.9) ig = 1)
Pr(Z
N i=1 g=1
1
T
≤ G(G − 1) sup Pr + o T −δ = o T −δ
xit ≥ M
i∈{1N} T t=1
Sufficient conditions for Assumption 2(e) are given at the end of Appendix B.
To complete the proof of Lemma B.4, note that, for η that satisfies (B.7), we
have, for all δ > 0 and all ε > 0,
1
N
E Zig
1 N i=1 g=1
N G
where we have used (B.5), the Markov inequality, and (B.9), respectively.
This ends the proof of Lemma B.4. Q.E.D.
We now prove the three parts of Theorem 2, and derive asymptotic results
for α, and
θ, gi in turn.
Properties of
θ. Let us denote:21
N
T
2
(B.10) α) = 1
Q(θ yit − xit θ − αgi (θα)t
NT i=1 t=1
and
1 2
N T
(B.11)
Q(θ α) = yit − xit θ − αgi0 t
NT i=1 t=1
is minimized at (
Note that Q(·) θ is minimized at (
α), and that Q(·) θ
α).
Let η > 0 be small enough such that Lemma B.4 is satisfied. Using Assump-
tions 1(a)–1(c) and Lemma B.4, it is easy to see that, for all δ > 0,
(B.12) α) − Q(θ
sup Q(θ α) = op T −δ
(θα)∈Nη
21 α) is a concentrated version of Q
Note that Q(θ (θ α γ) that was defined in the proof of
Theorem 1.
14680262, 2015, 3, Downloaded from https://onlinelibrary.wiley.com/doi/10.3982/ECTA11319 by Erasmus University Rotterdam Universiteitsbibliotheek, Wiley Online Library on [08/04/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
GROUPED PATTERNS OF HETEROGENEITY 1179
Likewise, as
θ and
α are also consistent under the conditions of Theorem 1,
we have
(B.14) Pr ( θ / Nη → 0
α) ∈
Combining (B.12) and (B.13), we have, for all δ > 0 and as N and T tend to
infinity,
(B.15) Q(
α) − Q(
θ α) = op T −δ
θ
(B.18)
Q( θ
α) − Q( θ
α)
1 2
N T
≥ (
θ−
θ) (xit − xgi0 t )(xit − xgi0 t ) (
θ−
θ)
NT i=1 t=1
ρ
≥ θ−
θ2
14680262, 2015, 3, Downloaded from https://onlinelibrary.wiley.com/doi/10.3982/ECTA11319 by Erasmus University Rotterdam Universiteitsbibliotheek, Wiley Online Library on [08/04/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
1180 S. BONHOMME AND E. MANRESA
1 0 1
G N T N T
1 gi = g αgt −
( αgt )2 = αgi0 t −
( αgi0 t )2
NT g=1 i=1 t=1
NT i=1 t=1
−δ
= op T
1
T
(B.19) ( αgt )2 = op T −δ
αgt −
T t=1
N sup Pr (
θ gi (
α) ∈ Nη α) = gi0
θ
i∈{1N}
= N sup E 1 (
θ gi (
α) ∈ Nη 1 α) = gi0
θ
i∈{1N}
G
≤ N sup E 1 ( α) ∈ Nη
θ ig
Z
i∈{1N}
g=1
G
≤ N sup ig = 1) = o NT −δ for all δ > 0
Pr(Z
i∈{1N}
g=1
where the last equality is obtained similarly as (B.9) under Assumption 2(e).
This implies (16), and completes the proof of Theorem 2.
14680262, 2015, 3, Downloaded from https://onlinelibrary.wiley.com/doi/10.3982/ECTA11319 by Erasmus University Rotterdam Universiteitsbibliotheek, Wiley Online Library on [08/04/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
GROUPED PATTERNS OF HETEROGENEITY 1181
PROPOSITION B.1: Suppose that either of the following two conditions holds:
1. Assumption 1(b) holds, and {xit }t satisfies the mixing and tail conditions
of Assumptions 2(c) and 2(d)
2. Assumptions 1(a), 1(c), and 2(c)–2(d), hold. Moreover, xit = (yit−1 xit ) ,
and θ = (ρ θ1 ) , where |ρ | < 1,
0
xit satisfy Assumption 2(e), and, for all constants
F > 0,
(B.20) sup Pr |yi0 | ≥ FT = o T −δ for all δ > 0
i∈{1N}
Note that (B.20) requires that the distribution of yi0 has thinner-than-polynomial
tails.
Then there exists a constant M ∗ > 0 such that, as N T tend to infinity,
1
T
t−1
0 s t
yit = ρ xit−s θ10 + α0g0 t−s + vit−s + ρ0 yi0
i
s=0
1
T −1
1
≤ xim θ10
1 − ρ0 T m=1
1
+ α0g0 m + |vim | + |yi0 |
i T
14680262, 2015, 3, Downloaded from https://onlinelibrary.wiley.com/doi/10.3982/ECTA11319 by Erasmus University Rotterdam Universiteitsbibliotheek, Wiley Online Library on [08/04/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
1182 S. BONHOMME AND E. MANRESA
i∈{1N} T m=1 i 2
−δ
=o T
REFERENCES
ACEMOGLU, D., S. JOHNSON, J. ROBINSON, AND P. YARED (2008): “Income and Democracy,”
American Economic Review, 98, 808–842. [1149,1162-1164,1166]
AHLQUIST, J., AND E. WIBBELS (2012): “Riding the Wave: World Trade and Factor-Based Models
of Democratization,” American Journal of Political Science, 56 (2), 447–464. [1149]
ANDREWS, D. (1984): “Non-Strong Mixing Autoregressive Processes,” Journal of Applied Proba-
bility, 21, 930–934. [1158]
ARELLANO, M., AND J. HAHN (2007): “Understanding Bias in Nonlinear Panel Models: Some
Recent Developments,” in Advances in Economics and Econometrics, Ninth World Congress,
ed. by R. Blundell, W. Newey, and T. Persson. Cambridge: Cambridge University Press. [1148]
BAI, J. (2003): “Inferential Theory for Factor Models of Large Dimensions,” Econometrica, 71,
135–171. [1160]
(2009): “Panel Data Models With Interactive Fixed Effects,” Econometrica, 77,
1229–1279. [1150,1156,1157,1160]
BAI, J., AND S. NG (2002): “Determining the Number of Factors in Approximate Factor Models,”
Econometrica, 70, 191–221. [1156,1160,1161]
BARRO, R. J. (1999): “Determinants of Democracy,” Journal of Political Economy, 107 (S6),
S158–S183. [1162]
BESTER, A., AND C. HANSEN (2013): “Grouped Effects Estimators in Fixed Effects Models,”
Journal of Econometrics (forthcoming). [1150]
BICKEL, P. J., AND A. CHEN (2009): “A Nonparametric View of Network Models and Newman–
Girvan and Other Modularities,” Proceedings of the National Academy of Sciences of the United
States of America, 106, 21068–21073. [1150]
BLUME, L. E., W. A. BROCK, S. N. DURLAUF, AND Y. M. IOANNIDES (2011): “Identification of
Social Interactions,” in Handbook of Social Economics, ed. by J. Benhabib, A. Bisin, and M. O.
Jackson. Amsterdam: Elsevier Science. [1148]
BONHOMME, S., AND E. MANRESA (2015): “Supplement to ‘Grouped Patterns of Hetero-
geneity in Panel Data’,” Econometrica Supplemental Material, 83, http://dx.doi.org/10.3982/
ECTA11319. [1148,1150]
BROWNING, M., AND J. CARRO (2014): “Dynamic Binary Outcome Models With Maximal Het-
erogeneity,” Journal of Econometrics, 178, 805–823. [1167]
BRYANT, P., AND J. A. WILLIAMSON (1978): “Asymptotic Behaviour of Classification Maximum
Likelihood Estimates,” Biometrika, 65, 273–281. [1149]
14680262, 2015, 3, Downloaded from https://onlinelibrary.wiley.com/doi/10.3982/ECTA11319 by Erasmus University Rotterdam Universiteitsbibliotheek, Wiley Online Library on [08/04/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
GROUPED PATTERNS OF HETEROGENEITY 1183
CANOVA, F. (2004): “Testing for Convergence Clubs in Income per capita: A Predictive Density
Approach,” International Economic Review, 45 (1), 49–77. [1150]
CHOI, D. S., P. J. WOLFE, AND E. M. AIROLDI (2012): “Stochastic Blockmodels With a Growing
Number of Classes,” Biometrika, 99, 273–284. [1150]
FORGY, E. W. (1965): “Cluster Analysis of Multivariate Data: Efficiency vs. Interpretability of
Classifications,” Biometrics, 21, 768–769. [1148,1153]
FRÜHWIRTH-SCHNATTER, S. (2006): Finite Mixture and Markov Switching Models. New York:
Springer. [1150]
GLEDITSCH, K. S., AND M. D. WARD (2006): “Diffusion and the International Context of De-
mocratization,” International Organization, 60, 911–933. [1149]
HAHN, J., AND H. MOON (2010): “Panel Data Models With Finite Number of Multiple Equilib-
ria,” Econometric Theory, 26 (3), 863–881. [1149,1155]
HANSEN, P., AND N. MLADENOVIĆ (2001): “J-Means: A New Local Search Heuristic for Mini-
mum Sum-of-Squares Clustering,” Pattern Recognition, 34 (2), 405–413. [1153]
HANSEN, P., N. MLADENOVIĆ, AND J. A. MORENO PÉREZ (2010): “Variable Neighborhood
Search: Algorithms and Applications,” Annals of Operations Research, 175, 367–407. [1154]
HUNTINGTON, S. P. (1991): The Third Wave: Democratization in the Late Twentieth Century. Nor-
man, OK: University of Oklahoma Press. [1149]
KASAHARA, H., AND K. SHIMOTSU (2009): “Nonparametric Identification of Finite Mixture Mod-
els of Dynamic Discrete Choices,” Econometrica, 77 (1), 135–175. [1167]
LIN, C. C., AND S. NG (2012): “Estimation of Panel Data Models With Parameter Heterogeneity
When Group Membership Is Unknown,” Journal of Econometric Methods, 1 (1), 42–55. [1150]
LIPSET, S. M. (1959): “Some Social Requisites of Democracy: Economic Development and Polit-
ical Legitimacy,” American Political Science Review, 53 (1), 69–105. [1162]
MCLACHLAN, G., AND D. PEEL (2000): Finite Mixture Models. Wiley Series in Probability and
Statistics: Applied Probability and Statistics. New York: Wiley-Interscience. [1150,1151]
MERLEVÈDE, F., M. PELIGRAD, AND E. RIO (2011): “A Bernstein Type Inequality and Moder-
ate Deviations for Weakly Dependent Sequences,” Probability Theory and Related Fields, 151,
435–474. [1176]
NICKEL, S. (1981): “Biases in Dynamic Models With Fixed Effects,” Econometrica, 49, 1417–1426.
[1147,1162]
PHILLIPS, P. C. B., AND D. SUL (2007): “Transition Modelling and Econometric Convergence
Tests,” Econometrica, 75, 1771–1855. [1150]
POLLARD, D. (1981): “Strong Consistency of k-Means Clustering,” The Annals of Statistics, 9,
135–140. [1149,1161]
(1982): “A Central Limit Theorem for k-Means Clustering,” The Annals of Probability,
10, 919–926. [1149,1161]
RIO, E. (2000): Théorie asymptotique des processus aléatoires faiblement dépendants. Berlin:
Springer. [1158,1176]
SAGGIO, R. (2012): “Discrete Unobserved Heterogeneity,” in “Discrete Choice Panel Data Mod-
els,” Master Thesis, CEMFI. [1167]
SARAFIDIS, V., AND T. WANSBEEK (2012): “Cross-Sectional Dependence in Panel Data Analysis,”
Econometric Reviews, 31, 483–531. [1148]
STEINLEY, D. (2006): “K-Means Clustering: A Half-Century Synthesis,” British Journal of Math-
ematical & Statistical Psychology, 59, 1–34. [1148]
STOCK, J., AND M. WATSON (2002): “Forecasting Using Principal Components From a Large
Number of Predictors,” Journal of the American Statistical Association, 97, 1167–1179. [1156]
SUN, Y. (2005): “Estimation and Inference in Panel Structure Models,” Unpublished Manuscript.
[1150]
TIBSHIRANI, R. (1996): “Regression Shrinkage and Selection via the Lasso,” Journal of the Royal
Statistical Society, Series B, 58, 267–288. [1150]
TOWNSEND, R. M. (1994): “Risk and Insurance in Village India,” Econometrica, 62, 539–591.
[1148]
14680262, 2015, 3, Downloaded from https://onlinelibrary.wiley.com/doi/10.3982/ECTA11319 by Erasmus University Rotterdam Universiteitsbibliotheek, Wiley Online Library on [08/04/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Dept. of Economics, University of Chicago, 1126 E. 59th Street, Chicago, IL