McCulloch and Neuhaus 2005 Generalized Linear Mixed Models

Generalized Linear Mixed Models

Encyclopedia of Biostatistics, Online © 2005 John Wiley & Sons, Ltd.
This article is © 2005 John Wiley & Sons, Ltd.
This article was published in the Encyclopedia of Biostatistics in 2005 by John Wiley & Sons, Ltd.
DOI: 10.1002/0470011815.b2a10021

Introduction

Generalized linear mixed models (GLMMs) are an extension of the class of generalized linear models in which random effects are added to the linear predictor. This modification extends the broad class of generalized linear models to accommodate correlation via random effects, while retaining the ability to model nonnormal distributions and allowing nonlinear models of specific form. The class of GLMMs includes the special cases of linear mixed models, random coefficient models, random effects logistic regression, and random effects Poisson regression, to name a few.

The incorporation of random effects is a natural way to model or accommodate correlation in the context of a nonlinear model for nonnormal data. It generates a rich class of correlated data models that would be difficult to specify directly. Readily available, flexible, multivariate distributions analogous to the multivariate normal distribution do not exist for most nonnormally distributed data.

Inferences for these models can be of the usual variety, that is, modeling the effect of predictors on the mean, in which case the random effects and correlation are “nuisance” features of the model. In other situations, however, both estimation and testing of the variances of the random effects, as well as prediction of the realized values of the random effects, may be of interest (see Variance Components).

We will illustrate several of our points using as an example the longitudinal study of physicians and their patients described by Korff et al. [8]. This study classified 44 primary care physicians in a large HMO according to their practice styles in treating back pain management (low, moderate, or high frequency of prescription of pain medication and bed rest), and followed an average of 24 patients per physician for 2 years (1 month, 1 year, and 2 year follow-ups) after the index visit. Outcome variables included functional measures (e.g. “Did you experience moderate to severe activity limitation?”), patient satisfaction (e.g. “After your visit with the doctor, did you fully understand how to take care of your back problem?”), and cost.

Generalized Linear Mixed Models: A Definition

Generalized linear mixed models constitute a class of models for describing the stochastic relationship of an $n$-dimensional outcome vector $Y$ to an $(n \times p)$-dimensional matrix of covariates $X$, with rows $x_i$. The construction of generalized linear mixed models begins with the specification of a generalized linear model conditional on a vector $u$ of random effects. That is, given a vector $u$ (often with components specific to a subject or cluster), the conditional density of $Y_i$ is of the exponential family form $f(y_i \mid u) = \exp[\{y_i \theta_i - b(\theta_i)\}\phi + c(y_i, \phi)]$, where $b$ and $c$ are functions of known form. In addition, one assumes that $E(Y_i \mid u, x_i) = g^{-1}(x_i \beta + z_i u)$, where $z_i$ is a specified vector of covariates, analogous to $x_i$. Given $u$, the model additionally assumes that the responses $Y_i$ are independent. The function $g$ links the linear predictor to the expected value of the response. The model further assumes that the random effects $u$ follow a distribution $G$, typically (but not necessarily) multivariate normal with mean vector $0$ and covariance matrix $\Sigma(\gamma)$, where $\gamma$ is a vector of (co)variance parameters, for example, variances and correlation coefficients.

Thus, the model assumes that the linear predictor consists of two portions: the fixed effects portion $x_i \beta$, and the random effects portion $z_i u$, for which a distribution is assigned to $u$. Just as with linear mixed models, the assumption of a distribution for the random effects induces correlations among observations. Finally, the assumptions underlying GLMMs specify the multivariate distribution of $Y = (Y_1, \ldots, Y_n)'$, so that one can base inference with these models on likelihood methods.

The specification of covariate effects conditional on random effects determines the interpretation of the fixed effects parameters $\beta$. For example, considering the activity limitation outcome in the back pain study, GLMMs would measure how the risk of activity limitation in a particular patient of a particular physician changes over time, and how that change relates to the practice style of the physician. Using GLMMs, one can also directly relate changes in explanatory variables within an individual subject to changes in the expected value of the subject’s response.

To be more specific, we consider the simple and common case in which the data are correlated in clusters, where $i = 1, \ldots, m$ indexes the clusters (e.g.
patients) while $j = 1, \ldots, n_i$ indexes units within clusters (e.g. different time points) (see Cluster Sampling). In a GLMM with a single fixed effect, the parameter $\beta$ measures the change in the conditional expectation of $Y$ corresponding to a unit increase in the covariate within the $i$th cluster,

$$\beta = g[E(Y_{ij} \mid x_{ij} + 1, u_i)] - g[E(Y_{ij} \mid x_{ij}, u_i)], \tag{1}$$

where $u_i$ is a cluster-specific vector of random effects. This contrasts with marginal models, where one specifies the marginal or population-averaged (PA) distribution of the response of the $j$th unit in the $i$th cluster, integrated over the distribution of $u$, together with some working (hypothesized) covariance structure for the $n_i$ responses in the $i$th cluster to account for intraclass correlation. For the single fixed effect example, marginal models measure the change in the marginal expectation $E(Y_{ij} \mid x_{ij})$, unconditional on the random effects, associated with change in the covariate. Such covariate effects are exactly those one would estimate with a single response per subject; the cluster structure thus plays no role in the interpretation of the model regression coefficients.

In addition to estimates of the effects of covariates on the expected value of the response, GLMMs can provide estimates of the dependence of responses within clusters, such as subjects. Measures such as the intraclass correlation coefficient, $\mathrm{corr}(Y_{ij}, Y_{ij'})$, depend on the random effects distribution $G$, along with its parameters $\gamma$; using estimates of $\gamma$ one can construct estimates of intraclass correlation.

As well as modeling within-cluster response dependence and estimates of its magnitude, GLMMs allow consideration of the individual random effects, which themselves may be of interest. For example, in the back pain study, we would include random effects to describe both the physician and patient effects on each of the outcomes. We might be interested in obtaining predicted values for the random effects of each physician to help indicate which physicians had better outcomes and/or lower costs, after adjusting for fixed effects of model covariates. The random effects in a GLMM are best predicted by their conditional expectations given the data, $E(u \mid Y)$. However, this expectation is unknown since it depends on the unknown parameters $\beta$ and $\gamma$. Hence, one typically estimates $E(u \mid Y)$ using estimates of these parameters. The estimated values of $E(u \mid Y)$ are shrinkage estimates and “borrow strength” across the data set in order to improve estimates of individual random effects, especially when data incorporating a particular random effect are sparse (see Shrinkage Estimation).

Inference and Estimation

Maximum likelihood (ML), or variants such as restricted maximum likelihood (REML), are standard methods of estimation for linear mixed models and generalized linear models (e.g. logistic regression). Evaluation of the likelihood and hence likelihood inference with GLMMs is computationally difficult, however, because the random effects on which the likelihood is conditioned must be integrated out of the distribution prior to maximization as a function of the fixed effects. Although several useful computational methods currently exist, the development of new methods for GLMMs continues to be an active research area.

To illustrate the inherent complexity, consider a general mixed logistic regression model for binary data. The marginal likelihood takes the form

$$\int \cdots \int \exp\Big\{\sum_i Y_i (x_i \beta + z_i u)\Big\} \times \prod_i \{1 + \exp(x_i \beta + z_i u)\}^{-1} \, dG(u), \tag{2}$$

where $G$ is the distribution function of the random effects and the integration is of a dimension equal to the dimension of $u$. For most choices of $G$, the integral cannot be evaluated in closed form although, for simple cases like random intercept and/or random slope models, (2) reduces to a product of lower-dimensional integrals amenable to numerical integration. Numerical integration becomes inaccurate for three or more dimensions, but simulation-based methods such as Markov Chain Monte Carlo [3], in particular, Gibbs sampling, and Monte Carlo EM [13] have proven useful in these settings. Such methods can also handle complications such as crossed random effects.

If ML estimation is feasible, then the usual inferential methods are available. In particular,

• ML estimators are asymptotically normal, with standard errors available from second derivatives of the log likelihood.
• One can carry out hypothesis tests using likelihood ratio, score, or Wald procedures.
• One can calculate best-predicted values as expected values of the random effects conditional on the data, substituting ML or REML estimates for unknown parameters. Typically, one cannot evaluate the conditional expected values in closed form, so these calculations involve numerical integration.
• One can test whether variances of random effects are zero using the likelihood ratio statistic. As with linear mixed models, the asymptotic null distribution involves a mixture of chi-square distributions rather than the null chi-square distribution usual in fixed effects models [15].

For the case of random intercepts only, several authors have proposed a semiparametric mixed model approach that jointly estimates the regression parameters and the (nonparametric) mixing distribution $G$. These methods are given in several papers, including [1, 2, 9, 10, 11]. This approach provides consistent estimation of the effects of all covariates and of $G$ under conditions of identifiability [7].

Although GLMMs require specification of the random effects distribution $G$, several studies of the performance of logistic models with random intercepts show that estimates of the fixed effects parameters $\beta$ are robust to misspecification of $G$. For example, using both approximations and simulations, Neuhaus et al. [14] showed that incorrectly assuming that $G$ was Gaussian produced fixed effects covariate effect estimates with very little bias (see Unbiasedness). Heagerty and Kurland [5] corroborated these findings but pointed out that more severe model misspecifications, such as the failure to model interactions of covariates and random effects, could yield biased estimates of covariate effects.

Contrasting the Marginal and Conditional Approaches

We return to the special case of correlated clusters and the notation used previously to describe them. Many investigators have considered alternative methods for clustered data that focus on models for the marginal expectation of the response,

$$E(Y_{ij} \mid x_{ij}) = \int y \int \cdots \int f(y \mid x_{ij}, u_i) \, dG(u_i) \, dy, \tag{3}$$

where $x_{ij}$ is the vector of covariate values associated with $Y_{ij}$. Some approaches, such as Generalized Estimating Equations (GEEs), do this without fully specifying the functional form of the joint distribution of the responses $Y_{i1}, \ldots, Y_{in_i}$ within the $i$th cluster.

GLMMs and marginal models are similar in that both parameterize the mean and covariance matrices of correlated groups of observations, and both base inferences on marginal likelihoods or marginal quasi-likelihoods of the observed data. However, the implications of modeling the response distribution conditional on random effects, as do GLMMs, rather than averaged over random effects, as do marginal models, are profound. With a nonlinear link function, the impact of a conditional main effect in the linear predictor varies, on the scale of $E(Y)$, with the values of accompanying fixed and random effects. Hence, in contrast with the linear case, when averaging over the random effect distribution, the mean linear predictor does not correspond to (transform to) the marginal conditional expectation $E_{u_i}[E(Y_{ij} \mid x_{ij}, u_i)]$. Similarly, linear predictors based on within-group covariate means do not transform to means of the response within the corresponding groups. Hence, as stated earlier, marginal approaches measure conceptually and numerically different covariate effects than do GLMMs. In some cases, scientific interest and inferential goals of a problem lead one clearly to the marginal or conditional specification of a model; in other cases, the appropriate direction is less clear. Since the distinctions between marginal models and GLMMs are commonly blurred in practice, it is useful to contrast them further.

Marginal models are most helpful when correlation among observations cannot be ignored, but neither the nature of the clustering that generates such correlations, nor the individual clusters themselves, are otherwise of particular scientific interest. This characterization often applies when public health impact is the focus of an investigation. From that perspective, there may be little concern with either intracluster factors or with predictions about the specific units, such as families observed cross-sectionally or individuals observed over time, that generate the
measurement clusters. In the back pain study mentioned earlier, in which practice style was constant within subjects over time, marginal modeling would be natural to study a presumed homogeneous effect of practice style across subjects and times, and any population time trend. (However, it would not be useful for quoting the odds of reduction over time in, say, the risk of activity limitation for an individual patient.)

In such circumstances, for example, where longitudinal observation is used for logistical reasons or statistical efficiency, and correlations are nuisance parameters in analysis, there is technical advantage in bypassing a conditional model (see Efficiency and Efficient Estimators). For example, GLMMs that mistakenly assume random effects that are homoscedastic can produce biased estimators [5, 6]. In contrast, estimates of marginal model parameters obtained using GEE are consistent, even if the association structure is misspecified.

In other circumstances, however, marginal models may not measure covariate effects of primary scientific interest, for example, in longitudinal studies in which explanatory variables change over time within a subject and interest is in how individuals respond to such changes. In the back pain study, investigators were interested specifically in assessing patterns of change over time in individual subjects. In such situations, GLMMs are more ambitious than marginal models in attempting to (i) parse, using explicit random effects and predicted values for them, sources of variation that produce correlated observations, and to (ii) portray and predict, for instance, shapes of individual longitudinal disease trajectories, vulnerabilities of individual litters to teratogenesis, breeding values of individual bulls, and susceptibilities of individual families to inheritable diseases. The price of this ambition is paid in more stringent assumptions, and in greater complexity of the model fitting process and computations.

If one is willing to pay such a price, then GLMMs may be used to make inferences about marginal distributions, even when purely marginal methods would be adequate. Marginal modeling, however, without such assumptions does not allow conditional inference, for example, about longitudinal trajectories typical of individual subjects. More specifically,

• Variations of Simpson’s paradox and the ecologic fallacy may apply at several levels. The marginal distribution may be of a different form from any conditional distribution and, indeed, the form of a conditional mixing distribution required for common marginal models may be quite unusual [17]. In extreme cases, features in every conditional model may be absent in the marginal model, for example, the marginal effect averaged across 2 × 2 conditional tables might be opposite in direction to those in each conditional table. More commonly, however, population average effects may simply understate the strengths of effects on individuals [18].
• GEE for marginal models may not estimate the variance–covariance structure efficiently and does not allow, without further assumptions, prediction of random effects. However, see [4, 16, 18] for developments in these directions.

For a more detailed critique of marginal modeling, see [12].

Summary

During the past decade, GLMMs have become an important statistical tool and now see heavy use for modeling correlated, nonnormally distributed data. Software for fitting GLMMs is starting to mature, and much experience has contributed to better appreciation of both the utility and pitfalls of the currently available techniques. GLMMs are a natural modeling approach for longitudinal data when changes within subjects are of interest.

References

[1] Butler, S.M. & Louis, T.A. (1992). Random effects models with non-parametric priors, Statistics in Medicine 11, 1981–2000. Disc:2017–2023.
[2] Follmann, D. & Lambert, D. (1989). Generalizing logistic regression by nonparametric mixing, Journal of the American Statistical Association 84, 295–300.
[3] Gilks, W.R., Richardson, S. & Spiegelhalter, D. (1996). Markov Chain Monte Carlo in Practice. Chapman & Hall, London.
[4] Heagerty, P. (1999). Marginally specified logistic-normal models for longitudinal binary data, Biometrics 55, 688–698.
[5] Heagerty, P. & Kurland, B. (2001). Misspecified maximum likelihood estimates and generalised linear mixed models, Biometrika 88, 973–986.
[6] Heagerty, P.J. & Zeger, S.L. (2000). Marginalized multilevel models and likelihood inference (with comments and a rejoinder by the authors), Statistical Science 15(1), 1–26.
[7] Kiefer, J. & Wolfowitz, J. (1956). Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters, Annals of Mathematical Statistics 27, 887–906.
[8] Korff, M., Barlow, W., Cherkin, D. & Deyo, R. (1994). Effects of practice style in managing back pain, Annals of Internal Medicine 121, 187–195.
[9] Laird, N. (1978). Nonparametric maximum likelihood estimation of a mixture distribution, Journal of the American Statistical Association 73, 805–811.
[10] Lesperance, M. & Kalbfleisch, J. (1992). An algorithm for computing the nonparametric MLE of a mixing distribution, Journal of the American Statistical Association 87, 120–126.
[11] Lindsay, B., Clogg, C. & Grego, J. (1991). Semiparametric estimation in the Rasch model and related exponential response models, including a simple latent class model for item analysis, Journal of the American Statistical Association 86, 96–107.
[12] Lindsey, J.K. & Lambert, P. (1998). On the appropriateness of marginal models for repeated measurements in clinical trials, Statistics in Medicine 17, 447–469.
[13] McCulloch, C.E. (1997). Maximum likelihood algorithms for generalized linear mixed models, Journal of the American Statistical Association 92, 162–170.
[14] Neuhaus, J.M., Hauck, W.W. & Kalbfleisch, J.D. (1992). The effects of mixture distribution misspecification when fitting mixed-effects logistic models, Biometrika 79, 755–762.
[15] Self, S. & Liang, K.-Y. (1987). Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions, Journal of the American Statistical Association 82, 605–610.
[16] Waclawiw, M.A. & Liang, K.-Y. (1993). Prediction of random effects in the generalized linear model, Journal of the American Statistical Association 88, 171–178.
[17] Wang, Z. & Louis, T. (2003). Matching conditional and marginal shapes in binary random intercept models using a bridge distribution function, Biometrika 90, 765–775.
[18] Zeger, S.L., Liang, K.-Y. & Albert, P.S. (1988). Models for longitudinal data: A generalized estimating equation approach (Corr: V45 p347), Biometrics 44, 1049–1060.

CHARLES E. MCCULLOCH &
JOHN M. NEUHAUS
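
As a numerical companion to the marginal likelihood in equation (2) and the marginal expectation in equation (3), the following is a minimal sketch for the simplest case only: a logistic model with one covariate and a single Gaussian random intercept per cluster. The function names, the parameter values (beta0 = 0, beta1 = 1, sigma = 2), and the use of Gauss-Hermite quadrature are illustrative assumptions and are not taken from the article; it is not a fitting routine, only an evaluation of the two integrals for given parameter values.

```python
import numpy as np


def expit(t):
    """Inverse logit link, g^{-1}(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + np.exp(-t))


def logit(p):
    """Logit link, g(p) = log(p / (1 - p))."""
    return np.log(p / (1.0 - p))


def normal_expectation(h, sigma, n_nodes=40):
    """Approximate E[h(u)] for u ~ N(0, sigma^2) by Gauss-Hermite quadrature.

    h must accept a 1-D array of u values and return one value per u.
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    u = np.sqrt(2.0) * sigma * nodes  # change of variables for N(0, sigma^2)
    return np.sum(weights * h(u)) / np.sqrt(np.pi)


def cluster_marginal_likelihood(y, x, beta0, beta1, sigma):
    """One cluster's contribution to the marginal likelihood, as in equation (2).

    y, x: binary responses and covariate values for the units in the cluster.
    The random intercept u is integrated out numerically (assumed Gaussian).
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)

    def conditional_likelihood(u):
        # Linear predictor for every (unit, quadrature node) pair.
        eta = beta0 + beta1 * x[:, None] + u[None, :]
        # prod over units of exp(y * eta) / (1 + exp(eta)), computed on log scale.
        return np.exp(np.sum(y[:, None] * eta - np.logaddexp(0.0, eta), axis=0))

    return normal_expectation(conditional_likelihood, sigma)


def marginal_mean(x, beta0, beta1, sigma):
    """Population-averaged E(Y | x), as in equation (3), for a scalar x."""
    return normal_expectation(lambda u: expit(beta0 + beta1 * x + u), sigma)


# Hypothetical parameter values, chosen only for illustration.
beta0, beta1, sigma = 0.0, 1.0, 2.0

# Conditional (cluster-specific) log odds ratio for a unit increase in x is beta1.
# The marginal (population-averaged) log odds ratio is obtained from equation (3).
marginal_log_odds_ratio = logit(marginal_mean(1.0, beta0, beta1, sigma)) - logit(
    marginal_mean(0.0, beta0, beta1, sigma)
)

print("conditional log odds ratio:", beta1)
print("marginal log odds ratio:   ", marginal_log_odds_ratio)
print(
    "likelihood of y = (1, 0, 1) at x = (0, 1, 2):",
    cluster_marginal_likelihood([1, 0, 1], [0.0, 1.0, 2.0], beta0, beta1, sigma),
)
```

Under these assumed values the marginal log odds ratio comes out smaller in magnitude than the conditional coefficient, which is the numerical face of the statement that population-averaged effects may understate the strengths of effects on individuals [18]. With more than one or two random effects per cluster this kind of quadrature becomes impractical, which is where the simulation-based methods cited in the text come in.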
