Applied Longitudinal Analysis
Preface
The first edition of Applied Longitudinal Analysis was designed to serve as a textbook
for a course on modern statistical methods for longitudinal data analysis, and subse-
quently, as a reference resource for students and researchers. The book was targeted
at a broad audience: graduate students in statistics, statisticians working in the health
sciences, pharmaceutical industry, and governmental health-related agencies, as well
as researchers and graduate students from a variety of substantive fields. In the seven
years that have elapsed since publication of the first edition, Applied Longitudinal
Analysis has been used extensively in university classrooms throughout the United
States and abroad. We are grateful to many colleagues, course instructors, students,
and readers who have offered constructive suggestions on how the book could be
improved. This feedback has been invaluable and helped shape the content of the
second edition.
The feedback we received has encouraged us to retain the general structure and
format of the first edition while taking the opportunity to introduce a number of new
and important topics. Although there is much new material in this second edition, the
principles that guided us in writing the first edition have not changed. Our primary
goal is to present a rigorous and comprehensive description of modern statistical meth-
ods for the analysis of longitudinal data that is accessible to a wide range of readers.
A strong emphasis is placed on the application of these methods to longitudinal data
and the interpretation of results. Although the methods are presented in the setting of
numerous applications to actual data sets drawn from studies in health-related fields,
reflecting our own research interests in the health sciences, they apply equally to other
areas of application, for example, education, psychology, and other branches of the
behavioral and social sciences.
How does this edition differ from its predecessor? The major changes in this
edition have resulted from the addition of six new chapters:
1. A chapter (Chapter 9) on "fixed effects models," in which subject-specific effects
are treated as fixed rather than random, has been added. This chapter complements
the existing chapter on mixed effects models (Chapter 8) and includes a discussion
of the relative advantages of these two classes of models.
2. In the first edition, a single chapter was devoted to marginal models and generalized
estimating equations (GEE) that focused exclusively on binary and count data. We
now devote two chapters (Chapters 12 and 13) to marginal models and GEE, with
new material on models for ordinal responses, residual diagnostics, and issues
that arise when modeling time-varying covariates.
3. A chapter (Chapter 15) on approximate methods for generalized linear mixed
effects models discusses penalized quasi-likelihood (PQL) and marginal quasi-
likelihood (MQL) methods. We highlight settings where these approximations
are unlikely to be accurate and can yield biased estimates of effects.
4. A second chapter (Chapter 18) on missing data and dropout, focusing on multiple
imputation and inverse probability weighting (IPW) methods, has been added. To
give greater prominence to methods for accounting for missing data and dropout
in longitudinal analyses, the two companion chapters (Chapters 17 and 18) now
appear before the Advanced Topics part of the book.
5. A chapter (Chapter 19) on smoothing longitudinal data has been added to the
Advanced Topics. This chapter focuses on the connection between penalized
splines and linear mixed effects models.
6. A chapter on sample size and power (Chapter 20) has been added to the Advanced
Topics. This chapter considers issues of sample size, power, number of repeated
measurements, and study duration for longitudinal study designs.
In addition, the chapter on residual analyses and diagnostics (Chapter 10) has been
revised to include material on recently developed model-checking techniques based on
cumulative sums of residuals, and the chapters that review generalized linear models
(Chapter 11) and generalized linear mixed effects models (Chapter 14) have been
updated to include new material on models for ordinal data and on methods for handling
overdispersion. Finally, extra problem sets have been added to many of the chapters.
As in the first edition, the prerequisites for a course based on this book are an in-
troductory course in statistics and a strong background in regression analysis. Some
previous exposure to generalized linear models (e.g., logistic regression) would be
helpful, although these models are reviewed in detail in the text. An understanding of
matrix algebra or calculus is not assumed. Although we do not assume a high level of
mathematical preparation, we have written this book for the motivated reader who is
willing to consider mathematical ideas. The more technical or mathematical sections
of the book are signposted with asterisks and may be omitted at first reading without
loss of continuity.
The methods described in this book require the use of appropriate statistical
software. As before, we include illustrative SAS commands for performing the analy-
ses presented throughout the text at the end of many chapters, with basic descriptions
of their usage. Because many of the analyses we discuss can be performed using
alternative software packages (e.g., R, S-Plus, Stata, and SPSS), this book can be sup-
plemented with any one of them. Readers are encouraged to perform and verify the
results of analyses using statistical software of their choice. Programming statements
and computer output for selected examples, prepared using SAS, Stata, and R, can
be downloaded from the website: www.biostat.harvard.edu/~fitzmaur/ala2e.
Because statistical software is constantly evolving, we will endeavor to update the
website as new procedures become available in the major statistical software pack-
ages. The thirty-two real data sets used throughout the text and problem sets to
illustrate the applications of longitudinal methods also can be downloaded from the
website.
We hope this second edition of Applied Longitudinal Analysis provides a broader
foundation in modern methods for the analysis of longitudinal data and will prove a
worthy successor to the first edition. The original impetus for writing this book arose
from teaching a graduate-level course on "Applied Longitudinal Analysis" at the
Harvard School of Public Health. We are especially grateful to the students who have
participated in the course since its inception almost twenty years ago; we have learned
much from these extraordinary students. The collection of individuals who gave us
useful feedback on the first edition is far too long to list. However, we would like to
thank the many friends and colleagues who have helped us with this project. A special
word of thanks to Amy Herring and Russell Localio. We thank Amy for her many
helpful and constructive suggestions on how the book could be improved. We thank
Russell for reading a draft of the new chapters and for providing invaluable feedback
and suggestions that improved their content. Thanks also to Nick Horton, Stu Lipsitz,
and Caitlin Ravichandran for their helpful suggestions and insightful comments on
several chapters. Finally, we thank Steve Quigley and Susanne Steitz-Filler of Wiley,
for their advice and encouragement during all stages of this project.
GARRETT M. FITZMAURICE
NAN M. LAIRD
JAMES H. WARE
Boston, Massachusetts
May, 2011
Preface to First Edition
Our goal in writing this book is to provide a rigorous and systematic description of
modern methods for analyzing data from longitudinal studies. In recent years there
have been remarkable developments in methods for longitudinal analysis. Despite
these important advances, the methods have been somewhat slow to move into the
mainstream. Applied Longitudinal Analysis bridges the gap between theory and
application by presenting a comprehensive account of these methods in a way that is
accessible to a wide range of readers.
The impetus for this book arose from teaching a graduate-level course on "Applied
Longitudinal Analysis" at the Harvard School of Public Health. As course instructors,
we were frustrated by the lack of a suitable textbook that adequately covered modern
statistical methods for longitudinal analysis at a level accessible to a broad audience
of researchers and graduate students in the health and medical sciences. We envision
this book as a textbook for such a course and, subsequently, as a reference resource
for researchers and graduate students. It is also suitable for graduate students in
statistics and for statisticians already working in the health sciences, governmental
health-related agencies, and the pharmaceutical industry. It is intended to allow a
diverse group of statisticians, researchers, and graduate students in substantive fields
to master modern methods for longitudinal data analysis.
The scope of this book is broad, covering methods for the analysis of diverse
types of longitudinal data arising in the health sciences. The methods are pre-
sented in the setting of numerous applications to real data sets. Our main em-
phasis is on the practical rather than the theoretical aspects of longitudinal anal-
ysis. Twenty-five real data sets, drawn from studies in health-related fields, are
used throughout the text and problem sets to illustrate the applications of longitu-
dinal methods. These data sets can be downloaded from the website for the book:
www.biostat.harvard.edu/~fitzmaur/ala. Although the methods are applied
to data sets drawn from the health sciences, they apply equally to other areas of ap-
plication, for example, education, psychology, and other branches of the behavioral
and social sciences.
Because longitudinal data are a special case of clustered data, albeit with a natural
ordering of the measurements within a cluster, we include also a description of modern
methods for analyzing clustered data, more broadly defined. Indeed, one of our
goals is to demonstrate that methods for longitudinal analysis are, more or less,
special cases of more general regression methods for clustered data. As a result, a
comprehensive understanding of longitudinal data analysis provides the basis for a
broader understanding of methods for analyzing the wide range of clustered data that
commonly arises in studies in the biomedical and health sciences.
The prerequisites for a course based on this book are an introductory course in
statistics and a strong background in regression analysis. Some previous exposure to
generalized linear models (e.g., logistic regression) would be helpful, although these
models are reviewed in the text. An understanding of matrix algebra or calculus is
not assumed; the reader will be gently introduced to only those aspects of vector and
matrix notation necessary for understanding the matrix representation of regression
models for longitudinal data. Because vectors and matrices are used to simplify
notation, the reader is required to attain some basic facility with the addition and
multiplication of vectors and matrices. Although we do not assume a high level
of mathematical preparation, a willingness to read and consider mathematical ideas
is required. More technical or mathematical sections of the book are marked with
asterisks and may be omitted at first reading without loss of continuity.
To use the methods described in this book, appropriate statistical software is re-
quired. In general, the methods available via commercially available software lag
behind the recent advances in statistical methods; longitudinal data analysis is not
exceptional in this regard. Recently the introduction of new programs for analyzing
multivariate and longitudinal data has made these methods far more accessible to
practitioners and students. We use SAS, which is widely available, to perform the
analyses presented throughout the text. Illustrative SAS commands are included at
the end of many of the chapters, with basic descriptions of their usage. Program-
ming statements and computer output for the examples, prepared using SAS, can be
downloaded from the website: www.biostat.harvard.edu/~fitzmaur/ala. We
selected SAS because all of the analyses we discuss can be performed using its pro-
cedures. Many of the methods can be carried out using alternative software packages
(e.g., S-Plus and Stata) or special purpose programs (e.g., BMDP-5V) and this book
can be supplemented with any one of them. Readers are encouraged to perform and
verify the results of analyses using software of their choice. Because statistical soft-
ware is constantly evolving, we anticipate that all of the methods we discuss will soon
be available within most of the major statistical packages.
Throughout the text references have been kept to an absolute minimum. Instead,
at the end of each chapter we include suggestions for further readings that provide
more in-depth coverage of certain topics. We also include "bibliographic notes" that
highlight key references in the mainstream statistical literature. Although many of
our readers may find the latter references to be too technical, they are included to give
due credit to those who have contributed to the statistical methods described in each
chapter.
Finally, we would like to thank the many friends and colleagues who have helped us
to write this book. A special word of thanks to Misha Salganik, for preparation of the
diagrams and many helpful suggestions for improvement of graphical displays. We
are especially grateful to Joe Hogan and Russell Localio, for reading a first draft and
providing invaluable feedback, comments, and suggestions that improved the book.
We would also like to thank Rino Bellocco, Brent Coull, Nick Horton, Sharon-Lise
Normand, Misha Salganik, Judy Singer, S. V. Subramanian, and Florin Vaida, for their
insightful comments on several chapters. We are grateful to the students who have
participated in the course on "Applied Longitudinal Analysis" at the Harvard School
of Public Health since its inception; they have provided the impetus and motivation
for writing this book. We gratefully acknowledge support from grant GM 29745 from
the National Institutes of Health. The first author gratefully acknowledges support
from the Junior Faculty Sabbatical Program at the Harvard School of Public Health;
the support provided by a sabbatical created a unique opportunity to begin writing
this book. Last, but not least, we thank Steve Quigley and Susanne Steitz of Wiley,
for their advice and encouragement during all stages of this project.
GARRETT M. FITZMAURICE
NAN M. LAIRD
JAMES H. WARE
Boston, Massachusetts
March, 2004
Acknowledgments
Throughout this book we have used data sets drawn from published studies in health-
related fields to exemplify important concepts in the analysis of longitudinal and
clustered data. We are grateful to the following investigators for sharing their data with
us: Graham Bentham, Doug Dockery, Brian Flay, Robert Greenberg, Keith Henry,
Aviva Must, Elena Naumova, George Rhoads, Jan Schouten, Linda Van Marter, and
Gwen Zahner.
We also thank the following publishers for permission to reproduce published data
sets in print and electronic format: The American Statistical Association, Blackwell
Publishing, Brooks/Cole (a division of Thomson Learning), CRC Press, Elsevier,
Iowa State Press, Oxford University Press, and SAS Institute, Inc.
Finally, in all data sets used throughout this book, the original subject identification
(ID) numbers have been deleted and replaced with new subject ID numbers, to ensure
that the data sets cannot be linked to the original records.
Part I
Introduction to Longitudinal
and Clustered Data
1
Longitudinal and
Clustered Data
1.1 INTRODUCTION
Research on statistical methods for the design and analysis of human investigations
expanded explosively in the second half of the twentieth century. Beginning in the
early 1950s, the U.S. government shifted a substantial part of its research support from
military to biomedical research. The legislative foundation for the modern National
Institutes of Health (NIH), the Public Health Service Act, was passed in 1944 and
NIH grew rapidly throughout the 1950s and 1960s. During these "golden years" of
NIH expansion, the entire NIH budget grew from $8 million in 1947 to more than
$1 billion in 1966. The NIH sponsored many of the important epidemiologic studies
and clinical trials of that period, including the influential Framingham Heart Study
(Dawber et al., 1951; Dawber, 1980).
The typical focus of these early studies was morbidity and, especially, mortality.
Investigators sought to identify the causes of early death and to evaluate the effective-
ness of treatments for delaying death and morbidity. In the Framingham Heart Study,
participants were seen at two-year intervals. Survival outcomes during successive
two-year periods were treated as independent events and modeled using multiple lo-
gistic regression. The successful use of multiple logistic regression in this setting,
and the recognition that it could be applied to case-control data, led to widespread use
of this methodology beginning in the 1960s. The analysis of time-to-event data was
revolutionized by the seminal 1972 paper of D.R. Cox, describing the proportional
hazards model (Cox, 1972). This paper was followed by a rich and important body
of work that established the conceptual basis and the computational tools for modern
survival analysis.
Although the design of the Framingham Heart Study and other cohort studies called
for periodic measurement of the patient characteristics thought to be determinants of
chronic disease, interest in the levels and patterns of change of those characteristics
over time was initially limited. As the research advanced, however, investigators
began to ask questions about the behavior of these risk factors. In the Framingham
Heart Study, for example, investigators began to ask whether blood pressure levels in
childhood were predictive of hypertension in adult life. In the Coronary Artery Risk
Development in Young Adults (CARDIA) Study, investigators sought to identify the
determinants of the transition from normotensive or normocholesterolemic status in
early adult life to hypertension and hypercholesterolemia in middle age (Friedman
et al., 1988). In the treatment of arthritis, asthma, and other diseases that are not
typically life-threatening, investigators began to study the effects of treatments on the
level and change over time in measures of severity of disease. Similar questions were
being posed in every disease setting. Investigators began to follow populations of
all ages over time, both in observational studies and clinical trials, to understand the
development and persistence of disease and to identify factors that alter the course of
disease development.
This interest in the temporal patterns of change in human characteristics came at
a period when advances in computing power made new and more computationally
intensive approaches to statistical analysis available at the desktop. Thus, in the early
1980s, Laird and Ware proposed the use of the EM algorithm to fit a class of linear
mixed effects models appropriate for the analysis of repeated measurements (Laird
and Ware, 1982); Jennrich and Schluchter (1986) proposed a variety of alternative
algorithms, including Fisher-scoring and Newton-Raphson algorithms. Later in the
decade, Liang and Zeger introduced the generalized estimating equations in the bio-
statistical literature and proposed a family of generalized linear models for fitting
repeated observations of binary and counted data (Liang and Zeger, 1986; Zeger
and Liang, 1986). Many other investigators writing in the biomedical, educational,
and psychometric literature contributed to the rapid development of methodology for
the analysis of these "longitudinal" data. The past 30 years have seen considerable
progress in the development of statistical methods for the analysis of longitudinal
data. Despite these important advances, methods for the analysis of longitudinal data
have been somewhat slow to move into the mainstream. This book bridges the gap be-
tween theory and application by presenting a comprehensive description of methods
for the analysis of longitudinal data accessible to a broad range of readers.
The defining feature of longitudinal studies is that measurements of the same individ-
uals are taken repeatedly through time, thereby allowing the direct study of change
over time. The primary goal of a longitudinal study is to characterize the change in
response over time and the factors that influence change. With repeated measures
on individuals, one can capture within-individual change. Indeed, the assessment of
within-subject changes in the response over time can only be achieved within a
longitudinal study design. For example, in a cross-sectional study, where the response is
measured at a single occasion, one can only obtain estimates of between-individual
differences in the response. That is, a cross-sectional study may allow comparisons
among sub-populations that happen to differ in age, but it does not provide any infor-
mation about how individuals change during the corresponding period.
To highlight this important distinction between cross-sectional and longitudinal
study designs, consider the following simple example. Body fatness in girls is thought
to increase just before or around menarche, leveling off approximately 4 years after
menarche. Suppose that investigators are interested in determining the increase in
body fatness in girls after menarche. In a cross-sectional study design, investigators
might obtain measurements of percent body fat on two separate groups of girls: a
group of 10-year-old girls (a pre-menarcheal cohort) and a group of 15-year-old girls
(a post-menarcheal cohort). In this cross-sectional study design, direct comparison of
the average percent body fat in the two groups of girls can be made using a two-sample
(unpaired) t-test. This comparison does not provide an estimate of the change in body
fatness as girls age from 10 to 15 years. The effect of growth or aging, an inherently
within-individual effect, simply cannot be estimated from a cross-sectional study that
does not obtain measures of how individuals change with time. In a cross-sectional
study the effect of aging is potentially confounded with possible cohort effects. Put
in a slightly different way, there are many characteristics that differentiate girls in
these two different age groups that could distort the relationship between age and
body fatness. On the other hand, a longitudinal study that measures a single cohort
of girls at both ages 10 and 15 can provide a valid estimate of the change in body
fatness as girls age. In the longitudinal study the analysis is based on a paired t-test,
using the difference or change in percent body fat within each girl as the outcome
variable. This within-individual comparison provides a valid estimate of the change
in body fatness as girls age from 10 to 15 years. Moreover, since each girl acts as
her own control, changes in percent body fat throughout the duration of the study are
estimated free of any between-individual variation in body fatness.
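
As a rough illustration of the last point, the following Python sketch simulates both designs and compares the standard errors of the estimated change. It is only a sketch under stated assumptions: the numerical values (a mean of 20 percent body fat at age 10, a between-girl standard deviation of 5, a measurement-noise standard deviation of 1, a true increase of 4 percentage points, and 200 girls per group) and the function names are hypothetical and are not taken from any data set in the text. The simulation does not include cohort effects, so it shows only the gain in precision from within-individual comparison, not the confounding that can affect the cross-sectional estimate.

```python
import math
import random
import statistics


def unpaired_t(group_a, group_b):
    """Pooled-variance two-sample t-test quantities for independent groups.

    Returns (difference in means, standard error, t statistic).
    """
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * statistics.variance(group_a)
        + (nb - 1) * statistics.variance(group_b)
    ) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    diff = statistics.mean(group_b) - statistics.mean(group_a)
    return diff, se, diff / se


def paired_t(first, second):
    """Paired t-test quantities based on within-individual changes.

    Returns (mean change, standard error, t statistic).
    """
    changes = [s - f for f, s in zip(first, second)]
    se = statistics.stdev(changes) / math.sqrt(len(changes))
    diff = statistics.mean(changes)
    return diff, se, diff / se


def simulate_body_fat(n, rng, true_change=4.0, between_sd=5.0, noise_sd=1.0):
    """Simulate percent body fat at ages 10 and 15 for n girls (hypothetical values).

    Each girl has her own stable level, which is the source of
    between-individual variation; both measurements share that level.
    """
    levels = [rng.gauss(20.0, between_sd) for _ in range(n)]
    age10 = [u + rng.gauss(0.0, noise_sd) for u in levels]
    age15 = [u + true_change + rng.gauss(0.0, noise_sd) for u in levels]
    return age10, age15


rng = random.Random(2011)  # fixed seed so the run is reproducible
n = 200

# Cross-sectional design: two separate groups of girls, one measured at each age.
cs_age10, _ = simulate_body_fat(n, rng)
_, cs_age15 = simulate_body_fat(n, rng)
cross_sectional = unpaired_t(cs_age10, cs_age15)

# Longitudinal design: a single cohort of girls measured at both ages.
long_age10, long_age15 = simulate_body_fat(n, rng)
longitudinal = paired_t(long_age10, long_age15)

print("cross-sectional: diff=%.2f  se=%.2f  t=%.1f" % cross_sectional)
print("longitudinal:    diff=%.2f  se=%.2f  t=%.1f" % longitudinal)
```

Under these assumptions the paired analysis should report a noticeably smaller standard error, because each girl's stable level cancels in the within-individual difference, whereas it remains part of the variability of each group mean in the unpaired comparison.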
A distinctive feature of longitudinal data is that they are clustered. In longitudi-
nal studies the clusters are composed of the repeated measurements obtained from a
single individual at different occasions. Observations within a cluster will typically
exhibit positive correlation, and this correlation must be accounted for in the analysis.
Longitudinal data also have a temporal order; the first measurement within a clus-
ter necessarily comes before the second measurement, and so on. The ordering of
the repeated measures has important implications for analysis. There are, however,
many studies in the health sciences that are not longitudinal in this sense but which
give rise to data that are clustered or cluster-correlated. For example, clustered data
commonly arise when intact groups are randomized to health interventions or when
naturally occurring groups in the population are randomly sampled. An example of
the former is group-randomized trials. In a group-randomized trial, also known as
a cluster-randomized trial, groups of individuals, rather than each individual alone,
are randomized to different treatments or health interventions. Data on the health
outcomes of interest are obtained on all individuals within a group. Alternatively,
clustered data can arise from random sampling of naturally occurring groups in the