Handbook of Regression Analysis
With Applications in R
WILEY SERIES IN PROBABILITY AND STATISTICS
Established by WALTER A. SHEWHART and SAMUEL S. WILKS
Editors
David J. Balding, Noel A.C. Cressie, Garrett M. Fitzmaurice, Harvey
Goldstein, Geert Molenberghs, David W. Scott, Adrian F.M. Smith, and
Ruey S. Tsay
Editors Emeriti
Vic Barnett, Ralph A. Bradley, J. Stuart Hunter, J.B. Kadane, David G.
Kendall, and Jozef L. Teugels
A complete list of the titles in this series appears at the end of this volume.
Handbook of Regression
Analysis With Applications
in R
Second Edition
Samprit Chatterjee
New York University, New York, USA
Jeffrey S. Simonoff
New York University, New York, USA
This second edition first published 2020
© 2020 John Wiley & Sons, Inc
Edition History
Wiley-Blackwell (1e, 2013)
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in
any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by
law. Advice on how to obtain permission to reuse material from this title is available at
http://www.wiley.com/go/permissions.
The right of Samprit Chatterjee and Jeffrey S. Simonoff to be identified as the authors of this work has been
asserted in accordance with law.
Registered Office
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
Editorial Office
111 River Street, Hoboken, NJ 07030, USA
For details of our global editorial offices, customer services, and more information about Wiley products visit us
at www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that
appears in standard print versions of this book may not be available in other formats.
Dedicated to everyone who labors in the field
of statistics, whether they are students,
teachers, researchers, or data analysts.
Contents
Part I
The Multiple Linear Regression Model
2 Model Building
2.1 Introduction
2.2 Concepts and Background Material
2.2.1 Using Hypothesis Tests to Compare Models
2.2.2 Collinearity
2.3 Methodology
2.3.1 Model Selection
2.3.2 Example — Estimating Home Prices (continued)
2.4 Indicator Variables and Modeling Interactions
2.4.1 Example — Electronic Voting and the 2004 Presidential Election
2.5 Summary
Part II
Addressing Violations of Assumptions
Part III
Categorical Predictors
Part IV
Non-Gaussian Regression Models
Part V
Other Regression Models
Part VI
Nonparametric and Semiparametric Models
Bibliography
Index
Preface to the
Second Edition
The years since the first edition of this book appeared have been fast-moving
in the world of data analysis and statistics. Algorithmically-based methods
operating under the banner of machine learning, artificial intelligence, or
data science have come to the forefront of public perceptions about how to
analyze data, and more than a few pundits have predicted the demise of classic
statistical modeling.
To paraphrase Mark Twain, we believe that reports of the (impending)
death of statistical modeling in general, and regression modeling in particular,
are exaggerated. The great advantage that statistical models have over “black
box” algorithms is that in addition to effective prediction, their transparency
also provides guidance about the actual underlying process (which is crucial
for decision making), and affords the possibilities of making inferences and
distinguishing real effects from random variation based on those models.
There have been laudable attempts to encourage making machine learning
algorithms interpretable in the ways regression models are (Rudin, 2019), but
we believe that models based on statistical considerations and principles will
have a place in the analyst’s toolkit for a long time to come.
Of course, part of that usefulness comes from the ability to generalize
regression models to more complex situations, and that is the thrust of the
changes in this new edition. One thing that hasn’t changed is the philosophy
behind the book, and our recommendations on how it can be best used, and
we encourage the reader to refer to the preface to the first edition for guidance
on those points. There have been small changes to the original chapters, and
broad descriptions of those chapters can also be found in the preface to the
first edition. The five new chapters (Chapters 11, 13, 14, 15, and 16, with
the former Chapter 11 on nonlinear regression moving to Chapter 12) expand
greatly on the power and applicability of regression models beyond what
was discussed in the first edition. For this reason many more references are
provided in these chapters than in the earlier ones, since some of the material
in those chapters is less established and less well-known, with much of it still
the subject of active research. In keeping with that, we do not spend much
(or any) time on issues for which there still isn’t necessarily a consensus in the
statistical community, but point to books and monographs that can help the
analyst get some perspective on that kind of material.
Chapter 11 discusses the modeling of time-to-event data, often referred
to as survival data. The response variable measures the length of time until an
event occurs, and a common complication is that sometimes it is only known
that a response value is greater than some number; that is, it is right-censored.
This can naturally occur, for example, in a clinical trial in which subjects
enter the study at varying times, and the event of interest has not occurred at
the end of the trial. Analysis focuses on the survival function (the probability
of surviving past a given time) and the hazard function (the instantaneous
rate at which the event occurs at a given time, given survival to that
time). Parametric models based on appropriate distributions like the Weibull
or log-logistic can be fit that take censoring into account. Semiparametric
models like the Cox proportional hazards model (the most commonly-used
model) and the Buckley-James estimator are also available, which weaken
distributional assumptions. Modeling can be adapted to situations where
event times are truncated, and also when there are covariates that change over
the life of the subject.
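To make the role of right-censoring concrete, the following minimal sketch estimates the survival function from a small set of observed times. It uses the Kaplan-Meier product-limit estimator, which is not named in this preface and is chosen here only as a simple illustration; the function name and the data are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Estimate the survival function at each distinct event time.

    times  -- observed time for each subject
    events -- 1 if the event was observed, 0 if the time is right-censored
    Returns a list of (time, estimated probability of surviving past time).
    """
    # Count observed events (not censorings) at each distinct time.
    event_counts = Counter(t for t, e in zip(times, events) if e)
    survival = 1.0
    curve = []
    for t in sorted(event_counts):
        # A subject is at risk at t if its observed time is at least t;
        # censored subjects count as at risk up to their censoring time.
        at_risk = sum(1 for u in times if u >= t)
        survival *= 1.0 - event_counts[t] / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical data: five subjects, two of them right-censored (event = 0).
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t}: S(t) is approximately {s:.3f}")
```

Censored subjects never contribute an event, but they remain in the at-risk count until their censoring time, which is how the partial information "the response is greater than some number" enters the estimate.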
Chapter 13 extends applications to data with multiple observations for
each subject consistent with some structure from the underlying process. Such
data can take the form of nested or clustered data (such as students all in
one classroom) or longitudinal data (where a variable is measured at multiple
times for each subject). In this situation ignoring that structure results in an
induced correlation that reflects unmodeled differences between classrooms
and subjects, respectively. Mixed effects models generalize analysis of variance
(ANOVA) models and time series models to this more complicated situation.
Models with linear effects based on Gaussian distributions can be generalized
to nonlinear models, and also can be generalized to non-Gaussian distributions
through the use of generalized linear mixed effects models.
Modern data applications can involve very large (even massive) numbers of
predictors, which can cause major problems for standard regression methods.
Best subsets regression (discussed in Chapter 2) does not scale well to very
large numbers of predictors, and Chapter 14 discusses approaches that
do. Forward stepwise regression, in which potential predictors
are stepped in one at a time, is an alternative to best subsets that scales
to massive data sets. A systematic approach to reducing the dimensionality
of a chosen regression model is through the use of regularization, in which
the usual estimation criterion is augmented with a penalty that encourages
sparsity; the most commonly-used version of this is the lasso estimator, and it
and its generalizations are discussed further.
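The effect of a sparsity-encouraging penalty can be illustrated with a small sketch. It assumes the lasso criterion is written as (1/(2n)) times the residual sum of squares plus lambda times the sum of absolute coefficients, with centered data and no intercept, and it minimizes that criterion by cyclic coordinate descent. The scaling convention, the function names, and the data are assumptions made for illustration, not taken from the book.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam, returning exactly 0.0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for the (assumed) lasso criterion
    (1/(2n)) * sum((y_i - x_i . beta)^2) + lam * sum(|beta_j|).
    X is a list of rows; the data are assumed centered (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of predictor j with the partial residual
            z = 0.0    # sum of squares of predictor j
            for i in range(n):
                fit_others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_others)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n)
    return beta


# Hypothetical data: two orthogonal predictors; y = 2*x1 + 0.1*x2 exactly.
X = [[1, 1], [1, -1], [-1, 1], [-1, -1]]
y = [2.1, 1.9, -1.9, -2.1]
beta = lasso_cd(X, y, lam=0.5)
print(beta)
```

With lam=0.5 the weak second coefficient is set exactly to zero while the strong first coefficient is shrunk from 2 toward 1.5, which is the sense in which the penalty reduces the dimensionality of the fitted model.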
Chapters 15 and 16 discuss methods that move away from specified
relationships between the response and the predictor to nonparametric and
semiparametric methods, in which the data are used to choose the form of
the underlying relationship. In Chapter 15 linear or (explicitly specified)
nonlinear relationships are replaced with the notion of relationships taking the
form of smooth curves and surfaces. Estimation at a particular location is based
on local information; that is, the values of the response in a local neighborhood
of that location. This can be done through local versions of weighted least
squares (local polynomial estimation) or local regularization (smoothing
splines). Such methods can also be used to help identify interactions between
numerical predictors in linear regression modeling. Single predictor smoothing
SAMPRIT CHATTERJEE
Brooksville, Maine
JEFFREY S. SIMONOFF
New York, New York
October, 2019
Preface to the
First Edition
groups to each other. Data of this type often exhibit nonconstant variance
related to the different subgroups in the population, and the appropriate tool
to address this issue, weighted least squares, is also a focus here.
Chapters 8 through 10 examine the situation where the nature of the
response variable is such that Gaussian-based least squares regression is no
longer appropriate. Chapter 8 focuses on logistic regression, designed for
binary response data and based on the binomial random variable. While
there are many parallels between logistic regression analysis and least squares
regression analysis, there are also issues that come up in logistic regression
that require special care. Chapter 9 uses the multinomial random variable to
generalize the models of Chapter 8 to allow for multiple categories in the
response variable, outlining models designed for response variables that either
do or do not have ordered categories. Chapter 10 focuses on response data in
the form of counts, where distributions like the Poisson and negative binomial
play a central role. The connection between all these models through the
generalized linear model framework is also exploited in this chapter.
The final chapter focuses on situations where linearity does not hold,
and a nonlinear relationship is necessary. Although these models are based on
least squares, from both an algorithmic and inferential point of view there
are strong connections with the models of Chapters 8 through 10, which we
highlight.
This Handbook can be used in several different ways. First, a reader may
use the book to find information on a specific topic. An analyst might want
additional information on, for example, logistic regression or autocorrelation.
The chapters on these (and other) topics provide the reader with this subject
matter information. As noted above, the chapters also include at least one
analysis of a data set, a clarification of computer output, and reference to
sources where additional material can be found. The chapters in the book are
to a large extent self-contained and can be consulted independently of other
chapters.
The book can also be used as a template for what we view as a reasonable
approach to data analysis in general. This is based on the cyclical paradigm
of model formulation, model fitting, model evaluation, and model updating
leading back to model (re)formulation. Statistical significance of test statistics
does not necessarily mean that an adequate model has been obtained. Further
analysis needs to be performed before the fitted model can be regarded as
an acceptable description of the data, and this book concentrates on this
important aspect of regression methodology. Detection of deficiencies of fit
is based on both testing and graphical methods, and both approaches are
highlighted here.
This preface is intended to indicate ways in which the Handbook can
be used. Our hope is that it will be a useful guide for data analysts, and will
help contribute to effective analyses. We would like to thank our students and
colleagues for their encouragement and support. We hope we have provided
them with a book of which they would approve. We would like to thank Steve
Quigley, Jackie Palmieri, and Amy Hendrickson for their help in bringing this
manuscript to print. We would also like to thank our families for their love
and support.
SAMPRIT CHATTERJEE
Brooksville, Maine
JEFFREY S. SIMONOFF
New York, New York
August, 2012
Part One