
Psychological Methods © 2013 American Psychological Association

2013, Vol. 18, No. 3, 335–351 1082-989X/13/$12.00 DOI: 10.1037/a0032553

Managing Heteroscedasticity in General Linear Models

Patrick J. Rosopa
Clemson University

Meline M. Schaffer
Walmart, Bentonville, Arkansas

Amber N. Schroeder
Western Kentucky University

Heteroscedasticity refers to a phenomenon where data violate a statistical assumption known as homoscedasticity (i.e., constant error variance). When the homoscedasticity assumption is violated, this can lead to increased Type I error rates or decreased statistical power. Because this can adversely affect substantive conclusions, the failure to detect and manage heteroscedasticity could have serious implications for theory, research, and practice. In addition, heteroscedasticity is not uncommon in the behavioral and social sciences. Thus, in the current article, we synthesize extant literature in applied psychology, econometrics, quantitative psychology, and statistics, and we offer recommendations for researchers and practitioners regarding available procedures for detecting heteroscedasticity and mitigating its effects. In addition to discussing the strengths and weaknesses of various procedures and comparing them in terms of existing simulation results, we describe a 3-step data-analytic process for detecting and managing heteroscedasticity: (a) fitting a model based on theory and saving residuals, (b) analyzing the residuals, and (c) drawing statistical inferences (e.g., hypothesis tests and confidence intervals) involving parameter estimates. We also demonstrate this data-analytic process using an illustrative example. Overall, detecting violations of the homoscedasticity assumption and mitigating its biasing effects can strengthen the validity of inferences from behavioral and social science data.

This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.
This document is copyrighted by the American Psychological Association or one of its allied publishers.

Keywords: heteroscedasticity, heterogeneity of variance, nonconstant variance, homoscedasticity, homogeneity of variance

Empirical tests of theoretical models in the behavioral sciences often rely on assumptions to ensure the accuracy of estimated parameters and statistical tests. General linear models are among the most frequently used data-analytic procedures in psychology and related disciplines (Cohen, Cohen, West, & Aiken, 2003; Dawes & Corrigan, 1974; DeShon & Morris, 2004; Judd, 2000; Stone-Romero, Weaver, & Glenar, 1995). In these models, an important assumption is homoscedasticity (the term typically used in linear regression) or homogeneity of variance (the term typically used in analysis of variance; Fox, 2008; King, Rosopa, & Minium, 2010; Rencher, 2000). When this assumption is violated, the violation is typically referred to as heteroscedasticity or heterogeneity of variance, respectively. It leads to incorrect standard errors, which can cause inflated Type I error rates or decreased statistical power (Box, 1954; DeShon & Alexander, 1996; White, 1980; Wilcox, 1997). Because this can adversely affect substantive conclusions, the failure to detect and manage heteroscedasticity could have serious implications for theory, research, and practice. In addition, heteroscedasticity is not uncommon in psychology and allied fields (see Aguinis & Pierce, 1998; Alexander & Govern, 1994; Antonakis & Dietz, 2011; Cai & Hayes, 2008; DeShon & Alexander, 1996; Ghiselli & Sanders, 1967; Grissom, 2000; Kahneman & Ghiselli, 1962; Olejnik, 1988; Overton, 2001; Rosopa, 2006). Therefore, in the current article, we review research in applied psychology, econometrics, quantitative psychology, and statistics. We also offer recommendations for researchers and practitioners regarding available procedures for detecting and managing heteroscedasticity.
Our article is organized as follows. First, we describe homoscedasticity and heteroscedasticity. Second, we describe how heteroscedasticity can manifest in statistical models commonly used to analyze data in the behavioral and social sciences. Third, we describe how heteroscedasticity can serve as a useful indicator in the initial stages of research design. Fourth, based on a review of extant literature, we describe the data-analytic process for managing heteroscedasticity. In addition, we integrate an illustrative example of the data-analytic process.

Patrick J. Rosopa, Department of Psychology, Clemson University; Meline M. Schaffer, Global Organizational Effectiveness, Walmart, Bentonville, Arkansas; Amber N. Schroeder, Department of Psychology, Western Kentucky University.

Portions of this article were presented at the 25th Annual Conference of the Society for Industrial and Organizational Psychology in Atlanta, Georgia. We would like to thank Huy Le, Rich Pak, Xiaogang Su, Kevin Stagl, and Tom Zagenczyk for their constructive comments on earlier versions of the manuscript.

Correspondence concerning this article should be addressed to Patrick J. Rosopa, Department of Psychology, College of Business & Behavioral Science, Clemson University, 418 Brackett Hall, Clemson, SC 29634-1355. E-mail: prosopa@clemson.edu

Homoscedasticity Versus Heteroscedasticity

In general linear models, homoscedasticity is an assumption that is required to ensure the accuracy of standard errors and asymptotic covariances among estimated parameters. With heteroscedasticity, the estimated parameters remain unbiased and consistent, but the estimated covariance matrix among the parameter estimates will be incorrect (Rencher, 2000). This can lead to low statistical power or inflated Type I error rates (Box, 1954; DeShon & Alexander, 1996).

Formally, the general linear model for N observations and p predictors (or regressors) can be compactly expressed in matrix form as

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \tag{1}$$

where y is an N × 1 response vector and X is an N × (p + 1) model matrix that includes a leading column vector of 1s. β is a (p + 1) × 1 vector of unknown parameters (viz., β₀, β₁, . . ., βₚ) to be estimated, and ε is an N × 1 vector of population errors. Estimation of β is important for arriving at an estimate of ε. Procedures for the unbiased estimation of β are described elsewhere (see, e.g., Cohen et al., 2003; Rencher, 2000; Seber & Lee, 2003).

Of particular interest in the present article are the population errors (i.e., ε) in Equation 1, which are assumed to follow the same distribution. That is, the εᵢ's (for i = 1, 2, . . ., N) are assumed to follow a normal distribution. They are also assumed to be independently and identically distributed with a mean of 0 and a common variance of σ² (Fox, 2008; Rencher, 2000; Seber & Lee, 2003).¹ This last assumption of a common variance is homoscedasticity. Stated differently, in the general linear model, ε is assumed to have a diagonal N × N covariance matrix given by

$$\operatorname{cov}(\boldsymbol{\varepsilon}) = \sigma^2\mathbf{I}_N = \begin{pmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & \cdots & \vdots \\ \vdots & \vdots & \ddots & 0 \\ 0 & 0 & \cdots & \sigma^2 \end{pmatrix}, \tag{2}$$

where I_N is an identity matrix of order N (Fox, 2008; Rencher, 2000; Seber & Lee, 2003). Note the common variance on the main diagonal in Equation 2. In contrast, heteroscedasticity is said to exist when the variances are no longer the same. In matrix form, this can be denoted by

$$\operatorname{cov}(\boldsymbol{\varepsilon}) = \sigma^2\mathbf{V} = \sigma^2 \begin{pmatrix} w_1 & 0 & \cdots & 0 \\ 0 & w_2 & \cdots & \vdots \\ \vdots & \vdots & \ddots & 0 \\ 0 & 0 & \cdots & w_N \end{pmatrix} = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & \vdots \\ \vdots & \vdots & \ddots & 0 \\ 0 & 0 & \cdots & \sigma_N^2 \end{pmatrix}, \tag{3}$$

where V is an N × N diagonal matrix and wᵢ represents an arbitrary scaling factor; we use the normalization tr(V) = N. We can conveniently express σᵢ² = σ²wᵢ. With heteroscedasticity, σᵢ² ≠ σᵢ′² for some i and i′. Note that when wᵢ = 1 for all N observations, Equation 3 reduces to Equation 2 for the usual homoscedastic linear model (Rencher, 2000; Seber & Lee, 2003).²

Notably, the population errors (i.e., the εᵢ's) are not directly observed in a sample (Cook & Weisberg, 1982). However, as mentioned above, we can obtain an estimate of β, say, β̂ (by the method of ordinary least squares). From it we obtain sample-based estimates of the errors, known as residuals (eᵢ), and use these to assess the appropriateness of the assumptions regarding the population errors (Cook & Weisberg, 1982). Then, with the N × 1 vector of residuals (viz., e = y − Xβ̂), an unbiased estimator of the common σ² can be denoted by

$$\hat{\sigma}^2 = \frac{(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})}{N - p - 1}. \tag{4}$$

This estimator is based on an average of the squared eᵢ's (i.e., the mean square error). With Equation 4, we can estimate the (p + 1) × (p + 1) covariance matrix among the regression coefficients with the usual estimator (Fox, 2008; Rencher, 2000; Seber & Lee, 2003), denoted by

$$\operatorname{cov}(\hat{\boldsymbol{\beta}}) = \hat{\sigma}^2(\mathbf{X}'\mathbf{X})^{-1}. \tag{5}$$

With heteroscedasticity, however, the elements in Equation 5 will be incorrect. The standard errors (viz., the square roots of the diagonal elements in cov(β̂)) are often too large, although they can also be too small depending on the form of the heteroscedasticity (see Footnote 4). In addition, the ordinary least squares estimates are no longer efficient. Thus, statistical inferences (e.g., hypothesis tests, confidence intervals) will be incorrect (Fox, 2008; Rencher, 2000; Seber & Lee, 2003).

To detect violations of homoscedasticity in a sample, the residuals (i.e., the eᵢ's) are needed. The homoscedasticity assumption is said to be tenable when the spread of residuals is approximately constant across all predictors, combinations of predictors, and predicted values (Darlington, 1990; Fox, 2008). When the spread of the residuals is no longer approximately constant (e.g., a wedge-shaped or butterfly-shaped pattern), heteroscedasticity is said to exist.

¹ For unbiased parameter estimation, the normality assumption for the distribution of the εᵢ's is not required. However, it is a required assumption for statistical inferences (e.g., hypothesis tests, confidence intervals; see Rencher, 2000; Seber & Lee, 2003).
² It deserves noting that this applies both to the statistical assumption of homoscedasticity (see Equation 2) and to its violation (see Equation 3). In either case, it can be further assumed that the conditional residual variances come from participants, or more generally units (Shadish, Cook, & Campbell, 2002), with an equivalent pattern of scores on the predictor or predictors (see Cohen et al., 2003; Ghiselli, Campbell, & Zedeck, 1981).

Heteroscedasticity in Commonly Used Statistical Models

Heteroscedasticity can manifest in statistical models used for analyzing data in the behavioral and social sciences, and it can take various forms. In the following sections, we provide examples of different patterns of heteroscedasticity using three statistical approaches subsumed by the general linear model (see Equation 1): (a) testing for the equality of two independent means, (b) simple linear regression, and (c) analysis of covariance.

Testing for the Equality of Two Independent Means

With two independent samples of observations on y, researchers may be interested in testing the equality of two independent population means (i.e., H₀: μ₁ = μ₂). When testing for the equality of two independent means, p = 1, and the model matrix (X) in Equation 1 can be expressed as follows:
$$\mathbf{X} = [\mathbf{1}_N \;\; \mathbf{d}] = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ \vdots & \vdots \\ 1 & 0 \\ 1 & 1 \\ \vdots & \vdots \\ 1 & 1 \end{bmatrix},$$

where 1_N denotes the usual N × 1 column vector of 1s and d is an N × 1 vector denoting group membership (i.e., a dummy variable). In d, the n 0s (where n < N) indicate membership in Group 1 and the (N − n) 1s indicate membership in Group 2. Assuming the normality and homoscedasticity assumptions have been satisfied, the t test with N − 2 degrees of freedom associated with the regression coefficient for d (i.e., β̂₁ = μ̂₂ − μ̂₁) provides the usual test for the equality of two independent means.

When homoscedasticity is violated, however, Equation 2 no longer holds. Instead, the main diagonal has a common variance for Group 1 (σ₁²) and a common variance for Group 2 (σ₂²) such that σ₁² ≠ σ₂².³ Thus, we have a special case of Equation 3 expressed as

$$\operatorname{cov}(\boldsymbol{\varepsilon}) = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 & 0 \\ 0 & \sigma_1^2 & \cdots & \vdots & \vdots \\ \vdots & \vdots & \ddots & & \\ \vdots & & & \sigma_2^2 & 0 \\ 0 & 0 & \cdots & 0 & \sigma_2^2 \end{pmatrix} = \begin{pmatrix} \sigma_1^2\mathbf{I}_n & \mathbf{0} \\ \mathbf{0} & \sigma_2^2\mathbf{I}_{N-n} \end{pmatrix}. \tag{6}$$

For example, if the common variances in Group 1 and Group 2 were three and one, respectively, Equation 6 would simply be

$$\operatorname{cov}(\boldsymbol{\varepsilon}) = \begin{pmatrix} 3 & 0 & \cdots & 0 & 0 \\ 0 & 3 & \cdots & \vdots & \vdots \\ \vdots & \vdots & \ddots & & \\ \vdots & & & 1 & 0 \\ 0 & 0 & \cdots & 0 & 1 \end{pmatrix}.$$

When evaluated subjectively, Figures 1a and 1b graphically depict the absence and presence of heteroscedasticity, respectively, in a test of the equality of two independent means.

It deserves noting that the test of the equality of two independent means is a special case of analysis of variance (ANOVA; Fox, 2008; Rencher, 2000). Thus, the model matrix (X) in Equation 1 can include additional dummy variables (e.g., p = 3) for, say, a one-way ANOVA with four independent groups. In addition, Equation 6 can be generalized to heteroscedasticity in a one-way ANOVA. In that case, the diagonal elements are unequal and change as a function of group membership.

Simple Linear Regression

When a researcher has scores on a continuous predictor (x), he or she may be interested in testing whether the slope of x is zero when predicting y. With simple linear regression, p = 1, and the model matrix (X) in Equation 1 can be expressed as follows:

$$\mathbf{X} = [\mathbf{1}_N \;\; \mathbf{x}] = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_N \end{bmatrix},$$

where x is a continuous predictor. Assuming the normality and homoscedasticity assumptions have been satisfied, the t test with N − 2 degrees of freedom associated with the regression coefficient for x provides the usual test of the null hypothesis that the slope is equal to zero.

When homoscedasticity is violated, however, Equation 2 no longer holds. The variances of the population errors (i.e., ε) may be functions of x (e.g., increasing with x). Then, we have a special case of Equation 3 expressed as

$$\operatorname{cov}(\boldsymbol{\varepsilon}) = \sigma^2 \begin{pmatrix} f(x_1) & 0 & \cdots & 0 \\ 0 & f(x_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & f(x_N) \end{pmatrix}, \tag{7}$$

with functions of the xs on the diagonal (Rencher, 2000). When evaluated subjectively, Figures 2a and 2b depict plots of the residuals against a continuous predictor in the absence and presence of heteroscedasticity, respectively, as may occur when conducting a simple linear regression. In Figure 2b, the variability of the residuals increases with x. Other patterns of heteroscedasticity may manifest (e.g., butterfly shape or galaxy shape). However, Darlington (1990) noted that the "most common type of heteroscedasticity occurs when . . . [the conditional distribution of y] . . . is largest for the highest or lowest values of some regressor or combination of regressors" (p. 360).⁴

³ Testing for differences between independent means in the presence of heteroscedasticity has been termed the Behrens–Fisher problem. The name refers to the researchers who proposed approximations to the sampling distribution of a test statistic that employs separate variance estimators. The Behrens–Fisher problem is particularly problematic when sample sizes across groups are unequal. When the larger variance is paired with the group with the larger sample size (i.e., direct pairing), Type I error rates become conservative and, ceteris paribus, power generally decreases. In contrast, when the larger variance is paired with the group with the smaller sample size (i.e., indirect pairing), Type I error rates become liberal and, ceteris paribus, power increases (albeit illegitimately). We thank an anonymous reviewer for highlighting this important point. This has been studied by Box (1954) and reviewed by Glass, Peckham, and Sanders (1972). The Behrens–Fisher problem has also been studied when testing for slope differences (DeShon & Alexander, 1996; Overton, 2001; Rosopa, 2006) and in multivariate analysis of variance (S. J. Kim, 1992).
⁴ It deserves mentioning that the standard errors (i.e., the square roots of the diagonal elements in Equation 5) will be affected differently depending on the degree and the nature of the heteroscedasticity. For example, with butterfly-shaped heteroscedasticity associated with a regressor, the diagonal element in Equation 5 associated with this regressor will be underestimated. In contrast, with galaxy-shaped heteroscedasticity associated with a regressor, the corresponding diagonal element in Equation 5 will be overestimated (Darlington, 1990). See also the Behrens–Fisher problem in Footnote 3. We thank an anonymous reviewer for raising this important issue.
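The direct-versus-indirect pairing described in Footnote 3 can be checked numerically. The following is a minimal sketch that assumes NumPy and runs through these steps:

- It builds the two-group model matrix X = [1_N d].
- It takes a diagonal error covariance with separate group variances, shaped like Equation 6.
- It compares the true sampling variance of β̂₁ (the mean difference) with the expected value of the usual estimator from Equations 4 and 5.

The helper names (ols_fit, true_cov, expected_usual_cov, two_group_design) and the group sizes of 20 and 60 are hypothetical choices for illustration. The true covariance formula (X′X)⁻¹X′ΣX(X′X)⁻¹ is a standard result, not an equation given in this article.

```python
import numpy as np

def ols_fit(X, y):
    """OLS fit returning beta_hat, residuals e, sigma^2 hat (Eq. 4), and the usual cov(beta_hat) (Eq. 5)."""
    n, k = X.shape                                    # k = p + 1 columns, including the leading 1s
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ X.T @ y
    resid = y - X @ beta_hat                          # e = y - X beta_hat
    sigma2_hat = resid @ resid / (n - k)              # Eq. 4: SSE / (N - p - 1)
    return beta_hat, resid, sigma2_hat, sigma2_hat * xtx_inv   # last item is Eq. 5

def true_cov(X, Sigma):
    """Actual sampling covariance of the OLS estimates when cov(eps) = Sigma."""
    xtx_inv = np.linalg.inv(X.T @ X)
    return xtx_inv @ X.T @ Sigma @ X @ xtx_inv

def expected_usual_cov(X, Sigma):
    """Expected value of the usual estimator (Eq. 5) when cov(eps) = Sigma."""
    n, k = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    M = np.eye(n) - X @ xtx_inv @ X.T                 # residual-maker matrix: e = M y
    expected_sigma2 = np.trace(M @ Sigma) / (n - k)   # E[e'e] = tr(M Sigma)
    return expected_sigma2 * xtx_inv

def two_group_design(n1, var1, n2, var2):
    """Hypothetical helper: X = [1_N d] (d = 0 for Group 1, 1 for Group 2) and a diagonal Sigma as in Eq. 6."""
    d = np.r_[np.zeros(n1), np.ones(n2)]
    X = np.column_stack([np.ones(n1 + n2), d])
    Sigma = np.diag(np.r_[np.full(n1, var1), np.full(n2, var2)])
    return X, Sigma

# Hypothetical sizes: Group 1 has 20 cases, Group 2 has 60; variances 3 and 1 as in the Eq. 6 example.
for label, (v1, v2) in {"indirect pairing": (3.0, 1.0), "direct pairing": (1.0, 3.0)}.items():
    X, Sigma = two_group_design(20, v1, 60, v2)
    ratio = expected_usual_cov(X, Sigma)[1, 1] / true_cov(X, Sigma)[1, 1]
    print(f"{label}: E[usual variance] / true variance of the mean difference = {ratio:.3f}")
# indirect pairing -> 0.595 (usual SE too small: liberal test)
# direct pairing   -> 1.675 (usual SE too large: conservative test)

# One simulated sample, to show Eqs. 4 and 5 applied to data (hypothetical coefficients 10 and 2).
rng = np.random.default_rng(2013)
X, Sigma = two_group_design(20, 3.0, 60, 1.0)
y = X @ np.array([10.0, 2.0]) + rng.normal(0.0, np.sqrt(np.diag(Sigma)))
beta_hat, resid, sigma2_hat, cov_usual = ols_fit(X, y)
print("beta_hat:", beta_hat.round(3), " usual SE of mean difference:", np.sqrt(cov_usual[1, 1]).round(3))
```

With the larger variance in the smaller group, the usual variance is only about 60% of the true value, which is consistent with the liberal Type I error rates noted in Footnote 3. Reversing the pairing makes the usual variance too large. The expectations are computed exactly rather than by simulation, so the comparison is free of Monte Carlo noise.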
Figure 1. Boxplot of residuals against a categorical predictor with two levels in the presence of (a) homoscedasticity and (b) heteroscedasticity. The error bars in Figure 1 depict the minimum and maximum values.

Figure 2. Scatterplot of residuals against a continuous predictor, x, in the presence of (a) homoscedasticity and (b) heteroscedasticity.

Analysis of Covariance

With analysis of covariance (ANCOVA), researchers are traditionally interested in whether means across two (or more) independent groups differ after adjusting for one or more covariates. Consider an ANCOVA with one covariate and two independent groups, with n observations in Group 1 and (N − n) observations in Group 2. Based on Equation 1, its model matrix (X) can be expressed as

$$\mathbf{X} = [\mathbf{1}_N \;\; \mathbf{x}_c \;\; \mathbf{d}] = \begin{bmatrix} 1 & x_1 - \bar{x} & 0 \\ 1 & x_2 - \bar{x} & 0 \\ \vdots & \vdots & \vdots \\ 1 & x_N - \bar{x} & 1 \end{bmatrix},$$

where x_c is the covariate vector of length N expressed in deviation score form (Fox, 2008). Assuming that the normality and homoscedasticity assumptions are met, the t test with N − 3 degrees of freedom associated with the regression coefficient for d (i.e., β̂₂) provides the usual test for the equality of the adjusted means. Note that the unambiguous interpretation of the adjusted means (typically, at the grand mean of x, x̄) assumes the regression slopes are the same across groups (Rutherford, 1992). When the regression slopes are the same, the regression lines for the two groups will be parallel to one another. That is, when x is used to predict y in Group 1 and x is used to predict y in Group 2, the population regression slope associated with x in each group should be the same. Only then can the adjusted mean difference on y between the two groups be interpreted unambiguously.

When heteroscedasticity exists, however, Equation 2 no longer holds. Although the variances of ε could differ in various ways, we consider how ε may differ within each of two groups. For convenience, assume that x_c is placed in ascending order separately within each group. The n residuals within Group 1 could have
increasing variability as x increases (see Figure 2b). For Group 2, the (N − n) residuals could be fairly constant (i.e., homoscedastic) or perhaps could have decreasing variability as x increases.

In sum, heteroscedasticity can occur in statistical models commonly used in the behavioral and social sciences. Therefore, the use of data-analytic techniques focused on detecting heteroscedasticity and mitigating its effects is necessary.

Heteroscedasticity as a Signal of Nonstandardized Treatment Implementation

Although heteroscedasticity can manifest in statistical models frequently used in the behavioral sciences, it can also serve as an indicator of an experimental treatment that may not have been executed in a standardized manner (Bryk & Raudenbush, 1988). Consider the following example. When pilot testing a research protocol, participants are administered a pretest, are randomly assigned to one of two experimental conditions, and are then measured on a posttest. Differences in the estimated variances of the residuals across the two experimental conditions could signal that the implementation was not standardized (e.g., the instructions or the tasks were unclear for participants in one of the groups).

Note that estimated mean or slope differences are not the focus here; such parameter estimates remain unbiased in the presence of heteroscedasticity (Rencher, 2000). Instead, it is the standard errors and the estimated covariances among parameter estimates that will be incorrect. Thus, in the initial research design stage, suppose the research protocol (e.g., equipment, scripts, adequate participant practice) can be executed without inducing heteroscedasticity. This could suggest that an adequate level of standardization has been achieved. However, the presence of heteroscedasticity may also indicate that the treatment effect was not fixed or that not all participants reacted to the treatment in a homogeneous manner. For this reason, the nature of the heteroscedasticity should be examined further to determine at which level it occurs. For instance, is there variability between those who are implementing the treatment or conducting the experiment, or is the variability between participants overall? The careful analysis of heteroscedasticity may play an important role in psychological research.

Data-Analytic Process and an Illustrative Example

In this section, we discuss the data-analytic process that can be employed to diagnose heteroscedasticity and mitigate its effects. We present a set of procedures based on our synthesis of literature in applied psychology, econometrics, quantitative psychology, and statistics, providing a general resource for researchers and practitioners. In particular, we discuss saving residuals (Step 1), analysis of residuals (Step 2), and statistical inferences involving parameter estimates (Step 3). In addition, to demonstrate the data-analytic process while also comparing different procedures, we incorporate an illustrative example. After describing Step 1, we introduce the illustrative example and apply Step 1 to these data. Next, we review procedures associated with Step 2 and apply Step 2 to the same example. Finally, we describe various procedures associated with Step 3 and complete our demonstration by applying the last step to the same example data.

It deserves noting that when the homoscedasticity assumption is violated, researchers should take corrective action. Consistent with Ohtani and Toyoda (1980) and Ali and Giaccotto (1984), we do not suggest that researchers assume the presence of heteroscedasticity or assume that they know its exact form. Rather, its presence and form should be diagnosed (Step 2 below) and, if heteroscedasticity is present, an ameliorative procedure should be applied (Step 3 below).

Step 1. Estimate a Model Based on Theory and Save Residuals

The first step in examining heteroscedasticity is to estimate a model that includes relevant predictors based on theory, that is, proper model specification. From this analysis, the residuals should be saved. The residuals are required to evaluate whether heteroscedasticity exists, regardless of the complexity of the model or the number or type of predictors (e.g., categorical, continuous) suggested by theory.⁵ Standard statistical software packages allow the user to save residuals (e.g., as a new object, a new data file, or an additional column in an existing file). Note that for some statistical analyses (e.g., ANOVA, the two-independent-samples t test), it is not necessary to save the residuals. In these cases, the variances can be analyzed directly within each of the independent groups.⁶

It is important to note that studentized residuals have been recommended by some researchers (Cook, 1977; Stevens, 1984), particularly when using graphical approaches to detect heteroscedasticity (Fox, 2008). Both internally and externally studentized residuals can be calculated. Because the specific type of residual does not markedly change the data-analytic process, we direct the interested reader to Belsley, Kuh, and Welsch (1980), Cook and Weisberg (1982), and Meloun and Militky (2001) for a discussion of the relative merits of each type.

Illustrative example. In this section, we illustrate how Step 1 is applied to a set of data. For this illustration, participants (N = 78) completed a pretest (x), were randomly assigned to one of two conditions, and then completed a posttest (y). Those in Group 1 (n = 40) were in the control group, and those in Group 2 (N − n = 38) were in the treatment group. A researcher hypothesized that the slope of the pretest when predicting the posttest would be more positive in the treatment group than in the control group. The data for this illustration can be found in the Appendix, along with the population parameters used to generate this sample of data.

Here, we are testing for the equality of regression slopes with two independent groups (i.e., an interaction between x and d). This is, of course, a generalization of ANCOVA. Instead of assuming homogeneous regression slopes across the levels of d, the model allows for the possibility of unequal regression slopes (Rutherford, 1992). This analysis is also referred to as moderated multiple regression with a continuous predictor and a categorical moderator (Aguinis, 2004; Overton, 2001). For our illustration, the model matrix (X) can be expressed as

$$\mathbf{X} = [\mathbf{1}_N \;\; \mathbf{x} \;\; \mathbf{d} \;\; \mathbf{x} \odot \mathbf{d}], \tag{8}$$

where ⊙ denotes the Hadamard (elementwise) product (Schott, 2005).

⁵ Residuals based on other estimation procedures can be used to evaluate the tenability of the homoscedasticity assumption. However, based on Monte Carlo simulations, Ali and Giaccotto (1984) found that tests that use ordinary least squares residuals tend to provide the greatest power to detect heteroscedasticity.
⁶ We thank an anonymous reviewer for alerting us to this issue.
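Step 1 can be sketched for the design in Equation 8. The code below assumes NumPy. It builds X = [1_N x d x ⊙ d], computing the Hadamard product with NumPy's elementwise `*`. It then fits the model by ordinary least squares and keeps the unstandardized residuals for Step 2.

The Appendix data are not reproduced here. The pretest scores, population coefficients, and error SD below are therefore hypothetical values chosen only for illustration; only the group sizes (40 and 38) come from the example. The function names `moderated_design` and `fit_and_save_residuals` are likewise hypothetical.

```python
import numpy as np

def moderated_design(x, d):
    """Model matrix of Eq. 8: [1_N, x, d, x ⊙ d]; x * d is the elementwise (Hadamard) product."""
    return np.column_stack([np.ones_like(x), x, d, x * d])

def fit_and_save_residuals(X, y):
    """Step 1 (sketch): OLS coefficients and unstandardized residuals e = y - X beta_hat."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta_hat, y - X @ beta_hat

# Hypothetical data only; these are NOT the Appendix data or parameters from the article.
rng = np.random.default_rng(78)
n1, n2 = 40, 38                                   # control (Group 1) and treatment (Group 2) sizes
d = np.r_[np.zeros(n1), np.ones(n2)]              # dummy variable: 0 = control, 1 = treatment
x = rng.normal(50.0, 10.0, n1 + n2)               # pretest scores (hypothetical scale)
beta_true = np.array([5.0, 0.8, -10.0, 0.3])      # hypothetical: treatment slope 0.3 larger
y = moderated_design(x, d) @ beta_true + rng.normal(0.0, 5.0, n1 + n2)   # posttest scores

X = moderated_design(x, d)
beta_hat, resid = fit_and_save_residuals(X, y)

# Usual t test for beta_3 (slope difference); trustworthy only if homoscedasticity holds.
df = X.shape[0] - X.shape[1]                      # N - 4 = 74
sigma2_hat = resid @ resid / df                   # Eq. 4
se_b3 = np.sqrt(sigma2_hat * np.linalg.inv(X.T @ X)[3, 3])   # square root of a diagonal element of Eq. 5
t_b3 = beta_hat[3] / se_b3
print(f"beta_3 hat = {beta_hat[3]:.3f}, t({df}) = {t_b3:.2f}")
# `resid` holds the residuals to carry into Step 2 (the article saves them as a RESID column).
```

Substituting the Appendix values for the simulated x, d, and y should give residuals corresponding to the RESID column. The t statistic shown is the usual one, which is why the article defers any conclusion about β̂₃ until Steps 2 and 3 are complete.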
Assuming the normality and homoscedasticity assumptions have been satisfied, the t test on the regression coefficient associated with the last column in Equation 8 (i.e., β̂₃) with (N − 4) degrees of freedom provides the usual test for the equality of regression slopes across the two independent groups (viz., Group 2's slope − Group 1's slope). Note that this analysis is a standard procedure prior to interpreting a traditional ANCOVA (Rutherford, 1992). That is, prior to interpreting means across independent groups after adjusting for a covariate, researchers assess whether slopes are the same across groups. If slopes are not the same across groups, then the adjusted means will differ depending on the specific value of the covariate. Notably, testing for the equality of regression slopes is also commonly used in research on Aptitude × Treatment interactions (Smith & Sechrest, 1991) and differential prediction (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1999; Saad & Sackett, 2002).

In this example, we used the general linear model with the model matrix (based on the researcher's theory) shown in Equation 8. The regression coefficients were estimated using ordinary least squares. The overall model was statistically significant, F(3, 74) = 74.53, p < .001 (R² = .75). The t test associated with β̂₃ = .69 was not statistically significant at α (Type I error rate) = .05, t(74) = 1.98, p > .05. However, we refrain from concluding that the slopes do not differ significantly from one another because Steps 2 and 3 have not yet been completed. To complete Step 1, the unstandardized residuals from our analysis were saved. They appear in the Appendix in the column labeled RESID.

Step 2. Investigate Compliance With Homoscedasticity Assumption: Residual Analysis

The next step is to analyze the saved residuals, for which various approaches exist. Drawing on diverse literatures, we have classified them into statistical, graphical, and heuristic approaches.

Statistical approaches. Table 1 summarizes the statistical procedures that we review, including their main strengths and weaknesses. When the variance of the residuals changes as a function of a categorical predictor (e.g., gender or type of treatment), a number of statistical tests are available. If the normality assumption has not been violated, Bartlett's (1937) test has been shown to be more powerful than other procedures (Games, Winkler, & Probert, 1972). Bartlett's test essentially compares the logarithm of the pooled variance with the logarithms of the separate group variances. Its statistic is approximately distributed as χ² with degrees of freedom equal to the number of levels of the categorical predictor minus one. Bartlett's procedure has also been recommended by DeShon and Alexander (1996) to detect heteroscedasticity between independent groups when testing for slope differences.

Because Bartlett's (1937) test is sensitive to nonnormality (Box, 1953; Levene, 1960), other procedures may be recommended when the variance of the residuals changes as a function of a categorical predictor. For example, Brown and Forsythe's (1974) test is considered robust when data are nonnormal (Conover, Johnson, & Johnson, 1981) and is often used in ANOVA. Note that Brown and Forsythe's approach is a modified version of Levene's (1960) test. Other approaches that are robust to violations of the normality assumption have been developed by O'Brien (1981) and Wilcox (2002).

The Brown and Forsythe (1974) test is relatively easy to calculate because it is simply an ANOVA on the absolute values of the residuals around their respective group medians. Thus, it may be quite accessible for a wide variety of researchers and practitioners. O'Brien's (1981) procedure is not available in statistical software, but Rosopa, Schroeder, and Doll (2013) provided code that calculates this test in R, a free, open-source statistical software environment. In addition, O'Brien's procedure can be used to investigate main and interaction effects on the residual variances. That is, the residual variances may change not only due to the effects of two categorical predictors but also due to their interaction.

Wilcox (2002) described a procedure that can complement the Brown and Forsythe (1974) test. Specifically, Wilcox considered tests of the hypothesis of equal variances across two independent groups. He showed that the mean half-square successive difference statistic, when combined with a modified percentile bootstrap, performs well in terms of Type I error and probability coverage. However, this procedure may be limited because it has not been extended beyond two independent groups.
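The two categorical-predictor tests discussed above can be computed directly from residuals grouped by the levels of the predictor. The sketch below assumes NumPy and implements two statistics:

- Bartlett's statistic, with its usual scaling correction.
- The Brown and Forsythe statistic, computed as a one-way ANOVA on absolute deviations from the group medians.

P values would come from the χ² and F reference distributions, respectively. The function names and the simulated residuals are hypothetical. SciPy's `scipy.stats.bartlett` and `scipy.stats.levene` with `center='median'` (the Brown–Forsythe variant) are established implementations against which these sketches can be checked.

```python
import numpy as np

def bartlett_statistic(*groups):
    """Bartlett (1937): compares the log pooled variance with the log group variances.

    Returns (T, df); under H0 and normality, T is approximately chi-square with k - 1 df.
    """
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    s2 = np.array([np.var(g, ddof=1) for g in groups])        # unbiased within-group variances
    N = n.sum()
    s2_pooled = np.sum((n - 1) * s2) / (N - k)
    numerator = (N - k) * np.log(s2_pooled) - np.sum((n - 1) * np.log(s2))
    correction = 1 + (np.sum(1 / (n - 1)) - 1 / (N - k)) / (3 * (k - 1))
    return numerator / correction, k - 1

def brown_forsythe_statistic(*groups):
    """Brown & Forsythe (1974): one-way ANOVA F on |value - group median|.

    Returns (F, (df1, df2)) with df1 = k - 1 and df2 = N - k.
    """
    z = [np.abs(np.asarray(g, dtype=float) - np.median(g)) for g in groups]
    k = len(z)
    n = np.array([len(g) for g in z], dtype=float)
    N = n.sum()
    grand_mean = np.concatenate(z).mean()
    group_means = np.array([g.mean() for g in z])
    ss_between = np.sum(n * (group_means - grand_mean) ** 2)
    ss_within = sum(np.sum((g - g.mean()) ** 2) for g in z)
    F = (ss_between / (k - 1)) / (ss_within / (N - k))
    return F, (k - 1, int(N - k))

# Hypothetical residuals grouped by a two-level categorical predictor (not the article's data).
rng = np.random.default_rng(1)
resid_group1 = rng.normal(0.0, 1.0, 40)
resid_group2 = rng.normal(0.0, 2.0, 38)
print("Bartlett:", bartlett_statistic(resid_group1, resid_group2))
print("Brown-Forsythe:", brown_forsythe_statistic(resid_group1, resid_group2))
```

Centering on the median rather than the mean is what distinguishes the Brown and Forsythe version from Levene's original test. The median centering is also the source of its robustness to nonnormality.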

Table 1
Summary of Selected Statistical Approaches for Detecting Heteroscedasticity

| Procedure | Type of predictors | Strengths (+) and weaknesses (−) |
|---|---|---|
| Bartlett (1937) | Categorical | Good power levels when data are normal (+); sensitive to nonnormality (−) |
| Brown & Forsythe (1974) | Categorical | Computationally simple (+); robust when data are nonnormal (+) |
| Levene (1960) | Categorical | Computationally simple (+); sensitive to nonnormality (−) |
| O’Brien (1981) | Categorical | Robust when data are nonnormal (+); tests for main and interactive effects (+) |
| Wilcox (2002) | Categorical | Control over Type I and Type II errors (+); applicable to two groups only (−) |
| Score testᵃ | Continuous and/or categorical | Flexible approach (+); can use fitted values as predictors (+); requires normality assumption (−) |
| White (1980) | Continuous and/or categorical | Low power (−); nondiagnostic (−) |

ᵃ Developed independently by Breusch and Pagan (1979) and Cook and Weisberg (1983).
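The score test listed in Table 1 is built from two ordinary least squares regressions, as the text that follows describes. The minimal Python sketch below (standard library only) covers the common case of a single focal variable z, such as d in the illustrative example or the predicted values. With one focal variable, the statistic (SSR/2) ÷ (SSE/N)² is referred to a χ² distribution with 1 degree of freedom. The sketch assumes the residuals from the model of interest have already been saved. score_test is a hypothetical name, and the toy data are invented for illustration.

```python
import math

def score_test(residuals, z):
    """Score test (Breusch-Pagan / Cook-Weisberg) for one focal variable z.

    residuals: residuals saved from the model of interest (first analysis).
    z: the variable believed to drive the residual variance (e.g., d).
    Returns (statistic, df, p_value); df is 1 because only z is used.
    """
    n = len(residuals)
    if n != len(z):
        raise ValueError("residuals and z must have the same length")
    sse = sum(e * e for e in residuals)      # SSE from the first analysis
    u = [e * e for e in residuals]           # squared residuals
    u_bar = sum(u) / n
    z_bar = sum(z) / n
    sxx = sum((zi - z_bar) ** 2 for zi in z)
    if sxx == 0:
        raise ValueError("z must vary")
    sxy = sum((zi - z_bar) * (ui - u_bar) for zi, ui in zip(z, u))
    # Second analysis: simple regression (with intercept) of u on z;
    # its regression sum of squares is SSR = Sxy^2 / Sxx.
    ssr = sxy ** 2 / sxx
    stat = (ssr / 2) / (sse / n) ** 2        # (SSR/2) / (SSE/N)^2
    # Upper-tail chi-square probability with 1 df: erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, 1, p_value

# Toy residuals whose spread is larger when z = 1 (illustrative only).
stat, df, p = score_test([1, -1, 2, -2], [0, 0, 1, 1])
print(round(stat, 2), df, round(p, 3))   # 0.72 1 0.396
```

When there are several focal variables, the second regression becomes a multiple regression, and the degrees of freedom equal the number of those variables. A linear-algebra library would then be the natural tool.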

Overall, when residual variances are suspected to change as a function of a categorical predictor, Bartlett’s (1937) test appears to be preferred when the normality assumption is met. However, if the normality assumption is violated, we suggest the use of Brown and Forsythe’s (1974) procedure. If there are multiple categorical predictors that may be interacting in their effects on the residual variances, then we suggest O’Brien’s (1981) test. Based on the results of the selected statistical approach, if the test is statistically significant at some predetermined α, then heteroscedasticity appears to be a function of the categorical predictor. Otherwise, homoscedasticity remains tenable.

When the variance of the residuals changes as a function of a continuous predictor (e.g., age), a categorical predictor (e.g., gender), or the predicted values, a flexible approach is the score test that was proposed independently in the econometrics (Breusch & Pagan, 1979) and statistics (Cook & Weisberg, 1983) literatures. This test assesses whether known variables (or predicted values) are systematically related to the residual variances. To construct the test statistic, two ordinary least squares regression analyses are required. In the first analysis, the model of interest is fitted (e.g., Equation 8), and the sum-of-squares error (SSE; i.e., the quantity in the numerator of Equation 4) is obtained. In the second analysis, the squared residuals from the first analysis are regressed on the predictors believed to be the cause of the heteroscedasticity, and the regression sum of squares (SSR) from this analysis is obtained. The test statistic, (SSR/2) ÷ (SSE/N)², is asymptotically distributed as χ² with degrees of freedom equal to the number of variables used to predict the squared residuals in the second analysis. The score test assumes that the errors are normally distributed.

Although White (1980) developed another general test of heteroscedasticity, we are not aware of a general test with power comparable to that of the score test. White’s test tends to have low power (Ali & Giaccotto, 1984) and is not diagnostic. That is, if the null hypothesis of White’s test is rejected, it remains unknown how the homoscedasticity assumption was violated. The heteroscedasticity could be a function of one or more variables (continuous and/or categorical), but the test does not identify the cause. Therefore, overall, for a general test of heteroscedasticity, we suggest the score test (Breusch & Pagan, 1979; Cook & Weisberg, 1983). If the results of the score test are statistically significant at some predetermined α, then heteroscedasticity appears to be a function of some focal variable (e.g., a continuous predictor or the predicted values). Otherwise, homoscedasticity remains tenable.

Note that some statistical procedures are available in software packages, whereas others may need to be programmed by the user. Table 2 summarizes which statistical procedures are

Table 2
Availability of Procedures in Various Software Packages

Procedure  R  SAS  IBM SPSS  STATA  SYSTAT

Residual analysis (Step 2)
Bartlett (1937) ✓ ✓ ✓ ✓ ✓
Brown & Forsythe (1974) ✓ ✓ ✓ ✓ ✓
Levene (1960) ✓ ✓ ✓ ✓ ✓
O’Brien (1981) ᵇ
Wilcox (2002) ✓
Score testᵃ ✓ ✓ ✓
White (1980) ✓ ✓
Graphical approaches ✓ ✓ ✓ ✓ ✓
Statistical inferences (Step 3)
Weighted least squares ✓ ✓ ✓ ✓ ✓
HCCM ✓ ✓ ᶜ ᶜ
Randomization tests
Mann–Whitney U test ✓ ✓ ✓ ✓
Wilcoxon’s matched pairs test ✓ ✓ ✓ ✓
Least median squares ✓ ✓ ✓
Least trimmed squares ✓ ✓ ✓
Generalized M-estimators ✓ ✓ ✓
Theil–Sen estimator ✓ ✓ ✓
HC4 with a wild bootstrap ᵈ ✓ ✓
Theil–Sen estimator with a percentile bootstrap ᵈ ✓ ✓
Welch–Satterthwaite test ✓ ✓ ✓ ✓ ✓
F* approximationᵉ
J approximationᵉ,ᶠ
A approximationᵉ,ᶠ
Variance-stabilizing transformations ✓ ✓ ✓ ✓ ✓

Note. HCCM = heteroscedasticity-consistent covariance matrix; HC4 = heteroscedasticity-consistent covariance matrix 4.
ᵃ Developed independently by Breusch and Pagan (1979) and Cook and Weisberg (1983).
ᵇ Rosopa, Schroeder, and Doll (2013) provided R code for this procedure.
ᶜ Hayes and Cai (2007) provided macros for this procedure in these software packages.
ᵈ Ng and Wilcox (2010) provided R code for this procedure.
ᵉ Stand-alone programs and syntax were offered by DeShon and Alexander (1996).
ᶠ Aguinis, Petersen, and Pierce (1999) provided a free online tool that calculates this approximation.

currently available in commonly used statistical software in the behavioral and social sciences.

It deserves mentioning that some literature suggests that researchers should not conduct preliminary tests of homoscedasticity when conducting tests of location (Sawilowsky, 2002). In simulations involving two independent groups, Zimmerman (2004) calculated both conditional and unconditional Type I error rates. He defined the conditional Type I error rate as the probability of rejecting a true null hypothesis (at α) when the choice of test depended on a preliminary Levene’s (1960) test at α. If Levene’s test was not statistically significant, Student’s t test was conducted at α; otherwise, Welch’s (1938) test was conducted at α. Zimmerman also calculated the unconditional Type I error rate for both Student’s t and Welch’s test, that is, the probability of a Type I error for each test when no preliminary Levene’s test was conducted. Overall, the conditional Type I error rate deviated considerably from the nominal α, whereas an unconditional Welch’s test provided the best control over the Type I error rate. Thus, Zimmerman suggested that tests for the equality of two independent means should be conducted unconditionally with Welch’s test, particularly when sample sizes are unequal.⁷ Similarly, in a more general case, Ng and Wilcox (2011) found that when a preliminary test for heteroscedasticity is conducted prior to statistical tests on ordinary least squares regression coefficients, Type I error rates are not adequately controlled at the nominal level.

⁷ We thank an anonymous reviewer for directing us to this study.

Zimmerman’s (2004) suggestion is an interesting one. He varied the degree of heteroscedasticity from mild to extreme, as well as the pairing of subgroup sample size (i.e., large vs. small subgroup sample size) with different error variances (i.e., large vs. small error variance). Even so, heteroscedasticity could take only one form: the variance of the population errors was unequal across the two groups. That is, one sample had a larger variance than the other. Thus, a general recommendation was possible for this scenario. However, having only one possible form of heteroscedasticity is unlikely in more complex analyses (e.g., factorial ANCOVA, moderated multiple regression). Note that research by Ng and Wilcox (2011) suggested the use of a general remedy for heteroscedasticity even when the specific form is unknown (discussed below in the subsection titled Heteroscedasticity-Consistent Covariance Matrices; Cai & Hayes, 2008; Long & Ervin, 2000).

Graphical approaches. Graphical approaches similar to those noted in the section above titled Heteroscedasticity in Commonly Used Statistical Models may also be used to detect various forms of heteroscedasticity. Regardless of whether a two- or three-dimensional display is generated, residuals are typically placed on the vertical axis, and predictors believed to be related to the variance of the residuals are placed on the horizontal axis. If homoscedasticity is satisfied, then the spread of the residuals will be approximately the same at all levels (or values) of a predictor (see Figures 1a and 2a). Otherwise, heteroscedasticity may exist such that the variance of the residuals increases (or decreases) as the value of a predictor increases (or decreases; see, e.g., Figure 2b). Researchers have described other patterns of heteroscedasticity. These include a butterfly shape (i.e., large residual variances at small and large values of a regressor with small residual variances near the mean of the regressor) and a galaxy shape (i.e., large residual variance near the mean of a regressor with the smallest residual variances toward small and large values of the regressor; Cohen et al., 2003; Darlington, 1990; Fox, 2008). Note that the regression surface is p-dimensional and is contained within (p + 1)-dimensional space. Thus, it may not be feasible to use graphical approaches to assess whether the residuals are nonconstant across p > 2 predictors simultaneously. However, researchers can graphically explore, one (or two) predictor(s) at a time, whether the spread of the residuals is approximately constant across a focal predictor or the predicted values.

Although interpreting graphical displays can be subjective, they can provide considerable diagnostic value (Mansfield & Conerly, 1987; Thisted, 1988) and may be used to complement statistical approaches (Cook & Weisberg, 1983). Graphical approaches for analyzing data, including residuals, are described in Chambers, Cleveland, Kleiner, and Tukey (1983); Cook and Weisberg (1982, 1983, 1999); Fox (2008); and Wainer and Thissen (1993). All major statistical software packages have these capabilities.

Heuristic approaches. Some heuristic approaches (i.e., rules of thumb) have been suggested by researchers. Specifically, if the residual variances can be partitioned on the basis of a categorical predictor, then the ratio of the largest residual variance to the smallest residual variance can be computed. When testing for mean differences, Seber and Lee (2003) suggested a ratio of 2 as the threshold at which heteroscedasticity is likely to unduly influence statistical inferences. Consider moderated multiple regression with a continuous predictor and a categorical moderator, when residual variances differ across the levels of the moderator. In this context, DeShon and Alexander’s (1996) statistical simulations suggested that heteroscedasticity can be problematic once the ratio of the residual variances exceeds 1.5, and this heuristic has been shown to be useful in some applied settings (Oswald, Saad, & Sackett, 2000).

Note that these heuristic approaches are helpful general guidelines that do not require statistical inferences. However, they are limited to those instances where the variances of the residuals can be calculated within each of the levels of a categorical predictor. Otherwise, a researcher would have to artificially polychotomize a continuous predictor to calculate the ratio of the largest to the smallest residual variances. This is generally not advised because how the polychotomization is performed (e.g., two groups, three groups, etc.) could result in very different ratios. In addition, research generally suggests that artificial polychotomization of data results in a diminished capacity to detect true relationships in some situations and an increase in Type I error rates in other situations (Cohen, 1983; Maxwell & Delaney, 1993).

Three general approaches for investigating whether residual variances differ have been summarized above. Next, we apply these approaches to our illustrative example data.

Illustrative example. Here, we continue with our illustrative example and apply Step 2, investigating whether the homoscedasticity assumption was violated. For comparison purposes, we apply statistical, graphical, and heuristic approaches to analyze the residuals.

Using the residuals (see column labeled RESID in the Appendix) as the dependent variable and d as the independent variable,

Brown and Forsythe’s (1974) test was conducted. The test was statistically significant, F(1, 76) = 10.11, p < .01, indicating that the variability of the residuals differed between conditions. The score test (Breusch & Pagan, 1979; Cook & Weisberg, 1983) was also conducted, using d as the predictor in the second regression analysis (see the section above titled Statistical Approaches). It was also statistically significant, χ²(1) = 11.16, p < .001, consistent with the results of Brown and Forsythe’s test.

Next, using graphical approaches to assess the tenability of the homoscedasticity assumption, a plot of the residuals against group membership (i.e., d) was constructed (see Figure 3). Overall, Figure 3, when combined with the results of Brown and Forsythe’s (1974) test and the score test, indicates that those in the treatment group (i.e., Group 2) exhibited less variability in the residuals compared to those in the control group (i.e., Group 1).

For completeness, we also plotted the residuals against the predicted values and against x to inspect whether the residual variances changed as a function of either. There was no evidence of heteroscedasticity in either case because the spread of the residuals was approximately the same across the predicted values and across x. Thus, heteroscedasticity manifested only between the two groups.

Finally, because the residuals can be divided on the basis of a categorical predictor (viz., our binary predictor, d), we also applied a heuristic approach. The ratio of the larger to the smaller residual variance was 3.34/0.99 = 3.37. Thus, based on the suggestions by Seber and Lee (2003) and DeShon and Alexander (1996), the degree of heteroscedasticity present in the data is likely to lead to biased statistical inferences.

Note that heteroscedasticity can pose problems when conducting statistical tests and computing confidence intervals involving estimated parameters. Therefore, in the section below, we summarize procedures that can be used in the presence of heteroscedasticity.

Step 3. Statistical Inferences Involving Parameter Estimates

The issue of statistical tests and interval estimation involves not only the parameter estimates (which remain unbiased) but also the covariance matrix among the parameter estimates (see Equation 5). Recall that the latter will be incorrect if heteroscedasticity is present. Thus, if the residual analyses described above in Step 2 indicate heteroscedasticity, it is important that appropriate statistical tests be conducted and confidence intervals examined for the parameter estimates. For example, suppose the alternative hypothesis is true and a researcher tests whether a regression slope is positive in the population. In the presence of heteroscedasticity, failure to support this hypothesis is likely to be due, in part, to an incorrect standard error; that is, the sample-based conclusion is likely to be erroneous. This would pose a serious threat to statistical conclusion validity (Shadish et al., 2002). Below, we describe various alternatives available for statistical inferences (e.g., hypothesis tests, confidence intervals) in the presence of heteroscedasticity. Table 3 provides a summary of the procedures, and Table 2 summarizes which procedures are currently available in commonly used statistical software in the behavioral and social sciences.

Weighted least squares. Weighted least squares regression is a general approach to analyzing data in linear models when heteroscedasticity exists. However, the N weights for use in weighted least squares regression are often unknown (for an exception, see Steel & Kammeyer-Mueller, 2002). In that case they must be estimated, and they are functions of the residual variance. The details of weight estimation are beyond the scope of the current article. An excellent discussion of estimating the weights can be found in Neter, Kutner, Nachtsheim, and Wasserman (1996, pp. 400–409) and Cook and Weisberg (1999, pp. 204–220). These sources include the cases in which the error variances are known up to a proportionality constant and in which replicates are available at each combination of the levels of the predictors (e.g., in controlled experiments). In weighted least squares regression, the model can be expressed as

$$Wy = WX\beta + W\varepsilon, \tag{9}$$

where W is an N × N diagonal matrix with the square roots of the N weights on the diagonal. Expressing Equation 9 as y* = X*β + ε* (where y* = Wy, X* = WX, and ε* = Wε), we see that weighted least squares regression can be viewed as the usual linear model (cf. Equation 1) applied to transformed data (Cook & Weisberg, 1999; Fox, 2008; Neter et al., 1996). In weighted least squares regression, Equation 5 becomes

$$\operatorname{cov}(\hat\beta) = \hat\sigma^2 (X^{*\prime}X^{*})^{-1} = \hat\sigma^2 (X'W^2X)^{-1},$$

where the second equality holds because W is diagonal, so that X*′X* = X′W′WX = X′W²X.
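The transformed-data view of Equation 9 can be made concrete with a small sketch. The Python code below (standard library only) fits y = β0 + β1x by weighted least squares. The intercept column, the x column, and y are each multiplied by the square root of the case weight, and ordinary least squares is then applied to the transformed data. The sketch assumes the weights are already known; in practice they must be estimated, as noted above. The function names are hypothetical, and a 2 × 2 solve by Cramer’s rule stands in for a general linear-algebra routine.

```python
import math

def ols_two_regressors(x1, x2, y):
    """OLS with exactly two regressor columns and no extra intercept:
    solves the 2 x 2 normal equations (X'X)b = X'y by Cramer's rule."""
    a11 = sum(v * v for v in x1)
    a12 = sum(p * q for p, q in zip(x1, x2))
    a22 = sum(v * v for v in x2)
    c1 = sum(p * q for p, q in zip(x1, y))
    c2 = sum(p * q for p, q in zip(x2, y))
    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("X'X is singular")
    return (a22 * c1 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det

def wls_simple(x, y, weights):
    """WLS estimates (b0, b1) for y = b0 + b1*x via the transformed model
    y* = X*b + e*, with y* = Wy, X* = WX, and W = diag(sqrt(w_i))."""
    s = [math.sqrt(w) for w in weights]          # diagonal of W
    ones_star = s                                # W times the intercept column
    x_star = [si * xi for si, xi in zip(s, x)]   # W times the x column
    y_star = [si * yi for si, yi in zip(s, y)]   # W times y
    return ols_two_regressors(ones_star, x_star, y_star)

x, y = [0, 1, 2], [0, 2, 1]
print(wls_simple(x, y, [1, 1, 1]))   # (0.5, 0.5): equal weights give OLS
print(wls_simple(x, y, [1, 1, 0]))   # (0.0, 2.0): zero weight drops case 3
```

Equal weights reproduce the ordinary least squares fit. A weight of zero removes a case entirely, which shows how strongly the estimates depend on getting the weights right.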
Figure 3. Boxplots of residuals for Group 1 (control) and Group 2 (treatment) from the illustrative example. The error bars depict the minimum and maximum values.

When the weights are all equal to unity, weighted least squares regression is identical to ordinary least squares regression. Stated differently, ordinary least squares regression is a special case of weighted least squares regression (Cook & Weisberg, 1999; Fox, 2008; Neter et al., 1996; Thisted, 1988). Weighted least squares regression has been recommended by researchers because of its increased statistical power and robustness relative to ordinary least squares regression when heteroscedasticity exists (see, e.g., Neter

Table 3
Summary of Procedures for Statistical Inference in the Presence of Heteroscedasticity

| Name of test | Types of predictors | Strengths | Weaknesses |
|---|---|---|---|
| Weighted least squares | Continuous and/or categorical | Greater power in the presence of heteroscedasticity than ordinary least squares; output similar to regression, so easy to interpret | Requires estimation of weights |
| HCCM | Continuous and/or categorical | Does not require a priori knowledge of the form of heteroscedasticity; versatile, with good statistical power | Some HCCMs can be computationally intensive |
| Randomization tests | Categorical | Does not require normality | Computer-intensive, especially with large samples |
| Mann–Whitney U test | Categorical | Available in major statistical software packages | Can compare two independent groups only |
| Wilcoxon’s matched pairs test | Categorical | Available in major statistical software packages | Can compare two dependent groups only |
| Kruskal–Wallis test | Categorical | Available in major statistical software packages | Generally less powerful than its parametric counterpart |
| Friedman’s rank test | Categorical | Available in major statistical software packages | Generally less powerful than its parametric counterpart |
| Theil–Sen estimator | Continuous and/or categorical | Low Type I error and good statistical power | May not perform as well when testing interaction terms |
| HC4 with a wild bootstrap | Continuous and/or categorical | Recommended when N is small or the regression coefficient is small; available in some major statistical software packages | Computationally intensive |
| Theil–Sen estimator with a percentile bootstrap | Continuous and/or categorical | Recommended when N or the regression coefficient is large; available in some major statistical software packages | Computationally intensive |

Note. HCCM = heteroscedasticity-consistent covariance matrix; HC4 = heteroscedasticity-consistent covariance matrix 4.

et al., 1996; Overton, 2001; Rosopa, 2006; Steel & Kammeyer-Mueller, 2002). In addition, a convenient aspect of weighted least squares regression is that the output from statistical software remains in a familiar regression framework (Aguinis, 2004; Overton, 2001). Thus, regression coefficients, F tests, and t tests can be interpreted as usual, making weighted least squares regression a practical alternative procedure for a wide range of researchers and practitioners.⁸

⁸ Some researchers have suggested that effect size measures (R²) typically employed in psychology and other behavioral sciences cannot be easily interpreted in weighted least squares regression (Cohen et al., 2003, p. 147). However, Willett and Singer (1988) presented a pseudo-R² for weighted least squares regression. In an example of its application, the pseudo-R² was nearly identical to the ordinary least squares–based R². These researchers argue that the pseudo-R² is not likely to differ much from the ordinary least squares–based R² (Willett & Singer, 1988). In addition, they suggested that researchers should “refocus attention on other aspects of the analysis, particularly the increased precision of the estimates of β” (Willett & Singer, 1988, p. 238).

Although we recommend the use of weighted least squares regression in general, the estimation of the weights requires correctly identifying the pattern of heteroscedasticity present in the data (see Step 2) because the weights are functions of the residual variance. A researcher may be unsure of the pattern or may incorrectly diagnose the pattern of heteroscedasticity. The weights could then be incorrectly estimated, resulting in a weighted least squares regression in which the elements of, for example, cov(β̂) = σ̂²(X′W²X)⁻¹ may still be incorrect. Thus, as a remedial procedure to mitigate the effects of heteroscedasticity, weighted least squares regression tends to perform optimally when the pattern of heteroscedasticity has been correctly diagnosed and the weights estimated accordingly. Otherwise, other procedures may be preferred (see, for example, the section below titled Heteroscedasticity-Consistent Covariance Matrices). Weighted least squares regression is available in all major software packages.

Heteroscedasticity-consistent covariance matrices. Heteroscedasticity-consistent covariance matrices (HCCMs) originated in econometrics (White, 1980) and have become increasingly studied in the statistics and behavioral sciences literature (Cai & Hayes, 2008; Hayes & Cai, 2007; Long & Ervin, 2000; Rosopa & Wolf, 2008). Put simply, this approach corrects the covariance matrix among the parameter estimates (i.e., cov(β̂); see Equation 5). Regression coefficients are not adjusted or corrected because they remain unbiased. All statistical tests, confidence intervals, and confidence bands employ the relevant linear combination of the elements in the adjusted covariance matrix among the regression coefficients (i.e., the HCCM). Currently, there are six types of HCCMs: HC0 through HC5. Later variants were generally intended to improve upon earlier HCCMs.

Specifically, HC3 (MacKinnon & White, 1985) can be expressed as

$$\mathrm{HC3} = (X'X)^{-1}X'\operatorname{diag}\!\left[\frac{e_i^2}{(1 - h_{ii})^2}\right]X(X'X)^{-1}, \tag{10}$$

where $h_{ii}$ is the ith diagonal element in the hat matrix, $H = X(X'X)^{-1}X'$. The $h_{ii}$ values are known as leverages. HC3 improved upon HC2 by providing a more stringent adjustment for influential cases in X. In form, HC2 resembles HC3. However, the

exponent 2 in the denominator of the elements of the diagonal matrix in Equation 10 is equal to 1 in HC2.

HC4 was developed by Cribari-Neto (2004). It can be expressed as

$$\mathrm{HC4} = (X'X)^{-1}X'\operatorname{diag}\!\left[\frac{e_i^2}{(1 - h_{ii})^{\delta_i}}\right]X(X'X)^{-1}, \tag{11}$$

where $\delta_i = \min\{4,\ Nh_{ii}/(p + 1)\}$. Note that HC4 in Equation 11 resembles HC3 in Equation 10. The difference between these two HCCMs, however, is that HC4 provides an even more stringent adjustment for influential cases in X. Because $(p + 1)/N$ is the average leverage, $\delta_i$ exceeds 2 for cases whose leverage is more than twice the average. Those cases therefore receive larger elements in the diagonal matrix in Equation 11 than in Equation 10.

Finally, HC5 (Cribari-Neto, Souza, & Vasconcellos, 2007) was developed to provide further adjustment to the estimated covariance matrix among the regression coefficients. HC5 is identical to Equation 11 with one exception: $\delta_i = \min\{Nh_{ii}/(p + 1),\ \max\{4,\ Nk\,h_{\max}/(p + 1)\}\}$, where k = .7, as recommended by Cribari-Neto et al. (2007), and $h_{\max} = \max\{h_{11}, h_{22}, \ldots, h_{NN}\}$.

HCCMs can be an attractive alternative because they do not require a priori knowledge of the exact form of heteroscedasticity, and they perform well in terms of statistical power (Long & Ervin, 2000). Based on a wide range of extant Monte Carlo simulations, it appears that HC4 performs best in terms of control over Type I and Type II errors (Cribari-Neto, 2004; Ng & Wilcox, 2009, 2011). Thus, of the HCCMs, we recommend HC4. Although some HCCMs can be computationally intensive (e.g., HC5), many are available in major software packages such as R and STATA. In addition, Hayes and Cai (2007) provided macros for IBM SPSS and SAS.

Nonparametric and other robust methods. A number of nonparametric or robust methods have been suggested when the homoscedasticity assumption is violated. When testing for the equality of independent or dependent means, randomization tests have long been recommended because they do not require the normality or homoscedasticity assumption (Siegel, 1956). Such tests can be computer intensive, especially when sample sizes are large, but with recent advances in computing speed, this is generally less of a concern. Randomization tests are not typically available in statistical software packages and, thus, must be programmed by the user.

Other nonparametric methods have been suggested when comparing independent and dependent groups. More specifically, when comparing independent groups, conventional recommendations include the Mann–Whitney U test for two groups and the Kruskal–Wallis test for more than two groups (King et al., 2010; Siegel, 1956). When comparing dependent groups, Wilcoxon’s matched pairs test is suggested for two groups, and Friedman’s rank test is recommended for more than two groups (King et al., 2010; Siegel, 1956). These and other nonparametric tests are available in all major statistical software.

Wilcox (2005) provided comprehensive coverage of modern robust methods of location (typically, trimmed means or medians) across independent and dependent groups, with and without covariates. In addition, Wilcox described robust regression methods involving least median squares, least trimmed squares, and generalized M-estimators. The Theil–Sen estimator (Sen, 1968; Theil, 1950) with a percentile bootstrap method appears to perform well in terms of Type I error and power in models without interaction terms (Wilcox, 1998). When conducting t tests on interaction terms, Ng and Wilcox (2010) found two procedures that perform well in controlling Type I error under nonnormality and heteroscedasticity. Specifically, HC4 with a wild bootstrap is recommended when N is small or the magnitude of the regression coefficient is small. The Theil–Sen estimator with a percentile bootstrap is recommended when N is large or the magnitude of the regression coefficient is large. Ng and Wilcox provided functions in R for both procedures. Robust regression methods are available in some major statistical software packages, including R, SAS, and SYSTAT.

Statistical approximations. When testing mean and slope differences, statistical approximations are available. For example, when testing for the equality of two independent means, the test by Welch (1938) and Satterthwaite (1946) uses separate variance estimates for the two groups. When heteroscedasticity exists, the Welch–Satterthwaite test performs very well compared to the usual Student’s t (Zimmerman, 2004). This procedure is available in all major statistical packages.

When testing for regression slope differences, available procedures include the Welch–Aspin F approximation (F*; Aspin, 1948; Welch, 1938), a generalization of James’s (1951) second-order approximation (J; DeShon & Alexander, 1994), and the normalized t approximation (A; Alexander & Govern, 1994). Simulation research on the utility of the various approximation procedures has led to a number of conclusions. As expected, across various manipulated conditions, the F*, J, and A approximations result in more stable performance than the standard F (DeShon & Alexander, 1994, 1996). For small Ns, the J approximation slightly outperforms the F* and A approximations. However, the Type I and Type II error rates of the J and A approximations are nearly identical, and the A approximation is easier to compute. Thus, consistent with DeShon and Alexander (1996), we recommend the A approximation when testing for regression slope differences. Because the F*, J, and A approximations can be computationally intensive and are not available in statistical software packages, they are some of the least accessible procedures. However, stand-alone programs and syntax are often available through the authors of journal articles (see Aguinis, Petersen, & Pierce, 1999; DeShon & Alexander, 1996), and Aguinis et al. (1999) provided a free online tool that calculates the J and A approximations.

Variance-stabilizing transformations. In an effort to ameliorate the effects of heteroscedasticity, researchers have recommended variance-stabilizing transformations (Box & Cox, 1964) applied to the dependent variable. These include angular transformations, logarithms, reciprocals, and square roots (e.g., Carroll & Ruppert, 1988). After identifying a suitable transformation of the dependent variable that restores homoscedasticity (using y_New to denote the transformed y), the linear model in Equation 1 can be expressed as y_New = Xβ + ε. However, during the search for an appropriate transformation, some transformations may eliminate a hypothesized mean or slope difference (see Aguinis & Pierce, 1998). In addition, Grissom (2000) indicated that “there are mixed empirical results on the effectiveness of transformations and interpretive problems with their use” (p. 158). As an example, when the transformed dependent variable is the natural logarithm of time, it may be difficult to unambiguously interpret the results of statistical analyses. On the other hand, some transformations (assuming that

homoscedasticity has been restored) may lend themselves to a useful interpretation. For example, if a dependent variable is in units of time/distance (e.g., seconds/meters), then the reciprocal could be quite meaningful because the transformed variable would be a speed (i.e., meters/second). Variance-stabilizing transformations can be conducted in all major statistical packages.

Above, we have reviewed various ameliorative procedures that can be used when the homoscedasticity assumption is violated. Next, for demonstration and comparison purposes, we apply some of these procedures to our illustrative example data.

Illustrative example. Recall that in our example, the researcher tested for the interaction between x and d. However, because heteroscedasticity is present, the validity of the t test on the regression coefficient of interest may have been compromised. Thus, in Step 3, we use alternative procedures for statistical inferences. More specifically, for comparison purposes, we used weighted least squares regression, HC4 (Cribari-Neto, 2004), the Theil–Sen estimator with a percentile bootstrap (Ng & Wilcox, 2010), and the A approximation (Alexander & Govern, 1994). For further comparisons, in an effort to restore homoscedasticity, we also applied various transformations to the dependent variable.

Although a comprehensive discussion of estimating the weights

Next, we calculated HC4 (see Equation 11); the square root of the fourth diagonal element provided the appropriate standard error. Using HC4, the t test on β̂₃ = .69 was statistically significant, t(74) = 5.87, p < .001, indicating that the population slopes differ from one another.

Then, we calculated the Theil–Sen estimator with a percentile bootstrap using the R functions described in Ng and Wilcox (2010). The number of bootstrap samples was kept at the default setting of 599. We calculated a 95% confidence interval around the difference in the regression slopes (i.e., Group 2’s slope − Group 1’s slope) and a two-tailed p value. The lower and upper bounds of the estimated 95% confidence interval around the difference in regression slopes were 0.079 and 1.529, respectively, and the p value was .013. Thus, there was evidence to suggest that the regression slopes differed from one another.

Next, for comparison purposes, we also calculated the A approximation (Alexander & Govern, 1994). The test statistic was χ²(1) = 4.11, p = .043. Based on this statistical approximation, the population slopes differ between groups. Note that all four alternative statistical approaches converged on the same conclusion.

Finally, we analyzed the data using a different alternative procedure. We applied a variety of variance-stabilizing transforma-
for use in weighted least squares regression is beyond the scope of tions to the dependent variable in an effort to restore homoscedas-
the present article, excellent descriptions of estimating the weights ticity. After applying different transformations, including the
for various models can be found in Bement and Williams (1969), natural logarithm and various exponents (i.e., ⫺1, ½, and 2), we
Cook and Weisberg (1999), and Neter et al. (1996). Briefly, each then completed Step 1 again (for each transformation) but using
of the weights is the reciprocal of an estimated variance. Using the yNew ⫽ X␤ ⫹ ␧. Then, for each transformation, we completed
unstandardized residuals in the Appendix in the column labeled Step 2. Based on the statistical, graphical, and heuristic ap-
RESID, we calculated the sum of the squared residuals for Group proaches, the homoscedasticity assumption was still violated.
1 (i.e., 130.08896) and Group 2 (i.e., 36.64667). Based on the Thus, in this example, because we were unable to identify a
results of Monte Carlo simulations and consistent with recommen- transformation that would restore homoscedasticity, it would be
dations by Overton (2001), where he suggested the degrees of preferable to use one of the other procedures described above (e.g.,
freedom for each group be based on sample size within each group weighted least squares regression, HC4, the Theil–Sen estimator,
minus four, the estimated weight for the 40 observations in Group or the A approximation).
1 was (40 ⫺ 4)/130.08896 ⫽ .27673. The estimated weight for the Overall, based on our analyses in Steps 1–3, we can conclude
38 observations in Group 2 was (38 ⫺ 4)/36.64667 ⫽ .92778. As that the population regression slopes were unequal between the
discovered in Step 2 above, the form of heteroscedasticity was treatment and control groups and that the treatment group’s slope
such that the residual variances were different across the two was steeper than the control group’s slope. In addition, the vari-
groups; thus, the weights were different between groups but the ability of the residuals was different between the groups such that
same within a group. The calculated weights appear in the Appen- the treatment group was less variable than the control group.
dix in the column labeled WT.
To conduct a weighted least squares regression, the analyses are
Discussion
the same as those that were conducted above in Step 1 (i.e., using In the current article, we have presented a set of procedures that
the model matrix in Equation 8). However, statistical software can be used to assess whether heteroscedasticity exists, as well as
packages allow for a weights variable. For example, in regression how to proceed with data analysis in the presence of heterosce-
or general linear model procedures in IBM SPSS, it is labeled dasticity. It is important that researchers not only appropriately test
WLS Weight. In the linear models function in R, a user simply hypothesized relations involving estimated parameters in a model
specifies an N ⫻ 1 vector of weights in an optional argument (Step 3) but also investigate changes in residual variances (Step 2).
called weights. Consistent with this, Grissom (2000) suggested that researchers
In the present illustration, the overall model remained statisti- “discuss differences in variabilities, not just means in their data”
cally significant, F(3, 74) ⫽ 88.47, p ⬍ .001 (R2 ⫽ .78). Note that
whereas analyses in Step 1 indicated nonsignificant slope differ- 9
When using weighted least squares regression, the ordinary least
ences, upon utilizing weighted least squares regression, which is squares residuals and the square root of the estimated weights can be used
appropriate in the presence of heteroscedastic data, the t test to estimate the weighted least squares residuals (␧ⴱ). In our illustrative
associated with ␤ˆ ⫽ .69 was statistically significant, t(74) ⫽ 2.07,
3
example, after conducting weighted least squares regression, we completed
Step 2 on the estimated weighted least squares residuals (see the column
p ⫽ .041. This indicates that the regression slopes for the two labeled WLSRESID in the Appendix). Based on the statistical, graphical,
groups are not equal, suggesting that Group 2’s slope (1.53) is and heuristic approaches, the homoscedasticity assumption was no longer
steeper than Group 1’s slope (0.84).9 violated, suggesting that weighted least squares regression was effective.
MANAGING HETEROSCEDASTICITY 347

(pp. 161–162). In addition, consistent with Grissom’s recommendation, when the homoscedasticity assumption is violated, researchers should use procedures that can mitigate the effects of heteroscedasticity. This can have implications for both theory development and practice.

Clearly, there remain other issues that future research could explore. Because our review has revealed that there were considerable differences in variability (pun intended) in the diagnostic procedures described in extant literature, future research could explore which procedures for detecting heteroscedasticity outperform others in terms of controlling Type I error rates while concurrently providing the greatest statistical power. As Zimmerman’s (2004) research suggested, Levene’s (1960) test may not have adequate statistical power to detect violations of the homogeneity of variance assumption when testing for the equality of two independent means. Thus, a comprehensive examination of other procedures may be warranted. For example, because of the flexibility of the score test (Breusch & Pagan, 1979; Cook & Weisberg, 1983), it may be important to examine how well it performs against procedures that were developed to detect specific forms of heteroscedasticity, for example, Brown and Forsythe’s (1974) procedure in ANOVA. That is, if the score test performs comparably to (or better than) other procedures under various conditions, it may be recommended as a general-purpose test for heteroscedasticity, supplanting other procedures. In addition, because O’Brien’s (1981) method can only be used in ANOVA, future research could potentially extend this procedure to include categorical and continuous predictors. Notably, many of these diagnostic procedures were developed within the context of the general linear model. Some may not perform comparably in the context of hierarchical linear models (i.e., multilevel modeling), where the population errors (i.e., ε) are permitted to be correlated (see J. Kim & Seltzer, 2011). Thus, simulation studies may be useful to further explore this area.

Research could also further examine what Zimmerman (2004) described as conditional probabilities of Type I error. However, these could be studied with more complex statistical analyses beyond two independent groups. In addition, although Zimmerman focused on control of Type I error rates, it would be interesting to explore conditional probabilities of Type II error (or power). This could result in general recommendations for Step 3 regardless of the degree or pattern of heteroscedasticity identified in Step 2. For example, some research suggests that HCCMs provide a general-purpose solution for heteroscedasticity even when its form is unknown (Long & Ervin, 2000; Ng & Wilcox, 2011).

In conclusion, heteroscedasticity can adversely affect statistical models commonly employed in the behavioral and social sciences. Because this can affect substantive conclusions, it is important to have data-analytic tools for detecting heteroscedasticity and mitigating its biasing effects. Based on the synthesis of diverse literature, we have reviewed procedures and offered recommendations for managing heteroscedasticity that could serve as a useful resource for researchers and practitioners, thus strengthening the validity of inferences from behavioral and social science data.

References

Aguinis, H. (2004). Regression analysis for categorical moderators. New York, NY: Guilford Press.
Aguinis, H., Petersen, S. A., & Pierce, C. A. (1999). Appraisal of the homogeneity of error variance assumption and alternatives to multiple regression for estimating moderating effects of categorical variables. Organizational Research Methods, 2, 315–339. doi:10.1177/109442819924001
Aguinis, H., & Pierce, C. A. (1998). Heterogeneity of error variance and the assessment of moderating effects of categorical variables: A conceptual review. Organizational Research Methods, 1, 296–314. doi:10.1177/109442819813002
Alexander, R. A., & Govern, D. M. (1994). A new and simpler approximation for ANOVA under variance heterogeneity. Journal of Educational Statistics, 19, 91–101. doi:10.2307/1165140
Ali, M. M., & Giaccotto, C. (1984). A study of several new and existing tests for heteroscedasticity in the general linear model. Journal of Econometrics, 26, 355–373. doi:10.1016/0304-4076(84)90026-5
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Antonakis, J., & Dietz, J. (2011). Looking for validity or testing it? The perils of stepwise regression, extreme-scores analysis, heteroscedasticity, and measurement error. Personality and Individual Differences, 50, 409–415. doi:10.1016/j.paid.2010.09.014
Aspin, A. A. (1948). An examination and further development of a formula arising in the problem of comparing two mean values. Biometrika, 35, 88–96. doi:10.2307/2332631
Bartlett, M. S. (1937). Properties of sufficiency and statistical tests. Proceedings of the Royal Society: A. Mathematical, Physical and Engineering Sciences, 160, 268–282. doi:10.1098/rspa.1937.0109
Belsley, D. A., Kuh, E., & Welsch, R. E. (1980). Regression diagnostics: Identifying influential data and sources of collinearity. New York, NY: Wiley. doi:10.1002/0471725153
Bement, T. R., & Williams, J. S. (1969). Variance of weighted regression estimators when sampling errors are independent and heteroscedastic. Journal of the American Statistical Association, 64, 1369–1382. doi:10.1080/01621459.1969.10501063
Box, G. E. P. (1953). Non-normality and tests on variances. Biometrika, 40, 318–335. doi:10.2307/2333350
Box, G. E. P. (1954). Some theorems on quadratic forms applied in the study of analysis of variance problems, I. Effect of inequality of variance in the one-way classification. Annals of Mathematical Statistics, 25, 290–302. doi:10.1214/aoms/1177728786
Box, G. E. P., & Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society, Series B, 26, 211–252. Retrieved from http://www.wiley.com/bw/journal.asp?ref=1369-7412
Breusch, T. S., & Pagan, A. R. (1979). A simple test for heteroscedasticity and random coefficient variation. Econometrica, 47, 1287–1294. doi:10.2307/1911963
Brown, M. B., & Forsythe, A. B. (1974). Robust test for the equality of variances. Journal of the American Statistical Association, 69, 364–367. doi:10.1080/01621459.1974.10482955
Bryk, A. S., & Raudenbush, S. W. (1988). Heterogeneity of variance in experimental studies: A challenge to conventional interpretations. Psychological Bulletin, 104, 396–404. doi:10.1037/0033-2909.104.3.396
Cai, L., & Hayes, A. F. (2008). A new test of linear hypotheses in OLS regression under heteroscedasticity of unknown form. Journal of Educational and Behavioral Statistics, 33, 21–40. doi:10.3102/1076998607302628
Carroll, R. J., & Ruppert, D. (1988). Transformation and weighting in regression. New York, NY: Chapman & Hall.
Chambers, J. M., Cleveland, W. S., Kleiner, B., & Tukey, P. A. (1983). Graphical methods for data analysis. Boston, MA: Duxbury Press.
Cohen, J. (1983). The cost of dichotomization. Applied Psychological Measurement, 7, 249–253. doi:10.1177/014662168300700301
Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Mahwah, NJ: Erlbaum.
Conover, W. J., Johnson, M. E., & Johnson, M. M. (1981). A comparative study of tests for homogeneity of variances, with applications to the outer continental shelf bidding data. Technometrics, 23, 351–361. doi:10.1080/00401706.1981.10487680
Cook, R. D. (1977). Detection of influential observation in linear regression. Technometrics, 19, 15–18. doi:10.2307/1268249
Cook, R. D., & Weisberg, S. (1982). Residuals and influence in regression. New York, NY: Chapman & Hall.
Cook, R. D., & Weisberg, S. (1983). Diagnostics for heteroscedasticity in regression. Biometrika, 70, 1–10. doi:10.1093/biomet/70.1.1
Cook, R. D., & Weisberg, S. (1999). Applied regression including computing and graphics. New York, NY: Wiley. doi:10.1002/9780470316948
Cribari-Neto, F. (2004). Asymptotic inference under heteroskedasticity of unknown form. Computational Statistics and Data Analysis, 45, 215–233. doi:10.1016/S0167-9473(02)00366-3
Cribari-Neto, F., Souza, T. C., & Vasconcellos, A. L. P. (2007). Inference under heteroscedasticity and leveraged data. Communications in Statistics: Theory and Methods, 36, 1877–1888. doi:10.1080/03610920601126589
Darlington, R. B. (1990). Regression and linear models. New York, NY: McGraw-Hill.
Dawes, R. M., & Corrigan, B. (1974). Linear models in decision making. Psychological Bulletin, 81, 95–106. doi:10.1037/h0037613
DeShon, R. P., & Alexander, R. A. (1994). A generalization of James’s second-order approximation to the test for regression slope equality. Educational and Psychological Measurement, 54, 328–335. doi:10.1177/0013164494054002007
DeShon, R. P., & Alexander, R. A. (1996). Alternative procedures for testing regression slope homogeneity when group error variances are unequal. Psychological Methods, 1, 261–277. doi:10.1037/1082-989X.1.3.261
DeShon, R. P., & Morris, S. B. (2004). Modeling complex data structures: The general linear model and beyond. In S. G. Rogelberg (Ed.), Handbook of research methods in industrial and organizational psychology (pp. 390–411). Malden, MA: Blackwell. doi:10.1002/9780470756669.ch19
Fox, J. (2008). Applied regression analysis and generalized linear models (2nd ed.). Thousand Oaks, CA: Sage.
Games, P. A., Winkler, H. B., & Probert, D. A. (1972). Robust tests for homogeneity of variance. Educational and Psychological Measurement, 32, 887–909. doi:10.1177/001316447203200404
Ghiselli, E. E., Campbell, J. P., & Zedeck, S. (1981). Measurement theory for the behavioral sciences. San Francisco, CA: Freeman.
Ghiselli, E. E., & Sanders, E. P. (1967). Moderating heteroscedasticity. Educational and Psychological Measurement, 27, 581–590. doi:10.1177/001316446702700302
Glass, G. V., Peckham, P. D., & Sanders, J. R. (1972). Consequences of failure to meet assumptions underlying the fixed effects analyses of variance and covariance. Review of Educational Research, 42, 237–288. doi:10.3102/00346543042003237
Grissom, R. J. (2000). Heterogeneity of variance in clinical data. Journal of Consulting and Clinical Psychology, 68, 155–165. doi:10.1037/0022-006X.68.1.155
Hayes, A. F., & Cai, L. (2007). Using heteroscedasticity-consistent standard error estimators in OLS regression: An introduction and software implementation. Behavior Research Methods, 39, 709–722. doi:10.3758/BF03192961
James, G. S. (1951). The comparison of several groups of observations when the ratios of population variances are unknown. Biometrika, 38, 324–329. doi:10.2307/2332578
Judd, C. M. (2000). Everyday data analysis in social psychology: Comparisons of linear models. In H. T. Reis & C. M. Judd (Eds.), Handbook of research methods in social and personality psychology (pp. 370–393). New York, NY: Cambridge University Press.
Kahneman, D., & Ghiselli, E. E. (1962). Validity and nonlinear heteroscedastic models. Personnel Psychology, 15, 1–11. doi:10.1111/j.1744-6570.1962.tb01842.x
Kim, J., & Seltzer, M. (2011). Examining heterogeneity of residual variance to detect differential response to treatments. Psychological Methods, 16, 192–208. doi:10.1037/a0022656
Kim, S. J. (1992). A practical solution to the multivariate Behrens-Fisher problem. Biometrika, 79, 171–176. doi:10.1093/biomet/79.1.171
King, B. M., Rosopa, P. J., & Minium, E. W. (2010). Statistical reasoning in the behavioral sciences (6th ed.). Hoboken, NJ: Wiley.
Levene, H. (1960). Robust tests for equality of variances. In I. Olkin, S. G. Ghurye, W. Hoeffding, W. G. Madow, & H. B. Mann (Eds.), Contributions to probability and statistics (pp. 278–292). Stanford, CA: Stanford University Press.
Long, J. S., & Ervin, L. H. (2000). Using heteroscedasticity consistent standard errors in the linear regression model. American Statistician, 54, 217–224. doi:10.2307/2685594
MacKinnon, J. G., & White, H. (1985). Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties. Journal of Econometrics, 29, 305–325. doi:10.1016/0304-4076(85)90158-7
Mansfield, E. R., & Conerly, M. D. (1987). Diagnostic value of residual and partial residual plots. American Statistician, 41, 107–116. doi:10.2307/2684221
Maxwell, S. E., & Delaney, H. D. (1993). Bivariate median splits and spurious statistical significance. Psychological Bulletin, 113, 181–190. doi:10.1037/0033-2909.113.1.181
Meloun, M., & Militky, J. (2001). Detection of single influential points in OLS regression model building. Analytica Chimica Acta, 439, 169–191. doi:10.1016/S0003-2670(01)01040-6
Neter, J., Kutner, M. H., Nachtsheim, C. J., & Wasserman, W. (1996). Applied linear regression models (3rd ed.). Chicago, IL: Irwin.
Ng, M., & Wilcox, R. R. (2009). Level robust methods based on the least squares regression estimator. Journal of Modern Applied Statistical Methods, 8, 384–395.
Ng, M., & Wilcox, R. R. (2010). Comparing the regression slopes of independent groups. British Journal of Mathematical and Statistical Psychology, 63, 319–340. doi:10.1348/000711009X456845
Ng, M., & Wilcox, R. R. (2011). A comparison of two-stage procedures for testing least-squares coefficients under heteroscedasticity. British Journal of Mathematical and Statistical Psychology, 64, 244–258. doi:10.1348/000711010X508683
O’Brien, R. G. (1981). A simple test for variance effects in experimental designs. Psychological Bulletin, 89, 570–574. doi:10.1037/0033-2909.89.3.570
Ohtani, K., & Toyoda, T. (1980). Estimation of regression coefficients after a preliminary test for homoscedasticity. Journal of Econometrics, 12, 151–159. doi:10.1016/0304-4076(80)90003-2
Olejnik, S. (1988). Variance heterogeneity: An outcome to explain or a nuisance factor to control. Journal of Experimental Education, 56, 193–197.
Oswald, F. L., Saad, S., & Sackett, P. R. (2000). The homogeneity assumption in differential prediction analysis: Does it really matter? Journal of Applied Psychology, 85, 536–541. doi:10.1037/0021-9010.85.4.536
Overton, R. C. (2001). Moderated multiple regression for interactions involving categorical variables: A statistical control for heterogeneous variance across two groups. Psychological Methods, 6, 218–233. doi:10.1037/1082-989X.6.3.218
Rencher, A. C. (2000). Linear models in statistics. New York, NY: Wiley.
Rosopa, P. J. (2006, May). An alternative solution for heterogeneity of variance across categorical moderators in moderated multiple regression. In D. Newman (Chair), Testing interaction effects: Problems and procedures. Symposium conducted at the meeting of the Society for Industrial and Organizational Psychology, Dallas, TX.
Rosopa, P. J., Schroeder, A. N., & Doll, J. L. (2013). A note on detecting between-groups heteroscedasticity in moderated multiple regression with a continuous predictor and a categorical moderator. Manuscript submitted for publication.
Rosopa, P. J., & Wolf, A. N. (2008, June). Effects of measurement error on statistical inferences based on heteroscedasticity-consistent covariance matrices. Poster session presented at the meeting of the Psychometric Society, Durham, NH.
Rutherford, A. (1992). Alternatives to traditional analysis of covariance. British Journal of Mathematical and Statistical Psychology, 45, 197–223. doi:10.1111/j.2044-8317.1992.tb00988.x
Saad, S., & Sackett, P. R. (2002). Investigating differential prediction by gender in employment-oriented personality measures. Journal of Applied Psychology, 87, 667–674. doi:10.1037/0021-9010.87.4.667
Satterthwaite, F. E. (1946). An approximate distribution of estimates of variance components. Biometrics Bulletin, 2, 110–114. doi:10.2307/3002019
Sawilowsky, S. S. (2002). Fermat, Schubert, Einstein, and Behrens-Fisher: The probable difference between two means when σ₁² ≠ σ₂². Journal of Modern Applied Statistical Methods, 1, 461–472.
Schott, J. R. (2005). Matrix analysis for statistics (2nd ed.). Hoboken, NJ: Wiley.
Seber, G. A. F., & Lee, A. J. (2003). Linear regression analysis (2nd ed.). Hoboken, NJ: Wiley. doi:10.1002/9780471722199
Sen, P. K. (1968). Estimates of the regression coefficient based on Kendall’s tau. Journal of the American Statistical Association, 63, 1379–1389. doi:10.1080/01621459.1968.10480934
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin.
Siegel, S. (1956). Nonparametric statistics for the behavioral sciences. New York, NY: McGraw-Hill.
Smith, B., & Sechrest, L. (1991). Treatment of Aptitude × Treatment interactions. Journal of Consulting and Clinical Psychology, 59, 233–244. doi:10.1037/0022-006X.59.2.233
Steel, P. D., & Kammeyer-Mueller, J. D. (2002). Comparing meta-analytic moderator estimation techniques under realistic conditions. Journal of Applied Psychology, 87, 96–111. doi:10.1037/0021-9010.87.1.96
Stevens, J. P. (1984). Outliers and influential data points in regression analysis. Psychological Bulletin, 95, 334–344. doi:10.1037/0033-2909.95.2.334
Stone-Romero, E. F., Weaver, A. E., & Glenar, J. L. (1995). Trends in research design and data analytic strategies in organizational research. Journal of Management, 21, 141–157. doi:10.1016/0149-2063(95)90039-X
Theil, H. (1950). A rank invariant method for linear and polynomial regression analysis. Indagationes Mathematicae, 12, 85–91.
Thisted, R. A. (1988). Elements of statistical computing: Numerical computation. London, England: Chapman & Hall.
Wainer, H., & Thissen, D. (1993). Graphical data analysis. In G. Keren & C. Lewis (Eds.), A handbook for data analysis in the behavioral sciences: Statistical issues (pp. 391–457). Hillsdale, NJ: Erlbaum.
Welch, B. L. (1938). The significance of the difference between two means when the population variances are unequal. Biometrika, 29, 350–362. doi:10.2307/2332010
White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48, 817–838. doi:10.2307/1912934
Wilcox, R. R. (1997). Comparing the slopes of two independent regression lines when there is complete heteroscedasticity. British Journal of Mathematical and Statistical Psychology, 50, 309–317. doi:10.1111/j.2044-8317.1997.tb01147.x
Wilcox, R. R. (1998). A note on the Theil-Sen regression estimator when the regressor is random and the error term is heteroscedastic. Biometrical Journal, 40, 261–268. doi:10.1002/(SICI)1521-4036(199807)40:3<261::AID-BIMJ261>3.0.CO;2-V
Wilcox, R. R. (2002). Comparing the variances of two independent groups. British Journal of Mathematical and Statistical Psychology, 55, 169–175. doi:10.1348/000711002159635
Wilcox, R. R. (2005). Introduction to robust estimation and hypothesis testing (2nd ed.). New York, NY: Elsevier.
Willett, J. B., & Singer, J. D. (1988). Another cautionary note about R²: Its use in weighted least-squares regression analysis. American Statistician, 42, 236–238. doi:10.2307/2685031
Zimmerman, D. W. (2004). A note on preliminary tests of equality of variances. British Journal of Mathematical and Statistical Psychology, 57, 173–181. doi:10.1348/000711004849222


Appendix
Data for Illustrative Example

y x d RESID WT WLSRESID

14.76 18.81 0 0.69545 0.27673 0.365843
17.19 20.78 0 1.47553 0.27673 0.776210
16.52 19.28 0 2.06181 0.27673 1.084626
17.09 20.95 0 1.23315 0.27673 0.648706
15.09 20.27 0 −0.19730 0.27673 −0.103809
15.85 18.98 0 1.64307 0.27673 0.864344
14.64 21.06 0 −1.30900 0.27673 −0.688593
11.78 20.63 0 −3.80880 0.27673 −2.003659
10.60 17.54 0 −2.40090 0.27673 −1.263007
14.97 20.29 0 −0.33410 0.27673 −0.175747
15.44 20.94 0 −0.40850 0.27673 −0.214879
15.89 18.44 0 2.13533 0.27673 1.123300
17.30 21.35 0 1.10814 0.27673 0.582944
17.13 20.29 0 1.82592 0.27673 0.960532
13.69 20.60 0 −1.87370 0.27673 −0.985677
17.06 21.52 0 0.72577 0.27673 0.381792
13.89 20.37 0 −1.48110 0.27673 −0.779133
12.61 21.44 0 −3.65720 0.27673 −1.923905
14.55 19.80 0 −0.34370 0.27673 −0.180805
17.15 20.74 0 1.46903 0.27673 0.772791
15.01 20.59 0 −0.54530 0.27673 −0.286879
15.90 21.27 0 −0.22490 0.27673 −0.118286
15.98 20.49 0 0.50841 0.27673 0.267452
15.24 20.82 0 −0.50800 0.27673 −0.267220
12.57 17.24 0 −0.17960 0.27673 −0.094504
16.51 19.43 0 1.92618 0.27673 1.013278
14.56 19.69 0 −0.24160 0.27673 −0.127080
14.85 19.87 0 −0.10230 0.27673 −0.053829
16.33 20.17 0 1.12642 0.27673 0.592558
12.22 18.96 0 −1.97020 0.27673 −1.036424
16.68 20.35 0 1.32566 0.27673 0.697372
15.88 19.83 0 0.96118 0.27673 0.505631
17.21 18.85 0 3.11195 0.27673 1.637054
12.63 18.58 0 −1.24190 0.27673 −0.653320
16.90 20.38 0 1.52054 0.27673 0.799887
12.61 19.05 0 −1.65560 0.27673 −0.870915
13.81 18.60 0 −0.07870 0.27673 −0.041387
12.09 20.43 0 −3.33130 0.27673 −1.752466
20.02 21.06 0 4.07102 0.27673 2.141582
11.62 19.51 0 −3.03080 0.27673 −1.594377
22.03 20.69 1 1.21273 0.92778 1.168121
17.46 19.19 1 −1.06550 0.92778 −1.026310
20.97 21.40 1 −0.93200 0.92778 −0.897744
21.29 20.05 1 1.45055 0.92778 1.397190
20.37 21.03 1 −0.96670 0.92778 −0.931167
18.06 18.97 1 −0.12940 0.92778 −0.124623
22.02 20.60 1 1.34024 0.92778 1.290936
20.61 20.59 1 −0.05450 0.92778 −0.052477
20.48 20.69 1 −0.33730 0.92778 −0.324858
19.10 19.76 1 −0.29640 0.92778 −0.285472
18.35 19.00 1 0.11478 0.92778 0.110560
19.34 19.08 1 0.98256 0.92778 0.946410
21.53 20.57 1 0.89608 0.92778 0.863111
16.85 18.52 1 −0.65190 0.92778 −0.627875
21.56 20.39 1 1.20109 0.92778 1.156902
22.23 21.50 1 0.17518 0.92778 0.168740
20.69 21.33 1 −1.10510 0.92778 −1.064429
20.67 19.86 1 1.12084 0.92778 1.079608
19.37 19.80 1 −0.08750 0.92778 −0.084270
17.30 19.69 1 −1.98940 0.92778 −1.916241
20.15 19.71 1 0.83002 0.92778 0.799483
20.26 20.32 1 0.00804 0.92778 0.007739
18.98 19.42 1 0.10309 0.92778 0.099298
21.65 19.91 1 2.02445 0.92778 1.949975
19.70 19.20 1 1.15921 0.92778 1.116571
15.97 18.49 1 −1.48600 0.92778 −1.431353
20.12 20.48 1 −0.37640 0.92778 −0.362572
19.08 20.26 1 −1.08030 0.92778 −1.040554
16.70 17.94 1 0.08429 0.92778 0.081191
17.51 18.33 1 0.29843 0.92778 0.287456
17.41 18.33 1 0.19843 0.92778 0.191135
21.93 21.18 1 0.36409 0.92778 0.350699
17.62 19.74 1 −1.74580 0.92778 −1.681594
18.96 18.81 1 1.01507 0.92778 0.977730
19.42 19.87 1 −0.14440 0.92778 −0.139124
18.60 20.39 1 −1.75890 0.92778 −1.694208
17.40 18.99 1 −0.81990 0.92778 −0.789776
21.51 20.85 1 0.44828 0.92778 0.431789
Note. The data were generated by taking a random sample of 40 bivariate observations (y and x) for Group 1 and 38 bivariate observations for Group 2 from a population with the following parameters: population error variance in Group 1 (Group 2) = 3.0 (1.0), population slope in Group 1 (Group 2) = 0.5 (1.2), population standard deviation of x in Group 1 (Group 2) = 1.0 (1.0). With these parameters and equations found in DeShon and Alexander (1996), the remaining parameters (i.e., population standard deviation of y in each group and population correlation coefficient between x and y in each group) were specified. Namely, population standard deviation of y in Group 1 (Group 2) = 1.8027756 (1.5620499), and population correlation coefficient between x and y in Group 1 (Group 2) = .2773501 (.7682213). y = posttest score; x = pretest score; d = 0 for participants in Group 1 (control) and 1 for participants in Group 2 (treatment); RESID = residuals based on ordinary least squares regression; WT = estimated weights for weighted least squares regression; WLSRESID = residuals based on weighted least squares regression.
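The derived population values in the Note can be checked by hand. Assuming, as in the data-generating model, that y = β₀ + β₁x + ε within each group with x and ε uncorrelated, the standard deviation of y is √(β₁²σₓ² + σ_ε²) and the correlation is β₁σₓ/σ_y. The Python sketch below (not part of the original analysis; the function name is ours) applies these two identities to the stated parameters.

```python
import math


def implied_sd_y_and_rho(slope, sd_x, error_var):
    """Population SD of y and corr(x, y) for y = a + slope * x + e,
    assuming x and e are uncorrelated within the group."""
    var_y = slope ** 2 * sd_x ** 2 + error_var  # Var(y) = b^2 Var(x) + Var(e)
    sd_y = math.sqrt(var_y)
    rho = slope * sd_x / sd_y                   # corr = b * SD(x) / SD(y)
    return sd_y, rho


# Parameters from the Note: slope and error variance per group, SD(x) = 1.0.
for label, slope, error_var in [("Group 1", 0.5, 3.0), ("Group 2", 1.2, 1.0)]:
    sd_y, rho = implied_sd_y_and_rho(slope, sd_x=1.0, error_var=error_var)
    print(f"{label}: SD(y) = {sd_y:.7f}, rho = {rho:.7f}")
```

Printed to seven decimals, this gives 1.8027756 and 0.2773501 for Group 1 and 1.5620499 and 0.7682213 for Group 2, the values reported in the Note.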
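The weighting arithmetic in the illustrative example, and the link between the RESID, WT, and WLSRESID columns above, can be reproduced in a few lines. In this illustrative Python sketch the helper names are hypothetical, and the fixed subtraction of four simply follows the Overton (2001) rule quoted in the text rather than a general formula. Differences in the last printed digit reflect rounding of the published residuals.

```python
import math


def overton_weight(n_group, ss_resid, df_loss=4):
    """WLS weight for one group: the reciprocal of the estimated residual
    variance SS_g / (n_g - 4), i.e., (n_g - 4) / SS_g, per the rule in the text."""
    return (n_group - df_loss) / ss_resid


def wls_scaled_residual(resid, weight):
    """Scale an OLS residual by sqrt(weight), as in the WLSRESID column."""
    return math.sqrt(weight) * resid


w1 = overton_weight(40, 130.08896)  # Group 1 (control)
w2 = overton_weight(38, 36.64667)   # Group 2 (treatment)
print(round(w1, 5), round(w2, 5))   # 0.27673 0.92778

# First Appendix row of each group: RESID -> WLSRESID
print(round(wls_scaled_residual(0.69545, w1), 4))  # 0.3658 (Appendix: 0.365843)
print(round(wls_scaled_residual(1.21273, w2), 4))  # 1.1681 (Appendix: 1.168121)
```

Because the weight is constant within a group, scaling by √WT leaves each group's residuals in the same relative pattern while equalizing their expected spread across groups.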
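For readers who want to see what HC4 computes in Step 3, the following NumPy sketch implements the estimator as we understand it from Cribari-Neto (2004): each squared OLS residual is inflated by (1 − hᵢᵢ)^(−δᵢ), with δᵢ = min(4, hᵢᵢ/h̄) and h̄ = p/n the mean leverage, and the results enter the sandwich (X′X)⁻¹X′Ω̂X(X′X)⁻¹. It is a sketch rather than the authors' code; Equation 11 in the article remains the authoritative definition, and the function name ols_hc4 is hypothetical.

```python
import numpy as np


def ols_hc4(X, y):
    """OLS estimates with HC4 standard errors (sketch after Cribari-Neto, 2004).

    X : (n, p) model matrix, including the intercept column.
    y : (n,) response vector.
    Returns (beta_hat, se_hc4).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    # Leverages h_ii: diagonal of the hat matrix H = X (X'X)^{-1} X'.
    h = np.einsum("ij,jk,ik->i", X, xtx_inv, X)
    # HC4 exponent delta_i = min(4, h_ii / h_bar), with h_bar = p / n.
    delta = np.minimum(4.0, n * h / p)
    # Inflate each squared residual by (1 - h_ii)^(-delta_i); assumes h_ii < 1.
    omega = resid ** 2 / (1.0 - h) ** delta
    # Sandwich: (X'X)^{-1} X' diag(omega) X (X'X)^{-1}.
    meat = X.T @ (omega[:, None] * X)
    cov = xtx_inv @ meat @ xtx_inv
    return beta, np.sqrt(np.diag(cov))
```

For the illustrative example, one would build the model matrix with np.column_stack([np.ones_like(x), x, d, x * d]), assuming the column order intercept, x, d, x × d so that the interaction is the fourth coefficient as in the text; beta[3] / se[3] would then be referred to a t distribution with n − p = 74 degrees of freedom. This sketch is not guaranteed to reproduce the reported t(74) = 5.87 to the last digit.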
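The logic of comparing Theil–Sen slopes with a percentile bootstrap can be sketched as follows: each group's slope is the median of all pairwise slopes, both groups are resampled independently B = 599 times, and the 95% interval and two-sided p value are read from the bootstrap distribution of the difference (Group 2 − Group 1). This is a plain percentile bootstrap written for illustration, not the R functions of Ng and Wilcox (2010) used in the text, whose details may differ; results on the Appendix data should therefore not be expected to match 0.079, 1.529, and p = .013 exactly. All function names are hypothetical.

```python
import random
import statistics


def theil_sen_slope(x, y):
    """Theil-Sen slope: median of the slopes over all pairs with distinct x."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i in range(len(x))
        for j in range(i + 1, len(x))
        if x[j] != x[i]
    ]
    return statistics.median(slopes)  # raises StatisticsError if all x are tied


def bootstrap_slope_difference(x1, y1, x2, y2, n_boot=599, alpha=0.05, seed=1):
    """Percentile-bootstrap CI and two-sided p value for slope2 - slope1.

    (x, y) pairs are resampled with replacement, independently within each group.
    """
    rng = random.Random(seed)
    n1, n2 = len(x1), len(x2)
    diffs = []
    for _ in range(n_boot):
        s1 = rng.choices(range(n1), k=n1)
        s2 = rng.choices(range(n2), k=n2)
        b1 = theil_sen_slope([x1[i] for i in s1], [y1[i] for i in s1])
        b2 = theil_sen_slope([x2[i] for i in s2], [y2[i] for i in s2])
        diffs.append(b2 - b1)
    diffs.sort()
    # Interval endpoints are order statistics D(k+1) and D(B-k), k = round(alpha*B/2).
    k = round(alpha * n_boot / 2)
    ci = (diffs[k], diffs[n_boot - k - 1])
    # Two-sided p: share of differences below 0 (ties count 1/2), doubled.
    p_star = (sum(d < 0 for d in diffs) + 0.5 * sum(d == 0 for d in diffs)) / n_boot
    p_value = 2 * min(p_star, 1 - p_star)
    estimate = theil_sen_slope(x2, y2) - theil_sen_slope(x1, y1)
    return estimate, ci, p_value
```

With the Appendix data split by d into hypothetical lists x_control, y_control, x_treatment, and y_treatment, bootstrap_slope_difference(x_control, y_control, x_treatment, y_treatment) returns the estimated difference, the interval, and the p value. Resampling each group separately keeps the group sizes fixed at 40 and 38, mirroring the two-independent-groups design.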
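Because the A approximation was applied here to regression slopes rather than means, a rough sketch plugs each group's OLS slope and its conventional standard error into the Alexander and Govern (1994) procedure: slopes are pooled with inverse-variance weights, each group's deviation from the pooled slope is turned into a t statistic, each t is normalized to an approximate z, and the sum of squared z values is referred to χ² with J − 1 degrees of freedom. Two choices are assumptions on our part, labeled in the code: νⱼ = nⱼ − 2 degrees of freedom per group, and the normalizing formula, which is our transcription of the one used by Alexander and Govern. Function names are hypothetical.

```python
import math


def slope_and_se(x, y):
    """OLS slope of y on x within one group, with its usual standard error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return b, math.sqrt(sse / (n - 2) / sxx)


def chi2_sf(stat, df):
    """Upper-tail chi-square probability for integer df, via the recursion
    Q(k + 2) = Q(k) + (stat/2)**(k/2) * exp(-stat/2) / Gamma(k/2 + 1)."""
    half = stat / 2.0
    q, k = (math.erfc(math.sqrt(half)), 1) if df % 2 else (math.exp(-half), 2)
    while k < df:
        q += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return q


def normalize_t(t, nu):
    """Approximate z for a t statistic with nu df (assumed transcription of the
    normalizing formula used by Alexander & Govern, 1994)."""
    a = nu - 0.5
    b = 48 * a ** 2
    c = math.sqrt(a * math.log(1 + t ** 2 / nu))
    return (c + (c ** 3 + 3 * c) / b
            - (4 * c ** 7 + 33 * c ** 5 + 240 * c ** 3 + 855 * c)
            / (10 * b ** 2 + 8 * b * c ** 4 + 1000 * b))


def ag_slopes(groups):
    """A statistic comparing simple-regression slopes across groups.

    groups: list of (x, y) sequences, one pair per group.
    Returns (A, df, p) with df = number of groups - 1.
    """
    fits = [slope_and_se(x, y) for x, y in groups]
    precisions = [1.0 / se ** 2 for _, se in fits]
    b_plus = sum(w * b for w, (b, _) in zip(precisions, fits)) / sum(precisions)
    stat = 0.0
    for (x, _), (b, se) in zip(groups, fits):
        t = (b - b_plus) / se
        stat += normalize_t(t, len(x) - 2) ** 2  # nu_j = n_j - 2 (assumed for slopes)
    df = len(groups) - 1
    return stat, df, chi2_sf(stat, df)
```

Applied to the Appendix data as [(x_control, y_control), (x_treatment, y_treatment)] (hypothetical names), the function returns A, its degrees of freedom (here 1), and the χ² p value. As a check on the tail probability alone, chi2_sf(4.11, 1) is about .043, matching the p value paired with χ²(1) = 4.11 in the text.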

Received November 15, 2010
Revision received February 12, 2013
Accepted March 5, 2013
