Professional Documents
Culture Documents
A Comparison of Methods To Test Mediatio
A Comparison of Methods To Test Mediatio
A Comparison of Methods To Test Mediatio
A Monte Carlo study compared 14 methods to test the statistical significance of the
intervening variable effect. An intervening variable (mediator) transmits the effect
of an independent variable to a dependent variable. The commonly used R. M.
Baron and D. A. Kenny (1986) approach has low statistical power. Two methods
based on the distribution of the product and 2 difference-in-coefficients methods
have the most accurate Type I error rates and greatest statistical power except in 1
important case in which Type I error rates are too high. The best balance of Type
I error and statistical power across all cases is the test of the joint significance of
the two effects comprising the intervening variable effect.
The purpose of this article is to compare statistical vening variable. Consideration of conceptual issues
methods used to test a model in which an independent related to the definition of intervening variable effects
variable (X) causes an intervening variable (I), which is deferred to the final section of the Discussion.
in turn causes the dependent variable (Y). Many dif- Hypotheses articulating measurable processes that
ferent disciplines use such models, with the termi- intervene between the independent and dependent
nology, assumptions, and statistical tests only par- variables have long been proposed in psychology
tially overlapping among them. In psychology, the X (e.g., MacCorquodale & Meehl, 1948; Woodworth,
→ I → Y relation is often termed mediation (Baron & 1928). Such hypotheses are fundamental to theory in
Kenny, 1986), sociology originally popularized the many areas of basic and applied psychology (Baron &
term indirect effect (Alwin & Hauser, 1975), and in Kenny, 1986; James & Brett, 1984). Reflecting this
epidemiology, it is termed the surrogate or interme- importance, a search of the Social Science Citation
diate endpoint effect (Freedman & Schatzkin, 1992). Index turned up more than 2,000 citations of the
This article focuses on the statistical performance of Baron and Kenny article that presented an important
each of the available tests of the effect of an inter- statistical approach to the investigation of these pro-
cesses. Examples of hypotheses and models that in-
volve intervening variables abound. In basic social
Editor’s Note. Howard Sandler served as the action editor psychology, intentions are thought to mediate the re-
for this article.—SGW lation between attitude and behavior (Ajzen & Fish-
bein, 1980). In cognitive psychology, attentional pro-
David P. MacKinnon, Chondra M. Lockwood, Jeanne M. cesses are thought to intervene between stimulus and
Hoffman, Stephen G. West, and Virgil Sheets, Department behavior (Stacy, Leigh, & Weingardt, 1994). In in-
of Psychology, Arizona State University. dustrial psychology, work environment leads to changes
Jeanne M. Hoffman is now at the Department of Reha- in the intervening variable of job perception, which in
bilitation Medicine, University of Washington. Virgil turn affects behavioral outcomes (James & Brett,
Sheets is now at the Department of Psychology, Indiana 1984). In applied work on preventive health interven-
State University.
tions, programs are designed to change proximal vari-
This research was supported by U.S. Public Health Ser-
vice Grant DA09757 to David P. MacKinnon. We thank
ables, which in turn are expected to have beneficial
Sandy Braver for comments on this research. effects on the distal health outcomes of interest (Han-
Correspondence concerning this article should be ad- sen, 1992; MacKinnon, 1994; West & Aiken, 1997).
dressed to David P. MacKinnon, Department of Psychol- A search of psychological abstracts from 1996 to
ogy, Arizona State University, Tempe, Arizona 85287- 1999 yielded nearly 200 articles with the term media-
1104. E-mail: David.MacKinnon@asu.edu tion, mediating, or intervening in the title and many
83
84 MACKINNON, LOCKWOOD, HOFFMAN, WEST, AND SHEETS
more articles that investigated the effects of interven- on the product of coefficients involving paths in a
ing variables but did not use this terminology. Both path model (i.e., the indirect effect; Alwin & Hauser,
experimental and observational studies were used to 1975; Bollen, 1987; Fox, 1980; Sobel, 1982, 1988). In
investigate processes involving intervening variables. this article, we reserve the term mediational model to
An investigation of a subset of 50 of these articles refer to the causal steps approach of Judd and Kenny
found that the majority of studies examined one in- (1981a, 1981b) and Baron and Kenny (1986) and use
tervening variable at a time. Despite the number of the term intervening variable to refer to the entire set
articles proposing to study effects with intervening of 14 approaches that have been proposed.
variables, fewer than a third of the subset included To date, there have been three statistical simulation
any test of significance of the intervening variable studies of the accuracy of the standard errors of in-
effect. Not surprisingly, the majority of studies that tervening variable effects that examined some of the
did conduct a formal test used Baron and Kenny’s variants of the product of coefficients and one differ-
(1986) approach to testing for mediation. One expla- ence in coefficients approach (MacKinnon & Dwyer,
nation of the failure to test for intervening variable 1993; MacKinnon, Warsi, & Dwyer, 1995; Stone &
effects in two-thirds of the studies is that the methods Sobel, 1990). Two other studies included a small
are not widely known, particularly outside of social simulation to check standard error formulas for spe-
and industrial–organizational psychology. A less cific methods (Allison, 1995; Bobko & Rieck, 1980).
plausible explanation is that the large number of al- No published study to date has compared all of the
ternative methods makes it difficult for researchers to available methods within the three general ap-
decide which one to use. Another explanation, de- proaches. In this article, we determine the empirical
scribed below, is that most tests for intervening vari- Type I error rates and statistical power for all avail-
able effects have low statistical power. able methods of testing effects involving intervening
Our review located 14 different methods from a variables. Knowledge of Type I error rates and statis-
variety of disciplines that have been proposed to test tical power are critical for accurate application of any
path models involving intervening variables (see statistical test. A method with low statistical power
Table 1). Reflecting their diverse disciplinary origins, will often fail to detect real effects that exist in the
the procedures vary in their conceptual basis, the null population. A method with Type I error rates that
hypothesis being tested, their assumptions, and statis- exceed nominal rates (e.g., larger than 5% for nominal
tical methods of estimation. This diversity of methods alpha ⳱ .05) risks finding nonexistent effects. The
also indicates that there is no firm consensus across variety of statistical tests of effects involving inter-
disciplines as to the definition of an intervening vari- vening variables suggests that they may differ sub-
able effect. Nonetheless, for convenience, these meth- stantially in terms of statistical power and Type I error
ods can be broadly conceptualized as reflecting three rates. We first describe methods used to obtain point
different general approaches. The first general ap- estimates of the intervening variable effect and then
proach, the causal steps approach, specifies a series of describe standard errors and tests of significance. The
tests of links in a causal chain. This approach can be accuracy of the methods is then compared in a simu-
traced to the seminal work of Judd and Kenny (1981a, lation study.
1981b) and Baron and Kenny (1986) and is the most
commonly used approach in the psychological litera- The Basic Intervening Variable Model
ture. The second general approach has developed in- The equations used to estimate the basic interven-
dependently in several disciplines and is based on the ing variable model are shown in Equations 1, 2, and 3
difference in coefficients such as the difference be- and are depicted as a path model in Figure 1.
tween a regression coefficient before and after adjust-
ment for the intervening variable (e.g., Freedman & Y = 0共1兲 + X + 共1兲 (1)
Schatzkin, 1992; McGuigan & Langholtz, 1988;
Y = 0共2兲 + ⬘X + I + 共2兲 (2)
Olkin & Finn, 1995). The difference in coefficients
procedures are particularly diverse, with some testing I = 0共3兲 + ␣X + 共3兲 (3)
hypotheses about intervening variables that diverge in
major respects from what psychologists have tradi- In these equations, X is the independent variable, Y is
tionally conceptualized as mediation. The third gen- the dependent variable, and I is the intervening vari-
eral approach has its origins in sociology and is based able. 0(1), 0(2), 0(3) are the population regression
COMPARISON OF INTERVENING VARIABLE METHODS 85
Table 1
Summary of Tests of Significance of Intervening Variable Effects
Type of method Estimate Test of significance
Causal steps
− ⬘
Freedman & Schatzkin (1992) − ⬘ tN−2 =
公 2
+ ⬘
2
− 2⬘公1 − 2XI
− ⬘
McGuigan & Langholtz (1988) − ⬘ tN−2 =
公2 + ⬘2 − 2共⬘⬘兲
Clogg et al. (1992) − ⬘ − ⬘
tN−3 =
|XI ⬘|
Olkin & Finn (1995) simple minus partial
XY − XY.I XY − XY.I
correlation z=
Olkin & Finn
Product of coefficients
␣
Sobel (1982) first-order solution ␣ z=
公 ␣22 + 22␣
␣
Aroian (1944) second-order exact solution ␣ z=
公␣22 + 22␣ + 2␣2
␣
Goodman (1960) unbiased solution ␣ z=
公␣22 + 22␣ − 2␣2
MacKinnon et al. (1998) distribution of
products z ␣ z P = z ␣ z
␣
MacKinnon et al. (1998) distribution of ␣/␣ ␣ z⬘ =
公␣22 + 22␣
MacKinnon & Lockwood (2001) asymmetric
distribution of products ␣ ␣ Ⳳ CL公␣22 + 22␣
XI共IY − XY XI兲
Bobko & Rieck (1980) product of correlations XI 共IY − XYXI兲 共1 − 2XI兲
z=
共1 − 2XI兲 Bobko & Rieck
86 MACKINNON, LOCKWOOD, HOFFMAN, WEST, AND SHEETS
The series of causal steps described by Judd and this method provides no test of either the ␣ product
Kenny (1981b) and Baron and Kenny (1986) differ or the overall X → Y relation.
only slightly. Judd and Kenny (1981b, p. 605) require In summary, the widely used Judd and Kenny
three conclusions for mediation: (a) “The treatment (1981b) and Baron and Kenny (1986) variants of the
affects the outcome variable” (Ho: ⳱ 0), (b) “Each causal steps approach clearly specify the conceptual
variable in the causal chain affects the variable that links between each hypothesized causal relation and
follows it in the chain, when all variables prior to it, the statistical tests of these links. However, as de-
including the treatment, are controlled” (Ho: ␣ ⳱ 0 scribed in more depth in the Discussion, these two
and Ho:  ⳱ 0), and (c) “The treatment exerts no approaches probe, but do not provide, the full set of
effect upon the outcome when the mediating variables necessary conditions for the strong inference of a
are controlled” (Ho: ⬘ ⳱ 0). Baron and Kenny (p. causal effect of the independent variable on the de-
1176) defined three conditions for mediation: (a) pendent variable through the intervening variable,
“Variations in levels of the independent variable sig- even when subjects are randomly assigned to the lev-
nificantly account for variations in the presumed me- els of the independent variable in a randomized ex-
diator” (Ho: ␣ ⳱ 0), (b) “Variations in the mediator periment (Baron & Kenny, 1986; Holland, 1988;
significantly account for variations in the dependent MacKinnon, 1994). Because the overall purpose of
variable” (Ho:  ⳱ 0), and (c) “When Paths a [␣] and the causal steps methods was to establish conditions
b [] are controlled, a previously significant relation for mediation rather than a statistical test of the indi-
between independent and dependent variables is no rect effect of X on Y through I (e.g., ␣), they have
longer significant, with the strongest demonstration of several limitations. The causal steps methods do not
mediation occurring when Path c [⬘] is zero” (Ho: ⬘ provide a joint test of the three conditions (conditions
⳱ 0). Implicit in condition c is the requirement of an a, b, and c), a direct estimate of the size of the indirect
overall significant relation between the independent effect of X on Y, or standard errors to construct con-
and dependent variables (Ho: ⳱ 0). The central fidence limits, although the standard error of the in-
difference between the two variants is that Judd and direct effect of X on Y is given in the descriptions of
Kenny emphasized the importance of demonstrating the causal steps method (Baron & Kenny, 1986;
complete mediation, which would occur when the hy- Kenny et al., 1998). In addition, it is difficult to ex-
potheses that ⬘ ⳱ 0 cannot be rejected. Baron and tend the causal steps method to models incorporating
Kenny argued that models in which there is only par- multiple intervening variables and to evaluate each of
tial mediation (i.e., |⬘| < ||) rather than complete the intervening variable effects separately in a model
mediation are acceptable. They pointed out that such with more than one intervening variable (e.g., Mac-
models are more realistic in most social science re- Kinnon, 2000; West & Aiken, 1997). Finally, the re-
search because a single mediator cannot be expected quirement that there has to be a significant relation
to completely explain the relation between an inde- between the independent and dependent variables ex-
pendent and a dependent variable. cludes many “inconsistent” intervening variable mod-
A third variant of the causal steps approach has also els in which the indirect effect (␣) and direct effect
been used by some researchers (Cohen & Cohen, (⬘) have opposite signs and may cancel out (Mac-
1983, p. 366). In this variant, researchers claim evi- Kinnon, Krull, & Lockwood, 2000).
dence for intervening variable effects when separate
tests of each path in the intervening variable effect (␣ Difference in Coefficients Tests of the
and ) are jointly significant (Ho: ␣ ⳱ 0 and Ho:  ⳱ Intervening Variable Effect
0). This method simultaneously tests whether the in-
dependent variable is related to the intervening vari- Intervening variable effects can be assessed by
able and whether the intervening variable is related to comparing the relation between the independent vari-
the dependent variable. A similar test was also out- able and the dependent variable before and after ad-
lined by Allison (1995). Kenny, Kashy, and Bolger justment for the intervening variable. Several differ-
(1998) restated the Judd and Kenny causal steps but ent pairs of coefficients can be compared, including
noted that the tests of ␣ and  are the essential tests the regression coefficients ( − ⬘) described above
for establishing mediation. This method provides the and correlation coefficients, XY − XY.I, although, as
most direct test of the simultaneous null hypothesis described below, the method based on correlations
that path ␣ and path  are both equal to 0. However, differs from other tests of the intervening variable
88 MACKINNON, LOCKWOOD, HOFFMAN, WEST, AND SHEETS
effect. In the above expression, XY is the correlation developed from a test of collapsibility (Clogg, Pet-
between the independent variable and the dependent kova, & Shihadeh, 1992). Clogg et al. extended the
variable and XY.I is the partial correlation between the notion of collapsibility in categorical data analysis to
independent variable and the dependent variable par- continuous measures. Collapsibility tests whether it is
tialled for the intervening variable. Each of the vari- appropriate to ignore or collapse across a third vari-
ants of the difference in coefficients tests is summa- able when examining the relation between two vari-
rized in the middle part of Table 1. Readers should ables. In this case, collapsibility is a test of whether an
note that these procedures test a diverse set of null intervening variable significantly changes the relation
hypotheses about intervening variables. between two variables. As shown below, the standard
Freedman and Schatzkin (1992) developed a error of − ⬘ in Clogg et al. (1992) is equal to the
method to study binary health measures that can be absolute value of the correlation between the indepen-
extended to the difference between the adjusted and dent variable and the intervening variable times the
unadjusted regression coefficients (Ho: − ⬘ ⳱ 0). standard error of ⬘:
Freedman and Schatzkin derived the correlation be-
MSEⱍXIⱍ
tween and ⬘ that can be used in an equation for the Clogg et al. = = ⱍXIⱍ⬘. (6)
standard error based on the variance and covariance of 公X关n共1 − 2XI兲兴
the adjusted and unadjusted regression coefficients: Furthermore, Clogg et al. (1992) showed that the sta-
Freedman–Schatzkin = 公 2
+ ⬘
2
− 2⬘公1 − 2XI.
tistical test of − ⬘ divided by its standard error is
equivalent to testing the null hypothesis, Ho:  ⳱ 0.
(4) This indicates that a significance test of the interven-
ing variable effect can be obtained simply by testing
In this equation, XI is equal to the correlation be-
the significance of  or by dividing − ⬘ by the
tween the independent variable and the intervening
standard error in Equation 6 and comparing the value
variable, is the standard error of , and ⬘ is the
to the t distribution. Although Clogg et al.’s (1992)
standard error of ⬘. The estimate of − ⬘ is divided
approach tests Ho: − ⬘ ⳱ 0, Allison (1995) and
by the standard error in Equation 4 and this value is
Clogg, Petkova, and Cheng (1995) demonstrate that
compared to the t distribution for a test of signifi-
the derivation assumes that both X and I are fixed,
cance.
which is unlikely for a test of an intervening variable.
McGuigan and Langholtz (1988) also derived the
A method based on correlations (Ho: XY − XY.I ⳱
standard error of the difference between these two
0) compares the correlation between X and Y before
regression coefficients (Ho: − ⬘ ⳱ 0) for standard-
and after it is adjusted for I as shown in Equation 7.
ized variables. They found the standard error of the
The difference between the simple and partial corre-
− ⬘ method to be equal to
lations is the measure of how much the intervening
McGuigan–Langholtz = 公2 + ⬘
variable changes the simple correlation, where IY is
2
− 2共⬘⬘兲. (5)
the correlation between the intervening variable and
The covariance between and ⬘ (⬘⬘), appli- the dependent variable. The null hypothesis for this
cable for either standardized or unstandardized vari- test is distinctly different from any of the three forms
ables, is the mean square error (MSE) from Equation that have been used in psychology (see p. 86). As
2 divided by the product of sample size and the vari- noted by a reviewer, there are situations where the
ance of X. The difference between the two regression difference between the simple and partial correlation
coefficients ( − ⬘) is then divided by the standard is nonzero, yet the correlation between the intervening
error in Equation 5 and this value is compared to the variable and the dependent variable partialled for the
t distribution for a significance test. MacKinnon et al. independent variable is zero. In these situations, the
(1995) found that the original formula for the standard method indicates an intervening variable effect when
error proposed by McGuigan and Langholtz was in- there is not evidence that the intervening variable is
accurate for a binary (i.e., unstandardized) indepen- related to the dependent variable. The problem occurs
dent variable. On the basis of their derivation, we because of constraints on the range of the partial cor-
obtained the corrected formula given above that is relation:
accurate for standardized or unstandardized indepen-
XY − IY XI
dent variables. difference = XY − . (7)
Another estimate of the standard error of − ⬘ was 公共1 − IY2 兲共1 − 2XI兲
COMPARISON OF INTERVENING VARIABLE METHODS 89
Olkin and Finn (1995) used the multivariate delta propriate coefficients and test significance of their dif-
method to find the large sample standard error of the ference in models with more than one intervening
difference between a simple correlation and the same variable.
correlation partialled for a third variable. The differ-
ence between the simple and partial correlation is then Product of Coefficients Tests for the
divided by the calculated standard error and compared Intervening Variable Effect
to the standard normal distribution to test for an in-
tervening variable effect. The large sample solutions The third general approach is to test the signifi-
for the variances and covariances among the correla- cance of the intervening variable effect by dividing
tions are shown in Appendix A and the vector of the estimate of the intervening variable effect, ␣, by
partial derivatives shown in Equation 8 is used to find its standard error and comparing this value to a stan-
the standard error of the difference. There is a typo- dard normal distribution. There are several variants of
graphical error in this formula in Olkin and Finn (p. the standard error formula based on different assump-
160) that is corrected in Equation 8. The partial de- tions and order of derivatives in the approximations.
rivatives are These variants are summarized in the bottom part of
冋
Table 1.
IY − XIXY The most commonly used standard error is the ap-
2 1Ⲑ2
,
共1 − IY 兲 共1 − 2XI兲3 Ⲑ 2 proximate formula derived by Sobel (1982) using the
册
multivariate delta method based on a first order Tay-
1 XI − XY IY
1− , . lor series approximation:
公共1 − IY兲共1 − XI兲 共1 − 2XI兲1 Ⲑ 2共1 − IY2 兲3 Ⲑ 2
2 2
For the two tests of intervening variable effects (the The intervening variable effect is divided by the stan-
other test is described below) where standard errors dard error in Equation 9, which is then compared to a
are derived using the multivariate delta method, the standard normal distribution to test for significance
partial derivatives are presented in the text rather than (Ho: ␣ ⳱ 0). This standard error formula is used in
showing the entire formula for the standard error, covariance structure programs such as EQS (Bentler,
which is long. A summary of the multivariate delta 1997) and LISREL (Jöreskog & Sörbom, 1993).
method is shown in Appendix A. An SAS (Version The exact standard error based on first and second
6.12) program to compute the standard errors is order Taylor series approximation (Aroian, 1944) of
shown in Appendix B. the product of ␣ and  is
MacKinnon, Lockwood, and Hoffman (1998) limits to accommodate the nonnormal distribution of
showed evidence that the ␣/␣ methods to test the the intervening variable effect based on the distribu-
significance of the intervening variable effect have tion of the product of random variables. Again, two z
low power because the distribution of the product of statistics are computed, z␣ ⳱ ␣/␣ and z ⳱ /.
regression coefficients ␣ and  is not normally dis- These values are then used to find critical values for
tributed, but rather is often asymmetric with high kur- the product of two random variables from the tables in
tosis. Under the conditions of multivariate normality Meeker et al. (1981) to find lower and upper signifi-
of X, I, and Y, the two paths represented by ␣ and  cance levels. Those values are used to compute lower
in Figure 1 are independent (MacKinnon, Warsi, & and upper confidence limits using the formula CL ⳱
Dwyer, 1995; Sobel, 1982). On the basis of the sta- ␣± (critical value) ␣. If the confidence interval
tistical theory of the products of random variables does not include zero, the intervening variable effect
(Craig, 1936; Meeker, Cornwell, & Aroian, 1981; is significant.
Springer & Thompson, 1966), MacKinnon and col- Bobko and Rieck (1980) examined intervening
leagues (MacKinnon et al., 1998; MacKinnon & variable effects in path analysis using regression co-
Lockwood, 2001) proposed three alternative variants efficients from the analysis of standardized variables
(presented below) that theoretically should be more (Ho: ␣ss ⳱ 0, where ␣s and s are from regression
accurate: (a) empirical distribution of ␣/␣ (Ho: analysis of standardized variables). These researchers
␣/␣ ⳱ 0), (b) distribution of the product of two used the multivariate delta method to find an estimate
standard normal variables, z␣z (Ho: z␣z ⳱ 0), and of the variance of the intervening variable effect for
(c) asymmetric confidence limits for the distribution standardized variables, based on the product of the
of the product, ␣ (Ho: ␣ ⳱ 0). correlation between X and I and the partial regression
In the first variant, MacKinnon et al. (1998) con- coefficient relating I and Y, controlling for X. The
ducted extensive simulations to estimate the empirical function of the product of these terms is
sampling distribution of ␣ for a wide range of values
XI 共IY − XY XI兲
of ␣ and . On the basis of these empirical sampling product = . (12)
distributions, critical values for different significance 1 − 2XI
levels were determined. These tables of critical values
are available at http://www.public.asu.edu/∼davidpm/ The partial derivatives of this function given in Bobko
ripl/methods.htm. For example, the empirical critical and Rieck (1980) are
冋 册
value is .97 for the .05 significance level rather than
1.96 for the standard normal test of ␣ ⳱ 0. We 2XIIY + IY − 2XIXY −2XI XI
, , . (13)
designate this test statistic by z⬘ because it uses a 共1 − XI兲
2 2
1 − XI 1 − 2XI
2
Overview of Simulation Study of path , effect size of path ⬘, and sample size (50,
The purpose of the simulation study was to provide 100, 200, 500, 1,000), for a total of 640 different
researchers with information about the statistical per- conditions. A total of 500 replications of each condi-
formance of the 14 tests of the intervening variable tion were conducted.
effect. The primary focus of our comparison was the Accuracy of Point Estimates and
Type I error rate and statistical power of each test. Standard Error
Intervening variable effect estimates and standard er-
rors were also examined to provide another indication Bias and relative bias were used to assess the ac-
of the accuracy of the methods. We predicted that the curacy of the point estimates of the intervening vari-
use of multiple hypothesis tests in the causal steps able effect. As shown below, relative bias was calcu-
approaches would lead to low Type I error rates and lated as the ratio of bias (numerator) to the true value:
low statistical power. We also predicted that many of ˆ −
the traditional tests of the ␣ product would also have RelativeBias = , (14)
low Type I error rates and low statistical power be-
cause of the associated highly heavy tailed distribu- ˆ is the point estimate of the simulated intervening
tion. A central question addressed by the simulation variable effect and is the true value of the interven-
was whether the alternative and the newer tests of the ing variable effect.
intervening variable effects would yield higher levels The accuracy of each standard error was deter-
of statistical power without increasing the Type I error mined by comparing the average estimate of the stan-
rate. dard error of the intervening variable effect across the
500 simulations to the standard deviation of the inter-
Method
vening variable effect estimate from the 500 simula-
Simulation Description tions. The standard deviation of the intervening vari-
The SAS (Version 6.12) programing language was able effect across 500 simulations was the estimate of
used for all statistical simulations and analyses. Vari- the true standard error (Yang & Robertson, 1986).
ables were generated from the normal distribution us-
Calculation of Empirical Power and Type I
ing the RANNOR function with the current time as
Error Rate
the seed. Sample sizes were chosen to be comparable
to those common in the social sciences: 50, 100, 200, Empirical power or Type I error rates as appropri-
500, and 1,000. Parameter values ␣, , and ⬘ were ate were calculated for each test of the intervening
chosen to correspond to effect sizes of zero, small variable effect. We report results for the 5% level of
(2% of the variance in the dependent variable), me- significance because it is the most commonly used
dium (13% of the variance in the dependent variable), value in psychology. For each condition, the propor-
and large (26% of the variance in the dependent vari- tion of times that the intervening variable effect was
able), as described in Cohen (1988, pp. 412–414). statistically significant in 500 replications was tabu-
These parameters were 0, 0.14, 0.39, and 0.59, corre- lated.
sponding to partial correlations of 0, 0.14, 0.36, and When ␣ ⳱ 0,  ⳱ 0, or both ␣ and  ⳱ 0, the
0.51, respectively. The intervening variable and de- proportion of replications in which the null hypothesis
pendent variable were always simulated to be con- of no intervening variable effect was rejected pro-
tinuous. In half of the simulations, the independent vided an estimate of the empirical Type I error rate.
variable was continuous and in the other half, the Because we used the 5% significance level, the inter-
independent variable was binary with an equal num- vening variable effect was expected to be statistically
ber of cases in each category. The binary case was significant in 25 (5%) of the 500 samples when the
included to investigate intervening variable effects in intervening variable effect equals zero.
experimental studies. The ␣ parameters were adjusted When both ␣ and  did not equal zero, the propor-
in the binary case to maintain the same partial corre- tion of times that each method led to the conclusion
lations as in the continuous case. that the intervening variable effect was significant
In summary, the simulation used a 2 × 4 × 4 × 4 × provided the measure of statistical power. The higher
5 factorial design. We varied the factors of indepen- the proportion of times a method led to the conclusion
dent variable type (continuous and binary), effect size to reject the false null hypothesis of no effect, the
of path ␣ (zero, small, medium, and large), effect size greater the statistical power.
92 MACKINNON, LOCKWOOD, HOFFMAN, WEST, AND SHEETS
Table 3 Table 4
Comparison of Multivariate Delta Standard Error Type I Error Rates and Statistical Power for Causal
Estimates to Standard Deviation of Point Estimates Step Methods
Sample size Sample size
Effect size 50 100 200 500 1,000 Effect size 50 100 200 500 1,000
Standard deviation for simple minus partial Judd & Kenny (1981a, 1981b)
correlation rdifference Zero 0 0 .0020 0 0
Zero .0215 .0119 .0049 .0021 .0009 Small .0040 0 .0060 .0400 .0740
Small .0361 .0207 .0159 .0087 .0061 Medium .1060 .2540 .4940 .8620 .9520
Medium .0721 .0489 .0347 .0227 .0162 Large .4580 .7940 .9520 .9460 .9500
Large .0947 .0623 .0456 .0288 .0197 Baron & Kenny (1986)
Olkin & Finn (1995) standard error Zero 0 0 .0020 0 0
Zero .0252 .0127 .0061 .0025 .0013 Small .0040 0 .0100 .0600 .1060
Small .0346 .0226 .0151 .0089 .0063 Medium .1160 .2760 .5200 .8820 .9960
Medium .0711 .0496 .0351 .0221 .0156 Large .4700 .8220 .9880 1.000 1.000
Large .0921 .0649 .0461 .0270 .0205 Joint significance of ␣ and 
Standard deviation of product of coefficients for Zero .0040 .0060 .0020 .0020 0
standardized variables rproduct Small .0360 .0660 .2860 .7720 .9880
Zero .0205 .0117 .0051 .0021 .0009 Medium .5500 .9120 1.000 1.000 1.000
Small .0349 .0207 .0157 .0088 .0061 Large .9300 1.000 1.000 1.000 1.000
Medium .0698 .0465 .0323 .0215 .0157 Note. For all analyses, ␣ ⳱  and ⬘ ⳱ 0. Small effect size ⳱
Large .0831 .0556 .0405 .0252 .0177 .14, medium effect size ⳱ .36, and large effect size ⳱ .51. Tests
are two-tailed, p ⳱ .05. For each method, values in the first row for
Bobko & Rieck (1980) standard error each test are estimates of the empirical Type I error rate. Values in
Zero .0256 .0127 .0061 .0025 .0012 rows 2–4 represent empirical estimates of statistical power.
Small .0351 .0228 .0152 .0089 .0063
Medium .0722 .0497 .0351 .0219 .0155 The causal steps methods had Type I error rates
Large .0919 .0637 .0452 .0282 .0199
below nominal values at all sample sizes as shown in
Note. The measure of the true standard error for the simple minus Table 4. The Baron and Kenny (1986) and Judd and
partial correlation is the standard deviation of the simple minus
partial correlation. The measure of the true standard error of the
Kenny (1981b) methods had low power for small and
product of coefficients for standardized variables is the standard medium effect sizes and attained power of .80 or
deviation of the product of coefficients for standardized variables. greater for large effects with more than 100 subjects.
The Baron and Kenny (1986) method had greater
variables and standard errors for simple minus partial power as ⬘ increased and the Judd and Kenny
correlations were all very close to the true values for (1981b) method had less power as ⬘ increased. The
all conditions, indicating that the standard errors de- test of the joint significance of ␣ and  was similar to
rived using the multivariate delta method were gen- the other causal steps methods in that it had low Type
erally accurate. I error rates. The Type I error rate was consistent with
.052 ⳱ .0025 expected for two independent tests,
Power and Type I Error
however. Unlike the Baron and Kenny and Judd and
To reduce the number of tables, we present the Kenny methods, it had at least .80 power to detect
results from the subset of conditions in which ␣ ⳱  large effects at a sample size of 50, medium effects at
and ⬘ ⳱ 0 in Tables 4, 5, and 6. The results for 100, and approached .80 power to detect a small effect
conditions having nonzero values of ⬘ and the con- for a sample size of 500. The power to detect small
ditions in which ␣ ⫽  but both ␣ and  were greater effects was low for all causal steps methods. The joint
than zero generally produced the same results. Results significance of ␣ and  was the most powerful of the
that differed across the values of ⬘ are described in causal steps methods.
the text. The results for the case when either ␣ or  Similar to the causal steps methods, all of the dif-
were zero and the other path was nonzero are shown ference in coefficients methods had low Type I error
in Tables 7, 8, and 9. The full results of the simulation rates with two exceptions, as shown in Table 5. All of
are available from the Web site http://www.public. the − ⬘ methods had .80 or greater power and were
asu.edu/∼davidpm/ripl/methods.htm. able to detect small effects once the sample size
94 MACKINNON, LOCKWOOD, HOFFMAN, WEST, AND SHEETS
Table 5 Table 6
Type I Error Rates and Statistical Power for Difference Type I Error Rates and Statistical Power for Product of
in Coefficients Methods Coefficients Methods
Sample size Sample size
Effect size 50 100 200 500 1,000 Effect size 50 100 200 500 1,000
− ⬘ (Freedman & Schatzkin, 1992) First-order test (Sobel, 1982)
Zero .0160 .0440 .0180 .0520 .0500 Zero 0 0 .0020 0 0
Small .1240 .2280 .5060 .8900 .9920 Small .0060 .0100 .1220 .5620 .9760
Medium .7100 .9560 1.000 1.000 1.000 Medium .3600 .8620 1.000 1.000 1.000
Large .9560 1.000 1.000 1.000 1.000 Large .9020 1.000 1.000 1.000 1.000
− ⬘ (McGuigan & Langholtz, 1988) Second-order test (Aroian, 1944)
Zero 0 0 0 0 0 Zero 0 0 0 0 0
Small .0060 .0060 .0920 .5260 .9740 Small .0060 .0060 .0920 .5260 .9740
Medium .3380 .8540 1.000 1.000 1.000 Medium .3320 .8540 1.000 1.000 1.000
Large .8920 1.000 1.000 1.000 1.000 Large .8920 1.000 1.000 1.000 1.000
− ⬘ (Clogg et al., 1992) Unbiased test (Goodman, 1960)
Zero .0320 .0660 .0320 .0620 .0540 Zero .0160 .0040 .0140 .0020 .0100
Small .1780 .2840 .5100 .8920 .9920 Small .0080 .0200 .1420 .6200 .9820
Medium .7320 .9560 1.000 1.000 1.000 Medium .3900 .8700 1.000 1.000 1.000
Large .9580 1.000 1.000 1.000 1.000 Large .9120 1.000 1.000 1.000 1.000
Simple minus partial correlation (Olkin & Finn, 1995) Distribution of products test P ⳱ z ␣ z
Zero .0020 0 .0020 0 0 (MacKinnon et al., 1998)
Small .0100 .0120 .1260 .5780 .9800 Zero .0620 .0760 .0420 .0660 .0400
Medium .4340 .8920 1.000 1.000 1.000 Small .2220 .3960 .7180 .9740 1.000
Large .9380 1.000 1.000 1.000 1.000 Medium .9180 .9960 1.000 1.000 1.000
Note. For all analyses, ␣ ⳱  and ⬘ ⳱ 0. Small effect size ⳱ Large 1.000 1.000 1.000 1.000 1.000
.14, medium effect size ⳱ .36, and large effect size ⳱ .51. Tests Distribution of ␣/␣ (MacKinnon et al., 1998)
are two-tailed, p ⳱ .05. For each method, values in the first row for
each test are estimates of the empirical Type I error rate. Values in Zero .0560 .0680 .0400 .0600 .0420
rows 2–4 represent empirical estimates of statistical power. Small .2060 .3600 .6920 .9580 .9960
Medium .9040 .9960 1.000 1.000 1.000
reached 1,000, medium effects at 100, and large ef- Large .9980 1.000 1.000 1.000 1.000
fects at a sample size of 50. Only the Clogg et al. Asymmetric distribution of products test
(1992) and Freedman and Schatzkin (1992) methods (MacKinnon & Lockwood, 2001)
had accurate Type I error rates (i.e., close to .05) and Zero .0040 .0040 .0020 0 0
greater than .80 power to detect a small, medium, and Small .0300 .0620 .2740 .7600 .9880
Medium .5540 .9200 1.000 1.000 1.000
large effect at sample sizes of 500, 100, and 50, re-
Large .9400 1.000 1.000 1.000 1.000
spectively. Even though the standard errors from
these methods appeared to underestimate the true Product of coefficients for standardized variables
standard error, they had the most accurate Type I error (Bobko & Rieck, 1980)
rates and higher statistical power. This pattern of re- Zero .0020 0 .0020 0 0
Small .0080 .0160 .1300 .5700 .9780
sults suggests that a standard error that is too small
Medium .4200 .8760 1.000 1.000 1.000
may be partially compensating for the higher critical Large .9200 1.000 1.000 1.000 1.000
values associated with the nonnormal distribution of
Note. For all analyses, ␣ ⳱  and ⬘ ⳱ 0. Small effect size ⳱
the intervening variable effect. .14, medium effect size ⳱ .36, and large effect size ⳱ .51. Tests
Like most of the previous methods, the product of are two-tailed, p ⳱ .05. For each method, values in the first row for
coefficients methods generally had Type I error rates each test are estimates of the empirical Type I error rate. Values in
rows 2–4 represent empirical estimates of statistical power.
below .05 and adequate power to detect small, me-
dium, and large effects for sample sizes of 1,000, 100, all tests. These results are presented in Table 6. At a
and 50 respectively. The distribution of products test, sample size of 50, the two distribution methods had
P ⳱ z␣z, and the distribution of z⬘ ⳱ ␣/␣ test power of above .80 to detect medium and large effects
had accurate Type I error rates and the most power of and detected small effects with .80 power at a sample
COMPARISON OF INTERVENING VARIABLE METHODS 95
Table 7
Type I Error Rates of Mixed Effects for Causal Steps Methods
Sample size
␣ value/ value 50 100 200 500 1,000
Judd & Kenny (1981a, 1981b)
Large/zero .0440 0 0 0 0
Medium/zero .0020 .0020 .0020 .0020 0
Small/zero 0 0 0 0 .0020
Zero/large .0100 .0080 .0060 .0100 .0060
Zero/medium .0480 .0040 .0040 .0020 .0080
Zero/small 0 0 0 .0080 0
Baron & Kenny (1986)
Large/zero .0040 0 0 0 0
Medium/zero .0020 .0040 .0020 0 0
Small/zero 0 0 0 0 0
Zero/large .0020 .0060 .0080 .0020 .0040
Zero/medium 0 .0020 .0100 .0020 .0060
Zero/small 0 0 0 .0040 .0020
Joint significance of ␣ and 
Large/zero .0400 .0420 .0500 .0480 .0380
Medium/zero .0400 .0560 .0460 .0500 .0480
Small/zero .0100 .0200 .0300 .0380 .0360
Zero/large .0520 .0520 .0460 .0520 .0340
Zero/medium .0480 .0540 .0620 .0340 .0500
Zero/small .0060 .0160 .0280 .0400 .0420
Note. For all analyses, ⬘ ⳱ 0. Small value ⳱ .14, medium value ⳱ .39, and large value ⳱ .59. Tests
are two-tailed, p ⳱ .05. For each method, values in each row are empirical estimates ot the Type I error
rate.
size of 500. The asymmetric confidence limits method variable effect when the  path was a medium or large
also had Type I error rates that were too low but had effect. In the case where ␣ ⳱ 0 and the  effect was
more power than the other product of coefficient large, these methods yielded Type I error rates that
methods. were too high, although the distribution methods, P ⳱
Overall, the two distribution methods, P ⳱ z␣z, z␣z and z⬘ ⳱ ␣/␣, performed better than the dif-
and z⬘ ⳱ ␣/␣, the Clogg et al. (1992), and Freed- ference in coefficients methods. When ␣ was a large
man and Schatzkin (1992) methods performed the effect and  ⳱ 0, the Clogg et al. (1992) and Freed-
best of all of the methods tested in terms of the most man and Schatzkin methods worked well and the dis-
accurate Type I error rates and the greatest statistical tribution of products methods did not. The joint sig-
power. However, recall that the Clogg et al. (1992) nificance test of ␣ and , the asymmetric confidence
method assumes fixed effects for X and I (equivalent limits test, and the tests based on dividing the ␣
to a test of the significance of ) so that it may not be intervening variable effect by the standard error of ␣
a good test on conceptual grounds. The similar per- had more accurate standard errors in the case where
formance of the Freedman and Schatzkin test suggests one of the ␣ and  parameters was equal to zero.
that this method is also based on fixed effects for X
and I. These methods were superior whenever both ␣ Discussion
⳱ 0 and  ⳱ 0 and for all combinations of ␣ and  In our discussion, we initially focus on the statisti-
values as long as both parameters were nonzero. For cal performance of each of the 14 tests of the effect of
cases when either ␣ ⳱ 0 and  was nonzero or ␣ was the intervening variable effect that were considered.
nonzero and  ⳱ 0, these methods were not the most We then focus on statistical recommendations and
accurate (see Tables 7, 8, and 9). The ␣ path could be more general conceptual and practical issues associ-
nonsignificant and quite small, yet these methods ated with the choice of a test of an intervening vari-
would suggest a statistically significant intervening able effect.
96 MACKINNON, LOCKWOOD, HOFFMAN, WEST, AND SHEETS
Table 8
Type I Error Rates of Mixed Effects for Difference in Coefficients Methods
Sample size
␣ value/ value 50 100 200 500 1,000
− ⬘ (Freedman & Schatzkin, 1992)
Large/zero .0440 .0540 .0500 .0480 .0380
Medium/zero .0460 .0560 .0460 .0500 .0480
Small/zero .0300 .0560 .0480 .0440 .0400
Zero/large .5680 .5800 .6000 .5820 .5980
Zero/medium .4720 .6560 .7100 .7020 .7080
Zero/small .1120 .1980 .3940 .7520 .8840
− ⬘ (McGuigan & Langholtz, 1988)
Large/zero .0200 .0380 .0440 .0440 .0360
Medium/zero .0120 .0280 .0360 .0440 .0440
Small/zero 0 .0020 .0080 .0080 .0140
Zero/large .0300 .0380 .0400 .0480 .0320
Zero/medium .0120 .0260 .0420 .0240 .0440
Zero/small 0 0 .0020 .0100 .0180
− ⬘ (Clogg et al., 1992)
Large/zero .0460 .0540 .0500 .0480 .0380
Medium/zero .0460 .0560 .0460 .0500 .0480
Small/zero .0460 .0640 .0500 .0440 .0400
Zero/large .9800 1.000 1.000 1.000 1.000
Zero/medium .7740 .9740 1.000 1.000 1.000
Zero/small .2020 .2980 .4900 .8660 .9860
Simple minus partial correlation (Olkin & Finn, 1995)
Large/zero .0340 .0380 .0480 .0380 .0364
Medium/zero .0160 .0340 .0380 .0440 .0500
Small/zero 0 .0040 .0080 .0100 .0160
Zero/large .0340 .0380 .0540 .0520 .0280
Zero/medium .0300 .0400 .0040 .0280 .0500
Zero/small 0 .0060 .0080 .0140 .0200
Note. For all analyses, ⬘ ⳱ 0. Small value ⳱ .14, medium value ⳱ .39, and large value ⳱ .59.
Tests are two-tailed, p ⳱ .05. For each method, values in each row are empirical estimates of the
Type I error rate.
Table 9
Type I Error Rates of Mixed Effects for Product of Coefficients Methods
Sample size
␣ value/ value 50 100 200 500 1,000
First-order test (Sobel, 1982)
Large/zero .0240 .0460 .0460 .0460 .0360
Medium/zero .0120 .0300 .0380 .0460 .0440
Small/zero 0 .0020 .0080 .0100 .0140
Zero/large .0320 .0400 .0400 .0500 .0320
Zero/medium .0200 .0300 .0420 .0240 .0440
Zero/small 0 .0020 .0080 .0160 .0220
Second-order test (Aroian, 1944)
Large/zero .0200 .0380 .0440 .0440 .0360
Medium/zero .0120 .0280 .0360 .0440 .0440
Small/zero 0 .0020 .0080 .0080 .0140
Zero/large .0300 .0380 .0400 .0480 .0320
Zero/medium .0120 .0260 .0420 .0240 .0440
Zero/small 0 0 .0020 .0100 .0180
Unbiased test (Goodman, 1960)
Large/zero .0280 .0480 .0480 .0480 .0360
Medium/zero .0160 .0320 .0400 .0460 .0480
Small/zero .0220 .0080 .0100 .0120 .0140
Zero/large .0380 .0420 .0420 .0500 .0320
Zero/medium .0300 .0360 .0480 .0260 .0440
Zero/small .0080 .0160 .0100 .0200 .0240
Distribution of products test P ⳱ z␣z (MacKinnon et al., 1998)
Large/zero .5860 .6720 .8080 .8820 .8860
Medium/zero .2940 .5340 .6600 .8820 .8740
Small/zero .1160 .1720 .2580 .4340 .5920
Zero/large .6260 .6700 .7920 .8580 .9120
Zero/medium .4320 .5400 .6780 .8180 .8700
Zero/small .1380 .1800 .2800 .4560 .5680
Distribution of ␣/␣ (MacKinnon et al, 1998)
Large/zero .3460 .3520 .4260 .3760 .3600
Medium/zero .2940 .3160 .3440 .3500 .3660
Small/zero .0900 .1480 .1920 .2800 .3100
Zero/large .3520 .3700 .3660 .3860 .3660
Zero/medium .3080 .3380 .3780 .3500 .3800
Zero/small .1360 .1680 .2380 .3260 .2860
Asymmetric distribution of products test (MacKinnon & Lockwood, 2001)
Large/zero .0280 .0380 .0480 .0480 .0400
Medium/zero .0240 .0420 .0400 .0460 .0440
Small/zero .0060 .0120 .0160 .0280 .0280
Zero/large .0440 .0360 .0400 .0460 .0340
Zero/medium .0300 .0340 .0380 .0320 .0460
Zero/small .0060 .0100 .0160 .0260 .0200
Product of coefficients for standardized variables (Bobko & Rieck, 1980)
Large/zero .0340 .0500 .0500 .0480 .0360
Medium/zero .0200 .0320 .0400 .0460 .0480
Small/zero 0 .0020 .0080 .0100 .0140
Zero/large .0420 .0480 .0420 .0500 .0320
Zero/medium .0300 .0380 .0480 .0280 .0440
Zero/small 0 .0080 .0080 .0180 .0220
Note. For all analyses, ⬘ ⳱ 0. Small value ⳱ .14, medium value ⳱ .39, and large value ⳱ .59. Tests
are two-tailed p ⳱ .05. For each method, values in each row are empirical estimates of the Type I error
rate.
98 MACKINNON, LOCKWOOD, HOFFMAN, WEST, AND SHEETS
Freedman and Schatzkin tests do not appear to give an Type I errors that are too high. The better perfor-
Freedman and Schatzkin tests do not appear to give an mance for the Clogg et al. (1992) test when ␣ ⫽ 0 and
accurate estimate of the standard error of the inter-  ⳱ 0 is not surprising because the test of signifi-
vening variable effect (because of the assumption of cance is equivalent to the test of whether the  pa-
fixed X and I), the significance tests have the most rameter is statistically significant (H0:  ⳱ 0) and
accurate Type I error rates and the greatest statis- does not include the ␣ value in the test. The test based
tical power for most situations. Similarly, the pro- on the empirical distribution of z⬘ ⳱ ␣/␣ has the
duct of coefficients methods have higher power lowest Type I error rates of the four best methods
than the Baron and Kenny and Judd and Kenny when looking across both the ␣ ⳱ 0 and  ⫽ 0 and
(1981b) methods but, again, the Type I error rates are the ␣ ⫽ 0 and  ⳱ 0 cases.
too low. The low power rates and low Type I error In summary, statistical tests of the intervening
rates are present for the first-order test used in co- variable effect trade off two competing problems.
variance structure analysis programs including First, the nonnormal sampling distribution of the ␣
LISREL (Jöreskog & Sörbom, 1993) and EQS effect leads to tests that are associated with empirical
(Bentler, 1997). The distribution of the product test levels of significance that are lower than the stated
P ⳱ z␣z and the distribution of z⬘ ⳱ ␣/␣ have levels when H0 is true as well as low statistical
accurate Type I error rates when ␣ ⳱  ⳱ 0, and the power when H0 is false. The MacKinnon et al. (1998)
highest power rates throughout. These two distribu- z⬘ and P tests explicitly address this problem and
tion tests do not assume that the intervening variable provide accurate Type I error rates when ␣ ⳱  ⳱ 0
effect is normally distributed, consistent with the and relatively high levels of statistical power when
unique distribution of the product of two random, H0 is false. Second, the test of the null hypothesis for
normal variables (Craig, 1936), but they do assume ␣ ⳱ 0 is complex because the null hypothesis
that individual regression coefficients are normally takes a compound form, encompassing (a) ␣ ⳱ 0,
distributed.  ⳱ 0; (b) ␣ ⫽ 0,  ⳱ 0; and (c) ␣ ⳱ 0,  ⫽ 0. Two
The difference in the statistical background of the of the MacKinnon et al. tests break down and yield
Clogg et al. (1992) and Freedman and Schatzkin higher than stated Type I error rates under conditions b
(1992) tests and the distribution of products tests, P ⳱ and c. In contrast, the use of otherwise inappropriate
z␣z and z⬘ ⳱ ␣/␣, makes the similarity of the conservative critical values based on the normal sam-
empirical power and Type I error rates when ␣ ⳱  pling distribution turns out empirically to compensate
⳱ 0 somewhat surprising. The Clogg et al. (1992) and for the inflation in the Type I error rate associated with
Freedman and Schatzkin (1992) tests underestimate the compound form of the null hypothesis.
the standard errors, which serves to compensate for Statistical Recommendations
critical values that are too low when standard refer-
ence distributions are used. Although the degree of Focusing initially on the statistical performance of
compensation appeared to be quite good under some the tests of the intervening variable effect, the 14 tests
of the conditions investigated in the present simula- can be divided into three groups of tests with similar
tion, it is unclear whether these tests could be ex- performance. The first group consists of the Baron
pected to show an appropriate degree of compensation and Kenny (1986) and Judd and Kenny (1981b) ap-
under other conditions (e.g., larger effect sizes and proaches, which have low Type I error rates and the
other levels of significance). lowest statistical power in all conditions studied. Four
There is an important exception to the accuracy of tests are included in the second group of methods
the Clogg et al. (1992), Freedman and Schatzkin consisting of P ⳱ z␣z and z⬘ ⳱ ␣/␣, the Clogg
(1992), P ⳱ z␣z, and z⬘ ⳱ ␣/␣ tests. When the et al. (1992), and Freedman and Schatzkin (1992)
true population values are ␣ ⳱ 0 and  ⫽ 0, the tests, which have the greatest power when both ␣ and
methods lead to the conclusion that there is an inter-  are nonzero and the most accurate Type I error rates
vening variable effect far too often, although the dis- when both ␣ and  are zero. These four methods can
tribution of products test z⬘ ⳱ ␣/␣ is less suscep- be ordered from the best to the worst as z⬘ ⳱ ␣/␣,
tible to Type I errors than the other methods. When P ⳱ z␣z, Freedman and Schatzkin test, and Clogg et
the true value of ␣ ⫽ 0 and  ⳱ 0, the Clogg et al. al. test for most values of ␣ and . If researchers wish
(1992) and Freedman and Schatzkin (1992) tests still to have the maximum power to detect the intervening
perform well and the two distribution methods give variable effect and can tolerate the increased Type I
COMPARISON OF INTERVENING VARIABLE METHODS 99
error rate if either the ␣ or  population parameter is The overall pattern of results for the case of ␣ ⳱ 0
zero, then these are the methods of choice. If there is and  ⫽ 0 forces consideration of two different sta-
evidence that ␣ ⫽ 0 and  ⳱ 0, then the Clogg et al. tistical null hypotheses regarding intervening variable
and Freedman and Schatzkin methods will have in- effects. The first hypothesis is a test of whether the
creased power and accurate Type I error rates and P indirect effect ␣ is zero. This hypothesis is best
⳱ z␣z and z⬘ ⳱ ␣/␣ tests have Type I error rates tested by the methods with the most power, empirical
that are too high. When ␣ ⫽ 0 and  ⳱ 0, the Clogg distribution z⬘ ⳱ ␣/␣ and distribution of P ⳱
et al., and the Freedman and Schatzkin methods have z␣z, and possibly by the Freedman and Schatzkin
very high Type I error rates. For both cases where (1992) test. The second hypothesis is a test of whether
either ␣ or  is zero, the empirical distribution method both paths ␣ and  are equal to zero. In this case the
z⬘ ⳱ ␣/␣ has the lowest Type I error rates (Type joint significance of ␣ and  or the asymmetric con-
I error rates did not exceed .426). As a result, if the fidence limit test provide the most direct test of the
researcher seeks the greatest power to detect an effect hypothesis. Given the importance of establishing that
and does not consider an effect to be transmitted (a) the treatment leads to changes in the intervening
through an intervening variable if ␣ can be zero, then variable (␣ ⫽ 0) and (b) the intervening variable is
the z⬘ ⳱ ␣/␣ empirical distribution test is the test associated with dependent variable ( ⫽ 0) (see
of choice. The researcher should be aware that the Krantz, 1999), we strongly recommend this test for
Type I error rate can be higher than nominal values experimental investigations involving the simple in-
for the situation whether either ␣ or  (but not both) tervening variable model portrayed in Figure 1.
is zero in the population. Asymmetric confidence limits for the mediated effect,
Eight tests are included in the third group of meth- ␣, can then be computed based on the distribution of
ods, which represent less power and too low Type I the product.
error rates when ␣ ⳱  ⳱ 0, but more accurate Type
I error rates when either ␣ or  is zero. The tests listed Causal Inference
in terms of accuracy consist of the joint significance
test of ␣ and , asymmetric critical value test, test of
the simple minus partial correlation, test of product of This article has focused on the statistical properties
correlations, unbiased test of ␣, first-order tests of of intervening effect tests, at least in part because the
␣, McGuigan and Langholtz (1988) test of − ⬘, requirements for causal inference regarding an inter-
and then second-order test of ␣. Unfortunately, vening effect are complex and controversial. The ma-
Goodman’s (1960) unbiased test often yields negative jority of the statistical tests of intervening effects re-
variances and is hence undefined for zero or small ported in psychology journals test the significance of
effects or small sample sizes. The joint significance the indirect effect ␣ or each of the causal steps pro-
test of ␣ and  appears to be the best test in this group posed by Judd and Kenny (1981b) or Baron and
as it has the most power and the most accurate Type Kenny (1986). The tests of the indirect effect largely
I error rates in all cases compared to the other meth- follow the tradition of path analysis in which a re-
ods. Note that no parameter estimate or standard error stricted model is hypothesized, here X → I → Y and
of the intervening variable effect is available for the joint the hypothesized model is tested against the data. Al-
test of the significance of ␣ and  so that effect sizes and though important competing models may be consid-
confidence intervals are not directly available. Conse- ered and also tested against the data, this tradition
quently, other tests that are close to the joint significance typically seeks to demonstrate only that the causal
test in accuracy such as the asymmetric confidence in- processes specified by the hypothesized model are
terval test may be preferable as they do include an es- consistent with the data.
timate of the magnitude of the intervening variable ef- In contrast, Judd and Kenny (1981b) originally pre-
fect. The very close simulation performance of the other sented their causal steps method as a device for di-
six methods in this group suggests that for practical data rectly probing the causal process through which a
analysis, the choice of tests in this group will not change treatment produces an outcome. The strength of their
the conclusions of the study. Overall, the methods in the approach is that in the context of a single randomized
third group represent a compromise with less power than experiment it provides evidence that the treatment
some methods, and more accurate Type I error rates than causes the intervening variable, the treatment causes
other methods. the outcome, and that the data are consistent with the
100 MACKINNON, LOCKWOOD, HOFFMAN, WEST, AND SHEETS
proposed intervening variable model X → I → Y, path and each link of the indirect path can be manipu-
where X represents the treatment conditions. How- lated in a randomized experiment, strong causal in-
ever, the third causal step makes the strong assump- ferences can potentially be reached. A recent experi-
tion that the residuals ⑀2 and ⑀3 in Equations 2 and 3, ment by Sheets and Braver (1999) presented an
respectively, are independent. This assumption can be illustration of this approach.
violated for several reasons including omitting a vari-
Conclusion
able from the path model, an interaction between I
and X, incorrect specification of the functional form Tests of the intervening variable effect are useful
of the relations, error of measurement in the interven- because they examine processes by which variables
ing variable, and a bidirectional causal relation be- are related. In clinical and community research, such
tween I and Y (Baron & Kenny, 1986; MacKinnon, tests are critical for the elucidation of how prevention
1994). When this assumption is violated, biased esti- and treatment programs work. In experimental re-
mates of the indirect effect may be obtained and the search, such tests are critical for establishing the plau-
causal inference that X → I → Y may be unwarranted. sibility of causal sequences implied by theory. Re-
Holland (1988) presents an extensive analysis of the flecting the lack of consensus in the definition of an
assumptions necessary for causal inference in this de- intervening effect across disciplines, the available
sign. Among the conditions necessary for causal in- tests address several different null hypotheses. The
ference are randomization, linear effects, and that the available procedures also differ in the extent to which
full effect of the treatment operates through the inter- they simply test whether the data are consistent with
vening variable (i.e., no partial intervening variable a hypothesized intervening variable model versus at-
effect). tempt to establish other logical features that support
Establishing the conditions necessary for causal in- causal inference by ruling out other competing mod-
ference requires a more complex design than the two els. As a result of these differing conceptual bases,
group randomized experiment considered by Judd and different assumptions, and different estimation meth-
Kenny (1981b). Designs in which both the treatment ods, the available tests show large differences in their
and intervening variable are manipulated in a random- Type I error rates and statistical power. The present
ized experiment can achieve stronger causal infer- article provides researchers with more information
ences. For example, imagine a hypothesized model in about both the conceptual bases and the statistical
which commitment leads to intentions, which, in turn, performance of the available procedures for determin-
leads to behavior. Subjects could be randomly as- ing the statistical significance of an intervening vari-
signed to a high or low commitment to exercise pro- able effect. Our hope is that researchers will now have
gram condition, following which their intentions to guidance in selecting a test of the intervening variable
exercise would be measured. Following this, subjects effect that addresses their question of interest with the
could be randomly assigned to a condition in which maximal statistical performance.
the same exercise program was easy versus difficult
to access and the extent of their behavioral compli- References
ance with the program could be measured. Adding
design features like randomization and temporal pre- Ajzen, I., & Fishbein, M. (1980). Understanding attitudes
cedence can powerfully rule out alternative causal ex- and predicting social behavior. Englewood Cliffs, NJ:
planations (Judd & Kenny, 1981b; Shadish, Cook, & Prentice Hall.
Campbell, 2002; West & Aiken, 1997; West, Biesanz, Allison, P. D. (1995). The impact of random predictors on
& Pitts, 2000). comparison of coefficients between models: Comment on
The addition of design features can change the as- Clogg, Petkova, and Haritou. American Journal of Soci-
sumptions that are necessary for causal inference. For ology, 100, 1294–1305.
example, Judd and Kenny (1981b) and Baron and Alwin, D. F., & Hauser, R. M. (1975). The decomposition
Kenny (1986) ruled out models in which the treatment of effects in path analysis. American Sociological Re-
could not be shown to affect the outcome in Equation view, 40, 37–47.
1. This condition rules out inconsistent effect models Aroian, L. A. (1944). The probability function of the prod-
in which the intervening variable effect (␣) and the uct of two normally distributed variables. Annals of
direct effect (⬘) in Figure 1 have opposite signs and Mathematical Statistics, 18, 265–271.
may cancel out. However, if the strength of the direct Baron, R. M., & Kenny, D. A. (1986). The moderator–
COMPARISON OF INTERVENING VARIABLE METHODS 101
mediator variable distinction in social psychological re- of social interventions. Cambridge, England: Cambridge
search: Conceptual, strategic, and statistical consider- University Press.
ations. Journal of Personality and Social Psychology, 51, Judd, C. M., & Kenny, D. A. (1981b). Process analysis:
1173–1182. Estimating mediation in treatment evaluations. Evalua-
Bentler, P. (1997). EQS for Windows (Version 5.6) [Com- tion Review, 5, 602–619.
puter software]. Encino, CA: Multivariate Software. Kenny, D. A., Kashy, D. A., & Bolger, N. (1998). Data
Bobko, P., & Rieck, A. (1980). Large sample estimators for analysis in social psychology. In D. T. Gilbert, S. T.
standard errors of functions of correlation coefficients. Fiske, & G. Lindzey (Eds.), The handbook of social psy-
Applied Psychological Measurement, 4, 385–398. chology (pp. 233–265). Boston: McGraw-Hill.
Bollen, K. A. (1987). Total direct and indirect effects in Krantz, D. H. (1999). The null hypothesis testing contro-
structural equation models. In C. C. Clogg (Ed.), Socio- versy in psychology. Journal of the American Statistical
logical methodology (pp. 37–69). Washington, DC: Association, 94, 1372–1381.
American Sociological Association. MacCorquodale, K., & Meehl, P. E. (1948). On a distinction
Clogg, C. C., Petkova, E., & Cheng, T. (1995). Reply to between hypothetical constructs and intervening vari-
Allison: More on comparing regression coefficients. ables. Psychological Review, 55, 95–107.
American Journal of Sociology, 100, 1305–1312. MacKinnon, D. P. (1994). Analysis of mediating variables
Clogg, C. C., Petkova, E., & Shihadeh, E. S. (1992). Statis- in prevention and intervention research. In A. Cazares &
tical methods for analyzing collapsibility in regression L. A. Beatty (Eds.), Scientific methods in prevention re-
models. Journal of Educational Statistics, 17(1), 51–74. search. NIDA Research Monograph 139 (pp. 127–153);
Cohen, J. (1988). Statistical power for the behavioral sci- DHHS Publication No. 94-3631. Washington, DC: U.S.
ences. Hillsdale, NJ: Erlbaum. Government Printing Office.
Cohen, J., & Cohen, P. (1983). Applied multiple regression/ MacKinnon, D. P. (2000). Contrasts in multiple mediator
correlation analysis for the behavioral sciences. Hills- models. In J. Rose, L. Chassin, C. C. Presson, & S. J.
dale, NJ: Erlbaum. Sherman (Eds.). Multivariate applications in substance
Craig, C. C. (1936). On the frequency function of xy. An- use research (pp. 141–160). Mahwah, NJ: Erlbaum.
nals of Mathematical Statistics, 7, 1–15. MacKinnon, D. P., & Dwyer, J. H. (1993). Estimating me-
Fox, J. (1980). Effect analysis in structural equation models. diated effects in prevention studies. Evaluation Review,
Sociological Methods and Research, 9, 3–28. 17, 144–158.
Freedman, L. S., & Schatzkin, A. (1992). Sample size for MacKinnon, D. P., Krull, J. L., & Lockwood, C. M. (2000).
studying intermediate endpoints within intervention trials Equivalence of the mediation, confounding, and suppres-
of observational studies. American Journal of Epidemi- sion effect. Prevention Science, 1, 173–181.
ology, 136, 1148–1159. MacKinnon, D. P., & Lockwood, C. (2001). Distribution of
Goodman, L. A. (1960). On the exact variance of products. products tests for the mediated effect. Unpublished manu-
Journal of the American Statistical Association, 55, 708– script.
713. MacKinnon, D. P., Lockwood, C., & Hoffman, J. (1998,
Hansen, W. B. (1992). School-based substance abuse pre- June). A new method to test for mediation. Paper pre-
vention: A review of the state of the art in curriculum, sented at the annual meeting of the Society for Prevention
1980–1990. Health Education Research: Theory and Research, Park City, UT.
Practice, 7, 403–430. MacKinnon, D. P., Warsi, G., & Dwyer, J. H. (1995). A
Holland, P. W. (1988). Causal inference, path analysis, and simulation study of mediated effect measures. Multivari-
recursive structural equation models (with discussion). In ate Behavioral Research, 30, 41–62.
C. Clogg (Ed.), Sociological methodology 1988 (pp. 449– McGuigan, K., & Langholtz, B. (1988). A note on testing
484). Washington, DC: American Sociological Associa- mediation paths using ordinary least-squares regression.
tion. Unpublished note.
James, L. R., & Brett, J. M. (1984). Mediators, moderators Meeker, W. Q., Cornwell, L. W., & Aroian, L. A. (1981).
and tests for mediation. Journal of Applied Psychology, Selected tables in mathematical statistics, Vol. VII: The
69, 307–321. product of two normally distributed random variables.
Jöreskog, K. G., & Sörbom, D. (1993). LISREL (Version Providence, RI: American Mathematical Society.
8.12) [Computer software]. Chicago: Scientific Software Olkin, I., & Finn, J. D. (1995). Correlation redux. Psycho-
International. logical Bulletin, 118, 155–164.
Judd, C. M., & Kenny, D. A. (1981a). Estimating the effects Olkin, I., & Siotani, M. (1976). Asymptotic distribution of
102 MACKINNON, LOCKWOOD, HOFFMAN, WEST, AND SHEETS
functions of a correlation matrix. In S. Ikeda (Ed.), Es- Memory accessibility and association of alcohol use and
says in probability and statistics (pp. 235–251). Tokyo: its positive outcomes. Experimental and Clinical Psycho-
Shinko Tsusho. pharmacology, 2, 269–282.
Sampson, C. B., & Breunig, H. L. (1971). Some statistical Stone, C. A., & Sobel, M. E. (1990). The robustness of es-
aspects of pharmaceutical content uniformity. Journal of timates of total indirect effects in covariance structure
Quality Technology, 3, 170–178. models estimated by maximum likelihood. Psycho-
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). metrika, 55, 337–352.
Experimental and quasi-experimental designs for gener- West, S. G., & Aiken, L. S. (1997). Toward understanding
alized causal inference. Boston: Houghton-Mifflin. individual effects in multicomponent prevention pro-
Sheets, V. L., & Braver, S. L. (1999). Organizational status grams: Design and analysis strategies. In K. J. Bryant, M.
and perceived sexual harassment: Detecting the media- Windle, & S. G. West, (Eds.). The science of prevention:
tors of a null effect. Personality and Social Psychology Methodological advances from alcohol and substance
Bulletin, 25, 1159–1171. abuse research (pp. 167–209). Washington, DC: Ameri-
Sobel, M. E. (1982). Asymptotic confidence intervals for can Psychological Association.
indirect effects in structural equation models. In S. Lein- West, S. G., Biesanz, J. C., & Pitts, S. C. (2000). Causal
hardt (Ed.), Sociological methodology 1982 (pp. 290– inference and generalization in field settings: Experimen-
312). Washington, DC: American Sociological Associa- tal and quasi-experimental designs. In H. T. Reis & C. M.
tion. Judd (Eds.), Handbook of research methods in social and
Sobel, M. E. (1988). Direct and indirect effects in linear personality psychology (pp. 40–84). New York: Cam-
structural equation models. In J. S. Long (Ed.), Common bridge University Press.
problems/proper solutions (pp. 46–64). Beverly Hills, Woodworth, R. S. (1928). Dynamic psychology. In C. Mur-
CA: Sage. chison (Ed.), Psychologies of 1925 (pp. 111–126).
Springer, M. D., & Thompson, W. E. (1966). The distribu- Worcester, MA: Clark University Press.
tion of independent random variables. SIAM Journal on Yang, M. C. K., & Robertson, D. H. (1986). Understanding
Applied Mathematics, 14, 511–526. and learning statistics by computer. Singapore: World
Stacy, A. W., Leigh, B. C., & Weingardt, K. R. (1994). Scientific.
Appendix A
error derived using the multivariate delta method. The mul- Var共u兲 ≈ 兺 兺p p.
i=1 j=1
i ij j
tivariate delta method solution for the standard error is ob-
tained by pre- and postmultiplying the vector of partial de- The standard error is then taken as the square root of Var(u).
rivatives of a function by the covariance matrix for the Olkin and Finn (1995) derived the asymptotic covariance
correlations in the function (Olkin & Finn, 1995; Sobel, matrix (see Olkin & Siotani, 1976, for asymptotic results) of
1982). (XI, XY, IY). The variances and covariances among the
The multivariate delta method assumes a function u ⳱ elements of this correlation matrix (Olkin & Finn, 1995) are
f (1,2,3), where (v1, v2, v3) has covariance matrix
XI XY IY
11 12 13
冤 冥
XI var共XI兲
⌺ = 21 22 23 . XY cov共XI, XY兲 var共XY兲
31 32 33
IY cov(XI, IY兲 cov共IY, XY兲 var共IY兲
Let (p1, p2, p3) denote the partial derivatives (⭸u/⭸v1, The formulas to calculate the variances and covariances
⭸u/⭸v2, ⭸u/⭸v3) of u with respect to (v1, v2, v3). According to among the correlations based on asymptotic theory from
the Delta method, the variance of u can be approximated by Olkin and Siotani (1976) are
COMPARISON OF INTERVENING VARIABLE METHODS 103
共1 − XI
2 2
兲 1
var共XI兲 = , (A1) 共2 − XI IY兲共1 − XI
2
− XY
2
− IY
2
兲 + XY
3
N 2 XY
cov共XI, IY兲 = ,
N
共1 − XY
2 2
兲
var共XY兲 = , (A2) (A5)
N
共1 − IY
2 2
兲 and
var共IY兲 = , (A3)
N
1
1 共2 − XY IY兲共1 − XI
2
− XY
2
− IY
2
兲 + XI
3
2 XI
共2 − XI XY兲共1 − IY
2
− XI
2
− XY
2
兲 + IY
3
cov共IY, XY兲 = .
2 IY N
cov共XI, XY兲 = ,
N (A6)
(A4)
Appendix B
An SAS Program to Calculate Standard Errors Using the Multivariate Delta Method
data a; input rxi rxy riy nobs;
*note r corresponds to correlation which is indicated by the Greek letter rho in the article;
*x, i, and y represent the independent, intervening, and dependent variables, respectively;
*variance of the correlations from Appendix A;
vrxy⳱((1-rxy*rxy)*(1-rxy*rxy))/nobs;
vriy⳱((1-riy*riy)*(1-riy*riy))/nobs;
vrxi⳱((1-rxi*rxi)*(1-rxi*rxi))/nobs;
ovar⳱opd1*opd1*vrxy+opd2*opd1*crxyriy+opd3*opd1*rxyrxi+opd1*opd2*crxyriy+opd2*opd2*vriy+opd2*opd3*criyrxi
+opd1*opd3*crxyrxi+opd2*opd3*criyrxi+opd3*opd3*vrxi;
ose⳱sqrt(ovar);
zolkin⳱diff/ose;
polkin⳱1-probnorm(zolkin);
(Appendix continues)
104 MACKINNON, LOCKWOOD, HOFFMAN, WEST, AND SHEETS
bpd2⳱(-(rxi*rxi)/(1-rxi*rxi));
bpd3⳱(rxi/(1-rxi*rxi));
bobkovar⳱((bpd1**2)*vrxi)+((bpd2**2)*vrxy)+((bpd3**2)*vriy)+(2*bpd1*bpd2*crxyrxi)+(2*bpd2*bpd3*crxyriy)
+(2*bpd1*bpd3*criyrxi);
bobkose⳱sqrt(bobkovar);
zbobko⳱corr/bobkose;
pbobko⳱1−probnorm(abobko);
cards;
.14 .14 0 200
;
proc print;
var diff ose zolkin polkin corr bobkose zbobko pbobko;
run;