
Does PLS Have Advantages for Small Sample Size or Non-Normal Data?

Author(s): Dale L. Goodhue, William Lewis and Ron Thompson


Source: MIS Quarterly, Vol. 36, No. 3 (September 2012), pp. 981-1001
Published by: Management Information Systems Research Center, University of
Minnesota

Stable URL: https://www.jstor.org/stable/41703490

JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide
range of content in a trusted digital archive. We use information technology and tools to increase productivity and
facilitate new forms of scholarship. For more information about JSTOR, please contact support@jstor.org.

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at
https://about.jstor.org/terms

Management Information Systems Research Center, University of Minnesota is collaborating with JSTOR to digitize, preserve and extend access to MIS Quarterly.

This content downloaded from


128.233.11.60 on Sat, 10 Jun 2023 01:06:41 +00:00
All use subject to https://about.jstor.org/terms
MIS Quarterly

Does PLS Have Advantages for Small Sample
Size or Non-Normal Data?1
Dale L. Goodhue
Terry College of Business, MIS Department, University of Georgia,
Athens, GA 30606 U.S.A. {dgoodhue@terry.uga.edu}

William Lewis
{william.w.lewis@gmail.com}

Ron Thompson
Schools of Business, Wake Forest University,
Winston-Salem, NC 27109 U.S.A. {thompsrl@wfu.edu}

There is a pervasive belief in the MIS research community that PLS has advantages over other techniques when
analyzing small sample sizes or data with non-normal distributions. Based on these beliefs, major MIS journals
have published studies using PLS with sample sizes that would be deemed unacceptably small if used with other
statistical techniques. We used Monte Carlo simulation more extensively than previous research to evaluate
PLS, multiple regression, and LISREL in terms of accuracy and statistical power under varying conditions of
sample size, normality of the data, number of indicators per construct, reliability of the indicators, and
complexity of the research model. We found that PLS performed as effectively as the other techniques in
detecting actual paths, and not falsely detecting non-existent paths. However, because PLS (like regression)
apparently does not compensate for measurement error, PLS and regression were consistently less accurate
than LISREL. When used with small sample sizes, PLS, like the other techniques, suffers from increased
standard deviations, decreased statistical power, and reduced accuracy. All three techniques were remarkably
robust against moderate departures from normality, and equally so. In total, we found that the similarities in
results across the three techniques were much stronger than the differences.

Keywords: Partial least squares, PLS, regression, structural equation modeling, statistical power, small sample
size, non-normal distributions, Monte Carlo simulation

Introduction

There is a pervasive belief in the Management Information Systems (MIS) research community that for small sample sizes or data with non-normal distributions, partial least squares (PLS) has advantages that make it more appropriate than other statistical estimation techniques such as regression or covariance-based structural equation modeling (CB-SEM) with LISREL, Mplus, etc. Partly because of these beliefs, PLS has been widely adopted in the MIS research community.

To get a clearer picture of the extent of these beliefs in the MIS field, we examined three top MIS journals (Information Systems Research [ISR], Journal of Management Information Systems [JMIS], and MIS Quarterly [MISQ]). We identified all articles that used some form of path analysis published

1Mike Morris was the accepting senior editor for this paper. Andrew Burton-Jones served as the associate editor.

The appendix for this paper is located in the "Online Supplements" section of the MIS Quarterly's website (http://www.misq.org).

A much earlier version of this paper was published in a conference proceedings (Goodhue et al. 2006).

MIS Quarterly Vol. 36 No. 3, pp. 981-1001/September 2012 981

Goodhue et al./PLS, Small Sample Size, and Non-Normal Data

from 2006 to 2010, inclusive - 188 articles across the three journals. Overall, PLS was used for 49% of the path analysis papers in these three MIS journals. Of the 90 articles using PLS, at least 35% stated that PLS had special abilities relative to small sample size and/or non-normal distributions. Thirteen of these studies (14%) had sample sizes smaller than 80 (which we will show is insufficient). Four of the 13 papers stated that PLS had these special abilities without any supporting citations. This suggests that these beliefs are so widely accepted they are seen as no longer needing support from the literature.

There are a relatively small number of articles that are most often cited to support the claim that PLS has advantages at small sample sizes (e.g., Barclay et al. 1995; Chin 1998; Chin et al. 2003; Gefen et al. 2000), with other sources also cited at times (Chin and Newsted 1999; Falk and Miller 1992; Fornell and Bookstein 1982; Lohmöller 1988). The most commonly cited minimum sample rule for PLS might be termed the "10 times" rule, which states that the sample size should be at least 10 times the number of incoming paths to the construct with the most incoming paths (Barclay et al. 1995; Chin and Newsted 1998). Some MIS researchers have also cited Falk and Miller (1992) to justify using a "5 times" rule.

Chin and Newsted (1999, p. 327) added the following caution to their description of the 10 times rule:

    Ideally, for a more accurate assessment, one needs to specify the effect size for each regression analysis and look up the power tables provided by Cohen (1988) or Green's (1991) approximation to these tables.

However, MIS researchers appear to have interpreted these and similar statements to imply that, although one could use Cohen's tables to determine the minimum allowable sample size required to conduct a given study, one can also use the "rule of 10" or even the "rule of 5." For example, Kahai and Cooper (2003, p. 277) used a sample size of 31 in a study published in JMIS; Malhotra et al. (2007, p. 268) used a sample size of 41 in ISR. Chin and his coauthors used a sample size of 17 in MISQ (Majchrzak et al. 2005, p. 660).

A recent editorial by Gefen, Rigdon, and Straub (2011) in MIS Quarterly provided guidelines for choosing between PLS and CB-SEM analysis techniques. They expressed some concern about PLS and sample size, referencing Marcoulides and Saunders (2006) with respect to "the apparent misuse of perceived leniencies such as assumptions about minimum sample size in partial least squares (PLS)," but in their article proper they did not directly address the issue. However, in their Appendix B they did distinguish between regression on the one hand (where they suggest that minimum sample size is primarily an issue of statistical power and is well addressed by Cohen's 1988 guidance), and PLS and CB-SEM on the other hand (where sample size plays a more complex role). They do note that "the core of the PLS estimation method - ordinary least squares - is remarkably stable even at low sample sizes." They suggest that this gave rise to the "10 times" rule, although they point out that it "is only a rule of thumb, however, and has not been backed up with substantive research." Similarly, Ringle, Sarstedt, and Straub (2012) noted that very few researchers employing PLS reported any attempts to determine the adequacy of their sample size (other than the 10 times rule), but they did go on to suggest that power tables from regression could be used.

Hair et al. (2011) also provided guidelines for when the use of PLS is appropriate, and with respect to sample size they repeated the "10 times" rule. While they did go on to caution that "although this rule of thumb does not take into account effect size, reliability, the number of indicators, and other factors known to affect power and can thus be misleading," they still conclude that "it nevertheless provides a rough estimate of minimum sample size requirements." In their summary table of guidelines for applying PLS, the 10 times rule is the only guidance for minimum sample size, without any caveat. Likewise in another work, Hair, Ringle, and Sarstedt (2011) recommend the 10 times rule without any caveat.

Statements about PLS and non-normal data are also common, such as the following quote cited widely in MIS research: "Because PLS estimation involves no assumption about the population or scale of measurement, there are no distributional requirements" (Fornell and Bookstein 1982, p. 443). Many seem to have interpreted this to mean that while the distribution of data used in a regression or LISREL analysis is important, it is less important or perhaps even irrelevant for PLS analysis. More recently it has been suggested that PLS may not have an advantage on this score, because new estimation techniques with CB-SEM provide options that are quite robust to departures from normality (Gefen et al. 2011; Hair et al. 2011). Nonetheless, recommendations to use PLS when data are to some extent non-normal still appear (Hair, Ringle, and Sarstedt 2011, p. 144).

Thus while some articles and editorials have issued cautions against assuming too much for PLS's special capabilities (e.g., Gefen et al. 2011; Goodhue et al. 2006; Marcoulides and Saunders 2006), many of the recommendations lack specificity. In particular, cautions about the 10 times rule for sample size typically do not suggest any concrete alternative,

even though that rule's validity has not been tested empirically. As a result, MIS journal articles continue to cite and use the 10 times rule. For example, Zhang et al. explicitly use it in determining the minimum sample size for their PLS analysis in their December 2011 MIS Quarterly article. Clearly a subset of the social science research community still believes that the 10 times rule provides an acceptable guideline.

In our study we use Monte Carlo simulation to empirically address the issues of small sample sizes and non-normal distributions. Specifically, we look at the relative efficacy of PLS, regression, and CB-SEM (or as we will refer to it here, LISREL2) under a variety of conditions. By efficacy we mean their ability to support a researcher's need to statistically test hypothesized relationships among constructs.3 We specifically test to see whether these techniques have different abilities in terms of: (1) arriving at a solution, (2) producing accurate path estimates, (3) avoiding false positives (Type I errors), and (4) avoiding false negatives (Type II errors, related to statistical power). We also test each of the techniques against commonly accepted standards (e.g., at least 80% power, no more than 5% false positives, and path estimates that are as accurate as possible). A primary goal of our work is to provide researchers with some concrete findings to help inform decisions relating to: (1) designing research studies, (2) selecting analysis techniques, and (3) interpreting the results obtained.

We accomplish this by using Monte Carlo simulation to compare results across different statistical analysis techniques (Goodhue et al. 2012). We employ PLS, regression, and LISREL to analyze identical collections of sets of 500 data sets each, using the identical research model. We start with a relatively simple model and five different sample sizes, and then do sensitivity testing by varying distributional properties of the data, number and reliability of indicators, and complexity of the model.

Our study provides four important contributions. First, we show that as sample sizes are reduced, all three techniques suffer from increasing standard deviations for the path estimates and the attendant decrease in accuracy, and all have about the same resulting loss in statistical power.4 This strongly suggests that PLS has no advantage for either accuracy or statistical power at small sample size, and that the 10 times rule is a misleading guide to minimal sample size for all three techniques equally. Second, all three techniques were relatively robust (and equally so) to moderate departures from normality and all suffered somewhat (again about equally) under extreme departures from normality. This strongly suggests that PLS has no advantage with non-normal distributions. Third, these results hold with simple and more complex models. Fourth, LISREL consistently produces more accurate estimates of path strengths. PLS and regression are not only less accurate than LISREL and about equally so, but the amount of underestimation is quite consistent with Nunnally and Bernstein's (1994, pp. 241, 257) equation for the attenuation of relationship strength due to measurement error. This leads us to the conclusion that PLS, like regression, does not take measurement error into account in its path estimates, as opposed to LISREL and other CB-SEM techniques which do.

In this note we restrict our focus to situations where researchers use PLS, LISREL, or regression and have a hypothesized model with reflective, multi-indicator construct data. Given our focus on the efficacy of the three techniques rather than examining how they operate, we do not describe the techniques in detail. Interested readers are encouraged to review published work (e.g., Barclay et al. 1995; Chin 1998; Chin and Newsted 1999; Fornell 1984; Fornell and Bookstein 1982; Gefen et al. 2000; Hayduk 1987; Kline 1998) to obtain more detailed descriptions of the different statistical analysis techniques. However, we will provide enough of a description of each technique to justify the presumption that we might get different path or statistical significance estimates depending on which technique was used, even using the exact same input data.

Following Rönkkö and Ylitalo (2010), we can think of regression and PLS as both having three steps: (1) determining the weightings for the construct indicators, (2) using those weights to calculate composite construct scores and using ordinary least squares to calculate path estimates, and (3) determining the statistical significance of the path esti-

2LISREL is a specific statistical analysis program that is one of a set of programs (others include AMOS, EQS, etc.) that use a covariance-based structural equation modeling (CB-SEM) technique. For ease of exposition, we use the term LISREL to refer to this technique in general, since that is the program we employed; the reader should note that other CB-SEM programs could have been chosen, presumably with similar results.

3McDonald (1996) has suggested reserving the phrase "latent construct" for SEM techniques such as LISREL that do not presume to have developed an actual score for each construct, as opposed to the phrase "composite construct" for techniques such as regression or PLS that do develop explicit scores for each construct. In this paper we will adhere to that distinction, but since we are not focusing on construct scores themselves to any great extent, we generally use the less specific term "construct" to refer to either latent or composite constructs.

4The only caveat to this is that (as is well known) for smaller sample sizes (smaller than 90 in our studies), LISREL may not arrive at an acceptable solution. In our studies this was only slightly apparent at n = 40, but very apparent at n = 20. We note that when this problem occurs, the researcher will not be tempted to think it is a valid solution, since the result is very obvious.

mates. For regression, the first step is usually accomplished by giving equal weights to all indicators. Then composite scores are determined, and each dependent composite construct and all its predictors are then analyzed separately. Regression uses ordinary least squares (linear algebra) to calculate the solution for the path values that minimizes the squared differences between the predicted and the actual scores for each dependent construct. There is no iteration involved. The regression solution includes estimates of the standard deviation of each path estimate, from which statistical significance can be determined, using normal distribution theory.

In PLS, step one is the core of its uniqueness. As opposed to regression where equal weights are used, PLS iterates through a process to find the "optimal" indicator weights for each construct, such that the overall R2 for all dependent constructs is maximized. The second step in PLS, as in regression, uses the indicator weights to calculate construct scores which are then used in ordinary least squares to determine final path estimates. The third step in PLS is to determine the standard deviations of those path estimates with bootstrapping.5 This third PLS step contrasts with regression where normal distribution theory is used, but bootstrapping could also be used with regression. Thus note that both PLS and regression use weighted averages of indicator values to develop construct scores, and both use ordinary least squares to determine path values. The critical difference between the two in terms of path values is that regression uses equally weighted indicator scores, while PLS has a process intended to optimize weights.

LISREL is quite different from regression and PLS. It also uses linear algebra, but its equations include all constructs, all indicators, all error terms, and all relationships between them in a single analysis. LISREL iterates between two processes: (1) taking a candidate solution for the various parameters (with candidate values for paths between constructs, error variances, etc.) and generating the implied covariance matrix for the indicators, and (2) comparing that implied covariance matrix with the sample covariance matrix. From this comparison, the degree of fit between the candidate solution and the actual data is determined, and changes to the various parameters in the candidate solution are suggested. This then feeds back into step 1, until limited changes are suggested. Since step 1 also includes estimates of the standard deviation of each path estimate, statistical significance can be determined using normal distribution theory.

Monte Carlo simulation has been used to study issues such as bias sizes in PLS estimates (Cassel et al. 1999), impact of different correlation structures on goodness of fit tests (Fornell and Larcker 1981), and the efficacy of PLS with product indicators versus regression in detecting interaction effects (Chin et al. 2003; Goodhue et al. 2007). The remainder of this paper begins with a quick explanation of Monte Carlo simulation and how it can be used to assess the efficacy of different statistical techniques. We do this so we can explain how our examination of the simulation data goes significantly beyond that employed by previous researchers who investigated PLS using Monte Carlo simulation (Cassel et al. 1999; Chin et al. 2003; Chin and Newsted 1999). Following that, we describe four simulation studies used to test the three techniques under varying conditions. We end with a discussion of our overall findings, and their implications.

How to Test "Efficacy" with Monte Carlo Simulation: Our Approach and Previous Approaches

Use of the Monte Carlo simulation approach requires the researcher to start with a prespecified "true" model (such as shown in Figure 1) that includes both the strength of paths and the amount of random variance in linkages (between constructs, and between constructs and their indicators). From the prespecified model, sets of simulated questionnaire responses are generated using random number generators.6 Then the same sets of questionnaire responses are analyzed by each of the three techniques in turn. The analysis results can be compared across techniques and against accepted standards. Repeating this process with many different data sets (in our case, 500) for each condition tested removes the worry that any given result is atypical.

Figure 1 shows the model used as the basis for much of our analysis. (We will modify it to test different specific issues.) We have four constructs (Ksi1 through Ksi4) that "cause" changes in the value of a fifth construct (Eta1). Three of the four independent constructs have the following effect sizes: large (.35), medium (.15), and small (.02), to correspond to Cohen's (1988) suggested values. We also include an effect size of zero so we can test for false positives. See the notes in Figure 1 for more detail.

5Jackknifing can also be used to determine statistical significance, but bootstrapping is generally recommended.

6See Appendix D for an example of the SAS program used to generate the data.
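To make the data-generation step concrete, the following Python sketch draws simulated questionnaire responses from a Figure 1-style model with effect sizes .35, .15, .02, and 0. It is a stand-in for the SAS program the authors used (see Appendix D), not a reproduction of it: the indicator loadings (0.8) and error standard deviations are our own illustrative assumptions.

```python
import random

random.seed(1)

# Figure 1 effect sizes: large, medium, small, and a zero path for
# testing false positives
PATHS = {"Ksi1": 0.35, "Ksi2": 0.15, "Ksi3": 0.02, "Ksi4": 0.0}

def generate_sample(n):
    """One simulated 'study' of n questionnaire responses."""
    rows = []
    for _ in range(n):
        # standard-normal latent scores for the exogenous constructs
        ksi = {name: random.gauss(0, 1) for name in PATHS}
        # Eta1 is the weighted sum of the Ksi's plus structural noise
        eta = sum(PATHS[name] * score for name, score in ksi.items()) + random.gauss(0, 1)
        record = {}
        # three reflective indicators per construct (loading and error sd
        # are illustrative choices, not the paper's exact parameterization)
        for name, score in {**ksi, "Eta1": eta}.items():
            for j in range(1, 4):
                record[f"{name}_x{j}"] = 0.8 * score + random.gauss(0, 0.6)
        rows.append(record)
    return rows

# 500 data sets of 40 responses each, as in the Figure 2 example below
datasets = [generate_sample(40) for _ in range(500)]
```

Each of the 500 data sets can then be handed unchanged to each analysis technique, which is what makes the cross-technique comparison fair.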

We use an example to illustrate how our studies differ from previous Monte Carlo simulation studies (and to explain why we therefore draw different conclusions). In this illustration we will focus on the Gamma2 path (medium effect size) of Figure 1, and compare the results from regression analysis with the results from PLS analysis. The rule of 10 would suggest n = 40 is a minimum sample size for PLS given the model in Figure 1.

Using a random number generator and the relationships from the Figure 1 model, we generated 500 data sets of 40 "questionnaires" each (20,000 questionnaire responses total). This could be thought of as 500 researchers, each with a sample size of n = 40. In this example, we are interested in predicting what a 501st researcher with a similar sample size from the same population might find if he or she analyzed that data set with PLS or regression. Figure 2 shows the results of

analyzing the 500 data sets first with regression and then with PLS. Figure 2a is a histogram of the 500 regression path estimates for Gamma2; Figure 2c shows the same thing for PLS estimates, based on exactly the same 500 data sets.

In Figure 2a, the mean Gamma2 estimate across 500 data sets is .255 for regression, as shown by the heavy dotted vertical line. Lighter dotted lines show the 95% confidence interval for the path value the 501st researcher should expect. Notice in Figure 2c that PLS has a slightly higher mean value for Gamma2 (.273). Although the difference is not large, our results for Gamma2 accuracy are consistent with those of Chin and Newsted (1999); PLS produces a slightly larger estimate than regression, on average.

However, we went further than the comparisons done by earlier studies and also examined the number of data sets that resulted in statistically significant paths and the standard deviations of the 500 path estimates. Figures 2b and 2d show the 500 associated t statistics for Gamma2, using regression and PLS respectively. Since the t statistic cutoff for p < .05 in this regression is 2.037 (the heavy dotted vertical line), we see in Figure 2b that about 40% of the 500 regression data sets found a statistically significant Gamma2 path. From Figure 2d we see that 41% of the PLS data sets found a significant Gamma2 path.

7For an n = 40 regression with four constructs and a constant, the degrees of freedom are 40 - 5 = 35.
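The "proportion significant" read off Figures 2b and 2d is simply the share of the 500 t statistics that clear the cutoff. A minimal sketch of that calculation follows; the 2.037 cutoff is the one quoted in the text (for df = 40 - 5 = 35), while the t values themselves are illustrative stand-ins drawn from an assumed distribution, not the paper's actual results.

```python
import random

random.seed(7)

CUTOFF = 2.037  # two-tailed p < .05 cutoff for df = 35, as quoted in the text

# stand-in t statistics for 500 simulated data sets, drawn so that a
# little under half clear the cutoff, roughly as in Figure 2b
t_stats = [random.gauss(1.9, 1.0) for _ in range(500)]

# empirical statistical power: the fraction of data sets in which the
# path was detected as significant
power = sum(t > CUTOFF for t in t_stats) / len(t_stats)
print(f"{power:.0%} of the 500 data sets detected the Gamma2 path")
```

Counting detections this way, rather than only averaging the path estimates, is what distinguishes the present study's examination of the simulation output from the earlier ones.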

PLS seems to have the advantage: a slightly higher average path estimate and a slightly higher statistical power. But this seeming advantage is misleading. First of all, the 95% confidence intervals for the path value the 501st researcher will likely see for Gamma2 using regression (in Figure 2a) and PLS (in Figure 2c) almost completely overlap. Although PLS has a higher mean value than regression (.018 higher), that difference is not statistically significant at the p < .05 level. Second, for statistical power, the 95% confidence interval around regression's power (40%) and PLS's power (41%) goes from about 36% to about 45%.8 Thus the small difference of 40% versus 41% is not even close to being statistically significant. From a statistical point of view, neither the accuracy nor the power obtained from PLS can be distinguished from that obtained from regression in this analysis.

More importantly, since 80% power is the generally sought minimum acceptable level of power, both 40% and 41% power are unacceptably low. About 60% of the time, true medium effect size paths will not be detected by either technique at this sample size.

Let us focus for a minute just on the issue of accuracy of PLS estimates. In three of the empirical articles often used to justify PLS's special capabilities (Cassel et al. 1999; Chin et al. 2003; Chin and Newsted 1999),9 the authors displayed both average path estimates (or average biases) and standard deviations of the path estimates. However, each set of authors focused attention almost entirely on the average accuracy of the path estimate, concluding that it did not change much with decreases in sample size. None placed much importance on the fact that as their sample size went down, standard deviations of the path estimates went up:

• from .069 at n = 500 to .250 at n = 20 for Chin et al. (a factor of 3.5);

• from .107 at n = 200 to .380 at n = 20 for Chin and Newsted (a factor of 3.5); and

• from .019 at n = 1000 to .092 at n = 50 for Cassel et al. (a factor of 4).10

Focusing on the average path estimate bias can mask serious problems, since a combination of very high and very low path estimates might still average out to about the true value. Our interpretation of the results in all three of these papers is that as sample size goes down, the average bias across hundreds of data sets does not change much, but the "wildness" (standard deviation) of individual path estimates increases considerably. In the context of Figures 2a and 2c, the wider the "spread" of the data in the histogram (and the wider its 95% confidence interval), the greater the "wildness" of the estimates. Increasingly wild estimates do not suggest to us that PLS is robust to changes in sample size. We do not mean to suggest that PLS is more wild than regression or LISREL at these sample sizes. As will be seen, we only suggest that all three techniques suffer at small sample sizes, and about equally so.

Therefore, in our analysis, we will look at the average bias of the 500 path estimates as the above articles did, but also at the average of the standard deviations of those path estimates and at the proportion of data sets that found a significant path. Very different conclusions will be drawn by using this additional information.

Study 1: The Effect of Sample Size with a Simple Model

In this study, our objective was to assess the relative efficacy of the three techniques under conditions of varying sample size, using normally distributed data, well measured constructs, and a simple but realistic model of construct relationships. We used the model shown in Figure 1, and sample sizes of 20, 40, 90, 150, and 200, generating 500 data sets for each sample size.11 For our model, 20 is the required sample size based on the "rule of 5"; 40 is the required sample size based on the "rule of 10"; 90 is perhaps just below the minimum sample size acceptable for LISREL; 200 is a conservative estimate of minimum sample size based on 5

8A standard equation for the 95% confidence interval around a proportion p with sample size n is: p +/- 1.96 [p(1 - p)/n]^0.5.

9To better understand the statistical significance test that Chin and Newsted and Chin et al. use, it is necessary to understand the difference between the question answered in our Figures 2a and 2c (for the 501st researcher, does the 95% confidence interval of likely Gamma2 path values include zero?) and the question answered in our Figures 2b and 2d (what proportion of the 500 data sets had a statistically significant Gamma2 estimate?). The test for statistical significance in their Tables 7 through 11 is precisely the statistical significance test we show in Figure 2a and Figure 2c. While it is a test of statistical significance, it is the significance for the question of whether a 501st researcher with n = 40 will find a Gamma2 that is positive. It is not an indication of power, and does not tell us the likelihood of a 501st researcher finding a statistically significant Gamma2 at this sample size.

10Based on Chin et al.'s four indicators per construct, x -> y path in their Table 7 (p. 204); Chin and Newsted's four indicator, two latent variables, .2 path in Tables 2 and 6 of their Online Appendix; and Cassel et al.'s gamma1 in their Table 4 (p. 442).

11Note that sample sizes of 20 or 40 will turn out to be too small, as expected, and not recommended for any technique except where effect sizes are known to be very strong.

Goodhue et alJPLS, Small Sample Size, and Non-Normal Data

times the number of LISREL estimated parameters,[12] and 150 is roughly the mid-point between 90 and 200. For each of the five sample sizes, we analyzed the 500 data sets using multiple regression,[13] PLS-Graph,[14] and LISREL.[15] Results for Study 1 are displayed in graphical form in Figure 3, with actual values in Appendix A, Tables A1, A2, and A3.

[12] Different sample size guidelines for LISREL have been suggested. These include at least 100 (Hair et al. 1998), at least 150 (Bollen 1989), at least 200 (for degrees of freedom of 55 or lower) (MacCallum et al. 1996), or five times the number of parameters to be estimated (Bagozzi and Edwards 1998).

[13] We used SAS 9.2.

[14] We used PLS-Graph 3.0, build 1130 (Chin 2001) for all analyses reported.

[15] We used LISREL with the default, maximum likelihood estimation, the most common choice. With LISREL and other CB-SEM approaches, identification is an issue. Figure 1 can be thought of as two separate models: one model with one construct and 3 indicators and a second model with three constructs and two or more (in this case, three) indicators each. Each of these is identified, as per Kline (1998, p. 203).

Simple Model: Arriving at a Solution. We found that virtually all of our runs with regression and PLS arrived at viable solutions (i.e., converged with admissible results). With LISREL, for n = 40, 11 of the 500 data sets did not. For n = 20, 164 of 500 runs did not. At n = 90 and above, these problems of LISREL disappeared in our analysis. These are not surprising results. As is well known, LISREL might not converge or might produce inadmissible values at smaller sample sizes; for the most part, regression and PLS do not share this weakness.

Simple Model: False Positives. All three techniques found false positives for the false path from Ksi4 to Eta1 roughly 5% of the time. This is exactly as it should be when we fix the statistical significance hurdle at p < .05. In all of our studies based on the model in Figure 1, we found no problems with excessive false positives (Type I errors) for any of the techniques.

Simple Model: Accuracy. Figures 3a (large effect size) and 3b (medium effect size) show differences in accuracy across the 500 data sets, with details in Appendix A, Table A1. The small effect size paths are hardly ever detected, at any of our sample sizes, so they are not displayed.

Accuracy is represented as the "bias" or percent departure from the true value in the Figure 1 model. Chin and Newsted (1999) observed that PLS provided estimates for path coefficients that were more accurate than regression. Our results in Figures 3a and 3b were similar. It could be argued that at all of these sample sizes, PLS is arithmetically "more accurate" than regression, since the PLS line is above the regression line. However, the differences are quite small. Discounting the small effect size path (which never achieved even a 35% statistical power), there are ten possible accuracy comparisons (large and medium effect sizes times five different sample sizes). For only one of these ten was the mean PLS path estimate statistically significant[16] and higher than the mean regression path estimate. That was at n = 90 and medium effect size.

[16] We used a two sample t test for equal means.

More importantly, the LISREL path estimates were consistently more accurate than those of PLS. Again, discounting the small effect size path, for nine of the ten possible comparisons the mean LISREL path estimate was statistically significantly higher than the mean PLS path estimate. (The exception was at n = 20 and medium effect size.) We also note that at n = 40 and n = 20, the LISREL estimates should be discounted because that is far below any recommended sample size for LISREL. However, PLS and regression don't fare much better at those sample sizes!

Similar to Chin and Newsted (1999) and Cassel et al. (1999), we found that for any given sample size, by averaging across all 500 data sets, inaccuracies of the individual data sets tend to cancel each other out (some high and some low), and for all three statistical techniques the average across 500 data sets seems to be robust to reductions in sample size, at least down to n = 40. The picture changes when we look at standard deviations of the 500 path estimates. These are shown for the Gamma1 path (large effect size) in Figure 3c and for the Gamma2 path (medium effect) in Figure 3d. These graphs give a sense of how "wild" the 500 individual path estimates can be as sample size goes down. From n = 200 to n = 20, the standard deviations for regression and PLS increase by a factor of about 3, and for LISREL by a factor of almost 5. Even though the bias when averaged across 500 data sets is robust, the findings for individual data sets are not robust with respect to sample size. Below about n = 90, the chance of having an accurate path estimate for an individual sample decreases markedly for all three techniques.

Simple Model: Detecting Paths That Do Exist. Power is the proportion (of the 500 data sets) for which the t statistic for true paths exceeds the p < .05 level.[17]

[17] In some conditions (for Study 1 it was the n = 40 and n = 20 conditions), some LISREL data sets did not converge or resulted in an inadmissible solution. We included these data sets in the calculation of power, counting them as data sets that did not detect a significant path. Had we calculated power looking only at the LISREL data sets that produced admissible solutions, LISREL's power would have been higher.

We determined


power for regression and LISREL by using the 500 t-statistics provided by the technique. For PLS we used the bootstrapping option with 100 resamples for each of the 500 analyses. This means that each reported statistical significance value for PLS is based on 50,000 bootstrapping resamples (500 data sets times 100 resamples). In Appendix B we address in more depth the reasons for using 100 bootstrapping resamples instead of a larger number, and provide a sensitivity analysis to check the impact on our results.

Figures 3e (large effect size) and 3f (medium effect size) show the power results for the three techniques as the sample


size decreases (right to left) from 200 to 20. Details for the power analysis are shown in Appendix A, Table A2. To give a point of comparison, given our base model (Figure 1), a medium effect size, n = 40, and looking across 500 data sets, Cohen (1988) would predict a power of about .44, with the 95% confidence interval going from about .40 to about .48. Similarly, for a large effect size and n = 40, Cohen would predict power between .78 and .84.

Three things are immediately clear from Figures 3e and 3f. First, there is almost no difference between the techniques in their ability to detect true paths for n = 40 and above - the three lines are almost on top of each other. Only at n = 20 for the medium effect size (Figure 3f) are the differences great enough to be statistically significant. At that sample size, regression's power of 22% is statistically significantly higher than PLS's power of 15%, which is statistically significantly higher than LISREL's power of 10%. Second, all of these values are abysmal - none are even close to the recommended power value of 80%. Third, Cohen's predictions of power are remarkably accurate, taking into account effect size, the overall model, and n.

From Figures 3e and 3f, it is clear that sample sizes smaller than 90 can make it difficult to detect a medium effect size, regardless of what technique is used. Following the rule of 10 here (n = 40) with a medium effect size produces a power of only about 40%. The rule of 5 produces a power of about 20%. The striking result of this analysis is that all three techniques achieve about the same power at any reasonable sample size (e.g., 90 or above), and they also achieve nearly the same level of power at lower sample sizes (e.g., 20 or 40). If any technique has a power advantage at low sample sizes it is regression, though all are unacceptable for a medium effect size at n = 40 or n = 20.

Simple Model: Summary. In our simulation with normally distributed data, we do not observe any advantage of PLS over the other two techniques at sample sizes of 90 or above, nor do we observe any advantage over regression at smaller sample sizes for either accuracy or power. Clearly, for all three techniques, individual estimates become increasingly wild as sample size decreases. In moving from n = 200 to n = 20, the standard deviations are increased by a factor of at least 2.5. Therefore, none of the techniques reached an acceptable level of power for n = 20, and only a large effect size had acceptable power for n = 40. Instead of PLS demonstrating greater efficacy, the dominant finding from Study 1 is that all three techniques seem to have remarkably similar performance, and respond in remarkably similar ways to decreasing sample size. The only difference suggested is the greater path estimate accuracy of LISREL.

Study 2: Simple Model and Non-Normal Data

Although PLS may not have an advantage with normally distributed data, much data in behavioral research is not normally distributed (Micceri 1989). It may be that the advantage of PLS is only apparent with non-normally distributed data. In Study 2 we tested the impact of non-normal data on the efficacy of our three techniques, generating our non-normal data using Fleishman's (1978) tables and approach. Results for Study 2 are displayed in graphical form in Figure 4. Appendix C has a detailed report of our findings for the simple model and non-normal data, with actual values in Appendix A, Tables A4 for path estimates and A5 for power. Below we summarize what we found.

Non-Normal Data: Summary. It appears that all three techniques are fairly robust to small to moderate skew or kurtosis (up to skew = 1.1 and kurtosis = 1.6). However, with more extremely skewed data (skew = 1.8 and kurtosis = 3.8), all three techniques suffer a substantial and statistically significant loss of power for both n = 40 and n = 90 (the two sample sizes we tested). For example, with n = 90 and medium effect size, regression's power is 76% with normal data, but drops to 53% for extremely skewed data. Under the same conditions PLS's power drops from 75% to 48%, while LISREL drops from 79% to 50%. Although some have argued that new estimator options are what makes CB-SEM robust to non-normality (Gefen et al. 2011), we note that we used maximum likelihood estimation in all of our LISREL analyses. This suggests that, at least for our simple model, CB-SEM with maximum likelihood is as resistant to departures from normality as regression or PLS.

The overall results for the non-normal data do not contradict what was observed in Study 1. All three techniques seem to respond in approximately the same way to non-normality. Regardless of the distribution of the data, none of the techniques has sufficient power to detect a medium effect size at n = 40. Only at n = 90 do any of the techniques have approximately an 80% power to detect a medium effect size, and that is not changed by moderate departures from normality for any of the techniques.

Study 3: Number of Indicators, Reliability

The data in Studies 1 and 2 utilized three indicators per construct with loadings of 0.70, 0.80, and 0.90. However, researchers doing field studies often have a greater number of indicators, and empirical work has demonstrated that the number of indicators used in the analysis can impact the path estimates of PLS (McDonald 1996). It seems important,


therefore, to extend our previous study by examining the impact, if any, that changes in the number of indicators have on the results.

In addition, the reliability of the constructs in the Study 1 model can be considered comfortably high (Cronbach's alpha = 0.84). To test the impact of less reliable and more diverse indicators, we generated two additional groups of data sets, with lower and higher reliabilities (Cronbach's alpha = .74 and .91, respectively). A write-up for all of the Study 3 findings is included in Appendix C, and is summarized below. Details are in Appendix A, Tables A6 (path estimates) and A7 (power) for the six indicator model, and A8 (path estimates) and A9 (power) for the lower loadings model.

Number of Indicators and Reliability: Summary. The doubling of the number of indicators from three to six per construct gave higher scale reliability (from .84 to .91), and therefore each technique had slightly higher power. However, it had no effect on the relative performance of the three techniques for accuracy or power, and none achieved 80% power at n = 40. Similarly, changing the indicator loadings from (.7, .8, .9) to (.6, .7, .8) reduced the reliabilities from .84 to .74, and caused all three techniques to lose power. However (as with the earlier studies), this also did not change the relative performance. Our results suggest that moderate changes in the number of indicators or in the indicator loadings (both of which affect reliability) do not produce any evidence of a PLS advantage over regression or LISREL.
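The reliabilities quoted in Study 3 follow directly from the indicator loadings. As a hedged sketch (assuming standardized congeneric indicators, so that the population covariance of two items is the product of their loadings - the usual setup for simulations of this kind), Cronbach's alpha can be computed from the loadings alone, and it reproduces the .84, .74, and .91 values reported above:

```python
import numpy as np

def cronbach_alpha_from_loadings(loadings):
    """Cronbach's alpha for standardized indicators of one factor, where
    item i has unit variance and the covariance of items i and j is
    loading_i * loading_j (the population setup behind simulated scales)."""
    lam = np.asarray(loadings, dtype=float)
    k = lam.size
    item_variance_sum = float(k)                      # each standardized item has variance 1
    off_diagonal = lam.sum() ** 2 - (lam ** 2).sum()  # 2 x sum of pairwise covariances
    total_variance = item_variance_sum + off_diagonal
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)

print(round(cronbach_alpha_from_loadings([0.7, 0.8, 0.9]), 2))      # -> 0.84 (Study 1 scales)
print(round(cronbach_alpha_from_loadings([0.6, 0.7, 0.8]), 2))      # -> 0.74 (lower loadings)
print(round(cronbach_alpha_from_loadings([0.7, 0.8, 0.9] * 2), 2))  # -> 0.91 (six indicators)
```

The formula is the standard alpha = (k/(k-1))(1 - sum of item variances / total variance), applied to the population covariance matrix implied by the loadings rather than to sample data.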


Study 4: A More Complex Model

The models that we used in Studies 1, 2, and 3 are quite simple, with four independent variables and one dependent variable. PLS may have more relative advantage when employed with more complicated models. For example, Chin and Newsted (1999) used a model comprised of multiple constructs predicting a focal construct which then predicted multiple dependent constructs.[18]

[18] As in their simpler model, Chin and Newsted found that, using their more complex model (their Figure 4), the mean bias for the PLS path estimates was again quite robust to reductions in sample size. Although they did not focus on it, the standard error of those path estimates generally doubled for all paths as sample size was reduced from 200 to 50. See their Online Appendix, Tables 12-15.

To investigate model complexity, we created a more complex model as shown in Figure 5. It contains seven constructs that can be partitioned into three interconnected submodels. Starting from the bottom and working up, we see that Ksi3 and Eta3 together cause variation in Eta4, but in this case there is no direct link between Ksi3 and Eta4. Therefore Eta3 fully mediates the relationship between Ksi3 and Eta4. Moving to the middle of the diagram and looking at Ksi2, Eta2, and Eta4, we see that Eta2 only partially mediates the relationship between Ksi2 and Eta4, as there is also a direct relationship between Ksi2 and Eta4. Finally, at the top of the diagram looking at Ksi1, Eta1, and Eta4, we see that Eta1 does not mediate the relationship between Ksi1 and Eta4, as there is no direct relationship between Eta1 and Eta4. All paths have approximately a medium effect size. See the note in Figure 5 for more detail.

As before, we generated 500 data sets from the model for each sample size condition (n = 20, 40, 90, and 150), and analyzed each sample using all three techniques. Note that we decided not to test the results at n = 200, since all of our earlier tests had high power at n = 150 and there was not much difference in moving from n = 150 to n = 200. Also note that applying the rule of 10 would give a minimum sample size of 60 for PLS for this model.

We should note that Reinartz et al. (2009) used a fairly complex model and compared the accuracy and statistical power of PLS and CB-SEM at sample sizes of 100 and greater. They concluded that CB-SEM was more accurate, but that PLS had more statistical power. Since these statistical power results are quite different from our own results in this paper, it raises the question of whether the statistical power of either or both PLS and CB-SEM are highly variable depending upon the particular model being analyzed. We also note that Reinartz et al. did not test for false positives.

Arriving at a Solution. The findings for the complex model are quite similar to those for the simpler model. For n = 90 and 150, all techniques arrived at a solution. For n = 20 and 40, LISREL had the expected difficulties; 99 of the n = 20 and two of the n = 40 LISREL runs did not produce a solution.

Accuracy. There are seven true paths in Figure 5 and two zero (non-existent) paths. The full results for the individual paths in the complex model are shown in Appendix A, Tables A10 (accuracy) and A11 (power). To conserve space (and be respectful of the readers' stamina), for accuracy and statistical power we will look only at two of the seven paths (Ksi3 -> Eta3 and Eta3 -> Eta4), with these results shown in Figure 6. These two paths are quite representative of all seven true paths. In Figure 7 we move the level of abstraction up and look at statistical power as averages across all seven true paths and both false paths.

As shown in Figures 6a and 6b, for both the Ksi3 -> Eta3 and Eta3 -> Eta4 paths, PLS had a slight advantage over regression in terms of bias (path accuracy), but as before LISREL had the least bias. We note that at n = 20, LISREL's bias advantage largely disappeared.

Although the average bias across 500 data sets was reasonably robust to decreases in sample size, the standard deviation across the 500 path estimates was not, repeating what was found in the simpler model. For all techniques, as the sample size goes down, the average path estimate standard deviations go up, as can be seen in Figures 6c and 6d. For all three techniques and for both the paths displayed, the standard deviation increased by a factor of about 2.5 as sample size dropped from 150 to 20.

Statistical Power. In terms of statistical power, for n = 90 and above, the three techniques were remarkably similar for both the Ksi3 -> Eta3 and Eta3 -> Eta4 paths - the three lines essentially blur together, as can be seen in Figures 6e and 6f. At n = 40 and below, none of the techniques has acceptable power, but statistically significant differences do begin to appear. However, they are inconsistent. At n = 40, for the Ksi3 -> Eta3 path, PLS's power (59%) is significantly larger than LISREL's (53%). But for the Eta3 -> Eta4 path, LISREL's power (43%) is significantly larger than PLS's (35%). At n = 20, again there are inconsistent significant differences in the power. For the Ksi3 -> Eta3 path, PLS's power (37%) is significantly larger than both LISREL's and regression's, but for the Eta3 -> Eta4 path, regression's power (22%) is significantly larger than PLS's (19%) or LISREL's (15%). We note that at n = 40 and below, none of the techniques has even close to acceptable power.
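Every power and false-positive figure compared in this way is a proportion estimated from a finite number of simulated data sets, so each carries binomial sampling error. A minimal sketch of the normal-approximation confidence half-width (the same arithmetic behind intervals like those reported in the paper's footnotes; the specific p values and trial counts below are illustrative) is:

```python
import math

def proportion_ci_halfwidth(p, n_trials, z=1.96):
    """95% normal-approximation half-width for a proportion (e.g., a power
    or false-positive rate estimated from n_trials simulated data sets)."""
    return z * math.sqrt(p * (1 - p) / n_trials)

# Power averaged over 7 true paths x 500 data sets = 3,500 trials:
print(round(proportion_ci_halfwidth(0.40, 3500), 3))  # -> 0.016
print(round(proportion_ci_halfwidth(0.95, 3500), 3))  # -> 0.007
# A false-positive rate near 5% judged against 1,000 trials:
print(round(proportion_ci_halfwidth(0.05, 1000), 3))  # -> 0.014
```

Differences between techniques smaller than these half-widths should not be read as meaningful, which is why only a handful of the power gaps reported above reach statistical significance.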


False Positives. The paths from Eta1 to Eta4 and from Ksi3 to Eta4 are non-existent, allowing us to test for false positives. The proportion of false positives for these two paths is shown in Figures 6g and 6h. Here we see something unexpected. Although no technique consistently shows up as better or worse than the others, the 95% confidence interval around 5% was exceeded (i.e., higher than 6.9% false positives) at least once for each technique. This suggests that in more complex models, false positives may become a concern.

Average Power Across All Seven True Paths. Given the inconsistencies in the dominant technique for the power of individual paths at sample sizes of less than n = 90, a look at the behavior averaged across all seven true paths in the complex model becomes of interest. These averaged power values[19] are shown in Figure 7a, and at the bottom of Appendix A, Table A11. As expected, for sample sizes n = 90 and above, the values for the average of the seven true paths are almost on top of one another, with no statistically significant

[19] Each of these power numbers represents a proportion out of 3,500 trials (7 x 500), with a 95% confidence interval of plus or minus 1.3%, 1.6%, 1.3%, and .7% for proportion values of 20%, 40%, 80%, and 95%, respectively.


differences. However, at n = 40, regression is significantly lower than the other techniques (41.2% versus 43.0%), while at n = 20, regression is significantly higher than the other techniques (23.2% versus 19% and 15%). Given that below n = 90 none of these power values are even close to the target of 80%, these inconsistent differences may not be very important.

Average False Positives. Similarly, to increase the level of abstraction on the erratic picture of false positives, we averaged across both false paths as shown in Figure 7b and Table A11 of Appendix A. Each combined power score represents a proportion of false positives (out of 1,000). With 1,000 trials,[20] the 95% confidence interval around .05 goes from 3.6 to 6.4. From Figure 7b, it can be seen that across all sample sizes regression just barely stays within the "safe" range of false positives. For sample sizes of 90 and above, both LISREL and PLS have too many false positives, especially at n = 90 where PLS has 7.5% and LISREL 7.1% false positives (see Table A11). These values are worthy of concern, since false positives threaten our ability to draw correct conclusions from our statistical methods.

[20] The confidence interval around a 5% proportion based on 1,000 trials is +/- 1.4%.

For n = 20 and n = 40, LISREL's proportion of false positives increases to about 10%; regression jumps up but then drops for n = 20; and PLS's drops to an average of 4%. Thus both LISREL and regression have excessive false positives at n = 40 and below, while PLS has an acceptable number. We do note that PLS's average power at these sample sizes is quite low (43% at n = 40 and 26% at n = 20).

Complex Model with Non-Normal Data. There is also the possibility that a different picture might emerge if the complex model were analyzed with non-normal data, so we tested this as well. See Table A12 in Appendix A for the specific results. In fact, all three techniques have about the same percentage drop in power with extremely non-normal data, regardless of whether the simple or the complex model is used. If any technique has the advantage, it is regression, for both the simple and the complex model. PLS has no apparent advantage when using more complex models and non-normal data.

Complex Model Summary. The most prominent finding for the complex model is that, for the most part, the results closely mirror what we found with the simpler model. LISREL has a slight advantage in accuracy across the board. For power, there are differences between PLS versus regression or LISREL on particular paths of the model - some paths consistently seem to favor PLS, others LISREL. However, an average across all seven true paths gives statistically indistinguishable power for all three techniques for both n = 90 and n = 150.

For n = 40, PLS and LISREL power values are identical, and statistically larger than regression's, although all values are below 43%. For n = 20, regression's power is significantly higher than PLS and LISREL, but all values are below 25%. Interestingly, at each sample size except n = 20, the average power for the complex model (with its seven approximately medium effect sized paths) is within a percent or two of the medium effect size power for the simpler model.

However, there is one striking difference from the simpler model. With the complex model both PLS and LISREL have


an unsettling number of false positives, especially at sample sizes of 90 where PLS has 7.5% and LISREL 7.1%. The difference between these values and the allowable 5% is statistically significant. We attribute the somewhat higher number of false positives to the fact that in each case, the false predictor constructs were correlated with the true predictor constructs. This creates some amount of multicollinearity and may make the results less stable (see Goodhue et al. 2011a). This condition may often be present in more complex models and is a potential concern.

Overall, the results suggest that small sample size and non-normality have the same effect in the complex model that they do in the simple model. We again see no advantage for PLS, except fewer false positives at sample sizes of 40 or less, where power is below 50% for all techniques.

Post Hoc Analysis: Accuracy Differences[21]

[21] An earlier version of the material in this section was published as a conference paper (Goodhue et al. 2011b).

Across all of our studies, LISREL seems to have the advantage in terms of path estimate accuracy. For the simple model, Figure 8a shows the average bias at different values of effect size and reliability. LISREL was closer to the true value (bias closer to zero) than PLS in 24 of 27 comparisons.[22] Looking just at LISREL and PLS (the two leading contenders), and taking as the null hypothesis that either technique had a 50% chance of being the most accurate in every comparison, we can use the binomial distribution to ask, what is the likelihood of having only zero, 1, 2, or 3 comparisons favoring PLS out of 27 trials under those assumptions? With this nonparametric test, we reject the hypothesis that LISREL is no more accurate than PLS, with a p value of .00002. The picture is similar if we look at results from the non-normal data, or the more complex model.

[22] This includes only n = 90, 150, and 200, for each of the three levels of reliability (.74, .84, .91), and each of three effect sizes (large, medium, and small). The overall results do not change if we also include the other possible sample sizes (n = 20 and n = 40 where applicable).

In fact, Figure 8a shows PLS to be quite a bit more similar to regression than it is to LISREL in terms of accuracy. Given the apparent fact of LISREL's accuracy advantage over PLS (and regression), a reasonable question to ask is, why? A possible answer comes from other work we have been conducting that suggests that the level of measurement reliability could hold the answer. It is generally accepted that regression does not account for measurement reliability in its path estimates, but that LISREL does. Regression path estimates are "attenuated" by measurement error, according to the following equation from Nunnally and Bernstein (1994, pp. 241, 257):

    Apparent Correlation(XY) = Actual Correlation(XY) x Square Root(Reliability(X) x Reliability(Y))

Given this equation, and knowing the reliabilities of all of the constructs in our various studies, we should be able to "adjust" the attenuated regression path estimates to the correct value using the reliability of the constructs in our model. Figure 8b shows both PLS and regression path estimates corrected for measurement error and LISREL estimates unchanged. Looking at Figures 8a and 8b suggests the following: the negative bias of PLS and regression is proportional to the reliability of the constructs, and is essentially corrected when Nunnally and Bernstein's equation is used. The only time this is not true is when a small effect size is involved, in which case the reliability correction sometimes seemed to overcorrect for PLS. Figures 8a and 8b suggest that PLS is more similar to regression than it is to LISREL in the way in which it compensates for measurement error. In retrospect, this is not so surprising. It is consistent with both McDonald (1996, pp. 266-267) and Dijkstra (1983, p. 81), who suggest that when dealing with multiple indicators for each construct, as measurement error goes up, both PLS and regression will suffer increasingly in terms of accuracy in comparison with LISREL.

Consider the PLS process in "mode A" (when all constructs have reflective indicators, which is what our study involves). As we said earlier in this note, we can think of the PLS process as having three major steps. The first step is to iterate through a process to determine the optimal indicator weights for each construct. The second step uses those weights to calculate construct scores, which are then used in ordinary least squares regression to determine path estimates. The third step is to determine the standard deviations of those path estimates with bootstrapping.

We would submit that if PLS has somehow taken into account measurement error in determining its path estimates, then it must have done so in step 1. This has to be the case, since beyond this point (in step 2) all path estimates are determined by ordinary least squares regression, which is known to not compensate for measurement error. But all that comes out of step 1 is the appropriate weights for the indicators. If PLS compensates for measurement error, it must do so by assigning just the right combination of weights. We would


submit that unless at least one indicator measures the construct without error, there is no conceivable weighting scheme that will overcome measurement error. In fact, our empirical evidence seems to suggest that PLS path estimates (like regression path estimates) do not compensate for measurement error at all, while LISREL path estimates do.

We recognize this may be a controversial statement, since it is counter to the widespread belief among MIS researchers. However, we are not the first to suggest it. Marcoulides et al. (2009, p. 172) commented:

    We note that in cases where the model errors are not explicitly taken into account for the estimation of endogenous latent variables, a new approach proposed by Vittadini et al. (2007) would need to be used to appropriately determine the PLS model estimates. This is because in PLS . . . [for reflective constructs] the model errors are not taken into account.

Our work is a bit more explicit than the Marcoulides et al. comment - we show that PLS is like regression in ignoring measurement error in determining its path estimates, and that the same correction (Nunnally and Bernstein 1994) can be used in either PLS or regression to address this weakness.

Our findings on measurement error also suggest a different perspective on Wold's (1982) assertion that PLS has "consistency at large." He suggests that given an infinite number of indicators and an infinite sample size, PLS path estimates will


have no bias. We would point out that if there were an infinite number of indicators, then Cronbach's alpha would be 1, there would be no measurement error, and Wold's claim and ours would be the same: no measurement bias in the PLS path estimates, nor in the regression estimates, nor in the LISREL estimates. This suggests that saying that PLS has "consistency at large" is not actually a unique or useful selling point.

Limitations and Opportunities for Future Research

As with any study, there are a number of limitations that may lead to opportunities for future research. For example, most of the data generated for use in this study were designed to have relatively high indicator loadings with few cross-loadings. While such well-behaved data created a level playing field, actual field data often exhibits more challenging characteristics. Future studies could be designed to test the three techniques across a variety of other data conditions, including indicators that cross-load (i.e., load on constructs other than the one they are intended to measure), or constructs exhibiting multicollinearity. Furthermore, future studies could examine the testing of more complex models involving formative measurement (Barclay et al. 1995; Chin 1998; Petter et al. 2007).

In addition, it is important to recognize that all of our analyses with the more complex model involved approximately medium effect sizes. We do not expect, however, that the relative performance of the three techniques would change much with different effect sizes.

Conclusion

The belief among MIS researchers that PLS has special powers at small sample size or with non-normal distributions is strongly and widely held in the MIS research community. Our study, however, found no advantage of PLS over the other techniques for non-normal data or for small sample size (other than the universally stated concern that with smaller samples, LISREL may not converge). This should not be surprising; all of these techniques can be thought of as attempts to make meaningful statements about latent variables in a population on the basis of a small sample. Even if a true random sample is drawn (which already suggests problems), basic statistics tells us that the smaller the N, the less sure we can be about our estimates.

Earlier in this paper, we indicated that one of our primary goals was to provide guidance to researchers in terms of designing studies, selecting statistical analysis techniques, and interpreting results. We do so here.

Study Design

Although many have argued that there are important differences between the efficacies of the three techniques under certain conditions, in our studies what is much more prominent than the differences is the surprising similarity of the results. In our studies, all suffered from increasingly large standard deviations for their path estimates as sample sizes decreased, and from the resulting drop in statistical power. None could successfully detect medium effect sizes on a consistent basis using sample sizes recommended by the rule of 10 or the rule of 5. All techniques showed substantial (and about equal) robustness in response to small or moderate departures from normality; all showed significant losses in response to extreme non-normality.

Recommendation: When determining the minimum sample size to obtain adequate power, use Cohen's approach (regardless of the technique to be used). Do not rely on the rule of 10 (or the rule of 5) for PLS. It also might be fruitful for social science researchers to more precisely identify the amounts and types of non-normality, so we know better at what point such problems become threatening.

Interpretation of Results

We do need to consider what our findings for PLS and small sample size might mean in terms of existing published research that found statistically significant results using PLS with small sample size. First, with the simple model, none of the techniques showed excessive false positives. Even though with the more complex model we did find evidence of greater than a 5% occurrence of false positives with all three techniques, with that model and smaller sample size PLS had the smallest occurrence of false positives. Therefore, overall there is nothing in our findings to suggest that any previously reported statistically significant results found with PLS and small sample sizes are suspect. (We note that we did not test PLS using an alternate approach for determining statistical significance employed by Majchrzak et al. 2005; we are skeptical of the viability of that approach.) We note that our findings suggest that regression would be equally likely to find such statistically significant relationships at those small sample
sizes. On the other hand, studies using PLS and small sample sizes that failed to detect a hypothesized path (e.g., Malhotra et al. 2007) may well have a false negative.

Recommendation: In studies where small sample sizes were used (with any of the techniques, including PLS) and a hypothesized path was not observed to be statistically significant, this should not be interpreted as a lack of support for the hypothesis. In these cases, further testing with larger sample sizes is probably warranted.

Selecting a Technique

One important difference between the techniques did stand out. This is the suggestion that PLS, unlike LISREL, seems not to compensate for measurement error in its path estimates. In fact, PLS seems to be quite comparable to regression in all its performance measures, including accuracy of path estimates. If accuracy is a major concern, both PLS and regression have poorer performance than LISREL. The magnitude of the difference in path estimates, however, was not great.

Recommendation: Here, interestingly, the three authors of this note are not in complete agreement. One point of view suggests that if one is in the early stages of a research investigation and is concerned more with identifying potential relationships than with the magnitude of those relationships, then regression or PLS would be appropriate. As the research stream progresses and accuracy of the estimates becomes more important, LISREL (or other CB-SEM techniques) would likely be preferable.

The second point of view suggests the following: Both regression and CB-SEM techniques have received a great deal of attention from statisticians over the years, and their advantages and limitations are generally well established. PLS has received much less attention (at least until more recently), and hence we are still learning about its properties and how it behaves under various conditions. In addition to the evidence presented here (with respect to a loss of power at small sample sizes, etc.), some recent studies have provided preliminary evidence of other potential shortcomings, such as greater susceptibility to multicollinearity problems than CB-SEM and regression (Goodhue et al. 2011b), lower construct validity when measurement errors are correlated across constructs (Rönkkö and Ylitalo 2010), and an inability to detect mis-specified models (Evermann and Tate 2010). As a result, one could argue that researchers would be well advised to use PLS with caution (until more is known about its properties), and to rely on more well-established techniques under most circumstances.

In conclusion, we want to stress that in our empirical work, even though it seems to have no special abilities with respect to sample size or non-normality, PLS did not perform worse than the other techniques in terms of statistical power and avoidance of false positives. These are perhaps the most important performance attributes of hypothesis testing. PLS is still a convenient and powerful technique that is appropriate for many research situations. For example, with complex research models, PLS may have an advantage over regression in that it can analyze the whole model as a unit, rather than dividing it into pieces. However, we found that PLS certainly is not a silver bullet for overcoming the challenges of small sample size or non-normality. At reasonable sample sizes, LISREL has equal power and greater accuracy.

Acknowledgments

Professor Thompson would like to acknowledge the generous financial support of the Wake Forest Schools of Business in helping to complete this research project.

References

Bagozzi, R. P., and Edwards, J. R. 1998. "A General Approach for Representing Constructs in Organizational Research," Organizational Research Methods (1:1), pp. 45-87.

Barclay, D., Higgins, C., and Thompson, R. 1995. "The Partial Least Squares (PLS) Approach to Causal Modeling: Personal Computer Adoption and Use as an Illustration," Technology Studies (2:2), pp. 285-309.

Bollen, K. A. 1989. Structural Equations with Latent Variables, New York: Wiley.

Cassel, C., Hackl, P., and Westlund, A. 1999. "Robustness of Partial Least-Squares Method for Estimating Latent Variable Quality Structures," Journal of Applied Statistics (26:4), pp. 435-446.

Chin, W. W. 1998. "The Partial Least Squares Approach to Structural Equation Modeling," in Modern Methods for Business Research, G. A. Marcoulides (ed.), London: Psychology Press, pp. 295-336.

Chin, W. W. 2001. PLS Graph User's Guide, Version 3.0, Houston, TX: Soft Modeling, Inc.

Chin, W. W., Marcolin, B. L., and Newsted, P. R. 2003. "A Partial Least Squares Latent Variable Modeling Approach for Measuring Interaction Effects: Results from a Monte Carlo Simulation Study and an Electronic-Mail Emotion/Adoption Study," Information Systems Research (14:2), pp. 189-217.

Chin, W. W., and Newsted, P. R. 1999. "Structural Equation Modeling Analysis with Small Samples Using Partial Least Squares," in Statistical Strategies for Small Sample Research, R. Hoyle (ed.), Newbury Park, CA: Sage Publications, pp. 307-341.

Cohen, J. 1988. Statistical Power Analysis for the Behavioral Sciences, Hillsdale, NJ: Lawrence Erlbaum Associates.

Dijkstra, T. 1983. "Some Comments on Maximum Likelihood and Partial Least Squares Methods," Journal of Econometrics (22), pp. 67-90.

Evermann, J., and Tate, M. 2010. "Testing Models or Fitting Models? Identifying Model Misspecification in PLS," in Proceedings of the 31st International Conference on Information Systems, St. Louis, MO, December 12-15.

Falk, R. F., and Miller, N. B. 1992. A Primer for Soft Modeling, Akron, OH: University of Akron Press.

Fleishman, A. I. 1978. "A Method for Simulating Non-Normal Distributions," Psychometrika (43:4), pp. 521-532.

Fornell, C. 1984. "A Second Generation of Multivariate Analysis: Classification of Methods and Implications for Marketing Research," Working Paper, University of Michigan.

Fornell, C., and Bookstein, F. 1982. "Two Structural Equation Models: LISREL and PLS Applied to Consumer Exit-Voice Theory," Journal of Marketing Research (19), pp. 440-452.

Fornell, C., and Larcker, D. 1981. "Evaluating Structural Equation Models with Unobservable Variables and Measurement Error," Journal of Marketing Research (18), pp. 39-50.

Gefen, D., Rigdon, E., and Straub, D. 2011. "Editor's Comments: An Update and Extension to SEM Guidelines for Administrative and Social Science Research," MIS Quarterly (35:2), pp. iii-xiv.

Gefen, D., Straub, D., and Boudreau, M. C. 2000. "Structural Equation Modeling and Regression: Guidelines for Research Practice," Communications of the Association for Information Systems (4: Article 7).

Goodhue, D., Lewis, W., and Thompson, R. 2006. "Small Sample Size and Statistical Power in MIS Research," in Proceedings of the 39th Hawaii International Conference on System Sciences, R. Sprague (ed.), Los Alamitos, CA: IEEE Computer Society Press, January 4-7.

Goodhue, D., Lewis, W., and Thompson, R. 2007. "Research Note - Statistical Power in Analyzing Interaction Effects: Questioning the Advantage of PLS with Product Indicators," Information Systems Research (18:2), pp. 211-227.

Goodhue, D., Lewis, W., and Thompson, R. 2011a. "A Dangerous Blind Spot in IS Research: False Positives Due to Multicollinearity Combined with Measurement Error," in Proceedings of the 17th Americas Conference on Information Systems, Detroit, MI, August 4-7.

Goodhue, D., Lewis, W., and Thompson, R. 2011b. "Measurement Error in PLS, Regression and CB-SEM," in Proceedings of the 6th Mediterranean Conference on Information Systems, Limassol, Cyprus, September 3-5.

Goodhue, D., Lewis, W., and Thompson, R. 2012. "Comparing PLS to Regression and LISREL: A Response to Marcoulides, Chin, and Saunders," MIS Quarterly (36:3), pp. 703-716.

Green, S. B. 1991. "How Many Subjects Does It Take to Do a Regression Analysis," Multivariate Behavioral Research (26), pp. 499-510.

Hair, J. F., Jr., Anderson, R. E., Tatham, R. L., and Black, W. C. 1998. Multivariate Data Analysis with Readings (5th ed.), Englewood Cliffs, NJ: Prentice Hall.

Hair, J. F., Ringle, C. M., and Sarstedt, M. 2011. "PLS-SEM: Indeed a Silver Bullet," Journal of Marketing Theory and Practice (19:2), pp. 139-151.

Hair, J. F., Sarstedt, M., Ringle, C. M., and Mena, J. A. 2011. "An Assessment of the Use of Partial Least Squares Structural Equation Modeling in Marketing Research," Journal of the Academy of Marketing Science, Online Publication (DOI 10.1007/s11747-011-0261-6).

Hayduk, L. A. 1987. Structural Equation Modeling with LISREL, Baltimore, MD: Johns Hopkins University Press.

Kahai, S. S., and Cooper, R. B. 2003. "Exploring the Core Concepts of Media Richness Theory: The Impact of Cue Multiplicity and Feedback Immediacy on Decision Quality," Journal of Management Information Systems (20:1), pp. 263-299.

Kline, R. B. 1998. Principles and Practice of Structural Equation Modeling, New York: Guilford Press.

Lohmöller, J. B. 1988. "The PLS Program System: Latent Variables Path Analysis with Partial Least Squares Estimation," Multivariate Behavioral Research (23), pp. 125-127.

MacCallum, R., Browne, M., and Sugawara, H. 1996. "Power Analysis and Determination of Sample Size for Covariance Structure Modeling," Psychological Methods (1:2), pp. 130-149.

Majchrzak, A., Beath, C. M., Lim, R. A., and Chin, W. W. 2005. "Managing Client Dialogues During Information Systems Design to Facilitate Client Learning," MIS Quarterly (29:4), pp. 653-672.

Malhotra, A., Gosain, S., and El Sawy, O. 2007. "Leveraging Standard Electronic Business Interfaces to Enable Adaptive Supply Chain Partnerships," Information Systems Research (18:3), pp. 260-279.

Marcoulides, G. A., Chin, W. W., and Saunders, C. 2009. "Foreword: A Critical Look at Partial Least Squares Modeling," MIS Quarterly (33:1), pp. 171-175.

Marcoulides, G. A., and Saunders, C. 2006. "Editor's Comments: PLS: A Silver Bullet?," MIS Quarterly (30:2), pp. iii-ix.

McDonald, R. P. 1996. "Path Analysis with Composite Variables," Multivariate Behavioral Research (31:2), pp. 239-270.

Micceri, T. 1989. "The Unicorn, the Normal Curve, and Other Improbable Creatures," Psychological Bulletin (105:1), pp. 156-166.

Nunnally, J. C., and Bernstein, I. H. 1994. Psychometric Theory (3rd ed.), New York: McGraw-Hill.

Petter, S., Straub, D., and Rai, A. 2007. "Specifying Formative Constructs in Information Systems Research," MIS Quarterly (31:4), pp. 623-656.

Reinartz, W., Haenlein, M., and Henseler, J. 2009. "An Empirical Comparison of the Efficacy of Covariance-Based and Variance-Based SEM," International Journal of Research in Marketing (26), pp. 332-344.

Ringle, C. M., Sarstedt, M., and Straub, D. 2012. "Editor's Comments: A Critical Look at the Use of PLS-SEM in MIS Quarterly," MIS Quarterly (36:1), pp. iii-xiv.

Rivard, S., and Huff, S. 1988. "Factors of Success for End-User Computing," Communications of the ACM (31:5), pp. 552-561.

Rönkkö, M., and Ylitalo, J. 2010. "Construct Validity in Partial Least Squares Path Modeling," in Proceedings of the 31st International Conference on Information Systems, St. Louis, MO, December 12-15.

Vittadini, G., Minotti, S. I., Fattore, M., and Lovaglio, P. G. 2007. "On the Relationship Among Latent Variables and Residuals in PLS Path Modeling: The Formative-Reflective Scheme," Computational Statistics and Data Analysis (51:12), pp. 5828-5846.

Wold, H. O. 1982. "Soft Modeling: The Basic Design and Some Extensions," in Systems Under Indirect Observation: Causality, Structure, Prediction, Part II, K. G. Jöreskog and H. Wold (eds.), Amsterdam: North-Holland.

Wood, R. E., Goodman, F. S., Beckmann, N., and Cook, A. 2008. "Mediation Testing in Management Research: A Review and Proposals," Organizational Research Methods (11:2), pp. 270-295.

Zhang, T., Agarwal, R., and Lucas, H., Jr. 2011. "The Value of IT-Enabled Retailer Learning: Personalized Product Recommendations and Customer Store Loyalty in Electronic Markets," MIS Quarterly (35:4), pp. 859-882.

About the Authors

Dale Goodhue is the MIS Department head, and the C. Herman and Mary Virginia Terry Chair of Business Administration at the University of Georgia's Terry College of Business. He has published in journals including Management Science, MIS Quarterly, Information Systems Research, Decision Sciences, and Sloan Management Review. Dale's research interests include measuring impacts of information systems, the impact of task-technology fit on individual performance, the management of data and other IS infrastructures/resources, the impacts of enterprise systems on organizations, and the strengths and weaknesses of various statistical techniques.

William Lewis is an independent consultant and a former assistant professor of MIS in the Department of Management and Information Systems at Louisiana Tech University. His research has appeared in several journals including MIS Quarterly and Communications of the ACM. William's research interests include business continuity planning, individual technology adoption in organizations, IS leadership, and research methodology.

Ronald L. Thompson is a professor in the Schools of Business at Wake Forest University. His research has been published in a variety of journals, including MIS Quarterly, Information Systems Research, Journal of Management Information Systems, and Information & Management. Ron is currently a senior editor for MIS Quarterly, and formerly served on the editorial board for the Journal of the Association for Information Systems. He holds a Ph.D. from the Ivey School at the University of Western Ontario. His current research interests include managing information systems projects and the use of advanced data analysis techniques in Information Systems research.
