
International Journal of Research in Marketing
ELSEVIER Intern. J. of Research in Marketing 13 (1996) 139-161

Applications of structural equation modeling in marketing and consumer research: A review
Hans Baumgartner a,*, Christian Homburg b
a The Pennsylvania State University, 707-K BAB, Department of Marketing, Smeal College of Business, University Park, PA 16802, USA
b WHU Koblenz, D-56179 Vallendar, Germany

Received 15 September 1995; accepted 2 November 1995

Abstract

This paper reviews prior applications of structural equation modeling in four major marketing journals (the Journal of
Marketing, Journal of Marketing Research, International Journal of Research in Marketing, and the Journal of Consumer
Research) between 1977 and 1994. After documenting and characterizing the number of applications over time, we discuss
important methodological issues related to structural equation modeling and assess the quality of previous applications in
terms of three aspects: issues related to the initial specification of theoretical models of interest; issues related to data
screening prior to model estimation and testing; and issues related to the estimation and testing of theoretical models on
empirical data. On the basis of our findings, we identify problem areas and suggest avenues for improvement.

Keywords: Structural equation modeling; Confirmatory factor analysis

* Corresponding author. Tel: (814) 863-3559; fax: (814) 865-3015; e-mail: JXB14@psuvm.psu.edu.

0167-8116/96/$15.00 Copyright © 1996 Elsevier Science B.V. All rights reserved
SSDI 0167-8116(95)00038-0

1. Introduction

Since the development of a general framework for specifying structural equation models with latent variables - referred to as the Jöreskog-Keesling-Wiley model by Bentler (1980) - and the implementation of the statistical approach in the LISREL computer program, latent variable modeling has become a popular research tool in the social and behavioral sciences. The monograph by Bagozzi (1980) on Causal Modeling is generally credited with bringing the technique to the attention of a wide audience of marketing and consumer behavior researchers, and articles in which structural equation modeling is used for data analysis now appear routinely in most leading marketing and consumer behavior journals. The popularity of the methodology is apparent from the recent introduction of the eighth version of LISREL (Jöreskog and Sörbom, 1993a - also available as a module in SPSSX) and the emergence of a host of computer programs that can be used as alternatives to LISREL, such as COSAN (Fraser, 1980), EQS (Bentler, 1989 - also implemented in BMDP), EZPATH (Steiger, 1989), LINCS (Schoenberg, 1989), the PROC CALIS procedure in SAS, and RAMONA (Browne and Mels, 1992).

Although the potential of structural equation modeling (henceforth referred to as SEM) for comprehensive investigations of both measurement and theoretical issues is generally acknowledged (e.g., Anderson and Gerbing, 1988; Bagozzi, 1984; Bagozzi and Yi, 1988; Dillon, 1986; Steenkamp and van Trijp, 1991) and even though the methodology has developed a loyal following in some quarters and continues to attract new users, some authors have commented critically on the technique's value for empirical research. These criticisms range from outright denial of the method's usefulness because of the presumed implausibility of underlying assumptions (e.g., Freedman, 1987) to concerns about the way in which SEM has been applied in practice (e.g., Breckler, 1990; Biddle and Marlin, 1987; Cliff, 1983; Fornell, 1983; Martin, 1987). In our opinion, the methodology has much to offer to the empirical researcher, but there are many pitfalls that can make SEM a dangerous tool in the hands of inexperienced users. The purpose of this paper is to critically evaluate previous empirical applications of SEM in four leading marketing and consumer behavior journals (the Journal of Marketing, Journal of Marketing Research, International Journal of Research in Marketing, and the Journal of Consumer Research) and to provide guidance to future users on how to employ the methodology more appropriately.

Specifically, our review has the following objectives. First, we want to document the number of applications of SEM over the years and classify these applications in terms of relevant criteria such as the purpose for which the methodology is used (e.g., investigations of measurement issues, tests of theoretical relationships). To our knowledge, no comprehensive survey of marketing applications has been reported in the literature, and little is known about the use of LISREL and related programs in actual research. Second, we seek to evaluate the quality of applications of SEM by assessing their conformance with formal statistical assumptions required for the valid use of these techniques, their adherence to guidelines derived empirically from simulation studies, and their reliance on rules of thumb proposed by expert practitioners. Initial applications of methodologies, particularly if they are as complex as SEM, are prone to misuse, and an in-depth analysis of a critical mass of empirical studies should point to problem areas and suggest avenues for improvement.

2. Previous applications of SEM

We selected the Journal of Marketing, Journal of Marketing Research, International Journal of Research in Marketing, and the Journal of Consumer Research as the journals most representative of research in the fields of marketing and consumer behavior. All issues between 1975 and 1994 were searched for empirical applications of SEM. Theoretical papers dealing with issues related to SEM and papers in which only simulated data were analyzed or actual data were analyzed for illustrative purposes only were not considered. Similarly, conventional exploratory factor analysis models, path analysis and other structural models estimated by regression methods (e.g., models estimated by two-stage least squares), nonlinear structural models, partial least squares (PLS) models (cf. Fornell and Bookstein, 1982), and ordinal and limited observed variable models (e.g., those that can be estimated with the LISCOMP program of Muthén (1987)) were excluded from the sample. Essentially, our data base of applications of SEM includes confirmatory measurement models, single-indicator structural models provided they were estimated by a program normally used for latent variable modeling, and integrated measurement/latent variable models.

In total, we found 149 applications that satisfied our selection criteria.¹ Fig. 1 graphs the number of applications between 1977 (the first year an application was found) and 1994, both overall and by journal. It is apparent that the use of SEM in the four journals has increased fairly steadily over the years. When the number of applications (overall and separately for each journal) is regressed on the linear and quadratic effects of time (plus a dummy variable for 1982 in the analysis for the Journal of Marketing Research and in the overall analysis to correct for the large number of applications found in the November issue of this journal due to a special issue devoted to SEM), only the linear trend of time and the dummy variable are significant (the only exception being a significantly positive quadratic effect for the Journal of Marketing), indicating that in general the use of SEM has neither accelerated nor decelerated over the years. Overall, the Journal of Marketing, Journal of Marketing Research, International Journal of Research in Marketing, and the Journal of Consumer Research accounted for 28, 42, 3, and 28 percent of the total number of applications, respectively.

¹ A listing of the 149 applications included in the meta-analysis may be obtained by contacting the first author.

[Fig. 1. Applications of structural equation modeling over time. Vertical axis: number of applications (0 to 20); horizontal axis: time (1977 to 1994).]

SEM can be regarded as a methodological innovation, and the question arises how this innovation has developed over time. To investigate this issue, the Bass (1969) diffusion model was fit to the data. The Bass model is applicable to non-replacement sales only, which in the present case means that it models only initial applications of the technique by a given author. When there are multiple authors and at least one of the authors has not used the method previously, it is not obvious whether the paper should be regarded as an initial application since knowledge of who conducted the analysis is unavailable. We used the conservative criterion that if at least one of the authors was a first-time user, the paper was included in the sample. A total of 128 papers were thus classified as initial applications. In fitting the Bass model to our data, we regressed the number of first-time applications per year on the number of cumulative applications up to the previous year, the square of the latter term, and a dummy variable for the special issue in 1982. The resulting regression model fit the data well, accounting for 78 percent of the variance in the dependent variable, and all three predictors were statistically significant (t-values > |1.89|, with the coefficient of the quadratic term being negative, as expected). The coefficient of innovation (p) was 0.005 and the coefficient of imitation (q) was 0.207, and since q is substantially greater than p the introduction of SEM may be considered a successful innovation (Bass, 1969).

Structural equation models can be specified to investigate measurement issues, to examine structural relationships among sets of variables, or to accomplish both purposes simultaneously. Most published applications of SEM are factor-analytic measurement studies (39 percent) and integrated investigations of both the measurement structure underlying a set of observed variables and the structural relations among the latent variables (42 percent). In some cases SEM is also used for examining the relationships among variables which are all measured by single indicators (15 percent), and in five instances multiple uses of SEM were reported (e.g., separate analyses of measurement and structural models were performed).²

² Separate analysis of measurement and structural models does not refer to the two-step approach of Anderson and Gerbing (1988). Rather, authors sometimes perform a factor analysis on the set of items available and then combine variables into composites, which are analyzed as single-indicator constructs.

The vast majority of published studies have been conducted with cross-sectional data (93 percent). In one instance the analysis was performed on true longitudinal data covering many time periods, and in nine cases the authors used panel data, where observations at two or more points in time were available for each member of the sample. The almost exclusive reliance on cross-sectional data to investigate structural relationships among constructs (89 percent when measurement studies are excluded from the sample) and the well-known problems of inferring causation from cross-sectional data (e.g., Cliff, 1983; Biddle and Marlin, 1987) suggest that special care be exercised in causally interpreting results derived from (cross-sectional) structural equation models. In fact, it might be advantageous to avoid the term causal modeling altogether and instead talk about SEM, as is done in this paper.
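
The Bass (1969) regression described in this section (first-time applications per year regressed on lagged cumulative applications, their square, and a 1982 dummy) can be sketched as follows. This is a minimal illustration, assuming the usual discrete-time form n_t = a + b N_{t-1} + c N_{t-1}^2 with a = pm, b = q - p and c = -q/m, where m is the eventual number of adopters. The function names, the value of m, and the size of the one-off surge are hypothetical, and the series is synthetic rather than the data analyzed in this review; only p and q are taken from the estimates reported above.

```python
import numpy as np


def simulate_bass(p, q, m, periods, bump_at=None, bump=0.0):
    """Generate a noise-free series from the discrete Bass recursion (synthetic data)."""
    a, b, c = p * m, q - p, -q / m
    new, cum = [], 0.0
    for t in range(periods):
        n_t = a + b * cum + c * cum**2
        if t == bump_at:
            n_t += bump  # hypothetical one-off surge, e.g. a special issue
        new.append(n_t)
        cum += n_t
    return np.array(new)


def fit_bass(new_adopters, dummy=None):
    """OLS fit of n_t = a + b*N_{t-1} + c*N_{t-1}^2 (+ d*dummy_t)."""
    n = np.asarray(new_adopters, dtype=float)
    # Cumulative adopters up to the previous period, N_{t-1}
    cum_prev = np.concatenate(([0.0], np.cumsum(n)[:-1]))
    columns = [np.ones_like(n), cum_prev, cum_prev**2]
    if dummy is not None:
        columns.append(np.asarray(dummy, dtype=float))
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, n, rcond=None)
    a, b, c = coef[:3]
    # a = p*m, b = q - p, c = -q/m imply c*m^2 + b*m + a = 0.
    # This root is the positive one when c < 0 (negative quadratic term).
    m = (-b - np.sqrt(b**2 - 4.0 * a * c)) / (2.0 * c)
    return {"p": a / m, "q": -c * m, "m": m, "coef": coef}


# Synthetic 18-period series (think 1977 to 1994); index 5 plays the role of 1982.
periods = 18
dummy = np.zeros(periods)
dummy[5] = 1.0
series = simulate_bass(p=0.005, q=0.207, m=400.0, periods=periods,
                       bump_at=5, bump=10.0)
estimates = fit_bass(series, dummy)
print({key: round(float(estimates[key]), 4) for key in ("p", "q", "m")})
```

Because the synthetic series contains no noise, the regression recovers the generating values; with real counts the estimates would carry sampling error, and a non-negative quadratic coefficient would make the back-transformation to p, q and m meaningless.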

As a final characterization of previous applications of SEM, our analysis shows that 85 percent of all authors used LISREL to perform the analysis. In six instances the data were analyzed with EQS, and in 17 cases another program was used or the author(s) did not specifically mention which program was used. The data indicate that LISREL has enjoyed a considerable first-mover advantage, and it will be interesting to see whether any of the newer programs will be able to challenge the hegemony of LISREL and become a serious competitor in the future.

3. Methodological issues in the application of SEM

In discussing methodological aspects relating to SEM and in assessing the quality of published applications, we will consider the following three broad sets of issues (cf. Bagozzi and Baumgartner, 1994): (1) issues related to the initial specification of theoretical models of interest; (2) issues related to data screening prior to model estimation and testing; and (3) issues related to the estimation and testing of theoretical models on empirical data. In contrast to the previous section, where the focus was on tracking the number of articles employing SEM over time, the unit of analysis in this section is a given model. In many papers only a single model is considered. Even if multiple (mostly nested) models are compared using a single sample, but the author(s) present(s) one model that best represents the data, only this final model is analyzed. Thus, in most cases using the article or the model as the unit of analysis leads to the same result. In some papers, however, the same model is estimated on multiple samples (e.g., the model is cross-validated with different respondents), different models are estimated on the same sample (e.g., separate measurement models are specified for different constructs or separate measurement and structural models are investigated), or different models are estimated on different samples (e.g., different model specifications are examined using different respondents). In the first case, the data were averaged across replications before including the application in the analysis. In the other two cases, each distinct model was used separately in the analysis, so that a single paper might contribute several data points. The sample size for most analyses in this section is 184, although in some cases it might be slightly smaller because of missing values.

3.1. Issues related to the initial specification of theoretical models of interest

Model specification. To establish a common terminology for the discussion that follows, we will briefly review some specification issues. Using the LISREL formulation, a full structural equation model can be stated as follows (cf. Bollen, 1989):

$$\eta = B\eta + \Gamma\xi + \zeta, \qquad (1)$$
$$y = \Lambda_y \eta + \varepsilon, \qquad (2)$$
$$x = \Lambda_x \xi + \delta. \qquad (3)$$

Eq. (1) is called the latent variable (or structural) model and expresses the hypothesized relationships among the constructs in one's theory. The $m \times 1$ vector $\eta$ contains the latent endogenous constructs and the $n \times 1$ vector $\xi$ consists of the latent exogenous constructs. The coefficient matrix $B$ shows the effects of endogenous constructs on each other, and the coefficient matrix $\Gamma$ signifies the effects of exogenous on endogenous constructs. The vector of disturbances $\zeta$ represents errors in equations. If latent variables are specified to have simultaneous effects on each other (so that the $B$ matrix has nonzero elements both above and below the diagonal) and/or if errors in equations are allowed to be correlated, the model is called nonrecursive. If $B$ is subdiagonal and the $\zeta_i$ are uncorrelated, the model is said to be recursive. Special care is required with nonrecursive models because of such issues as model identification, the stability of reciprocal effects, and the interpretation of measures of variation accounted for in endogenous constructs (cf. Schaubroeck, 1990; Teel et al., 1986).

Eqs. (2) and (3) are factor-analytic measurement models which tie the constructs to observable indicators. The $p \times 1$ vector $y$ contains the measures of the endogenous constructs, and the $q \times 1$ vector $x$ consists of the measures of the exogenous constructs. The coefficient matrices $\Lambda_y$ and $\Lambda_x$ show how $y$ relates to $\eta$ and $x$ relates to $\xi$, respectively. The vectors of disturbances $\varepsilon$ and $\delta$ represent errors in variables (or measurement error). Generally (but not always) the measurement model possesses simple structure such that each observed variable is related
to a single latent variable. Models with simple structure and no correlated measurement errors represent unidimensional construct measurement, which is frequently considered to be a highly desirable characteristic of measurement (Anderson and Gerbing, 1988; Gerbing and Anderson, 1988; Hattie, 1985).

A total of 73 models in our sample were full structural equation models consisting of Eqs. (1) through (3).³ In 81 cases, SEM was used solely for investigating the measurement structure underlying a set of observed variables. This submodel, which corresponds to a confirmatory measurement model, is given by either Eq. (2) or Eq. (3). Sometimes, SEM is also applied to examinations of the structural relations among constructs that are all measured by single indicators. The necessary specification is obtained by ignoring unreliability of measurement and setting $\Lambda_y$ and $\Lambda_x$ to be equal to identity matrices, or by assuming reliability to be known and fixing the factor loadings or error variances accordingly. This was done in 30 cases. In the sequel we will refer to confirmatory measurement models, single-indicator structural models, and integrated measurement/latent variable models as models of type I, II, and III, respectively.⁴

³ Included in this category are several second-order factor models, which are measurement models from a substantive perspective but which are specified as integrated measurement/structural models for estimation purposes.

⁴ All three types of models can be specified as single-sample or multi-sample models. If the results of both the single-sample and multi-sample analyses were reported in the paper, the single-sample data were used and averaged before including the application in our sample. In seven cases (three type II models and four type III models) only the results of the multi-sample analysis were reported so that these data had to be used in the analysis.

Measurement model specification. One important consideration in planning a study is how many observed variables should be used to measure each latent variable and how the various indicators should be related to each construct. It is generally accepted that each construct should be measured by multiple items, but how many items should be used is less clear. On the one hand, a sufficient number of indicators per factor has to be available for a model to be identified, and for estimation problems such as nonconvergence and improper solutions to be minimized it is advantageous to have many indicators (e.g., Anderson and Gerbing, 1984). Besides these statistical considerations, a greater number of observed measures is more likely to tap all facets of the construct of interest. On the other hand, the greater the number of indicators per factor, the more difficult it will probably be to parsimoniously represent the measurement structure underlying a set of observed variables and to find a model that fits the data well.

Bagozzi and Heatherton (1994) have recently suggested an approach for representing personality constructs that seems applicable to the modeling of measurement structures in general. They distinguish four different levels of abstraction in modeling personality constructs, but in the present context a differentiation into three levels seems sufficient. In the total aggregation model, a single composite is formed by combining all the measures of a given construct. This approach results in a model that is formally identical to one in which only a single indicator is available, but in general a composite single indicator should be more reliable than a true single-item measure. In fact, it is possible to compute a measure of reliability when a composite of items is available (e.g., coefficient $\alpha$), and this estimated reliability can be incorporated into the analysis by fixing the error variance of the indicator to (1 - reliability) times the variance of the indicator. This method has the advantage that the specification of the model is quite simple and that, compared to the true single-indicator case, unreliability of measurement can be taken into account in a limited way. However, a major disadvantage is that the quality of construct measurement is not investigated explicitly (e.g., no assessment of unidimensionality is provided). In the partial aggregation and partial disaggregation models, subsets of items are combined into several composites and these composites are treated as multiple indicators of a given factor. This method takes into account unreliability more explicitly and allows some assessment of unidimensionality while minimizing model complexity. However, combining subsets of items into composites is usually somewhat arbitrary. Finally, in the total disaggregation model true single-item measures are used as multiple measures of an underlying latent variable. This method allows the most explicit tests of the quality of construct measurement, but unfortunately the analysis
becomes rather unwieldy if more than, say, five indicators per factor are available and the model contains even a moderately large number of constructs.

Table 1 presents relevant statistics regarding the issue of construct measurement, both overall and by type of model. Overall, the median number of observed variables (p + q) across all applications of SEM has been 11 and the median number of constructs (m + n) has been five, resulting in a median ratio of observed variables to constructs of about two. For type II models, this ratio is identically equal to one by definition. In pure measurement studies, the median number of items per factor is about four, whereas in integrated measurement/latent variable models the median is around two. An unexpectedly large number of models contained at least one single-indicator 'latent' construct. By definition, all constructs are indicated by a single measure in type II models. However, even in type III models at least one single-indicator construct was used in 71 percent of all cases. For type I models this figure was much lower (7 percent), but this should not be too surprising given that these studies deal exclusively with construct measurement.

Single-indicator constructs are unattractive because they ignore unreliability of measurement, which is one of the problems SEM was specifically designed to circumvent. As noted by Bentler and Chou (1987), even having two measures per factor might be problematic since three indicators per construct are needed for a model to be identified unless covariances among factors help to identify the system of equations. Furthermore, as discussed by Bentler and Bonett (1980) and Anderson and Gerbing (1988), a model of independence at the structural level serves a very useful function in model comparison tests, and such a model is in general not identified when fewer than three indicators are available. These arguments suggest that authors should not use single-indicator constructs or the minimum number of indicators required for multi-item measurement of constructs. Instead, we recommend that each latent variable be assessed with a minimum of three or four indicators each (cf. Bollen, 1989).

As mentioned previously, not all single-item constructs are true single-item measures. Authors sometimes aggregate items into composites before entering them into the analysis. We also assessed the number of actual items that went into each measure, and the median of this variable (18) was substantially higher than the median number of formal indicators on which the degrees of freedom are based (11). In fact, across all three models items were combined into composites prior to entering them into a structural equation model in 38 percent of the cases. This practice was particularly prevalent in the case of type II models, where 77 percent of applications followed this procedure. Sometimes, it will be practically unavoidable to combine items into composites if the number of indicators is even moderately large (e.g., if one of the constructs is a personality trait which is measured by a battery of, say, ten items). In this case we recommend that a (confirmatory) factor analysis be conducted on the items to be aggregated and that evidence on the dimensionality of the construct be presented. If unidimensionality holds, the items may be aggregated into a single composite and measure unreliability can be taken into account by fixing the error variance appropriately, as described previously. If unidimensionality does not hold, only indicators of a given subdimension of the construct should be combined and the resulting composites can be treated as multiple indicators of a higher-order construct, provided the intercorrelations are high enough. Otherwise, separate constructs will have to be specified. Unreliability can be taken into account as in the single-composite case.

In terms of further characteristics of construct measurement practices, Table 1 shows that about 20 percent of all measurement models contain double-loading items. Although not advisable in general (Gerbing and Anderson, 1988), in about half of these cases the procedure is justified because of a priori specifications of method factors in multi-trait, multi-
method analyses (cf. Bagozzi and Yi, 1991; Bagozzi et al., 1991). About 5 percent of type I models and 19 percent of type III models contain correlated measurement errors. We specifically coded whether the authors provided a substantive justification for this practice or whether they introduced correlated errors simply because of goodness of fit considerations. In about 50 percent of the cases no justification was provided. We concur with Anderson and Gerbing (1988) and Hattie (1985) that unidimensional construct measurement is a desirable characteristic of measurement models, and we recommend that researchers use double-loading items and correlated errors of measurement with caution and not introduce them simply to boost the fit of the model.

[Table 1. Table body not recoverable from the source.]
Note to Table 1:
a Table entries are medians (with the 25th and 75th percentile in parentheses), unless a percentage value is indicated.

Latent variable model specification. We coded how many of the models were nonrecursive, either because of correlated errors in equations or because of entries both above and below the diagonal in the B matrix. Approximately 23 percent of type II models and 26 percent of type III models contained correlated errors in equations, of which 57 and 16 percent were substantively justified, respectively (see Table 1). Correlated errors in equations are sometimes useful to model correlations among the endogenous constructs that are due to unmeasured or omitted variables, but as in the case of correlated errors of measurement they should be used with caution. About 7 percent of type II models and 8 percent of type III models allowed simultaneous effects among the endogenous constructs, and overall 30 percent of type II models and 32 percent of type III models were nonrecursive. Given the complexity of nonrecursive models, these figures are fairly high and, as shown below, the lack of proper regard for issues such as model identification indicates a potential cause for concern. We recommend that if nonrecursive models are specified, relevant issues such as model identification and the stability of reciprocal effects be addressed explicitly in the paper.

Sample size. Another important issue that should be considered prior to actually conducting the study is whether the sample size is likely to be sufficient given the number of parameters to be estimated. All methods for the estimation and testing of structural equation models are based on asymptotic theory and the sample size has to be 'large' for the parameter estimates and test statistics to be valid. Little theoretical guidance as to what constitutes an adequate sample size is available and the evidence from simulation studies is sparse, but Bentler and Chou (1987) provide the rule of thumb that under normal distribution theory the ratio of sample size to number of free parameters should be at least 5:1 to get trustworthy parameter estimates, and they further suggest that these ratios should be higher (at least 10:1, say) to obtain appropriate significance tests.

Table 1 shows that, overall, the median number of parameters estimated was about 29, with the median somewhat smaller for type II models. The median sample size was 178, resulting in a median ratio of sample size to number of free parameters of about 6:1. The median ratio is smallest for type III models at about 5:1. A total of 41 (73) percent of all models had ratios smaller than 5:1 (10:1). In particular, the figures are lowest for type III models. In fact, for 86 percent of these models the ratio of sample size to number of parameters estimated was smaller than 10:1. These figures show that sample sizes are often toward the lower end of, or even below, levels that are considered acceptable to obtain trustworthy parameter estimates and valid tests of significance. It is fairly easy to calculate beforehand what the likely number of parameters to be estimated will be, and necessary sample sizes should be determined accordingly. As pointed out by Martin (1987), there may be a trade-off between collecting data of high quality and gathering data from a large sample of respondents. A researcher's primary objective should be to obtain high-quality data, so that SEM may not be an appropriate methodology in certain cases. In particular, even though SEM can in principle be used in experimental designs (Bagozzi and Yi, 1989), small sample sizes will often preclude application of the technique in that context.

Model identification. Another important a priori consideration is whether the model to be estimated is identified. A model is said to be identified if it is impossible for two distinct sets of parameter values to yield the same population variance-covariance matrix. A necessary condition for identification is that the number of parameters to be estimated should not exceed the number of distinct elements in the variance-covariance matrix of the observed variables. This rule, which implies that the number of degrees of freedom be nonnegative, is easy to check,
H. Baumgarmer, C. Homburg~Intern. J. of Research in Marketing 13 (1996) 139-161 147

but unfortunately it is not a sufficient condition for identification. General, easy-to-follow procedures for proving identification are unavailable except in specialized cases, and showing that a model is identified may be nontrivial for certain kinds of models. In particular, special care is required for models that are not unidimensional and/or nonrecursive.

We coded whether the issue of identification was addressed in a given article. Table 1 shows that very few authors mention that they checked whether a model to be estimated was identified. It is possible that identification was considered without explicitly mentioning this fact, and in many cases computer programs will give a warning message if a model appears to be underidentified, so that identification problems might be detected even if identification is not proved theoretically. However, particularly when it is not immediately clear that a model is identified, it would be advisable to deal with identification explicitly and to mention this fact in the paper. Our own assessment of model identification in previous applications of SEM suggests that the vast majority of models are indeed identified theoretically. However, in at least two instances the target model was probably not identified and in another paper one of the comparison models was underidentified. It is thus strongly recommended that in the future more attention be paid to the issue of identification.⁵

Degree of freedom. As a final characterization of previous applications of SEM, Table 1 shows that the median number of degrees of freedom and thus overidentifying restrictions is about 32. The median is lowest for type II models at 11 and highest for type III models at 49. In a surprisingly large proportion of all cases degrees of freedom are reported incorrectly (8 percent). The reason for this error is that a correlation matrix is used as input to estimation, at least one exogenous construct is measured by a single indicator, the loading of the single indicator on its 'construct' is constrained to one, and at the same time the variance of the single-indicator construct is assumed to be known and also fixed at one.⁶ This results in inflated degrees of freedom and exaggerated p-values for the overall goodness-of-fit χ² statistic. Fortunately, the error is generally small in magnitude.

In most cases, it is possible to separate the measurement model from the latent variable model and to partition the total number of degrees of freedom into degrees due to the particular specification of the measurement model (essentially the degrees of freedom of a saturated structural model) and the degrees of freedom derived from a particular structural model formulation (the deviation of a model from a saturated structural specification). Table 1 indicates that, overall, most degrees of freedom derive from the measurement model. In type I models deviations from the saturated structural model are rare (i.e., cases where factor covariances are specified to be fixed), and in type II models overidentifying restrictions can only be imposed on the structural model. In type III models, where degrees of freedom may come from both the measurement and latent variable models, the median contribution of the measurement model to the total number of degrees of freedom is about 93 percent. Duncan (1975) and Fornell (1983) have pointed out the dangers of interpreting a good overall fit of a model as support for the validity of one's theory. The figures on the percentage of overidentifying restrictions that are generally derived from the measurement model provide further evidence on the folly of such arguments.

3.2. Issues related to data screening prior to model estimation and testing

Probably one of the most common mistakes in applying SEM is to pay little or no attention to the raw data and to immediately compute a correlation matrix and rush to model estimation and testing. The danger inherent in this practice is that the correlation matrix masks the multitude of factors that may call into doubt the applicability of the chosen statistical procedure.

⁵ Even if a model is identified theoretically, there might be empirical identification problems. This happens when the expression of a parameter in terms of observed variances and covariances involves a denominator that is zero or close to zero (cf. Kenny, 1979).

⁶ In other words, the scale of a latent variable should be fixed by setting either the loading of one reference indicator or the factor variance equal to unity (but not both).
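
The counting rule behind the degrees of freedom, and the double-fixing error described in footnote 6, can be illustrated with a minimal Python sketch. The one-factor, four-indicator model and the function names below are hypothetical and are not taken from any of the reviewed applications. The sketch only reproduces the arithmetic of the count (distinct elements of the variance-covariance matrix minus freely estimated parameters); a nonnegative result is a necessary condition and says nothing about sufficiency for identification.

```python
def distinct_moments(p: int) -> int:
    """Number of distinct elements in a p x p variance-covariance matrix."""
    return p * (p + 1) // 2


def degrees_of_freedom(p: int, free_parameters: int) -> int:
    """Counting rule: distinct moments minus freely estimated parameters."""
    return distinct_moments(p) - free_parameters


def passes_counting_rule(p: int, free_parameters: int) -> bool:
    """Necessary (not sufficient) condition: degrees of freedom are nonnegative."""
    return degrees_of_freedom(p, free_parameters) >= 0


# Hypothetical one-factor model with four observed indicators.
p = 4

# Scale set once (reference loading fixed at one):
# 3 free loadings + 1 factor variance + 4 error variances.
df_correct = degrees_of_freedom(p, free_parameters=3 + 1 + 4)

# Scale set twice (reference loading AND factor variance fixed at one):
# 3 free loadings + 0 factor variances + 4 error variances.
df_double_fixed = degrees_of_freedom(p, free_parameters=3 + 0 + 4)

print(distinct_moments(p), df_correct, df_double_fixed)
```

With the scale set once the count gives 2 degrees of freedom; fixing both the reference loading and the factor variance removes one more free parameter and the count rises to 3, which is the kind of inflation the text describes.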

First, it is important to check that there are no coding errors, that variables have been recoded appropriately if necessary, and that missing values have been dealt with properly (Kaplan, 1990). Second, it is helpful to investigate possible distorting influences introduced by the presence of a few influential outliers. Third, it is crucial to examine the approximate normality of the data and to take corrective action if this assumption is violated, since most estimation methods assume that the data come from a multivariate normal population. Finally, it is essential that an appropriate measure of association be used as input to model estimation and testing.

Recent discussions of SEM have placed increased emphasis on such important issues as outlier detection and assessment of normality (see Bollen, 1989 for an excellent discussion). The necessary analyses can now be conducted fairly easily with conventional computer programs for SEM (e.g. EQS, PROC CALIS in SAS) or specialized programs such as PRELIS (Jöreskog and Sörbom, 1993b) and the LISRES macro for SAS (Davis (1992), based on the work of Bollen (1989), and Bollen and Arminger (1991)). PRELIS is helpful for exploratory data screening and for testing the univariate and multivariate normality of the observed variables. The LISRES macro can be used to flag 'atypical' cases and to perform outlier analysis for groups of observations, and it also reports univariate and multivariate tests of normality based on the skewness and kurtosis of the observed variables (see Bollen (1989), for details). An assessment of the approximate normality of the data is important because model estimation and testing are usually based on the validity of this assumption, and lack of normality adversely affects goodness-of-fit indices and standard errors.

There was little evidence in most of the papers that the authors had shown particular concern for data screening prior to model estimation and testing. Due to space constraints data screening is probably not described in most cases and the idea of outlier detection in SEM is a relatively new one. However, it is widely known that SEM generally requires the assumption of multivariate normality, and we coded whether assessment of normality was discussed either qualitatively or quantitatively. Only in 8 percent of all cases did authors indicate that they had checked whether the data were at least approximately normal. Alternative estimation techniques for which multivariate normality is not as crucial as for maximum likelihood estimation (see discussion below) were used very infrequently (5 percent), and in no instance was their use motivated by the fact that the data were not sufficiently normal. These results indicate that not enough attention is being paid to the satisfaction of assumptions that are necessary for the valid use of the most commonly used estimation technique. We recommend that in future research the requirement of multivariate normality be taken more seriously and that a summary measure describing the extent to which the data are normally distributed be reported as a matter of routine in the methods or results section of the paper (e.g., the multivariate coefficient of relative kurtosis; cf. Browne, 1982).

In addition to the issue of data screening, there is also the question of which measure of association to use in the analysis. Researchers often use correlations rather than covariances as input to estimation, and Cudeck (1989) has recently discussed this issue in some detail. Since in most cases maximum likelihood (as well as generalized least squares) fitting functions are scale invariant and the resulting estimates scale free (Bollen, 1989), this has no effect on overall goodness of fit indices and parameter estimates. However, standard errors may be inaccurate, and Cudeck (1989) cautions against the use of correlation matrices. Our survey of previous applications of SEM shows that in many cases researchers do not mention specifically which measure of association they used as input to estimation. If the conservative criterion is adopted that a correlation is used by default, about 78 percent of applications have based estimation on correlation matrices. Covariances were specifically used in 21 percent of all cases and in a few applications another measure of association was employed. It is difficult to assess how frequently the use of correlations has had detrimental effects on the analysis. One case where correlations should not be used is with multiple-group analyses. In seven instances only the results of multiple-group analyses were reported in the paper. In four of these cases no mention was made of the fact that the analyses were based on covariances, and it is possible that correlations were used as input. We recommend that in future research all analyses be conducted on covariance matrices and that tests of significance for individual parameters be reported from this analysis. Standardized parameter estimates can be easily obtained from the standardized solution in which either the latent variables or both the latent and observed variables have been standardized.

3.3. Issues related to the estimation and testing of theoretical models on empirical data

Model estimation. A variety of estimation procedures are available to obtain parameter estimates and test statistics, but the majority of models (about 95 percent) are estimated using maximum likelihood techniques (see Table 2). The reason for this preference for maximum likelihood techniques seems to be that it is the default method in most computer packages. Furthermore, given the lack of concern for the validity of the normality assumption, authors probably do not see a need to consider alternative estimation procedures. In principle, asymptotically distribution-free (ADF) methods (Browne, 1984) can be used regardless of which distribution underlies the observed variables, but in practice very large sample sizes are required, and simulations have shown that ADF techniques do not necessarily perform better even when they might be expected to be more appropriate theoretically (cf. Hu et al., 1992; Sharma et al., 1989). Given the small sample sizes on which correlations or covariances are generally based, ADF techniques are probably not a practical alternative in most situations.

Estimation problems. Sometimes estimation techniques encounter difficulties in converging on a solution or converge on a locally optimal solution, and even when a solution has been found it might be improper in the sense that the estimated parameters are impossible in the population (e.g., negative error variances) or of little use in testing (e.g., very large standard errors). Frequent causes of such problems are poorly specified models, overfitting, outliers, bad starting values, insufficiently operationalized constructs, and small sample sizes (Bentler and Chou, 1987).

Few instances of estimation problems were encountered in our review (see Table 2). With respect to convergence problems, this is not too surprising since nonconvergence would probably preclude publication of an article. One area where convergence problems were sometimes mentioned was in the context of multi-trait multi-method analysis. Improper solutions in the form of negative error variances were found in 5 percent of all models (although they were not always significant). Sometimes the offending estimate was set to zero, and sometimes it was retained. It should be noted that estimated model parameters are not always reported in enough detail so that it is sometimes impossible to tell whether improper solutions occurred or not. We would recommend that authors at least mention that there were no improper solutions if complete results are not reported because of space constraints or other reasons.

Assessment of overall model fit. The most popular index for assessing the overall goodness of fit of a model has been the χ² statistic, which tests the null hypothesis that the estimated variance-covariance matrix deviates from the sample variance-covariance matrix only because of sampling error. In practice, the χ² test is sometimes of limited usefulness because it is not robust to violations of underlying assumptions (particularly normality) and because it is heavily influenced by sample size (Bentler, 1990). The latter problem is particularly serious because on the one hand large sample sizes are needed to obtain valid tests, but on the other hand specified models are probably never literally true and thus subject to rejection in sufficiently large samples (cf. Cudeck and Browne, 1983).

Because of these problems, many alternative fit indices have been developed (for recent overviews see Gerbing and Anderson, 1993; Jöreskog, 1993; Marsh et al., 1988; Mulaik et al., 1989; Tanaka, 1993). Some of these are stand-alone indices assessing model fit in an absolute sense (e.g., the goodness of fit index (GFI), adjusted goodness of fit index (AGFI), and root mean square residual (RMR) reported by earlier versions of LISREL) and others are incremental fit indices comparing the target model to the fit of a baseline model, among them the Bentler and Bonett (1980) normed fit index (BBI) and the Tucker and Lewis (1973) nonnormed fit index (TLI). Most of the time the baseline model is one in which all observed variables are assumed to be uncorrelated, although other baseline models are possible.

Recent work on goodness-of-fit assessment has emphasized the idea of expressing model fit in terms
of noncentrality (e.g., Bentler, 1990; McDonald and Marsh, 1990; Steiger, 1990; Browne and Cudeck, 1993). This explicitly recognizes the fact that hypothesized models are generally only approximately true, provides a basis for population-based (rather than sample-based) fit measures and associated confidence intervals, and appears to mitigate the problem that the means of the sampling distributions of many alternative fit measures are a function of sample size (with larger samples yielding larger fit indices on average). Among the stand-alone fit indices based on noncentrality are the McDonald (1989) measure of centrality (MC) and the root mean squared error of approximation (RMSEA) of Steiger (1989) and Steiger (1990), which estimates how well the fitted model approximates the population covariance matrix per degree of freedom. Browne and Cudeck (1993) suggest that a value of RMSEA below 0.05 indicates close fit and that values up to 0.08 are reasonable, and they propose a test of close fit for testing the hypothesis that RMSEA is smaller than 0.05. (In contrast, the conventional χ² statistic tests the hypothesis that RMSEA = 0). Among the incremental fit indices based on noncentrality are the Bentler (1990) normed comparative fit index (CFI), which in most cases equals the McDonald and Marsh (1990) nonnormed relative noncentrality index, and the Tucker-Lewis index (TLI). The most important difference between CFI and TLI is that TLI (like RMSEA) expresses fit per degree of freedom, thus imposing a penalty for estimating less parsimonious models. This may be important in comparing models of different complexity.

Table 2 shows that most published articles using SEM report at least one stand-alone fit index, and slightly more than a third also rely on incremental fit indices to assess the overall fit of the model. As expected, by far the most commonly used fit index is the χ² test. Other common stand-alone indices are GFI, AGFI, and RMR. Until recently these were the ones automatically reported by LISREL, which seems to explain their popularity. Incremental fit indices are used less frequently, probably because, until recently, one had to estimate a separate baseline model and calculate the index by hand. BBI has been the most popular incremental fit index, but CFI has recently gained in popularity.

The latest versions of the most common computer programs for SEM (e.g., LISREL 8, the PROC CALIS procedure in SAS) report a large number of different fit indices so that authors will have to make a decision about which ones to use in model evaluation. Bollen and Long (1993) recommend that researchers should not rely solely on the χ² statistic but report multiple fit indices representing different types of measures (i.e., other stand-alone indices besides the χ² statistic, such as RMSEA, and incremental fit indices such as CFI and TLI; see Tanaka, 1993, for a discussion of different dimensions along which fit indices can be classified). Table 2 indicates that researchers base model evaluation too much on the χ² test, and we suggest that alternative fit indices (particularly those based on noncentrality) be used more widely in future applications.

Table 2 also reports summary statistics on the distribution of goodness-of-fit measures across previous applications of SEM. The χ² statistic by itself is not a meaningful statistic without taking into account the degrees of freedom of a model so that the ratio of χ² to degrees of freedom (χ²/df) is reported. If either GFI or AGFI was provided in the paper, it is generally possible to calculate the other index, and this was done to increase the sample size. In the case of RMR, only models in which the RMR was based on a correlation matrix are included in the sample. Summary statistics are also reported for MC and RMSEA since they might become more widespread in the future. Besides the three incremental fit indices that were encountered with some degree of frequency (BBI, TLI, and CFI), Table 2 also reports the relevant figures for the relative fit index (RFI) or ρ₁ and the incremental fit index (IFI) or Δ₂, both of which are due to Bollen (1989). Provided that at least one incremental fit index was reported in the paper, it is possible to calculate the χ² of the baseline model and thus any other incremental fit index. Whenever possible, this was done to increase the sample size on which the summary statistics are based. Only models for which the baseline model was the one of complete independence among all measures were used. The effective sample sizes for the norms on χ²/df, GFI, AGFI, RMR, MC, RMSEA, BBI, TLI, CFI, RFI, and IFI are provided in brackets in Table 2.

The medians for χ²/df, GFI, AGFI, RMR, MC, and RMSEA were 1.62, 0.95, 0.91, 0.05, 0.95, and
0.06, respectively. In 54 percent of all cases the hypothesized model was inconsistent with the data based on a χ² goodness-of-fit test at a critical value of 0.05. The corresponding figures for type I, II, and III models were 61, 21, and 60 percent. Thus, type II models tend to achieve better fits than the other two types of models. The reason for this seems to be that these models are generally less complex than type I and type III models. The medians for BBI, TLI, CFI, RFI, and IFI were 0.91, 0.93, 0.95, 0.85, and 0.95. The relative magnitude of the various incremental fit indices is consistent with theoretical expectations (cf. Bollen, 1989). The percentage of applications in which the GFI, AGFI, BBI, TLI, and CFI were smaller than 0.9 (the value presumably indicating acceptable fit for these indices) was 24, 48, 44, 32, and 21, respectively. Although the figures for the incremental fit indices should be interpreted with caution because of the relatively small number of applications on which they are based, it is apparent that the 0.9 value is a reference point that many models do not achieve. For RMSEA, 58 percent of all models had values above 0.05, and 23 percent had values exceeding 0.08. Thus, a sizable proportion of published models falls short of what Browne and Cudeck (1993) call a reasonable fit.

In an effort to ascertain determinants of the overall goodness of fit of a model, χ²/df, GFI, AGFI, MC, RMSEA, BBI, TLI, CFI, RFI, and IFI were correlated with the variables listed in Table 1 and on the bottom of Table 2.⁷ Two important sets of results emerged from this analysis. First, χ²/df was fairly strongly correlated with sample size (r = 0.47). This confirms the problematic dependence of the χ² test on sample size. In addition, AGFI (r = 0.22), BBI (r = 0.31), and RFI (r = 0.35) had a significant and nonnegligible relationship with sample size, corroborating earlier theoretical discussions and simulation evidence (with the exception of GFI) that the means of the sampling distributions of these indices are a positive function of sample size (cf. Bollen, 1989; Marsh et al., 1988; McDonald and Marsh, 1990). Second, there were sizable (negative) effects of model complexity (in terms of number of observed variables, number of observed variables per factor, number of parameters estimated, degrees of freedom, and contribution of the measurement model to the overall number of degrees of freedom) on GFI and MC (r's > |0.5|), BBI (r's > |0.4|), AGFI (r's > |0.3|), and CFI, RFI, and IFI (r's > |0.2|). In contrast, χ²/df, RMSEA, and TLI were unaffected by model complexity. These results indicate that model complexity is an important factor contributing to the contingent nature of goodness-of-fit assessments, and they suggest that general rules of thumb (e.g., that GFI or BBI be greater than 0.9) may be misleading because they ignore such contingencies. On the positive side, χ²/df, RMSEA, and TLI seem to be effective in controlling for model complexity by assessing fit per degree of freedom.

⁷ The possible determinants of the overall goodness-of-fit of a model were submitted to a principal components analysis to identify clusters of variables that behave similarly across samples and to aid in the interpretation of the effect of a given variable. This analysis showed, for example, that number of observed variables, number of latent variables, number of parameters estimated, degrees of freedom, contribution of the measurement model to the overall number of degrees of freedom loaded on a single factor which reflects the complexity of a model.

Assessment of the measurement model. The quality of construct measurement is ascertained by looking at the sign, size, and significance of estimated factor loadings and the magnitude of measurement error. Various indices of reliability can be computed to summarize how well the constructs are measured by their indicators, either at the individual item level (individual-item reliability) or for all measures of a given construct jointly (composite reliability, average variance extracted; cf. Alwin and Jackson, 1979; Bagozzi and Yi, 1988; Fornell and Larcker, 1981; Steenkamp and van Trijp, 1991).

We initially attempted to compute summary measures of measurement reliability for a given model, but this proved too difficult because of the diversity of approaches used by different authors to assess the quality of construct measurement. This makes comparisons across studies almost impossible. For example, in some cases reliability is computed based on estimated model parameters. In other cases, coefficient α is reported for composites of items that are used either as single-indicator 'constructs' or as multi-item individual indicators. Eventually, we only coded whether authors showed any concern for measure reliability by reporting at least one of the various possible reliability indices.

Overall, 78 percent of all applications mentioned some form of reliability assessment (see Table 2). The figures for type I, type II, and type III models were 78, 70, and 82 percent, respectively. The practice of examining measure reliability is least common in type II models, which is not surprising since construct measurement is not modeled explicitly in this case and since reliability cannot be assessed if true single-item measures are used in the analysis. Although reliability assessment is more common in type I and type III models, there is room for improvement even in these cases.

We recommend that in the future authors report at least one measure of construct reliability which is based on estimated model parameters (e.g., composite reliability, average variance extracted). Coefficient α is generally an inferior measure of reliability since in most practical cases it is only a lower bound on reliability. In particular, if coefficient α is entered into the analysis as an external estimate of reliability, it will usually exaggerate unreliability of measurement. Furthermore, in cases where composites of measures are used as individual items, authors should report supplementary evidence on unidimensionality based on factor analyses. Coefficient α is insufficient for this purpose since a scale may not be unidimensional even if it has high reliability (Gerbing and Anderson, 1988).

Assessment of the latent variable model. In models of type II and III, the latent variable model represents the hypotheses of interest. The hypotheses are tested by examining the sign, size, and statistical significance of the structural coefficients. In addition, it is useful to report the percentage of variation in the endogenous constructs accounted for by the exogenous constructs (i.e., the R² for each structural equation). If the model is nonrecursive, these figures have to be interpreted with caution (cf. Teel et al., 1986).

Several authors (e.g., Fornell, 1983) have stressed the importance of distinguishing between variance fit (explained variance in the endogenous variables) and covariance fit (overall goodness of fit statistics testing the applicability of the overidentifying restrictions imposed on the model, such as the χ² test). It seems that authors are sometimes not completely clear on the difference between the two and that the emphasis on covariance fit detracts from a proper concern for variance fit. We specifically coded whether papers reported evidence on variation accounted for, at least for some of the endogenous variables. For type II models this was the case in only 30 percent of all cases, and for type III models the figure was 45 percent. Although we do not have any hard evidence on why these figures are so low, we suspect that one of the reasons might be that since the goodness of fit of the overall model was assessed by means of χ² and related statistics, authors see no need to report what in regression terminology would be called the goodness of fit of each structural equation (i.e., R²). Since covariance fit says nothing about variance fit - a model might fit well but not explain significant amounts of variation in endogenous variables, or conversely fit poorly and explain a large portion of the variance in endogenous variables (Fornell, 1983) - it is recommended that authors report the R² for each structural equation. We hasten to add, however, that the amount of variance explained by a structural equation is only one consideration in evaluating a model and that the meaningfulness of individual structural parameters is probably of greater importance in most cases.

Model modification. It is quite unlikely that the model that is initially specified as a plausible representation of the data (with all items that were collected to measure a given construct included in the analysis and all model parameters estimated based purely on a priori considerations) will be the one that is eventually presented as the most parsimonious summary of the data. There is, however, considerable variation across studies in how readily this fact is acknowledged. Some authors describe in great detail how the measurement model was purified and what modifications were made to the structural model to obtain acceptable goodness-of-fit statistics. Other authors present only the final model and provide no evidence on the process that led to it. It is thus difficult to evaluate how much the initial model was modified.

Various tools are available to locate model misspecifications, including modification indices and residual analysis. Using modification indices, which are reported routinely in the output of LISREL and other programs, researchers can conduct specification searches almost automatically and possibly improve the fit of a model to acceptable levels. Unfortunately, simulation work by MacCallum (1986) and Homburg and Dobratz (1992) indicates that specification searches (particularly modifications of the latent variable model) can go astray and fail to uncover the correct underlying model, particularly when the original model has many specification errors, when the sample size is small, and when the search is guided solely by a desire to improve the overall fit of the model. Since models arrived at through specification searches are rarely cross-validated, it is quite possible that authors are capitalizing on chance when respecifying models that have been found to be lacking in fit. We recommend that model modifications be strongly guided by substantive considerations and that constraints having large modification indices be relaxed only if the resulting parameter change is theoretically and practically meaningful. The expected parameter change statistic available in some computer programs should prove helpful in this regard (Kaplan, 1990).

As argued by Cudeck and Browne (1983), SEM is best conducted in the form of comparisons among different plausible models that are nested in each other and can be justified theoretically. At the structural level, the decision-tree framework suggested by Anderson and Gerbing (1988) may be quite helpful in this regard. Besides avoiding the dangers of specification searches which are not guided by theory, the comparison of different models should also guard against the frequently encountered but mistaken notion that a theoretical model which achieves an acceptable (covariance) fit has somehow been shown to be the most plausible representation of the data. Such thinking ignores the very real possibility of model equivalence (i.e., two different parametric structures with possibly very different theoretical implications summarize the data equally well as shown by equivalent χ² values; cf. Stelzl (1986), and Luijben (1991)) or near model equivalence (i.e., two different models are not formally equivalent but more or less equally consistent with the data; cf. Breckler (1990)).

As shown in Table 2, in 54 percent of all applications authors acknowledged some form of specification search. These included deletion of items because of low item-total correlations or bad performance in exploratory or confirmatory factor analyses, specifications on the basis of preliminary explorations of the data (e.g. through exploratory factor analysis), respecifications of the measurement model on the basis of modification indices or residual analysis (e.g., introduction of correlated measurement errors), addition of structural paths or correlated errors in equations because some overidentifying restrictions were not met, and pruning of the model of nonsignificant parameters on the basis of t-values or χ² difference tests. Model comparisons were performed in 31 percent of all cases, and cross-validation - defined broadly as the estimation of the same model on at least two sets of data, with no requirement that the two models be compared explicitly using multi-sample analysis procedures or the cross-validation approaches suggested by Browne and Cudeck (1989), Cudeck and Browne (1983), and Homburg (1991) - was conducted in only 21 percent of the cases. The low incidence of cross-validation coupled with the presumably frequent practice of searching for an acceptable model specification (we would venture to guess that most models reported in the literature have been modified at least to some extent) imply that the replicability of findings obtained through SEM may often be doubtful (cf. Steiger, 1990). Furthermore, the figure on how often a target model is compared to alternative specifications suggests that the benefits of model comparisons have not been realized by many authors. We recommend that in the future authors be frank in their reporting of how they arrived at their final model and that a strategy of model comparison be adopted whenever possible. The expected cross-validation index of Browne and Cudeck (1989), Browne and Cudeck (1993) and related model selection criteria may prove helpful in this regard.

Residual analysis. Traditionally, residual analysis in the context of SEM has referred to an examination of the difference between the observed (sample) variance-covariance matrix and the estimated variance-covariance matrix implied by a particular model specification. This is in contrast to regression analysis, where residual analysis refers to an examination of the difference between observed and predicted values of a dependent variable. Recent work by Bollen and Arminger (1991) shows that model-based residual analysis (of both estimated errors in
variables and errors in equations) may be useful in the detection of outliers and influential cases and in the assessment of assumptions such as normality. The LISRES macro discussed previously (Davis, 1992) implements the Bollen and Arminger (1991) ideas and allows researchers to conduct sensitivity analysis on their model. This work is too recent to be reflected in previous applications of SEM, but we recommend that researchers give serious consideration to using these tools in their work.

4. Discussion

Structural equation modeling has become an established component of the methodological repertoire of marketing and consumer behavior researchers. There are at least two features that make SEM an attractive candidate for purposes of data analysis. First, SEM allows the researcher to take into account explicitly the inherent fallibility of behavioral science data and to assess and correct for measure unreliability provided multiple indicators of each construct are available. Second, SEM makes it possible to investigate in a straightforward fashion comprehensive theoretical frameworks in which the effects of constructs are propagated across multiple layers of variables via direct, indirect, or bi-directional paths of influence. These advantages, coupled with the development of ever more sophisticated, yet surprisingly user-friendly computer programs to estimate and test such models, make it rather likely that SEM will enjoy widespread use in future research.

As with any other research tool that offers powerful data analysis capabilities, however, SEM has to be used prudently if researchers want to take full advantage of its potential. Based on prior methodological discussions, our own work in the area, and particularly the present review of previous applications of SEM in the Journal of Marketing, Journal of Marketing Research, International Journal of Research in Marketing, and the Journal of Consumer Research, we would offer the following general guidelines to future users of these techniques. First, careful thought should be given to model specification issues before empirical data are ever collected. Experience indicates that structural equation models are most profitably specified for relatively well-defined theoretical frameworks of moderate complexity in which each construct is measured by a fairly compact set of indicators. SEM is usually not the most useful technique in the early, exploratory stages of research when the measurement structure underlying a set of items is not well established and theoretical guidance concerning possible patterns of relationships among constructs is lacking. Furthermore, although fairly complex models are specified quite easily, it is generally advisable to refrain from formulating models that are too grandiose because the analysis easily degenerates into an exercise in data mining, resulting in models with suspect statistical properties and questionable substantive implications (cf. Bentler and Chou, 1987).

Second, once the data are available, they should be screened carefully before a moment matrix is computed and particular models of interest are investigated. We specifically recommend that researchers make greater use of the diagnostic tools that are beginning to appear in the literature (e.g., outlier analysis as implemented in the LISRES macro) and that some evidence of approximate normality based on skewness and kurtosis be presented in the paper (as reported in PRELIS, EQS, etc.). As pointed out by Martin (1987), among others, the powerful capabilities of SEM derive partly from highly restrictive simplifying assumptions, and researchers have to make sure that these assumptions are not too grossly violated.

Third, with regard to model estimation and testing, the consistency of a given model specification with empirical data should be assessed in terms of a variety of global and local fit measures, alternative theoretical models should be considered whenever possible, and research results should be cross-validated, particularly when the original model formulation was revised considerably through specification searches. Steiger (1990) argues this point most forcefully by stating, "Perhaps a moratorium should be declared on publication of causal modeling articles using any PMM [post hoc model modification] procedure ... unless such articles provide evidence of cross-validation" (p. 176). Although the replicability of findings is an important concern of scientific research in general, the ease with which data-driven model modifications are conducted using modification indices and related statistics suggests that special attention be paid to this issue in future applications of SEM. Table 3 summarizes the major problem areas identified in our review of previous applications of SEM and offers specific recommendations about how to improve the practice of latent variable modeling.

As this review of previous applications of SEM in four major marketing and consumer behavior journals has shown, LISREL and related techniques have already had a substantial impact on empirical research in our discipline. The question arises whether the widespread use of SEM has had a positive influence on research from a substantive perspective.⁸ Several authors have voiced critical comments in this regard. For example, Martin (1987) suggests that the complexities of SEM may place undue emphasis on methodological aspects and divert attention from sound theorizing. Particularly in earlier applications of SEM, authors often felt compelled to explain the basics of the new technique in great detail (usually with the help of most letters of the Greek alphabet) and given the space constraints in journals, the result probably was a decreased concern with theory development. It is hoped that as familiarity with the methodology increases, researchers will be able to present technical matters more succinctly and accord greater attention to substantive issues.

⁸ We thank an anonymous reviewer for suggesting a discussion of these issues.

Another problem has been the careless use of causal terminology (Biddle and Marlin, 1987; Breckler, 1990; Cliff, 1983; Martin, 1987). Since SEM is almost always based on correlational data and since the vast majority of studies use data collected in a single wave, causal conclusions are usually unwarranted, and it is probably best not to use the term causal modeling. A related problem occurs in connection with the interpretation of a model that has been found to fit the data. Because of such issues as model equivalence or near model equivalence, a good fit should not be interpreted to imply that the proposed model is the 'true' representation of the structure underlying the data (Breckler, 1990). The only legitimate conclusion is that the proposed model is one possible plausible account of the data. By grounding hypothesized patterns of effects strongly in extant theoretical frameworks, by using panel designs in which data from the same subjects are collected at multiple points in time so that certain patterns of influence can be ruled out, and by comparing the proposed model to various competing specifications, the researcher can take steps to safeguard against likely alternative explanations.

A final issue concerns the notion of SEM being a confirmatory method of analysis. If hypothesized models are truly specified a priori and no data-based model modifications are introduced, SEM is indeed used in a confirmatory manner. However, in practice respecifications of either the measurement model or the latent variable model or both are quite common, and SEM is then used in a more exploratory fashion. The conclusion that a particular model has been 'confirmed' by the data becomes suspect in such cases, and cross-validation is necessary to ascertain how well the model will hold up in an actual confirmatory analysis (cf. Biddle and Marlin, 1987; Breckler, 1990).

Although some authors have taken a rather pessimistic view on the value of regression models and related techniques for empirical research - for example, Freedman (1991) argues that they "make it all too easy to substitute technique for work" (p. 300) - we believe that on balance researchers have put the powerful capabilities of SEM to good use and that empirical work has benefited from the application of this methodology. In particular, we think that the explicit emphasis on multi-item measurement of constructs and the resultant ability to assess the validity and reliability of construct measurement has done much to bring the importance of these issues to the attention of marketing and consumer behavior researchers and to establish what Ray (1979) calls a marketing measurement tradition. It is clear that valid and reliable measurement is a prerequisite to theory testing, and SEM has certainly contributed to theory development in this sense (see also Bagozzi (1984)). We hope that our review of prior applications of SEM will further improve the quality of empirical research in marketing and consumer behavior and ultimately advance our understanding of substantive phenomena.
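
The data-screening recommendation made in the Discussion (presenting some evidence of approximate normality based on skewness and kurtosis) might be sketched in Python roughly as follows. This is only a minimal illustration using simple moment-based coefficients: the function names, the example scores, and the default cutoffs are assumptions introduced here, and the statistics reported by PRELIS or EQS may be defined differently.

```python
def skewness_and_kurtosis(values):
    """Return (skewness, excess kurtosis) from population (biased) moments."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    m2 = sum(d ** 2 for d in deviations) / n  # second central moment
    if m2 == 0:
        raise ValueError("variable has zero variance")
    m3 = sum(d ** 3 for d in deviations) / n  # third central moment
    m4 = sum(d ** 4 for d in deviations) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # equals 0 for a normal distribution
    return skewness, excess_kurtosis


def screen_indicators(data, max_abs_skew=2.0, max_abs_kurt=7.0):
    """Flag indicators whose skewness or excess kurtosis exceeds the cutoffs.

    `data` maps indicator names to lists of observed scores. The default
    cutoffs are illustrative assumptions, not values taken from the paper.
    """
    report = {}
    for name, values in data.items():
        skew, kurt = skewness_and_kurtosis(values)
        flagged = abs(skew) > max_abs_skew or abs(kurt) > max_abs_kurt
        report[name] = (skew, kurt, flagged)
    return report


if __name__ == "__main__":
    # Hypothetical item scores for two indicators.
    sample = {"x1": [1, 2, 3, 4, 5], "x2": [1, 1, 1, 1, 10]}
    results = screen_indicators(sample, max_abs_skew=1.0)
    for name, (skew, kurt, flagged) in results.items():
        print(f"{name}: skewness={skew:.2f}, "
              f"excess kurtosis={kurt:.2f}, flagged={flagged}")
```

A univariate check of this kind is only a first step; it says nothing about multivariate normality or about outliers and influential cases, for which the model-based diagnostics mentioned earlier would be needed.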

Acknowledgements

The authors thank Nirmalya Kumar, Jan-Benedict Steenkamp, two anonymous reviewers, and the editor for helpful comments on previous versions of this paper.

References

Alwin, D.F. and D.J. Jackson, 1979. Measurement models for response errors in surveys: Issues and applications. In: K.F. Schuessler (ed.), Sociological Methodology 1980, 68-119. San Francisco: Jossey-Bass.
Anderson, J.C. and D.W. Gerbing, 1984. The effects of sampling error on convergence, improper solutions and goodness-of-fit indices for maximum likelihood confirmatory factor analysis. Psychometrika 49, 155-173.
Anderson, J.C. and D.W. Gerbing, 1988. Structural equation modeling in practice: A review and recommended two-step approach. Psychological Bulletin 103, 411-423.
Bagozzi, R.P., 1980. Causal models in marketing. New York: Wiley.
Bagozzi, R.P., 1984. A prospectus for theory construction in marketing. Journal of Marketing 48, 11-29.
Bagozzi, R.P. and H. Baumgartner, 1994. The evaluation of structural equation models and hypothesis testing. In: R.P. Bagozzi (ed.), Principles of Marketing Research, 386-422. Cambridge, MA: Blackwell.
Bagozzi, R.P. and T.F. Heatherton, 1994. A general approach to representing multifaceted personality constructs: Application to state self-esteem. Structural Equation Modeling 1, 35-67.
Bagozzi, R.P. and Y. Yi, 1988. On the evaluation of structural equation models. Journal of the Academy of Marketing Science 16, 74-94.
Bagozzi, R.P. and Y. Yi, 1989. On the use of structural equation models in experimental designs. Journal of Marketing Research 26, 271-284.
Bagozzi, R.P. and Y. Yi, 1991. Multitrait-multimethod matrices in consumer research. Journal of Consumer Research 17, 426-439.
Bagozzi, R.P., Y. Yi, and L.W. Phillips, 1991. Assessing construct validity in organizational research. Administrative Science Quarterly 36, 421-458.
Bass, F.M., 1969. A new product growth model for consumer durables. Management Science 15, 215-227.
Bentler, P.M., 1980. Multivariate analysis with latent variables: Causal modeling. Annual Review of Psychology 31, 419-456.
Bentler, P.M., 1989. EQS: Structural equations program manual. Los Angeles, CA: BMDP Statistical Software.
Bentler, P.M., 1990. Comparative fit indexes in structural models. Psychological Bulletin 107, 238-246.
Bentler, P.M. and D.G. Bonett, 1980. Significance tests and goodness of fit in the analysis of covariance structures. Psychological Bulletin 88, 588-606.
Bentler, P.M. and C.P. Chou, 1987. Practical issues in structural modeling. Sociological Methods and Research 16, 78-117.
Biddle, B.J. and M.M. Marlin, 1987. Causality, confirmation, credulity, and structural equation modeling. Child Development 58, 4-17.
Bollen, K.A., 1989. Structural equations with latent variables. New York: Wiley.
Bollen, K.A. and G. Arminger, 1991. Observational residuals in factor analysis and structural equation models. In: P.V. Marsden (ed.), Sociological Methodology 1991, 235-262. Washington: American Sociological Association.
Bollen, K.A. and J.S. Long, 1993. Introduction. In: K.A. Bollen and J.S. Long (eds.), Testing structural equation models, 1-9. Newbury Park, CA: Sage.
Breckler, S.J., 1990. Applications of covariance structure modeling in psychology: Cause for concern? Psychological Bulletin 107, 260-273.
Browne, M.W., 1982. Covariance structures. In: D.M. Hawkins (ed.), Topics in applied multivariate analysis, 72-141. Cambridge, England: Cambridge University Press.
Browne, M.W., 1984. Asymptotically distribution-free methods for the analysis of covariance structures. British Journal of Mathematical and Statistical Psychology 37, 62-83.
Browne, M.W. and R. Cudeck, 1989. Single sample cross-validation indices for covariance structures. Multivariate Behavioral Research 24, 445-455.
Browne, M.W. and R. Cudeck, 1993. Alternative ways of assessing model fit. In: K.A. Bollen and J.S. Long (eds.), Testing structural equation models, 136-162. Newbury Park, CA: Sage.
Browne, M.W. and G. Mels, 1992. RAMONA user's guide. Department of Psychology, Ohio State University, Columbus, Ohio.
Cliff, N., 1983. Some cautions concerning the application of causal modeling methods. Multivariate Behavioral Research 18, 115-126.
Cudeck, R., 1989. Analysis of correlation matrices using covariance structure models. Psychological Bulletin 105, 317-327.
Cudeck, R. and M.W. Browne, 1983. Cross-validation of covariance structures. Multivariate Behavioral Research 18, 147-167.
Davis, W.R., 1992. The LISRES macro. Unpublished manuscript, University of North Carolina.
Dillon, W.R., 1986. Building consumer behavior models with LISREL: Issues in applications. In: D. Brinberg and R.J. Lutz (eds.), Perspectives on Methodology in Consumer Research, 107-154. New York: Springer.
Duncan, O.D., 1975. Introduction to Structural Equation Models. New York: Academic Press.
Fornell, C., 1983. Issues in the application of covariance structure analysis. Journal of Consumer Research 9, 443-448.
Fornell, C. and F.L. Bookstein, 1982. Two structural equation models: LISREL and PLS applied to consumer exit-voice theory. Journal of Marketing Research 19, 440-452.
Fornell, C. and D.F. Larcker, 1981. Evaluating structural equation models with unobservable variables and measurement errors. Journal of Marketing Research 18, 39-50.
Fraser, C., 1980. COSAN user's guide. Centre for Behavioral Studies, University of New England, Armidale, New South Wales, Australia.
Freedman, D.A., 1987. As others see us: A case study in path analysis. Journal of Educational Statistics 12, 101-128.
Freedman, D.A., 1991. Statistical models and shoe leather. In: P.V. Marsden (ed.), Sociological Methodology 1991, 291-313. Oxford, England: Basil Blackwell.
Gerbing, D.W. and J.C. Anderson, 1988. An updated paradigm for scale development incorporating unidimensionality and its assessment. Journal of Marketing Research 25, 186-192.
Gerbing, D.W. and J.C. Anderson, 1993. Monte carlo evaluations of goodness-of-fit indices for structural equation models. In: K.A. Bollen and J.S. Long (eds.), Testing structural equation models, 40-65. Newbury Park, CA: Sage.
Hattie, J.A., 1985. Methodology review: Assessing unidimensionality of tests and items. Applied Psychological Measurement 9, 139-164.
Homburg, C., 1991. Cross-validation and information criteria in causal modeling. Journal of Marketing Research 28, 137-144.
Homburg, C. and A. Dobratz, 1992. Covariance structure analysis via specification searches. Statistical Papers 33, 119-142.
Hu, L., P.M. Bentler, and Y. Kano, 1992. Can test statistics in covariance structure analysis be trusted? Psychological Bulletin 112, 351-362.
Jöreskog, K.G., 1993. Testing structural equation models. In: K.A. Bollen and J.S. Long (eds.), Testing structural equation models, 294-316. Newbury Park, CA: Sage.
Jöreskog, K.G. and D. Sörbom, 1993a. LISREL8: User's reference guide. Mooresville, IN: Scientific Software.
Jöreskog, K.G. and D. Sörbom, 1993b. PRELIS: A program for multivariate data screening and data summarization. Mooresville, IN: Scientific Software.
Kaplan, D., 1990. Evaluating and modifying covariance structure models: A review and recommendation. Multivariate Behavioral Research 25, 137-155.
Kenny, D.A., 1979. Correlation and causality. New York: Wiley.
Luijben, T.C.W., 1991. Equivalent models in covariance structure analysis. Psychometrika 56, 653-665.
MacCallum, R., 1986. Specification searches in covariance structure modeling. Psychological Bulletin 100, 107-120.
Marsh, H.W., J.W. Balla, and R.P. McDonald, 1988. Goodness-of-fit indices in confirmatory factor analysis: Effects of sample size. Psychological Bulletin 103, 391-411.
Martin, J.A., 1987. Structural equation modeling: A guide for the perplexed. Child Development 58, 33-37.
McDonald, R.P., 1989. An index of goodness-of-fit based on noncentrality. Journal of Classification 6, 97-103.
McDonald, R.P. and H.W. Marsh, 1990. Choosing a multivariate model: Noncentrality and goodness of fit. Psychological Bulletin 107, 247-255.
Mulaik, S.A., L.R. James, J. Van Alstine, N. Bennett, S. Lind, and C.D. Stilwell, 1989. Evaluation of goodness-of-fit indices for structural equation models. Psychological Bulletin 105, 430-445.
Muthén, B.O., 1987. LISCOMP: Analysis of linear structural relations with a comprehensive measurement model. Mooresville, IN: Scientific Software.
Ray, M.L., 1979. The critical need for a marketing measurement tradition: A proposal. In: O.C. Ferrel, S.W. Brown, and C.W. Lamb (eds.), Conceptual and theoretical developments in marketing, 34-48. Chicago, IL: American Marketing Association.
Schaubroeck, J., 1990. Investigating reciprocal causation in organizational behavior research. Journal of Organizational Behavior 11, 17-28.
Schoenberg, R.J., 1989. LINCS: Linear covariance structure analysis. User's guide. Kent, WA: RJS Software.
Sharma, S., S. Durvasula, and W.R. Dillon, 1989. Some results on the behavior of alternate covariance structure estimation procedures in the presence of non-normal data. Journal of Marketing Research 26, 214-221.
Steenkamp, J.B. and H. van Trijp, 1991. The use of LISREL in validating marketing constructs. International Journal of Research in Marketing 8, 283-299.
Steiger, J.H., 1989. EZPATH causal modeling: A supplementary module for SYSTAT and SYGRAPH. Evanston, IL: SYSTAT.
Steiger, J.H., 1990. Structural model evaluation and modification: An interval estimation approach. Multivariate Behavioral Research 25, 173-180.
Stelzl, I., 1986. Changing a causal hypothesis without changing the fit: Some rules for generating equivalent path models. Multivariate Behavioral Research 21, 309-331.
Tanaka, J.S., 1993. Multifaceted conceptions of fit in structural equation models. In: K.A. Bollen and J.S. Long (eds.), Testing structural equation models, 10-39. Newbury Park, CA: Sage.
Teel, J.E., W.O. Bearden, and S. Sharma, 1986. Interpreting LISREL estimates of explained variance in nonrecursive structural equation models. Journal of Marketing Research 23, 164-168.
Tucker, L.R. and C. Lewis, 1973. The reliability coefficient for maximum likelihood factor analysis. Psychometrika 38, 1-10.
