
NEIL T. DIAMOND AND EWA M. SZTENDUR
INTRODUCTION TO SEM WITH STATA - DAY 1
ESQUANT STATISTICAL CONSULTING PTY LTD
Copyright © 2014 Neil T. Diamond and Ewa M. Sztendur
published by esquant statistical consulting pty ltd
typeset with tufte-latex
For academic use only. You may not reproduce or distribute without permission of the authors.
First printing, October 2014

Contents
1 Introduction to SEM
2 SEM Builder
3 Stata SEM Commands
4 Datasets
5 Bibliography
1 Introduction to SEM
1.1 The basics
Structural equation modelling (SEM) is a statistical methodology that takes a confirmatory rather than exploratory approach to the analysis of a structural theory of some phenomenon. In that respect, the aim of SEM is to determine whether a given model is valid rather than to find a suitable model. SEM is primarily a latent-variable approach, which means that interest usually focuses on theoretical constructs that cannot be observed directly (latent constructs). Such constructs arise in many disciplines, including the social and behavioural sciences as well as economics. Examples of latent variables are intelligence, motivation, attitude, liberalism, self-esteem, stress, verbal ability, mathematical ability and teacher expectancy. Traditional multivariate methods cannot assess how measurement error distorts causal inferences. SEM, by contrast, provides explicit estimates of the error variance parameters. Whereas analyses using traditional methods are based solely on observed measurements, SEM procedures can incorporate both unobserved and observed variables.
1.1.1 Latent versus manifest variables
Latent variables are unobserved variables that are inferred (through a mathematical model) from multiple observed, or manifest, variables. For example, a latent variable "substance use" can be derived from observed items measuring behaviours such as alcohol use, cigarette use and marijuana use. Mathematical models that aim to explain observed variables in terms of latent variables are called latent variable models. Latent variable models are used in many disciplines, including the social sciences and psychology, economics, management, marketing, medicine, physics and bioinformatics.
1.1.2 Exogenous versus endogenous latent variables

Exogenous latent variables are like independent variables in ANOVA or predictors in regression: they cause fluctuations in the values of other latent variables in the model. Changes in the values of exogenous variables are not explained by the model. Instead, they are considered to be influenced by factors external to the model. Background variables such as gender or age are examples of such external factors.

Endogenous latent variables are like dependent variables in ANOVA or outcome (criterion) variables in regression. They are influenced by the exogenous variables in the model, either directly or indirectly. Fluctuations in the values of endogenous variables are said to be explained by the model, because all latent variables that influence them are included in the model specification. Unlike in ANOVA or regression, endogenous variables in SEM can also predict other variables in the model.
1.1.3 The factor analytic model

Factor analysis is a procedure for examining the nature of interrelationships between sets of observed and latent variables. The method takes a large set of variables that represent elements of an abstract construct and reduces it to a smaller, more manageable set of underlying concepts. For example, we can analyse intelligence in terms of perception, quantification, word fluency, verbal ability, spatial ability, memory and reasoning.

In another example, we could examine a large set of behaviours within an individual and categorise them as representing different conceptual elements of the person's psychological state. Loss of appetite, lack of motivation, withdrawal, and feelings of sadness and guilt might reflect underlying "depression". Sleeplessness, worried thoughts, racing heart, hot and cold flushes, and nail biting might be indicative of "anxiety". Depression and anxiety would each be composed of a set of related elements, with each set of elements unrelated to the other set. Each set of related elements represents a distinct factor. The intercorrelation of variables within a factor suggests that those variables, taken together, represent a single concept that can be distinguished from other factors. This is what allows depression to be distinguished from anxiety.

We might also be interested in the relative strength of the association between each variable within a factor and the concept that the factor represents. For example, what is the relationship between having worried thoughts and the concept of anxiety? In addition to categorising variables into factors, factor analysis also weights each variable within a factor. These coefficients, called factor loadings, measure the strength of the relationship between the individual variable and the factor. In a standardised solution where a variable loads on a single factor, the loading is the correlation between the variable and that factor.
There are two basic types of factor analysis: exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). As an exploratory approach, factor analysis can be used to sort through a large number of variables in an effort to reveal links between the observed and latent variables. This type of analysis may represent early stages of inquiry, when concepts and relationships are not yet sufficiently understood to propose relevant hypotheses. The exploratory approach is often referred to as a theory-building approach. The confirmatory approach, on the other hand, is used when the researcher has some knowledge of the underlying latent variable structure (often based on exploratory findings). The confirmatory approach is often referred to as a theory-confirming approach.

In summary, both EFA and CFA are procedures used to reduce a large number of inter-related measured variables to a smaller number of underlying factors. Both focus solely on how, and to what extent, the observed variables are linked to their underlying latent factors.
1.2 Outline of the Workshop
1.2.1 Day 1
Introduction to Stata Menus: Reading data into Stata. Cleaning a dataset.
Introduction to Stata Commands: Turning the Review window into a do-file. Running do-files. A discussion of some useful Stata commands.
Statistics in Stata: Using the menus for simple statistical methods and the corresponding Stata commands. Revision of Correlation and Regression.
Some Multivariate Methods in Stata: Reliability Analysis, Principal Components, and Exploratory Factor Analysis.
1.2.2 Day 2
Introduction to SEM Builder: Confirmatory Factor Analysis using the SEM Builder. Simple Structural Equation Models. Fit indices.
Introduction to SEM Commands: Understanding the model syntax. Modifying the syntax generated by the SEM Builder.
Some further commands for SEM: Using a covariance matrix as input. Constraints.
Some more details: Estimators, standard errors, and missing values. Identification.
1.2.3 Day 3
What to do when the model does not fit: Modification Indices.
More on SEM: Multiple Groups Analysis and Growth Curve Models.
Reporting SEM: What to include. Modifying the diagram for publication.
2 SEM Builder
The SEM Builder is a graphical user interface for building and fitting structural equation models in Stata.
In the SEM Builder, and more generally, structural equation models are portrayed as diagrams. These diagrams use particular configurations of four geometric symbols: a circle (or ellipse), a square (or rectangle), a single-headed arrow, and a double-headed arrow. By convention:
• circles (or ellipses) represent unobserved latent variables;
• squares (or rectangles) represent observed variables;
• single-headed arrows (→) represent the impact of one variable on another;
• double-headed arrows (↔) represent covariances or correlations between pairs of variables.
2.1 An Example of Using the SEM Builder
As an example, we will use a subset of the classic Holzinger and Swineford (1939) dataset.¹ In this section, however, we will only concern ourselves with three of the variables, x1, x2 and x3, which are related to visual perception.

¹ From the help file in the lavaan package (Rosseel, 2013): The classic dataset consists of mental ability test scores of seventh and eighth grade children from two different schools (Pasteur and Grant-White). In the original dataset (available in the MBESS package), there are scores for 26 tests. However, a smaller subset with 9 variables is more widely used in the literature (for example in Joreskog's 1969 paper, which uses the 145 subjects from the Grant-White school only).

K. Holzinger and F. Swineford. A study in factor analysis: The stability of a bifactor solution. Number 48 in Supplementary Educational Monograph. University of Chicago Press, Chicago, 1939; and K. G. Joreskog. A general approach to confirmatory maximum likelihood factor analysis. Psychometrika, 34:183–202, 1969.

2.2 Specifying the data
The first step is to specify the data. Since the data are in a .csv file, we can use File ⊳ Import ⊳ Text data (delimited, *.csv, . . . ) and browse for the HolzingerSwineford1939.csv file.

2.3 Specifying the model using the SEM Builder
Here are the steps involved:
• Choose Statistics ⊳ SEM (structural equation modeling) ⊳ Model building and estimation. The SEM Builder screen will open.
Figure 2.1: SEM builder screen
• On the left-hand side, click the Add Measurement Component (M) button. Then click on the canvas near the top centre. The measurement component dialog box will open.
Figure 2.2: Measurement Component Dialog Box
• Change the latent variable name to "Visual". It is a good idea to follow the convention that latent variable names begin with a capital letter and observed variable names are all lower case.
• Use the drop down menu in the Measurement variables box and
click on x1, x2 and x3.
• Click OK and the model will be shown.
• Choose Estimation ⊳ Estimate, choose Maximum likelihood and press OK. You should get the following graph.
Figure 2.3: Estimated Visual congeneric factor model for the Holzinger and Swineford Dataset.
2.3.1 Interpretation of the Model

• The means of the three observed variables are 4.94, 6.09 and 2.25.
• The values of 1, 0.78 and 1.1 on the arrows from the latent variable to the observed variables are the loadings. These are the regression coefficients of the three observed variables on the latent variable "Visual".
– Note that the loading on the first variable is set to 1. The latent variable needs a scale, and by default this scale is set to be the same as that of the first observed variable.
• The mean of the latent variable is fixed at 0 and is not shown.
• The variance of the latent variable is estimated to be 0.52.
• The residual variances for x1, x2 and x3 (i.e. the variance not explained by the latent variable "Visual") are 0.83, 1.06 and 0.63, respectively.
2.4 Standardised Model
An alternative is to fix the variance of the latent variable at 1 and standardise all the manifest variables. To do this:
• Select View ⊳ Standardized Estimates.
The revised model is shown in Figure 2.4.
Figure 2.4: Estimated Visual standardised congeneric factor model for the Holzinger and Swineford Dataset.
2.4.1 Interpretation of the Standardised Model

• Note that the loading on x1 is now different from 1. The latent variable still needs a scale, and this is now provided by fixing its variance at 1.
• The correlations between the latent variable and the observed variables are 0.62, 0.48 and 0.71, respectively.
• The mean of the latent variable is 0.
• The residual variances for x1, x2 and x3 (i.e. the proportion of the variance not explained by the latent variable "Visual") are 0.61, 0.77 and 0.5, respectively. Up to rounding, each equals one minus the square of the corresponding standardised loading.
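As a check on the arithmetic, the standardised values can be recovered from the unstandardised estimates in Section 2.3.1. The sketch below uses Python, which is our choice for these small calculations and not part of Stata. It takes the rounded values quoted above: loadings 1, 0.78 and 1.1; latent variance 0.52; residual variances 0.83, 1.06 and 0.63. It assumes the one-factor model, in which each indicator's model-implied variance is λ²φ + θ. It reproduces the arithmetic only, not Stata's internal computation.

```python
import math

# Rounded unstandardised estimates from the Visual model (Section 2.3.1).
loadings = {"x1": 1.0, "x2": 0.78, "x3": 1.1}
residual_var = {"x1": 0.83, "x2": 1.06, "x3": 0.63}
factor_var = 0.52


def standardise(lam, theta, phi):
    """Return (standardised loading, standardised residual variance)
    for one indicator of a single-factor model."""
    implied_var = lam ** 2 * phi + theta          # model-implied variance of the indicator
    std_loading = lam * math.sqrt(phi / implied_var)
    std_residual = theta / implied_var            # equals 1 - std_loading**2
    return std_loading, std_residual


for name in loadings:
    sl, sr = standardise(loadings[name], residual_var[name], factor_var)
    print(f"{name}: standardised loading {sl:.2f}, residual variance {sr:.2f}")
```

With these rounded inputs, the standardised loadings come out at about 0.62, 0.48 and 0.71. The standardised residual variances come out at about 0.61, 0.77 and 0.50.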
2.5 Creating a CFA example in SEM Builder
The "one-factor congeneric" model for Visual has no degrees of freedom: it is a just-identified model. Its three indicators provide 6 variances and covariances, which exactly matches the 6 free parameters (2 loadings, 1 latent variance and 3 residual variances). A one-factor model needs at least four indicators to be over-identified.

Now we will examine all nine observed variables. This is an example of Confirmatory Factor Analysis. Our model is that:
• x1, x2 and x3 load on Visual;
• x4, x5 and x6 load on Textual;
• x7, x8 and x9 load on Speed;
• the three latent variables are distinct concepts but are correlated with each other.
Let's use Stata's SEM Builder to fit this model to the nine-variable Holzinger and Swineford dataset.
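The counting argument behind the four-indicator rule can be sketched in Python. The helper one_factor_df below is hypothetical and written only for this illustration. It assumes that means are not counted and that the first loading is fixed at 1. Under these assumptions, a model with k indicators has k(k + 1)/2 variances and covariances and 2k free parameters.

```python
def one_factor_df(k: int) -> int:
    """Degrees of freedom of a one-factor congeneric model with k indicators.

    Means are not counted; the first loading is fixed at 1 to give the
    latent variable its scale.
    """
    moments = k * (k + 1) // 2        # distinct variances and covariances
    params = (k - 1) + 1 + k          # free loadings + latent variance + residual variances
    return moments - params


for k in range(2, 6):
    print(f"{k} indicators: df = {one_factor_df(k)}")
```

The results fall into three cases:
• Two indicators give a negative value: the model is under-identified without further constraints.
• Three indicators give zero: the model is just-identified.
• Four or more indicators give positive degrees of freedom: the model is over-identified.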
1. Choose Estimation ⊳ Clear Estimates
2. Type "S" to choose the Select button.
3. Holding down the Shift key, select the Visual model and move it to the left of the canvas.
4. Type "M" to choose the Add Measurement Component button, and click in the middle of the canvas at about the same level as the Visual latent variable.
5. In the dialog box, change the latent variable name to "Textual" and associate the observed variables x4, x5 and x6. Press OK.
6. Type "M" to choose the Add Measurement Component button, and click on the right of the canvas at about the same level as the Visual and Textual latent variables.
7. In the dialog box, change the latent variable name to "Speed" and associate the observed variables x7, x8 and x9. Press OK.
8. Type "C" to choose the Add Covariance button.
9. Click on the Visual latent variable and drag the covariance to the Textual latent variable. You can adjust the position of the double-headed covariance arrow by moving the little circles on the latent variable ellipses. You can also adjust the curve of the arrow by moving the circle on the end of its "handle".
10. Do the same for Visual and Speed, and for Textual and Speed.
11. Estimate the parameters using Estimation ⊳ Estimate. Use these settings:
• Main tab: choose Maximum likelihood.
• Reporting tab: select Display standardized coefficients and values².
• Advanced tab: check Do not estimate means or intercepts.
12. Display the standardized solution in the diagram using View ⊳ Standardized Estimates.
13. If you have done everything correctly, the model should appear as in Figure 2.5.

² You can also get the standardized estimates to display by estimating the unstandardized model and then choosing View ⊳ Standardized Estimates. However, if you do this, the standardized estimates appear only in the diagram. The Results window shows the unstandardized estimates.
Figure 2.5: Estimated CFA Model for Holzinger-Swineford Data Set
2.6 The output
Structural equation model                       Number of obs = 301
Estimation method = ml
Log likelihood = -3737.7449

 ( 1)  [x1]Visual = 1
 ( 2)  [x4]Textual = 1
 ( 3)  [x7]Speed = 1
−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
| OIM
Standardized | Coef. Std. Err. z P>|z| [95% Conf. Interval]
−−−−−−−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
Measurement |
x1 <− |
Visual | .7718802 .0575346 13.42 0.000 .6591144 .8846459
_cons | 4.234926 .1819724 23.27 0.000 3.878267 4.591586
−−−−−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
x2 <− |
Visual | .4236006 .062738 6.75 0.000 .3006364 .5465649
_cons | 5.179137 .2188139 23.67 0.000 4.75027 5.608005
−−−−−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
x3 <− |
Visual | .5811323 .0584538 9.94 0.000 .4665651 .6956996
_cons | 1.993107 .0996045 20.01 0.000 1.797886 2.188328
−−−−−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
x4 <− |
Textual | .8515823 .0226412 37.61 0.000 .8072064 .8959581
_cons | 2.633762 .1218401 21.62 0.000 2.39496 2.872564
−−−−−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
x5 <− |
Textual | .8550654 .0221923 38.53 0.000 .8115693 .8985616
_cons | 3.369123 .1489219 22.62 0.000 3.077242 3.661005
−−−−−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
x6 <− |
Textual | .8380101 .0235412 35.60 0.000 .7918702 .88415
_cons | 1.998179 .0997732 20.03 0.000 1.802627 2.193731
−−−−−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
x7 <− |
Speed | .5695148 .0583107 9.77 0.000 .4552279 .6838017
_cons | 3.848319 .1671013 23.03 0.000 3.520807 4.175832
−−−−−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
x8 <− |
Speed | .7230442 .0622861 11.61 0.000 .6009657 .8451228
_cons | 5.46731 .2301649 23.75 0.000 5.016195 5.918424
−−−−−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
x9 <− |
Speed | .6650094 .0660831 10.06 0.000 .5354889 .7945299
_cons | 5.334255 .2249189 23.72 0.000 4.893422 5.775088
−−−−−−−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
var(e.x1) | .404201 .0888196 .2627564 .6217867
var(e.x2) | .8205625 .0531517 .7227287 .9316398
var(e.x3) | .6622852 .0679387 .5416601 .809773
var(e.x4) | .2748076 .0385616 .2087307 .3618023
var(e.x5) | .2688631 .0379518 .2038818 .3545552
var(e.x6) | .2977391 .0394555 .2296345 .386042
var(e.x7) | .6756529 .0664177 .557249 .8192152
var(e.x8) | .477207 .0900713 .3296441 .6908254
var(e.x9) | .5577625 .0878918 .4095601 .759593
var(Visual) | 1 . . .
var(Textual) | 1 . . .
var(Speed) | 1 . . .
−−−−−−−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
cov(Visual,Textual) | .4585093 .0634638 7.22 0.000 .3341225 .5828962
cov(Visual,Speed) | .4705348 .0862308 5.46 0.000 .3015256 .639544
cov(Textual,Speed) | .2829848 .0714709 3.96 0.000 .1429045 .4230652
−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
LR test of model vs. saturated: chi2(24) = 85.31, Prob > chi2 = 0.0000
2.6.1 Interpretation of Output

• The number of observed statistics is 54: 9 means, 9 variances and 36 = (9 × 8)/2 covariances. Sometimes the means are not counted, and you will see the number of observed statistics given as 45 (9 variances plus 36 covariances).
• The number of estimated parameters is 30:
– 9 means;
– 9 residual variances;
– 6 loadings (2 for each of the 3 factors, since the first loading of each factor is fixed at 1);
– 3 covariances among the latent variables;
– 3 variances of the latent variables.
Again, the means are sometimes not counted, in which case the number of estimated parameters is given as 21.
• For each estimated parameter, the standard error is also given.
• The degrees of freedom are the number of observed statistics minus the number of estimated parameters, i.e. 54 − 30 = 24 in this case. This is the same whether or not the means are counted (45 − 21 = 24).
• The model is fitted by maximising the likelihood, which is equivalent to minimising minus twice the log-likelihood. The test statistic usually quoted is the likelihood-ratio statistic: minus twice the difference between the log-likelihood of the fitted model and that of the saturated model. If the model is correct, this statistic has approximately a χ2 distribution with the degrees of freedom given above.
• The χ2 statistic is a measure of lack of fit. Here the p-value is very small, indicating that the model does not fit the data.
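The same bookkeeping for the three-factor model can be written as a small Python helper. cfa_counts is a hypothetical function for illustration. It assumes a simple-structure CFA:
• each indicator loads on one factor;
• the first loading of each factor is fixed at 1;
• all factor variances and covariances are free;
• the residuals are uncorrelated.

```python
def cfa_counts(n_indicators, n_factors, count_means=True):
    """Count observed statistics, free parameters and df for a simple-structure CFA.

    Assumes each indicator loads on exactly one factor, the first loading of each
    factor is fixed at 1, all factor variances and covariances are free and the
    residuals are uncorrelated.
    """
    stats = n_indicators * (n_indicators + 1) // 2            # variances + covariances
    params = n_indicators - n_factors                          # free loadings
    params += n_indicators                                     # residual variances
    params += n_factors + n_factors * (n_factors - 1) // 2     # factor variances + covariances
    if count_means:
        stats += n_indicators                                  # observed means
        params += n_indicators                                 # intercepts
    return stats, params, stats - params


print(cfa_counts(9, 3, count_means=True))    # Holzinger-Swineford, means counted
print(cfa_counts(9, 3, count_means=False))   # means not counted
```

For the Holzinger-Swineford model, this returns (54, 30, 24) when the means are counted and (45, 21, 24) when they are not. These match the counts above.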
To get further information on the model fit:
1. Choose Estimation ⊳ Overall goodness of fit.
2. In the estat-Postestimation tool for sem dialog box, select Goodness-of-fit statistics in the Reporting and statistics (subcommand) drop-down list.
3. Select all in the Statistics to be displayed drop-down list.
4. Press OK.
. estat gof, stats(all)

----------------------------------------------------------------------------
Fit statistic        |      Value   Description
---------------------+------------------------------------------------------
Likelihood ratio     |
          chi2_ms(24) |     85.306   model vs. saturated
             p > chi2 |      0.000
          chi2_bs(36) |    918.852   baseline vs. saturated
             p > chi2 |      0.000
---------------------+------------------------------------------------------
Population error     |
                RMSEA |      0.092   Root mean squared error of approximation
  90% CI, lower bound |      0.071
          upper bound |      0.114
               pclose |      0.001   Probability RMSEA <= 0.05
---------------------+------------------------------------------------------
Information criteria |
                  AIC |   7517.490   Akaike's information criterion
                  BIC |   7595.339   Bayesian information criterion
---------------------+------------------------------------------------------
Baseline comparison  |
                  CFI |      0.931   Comparative fit index
                  TLI |      0.896   Tucker-Lewis index
---------------------+------------------------------------------------------
Size of residuals    |
                 SRMR |      0.065   Standardized root mean squared residual
                   CD |      0.986   Coefficient of determination
----------------------------------------------------------------------------
The interpretation of these statistics is as follows:
2.6.2 Model Chi-Square

The model chi-square tests the exact-fit hypothesis: that there are no discrepancies between the population covariance matrix and the covariance matrix implied by the fitted model. The first part of the summary indicates that the χ2 statistic was 85.306 on 24 degrees of freedom. The p-value is the probability of obtaining a χ2 value at least as large as the observed one, assuming the model is correct. Here it is very small. Ideally, we would want the p-value to be greater than 0.05.
. estat gof, stats(chi2)

----------------------------------------------------------------------------
Fit statistic        |      Value   Description
---------------------+------------------------------------------------------
Likelihood ratio     |
          chi2_ms(24) |     85.306   model vs. saturated
             p > chi2 |      0.000
          chi2_bs(36) |    918.852   baseline vs. saturated
             p > chi2 |      0.000
----------------------------------------------------------------------------
Where do the 24 degrees of freedom come from? There are 9 variances and (9 × 8)/2 = 36 covariances, a total of 45. We need to estimate 9 residual variances, 6 loadings (2 for each factor), 3 factor covariances and 3 factor variances: a total of 21 free parameters. The degrees of freedom are the number of pieces of information (distinct variances and covariances) less the number of parameters to estimate, i.e. 45 − 21 = 24.
If the model is correct, the expected value of the χ2 statistic equals its degrees of freedom, 24 in this case. Note that the χ2 statistic can be affected by non-normality and sample size, as well as by other factors.
• The saturated model corresponds to an exact-fit model. Since there is a statistically significant difference between our model and the saturated model, our model does not fully account for the observed covariances.
• The baseline model is the model in which all the observed variables are independent, i.e. all their covariances are zero.
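The p-value of the model chi-square can be reproduced outside the SEM output. Inside Stata, display chi2tail(24, 85.306) returns the upper-tail probability directly.

The Python sketch below avoids external libraries. It relies on an identity that holds only for even degrees of freedom: P(χ² > x) equals the probability that a Poisson variable with mean x/2 is at most df/2 − 1. The function chi2_sf_even is hypothetical and restricted to even df. For odd df (such as the 35 in the Political Democracy model), a library routine such as scipy.stats.chi2.sf would be needed.

```python
import math


def chi2_sf_even(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with EVEN df.

    Uses the identity P(X > x) = P(Poisson(x/2) <= df/2 - 1).
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    lam = x / 2.0
    term = math.exp(-lam)          # Poisson probability of 0
    total = term
    for i in range(1, df // 2):    # add Poisson probabilities of 1 .. df/2 - 1
        term *= lam / i
        total += term
    return total


p = chi2_sf_even(85.306, 24)
print(f"p-value = {p:.2e}")
```

For 85.306 on 24 degrees of freedom, the result is far below 0.0001. This is consistent with the p-value of 0.000 reported by estat gof.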
2.6.3 Root Mean Square Error of Approximation

The Root Mean Square Error of Approximation is a popular fit index.
The formula is
$$\mathrm{RMSEA} = \sqrt{\frac{\chi^2_M - df_M}{df_M \times (N - 1)}} = \sqrt{\frac{85.306 - 24}{24 \times 300}} = 0.092 \text{ in this case,}$$
and is, in words, the amount of discrepancy per degree of freedom. Ideally, the RMSEA is less than 0.05.

Stata also reports a 90% confidence interval for the RMSEA, and pclose. pclose is the p-value for a test of the close-fit hypothesis that the population RMSEA is no greater than 0.05. Here the lower bound of the interval (0.071) is above 0.05 and pclose = 0.001, so the close-fit hypothesis is not supported.
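. estat gof, stats(rmsea)

----------------------------------------------------------------------------
Fit statistic        |      Value   Description
---------------------+------------------------------------------------------
Population error     |
                RMSEA |      0.092   Root mean squared error of approximation
  90% CI, lower bound |      0.071
          upper bound |      0.114
               pclose |      0.001   Probability RMSEA <= 0.05
----------------------------------------------------------------------------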
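Here is a minimal Python sketch of the RMSEA formula in this section, with N = 301 observations. The floor at zero for χ² − df is a common convention added here; it is not in the formula as written. It ensures that a chi-square smaller than its degrees of freedom gives an RMSEA of 0 rather than an error. Stata's exact implementation may differ in detail.

```python
import math


def rmsea(chi2_m, df_m, n):
    """RMSEA = sqrt((chi2 - df) / (df * (n - 1))), with (chi2 - df) floored at zero."""
    return math.sqrt(max(chi2_m - df_m, 0.0) / (df_m * (n - 1)))


print(round(rmsea(85.306, 24, 301), 3))
```

With the Holzinger-Swineford values, this gives 0.092, as reported by estat gof.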
2.6.4 Information Criteria

• The AIC (Akaike's Information Criterion) is a penalised measure of lack of fit. It equals minus twice the log-likelihood plus twice the number of estimated parameters. Smaller values indicate a better trade-off between fit and complexity, and AICs can be compared for non-nested models.
• The BIC (Bayesian Information Criterion) is an alternative to the AIC with a different penalty: the number of estimated parameters is multiplied by the natural logarithm of the sample size, rather than by 2.
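. estat gof, stats(ic)

----------------------------------------------------------------------------
Fit statistic        |      Value   Description
---------------------+------------------------------------------------------
Information criteria |
                  AIC |   7517.490   Akaike's information criterion
                  BIC |   7595.339   Bayesian information criterion
----------------------------------------------------------------------------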
These criteria can be used to compare non-nested models fitted to the same data. Their actual values cannot be interpreted on their own; all we can say is that the smaller the criterion, the better.
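Both criteria are simple functions of the log-likelihood, as the Python sketch below shows. The number of parameters k is our assumption. The reported AIC and BIC are reproduced with k = 21 (means not counted), which suggests these statistics come from the fit with Do not estimate means or intercepts checked. With k = 30, the AIC would be 18 larger.

```python
import math


def aic(loglik, k):
    """Akaike's information criterion: -2 log L + 2k."""
    return -2 * loglik + 2 * k


def bic(loglik, k, n):
    """Bayesian information criterion: -2 log L + k ln(n)."""
    return -2 * loglik + k * math.log(n)


ll = -3737.7449          # log likelihood from the sem output
print(f"{aic(ll, 21):.3f} {bic(ll, 21, 301):.3f}")
```

This prints 7517.490 and 7595.339, the values in the estat gof table.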
2.6.5 Full model versus baseline model

Next, we get a comparison between the model we have fitted and the
baseline model. Two indices are provided. The comparative fit index
(CFI) is given by
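$$\mathrm{CFI} = 1 - \frac{\chi^2_M - df_M}{\chi^2_B - df_B} = 1 - \frac{85.306 - 24}{918.852 - 36} = 0.931 \text{ in this case,}$$
where the subscript indicates whether we are referring to our model (M) or the baseline model (B).
The Tucker-Lewis index (TLI) is given by
$$\mathrm{TLI} = \left[\frac{\chi^2_B}{df_B} - \frac{\chi^2_M}{df_M}\right] \Big/ \left[\frac{\chi^2_B}{df_B} - 1\right] = \left[\frac{918.852}{36} - \frac{85.306}{24}\right] \Big/ \left[\frac{918.852}{36} - 1\right] = 0.896 \text{ in this case.}$$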
. estat gof, stats(indices)

----------------------------------------------------------------------------
Fit statistic        |      Value   Description
---------------------+------------------------------------------------------
Baseline comparison  |
                  CFI |      0.931   Comparative fit index
                  TLI |      0.896   Tucker-Lewis index
----------------------------------------------------------------------------
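The two baseline-comparison indices can be computed from the four chi-square quantities alone. This Python sketch implements the CFI and TLI formulas exactly as written in this section. Published definitions of the CFI usually also truncate the ratio so that the index stays between 0 and 1; that step is omitted here.

```python
def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative fit index: 1 - (chi2_M - df_M) / (chi2_B - df_B)."""
    return 1 - (chi2_m - df_m) / (chi2_b - df_b)


def tli(chi2_m, df_m, chi2_b, df_b):
    """Tucker-Lewis index: (chi2_B/df_B - chi2_M/df_M) / (chi2_B/df_B - 1)."""
    ratio_b = chi2_b / df_b
    return (ratio_b - chi2_m / df_m) / (ratio_b - 1)


print(round(cfi(85.306, 24, 918.852, 36), 3), round(tli(85.306, 24, 918.852, 36), 3))
```

With the model value 85.306 (24 df) and the baseline value 918.852 (36 df), this gives about 0.931 and 0.896.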
• The SRMR (standardised root mean squared residual) is the square root of the average squared difference between the observed correlations and the correlations implied by our model. The average is taken over the v(v + 1)/2 variances and covariances. The formula is
$$\mathrm{SRMR} = \sqrt{\frac{\sum_{i \le j} (r_{ij} - \rho_{ij})^2}{v(v+1)/2}}$$
where:
– $r_{ij}$ is the observed correlation between the ith and jth variables;
– $\rho_{ij}$ is the model-implied correlation between the ith and jth variables;
– v is the number of variables.
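. estat gof, stats(residuals)

----------------------------------------------------------------------------
Fit statistic        |      Value   Description
---------------------+------------------------------------------------------
Size of residuals    |
                 SRMR |      0.065   Standardized root mean squared residual
                   CD |      0.986   Coefficient of determination
----------------------------------------------------------------------------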
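Here is a minimal Python sketch of the SRMR formula, taking the observed and model-implied correlation matrices as nested lists. The 3 × 3 matrices are invented for illustration and are not from the Holzinger-Swineford fit. Stata's own SRMR may differ in detail (for example when means are modelled).

```python
import math


def srmr(r, rho):
    """Root of the mean squared difference over the v(v+1)/2 elements with i <= j."""
    v = len(r)
    total = 0.0
    for i in range(v):
        for j in range(i, v):          # each variance/covariance counted once (i <= j)
            total += (r[i][j] - rho[i][j]) ** 2
    return math.sqrt(total / (v * (v + 1) / 2))


# Hypothetical observed and model-implied correlation matrices (illustration only).
observed = [[1.00, 0.50, 0.40],
            [0.50, 1.00, 0.30],
            [0.40, 0.30, 1.00]]
implied = [[1.00, 0.45, 0.40],
           [0.45, 1.00, 0.35],
           [0.40, 0.35, 1.00]]
print(round(srmr(observed, implied), 4))
```

Here two correlations differ by 0.05, so the SRMR is the square root of 2 × 0.05² / 6, about 0.029.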
2.7 Political Democracy Dataset
In Bollen's Political Democracy Dataset and Model, relating to 75 developing countries, there is one exogenous latent variable, "Industrialisation", and two endogenous latent variables, "Democracy in 1960" and "Democracy in 1965". Bollen wanted to examine the effect of Industrialisation on Democracy.
Industrialisation is measured by three observed variables:
x1 the gross national product (GNP) per capita in 1960
x2 the inanimate energy consumption per capita in 1960
x3 the percentage of the labour force in industry in 1960
Political Democracy in 1960 is measured by four observed variables:
y1 expert ratings of the freedom of the press in 1960
y2 the freedom of political opposition in 1960
y3 the fairness of elections in 1960
y4 the effectiveness of the elected legislature in 1960
Similarly, Political Democracy in 1965 is measured by y5 to y8, which are the same measures as y1 to y4 but taken in 1965.
Exercise: Use the SEM Builder to generate a diagram similar to Figure 2.6. The steps are:
1. Save your current graph as a .stsem file (.stsem stands for SEM path diagram).
2. Choose File ⊳ Exit.
3. Adjust the canvas size to, say, 9 in. by 6 in., and use the Fit in window button.
4. Press the Add Measurement Component button and click at about 3 down and 3 across on the grid. For the latent variable name type dem60. In the Measurement variables box select y1 y2 y3 y4. Check the Do not estimate constants box. Set the Measurement direction to Up.
5. Press the Add Measurement Component button again and click at about 3 down and 6 across on the grid. For the latent variable name type dem65. In the Measurement variables box select y5 y6 y7 y8. Check the Do not estimate constants box. Set the Measurement direction to Up.
6. Press the Add Measurement Component button again and click at about 4.5 down and 4.5 across on the grid. For the latent variable name type ind60. In the Measurement variables box select x1 x2 x3. Check the Do not estimate constants box. Set the Measurement direction to Down.
7. Use the Select tool to adjust the positions of the latent variables and their associated observed variables as you see fit.
8. Add paths from ind60 to dem60, from ind60 to dem65, and from dem60 to dem65.
9. Add covariances between ε2 and ε7, ε3 and ε8, ε4 and ε9, and ε5 and ε10.
10. Also add covariances between ε3 and ε5 and between ε8 and ε10. Modify the appearance of these paths by moving the handle as appropriate.
11. Choose Estimation ⊳ Estimate and choose Maximum likelihood. In the Reporting tab, check Display standardized coefficients and values. In the Advanced tab, check Do not estimate means or intercepts.
The resulting diagram needs some adjustment. In particular, the variance for ε1 does not show, and the covariances between the errors are hard to read because of all the lines. To fix this, follow the steps below:
1. Select ε1 and press Properties... to open the Variable properties dialog box.
• In the Appearance tab, check Customize appearance for selected variables and choose Set custom appearance.
• In the Variable settings-selected variables tab, choose the Results tab.
• Press the Results 1 box under Appearance of results (font, color, position, etc.).
• Choose a Position of 9 o'clock and a Boundary gap of 3 pt. Press OK, OK, OK.
2. Now select the covariance arrows and move the positions of the covariance values. For example, to do this for the covariance between ε2 and ε7:
• Select the covariance and then click Properties....
• In the Appearance tab, check Customize appearance for selected variables and choose Set custom appearance.
• In the Results tab, press Results 1 and set the Distribution between nodes to 10%.
The suggested settings for the other covariances are:
• between ε5 and ε10: 90%;
• between ε3 and ε8: 15% (which you need to type in);
• between ε4 and ε9: 85%.
3. Finally, adjust the positions of the covariances between ε3 and ε5 and between ε8 and ε10 so that they look the same.
The results you obtain should be as follows:
Figure 2.6: Estimated structural equation model for the Political Democracy Dataset.
Log likelihood = -1547.7909
 ( 1)  [y1]dem60 = 1
 ( 2)  [y5]dem65 = 1
 ( 3)  [x1]ind60 = 1
−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
| OIM
Standardized | Coef. Std. Err. z P>|z| [95% Conf. Interval]
−−−−−−−−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
Structural |
dem60 <− |
ind60 | .4467129 .1046964 4.27 0.000 .2415117 .6519141
−−−−−−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
dem65 <− |
dem60 | .8852288 .0517686 17.10 0.000 .7837641 .9866934
ind60 | .1822596 .0729762 2.50 0.013 .0392289 .3252904
−−−−−−−−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
Measurement |
y1 <− |
dem60 | .8504258 .0437576 19.43 0.000 .7646626 .9361891
−−−−−−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
y2 <− |
dem60 | .7171219 .0639886 11.21 0.000 .5917065 .8425373
−−−−−−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
y3 <− |
dem60 | .7223492 .064376 11.22 0.000 .5961746 .8485238
−−−−−−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
y4 <− |
dem60 | .8457095 .0444636 19.02 0.000 .7585624 .9328566
−−−−−−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
y5 <− |
dem65 | .8080173 .0483896 16.70 0.000 .7131754 .9028593
−−−−−−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
y6 <− |
dem65 | .7460072 .0572477 13.03 0.000 .6338037 .8582107
−−−−−−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
y7 <− |
dem65 | .8236733 .0456011 18.06 0.000 .7342968 .9130499
−−−−−−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
y8 <− |
dem65 | .8278414 .0459159 18.03 0.000 .737848 .9178348
−−−−−−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
x1 <− |
ind60 | .9198529 .0231947 39.66 0.000 .8743921 .9653137
−−−−−−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
x2 <− |
ind60 | .9730326 .0165154 58.92 0.000 .9406629 1.005402
−−−−−−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
x3 <− |
ind60 | .8721386 .0308137 28.30 0.000 .8117447 .9325324
−−−−−−−−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
var(e.y1)| .2767759 .0744251 .1633954 .4688313
var(e.y2)| .4857361 .0917753 .3354083 .7034399
var(e.y3)| .4782116 .0930039 .3266451 .7001064
var(e.y4)| .2847755 .0752066 .1697102 .4778562
var(e.y5)| .347108 .0781993 .2232025 .5397968
var(e.y6)| .4434733 .0854144 .3040347 .6468621
var(e.y7)| .3215622 .0751209 .2034295 .5082954
var(e.y8)| .3146786 .0760221 .1959876 .5052496
var(e.x1)| .1538706 .0426714 .0893512 .2649787
var(e.x2)| .0532076 .0321401 .0162856 .1738374
var(e.x3)| .2393743 .0537477 .1541537 .3717075
var(e.dem60)| .8004476 .0935385 .6365953 1.006474
var(e.dem65)| .0390048 .0497035 .0032095 .4740198
var(ind60)| 1 . . .
−−−−−−−−−−−−−−+−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
cov(e.y1,e.y5)| .2957611 .1421271 2.08 0.037 .0171972 .5743251
cov(e.y2,e.y4)| .272567 .1206589 2.26 0.024 .0360799 .5090541
cov(e.y2,e.y6)| .3562224 .0975541 3.65 0.000 .1650199 .5474249
cov(e.y3,e.y7)| .1906414 .1374685 1.39 0.166 -.0787919 .4600747
cov(e.y4,e.y8)| .1088014 .1354941 0.80 0.422 -.1567621 .374365
cov(e.y6,e.y8)| .3377705 .1113979 3.03 0.002 .1194346 .5561064
−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
LR test of model vs. saturated: chi2(35) = 38.13, Prob > chi2 = 0.3292
As before, we can check the fit of the model using Estimation ⊳ Overall goodness of fit. In the estat-Postestimation tools for sem dialog box, select Goodness-of-fit statistics in the Reporting and statistics: (subcommand) drop-down list, and select all in the Statistics to be displayed drop-down list. Press OK. All the statistics are satisfactory.
p > chi2 | 0.329
p > chi2 | 0.000
upper bound | 0.092
Size of residuals |
3 Stata SEM Commands
3.1 A first glimpse of Stata SEM Syntax
Although the SEM Builder is great, it is probably better to use the Stata SEM commands directly. In fact, all the SEM Builder does is translate your diagram into a set of commands and run those commands when you fit the model. Stata has good facilities for structural equation modelling. The model syntax is simple, so you can easily specify the model you want to fit to your data. Stata provides many summaries of the fitted model and convenient ways of improving it, and it handles multiple groups (e.g. males and females), growth curve models, categorical variables and more.
When you estimated the model using Estimation ⊳ Estimate, Stata automatically generated the commands. These can be found in the Results window, and they can also be re-run by clicking on them in the Review window. After reloading the HolzingerSwineford data, we can refit the Visual CFA model by clicking the appropriate line in the Review window. Stata gives us the following.
sem (Visual -> x1, ) (Visual -> x2, ) (Visual -> x3, ), latent(Visual) nocapslatent
1. The syntax shows that the latent variable Visual loads onto the three observed variables x1, x2 and x3. It could have been written as
(Visual -> x1 x2 x3)
2. The comma indicates that everything after it is an option to the sem command. There are two options.
(a) latent(Visual) explicitly specifies that Visual is a latent variable. All other variables are then treated as observed variables.
(b) nocapslatent tells sem not to treat variables whose names begin with a capital letter as latent by default. By default, Stata's sem assumes that latent variables have names beginning with a capital letter and that observed variables have names beginning with a lower-case letter. If you want all the variables in your data set to be lower case, you can use the following command.
. rename *, lower
The generated syntax is correct but a bit long-winded, and some of it is superfluous because all our observed variables are lower case. We can simplify it, and also put it in a do file by right-clicking it in the Review window and doing a bit of editing:
import delimited ///
    "C:\Users\NeilDiamond\Documents\LavaanCourse\HolzingerSwineford1939.csv", clear
sem (Visual -> x1 x2 x3)
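Behind the scenes, Stata automatically fixes the first loading (here, the loading of x1 on Visual) at 1 to set the scale of the latent variable, and adds a residual (error) variance for each observed variable.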
3.2 Confirmatory Factor Analysis Example
The Stata SEM code for the confirmatory factor analysis example is
given below:
cd "C:\Users\NeilDiamond\Documents\Stata Workshop\Day 2"
import delimited ///
    "C:\Users\NeilDiamond\Documents\Stata Workshop\Day 2\Data\HolzingerSwineford1939.csv"
sem (Visual -> x1, ) (Visual -> x2, ) (Visual -> x3, ), latent(Visual) nocapslatent
sem (Visual -> x1, ) (Visual -> x2, ) (Visual -> x3, ) ///
    (Textual -> x4, ) (Textual -> x5, ) (Textual -> x6, ) ///
    (Speed -> x7, ) (Speed -> x8, ) (Speed -> x9, ), ///
    covstruct(_lexogenous, diagonal) standardized nomeans ///
    latent(Visual Textual Speed) cov(Visual*Textual Visual*Speed Textual*Speed) ///
    nocapslatent
graph export "C:\Users\NeilDiamond\Documents\Stata Workshop\Day 2\HS1939.png", ///
    as(png) replace
estat gof, stats(all)
Again it is a bit long-winded. We can simplify the commands somewhat as follows.
• (Visual -> x1, ) (Visual -> x2, ) (Visual -> x3, ) can become (Visual -> x1 x2 x3), and similarly for Textual and Speed.
• Because we have followed the convention that latent variables begin with a capital letter and observed variables don't, we do not have to specify that Visual, Textual and Speed are latent, nor that we are not following the convention (which is what nocapslatent does).
• covstruct(_lexogenous, diagonal) specifies that the covariance structure of the latent exogenous variables is diagonal (no covariances), but cov(Visual*Textual Visual*Speed Textual*Speed) then says that the three latent variables covary. Since sem lets exogenous latent variables covary by default, we can just leave both parts out.
• We need to take out the graph export command: the SEM Builder generates commands from a diagram, but you can't use commands to generate the diagram.
• We need to keep the standardized and nomeans options after the comma.
Your new do file should look something like this:

capture log close
log using HS39, replace text
// HS39.do: Fits CFA to HolzingerSwineford1939 data
// Neil Diamond 20/10/14
version 13
clear all
macro drop _all
set linesize 80
cd "C:\Users\NeilDiamond\Documents\Stata Workshop\Day 2"
import delimited ".\Data\HolzingerSwineford1939.csv", clear
sem (Visual -> x1 x2 x3) ///
    (Textual -> x4 x5 x6) ///
    (Speed -> x7 x8 x9), ///
    standardized nomeans
log close
exit
Run the do file and confirm you get the same results as before.
3.3 Fit Indices
After fitting the model, estat gof, stats(all) gives a summary of all the fit indices for the fitted model.
We can also get a subset by listing the fit indices we want, for example
estat gof, stats(chi2 rmsea ic indices residuals)
3.4 Extracting Information from the fitted model
After the analysis, Stata saves various statistics that you might want to use. For example, after estat gof many statistics are retained in r(), and return list displays them.
. return list
scalars:
r(N_groups) = 1
r(cd) = .9861419451994397
r(srmr) = .0595237982362845
r(tli) = .8958394762056794
r(cfi) = .9305596508037862
r(bic) = 7646.703173647184
r(aic) = 7535.489865704717
r(pclose) = .0006612367108219
r(ub90_rmsea) = .1136780172014793
r(lb90_rmsea) = .0714184911919339
r(rmsea) = .0921214848760547
r(p_bs) = 1.5734175106e-169
r(df_bs) = 36
r(chi2_bs) = 918.8515836481301
r(p_ms) = 8.50255161265e-09
r(df_ms) = 24
r(chi2_ms) = 85.30552225695647
matrices:
r(nobs) : 1 x 1
. display r(chi2_bs)
918.85158
One use of these statistics is to calculate fit indices that Stata does not report. For example, the normed fit index (NFI) is given by

$$\text{NFI} = 1 - \frac{\chi^2_{\text{model}}}{\chi^2_{\text{null}}}$$

where the null (baseline) chi-squared is r(chi2_bs) and the model chi-squared is r(chi2_ms). Stata does not compute this statistic, but it is easy to calculate from the retained statistics.
. scalar nfi = 1 - r(chi2_ms)/r(chi2_bs)
. display %9.7f nfi
0.9071607
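As a check on how the retained statistics relate to Stata's own indices, the sketch below recomputes the index above (1 − χ²_model/χ²_null) together with the TLI, CFI and RMSEA from the model and baseline chi-squared values in the return list. The values are typed in so the code runs without the data, and the scalar names are our own. The formulas are the standard textbook ones; for the RMSEA we use N rather than N − 1 in the denominator, which is the version that reproduces r(rmsea) for these numbers.

```stata
* Values typed in from the return list above (HS CFA, N = 301)
scalar nobs    = 301
scalar chi2_ms = 85.30552225695647    // model vs. saturated
scalar df_ms   = 24
scalar chi2_bs = 918.8515836481301    // baseline vs. saturated
scalar df_bs   = 36

* Normed fit index: 1 - chi2(model)/chi2(null)
scalar nfi = 1 - scalar(chi2_ms)/scalar(chi2_bs)

* Tucker-Lewis index: compares chi2/df for the baseline and fitted models
scalar tli = (scalar(chi2_bs)/scalar(df_bs) - scalar(chi2_ms)/scalar(df_ms)) ///
    / (scalar(chi2_bs)/scalar(df_bs) - 1)

* Comparative fit index: based on the noncentrality estimates chi2 - df
scalar cfi = 1 - max(scalar(chi2_ms) - scalar(df_ms), 0) ///
    / max(scalar(chi2_bs) - scalar(df_bs), scalar(chi2_ms) - scalar(df_ms), 0)

* RMSEA with N (not N - 1) in the denominator, which matches r(rmsea) here
scalar rmsea = sqrt(max(scalar(chi2_ms) - scalar(df_ms), 0) / (scalar(df_ms)*scalar(nobs)))

display "NFI = " %6.4f scalar(nfi) "  TLI = " %6.4f scalar(tli) ///
    "  CFI = " %6.4f scalar(cfi) "  RMSEA = " %6.4f scalar(rmsea)
```

With these inputs the display line should show NFI = 0.9072, TLI = 0.8958, CFI = 0.9306 and RMSEA = 0.0921, in agreement with the return list.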
3.5 Re-analysis of the Political Democracy Data Set
The commands generated for the Political Democracy model are given below.
sem (dem60 -> y1, ) (dem60 -> y2, ) (dem60 -> y3, ) (dem60 -> y4, ) (dem60 -> dem65, ) ///
    (dem65 -> y5, ) (dem65 -> y6, ) (dem65 -> y7, ) (dem65 -> y8, ) ///
    (ind60 -> dem60, ) (ind60 -> dem65, ) ///
    (ind60 -> x1, ) (ind60 -> x2, ) (ind60 -> x3, ), nomeans standardized ///
    latent(dem60 dem65 ind60) ///
    cov(e.y1*e.y5 e.y2*e.y4 e.y2*e.y6 e.y3*e.y7 e.y4*e.y8 e.y6*e.y8) nocapslatent
• Note that because dem60, dem65 and ind60 begin with lower-case letters, we do need to specify nocapslatent and also specify that dem60, dem65 and ind60 are latent variables. Can you think of a way to simplify the commands?
• The code can be simplified by putting all the variables that a latent variable loads on within the same set of parentheses.
• Note how the covariances are specified. e.y1 and e.y5 are the error terms attached to y1 and y5, respectively, and e.y1*e.y5 says that we want to allow these errors to covary.
Exercise: The commands are a bit long-winded. Develop a simpler set of commands in a do file to fit the model. An answer is given below.
import delimited ///
    "C:\Users\NeilDiamond\Documents\LavaanCourse\PoliticalDemocracy.csv", clear
sem (Dem60 -> y1 y2 y3 y4 Dem65) ///
    (Dem65 -> y5 y6 y7 y8) ///
    (Ind60 -> x1 x2 x3 Dem60 Dem65), ///
    cov(e.y1*e.y5 e.y2*e.y4 e.y2*e.y6 e.y3*e.y7 e.y4*e.y8 e.y6*e.y8) ///
    standardized nomeans
3.6 Using a covariance matrix and vector of means as input
Sometimes we do not have the raw data, for example when we are working from a published paper. Usually either a sample variance-covariance matrix is provided, or the sample correlation matrix and the vector of sample standard deviations (and possibly the sample means).
For an example, consider the data analysed by Kline (2011, p. 163). The data are adapted from Sava (2002) and come from a study of 109 high school teachers that considers the causes and effects of teacher burnout. The hypothesised model is that school support and coercive control affect teacher burnout, and all these variables have an effect on teacher-pupil interaction, which in turn has an effect on the school experience and the somatic status of the teacher's students.
The correlations and standard deviations are given below.
Table 3.1: Correlations and standard deviations for the teacher and pupils data set
Variable                        1        2        3        4        5        6
1. Coercive Control             1.0000
2. Teacher Burnout              0.3557   1.0000
3. School Support              -0.2566  -0.4774   1.0000
4. Teacher-Pupil Interactions  -0.4046   0.0207   0.1864   1.0000
5. School Experience           -0.1615   0.0938   0.0718   0.6542   1.0000
6. Somatic Status              -0.3487  -0.0133   0.1570   0.7277   0.4964   1.0000
SD                              8.3072   9.7697  10.5212   5.0000   3.7178   5.2714
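Correlations plus standard deviations carry the same information as a covariance matrix: with R the correlation matrix and D the diagonal matrix of standard deviations, the covariance matrix is S = DRD, so each covariance is r(i,j) × sd(i) × sd(j). The sketch below does this conversion for Table 3.1 with Stata's matrix commands; the matrix names R, SD, D and S are our own.

```stata
* Correlation matrix R for cc tb sc_spt tpi sc_e som_st (Table 3.1)
matrix R = ( 1,     .3557, -.2566, -.4046, -.1615, -.3487 \ ///
            .3557,  1,     -.4774,  .0207,  .0938, -.0133 \ ///
           -.2566, -.4774,  1,      .1864,  .0718,  .1570 \ ///
           -.4046,  .0207,  .1864,  1,      .6542,  .7277 \ ///
           -.1615,  .0938,  .0718,  .6542,  1,      .4964 \ ///
           -.3487, -.0133,  .1570,  .7277,  .4964,  1     )

* Standard deviations, turned into the diagonal matrix D
matrix SD = (8.3072, 9.7697, 10.5212, 5.0000, 3.7178, 5.2714)
matrix D  = diag(SD)

* Covariance matrix: S[i,j] = sd[i] * R[i,j] * sd[j]
matrix S = D * R * D
matrix rownames S = cc tb sc_spt tpi sc_e som_st
matrix colnames S = cc tb sc_spt tpi sc_e som_st
matrix list S, format(%9.3f)
```

If a paper reports covariances rather than correlations, their lower triangle could be entered with ssd set covariances instead of ssd set sd and ssd set correlations.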
A graphical depiction of the model considered is given below:

Figure 3.1: Sava Model.

We need to enter the data into Stata. We use the ssd (summary statistics data) command. Open the Do-file Editor, type the following commands and save the file as sava.do.
clear
ssd init cc tb sc_spt tpi sc_e som_st
ssd set observations 109
ssd set sd 8.3072 9.7697 10.5212 5.000 3.7178 5.2714
#delimit ;
ssd set correlations
  1 \
  .3557 1 \
  -.2566 -.4774 1 \
  -.4046 .0207 .1864 1 \
  -.1615 .0938 .0718 .6542 1 \
  -.3487 -.0133 .1570 .7277 .4964 1 ;
#delimit cr
save sava.dta, replace
clear
use sava
ssd list
• ssd init sets up the variables.
• ssd set observations tells Stata how many observations there are.
• ssd set sd specifies the standard deviations of the variables.
• #delimit ; changes the command delimiter from a carriage return (i.e. Enter) to a semicolon. We need this because we are going to enter the matrix of correlations row by row over several lines.
• The correlation matrix is symmetric, so we only have to enter its lower triangle (including the diagonal).
• Each row of the matrix is separated from the next by a backslash.
• The matrix input, and with it the command, is ended by the new delimiter, i.e. a semicolon; #delimit cr then restores the carriage return as the delimiter.
• We then save the data, clear the memory, load the data again and list it with ssd list.
Now run the do file.
. do sava.do
. clear
. ssd init cc tb sc_spt tpi sc_e som_st
Summary statistics data initialized. Next use, in any order,
ssd set observations (required)
It is best to do this first.
ssd set means (optional)
Default setting is 0.
ssd set variances or ssd set sd (optional)
Use this only if you have set or will set correlations and, even then,
this is optional but highly recommended. Default setting is 1.
ssd set covariances or ssd set correlations (required)
. ssd set observations 109
(value set)
Status:
observations: set
means: unset
variances or sd: unset
covariances or correlations: unset (required to be set)
. ssd set sd 8.3072 9.7697 10.5212 5.000 3.7178 5.2714
(values set)
Status:
observations: set
means: unset
variances or sd: set
covariances or correlations: unset (required to be set)
. #delimit ;
delimiter now ;
. ssd set correlations
> 1 \
> .3557 1 \
> -.2566 -.4774 1 \
> -.4046 .0207 .1864 1 \
> -.1615 .0938 .0718 .6542 1 \
> -.3487 -.0133 .1570 .7277 .4964 1 ;
(values set)
Status:
observations: set
means: unset
variances or sd: set
covariances or correlations: set
. #delimit cr
delimiter now cr
. save sava.dta
file sava.dta saved
. clear
. use sava
. ssd list
Observations = 109
Means undefined; assumed to be 0
Standard deviations:
cc tb sc_spt tpi sc_e som_st
8.3072 9.7697 10.5212 5 3.7178 5.2714
Correlations:
cc tb sc_spt tpi sc_e som_st
1
.3557 1
-.2566 -.4774 1
-.4046 .0207 .1864 1
-.1615 .0938 .0718 .6542 1
-.3487 -.0133 .157 .7277 .4964 1
.
end of do-file
Now open the Do-file Editor and create a do file called sava_fit.do with the following commands:
use sava, clear
sem ///
    (sc_spt -> tb tpi) ///
    (cc -> tb tpi) ///
    (tb -> tpi) ///
    (tpi -> sc_e som_st)
Run the sava_fit do file to get the following results.
sem ///
(sc_spt -> tb tpi) ///
(cc -> tb tpi) ///
(tb -> tpi) ///
(tpi -> sc_e som_st)
Endogenous variables
Observed: tb tpi sc_e som_st
Exogenous variables
Observed: sc_spt cc
Fitting target model:
Iteration 0: log likelihood = -2052.8451
Log likelihood = -2052.8451
------------------------------------------------------------------------------
| OIM
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural |
tb <- |
sc_spt | -.3838194 .0777506 -4.94 0.000 -.5362078 -.231431
cc | .293585 .0984724 2.98 0.003 .1005828 .4865873
-----------+------------------------------------------------------------------
tpi <- |
tb | .1424866 .0510318 2.79 0.005 .0424661 .2425071
sc_spt | .0966997 .0458219 2.11 0.035 .0068904 .1865089
cc | -.2717027 .0545622 -4.98 0.000 -.3786426 -.1647628
-----------+------------------------------------------------------------------
sc_e <- |
tpi | .486437 .0538653 9.03 0.000 .3808629 .592011
-----------+------------------------------------------------------------------
som_st <- |
tpi | .7671996 .0692629 11.08 0.000 .6314468 .9029524
-------------+----------------------------------------------------------------
var(e.tb)| 67.51208 9.14499 51.77024 88.04055
var(e.tpi)| 19.16417 2.595923 14.69565 24.99144
var(e.sc_e)| 7.833977 1.061168 6.007324 10.21606
var(e.som_st)| 12.95285 1.754555 9.932622 16.89143
------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(7) = 3.93, Prob > chi2 = 0.7877
Exercises:
1. (a) The results above are for the unstandardized model. If you want to determine which of School Support or Coercive Control has the bigger effect on Teacher Burnout, how would you modify your commands to do this? Modify your commands and obtain a summary of the model.
(b) Assuming the model fits well, use the SEM Builder to display the model.
2. (The classic Wheaton dataset) Anomia and Powerlessness are two subscales of a standard alienation scale. The variance-covariance matrix below is from data collected on a panel of 932 individuals in rural Illinois in 1967 and 1971. Education is measured in years, and occstat is a socioeconomic index based on the respondent's occupation; these are indicators of socioeconomic status (SES).
1. anomia67 11.834
2. powerlessness67 6.947 9.364
3. anomia71 6.819 5.091 12.532
4. powerlessness71 4.783 5.028 7.495 9.986
5. education -3.839 -3.889 -3.841 -3.625 9.610
6. occstat -21.899 -18.831 -21.748 -18.775 35.522 450.288
The model fitted is given in the following diagram.
(a) Create a do file to enter the variance-covariance matrix into Stata.
(b) Create a do file with the Stata commands for the model shown
in the diagram. You will need to define three latent variables,
two regressions, and two sets of correlated residuals.
Figure 3.2: Wheaton Structural Equation Model. Latent variables SES66 (indicators educ66 and occstat66), Alien67 (indicators anomia67 and pwless67) and Alien71 (indicators anomia71 and pwless71), with error terms e1 to e8.
(c) Fit the model, and obtain standardized estimates. Does the
model fit? What is your interpretation of the results?
(d) Use SEM builder to create a diagram summarizing the results.
3.7 Indirect effects
For the teacher burnout example, we can estimate the direct, indirect
and total effects of one variable on another. For example,
• The direct effect of Coercive Control on Teacher-Pupil interaction

is the coefficient on the path from Coercive Control to Teacher-
Pupil interaction (i.e. −0.272).
• The indirect effect of Coercive Control on Teacher-Pupil interaction

is the product of the coefficients on the path from Coercive Control
to Teacher Burnout and from Teacher Burnout to Teacher-Pupil
interaction (i.e. 0.294 × 0.143 = 0.042).
• The total effect is the sum of the direct and indirect effects (i.e. −0.272 + 0.042 = −0.230).
Stata stores the estimated coefficients of the model, and we can take advantage of these to test linear and non-linear combinations of them. To see how they are named in Stata, type the following command:
sem, coeflegend
The results are as follows. For the path going from sc_spt to tb, the notation is _b[tb:sc_spt]: _b followed by an opening square bracket, then the destination variable and the origin variable separated by a colon, and finally a closing square bracket.
. sem, coeflegend
Log likelihood = -2052.8451
------------------------------------------------------------------------------
| Coef. Legend
-------------+----------------------------------------------------------------
Structural |
tb <- |
sc_spt | -.3838194 _b[tb:sc_spt]
cc | .293585 _b[tb:cc]
-----------+------------------------------------------------------------------
tpi <- |
tb | .1424866 _b[tpi:tb]
sc_spt | .0966997 _b[tpi:sc_spt]
cc | -.2717027 _b[tpi:cc]
-----------+------------------------------------------------------------------
sc_e <- |
tpi | .486437 _b[sc_e:tpi]
-----------+------------------------------------------------------------------
som_st <- |
tpi | .7671996 _b[som_st:tpi]
-------------+----------------------------------------------------------------
var(e.tb)| 67.51208 _b[var(e.tb):_cons]
var(e.tpi)| 19.16417 _b[var(e.tpi):_cons]
var(e.sc_e)| 7.833977 _b[var(e.sc_e):_cons]
var(e.som_st)| 12.95285 _b[var(e.som_st):_cons]
------------------------------------------------------------------------------
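The names in the legend can be used directly in expressions. The sketch below re-enters the summary statistics from Table 3.1 so that it runs on its own (note that it clears the data in memory), refits the model quietly and computes the direct, indirect and total effects of cc on tpi from _b[]. The scalar names are our own. The total should agree with the nlcom estimate further below (−.2298708).

```stata
* Re-enter the Sava summary statistics (Table 3.1); this clears memory
clear
ssd init cc tb sc_spt tpi sc_e som_st
ssd set observations 109
ssd set sd 8.3072 9.7697 10.5212 5.0000 3.7178 5.2714
ssd set correlations 1 \ .3557 1 \ -.2566 -.4774 1 \ ///
    -.4046 .0207 .1864 1 \ -.1615 .0938 .0718 .6542 1 \ ///
    -.3487 -.0133 .1570 .7277 .4964 1

* Refit the teacher-burnout path model without displaying the output
quietly sem (sc_spt -> tb tpi) (cc -> tb tpi) (tb -> tpi) (tpi -> sc_e som_st)

* Effects of Coercive Control (cc) on Teacher-Pupil Interaction (tpi)
scalar eff_direct   = _b[tpi:cc]                 // path cc -> tpi
scalar eff_indirect = _b[tb:cc] * _b[tpi:tb]     // path cc -> tb -> tpi
scalar eff_total    = scalar(eff_direct) + scalar(eff_indirect)

display "direct = " %7.4f scalar(eff_direct) "   indirect = " ///
    %7.4f scalar(eff_indirect) "   total = " %7.4f scalar(eff_total)
```

This only gives point estimates; for a standard error and test of the combination, nlcom is used as described next.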
To test whether, for example, the total effect of Coercive Control on Teacher-Pupil interaction is statistically significant, follow the steps below:
1. Choose Statistics ⊳ SEM (structural equation modeling) ⊳ Testing and CIs ⊳ Nonlinear combinations of parameters. Check the Post-estimation results box and then press Create. Press Create again. Click on Coefficients in the Category box and then on Coefficients. The list of saved coefficients is displayed. Select tb:cc and double-click to enter it into the equation box. Type * and then select tpi:tb and double-click. Type + and then select tpi:cc and double-click. Press OK three times.
. nlcom (_b[tb:cc]*_b[tpi:tb] + _b[tpi:cc]), post
_nl_1: _b[tb:cc]*_b[tpi:tb] + _b[tpi:cc]
------------------------------------------------------------------------------
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_nl_1 | -.2298708 .0543087 -4.23 0.000 -.3363138 -.1234277
------------------------------------------------------------------------------
Stata provides an easier way to do this calculation for the direct, indirect and total effects. Because nlcom with the post option replaces the sem estimation results, first refit the sem model (for example, by rerunning sava_fit.do). Then choose Statistics ⊳ SEM (structural equation modeling) ⊳ Testing and CIs ⊳ Direct and indirect effects. In the estat-Postestimation tools for sem dialog box, make sure that Decomposition of effects into total, direct and indirect effects is highlighted in the Reporting and statistics: (subcommand) drop-down box. Check the following boxes: Do not display effects with no paths; Report standardized effects; Do not display direct effects.
. estat teffects, compact standardized nodirect
Indirect effects
------------------------------------------------------------------------------
| OIM
| Coef. Std. Err. z P>|z| Std. Coef.
-------------+----------------------------------------------------------------
Structural |
tb <- |
-----------+------------------------------------------------------------------
tpi <- |
sc_spt | -.0546891 .0225029 -2.43 0.015 -.1150791
cc | .0418319 .0205264 2.04 0.042 .0695013
-----------+------------------------------------------------------------------
sc_e <- |
tb | .0693108 .0248238 2.79 0.005 .182136
sc_spt | .0204355 .020981 0.97 0.330 .0578314
cc | -.1118176 .0291756 -3.83 0.000 -.2498498
-----------+------------------------------------------------------------------
som_st <- |
tb | .1093157 .0391516 2.79 0.005 .2025992
sc_spt | .0322305 .0330262 0.98 0.329 .0643288
cc | -.1763568 .044604 -3.95 0.000 -.2779206
------------------------------------------------------------------------------
Total effects
------------------------------------------------------------------------------
| OIM
| Coef. Std. Err. z P>|z| Std. Coef.
-------------+----------------------------------------------------------------
Structural |
tb <- |
sc_spt | -.3838194 .0777506 -4.94 0.000 -.4133434
cc | .293585 .0984724 2.98 0.003 .2496361
-----------+------------------------------------------------------------------
tpi <- |
tb | .1424866 .0510318 2.79 0.005 .2784103
sc_spt | .0420105 .0428804 0.98 0.327 .0884002
cc | -.2298708 .0543087 -4.23 0.000 -.3819165
-----------+------------------------------------------------------------------
sc_e <- |
tb | .0693108 .0248238 2.79 0.005 .182136
tpi | .486437 .0538653 9.03 0.000 .6542
sc_spt | .0204355 .020981 0.97 0.330 .0578314
cc | -.1118176 .0291756 -3.83 0.000 -.2498498
-----------+------------------------------------------------------------------
som_st <- |
tb | .1093157 .0391516 2.79 0.005 .2025992
tpi | .7671996 .0692629 11.08 0.000 .7277
sc_spt | .0322305 .0330262 0.98 0.329 .0643288
cc | -.1763568 .044604 -3.95 0.000 -.2779206
------------------------------------------------------------------------------
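To see where the Std. Coef. column comes from: a standardized coefficient is the unstandardized coefficient multiplied by SD(predictor)/SD(outcome). Because tpi is the only predictor of sc_e and of som_st, their standardized effects should reproduce the correlations in Table 3.1 (.6542 and .7277), which is what the output shows. The sketch below does the arithmetic with the unstandardized effects from the output and the SDs from Table 3.1; the scalar names are our own.

```stata
* Standardized coefficient = unstandardized coefficient * SD(x) / SD(y)
* Unstandardized effects from the output above; SDs from Table 3.1
scalar std_sce   = .486437  * 5.0000/3.7178   // tpi -> sc_e
scalar std_somst = .7671996 * 5.0000/5.2714   // tpi -> som_st

display "tpi -> sc_e: " %6.4f scalar(std_sce) "   tpi -> som_st: " %6.4f scalar(std_somst)
```

Both values round to the correlations .6542 and .7277; for variables with more than one predictor the standardized coefficient no longer equals a simple correlation.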
3.7.1 Nonrecursive Models
In a study of 329 boys, Duncan, Haller, and Portes (1968) examined the effect of peers on aspirations. The model is given below, and the Stata data set (sem_sm1) is used in the examples of the Stata SEM manual.
Figure: path model with exogenous variables r_intel, r_ses, f_ses and f_intel, endogenous variables r_occasp and f_occasp, and error terms e1 and e2.
. use http://www.stata-press.com/data/r13/sem_sm1
. ssd describe
. sem (r_intel -> r_occasp, ) (r_ses -> r_occasp, ) (r_ses -> f_occasp, ) (f_ses
> -> r_occasp, ) (f_ses -> f_occasp, ) (f_intel -> f_occasp, ) (r_occasp -> f_o
> ccasp, ) (f_occasp -> r_occasp, ), cov(e.r_occasp*e.f_occasp) nocapslatent
Endogenous variables
Observed: r_occasp f_occasp
Exogenous variables
Observed: r_intel r_ses f_ses f_intel
Log likelihood = -2617.0489
------------------------------------------------------------------------------
| OIM
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural |
r_occasp <- |
f_occasp | .2773441 .1287622 2.15 0.031 .0249748 .5297134
r_intel | .2854766 .0522001 5.47 0.000 .1831662 .3877869
r_ses | .1570082 .052733 2.98 0.003 .0536534 .260363
f_ses | .0973327 .0603699 1.61 0.107 -.0209901 .2156555
-----------+------------------------------------------------------------------
f_occasp <- |
r_occasp | .2118102 .1563958 1.35 0.176 -.09472 .5183404
r_ses | .0794194 .0589095 1.35 0.178 -.0360411 .1948799
f_ses | .1681772 .0543854 3.09 0.002 .0615838 .2747705
f_intel | .3693682 .0557939 6.62 0.000 .2600142 .4787223
-------------+----------------------------------------------------------------
var(e.r_occ~p)| .6868304 .0535981 .5894193 .8003401
var(e.f_occ~p)| .6359151 .0501501 .5448425 .7422109
-------------+----------------------------------------------------------------
cov(e.r_occ~p, |
e.f_occasp)| -.1536992 .1442554 -1.07 0.287 -.4364346 .1290362
------------------------------------------------------------------------------
LR test of model vs. saturated: chi2(0) = 0.00, Prob > chi2 = .
The model is just-identified: it has as many free parameters as there are variances and covariances, so the chi-squared test has 0 degrees of freedom and the fit cannot be tested. We would expect, though, that some of the parameters should be the same as each other, since the r_ and f_ halves of the model mirror each other. To impose this, follow these steps:
1. Open the diagram and select the path from r_intel to r_occasp. In the β box, type b1 and make sure you press Enter. Select the path from f_intel to f_occasp and again type b1.
2. Do the same for the three other pairs of paths you expect to be equal, but this time type b2, b3 and b4, respectively.
3. Re-estimate the model.
. sem (r_intel@b1 -> r_occasp, ) (r_ses@b2 -> r_occasp, ) (r_ses@b3 -> f_occasp,
> ) (f_ses@b3 -> r_occasp, ) (f_ses@b2 -> f_occasp, ) (f_intel@b1 -> f_occasp,
> ) (r_occasp@b4 -> f_occasp, ) (f_occasp@b4 -> r_occasp, ), cov(e.r_occasp*e.f
> _occasp) nocapslatent
Endogenous variables
Observed: r_occasp f_occasp
Exogenous variables
Observed: r_intel r_ses f_ses f_intel
Log likelihood = -2617.8705
( 1) [r_occasp]f_occasp - [f_occasp]r_occasp = 0
( 2) [r_occasp]r_intel - [f_occasp]f_intel = 0
( 3) [r_occasp]r_ses - [f_occasp]f_ses = 0
( 4) [r_occasp]f_ses - [f_occasp]r_ses = 0
------------------------------------------------------------------------------
| OIM
| Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Structural |
r_occasp <- |
f_occasp | .2471578 .1024504 2.41 0.016 .0463588 .4479568
r_intel | .3271847 .0407973 8.02 0.000 .2472234 .4071459
r_ses | .1635056 .0380582 4.30 0.000 .0889129 .2380984
f_ses | .088364 .0427106 2.07 0.039 .0046529 .1720752
-----------+------------------------------------------------------------------
f_occasp <- |
r_occasp | .2471578 .1024504 2.41 0.016 .0463588 .4479568
r_ses | .088364 .0427106 2.07 0.039 .0046529 .1720752
f_ses | .1635056 .0380582 4.30 0.000 .0889129 .2380984
f_intel | .3271847 .0407973 8.02 0.000 .2472234 .4071459
-------------+----------------------------------------------------------------
var(e.r_occ~p)| .6884513 .0538641 .5905757 .8025477
var(e.f_occ~p)| .6364713 .0496867 .5461715 .7417005
-------------+----------------------------------------------------------------
cov(e.r_occ~p, |
e.f_occasp)| -.1582175 .1410111 -1.12 0.262 -.4345942 .1181592
------------------------------------------------------------------------------
−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
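Notice the syntax for a constraint: @b1 after a variable name attaches the label b1 to that path, and paths with the same label are constrained to be equal. You can also put a number after @ to fix a path at that value (for example, x1@1).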
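Because the constrained model is nested in the unconstrained one, the two log likelihoods above give a likelihood-ratio test of the four equality constraints: twice the drop in log likelihood, referred to a chi-squared distribution with 4 degrees of freedom (one per constraint). The sketch below does the arithmetic with the values copied from the output; the scalar names are our own. Storing each fit with estimates store and comparing them with lrtest should give the same statistic, but we have not shown that output here.

```stata
* Log likelihoods copied from the two sem runs above
scalar ll_free = -2617.8705 + 0.8216   // unconstrained model: -2617.0489, chi2(0)
scalar ll_con  = -2617.8705            // model with constraints (1) to (4)

* LR statistic: twice the drop in log likelihood; df = number of constraints
scalar lr    = 2*(scalar(ll_free) - scalar(ll_con))
scalar lr_df = 4
scalar lr_p  = chi2tail(scalar(lr_df), scalar(lr))

display "LR chi2(" scalar(lr_df) ") = " %6.4f scalar(lr) ",  p = " %5.3f scalar(lr_p)
```

This gives chi2(4) = 1.6432 with p of about 0.80, so on this test the equality constraints would not be rejected.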
Exercise: Fit the constrained model using a do file rather than the SEM Builder.
3.8 Methods of Estimation
• ML (maximum likelihood) is the default method; you can also request it explicitly with method(ml). It assumes multivariate normality. Note that the distribution of the χ² statistic is affected by kurtosis in the data.
• ADF (asymptotic distribution free) relaxes the normality assumption but requires a large sample size. You request it with method(adf).
• MLMV (maximum likelihood with missing values) does full information maximum likelihood; you request it with method(mlmv). It assumes multivariate normality and that missing data are missing at random. Note that ML and ADF use listwise deletion.
3.9 Identification
Identification relates to whether it is possible to derive a unique set of parameter estimates from the observed variances and covariances. It is a complicated topic, so we don't want to say too much about it today, but we should say something.
3.9.1 Confirmatory Factor Analysis

• With a single factor, you need at least three indicators for the model to be identified.
• With two or more factors, you need at least two indicators per factor for the model to be identified.
Non-standard CFA models, where some indicators load on multiple factors or some error terms covary, are more complicated.
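The rules above rest on a simple necessary (but not sufficient) counting condition, often called the t-rule: the number of free parameters q cannot exceed the p(p+1)/2 distinct variances and covariances of the p observed variables, and the difference is the model's degrees of freedom. A worked count for the three-factor Holzinger-Swineford CFA, with one loading per factor fixed at 1:

$$
\begin{aligned}
\frac{p(p+1)}{2} &= \frac{9 \times 10}{2} = 45 \\
q &= \underbrace{6}_{\text{free loadings}} + \underbrace{3}_{\text{factor variances}} + \underbrace{3}_{\text{factor covariances}} + \underbrace{9}_{\text{error variances}} = 21 \\
\text{df} &= 45 - 21 = 24
\end{aligned}
$$

This agrees with r(df_ms) = 24 in Section 3.4. The same count for one factor with three indicators gives 6 moments and 2 + 1 + 3 = 6 parameters, so df = 0 and the model is just identified; with only two indicators there would be 3 moments but 4 parameters, which is why three indicators is the minimum for a single factor.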
3.9.2 Structural Models

The situation is simple if the structural model is recursive: the model is then identified. If the model is non-recursive, it is more complicated.
3.9.3 Structural Regression Models

Assuming each latent variable is measured by two or more indicators, the situation is quite simple: if the measurement part of the model is identified and the structural part of the model is identified, then the structural regression model is identified. Again, when one of the latent variables has only one indicator, the situation is more complicated.
4 Datasets
4.1 Holzinger Swineford
HolzingerSwineford1939 {lavaan} R Documentation

Holzinger and Swineford Dataset (9 Variables)
Description
The classic Holzinger and Swineford (1939) dataset consists of mental
ability test scores of seventh- and eighth-grade children from two
different schools (Pasteur and Grant-White). In the original dataset
(available in the MBESS package), there are scores for 26 tests.
However, a smaller subset with 9 variables is more widely used in the
literature (for example in Joreskog’s 1969 paper, which also uses the
145 subjects from the Grant-White school only).
Usage
data(HolzingerSwineford1939)
Format
A data frame with 301 observations of 15 variables.
id       Identifier
sex      Gender
ageyr    Age, year part
agemo    Age, month part
school   School (Pasteur or Grant-White)
grade    Grade
x1       Visual perception
x2       Cubes
x3       Lozenges
x4       Paragraph comprehension
x5       Sentence completion
x6       Word meaning
x7       Speeded addition
x8       Speeded counting of dots
x9       Speeded discrimination straight and curved capitals
Source
This dataset was retrieved from
http://web.missouri.edu/~kolenikovs/stata/hs-cfa.dta
and converted to a csv file.
References
Holzinger, K., and Swineford, F. (1939). A study in factor
analysis: The stability of a bifactor solution. Supplementary
Educational Monograph, no. 48. Chicago: University of
Chicago Press.
Joreskog, K. G. (1969). A general approach to confirmatory
maximum likelihood factor analysis. Psychometrika, 34,
183–202.
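A sketch of the usual three-factor CFA for these nine tests. It assumes that the hs-cfa.dta file at the address above is still available and that it stores the tests as x1-x9, as in the lavaan version; neither has been checked here. Visual, Textual and Speed are conventional labels for the three factors, not names stored in the file.

```stata
* ASSUMPTIONS: the URL is still live and the tests are named x1-x9
use "http://web.missouri.edu/~kolenikovs/stata/hs-cfa.dta", clear

* Capitalised names are latent variables in sem; by default the first
* loading of each factor is constrained to 1. With 9 indicators and 3
* correlated factors the model has 45 - 21 = 24 degrees of freedom.
sem (Visual -> x1 x2 x3) (Textual -> x4 x5 x6) (Speed -> x7 x8 x9)

* Replay with standardised estimates, then overall goodness of fit
sem, standardized
estat gof, stats(all)
```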
4.2 Political Democracy
PoliticalDemocracy {lavaan} R Documentation
Industrialization And Political Democracy Dataset
Description
The famous Industrialization and Political Democracy dataset.
This dataset is used throughout Bollen’s 1989 book (see pages
12, 17, 36 in chapter 2, pages 228 and following in chapter 7,
pages 321 and following in chapter 8). The dataset contains
various measures of political democracy and industrialization
in developing countries.
Usage
data(PoliticalDemocracy)
Format
A data frame of 75 observations of 11 variables.
y1   Expert ratings of the freedom of the press in 1960
y2   The freedom of political opposition in 1960
y3   The fairness of elections in 1960
y4   The effectiveness of the elected legislature in 1960
y5   Expert ratings of the freedom of the press in 1965
y6   The freedom of political opposition in 1965
y7   The fairness of elections in 1965
y8   The effectiveness of the elected legislature in 1965
x1   The gross national product (GNP) per capita in 1960
x2   The inanimate energy consumption per capita in 1960
x3   The percentage of the labor force in industry in 1960
Source
The dataset was retrieved from
http://web.missouri.edu/~kolenikovs/Stat9370/democindus.txt
(see discussion on SEMNET 18 Jun 2009)
References
Bollen, K. A. (1989). Structural Equations with Latent
Variables. Wiley Series in Probability and Mathematical
Statistics. New York: Wiley.
Bollen, K. A. (1979). Political democracy and the timing of
development. American Sociological Review, 44, 572–587.
Bollen, K. A. (1980). Issues in the comparative measurement of
political democracy. American Sociological Review, 45, 370–390.
4.3 Pupil and teacher data set
Correlations (lower triangle) and standard deviations (SD)

Variable                               1        2        3        4        5        6
1. Coercive Control               1.0000
2. Teacher Burnout                0.3557   1.0000
3. School Support                -0.2566  -0.4774   1.0000
4. Teacher-Pupil Interactions    -0.4046   0.0207   0.1864   1.0000
5. School Experience             -0.1615   0.0938   0.0718   0.6542   1.0000
6. Somatic Status                -0.3487  -0.0133   0.1570   0.7277   0.4964   1.0000
SD                                8.3072   9.7697  10.5212   5.0000   3.7178   5.2714
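Only summary statistics are given for this data set, so in Stata they would be entered as summary statistics data with ssd. A minimal sketch follows. The short variable names are hypothetical, and so is the sample size: the table does not report N, so replace it with the value from the original study before fitting any model.

```stata
clear
* Hypothetical short names for the six variables, in table order
ssd init coercive burnout support interact experience somatic

* HYPOTHETICAL sample size: replace with the N of the original study
ssd set observations 100

ssd set sd 8.3072 9.7697 10.5212 5.0000 3.7178 5.2714

* Lower triangle, row by row; "\" separates the rows
ssd set correlations    1 \                               ///
     .3557     1 \                                          ///
    -.2566 -.4774     1 \                                   ///
    -.4046  .0207 .1864     1 \                             ///
    -.1615  .0938 .0718 .6542     1 \                       ///
    -.3487 -.0133 .1570 .7277 .4964 1

* Check what has been set, then display the summary statistics
ssd status
ssd list
```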
4.4 Example 7/8 from Stata
. use http://www.stata-press.com/data/r13/sem_sm1
(Structural model with all observed values)

. ssd describe

Summary statistics data from http://www.stata-press.com/data/r13/sem_sm1.dta
  obs:           329                 Structural model with all obse..
 vars:            10                 25 May 2013 10:13
                                     (_dta has notes)
------------------------------------------------------------------------
variable name   variable label
------------------------------------------------------------------------
r_intel         respondent's intelligence
r_parasp        respondent's parental aspiration
r_ses           respondent's family socioeconomic status
r_occasp        respondent's occupational aspiration
r_educasp       respondent's educational aspiration
f_intel         friend's intelligence
f_parasp        friend's parental aspiration
f_ses           friend's family socioeconomic status
f_occasp        friend's occupational aspiration
f_educasp       friend's educational aspiration
------------------------------------------------------------------------

. notes

_dta:
  1.  Summary statistics data from Duncan, O.D., Haller, A.O., and Portes, A.,
      1968, "Peer Influences on Aspirations: A Reinterpretation", _American
      Journal of Sociology_ 74, 119-137.
  2.  The data contain 329 boys with information on five variables and the same
      information for each boy's best friend.
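A sketch of fitting a non-recursive peer-influence model to these summary statistics data. The specification is an assumption in the spirit of Stata's Example 7, not a copy of it: each boy's occupational aspiration depends on his friend's and vice versa, with correlated disturbances. The r_occasp equation excludes f_intel and the f_occasp equation excludes r_intel, which is what allows the feedback loop to be identified.

```stata
use http://www.stata-press.com/data/r13/sem_sm1, clear

* Display the stored summary statistics
ssd list

* ASSUMED specification: reciprocal effects between r_occasp and
* f_occasp, each equation excluding the other boy's intelligence,
* and covarying disturbances
sem (r_occasp <- f_occasp r_intel r_ses f_ses)   ///
    (f_occasp <- r_occasp f_intel f_ses r_ses),  ///
    cov(e.r_occasp*e.f_occasp) standardized
```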
5 Bibliography
[1] K.A. Bollen. Structural Equations with Latent Variables. John Wiley
and Sons, New York, 1989.
[2] T.A. Brown. Confirmatory Factor Analysis for Applied Research. The
Guilford Press, New York, 2006.
[3] B.M. Byrne. Structural Equation Modeling with AMOS: Basic
Concepts, Applications, and Programming. Routledge, New York, 2nd
edition, 2010.
[4] John Fox. The R Commander: A basic statistics graphical user
interface to R. Journal of Statistical Software, 14(9):1–42, 2005.
[5] J.F. Hair Jr., G.T.M. Hult, C.M. Ringle, and M. Sarstedt, editors.
A Primer on Partial Least Squares Structural Equation Modeling
(PLS-SEM). Routledge, New York, 2010.
[6] G.R. Hancock and R.O. Mueller, editors. The Reviewer’s Guide to
Quantitative Methods in the Social Sciences. Routledge, New York,
2010.
[7] K. Holzinger and F. Swineford. A study in factor analysis: The
stability of a bifactor solution. Number 48 in Supplementary
Educational Monograph. University of Chicago Press, Chicago,
1939.
[8] K. G. Joreskog. A general approach to confirmatory maximum
likelihood factor analysis. Psychometrika, 34:183–202, 1969.
[9] R.B. Kline. Principles and Practice of Structural Equation Modeling.
The Guilford Press, New York, 3rd edition, 2011.
[10] T.D. Little. Longitudinal Structural Equation Modeling. The Guilford
Press, New York, 2013.
[11] R Core Team. R: A Language and Environment for Statistical Computing.
R Foundation for Statistical Computing, Vienna, Austria, 2013.
[12] Yves Rosseel. lavaan: An R package for structural equation
modeling. Journal of Statistical Software, 48(2):1–36, 2012.
[13] Deepayan Sarkar. Lattice: Multivariate Data Visualization with R.
Springer, New York, 2008. ISBN 978-0-387-75968-5.
[14] F.A. Sava. Causes and effects of teacher conflict-inducing attitudes
towards pupils: A path analysis model. Teaching and
Teacher Education, (2):1007–1021, 2002.
[15] R.E. Schumacker and R.G. Lomax. A Beginner’s Guide to Structural
Equation Modeling. Routledge, New York, 3rd edition, 2010.
[16] Mark P.J. van der Loo and Edwin de Jonge. Learning RStudio for
R Statistical Computing. Packt Publishing, Birmingham, UK, 2012.
