
Statistical Modelling
(Special Topic: SEM)
PROF.SANJIV MITTAL
SEM: The Basics
Structural equation modeling
Structural Equation Modelling (SEM) is a
powerful method to estimate multiple and
simultaneous relationships involving
several dependent variables and
explanatory variables, and allows for the
inclusion of latent variables which cannot
be directly measured but can be expressed
as a function of other measurable
variables
Latent v/s Observed Variables
In the behavioral sciences, researchers are often interested in
studying theoretical constructs that cannot be
observed directly. These abstract phenomena are
called latent variables or factors, e.g. self-concept,
motivation, teacher ability, service quality.
To measure them, the researcher must
operationally define the latent variable of interest
in terms of behavior believed to represent it, typically
using attitudinal scales or interview questions.
The resulting measured scores
are termed observed or
manifest variables.
Latent v/s Observed Variables
In SEM terminology they serve as indicators
of the underlying constructs that they
represent. Hence it is necessary to have this
bridging process between the observed
variables and unobserved latent variables.
Exogenous and Endogenous Latent Variables
Latent variables are of two types: exogenous and
endogenous. Exogenous latent variables are also
called independent variables; they "cause"
fluctuations in the values of other latent variables
in the model. Changes in the values of exogenous
variables are not explained by the model. They are
influenced by background factors external to
the model, such as the demographics gender, age
and socio-economic status.
Endogenous latent variables, on the other hand,
are called dependent variables. They are
influenced by the exogenous variables, directly or
indirectly, within the model.
SEM
SEM relaxes the assumptions of regression:
Several dependent variables can be considered at
the same time
Explanatory variables can be assumed to be
measured with random error
Endogenous variables can be used to explain
other dependent variables
Correlation between explanatory variables is
allowed for
And this is not all. Another key feature of SEM is
the possibility of including latent (unobservable)
variables in the model, as either endogenous or
exogenous variables.
SEM is ……
a family of statistical techniques
which incorporates and
integrates
Path analysis
Linear regression
Factor analysis
Causality
Causality has theoretical basis
[Diagrams: Education and Success in Life; Price, Supply and Demand; Unemployment Rate, No. of Crimes and Windows of Opportunity for Crime]
Cause and Effect
Philosophers have had a great deal to say
about the conditions necessary to infer
causality, i.e. cause and effect:
the cause should occur before the effect is
observed, and
the cause should never occur without the
presence of the effect.
Statistical Modeling
A Statistical Model DOES NOT necessarily have a
theoretical basis. Its results may either
'make sense' or be 'nonsense'.
[Diagrams: Weight, Heart Disease, Income and Smoking; No. of Road Accidents and No. of Newspaper Readers]
Variables in SEM
• Manifest variables can be directly measured and serve
as indicators/items
• Latent variables are not observable and are the actual
components of the causal relationship
• Just like variables in regression analysis, one can distinguish
between exogenous constructs and endogenous constructs in
a causal relationship
• Exogenous constructs only act as explanatory factors and
do not depend on any other construct
• Endogenous constructs play the role of the dependent
variables and appear on the left-hand side of at least
one of the SEM causal relationships
Examples of marketing applications for SEM
• Scarcity and willingness to buy: how perceived scarcity influences
willingness to buy through a set of mediating variables, like perceived
quality, perceived monetary sacrifice and perceived symbolic benefits
(Wu and Hsing (2006))
• E-shopping: on-line shopping as the interaction of four variables:
education, technological know-how, economic condition and trust
(Mahmood et al. (2005))
• Customer relationship: customer satisfaction for credit cards
considering two latent constructs, customer service and card member
rewards, as measured by some manifest variables, like being courteous,
accurate in answering questions, correcting errors (for customer
service) or ease of obtaining rewards (Dillon et al. (1997))
SEM & related techniques
Structural Equation Modelling is a
comprehensive method which includes as
special cases:
◦ confirmatory factor analysis
◦ path analysis
◦ multivariate regression (simultaneous
equation systems)
[Path diagram: two exogenous latent variables/constructs, Verbal Ability (indicators x1 to x4, errors d1 to d4) and Quantitative Ability (indicators x5 to x8, errors d5 to d8), and one endogenous latent variable, General Intelligence (indicators y1 to y4, errors e1 to e4). Each error path is fixed at 1.]
SEM Nomenclature
Independent variables, which are assumed to
be measured without error, are called
exogenous or upstream variables;
Dependent or mediating variables are called
endogenous or downstream variables.
Manifest or observed variables or indicators
are directly measured by researchers
Latent or unobserved variables are not directly
measured but are inferred by the relationships
or correlations among measured variables in
the analysis. Example, self-concept,
motivation, powerlessness, verbal ability,
capitalism, social class.
SEM Nomenclature (cont.)
SEM illustrates relationships among
observed and unobserved variables using
path diagrams.
Ovals or circles represent latent variables,
Rectangles or squares represent measured
variables.
Residuals/error terms are always
unobserved, so they are represented by
ovals or circles.
SEM – Definition
• SEM is an extension of the general linear
model (GLM) that enables a researcher to
test a set of regression equations
simultaneously.
• SEM consists of TWO components:
◦ Structural Model
– illustrates the relationships among the latent
constructs or endogenous variables
◦ Measurement Model
– represents how the constructs are related to their
indicators or manifest variables
Example
In psychology, the theory
postulates that …
[Path diagram linking three latent constructs: Ability / Intelligence (exogenous latent construct, ξ1), Aspirations (endogenous latent construct, η1) and Achievement (endogenous latent construct, η2)]
Full Latent Variable Model
[Path diagram: latent constructs Ability, Aspiration and Achievement with their indicators: Academic Skill, x1; Interpersonal Skill, x2; Communication Skill, x3; Family Status, y1; Father's Occupation, y2; Peer's Influence, y3; Personal Actualization, y4; Professional Status, y5; Social Status, y6]
Example:
ONE Latent (unobserved) Exogenous Variable &
TWO Latent (unobserved) Endogenous Variables
[Path diagram]
Structural Model: ξ1 → η1 (γ11), ξ1 → η2 (γ21), η1 → η2 (β21), with disturbances ζ1 and ζ2
Measurement Model: x1, x2, x3 load on ξ1 (λx11, λx21, λx31) with errors δ1 to δ3; y1, y2, y3 load on η1 (λy11, λy21, λy31) and y4, y5, y6 load on η2 (λy42, λy52, λy62) with errors ε1 to ε6
Structural Model
The structural model allows for certain
relationships among the latent variables,
depicted by lines or arrows (in a path diagram)
In the path diagram, we specified that Ability
and Achievement were related in a specific way.
That is, ability (intelligence) had some influence on later
achievement.
Thus, one result from the structural model is an
indication of the extent to which these a priori
hypothesized relationships are supported by our
sample data.
Structural Model (Cont.)
The structural equation addresses the
following questions:
Are Ability and Achievement related?
Exactly how strong is the influence of
Ability on Achievement?
Could there be other latent variables that
we need to consider to get a better
understanding of the influence on
Achievement?
Measurement Model
• Specifies the relationship between the latent
variables and the observed variables
• Answers the questions:
1) To what extent are the observed variables
actually measuring the hypothesized latent
variables?
2) Which observed variable is the best measure of
a particular latent variable?
3) To what extent are the observed variables
actually measuring something other than the
hypothesized latent variable?
• Uses Exploratory Factor Analysis (EFA) or
Confirmatory Factor Analysis (CFA) to
determine the significant observed variables
related to each of the latent variables
Measurement Model (Cont.)
The relationships between the observed
variables and the latent variables are described
by factor loadings
Factor loadings provide information about the
extent to which a given observed variable is able
to measure the latent variable. They serve as
validity coefficients.
Measurement error is defined as that portion of
an observed variable that is measuring
something other than what the latent variable is
hypothesized to measure. The smaller it is, the more
reliable the observed variable.
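The structural and measurement parts of the Ability, Aspiration and Achievement example can be written out as equations. The following is a sketch in the conventional LISREL-style notation, which is an assumption here: ξ for the exogenous construct, η for the endogenous constructs, γ and β for structural coefficients, λ for factor loadings, ζ for disturbances, and δ and ε for measurement errors.

```latex
% Structural model
\eta_1 = \gamma_{11}\,\xi_1 + \zeta_1
\eta_2 = \gamma_{21}\,\xi_1 + \beta_{21}\,\eta_1 + \zeta_2

% Measurement model
x_i = \lambda^{x}_{i1}\,\xi_1 + \delta_i,        \quad i = 1, 2, 3
y_j = \lambda^{y}_{j1}\,\eta_1 + \varepsilon_j,  \quad j = 1, 2, 3
y_j = \lambda^{y}_{j2}\,\eta_2 + \varepsilon_j,  \quad j = 4, 5, 6
```

Each loading λ plays the role of a validity coefficient, and the variance of each δ or ε is the measurement error discussed above.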
Exploratory FA (EFA)
• In EFA the factor structure or theory about a
phenomenon is NOT KNOWN.
• For example, the researcher is interested in
measuring "the achievement of personnel".
• Suppose he has no knowledge (or very little theory)
regarding
◦ the factors that contribute to achievement
◦ the no. of indicators of each factor
◦ which indicators represent which factor
• In such a case, the researcher may collect data
and 'explore' for a factor structure or theory which can
explain the correlations among the indicators.
The Factor Analytic Model
The best-known method for investigating the
relationship between a set of observed variables and latent
variables is factor analysis. Here the
researcher examines the covariance among a set
of observed variables to gather information on
the underlying constructs or factors.
It is normally performed in situations when the
links between the observed and latent variables
are unknown or uncertain. So analysis in an
exploratory mode proceeds to determine how
and to what extent the observed variables are
linked to their underlying factors.
The Factor Analytic Model
The researcher attempts to find the
minimal number of factors that account
for the covariances or correlations among the
observed variables. In factor analysis
these relationships are represented by
factor loadings. The analysis is exploratory in nature
because the researcher has no prior knowledge that
the items do, indeed, measure the
intended factors.
Confirmatory FA (CFA)
• In CFA the precise factor structure or theory about a
phenomenon is KNOWN or specified a priori.
• For example, a researcher is interested in measuring
"consumer preference" for a product.
• Suppose that, based on previous research, it is hypothesized
(the theory) that the construct or factor measuring 'consumer
preference' is
◦ a one-dimensional construct with 7 indicators or items as its
measures
• The obvious question is:
◦ How well do the empirical data conform to the theory of
consumer preferences? Or
◦ How well do the data fit the model?
• In such a case, CFA is used for empirical 'confirmation' or
'testing' of the theory
Confirmatory Factor Analysis
Is used when the researcher has prior knowledge
of the underlying latent variable structure. The
relationships between the observed measures and
the underlying factors are specified "a priori" and then
tested statistically using SEM. Each item is
free to load on its hypothesized factor but restricted to have
zero loadings on the remaining factors. The
measurement model is evaluated statistically
by assessing its goodness of fit
to the sample data.
Reliability
• Definition: the extent to which a variable or set of
variables is consistent in what it is intended to
measure
• If multiple measurements are taken, reliable
measures will all be consistent in their values
• It is the degree to which the observed variable
measures the "true" value and is "error free"
• It is different from validity
Reliability
• The degree to which scores are free from
random measurement error
• Reliability measures
– Internal Consistency Reliability
– Test-retest Reliability
– Alternate Forms Reliability
Reliability
• Levels of Reliability
– 0.90 Excellent
– 0.80 Very Good
– 0.70 Adequate
– <0.70 Poor
Example: Reliability of Observed Variables
Cronbach's alpha was computed for all variables
Variable     No. of items   Reliability
Variable1    10             .91
Variable2    10             .87
Variable3    10             .58
Variable4    10             .70
Variable5    12             .72
Variable6    12             .80
Variable7    12             .80
Variable8    12             .87
Variable9    10             .84
Variable10   7              .71
Variable11   4              .48
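As a minimal sketch of how an internal-consistency coefficient like those in the table might be computed, the following applies the standard Cronbach's alpha formula, alpha = k/(k-1) x (1 - sum of item variances / variance of the total score). The response data and the function name `cronbach_alpha` are made up for illustration and are not taken from the study behind the table.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: list of columns, one list of scores per item,
           all answered by the same respondents in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    # Total scale score of each respondent
    totals = [sum(scores) for scores in zip(*items)]
    item_var_sum = sum(variance(col) for col in items)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))


# Hypothetical data: 3 items answered by 5 respondents
items = [
    [2, 3, 4, 4, 5],
    [1, 3, 3, 4, 5],
    [2, 2, 4, 5, 5],
]
alpha = cronbach_alpha(items)
print(round(alpha, 3))
```

Against the levels listed earlier, a value of this size would count as excellent, whereas Variable3 (.58) and Variable11 (.48) in the table fall below the .70 threshold.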
Types of Reliability
Test-retest
Assessed by administering the same instrument to
the same sample of respondents at two points in time,
and computing the correlation between the two sets of
scores.
Internal consistency reliability
The extent to which the individual items that constitute a
test correlate with one another or with the test total.
In short, it measures how consistently respondents
respond to the items within a scale.
Validity
Definition: the extent to which an item or set of
items correctly represents the construct under
study, i.e. the degree to which it is free from any
systematic or non-random error
Validity deals with
How well the construct is defined by the item/s
(WHAT should be measured)
While Reliability deals with
How consistent the item/s is/are in measuring the
construct (HOW it is measured)
Validity
Whether the scores measure what they are
supposed to measure
Types of validity
• Construct Validity
• Criterion-Related Validity or Nomological Validity
(correlation with an external standard)
• Convergent Validity (SEM Confirmatory Factor Analysis
helps to establish convergent validity)
• Discriminant Validity (can be determined through SEM
Confirmatory Factor Analysis)
Types of Measurement Scale
There are 4 types of measurement scale in a scale
instrument
◦ Nominal Scale
◦ Ordinal Scale
◦ Interval Scale
◦ Ratio Scale
Other common scales, such as Likert scales,
Semantic Differential Scales and Dichotomous
Scales, can be categorized into the 4 above
Metric and Non-metric Scales
Metric scales yield quantitative data, where the
values of the scale lie on a continuum
◦ Interval or Ratio scale data
Non-metric scales yield qualitative data: the
attributes, characteristics or categorical
properties that identify or describe a subject or
object
◦ Nominal or Ordinal scale data
VARIABLE SCALES
SEM in general assumes observed variables are
measured on a linear continuous scale. Only metric
(interval or ratio) scale data are normally used in SEM.
Correlation
Perhaps the most basic concept
◦ Definition: the linear relationship between two variables
The strength of the relationship is given by the
correlation coefficient and R².
There are 2 common types of correlation
coefficient
◦ Pearson Product Moment Correlation (Interval)
◦ Spearman Rank Correlation (Ordinal)
The former is the one used here
Covariance
The covariance between two variables
equals the correlation times the product of
the variables' standard deviations. The
covariance of a variable with itself is the
variable's variance, which equals one only when
the variable is standardized.
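The relationship between covariance, correlation and standard deviations can be checked numerically. This is a small sketch on made-up data; the helper names `covariance` and `correlation` are illustrative and not part of any SEM package.

```python
from statistics import mean, stdev


def covariance(x, y):
    """Sample covariance (divides by n - 1)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def correlation(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(x, y) / (stdev(x) * stdev(y))


# Hypothetical scores on two variables
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 2.0, 6.0]

cov_xy = covariance(x, y)
r_xy = correlation(x, y)

# covariance = correlation x product of the standard deviations
rebuilt = r_xy * stdev(x) * stdev(y)

# The covariance of a variable with itself is its variance ...
var_x = covariance(x, x)

# ... which equals one only after standardizing the variable
zx = [(v - mean(x)) / stdev(x) for v in x]
var_zx = covariance(zx, zx)
```

This is why a correlation matrix (all variances equal to one) is simply the covariance matrix of the standardized variables.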
Factors Affecting Correlation/
Covariance Coefficient
• Type of scale and range of values
◦ Pearson correlation is the basis for analysis in regression, path analysis, factor
analysis and SEM. Hence data must be in metric form.
◦ There must be enough variation in the scores to allow a correlation
to manifest.
• Linearity
◦ The Pearson correlation coefficient measures the degree of linear relationship
between two variables, hence the need to test linearity.
• Sample size
◦ SEM requires a big sample size. Rule of thumb: 10-20 times the number
of variables. Ding, Velicer and Harlow (1995): 100-150; Boomsma
(1982, 1983): 400; Hu, Bentler and Kano (1992): in some cases 5000 is
still insufficient; Schumacker and Lomax (1999): many articles use 250-500;
Bentler and Chou (1987): for normal data 5 subjects per variable is
sufficient.
SEM Assumptions
Sample Size
a good rule of thumb is >15 cases per
predictor / indicator (James Stevens’
Applied Multivariate Statistics for the
Social Sciences)
Model with TWO factors,
recommended sample size >100
Model with FOUR factors,
recommended sample size > 200
SEM Assumptions (cont.)
Sample Size
Consequences of using smaller samples
◦ convergence failures (the software cannot reach a
satisfactory solution),
◦ improper solutions (including negative error
variance estimates for measured variables),
◦ lowered accuracy of parameter estimates and, in
particular, standard errors
SEM program standard errors are computed
under the assumption of large sample sizes.
Normality
Many SEM estimation procedures assume
multivariate normal distributions
Lack of univariate normality occurs when
the skew index is > 3.0 and kurtosis index
> 10.
Multivariate normality can be detected by
indices of multivariate skew or kurtosis
Non-normal distributions can sometimes be
corrected by transforming variables
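A rough univariate screen along these lines can be sketched as follows. Several choices here are assumptions rather than anything stated in the slides: the skew index is taken as the moment coefficient of skewness, the kurtosis index as excess kurtosis (kurtosis minus 3), and a variable is flagged when either cut-off is exceeded in absolute value. The function names and data are hypothetical.

```python
from statistics import mean


def skew_and_kurtosis(values):
    """Return (skew index, kurtosis index) from central moments.

    Assumption: the kurtosis index is excess kurtosis (normal = 0).
    """
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("variable has no variance")
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def flags_nonnormal(values, skew_cut=3.0, kurt_cut=10.0):
    """True when either index exceeds its cut-off (assumed rule)."""
    skew, kurt = skew_and_kurtosis(values)
    return abs(skew) > skew_cut or abs(kurt) > kurt_cut


symmetric = [1, 2, 3, 4, 5]           # no skew at all
one_outlier = [0] * 19 + [100]        # a single extreme score

print(flags_nonnormal(symmetric))     # screen result for the symmetric data
print(flags_nonnormal(one_outlier))   # screen result for the outlier data
```

A screen like this only addresses univariate normality; multivariate skew or kurtosis indices are still needed for the multivariate assumption.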
Multicollinearity
• Occurs when inter-correlations among some
variables are so high that certain mathematical
operations are impossible. Multicollinearity should be
avoided. At the other extreme, the correlation matrix
should not be an identity matrix: there must be some
covariance between the independent and
dependent variables.
Jargon
Measured variable
◦ Observed variables, indicators or manifest
variables in an SEM design
◦ Predictors and outcomes in path analysis
◦ Squares in the diagram
Latent Variable
◦ Un-observable variable in the model, factor,
construct
◦ Construct driving measured variables in the
measurement model
◦ Circles in the diagram
Jargon
Error or E
◦ Variance left over after prediction of a measured
variable
Disturbance or D
◦ Variance left over after prediction of a factor
Exogenous Variable
◦ Variable that predicts other variables
Endogenous Variables
◦ A variable that is predicted by another variable
◦ A predicted variable is endogenous even if it in
turn predicts another variable
Jargon
Measurement Model
◦ The part of the model that relates indicators to
latent factors
◦ The measurement model is the factor analytic
part of SEM
Path model
◦ This is the part of the model that relates
variable or factors to one another (prediction)
◦ If no factors are in the model then only path
model exists between indicators
Jargon
Direct Effect
◦ Regression coefficients of direct prediction
Indirect Effect
◦ Mediating effect of x1 on y through x2
Confirmatory Factor Analysis
Covariance Structure
◦ Relationships based on variance and covariance
Mean Structure
◦ Includes means (intercepts) into the model
Diagram elements
Single-headed arrow →
◦ This is prediction
◦ Regression Coefficient or factor loading
Double headed arrow ↔
◦ This is correlation
Missing Paths
◦ Hypothesized absence of relationship
◦ Can also set path to zero
Degrees of freedom (DoF)
• DoF count the number of independent
observations available for estimation
• Formula: df = ½ p(p+1) − k, where p = total no.
of observed variables and k = no. of estimated parameters
• Rule of thumb for identification: at least
three manifest indicators for each latent
variable.
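The degrees-of-freedom formula is easy to sketch in code. The function below follows the formula above; the `include_means` switch is an added assumption for the case where the variable means are also counted as sample moments, as software may do when means and intercepts are estimated. The first example uses hypothetical numbers.

```python
def degrees_of_freedom(p, k, include_means=False):
    """df = p(p + 1)/2 - k.

    p: number of observed variables
    k: number of parameters to be estimated
    include_means: if True, also count the p means as sample moments
                   (an assumption, not part of the formula on the slide)
    """
    moments = p * (p + 1) // 2   # distinct variances and covariances
    if include_means:
        moments += p
    return moments - k


# Hypothetical model: 6 observed variables, 13 free parameters
df_small = degrees_of_freedom(6, 13)

# 14 observed variables with means counted give 119 moments;
# with 45 parameters this matches the 74 df in the AMOS output shown later
df_amos = degrees_of_freedom(14, 45, include_means=True)
```

That the AMOS example has 14 observed variables with means counted is an inference from its reported 119 sample moments, not something the slides state.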
Estimation
Maximum likelihood estimation (MLE)
• the manifest indicators must follow a multivariate normal distribution (i.e.
they are normally distributed for any value of the other indicators)
• the latent constructs are also assumed to be normally distributed
Key aspects of SEM estimation
• Individual cases only enter the estimation process to obtain the covariance
matrix
• SEM does not use the individual observations (cases) for the estimation of
the parameters
• Estimation is based on the covariance matrix not on the individual cases
• Thus, degrees of freedom refer to the elements in the covariance
matrix
• An adequate sample size is still needed
• Identification problems may emerge when many of the elements of the
covariance matrix are close to zero unless the sample size is large enough
• A simple rule of thumb requires at least fifteen observations per measured
variable or indicator
Testing
How is the theory tested?

• First, parameter estimates should be reasonable,
both in terms of the founding theory and for
statistical acceptability
• Goodness-of-fit tests
Goodness-of-fit indicators
Three Types of Fit Indices
1. Absolute Fit Indices
a) Minimum sample discrepancy (CMIN), the model Chi-square, tests whether the model
fits the data perfectly (very unlikely and not really useful as a test).
• When this measure is divided by the degrees of freedom (CMIN/DF), one
obtains the normed Chi-square, NCS = Chi-square/df
(values between 1 and 5 are usually considered acceptable)
b) Root mean square residual (RMR), or its standardized version SRMR (Standardized
Root Mean Square Residual), summarizes the residuals between the estimated and sample
covariance matrices. It can be used to compare alternative models, where
a smaller value indicates better fit, so it is a badness-of-fit measure. Individual
standardized residuals below −4.0 or above 4.0 point to a problem.
c) Goodness-of-fit index (GFI) should be above 0.90 for acceptable
models. The adjusted version (AGFI) has a similar interpretation
Goodness-of-fit indicators
2. Incremental Fit Indices
• These goodness-of-fit indices are expected to be as close as
possible to one (and not below 0.90):
◦ normed fit index (NFI)
◦ relative fit index (RFI)
◦ incremental fit index (IFI)
◦ Tucker-Lewis coefficient (TLI)
◦ comparative fit index (CFI)
• Root Mean Square Error of Approximation (RMSEA): a badness-of-fit index
with a known sampling distribution, so a confidence interval can be constructed giving a
range of RMSEA values for a given level of confidence, together with a test of
close fit (p-close). Values between 0.03 and 0.08 indicate good fit;
values should be below 0.10 for the model to fit.
3. Parsimony Fit Indices: see the value of Chi-square/df
• Others include the parsimony goodness-of-fit index (PGFI); like the other
indices, its values range between 0 and 1
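To make these indices concrete, the sketch below computes several of them from the Chi-square values of a fitted model and of the independence (baseline) model, using the standard textbook formulas. The function names are illustrative. The input numbers are the ones reported in the AMOS output reproduced later in these slides (default model: Chi-square 619.428 with 74 df; independence model: 1736.822 with 105 df; F0 = 1.093).

```python
from math import sqrt


def chi_square_fit_indices(chi2_m, df_m, chi2_b, df_b):
    """Fit indices from the model (m) and baseline (b) Chi-squares."""
    ratio_m = chi2_m / df_m
    ratio_b = chi2_b / df_b
    return {
        "cmin_df": ratio_m,                                # normed Chi-square
        "nfi": 1 - chi2_m / chi2_b,                        # normed fit index
        "tli": (ratio_b - ratio_m) / (ratio_b - 1),        # Tucker-Lewis
        "cfi": 1 - max(chi2_m - df_m, 0)
                 / max(chi2_b - df_b, chi2_m - df_m, 0),   # comparative fit
    }


def rmsea_from_f0(f0, df_m):
    """RMSEA as the square root of the population discrepancy per df."""
    return sqrt(f0 / df_m)


idx = chi_square_fit_indices(619.428, 74, 1736.822, 105)
rmsea = rmsea_from_f0(1.093, 74)

for name, value in idx.items():
    print(f"{name}: {value:.3f}")
print(f"rmsea: {rmsea:.3f}")
```

With these inputs the normed Chi-square is above 5, the incremental indices are well below 0.90 and the RMSEA is above 0.10, so every criterion listed above points to poor fit for that model.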
Measurement Error
There is always some degree of measurement
error when we cannot measure a concept
perfectly. For example, when asking directly
about household income, we know that
some people will answer incorrectly, either
overstating or understating it. This is measurement
error. Accounting for measurement error corrects
the weakening (attenuation) of the coefficients in our
dependence models, which is why the error concept
is built into the SEM model.
Example
Retail managers are interested in knowing
how individuals become loyal, committed
customers. They ask the research
question: How do customer perceptions
of three key strategic elements (price,
service and atmosphere) determine customer
acceptance of the store, measured by
customer share and customer commitment?
Example contd…..
From their experience they developed a series of
relationships they feel explain the process:
Better price perceptions increase customer share.
Better service perceptions increase customer
share.
Better store atmosphere perceptions increase
customer share.
Higher customer share increases customer
commitment.
Structural Equation Model
The relationships identified by the retail
managers include five constructs: perceptions of
price, service and store atmosphere, along with
customer share and customer commitment. The
first step is to identify which constructs can be
considered exogenous versus endogenous.
From our relationships we can identify three
exogenous constructs and two endogenous
constructs:
Exogenous Constructs    Endogenous Constructs
Price                   Customer Share
Service                 Customer Commitment
Atmosphere
With the constructs specified, the relationships
can be represented in a path diagram. Figure 1
portrays the four relationships suggested by the
retail managers.
Path Diagram Figure:-1
[Path diagram: Price, Service and Atmosphere → Customer Share → Customer Commitment]
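Path Diagram Figure:-1
[Path diagram with estimates]
Price → Customer Share: .065
Service → Customer Share: .219
Atmosphere → Customer Share: .454
Customer Share → Customer Commitment: .500
Price ↔ Service: .200
Price ↔ Atmosphere: .200
Service ↔ Atmosphere: .150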
Covariance matrix (showing observed covariances)
                      Price      Service    Atmosphere  Customer share  Customer commitment
Price                 Var(P)
Service               Cov(P,S)   Var(S)
Atmosphere            Cov(P,A)   Cov(S,A)   Var(A)
Customer share        Cov(P,CS)  Cov(S,CS)  Cov(A,CS)   Var(CS)
Customer commitment   Cov(P,CC)  Cov(S,CC)  Cov(A,CC)   Cov(CS,CC)      Var(CC)
Covariance matrix (observed covariances with values)
                      Price      Service    Atmosphere  Customer share  Customer commitment
Price                 Var(P)
Service               0.20       Var(S)
Atmosphere            0.20       0.15       Var(A)
Customer share        0.20       0.30       0.50        Var(CS)
Customer commitment   -.05       0.25       0.40        0.50            Var(CC)
From the observed values for Price, Service and Atmosphere
we can estimate the value of Customer Share using the
equation
Y(cs) = .065(price) + .219(service) + .454(atmosphere)
Similarly, predicted values of Customer Commitment can be
obtained as
Y(cc) = .50(cs)
or Y(cc) = .50{.065(price) + .219(service) + .454(atmosphere)}
Now look at one relationship, that between Service and
Customer Share
Direct Path
Service to Customer Share = .219
Indirect Paths
Service to Price to Customer Share = .20 x .065 = .013
Service to Atmosphere to Customer Share = .15 x .454 = .068
So Total: Direct + Indirect = .219 + .013 + .068 = .30
Thus the estimated covariance between Service and Customer
Share is .30, the sum of the direct and indirect paths.
Hence the RESIDUAL (R) in SEM is the difference between an
observed covariance and the corresponding estimated covariance.
Thus when we compare the observed and estimated covariance
matrices, any differences we detect are the residuals.
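The path-tracing arithmetic can be repeated for every pair of constructs. The sketch below uses the path estimates and observed covariances from the retail example and assumes all constructs are standardized (variance 1), so that the correlations among Price, Service and Atmosphere can be used directly. The helper names are illustrative.

```python
# Path estimates from the retail example
paths_to_cs = {"price": 0.065, "service": 0.219, "atmosphere": 0.454}
cs_to_cc = 0.500
corr = {
    ("price", "service"): 0.200,
    ("price", "atmosphere"): 0.200,
    ("service", "atmosphere"): 0.150,
}


def r(a, b):
    """Correlation between two exogenous constructs (1.0 with itself)."""
    if a == b:
        return 1.0
    return corr.get((a, b), corr.get((b, a)))


def implied_cov_with_cs(x):
    """Direct path plus indirect paths through the other two constructs."""
    return sum(r(x, other) * coef for other, coef in paths_to_cs.items())


def implied_cov_with_cc(x):
    """Customer Commitment is reached only through Customer Share."""
    return cs_to_cc * implied_cov_with_cs(x)


# Observed covariances from the covariance matrix with values
observed_cs = {"price": 0.20, "service": 0.30, "atmosphere": 0.50}
observed_cc = {"price": -0.05, "service": 0.25, "atmosphere": 0.40}

# Residual = observed covariance - estimated (implied) covariance
residuals_cs = {x: observed_cs[x] - implied_cov_with_cs(x) for x in observed_cs}
residuals_cc = {x: observed_cc[x] - implied_cov_with_cc(x) for x in observed_cc}

for x in observed_cc:
    print(x, round(residuals_cs[x], 3), round(residuals_cc[x], 3))
```

The covariances with Customer Share are reproduced almost exactly, while those with Customer Commitment leave noticeable residuals. That is expected, because the model allows Price, Service and Atmosphere to affect commitment only through customer share.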
Reflective v/s Formative Model
The distinction lies in the direction of the relationship. If it runs from the
construct to the measures, the model is called a
Reflective model.
[Diagram: Latent → Measure]
In Reflective Models
a) Measures denote the effects of the underlying
latent construct.
b) A change in the latent variable causes variation in all
measures simultaneously, and
c) All measures in a reflective model should be positively
correlated.
Formative Measurement Model
[Diagram: Measure → Latent]
If the relationship runs from the
measures to the construct, it is
called formative measurement.
In Formative Models
a) Measures are causes of the construct, i.e. the indicators
determine the latent construct.
b) The latent variable is the dependent variable and the
indicators or measures are the explanatory variables.
c) Omitting any one indicator or measure will alter the
nature of the construct.
d) Measures may or may not be correlated with
one another.
e) Formative indicators are assumed error free, i.e. they have no individual
measurement error term.
f) The error term sits at the construct level only.
g) Used to form indices.
Types of Model Identification
There are three types of model identification:
1. Just-Identified Model
2. Unidentified Model
3. Over-Identified Model
In AMOS the model needs to be either just-identified
or over-identified; otherwise
estimates cannot be calculated.
Types of Model Identification
Two quantities are used in the calculation:
a. No. of distinct sample moments
b. No. of distinct parameters to be estimated
df = Degrees of Freedom = a - b
If a - b = 0, the model is just-identified: it reproduces the data perfectly (GFI = 1), so
model fit cannot be tested. Such a model is called a saturated
model.
When df is positive (say, 1), the model is over-identified.
When the no. of parameters to be estimated is greater than the no. of
sample moments, df becomes negative and the model is unidentified:
no estimates
can be made.
So the model should be over-identified if both the estimates and the model
fit are to be assessed.
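The counting rule can be expressed as a short helper. This is only a sketch of the rule as stated above; the function name is illustrative, and a non-negative df is a necessary condition for identification rather than a guarantee of it.

```python
def identification_status(sample_moments, free_parameters):
    """Classify a model by the counting rule df = moments - parameters."""
    df = sample_moments - free_parameters
    if df < 0:
        return df, "unidentified"
    if df == 0:
        return df, "just-identified"
    return df, "over-identified"


# Values from the AMOS output shown later: 119 moments, 45 parameters
print(identification_status(119, 45))

# Hypothetical models for the other two cases
print(identification_status(15, 15))
print(identification_status(10, 12))
```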
The SPSS package for SEM – AMOS
• Graphical interface
• AMOS reads SPSS data sets
• Alternative software
◦ LISREL, the first computer program for SEM, which dates back to 1973 after the
pioneering work of the statistician Karl Jöreskog (1967 and 1969) and has
evolved together with the method
◦ SAS also provides a procedure for estimating structural equation models, PROC
CALIS
◦ Other packages designed for structural equation models are EQS and Mplus
Drawing the path diagram
Open AMOS graphics from the AMOS
menu
The AMOS graphic interface
The path diagram can be
drawn directly here
These are elements and
functions to draw the diagrams
easily
Some useful functions
• Draw a square (manifest variable)
• Draw a circle (latent variable)
• Draw a single arrow (causation)
• Draw a double arrow (correlation)
• Draw a latent variable or add indicators to a latent variable
• When clicking on an indicator, associate it with a latent unobserved variable (generally an error); otherwise it creates the latent variable
• Copy objects, move objects, delete objects
The path diagram
The data
• To include the observed variables it is now time to open the SPSS
file, or simply click on this
button
Click on FILE NAME to open the
data, then click on OK
Adding names of variables from the data-set
Click here to see the list of
available variables
Simply drag the desired variables
on to the desired square
Unobservable variables and errors
• It is also necessary to give names to the latent variables and to the errors
(circles)
• Latent variables: right-click on the desired circle, select PROPERTIES and
assign a name and a label
• Errors: all remaining variables can be assigned an automated name as follows
The model is now ready
[Path diagram: error terms e1 to e11 attached to the indicators bba to bbk, which load on the latent variable A; SN, PBC and ITP, with error term e12. Error means are fixed at 0 and error paths at 1.]
Before estimation
Means and intercepts must be
estimated if there
are missing data
Choose the estimation
method
Decide whether results
for the independence
and saturated models should
be shown
Analysis properties
Set estimation options
Convergence criteria
Ask for additional output (e.g. factor weights for the measurement model)
Estimation
Estimation output
As the estimation procedure
terminates this button becomes
available to show estimates
directly in the path diagram
This pane shows the final
Chi-square and degrees of
freedom and signals
potential convergence
problems
As expected, the model is
over-identified
Output view
Click here (or press F10)
to see the full output as
text
AMOS output window
Summary information
Final estimates
Goodness-of-fit evaluation
Degrees of freedom and Chi-square
Variable summary
Parameter estimates
Output
Computation of DoF (Default model)
Number of distinct sample moments: 119
Number of distinct parameters to be estimated: 45
Degrees of freedom (119 - 45): 74
DoF show that the model is over-identified
Result (Default model)
Minimum was achieved
Chi-square = 619.428
Degrees of freedom = 74
Probability level = .000
The Chi-square statistic is high and
significant: this means that there is a
discrepancy between the observed
covariance matrix and the estimated one
This is a frequent result, but the Chi-square
statistic tends to be inflated when there are
many DoF (or large sample sizes) and is very
sensitive to the assumption of multivariate
normality
Estimates
               Estimate   S.E.    C.R.      P
bba <--- A      1.000
bbb <--- A       .985     .074   13.298    ***
bbc <--- A      -.097     .067   -1.461    .144
bbd <--- A       .836     .067   12.496    ***
bbe <--- A      1.045     .078   13.485    ***
bbf <--- A       .853     .061   13.872    ***
bbg <--- A       .887     .074   11.953    ***
bbh <--- A       .965     .079   12.296    ***
bbi <--- A      -.297     .064   -4.628    ***
bbj <--- A       .674     .077    8.807    ***
bbk <--- A      -.028     .084    -.327    .744
q7 <--- A        .073     .012    6.307    ***
q7 <--- sn       .008     .008    1.051    .293
q7 <--- pbc      .016     .010    1.581    .114
The coefficient for bba is constrained to
one in order to ensure identification
The rows bba to bbk are the loadings of the
manifest indicators on the latent
variable attitude (A)
Attitudes have a positive and
significant (albeit small) impact on
intentions
SN & PBC do not have a significant
impact on intentions
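The P column can be reproduced from the critical ratio (C.R.). The sketch below assumes the C.R. is treated as a standard normal z statistic under the null hypothesis, so the two-sided p-value is the probability of a value at least that extreme. The function name is illustrative.

```python
from math import erfc, sqrt


def two_sided_p(critical_ratio):
    """Two-sided p-value for a z statistic (standard normal assumed)."""
    return erfc(abs(critical_ratio) / sqrt(2.0))


# Critical ratios of the non-significant rows in the table
reported = {
    "bbc <--- A": (-1.461, 0.144),
    "bbk <--- A": (-0.327, 0.744),
    "q7 <--- sn": (1.051, 0.293),
    "q7 <--- pbc": (1.581, 0.114),
}

for path, (cr, p_table) in reported.items():
    print(f"{path}: computed p = {two_sided_p(cr):.3f}, table p = {p_table}")
```

Rows marked *** have critical ratios so large in absolute value that the p-value is below .001.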
Standardized regression weights
               Estimate
bba <--- A        .735
bbb <--- A        .656
bbc <--- A       -.072
bbd <--- A        .616
bbe <--- A        .661
bbf <--- A        .682
bbg <--- A        .594
bbh <--- A        .633
bbi <--- A       -.228
bbj <--- A        .447
bbk <--- A       -.017
q7  <--- A        .324
q7  <--- sn       .048
q7  <--- pbc      .071
Standardized weights can be interpreted like correlations, but they assume
causation.
The behavioral belief which is most important to measure the latent construct
(attitude) is bba, i.e. chicken taste.
The weights of bbc, bbi and bbk are negatively related to attitudes. They
measure the following beliefs:
bbc: difficulty of preparation
bbi: the agreement with the statement that chicken lacks flavour
bbk: animal welfare concern
A standardized weight of 0.32 indicates a positive but small relation between
attitudes and intentions to purchase chicken.
Correlations (no causation)
Covariances: (Group number 1 - Default model)
               Estimate   S.E.    C.R.     P
sn  <--> pbc     3.712   4.219    .880   .379
pbc <--> A      -2.880   3.548   -.812   .417
sn  <--> A      12.017   4.461   2.694   .007
Correlations: (Group number 1 - Default model)
               Estimate
sn  <--> pbc      .040
pbc <--> A       -.040
sn  <--> A        .134
The only bidirectional relation which emerges as significant is between
subjective norm (referent beliefs) and attitudes.
Goodness-of-fit (1)
Model Fit Summary
CMIN
Model                NPAR      CMIN    DF      P   CMIN/DF
Default model          45   619.428    74   .000     8.371
Saturated model       119      .000     0
Independence model     14  1736.822   105   .000    16.541
CMIN/DF values above five suggest rejection of the model.
Baseline Comparisons
Model                NFI Delta1  RFI rho1  IFI Delta2  TLI rho2    CFI
Default model              .643      .494        .672      .526   .666
Saturated model           1.000                 1.000            1.000
Independence model         .000      .000        .000      .000   .000
Parsimony-Adjusted Measures
Model                PRATIO   PNFI   PCFI
Default model          .705   .453   .469
Saturated model        .000   .000   .000
Independence model    1.000   .000   .000
In a good model all these indicators should be above 0.9.
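The baseline comparison and parsimony values can be recomputed from the CMIN table alone. The sketch below uses the usual textbook definitions of these indices (they are not spelled out on the slides, so treat them as an assumption); the function name and dictionary keys are illustrative.

```python
def baseline_indices(chi2_m: float, df_m: int, chi2_b: float, df_b: int) -> dict:
    """Incremental and parsimony indices of a model (m) against the
    independence (baseline) model (b)."""
    nfi = 1.0 - chi2_m / chi2_b
    rfi = 1.0 - (chi2_m / df_m) / (chi2_b / df_b)
    ifi = (chi2_b - chi2_m) / (chi2_b - df_m)
    tli = (chi2_b / df_b - chi2_m / df_m) / (chi2_b / df_b - 1.0)
    # CFI is built on noncentrality (chi-square minus df), floored at zero.
    ncp_m = max(chi2_m - df_m, 0.0)
    ncp_b = max(chi2_b - df_b, ncp_m)
    cfi = 1.0 if ncp_b == 0.0 else 1.0 - ncp_m / ncp_b
    pratio = df_m / df_b
    return {
        "NFI": nfi, "RFI": rfi, "IFI": ifi, "TLI": tli, "CFI": cfi,
        "PRATIO": pratio, "PNFI": pratio * nfi, "PCFI": pratio * cfi,
    }


# Default model and independence model, copied from the CMIN table.
for name, value in baseline_indices(619.428, 74, 1736.822, 105).items():
    print(f"{name:7s}{value:.3f}")
```

With the figures above, every index for the default model falls well short of the 0.9 guideline.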
Goodness-of-fit (2)
NCP
Model NCP LO 90 HI 90
Default model 545.428 469.732 628.594
Saturated model .000 .000 .000
Independence model 1631.822 1500.462 1770.572
FMIN
Model FMIN F0 LO 90 HI 90
Default model 1.241 1.093 .941 1.260
Saturated model .000 .000 .000 .000
Independence model 3.481 3.270 3.007 3.548
RMSEA
Model RMSEA LO 90 HI 90 PCLOSE
Default model .122 .113 .130 .000
Independence model .176 .169 .184 .000
Good models have a PCLOSE value above 0.05.
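The point estimates in the NCP, FMIN and RMSEA tables are all simple functions of the chi-square, its degrees of freedom and the sample size. A sketch under stated assumptions: the standard definitions are used, and N - 1 = 499 is not given on this slide but inferred from FMIN = CMIN / (N - 1), since 619.428 / 1.241 is approximately 499. The confidence limits and PCLOSE need the noncentral chi-square distribution and are not reproduced.

```python
import math

N_MINUS_1 = 499  # assumption: inferred from FMIN = CMIN / (N - 1)


def noncentrality_measures(chi2: float, df: int, n_minus_1: int = N_MINUS_1) -> dict:
    """Point estimates of NCP, FMIN, F0 and RMSEA for one model."""
    ncp = max(chi2 - df, 0.0)      # estimated noncentrality parameter
    fmin = chi2 / n_minus_1        # minimum of the discrepancy function
    f0 = ncp / n_minus_1           # estimated population discrepancy
    rmsea = math.sqrt(f0 / df)     # discrepancy per degree of freedom
    return {"NCP": ncp, "FMIN": fmin, "F0": f0, "RMSEA": rmsea}


for label, chi2, df in [("Default", 619.428, 74), ("Independence", 1736.822, 105)]:
    m = noncentrality_measures(chi2, df)
    print(label, {k: round(v, 3) for k, v in m.items()})
```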
Information criteria
AIC
Model AIC BCC
Default model 709.428 712.218
Saturated model 238.000 245.376
Independence model 1764.822 1765.690
ECVI
Model ECVI LO 90 HI 90 MECVI
Default model 1.422 1.270 1.588 1.427
Saturated model .477 .477 .477 .492
Independence model 3.537 3.273 3.815 3.538
HOELTER
Model HOELTER .05 HOELTER .01
Default model 77 85
Independence model 38 41
Information and Hoelter criteria are unsatisfactory.
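The AIC and ECVI columns follow directly from CMIN and NPAR. A minimal sketch, assuming the standard definitions AIC = CMIN + 2 x NPAR and ECVI = AIC / (N - 1), with N - 1 = 499 inferred from the FMIN values rather than stated on the slides; BCC, MECVI and the Hoelter critical N are not reproduced.

```python
def information_criteria(cmin: float, npar: int, n_minus_1: int = 499) -> tuple:
    """Return (AIC, ECVI); n_minus_1 = 499 is an inferred assumption."""
    aic = cmin + 2 * npar
    ecvi = aic / n_minus_1
    return aic, ecvi


models = {
    "Default model": (619.428, 45),
    "Saturated model": (0.0, 119),
    "Independence model": (1736.822, 14),
}

for name, (cmin, npar) in models.items():
    aic, ecvi = information_criteria(cmin, npar)
    print(f"{name:20s} AIC = {aic:9.3f}   ECVI = {ecvi:.3f}")
```

Lower AIC and ECVI are better, and here the default model does worse than the saturated model on both.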
A competing model
• The theoretical model is rejected, but several coefficients are significant
• Competing model strategy
   • try to remove all the non-significant components of the model
   • add some other explanatory variables
• Problem: the measurement of the latent construct for attitude, because
the presence of negatively worded items might lead to the
identification of more than one latent factor
• Additional explanatory variable
   • some variable explaining habit, which could be correlated with attitude and
   influence ITP
   • for example, variable q2b measures the frequency of chicken purchases and
   we label it as habit
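The modified model
[Path diagram. The latent attitude A (mean 0, variance 71.04) is measured by
bba (loading 1.00, intercept 37.31, error e1 with variance 57.29),
bbb (loading .98, intercept 32.42, error e2 with variance 89.38),
bbd (loading .82, intercept 29.38, error e4 with variance 80.22) and
bbe (loading 1.08, intercept 34.27, error e5 with variance 90.91).
ITP (intercept 3.85, error e12 with variance 2.76) depends on A (.07) and on
Habits (.64). SN (mean 14.54, variance 114.85) and Habits (mean 1.63,
variance .93) covary with A (12.36 and 2.90 respectively).]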
Estimates
                 Raw Estimate   S.E.     C.R.     P   Std. Estimate
Intercept bba       37.307      .509   73.344   ***
Intercept bbb       32.417      .567   57.216   ***
Intercept bbd       29.377      .513   57.300   ***
Intercept bbe       34.267      .593   57.774   ***
Intercept q7         3.848      .169   22.763   ***
bba <--- A           1.000                                 .744
bbb <--- A            .975      .080   12.131   ***       .656
bbd <--- A            .820      .072   11.426   ***       .611
bbe <--- A           1.083      .086   12.647   ***       .692
q7  <--- A            .066      .012    5.490   ***       .291
q7  <--- q2b          .637      .091    7.022   ***       .319
q2b <--> A           2.901      .448    6.479   ***       .358
sn  <--> A          12.357     4.456    2.773   .006      .137
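The Std. Estimate column can be approximately recovered from the raw estimates once the variances are known. The sketch below is an illustration under assumptions: the variances are read off the path diagram of the modified model (interpreting each "mean, variance" label in the usual AMOS way), the pairing of error terms with indicators is inferred from their position, and the function names are hypothetical.

```python
import math

# Assumed variances, read from the modified-model path diagram.
VAR_A = 71.04                      # latent attitude
VAR_SN = 114.85                    # subjective norm
VAR_HABIT = 0.93                   # q2b (habit)
ERROR_VAR = {"bba": 57.29, "bbb": 89.38, "bbd": 80.22, "bbe": 90.91}
RAW_LOADING = {"bba": 1.000, "bbb": 0.975, "bbd": 0.820, "bbe": 1.083}


def standardized_loading(loading: float, var_latent: float, var_error: float) -> float:
    """Raw loading rescaled by sd(latent) / sd(indicator)."""
    implied_var = loading ** 2 * var_latent + var_error
    return loading * math.sqrt(var_latent / implied_var)


def correlation(cov: float, var_x: float, var_y: float) -> float:
    """Covariance rescaled to a correlation."""
    return cov / math.sqrt(var_x * var_y)


for item, loading in RAW_LOADING.items():
    std = standardized_loading(loading, VAR_A, ERROR_VAR[item])
    print(f"{item} <--- A   {std:.3f}")
print(f"sn  <--> A   {correlation(12.357, VAR_SN, VAR_A):.3f}")
print(f"q2b <--> A   {correlation(2.901, VAR_HABIT, VAR_A):.3f}")
```

Because the diagram shows variances rounded to two decimals, the results agree with the table only to within rounding.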
Goodness-of-fit
Model                NPAR    CMIN   DF       P   CMIN/DF
Final model            22   20.99   13   0.073     1.615

Model                NFI Delta1  RFI rho1  IFI Delta2  TLI rho2    CFI
Final model               0.967     0.929       0.987     0.972  0.987

Model                PRATIO   PNFI   PCFI
Final model           0.464  0.449  0.458

Model                 NCP  LO 90   HI 90
Final model          7.99      0  24.616

Model                 FMIN     F0  LO 90  HI 90
Final model          0.042  0.016      0  0.049

Model                RMSEA  LO 90  HI 90  PCLOSE
Final model          0.035      0  0.062   0.801

Model                    AIC      BCC   ECVI  LO 90  HI 90  MECVI
Final model            64.99   65.707   0.13  0.114  0.164  0.132
Saturated model           70   71.141   0.14   0.14   0.14  0.143
Independence model   652.525  652.753  1.308   1.15   1.48  1.308

Model                HOELTER .05  HOELTER .01
Final model                  532          659
Independence model            33           38
The diagnostics are now ok.
Defining a Model
In SEM two models are used:
1. The measurement model, which represents how measured
variables come to represent constructs.
2. The structural model, which shows how constructs are
associated with each other.
Common types of theoretical relationships in a SEM model are as
follows.
a. Relationship between a construct and a measured variable
[Diagram: an Exogenous construct and a measured variable X, drawn in two alternative ways]
b. Relationship between a construct and multiple measured variables
[Diagram: an Exogenous construct and measured variables X1, X2, X3]
c. Dependence relationship between two constructs (a structural relationship)
[Diagram: Construct 1 and Construct 2]
d. Correlational relationship between constructs
[Diagram: two constructs]
Straight arrows depict a dependence relationship.
In a measurement model, dependence relationships run
from constructs to variables.
In a structural model, dependence relationships occur between
constructs.
The arrows flow from the antecedent (i.e. the independent variable)
to the outcome (the dependent variable).
Relationships
a. Dependence Relationship
[Diagram: an Exogenous Construct (indicators X1, X2, X3, X4) and an
Endogenous Construct (indicators Y1, Y2, Y3, Y4)]
A dependence relationship is depicted by a straight arrow with a single
arrow head.
b. Correlational Relationship
[Diagram: two Exogenous Constructs (indicators X1, X2, X3, X4 and
X5, X6, X7, X8)]
A correlational relationship is depicted by a two-headed arrow connecting
the constructs.
A single SEM model can contain both dependence and
correlational relationships.
Assumptions of SEM
1. SEM in general requires a larger sample relative to
other multivariate approaches.
2. Estimation technique: The most common SEM
estimation procedure is maximum likelihood
estimation (MLE), rather than the ordinary least
squares (OLS) used in regression.
3. Model validity: It depends on goodness-of-fit (GOF)
indices. The fundamental measure of fit is the chi-square
(χ²) GOF statistic. Model fit compares the theory to
reality as represented by the data. If the researcher's
theory were perfect, the estimated covariance matrix
and the observed covariance matrix would be the
same. Mathematically, the estimated covariance
matrix is compared to the actual observed covariance
matrix, and the closer the values of these two matrices
are to each other, the better the model is said to fit.
4. Degrees of freedom (df): The degrees of freedom for
an analysis of a covariance structure (SEM) model are
determined by
df = ½(p)(p+1) - k
where p is the total number of observed variables and
k is the number of estimated (free) parameters.
5. Statistical significance of χ²: In SEM we want a small
χ² value (and a correspondingly large p-value), which
indicates no statistically significant difference
between the two matrices.
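The degrees-of-freedom formula in item 4 can be sketched in a few lines of Python. The `with_means` flag is an assumption added here, not part of the slide formula: the AMOS outputs shown earlier report 119 and 35 sample moments, which is consistent with 14 and 7 observed variables when the p means are counted as sample moments in addition to the variances and covariances.

```python
def covariance_structure_df(p: int, k: int, with_means: bool = False) -> int:
    """Degrees of freedom: distinct sample moments minus free parameters.

    p is the number of observed variables and k the number of free parameters.
    with_means is a hypothetical switch that also counts the p sample means.
    """
    moments = p * (p + 1) // 2      # variances and covariances
    if with_means:
        moments += p                # one mean per observed variable
    return moments - k


print(covariance_structure_df(14, 45, with_means=True))  # default model
print(covariance_structure_df(7, 22, with_means=True))   # final model
```

A positive result means the model is over-identified, zero means just-identified, and a negative value means the model cannot be estimated as specified.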
Alternative Perspectives on model fit
1. Goodness-of-fit Index (GFI): The possible range of
GFI values is 0 to 1, with higher values indicating
better fit. A cutoff value of 0.95 is suggested for GFI.
2. Adjusted goodness-of-fit index (AGFI): These values
are typically lower than GFI values.
3. Root Mean Square Residual (RMSR) and
Standardized Root Mean Residual (SRMR): The
error in prediction for each covariance term creates a
residual. The root mean square residual (RMSR) is
the square root of the mean of these squared residuals.
The SRMR is a standardized value of RMSR and thus is
more useful for comparing fit across models. Lower
RMSR and SRMR values represent better fit.
Contd…...
4. Root Mean Square Error of Approximation (RMSEA):
Lower RMSEA values indicate better fit. So, like the
SRMR and RMSR, it is a badness-of-fit index, in contrast to
indices where higher values indicate better fit. Values
below 0.10 are generally taken as acceptable for a model.
5. Other Absolute Indices:
• Normed χ²: It is the simple ratio of χ² to the degrees of freedom
for a model. χ²:df ratios on the order of 3:1 or less are
associated with better-fitting models.
• Expected cross validation index (ECVI): It is most useful
in comparing the performance of one model to another.
• Cross validation index (CVI): This is used for validation of
sample by splitting the original observations randomly into
two groups.
• Gamma Hat: Typical Gamma Hat values range between 0.9
& 1.0.
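The RMSR and SRMR described in item 3 can be sketched as follows. This is a minimal illustration: the two 3 x 3 matrices are hypothetical numbers made up for the example, the residuals are averaged over the unique elements of the lower triangle including the diagonal, and SRMR is computed by dividing each residual by the product of the observed standard deviations (software packages differ slightly in how they standardize).

```python
import math


def rmsr(observed, implied, standardize: bool = False) -> float:
    """Root mean square residual over the lower triangle (diagonal included).

    With standardize=True each residual is divided by the product of the
    observed standard deviations, giving an SRMR-style value.
    """
    p = len(observed)
    total, count = 0.0, 0
    for i in range(p):
        for j in range(i + 1):
            resid = observed[i][j] - implied[i][j]
            if standardize:
                resid /= math.sqrt(observed[i][i] * observed[j][j])
            total += resid ** 2
            count += 1
    return math.sqrt(total / count)


# Hypothetical observed and model-implied covariance matrices.
observed = [[4.0, 1.0, 0.8], [1.0, 1.0, 0.30], [0.8, 0.30, 1.0]]
implied = [[4.0, 0.9, 0.8], [0.9, 1.0, 0.36], [0.8, 0.36, 1.0]]

print(f"RMSR = {rmsr(observed, implied):.4f}")
print(f"SRMR = {rmsr(observed, implied, standardize=True):.4f}")
```

Standardizing removes the dependence on the measurement scale of the first variable, which is why SRMR is easier to compare across models.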
Incremental Fit Indices
1. Normed Fit Index (NFI): It ranges between 0
and 1 and a model with perfect fit would
produce an NFI of 1.
2. Comparative Fit Index (CFI): The values range
between 0 and 1, with higher values indicating
better fit.
3. Tucker Lewis Index (TLI): The TLI is not
normed and thus its values can fall below 0 or
above 1. Models with good fit have values that
approach 1.
4. Relative Noncentrality Index (RNI): Values
range between 0 and 1. RNIs lower than 0.9
are usually not associated with good fit.
Parsimony Fit Indices
1. Parsimony Goodness-of-Fit Index (PGFI):
The values range between 0 and 1. The two
models can be compared and the one with a
higher PGFI is preferable.
2. Parsimony Normed Fit Index (PNFI):
Relatively high values of (PNFI) represent
relatively better fit. PNFI values can be used
in comparing one model to another, with the
highest PNFI value indicating the better
model.
Multiple fit Indices
• As models become more complex, the likelihood of
alternative models with equivalent fit increases.
• Multiple fit indices should be used to assess a
model’s goodness-of-fit and should include:
   • The χ² value and the associated df
   • One absolute fit index (i.e., GFI, RMSEA, or
   SRMR)
   • One incremental fit index (i.e., CFI or TLI)
   • One goodness-of-fit index (GFI, CFI, TLI, etc.)
   • One badness-of-fit index (RMSEA, SRMR, etc.)
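The checklist above can be applied mechanically to the two models estimated earlier. The sketch below uses the rule-of-thumb cutoffs quoted on the slides (normed χ² of 3 or less, CFI and TLI of at least 0.9, RMSEA below 0.10, PCLOSE above 0.05); the dictionary layout and function name are illustrative, and such cutoffs are guidelines rather than strict tests.

```python
# (direction, cutoff): "max" means the value should not exceed the cutoff.
CUTOFFS = {
    "CMIN/DF": ("max", 3.0),
    "CFI": ("min", 0.90),
    "TLI": ("min", 0.90),
    "RMSEA": ("max", 0.10),
    "PCLOSE": ("min", 0.05),
}


def check_fit(indices: dict) -> dict:
    """Return True/False per index according to the rule-of-thumb cutoffs."""
    results = {}
    for name, (direction, cutoff) in CUTOFFS.items():
        value = indices[name]
        results[name] = value <= cutoff if direction == "max" else value >= cutoff
    return results


# Values copied from the AMOS output tables for the two models.
default_model = {"CMIN/DF": 8.371, "CFI": 0.666, "TLI": 0.526,
                 "RMSEA": 0.122, "PCLOSE": 0.000}
final_model = {"CMIN/DF": 1.615, "CFI": 0.987, "TLI": 0.972,
               "RMSEA": 0.035, "PCLOSE": 0.801}

for label, model in [("Default", default_model), ("Final", final_model)]:
    print(label, check_fit(model))
```

The default model fails every check and the final model passes all of them, in line with the conclusions drawn on the slides.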
THANK YOU
