[Figure: path diagram with three factors and error terms e2, e3, e4]
Latent variables are not measured directly
Require a specific operational definition
Indicators of the construct need to be selected
Data from the indicators must be consistent with certain predictions (e.g., moderately correlated with one another)
Latent variables are the real variables, not measured variables.
Factor scores cannot be used in their place because they still contain measurement error.
The ability to use latent variables is the primary strength of SEM.
Multi-Indicator Approach
Reduces the overall effect of measurement error of any individual observed variable on the accuracy of the results
We distinguish between observed variables (indicators) and underlying latent variables or factors (constructs)
Measurement model: the relations between the observed variables and the latent variables
[Figure: path diagrams. Labels: Latent Variable, Verbal Abilities (construct of interest), Writing Sample; error terms Error Var1, Error Var2, Error Var3, Error VT, Error AT, Error ST, each with a path fixed to 1]
In EFA, there is no a priori specification of how many latent factors there are or of how the measures relate to these factors
Problems
Communality: must know the communality before estimation, but communality is a function of the loadings
Number of factors
Rotation: when there are two or more factors, the solution is not unique
Principal Components
Communality is set to 1
Factor is defined as a weighted sum of the variables
Loadings chosen to maximize the explanation of the variances of the measures
Loadings are usually too high, in that the predicted correlations are larger than the observed correlations
Principal Factors
Communality
Loadings chosen to maximize the explanation of the correlations between the measures
Minimizes the sum of squared residuals
Residual = observed correlation minus predicted correlation
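A minimal sketch of the residual idea for a one-factor model, assuming standardized measures so that the predicted correlation between two measures is the product of their loadings. The loadings and observed correlations are hypothetical values chosen for illustration, not estimates from any data set.

```python
# Hypothetical one-factor loadings for three measures (illustration only).
loadings = [0.9, 0.8, 0.7]

# Hypothetical observed correlations, keyed by (i, j) with i < j.
observed = {(0, 1): 0.75, (0, 2): 0.60, (1, 2): 0.58}


def predicted_correlation(loadings, i, j):
    """One factor, standardized measures: predicted r_ij = loading_i * loading_j."""
    return loadings[i] * loadings[j]


def residuals(loadings, observed):
    """Residual = observed correlation minus predicted correlation."""
    return {
        pair: r - predicted_correlation(loadings, *pair)
        for pair, r in observed.items()
    }


def sum_squared_residuals(loadings, observed):
    """The quantity a principal-factors style fit tries to make small."""
    return sum(e ** 2 for e in residuals(loadings, observed).values())


res = residuals(loadings, observed)
ssr = sum_squared_residuals(loadings, observed)
print(res, ssr)
```

A different set of loadings that gives a smaller sum of squared residuals would be preferred under this criterion.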
Maximum Likelihood
Solution is iteratively estimated
Factor is a latent variable
Loadings chosen to maximize the explanation of the correlations between the measures
Tries harder to explain the larger correlations
Statistical tests available
Example: Bollen.sps
            Principal     Principal    Maximum
            Components    Axis         Likelihood
Overall     .905          .877         .816
Clear       .865          .811         .862
Color       .921          .918         .938
Odor        .805          .710         .684
Note that the Principal Components loadings are generally larger than those from the other methods. PA and ML are fairly similar.
EFA is useful when the researcher does not know how many factors there are or when it is uncertain what measures load on what factors
EFA is typically used as a data reduction strategy
Both EFA and CFA reduce a larger number of observed variables into a smaller number of latent factors
However, EFA is done with few a priori hypotheses; CFA requires a priori specification based on hypotheses
Assumptions of CFA
Multivariate normality
Sufficient sample size
Correct model specification
Sampling assumptions: simple random sample
Representation in SEM
Latent variable represented by a circle
Measured variables (indicators) represented by a square
Each indicator variable has an error term
[Figure: indicators with error terms Error Var1, Error Var2, Error Var3, each error path fixed to 1]
Assumptions
Each variable loads on one and only one factor
Factors can be (and typically are) correlated
Errors across indicator variables are independent
The factors are uncorrelated with the measurement errors
Most (if not all) of the errors of different indicators are uncorrelated with each other
Residuals In CFA
Item-level residuals are represented as latent variables.
They are not called disturbances.
They represent measurement error in the EFA/CTT sense.
This is a tremendous advantage in hybrid models, which combine CFA and path models, because it separates measurement error from error in the model.
Errors and disturbances are both residuals. Both are necessary in their respective roles.
Errors always represent measurement error.
Disturbances represent omitted variables (in a hybrid model).
If there are no error terms, measurement error will be in the disturbance (in a path model).
Standard Specification
One indicator is treated as a marker or reference variable (Brown p. 61, Kline p. 170): its loading is fixed to one
Which variable should you choose?
  Closest in meaning to the factor
  Most interpretable units of measurement
  Empirical: strongest correlations with other indicators
No test of statistical significance for the marker loading, because it is fixed rather than estimated
Factor variance is freely estimated
Error paths are set to one
Error variances are freely estimated
[Figure: path diagram with error terms e1, e2, e3, e4]
Identification
Identification in CFA is largely determined by the number of indicator variables used in the model (more later).
Number of indicators
2 is the minimum
3 is safer, especially if factor correlations are weak
4 provides safety
5 or more is more than enough (if there are too many indicators, combine indicators into sets or parcels)
Identification
Overidentified model: knowns > unknowns
Number of knowns = number of variances and covariances of the observed variables, computed as k(k+1)/2, where k is the number of observed variables
Number of unknowns (free parameters) is based on the specified model. It is typically the sum of the number of:
  exogenous variables (one variance estimated for each)
  endogenous variables (one error variance each)
  correlations between variables (one covariance for each pairing)
  regression paths (arrows linking exogenous variables to endogenous variables)
For latent variables:
  indicator variables (one error variance for each)
  paths from latent variables to indicator variables (excluding those fixed to 1)
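The counting rule above can be sketched in a few lines. The function names are hypothetical, and the labels "just-identified" and "underidentified" are standard terms not defined on this slide. Note that knowns exceeding unknowns is a necessary condition for identification, not a guarantee.

```python
def count_knowns(k: int) -> int:
    """Number of variances and covariances among k observed variables."""
    return k * (k + 1) // 2


def degrees_of_freedom(k: int, n_free_parameters: int) -> int:
    """df = knowns minus unknowns (free parameters)."""
    return count_knowns(k) - n_free_parameters


def identification_status(k: int, n_free_parameters: int) -> str:
    """Classify a model by the counting rule only."""
    df = degrees_of_freedom(k, n_free_parameters)
    if df > 0:
        return "overidentified"
    if df == 0:
        return "just-identified"
    return "underidentified"


# Illustration: one factor, four indicators, marker-variable specification.
# Free parameters: 3 loadings + 1 factor variance + 4 error variances = 8.
print(count_knowns(4), degrees_of_freedom(4, 8), identification_status(4, 8))
```

With three indicators under the same specification there are 6 knowns and 6 unknowns, so the one-factor model has df = 0.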
Factor loadings: can be unstandardized or standardized. If a measure loads on only one factor, the standardized factor loading is the correlation between the measure and the factor (and the square root of the measure's reliability).
Factor covariances or correlations: association between each pair of latent variables
Error variance: variance in the observed measure that is not explained by the latent variable (but not necessarily random or meaningless variance)
[Figure: path diagram of the Depression Factor with indicators BSI5, BSI8, BSI10, BSI14, BSI16, BSI18 and error terms e5, e8, e10, e14, e16, e18; parameter labels W1-W6 and V1, V3-V8; fixed paths marked 1]
Number of unknowns = 14
Variance of latent factor (1)
Free factor loadings (6)
Variances of error terms (7)
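A worked count for this model, assuming k = 7 observed indicators, which is what seven error variances imply:

```latex
\begin{aligned}
\text{knowns}   &= \frac{k(k+1)}{2} = \frac{7 \cdot 8}{2} = 28 \\
\text{unknowns} &= 1 + 6 + 7 = 14 \\
df              &= 28 - 14 = 14
\end{aligned}
```

Under that assumption knowns > unknowns, so the model is overidentified by the counting rule.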
CONFIRMATORY FACTOR ANALYSIS: One Factor of BSI Depression with all parameters labeled
Items (indicators) specified to measure a common underlying factor should have relatively high loadings on that factor (convergent validity)
Estimated correlations between the factors should not be excessively high (> .85) (discriminant validity)
Discriminant validity refers to the distinctiveness of the factors measured by different sets of indicators.
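A minimal sketch of the discriminant validity screen, using the .85 cutoff from the slide. The function name, the factor labels F1-F3, and the correlation values are hypothetical; taking the absolute value is an assumption so that strong negative correlations are flagged too.

```python
def flag_high_factor_correlations(factor_correlations, threshold=0.85):
    """Return the factor pairs whose absolute correlation exceeds the threshold."""
    return [
        pair
        for pair, r in factor_correlations.items()
        if abs(r) > threshold
    ]


# Hypothetical estimated factor correlations from a three-factor CFA.
estimated = {
    ("F1", "F2"): 0.91,
    ("F1", "F3"): 0.62,
    ("F2", "F3"): 0.70,
}

flagged = flag_high_factor_correlations(estimated)
print(flagged)
```

A flagged pair suggests the two sets of indicators may not be measuring distinct factors.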
What to examine
error variances (one-tailed test)
error correlations (two-tailed)
Check for Heywood cases (negative error variances)!
Heywood cases: negative error variance (or a standardized loading larger than 1)
Why?
Misspecification
Outliers
Small
Creates an extra df as one parameter is not estimated
Need to adjust chi-square and fit indices
Non-linear constraints that error variances cannot be negative (always in EQS)
Set loadings equal (must use covariance matrix)
Use an alternative estimation method besides ML
Empirical underidentification: make sure correlations are not weak
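A minimal sketch of a Heywood-case screen over a fitted solution, following the definition above (negative error variance, or standardized loading larger than 1 in absolute value). The function name, indicator names, and estimates are hypothetical.

```python
def find_heywood_cases(error_variances, standardized_loadings):
    """List (indicator, reason) pairs that meet the Heywood-case definition."""
    cases = []
    for name, variance in error_variances.items():
        if variance < 0:
            cases.append((name, "negative error variance"))
    for name, loading in standardized_loadings.items():
        if abs(loading) > 1:
            cases.append((name, "standardized loading larger than 1"))
    return cases


# Hypothetical estimates for three indicators.
error_variances = {"x1": 0.36, "x2": 0.51, "x3": -0.04}
standardized_loadings = {"x1": 0.80, "x2": 0.70, "x3": 1.02}

heywood = find_heywood_cases(error_variances, standardized_loadings)
print(heywood)
```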
Respecification
Simpler Model
Set loadings equal: use the covariance matrix, and variables must be in the same metric
If the two-headed arrow in model b is set to 1, that would be saying there is only one latent trait.
Thus model b has one more path than model a.
Two models
Base model
More complex model (e.g., base model with additional paths): fewer df
If the base model is good fitting, then the more complex model must also be good fitting.
Chi-square and degrees of freedom are subtracted to test constraints made in the base model
The more complex model should be a good-fitting model; otherwise the conclusion is that one model is less poor than another.
Complex: more parameterized, less parsimonious
Simpler: less parameterized, more parsimonious
χ² diff = n.s.: favor the parsimonious model
χ² diff = sig.: favor the more parameterized model
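A minimal sketch of the chi-square difference test for two nested models, using only the standard library. The function names and the fit values are hypothetical; the tail probability uses the closed-form recurrence for integer df, and alpha = .05 is an assumed cutoff.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variate with positive integer df."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)               # Q(1, h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))    # Q(1/2, h)
    # Recurrence: Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    while a < df / 2.0:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q


def chi_square_difference(chi2_simple, df_simple, chi2_complex, df_complex, alpha=0.05):
    """Subtract chi-squares and df; return (difference, df difference, p, favored model)."""
    df_diff = df_simple - df_complex
    if df_diff <= 0:
        raise ValueError("the simpler model must have more degrees of freedom")
    diff = chi2_simple - chi2_complex
    p = chi2_sf(diff, df_diff)
    favored = "more parameterized" if p < alpha else "parsimonious"
    return diff, df_diff, p, favored


# Hypothetical fits: the base model versus the same model with one added path.
print(chi_square_difference(30.0, 9, 18.0, 8))   # large difference
print(chi_square_difference(20.0, 9, 18.5, 8))   # small difference
```

The test is only meaningful when the models are nested and the more complex model fits acceptably.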
Nested Models?
All loadings are freely estimated
Factor variances are set to one
Error paths are freely estimated: the (standardized) error path equals the square root of one minus the squared standardized factor loading
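The error path rule written out, with a one-line example. The symbol λ for the standardized loading and the value .80 are assumptions chosen for illustration.

```latex
\text{error path} = \sqrt{1 - \lambda^{2}},
\qquad
\lambda = .80 \;\Rightarrow\; \sqrt{1 - .64} = \sqrt{.36} = .60
```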
[Figure: path diagram with error terms e1, e2, e3, e4]
1. Select the number of factors and determine the nature of the paths between the factors and the measures. Paths can be fixed at zero, fixed at another constant value, allowed to vary freely, or allowed to vary under specified constraints (such as being equal to each other).
2. When the factor model is fit to the data, the factor loadings are chosen to minimize the discrepancy between the correlation matrix implied by the model and the actual observed matrix. The amount of discrepancy after the best parameters are chosen can be used as a measure of how consistent the model is with the data.
Fit statistics
To compare two nested models, examine the difference between their χ² statistics. Most tests of individual factor loadings can be made as comparisons of full and reduced factor models. For non-nested models, you can compare the root mean square error of approximation (RMSEA), an estimate of discrepancy per degree of freedom in the model, along with other fit indices and the AIC and BIC.
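A minimal sketch of the RMSEA point estimate, assuming the common formula sqrt(max(χ² − df, 0) / (df · (N − 1))). Some programs use N rather than N − 1 in the denominator, so results may differ slightly from a given package. The function name and the fit values are hypothetical.

```python
import math


def rmsea(chi_square: float, df: int, n: int) -> float:
    """RMSEA point estimate: discrepancy per degree of freedom, scaled by sample size."""
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Hypothetical fit: chi-square = 30 on 14 df with N = 201.
print(round(rmsea(30.0, 14, 201), 4))
```

When the chi-square is no larger than its df, the estimate is truncated at zero.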