SAS Library: Factor Analysis Using SAS PROC FACTOR
This page was developed by the Consulting group of the Division of Statistics and Scientific
Computing at the University of Texas at Austin. We thank them for permission to distribute
it via our web site.
26 June 1995
Usage Note: Stat-53
Copyright 1995-1997, ACITS, The University of Texas at Austin
Statistical Services, 475-9372
Originally available online at: http://ssc.utexas.edu/docs/stat53.html
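The common factor model expresses each observed variable as a linear combination of k common
factors plus a residual:

\[ y_i = a_{i1} f_1 + a_{i2} f_2 + \cdots + a_{ik} f_k + e_i \]

where y_i is the ith observed variable, f_1, ..., f_k are the common factors, a_ij is the loading
of y_i on factor f_j, and e_i is the residual of y_i on the factors.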
Given the assumption that the residuals are uncorrelated across the observed variables, the
correlations among the observed variables are accounted for by the factors.
The following is an example of a simple path diagram for a factor analysis model. This diagram
is a schematic representation of the above formula.
F1 and F2 are two common factors. Y1, Y2, Y3, Y4, and Y5 are observed variables, possibly 5
subtests or measures of other observations such as responses to items on a survey. e1, e2, e3, e4,
and e5 represent residuals or unique factors, which are assumed to be uncorrelated with each
other. Any correlation between a pair of the observed variables can be explained in terms of their
relationships with the latent variables.
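As a hedged illustration of how the factors account for the correlations, assume (beyond what is
stated above) that the observed variables are standardized and that the common factors have unit
variance and are uncorrelated with each other and with the residuals. Under those assumptions the
model implies, for any two different variables y_i and y_j:

```latex
\operatorname{corr}(y_i, y_j) = \sum_{m=1}^{k} a_{im} a_{jm}, \qquad i \neq j
```

In the two-factor diagram, for example, the correlation between Y1 and Y2 would be
a_{11} a_{21} + a_{12} a_{22}. No residual term appears because e1 and e2 are uncorrelated.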
sense that the obtained composite variables serve different purposes. In common factor analysis,
a small number of factors are extracted to account for the intercorrelations among the observed
variables--to identify the latent dimensions that explain why the variables are correlated with
each other. In principal component analysis, the objective is to account for the maximum portion
of the variance present in the original set of variables with a minimum number of composite
variables called principal components.
Secondly, what are the assumptions about the variance in the original variables? If the observed
variables are measured relatively error free (for example, age, years of education, or number of
family members), or if it is assumed that the error and specific variance represent a small portion
of the total variance in the original set of the variables, then principal component analysis is
appropriate. But if the observed variables are only indicators of the latent constructs to be
measured (such as test scores or responses to attitude scales), or if the error (unique) variance
represents a significant portion of the total variance, then the appropriate technique to select is
common factor analysis. Since the two methods often yield similar results, only CFA will be
illustrated here.
variables increases while the number of factors remains constant. Kaiser and Rice[2] proposed a
measure of sampling adequacy, which indicates how near R^{-1} (the inverse of the correlation
matrix) is to a diagonal matrix.
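For reference, the overall measure of sampling adequacy is commonly written as below. This form is
supplied here as an aid and is not quoted from the note; r_ij denotes an observed correlation and
q_ij the partial correlation between variables i and j controlling for all other variables (both
symbols are notation assumed for this sketch):

```latex
\mathrm{MSA} = \frac{\sum_{i \neq j} r_{ij}^{2}}
                    {\sum_{i \neq j} r_{ij}^{2} + \sum_{i \neq j} q_{ij}^{2}}
```

When R^{-1} is nearly diagonal the partial correlations q_ij are close to zero, so the measure
approaches 1.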
Third, is the number of observations sufficient to provide reliable estimations of the correlations
between the variables? Correlation coefficients tend to be unstable and greatly influenced by the
presence of outliers if the sample size is not large. It is generally unwise to conduct a factor
analysis on a sample of fewer than 50 observations. Moreover, the sample size should also be
considered in relation to the number of variables included in the analysis. Various rules of thumb
have been proposed, with the minimum number of observations per variable ranging from 5 to
10. Although there is no definitive answer to this problem, larger samples yield more stable
correlation estimates and therefore more replicable factor solutions.
Fourth, is correlation a valid measure of association among the variables to be analyzed? The
correlation coefficient is being used as a measure of conceptual similarity of the variables. If
strong curvilinear relationships are present among variables, for example, the correlation
coefficient is not an appropriate measure. In such cases, the results of a factor analysis based on
correlation coefficients will be invalid. The variables should meet the other assumptions required
for the correlation coefficient as well. However, in social and behavioral sciences, we seldom
have variables that strictly meet these assumptions. Ordinal and dichotomous variables have
been submitted to a factor analysis in the social and behavioral sciences. Unless the distributions
of the variables are strongly nonnormal, factor analysis seems to be robust to minor violations of
these assumptions.
3. Estimating Communalities
As mentioned earlier, in principal components analysis we do not make a distinction between
common and unique parts of the variation present in a variable. The correlation (covariance)
matrix, with 1.0s (variances) down the main diagonal, is submitted to an analysis. On the other
hand, a common factor analysis begins by substituting the diagonal of the correlation matrix with
what are called prior communality estimates (h^2). The communality estimate for a variable is the
estimate of the proportion of the variance of the variable that is both error free and shared with
other variables in the matrix. Since the concept of common variance is hypothetical, we never
know exactly in advance what proportion of the variance is common and what proportion is
unique among variables. Therefore, estimates of communalities need to be supplied for a factor
analysis. These estimates can be specified with the PRIORS= option to the PROC FACTOR
statement. The simplest approach is to use the largest absolute correlation for a variable with any
other variable as the communality estimate for the variable (PRIORS=MAX). A more
sophisticated approach is to use the squared multiple correlation (R^2) between the variable and all
other variables (PRIORS=SMC). As the number of variables increases, the importance of
accurate prior estimates decreases.
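The squared multiple correlation used by PRIORS=SMC can be written in terms of the inverse of the
correlation matrix. This is a standard identity rather than something stated in this note; r^{ii}
is assumed here to denote the ith diagonal element of R^{-1}:

```latex
R_i^{2} = 1 - \frac{1}{r^{ii}}
```

The prior communality for variable i is then set to R_i^2 in place of the 1.0 on the diagonal.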
There are still other methods of estimating communalities available in SAS. Interested readers
should refer to the SAS manual[4]. Some method should be chosen explicitly, because SAS by default
sets all prior communalities to 1.0, which is the same as requesting a principal components analysis.
This default has caused misunderstanding among novice users who are not aware of its
consequences. Many researchers claim to have conducted a common factor analysis when a
principal components analysis was actually performed.
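The difference between the two requests can be sketched as follows. The data set name mydata and
the variables x1-x6 are hypothetical placeholders, not taken from this note:

```sas
/* Principal components: with METHOD=PRINCIPAL the priors default to 1.0 (PRIORS=ONE). */
PROC FACTOR DATA=mydata METHOD=PRINCIPAL;
   VAR x1-x6;
RUN;

/* Principal factor analysis: squared multiple correlations replace the 1.0s. */
PROC FACTOR DATA=mydata METHOD=PRINCIPAL PRIORS=SMC;
   VAR x1-x6;
RUN;
```

Stating PRIORS= explicitly in every run is one way to avoid reporting a principal components
solution as a common factor solution.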
Another criterion, related to the latent root criterion, is the percentage or proportion of the
common variance (defined by the sum of communality estimates) that is explained by successive
factors. For example, if you set the cutting line at 75 percent of the common variance
(PROPORTION=.75 or PERCENT=75), then factors will be extracted until the sum of
eigenvalues for the retained factors exceeds 75 percent of the common variance, defined as the
sum of initial communality estimates.
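A minimal sketch of the proportion criterion, again assuming a hypothetical data set mydata with
variables x1-x6:

```sas
/* Keep extracting factors until they account for 75 percent of the common variance. */
/* PERCENT=75 is the equivalent spelling mentioned in the text.                       */
PROC FACTOR DATA=mydata METHOD=PRINCIPAL PRIORS=SMC PROPORTION=0.75;
   VAR x1-x6;
RUN;
```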
Scree Test
Sometimes plotting the eigenvalues against the corresponding factor numbers gives insight into
the maximum number of factors to extract. The SCREE option in the PROC FACTOR statement
produces a scree plot that illustrates the rate of change in the magnitude of the eigenvalues for
the factors. The rate of decline tends to be fast for the first few factors but then levels off. The
"elbow", or the point at which the curve bends, is considered to indicate the maximum number of
factors to extract. The figure below illustrates an example of a rather idealistic scree plot, where
a clear elbow occurred at the fourth factor, which has an eigenvalue right around 1. Notice that
the eigenvalues for the first few factors drop rapidly and after the fourth factor the decline in
the eigenvalues gradually levels off. The scree plot suggests a maximum of four factors in this
example. One less factor than the number at the elbow might be appropriate if you are concerned
about getting an overly defined solution. However, many scree plots do not give such a clear
indication of the number of factors.
Analysis of Residuals
If the factors are doing a good job in explaining the correlations among the original variables, we
expect the predicted correlation matrix R* to closely approximate the input correlation matrix. In
other words, we expect the residual matrix R - R* to approximate a null matrix. The RESIDUAL
(or RES) option in the PROC FACTOR statement prints the residual correlation matrix and the
partial correlation matrix (correlation between variables after the factors are partialled out or
statistically controlled). If the residual correlations or partial correlations are relatively large (>
0.1), then either the factors are not doing a good job explaining the data or we may need to
extract more factors to more closely explain the correlations. If maximum likelihood factors
(METHOD=ML) are extracted, then the output includes chi-square tests for the significance
of residuals after the extraction of the given factors. The output comprises two separate hypothesis
tests. The first test, labeled "Test of H0: No common factors", tests the null hypothesis that
there are no common factors, that is, that the variables included in the analysis are
uncorrelated. You want this test to be statistically significant (p < .05). A nonsignificant value for this
test statistic suggests that your intercorrelations may not be strong enough to warrant performing
a factor analysis, since the results from such an analysis could probably not be replicated.
The second Chi-square test statistic, labelled "Test of H0: N factors are sufficient" is the test of
the null hypothesis that N common factors are sufficient to explain the intercorrelations among
the variables, where N is the number of factors you specify with an NFACTORS=N option in the
PROC FACTOR statement. This test is useful for testing the hypothesis that a given number of
factors are sufficient to account for your data; in this instance your goal is a small chi-square
value relative to its degrees of freedom. This outcome results in a large p-value (p > .05). One
downside of this test is that the chi-square statistic is very sensitive to sample size: given a large
sample, this test will normally reject the null hypothesis of the residual matrix being
a null matrix, even when the factor analysis solution is very good. Therefore, be careful in
interpreting this test's significance value. Some data sets do not lend themselves to good factor
solutions, regardless of the number of factors extracted.
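A request of this kind might look like the following sketch. The raw data set raw and the variables
x1-x6 are hypothetical, and the choice of two factors is only for illustration:

```sas
/* Maximum likelihood extraction of two factors; the output includes both chi-square tests. */
/* RESIDUALS additionally prints the residual and partial correlation matrices.             */
PROC FACTOR DATA=raw METHOD=ML NFACTORS=2 RESIDUALS;
   VAR x1-x6;
RUN;
```

Rerunning with a different NFACTORS= value and comparing the second test across runs is one way
to see how the fit changes with the number of factors.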
Interpretability
Another very important but often overlooked criterion for determining the number of factors is
the interpretability of the factors extracted. Factor solutions should be evaluated not only
according to empirical criteria but also according to the criterion of "theoretical
meaningfulness." Extracting more factors will guarantee that the residual correlations get smaller
and thus that the chi-square values get smaller relative to the number of degrees of freedom.
However, noninterpretable factors may have little utility. That is, an interpretable three-factor
solution may be more useful (not to mention more parsimonious) than a less interpretable
four-factor solution with a better goodness-of-fit statistic.
A Priori Hypotheses
The problem of determining the number of factors is not a concern if the researcher has an a
priori hypothesis about the number of factors to extract. That is, an a priori hypothesis can
provide a criterion for the number of factors to be extracted. If a theory or previous research
suggests a certain number of factors and the analyst wants to confirm the hypothesis or replicate
the previous study, then a factor analysis with the prespecified number of factors can be run. The
NFACTORS=n (or N=n) option in PROC FACTOR extracts the user-supplied number of factors.
Ultimately, the criterion for determining the number of factors should be the replicability of the
solution. It is important to extract only factors that can be expected to replicate themselves when
a new sample of subjects is employed.
Once you decide on the number of factors to extract, the next logical step is to determine the
method of rotation. The fundamental theorem of factor analysis is invariant under rotation. That
is, the initial factor pattern matrix is not unique. We can get an infinite number of solutions,
which produce the same correlation matrix, by rotating the reference axes of the factor solution
to simplify the factor structure and to achieve a more meaningful and interpretable solution. The
idea of simple structure has provided the most common basis for rotation, the goal being to rotate
the factors simultaneously so as to have as many zero loadings on each factor as possible. The
following figure is a simplified example of rotation, showing only one variable from a set of
several variables.
The variable V1 initially has factor loadings (correlations) of .7 and .6 on factor 1 and factor 2
respectively. However, after rotation the factor loadings have changed to .9 and .2 on the rotated
factor 1 and factor 2 respectively, which is closer to a simple structure and easier to interpret.
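The invariance can be checked with a short calculation. The symbols A (the initial loading matrix)
and T (an orthogonal rotation matrix) are notation assumed for this sketch. For any orthogonal T
the rotated loadings AT reproduce the same correlations, and each variable keeps the same
communality:

```latex
(AT)(AT)' = A\,TT'\,A' = AA', \qquad TT' = I
\\[4pt]
0.7^{2} + 0.6^{2} = 0.49 + 0.36 = 0.85 = 0.81 + 0.04 = 0.9^{2} + 0.2^{2}
```

The loadings quoted for V1 are therefore consistent with an orthogonal rotation: the variance of V1
explained by the two factors is 0.85 both before and after rotation, only its distribution across
the factors changes.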
The simplest case of rotation is an orthogonal rotation, in which the angle between the reference
axes of the factors is maintained at 90 degrees. More complicated forms of rotation allow the angle
between the reference axes to be other than a right angle, i.e., factors are allowed to be correlated
with each other. These types of rotational procedures are referred to as oblique rotations.
Orthogonal rotation procedures are more commonly used than oblique rotation procedures. In
some situations, theory may mandate that underlying latent constructs be uncorrelated with each
other, and therefore oblique rotation procedures will not be appropriate. In other situations where
the correlations between the underlying constructs are not assumed to be zero, oblique rotation
procedures may yield simpler and more interpretable factor patterns.
A number of orthogonal and oblique rotation procedures have been proposed. Each procedure
has a slightly different simplicity function to be maximized. The ROTATE= option in the PROC
FACTOR statement supports five orthogonal rotation methods: EQUAMAX, ORTHOMAX,
QUARTIMAX, PARSIMAX, and VARIMAX; and two oblique rotation methods:
PROCRUSTES and PROMAX. The VARIMAX method has been the most commonly used
orthogonal rotation procedure.
6. Interpretation of Factors
One part of the output from a factor analysis is a matrix of factor loadings. A factor loading or
factor structure matrix is an n by m matrix of correlations between the original variables and their
factors, where n is the number of variables and m is the number of retained factors. When an
oblique rotation method is performed, the output also includes a factor pattern matrix, which is a
matrix of standardized regression coefficients for each of the original variables on the rotated
factors. The meaning of the rotated factors is inferred from the variables significantly loaded on
their factors. A decision needs to be made regarding what constitutes a significant loading. A rule
of thumb frequently used is that factor loadings greater than .30 in absolute value are considered
to be significant. This criterion is just a guideline and may need to be adjusted. As the sample
size and the number of variables increase, the criterion may need to be adjusted slightly
downward; it may need to be adjusted upward as the number of factors increases. The procedure
described next outlines the steps of interpreting a factor matrix.
1. Identifying significant loadings: The analyst starts with the first variable (row) and examines
the factor loadings horizontally from left to right, underlining them if they are significant. This
process is repeated for all the other variables. You can instruct SAS to perform this step by using
the FUZZ= option in the PROC FACTOR statement. For instance, FUZZ=.30 prints only the
factor loadings greater than or equal to .30 in absolute value.
Ideally, we expect a single significant loading for each variable on only one factor: across each
row there is only one underlined factor loading. It is not uncommon, however, to observe split
loadings, in which a variable has significant loadings on more than one factor. On the other hand, if there are
variables that fail to load significantly on any factor, then the analyst should critically evaluate
these variables and consider deriving a new factor solution after eliminating them.
2. Naming of Factors: Once all significant loadings are identified, the analyst attempts to assign
some meaning to the factors based on the patterns of the factor loadings. To do this, the analyst
examines the significant loadings for each factor (column). In general, the larger the absolute
size of the factor loading for a variable, the more important the variable is in interpreting the
factor. The sign of the loadings also needs to be considered in labeling the factors. It may be
important to reverse the scoring of the negatively worded items in Likert-type instruments to
prevent ambiguity. That is, in Likert-type instruments some items are often negatively worded so
that high scores on these items actually reflect low degrees of the attitude or construct being
measured. Remember that the factor loadings represent the correlation or linear association
between a variable and the latent factor(s). Considering all the variables' loading on a factor,
including the size and sign of the loading, the investigator makes a determination as to what the
underlying factor may represent.
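The first step can be delegated to SAS along the following lines. The data set mydata, the
variables x1-x6, and the two-factor VARIMAX solution are hypothetical choices for illustration.
REORDER is an additional PROC FACTOR option, not discussed in this note, that groups variables
by the factor on which they load most highly:

```sas
/* Suppress loadings below .30 in absolute value and group variables by factor. */
PROC FACTOR DATA=mydata METHOD=PRINCIPAL PRIORS=SMC NFACTORS=2
            ROTATE=VARIMAX FUZZ=0.30 REORDER;
   VAR x1-x6;
RUN;
```

The naming step still rests on the analyst's judgment; the options only make the pattern of
loadings easier to read.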
manual[5] and is shown in Table 1. Inspection of the correlation matrix shows that the
correlations are substantial, indicating the presence of a substantial general factor.
Table 1. Correlation matrix for 13 subscales

Subscale          Inf  Sim  Ari  Voc  Com  Dig  PiC  Cod  PiA  Blo  Obj  Sym
Information
Similarities      .66
Arithmetic        .57  .55
Vocabulary        .70  .69  .54
Comprehension     .56  .59  .47  .64
Digit Span        .34  .34  .43  .35  .29
Pic. Completion   .47  .45  .39  .45  .38  .25
Coding            .21  .20  .27  .26  .25  .23  .18
Pic. Arrang.      .40  .39  .35  .40  .35  .20  .37  .28
Block Design      .48  .49  .52  .46  .40  .32  .52  .27  .41
Object Assembly   .41  .42  .39  .41  .34  .26  .49  .24  .37  .61
Symbol Search     .35  .35  .41  .35  .34  .28  .33  .53  .36  .45  .38
Mazes             .18  .18  .22  .17  .17  .14  .24  .15  .23  .31  .29  .24
PROC FACTOR can handle input data consisting of either a correlation matrix or the raw data
matrix used to produce the correlation matrix. The correlation matrix can be a SAS dataset
generated from the PROC CORR procedure or can be a text file containing the lower triangle
(including the main diagonal) of a correlation matrix. For our example, a text file of correlations
is created and called WISC.DAT. The following SAS DATA step code defines the type of the
input data file WISC.DAT as a correlation matrix, and labels its variables. The
_TYPE_='CORR'; statement must be typed exactly as shown:
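DATA d1 (TYPE=CORR);
   _TYPE_='CORR';               /* mark every row as a row of correlations */
   INFILE 'wisc.dat' MISSOVER;  /* MISSOVER: short lower-triangle rows leave the rest missing */
   INPUT inf sim ari voc com dig pic cod pia blo obj sym maz;
RUN;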
The following SAS code calls the FACTOR procedure with some options. METHOD=P or
METHOD=PRINCIPAL specifies the method for extracting factors to be the principal-axis
factoring method. This option in conjunction with PRIORS=SMC performs a principal factor
analysis. The option ROTATE=PROMAX performs an oblique rotation after an orthogonal
VARIMAX rotation. It is specified here because the hypothetical constructs that constitute
human intelligence, which WISC-III attempts to measure, are believed to be interrelated with
each other. The CORR option requests the correlation matrix be printed, and the RES or
RESIDUALS option requests that a residual correlation matrix be printed. The residual
correlation matrix shows the difference between the observed correlation matrix and the
predicted correlation matrix. If the retained factors are sufficient to explain the correlations
among the observed variables, the residual correlation matrix is expected to approximate a null
matrix (most values <= .10).
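The code listing itself is not present in this copy. A call consistent with the options described
above might look like the following sketch; the SCREE option is an assumption, included because a
scree plot is discussed later in the example:

```sas
/* Principal factor analysis of the WISC-III correlation matrix read into d1. */
/* PROMAX performs an oblique rotation after an initial VARIMAX rotation.     */
PROC FACTOR DATA=d1 METHOD=PRINCIPAL PRIORS=SMC ROTATE=PROMAX SCREE CORR RES;
RUN;
```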
Table 2 shows the prior communality estimates for 13 subtests used in this analysis. The squared
multiple correlations (SMC), which are printed below, represent the proportion of variance of
each of the 13 subtests shared by all remaining subtests. The subtest MAZES has the prior
communality estimate of 0.132, which means that only 13% of the variance of the subtest
MAZES is shared by all other subtests, indicating that this subtest measures a somewhat different
construct than the other subtests. A small communality estimate might indicate that the variable
or item may need to be modified or even dropped.
Table 2. Prior communality estimates (SMC)

Subtest    Prior communality
INFO       (not preserved)
SIM        0.587543
ARITH      0.481994
VOC        0.636296
COMP       0.473358
DIGIT      0.224104
PICTCOM    0.385580
CODING     0.306120
PICTARG    0.287693
BLOCK      0.533202
OBJECT     0.439176
SYMBOL     0.422932
MAZES      0.132220

Average = 0.42344554
The sum of all prior communality estimates, 5.505 in this example, is the estimate of the
common variance among all subtests. This initial estimate of the common variance constitutes
about 42% of the total variance present among all 13 subtests.
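The arithmetic behind these two figures can be written out as follows, taking the total variance of
13 standardized subtests to be 13 (h_i^2 is the prior communality of subtest i):

```latex
\sum_{i=1}^{13} h_i^{2} = 13 \times 0.42344554 \approx 5.505,
\qquad
\frac{5.505}{13} \approx 0.423
```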
Table 3 shows the factor numbers and corresponding eigenvalues. According to the Kaiser and
Guttman rule, only one factor can be retained because only the first factor has an eigenvalue
greater than one. However, as suggested in the previous section, this criterion may be applicable
only to principal component analysis, not common factor analysis. Two factors can be retained if
the average eigenvalue (0.423) instead of 1.0 is used as the criterion. The authors of WISC-III
retained all factors with positive eigenvalues and thus retained the first four factors. The fifth and
following factors have negative eigenvalues, which may not be intuitively appealing just as a
negative variance is not. This oddity occurs only in common factor analysis due to the restriction
that the sum of eigenvalues be set equal to the estimated common variance, not the total
variance.
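A short sketch of why negative eigenvalues arise, using \lambda_j for the jth eigenvalue (notation
assumed here). The eigenvalues are those of the correlation matrix with the prior communalities on
its diagonal, so they must sum to the estimated common variance. That matrix need not be positive
semidefinite, so some eigenvalues can fall below zero. The proportion reported for a factor is its
eigenvalue divided by the same sum; the values below are the sixth and thirteenth entries of Table 3:

```latex
\sum_{j=1}^{13} \lambda_j = \sum_{i=1}^{13} h_i^{2} \approx 5.505,
\qquad
\frac{-0.0224}{5.505} \approx -0.0041,
\qquad
\frac{-0.2031}{5.505} \approx -0.0369
```

This is also why the cumulative proportion rises above 1 for the early factors and returns to
exactly 1.0000 only at the last factor.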
Table 3. Eigenvalues (values for factors 1 through 5 are not preserved)

Factor   Eigenvalue   Difference   Proportion   Cumulative
6         -0.0224       0.0345      -0.0041       1.1450
7         -0.0569       0.0213      -0.0103       1.1347
8         -0.0782       0.0065      -0.0142       1.1205
9         -0.0848       0.0049      -0.0154       1.1051
10        -0.0897       0.0412      -0.0163       1.0888
11        -0.1310       0.0237      -0.0238       1.0650
12        -0.1547       0.0485      -0.0281       1.0369
13        -0.2031                   -0.0369       1.0000
The scree plot shown below seems to suggest the presence of a general factor as predicted from
the inspection of the correlation matrix. A large first eigenvalue (5.11) and a much smaller
second eigenvalue (0.68) suggest the presence of a dominant global factor. Stretching it to the
limit, one might argue that a secondary elbow occurred at the fifth factor, implying a four-factor
solution. That is equivalent to retaining all factors with positive eigenvalues. Research has
suggested that the structure of the Wechsler intelligence scales is hierarchical. That is, at the
top of the hierarchy all subtests converge to a single general factor, below which are several less
general factors defined by clusters of subtests. A four-factor solution is more interesting and
meaningful than a single-factor solution for investigating the hierarchical structure of the WISC-III.
The results presented in the following section are based on a four-factor solution, which was
obtained by repeating the analysis with the NFACTORS=4 option, specifying that the first four
factors be retained.
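The repeated run is not listed in this copy. Assuming the same data set d1 and the extraction and
rotation options described earlier, it might be sketched as:

```sas
/* Same principal factor analysis, now forcing a four-factor solution. */
PROC FACTOR DATA=d1 METHOD=PRINCIPAL PRIORS=SMC NFACTORS=4 ROTATE=PROMAX CORR RES;
RUN;
```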
Table 4.
Table 4 above shows the initial unrotated factor structure matrix, which consists of the
correlations between the 13 subtests and the four retained factors. The current estimate of the
common variance is now 6.338, which is somewhat larger than the initial estimate of 5.505.
The off-diagonal elements of the residual correlation matrix are all close to 0.01, indicating that
the correlations among the 13 subtests can be reproduced fairly accurately from the retained
factors. The root mean squared off-diagonal residual is 0.0178. The inspection of the partial
correlation matrix yields similar results: the correlations among the 13 subtests after the retained
factors are accounted for are all close to zero. The root mean squared partial correlation is 0.038,
indicating that four latent factors can accurately account for the observed correlations among the
13 subtests.
The table shown below is the factor structure matrix after the VARIMAX rotation. The
correlations greater than 0.30 are underlined. There are some split loadings where a variable is
significantly (> 0.3) loaded on more than one factor. This matrix, however, is not interpreted
because an oblique solution has been requested.
Table 5. Factor structure matrix after VARIMAX rotation
Table 6 shown below is the factor structure matrix after the oblique PROMAX rotation, which
allows the latent factors to be correlated with each other. The matrix of inter-factor correlations
(Table 7) shows that the factors are substantially correlated with each other. The inter-factor
correlations range between 0.44 and 0.65. If we submit these intercorrelated factors to new factor
analysis, we might be able to obtain a single second-order factor, which could correspond to the
general intelligence or g factor in previous research. One downside of an oblique rotation method
is that if the correlations among the factors are substantial, then it is sometimes difficult to
distinguish among factors by examining the factor loadings. In such situations, you should
investigate the factor pattern matrix, which is a matrix of the standardized coefficients for the
regression of the observed variables on the factors.
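The relationship between the two matrices can be stated compactly. The symbols are notation
assumed for this sketch: S is the factor structure matrix, P the factor pattern matrix, and \Phi
the matrix of inter-factor correlations:

```latex
S = P\,\Phi
```

With an orthogonal rotation \Phi is the identity matrix, so the structure and pattern matrices
coincide. With correlations between factors as large as those reported here, each structure
loading also picks up contributions routed through the other factors, which is why the pattern
matrix separates the factors more clearly.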
Table 6.
Table 8 is the factor pattern matrix, which will be used to interpret the meaning of the factors.
The values in this matrix are the standardized regression coefficients, which are functionally
related to the part or semipartial correlation between a variable and the factor when other factors
are held constant. Therefore, a value in this matrix represents the individual and nonredundant
contribution that each factor is making to predict a subtest. The regression coefficients greater
than 0.30 are underlined to assist the interpretation.
Table 8. Rotated Factor Pattern (Standardized Regression Coefficients); only the FACTOR1 column is preserved

Subtest    FACTOR1
INFO       0.73663
SIM        0.74378
ARITH      0.35704
VOC        0.85010
COMP       0.71870
DIGIT      0.16057
PICTCOM    0.24101
CODING     0.00651
PICTARG    0.25467
BLOCK      0.06661
OBJECT     0.04111
SYMBOL     0.03508
MAZES      0.08719
The subtests significantly loaded on the first factor are Information, Similarities, Arithmetic,
Vocabulary, and Comprehension subtests. These are the subtests that are orally presented and
require verbal responses. Therefore, this factor may be named "Verbal Comprehension". The
second factor is identified by the following subtests: Picture Completion, Picture Arrangement,
Block Design, and Object Assembly. All of these subtests have a geometric or configural
component in them: these subtests measure the skills that require the manual manipulation or
organization of pictures, objects, blocks, and the like. Therefore, this factor may be named
"Perceptual Organization." The two subtests loaded on the third factor are the Coding and Symbol
Search subtests. Both subtests measure basically the speed of simple coding or searching
process. Therefore, this factor can be named "Processing Speed." Finally, Arithmetic and Digit
Span subtests identify the fourth factor. Both subtests deal with arithmetic problems or numbers
so that this factor can be named "Numerical Ability." The last two factors are doublets since they
are identified by only two subtests each. Therefore, they are conceptually weak compared to the
first two factors and more subtests may need to be added to these factors to make them
conceptually sound.
It is possible to estimate the factor scores, or a subject's relative standing on each of the factors, if
the original subject-by-variable raw data matrix is available. To compute the factor scores for all
subjects on all factors, use the following SAS code:
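/* Add the extraction and rotation options used in the analysis to the PROC FACTOR statement. */
PROC FACTOR DATA=raw SCORE OUTSTAT=fact;    /* SCORE saves the scoring coefficients in fact */
RUN;
PROC SCORE DATA=raw SCORE=fact OUT=scores;  /* apply the coefficients to each subject */
RUN;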
where raw is the original data matrix, fact is the matrix of factor scoring coefficients, and scores
is the matrix of factor scores for subjects.
Footnotes
1. Guttman, L. (1953) "Image Theory for the Structure of Quantitative Variates",
Psychometrika, 18, 277-296.
2. Kaiser, H.F., and Rice, J. (1974) "Little Jiffy, Mark IV", Educational and Psychological
Measurement, 34, 111-117.
3. Loehlin, J.C. (1992) Latent Variable Models. Erlbaum Associates, Hillsdale NJ.