Factor Analysis (SPSS Based)
To demonstrate principal components analysis, we will use the sample problem in the
text which begins on page 120.
The perceptions of HATCO on seven attributes (Delivery speed, Price level, Price
flexibility, Manufacturer image, Service, Salesforce image, and Product quality) are
examined to (1) understand if these perceptions can be "grouped" and (2) reduce the
seven variables to a smaller number. (Text, page 120)
In this stage, we address sample size and measurement issues. Since missing data
affects sample size, we will also analyze patterns of missing data in this stage.
There are 100 subjects and 7 variables in the analysis for a ratio of 14 to 1. This
requirement is met.
The variables included in the analysis are core elements of the business. There do not
appear to be any extraneous variables.
Principal Components Factor Analysis
Stage 3: Assumptions of Factor Analysis
In this stage, we do the tests necessary to meet the assumptions of the statistical
analysis. For factor analysis, we will examine the suitability of the data for a factor
analysis.
We will use a criterion of 0.40 for identifying substantial loadings on factors, rather
than a statistical probability, so this requirement is not binding on this analysis.
Homogeneity of sample
There is nothing in the problem to indicate that subgroups within the sample have
different patterns of scores on the variables included in the analysis. In the absence of
evidence to the contrary, we will assume that this assumption is met.
The determination that the use of factor analysis is justifiable is obtained from the
statistical output that SPSS provides in the request for a factor analysis.
Requesting a Principal Components Factor Analysis
First, click on the 'Extraction...' button.
Second, select 'Principal components' from the 'Method' drop down menu.
Third, mark the 'Correlation matrix' option in the 'Analyze' panel.
Fourth, mark the checkboxes for 'Unrotated factor solution' and 'Scree plot' on the
'Display' panel.
Sixth, click on the 'Continue' button to complete the dialog box.
In this stage, several criteria are examined to determine the number of factors that represent
the data. If the analysis is designed to identify a factor structure that was obtained in or
suggested by previous research, we are employing an a priori criterion, i.e. we specify a specific
number of factors. The three other criteria are obtained in the data analysis: the latent root
criterion, the percentage of variance criterion, and the Scree test criterion. A final criterion
influencing the number of factors is actually deferred to the next stage, the interpretability
criterion. The derived factor structure must make plausible sense in terms of our research; if it
does not we should seek a factor solution with a different number of components.
It is generally recommended that each strategy be used in determining the number of factors in
the data set. It may be that multiple criteria suggest the same solution. If different criteria
suggest different conclusions, we might want to compare the parameters of our problem, i.e.
sample size, correlations, and communalities to determine which criterion should be given
greater weight.
One of the most commonly used criteria for determining the number of factors or components to
include is the latent root criterion, also known as the eigenvalue-one criterion or the Kaiser
criterion. With this approach, you retain and interpret any component that has an eigenvalue
greater than 1.0.
The rationale for this criterion is straightforward. Each observed variable contributes one unit of
variance to the total variance in the data set (the 1.0 on the diagonal of the correlation matrix).
Any component that displays an eigenvalue greater than 1.0 is accounting for a greater amount
of variance than was contributed by one variable. Such a component is therefore accounting for
a meaningful amount of variance and is worthy of being retained. On the other hand, a
component with an eigenvalue less than 1.0 is accounting for less variance than had been
contributed by one variable.
The latent root criterion has been shown to produce the correct number of components when the
number of variables included in the analysis is small (10 to 15) or moderate (20 to 30) and the
communalities are high (greater than 0.70). Low communalities are those below 0.40 (Stevens,
page 366).
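The latent root criterion can be sketched in a few lines of Python with NumPy. This is a minimal illustration, not the SPSS procedure itself: it assumes the six-variable full-sample correlation matrix reported in the outlier analysis later in this document, and the variable names `R`, `eigenvalues`, and `retained` are chosen here for illustration.

```python
import numpy as np

# Full-sample correlations reported in this document's outlier analysis.
# Order: Delivery Speed, Price Level, Price Flexibility,
#        Manufacturer Image, Salesforce Image, Product Quality.
R = np.array([
    [ 1.000, -0.349,  0.509,  0.050,  0.077, -0.483],
    [-0.349,  1.000, -0.487,  0.272,  0.186,  0.470],
    [ 0.509, -0.487,  1.000, -0.116, -0.034, -0.448],
    [ 0.050,  0.272, -0.116,  1.000,  0.788,  0.200],
    [ 0.077,  0.186, -0.034,  0.788,  1.000,  0.177],
    [-0.483,  0.470, -0.448,  0.200,  0.177,  1.000],
])

# eigvalsh returns eigenvalues of a symmetric matrix in ascending order,
# so reverse them to list the largest component first.
eigenvalues = np.linalg.eigvalsh(R)[::-1]

# Latent root (Kaiser) criterion: keep components with eigenvalue > 1.0.
retained = int(np.sum(eigenvalues > 1.0))

print("eigenvalues:", np.round(eigenvalues, 3))
print("components retained:", retained)
```

Because each variable contributes one unit of variance, the eigenvalues sum to the number of variables (six here), which is a quick check that the matrix was entered correctly.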
Another criterion in determining the number of factors to retain involves retaining a component
if it accounts for a specified proportion of variance in the data set, i.e. at least 5% or 10%.
Alternatively, one can retain enough components to explain some cumulative total percent of
variance, usually 70% to 80%.
While this strategy has intuitive appeal (our goal is to explain the variance in the data set), there
is not agreement about what percentages are appropriate to use and the strategy is criticized for
being subjective and arbitrary. We will employ 70% as the target total percent of variance.
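The percentage of variance criterion can be sketched the same way. The example below is an illustration under stated assumptions: it reuses the six-variable full-sample correlation matrix reported in the outlier analysis later in this document, and the names `proportion`, `cumulative`, `target`, and `n_needed` are hypothetical.

```python
import numpy as np

# Full-sample correlations reported in this document's outlier analysis.
# Order: Delivery Speed, Price Level, Price Flexibility,
#        Manufacturer Image, Salesforce Image, Product Quality.
R = np.array([
    [ 1.000, -0.349,  0.509,  0.050,  0.077, -0.483],
    [-0.349,  1.000, -0.487,  0.272,  0.186,  0.470],
    [ 0.509, -0.487,  1.000, -0.116, -0.034, -0.448],
    [ 0.050,  0.272, -0.116,  1.000,  0.788,  0.200],
    [ 0.077,  0.186, -0.034,  0.788,  1.000,  0.177],
    [-0.483,  0.470, -0.448,  0.200,  0.177,  1.000],
])

eigenvalues = np.linalg.eigvalsh(R)[::-1]      # largest first

# Share of total variance explained by each component, and the running total.
proportion = eigenvalues / eigenvalues.sum()
cumulative = np.cumsum(proportion)

# Smallest number of components whose cumulative share reaches the target.
target = 0.70
n_needed = int(np.argmax(cumulative >= target)) + 1

for i, (p, c) in enumerate(zip(proportion, cumulative), start=1):
    print(f"component {i}: {p:6.1%} of variance, {c:6.1%} cumulative")
print("components needed for target:", n_needed)
```

Changing `target` shows how sensitive the answer is to the chosen percentage, which is the subjectivity the criterion is criticized for.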
With a Scree test, the eigenvalues associated with each component are plotted against their
ordinal numbers (i.e. first eigenvalue, second eigenvalue, etc.). Generally what happens is that
the magnitude of successive eigenvalues drops off sharply (steep descent) and then tends to level
off.
The recommendation is to retain all the eigenvalues (and corresponding components) before the
first one on the line where they start to level off. (Hatcher and Stevens both indicate that the
number to accept is the number before the line levels off; Hair et al. say that the number of
factors is the point where the line levels off, but also state that this tends to indicate one or two
more factors than are indicated by the latent root criterion.)
The Scree test has been shown to be accurate in detecting the correct number of factors with a
sample size greater than 250 and communalities greater than 0.60.
Perhaps the most important criterion in solving the number of components problem is the
interpretability criterion: interpreting the substantive meaning of the retained components and
verifying that this interpretation makes sense in terms of what is known about the constructs
under investigation.
All three of the criteria for determining the number of components indicate that two
components should be retained. If this number matches the number of components
derived by SPSS, we can continue with the analysis. If this number does not match the
number of components that SPSS derived, we need to re-run the factor analysis,
specifying the number of components that we want SPSS to extract.
Once the extraction of factors has been completed satisfactorily, the resulting factor
matrix, which shows the relationship of the original variables to the factors, is rotated
to make it easier to interpret. The axes are rotated about the origin so that they are
located as close to the clusters of related variables as possible. The orthogonal
VARIMAX rotation is the one found most commonly in the literature. VARIMAX rotation
keeps the axes at right angles to each other so that the factors are not correlated with
each other. Oblique rotation permits the factors to be correlated, and is cited as
being a more realistic method of analysis. In the analyses that we do for this class we
will specify an orthogonal rotation.
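To make the idea of an orthogonal rotation concrete, here is a minimal NumPy sketch of a VARIMAX rotation applied to two unrotated principal component loadings. It is an illustration, not SPSS's implementation: it uses the common SVD-based iteration without Kaiser normalization, takes its input from the six-variable full-sample correlation matrix reported in the outlier analysis later in this document, and the function name `varimax` and its parameters are chosen here for illustration.

```python
import numpy as np

# Full-sample correlations reported in this document's outlier analysis.
R = np.array([
    [ 1.000, -0.349,  0.509,  0.050,  0.077, -0.483],
    [-0.349,  1.000, -0.487,  0.272,  0.186,  0.470],
    [ 0.509, -0.487,  1.000, -0.116, -0.034, -0.448],
    [ 0.050,  0.272, -0.116,  1.000,  0.788,  0.200],
    [ 0.077,  0.186, -0.034,  0.788,  1.000,  0.177],
    [-0.483,  0.470, -0.448,  0.200,  0.177,  1.000],
])

# Unrotated loadings for the two largest components:
# eigenvector scaled by the square root of its eigenvalue.
vals, vecs = np.linalg.eigh(R)
top = np.argsort(vals)[::-1][:2]
unrotated = vecs[:, top] * np.sqrt(vals[top])


def varimax(loadings, max_iter=100, tol=1e-8):
    """Return (rotated loadings, rotation matrix) for a raw varimax rotation."""
    p, k = loadings.shape
    rotation = np.eye(k)
    previous = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion with respect to the rotation.
        column_ss = np.diag(np.sum(rotated**2, axis=0))
        gradient = loadings.T @ (rotated**3 - rotated @ column_ss / p)
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt              # nearest orthogonal matrix
        current = s.sum()
        if previous != 0.0 and current / previous < 1 + tol:
            break
        previous = current
    return loadings @ rotation, rotation


rotated, T = varimax(unrotated)
print(np.round(rotated, 3))
```

Because the rotation matrix is orthogonal, each variable's communality (the row sum of squared loadings) is the same before and after rotation; only the distribution of variance across the two components changes. The sign of a whole column is arbitrary and may differ from the SPSS output.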
The first step in this stage is to determine if any variables should be eliminated from
the factor solution. A variable can be eliminated for two reasons. First, if the
communality of a variable is low, i.e. less than 0.50, it means that the factors contain
less than half of the variance in the original variable, so we might want to exclude
that variable from the factor analysis and use it in its original form in subsequent
analyses. Second, a variable may have loadings below the criterion level on all factors,
i.e. it does not have a strong enough relationship with any factor to be represented by
the factor score, so that the variable's information is better represented by the original
form of the variable.
Once we have eliminated variables that do not belong in the factor analysis, we
complete the analysis by naming the factors which we obtained. Naming is an
important piece of the analysis, because it assures us that the factor solution is
conceptually valid.
Once the extraction of factors has been completed, we examine the table of 'Communalities'
which tells us how much of the variance in each of the original variables is explained by the
extracted factors. For example, in the table shown below, 65.8% of the variance in the
original X1 'Delivery Speed' variable is explained by the two extracted components. Higher
communalities are desirable. If the communality for a variable is less than 50%, it is a
candidate for exclusion from the analysis because the factor solution contains less than half of
the variance in the original variable, and the explanatory power of that variable might be
better represented by the individual variable.
The table of Communalities for this analysis shows communalities for all variables above 0.50,
so we would not exclude any variables on the basis of low communalities. If we did exclude a
variable for a low communality, we should re-run the factor analysis without that variable
before proceeding.
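A communality is the sum of a variable's squared loadings across the extracted components. The sketch below shows that calculation for a two-component solution. It assumes the six-variable full-sample correlation matrix reported in the outlier analysis later in this document; the names `loadings`, `communalities`, and `low` are hypothetical, and the results are expected to be close to, but not necessarily identical with, the SPSS table because the input correlations are rounded.

```python
import numpy as np

names = ["Delivery Speed", "Price Level", "Price Flexibility",
         "Manufacturer Image", "Salesforce Image", "Product Quality"]

# Full-sample correlations reported in this document's outlier analysis.
R = np.array([
    [ 1.000, -0.349,  0.509,  0.050,  0.077, -0.483],
    [-0.349,  1.000, -0.487,  0.272,  0.186,  0.470],
    [ 0.509, -0.487,  1.000, -0.116, -0.034, -0.448],
    [ 0.050,  0.272, -0.116,  1.000,  0.788,  0.200],
    [ 0.077,  0.186, -0.034,  0.788,  1.000,  0.177],
    [-0.483,  0.470, -0.448,  0.200,  0.177,  1.000],
])

# Loadings on the two largest principal components.
vals, vecs = np.linalg.eigh(R)
top = np.argsort(vals)[::-1][:2]
loadings = vecs[:, top] * np.sqrt(vals[top])

# Communality = share of each variable's variance explained by the components.
communalities = np.sum(loadings**2, axis=1)

# Variables below the 0.50 cutoff are candidates for exclusion.
low = [name for name, h2 in zip(names, communalities) if h2 < 0.50]

for name, h2 in zip(names, communalities):
    print(f"{name:20s} {h2:.3f}")
print("candidates for exclusion:", low)
```

Rotation does not change these values, so the check can be made on either the unrotated or the rotated loadings.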
Analysis of the Factor Loadings - 1
When we are satisfied that the factor solution explains sufficient variance for all of the
variables in the analysis, we examine the 'Rotated Factor Matrix' to see if each variable has
a substantial loading on one, and only one, factor. The size of the loading termed
substantial is a subject about which there are a lot of divergent opinions. We will use a
time-honored rule of thumb that a substantial loading is 0.40 or higher. Whichever method
is employed to define substantial, the process of analyzing factor loadings is the same.
The methodology for analyzing factor loading is to underline or mark all of the loadings in
the rotated factor matrix that are higher than 0.40. For the rotated factor matrix for this
problem, the substantial loadings are highlighted in green.
We examine the pattern of loadings for what is called 'simple structure' which means that
each variable has a substantial loading on one and only one factor.
In this component matrix, each variable does have one substantial loading on a
component. If one or more variables did not have a substantial loading on a factor, we
would re-run the factor analysis excluding those variables one at a time, until we have a
solution in which all of the variables in the analysis load on at least one factor.
In this component matrix, each of the original variables also has a substantial loading on
only one factor. If a variable had a substantial loading on more than one factor, we refer
to that variable as "complex" meaning that it has a relationship to two or more of the
derived factors. There are a variety of prescriptions for handling complex variables. The
simple prescription is to ignore the complexity and treat the variable as belonging to the
factor on which it has the highest loading. A second simple solution to complexity is to
eliminate the complex variable from the factor analysis. I have seen other instances where
authors chose to include it as a variable in multiple factors, or to arbitrarily assign it to a
factor for conceptual reasons. Other prescriptions are to try different methods of factor
extraction and rotation to see if a more interpretable solution can be found.
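The check for simple structure described above can be expressed as a small helper. This is a sketch under stated assumptions: the function name `classify_loadings`, the labels it returns, and the example loadings are all hypothetical and are not taken from the HATCO output.

```python
import numpy as np


def classify_loadings(loadings, names, threshold=0.40):
    """Label each variable by how many factors it loads on substantially.

    'none'    -> no loading reaches the threshold (candidate for removal)
    'simple'  -> exactly one substantial loading (simple structure)
    'complex' -> substantial loadings on two or more factors
    """
    labels = {}
    # Sign is ignored: a loading of -0.75 is as substantial as +0.75.
    for name, row in zip(names, np.abs(np.asarray(loadings, dtype=float))):
        count = int(np.sum(row >= threshold))
        if count == 0:
            labels[name] = "none"
        elif count == 1:
            labels[name] = "simple"
        else:
            labels[name] = "complex"
    return labels


# Hypothetical rotated loadings, for illustration only.
example_names = ["A", "B", "C", "D"]
example_loadings = [
    [ 0.80, 0.10],
    [ 0.55, 0.45],
    [ 0.20, 0.30],
    [-0.75, 0.05],
]

labels = classify_loadings(example_loadings, example_names)
print(labels)
```

In this made-up example, variable B would be treated as complex and variable C as a candidate for removal; the threshold argument can be changed if a different definition of substantial is preferred.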
If the factors are not conceptually distinct and cannot be named satisfactorily, the
factor solution may be a mathematical contrivance that has no useful application.
The naming should take into account the signs of the factor loadings, i.e. negative
signs imply an inverse relationship to the factor. For example, X1 'Delivery Speed',
X2 'Price Level', X3 'Price Flexibility', and X7 'Product Quality' load on the first
factor. Two of these variables: X1 'Delivery Speed' and X3 'Price Flexibility' have a
negative sign meaning that they vary inversely to the two variables which have a
positive loading: X2 'Price Level' and X7 'Product Quality'. The name for this factor,
which the authors term 'basic value' on page 126 of the text, attempts to take into
account the direction of the relationships among all of these variables.
The two variables loading on the second factor X4 'Manufacturer Image' and
X6 'Salesforce Image' both have positive signs and are named 'HATCO image' by the
authors on page 127 of the text.
In the validation stage of the factor analysis, we are concerned with the
generalizability of the factor model we have derived. We examine two issues: first, is
the factor model stable and generalizable, and second, is the factor solution impacted
by outliers.
The only method for examining the generalizability of the factor model in SPSS is a
split-half validation.
As in all multivariate methods, the findings in factor analysis are impacted by sample
size. The larger the sample size, the greater the opportunity to obtain significant
findings that are present only because of the large sample. The strategy for examining
the stability of the model is to do a split-half validation to see if the factor structure
and the communalities remain the same.
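The random assignment behind a split-half validation can be sketched as follows. This is an illustration only: it assumes 100 cases as in this analysis, uses the name `split` to mirror the selection variable used in SPSS, and the seed and the 0/1 coding are arbitrary choices made here.

```python
import numpy as np

n_cases = 100                           # sample size stated for this analysis
rng = np.random.default_rng(seed=0)     # arbitrary seed, for repeatability

# Assign exactly half of the cases to group 1 and the rest to group 0.
split = np.zeros(n_cases, dtype=int)
split[rng.permutation(n_cases)[: n_cases // 2]] = 1

first_half = np.flatnonzero(split == 0)    # row indices for the first run
second_half = np.flatnonzero(split == 1)   # row indices for the second run

print("cases in first half:", len(first_half))
print("cases in second half:", len(second_half))
```

The factor analysis is then run once on each half, and the two rotated loading patterns and sets of communalities are compared with those from the full sample.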
Second, highlight the 'split' variable and click on the move button to put it into the
'Selection Variable:' text box.
Third, click on the 'Value...' button which was activated when the 'Selection Variable:'
text box was highlighted.
Fifth, click on the Continue button to complete the value assignment.
Click on the OK button in the Factor Analysis dialog to compute the factor analysis for
the second half of the sample.
The two rotated factor matrices for each half of the sample produce the same pattern
of loadings of variables on factors that we obtained for the analysis on the complete
sample. This result validates the factor solution obtained. Had we obtained fewer
factors or a different pattern of loading, we should adjust our analysis accordingly, or
include this information in the discussion of limitations to our study.
While the communalities differ for the two models, in all cases they are above 0.50,
indicating that the factor model is explaining more than half of the variance in all of
the original variables.
SPSS proposes a strategy for identifying outliers that is not found in the text (See: SPSS
Base 7.5 Applications Guide, pp. 303-304). SPSS computes the factor scores as
standard scores with a mean of 0 and a standard deviation of 1. We can examine the
factor scores to see if any are above or below the standard score size associated with
extreme cases, i.e. +/-2.5 or +/-3.0. For this analysis, we will need to compute the
factor scores, which we have not requested to this point.
Second, we click on the Continue button to complete our selection of statistics.
Using a criterion of +/-2.5, we have no outliers on the first factor and two outliers on
the second factor, case ID 5 and case ID 42.
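Screening standardized factor scores against a cutoff can be sketched as below. This is an illustration, not the SPSS output: the function name `flag_outliers`, the case IDs, and the example scores are hypothetical and do not reproduce the HATCO results.

```python
import numpy as np


def flag_outliers(scores, case_ids, cutoff=2.5):
    """Return, for each factor, the case IDs whose score exceeds +/- cutoff."""
    scores = np.asarray(scores, dtype=float)
    extreme = np.abs(scores) > cutoff          # True where a score is extreme
    return {
        f"factor_{j + 1}": [cid for cid, hit in zip(case_ids, extreme[:, j]) if hit]
        for j in range(scores.shape[1])
    }


# Hypothetical standardized factor scores (rows = cases, columns = factors).
example_ids = [1, 2, 3, 4, 5]
example_scores = [
    [ 0.3, -0.2],
    [ 1.1,  2.7],
    [-2.4,  0.5],
    [ 0.0, -3.1],
    [ 2.5,  0.1],
]

outliers = flag_outliers(example_scores, example_ids)
print(outliers)
```

The comparison is strict, so a score of exactly 2.5 is not flagged; passing `cutoff=3.0` applies the more conservative criterion mentioned above.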
The correlation matrix for the full sample is shown in the top half of the window, and
the correlation matrix for the sample excluding outliers is shown in the bottom half of
the window. Some correlations are stronger without the outliers and others are
weaker, but the overall pattern of correlations in the matrix is the same. This output
would not support a conclusion that the outliers are having an impact on the factor
results.

Correlation Matrix (full sample)

                     Delivery  Price   Price        Manufacturer  Salesforce  Product
                     Speed     Level   Flexibility  Image         Image       Quality
Delivery Speed        1.000    -.349     .509          .050          .077      -.483
Price Level           -.349    1.000    -.487          .272          .186       .470
Price Flexibility      .509    -.487    1.000         -.116         -.034      -.448
Manufacturer Image     .050     .272    -.116         1.000          .788       .200
Salesforce Image       .077     .186    -.034          .788         1.000       .177
Product Quality       -.483     .470    -.448          .200          .177      1.000

Correlation Matrix (sample excluding outliers)

                     Delivery  Price   Price        Manufacturer  Salesforce  Product
                     Speed     Level   Flexibility  Image         Image       Quality
Delivery Speed        1.000    -.319     .487         -.039         -.020      -.450
Price Level           -.319    1.000    -.471          .353          .272       .449
Price Flexibility      .487    -.471    1.000         -.186         -.107      -.426
Manufacturer Image    -.039     .353    -.186         1.000          .761       .295
Salesforce Image      -.020     .272    -.107          .761         1.000       .284
Product Quality       -.450     .449    -.426          .295          .284      1.000
The communalities for the full model are shown on the left, with the communalities
for the model excluding outliers shown on the right. The overall pattern of
communalities is identical for both models.
The Rotated Component Matrix for the full model is shown on the left, with the
Rotated Component Matrix for the model excluding outliers shown on the right. The
overall pattern of loadings in the Rotated Component Matrix is identical for both models, so we would
conclude that the outliers are not impacting our solution. In subsequent analysis using
the factors, we can include all cases in the analysis.
We have already computed the scores for the two factors, which we can use in
subsequent analyses as a substitute for the six original variables.
Another option for reducing the data set is to select one of the variables on each
factor to use as a surrogate for all the variables that loaded on that factor.
A more common method for incorporating the results of the factor analysis is to create
summated scale variables. In this method, the variables which load on each factor are
simply summed to form the scale score, rather than using the weights or coefficients
for each variable that SPSS uses in calculating factor scores.
Summated scales are easier to compute than weighted factor scores and can easily be
applied to cases not included in the original factor analysis. When summated scales
are used, it is customary to compute Cronbach's Alpha.
(From: Larry Hatcher and Edward J. Stepanski. A Step-by-Step Approach to Using the
SAS System for Univariate and Multivariate Statistics.)
Summated or additive scales are formed by summing the scores for a set of variables
that load on a factor. If you incorporate summated or additive scales into your
research, there is an expectation that you will include efforts to assess the reliability
of your measures.
A variety of methods for estimating scale reliability are actually used in practice. Test-
retest reliability is assessed by administering the same instrument to the same sample
of subjects at two points in time and computing the correlation between the two sets
of scores. However, this can be a time consuming and expensive procedure, where you
are collecting additional data that cannot be used in other analyses. Because of the
cost and time involved in test-retest procedures, indices of reliability that require only
one administration are often used. The most popular of these indices are the internal
consistency indices of reliability. Briefly, internal consistency is the extent to which
the individual items that constitute a test correlate with one another or with the test
total. In the social sciences, one of the most widely used indices of internal
consistency is coefficient alpha or Cronbach's alpha.
Summated Scales and Cronbach's Alpha - 2
While coefficient alpha typically takes values from 0 to 1.0, the general rule of thumb is that it
must be above 0.70 in order to be judged adequate. Coefficient alpha will be high to
the extent that many items are included in the scale, and the items that constitute the
scale are highly correlated with one another.
Coefficient alpha requires that we specify the variables that we believe form the scale
and the measure tells us whether or not we have internal consistency between these
items and a summated scale that would be formed from them.
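Coefficient alpha can be computed directly from item scores as alpha = (k / (k - 1)) * (1 - sum of item variances / variance of the summated scale), where k is the number of items. The sketch below implements that formula and, as a separate variant, the standardized form based on the average inter-item correlation. The function names and the raw item scores are hypothetical; the only value taken from this document is the full-sample correlation of .788 between Manufacturer Image and Salesforce Image.

```python
import numpy as np


def cronbach_alpha(items):
    """Coefficient alpha from raw scores (rows = cases, columns = items)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)       # variance of each item
    scale_variance = items.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return (k / (k - 1)) * (1 - item_variances.sum() / scale_variance)


def standardized_alpha(k, mean_r):
    """Alpha for k standardized items with average inter-item correlation mean_r."""
    return (k * mean_r) / (1 + (k - 1) * mean_r)


# Hypothetical item scores, for illustration only.
example_items = [
    [4, 5, 4],
    [3, 3, 4],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 3],
]
alpha_example = cronbach_alpha(example_items)

# Two-item scale using the .788 correlation reported in this document.
alpha_image = standardized_alpha(k=2, mean_r=0.788)

print("alpha (hypothetical items):", round(alpha_example, 3))
print("standardized alpha (image items):", round(alpha_image, 3))
```

Both forms show why alpha rises with the number of items and with the strength of the correlations among them. The raw-score version is the one to use when the item scores themselves are available.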
In many of the articles which we have used this semester, variables have been
combined to form scales which were in turn used as independent variables. The alpha
statistic frequently cited in discussing the formation of the scales is Cronbach's, or
coefficient, alpha.