Factor Analysis (SPSS Based)

Factor Analysis (Principal Component Analysis Based)
Principal Components Factor Analysis
The purpose of principal components factor analysis is to reduce the number of
variables in the analysis by using a surrogate variable or factor to represent a number
of variables, while retaining the variance that was present in the original
variables. The data analysis indicates the relationship between the original variables
and the factors, so that we know how to make the substitutions.
Principal components analysis is frequently used to simplify a data set prior to conducting a
multiple regression or discriminant analysis.
To demonstrate principal components analysis, we will use the sample problem in the
text which begins on page 120.

Stage 1: Define the Research Problem
The perceptions of HATCO on seven attributes (Delivery speed, Price level, Price
flexibility, Manufacturer image, Service, Salesforce image, and Product quality) are
examined to (1) understand if these perceptions can be "grouped" and (2) reduce the
seven variables to a smaller number. (Text, page 120)

Stage 2: Designing a Factor Analysis
In this stage, we address sample size and measurement issues. Since missing
data has an impact on sample size, we will analyze patterns of missing data in this
stage.
Sample size issues
Missing Data Analysis
There is no missing data in the HATCO data set.
Sample size of 100 or more
There are 100 subjects in the sample. This requirement is met.
Ratio of subjects to variables should be at least 5 to 1
There are 100 subjects and 7 variables in the analysis, for a ratio of about 14 to 1. This
requirement is met.
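Both sample-size rules of thumb are simple arithmetic. As a minimal sketch in plain Python (outside the SPSS workflow; the function name `check_sample_size` is illustrative, not part of any package), they could be checked like this:

```python
def check_sample_size(n_subjects, n_variables, min_n=100, min_ratio=5.0):
    """Apply the two rules of thumb: at least `min_n` subjects and
    at least `min_ratio` subjects per variable."""
    ratio = n_subjects / n_variables
    return {"n_ok": n_subjects >= min_n, "ratio": ratio, "ratio_ok": ratio >= min_ratio}


result = check_sample_size(100, 7)
print(result["n_ok"], round(result["ratio"], 1), result["ratio_ok"])  # True 14.3 True
```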
Variable selection and measurement issues
Dummy code non-metric variables
All variables in the analysis are metric so no dummy coding is required.
Parsimonious variable selection
The variables included in the analysis are core elements of the business. There do not
appear to be any extraneous variables.
Stage 3: Assumptions of Factor Analysis
In this stage, we test whether the data meet the assumptions of the statistical
analysis. For factor analysis, we will examine the suitability of the data for a factor
analysis.
Metric or dummy-coded variables
All variables are metric in this analysis.
Departures from normality, homoscedasticity, and linearity diminish correlations
In the last class, we conducted an exploratory analysis of these variables. We know
from this that several variables are not normally distributed. Non-normality will diminish
the correlations among the variables, so the relationships between variables
might be stronger than what we are able to represent in this analysis.
Multivariate normality required if a statistical criterion for factor loadings is used
We will use the criterion of 0.40 for identifying substantial loadings on factors, rather
than a statistical probability, so this assumption is not binding on this analysis.
Homogeneity of sample
There is nothing in the problem to indicate that subgroups within the sample have
different patterns of scores on the variables included in the analysis. In the absence of
evidence to the contrary, we will assume that this assumption is met.
Use of Factor Analysis is Justified
Whether the use of factor analysis is justified is determined from the statistical output
that SPSS provides when we request a factor analysis.
Requesting a Principal Components Factor Analysis
Click on the 'Data Reduction | Factor...' command in the Analyze menu.

Specify the Variables to Include in the Analysis
First, highlight the variables: X1 'Delivery Speed', X2 'Price Level', X3 'Price Flexibility', X4 'Manufacturer Image', X5 'Service', X6 'Salesforce Image', and X7 'Product Quality'.
Second, click on the move arrow to move the highlighted variables to the 'Variables:' list.

Specify the Descriptive Statistics to include in the Output
First, click on the 'Descriptives...' button.
Second, mark the checkbox for 'Initial solution' in the 'Statistics' panel. Clear all other checkboxes.
Third, mark the checkboxes for 'Coefficients', 'KMO and Bartlett's test of sphericity', and 'Anti-image' on the 'Correlation Matrix' panel. Clear all other checkboxes.
Fourth, click on the 'Continue' button to complete the 'Factor Analysis: Descriptives' dialog box.

Specify the Extraction Method and Number of Factors
First, click on the 'Extraction...' button.
Second, select 'Principal components' from the 'Method' drop down menu.
Third, mark the 'Correlation matrix' option in the 'Analyze' panel.
Fourth, mark the checkboxes for 'Unrotated factor solution' and 'Scree plot' on the 'Display' panel.
Fifth, accept the default values of 'Eigenvalues over: 1' on the Extract panel and the 'Maximum Iterations for convergence: 25'.
Sixth, click on the 'Continue' button to complete the dialog box.

Specify the Rotation Method
First, click on the 'Rotation...' button.
Second, mark the 'Varimax' option on the 'Method' panel.
Third, mark the checkbox for 'Rotated solution' on the 'Display' panel. Clear all other checkboxes.
Fourth, click on the 'Continue' button to complete the dialog box.

Complete the Factor Analysis Request
Click on the OK button to complete the factor analysis request.

Count the Number of Correlations Greater than 0.30
Nine of the 21 correlations in the matrix are larger than 0.30 in absolute value,
highlighted in yellow in the Correlation Matrix. We meet this criterion for the
suitability of the data for factor analysis.
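The count is easy to reproduce outside SPSS. The minimal Python sketch below uses the six-variable correlation matrix for the full sample that is reported later in this section (after X5 'Service' was dropped), so it finds 7 of 15 correlations above 0.30 in absolute value rather than the 9 of 21 quoted above for the seven-variable matrix. Only the matrix values come from the text; the helper name `count_substantial` is illustrative.

```python
from itertools import combinations

# Full-sample correlations for the six variables retained after X5 was dropped.
NAMES = ["X1", "X2", "X3", "X4", "X6", "X7"]
R = [
    [1.000, -0.349, 0.509, 0.050, 0.077, -0.483],
    [-0.349, 1.000, -0.487, 0.272, 0.186, 0.470],
    [0.509, -0.487, 1.000, -0.116, -0.034, -0.448],
    [0.050, 0.272, -0.116, 1.000, 0.788, 0.200],
    [0.077, 0.186, -0.034, 0.788, 1.000, 0.177],
    [-0.483, 0.470, -0.448, 0.200, 0.177, 1.000],
]


def count_substantial(corr, threshold=0.30):
    """Count unique off-diagonal correlations whose absolute value exceeds threshold."""
    pairs = list(combinations(range(len(corr)), 2))
    hits = [(i, j) for i, j in pairs if abs(corr[i][j]) > threshold]
    return len(hits), len(pairs), hits


count, total, hits = count_substantial(R)
print(f"{count} of {total} correlations exceed 0.30 in absolute value")  # 7 of 15
for i, j in hits:
    print(NAMES[i], NAMES[j], R[i][j])
```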

Measures of Appropriateness of Factor Analysis
Interpretive adjectives for the Kaiser-Meyer-Olkin Measure of Sampling Adequacy are:
in the 0.90s, marvelous; in the 0.80s, meritorious; in the 0.70s, middling; in the
0.60s, mediocre; in the 0.50s, miserable; and below 0.50, unacceptable. The value of
the KMO Measure of Sampling Adequacy for this set of variables is .446, falling below
the acceptable level. We will examine the anti-image correlation matrix to see if it
suggests any possible remedies.
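The interpretive bands amount to a simple lookup. A minimal Python sketch that applies them (the function name `describe_kmo` is illustrative):

```python
def describe_kmo(kmo):
    """Map a KMO value to the interpretive adjectives listed above."""
    bands = [
        (0.90, "marvelous"),
        (0.80, "meritorious"),
        (0.70, "middling"),
        (0.60, "mediocre"),
        (0.50, "miserable"),
    ]
    for lower, label in bands:
        if kmo >= lower:
            return label
    return "unacceptable"


print(describe_kmo(0.446))  # unacceptable
print(describe_kmo(0.665))  # mediocre
```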

Assessing the Sampling Adequacy Problem
The Anti-image Correlation Matrix contains the measures of sampling adequacy for the
individual variables on the diagonal of the matrix, highlighted in cyan. The measures for
three variables fall below the acceptable level of 0.50: X1 'Delivery Speed' (.344),
X2 'Price Level' (.330), and X5 'Service' (.288). The corrective action is to delete the
variables one at a time, starting with the one with the smallest value, until the problem
is corrected.
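SPSS computes these measures for us. For readers who want to see the arithmetic, here is a minimal pure-Python sketch, assuming the standard definitions: partial (anti-image) correlations come from the inverse P of the correlation matrix as -p_ij / sqrt(p_ii p_jj), each variable's MSA compares its squared correlations with its squared partial correlations, and the one-at-a-time deletion rule described above is applied. The helper names are hypothetical, and results may differ slightly from SPSS when the input correlations are rounded.

```python
import math


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def kmo_and_msa(corr):
    """Return (overall KMO, list of per-variable MSA values)."""
    n = len(corr)
    inv = invert(corr)
    row_r2, row_a2 = [0.0] * n, [0.0] * n
    for i in range(n):
        for j in range(n):
            if i != j:
                row_r2[i] += corr[i][j] ** 2
                # squared partial correlation: p_ij^2 / (p_ii * p_jj)
                row_a2[i] += inv[i][j] ** 2 / (inv[i][i] * inv[j][j])
    msa = [r / (r + a) for r, a in zip(row_r2, row_a2)]
    kmo = sum(row_r2) / (sum(row_r2) + sum(row_a2))
    return kmo, msa


def drop_low_msa(corr, names, threshold=0.50):
    """Drop the variable with the smallest MSA, one at a time, until all meet threshold."""
    keep, removed = list(range(len(names))), []
    while len(keep) > 2:
        sub = [[corr[i][j] for j in keep] for i in keep]
        _, msa = kmo_and_msa(sub)
        worst = min(range(len(keep)), key=lambda k: msa[k])
        if msa[worst] >= threshold:
            break
        removed.append(names[keep.pop(worst)])
    kmo, _ = kmo_and_msa([[corr[i][j] for j in keep] for i in keep])
    return [names[i] for i in keep], removed, kmo


kmo, msa = kmo_and_msa([[1.0, 0.6], [0.6, 1.0]])
print(round(kmo, 3))  # 0.5
R3 = [[1.0, 0.5, 0.4], [0.5, 1.0, 0.3], [0.4, 0.3, 1.0]]  # hypothetical matrix
print(drop_low_msa(R3, ["A", "B", "C"]))
```

A quick sanity check on the formulas: with only two variables, the partial correlation equals the simple correlation, so the KMO is always exactly 0.5.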

Removing X5 'Service' from the Analysis
First, click on the 'Dialog Recall' tool button.
Second, in the drop down menu of recently used dialogs, highlight the 'Factor Analysis' item.
Third, highlight 'Service (X5)' in the list of 'Variables:'.
Fourth, click on the move arrow to return 'Service (X5)' to the list of available variables.
Fifth, click on the OK button to request the revised analysis.

The Revised Measures of Appropriateness of Factor Analysis
The revised KMO Measure of Sampling Adequacy has a value of 0.665, in the range of
acceptable values.
Bartlett's test of sphericity tests the hypothesis that the correlation matrix is an
identity matrix, i.e. all diagonal elements are 1 and all off-diagonal elements are 0,
implying that all of the variables are uncorrelated. If the Sig. value for this test is
less than our alpha level, we reject the null hypothesis that the population matrix is an
identity matrix. The Sig. value for this analysis leads us to reject the null hypothesis
and conclude that there are correlations in the data set that are appropriate for factor
analysis.
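For reference, Bartlett's statistic is usually computed as chi-square = -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom, where n is the number of cases, p the number of variables, and |R| the determinant of the correlation matrix. The minimal Python sketch below assumes that formula. The tail probability is computed from the power series of the regularized incomplete gamma function instead of a statistics library, so treat it as an illustration rather than a replacement for the SPSS Sig. value; the function names are hypothetical.

```python
import math


def log_det(matrix):
    """Natural log of the determinant of a symmetric positive-definite matrix,
    via Gaussian elimination without pivoting (pivots stay positive for such input)."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    total = 0.0
    for col in range(n):
        pivot = a[col][col]
        if pivot <= 0.0:
            raise ValueError("matrix is not positive definite")
        total += math.log(pivot)
        for r in range(col + 1, n):
            f = a[r][col] / pivot
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
    return total


def chi2_sf(stat, df):
    """Upper-tail chi-square probability: 1 - P(df/2, stat/2), where P is the
    regularized lower incomplete gamma function evaluated by its power series."""
    a, x = df / 2.0, stat / 2.0
    if x <= 0.0:
        return 1.0
    term = math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
    if term == 0.0:  # leading term underflowed; the tail is effectively 0 or 1
        return 0.0 if x > a else 1.0
    total, n = term, 0
    while term > 1e-16 * total:
        n += 1
        term *= x / (a + n)
        total += term
    return max(0.0, 1.0 - total)


def bartlett_sphericity(corr, n_cases):
    """Return (chi-square statistic, degrees of freedom, p-value)."""
    p = len(corr)
    stat = -(n_cases - 1 - (2 * p + 5) / 6.0) * log_det(corr)
    df = p * (p - 1) // 2
    return stat, df, chi2_sf(stat, df)


stat, df, p_value = bartlett_sphericity([[1.0, 0.6], [0.6, 1.0]], n_cases=100)
print(round(stat, 2), df, p_value)  # statistic about 43.51 with 1 df
```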

The Revised Anti-image Correlation Matrix
The new anti-image correlation matrix indicates that the sampling adequacy for each
variable is above the 0.50 threshold.

Stage 4: Deriving Factors and Assessing Overall Fit - 1
In this stage, several criteria are examined to determine the number of factors that represent
the data. If the analysis is designed to identify a factor structure that was obtained in or
suggested by previous research, we are employing an a priori criterion, i.e. we specify the
number of factors in advance. The three other criteria are obtained in the data analysis: the
latent root criterion, the percentage of variance criterion, and the scree test criterion. A final
criterion influencing the number of factors, the interpretability criterion, is actually deferred to
the next stage. The derived factor structure must make plausible sense in terms of our research;
if it does not, we should seek a factor solution with a different number of components.
It is generally recommended that each strategy be used in determining the number of factors in
the data set. It may be that multiple criteria suggest the same solution. If different criteria
suggest different conclusions, we might want to compare the parameters of our problem, i.e.
sample size, correlations, and communalities to determine which criterion should be given
greater weight.

The Latent Root Criterion
One of the most commonly used criteria for determining the number of factors or components to
include is the latent root criterion, also known as the eigenvalue-one criterion or the Kaiser
criterion. With this approach, you retain and interpret any component that has an eigenvalue
greater than 1.0.
The rationale for this criterion is straightforward. Each observed variable contributes one unit of
variance to the total variance in the data set (the 1.0 on the diagonal of the correlation matrix).
Any component that displays an eigenvalue greater than 1.0 is accounting for a greater amount
of variance than was contributed by one variable. Such a component is therefore accounting for
a meaningful amount of variance and is worthy of being retained. On the other hand, a
component with an eigenvalue less than 1.0 is accounting for less variance than had been
contributed by one variable.
The latent root criterion has been shown to produce the correct number of components when the
number of variables included in the analysis is small (10 to 15) or moderate (20 to 30) and the
communalities are high (greater than 0.70). Low communalities are those below 0.40 (Stevens,
page 366).

Proportion of Variance Accounted For
Another criterion in determining the number of factors to retain involves retaining a component
if it accounts for a specified proportion of variance in the data set, e.g. at least 5% or 10%.
Alternatively, one can retain enough components to explain some cumulative total percent of
variance, usually 70% to 80%.
While this strategy has intuitive appeal (our goal is to explain the variance in the data set), there
is no agreement about what percentages are appropriate to use, and the strategy is criticized for
being subjective and arbitrary. We will employ 70% as the target total percent of variance.

The Scree Test
With a Scree test, the eigenvalues associated with each component are plotted against their
ordinal numbers (i.e. first eigenvalue, second eigenvalue, etc.). Generally what happens is that
the magnitude of successive eigenvalues drops off sharply (steep descent) and then tends to level
off.
The recommendation is to retain all the eigenvalues (and corresponding components) before the
first one on the line where they start to level off. (Hatcher and Stevens both indicate that the
number to accept is the number before the line levels off; Hair et al. say that the number of
factors is the point where the line levels off, but also state that this tends to indicate one or two
more factors than are indicated by the latent root criterion.)
The Scree test has been shown to be accurate in detecting the correct number of factors with a
sample size greater than 250 and communalities greater than 0.60.

The Interpretability Criterion
Perhaps the most important criterion in solving the number of components problem is the
interpretability criterion: interpreting the substantive meaning of the retained components and
verifying that this interpretation makes sense in terms of what is known about the constructs
under investigation.
The interpretability of a component is improved if it is measured by at least three variables,
when all of the variables that load on the component have the same conceptual meaning, when
the other components appear to be measuring other constructs, and when the rotated factor
pattern shows simple structure, i.e. each variable loads significantly on only one component.

The Latent Root Criterion
In the latent root criterion, we identify the number of eigenvalues that are larger than
1.0. In this table, we have two, so this criterion supports the presence of two
components or factors.
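SPSS reports the eigenvalues directly. To make the latent root criterion concrete, here is a minimal pure-Python sketch that computes the eigenvalues of the six-variable full-sample correlation matrix (the one reported later in this section) with the cyclic Jacobi rotation method and counts those above 1.0. The Jacobi method is our choice for a dependency-free illustration, not necessarily what SPSS uses internally; because the correlations are rounded to three decimals, the eigenvalues may differ slightly from the SPSS table, but the count should still be two.

```python
import math

# Full-sample correlations for X1, X2, X3, X4, X6, X7 (X5 removed).
R = [
    [1.000, -0.349, 0.509, 0.050, 0.077, -0.483],
    [-0.349, 1.000, -0.487, 0.272, 0.186, 0.470],
    [0.509, -0.487, 1.000, -0.116, -0.034, -0.448],
    [0.050, 0.272, -0.116, 1.000, 0.788, 0.200],
    [0.077, 0.186, -0.034, 0.788, 1.000, 0.177],
    [-0.483, 0.470, -0.448, 0.200, 0.177, 1.000],
]


def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    a = [list(map(float, row)) for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) element becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


eigenvalues = jacobi_eigenvalues(R)
retained = sum(1 for ev in eigenvalues if ev > 1.0)
print([round(ev, 3) for ev in eigenvalues])
print(retained, "components have eigenvalues above 1.0")
```

Because each variable contributes 1.0 to the diagonal, the eigenvalues of a correlation matrix always sum to the number of variables (six here), which is a handy check on the output.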

Percentage of Variance Criterion
In this criterion, we count the number of components that would be necessary to explain
70% or more of the variance in the original set of variables. In this analysis, we reach
the 70% minimum with two components.
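The counting rule can be sketched in a few lines of Python. The eigenvalues in the usage lines are hypothetical values chosen for illustration, not the HATCO output, and the function name is illustrative:

```python
def components_for_variance(eigenvalues, target=0.70):
    """Return the smallest number of components whose cumulative share of
    total variance reaches `target`, plus the cumulative shares examined."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)  # equals the number of variables for a correlation matrix
    cumulative, shares = 0.0, []
    for k, ev in enumerate(ordered, start=1):
        cumulative += ev / total
        shares.append(round(cumulative, 3))
        if cumulative >= target:
            return k, shares
    return len(ordered), shares


# Hypothetical eigenvalues for six variables (illustration only).
example = [2.5, 1.8, 0.6, 0.5, 0.4, 0.2]
print(components_for_variance(example))        # (2, [0.417, 0.717])
print(components_for_variance(example, 0.80))  # (3, [0.417, 0.717, 0.817])
```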

Scree Test Criterion
In my analysis of the scree plot, the eigenvalues level off beginning with the third
eigenvalue. The number of components to retain corresponds to the number of eigenvalues
before the line levels off. Therefore, we would retain two components, which corresponds
to the number determined by the latent root criterion. (NOTE: in applying this test, the
text identifies three components using their interpretation of the criteria.)

Stage 5: Interpreting the Factors - 1
All three of the criteria for determining the number of components indicate that two
components should be retained. If this number matches the number of components
derived by SPSS, we can continue with the analysis. If it does not match the
number of components that SPSS derived, we need to re-run the factor analysis,
specifying the number of components that we want SPSS to extract.
Once the extraction of factors has been completed satisfactorily, the resulting factor
matrix, which shows the relationship of the original variables to the factors, is rotated
to make it easier to interpret. The axes are rotated about the origin so that they are
located as close to the clusters of related variables as possible. The orthogonal
VARIMAX rotation is the one found most commonly in the literature. VARIMAX rotation
keeps the axes at right angles to each other so that the factors are not correlated with
each other. Oblique rotation permits the factors to be correlated, and is cited as
being a more realistic method of analysis. In the analyses that we do for this class we
will specify an orthogonal rotation.
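SPSS performs the VARIMAX rotation internally (its output labels it "Varimax with Kaiser Normalization"). As a rough illustration of what the rotation does, here is a minimal pure-Python sketch of Kaiser's pairwise varimax with row normalization. It is not SPSS's code: convergence details differ, and factors may come back in a different order or with flipped signs. The example builds artificial loadings by rotating a clean simple structure by 30 degrees, which varimax should undo.

```python
import math


def varimax(loadings, max_sweeps=50, tol=1e-10):
    """Kaiser's pairwise varimax rotation with Kaiser (row) normalization."""
    p, k = len(loadings), len(loadings[0])
    # Kaiser normalization: divide each row by the square root of its communality.
    h = [math.sqrt(sum(v * v for v in row)) for row in loadings]
    a = [[v / h[i] for v in row] for i, row in enumerate(loadings)]
    for _ in range(max_sweeps):
        largest = 0.0
        # Rotate every pair of factor columns by the angle that maximizes
        # the varimax criterion for that pair.
        for j in range(k - 1):
            for m in range(j + 1, k):
                u = [a[i][j] ** 2 - a[i][m] ** 2 for i in range(p)]
                v = [2 * a[i][j] * a[i][m] for i in range(p)]
                A, B = sum(u), sum(v)
                C = sum(x * x - y * y for x, y in zip(u, v))
                D = 2 * sum(x * y for x, y in zip(u, v))
                phi = math.atan2(D - 2 * A * B / p, C - (A * A - B * B) / p) / 4
                largest = max(largest, abs(phi))
                c, s = math.cos(phi), math.sin(phi)
                for i in range(p):
                    x, y = a[i][j], a[i][m]
                    a[i][j] = c * x + s * y
                    a[i][m] = -s * x + c * y
        if largest < tol:
            break
    # Undo the normalization so each row keeps its original communality.
    return [[v * h[i] for v in row] for i, row in enumerate(a)]


# Artificial example: a simple structure rotated away by 30 degrees.
angle = math.radians(30)
true = [(0.8, 0.0), (0.7, 0.0), (0.0, 0.8), (0.0, 0.7)]
unrotated = [[x * math.cos(angle) - y * math.sin(angle),
              x * math.sin(angle) + y * math.cos(angle)] for x, y in true]
rotated = varimax(unrotated)
for row in rotated:
    print([round(v, 3) for v in row])
```

The rotated rows should be close to the original simple structure, and each row keeps its communality because the rotation is orthogonal.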
The first step in this stage is to determine if any variables should be eliminated from
the factor solution. A variable can be eliminated for two reasons. First, if the
communality of a variable is low, i.e. less than 0.50, the factors contain
less than half of the variance in the original variable, so we might want to exclude
that variable from the factor analysis and use it in its original form in subsequent
analyses. Second, a variable may have loadings below the criterion level on all factors,
i.e. it does not have a strong enough relationship with any factor to be represented by
the factor score, so the variable's information is better represented by its original form.

Stage 5: Interpreting the Factors - 2
Since factor analysis is based on a pattern of relationships among variables,
eliminating one variable will change the pattern for all of the others. If we have
variables to eliminate, we should eliminate them one at a time, starting with the
variable that has the lowest communality or the weakest pattern of factor loadings.
Once we have eliminated variables that do not belong in the factor analysis, we
complete the analysis by naming the factors we obtained. Naming is an important
part of the analysis, because it checks that the factor solution is conceptually valid.

Analysis of the Communalities
Once the extraction of factors has been completed, we examine the table of 'Communalities'
which tells us how much of the variance in each of the original variables is explained by the
extracted factors. For example, in the table shown below, 65.8% of the variance in the
original X1 'Delivery Speed' variable is explained by the two extracted components. Higher
communalities are desirable. If the communality for a variable is less than 50%, it is a
candidate for exclusion from the analysis because the factor solution contains less than half of
the variance in the original variable, and the explanatory power of that variable might be
better represented by the individual variable.
The table of Communalities for this analysis shows communalities for all variables above 0.50,
so we would not exclude any variables on the basis of low communalities. If we did exclude a
variable for a low communality, we should re-run the factor analysis without that variable
before proceeding.
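Communalities are the row sums of squared loadings across the retained components. A minimal Python sketch of the calculation and the 0.50 screen; the loadings in the usage lines are hypothetical, not the HATCO output, and the function names are illustrative:

```python
def communalities(loadings):
    """Sum of squared loadings across the retained components, one value per variable."""
    return [sum(v * v for v in row) for row in loadings]


def low_communality(names, loadings, cutoff=0.50):
    """Names of variables whose communality falls below `cutoff`."""
    return [name for name, h in zip(names, communalities(loadings)) if h < cutoff]


# Hypothetical two-component loadings (illustration only).
names = ["A", "B", "C"]
loadings = [[0.80, 0.10], [0.50, 0.30], [0.15, 0.85]]
print([round(h, 3) for h in communalities(loadings)])  # [0.65, 0.34, 0.745]
print(low_communality(names, loadings))                # ['B']
```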
Analysis of the Factor Loadings - 1
When we are satisfied that the factor solution explains sufficient variance for all of the
variables in the analysis, we examine the 'Rotated Factor Matrix' to see if each variable has
a substantial loading on one, and only one, factor. The size of loading that counts as
substantial is a subject on which opinions diverge widely. We will use a time-honored rule
of thumb that a substantial loading is 0.40 or higher. Whichever method is employed to
define substantial, the process of analyzing factor loadings is the same.
The methodology for analyzing factor loadings is to underline or mark all of the loadings in
the rotated factor matrix that are 0.40 or higher in absolute value. For the rotated factor
matrix for this problem, the substantial loadings are highlighted in green.

Analysis of the Factor Loadings - 2
We examine the pattern of loadings for what is called 'simple structure' which means that
each variable has a substantial loading on one and only one factor.
In this component matrix, each variable does have one substantial loading on a
component. If one or more variables did not have a substantial loading on a factor, we
would re-run the factor analysis excluding those variables one at a time, until we have a
solution in which all of the variables in the analysis load on at least one factor.
In this component matrix, each of the original variables also has a substantial loading on
only one factor. If a variable had a substantial loading on more than one factor, we would
refer to that variable as "complex", meaning that it has a relationship to two or more of the
derived factors. There are a variety of prescriptions for handling complex variables. The
simplest prescription is to ignore the complexity and treat the variable as belonging to the
factor on which it has the highest loading. A second simple solution is to eliminate the
complex variable from the factor analysis. I have seen other instances where authors chose
to include it as a variable in multiple factors, or to arbitrarily assign it to a factor for
conceptual reasons. Other prescriptions are to try different methods of factor extraction
and rotation to see if a more interpretable solution can be found.
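The marking procedure above can be expressed as a small screening function. A minimal Python sketch, assuming the 0.40 cutoff is applied to absolute loadings; the rotated loadings in the usage lines are hypothetical, not the HATCO output:

```python
def classify_loadings(names, loadings, cutoff=0.40):
    """Label each variable 'simple', 'complex', or 'none' by counting the
    factors on which its absolute loading is at least `cutoff`."""
    result = {}
    for name, row in zip(names, loadings):
        hits = [k for k, v in enumerate(row, start=1) if abs(v) >= cutoff]
        if not hits:
            status = "none"
        elif len(hits) == 1:
            status = "simple"
        else:
            status = "complex"
        result[name] = (status, hits)
    return result


# Hypothetical rotated loadings on two factors (illustration only).
names = ["A", "B", "C", "D"]
rotated = [[0.75, 0.10], [-0.62, 0.05], [0.45, 0.52], [0.20, 0.30]]
for name, (status, factors) in classify_loadings(names, rotated).items():
    print(name, status, factors)
```

Variable B shows why absolute values matter: its negative loading is still substantial, which is exactly the situation of X1 and X3 on the first HATCO factor.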

Naming the Factors
Once we have an interpretable pattern of loadings, we name the factors or
components according to their substantive content or core. The factors should have
conceptually distinct names and content. Variables with higher loadings on a factor
should play a more important role in naming the factor.
If the factors are not conceptually distinct and cannot be named satisfactorily, the
factor solution may be a mathematical contrivance that has no useful application.
The naming should take into account the signs of the factor loadings, i.e. negative
signs imply an inverse relationship to the factor. For example, X1 'Delivery Speed',
X2 'Price Level', X3 'Price Flexibility', and X7 'Product Quality' load on the first
factor. Two of these variables, X1 'Delivery Speed' and X3 'Price Flexibility', have a
negative sign, meaning that they vary inversely to the two variables which have a
positive loading: X2 'Price Level' and X7 'Product Quality'. The name for this factor,
which the authors term 'basic value' on page 126 of the text, attempts to take into
account the direction of the relationships among all of these variables.
The two variables loading on the second factor X4 'Manufacturer Image' and
X6 'Salesforce Image' both have positive signs and are named 'HATCO image' by the
authors on page 127 of the text.

Stage 6: Validation of Factor Analysis
In the validation stage of the factor analysis, we are concerned with the issue of
generalizability of the factor model we have derived. We examine two issues: first, is
the factor model stable and generalizable, and second, is the factor solution impacted
by outliers?
The only method for examining the generalizability of the factor model in SPSS is a
split-half validation.
To identify outliers, we will employ a strategy proposed in the SPSS manual.
Split Half Validation
As in all multivariate methods, the findings in factor analysis are impacted by sample
size. The larger the sample size, the greater the opportunity to obtain significant
findings that are present only because of the large sample. The strategy for examining
the stability of the model is to do a split-half validation to see if the factor structure
and the communalities remain the same.

Set the Starting Point for Random Number Generation
First, select the 'Random Number Seed...' command from the 'Transform' menu.
Second, click on the 'Set seed to:' option to access the text box for the seed number.
Third, type '34567' in the 'Set seed to:' text box. (This is the same random number seed specified by the authors on page 705 of the text.)
Fourth, click on the OK button to complete this action.

Compute the Variable to Randomly Split the Sample into Two Halves
First, select the 'Compute...' command from the Transform menu.
Second, create a new variable named 'split' that has the values 1 and 0 to divide the sample into two parts. Type the name 'split' into the 'Target Variable:' text box.
Third, type the formula 'uniform(1) > 0.52' in the 'Numeric Expression:' text box. The uniform function will generate a random number between 0.0 and 1.0 for each case. If the generated random number is greater than 0.52, the numeric expression will result in a 1, since the numeric expression is true. If the generated random number is 0.52 or less, the numeric expression will produce a 0, since its value is false. In many computer programs, true is represented by the number 1 and false is represented by a 0.
Fourth, click on the OK button to compute the split variable.
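The logic of the Compute step can be mimicked in Python as a minimal sketch. Note that Python's `random` module uses a different generator (Mersenne Twister) from SPSS, so seed 34567 will not reproduce the same assignment of cases to halves as SPSS or the text; only the rule (1 if the draw exceeds 0.52, else 0) is the same. The function name is illustrative.

```python
import random


def split_half(n_cases, seed=34567, cutoff=0.52):
    """Assign each case 1 if a uniform draw exceeds `cutoff`, else 0,
    mirroring the SPSS expression uniform(1) > 0.52."""
    rng = random.Random(seed)  # seeded generator so the split is reproducible
    return [1 if rng.random() > cutoff else 0 for _ in range(n_cases)]


split = split_half(100)
print(sum(split), "cases with split=1;", len(split) - sum(split), "cases with split=0")
```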

Compute the Factor Analysis for the First Half of the Sample
First, select the 'Data Reduction | Factor...' command from the Analyze menu.
Second, highlight the 'split' variable and click on the move button to put it into the 'Selection Variable:' text box.

Select the First Half of the Sample for Analysis
First, click on the 'Value...' button, which was activated when the split variable moved to the 'Selection Variable:' text box.
Second, type the value 0 into the 'Value for Selection Variable:' text box to replace the '?' in the 'split=?' entry in the 'Selection Variable:' text box with 'split=0'.
Third, click on the Continue button to complete the value assignment.
Click on the OK button in the Factor Analysis dialog to compute the factor analysis for the first half of the sample.

Compute the Factor Analysis for the Second Half of the Sample
First, select the 'Data Reduction | Factor...' command from the Analyze menu.
Second, click on the 'Selection Variable:' text box to highlight it.
Third, click on the 'Value...' button, which was activated when the 'Selection Variable:' text box was highlighted.
Fourth, type the value 1 into the 'Value for Selection Variable:' text box to replace the '0' in the 'split=0' entry in the 'Selection Variable:' text box with 'split=1'.
Fifth, click on the Continue button to complete the value assignment.
Click on the OK button in the Factor Analysis dialog to compute the factor analysis for the second half of the sample.

Compare the Two Rotated Factor Matrices
The rotated factor matrices for the two halves of the sample produce the same pattern
of loadings of variables on factors that we obtained for the analysis of the complete
sample. This result validates the factor solution obtained. Had we obtained fewer
factors or a different pattern of loadings, we would need to adjust our analysis
accordingly, or include this information in the discussion of the limitations of our study.

Compare the Communalities
While the communalities differ for the two models, in all cases they are above 0.50,
indicating that the factor model is explaining more than half of the variance in all of
the original variables.

2. Identification of Outliers
SPSS proposes a strategy for identifying outliers that is not found in the text (See: SPSS
Base 7.5 Applications Guide, pp. 303-304). SPSS computes the factor scores as
standard scores with a mean of 0 and a standard deviation of 1. We can examine the
factor scores to see if any are above or below the standard score size associated with
extreme cases, i.e. ±2.5 or ±3.0. For this analysis, we will need to compute the
factor scores, which we have not requested to this point.

Removing the Split Variable from the Analysis
First, we re-open the Factor Analysis dialog box by selecting the 'Data Reduction | Factor...' command from the Analyze menu.
Second, we highlight the 'split=1' selection variable and click on the move arrow to remove it, so that the factor scores are computed using the parameters for the full sample.

Requesting the Factor Scores
First, we click on the 'Scores...' button in the Factor Analysis dialog.
Second, we mark the 'Save as variables' checkbox in the 'Factor Analysis: Factor Scores' dialog.
Third, we accept the default 'Regression' method for computing the factor scores.
Fourth, we click on the 'Continue' button to close the 'Factor Analysis: Factor Scores' dialog and the OK button to request the output.

The Factor Scores in the SPSS Data Editor
SPSS adds variables for the factor scores to the data set.

Use the Explore Procedure to Locate Factor Score Outliers
First, select the 'Descriptive Statistics | Explore...' command from the Analyze menu.
Second, move the FAC1_1 'REGR factor score 1 for analysis 1' and FAC2_1 'REGR factor score 2 for analysis 1' variables computed by the Factor Analysis to the 'Dependent List:' list box.
Third, move the ID variable to the 'Label Cases by:' text box so that the case ID will appear in the output listings.
Fourth, mark the 'Statistics' option on the Display panel.
Fifth, click on the 'Statistics...' button to request the listing of outliers.

Specify Outliers as the Desired Statistics
First, we mark the 'Outliers' check box and clear all other check boxes.
Second, we click on the Continue button to complete our selection of statistics.
Third, we click on the OK button to produce the output.

Extreme Values as Outliers
Using a criterion of +/-2.5, we have no outliers on the first factor and two outliers on
the second factor, case ID 5 and case ID 42.
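Outside SPSS, the same screen amounts to flagging any standardized factor score beyond ±2.5 and then filtering those case IDs out, as the Select Cases step below does. A minimal Python sketch; the case IDs and scores in the usage lines are hypothetical, not the HATCO data, and the function names are illustrative:

```python
def flag_outliers(case_ids, scores, cutoff=2.5):
    """Return (case_id, factor_number, score) for each factor score beyond ±cutoff."""
    flagged = []
    for case_id, row in zip(case_ids, scores):
        for factor, z in enumerate(row, start=1):
            if abs(z) > cutoff:
                flagged.append((case_id, factor, z))
    return flagged


def exclude_cases(case_ids, rows, drop_ids):
    """Keep only the cases whose ID is not in drop_ids (like SPSS Select Cases)."""
    return [(cid, row) for cid, row in zip(case_ids, rows) if cid not in drop_ids]


# Hypothetical factor scores for four cases (illustration only).
ids = [1, 2, 3, 4]
scores = [[0.2, -2.7], [3.1, 0.4], [1.0, 2.4], [-0.5, 0.9]]
print(flag_outliers(ids, scores))                               # [(1, 2, -2.7), (2, 1, 3.1)]
print([cid for cid, _ in exclude_cases(ids, scores, {1, 2})])  # [3, 4]
```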

Excluding the Outliers from the Factor Analysis
First, select the 'Select Cases...' command from the Data menu.
Second, mark the 'If condition is satisfied' option in the 'Select' panel.
Third, click on the 'If...' button to specify the inclusion condition.

Specify the Criterion for Selecting Cases
First, we type in the criterion specifying that cases will be included if their ID number is not 5 and their ID number is not 42.
Second, we click on the Continue button to complete the specification.

Re-computing the Factor Model
First, select 'Factor Analysis' using the 'Dialog Recall' tool button.
Second, since we are not changing the specifications for the Factor Analysis, we click on the OK button to request that it be re-computed.

The Correlation Matrix for the Model Excluding Outliers
The correlation matrix for the full sample is shown first, followed by the correlation matrix
for the sample excluding outliers. Some correlations are stronger without the outliers and
others are weaker, but the overall pattern of correlations in the matrix is the same. This
output would not support a conclusion that the outliers are having an impact on the factor
results.

Correlation Matrix, full sample
                            X1      X2      X3      X4      X6      X7
X1 Delivery Speed        1.000   -.349    .509    .050    .077   -.483
X2 Price Level           -.349   1.000   -.487    .272    .186    .470
X3 Price Flexibility      .509   -.487   1.000   -.116   -.034   -.448
X4 Manufacturer Image     .050    .272   -.116   1.000    .788    .200
X6 Salesforce Image       .077    .186   -.034    .788   1.000    .177
X7 Product Quality       -.483    .470   -.448    .200    .177   1.000

Correlation Matrix, excluding outliers (cases 5 and 42)
                            X1      X2      X3      X4      X6      X7
X1 Delivery Speed        1.000   -.319    .487   -.039   -.020   -.450
X2 Price Level           -.319   1.000   -.471    .353    .272    .449
X3 Price Flexibility      .487   -.471   1.000   -.186   -.107   -.426
X4 Manufacturer Image    -.039    .353   -.186   1.000    .761    .295
X6 Salesforce Image      -.020    .272   -.107    .761   1.000    .284
X7 Product Quality       -.450    .449   -.426    .295    .284   1.000

The Communalities for the Model Excluding Outliers
The communalities for the full model are shown on the left, with the communalities
for the model excluding outliers shown on the right. The overall pattern of
communalities is identical for both models.

The Rotated Component Matrix for the Model Excluding Outliers
The Rotated Component Matrix for the full model is shown on the left, with the
Rotated Component Matrix for the model excluding outliers shown on the right. The
overall pattern of loadings is identical for both models, so we would conclude that the
outliers are not impacting our solution. In subsequent analyses using the factors, we
can include all cases.

Stage 7: Additional Uses of the Factor Analysis Results
We have already computed the scores for the two factors, which we can use in
subsequent analyses as a substitute for the six original variables.
Another option for reducing the data set is to select one of the variables on each
factor to use as a surrogate for all the variables that loaded on that factor.
A more common method for incorporating the results of the factor analysis is to create
summated scale variables. In this method, the variables which load on each factor are
simply summed to form the scale score, rather than using the weights or coefficients
for each variable that SPSS uses in calculating factor scores.
Summated scales are easier to compute than weighted factor scores and can easily be
applied to cases not included in the original factor analysis. When summated scales
are used, it is customary to compute Cronbach's alpha.
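The text describes simply summing the variables that load on each factor. One practical wrinkle, which is our assumption rather than something stated above: variables with negative loadings (X1 and X3 on the first factor) would partly cancel the positively loaded variables, so they are commonly reverse-scored before summing. The minimal Python sketch below does that; the scale bounds of 0 and 10 are hypothetical defaults, so substitute the actual response range of your items.

```python
def summated_scale(data, items, reverse=(), scale_min=0, scale_max=10):
    """Sum item scores case by case, reverse-scoring items named in `reverse`.

    data: dict mapping item name -> list of scores (one per case).
    scale_min / scale_max: hypothetical response bounds used for reverse-scoring.
    """
    n = len(data[items[0]])
    totals = [0] * n
    for item in items:
        for i, x in enumerate(data[item]):
            totals[i] += (scale_min + scale_max - x) if item in reverse else x
    return totals


# Hypothetical scores for two cases (illustration only).
data = {"X1": [2, 4], "X2": [5, 6]}
print(summated_scale(data, ["X1", "X2"]))                  # [7, 10]
print(summated_scale(data, ["X1", "X2"], reverse={"X1"}))  # [13, 12]
```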

Summated Scales and Cronbach's Alpha - 1
(From: Larry Hatcher and Edward J. Stepanski. A Step-by-Step Approach to Using the
SAS System for Univariate and Multivariate Statistics.)
Summated or additive scales are formed by summing the scores for a set of variables
that load on a factor. If you incorporate summated or additive scales into your
research, there is an expectation that you will include efforts to assess the reliability
of your measures.
Recall that an underlying construct is a hypothetical variable that you wish to
measure, but which cannot be directly measured. The observed variables, on the other
hand, consist of measurements that are actually obtained. A reliability coefficient is
defined as the percent of variance in an observed variable that is accounted for by the
true scores on the underlying construct. Since it is generally not possible to obtain true
scores on the underlying construct, reliability is usually defined in practice in terms of
the consistency of the scores that are obtained on the observed variables; an
instrument is said to be reliable if it is shown to provide consistent scores upon
repeated administration, upon administration in alternate forms, and so forth.
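In classical test theory notation, the definition of reliability above can be written as follows. Here X is the observed score, T the true score on the underlying construct, and E a random error assumed to be uncorrelated with T. The symbols are ours, not the source's.

```latex
X = T + E, \qquad
\rho_{XX'} = \frac{\sigma^2_T}{\sigma^2_X} = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}
```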
A variety of methods for estimating scale reliability are actually used in practice. Test-
retest reliability is assessed by administering the same instrument to the same sample
of subjects at two points in time and computing the correlation between the two sets
of scores. However, this can be a time-consuming and expensive procedure, in which you
are collecting additional data that cannot be used in other analyses. Because of the
cost and time involved in test-retest procedures, indices of reliability that require only
one administration are often used. The most popular of these indices are the internal
consistency indices of reliability. Briefly, internal consistency is the extent to which
the individual items that constitute a test correlate with one another or with the test
total. In the social sciences, one of the most widely used indices of internal
consistency is coefficient alpha or Cronbach's alpha.
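Test-retest reliability, as described above, is the Pearson correlation between the two administrations. The minimal sketch below computes it. The scores for five subjects are invented for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Invented scores for the same five subjects at two points in time.
TIME1 = [3.0, 4.5, 2.0, 5.0, 3.5]
TIME2 = [3.2, 4.4, 2.3, 4.8, 3.6]

print(round(pearson_r(TIME1, TIME2), 3))
```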
Summated Scales and Cronbach's Alpha - 2
While coefficient alpha usually ranges from 0 to 1.0, the general rule of thumb is that it
must be above 0.70 in order to be judged adequate. Coefficient alpha will be high to
the extent that many items are included in the scale, and the items that constitute the
scale are highly correlated with one another.
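The standard formula for coefficient alpha makes both drivers visible. Take a scale of k items with item variances σ²_i and total-score variance σ²_X. The formula is shown on the left, and its standardized form, written in terms of the average inter-item correlation r̄, on the right.

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^2_i}{\sigma^2_X}\right),
\qquad
\alpha_{\text{std}} = \frac{k\,\bar{r}}{1 + (k-1)\,\bar{r}}
```

For example, with r̄ = 0.4 the standardized formula gives 0.8/1.4 ≈ 0.57 for two items but 1.6/2.2 ≈ 0.73 for four items.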
Coefficient alpha requires that we specify the variables that we believe form the scale;
the measure then tells us whether these items are internally consistent with one another
and with the summated scale that would be formed from them.
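The sketch below implements the raw (unstandardized) alpha formula using sample variances from Python's `statistics` module. It illustrates the computation only and is not SPSS's implementation. The item scores are invented.

```python
from statistics import variance

def cronbach_alpha(items):
    """Coefficient alpha for a list of item columns, each a list of scores for the same cases."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(case) for case in zip(*items)]  # summated scale score for each case
    item_variance_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

# Invented scores: three items rated by six respondents.
ITEMS = [
    [4, 5, 3, 6, 2, 5],
    [5, 5, 3, 7, 2, 4],
    [4, 6, 2, 6, 3, 5],
]
print(round(cronbach_alpha(ITEMS), 3))
```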
In many of the articles which we have used this semester, variables have been
combined to form scales which were in turn used as independent variables. The alpha
statistic frequently cited in discussing the formation of the scales is Cronbach's, or
coefficient, alpha.

Computing the Reliability Coefficient for the First Factor
First, select 'Scale | Reliability Analysis...' from the Analyze menu.
Second, move the items loading on the first scale, Delivery Speed, Price Level, Price
Flexibility, and Product Quality, to the 'Items:' list box.
Third, select 'Alpha' from the drop-down menu of 'Model:' choices.
Fourth, click on the 'Statistics...' button to specify the statistics we want included in
the output.

Specifying the Statistics to Include in the Reliability Analysis
First, we mark the check boxes for 'Scale' and 'Scale if item deleted' and clear all
other check boxes. If the obtained value of coefficient alpha is below the acceptable
criterion, these statistics will suggest a remedy for correcting the problem.
Second, click on the Continue button to close the 'Reliability Analysis: Statistics'
dialog box.
Third, click on the OK button to close the 'Reliability Analysis' dialog box.

The Reliability Analysis for the First Summated Scale
The alpha coefficient for the first summated scale is 0.8984, well above the 0.70
criterion. Had we not exceeded the criterion, we would have looked at the column
"Alpha if Item Deleted" to see whether we could omit one of the variables from the list
and form a reliable scale from the 3 remaining variables.
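The 'Alpha if Item Deleted' column recomputes alpha with each item left out in turn. The sketch below follows that idea. It also adds a hypothetical helper, `suggest_drop`, that names an item whose removal would bring alpha up to the 0.70 criterion. The data are invented.

```python
from statistics import variance

def cronbach_alpha(items):
    """Coefficient alpha for a list of item columns (same cases in every column)."""
    k = len(items)
    totals = [sum(case) for case in zip(*items)]
    return k / (k - 1) * (1 - sum(variance(c) for c in items) / variance(totals))

def alpha_if_item_deleted(items, names):
    """Alpha recomputed with each item removed in turn (requires at least 3 items)."""
    return {name: cronbach_alpha(items[:i] + items[i + 1:])
            for i, name in enumerate(names)}

def suggest_drop(items, names, threshold=0.70):
    """Hypothetical helper: an item whose deletion reaches the threshold, else None."""
    if cronbach_alpha(items) >= threshold:
        return None  # the full scale is already adequate
    deleted = alpha_if_item_deleted(items, names)
    best = max(deleted, key=deleted.get)
    return best if deleted[best] >= threshold else None

# Invented data: two consistent items plus one that does not fit.
ITEMS = [[1, 2, 3, 4], [1, 2, 3, 4], [4, 1, 3, 2]]
NAMES = ["item_a", "item_b", "item_c"]
print(suggest_drop(ITEMS, NAMES))  # item_c
```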

Computing the Reliability Coefficient for the Second Factor
First, select 'Scale | Reliability Analysis...' from the Analyze menu.
Second, remove the variables for the first scale from the 'Items:' list box and move the
items for the second scale, Manufacturer Image and Salesforce Image, to the 'Items:'
list box.
Third, all other specifications remain the same, so we click on the OK button to
produce the output.

The Reliability Analysis for the Second Summated Scale
The alpha coefficient for the second summated scale is 0.8463, well above the 0.70
criterion.
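For a two-item scale such as this one, the alpha formula reduces to a simple expression. It uses the two item variances s²_1 and s²_2 and their covariance s_12, because the variance of the summed score is s²_1 + s²_2 + 2s_12. The notation is ours.

```latex
\alpha = 2\left(1 - \frac{s_1^2 + s_2^2}{s_1^2 + s_2^2 + 2s_{12}}\right)
       = \frac{4\,s_{12}}{s_1^2 + s_2^2 + 2s_{12}}
```

A two-item scale is therefore reliable only when the two items covary strongly relative to their own variances.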

Contact us
Visit us on: http://www.analytixlabs.in/
For course registration, please visit: http://www.analytixlabs.co.in/course-registration/
For more information, please contact us: http://www.analytixlabs.co.in/contact-us/

Or email: info@analytixlabs.co.in
Call us we would love to speak with you: (+91) 88021-73069
Join us on:
Twitter - http://twitter.com/#!/AnalytixLabs
Facebook - http://www.facebook.com/analytixlabs
LinkedIn - http://www.linkedin.com/in/analytixlabs
Blog - http://www.analytixlabs.co.in/category/blog/