
Factor Analysis

1. The purpose of factor analysis is to reduce the initial number of variables into a smaller and therefore
more manageable (easier to analyze and interpret) set of underlying dimensions, called factors.
2. There should be no dependent or independent variables in factor analysis. Mixing dependent and
independent variables in a single factor analysis and then examining the dependence relationships is
inappropriate.
3. R Factor Analysis vs. Q Factor Analysis
a) R Factor Analysis (the most common type) analyzes the variables (usually the columns of the input
matrix).
b) Q Factor Analysis analyzes the respondents (not very popular; for the analysis of subgroups of
respondents, Cluster Analysis is usually preferred).
4. Assumptions of factor analysis:
a) The variables should be metric (interval or ratio scale).
b) The sample size n should be at least 5 (ideally 10 or even 20) times the number of variables.
c) The sample should be homogeneous with respect to the underlying factor structure. For example,
if you know that some variables differ because of gender, it is wrong to apply factor analysis to a
sample of males and females together. In such cases, you should perform two factor analyses, one
on a sample of males, and the other on a sample of females.
d) The assumptions of normality, homoscedasticity, and linearity, typical for other techniques, are not
very important in factor analysis. They are, as always, welcome, but they are not crucial
unless one wants to apply statistical tests (rarely used) of the significance of the factors.
Some degree of multicollinearity is even desirable, in that the correlation matrix should reveal a
substantial number of correlations greater than 0.30 (however, most of the partial correlations
should ideally be less than 0.30; they are displayed by SPSS in the anti-image correlation
matrix).
e) Always remember to check for and remove the outliers.
f) The correlation matrix should also be examined with the Bartlett test of sphericity, which tests for
the presence of correlations among the variables.
g) Another measure to quantify the degree of intercorrelations among the variables, and hence the
appropriateness of factor analysis, is KMO MSA (Kaiser-Meyer-Olkin Measure of Sampling
Adequacy; 0 ≤ KMO ≤ 1):
• KMO > 0.8 (meritorious*)
• 0.7 < KMO < 0.8 (middling)
• 0.6 < KMO < 0.7 (mediocre)
• 0.5 < KMO < 0.6 (miserable)
• KMO less than 0.5 (unacceptable)
* Original terms used by Kaiser

Example: open File5.sav

1. Analyze → Data Reduction → Factor → Descriptives (check all boxes) → Extraction (Method: Principal
Components; check Scree plot box; Maximum Iterations for Convergence = 200) → OK

a) Interpretation of the results

• Check Correlation Matrix: is there a sufficient number of correlations greater than 0.30 or less
than -0.30? Are they significant? → YES. There are 6 such correlations (plus a 7th very close to
0.30) among the 15 correlation coefficients.
• KMO = 0.660 (mediocre) → OK (in general, it should be greater than 0.50)
• Bartlett’s Test of Sphericity: H0: The variables are uncorrelated in the population. Because Sig.
= 0.000 → reject H0, which is the desired result.
• Based on KMO and Bartlett’s Test of Sphericity → FA is appropriate for analyzing the
correlation matrix.
• Anti-image Correlation Matrix

  • The elements on the main diagonal are the individual variables’ MSAs: they should be greater
    than 0.5. A variable with MSA < 0.5 should be removed from further analysis; however, do
    not eliminate all such variables at once. First remove the variable with the lowest MSA,
    repeat the FA, then remove the new lowest MSA, and so on until all the MSAs are greater than 0.50.
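
The three checks above (counting correlations beyond ±0.30, Bartlett's test, and KMO with the per-variable MSAs) can be sketched outside SPSS roughly as follows. This is only an illustration under stated assumptions: the data are synthetic (File5.sav is not reproduced here), the function names are hypothetical, and the formulas are the usual textbook ones rather than a copy of the SPSS implementation. The Bartlett statistic would still have to be compared with a chi-square distribution to obtain the Sig. value.

```python
import numpy as np

def bartlett_sphericity(R, n):
    """Bartlett's chi-square statistic and degrees of freedom.

    R is a p x p correlation matrix estimated from n observations.
    """
    p = R.shape[0]
    chi_square = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    dof = p * (p - 1) // 2
    return chi_square, dof

def kmo(R):
    """Overall KMO and the per-variable MSA values."""
    inv = np.linalg.inv(R)
    scale = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(scale, scale)      # partial correlations
    off_diag = ~np.eye(R.shape[0], dtype=bool)   # ignore the main diagonal
    r2 = np.where(off_diag, R ** 2, 0.0)
    q2 = np.where(off_diag, partial ** 2, 0.0)
    msa = r2.sum(axis=0) / (r2.sum(axis=0) + q2.sum(axis=0))
    overall = r2.sum() / (r2.sum() + q2.sum())
    return overall, msa

# Synthetic data (an assumption, NOT File5.sav): 200 respondents,
# 6 variables driven by 2 latent factors plus noise.
rng = np.random.default_rng(0)
n_obs = 200
pattern = np.array([[0.9, 0.0], [0.0, 0.8], [0.85, 0.0],
                    [0.0, 0.8], [-0.85, 0.0], [0.0, 0.8]])
X = rng.normal(size=(n_obs, 2)) @ pattern.T + 0.5 * rng.normal(size=(n_obs, 6))
R = np.corrcoef(X, rowvar=False)

upper = np.triu_indices_from(R, k=1)             # the 15 distinct correlations
n_large = int(np.sum(np.abs(R[upper]) > 0.30))
chi_square, dof = bartlett_sphericity(R, n_obs)
overall_kmo, msa = kmo(R)

print("correlations beyond +/-0.30:", n_large, "of", len(upper[0]))
print("Bartlett chi-square:", round(float(chi_square), 2), "df:", dof)
print("overall KMO:", round(float(overall_kmo), 3))
print("per-variable MSA:", np.round(msa, 3))
```

If any per-variable MSA fell below 0.5, the iterative removal described above would drop only the lowest one and recompute everything on the reduced matrix.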
• Communalities:
  • There are several methods of analyzing the data in FA; the most popular ones are:
    (i) Principal Components Analysis (PCA), the method used in the current example
      • used to extract the minimum number of factors accounting for maximum variance in
        the data, for use in subsequent multivariate analysis (e.g. cluster analysis)
      • PCA analyzes the original correlation matrix, where the main diagonal has all 1’s, i.e.
        it is based on the total variance

    (ii) Common Factor Analysis (CFA), also known as Principal Axis Factoring (SPSS)
      • used to determine all the underlying dimensions
      • CFA analyzes the correlation matrix, where the 1’s on the main diagonal have been
        replaced with the communalities, i.e. it is based on the common variance
      • Communality = sum of squares of the variable’s loadings (elements of the
        Component Matrix described below) across all factors:
        • Ex. Variable 1: Prevents Cavities: Communality = 0.926 = (0.928)^2 + ...
      • CFA has several problems, such as no single solution. Therefore, PCA is more
        widely used than CFA.
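
The communality definition above (sum of squared loadings of one variable across the retained factors) can be illustrated with a minimal sketch. It assumes synthetic data and a hypothetical helper `pca_loadings`; it extracts principal components from the correlation matrix with NumPy and is not the SPSS routine.

```python
import numpy as np

def pca_loadings(R, n_factors):
    """Unrotated principal-component loadings from a correlation matrix R."""
    eigvals, eigvecs = np.linalg.eigh(R)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Loading = eigenvector scaled by the square root of its eigenvalue.
    return eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])

# Synthetic data (an assumption, NOT File5.sav): 6 variables, 2 latent factors.
rng = np.random.default_rng(0)
pattern = np.array([[0.9, 0.0], [0.0, 0.8], [0.85, 0.0],
                    [0.0, 0.8], [-0.85, 0.0], [0.0, 0.8]])
X = rng.normal(size=(200, 2)) @ pattern.T + 0.5 * rng.normal(size=(200, 6))
R = np.corrcoef(X, rowvar=False)

loadings = pca_loadings(R, n_factors=2)
# Communality of each variable: row-wise sum of squared loadings.
communalities = (loadings ** 2).sum(axis=1)
# With all 6 components retained, every communality equals 1 (total variance).
full_communalities = (pca_loadings(R, n_factors=6) ** 2).sum(axis=1)

print("communalities (2 factors):", np.round(communalities, 3))
print("communalities (6 factors):", np.round(full_communalities, 3))
```

The second printout shows why PCA is said to be based on the total variance: keeping every component reproduces the 1's on the main diagonal.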

• Total Variance Explained and Scree Plot

  • Used to choose the number of factors:

    • All eigenvalues greater than 1 → two factors
    • Scree plot (all eigenvalues before the start of the “scree”, i.e. the gradual trailing off of the
      eigenvalues) → three factors
    • Cumulative percentage of variance > 60% → two factors
    • Note that Eigenvalue 1 = 2.731; divided by 6 (the number of variables), this gives 45.52% (the %
      of the variance accounted for by the first factor), etc.
    • Based on the above: choose two factors.
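
The eigenvalue-based rules above can be sketched as follows. The only figure taken from the example is the first eigenvalue, 2.731; everything else uses synthetic data and hypothetical variable names, so the counts printed here are not the SPSS results. The scree plot itself is a visual judgment and is not automated in this sketch.

```python
import numpy as np

# Synthetic data (an assumption, NOT File5.sav): 6 variables, 2 latent factors.
rng = np.random.default_rng(0)
pattern = np.array([[0.9, 0.0], [0.0, 0.8], [0.85, 0.0],
                    [0.0, 0.8], [-0.85, 0.0], [0.0, 0.8]])
X = rng.normal(size=(200, 2)) @ pattern.T + 0.5 * rng.normal(size=(200, 6))
R = np.corrcoef(X, rowvar=False)
p = R.shape[0]

# Eigenvalues of the correlation matrix, largest first.
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]
pct_variance = 100 * eigvals / p           # each eigenvalue divided by p
cumulative = np.cumsum(pct_variance)

n_kaiser = int((eigvals > 1).sum())            # rule: eigenvalues greater than 1
n_cumulative = int(np.argmax(cumulative > 60) + 1)  # rule: cumulative % > 60

# Arithmetic from the example: first eigenvalue 2.731 over 6 variables.
first_pct = round(100 * 2.731 / 6, 2)

print("eigenvalues:", np.round(eigvals, 3))
print("% of variance:", np.round(pct_variance, 2))
print("factors by eigenvalue > 1:", n_kaiser)
print("factors by cumulative % > 60:", n_cumulative)
print("2.731 / 6 as a percentage:", first_pct)
```

Because the eigenvalues of a correlation matrix sum to the number of variables, dividing each one by 6 gives its share of the total variance directly.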

• Component Matrix (or Factor Matrix, Factor Pattern Matrix)

  • Shows the correlations (loadings) between the underlying dimensions (factors) and the
    variables. For example, the variable “Prevents Cavities” is highly correlated (has a loading of
    0.928) with the first factor (component). Unfortunately, the first factor (component) is also
    highly correlated with many other variables (five of them altogether) and is therefore
    difficult to interpret. The second factor is also highly correlated with four variables. To
    overcome this interpretation difficulty, one may choose to rotate the solution, using either
    • orthogonal rotation (e.g. the so-called varimax rotation), where the principal axes remain
      perpendicular, or
    • oblique rotation, where the principal axes are not perpendicular
  • Note: The sum of the squared loadings on Factor 1 (over all variables) = the first eigenvalue, etc.

Let’s repeat the analysis with varimax rotation: in Rotation, choose Varimax.

• Rotated Component Matrix

  • The first factor is highly correlated with the following variables:
    • Prevents Cavities (0.962)
    • Strengthen Gums (0.934)
    • Tooth Decay Unimportant (-0.933)
  • The second factor is highly correlated with:
    • Shiny Teeth
    • Freshens Breath
    • Attractive Teeth

Assign labels to these factors. For example, the first factor could be named the “Health Benefit” factor,
whereas the second could be named the “Social Image” factor.

The results of the varimax rotation look good, so there is no need for oblique rotation, which might help if,
for example, one variable loaded highly on both factors.
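
To make the idea of an orthogonal rotation more concrete, here is a minimal sketch of a varimax rotation applied to unrotated principal-component loadings. It is a commonly used SVD-based iteration for the raw varimax criterion, written under assumptions: the data are synthetic, the function names are hypothetical, and no Kaiser normalization is applied, so the numbers would not match SPSS output exactly.

```python
import numpy as np

def varimax(loadings, max_iter=200, tol=1e-8):
    """Raw varimax rotation; returns rotated loadings and the rotation matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        col_ss = (rotated ** 2).sum(axis=0)          # column sums of squares
        target = rotated ** 3 - rotated @ np.diag(col_ss) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt                            # always orthogonal
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):    # no further improvement
            break
        criterion = new_criterion
    return loadings @ rotation, rotation

# Synthetic data (an assumption, NOT File5.sav): 6 variables, 2 latent factors.
rng = np.random.default_rng(0)
pattern = np.array([[0.9, 0.0], [0.0, 0.8], [0.85, 0.0],
                    [0.0, 0.8], [-0.85, 0.0], [0.0, 0.8]])
X = rng.normal(size=(200, 2)) @ pattern.T + 0.5 * rng.normal(size=(200, 6))
R = np.corrcoef(X, rowvar=False)

# Unrotated loadings of the first two principal components.
eigvals, eigvecs = np.linalg.eigh(R)
top = np.argsort(eigvals)[::-1][:2]
unrotated = eigvecs[:, top] * np.sqrt(eigvals[top])

rotated, rotation = varimax(unrotated)
print("unrotated loadings:\n", np.round(unrotated, 3))
print("rotated loadings:\n", np.round(rotated, 3))
```

Because the rotation matrix is orthogonal, each variable's communality is the same before and after rotation; only the way the explained variance is split between the two factors changes.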

• Component Plot in Rotated Space confirms the above interpretation

• Calculate the Factor Scores: the original variables (after standardization) are multiplied by the
factor score coefficients, and the resulting scores may then be used in other multivariate analyses
(e.g. cluster analysis) instead of the original observations.
  • The factor score coefficients are presented in the Component Score Coefficient Matrix
  • Repeat FA, this time clicking on Factor Scores (Save as variables)
  • The two calculated scores for each respondent are presented as additional columns
    in the input matrix. Now, the respondents may be further analyzed based on only two
    factors rather than the six original variables (this is one of the purposes of data
    reduction)
  • Sometimes, instead of computing factor scores, one may want to use some of the
    original variables as surrogates: for example, the variable with the highest loading on
    each factor. So, instead of Factor 1, choose V1; instead of Factor 2, choose V6.
  • Finally, one may also use so-called Summated Scales: taking an average of all the
    variables with high loadings on each factor (remember, however, to reverse the
    scores of the variables with negative loadings)
    • Example: Factor 1: V1 (positive loading, no change is necessary), V3 (no
      change is necessary), V5 (negative loading, change is necessary: e.g. assuming a 1-to-7
      response scale, reverse the value of V5 for Respondent 1 from 2 to 8 - 2 = 6, etc.)
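
Both ways of condensing the variables can be sketched briefly. The first part computes component scores as standardized data times a score coefficient matrix, taken here as the inverse correlation matrix times the loadings (the regression method, which is an assumption about what SPSS does). The second part builds a summated scale with reverse scoring, assuming a 1-to-7 response scale. All data, ratings, and names below are hypothetical.

```python
import numpy as np

def reverse_score(x, low=1, high=7):
    """Reverse a rating on a low..high scale (1-to-7 assumed by default)."""
    return (low + high) - x

# --- Part 1: factor scores on synthetic data (NOT File5.sav) ---
rng = np.random.default_rng(0)
pattern = np.array([[0.9, 0.0], [0.0, 0.8], [0.85, 0.0],
                    [0.0, 0.8], [-0.85, 0.0], [0.0, 0.8]])
X = rng.normal(size=(200, 2)) @ pattern.T + 0.5 * rng.normal(size=(200, 6))
R = np.corrcoef(X, rowvar=False)

eigvals, eigvecs = np.linalg.eigh(R)
top = np.argsort(eigvals)[::-1][:2]
loadings = eigvecs[:, top] * np.sqrt(eigvals[top])

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardized variables
coefficients = np.linalg.solve(R, loadings)        # score coefficient matrix
scores = Z @ coefficients                          # two new columns per respondent

# --- Part 2: summated scale for Factor 1 on hypothetical ratings ---
# Columns are V1, V3, V5; V5 loads negatively, so it is reversed first.
ratings = np.array([[7, 6, 2],
                    [1, 3, 6],
                    [6, 4, 2]])
v1, v3, v5 = ratings.T
summated = np.mean([v1, v3, reverse_score(v5)], axis=0)

print("score covariance:\n", np.round(np.cov(scores, rowvar=False), 3))
print("reversed V5:", reverse_score(v5))
print("summated scale:", np.round(summated, 3))
```

For unrotated principal components the resulting scores are uncorrelated with unit variance, which is convenient when they replace the original six variables in a later analysis such as clustering.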

• Reproduced Correlations are used to test the model fit

  • The upper triangle of the Reproduced Correlations Matrix contains the residuals, which
    should be less than 0.05 in absolute value. There are only 5 out of 15 residuals greater than
    0.05, which indicates an acceptable model fit.
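
The residual check can be sketched as follows: the reproduced correlations are the loadings matrix times its transpose, and the residuals are the observed correlations minus the reproduced ones. As elsewhere, this is an illustration on synthetic data with hypothetical names, so the count of large residuals printed here is not the "5 out of 15" of the example.

```python
import numpy as np

# Synthetic data (an assumption, NOT File5.sav): 6 variables, 2 latent factors.
rng = np.random.default_rng(0)
pattern = np.array([[0.9, 0.0], [0.0, 0.8], [0.85, 0.0],
                    [0.0, 0.8], [-0.85, 0.0], [0.0, 0.8]])
X = rng.normal(size=(200, 2)) @ pattern.T + 0.5 * rng.normal(size=(200, 6))
R = np.corrcoef(X, rowvar=False)

# Two-factor principal-component loadings.
eigvals, eigvecs = np.linalg.eigh(R)
top = np.argsort(eigvals)[::-1][:2]
loadings = eigvecs[:, top] * np.sqrt(eigvals[top])

reproduced = loadings @ loadings.T        # diagonal holds the communalities
residuals = R - reproduced                # observed minus reproduced

upper = np.triu_indices_from(R, k=1)      # the 15 distinct off-diagonal pairs
n_pairs = len(upper[0])
n_large = int(np.sum(np.abs(residuals[upper]) > 0.05))

print("residuals with absolute value > 0.05:", n_large, "of", n_pairs)
print("share of large residuals:", round(n_large / n_pairs, 3))
```

An orthogonal rotation leaves the reproduced correlations unchanged, so the same residuals are obtained from rotated or unrotated loadings.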
