
Factor analysis

A class of procedures primarily used for data reduction and summarization.

In marketing research, there may be a large number of variables, most of which are
correlated, and which must be reduced to a manageable level.

Factor analysis is an interdependence technique, in which the whole set of interdependent
relationships is examined, i.e.,

 How do the items/questions correlate with (reflect) one another, and how can they be
organised into a reduced number of variables?

Factor analysis attempts to achieve parsimony by explaining the maximum amount of
common variance¹ in a correlation matrix using the smallest number of explanatory
constructs (variables). These ‘explanatory constructs’ are known as factors.

All these uses are exploratory in nature and, therefore, factor analysis is also called
exploratory factor analysis (EFA).
 
Application
 To identify underlying dimensions, or factors, that explain the correlations among a
set of variables.

 To identify a new, smaller set of uncorrelated variables to replace the original set of
correlated variables in subsequent multivariate analysis (regression or discriminant
analysis).

Example: It can be used in market segmentation for identifying the underlying variables on
which to group the customers.

New car buyers might be grouped based on the relative emphasis they place on different
benefits, resulting in five segments seeking:
 Economy
 Convenience
 Performance
 Comfort
 Luxury

Mathematically, each factor may be expressed as a linear combination of the observed
variables:

Fi = Wi1 X1 + Wi2 X2 + Wi3 X3 + … + Wik Xk (Equation 1)

Fi - estimate of the ith factor
Wij - weight or factor score coefficient of the jth variable
Xj - jth standardized variable
k - number of variables
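As a minimal sketch, Equation 1 is simply a weighted sum. Here it is computed in Python for one respondent, with made-up weights and standardized scores for k = 3 variables (all values are illustrative, not from the SPSS example):

```python
import numpy as np

# Hypothetical factor score coefficients Wi1..Wi3 and one respondent's
# standardized scores X1..X3 (illustrative values only).
W_i = np.array([0.25, 0.40, 0.10])   # weights Wi1, Wi2, Wi3
X = np.array([1.2, -0.5, 0.8])       # standardized variable values

F_i = np.dot(W_i, X)                 # Fi = Wi1*X1 + Wi2*X2 + Wi3*X3
print(round(F_i, 3))
```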
STEPS TO FOLLOW IN FACTOR ANALYSIS
1
It is the amount of variance that is shared among a set of items. Items that are highly correlated will
share a lot of variance.
1
1. Formulate the Problem: The objectives of factor analysis should be identified.

The variables to be included in the factor analysis should be specified based on


- Past research
- Theory
- Judgment of the researcher.

Variables measured on an interval or ratio scale are appropriate for factor analysis.

As a rule of thumb for survey data, the sample size should be four to five times the number
of items; with 30 questions/items this means roughly 120–150 respondents.

2. Construct the Correlation Matrix: For the factor analysis to be appropriate, the variables
must be correlated.

 If the correlations between all the variables are small, factor analysis may not be
appropriate.

 We would also expect that variables that are highly correlated with each other would
also highly correlate with the same factor or factors.
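Although the notes use SPSS, the correlation matrix of Step 2 can be sketched in Python with NumPy; the small data set below is made up for illustration:

```python
import numpy as np

# Illustrative data: 5 respondents x 3 items (rows = cases, columns = items)
data = np.array([
    [5, 4, 2],
    [3, 3, 4],
    [4, 4, 3],
    [2, 1, 5],
    [5, 5, 1],
], dtype=float)

# np.corrcoef expects variables in rows, so transpose the
# cases-by-variables matrix before computing the correlation matrix.
R = np.corrcoef(data.T)
print(R.round(3))
```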

3. Bartlett’s Test of Sphericity: To test the null hypothesis that “the variables are
uncorrelated with each other”.

 In other words, each variable correlates perfectly with itself but has no correlation
with the other variables.

When Bartlett’s test significance value is less than 0.05, we reject the null hypothesis.

- It means the variables are sufficiently inter-correlated and there is scope for factor
analysis.
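Bartlett's statistic can be computed directly from the correlation matrix using the standard formula χ² = −(n − 1 − (2p + 5)/6)·ln|R| with p(p − 1)/2 degrees of freedom. A sketch with an illustrative 3-item matrix (not the SPSS example data):

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(R, n):
    """Bartlett's test that the p x p correlation matrix R is an identity
    matrix, given sample size n. Returns (chi-square, df, p-value)."""
    p = R.shape[0]
    statistic = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    p_value = chi2.sf(statistic, df)
    return statistic, df, p_value

# Illustrative correlation matrix and n = 100 respondents
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
stat, df, p_value = bartlett_sphericity(R, n=100)
print(stat, df, p_value)  # p < 0.05 -> reject "variables are uncorrelated"
```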

4. Kaiser-Meyer-Olkin (KMO): A measure of sampling adequacy; an index used to
examine the appropriateness of factor analysis.

- Small values of KMO statistic indicate that the correlations between pairs of variables
cannot be explained by other variables and that factor analysis may not be
appropriate.

- Generally, a value greater than 0.5 is desirable. High values (between 0.5 and 1.0)
indicate factor analysis is appropriate.

- Values below 0.5 imply that factor analysis may not be appropriate.
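The KMO index compares squared correlations with squared partial correlations (the latter obtained from the inverse of the correlation matrix). A sketch, again on an illustrative matrix rather than the SPSS example:

```python
import numpy as np

def kmo(R):
    """Kaiser-Meyer-Olkin measure of sampling adequacy for a
    correlation matrix R, based on partial correlations."""
    inv = np.linalg.inv(R)
    d = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / d                 # partial correlation matrix
    np.fill_diagonal(partial, 0.0)
    r2 = R.copy()
    np.fill_diagonal(r2, 0.0)          # keep only off-diagonal correlations
    return (r2 ** 2).sum() / ((r2 ** 2).sum() + (partial ** 2).sum())

R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
print(round(kmo(R), 3))  # values above 0.5 support factor analysis
```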

5. Determine the Method of Factor Analysis: Principal component analysis (PCA) and
common factor analysis (to be covered in AMR).

- PCA is recommended when the primary concern is to determine the minimum number
of factors that will account for the maximum variance in the data for use in
subsequent multivariate analysis.

6. Determine the Number of Factors


It is possible to compute as many principal components as there are variables, but in doing so,
no parsimony is gained.

There are several procedures for determining how many factors to extract; we consider
three common ones:

 A Priori Determination: The researcher can specify the number of factors expected
based on prior knowledge and stop extraction once that number is reached.

 Determination Based on Eigenvalues: Only factors with eigenvalues greater than
“1” are retained. An eigenvalue represents the amount of variance associated with the
factor.

 Determination Based on Scree Plot: A plot of the eigenvalues against the
number of factors in order of extraction.

o Typically, the plot has a distinct break between the steep slope of factors, with
large eigenvalues and a gradual ‘trailing off’ associated with the rest of the
factors.

o This gradual trailing off is referred to as the ‘scree’.

o The point at which the ‘scree’ begins denotes the true number of factors.

o The number of factors determined by a ‘scree plot’ will be one or a few more than
that determined by the eigenvalue criterion.
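The eigenvalue criterion from the list above can be sketched as follows: compute the eigenvalues of the correlation matrix and retain components whose eigenvalue exceeds 1. The 4-item correlation matrix is invented for illustration (two correlated pairs of items):

```python
import numpy as np

# Illustrative correlation matrix for 4 items: two correlated pairs
R = np.array([[1.0, 0.7, 0.1, 0.1],
              [0.7, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.6],
              [0.1, 0.1, 0.6, 1.0]])

# Eigenvalues sorted in order of extraction (largest first); each
# eigenvalue is the variance the corresponding component accounts for.
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]
n_factors = int((eigenvalues > 1).sum())  # Kaiser criterion: eigenvalues > 1
print(eigenvalues.round(3), n_factors)
```

Plotting `eigenvalues` against component number 1..4 would give the scree plot discussed above.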

7. Selection of Rotation: Select varimax rotation, an orthogonal method of factor
rotation that minimises the number of variables with high loadings on a factor, thereby
enhancing the interpretability of the factors.

Factor loadings
 These are simple correlations between the variables and the factors.
 See the “Component Matrix” table in the output for the factor loadings.
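Varimax rotation itself can be sketched with the widely used SVD-based algorithm; the unrotated loadings matrix below is hypothetical, not the SPSS example:

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Varimax rotation of a loadings matrix (variables x factors):
    an orthogonal rotation maximizing the variance of squared loadings."""
    L = loadings.copy()
    p, k = L.shape
    R = np.eye(k)
    var = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        # Standard SVD-based update of the rotation matrix
        u, s, vt = np.linalg.svd(
            L.T @ (Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p))
        R = u @ vt
        new_var = s.sum()
        if new_var < var * (1 + tol):
            break
        var = new_var
    return L @ R

# Hypothetical unrotated loadings for 4 variables on 2 factors
A = np.array([[0.7, 0.3], [0.8, 0.2], [0.3, 0.7], [0.2, 0.8]])
rotated = varimax(A)
print(rotated.round(3))
```

Because varimax is orthogonal, each variable's sum of squared loadings (its communality) is unchanged by the rotation.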

8. Save Variables: Save the computed factor scores as new variables in the data set.

STATISTICS ASSOCIATED WITH FACTOR ANALYSIS

Communalities: The amount of variance a variable shares with all other variables included
in the analysis is referred to as communality.

- When the communality value is small, the variable can be treated as a largely unique
variable.

Factor Score Coefficient Matrix: This matrix contains the weights, or factor score
coefficients, used to combine the standardized variables to obtain factor scores.

 From the factor score coefficient matrix, we write the equation for each factor as
per Equation 1 above.

Principal component analysis (PCA) assumes that there is no unique variance, i.e., that
total variance equals common variance.

 Common variance is the amount of variance that is shared among a set of items.
Items that are highly correlated will share a lot of variance.

 Unique variance is any portion of variance that’s not common.
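Given a loadings matrix, communalities (common variance) and unique variances follow directly; the loadings below are hypothetical:

```python
import numpy as np

# Hypothetical rotated loadings for 3 variables on 2 factors
loadings = np.array([[0.70, 0.10],
                     [0.65, 0.20],
                     [0.15, 0.90]])

# Communality = sum of squared loadings across factors (shared variance);
# uniqueness = 1 - communality (variance not explained by the factors).
communalities = (loadings ** 2).sum(axis=1)
uniqueness = 1 - communalities
print(communalities.round(3), uniqueness.round(3))
```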

CONDUCTING FACTOR ANALYSIS

Example: Develop a questionnaire to measure various aspects of students’ anxiety towards
learning SPSS.

- Based on interviews with anxious and non-anxious students, 23 possible questions are
prepared.

- Each question was a statement followed by a 5-point Likert scale: ‘strongly disagree’,
‘disagree’, ‘neither disagree nor agree’, ‘agree’ and ‘strongly agree’.

The questionnaire was designed to measure how anxious a given individual would be about
learning how to use SPSS.

But specifically, we wanted to know whether anxiety about SPSS could be broken down into
specific forms of anxiety.

- In other words, what latent variables² contribute to anxiety about SPSS?

² Latent variables are variables that are not directly observed but are inferred from other
variables that are observed.
SPSS Commands
Go to “Analyze”
Select “Dimension Reduction”
Select “Factor”
Move the first 8 items into the “Variables” box
Go to “Descriptives”
Select “Univariate descriptives”
Select “Coefficients”
Select “Significance levels”
Select “KMO and Bartlett’s test of sphericity”
Click “Continue”
Then
Click “Extraction”
In Method select “Principal components”
Select “Correlation matrix”
Under Display check “Unrotated factor solution” and “Scree plot”
Under Extract check “Based on Eigenvalue”
Input “1” in the “Eigenvalues greater than” box
Click “Continue”
Go to “Rotation”
Select “Varimax”
Check that “Rotated solution” is selected under Display
Click “Continue”
Go to “Scores”
Check “Save as variables”
Under Method select “Regression”
Check “Display factor score coefficient matrix”
Go to “Options”
In Missing Values select “Exclude cases listwise”
In “Coefficient Display Format” check both “Sorted by size” and “Suppress small coefficients”
In the “Absolute value below” box input 0.3
Click “Continue”
Click “OK”

OUTPUT INTERPRETATION
Correlation Matrix

                                           1      2      3      4      5      6      7      8
1. Statistics makes me cry             1.000
2. My friends will think I'm stupid
   for not being able to cope with
   SPSS                               -0.099  1.000
3. Standard deviations excite me      -0.337  0.318  1.000
4. I dream that Pearson is attacking
   me with correlation coefficients    0.436 -0.112 -0.380  1.000
5. I don't understand statistics       0.402 -0.119 -0.310  0.401  1.000
6. I have little experience of
   computers                           0.217 -0.074 -0.227  0.278  0.257  1.000
7. All computers hate me               0.305 -0.159 -0.382  0.409  0.339  0.514  1.000
8. I have never been good at
   mathematics                         0.331 -0.050 -0.259  0.349  0.269  0.223  0.297  1.000
The correlations between statements range from r = −0.382 to r = 0.514.

Due to relatively high correlations among items, this would be a good candidate for factor
analysis.

Recall that the goal of factor analysis is to model the interrelationships between items with
fewer (latent) variables. These interrelationships can be broken up into multiple components.

KMO & Bartlett’s TEST

KMO and Bartlett's Test

Kaiser-Meyer-Olkin Measure of Sampling Adequacy            .818
Bartlett's Test of Sphericity   Approx. Chi-Square     4157.283
                                df                           28
                                Sig.                       .000

The significance level is less than 0.05, so we reject the null hypothesis: the items analysed
are significantly correlated with each other.

Generally, a KMO value greater than 0.5 is desirable. Our value (0.818) exceeds this
threshold, so we can proceed with factor analysis.

SCREE PLOT

The plot shows a distinct break (the first ‘jump’) after the fourth component, so there may
be a possibility to extract up to three factors; in a scree plot we consider only the first jump.

Using the eigenvalue criterion instead (retaining eigenvalues greater than 1), two factors
are extracted.

COMPONENT MATRIX
(Before Rotation)
                                                                      Component
                                                                       1      2
1. I dream that Pearson is attacking me with correlation coefficients .720
2. All computers hate me                                              .718
3. Statistics makes me cry                                            .659
4. Standard deviations excite me                                     -.653   .409
5. I don't understand statistics                                      .650
6. I have little experience of computers                              .572
7. I have never been good at mathematics                              .568
8. My friends will think I'm stupid for not being able to cope
   with SPSS                                                                 .866
Note 1: Extraction Method: Principal Component Analysis.
Note 2: 2 components extracted.

This table is useful for identifying which variables correlate with each component.

- “My friends will think I'm stupid for not being able to cope with SPSS” loads highly on
component 2, and “Standard deviations excite me” loads on both components, whereas the
remaining statements load highly on component 1.

ROTATED COMPONENT MATRIX

                                                                      Component
                                                                       1      2
I dream that Pearson is attacking me with correlation coefficients    .712
All computers hate me                                                 .682
Statistics makes me cry                                               .662
I don't understand statistics                                         .638
I have never been good at mathematics                                 .627
I have little experience of computers                                 .600
My friends will think I'm stupid for not being able to cope with
SPSS                                                                         .916
Standard deviations excite me                                        -.454   .623
Note 1: Extraction Method: Principal Component Analysis.
Note 2: Rotation Method: Varimax with Kaiser Normalization.
Note 3: Rotation converged in 3 iterations.

The purpose of the rotated component matrix is to identify variables with low loadings on
the extracted components, which can be removed before re-running the analysis.

The table contains the rotated factor loadings, which represent both how the variables are
weighted in each factor and the correlation between the variables and the factor.

Because we used the “Suppress small coefficients” option with a threshold of 0.3, loadings
with an absolute value below 0.3 are hidden.

For example, “I have little experience of computers” has the lowest loading on
component 1, i.e., 0.600.

Similarly, “Standard deviations excite me” has a loading of 0.623 on component 2.

If, say, “I have little experience of computers” had shown a loading of only 0.2, we could
eliminate that statement and re-run the analysis.

COMMUNALITIES

                                                             Initial  Extraction
Statistics makes me cry                                       1.000      .453
My friends will think I'm stupid for not being able to
cope with SPSS                                                1.000      .840
Standard deviations excite me                                 1.000      .594
I dream that Pearson is attacking me with correlation
coefficients                                                  1.000      .532
I don't understand statistics                                 1.000      .431
I have little experience of computers                         1.000      .361
All computers hate me                                         1.000      .517
I have never been good at mathematics                         1.000      .394

The amount of variance a variable shares with all other variables included in the analysis is
referred to as communality.

- If the communality value is small, the variable can be treated as a largely unique
variable.

For example, the statement “My friends will think I am stupid for not being able to cope with
SPSS” has an extraction value of 0.840.

- This means it shares more variance with the other items in the analysis.

In contrast, “I have never been good at mathematics” has an extraction value of 0.394.

- This means it shares less variance with the other items in the analysis (a relatively
unique variable).

COMPONENT SCORE COEFFICIENT MATRIX

                                                                  Component
                                                                   1      2
Statistics makes me cry                                          .248   .038
My friends will think I'm stupid for not being able to cope
with SPSS                                                        .212   .790
Standard deviations excite me                                   -.056   .435
I dream that Pearson is attacking me with correlation
coefficients                                                     .260   .016
I don't understand statistics                                    .231   .004
I have little experience of computers                            .238   .091
All computers hate me                                            .233  -.049
I have never been good at mathematics                            .266   .163

Note 1: Extraction Method: Principal Component Analysis.
Note 2: Rotation Method: Varimax with Kaiser Normalization.

F1 = (0.248 * Statistics makes me cry) + ….. + (0.266 * I have never been good at
mathematics).

F2 = (0.038 * Statistics makes me cry) + ….. + (0.163 * I have never been good at
mathematics)

Now we give names to the extracted factors, i.e., F1 and F2, as

- F1 = Fear of Statistics
- F2 = Fear of Mathematics

Now these variables can be used for further multivariate analysis like regression and
discriminant analysis.

The saved factor-score variables can be seen in the Variable View of the data file.
