
Factor Analysis

Amar Saxena
AmarSaxena@gmail.com
+91.993.002.2910
21st Oct 2022
A few life scenarios
There is a girl who:
• Shares her chocolate with you
• Waits for you to go for lunch & snacks
• Often praises you and gives you compliments
• Gives you gifts
• Is very affectionate

How would you describe the situation?

You know a person who is:
• Very high on confidence
• Extraordinarily proud
• Self-centered and self-loving
• Very adamant
• Difficult to handle
• Able to insult anyone easily

What can you say about this person?

What is this country famous for?

Gross National Happiness Index

• Good Governance
• Sustainable economic development
• Cultural preservation
• Environmental conservation

1. Psychological wellbeing
2. Health
3. Education
4. Time use
5. Cultural diversity and resilience
6. Good governance
7. Community vitality
8. Ecological diversity and resilience
9. Living standards
10. … … … … …

33 Indicators
Some life scenarios

Egoism Creativity • Satisfaction


Happiness Religiosity • Attitude
Comfort Care • Perception
Motivation Hate • Quality
Altruism Anxiety • Brand Equity
Worry Stress • Buyer Persona
Reliability Product Quality • Customer Value
Physical Aptitude Power • Customer Personality

What is common among these terms?

The Basic Premise
• So, how did you understand these life scenarios?
o What would people do when they are in love?
o What are the behavioral traits of an egoist?

• There are some variables that can be easily measured.
o Such as speed, height, weight
• And then, there are variables that cannot be directly
measured. We need other variables to measure them.
o These variables are not a single measurable entity.

The Basic Premise (contd)

• These are complex variables, composed of simpler,
measurable variables.
• They help us understand and describe a phenomenon better.
o They are called constructs: unobservable, latent variables.
o Constructs, by themselves, are difficult to measure. Hence, we use
other variables to measure and understand them.

This is the basic premise behind Factor Analysis.

So, what is Factor Analysis?
• A general name for a class of procedures primarily used for
data reduction and summarization.
o A statistical procedure,
o To identify these complex variables (or constructs),
o Based on a number of inter-related quantitative variables.
• Used to understand the underlying dimensions of a construct
from directly observable variables.
o Variables are grouped on the basis of their relatedness, i.e. their correlation.
• The different dimensions within the data are called factors.
• Interdependence technique –
The entire set of interdependent relationships is examined
without making a distinction between dependent and
independent variables.
A Hypothetical Example of Factor Analysis

Determinants of Store Image
Situation: Retailer would like to know whether consumers think in more
general evaluative dimensions rather than in just the specific items

Research Plan: Nine store image elements, including measures of the product
offering, store personnel, price levels, and in-store service and experiences
                          V1    V2    V3    V4    V5    V6    V7    V8    V9
V1 Price Level           1.00
V2 Store Personnel       .427  1.00
V3 Return Policy         .302  .771  1.00
V4 Product Availability  .470  .497  .427  1.00
V5 Product Quality       .765  .406  .307  .472  1.00
V6 Assortment Depth      .281  .445  .423  .713  .325  1.00
V7 Assortment Width      .354  .490  .471  .719  .378  .724  1.00
V8 In-Store Service      .242  .719  .733  .428  .240  .311  .435  1.00
V9 Store Atmosphere      .372  .737  .774  .479  .326  .429  .466  .710  1.00
Grouping the Variables (elements)

Grouping the variables that are more correlated to each other

                          V3    V8    V9    V2    V6    V7    V4    V1    V5
V3 Return Policy         1.00
V8 In-store Service      .733  1.00
V9 Store Atmosphere      .774  .710  1.00
V2 Store Personnel       .741  .719  .787  1.00
V6 Assortment Depth      .423  .311  .429  .445  1.00
V7 Assortment Width      .471  .435  .468  .490  .724  1.00
V4 Product Availability  .427  .428  .479  .497  .713  .719  1.00
V1 Price Level           .302  .242  .372  .427  .281  .354  .470  1.00
V5 Product Quality       .307  .240  .326  .406  .325  .378  .472  .765  1.00

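The regrouping above can be sketched in a few lines of NumPy. This is only an illustration, not the procedure factor analysis itself uses: it takes the values of the first (ungrouped) store image table, reorders rows and columns by a candidate grouping, and compares the average correlation inside each group with the average correlation across groups. The names `groups`, `mean_offdiag`, `within` and `between` are made up for this example.

```python
import numpy as np

# Lower triangle of the ungrouped store image correlation table (V1..V9), row by row.
lower = [
    [1.00],
    [.427, 1.00],
    [.302, .771, 1.00],
    [.470, .497, .427, 1.00],
    [.765, .406, .307, .472, 1.00],
    [.281, .445, .423, .713, .325, 1.00],
    [.354, .490, .471, .719, .378, .724, 1.00],
    [.242, .719, .733, .428, .240, .311, .435, 1.00],
    [.372, .737, .774, .479, .326, .429, .466, .710, 1.00],
]
R = np.zeros((9, 9))
for i, row in enumerate(lower):
    R[i, :len(row)] = row
R = R + R.T - np.diag(np.diag(R))  # mirror the triangle into a full symmetric matrix

# Candidate grouping (0-based indices); the labels are the ones used on the slide.
groups = {
    "In-store Experience": [2, 7, 8, 1],  # V3, V8, V9, V2
    "Product Offerings": [5, 6, 3],       # V6, V7, V4
    "Value": [0, 4],                      # V1, V5
}
order = [i for idx in groups.values() for i in idx]
R_grouped = R[np.ix_(order, order)]       # same correlations, rows/columns reordered

def mean_offdiag(block):
    """Average of the off-diagonal entries of a square block."""
    n = block.shape[0]
    return (block.sum() - np.trace(block)) / (n * (n - 1))

within = {g: mean_offdiag(R[np.ix_(idx, idx)]) for g, idx in groups.items()}

# Correlations between variables that belong to different groups.
mask = np.ones((9, 9), dtype=bool)
for idx in groups.values():
    mask[np.ix_(idx, idx)] = False
between = R[mask].mean()

for g, v in within.items():
    print(f"{g:20s} mean within-group r = {v:.3f}")
print(f"mean between-group r = {between:.3f}")
```

If the grouping is sensible, every within-group average should be clearly larger than the between-group average.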
Interpreting the Factor Analysis Results

Variables                  Factor

V3 Return Policy           In-store Experience
V8 In-store Service        In-store Experience
V9 Store Atmosphere        In-store Experience
V2 Store Personnel         In-store Experience

V6 Assortment Depth        Product Offerings
V7 Assortment Width        Product Offerings
V4 Product Availability    Product Offerings

V1 Price Level             Value
V5 Product Quality         Value

Uses of Factor Analysis
• Reducing data
o Reduces a large number of overlapping variables to a smaller,
more manageable number.
• Scale development
o Helps in identifying the important variables to be measured when
measuring a construct.
o E.g. Service Quality, Satisfaction.
• Simplifies the understanding and description of complex
constructs by identifying their underlying dimensions (factors).
• Reduces multi-collinearity.
• Can also be used to construct indices
o Helps in finding the weight of each variable in the index.

A Good Factor Analysis Solution
• A major goal of factor analysis is to represent relationships
among sets of variables parsimoniously while keeping the
factors meaningful.
• A good factor solution is both simple and interpretable.
• When factors can be interpreted, they lead to new insights.
• Results of factor analysis may not always be satisfactory:
o The items or scales may be poor indicators of the construct or
constructs.
o There may be too few items or scales to represent each
underlying dimension.
So, the choice of variables to measure is extremely important.

Stages in conducting Factor Analysis

Formulate the Problem

Data Screening and Preparation

Decide - should we do Factor Analysis?

Method of Factor Analysis?

Determine the Number of Factors

Rotate the Factors

Interpret the Factors

Calculate the Factor Scores


Conducting Factor Analysis
RESPONDENT NUMBER V1 V2 V3 V4 V5 V6
1 7 3 6 4 2 4
2 1 3 2 4 5 4
3 6 2 7 4 1 3
4 4 5 4 6 2 5
5 1 2 2 3 6 2
6 6 3 6 4 2 4
7 5 3 6 3 4 3
8 6 4 7 4 1 4
9 3 4 2 3 6 3
10 2 6 2 6 7 6
11 6 4 7 3 2 3
12 2 3 1 4 5 4
13 7 2 6 4 1 3
14 4 6 4 5 3 6
15 1 3 2 2 6 4
16 6 4 6 3 3 4
17 5 3 6 3 3 4
18 7 3 7 4 1 4
19 2 4 3 3 6 3
20 3 5 3 6 4 6
21 1 3 2 3 5 3
22 5 4 5 4 2 4
23 2 2 1 5 4 4
24 4 6 4 6 4 7
25 6 5 4 2 1 4
26 3 5 4 6 4 7
27 4 4 7 2 2 5
28 3 7 2 6 4 3
29 4 6 3 7 2 7
30 2 3 2 4 7 2
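As a minimal sketch, the correlation matrix that factor analysis starts from can be computed from the 30 responses above with NumPy. The array name `data` is an assumption of this example; `np.corrcoef` with `rowvar=False` treats each column (V1 to V6) as a variable.

```python
import numpy as np

# Ratings of the 30 respondents on V1..V6, copied from the table above.
data = np.array([
    [7, 3, 6, 4, 2, 4], [1, 3, 2, 4, 5, 4], [6, 2, 7, 4, 1, 3],
    [4, 5, 4, 6, 2, 5], [1, 2, 2, 3, 6, 2], [6, 3, 6, 4, 2, 4],
    [5, 3, 6, 3, 4, 3], [6, 4, 7, 4, 1, 4], [3, 4, 2, 3, 6, 3],
    [2, 6, 2, 6, 7, 6], [6, 4, 7, 3, 2, 3], [2, 3, 1, 4, 5, 4],
    [7, 2, 6, 4, 1, 3], [4, 6, 4, 5, 3, 6], [1, 3, 2, 2, 6, 4],
    [6, 4, 6, 3, 3, 4], [5, 3, 6, 3, 3, 4], [7, 3, 7, 4, 1, 4],
    [2, 4, 3, 3, 6, 3], [3, 5, 3, 6, 4, 6], [1, 3, 2, 3, 5, 3],
    [5, 4, 5, 4, 2, 4], [2, 2, 1, 5, 4, 4], [4, 6, 4, 6, 4, 7],
    [6, 5, 4, 2, 1, 4], [3, 5, 4, 6, 4, 7], [4, 4, 7, 2, 2, 5],
    [3, 7, 2, 6, 4, 3], [4, 6, 3, 7, 2, 7], [2, 3, 2, 4, 7, 2],
])

# Pearson correlations between the six variables (columns).
R = np.corrcoef(data, rowvar=False)
print(np.round(R, 3))
```

V1, V3 and V5 correlate strongly with each other (V5 negatively), while V1 and V2 are almost uncorrelated, which already hints at two groups of variables.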
Assumptions
• The underlying dimensions (factors) can be used to explain a
complex phenomenon.
• Variables have been measured at least on an Interval Scale.
• Sample is homogeneous with respect to the underlying factor
structure.
• Requires a large sample size
o Factor analysis is based on the correlations of the variables involved,
and correlations usually need a large sample size before they
stabilize.
• Departures from normality and linearity matter only to the extent
that they diminish the observed correlations.
• There should not be heteroscedasticity between the
variables.
Statistics Associated with Factor Analysis
• Correlation matrix. Matrix showing the simple correlations, r, between
all possible pairs of variables included in the analysis.
• Communality. Amount of variance a variable shares with all the other
variables, i.e. the proportion of each variable's variability that is explained by
the factors. It remains the same whether the solution is unrotated or rotated.
• Eigenvalue. Represents the total variance explained by each factor.
• Factor loadings. Correlations between the variables and the factors.
• Factor loading plot. Plot of the original variables using the factor
loadings as coordinates.
• Factor matrix. Matrix containing the factor loadings of all the variables
on all the factors extracted.
• Factor scores. Composite scores estimated for each respondent on the
derived factors.
• Percentage of variance. % of total variance attributed to each factor.
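Several of these statistics follow directly from the factor matrix. The sketch below assumes the two-factor unrotated loadings reported in this deck's Factor Matrix for V1 to V6. Squared loadings summed across a row give a variable's communality, and summed down a column give a factor's eigenvalue. The variable names are illustrative.

```python
import numpy as np

# Unrotated loadings (Factor 1, Factor 2) for V1..V6, taken from the deck's Factor Matrix.
loadings = np.array([
    [ 0.928,  0.253],
    [-0.301,  0.795],
    [ 0.936,  0.131],
    [-0.342,  0.789],
    [-0.869, -0.351],
    [-0.177,  0.871],
])

squared = loadings ** 2
communalities = squared.sum(axis=1)   # one value per variable (row sums)
eigenvalues = squared.sum(axis=0)     # one value per factor (column sums)
pct_variance = 100 * eigenvalues / loadings.shape[0]  # total variance = number of variables

print("communalities:", np.round(communalities, 3))
print("eigenvalues:  ", np.round(eigenvalues, 3))
print("% of variance:", np.round(pct_variance, 2))
```

Up to rounding of the published loadings, this reproduces the communalities, eigenvalues and percentages of variance shown in the results tables.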
Formulate the Problem
• What are the objectives of the factor analysis?
o Data Summarization vs. Data Reduction
• Using Factor Analysis with Other Multivariate Techniques
o Factors may identify concepts more useful than individual
variables.
o Factors help mitigate the impact of multicollinearity on the
interpretation of correlated variables.
• An appropriate sample size should be used.
o As a rough guideline, there should be at least four or five times
as many observations (sample size) as there are variables.
o Generally requires large sample sizes; the larger, the better.
• All variables should be on interval or ratio scale.
• Data Screening and Cleaning
Variable Selection
Three elements in variable selection
1. Variable specification – the researcher must specifically designate the
variables to be analyzed.
Select variables based on past research, theory, and the judgment of
the researcher.

2. Factors are always produced – Exploratory Factor Analysis
always generates factors; the researcher has the responsibility to
evaluate the usefulness and validity of the factors.
Caution – GIGO (garbage in, garbage out)

3. Factors require multiple variables – Exploratory Factor
Analysis must have at least two correlated variables to form a
factor. Thus, variables which are not included in a specified
factor are not “defective” in some manner; it is just that no
other correlated variables were included in the analysis.
Should we do Factor Analysis?

Correlation Matrix

Variables  V1      V2      V3      V4      V5      V6
V1          1
V2         -0.053   1
V3          0.873  -0.155   1
V4         -0.086   0.572  -0.248   1
V5         -0.858   0.02   -0.778  -0.007   1
V6          0.004   0.64   -0.018   0.64   -0.136   1
Should we do Factor Analysis?

• Correlation among variables
o If the variables are independent (uncorrelated), factor analysis will be difficult to implement.
o Unexplained Correlation (aka Partial Correlation):
A high value (say, above 0.5) makes factor analysis inappropriate.
• Bartlett's test of sphericity
o H0 – the variables are uncorrelated in the population.
o If this hypothesis cannot be rejected, then the appropriateness of
factor analysis should be questioned.
o A significance (p) value lower than 0.05 is ideal.
• Kaiser-Meyer-Olkin (KMO) – Measure of Sampling
Adequacy.
o Values between 0.8 and 1 – excellent.
o Values between 0.5 and 0.8 – acceptable.
o Values close to zero – a problem for factor analysis.
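Both checks can be sketched with NumPy alone. The example assumes the six-variable correlation matrix of this deck, with the V1–V2 entry taken as -0.053 (the value the 30 raw responses give), and n = 30 respondents. It uses the usual textbook formulas: Bartlett's statistic is -(n - 1 - (2p + 5)/6) · ln|R| with p(p - 1)/2 degrees of freedom, and KMO compares squared correlations with squared partial (anti-image) correlations. The 5% critical value for 15 degrees of freedom is hard-coded from a chi-square table instead of being computed.

```python
import numpy as np

# Correlation matrix of V1..V6 (V1-V2 assumed to be -0.053, as the raw data give).
R = np.array([
    [ 1.000, -0.053,  0.873, -0.086, -0.858,  0.004],
    [-0.053,  1.000, -0.155,  0.572,  0.020,  0.640],
    [ 0.873, -0.155,  1.000, -0.248, -0.778, -0.018],
    [-0.086,  0.572, -0.248,  1.000, -0.007,  0.640],
    [-0.858,  0.020, -0.778, -0.007,  1.000, -0.136],
    [ 0.004,  0.640, -0.018,  0.640, -0.136,  1.000],
])
n, p = 30, 6  # respondents, variables

# Bartlett's test of sphericity: H0 says R is an identity matrix.
chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
df = p * (p - 1) // 2
CRIT_05_DF15 = 24.996  # tabulated chi-square critical value, alpha = 0.05, df = 15
reject_h0 = chi2 > CRIT_05_DF15

# KMO: partial correlations come from the inverse of R.
inv = np.linalg.inv(R)
d = np.sqrt(np.diag(inv))
partial = -inv / np.outer(d, d)
off = ~np.eye(p, dtype=bool)            # off-diagonal entries only
r2 = (R[off] ** 2).sum()
q2 = (partial[off] ** 2).sum()
kmo = r2 / (r2 + q2)

print(f"Bartlett chi-square = {chi2:.1f} on {df} df, reject H0: {reject_h0}")
print(f"KMO = {kmo:.3f}")
```

A large chi-square (H0 rejected) together with a KMO above 0.5 suggests that factor analysis is worth attempting on these data.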
Method of Factor Analysis
• Methods of factor extraction include
o Principal Component Analysis (PCA)
o Principal Axis Factoring (or Common Factor Analysis)
o Maximum Likelihood Method
o Alpha Method
o Unweighted Least Squares Method
o Generalized Least Squares Method
o Image Factoring.

Principal Components Analysis:
• Most commonly used method for factor extraction.
• Total variance in the data is considered. The diagonal of the
correlation matrix consists of unities, and full variance is brought
into the factor matrix.
Principal Component Analysis
• Primary concern is Data Reduction – i.e. determine the
minimum number of factors to account for maximum
variance in the data for use in subsequent multivariate
analysis.
The factors are called Principal Components.

• Linear combinations of the observed variables are formed.
• The 1st principal component is the combination that accounts for the
largest amount of variance in the sample (1st extracted factor).
• The 2nd principal component accounts for the next largest amount of
variance and is uncorrelated with the first (2nd extracted factor).
• Successive components explain progressively smaller portions of
the total sample variance, and all are uncorrelated with each other.

Results of Principal Components Analysis

Communalities
Variables Initial Extraction
V1 1.000 0.926
V2 1.000 0.723
V3 1.000 0.894
V4 1.000 0.739
V5 1.000 0.878
V6 1.000 0.790

Initial Eigenvalues

Factor  Eigenvalue  % of variance  Cumulative %
1 2.731 45.520 45.520
2 2.218 36.969 82.488
3 0.442 7.360 89.848
4 0.341 5.688 95.536
5 0.183 3.044 98.580
6 0.085 1.420 100.000
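A table like this can be approximated by an eigen-decomposition of the correlation matrix. The sketch below assumes the six-variable correlation matrix of this deck (V1–V2 taken as -0.053) and uses `np.linalg.eigh`, which is appropriate for symmetric matrices. Because the published correlations are rounded to three decimals, the results agree with the table only to about two decimals.

```python
import numpy as np

# Correlation matrix of V1..V6 (V1-V2 assumed to be -0.053, as the raw data give).
R = np.array([
    [ 1.000, -0.053,  0.873, -0.086, -0.858,  0.004],
    [-0.053,  1.000, -0.155,  0.572,  0.020,  0.640],
    [ 0.873, -0.155,  1.000, -0.248, -0.778, -0.018],
    [-0.086,  0.572, -0.248,  1.000, -0.007,  0.640],
    [-0.858,  0.020, -0.778, -0.007,  1.000, -0.136],
    [ 0.004,  0.640, -0.018,  0.640, -0.136,  1.000],
])

eigvals, eigvecs = np.linalg.eigh(R)    # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]       # re-sort so the largest comes first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

pct = 100 * eigvals / eigvals.sum()     # the eigenvalues sum to the number of variables
cum = np.cumsum(pct)

# Loadings: each eigenvector scaled by the square root of its eigenvalue.
# The sign of each column is arbitrary.
loadings = eigvecs * np.sqrt(eigvals)

for k, (ev, pv, cv) in enumerate(zip(eigvals, pct, cum), start=1):
    print(f"{k}  {ev:6.3f}  {pv:7.3f}  {cv:8.3f}")
```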
Communalities

Communalities measure how much of the variance in each variable has
been accounted for by the extracted factors.

For example, 56.0% of the variance in UN91 is accounted for,
while 78.4% of the variance in UT94 is accounted for.
Results of Principal Components Analysis
Reproduced Correlation Matrix

The lower-left triangle contains the reproduced correlations;
the diagonal, the communalities;
the upper-right triangle, the residuals between the observed
correlations and the reproduced correlations.

Variables V1 V2 V3 V4 V5 V6
V1 0.926 0.024 -0.029 0.031 0.038 -0.053
V2 -0.078 0.723 0.022 -0.158 0.038 -0.105
V3 0.902 -0.177 0.894 -0.031 0.081 0.033
V4 -0.117 0.730 -0.217 0.739 -0.027 -0.107
V5 -0.895 -0.018 -0.859 0.020 0.878 0.016
V6 0.057 0.746 -0.051 0.748 -0.152 0.790
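The entries of this table can be reproduced, up to rounding, from the two-factor loading matrix: the reproduced correlations are the product of the loadings with their transpose, and the residuals are the observed correlations minus the reproduced ones. The sketch assumes the deck's unrotated Factor Matrix and its observed correlation matrix (V1–V2 taken as -0.053). The variable names are illustrative.

```python
import numpy as np

# Observed correlations of V1..V6 (V1-V2 assumed to be -0.053, as the raw data give).
R = np.array([
    [ 1.000, -0.053,  0.873, -0.086, -0.858,  0.004],
    [-0.053,  1.000, -0.155,  0.572,  0.020,  0.640],
    [ 0.873, -0.155,  1.000, -0.248, -0.778, -0.018],
    [-0.086,  0.572, -0.248,  1.000, -0.007,  0.640],
    [-0.858,  0.020, -0.778, -0.007,  1.000, -0.136],
    [ 0.004,  0.640, -0.018,  0.640, -0.136,  1.000],
])
# Unrotated two-factor loadings from the deck's Factor Matrix.
loadings = np.array([
    [ 0.928,  0.253],
    [-0.301,  0.795],
    [ 0.936,  0.131],
    [-0.342,  0.789],
    [-0.869, -0.351],
    [-0.177,  0.871],
])

reproduced = loadings @ loadings.T   # diagonal holds the communalities
residual = R - reproduced            # only the off-diagonal entries are of interest

print("reproduced r(V1, V3):", round(reproduced[0, 2], 3))
print("residual   r(V1, V3):", round(residual[0, 2], 3))
off = ~np.eye(6, dtype=bool)
print("largest absolute residual:", round(np.abs(residual[off]).max(), 3))
```

Small residuals indicate that two factors reproduce the observed correlations well; large ones point to variable pairs the solution does not explain.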
How many factors to extract?
• A Priori Determination. Based on prior knowledge.
• Based on Eigenvalues (Latent Root Criterion / Kaiser Rule).
o Retain factors with Eigenvalues greater than 1.0.
o Such a factor explains more variance than a single (standardized) variable.
o If the number of variables is less than 20, this approach will result in
a conservative number of factors.
• Based on Scree Plot. Plot of Eigenvalues against the factor number.
o Number of factors at the point of inflection.
• Based on Percentage of Variance.
o The factors should explain a satisfactory level of variance.
o Generally, ~60% of the variance.
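Two of these rules are easy to apply mechanically. The sketch below uses the six initial eigenvalues reported earlier in this deck. The 60% target is the rough guideline from the slide, not a fixed standard, and the names `n_kaiser` and `n_variance` are made up for the example.

```python
import numpy as np

# Initial eigenvalues of the six-variable example, as reported in the deck.
eigenvalues = np.array([2.731, 2.218, 0.442, 0.341, 0.183, 0.085])

# Kaiser rule: keep factors whose eigenvalue exceeds 1.0.
n_kaiser = int((eigenvalues > 1.0).sum())

# Percentage-of-variance rule: smallest number of factors reaching the target.
target_pct = 60.0
cum_pct = 100 * np.cumsum(eigenvalues) / eigenvalues.sum()
n_variance = int(np.argmax(cum_pct >= target_pct)) + 1

print("Kaiser rule keeps", n_kaiser, "factors")
print("Variance rule keeps", n_variance, "factors; cumulative % =", np.round(cum_pct, 1))
```

Here both rules agree on two factors, but in general they can disagree, which is why the decision is revisited after rotation and interpretation.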
Other ways … though less used
• Determination Based on Split-Half Reliability. The sample is
split in half and factor analysis is performed on each half. Only
factors with high correspondence of factor loadings across the two
subsamples are retained.

• Determination Based on Significance Tests. It is possible to
determine the statistical significance of the separate Eigenvalues
and retain only those factors that are statistically significant. A
drawback is that with large samples (size greater than 200), many
factors are likely to be statistically significant, although from a
practical viewpoint many of these account for only a small
proportion of the total variance.

At this stage, the decision about the number of factors is not final.
Scree Plot

[Figure: scree plot of Eigenvalue (y-axis, 0.0 to 3.0) against Component Number (x-axis, 1 to 6).]
Choosing Number of Factors
• Several considerations to decide the # of factors to retain:
✓ Various criteria for finding the initial solution:
▪ A pre-determined number based on prior research/ judgement
▪ Factors with Eigenvalues greater than 1.0
▪ Enough factors for a specified %age of variance explained (~60%)
▪ Scree plot – factors before inflection point
▪ Factors which have eigenvalues greater than factors from
randomly-generated data
▪ Factors above the threshold established by parallel analysis
✓ More factors when there is heterogeneity among sample
subgroups.

• Consideration of several alternative solutions
o One more and one less factor than the initial solution – to ensure
the best structure is identified.

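The comparison with randomly generated data (parallel analysis) can be sketched as follows. This is a simplified version under stated assumptions: random data are drawn from independent standard normal variables, the threshold is the 95th percentile of the simulated eigenvalues at each position, and factors are retained from the first position onward until an observed eigenvalue falls below its threshold. The function name, the number of simulations and the seed are choices made for this example. The observed eigenvalues and n = 30 come from the six-variable example in this deck.

```python
import numpy as np

def parallel_analysis(observed, n_obs, n_sims=500, percentile=95, seed=0):
    """Count leading eigenvalues that exceed those of random, uncorrelated data."""
    observed = np.asarray(observed, dtype=float)
    p = observed.size
    rng = np.random.default_rng(seed)
    sims = np.empty((n_sims, p))
    for s in range(n_sims):
        X = rng.standard_normal((n_obs, p))                  # uncorrelated variables
        ev = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))
        sims[s] = ev[::-1]                                   # descending order
    threshold = np.percentile(sims, percentile, axis=0)
    keep = observed > threshold
    # Retain factors up to (not including) the first one below its threshold.
    n_retain = p if keep.all() else int(np.argmin(keep))
    return n_retain, threshold

observed = [2.731, 2.218, 0.442, 0.341, 0.183, 0.085]
n_retain, threshold = parallel_analysis(observed, n_obs=30)
print("thresholds:", np.round(threshold, 3))
print("factors to retain:", n_retain)
```

With only 30 respondents the random eigenvalues are far from 1.0, so this criterion can be stricter than the Kaiser rule for the leading factors.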
Total Variance Explained
           Initial Eigenvalues                     Extraction Sums of Squared Loadings
Component  Total   % of Variance  Cumulative %     Total   % of Variance  Cumulative %
1          3.046   30.465          30.465          3.046   30.465         30.465
2          1.801   18.011          48.476          1.801   18.011         48.476
3          1.009   10.091          58.566          1.009   10.091         58.566
4           .934    9.336          67.902
5           .840    8.404          76.307
6           .711    7.107          83.414
7           .574    5.737          89.151
8           .440    4.396          93.547
9           .337    3.368          96.915
10          .308    3.085         100.000

Extraction Method: Principal Component Analysis.

Results of Factor Selection

Extraction Sums of Squared Loadings

Factor  Eigenvalue  % of variance  Cumulative %
1       2.731       45.520         45.520
2       2.218       36.969         82.488

Factor Matrix (Initial Factor Loadings)

Variables  Factor 1  Factor 2
V1          0.928     0.253
V2         -0.301     0.795
V3          0.936     0.131
V4         -0.342     0.789
V5         -0.869    -0.351
V6         -0.177     0.871

Rotation Sums of Squared Loadings

Factor  Eigenvalue  % of variance  Cumulative %
1       2.688       44.802         44.802
2       2.261       37.687         82.488
Initial Factor Loadings
Component Matrix: loading of variables on extracted factors

Item (loadings on Component 1, 2, 3)
I discussed my frustrations and feelings with person(s) in school  .771  -.271  .121
I tried to develop a step-by-step plan of action to remedy the problems  .545  .530  .264
I expressed my emotions to my family and close friends  .580  -.311  .265
I read, attended workshops, or sought some other educational approach to correct the problem  .398  .356  -.374
I tried to be emotionally honest with myself about the problems  .436  .441  -.368
I sought advice from others on how I should solve the problems  .705  -.362  .117
I explored the emotions caused by the problems  .594  .184  -.537
I took direct action to try to correct the problems  .074  .640  .443
I told someone I could trust about how I felt about the problems  .752  -.351  .081
I put aside other activities so that I could work to solve the problems  .225  .576  .272

Extraction Method: Principal Component Analysis.

Which variable explains which particular factor?

Understanding the Factors
• Un-rotated (or raw) factors are typically not very interpretable
o Most factors are correlated with many variables,
o Making interpretation difficult.

Solution – ROTATE the factors

• Through rotation, the factor matrix is transformed into a
simpler one that is easier to interpret.
• After rotation, each variable should ideally have either a very
low or a very high loading on each factor.
• Orthogonal Rotation – rotation in which the axes are
maintained at right angles.
• Oblique Rotation – rotation in which the axes are not maintained at right
angles, and the factors are correlated.
o Less frequently used as the result is more difficult to summarize.
Rotation of Factors
• Most commonly used method – Varimax Rotation.
o It is an orthogonal method of rotation that minimizes the number
of variables with high loadings on a factor, thereby enhancing
the interpretability of the factors.
o The Orthogonal rotation results in factors that are uncorrelated.

• Other rotational methods:
o Quartimax (Orthogonal)
o Equamax (Orthogonal)
o Promax (Oblique)
Results of Varimax Rotation

Rotated Factor Matrix


Variables Factor 1 Factor 2
V1 0.962 -0.027
V2 -0.057 0.848
V3 0.934 -0.146
V4 -0.098 0.854
V5 -0.933 -0.084
V6 0.083 0.885

Factor Score Coefficient Matrix


Variables Factor 1 Factor 2
V1 0.358 0.011
V2 -0.001 0.375
V3 0.345 -0.043
V4 -0.017 0.377
V5 -0.350 -0.059
V6 0.052 0.395
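A varimax rotation can be sketched with a short, widely used SVD-based iteration. The version below is a plain (raw) varimax without Kaiser normalization, so its output may differ slightly in the third decimal from statistical packages that normalize the rows first. The function name `varimax` and its parameters are choices made for this example. The input is the unrotated Factor Matrix of the six-variable example.

```python
import numpy as np

def varimax(L, max_iter=100, tol=1e-8):
    """Raw varimax rotation of a loading matrix L (variables x factors)."""
    p, k = L.shape
    T = np.eye(k)                      # rotation matrix, starts as "no rotation"
    crit = 0.0
    for _ in range(max_iter):
        Lr = L @ T
        # Gradient of the varimax criterion with respect to the rotation.
        grad = L.T @ (Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p)
        u, s, vh = np.linalg.svd(grad)
        T = u @ vh                     # closest orthogonal matrix to the gradient
        new_crit = s.sum()
        if crit > 0 and new_crit < crit * (1 + tol):
            break                      # criterion no longer improving
        crit = new_crit
    return L @ T, T

# Unrotated loadings (Factor 1, Factor 2) for V1..V6 from the deck's Factor Matrix.
unrotated = np.array([
    [ 0.928,  0.253],
    [-0.301,  0.795],
    [ 0.936,  0.131],
    [-0.342,  0.789],
    [-0.869, -0.351],
    [-0.177,  0.871],
])
rotated, T = varimax(unrotated)
print(np.round(rotated, 3))
```

After rotation each variable loads highly on one factor and close to zero on the other, while the communalities (row sums of squared loadings) are unchanged because the rotation is orthogonal.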
Factor Matrix Before and After Rotation

(a) Before Rotation             (b) After Rotation
Variables  Factor 1  Factor 2   Variables  Factor 1  Factor 2
1          X                    1          X
2          X         X          2                    X
3          X                    3          X
4          X         X          4                    X
5          X         X          5          X
6                    X          6                    X
Factor Loading Plot

[Figure: Component Plot in Rotated Space. Variables V1 to V6 are plotted using their loadings on Component 1 and Component 2 as coordinates; both axes run from -1.0 to 1.0.]

Rotated Component Matrix

Var  Component 1  Component 2
V1    0.962       -0.0266
V2   -0.0572       0.848
V3    0.934       -0.146
V4   -0.0983       0.854
V5   -0.933       -0.0840
V6    0.08337      0.885

Approaches to Validation
• Use of Replication or a Confirmatory Perspective
o Assess the replicability/generalizability of the results, either
with a split sample in the original dataset or with a separate
sample.
o Pursue a confirmatory analysis, most likely with structural
equation modeling.
• Assessing Factor Structure Stability
o Larger samples provide more confidence as to generalizability
and stability.
• Detecting Influential Observations
o Estimate the model with and without observations identified as
outliers to assess their impact on the results. If omission of the
outliers is justified, the results should have greater
generalizability.

Making the Final Decision
• Interpret the factors
o By examining the largest values linking the factor to measured
variables
o Collate all the variables that have large loadings for the same factor
o Plots of loadings provide a visual for variable clusters.
• Do the factors make sense?
o What do the variables tell about the factor?
o Are all the variables pointing in the same direction? Can we name it?
o Interpret factors according to the meaning of the variables
o Does the factor make sense?
o If NO – then change the number of factors and re-do the analysis.
• So, how do we decide on the number of factors?
o Past research should ideally guide the number of factors to be extracted
o Eigen values computed in step 2.
o The relative interpretability of rotated solutions computed in step 3.
• Calculate Factor Scores

Interpret Factors – Label the factors
• A factor can then be interpreted in terms of the variables that
load high on it.

• Another useful aid in interpretation is to plot the variables,
using the factor loadings as coordinates. Variables at the end
of an axis are those that have high loadings on only that
factor, and hence describe the factor.
Calculate Factor Scores
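The factor scores for the ith factor may be estimated as follows:
Fi = Wi1 X1 + Wi2 X2 + Wi3 X3 + . . . + Wik Xk
where Wij is the factor score coefficient of variable j on factor i, Xj is the standardized value of variable j, and k is the number of variables.

The first factor is
F1 = (-.137)BD81 + (-.128)BD82 + (-.143)BD83 + … + (.228)UT95

Another factor is
F2 = (.304)BD81 + (.289)BD82 + (.279)BD83 + ... + (-.144)UT95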
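For the six-variable example, the weights W can be sketched from the rotated loadings. The code assumes a principal-components solution, for which the score coefficients equal Λ(ΛᵀΛ)⁻¹ with Λ the rotated loading matrix. Under that assumption it reproduces the Factor Score Coefficient Matrix shown earlier to within rounding. The standardized responses in `z` are hypothetical values invented for illustration, not data from the deck.

```python
import numpy as np

# Rotated loadings (Factor 1, Factor 2) for V1..V6 from the rotated solution.
rotated = np.array([
    [ 0.962, -0.027],
    [-0.057,  0.848],
    [ 0.934, -0.146],
    [-0.098,  0.854],
    [-0.933, -0.084],
    [ 0.083,  0.885],
])

# Factor score coefficients for a principal-components solution (assumed formula).
W = rotated @ np.linalg.inv(rotated.T @ rotated)
print(np.round(W, 3))

# Hypothetical standardized responses of one respondent on V1..V6.
z = np.array([1.0, -0.5, 1.2, 0.0, -1.1, 0.3])
scores = z @ W   # F_i = W_i1 * X_1 + ... + W_ik * X_k for each factor i
print("factor scores:", np.round(scores, 3))
```

Each column of `W` holds the weights of one factor, so a respondent's score on that factor is the weighted sum of their standardized answers.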
Uses of Factor Analysis
• Interdependency and pattern delineation.
• Parsimony or data reduction.
• Classification or description.
• Scale Development.
• Reduce Multi-collinearity.
• Data transformation.
• Exploration.
• Mapping.
• Theory.
