Advanced Research Methods STAT-6: Factor Analysis: Exploring Measurements
$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x s_y} $$
where
- larger values of |r| denote a strong correlation between the variables x and y;
- smaller values of |r| indicate a weak correlation between x and y;
- r = 0 indicates a lack of correlation between x and y.
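The correlation coefficient above can be computed directly from its definition. The following is a minimal sketch, assuming that s_x and s_y denote the sample standard deviations with denominator n - 1; the function name `pearson_r` and the data are made up for illustration and do not come from the lecture.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation coefficient r of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length n >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations s_x and s_y (assumed denominator n - 1).
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # Numerator: sum of the products of the deviations from the means.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / ((n - 1) * s_x * s_y)

# Illustrative data only: ys is an exact linear function of xs.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]
r_increasing = pearson_r(xs, ys)
r_decreasing = pearson_r(xs, list(reversed(ys)))
```

For an exact increasing linear relation the coefficient is +1, and for the reversed sequence it is -1, up to floating-point rounding.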
© Peter de Waal STAT-6: Factor Analysis : Studying correlations 16 / 36
Studying the correlation matrix
Consider the following correlation matrix on the random variables $x_1, \ldots, x_5$:

       x_1    x_2    x_3    x_4    x_5
x_1    1.00   0.72   0.63   0.54   0.45
x_2    0.72   1.00   0.56   0.48   0.40
x_3    0.63   0.56   1.00   0.42   0.35
x_4    0.54   0.48   0.42   1.00   0.30
x_5    0.45   0.40   0.35   0.30   1.00

Is there an underlying factor F that explains the correlations between the variables $x_1, \ldots, x_5$?
Partial correlations
Partial correlation coefficient
Consider the three random variables $x_i$, $x_j$, $x_k$. The partial correlation coefficient $r_{ij \cdot k}$ of $x_i$ and $x_j$ given $x_k$ is
$$ r_{ij \cdot k} = \frac{r_{ij} - r_{ik}\, r_{jk}}{\sqrt{(1 - r_{ik}^2)(1 - r_{jk}^2)}} $$
where $r_{ij}$ is the correlation coefficient of $x_i$ and $x_j$.
Partial correlations continued
The partial correlation coefficient
$$ r_{ij \cdot k} = \frac{r_{ij} - r_{ik}\, r_{jk}}{\sqrt{(1 - r_{ik}^2)(1 - r_{jk}^2)}} $$
equals zero
- if the correlation between the variables $x_i$ and $x_j$ is fully described by their separate correlations with the variable $x_k$;
- that is, if $r_{ij} = r_{ik}\, r_{jk}$.
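The zero condition can be checked numerically. The sketch below implements the partial correlation formula as stated above; the function name `partial_corr` is hypothetical, and the input values are chosen for illustration so that in the first call r_ij equals r_ik times r_jk.

```python
import math

def partial_corr(r_ij, r_ik, r_jk):
    """Partial correlation of x_i and x_j given x_k."""
    if abs(r_ik) >= 1 or abs(r_jk) >= 1:
        # The denominator vanishes when x_k is perfectly correlated
        # with x_i or x_j; the coefficient is then undefined.
        raise ValueError("r_ik and r_jk must lie strictly between -1 and 1")
    numerator = r_ij - r_ik * r_jk
    denominator = math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))
    return numerator / denominator

# r_ij = 0.72 = 0.9 * 0.8: the correlation is fully described via x_k.
r_explained = partial_corr(0.72, 0.9, 0.8)
# r_ij = 0.50 differs from 0.9 * 0.8: a partial correlation remains.
r_unexplained = partial_corr(0.50, 0.9, 0.8)
```

The first result is zero up to floating-point rounding, whereas the second is clearly negative, because the observed correlation is smaller than the product of the two separate correlations.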
Studying correlation matrices, continued
Consider again the correlation matrix on $x_1, \ldots, x_5$ shown earlier. There is a common factor that fully explains the correlations among $x_1, \ldots, x_5$ if a factor variable F can be constructed such that
$$ r_{ij \cdot F} = 0 $$
for all pairs of distinct variables $x_i$, $x_j$.
Studying correlation matrices, continued
Consider again the same correlation matrix on $x_1, \ldots, x_5$, and let F be a factor variable with
$$ r_{1F} = 0.9, \quad r_{2F} = 0.8, \quad r_{3F} = 0.7, \quad r_{4F} = 0.6, \quad r_{5F} = 0.5 $$
Then,
$$ r_{12} = r_{1F}\, r_{2F} = 0.9 \times 0.8 = 0.72 $$
$$ r_{13} = r_{1F}\, r_{3F} = 0.9 \times 0.7 = 0.63 $$
$$ r_{14} = r_{1F}\, r_{4F} = 0.9 \times 0.6 = 0.54 $$
$$ \vdots $$
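The remaining pairs can be checked in the same way. The following sketch loops over all pairs of distinct variables and compares each entry of the correlation matrix with the product of the two correlations with F; the variable names are chosen for illustration only.

```python
# Correlation matrix on x_1..x_5, copied from the lecture example.
R = [
    [1.00, 0.72, 0.63, 0.54, 0.45],
    [0.72, 1.00, 0.56, 0.48, 0.40],
    [0.63, 0.56, 1.00, 0.42, 0.35],
    [0.54, 0.48, 0.42, 1.00, 0.30],
    [0.45, 0.40, 0.35, 0.30, 1.00],
]
# Correlations r_iF of each variable with the factor F.
r_F = [0.9, 0.8, 0.7, 0.6, 0.5]

# Largest absolute gap between r_ij and r_iF * r_jF over all i != j.
max_gap = max(
    abs(R[i][j] - r_F[i] * r_F[j])
    for i in range(5)
    for j in range(5)
    if i != j
)
all_pairs_explained = max_gap < 1e-9
```

Since every off-diagonal entry equals the corresponding product, each partial correlation given F has a zero numerator, so F fully explains the correlations in this example.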
Studying correlation matrices, continued
Consider again the same correlation matrix on $x_1, \ldots, x_5$. After excluding the effect of the factor F, a residual matrix results:

       x_1    x_2    x_3    x_4    x_5
x_1    0.19   0      0      0      0
x_2    0      0.36   0      0      0
x_3    0      0      0.51   0      0
x_4    0      0      0      0.64   0
x_5    0      0      0      0      0.75
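One way to arrive at this residual matrix is to subtract, from every entry r_ij, the product r_iF r_jF of the correlations with the factor. This is a sketch under that assumption, using the values r_1F = 0.9, ..., r_5F = 0.5 from the lecture example; the slides do not spell out the computation.

```python
# Correlation matrix on x_1..x_5, copied from the lecture example.
R = [
    [1.00, 0.72, 0.63, 0.54, 0.45],
    [0.72, 1.00, 0.56, 0.48, 0.40],
    [0.63, 0.56, 1.00, 0.42, 0.35],
    [0.54, 0.48, 0.42, 1.00, 0.30],
    [0.45, 0.40, 0.35, 0.30, 1.00],
]
r_F = [0.9, 0.8, 0.7, 0.6, 0.5]

# Residual entry: r_ij minus the part reproduced by the factor.
residual = [
    [R[i][j] - r_F[i] * r_F[j] for j in range(5)]
    for i in range(5)
]
# The diagonal holds 1 - r_iF^2, the variance not shared with F.
diagonal = [round(residual[i][i], 2) for i in range(5)]
off_diagonal_max = max(
    abs(residual[i][j]) for i in range(5) for j in range(5) if i != j
)
```

Under this assumption the diagonal reproduces the values 0.19, 0.36, 0.51, 0.64 and 0.75 of the residual matrix, and all off-diagonal entries vanish up to rounding.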
The factor matrix
Factor matrix
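A factor matrix on the random variables $x_1, \ldots, x_n$, $n \geq 2$, and the factors $F_1, \ldots, F_m$, $m \geq 1$, is a matrix of the following form:
$$
\begin{array}{c|cccc}
       & F_1    & F_2    & \cdots & F_m    \\ \hline
x_1    & l_{11} & l_{12} & \cdots & l_{1m} \\
x_2    & l_{21} & l_{22} & \cdots & l_{2m} \\
\vdots & \vdots & \vdots &        & \vdots \\
x_n    & l_{n1} & l_{n2} & \cdots & l_{nm}
\end{array}
$$
where $l_{ij}$ is the correlation coefficient between the variable $x_i$ and the factor $F_j$; $l_{ij}$ is called the loading of $x_i$ on $F_j$.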
Factor matrix: an example
       F_1     F_2     F_3     F_4     F_5
x_1    0.86   -0.29    0.15   -0.13   -0.37
x_2    0.82   -0.38    0.14    0.25    0.32
x_3    0.84   -0.19   -0.14   -0.48    0.06
x_4    0.47    0.77    0.42   -0.08    0.03
x_5    0.68    0.53    0.48    0.19    0.01
The example revisited
For the community-platform study, part of the computed factor matrix is

Feature                       Factor 1: Identity   Factor 3: Governance
Profile, personal agenda      0.76                 0.17
Profile, favourite artists    0.73                 0.06
Profile, favourite parties    0.81                 0.09
Buddy system                  0.77                 0.10
Profile, personal photo       0.73                 0.04
...
Clear netiquette              0.06                 0.84
Moderation                    0.01                 0.89
User guidance / FAQ           0.13                 0.72
Report-to-moderator           0.16                 0.77
...
Loadings
The loading l of a variable x on a factor F is interpreted as follows:
- a large value of |l| indicates that the variable x contributes to the meaning of the factor F;
- a small or zero value of |l| indicates that x does not contribute much to the meaning of F, but rather contributes to that of another factor.
A loading ranges between -1.00 and +1.00.
Communalities
Communality, uniqueness
Consider the random variables $x_1, \ldots, x_n$, $n \geq 2$, and the factors $F_1, \ldots, F_m$, $m \geq 1$. Let $l_{ij}$ be the loading of the variable $x_i$ on the factor $F_j$. The communality $c_i$ of $x_i$ equals
$$ c_i = \sum_{j=1}^{m} l_{ij}^2 $$
The uniqueness $u_i$ of $x_i$ equals
$$ u_i = 1 - c_i $$
The example revisited
Consider part of the factor matrix of the community-platform study and the communalities of the original variables:

Feature                 Factor 1: Identity   Factor 3: Governance   Communality
Clear netiquette        0.06                 0.84                   0.71
Moderation              0.01                 0.89                   0.79
User guidance / FAQ     0.13                 0.72                   0.54
Report-to-moderator     0.16                 0.77                   0.62

The communality $c_M$ of the Moderation variable equals
$$ c_M = l_{M F_1}^2 + l_{M F_3}^2 = 0.01^2 + 0.89^2 \approx 0.79 $$
which expresses that 79% of the variance of the variable Moderation is explained by the two factors $F_1$ and $F_3$.
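The other communalities in the table follow from the same computation. The sketch below applies the definitions of communality and uniqueness to the four features, using only the two loadings shown in the table; the dictionary and function names are chosen for illustration.

```python
# Loadings on Factor 1 (Identity) and Factor 3 (Governance),
# copied from the community-platform table.
loadings = {
    "Clear netiquette": (0.06, 0.84),
    "Moderation": (0.01, 0.89),
    "User guidance / FAQ": (0.13, 0.72),
    "Report-to-moderator": (0.16, 0.77),
}

def communality(row):
    """Sum of the squared loadings of one variable."""
    return sum(l ** 2 for l in row)

communalities = {name: communality(row) for name, row in loadings.items()}
# Uniqueness: the share of variance not explained by these factors.
uniquenesses = {name: 1 - c for name, c in communalities.items()}

for name in loadings:
    print(f"{name:22s} c = {communalities[name]:.2f}  "
          f"u = {uniquenesses[name]:.2f}")
```

Rounded to two decimals, the computed communalities agree with the Communality column of the table.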
The number of factors
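The number of factors selected for further processing is based upon prior knowledge:
- domain knowledge;

Eigenvalues
The eigenvalue $e_j$ of the factor $F_j$ is the sum of the squared loadings on that factor:
$$ e_j = \sum_{i=1}^{n} l_{ij}^2 $$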
Eigenvalues, continued
Suppose that for n random variables, n factors are extracted from the data:
- each variable could essentially account for (100/n)% of the total variance in the data;
- a factor with an eigenvalue e accounts for as much variance as e variables in essence could.
The eigenvalue is sometimes called the amount of variance explained.
Eigenvalues: an example
               F_1     F_2     F_3     F_4     F_5
x_1            0.86   -0.29    0.15   -0.13   -0.37
x_2            0.82   -0.38    0.14    0.25    0.32
x_3            0.84   -0.19   -0.14   -0.48    0.06
x_4            0.47    0.77    0.42   -0.08    0.03
x_5            0.68    0.53    0.48    0.19    0.01
eigenvalues    2.80    1.14    0.46    0.35    0.25
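The eigenvalue row can be approximately reproduced from the loadings. The sketch below assumes that the eigenvalue of a factor is the sum of the squared loadings in its column, and it also reports the share of the total variance per factor as e/n. Because the loadings are rounded to two decimals, the results can differ from the listed eigenvalues by about 0.01.

```python
# Example factor matrix: rows x_1..x_5, columns F_1..F_5.
factor_matrix = [
    [0.86, -0.29, 0.15, -0.13, -0.37],
    [0.82, -0.38, 0.14, 0.25, 0.32],
    [0.84, -0.19, -0.14, -0.48, 0.06],
    [0.47, 0.77, 0.42, -0.08, 0.03],
    [0.68, 0.53, 0.48, 0.19, 0.01],
]
n = len(factor_matrix)

# Assumed definition: column sum of squared loadings.
eigenvalues = [
    sum(row[j] ** 2 for row in factor_matrix) for j in range(n)
]
# A factor with eigenvalue e accounts for a share e / n of the variance.
shares = [e / n for e in eigenvalues]

for j, (e, s) in enumerate(zip(eigenvalues, shares), start=1):
    print(f"F_{j}: eigenvalue = {e:.2f}, share = {100 * s:.0f}%")
```

In this example the first factor accounts for roughly as much variance as 2.8 of the five variables could, which is a little over half of the total.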
Tests of eigenvalues
For the factor solution, several tests are in use for selecting the factors to be included:
- Cattell's scree curve shows the eigenvalue of each subsequent factor: the factors just prior to the levelling of the curve are included in the factor solution;
- Kaiser's rule of eigenvalues is to include factors with an eigenvalue e > 1 only.
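Applied to the eigenvalues of the five-factor example (2.80, 1.14, 0.46, 0.35, 0.25), Kaiser's rule can be sketched as a simple filter. The successive drops between eigenvalues are also listed, since they are what a scree curve displays; judging where that curve levels off is left to the reader and is not automated here.

```python
# Eigenvalues of F_1..F_5 from the lecture example.
eigenvalues = [2.80, 1.14, 0.46, 0.35, 0.25]

# Kaiser's rule: keep only factors with an eigenvalue above 1.
kaiser_factors = [
    j for j, e in enumerate(eigenvalues, start=1) if e > 1
]

# Drops between consecutive eigenvalues, as read off a scree curve.
drops = [
    round(eigenvalues[j] - eigenvalues[j + 1], 2)
    for j in range(len(eigenvalues) - 1)
]

print("Kaiser's rule keeps factors:", kaiser_factors)
print("Successive drops:", drops)
```

For this example Kaiser's rule retains the first two factors.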
Rotation
To achieve markedly different loadings on the various factors, often a rotation of the factor solution is performed:
- the original variables and their values remain unchanged;
- the factors of the solution are redefined in terms of the original variables, while (more or less) maintaining their mutual independence.
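The effect of a rotation can be illustrated on two factors only. The sketch below applies a plain orthogonal rotation in the plane of F_1 and F_2 to the loadings of the five-factor example. The angle of -25 degrees is hand-picked for illustration and is not the result of any rotation criterion such as varimax, and the lecture does not state which rotation method it uses.

```python
import math

# Loadings of x_1..x_5 on F_1 and F_2 from the example factor matrix.
loadings = [
    (0.86, -0.29),
    (0.82, -0.38),
    (0.84, -0.19),
    (0.47, 0.77),
    (0.68, 0.53),
]

def rotate(pair, theta):
    """Loadings of one variable on the two factor axes rotated by theta."""
    a, b = pair
    c, s = math.cos(theta), math.sin(theta)
    return (a * c + b * s, -a * s + b * c)

theta = math.radians(-25)  # hand-picked angle, an assumption
rotated = [rotate(pair, theta) for pair in loadings]

# Sum of squared loadings per variable over these two factors.
before = [a * a + b * b for a, b in loadings]
after = [a * a + b * b for a, b in rotated]

for i, (old, new) in enumerate(zip(loadings, rotated), start=1):
    print(f"x_{i}: {old[0]:5.2f} {old[1]:5.2f}  ->  "
          f"{new[0]:5.2f} {new[1]:5.2f}")
```

The rotation leaves each variable's sum of squared loadings over the two factors unchanged, while the individual loadings shift: with this angle, x_1 loads mainly on the first rotated factor and x_4 mainly on the second.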
Lessons learned
The overall lessons learned from this lecture are:
- factor analysis is a statistical technique for exploration;
- factors are extracted from the data by analysing the correlations between the measured variables;
- factor analysis allows for data reduction.