Professional Documents
Culture Documents
Factor Analysis
Factor Analysis
Nathaniel E. Helwig
Updated 16-Mar-2017
Copyright
c 2017 by Nathaniel E. Helwig
Background
X = µ + LF +
where
L = {`jk }p×m denotes the matrix of factor loadings
`jk is the loading of the j-th variable on the k -th common factor
F = (F1 , . . . , Fm )0 denotes the vector of latent factor scores
Fk is the score on the k -th common factor
= (1 , . . . , p )0 denotes the vector of latent error terms
j is the j-th specific factor
X = µ + LF +
This implies that the covariance between X and F has the form
σjj = hj2 + ψj
|{z} |{z} |{z}
Var(Xj ) communality uniqueness
where
σjj is the variance of Xj (i.e., the j-th diagonal of Σ)
hj2 = (LL0 )jj = `2j1 + `2j2 + · · · + `2jm is the communality of Xj
ψj is the specific variance (or uniqueness) of Xj
Note that the communality hj2 is the sum of squared loadings for Xj .
R − Ψ = LL0
iid
Suppose xi ∼ N(µ, LL0 + Ψ) is a multivariate normal vector.
3
f
●
2
2
f
●
e● g● e● e● f
●
g●
1
1
d● g●
d
●
xyrot[,2]
xyrot[,2]
d● h● a● b● c●
0
0
y
c● h●
a● b● h●
−1
−1
−1
c● i
●
b
● j
●
i
●
i
●
a● k●
−2
−2
−2
j
●
j
●
k● k●
−3
−3
−3
−3 −2 −1 0 1 2 3 −3 −2 −1 0 1 2 3 −3 −2 −1 0 1 2 3
x xyrot[,1] xyrot[,1]
3
a● a●
2
2
k●
b● j
●
b●
e● c● i c●
f
● ●
1
1
a
●
d
● d●
b● c● g● e●
xyrot[,2]
xyrot[,2]
xyrot[,2]
f
●
0
0
h● d●
g●
h● h●
−1
−1
−1
i
●
g● e●
i
● j
●
−2
−2
−2
●j k● f
●
k●
−3
−3
−3
−3 −2 −1 0 1 2 3 −3 −2 −1 0 1 2 3 −3 −2 −1 0 1 2 3
X = µ + LF +
= µ + L̃F̃ +
where
L̃ = LR are the rotated factor loadings
F̃ = R0 F are the rotated factor scores
where
`˜jk is the rotated loading of the j-th variable on the k -th factor
qP
m
h̃j = `˜2 is the square-root of the communality for Xj
k =1 jk
In FA, one may want to obtain estimates of the latent factor scores F .
X = µ + LF +
F
= µ + L Ip
= µ + L∗ F ∗
where
xi is the i-th subject’s vector of data
x̄ = (1/n) ni=1 xi is the sample mean
P
Note that if PCA is used to estimate L̂ and Ψ̂, then it is typical to use
−1
f̂i = L̂0 L̂ L̂0 (xi − x̄)
(W ) (R)
where f̂i and f̂i denote the WLS and REG estimates, respectively.
(W ) 2 (R) 2
Note that this implies that kf̂i k ≥ kf̂i k where k · k denotes the
Euclidean norm.
Some Extensions
X = µ + LF +
This implies that the covariance between X and F has the form
If the factors are orthogonal, then Φ = Im and the factor pattern and
structure matrices are identical.
Like EFA models, CFA models can be fit via either least squares or
maximum likelihood estimation.
Least squares is analogue of PCA fitting
Maximum likelihood assumes multivariate normality
Decathlon Example
> decathlon[1:9,]
run100 long.jump shot high.jump run400 hurdle discus pole.vault javelin run1500 score
Schenk 11.25 7.43 15.48 2.27 48.90 15.13 49.28 4.7 61.32 268.95 8488
Voss 10.87 7.45 14.97 1.97 47.71 14.46 44.36 5.1 61.76 273.02 8399
Steen 11.18 7.44 14.20 1.97 48.29 14.81 43.66 5.2 64.16 263.20 8328
Thompson 10.62 7.38 15.02 2.03 49.06 14.72 44.80 4.9 64.04 285.11 8306
Blondel 11.02 7.43 12.92 1.97 47.44 14.40 41.20 5.2 57.46 256.64 8286
Plaziat 10.83 7.72 13.58 2.12 48.34 14.18 43.06 4.9 52.18 274.07 8272
Bright 11.18 7.05 14.12 2.06 49.34 14.39 41.68 5.7 61.60 291.20 8216
De.Wit 11.05 6.95 15.34 2.00 48.21 14.36 41.32 4.8 63.00 265.86 8189
Johnson 11.15 7.12 14.52 2.03 49.15 14.66 42.36 4.9 66.46 269.62 8180
FA Scree Plot
●
0.4
Proportion of Variance
0.3
0.2
●
0.1
●
●
●
●
1 2 3 4 5 6
# Factors
high.jump
0.7
shot
discus
0.8
0.6
javelin run1500
pole.vault
0.6
Uniqueness (ψj)
0.5
F2 Loadings
javelin
0.4
hurdle
high.jump long.jump
score
run100
0.4
0.2
long.jump
0.3
run400 score
run100 hurdle pole.vault
0.0
run400
0.2
discus
run1500
0.1
shot
−0.4
8500
● ●
● ●
●● ● ● ●● ● ●
●●
●● ● ● ●●
●●● ●
● ● ● ●
● ● ● ●
● ● ● ● ● ●
●●● ● ●●● ●
7500
7500
● ● ● ●
Decathlon Score
Decathlon Score
●● ● ●
● ●
● ●
● ● ● ●
● ●
● ●
6500
6500
5500
5500
● ●
−3 −2 −1 0 1 −2 −1 0 1
F1 Score F1 Score
1.0
shot shot
discus
discus
0.8
0.8
javelin javelin
pole.vault
0.6
0.6
F2 Loadings
F2 Loadings
pole.vault
0.4
0.4
hurdle
high.jump long.jump
score
run100
high.jump
0.2
0.2
hurdle
long.jump
score
run100
run400
0.0
0.0
run400
run1500
−0.4
−0.4
−0.4 −0.2 0.0 0.2 0.4 0.6 0.8 1.0 −0.4 −0.2 0.0 0.2 0.4 run1500
0.6 0.8 1.0
F1 Loadings F1 Loadings