Lecture 6 - PCA - Lecturefin
• E-mail: dndoh2009@gmail.com
• Tel: 653754070
[Figure: an n × p data matrix A is reduced to an n × k matrix X]
These underlying factors are inferred from the correlations among the p
variables. Each factor is estimated as a weighted sum of the p variables.
Step 1: Standardization
The aim of this step is to standardize the range of the continuous initial variables so that
each one of them contributes equally to the analysis.
If there are large differences between the ranges of the initial variables, those variables with
larger ranges will dominate over those with small ranges (for example, a variable that
ranges between 0 and 100 will dominate over a variable that ranges between 0 and 1),
which will lead to biased results. Transforming the data to comparable scales
prevents this problem.
Mathematically, this is done by subtracting the mean and dividing by the standard
deviation for each value of each variable: z = (x - \bar{x}) / s.
Once the standardization is done, all the variables are on the same scale.
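The standardization step can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `standardize` and the two sample variables are hypothetical and not part of the lecture data.

```python
from statistics import mean, stdev

def standardize(values):
    """Return z-scores: (value - mean) / sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(v - m) / s for v in values]

# Hypothetical data: one variable on a 0-100 scale, one on a 0-1 scale.
wide = [10.0, 40.0, 55.0, 90.0]
narrow = [0.10, 0.40, 0.55, 0.90]

z_wide = standardize(wide)
z_narrow = standardize(narrow)

# After standardization both variables have mean 0 and standard deviation 1,
# so neither dominates because of its original range.
print([round(z, 3) for z in z_wide])
print([round(z, 3) for z in z_narrow])
```

Because `narrow` is just `wide` divided by 100, the two lists of z-scores coincide, which shows that the original units no longer matter.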
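Covariance of variables i and j:

$$C_{ij} = \frac{1}{n-1} \sum_{m=1}^{n} (X_{im} - \bar{X}_i)(X_{jm} - \bar{X}_j)$$

where the sum runs over all n objects, X_{im} is the value of variable i in object m, X_{jm} is the value of variable j in object m, \bar{X}_i is the mean of variable i, and \bar{X}_j is the mean of variable j.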
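The covariance formula translates directly into code. The sketch below assumes each variable is stored as a list of n object values; the helper names and the sample data are hypothetical.

```python
def covariance(x, y):
    """Sample covariance of two variables measured on the same n objects."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum over all objects m of (X_im - mean_i) * (X_jm - mean_j), divided by n - 1.
    return sum((xm - mean_x) * (ym - mean_y) for xm, ym in zip(x, y)) / (n - 1)

def covariance_matrix(variables):
    """Matrix C with C[i][j] = covariance of variables i and j."""
    return [[covariance(vi, vj) for vj in variables] for vi in variables]

# Hypothetical measurements of two variables on four objects.
x1 = [2.0, 4.0, 6.0, 8.0]
x2 = [1.0, 3.0, 2.0, 6.0]

C = covariance_matrix([x1, x2])
for row in C:
    print([round(v, 4) for v in row])
```

The diagonal entries are the variances of each variable, and the matrix is symmetric because the covariance of i with j equals that of j with i.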
[Figure: principal axes PC 1 and PC 2]
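Characteristic equation: |S - \lambda I| = 0

Variance-covariance matrix S
        X1       X2
X1    6.6707   3.4170
X2    3.4170   6.2384

Eigenvalues: \lambda_1 = 9.8783, \lambda_2 = 3.0308

Eigenvectors
        u1       u2
X1    0.7291  -0.6844
X2    0.6844   0.7291

Orthogonality check: 0.7291*(-0.6844) + 0.6844*0.7291 = 0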
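The eigenvalues and eigenvectors quoted above can be reproduced from the variance-covariance matrix. The sketch below uses the closed-form solution for a symmetric 2 × 2 matrix instead of a linear-algebra library; the function name is hypothetical, and the approach assumes a nonzero off-diagonal entry.

```python
import math

def eigen_symmetric_2x2(a, b, d):
    """Eigenvalues and unit eigenvectors of [[a, b], [b, d]], largest eigenvalue first.

    Assumes b != 0. The sign of each eigenvector is arbitrary.
    """
    trace = a + d
    det = a * d - b * b
    # Roots of the characteristic equation lambda^2 - trace * lambda + det = 0.
    root = math.sqrt(trace * trace / 4.0 - det)
    eigenvalues = [trace / 2.0 + root, trace / 2.0 - root]
    eigenvectors = []
    for lam in eigenvalues:
        # (a - lam) * x + b * y = 0 is solved by the direction (b, lam - a).
        x, y = b, lam - a
        norm = math.hypot(x, y)
        eigenvectors.append((x / norm, y / norm))
    return eigenvalues, eigenvectors

(lam1, lam2), (u1, u2) = eigen_symmetric_2x2(6.6707, 3.4170, 6.2384)

print(round(lam1, 4), round(lam2, 4))
print([round(c, 4) for c in u1], [round(c, 4) for c in u2])
print(round(u1[0] * u2[0] + u1[1] * u2[1], 10))  # dot product of the two eigenvectors
```

The second eigenvector may come out with the opposite sign to the one quoted in the slides; both describe the same axis, since an eigenvector multiplied by -1 is still an eigenvector.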
[Figure: data plotted with the PC 1 and PC 2 axes]
Prof Dr.Ndoh Mbue 17 11/8/2022
…The Algebra of PCA
The cross-products matrix computed among the p principal axes has a
simple form:
• all off-diagonal values are zero (the principal axes are uncorrelated);
• the diagonal values are the eigenvalues.
Variance-covariance matrix of the PC axes (rows and columns: PC1, PC2)
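This property can be checked numerically with the two-variable example from earlier in the lecture. The sketch below computes U^T S U, the variance-covariance matrix of the data after projection onto the principal axes, using the values of S and of the eigenvectors quoted in the slides; the helper functions `transpose` and `matmul` are hypothetical names written for this illustration.

```python
# Variance-covariance matrix S of X1 and X2, as quoted in the lecture.
S = [[6.6707, 3.4170],
     [3.4170, 6.2384]]

# Eigenvector matrix U: the columns are u1 and u2, as quoted in the lecture.
U = [[0.7291, -0.6844],
     [0.6844, 0.7291]]

def transpose(a):
    """Swap rows and columns of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    """Plain matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

# Variance-covariance matrix of the PC axes: U^T S U.
pc_cov = matmul(matmul(transpose(U), S), U)
for row in pc_cov:
    print([round(v, 3) for v in row])
```

Up to the rounding of the four-decimal inputs, the off-diagonal entries vanish and the diagonal entries match the two eigenvalues.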
(a) Compute the eigenvalues λ1 and λ2 of R and the corresponding eigenvectors v1 and v2 of
R.
(b) Show that λ1+λ2=tr(R), where the trace of a matrix equals the sum of its diagonal
components.
(c) Show that λ1 λ2 = |R|, where |R| is the determinant of the matrix.
(d) Compute the weights of the principal components w1 and w2 that set the scales of the
components and ensure that they are orthogonal.
(e) Compute the loadings of the variables.
(f) What proportion of the total variance in the data does the first principal component
account for?
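The steps of this exercise can be sketched for a generic 2 × 2 correlation matrix. The matrix R of the exercise is not reproduced in these notes, so the correlation r = 0.6 below is purely hypothetical. Treating the unit-length eigenvectors as the weights in (d), and the loadings in (e) as eigenvector components scaled by the square root of the eigenvalue, are assumptions about the intended conventions.

```python
import math

r = 0.6  # hypothetical correlation; not the value from the exercise
R = [[1.0, r],
     [r, 1.0]]

# (a) For a 2 x 2 correlation matrix the eigenvalues are 1 + r and 1 - r,
#     with eigenvectors along (1, 1) and (1, -1).
lam1, lam2 = 1.0 + r, 1.0 - r
v1 = (1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0))
v2 = (1.0 / math.sqrt(2.0), -1.0 / math.sqrt(2.0))

def apply(matrix, vector):
    """Matrix-vector product for a 2 x 2 matrix."""
    return tuple(sum(m * c for m, c in zip(row, vector)) for row in matrix)

Rv1 = apply(R, v1)  # should equal lam1 * v1
Rv2 = apply(R, v2)  # should equal lam2 * v2

# (b) Sum of eigenvalues versus trace, (c) product of eigenvalues versus determinant.
trace_R = R[0][0] + R[1][1]
det_R = R[0][0] * R[1][1] - R[0][1] * R[1][0]

# (d) Unit-length weights (assumed convention) and their orthogonality.
dot_v1_v2 = v1[0] * v2[0] + v1[1] * v2[1]

# (e) Loadings: eigenvector components scaled by sqrt(eigenvalue) (assumed convention).
loadings_pc1 = [math.sqrt(lam1) * c for c in v1]
loadings_pc2 = [math.sqrt(lam2) * c for c in v2]

# (f) Proportion of total variance carried by the first principal component.
proportion_pc1 = lam1 / trace_R

print(lam1 + lam2, trace_R)
print(lam1 * lam2, det_R)
print(proportion_pc1)
```

With standardized variables the total variance equals the number of variables, so the proportion in (f) is simply λ1 divided by 2 in the two-variable case.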
A variety of methods have been developed to extract factors from an intercorrelation matrix.
SPSS offers the following methods …
1) Principal components method (probably the most commonly used method)
2) Maximum likelihood method (a commonly used method)
3) Principal axis method, also known as common factor analysis
4) Unweighted least-squares method
5) Generalized least squares method
6) Alpha method
7) Image factoring
gs2000.sav
[File > Open > Data > (choose the C/D/E: drive, the SPSS folder, then gs2000.sav)]
To compute a principal component analysis in SPSS, select the Data Reduction | Factor… command from the Analyze menu.
Second, keep the Initial solution checkbox to get the statistics needed to determine the number of factors to extract.
Fifth, mark the Anti-image checkbox to get more outputs used to assess the appropriateness of factor analysis for the variables.
Sixth, click on the Continue button.
Second, click on the Continue button.
First, highlight the life variable.
Anti-image Matrices
CONDITION OF HEALTH         -.008   .050  -.052   .203  -.085  -.099   .749  -.102
IS LIFE EXCITING OR DULL     .108   .028  -.121  -.039  -.085  -.024  -.102   .876

On iteration 1, the MSA for all of the individual variables included in the …
First, highlight the health variable.
Rotated Component Matrix (a)
                            Component
                            1       2
RS HIGHEST DEGREE          .732   -.202
FATHERS HIGHEST DEGREE     .848    .031
MOTHERS HIGHEST DEGREE     .810    .169
GENERAL HAPPINESS          .145    .851
HAPPINESS OF MARRIAGE     -.145    .872
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.

On iteration 4, none of the variables demonstrated complex structure. It is not necessary to remove any additional variables because of complex structure.
Rotated Component Matrix (a)
                            Component
                            1       2
RS HIGHEST DEGREE          .732   -.202
FATHERS HIGHEST DEGREE     .848    .031
MOTHERS HIGHEST DEGREE     .810    .169
GENERAL HAPPINESS          .145    .851
HAPPINESS OF MARRIAGE     -.145    .872
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.

No variables need to be removed because they are the only variable loading on a component.
Communalities
                           Initial   Extraction
RS HIGHEST DEGREE           1.000      .577
FATHERS HIGHEST DEGREE      1.000      .720
MOTHERS HIGHEST DEGREE      1.000      .684
GENERAL HAPPINESS           1.000      .745
HAPPINESS OF MARRIAGE       1.000      .782
Extraction Method: Principal Component Analysis.

The communalities for all of the variables included on the components were greater than 0.50 and all variables had simple structure.
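The extraction communalities can be recovered from the rotated loadings: for an orthogonal rotation such as varimax, each variable's communality is the sum of its squared loadings across the retained components. The sketch below uses the loadings reported in the Rotated Component Matrix; the dictionary layout and variable names are choices made for this illustration.

```python
# Rotated loadings (component 1, component 2) as reported in the SPSS output.
loadings = {
    "RS HIGHEST DEGREE": (0.732, -0.202),
    "FATHERS HIGHEST DEGREE": (0.848, 0.031),
    "MOTHERS HIGHEST DEGREE": (0.810, 0.169),
    "GENERAL HAPPINESS": (0.145, 0.851),
    "HAPPINESS OF MARRIAGE": (-0.145, 0.872),
}

# Communality = sum of squared loadings over the retained components.
communalities = {
    name: sum(value * value for value in row)
    for name, row in loadings.items()
}

for name, h2 in communalities.items():
    # Flag whether the variable meets the 0.50 threshold used in the lecture.
    print(f"{name:<24} {h2:.3f} {'ok' if h2 > 0.50 else 'below 0.50'}")
```

The results agree with the Extraction column to within the rounding of the three-decimal loadings.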
The principal component analysis has been completed.
Is the ratio of cases to variables at least 5 to 1?
    If no: Incorrect application of a statistic
    If yes: continue
Is the measure of sampling adequacy larger than 0.50 for each variable?
    If no: Remove variable with lowest MSA and repeat analysis
    If yes: continue
Overall measure of sampling adequacy greater than 0.50?
    If no: Incorrect application of a statistic
    If yes: continue
Steps in principal component analysis - 3
Is the cumulative proportion of variance for variables 60% or higher?
    If no: True with caution
    If yes: continue
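The decision points that survive in these flowcharts can be collected into a small checklist function. This is only a sketch: it covers the four questions listed in these notes (cases-to-variables ratio, per-variable MSA, overall MSA, cumulative variance), the function name and the sample inputs are hypothetical, and the real procedure includes further steps that are not reproduced here.

```python
def review_pca_checks(n_cases, msa_by_variable, overall_msa, cumulative_variance_pct):
    """Apply the listed checks in order and return a list of notes (empty if all pass)."""
    notes = []
    n_variables = len(msa_by_variable)

    # Ratio of cases to variables should be at least 5 to 1.
    if n_cases / n_variables < 5:
        notes.append("cases-to-variables ratio below 5 to 1: incorrect application of a statistic")

    # Each variable's measure of sampling adequacy should be larger than 0.50.
    low_msa = {name: msa for name, msa in msa_by_variable.items() if msa <= 0.50}
    if low_msa:
        worst = min(low_msa, key=low_msa.get)
        notes.append(f"remove {worst} (lowest MSA) and repeat analysis")

    # The overall measure of sampling adequacy should be greater than 0.50.
    if overall_msa <= 0.50:
        notes.append("overall MSA not above 0.50: incorrect application of a statistic")

    # Cumulative proportion of variance should be 60% or higher.
    if cumulative_variance_pct < 60:
        notes.append("cumulative variance below 60%: true with caution")

    return notes

# Hypothetical inputs for illustration only.
notes = review_pca_checks(
    n_cases=100,
    msa_by_variable={"item_a": 0.75, "item_b": 0.88, "item_c": 0.42},
    overall_msa=0.71,
    cumulative_variance_pct=70.0,
)
print(notes)
```

In practice the function would be rerun after each variable removal, since dropping a variable changes the MSA values of the remaining ones.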