
Dimensionality Reduction, Principal Component Analysis algorithm, Factor analysis
Syllabus
• Module 1: (14 hours)
• Relation between Machine Learning and Statistics. Introduction to Algorithms in Machine Learning –
Classification, Supervised machine learning – linear regression, Multiple linear regression, Logistic
regression – Model representation, Discriminant Analysis, Classification Trees, Support Vector Machine.
• Module 2: (11 hours)
• Introduction to unsupervised learning - Clustering – types of clustering, Dimensionality Reduction,
Principal Component Analysis algorithm, Factor analysis.
• Module 3: (14 hours)
• Era of Intelligent Systems - The Fourth Industrial Revolution Impact, The Technology of the Fourth
Industrial Revolution, Introduction to Artificial Intelligence and Cognition. Application of artificial
intelligence (AI) techniques: Meta-heuristics: Genetic Algorithm, Scatter Search, Tabu Search, Particle
Swarm Intelligence, Ant Colony Optimization; Artificial Neural Networks; Fuzzy Logic Systems; Case
based reasoning.
Dimensionality Reduction
• To obtain a more accurate result, one is tempted to add as many features as possible at first.
• However, after a certain point, the performance of the model decreases as
the number of features increases.
• This phenomenon is often referred to as “The Curse of Dimensionality.”
• Dimensionality reduction is the process of reducing the number of random variables under
consideration by obtaining a set of principal variables.
• It is the summarization of data with many (p) variables by a smaller set of (k) derived
(synthetic, composite) variables.

Data Reduction

• The n-by-p data matrix A is reduced to an n-by-k matrix X of derived variables.
• “Residual” variation is information in A that is not retained in X.
• balancing act between
• clarity of representation, ease of understanding
• oversimplification: loss of important or relevant information.
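The trade-off above can be made concrete with a short sketch in Python (assuming NumPy and
scikit-learn; the data matrix here is synthetic, not from the slides). It projects a data
matrix A onto k principal components and measures the “residual” variation that the reduced
matrix X does not retain.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 6))      # n = 100 objects, p = 6 variables (toy data)

pca = PCA(n_components=2)          # keep k = 2 derived (synthetic) variables
X = pca.fit_transform(A)           # n x k score matrix

A_hat = pca.inverse_transform(X)   # rank-k reconstruction of A from X
residual = A - A_hat               # information in A not retained in X

print("variance retained by 2 PCs:", pca.explained_variance_ratio_.sum())
print("residual variance:", residual.var())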
Principal Component Analysis (PCA)
• Probably the most widely-used and well-known of the
“standard” multivariate methods
• invented by Pearson (1901) and Hotelling (1933)
• first applied in ecology by Goodall (1954) under the name “factor
analysis” (“principal factor analysis” is a synonym of PCA).
Principal Component Analysis (PCA)
• takes a data matrix of n objects by p variables, which
may be correlated,
• summarizes it by uncorrelated axes (principal components
or principal axes) that are linear combinations of the
original p variables
• the first k components display as much as possible of the
variation among objects.

Each principal component is a linear combination of the original variables, e.g.

F1 = a1V1 + a4V4 + a10V10
F2 = a2V2 + a6V6 + a8V8 + a9V9

[Illustration: a data matrix with objects 1–5 as rows and variables V1–V6 as columns]
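A brief sketch (Python with NumPy and scikit-learn; the small data matrix is made up for
illustration) showing that each component Fi is a linear combination of the original
variables, with the coefficients available as rows of pca.components_:

import numpy as np
from sklearn.decomposition import PCA

# 5 objects (rows) by 6 variables V1..V6 (columns), mirroring the illustration above
data = np.random.default_rng(1).normal(size=(5, 6))

pca = PCA(n_components=2).fit(data)

# Each row of components_ holds the coefficients (a1, ..., a6) of one component,
# so F1 = a1*V1 + ... + a6*V6 for the centred data.
print("coefficients of F1:", pca.components_[0])
print("coefficients of F2:", pca.components_[1])

# Scores: the value of F1 and F2 for each of the 5 objects
print(pca.transform(data))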
Main idea of Principal component analysis

• The main idea is to reduce the dimensionality of a data set consisting of
many variables correlated with each other,
• whether the variables are heavily or lightly correlated,
• while retaining the variation present in the dataset to the
maximum extent.
• The variables are transformed into a new set of variables, which
are known as the principal components (or simply, the PCs) and
are orthogonal.
Image Source: Machine Learning Lectures by Prof. Andrew NG at Stanford
University
Terminology
• Dimensionality: It is the number of random variables in a
dataset or simply the number of features, or rather more simply,
the number of columns present in your dataset.
• Correlation: It shows how strongly two variables are related to
each other. Its value ranges from -1 to +1.
• Positive indicates that when one variable increases, the other
increases as well,
• Negative indicates that the other decreases as the former increases.
• The modulus (absolute value) indicates the strength of the relationship.
• Orthogonal: Uncorrelated to each other, i.e., correlation
between any pair of variables is 0.
Terminology
• Eigenvectors: When factor analysis generates the factors, each
factor has an associated eigenvalue which provides the total
variance explained by each factor. We keep factors having
eigenvalue >1
So, consider a non-zero vector v.
It is an eigenvector of a square matrix A if Av is a scalar
multiple of v. Or simply:
Av = λv
• Here, v is the eigenvector and λ is the eigenvalue associated
with it.
• Covariance Matrix: This matrix consists of the covariances
between pairs of variables. Its (i, j)-th element is the
covariance between the i-th and j-th variables.
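A short NumPy sketch (toy data, not from the slides) illustrating Av = λv for a covariance
matrix: its eigenvectors are the orthogonal directions PCA uses, and its eigenvalues measure
the variance along each direction.

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3))                 # 200 observations of 3 variables

cov = np.cov(data, rowvar=False)                 # 3 x 3 covariance matrix A

eigenvalues, eigenvectors = np.linalg.eigh(cov)  # eigh is suited to symmetric matrices

# Check A v = lambda v for the first eigenpair
v, lam = eigenvectors[:, 0], eigenvalues[0]
print(np.allclose(cov @ v, lam * v))             # True

# Eigenvectors are mutually orthogonal (uncorrelated directions)
print(np.allclose(eigenvectors.T @ eigenvectors, np.eye(3)))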
Terminology
• Partitioning the variance of a variable: The total variance of any
variable can be divided into three types of variance.
1. Common variance – is defined as that variance in a variable that
is shared with all other variables in the analysis. A variable’s
communality is the estimate of its shared or common variance
2. Specific variance (or unique) – is that variance associated with
only a specific variable. This cannot be explained by the
correlations to the other variables, but uniquely associated with this
variable
3. Error variance – variance that cannot be explained, due to unreliability in
data collection or a random component in the phenomenon.
• Total variance consists of all three.
• The more highly a variable is correlated with one or more other variables,
the greater its common variance, and vice versa.
Common Factor Analysis versus
Component Analysis
• Selection criteria: (1) objectives of the factor analysis
(2) amount of prior knowledge about the variance in the variables
• Component Analysis is used when the objective is to summarize
most of the original information in a minimum number of factors
• Common Factor Analysis is used primarily to identify underlying
factors or dimensions that reflect what the variables share in
common.
• Component Analysis, also known as Principal Component
Analysis, considers the total variance and derives factors that
contain small proportions of unique variance and error variance.
Common Factor Analysis versus
Component Analysis
• Common Factor Analysis considers only common or shared variance.
• It is assumed that both unique and error variance are not of interest in
defining the structure of the variables.
• Component analysis is most appropriate when data reduction is the
primary concern, focusing on the minimum number of factors needed to
represent the total variance,
• and when prior knowledge suggests that specific and error variance represent a
relatively small proportion of the total variance.
• Common Factor Analysis is used to identify the latent dimensions or
constructs,
• when the researcher has little knowledge about the amount of specific and error variance.
Criteria for the number of factors
• Latent Root Criterion
• A priori Criterion
• Percentage of variance criterion
• Scree Test Criterion
Interpreting the factors
• Three processes of factor interpretation
• Estimate the factor matrix – contains factor loadings for each
variable
• Factor loadings are the correlations between each variable and the factor.
Higher loadings make the variable more representative of the factor
• Factor Rotation – reference axes of the factors are turned about the
origin until some other position has been reached.
• Orthogonal factor rotation – EQUIMAX, VARIMAX, QUARTIMAX
• Oblique factor rotation – OBLIMIN, PROMAX, ORTHOBLIQUE
• The varimax method maximizes the sum of the variances of the squared loadings of
the factor matrix
• Factor interpretation and respecification
Terminology
• Eigenvalue:
  Percentage of variation explained by F1 = Eigenvalue of Factor 1 / Number of variables
• Communality: the amount of variance in a variable
explained by the different factors.
Example
• Following correlation matrix between six variables was obtained
      V1    V2    V3    V4    V5    V6
V1    1     0.8   0.9   0.1   0.06  0.08
V2          1     0.85  0.08  0.1   0.04
V3                1     0.02  0.05  0.04
V4                      1     0.9   0.08
V5                            1     0.02
V6                                  1

• There is a high correlation between V1, V2 and V3.
• There is a high correlation between V4 and V5.
• Hence Factor F1 measures V1, V2 and V3;
• Factor F2 measures V4 and V5; Factor F3 measures V6.
Example
• After rotation, the factor matrix below is obtained
            F1      F2      F3      Communalities
V1          0.86    0.12    0.04    0.76
V2          0.84    0.18    0.10    0.75
V3          0.68    0.24    0.15    0.54
V4          0.10    0.92    0.05    0.86
V5          0.06    0.94    0.08    0.89
V6          0.12    0.14    0.89    0.83
Eigenvalue  1.9356  1.8540  0.8340

Eigenvalue of F1 is calculated as
0.86² + 0.84² + 0.68² + 0.10² + 0.06² + 0.12² = 1.9356
Example
Communality for V1 is calculated from the rotated factor matrix above as
0.86² + 0.12² + 0.04² = 0.76

Percentage of variation explained by F1 = Eigenvalue of Factor 1 / Number of variables
                                        = 1.9356 / 6 = 0.3226
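The same arithmetic, sketched in Python with NumPy (only the loading matrix from the table
above is used): eigenvalues are column sums of squared loadings, communalities are row sums,
and the percentage of variation divides each eigenvalue by the number of variables.

import numpy as np

# Rotated factor loadings from the example (rows: V1..V6, columns: F1..F3)
loadings = np.array([
    [0.86, 0.12, 0.04],
    [0.84, 0.18, 0.10],
    [0.68, 0.24, 0.15],
    [0.10, 0.92, 0.05],
    [0.06, 0.94, 0.08],
    [0.12, 0.14, 0.89],
])

eigenvalues   = (loadings ** 2).sum(axis=0)      # per factor: 1.9356, 1.8540, ...
communalities = (loadings ** 2).sum(axis=1)      # per variable: approx. 0.76, 0.75, ...
pct_variation = eigenvalues / loadings.shape[0]  # e.g. 1.9356 / 6 = 0.3226 for F1

print(eigenvalues)
print(communalities)
print(pct_variation)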
Interpretation of factor
analysis using SPSS
• Descriptive statistics
• The mean, standard
deviation and number of
respondents (N) who
participated in the survey are
given.
Interpretation of factor
analysis using SPSS
• The correlation matrix
• This provides the
correlation coefficients
between each variable
and every other variable
in the study.
Interpretation of factor analysis using
SPSS
• A KMO value > 0.7 indicates good sampling adequacy.
• Kaiser (1974) recommends 0.5 as the minimum (barely acceptable)
KMO value, considers values between 0.7 and 0.8 acceptable, and
values above 0.9 superb.
• Bartlett’s test of sphericity tests the hypothesis that the
correlation matrix is an identity matrix
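In Python, the KMO measure and Bartlett’s test can be sketched with the third-party
factor_analyzer package (the package, the DataFrame and the file name below are assumptions,
not from the slides):

import pandas as pd
from factor_analyzer.factor_analyzer import calculate_kmo, calculate_bartlett_sphericity

df = pd.read_csv("survey_responses.csv")   # hypothetical survey data

# Kaiser-Meyer-Olkin measure of sampling adequacy (0.5 minimum, >0.7 good, >0.9 superb)
kmo_per_item, kmo_overall = calculate_kmo(df)
print("overall KMO:", kmo_overall)

# Bartlett's test: null hypothesis is that the correlation matrix is an identity matrix
chi_square, p_value = calculate_bartlett_sphericity(df)
print("Bartlett chi-square:", chi_square, "p-value:", p_value)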
Interpretation of factor
analysis using SPSS
• Communalities
• A communality is the sum of the squared
component loadings
• This represents the amount of variance in
that variable accounted for by all the
components.
• For example, two extracted components
account for 73.1% of the variance in
variable ‘Instruc Well prepared’
• A variable’s communality should be more than
0.5 for it to be retained for further analysis;
otherwise, remove it from further steps of the
factor analysis.
Interpretation of factor
analysis using SPSS
• Scree plot
• The scree plot is a graph of the
eigenvalues against the factors.
• This graph can be used to determine the
number of factors to be retained.
• The cut-off point is where the curve
starts to flatten or where the eigenvalue drops below 1.
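A minimal scree-plot sketch in Python (assuming NumPy, scikit-learn and matplotlib; the data
is synthetic), plotting eigenvalues against component number with the eigenvalue = 1 cut-off:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

data = np.random.default_rng(0).normal(size=(100, 8))   # toy data: 100 cases, 8 variables
pca = PCA().fit(data)                                    # keep all components

component_number = np.arange(1, len(pca.explained_variance_) + 1)
plt.plot(component_number, pca.explained_variance_, "o-")  # eigenvalue per component
plt.axhline(1.0, linestyle="--")                           # eigenvalue > 1 (latent root) cut-off
plt.xlabel("Component number")
plt.ylabel("Eigenvalue")
plt.title("Scree plot")
plt.show()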
Interpretation of factor
analysis using SPSS
• Component matrix
• This matrix provides the loadings
of each of the selected variables
on the two extracted components.
• The higher the absolute value of the
loading, the more the factor
contributes to the variable.
Interpretation of factor
analysis using SPSS
• Rotated component matrix
• Rotation reduces the
number of factors on which the
variables under investigation
have high loadings.
• Rotation does not actually
change anything but makes the
interpretation of the analysis
easier.
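A varimax rotation can be sketched in Python with the third-party factor_analyzer package
(the package, the DataFrame and the file name are assumptions, not part of the slides):

import pandas as pd
from factor_analyzer import FactorAnalyzer

df = pd.read_csv("survey_responses.csv")              # hypothetical survey data

fa = FactorAnalyzer(n_factors=2, rotation="varimax")  # orthogonal varimax rotation
fa.fit(df)

print(fa.loadings_)            # rotated factor loadings (rotated component matrix)
print(fa.get_communalities())  # communality of each variable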
