DIMENSIONALITY REDUCTION

MODULE-III

Dimensionality Reduction: Introduction, Subset Selection, Principal Components Analysis, Factor Analysis, Multidimensional Scaling, Linear Discriminant Analysis, Isomap, Locally Linear Embedding
Decision Trees: Introduction, Univariate Trees, Pruning, Rule Extraction from Trees, Learning Rules from Data, Multivariate Trees
Introduction to Linear Discrimination, Generalizing the Linear Model



REFERENCES

Ref-3: Ethem Alpaydin, Introduction to Machine Learning, Second Edition, MIT Press, 2010 [Ch. 6, 9, 10]



INTRODUCTION
 Complexity of any classifier or regressor depends on the number of inputs.

 Data contain information that is taken as input and fed to the system for decision making.

 Ideally, the classifier (or regressor) should be able to use whichever features are necessary, discarding the irrelevant ones.



NEED FOR DIMENSIONALITY REDUCTION
 Curse of dimensionality: the number of training samples needed grows rapidly with the number of dimensions.

 Computation time and storage grow with the number of dimensions.

 Low-dimensional data can be visualized easily.

 Reducing dimensionality helps eliminate irrelevant features and reduce noise.



PROJECTING INTO ANY ONE ATTRIBUTE



Why Reduce Dimensionality?
 Complexity depends on the number of input dimensions d and the size of the data sample N.
• To reduce memory and computation, reduce the dimensionality of the problem.
• Decreasing d also decreases the complexity of the inference algorithm during testing.

 When an input is found to be unnecessary, we save the cost of extracting it.

 Simpler models are more robust on small datasets.
• Simpler models have less variance, that is, they vary less depending on the particulars of a sample, including noise, outliers, etc.



Why Reduce Dimensionality? (contd.)

 With fewer features, we get a better idea about the underlying process, which allows knowledge extraction.

 When data can be represented in a few dimensions without loss of information, it can be plotted and analyzed visually for structure and outliers.



Two Methods for Reducing Dimensionality

 Feature Selection
• find k of the d dimensions that give us the most information
• discard the other (d − k) dimensions.

 Feature Extraction
• find a new set of k dimensions that are combinations of the original d
dimensions.



Subset Selection

 Finding the best subset of the set of features.


 The best subset contains the least number of dimensions that most contribute to
accuracy.
 Using a suitable error function, eliminate the irrelevant dimensions.
 This can be used in both regression and classification problems.
 There are 2^d possible subsets of d variables.
 It is not possible to test all of them unless d is small, so we employ heuristics to get a reasonable (but not optimal) solution in reasonable (polynomial) time.



Subset Selection
 Two approaches
• Forward selection

• Backward selection



Forward Selection

 Start with no variables and add them one by one.

 At each step, add the variable that decreases the error the most.

 Stop when any further addition does not decrease the error (or decreases it only slightly); a code sketch follows below.

https://quantifyinghealth.com/stepwise-selection/
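A minimal sketch of greedy forward selection, assuming scikit-learn is available; X, y are a hypothetical dataset, a logistic-regression classifier stands in for whatever model is being fit, and the cross-validated score stands in for the validation-set error mentioned later.

```python
# Greedy forward selection sketch (assumptions: scikit-learn installed; X, y hypothetical).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def forward_selection(X, y, cv=5):
    n_features = X.shape[1]
    selected = []            # indices of chosen features, in order of addition
    best_score = -np.inf
    while len(selected) < n_features:
        # Score every candidate feature when added to the current subset.
        candidates = [j for j in range(n_features) if j not in selected]
        scores = {
            j: cross_val_score(LogisticRegression(max_iter=1000),
                               X[:, selected + [j]], y, cv=cv).mean()
            for j in candidates
        }
        j_best = max(scores, key=scores.get)
        if scores[j_best] <= best_score:   # no improvement: stop adding
            break
        selected.append(j_best)
        best_score = scores[j_best]
    return selected
```

For example, `forward_selection(X, y)` returns the column indices of the selected features.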



Backward Selection

 Start with all variables and remove them one by one.

 At each step, remove the variable whose removal decreases the error the most (or increases it the least).

 Stop when any further removal increases the error significantly; see the sketch below.

https://quantifyinghealth.com/stepwise-selection/
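Backward elimination mirrors the forward loop above in reverse; as a hedged alternative sketch, scikit-learn's SequentialFeatureSelector covers both directions (same hypothetical X, y; the target of 3 kept features is an arbitrary assumption).

```python
# Backward elimination sketch via scikit-learn (assumptions: X, y as before).
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

sfs = SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                                n_features_to_select=3,   # assumed target subset size
                                direction="backward", cv=5)
X_reduced = sfs.fit_transform(X, y)   # keeps only the selected columns
```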



Subset Selection
In either case,

• Error checking should be done on a validation set distinct from the training set, because we want to test generalization accuracy.
• With more features, we generally have lower training error, but not necessarily lower validation error.



Principal Component Analysis
• A projection method: find a mapping from the inputs in the original d-dimensional space to a new k-dimensional space (k < d), with minimum loss of information.

• An unsupervised method: it does not use the output information.

• The criterion to be maximized is the variance.



Principal Component Analysis

• Principal component analysis, or PCA, is a dimensionality-reduction method


• that is often used to reduce the dimensionality of large data sets,
• by transforming a large set of variables into a smaller one
• that still contains most of the information in the large set.



Principal Component Analysis

• A principal component is a straight line (a direction) that captures as much of the variance of the data as possible.
• Principal components have a direction and a magnitude.
• Projecting the data onto the principal components gives an orthogonal (perpendicular) projection onto a lower-dimensional space.

https://www.simplilearn.com/tutorials/machine-learning-tutorial/principal-component-analysis
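A quick end-to-end sketch of the idea, assuming scikit-learn is available and X is a hypothetical data matrix of shape (n_samples, n_features); the step-by-step construction follows in the next slides.

```python
# High-level sketch: project a hypothetical data matrix X onto its first two principal components.
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_std = StandardScaler().fit_transform(X)   # standardize first (Step 1 below)
pca = PCA(n_components=2)
Z = pca.fit_transform(X_std)                # projected data, shape (n_samples, 2)
print(pca.explained_variance_ratio_)        # share of variance carried by each component
```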



PCA Step-1 Standardisation of data
• to standardize the range of continuous initial variables so that each one of them
contributes equally to the analysis.
• If there are large differences between the ranges of the initial variables, the variables with larger ranges will dominate over those with small ranges
• (for example, a variable that ranges between 0 and 100 will dominate over a variable that ranges between 0 and 1),
• which will lead to biased results.
• So, transforming the data to comparable scales prevents this problem.



PCA Step-1 Standardisation of data
• Mathematically, this is done by subtracting the mean and dividing by the standard deviation for each value of each variable: z = (x − μ) / σ.
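A minimal sketch of this standardization with NumPy; the small data matrix X here is hypothetical (rows are samples, columns are variables).

```python
import numpy as np

# Z-score standardization of a hypothetical data matrix X.
X = np.array([[170.0, 0.6],
              [180.0, 0.9],
              [160.0, 0.3]])
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
# Each column of X_std now has mean 0 and standard deviation 1,
# so variables with large ranges no longer dominate.
```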



PCA Step-2 Covariance Matrix Computation

Aim: to understand how the variables of the input data set vary from the mean with respect to each other, or, in other words, to see if there is any relationship between them.

Sometimes variables are highly correlated in such a way that they contain redundant information.
So, in order to identify these correlations, we compute the covariance matrix.



PCA Step-2 Covariance Matrix Computation
• The covariance matrix is a p × p symmetric matrix (where p is the number of
dimensions) that has as entries the covariances associated with all possible pairs of
the initial variables.

• For example, for a 3-dimensional data set with 3 variables x, y, and z, the covariance matrix is a 3×3 matrix of this form:
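Written out (the symbol Σ for the covariance matrix is our notation, not the slides'):

$$
\Sigma =
\begin{pmatrix}
\mathrm{Cov}(x,x) & \mathrm{Cov}(x,y) & \mathrm{Cov}(x,z)\\
\mathrm{Cov}(y,x) & \mathrm{Cov}(y,y) & \mathrm{Cov}(y,z)\\
\mathrm{Cov}(z,x) & \mathrm{Cov}(z,y) & \mathrm{Cov}(z,z)
\end{pmatrix}
$$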



PCA Step-2 Covariance Matrix Computation

• The covariance of a variable with itself is its variance: Cov(a, a) = Var(a).

• So the main diagonal (top left to bottom right) contains the variances of each initial variable.
• Since covariance is commutative, Cov(a, b) = Cov(b, a), the entries of the covariance matrix are symmetric with respect to the main diagonal, which means that the upper and the lower triangular portions are equal.



PCA Step-2 Covariance Matrix Computation

• If an entry of the covariance matrix is:
• positive: the two variables increase or decrease together (positively correlated);
• negative: one increases when the other decreases (inversely correlated).
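A short sketch computing this matrix with NumPy, assuming X_std is the standardized data from the Step-1 snippet.

```python
import numpy as np

# Covariance matrix of the standardized data: a p x p symmetric matrix.
# rowvar=False because rows are samples and columns are variables.
cov_matrix = np.cov(X_std, rowvar=False)
print(cov_matrix.shape)                         # (p, p)
print(np.allclose(cov_matrix, cov_matrix.T))    # True: symmetric about the main diagonal
```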



PCA Step-3 Compute Eigenvalues and Eigenvectors of the
Covariance Matrix to Find the Principal Components
• Eigenvectors and eigenvalues are the linear algebra concepts that are to be
computed from the covariance matrix in order to determine the principal
components of the data.

• Principal components are new variables that are constructed as linear


combinations or mixtures of the initial variables.

• These combinations are done in such a way that the new variables (i.e., principal
components) are uncorrelated and most of the information within the initial
variables is squeezed or compressed into the first components.



PCA Step-3 Compute Eigenvalues and Eigenvectors of the
Covariance Matrix to Find the Principal Components

A 10-dimensional data set gives 10 principal components, but PCA tries to put the maximum possible information in the first component, then the maximum remaining information in the second, and so on.



PCA Step-3 Compute Eigenvalues and Eigenvectors of the
Covariance Matrix to Find the Principal Components
Principal components represent the directions of the data that carry a maximal amount of variance, i.e., the lines that capture the most information in the data.
The relationship between variance and information is that the larger the variance carried by a line, the larger the dispersion of the data points along it, and the larger the dispersion along a line, the more information it carries.

Simply put, principal components are new axes that provide the best angle from which to see and evaluate the data, so that the differences between the observations are more visible.
Constructing Principal Components

• The number of principal components equals the number of variables in the data.

• Principal components are constructed in such a manner that the first principal component accounts for the largest possible variance in the data set.



Constructing Principal Components
Let's assume the scatter plot of our data set is given.

The first principal component is approximately the line that matches the purple marks: it goes through the origin, and it is the line along which the projection of the points (red dots) is the most spread out.

Mathematically, it is the line that maximizes the variance, i.e., the average of the squared distances from the projected points (red dots) to the origin.
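In symbols (our notation, not the slides': w is the direction of the candidate line and Σ the covariance matrix of the centered data), the first principal component direction is

$$
\mathbf{w}_1 = \arg\max_{\lVert\mathbf{w}\rVert = 1} \operatorname{Var}(\mathbf{w}^{\top}\mathbf{x})
= \arg\max_{\lVert\mathbf{w}\rVert = 1} \mathbf{w}^{\top}\Sigma\,\mathbf{w},
$$

and the maximizer is the eigenvector of Σ with the largest eigenvalue.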
Constructing Principal Components

• The second principal component is calculated in the same way, with the condition that it is uncorrelated with (i.e., perpendicular to) the first principal component and that it accounts for the next highest variance.

• This continues until a total of p principal components have been calculated, equal to the original number of variables.



Constructing Principal Components

• Let's go back to eigenvectors and eigenvalues.
• They always come in pairs: every eigenvector has an eigenvalue.
• Their number is equal to the number of dimensions of the data.
• For example, for a 3-dimensional data set, there are 3 variables, therefore there are 3 eigenvectors with 3 corresponding eigenvalues.



PCA Step-3 Compute Eigenvalues and Eigenvectors of the
Covariance Matrix to Find the Principal Components

• The eigenvectors of the covariance matrix are the directions of the axes along which there is the most variance (most information); these directions are what we call the principal components.
• The eigenvalues are simply the coefficients attached to the eigenvectors, giving the amount of variance carried by each principal component.

By ranking the eigenvectors in order of their eigenvalues, highest to lowest, we obtain the principal components in order of significance. A minimal code sketch follows.
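A sketch of this step with NumPy, continuing from the cov_matrix computed in the Step-2 snippet (assumed names).

```python
import numpy as np

# Eigen-decomposition of the symmetric covariance matrix from Step 2.
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)   # columns of eigenvectors are the v_i

# Rank the components by eigenvalue, highest to lowest.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]
# eigenvectors[:, 0] is now the direction of PC1 and eigenvalues[0] the variance it carries.
```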



Example
Let's suppose that our data set is 2-dimensional with 2 variables x, y, and that the covariance matrix has eigenvectors v1, v2 with corresponding eigenvalues λ1, λ2.

If we rank the eigenvalues in descending order, we get λ1 > λ2, which means that the eigenvector that corresponds to the first principal component (PC1) is v1 and the one that corresponds to the second component (PC2) is v2.



Example

• After obtaining the principal components, to compute the percentage of variance (information) accounted for by each component, divide the eigenvalue of each component by the sum of the eigenvalues.

• Applying this to the example above, we find that PC1 and PC2 carry 96% and 4% of the variance of the data, respectively.
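A small check of this arithmetic, using hypothetical eigenvalues chosen only so that the split matches the 96%/4% figure quoted above.

```python
import numpy as np

eigenvalues = np.array([1.284, 0.049])        # hypothetical values for λ1, λ2
explained = eigenvalues / eigenvalues.sum()
print(explained)                              # ≈ [0.963, 0.037], i.e. about 96% and 4%
```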



Step-4 Feature Vector
• Decide whether to keep all these components or discard those of lesser significance (of low eigenvalues), and

• form with the remaining ones a matrix of vectors that we call the feature vector.
• So, the feature vector is simply a matrix that has as columns the eigenvectors of the components that we decide to keep.

• This is the first step towards dimensionality reduction, because if we choose to keep only k eigenvectors (components) out of the original p, the final data set will have only k dimensions.
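A short sketch of building the feature vector, keeping the top k components (the names continue from the earlier snippets; the value of k is an assumed choice).

```python
k = 1                                   # assumed number of components to keep
# Feature vector: the eigenvectors of the kept components, as columns.
feature_vector = eigenvectors[:, :k]    # shape (p, k)
```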



Step-4 Feature Vector Example

Continuing with the example from the previous step, we can either form a feature vector with both of the eigenvectors v1 and v2, or discard the eigenvector v2, which is the one of lesser significance, and form a feature vector with v1 only.



Step-4 Feature Vector Example

• It is up to you whether to keep all the components or discard the ones of lesser significance, depending on what you are looking for.

• If you just want to describe your data in terms of new, uncorrelated variables (principal components) without seeking to reduce dimensionality, leaving out the less significant components is not needed.



Step-5 RECAST DATA ALONG PRINCIPAL COMPONENTS

• The aim is to use the feature vector formed from the eigenvectors of the covariance matrix

• to reorient the data from the original axes to the ones represented by the principal components (hence the name Principal Components Analysis).

• This can be done by multiplying the transpose of the feature vector by the transpose of the standardized original data set, or equivalently by multiplying the standardized data matrix by the feature vector. A minimal sketch follows.




THANK YOU
