Prof. Dr. George Mathew B.Sc., B.Tech, PGDCA, PGDM, MBA, PhD

Dimension Reduction Techniques

Unsupervised dimensionality reduction is a commonly used approach in feature preprocessing. It removes noise from the data (noise that can also degrade the predictive performance of certain algorithms) and compresses the data onto a smaller-dimensional subspace while retaining most of the relevant information. Dimensionality reduction can also be useful for visualizing data; for example, a high-dimensional feature set can be projected onto one-, two-, or three-dimensional feature spaces in order to visualize it via 2D or 3D scatterplots or histograms.

Some of the dimensionality reduction techniques are:
• Principal component analysis (PCA) for unsupervised data compression
• Linear discriminant analysis (LDA) as a supervised dimensionality reduction technique for maximizing class separability
• Nonlinear dimensionality reduction via kernel principal component analysis (KPCA)

Principal component analysis (PCA)

PCA is a powerful decomposition technique; it allows one to break down a highly multivariate dataset into a set of orthogonal components. When taken together in sufficient number, these components can explain almost all of the dataset's variance. PCA works by successively identifying the axes of greatest variance in a dataset (the principal components). It does this as follows:
1. Identifying the center point of the dataset.
2. Calculating the covariance matrix of the data.
3. Calculating the eigenvectors of the covariance matrix.
4. Orthonormalizing the eigenvectors.
5. Calculating the proportion of variance represented by each eigenvector.

Covariance is effectively variance applied to multiple dimensions; it is the variance between two or more variables.
While a single value can capture the variance in one dimension or variable, it is necessary to use a 2 x 2 matrix to capture the covariance between two variables, a 3 x 3 matrix to capture the covariance between three variables, and so on. So the first step in PCA is to calculate this covariance matrix.

An eigenvector is a vector that is specific to a dataset and a linear transformation. Specifically, it is a vector whose direction does not change when the transformation is performed.

Orthogonalization is the process of finding two vectors that are orthogonal (at right angles) to one another. In an n-dimensional data space, orthogonalization takes a set of vectors and yields a set of orthogonal vectors. Orthonormalization is an orthogonalization process that also normalizes the resulting vectors.

The eigenvalue (roughly corresponding to the length of the eigenvector) is used to calculate the proportion of variance represented by each eigenvector. This is done by dividing the eigenvalue of each eigenvector by the sum of the eigenvalues of all eigenvectors.

The main steps behind principal component analysis

PCA helps us to identify patterns in data based on the correlation between features. In a nutshell, PCA aims to find the directions of maximum variance in high-dimensional data and projects the data onto a new subspace with equal or fewer dimensions than the original one. The orthogonal axes (principal components) of the new subspace can be interpreted as the directions of maximum variance, given the constraint that the new feature axes are orthogonal to each other, as illustrated in the figure:

[Figure: Principal Component Analysis]

If we use PCA for dimensionality reduction, we construct a d×k-dimensional transformation matrix, W, that allows us to map a vector of the features of a training example, x, onto a new k-dimensional feature subspace that has fewer dimensions than the original d-dimensional feature space.
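The covariance matrix, eigendecomposition, and explained-variance calculation described above can be sketched with NumPy. The tiny two-feature dataset below is made up purely for illustration:

```python
import numpy as np

# Made-up dataset for illustration: 5 samples, 2 correlated features
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0]])

# Covariance matrix of the features (2 features -> a 2 x 2 matrix)
cov = np.cov(X, rowvar=False)

# Eigendecomposition: eigenvectors give the principal directions,
# eigenvalues the variance along those directions. np.linalg.eigh is
# appropriate because a covariance matrix is symmetric, and it returns
# orthonormal eigenvectors.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Proportion of variance represented by each eigenvector:
# each eigenvalue divided by the sum of all eigenvalues
variance_ratio = eigenvalues / eigenvalues.sum()
print(variance_ratio)
```

Note that the returned eigenvectors are already orthonormal, so the orthonormalization step comes for free here.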
As a result of transforming the original d-dimensional data onto this new k-dimensional subspace (typically k << d), the first principal component will have the largest possible variance. All subsequent principal components will have the largest possible variance under the constraint that they are uncorrelated (orthogonal) to the other principal components; even if the input features are correlated, the resulting principal components will be mutually orthogonal (uncorrelated). Note that the PCA directions are highly sensitive to data scaling, so we need to standardize the features prior to PCA if the features were measured on different scales and we want to assign equal importance to all features.

Before looking at the PCA algorithm for dimensionality reduction in more detail, let's summarize the approach in a few simple steps:
1. Standardize the d-dimensional dataset.
2. Construct the covariance matrix.
3. Decompose the covariance matrix into its eigenvectors and eigenvalues.
4. Sort the eigenvalues in decreasing order to rank the corresponding eigenvectors.
5. Select the k eigenvectors that correspond to the k largest eigenvalues, where k is the dimensionality of the new feature subspace (k ≤ d).
6. Construct a projection matrix, W, from the "top" k eigenvectors.
7. Transform the d-dimensional input dataset, X, using the projection matrix, W, to obtain the new k-dimensional feature subspace.

Extracting the principal components step by step

In this subsection, we will tackle the first four steps of a PCA:
1. Standardizing the data
2. Constructing the covariance matrix
3. Obtaining the eigenvalues and eigenvectors of the covariance matrix
4. Sorting the eigenvalues by decreasing order to rank the eigenvectors
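The steps summarized above can be sketched in NumPy. The random dataset, its dimensions, and the choice k = 2 are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up dataset for illustration: 100 samples, d = 4 features
# with deliberately different scales per feature
X = rng.normal(size=(100, 4)) * np.array([1.0, 5.0, 0.5, 2.0])

# Step 1: standardize each feature to zero mean and unit variance
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: construct the covariance matrix (d x d)
cov_mat = np.cov(X_std, rowvar=False)

# Step 3: decompose the covariance matrix into eigenvalues/eigenvectors
eigen_vals, eigen_vecs = np.linalg.eigh(cov_mat)

# Step 4: sort the eigenvalues in decreasing order to rank eigenvectors
order = np.argsort(eigen_vals)[::-1]
eigen_vals = eigen_vals[order]
eigen_vecs = eigen_vecs[:, order]

# Steps 5-7: build the d x k projection matrix W from the top
# k eigenvectors and project the data onto the new subspace
k = 2
W = eigen_vecs[:, :k]
X_pca = X_std @ W
print(X_pca.shape)  # (100, 2)
```

Standardizing first matters here: without it, the second feature (scaled by 5.0) would dominate the first principal component simply because of its units.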