
Week 4 - LAQs

Explain in detail the methods of Dimensionality Reduction.

Machine Learning: Machine learning is a field of study that allows computers to
"learn" in a human-like way without being explicitly programmed.

Predictive Modeling: Predictive modeling is a probabilistic process that allows us
to forecast outcomes on the basis of some predictors. These predictors are
basically the features that come into play when deciding the final result, i.e. the
outcome of the model.

Dimensionality reduction is the process of reducing the number of features (or
dimensions) in a dataset while retaining as much information as possible. This can
be done for a variety of reasons, such as to reduce the complexity of a model, to
improve the performance of a learning algorithm, or to make it easier to visualize
the data.
The dimensionality reduction technique can be defined as "a way of converting a
higher-dimensional dataset into a lower-dimensional dataset while ensuring that it
provides similar information." These techniques are widely used in machine
learning to obtain a better-fitting predictive model while solving classification
and regression problems.
It is commonly used in the fields that deal with high-dimensional data, such as
speech recognition, signal processing, bioinformatics, etc. It can also be used for
data visualization, noise reduction, cluster analysis, etc.
Dimensionality reduction is a technique used to reduce the number of features in a
dataset while retaining as much of the important information as possible. In other
words, it is a process of transforming high-dimensional data into a lower-
dimensional space that still preserves the essence of the original data.

In machine learning, high-dimensional data refers to data with a large number of
features or variables. The curse of dimensionality is a common problem in machine
learning, where the performance of the model deteriorates as the number of features
increases. This is because the complexity of the model increases with the number
of features, and it becomes more difficult to find a good solution. In addition, high-
dimensional data can also lead to overfitting, where the model fits the training data
too closely and does not generalize well to new data.
Dimensionality reduction can help to mitigate these problems by reducing the
complexity of the model and improving its generalization performance. There are
two main approaches to dimensionality reduction: feature selection and feature
extraction.

Approaches to Dimensionality Reduction


There are two ways to apply the dimension reduction technique, which are given
below:

Feature Selection
Feature selection is the process of selecting a subset of the relevant features and leaving out
the irrelevant features present in a dataset in order to build a model of high accuracy. In other
words, it is a way of selecting the optimal features from the input dataset.
Three methods are used for the feature selection:

1. Filter Methods
In this method, the dataset is filtered, and a subset that contains only the relevant features is
taken. Some common techniques of the filter method are listed below, followed by a short sketch:
o Correlation
o Chi-Square Test
o ANOVA
o Information Gain, etc.
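
As an illustration, the following is a minimal sketch of a filter-style selection using
scikit-learn's SelectKBest with the chi-square test; the Iris dataset and the choice of
keeping two features are assumptions made only for this example.

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

# Load a small toy dataset with 4 non-negative features (chi2 requires non-negative values)
X, y = load_iris(return_X_y=True)

# Keep the 2 features with the highest chi-square scores against the target
selector = SelectKBest(score_func=chi2, k=2)
X_reduced = selector.fit_transform(X, y)
print(selector.get_support())   # boolean mask of the retained features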

2. Wrapper Methods
The wrapper method has the same goal as the filter method, but it uses a machine learning
model for its evaluation. In this method, some features are fed to the ML model and its
performance is evaluated. The performance decides whether to add or remove those features to
increase the accuracy of the model. This method is more accurate than the filter method but more
complex to work with. Some common techniques of wrapper methods are:
o Forward Selection
o Backward Selection
o Bi-directional Elimination

3. Embedded Methods
Embedded methods evaluate the importance of each feature across the different training iterations
of the machine learning model. Some common techniques of embedded methods are listed below,
followed by a short sketch:
o LASSO
o Elastic Net
o Ridge Regression, etc.
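
For illustration, here is a minimal sketch of an embedded method using LASSO: the L1 penalty
drives uninformative coefficients to exactly zero, so the non-zero coefficients act as the
selected features. The diabetes dataset and the alpha value are assumptions made only for this
example; alpha would normally be tuned.

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso

# Fit a LASSO model; the L1 penalty shrinks some coefficients exactly to zero
X, y = load_diabetes(return_X_y=True)
lasso = Lasso(alpha=0.5).fit(X, y)

# Features whose coefficients survived the penalty are the "selected" ones
selected = np.flatnonzero(lasso.coef_)
print(selected)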

Feature Extraction
Feature extraction is the process of transforming a space with many dimensions into a space
with fewer dimensions. This approach is useful when we want to keep the whole information but
use fewer resources while processing it.
Some common feature extraction techniques are:
a. Principal Component Analysis
b. Linear Discriminant Analysis
c. Kernel PCA
d. Quadratic Discriminant Analysis

Common techniques of Dimensionality Reduction


a. Principal Component Analysis
b. Backward Elimination
c. Forward Selection
d. Score comparison
e. Missing Value Ratio
f. Low Variance Filter
g. High Correlation Filter
h. Random Forest
i. Factor Analysis
j. Auto-Encoder

Principal Component Analysis (PCA)


Principal Component Analysis is a statistical process that converts the observations of
correlated features into a set of linearly uncorrelated features with the help of orthogonal
transformation. These new transformed features are called the Principal Components. It is
one of the popular tools that is used for exploratory data analysis and predictive modeling.
PCA works by considering the variance of each attribute, because an attribute with high
variance gives a good split between the classes, and hence PCA reduces the dimensionality while
preserving such attributes. Some real-world applications of PCA are image processing, movie
recommendation systems, and optimizing the power allocation in various communication channels.
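
A minimal sketch of PCA with scikit-learn is shown below; the Iris dataset and the choice of
two components are assumptions made for illustration only.

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Standardise the features first, since PCA is sensitive to feature scale
X, _ = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Project the 4 original features onto 2 principal components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
print(pca.explained_variance_ratio_)   # fraction of variance captured by each component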

Backward Feature Elimination
The backward feature elimination technique is mainly used while developing a Linear
Regression or Logistic Regression model. The following steps are performed in this technique to
reduce the dimensionality or to select features:
o In this technique, firstly, all the n variables of the given dataset are taken to train the model.
o The performance of the model is checked.
o Now we remove one feature at a time and train the model on n-1 features, n times, computing
the performance of the model each time.
o We check for the variable whose removal has made the smallest (or no) change in the
performance of the model, and then drop that variable; after that, we are left with n-1 features.
o Repeat the complete process until no feature can be dropped.
In this technique, by selecting the optimum performance of the model and the maximum tolerable
error rate, we can define the optimal number of features required for the machine learning
algorithm.
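
A minimal sketch of backward elimination using scikit-learn's SequentialFeatureSelector is
given below; the wine dataset, the logistic-regression estimator and the decision to keep 5 of
the 13 features are illustrative assumptions only.

from sklearn.datasets import load_wine
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Start from all 13 features and repeatedly drop the least useful one
X, y = load_wine(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
selector = SequentialFeatureSelector(model, n_features_to_select=5,
                                     direction="backward")
selector.fit(X, y)
print(selector.get_support())   # mask of the 5 features that survive elimination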

Forward Feature Selection


Forward feature selection follows the inverse of the backward elimination process. In this
technique, we do not eliminate features; instead, we find the best features that produce the
highest increase in the performance of the model. The following steps are performed in this
technique:
o We start with a single feature only, and progressively add one feature at a time.
o Here we train the model on each candidate feature separately.
o The feature with the best performance is selected.
o The process is repeated until adding a feature no longer gives a significant increase in the
performance of the model.
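
A minimal forward-selection sketch is shown below; it mirrors the backward example above but
grows the feature set one feature at a time, and all settings are again illustrative assumptions.

from sklearn.datasets import load_wine
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Start from an empty feature set and repeatedly add the most useful feature
X, y = load_wine(return_X_y=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
selector = SequentialFeatureSelector(model, n_features_to_select=5,
                                     direction="forward")
selector.fit(X, y)
print(selector.get_support())   # mask of the 5 selected features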

Missing Value Ratio


If a variable in a dataset has too many missing values, we drop that variable, as it does not
carry much useful information. To perform this, we set a threshold level, and if a variable has
a larger fraction of missing values than that threshold, we drop that variable. The lower the
threshold value, the more variables are dropped and the more aggressive the reduction.
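
A minimal pandas sketch of the missing-value-ratio filter follows; the toy DataFrame and the
40% threshold are assumptions made only for this example.

import numpy as np
import pandas as pd

# Toy data: column "b" is mostly missing
df = pd.DataFrame({"a": [1, 2, np.nan, 4],
                   "b": [np.nan, np.nan, np.nan, 1],
                   "c": [5, 6, 7, 8]})

# Drop every column whose fraction of missing values exceeds the threshold
threshold = 0.4
missing_ratio = df.isna().mean()            # fraction of missing values per column
df_reduced = df.loc[:, missing_ratio <= threshold]
print(df_reduced.columns.tolist())          # ['a', 'c']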

Low Variance Filter
Similar to the missing value ratio technique, data columns whose values change very little carry
little information. Therefore, we calculate the variance of each variable, and all data columns
with variance lower than a given threshold are dropped, because such low-variance features will
hardly affect the target variable.
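
A minimal sketch of the low-variance filter using scikit-learn's VarianceThreshold follows; the
Iris dataset and the 0.2 threshold are illustrative assumptions.

from sklearn.datasets import load_iris
from sklearn.feature_selection import VarianceThreshold

# Remove every feature whose variance does not exceed the threshold
X, _ = load_iris(return_X_y=True)
selector = VarianceThreshold(threshold=0.2)
X_reduced = selector.fit_transform(X)
print(X.shape, "->", X_reduced.shape)   # one low-variance feature is dropped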

High Correlation Filter


High correlation refers to the case when two variables carry approximately the same
information, which can degrade the performance of the model. The correlation between
independent numerical variables is measured by the correlation coefficient; if this value is
higher than a chosen threshold, we can remove one of the two variables from the dataset.
Between the two, we prefer to keep the variable that shows the higher correlation with the
target variable.
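
A minimal pandas sketch of the high-correlation filter follows; the synthetic data and the 0.9
cut-off are assumptions made only for this example.

import numpy as np
import pandas as pd

# Synthetic data in which x2 is almost an exact copy of x1
rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=100)})
df["x2"] = df["x1"] * 2 + rng.normal(scale=0.01, size=100)
df["x3"] = rng.normal(size=100)

# Look only at the upper triangle of the absolute correlation matrix
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Drop one variable from every highly correlated pair
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
df_reduced = df.drop(columns=to_drop)
print(to_drop)   # ['x2']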

Random Forest
Random Forest is a popular and very useful feature selection algorithm in machine learning.
The algorithm has a built-in feature importance measure, so we do not need to program it
separately. In this technique, we generate a large set of trees against the target variable and
use the usage statistics of each attribute to find the most informative subset of features.
The random forest algorithm takes only numerical variables, so we need to convert categorical
input data into numeric data using one-hot encoding.
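
A minimal sketch of tree-based feature selection using a random forest's built-in importances
follows; the breast-cancer dataset and the decision to keep the top 5 features are illustrative
assumptions.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Fit a forest and read off its built-in feature importance scores
X, y = load_breast_cancer(return_X_y=True)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Indices of the 5 most important features
top5 = np.argsort(forest.feature_importances_)[::-1][:5]
print(top5)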

Factor Analysis
Factor analysis is a technique in which each variable is kept within a group according to its
correlation with other variables: variables within a group can have a high correlation among
themselves, but a low correlation with variables of other groups.
We can understand it with an example. Suppose we have two variables, Income and Spending. These
two variables have a high correlation, which means people with high income spend more, and vice
versa. So such variables are put into a group, and that group is known as a factor. The number
of these factors is small compared to the original dimensionality of the dataset.
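
A minimal sketch of factor analysis with scikit-learn follows; the Iris dataset and the choice
of two factors are assumptions made only for this example.

from sklearn.datasets import load_iris
from sklearn.decomposition import FactorAnalysis

# Reduce 4 correlated features to 2 latent factors
X, _ = load_iris(return_X_y=True)
fa = FactorAnalysis(n_components=2, random_state=0)
X_factors = fa.fit_transform(X)
print(fa.components_.shape)   # (2, 4): loading of each original feature on each factor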

Auto-encoders
One of the popular methods of dimensionality reduction is the auto-encoder, a type of
artificial neural network (ANN) whose main aim is to copy its inputs to its outputs. In this,
the input is compressed into a latent-space representation, and the output is reconstructed from
this representation. It has two main parts:
o Encoder: The function of the encoder is to compress the input into the latent-space
representation.
o Decoder: The function of the decoder is to recreate the output from the latent-space
representation.
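
A minimal auto-encoder sketch using Keras follows; the random toy data, the layer sizes and the
choice of 5 latent dimensions are all assumptions made only for this example.

import numpy as np
from tensorflow import keras

# Toy data: 1000 samples with 20 features
X = np.random.rand(1000, 20).astype("float32")

# Encoder compresses 20 inputs to a 5-dimensional latent representation;
# decoder reconstructs the 20 inputs from that representation
inputs = keras.Input(shape=(20,))
encoded = keras.layers.Dense(5, activation="relu")(inputs)
decoded = keras.layers.Dense(20, activation="sigmoid")(encoded)

autoencoder = keras.Model(inputs, decoded)
encoder = keras.Model(inputs, encoded)       # reusable on its own for dimensionality reduction
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=5, batch_size=32, verbose=0)   # learn to copy inputs to outputs

X_latent = encoder.predict(X, verbose=0)
print(X_latent.shape)                        # (1000, 5)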

Why is Dimensionality Reduction important in Machine Learning and Predictive Modeling?
An intuitive example of dimensionality reduction can be discussed through a
simple e-mail classification problem, where we need to classify whether the e-mail
is spam or not. This can involve a large number of features, such as whether or not
the e-mail has a generic title, the content of the e-mail, whether the e-mail uses a
template, etc. However, some of these features may overlap. In another condition,
a classification problem that relies on both humidity and rainfall can be collapsed
into just one underlying feature, since both of the aforementioned are correlated to
a high degree. Hence, we can reduce the number of features in such problems. A
3-D classification problem can be hard to visualize, whereas a 2-D one can be
mapped to a simple 2-dimensional space, and a 1-D problem to a simple line. The
below figure illustrates this concept, where a 3-D feature space is split into two 2-
D feature spaces, and later, if found to be correlated, the number of features can be
reduced even further.

Components of Dimensionality Reduction
There are two components of dimensionality reduction:
 Feature selection: In this, we try to find a subset of the original set of variables,
or features, to get a smaller subset which can be used to model the problem. It
usually involves three ways:
1. Filter
2. Wrapper
3. Embedded
 Feature extraction: This reduces the data in a high-dimensional space to a
lower-dimensional space, i.e. a space with a smaller number of dimensions.

Methods of Dimensionality Reduction


The various methods used for dimensionality reduction include:
 Principal Component Analysis (PCA)
 Linear Discriminant Analysis (LDA)
 Generalized Discriminant Analysis (GDA)
Dimensionality reduction may be either linear or non-linear, depending upon the
method used. The prime linear method, called Principal Component Analysis, or
PCA, is discussed below.

Principal Component Analysis


This method was introduced by Karl Pearson. It works on the condition that while
the data in a higher dimensional space is mapped to data in a lower dimension
space, the variance of the data in the lower dimensional space should be maximum.

It involves the following steps:


 Construct the covariance matrix of the data.
 Compute the eigenvectors of this matrix.
 Eigenvectors corresponding to the largest eigenvalues are used to reconstruct a
large fraction of variance of the original data.
Hence, we are left with a smaller number of eigenvectors, and there may have been
some loss of information in the process. However, the most important variance
should be retained by the remaining eigenvectors.
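
The steps above can be illustrated with a minimal NumPy sketch; the toy data and the decision to
keep two components are assumptions made only for this example.

import numpy as np

# Toy data: 100 samples, 4 features, centred so PCA works on the covariance
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X_centered = X - X.mean(axis=0)

# Step 1: covariance matrix of the data
cov = np.cov(X_centered, rowvar=False)

# Step 2: eigenvalues and eigenvectors of the (symmetric) covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)

# Step 3: keep the eigenvectors with the largest eigenvalues and project onto them
order = np.argsort(eigvals)[::-1]
top2 = eigvecs[:, order[:2]]
X_reduced = X_centered @ top2
print(X_reduced.shape)   # (100, 2)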

Benefits of applying Dimensionality Reduction


Some benefits of applying dimensionality reduction technique to the given dataset are given
below:

o By reducing the dimensions of the features, the space required to store the dataset
also gets reduced.
o Less computation time is required to train a model on the reduced set of features.
o Reduced dimensions of features of the dataset help in visualizing the data
quickly.
o It removes the redundant features (if present) by taking care of multicollinearity.

The Curse of Dimensionality


Handling high-dimensional data is very difficult in practice; this is commonly known
as the curse of dimensionality. If the dimensionality of the input dataset increases,
any machine learning algorithm and model becomes more complex. As the number
of features increases, the number of samples required also increases proportionally,
and the chance of overfitting increases as well. A machine learning model trained on
high-dimensional data can easily become overfitted and give poor performance.
Hence, it is often required to reduce the number of features, which can be done with
dimensionality reduction.

Advantages of Dimensionality Reduction


 It helps in data compression, and hence reduced storage space.
 It reduces computation time.
 It also helps remove redundant features, if any.
 Improved Visualization: High dimensional data is difficult to visualize, and
dimensionality reduction techniques can help in visualizing the data in 2D or
3D, which can help in better understanding and analysis.
 Overfitting Prevention: High dimensional data may lead to overfitting in
machine learning models, which can lead to poor generalization performance.
Dimensionality reduction can help in reducing the complexity of the data, and
hence prevent overfitting.
 Feature Extraction: Dimensionality reduction can help in extracting important
features from high dimensional data, which can be useful in feature selection
for machine learning models.
 Data Preprocessing: Dimensionality reduction can be used as a preprocessing
step before applying machine learning algorithms to reduce the dimensionality
of the data and hence improve the performance of the model.
 Improved Performance: Dimensionality reduction can help in improving the
performance of machine learning models by reducing the complexity of the
data, and hence reducing the noise and irrelevant information in the data.

Disadvantages of Dimensionality Reduction


 It may lead to some amount of data loss.
 PCA tends to find linear correlations between variables, which is sometimes
undesirable.
 PCA fails in cases where mean and covariance are not enough to define datasets.
 We may not know how many principal components to keep; in practice, some
rules of thumb are applied.
 Interpretability: The reduced dimensions may not be easily interpretable, and it
may be difficult to understand the relationship between the original features and
the reduced dimensions.
 Overfitting: In some cases, dimensionality reduction may lead to overfitting,
especially when the number of components is chosen based on the training data.
 Sensitivity to outliers: Some dimensionality reduction techniques are sensitive
to outliers, which can result in a biased representation of the data.
 Computational complexity: Some dimensionality reduction techniques, such as
manifold learning, can be computationally intensive, especially when dealing
with large datasets.

Important points:
 Dimensionality reduction is the process of reducing the number of features in a
dataset while retaining as much information as possible.
This can be done to reduce the complexity of a model, improve the performance
of a learning algorithm, or make it easier to visualize the data.
 Techniques for dimensionality reduction include: principal component analysis
(PCA), singular value decomposition (SVD), and linear discriminant analysis
(LDA).
 Each technique projects the data onto a lower-dimensional space while
preserving important information.
 Dimensionality reduction is performed during the pre-processing stage, before
building a model, to improve the performance.
 It is important to note that dimensionality reduction can also discard useful
information, so care must be taken when applying these techniques.
