Slide 1
Feature Selection
Slide 2
Slide 3
Slide 4
Slide 5
Slide 6
Why we need Feature Selection
Slide 7
Need for Feature Selection
Slide 8
Overview
Slide 9
Feature Selection Techniques
• There are two main types of Feature Selection techniques:
• Supervised Feature Selection technique
Supervised Feature Selection techniques take the target variable into account and can be
used on labelled datasets.
• Unsupervised Feature Selection technique
Unsupervised Feature Selection techniques ignore the target variable and can be used on
unlabelled datasets.
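To make the distinction concrete, here is a minimal scikit-learn sketch (not from the slides) contrasting a supervised selector, which scores features against the labels, with an unsupervised one, which ignores them; the iris dataset, k=2, and the 0.2 variance threshold are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2, VarianceThreshold

X, y = load_iris(return_X_y=True)

# Supervised: scores each feature against the target labels y.
supervised = SelectKBest(score_func=chi2, k=2).fit(X, y)
print("Supervised (chi2) keeps:", supervised.get_support())

# Unsupervised: ignores y entirely; drops near-constant features.
unsupervised = VarianceThreshold(threshold=0.2).fit(X)
print("Unsupervised (variance) keeps:", unsupervised.get_support())
```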
Slide 10
• Filter Method:
• Features are dropped based on how they relate to the output, i.e.,
how strongly they correlate with it. We use correlation to check
whether the features are positively or negatively correlated with
the output labels and drop features accordingly. E.g.:
Information Gain, Chi-Square Test, Fisher’s Score, etc.
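As an illustration of the correlation idea, here is a minimal sketch on a synthetic dataset; the DataFrame, its column names, and the 0.3 absolute-correlation cutoff are all hypothetical choices, not part of the slides.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "useful": rng.normal(size=200),  # drives the target
    "noise":  rng.normal(size=200),  # unrelated to the target
})
df["target"] = 2.0 * df["useful"] + rng.normal(scale=0.1, size=200)

# Correlate every feature with the target and keep the strong ones,
# whether positively or negatively correlated (hence the abs()).
corr = df.corr()["target"].drop("target")
selected = corr[corr.abs() > 0.3].index.tolist()
print(corr)
print("Selected features:", selected)
```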
Slide 11
Wrapper Methods
• We split our data into subsets and train a model on them. Based on the
model’s output, we add and remove features and train the model again. The
subsets are formed using a greedy approach, and the accuracy of the candidate
combinations of features is evaluated. E.g.: Forward Selection, Backward
Elimination, etc. (a forward-selection sketch follows the list below).
Some of the techniques under these methods are:
Forward selection.
Backward selection.
Bi-directional selection.
Exhaustive selection.
Recursive selection.
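As referenced above, here is a minimal forward-selection sketch using scikit-learn’s SequentialFeatureSelector; the logistic-regression estimator, cv=3, and n_features_to_select=2 are illustrative assumptions rather than choices from the slides.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
est = LogisticRegression(max_iter=1000)

# direction="forward" starts from an empty set and greedily adds the
# feature that most improves cross-validated accuracy at each step;
# direction="backward" would start from all features and remove instead.
sfs = SequentialFeatureSelector(
    est, n_features_to_select=2, direction="forward", cv=3
).fit(X, y)
print("Selected feature mask:", sfs.get_support())
```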
Slide 12
• Intrinsic Method: This method combines the qualities of both
the Filter and Wrapper methods to create the best subset.
• It handles the iterative model-training process while keeping
the computational cost to a minimum. E.g.: Lasso and Ridge
Regression.
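A minimal sketch of the intrinsic idea using Lasso, which the slide names: the L1 penalty drives uninformative coefficients exactly to zero while the model trains, so selection is a by-product of a single fit. The synthetic dataset and alpha=1.0 are illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# 10 features, only 3 of which actually drive the target.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

# The L1 penalty shrinks uninformative coefficients exactly to zero
# during fitting, so selection happens inside the training itself.
lasso = Lasso(alpha=1.0).fit(X, y)
print("Coefficients:", np.round(lasso.coef_, 2))
print("Kept features:", np.flatnonzero(lasso.coef_))
```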
Slide 13
Slide 14
How to choose a Feature Selection Method?
Slide 15
How to Choose a Feature Selection Method
• Choosing the best feature selection method depends on the input and output types under
consideration:
• Numerical Input, Numerical Output: a feature selection regression problem with numerical
input variables. Use a correlation coefficient, such as Pearson’s correlation coefficient
(for linear relationships) or Spearman’s rank coefficient (for nonlinear ones).
• Numerical Input, Categorical Output: a feature selection classification problem with
numerical input variables. Use a statistic that takes the categorical target into account,
such as the ANOVA F-statistic (for linear relationships) or Kendall’s rank coefficient
(for nonlinear ones).
• Categorical Input, Numerical Output: a regression predictive modeling problem with
categorical input variables (rare). Use the same statistics, such as the ANOVA F-statistic
(linear) or Kendall’s rank coefficient (nonlinear), but applied in reverse.
• Categorical Input, Categorical Output: a classification predictive modeling problem with
categorical input variables. Use the Chi-Squared test (on contingency tables) or Mutual
Information, a powerful method that is agnostic to data types.
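A minimal sketch of the Numerical Input, Categorical Output case using scikit-learn’s SelectKBest; f_classif is the ANOVA F-test and mutual_info_classif is the type-agnostic mutual-information scorer. The iris dataset and k=2 are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif

X, y = load_iris(return_X_y=True)  # numerical inputs, categorical target

anova = SelectKBest(score_func=f_classif, k=2).fit(X, y)
mi = SelectKBest(score_func=mutual_info_classif, k=2).fit(X, y)

print("ANOVA F-test keeps:", anova.get_support())
print("Mutual information keeps:", mi.get_support())
# For regression targets, the analogous scorers are f_regression /
# mutual_info_regression; for categorical-vs-categorical, chi2.
```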
Slide 16
Perspectives
• Filters:
– Measuring uncertainty, distances, dependence, or consistency is usually
cheaper than measuring the accuracy of a learning process, so filter
methods are usually faster.
– They do not rely on a particular learning bias, so the selected features
can be used to learn different models.
– They can handle larger datasets, owing to the simplicity and low time
complexity of their evaluation measures.
Slide 17
Slide 18
Perspectives
• Filters:
Slide 19
Perspectives
• Embedded FS:
– Similar to the wrapper approach in that the features are selected
specifically for a certain learning algorithm, but here the features are
selected during the learning process itself.
– They can take full advantage of the available data, since the training
data need not be split into training and validation sets, and they can
reach a solution faster by avoiding re-training a predictor for each
feature subset explored.
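One common embedded-style recipe, shown here as an illustrative choice rather than something from the slides, is to fit a model once and select features from its learned importances via scikit-learn’s SelectFromModel; the random forest and the "median" threshold are assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_breast_cancer(return_X_y=True)

# One fit of the predictor yields importances for every feature; there is
# no per-subset re-training and no separate validation split.
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=0),
    threshold="median",
).fit(X, y)
print("Kept", selector.get_support().sum(), "of", X.shape[1], "features")
```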
Slide 20
Aspects: Output of Feature Selection
Slide 21
Aspects: Output of Feature Selection
Slide 22
Aspects: Output of Feature Selection
Slide 23
Aspects: Output of Feature Selection
Slide 24
Aspects: Evaluation
• Goals:
– Inferability: for predictive tasks, this is evaluated as an
improvement in the prediction of unseen examples relative to
using the raw training data directly.
– Interpretability: since raw data is hard for humans to
comprehend, data mining (DM) is also used to generate a more
understandable representation of structure that can explain
the behavior of the data.
– Data Reduction: data with fewer dimensions is better and simpler
to handle, in terms of both efficiency and interpretability.
Slide 25
Aspects: Evaluation
– Complexity
Slide 27
Aspects: Using Decision Trees for FS
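A minimal sketch of this idea, assuming the usual recipe of ranking features by a fitted tree’s impurity-based importances; the dataset, tree depth, and top-5 cutoff are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

# feature_importances_ sums each feature's impurity reduction over the
# splits where it was used; features never used by the tree score 0.
top_k = 5
ranking = np.argsort(tree.feature_importances_)[::-1][:top_k]
print("Top features by importance:", ranking)
```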
Slide 28
Slide 29