
Overfitting & Regularization

Overfitting:- The model learns the underlying information plus the noise in the training data.
Common ways to address overfitting:
• cross-validation sampling (see the sketch after this list)
• reducing the number of features
• pruning
• Regularization:- adds a penalty as model complexity increases.
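A minimal sketch of the cross-validation point, assuming scikit-learn and a synthetic dataset (the data and model are illustrative, not taken from these slides): it compares the training score with the cross-validated score, and a large gap between the two is a typical symptom of overfitting.

    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_regression(n_samples=100, n_features=50, noise=10.0, random_state=0)

    model = LinearRegression()
    train_score = model.fit(X, y).score(X, y)          # R^2 on the data it was trained on
    cv_scores = cross_val_score(model, X, y, cv=5)     # R^2 on 5 held-out folds

    print("Training R^2:       ", round(train_score, 3))
    print("Cross-validated R^2:", round(cv_scores.mean(), 3))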



• When a dataset has a large number of features compared to the
number of observations, the regularization (shrinkage)
techniques commonly used to address over-fitting and perform
feature selection are:
» L2 – Ridge regression
» L1 – Lasso regression
• Ridge and Lasso regression are simple techniques to reduce
model complexity and prevent the over-fitting that may result
from ordinary (unregularized) linear regression.



Ridge Regression (L2)

The regularization parameter (λ) penalizes all the parameters
except the intercept, so that the model generalizes from the data
and does not overfit.
Ridge regression also tends to mitigate the multicollinearity
problem through the shrinkage parameter λ.
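A minimal sketch of L2 regularization using scikit-learn's Ridge estimator (the synthetic data and alpha=1.0 are illustrative assumptions, not values from the slides):

    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression, Ridge

    X, y = make_regression(n_samples=50, n_features=20, noise=5.0, random_state=0)

    # Ridge minimizes ||y - Xw||^2 + alpha * ||w||^2; the intercept is not penalized.
    # alpha plays the role of the shrinkage parameter λ.
    ols = LinearRegression().fit(X, y)
    ridge = Ridge(alpha=1.0).fit(X, y)

    print("Largest OLS coefficient:  ", abs(ols.coef_).max())
    print("Largest Ridge coefficient:", abs(ridge.coef_).max())
    # The Ridge coefficients are shrunk toward zero relative to plain least squares.

Larger values of alpha shrink the coefficients more strongly; in practice alpha is usually tuned with cross-validation (for example via RidgeCV).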
Lasso Regression (L1)

• Lasso (Least Absolute Shrinkage and Selection Operator)
penalizes the absolute size of the regression coefficients.
• In addition, it can reduce the variability and improve the
accuracy of linear regression models.
• Helps in dimensionality reduction and feature selection, since
many coefficients are driven exactly to zero (see the sketch
below).
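A minimal sketch of L1 regularization with scikit-learn's Lasso estimator (the synthetic data and alpha=0.5 are illustrative assumptions):

    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lasso

    # Only 5 of the 20 features are actually informative in this synthetic data.
    X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                           noise=5.0, random_state=0)

    lasso = Lasso(alpha=0.5).fit(X, y)

    # The L1 penalty drives many coefficients to exactly zero,
    # which is what makes Lasso useful for feature selection.
    print("Non-zero coefficients:", (lasso.coef_ != 0).sum(), "out of", len(lasso.coef_))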
• Traditional methods such as cross-validation and stepwise
regression handle overfitting and perform feature selection
well with a small set of features, but the Ridge and Lasso
regularization techniques are a great alternative when we are
dealing with a large set of features.



Feature scaling..
• A machine learning algorithm just sees numbers. If there is a
vast difference in range, say a few features in the thousands
and a few in the tens, it effectively assumes that the
higher-ranging numbers have a higher impact on the response.
These larger numbers then play a more decisive role while
training the model, leading to a biased fit.
• Thus feature scaling is needed to bring every feature onto the
same scale, without giving any feature undue upfront importance
(a small sketch follows this list).
• Feature scaling also helps training algorithms such as gradient
descent converge much faster.
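A minimal sketch of the problem, with purely illustrative feature names and ranges:

    import numpy as np

    rng = np.random.default_rng(0)
    income = rng.uniform(20_000, 150_000, size=100)   # values in the tens of thousands
    age = rng.uniform(18, 70, size=100)               # values in the tens
    X = np.column_stack([income, age])

    print("Feature minimums:", X.min(axis=0))
    print("Feature maximums:", X.max(axis=0))
    # A gradient- or distance-based learner sees only these raw magnitudes,
    # so the income column dominates unless both features are rescaled
    # (see the Standard Scaler and MinMax Scaler slides that follow).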



Standard Scaler..

• The Standard Scaler assumes the data is normally distributed
within each feature and scales it so that the distribution is
centered around 0, with a standard deviation of 1.
• Centering and scaling happen independently on each feature, by
computing the relevant statistics on the samples in the training
set (see the sketch below).
• If the data is not normally distributed, this is not the best
Scaler to use.
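A minimal sketch of StandardScaler usage with scikit-learn (the tiny train/test arrays are illustrative assumptions):

    import numpy as np
    from sklearn.preprocessing import StandardScaler

    X_train = np.array([[20_000., 25.], [50_000., 40.], [80_000., 60.]])
    X_test = np.array([[35_000., 30.]])

    scaler = StandardScaler().fit(X_train)   # statistics come from the training set only
    print(scaler.mean_, scaler.scale_)       # per-feature mean and standard deviation

    X_train_std = scaler.transform(X_train)  # each training column now has mean 0, std 1
    X_test_std = scaler.transform(X_test)    # test data reuses the training statistics
    print(X_train_std)
    print(X_test_std)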



MinMax Scaler..

• Transforms features by scaling each feature to a given range.
• This estimator scales and translates each feature individually
so that it lies in the given range on the training set, e.g.,
between zero and one. The target range can be set explicitly,
e.g., [0,1], [0,5] or [-1,1]; if the chosen range includes
negative values, such as [-1,1], the data is shrunk into that
range.
• This Scaler responds well if the standard deviation is small
and when a distribution is not Gaussian.
• This Scaler is sensitive to outliers (see the sketch below).
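A minimal sketch of MinMaxScaler usage with scikit-learn (the data and feature_range are illustrative assumptions):

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler

    X_train = np.array([[-5., 100.], [0., 200.], [10., 400.]])

    scaler = MinMaxScaler(feature_range=(0, 1)).fit(X_train)  # could also be (-1, 1), (0, 5), ...
    print(scaler.transform(X_train))
    # Each column is mapped linearly so its training minimum becomes 0 and its
    # training maximum becomes 1: x' = (x - min) / (max - min).
    # A single extreme outlier in a column squeezes all other values together,
    # which is why this Scaler is sensitive to outliers.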
