
OVERFITTING AND HOW TO SOLVE IT
1. Overfitting
Overfitting occurs when a model fits the details of the training dataset too
closely. Noise and outliers in the training set are learned as if they were part of
the underlying rule, so the model does well on the training data but poorly on new data.

a. Examples of Overfitting

The diagram shows an overfitted model: when new data or test data is added, its
predictions fall far from the true values. Such a model has high variance and low bias.
Whether bias and variance are high or low depends on the model being used.

Bias occurs when an algorithm has limited flexibility to learn from data.

Variance measures the algorithm’s sensitivity to the specific set of data it was trained on.
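The effect of bias and variance can be illustrated with a small experiment. The sketch below is only an illustration under stated assumptions: the data is synthetic (noisy samples of a sine curve), the polynomial degrees 1, 3 and 12 are arbitrary choices, and the helper names `make_data` and `mse` are hypothetical. A low degree stands in for a high-bias model and a high degree for a high-variance one.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n_samples):
    """Synthetic data (assumption): noisy samples of sin(pi * x) on [-1, 1]."""
    x = rng.uniform(-1.0, 1.0, size=n_samples)
    y = np.sin(np.pi * x) + rng.normal(scale=0.2, size=n_samples)
    return x, y

def mse(coeffs, x, y):
    """Mean squared error of a polynomial with the given coefficients."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

x_train, y_train = make_data(15)    # small training set
x_test, y_test = make_data(200)     # unseen data from the same source

errors = {}
for degree in (1, 3, 12):
    # Least-squares polynomial fit; a higher degree means a more flexible model.
    coeffs = np.polyfit(x_train, y_train, degree)
    errors[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
    print(f"degree {degree:2d}: train MSE = {errors[degree][0]:.4f}, "
          f"test MSE = {errors[degree][1]:.4f}")
```

The training error can only go down as the degree grows, because each larger model contains the smaller ones. The test error is the number to watch: a gap between a very low training error and a much higher test error is the usual sign of overfitting.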

b. Reasons for Overfitting


- The training data is not cleaned and contains noise (garbage values)
- The model has high variance
- The training dataset is too small
- The model is too complex

2. Solutions to Overfitting


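- Increase the size of the dataset
- Validation:
o Create a validation set by holding out part of the training data.
o Compare the model's error on the validation data with its error on the test data.
o A single validation set alone is not enough to avoid overfitting.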
- Cross-validation
o Cross-validation is an improved version of the validation technique.
o Create many validation sets from the training data.
o Divide the training data into k distinct folds.
o Run k iterations: in each one, train on k - 1 folds and compute the validation error on the remaining fold.
o Compare the validation error with the training error to decide on the final model.
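The cross-validation steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: the data is synthetic, the helper names `k_fold_indices` and `cross_val_mse` are hypothetical, and polynomial degree is used as a stand-in for "the model to decide on".

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and split them into k disjoint folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def cross_val_mse(x, y, degree, k=5):
    """Average validation MSE of a polynomial fit over k folds."""
    folds = k_fold_indices(len(x), k)
    fold_errors = []
    for i in range(k):
        val_idx = folds[i]                                  # 1 fold for validation
        train_idx = np.concatenate(
            [folds[j] for j in range(k) if j != i])         # k - 1 folds for training
        coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
        pred = np.polyval(coeffs, x[val_idx])
        fold_errors.append(np.mean((pred - y[val_idx]) ** 2))
    return float(np.mean(fold_errors))

# Synthetic data (assumption): noisy samples of sin(pi * x).
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=60)
y = np.sin(np.pi * x) + rng.normal(scale=0.2, size=60)

# Candidate models: polynomial degrees chosen arbitrarily for the example.
cv_errors = {degree: cross_val_mse(x, y, degree) for degree in (1, 3, 9)}
best_degree = min(cv_errors, key=cv_errors.get)
print(cv_errors, "-> selected degree:", best_degree)
```

Every sample is used for validation exactly once, so the averaged error depends less on one lucky or unlucky split than a single validation set does.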
- Regularization
o Early Stopping
▪ Stop iterating when the validation error reaches its minimum and starts to rise.
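A minimal sketch of early stopping, assuming plain gradient descent on a linear model with polynomial features. The data is synthetic, and the `patience` value, learning rate and feature count are illustrative assumptions rather than recommended settings. Training stops once the validation error has not improved for `patience` consecutive iterations, and the weights from the best iteration are kept.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): noisy samples of sin(pi * x).
x = rng.uniform(-1.0, 1.0, size=40)
y = np.sin(np.pi * x) + rng.normal(scale=0.2, size=40)

# Polynomial features 1, x, ..., x^9 give a deliberately flexible model.
X = np.vander(x, 10, increasing=True)
X_train, y_train = X[:30], y[:30]
X_val, y_val = X[30:], y[30:]

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

w = np.zeros(X.shape[1])
best_w, best_val, best_epoch = w.copy(), mse(w, X_val, y_val), 0
patience, learning_rate = 50, 0.1      # illustrative values

for epoch in range(1, 5001):
    # Gradient of the training MSE with respect to the weights.
    grad = 2.0 / len(y_train) * X_train.T @ (X_train @ w - y_train)
    w -= learning_rate * grad

    val_error = mse(w, X_val, y_val)
    if val_error < best_val:
        # New minimum of the validation error: remember these weights.
        best_w, best_val, best_epoch = w.copy(), val_error, epoch
    elif epoch - best_epoch >= patience:
        # No improvement for `patience` iterations: stop training.
        break

print(f"stopped at iteration {epoch}, best iteration {best_epoch}, "
      f"best validation MSE {best_val:.4f}")
```

Whether the loop stops before the iteration limit depends on the data. On a run where the validation error keeps falling, the loop simply runs to the end and `best_w` holds the final weights.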
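o L1 Regularization (Lasso Regularization)
▪ Add the sum of the absolute values of the coefficients to the loss function:

$$L = \frac{1}{n}\sum_{i=1}^{n}\left(y_i^{(\text{actual})} - y_i^{(\text{predicted})}\right)^2 + \lambda\sum_{j=1}^{p}\left|\beta_j\right|$$

λ: tuning parameter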
o L2 Regularization (Ridge Regularization)
▪ Add the sum of the squared coefficients to the loss function:

$$L = \frac{1}{n}\sum_{i=1}^{n}\left(y_i^{(\text{actual})} - y_i^{(\text{predicted})}\right)^2 + \lambda\sum_{j=1}^{p}\beta_j^2$$

λ: tuning parameter
▪ Which technique to use?

Ridge                                       Lasso
- Many features in the dataset              - Only a small number of features
- All features have small coefficients      - A few features have large coefficient values
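The difference between the two penalties can be seen on a small synthetic problem. The sketch below makes several assumptions: the function names `ridge_fit`, `lasso_fit` and `soft_threshold` are hypothetical, there is no intercept term, the data is generated so that only 2 of 10 features matter, and λ = 0.5 is an arbitrary choice. Ridge is solved in closed form and Lasso with a simple coordinate-descent loop, both minimising the mean-squared-error loss plus the L2 or L1 penalty described in these notes.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z towards zero by t, returning exactly zero when |z| <= t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ridge_fit(X, y, lam):
    """Minimise (1/n) * ||y - X b||^2 + lam * sum(b_j^2) in closed form."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X + n * lam * np.eye(p), X.T @ y)

def lasso_fit(X, y, lam, n_iter=200):
    """Minimise (1/n) * ||y - X b||^2 + lam * sum(|b_j|) by coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = np.sum(X ** 2, axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # Residual with feature j's current contribution removed.
            residual = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ residual
            beta[j] = soft_threshold(rho, n * lam / 2.0) / col_sq[j]
    return beta

# Synthetic data (assumption): only the first 2 of 10 features affect y.
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
true_beta = np.zeros(p)
true_beta[:2] = [3.0, -2.0]
y = X @ true_beta + rng.normal(scale=0.5, size=n)

beta_ridge = ridge_fit(X, y, lam=0.5)
beta_lasso = lasso_fit(X, y, lam=0.5)
print("ridge:", np.round(beta_ridge, 3))
print("lasso:", np.round(beta_lasso, 3))
```

Ridge shrinks every coefficient but leaves them all non-zero, while the soft-thresholding step in Lasso can set coefficients exactly to zero. That is why Lasso suits the case where only a few features matter and Ridge suits the case where many features each contribute a little.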
