Theory in Machine Learning
Jaya Sil
Indian Institute of Engineering Science & Technology, Shibpur
Howrah
Predicting y from x ∈ ℝ
y = w₀ + w₁x        y = w₀ + w₁x + w₂x²
(Figure: data points do not lie on the straight line; adding the extra feature x² gives a slightly better fit; more features fit the data better still.)
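A minimal sketch of these two fits, assuming an illustrative synthetic dataset (the coefficients and noise level are made up for the demo): the straight line underfits the curved data, while the extra x² feature fits it noticeably better.

```python
import numpy as np

# Illustrative curved 1-D data (true coefficients are an assumption for the demo).
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 30)
y = 1.0 + 0.5 * x + 0.8 * x**2 + rng.normal(scale=0.5, size=x.shape)

# Straight-line model y = w0 + w1*x, then the model with the extra feature x^2.
w_line = np.polyfit(x, y, deg=1)
w_quad = np.polyfit(x, y, deg=2)

for name, w in [("straight line", w_line), ("with x^2 term", w_quad)]:
    mse = np.mean((y - np.polyval(w, x)) ** 2)
    print(f"{name}: training MSE = {mse:.3f}")
```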
• On the other hand, a complicated, low-bias model is likely to fit the training data very well, so its predictions vary wildly even when predictor values change only slightly.
• This means the model has high variance, and it will not generalize well to new/unseen data.
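A small sketch of this instability, assuming the same kind of synthetic data as before: refitting a flexible degree-9 polynomial on slightly different noisy samples makes its prediction at a fixed point swing around, while the straight line barely moves.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 15)
y_true = 1.0 + 0.5 * x + 0.8 * x**2

# Refit each model on slightly perturbed copies of the data and watch how
# much its prediction at one fixed point (x = 2.5) varies.
for deg in (1, 9):
    preds = []
    for _ in range(5):
        y_noisy = y_true + rng.normal(scale=0.5, size=x.shape)
        w = np.polyfit(x, y_noisy, deg=deg)
        preds.append(np.polyval(w, 2.5))
    print(f"degree {deg}: predictions at x=2.5 -> {np.round(preds, 2)}")
```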
Overfitting
• Overfitting: low bias, high variance.
Avoid overfitting
1. Cross-validation
2. Feature selection
3. Regularization
Underfitting
• A model is said to underfit when it cannot capture the underlying trend of the training data, and so it has poor generalization ability.
• It arises when there is too little data but many features to build the model, or when we try to fit a linear model to non-linear data.
• Train the model on the training dataset and then compute the loss on a validation dataset.
• In leave-one-out cross-validation, each observation is held out in turn and used to test a model trained on the other N−1 observations.
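A minimal leave-one-out loop, assuming the quadratic model and the kind of synthetic data from the earlier sketches (dataset and degree are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(-3, 3, 20)
y = 1.0 + 0.5 * x + 0.8 * x**2 + rng.normal(scale=0.5, size=x.shape)

N = len(x)
errors = []
for i in range(N):
    train = np.arange(N) != i                  # hold out observation i
    w = np.polyfit(x[train], y[train], deg=2)  # fit on the other N-1 points
    errors.append((y[i] - np.polyval(w, x[i])) ** 2)

print(f"LOOCV estimate of test MSE: {np.mean(errors):.3f}")
```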
The total error of the model is composed of three terms: the (bias)²,
the variance, and an irreducible error term.
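Written out, this is the standard bias-variance decomposition (with f the true function, f̂ the fitted model, and σ² the noise variance):

```latex
\mathbb{E}\!\left[(y - \hat{f}(x))^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{(\text{bias})^2}
  + \underbrace{\mathbb{E}\!\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```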
Bias vs. Variance
• Fitting weights for predictors that have little predictive power results in a high-variance, low-bias model.
(Figure: weight values assigned to the real and noise features.)
The model assigns non-zero weights to the noise features, even though none of them has any predictive power; the noise features receive weight values similar to some of the real features in the dataset.
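A sketch of this effect, assuming a small illustrative dataset with five real and five pure-noise features (all names and values here are made up for the demo): ordinary least squares still assigns non-zero weights to the noise columns.

```python
import numpy as np

# Five real features with known weights plus five pure-noise features.
rng = np.random.default_rng(3)
n, d_real, d_noise = 15, 5, 5
X = rng.normal(size=(n, d_real + d_noise))
true_w = np.array([2.0, -1.0, 0.5, 1.5, -2.0])
y = X[:, :d_real] @ true_w + rng.normal(scale=0.5, size=n)  # noise columns unused

# Ordinary least squares: w = argmin ||y - Xw||^2
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print("real-feature weights: ", np.round(w[:d_real], 2))
print("noise-feature weights:", np.round(w[d_real:], 2))  # non-zero despite no signal
```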
Regularization
• One way to reduce overfitting is to reduce the number of parameters or features via a feature-selection approach.
• Another way is to penalize large weights by adding a penalty term to the training loss, shown here in the standard L2 (ridge) form:
Loss = Σᵢ (yᵢ − ŷᵢ)² + λ Σⱼ wⱼ²
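A closed-form ridge sketch on the same illustrative data as the previous example, assuming the L2 form of the penalty above: increasing λ shrinks the weights of the noise features toward zero.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d_real, d_noise = 15, 5, 5
X = rng.normal(size=(n, d_real + d_noise))
y = X[:, :d_real] @ np.array([2.0, -1.0, 0.5, 1.5, -2.0]) + rng.normal(scale=0.5, size=n)

# Ridge solution: w = (X^T X + lambda * I)^{-1} X^T y; lambda = 0 is plain OLS.
for lam in (0.0, 10.0):
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
    print(f"lambda = {lam:4.1f}  noise-feature weights:", np.round(w[d_real:], 2))
```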