Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 14

Machine Learning

Bias and Variance


Dr. R. Jothi
Underfitting vs. Overfitting
• Underfit: An underfit model is one that isn’t complex enough to
match the available data. It does a poor job of matching the available
data because it can’t replicate all of the observed changes.
• Overfit: An overfit model is one that is too complex for the available
data. The high complexity allows the model to have too many changes
for the data set, creating curves and variations that don’t actually
exist.
Bias in machine learning
• Bias describes how well a model matches the training set.
• A model with high bias won’t match the data set closely, while a
model with low bias will match the data set very closely.
• Bias comes from models that are overly simple and fail to capture the
trends present in the data set.
Variance in machine learning
• Variance describes how much a model changes when you train it
using different portions of your data set.
• Even if the training data changes slightly, the model parameters learnt
will change), it is unstable and said to have a high variance
• Variance comes from models that are highly complex, employing a
significant number of features.
Bias vs. Variance
• Models with high bias will have low variance.
• Models with high variance will have a low bias.
• A model that is not flexible enough to match a data set correctly (High
bias) is also not flexible enough to change dramatically when given a
different data set (Low variance).
Model performance
• How to avoid high variance while maintaining low bias?
High bias, low Low bias, high
variance variance
• The following model suggests more assumptions about the target
functions
A. Model with low bias
B. Model with high bias
• Which of the models tend to have high-bias?
A. Linear regression
B. Decision tree
C. Logistic regression
D. SVM
E. K-NN
F. Linear Discriminant Analysis
• Which of the models tend to have low-variance ?
A. Linear regression
B. Decision tree
C. Logistic regression
D. SVM
E. K-NN
F. Linear Discriminant Analysis
• Examples of high variance machine learning algorithms
A. Decision trees
B. Linear regression
C. SVM
D. Logistic regression
E. Perceptron (single layer)
F. K-NN
• Which type of model is ideal?
A. Low bias and high-variance
B. High bias and low-variance
Error
• Reducible Error
• the element that we can improve
• Bias and Variance
• Irreducible error
• elements outside our control, such as statistical noise in the observations

Model Error = Reducible Error + Irreducible Error

You might also like