
Bias-Variance Trade-off

Sources of Error
• Sources of error in supervised learning models
• Bias – assumptions can prevent us from learning
relationships between features and output
• Variance – sensitivity to intricate details of training
set can lead to modeling noise in the data
• Irreducible Error – noise in the problem itself
• Bias-Variance trade-off: it is difficult to reduce
both of the first two types of error simultaneously
Bias-Variance trade-off

• High variance models (more flexible/complex) are
better at capturing intricate features of the training
set but may easily overfit
• Vary a lot with the training set drawn
• High bias models (simpler) may avoid
overfitting, but may miss important features of the
data that have been assumed away
• For instance, linearity assumptions (great if the true
model is linear, but not otherwise)
Example
• K-Nearest Neighbor
• Low values of K (e.g. K=1) are extremely flexible
but will overfit (high variance, low bias)
• High values of K are less likely to overfit, but may
miss relationships in the data (low variance, high
bias)
• Linear regression
• Tends to have low variance (not so sensitive to the
details of the training set), but
• High bias (if the relationship is non-linear, no
amount of data will fix this)
K = 1 and K = 100
Decomposition
• Suppose the training set is {(x₁, y₁), …, (xₙ, yₙ)}, where we assume
y = f(x) + ε (where ε has 0 mean and variance σ²). We find a
function f̂ to approximate f. It turns out that at any test point x₀:

E[(y₀ − f̂(x₀))²] = Var(f̂(x₀)) + [Bias(f̂(x₀))]² + σ²

• where E[(y₀ − f̂(x₀))²] is the expected test MSE
• (E ranges over different training sets drawn from a
fixed distribution)
• High bias = estimate likely to be far away from the true model
• High variance = estimate varies a lot with the training set

Derivation

[Derivation slide: the expected test MSE splits into three terms —
Irreducible error (σ²), Variance (Var(f̂(x₀))), and Bias (squared)]
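The decomposition can be checked numerically: averaged over many training sets, the squared test error at a fixed point should match variance plus squared bias plus the noise variance. A minimal numpy sketch, where the true function, noise level, sample sizes, and the choice of a straight-line fit are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

sigma = 0.5               # noise standard deviation (assumed)
f = lambda x: x ** 2      # true regression function (assumed)
x0 = 0.8                  # fixed test point
n_sets, n = 2000, 30      # number of training sets, points per set

fhat_x0 = np.empty(n_sets)
sq_err = np.empty(n_sets)
for i in range(n_sets):
    # Draw a fresh training set from the same distribution each time
    x = rng.uniform(0, 1, n)
    y = f(x) + rng.normal(0, sigma, n)
    # Fit a straight line: a deliberately high-bias model for a quadratic truth
    slope, intercept = np.polyfit(x, y, 1)
    fhat_x0[i] = intercept + slope * x0
    # Squared error against a fresh test response at x0
    y0 = f(x0) + rng.normal(0, sigma)
    sq_err[i] = (y0 - fhat_x0[i]) ** 2

variance = fhat_x0.var()                 # Var of the estimate over training sets
bias_sq = (fhat_x0.mean() - f(x0)) ** 2  # squared bias at x0
print("Var + Bias^2 + sigma^2:", variance + bias_sq + sigma ** 2)
print("mean squared test error:", sq_err.mean())  # should be close
```

Here E is approximated by the average over the 2000 simulated training sets, which is exactly the sense in which E "ranges over different training sets drawn from a fixed distribution".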


Training vs. Test MSE’s
In general, the more flexible a method is, the lower
its training MSE will be, i.e. it will "fit" or explain
the training data very well.

However, the test MSE may in fact be higher for a
more flexible method than for a simple approach
like linear regression.
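This gap can be demonstrated by sweeping the flexibility of a single model family. A minimal sketch using polynomial regression on assumed synthetic data (a nonlinear truth with noise): training MSE only falls as the degree grows, while test MSE bottoms out at an intermediate degree and typically rises again.

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative synthetic data with a nonlinear truth (assumed)
f = lambda x: np.sin(2 * x)
x_tr = rng.uniform(0, 3, 40)
y_tr = f(x_tr) + rng.normal(0, 0.4, 40)
x_te = rng.uniform(0, 3, 400)
y_te = f(x_te) + rng.normal(0, 0.4, 400)

train_mse, test_mse = [], []
for deg in range(1, 13):
    # Higher polynomial degree = more flexible model
    coef = np.polyfit(x_tr, y_tr, deg)
    train_mse.append(float(np.mean((y_tr - np.polyval(coef, x_tr)) ** 2)))
    test_mse.append(float(np.mean((y_te - np.polyval(coef, x_te)) ** 2)))

# Training MSE is non-increasing in the degree (nested least-squares models);
# test MSE is minimized at an intermediate degree, not the most flexible one.
print("train:", [round(m, 3) for m in train_mse])
print("test :", [round(m, 3) for m in test_mse])
```

Degree 1 here plays the role of the "simple approach like linear regression": its training MSE is the worst in the family, yet its test MSE can beat a high-degree fit that chases noise.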
Three Examples

• The next three examples illustrate how test MSE
varies with features of the data
• Example 1: In this data set there is a lot of variability in Y and
the relationship is non-linear
• Example 2: The relationship in the data is “close to” linear
• Example 3: Nonlinear relationship with very little variability in
the data
Examples with Different Levels of Flexibility:

Example 1

LEFT:
• Black: truth
• Orange: linear estimate
• Blue: smoothing spline
• Green: smoothing spline (more flexible)
RIGHT:
• Red: test MSE
• Grey: training MSE
• Dashed: minimum possible test MSE (irreducible error)
Examples with Different Levels of Flexibility:
Example 2

LEFT:
• Black: truth
• Orange: linear estimate
• Blue: smoothing spline
• Green: smoothing spline (more flexible)
RIGHT:
• Red: test MSE
• Grey: training MSE
• Dashed: minimum possible test MSE (irreducible error)
Examples with Different Levels of
Flexibility: Example 3

LEFT:
• Black: truth
• Orange: linear estimate
• Blue: smoothing spline
• Green: smoothing spline (more flexible)
RIGHT:
• Red: test MSE
• Grey: training MSE
• Dashed: minimum possible test MSE (irreducible error)
Test MSE, Bias and Variance

[Figure: test MSE, bias, and variance curves plotted against model flexibility]
A Fundamental Picture

In general, training errors will always decline as
flexibility increases.

However, test errors will decline at first (as
reductions in bias dominate) but will then start to
increase again (as increases in variance dominate).

More flexible/complicated is not always better!
