
Polynomial Curve Fitting

BITS F464 – Machine Learning


Navneet Goyal
Department of Computer Science, BITS-Pilani, Pilani Campus, India
Polynomial Curve Fitting

• Seems a very trivial concept!!


• All of us know it well!!
• Why are we discussing it in a Machine Learning course?
• A simple regression problem!!
• It motivates a number of key concepts of ML!!
• Let’s discover…
Polynomial Curve Fitting
• Observe a real-valued input variable x
• Use x to predict the value of a target variable t
• Synthetic data generated from sin(2πx)
• Random noise added to the target values

[Figure: training data, target variable t vs. input variable x]

Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
Polynomial Curve Fitting
• N observations of x:
  x = (x1, ..., xN)T
  t = (t1, ..., tN)T
• Goal is to exploit the training set to predict the value of the target variable t for new values of x
• Inherently a difficult problem

Data Generation:
  N = 10
  Input values spaced uniformly in the range [0, 1]
  Targets generated from sin(2πx) by adding small Gaussian noise
  Such noise is typical of real data, e.g. arising from unobserved variables

[Figure: training data, target variable t vs. input variable x]

Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
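A minimal sketch (not from the slides) of this data-generation step in NumPy; the noise standard deviation of 0.3 is an assumption, since the slides only say "small Gaussian noise":

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

# N = 10 input values spaced uniformly in [0, 1]
x = np.linspace(0.0, 1.0, 10)

# Targets: sin(2*pi*x) corrupted by small Gaussian noise (std = 0.3 assumed)
t = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, size=x.shape)
```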
Polynomial Curve Fitting
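The polynomial model this slide refers to (Bishop, Eq. 1.1) is

$$ y(x, \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \cdots + w_M x^M = \sum_{j=0}^{M} w_j x^j $$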

• M is the order of the polynomial
• Is a higher value of M better? We'll see shortly!
• The coefficients w0, ..., wM are collectively denoted by the vector w
• y(x, w) is a nonlinear function of x, but a linear function of the coefficients w
• Such models are called linear models

[Figure: target variable t vs. input variable x]

Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
Sum-of-Squares Error Function
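The error function this slide shows (Bishop, Eq. 1.2) measures the misfit between y(x, w) and the training targets:

$$ E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left\{ y(x_n, \mathbf{w}) - t_n \right\}^2 $$

Because y(x, w) is linear in w, E(w) is quadratic in w, so its minimizer w* has a unique closed-form solution.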

Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
Polynomial curve fitting

Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
Polynomial curve fitting
• Choice of M??
• Called model selection or model comparison

Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
0th Order Polynomial

Poor representation of sin(2πx)

Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
1st Order Polynomial

Poor representation of sin(2πx)

Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
3rd Order Polynomial

Best Fit to sin(2πx)


Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
9th Order Polynomial

Over-fit: poor representation of sin(2πx)

Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
Polynomial Curve Fitting
• Good generalization is the objective
• How does generalization performance depend on M?
• Consider a separate test set of 100 points
• Calculate E(w*) for both the training data and the test data
• Choose the M that minimizes E(w*) on the test data
• Root Mean Square (RMS) error ERMS (defined below)

  – It is sometimes convenient to use ERMS because the division by N allows us to compare data sets of different sizes on an equal footing
  – The square root ensures that ERMS is measured on the same scale (and in the same units) as the target variable t
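The RMS error referred to in the bullets above (Bishop, Eq. 1.3) is

$$ E_{\mathrm{RMS}} = \sqrt{2 E(\mathbf{w}^*) / N} $$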

Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
Flexibility & Model Complexity

• M = 0: very rigid!! Only 1 parameter to play with!


Flexibility & Model Complexity

• M = 1: not so rigid!! 2 parameters to play with!


French Curves – Optimum Flexibility
Flexibility & Model Complexity
• So what value of M is most suitable?

• Any answers???
Over-fitting
• For small M (0, 1, 2): too inflexible to handle the oscillations of sin(2πx)

• For M = 3 to 8: flexible enough to handle the oscillations of sin(2πx)

• For M = 9: too flexible!! Training error (TE) = 0, but generalization error (GE) is high

Why is this happening?
Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
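A minimal sketch (not from the slides) that reproduces this experiment with NumPy: fit polynomials of increasing order M to a small training set and compare ERMS on the training and test sets. The noise level (0.3) and the random seed are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, noise_std=0.3):
    """n points spaced uniformly in [0, 1]; targets are sin(2*pi*x) plus Gaussian noise."""
    x = np.linspace(0.0, 1.0, n)
    t = np.sin(2 * np.pi * x) + rng.normal(0.0, noise_std, size=n)
    return x, t

def rms_error(w, x, t):
    """Root-mean-square error, equivalent to sqrt(2 E(w*) / N)."""
    y = np.polyval(w, x)                    # evaluate the fitted polynomial at x
    return np.sqrt(np.mean((y - t) ** 2))

x_train, t_train = make_data(10)     # N = 10 training points
x_test,  t_test  = make_data(100)    # separate test set of 100 points

for M in (0, 1, 3, 9):
    w = np.polyfit(x_train, t_train, deg=M)   # unregularized least-squares fit of order M
    print(f"M={M}: train E_RMS = {rms_error(w, x_train, t_train):.3f}, "
          f"test E_RMS = {rms_error(w, x_test, t_test):.3f}")
```

For M = 9 the training error should drop to essentially zero while the test error typically grows, which is exactly the over-fitting behaviour described above.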
Polynomial Coefficients

Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
Data Set Size
M = 9
- The larger the data set, the more complex the model we can afford to fit to the data
- Rough heuristic: the number of data points should be no less than 5 to 10 times the number of adaptive parameters in the model

Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
Over-fitting Problem
Should we limit the number of parameters according to the size of the available training set?

The complexity of the model should depend only on the complexity of the problem!

Least-squares estimation (LSE) represents a specific case of maximum likelihood

Over-fitting is a general property of maximum likelihood

The over-fitting problem can be avoided by adopting the Bayesian approach!
Over-fitting Problem
In the Bayesian approach, the effective number of parameters adapts automatically to the size of the data set

In the Bayesian approach, models can have more parameters than the number of data points

Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
Regularization
Penalize large coefficient values
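The penalized error function this slide introduces (Bishop, Eq. 1.4) adds a term that discourages large coefficients:

$$ \widetilde{E}(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left\{ y(x_n, \mathbf{w}) - t_n \right\}^2 + \frac{\lambda}{2} \lVert \mathbf{w} \rVert^2 $$

Here λ governs the relative importance of the penalty term; this technique is known as ridge regression in statistics and as weight decay in the neural-network literature.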

Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
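A minimal sketch (not from the slides) of fitting this regularized model in closed form via the regularized normal equations; the data-generation settings and the helper name fit_regularized_poly are assumptions for illustration.

```python
import numpy as np

def fit_regularized_poly(x, t, M, lam):
    """Minimize E~(w) = 1/2 * sum_n (y(x_n, w) - t_n)^2 + lam/2 * ||w||^2 in closed form."""
    Phi = np.vander(x, M + 1, increasing=True)   # design matrix with columns 1, x, ..., x^M
    A = Phi.T @ Phi + lam * np.eye(M + 1)        # regularized normal equations
    return np.linalg.solve(A, Phi.T @ t)         # w[j] is the coefficient of x^j

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, size=10)   # noise std 0.3 assumed

# Larger lambda shrinks the coefficients of the 9th-order fit.
for lam in (0.0, np.exp(-18), 1.0):
    w = fit_regularized_poly(x, t, M=9, lam=lam)
    print(f"lambda = {lam:.2e}: max |w_j| = {np.abs(w).max():.2f}")
```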
Regularization:

Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
Regularization:

Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
Regularization: ERMS vs. ln λ

Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
Polynomial Coefficients

Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
Takeaways from Polynomial Curve Fitting
• Concept of over-fitting
• Model Complexity & Flexibility
• Model Selection

We will keep revisiting these from time to time…


Mixture of Distributions/Gaussians
• Real data typically possess an underlying regularity, which we wish to learn
  – But the individual observations are corrupted by random noise
• Data need not come from a single distribution/Gaussian
• Read about Mixture of Gaussians
