
Polynomial Curve Fitting

BITS F464 – Machine Learning


Navneet Goyal
Department of Computer Science, BITS-Pilani, Pilani Campus, India
Polynomial Curve Fitting

• Seems a very trivial concept!!


• All of us know it well!!
• Why are we discussing it in a Machine Learning course?
• A simple regression problem!!
• It motivates a number of key concepts of ML!!
• Let’s discover…
Polynomial Curve Fitting
• Observe a real-valued input variable x
• Use x to predict the value of a target variable t
• Synthetic data generated from sin(2πx)
• Random noise added to the target values

[Figure: training data, target variable t vs. input variable x]

Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
Polynomial Curve Fitting
• N observations of x:
  x = (x1, ..., xN)T
  t = (t1, ..., tN)T
• Goal is to exploit the training set to predict the value of the target variable t for new values of x
• Inherently a difficult problem

Data Generation:
  N = 10
  Input values spaced uniformly in the range [0, 1]
  Targets generated from sin(2πx) by adding small Gaussian noise
  Such noise is typical of real data, e.g. arising from unobserved variables

[Figure: training data, target variable t vs. input variable x]

Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
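A minimal sketch (not from the slides) of this data-generation step in NumPy; the noise standard deviation of 0.3 is an assumption, since the slides only say "small Gaussian noise":

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

# N = 10 input values spaced uniformly in [0, 1]
x = np.linspace(0.0, 1.0, 10)

# Targets: sin(2*pi*x) corrupted by small Gaussian noise (std = 0.3 assumed)
t = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, size=x.shape)
```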
Polynomial Curve Fitting
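The polynomial model this slide refers to (Bishop, Eq. 1.1) is

$$ y(x, \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \cdots + w_M x^M = \sum_{j=0}^{M} w_j x^j $$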

• M is the order of the polynomial
• Is a higher value of M better? We'll see shortly!
• The coefficients w0, ..., wM are collectively denoted by the vector w
• y(x, w) is a nonlinear function of x, but a linear function of the coefficients w
• Such models are called linear models

[Figure: target variable t vs. input variable x]

Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
Sum-of-Squares Error Function
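The error function this slide shows (Bishop, Eq. 1.2) measures the misfit between y(x, w) and the training targets:

$$ E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left\{ y(x_n, \mathbf{w}) - t_n \right\}^2 $$

Because y(x, w) is linear in w, E(w) is quadratic in w, so its minimizer w* has a unique closed-form solution.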

Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
Polynomial curve fitting

Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
Polynomial curve fitting
• Choice of M??
• Called model selection or model comparison

Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
0th Order Polynomial

Poor representation of sin(2πx)

Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
1st Order Polynomial

Poor representation of sin(2πx)

Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
3rd Order Polynomial

Best Fit to sin(2πx)


Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
9th Order Polynomial

Over-fit: poor representation of sin(2πx)

Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
Polynomial Curve Fitting
• Good generalization is the objective
• How does generalization performance depend on M?
• Consider a separate test set of 100 points
• Calculate E(w*) for both the training data and the test data
• Choose the M that minimizes E(w*) on the test data
• Root Mean Square (RMS) error ERMS (defined below)

  – It is sometimes convenient to use ERMS because the division by N allows us to compare data sets of different sizes on an equal footing
  – The square root ensures that ERMS is measured on the same scale (and in the same units) as the target variable t
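The RMS error referred to in the bullets above (Bishop, Eq. 1.3) is

$$ E_{\mathrm{RMS}} = \sqrt{2 E(\mathbf{w}^*) / N} $$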

Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
Flexibility & Model Complexity

• M = 0: very rigid!! Only 1 parameter to play with!


Flexibility & Model Complexity

• M = 1: not so rigid!! 2 parameters to play with!


French Curves – Optimum Flexibility
Flexibility & Model Complexity
• So what value of M is most suitable?

• Any answers???
Over-fitting
• For small M (0, 1, 2): too inflexible to handle the oscillations of sin(2πx)

• For M = 3 to 8: flexible enough to handle the oscillations of sin(2πx)

• For M = 9: too flexible!! Training error (TE) = 0, but generalization error (GE) is high

Why is this happening?
Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
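A minimal sketch (not from the slides) that reproduces this experiment with NumPy: fit polynomials of increasing order M to a small training set and compare ERMS on the training and test sets. The noise level (0.3) and the random seed are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, noise_std=0.3):
    """n points spaced uniformly in [0, 1]; targets are sin(2*pi*x) plus Gaussian noise."""
    x = np.linspace(0.0, 1.0, n)
    t = np.sin(2 * np.pi * x) + rng.normal(0.0, noise_std, size=n)
    return x, t

def rms_error(w, x, t):
    """Root-mean-square error, equivalent to sqrt(2 E(w*) / N)."""
    y = np.polyval(w, x)                    # evaluate the fitted polynomial at x
    return np.sqrt(np.mean((y - t) ** 2))

x_train, t_train = make_data(10)     # N = 10 training points
x_test,  t_test  = make_data(100)    # separate test set of 100 points

for M in (0, 1, 3, 9):
    w = np.polyfit(x_train, t_train, deg=M)   # unregularized least-squares fit of order M
    print(f"M={M}: train E_RMS = {rms_error(w, x_train, t_train):.3f}, "
          f"test E_RMS = {rms_error(w, x_test, t_test):.3f}")
```

For M = 9 the training error should drop to essentially zero while the test error typically grows, which is exactly the over-fitting behaviour described above.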
Polynomial Coefficients

Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
Data Set Size
M = 9
- The larger the data set, the more complex the model we can afford to fit to the data
- Rough heuristic: the number of data points should be no less than 5 to 10 times the number of adaptive parameters in the model

Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
Over-fitting Problem
Should we limit the number of parameters according to the size of the available training set?

The complexity of the model should depend only on the complexity of the problem!

Least-squares estimation (LSE) represents a specific case of maximum likelihood

Over-fitting is a general property of maximum likelihood

The over-fitting problem can be avoided by adopting the Bayesian approach!
Over-fitting Problem
In the Bayesian approach, the effective number of parameters adapts automatically to the size of the data set

In the Bayesian approach, models can have more parameters than the number of data points

Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
Regularization
Penalize large coefficient values
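The penalized error function this slide introduces (Bishop, Eq. 1.4) adds a term that discourages large coefficients:

$$ \widetilde{E}(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \left\{ y(x_n, \mathbf{w}) - t_n \right\}^2 + \frac{\lambda}{2} \lVert \mathbf{w} \rVert^2 $$

Here λ governs the relative importance of the penalty term; this technique is known as ridge regression in statistics and as weight decay in the neural-network literature.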

Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
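A minimal sketch (not from the slides) of fitting this regularized model in closed form via the regularized normal equations; the data-generation settings and the helper name fit_regularized_poly are assumptions for illustration.

```python
import numpy as np

def fit_regularized_poly(x, t, M, lam):
    """Minimize E~(w) = 1/2 * sum_n (y(x_n, w) - t_n)^2 + lam/2 * ||w||^2 in closed form."""
    Phi = np.vander(x, M + 1, increasing=True)   # design matrix with columns 1, x, ..., x^M
    A = Phi.T @ Phi + lam * np.eye(M + 1)        # regularized normal equations
    return np.linalg.solve(A, Phi.T @ t)         # w[j] is the coefficient of x^j

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, size=10)   # noise std 0.3 assumed

# Larger lambda shrinks the coefficients of the 9th-order fit.
for lam in (0.0, np.exp(-18), 1.0):
    w = fit_regularized_poly(x, t, M=9, lam=lam)
    print(f"lambda = {lam:.2e}: max |w_j| = {np.abs(w).max():.2f}")
```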
Regularization:

Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
Regularization:

Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
Regularization: ERMS vs. ln λ

Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
Polynomial Coefficients

Reference: Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006
Takeaways from Polynomial Curve Fitting
• Concept of over-fitting
• Model Complexity & Flexibility
• Model Selection

We will keep revisiting these from time to time…


Mixture of Distributions/Gaussians
• Real data typically possess an underlying regularity, which we wish to learn
  – But the individual observations are corrupted by random noise
• Data need not come from a single distribution/Gaussian
• Read about Mixture of Gaussians
