Chapter 4.1
Learning outcomes
Upon completion of this module, students shall be able to distinguish:
1. Least-squares regression
   - data with significant error or noise
   - the curve does not pass through all data points; it represents the general trend of the data
2. Interpolation
   - data are known to be precise
   - the curve passes through all data points
Least Squares Regression - Introduction
What is regression?
- modelling of the relationship between dependent and independent variables
- finding the curve that best approximates a series of data points
- the curve is an estimate of the trend of the dependent variable
Given n data points (x1, y1), (x2, y2), ..., (xn, yn), find the best fit y = f(x) to the data set. The best fit is generally based on minimising the sum of the squares of the residuals, Sr.
Regression model:
y = a0 + a1x + e
where
a0 - intercept
a1 - slope
e - error, or residual, between the model and the measurement
Ideally, if all the residuals were zero, one would have found an equation in which all the points lie on the model.
The most popular method of minimizing the residuals is the least-squares method, where the estimates of the constants of the model are chosen such that the sum of the squared residuals, Sr, is minimized.
a) Linear Regression
Why is the best fit based on the square of the residuals?
1. Minimize the sum of the residual errors for all available data:
   Σ ei = Σ (yi − a0 − a1xi)   (sums over i = 1 to n)
   This is inadequate: large positive and negative residuals cancel each other.
2. Minimize the sum of the absolute values of the residuals:
   Σ |ei| = Σ |yi − a0 − a1xi|
   This is also inadequate: it does not yield a unique best-fit line.
Best strategy - minimize the sum of the squares of the residuals:
   Sr = Σ ei² = Σ (yi,measured − yi,model)² = Σ (yi − a0 − a1xi)²
Setting ∂Sr/∂a0 = 0 and ∂Sr/∂a1 = 0 and solving gives:
   a1 = (n Σxiyi − Σxi Σyi) / (n Σxi² − (Σxi)²)
   a0 = ȳ − a1x̄
a) Linear Regression - Example
Fit a straight line to the data series (n = 8; the first four points are shown):

i    xi    yi     (xi)²   xiyi
1    10    25     100     250
2    20    70     400     1400
3    30    380    900     11400
4    40    550    1600    22000
...
Σ    360   5135   20400   312850

a1 = (n Σxiyi − Σxi Σyi) / (n Σxi² − (Σxi)²) = (8(312850) − (360)(5135)) / (8(20400) − (360)²) = 19.47024
a0 = ȳ − a1x̄ = 641.875 − 19.47024(45) = −234.2857
y = −234.2857 + 19.47024x
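As a quick check, the slope and intercept formulas can be evaluated directly from the tabulated sums. A minimal Python sketch, using the sums quoted in the example:

```python
# Least-squares slope and intercept from the tabulated sums of the example:
# n = 8, Σxi = 360, Σyi = 5135, Σxi² = 20400, Σxiyi = 312850
n = 8
sum_x, sum_y = 360, 5135
sum_x2, sum_xy = 20400, 312850

a1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
a0 = sum_y / n - a1 * (sum_x / n)                              # intercept = ȳ − a1·x̄

print(round(a1, 5), round(a0, 4))  # → 19.47024 -234.2857
```

Note that only the five sums are needed; the individual data points never enter the formulas directly.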
For a straight line, the sum of the squares of the estimate residuals is:
   Sr = Σ ei² = Σ (yi − a0 − a1xi)²
Standard error of the estimate:
   s_y/x = √(Sr / (n − 2))
• quantifies the spread of the data around the regression line
• used to quantify the 'goodness' of a fit
a) Linear Regression - Quantification of error
Standard deviation of the data about the mean, sy:
   sy = √(St / (n − 1)),  where St = Σ (yi − ȳ)²
Standard error of the estimate, s_y/x, quantifies the spread of the data around the regression line:
   s_y/x = √(Sr / (n − 2)),  where Sr = Σ (yi − a0 − a1xi)²
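To make the two error measures concrete, here is a short Python sketch on a small hypothetical data set (the values are illustrative only, not from the lecture example):

```python
import math

# Hypothetical, nearly linear data (illustrative only)
x = [1, 2, 3, 4, 5]
y = [2.2, 3.9, 6.1, 8.0, 9.8]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# Least-squares fit y = a0 + a1*x
a1 = (n * sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y)) / \
     (n * sum(xi ** 2 for xi in x) - sum(x) ** 2)
a0 = ybar - a1 * xbar

St = sum((yi - ybar) ** 2 for yi in y)                      # spread about the mean
Sr = sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))  # spread about the line

s_y = math.sqrt(St / (n - 1))    # standard deviation of the data
s_yx = math.sqrt(Sr / (n - 2))   # standard error of the estimate
```

For a good fit the data hug the regression line, so Sr < St and s_y/x < sy.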
Coefficient of determination, r²
r² is the difference between the sum of the squares of the data residuals, St, and the sum of the squares of the estimate residuals, Sr, normalized by the sum of the squares of the data residuals:
   r² = (St − Sr) / St,  where St = Σ (yi − ȳ)² and Sr = Σ (yi − a0 − a1xi)²
St − Sr quantifies the improvement (error reduction) due to describing the data in terms of a straight line rather than an average value.
r² represents the fraction of the original uncertainty explained by the model (r² = 1 means the line accounts for all of the variation; r² = 0 means no improvement over the mean).
For the example data, taking point 2 (x2 = 20, y2 = 70): the model prediction is ŷ2 = −234.2857 + 19.47024(20) = 155.12, so (y2 − ȳ)² = 327041 and (y2 − ŷ2)² = 7245. Summing over all points gives Sr = Σ (yi − a0 − a1xi)² = 216118.
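A sketch of the r² computation on a small hypothetical data set (illustrative values, not the lecture example):

```python
# Hypothetical, nearly linear data (illustrative only)
x = [1, 2, 3, 4, 5]
y = [2.2, 3.9, 6.1, 8.0, 9.8]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
a1 = (n * sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y)) / \
     (n * sum(xi ** 2 for xi in x) - sum(x) ** 2)
a0 = ybar - a1 * xbar

St = sum((yi - ybar) ** 2 for yi in y)                      # data residuals
Sr = sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))  # estimate residuals

r2 = (St - Sr) / St  # coefficient of determination
```

Because the hypothetical data are nearly linear, r² comes out close to 1.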
*Non-linear relationships
Some non-linear models can be linearized so that linear regression applies.
Example: the power model
   y = α2 x^β2
Taking the logarithm of both sides gives log y = log α2 + β2 log x, which is a straight line in (log x, log y). In their transformed forms, these models can use linear regression to evaluate the constant coefficients.
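The linearization can be sketched in Python; the values α2 = 3 and β2 = 1.5 and the sample points below are assumptions for illustration:

```python
import math

# Exact power-law data y = α2 * x**β2, with assumed α2 = 3, β2 = 1.5
x = [1.0, 2.0, 4.0, 8.0]
y = [3.0 * xi ** 1.5 for xi in x]

# Transform: log10(y) = log10(α2) + β2 * log10(x)
X = [math.log10(xi) for xi in x]
Y = [math.log10(yi) for yi in y]

n = len(X)
beta2 = (n * sum(a * b for a, b in zip(X, Y)) - sum(X) * sum(Y)) / \
        (n * sum(a ** 2 for a in X) - sum(X) ** 2)  # slope = β2
alpha2 = 10 ** (sum(Y) / n - beta2 * (sum(X) / n))  # 10^intercept = α2
```

Linear regression on the transformed data recovers the original constants, since the transformed data are exactly linear.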
b) Polynomial regression - non-linear model
The least-squares procedure extends to polynomial models, e.g. the second-order polynomial:
   y = a0 + a1x + a2x² + e
For a second-order polynomial, the best fit means minimizing:
   Sr = Σ ei² = Σ (yi − a0 − a1xi − a2xi²)²
In general, for an mth-order polynomial:
   Sr = Σ ei² = Σ (yi − a0 − a1xi − ... − am xi^m)²
The standard error for fitting an mth-order polynomial to n data points is:
   s_y/x = √(Sr / (n − (m + 1)))
because the mth-order polynomial has (m + 1) coefficients.
The coefficients are found by setting the partial derivatives of Sr with respect to each coefficient to zero:
   ∂Sr/∂a0 = −2 Σ (yi − a0 − a1xi − ... − am xi^m)(1) = 0
   ∂Sr/∂a1 = −2 Σ (yi − a0 − a1xi − ... − am xi^m)(xi) = 0
   ...
   ∂Sr/∂am = −2 Σ (yi − a0 − a1xi − ... − am xi^m)(xi^m) = 0
b) Polynomial regression - non-linear model
In general, these equations in matrix form are given by (all sums over i = 1 to n):

   | n        Σxi        ...  Σxi^m      | | a0 |   | Σyi       |
   | Σxi      Σxi²       ...  Σxi^(m+1)  | | a1 | = | Σxiyi     |
   | ...      ...        ...  ...        | | .. |   | ...       |
   | Σxi^m    Σxi^(m+1)  ...  Σxi^(2m)   | | am |   | Σxi^m yi  |
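Assembling and solving this matrix system can be sketched as follows (hypothetical noise-free data; numpy is assumed to be available):

```python
import numpy as np

def polyfit_normal(x, y, m):
    """Fit an mth-order polynomial by solving the normal equations."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    # A[j][k] = Σ xi^(j+k),  b[j] = Σ xi^j * yi
    A = np.array([[np.sum(x ** (j + k)) for k in range(m + 1)]
                  for j in range(m + 1)])
    b = np.array([np.sum(x ** j * y) for j in range(m + 1)])
    return np.linalg.solve(A, b)  # coefficients a0, a1, ..., am

# Hypothetical data generated from y = 1 + 2x + 0.5x² (no noise)
x = [0, 1, 2, 3, 4, 5]
y = [1 + 2 * xi + 0.5 * xi ** 2 for xi in x]
coeffs = polyfit_normal(x, y, 2)
print(coeffs)  # ≈ [1.0, 2.0, 0.5]
```

With noise-free quadratic data the solve recovers the generating coefficients exactly (to floating-point precision).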
Example: fit a second-order polynomial (m = 2) to n = 6 data points, where
   x̄ = 2.5; ȳ = 25.433
   Σxi = 15; Σyi = 152.6; Σxi² = 55; Σxi³ = 225; Σxi⁴ = 979; Σxiyi = 585.6; Σxi²yi = 2488.8
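Plugging the quoted sums into the matrix form gives a 3×3 system; a sketch of solving it (numpy assumed available):

```python
import numpy as np

# Normal equations assembled from the sums above (m = 2, n = 6):
# [ n     Σxi    Σxi²  ] [a0]   [ Σyi     ]
# [ Σxi   Σxi²   Σxi³  ] [a1] = [ Σxiyi   ]
# [ Σxi²  Σxi³   Σxi⁴  ] [a2]   [ Σxi²yi  ]
A = np.array([[6.0,  15.0,  55.0],
              [15.0, 55.0,  225.0],
              [55.0, 225.0, 979.0]])
b = np.array([152.6, 585.6, 2488.8])

a = np.linalg.solve(A, b)
print([round(float(v), 5) for v in a])  # → [2.47857, 2.35929, 1.86071]
```

So the fitted polynomial is y ≈ 2.47857 + 2.35929x + 1.86071x².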