Chapter 4.1
Learning outcomes
Upon completion of this module, students shall be able to distinguish:
1. Least-squares regression
   - data with significant error or noise
   - the curve does not pass through all data points; it represents the general trend of the data
2. Interpolation
   - data are known to be precise
   - the curve passes through all data points
Least Squares Regression - Introduction
What is regression?
- modelling of the relationship between dependent and independent variables
- finding the curve that best approximates a series of data points
- the curve is an estimate of the trend of the dependent variable
Given n data points (x1, y1), (x2, y2), ..., (xn, yn), find the best fit y = f(x) to the data set. The best fit is generally based on minimising the sum of the squares of the residuals, Sr.
Regression model:
y = a0 + a1x + e
where
a0 - intercept
a1 - slope
e - error, or residual, between the model and the measurement
Ideally, if all the residuals were zero, one would have found an equation in which all the points lie on the model.
The most popular method of minimizing the residuals is the least-squares method, where the estimates of the constants of the model are chosen such that the sum of the squared residuals, Sr, is minimized.
a) Linear Regression
Why is the best fit based on the square of the residuals?
1. Minimize the sum of the residual errors for all available data:
   Σ ei = Σ (yi − a0 − a1xi)   (sums over i = 1 to n)
   This is inadequate: large positive and negative residuals cancel each other.
2. Minimize the sum of the absolute values of the residuals:
   Σ |ei| = Σ |yi − a0 − a1xi|
   This is also inadequate: it does not yield a unique best-fit line.
Best strategy - minimize the sum of the squares of the residuals:
   Sr = Σ ei² = Σ (yi,measured − yi,model)² = Σ (yi − a0 − a1xi)²
Setting ∂Sr/∂a0 = 0 and ∂Sr/∂a1 = 0 and solving gives:
   a1 = (n Σxiyi − Σxi Σyi) / (n Σxi² − (Σxi)²)
   a0 = ȳ − a1x̄
a) Linear Regression - Example
Fit a straight line to the data series (n = 8; the first four points are shown):

i    xi    yi     (xi)²   xiyi
1    10    25     100     250
2    20    70     400     1400
3    30    380    900     11400
4    40    550    1600    22000
...
Σ    360   5135   20400   312850

a1 = (n Σxiyi − Σxi Σyi) / (n Σxi² − (Σxi)²) = (8(312850) − (360)(5135)) / (8(20400) − (360)²) = 19.47024
a0 = ȳ − a1x̄ = 641.875 − 19.47024(45) = −234.2857
y = −234.2857 + 19.47024x
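As a quick check, the slope and intercept formulas can be evaluated directly from the tabulated sums. A minimal Python sketch, using the sums quoted in the example:

```python
# Least-squares slope and intercept from the tabulated sums of the example:
# n = 8, Σxi = 360, Σyi = 5135, Σxi² = 20400, Σxiyi = 312850
n = 8
sum_x, sum_y = 360, 5135
sum_x2, sum_xy = 20400, 312850

a1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
a0 = sum_y / n - a1 * (sum_x / n)                              # intercept = ȳ − a1·x̄

print(round(a1, 5), round(a0, 4))  # → 19.47024 -234.2857
```

Note that only the five sums are needed; the individual data points never enter the formulas directly.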
For a straight line, the sum of the squares of the estimate residuals is:
   Sr = Σ ei² = Σ (yi − a0 − a1xi)²
Standard error of the estimate:
   s_y/x = √(Sr / (n − 2))
• quantifies the spread of the data around the regression line
• used to quantify the 'goodness' of a fit
a) Linear Regression - Quantification of error
Standard deviation of the data about the mean, sy:
   sy = √(St / (n − 1)),  where St = Σ (yi − ȳ)²
Standard error of the estimate, s_y/x, quantifies the spread of the data around the regression line:
   s_y/x = √(Sr / (n − 2)),  where Sr = Σ (yi − a0 − a1xi)²
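To make the two error measures concrete, here is a short Python sketch on a small hypothetical data set (the values are illustrative only, not from the lecture example):

```python
import math

# Hypothetical, nearly linear data (illustrative only)
x = [1, 2, 3, 4, 5]
y = [2.2, 3.9, 6.1, 8.0, 9.8]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# Least-squares fit y = a0 + a1*x
a1 = (n * sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y)) / \
     (n * sum(xi ** 2 for xi in x) - sum(x) ** 2)
a0 = ybar - a1 * xbar

St = sum((yi - ybar) ** 2 for yi in y)                      # spread about the mean
Sr = sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))  # spread about the line

s_y = math.sqrt(St / (n - 1))    # standard deviation of the data
s_yx = math.sqrt(Sr / (n - 2))   # standard error of the estimate
```

For a good fit the data hug the regression line, so Sr < St and s_y/x < sy.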
Coefficient of determination, r²
r² is the difference between the sum of the squares of the data residuals, St, and the sum of the squares of the estimate residuals, Sr, normalized by the sum of the squares of the data residuals:
   r² = (St − Sr) / St,  where St = Σ (yi − ȳ)² and Sr = Σ (yi − a0 − a1xi)²
St − Sr quantifies the improvement (error reduction) due to describing the data in terms of a straight line rather than an average value.
r² represents the fraction of the original uncertainty explained by the model (r² = 1 means the line accounts for all of the variation; r² = 0 means no improvement over the mean).
For the example data, taking point 2 (x2 = 20, y2 = 70): the model prediction is ŷ2 = −234.2857 + 19.47024(20) = 155.12, so (y2 − ȳ)² = 327041 and (y2 − ŷ2)² = 7245. Summing over all points gives Sr = Σ (yi − a0 − a1xi)² = 216118.
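A sketch of the r² computation on a small hypothetical data set (illustrative values, not the lecture example):

```python
# Hypothetical, nearly linear data (illustrative only)
x = [1, 2, 3, 4, 5]
y = [2.2, 3.9, 6.1, 8.0, 9.8]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
a1 = (n * sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y)) / \
     (n * sum(xi ** 2 for xi in x) - sum(x) ** 2)
a0 = ybar - a1 * xbar

St = sum((yi - ybar) ** 2 for yi in y)                      # data residuals
Sr = sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))  # estimate residuals

r2 = (St - Sr) / St  # coefficient of determination
```

Because the hypothetical data are nearly linear, r² comes out close to 1.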
*Non-linear relationships
Some non-linear models can be linearized so that linear regression applies.
Example: the power model
   y = α2 x^β2
Taking the logarithm of both sides gives log y = log α2 + β2 log x, which is a straight line in (log x, log y). In their transformed forms, these models can use linear regression to evaluate the constant coefficients.
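The linearization can be sketched in Python; the values α2 = 3 and β2 = 1.5 and the sample points below are assumptions for illustration:

```python
import math

# Exact power-law data y = α2 * x**β2, with assumed α2 = 3, β2 = 1.5
x = [1.0, 2.0, 4.0, 8.0]
y = [3.0 * xi ** 1.5 for xi in x]

# Transform: log10(y) = log10(α2) + β2 * log10(x)
X = [math.log10(xi) for xi in x]
Y = [math.log10(yi) for yi in y]

n = len(X)
beta2 = (n * sum(a * b for a, b in zip(X, Y)) - sum(X) * sum(Y)) / \
        (n * sum(a ** 2 for a in X) - sum(X) ** 2)  # slope = β2
alpha2 = 10 ** (sum(Y) / n - beta2 * (sum(X) / n))  # 10^intercept = α2
```

Linear regression on the transformed data recovers the original constants, since the transformed data are exactly linear.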
b) Polynomial regression - non-linear model
The least-squares procedure extends to polynomial models, e.g. the second-order polynomial:
   y = a0 + a1x + a2x² + e
For a second-order polynomial, the best fit means minimizing:
   Sr = Σ ei² = Σ (yi − a0 − a1xi − a2xi²)²
In general, for an mth-order polynomial:
   Sr = Σ ei² = Σ (yi − a0 − a1xi − ... − am xi^m)²
The standard error for fitting an mth-order polynomial to n data points is:
   s_y/x = √(Sr / (n − (m + 1)))
because the mth-order polynomial has (m + 1) coefficients.
The coefficients are found by setting the partial derivatives of Sr with respect to each coefficient to zero:
   ∂Sr/∂a0 = −2 Σ (yi − a0 − a1xi − ... − am xi^m)(1) = 0
   ∂Sr/∂a1 = −2 Σ (yi − a0 − a1xi − ... − am xi^m)(xi) = 0
   ...
   ∂Sr/∂am = −2 Σ (yi − a0 − a1xi − ... − am xi^m)(xi^m) = 0
b) Polynomial regression - non-linear model
In general, these equations in matrix form are given by (all sums over i = 1 to n):

   | n        Σxi        ...  Σxi^m      | | a0 |   | Σyi       |
   | Σxi      Σxi²       ...  Σxi^(m+1)  | | a1 | = | Σxiyi     |
   | ...      ...        ...  ...        | | .. |   | ...       |
   | Σxi^m    Σxi^(m+1)  ...  Σxi^(2m)   | | am |   | Σxi^m yi  |
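Assembling and solving this matrix system can be sketched as follows (hypothetical noise-free data; numpy is assumed to be available):

```python
import numpy as np

def polyfit_normal(x, y, m):
    """Fit an mth-order polynomial by solving the normal equations."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    # A[j][k] = Σ xi^(j+k),  b[j] = Σ xi^j * yi
    A = np.array([[np.sum(x ** (j + k)) for k in range(m + 1)]
                  for j in range(m + 1)])
    b = np.array([np.sum(x ** j * y) for j in range(m + 1)])
    return np.linalg.solve(A, b)  # coefficients a0, a1, ..., am

# Hypothetical data generated from y = 1 + 2x + 0.5x² (no noise)
x = [0, 1, 2, 3, 4, 5]
y = [1 + 2 * xi + 0.5 * xi ** 2 for xi in x]
coeffs = polyfit_normal(x, y, 2)
print(coeffs)  # ≈ [1.0, 2.0, 0.5]
```

With noise-free quadratic data the solve recovers the generating coefficients exactly (to floating-point precision).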
Example: fit a second-order polynomial (m = 2) to n = 6 data points, where
   x̄ = 2.5; ȳ = 25.433
   Σxi = 15; Σyi = 152.6; Σxi² = 55; Σxi³ = 225; Σxi⁴ = 979; Σxiyi = 585.6; Σxi²yi = 2488.8
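Plugging the quoted sums into the matrix form gives a 3×3 system; a sketch of solving it (numpy assumed available):

```python
import numpy as np

# Normal equations assembled from the sums above (m = 2, n = 6):
# [ n     Σxi    Σxi²  ] [a0]   [ Σyi     ]
# [ Σxi   Σxi²   Σxi³  ] [a1] = [ Σxiyi   ]
# [ Σxi²  Σxi³   Σxi⁴  ] [a2]   [ Σxi²yi  ]
A = np.array([[6.0,  15.0,  55.0],
              [15.0, 55.0,  225.0],
              [55.0, 225.0, 979.0]])
b = np.array([152.6, 585.6, 2488.8])

a = np.linalg.solve(A, b)
print([round(float(v), 5) for v in a])  # → [2.47857, 2.35929, 1.86071]
```

So the fitted polynomial is y ≈ 2.47857 + 2.35929x + 1.86071x².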