MAFE208IU-L6 - Least Squares Regression

Chapter 3:
g & Interpolation
Curve Fitting p
Lecture 1:
Least-Squares Regression
Motivation
• Data often given for

discrete values
• Estimates at points
between the discrete
values
1 2 3 4 5 6 7 8 9
Year 1 2 3 4 5 6 7
Profit 2.2 2.5 3.0 3.4 3.8 4.3 4.8
Question: Estimate the profit in the next year

Motivation
Given experimental data: (x1, y1),
) (x2, y2),…,(x
) (xn, yn)
• The relationship between

x and y may not be clear.
clear
• Find a function f(x) that
best fit the data
1 2 3 4 5 6 7 8 9 3
Curve Fitting
Given a set of tabulated data, find a curve or a function
that best represents the data.
Given:
1. The tabulated data
2 The form of the function

2.
3. The curve fitting criteria
Find the unknown coefficients of the function
4
Selection of the Functions
Linear f ( x) = a + bx
Quadratic f ( x) = a + bx + cx 2
n
Polynomial f ( x) = ∑ ak x k
k =0
m
General f ( x ) = ∑ ak g k ( x )
k =0
g k ( x) are known.
5
Decide on the Criterion
1. Least Squares
q Regression:
g
n
minimize Φ = ∑ ( y − f ( x ))
i=1
i i
2
2 E t Matching
2.Exact M t hi (I
(Interpolation):
t l ti )
yi = f ( xi )
6
Least Squares Regression
Linear Regression
Fitting a straight line to a set of paired observations:
((x1, y1), (x
( 2, y2),
),…,(x
,( n, yn)).
y=a+bx+e
Linear f ( x) = a + bx
a-intercept
a intercept
b-slope
e-error or residual
e-error, residual, between the model and the
observations.
7
Determine the Unknowns
We want to find a and b to minimize :

n
Φ ( a , b) = ∑ ( a + bxi − yi ) 2
i =1
How do we obtain a and b to minimize : Φ ( a , b) ?
8
Determine the Unknowns
Necessary condition for the minimum :
∂ Φ ( a, b)
=0
∂a
∂ Φ ( a, b)
=0
∂b
9
Determining the Unknowns
n
∂ Φ ( a , b)
= ∑ 2(a + bxi − yi ) = 0
∂a i =1
n
∂ Φ ( a , b)
= ∑ 2(a + bx
b i − yi )xi = 0
∂b i =1
10
Normal Equations
q
∂ Φ ( a , b) n
= ∑ 2(a + bxi − yi ) = 0
∂a i =1
∂ Φ ( a , b) n
= ∑ 2(a + bxi − yi )xi = 0
∂b i =1
⎛ n ⎞ ⎛ n ⎞
n a + ⎜⎜ ∑ xi ⎟⎟ b = ⎜⎜ ∑ yi ⎟⎟
⎝ i =1 ⎠ ⎝ i =1 ⎠
⎛ n ⎞ ⎛ n 2⎞ ⎛ n ⎞
⎜ ∑ xi ⎟ a + ⎜ ∑ xi ⎟ b = ⎜ ∑ xi y i ⎟
⎜ ⎟ ⎜ ⎟ ⎜ ⎟
⎝ i =1 ⎠ ⎝ i =1 ⎠ ⎝ i =1 ⎠
These equations are
called Normal Equations
11
Example 1
According to a research by a group of biologists, the
chirping rate y (chirps/min) of crickets of a certain
species appears to be related to temperature x ( oF).
x 50 55 60 65 70 75 80 85 90 95
y 22 45 82 95 110 145 183 202 216 238
Question: What is the chirping rate at 100oF?
12
Plot data
13
⎛ n ⎞ ⎛ n ⎞
n a + ⎜ ∑ xk ⎟ b = ⎜ ∑ yk ⎟
Solution ⎝ k =1 ⎠ ⎝ k =1 ⎠
⎛ n ⎞ ⎛ n 2⎞ ⎛ n ⎞
Normal equations: ⎜ ∑ xk ⎟ a + ⎜ ∑ xk ⎟ b = ⎜ ∑ xk yk ⎟
⎝ k =1 ⎠ ⎝ k =1 ⎠ ⎝ k =1 ⎠
10a + 725b = 1338

725a + 54625b = 107105
⇒ a = -221.2, b = 4.897
221.22 + 4.897
y = -221 4 897 x
y (100) = 268
i12
14
Exercise ì
íẶ
The profits y of a company (in millions USD) during
the past seven years x are given in the following table
x 1 2 3 4 5 6 7
y 2.2
Cd 2.5
18 3.0
05 3.4
30 3.8 4.3
00 4.8
6
a)) Plot the data points nên

b) Find the least-squares line that best fit the data. Plot
the line with the data points
c) Use the least-squares line found in part b) to
estimate the profit in the 8th year n a + ⎛ ∑ x ⎞ b = ⎛ ∑ y ⎞
n n
⎜ k ⎟ ⎜ k ⎟
⎝ k =1 ⎠ ⎝ k =1 ⎠
ị ⎛ n
⎜∑ k ⎟x
⎞
a +
⎛ n 2⎞
⎜∑ k ⎟x b =
⎛ n15
⎜∑ k k ⎟
x y
⎞
⎝ k =1 ⎠ ⎝ k =1 ⎠ ⎝ k =1 ⎠
ịỄẸ
Multiple Linear Regression
Example: t 0 1 2 3
x 0.1 0.4 0.2 0.2

Given the following data:
y 3 2 1 2
D t
Determine
i a ffunction
ti off two
t variables:
i bl
f(x,t) = a + b x + c t
That best fits the data with the least sum of the square of errors.
16
Solution of Multiple Linear
R
Regression
i
Construct Φ =
n
t 0 1 2 3
∑ i i i
( y
i =1
− f ( x , t )) 2
x 0.1 0.4 0.2 0.2

D i the
Derive th necessary
conditions by equating the y 3 2 1 2
partial derivatives with respect
to the unknown parameters to
zero then solve the equations.
zero, equations
17
Solution of Multiple Linear
R
Regression
i
n
Φ (a, b, c) = ∑ ( a + bxi + cti − yi ) → min
2
f ( x, t ) = a + bx + ct ,
i =1
N
Necessary conditions:
di i
∂Φ (a, b, c) n
= 2∑ ( a + bxi + cti − yi ) = 0
∂a i =1
∂Φ (a, b, c) n
= 2∑ ( a + bxi + cti − yi ) xi = 0
∂b i =1
∂Φ (a, b, c) n
= 2∑ ( a + bxi + cti − yi ) ti = 0
∂c i =1
18
N
Normal
lEEquations
ti
n n n
a n + b ∑ xi + c ∑ ti = ∑ y i
i =1 i =1 i =1
n n n n
a ∑ xi + b ∑ ( xi ) 2 + c ∑ ( xi ti ) = ∑ ( xi y i )
i =1 i =1 i =1 i =1
n n n n
a ∑ ti + b∑ ( xi ti ) + c ∑ (ti ) = ∑ (ti yi )
2
i =1 i =1 i =1 i =1
Chúnose 19
Example 2: Multiple Linear
R
Regression
i
i 1 2 3 4 Sum
ti 0 1 2 3 6
xi 0.1 0.4 0.2 0.2 0.9
yi 3 2 1 2 8
xi2 0.01 0.16 0.04 0.04 0.25
xi t i 0 0.4 0.4 0.6 1.4
xi yi 0.3 0.8 0.2 0.4 1.7
ti2 0 1 4 9 14
t i yi 0 2 2 6 10
20
E
Example
l 2:
2 S
System
t off Equations
E ti
4a + 0.9b + 6c = 8
0.9a + 0.25b + 1.4c = 1.7
6a + 1.4b + 14c = 10
Solving :
a = 2.9574, b = −1.7021, c = −0.38298
f ( x, t ) = a + bx + ct = 2.9574 − 1.7021 x − 0.38298 t
21
Exercise
Develop a multiple least-square model for these data,
data
and use it to estimate y(1/2, -1)
t -1
1 1 3 4
x -2 0 1 3
y -6 -2 3 6
Normal Equations:
n n n x = ( x1 , x2 ,..., xn ), t = (t1 , t2 ,..., tn )
a n + b ∑ xi + c ∑ ti = ∑ y i y = ( y1 , y2 ,..., yn )
i =1 i =1 i =1 n n n
n n n n a n + b∑ xi + c ∑ ti = ∑ yi
a ∑ x i + b ∑ ( xi ) 2 + c ∑ ( x i t i ) = ∑ ( x i y i ) n
i =1 i =1 i =1
i =1 i =1 i =1 i =1 a ∑ xi + bx 2 + cxt = xy
n n n n i =1
2 n 22
a ∑ ti + bxt + ct 2 = ty
i =1 i =1 i =1 i =1 i =1
Exercise 2
Develop a multiple least-square model for these data,
data
and use it to estimate y(x=3, t=5)
t 0 1 3 4
x 1 2 3 4
y 2 3 5 8
Normal Equations:
n n n x = ( x1 , x2 ,..., xn ), t = (t1 , t2 ,..., tn )
a n + b ∑ xi + c ∑ ti = ∑ y i y = ( y1 , y2 ,..., yn )
i =1 i =1 i =1 n n n
n n n n a n + b∑ xi + c ∑ ti = ∑ yi
a ∑ x i + b ∑ ( xi ) 2 + c ∑ ( x i t i ) = ∑ ( x i y i ) n
i =1 i =1 i =1
i =1 i =1 i =1 i =1 a ∑ xi + bx 2 + cxt = xy
n n n n i =1
2 n 23
a ∑ ti + bxt + ct 2 = ty
i =1 i =1 i =1 i =1 i =1
Nonlinear Least Squares Models and
Methods
Polynomial Regression
Fitting with Nonlinear Functions
Linearization Method
Polynomial Regression
The least squares
q method can be extended to fit the data to a
higher-order polynomial
f ( x) = a + bx
b + cx , e = ( f ( xi ) − yi )
2 2
i
2
n
Minimize Φ (a, b, c) = ∑ ( a + bxi + cx − yi ) 2 2
i
i =1
Necessary conditions:
∂Φ (a, b, c) ∂Φ (a, b, c) ∂Φ (a, b, c)
= 0, = 0, =0
∂a ∂b ∂c
25
Equations for Quadratic
Regression
( )
n 2
Mi i i
Minimize Φ ( a, b, c ) = ∑ b i + cxi2
a + bx − yi
i =1
( )
n
∂Φ ( a , b, c )
= 2∑ a + bxi + cxi − yi = 0
2
∂a i =1
( )
n
∂Φ ( a, b, c )
= 2∑ a + bxi + cxi2 − yi xi = 0
∂b i =1
( )
n
∂Φ ( a, b, c )
= 2∑ a + bxi + cxi − yi xi = 0
2 2
∂c i =1
26
Normal Equations
n n n
a n + b ∑ xi + c ∑ xi2 = ∑ yi
i =1 i =1 i =1
n n n n
a ∑ xi + b ∑ xi2 + c ∑ xi3 = ∑ xi yi
i =1 i =1 i =1 i =1
n n n n
a ∑ xi2 + b ∑ xi3 + c ∑ xi4 = ∑ xi2 yi
i =1 i =1 i =1 i =1
27
Example 3: Polynomial
Regression
Fit a second-order ppolynomial
y to the following
g data
xi 0 1 2 3 4 5 ∑=15
yi 21
2.1 77
7.7 13 6
13.6 27 2
27.2 40 9
40.9 61 1
61.1 ∑=152 6
∑=152.6
xi2 0 1 4 9 16 25 ∑=55
xi3 0 1 8 27 64 125 225
xi4 0 1 16 81 256 625 ∑=979
xi yi 0 7.7 27.2 81.6 163.6 305.5 ∑=585.6
xi2 yi 0 7.7 54.4 244.8 654.4 1527.5 ∑=2488.8
28
Example 3: Equations and
Solution
6 a + 15 b + 55 c = 152.6
15 a + 55 b + 225 c = 585.6
55 a + 225 b + 979 c = 2488.8
Solving
g...
a = 2.4786, b = 2.3593, c = 1.8607
2
f ( x ) = 2.4786 + 2.3593 x + 1.8607 x
29
J stif the best choice
Justify
Given
Gi en two
t o or more functions
f nctions to fit the data,
data
How do you select the best?
Answer :
Determine the parameters for each function,
function
then compute Φ for each one. The function
resulting in smaller Φ (least sum of the squares
of the errors) is the best.
n
Φ= ∑ i i
( y
i =1
− f ( x )) 2
30
Example showing that Quadratic is
preferable
f bl than
th Linear
Li R
Regression
i
y y
x x
Linear Regression Quadratic Regression
31
xi 0.24 0.65 0.95 1.24 1.73 2.01 2.23 2.52
yi 0.23 -0.23 -1.1 -0.45 0.27 0.1 -0.29 0.24
It iis required
i d tto fi
findd a function
f ti off th
the fform, ffor example
l :
f ( x) = a ln( x) + b cos( x) + c e x
to fit the data.
data
n
Φ (a, b, c) = ∑ ( f ( xi ) − yi ) 2
i =1
32
n
Φ ( a, b, c ) = ∑ ( a ln( xi ) + b cos( xi ) + c e xi − yi ) 2
i =1
Necessary condition for the minimum :
∂ Φ ( a, b, c ) ⎫
= 0⎪
∂a
⎪
∂ Φ ( a, b, c ) ⎪
= 0⎬ ⇒ Normal Equations
q
∂b ⎪
∂ Φ ( a, b, c ) ⎪
= 0⎪
∂c ⎭ 33
Normal Equations
n n n n
a ∑ (ln xi ) + b∑ (ln xi )(cos xi ) + c ∑ (ln xi )( e ) = ∑ yi (ln xi )
2 xi
i =1 i =1 i =1 i =1
n n n n
a ∑ (ln xi )(cos xi ) + b∑ (cos xi ) + c ∑ (cos xi )( e ) = ∑ yi (cos xi )
2 xi
i =1 i =1 i =1 i =1
n n n n
a ∑ (ln xi )( e ) + b∑ (cos xi )( e ) + c ∑ ( e ) = ∑ yi ( e xi )
xi xi xi 2
i =1 i =1 i =1 i =1
Evaluate the sums and solve the normal equations.
34
Example 4: Evaluating Sums
xi 0.24 0.65 0.95 1.24 1.73 2.01 2.23 2.52 ∑=11.57
yi 0.23 -0.23 -1.1 -0.45 0.27 0.1 -0.29 0.24 ∑=-1.23
(ln xi)2 2.036 0.1856 0.0026 0.0463 0.3004 0.4874 0.6432 0.8543 ∑=4.556
ln(xi)
( ) cos(xi)
( ) -1.386
.386 -0.3429
0.3 9 -0.0298
0.0 98 0.0699 -0.0869
0.0869 -0.2969
0. 969 -0.4912
0. 9 -0.7514
0.75 ∑=-3.316
∑ 3.3 6
ln(xi) * exi -1.814 -0.8252 -0.1326 0.7433 3.0918 5.2104 7.4585 11.487 ∑=25.219
yi * ln(xi) -0.328 0.0991 0.0564 -0.0968 0.1480 0.0698 -0.2326 0.2218 ∑=-0.0625
( i)2
cos(xi) 0 943
0.943 0 6337
0.6337 0 3384
0.3384 0 1055
0.1055 0 0251
0.0251 0 1808
0.1808 0 3751
0.3751 0 6609
0.6609 ∑ 3 26307
∑=3.26307
cos(xi) * exi 1.235 1.5249 1.5041 1.1224 -0.8942 -3.1735 -5.696 -10.104 ∑=-14.481
yi*cos(xi) 0.223 -0.1831 -0.6399 -0.1462 -0.0428 -0.0425 0.1776 -0.1951 ∑=-0.8485
(exi)2 1.616 3.6693 6.6859 11.941 31.817 55.701 86.488 154.47 ∑=352.39
yi * exi 0.2924 -0.4406 -2.844 -1.555 1.523 0.7463 -2.697 2.9829 ∑=-1.9923
35
Example 4: Equations &
Solution
4.55643 a − 3.31547 b + 25.2192 c = −0.062486
− 3.31547 a + 3.26307 b − 14.4815 c = −0.848514
25.2192 a − 14.4815 b + 352.388 c = −1.992283
Solving the above equations :

a = − 0.88815, b = − 1.1074 , c = 0.012398
Therefore
Therefore,
f ( x ) = −0.88815 ln( x ) − 1.1074 cos( x ) + 0.012398 e x
36
Linearization Method:
Exponential Model
Example: Given data xi 1 2 3
yi 2.4 5 9
Find a function f(x) = ae bx

b
that best fits the data.
( )
n 2
Φ ( a , b) = ∑ ae bxi
− yi
i =1
Normal Equations are obtained using :
( )
n
∂Φ
= 2∑ ae bxi − yi e bxi = 0
∂a i =1
Difficult to Solve
( )
n
∂Φ
= 2∑ ae bxi − yi a xi e bxi = 0
∂b i =1
37
Exponential Model
Find a function f(x) = aebx that best fits the data.
data
Then ln( f ( x)) = ln(a ) + b x
Define z = α + bx
where α = ln((a ) and z = ln(( y )
n
Instead of minimizing: Φ (a, b) = ∑ ae ( )
2
bxi
− yi
i =1
n
Minimize: Φ (α , b) = ∑ (α + bxi − zi ) (Easier to solve)
2
i =1
i=
38
Normal Equations
n
Φ (α , b) = ∑ (α + b xi − zi )2
i =1
Normal Equation
q s are obtained using
g:
n
∂Φ
= 2∑ (α + b xi − zi ) = 0
∂α i =1
n
∂Φ
= 2∑ (α + b xi − zi ) xi = 0
∂b i =1
n n n n n
α n + b ∑ xi = ∑ zi and α ∑ xi + b∑ xi2 = ∑ ( xi zi )
i =1 i =1 i =1 i =1 i =1
39
E l ti S
Evaluating Sums and
dSSolving
l i
xi 1 2 3 ∑=6
yi 2.4 5 9
zi=ln(yi) 0.875469 1.609438 2.197225 ∑=4.68213
xi2 1 4 9 ∑ 14
∑=14
xi zi 0.875469 3.218876 6.591674 ∑=10.6860
Set z = ln y
Normal Equations for x and z: α = ln( a ), a = eα
3 α + 6 b = 44.68213
68213
a = e 0.23897 = 1.26994
6 α + 14 b = 10.686
f ( x ) = ae bx = 1.26994 e 0.66087 x
Solving Equations:
α = 0.23897, b = 0.66087 40
Li
Linearization
i ti M Method:
th d Oth
Other Models
M d l
41
Homework N3. Deadline: 5 weeks
Problem 1: The profit (in millions USD) of a company during the past
seven years are reported in the following table
Time 1 2 3 4 5 6 7
(
(years)
)
Profit 1.2 1.5 1.8 2.m 3.8 4.7 6.n
(millions $)
Find a best-fit equation to the data trend. Try several possibilities-
linear, parabolic, and exponential. Plot all these curves and the data
in the same coordinates. Find the best equation
q to predict
p the pprofit
in two years.
(m-2)(n-2) is the two last digits of your student ID number
42
Homework N3: Problem 2
3. Use linear regression to fit these data by a plane
x 1 2 3 3+1/m 4 6
t 1 2 2+1/n 3 4 5
y 5 12 9 4 3 24
U thi
Use this plane
l to
t estimate
ti t f(2,3)
f(2 3)
43
Homework N3: Problem 3
Given the data
x 0 1 3 4
y 1+1/
1+1/n 22
2.2 45
4.5 10+1/
10+1/m
a)) Find the Newton divided-difference interpolating

p g
polynomial that passes through these data points. Use it
to estimate f(2). Plot the curve.
b) Find the Lagrange interpolating polynomial that passes
through these data points. Use it to estimate f’(2)
(m 2)(n 2) is the two last digits of your student ID number

(m-2)(n-2)
44
Homework N3
Problem 4: Fit the data in the following

g Table with qquadratic
splines, and cubic splines. Use the result to evaluate the
function at x=1.5
x 0 1 2 3
f(x) 1+1/n 2 4-1/m 12
S. Chapra & R.P. Canale, Numerical Methods for

Engineers, McGraw-Hill, 7th ed., 2015:
Pages 575-578
Problems: 20.20, 20.21, 20.25 20.26, 20.32, 20.33
45

MAFE208IU-L6 - Least Squares Regression

Uploaded by

Copyright:

Available Formats

You might also like

MAFE208IU-L6 - Least Squares Regression

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

MAFE208IU-L6 - Least Squares Regression

Uploaded by

Copyright:

Available Formats

Chapter 3:

• Data often given for

Question: Estimate the profit in the next year

• The relationship between

2 The form of the function

3. The curve fitting criteria

Find the unknown coefficients of the function

We want to find a and b to minimize :

How do we obtain a and b to minimize : Φ ( a , b) ?

y 22 45 82 95 110 145 183 202 216 238

Question: What is the chirping rate at 100oF?

10a + 725b = 1338

a)) Plot the data points nên

x 0.1 0.4 0.2 0.2

x 0.1 0.4 0.2 0.2

xi4 0 1 16 81 256 625 ∑=979

xi yi 0 7.7 27.2 81.6 163.6 305.5 ∑=585.6

xi2 yi 0 7.7 54.4 244.8 654.4 1527.5 ∑=2488.8

Linear Regression Quadratic Regression

yi 0.23 -0.23 -1.1 -0.45 0.27 0.1 -0.29 0.24

Evaluate the sums and solve the normal equations.

Solving the above equations :

Find a function f(x) = ae bx

(m-2)(n-2) is the two last digits of your student ID number

3. Use linear regression to fit these data by a plane

a)) Find the Newton divided-difference interpolating

(m 2)(n 2) is the two last digits of your student ID number

Problem 4: Fit the data in the following

S. Chapra & R.P. Canale, Numerical Methods for

You might also like