MAFE208IU-L6 - Least Squares Regression
Curve Fitting & Interpolation
Lecture 1:
Least-Squares Regression
Motivation
Year 1 2 3 4 5 6 7
Profit 2.2 2.5 3.0 3.4 3.8 4.3 4.8
Curve Fitting
Given a set of tabulated data, find a curve or a function
that best represents the data.
Given:
1. The tabulated data
Selection of the Functions
Linear:      $f(x) = a + bx$
Quadratic:   $f(x) = a + bx + cx^2$
Polynomial:  $f(x) = \sum_{k=0}^{n} a_k x^k$
General:     $f(x) = \sum_{k=0}^{m} a_k g_k(x)$, where the functions $g_k(x)$ are known.
Decide on the Criterion
1. Least Squares Regression:
$$\text{minimize } \Phi = \sum_{i=1}^{n} \left( y_i - f(x_i) \right)^2$$
2. Exact Matching (Interpolation):
$$y_i = f(x_i)$$
Least Squares Regression
Linear Regression
Fitting a straight line to a set of paired observations:
$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.
$$y = a + bx + e$$
Linear: $f(x) = a + bx$
a: intercept
b: slope
e: error, or residual, between the model and the observations.
Determine the Unknowns
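With $f(x) = a + bx$, the least-squares criterion becomes
$$\Phi(a, b) = \sum_{i=1}^{n} (a + b x_i - y_i)^2$$
Necessary conditions for the minimum:
$$\frac{\partial \Phi(a, b)}{\partial a} = 0, \qquad \frac{\partial \Phi(a, b)}{\partial b} = 0$$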
Determining the Unknowns
$$\frac{\partial \Phi(a, b)}{\partial a} = \sum_{i=1}^{n} 2 (a + b x_i - y_i) = 0$$
$$\frac{\partial \Phi(a, b)}{\partial b} = \sum_{i=1}^{n} 2 (a + b x_i - y_i)\, x_i = 0$$
Normal Equations
Setting both partial derivatives to zero and collecting the terms in $a$ and $b$ gives:
$$n\,a + \left( \sum_{i=1}^{n} x_i \right) b = \sum_{i=1}^{n} y_i$$
$$\left( \sum_{i=1}^{n} x_i \right) a + \left( \sum_{i=1}^{n} x_i^2 \right) b = \sum_{i=1}^{n} x_i y_i$$
These equations are called the Normal Equations.
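The two normal equations form a 2x2 linear system, so the line can be computed directly from the sums. The following is a minimal sketch, not part of the original slides: the function name `fit_line` is a hypothetical helper, and the data are the Year/Profit values from the Motivation slide.

```python
def fit_line(xs, ys):
    """Solve the 2x2 normal equations for f(x) = a + b*x."""
    n = len(xs)
    sx = sum(xs)                                  # sum of x_i
    sy = sum(ys)                                  # sum of y_i
    sxx = sum(x * x for x in xs)                  # sum of x_i^2
    sxy = sum(x * y for x, y in zip(xs, ys))      # sum of x_i * y_i
    det = n * sxx - sx * sx                       # determinant of the system
    if det == 0:
        raise ValueError("all x values coincide; the line is not unique")
    b = (n * sxy - sx * sy) / det                 # slope
    a = (sy - b * sx) / n                         # intercept, from the first equation
    return a, b


# Year/Profit table from the Motivation slide
years = [1, 2, 3, 4, 5, 6, 7]
profit = [2.2, 2.5, 3.0, 3.4, 3.8, 4.3, 4.8]
a, b = fit_line(years, profit)
print(f"f(x) = {a:.4f} + {b:.4f} x")
```

Solving the first equation for a after b is known avoids a second determinant and keeps the code close to the hand calculation.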
Example 1
According to research by a group of biologists, the chirping rate y (chirps/min) of crickets of a certain species appears to be related to the temperature x (°F).
x 50 55 60 65 70 75 80 85 90 95
Plot data
[Figure: plot of the data]
Solution
Normal equations:
$$n\,a + \left( \sum_{k=1}^{n} x_k \right) b = \sum_{k=1}^{n} y_k$$
$$\left( \sum_{k=1}^{n} x_k \right) a + \left( \sum_{k=1}^{n} x_k^2 \right) b = \sum_{k=1}^{n} x_k y_k$$
Exercise
The profits y of a company (in millions USD) during the past seven years x are given in the following table:
x  1    2    3    4    5    6    7
y  2.2  2.5  3.0  3.4  3.8  4.3  4.8
Normal equations:
$$n\,a + \left( \sum_{k=1}^{n} x_k \right) b = \sum_{k=1}^{n} y_k$$
$$\left( \sum_{k=1}^{n} x_k \right) a + \left( \sum_{k=1}^{n} x_k^2 \right) b = \sum_{k=1}^{n} x_k y_k$$
Multiple Linear Regression
Example: t 0 1 2 3
Determine a function of two variables:
$$f(x, t) = a + bx + ct$$
that best fits the data with the least sum of the squares of the errors.
Solution of Multiple Linear Regression
Construct
$$\Phi = \sum_{i=1}^{n} \left( y_i - f(x_i, t_i) \right)^2$$
With $f(x, t) = a + bx + ct$:
$$\Phi(a, b, c) = \sum_{i=1}^{n} (a + b x_i + c t_i - y_i)^2 \to \min$$
Necessary conditions:
$$\frac{\partial \Phi(a, b, c)}{\partial a} = 2 \sum_{i=1}^{n} (a + b x_i + c t_i - y_i) = 0$$
$$\frac{\partial \Phi(a, b, c)}{\partial b} = 2 \sum_{i=1}^{n} (a + b x_i + c t_i - y_i)\, x_i = 0$$
$$\frac{\partial \Phi(a, b, c)}{\partial c} = 2 \sum_{i=1}^{n} (a + b x_i + c t_i - y_i)\, t_i = 0$$
Normal Equations
$$a\,n + b \sum_{i=1}^{n} x_i + c \sum_{i=1}^{n} t_i = \sum_{i=1}^{n} y_i$$
$$a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2 + c \sum_{i=1}^{n} x_i t_i = \sum_{i=1}^{n} x_i y_i$$
$$a \sum_{i=1}^{n} t_i + b \sum_{i=1}^{n} x_i t_i + c \sum_{i=1}^{n} t_i^2 = \sum_{i=1}^{n} t_i y_i$$
Example 2: Multiple Linear Regression
i        1     2     3     4     Sum
t_i      0     1     2     3     6
x_i      0.1   0.4   0.2   0.2   0.9
y_i      3     2     1     2     8
x_i^2    0.01  0.16  0.04  0.04  0.25
x_i t_i  0     0.4   0.4   0.6   1.4
x_i y_i  0.3   0.8   0.2   0.4   1.7
t_i^2    0     1     4     9     14
t_i y_i  0     2     2     6     10
Example 2: System of Equations
4a + 0.9b + 6c = 8
0.9a + 0.25b + 1.4c = 1.7
6a + 1.4b + 14c = 10
Solving:
a = 2.9574, b = −1.7021, c = −0.38298
$$f(x, t) = a + bx + ct = 2.9574 - 1.7021\,x - 0.38298\,t$$
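The sums in the Example 2 table and the resulting 3x3 system can be reproduced in a few lines. This is an illustrative sketch only: `det3` and `fit_plane` are hypothetical helper names, and Cramer's rule is used here simply because the system is small; the slides do not say which solution method was used.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(xs, ts, ys):
    """Least-squares coefficients (a, b, c) of f(x, t) = a + b*x + c*t."""
    n = len(ys)
    sx, st, sy = sum(xs), sum(ts), sum(ys)
    sxx = sum(x * x for x in xs)
    stt = sum(t * t for t in ts)
    sxt = sum(x * t for x, t in zip(xs, ts))
    sxy = sum(x * y for x, y in zip(xs, ys))
    sty = sum(t * y for t, y in zip(ts, ys))
    # Coefficient matrix and right-hand side of the normal equations
    A = [[n, sx, st], [sx, sxx, sxt], [st, sxt, stt]]
    rhs = [sy, sxy, sty]
    d = det3(A)
    if abs(d) < 1e-12:
        raise ValueError("normal equations are singular for these data")
    coeffs = []
    for col in range(3):
        # Cramer's rule: replace one column of A by the right-hand side
        Ak = [row[:] for row in A]
        for r in range(3):
            Ak[r][col] = rhs[r]
        coeffs.append(det3(Ak) / d)
    return tuple(coeffs)


# Data of Example 2
ts = [0, 1, 2, 3]
xs = [0.1, 0.4, 0.2, 0.2]
ys = [3, 2, 1, 2]
a, b, c = fit_plane(xs, ts, ys)
print(f"f(x, t) = {a:.4f} + ({b:.4f}) x + ({c:.5f}) t")
```

For more than three unknowns, Gaussian elimination or a library solver would be the more practical choice; Cramer's rule is shown only because it mirrors a hand calculation.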
Exercise
Develop a multiple least-squares model for these data, and use it to estimate y(1/2, -1).
t  -1   1   3   4
x  -2   0   1   3
y  -6  -2   3   6
Normal Equations:
$$a\,n + b \sum_{i=1}^{n} x_i + c \sum_{i=1}^{n} t_i = \sum_{i=1}^{n} y_i$$
$$a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2 + c \sum_{i=1}^{n} x_i t_i = \sum_{i=1}^{n} x_i y_i$$
$$a \sum_{i=1}^{n} t_i + b \sum_{i=1}^{n} x_i t_i + c \sum_{i=1}^{n} t_i^2 = \sum_{i=1}^{n} t_i y_i$$
In vector notation, with $x = (x_1, x_2, \ldots, x_n)$, $t = (t_1, t_2, \ldots, t_n)$, $y = (y_1, y_2, \ldots, y_n)$ and products such as $x \cdot t = \sum_{i=1}^{n} x_i t_i$:
$$a\,n + b \sum_{i=1}^{n} x_i + c \sum_{i=1}^{n} t_i = \sum_{i=1}^{n} y_i$$
$$a \sum_{i=1}^{n} x_i + b\,(x \cdot x) + c\,(x \cdot t) = x \cdot y$$
$$a \sum_{i=1}^{n} t_i + b\,(x \cdot t) + c\,(t \cdot t) = t \cdot y$$
Exercise 2
Develop a multiple least-squares model for these data, and use it to estimate y(x=3, t=5).
t 0 1 3 4
x 1 2 3 4
y 2 3 5 8
Normal Equations:
$$a\,n + b \sum_{i=1}^{n} x_i + c \sum_{i=1}^{n} t_i = \sum_{i=1}^{n} y_i$$
$$a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2 + c \sum_{i=1}^{n} x_i t_i = \sum_{i=1}^{n} x_i y_i$$
$$a \sum_{i=1}^{n} t_i + b \sum_{i=1}^{n} x_i t_i + c \sum_{i=1}^{n} t_i^2 = \sum_{i=1}^{n} t_i y_i$$
Nonlinear Least Squares Models and Methods
Polynomial Regression
Fitting with Nonlinear Functions
Linearization Method
Polynomial Regression
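The least squares method can be extended to fit the data to a higher-order polynomial:
$$f(x) = a + bx + cx^2, \qquad e_i^2 = \left( f(x_i) - y_i \right)^2$$
$$\text{Minimize } \Phi(a, b, c) = \sum_{i=1}^{n} \left( a + b x_i + c x_i^2 - y_i \right)^2$$
Necessary conditions:
$$\frac{\partial \Phi(a, b, c)}{\partial a} = 0, \qquad \frac{\partial \Phi(a, b, c)}{\partial b} = 0, \qquad \frac{\partial \Phi(a, b, c)}{\partial c} = 0$$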
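Equations for Quadratic Regression
$$\text{Minimize } \Phi(a, b, c) = \sum_{i=1}^{n} \left( a + b x_i + c x_i^2 - y_i \right)^2$$
$$\frac{\partial \Phi(a, b, c)}{\partial a} = 2 \sum_{i=1}^{n} \left( a + b x_i + c x_i^2 - y_i \right) = 0$$
$$\frac{\partial \Phi(a, b, c)}{\partial b} = 2 \sum_{i=1}^{n} \left( a + b x_i + c x_i^2 - y_i \right) x_i = 0$$
$$\frac{\partial \Phi(a, b, c)}{\partial c} = 2 \sum_{i=1}^{n} \left( a + b x_i + c x_i^2 - y_i \right) x_i^2 = 0$$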
Normal Equations
$$a\,n + b \sum_{i=1}^{n} x_i + c \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} y_i$$
$$a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2 + c \sum_{i=1}^{n} x_i^3 = \sum_{i=1}^{n} x_i y_i$$
$$a \sum_{i=1}^{n} x_i^2 + b \sum_{i=1}^{n} x_i^3 + c \sum_{i=1}^{n} x_i^4 = \sum_{i=1}^{n} x_i^2 y_i$$
Example 3: Polynomial Regression
Fit a second-order polynomial to the following data:
x_i    0    1    2     3     4     5     ∑=15
y_i    2.1  7.7  13.6  27.2  40.9  61.1  ∑=152.6
x_i^2  0    1    4     9     16    25    ∑=55
x_i^3  0    1    8     27    64    125   ∑=225
Example 3: Equations and Solution
6 a + 15 b + 55 c = 152.6
15 a + 55 b + 225 c = 585.6
55 a + 225 b + 979 c = 2488.8
Solving:
a = 2.4786, b = 2.3593, c = 1.8607
$$f(x) = 2.4786 + 2.3593\,x + 1.8607\,x^2$$
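The same pattern extends to any polynomial degree: entry (j, k) of the coefficient matrix is the sum of x_i^(j+k), and entry j of the right-hand side is the sum of x_i^j y_i. The sketch below is an illustration under that assumption; `solve` and `fit_polynomial` are hypothetical helper names, and Gaussian elimination with partial pivoting is one reasonable way to solve the system, not necessarily the one used in the lecture.

```python
def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    M = [list(row) + [r] for row, r in zip(A, rhs)]    # augmented matrix
    for col in range(n):
        # Move the row with the largest pivot candidate into position
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (M[r][n] - s) / M[r][r]
    return sol


def fit_polynomial(xs, ys, degree):
    """Coefficients [a_0, ..., a_degree] from the polynomial normal equations."""
    m = degree + 1
    power_sums = [sum(x ** p for x in xs) for p in range(2 * degree + 1)]
    A = [[power_sums[j + k] for k in range(m)] for j in range(m)]
    rhs = [sum((x ** j) * y for x, y in zip(xs, ys)) for j in range(m)]
    return solve(A, rhs)


# Data of Example 3
xs = [0, 1, 2, 3, 4, 5]
ys = [2.1, 7.7, 13.6, 27.2, 40.9, 61.1]
a, b, c = fit_polynomial(xs, ys, 2)
print(f"f(x) = {a:.4f} + {b:.4f} x + {c:.4f} x^2")
```

For high degrees the normal-equation matrix becomes ill-conditioned, so this direct approach is best kept to low-order fits such as the quadratic here.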
Justify the best choice
Given two or more functions to fit the data, how do you select the best?
Answer:
Determine the parameters for each function, then compute Φ for each one. The function resulting in the smallest Φ (the least sum of the squares of the errors) is the best.
$$\Phi = \sum_{i=1}^{n} \left( y_i - f(x_i) \right)^2$$
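As a small illustration of this criterion, the sketch below computes Φ for a straight line and for the quadratic of Example 3 on the Example 3 data. The helper name `sum_sq_errors` is hypothetical; the linear coefficients are computed from the linear normal equations, and the quadratic coefficients are the rounded values quoted in Example 3.

```python
def sum_sq_errors(f, xs, ys):
    """Phi = sum over i of (y_i - f(x_i))^2."""
    return sum((y - f(x)) ** 2 for x, y in zip(xs, ys))


# Data of Example 3
xs = [0, 1, 2, 3, 4, 5]
ys = [2.1, 7.7, 13.6, 27.2, 40.9, 61.1]

# Straight-line fit from the linear normal equations
n = len(xs)
sx, sy = sum(xs), sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))
b1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
a1 = (sy - b1 * sx) / n


def linear(x):
    return a1 + b1 * x


def quadratic(x):
    # Rounded coefficients quoted in Example 3
    return 2.4786 + 2.3593 * x + 1.8607 * x * x


phi_linear = sum_sq_errors(linear, xs, ys)
phi_quadratic = sum_sq_errors(quadratic, xs, ys)
best = "quadratic" if phi_quadratic < phi_linear else "linear"
print(f"Phi(linear) = {phi_linear:.3f}, Phi(quadratic) = {phi_quadratic:.3f}")
print("smaller Phi:", best)
```

Note that a model with more free parameters can never give a larger Φ on the same data when it contains the simpler model as a special case, so Φ alone tends to favour the more flexible function.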
Example showing that Quadratic is preferable to Linear Regression
[Figure: two plots of y against x]
Fitting with Nonlinear Functions
xi 0.24 0.65 0.95 1.24 1.73 2.01 2.23 2.52
It is required to find a function of the form, for example:
$$f(x) = a \ln(x) + b \cos(x) + c\,e^{x}$$
to fit the data.
$$\Phi(a, b, c) = \sum_{i=1}^{n} \left( f(x_i) - y_i \right)^2$$
Fitting with Nonlinear Functions
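$$\Phi(a, b, c) = \sum_{i=1}^{n} \left( a \ln(x_i) + b \cos(x_i) + c\,e^{x_i} - y_i \right)^2$$
Necessary conditions for the minimum:
$$\frac{\partial \Phi(a, b, c)}{\partial a} = 0, \qquad \frac{\partial \Phi(a, b, c)}{\partial b} = 0, \qquad \frac{\partial \Phi(a, b, c)}{\partial c} = 0 \quad \Rightarrow \quad \text{Normal Equations}$$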
Normal Equations
$$a \sum_{i=1}^{n} (\ln x_i)^2 + b \sum_{i=1}^{n} (\ln x_i)(\cos x_i) + c \sum_{i=1}^{n} (\ln x_i)(e^{x_i}) = \sum_{i=1}^{n} y_i (\ln x_i)$$
$$a \sum_{i=1}^{n} (\ln x_i)(\cos x_i) + b \sum_{i=1}^{n} (\cos x_i)^2 + c \sum_{i=1}^{n} (\cos x_i)(e^{x_i}) = \sum_{i=1}^{n} y_i (\cos x_i)$$
$$a \sum_{i=1}^{n} (\ln x_i)(e^{x_i}) + b \sum_{i=1}^{n} (\cos x_i)(e^{x_i}) + c \sum_{i=1}^{n} (e^{x_i})^2 = \sum_{i=1}^{n} y_i (e^{x_i})$$
Example 4: Evaluating Sums
x_i               0.24    0.65     0.95     1.24     1.73     2.01     2.23     2.52     ∑=11.57
y_i               0.23    -0.23    -1.1     -0.45    0.27     0.1      -0.29    0.24     ∑=-1.23
(ln x_i)^2        2.036   0.1856   0.0026   0.0463   0.3004   0.4874   0.6432   0.8543   ∑=4.556
ln(x_i) cos(x_i)  -1.386  -0.3429  -0.0298  0.0699   -0.0869  -0.2969  -0.4912  -0.7514  ∑=-3.316
ln(x_i) e^(x_i)   -1.814  -0.8252  -0.1326  0.7433   3.0918   5.2104   7.4585   11.487   ∑=25.219
y_i ln(x_i)       -0.328  0.0991   0.0564   -0.0968  0.1480   0.0698   -0.2326  0.2218   ∑=-0.0625
cos(x_i)^2        0.943   0.6337   0.3384   0.1055   0.0251   0.1808   0.3751   0.6609   ∑=3.26307
cos(x_i) e^(x_i)  1.235   1.5249   1.5041   1.1224   -0.8942  -3.1735  -5.696   -10.104  ∑=-14.481
y_i cos(x_i)      0.223   -0.1831  -0.6399  -0.1462  -0.0428  -0.0425  0.1776   -0.1951  ∑=-0.8485
(e^(x_i))^2       1.616   3.6693   6.6859   11.941   31.817   55.701   86.488   154.47   ∑=352.39
y_i e^(x_i)       0.2924  -0.4406  -2.844   -1.555   1.523    0.7463   -2.697   2.9829   ∑=-1.9923
Example 4: Equations & Solution
4.55643 a − 3.31547 b + 25.2192 c = −0.062486
−3.31547 a + 3.26307 b − 14.4815 c = −0.848514
25.2192 a − 14.4815 b + 352.388 c = −1.992283
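Because the model is linear in a, b and c, the coefficient matrix of Example 4 can be assembled mechanically for any list of known basis functions g_k(x). The sketch below is illustrative and assumes nothing beyond the Example 4 data; `normal_equations` is a hypothetical helper name. It only builds the system, which can then be handed to any linear solver.

```python
import math


def normal_equations(basis, xs, ys):
    """Return (N, r) with N[j][k] = sum g_j(x_i) g_k(x_i) and r[j] = sum y_i g_j(x_i)."""
    G = [[g(x) for g in basis] for x in xs]        # one row per data point
    m = len(basis)
    N = [[sum(row[j] * row[k] for row in G) for k in range(m)] for j in range(m)]
    r = [sum(row[j] * y for row, y in zip(G, ys)) for j in range(m)]
    return N, r


# Data of Example 4
xs = [0.24, 0.65, 0.95, 1.24, 1.73, 2.01, 2.23, 2.52]
ys = [0.23, -0.23, -1.1, -0.45, 0.27, 0.1, -0.29, 0.24]

# Basis functions of f(x) = a ln(x) + b cos(x) + c e^x
N, r = normal_equations([math.log, math.cos, math.exp], xs, ys)
for row, rhs in zip(N, r):
    print("  ".join(f"{v:10.5f}" for v in row), " | ", f"{rhs:10.6f}")
```

Swapping in a different list of basis functions, for example `[lambda x: 1.0, lambda x: x]`, reproduces the straight-line normal equations with the same code.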
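Linearization Method: Exponential Model
Example: Given data
x_i  1    2  3
y_i  2.4  5  9
$$\Phi(a, b) = \sum_{i=1}^{n} \left( a e^{b x_i} - y_i \right)^2$$
Normal Equations are obtained using:
$$\frac{\partial \Phi}{\partial a} = 2 \sum_{i=1}^{n} \left( a e^{b x_i} - y_i \right) e^{b x_i} = 0$$
$$\frac{\partial \Phi}{\partial b} = 2 \sum_{i=1}^{n} \left( a e^{b x_i} - y_i \right) a\, x_i\, e^{b x_i} = 0$$
These equations are difficult to solve.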
Exponential Model
Find a function $f(x) = a e^{bx}$ that best fits the data.
Then $\ln(f(x)) = \ln(a) + bx$.
Define $z = \alpha + bx$, where $\alpha = \ln(a)$ and $z = \ln(y)$.
Instead of minimizing:
$$\Phi(a, b) = \sum_{i=1}^{n} \left( a e^{b x_i} - y_i \right)^2$$
Minimize (easier to solve):
$$\Phi(\alpha, b) = \sum_{i=1}^{n} (\alpha + b x_i - z_i)^2$$
Normal Equations
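$$\Phi(\alpha, b) = \sum_{i=1}^{n} (\alpha + b x_i - z_i)^2$$
Normal Equations are obtained using:
$$\frac{\partial \Phi}{\partial \alpha} = 2 \sum_{i=1}^{n} (\alpha + b x_i - z_i) = 0$$
$$\frac{\partial \Phi}{\partial b} = 2 \sum_{i=1}^{n} (\alpha + b x_i - z_i)\, x_i = 0$$
$$\alpha\,n + b \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} z_i \quad \text{and} \quad \alpha \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i z_i$$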
Evaluating Sums and Solving
x_i            1         2         3         ∑=6
y_i            2.4       5         9
z_i = ln(y_i)  0.875469  1.609438  2.197225  ∑=4.68213
x_i^2          1         4         9         ∑=14
x_i z_i        0.875469  3.218876  6.591674  ∑=10.6860
Set z = ln y, with $\alpha = \ln(a)$ and $a = e^{\alpha}$.
Normal Equations for x and z:
3 α + 6 b = 4.68213
6 α + 14 b = 10.686
Solving Equations:
α = 0.23897, b = 0.66087
$$a = e^{0.23897} = 1.26994$$
$$f(x) = a e^{bx} = 1.26994\, e^{0.66087\,x}$$
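The linearization steps can be sketched in code as follows. This is a minimal illustration, assuming all y values are positive so that ln(y) is defined; the function name `fit_exponential` is hypothetical. Note that the fit minimizes the squared errors in ln(y), not in y itself, so it generally differs slightly from the true nonlinear least-squares fit.

```python
import math


def fit_exponential(xs, ys):
    """Fit f(x) = a*exp(b*x) by a straight-line fit of z = ln(y) against x."""
    zs = [math.log(y) for y in ys]               # requires every y > 0
    n = len(xs)
    sx, sz = sum(xs), sum(zs)
    sxx = sum(x * x for x in xs)
    sxz = sum(x * z for x, z in zip(xs, zs))
    # Normal equations: alpha*n + b*sx = sz, alpha*sx + b*sxx = sxz
    b = (n * sxz - sx * sz) / (n * sxx - sx * sx)
    alpha = (sz - b * sx) / n
    return math.exp(alpha), b                    # a = e^alpha


# Data of the exponential-model example
a, b = fit_exponential([1, 2, 3], [2.4, 5, 9])
print(f"f(x) = {a:.4f} * exp({b:.4f} x)")
```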
Linearization Method: Other Models
Homework N3. Deadline: 5 weeks
Problem 1: The profits (in millions USD) of a company during the past seven years are reported in the following table:
Time (years)         1    2    3    4    5    6    7
Profit (millions $)  1.2  1.5  1.8  2.m  3.8  4.7  6.n
Find a best-fit equation to the data trend. Try several possibilities: linear, parabolic, and exponential. Plot all these curves and the data in the same coordinates. Find the best equation to predict the profit in two years.
Homework N3: Problem 2
x 1 2 3 3+1/m 4 6
t 1 2 2+1/n 3 4 5
y 5 12 9 4 3 24
Use this plane to estimate f(2,3).
Homework N3: Problem 3
Given the data
x  0      1    3    4
y  1+1/n  2.2  4.5  10+1/m