MAFE208IU-L6 - Least Squares Regression

You might also like

Download as pdf or txt
Download as pdf or txt
You are on page 1of 45

Chapter 3:

g & Interpolation
Curve Fitting p

Lecture 1:
Least-Squares Regression
Motivation

• Data often given for


discrete values
• Estimates at points
between the discrete
values
1 2 3 4 5 6 7 8 9

Year 1 2 3 4 5 6 7
Profit 2.2 2.5 3.0 3.4 3.8 4.3 4.8

Question: Estimate the profit in the next year


Motivation
Given experimental data: (x1, y1),
) (x2, y2),…,(x
) (xn, yn)

• The relationship between


x and y may not be clear.
clear
• Find a function f(x) that
best fit the data

1 2 3 4 5 6 7 8 9 3
Curve Fitting
Given a set of tabulated data, find a curve or a function
that best represents the data.
Given:
1. The tabulated data

2 The form of the function


2.

3. The curve fitting criteria

Find the unknown coefficients of the function

4
Selection of the Functions
Linear f ( x) = a + bx
Quadratic f ( x) = a + bx + cx 2
n
Polynomial f ( x) = ∑ ak x k

k =0
m
General f ( x ) = ∑ ak g k ( x )
k =0

g k ( x) are known.
5
Decide on the Criterion
1. Least Squares
q Regression:
g
n
minimize Φ = ∑ ( y − f ( x ))
i=1
i i
2

2 E t Matching
2.Exact M t hi (I
(Interpolation):
t l ti )
yi = f ( xi )

6
Least Squares Regression
Linear Regression
Fitting a straight line to a set of paired observations:
((x1, y1), (x
( 2, y2),
),…,(x
,( n, yn)).
y=a+bx+e
Linear f ( x) = a + bx
a-intercept
a intercept
b-slope
e-error or residual
e-error, residual, between the model and the
observations.

7
Determine the Unknowns

We want to find a and b to minimize :


n
Φ ( a , b) = ∑ ( a + bxi − yi ) 2
i =1

How do we obtain a and b to minimize : Φ ( a , b) ?

8
Determine the Unknowns
Necessary condition for the minimum :

∂ Φ ( a, b)
=0
∂a
∂ Φ ( a, b)
=0
∂b

9
Determining the Unknowns

n
∂ Φ ( a , b)
= ∑ 2(a + bxi − yi ) = 0
∂a i =1

n
∂ Φ ( a , b)
= ∑ 2(a + bx
b i − yi )xi = 0
∂b i =1

10
Normal Equations
q
∂ Φ ( a , b) n
= ∑ 2(a + bxi − yi ) = 0
∂a i =1

∂ Φ ( a , b) n
= ∑ 2(a + bxi − yi )xi = 0
∂b i =1

⎛ n ⎞ ⎛ n ⎞
n a + ⎜⎜ ∑ xi ⎟⎟ b = ⎜⎜ ∑ yi ⎟⎟
⎝ i =1 ⎠ ⎝ i =1 ⎠
⎛ n ⎞ ⎛ n 2⎞ ⎛ n ⎞
⎜ ∑ xi ⎟ a + ⎜ ∑ xi ⎟ b = ⎜ ∑ xi y i ⎟
⎜ ⎟ ⎜ ⎟ ⎜ ⎟
⎝ i =1 ⎠ ⎝ i =1 ⎠ ⎝ i =1 ⎠
These equations are
called Normal Equations
11
Example 1
According to a research by a group of biologists, the
chirping rate y (chirps/min) of crickets of a certain
species appears to be related to temperature x ( oF).
x 50 55 60 65 70 75 80 85 90 95

y 22 45 82 95 110 145 183 202 216 238

Question: What is the chirping rate at 100oF?

12
Plot data

13
⎛ n ⎞ ⎛ n ⎞
n a + ⎜ ∑ xk ⎟ b = ⎜ ∑ yk ⎟
Solution ⎝ k =1 ⎠ ⎝ k =1 ⎠
⎛ n ⎞ ⎛ n 2⎞ ⎛ n ⎞
Normal equations: ⎜ ∑ xk ⎟ a + ⎜ ∑ xk ⎟ b = ⎜ ∑ xk yk ⎟
⎝ k =1 ⎠ ⎝ k =1 ⎠ ⎝ k =1 ⎠

10a + 725b = 1338


725a + 54625b = 107105
⇒ a = -221.2, b = 4.897
221.22 + 4.897
y = -221 4 897 x
y (100) = 268
i12

14
Exercise ì
íẶ
The profits y of a company (in millions USD) during
the past seven years x are given in the following table
x 1 2 3 4 5 6 7

y 2.2
Cd 2.5
18 3.0
05 3.4
30 3.8 4.3
00 4.8
6

a)) Plot the data points nên


b) Find the least-squares line that best fit the data. Plot
the line with the data points
c) Use the least-squares line found in part b) to
estimate the profit in the 8th year n a + ⎛ ∑ x ⎞ b = ⎛ ∑ y ⎞
n n

⎜ k ⎟ ⎜ k ⎟
⎝ k =1 ⎠ ⎝ k =1 ⎠
ị ⎛ n
⎜∑ k ⎟x

a +
⎛ n 2⎞
⎜∑ k ⎟x b =
⎛ n15
⎜∑ k k ⎟
x y

⎝ k =1 ⎠ ⎝ k =1 ⎠ ⎝ k =1 ⎠
ịỄẸ
Multiple Linear Regression
Example: t 0 1 2 3

x 0.1 0.4 0.2 0.2


Given the following data:
y 3 2 1 2

D t
Determine
i a ffunction
ti off two
t variables:
i bl

f(x,t) = a + b x + c t
That best fits the data with the least sum of the square of errors.

16
Solution of Multiple Linear
R
Regression
i
Construct Φ =
n
t 0 1 2 3
∑ i i i
( y
i =1
− f ( x , t )) 2

x 0.1 0.4 0.2 0.2


D i the
Derive th necessary
conditions by equating the y 3 2 1 2
partial derivatives with respect
to the unknown parameters to
zero then solve the equations.
zero, equations

17
Solution of Multiple Linear
R
Regression
i
n
Φ (a, b, c) = ∑ ( a + bxi + cti − yi ) → min
2
f ( x, t ) = a + bx + ct ,
i =1

N
Necessary conditions:
di i
∂Φ (a, b, c) n
= 2∑ ( a + bxi + cti − yi ) = 0
∂a i =1

∂Φ (a, b, c) n
= 2∑ ( a + bxi + cti − yi ) xi = 0
∂b i =1

∂Φ (a, b, c) n
= 2∑ ( a + bxi + cti − yi ) ti = 0
∂c i =1

18
N
Normal
lEEquations
ti
n n n
a n + b ∑ xi + c ∑ ti = ∑ y i
i =1 i =1 i =1
n n n n
a ∑ xi + b ∑ ( xi ) 2 + c ∑ ( xi ti ) = ∑ ( xi y i )
i =1 i =1 i =1 i =1
n n n n
a ∑ ti + b∑ ( xi ti ) + c ∑ (ti ) = ∑ (ti yi )
2

i =1 i =1 i =1 i =1

Chúnose 19
Example 2: Multiple Linear
R
Regression
i
i 1 2 3 4 Sum
ti 0 1 2 3 6
xi 0.1 0.4 0.2 0.2 0.9
yi 3 2 1 2 8
xi2 0.01 0.16 0.04 0.04 0.25
xi t i 0 0.4 0.4 0.6 1.4
xi yi 0.3 0.8 0.2 0.4 1.7
ti2 0 1 4 9 14
t i yi 0 2 2 6 10

20
E
Example
l 2:
2 S
System
t off Equations
E ti
4a + 0.9b + 6c = 8
0.9a + 0.25b + 1.4c = 1.7
6a + 1.4b + 14c = 10

Solving :
a = 2.9574, b = −1.7021, c = −0.38298
f ( x, t ) = a + bx + ct = 2.9574 − 1.7021 x − 0.38298 t

21
Exercise
Develop a multiple least-square model for these data,
data
and use it to estimate y(1/2, -1)
t -1
1 1 3 4
x -2 0 1 3

y -6 -2 3 6
Normal Equations:
n n n x = ( x1 , x2 ,..., xn ), t = (t1 , t2 ,..., tn )
a n + b ∑ xi + c ∑ ti = ∑ y i y = ( y1 , y2 ,..., yn )
i =1 i =1 i =1 n n n

n n n n a n + b∑ xi + c ∑ ti = ∑ yi
a ∑ x i + b ∑ ( xi ) 2 + c ∑ ( x i t i ) = ∑ ( x i y i ) n
i =1 i =1 i =1

i =1 i =1 i =1 i =1 a ∑ xi + bx 2 + cxt = xy
n n n n i =1

a ∑ ti + b∑ ( xi ti ) + c ∑ (ti ) = ∑ (ti yi )
2 n 22
a ∑ ti + bxt + ct 2 = ty
i =1 i =1 i =1 i =1 i =1
Exercise 2
Develop a multiple least-square model for these data,
data
and use it to estimate y(x=3, t=5)
t 0 1 3 4
x 1 2 3 4

y 2 3 5 8
Normal Equations:
n n n x = ( x1 , x2 ,..., xn ), t = (t1 , t2 ,..., tn )
a n + b ∑ xi + c ∑ ti = ∑ y i y = ( y1 , y2 ,..., yn )
i =1 i =1 i =1 n n n

n n n n a n + b∑ xi + c ∑ ti = ∑ yi
a ∑ x i + b ∑ ( xi ) 2 + c ∑ ( x i t i ) = ∑ ( x i y i ) n
i =1 i =1 i =1

i =1 i =1 i =1 i =1 a ∑ xi + bx 2 + cxt = xy
n n n n i =1

a ∑ ti + b∑ ( xi ti ) + c ∑ (ti ) = ∑ (ti yi )
2 n 23
a ∑ ti + bxt + ct 2 = ty
i =1 i =1 i =1 i =1 i =1
Nonlinear Least Squares Models and
Methods

Polynomial Regression
Fitting with Nonlinear Functions
Linearization Method
Polynomial Regression
The least squares
q method can be extended to fit the data to a
higher-order polynomial

f ( x) = a + bx
b + cx , e = ( f ( xi ) − yi )
2 2
i
2

n
Minimize Φ (a, b, c) = ∑ ( a + bxi + cx − yi ) 2 2
i
i =1

Necessary conditions:
∂Φ (a, b, c) ∂Φ (a, b, c) ∂Φ (a, b, c)
= 0, = 0, =0
∂a ∂b ∂c
25
Equations for Quadratic
Regression
( )
n 2
Mi i i
Minimize Φ ( a, b, c ) = ∑ b i + cxi2
a + bx − yi
i =1

( )
n
∂Φ ( a , b, c )
= 2∑ a + bxi + cxi − yi = 0
2
∂a i =1

( )
n
∂Φ ( a, b, c )
= 2∑ a + bxi + cxi2 − yi xi = 0
∂b i =1

( )
n
∂Φ ( a, b, c )
= 2∑ a + bxi + cxi − yi xi = 0
2 2
∂c i =1
26
Normal Equations
n n n
a n + b ∑ xi + c ∑ xi2 = ∑ yi
i =1 i =1 i =1
n n n n
a ∑ xi + b ∑ xi2 + c ∑ xi3 = ∑ xi yi
i =1 i =1 i =1 i =1
n n n n
a ∑ xi2 + b ∑ xi3 + c ∑ xi4 = ∑ xi2 yi
i =1 i =1 i =1 i =1

27
Example 3: Polynomial
Regression
Fit a second-order ppolynomial
y to the following
g data
xi 0 1 2 3 4 5 ∑=15

yi 21
2.1 77
7.7 13 6
13.6 27 2
27.2 40 9
40.9 61 1
61.1 ∑=152 6
∑=152.6

xi2 0 1 4 9 16 25 ∑=55
xi3 0 1 8 27 64 125 225

xi4 0 1 16 81 256 625 ∑=979

xi yi 0 7.7 27.2 81.6 163.6 305.5 ∑=585.6

xi2 yi 0 7.7 54.4 244.8 654.4 1527.5 ∑=2488.8

28
Example 3: Equations and
Solution
6 a + 15 b + 55 c = 152.6
15 a + 55 b + 225 c = 585.6
55 a + 225 b + 979 c = 2488.8
Solving
g...
a = 2.4786, b = 2.3593, c = 1.8607
2
f ( x ) = 2.4786 + 2.3593 x + 1.8607 x

29
J stif the best choice
Justify
Given
Gi en two
t o or more functions
f nctions to fit the data,
data
How do you select the best?

Answer :
Determine the parameters for each function,
function
then compute Φ for each one. The function
resulting in smaller Φ (least sum of the squares
of the errors) is the best.
n
Φ= ∑ i i
( y
i =1
− f ( x )) 2

30
Example showing that Quadratic is
preferable
f bl than
th Linear
Li R
Regression
i

y y

x x

Linear Regression Quadratic Regression

31
Fitting with Nonlinear Functions
xi 0.24 0.65 0.95 1.24 1.73 2.01 2.23 2.52

yi 0.23 -0.23 -1.1 -0.45 0.27 0.1 -0.29 0.24

It iis required
i d tto fi
findd a function
f ti off th
the fform, ffor example
l :
f ( x) = a ln( x) + b cos( x) + c e x
to fit the data.
data
n
Φ (a, b, c) = ∑ ( f ( xi ) − yi ) 2
i =1

32
Fitting with Nonlinear Functions
n
Φ ( a, b, c ) = ∑ ( a ln( xi ) + b cos( xi ) + c e xi − yi ) 2
i =1
Necessary condition for the minimum :
∂ Φ ( a, b, c ) ⎫
= 0⎪
∂a

∂ Φ ( a, b, c ) ⎪
= 0⎬ ⇒ Normal Equations
q
∂b ⎪
∂ Φ ( a, b, c ) ⎪
= 0⎪
∂c ⎭ 33
Normal Equations
n n n n
a ∑ (ln xi ) + b∑ (ln xi )(cos xi ) + c ∑ (ln xi )( e ) = ∑ yi (ln xi )
2 xi

i =1 i =1 i =1 i =1

n n n n
a ∑ (ln xi )(cos xi ) + b∑ (cos xi ) + c ∑ (cos xi )( e ) = ∑ yi (cos xi )
2 xi

i =1 i =1 i =1 i =1

n n n n
a ∑ (ln xi )( e ) + b∑ (cos xi )( e ) + c ∑ ( e ) = ∑ yi ( e xi )
xi xi xi 2

i =1 i =1 i =1 i =1

Evaluate the sums and solve the normal equations.

34
Example 4: Evaluating Sums
xi 0.24 0.65 0.95 1.24 1.73 2.01 2.23 2.52 ∑=11.57
yi 0.23 -0.23 -1.1 -0.45 0.27 0.1 -0.29 0.24 ∑=-1.23

(ln xi)2 2.036 0.1856 0.0026 0.0463 0.3004 0.4874 0.6432 0.8543 ∑=4.556

ln(xi)
( ) cos(xi)
( ) -1.386
.386 -0.3429
0.3 9 -0.0298
0.0 98 0.0699 -0.0869
0.0869 -0.2969
0. 969 -0.4912
0. 9 -0.7514
0.75 ∑=-3.316
∑ 3.3 6

ln(xi) * exi -1.814 -0.8252 -0.1326 0.7433 3.0918 5.2104 7.4585 11.487 ∑=25.219

yi * ln(xi) -0.328 0.0991 0.0564 -0.0968 0.1480 0.0698 -0.2326 0.2218 ∑=-0.0625

( i)2
cos(xi) 0 943
0.943 0 6337
0.6337 0 3384
0.3384 0 1055
0.1055 0 0251
0.0251 0 1808
0.1808 0 3751
0.3751 0 6609
0.6609 ∑ 3 26307
∑=3.26307

cos(xi) * exi 1.235 1.5249 1.5041 1.1224 -0.8942 -3.1735 -5.696 -10.104 ∑=-14.481

yi*cos(xi) 0.223 -0.1831 -0.6399 -0.1462 -0.0428 -0.0425 0.1776 -0.1951 ∑=-0.8485

(exi)2 1.616 3.6693 6.6859 11.941 31.817 55.701 86.488 154.47 ∑=352.39

yi * exi 0.2924 -0.4406 -2.844 -1.555 1.523 0.7463 -2.697 2.9829 ∑=-1.9923

35
Example 4: Equations &
Solution
4.55643 a − 3.31547 b + 25.2192 c = −0.062486
− 3.31547 a + 3.26307 b − 14.4815 c = −0.848514
25.2192 a − 14.4815 b + 352.388 c = −1.992283

Solving the above equations :


a = − 0.88815, b = − 1.1074 , c = 0.012398
Therefore
Therefore,
f ( x ) = −0.88815 ln( x ) − 1.1074 cos( x ) + 0.012398 e x

36
Linearization Method:
Exponential Model
Example: Given data xi 1 2 3

yi 2.4 5 9

Find a function f(x) = ae bx


b
that best fits the data.

( )
n 2
Φ ( a , b) = ∑ ae bxi
− yi
i =1
Normal Equations are obtained using :

( )
n
∂Φ
= 2∑ ae bxi − yi e bxi = 0
∂a i =1
Difficult to Solve

( )
n
∂Φ
= 2∑ ae bxi − yi a xi e bxi = 0
∂b i =1
37
Exponential Model
Find a function f(x) = aebx that best fits the data.
data
Then ln( f ( x)) = ln(a ) + b x

Define z = α + bx
where α = ln((a ) and z = ln(( y )
n
Instead of minimizing: Φ (a, b) = ∑ ae ( )
2
bxi
− yi
i =1
n
Minimize: Φ (α , b) = ∑ (α + bxi − zi ) (Easier to solve)
2

i =1
i=

38
Normal Equations
n
Φ (α , b) = ∑ (α + b xi − zi )2
i =1
Normal Equation
q s are obtained using
g:
n
∂Φ
= 2∑ (α + b xi − zi ) = 0
∂α i =1
n
∂Φ
= 2∑ (α + b xi − zi ) xi = 0
∂b i =1
n n n n n
α n + b ∑ xi = ∑ zi and α ∑ xi + b∑ xi2 = ∑ ( xi zi )
i =1 i =1 i =1 i =1 i =1
39
E l ti S
Evaluating Sums and
dSSolving
l i
xi 1 2 3 ∑=6
yi 2.4 5 9
zi=ln(yi) 0.875469 1.609438 2.197225 ∑=4.68213
xi2 1 4 9 ∑ 14
∑=14
xi zi 0.875469 3.218876 6.591674 ∑=10.6860

Set z = ln y
Normal Equations for x and z: α = ln( a ), a = eα
3 α + 6 b = 44.68213
68213
a = e 0.23897 = 1.26994
6 α + 14 b = 10.686
f ( x ) = ae bx = 1.26994 e 0.66087 x
Solving Equations:
α = 0.23897, b = 0.66087 40
Li
Linearization
i ti M Method:
th d Oth
Other Models
M d l

41
Homework N3. Deadline: 5 weeks
Problem 1: The profit (in millions USD) of a company during the past
seven years are reported in the following table

Time 1 2 3 4 5 6 7
(
(years)
)
Profit 1.2 1.5 1.8 2.m 3.8 4.7 6.n
(millions $)
Find a best-fit equation to the data trend. Try several possibilities-
linear, parabolic, and exponential. Plot all these curves and the data
in the same coordinates. Find the best equation
q to predict
p the pprofit
in two years.

(m-2)(n-2) is the two last digits of your student ID number

42
Homework N3: Problem 2

3. Use linear regression to fit these data by a plane

x 1 2 3 3+1/m 4 6
t 1 2 2+1/n 3 4 5
y 5 12 9 4 3 24
U thi
Use this plane
l to
t estimate
ti t f(2,3)
f(2 3)

43
Homework N3: Problem 3
Given the data

x 0 1 3 4
y 1+1/
1+1/n 22
2.2 45
4.5 10+1/
10+1/m

a)) Find the Newton divided-difference interpolating


p g
polynomial that passes through these data points. Use it
to estimate f(2). Plot the curve.
b) Find the Lagrange interpolating polynomial that passes
through these data points. Use it to estimate f’(2)

(m 2)(n 2) is the two last digits of your student ID number


(m-2)(n-2)

44
Homework N3

Problem 4: Fit the data in the following


g Table with qquadratic
splines, and cubic splines. Use the result to evaluate the
function at x=1.5
x 0 1 2 3
f(x) 1+1/n 2 4-1/m 12

S. Chapra & R.P. Canale, Numerical Methods for


Engineers, McGraw-Hill, 7th ed., 2015:
Pages 575-578
Problems: 20.20, 20.21, 20.25 20.26, 20.32, 20.33

45

You might also like