
DATA FITTING

1 Data fitting

2 Ordinary Least Squares Method (OLS)

Data fitting
A common problem in experimental work is to obtain a mathematical
relationship between a quantity y, called dependent variable, and
other quantities x1 , x2 , · · · , xn , called independent variables. The
relationship will be taken to be a function of n variables,
y = f (x1 , x2 , · · · , xn ).
First, we consider the case n = 1 where the relationship between
dependent and independent variables is a function of one variable,
y = f (x), which belongs to some class of functions.
Example 1. (a) In physics, Hooke’s law gives a mathematical
relationship between the length y of a uniform spring (the dependent
variable) and the force x applied to it (an independent variable).
Precisely, the relationship in Hooke’s law is of the form
y = a + bx, (1)
where a is the length of the unstretched spring (the value of y when x = 0), and b is called the spring constant.
(b) Newton's second law of motion states that a body near the
Earth’s surface falls vertically downward in accordance with the
relationship
x = x0 + v0 t + (1/2) g t²,  (2)
where x is the height of the body above the Earth's surface at time t, x0 is the height of the body at time t = 0 (the initial height), v0 is the velocity of the body at time t = 0 (the initial velocity), and g is the acceleration of gravity at the Earth's surface.
In Example 1 (a), we consider the class of linear functions of x, and in (b), the class of quadratic functions of t.
In order to choose a function y = f (x) in the class, i.e., to find the parameters a and b in (1), or x0 , v0 , and g in (2), we have to collect data for the values of the independent variable and the corresponding values of the dependent variable. For example, by applying different values of the force x and measuring the corresponding length y of the spring, we may obtain the following data for (x, y):
x (kg)   1     2     3     4
y (cm)   6.3   6.7   7.0   7.4

and for Example 1 (b), we may have the following data:

t (s) 0.1 0.2 0.3 0.4


x (m) 10.1 10.4 10.7 11.2

Ordinary Least Squares Method (OLS)



Given data on an independent variable x and a dependent variable y,
x x1 x2 ··· xN
y y1 y2 ··· yN

and a function y = f (x), the OLS method measures how well this function fits the observed data by

RSS (f ) = Σ_{i=1}^{N} (yi − f (xi ))²,  (3)

where yi − f (xi ), i = 1, 2, · · · , N, are called the residuals of the model y = f (x) with respect to the data, and RSS stands for Residual Sum of Squares.

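As an illustration, the following minimal Python/NumPy sketch (the names rss, x, y are illustrative, not from the slides) evaluates the criterion (3) for one candidate function on the spring data of Example 1 (a).

```python
import numpy as np

# Spring data of Example 1 (a): force x (kg) and length y (cm).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([6.3, 6.7, 7.0, 7.4])

def rss(f, x, y):
    """Residual sum of squares of the model y = f(x), as in (3)."""
    residuals = y - f(x)
    return float(np.sum(residuals ** 2))

# RSS of one candidate linear function a + b*x, with trial values a = 6.0, b = 0.4.
print(rss(lambda t: 6.0 + 0.4 * t, x, y))
```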
Given basis functions f1 (x) , f2 (x) , · · · , fk (x), we consider the class of linear combinations of these basis functions,

f (x) = b1 f1 (x) + b2 f2 (x) + · · · + bk fk (x) ,

with b1 , b2 , · · · , bk ∈ R, i.e., each function in this class is identified by a vector (b1 , b2 , · · · , bk ) ∈ R^k.
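For concreteness, here is a short Python sketch (illustrative helper names, not from the slides) that builds such a linear combination from a list of basis functions.

```python
import numpy as np

def linear_combination(basis, b):
    """Return f(x) = b1*f1(x) + ... + bk*fk(x) for the given basis functions."""
    def f(x):
        return sum(bj * fj(x) for bj, fj in zip(b, basis))
    return f

# The class of Example 1 (a): basis functions f1(x) = 1 and f2(x) = x.
basis = [lambda x: np.ones_like(x, dtype=float), lambda x: np.asarray(x, dtype=float)]
f = linear_combination(basis, [6.0, 0.4])          # the member 6.0 + 0.4*x of the class
print(f(np.array([1.0, 2.0, 3.0, 4.0])))           # values of f at x = 1, 2, 3, 4
```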
(3) is rewritten as

RSS (b1 , b2 , · · · , bk ) = Σ_{i=1}^{N} [yi − (b1 f1 (xi ) + b2 f2 (xi ) + · · · + bk fk (xi ))]².  (4)
Example 1 continued. (a) The class of linear functions (1) is generated by two basis functions, f1 (x) = 1 and f2 (x) = x, i.e.,

f (x) = a + bx = af1 (x) + bf2 (x),

which is a linear combination of the two basis functions. With the data, we have

x       1        2        3        4
y       6.3      6.7      7.0      7.4
f (x)   a + b    a + 2b   a + 3b   a + 4b

and

RSS (f ) = RSS (a, b) = (a + b − 6.3)² + (a + 2b − 6.7)² + (a + 3b − 7)² + (a + 4b − 7.4)².

(b) The class of quadratic functions (2) is generated by three basis functions, f1 (x) = 1, f2 (x) = x, and f3 (x) = x², i.e.,

f (x) = a + bx + cx² = af1 (x) + bf2 (x) + cf3 (x),

and, writing x for the time t and y for the height, we have

x       0.1                   0.2                   0.3                   0.4
y       10.1                  10.4                  10.7                  11.2
f (x)   a + b·0.1 + c·0.1²    a + b·0.2 + c·0.2²    a + b·0.3 + c·0.3²    a + b·0.4 + c·0.4²

and

RSS (f ) = RSS (a, b, c) = (a + b·0.1 + c·0.1² − 10.1)² + (a + b·0.2 + c·0.2² − 10.4)²
                         + (a + b·0.3 + c·0.3² − 10.7)² + (a + b·0.4 + c·0.4² − 11.2)².
In the ordinary least squares method, the function f in the class is chosen to have the minimum residual sum of squares,

RSS (f ) → min,

i.e., the parameter vector (b1 , b2 , · · · , bk ), which identifies f, is chosen as the global minimizer of the function RSS (b1 , b2 , · · · , bk ) in (4). By letting

X = [ f1 (x1 )  f2 (x1 )  · · ·  fk (x1 ) ]
    [ f1 (x2 )  f2 (x2 )  · · ·  fk (x2 ) ]
    [   ...       ...              ...   ]  ∈ R^(N×k),
    [ f1 (xN )  f2 (xN )  · · ·  fk (xN ) ]

Y = (y1 , y2 , . . . , yN )^T ∈ R^(N×1),   b = (b1 , b2 , . . . , bk )^T ∈ R^(k×1),
we get

Y − Xb = [ y1 − (b1 f1 (x1 ) + · · · + bk fk (x1 )) ]
         [                  ...                   ]
         [ yi − (b1 f1 (xi ) + · · · + bk fk (xi )) ]
         [                  ...                   ]
         [ yN − (b1 f1 (xN ) + · · · + bk fk (xN )) ]

and then (4) becomes

RSS (b) = ‖Y − Xb‖².  (5)

Using the result from convex optimization (Theorem 16), with k ≤ N and rank (X ) = k, i.e., X is of full rank, the unique minimizer of (5) is given by

(X^T X)^(−1) X^T Y = arg min_{b∈R^k} RSS (b).  (6)
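The closed-form formula (6) translates directly into code. Below is a minimal NumPy sketch (the helper name ols_fit is illustrative, not from the slides); np.linalg.solve is applied to the normal equations X^T X b = X^T Y instead of forming the inverse explicitly, which gives the same vector under the full-rank assumption.

```python
import numpy as np

def ols_fit(basis, x, y):
    """Solve (X^T X) b = X^T Y with X[i, j] = f_j(x_i), i.e. formula (6)."""
    X = np.column_stack([f(x) for f in basis])   # N x k design matrix
    return np.linalg.solve(X.T @ X, X.T @ y)     # assumes rank(X) = k <= N

# Spring data of Example 1 (a), fitted in the class spanned by 1 and x.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([6.3, 6.7, 7.0, 7.4])
basis = [lambda t: np.ones_like(t), lambda t: t]
print(ols_fit(basis, x, y))                      # expected to be close to [5.95, 0.36]
```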
Example 1 continued. (a) Under the data

x   1     2     3     4
y   6.3   6.7   7.0   7.4

we have

X = [ 1  1 ]
    [ 1  2 ]
    [ 1  3 ]
    [ 1  4 ],        Y = (6.3, 6.7, 7.0, 7.4)^T,

and the best function in the class of linear functions is identified by the parameter vector

b = (a, b)^T = (X^T X)^(−1) X^T Y = [ 4   10 ]^(−1) [ 27.4 ] = [ 5.95 ]
                                    [ 10  30 ]      [ 70.3 ]   [ 0.36 ].

Therefore, we get the relation between y and x as

y = 5.95 + 0.36x

and we conclude that the original length of the spring is 5.95 cm, and
the spring constant is 0.36.
(b) With the data

t   0.1    0.2    0.3    0.4
x   10.1   10.4   10.7   11.2

we have

X = [ 1  0.1  0.1² ]
    [ 1  0.2  0.2² ]
    [ 1  0.3  0.3² ]
    [ 1  0.4  0.4² ],        Y = (10.1, 10.4, 10.7, 11.2)^T,

and the best function in the class of quadratic functions is characterized by
 
b = (a, b, c)^T = (X^T X)^(−1) X^T Y = [ 4    1    0.3    ]^(−1) [ 42.4  ]   [ 9.95 ]
                                       [ 1    0.3  0.1    ]      [ 10.78 ] = [ 1.1  ]
                                       [ 0.3  0.1  0.0354 ]      [ 3.272 ]   [ 5.0  ],

i.e., the best relationship between x and t in the class of quadratic functions is

x = 9.95 + 1.1t + 5t².

We conclude that the body's height at t = 0 is 9.95 m, the initial velocity is 1.1 m/s, and, since g/2 = 5, the acceleration of gravity at the Earth's surface is 10 m/s².
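Similarly, a quick NumPy check of the quadratic fit in (b) (again a sketch with illustrative variable names, not code from the slides):

```python
import numpy as np

# Falling-body data of Example 1 (b): time t (s) and height x (m).
t = np.array([0.1, 0.2, 0.3, 0.4])
h = np.array([10.1, 10.4, 10.7, 11.2])

# Design matrix with columns 1, t, t^2, then the closed-form solution (6).
X = np.column_stack([np.ones_like(t), t, t**2])
b = np.linalg.solve(X.T @ X, X.T @ h)
print(b)   # expected to be close to [9.95, 1.1, 5.0]
```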
Generalization to the case n > 1. Each basis function is a function of n variables, fi (x1 , x2 , · · · , xn ), i = 1, 2, · · · , k, and we consider the class of linear combinations of these basis functions,
f (x) = b1 f1 (x) + b2 f2 (x) + · · · + bk fk (x) , x = (x1 , x2 , · · · , xn ) .

The observed data now consist of N observations of the vector x = (x1 , . . . , xn ) together with the corresponding values of y:

x   x1 = (x1,1 , x2,1 , . . . , xn,1 )   x2 = (x1,2 , x2,2 , . . . , xn,2 )   · · ·   xN = (x1,N , x2,N , . . . , xn,N )
y   y1                                   y2                                   · · ·   yN

In the OLS algorithm, we try to find the global minimum of the residual sum of squares,

RSS (b) = RSS (b1 , b2 , · · · , bk ) = Σ_{i=1}^{N} [yi − (b1 f1 (xi ) + b2 f2 (xi ) + · · · + bk fk (xi ))]².

As in the case n = 1, by letting

X = [ f1 (x1 )  f2 (x1 )  · · ·  fk (x1 ) ]
    [ f1 (x2 )  f2 (x2 )  · · ·  fk (x2 ) ]
    [   ...       ...              ...   ]  ∈ R^(N×k),
    [ f1 (xN )  f2 (xN )  · · ·  fk (xN ) ]
   
Y = (y1 , y2 , . . . , yN )^T ∈ R^(N×1),   b = (b1 , b2 , . . . , bk )^T ∈ R^(k×1),

we get

RSS (b) = ‖Y − Xb‖²,

and the unique solution of the corresponding optimization problem, when k ≤ N and rank (X ) = k, is also given by

(X^T X)^(−1) X^T Y = arg min_{b∈R^k} RSS (b).  (7)
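In code, the only change in the case n > 1 is how the design matrix is built: each basis function now takes the whole observation vector. A minimal NumPy sketch follows (illustrative helper names); it uses the house data of Example 2 below and produces exactly the 5×3 matrix X of Model 1.

```python
import numpy as np

def design_matrix(basis, observations):
    """Design matrix with entries X[i, j] = f_j(x_i) for vector observations x_i."""
    return np.array([[f(x) for f in basis] for x in observations], dtype=float)

# Basis functions of x = (x1, x2): 1, x1 and x2 (Model 1 of Example 2 below).
basis = [lambda x: 1.0, lambda x: x[0], lambda x: x[1]]
observations = [(50, 1), (60, 2), (70, 3), (80, 3), (100, 3)]

X = design_matrix(basis, observations)
print(X)   # 5 x 3 matrix; the OLS solution is then (X^T X)^(-1) X^T Y as in (7)
```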

Example 2. Suppose that we try to find the relationship between the selling price of a house, y, and its area x1 and number of bedrooms x2 , from the following observations:
x1 (m²)   x2   y (billion VND)
 50        1    1.5
 60        2    3.2
 70        3    4.5
 80        3    5.8
100        3    6.5
Model 1. Consider the class of linear functions f (x1 , x2 ) = a + bx1 + cx2 , which are linear combinations of three basis functions, f1 (x1 , x2 ) = 1, f2 (x1 , x2 ) = x1 , and f3 (x1 , x2 ) = x2 . Let

X = [ f1 (50, 1)   f2 (50, 1)   f3 (50, 1)  ]   [ 1  50   1 ]
    [ f1 (60, 2)   f2 (60, 2)   f3 (60, 2)  ]   [ 1  60   2 ]
    [ f1 (70, 3)   f2 (70, 3)   f3 (70, 3)  ] = [ 1  70   3 ],
    [ f1 (80, 3)   f2 (80, 3)   f3 (80, 3)  ]   [ 1  80   3 ]
    [ f1 (100, 3)  f2 (100, 3)  f3 (100, 3) ]   [ 1  100  3 ]
 
Y = (1.5, 3.2, 4.5, 5.8, 6.5)^T,   b = (a, b, c)^T.

The best function in this class is given by

b = (a, b, c)^T = (X^T X)^(−1) X^T Y = [ 5    360    12  ]^(−1) [ 21.5 ]   [ −2.57  ]
                                       [ 360  27400  920 ]      [ 1696 ] = [ 0.0615 ]
                                       [ 12   920    32  ]      [ 58.3 ]   [ 1.0175 ],

i.e., f (x1 , x2 ) = −2.57 + 0.0615x1 + 1.0175x2 .
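A quick numerical check of Model 1 (a minimal NumPy sketch, not code from the slides):

```python
import numpy as np

# Design matrix and observations of Model 1: columns 1, x1, x2.
X = np.array([[1,  50, 1],
              [1,  60, 2],
              [1,  70, 3],
              [1,  80, 3],
              [1, 100, 3]], dtype=float)
y = np.array([1.5, 3.2, 4.5, 5.8, 6.5])

b = np.linalg.solve(X.T @ X, X.T @ y)   # closed-form OLS solution (7)
print(b)                                # expected to be close to [-2.57, 0.0615, 1.0175]
```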
Model 2. Consider the class of polynomial functions f (x1 , x2 ) = a + bx1 + cx1² + dx2 , with four basis functions, f1 (x1 , x2 ) = 1, f2 (x1 , x2 ) = x1 , f3 (x1 , x2 ) = x1², and f4 (x1 , x2 ) = x2 . Let

X = [ f1 (50, 1)   f2 (50, 1)   f3 (50, 1)   f4 (50, 1)  ]   [ 1  50   50²   1 ]
    [ f1 (60, 2)   f2 (60, 2)   f3 (60, 2)   f4 (60, 2)  ]   [ 1  60   60²   2 ]
    [ f1 (70, 3)   f2 (70, 3)   f3 (70, 3)   f4 (70, 3)  ] = [ 1  70   70²   3 ],
    [ f1 (80, 3)   f2 (80, 3)   f3 (80, 3)   f4 (80, 3)  ]   [ 1  80   80²   3 ]
    [ f1 (100, 3)  f2 (100, 3)  f3 (100, 3)  f4 (100, 3) ]   [ 1  100  100²  3 ]

Y = (1.5, 3.2, 4.5, 5.8, 6.5)^T,   b = (a, b, c, d)^T.
The best function in this model is given by

b = (a, b, c, d)^T = (X^T X)^(−1) X^T Y

  = [ 5      360      27400      12    ]^(−1) [ 21.5   ]   [ −19.25  ]
    [ 360    27400    2196000    920   ]      [ 1696   ]   [ 0.5775  ]
    [ 27400  2196000  184180000  73600 ]      [ 139440 ] = [ −0.003  ]
    [ 12     920      73600      32    ]      [ 58.3   ]   [ −0.6625 ],

i.e., f (x1 , x2 ) = −19.25 + 0.5775x1 − 0.003x1² − 0.6625x2 .
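The same check for Model 2. This sketch uses np.linalg.lstsq, a standard NumPy least-squares routine, as an alternative to forming X^T X explicitly; under the full-rank assumption it returns the same vector as (7), and it tends to be numerically more robust when a column such as x1² makes X^T X badly scaled.

```python
import numpy as np

# Model 2 data: design matrix with columns 1, x1, x1^2, x2.
x1 = np.array([50, 60, 70, 80, 100], dtype=float)
x2 = np.array([1, 2, 3, 3, 3], dtype=float)
y = np.array([1.5, 3.2, 4.5, 5.8, 6.5])
X = np.column_stack([np.ones_like(x1), x1, x1**2, x2])

b, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares solution of X b ≈ Y
print(b)   # expected to be close to [-19.25, 0.5775, -0.003, -0.6625]
```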
Model 3. Consider the class of linear functions with an "interaction term", f (x1 , x2 ) = a + bx1 + cx2 + dx1 x2 , with four basis functions, f1 (x1 , x2 ) = 1, f2 (x1 , x2 ) = x1 , f3 (x1 , x2 ) = x2 , and f4 (x1 , x2 ) = x1 x2 . Let

X = [ f1 (50, 1)   f2 (50, 1)   f3 (50, 1)   f4 (50, 1)  ]   [ 1  50   1  1 · 50  ]
    [ f1 (60, 2)   f2 (60, 2)   f3 (60, 2)   f4 (60, 2)  ]   [ 1  60   2  2 · 60  ]
    [ f1 (70, 3)   f2 (70, 3)   f3 (70, 3)   f4 (70, 3)  ] = [ 1  70   3  3 · 70  ],
    [ f1 (80, 3)   f2 (80, 3)   f3 (80, 3)   f4 (80, 3)  ]   [ 1  80   3  3 · 80  ]
    [ f1 (100, 3)  f2 (100, 3)  f3 (100, 3)  f4 (100, 3) ]   [ 1  100  3  3 · 100 ]

Y = (1.5, 3.2, 4.5, 5.8, 6.5)^T,   b = (a, b, c, d)^T.
The best function in this model is given by

b = (a, b, c, d)^T = (X^T X)^(−1) X^T Y

  = [ 5    360    12    920    ]^(−1) [ 21.5 ]   [ −3.58571 ]
    [ 360  27400  920   73600  ]      [ 1696 ]   [ 0.08143  ]
    [ 12   920    32    2540   ]      [ 58.3 ] = [ 1.33571  ]
    [ 920  73600  2540  208600 ]      [ 4746 ]   [ −0.00643 ],

i.e., f (x1 , x2 ) = −3.58571 + 0.08143x1 + 1.33571x2 − 0.00643x1 x2 .
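Since the RSS in (3) is exactly what OLS minimizes, it also gives a simple way to compare how well the three fitted models match the observed data. A minimal NumPy sketch (the fits are recomputed rather than copied from the slides):

```python
import numpy as np

x1 = np.array([50, 60, 70, 80, 100], dtype=float)
x2 = np.array([1, 2, 3, 3, 3], dtype=float)
y = np.array([1.5, 3.2, 4.5, 5.8, 6.5])

# Design matrices of the three models considered above.
designs = {
    "Model 1": np.column_stack([np.ones_like(x1), x1, x2]),
    "Model 2": np.column_stack([np.ones_like(x1), x1, x1**2, x2]),
    "Model 3": np.column_stack([np.ones_like(x1), x1, x2, x1 * x2]),
}

for name, X in designs.items():
    b = np.linalg.solve(X.T @ X, X.T @ y)    # closed-form OLS solution
    rss = float(np.sum((y - X @ b) ** 2))    # residual sum of squares (3)
    print(name, b, rss)
```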
