Least Squares Approximation
3.1 Introduction
In the previous chapter we discussed approximation of a function using polynomial interpolation. Here, we will discuss another approach to approximating a function, called the least squares approximation. This approach is suitable when the given data points are experimental. We shall discuss linear, nonlinear, plane, and trigonometric least squares approximation of a function. We shall also discuss the least squares solution of overdetermined and underdetermined linear systems. At the end of the chapter we shall discuss least squares with QR decomposition and singular value decomposition.
[Figure 3.1: (a) seven given data points; (b) a straight line passing close to all seven points.]
A Lagrange interpolation polynomial of degree six could easily be constructed for this data. However, there is no justification for insisting that the data points be reproduced exactly, and such an approximation may well be very misleading, since unwanted oscillations are likely. A more satisfactory approach is to find a straight line which passes close to all seven points. One such possibility is shown in Figure 3.1(b). Here we have to decide what criterion is to be adopted for constructing such an approximation. The most common approach is known as linear least squares data fitting. The linear least squares approach defines the correct straight line as the one that minimizes the sum of the squares of the distances between the data points and the line.
Least squares straight line approximations are an extremely useful and common approximate fit. The solution of the linear least squares problem is an important application of the solution of systems of linear equations and leads to other interesting ideas of numerical linear algebra. The least squares approximation is not restricted to straight lines; however, in order to motivate the general case we consider this case first. The straight line

p_1(x) = a + bx   (3.1)

should be fitted through the given points (x_1, y_1), \ldots, (x_n, y_n) so that the sum of the squares of the distances of these points from the straight line is minimum, where the distance is measured in the vertical direction (the y-direction). Hence it will suffice to minimize the function
E(a, b) = \sum_{j=1}^{n} (y_j - a - b x_j)^2.   (3.2)
The minimum of E occurs where the partial derivatives of E with respect to a and b become zero. Note that {x_j} and {y_j} are constants in (3.2) and the unknown parameters a and b are the variables. Differentiating E with respect to a while holding b fixed, and setting the result equal to zero, gives
\frac{\partial E}{\partial a} = -2 \sum_{j=1}^{n} (y_j - a - b x_j) = 0.   (3.3)
Now holding a fixed and differentiating E with respect to b, then setting the result equal to zero, we obtain

\frac{\partial E}{\partial b} = -2 \sum_{j=1}^{n} x_j (y_j - a - b x_j) = 0.   (3.4)
The above equations (3.3) and (3.4) may be rewritten, after dividing by −2, as

\sum_{j=1}^{n} y_j - \sum_{j=1}^{n} a - b \sum_{j=1}^{n} x_j = 0

\sum_{j=1}^{n} x_j y_j - a \sum_{j=1}^{n} x_j - b \sum_{j=1}^{n} x_j^2 = 0,
which can be arranged to form a 2 × 2 system that is known as the normal equations

na + b \sum_{j=1}^{n} x_j = \sum_{j=1}^{n} y_j

a \sum_{j=1}^{n} x_j + b \sum_{j=1}^{n} x_j^2 = \sum_{j=1}^{n} x_j y_j,   (3.5)

where

S1 = \sum x_j, \quad S2 = \sum y_j, \quad S3 = \sum x_j^2, \quad S4 = \sum x_j y_j.

In the foregoing equations the summation is over j from 1 to n.
The solution of the above system (3.5) can be obtained easily as

\begin{pmatrix} a \\ b \end{pmatrix} = \frac{1}{n S3 - (S1)^2} \begin{pmatrix} S3 & -S1 \\ -S1 & n \end{pmatrix} \begin{pmatrix} S2 \\ S4 \end{pmatrix},   (3.6)

that is,

a = \frac{S3\,S2 - S1\,S4}{n S3 - (S1)^2}, \qquad b = \frac{n S4 - S1\,S2}{n S3 - (S1)^2}.

The formula (3.6) reduces the problem of finding the parameters of the least squares linear fit to a simple matrix multiplication.
We shall call a and b the least squares linear parameters for the data, and the linear guess function with these parameters,

p_1(x) = a + bx,

will be called the least squares line (or regression line) for the data.
Example 3.1 Using the method of least squares, fit a straight line to the four points (1, 1), (2, 2), (3, 2), and (4, 3).
Solution. The sums required for the normal equations (3.5) are easily obtained using the values in Table 3.1. The linear system (3.5) involving a and b is

\begin{pmatrix} 4 & 10 \\ 10 & 30 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} 8 \\ 23 \end{pmatrix}.
Then solving the above linear system using the LU decomposition by the Cholesky method discussed in Chapter 1, the solution of the linear system is
a = 0.5 and b = 0.6.
Thus the least squares line is
p1 (x) = 0.5 + 0.6x.
Clearly, p1 (x) replaces the tabulated functional relationship given by y = f (x). The original data
along with the approximating polynomials are shown graphically in Figure 3.2.
We use the author-defined function LineFit and the following MATLAB commands to reproduce
the above results as follows:
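The author-defined function LineFit is not listed here; a minimal sketch consistent with (3.5)-(3.6), together with the calls for this example, might look as follows (the function name and signature follow the text, while the body and the calls are assumptions):

function [a, b] = LineFit(x, y)
% LineFit  Least squares straight line p1(x) = a + b*x (a sketch;
% the body is an assumption based on the normal equations (3.5)).
n  = length(x);
S1 = sum(x);  S2 = sum(y);  S3 = sum(x.^2);  S4 = sum(x.*y);
d  = n*S3 - S1^2;               % determinant of the 2-by-2 system
a  = (S3*S2 - S1*S4)/d;
b  = (n*S4 - S1*S2)/d;

>> x = [1 2 3 4]; y = [1 2 2 3];
>> [a, b] = LineFit(x, y)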
Table 3.2 shows the error analysis of the straight line using least squares approximation. Hence we have

E(a, b) = \sum_{i=1}^{4} (y_i - p_1(x_i))^2 = 0.2000,

the possible error. •
[Figure 3.2: the original data and the least squares line p_1(x) = 0.5 + 0.6x.]
The least squares approach extends from straight lines to polynomials. A polynomial

p_n(x) = b_0 + b_1 x + \cdots + b_n x^n

of degree n is fitted to m data points (x_1, y_1), \ldots, (x_m, y_m) by minimizing the error

E(b_0, b_1, \ldots, b_n) = \sum_{j=1}^{m} [y_j - p_n(x_j)]^2

= \sum_{j=1}^{m} y_j^2 - 2 \sum_{j=1}^{m} p_n(x_j) y_j + \sum_{j=1}^{m} (p_n(x_j))^2

= \sum_{j=1}^{m} y_j^2 - 2 \sum_{j=1}^{m} \Big( \sum_{i=0}^{n} b_i x_j^i \Big) y_j + \sum_{j=1}^{m} \Big( \sum_{i=0}^{n} b_i x_j^i \Big)^2

= \sum_{j=1}^{m} y_j^2 - 2 \sum_{i=0}^{n} b_i \Big( \sum_{j=1}^{m} y_j x_j^i \Big) + \sum_{i=0}^{n} \sum_{k=0}^{n} b_i b_k \Big( \sum_{j=1}^{m} x_j^{i+k} \Big).
As in the linear least squares case, for E to be minimized it is necessary that ∂E/∂b_i = 0 for each i = 0, 1, 2, \ldots, n. Thus for each i,

0 = \frac{\partial E}{\partial b_i} = -2 \sum_{j=1}^{m} y_j x_j^i + 2 \sum_{k=0}^{n} b_k \sum_{j=1}^{m} x_j^{i+k}.   (3.8)
This gives the (n + 1) normal equations in the (n + 1) unknowns b_0, b_1, \ldots, b_n:

b_0\, m + b_1 \sum_{j=1}^{m} x_j + b_2 \sum_{j=1}^{m} x_j^2 + \cdots + b_n \sum_{j=1}^{m} x_j^n = \sum_{j=1}^{m} y_j

b_0 \sum_{j=1}^{m} x_j + b_1 \sum_{j=1}^{m} x_j^2 + b_2 \sum_{j=1}^{m} x_j^3 + \cdots + b_n \sum_{j=1}^{m} x_j^{n+1} = \sum_{j=1}^{m} y_j x_j

\vdots

b_0 \sum_{j=1}^{m} x_j^n + b_1 \sum_{j=1}^{m} x_j^{n+1} + b_2 \sum_{j=1}^{m} x_j^{n+2} + \cdots + b_n \sum_{j=1}^{m} x_j^{2n} = \sum_{j=1}^{m} y_j x_j^n.
Note that the coefficients matrix of this system is symmetric and positive definite. Hence the
normal equations possess a unique solution.
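Since the normal equations are a linear system in b_0, \ldots, b_n, they can be assembled and solved in a few lines; a minimal sketch of an author-defined PolyFit along these lines might be (the function name follows the text, while the signature and body are assumptions):

function b = PolyFit(x, y, n)
% PolyFit  Least squares polynomial of degree n (a sketch; the body
% is an assumption based on the normal equations above).
M = zeros(n+1, n+1);  r = zeros(n+1, 1);
for i = 0:n
    r(i+1) = sum(y .* x.^i);          % right-hand side: sum of y_j x_j^i
    for k = 0:n
        M(i+1, k+1) = sum(x.^(i+k));  % coefficient: sum of x_j^{i+k}
    end
end
b = M \ r;                            % solve the normal equations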
Example 3.2 Find the least squares polynomial approximation of degree 2 to the following data:

x_j  0  1  2  4  6
y_j  3  1  0  1  4

Solution. Consider the quadratic polynomial

p_2(x) = b_0 + b_1 x + b_2 x^2.

The sums required for the normal equations (3.10) are easily obtained using the values in Table 3.3. The linear system involving the unknown coefficients b_0, b_1 and b_2 is

5 b_0 + 13 b_1 + 57 b_2 = 9
13 b_0 + 57 b_1 + 289 b_2 = 29
57 b_0 + 289 b_1 + 1569 b_2 = 161.

Then solving the above linear system, the solution of the linear system is

b_0 = 2.8252, \quad b_1 = -2.0490, \quad b_2 = 0.3774.
[Figure 3.3: the original data and the least squares quadratic p_2(x) = 2.8252 − 2.0490x + 0.3774x^2.]
We use the author-defined function PolyFit and the following MATLAB commands to reproduce
the above results as follows:
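The calls might look as follows (the exact output of PolyFit is an assumption):

>> x = [0 1 2 4 6]; y = [3 1 0 1 4];
>> b = PolyFit(x, y, 2)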
Clearly, p2 (x) replaces the tabulated functional relationship given by y = f (x). The original data
along with the approximating polynomials are shown graphically in Figure 3.3. To plot Figure 3.3
one can use the MATLAB window command:
>> xfit = -1:0.1:7;
>> yfit = 2.8252 - 2.0490.*xfit + 0.3774.*xfit.*xfit;
>> plot(x, y, 'o', xfit, yfit, '-');
Hence the error associated with the least squares polynomial approximation of degree 2 is

E(b_0, b_1, b_2) = \sum_{j=1}^{5} (y_j - p_2(x_j))^2 = 0.2345.
Then the set of normal equations (3.14) represents a system of two equations in the two unknowns a and b. Such nonlinear simultaneous equations can be solved using Newton's method for nonlinear systems.
Example 3.3 Find the best fit of the form y = ax^b by using the following data:

x  1     2     4     10
y  2.87  4.51  6.11  9.43

by Newton's method, starting with the initial approximation (a_0, b_0) = (2, 1) and taking the desired accuracy within ε = 10^{-5}.
Solution. By using the given data points, the nonlinear system (3.15) gives

2.87 + 4.51(2^b) + 6.11(4^b) + 9.43(10^b) - a(1 + 2^{2b} + 4^{2b} + 10^{2b}) = 0
3.12(2^b) + 8.47(4^b) + 21.72(10^b) - a(0.69(2^{2b}) + 1.39(4^{2b}) + 2.30(10^{2b})) = 0.

Let us consider the two functions

f_1(a, b) = 2.87 + 4.51(2^b) + 6.11(4^b) + 9.43(10^b) - a(1 + 2^{2b} + 4^{2b} + 10^{2b})
f_2(a, b) = 3.12(2^b) + 8.47(4^b) + 21.72(10^b) - a(0.69(2^{2b}) + 1.39(4^{2b}) + 2.30(10^{2b}))
and their partial derivatives with respect to the unknown variables a and b:

\frac{\partial f_1}{\partial a} = -(1 + 2^{2b} + 4^{2b} + 10^{2b})

\frac{\partial f_1}{\partial b} = 3.12(2^b) + 8.47(4^b) + 21.71(10^b) - a(1.39(2^{2b}) + 2.77(4^{2b}) + 4.61(10^{2b}))

\frac{\partial f_2}{\partial a} = -(0.69(2^{2b}) + 1.39(4^{2b}) + 2.30(10^{2b}))

\frac{\partial f_2}{\partial b} = 2.16(2^b) + 11.74(4^b) + 50.01(10^b) - a(0.96(2^{2b}) + 3.84(4^{2b}) + 10.61(10^{2b})).
Newton's formula for a system of two nonlinear equations is

\begin{pmatrix} a_{k+1} \\ b_{k+1} \end{pmatrix} = \begin{pmatrix} a_k \\ b_k \end{pmatrix} - J^{-1}(a_k, b_k) \begin{pmatrix} f_1(a_k, b_k) \\ f_2(a_k, b_k) \end{pmatrix},

where

J = \begin{pmatrix} \partial f_1/\partial a & \partial f_1/\partial b \\ \partial f_2/\partial a & \partial f_2/\partial b \end{pmatrix}.
We start with the initial approximation (a_0, b_0) = (2, 1); the values of the functions at this initial approximation are

f_1(2, 1) = -111.39
f_2(2, 1) = -253.216.
The Jacobian matrix J and its inverse J^{-1} at the given initial approximation can be calculated as

J(2, 1) = \begin{pmatrix} -121 & -763.576 \\ -255.248 & -1700.534 \end{pmatrix},

and

J^{-1}(2, 1) = \begin{pmatrix} -0.1565 & 0.0703 \\ 0.0235 & -0.0111 \end{pmatrix}.
Substituting all these values in the above Newton's formula, we get the first approximation

\begin{pmatrix} a_1 \\ b_1 \end{pmatrix} = \begin{pmatrix} 2.0 \\ 1.0 \end{pmatrix} - \begin{pmatrix} -0.1565 & 0.0703 \\ 0.0235 & -0.0111 \end{pmatrix} \begin{pmatrix} -111.39 \\ -253.216 \end{pmatrix} = \begin{pmatrix} 2.3615 \\ 0.7968 \end{pmatrix}.
The first two and further steps of the method are listed in Table 3.8, taking the desired accuracy within ε = 10^{-5}. Hence

y(x) = 3.08314\, x^{0.48751}

is the best nonlinear fit. •
But remember that nonlinear simultaneous equations are more difficult to solve than linear equations. Because of this difficulty, exponential forms are usually linearized by taking logarithms before determining the required parameters. Therefore, taking logarithms of both sides of (3.11), we get

\ln y = \ln a + b \ln x,

which may be written as

Y = A + BX,   (3.16)
where X_j = \ln x_j and Y_j = \ln y_j. After differentiating E with respect to A and B and setting the results equal to zero, we get the normal equations in linear form:

nA + B \sum_{j=1}^{n} X_j = \sum_{j=1}^{n} Y_j

A \sum_{j=1}^{n} X_j + B \sum_{j=1}^{n} X_j^2 = \sum_{j=1}^{n} X_j Y_j,   (3.18)
where

S1 = \sum X_j, \quad S2 = \sum Y_j, \quad S3 = \sum X_j^2, \quad S4 = \sum X_j Y_j.

In the foregoing equations the summation is over j from 1 to n. The solution of the above system can be obtained easily as

A = \frac{S3\,S2 - S1\,S4}{n S3 - (S1)^2}, \qquad B = \frac{n S4 - S1\,S2}{n S3 - (S1)^2}.   (3.19)
Now the data set may be transformed to (\ln x_j, \ln y_j), and determining a and b is a linear least squares problem. The values of the unknowns a and b are deduced from the relations

a = e^A \quad \text{and} \quad b = B.   (3.20)

Thus the nonlinear guess function with parameters a and b,

y(x) = a x^b,

will be called the nonlinear least squares approximation for the data.
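Putting (3.18)-(3.20) together, a minimal sketch of the author-defined Exp1Fit used below might be (the function name and signature follow the text, while the body is an assumption):

function [A, B] = Exp1Fit(x, y)
% Exp1Fit  Least squares fit of y = a*x^b via the linearization
% ln y = ln a + b ln x (a sketch; the body is an assumption).
X = log(x);  Y = log(y);
n = length(X);
S1 = sum(X);  S2 = sum(Y);  S3 = sum(X.^2);  S4 = sum(X.*Y);
A = (S3*S2 - S1*S4)/(n*S3 - S1^2);   % intercept, A = ln a
B = (n*S4 - S1*S2)/(n*S3 - S1^2);    % slope, B = b
% The parameters of the original model are a = exp(A) and b = B.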
Example 3.4 Find the best fit of the form y = ax^b by using the following data:

x  1     2     4     10
y  2.87  4.51  6.11  9.43

Solution. The sums required for the normal equations (3.18) are easily obtained using the values in Table 3.6. The linear system involving A and B in (3.18) form is
\begin{pmatrix} 4 & 4.3821 \\ 4.3821 & 7.7043 \end{pmatrix} \begin{pmatrix} A \\ B \end{pmatrix} = \begin{pmatrix} 6.6144 \\ 8.7201 \end{pmatrix}.
Then solving the above linear system, the solution is

A = 1.0976 \quad \text{and} \quad B = 0.5076.

Using these values of A and B in (3.20), we have the values of the parameters a and b as

a = e^A = 2.9969 \quad \text{and} \quad b = B = 0.5076.

Hence

y(x) = 2.9969\, x^{0.5076},
is the best nonlinear fit. •
We use the author-defined function Exp1Fit and the following MATLAB commands to reproduce
the above results as follows:
>> x = [1 2 4 10];
>> y = [2.87 4.51 6.11 9.43];
>> [A, B] = Exp1Fit(x, y);
Clearly, y(x) replaces the tabulated functional relationship given by y = f (x). The original data
along with the approximating polynomials are shown graphically in Figure 3.4. To plot Figure 3.4
one can use the MATLAB window command:
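By analogy with the commands used for Figure 3.3, one possibility is (the plotting range is an assumption):

>> xfit = 0:0.1:12; yfit = 2.9969.*xfit.^0.5076;
>> plot(x, y, 'o', xfit, yfit, '-');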
Table 3.7 shows the error analysis of the nonlinear least squares approximation.
Hence the error associated with the nonlinear least squares approximation is

E(a, b) = \sum_{i=1}^{4} (y_i - a x_i^b)^2 = 0.1267. •
[Figure 3.4: the original data and the nonlinear least squares fit y(x) = 2.9969x^{0.5076}.]
Similarly, for the other nonlinear curve y(x) = ae^{bx}, the least squares error is defined as

E(a, b) = \sum_{j=1}^{n} (y_j - a e^{b x_j})^2.   (3.21)
Then the set of normal equations (3.22) represents a nonlinear simultaneous system.

Example 3.5 Find the best fit of the form y = ae^{bx} by using the following data:

x  0      0.25   0.4    0.5
y  9.532  7.983  4.826  5.503

by Newton's method, starting with the initial approximation (a_0, b_0) = (8, 0) and taking the desired accuracy within ε = 10^{-5}.

Solution. By using the given data points, the nonlinear system (3.23) gives

f_1(a, b) = 9.532 + 7.983e^{0.25b} + 4.826e^{0.4b} + 5.503e^{0.5b} - a(1 + e^{0.5b} + e^{0.8b} + e^{b}) = 0
f_2(a, b) = 1.9958e^{0.25b} + 1.9304e^{0.4b} + 2.7515e^{0.5b} - a(0.25e^{0.5b} + 0.4e^{0.8b} + 0.5e^{b}) = 0,
and Newton's formula again uses the Jacobian matrix

J = \begin{pmatrix} \partial f_1/\partial a & \partial f_1/\partial b \\ \partial f_2/\partial a & \partial f_2/\partial b \end{pmatrix}.

We start with the initial approximation (a_0, b_0) = (8, 0); the values of the functions at this initial approximation are

f_1(8, 0) = -4.156
f_2(8, 0) = -2.522.
The Jacobian matrix J and its inverse J^{-1} at the given initial approximation can be computed as

J(8, 0) = \begin{pmatrix} -4 & -11.722 \\ -1.15 & -4.913 \end{pmatrix},

and

J^{-1}(8, 0) = \begin{pmatrix} -0.7961 & 1.8993 \\ 0.1863 & -0.6481 \end{pmatrix}.
Substituting all these values in the above Newton's formula, we get the first approximation

\begin{pmatrix} a_1 \\ b_1 \end{pmatrix} = \begin{pmatrix} 8.0 \\ 0.0 \end{pmatrix} - \begin{pmatrix} -0.7961 & 1.8993 \\ 0.1863 & -0.6481 \end{pmatrix} \begin{pmatrix} -4.156 \\ -2.522 \end{pmatrix} = \begin{pmatrix} 9.48168 \\ -0.86015 \end{pmatrix}.
The first two and further steps of the method are listed in Table 3.8, taking the desired accuracy within ε = 10^{-5}. Hence

y(x) = 9.73060\, e^{-1.26492x}

is the best nonlinear fit. •
Once again, to linearize this exponential form we take logarithms of both sides of (3.12) and get

\ln y = \ln a + bx,

which may be written as

Y = A + BX,   (3.24)

with A = \ln a, B = b, X = x, and Y = \ln y. The values of A and B can be chosen to minimize

E(A, B) = \sum_{j=1}^{n} (Y_j - (A + B X_j))^2.   (3.25)
Now the data set may be transformed to (x_j, \ln y_j), and determining a and b is a linear least squares problem. The values of the unknowns a and b are deduced from the relations

a = e^A \quad \text{and} \quad b = B.   (3.27)

Thus the nonlinear guess function with parameters a and b,

y(x) = a e^{bx},

will be called the nonlinear least squares approximation for the data.
Example 3.6 Find the best fit of the form y = ae^{bx} by using the following data:

x  0      0.25   0.4    0.5
y  9.532  7.983  4.826  5.503

Solution. The sums required for the normal equations (3.26) are easily obtained using the values in Table 3.9. The linear system involving the unknown coefficients A and B is

4A + 1.1500B = 7.6112
1.1500A + 0.4725B = 2.0016.
Then solving the above linear system, the solution is

A = 2.2811 \quad \text{and} \quad B = -1.3157.

Using these values in (3.27), we have the values of the unknown parameters a and b as follows:

a = e^A = 9.7874, \quad \text{and} \quad b = B = -1.3157.

Hence the best nonlinear fit is

y(x) = 9.7874\, e^{-1.3157x}.
[Figure 3.9: the original data and the exponential fit y(x) = 9.7874e^{-1.3157x}.]
We use the author-defined function Exp2Fit and the following MATLAB commands to reproduce
the above results as follows:
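By analogy with Exp1Fit, the calls would be along the lines of (the exact signature of Exp2Fit is an assumption):

>> x = [0 0.25 0.4 0.5]; y = [9.532 7.983 4.826 5.503];
>> [A, B] = Exp2Fit(x, y);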
Clearly, y(x) replaces the tabulated functional relationship given by y = f (x). The original data
along with the approximating polynomials are shown graphically in Figure 3.9. To plot Figure 3.9
one can use the MATLAB window command:
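By analogy with the earlier figures, one possibility is (the plotting range is an assumption):

>> xfit = -0.1:0.05:0.6; yfit = 9.7874.*exp(-1.3157.*xfit);
>> plot(x, y, 'o', xfit, yfit, '-');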
Note that the values of a and b calculated for the linearized problem will not necessarily be the same as the values obtained for the original least squares problem. In this example, the nonlinear system becomes the one solved in Example 3.5.
Applying Newton's method for nonlinear systems to this system, we get the values of a and b as follows:

a = 9.731, \quad \text{and} \quad b = -1.265.
Table 3.10 shows the error analysis of the nonlinear least squares approximation.
Hence the error associated with the nonlinear least squares approximation is
E(a, b) = \sum_{i=1}^{4} (y_i - a e^{b x_i})^2 = 2.0496.
Example 3.8 Find the best fit of the form y = axe^{-bx} by using a change of variables to linearize the following data points:

x  1.5  2.5  4.0  5.5
y  3.0  4.3  6.5  7.0
Solution. Write the given form y = axe^{-bx} as

\frac{y}{x} = a e^{-bx},

and take logarithms of both sides of the above equation to get

\ln\frac{y}{x} = \ln a + (-bx).

It may be written as

Y = A + BX,

with A = \ln a, B = -b, X = x, and Y = \ln(y/x).
Then the sums required for the normal equations (3.18) are easily obtained using the values in Table 3.12. Solving the resulting linear system gives

A = 0.8426 \quad \text{and} \quad B = -0.1043.

Using these values of A and B, we have the values of the parameters a and b as

a = e^A = 2.3224 \quad \text{and} \quad b = -B = 0.1043.

Hence

y(x) = 2.3224\, x\, e^{-0.1043x}

is the best nonlinear fit. •
For the least squares plane z = ax + by + c, the error

E(a, b, c) = \sum_{j=1}^{n} (z_j - a x_j - b y_j - c)^2

is minimized when its partial derivatives with respect to a, b and c all vanish:

\frac{\partial E}{\partial a} = 2 \sum_{j=1}^{n} (z_j - a x_j - b y_j - c)(-x_j) = 0

\frac{\partial E}{\partial b} = 2 \sum_{j=1}^{n} (z_j - a x_j - b y_j - c)(-y_j) = 0

\frac{\partial E}{\partial c} = 2 \sum_{j=1}^{n} (z_j - a x_j - b y_j - c)(-1) = 0.
Example 3.9 Find the least squares plane z = ax + by + c by using the following data:
xj 1 1 2 2 2
yj 1 2 1 2 3
zj 7 9 10 11 12
Solution. The sums required for the normal equation (3.32) are easily obtained using the values
in Table 3.16.
The linear system (3.32) involving unknown coefficients a, b and c is
14a + 15b + 8c = 82
15a + 19b + 9c = 93
8a + 9b + 5c = 49
Then solving the above linear system, the solution is

a = 2.4, \quad b = 1.2, \quad c = 3.8,

so the least squares plane is z = 2.4x + 1.2y + 3.8.
We use the author-defined function PlaneFit and the following MATLAB commands to reproduce
the above results as follows:
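A minimal sketch of such a PlaneFit, assembling and solving the 3 × 3 normal equations above, together with the calls for this example, might be (the function name follows the text, while the signature and body are assumptions):

function [a, b, c] = PlaneFit(x, y, z)
% PlaneFit  Least squares plane z = a*x + b*y + c (a sketch; the body
% is an assumption based on the normal equations above).
n = length(x);
M = [sum(x.^2)  sum(x.*y)  sum(x);
     sum(x.*y)  sum(y.^2)  sum(y);
     sum(x)     sum(y)     n     ];
r = [sum(x.*z); sum(y.*z); sum(z)];
s = M \ r;                       % solve the normal equations
a = s(1); b = s(2); c = s(3);

>> x = [1 1 2 2 2]; y = [1 2 1 2 3]; z = [7 9 10 11 12];
>> [a, b, c] = PlaneFit(x, y, z)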
Table 3.15 shows the error analysis of the least squares plane approximation.
Hence

E(a, b, c) = \sum_{i=1}^{5} (z_i - a x_i - b y_i - c)^2 = 0.4000,

the possible error. •
For trigonometric least squares approximation by

p(x) = \frac{a_0}{2} + \sum_{k=1}^{m} [a_k \cos(kx) + b_k \sin(kx)],

the least squares error is

E(a_0, a_1, \ldots, a_m, b_1, b_2, \ldots, b_m) = \sum_{j=1}^{n} [y_j - p(x_j)]^2.   (3.34)
Setting

\frac{\partial E}{\partial a_k} = 0, \quad k = 0, 1, \ldots, m, \qquad \frac{\partial E}{\partial b_k} = 0, \quad k = 1, 2, \ldots, m,   (3.35)

gives
\sum_{j=1}^{n} \Big[ \frac{a_0}{2} + \sum_{i=1}^{m} a_i \cos(i x_j) + \sum_{i=1}^{m} b_i \sin(i x_j) \Big] = \sum_{j=1}^{n} y_j

\sum_{j=1}^{n} \Big[ \frac{a_0}{2} + \sum_{i=1}^{m} a_i \cos(i x_j) + \sum_{i=1}^{m} b_i \sin(i x_j) \Big] \cos(k x_j) = \sum_{j=1}^{n} \cos(k x_j)\, y_j   (3.36)

\sum_{j=1}^{n} \Big[ \frac{a_0}{2} + \sum_{i=1}^{m} a_i \cos(i x_j) + \sum_{i=1}^{m} b_i \sin(i x_j) \Big] \sin(k x_j) = \sum_{j=1}^{n} \sin(k x_j)\, y_j

for k = 1, 2, \ldots, m.
Then the set of these normal equations (3.36) represents a system of (2m + 1) equations in (2m + 1) unknowns, which can be solved using any numerical method discussed in Chapter 1. Note that the derivation of the coefficients a_k and b_k is usually called discrete Fourier analysis.
For m = 1, we can write the normal equations in the following form:

\frac{n}{2} a_0 + a_1 \sum_{j=1}^{n} \cos(x_j) + b_1 \sum_{j=1}^{n} \sin(x_j) = \sum_{j=1}^{n} y_j

a_0 \sum_{j=1}^{n} \cos(x_j) + a_1 \sum_{j=1}^{n} \cos^2(x_j) + b_1 \sum_{j=1}^{n} \cos(x_j)\sin(x_j) = \sum_{j=1}^{n} \cos(x_j)\, y_j   (3.37)

a_0 \sum_{j=1}^{n} \sin(x_j) + a_1 \sum_{j=1}^{n} \cos(x_j)\sin(x_j) + b_1 \sum_{j=1}^{n} \sin^2(x_j) = \sum_{j=1}^{n} \sin(x_j)\, y_j,
where n is the number of data points. By writing the above equations in matrix form, we have

\begin{pmatrix} n/2 & S1 & S2 \\ S1 & S3 & S4 \\ S2 & S4 & S5 \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ b_1 \end{pmatrix} = \begin{pmatrix} S6 \\ S7 \\ S8 \end{pmatrix},   (3.38)
where

S1 = \sum_{j=1}^{n} \cos(x_j), \quad S2 = \sum_{j=1}^{n} \sin(x_j),
S3 = \sum_{j=1}^{n} \cos^2(x_j), \quad S4 = \sum_{j=1}^{n} \cos(x_j)\sin(x_j),
S5 = \sum_{j=1}^{n} \sin^2(x_j), \quad S6 = \sum_{j=1}^{n} y_j,
S7 = \sum_{j=1}^{n} \cos(x_j)\, y_j, \quad S8 = \sum_{j=1}^{n} \sin(x_j)\, y_j.
This is a linear system of three equations in the three unknowns a_0, a_1 and b_1. Note that the coefficient matrix of this system is symmetric and positive definite; hence the normal equations possess a unique solution.
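Assembling and solving (3.38) takes only a few lines; the author-defined TiGFit used below might be sketched as follows (the function name follows the text, while the signature and body are assumptions):

function [a0, a1, b1] = TiGFit(x, y)
% TiGFit  Trigonometric least squares fit for m = 1 (a sketch; the
% body is an assumption based on the system (3.38)).
n = length(x);
c = cos(x);  s = sin(x);
M = [n/2     sum(c)     sum(s);
     sum(c)  sum(c.^2)  sum(c.*s);
     sum(s)  sum(c.*s)  sum(s.^2)];
r = [sum(y); sum(c.*y); sum(s.*y)];
v = M \ r;                      % solve the 3-by-3 normal equations
a0 = v(1); a1 = v(2); b1 = v(3);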
Example 3.10 Find the trigonometric least squares polynomial p(x) = a_0 + a_1 \cos x + b_1 \sin x that approximates the following data:

x  0.1   0.2   0.3   0.4   0.5
y  0.90  0.75  0.64  0.52  0.40

Solution. To find the trigonometric least squares polynomial p(x) = a_0 + a_1 \cos x + b_1 \sin x, we have to solve the following system
\begin{pmatrix} n/2 & S1 & S2 \\ S1 & S3 & S4 \\ S2 & S4 & S5 \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ b_1 \end{pmatrix} = \begin{pmatrix} S6 \\ S7 \\ S8 \end{pmatrix},
where

S1 = \sum_{j=1}^{5} \cos(x_j) = 4.7291, \quad S2 = \sum_{j=1}^{5} \sin(x_j) = 1.4629,
S3 = \sum_{j=1}^{5} \cos^2(x_j) = 4.4817, \quad S4 = \sum_{j=1}^{5} \cos(x_j)\sin(x_j) = 1.3558,
S5 = \sum_{j=1}^{5} \sin^2(x_j) = 0.5183, \quad S6 = \sum_{j=1}^{5} y_j = 3.2100,
S7 = \sum_{j=1}^{5} \cos(x_j)\, y_j = 3.0720, \quad S8 = \sum_{j=1}^{5} \sin(x_j)\, y_j = 0.8223.
So

\begin{pmatrix} 2.5 & 4.7291 & 1.4629 \\ 4.7291 & 4.4817 & 1.3558 \\ 1.4629 & 1.3558 & 0.5183 \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ b_1 \end{pmatrix} = \begin{pmatrix} 3.2100 \\ 3.0720 \\ 0.8223 \end{pmatrix},

and by solving this system, we get the values of the unknowns

a_0 = -0.0002, \quad a_1 = 0.9851, \quad b_1 = -0.9900,

so that p(x) = -0.0002 + 0.9851 \cos x - 0.9900 \sin x.
[Figure: the data points and the trigonometric least squares fit p(x).]
We use the author-defined function TiGFit and the following MATLAB commands to reproduce
the above results as follows:
>> x = [0.1 0.2 0.3 0.4 0.5]; y = [0.9 0.75 0.64 0.52 0.40];
>> TiGFit(x, y)
>> xfit = -0.1:0.1:0.6; yfit = -0.0002 + 0.9851.*cos(xfit) - 0.9900.*sin(xfit);
>> plot(x, y, 'o', xfit, yfit, '-');
Consider the linear system of two equations in the single variable x_1:

2x_1 = 3
4x_1 = 1.   (3.39)

Eliminating x_1 between the two equations gives 0 = -5, which is impossible; hence the given system (3.39) is inconsistent. Writing the given system in vector form, we get

\begin{pmatrix} 2 \\ 4 \end{pmatrix} x_1 = \begin{pmatrix} 3 \\ 1 \end{pmatrix}.   (3.40)
The left-hand side of (3.40) is [0, 0]T when x1 = 0, and is [2, 4]T when x1 = 1. Note that as x1 takes
on all possible values, the left-hand side of (3.40) generates the line connecting the origin and the
point (2, 4) (see Figure 3.8). On the other hand, the right-hand side of (3.40) is the vector [3, 1]T .
Since the point (3, 1) does not lie on the line, therefore, left-hand side and the right-hand side of
(3.40) are never equal. The given system (3.40) is only consistent when the point corresponding
to the right-hand side is contained in the line corresponding to the left-hand side. Thus the least
squares solution to (3.40) is the value of x1 for which the point on the line is closest to the point
(3, 1). In Figure 3.8, we see that the point (1, 2) on the line is closest to (3, 1) which we got when
x1 = 1/2. So the least squares solution to (3.39) is x1 = 1/2. Now consider the following linear
system of three equations in two variables:
a11 x1 + a12 x2 = b1
a21 x1 + a22 x2 = b2 (3.41)
a31 x1 + a32 x2 = b3
Again, it is impossible to find a solution that satisfies all of the equations unless one of the three equations depends on the other two; that is, if only two of the three equations are independent, then a solution is possible. Otherwise, our best hope is to find a solution that minimizes the error, that is, the
least squares solution. Now, we discuss the method for finding the least squares solution to the
overdetermined system.
[Figure 3.8: the line generated by the left-hand side of (3.40); the point (1, 2) on the line is closest to (3, 1).]
In the least squares method, x̂ is chosen so that the Euclidean norm of the residual r = b - Ax̂ is as small as possible. The residual corresponding to system (3.41) is

r = \begin{pmatrix} b_1 - a_{11}x_1 - a_{12}x_2 \\ b_2 - a_{21}x_1 - a_{22}x_2 \\ b_3 - a_{31}x_1 - a_{32}x_2 \end{pmatrix}.

The l_2-norm of the residual is the square root of the sum of each component squared:

\|r\|_2 = \sqrt{r_1^2 + r_2^2 + r_3^2}.
Since minimizing \|r\|_2 is equivalent to minimizing (\|r\|_2)^2, the least squares solution to (3.41) consists of the values of x_1 and x_2 that minimize the expression

(b_1 - a_{11}x_1 - a_{12}x_2)^2 + (b_2 - a_{21}x_1 - a_{22}x_2)^2 + (b_3 - a_{31}x_1 - a_{32}x_2)^2.   (3.42)
The minimizing x1 and x2 are found by differentiating (3.42) with respect to x1 and x2 and setting
the derivatives to zero. Then solving for x1 and x2 , we will obtain the least squares solution
x̂ = [x1 , x2 ]T to the system (3.41).
For a general overdetermined linear system Ax = b, the residual is r = b - Ax̂ and the l_2-norm of the residual is the square root of r^T r. The least squares solution to the linear system minimizes

(\|r\|_2)^2 = r^T r.   (3.43)

The above expression (3.43) attains its minimum when the partial derivative with respect to each of the variables x_1, x_2, \ldots, x_n is zero. Since

r^T r = r_1^2 + r_2^2 + \cdots + r_m^2,   (3.44)

we have

\frac{\partial}{\partial x_j} (r^T r) = -2 r_1 a_{1j} - 2 r_2 a_{2j} - \cdots - 2 r_m a_{mj}.   (3.45)
From the right-hand side of (3.45), we see that the partial derivative of r^T r with respect to x_j is -2 times the product of the jth column of A with r. Note that the jth column of A is the jth row of A^T. Since the jth component of A^T r is equal to the jth column of A times r, the partial derivative of r^T r with respect to x_j is the jth component of the vector -2A^T r. The l_2-norm of the residual is minimized at the point x where all the partial derivatives vanish, that is,

\frac{\partial}{\partial x_1} r^T r = \frac{\partial}{\partial x_2} r^T r = \cdots = \frac{\partial}{\partial x_n} r^T r = 0.   (3.46)
Since each of these partial derivatives is −2 times the corresponding component of AT r, we conclude
that
AT r = 0. (3.47)
Replacing r by b − Ax̂ gives
AT (b − Ax̂) = 0, (3.48)
or
AT Ax̂ = AT b, (3.49)
which is called the normal equation.
Any x̂ that minimizes the l2 -norm of the residual r = b − Ax̂ is a solution to the normal equation
(3.49). Conversely, any solution to the normal equation (3.49) is a least squares solution to the
overdetermined linear system.
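In MATLAB, once A and b are available, the normal equations (3.49) can be formed and solved in one line; a minimal sketch, assuming A has linearly independent columns so that A^T A is nonsingular:

>> xhat = (A'*A) \ (A'*b);   % least squares solution via the normal equations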
Example 3.11 Solve the following overdetermined linear system of four equations in three unknowns:

2x_1 + 5x_2 + x_3 = 1
3x_1 - 4x_2 + 2x_3 = 3
4x_1 + 3x_2 + 3x_3 = 5
5x_1 - 2x_2 + 4x_3 = 7

Solution. The matrix form of the given system is

\begin{pmatrix} 2 & 5 & 1 \\ 3 & -4 & 2 \\ 4 & 3 & 3 \\ 5 & -2 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \\ 5 \\ 7 \end{pmatrix}.

The normal equations A^T A x̂ = A^T b are

\begin{pmatrix} 54 & 0 & 40 \\ 0 & 54 & -2 \\ 40 & -2 & 30 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 66 \\ -6 \\ 50 \end{pmatrix},

and solving this system gives the least squares solution x̂ = [-1, 0, 3]^T. •
[Figure: the solution set of the underdetermined equation 4x_1 + 3x_2 = 15 and the solution (2.4, 1.8) of minimum norm.]
Now, let us consider a general underdetermined linear system Ax = b and suppose that p is any
solution to the linear system and q is any vector for which
Aq = 0.
Since
A(p + q) = Ap + Aq = Ap = b,
we see that p + q is a solution to Ax = b whenever Aq = 0. Conversely, if Ax = b, then

x = p + (x - p) = p + q,

where q = x - p. Since

A(x - p) = b - b = 0,

every solution x can be expressed as x = p + q with Aq = 0. The set of all q such that Aq = 0 is called the null space of A (the kernel of A):

N = \{q : Aq = 0\}.
[Figure 3.10: the solution set of Ax = b, sketched as the point p translated by vectors q in the null space.]
The solutions x of the underdetermined linear system Ax = b are sketched in Figure 3.10: x = p + q with q ∈ N.
In linear algebra, the vectors perpendicular to the null space of A are exactly the linear combinations of the rows of A, so if

A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}, \quad m < n,
then any such vector z has the form

z = \begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{pmatrix} = t_1 \begin{pmatrix} a_{11} \\ a_{12} \\ \vdots \\ a_{1n} \end{pmatrix} + t_2 \begin{pmatrix} a_{21} \\ a_{22} \\ \vdots \\ a_{2n} \end{pmatrix} + \cdots + t_m \begin{pmatrix} a_{m1} \\ a_{m2} \\ \vdots \\ a_{mn} \end{pmatrix},

or

z = \begin{pmatrix} a_{11}t_1 + a_{21}t_2 + \cdots + a_{m1}t_m \\ a_{12}t_1 + a_{22}t_2 + \cdots + a_{m2}t_m \\ \vdots \\ a_{1n}t_1 + a_{2n}t_2 + \cdots + a_{mn}t_m \end{pmatrix} = \begin{pmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} t_1 \\ t_2 \\ \vdots \\ t_m \end{pmatrix}.
So

z = A^T t, \quad \text{where} \quad t = \begin{pmatrix} t_1 \\ t_2 \\ \vdots \\ t_m \end{pmatrix}.
The least squares (minimum norm) solution is therefore x̂ = A^T t, where t satisfies

A A^T t = b,   (3.52)

that is, t = (A A^T)^{-1} b. For example, for the underdetermined system with

A = \begin{pmatrix} 1 & 2 & -3 \\ 5 & -1 & 1 \end{pmatrix} \quad \text{and} \quad b = \begin{pmatrix} 42 \\ 54 \end{pmatrix},

the system A A^T t = b becomes

\begin{pmatrix} 1 & 2 & -3 \\ 5 & -1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 5 \\ 2 & -1 \\ -3 & 1 \end{pmatrix} \begin{pmatrix} t_1 \\ t_2 \end{pmatrix} = \begin{pmatrix} 42 \\ 54 \end{pmatrix},

which reduces the given system to

\begin{pmatrix} 14 & 0 \\ 0 & 27 \end{pmatrix} \begin{pmatrix} t_1 \\ t_2 \end{pmatrix} = \begin{pmatrix} 42 \\ 54 \end{pmatrix},

so that t_1 = 3, t_2 = 2, and the least squares solution is x̂ = A^T t = [13, 4, -7]^T.
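In MATLAB the same computation is a single line; a minimal sketch:

>> A = [1 2 -3; 5 -1 1]; b = [42; 54];
>> xhat = A' * ((A*A') \ b)   % least squares solution of the underdetermined system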
If A is a matrix with linearly independent columns, then the pseudoinverse of A is the matrix A^+ defined by

A^+ = (A^T A)^{-1} A^T.   (3.54)
For example, consider the matrix

A = \begin{pmatrix} 1 & 2 \\ 2 & 3 \\ 3 & 4 \end{pmatrix};
then we have

A^T A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 2 & 3 \\ 3 & 4 \end{pmatrix} = \begin{pmatrix} 14 & 20 \\ 20 & 29 \end{pmatrix},

and its inverse will be of the following form:

(A^T A)^{-1} = \begin{pmatrix} 29/6 & -10/3 \\ -10/3 & 7/3 \end{pmatrix}.

Thus

A^+ = (A^T A)^{-1} A^T = \begin{pmatrix} 29/6 & -10/3 \\ -10/3 & 7/3 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \end{pmatrix} = \begin{pmatrix} -11/6 & -1/3 & 7/6 \\ 4/3 & 1/3 & -2/3 \end{pmatrix}
is the pseudoinverse of the matrix. •
The pseudoinverse of the matrix can be obtained using MATLAB built-in pinv function as follows:
>> A = [1 2; 2 3; 3 4];
>> pinv(A);
Note that if A is a square matrix, then A+ = A−1 and in such case, the least squares solution of a
linear system Ax = b is the exact solution, since
x̂ = A+ b = A−1 b.
Example 3.13 Find the pseudoinverse of the matrix of the following linear system and then use
it to compute the least squares solution of the system.
x1 + 2x2 = 3
2x1 − 3x2 = 4
Solution. The matrix form of the given system is

\begin{pmatrix} 1 & 2 \\ 2 & -3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 3 \\ 4 \end{pmatrix},

and so

A^T A = \begin{pmatrix} 1 & 2 \\ 2 & -3 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 2 & -3 \end{pmatrix} = \begin{pmatrix} 5 & -4 \\ -4 & 13 \end{pmatrix}.

The inverse of the matrix A^T A can be computed as

(A^T A)^{-1} = \begin{pmatrix} 13/49 & 4/49 \\ 4/49 & 5/49 \end{pmatrix},

so the pseudoinverse is

A^+ = (A^T A)^{-1} A^T = \begin{pmatrix} 13/49 & 4/49 \\ 4/49 & 5/49 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 2 & -3 \end{pmatrix} = \begin{pmatrix} 3/7 & 2/7 \\ 2/7 & -1/7 \end{pmatrix},

which gives

x̂ = A^+ b = \begin{pmatrix} 3/7 & 2/7 \\ 2/7 & -1/7 \end{pmatrix} \begin{pmatrix} 3 \\ 4 \end{pmatrix} = \begin{pmatrix} 17/7 \\ 2/7 \end{pmatrix},
and this is the least squares solution of the given system. •
The least squares solution to the linear system by pseudoinverse of a matrix can be obtained using
the MATLAB command window as follows:
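For instance, using the data of Example 3.13 (a minimal sketch):

>> A = [1 2; 2 -3]; b = [3; 4];
>> xhat = pinv(A)*b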
Theorem 3.1 Let A be a matrix with linearly independent columns. Then the pseudoinverse A^+ of A satisfies the following:

1. AA^+A = A.
2. A^+AA^+ = A^+.
3. (A^+)^+ = A.
4. (A^T)^+ = (A^+)^T. •
Let A = QR, where Q has orthonormal columns and R is upper triangular. Substituting into the normal equations gives

A^T A x̂ = A^T b
(QR)^T (QR) x̂ = (QR)^T b
R^T Q^T Q R x̂ = R^T Q^T b
R^T R x̂ = R^T Q^T b \quad (\text{because } Q^T Q = I)
R x̂ = Q^T b,   (3.56)

or equivalently,

x̂ = R^{-1} Q^T b.
Since R is upper triangular, in practice it is easier to solve Rx̂ = Q^T b directly (using backward substitution) than to invert R and compute R^{-1} Q^T b.
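In MATLAB this takes two lines; a minimal sketch, assuming A is m × n with m ≥ n and linearly independent columns:

>> [Q1, R1] = qr(A, 0);      % short QR decomposition
>> xhat = R1 \ (Q1'*b);      % solve R1*xhat = Q1'*b by backward substitution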
Example 3.14 A QR decomposition of A is given. Use it to find a least squares solution of the linear system Ax = b, where

A = \begin{pmatrix} 2 & 2 & 6 \\ 1 & 4 & -3 \\ 2 & -4 & 9 \end{pmatrix}, \quad b = \begin{pmatrix} 1 \\ -1 \\ 4 \end{pmatrix},

and

Q = \begin{pmatrix} 2/3 & 1/3 & -2/3 \\ 1/3 & 2/3 & 2/3 \\ 2/3 & -2/3 & 1/3 \end{pmatrix}, \quad R = \begin{pmatrix} 3 & 0 & 9 \\ 0 & 6 & -6 \\ 0 & 0 & -3 \end{pmatrix}.
Solution. For the right-hand side of (3.56), we obtain

Q^T b = \begin{pmatrix} 2/3 & 1/3 & 2/3 \\ 1/3 & 2/3 & -2/3 \\ -2/3 & 2/3 & 1/3 \end{pmatrix} \begin{pmatrix} 1 \\ -1 \\ 4 \end{pmatrix} = \begin{pmatrix} 3 \\ -3 \\ 0 \end{pmatrix}.
Hence (3.56) can be written as

\begin{pmatrix} 3 & 0 & 9 \\ 0 & 6 & -6 \\ 0 & 0 & -3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 3 \\ -3 \\ 0 \end{pmatrix},
or
3x1 + 9x3 = 3
6x2 − 6x3 = −3
− 3x3 = 0
Now using the backward substitution, we obtain
x̂ = [x1 , x2 , x3 ]T = [1, −1/2, 0]T ,
which is the least squares solution of the given system. •
So we conclude that

R x̂ = Q^T b

must be satisfied by the solution of A^T A x̂ = A^T b. But because in general R is not even square, we cannot use multiplication by (R^T)^{-1} to arrive at this conclusion. In fact, it is not true in general that a solution of

R x̂ = Q^T b

even exists; after all, Ax = b is equivalent to QRx = b, that is, to Rx = Q^T b, so Rx = Q^T b can have an actual solution x only if Ax = b does. However, we are getting close to finding the least squares solution. Here, we need to find a way to simplify the expression

R^T R x̂ = R^T Q^T b.   (3.57)
The matrix R is upper triangular, and because we have restricted ourselves to the case m ≥ n, we may write the m × n matrix R as

R = \begin{pmatrix} R_1 \\ 0 \end{pmatrix}   (3.58)

in partitioned (block) form, where R_1 is an upper triangular n × n matrix and 0 represents an (m - n) × n zero matrix. Since rank(R) = n, R_1 is nonsingular; hence every diagonal element of R_1 must be nonzero. Now we may rewrite
R^T R x̂ = R^T Q^T b

as

\begin{pmatrix} R_1 \\ 0 \end{pmatrix}^T \begin{pmatrix} R_1 \\ 0 \end{pmatrix} x̂ = \begin{pmatrix} R_1 \\ 0 \end{pmatrix}^T Q^T b

\begin{pmatrix} R_1^T & 0^T \end{pmatrix} \begin{pmatrix} R_1 \\ 0 \end{pmatrix} x̂ = \begin{pmatrix} R_1^T & 0^T \end{pmatrix} Q^T b

R_1^T R_1 x̂ = \begin{pmatrix} R_1^T & 0^T \end{pmatrix} (Q^T b).

Note that multiplying by the block 0^T (an n × (m - n) zero matrix) on the right-hand side simply means that the last (m - n) components of Q^T b do not affect the computation. Since R_1 is nonsingular, we have

R_1 x̂ = (R_1^T)^{-1} \begin{pmatrix} R_1^T & 0^T \end{pmatrix} (Q^T b) = \begin{pmatrix} I_n & 0^T \end{pmatrix} (Q^T b) = q,   (3.59)
which is a square linear system involving a nonsingular upper triangular n × n matrix. Solving (3.59) by backward substitution gives the least squares solution of the overdetermined system Ax = b with QR decomposition, where A = QR is the QR decomposition of A and q consists of the first n components of Q^T b. Note that the last (m - n) columns of Q are not needed to find the least squares solution of the linear system with QR decomposition. The block-matrix representation of Q corresponding to R (by (3.58)) is

Q = [Q_1, Q_2],

where Q_1 is the matrix composed of the first n columns of Q and Q_2 is the matrix composed of the remaining columns. Since only the first n columns of Q are needed to create A using the coefficients in R, we can save effort and memory in the process of creating the QR decomposition.
The so-called short QR decomposition of A is
A = Q1 R1 . (3.60)
The only difference between the full QR decomposition and the short decomposition is that the
full QR decomposition contains the additional (m − n) columns of Q.
Example 3.15 Find the least squares solution of the following linear system Ax = b using QR decomposition, where

A = \begin{pmatrix} 2 & 1 \\ 1 & 0 \\ 3 & 1 \end{pmatrix}, \quad x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \quad b = \begin{pmatrix} 1.9 \\ 0.9 \\ 2.8 \end{pmatrix}.
Solution. First, we find the QR decomposition:

Q = \begin{pmatrix} -0.5345 & 0.6172 & -0.5774 \\ -0.2673 & -0.7715 & -0.5774 \\ -0.8018 & -0.1543 & 0.5774 \end{pmatrix} \quad \text{and} \quad R = \begin{pmatrix} -3.7417 & -1.3363 \\ 0 & 0.4629 \\ 0 & 0 \end{pmatrix},

and

Q^T b = \begin{pmatrix} -0.5345 & -0.2673 & -0.8018 \\ 0.6172 & -0.7715 & -0.1543 \\ -0.5774 & -0.5774 & 0.5774 \end{pmatrix} \begin{pmatrix} 1.9 \\ 0.9 \\ 2.8 \end{pmatrix} = \begin{pmatrix} -3.5011 \\ 0.0463 \\ 0.0000 \end{pmatrix}.

So

R_1 = \begin{pmatrix} -3.7417 & -1.3363 \\ 0 & 0.4629 \end{pmatrix} \quad \text{and} \quad q = \begin{pmatrix} -3.5011 \\ 0.0463 \end{pmatrix}.
Hence we must solve (3.59), that is, R_1 x̂ = q, or

\begin{pmatrix} -3.7417 & -1.3363 \\ 0 & 0.4629 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} -3.5011 \\ 0.0463 \end{pmatrix}.

Using backward substitution, we obtain

x̂ = [x_1, x_2]^T = [0.9000, 0.1000]^T. •
The MATLAB built-in qr function returns the QR decomposition of a matrix. There are two ways of calling qr:

>> [Q, R] = qr(A);
>> [Q1, R1] = qr(A, 0);

where Q and Q1 have orthonormal columns and R and R1 are upper triangular matrices. The first form returns the full QR decomposition (that is, if A is m × n, then Q is m × m and R is m × n). The second form returns the short QR decomposition, where Q1 and R1 are the matrices in (3.60).
In Example 3.15, we apply the full QR decomposition of A using the first form of the built-in qr function as

>> A = [2 1; 1 0; 3 1];
>> [Q, R] = qr(A);

The short QR decomposition of A can be obtained by using the second form of the built-in qr function as
>> A = [2 1; 1 0; 3 1];
>> [Q1, R1] = qr(A, 0);
As expected, Q1 and the first two columns of Q are identical, as are R1 and the first two rows of
R. The short QR decomposition of A possesses all the necessary information in the columns of Q1
and R1 to reconstruct A.
Let A = U D V^T be the singular value decomposition of A. Substituting into the normal equations A^T A x̂ = A^T b gives

(U D V^T)^T (U D V^T) x̂ = (U D V^T)^T b,

and simplifying (when D is square and nonsingular) leads to

x̂ = V D^{-1} U^T b.
This is the same formal solution that we found for the linear system Ax = b (see Chapter 6), but recall that A is no longer a square matrix.
Note that in exact arithmetic, the solution of a least squares problem via the normal equations, QR, and SVD is exactly the same. The main difference between these approaches is the numerical stability of the methods. To find the least squares solution of the overdetermined linear system with the SVD, we partition D as

D = \begin{pmatrix} D_1 \\ 0 \end{pmatrix};

the least squares solution of the overdetermined linear system is then obtained by solving

D_1 V^T x̂ = q,   (3.61)

or

x̂ = V D_1^{-1} q.   (3.62)
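A minimal MATLAB sketch along the lines of (3.61)-(3.62), assuming A is m × n with m ≥ n and has full rank:

>> [U1, D1, V] = svd(A, 0);   % short SVD
>> q = U1'*b;                 % first n components of U'*b
>> xhat = V * (D1 \ q);       % least squares solution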
Example 3.16 Find the least squares solution of the following linear system Ax = b using singular value decomposition, where

A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \quad b = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}.
Solution. First, we find the singular value decomposition of the given matrix. The first step is to find the eigenvalues of the matrix

A^T A = \begin{pmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}.

Solving its characteristic equation

p(λ) = λ^2 - 4λ + 3 = (λ - 3)(λ - 1) = 0

gives

λ_1 = 3, \quad λ_2 = 1,
the eigenvalues of A^T A, and the corresponding eigenvectors are

\begin{pmatrix} 1 \\ 1 \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} -1 \\ 1 \end{pmatrix}.

Thus

V = \begin{pmatrix} \sqrt{2}/2 & -\sqrt{2}/2 \\ \sqrt{2}/2 & \sqrt{2}/2 \end{pmatrix} = \begin{pmatrix} 0.7071 & -0.7071 \\ 0.7071 & 0.7071 \end{pmatrix}

and

D = \begin{pmatrix} \sqrt{3} & 0 \\ 0 & 1 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 1.7321 & 0 \\ 0 & 1.0000 \\ 0 & 0 \end{pmatrix}.
The first column of U is

u_1 = \frac{1}{σ_1} A v_1 = \frac{1}{\sqrt{3}} \begin{pmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} \sqrt{2}/2 \\ \sqrt{2}/2 \end{pmatrix} = \begin{pmatrix} \sqrt{6}/3 \\ \sqrt{6}/6 \\ \sqrt{6}/6 \end{pmatrix},

and similarly

u_2 = \frac{1}{σ_2} A v_2 = \begin{pmatrix} 0 \\ \sqrt{2}/2 \\ -\sqrt{2}/2 \end{pmatrix}.
These are two of the three column vectors of U and already form an orthonormal set. To find the third column vector u_3 of U, we look for a unit vector u_3 that is orthogonal to

\sqrt{6}\, u_1 = \begin{pmatrix} 2 \\ 1 \\ 1 \end{pmatrix} \quad \text{and} \quad \sqrt{2}\, u_2 = \begin{pmatrix} 0 \\ 1 \\ -1 \end{pmatrix}.
To satisfy these two orthogonality conditions, the vector u_3 must be a solution of the homogeneous linear system

\begin{pmatrix} 2 & 1 & 1 \\ 0 & 1 & -1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},
which has the general solution

\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = α \begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix}, \quad α \in \mathbb{R}.
Normalizing the vector on the right-hand side gives u_3 = [-1/\sqrt{3}, 1/\sqrt{3}, 1/\sqrt{3}]^T. So we have

U = \begin{pmatrix} \sqrt{6}/3 & 0 & -1/\sqrt{3} \\ \sqrt{6}/6 & \sqrt{2}/2 & 1/\sqrt{3} \\ \sqrt{6}/6 & -\sqrt{2}/2 & 1/\sqrt{3} \end{pmatrix} = \begin{pmatrix} 0.8165 & 0.0000 & -0.5774 \\ 0.4082 & 0.7071 & 0.5774 \\ 0.4082 & -0.7071 & 0.5774 \end{pmatrix}.
Hence

D_1 = \begin{pmatrix} 1.7321 & 0 \\ 0 & 1.0000 \end{pmatrix} \quad \text{and} \quad D_1^{-1} = \begin{pmatrix} 0.5774 & 0 \\ 0 & 1.0000 \end{pmatrix}.

Also,

U^T b = \begin{pmatrix} 0.8165 & 0.4082 & 0.4082 \\ 0.0000 & 0.7071 & -0.7071 \\ -0.5774 & 0.5774 & 0.5774 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1.6330 \\ 0.0000 \\ 0.5774 \end{pmatrix},

so that q = [1.6330, 0.0000]^T and, by (3.62),

x̂ = V D_1^{-1} q = \begin{pmatrix} 0.7071 & -0.7071 \\ 0.7071 & 0.7071 \end{pmatrix} \begin{pmatrix} 0.5774 & 0 \\ 0 & 1.0000 \end{pmatrix} \begin{pmatrix} 1.6330 \\ 0.0000 \end{pmatrix} = \begin{pmatrix} 0.6667 \\ 0.6667 \end{pmatrix}. •
The MATLAB built-in svd function returns the singular value decomposition of a matrix. There are two ways of calling svd:

>> [U, D, V] = svd(A);
>> [U1, D1, V] = svd(A, 0);

Here A is any matrix, D is a diagonal matrix having the singular values of A on its diagonal, and U and V are orthogonal matrices. The first form returns the full SVD and the second form returns the short SVD. The second decomposition is useful when A is an m × n matrix with m > n: the second form gives U1, the first n columns of U, and a square (n × n) D1. When m > n, the full SVD of A gives a D matrix with only zeros in the last (m − n) rows. Note that there is no change in V between the two forms.
In Example 3.16, we apply the full SVD of A using the first form of the built-in svd function as
>> A = [1 1; 0 1; 1 0];
>> [U, D, V] = svd(A);
The short SVD decomposition of A can be obtained by using the second form of the built-in svd function:
>> A = [1 1; 0 1; 1 0];
>> [U1, D1, V] = svd(A, 0);
As expected, U1 and the first two columns of U are identical, as are D1 and the first two rows of D
(and no change in V in both forms). The short SVD decomposition of A possesses all the necessary
information in the columns of U1 and D1 (with V also) to reconstruct A.
Now we consider the problem of finding the least squares solution of the underdetermined linear system Ax = b. Since, as we discussed previously, the least squares solution of Ax = b is the solution of A A^T x̂ = b, substituting A = U D V^T gives

(U D V^T)(U D V^T)^T x̂ = b,

so that

x̂ = U (D^{-1})^T D^{-1} U^T b.

Note that

D = \begin{pmatrix} D_1 & 0 \end{pmatrix},

where D_1 is square, and the computation reduces to solving

D_1^T U^T x̂ = p,   (3.63)

or

x̂ = U (D_1^{-1})^T p.   (3.64)
Example 3.17 Find the least squares solution of the following linear system Ax = b using singular value decomposition, where

A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}, \quad b = \begin{pmatrix} 1 \\ 2 \end{pmatrix}.

Solution. Solving the characteristic equation of

A^T A = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}

gives

λ_1 = 2, \quad λ_2 = 1, \quad λ_3 = 0,

the eigenvalues of A^T A, and the corresponding eigenvectors are

\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}, \quad \begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix}.

Thus

V = \begin{pmatrix} \sqrt{2}/2 & 0 & -\sqrt{2}/2 \\ \sqrt{2}/2 & 0 & \sqrt{2}/2 \\ 0 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 0.7071 & 0 & -0.7071 \\ 0.7071 & 0 & 0.7071 \\ 0 & 1 & 0 \end{pmatrix},

and

D = \begin{pmatrix} \sqrt{2} & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 1.4142 & 0 & 0 \\ 0 & 1.0000 & 0 \end{pmatrix}.

Here U = I_2 and D_1 = \begin{pmatrix} 1.4142 & 0 \\ 0 & 1.0000 \end{pmatrix}; proceeding as in (3.63) and (3.64), the least squares solution is x̂ = [0.5, 0.5, 2]^T. •
Note that when m and n are of similar size, the SVD is significantly more expensive to compute than the QR decomposition. If m and n are equal, then solving a least squares problem by the SVD is about an order of magnitude more costly than using the QR decomposition. So for least squares problems it is generally advisable to use the QR decomposition; when a least squares problem is known to be a difficult one, using the SVD is probably justified.