
Chapter 3

Least Squares Approximation

3.1 Introduction
In the previous chapter we discussed approximation of a function using polynomial interpolation.
Here, we discuss another approach to approximating a function, called least squares approximation.
This approach is suitable when the given data points are experimental data. We shall discuss linear,
nonlinear, plane, and trigonometric least squares approximation of a function. We shall also discuss
least squares solutions of overdetermined and underdetermined linear systems. At the end of the
chapter we shall discuss least squares with QR decomposition and singular value decomposition.

3.2 Techniques of Least Squares Approximation


In fitting a curve to given data points, there are two basic approaches. One is to have the graph of
the approximating function pass exactly through the given data points. The polynomial interpolation
methods discussed in the previous chapter have this property. If the data values are experimental,
however, they may contain errors or have a limited number of significant digits, and in such cases the
polynomial interpolation methods may yield unsatisfactory results. The second approach, which is
discussed here and is usually more satisfactory for experimental data, uses an approximating function
whose graph is a smooth curve having the general shape suggested by the data values but not, in
general, passing exactly through all of the data points. This approach is known as least squares
data fitting. The least squares method seeks to minimize the
sum (over all data points) of the squares of the differences between the function value and the data
value. The method is based on results from calculus demonstrating that a function, in this case
the total squared error, attains a minimum value when its partial derivatives are zero.
The least squares method of evaluating empirical formulas has been used for many years. In
engineering, curve fitting plays an important role in the analysis, interpretation and correlation of
experimental data with mathematical models formulated from fundamental engineering principles.

3.2.1 Linear Least Squares


To introduce the idea of linear least squares approximation, consider the experimental data shown
in Figure 3.1(a).


Figure 3.1: Least squares approximation.

A Lagrange interpolating polynomial of degree six could easily be constructed for these data.
However, there is no justification for insisting that the data points be reproduced exactly, and
such an approximation may well be very misleading since unwanted oscillations are likely. A more
satisfactory approach would be to find a straight line which passes close to all seven points. One
such possibility is shown in Figure 3.1(b). Here we have to decide what criterion is to be adopted
for constructing such an approximation. The most common approach for this curve is known as
linear least squares data fitting. The linear least squares approach defines the correct straight line
as the one that minimizes the sum of the squares of the distances between the data points and the
line.
The least squares straight line is an extremely useful and common approximate fit. The solution of
the linear least squares approximation is an important application of the solution of systems of linear
equations and leads to other interesting ideas of numerical linear algebra. The least squares
approximation is not restricted to straight lines; however, in order to motivate the general case we
consider this simplest case first. The straight line

p1 (x) = a + bx, (3.1)

should be fitted through the given points (x1 , y1 ), . . . , (xn , yn ) so that the sum of the squares of the
distances of these points from the straight line is minimum, where the distance is measured in the
vertical direction (the y-direction). Hence it will suffice to minimize the function
E(a, b) = \sum_{j=1}^{n} (y_j - a - b x_j)^2.    (3.2)

The minimum of E occurs when the partial derivatives of E with respect to a and b are zero.
Note that {x_j} and {y_j} are constants in (3.2) and the unknown parameters a and b are the variables.
Differentiating E with respect to a, holding b fixed, and setting the result equal to zero gives

\frac{\partial E}{\partial a} = -2 \sum_{j=1}^{n} (y_j - a - b x_j) = 0.    (3.3)

Similarly, holding a fixed, differentiating E with respect to b, and setting the result equal to zero,

we obtain

\frac{\partial E}{\partial b} = -2 \sum_{j=1}^{n} x_j (y_j - a - b x_j) = 0.    (3.4)

The above equations (3.3) and (3.4) may be rewritten, after dividing by -2, as follows:

\sum_{j=1}^{n} y_j - \sum_{j=1}^{n} a - b \sum_{j=1}^{n} x_j = 0

\sum_{j=1}^{n} x_j y_j - a \sum_{j=1}^{n} x_j - b \sum_{j=1}^{n} x_j^2 = 0,

which can be arranged to form a 2 x 2 system that is known as the normal equations:

n a + b \sum_{j=1}^{n} x_j = \sum_{j=1}^{n} y_j

a \sum_{j=1}^{n} x_j + b \sum_{j=1}^{n} x_j^2 = \sum_{j=1}^{n} x_j y_j.

Now writing in matrix form, we have

\begin{pmatrix} n & S_1 \\ S_1 & S_3 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} S_2 \\ S_4 \end{pmatrix},    (3.5)

where

S_1 = \sum x_j, \quad S_2 = \sum y_j, \quad S_3 = \sum x_j^2, \quad S_4 = \sum x_j y_j.
In the foregoing equations the summation is over j from 1 to n.
The solution of the above system (3.5) can be obtained easily as

a = \frac{S_3 S_2 - S_1 S_4}{n S_3 - (S_1)^2} \quad \text{and} \quad b = \frac{n S_4 - S_1 S_2}{n S_3 - (S_1)^2}.    (3.6)

The formula (3.6) reduces the problem of finding the parameters of the least squares linear fit to
simple arithmetic.
We shall call a and b the least squares linear parameters for the data, and the linear guess function
with these parameters, that is,

p_1(x) = a + bx,

will be called the least squares line (or regression line) for the data.
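As a concrete illustration of formulas (3.5) and (3.6), the following short MATLAB function is a minimal sketch of this computation (our own illustration; it is not necessarily the author-defined LineFit routine used later in this section):

function [a, b] = linefit_sketch(x, y)
% Least squares line p1(x) = a + b*x from the normal equations (3.5)-(3.6).
n  = length(x);
S1 = sum(x);                 % sum of x_j
S2 = sum(y);                 % sum of y_j
S3 = sum(x.^2);              % sum of x_j^2
S4 = sum(x.*y);              % sum of x_j*y_j
d  = n*S3 - S1^2;            % determinant of the 2x2 normal matrix
a  = (S3*S2 - S1*S4)/d;
b  = (n*S4 - S1*S2)/d;
end

For the data of Example 3.1 below, [a, b] = linefit_sketch([1 2 3 4], [1 2 2 3]) returns a = 0.5 and b = 0.6.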

Example 3.1 Using the method of least squares, fit a straight line to the four points

(1, 1), (2, 2), (3, 2) and (4, 3).

Solution. The sums required for the normal equations (3.5) are easily obtained using the values in
Table 3.1. The linear system (3.5) involving a and b is

Table 3.1: Find the Coefficients of (3.5)


i    xi        yi        xi^2      xi*yi
1 1.0000 1.0000 1.0000 1.0000
2 2.0000 2.0000 4.0000 4.0000
3 3.0000 2.0000 9.0000 6.0000
4 4.0000 3.0000 16.0000 12.0000
n=4 S1=10 S2=8 S3=30 S4=23

Table 3.2: Error analysis of the linear fit


i xi yi p1 (xi ) abs(yi − p1 (xi ))
1 1.0000 1.0000 1.1000 0.1000
2 2.0000 2.0000 1.7000 0.3000
3 3.0000 2.0000 2.3000 0.3000
4 4.0000 3.0000 2.9000 0.1000

\begin{pmatrix} 4 & 10 \\ 10 & 30 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} 8 \\ 23 \end{pmatrix}.
Then solving the above linear system using the Cholesky (LU) decomposition method discussed in
Chapter 1, we obtain
a = 0.5 and b = 0.6.
Thus the least squares line is
p1 (x) = 0.5 + 0.6x.
Clearly, p1 (x) replaces the tabulated functional relationship given by y = f (x). The original data
along with the approximating polynomials are shown graphically in Figure 3.2.

We use the author-defined function LineFit and the following MATLAB commands to reproduce
the above results as follows:

>> x = [1 2 3 4]; y = [1 2 2 3]; [a, b] = LineFit(x, y)

To plot Figure 3.2 one can use the MATLAB window commands:

>> xfit = 0:0.1:5; yfit = 0.6*xfit + 0.5;
>> plot(x, y, 'o', xfit, yfit, '-');

Table 3.2 shows the error analysis of the straight line using least squares approximation. Hence we
have
E(a, b) = \sum_{i=1}^{4} (y_i - p_1(x_i))^2 = 0.2000,

which is the total squared error of the fit. •

Figure 3.2: Least squares fit of four data points to a line.

3.2.2 Polynomial Least Squares


In the previous section we discussed a procedure to derive the equation of a straight line using least
squares, which works very well if the measured data are intrinsically linear. But in many cases data
from experimental results are not linear. Therefore, we now show how to find the least squares
parabola; the extension to a polynomial of higher degree is easily made. The general problem is to
approximate a set of data {(x_j, y_j), j = 1, 2, ..., m} with a polynomial of degree n < m - 1,

p_n(x) = b_0 + b_1 x + b_2 x^2 + \cdots + b_n x^n.    (3.7)
Then the error E takes the form

E = \sum_{j=1}^{m} (y_j - p_n(x_j))^2
  = \sum_{j=1}^{m} y_j^2 - 2 \sum_{j=1}^{m} p_n(x_j)\, y_j + \sum_{j=1}^{m} (p_n(x_j))^2
  = \sum_{j=1}^{m} y_j^2 - 2 \sum_{j=1}^{m} \Big( \sum_{i=0}^{n} b_i x_j^i \Big) y_j + \sum_{j=1}^{m} \Big( \sum_{i=0}^{n} b_i x_j^i \Big)^2
  = \sum_{j=1}^{m} y_j^2 - 2 \sum_{i=0}^{n} b_i \Big( \sum_{j=1}^{m} y_j x_j^i \Big) + \sum_{i=0}^{n} \sum_{k=0}^{n} b_i b_k \Big( \sum_{j=1}^{m} x_j^{i+k} \Big).

As in the linear least squares case, for E to be minimized it is necessary that \partial E/\partial b_i = 0 for each
i = 0, 1, 2, . . . , n. Thus, for each i,

0 = \frac{\partial E}{\partial b_i} = -2 \sum_{j=1}^{m} y_j x_j^i + 2 \sum_{k=0}^{n} b_k \sum_{j=1}^{m} x_j^{i+k}.    (3.8)

This gives (n+1) normal equations in the (n+1) unknowns b_i:

\sum_{k=0}^{n} b_k \sum_{j=1}^{m} x_j^{i+k} = \sum_{j=1}^{m} y_j x_j^i, \qquad i = 0, 1, 2, \ldots, n.    (3.9)

It is helpful to write these equations out as follows:

b_0 \sum_{j=1}^{m} x_j^0 + b_1 \sum_{j=1}^{m} x_j^1 + b_2 \sum_{j=1}^{m} x_j^2 + \cdots + b_n \sum_{j=1}^{m} x_j^n = \sum_{j=1}^{m} y_j x_j^0

b_0 \sum_{j=1}^{m} x_j^1 + b_1 \sum_{j=1}^{m} x_j^2 + b_2 \sum_{j=1}^{m} x_j^3 + \cdots + b_n \sum_{j=1}^{m} x_j^{n+1} = \sum_{j=1}^{m} y_j x_j^1

\vdots

b_0 \sum_{j=1}^{m} x_j^n + b_1 \sum_{j=1}^{m} x_j^{n+1} + b_2 \sum_{j=1}^{m} x_j^{n+2} + \cdots + b_n \sum_{j=1}^{m} x_j^{2n} = \sum_{j=1}^{m} y_j x_j^n.

Note that the coefficient matrix of this system is symmetric and positive definite. Hence the
normal equations possess a unique solution.
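The normal equations (3.9) can also be assembled and solved in a few lines of MATLAB. The sketch below is our own illustration (not the author-defined PolyFit function used later); it builds the matrix of powers column by column and solves the resulting symmetric system:

function b = polyfit_normal_sketch(x, y, n)
% Least squares polynomial of degree n via the normal equations (3.9).
x = x(:);  y = y(:);
V = zeros(length(x), n+1);
for k = 0:n
    V(:, k+1) = x.^k;        % column k+1 holds x_j^k
end
b = (V'*V) \ (V'*y);         % V'*V contains the power sums; b = [b0; b1; ...; bn]
end

For the data of Example 3.2 below, polyfit_normal_sketch([0 1 2 4 6], [3 1 0 1 4], 2) returns coefficients close to 2.8252, -2.0490 and 0.3774.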

Example 3.2 Find the least squares polynomial approximation of degree 2 to the following data:

xj 0 1 2 4 6
yj 3 1 0 1 4

Solution. The coefficients of the least squares polynomial approximation of degree 2,

p2 (x) = b0 + b1 x + b2 x2 ,

are the solution values b0 , b1 and b2 of the linear system


b_0\, m + b_1 \sum x_j + b_2 \sum x_j^2 = \sum y_j x_j^0

b_0 \sum x_j + b_1 \sum x_j^2 + b_2 \sum x_j^3 = \sum y_j x_j^1    (3.10)

b_0 \sum x_j^2 + b_1 \sum x_j^3 + b_2 \sum x_j^4 = \sum y_j x_j^2

The sums required for the normal equation (3.10) are easily obtained using the values in Table 3.3.
The linear system involving unknown coefficients b0 , b1 and b2 is

5b0 + 13b1 + 57b2 = 9


13b0 + 57b1 + 289b2 = 29
57b0 + 289b1 + 1569b2 = 161

Then solving the above linear system, the solution of the linear system is

b0 = 2.8252, b1 = −2.0490, b2 = 0.3774.

Hence the parabola equation becomes

p2 (x) = 2.8252 − 2.0490x + 0.3774x2 .




Figure 3.3: Least squares fit of five data points to a parabola.

Table 3.3: Find the Coefficients of (3.10)


i    xi      yi      xi^2     xi^3      xi^4       xi*yi    xi^2*yi
1 0.00 3.00 0.00 0.00 0.00 0.00 0.00
2 1.00 1.00 1.00 1.00 1.00 1.00 1.00
3 2.00 0.00 4.00 8.00 16.00 0.00 0.00
4 4.00 1.00 16.00 64.00 256.00 4.00 16.00
5 6.00 4.00 36.00 216.00 1296.00 24.00 144.00
m=5 13.00 9.00 57.00 289.00 1569.00 29.00 161.00

We use the author-defined function PolyFit and the following MATLAB commands to reproduce
the above results as follows:

>> x = [0 1 2 4 6]; y = [3 1 0 1 4];
>> n = 2;
>> C = PolyFit(x, y, n);

Clearly, p2 (x) replaces the tabulated functional relationship given by y = f (x). The original data
along with the approximating polynomials are shown graphically in Figure 3.3. To plot Figure 3.3
one can use the MATLAB window command:

>> xfit = -1:0.1:7;
>> yfit = 2.8252 - 2.0490.*xfit + 0.3774.*xfit.*xfit;
>> plot(x, y, 'o', xfit, yfit, '-');

3.2.3 Nonlinear Least Squares


Here, we will discuss two well-known nonlinear least squares forms: the exponential forms and
the hyperbolic form.
Table 3.4 shows the error analysis of the parabola of Example 3.2 obtained by least squares approximation.

Table 3.4: Error analysis of the polynomial fit


i xi yi p2 (xi ) abs(yi − p2 (xi ))
1 0.0000 3.0000 2.8252 0.1748
2 1.0000 1.0000 1.1535 0.1535
3 2.0000 0.0000 0.2367 0.2367
4 4.0000 1.0000 0.6674 0.3326
5 6.0000 4.0000 4.1173 0.1173
13.000 9.0000 9.0001 1.0148

Hence the error associated with the least squares polynomial approximation of degree 2 is
E(b_0, b_1, b_2) = \sum_{i=1}^{5} (y_i - p_2(x_i))^2 = 0.2345.

3.2.3.1 Exponential Forms


Although polynomials are frequently used as the approximating function, they are by no means the
only possibility. The most popular forms of nonlinear curves are the exponential forms
y(x) = axb , (3.11)
or
y(x) = aebx . (3.12)
We can develop the normal equations for these analogously to the development for the least
squares line. The least squares error for (3.11) is given by

E(a, b) = \sum_{j=1}^{n} (y_j - a x_j^b)^2,    (3.13)

with associated normal equations

\frac{\partial E}{\partial a} = -2 \sum_{j=1}^{n} (y_j - a x_j^b)\, x_j^b = 0

\frac{\partial E}{\partial b} = -2 \sum_{j=1}^{n} (y_j - a x_j^b)\, a x_j^b \ln x_j = 0        (3.14)

Then the set of normal equations (3.14) represents a system of two equations in the two unknowns
a and b. Such nonlinear simultaneous equations can be solved using Newton's method for
nonlinear systems.
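As a rough sketch of how such a 2 x 2 nonlinear system might be solved numerically, the following generic Newton iteration (our own illustration; the function handles f and J evaluating the two equations and their Jacobian must be supplied by the user) can be applied to the normal equations (3.14):

function v = newton2_sketch(f, J, v0, tol, maxit)
% Newton's method for a system f(v) = 0 of two equations in v = [a; b].
% f(v) returns the 2x1 residual vector, J(v) the 2x2 Jacobian matrix.
v = v0(:);
for k = 1:maxit
    dv = J(v) \ f(v);        % solve J*dv = f rather than forming inv(J)
    v  = v - dv;
    if norm(dv) < tol
        return               % converged to the requested accuracy
    end
end
end

For Example 3.3 below, one would pass handles that evaluate the two equations of (3.15) and their Jacobian, with v0 = [2; 1] and tol = 1e-5.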
Example 3.3 Find the best-fit of the form y = a x^b by using the following data:

x 1 2 4 10
y 2.87 4.51 6.11 9.43

by Newton's method, starting with the initial approximation (a_0, b_0) = (2, 1) and taking desired
accuracy within ε = 10^{-5}.

Solution. The normal equations are

\sum_{j=1}^{4} y_j x_j^b - a \sum_{j=1}^{4} x_j^{2b} = 0

\sum_{j=1}^{4} y_j x_j^b \ln x_j - a \sum_{j=1}^{4} x_j^{2b} \ln x_j = 0        (3.15)

By using the given data points, the nonlinear system (3.15) gives

2.87 - a(1 + 2^{2b} + 4^{2b} + 10^{2b}) + 4.5(2^b) + 6.11(4^b) + 9.43(10^b) = 0
-a(0.69(2^{2b}) + 1.39(4^{2b}) + 2.30(10^{2b})) + 3.12(2^b) + 8.47(4^b) + 21.72(10^b) = 0

Let us consider the two functions

f_1(a, b) = 2.87 - a(1 + 2^{2b} + 4^{2b} + 10^{2b}) + 4.5(2^b) + 6.11(4^b) + 9.43(10^b)
f_2(a, b) = -a(0.69(2^{2b}) + 1.39(4^{2b}) + 2.30(10^{2b})) + 3.12(2^b) + 8.47(4^b) + 21.72(10^b)
and their derivatives with respect to the unknown variables a and b:

\partial f_1/\partial a = -(1 + 2^{2b} + 4^{2b} + 10^{2b})
\partial f_1/\partial b = -a(1.39(2^{2b}) + 2.77(4^{2b}) + 4.61(10^{2b})) + 3.12(2^b) + 8.47(4^b) + 21.71(10^b)
\partial f_2/\partial a = -(0.69(2^{2b}) + 1.39(4^{2b}) + 2.30(10^{2b}))
\partial f_2/\partial b = -a(0.96(2^{2b}) + 3.84(4^{2b}) + 10.61(10^{2b})) + 2.16(2^b) + 11.74(4^b) + 50.01(10^b)
Since Newton's formula for a system of two nonlinear equations is

\begin{pmatrix} a_{k+1} \\ b_{k+1} \end{pmatrix} = \begin{pmatrix} a_k \\ b_k \end{pmatrix} - J^{-1}(a_k, b_k) \begin{pmatrix} f_1(a_k, b_k) \\ f_2(a_k, b_k) \end{pmatrix},

where

J = \begin{pmatrix} \partial f_1/\partial a & \partial f_1/\partial b \\ \partial f_2/\partial a & \partial f_2/\partial b \end{pmatrix}.
We start with the initial approximation (a_0, b_0) = (2, 1); the values of the functions at this
initial approximation are

f_1(2, 1) = -111.39
f_2(2, 1) = -253.216

The Jacobian matrix J and its inverse J^{-1} at the given initial approximation can be calculated as follows:

J(2, 1) = \begin{pmatrix} -121 & -763.576 \\ -255.248 & -1700.534 \end{pmatrix},

and

J^{-1}(2, 1) = \begin{pmatrix} -0.1565 & 0.0703 \\ 0.0235 & -0.0111 \end{pmatrix}.

Substituting all these values in the above Newton's formula, we get the first approximation as follows:

\begin{pmatrix} a_1 \\ b_1 \end{pmatrix} = \begin{pmatrix} 2.0 \\ 1.0 \end{pmatrix} - \begin{pmatrix} -0.1565 & 0.0703 \\ 0.0235 & -0.0111 \end{pmatrix} \begin{pmatrix} -111.39 \\ -253.216 \end{pmatrix} = \begin{pmatrix} 2.3615 \\ 0.7968 \end{pmatrix}.

Similarly, the second iteration, using (a_1, b_1) = (2.3615, 0.7968), gives

\begin{pmatrix} a_2 \\ b_2 \end{pmatrix} = \begin{pmatrix} 2.3615 \\ 0.7968 \end{pmatrix} - \begin{pmatrix} -0.2323 & 0.1063 \\ 0.0339 & -0.0169 \end{pmatrix} \begin{pmatrix} -35.4457 \\ -81.1019 \end{pmatrix} = \begin{pmatrix} 2.7444 \\ 0.6282 \end{pmatrix}.

The first two and further steps of the method are listed in Table 3.5, with desired accuracy
within ε = 10^{-5}.

Table 3.5: Solution of a system of two nonlinear equations


n a-approximation b-approximation
an bn
0.0000 2.00000 1.00000
1.0000 2.36151 0.79684
2.0000 2.74443 0.62824
3.0000 2.99448 0.52535
4.0000 3.07548 0.49095
5.0000 3.08306 0.48754
6.0000 3.08314 0.48751

Hence
y(x) = 3.08314 x^{0.48751}
is the best nonlinear fit. •

But remember that nonlinear simultaneous equations are more difficult to solve than linear equa-
tions. Because of this difficulty, the exponential forms are usually linearized by taking logarithms
before determining the required parameters. Therefore, taking logarithms of both sides of (3.11),
we get
ln y = ln a + b ln x,
which may be written as
Y = A + BX, (3.16)

with A = ln a, B = b, X = ln x, and Y = ln y. The values of A and B can be chosen to minimize


E(A, B) = \sum_{j=1}^{n} (Y_j - (A + B X_j))^2,    (3.17)

where X_j = ln x_j and Y_j = ln y_j. After differentiating E with respect to A and B and setting
the results equal to zero, we get the normal equations in linear form:

n A + B \sum_{j=1}^{n} X_j = \sum_{j=1}^{n} Y_j

A \sum_{j=1}^{n} X_j + B \sum_{j=1}^{n} X_j^2 = \sum_{j=1}^{n} X_j Y_j.

Writing the above equations in matrix form, we have

\begin{pmatrix} n & S_1 \\ S_1 & S_3 \end{pmatrix} \begin{pmatrix} A \\ B \end{pmatrix} = \begin{pmatrix} S_2 \\ S_4 \end{pmatrix},    (3.18)

where

S_1 = \sum X_j, \quad S_2 = \sum Y_j, \quad S_3 = \sum X_j^2, \quad S_4 = \sum X_j Y_j.
In the foregoing equations the summation is over j from 1 to n. The solution of the above system
can be obtained easily as

A = \frac{S_3 S_2 - S_1 S_4}{n S_3 - (S_1)^2} \quad \text{and} \quad B = \frac{n S_4 - S_1 S_2}{n S_3 - (S_1)^2}.    (3.19)

Now the data set may be transformed to (ln x_j, ln y_j) and determining a and b becomes a linear least
squares problem. The values of the unknowns a and b are deduced from the relations

a = e^A and b = B.    (3.20)

Thus the nonlinear guess function with these parameters,

y(x) = a x^b,

will be called the nonlinear least squares approximation for the data.
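A minimal MATLAB sketch of this linearized power fit (our own illustration, not necessarily the author-defined Exp1Fit function used later) is:

function [a, b] = powerfit_sketch(x, y)
% Fit y = a*x^b by linearizing: ln y = ln a + b ln x, then solve (3.18)-(3.19).
X  = log(x);  Y = log(y);
n  = length(X);
S1 = sum(X);  S2 = sum(Y);  S3 = sum(X.^2);  S4 = sum(X.*Y);
d  = n*S3 - S1^2;
A  = (S3*S2 - S1*S4)/d;
B  = (n*S4 - S1*S2)/d;
a  = exp(A);  b = B;         % back-transform by (3.20)
end

For the data of Example 3.4 below this returns a ≈ 2.9969 and b ≈ 0.5076.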

Example 3.4 Find the best-fit of the form y = a x^b by using the following data:

x 1 2 4 10
y 2.87 4.51 6.11 9.43

Solution. The sums required for the normal equation (3.18) are easily obtained using the values
in Table 3.6. The linear system (3.18) involving A and B is

Table 3.6: Find the Coefficients of (3.19)


i    Xi        Yi        Xi^2      Xi*Yi
1 0.0000 1.0543 0.0000 0.0000
2 0.6932 1.5063 0.4805 1.0442
3 1.3863 1.8099 1.9218 2.5091
4 2.3026 2.2439 5.3020 5.1668
n=4 S1=4.3821 S2=6.6144 S3=7.7043 S4=8.7201

\begin{pmatrix} 4 & 4.3821 \\ 4.3821 & 7.7043 \end{pmatrix} \begin{pmatrix} A \\ B \end{pmatrix} = \begin{pmatrix} 6.6144 \\ 8.7201 \end{pmatrix}.

Then solving the above linear system gives

A = 1.0975 and B = 0.5076.

Using these values of A and B in (3.20), we have the values of the parameters a and b as

a = e^A = 2.9969 and b = B = 0.5076.

Hence
y(x) = 2.9969 x^{0.5076}
is the best nonlinear fit. •

We use the author-defined function Exp1Fit and the following MATLAB commands to reproduce
the above results as follows:

>> x = [1 2 4 10];
>> y = [2.87 4.51 6.11 9.43];
>> [A, B] = Exp1Fit(x, y);

Clearly, y(x) replaces the tabulated functional relationship given by y = f (x). The original data
along with the approximating polynomials are shown graphically in Figure 3.4. To plot Figure 3.4
one can use the MATLAB window command:

>> xfit = 0:0.1:11; yfit = 2.9969*xfit.^0.5076;
>> plot(x, y, 'o', xfit, yfit, '-');

Table 3.7 shows the error analysis of the nonlinear least squares approximation.
Hence the error associated with the nonlinear least squares approximation is
E(a, b) = \sum_{i=1}^{4} (y_i - a x_i^b)^2 = 0.1267.



Figure 3.4: Nonlinear least squares fit.

Table 3.7: Error analysis of the nonlinear fit


i xi yi y(xi ) abs(yi − y(xi ))
1 1.0000 2.870 2.9969 0.1269
2 2.000 4.510 4.2605 0.2495
3 4.000 6.110 6.0569 0.0531
4 10.000 9.430 9.6435 0.2135

Similarly, for the other nonlinear curve y(x) = a e^{bx}, the least squares error is defined as

E(a, b) = \sum_{j=1}^{n} (y_j - a e^{b x_j})^2,    (3.21)

and the associated normal equations are

\frac{\partial E}{\partial a} = -2 \sum_{j=1}^{n} (y_j - a e^{b x_j})\, e^{b x_j} = 0

\frac{\partial E}{\partial b} = -2 \sum_{j=1}^{n} (y_j - a e^{b x_j})\, a x_j e^{b x_j} = 0        (3.22)

Then the set of normal equations (3.22) represents the nonlinear simultaneous system.

Example 3.5 Find the best-fit of the form y = a e^{bx} by using the following data:

x 0 0.25 0.4 0.5


y 9.532 7.983 4.826 5.503

by Newton's method, starting with the initial approximation (a_0, b_0) = (8, 0) and taking desired
accuracy within ε = 10^{-5}.

Solution. The normal equations are

\sum_{j=1}^{4} y_j e^{b x_j} - a \sum_{j=1}^{4} e^{2 b x_j} = 0

\sum_{j=1}^{4} y_j x_j e^{b x_j} - a \sum_{j=1}^{4} x_j e^{2 b x_j} = 0        (3.23)

By using the given data points, the nonlinear system (3.23) gives

9.53 - a(1 + e^{0.5b} + e^{0.8b} + e^{b}) + 7.98 e^{0.25b} + 4.83 e^{0.4b} + 5.503 e^{0.5b} = 0
-a(0.25 e^{0.5b} + 0.4 e^{0.8b} + 0.5 e^{b}) + 1.996 e^{0.25b} + 1.93 e^{0.4b} + 2.752 e^{0.5b} = 0

Let us consider the two functions

f_1(a, b) = 9.53 - a(1 + e^{0.5b} + e^{0.8b} + e^{b}) + 7.98 e^{0.25b} + 4.83 e^{0.4b} + 5.503 e^{0.5b}
f_2(a, b) = -a(0.25 e^{0.5b} + 0.4 e^{0.8b} + 0.5 e^{b}) + 1.996 e^{0.25b} + 1.93 e^{0.4b} + 2.752 e^{0.5b}

and their derivatives with respect to the unknown variables a and b:

\partial f_1/\partial a = -(1 + e^{0.5b} + e^{0.8b} + e^{b})
\partial f_1/\partial b = -a(0.5 e^{0.5b} + 0.8 e^{0.8b} + e^{b}) + 1.996 e^{0.25b} + 1.93 e^{0.4b} + 2.752 e^{0.5b}
\partial f_2/\partial a = -(0.25 e^{0.5b} + 0.4 e^{0.8b} + 0.5 e^{b})
\partial f_2/\partial b = -a(0.125 e^{0.5b} + 0.32 e^{0.8b} + 0.5 e^{b}) + 0.499 e^{0.25b} + 0.772 e^{0.4b} + 1.376 e^{0.5b}
Since Newton's formula for a system of two nonlinear equations is

\begin{pmatrix} a_{k+1} \\ b_{k+1} \end{pmatrix} = \begin{pmatrix} a_k \\ b_k \end{pmatrix} - J^{-1}(a_k, b_k) \begin{pmatrix} f_1(a_k, b_k) \\ f_2(a_k, b_k) \end{pmatrix},

where

J = \begin{pmatrix} \partial f_1/\partial a & \partial f_1/\partial b \\ \partial f_2/\partial a & \partial f_2/\partial b \end{pmatrix}.
We start with the initial approximation (a_0, b_0) = (8, 0); the values of the functions at this
initial approximation are

f_1(8, 0) = -4.156
f_2(8, 0) = -2.522

The Jacobian matrix J and its inverse J^{-1} at the given initial approximation can be computed as follows:

J(8, 0) = \begin{pmatrix} -4 & -11.722 \\ -1.15 & -4.913 \end{pmatrix},

and

J^{-1}(8, 0) = \begin{pmatrix} -0.7961 & 1.8993 \\ 0.1863 & -0.6481 \end{pmatrix}.
Substituting all these values in the above Newton's formula, we get the first approximation as follows:

\begin{pmatrix} a_1 \\ b_1 \end{pmatrix} = \begin{pmatrix} 8.0 \\ 0.0 \end{pmatrix} - \begin{pmatrix} -0.7961 & 1.8993 \\ 0.1863 & -0.6481 \end{pmatrix} \begin{pmatrix} -4.156 \\ -2.522 \end{pmatrix} = \begin{pmatrix} 9.48168 \\ -0.86015 \end{pmatrix}.

Similarly, the second iteration, using (a_1, b_1) = (9.48168, -0.86015), gives

\begin{pmatrix} a_2 \\ b_2 \end{pmatrix} = \begin{pmatrix} 9.48168 \\ -0.86015 \end{pmatrix} - \begin{pmatrix} -0.87813 & 2.19437 \\ 0.20559 & -0.92078 \end{pmatrix} \begin{pmatrix} -1.4546 \\ -0.6856 \end{pmatrix} = \begin{pmatrix} 9.70881 \\ -1.19239 \end{pmatrix}.

The first two and further steps of the method are listed in Table 3.8, with desired accuracy
within ε = 10^{-5}.

Table 3.8: Solution of Example 3.5


n a-approximation b-approximation
an bn
0.0000 8.00000 0.00000
1.0000 9.48168 -0.86015
2.0000 9.70881 -1.19239
3.0000 9.72991 -1.26193
4.0000 9.73060 -1.26492
5.0000 9.73060 -1.26492

Hence
y(x) = 9.73060 e^{-1.26492x}
is the best nonlinear fit. •

Once again, to linearize this exponential form, we take logarithms of both sides of (3.12) to get

ln y = ln a + bx,

which may be written as

Y = A + BX,    (3.24)
with A = ln a, B = b, X = x, and Y = ln y. The values of A and B can be chosen to minimize
E(A, B) = \sum_{j=1}^{n} (Y_j - (A + B X_j))^2,    (3.25)

where X_j = x_j and Y_j = ln y_j. By solving the linear normal equations of the form

n A + B \sum_{j=1}^{n} X_j = \sum_{j=1}^{n} Y_j

A \sum_{j=1}^{n} X_j + B \sum_{j=1}^{n} X_j^2 = \sum_{j=1}^{n} X_j Y_j,        (3.26)

we obtain the values of A and B. Now the data set may be transformed to (x_j, ln y_j) and determining a
and b becomes a linear least squares problem. The values of the unknowns a and b are deduced from the relations

a = e^A and b = B.    (3.27)

Thus the nonlinear guess function with these parameters,

y(x) = a e^{bx},

will be called the nonlinear least squares approximation for the data.
Example 3.6 Find the best-fit of the form y = a e^{bx} by using the following data:
x 0 0.25 0.4 0.5
y 9.532 7.983 4.826 5.503
Solution. The sums required for the normal equation (3.26) are easily obtained using the values
in Table 3.9.
The linear system involving unknown coefficients A and B is

Table 3.9: Find the Coefficients of (3.26)


i    Xi        Yi        Xi^2      Xi*Yi
1 0.0000 2.2546 0.0000 0.0000
2 0.2500 2.0773 0.0625 0.5193
3 0.4000 1.5740 0.1600 0.6296
4 0.5000 1.7053 0.2500 0.8527
n=4 1.1500 7.6112 0.4725 2.0016

4A + 1.1500B = 7.6112
1.1500A + 0.4725B = 2.0016
Then solving the above linear system gives
A = 2.2811 and B = -1.3157.
Using these values in (3.27), we have the values of the unknown parameters a and b as
a = e^A = 9.7874 and b = B = -1.3157.
Hence the best nonlinear fit is
y(x) = 9.7874 e^{-1.3157x}.


Figure 3.5: Nonlinear least squares fit.

We use the author-defined function Exp2Fit and the following MATLAB commands to reproduce
the above results as follows:

>> x = [0 0.25 0.4 0.5]; y = [9.532 7.983 4.826 5.503];
>> [A, B] = Exp2Fit(x, y);

Clearly, y(x) replaces the tabulated functional relationship given by y = f(x). The original data
along with the approximating curve are shown graphically in Figure 3.5. To plot Figure 3.5
one can use the MATLAB window commands:

>> xfit = -0.1:0.1:0.6; yfit = 9.7874*exp(-1.3157.*xfit);
>> plot(x, y, 'o', xfit, yfit, '-');

Note that the values of a and b calculated for the linearized problem will not necessarily be the same
as the values obtained for the original least squares problem. In this example, the nonlinear system
(3.23) becomes:

9.532 - a + 7.983 e^{0.25b} - a e^{0.5b} + 4.826 e^{0.4b} - a e^{0.8b} + 5.503 e^{0.5b} - a e^{b} = 0

1.996 e^{0.25b} - 0.25 a e^{0.5b} + 1.930 e^{0.4b} - 0.4 a e^{0.8b} + 2.752 e^{0.5b} - 0.5 a e^{b} = 0

Now Newton's method for nonlinear systems can be applied to this system, and we get the
values of a and b as
a = 9.731 and b = -1.265.
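As an alternative to deriving and solving the normal equations by Newton's method, the total squared error (3.21) can also be minimized directly with a general-purpose minimizer. A minimal sketch using MATLAB's built-in fminsearch (our own illustration, not part of the text's author-defined functions):

% Direct minimization of E(a,b) in (3.21) for the data of this example.
x = [0 0.25 0.4 0.5];  y = [9.532 7.983 4.826 5.503];
E = @(p) sum((y - p(1)*exp(p(2)*x)).^2);    % p = [a, b]
p = fminsearch(E, [8 0]);                   % start from (a0, b0) = (8, 0)
a = p(1), b = p(2)                          % expected to be close to 9.731 and -1.265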

Table 3.10 shows the error analysis of the nonlinear least squares approximation.
Hence the error associated with the nonlinear least squares approximation is

E(a, b) = \sum_{i=1}^{4} (y_i - a e^{b x_i})^2 = 2.0496.

Table 3.10: Error analysis of the nonlinear fit


i xi yi y(xi ) abs(yi − y(xi ))
1 0.000 9.532 9.7872 0.2552
2 0.250 7.983 7.0439 0.9391
3 0.400 4.826 5.7823 0.9563
4 0.500 5.503 5.0695 0.4335

3.2.3.2 Hyperbolic Form


Suppose we want to fit the given data by a function of the form

y(x) = a + \frac{b}{x},    (3.28)

in the least squares sense. Here, before finding the normal equations, we linearize y(x) by setting

Y(x) = y(x), \quad X = \frac{1}{x}, \quad A = a, \quad B = b,

so that

Y(x) = A + BX.    (3.29)

Since this form is now linear in X, we can transform the given data points (x_i, y_i) to
(1/x_i, y_i) and use linear least squares to find a and b.
Example 3.7 Find the best-fit of the hyperbolic form y = a + b/x by using the following data:

x 1 2 3 4
y 3 5 8 11

Solution. First we convert the given data points (x_i, y_i) to (1/x_i, y_i), getting
X 1 0.5 0.33 0.25
y 3 5 8 11
The sums required for the normal equation (3.18) are easily obtained using the values in Table 3.11.
The linear system (3.18) involving A and B is
\begin{pmatrix} 4 & 2.0800 \\ 2.0800 & 1.4214 \end{pmatrix} \begin{pmatrix} A \\ B \end{pmatrix} = \begin{pmatrix} 27.0000 \\ 10.8900 \end{pmatrix}.
Then solving the above linear system gives
A = 11.5705 and B = -9.2702.
Using these values of A and B, we have the values of the parameters a and b as
a = A = 11.5705 and b = B = -9.2702.
Hence

y(x) = 11.5705 - \frac{9.2702}{x},

is the best hyperbolic fit. •

Table 3.11: Find the Coefficients of (3.19)


i    Xi        Yi         Xi^2      Xi*Yi
1 1.0000 3.000 1.0000 3.0000
2 0.5000 5.0000 0.2500 2.5000
3 0.3333 8.0000 0.1089 2.6400
4 0.2500 11.0000 0.0625 2.7500
n=4 S1=2.0800 S2=27.0000 S3=1.4214 S4=10.8900

Figure 3.6: Hyperbolic least squares fit.

Example 3.8 Find the best-fit of the form y = a x e^{-bx} by using the change of variables to linearize
the following data points:
x 1.5 2.5 4.0 5.5
y 3.0 4.3 6.5 7.0
Solution. Write the given form y = a x e^{-bx} as

\frac{y}{x} = a e^{-bx},

and take logarithms of both sides of the above equation to get

\ln\left(\frac{y}{x}\right) = \ln a + (-b x).

It may be written as

Y = A + BX,

with A = \ln a, B = -b, X = x, and Y = \ln(y/x).
Then the sums required for the normal equation (3.18) are easily obtained using the values in Ta-
ble 3.12.

The linear system (3.18) involving A and B is

\begin{pmatrix} 4 & 13.50 \\ 13.50 & 54.750 \end{pmatrix} \begin{pmatrix} A \\ B \end{pmatrix} = \begin{pmatrix} 1.9622 \\ 5.6642 \end{pmatrix}.

Table 3.12: Find the Coefficients of (3.19)


i    xi      yi     Xi = xi    Yi = ln(yi/xi)    Xi^2     Xi*Yi
1 1.50 3.0 1.50 0.6932 2.25 1.0398
2 2.50 4.3 2.50 0.5423 6.25 1.3558
3 4.00 6.5 4.00 0.4855 16.00 1.9420
4 5.50 7.0 5.50 0.2412 30.25 1.3266
n=4 S1=13.50 S2=1.9622 S3=54.750 S4=5.6642

Table 3.13: Conversion of Nonlinear forms into Linear forms


No. Nonlinear Form Linear Form Change of Variable(s) and Constants
y = f (x) Y = A + BX
1 y = ax + b/x2 Y = A + BX Y = y/x, X = 1/x3 , A = a, B = b
2 y = 1/(a + bx)2 Y = A + BX Y = (1/y)1/2 , X = x, A = a, B = b
3 y = 1/(a + bx2 ) Y = A + BX Y = 1/y, X = x2 , A = a, B = b
4 y = 1/(2 + aebx ) Y = A + BX Y = ln(1/y − 2), X = x, A = a, B = b
5 y = axe−bx Y = A + BX Y = ln(y/x), X = x, A = ln(a), B = −b
6 y = a + b/ ln x Y = A + BX Y = y, X = 1/ ln x, A = a, B = b
7 y = (a + bx)−3 Y = A + BX Y = 1/y 1/3 , X = x, A = a, B = b
8 y = N/(1 + aebx ) Y = A + BX Y = ln(N/y − 1), X = x, A = ln a, B = b
9 y = x2 /(ax2 + b) Y = A + BX Y = 1/y, X = 1/x2 , A = a, B = b
10 y = a ln x + b/ ln x Y = A + BX Y = y/ ln x, X = 1/(ln x)2 , A = a, B = b

Then solving the above linear system gives

A = 0.8426 and B = -0.1043.

Using these values of A and B, we have the values of the parameters a and b as

a = e^A = 2.3224 and b = -B = 0.1043.

Hence

y(x) = 2.3224\, x\, e^{-0.1043x},

is the best nonlinear fit. •

Table 3.13 above shows the conversion of several common nonlinear forms into linear forms by using
a change of variables and constants.

3.2.4 Least Squares Plane


Many problems arise in engineering and science where the dependent variable is a function of two
or more variables. For example, z = f(x, y) is a function of two variables. Consider the least squares plane

z = ax + by + c,    (3.30)

Table 3.14: Find the Coefficients of (3.32)


i    xi      yi      zi      xi^2     xi*yi    yi^2     xi*zi    yi*zi
1 1.000 1.000 7.000 1.000 1.000 1.000 7.000 7.000
2 1.000 2.000 9.000 1.000 2.000 4.000 9.000 18.000
3 2.000 1.000 10.000 4.000 2.000 1.000 20.000 10.000
4 2.000 2.000 11.000 4.000 4.000 4.000 22.000 22.000
5 2.000 3.000 12.000 4.000 6.000 9.000 24.000 36.000
n=5 8.000 9.000 49.000 14.000 15.000 19.000 82.000 93.000

for the n points (x_1, y_1, z_1), . . . , (x_n, y_n, z_n) is obtained by minimizing

E(a, b, c) = \sum_{j=1}^{n} (z_j - a x_j - b y_j - c)^2.    (3.31)

The function E(a, b, c) is minimized when

\frac{\partial E}{\partial a} = 2 \sum_{j=1}^{n} (z_j - a x_j - b y_j - c)(-x_j) = 0

\frac{\partial E}{\partial b} = 2 \sum_{j=1}^{n} (z_j - a x_j - b y_j - c)(-y_j) = 0

\frac{\partial E}{\partial c} = 2 \sum_{j=1}^{n} (z_j - a x_j - b y_j - c)(-1) = 0

Dividing by 2 and rearranging gives the normal equations:

\Big(\sum x_j^2\Big) a + \Big(\sum x_j y_j\Big) b + \Big(\sum x_j\Big) c = \sum z_j x_j

\Big(\sum x_j y_j\Big) a + \Big(\sum y_j^2\Big) b + \Big(\sum y_j\Big) c = \sum z_j y_j        (3.32)

\Big(\sum x_j\Big) a + \Big(\sum y_j\Big) b + n c = \sum z_j

The above linear system can be solved for unknowns a, b, and c.
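A minimal MATLAB sketch of this computation (our own illustration, not necessarily the author-defined PlaneFit function used later) forms the normal equations (3.32) as M'*M and M'*z, where each row of M is [x_j, y_j, 1]:

function p = planefit_sketch(x, y, z)
% Least squares plane z = a*x + b*y + c via the normal equations (3.32).
x = x(:);  y = y(:);  z = z(:);
M = [x, y, ones(size(x))];   % row j is [x_j, y_j, 1]
p = (M'*M) \ (M'*z);         % p = [a; b; c]
end

For the data of Example 3.9 below this returns approximately a = 2.4, b = 1.2 and c = 3.8.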

Example 3.9 Find the least squares plane z = ax + by + c by using the following data:
xj 1 1 2 2 2
yj 1 2 1 2 3
zj 7 9 10 11 12
Solution. The sums required for the normal equations (3.32) are easily obtained using the values
in Table 3.14.
The linear system (3.32) involving unknown coefficients a, b and c is
14a + 15b + 8c = 82
15a + 19b + 9c = 93
8a + 9b + 5c = 49

Table 3.15: Error analysis of the plane fit


i xi yi zi z(xi , yi ) abs(zi − z)
1 1.0000 1.0000 7.0000 7.4000 0.4000
2 1.0000 2.0000 9.0000 8.6000 0.4000
3 2.0000 1.0000 10.0000 9.8000 0.2000
4 2.0000 2.0000 11.0000 11.0000 0.0000
5 2.0000 3.0000 12.0000 12.2000 0.2000

Then solving the above linear system gives

a = 2.400, b = 1.200, c = 3.800.

Hence the least squares plane fit is

z = 2.400x + 1.200y + 3.800.

We use the author-defined function PlaneFit and the following MATLAB commands to reproduce
the above results as follows:

>> x = [1 1 2 2 2]; y = [1 2 1 2 3];
>> z = [7 9 10 11 12]; sol = PlaneFit(x, y, z);

Table 3.15 shows the error analysis of the least squares plane approximation.
Hence
E(a, b, c) = \sum_{i=1}^{5} (z_i - a x_i - b y_i - c)^2 = 0.4000,

is the error associated with the least squares plane approximation. •

3.2.5 Trigonometric Least Squares Polynomial


This is another popular form of polynomial frequently used as an approximating function.
We know that a series of the form

p(x) = \frac{a_0}{2} + \sum_{k=1}^{m} [a_k \cos(kx) + b_k \sin(kx)],    (3.33)

is called a trigonometric polynomial of order m.


Here we shall approximate the given data points with the function (3.33) using the least squares
method. The least squares error for (3.33) is given by

E(a_0, a_1, \ldots, a_m, b_1, b_2, \ldots, b_m) = \sum_{j=1}^{n} [y_j - p(x_j)]^2,    (3.34)

with associated normal equations

\frac{\partial E}{\partial a_k} = 0, \quad k = 0, 1, \ldots, m
                                                                  (3.35)
\frac{\partial E}{\partial b_k} = 0, \quad k = 1, 2, \ldots, m,

which gives

\sum_{j=1}^{n} \Big[ \frac{a_0}{2} + \sum_{i=1}^{m} a_i \cos(i x_j) + \sum_{i=1}^{m} b_i \sin(i x_j) \Big] = \sum_{j=1}^{n} y_j

\sum_{j=1}^{n} \Big[ \frac{a_0}{2} + \sum_{i=1}^{m} a_i \cos(i x_j) + \sum_{i=1}^{m} b_i \sin(i x_j) \Big] \cos(k x_j) = \sum_{j=1}^{n} \cos(k x_j)\, y_j        (3.36)

\sum_{j=1}^{n} \Big[ \frac{a_0}{2} + \sum_{i=1}^{m} a_i \cos(i x_j) + \sum_{i=1}^{m} b_i \sin(i x_j) \Big] \sin(k x_j) = \sum_{j=1}^{n} \sin(k x_j)\, y_j

for k = 1, 2, . . . , m.

Then the set of normal equations (3.36) represents a system of (2m+1) equations in (2m+1)
unknowns and can be solved using any numerical method discussed in Chapter 1. Note that
the derivation of the coefficients a_k and b_k is usually called discrete Fourier analysis.
For m = 1, we can write the normal equations in the following form:

\sum_{j=1}^{n} \frac{a_0}{2} + a_1 \sum_{j=1}^{n} \cos(x_j) + b_1 \sum_{j=1}^{n} \sin(x_j) = \sum_{j=1}^{n} y_j

a_0 \sum_{j=1}^{n} \cos(x_j) + a_1 \sum_{j=1}^{n} \cos^2(x_j) + b_1 \sum_{j=1}^{n} \cos(x_j)\sin(x_j) = \sum_{j=1}^{n} \cos(x_j)\, y_j        (3.37)

a_0 \sum_{j=1}^{n} \sin(x_j) + a_1 \sum_{j=1}^{n} \cos(x_j)\sin(x_j) + b_1 \sum_{j=1}^{n} \sin^2(x_j) = \sum_{j=1}^{n} \sin(x_j)\, y_j

where n is the number of data points. By writing the above equations in matrix form, we have

    
\begin{pmatrix} n/2 & S_1 & S_2 \\ S_1 & S_3 & S_4 \\ S_2 & S_4 & S_5 \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ b_1 \end{pmatrix} = \begin{pmatrix} S_6 \\ S_7 \\ S_8 \end{pmatrix},    (3.38)

where

S_1 = \sum_{j=1}^{n} \cos(x_j), \quad S_2 = \sum_{j=1}^{n} \sin(x_j),
S_3 = \sum_{j=1}^{n} \cos^2(x_j), \quad S_4 = \sum_{j=1}^{n} \cos(x_j)\sin(x_j),
S_5 = \sum_{j=1}^{n} \sin^2(x_j), \quad S_6 = \sum_{j=1}^{n} y_j,
S_7 = \sum_{j=1}^{n} \cos(x_j)\, y_j, \quad S_8 = \sum_{j=1}^{n} \sin(x_j)\, y_j,

which represents a linear system of three equations in three unknowns a_0, a_1 and b_1. Note that the
coefficient matrix of this system is symmetric and positive definite. Hence the normal equations
possess a unique solution.
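A minimal MATLAB sketch of this computation (our own illustration; the text's author-defined TiGFit function appears later) builds the sums S1 through S8 and solves the 3 x 3 system (3.38):

function c = trigfit_sketch(x, y)
% Trigonometric least squares fit: solve the system (3.38) for c = [a0; a1; b1].
x = x(:);  y = y(:);
n  = length(x);
S1 = sum(cos(x));     S2 = sum(sin(x));
S3 = sum(cos(x).^2);  S4 = sum(cos(x).*sin(x));  S5 = sum(sin(x).^2);
S6 = sum(y);          S7 = sum(cos(x).*y);       S8 = sum(sin(x).*y);
c  = [n/2 S1 S2; S1 S3 S4; S2 S4 S5] \ [S6; S7; S8];
end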

Example 3.10 Find the trigonometric least squares polynomial p(x) = a_0 + a_1 cos x + b_1 sin x that
approximates the following data:

xj 0.1 0.2 0.3 0.4 0.5


yj 0.9 0.75 0.64 0.52 0.40.

Solution. To find the trigonometric least squares polynomial p(x) = a_0 + a_1 cos x + b_1 sin x, we
have to solve the following system:

\begin{pmatrix} n/2 & S_1 & S_2 \\ S_1 & S_3 & S_4 \\ S_2 & S_4 & S_5 \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ b_1 \end{pmatrix} = \begin{pmatrix} S_6 \\ S_7 \\ S_8 \end{pmatrix},

where

S_1 = \sum \cos(x_j) = 4.7291, \quad S_2 = \sum \sin(x_j) = 1.4629,
S_3 = \sum \cos^2(x_j) = 4.4817, \quad S_4 = \sum \cos(x_j)\sin(x_j) = 1.3558,
S_5 = \sum \sin^2(x_j) = 0.5183, \quad S_6 = \sum y_j = 3.2100,
S_7 = \sum \cos(x_j)\, y_j = 3.0720, \quad S_8 = \sum \sin(x_j)\, y_j = 0.8223.

So

\begin{pmatrix} 2.5 & 4.7291 & 1.4629 \\ 4.7291 & 4.4817 & 1.3558 \\ 1.4629 & 1.3558 & 0.5183 \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ b_1 \end{pmatrix} = \begin{pmatrix} 3.2100 \\ 3.0720 \\ 0.8223 \end{pmatrix},

and by solving this system we get the values of the unknowns as follows:

a_0 = -0.0002, \quad a_1 = 0.9851, \quad b_1 = -0.9900.



Table 3.16: Find the Coefficients of (3.37)


j    xj    yj     C        S        C^2      S^2      CS       C*yj     S*yj
1    0.1   0.90   0.9950   0.0998   0.9900   0.0100   0.0993   0.8955   0.0899
2    0.2   0.75   0.9801   0.1987   0.9605   0.0395   0.1947   0.7350   0.1490
3    0.3   0.64   0.9553   0.2955   0.9126   0.0873   0.2823   0.6114   0.1891
4    0.4   0.52   0.9211   0.3894   0.8484   0.1516   0.3587   0.4790   0.2025
5    0.5   0.40   0.8776   0.4794   0.7702   0.2298   0.4207   0.3510   0.1918
n=5 1.5 3.21 4.7291 1.4629 4.4817 0.5183 1.3558 3.0720 0.8223

Figure 3.7: Trigonometric least squares fit.

Thus we get the best trigonometric least squares polynomial

p(x) = -0.0002 + 0.9851 cos x - 0.9900 sin x,

which approximates the given data. •

We use the author-defined function TiGFit and the following MATLAB commands to reproduce
the above results as follows:

>> x = [0.1 0.2 0.3 0.4 0.5]; y = [0.9 0.75 0.64 0.52 0.40];
>> TiGFit(x, y)

Note that C = cos(xj ) and S = sin(xj ). •


The original data along with the approximating polynomial are shown graphically in Figure 3.7.
To plot Figure 3.7 one can use the MATLAB window command:

>> x = [0.1 0.2 0.3 0.4 0.5]; y = [0.9 0.75 0.64 0.52 0.40];
>> xfit = -0.1:0.1:0.6; yfit = -0.0002 + 0.9851.*cos(xfit) - 0.9900.*sin(xfit);
>> plot(x, y, 'o', xfit, yfit, '-');

3.2.6 Least Squares Solution of an Overdetermined System


In Chapters 1 and 2 we discussed methods for computing the solution x of a linear system
Ax = b when the coefficient matrix A is square (the numbers of rows and columns are equal). For
a square matrix A, the linear system usually has a unique solution. Now we consider linear systems
where the coefficient matrix is rectangular (the numbers of rows and columns are not equal). If A has m
rows and n columns, then x is a vector with n components and b is a vector with m components.
If the number of rows is greater than the number of columns (m > n), then the linear system is called
an overdetermined system. Typically, an overdetermined system has no solution. This type of system
generally arises when dealing with experimental data. It is also common in optimization-related
problems.
Consider the following overdetermined linear system of two equations in one variable:

2x_1 = 3
4x_1 = 1        (3.39)

Now using Gauss elimination to solve this system, we obtain

0 = -5,

which is impossible; hence the given system (3.39) is inconsistent. Writing the given system in
vector form, we get

\begin{pmatrix} 2 \\ 4 \end{pmatrix} x_1 = \begin{pmatrix} 3 \\ 1 \end{pmatrix}.    (3.40)

The left-hand side of (3.40) is [0, 0]T when x1 = 0, and is [2, 4]T when x1 = 1. Note that as x1 takes
on all possible values, the left-hand side of (3.40) generates the line connecting the origin and the
point (2, 4) (see Figure 3.8). On the other hand, the right-hand side of (3.40) is the vector [3, 1]T .
Since the point (3, 1) does not lie on the line, therefore, left-hand side and the right-hand side of
(3.40) are never equal. The given system (3.40) is only consistent when the point corresponding
to the right-hand side is contained in the line corresponding to the left-hand side. Thus the least
squares solution to (3.40) is the value of x1 for which the point on the line is closest to the point
(3, 1). In Figure 3.8, we see that the point (1, 2) on the line is closest to (3, 1) which we got when
x1 = 1/2. So the least squares solution to (3.39) is x1 = 1/2. Now consider the following linear
system of three equations in two variables:

a_{11} x_1 + a_{12} x_2 = b_1
a_{21} x_1 + a_{22} x_2 = b_2        (3.41)
a_{31} x_1 + a_{32} x_2 = b_3

Again, it is impossible to find a solution that satisfies all of the equations unless one of the three
equations depends on the other two; that is, if only two of the three equations are really distinct,
then a solution is possible. Otherwise, our best hope is to find a solution that minimizes the error,
that is, the least squares solution. Now we discuss the method for finding the least squares solution
to the overdetermined system.

Figure 3.8: Least squares solution to an overdetermined system.

In the least squares method, x̂ is chosen so that the Euclidean norm of the residual r = b - Ax̂ is as small
as possible. The residual corresponding to system (3.41) is

r = \begin{pmatrix} b_1 - a_{11} x_1 - a_{12} x_2 \\ b_2 - a_{21} x_1 - a_{22} x_2 \\ b_3 - a_{31} x_1 - a_{32} x_2 \end{pmatrix}.

The l_2-norm of the residual is the square root of the sum of each component squared:

\|r\|_2 = \sqrt{r_1^2 + r_2^2 + r_3^2}.

Since minimizing \|r\|_2 is equivalent to minimizing (\|r\|_2)^2, the least squares solution to (3.41) is
the pair of values x_1 and x_2 which minimizes the expression

(b_1 - a_{11} x_1 - a_{12} x_2)^2 + (b_2 - a_{21} x_1 - a_{22} x_2)^2 + (b_3 - a_{31} x_1 - a_{32} x_2)^2.    (3.42)

The minimizing x1 and x2 are found by differentiating (3.42) with respect to x1 and x2 and setting
the derivatives to zero. Then solving for x1 and x2 , we will obtain the least squares solution
x̂ = [x1 , x2 ]T to the system (3.41).
For a general overdetermined linear system Ax = b, the residual is r = b − Ax̂ and the l2 -norm of
the residual is the square root of rT r. The least squares solution to linear system minimizes

rT r = (krk2 )2 = (b − Ax̂)T (b − Ax̂). (3.43)

The above equation (3.43) attains minimum when the partial derivative with respect to each of the
variables x1 , x2 , . . . , xn is zero. Since

r^T r = r_1^2 + r_2^2 + \cdots + r_m^2,    (3.44)

and the ith component of the residual r is

ri = bi − ai1 x1 − ai2 x2 − · · · − ain xn .



Thus, the partial derivative of rT r with respect to xj is given by

\frac{\partial}{\partial x_j}(r^T r) = -2 r_1 a_{1j} - 2 r_2 a_{2j} - \cdots - 2 r_m a_{mj}.    (3.45)

From the right side of (3.45), we see that the partial derivative of rT r with respect to xj is −2
times the product between the jth column of A and r. Note that the jth column of A is the jth
row of AT . Since the jth component of AT r is equal to the jth column of A times r, the partial
derivative of rT r with respect to xj is the jth component of the vector −2AT r. The l2 -norm of the
residual is minimized at that point x where all the partial derivatives vanish, that is

\frac{\partial}{\partial x_1}(r^T r) = \frac{\partial}{\partial x_2}(r^T r) = \cdots = \frac{\partial}{\partial x_n}(r^T r) = 0.    (3.46)

Since each of these partial derivatives is −2 times the corresponding component of AT r, we conclude
that
AT r = 0. (3.47)
Replacing r by b - Ax̂ gives
AT (b − Ax̂) = 0, (3.48)
or
AT Ax̂ = AT b, (3.49)
which is called the normal equation.
Any x̂ that minimizes the l2 -norm of the residual r = b − Ax̂ is a solution to the normal equation
(3.49). Conversely, any solution to the normal equation (3.49) is a least squares solution to the
overdetermined linear system.
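In MATLAB, for a given matrix A and right-hand side b, the normal equation (3.49) can be formed and solved directly; the backslash operator applied to a rectangular full-rank system also returns a least squares solution. A minimal sketch (our own illustration):

% Least squares solution of an overdetermined system A*x = b (m > n).
xhat  = (A'*A) \ (A'*b);   % solve the normal equation (3.49)
xhat2 = A \ b;             % backslash on a rectangular A gives the same least squares
                           % solution, computed via an orthogonal factorization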

Example 3.11 Solve the following overdetermined linear system of four equations in three un-
knowns.
2x1 + 5x2 + x3 = 1
3x1 − 4x2 + 2x3 = 3
4x1 + 3x2 + 3x3 = 5
5x1 − 2x2 + 4x3 = 7
Solution. The matrix form of the given system is

\begin{pmatrix} 2 & 5 & 1 \\ 3 & -4 & 2 \\ 4 & 3 & 3 \\ 5 & -2 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 1 \\ 3 \\ 5 \\ 7 \end{pmatrix}.

Then using the normal equation (3.49), we obtain

\begin{pmatrix} 2 & 3 & 4 & 5 \\ 5 & -4 & 3 & -2 \\ 1 & 2 & 3 & 4 \end{pmatrix} \begin{pmatrix} 2 & 5 & 1 \\ 3 & -4 & 2 \\ 4 & 3 & 3 \\ 5 & -2 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 2 & 3 & 4 & 5 \\ 5 & -4 & 3 & -2 \\ 1 & 2 & 3 & 4 \end{pmatrix} \begin{pmatrix} 1 \\ 3 \\ 5 \\ 7 \end{pmatrix},

which reduces the given system to

\begin{pmatrix} 54 & 0 & 40 \\ 0 & 54 & -2 \\ 40 & -2 & 30 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 66 \\ -6 \\ 50 \end{pmatrix}.

Solving this linear system, the values of the unknowns are

x_1 = -1.00, \quad x_2 = 0.00, \quad x_3 = 3.00,

and this is called the least squares solution of the given overdetermined system. •
We use the author-defined function OverDet and the following MATLAB commands to reproduce
the above results as follows:

>> A = [2 5 1; 3 -4 2; 4 3 3; 5 -2 4]; b = [1; 3; 5; 7];
>> x = OverDet(A, b);

3.2.7 Least Squares Solution of an Underdetermined System


We consider again linear systems where the coefficient matrix is rectangular (the numbers of rows
and columns are not equal). If A has m rows and n columns, then x is a vector with n components
and b is a vector with m components. If the number of rows is smaller than the number of columns
(m < n), then the linear system is called an underdetermined system. Typically, an underdetermined
system has infinitely many solutions. This type of system generally arises in optimization theory
and in economic modeling.
Later in this section we will form the matrix AA^T. In general, the coefficient in row i and column j
of AA^T is the dot product of row i and row j of A. Notice that AA^T is symmetric, so when forming
it we need only evaluate the coefficients on or above the diagonal; the coefficients below the diagonal
then follow from symmetry.
Consider the equation
4x1 + 3x2 = 15 (3.50)
We want to find the least squares solution to (3.50). The set of all points (x_1, x_2) that satisfy (3.50)
forms a line with slope -4/3, and the distance from the origin to the point (x_1, x_2) is (x_1^2 + x_2^2)^{1/2}.
To find the least squares solution to (3.50), we choose the point (x_1, x_2) on this line that is as close to the
origin as possible. The point (z_1, z_2) in Figure 3.9, which is closest to the origin, is the least squares
solution to (3.50). We see in Figure 3.9 that the vector from the origin to (z_1, z_2) is orthogonal to
the line 4x_1 + 3x_2 = 15.
The collection of points forming this perpendicular has the form

t \begin{pmatrix} 4 \\ 3 \end{pmatrix},

where t is an arbitrary scalar. Since (z_1, z_2) lies on this perpendicular, there exists a value of t such that

\begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = t \begin{pmatrix} 4 \\ 3 \end{pmatrix},

Figure 3.9: Least squares solution of underdetermined system.

and this implies that


z1 = 4t and z2 = 3t.
Since x1 = z1 and x2 = z2 must satisfy (3.50), therefore, we have

4z1 + 3z2 = 15. (3.51)

Substituting z1 = 4t and z2 = 3t into (3.51), gives

25t = 15, or t = 3/5 = 0.6.

Thus, the least squares solution to (3.50) is


\begin{pmatrix} z_1 \\ z_2 \end{pmatrix} = 0.6 \begin{pmatrix} 4 \\ 3 \end{pmatrix} = \begin{pmatrix} 2.4 \\ 1.8 \end{pmatrix}.

Now, let us consider a general underdetermined linear system Ax = b and suppose that p is any
solution to the linear system and q is any vector for which

Aq = 0.

Since
A(p + q) = Ap + Aq = Ap = b,
we see that p + q is a solution to Ax = b, whenever Aq = 0. Conversely, it is also true because if
Ax = b, then
x = p + (x − p) = p + q,
where q = x − p.
Since

A(x - p) = b - b = 0,

every solution x can be expressed as x = p + q with Aq = 0.
The set of all q such that Aq = 0 is called the null space (or kernel) of A; we write

N = \{q : Aq = 0\}.

Figure 3.10: Least squares solution of underdetermined system.

The solutions x of an underdetermined linear system Ax = b are sketched in Figure 3.10 in the form
x = p + q with q ∈ N.

From Figure 3.10, the solution closest to the origin is perpendicular to N.

In linear algebra, the vectors perpendicular to the null space of A are linear combinations of
the rows of A, so if

A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}, \qquad m < n,

then

z = \begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{pmatrix} = t_1 \begin{pmatrix} a_{11} \\ a_{12} \\ \vdots \\ a_{1n} \end{pmatrix} + t_2 \begin{pmatrix} a_{21} \\ a_{22} \\ \vdots \\ a_{2n} \end{pmatrix} + \cdots + t_m \begin{pmatrix} a_{m1} \\ a_{m2} \\ \vdots \\ a_{mn} \end{pmatrix},

or

z = \begin{pmatrix} a_{11} t_1 + a_{21} t_2 + \cdots + a_{m1} t_m \\ a_{12} t_1 + a_{22} t_2 + \cdots + a_{m2} t_m \\ \vdots \\ a_{1n} t_1 + a_{2n} t_2 + \cdots + a_{mn} t_m \end{pmatrix} = \begin{pmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} t_1 \\ t_2 \\ \vdots \\ t_m \end{pmatrix}.

So

z = A^T t, \qquad \text{where} \quad t = \begin{pmatrix} t_1 \\ t_2 \\ \vdots \\ t_m \end{pmatrix}.

Substituting x = z = AT t into a linear system Ax = b, we have

AAT t = b. (3.52)

Solving this equation yields t, that is

t = (AAT )−1 b,

while the least squares solution z to the underdetermined system is

z = AT t = AT (AAT )−1 b. (3.53)

Now, solving the underdetermined equation (3.50),

\begin{pmatrix} 4 & 3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = 15,

we first use (3.52) as follows:

\begin{pmatrix} 4 & 3 \end{pmatrix} \begin{pmatrix} 4 \\ 3 \end{pmatrix} t = 15,

which gives t = 15/25 = 0.6. Now using (3.53), we have

z = A^T t = \begin{pmatrix} 4 \\ 3 \end{pmatrix} (0.6) = \begin{pmatrix} 2.4 \\ 1.8 \end{pmatrix},

which is the least squares solution of the given underdetermined equation. •
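A minimal MATLAB sketch of (3.52) and (3.53) for a general underdetermined system, given A and b (our own illustration, not the author-defined UnderDet function used below):

% Minimum-norm least squares solution of an underdetermined system A*x = b (m < n).
t = (A*A') \ b;           % solve A*A'*t = b, equation (3.52)
z = A' * t;               % minimum-norm solution z = A'*(A*A')^(-1)*b, equation (3.53)
% For a full row rank A, pinv(A)*b returns the same minimum-norm solution.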


Example 3.12 Solve the following underdetermined linear system of two equations in three un-
knowns.
x1 + 2x2 − 3x3 = 42
5x1 − x2 + x3 = 54
Solution. The matrix form of the given system is

\begin{pmatrix} 1 & 2 & -3 \\ 5 & -1 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 42 \\ 54 \end{pmatrix}.

Then using equation (3.52),

A A^T t = b,

we obtain

\begin{pmatrix} 1 & 2 & -3 \\ 5 & -1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 5 \\ 2 & -1 \\ -3 & 1 \end{pmatrix} \begin{pmatrix} t_1 \\ t_2 \end{pmatrix} = \begin{pmatrix} 42 \\ 54 \end{pmatrix},

which reduces the given system to

\begin{pmatrix} 14 & 0 \\ 0 & 27 \end{pmatrix} \begin{pmatrix} t_1 \\ t_2 \end{pmatrix} = \begin{pmatrix} 42 \\ 54 \end{pmatrix}.

Solving the above linear system, the values of the unknowns are

t_1 = 3, \quad t_2 = 2.

The best least squares solution z to the given linear system is z = A^T t, that is,

\begin{pmatrix} z_1 \\ z_2 \\ z_3 \end{pmatrix} = \begin{pmatrix} 1 & 5 \\ 2 & -1 \\ -3 & 1 \end{pmatrix} \begin{pmatrix} 3 \\ 2 \end{pmatrix} = \begin{pmatrix} 13 \\ 4 \\ -7 \end{pmatrix},

which is called the least squares solution of the given underdetermined system. •
We use the author-defined function UnderDet and the following MATLAB commands to repro-
duce the above results as follows:

>> A = [1 2 -3; 5 -1 1]; b = [42; 54];
>> x = UnderDet(A, b);

3.2.8 The Pseudoinverse Least Squares Approximation


If A is an n × n matrix with linearly independent columns, then it is invertible, and the unique
solution to the linear system Ax = b is x = A−1 b. If m > n and A is m × n with linearly
independent columns, then a system Ax = b has no exact solution, but the best approximation is
given by the unique least squares solution x̂ = (AT A)−1 AT b. The matrix (AT A)−1 AT , therefore,
plays the role of an inverse of A in this situation.
Definition 3.1 (Pseudoinverse of a Matrix)

If A is a matrix with linearly independent columns, then the pseudoinverse of a matrix is the matrix
A+ defined by
A+ = (AT A)−1 AT . (3.54)
For example, consider the matrix

A = \begin{pmatrix} 1 & 2 \\ 2 & 3 \\ 3 & 4 \end{pmatrix};

then we have

A^T A = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 2 & 3 \\ 3 & 4 \end{pmatrix} = \begin{pmatrix} 14 & 20 \\ 20 & 29 \end{pmatrix},

and its inverse is

(A^T A)^{-1} = \begin{pmatrix} 29/6 & -10/3 \\ -10/3 & 7/3 \end{pmatrix}.

Thus

A^+ = (A^T A)^{-1} A^T = \begin{pmatrix} 29/6 & -10/3 \\ -10/3 & 7/3 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \end{pmatrix} = \begin{pmatrix} -11/6 & -1/3 & 7/6 \\ 4/3 & 1/3 & -2/3 \end{pmatrix}

is the pseudoinverse of the matrix. •

The pseudoinverse of the matrix can be obtained using MATLAB built-in pinv function as follows:

>> A = [1 2; 2 3; 3 4];
>> pinv(A);

Note that if A is a square matrix with linearly independent columns, then A^+ = A^{-1}, and in such a case
the least squares solution of a linear system Ax = b is the exact solution, since

x̂ = A^+ b = A^{-1} b.

Example 3.13 Find the pseudoinverse of the matrix of the following linear system and then use
it to compute the least squares solution of the system.
x1 + 2x2 = 3
2x1 − 3x2 = 4
Solution. The matrix form of the given system is

\begin{pmatrix} 1 & 2 \\ 2 & -3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 3 \\ 4 \end{pmatrix},

and so

A^T A = \begin{pmatrix} 1 & 2 \\ 2 & -3 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 2 & -3 \end{pmatrix} = \begin{pmatrix} 5 & -4 \\ -4 & 13 \end{pmatrix}.

The inverse of the matrix A^T A can be computed as follows:

(A^T A)^{-1} = \begin{pmatrix} 13/49 & 4/49 \\ 4/49 & 5/49 \end{pmatrix}.

The pseudoinverse of the matrix of the given system is

A^+ = (A^T A)^{-1} A^T = \begin{pmatrix} 13/49 & 4/49 \\ 4/49 & 5/49 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 2 & -3 \end{pmatrix} = \begin{pmatrix} 3/7 & 2/7 \\ 2/7 & -1/7 \end{pmatrix}.

Now we compute the least squares solution of the system as follows:

x̂ = A^+ b = \begin{pmatrix} 3/7 & 2/7 \\ 2/7 & -1/7 \end{pmatrix} \begin{pmatrix} 3 \\ 4 \end{pmatrix},

which gives

x̂ = \begin{pmatrix} 17/7 \\ 2/7 \end{pmatrix},

and this is the least squares solution of the given system. •
The least squares solution to the linear system by pseudoinverse of a matrix can be obtained using
the MATLAB command window as follows:

>> A = [1 2; 2 -3]; b = [3 4]'; x = pinv(A)*b;



Theorem 3.1 Let A be a matrix with linearly independent columns; then the pseudoinverse A^+ of A
satisfies the following:
1. AA+ A = A.

2. A+ AA+ = A+ .

3. (AT )+ = (A+ )T .

Theorem 3.2 If A is a square matrix with linearly independent columns, then


1. A+ = A−1 .

2. (A+ )+ = A.

3. (AT )+ = (A+ )T .

3.2.9 Least Squares with QR Decomposition


The least squares solutions discussed previously suffer from a frequent problem: the matrix A^T A
of the normal equation is often ill-conditioned, so a small numerical error in performing
the Gauss elimination can result in a large error in the least squares solution.
In practice, Gauss elimination applied to A^T A for sizes n ≥ 5 may fail to yield good approximate solutions.
It turns out that the QR decomposition of A (discussed in Chapter 4) yields a more reliable
way of computing the least squares approximation of the linear system Ax = b. The idea behind this
approach is that because orthogonal matrices preserve length, they should preserve the length of the
error as well.
Let A have linearly independent columns and let A = QR be a QR decomposition. In this decom-
position, we express a matrix as the product of an orthogonal matrix Q and an upper triangular
matrix R.
For x̂ a least squares solution of Ax = b, we have

A^T A x̂ = A^T b
(QR)^T (QR) x̂ = (QR)^T b
R^T Q^T Q R x̂ = R^T Q^T b
R^T R x̂ = R^T Q^T b        (because Q^T Q = I)

Since R is invertible, so is RT , and hence

Rx̂ = QT b,

or equivalently,
x̂ = R−1 QT b.
Since R is an upper triangular, in practice it is easier to solve Rx̂ = QT b directly (using backward
substitution) than to invert R and compute R−1 QT b.
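In MATLAB this amounts to a couple of lines with the built-in qr function; the sketch below (our own illustration, assuming A has full column rank and b is given) uses the economy-size decomposition, introduced as the short QR decomposition later in this section:

[Q, R] = qr(A, 0);        % economy-size (short) QR decomposition, A = Q*R
xhat = R \ (Q'*b);        % back substitution on the triangular system (3.56)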

Theorem 3.3 If A is an m × n matrix with linearly independent columns and if A = QR is a QR
decomposition, then the unique least squares solution x̂ of Ax = b is, theoretically, given by

x̂ = R^{-1} Q^T b,    (3.55)

and it is usually computed by solving the system

R x̂ = Q^T b.    (3.56)

Example 3.14 A QR decomposition of A is given. Use it to find a least squares solution of the
linear system Ax = b, where
   
2 2 6 1
A= 1 4 −3  , b =  −1  ,
   
2 −4 9 4
and 
2 1 2 


 3 3 3 
  
  3 0 9

1 2 2 
Q= , R =  0 6 −6  .
   

 3 3 3 
 0 0 −3
 
 2 2 1 
− −
3 3 3
Solution. For the right-hand side of (3.56), we obtain

Q^T b = \begin{pmatrix} 2/3 & 1/3 & 2/3 \\ 1/3 & 2/3 & -2/3 \\ -2/3 & 2/3 & 1/3 \end{pmatrix} \begin{pmatrix} 1 \\ -1 \\ 4 \end{pmatrix} = \begin{pmatrix} 3 \\ -3 \\ 0 \end{pmatrix}.

Hence (3.56) can be written as

\begin{pmatrix} 3 & 0 & 9 \\ 0 & 6 & -6 \\ 0 & 0 & -3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 3 \\ -3 \\ 0 \end{pmatrix},

or

3x_1 + 9x_3 = 3
6x_2 - 6x_3 = -3
-3x_3 = 0.

Now using backward substitution, we obtain

x̂ = [x_1, x_2, x_3]^T = [1, -1/2, 0]^T,

which is called the least squares solution of the given system. •

So we might hope to conclude that

R x̂ = Q^T b

must be satisfied by the solution of A^T A x̂ = A^T b; but because in general R is not even square, we
cannot simply multiply by (R^T)^{-1} to arrive at this conclusion. In fact, it is not true in general that
a solution of

R x̂ = Q^T b

even exists; after all, Ax = b is equivalent to QRx = b, that is, to Rx = Q^T b, so Rx = Q^T b can
have an actual solution x only if Ax = b does. However, we are getting close to finding the least
squares solution. Here, we need to find a way to simplify the expression

R^T R x̂ = R^T Q^T b.    (3.57)

The matrix R is upper triangular, and because we have restricted ourselves to the case m ≥ n, we
may write the m × n matrix R as

R = \begin{pmatrix} R_1 \\ 0 \end{pmatrix},    (3.58)

in partitioned (block) form, where R_1 is an upper triangular n × n matrix and 0 represents an
(m - n) × n zero matrix. Since rank(R) = n, R_1 is nonsingular; hence every diagonal
element of R_1 must be nonzero. Now we may rewrite

R^T R x̂ = R^T Q^T b

as

\begin{pmatrix} R_1 \\ 0 \end{pmatrix}^T \begin{pmatrix} R_1 \\ 0 \end{pmatrix} x̂ = \begin{pmatrix} R_1 \\ 0 \end{pmatrix}^T Q^T b

\begin{pmatrix} R_1^T & 0^T \end{pmatrix} \begin{pmatrix} R_1 \\ 0 \end{pmatrix} x̂ = \begin{pmatrix} R_1^T & 0^T \end{pmatrix} Q^T b

R_1^T R_1 x̂ = \begin{pmatrix} R_1^T & 0^T \end{pmatrix} (Q^T b).

Note that multiplying by the block 0^T (an n × (m - n) zero matrix) on the right-hand side simply
means that the last (m - n) components of Q^T b do not affect the computation. Since R_1 is
nonsingular, we have

R_1 x̂ = (R_1^T)^{-1} \begin{pmatrix} R_1^T & 0^T \end{pmatrix} (Q^T b) = \begin{pmatrix} I_n & 0^T \end{pmatrix} (Q^T b).

The left-hand side, R_1 x̂, has dimensions (n × n)(n × 1) → n × 1, and the right-hand side has
dimensions (n × m)(m × 1) → n × 1. If we define the vector q to be equal to the first n components
of Q^T b, then this becomes

R_1 x̂ = q,    (3.59)

which is a square linear system involving a nonsingular upper triangular n × n matrix. Solving (3.59)
by backward substitution gives the least squares solution of the overdetermined system Ax = b with
QR decomposition, where A = QR is the QR decomposition of A and q is essentially Q^T b.
Note that the last (m - n) columns of Q are not needed to obtain the least squares solution of the
linear system with QR decomposition. The block-matrix representation of Q corresponding to R
(by (3.58)) is

Q = [Q_1, Q_2],

where Q_1 is the matrix composed of the first n columns of Q and Q_2 is the matrix composed of the
remaining columns of Q. Since only the first n columns of Q are needed to reconstruct A using the
coefficients in R, we can save effort and memory in the process of creating the QR decomposition.
The so-called short QR decomposition of A is

A = Q_1 R_1.    (3.60)

The only difference between the full QR decomposition and the short decomposition is that the
full QR decomposition contains the additional (m − n) columns of Q.

Example 3.15 Find the least squares solution of the following linear system Ax = b using QR decomposition, where

A = \begin{pmatrix} 2 & 1 \\ 1 & 0 \\ 3 & 1 \end{pmatrix}, \qquad x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \qquad b = \begin{pmatrix} 1.9 \\ 0.9 \\ 2.8 \end{pmatrix}.

Solution. First we find the QR decomposition; we get

Q = \begin{pmatrix} -0.5345 & 0.6172 & -0.5774 \\ -0.2673 & -0.7715 & -0.5774 \\ -0.8018 & -0.1543 & 0.5774 \end{pmatrix} \quad \text{and} \quad R = \begin{pmatrix} -3.7417 & -1.3363 \\ 0 & 0.4629 \\ 0 & 0 \end{pmatrix},

and

Q^T b = \begin{pmatrix} -0.5345 & -0.2673 & -0.8018 \\ 0.6172 & -0.7715 & -0.1543 \\ -0.5774 & -0.5774 & 0.5774 \end{pmatrix} \begin{pmatrix} 1.9 \\ 0.9 \\ 2.8 \end{pmatrix} = \begin{pmatrix} -3.5011 \\ 0.0463 \\ 0.0000 \end{pmatrix},

so that

R_1 = \begin{pmatrix} -3.7417 & -1.3363 \\ 0 & 0.4629 \end{pmatrix} \quad \text{and} \quad q = \begin{pmatrix} -3.5011 \\ 0.0463 \end{pmatrix}.

Hence we must solve (3.59), that is, R_1 x̂ = q, or

\begin{pmatrix} -3.7417 & -1.3363 \\ 0 & 0.4629 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} -3.5011 \\ 0.0463 \end{pmatrix}.

Using backward substitution, we obtain

x̂ = [x_1, x_2]^T = [0.9000, 0.1000]^T,

the least squares solution of the given system. •



The MATLAB built-in qr function returns the QR decomposition of a matrix. There are two ways of calling qr:

>> [Q, R] = qr(A);

>> [Q1, R1] = qr(A, 0);

where Q is an orthogonal matrix, Q1 has orthonormal columns, and R and R1 are upper triangular matrices. The first form returns the full QR decomposition (that is, if A is m × n, then Q is m × m and R is m × n). The second form returns the short QR decomposition, where Q1 and R1 are the matrices in (3.60).
In Example 3.15, we can apply the full QR decomposition of A using the first form of the built-in qr function as

>> A = [2 1; 1 0; 3 1];
>> [Q, R] = qr(A);

The short QR decomposition of A can be obtained by using the second form of the built-in qr function as

>> A = [2 1; 1 0; 3 1];
>> [Q1, R1] = qr(A, 0);

As expected, Q1 and the first two columns of Q are identical, as are R1 and the first two rows of R. The short QR decomposition thus contains, in Q1 and R1, all the information needed to reconstruct A.
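Given the short QR decomposition, the least squares solution of Example 3.15 can then be obtained directly. The following commands are a sketch of this final step (the backslash operator performs the backward substitution of (3.59)):

>> b = [1.9; 0.9; 2.8];
>> xhat = R1 \ (Q1' * b)    % gives xhat = [0.9000; 0.1000]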

3.2.10 Least Squares with Singular Value Decomposition


One of the advantages of the singular value decomposition (SVD) is that it allows the least squares solution to be computed efficiently. Consider the problem of finding the least squares solution of the overdetermined linear system Ax = b. As discussed previously, the least squares solution of Ax = b is the solution of A^T A x̂ = A^T b, that is, with A = U D V^T, the solution of
\[
(U D V^T)^T (U D V^T)\hat{x} = (U D V^T)^T b,
\qquad\text{i.e.}\qquad
V D^T D V^T \hat{x} = V D^T U^T b,
\]
which can be written formally as x̂ = V D^{-1} U^T b. This is the same formal solution that we found for the linear system Ax = b (see Chapter 6), but recall that A is no longer a square matrix, so this expression must be interpreted with care.
Note that in exact arithmetic, the solutions of a least squares problem via the normal equations, the QR decomposition, and the SVD are exactly the same. The main difference between these approaches is their numerical stability. To find the least squares solution of the overdetermined linear system with the SVD, we write
\[
D = \begin{bmatrix} D_1 \\ 0 \end{bmatrix},
\]
in partitioned (block) form, where D_1 is an n × n diagonal matrix and 0 represents an (m − n) × n zero matrix. If we define the right-hand vector q to be equal to the first n components of U^T b, then

the least squares solution of the overdetermined linear system is obtained by solving the system
\[
D_1 V^T \hat{x} = q, \qquad (3.61)
\]
or, equivalently,
\[
\hat{x} = V D_1^{-1} q. \qquad (3.62)
\]
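The computation described by (3.61) and (3.62) can be summarized in a few MATLAB lines. This is only a minimal sketch, assuming A is m × n with m ≥ n and all singular values nonzero, and that A and b are already defined; the variable names are illustrative.

% Least squares via the SVD (a minimal sketch)
[m, n] = size(A);
[U, D, V] = svd(A);             % full SVD: A = U*D*V'
q  = U' * b;
q  = q(1:n);                    % first n components of U'*b
D1 = D(1:n, 1:n);               % diagonal n-by-n block of D
xhat = V * (D1 \ q);            % xhat = V*inv(D1)*q, as in (3.62)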

Example 3.16 Find the least squares solution of the following linear system Ax = b using singular value decomposition, where
\[
A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}, \qquad
x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \qquad
b = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}.
\]

Solution. First, we find the singular value decomposition of the given matrix. The first step is to find the eigenvalues of the matrix
\[
A^T A = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}
\begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}
= \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}.
\]
The characteristic polynomial of A^T A is
\[
p(\lambda) = \lambda^2 - 4\lambda + 3 = (\lambda - 3)(\lambda - 1) = 0,
\]
which gives
\[
\lambda_1 = 3, \qquad \lambda_2 = 1,
\]
the eigenvalues of A^T A, and the corresponding eigenvectors are
\[
\begin{bmatrix} 1 \\ 1 \end{bmatrix}
\qquad\text{and}\qquad
\begin{bmatrix} -1 \\ 1 \end{bmatrix}.
\]

These vectors are orthogonal, so we normalize them to obtain
\[
v_1 = \begin{bmatrix} \sqrt{2}/2 \\ \sqrt{2}/2 \end{bmatrix}
\qquad\text{and}\qquad
v_2 = \begin{bmatrix} -\sqrt{2}/2 \\ \sqrt{2}/2 \end{bmatrix}.
\]
The singular values of A are
\[
\sigma_1 = \sqrt{\lambda_1} = \sqrt{3}
\qquad\text{and}\qquad
\sigma_2 = \sqrt{\lambda_2} = \sqrt{1} = 1.
\]
Thus
\[
V = \begin{bmatrix} \sqrt{2}/2 & -\sqrt{2}/2 \\ \sqrt{2}/2 & \sqrt{2}/2 \end{bmatrix}
= \begin{bmatrix} 0.7071 & -0.7071 \\ 0.7071 & 0.7071 \end{bmatrix}
\]
and
\[
D = \begin{bmatrix} \sqrt{3} & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}
= \begin{bmatrix} 1.7321 & 0 \\ 0 & 1.0000 \\ 0 & 0 \end{bmatrix}.
\]

To find U, we first compute
\[
u_1 = \frac{1}{\sigma_1} A v_1 = \frac{1}{\sqrt{3}}
\begin{bmatrix} 1 & 1 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} \sqrt{2}/2 \\ \sqrt{2}/2 \end{bmatrix}
= \begin{bmatrix} \sqrt{6}/3 \\ \sqrt{6}/6 \\ \sqrt{6}/6 \end{bmatrix},
\]
and similarly
\[
u_2 = \frac{1}{\sigma_2} A v_2
= \begin{bmatrix} 0 \\ \sqrt{2}/2 \\ -\sqrt{2}/2 \end{bmatrix}.
\]
These are two of the three column vectors of U; they form an orthonormal set in R^3 (an orthonormal basis for the column space of A). Now to find the third column vector u_3 of U, we look for a unit vector u_3 that is orthogonal to
\[
\sqrt{6}\, u_1 = \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}
\qquad\text{and}\qquad
\sqrt{2}\, u_2 = \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}.
\]
To satisfy these two orthogonality conditions, the vector u_3 must be a solution of the homogeneous linear system
\[
\begin{bmatrix} 2 & 1 & 1 \\ 0 & 1 & -1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix},
\]
whose general solution is
\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \alpha \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}, \qquad \alpha \in \mathbb{R}.
\]
Normalizing the vector on the right-hand side gives u_3 = [-1/\sqrt{3},\ 1/\sqrt{3},\ 1/\sqrt{3}]^T. So we have
\[
U = \begin{bmatrix} \sqrt{6}/3 & 0 & -1/\sqrt{3} \\ \sqrt{6}/6 & \sqrt{2}/2 & 1/\sqrt{3} \\ \sqrt{6}/6 & -\sqrt{2}/2 & 1/\sqrt{3} \end{bmatrix}
= \begin{bmatrix} 0.8165 & 0.0000 & -0.5774 \\ 0.4082 & 0.7071 & 0.5774 \\ 0.4082 & -0.7071 & 0.5774 \end{bmatrix}.
\]
This yields the SVD
\[
A = \begin{bmatrix} \sqrt{6}/3 & 0 & -1/\sqrt{3} \\ \sqrt{6}/6 & \sqrt{2}/2 & 1/\sqrt{3} \\ \sqrt{6}/6 & -\sqrt{2}/2 & 1/\sqrt{3} \end{bmatrix}
\begin{bmatrix} \sqrt{3} & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} \sqrt{2}/2 & \sqrt{2}/2 \\ -\sqrt{2}/2 & \sqrt{2}/2 \end{bmatrix}.
\]

Hence
\[
D_1 = \begin{bmatrix} 1.7321 & 0 \\ 0 & 1.0000 \end{bmatrix}
\qquad\text{and}\qquad
D_1^{-1} = \begin{bmatrix} 0.5774 & 0 \\ 0 & 1.0000 \end{bmatrix}.
\]
Also
\[
U^T b = \begin{bmatrix} 0.8165 & 0.4082 & 0.4082 \\ 0.0000 & 0.7071 & -0.7071 \\ -0.5774 & 0.5774 & 0.5774 \end{bmatrix}
\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}
= \begin{bmatrix} 1.6330 \\ 0.0000 \\ 0.5774 \end{bmatrix},
\]

and from it we obtain
\[
q = \begin{bmatrix} 1.6330 \\ 0.0000 \end{bmatrix}.
\]
Thus, by (3.62),
\[
\hat{x} = V D_1^{-1} q,
\]
which gives
\[
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= \begin{bmatrix} 0.7071 & -0.7071 \\ 0.7071 & 0.7071 \end{bmatrix}
\begin{bmatrix} 0.5774 & 0 \\ 0 & 1.0000 \end{bmatrix}
\begin{bmatrix} 1.6330 \\ 0.0000 \end{bmatrix}
= \begin{bmatrix} 0.6667 \\ 0.6667 \end{bmatrix},
\]
which is the least squares solution of the given system. •


Like the qr function, the MATLAB built-in svd function returns the SVD of a matrix. There are two ways of calling svd:

>> [U, D, V] = svd(A);

>> [U1, D1, V] = svd(A, 0);

Here A is any matrix, D is a diagonal matrix having the singular values of A on its diagonal, and U and V are orthogonal matrices. The first form returns the full SVD and the second form returns the short SVD, which is useful when A is an m × n matrix with m > n. The second form gives U1, the first n columns of U, and the square n × n matrix D1. When m > n, the full SVD of A gives a D matrix with only zeros in the last (m − n) rows. Note that V is the same in both forms.
In Example 3.16, we can apply the full SVD of A using the first form of the built-in svd function as

>> A = [1 1; 0 1; 1 0];
>> [U, D, V] = svd(A);

The short SVD of A can be obtained by using the second form of the built-in svd function as

>> A = [1 1; 0 1; 1 0];
>> [U1, D1, V] = svd(A, 0);

As expected, U1 and the first two columns of U are identical, as are D1 and the first two rows of D (and V is the same in both forms). The short SVD of A thus contains, in U1, D1, and V, all the information needed to reconstruct A.
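With the short SVD in hand, the least squares solution of Example 3.16 can then be computed in a single additional line; this is a sketch of the same computation as in (3.62):

>> b = [1; 1; 1];
>> xhat = V * (D1 \ (U1' * b))   % gives xhat = [0.6667; 0.6667]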

Now we consider the problem of finding the least squares (minimum norm) solution of the underdetermined linear system Ax = b, where A is m × n with m < n. As discussed previously, this solution has the form x̂ = A^T y, where y is the solution of A A^T y = b. Substituting the SVD A = U D V^T gives A A^T = U D D^T U^T, so that
\[
y = U (D D^T)^{-1} U^T b
\qquad\text{and}\qquad
\hat{x} = A^T y = V D^T (D D^T)^{-1} U^T b.
\]
Note that
\[
D = \begin{bmatrix} D_1 & 0 \end{bmatrix},
\]

in partitioned (block) form, where D_1 is an m × m diagonal matrix and 0 represents an m × (n − m) zero matrix. Then
\[
D^T (D D^T)^{-1} = \begin{bmatrix} D_1^{-1} \\ 0 \end{bmatrix}
\]
in block form, so only the first m columns of V are needed. Writing V = [V_1, V_2], where V_1 consists of the first m columns of V, and defining
\[
p = D_1^{-1} U^T b, \qquad (3.63)
\]
the minimum norm least squares solution of the underdetermined linear system is
\[
\hat{x} = V_1 p. \qquad (3.64)
\]
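A minimal MATLAB sketch of (3.63) and (3.64) is given below; it assumes A is m × n with m < n and full row rank, and that A and b are already defined (variable names are illustrative only).

% Minimum norm solution of an underdetermined system via the SVD (a sketch)
[m, n] = size(A);
[U, D, V] = svd(A);             % full SVD: A = U*D*V'
D1 = D(:, 1:m);                 % m-by-m diagonal block of D
p  = D1 \ (U' * b);             % p = inv(D1)*U'*b, as in (3.63)
xhat = V(:, 1:m) * p;           % xhat = V1*p, as in (3.64)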

Example 3.17 Find the least squares (minimum norm) solution of the following linear system Ax = b using singular value decomposition, where
\[
A = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad
x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \qquad
b = \begin{bmatrix} 1 \\ 2 \end{bmatrix}.
\]

Solution. First, we find the eigenvalues of the matrix
\[
A^T A = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]
The characteristic polynomial of A^T A is
\[
p(\lambda) = \lambda^3 - 3\lambda^2 + 2\lambda = \lambda(\lambda - 1)(\lambda - 2) = 0,
\]
which gives
\[
\lambda_1 = 2, \qquad \lambda_2 = 1, \qquad \lambda_3 = 0,
\]
the eigenvalues of A^T A, and the corresponding eigenvectors are
\[
[1, 1, 0]^T, \qquad [0, 0, 1]^T, \qquad [-1, 1, 0]^T.
\]
These vectors are orthogonal, so we normalize them to obtain
\[
v_1 = [\sqrt{2}/2,\ \sqrt{2}/2,\ 0]^T, \qquad
v_2 = [0,\ 0,\ 1]^T, \qquad
v_3 = [-\sqrt{2}/2,\ \sqrt{2}/2,\ 0]^T.
\]

The singular values of A are
\[
\sigma_1 = \sqrt{2}, \qquad \sigma_2 = 1, \qquad \sigma_3 = 0.
\]
Thus
\[
V = \begin{bmatrix} \sqrt{2}/2 & 0 & -\sqrt{2}/2 \\ \sqrt{2}/2 & 0 & \sqrt{2}/2 \\ 0 & 1 & 0 \end{bmatrix}
= \begin{bmatrix} 0.7071 & 0 & -0.7071 \\ 0.7071 & 0 & 0.7071 \\ 0 & 1 & 0 \end{bmatrix},
\]
and
\[
D = \begin{bmatrix} \sqrt{2} & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}
= \begin{bmatrix} 1.4142 & 0 & 0 \\ 0 & 1.0000 & 0 \end{bmatrix}.
\]

To find U, we first compute
\[
u_1 = \frac{1}{\sigma_1} A v_1 = \frac{1}{\sqrt{2}}
\begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} \sqrt{2}/2 \\ \sqrt{2}/2 \\ 0 \end{bmatrix}
= \begin{bmatrix} 1 \\ 0 \end{bmatrix},
\]
and
\[
u_2 = \frac{1}{\sigma_2} A v_2 =
\begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}
= \begin{bmatrix} 0 \\ 1 \end{bmatrix}.
\]
These are the two column vectors of U and already form an orthonormal basis for R^2, so we get
\[
U = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.
\]
Thus we have the SVD of the given matrix
\[
A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} \sqrt{2} & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} \sqrt{2}/2 & \sqrt{2}/2 & 0 \\ 0 & 0 & 1 \\ -\sqrt{2}/2 & \sqrt{2}/2 & 0 \end{bmatrix}
= U D V^T.
\]
Hence
\[
D_1 = \begin{bmatrix} 1.4142 & 0 \\ 0 & 1.0000 \end{bmatrix}
\qquad\text{and}\qquad
D_1^{-1} = \begin{bmatrix} 0.7071 & 0 \\ 0 & 1.0000 \end{bmatrix}.
\]
Also
\[
p = D_1^{-1} U^T b =
\begin{bmatrix} 0.7071 & 0 \\ 0 & 1.0000 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 \\ 2 \end{bmatrix}
= \begin{bmatrix} 0.7071 \\ 2.0000 \end{bmatrix}.
\]
Thus we must use (3.64), that is,
\[
\hat{x} = V_1 p,
\]
where V_1 consists of the first two columns of V. This gives
\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} 0.7071 & 0 \\ 0.7071 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 0.7071 \\ 2.0000 \end{bmatrix}
= \begin{bmatrix} 0.5000 \\ 0.5000 \\ 2.0000 \end{bmatrix},
\]
which is the minimum norm least squares solution of the given system. •
The author-defined function svdfunction, together with the following MATLAB command, can be used to reproduce the SVD computed in Example 3.17:

>> A = [1 1 0; 0 0 1]; [U, D, V] = svdfunction(A);
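For comparison, the built-in svd (or pinv, which returns the minimum norm least squares solution) can be used to check the result of Example 3.17; the following commands are a sketch of such a check:

>> A = [1 1 0; 0 0 1]; b = [1; 2];
>> [U, D, V] = svd(A);
>> xhat = V(:, 1:2) * (D(:, 1:2) \ (U' * b))  % gives [0.5000; 0.5000; 2.0000]
>> pinv(A) * b                                % same result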

Note that when m and n are of similar size, the SVD is significantly more expensive to compute than the QR decomposition. If m and n are equal, then solving a least squares problem by the SVD is about an order of magnitude more costly than using the QR decomposition. So for least squares problems it is generally advisable to use the QR decomposition; when a least squares problem is known to be a difficult (ill-conditioned) one, using the SVD is probably justified.
