
L. Vandenberghe, EE103, 12/12/00

Final exam solutions


Problem 1. (20 points)

If x and y are two positive numbers with x ≠ y, then


$$\frac{x+y}{2} - \sqrt{xy} > 0.$$

(The arithmetic mean $(x+y)/2$ is always greater than the geometric mean $\sqrt{xy}$.)
The following Matlab code evaluates the left hand side of the inequality at x = 1 and
y = 1.01, rounding the result of the square root to 4 significant digits:
>> x=1; y=1.01;
>> (x+y)/2 - chop(sqrt(x*y),4)
ans =
-2.2204e-016
The result is clearly wrong because it should be positive. (The correct answer is $1.24 \cdot 10^{-5}$.)
1. Explain why the result is wrong. (You do not have to explain the numerical value of
the result.)

2. Give a more stable method for evaluating $(x+y)/2 - \sqrt{xy}$ when $x \approx y$. (You should assume that square roots are evaluated with 4 correct significant digits, and that multiplications, divisions, additions, and subtractions are exact.)

SOLUTION.
1. Cancellation. We are subtracting two numbers that are almost equal. There is no error in the first number ($(x+y)/2 = 1.005$), but there is a rounding error in the second ($\sqrt{xy} = 1.00498\ldots$ is rounded to 1.005, a relative error of about 0.001%). Cancellation causes the relative error in the result to be much larger (more than 100%).

2. Multiply and divide by $(x+y)/2 + \sqrt{xy}$:

$$\frac{x+y}{2} - \sqrt{xy} = \frac{(x+y)^2/4 - xy}{(x+y)/2 + \sqrt{xy}} = \frac{(x-y)^2}{4\left((x+y)/2 + \sqrt{xy}\right)}.$$

In this expression we still have a subtraction $x - y$ of two nearly equal numbers. But the difference is larger than the difference between $(x+y)/2$ and $\sqrt{xy}$ in the original expression, and moreover there is no error in x and y, so no cancellation occurs.
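
A minimal Matlab sketch of both points, using round(., 4, 'significant') as a stand-in for the chop function above (chop is a course-supplied helper, not a Matlab built-in):

round4 = @(z) round(z, 4, 'significant');   % round to 4 significant digits

x = 1; y = 1.01;
% Naive formula: the rounded square root cancels against (x+y)/2,
% and the 1.24e-5 answer is lost completely.
naive = (x + y)/2 - round4(sqrt(x*y))
% Stable formula: same rounded square root, but no cancellation.
stable = (x - y)^2/(4*((x + y)/2 + round4(sqrt(x*y))))   % approx 1.2438e-5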

Problem 2. (20 points)

A matrix C is defined as
$$C = A u v^T B$$
where $A \in \mathbf{R}^{n \times n}$, $B \in \mathbf{R}^{n \times n}$, $u \in \mathbf{R}^n$, and $v \in \mathbf{R}^n$. The product on the right-hand side can be evaluated in many different ways, e.g., as $A(u(v^TB))$ or as $A((uv^T)B)$, etc. What is the fastest method (i.e., requiring the least number of flops) when n is large? Explain your answer.

SOLUTION. We evaluate C as $C = (Au)(v^TB)$. This requires

• $2n^2$ flops for $Au$,

• $2n^2$ flops for $v^TB$,

• $n^2$ flops for the outer product $(Au)(v^TB)$ (or $2n^2$, depending on how you count it).

The total is $5n^2$ (or $6n^2$). All other methods have flop counts with $n^3$ terms, because they involve at least one product of two $n \times n$ matrices.
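
The gap between the $n^2$ and $n^3$ methods is easy to observe in Matlab (n chosen arbitrarily):

n = 1000;
A = randn(n); B = randn(n); u = randn(n,1); v = randn(n,1);

tic; C1 = (A*u)*(v'*B); t1 = toc;   % O(n^2): two matrix-vector products, one outer product
tic; C2 = A*((u*v')*B); t2 = toc;   % O(n^3): two products of n-by-n matrices
fprintf('fast: %.4f s, slow: %.4f s, error: %.1e\n', t1, t2, norm(C1 - C2, 'fro'))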

Problem 3. (20 points)

You are given a nonsingular matrix A, and you would like to obtain an idea of the condition
number κ(A) without calculating the exact value of κ(A). To estimate the condition number
you evaluate the matrix-vector product Ax for a few different values of x, and calculate the
norm of x and the norm of Ax. The following table shows the results for four vectors $x^{(1)}$, $x^{(2)}$, $x^{(3)}$, $x^{(4)}$.

           $\|x^{(i)}\|$     $\|Ax^{(i)}\|$
  i = 1         1                100
  i = 2        100                1
  i = 3      $10^3$            $10^4$
  i = 4      $10^{-3}$         $10^2$

What are the best (i.e., largest) lower bounds on $\|A\|$, $\|A^{-1}\|$, and $\kappa(A)$ that you can derive based on this information?

SOLUTION.

• $\|A\| \geq \|Ax\|/\|x\|$ for all $x \neq 0$. The best bound is $\|A\| \geq 10^5$ (by taking $x = x^{(4)}$).

• $\|A^{-1}\| \geq \|A^{-1}y\|/\|y\|$ for all $y \neq 0$. The best bound is $\|A^{-1}\| \geq 100$ (by taking $y = Ax^{(2)}$, so that $A^{-1}y = x^{(2)}$).

• $\kappa(A) = \|A\|\,\|A^{-1}\| \geq 10^7$ (by combining the bounds for $\|A\|$ and $\|A^{-1}\|$).
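
The arithmetic for these bounds, as a short Matlab check (the norm pairs are copied from the table):

xnorm  = [1, 100, 1e3, 1e-3];      % ||x^(i)||
Axnorm = [100, 1, 1e4, 1e2];       % ||A*x^(i)||

lb_normA  = max(Axnorm ./ xnorm)   % lower bound on ||A||:      1e5 (from i = 4)
lb_normAi = max(xnorm ./ Axnorm)   % lower bound on ||A^{-1}||: 100 (from i = 2)
lb_kappa  = lb_normA * lb_normAi   % lower bound on kappa(A):   1e7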

Problem 4. (20 points)

Consider the set of $p + q$ linear equations in $p + q$ variables

$$\begin{bmatrix} I & A \\ A^T & -I \end{bmatrix} \begin{bmatrix} \bar{y} \\ \bar{x} \end{bmatrix} = \begin{bmatrix} b \\ c \end{bmatrix}. \qquad (1)$$

$A \in \mathbf{R}^{p \times q}$, $b \in \mathbf{R}^p$, and $c \in \mathbf{R}^q$ are given. The variables are $\bar{x} \in \mathbf{R}^q$ and $\bar{y} \in \mathbf{R}^p$.

1. Show that the coefficient matrix

$$\begin{bmatrix} I & A \\ A^T & -I \end{bmatrix}$$

is nonsingular, regardless of the rank and the dimensions of A. Hint. Show that the matrix $I + A^TA$ is nonsingular.

2. We can conclude from part 1 that the solution $\bar{x}$, $\bar{y}$ of (1) is unique. Show that $\bar{x}$ minimizes
$$\|Ax - b\|^2 + \|x + c\|^2.$$

SOLUTION.
1. We have to show that $y + Ax = 0$ and $A^Ty - x = 0$ are only possible if $x = 0$ and $y = 0$. Eliminating y from the first equation and substituting in the second yields

$$A^T(-Ax) - x = -(A^TA + I)x = 0.$$

This will imply that $x = 0$ if we can show that $A^TA + I$ is nonsingular, i.e., that $(A^TA + I)x = 0$ implies $x = 0$. We have

$$(A^TA + I)x = 0 \;\Longrightarrow\; x^T(A^TA + I)x = 0 \;\Longrightarrow\; x^TA^TAx + x^Tx = 0 \;\Longrightarrow\; \|Ax\|^2 + \|x\|^2 = 0 \;\Longrightarrow\; x = 0.$$

(Both terms $\|Ax\|^2$ and $\|x\|^2$ are nonnegative, so their sum can only be zero if both are zero, which is only possible if $x = 0$.) Finally, if $x = 0$, then $y = -Ax = 0$. We conclude that $y + Ax = 0$ and $A^Ty - x = 0$ imply $x = 0$ and $y = 0$, and therefore that the coefficient matrix is nonsingular.

2. We have two equations

$$\bar{y} + A\bar{x} = b, \qquad A^T\bar{y} - \bar{x} = c.$$

Eliminating $\bar{y}$ from the first equation and substituting in the second yields

$$A^T(b - A\bar{x}) - \bar{x} = c,$$

i.e., $\bar{x}$ satisfies
$$(A^TA + I)\bar{x} = A^Tb - c. \qquad (2)$$

On the other hand, we can express the minimization problem in the problem statement as a least-squares problem

$$\mbox{minimize } \|\tilde{A}x - \tilde{b}\|^2$$

where
$$\tilde{A} = \begin{bmatrix} A \\ I \end{bmatrix}, \qquad \tilde{b} = \begin{bmatrix} b \\ -c \end{bmatrix}.$$

The normal equations are $\tilde{A}^T\tilde{A}x = \tilde{A}^T\tilde{b}$, which reduce to

$$(A^TA + I)x = A^Tb - c$$

if we substitute the definitions of $\tilde{A}$ and $\tilde{b}$. Comparing with (2), we conclude that the solution is equal to $\bar{x}$.
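
A quick Matlab sketch of this equivalence, on random data (the dimensions p and q are arbitrary):

p = 6; q = 4;
A = randn(p,q); b = randn(p,1); c = randn(q,1);

M = [eye(p), A; A', -eye(q)];   % coefficient matrix of (1)
z = M \ [b; c];
xbar = z(p+1:end);              % the x-block of the solution of (1)

xls = [A; eye(q)] \ [b; -c];    % minimizer of ||A*x - b||^2 + ||x + c||^2
norm(xbar - xls)                % agrees to rounding error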

Problem 5. (20 points)

Consider the underdetermined set of linear equations

Ax + By = b (3)

where b ∈ Rp , A ∈ Rp×p , and B ∈ Rp×q are given. The variables are x ∈ Rp and y ∈ Rq .
We assume that q < p, that A is nonsingular, and that B is full rank (i.e., Rank B = q).
The equations are underdetermined, so there are infinitely many solutions. For example, we
can pick any y, and solve the set of linear equations Ax = b − By to find x.
Below we define four solutions that minimize some measure of the magnitude of x, or y, or
both. For each of these solutions, describe the factorizations (QR, Cholesky, or LU with
partial pivoting) that you would use to calculate x and y. Clearly specify the matrices that
you factorize, and the type of factorization. If you know several methods, you should give
the most efficient one (least number of flops for large p, q).
 
1. The solution x, y with the smallest value of $\|x\|^2 + \|y\|^2 = \left\|\begin{bmatrix} x \\ y \end{bmatrix}\right\|^2$.

2. The solution x, y with the smallest value of $\|x\|^2 + 2\|y\|^2$.

3. The solution x, y with the smallest value of $\|y\|^2$.

4. The solution x, y with the smallest value of $\|x\|^2$.

SOLUTION.
1. This is the least-norm solution of the underdetermined set of equations

$$\begin{bmatrix} A & B \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = b.$$

We can calculate the least-norm solution from the QR factorization of

$$\begin{bmatrix} A^T \\ B^T \end{bmatrix},$$

which is full rank because A is nonsingular. The cost is $2(p+q)p^2$. (Note that you didn't have to give flop counts; we include them here for completeness. A Matlab sketch of this part and of part 4 follows the list.)
We can also get the solution from the Cholesky factorization of

$$\begin{bmatrix} A & B \end{bmatrix} \begin{bmatrix} A^T \\ B^T \end{bmatrix} = AA^T + BB^T.$$

The cost would be $(p+q)p^2$ to form the matrix, plus $p^3/3$ for the Cholesky factorization.

2. We can formulate this as a least-norm problem by a change of variables $\tilde{y} = \sqrt{2}\,y$, or $y = \tilde{y}/\sqrt{2}$. We find x and $\tilde{y}$ by calculating the solution of

$$Ax + (1/\sqrt{2})B\tilde{y} = b$$

with the smallest value of $\|x\|^2 + \|\tilde{y}\|^2$. We calculate the least-norm solution from the QR factorization of the matrix

$$\begin{bmatrix} A^T \\ (1/\sqrt{2})B^T \end{bmatrix}$$

(which is full rank because A is nonsingular). The cost is $2(p+q)p^2$. Alternatively, we can use the Cholesky factorization of

$$\begin{bmatrix} A & (1/\sqrt{2})B \end{bmatrix} \begin{bmatrix} A^T \\ (1/\sqrt{2})B^T \end{bmatrix} = AA^T + \frac{1}{2}BB^T.$$

The cost would be $(p+q)p^2$ to form the matrix, plus $p^3/3$ for the Cholesky factorization.
3. We set y = 0 and solve Ax = b using the LU factorization of A ($2p^3/3$ flops).
4. x and y must satisfy $x = A^{-1}b - A^{-1}By$. We can therefore minimize $\|x\|^2$ by solving the least-squares problem

$$\mbox{minimize } \|A^{-1}b - A^{-1}By\|^2$$

with variable y. Before we can solve this LS problem we need to calculate $A^{-1}b$ and the matrix $A^{-1}B$. The most efficient method is to compute the LU factorization of A and then solve $q + 1$ sets of linear equations

$$Az_0 = b, \quad Az_1 = b_1, \quad \ldots, \quad Az_q = b_q,$$

where $b_i$ denotes the $i$th column of B. The result is $A^{-1}b = z_0$ and

$$A^{-1}B = \begin{bmatrix} z_1 & z_2 & \cdots & z_q \end{bmatrix}.$$

The total cost of calculating $A^{-1}b$ and $A^{-1}B$ using this method is $2p^3/3$ for the factorization of A and $2(q+1)p^2$ for the forward and backward substitutions to find the vectors $z_i$.
We then solve the least-squares problem

$$\mbox{minimize } \|A^{-1}b - A^{-1}By\|^2$$

by taking the QR factorization of $A^{-1}B$, which costs $2pq^2$ flops. We know that $A^{-1}B$ is full rank, because B is full rank ($A^{-1}By = 0$ implies $By = 0$, which implies $y = 0$).
We can also use the Cholesky factorization of $(A^{-1}B)^T(A^{-1}B)$. The cost would be $pq^2$ to form the matrix, plus $q^3/3$ for the factorization.
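
A minimal Matlab sketch of the QR-based method of part 1 and the procedure of part 4, on random data (dimensions arbitrary; backslash applied to a rectangular matrix solves the least-squares problem via QR):

p = 8; q = 3;
A = randn(p); B = randn(p,q); b = randn(p,1);

% Part 1: least-norm solution of [A B]*[x; y] = b via QR of the transpose.
[Q, R] = qr([A'; B'], 0);      % economy-size QR of the (p+q)-by-p matrix
z = Q*(R' \ b);                % least-norm solution [x; y]
x1 = z(1:p); y1 = z(p+1:end);
norm(A*x1 + B*y1 - b)          % ~ 0: the equations are satisfied

% Part 4: minimize ||x|| by solving a least-squares problem in y.
[L, U, P] = lu(A);             % one LU factorization of A
AinvB = U \ (L \ (P*B));       % A^{-1}*B: two triangular solves per column
Ainvb = U \ (L \ (P*b));       % A^{-1}*b
y4 = AinvB \ Ainvb;            % y minimizing ||A^{-1}*b - A^{-1}*B*y||
x4 = Ainvb - AinvB*y4;         % the corresponding x with smallest norm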

Problem 6. (20 points)

Below we define three functions g that we want to minimize (see equations (4), (5), (6)). For each of these three functions, explain which of the following three methods you would use to minimize g: linear least-squares via QR factorization, Newton's method, or the Gauss-Newton method.

• If your answer is linear least-squares, you should express the problem in the standard form of minimizing $\|Ax - b\|^2$, and give the matrix A, the vector of variables x, and the right-hand side b.

• If your answer is Newton’s method, you should describe how you construct the Hessian
and gradient of g, and how you would choose the starting point.

• If your answer is the Gauss-Newton method, you should describe how you construct the linear least-squares problems that you have to solve at each iteration, and how you would choose the starting point.

Note: The description of the problem is long, but all the information you need is summarized in the definitions of g (i.e., equations (4), (5), (6)). The rest of the discussion is background and can be read quickly.

The figure shows a leveling network used to determine the vertical height (or elevation) of
three landmarks.

[Figure: the leveling network, a graph with four nodes and five edges. Edge 1 connects nodes 2 and 1, edge 2 connects nodes 1 and 3, edge 3 connects nodes 1 and 4, edge 4 connects nodes 2 and 4, and edge 5 connects nodes 2 and 3.]

The nodes in the graph represent four points on a map. The variables in the problem are the heights $h_1$, $h_2$, $h_3$ of the first three points with respect to the fourth, which will be used as reference (in other words, we take $h_4 = 0$). Each of the five edges in the graph represents a measurement of the height difference between its two end points. For example, edge 1 from node 2 to node 1 indicates a measurement of $h_1 - h_2$. We will denote the measured values of the five height differences as $\rho_1, \rho_2, \rho_3, \rho_4, \rho_5$. Our goal is to determine $h_1$, $h_2$, and $h_3$ from the five measurements. We distinguish three situations.

1. There is a small error in each measurement. The measured values are

$$\begin{aligned}
\rho_1 &= h_1 - h_2 + v_1 \\
\rho_2 &= h_3 - h_1 + v_2 \\
\rho_3 &= -h_1 + v_3 \\
\rho_4 &= -h_2 + v_4 \\
\rho_5 &= h_2 - h_3 + v_5
\end{aligned}$$

where $v_1, \ldots, v_5$ are small and unknown measurement errors. To estimate the heights, we minimize the function

$$g(h_1, h_2, h_3) = (h_1 - h_2 - \rho_1)^2 + (h_3 - h_1 - \rho_2)^2 + (-h_1 - \rho_3)^2 + (-h_2 - \rho_4)^2 + (h_2 - h_3 - \rho_5)^2. \qquad (4)$$

The variables are $h_1$, $h_2$, $h_3$. The numbers $\rho_i$ are given.


2. In addition to the measurement errors $v_i$, there is a systematic error, or offset, w in the measurement device. We actually measure

$$\begin{aligned}
\rho_1 &= h_1 - h_2 + v_1 + w \\
\rho_2 &= h_3 - h_1 + v_2 + w \\
\rho_3 &= -h_1 + v_3 + w \\
\rho_4 &= -h_2 + v_4 + w \\
\rho_5 &= h_2 - h_3 + v_5 + w.
\end{aligned}$$

To estimate $h_1$, $h_2$, $h_3$, and w we minimize

$$g(h_1, h_2, h_3, w) = (h_1 - h_2 + w - \rho_1)^2 + (h_3 - h_1 + w - \rho_2)^2 + (-h_1 + w - \rho_3)^2 + (-h_2 + w - \rho_4)^2 + (h_2 - h_3 + w - \rho_5)^2. \qquad (5)$$

The variables are $h_1$, $h_2$, $h_3$, and w. The numbers $\rho_i$ are given.


3. In addition to the measurement errors $v_i$, there is an error caused by nonlinearity of the measurement device. We actually measure

$$\begin{aligned}
\rho_1 &= (h_1 - h_2) + \alpha(h_1 - h_2)^3 + v_1 \\
\rho_2 &= (h_3 - h_1) + \alpha(h_3 - h_1)^3 + v_2 \\
\rho_3 &= -h_1 - \alpha h_1^3 + v_3 \\
\rho_4 &= -h_2 - \alpha h_2^3 + v_4 \\
\rho_5 &= (h_2 - h_3) + \alpha(h_2 - h_3)^3 + v_5,
\end{aligned}$$

where $\alpha \in \mathbf{R}$ is small and given. To estimate $h_1$, $h_2$, $h_3$, we minimize

$$\begin{aligned}
g(h_1, h_2, h_3) = {} & \left((h_1 - h_2) + \alpha(h_1 - h_2)^3 - \rho_1\right)^2 + \left((h_3 - h_1) + \alpha(h_3 - h_1)^3 - \rho_2\right)^2 \\
& + \left(-h_1 - \alpha h_1^3 - \rho_3\right)^2 + \left(-h_2 - \alpha h_2^3 - \rho_4\right)^2 \\
& + \left((h_2 - h_3) + \alpha(h_2 - h_3)^3 - \rho_5\right)^2. \qquad (6)
\end{aligned}$$

The variables are $h_1$, $h_2$, and $h_3$. The numbers $\rho_i$ and $\alpha$ are given. You know that $\alpha$ is small.

SOLUTION.

1. Linear least squares:

$$A = \begin{bmatrix} 1 & -1 & 0 \\ -1 & 0 & 1 \\ -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 1 & -1 \end{bmatrix}, \qquad x = \begin{bmatrix} h_1 \\ h_2 \\ h_3 \end{bmatrix}, \qquad b = \begin{bmatrix} \rho_1 \\ \rho_2 \\ \rho_3 \\ \rho_4 \\ \rho_5 \end{bmatrix}.$$

2. Linear least squares:

$$A = \begin{bmatrix} 1 & -1 & 0 & 1 \\ -1 & 0 & 1 & 1 \\ -1 & 0 & 0 & 1 \\ 0 & -1 & 0 & 1 \\ 0 & 1 & -1 & 1 \end{bmatrix}, \qquad x = \begin{bmatrix} h_1 \\ h_2 \\ h_3 \\ w \end{bmatrix}, \qquad b = \begin{bmatrix} \rho_1 \\ \rho_2 \\ \rho_3 \\ \rho_4 \\ \rho_5 \end{bmatrix}.$$

3. This is a nonlinear least-squares problem. We can use Newton's method or the Gauss-Newton method. In both cases we can take as starting point the solution h from part 1: we know that $\alpha$ is small, so the solutions of problems 1 and 3 are probably close. We can write g as

$$g(h) = \sum_{i=1}^{5} r_i(h)^2$$

where $r_i(h) = a_i^Th + \alpha(a_i^Th)^3 - \rho_i$, and $a_i^T$ is the $i$th row of the matrix A from part 1. This makes it easier to write the gradient and Hessian of $r_i$:

$$\nabla r_i(h) = \left(1 + 3\alpha(a_i^Th)^2\right)a_i, \qquad \nabla^2 r_i(h) = 6\alpha(a_i^Th)\,a_ia_i^T.$$

For example, for
$$r_1(h) = (h_1 - h_2) + \alpha(h_1 - h_2)^3 - \rho_1$$
we have
$$\nabla r_1(h) = \left(1 + 3\alpha(h_1 - h_2)^2\right)\begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}, \qquad \nabla^2 r_1(h) = 6\alpha(h_1 - h_2)\begin{bmatrix} 1 & -1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$

Now we can plug the expressions for $r_i(h)$, $\nabla r_i(h)$, and $\nabla^2 r_i(h)$ into

$$\nabla g(h) = 2\sum_{i=1}^{5} r_i(h)\nabla r_i(h), \qquad \nabla^2 g(h) = 2\sum_{i=1}^{5} \left(r_i(h)\nabla^2 r_i(h) + \nabla r_i(h)\nabla r_i(h)^T\right)$$

to find the gradient and Hessian of g.


The Gauss-Newton method is simpler, because it does not require any second derivatives. At each iteration we solve

$$\mbox{minimize } \sum_{i=1}^{5} \left(r_i(h^{(k)}) + \nabla r_i(h^{(k)})^T(h - h^{(k)})\right)^2,$$

where $h^{(k)}$ is our current guess, and we take the solution of this minimization problem as the next iterate $h^{(k+1)}$. Using the expressions for $r_i(h)$ and $\nabla r_i(h)$ given above, we have

$$\nabla r_i(h^{(k)})^Th = \left(1 + 3\alpha(a_i^Th^{(k)})^2\right)a_i^Th$$

and

$$r_i(h^{(k)}) - \nabla r_i(h^{(k)})^Th^{(k)} = a_i^Th^{(k)} + \alpha(a_i^Th^{(k)})^3 - \rho_i - \left(1 + 3\alpha(a_i^Th^{(k)})^2\right)a_i^Th^{(k)} = -2\alpha(a_i^Th^{(k)})^3 - \rho_i.$$

In other words, to find the new h we have to solve

$$\mbox{minimize } \sum_{i=1}^{5} \left(\left(1 + 3\alpha(a_i^Th^{(k)})^2\right)a_i^Th - 2\alpha(a_i^Th^{(k)})^3 - \rho_i\right)^2,$$
which is a least-squares problem

$$\mbox{minimize } \|A^{(k)}h - b^{(k)}\|^2$$

where

$$A^{(k)} = \begin{bmatrix}
\left(1 + 3\alpha(a_1^Th^{(k)})^2\right)a_1^T \\
\left(1 + 3\alpha(a_2^Th^{(k)})^2\right)a_2^T \\
\left(1 + 3\alpha(a_3^Th^{(k)})^2\right)a_3^T \\
\left(1 + 3\alpha(a_4^Th^{(k)})^2\right)a_4^T \\
\left(1 + 3\alpha(a_5^Th^{(k)})^2\right)a_5^T
\end{bmatrix}, \qquad
b^{(k)} = \begin{bmatrix}
2\alpha(a_1^Th^{(k)})^3 + \rho_1 \\
2\alpha(a_2^Th^{(k)})^3 + \rho_2 \\
2\alpha(a_3^Th^{(k)})^3 + \rho_3 \\
2\alpha(a_4^Th^{(k)})^3 + \rho_4 \\
2\alpha(a_5^Th^{(k)})^3 + \rho_5
\end{bmatrix}.$$
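
A minimal Matlab sketch of this Gauss-Newton iteration, with made-up values for $\alpha$ and the measurements $\rho_i$ (both are hypothetical, chosen only for illustration):

A = [1 -1 0; -1 0 1; -1 0 0; 0 -1 0; 0 1 -1];   % rows a_i' from part 1
alpha = 0.01;                        % hypothetical small nonlinearity
rho = [1.1; -0.4; -0.9; -0.2; 0.6];  % hypothetical measurements

h = A \ rho;                         % starting point: linear LS solution of part 1
for k = 1:10
    s  = A*h;                        % s(i) = a_i'*h^(k)
    Ak = diag(1 + 3*alpha*s.^2) * A; % the matrix A^(k) (scaled rows of A)
    bk = 2*alpha*s.^3 + rho;         % the vector b^(k)
    h  = Ak \ bk;                    % least-squares step, solved via QR by backslash
end
h                                    % estimates of h1, h2, h3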
