Final Exam Solutions
Vandenberghe 12/12/00
EE103
SOLUTION.
1. Cancellation. We are subtracting two numbers that are almost equal. There is no error in the first number ((x + y)/2 = 1.005), but there is an error of about 0.01% in the second (√(xy) = 1.00498 . . . is rounded to 1.005). Cancellation causes the relative error in the result to be much larger (more than 100%).
2. Multiply and divide by (x + y)/2 + √(xy):

(x + y)/2 − √(xy) = ((x − y)/2)² / ( (x + y)/2 + √(xy) ),

which can be evaluated without cancellation.
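As a sketch (with the hypothetical inputs x = 1 and y = 1.01, chosen to be consistent with the values quoted above, and with 4-significant-digit rounding simulated in software):

```python
import math

x, y = 1.0, 1.01  # hypothetical inputs consistent with the values quoted above

def round4(t):
    # simulate arithmetic with 4 significant decimal digits
    return float(f"{t:.4g}")

# Direct evaluation: both terms round to 1.005, so the difference cancels.
naive = round4((x + y) / 2) - round4(math.sqrt(x * y))

# Multiply and divide by (x + y)/2 + sqrt(xy): no subtraction of close numbers.
stable = ((x - y) / 2) ** 2 / ((x + y) / 2 + math.sqrt(x * y))

print(naive)   # 0.0 (relative error more than 100%)
print(stable)  # about 1.2438e-05
```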
Problem 2. (20 points)
A matrix C is defined as
C = Auv T B
where A ∈ Rn×n , B ∈ Rn×n , u ∈ Rn , and v ∈ Rn . The product on the right hand side
can be evaluated in many different ways, e.g., as A(u(v T B)) or as A((uv T )B), etc. What is
the fastest method (i.e., requiring the least number of flops) when n is large? Explain your
answer.
SOLUTION. The fastest method is to evaluate the product as (Au)(vᵀB):

• 2n² flops for the matrix-vector product Au.
• 2n² flops for the vector-matrix product vᵀB.
• n² flops for the outer product (Au)(vᵀB) (or 2n² depending on how you calculate it).

The total is 5n² (or 6n²). All other methods include n³ terms because we have at least one product of two n × n matrices.
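A quick numerical check of the two evaluation orders (a sketch; the size n = 200 and the random data are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
A, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))
u, v = rng.standard_normal(n), rng.standard_normal(n)

# Fast order: two matrix-vector products (~2n^2 flops each), then an
# outer product (~n^2 flops): about 5n^2 flops in total.
C_fast = np.outer(A @ u, v @ B)

# Slow order: forming u v^T first forces a product of two n x n
# matrices (~2n^3 flops).
C_slow = A @ (np.outer(u, v) @ B)

print(np.allclose(C_fast, C_slow))  # True: same matrix, very different cost
```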
Problem 3.
You are given a nonsingular matrix A, and you would like to obtain an idea of the condition
number κ(A) without calculating the exact value of κ(A). To estimate the condition number
you evaluate the matrix-vector product Ax for a few different values of x, and calculate the
norm of x and the norm of Ax. The following table shows the results for four vectors x(1) ,
x(2) , x(3) , x(4) .
          ‖x(i) ‖     ‖Ax(i) ‖
 i = 1       1          100
 i = 2      100          1
 i = 3      10³         10⁴
 i = 4      10⁻³        10²
What are the best (i.e., largest) lower bounds on ‖A‖, ‖A⁻¹ ‖ and κ(A) that you can derive
based on this information?
SOLUTION.
• ‖A‖ ≥ ‖Ax‖/‖x‖ for all x ≠ 0. The best bound is ‖A‖ ≥ 10⁵ (by taking x = x(4) : ‖Ax(4) ‖/‖x(4) ‖ = 10²/10⁻³ = 10⁵).
• ‖A⁻¹ ‖ ≥ ‖A⁻¹ y‖/‖y‖ for all y ≠ 0. The best bound is ‖A⁻¹ ‖ ≥ 100 (by taking y = Ax(2) and A⁻¹ y = x(2) ).
• κ(A) = ‖A‖ ‖A⁻¹ ‖ ≥ 10⁷ (by combining the bounds for ‖A‖ and ‖A⁻¹ ‖).
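To illustrate the bounds, here is a sketch with a hypothetical diagonal matrix whose true condition number is known (1e7), probed with a few vectors playing the role of the x(i) in the table:

```python
import numpy as np

# Hypothetical diagonal matrix; its true condition number is 1e5 / 1e-2 = 1e7.
A = np.diag([1e5, 1.0, 1e-2])
probes = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])]

# ||A|| >= ||Ax|| / ||x|| for any x != 0.
norm_A_lb = max(np.linalg.norm(A @ x) / np.linalg.norm(x) for x in probes)

# ||A^-1|| >= ||A^-1 y|| / ||y||; take y = Ax, so that A^-1 y = x.
norm_Ainv_lb = max(np.linalg.norm(x) / np.linalg.norm(A @ x) for x in probes)

kappa_lb = norm_A_lb * norm_Ainv_lb
print(norm_A_lb, norm_Ainv_lb, kappa_lb)
```

With these probes the lower bound 1e7 is tight; with unlucky probe vectors it can be far below the true condition number.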
Problem 4. (20 points)
1. Consider the set of linear equations

y + Ax = b,    Aᵀ y − x = c,        (1)

with variables x and y. Show that the coefficient matrix

[ I  A; Aᵀ  −I ]

is nonsingular, regardless of the rank and the dimensions of A. Hint. Show that the
matrix I + AᵀA is nonsingular.
2. We can conclude from part 1 that the solution x̄, ȳ of (1) is unique. Show that x̄
minimizes
‖Ax − b‖² + ‖x + c‖².
SOLUTION.
1. We have to show that y + Ax = 0 and Aᵀ y − x = 0 are only possible if x = 0 and y = 0. Eliminating y from the first equation (y = −Ax) and substituting in the second yields

−(Aᵀ A + I)x = 0.

This will imply that x = 0 if we can show that Aᵀ A + I is nonsingular, i.e., that (Aᵀ A + I)x = 0 implies x = 0. We have

xᵀ (Aᵀ A + I)x = ‖Ax‖² + ‖x‖²,

so (Aᵀ A + I)x = 0 implies ‖Ax‖² + ‖x‖² = 0. (Both terms ‖Ax‖² and ‖x‖² must be zero, and that is only possible if x = 0.) Finally, if x = 0 then y = −Ax means y = 0. We conclude that y + Ax = 0 and Aᵀ y − x = 0 imply x = 0 and y = 0, and therefore that the coefficient matrix is nonsingular.
2. The solution x̄, ȳ of (1) satisfies

ȳ + Ax̄ = b
Aᵀ ȳ − x̄ = c.

Eliminating ȳ from the first equation (ȳ = b − Ax̄) and substituting in the second yields

Aᵀ (b − Ax̄) − x̄ = c,
i.e., x̄ satisfies
(Aᵀ A + I)x̄ = Aᵀ b − c.        (2)
On the other hand, we can express the minimization problem in the problem statement
as a least-squares problem
minimize ‖Ãx − b̃‖²

where

Ã = [ A; I ],    b̃ = [ b; −c ]

(A stacked on top of the identity matrix, and b on top of −c).
The normal equations are Ãᵀ Ãx = Ãᵀ b̃, which reduce to

(Aᵀ A + I)x = Aᵀ b − c,
if we substitute the definition of à and b̃. Comparing with (2), we can conclude that
the solution is equal to x̄.
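A small numerical check of this equivalence (a sketch with arbitrary random data):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 8, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
c = rng.standard_normal(n)

# Stacked least-squares formulation: minimize ||A_tilde x - b_tilde||^2.
A_tilde = np.vstack([A, np.eye(n)])
b_tilde = np.concatenate([b, -c])
x_ls = np.linalg.lstsq(A_tilde, b_tilde, rcond=None)[0]

# Normal equations (2): (A^T A + I) x = A^T b - c.
x_ne = np.linalg.solve(A.T @ A + np.eye(n), A.T @ b - c)

print(np.allclose(x_ls, x_ne))  # True: both give the same minimizer
```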
Problem 5.
Consider the set of linear equations

Ax + By = b        (3)
where b ∈ Rp , A ∈ Rp×p , and B ∈ Rp×q are given. The variables are x ∈ Rp and y ∈ Rq .
We assume that q < p, that A is nonsingular, and that B is full rank (i.e., Rank B = q).
The equations are underdetermined, so there are infinitely many solutions. For example, we
can pick any y, and solve the set of linear equations Ax = b − By to find x.
Below we define four solutions that minimize some measure of the magnitude of x, or y, or
both. For each of these solutions, describe the factorizations (QR, Cholesky, or LU with
partial pivoting) that you would use to calculate x and y. Clearly specify the matrices that
you factorize, and the type of factorization. If you know several methods, you should give
the most efficient one (least number of flops for large p, q).
1. The solution x, y with the smallest value of ‖x‖² + ‖y‖², i.e., the smallest value of ‖(x, y)‖².
2. The solution x, y with the smallest value of ‖x‖² + 2‖y‖².
3. The solution x, y with the smallest value of ‖y‖.
4. The solution x, y with the smallest value of ‖x‖.
SOLUTION.
1. This is the least-norm solution of the underdetermined set of equations
[ A  B ] [ x; y ] = b.
We can calculate the least-norm solution from the QR factorization of
[ Aᵀ; Bᵀ ]
which is full rank because A is nonsingular. The cost is 2(p + q)p2 . (Note that you
didn’t have to give flop counts. We will include them here for completeness.)
We can also get the solution from the Cholesky factorization of
[ A  B ] [ Aᵀ; Bᵀ ] = AAᵀ + BBᵀ.
The cost would be (p+q)p2 to form the matrix, plus p3 /3 for the Cholesky factorization.
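The QR-based least-norm computation can be sketched as follows (the dimensions p = 4, q = 2 and the random data are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 4, 2  # arbitrary small dimensions for the sketch
A = rng.standard_normal((p, p))
B = rng.standard_normal((p, q))
b = rng.standard_normal(p)

M = np.hstack([A, B])        # [A B], p x (p+q), full row rank since A is nonsingular
Q, R = np.linalg.qr(M.T)     # QR factorization of [A^T; B^T]
z = np.linalg.solve(R.T, b)  # forward substitution: R^T z = b
xy = Q @ z                   # least-norm solution [x; y] = Q R^{-T} b
x, y = xy[:p], xy[p:]

print(np.allclose(A @ x + B @ y, b))  # True: (x, y) satisfies the equations
```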
2. We can formulate this as a least-norm problem by the change of variables ỹ = √2 y, or y = ỹ/√2. We find x and ỹ by calculating the solution of

Ax + (1/√2)B ỹ = b

with the smallest norm ‖x‖² + ‖ỹ‖². We calculate the least-norm solution from the
QR factorization of the matrix
[ Aᵀ; (1/√2)Bᵀ ]
(which is full rank because A is nonsingular). The cost is 2(p + q)p2 . Alternatively we
can use the Cholesky factorization of
[ A  (1/√2)B ] [ Aᵀ; (1/√2)Bᵀ ] = AAᵀ + (1/2)BBᵀ.
The cost would be (p+q)p2 to form the matrix, plus p3 /3 for the Cholesky factorization.
3. We set y = 0 and solve Ax = b using the LU factorization of A (2p3 /3 flops).
4. x and y must satisfy x = A⁻¹ b − A⁻¹ By. We can therefore minimize ‖x‖² by solving
the least-squares problem

minimize ‖A⁻¹ b − A⁻¹ By‖²
with variable y. Before we can solve this LS problem we need to calculate A−1 b and
the matrix A−1 B. The most efficient method is to take the LU factorization of A and
then solve q + 1 sets of linear equations
Az0 = b,   Az1 = b1 ,   . . . ,   Azq = bq
where bi denotes the ith column of B. The result is A−1 b = z0 and
A−1 B = [z1 z2 · · · zq ].
The total cost of calculating A−1 b and A−1 B using this method is 2p3 /3 for the fac-
torization of A and 2(q + 1)p2 for the forward and backward substitutions to find the
vectors zi .
We then solve the least-squares problem

minimize ‖A⁻¹ b − A⁻¹ By‖²

by taking the QR factorization of A⁻¹ B, which costs 2pq² flops. We know that A⁻¹ B
is full rank, because B is full rank (A−1 By = 0 implies By = 0, which implies y = 0.)
We can also use the Cholesky factorization of (A⁻¹ B)ᵀ(A⁻¹ B). The cost would be pq²
to form the matrix, plus q³ /3 for the factorization.
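Method 4 can be sketched as follows (numpy's `solve` with a block right-hand side stands in for an explicit LU factorization with q + 1 back-substitutions; the data are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
p, q = 4, 2
A = rng.standard_normal((p, p))
B = rng.standard_normal((p, q))
b = rng.standard_normal(p)

# Factor A once and solve q + 1 right-hand sides: [b, b_1, ..., b_q].
Z = np.linalg.solve(A, np.column_stack([b, B]))  # columns: A^{-1}b and A^{-1}B
Ainv_b, Ainv_B = Z[:, 0], Z[:, 1:]

# Minimize ||A^{-1}b - A^{-1}B y||^2 over y (lstsq uses a QR-type factorization).
y = np.linalg.lstsq(Ainv_B, Ainv_b, rcond=None)[0]
x = Ainv_b - Ainv_B @ y

print(np.allclose(A @ x + B @ y, b))  # True: (x, y) solves the equations
```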
Problem 6.
Below we define three functions g that we want to minimize (see equations (4), (5), (6)). For
each of these three functions, explain which of the following three methods you would use
to minimize g: linear least-squares via QR factorization, Newton’s method, or the Gauss-
Newton method.
• If your answer is linear least-squares, you should express the problem in the standard
form of minimizing ‖Ax − b‖², and give the matrix A, the vector of variables x, and
the right hand side b.
• If your answer is Newton’s method, you should describe how you construct the Hessian
and gradient of g, and how you would choose the starting point.
• If your answer is Gauss-Newton method, you should describe how you construct the
linear least-squares problems that you have to solve at each iteration, and how you
would choose the starting point.
Note: The description of the problem is long, but all the information you need is summa-
rized in the definitions of g (i.e., the equations (4), (5), (6)). The rest of the discussion is
background and can be read quickly.
The figure shows a leveling network used to determine the vertical height (or elevation) of
three landmarks.
[Figure: leveling network — a graph with nodes 1–4 connected by edges 1–5.]
The nodes in the graph represent four points on a map. The variables in the problem are the
heights h1 , h2 , h3 of the first three points with respect to the fourth, which will be used as
reference (in other words we take h4 = 0). Each of the five edges in the graph represents a
measurement of the height difference of its two end points. For example, edge 1 from node 2
to node 1 indicates a measurement of h1 − h2 . We will denote the measured values of the
five height differences as ρ1 , ρ2 , ρ3 , ρ4 , ρ5 . Our goal is to determine h1 , h2 , and h3 from the
five measurements. We distinguish three situations.

1. In the first situation, we assume that the measurements satisfy
ρ1 = h1 − h2 + v1
ρ2 = h3 − h1 + v2
ρ3 = −h1 + v3
ρ4 = −h2 + v4
ρ5 = h2 − h3 + v5
where v1 , . . . , v5 are small and unknown measurement errors. To estimate the heights, we minimize the function

g(h1 , h2 , h3 ) = (ρ1 − h1 + h2 )² + (ρ2 − h3 + h1 )² + (ρ3 + h1 )² + (ρ4 + h2 )² + (ρ5 − h2 + h3 )².        (4)
2. In the second situation, we assume that each measurement also includes an unknown offset w:

ρ1 = h1 − h2 + v1 + w
ρ2 = h3 − h1 + v2 + w
ρ3 = −h1 + v3 + w
ρ4 = −h2 + v4 + w
ρ5 = h2 − h3 + v5 + w.
To estimate h1 , h2 , h3 , and w we minimize

g(h1 , h2 , h3 , w) = (ρ1 − h1 + h2 − w)² + (ρ2 − h3 + h1 − w)² + (ρ3 + h1 − w)² + (ρ4 + h2 − w)² + (ρ5 − h2 + h3 − w)².        (5)
3. In the third situation, we assume that the measurements are nonlinear functions of the height differences:

ρ1 = (h1 − h2 ) + α(h1 − h2 )³ + v1
ρ2 = (h3 − h1 ) + α(h3 − h1 )³ + v2
ρ3 = −h1 − αh1³ + v3
ρ4 = −h2 − αh2³ + v4
ρ5 = (h2 − h3 ) + α(h2 − h3 )³ + v5 .

To estimate the heights, we minimize the function

g(h1 , h2 , h3 ) = Σᵢ₌₁⁵ ( ρi − aᵢᵀh − α(aᵢᵀh)³ )²        (6)

where aᵢᵀh denotes the height difference measured by edge i (for example, a1ᵀh = h1 − h2 ).
The variables are h1 , h2 , and h3 . The numbers ρi and α are given. You know that α
is small.
SOLUTION.

1. This is a linear least-squares problem: minimize ‖Ah − b‖² with variable h = (h1 , h2 , h3 ), where the rows of A are a1ᵀ, . . . , a5ᵀ (a1 = (1, −1, 0), a2 = (−1, 0, 1), a3 = (−1, 0, 0), a4 = (0, −1, 0), a5 = (0, 1, −1)) and b = (ρ1 , . . . , ρ5 ). We can solve it via the QR factorization of A.

2. This is also a linear least-squares problem, with variables h1 , h2 , h3 , and w: minimize ‖Ã x̃ − b‖² where x̃ = (h1 , h2 , h3 , w), Ã = [ A  1 ] (the matrix A from part 1 with a column of ones appended), and the same right-hand side b.
3. This is a nonlinear least-squares problem. We can use Newton's method or the Gauss-Newton method. In both cases we can take as starting point the solution h from problem 1. We know that α is small, so the solutions of problems 1 and 3 are probably close. We can write g as
g(h) = Σᵢ₌₁⁵ ri (h)²

where ri (h) = aᵢᵀh + α(aᵢᵀh)³ − ρi , with a1 = (1, −1, 0), a2 = (−1, 0, 1), a3 = (−1, 0, 0), a4 = (0, −1, 0), a5 = (0, 1, −1). The gradient and Hessian of each residual are

∇ri (h) = (1 + 3α(aᵢᵀh)²)aᵢ ,    ∇²ri (h) = 6α(aᵢᵀh)aᵢ aᵢᵀ.
Now we can plug the expressions for ri (h), ∇ri (h) and ∇²ri (h) into

∇g(h) = 2 Σᵢ₌₁⁵ ri (h)∇ri (h)

∇²g(h) = 2 Σᵢ₌₁⁵ ( ri (h)∇²ri (h) + ∇ri (h)∇ri (h)ᵀ ).
In the Gauss-Newton method, at each iteration we minimize

Σᵢ₌₁⁵ ( ri (h(k) ) + ∇ri (h(k) )ᵀ(h − h(k) ) )²

where h(k) is our current guess, and we take the solution of this minimization problem as the next iterate h(k+1) . Using the expressions for ri (h) and ∇ri (h) given above, we have
which is a least-squares problem

minimize ‖A(k) h − b(k) ‖²

where
A(k) =
[ (1 + 3α(a1ᵀh(k) )²)a1ᵀ
  (1 + 3α(a2ᵀh(k) )²)a2ᵀ
  (1 + 3α(a3ᵀh(k) )²)a3ᵀ
  (1 + 3α(a4ᵀh(k) )²)a4ᵀ
  (1 + 3α(a5ᵀh(k) )²)a5ᵀ ],

b(k) =
[ 2α(a1ᵀh(k) )³ + ρ1
  2α(a2ᵀh(k) )³ + ρ2
  2α(a3ᵀh(k) )³ + ρ3
  2α(a4ᵀh(k) )³ + ρ4
  2α(a5ᵀh(k) )³ + ρ5 ].
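The Gauss-Newton iteration can be sketched as follows (the heights h_true, the value of α, and the noise-free measurements ρ are hypothetical example data, chosen only so that convergence can be checked; the exam gives no numbers):

```python
import numpy as np

# Rows a_i^T: the five edge height differences of the leveling network.
Aedges = np.array([[1., -1., 0.],
                   [-1., 0., 1.],
                   [-1., 0., 0.],
                   [0., -1., 0.],
                   [0., 1., -1.]])
alpha = 0.01                          # assumed small, as in the problem
h_true = np.array([2.0, -1.0, 0.5])   # hypothetical heights, for checking
s = Aedges @ h_true
rho = s + alpha * s**3                # noise-free measurements (assumed data)

# Starting point: the solution of the linear problem 1 (ignore the cubic term).
h = np.linalg.lstsq(Aedges, rho, rcond=None)[0]

for _ in range(10):
    s = Aedges @ h
    Ak = (1 + 3 * alpha * s**2)[:, None] * Aedges  # rows (1 + 3a(a_i^T h)^2) a_i^T
    bk = 2 * alpha * s**3 + rho                    # entries 2a(a_i^T h)^3 + rho_i
    h = np.linalg.lstsq(Ak, bk, rcond=None)[0]     # next iterate h^{(k+1)}

print(h)  # converges to h_true
```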