
Math 2050 Fall 2022

Homework 4: Solutions

1 SVD of simple matrices

1. Consider the matrix
$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$
a) (3 points) Find the range, nullspace, and rank of A.


Solution: Range(A) = span{e1, e2} ⊆ R^4, where e1, e2 are the first two standard unit vectors of R^4.
Null(A) = span{e3}, where e3 is the third standard unit vector of R^3.
Rank(A) = 2.
b) (2 points) Find an SVD of A.
Solution: We have
$$A^T A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$
A^T A has eigenvalues 4 and 1 with corresponding eigenvectors e2 and e1, so A has singular values σ1 = 2 and σ2 = 1. Therefore
$$A = 2\,u_1 e_2^T + 1 \cdot u_2 e_1^T,$$
with u1 = Ae2/2 = (0, −1, 0, 0)^T and u2 = Ae1/1 = (1, 0, 0, 0)^T. In conclusion,
$$A = 2 \begin{bmatrix} 0 \\ -1 \\ 0 \\ 0 \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 \end{bmatrix} + 1 \cdot \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \end{bmatrix}.$$
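As a quick numerical sanity check of this factorization, here is a short sketch assuming NumPy is available (`np.linalg.svd` may flip the signs of corresponding u_i and v_i together, which yields an equally valid SVD):

```python
import numpy as np

A = np.array([[1, 0, 0],
              [0, -2, 0],
              [0, 0, 0],
              [0, 0, 0]], dtype=float)

U, s, Vt = np.linalg.svd(A)
print(s)  # expected: [2. 1. 0.]

# Rebuild A from its two nonzero singular triplets.
A_rebuilt = s[0] * np.outer(U[:, 0], Vt[0]) + s[1] * np.outer(U[:, 1], Vt[1])
print(np.allclose(A, A_rebuilt))  # True
```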

c) (2 points) Determine the set of solutions to the linear equation Ax = y, with
$$y = \begin{bmatrix} 2 \\ 3 \\ 0 \\ 0 \end{bmatrix}, \qquad y = \begin{bmatrix} 2 \\ 0 \\ 0 \\ 3 \end{bmatrix}.$$


  
Solution: When y = (2, 3, 0, 0)^T, we need x1 = 2 and −2x2 = 3, while x3 is free since Null(A) = span{e3}. The set of solutions is
$$\left\{ \begin{bmatrix} 2 \\ -3/2 \\ 0 \end{bmatrix} + t \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} : t \in \mathbb{R} \right\}.$$
When y = (2, 0, 0, 3)^T, the equation has no solution: the last two rows of A are zero, so every vector in the range of A has last component 0, while y4 = 3 ≠ 0.
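To check solvability numerically, one can run least squares on both right-hand sides; a sketch assuming NumPy, where an exact solution exists precisely when the fitted residual vanishes:

```python
import numpy as np

A = np.array([[1, 0, 0], [0, -2, 0], [0, 0, 0], [0, 0, 0]], dtype=float)

for y in (np.array([2., 3., 0., 0.]), np.array([2., 0., 0., 3.])):
    x, *_ = np.linalg.lstsq(A, y, rcond=None)
    print(x, "exact" if np.allclose(A @ x, y) else "no exact solution")
# First y:  x = [2, -1.5, 0], the minimum-norm member of the solution set.
# Second y: no x can produce y4 = 3, so only a least-squares fit exists.
```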
2. Consider the 2 × 2 matrix
$$A = \frac{1}{\sqrt{10}} \begin{bmatrix} 2 \\ 1 \end{bmatrix} \begin{bmatrix} 1 & -1 \end{bmatrix} + \frac{2}{\sqrt{10}} \begin{bmatrix} -1 \\ 2 \end{bmatrix} \begin{bmatrix} 1 & 1 \end{bmatrix}.$$

a) (2 points) What is an SVD of A? Express it as A = USV^T, with S the diagonal matrix of singular values ordered in decreasing fashion. Make sure to check all the properties required for U, S, V.
Solution: We have:
$$A = 1 \cdot \begin{bmatrix} 2/\sqrt{5} \\ 1/\sqrt{5} \end{bmatrix} \begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} \end{bmatrix} + 2 \cdot \begin{bmatrix} -1/\sqrt{5} \\ 2/\sqrt{5} \end{bmatrix} \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}.$$
Because
$$\begin{bmatrix} 2/\sqrt{5} & 1/\sqrt{5} \end{bmatrix} \begin{bmatrix} -1/\sqrt{5} \\ 2/\sqrt{5} \end{bmatrix} = 0 \quad \text{and} \quad \begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} \end{bmatrix} \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix} = 0,$$
the left factors form an orthonormal pair, as do the right factors, so we have orthogonal matrices U and V such that
$$A = USV^T, \quad \text{where } U = \begin{bmatrix} -1/\sqrt{5} & 2/\sqrt{5} \\ 2/\sqrt{5} & 1/\sqrt{5} \end{bmatrix}, \quad S = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}, \quad V = \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ 1/\sqrt{2} & -1/\sqrt{2} \end{bmatrix}.$$
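A minimal numerical check of these factors, assuming NumPy (up to simultaneous sign flips of corresponding columns of U and V, which give an equally valid SVD):

```python
import numpy as np

# Build A from its two rank-one terms, then verify A = U S V^T.
A = (1/np.sqrt(10)) * np.outer([2, 1], [1, -1]) \
  + (2/np.sqrt(10)) * np.outer([-1, 2], [1, 1])

U = np.array([[-1., 2.], [2., 1.]]) / np.sqrt(5)
S = np.diag([2., 1.])
V = np.array([[1., 1.], [1., -1.]]) / np.sqrt(2)

print(np.allclose(A, U @ S @ V.T))      # True
print(np.allclose(U.T @ U, np.eye(2)))  # True: U is orthogonal
print(np.allclose(V.T @ V, np.eye(2)))  # True: V is orthogonal
```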
b) (2 points) Find the semi-axis lengths and principal axes (minimum and maximum distance and associated directions from E to the center) of the ellipsoid:

E(A) = {Ax : x ∈ R2 , ∥x∥2 ≤ 1}.

Hint: Use the SVD of A to show that every element of E(A) is of the form y = U ȳ for some element
ȳ ∈ E(S). That is, E(A) = {U ȳ : ȳ ∈ E(S)}. (In other words the matrix U maps E(S) into the set
E(A).) Then analyze the geometry of the simpler set E(S).
Solution: We have, for every x, y := Ax = US(V^T x), hence y = Uȳ, with ȳ = Sx̄ and x̄ = V^T x. Since V is orthogonal, ∥x̄∥2 = ∥x∥2; in fact, when x runs over the unit Euclidean ball, so does x̄. Thus every element of E(A) is of the form y = Uȳ for some element ȳ in E(S). To analyze E(A) it suffices to analyze E(S) and then transform the points of the latter set via the mapping ȳ → Uȳ.
Since
$$E(S) = \left\{ \sigma_1 \bar x_1 e_1 + \sigma_2 \bar x_2 e_2 : \bar x_1^2 + \bar x_2^2 \le 1 \right\} = \left\{ \begin{bmatrix} \tilde x_1 \\ \tilde x_2 \end{bmatrix} : \frac{\tilde x_1^2}{\sigma_1^2} + \frac{\tilde x_2^2}{\sigma_2^2} \le 1 \right\},$$
with e1, e2 the unit vectors, we have
$$E(A) = \left\{ U(\sigma_1 \bar x_1 e_1 + \sigma_2 \bar x_2 e_2) : \bar x_1^2 + \bar x_2^2 \le 1 \right\} = \left\{ \sigma_1 \bar x_1 u_1 + \sigma_2 \bar x_2 u_2 : \bar x_1^2 + \bar x_2^2 \le 1 \right\}.$$
In the coordinate system defined by the orthonormal basis (u1 , u2 ) the set is an ellipsoid with semi-axis
lengths (σ1 , σ2 ), and principal axes given by the coordinate axes. In the original system the principal axes
are u1 , u2 .
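The geometry can be checked by sampling: a sketch assuming NumPy, mapping the unit circle through A and reading off the extreme values of ∥Ax∥2, which should be the semi-axis lengths σ1 = 2 and σ2 = 1:

```python
import numpy as np

A = (1/np.sqrt(10)) * np.outer([2, 1], [1, -1]) \
  + (2/np.sqrt(10)) * np.outer([-1, 2], [1, 1])

theta = np.linspace(0, 2*np.pi, 10001)
X = np.vstack([np.cos(theta), np.sin(theta)])  # points with ||x||_2 = 1
norms = np.linalg.norm(A @ X, axis=0)

print(norms.max(), norms.min())  # approx. 2.0 and 1.0
```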
i) What is the set E(Ã) when we append a zero column after the last column of A, that is, A is replaced with Ã = (A, 0) ∈ R^{2×3}?
Solution: Now the unit ball in R^3 is mapped to an ellipse in R^2:
$$E(\tilde A) = \left\{ \sigma_1 \bar x_1 u_1 + \sigma_2 \bar x_2 u_2 : \bar x_1^2 + \bar x_2^2 + \bar x_3^2 \le 1 \right\}.$$
Since x̄3 does not affect the image, this is the same ellipse as E(A).

ii) Same question when we append a zero row after the last row of A, that is, A is replaced with Ã = [A^T, 0]^T ∈ R^{3×2}. Interpret your result geometrically.
Solution: Now Ã maps x ∈ R^2 to an ellipse in R^3. The ellipse lies in the plane spanned by the first two coordinate axes (the plane x3 = 0) and has the same semi-axis lengths (σ1, σ2):
$$E(\tilde A) = \left\{ \sigma_1 \bar x_1 \tilde u_1 + \sigma_2 \bar x_2 \tilde u_2 : \bar x_1^2 + \bar x_2^2 \le 1 \right\},$$
where
$$\tilde u_1 = \begin{bmatrix} u_1 \\ 0 \end{bmatrix}, \qquad \tilde u_2 = \begin{bmatrix} u_2 \\ 0 \end{bmatrix}.$$
Geometrically, appending a zero row embeds the same ellipse E(A) into R^3.

2 Rank and SVD

The image above shows a 256 × 256 matrix A of pixel values. The lines indicate +1 values; at each
intersection of lines, the corresponding matrix element is +2. All the other elements are zero.
a) (3 points) Show that for some permutation matrices P, Q, the permuted matrix B := P AQ has the
symmetric form B = pq T + qpT , for two vectors p, q. Determine P, Q, B and p, q.
Solution: Let’s create a matrix B by swapping row 50 and row 150 of the matrix A. To do that we consider the permutation matrix P, obtained by swapping row 50 and row 150 of the 256 × 256 identity matrix (so row 50 of P is e_{150}^T and row 150 is e_{50}^T). Taking Q = I_{256}, we get B = PAQ = PA, and B = pq^T + qp^T, where
$$p = (0_{99},\, 1,\, 0_{49},\, 1,\, 0_{49},\, 1,\, 0_{56})^T$$
(ones in entries 100, 150, and 200, with 0_k denoting a block of k zeros), and q = 1_{256}, the vector of all ones.
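To illustrate the structure numerically, here is a sketch assuming NumPy that builds B = pq^T + qp^T from the p and q above and checks its symmetry and the +1/+2 pattern:

```python
import numpy as np

n = 256
q = np.ones(n)
p = np.zeros(n)
p[[99, 149, 199]] = 1.0  # 0-indexed entries 100, 150, 200

B = np.outer(p, q) + np.outer(q, p)

print(np.allclose(B, B.T))       # True: B is symmetric
print(np.linalg.matrix_rank(B))  # 2
print(B[99].min(), B[99, 149])   # 1.0 on a line, 2.0 at a crossing
```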
b) (3 points) What is the rank of A? Hint: find the nullspace of B.
Solution: Consider the system of equations Bx = 0, i.e. (pq^T + qp^T)x = 0, or (q^T x)p + (p^T x)q = 0. Because p and q are linearly independent, this forces p^T x = 0 and q^T x = 0. Therefore the null space of B is the orthogonal complement of span{p, q}, so dim(N(B)) = 256 − 2 = 254.
We have rank(A) = rank(B) = 2.
c) (3 points) Find an SVD of A.

Hint: Find an eigenvalue decomposition of B:

• Prove that p̃ + q̃ and p̃ − q̃ are eigenvectors of B, where p̃, q̃ are the normalized vectors of p, q respectively.

• These two eigenvectors correspond to the two nonzero eigenvalues, and the null-space vectors correspond to the eigenvalues equal to 0.

• Multiply by P^{-1}, Q^{-1} in the correct way to get the SVD of A; here P^{-1} = P^T, Q^{-1} = Q^T.

Solution: We have:
$$B(\tilde p + \tilde q) = (pq^T + qp^T)(\tilde p + \tilde q) = \|p\|\|q\|(\tilde p \tilde q^T + \tilde q \tilde p^T)(\tilde p + \tilde q) = \|p\|\|q\|\big(\tilde p + \tilde q + (\tilde p + \tilde q)\,\tilde p^T \tilde q\big)$$
$$= \|p\|\|q\|(\tilde p^T \tilde q + 1)(\tilde p + \tilde q) = \sigma_1(\tilde p + \tilde q),$$
where σ1 = ∥p∥∥q∥(p̃^T q̃ + 1), so σ1 is an eigenvalue of B with corresponding eigenvector p̃ + q̃. We can normalize p̃ + q̃ by letting v1 = (p̃ + q̃)/∥p̃ + q̃∥, so Bv1 = σ1 v1.
Similarly, we have v2 = (p̃ − q̃)/∥p̃ − q̃∥ and Bv2 = σ2 v2, where σ2 = ∥p∥∥q∥(p̃^T q̃ − 1) is an eigenvalue of B with corresponding eigenvector v2.
Because p̃ + q̃ and p̃ − q̃ are orthogonal to each other, v1 and v2 are orthogonal to each other. Therefore:
$$B = \sigma_1 v_1 v_1^T + \sigma_2 v_2 v_2^T.$$
Thus A = P^T B = σ1 (P^T v1) v1^T + σ2 (P^T v2) v2^T = σ1 u1 v1^T + σ2 u2 v2^T, where u1 = P^T v1 and u2 = P^T v2 (it is easy to check that u1 and u2 are orthogonal). Note that σ2 < 0 since p̃^T q̃ < 1; to write this as a proper SVD, take the singular value |σ2| and absorb the sign into the left vector, i.e. A = σ1 u1 v1^T + |σ2| (−u2) v2^T.
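These eigenvalue formulas can be verified numerically; a sketch assuming NumPy, reusing p, q, and B from part a):

```python
import numpy as np

n = 256
q = np.ones(n)
p = np.zeros(n)
p[[99, 149, 199]] = 1.0
B = np.outer(p, q) + np.outer(q, p)

pt, qt = p / np.linalg.norm(p), q / np.linalg.norm(q)  # normalized p, q
s1 = np.linalg.norm(p) * np.linalg.norm(q) * (pt @ qt + 1)
s2 = np.linalg.norm(p) * np.linalg.norm(q) * (pt @ qt - 1)

v1 = (pt + qt) / np.linalg.norm(pt + qt)
v2 = (pt - qt) / np.linalg.norm(pt - qt)
print(np.allclose(B @ v1, s1 * v1))  # True
print(np.allclose(B @ v2, s2 * v2))  # True (note s2 < 0)
print(np.allclose(B, s1 * np.outer(v1, v1) + s2 * np.outer(v2, v2)))  # True
```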

3 SVD and projections

1. We consider a set of m data points xi ∈ Rn , i = 1, . . . , m. We seek to find a line in Rn such that the
sum of the squares of the distances from the points to the line is minimized. To simplify, we assume
that the line goes through the origin.
Consider a line that goes through the origin L := {tu : t ∈ R}, where u ∈ Rn is given. (You can
assume without loss of generality that ∥u∥2 = 1.) Find an expression for the projection of a given
point x on L.
Now consider the m points and find an expression for the sum of the squares of the distances from
the points to the line L.
a) (2 points) Explain how you would find the line via the SVD of the n×m matrix X = [x1 , . . . , xm ].

Solution:
We want to find the line closest to a set of points. The projection of a point x on L is (x^T u)u, and the squared distance from x to L is ∥x∥^2 − (x^T u)^2. Minimizing the sum of the squared distances is therefore equivalent to maximizing the sum of the squared projection lengths, $\sum_{i=1}^m (x_i^T u)^2 = \|X^T u\|^2$. We have:
$$\|X^T u\|^2 = u^T X X^T u = \frac{u^T X X^T u}{u^T u}$$
(because u is a unit vector).
This is the Rayleigh quotient of the matrix XX^T, so the solution is u = p1, the eigenvector of XX^T that corresponds to the largest eigenvalue.
If X = UΣV^T, then u = p1 = u1, where u1 is the first column of U, corresponding to the largest singular value.
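A small end-to-end sketch, assuming NumPy; the direction d, the number of points, and the noise level are illustrative choices, not part of the problem:

```python
import numpy as np

rng = np.random.default_rng(0)

# Points clustered around the line through the origin with direction d.
d = np.array([0.6, 0.8])
t = rng.normal(size=200)
X = np.outer(d, t) + 0.05 * rng.normal(size=(2, 200))  # n x m data matrix

U, s, Vt = np.linalg.svd(X, full_matrices=False)
u = U[:, 0]  # best line direction
print(u)     # approx. +/- [0.6, 0.8]
```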

b) (2 points) How would you address the problem without the restriction that the line has to pass
through the origin?

Solution: In the general case, let x0 be the average of the points. Shift all the points by −x0; the shifted points then have zero mean, and applying part a) to the centered data gives the direction u = u1 as before. Shifted back to the original points, the optimal line goes through x0 with direction vector u = u1.
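Continuing the sketch above (same X), centering the data handles the general case:

```python
# Subtract the centroid, then take the first left singular vector.
x0 = X.mean(axis=1, keepdims=True)  # average point, shape (n, 1)
U, s, Vt = np.linalg.svd(X - x0, full_matrices=False)
line_point, line_dir = x0.ravel(), U[:, 0]  # line: x0 + t * u1
```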

2. (4 points) Solve the same problems as previously by replacing the line by a hyperplane.
Solution: Consider a plane that goes through the origin, P := {αu + βv : α, β ∈ R}, where u, v are unit vectors orthogonal to each other.
By the same argument as before, we maximize the term ∥X^T u∥^2 + ∥X^T v∥^2. Writing W = (u v), we have ∥X^T u∥^2 + ∥X^T v∥^2 = ∥X^T W∥_F^2.
We need to find the matrix W that maximizes the Frobenius norm of Z = X^T W. We have rank(Z) ≤ 2, and it is easy to see geometrically that ∥Z∥_F ≤ ∥X^T∥_F: the i-th row of Z contains the coordinates of the projection of x_i onto the plane P, and the length of a projection is never greater than the length of the original vector.
Therefore, the problem is to find a matrix W such that the Frobenius norm of the rank-2 matrix X^T W is as close as possible to the Frobenius norm of X^T.
The solution is u = u1, v = u2, i.e. W = (u1 u2), where u1, u2 are the first two column vectors of U, corresponding to the two largest singular values of X (where X = UΣV^T).
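The same recipe in code; a sketch assuming NumPy, with an illustrative synthetic 3-D data set generated near a plane through the origin:

```python
import numpy as np

rng = np.random.default_rng(1)

# Orthonormal basis of a random plane, plus small off-plane noise.
basis = np.linalg.qr(rng.normal(size=(3, 2)))[0]
X = basis @ rng.normal(size=(2, 300)) + 0.05 * rng.normal(size=(3, 300))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
W = U[:, :2]  # spans the best-fit plane
residual = np.linalg.norm(X - W @ (W.T @ X)) / np.linalg.norm(X)
print(residual)  # small: most of the data lies in the recovered plane
```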

4 Gram-Schmidt Algorithm and QR Decomposition (10 points)

Any set of n linearly independent vectors in R^n could be used as a basis for R^n. However, certain bases could be more suitable for certain operations than others. For example, an orthonormal basis could facilitate solving linear equations.

1. (3 points) Given a square matrix A ∈ R^{n×n}, it can be represented as a product of two matrices

A = QR,

where Q is a unitary matrix (its columns form an orthonormal basis for R^n) and R is an upper-triangular matrix. For the matrix A, describe how the Gram-Schmidt process could be used to find the Q and R matrices, and apply this to
$$A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 3 & 2 \\ 1 & 2 & 4 \end{bmatrix}$$
to find a unitary matrix Q and an upper-triangular matrix R. Is the decomposition unique?
Solution: Let a_i and q_i denote the columns of A and Q, respectively. Using Gram-Schmidt, we obtain an orthonormal basis q_i for the column space of A:
$$p_1 = a_1, \qquad q_1 = \frac{p_1}{\|p_1\|_2},$$
$$p_2 = a_2 - (a_2^\top q_1)q_1, \qquad q_2 = \frac{p_2}{\|p_2\|_2},$$
$$p_3 = a_3 - (a_3^\top q_1)q_1 - (a_3^\top q_2)q_2, \qquad q_3 = \frac{p_3}{\|p_3\|_2}.$$

Rearranging terms, we have
$$a_1 = r_{11} q_1 \tag{1a}$$
$$a_i = r_{1i} q_1 + \cdots + r_{ii} q_i, \quad i = 2, \ldots, n, \tag{1b}$$

where each q_i has unit norm, and r_{ji} q_j is the projection of a_i onto the vector q_j for j < i.
Stacking the a_i horizontally into A and rewriting (1a)-(1b) in matrix notation, we obtain A = QR. For the given matrix, we have
$$A = \begin{bmatrix} -1/\sqrt{3} & 1/\sqrt{2} & -1/\sqrt{6} \\ -1/\sqrt{3} & -1/\sqrt{2} & -1/\sqrt{6} \\ -1/\sqrt{3} & 0 & 2/\sqrt{6} \end{bmatrix} \begin{bmatrix} -\sqrt{3} & -2\sqrt{3} & -7/\sqrt{3} \\ 0 & -\sqrt{2} & -1/\sqrt{2} \\ 0 & 0 & 5/\sqrt{6} \end{bmatrix}.$$

The decomposition is not unique: an equivalent factorization is A = (−Q)(−R).
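A sketch of classical Gram-Schmidt in NumPy, checked against the matrix above (a library QR such as `np.linalg.qr` may return the opposite sign convention, which is equally valid):

```python
import numpy as np

def gram_schmidt_qr(A):
    """Classical Gram-Schmidt QR for a matrix with independent columns."""
    n = A.shape[1]
    Q = np.zeros_like(A, dtype=float)
    R = np.zeros((n, n))
    for i in range(n):
        p = A[:, i].astype(float).copy()
        for j in range(i):
            R[j, i] = Q[:, j] @ A[:, i]  # projection coefficient r_ji
            p -= R[j, i] * Q[:, j]       # remove the component along q_j
        R[i, i] = np.linalg.norm(p)
        Q[:, i] = p / R[i, i]
    return Q, R

A = np.array([[1., 1., 1.], [1., 3., 2.], [1., 2., 4.]])
Q, R = gram_schmidt_qr(A)
print(np.allclose(Q @ R, A))            # True
print(np.allclose(Q.T @ Q, np.eye(3)))  # True
```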

2. (3 points) Given an invertible matrix A ∈ Rn×n and an observation vector b ∈ Rn , the solution to
the equation
Ax = b

is given as x = A^{-1} b. For the specific matrix A = QR from part 1, assume that we want to solve
$$Ax = \begin{bmatrix} 1 \\ 3 \\ 3 \end{bmatrix}.$$
By using the fact that Q is a unitary matrix, find b̄ such that Rx = b̄.

Then, given the upper-triangular matrix R and b̄, find the elements of x sequentially.
Solution: We note that Q^{-1} = Q^T. Then
$$Ax = b \implies QRx = b \implies Q^\top QRx = Rx = Q^\top b.$$
Thus
$$\bar b = Q^\top b = \begin{bmatrix} -7/\sqrt{3} \\ -\sqrt{2} \\ 2/\sqrt{6} \end{bmatrix}.$$
Given R and b̄, we can find x by back-substitution:
$$\begin{bmatrix} -\sqrt{3} & -2\sqrt{3} & -7/\sqrt{3} \\ 0 & -\sqrt{2} & -1/\sqrt{2} \\ 0 & 0 & 5/\sqrt{6} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} -7/\sqrt{3} \\ -\sqrt{2} \\ 2/\sqrt{6} \end{bmatrix}$$
$$\implies x_3 = 2/5 \implies x_2 = 4/5 \implies x_1 = -1/5 \implies x = \begin{bmatrix} -1/5 \\ 4/5 \\ 2/5 \end{bmatrix}.$$
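A minimal back-substitution sketch in NumPy (in practice `scipy.linalg.solve_triangular` does this; the helper below is illustrative):

```python
import numpy as np

def back_substitution(R, b):
    """Solve Rx = b for an invertible upper-triangular R."""
    n = R.shape[0]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - R[i, i+1:] @ x[i+1:]) / R[i, i]
    return x

A = np.array([[1., 1., 1.], [1., 3., 2.], [1., 2., 4.]])
b = np.array([1., 3., 3.])
Q, R = np.linalg.qr(A)
print(back_substitution(R, Q.T @ b))  # approx. [-0.2, 0.8, 0.4]
```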
3. (3 points) Given an invertible matrix B ∈ R^{n×n} and an observation vector c ∈ R^n, find the computational cost of finding the solution z to the equation Bz = c by using the QR decomposition of B. Assume that the Q and R matrices are available, and that adding, multiplying, and dividing scalars each take one unit of “computation”.
As an example, computing the inner product a^⊤ b is said to be O(n), where the “big O” notation indicates that the cost is proportional to the number inside the parentheses, since we have n scalar multiplications in total, one for each a_i b_i. Similarly, matrix-vector multiplication is O(n^2), since it can be viewed as computing n inner products.
Solution: We count the number of operations in back substitution. Solving the first equation,
$$r_{nn} x_n = \bar b_n,$$
takes one division. Solving each subsequent equation takes one more multiplication and one more addition than the previous. In total, we have 1 + 3 + 5 + · · · + (2n − 1) = n^2 operations, which is on the order of O(n^2).
Thus, the matrix-vector product Q^⊤ c and back substitution are both O(n^2). Given the QR decomposition of B, we can solve Bz = c in O(n^2) time.

4. (3 points) Describe how the QR decomposition can be used to find the inverse of a square matrix A.
Find the associated computational cost.
Solution: We simply solve AX = I, which is equivalent to the n systems Ax = e_i, i = 1, . . . , n, where e_i is the i-th unit vector. By part 3, each system costs O(n^2) given the QR factors, so the total computational cost is O(n^3).
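A sketch of this procedure, assuming NumPy and reusing the illustrative `back_substitution` helper from part 2's sketch: one QR factorization is shared, and each column of the inverse costs one O(n^2) triangular solve.

```python
import numpy as np

def inverse_via_qr(A):
    """Invert A by solving A x = e_i for each unit vector e_i."""
    n = A.shape[0]
    Q, R = np.linalg.qr(A)  # factor once
    X = np.zeros((n, n))
    for i, e in enumerate(np.eye(n)):
        X[:, i] = back_substitution(R, Q.T @ e)  # O(n^2) per column
    return X

A = np.array([[1., 1., 1.], [1., 3., 2.], [1., 2., 4.]])
print(np.allclose(inverse_via_qr(A) @ A, np.eye(3)))  # True
```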
