
L. Vandenberghe, ECE133A (Fall 2019)

8. Least squares

• least squares problem

• solution of a least squares problem

• solving least squares problems

Least squares problem

given A ∈ R^{m×n} and b ∈ R^m, find vector x ∈ R^n that minimizes

    ‖Ax − b‖² = ∑_{i=1}^m ( ∑_{j=1}^n A_{ij} x_j − b_i )²

• “least squares” because we minimize a sum of squares of affine functions:

    ‖Ax − b‖² = ∑_{i=1}^m r_i(x)²,       r_i(x) = ∑_{j=1}^n A_{ij} x_j − b_i

• the problem is also called the linear least squares problem



Example

        ⎡  2  0 ⎤        ⎡  1 ⎤
    A = ⎢ −1  1 ⎥ ,  b = ⎢  0 ⎥
        ⎣  0  2 ⎦        ⎣ −1 ⎦

[Figure: contour lines of f at levels f(x̂) + 1 and f(x̂) + 2 in the (x1, x2)-plane, with the minimizer x̂ marked]
• the least squares solution x̂ minimizes

    f(x) = ‖Ax − b‖² = (2x1 − 1)² + (−x1 + x2)² + (2x2 + 1)²

• to find x̂, set the derivatives with respect to x1 and x2 equal to zero:

    10x1 − 2x2 − 4 = 0,       −2x1 + 10x2 + 4 = 0

the solution is (x̂1, x̂2) = (1/3, −1/3)
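
The same minimizer can be checked numerically; below is a minimal sketch (not part of the original slides) using NumPy's least squares solver:

```python
import numpy as np

A = np.array([[2.0, 0.0], [-1.0, 1.0], [0.0, 2.0]])
b = np.array([1.0, 0.0, -1.0])

# least squares solution of  minimize ||Ax - b||^2
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x_hat)                        # approximately [ 1/3, -1/3 ]
print(np.sum((A @ x_hat - b)**2))   # minimal value f(x_hat) = 2/3
```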



Least squares and linear equations

minimize ‖Ax − b‖²

• solution of the least squares problem: any x̂ that satisfies

‖Ax̂ − b‖ ≤ ‖Ax − b‖ for all x

• r̂ = A x̂ − b is the residual vector

• if r̂ = 0, then x̂ solves the linear equation Ax = b

• if r̂ ≠ 0, then x̂ is a least squares approximate solution of the equation

• in most least squares applications, m > n and Ax = b has no solution



Column interpretation

least squares problem in terms of the columns a1, a2, . . . , an of A:

    minimize  ‖Ax − b‖² = ‖ ∑_{j=1}^n a_j x_j − b ‖²

[Figure: the vector b, its projection Ax̂ onto range(A) = span(a1, . . . , an), and the residual r̂ = Ax̂ − b]

• A x̂ is the vector in range(A) = span(a1, a2, . . . , an) closest to b


• geometric intuition suggests that r̂ = A x̂ − b is orthogonal to range(A)



Example: advertising purchases

• m demographic groups; n advertising channels


• Ai j is # impressions (views) in group i per dollar spent on ads in channel j
• x j is amount of advertising purchased in channel j
• (Ax)i is number of impressions in group i
• bi is target number of impressions in group i

Example: m = 10, n = 3, b = 10³ · 1 (a target of 1,000 impressions in every group)

[Figures: left, the columns of A (impressions per dollar in each group for channels 1, 2, 3); right, the target b and the least squares result Ax̂, impressions per group for groups 1 to 10]
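
A small numerical sketch of how this problem could be set up and solved (the matrix below is made-up illustrative data, not the data behind the figures):

```python
import numpy as np

m, n = 10, 3                              # demographic groups, channels
rng = np.random.default_rng(0)
A = rng.uniform(0.5, 2.0, size=(m, n))    # hypothetical impressions per dollar
b = 1e3 * np.ones(m)                      # target: 1,000 impressions per group

x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)   # spending per channel
print(x_hat)        # least squares advertising purchase
print(A @ x_hat)    # achieved impressions per group (approximately b)
```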
Example: illumination

• n lamps at given positions above an area divided in m regions


• Ai j is illumination in region i if lamp j is on with power 1 and other lamps are off
• x j is power of lamp j
• (Ax)i is illumination level at region i
• bi is target illumination level at region i

Example: m = 252, n = 10; figure shows position and height of each lamp
[Figure: positions of the 10 lamps over the 25m × 25m area; lamp heights are 4.0, 3.5, 6.0, 4.0, 4.0, 6.0, 5.5, 5.0, 5.0, 4.5 meters]
Example: illumination

• left: illumination pattern for equal lamp powers ( x = 1)


• right: illumination pattern for least squares solution x̂ , with b = 1

[Figures: illumination patterns over the area (intensity scale roughly 0.6 to 1.4) and histograms of region intensities, for equal powers x = 1 (left) and for the least squares solution x̂ (right)]
Outline

• least squares problem

• solution of a least squares problem

• solving least squares problems


Solution of a least squares problem

if A has linearly independent columns (is left-invertible), then the vector

    x̂ = (A^T A)^{−1} A^T b = A† b

is the unique solution of the least squares problem

    minimize ‖Ax − b‖²

• in other words, if x ≠ x̂, then ‖Ax − b‖² > ‖Ax̂ − b‖²

• recall from page 4.23 that

    A† = (A^T A)^{−1} A^T

  is called the pseudo-inverse of a left-invertible matrix
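
In code, the formula can be evaluated directly; a minimal sketch (NumPy's `pinv` plays the role of A† here):

```python
import numpy as np

A = np.array([[2.0, 0.0], [-1.0, 1.0], [0.0, 2.0]])
b = np.array([1.0, 0.0, -1.0])

# x_hat = (A^T A)^{-1} A^T b, written as the solution of A^T A x = A^T b
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# equivalently, via the pseudo-inverse A†
x_pinv = np.linalg.pinv(A) @ b
print(x_hat, x_pinv)    # both give (1/3, -1/3) for the earlier small example
```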



Proof

we show that ‖Ax − b‖² > ‖Ax̂ − b‖² for x ≠ x̂:

    ‖Ax − b‖² = ‖A(x − x̂) + (Ax̂ − b)‖²
              = ‖A(x − x̂)‖² + ‖Ax̂ − b‖²
              > ‖Ax̂ − b‖²

• the 2nd step follows from A(x − x̂) ⊥ (Ax̂ − b):

    (A(x − x̂))^T (Ax̂ − b) = (x − x̂)^T (A^T A x̂ − A^T b) = 0

• the 3rd step follows from linear independence of the columns of A:

    A(x − x̂) ≠ 0   if x ≠ x̂



Derivation from calculus

    f(x) = ‖Ax − b‖² = ∑_{i=1}^m ( ∑_{j=1}^n A_{ij} x_j − b_i )²

• partial derivative of f with respect to x_k:

    ∂f/∂x_k (x) = 2 ∑_{i=1}^m A_{ik} ( ∑_{j=1}^n A_{ij} x_j − b_i ) = 2 (A^T (Ax − b))_k

• gradient of f is

    ∇f(x) = ( ∂f/∂x_1 (x), ∂f/∂x_2 (x), . . . , ∂f/∂x_n (x) ) = 2 A^T (Ax − b)

• the minimizer x̂ of f(x) satisfies ∇f(x̂) = 2 A^T (Ax̂ − b) = 0
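
The gradient formula is easy to sanity-check against finite differences; a short sketch with arbitrary test data (assumed, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 3))
b = rng.normal(size=5)
x = rng.normal(size=3)

f = lambda z: np.sum((A @ z - b)**2)
grad = 2 * A.T @ (A @ x - b)          # analytic gradient 2 A^T (Ax - b)

eps = 1e-6                            # central finite differences
grad_fd = np.array([(f(x + eps*e) - f(x - eps*e)) / (2*eps) for e in np.eye(3)])
print(np.allclose(grad, grad_fd, atol=1e-4))   # True
```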



Geometric interpretation

the residual vector r̂ = Ax̂ − b satisfies A^T r̂ = A^T (Ax̂ − b) = 0

[Figure: the vector b, its projection Ax̂ onto range(A) = span(a1, . . . , an), and the residual r̂ = Ax̂ − b]

• the residual vector r̂ is orthogonal to every column of A; hence, to range(A)

• projection on range(A) is a matrix-vector multiplication with the matrix

    A(A^T A)^{−1} A^T = AA†
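
Both facts can be verified numerically; a sketch reusing the earlier 3 × 2 example:

```python
import numpy as np

A = np.array([[2.0, 0.0], [-1.0, 1.0], [0.0, 2.0]])
b = np.array([1.0, 0.0, -1.0])

x_hat = np.linalg.solve(A.T @ A, A.T @ b)
r_hat = A @ x_hat - b
print(A.T @ r_hat)                     # ~[0, 0]: residual orthogonal to range(A)

P = A @ np.linalg.inv(A.T @ A) @ A.T   # projection matrix A A† onto range(A)
print(np.allclose(P @ b, A @ x_hat))   # True: projecting b gives A x_hat
```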



Outline

• least squares problem

• solution of a least squares problem

• solving least squares problems


Normal equations

    A^T A x = A^T b

• these equations are called the normal equations of the least squares problem
• the coefficient matrix A^T A is the Gram matrix of A
• equivalent to ∇f(x) = 0 where f(x) = ‖Ax − b‖²
• all solutions of the least squares problem satisfy the normal equations

if A has linearly independent columns, then:

• A^T A is nonsingular
• the normal equations have a unique solution x̂ = (A^T A)^{−1} A^T b
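
Since A^T A is symmetric positive definite when A has independent columns, the normal equations can be solved with a Cholesky factorization; a minimal sketch using SciPy's `cho_factor`/`cho_solve`:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def lstsq_normal_equations(A, b):
    """Solve  minimize ||Ax - b||^2  via the normal equations A^T A x = A^T b."""
    gram = A.T @ A                   # Gram matrix (n x n, symmetric positive definite)
    c, low = cho_factor(gram)        # Cholesky factorization of A^T A
    return cho_solve((c, low), A.T @ b)

A = np.array([[2.0, 0.0], [-1.0, 1.0], [0.0, 2.0]])
b = np.array([1.0, 0.0, -1.0])
print(lstsq_normal_equations(A, b))  # [ 1/3, -1/3 ]
```

(As discussed at the end of this lecture, forming A^T A explicitly can lose accuracy; the QR factorization method below avoids it.)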



QR factorization method

rewrite least squares solution using QR factorization A = QR

    x̂ = (A^T A)^{−1} A^T b = ((QR)^T (QR))^{−1} (QR)^T b
      = (R^T Q^T Q R)^{−1} R^T Q^T b
      = (R^T R)^{−1} R^T Q^T b
      = R^{−1} R^{−T} R^T Q^T b
      = R^{−1} Q^T b

Algorithm

1. compute the QR factorization A = QR (2mn² flops if A is m × n)

2. matrix-vector product d = Q^T b (2mn flops)

3. solve Rx = d by back substitution (n² flops)

complexity: 2mn² flops
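
The three steps translate directly into code; a sketch using NumPy's reduced QR factorization and SciPy's triangular solver:

```python
import numpy as np
from scipy.linalg import solve_triangular

def lstsq_qr(A, b):
    """Least squares via QR factorization: x = R^{-1} Q^T b."""
    Q, R = np.linalg.qr(A, mode='reduced')   # step 1: A = QR            (2mn^2 flops)
    d = Q.T @ b                              # step 2: d = Q^T b         (2mn flops)
    return solve_triangular(R, d)            # step 3: back substitution (n^2 flops)

A = np.array([[3.0, -6.0], [4.0, -8.0], [0.0, 1.0]])
b = np.array([-1.0, 7.0, 2.0])
print(lstsq_qr(A, b))    # [5., 2.], matching the worked example that follows
```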


Example

 3 −6   −1 
A =  4 −8  , b =  7 
   
 
 0 1   2 
   

1. QR factorization: A = QR with

 3/5 0   
5 −10
Q =  4/5 0  , R=
 

 0 1  0 1
 

2. calculate d = QT b = (5, 2)

3. solve Rx = d      
5 −10 x1 5
=
0 1 x2 2

solution is x1 = 5, x2 = 2
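
The hand computation can be confirmed numerically (a library QR routine may return Q and R with some signs flipped, which does not change x̂):

```python
import numpy as np

A = np.array([[3.0, -6.0], [4.0, -8.0], [0.0, 1.0]])
b = np.array([-1.0, 7.0, 2.0])
Q = np.array([[3/5, 0.0], [4/5, 0.0], [0.0, 1.0]])
R = np.array([[5.0, -10.0], [0.0, 1.0]])

print(np.allclose(Q @ R, A))          # True: A = QR
print(Q.T @ b)                        # [5. 2.] = d
print(np.linalg.solve(R, Q.T @ b))    # [5. 2.] = x_hat
```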



Solving the normal equations

why not solve the normal equations

    A^T A x = A^T b

as a set of linear equations?

Example: a 3 × 2 matrix with “almost linearly dependent” columns

 1 −1   0 
A =  0 10−5 , b =  10−5 ,
   
 
 0 0   1 
   

we round intermediate results to 8 significant decimal digits



Solving the normal equations

Method 1: form the Gram matrix A^T A and solve the normal equations

    A^T A = ⎡  1         −1 ⎤ ≈ ⎡  1  −1 ⎤ ,    A^T b = ⎡     0 ⎤
            ⎣ −1  1 + 10⁻¹⁰ ⎦   ⎣ −1   1 ⎦              ⎣ 10⁻¹⁰ ⎦

after rounding, the Gram matrix is singular; hence the method fails

Method 2: QR factorization of A is

        ⎡ 1  0 ⎤
    Q = ⎢ 0  1 ⎥ ,   R = ⎡ 1    −1 ⎤
        ⎣ 0  0 ⎦         ⎣ 0  10⁻⁵ ⎦

rounding does not change any values (in this example)

• the problem with method 1 occurs when forming the Gram matrix A^T A

• the QR factorization method is more stable because it avoids forming A^T A
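
The same failure can be reproduced in code by working in single precision, which plays a role similar to the 8-digit rounding above (a sketch):

```python
import numpy as np

eps = 1e-5
A = np.array([[1, -1], [0, eps], [0, 0]], dtype=np.float32)
b = np.array([0, eps, 1], dtype=np.float32)

# method 1: normal equations; 1 + eps**2 = 1 + 1e-10 rounds to 1 in float32,
# so the computed Gram matrix is exactly singular
gram = A.T @ A
print(np.linalg.matrix_rank(gram))    # 1: rank-deficient, solving would fail

# method 2: QR factorization avoids forming A^T A
Q, R = np.linalg.qr(A)
print(np.linalg.solve(R, Q.T @ b))    # close to the exact solution (1, 1)
```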

