Chapter 3: Linear Least Squares
Lecture slides based on the textbook Scientific Computing: An Introductory Survey by Michael T. Heath, copyright © 2018 by the Society for Industrial and Applied Mathematics. http://www.siam.org/books/cl80
Data Fitting
• Polynomial fitting,
$$f(t, x) = x_1 + x_2 t + x_3 t^2 + \cdots + x_n t^{n-1},$$
is a linear data-fitting problem, since the model is linear in the parameters $x_i$, even though it is nonlinear in the independent variable $t$
• Fitting a sum of exponentials,
$$f(t, x) = x_1 e^{x_2 t} + \cdots + x_{n-1} e^{x_n t},$$
is an example of a nonlinear data-fitting problem
Example: Data Fitting
• Fitting a quadratic polynomial to five data points $(t_i, y_i)$ gives the overdetermined $5 \times 3$ linear system
$$Ax = \begin{bmatrix} 1 & t_1 & t_1^2 \\ 1 & t_2 & t_2^2 \\ 1 & t_3 & t_3^2 \\ 1 & t_4 & t_4^2 \\ 1 & t_5 & t_5^2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \cong \begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \end{bmatrix} = b$$
Example, continued
• For data

  t   −1.0   −0.5   0.0   0.5   1.0
  y    1.0    0.5   0.0   0.5   2.0
overdetermined 5 × 3 linear system is
$$Ax = \begin{bmatrix} 1 & -1.0 & 1.0 \\ 1 & -0.5 & 0.25 \\ 1 & 0.0 & 0.0 \\ 1 & 0.5 & 0.25 \\ 1 & 1.0 & 1.0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \cong \begin{bmatrix} 1.0 \\ 0.5 \\ 0.0 \\ 0.5 \\ 2.0 \end{bmatrix} = b$$
• Solution is $x \approx [0.086, \; 0.40, \; 1.4]^T$, so approximating polynomial is
$$p(t) = 0.086 + 0.40\,t + 1.4\,t^2$$
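A quick numerical check of this fit (a minimal sketch using NumPy; np.linalg.lstsq is just one standard least squares solver, not necessarily the one used in the course materials):

```python
import numpy as np

# Data points from the example
t = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
y = np.array([1.0, 0.5, 0.0, 0.5, 2.0])

# 5x3 matrix for the quadratic model x1 + x2*t + x3*t^2
A = np.column_stack([np.ones_like(t), t, t**2])

# Least squares solution minimizes ||Ax - y||_2
x, res, rank, sv = np.linalg.lstsq(A, y, rcond=None)
print(np.round(x, 3))  # [0.086 0.4   1.429]
```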
Example, continued
• Resulting curve and original data points are shown in graph (figure not reproduced here)
Normal Equations
• To minimize squared Euclidean norm of residual,
$$\phi(x) = \|r\|_2^2 = (b - Ax)^T(b - Ax),$$
set its gradient to zero, which gives
$$\nabla\phi(x) = 2A^TAx - 2A^Tb = 0$$
• This reduces to the $n \times n$ linear system of normal equations
$$A^TAx = A^Tb$$
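As an illustration (a sketch only; as discussed later, forming $A^TA$ explicitly can be numerically risky), the normal equations for the quadratic example can be solved directly:

```python
import numpy as np

t = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
b = np.array([1.0, 0.5, 0.0, 0.5, 2.0])
A = np.column_stack([np.ones_like(t), t, t**2])

# Form and solve the n x n system A^T A x = A^T b
x = np.linalg.solve(A.T @ A, A.T @ b)
print(np.round(x, 3))  # same coefficients as the fit above
```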
Orthogonality
• Vectors $v_1$ and $v_2$ are orthogonal if their inner product is zero, $v_1^T v_2 = 0$
• For least squares solution $x$, residual $r = b - Ax$ is orthogonal to column space of $A$, i.e.,
$$0 = A^T r = A^T(b - Ax)$$
which again yields normal equations
$$A^TAx = A^Tb$$
Orthogonality, continued
• Geometrically, least squares solution splits $b$ into orthogonal components $y = Ax$ in span($A$) and residual $r = b - Ax$ (diagram not reproduced here)
Orthogonal Projectors
• Matrix $P$ is orthogonal projector if it is idempotent ($P^2 = P$) and symmetric ($P^T = P$)
• With complementary projector $P_\perp = I - P$, any vector $v$ splits into orthogonal components
$$v = (P + (I - P))\,v = Pv + P_\perp v$$
• For $A$ with full column rank,
$$P = A(A^TA)^{-1}A^T$$
is orthogonal projector onto span($A$)
• Applying this splitting to $b$ gives least squares fit plus residual:
$$b = Pb + P_\perp b = Ax + (b - Ax) = y + r$$
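A small sketch (assuming NumPy) that builds $P$ for the example matrix and checks the projector properties stated above:

```python
import numpy as np

t = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
b = np.array([1.0, 0.5, 0.0, 0.5, 2.0])
A = np.column_stack([np.ones_like(t), t, t**2])

# Orthogonal projector onto span(A)
P = A @ np.linalg.inv(A.T @ A) @ A.T

print(np.allclose(P @ P, P))    # idempotent: P^2 = P
print(np.allclose(P, P.T))      # symmetric:  P^T = P

y, r = P @ b, (np.eye(5) - P) @ b
print(np.allclose(A.T @ r, 0))  # residual is orthogonal to span(A)
```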
Pseudoinverse and Condition Number
• If $A$ has full column rank, pseudoinverse of $A$ is
$$A^+ = (A^TA)^{-1}A^T$$
and least squares solution is given by $x = A^+ b$
Sensitivity and Conditioning
• Sensitivity of least squares solution depends on angle $\theta$ between $b$ and its projection $y = Ax$, defined by
$$\cos(\theta) = \frac{\|y\|_2}{\|b\|_2} = \frac{\|Ax\|_2}{\|b\|_2}$$
• Bound on perturbation of solution due to perturbation $\Delta b$ in $b$ is
$$\frac{\|\Delta x\|_2}{\|x\|_2} \leq \operatorname{cond}(A)\,\frac{1}{\cos(\theta)}\,\frac{\|\Delta b\|_2}{\|b\|_2}$$
• Similarly, for perturbation $E$ of matrix $A$,
$$\frac{\|\Delta x\|_2}{\|x\|_2} \lesssim \left([\operatorname{cond}(A)]^2 \tan(\theta) + \operatorname{cond}(A)\right)\frac{\|E\|_2}{\|A\|_2}$$
so conditioning of solution depends on $[\operatorname{cond}(A)]^2$ when residual is large
Normal Equations Method
• If $A$ has full column rank, then $A^TA$ is symmetric positive definite, so its Cholesky factorization
$$A^TA = LL^T$$
can be used to solve normal equations
$$A^TAx = A^Tb$$
Example, continued
• Cholesky factorization of symmetric positive definite matrix $A^TA$ gives
$$A^TA = \begin{bmatrix} 5.0 & 0.0 & 2.5 \\ 0.0 & 2.5 & 0.0 \\ 2.5 & 0.0 & 2.125 \end{bmatrix} = \begin{bmatrix} 2.236 & 0 & 0 \\ 0 & 1.581 & 0 \\ 1.118 & 0 & 0.935 \end{bmatrix} \begin{bmatrix} 2.236 & 0 & 1.118 \\ 0 & 1.581 & 0 \\ 0 & 0 & 0.935 \end{bmatrix} = LL^T$$
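The same factorization can be reproduced and used to solve the normal equations (a sketch using NumPy; dedicated triangular solvers would be cheaper than the generic np.linalg.solve calls used here for brevity):

```python
import numpy as np

t = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
b = np.array([1.0, 0.5, 0.0, 0.5, 2.0])
A = np.column_stack([np.ones_like(t), t, t**2])

# Lower triangular L with A^T A = L L^T
L = np.linalg.cholesky(A.T @ A)
print(np.round(L, 3))           # matches the factor shown above

# Forward substitution L z = A^T b, then back substitution L^T x = z
z = np.linalg.solve(L, A.T @ b)
x = np.linalg.solve(L.T, z)
print(np.round(x, 3))
```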
Shortcomings of Normal Equations
• Information can be lost in forming $A^TA$ and $A^Tb$: for example, take
$$A = \begin{bmatrix} 1 & 1 \\ \epsilon & 0 \\ 0 & \epsilon \end{bmatrix}$$
where $\epsilon$ is positive number smaller than $\sqrt{\epsilon_{\text{mach}}}$
• Then in floating-point arithmetic
$$A^TA = \begin{bmatrix} 1+\epsilon^2 & 1 \\ 1 & 1+\epsilon^2 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$$
which is singular
• Sensitivity of solution is also worsened, since
$$\operatorname{cond}(A^TA) = [\operatorname{cond}(A)]^2$$
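Both effects are easy to demonstrate numerically (a sketch assuming NumPy and IEEE double precision, where $\sqrt{\epsilon_{\text{mach}}} \approx 1.5 \times 10^{-8}$):

```python
import numpy as np

eps = 1e-9   # smaller than sqrt(machine epsilon) for float64
A = np.array([[1.0, 1.0],
              [eps, 0.0],
              [0.0, eps]])

# In floating point, 1 + eps**2 rounds to exactly 1
AtA = A.T @ A
print(AtA)                         # [[1. 1.], [1. 1.]]
print(np.linalg.matrix_rank(AtA))  # 1, i.e. singular

# cond(A) is about 1.4e9, so cond(A^T A) = cond(A)^2 ~ 2e18
print(np.linalg.cond(A))
```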
Orthogonalization Methods
Orthogonal Transformations
• We seek alternative method that avoids numerical difficulties of normal equations
• We need numerically robust transformation that produces easier problem without changing solution
• What kind of transformation leaves least squares solution unchanged?
• Multiplication by square orthogonal matrix $Q$ (with $Q^TQ = I$) preserves Euclidean norm, since $\|Qv\|_2^2 = v^TQ^TQv = v^Tv = \|v\|_2^2$, so it leaves least squares solution unchanged
• As with square linear systems, suitable target is triangular form: for transformed system
$$\begin{bmatrix} R \\ O \end{bmatrix} x \cong \begin{bmatrix} b_1 \\ b_2 \end{bmatrix},$$
with $R$ upper triangular, residual is
$$\|r\|_2^2 = \|b_1 - Rx\|_2^2 + \|b_2\|_2^2$$
• We have no control over second term, $\|b_2\|_2^2$, but first term becomes zero if $x$ satisfies $n \times n$ triangular system
$$Rx = b_1$$
which can be solved by back-substitution; minimum residual norm is then $\|r\|_2 = \|b_2\|_2$
QR Factorization
• Given $m \times n$ matrix $A$, with $m > n$, we seek $m \times m$ orthogonal matrix $Q$ such that
$$A = Q \begin{bmatrix} R \\ O \end{bmatrix}$$
with $R$ $n \times n$ and upper triangular
Orthogonal Bases
• If we partition $m \times m$ orthogonal matrix $Q = [Q_1 \; Q_2]$, where $Q_1$ is $m \times n$, then
$$A = Q \begin{bmatrix} R \\ O \end{bmatrix} = [Q_1 \; Q_2] \begin{bmatrix} R \\ O \end{bmatrix} = Q_1 R$$
is called reduced QR factorization of $A$; columns of $Q_1$ are orthonormal basis for span($A$)
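Putting the pieces together (a sketch using NumPy and SciPy; np.linalg.qr returns the reduced factorization $Q_1 R$ by default):

```python
import numpy as np
from scipy.linalg import solve_triangular

t = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
b = np.array([1.0, 0.5, 0.0, 0.5, 2.0])
A = np.column_stack([np.ones_like(t), t, t**2])

# Reduced QR: Q1 is 5x3 with orthonormal columns, R is 3x3 upper triangular
Q1, R = np.linalg.qr(A)

# Least squares solution from triangular system R x = Q1^T b
x = solve_triangular(R, Q1.T @ b)
print(np.round(x, 3))  # same quadratic coefficients as before
```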
Computing QR Factorization
Householder QR Factorization
Householder Transformations
• Householder transformation has form
$$H = I - 2\,\frac{vv^T}{v^Tv}$$
for nonzero vector $v$; $H$ is both orthogonal and symmetric
• Given vector $a$, we choose $v$ so that $Ha$ is zero below its first entry, which is achieved by taking
$$v = a - \alpha e_1, \qquad \alpha = \pm\|a\|_2$$
Example: Householder Transformation
• If $a = \begin{bmatrix} 2 & 1 & 2 \end{bmatrix}^T$, then we take
$$v = a - \alpha e_1 = \begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix} - \alpha \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix} - \begin{bmatrix} \alpha \\ 0 \\ 0 \end{bmatrix}$$
where $\alpha = \pm\|a\|_2 = \pm 3$
• Since $a_1$ is positive, we choose negative sign for $\alpha$ to avoid cancellation, so
$$v = \begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix} - \begin{bmatrix} -3 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 5 \\ 1 \\ 2 \end{bmatrix}$$
• To confirm that transformation works,
$$Ha = a - 2\,\frac{v^Ta}{v^Tv}\,v = \begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix} - 2\cdot\frac{15}{30}\begin{bmatrix} 5 \\ 1 \\ 2 \end{bmatrix} = \begin{bmatrix} -3 \\ 0 \\ 0 \end{bmatrix}$$
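The computation above is easy to replicate (a sketch assuming NumPy; the helper householder_vector is illustrative, not a library routine):

```python
import numpy as np

def householder_vector(a):
    """Return v and alpha with (I - 2 v v^T / v^T v) a = alpha * e1."""
    # Sign of alpha is opposite to a[0] to avoid cancellation in v[0]
    alpha = -np.linalg.norm(a) if a[0] >= 0 else np.linalg.norm(a)
    v = a.astype(float).copy()
    v[0] -= alpha
    return v, alpha

a = np.array([2.0, 1.0, 2.0])
v, alpha = householder_vector(a)
print(v, alpha)                  # [5. 1. 2.] -3.0
Ha = a - 2 * (v @ a) / (v @ v) * v
print(np.round(Ha, 12))          # [-3.  0.  0.]
```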
Householder QR Factorization
• In applying Householder transformation $H$ to arbitrary vector $u$,
$$Hu = \left(I - 2\,\frac{vv^T}{v^Tv}\right)u = u - 2\,\frac{v^Tu}{v^Tv}\,v$$
which requires only vector $v$, not matrix $H$ explicitly
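A compact sketch of the full factorization built from this update rule (illustrative code assuming NumPy; production codes typically store the vectors $v$ in the zeroed-out part of $A$ rather than discarding them):

```python
import numpy as np

def householder_qr(A):
    """Reduce A to upper triangular R, applying each H as a rank-1 update."""
    A = A.astype(float).copy()
    m, n = A.shape
    for k in range(n):
        a = A[k:, k]
        alpha = -np.linalg.norm(a) if a[0] >= 0 else np.linalg.norm(a)
        v = a.copy()
        v[0] -= alpha
        vtv = v @ v
        if vtv > 0:
            # H A = A - 2 v (v^T A) / (v^T v), applied to the trailing block
            A[k:, k:] -= (2.0 / vtv) * np.outer(v, v @ A[k:, k:])
    return np.triu(A[:n])

t = np.linspace(-1.0, 1.0, 5)
A = np.column_stack([np.ones_like(t), t, t**2])
print(np.round(householder_qr(A), 3))  # matches R from np.linalg.qr up to signs
```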
Givens QR Factorization
Givens Rotations
• Givens rotations introduce zeros one at a time
• Given vector $\begin{bmatrix} a_1 & a_2 \end{bmatrix}^T$, choose scalars $c$ and $s$ so that
$$\begin{bmatrix} c & s \\ -s & c \end{bmatrix}\begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} \alpha \\ 0 \end{bmatrix}$$
with $c^2 + s^2 = 1$, or equivalently, $\alpha = \sqrt{a_1^2 + a_2^2}$
• Previous equation can be rewritten
$$\begin{bmatrix} a_1 & a_2 \\ a_2 & -a_1 \end{bmatrix}\begin{bmatrix} c \\ s \end{bmatrix} = \begin{bmatrix} \alpha \\ 0 \end{bmatrix}$$
• Finally, $c^2 + s^2 = 1$, or $\alpha = \sqrt{a_1^2 + a_2^2}$, implies
$$c = \frac{a_1}{\sqrt{a_1^2 + a_2^2}}, \qquad s = \frac{a_2}{\sqrt{a_1^2 + a_2^2}}$$
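These formulas translate directly into code (a sketch; np.hypot evaluates $\sqrt{a_1^2 + a_2^2}$ while guarding against overflow, which robust implementations also do):

```python
import numpy as np

def givens(a1, a2):
    """Return c, s with c^2 + s^2 = 1 such that the rotation zeros a2."""
    alpha = np.hypot(a1, a2)
    if alpha == 0.0:
        return 1.0, 0.0   # nothing to rotate
    return a1 / alpha, a2 / alpha

c, s = givens(4.0, 3.0)
G = np.array([[c, s], [-s, c]])
print(G @ np.array([4.0, 3.0]))  # [5. 0.]
```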
Gram-Schmidt QR Factorization
Gram-Schmidt Orthogonalization
• Given vectors $a_1$ and $a_2$, we seek orthonormal vectors $q_1$ and $q_2$ having same span
• This can be accomplished by subtracting from second vector its projection onto first vector and normalizing both resulting vectors, as shown in diagram (not reproduced here)
Modified Gram-Schmidt
for k = 1 to n
    r_kk = ||a_k||_2
    q_k = a_k / r_kk
    for j = k + 1 to n
        r_kj = q_k^T a_j
        a_j = a_j - r_kj q_k
    end
end
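The pseudocode maps directly to a working routine (a sketch assuming NumPy; columns of A are overwritten in place, just as in the loop above):

```python
import numpy as np

def modified_gram_schmidt(A):
    """Modified Gram-Schmidt QR: Q has orthonormal columns, R is upper triangular."""
    A = A.astype(float).copy()
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        R[k, k] = np.linalg.norm(A[:, k])
        Q[:, k] = A[:, k] / R[k, k]
        for j in range(k + 1, n):
            R[k, j] = Q[:, k] @ A[:, j]
            A[:, j] -= R[k, j] * Q[:, k]
    return Q, R

t = np.linspace(-1.0, 1.0, 5)
A = np.column_stack([np.ones_like(t), t, t**2])
Q, R = modified_gram_schmidt(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(3)))  # True True
```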
Rank Deficiency
Rank Deficiency
• Consider $3 \times 2$ matrix
$$A = \begin{bmatrix} 0.641 & 0.242 \\ 0.321 & 0.121 \\ 0.962 & 0.363 \end{bmatrix}$$
whose columns are nearly linearly dependent
• Computing QR factorization,
$$R = \begin{bmatrix} 1.1997 & 0.4527 \\ 0 & 0.0002 \end{bmatrix}$$
• $R$ is extremely close to singular, so for practical purposes rank($A$) is 1 rather than 2
Singular Value Decomposition
• Singular value decomposition (SVD) of $m \times n$ matrix $A$ has form
$$A = U\Sigma V^T$$
where $U$ and $V$ are orthogonal and $\Sigma$ is diagonal with nonnegative diagonal entries $\sigma_i$, the singular values of $A$
Example: SVD
• SVD of
$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \\ 10 & 11 & 12 \end{bmatrix}$$
is given by $U\Sigma V^T$ with
$$U = \begin{bmatrix} .141 & .825 & -.420 & -.351 \\ .344 & .426 & .298 & .782 \\ .547 & .0278 & .664 & -.509 \\ .750 & -.371 & -.542 & .0790 \end{bmatrix}, \quad
\Sigma = \begin{bmatrix} 25.5 & 0 & 0 \\ 0 & 1.29 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad
V^T = \begin{bmatrix} .504 & .574 & .644 \\ -.761 & -.057 & .646 \\ .408 & -.816 & .408 \end{bmatrix}$$
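This decomposition can be reproduced with NumPy (a sketch; the third singular value comes out at roundoff level rather than exactly zero, and columns of $U$ and $V$ are determined only up to sign):

```python
import numpy as np

A = np.arange(1.0, 13.0).reshape(4, 3)

# full_matrices=True gives the 4x4 U and 3x3 V^T shown above
U, s, Vt = np.linalg.svd(A, full_matrices=True)
print(np.round(s, 2))  # [25.46  1.29  0.  ]
```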
Applications of SVD
• Euclidean condition number of matrix:
$$\operatorname{cond}_2(A) = \frac{\sigma_{\max}}{\sigma_{\min}}$$
Pseudoinverse
• For $A$ with SVD $A = U\Sigma V^T$, pseudoinverse is
$$A^+ = V\Sigma^+ U^T$$
where $\Sigma^+$ has reciprocals of nonzero singular values on its diagonal, and zeros elsewhere
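A sketch of this construction (assuming NumPy; the tolerance choice here mirrors common practice of cutting off singular values near roundoff level):

```python
import numpy as np

A = np.arange(1.0, 13.0).reshape(4, 3)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Sigma^+: reciprocate singular values above a tolerance, zero the rest
tol = max(A.shape) * np.finfo(float).eps * s[0]
s_plus = np.zeros_like(s)
s_plus[s > tol] = 1.0 / s[s > tol]

A_plus = Vt.T @ np.diag(s_plus) @ U.T
print(np.allclose(A_plus, np.linalg.pinv(A)))  # agrees with NumPy's pinv
```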
Lower-Rank Matrix Approximation
• $A = U\Sigma V^T = \sigma_1 E_1 + \sigma_2 E_2 + \cdots + \sigma_n E_n$, with $E_i = u_i v_i^T$
• $E_i$ has rank 1 and can be stored using only $m + n$ storage locations
• Product $E_i x$ can be computed using only $m + n$ multiplications
• Condensed approximation to $A$ is obtained by omitting from summation terms corresponding to small singular values
• Approximation using $k$ largest singular values is closest matrix of rank $k$ to $A$
• Approximation is useful in image processing, data compression, information retrieval, cryptography, etc.
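A sketch of the truncated expansion (assuming NumPy; by the best-rank-$k$ property stated above, the 2-norm error of the rank-$k$ approximation equals the next singular value $\sigma_{k+1}$):

```python
import numpy as np

def rank_k_approx(A, k):
    """Best rank-k approximation of A via its k largest singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

A = np.arange(1.0, 13.0).reshape(4, 3)
A1 = rank_k_approx(A, 1)
print(np.round(A1, 2))
print(np.linalg.norm(A - A1, 2))  # ~1.29, the second singular value
```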
Total Least Squares
• Ordinary least squares applies when right-hand side $b$ is subject to error but matrix $A$ is known accurately
• When all data, including $A$, are subject to error, then total least squares is more appropriate
Comparison of Methods