Chapter 5 (5.3 - 5.4) Orthogonality
(Thanks to Prof. 李建興 of the Department of Computer Science and Information Engineering, Chung Hua University, for generously providing the master copy of these lecture notes free of charge.)
5.3 Least Squares Problems
• Find a least squares curve (a linear function, a polynomial, a trigonometric polynomial, etc.) that fits a set of data points in the plane.
• The curve provides an optimal approximation in the sense that the sum of squared errors between the y values of the data points and the corresponding y values of the approximating curve is minimized.
Least Squares Solutions to
Overdetermined Systems
• A least squares problem can generally be formulated as an overdetermined linear system of equations.
• An overdetermined system is one involving more equations than unknowns.
• An overdetermined system is usually inconsistent.
• Given an m×n system Ax = b with m > n, we cannot in general expect to find a vector x ∈ Rⁿ such that Ax equals b.
• Instead, we look for a vector x ∈ Rⁿ for which Ax is "closest" to b.
Description of the Least Squares
Problems
• Given a system of equations Ax = b, where A is an m×n matrix with m > n and b ∈ Rᵐ, for each x ∈ Rⁿ we can form a residual
r(x) = b – Ax
• The distance between b and Ax is given by
||b – Ax|| = ||r(x)||
• We wish to find a vector x ∈ Rⁿ for which ||r(x)|| is minimized.
• In fact, minimizing ||r(x)|| is equivalent to minimizing ||r(x)||².
• A vector x̂ that minimizes ||r(x)||² is said to be a least squares solution to the system Ax = b.
Theorem 5.3.1
Let S be a subspace of Rᵐ. For each b ∈ Rᵐ there is a unique element p of S that is closest to b; that is,
|| b – y || > || b – p ||
for any y ≠ p in S. Furthermore, a given vector p in S will be closest to a given vector b ∈ Rᵐ if and only if b – p ∈ S⊥.
Theorem 5.3.1 proof
(1) (⇐) Since Rᵐ = S ⊕ S⊥, each b ∈ Rᵐ can be expressed uniquely as a sum b = p + z, where p ∈ S and z ∈ S⊥.
If y is another element of S, then
|| b – y ||² = || (b – p) + (p – y) ||²
Since p – y ∈ S (∵ p ∈ S and y ∈ S) and b – p ∈ S⊥,
it follows from the Pythagorean Law that
|| b – y ||² = || b – p ||² + || p – y ||²
Therefore, || b – y || > || b – p ||.
Theorem 5.3.1 proof
(2) (⇒) Proof by contradiction:
Suppose q ∈ S is such that || b – y || > || b – q || for any y ≠ q in S
(i.e., q is closest to b), but b – q ∉ S⊥.
Let p ∈ S be the vector with b – p ∈ S⊥; then q ≠ p, and
|| b – q ||² = || (b – p) + (p – q) ||²
= || b – p ||² + || p – q ||²
Therefore, || b – q || > || b – p ||,
so p would be closest to b, which is a contradiction.
• Back to our problem: let S = R(A). A vector x̂ will be a solution to the least squares problem Ax = b if and only if p = Ax̂ is the vector in R(A) that is closest to b. The vector p is said to be the projection of b onto R(A).
• From Theorem 5.3.1, b – p ∈ R(A)⊥; that is, b – p = b – Ax̂ = r(x̂) ∈ R(A)⊥. Thus x̂ is a solution to the least squares problem if and only if
r(x̂) ∈ R(A)⊥ (1)
Figure 5.3.2
• How do we find the vector x̂ satisfying (1)?
• Since r(x̂) ∈ R(A)⊥, and from Theorem 5.2.1:
R(A)⊥ = N(Aᵀ)
• A vector x̂ will be a solution to the least squares problem Ax = b if and only if
r(x̂) ∈ N(Aᵀ)
or equivalently,
0 = Aᵀr(x̂) = Aᵀ(b – Ax̂)
• To solve the least squares problem Ax = b, we must solve
AᵀAx = Aᵀb (2)
• Eq. (2) represents an n×n system of equations (note: AᵀA is n×n). These equations are called the normal equations.
• In general, it is possible to have more than one solution to the normal equations.
• If x̂ and ŷ are both solutions, then, since the projection p of b onto R(A) is unique,
Ax̂ = Aŷ = p
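The normal equations (2) can be solved directly with a linear solver. A minimal sketch using NumPy (the 3×2 matrix and right-hand side here are illustrative data, not from the text):

```python
import numpy as np

# A small overdetermined system: 3 equations, 2 unknowns (illustrative data).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# Normal equations: A^T A x = A^T b
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
print(x_hat)  # [ 5. -3.]

# The residual r(x_hat) = b - A x_hat is orthogonal to R(A),
# i.e. A^T r(x_hat) = 0, as Eq. (2) requires.
r = b - A @ x_hat
print(A.T @ r)  # ~ [0. 0.]
```

The check at the end is exactly condition (1) above: the residual of the least squares solution lies in N(Aᵀ) = R(A)⊥.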
Theorem 5.3.2
If A is an m×n matrix of rank n, the normal equations
AᵀAx = Aᵀb
have a unique solution
x̂ = (AᵀA)⁻¹Aᵀb
and x̂ is the unique least squares solution to the problem Ax = b.
Theorem 5.3.2 proof
• First, we have to show that AᵀA is nonsingular.
• Let z be a solution to
AᵀAx = 0 (3)
• Then Az ∈ N(Aᵀ) (think of Aᵀ(Az) = 0).
• But clearly Az ∈ R(A) = N(Aᵀ)⊥.
• Hence Az ∈ N(Aᵀ) ∩ N(Aᵀ)⊥ = {0},
so Az = 0.
Theorem 5.3.2 proof
• If A has rank n, the column vectors of A are linearly independent, and thus Az = 0 has only the trivial solution; hence z = 0 in (3).
• So Eq. (3), AᵀAx = 0, has only the trivial solution.
• By Theorem 1.5.2, AᵀA is nonsingular.
• Therefore x̂ = (AᵀA)⁻¹Aᵀb is the unique solution to the normal equations and, consequently, the unique least squares solution to the problem Ax = b.
• The projection vector
p = Ax̂ = A(AᵀA)⁻¹Aᵀb
is the element of R(A) that is closest to b in the least squares sense.
• The matrix P = A(AᵀA)⁻¹Aᵀ is called the projection matrix.
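A quick NumPy sketch of the projection matrix and its two characteristic properties, symmetry and idempotence (the full-rank matrix A and vector b below are illustrative):

```python
import numpy as np

# Illustrative full-rank 3x2 matrix and a vector to project.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 0.0])

# Projection matrix P = A (A^T A)^{-1} A^T
P = A @ np.linalg.inv(A.T @ A) @ A.T
p = P @ b  # projection of b onto R(A)

# A projection matrix is symmetric (P^T = P) and idempotent (P^2 = P).
print(np.allclose(P, P.T), np.allclose(P @ P, P))  # True True
# b - p is orthogonal to R(A):
print(A.T @ (b - p))  # ~ [0. 0.]
```

In practice one avoids forming (AᵀA)⁻¹ explicitly for numerical reasons, but the small example above matches the formula in the text.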
Example 1
• Find the least squares solution to the system
x1 + x2 = 3
–2x1 + 3x2 = 1
2x1 – x2 = 2
• Sol:
A =
[  1   1 ]
[ -2   3 ]
[  2  -1 ]

Aᵀ =
[ 1  -2   2 ]
[ 1   3  -1 ]

b = (3, 1, 2)ᵀ
Example 1
• The normal equations for the system are AᵀAx = Aᵀb. Computing AᵀA and Aᵀb gives

[  9  -7 ] [ x1 ]   [ 5 ]
[ -7  11 ] [ x2 ] = [ 4 ]

so
x1 = 83/50, x2 = 71/50
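The result of Example 1 can be checked with NumPy's built-in least squares solver, which handles the normal equations internally:

```python
import numpy as np

# Example 1 data; the least squares solution should be (83/50, 71/50).
A = np.array([[ 1.0,  1.0],
              [-2.0,  3.0],
              [ 2.0, -1.0]])
b = np.array([3.0, 1.0, 2.0])

x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x_hat)  # ~ [1.66 1.42], i.e. (83/50, 71/50)
```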
• Given a table of data:
x x1 x2 … xm
y y1 y2 … ym
we wish to find a linear function:
y = c0 + c1 x
that best fits the data in the least squares sense.
• If we require that
yi = c0 + c1 xi for i = 1, …, m
we get a system of m equations in two unknowns:

[ 1  x1 ]          [ y1 ]
[ 1  x2 ] [ c0 ]   [ y2 ]
[ ⋮   ⋮ ] [ c1 ] = [  ⋮ ]     (5)
[ 1  xm ]          [ ym ]
Figure 5.3.4
• If the data do not resemble a linear function, a higher-degree polynomial is needed.
• To find the coefficients c0, c1, …, cn of the best least squares fit to the data
x x1 x2 … xm
y y1 y2 … ym
by a polynomial of degree n (y = c0 + c1x + c2x² + … + cnxⁿ), we must find the least squares solution to the system:
[ 1  x1  x1² ⋯ x1ⁿ ] [ c0 ]   [ y1 ]
[ 1  x2  x2² ⋯ x2ⁿ ] [ c1 ]   [ y2 ]
[ ⋮   ⋮   ⋮      ⋮  ] [  ⋮ ] = [  ⋮ ]     (6)
[ 1  xm  xm² ⋯ xmⁿ ] [ cn ]   [ ym ]
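The coefficient matrix of system (6) is a Vandermonde-type matrix, which NumPy can build directly. A sketch of the polynomial fit (the data points below are illustrative):

```python
import numpy as np

# Illustrative data points and fitting degree.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
n = 2  # degree of the fitting polynomial

# increasing=True gives columns 1, x, x^2, ..., x^n, matching system (6).
V = np.vander(x, n + 1, increasing=True)
c, *_ = np.linalg.lstsq(V, y, rcond=None)
print(c)  # coefficients c0, c1, ..., cn
```

`np.polyfit(x, y, n)` computes the same fit, returning the coefficients in the opposite (highest-degree-first) order.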
Example 3
• Find the best quadratic least squares fit to the data
x 0 1 2 3
y 3 2 4 4
• Sol: For this example the system (6) becomes

[ 1  0  0 ]          [ 3 ]
[ 1  1  1 ] [ c0 ]   [ 2 ]
[ 1  2  4 ] [ c1 ] = [ 4 ]
[ 1  3  9 ] [ c2 ]   [ 4 ]
Example 3
• The normal equations are:

[ 1  1  1  1 ] [ 1  0  0 ]          [ 1  1  1  1 ] [ 3 ]
[ 0  1  2  3 ] [ 1  1  1 ] [ c0 ]   [ 0  1  2  3 ] [ 2 ]
[ 0  1  4  9 ] [ 1  2  4 ] [ c1 ] = [ 0  1  4  9 ] [ 4 ]
               [ 1  3  9 ] [ c2 ]                  [ 4 ]

[  4   6  14 ] [ c0 ]   [ 13 ]
[  6  14  36 ] [ c1 ] = [ 22 ]
[ 14  36  98 ] [ c2 ]   [ 54 ]
Example 3
• The solution to the system is
c = (c0, c1, c2)ᵀ = (2.75, −0.25, 0.25)ᵀ
so the best quadratic fit is y = 2.75 − 0.25x + 0.25x².
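Example 3 can be verified in a few lines of NumPy by forming the system (6) matrix and solving the normal equations:

```python
import numpy as np

# Example 3 data: quadratic least squares fit to four points.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([3.0, 2.0, 4.0, 4.0])
V = np.vander(x, 3, increasing=True)  # columns 1, x, x^2

# Normal equations: V^T V c = V^T y
c = np.linalg.solve(V.T @ V, V.T @ y)
print(c)  # [ 2.75 -0.25  0.25]
```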
5.4 Inner Product Spaces
Definition
An inner product on a vector space V is an operation
on V that assigns to each pair of vectors x and y in V a
real number <x, y> satisfying the following conditions:
I. <x, x> ≥ 0, with equality if and only if x = 0
II. <x, y> = <y, x> for all x and y in V
III. <αx + βy, z> = α<x, z> + β<y, z> for all x, y, z in V and all scalars α and β

Case 1: The vector space Rⁿ
• In Rⁿ we may define a weighted inner product by
<x, y> = Σᵢ₌₁ⁿ xᵢ yᵢ wᵢ (1)
where the wᵢ are positive weights.
Case 2: The vector space Rᵐˣⁿ
• In Rᵐˣⁿ we may define an inner product by
<A, B> = Σᵢ₌₁ᵐ Σⱼ₌₁ⁿ aᵢⱼ bᵢⱼ (2)
Case 3: The vector space C[a, b]
• In C[a, b] we may define an inner product by
<f, g> = ∫ₐᵇ f(x)g(x) dx (3)
Basic Properties of Inner Product
Spaces
• If v is a vector in an inner product space V, the length or norm of v is given by
||v|| = √<v, v>
Theorem 5.4.1
(The Pythagorean Law)
If u and v are orthogonal vectors in an inner product
space V, then
||u+v||2 = ||u||2 + ||v||2
Figure 5.4.1
Theorem 5.4.1
Pf: ||u+v||2 = <u+v, u+v>
= <u, u+v> + <v, u+v>
= <u, u> + <u, v> + <v, u> + <v, v>
= ||u||2 + 2<u, v> + ||v||2 (Note <u, v> = 0)
= ||u||2 + ||v||2
Example 1
• Consider the vector space C[-1, 1] with inner product
defined by (3).
• The vectors 1 and x are orthogonal, since
<1, x> = ∫₋₁¹ 1·x dx = [x²/2]₋₁¹ = 1/2 − 1/2 = 0
• To determine the length of each vector, we compute
<1, 1> = ∫₋₁¹ 1·1 dx = [x]₋₁¹ = 1 − (−1) = 2
<x, x> = ∫₋₁¹ x·x dx = [x³/3]₋₁¹ = 1/3 − (−1/3) = 2/3
Example 1
||1|| = √<1, 1> = √2
||x|| = √<x, x> = √(2/3) = √6/3
||1 + x||² = ||1||² + ||x||² = 2 + 2/3 = 8/3
• Verification:
||1 + x||² = <1 + x, 1 + x> = ∫₋₁¹ (1 + x)(1 + x) dx = ∫₋₁¹ (1 + x)² dx
= [(1 + x)³/3]₋₁¹ = (1/3)(1 + 1)³ − (1/3)(1 + (−1))³ = (1/3)(2³ − 0³) = 8/3
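These integrals can be sanity-checked numerically. A rough sketch using a composite trapezoidal rule in NumPy (the grid size is an arbitrary choice; all values are approximate):

```python
import numpy as np

# Approximate the C[-1,1] inner product <f, g> = integral of f*g over [-1,1]
# with the composite trapezoidal rule on a fine uniform grid.
t = np.linspace(-1.0, 1.0, 100001)
dt = t[1] - t[0]

def inner(fv, gv):
    y = fv * gv
    return dt * (y[0] / 2 + y[1:-1].sum() + y[-1] / 2)

one = np.ones_like(t)
print(inner(one, t))            # ~ 0    (<1, x>: orthogonal)
print(inner(one, one))          # ~ 2    (<1, 1>)
print(inner(t, t))              # ~ 2/3  (<x, x>)
print(inner(one + t, one + t))  # ~ 8/3  (<1+x, 1+x>)
```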
Example 2
• For the vector space C[−π, π], if we use a constant weight function w(x) = 1/π to define an inner product:
<f, g> = (1/π) ∫₋π^π f(x)g(x) dx
• Then
<cos x, sin x> = (1/π) ∫₋π^π cos x sin x dx = 0
<cos x, cos x> = (1/π) ∫₋π^π cos x cos x dx = 1
<sin x, sin x> = (1/π) ∫₋π^π sin x sin x dx = 1
Example 2
• Thus cos x and sin x are orthogonal unit vectors with respect to this inner product.
• From the Pythagorean law,
||cos x + sin x||² = ||cos x||² + ||sin x||² = 1 + 1 = 2
Example 2
• Verification:
||cos x + sin x||² = <cos x + sin x, cos x + sin x>
= (1/π) ∫₋π^π (cos x + sin x)(cos x + sin x) dx
= (1/π) ∫₋π^π [(cos x)² + 2 sin x cos x + (sin x)²] dx
= (1/π) ∫₋π^π (cos x)² dx + (2/π) ∫₋π^π sin x cos x dx + (1/π) ∫₋π^π (sin x)² dx
= 1 + 0 + 1 = 2
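A numeric check of the three weighted inner products in Example 2, again with a trapezoidal rule (the grid size is an arbitrary choice; results are approximate, though the trapezoidal rule is very accurate for periodic integrands):

```python
import numpy as np

# Weighted inner product <f, g> = (1/pi) * integral of f*g over [-pi, pi],
# approximated with the composite trapezoidal rule.
t = np.linspace(-np.pi, np.pi, 200001)
dt = t[1] - t[0]

def inner(fv, gv):
    y = fv * gv
    return (dt / np.pi) * (y[0] / 2 + y[1:-1].sum() + y[-1] / 2)

print(inner(np.cos(t), np.sin(t)))  # ~ 0
print(inner(np.cos(t), np.cos(t)))  # ~ 1
print(inner(np.sin(t), np.sin(t)))  # ~ 1
```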
• For the vector space Rᵐˣⁿ the norm derived from the inner product (2) is called the Frobenius norm and is denoted by || · ||F. Thus if A ∈ Rᵐˣⁿ, then
||A||F = <A, A>^(1/2) = ( Σᵢ₌₁ᵐ Σⱼ₌₁ⁿ aᵢⱼ² )^(1/2)
Example 3
• If
A =
[ 1  1 ]
[ 1  2 ]
[ 3  3 ]

B =
[ -1  1 ]
[  3  0 ]
[ -3  4 ]

then
<A, B> = 1·(−1) + 1·1 + 1·3 + 2·0 + 3·(−3) + 3·4 = 6
||A||F = (1² + 1² + 1² + 2² + 3² + 3²)^(1/2) = 5
||B||F = [(−1)² + 1² + 3² + 0² + (−3)² + 4²]^(1/2) = 6
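This example maps directly onto NumPy: the inner product (2) is an entrywise sum of products, and the Frobenius norm is built in:

```python
import numpy as np

# Matrices from the Frobenius-norm example above.
A = np.array([[1.0, 1.0], [1.0, 2.0], [3.0, 3.0]])
B = np.array([[-1.0, 1.0], [3.0, 0.0], [-3.0, 4.0]])

print(np.sum(A * B))             # 6.0  -> <A, B>, inner product (2)
print(np.linalg.norm(A, 'fro'))  # 5.0
print(np.linalg.norm(B, 'fro'))  # 6.0
```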
Example 4
• In P5, define an inner product by (5) with xᵢ = (i − 1)/4 for i = 1, 2, …, 5. The length of the function p(x) = 4x is given by
||4x|| = <4x, 4x>^(1/2) = ( Σᵢ₌₁⁵ 16xᵢ² )^(1/2) = ( Σᵢ₌₁⁵ (i − 1)² )^(1/2) = √30
Definition
If u and v are vectors in an inner product space V and v ≠ 0, then the scalar projection α and vector projection p of u onto v are given by
α = <u, v> / ||v||
and
p = α (v / ||v||) = ( <u, v> / <v, v> ) v
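The two projection formulas above can be sketched in NumPy for the standard dot product on Rⁿ (the sample vectors are illustrative):

```python
import numpy as np

def projections(u, v):
    """Scalar and vector projections of u onto v (standard dot product)."""
    alpha = (u @ v) / np.linalg.norm(v)  # scalar projection: <u,v> / ||v||
    p = (u @ v) / (v @ v) * v            # vector projection: (<u,v>/<v,v>) v
    return alpha, p

u = np.array([3.0, 4.0])
v = np.array([2.0, 0.0])
alpha, p = projections(u, v)
print(alpha)        # 3.0
print(p)            # [3. 0.]
print((u - p) @ p)  # 0.0  (u - p is orthogonal to p, per the Observation)
```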
Observation
If v ≠ 0 and p is the vector projection of u onto v, then
I. u – p and p are orthogonal
II. u = p if and only if u is a scalar multiple of v
Pf:
I. Since
<p, p> = <(α/||v||)v, (α/||v||)v> = (α/||v||)² <v, v> = α²
and
<u, p> = <u, (α/||v||)v> = (α/||v||) <u, v> = <u, v>²/<v, v> = α²
it follows that <u − p, p> = <u, p> − <p, p> = α² − α² = 0.
Observation
II. (⇐)
If u = βv, then the vector projection of u onto v is given by
p = ( <u, v> / <v, v> ) v = ( β<v, v> / <v, v> ) v = βv = u
(⇒)
If u = p, then
u = p = α (v/||v||) = βv, with β = α/||v||
Theorem 5.4.2
(The Cauchy-Schwarz Inequality)
If u and v are any two vectors in an inner product space
V, then
| <u, v> | ≤ ||u|| ||v||
Equality holds if and only if u and v are linearly
dependent.
• From the above theorem, if u and v are nonzero vectors, then
−1 ≤ <u, v> / (||u|| ||v||) ≤ 1
and hence there is a unique angle θ ∈ [0, π] such that
cos θ = <u, v> / (||u|| ||v||)
This equation can be used to define the angle θ between two nonzero vectors u and v.
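A sketch of this angle formula in NumPy for the standard dot product (the clipping step and the sample vectors are my additions, not from the text):

```python
import numpy as np

def angle(u, v):
    """Angle in [0, pi] between nonzero vectors u and v."""
    c = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip guards against round-off pushing c just outside [-1, 1],
    # which would make arccos return NaN.
    return np.arccos(np.clip(c, -1.0, 1.0))

u = np.array([1.0, 0.0])
v = np.array([1.0, 1.0])
print(angle(u, v))  # ~ 0.7854, i.e. pi/4
```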
Definition
A vector space V is said to be a normed linear space if to each vector v ∈ V there is associated a real number ||v||, called the norm of v, satisfying
I. ||v|| ≥ 0, with equality if and only if v = 0
II. ||αv|| = |α| ||v|| for any scalar α
III. ||v + w|| ≤ ||v|| + ||w|| for all v, w ∈ V
Theorem 5.4.3
If V is an inner product space, then the equation
||v|| = √<v, v>, for all v ∈ V
defines a norm on V.
Theorem 5.4.3
• Pf: It is easily seen that conditions I and II are satisfied (please verify on your own). Here we show that condition III is satisfied:
||u + v||² = <u + v, u + v>
= <u, u> + 2<u, v> + <v, v>
≤ ||u||² + 2 ||u|| ||v|| + ||v||² (by the Cauchy–Schwarz Inequality)
= (||u|| + ||v||)²
Thus
||u + v|| ≤ ||u|| + ||v||
Definition
• For every vector x = (x1, x2, …, xn)ᵀ in Rⁿ:
1-norm: ||x||₁ = |x1| + |x2| + … + |xn|
2-norm: ||x||₂ = (|x1|² + |x2|² + … + |xn|²)^(1/2) = ( Σᵢ₌₁ⁿ xᵢ² )^(1/2) = <x, x>^(1/2)
p-norm: ||x||p = (|x1|ᵖ + |x2|ᵖ + … + |xn|ᵖ)^(1/p) = ( Σᵢ₌₁ⁿ |xᵢ|ᵖ )^(1/p)
∞-norm (uniform norm, infinity norm): ||x||∞ = max(|x1|, |x2|, …, |xn|)
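All four norms are available through `np.linalg.norm` via its `ord` parameter; here they are evaluated on the vector that also appears in Example 5 of this section:

```python
import numpy as np

x = np.array([4.0, -5.0, 3.0])

print(np.linalg.norm(x, 1))       # 12.0      1-norm
print(np.linalg.norm(x, 2))       # ~ 7.0711  2-norm = sqrt(50)
print(np.linalg.norm(x, 3))       # ~ 6.0     p-norm with p = 3 (cube root of 216)
print(np.linalg.norm(x, np.inf))  # 5.0       infinity-norm
```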
• If p ≠ 2, || · ||p does not correspond to any inner product, so the Pythagorean Law need not hold.
• For example, x1 = [1, 2]ᵀ and x2 = [−4, 2]ᵀ are orthogonal. However, for the ∞-norm:
x1 + x2 = [1, 2]ᵀ + [−4, 2]ᵀ = [−3, 4]ᵀ
(||x1||∞)² + (||x2||∞)² = 2² + 4² = 4 + 16 = 20
(||x1 + x2||∞)² = 4² = 16
whereas for the 2-norm the law does hold:
(||x1||₂)² + (||x2||₂)² = (1² + 2²) + ((−4)² + 2²) = 5 + 20 = 25
(||x1 + x2||₂)² = (−3)² + 4² = 9 + 16 = 25
Example 5
• Let x = (4, −5, 3)ᵀ in R³. Then
||x||₁ = |4| + |−5| + |3| = 12
||x||₂ = (|4|² + |−5|² + |3|²)^(1/2) = √50
||x||∞ = max(|4|, |−5|, |3|) = 5
Definition
Let x and y be vectors in a normed linear space. The distance between x and y is defined to be the number ||y − x||.