
Chapter 5 Orthogonality

(5.3~5.4)

(Thanks to Professor Lee Chien-Hsing of the Department of Computer Science and Information Engineering, Chung Hua University, for kindly providing the original slides free of charge)
5.3 Least Squares Problems
• Find a least squares curve (a linear function, a
polynomial, a trigonometric polynomial, etc.) that fits a
set of data points in the plane.
• The curve provides an optimal approximation in the
sense that the sum of squared errors between the y
values of the data points and the corresponding y
values of the approximating curve is minimized.

Least Squares Solutions to
Overdetermined Systems
• A least squares problem can generally be formulated
as an overdetermined linear system of equations.
• An overdetermined system is one involving more
equations than unknowns.
• An overdetermined system is usually inconsistent.
• Given an m×n system Ax = b with m > n, we cannot
in general expect to find a vector x ∈ Rn such that Ax
equals b.
• Instead, we can look for a vector x ∈ Rn for which Ax
is "closest" to b.
Description of the Least Squares
Problems
• Given a system of equations Ax = b, where A is an m×n matrix
with m > n and b ∈ Rm, for each x ∈ Rn we can form a residual
r(x) = b − Ax
• The distance between b and Ax is given by
||b − Ax|| = ||r(x)||
• We wish to find a vector x ∈ Rn for which ||r(x)|| is minimized.
• In fact, minimizing ||r(x)|| is equivalent to minimizing ||r(x)||².
• A vector that minimizes ||r(x)||² is said to be a least
squares solution to the system Ax = b.

Theorem 5.3.1
Let S be a subspace of Rm. For each b ∈ Rm there is a
unique element p of S that is closest to b; that is,
||b − y|| > ||b − p||
for any y ≠ p in S. Furthermore, a given vector p in S
will be closest to a given vector b ∈ Rm if and only if
b − p ∈ S⊥.

Theorem 5.3.1 proof
(1) (⇐) Since Rm = S ⊕ S⊥, each b ∈ Rm can be expressed
uniquely as a sum b = p + z, where p ∈ S and z ∈ S⊥.
If y is another element of S, then
||b − y||² = ||(b − p) + (p − y)||²
Since p − y ∈ S (∵ p ∈ S and y ∈ S) and b − p ∈ S⊥,
it follows from the Pythagorean law that
||b − y||² = ||b − p||² + ||p − y||²
Therefore, ||b − y|| > ||b − p|| (since ||p − y|| > 0 for y ≠ p).

Theorem 5.3.1 proof
(2) () Proof by contradiction:
If qS such that || b–y || > || b – q || for any y  q in S
(i.e., q is closest to b ) and b – q  S,
let pS and b – pS , then q  p,
|| b – q ||2 = || (b – p) + (p – q) ||2
= || b – p ||2 + || p – q ||2
Therefore, || b – q || > || b – p ||
p will be closest to b, which is a contradiction.

• Back to our problem: let S = R(A). A vector x̂ will be a
solution to the least squares problem Ax = b if and only
if p = Ax̂ is the vector in R(A) that is closest to b. The
vector p is said to be the projection of b onto R(A).

• From Theorem 5.3.1, b − p ∈ R(A)⊥, and b − p = b − Ax̂
= r(x̂) ∈ R(A)⊥. Thus x̂ is a solution to the least squares
problem if and only if
r(x̂) ∈ R(A)⊥      (1)

Figure 5.3.2
• How do we find a vector x̂ satisfying (1)?
• Since r(x̂) ∈ R(A)⊥, and from Theorem 5.2.1:
R(A)⊥ = N(AT)
• A vector x̂ will be a solution to the least squares
problem Ax = b if and only if
r(x̂) ∈ N(AT)
or equivalently,
0 = AT r(x̂) = AT(b − Ax̂)

• To solve the least squares problem Ax = b, we must
solve
ATAx = ATb      (2)
• Eq. (2) represents an n×n system of equations (note:
ATA is n×n). These equations are called the normal
equations.
• In general, it is possible to have more than one
solution to the normal equations.
• If x̂ and ŷ are both solutions, then since the projection
p of b onto R(A) is unique:
Ax̂ = Aŷ = p

Theorem 5.3.2
If A is an m×n matrix of rank n, the normal equations
ATAx = ATb
have a unique solution
x̂ = (ATA)⁻¹ATb
and x̂ is the unique least squares solution to the problem
Ax = b.

Theorem 5.3.2 proof
• First, we have to show that ATA is nonsingular.
• Let z be a solution to
ATAx = 0      (3)
• Then Az ∈ N(AT) (since AT(Az) = 0)
• But clearly, Az ∈ R(A) = N(AT)⊥
⇒ Az ∈ N(AT) ∩ N(AT)⊥ = {0}
⇒ Az = 0

Theorem 5.3.2 proof
• If A has rank n, the column vectors of A are linearly
independent, and thus Ax = 0 has only the trivial solution
⇒ z = 0 in (3)
⇒ Eq. (3) ATAx = 0 has only the trivial solution
⇒ By Theorem 1.5.2, ATA is nonsingular
⇒ x̂ = (ATA)⁻¹ATb is the unique solution to the normal
equations and, consequently, the unique least squares
solution to the problem Ax = b

• The projection vector
p = Ax̂ = A(ATA)⁻¹ATb
is the element of R(A) that is closest to b in the least
squares sense.
• The matrix P = A(ATA)⁻¹AT is called the projection
matrix.
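As a quick illustration (not from the text), the normal equations and the projection matrix can be sketched in NumPy. The small system below is an assumed example; in practice one would prefer np.linalg.lstsq or a QR factorization over forming (ATA)⁻¹ explicitly.

```python
import numpy as np

# Assumed 3x2 overdetermined system (more equations than unknowns)
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# Normal equations: A^T A x = A^T b
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# Projection matrix P = A (A^T A)^{-1} A^T and projection p of b onto R(A)
P = A @ np.linalg.inv(A.T @ A) @ A.T
p = P @ b

print(x_hat)          # least squares solution: [5., -3.]
print(A.T @ (b - p))  # residual is orthogonal to R(A): ~[0., 0.]
```

Note that P @ P equals P: projecting twice changes nothing, which is the defining property of a projection matrix.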

Example 1
• Find the least squares solution to the system
x1 + x2 = 3
−2x1 + 3x2 = 1
2x1 − x2 = 2
• Sol:
A = [  1   1 ]      AT = [ 1  −2   2 ]      b = [ 3 ]
    [ −2   3 ]           [ 1   3  −1 ]          [ 1 ]
    [  2  −1 ]                                  [ 2 ]
Example 1
• The normal equations for the system are ATAx = ATb:

[ 1  −2   2 ] [  1   1 ] [ x1 ]   [ 1  −2   2 ] [ 3 ]
[ 1   3  −1 ] [ −2   3 ] [ x2 ] = [ 1   3  −1 ] [ 1 ]
              [  2  −1 ]                        [ 2 ]

[  9  −7 ] [ x1 ]   [ 5 ]
[ −7  11 ] [ x2 ] = [ 4 ]

[ x1 ]   [ 83/50 ]
[ x2 ] = [ 71/50 ]

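As a numerical check of Example 1 (a sketch, assuming NumPy is available), np.linalg.lstsq solves the same least squares problem directly:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [-2.0, 3.0],
              [2.0, -1.0]])
b = np.array([3.0, 1.0, 2.0])

x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x_hat)  # [1.66, 1.42], i.e. [83/50, 71/50]
```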
• Given a table of data:
x | x1  x2  …  xm
y | y1  y2  …  ym
we wish to find a linear function
y = c0 + c1x
that best fits the data in the least squares sense.

• If we require that
yi = c0 + c1xi   for i = 1, …, m
we get a system of m equations in two unknowns:

[ 1  x1 ]          [ y1 ]
[ 1  x2 ] [ c0 ]   [ y2 ]
[ ⋮   ⋮ ] [ c1 ] = [ ⋮ ]      (5)
[ 1  xm ]          [ ym ]

• The linear function whose coefficients are the least
squares solution to (5) is said to be the best least
squares fit to the data by a linear function.
Example 2
• Given the data
x | 0  3  6
y | 1  4  5
find the best least squares fit to the data by a linear
function.
• Sol:
Let the system (5) be written as Ac = y, where

A = [ 1  0 ]   c = [ c0 ]   y = [ 1 ]
    [ 1  3 ]       [ c1 ]       [ 4 ]
    [ 1  6 ]                    [ 5 ]
Example 2
• The normal equations ATAc = ATy:

[ 1  1  1 ] [ 1  0 ] [ c0 ]   [ 1  1  1 ] [ 1 ]       [ 3   9 ] [ c0 ]   [ 10 ]
[ 0  3  6 ] [ 1  3 ] [ c1 ] = [ 0  3  6 ] [ 4 ]   ⇒  [ 9  45 ] [ c1 ] = [ 42 ]
            [ 1  6 ]                      [ 5 ]

• The solution to the system is
[ c0 ]   [ 4/3 ]
[ c1 ] = [ 2/3 ]

⇒ The best least squares fit is given by y = 4/3 + (2/3)x
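The fit in Example 2 can be reproduced by building the coefficient matrix of system (5) and solving by least squares; a minimal NumPy sketch:

```python
import numpy as np

x = np.array([0.0, 3.0, 6.0])
y = np.array([1.0, 4.0, 5.0])

# Columns [1, x] as in system (5)
A = np.column_stack([np.ones_like(x), x])
c, *_ = np.linalg.lstsq(A, y, rcond=None)
print(c)  # [4/3, 2/3]  ->  y = 4/3 + (2/3) x
```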

Figure 5.3.4

• If the data do not resemble a linear function, a higher-degree
polynomial is needed.
• To find the coefficients c0, c1, …, cn of the best least
squares fit to the data
x | x1  x2  …  xm
y | y1  y2  …  ym
by a polynomial of degree n (y = c0 + c1x + c2x² + … + cnxⁿ),
we must find the least squares solution to the
system:
[ 1  x1  x1²  …  x1ⁿ ] [ c0 ]   [ y1 ]
[ 1  x2  x2²  …  x2ⁿ ] [ c1 ]   [ y2 ]
[ ⋮   ⋮    ⋮        ⋮ ] [ ⋮  ] = [ ⋮ ]      (6)
[ 1  xm  xm²  …  xmⁿ ] [ cn ]   [ ym ]

Example 3
• Find the best quadratic least squares fit to the data
x | 0  1  2  3
y | 3  2  4  4
• Sol: For this example the system (6) becomes

[ 1  0  0 ]          [ 3 ]
[ 1  1  1 ] [ c0 ]   [ 2 ]
[ 1  2  4 ] [ c1 ] = [ 4 ]
[ 1  3  9 ] [ c2 ]   [ 4 ]

Example 3
• The normal equations are:

[ 1  1  1  1 ] [ 1  0  0 ]          [ 1  1  1  1 ] [ 3 ]
[ 0  1  2  3 ] [ 1  1  1 ] [ c0 ]   [ 0  1  2  3 ] [ 2 ]
[ 0  1  4  9 ] [ 1  2  4 ] [ c1 ] = [ 0  1  4  9 ] [ 4 ]
               [ 1  3  9 ] [ c2 ]                  [ 4 ]

[  4   6  14 ] [ c0 ]   [ 13 ]
[  6  14  36 ] [ c1 ] = [ 22 ]
[ 14  36  98 ] [ c2 ]   [ 54 ]

Example 3
• The solution to the system is
[ c0 ]   [  2.75 ]
[ c1 ] = [ −0.25 ]
[ c2 ]   [  0.25 ]

⇒ The quadratic polynomial that gives the best least
squares fit to the data is
p(x) = 2.75 − 0.25x + 0.25x²
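The quadratic fit in Example 3 corresponds to a Vandermonde coefficient matrix; in a NumPy sketch, np.vander with increasing=True gives exactly the columns 1, x, x² of system (6):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([3.0, 2.0, 4.0, 4.0])

A = np.vander(x, 3, increasing=True)   # columns: 1, x, x^2
c, *_ = np.linalg.lstsq(A, y, rcond=None)
print(c)  # [2.75, -0.25, 0.25]
```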

5.4 Inner Product Spaces
Definition
An inner product on a vector space V is an operation
on V that assigns to each pair of vectors x and y in V a
real number <x, y> satisfying the following conditions:
I. <x, x> ≥ 0, with equality if and only if x = 0
II. <x, y> = <y, x> for all x and y in V
III. <αx + βy, z> = α<x, z> + β<y, z> for all x, y, z in V
and all scalars α and β

• A vector space V with an inner product is called an
inner product space.
Case 1: The vector space Rn

• The standard inner product for Rn is the scalar product
<x, y> = xTy
• Given a vector w with positive entries,
<x, y> = Σ_{i=1}^{n} xi yi wi      (1)
can also be defined as an inner product, where the wi are
referred to as weights.
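A weighted inner product on Rn as in (1) is easy to sketch; the vectors and weights below are assumed examples:

```python
import numpy as np

def weighted_inner(x, y, w):
    """<x, y> = sum_i x_i * y_i * w_i, with all w_i > 0."""
    return float(np.sum(x * y * w))

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])
w = np.array([0.5, 0.25])  # positive weights

print(weighted_inner(x, y, w))  # 1*3*0.5 + 2*(-1)*0.25 = 1.0
```

Positivity of the weights is what guarantees condition I of the definition: <x, x> = Σ xi²wi > 0 for x ≠ 0.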

Case 2: The vector space Rm×n

• Given A and B in Rm×n,
<A, B> = Σ_{i=1}^{m} Σ_{j=1}^{n} aij bij      (2)

Case 3: The vector space C[a, b]
• In C[a, b] we may define an inner product by
<f, g> = ∫_{a}^{b} f(x) g(x) dx      (3)
• If w(x) is a positive continuous function on [a, b], then
<f, g> = ∫_{a}^{b} f(x) g(x) w(x) dx      (4)
is also an inner product, where the function w(x) is
called a weight function.

Basic Properties of Inner Product
Spaces
• If v is a vector in an inner product space V, the length
or norm of v is given by
||v|| = √<v, v>
• Two vectors u and v are said to be orthogonal if
<u, v> = 0

Theorem 5.4.1
(The Pythagorean Law)
If u and v are orthogonal vectors in an inner product
space V, then
||u + v||² = ||u||² + ||v||²

Figure 5.4.1
Theorem 5.4.1
Pf: ||u + v||² = <u + v, u + v>
= <u, u + v> + <v, u + v>
= <u, u> + <u, v> + <v, u> + <v, v>
= ||u||² + 2<u, v> + ||v||²   (note <u, v> = 0)
= ||u||² + ||v||²

Example 1
• Consider the vector space C[−1, 1] with the inner product
defined by (3).
• The vectors 1 and x are orthogonal, since
<1, x> = ∫_{−1}^{1} 1·x dx = [x²/2]_{−1}^{1} = (1/2)·1² − (1/2)·(−1)² = 0
• To determine the length of each vector, we compute
<1, 1> = ∫_{−1}^{1} 1·1 dx = [x]_{−1}^{1} = 1 − (−1) = 2
<x, x> = ∫_{−1}^{1} x·x dx = [x³/3]_{−1}^{1} = (1/3)·1³ − (1/3)·(−1)³ = 2/3

Example 1
||1|| = <1, 1>^{1/2} = √2
||x|| = <x, x>^{1/2} = √(2/3) = √6/3
||1 + x||² = ||1||² + ||x||² = 2 + 2/3 = 8/3

• Verification:
||1 + x||² = <1 + x, 1 + x> = ∫_{−1}^{1} (1 + x)(1 + x) dx = ∫_{−1}^{1} (1 + x)² dx
= [(1/3)(1 + x)³]_{−1}^{1} = (1/3)(1 + 1)³ − (1/3)(1 + (−1))³ = (1/3)·2³ − (1/3)·0³ = 8/3

Example 2
• For the vector space C[−π, π], if we use a constant
weight function w(x) = 1/π to define an inner product:
<f, g> = (1/π) ∫_{−π}^{π} f(x) g(x) dx
• Then
<cos x, sin x> = (1/π) ∫_{−π}^{π} cos x sin x dx = 0
<cos x, cos x> = (1/π) ∫_{−π}^{π} cos x cos x dx = 1
<sin x, sin x> = (1/π) ∫_{−π}^{π} sin x sin x dx = 1

Example 2
• Thus cos x and sin x are orthogonal unit vectors with
respect to this inner product.
• From the Pythagorean law,
||cos x + sin x||² = ||cos x||² + ||sin x||² = 1 + 1 = 2

Example 2
• Verification:
||cos x + sin x||² = <cos x + sin x, cos x + sin x>
= (1/π) ∫_{−π}^{π} (cos x + sin x)(cos x + sin x) dx
= (1/π) ∫_{−π}^{π} [(cos x)² + 2 sin x cos x + (sin x)²] dx
= (1/π) ∫_{−π}^{π} (cos x)² dx + (2/π) ∫_{−π}^{π} sin x cos x dx + (1/π) ∫_{−π}^{π} (sin x)² dx
= 1 + 2·0 + 1 = 2

• For the vector space Rm×n, the norm derived from the
inner product (2) is called the Frobenius norm and is
denoted by || · ||F. Thus if A ∈ Rm×n, then
||A||F = <A, A>^{1/2} = ( Σ_{i=1}^{m} Σ_{j=1}^{n} aij² )^{1/2}

Example 3
• If
A = [ 1  1 ]   B = [ −1  1 ]
    [ 1  2 ]       [  3  0 ]
    [ 3  3 ]       [ −3  4 ]
then
<A, B> = 1·(−1) + 1·1 + 1·3 + 2·0 + 3·(−3) + 3·4 = 6
||A||F = (1² + 1² + 1² + 2² + 3² + 3²)^{1/2} = 5
||B||F = [(−1)² + 1² + 3² + 0² + (−3)² + 4²]^{1/2} = 6
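Example 3 can be reproduced directly in NumPy: np.sum(A * B) computes the inner product (2) entrywise, and the Frobenius norm is built in:

```python
import numpy as np

A = np.array([[1.0, 1.0], [1.0, 2.0], [3.0, 3.0]])
B = np.array([[-1.0, 1.0], [3.0, 0.0], [-3.0, 4.0]])

print(np.sum(A * B))             # <A, B> = 6.0 (entrywise products, summed)
print(np.linalg.norm(A, 'fro'))  # ||A||_F = 5.0
print(np.linalg.norm(B, 'fro'))  # ||B||_F = 6.0
```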

Example 4
• In P5, define an inner product by (5) with xi = (i − 1)/4
for i = 1, 2, …, 5. The length of the function p(x) = 4x
is given by
||4x|| = <4x, 4x>^{1/2} = ( Σ_{i=1}^{5} 16xi² )^{1/2} = ( Σ_{i=1}^{5} (i − 1)² )^{1/2} = √30

Definition
If u and v are vectors in an inner product space V and
v ≠ 0, then the scalar projection α and vector projection
p of u onto v are given by
α = <u, v> / ||v||
and
p = α (1/||v||) v = (<u, v> / <v, v>) v
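For Rn with the standard inner product, the vector projection formula reads as follows (a minimal sketch with assumed example vectors):

```python
import numpy as np

def vector_projection(u, v):
    """p = (<u, v> / <v, v>) v, the vector projection of u onto v (v != 0)."""
    return (u @ v) / (v @ v) * v

u = np.array([3.0, 4.0])
v = np.array([1.0, 0.0])
p = vector_projection(u, v)

print(p)            # [3., 0.]
print((u - p) @ p)  # 0.0: u - p is orthogonal to p
```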

Observation
If v ≠ 0 and p is the vector projection of u onto v, then
I. u − p and p are orthogonal
II. u = p if and only if u is a scalar multiple of v
Pf:
I. Since
<p, p> = <(α/||v||)v, (α/||v||)v> = (α/||v||)² <v, v> = α²
and
<u, p> = <u, (<u, v>/<v, v>) v> = (<u, v>/<v, v>) <u, v> = <u, v>²/||v||² = α²
⇒ <u − p, p> = <u, p> − <p, p> = α² − α² = 0
⇒ u − p and p are orthogonal

Observation
II. (⇐)
If u = βv, then the vector projection of u onto v is
given by
p = (<u, v>/<v, v>) v = (<βv, v>/<v, v>) v = βv = u
(⇒)
If u = p, then
u = p = α (1/||v||) v = βv, with β = α/||v||

Theorem 5.4.2
(The Cauchy-Schwarz Inequality)
If u and v are any two vectors in an inner product space
V, then
| <u, v> |  ||u|| ||v||
Equality holds if and only if u and v are linearly
dependent.

• From the above theorem, if u and v are nonzero
vectors, then
−1 ≤ <u, v> / (||u|| ||v||) ≤ 1
and hence there is a unique angle θ ∈ [0, π] such that
cos θ = <u, v> / (||u|| ||v||)
This equation can be used to define the angle θ
between two nonzero vectors u and v.
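In Rn this definition of angle can be sketched directly; the clip guards against rounding pushing the cosine slightly outside [−1, 1]:

```python
import numpy as np

def angle_between(u, v):
    """Unique theta in [0, pi] with cos(theta) = <u, v> / (||u|| ||v||)."""
    c = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(c, -1.0, 1.0)))

u = np.array([1.0, 0.0])
v = np.array([1.0, 1.0])
print(angle_between(u, v))  # pi/4 ~ 0.7854
```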

Definition
A vector space V is said to be a normed linear space if
to each vector v ∈ V there is associated a real number ||v||,
called the norm of v, satisfying
I. ||v|| ≥ 0, with equality if and only if v = 0
II. ||αv|| = |α| ||v|| for any scalar α
III. ||v + w|| ≤ ||v|| + ||w|| for all v, w ∈ V

• The third condition is called the triangle inequality.

Theorem 5.4.3
If V is an inner product space, then the equation
v  v, v , for all v  V

defines a norm on V

Theorem 5.4.3
• Pf: It is easily seen that conditions I and II are
satisfied (please verify on your own). Here we show
that condition III is satisfied:
||u + v||² = <u + v, u + v>
= <u, u> + 2<u, v> + <v, v>
≤ ||u||² + 2 ||u|| ||v|| + ||v||²   (by the Cauchy–Schwarz
inequality)
= (||u|| + ||v||)²
Thus
||u + v|| ≤ ||u|| + ||v||
Definition
• For every vector x = (x1, x2, …, xn)T in Rn:
1-norm: ||x||1 = |x1| + |x2| + … + |xn|
2-norm: ||x||2 = (|x1|² + |x2|² + … + |xn|²)^{1/2} = ( Σ_{i=1}^{n} xi² )^{1/2} = <x, x>^{1/2}
p-norm: ||x||p = (|x1|^p + |x2|^p + … + |xn|^p)^{1/p} = ( Σ_{i=1}^{n} |xi|^p )^{1/p}
∞-norm (uniform norm, infinity norm):
||x||∞ = max(|x1|, |x2|, …, |xn|)
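NumPy's np.linalg.norm implements all of these via its ord parameter; a quick sketch with an assumed vector:

```python
import numpy as np

x = np.array([3.0, -4.0])

print(np.linalg.norm(x, 1))       # 1-norm:   |3| + |-4| = 7
print(np.linalg.norm(x, 2))       # 2-norm:   sqrt(9 + 16) = 5
print(np.linalg.norm(x, 3))       # p-norm, p = 3: (27 + 64)**(1/3)
print(np.linalg.norm(x, np.inf))  # inf-norm: max(|3|, |-4|) = 4
```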

• If p  2, ||||p does not correspond to any inner
product, thus the Pythogorean Law will not hold.
• For example, x1 = [1 , 2]T and x2 = [-4 , 2]T are
orthogonal. However,
x1 + x2 = [1 , 2]T + [-4 , 2]T = [-3 , 4]T
(||x1||)2 + (||x2||)2 = 22 + 42 = 4 + 16 = 20
(||x1 + x2||)2 = 42 = 16
(||x1||2)2 + (||x2||2)2 = (12 + 22) + ((-4)2 + 22)
= 5 + 20 = 25
(||x1 + x2||2)2 = (-3)2 + 42 = 9 + 16 = 25
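The failure of the Pythagorean law for the ∞-norm (and its validity for the 2-norm) is easy to verify numerically:

```python
import numpy as np

x1 = np.array([1.0, 2.0])
x2 = np.array([-4.0, 2.0])
print(x1 @ x2)  # 0.0: orthogonal with respect to the standard inner product

inf = lambda v: np.linalg.norm(v, np.inf)
two = lambda v: np.linalg.norm(v, 2)

# Pythagorean law fails for the infinity norm ...
print(inf(x1)**2 + inf(x2)**2, inf(x1 + x2)**2)  # 20.0 vs 16.0
# ... but holds for the 2-norm (both sides ~ 25)
print(two(x1)**2 + two(x2)**2, two(x1 + x2)**2)
```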
Example 5
• Let x = (4, −5, 3)T in R3; then
||x||1 = |4| + |−5| + |3| = 12
||x||2 = (|4|² + |−5|² + |3|²)^{1/2} = √50
||x||∞ = max(|4|, |−5|, |3|) = 5

• A norm provides a way of measuring the distance
between two vectors.

Definition
Let x and y be vectors in a normed linear space. The
distance between x and y is defined to be the number ||y
– x||.

