Least Squares Approximation

Numerical Analysis E3, I3 FMN050

Numerical Analysis
Centre for Mathematical Sciences
Lund University, Sweden


February 28, 2007

Numerical Analysis E3/Least Squares

Approximation problems

We will treat two basic problems:

• Overdetermined linear systems
  Example: fit a straight line to measurement data
• Interpolation
  Example: find a continuous function that agrees with discrete
  data (“digital-to-analogue conversion”)

Form & Norm: central questions of approximation

• Form: Which form should the approximant have?
Straight line, polynomial, trigonometric function,
3rd degree surface, ...?
• Norm: How do we measure the “error”?
Which is the “best” approximation?

For example, $\|\cdot\|_2$ ⇒ least squares method

© 2003–2007 Gustaf Söderlind, Numerical Analysis E3/Least Squares

Fitting a straight line

Table of data:
x 1 2 3
y 1 2 2




Let $y^*(x) = c_0 + c_1 x$. Determine the parameters $c_0$, $c_1$!

Wishful thinking:
$y^*(1) = y(1) \Rightarrow c_0 + c_1 = 1$
$y^*(2) = y(2) \Rightarrow c_0 + 2c_1 = 2$
$y^*(3) = y(3) \Rightarrow c_0 + 3c_1 = 2$

“Overdetermined system”: what is a solution?

Three equations, two unknowns $c_0$, $c_1$.
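One way to see that no exact solution exists is to solve the first two equations and test the third. The sketch below does this in exact rational arithmetic; the helper name `solve_2x2` is an illustrative assumption, not part of the lecture.

```python
from fractions import Fraction

def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve a 2x2 linear system by Cramer's rule (exact for Fractions)."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("singular system")
    x1 = (b1 * a22 - a12 * b2) / det
    x2 = (a11 * b2 - b1 * a21) / det
    return x1, x2

# First two equations: c0 + c1 = 1 and c0 + 2*c1 = 2.
F = Fraction
c0, c1 = solve_2x2(F(1), F(1), F(1), F(2), F(1), F(2))

# Left-hand side of the third equation, c0 + 3*c1, which should equal 2.
third_lhs = c0 + 3 * c1
print(c0, c1, third_lhs)   # prints: 0 1 3
```

The line through the first two data points gives 3 instead of 2 at x = 3, so the three equations cannot hold simultaneously.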

Minimax approximation, $\|\cdot\|_\infty$




Error $e(x) = y^*(x) - y(x) = c_0 + c_1 x - y(x)$.

Determine $c_0$, $c_1$ so that $\|e\|_\infty$ is minimized!

$\min_{c_0, c_1} \|e\|_\infty \Rightarrow |e(1)| = |e(2)| = |e(3)|$

$e(1) = -e(2) \Rightarrow c_0 + c_1 - 1 = -(c_0 + 2c_1 - 2)$

$e(1) = e(3) \Rightarrow c_0 + c_1 - 1 = c_0 + 3c_1 - 2$

$2c_0 + 3c_1 = 3$
$2c_1 = 1 \Rightarrow c_1 = 1/2,\ c_0 = 3/4$
Best (minimax) approximation: $y^*(x) = \frac{3}{4} + \frac{1}{2} x$
Note: this construction works only for three data points; otherwise use Remes’ algorithm.
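The equal-magnitude, alternating-sign pattern of the errors can be checked directly. The following sketch evaluates the errors for c0 = 3/4, c1 = 1/2 and compares the maximum error with that of a few nearby parameter choices. The perturbation size 1/100 is an arbitrary assumption, and the comparison is a sanity check rather than a proof of optimality.

```python
from fractions import Fraction

xs = [1, 2, 3]
ys = [1, 2, 2]

def max_abs_error(c0, c1):
    """Max-norm of e(x_i) = c0 + c1*x_i - y_i over the data points."""
    return max(abs(c0 + c1 * x - y) for x, y in zip(xs, ys))

c0, c1 = Fraction(3, 4), Fraction(1, 2)
errors = [c0 + c1 * x - y for x, y in zip(xs, ys)]
best = max_abs_error(c0, c1)

# Perturb each parameter by -1/100, 0, +1/100 and record the max error.
step = Fraction(1, 100)
neighbours = [max_abs_error(c0 + i * step, c1 + j * step)
              for i in (-1, 0, 1) for j in (-1, 0, 1)]

print([str(e) for e in errors])   # prints: ['1/4', '-1/4', '1/4']
print(best, min(neighbours))      # prints: 1/4 1/4
```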

Least squares approximation

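Error $e(x) = y^*(x) - y(x) = c_0 + c_1 x - y(x)$.

Determine $c_0$, $c_1$ to minimize $\rho(c_0, c_1) = \|e\|_2^2$

$\rho = (c_0 + c_1 - 1)^2 + (c_0 + 2c_1 - 2)^2 + (c_0 + 3c_1 - 2)^2$

$\min_{c_0, c_1} \rho(c_0, c_1) = \min_{c_0, c_1} \sum_{i=1}^{3} |e(x_i)|^2$

Smooth minimum if $\partial\rho/\partial c_0 = \partial\rho/\partial c_1 = 0$

$\partial\rho/\partial c_0 = 0 \Rightarrow 6c_0 + 12c_1 - 10 = 0$

$\partial\rho/\partial c_1 = 0 \Rightarrow 12c_0 + 28c_1 - 22 = 0$

Solution: $c_1 = 1/2$, $c_0 = 2/3$

Best (least squares) approximation: $y^*(x) = \frac{2}{3} + \frac{1}{2} x$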
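As a check on the stationarity conditions, the sketch below evaluates the sum of squares and its two partial derivatives at c0 = 2/3, c1 = 1/2 in exact arithmetic. The function names `rho` and `grad_rho` are chosen here for illustration.

```python
from fractions import Fraction

xs = [1, 2, 3]
ys = [1, 2, 2]

def rho(c0, c1):
    """Sum of squared errors for the line c0 + c1*x."""
    return sum((c0 + c1 * x - y) ** 2 for x, y in zip(xs, ys))

def grad_rho(c0, c1):
    """Partial derivatives of rho with respect to c0 and c1."""
    d0 = sum(2 * (c0 + c1 * x - y) for x, y in zip(xs, ys))
    d1 = sum(2 * x * (c0 + c1 * x - y) for x, y in zip(xs, ys))
    return d0, d1

c0, c1 = Fraction(2, 3), Fraction(1, 2)
d0, d1 = grad_rho(c0, c1)
print(d0, d1)        # prints: 0 0
print(rho(c0, c1))   # prints: 1/6
```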

Note: minimax and least-squares are not the same!

[Figure: the data points with both fitted lines. Dashed line: minimax. Solid line: least squares.]

The optimal solution depends on the choice of norm!

x        1      2      3
y        1      2      2

Minimax solution
$y^*$    5/4    7/4    9/4
e        1/4   −1/4    1/4      $\|e\|_\infty = 1/4$,  $\|e\|_2^2 = 3/16$

Least squares solution
$y^*$    7/6    10/6   13/6
e        1/6   −2/6    1/6      $\|e\|_\infty = 1/3$,  $\|e\|_2^2 = 3/18$
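The entries of the comparison can be reproduced with a few lines of exact arithmetic. This is a minimal sketch; the helper `error_norms` and the dictionary layout are illustrative choices.

```python
from fractions import Fraction

xs = [1, 2, 3]
ys = [1, 2, 2]

def error_norms(c0, c1):
    """Return (max-norm, squared 2-norm) of the error at the data points."""
    e = [c0 + c1 * x - y for x, y in zip(xs, ys)]
    return max(abs(v) for v in e), sum(v * v for v in e)

fits = {
    "minimax": (Fraction(3, 4), Fraction(1, 2)),
    "least squares": (Fraction(2, 3), Fraction(1, 2)),
}
norms = {name: error_norms(c0, c1) for name, (c0, c1) in fits.items()}
for name, (max_norm, sq_norm) in norms.items():
    print(f"{name}: max-norm = {max_norm}, squared 2-norm = {sq_norm}")
```

Each fit wins in its own norm: the minimax line has the smaller maximum error (1/4 against 1/3), the least squares line the smaller sum of squares (1/6 = 3/18 against 3/16).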

The least squares method

Overdetermined linear system:

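$Ax \approx b$; $A \in \mathbb{R}^{m \times n}$; $m > n$

Usually no solution exists: existence $\Leftrightarrow b \in R(A)$.

Exact solvability is the exception: if $\operatorname{rank} A = n$, then $R(A)$ is only an $n$-dimensional subspace of $\mathbb{R}^m$ ($n < m$), so a general $b \in \mathbb{R}^m$ lies outside $R(A)$.

Least squares approximate solution:

Determine $x$ so that the residual norm $\|Ax - b\|_2$ is minimal!

$\min_x \|Ax - b\|_2$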

The least squares method...

Minimize the norm of the residual $r$ or (more conveniently) $r^T r$.

$r = Ax - b \Rightarrow r^T r = \|Ax - b\|_2^2$

Quadratic form:

$r^T r = (Ax - b)^T (Ax - b)$

$= x^T A^T A x - 2 b^T A x + b^T b$

Stationary points if and only if

$\operatorname{grad} r^T r = 2 x^T A^T A - 2 b^T A = 0$.

The normal equations: $A^T A x - A^T b = 0$.

The residual must be orthogonal to the columns of $A$:

$A^T (Ax - b) = 0 \Leftrightarrow A^T r = 0$

The residual $r$ is normal to $R(A)$.

The normal equations

Overdetermined system: $Ax \approx b$; $A \in \mathbb{R}^{m \times n}$

Normal equations: $A^T A x = A^T b$

Least squares solution: $x = (A^T A)^{-1} A^T b$.

Note: $A^T A$ is $n \times n$ (square), and $\operatorname{rank} A = n$ ⇒

• $\det A^T A \neq 0$
• $A^T A > 0$ (positive definite)
• the normal equations give the minimizing solution
• $(A^T A)^{-1} A^T$ is called the pseudoinverse of $A$.
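The positive definiteness claim follows in one line from the definition of the 2-norm. A short derivation, assuming only that rank A = n (so that Ax = 0 forces x = 0):

```latex
x^T (A^T A)\, x = (Ax)^T (Ax) = \|Ax\|_2^2 \ge 0,
\qquad
\|Ax\|_2^2 = 0 \iff Ax = 0 \iff x = 0 .
```

Since the Hessian of the quadratic form $r^T r$ is $2 A^T A$, positive definiteness also shows that the stationary point given by the normal equations is a minimum, and that it is unique.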

Example: Fitting a straight line

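Fit $y^*(x) = c_0 + c_1 x$ to

x 1 2 3
y 1 2 2

Overdetermined system $Ac \approx y$:

$\begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{pmatrix} \begin{pmatrix} c_0 \\ c_1 \end{pmatrix} \approx \begin{pmatrix} 1 \\ 2 \\ 2 \end{pmatrix}$

$A = \begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{pmatrix} \Rightarrow A^T = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \end{pmatrix}$.

Normal equations $A^T A c = A^T y$:

$\begin{pmatrix} 3 & 6 \\ 6 & 14 \end{pmatrix} \begin{pmatrix} c_0 \\ c_1 \end{pmatrix} = \begin{pmatrix} 5 \\ 11 \end{pmatrix}$

Solution: $c_0 = \frac{2}{3}$; $c_1 = \frac{1}{2}$ ⇒ $y^*(x) = \frac{2}{3} + \frac{1}{2} x$.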
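The same computation can be carried out mechanically. The sketch below forms the matrix and right-hand side of the normal equations from the data, solves the 2×2 system by Cramer's rule, and checks that the residual is orthogonal to the columns of A. The helpers `transpose` and `matmul` are small illustrative routines written for this example, and Cramer's rule is chosen only because the system is tiny.

```python
from fractions import Fraction

def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)]
            for row in P]

xs = [1, 2, 3]
ys = [1, 2, 2]
A = [[Fraction(1), Fraction(x)] for x in xs]   # columns: 1 and x
y = [[Fraction(v)] for v in ys]                # column vector

At = transpose(A)
AtA = matmul(At, A)      # 2x2 matrix of the normal equations
Aty = matmul(At, y)      # right-hand side

# Solve the 2x2 system by Cramer's rule.
(a11, a12), (a21, a22) = AtA
(r0,), (r1,) = Aty
det = a11 * a22 - a12 * a21
c0 = (r0 * a22 - a12 * r1) / det
c1 = (a11 * r1 - r0 * a21) / det

# Residual r = A c - y and its products with the columns of A.
fitted = matmul(A, [[c0], [c1]])
r = [[u[0] - v[0]] for u, v in zip(fitted, y)]
At_r = matmul(At, r)

print(c0, c1)                      # prints: 2/3 1/2
print(At_r[0][0], At_r[1][0])      # prints: 0 0
```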

L2 approximation

Problem: Given f , find a ϕ ∈ Φ such that ϕ ≈ f .

Standard questions of form and norm:

• Form: How to choose the system of functions Φ?

Example: polynomials, trigonometric functions, ...

• Norm: How do we measure the approximation error?

Example: $\|f - \varphi\|_2$ (known as L2 approximation).

Approximation with function systems in L2 is a
generalization of the least squares method.

Important examples:
• Orthogonal polynomials
• Fourier analysis
• the Finite Element Method

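Inner products and the norm $\|\cdot\|_2$

Inner product = generalized scalar product.

Properties of a real inner product:
1. $\langle f, g \rangle = \langle g, f \rangle$
2. $\langle f, \alpha g + \beta h \rangle = \alpha \langle f, g \rangle + \beta \langle f, h \rangle$
3. $\langle f, f \rangle \ge 0$
4. $\langle f, f \rangle = 0 \Rightarrow f \equiv 0$

Euclidean norm: $\|f\|_2^2 = \langle f, f \rangle$

Orthogonality: $f \perp g$ if $\langle f, g \rangle = 0$

Pythagorean theorem:

$\langle f, g \rangle = 0 \Rightarrow \|f + g\|_2^2 = \|f\|_2^2 + \|g\|_2^2$.

Proof: expand $\langle f + g, f + g \rangle = \langle f, f \rangle + 2 \langle f, g \rangle + \langle g, g \rangle$; the middle term vanishes.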

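Inner products...

Discrete case:
$\langle f, g \rangle = \sum_{i=1}^{m} f(x_i) g(x_i)$; $\|f\|_2^2 = \sum_{i=1}^{m} f(x_i)^2$

Compare $f^T g = \sum_i f_i g_i$ and $\|f\|_2^2 = \sum_i f_i^2$.

Continuous case:
$\langle f, g \rangle = \int_0^1 f(x) g(x)\,dx$; $\|f\|_2^2 = \int_0^1 f(x)^2\,dx$

Orthogonal systems:
A set of functions $\Phi = \{\varphi_j\}$ is an orthogonal system
with respect to $\langle \cdot, \cdot \rangle$ if

$\langle \varphi_i, \varphi_j \rangle = 0, \quad i \neq j$.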
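Both inner products are easy to evaluate exactly for simple functions. The sketch below is an illustration under stated assumptions: the sample points 1, 2, 3, the functions 1, x − 2, x and x − 1/2, and the coefficient-list representation of polynomials are all choices made here, not taken from the lecture. It checks orthogonality in each setting and the Pythagorean theorem in the discrete one.

```python
from fractions import Fraction

def discrete_inner(f, g, points):
    """Discrete inner product: sum of f(x_i) * g(x_i) over the points."""
    return sum(f(x) * g(x) for x in points)

def poly_inner(p, q):
    """Continuous inner product on [0, 1] of two polynomials.

    A polynomial is a list of coefficients, lowest degree first; the
    integral of x**k over [0, 1] is 1 / (k + 1), so the result is exact.
    """
    return sum(Fraction(a) * Fraction(b) / (i + j + 1)
               for i, a in enumerate(p) for j, b in enumerate(q))

points = [1, 2, 3]

def one(x):
    return Fraction(1)

def centred(x):
    return Fraction(x) - 2          # x minus the mean of the points

def total(x):
    return one(x) + centred(x)

# Discrete case: 1 and x - 2 are orthogonal on the points 1, 2, 3.
d_inner = discrete_inner(one, centred, points)

# Pythagorean theorem for this orthogonal pair.
lhs = discrete_inner(total, total, points)
rhs = (discrete_inner(one, one, points)
       + discrete_inner(centred, centred, points))

# Continuous case on [0, 1]: 1 and x are not orthogonal, 1 and x - 1/2 are.
c_inner_1_x = poly_inner([1], [0, 1])
c_inner_1_shifted = poly_inner([1], [Fraction(-1, 2), 1])

print(d_inner, lhs, rhs, c_inner_1_x, c_inner_1_shifted)
# prints: 0 5 5 1/2 0
```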

L2 approximation

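Given a function $f$, an inner product $\langle \cdot, \cdot \rangle$ and an
orthogonal system $\Phi = \{\varphi_j\}$.

Approximate $f$ by $f^* = \sum_j c_j \varphi_j$!

$f^*$ is the best approximant in the least squares sense
if the residual satisfies the orthogonality condition

$f^* - f \perp \varphi_j \quad \forall j$.

Normal equations:
$\langle \varphi_i, f^* - f \rangle = 0 \quad \forall i \Rightarrow$
$\langle \varphi_i, \sum_j c_j \varphi_j \rangle = \langle \varphi_i, f \rangle \Rightarrow$
$\sum_j \langle \varphi_i, \varphi_j \rangle c_j = \langle \varphi_i, f \rangle$

Compare matrix formula “$\Phi^T \Phi c = \Phi^T f$”.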
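The normal equations only require inner products, so they can be set up for any linearly independent basis. To make the full system matrix visible, the sketch below assumes the non-orthogonal basis {1, x} on [0, 1] and the target f(x) = x². Both are illustrative choices, as is the coefficient-list representation of polynomials.

```python
from fractions import Fraction

def poly_inner(p, q):
    """Inner product on [0, 1] of polynomials given as coefficient lists
    (lowest degree first); the integral of x**k over [0, 1] is 1/(k+1)."""
    return sum(Fraction(a) * Fraction(b) / (i + j + 1)
               for i, a in enumerate(p) for j, b in enumerate(q))

basis = [[1], [0, 1]]        # phi_0 = 1, phi_1 = x (assumed for illustration)
f = [0, 0, 1]                # f(x) = x**2 (assumed for illustration)

# System matrix a_ij = <phi_i, phi_j> and right-hand side <phi_i, f>.
G = [[poly_inner(p, q) for q in basis] for p in basis]
rhs = [poly_inner(p, f) for p in basis]

# Solve the 2x2 normal equations by Cramer's rule.
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
c0 = (rhs[0] * G[1][1] - G[0][1] * rhs[1]) / det
c1 = (G[0][0] * rhs[1] - rhs[0] * G[1][0]) / det

# Orthogonality condition: <phi_i, f* - f> should vanish for every i.
residual = [c0, c1, Fraction(-1)]            # f* - f = c0 + c1*x - x**2
checks = [poly_inner(p, residual) for p in basis]

print(c0, c1)                 # prints: -1/6 1
print(checks[0], checks[1])   # prints: 0 0
```

Under these assumptions the best approximation is f*(x) = x − 1/6, and the off-diagonal entries ⟨1, x⟩ = 1/2 are what make equation solving necessary.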

L2 approximation and Fourier coefficients

The normal equations

$\sum_j \langle \varphi_i, \varphi_j \rangle c_j = \langle \varphi_i, f \rangle$

form a linear system of equations for the coefficient vector $c$.

The system matrix has elements $a_{ij} = \langle \varphi_i, \varphi_j \rangle$.

If $\{\varphi_j\}$ is an orthogonal system, $\langle \varphi_i, \varphi_j \rangle = 0$ unless
$i = j$. The system matrix becomes diagonal and

$c_i = \frac{\langle \varphi_i, f \rangle}{\langle \varphi_i, \varphi_i \rangle}$

The $c_i$ are called Fourier coefficients.

Note: In an orthogonal system the computation is
reduced to a minimum: it is only necessary to compute
inner products and no equation solving is needed!
Example: Fourier series, wavelet decomposition etc.
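The straight-line example can be redone this way. With the discrete inner product over the points x = 1, 2, 3, the functions 1 and x − 2 form an orthogonal system (this particular basis is an assumption made for the sketch), so each coefficient is a single quotient of inner products.

```python
from fractions import Fraction

xs = [1, 2, 3]
ys = [1, 2, 2]

def inner(u, v):
    """Discrete inner product of two vectors of point values."""
    return sum(a * b for a, b in zip(u, v))

# Point values of the basis functions phi_0 = 1 and phi_1 = x - 2.
phi0 = [Fraction(1) for _ in xs]
phi1 = [Fraction(x) - 2 for x in xs]
f = [Fraction(y) for y in ys]

cross = inner(phi0, phi1)                    # zero, so the system is diagonal

# Fourier coefficients: one quotient of inner products per basis function.
c0 = inner(phi0, f) / inner(phi0, phi0)
c1 = inner(phi1, f) / inner(phi1, phi1)

# Rewrite c0 + c1*(x - 2) in the monomial form a0 + a1*x.
a0, a1 = c0 - 2 * c1, c1
print(cross, c0, c1, a0, a1)   # prints: 0 5/3 1/2 2/3 1/2
```

Converted back to the form a0 + a1 x, the result is 2/3 + x/2, the same least squares line as obtained from the normal equations, but without solving a linear system.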

L2 approximation...

“Theorem:” Let $\{\varphi_j\}$ be a linearly independent
system of basis functions in L2. For every $f$ in L2
there is a unique $f^* = \sum_j c_j^* \varphi_j$, given by the normal
equations $\langle \varphi_i, f^* - f \rangle = 0$, such that

$\|f^* - f\|_2^2 \le \|g - f\|_2^2$

for any $g = \sum_j c_j \varphi_j$.

Proof: Write $g - f = (g - f^*) + (f^* - f)$ and note that
by the normal equations, $g - f^* \perp f^* - f$, because $g - f^*$ is a
linear combination of the $\varphi_j$. Apply the Pythagorean theorem:

$\|g - f\|_2^2 = \|g - f^*\|_2^2 + \|f^* - f\|_2^2$.

So $\|g - f\|_2^2$ is minimal when $\|g - f^*\|_2^2 = 0$, i.e., $g = f^*$;
the coefficients $c_j^*$ are unique by the linear independence of $\{\varphi_j\}$.
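The identity used in the proof can be observed numerically in the discrete setting of the straight-line example. In the sketch below, f holds the data values, f* is the least squares line 2/3 + x/2, and g runs over a few other lines; the candidate lines are arbitrary choices made for illustration.

```python
from fractions import Fraction

xs = [1, 2, 3]
f = [Fraction(1), Fraction(2), Fraction(2)]          # data values f(x_i)

def line(c0, c1):
    """Values of c0 + c1*x at the data points."""
    return [c0 + c1 * x for x in xs]

def sq_norm(u):
    """Squared 2-norm of a vector of point values."""
    return sum(v * v for v in u)

def diff(u, v):
    return [a - b for a, b in zip(u, v)]

f_star = line(Fraction(2, 3), Fraction(1, 2))        # least squares fit
best = sq_norm(diff(f_star, f))

candidates = [(Fraction(3, 4), Fraction(1, 2)),      # the minimax line
              (Fraction(0), Fraction(1)),
              (Fraction(1), Fraction(1, 3))]
results = []
for c0, c1 in candidates:
    g = line(c0, c1)
    lhs = sq_norm(diff(g, f))                        # ||g - f||^2
    rhs = sq_norm(diff(g, f_star)) + best            # ||g - f*||^2 + ||f* - f||^2
    results.append((lhs, rhs))
    print(lhs, rhs)
```

In every case the two sides agree, and the left-hand side exceeds the minimal value 1/6 by exactly the squared distance between g and f*.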

