Chapter 4

Solving Systems of Linear Equations

4.3 Pivoting and Constructing an Algorithm


• Gaussian Elimination: most efficient means of solving linear systems of small to
medium size.

• Ax = b ⇒ Ux = b̃:
Reduce the linear system to an equivalent upper triangular system by performing elementary
row operations.

Basic Gaussian Elimination


Step 1. Leave the 1st equation unchanged and zero all subdiagonal elements in the 1st column by
adding −(ak1/a11) times the 1st row to the k-th row, k = 2, . . . , n.

mk1 = ak1/a11 is called a “multiplier"; a11 and the first equation are called the “pivot" element
and the “pivot" equation, respectively.

Step 2.
 
\[
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
0 & a'_{22} & \cdots & a'_{2n}\\
0 & a'_{32} & \cdots & a'_{3n}\\
\vdots & \vdots & \ddots & \vdots\\
0 & a'_{n2} & \cdots & a'_{nn}
\end{pmatrix}
\]
Repeat the same elimination on the (n − 1) × (n − 1) subsystem in rows and columns 2 through n, now with a'_{22} as the pivot element.

Step n. Solve the upper triangular system by “back" substitution.

LU factorization & GE
Theorem 4.3.1. If all the pivot elements a_{kk}^{(k)} are nonzero in the process of GE (without
pivoting), then A = LU, where L is the unit lower-triangular matrix with ℓij = mij for i > j and
ℓii = 1, and U is the upper-triangular matrix resulting from the GE.

     
\[
A = \begin{pmatrix} 1 & 1 & -1\\ 1 & 2 & -2\\ -2 & 1 & 1 \end{pmatrix}
\;\overset{m_{21}=1,\ m_{31}=-2}{\Longrightarrow}\;
\begin{pmatrix} 1 & 1 & -1\\ 0 & 1 & -1\\ 0 & 3 & -1 \end{pmatrix}
\;\overset{m_{32}=3}{\Longrightarrow}\;
\begin{pmatrix} 1 & 1 & -1\\ 0 & 1 & -1\\ 0 & 0 & 2 \end{pmatrix}
\]
\[
L = \begin{pmatrix} 1 & 0 & 0\\ m_{21} & 1 & 0\\ m_{31} & m_{32} & 1 \end{pmatrix}
  = \begin{pmatrix} 1 & 0 & 0\\ 1 & 1 & 0\\ -2 & 3 & 1 \end{pmatrix},
\qquad
U = \begin{pmatrix} 1 & 1 & -1\\ 0 & 1 & -1\\ 0 & 0 & 2 \end{pmatrix}.
\]
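As a sanity check on this example, here is a minimal NumPy sketch of GE without pivoting (the helper name lu_no_pivot is just for illustration); it stores the multipliers m_ij in L and the eliminated rows in U.

```python
import numpy as np

def lu_no_pivot(A):
    """LU factorization by Gaussian elimination without pivoting (assumes nonzero pivots)."""
    A = A.astype(float)
    n = A.shape[0]
    L = np.eye(n)
    U = A.copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]      # multiplier m_ik
            U[i, k:] -= L[i, k] * U[k, k:]   # row_i <- row_i - m_ik * row_k
    return L, U

A = np.array([[1., 1., -1.], [1., 2., -2.], [-2., 1., 1.]])
L, U = lu_no_pivot(A)
print(L)                        # [[1,0,0],[1,1,0],[-2,3,1]]
print(U)                        # [[1,1,-1],[0,1,-1],[0,0,2]]
print(np.allclose(L @ U, A))    # True
```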

Pivoting
Potential difficulties

1. A pivot element is zero: this can be corrected by exchanging the order of the rows. ⇒
“Pivoting".

Example 4.3.1.
\[
\begin{pmatrix} 0 & 1\\ 1 & 1 \end{pmatrix}
\begin{pmatrix} x\\ y \end{pmatrix}
=
\begin{pmatrix} 1\\ 2 \end{pmatrix}
\]

2. The pivot element is small relative to the other terms in the pivot row: this can lead to
bad numerical results.

Example 4.3.2.
\[
\begin{pmatrix} \epsilon & 1\\ 1 & 1 \end{pmatrix}
\begin{pmatrix} x\\ y \end{pmatrix}
=
\begin{pmatrix} 1\\ 2 \end{pmatrix}
\]
If ε is very small (ε ≈ 0), det of the coefficient matrix is ε − 1 ≈ −1 ≠ 0. However, GE yields
\[
\begin{pmatrix} \epsilon & 1\\ 0 & 1-\frac{1}{\epsilon} \end{pmatrix}
\begin{pmatrix} x\\ y \end{pmatrix}
=
\begin{pmatrix} 1\\ 2-\frac{1}{\epsilon} \end{pmatrix}
\]

\[
\therefore\; y = \frac{2-\frac{1}{\epsilon}}{1-\frac{1}{\epsilon}}.
\]
In a computer, if ε is small enough, 2 − 1/ε ≈ −1/ε. For example, let ε = 10⁻⁸. In a
seven-place decimal machine,
\[
\frac{1}{\epsilon} = 0.1000000\times 10^{9},
\qquad
2 = 0.2000000\times 10^{1} = 0.0000000|02\times 10^{9}.
\]
Therefore, the stored mantissa of 2 is 0.0000000, so 2 − 1/ε = −1/ε and likewise 1 − 1/ε = −1/ε.
\[
\therefore\; y = 1 \quad\&\quad x = \frac{1-y}{\epsilon} = 0.
\]
However, the correct solution is x = 1/(1 − ε) ≈ 1 and y = (1 − 2ε)/(1 − ε) ≈ 1.
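A rough single-precision sketch of Example 4.3.2 (about seven decimal digits, so the effect mirrors the seven-place machine above; the exact rounded values depend on the format, but the qualitative failure is the point):

```python
import numpy as np

eps = np.float32(1e-8)
one, two = np.float32(1.0), np.float32(2.0)

# Elimination without pivoting: the multiplier 1/eps is huge.
m = one / eps
u22 = one - m * one     # 1 - 1/eps rounds to -1/eps
b2 = two - m * one      # 2 - 1/eps rounds to -1/eps (the "2" is lost)
y = b2 / u22            # = 1
x = (one - y) / eps     # = 0, although the true x is close to 1

# After exchanging the rows (pivoting), the multiplier is eps and the result is accurate.
y_piv = (one - two * eps) / (one - eps)   # ~ 1
x_piv = two - y_piv                       # ~ 1
print(x, y, x_piv, y_piv)
```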

Remark 4.3.1. It is not actually the smallness of the coefficient a11 by itself that is causing the
trouble. Rather, it is the smallness of a11 relative to the other elements in its row.

Example 4.3.3. Multiply the first equation of Example 4.3.2 by 1/ε:
\[
\begin{pmatrix} 1 & \frac{1}{\epsilon}\\ 1 & 1 \end{pmatrix}
\begin{pmatrix} x\\ y \end{pmatrix}
=
\begin{pmatrix} \frac{1}{\epsilon}\\ 2 \end{pmatrix}
\]

After applying GE, we have
\[
\begin{pmatrix} 1 & \frac{1}{\epsilon}\\ 0 & 1-\frac{1}{\epsilon} \end{pmatrix}
\begin{pmatrix} x\\ y \end{pmatrix}
=
\begin{pmatrix} \frac{1}{\epsilon}\\ 2-\frac{1}{\epsilon} \end{pmatrix}
\]

For small ε,
\[
\therefore\; y = \frac{2-\frac{1}{\epsilon}}{1-\frac{1}{\epsilon}} \approx 1
\quad\&\quad
x = \frac{1}{\epsilon} - \frac{1}{\epsilon}y \approx 0.
\]
(Here a11 = 1 is not small in absolute terms, yet the same loss of accuracy occurs because a11 is small relative to a12 = 1/ε.)

Example 4.3.4. The difficulties will disappear if the order of the equations is changed.
\[
\begin{pmatrix} 1 & 1\\ \epsilon & 1 \end{pmatrix}
\begin{pmatrix} x\\ y \end{pmatrix}
=
\begin{pmatrix} 2\\ 1 \end{pmatrix}
\]
GE produces
\[
\begin{pmatrix} 1 & 1\\ 0 & 1-\epsilon \end{pmatrix}
\begin{pmatrix} x\\ y \end{pmatrix}
=
\begin{pmatrix} 2\\ 1-2\epsilon \end{pmatrix}.
\]
Then,
\[
y = \frac{1-2\epsilon}{1-\epsilon} \approx 1 \quad\&\quad x = 2-y \approx 1.
\]

Remark 4.3.2. In general, a more accurate solution is obtained when the equations are
arranged such that the pivot equation has the largest possible pivot element. ⇒ “Partial
pivoting".
Partial pivoting: at the k-th elimination step, choose as the pivot row the row (among rows k, . . . , n) whose entry in the k-th column has the largest absolute value.

Pivoting can be expressed using matrix multiplication:

Ax = b =⇒ P Ax = P b, (P : permutation matrix)

Then, LU factorization of P A is used to solve P Ax = P b as follows:

(i) Factorization phase: find L and U such that P A = LU .

(ii) Solution phase:

Lz = P b (forward substitution)
Ux = z (back substitution)
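In practice one can rely on a library LU with partial pivoting for both phases; here is a small sketch using SciPy (assuming SciPy is available): lu_factor returns the packed LU factors of PA together with the pivot indices, and lu_solve performs the forward and back substitutions of the solution phase.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[0., 1.],
              [1., 1.]])        # a11 = 0, so a row exchange is required (Example 4.3.1)
b = np.array([1., 2.])

lu, piv = lu_factor(A)          # factorization phase: P A = L U
x = lu_solve((lu, piv), b)      # solution phase: Lz = Pb, then Ux = z
print(x)                        # [1. 1.]
```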

GE with scaled row pivoting


(i) Factorization:

(a) Compute the scale of each row: si = max_{1≤j≤n} |aij|.

(b) Select as the pivot row the row for which |ai1|/si is largest. The index chosen is denoted by
P1, etc.
Note: we want to keep track of the Pi's (i.e., the permutation vector (P1, · · · , Pn)). We start
with (P1, · · · , Pn) = (1, 2, · · · , n) and exchange Pk and Pj at the step for the k-th
column.
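A tiny sketch of step (b) for the first column (the helper name is illustrative; ties are broken by taking the first maximal ratio):

```python
import numpy as np

def first_scaled_pivot_row(A):
    """Index of the pivot row for column 1 under scaled row pivoting."""
    s = np.max(np.abs(A), axis=1)     # scales s_i = max_j |a_ij|
    ratios = np.abs(A[:, 0]) / s      # compare |a_i1| / s_i
    return int(np.argmax(ratios))

A = np.array([[2., 100., 1.],
              [1.,   1., 1.],
              [4.,   5., 6.]])
print(first_scaled_pivot_row(A))      # 1: the ratios are 0.02, 1.0, and 4/6
```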

Operation Counts
Let us count the number of arithmetic operations involved in the Gaussian Elimination (both
factorization phase and solution phase). As the multiplications and divisions are much more
time consuming than additions and subtractions, we count only multiplications and divisions.

Definition 4.3.1. “Flops": floating-point operations per second.

Operation counts for GE with scaled row pivoting


(i) Factorization phase: ≈ (1/3)n³ + (1/2)n² ops

(ii) Solution phase: ≈ n² ops

(iii) Total cost: ≈ (1/3)n³ + (3/2)n² ops

Solving m n × n linear systems Ax(i) = b(i), i = 1, . . . , m, with the same coefficient matrix A using one LU factorization will take
≈ (1/3)n³ + (1/2 + m)n² ops.
Systems Requiring No Pivoting
Linear systems we encounter while solving differential equations numerically often have rel-
atively large diagonal elements and pivoting is sometimes unnecessary.

Definition 4.3.2. A matrix A = (aij) is called diagonally-dominant if
\[
|a_{ii}| > \sum_{j\ne i} |a_{ij}| \quad \forall i.
\]
A is said to be weakly diagonally-dominant if
\[
|a_{ii}| \ge \sum_{j\ne i} |a_{ij}| \quad \forall i.
\]

Example: the matrix
\[
\begin{pmatrix}
4 & -1 & 0 & -1\\
-1 & 4 & 0 & -1\\
-1 & 0 & 4 & -1\\
0 & -1 & -1 & 4
\end{pmatrix}
\]
is diagonally-dominant; matrices of this form arise from the finite difference method.
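A short NumPy check of the definition, applied to the matrix above (the function name is illustrative):

```python
import numpy as np

def is_diagonally_dominant(A, strict=True):
    """Row-wise (strict or weak) diagonal dominance test."""
    diag = np.abs(np.diag(A))
    off = np.sum(np.abs(A), axis=1) - diag     # sum of |a_ij| over j != i
    return bool(np.all(diag > off)) if strict else bool(np.all(diag >= off))

A = np.array([[ 4., -1.,  0., -1.],
              [-1.,  4.,  0., -1.],
              [-1.,  0.,  4., -1.],
              [ 0., -1., -1.,  4.]])
print(is_diagonally_dominant(A))   # True, so GE needs no pivoting (Theorem 4.3.2)
```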

Theorem 4.3.2. If A is diagonally-dominant, Gaussian Elimination with no pivoting can


be applied to solve Ax = b and a zero pivot will not be encountered.

Another class of matrices that require no pivoting:

Theorem 4.3.3. If A is positive definite, the GE with no pivoting can be applied to solve
Ax = b, and a zero pivot will not be encountered. Thus, A is nonsingular.

Tridiagonal System
Definition 4.3.3. A matrix A = (aij) is said to be tridiagonal if aij = 0 for all (i, j) such
that |i − j| > 1.

Consider a tridiagonal system


    
\[
\begin{pmatrix}
d_1 & c_1 & & & \\
a_1 & d_2 & c_2 & & \\
 & a_2 & d_3 & \ddots & \\
 & & \ddots & \ddots & c_{n-1}\\
 & & & a_{n-1} & d_n
\end{pmatrix}
\begin{pmatrix} x_1\\ x_2\\ \vdots\\ \vdots\\ x_n \end{pmatrix}
=
\begin{pmatrix} b_1\\ b_2\\ \vdots\\ \vdots\\ b_n \end{pmatrix}.
\]

Assume that A requires no pivoting. Then, the GE is


Step 1.
\[
d_2 \leftarrow d_2 - (a_1/d_1)\,c_1,
\qquad
b_2 \leftarrow b_2 - (a_1/d_1)\,b_1,
\qquad \ldots
\]

Step n. Back substitution:
\[
x_n \leftarrow b_n/d_n,
\qquad
x_{n-1} \leftarrow (b_{n-1} - c_{n-1}x_n)/d_{n-1},
\qquad \ldots
\]
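A minimal sketch of this elimination (often called the Thomas algorithm), assuming no pivoting is required; here a and c hold the sub- and super-diagonals of length n − 1.

```python
import numpy as np

def solve_tridiagonal(a, d, c, b):
    """Solve a tridiagonal system: subdiagonal a, diagonal d, superdiagonal c, right side b."""
    n = len(d)
    d, b = d.astype(float), b.astype(float)
    for i in range(1, n):                  # forward elimination
        m = a[i - 1] / d[i - 1]
        d[i] -= m * c[i - 1]
        b[i] -= m * b[i - 1]
    x = np.zeros(n)
    x[-1] = b[-1] / d[-1]                  # back substitution
    for i in range(n - 2, -1, -1):
        x[i] = (b[i] - c[i] * x[i + 1]) / d[i]
    return x

# Example: the finite-difference matrix tridiag(-1, 2, -1) (SPD, so no pivoting is needed).
n = 5
a = -np.ones(n - 1); c = -np.ones(n - 1); d = 2.0 * np.ones(n); b = np.ones(n)
x = solve_tridiagonal(a, d, c, b)
A = np.diag(d) + np.diag(a, -1) + np.diag(c, 1)
print(np.allclose(A @ x, b))               # True
```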

4.4 Norms and the Analysis of Errors


Vector norms
Definition 4.4.1. Let || · || be a function from a vector space V to R. Then || · || is a vector
norm if

1. ||x|| > 0 for all x ≠ 0 in V, and ||x|| = 0 if x = 0.

2. ||cx|| = |c| ||x|| for all c ∈ R and x ∈ V.

3. ||x + y|| ≤ ||x|| + ||y|| for all x, y ∈ V (triangle inequality).

Example 4.4.1. Let x = (x1, x2, . . . , xn) ∈ Rn.

1. 1-norm: ||x||_1 = |x_1| + |x_2| + · · · + |x_n|.

2. 2-norm: ||x||_2 = (|x_1|^2 + · · · + |x_n|^2)^{1/2}.

3. ∞-norm (or max-norm): ||x||_∞ = max_{1≤i≤n} |x_i|.

Matrix norms
Theorem 4.4.1. If || · || is any vector norm on Rn, then the equation
\[
\|A\| = \sup \{\, \|Ax\| : x \in \mathbb{R}^n,\ \|x\| = 1 \,\}
\]
defines a matrix norm on the linear space of all n × n matrices.

Properties of matrix norms:


1. ||A + B|| ≤ ||A|| + ||B|| for all A, B ∈ Rn×n.

2. ||Ax|| ≤ ||A|| ||x|| for all A ∈ Rn×n and x ∈ Rn.

Example 4.4.2.

1. ||A||_1 = max_{1≤j≤n} Σ_{i=1}^n |a_ij| (maximum column sum)

2. ||A||_∞ = max_{1≤i≤n} Σ_{j=1}^n |a_ij| (maximum row sum)

3. ||A||_2 is the largest singular value of A. That is, ||A||_2 = √(ρ(AᵀA)), where ρ(B) is the largest
absolute eigenvalue of B and is called the spectral radius of B.
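A quick NumPy check of these three norms on a small matrix:

```python
import numpy as np

A = np.array([[1., -2.],
              [3.,  4.]])

col_sum = np.max(np.sum(np.abs(A), axis=0))                 # ||A||_1: max column sum
row_sum = np.max(np.sum(np.abs(A), axis=1))                 # ||A||_inf: max row sum
two_norm = np.sqrt(np.max(np.linalg.eigvalsh(A.T @ A)))     # ||A||_2 = sqrt(rho(A^T A))

print(np.isclose(col_sum, np.linalg.norm(A, 1)))        # True
print(np.isclose(row_sum, np.linalg.norm(A, np.inf)))   # True
print(np.isclose(two_norm, np.linalg.norm(A, 2)))       # True
```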

Condition number of matrices


Consider a linear system Ax = b. If we make a small change in b to obtain a vector b̃, then
the solution of the linear system will also change, to a new solution x̃. That is, Ax̃ = b̃.
By how much do x and x̃ differ in relative terms? Let's do some analysis.
\[
\begin{aligned}
\|x - \tilde{x}\| &= \|A^{-1}(b - \tilde{b})\| \\
&\le \|A^{-1}\|\, \|b - \tilde{b}\| \\
&= \|A^{-1}\|\, \|Ax\|\, \frac{\|b - \tilde{b}\|}{\|b\|} \\
&\le \|A^{-1}\|\, \|A\|\, \|x\|\, \frac{\|b - \tilde{b}\|}{\|b\|}.
\end{aligned}
\]
Hence,
\[
\frac{\|x - \tilde{x}\|}{\|x\|} \le \kappa(A)\, \frac{\|b - \tilde{b}\|}{\|b\|},
\qquad \text{where} \quad \kappa(A) = \|A\|\, \|A^{-1}\|.
\]
The number κ(A) is called the condition number of the matrix A.
If the condition number is much larger than 1, then a small perturbation in b can produce
a large change in x. In this case, we say that Ax = b is ill-conditioned. Otherwise, it is
well-conditioned.
Example 4.4.3.
\[
A = \begin{pmatrix} 1 & 1+\epsilon\\ 1-\epsilon & 1 \end{pmatrix},
\qquad
A^{-1} = \frac{1}{\epsilon^2}\begin{pmatrix} 1 & -(1+\epsilon)\\ -(1-\epsilon) & 1 \end{pmatrix},
\qquad \epsilon > 0.
\]
\[
\|A\|_\infty = 2+\epsilon,
\qquad
\|A^{-1}\|_\infty = \frac{2+\epsilon}{\epsilon^2}.
\qquad
\therefore\; \kappa(A, \infty) = \frac{(2+\epsilon)^2}{\epsilon^2} > \frac{4}{\epsilon^2}.
\]
If ε ≤ 0.01, then κ(A, ∞) ≥ 40,000. Therefore, a small perturbation in b may induce a
relative perturbation 40,000 times greater in the solution of Ax = b.
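A numerical sketch of this example with ε = 0.01: np.linalg.cond reports κ(A, ∞) ≈ 40,401, and a worst-case perturbation of b is amplified by roughly that factor in the solution.

```python
import numpy as np

eps = 0.01
A = np.array([[1.0, 1.0 + eps],
              [1.0 - eps, 1.0]])

print(np.linalg.cond(A, np.inf))        # (2 + eps)^2 / eps^2 = 40401

x = np.ones(2)
b = A @ x                               # exact solution is x = [1, 1]
db = 1e-6 * np.array([1.0, -1.0])       # worst-case direction for the inf-norm
dx = np.linalg.solve(A, b + db) - x
amplification = (np.linalg.norm(dx, np.inf) / np.linalg.norm(x, np.inf)) / \
                (np.linalg.norm(db, np.inf) / np.linalg.norm(b, np.inf))
print(amplification)                    # close to kappa(A, inf)
```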
4.6 Iterative Methods for Linear Systems
• Direct methods:
– ex) Gaussian Elimination and its variants
– Compute an exact solution after a finite number of steps (in exact computation,
but in practice, there are round-off errors)
• Iterative methods:
– produce a sequence of approximations, x(1) , x(2) , · · · , which ideally converges to
the solution
– may require less memory than direct methods
– may be faster than direct methods
– may handle special structures (such as sparsity) in a simpler way

Two classes of iterative methods


i) Stationary methods (classical iterative methods)
x(k+1) = Gx(k) + c,
where G is the iteration matrix.
Examples: Jacobi, Gauss-Seidel, and Successive Over Relaxation (SOR) methods
Note: these are “stationary" methods because the transition from x(k) to x(k+1) does not
depend on the history of the iteration.
ii) Krylov subspace methods:
They find approximate solutions in the Krylov subspace span{b, Ab, A2 b, · · · , Ak−1 b}.
Examples: Conjugate Gradient (CG), Generalized Minimal Residual (GMRES), etc.

Classical Iterative Methods


Consider Ax = b.

Jacobi iteration
Consider
\[
\begin{pmatrix} 7 & -6\\ -8 & 9 \end{pmatrix}
\begin{pmatrix} x_1\\ x_2 \end{pmatrix}
=
\begin{pmatrix} 3\\ -4 \end{pmatrix}
\]
We rewrite the system in equation form and solve the i-th equation for xi, i = 1, . . . , n:
\[
x_1 = \frac{6}{7}x_2 + \frac{3}{7},
\qquad
x_2 = \frac{8}{9}x_1 - \frac{4}{9}.
\]
Then, given x(k) , the formula for the Jacobi iteration is

\[
x_1^{(k+1)} = \frac{6}{7}x_2^{(k)} + \frac{3}{7},
\qquad
x_2^{(k+1)} = \frac{8}{9}x_1^{(k)} - \frac{4}{9}.
\]
Example calculations:
Given x(0) = (0, 0),
\[
x_1^{(1)} = \frac{6}{7}x_2^{(0)} + \frac{3}{7} = \frac{3}{7},
\qquad
x_2^{(1)} = \frac{8}{9}x_1^{(0)} - \frac{4}{9} = -\frac{4}{9},
\]
\[
x_1^{(2)} = \frac{6}{7}x_2^{(1)} + \frac{3}{7} = \frac{6}{7}\cdot\left(-\frac{4}{9}\right) + \frac{3}{7} = \frac{3}{63} = \frac{1}{21},
\qquad
x_2^{(2)} = \frac{8}{9}x_1^{(1)} - \frac{4}{9} = \frac{8}{9}\cdot\frac{3}{7} - \frac{4}{9} = -\frac{4}{63}.
\]
General component-wise formula:
\[
x_i^{(k+1)} = \Bigl(b_i - \sum_{j\ne i} a_{ij}x_j^{(k)}\Bigr)\big/ a_{ii}.
\]
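A compact NumPy sketch of the Jacobi iteration for the 2 × 2 example above (the function name is illustrative):

```python
import numpy as np

def jacobi(A, b, x0, iters):
    """Jacobi: x_i^(k+1) = (b_i - sum_{j != i} a_ij x_j^(k)) / a_ii."""
    D = np.diag(A)
    R = A - np.diag(D)                  # off-diagonal part L + U
    x = x0.astype(float)
    for _ in range(iters):
        x = (b - R @ x) / D             # every component uses the old iterate
    return x

A = np.array([[7., -6.], [-8., 9.]])
b = np.array([3., -4.])
print(jacobi(A, b, np.zeros(2), 2))     # [1/21, -4/63], matching the hand computation
print(jacobi(A, b, np.zeros(2), 200))   # approaches the exact solution [1/5, -4/15]
```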

Gauss-Seidel iteration
The iteration overwrites the approximate solution with the new value as soon as it is computed.
For the same example
\[
\begin{pmatrix} 7 & -6\\ -8 & 9 \end{pmatrix}
\begin{pmatrix} x_1\\ x_2 \end{pmatrix}
=
\begin{pmatrix} 3\\ -4 \end{pmatrix},
\]
the Gauss-Seidel iteration produces, given x(k),
\[
x_1^{(k+1)} = \frac{6}{7}x_2^{(k)} + \frac{3}{7},
\qquad
x_2^{(k+1)} = \frac{8}{9}x_1^{(k+1)} - \frac{4}{9}.
\]
Example calculations: Given x(0) = (0, 0),
\[
x_1^{(1)} = \frac{6}{7}x_2^{(0)} + \frac{3}{7} = \frac{3}{7},
\qquad
x_2^{(1)} = \frac{8}{9}x_1^{(1)} - \frac{4}{9} = \frac{8}{9}\cdot\frac{3}{7} - \frac{4}{9} = -\frac{4}{63}.
\]
General component-wise formula:
\[
x_i^{(k+1)} = \Bigl(b_i - \sum_{j<i} a_{ij}x_j^{(k+1)} - \sum_{j>i} a_{ij}x_j^{(k)}\Bigr)\big/ a_{ii}.
\]
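The corresponding Gauss-Seidel sketch, which overwrites each component as soon as its new value is available:

```python
import numpy as np

def gauss_seidel(A, b, x0, iters):
    """Gauss-Seidel: use new x_j for j < i and old x_j for j > i."""
    n = len(b)
    x = x0.astype(float)
    for _ in range(iters):
        for i in range(n):
            sigma = A[i, :i] @ x[:i] + A[i, i+1:] @ x[i+1:]
            x[i] = (b[i] - sigma) / A[i, i]
    return x

A = np.array([[7., -6.], [-8., 9.]])
b = np.array([3., -4.])
print(gauss_seidel(A, b, np.zeros(2), 1))    # [3/7, -4/63], as computed above
print(gauss_seidel(A, b, np.zeros(2), 100))  # close to the exact solution [1/5, -4/15]
```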

Matrix splitting and stationary iterative methods


Consider a matrix splitting of A, A = Q + (A − Q), where Q is invertible. Then,
\[
Qx = (Q - A)x + b
\;\Longleftrightarrow\;
x = Q^{-1}\bigl((Q - A)x + b\bigr) = (I - Q^{-1}A)x + Q^{-1}b,
\]
which suggests an iterative process, defined by writing
\[
x^{(k+1)} = (I - Q^{-1}A)x^{(k)} + Q^{-1}b.
\]

Jacobi and GS as stationary iterative methods


Write A = L + D + U, where D is the diagonal part of A and L and U are its strictly lower
and strictly upper triangular parts.

i) Jacobi: Q = D, A − Q = L + U. ∴ GJ = −D−1 (L + U). In other words,
\[
x^{(k+1)} = -D^{-1}(L+U)x^{(k)} + D^{-1}b.
\]
ii) Gauss-Seidel: Q = D + L, A − Q = U. ∴ GGS = −(D + L)−1 U.
\[
x^{(k+1)} = -(D+L)^{-1}U x^{(k)} + (D+L)^{-1}b.
\]
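A short check that builds both iteration matrices from this splitting and evaluates their spectral radii (which, as discussed next, govern convergence):

```python
import numpy as np

A = np.array([[7., -6.], [-8., 9.]])
D = np.diag(np.diag(A))
L = np.tril(A, -1)                 # strictly lower triangular part
U = np.triu(A,  1)                 # strictly upper triangular part

G_J  = -np.linalg.solve(D, L + U)       # Jacobi:       -D^{-1}(L + U)
G_GS = -np.linalg.solve(D + L, U)       # Gauss-Seidel: -(D + L)^{-1} U

rho = lambda G: np.max(np.abs(np.linalg.eigvals(G)))
print(rho(G_J), rho(G_GS))              # both < 1, so both iterations converge for this A
```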

Convergence
Definition 4.6.1. The spectral radius of an n × n matrix A is

\[
\rho(A) = \max_{\lambda\in\sigma(A)} |\lambda|,
\]
where σ(A) is the set of eigenvalues of A.

Remark 4.6.1. Convergence of the method is related to ||G||. However, G may have a small norm
with respect to some norms and a large norm with respect to others; the spectral radius gives a
complete characterization of convergence.

Theorem 4.6.1. The iterative method x(k+1) = Gx(k) +c converges for any x(0) iff ρ(G) < 1.

Theorem 4.6.2. (Convergence of Jacobi and GS)


If A is diagonally dominant, then Jacobi and GS converge for any starting vector.
Successive Over Relaxation (SOR) (∗ We haven't covered SOR in class)
modifies Gauss-Seidel by adding a relaxation parameter ω:
\[
x^{(k+1)} = \omega x^{(k+1)}_{GS} + (1-\omega)x^{(k)}.
\]
The splitting matrix is Q = (1/ω)D + L and GSOR = (D + ωL)−1 ((1 − ω)D − ωU). If


ω < 1: SOR is called “underrelaxation"
ω > 1: SOR is called “overrelaxation"
ω = 1: SOR = GS.
Remark 4.6.2. 1. Performance can be dramatically improved with a good choice of ω.

2. Disadvantage: the choice of ω is often difficult to make.

Lemma 4.6.1. (Necessary condition for the convergence of SOR)


If the SOR method converges, then ω ∈ (0, 2).

Proof.
\[
G_{SOR} = (D+\omega L)^{-1}\bigl((1-\omega)D - \omega U\bigr)
        = (I+\omega \tilde{L})^{-1}\bigl((1-\omega)I - \omega \tilde{U}\bigr),
\]
where L̃ = D−1 L and Ũ = D−1 U. Since det(I + ωL̃) = 1,
\[
p(\lambda) := \det(\lambda I - G_{SOR})
            = \det\bigl((I+\omega\tilde{L})(\lambda I - G_{SOR})\bigr)
            = \det\bigl((\lambda+\omega-1)I + \omega\lambda\tilde{L} + \omega\tilde{U}\bigr).
\]
Hence,
\[
\pm\prod_{i=1}^{n}\lambda_i(G_{SOR}) = p(0) = \det\bigl((\omega-1)I + \omega\tilde{U}\bigr) = (\omega-1)^n,
\]
which implies that ρ(GSOR) ≥ |ω − 1|. For convergence we need ρ(GSOR) < 1, hence |ω − 1| < 1, i.e.,
ω ∈ (0, 2).

Theorem 4.6.3. If A is SPD, then ρ(GSOR ) < 1 for all 0 < ω < 2.
Optimal relaxation factor
\[
\omega_{\text{opt}} = \frac{2}{1+\sqrt{1-\mu_{\max}^2}} > 1,
\]
where μmax is the largest eigenvalue in magnitude (spectral radius) of the Jacobi iteration matrix.
4.7 Steepest Descent & Conjugate Gradient Methods
Consider Ax = b, where A is an n × n SPD matrix. We will use the following notation for
an inner product in Rn:
\[
\langle x, y\rangle = x^T y = \sum_{i=1}^{n} x_i y_i.
\]
Definition 4.7.1. Let f : Rn → R be twice continuously differentiable. Then the gradient of
f is
\[
\nabla f = \left(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \cdots, \frac{\partial f}{\partial x_n}\right)^T
\]
and the Hessian of f is the n × n matrix
\[
\nabla^2 f = \begin{pmatrix}
\frac{\partial^2 f}{\partial x_1^2} & \cdots & \frac{\partial^2 f}{\partial x_1\,\partial x_n}\\
\vdots & \ddots & \vdots\\
\frac{\partial^2 f}{\partial x_n\,\partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_n^2}
\end{pmatrix}.
\]

Lemma 4.7.1. Let x∗ be a local minimizer of f, and assume that f is twice differentiable
in a neighborhood of x∗. Then ∇f(x∗) = 0 and ∇2 f(x∗) is positive semi-definite (PSD).
Lemma 4.7.2. Assume that f is twice differentiable in a neighborhood of x∗ and that
∇f(x∗) = 0 and ∇2 f(x∗) is positive definite. Then x∗ is a strict local minimizer of f.
Definition 4.7.2. A function is said to be convex if

f (λx + (1 − λ)y) ≤ λf (x) + (1 − λ)f (y) ∀x, y,

where λ ∈ [0, 1].


Lemma 4.7.3. If f : Rn → R is a convex function, then every local minimizer of f is also a
global minimizer.
Theorem 4.7.1. If A is SPD, then the problem of solving Ax = b is equivalent to the
problem of minimizing the quadratic form f(x) = (1/2)xᵀAx − bᵀx.
Proof. First, note that ∇2 f = A and ∇f(x) = Ax − b.
(⇒) Suppose Ax∗ = b (that is, x∗ is the solution of Ax = b). Then,
\[
\nabla f(x^*) = Ax^* - b = 0
\]
and ∇2 f(x∗) = A is positive definite. Hence, x∗ is a local minimizer. Since f(x) = (1/2)xᵀAx −
bᵀx is convex, x∗ is the global minimizer.
(⇐) Suppose that x∗ is the global minimizer of f (hence also a local minimizer). Then,
0 = ∇f(x∗) = Ax∗ − b. Hence, x∗ is a solution of Ax = b.
Gradient descent method

Algorithm 1 Gradient Descent

1: Initialize x0
2: for k = 0, 1, 2, · · · do
3:   xk+1 = xk + αk dk    ▷ dk ∈ Rn: search direction, αk ∈ R: stepsize
4: end for

Steepest descent method


The steepest descent method is a special case of a gradient descent method. For the steepest
descent method, the search direction is given by dk = −∇f (xk ).

Algorithm 2 Steepest Descent Method


1: Initialize x0
2: for k = 0, 1, 2, · · · do
3: Compute dk = −∇f (xk )
4: Set αk by doing either exact or inexact line search
5: xk+1 = xk + αk dk
6: end for

Choosing stepsizes
At each iteration k, compute
\[
\alpha_k = \arg\min_{\alpha} f(x_k + \alpha d_k).
\]
This αk yields the maximum reduction of f at iteration k. The search direction for the
steepest descent method applied to the quadratic form above is
\[
d_k = -\nabla f(x_k) = -(Ax_k - b) = b - Ax_k =: r_k,
\]
which is called the residual. To use the exact line search to determine αk, consider the function
φ(α) = f(xk + αdk). We know that αk needs to satisfy φ′(αk) = 0:
\[
\varphi'(\alpha) = \nabla f(x_k + \alpha d_k)^T d_k
= (A(x_k + \alpha d_k) - b)^T d_k
= (Ax_k - b)^T d_k + \alpha\, d_k^T A^T d_k = 0.
\]
Therefore,
\[
\alpha_k = -\frac{(Ax_k - b)^T d_k}{(Ad_k)^T d_k}
= \frac{d_k^T d_k}{d_k^T A d_k}
= \frac{\langle r_k, r_k\rangle}{\langle A r_k, r_k\rangle}
\quad (\text{since } d_k = r_k).
\]
Algorithm 3 Steepest Descent for Quadratic Form
1: Initialize x0
2: for k = 0, 1, 2, · · · do
3:   Compute rk = b − Axk
4:   if ||rk|| < tol then    ▷ Equivalent to ||∇f(xk)|| < tol
5:     Stop
6:   else
7:     Set αk = ⟨rk, rk⟩/⟨Ark, rk⟩
8:     xk+1 = xk + αk rk
9:   end if
10: end for
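A direct transcription of Algorithm 3 into NumPy (the function name and default tolerance are illustrative):

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=10000):
    """Steepest descent for f(x) = 1/2 x^T A x - b^T x with A SPD."""
    x = x0.astype(float)
    for _ in range(max_iter):
        r = b - A @ x                       # residual = -grad f(x)
        if np.linalg.norm(r) < tol:
            break
        alpha = (r @ r) / (r @ (A @ r))     # exact line search step
        x = x + alpha * r
    return x
```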

Remark 4.7.1.
• The negative gradient is normal to the level curves of f(x).

• Two consecutive search directions, rk+1 and rk, are orthogonal:

Proof. First note that
\[
r_{k+1} = b - Ax_{k+1}
= b - A\left(x_k + \frac{\langle r_k, r_k\rangle}{\langle Ar_k, r_k\rangle}r_k\right)
= b - Ax_k - \frac{\langle r_k, r_k\rangle}{\langle Ar_k, r_k\rangle}Ar_k
= r_k - \frac{\langle r_k, r_k\rangle}{\langle Ar_k, r_k\rangle}Ar_k.
\]
Therefore,
\[
\langle r_{k+1}, r_k\rangle
= \left\langle r_k - \frac{\langle r_k, r_k\rangle}{\langle Ar_k, r_k\rangle}Ar_k,\; r_k\right\rangle
= \langle r_k, r_k\rangle - \frac{\langle r_k, r_k\rangle}{\langle Ar_k, r_k\rangle}\langle Ar_k, r_k\rangle = 0.
\]

Example 1. Let f : R2 → R be defined by
\[
f(x) = \frac{1}{2}x^T A x - b^T x,
\]
where
\[
A = \begin{pmatrix} 2 & 0\\ 0 & 3 \end{pmatrix},
\qquad
b = \begin{pmatrix} 2\\ -1 \end{pmatrix},
\qquad
x = \begin{pmatrix} x_1\\ x_2 \end{pmatrix}.
\]
We want to find a local minimizer of this function. Recall that this problem is equivalent to
solving a linear system Ax = b, which has the solution x = [1, −1/3]T. Let x0 = [0, 1]T. Note
that −∇f(x) = b − Ax = [2 − 2x1, −1 − 3x2]T. Therefore, d0 = −∇f(x0) = [2 − 0, −1 − 3]T =
[2, −4]T. With this,
\[
\alpha_0 = \frac{\langle r_0, r_0\rangle}{\langle Ad_0, d_0\rangle}
= \frac{\langle [2,-4], [2,-4]\rangle}{\langle [4,-12], [2,-4]\rangle}
= \frac{20}{56} = \frac{5}{14}.
\]
Then,
\[
x_1 = [0, 1]^T + \frac{5}{14}[2, -4]^T = [10/14,\ 1 - 20/14]^T = [5/7,\ -3/7]^T.
\]
We continue this process until ||rk|| < tol.
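A few lines of NumPy reproduce the first step of this example:

```python
import numpy as np

A = np.array([[2., 0.], [0., 3.]])
b = np.array([2., -1.])
x = np.array([0., 1.])                 # x_0

d = b - A @ x                          # d_0 = r_0 = [2, -4]
alpha = (d @ d) / (d @ (A @ d))        # 20 / 56 = 5/14
x = x + alpha * d                      # x_1 = [5/7, -3/7]
print(d, alpha, x)
```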

Convergence
The rate of convergence is determined by the size of the spectral condition number κ(A, 2).

Definition 4.7.3. Energy norm: ||x||A = √(xᵀAx).

Recall that the spectral condition number κ(A, 2) = σmax/σmin = |λmax|/|λmin| since A is symmetric.

Theorem 4.7.2. Let x∗ be the minimizer of f(x). Then,
\[
\|x_k - x^*\|_A \le \left(\frac{\kappa(A,2)-1}{\kappa(A,2)+1}\right)^k \|x_0 - x^*\|_A.
\]

Remark 4.7.2.
• If κ(A, 2) is large, (κ(A, 2) − 1)/(κ(A, 2) + 1) = 1 − 2/(κ(A, 2) + 1) ≈ 1. Hence, the convergence will be very slow.

• The convergence (or κ(A, 2)) is related to the geometry of the level curves of f(x), which
are ellipsoids. If A = I, then the level curves are circles, so the method converges in
one step. If the ellipsoids are long and skinny, which means a large κ(A, 2), the method
converges slowly.

Conjugate Gradient
A basic idea behind the CG method: successive search directions are conjugate (A-orthogonal),
where orthogonality is measured in a metric better suited to the problem than the Euclidean
metric.

Definition 4.7.4. Let A be an SPD matrix. Then, two vectors x and y are called conjugate
or A-orthogonal provided ⟨x, Ay⟩ = 0.

CG as a direct method
Let {d0, d1, . . . , dn−1} be “mutually (pairwise)" conjugate directions. Then, they are linearly
independent and form a basis for Rn. Let x∗ be the solution of Ax = b, that is, x∗ = A−1 b.
Then,
\[
x^* = \sum_{k=0}^{n-1}\alpha_k d_k \quad\text{for some } \alpha_k\text{'s}.
\]
Then, what should the coefficients αk's be? Let's do some calculations.
\[
\langle d_i, Ax^*\rangle = \sum_{k=0}^{n-1}\alpha_k\langle d_i, Ad_k\rangle
= \alpha_i\langle d_i, Ad_i\rangle \quad(\because\ \langle d_i, Ad_k\rangle = 0 \text{ for } i \ne k).
\]
Therefore,
\[
\alpha_i = \frac{\langle d_i, Ax^*\rangle}{\langle d_i, Ad_i\rangle} = \frac{\langle d_i, b\rangle}{\langle d_i, Ad_i\rangle}.
\]
Therefore, we can compute the αi's directly from b.

4.7.1 CG as an iterative method


If we choose the dk's carefully, we may not need all of them to obtain a good approximation to
x∗. But, how do we do this?

1. Choose x0 and compute the residual r0 = b − Ax0. Then, take d0 = r0.

2. For k = 1, 2, . . ., compute rk−1 = b − Axk−1. Assume that {di}_{i=0}^{k−1} are mutually conjugate.
We want to find dk that is A-orthogonal to {di}_{i=0}^{k−1}. This dk will be built out of the
current residual rk−1 and {di}_{i=0}^{k−1}:
\[
d_k = r_{k-1} - \sum_{i=0}^{k-1}\frac{\langle d_i, Ar_{k-1}\rangle}{\langle d_i, Ad_i\rangle}d_i.
\]
Then, we update
\[
x_k = x_{k-1} + \alpha_k d_k,
\qquad\text{where}\quad
\alpha_k = \frac{\langle d_k, r_{k-1}\rangle}{\langle d_k, Ad_k\rangle}.
\]
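Below is a sketch of the standard short-recurrence form of CG. In exact arithmetic it is equivalent to the Gram-Schmidt construction above: all but one term of the sum vanishes automatically, so only the previous direction is needed.

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10, max_iter=None):
    """Conjugate gradient for SPD A (short-recurrence form)."""
    x = x0.astype(float)
    r = b - A @ x
    d = r.copy()
    rs_old = r @ r
    for _ in range(max_iter or len(b)):
        if np.sqrt(rs_old) < tol:
            break
        Ad = A @ d
        alpha = rs_old / (d @ Ad)           # step length along d_k
        x += alpha * d
        r -= alpha * Ad                     # updated residual
        rs_new = r @ r
        d = r + (rs_new / rs_old) * d       # next A-orthogonal direction
        rs_old = rs_new
    return x

A = np.array([[17., 2.], [2., 7.]])
b = np.array([2., 2.])
print(conjugate_gradient(A, b, np.zeros(2)))   # [2/23, 6/23], reached in 2 iterations
print(np.linalg.solve(A, b))
```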

Illustration of CG vs SDM

Consider solving
    
\[
\begin{pmatrix} 17 & 2\\ 2 & 7 \end{pmatrix}
\begin{pmatrix} x_1\\ x_2 \end{pmatrix}
=
\begin{pmatrix} 2\\ 2 \end{pmatrix}
\]

using the SDM and CG with x(0) = [0, 0]T .


Figure 4.1: Convergence of CG (red) vs SDM (green)

Convergence of the CG method

Theorem 4.7.3. Let x∗ be the minimizer of f(x). Then,
\[
\|x_k - x^*\|_A \le 2\left(\frac{\sqrt{\kappa(A,2)}-1}{\sqrt{\kappa(A,2)}+1}\right)^k \|x_0 - x^*\|_A.
\]
Note that
\[
\frac{\sqrt{\kappa(A,2)}-1}{\sqrt{\kappa(A,2)}+1} \approx 1 - \frac{2}{\sqrt{\kappa(A,2)}} \approx 1
\quad\text{for } \kappa(A,2) \gg 1.
\]

Theorem 4.7.4. Let A ∈ Rn×n be SPD. Then, the CG method will find the solution within
n iterations.

Remark 4.7.3. The CG method terminates in finitely many iterations with the exact solution
(in the absence of all round-off errors). But this is not as good as it sounds, since n can be
very large.

Theorem 4.7.5. Let A be an SPD matrix with eigenvectors {ui}_{i=1}^n, and let b be a linear
combination of k of the eigenvectors of A. That is, b = \sum_{\ell=1}^{k} r_\ell u_{i_\ell}. Then, the CG method
for Ax = b with x0 = 0 will terminate in at most k iterations.

Theorem 4.7.6. Let A be SPD. Assume that there are exactly k ≤ n distinct eigenvalues
of A. Then, the CG method terminates in at most k iterations.
Stopping criteria
We terminate the CG method when the relative residual is small enough. That is, for a prescribed
tolerance ε,
\[
\|b - Ax_k\| \le \epsilon\, \|b\|.
\]

Final remarks
1. If κ(A) ≈ 1, then the CG method converges very rapidly.

2. Even if κ(A) is large, the iteration will perform well if the eigenvalues are clustered in
a few small intervals.

3. Preconditioning: transforming the problem into one whose eigenvalues are clustered
near one. For example, find B such that B ≈ A−1 and solve
\[
BAx = Bb.
\]
Since B ≈ A−1, BA ≈ I and κ(BA) ≤ κ(A). But, be careful! BA might not be SPD
anymore.

Generalized Minimal Residual (GMRES) Method


GMRES is designed for non-symmetric systems. We won't go over the details.
