Chapter 4. Solving Systems of Linear Equations
• $Ax = b \Rightarrow U\tilde{x} = \tilde{b}$:
Reduce the linear system to an equivalent triangular system by performing elementary row operations.

$m_{k1} = a_{k1}/a_{11}$ is called a "multiplier"; $a_{11}$ and the first equation are called the "pivot" element and the "pivot" equation, respectively.
Step 2. After the first step, the reduced system has the form
$$\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
0 & a'_{22} & \cdots & a'_{2n} \\
0 & a'_{32} & \cdots & a'_{3n} \\
\vdots & \vdots & \ddots & \vdots \\
0 & a'_{n2} & \cdots & a'_{nn}
\end{pmatrix},$$
and the same elimination is applied to the $(n-1) \times (n-1)$ subsystem in the lower right corner.
LU factorization & GE
Theorem 4.3.1. If all the pivot elements $a_{kk}^{(k)}$ are nonzero in the process of GE (without pivoting), then $A = LU$, where $L$ is the lower-triangular matrix with $\ell_{ij} = m_{ij}$ for $i > j$ and $\ell_{ii} = 1$, and $U$ is the upper-triangular matrix resulting from the GE.
$$A = \begin{pmatrix} 1 & 1 & -1 \\ 1 & 2 & -2 \\ -2 & 1 & 1 \end{pmatrix}
\;\overset{m_{21}=1,\ m_{31}=-2}{\Longrightarrow}\;
\begin{pmatrix} 1 & 1 & -1 \\ 0 & 1 & -1 \\ 0 & 3 & -1 \end{pmatrix}
\;\overset{m_{32}=3}{\Longrightarrow}\;
\begin{pmatrix} 1 & 1 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & 2 \end{pmatrix}$$

$$L = \begin{pmatrix} 1 & 0 & 0 \\ m_{21} & 1 & 0 \\ m_{31} & m_{32} & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ -2 & 3 & 1 \end{pmatrix}, \qquad
U = \begin{pmatrix} 1 & 1 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & 2 \end{pmatrix}.$$
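As a concrete check of Theorem 4.3.1, here is a minimal sketch of GE without pivoting that records the multipliers in $L$, run on the matrix above (the function name `lu_no_pivot` is mine, not from the notes):

    import numpy as np

    def lu_no_pivot(A):
        """LU factorization by Gaussian elimination without pivoting.
        Assumes every pivot U[k, k] is nonzero (Theorem 4.3.1)."""
        U = A.astype(float).copy()
        n = U.shape[0]
        L = np.eye(n)
        for k in range(n - 1):
            for i in range(k + 1, n):
                L[i, k] = U[i, k] / U[k, k]      # multiplier m_ik
                U[i, k:] -= L[i, k] * U[k, k:]   # row_i <- row_i - m_ik * row_k
        return L, U

    A = np.array([[1., 1., -1.], [1., 2., -2.], [-2., 1., 1.]])
    L, U = lu_no_pivot(A)
    print(L)   # [[1, 0, 0], [1, 1, 0], [-2, 3, 1]]
    print(U)   # [[1, 1, -1], [0, 1, -1], [0, 0, 2]]
    assert np.allclose(L @ U, A)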
Pivoting

Potential difficulties

1. A pivot element is zero: this can be corrected by exchanging the order of the rows ⇒ "pivoting".
Example 4.3.1.
$$\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
= \begin{pmatrix} 1 \\ 2 \end{pmatrix}$$
2. The pivot element is small relative to the other terms in the pivot row: this can lead to
bad numerical results.
Example 4.3.2.
$$\begin{pmatrix} \epsilon & 1 \\ 1 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
= \begin{pmatrix} 1 \\ 2 \end{pmatrix}$$
If $\epsilon$ is very small ($\epsilon \approx 0$), $\det = \epsilon - 1 \approx -1 \neq 0$. However, GE yields
$$\begin{pmatrix} \epsilon & 1 \\ 0 & 1 - \frac{1}{\epsilon} \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
= \begin{pmatrix} 1 \\ 2 - \frac{1}{\epsilon} \end{pmatrix}$$
$$\therefore\; y = \frac{2 - \frac{1}{\epsilon}}{1 - \frac{1}{\epsilon}}.$$
In a computer, if $\epsilon$ is small enough, $2 - \frac{1}{\epsilon} \approx -\frac{1}{\epsilon}$ and likewise $1 - \frac{1}{\epsilon} \approx -\frac{1}{\epsilon}$. For example, let $\epsilon = 10^{-8}$. In a seven-place decimal machine,
$$\therefore\; y = 1 \quad \text{and} \quad x = \frac{1 - y}{\epsilon} = 0.$$
However, the correct solution is
$$x = \frac{1}{1-\epsilon} \approx 1 \quad \text{and} \quad y = \frac{1 - 2\epsilon}{1-\epsilon} \approx 1.$$
Remark 4.3.1. It is not actually the smallness of the coefficient $a_{11}$ by itself that is causing the trouble. Rather, it is the smallness of $a_{11}$ relative to the other elements in its row.
Example 4.3.3.
$$\begin{pmatrix} 1 & \frac{1}{\epsilon} \\ 1 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
= \begin{pmatrix} \frac{1}{\epsilon} \\ 2 \end{pmatrix}$$
For small $\epsilon$,
$$\therefore\; y = \frac{2 - \frac{1}{\epsilon}}{1 - \frac{1}{\epsilon}} \approx 1 \quad \text{and} \quad x = \frac{1}{\epsilon} - \frac{1}{\epsilon}\,y \approx 0.$$
Example 4.3.4. The difficulties will disappear if the order of the system is changed.
$$\begin{pmatrix} 1 & 1 \\ \epsilon & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
= \begin{pmatrix} 2 \\ 1 \end{pmatrix}$$
GE produces
$$\begin{pmatrix} 1 & 1 \\ 0 & 1 - \epsilon \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix}
= \begin{pmatrix} 2 \\ 1 - 2\epsilon \end{pmatrix}.$$
Then,
$$y = \frac{1 - 2\epsilon}{1 - \epsilon} \approx 1 \quad \text{and} \quad x = 2 - y \approx 1.$$
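A short numerical check of Examples 4.3.2 and 4.3.4. In IEEE double precision the breakdown needs a smaller $\epsilon$ than the $\epsilon = 10^{-8}$ quoted for a seven-place machine, so $\epsilon = 10^{-20}$ is used here (my choice, not from the notes):

    eps = 1e-20  # small enough that 1/eps swamps O(1) terms in double precision

    # Example 4.3.2: eliminate with the tiny pivot eps.
    y_bad = (2 - 1/eps) / (1 - 1/eps)   # evaluates to exactly 1.0
    x_bad = (1 - y_bad) / eps           # catastrophic cancellation -> 0.0
    print(x_bad, y_bad)                 # 0.0 1.0  (x is completely wrong)

    # Example 4.3.4: swap the equations so the pivot is 1.
    y_good = (1 - 2*eps) / (1 - eps)    # ~ 1.0
    x_good = 2 - y_good                 # ~ 1.0
    print(x_good, y_good)               # both ~ 1, matching the exact solution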
Remark 4.3.2. In general, a more accurate solution is obtained when the equations are arranged such that the pivot equation has the largest possible pivot element ⇒ "partial pivoting".
Partial pivoting: choose the $k$-th pivot row based on the largest absolute value.
$$Ax = b \implies PAx = Pb \quad (P:\ \text{permutation matrix}).$$
Factoring $PA = LU$, the system is then solved in two triangular stages:
$$Lz = Pb \ \text{(forward substitution)}, \qquad Ux = z \ \text{(back substitution)}.$$
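As a sketch, SciPy's LU routines follow exactly this pivoted factor-then-substitute pattern; `lu_factor` returns the packed factors plus the pivot indices, and `lu_solve` performs the two triangular solves:

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    A = np.array([[0., 1.], [1., 1.]])   # Example 4.3.1: a11 = 0 forces a row swap
    b = np.array([1., 2.])

    lu, piv = lu_factor(A)      # PA = LU with partial pivoting; piv encodes P
    x = lu_solve((lu, piv), b)  # Lz = Pb (forward), then Ux = z (back)
    print(x)                    # [1. 1.], since x = y = 1 solves the system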
Scaled partial pivoting chooses each pivot relative to the size of its row:
(a) Compute a scale for each row: $s_i = \max_{1 \le j \le n} |a_{ij}|$.
(b) Select the pivot row for which $|a_{i1}|/s_i$ is largest. The index chosen is denoted by $P_1$, etc.

Note: we want to keep track of the $P_i$'s (i.e., the permutation vector $(P_1, \cdots, P_n)$). We start with $(P_1, \cdots, P_n) = (1, 2, \cdots, n)$ and exchange $P_k$ and $P_j$ at the step for the $k$-th column.
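A small sketch of this bookkeeping, assuming the scale factors $s_i$ defined in (a); rows are never physically swapped, only the permutation vector is updated (the function name and test matrix are mine):

    import numpy as np

    def scaled_pivot_lu(A):
        """GE with scaled partial pivoting, tracking the permutation vector p."""
        A = A.astype(float).copy()
        n = A.shape[0]
        p = np.arange(n)                       # start with (P_1,...,P_n) = (1,...,n)
        s = np.abs(A).max(axis=1)              # scale s_i = max_j |a_ij|
        for k in range(n - 1):
            # pick the remaining row whose leading entry is largest relative to s_i
            j = k + np.argmax(np.abs(A[p[k:], k]) / s[p[k:]])
            p[[k, j]] = p[[j, k]]              # exchange P_k and P_j
            for i in range(k + 1, n):
                m = A[p[i], k] / A[p[k], k]    # multiplier
                A[p[i], k] = m                 # store m in place (the L part)
                A[p[i], k+1:] -= m * A[p[k], k+1:]
        return A, p

    A = np.array([[2., 3., -6.], [1., -6., 8.], [3., -2., 1.]])
    LU, p = scaled_pivot_lu(A)
    print(p)   # the pivoting order chosen by the scale test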
Operation Counts
Let us count the number of arithmetic operations involved in the Gaussian Elimination (both
factorization phase and solution phase). As the multiplications and divisions are much more
time consuming than additions and subtractions, we count only multiplications and divisions.
Solving $m$ $n \times n$ linear systems $Ax^{(i)} = b^{(i)}$, $i = 1, \ldots, m$, using the LU factorization will take
$$\approx \tfrac{1}{3}n^3 + \bigl(\tfrac{1}{2} + m\bigr)n^2 \ \text{ops}.$$
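For a concrete instance of this count (the numbers are mine): with $n = 1000$ and $m = 100$ right-hand sides,
$$\tfrac{1}{3}(1000)^3 + \bigl(\tfrac{1}{2} + 100\bigr)(1000)^2 \approx 3.3\times 10^8 + 1.0\times 10^8 \approx 4.3\times 10^8 \ \text{ops},$$
so the one-time $O(n^3)$ factorization dominates until $m$ grows to about $n/3$.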
Systems Requiring No Pivoting
Linear systems we encounter while solving differential equations numerically often have rel-
atively large diagonal elements and pivoting is sometimes unnecessary.
Example:
$$\begin{pmatrix}
4 & -1 & -1 & 0 \\
-1 & 4 & 0 & -1 \\
-1 & 0 & 4 & -1 \\
0 & -1 & -1 & 4
\end{pmatrix}$$
arises from the finite difference method.
Theorem 4.3.3. If A is positive definite, the GE with no pivoting can be applied to solve
Ax = b, and a zero pivot will not be encountered. Thus, A is nonsingular.
Tridiagonal System
Definition 4.3.3. A matrix $A = (a_{ij})$ is said to be tridiagonal if $a_{ij} = 0$ for all $(i, j)$ such that $|i - j| > 1$.
The back substitution phase reads
$$x_n \longleftarrow b_n/d_n, \qquad x_{n-1} \longleftarrow (b_{n-1} - c_{n-1}x_n)/d_{n-1}, \qquad \ldots$$
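A sketch of the complete tridiagonal solve (the Thomas algorithm); the forward-elimination phase is not shown in this excerpt, so the first loop below is a standard reconstruction, with `a`, `d`, `c` denoting the sub-, main, and super-diagonals:

    import numpy as np

    def tridiag_solve(a, d, c, b):
        """Solve a tridiagonal system in O(n) ops.
        a: sub-diagonal (length n-1), d: diagonal (n), c: super-diagonal (n-1).
        Works on copies of d and b; assumes no pivoting is needed."""
        n = len(d)
        d, b = d.astype(float).copy(), b.astype(float).copy()
        for i in range(1, n):                 # forward elimination
            m = a[i - 1] / d[i - 1]
            d[i] -= m * c[i - 1]
            b[i] -= m * b[i - 1]
        x = np.empty(n)
        x[-1] = b[-1] / d[-1]                 # x_n = b_n / d_n
        for i in range(n - 2, -1, -1):        # back substitution
            x[i] = (b[i] - c[i] * x[i + 1]) / d[i]
        return x

    # quick check against a dense solve
    a = np.array([-1., -1.]); d = np.array([4., 4., 4.]); c = np.array([-1., -1.])
    b = np.array([1., 2., 3.])
    A = np.diag(d) + np.diag(a, -1) + np.diag(c, 1)
    assert np.allclose(tridiag_solve(a, d, c, b), np.linalg.solve(A, b))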
Vector norms
2. 2-norm: $\|x\|_2 = \bigl(\sum_{i=1}^{n} |x_i|^2\bigr)^{1/2}$.
Matrix norms
Theorem 4.4.1. If $\|\cdot\|$ is any vector norm on $\mathbb{R}^n$, then
$$\|A\| = \sup_{\|u\| = 1} \|Au\|$$
defines a matrix norm on the $n \times n$ matrices (the matrix norm subordinate to the given vector norm).
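As a quick illustration (the example matrix is mine), the norms induced by the 1- and $\infty$-vector norms reduce to the maximum absolute column and row sums, which NumPy exposes directly:

    import numpy as np

    A = np.array([[7., -6.], [-8., 9.]])
    print(np.linalg.norm(A, 1))       # induced 1-norm: max column sum = 15
    print(np.linalg.norm(A, np.inf))  # induced inf-norm: max row sum = 17
    print(np.linalg.norm(A, 2))       # induced 2-norm: largest singular value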
Jacobi iteration
Consider
$$\begin{pmatrix} 7 & -6 \\ -8 & 9 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= \begin{pmatrix} 3 \\ -4 \end{pmatrix}.$$
We rewrite the system in equation form and solve the $i$-th equation for $x_i$, $i = 1, \ldots, n$:
$$x_1 = \tfrac{6}{7}x_2 + \tfrac{3}{7}, \qquad x_2 = \tfrac{8}{9}x_1 - \tfrac{4}{9}.$$
Then, given $x^{(k)}$, the formula for the Jacobi iteration is
$$x_1^{(k+1)} = \tfrac{6}{7}x_2^{(k)} + \tfrac{3}{7}, \qquad x_2^{(k+1)} = \tfrac{8}{9}x_1^{(k)} - \tfrac{4}{9}.$$
Example calculations:
Given $x^{(0)} = (0, 0)$,
$$x_1^{(1)} = \tfrac{6}{7}x_2^{(0)} + \tfrac{3}{7} = \tfrac{3}{7}, \qquad x_2^{(1)} = \tfrac{8}{9}x_1^{(0)} - \tfrac{4}{9} = -\tfrac{4}{9},$$
$$x_1^{(2)} = \tfrac{6}{7}x_2^{(1)} + \tfrac{3}{7} = \tfrac{6}{7}\bigl(-\tfrac{4}{9}\bigr) + \tfrac{3}{7} = \tfrac{3}{63} = \tfrac{1}{21}, \qquad x_2^{(2)} = \tfrac{8}{9}x_1^{(1)} - \tfrac{4}{9} = \tfrac{8}{9}\cdot\tfrac{3}{7} - \tfrac{4}{9} = -\tfrac{4}{63}.$$
General component-wise formula:
$$x_i^{(k+1)} = \Bigl(b_i - \sum_{j \neq i} a_{ij}x_j^{(k)}\Bigr)\Big/a_{ii}.$$
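A minimal sketch of this component-wise Jacobi sweep, run on the 2 × 2 example above (the tolerance, iteration cap, and function name are my choices):

    import numpy as np

    def jacobi(A, b, x0, tol=1e-10, max_iter=500):
        """Jacobi iteration: every component of x^(k+1) uses only x^(k)."""
        x = x0.astype(float)
        for _ in range(max_iter):
            x_new = np.empty_like(x)
            for i in range(len(b)):
                s = A[i] @ x - A[i, i] * x[i]      # sum over j != i of a_ij x_j^(k)
                x_new[i] = (b[i] - s) / A[i, i]
            if np.linalg.norm(x_new - x) < tol:
                return x_new
            x = x_new
        return x

    A = np.array([[7., -6.], [-8., 9.]])
    b = np.array([3., -4.])
    print(jacobi(A, b, np.zeros(2)))   # converges to [0.2, -0.2666...]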
Gauss-Seidel iteration
The iteration overwrites the approximate solution with the new value as soon as it is computed. For the same example
$$\begin{pmatrix} 7 & -6 \\ -8 & 9 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= \begin{pmatrix} 3 \\ -4 \end{pmatrix},$$
the Gauss-Seidel iteration produces, given $x^{(k)}$,
$$x_1^{(k+1)} = \tfrac{6}{7}x_2^{(k)} + \tfrac{3}{7}, \qquad x_2^{(k+1)} = \tfrac{8}{9}x_1^{(k+1)} - \tfrac{4}{9}.$$
Example calculations: Given $x^{(0)} = (0, 0)$,
$$x_1^{(1)} = \tfrac{6}{7}x_2^{(0)} + \tfrac{3}{7} = \tfrac{3}{7}, \qquad x_2^{(1)} = \tfrac{8}{9}x_1^{(1)} - \tfrac{4}{9} = \tfrac{8}{9}\cdot\tfrac{3}{7} - \tfrac{4}{9} = -\tfrac{4}{63}.$$
General component-wise formula:
$$x_i^{(k+1)} = \Bigl(b_i - \sum_{j<i} a_{ij}x_j^{(k+1)} - \sum_{j>i} a_{ij}x_j^{(k)}\Bigr)\Big/a_{ii}.$$
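The same sketch with in-place updates turns Jacobi into Gauss-Seidel: components computed earlier in the sweep are used immediately, matching the formula above (helper name mine):

    import numpy as np

    def gauss_seidel(A, b, x0, tol=1e-10, max_iter=500):
        """Gauss-Seidel: overwrite x[i] as soon as it is computed."""
        x = x0.astype(float)
        for _ in range(max_iter):
            x_old = x.copy()
            for i in range(len(b)):
                s = A[i] @ x - A[i, i] * x[i]   # mixes new x[j<i] and old x[j>i]
                x[i] = (b[i] - s) / A[i, i]
            if np.linalg.norm(x - x_old) < tol:
                break
        return x

    A = np.array([[7., -6.], [-8., 9.]])
    b = np.array([3., -4.])
    print(gauss_seidel(A, b, np.zeros(2)))  # [0.2, -0.2666...], in fewer sweeps than Jacobi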
Both methods fit a general splitting framework: choose a nonsingular matrix $Q$ (the diagonal part of $A$ for Jacobi, the lower-triangular part for Gauss-Seidel). The solution satisfies
$$Qx = (Q - A)x + b \iff x = Q^{-1}\bigl((Q - A)x + b\bigr) = (I - Q^{-1}A)x + Q^{-1}b,$$
which suggests the iteration $Qx^{(k+1)} = (Q - A)x^{(k)} + b$.
Convergence
Definition 4.6.1. The spectral radius of an $n \times n$ matrix $A$ is
$$\rho(A) = \max\{|\lambda| : \lambda \text{ is an eigenvalue of } A\}.$$

Remark 4.6.1. Convergence of the method is related to $\|G\|$, where $G = I - Q^{-1}A$ is the iteration matrix. But $\|G\|$ can be small in some norms and quite large in others; the spectral radius allows a complete description.

Theorem 4.6.1. The iterative method $x^{(k+1)} = Gx^{(k)} + c$ converges for any $x^{(0)}$ iff $\rho(G) < 1$.
Theorem 4.6.3. If $A$ is SPD, then $\rho(G_{SOR}) < 1$ for all $0 < \omega < 2$.

Optimal relaxation factor
$$\omega_{opt} = \frac{2}{1 + \sqrt{1 - \mu_{\max}^2}} > 1,$$
where $\mu_{\max}$ is the largest modulus of the eigenvalues $\mu$ of the Jacobi iteration matrix.
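A hedged sketch of SOR using this recipe on the finite-difference matrix from earlier; whether this $\omega$ is truly optimal relies on assumptions (e.g., a consistently ordered matrix) not restated in these notes:

    import numpy as np

    def sor(A, b, x0, omega, tol=1e-12, max_iter=1000):
        """SOR sweep: the Gauss-Seidel update blended with the old value by omega."""
        x = x0.astype(float)
        for _ in range(max_iter):
            x_old = x.copy()
            for i in range(len(b)):
                s = A[i] @ x - A[i, i] * x[i]
                x_gs = (b[i] - s) / A[i, i]           # Gauss-Seidel value
                x[i] = (1 - omega) * x[i] + omega * x_gs
            if np.linalg.norm(x - x_old) < tol:
                break
        return x

    A = np.array([[4., -1., -1., 0.], [-1., 4., 0., -1.],
                  [-1., 0., 4., -1.], [0., -1., -1., 4.]])
    b = np.ones(4)

    # spectral radius of the Jacobi iteration matrix G_J = I - D^{-1} A
    G_J = np.eye(4) - A / np.diag(A)[:, None]
    mu_max = max(abs(np.linalg.eigvals(G_J)))      # 0.5 for this matrix
    omega = 2 / (1 + np.sqrt(1 - mu_max**2))       # ~ 1.07
    print(omega, sor(A, b, np.zeros(4), omega))    # solution is [0.5, 0.5, 0.5, 0.5]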
4.7 Steepest Descent & Conjugate Gradient Methods
Consider $Ax = b$, where $A$ is an $n \times n$ SPD matrix. We will use the following notation for the inner product in $\mathbb{R}^n$:
$$\langle x, y\rangle = x^T y = \sum_{i=1}^{n} x_i y_i.$$
Definition 4.7.1. Let $f: \mathbb{R}^n \to \mathbb{R}$ be twice continuously differentiable; then the gradient of $f$ is
$$\nabla f = \left(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \cdots, \frac{\partial f}{\partial x_n}\right)^T$$
and the Hessian of $f$ is the $n \times n$ matrix
$$\nabla^2 f = \begin{pmatrix}
\frac{\partial^2 f}{\partial x_1^2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_n^2}
\end{pmatrix}.$$
Lemma 4.7.1. Let $x^*$ be a local minimizer of $f$, and assume that $f$ is twice differentiable in a neighborhood of $x^*$. Then, $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*)$ is positive semi-definite (PSD).

Lemma 4.7.2. Assume that $f$ is twice differentiable in a neighborhood of $x^*$ and that $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*)$ is positive definite. Then, $x^*$ is a strict local minimizer of $f$.
Definition 4.7.2. A function $f$ is said to be convex if
$$f(\lambda x + (1 - \lambda)y) \le \lambda f(x) + (1 - \lambda)f(y) \quad \text{for all } x, y \text{ and all } \lambda \in [0, 1].$$
For the quadratic form $f(x) = \frac{1}{2}x^T Ax - b^T x$ with $A$ SPD, the unique minimizer $x^*$ satisfies
$$\nabla f(x^*) = Ax^* - b = 0,$$
so minimizing $f$ is equivalent to solving $Ax = b$.
Choosing stepsizes
At each iteration $k$, compute
$$\alpha_k = \arg\min_{\alpha} f(x_k + \alpha d_k).$$
This $\alpha_k$ yields the maximum reduction of $f$ along $d_k$ at iteration $k$. The search direction for the steepest descent method applied to the quadratic form above is
$$d_k = -\nabla f(x_k) = -(Ax_k - b) = b - Ax_k =: r_k,$$
which is called the residual. To use the exact line search to determine $\alpha_k$, consider the function $\phi(\alpha) = f(x_k + \alpha d_k)$. We know that $\alpha_k$ needs to satisfy $\phi'(\alpha_k) = 0$:
$$\phi'(\alpha) = \nabla f(x_k + \alpha d_k)^T d_k = (A(x_k + \alpha d_k) - b)^T d_k = (Ax_k - b)^T d_k + \alpha\, d_k^T A d_k = 0.$$
Therefore,
$$\alpha_k = -\frac{(Ax_k - b)^T d_k}{(Ad_k)^T d_k} = \frac{d_k^T d_k}{d_k^T A d_k} = \frac{\langle r_k, r_k\rangle}{\langle Ar_k, r_k\rangle},$$
where the last two equalities use $d_k = r_k$.
Algorithm 3 Steepest Descent for Quadratic Form
 1: Initialize x0
 2: for k = 0, 1, 2, · · · do
 3:   Compute rk = b − A xk
 4:   if ||rk|| < tol then      ▷ equivalent to ||∇f(xk)|| < tol
 5:     Stop
 6:   else
 7:     Set αk = ⟨rk, rk⟩ / ⟨A rk, rk⟩
 8:     xk+1 = xk + αk rk
 9:   end if
10: end for
Remark 4.7.1.
• The negative gradient is normal to the level curves of $f(x)$.

Consider
$$f(x) = \frac{1}{2}x^T Ax - b^T x,$$
where
$$A = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix}, \qquad b = \begin{pmatrix} 2 \\ -1 \end{pmatrix}, \qquad x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}.$$
We want to find a local minimizer of this function. Recall that this problem is equivalent to solving the linear system $Ax = b$, which has the solution $x = [1, -1/3]^T$. Let $x_0 = [0, 1]^T$. Note that $-\nabla f(x) = b - Ax = [2 - 2x_1, -1 - 3x_2]^T$. Therefore, $d_0 = -\nabla f(x_0) = [2 - 0, -1 - 3]^T = [2, -4]^T$. With this,
$$\alpha_0 = \frac{\langle r_0, r_0\rangle}{\langle Ad_0, d_0\rangle} = \frac{\langle[2, -4], [2, -4]\rangle}{\langle[4, -12], [2, -4]\rangle} = \frac{20}{56} = \frac{5}{14}.$$
Then,
$$x_1 = [0, 1]^T + \tfrac{5}{14}[2, -4]^T = [10/14,\ 1 - 20/14]^T = [5/7, -3/7]^T.$$
We continue this process until $\|r_k\| <$ tol.
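A minimal Python sketch of Algorithm 3 that reproduces the hand computation above (the tolerance and iteration cap are my choices):

    import numpy as np

    def steepest_descent(A, b, x0, tol=1e-10, max_iter=1000):
        """Algorithm 3: steepest descent for f(x) = 0.5 x^T A x - b^T x, A SPD."""
        x = x0.astype(float)
        for _ in range(max_iter):
            r = b - A @ x                      # r_k = -grad f(x_k)
            if np.linalg.norm(r) < tol:
                break
            alpha = (r @ r) / (r @ (A @ r))    # exact line search
            x = x + alpha * r
        return x

    A = np.array([[2., 0.], [0., 3.]])
    b = np.array([2., -1.])
    x = steepest_descent(A, b, np.array([0., 1.]))
    print(x)   # -> [1, -1/3]; the first step lands on x1 = [5/7, -3/7]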
Convergence
The rate of convergence is determined by the size of the spectral condition number $\kappa(A, 2)$.

Definition 4.7.3. Energy norm: $\|x\|_A = \sqrt{x^T A x}$.

Recall that the spectral condition number is $\kappa(A, 2) = \frac{\sigma_{\max}}{\sigma_{\min}} = \frac{|\lambda_{\max}|}{|\lambda_{\min}|}$, since $A$ is symmetric.
Remark 4.7.2.
• If $\kappa(A, 2)$ is large, $\frac{\kappa(A,2) - 1}{\kappa(A,2) + 1} = 1 - \frac{2}{\kappa(A,2) + 1} \approx 1$. Hence, the convergence will be very slow.

• The convergence (or $\kappa(A, 2)$) is related to the geometry of the level curves of $f(x)$, which are ellipsoids. If $A = I$, then the level curves are circles, so the method converges in one step. If the ellipsoids are long and skinny, which means a large $\kappa(A, 2)$, the method converges slowly.
Conjugate Gradient
A basic idea behind the CG method: successive search directions are orthogonal, where orthogonality is measured in a metric better suited to the problem than the Euclidean metric.

Definition 4.7.4. Let $A$ be an SPD matrix. Then, two vectors $x$ and $y$ are called conjugate or $A$-orthogonal provided $\langle x, Ay\rangle = 0$.
CG as a direct method
Let $\{d_0, d_1, \ldots, d_{n-1}\}$ be "mutually (pairwise)" conjugate directions. Then, they are linearly independent and form a basis for $\mathbb{R}^n$. Let $x^*$ be the solution of $Ax = b$, that is, $x^* = A^{-1}b$. Then,
$$x^* = \sum_{k=0}^{n-1} \alpha_k d_k \quad \text{for some } \alpha_k\text{'s}.$$
Then, what should the coefficients $\alpha_k$ be? Let's do some calculations.
$$\langle d_i, Ax^*\rangle = \sum_{k=0}^{n-1} \alpha_k \langle d_i, Ad_k\rangle = \alpha_i \langle d_i, Ad_i\rangle \quad (\because\ \langle d_i, Ad_k\rangle = 0 \text{ for } i \neq k).$$
Therefore,
$$\alpha_i = \frac{\langle d_i, Ax^*\rangle}{\langle d_i, Ad_i\rangle} = \frac{\langle d_i, b\rangle}{\langle d_i, Ad_i\rangle}.$$
Therefore, we can compute the $\alpha_i$'s directly from $b$.
In the CG method the search directions are built from the residuals by $A$-orthogonalizing each new residual against all previous directions:
$$d_k = r_{k-1} - \sum_{i=0}^{k-1} \frac{\langle d_i, Ar_{k-1}\rangle}{\langle d_i, Ad_i\rangle}\, d_i.$$
Then, we update
$$x_k = x_{k-1} + \alpha_k d_k, \quad \text{where} \quad \alpha_k = \frac{\langle d_k, r_{k-1}\rangle}{\langle d_k, Ad_k\rangle}.$$
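In practice CG is implemented with a two-term recurrence; in exact arithmetic it is equivalent to the Gram-Schmidt construction above, since the new residual is already $A$-orthogonal to all but the most recent direction. A minimal sketch (variable names mine), applied to the 2 × 2 system of the illustration below:

    import numpy as np

    def conjugate_gradient(A, b, x0, tol=1e-10):
        """CG for SPD A; in exact arithmetic terminates within n iterations."""
        x = x0.astype(float)
        r = b - A @ x
        d = r.copy()
        for _ in range(len(b)):
            if np.linalg.norm(r) < tol * np.linalg.norm(b):   # relative residual
                break
            Ad = A @ d
            alpha = (r @ r) / (d @ Ad)
            x = x + alpha * d
            r_new = r - alpha * Ad
            beta = (r_new @ r_new) / (r @ r)   # makes d_{k+1} A-orthogonal to d_k
            d = r_new + beta * d
            r = r_new
        return x

    A = np.array([[17., 2.], [2., 7.]])
    b = np.array([2., 2.])
    print(conjugate_gradient(A, b, np.zeros(2)))   # exact solution in <= 2 steps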
Illustration of CG vs SDM
Consider solving
$$\begin{pmatrix} 17 & 2 \\ 2 & 7 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= \begin{pmatrix} 2 \\ 2 \end{pmatrix}.$$
(Figure omitted: iterate trajectories of CG and SDM starting from the same $x_0$.)
Note that
$$\frac{\sqrt{\kappa(A,2)} - 1}{\sqrt{\kappa(A,2)} + 1} \approx 1 - \frac{2}{\sqrt{\kappa(A,2)}} \approx 1 \quad \text{for } \kappa(A,2) \gg 1.$$
Theorem 4.7.4. Let $A \in \mathbb{R}^{n\times n}$ be SPD. Then, the CG method will find the solution within $n$ iterations.

Remark 4.7.3. The CG method terminates in finitely many iterations with the exact solution (in the absence of all round-off errors). But this is not as good as it sounds, since $n$ can be very large.
Theorem 4.7.5. Let $A$ be an SPD matrix with eigenvectors $\{u_i\}_{i=1}^{n}$, and let $b$ be a linear combination of $k$ of the eigenvectors of $A$, that is, $b = \sum_{\ell=1}^{k} r_\ell\, u_{i_\ell}$. Then, the CG method for $Ax = b$ with $x_0 = 0$ will terminate in at most $k$ iterations.
Theorem 4.7.6. Let A be SPD. Assume that there are exactly k ≤ n distinct eigenvalues
of A. Then, the CG method terminates in at most k iterations.
Stopping criteria
We terminate the CG method when the (relative) residual is small enough; that is, when
$$\frac{\|r_k\|}{\|b\|} \le \delta$$
for a prescribed tolerance $\delta$.
Final remarks
1. If $\kappa(A) \approx 1$, then CG converges very rapidly.
2. Even if $\kappa(A)$ is large, the iteration will perform well if the eigenvalues are clustered in a few small intervals.
3. Preconditioning: transforming the problem into one with eigenvalues clustered near one. For example, find $B$ such that $B \approx A^{-1}$ and solve
$$BAx = Bb.$$
Since $B \approx A^{-1}$, $BA \approx I$ and $\kappa(BA) \le \kappa(A)$. But be careful! $BA$ might not be SPD anymore (see the sketch below).
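This caveat is why libraries implement preconditioned CG, which applies $B \approx A^{-1}$ in a way that preserves symmetry. A minimal sketch with SciPy's CG and a Jacobi (inverse-diagonal) preconditioner; the test matrix and tolerances are my choices, not from the notes:

    import numpy as np
    from scipy.sparse.linalg import cg, LinearOperator

    A = np.array([[100., 1.], [1., 0.02]])   # badly scaled SPD test matrix
    b = np.array([1., 1.])

    # Jacobi preconditioner B = D^{-1}: itself SPD, applied as an operator,
    # so the preconditioned iteration stays symmetric even though B A is not.
    d_inv = 1.0 / np.diag(A)
    M = LinearOperator(A.shape, matvec=lambda v: d_inv * v, dtype=float)

    x, info = cg(A, b, M=M)   # preconditioned CG
    print(info)               # 0 on successful convergence
    assert np.allclose(A @ x, b)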