Matrix Functions:

Theory and Algorithms

Nick Higham
Department of Mathematics
University of Manchester

higham@ma.man.ac.uk
http://www.ma.man.ac.uk/~higham/

Includes joint work with Philip Davies

Function of Matrix – p.1/42


OUTLINE
▶ Definitions of f(A)
Applications
Algorithms for particular f
Schur–Parlett algorithm for general f
Computing f(A)b



Defining by Substitution
Want to define $f: \mathbb{C}^{n\times n} \to \mathbb{C}^{n\times n}$, but not elementwise.
Given f(t), can define f(A) by substituting A for t:
$$f(t) = \frac{1+t^2}{1-t} \;\Rightarrow\; f(A) = (I-A)^{-1}(I+A^2).$$
$$\log(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots, \quad |x| < 1$$
$$\Rightarrow\; \log(I+A) = A - \frac{A^2}{2} + \frac{A^3}{3} - \frac{A^4}{4} + \cdots, \quad \rho(A) < 1.$$
Works for f
a polynomial,
a rational function,
or a function with a convergent power series.

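A minimal pure-Python sketch of definition by substitution for the log series, assuming matrices are stored as lists of rows. The names `matmul` and `log_I_plus_A` are illustrative, and summing a fixed number of terms is an assumption that is only reasonable when ρ(A) is well below 1.

```python
def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def log_I_plus_A(A, terms=60):
    """Truncated series log(I + A) = A - A^2/2 + A^3/3 - ..., valid for rho(A) < 1."""
    n = len(A)
    result = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in A]          # holds A^k, starting with k = 1
    for k in range(1, terms + 1):
        coeff = (-1) ** (k + 1) / k
        for i in range(n):
            for j in range(n):
                result[i][j] += coeff * power[i][j]
        power = matmul(power, A)
    return result


# Upper triangular A with rho(A) = 0.3.
L = log_I_plus_A([[0.1, 0.2], [0.0, 0.3]])
print(L)
```

For this upper triangular A the diagonal of the result is log(1.1), log(1.3), and the (1,2) entry should equal 0.2·(log 1.1 − log 1.3)/(1.1 − 1.3).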
Multiplicity of Definitions

There have been proposed in the literature since 1880 eight distinct definitions of a matric function, by Weyr, Sylvester and Buchheim, Giorgi, Cartan, Fantappiè, Cipolla, Schwerdtfeger and Richter.
— R. F. Rinehart,
The Equivalence of Definitions of a Matric Function,
Amer. Math. Monthly (1955)



Cauchy Integral Theorem
Definition 1
$$f(A) = \frac{1}{2\pi i}\int_\Gamma f(z)(zI - A)^{-1}\,dz,$$
where f is analytic on and inside a closed contour Γ which encloses λ(A).



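Jordan Canonical Form
$$Z^{-1}AZ = J = \mathrm{diag}(J_1, J_2, \dots, J_p), \qquad J_k = \begin{bmatrix} \lambda_k & 1 & & \\ & \lambda_k & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_k \end{bmatrix} \in \mathbb{C}^{m_k \times m_k}.$$
Definition 2
$$f(A) = Zf(J)Z^{-1} = Z\,\mathrm{diag}(f(J_k))\,Z^{-1},$$
$$f(J_k) = \begin{bmatrix} f(\lambda_k) & f'(\lambda_k) & \cdots & \dfrac{f^{(m_k-1)}(\lambda_k)}{(m_k-1)!} \\ & f(\lambda_k) & \ddots & \vdots \\ & & \ddots & f'(\lambda_k) \\ & & & f(\lambda_k) \end{bmatrix}.$$
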
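The upper triangular Toeplitz form of f(J_k) can be sketched in a few lines of Python. The function name `f_jordan_block` is hypothetical, and the caller is assumed to supply the derivatives $f^{(j)}(\lambda_k)$ directly.

```python
import math


def f_jordan_block(derivs):
    """Return f(J_k) for an m x m Jordan block J_k with eigenvalue lambda.

    derivs[j] must hold f^{(j)}(lambda) for j = 0, ..., m-1; entry (i, j) of the
    upper triangular Toeplitz result is f^{(j-i)}(lambda) / (j-i)!.
    """
    m = len(derivs)
    F = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i, m):
            F[i][j] = derivs[j - i] / math.factorial(j - i)
    return F


# f = sqrt at lambda = 4: f = 2, f' = 1/4, f'' = -1/32.
F = f_jordan_block([2.0, 0.25, -1.0 / 32])
print(F)   # [[2.0, 0.25, -0.015625], [0.0, 2.0, 0.25], [0.0, 0.0, 2.0]]
```

Squaring this F gives back the 3 × 3 Jordan block with eigenvalue 4, as expected of a square root.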


Interpolation
Definition 3 (Sylvester, 1883; Buchheim, 1886) Let $\lambda_1, \dots, \lambda_s$ be the distinct eigenvalues of A and $n_i$ the index of $\lambda_i$ (the size of the largest Jordan block for $\lambda_i$). Then f(A) = r(A), where r is the unique Hermite interpolating polynomial of degree less than $\sum_{i=1}^s n_i$ satisfying the interpolation conditions
$$r^{(j)}(\lambda_i) = f^{(j)}(\lambda_i), \quad j = 0{:}\,n_i - 1, \quad i = 1{:}\,s.$$
The polynomial r depends on A.
This definition preserves functional relations $G(f_1, \dots, f_p) = 0$, where G is a polynomial, e.g., $\sin^2(A) + \cos^2(A) = I$.
But of course $e^{A+B} \ne e^A e^B$ in general.

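For a 2 × 2 matrix with distinct eigenvalues the Hermite conditions reduce to Lagrange interpolation at $\lambda_1, \lambda_2$, so Definition 3 can be sketched as below. `f_2x2_interp` is a hypothetical helper; it uses the closed-form eigenvalues of a 2 × 2 matrix and raises an error when they coincide.

```python
import cmath


def f_2x2_interp(A, f):
    """f(A) = r(A) for 2x2 A with distinct eigenvalues l1, l2, r the linear interpolant:
    r(A) = f(l1) (A - l2 I)/(l1 - l2) + f(l2) (A - l1 I)/(l2 - l1)."""
    (a, b), (c, d) = A
    disc = cmath.sqrt((a - d) ** 2 + 4 * b * c)
    l1, l2 = (a + d + disc) / 2, (a + d - disc) / 2
    if l1 == l2:
        raise ValueError("eigenvalues must be distinct")
    w1, w2 = f(l1) / (l1 - l2), f(l2) / (l2 - l1)
    I = [[1, 0], [0, 1]]
    return [[w1 * (A[i][j] - l2 * I[i][j]) + w2 * (A[i][j] - l1 * I[i][j])
             for j in range(2)] for i in range(2)]


A = [[1.0, 2.0], [3.0, 4.0]]
S = f_2x2_interp(A, cmath.sin)
C = f_2x2_interp(A, cmath.cos)
print(S, C)
```

For this A, $\sin^2(A) + \cos^2(A) = I$ holds to rounding error, as the definition predicts.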


Non-Primary Functions
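Horn & Johnson call the functions given by these definitions primary matrix functions.
But they do not capture all possible functions when A has multiple eigenvalues. E.g.,
$$A = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}, \quad X = \begin{bmatrix} i & 0 \\ 0 & -i \end{bmatrix}, \quad Y = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}.$$
X and Y are square roots of A but are not polynomials in A (every polynomial in A = −I is a scalar multiple of I).
However, A = givens(π) and Y = givens(π/2), where givens(θ) denotes the plane rotation $\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$, so Y is a natural square root.
Virtually all existing theory and methods are for primary functions.
Non-primary functions are sometimes needed when tracking f(A(t)) as eigenvalues of A(t) coalesce.
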
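A quick numerical check of the example, assuming the rotation convention givens(θ) = [[cos θ, −sin θ], [sin θ, cos θ]]; the helper names are illustrative.

```python
import math


def matmul(X, Y):
    """Product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def givens(theta):
    """Plane rotation G(theta) = [[cos, -sin], [sin, cos]] (convention assumed here)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


A = [[-1, 0], [0, -1]]
X = [[1j, 0], [0, -1j]]
Y = [[0, -1], [1, 0]]
print(matmul(X, X) == A, matmul(Y, Y) == A)   # True True
```

Neither square root is a multiple of I, and givens(π/2) squared agrees with givens(π) = A up to rounding in cos(π/2) and sin(π).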


Textbook References
[1] F. R. Gantmacher. The Theory of Matrices, volume one. Chelsea, New York, 1959.

[2] Gene H. Golub and Charles F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, MD, USA, third edition, 1996.

[3] Roger A. Horn and Charles R. Johnson. Topics in Matrix Analysis. Cambridge University Press, 1991.

[4] Peter Lancaster and Miron Tismenetsky. The Theory of Matrices. Academic Press, London, second edition, 1985.



OUTLINE
Definitions of f(A)
▶ Applications
Algorithms for particular f
Schur–Parlett algorithm for general f
Computing f(A)b



Application: Differential equations
Nuclear magnetic resonance: Solomon equations
$$dM/dt = -RM, \qquad M(0) = I,$$
where M(t) = matrix of intensities and R = symmetric relaxation matrix, so that $M(t) = e^{-Rt}$. NMR workers need to solve both forward and inverse problems.

Exponential time differencing for stiff systems (Cox & Matthews, J. Comp. Phys., 2002):
$$y' = Ay + F(y, t).$$
Methods based on exact integration of the linear part require one accurate evaluation of exp(hA) per integration.



Application: Control theory
Convert continuous-time system
$$\frac{dx}{dt} = Ax(t) + Bu(t)$$
to discrete-time state-space system
$$x_{k+1} = Fx_k + Gu_k,$$
where $F = e^{A\tau}$ and τ is the sampling period.

(E.g., MATLAB Control System Toolbox, c2d, d2c.)

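The slide leaves G unspecified. Under the common zero-order-hold assumption (u constant over each sampling interval), $G = \int_0^\tau e^{As}\,ds\,B$, and both F and G can be read off one exponential of an augmented matrix: $\exp\left(\begin{bmatrix} A & B \\ 0 & 0 \end{bmatrix}\tau\right) = \begin{bmatrix} F & G \\ 0 & I \end{bmatrix}$. The sketch below assumes that convention; `expm_taylor` is a naive truncated Taylor series suitable only for small ‖Aτ‖, and `c2d_zoh` is a hypothetical name, not the Control System Toolbox API.

```python
def matmul(X, Y):
    """Product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def expm_taylor(M, terms=30):
    """Naive truncated Taylor series for e^M; adequate only for small ||M||."""
    n = len(M)
    E = [[float(i == j) for j in range(n)] for i in range(n)]
    P = [row[:] for row in E]
    for k in range(1, terms + 1):
        P = [[v / k for v in row] for row in matmul(P, M)]   # P = M^k / k!
        E = [[e + p for e, p in zip(re, rp)] for re, rp in zip(E, P)]
    return E


def c2d_zoh(A, B, tau):
    """Return (F, G) read off exp([[A, B], [0, 0]] * tau) = [[F, G], [0, I]]."""
    n, m = len(A), len(B[0])
    M = [[A[i][j] * tau for j in range(n)] + [B[i][j] * tau for j in range(m)]
         for i in range(n)]
    M += [[0.0] * (n + m) for _ in range(m)]
    E = expm_taylor(M)
    F = [row[:n] for row in E[:n]]
    G = [row[n:] for row in E[:n]]
    return F, G


# Double integrator x'' = u sampled every tau = 0.5.
F, G = c2d_zoh([[0.0, 1.0], [0.0, 0.0]], [[0.0], [1.0]], 0.5)
print(F, G)   # values: F = [[1, 0.5], [0, 1]], G = [[0.125], [0.5]]
```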


OUTLINE
Definitions of f(A)
Applications
▶ Algorithms for particular f
Schur–Parlett algorithm for general f
Computing f(A)b



Classic MATLAB
< M A T L A B >
Version of 01/10/84

HELP is available

<>
help
Type HELP followed by
INTRO (To get started)
NEWS (recent revisions)
ABS ANS ATAN BASE CHAR CHOL CHOP CLEA COND CONJ COS
DET DIAG DIAR DISP EDIT EIG ELSE END EPS EXEC EXIT
EXP EYE FILE FLOP FLPS FOR FUN HESS HILB IF IMAG
INV KRON LINE LOAD LOG LONG LU MACR MAGI NORM ONES
ORTH PINV PLOT POLY PRIN PROD QR RAND RANK RCON RAT
REAL RETU RREF ROOT ROUN SAVE SCHU SHOR SEMI SIN SIZE
SQRT STOP SUM SVD TRIL TRIU USER WHAT WHIL WHO WHY
< > ( ) = . , ; \ / ’ + - * :



Classic MATLAB
<>
help fun

FUN For matrix arguments X , the functions SIN, COS, ATAN,
SQRT, LOG, EXP and X**p are computed using eigenvalues D
and eigenvectors V . If <V,D> = EIG(X) then f(X) =
V*f(D)/V . This method may give inaccurate results if V
is badly conditioned. Some idea of the accuracy can be
obtained by comparing X**1 with X .
For vector arguments, the function is applied to each
component.

The availability of [FUN] in early versions of MATLAB quite possibly contributed to the system’s technical and commercial success.
— Cleve Moler (2003)



Setup

• General nonsymmetric A
• Factorization of A feasible
• May not want full accuracy

• Many applications.
• Methods for very large, sparse A often require the solution of smaller, dense subproblems.



Matrix Exponential
Cleve Moler and Charles Van Loan. Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later, SIAM Rev., 45 (2003).
• 355 citations on Science Citation Index.

Scaling and squaring (SS) method for $X \approx e^A$ (Ward, 1977; Moler & Van Loan, 1978).
1. $A \leftarrow A/2^k$ so $\|A\|_\infty \le 1/2$
2. $r(A)$ = [6/6] Padé approximant to $e^A$
3. $X = r(A)^{2^k}$

Used by MATLAB’s expm.

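The three steps can be sketched in pure Python as follows. This is an illustrative implementation with hypothetical names (`expm_ss`, `solve`), using the Padé coefficients $c_j = \frac{(2m-j)!\,m!}{(2m)!\,j!\,(m-j)!}$ with m = 6; it is not the code of MATLAB's expm and omits any balancing or error control.

```python
import math


def matmul(X, Y):
    """Product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def solve(D, N):
    """Solve D X = N by Gauss-Jordan elimination with partial pivoting."""
    n = len(D)
    M = [D[i][:] + N[i][:] for i in range(n)]          # augmented [D | N]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                factor = M[r][c] / M[c][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return [[M[i][n + j] / M[i][i] for j in range(len(N[0]))] for i in range(n)]


def expm_ss(A, m=6):
    """Scaling and squaring with an [m/m] Pade approximant (sketch only)."""
    n = len(A)
    # Step 1: scale so that ||A / 2^k||_inf <= 1/2.
    norm = max(sum(abs(x) for x in row) for row in A)
    k = 0
    while norm / 2 ** k > 0.5:
        k += 1
    B = [[x / 2 ** k for x in row] for row in A]
    # Step 2: r(B) = D(B)^{-1} N(B), N(x) = sum_j c_j x^j, D(x) = N(-x).
    N = [[float(i == j) for j in range(n)] for i in range(n)]
    D = [row[:] for row in N]
    P = [row[:] for row in N]                            # P = B^j
    for j in range(1, m + 1):
        P = matmul(P, B)
        c = (math.factorial(2 * m - j) * math.factorial(m)
             / (math.factorial(2 * m) * math.factorial(j) * math.factorial(m - j)))
        for r in range(n):
            for s in range(n):
                N[r][s] += c * P[r][s]
                D[r][s] += (-1) ** j * c * P[r][s]
    X = solve(D, N)
    # Step 3: undo the scaling by squaring k times, X = r(B)^(2^k).
    for _ in range(k):
        X = matmul(X, X)
    return X


print(expm_ss([[0.0, -1.0], [1.0, 0.0]]))   # rotation by 1 radian
```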


Alternative SS Algorithm for eA
Suggested by Najfeld & Havel (1995): exploit
$$\tau(A) = A\coth(A) = A(e^{2A} + I)(e^{2A} - I)^{-1} = I + \cfrac{A^2}{3I + \cfrac{A^2}{5I + \cfrac{A^2}{7I + \cdots}}}.$$
1. $B = A/2^{k+1}$ so $\|A^2\|_\infty / 2^{2k+2} \le 1.152$.
2. $r(B)$ = [8/8] Padé approximant to $\tau(B)$.
3. $X = \left[(r(B) + B)(r(B) - B)^{-1}\right]^{2^k}$.
• Claimed to require fewer flops than the original SS alg.
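A scalar illustration of the identity behind steps 1-3: with $\tau(b) = b\coth b$, $(\tau(b) + b)/(\tau(b) - b) = e^{2b}$. The sketch swaps the [8/8] Padé approximant for a truncated continued fraction (an assumption made for brevity) and works with scalars rather than matrices; the function names are hypothetical.

```python
import math


def tau_cf(x, depth=12):
    """Truncated continued fraction tau(x) = x coth(x) = 1 + x^2/(3 + x^2/(5 + ...))."""
    x2 = x * x
    tail = 2 * depth + 1                      # innermost denominator, truncated here
    for j in range(depth - 1, 0, -1):
        tail = (2 * j + 1) + x2 / tail
    return 1 + x2 / tail


def exp_via_tau(a, depth=12):
    """Scalar version of the scheme: e^a = [(tau(b) + b)/(tau(b) - b)]^(2^k)."""
    k = 0
    while (a / 2 ** (k + 1)) ** 2 > 1.152:    # step 1: b = a/2^(k+1) with b^2 <= 1.152
        k += 1
    b = a / 2 ** (k + 1)
    t = tau_cf(b, depth)                       # step 2 (continued fraction, not [8/8] Pade)
    x = (t + b) / (t - b)                      # equals e^{2b} = e^{a/2^k}
    for _ in range(k):                         # step 3: square k times
        x *= x
    return x


print(exp_via_tau(3.0), math.exp(3.0))
```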


Function of Matrix – p.18/42
Principal Log and pth Root
Let $A \in \mathbb{C}^{n\times n}$ have no eigenvalues on $\mathbb{R}^-$.

Log
X = log A denotes the unique X such that
1. $e^X = A$.
2. $-\pi < \mathrm{Im}(\lambda(X)) < \pi$.
pth root
For integer p > 0, $X = A^{1/p}$ is the unique X such that
1. $X^p = A$.
2. $-\pi/p < \arg(\lambda(X)) < \pi/p$.



Briggs’ Log Method (1617)
$$\log(ab) = \log a + \log b \;\Rightarrow\; \log a = 2\log a^{1/2}.$$
Use repeatedly:
$$\log a = 2^k \log a^{1/2^k}.$$
Write $a^{1/2^k} = 1 + x$ and note $\log(1+x) \approx x$. Briggs worked to base 10 and used
$$\log_{10} a \approx 2^k \cdot \log_{10} e \cdot (a^{1/2^k} - 1).$$



Briggs must be viewed as one of the great figures in numerical analysis.
— Herman H. Goldstine, A History of Numerical
Analysis (1977)

Can we generalize to matrices:
$$\log A = 2^k \log A^{1/2^k}\ ?$$

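Briggs' scalar recipe takes only a few lines; the sketch below is illustrative (`briggs_log10` is a hypothetical name and k = 20 an arbitrary choice). Its truncation error, about $2^k x^2/2$, grows as k decreases, while cancellation in $a^{1/2^k} - 1$ limits how large k can usefully be in floating point.

```python
import math


def briggs_log10(a, k=20):
    """Briggs-style approximation log10(a) ~ 2^k * log10(e) * (a^(1/2^k) - 1)."""
    r = a
    for _ in range(k):
        r = math.sqrt(r)                  # after the loop, r = a^(1/2^k) = 1 + x
    return 2 ** k * math.log10(math.e) * (r - 1)


print(briggs_log10(2.0), math.log10(2.0))
```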


Splitting Lemma
Lemma 0 (Cheng, H, Kenney & Laub, 2001) Suppose A = BC has no eigenvalues on $\mathbb{R}^-$ and
1. BC = CB.
2. Every eigenvalue of B (or C) lies in the open halfplane of the corresponding eigenvalue of $A^{1/2}$.
Then log A = log B + log C.

[Figure: complex plane (axes Re λ, Im λ) showing an eigenvalue $\lambda_B$ of B and the corresponding eigenvalue $\lambda_A^{1/2}$.]



Matrix Logarithm
Use the Briggs idea:
$$\log A = 2^k \log A^{1/2^k}.$$
Kenney & Laub’s (1989) inverse scaling and squaring method:
Bring A close to I by repeated square roots.
Approximate $\log A^{1/2^k}$ using an [m/m] Padé approximant $r_m(x) \approx \log(1 - x)$.
Rescale to find log A.

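A pure-Python sketch of the inverse scaling and squaring idea, under several assumptions: square roots come from the standard (Y, Z) Denman–Beavers iteration; the stopping rule $\|A^{1/2^k} - I\|_\infty \le 0.25$ and m = 8 are illustrative choices, not Kenney & Laub's; and the approximant is written with the convention $r_m(x) \approx \log(1 + x)$, evaluated as the m-point Gauss–Legendre rule applied to $\log(1+x) = \int_0^1 x/(1+tx)\,dt$, which coincides with the [m/m] Padé approximant. All function names are hypothetical.

```python
import math


def matmul(X, Y):
    """Product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def lincomb(a, X, b, Y):
    """Elementwise a*X + b*Y."""
    return [[a * x + b * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def eye(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]


def inv(A):
    """Inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [A[i][:] + eye(n)[i] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]


def sqrtm_db(A, iters=30):
    """Denman-Beavers iteration: Y_k -> A^{1/2}, Z_k -> A^{-1/2}."""
    Y, Z = A, eye(len(A))
    for _ in range(iters):
        Y, Z = lincomb(0.5, Y, 0.5, inv(Z)), lincomb(0.5, Z, 0.5, inv(Y))
    return Y


def gauss_legendre_01(m):
    """Nodes and weights of the m-point Gauss-Legendre rule on [0, 1]."""
    nodes, weights = [], []
    for i in range(1, m + 1):
        x = math.cos(math.pi * (i - 0.25) / (m + 0.5))   # initial guess for a root of P_m
        for _ in range(100):                             # Newton's method on P_m
            p0, p1 = 1.0, x
            for n in range(2, m + 1):                    # p1 = P_n(x), p0 = P_{n-1}(x)
                p0, p1 = p1, ((2 * n - 1) * x * p1 - (n - 1) * p0) / n
            dp = m * (x * p1 - p0) / (x * x - 1)         # P_m'(x)
            step = p1 / dp
            x -= step
            if abs(step) < 1e-15:
                break
        nodes.append((1 + x) / 2)
        weights.append(1 / ((1 - x * x) * dp * dp))      # half the usual [-1, 1] weight
    return nodes, weights


def log_pade(X, m=8):
    """[m/m] Pade approximant to log(I + X) as sum_j alpha_j X (I + beta_j X)^{-1}."""
    n = len(X)
    R = [[0.0] * n for _ in range(n)]
    for beta, alpha in zip(*gauss_legendre_01(m)):
        R = lincomb(1.0, R, alpha, matmul(X, inv(lincomb(1.0, eye(n), beta, X))))
    return R


def logm_iss(A, tol=0.25, m=8):
    """Inverse scaling and squaring: log A = 2^k log A^{1/2^k}."""
    n, k = len(A), 0
    while max(sum(abs(v) for v in row) for row in lincomb(1.0, A, -1.0, eye(n))) > tol:
        A, k = sqrtm_db(A), k + 1                        # bring A close to I
    L = log_pade(lincomb(1.0, A, -1.0, eye(n)), m)       # A^{1/2^k} = I + X
    return [[2 ** k * v for v in row] for row in L]


print(logm_iss([[2.0, 1.0], [0.0, 3.0]]))
```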


Alg of Cheng, H, Kenney & Laub (2001)
• Transformation-free: uses only matrix mult, LU, inv.
• Sq. roots by product form of Denman–Beavers iteration:
$$M_{k+1} = \frac{1}{2}\left[I + \frac{1}{2}(M_k + M_k^{-1})\right], \qquad M_0 = A,$$
$$Y_{k+1} = Y_k(I + M_k^{-1})/2, \qquad Y_0 = A,$$
where $M_k \to I$ and $Y_k \to A^{1/2}$.
• Aims for a specified accuracy.
• Padé degree m chosen using K & L’s (1989) bound:
$$\|r_m(X) - \log(I - X)\| \le |r_m(\|X\|) - \log(1 - \|X\|)|.$$
• $r_m$ evaluated using partial fraction expansion
$$r_m(x) = \sum_{j=1}^m \frac{\alpha_j^{(m)} x}{1 + \beta_j^{(m)} x}:$$
fast and accurate (H, 2001).

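The product form of the Denman–Beavers iteration can be sketched as below, with matrices as lists of rows. The helpers and `sqrtm_product_db` are hypothetical names, and stopping when $\|M_k - I\|_\infty \le 10^{-14}$ is an assumed criterion, not the one used in the paper.

```python
def matmul(X, Y):
    """Product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def lincomb(a, X, b, Y):
    """Elementwise a*X + b*Y."""
    return [[a * x + b * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def eye(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]


def inv(A):
    """Inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [A[i][:] + eye(n)[i] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [row[n:] for row in M]


def sqrtm_product_db(A, tol=1e-14, max_iter=50):
    """Product form of Denman-Beavers: M_k -> I and Y_k -> A^{1/2}."""
    I = eye(len(A))
    M, Y = A, A                                                 # M_0 = Y_0 = A
    for _ in range(max_iter):
        Minv = inv(M)
        Y = matmul(Y, lincomb(0.5, I, 0.5, Minv))               # Y_{k+1} = Y_k (I + M_k^{-1}) / 2
        M = lincomb(0.5, I, 0.25, lincomb(1.0, M, 1.0, Minv))   # M_{k+1} = [I + (M_k + M_k^{-1})/2] / 2
        if max(sum(abs(v) for v in row) for row in lincomb(1.0, M, -1.0, I)) <= tol:
            break
    return Y


print(sqrtm_product_db([[4.0, 1.0], [0.0, 9.0]]))   # close to [[2, 0.2], [0, 3]]
```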
Matrix pth Root
Square root: Björck & Hammarling (1983). Compute the Schur decomposition A = QTQ* and then solve R² = T by
$$r_{ii} = \sqrt{t_{ii}}, \qquad r_{ij} = \frac{t_{ij} - \sum_{k=i+1}^{j-1} r_{ik}r_{kj}}{r_{ii} + r_{jj}}.$$
Extended to pth roots by Smith (2003), with a much more complicated recurrence.

These algs
• Have essentially optimal numerical stability.
• Generalize to the real Schur decomposition.

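The Björck–Hammarling recurrence is short enough to sketch directly. This assumes T is already upper triangular (no Schur decomposition is computed) and real with positive diagonal, so that `math.sqrt` applies; `sqrtm_upper_triangular` is a hypothetical name. Columns are processed left to right and each column bottom to top, so every $r_{ik}$ and $r_{kj}$ on the right-hand side is already known.

```python
import math


def sqrtm_upper_triangular(T):
    """Bjorck-Hammarling recurrence for R with R^2 = T, T upper triangular."""
    n = len(T)
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):                      # one column at a time
        R[j][j] = math.sqrt(T[j][j])
        for i in range(j - 1, -1, -1):      # move up the column
            s = sum(R[i][k] * R[k][j] for k in range(i + 1, j))
            R[i][j] = (T[i][j] - s) / (R[i][i] + R[j][j])
    return R


T = [[4.0, 1.0, 2.0], [0.0, 9.0, 3.0], [0.0, 0.0, 16.0]]
print(sqrtm_upper_triangular(T))
```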


Matrix Cosine
Algorithm 0 (Serbin & Blalock, 1980) Given $A \in \mathbb{R}^{n\times n}$ and parameter α > 0 this alg approximates cos(A).
Choose m such that $2^{-m}\|A\| \approx \alpha$.
$C_0$ = Taylor or Padé approximation to $\cos(A/2^m)$.
for i = 0: m − 1
    $C_{i+1} = 2C_i^2 - I$
end

Choice of m (i.e., α)?
Which approximation?
Effect of rounding errors?

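A minimal sketch of Algorithm 0 in pure Python, assuming the ∞-norm, a Taylor approximation for $C_0$, α = 0.5 and 10 Taylor terms; these choices are illustrative and do not answer the questions above. `cosm_double_angle` is a hypothetical name.

```python
import math


def matmul(X, Y):
    """Product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def cosm_double_angle(A, alpha=0.5, terms=10):
    """Approximate cos(A): Taylor series for C_0 = cos(A/2^m), then C_{i+1} = 2 C_i^2 - I."""
    n = len(A)
    I = [[float(i == j) for j in range(n)] for i in range(n)]
    norm = max(sum(abs(x) for x in row) for row in A)       # infinity norm
    m = 0
    while norm / 2 ** m > alpha:                             # smallest m with 2^-m ||A|| <= alpha
        m += 1
    B = [[x / 2 ** m for x in row] for row in A]
    B2 = matmul(B, B)
    C = [row[:] for row in I]
    P = [row[:] for row in I]                                # P = B^{2j}
    for j in range(1, terms + 1):                            # cos(B) = sum_j (-1)^j B^{2j} / (2j)!
        P = matmul(P, B2)
        c = (-1) ** j / math.factorial(2 * j)
        C = [[x + c * p for x, p in zip(rc, rp)] for rc, rp in zip(C, P)]
    for _ in range(m):                                       # double-angle steps
        C2 = matmul(C, C)
        C = [[2 * C2[i][j] - I[i][j] for j in range(n)] for i in range(n)]
    return C


print(cosm_double_angle([[1.0, 0.0], [0.0, 2.0]]))   # diagonal approx cos(1), cos(2)
```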


Alg of H & Smith (2002)
• Initial argument reduction and balancing to reduce the norm.
• [8/8] Padé approximation proved fully accurate in IEEE double precision if $\|A\|_\infty \le 1$. More economical than Taylor series.
• “Schoolboy” evaluation of $r_8(A)$.
• Total cost: $(4 + \lceil \log_2 \|A\|_\infty \rceil)M + D$, where M denotes a matrix multiplication and D the solution of a multiple right-hand side linear system.
• Error analysis gives a bound containing terms $(4.1)^m$ and norms of intermediate $C_i$.



Numerical Stability
Is $\|\hat f - f\|$ consistent with the condition of the problem?
Is $\hat f = f(A + E)$ with E “small”, i.e., is the residual $f^{-1}(\hat f) - A$ “small”?

Unclear for all algs discussed, except “yes” for $A^{1/p}$.

• We currently lack characterizations of when an f(A) problem is ill conditioned for nonnormal A.



OUTLINE
Definitions of f(A)
Applications
Algorithms for particular f
▶ Schur–Parlett algorithm for general f
Computing f(A)b



Similarity Transformations
Can use the formula
$$A = XBX^{-1} \;\Rightarrow\; f(A) = Xf(B)X^{-1},$$
provided f(B) is easily computable, e.g., $B = \mathrm{diag}(\lambda_i)$ if A is diagonalizable.

Problem: any error ΔB in f(B) is magnified by up to $\kappa(X) = \|X\|\,\|X^{-1}\| \ge 1$.

Prefer to work with unitary X; thus can use
eigendecomposition (diagonal B) when A is normal ($AA^* = A^*A$),
Schur decomposition (triangular B) in general.



Example: Eigendecomposition
function F = funm_ev(A,fun)
[V,D] = eig(A);
F = V * diag(feval(fun,diag(D))) / V;

>> A = [3 -1; 1 1]; X = funm_ev(A,@sqrt)
X =
   1.7678e+000  -3.5355e-001
   3.5355e-001   1.0607e+000
>> norm(A-X^2)   % cond(V) = 9.4e7
ans =
   9.9519e-009
>> Y = sqrtm(A); norm(A-Y^2)
ans =
   6.4855e-016

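Why the eigendecomposition route struggles here: A has the double eigenvalue 2 and is defective ($N = A - 2I$ is nonzero with $N^2 = 0$), so V is nearly singular (the slide reports cond(V) = 9.4e7). The Taylor series of √x about 2 then terminates, giving $\sqrt{A} = \sqrt{2}\,I + N/(2\sqrt{2})$. A short Python check of that closed form, as an illustration rather than a description of how sqrtm works:

```python
import math

# A = 2I + N with N = A - 2I = [[1, -1], [1, -1]] and N^2 = 0, so the Taylor
# expansion of sqrt about 2 terminates: sqrt(A) = sqrt(2) I + N / (2 sqrt(2)).
A = [[3.0, -1.0], [1.0, 1.0]]
N = [[A[i][j] - 2.0 * (i == j) for j in range(2)] for i in range(2)]
X = [[math.sqrt(2) * (i == j) + N[i][j] / (2 * math.sqrt(2)) for j in range(2)]
     for i in range(2)]
X2 = [[sum(X[i][k] * X[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
residual = max(abs(X2[i][j] - A[i][j]) for i in range(2) for j in range(2))
print(X)          # approx [[1.7678, -0.35355], [0.35355, 1.0607]], as on the slide
print(residual)   # small, at the level of rounding error
```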
Parlett’s Recurrence
Schur decomposition A = QTQ* reduces the problem to F = f(T), T upper triangular.

$f_{ii} = f(t_{ii})$ is immediate.

Parlett (1976): from FT = TF obtain the recurrence
$$f_{ij} = t_{ij}\,\frac{f_{ii} - f_{jj}}{t_{ii} - t_{jj}} + \sum_{k=i+1}^{j-1} \frac{f_{ik}t_{kj} - t_{ik}f_{kj}}{t_{ii} - t_{jj}}.$$

Used in MATLAB’s funm.

Fails when T has repeated eigenvalues (division by $t_{ii} - t_{jj} = 0$).

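The recurrence fills F one superdiagonal at a time, since $f_{ij}$ needs only entries closer to the diagonal. A pure-Python sketch (hypothetical name `parlett`; T assumed upper triangular with distinct diagonal entries):

```python
def parlett(T, f):
    """Parlett's recurrence for F = f(T), T upper triangular with distinct diagonal."""
    n = len(T)
    F = [[0.0] * n for _ in range(n)]
    for i in range(n):
        F[i][i] = f(T[i][i])
    for d in range(1, n):                      # fill F one superdiagonal at a time
        for i in range(n - d):
            j = i + d
            s = T[i][j] * (F[i][i] - F[j][j])
            s += sum(F[i][k] * T[k][j] - T[i][k] * F[k][j] for k in range(i + 1, j))
            F[i][j] = s / (T[i][i] - T[j][j])  # fails if t_ii == t_jj
    return F


T = [[1.0, 2.0, 3.0], [0.0, 2.0, 4.0], [0.0, 0.0, 3.0]]
print(parlett(T, lambda x: x * x))   # [[1.0, 6.0, 20.0], [0.0, 4.0, 20.0], [0.0, 0.0, 9.0]]
```

With f(x) = x² the result reproduces $T^2$ exactly, which is a convenient sanity check.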


Parlett vs. Björck & Hammarling
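Parlett recurrence is not “optimal”, as is clear from the sq. root case: $x_{12}$ is obtained from
$$\text{Parlett: } \frac{a_{12}(\sqrt{a_{11}} - \sqrt{a_{22}})}{a_{11} - a_{22}} = \frac{a_{12}}{\sqrt{a_{11}} + \sqrt{a_{22}}} \ \text{: B \& H.}$$
The two forms are equal in exact arithmetic, but the Parlett form suffers cancellation when $a_{11} \approx a_{22}$.
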
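A small numerical illustration of the cancellation, assuming nearly equal diagonal entries $a_{11} = 1$, $a_{22} = 1 + 10^{-10}$ and a high-precision reference computed with Python's `decimal` module:

```python
import math
from decimal import Decimal, getcontext

a11, a22, a12 = 1.0, 1.0 + 1e-10, 1.0             # nearly equal diagonal entries
parlett_form = a12 * (math.sqrt(a11) - math.sqrt(a22)) / (a11 - a22)
bh_form = a12 / (math.sqrt(a11) + math.sqrt(a22))

getcontext().prec = 50                            # 50-digit reference value
exact = float(Decimal(a12) / (Decimal(a11).sqrt() + Decimal(a22).sqrt()))
print(abs(parlett_form - exact) / exact)          # can be many orders of magnitude above 1e-16
print(abs(bh_form - exact) / exact)               # at the level of the unit roundoff
```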


Schur–Parlett Algorithm
H & Davies (2002):

Compute Schur decomposition A = QTQ*.

Re-order T to block triangular form in which eigenvalues within a block are “close” and those of separate blocks are “well separated”.
Evaluate $F_{ii} = f(T_{ii})$.
Solve the Sylvester equations
$$T_{ii}F_{ij} - F_{ij}T_{jj} = F_{ii}T_{ij} - T_{ij}F_{jj} + \sum_{k=i+1}^{j-1}(F_{ik}T_{kj} - T_{ik}F_{kj}).$$
Undo the unitary transformations.

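Each Sylvester equation above has triangular coefficients $T_{ii}$ and $T_{jj}$, so it can be solved column by column with back substitution. The sketch below covers only that step, A X − X B = C with A and B upper triangular and no eigenvalue shared between them; the name `solve_triangular_sylvester` and the test data are hypothetical.

```python
def solve_triangular_sylvester(A, B, C):
    """Solve A X - X B = C, with A (p x p) and B (q x q) upper triangular and
    no eigenvalue of A equal to an eigenvalue of B ('well separated' blocks)."""
    p, q = len(A), len(B)
    X = [[0.0] * q for _ in range(p)]
    for j in range(q):
        # Column j: (A - b_jj I) x_j = c_j + sum_{k<j} b_kj x_k
        rhs = [C[i][j] + sum(B[k][j] * X[i][k] for k in range(j)) for i in range(p)]
        for i in range(p - 1, -1, -1):          # back substitution
            s = rhs[i] - sum(A[i][l] * X[l][j] for l in range(i + 1, p))
            X[i][j] = s / (A[i][i] - B[j][j])
    return X


A = [[1.0, 2.0], [0.0, 1.5]]
B = [[4.0, 1.0], [0.0, 5.0]]
C = [[1.0, 2.0], [3.0, 4.0]]
print(solve_triangular_sylvester(A, B, C))
```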


Function of Atomic Block
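Assume f has a Taylor series with infinite radius of convergence and that its derivatives are available.

For diagonal blocks T use
$$T = \sigma I + M, \quad \sigma = \mathrm{trace}(T)/n: \qquad f(T) = \sum_{k=0}^\infty \frac{f^{(k)}(\sigma)}{k!}M^k.$$
Truncate the series based on a strict error bound, not on the size of the terms. NB: for n = 2,
$$M = \begin{bmatrix} \epsilon & \alpha \\ 0 & -\epsilon \end{bmatrix} \;\Rightarrow\; M^{2k} = \begin{bmatrix} \epsilon^{2k} & 0 \\ 0 & \epsilon^{2k} \end{bmatrix}, \quad M^{2k+1} = \begin{bmatrix} \epsilon^{2k+1} & \alpha\epsilon^{2k} \\ 0 & -\epsilon^{2k+1} \end{bmatrix},$$
so a small even-order term says little about the next odd-order term, which contains $\alpha\epsilon^{2k}$ and can be much larger when $|\alpha| \gg |\epsilon|$.
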
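A sketch of the shifted Taylor expansion for an atomic block, assuming the caller supplies a function returning $f^{(k)}(\sigma)$. For brevity it sums a fixed number of terms instead of using the strict error bound of the algorithm; `f_atomic` is a hypothetical name.

```python
import math


def matmul(X, Y):
    """Product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def f_atomic(T, deriv, terms=30):
    """f(T) = sum_k f^{(k)}(sigma)/k! * M^k, where T = sigma*I + M, sigma = trace(T)/n.

    deriv(k, sigma) must return f^{(k)}(sigma). The fixed number of terms is an
    assumption of this sketch, not the algorithm's truncation criterion.
    """
    n = len(T)
    sigma = sum(T[i][i] for i in range(n)) / n
    M = [[T[i][j] - (sigma if i == j else 0.0) for j in range(n)] for i in range(n)]
    F = [[0.0] * n for _ in range(n)]
    P = [[float(i == j) for j in range(n)] for i in range(n)]   # P = M^k, from k = 0
    for k in range(terms):
        c = deriv(k, sigma) / math.factorial(k)
        F = [[f + c * p for f, p in zip(rf, rp)] for rf, rp in zip(F, P)]
        P = matmul(P, M)
    return F


# f = exp, so every derivative at sigma equals exp(sigma); the eigenvalues are very close.
T = [[1.0, 1.0], [0.0, 1.0 + 1e-8]]
F = f_atomic(T, lambda k, s: math.exp(s))
print(F)
```

Unlike Parlett's recurrence, this does not divide by $t_{11} - t_{22}$, so the nearly repeated eigenvalue causes no difficulty.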
Features of Algorithm
Costs $O(n^3)$ flops, or up to $n^4/3$ flops if large blocks are needed (close or repeated eigenvalues).
Needs derivatives if block size > 1: the price to pay for treating general f and nonnormal A.
Best general f(A) alg. Benchmark for comparing other f(A) algs, both general and specific.
The basis of a new funm for the next MATLAB release.



OUTLINE
Definitions of f(A)
Applications
Algorithms for particular f
Schur–Parlett algorithm for general f
▶ Computing f(A)b



log(A) b
Apply a quadrature rule $\int_0^1 f(t)\,dt \approx \sum_{k=1}^m c_k f(t_k)$ to (Wouk, 1965)
$$\log A = \int_0^1 (A - I)\,[t(A - I) + I]^{-1}\,dt.$$
Combine with Hessenberg reduction $A = QHQ^T$ to get
$$(\log A)b \approx Q\sum_{k=1}^m c_k\,[t_k(H - I) + I]^{-1}d, \qquad d = Q^T(A - I)b.$$
Costs $(10/3)n^3 + 2mn^2$ flops.

When $\|I - A\| < 1$ can use m-point Gauss–Legendre ≡ Padé approximation! Choose m using (Kenney & Laub, 2001)
$$\|r_{mm}(X) - \log(I + X)\| \le |r_{mm}(-\|X\|) - \log(1 - \|X\|)|.$$
When $\|I - A\| > 1$ use adaptive quadrature.

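A dense sketch of the quadrature approach, assuming m-point Gauss–Legendre on [0, 1] (m = 20 here, an arbitrary choice) and skipping the Hessenberg reduction: each shifted system is solved by Gaussian elimination instead. The function names are hypothetical.

```python
import math


def gauss_legendre_01(m):
    """Nodes and weights of the m-point Gauss-Legendre rule on [0, 1]."""
    nodes, weights = [], []
    for i in range(1, m + 1):
        x = math.cos(math.pi * (i - 0.25) / (m + 0.5))   # initial guess for a root of P_m
        for _ in range(100):                             # Newton's method on P_m
            p0, p1 = 1.0, x
            for n in range(2, m + 1):
                p0, p1 = p1, ((2 * n - 1) * x * p1 - (n - 1) * p0) / n
            dp = m * (x * p1 - p0) / (x * x - 1)         # P_m'(x)
            step = p1 / dp
            x -= step
            if abs(step) < 1e-15:
                break
        nodes.append((1 + x) / 2)
        weights.append(1 / ((1 - x * x) * dp * dp))
    return nodes, weights


def solve_vec(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def logm_times_b(A, b, m=20):
    """(log A) b ~ sum_k c_k [t_k (A - I) + I]^{-1} (A - I) b, with dense solves."""
    n = len(A)
    AmI = [[A[i][j] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    d = [sum(AmI[i][j] * b[j] for j in range(n)) for i in range(n)]   # (A - I) b
    y = [0.0] * n
    for t, c in zip(*gauss_legendre_01(m)):
        K = [[t * AmI[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
        z = solve_vec(K, d)
        y = [yi + c * zi for yi, zi in zip(y, z)]
    return y


print(logm_times_b([[2.0, 1.0], [0.0, 3.0]], [0.0, 1.0]))   # approx [log 3 - log 2, log 3]
```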
$A^\alpha b$
The initial value problem
$$\frac{dy}{dt} = \alpha(A - I)[t(A - I) + I]^{-1}y, \qquad y(0) = b$$
has unique solution $y(t) = [t(A - I) + I]^\alpha b \;\Rightarrow\; y(1) = A^\alpha b$.
Used by Allen, Baglama & Boyd (2000) for α = 1/2, spd A.
Example using MATLAB’s ode45, with A = gallery('parter',64), b = randn(64,1):

f(A)        tol    Successful steps   Failed attempts   Function evals   Rel. error
$A^{-1/2}$  1e-3   12                 0                 73               3.5e-8
            1e-6   14                 0                 85               6.0e-9
            1e-9   40                 0                 241              7.7e-12
$A^{2/5}$   1e-3   15                 0                 79               2.8e-8
            1e-6   16                 0                 91               2.4e-9
            1e-9   54                 0                 325              1.8e-12

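The same idea with a fixed-step classical Runge–Kutta (RK4) integrator in place of ode45, as an illustrative sketch; the step count, the small diagonal test matrix, and the function names are assumptions, and the 64 × 64 Parter example is not reproduced here.

```python
def solve_vec(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def power_times_b(A, b, alpha, steps=2000):
    """Approximate A^alpha b as y(1), where y' = alpha (A - I)[t(A - I) + I]^{-1} y,
    y(0) = b, integrated with fixed-step classical RK4."""
    n = len(A)
    AmI = [[A[i][j] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]

    def rhs(t, y):
        K = [[t * AmI[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
        z = solve_vec(K, y)                                  # z = [t(A - I) + I]^{-1} y
        return [alpha * sum(AmI[i][j] * z[j] for j in range(n)) for i in range(n)]

    h, y = 1.0 / steps, list(b)
    for s in range(steps):
        t = s * h
        k1 = rhs(t, y)
        k2 = rhs(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
        k3 = rhs(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
        k4 = rhs(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
        y = [yi + h / 6 * (p1 + 2 * p2 + 2 * p3 + p4)
             for yi, p1, p2, p3, p4 in zip(y, k1, k2, k3, k4)]
    return y


print(power_times_b([[4.0, 0.0], [0.0, 9.0]], [1.0, 1.0], 0.5))   # approx [2, 3]
```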


Interpolation
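If A has distinct eigenvalues $\lambda_j$, use the Lagrange interpolating polynomial:
$$f(A)b = \sum_{j=1}^n f_j\,\ell_j(A)b, \qquad \ell_j(x) = \prod_{k=1,\,k\ne j}^n \frac{x - \lambda_k}{\lambda_j - \lambda_k}, \qquad f_j = f(\lambda_j).$$
Cost: $O(n^4)$ flops.

For any A, with eigenvalues $\lambda_1, \dots, \lambda_n$ repeated according to multiplicity, the Newton divided difference form:
$$f(A)b = \sum_{i=1}^n c_i \prod_{j=1}^{i-1}(A - \lambda_j I)\,b, \qquad c_i = f[\lambda_1, \dots, \lambda_i] \text{ (confluent) divided differences}.$$
Requires derivatives of f when eigenvalues are repeated. Cost: $O(n^3)$ flops.
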
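The Newton form can be evaluated with n − 1 matrix-vector products. This sketch assumes the eigenvalues are known and distinct (so no derivatives are needed) and uses a triangular test matrix so they can be read off the diagonal; the names are hypothetical.

```python
def divided_differences(xs, fs):
    """Return c with c[i] = f[x_0, ..., x_i] (distinct nodes)."""
    c = list(fs)
    for level in range(1, len(xs)):
        for i in range(len(xs) - 1, level - 1, -1):
            c[i] = (c[i] - c[i - 1]) / (xs[i] - xs[i - level])
    return c


def f_times_b_newton(A, b, eigs, f):
    """f(A) b = sum_i c_i prod_{j<i} (A - lambda_j I) b for distinct eigenvalues eigs."""
    n = len(A)
    c = divided_differences(eigs, [f(lam) for lam in eigs])
    v, y = list(b), [c[0] * bi for bi in b]
    for i in range(1, len(eigs)):
        lam = eigs[i - 1]
        v = [sum(A[r][s] * v[s] for s in range(n)) - lam * v[r] for r in range(n)]
        y = [yr + c[i] * vr for yr, vr in zip(y, v)]
    return y


A = [[1.0, 2.0, 0.0], [0.0, 2.0, 1.0], [0.0, 0.0, 4.0]]   # eigenvalues on the diagonal
print(f_times_b_newton(A, [1.0, 1.0, 1.0], [1.0, 2.0, 4.0], lambda x: x * x))  # [9.0, 10.0, 16.0]
```

Since f(x) = x² has degree less than n = 3, the result equals $A^2 b$ exactly.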
Cauchy Integral Theorem
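$$y = \frac{1}{2\pi i}\int_\Gamma f(z)(zI - A)^{-1}b\,dz =: \int_\Gamma g(z)\,dz.$$
Take circle
$$\Gamma:\ z - \alpha = \beta e^{i\theta}, \qquad 0 \le \theta \le 2\pi.$$
Apply repeated trapezium rule (using $dz = i(z - \alpha)\,d\theta$):
$$\int_\Gamma g(z)\,dz = i\int_0^{2\pi} (z(\theta) - \alpha)\,g(z(\theta))\,d\theta \approx \frac{2\pi i}{n}\sum_{k=0}^{n-1} (z_k - \alpha)\,g(z_k),$$
where $z_k - \alpha = \beta e^{2\pi k i/n}$.
Use Hessenberg reduction, as before.
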
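A direct transcription of the trapezium rule on the circle, assuming the center α and radius β are supplied by the caller and skipping the Hessenberg reduction. After absorbing the factor $1/(2\pi i)$ the sum becomes $y \approx \frac{1}{n}\sum_k (z_k - \alpha) f(z_k)(z_kI - A)^{-1}b$. Names and the test case are illustrative.

```python
import cmath
import math


def solve_vec(A, b):
    """Solve A x = b (real or complex) by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def f_times_b_contour(A, b, f, center, radius, n=64):
    """f(A) b via the n-point trapezium rule on the circle |z - center| = radius."""
    m = len(A)
    y = [0j] * m
    for k in range(n):
        w = radius * cmath.exp(2j * math.pi * k / n)             # z_k - center
        z = center + w
        K = [[(z if i == j else 0) - A[i][j] for j in range(m)] for i in range(m)]
        r = solve_vec(K, b)                                       # (z_k I - A)^{-1} b
        y = [yi + w * f(z) * ri / n for yi, ri in zip(y, r)]
    return y


y = f_times_b_contour([[1.0, 1.0], [0.0, 2.0]], [0.0, 1.0], cmath.exp, center=1.5, radius=2.0)
print(y)   # approx [e^2 - e, e^2], with tiny imaginary parts
```

Here both eigenvalues lie well inside the circle, the favourable situation described by the error bound that follows.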


Euler–Maclaurin Error Bound
h(x) of period 2π, in $C^{2k+1}(-\infty, \infty)$, with $|h^{(2k+1)}(x)| \le M$:
$$\left|\int_0^{2\pi} h(x)\,dx - T_n(h)\right| \le \frac{4\pi M\,\zeta(2k+1)}{n^{2k+1}},$$
where $T_n(h)$ is the n-point trapezium rule.

• $h^{(2k+1)}(x)$ proportional to $\beta^{2k+2}$, where β = radius of circle.
• $h^{(2k+1)}(x)$ contains powers of the resolvent $(z(\theta)I - A)^{-1}$. Bad if contour close to some $\lambda_i$ or A highly nonnormal.
• $h^{(2k+1)}(x)$ contains derivatives of f on the contour.
Conclude: restricted to matrices
that are not too nonnormal,
whose $\lambda_i$ can be enclosed in a circle of small radius not close to a singularity of the derivatives of f.

Future Work
• Theory and algorithms for non-primary functions, perhaps linked to an f(A(t)) application.
• Better understanding of conditioning of f(A).
• Exploiting structure, e.g., A ∈ matrix automorphism group (H, Mackey, Mackey & Tisseur, 2003).

http://www.ma.man.ac.uk/~higham/
