Matrix Functions: Theory and Algorithms: Nick Higham Department of Mathematics University of Manchester

Matrix Functions:
Theory and Algorithms
Nick Higham
Department of Mathematics
University of Manchester
higham@ma.man.ac.uk
http://www.ma.man.ac.uk/~higham/
Includes joint work with Philip Davies
Function of Matrix – p.1/42

OUTLINE
▶ Definitions of f(A)
  Applications
  Algorithms for particular f
  Schur–Parlett algorithm for general f
  Computing f(A)b

Defining by Substitution
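Want to define $f\colon \mathbb{C}^{n\times n} \to \mathbb{C}^{n\times n}$, but not elementwise.
Given $f(t)$, can define $f(A)$ by substituting $A$ for $t$:
\[ f(t) = \frac{1 + t^2}{1 - t} \;\Rightarrow\; f(A) = (I - A)^{-1}(I + A^2). \]
\[ \log(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots, \quad |x| < 1 \]
\[ \Rightarrow\; \log(I + A) = A - \frac{A^2}{2} + \frac{A^3}{3} - \frac{A^4}{4} + \cdots, \quad \rho(A) < 1. \]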
Works for f
a polynomial,
a rational,
or with a convergent power series.
Multiplicity of Definitions
There have been proposed in the literature since 1880 eight distinct definitions of a matric function, by Weyr, Sylvester and Buchheim, Giorgi, Cartan, Fantappiè, Cipolla, Schwerdtfeger and Richter.
— R. F. Rinehart,
The Equivalence of Definitions of a Matric Function,
Amer. Math. Monthly (1955)

Cauchy Integral Theorem
Definition 1
\[ f(A) = \frac{1}{2\pi i} \int_\Gamma f(z)(zI - A)^{-1}\,dz, \]
where $f$ is analytic inside a closed contour $\Gamma$ which encloses $\lambda(A)$.

Jordan Canonical Form
 
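\[ Z^{-1}AZ = J = \mathrm{diag}(J_1, J_2, \dots, J_p), \qquad
J_k = \begin{bmatrix} \lambda_k & 1 & & \\ & \lambda_k & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_k \end{bmatrix} \in \mathbb{C}^{m_k\times m_k}. \]
Definition 2
\[ f(A) = Z f(J) Z^{-1} = Z\,\mathrm{diag}(f(J_k))\,Z^{-1}, \]
\[ f(J_k) = \begin{bmatrix}
f(\lambda_k) & f'(\lambda_k) & \cdots & \dfrac{f^{(m_k-1)}(\lambda_k)}{(m_k-1)!} \\
 & f(\lambda_k) & \ddots & \vdots \\
 & & \ddots & f'(\lambda_k) \\
 & & & f(\lambda_k)
\end{bmatrix}. \]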
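As a small worked instance of Definition 2 (an illustration added here, not taken from the slides), take a single $2\times 2$ Jordan block and $f = \exp$:

```latex
\[
J = \begin{bmatrix} \lambda & 1 \\ 0 & \lambda \end{bmatrix},
\qquad
f(J) = \begin{bmatrix} f(\lambda) & f'(\lambda) \\ 0 & f(\lambda) \end{bmatrix}
\;\Rightarrow\;
e^{J} = e^{\lambda} \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}.
\]
```

The power series gives the same result: $J = \lambda I + N$ with $N^2 = 0$, so $e^{J} = e^{\lambda}(I + N)$.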

Interpolation
Definition 3 (Sylvester, 1883; Buchheim, 1886) Distinct e'vals $\lambda_1, \dots, \lambda_s$, $n_i$ = index of $\lambda_i$ (size of the largest Jordan block for $\lambda_i$). Then $f(A) = r(A)$, where $r$ is the unique Hermite interpolating poly of degree less than $\sum_{i=1}^{s} n_i$ satisfying the interpolation conditions
\[ r^{(j)}(\lambda_i) = f^{(j)}(\lambda_i), \quad j = 0\colon n_i - 1, \quad i = 1\colon s. \]
Poly $r$ depends on $A$.
This def. preserves functional relations $G(f_1, \dots, f_p) = 0$, where $G$ is a polynomial. E.g. $\sin^2(A) + \cos^2(A) = I$.
But of course $e^{A+B} \ne e^{A} e^{B}$ in general.

Non-Primary Functions
Horn & Johnson call these defs primary matrix functions.
But they do not capture all possible functions when A has multiple eigenvalues. E.g.,
\[ A = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}, \quad
X = \begin{bmatrix} i & 0 \\ 0 & -i \end{bmatrix}, \quad
Y = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}. \]
$X$ and $Y$ are square roots of $A$ but are not polynomials in $A$.
However, A = givens(π) and Y = givens(π/2), so Y is a natural square root.
Virtually all existing theory and methods are for primary functions.
Non-primary functions sometimes needed when tracking $f(A(t))$ when eigenvalues of $A(t)$ coalesce.
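The example matrices above can be checked directly. The sketch below is an added illustration in plain Python; the helper names `matmul` and `is_multiple_of_identity` are made up for this example. It confirms that X and Y square to A, and that neither is a multiple of I, which every polynomial in A = −I must be.

```python
def matmul(P, Q):
    """Product of two square matrices stored as lists of rows."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_multiple_of_identity(P):
    """True if P = c*I for some scalar c."""
    n = len(P)
    return all(P[i][j] == (P[0][0] if i == j else 0)
               for i in range(n) for j in range(n))

A = [[-1, 0], [0, -1]]
X = [[1j, 0], [0, -1j]]
Y = [[0, -1], [1, 0]]

x_is_root = matmul(X, X) == A
y_is_root = matmul(Y, Y) == A
# Every polynomial in A = -I is a multiple of I; X and Y are not.
x_is_poly = is_multiple_of_identity(X)
y_is_poly = is_multiple_of_identity(Y)
print(x_is_root, y_is_root, x_is_poly, y_is_poly)
```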

Textbook References
[1] F. R. Gantmacher. The Theory of Matrices, volume one. Chelsea, New York, 1959.
[2] Gene H. Golub and Charles F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, MD, USA, third edition, 1996.
[3] Roger A. Horn and Charles R. Johnson. Topics in Matrix Analysis. Cambridge University Press, 1991.
[4] Peter Lancaster and Miron Tismenetsky. The Theory of Matrices. Academic Press, London, second edition, 1985.

OUTLINE
  Definitions of f(A)
▶ Applications
  Algorithms for particular f
  Schur–Parlett algorithm for general f
  Computing f(A)b

Application: Differential equations
Nuclear magnetic resonance: Solomon equations
\[ dM/dt = -RM, \quad M(0) = I, \]
where $M(t)$ = matrix of intensities and $R$ = symmetric relaxation matrix, so $M(t) = e^{-Rt}$. NMR workers need to solve both forward and inverse problems.
Exponential time differencing for stiff systems (Cox & Matthews, J. Comp. Phys., 2002)
\[ y' = Ay + F(y, t). \]
Methods based on exact integration of the linear part require one accurate evaluation of $\exp(hA)$ per integration.

Application: Control theory
Convert continuous-time system
\[ \frac{dx}{dt} = Ax(t) + Bu(t) \]
to discrete-time state-space system
\[ x_{k+1} = F x_k + G u_k, \]
where $F = e^{A\tau}$ and $\tau$ is sampling period.
(E.g., MATLAB Control System Toolbox, c2d, d2c.)

OUTLINE
  Definitions of f(A)
  Applications
▶ Algorithms for particular f
  Schur–Parlett algorithm for general f
  Computing f(A)b

Classic MATLAB
< M A T L A B >
Version of 01/10/84
HELP is available
<>
help
Type HELP followed by
INTRO (To get started)
NEWS (recent revisions)
ABS ANS ATAN BASE CHAR CHOL CHOP CLEA COND CONJ COS
DET DIAG DIAR DISP EDIT EIG ELSE END EPS EXEC EXIT
EXP EYE FILE FLOP FLPS FOR FUN HESS HILB IF IMAG
INV KRON LINE LOAD LOG LONG LU MACR MAGI NORM ONES
ORTH PINV PLOT POLY PRIN PROD QR RAND RANK RCON RAT
REAL RETU RREF ROOT ROUN SAVE SCHU SHOR SEMI SIN SIZE
SQRT STOP SUM SVD TRIL TRIU USER WHAT WHIL WHO WHY
< > ( ) = . , ; \ / ’ + - * :

Classic MATLAB
<>
help fun
FUN For matrix arguments X , the functions SIN, COS, ATAN,

SQRT, LOG, EXP and X**p are computed using eigenvalues D
and eigenvectors V . If <V,D> = EIG(X) then f(X) =
V*f(D)/V . This method may give inaccurate results if V
is badly conditioned. Some idea of the accuracy can be
obtained by comparing X**1 with X .
For vector arguments, the function is applied to each
component.
The availability of [FUN] in early versions of MATLAB

quite possibly contributed to
the system’s technical and commercial success.
— Cleve Moler (2003)

Setup
• General nonsymmetric A
• Factorization of A feasible
• May not want full accuracy
• Many applications.
• Methods for very large, sparse A often require solution of smaller, dense subproblems.

Matrix Exponential
Cleve Moler and Charles Van Loan. Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later, SIAM Rev., 45 (2003).
• 355 citations on Science Citation Index.
Scaling and squaring (SS) method for $X \approx e^{A}$ (Ward, 1977; Moler & Van Loan, 1978).
1. $A \leftarrow A/2^k$ so $\|A\|_\infty \le 1/2$
2. $r(A)$ = [6/6] Padé approximant to $e^{A}$
3. $X = r(A)^{2^k}$
Used by MATLAB's expm.
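The three steps above can be sketched in plain Python for small dense matrices. This is an illustrative sketch only, not MATLAB's expm: the function names `matmul`, `solve` and `expm_ss` are made up here, the diagonal Padé coefficients are the standard ones $c_j = (2m-j)!\,m! / ((2m)!\,j!\,(m-j)!)$, and the linear solve uses simple Gauss-Jordan elimination.

```python
import math

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def solve(A, B):
    """Solve A X = B for square A, B by Gauss-Jordan with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [[M[i][n + j] / M[i][i] for j in range(n)] for i in range(n)]

def expm_ss(A, m=6):
    """Scaling and squaring with an [m/m] Pade approximant (default [6/6])."""
    n = len(A)
    norm = max(sum(abs(x) for x in row) for row in A)  # infinity norm
    k = 0
    while norm / 2**k > 0.5:          # step 1: scale so the norm is <= 1/2
        k += 1
    B = [[x / 2**k for x in row] for row in A]
    c = [math.factorial(2 * m - j) * math.factorial(m)
         / (math.factorial(2 * m) * math.factorial(j) * math.factorial(m - j))
         for j in range(m + 1)]
    P = [[0.0] * n for _ in range(n)]  # numerator   p(B)
    Q = [[0.0] * n for _ in range(n)]  # denominator q(B) = p(-B)
    power = [[float(i == j) for j in range(n)] for i in range(n)]
    for j in range(m + 1):
        sign = -1.0 if j % 2 else 1.0
        for r in range(n):
            for s in range(n):
                P[r][s] += c[j] * power[r][s]
                Q[r][s] += sign * c[j] * power[r][s]
        power = matmul(power, B)
    X = solve(Q, P)                    # step 2: r(B) = q(B)^{-1} p(B)
    for _ in range(k):                 # step 3: square k times
        X = matmul(X, X)
    return X

E = expm_ss([[1.0, 2.0], [0.0, 3.0]])
```

For this triangular test matrix the exact exponential has diagonal $e$, $e^3$ and off-diagonal entry $e^3 - e$, which gives a quick sanity check.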

Alternative SS Algorithm for eA
Suggested by Najfeld & Havel (1995): exploit
\[ \tau(A) = A\coth(A) = A(e^{2A} + I)(e^{2A} - I)^{-1}
= I + \cfrac{A^2}{3I + \cfrac{A^2}{5I + \cfrac{A^2}{7I + \cdots}}}. \]
1. $B = A/2^{k+1}$ so $\|A^2\|_\infty / 2^{2k+2} \le 1.152$
2. $r(B)$ = [8/8] Padé approximant to $\tau(B)$.
3. $X = \bigl[(r(B) + B)(r(B) - B)^{-1}\bigr]^{2^k}$
• Claimed to require fewer flops than original SS alg.

Principal Log and pth Root
Let $A \in \mathbb{C}^{n\times n}$ have no eigenvalues on $\mathbb{R}^-$.
Log
$X = \log A$ denotes unique $X$ such that
1. $e^{X} = A$.
2. $-\pi < \mathrm{Im}(\lambda(X)) < \pi$.
pth root
For integer $p > 0$, $X = A^{1/p}$ is unique $X$ such that
1. $X^{p} = A$.
2. $-\pi/p < \arg(\lambda(X)) < \pi/p$.

Briggs’ Log Method (1617)
$\log(ab) = \log a + \log b \;\Rightarrow\; \log a = 2 \log a^{1/2}$.
Use repeatedly:
\[ \log a = 2^k \log a^{1/2^k}. \]
Write $a^{1/2^k} = 1 + x$ and note $\log(1 + x) \approx x$. Briggs worked to base 10 and used
\[ \log_{10} a \approx 2^k \cdot \log_{10} e \cdot (a^{1/2^k} - 1). \]

Briggs must be viewed as one of the

great figures in numerical analysis.
— Herman H. Goldstine, A History of Numerical
Analysis (1977)

Can we generalize to matrices:
\[ \log A = 2^k \log A^{1/2^k}\,? \]
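Briggs' scalar approximation above is easy to try numerically. The following is a minimal sketch; the function name `briggs_log10` is invented for this illustration, and it uses floating-point square roots rather than Briggs' hand computation.

```python
import math

def briggs_log10(a, k):
    """Approximate log10(a) by 2^k * log10(e) * (a^(1/2^k) - 1)."""
    root = a
    for _ in range(k):          # k repeated square roots give a^(1/2^k)
        root = math.sqrt(root)
    return 2**k * math.log10(math.e) * (root - 1.0)

approx = briggs_log10(2.0, 20)
error = abs(approx - math.log10(2.0))
print(approx, error)
```

In floating-point arithmetic k cannot be pushed arbitrarily high: once $a^{1/2^k}$ is within a few rounding units of 1, the subtraction `root - 1.0` loses most of its significant digits.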

Splitting Lemma
Lemma 0 (Cheng, H, Kenney & Laub, 2001) Suppose $A = BC$ has no eigenvalues on $\mathbb{R}^-$ and
1. $BC = CB$.
2. Every eigenvalue of $B$ (or $C$) lies in the open halfplane of the corresponding eigenvalue of $A^{1/2}$.
Then $\log A = \log B + \log C$.
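[Figure: complex plane with axes Re λ and Im λ, marking an eigenvalue $\lambda_B$ of $B$ and the corresponding eigenvalue $\lambda_A^{1/2}$ of $A^{1/2}$.]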

Matrix Logarithm
Use the Briggs idea:
\[ \log A = 2^k \log A^{1/2^k}. \]
Kenney & Laub's (1989) inverse scaling and squaring method:
• Bring $A$ close to $I$ by repeated square roots.
• Approximate $\log A^{1/2^k}$ using an $[m/m]$ Padé approximant $r_m(x) \approx \log(1 - x)$.
• Rescale to find $\log A$.

Alg of Cheng, H, Kenney & Laub (2001)
• Transformation-free: uses only matrix mult, LU, inv.
• Sq. roots by product form of Denman–Beavers iteration:
\[ M_{k+1} = \frac{1}{2}\Bigl[ I + \frac{1}{2}\bigl(M_k + M_k^{-1}\bigr) \Bigr], \quad M_0 = A, \]
\[ Y_{k+1} = Y_k (I + M_k^{-1})/2, \quad Y_0 = A, \]
where $M_k \to I$ and $Y_k \to A^{1/2}$.

• Aims for a specified accuracy.
• Padé degree $m$ chosen using K & L's (1989) bound:
\[ \|r_m(X) - \log(I - X)\| \le |r_m(\|X\|) - \log(1 - \|X\|)|. \]
• $r_m$ evaluated using partial fraction expansion
\[ r_m(x) = \sum_{j=1}^{m} \frac{\alpha_j^{(m)} x}{1 + \beta_j^{(m)} x} : \]
fast and accurate (H, 2001).
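The product form of the Denman–Beavers iteration quoted above can be sketched as follows. This is a plain-Python illustration under stated assumptions: `matmul`, `inverse` and `sqrtm_db` are names made up here, the inverse uses simple Gauss-Jordan elimination, and the stopping test on $\|M_k - I\|$ is a choice of this sketch rather than the accuracy control of the 2001 algorithm.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inverse(A):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [float(i == j) for j in range(n)] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [[M[i][n + j] / M[i][i] for j in range(n)] for i in range(n)]

def sqrtm_db(A, tol=1e-14, maxit=50):
    """Product-form Denman-Beavers iteration: Y_k -> A^(1/2), M_k -> I."""
    n = len(A)
    I = [[float(i == j) for j in range(n)] for i in range(n)]
    M = [row[:] for row in A]
    Y = [row[:] for row in A]
    for _ in range(maxit):
        Minv = inverse(M)
        # Y_{k+1} = Y_k (I + M_k^{-1}) / 2
        Y = matmul(Y, [[(I[i][j] + Minv[i][j]) / 2 for j in range(n)]
                       for i in range(n)])
        # M_{k+1} = (I + (M_k + M_k^{-1}) / 2) / 2
        M = [[(I[i][j] + (M[i][j] + Minv[i][j]) / 2) / 2 for j in range(n)]
             for i in range(n)]
        if max(abs(M[i][j] - I[i][j]) for i in range(n) for j in range(n)) < tol:
            break
    return Y

A = [[4.0, 1.0], [1.0, 3.0]]
Y = sqrtm_db(A)
residual = max(abs(matmul(Y, Y)[i][j] - A[i][j]) for i in range(2) for j in range(2))
```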
Matrix pth Root
Square root: Björck & Hammarling (1983). Compute Schur decomp. $A = QTQ^*$ and then solve $R^2 = T$ by
\[ r_{ii} = \sqrt{t_{ii}}, \qquad r_{ij} = \frac{t_{ij} - \sum_{k=i+1}^{j-1} r_{ik} r_{kj}}{r_{ii} + r_{jj}}. \]
Extended to pth roots by Smith (2003): much more complicated recurrence.
These algs
• Have essentially optimal numerical stability.
• Generalize to real Schur decomp.
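The triangular recurrence above can be sketched in a few lines. This illustration assumes the Schur factor T is already available as a list of rows (computing the Schur decomposition itself is outside the sketch), and `sqrtm_triu` is a name invented here.

```python
import cmath

def sqrtm_triu(T):
    """Square root R of an upper triangular T, solving R^2 = T column by column."""
    n = len(T)
    R = [[0j] * n for _ in range(n)]
    for j in range(n):
        R[j][j] = cmath.sqrt(T[j][j])
        for i in range(j - 1, -1, -1):
            # r_ij = (t_ij - sum_{k=i+1}^{j-1} r_ik r_kj) / (r_ii + r_jj)
            s = sum(R[i][k] * R[k][j] for k in range(i + 1, j))
            R[i][j] = (T[i][j] - s) / (R[i][i] + R[j][j])
    return R

T = [[4.0, 1.0, 2.0],
     [0.0, 9.0, 3.0],
     [0.0, 0.0, 16.0]]
R = sqrtm_triu(T)
R2 = [[sum(R[i][k] * R[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
residual = max(abs(R2[i][j] - T[i][j]) for i in range(3) for j in range(3))
```

The denominator $r_{ii} + r_{jj}$ vanishes only if two diagonal square roots are negatives of each other, which cannot happen for the principal square root when T has no eigenvalues on the closed negative real axis.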

Matrix Cosine
Algorithm 0 (Serbin & Blalock, 1980) Given $A \in \mathbb{R}^{n\times n}$ and parameter $\alpha > 0$ this alg approximates $\cos(A)$.
Choose $m$ such that $2^{-m}\|A\| \approx \alpha$.
$C_0$ = Taylor or Padé approximation to $\cos(A/2^m)$.
for i = 0: m − 1
    $C_{i+1} = 2C_i^2 - I$
end
Choice of $m$ (i.e., $\alpha$)?
Which approximation?
Effect of rounding errors?
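The double-angle algorithm above can be sketched as follows. The choices alpha = 0.25 and an 8-term Taylor series for $C_0$ are assumptions of this sketch, not values from the slides, and `cosm_double_angle` is a made-up name.

```python
import math

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def cosm_double_angle(A, alpha=0.25, terms=8):
    """cos(A) via Taylor series for cos(A/2^m) and the recurrence C <- 2C^2 - I."""
    n = len(A)
    I = [[float(i == j) for j in range(n)] for i in range(n)]
    norm = max(sum(abs(x) for x in row) for row in A)   # infinity norm
    m = 0
    while norm / 2**m > alpha:
        m += 1
    B = [[x / 2**m for x in row] for row in A]
    B2 = matmul(B, B)
    C = [row[:] for row in I]
    term = [row[:] for row in I]
    for j in range(1, terms + 1):
        # next Taylor term: previous term times -B^2 / ((2j-1)(2j))
        term = [[-x / ((2 * j - 1) * (2 * j)) for x in row]
                for row in matmul(term, B2)]
        C = [[C[r][s] + term[r][s] for s in range(n)] for r in range(n)]
    for _ in range(m):
        C2 = matmul(C, C)
        C = [[2 * C2[r][s] - I[r][s] for s in range(n)] for r in range(n)]
    return C

C = cosm_double_angle([[1.0, 1.0], [0.0, 2.0]])
```

For this upper triangular test matrix the exact result has diagonal $\cos 1$, $\cos 2$ and off-diagonal entry $\cos 2 - \cos 1$.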

Alg of H & Smith (2002)
• Initial argument reduction and balancing to reduce norm.
• [8/8] Padé approximation proved fully accurate in IEEE double if $\|A\|_\infty \le 1$. More economical than Taylor series.
• "Schoolboy" evaluation of $r_8(A)$.
• Total cost: $(4 + \lceil \log_2(\|A\|_\infty) \rceil)M + D$.
• Error analysis gives bound containing terms $(4.1)^m$ and norms of intermediate $C_i$.

Numerical Stability
Is $\|\hat f - f\|$ consistent with condition of problem?
Is $\hat f = f(A + E)$ with $E$ "small", i.e., is residual $f^{-1}(\hat f) - A$ "small"?
Unclear for all algs discussed except "yes" for $A^{1/p}$.
• Currently lack characterizations of when an $f(A)$ problem is ill conditioned for nonnormal $A$.

OUTLINE
  Definitions of f(A)
  Applications
  Algorithms for particular f
▶ Schur–Parlett algorithm for general f
  Computing f(A)b

Similarity Transformations
Can use the formula
\[ A = XBX^{-1} \;\Rightarrow\; f(A) = X f(B) X^{-1}, \]
provided $f(B)$ is easily computable.
E.g. $B = \mathrm{diag}(\lambda_i)$ if $A$ diagonalizable.
Problem: any error $\Delta B$ in $f(B)$ magnified by up to $\kappa(X) = \|X\| \|X^{-1}\| \ge 1$.
Prefer to work with unitary $X$: thus can use
• eigendecomposition (diagonal $B$) when $A$ is normal ($AA^* = A^*A$),
• Schur decomposition (triangular $B$) in general.

Example: Eigendecomposition
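function F = funm_ev(A,fun)
% Evaluate fun at the matrix A via the eigendecomposition A = V*D/V.
[V,D] = eig(A);
F = V * diag(feval(fun,diag(D))) / V;

>> A = [3 -1; 1 1]; X = funm_ev(A,@sqrt)
X =
   1.7678e+000  -3.5355e-001
   3.5355e-001   1.0607e+000
>> norm(A-X^2) % cond(V) = 9.4e7
ans =
   9.9519e-009
>> Y = sqrtm(A); norm(A-Y^2)
ans =
   6.4855e-016

Here A is defective (double eigenvalue 2), so the computed V is nearly singular and the error is magnified by roughly cond(V) times the unit roundoff.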
Parlett’s Recurrence
Schur decomposition $A = QTQ^*$ reduces problem to $F = f(T)$, $T$ upper triangular.
$f_{ii} = f(t_{ii})$ is immediate.
Parlett (1976): from $FT = TF$ obtain recurrence
\[ f_{ij} = t_{ij}\,\frac{f_{ii} - f_{jj}}{t_{ii} - t_{jj}} + \sum_{k=i+1}^{j-1} \frac{f_{ik} t_{kj} - t_{ik} f_{kj}}{t_{ii} - t_{jj}}. \]
Used in MATLAB's funm.

Fails when T has repeated eigenvalues.
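Parlett's recurrence can be sketched for an upper triangular T with distinct diagonal entries. The function name `parlett` is made up for this illustration; it fills F one superdiagonal at a time so that every entry on the right-hand side is already known. With repeated diagonal entries the division fails, as noted above.

```python
import math

def parlett(T, f):
    """F = f(T) for upper triangular T with distinct diagonal, via F T = T F."""
    n = len(T)
    F = [[0.0] * n for _ in range(n)]
    for i in range(n):
        F[i][i] = f(T[i][i])
    for d in range(1, n):                 # d = j - i, superdiagonal index
        for i in range(n - d):
            j = i + d
            s = T[i][j] * (F[i][i] - F[j][j])
            s += sum(F[i][k] * T[k][j] - T[i][k] * F[k][j]
                     for k in range(i + 1, j))
            # Raises ZeroDivisionError if t_ii == t_jj (repeated eigenvalue).
            F[i][j] = s / (T[i][i] - T[j][j])
    return F

T = [[1.0, 2.0, 3.0],
     [0.0, 2.0, 4.0],
     [0.0, 0.0, 4.0]]
F_square = parlett(T, lambda x: x * x)    # should reproduce T*T
F_exp = parlett([[1.0, 2.0], [0.0, 3.0]], math.exp)
```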

Parlett vs. Björck & Hammarling
Parlett recurrence is not "optimal", as clear from sq. root case: $x_{12}$ obtained from
\[ \text{Parlett:}\quad \frac{a_{12}\,(\sqrt{a_{11}} - \sqrt{a_{22}})}{a_{11} - a_{22}} \;=\; \frac{a_{12}}{\sqrt{a_{11}} + \sqrt{a_{22}}} \quad\text{: B \& H.} \]
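The two expressions above are equal in exact arithmetic but behave differently in floating point when $a_{11} \approx a_{22}$. The sketch below is an added illustration with made-up function names and test values: the Parlett form divides one cancelled difference by another, while the Björck & Hammarling form involves no subtraction.

```python
import math
from decimal import Decimal, getcontext

def x12_parlett(a11, a12, a22):
    return a12 * (math.sqrt(a11) - math.sqrt(a22)) / (a11 - a22)

def x12_bh(a11, a12, a22):
    return a12 / (math.sqrt(a11) + math.sqrt(a22))

a11, a12, a22 = 1.0, 1.0, 1.0 + 1e-14

# High-precision reference value computed with the decimal module.
getcontext().prec = 50
ref = Decimal(a12) / (Decimal(a11).sqrt() + Decimal(a22).sqrt())

err_parlett = abs(Decimal(x12_parlett(a11, a12, a22)) - ref)
err_bh = abs(Decimal(x12_bh(a11, a12, a22)) - ref)
print(float(err_parlett), float(err_bh))
```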

Schur–Parlett Algorithm
H & Davies (2002):
• Compute Schur decomposition $A = QTQ^*$.
• Re-order $T$ to block triangular form in which eigenvalues within a block are "close" and those of separate blocks are "well separated".
• Evaluate $F_{ii} = f(T_{ii})$.
• Solve the Sylvester equations
\[ T_{ii}F_{ij} - F_{ij}T_{jj} = F_{ii}T_{ij} - T_{ij}F_{jj} + \sum_{k=i+1}^{j-1} (F_{ik}T_{kj} - T_{ik}F_{kj}). \]
• Undo the unitary transformations.

Function of Atomic Block
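Assume $f$ has Taylor series with ∞ radius of cgce and derivatives available.
For diagonal blocks $T$ use
\[ T = \sigma I + M, \quad \sigma = \mathrm{trace}(T)/n : \qquad f(T) = \sum_{k=0}^{\infty} \frac{f^{(k)}(\sigma)}{k!} M^k. \]
Truncate series based on strict error bound, not using size of terms. NB: for $n = 2$,
\[ M = \begin{bmatrix} \epsilon & \alpha \\ 0 & -\epsilon \end{bmatrix}
\;\Rightarrow\;
M^{2k} = \begin{bmatrix} \epsilon^{2k} & 0 \\ 0 & \epsilon^{2k} \end{bmatrix}, \quad
M^{2k+1} = \begin{bmatrix} \epsilon^{2k+1} & \alpha\epsilon^{2k} \\ 0 & -\epsilon^{2k+1} \end{bmatrix}. \]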
Features of Algorithm
• Costs $O(n^3)$ flops, or up to $n^4/3$ flops if large blocks needed (close, repeated eigenvalues).
• Needs derivatives if block size > 1: price to pay for treating general $f$ and nonnormal $A$.
• Best general $f(A)$ alg. Benchmark for comparing other $f(A)$ algs, both general and specific.
• The basis of a new funm for next MATLAB release.

OUTLINE
  Definitions of f(A)
  Applications
  Algorithms for particular f
  Schur–Parlett algorithm for general f
▶ Computing f(A)b

log(A) b
Apply quadrature rule $\int_0^1 f(t)\,dt \approx \sum_{k=1}^{m} c_k f(t_k)$ to (Wouk, 1965)
\[ \log A = \int_0^1 (A - I)\bigl[t(A - I) + I\bigr]^{-1}\,dt. \]
Combine with Hessenberg reduction $A = QHQ^T$ to get
\[ (\log A)\,b \approx Q \sum_{k=1}^{m} c_k \bigl[t_k(H - I) + I\bigr]^{-1} d, \qquad d = Q^T(A - I)b. \]
Costs $(10/3)n^3 + 2mn^2$ flops.
When $\|I - A\| < 1$ can use m-point Gauss-Legendre ≡ Padé approximation! Choose $m$ using (Kenney & Laub, 2001)
\[ \|r_{mm}(X) - \log(I + X)\| \le |r_{mm}(-\|X\|) - \log(1 - \|X\|)|. \]
When $\|I - A\| > 1$ use adaptive quadrature.

A^α b
\[ \frac{dy}{dt} = \alpha (A - I)\bigl[t(A - I) + I\bigr]^{-1} y, \qquad y(0) = b \]
has unique solution $y(t) = [t(A - I) + I]^{\alpha} b \;\Rightarrow\; y(1) = A^{\alpha} b$.
Used by Allen, Baglama & Boyd (2000) for α = 1/2, spd A.
Example using MATLAB's ode45.
A = gallery('parter',64), b = randn(64,1).

f(A)      tol    Succ. steps  Fail. atts  f evals  Rel. err
A^{-1/2}  1e-3   12           0           73       3.5e-8
          1e-6   14           0           85       6.0e-9
          1e-9   40           0           241      7.7e-12
A^{2/5}   1e-3   15           0           79       2.8e-8
          1e-6   16           0           91       2.4e-9
          1e-9   54           0           325      1.8e-12
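The ODE formulation above can be tried on a small example. The sketch below substitutes a fixed-step classical Runge-Kutta (RK4) integrator for the adaptive ode45 solver used in the slides, and works on a 2×2 symmetric positive definite matrix whose square root is known exactly. The function names and the step count are assumptions of this illustration.

```python
def solve_vec(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def power_times_vector(A, b, alpha, steps=200):
    """Approximate A^alpha b by integrating y' = alpha (A-I)[t(A-I)+I]^{-1} y on [0,1]."""
    n = len(A)
    AmI = [[A[i][j] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]

    def rhs(t, y):
        K = [[t * AmI[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
        w = solve_vec(K, y)
        return [alpha * sum(AmI[i][j] * w[j] for j in range(n)) for i in range(n)]

    h = 1.0 / steps
    y = list(b)
    for s in range(steps):
        t = s * h
        k1 = rhs(t, y)
        k2 = rhs(t + h / 2, [y[i] + h / 2 * k1[i] for i in range(n)])
        k3 = rhs(t + h / 2, [y[i] + h / 2 * k2[i] for i in range(n)])
        k4 = rhs(t + h, [y[i] + h * k3[i] for i in range(n)])
        y = [y[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) for i in range(n)]
    return y

# A = [[5,4],[4,5]] has square root [[2,1],[1,2]], so A^(1/2) [1,0]^T = [2,1]^T.
y = power_times_vector([[5.0, 4.0], [4.0, 5.0]], [1.0, 0.0], 0.5)
```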

Interpolation
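If $A$ has distinct eigenvalues $\lambda_j$, Lagrange interp poly:
\[ f(A)b = \sum_{j=0}^{n} f_j\,\ell_j(A)\,b, \qquad
\ell_j(x) = \frac{\prod_{k=0,\,k\ne j}^{n} (x - \lambda_k)}{\prod_{k=0,\,k\ne j}^{n} (\lambda_j - \lambda_k)}, \]
where $f_j = f(\lambda_j)$.
Cost: $O(n^4)$ flops.

For any $A$, Newton divided difference form:
\[ f(A)b = \sum_{i=0}^{n} c_i \prod_{j=0}^{i-1} (A - \lambda_j I)\,b, \qquad c_i = \text{(confluent) div. diffs.} \]
Requires derivatives of $f$. Cost: $O(n^3)$ flops.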

Cauchy Integral Theorem
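\[ y = \frac{1}{2\pi i} \int_\Gamma f(z)(zI - A)^{-1} b\,dz =: \int_\Gamma g(z)\,dz. \]
Take circle
\[ \Gamma : z - \alpha = \beta e^{i\theta}, \quad 0 \le \theta \le 2\pi. \]
Apply repeated trapezium rule (using $dz = i(z - \alpha)\,d\theta$):
\[ \int_\Gamma g(z)\,dz = i\int_0^{2\pi} (z(\theta) - \alpha)\,g(z(\theta))\,d\theta \approx \frac{2\pi i}{n} \sum_{k=0}^{n-1} (z_k - \alpha)\,g(z_k), \]
where $z_k - \alpha = \beta e^{2\pi k i/n}$.
Use Hessenberg reduction, as before.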
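The trapezium-rule approximation above reduces to $y \approx \frac{1}{n}\sum_k (z_k - \alpha) f(z_k)(z_k I - A)^{-1} b$, since the factors $2\pi i$ cancel. The sketch below evaluates that sum directly for a small matrix. It omits the Hessenberg reduction (each shifted system is solved by dense elimination), and the function names, contour and number of points are choices of this illustration.

```python
import cmath
import math

def solve_vec(A, b):
    """Solve A x = b (complex entries allowed) by Gauss-Jordan elimination."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def cauchy_fAb(f, A, b, center, radius, npts):
    """Approximate f(A) b by the trapezium rule on the circle |z - center| = radius."""
    n = len(A)
    y = [0j] * n
    for k in range(npts):
        w = radius * cmath.exp(2j * math.pi * k / npts)   # z_k - alpha
        z = center + w
        shifted = [[(z if i == j else 0) - A[i][j] for j in range(n)]
                   for i in range(n)]
        x = solve_vec(shifted, b)                          # (z_k I - A)^{-1} b
        fz = f(z)
        for i in range(n):
            y[i] += w * fz * x[i]
    return [yi / npts for yi in y]

# exp(A) b for A = [[1,1],[0,2]], b = [0,1]: exact value is [e^2 - e, e^2].
y = cauchy_fAb(cmath.exp, [[1.0, 1.0], [0.0, 2.0]], [0.0, 1.0], 1.5, 2.0, 40)
```

The circle here encloses both eigenvalues (1 and 2) with a comfortable margin, which is the favourable case described in the error discussion that follows.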

Euler-Maclaurin Error Bound
$h(x)$ period $2\pi$, in $C^{2k+1}(-\infty, \infty)$, $|h^{(2k+1)}(x)| \le M$:
\[ \left| \int_0^{2\pi} h(x)\,dx - T_n(h) \right| \le \frac{4\pi M \zeta(2k+1)}{n^{2k+1}}. \]
• $h^{(2k+1)}(x)$ proportional to $\beta^{2k+2}$, where $\beta$ = radius of circle.
• $h^{(2k+1)}(x)$ contains powers of resolvent $(z(\theta)I - A)^{-1}$. Bad if contour close to some $\lambda_i$ or $A$ highly nonnormal.
• $h^{(2k+1)}(x)$ contains derivatives of $f$ on contour.
Conclude: restricted to matrices
  not too nonnormal,
  whose $\lambda_i$ can be enclosed in circle of small radius not close to singularity of derivs of $f$.
Future Work
• Theory and algorithms for non-primary functions, perhaps linked to an $f(A(t))$ application.
• Better understanding of conditioning of $f(A)$.
• Exploiting structure, e.g. $A \in$ matrix automorphism group (H, Mackey, Mackey & Tisseur, 2003).
http://www.ma.man.ac.uk/~higham/
