GENE H. GOLUB · CHARLES F. VAN LOAN

MATRIX COMPUTATIONS
Third Edition

Johns Hopkins Studies in the Mathematical Sciences
in association with the Department of Mathematical Sciences, The Johns Hopkins University

Gene H. Golub, Department of Computer Science, Stanford University
Charles F. Van Loan, Department of Computer Science, Cornell University

The Johns Hopkins University Press, Baltimore and London

©1983, 1989, 1996 The Johns Hopkins University Press. All rights reserved. Published 1996.
Printed in the United States of America on acid-free paper.
05 04 03 02 01 00 99 98 97    5 4 3 2
First edition 1983. Second edition 1989. Third edition 1996.
The Johns Hopkins University Press, 2715 North Charles Street, Baltimore, Maryland 21218-4319.
The Johns Hopkins Press Ltd., London.
Library of Congress Cataloging-in-Publication Data will be found at the end of this book.
ISBN 0-8018-5414-8 (pbk.)

DEDICATED TO ALSTON S. HOUSEHOLDER AND JAMES H. WILKINSON

Contents

Preface to the Third Edition
Software
Selected References

1  Matrix Multiplication Problems
   1.1  Basic Algorithms and Notation
   1.2  Exploiting Structure
   1.3  Block Matrices and Algorithms
   1.4  Vectorization and Re-Use Issues

2  Matrix Analysis
   2.1  Basic Ideas from Linear Algebra
   2.2  Vector Norms
   2.3  Matrix Norms
   2.4  Finite Precision Matrix Computations
   2.5  Orthogonality and the SVD
   2.6  Projections and the CS Decomposition
   2.7  The Sensitivity of Square Linear Systems

3  General Linear Systems
   3.1  Triangular Systems
   3.2  The LU Factorization
   3.3  Roundoff Analysis of Gaussian Elimination
   3.4  Pivoting
   3.5  Improving and Estimating Accuracy

4  Special Linear Systems
   4.1  The LDM^T and LDL^T Factorizations
   4.2  Positive Definite Systems
   4.3  Banded Systems
   4.4  Symmetric Indefinite Systems
   4.5  Block Systems
   4.6  Vandermonde Systems and the FFT
   4.7  Toeplitz and Related Systems

5  Orthogonalization and Least Squares
   5.1  Householder and Givens Matrices
   5.2  The QR Factorization
   5.3  The Full Rank LS Problem
   5.4  Other Orthogonal Factorizations
   5.5  The Rank Deficient LS Problem
   5.6  Weighting and Iterative Improvement
   5.7  Square and Underdetermined Systems

6  Parallel Matrix Computations
   6.1  Basic Concepts
   6.2  Matrix Multiplication
   6.3  Factorizations

7  The Unsymmetric Eigenvalue Problem
   7.1  Properties and Decompositions
   7.2  Perturbation Theory
   7.3  Power Iterations
   7.4  The Hessenberg and Real Schur Forms
   7.5  The Practical QR Algorithm
   7.6  Invariant Subspace Computations
   7.7  The QZ Method for Ax = λBx

8  The Symmetric Eigenvalue Problem
   8.1  Properties and Decompositions
   8.2  Power Iterations
   8.3  The Symmetric QR Algorithm
   8.4  Jacobi Methods
   8.5  Tridiagonal Methods
   8.6  Computing the SVD
   8.7  Some Generalized Eigenvalue Problems

9  Lanczos Methods
   9.1  Derivation and Convergence Properties
   9.2  Practical Lanczos Procedures
   9.3  Applications to Ax = b and Least Squares
   9.4  Arnoldi and Unsymmetric Lanczos

10  Iterative Methods for Linear Systems
   10.1  The Standard Iterations
   10.2  The Conjugate Gradient Method
   10.3  Preconditioned Conjugate Gradients
   10.4  Other Krylov Subspace Methods
11  Functions of Matrices
   11.1  Eigenvalue Methods
   11.2  Approximation Methods
   11.3  The Matrix Exponential

12  Special Topics
   12.1  Constrained Least Squares
   12.2  Subset Selection Using the SVD
   12.3  Total Least Squares
   12.4  Computing Subspaces with the SVD
   12.5  Updating Matrix Factorizations
   12.6  Modified/Structured Eigenproblems

Bibliography
Index

Preface to the Third Edition

The field of matrix computations continues to grow and mature. In the Third Edition we have added over 300 new references and 100 new problems. The LINPACK and EISPACK citations have been replaced with appropriate pointers to LAPACK with key codes tabulated at the beginning of appropriate chapters.

In the First Edition and Second Edition we identified a small number of global references: Wilkinson (1965), Forsythe and Moler (1967), Stewart (1973), Lawson and Hanson (1974), and Parlett (1980). These volumes are as important as ever to the research landscape, but there are some magnificent new textbooks and monographs on the scene. See The Literature section that follows.

We continue as before with the practice of giving references at the end of each section and a master bibliography at the end of the book.

The earlier editions suffered from a large number of typographical errors and we are obliged to the dozens of readers who have brought these to our attention. Many corrections and clarifications have been made.

Here are some specific highlights of the new edition. Chapter 1 (Matrix Multiplication Problems) and Chapter 6 (Parallel Matrix Computations) have been completely rewritten with less formality. We think that this facilitates the building of intuition for high performance computing and draws a better line between algorithm and implementation on the printed page.

In Chapter 2 (Matrix Analysis) we expanded the treatment of the CS decomposition and included a proof. The overview of floating point arithmetic has been brought up to date. In Chapter 4 (Special Linear Systems) we embellished the Toeplitz section with connections to circulant matrices and the fast Fourier transform. A subsection on equilibrium systems has been included in our treatment of symmetric indefinite systems.

A more accurate rendition of the modified Gram-Schmidt process is offered in Chapter 5 (Orthogonalization and Least Squares). Chapter 8 (The Symmetric Eigenproblem) has been extensively rewritten and rearranged so as to minimize its dependence upon Chapter 7 (The Unsymmetric Eigenproblem). Indeed, the coupling between these two chapters is now so minimal that it is possible to read either one first.

In Chapter 9 (Lanczos Methods) we have expanded the discussion of the unsymmetric Lanczos process and the Arnoldi iteration. The "unsymmetric component" of Chapter 10 (Iterative Methods for Linear Systems) has likewise been broadened with a whole new section devoted to various Krylov space methods designed to handle the sparse unsymmetric linear system problem.

In §12.5 (Updating Orthogonal Decompositions) we included a new subsection on ULV updating. Toeplitz matrix eigenproblems and orthogonal matrix eigenproblems are discussed in §12.6.

Both of us look forward to continuing the dialog with our readers.
As we said in the Preface to the Second Edition, "It has been a pleasure to deal with such an interested and friendly readership."

Many individuals made valuable Third Edition suggestions, but Greg Ammar, Mike Heath, Nick Trefethen, and Steve Vavasis deserve special thanks. Finally, we would like to acknowledge the support of Cindy Robinson at Cornell. A dedicated assistant makes a big difference.

Software

LAPACK

Many of the algorithms in this book are implemented in the software package LAPACK:

E. Anderson, Z. Bai, C. Bischof, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, S. Ostrouchov, and D. Sorensen (1995). LAPACK Users' Guide, Release 2.0, 2nd ed., SIAM Publications, Philadelphia.

Pointers to some of the more important routines in this package are given at the beginning of selected chapters:

Chapter 1.  Level-1, Level-2, Level-3 BLAS
Chapter 3.  General Linear Systems
Chapter 4.  Positive Definite and Band Systems
Chapter 5.  Orthogonalization and Least Squares Problems
Chapter 7.  The Unsymmetric Eigenvalue Problem
Chapter 8.  The Symmetric Eigenvalue Problem

Our LAPACK references are spare in detail but rich enough to "get you started." Thus, when we say that _TRSV can be used to solve a triangular system Ax = b, we leave it to you to discover through the LAPACK manual that A can be either upper or lower triangular and that the transposed system A^T x = b can be handled as well. Moreover, the underscore is a placeholder whose mission is to designate type (single, double, complex, etc.).

LAPACK stands on the shoulders of two other packages that are milestones in the history of software development. EISPACK was developed in the early 1970s and is dedicated to solving symmetric, unsymmetric, and generalized eigenproblems:

B.T. Smith, J.M. Boyle, Y. Ikebe, V.C. Klema, and C.B. Moler (1970). Matrix Eigensystem Routines: EISPACK Guide, 2nd ed., Lecture Notes in Computer Science, Volume 6, Springer-Verlag, New York.

B.S. Garbow, J.M. Boyle, J.J. Dongarra, and C.B. Moler (1972). Matrix Eigensystem Routines: EISPACK Guide Extension, Lecture Notes in Computer Science, Volume 51, Springer-Verlag, New York.

LINPACK was developed in the late 1970s for linear equations and least squares problems:

J.J. Dongarra, J.R. Bunch, C.B. Moler, and G.W. Stewart (1979). LINPACK Users' Guide, SIAM Publications, Philadelphia.

EISPACK and LINPACK have their roots in a sequence of papers that feature Algol implementations of some of the key matrix factorizations. These papers are collected in

J.H. Wilkinson and C. Reinsch, eds. (1971). Handbook for Automatic Computation, Vol. 2, Linear Algebra, Springer-Verlag, New York.

NETLIB

A wide range of software including LAPACK, EISPACK, and LINPACK is available electronically via Netlib:

World Wide Web:  http://www.netlib.org/index.html
Anonymous ftp:   ftp://ftp.netlib.org
Via email, send the one-line message "send index" to netlib@ornl.gov to get started.

MATLAB

Complementing LAPACK and defining a very popular matrix computation environment is MATLAB:

MATLAB User's Guide, The MathWorks Inc., Natick, Massachusetts.

M. Marcus (1993). Matrices and MATLAB: A Tutorial, Prentice Hall, Upper Saddle River, NJ.

R. Pratap (1995). Getting Started with MATLAB, Saunders College Publishing, Fort Worth, TX.

Many of the problems in Matrix Computations are best posed to students as MATLAB problems. We make extensive use of MATLAB notation in the presentation of algorithms.

Selected References

Each section in the book concludes with an annotated list of references. A master bibliography is given at the end of the text.
Useful books that collectively cover the field are cited below. Chapter titles are included if appropriate, but do not infer too much from the level of detail because one author's chapter may be another's subsection. The citations are classified as follows:

Pre-1970 Classics. Early volumes that set the stage.
Introductory (General). Suitable for the undergraduate classroom.
Advanced (General). Best for practitioners and graduate students.
Analytical. For the supporting mathematics.
Linear Equation Problems. Ax = b.
Linear Fitting Problems. Ax ≈ b.
Eigenvalue Problems. Ax = λx.
High Performance. Parallel/vector issues.
Edited Volumes. Useful, thematic collections.

Within each group the entries are specified in chronological order.

Pre-1970 Classics

V.N. Faddeeva (1959). Computational Methods of Linear Algebra, Dover, New York.
    Basic Material from Linear Algebra. Systems of Linear Equations. The Proper Numbers and Proper Vectors of a Matrix.

E. Bodewig (1959). Matrix Calculus, North Holland, Amsterdam.
    Matrix Calculus. Direct Methods for Linear Equations. Indirect Methods for Linear Equations. Inversion of Matrices. Geodetic Matrices. Eigenproblems.

R.S. Varga (1962). Matrix Iterative Analysis, Prentice-Hall, Englewood Cliffs, NJ.

J.H. Wilkinson (1963). Rounding Errors in Algebraic Processes, Prentice-Hall, Englewood Cliffs, NJ.
    The Fundamental Arithmetic Operations. Computations Involving Polynomials. Matrix Computations.

A.S. Householder (1964). Theory of Matrices in Numerical Analysis, Blaisdell, New York. Reprinted in 1974 by Dover, New York.
    Some Basic Identities and Inequalities. Norms, Bounds, and Convergence. Localization Theorems and Other Inequalities. The Solution of Linear Systems: Methods of Successive Approximation. Direct Methods of Inversion. Proper Values and Vectors: Normalization and Reduction of the Matrix. Proper Values and Vectors: Successive Approximation.

L. Fox (1964). An Introduction to Numerical Linear Algebra, Oxford University Press, Oxford, England.
    Introduction. Matrix Algebra. Elimination Methods of Gauss, Jordan, and Aitken. Compact Elimination Methods of Doolittle, Crout, Banachiewicz, and Cholesky. Orthogonalization Methods. Condition, Accuracy, and Precision. Comparison of Methods, Measure of Work. Iterative and Gradient Methods. Iterative Methods for Latent Roots and Vectors. Transformation Methods for Latent Roots and Vectors. Notes on Error Analysis for Latent Roots and Vectors.

J.H. Wilkinson (1965). The Algebraic Eigenvalue Problem, Clarendon Press, Oxford, England.
    Theoretical Background. Perturbation Theory. Error Analysis. Solution of Linear Algebraic Equations. Hermitian Matrices. Reduction of a General Matrix to Condensed Form. Eigenvalues of Matrices of Condensed Forms. The LR and QR Algorithms. Iterative Methods.

G.E. Forsythe and C. Moler (1967). Computer Solution of Linear Algebraic Systems, Prentice-Hall, Englewood Cliffs, NJ.
    Reader's Background and Purpose of Book. Vector and Matrix Norms. Diagonal Form of a Matrix Under Orthogonal Equivalence. Proof of Diagonal Form Theorem. Types of Computational Problems in Linear Algebra. Types of Matrices Encountered in Practical Problems. Sources of Computational Problems of Linear Algebra. Condition of a Linear System. Gaussian Elimination and LU Decomposition. Need for Interchanging Rows. Scaling Equations and Unknowns. The Crout and Doolittle Variants. Iterative Improvement. Computing the Determinant. Nearly Singular Matrices. Algol 60 Program. Fortran, Extended Algol, and PL/I Programs.
    Matrix Inversion. An Example: Hilbert Matrix. Floating Point Round-Off Analysis. Rounding Error in Gaussian Elimination. Convergence of Iterative Improvement. Positive Definite Matrices; Band Matrices. Iterative Methods for Solving Linear Systems. Nonlinear Systems of Equations.

Introductory (General)

A.R. Gourlay and G.A. Watson (1973). Computational Methods for Matrix Eigenproblems, John Wiley & Sons, New York.
    Introduction. Background Theory. Reductions and Transformations. Methods for the Dominant Eigenvalue. Methods for the Subdominant Eigenvalue. Inverse Iteration. Jacobi's Methods. Givens' and Householder's Methods. Eigensystem of a Symmetric Tridiagonal Matrix. The LR and QR Algorithms. Extensions of Jacobi's Method. Extensions of Givens' and Householder's Methods. QR Algorithm for Hessenberg Matrices. Generalized Eigenvalue Problems. Available Implementations.

G.W. Stewart (1973). Introduction to Matrix Computations, Academic Press, New York.
    Preliminaries. Practicalities. The Direct Solution of Linear Systems. Norms, Limits, and Condition Numbers. The Linear Least Squares Problem. Eigenvalues and Eigenvectors. The QR Algorithm.

R.J. Goult, R.F. Hoskins, J.A. Milner, and M.J. Pratt (1974). Computational Methods in Linear Algebra, John Wiley and Sons, New York.
    Eigenvalues and Eigenvectors. Error Analysis. The Solution of Linear Equations by Elimination and Decomposition Methods. The Solution of Linear Systems of Equations by Iterative Methods. Errors in the Solution Sets of Equations. Computation of Eigenvalues and Eigenvectors. Errors in Eigenvalues and Eigenvectors. Appendix: A Survey of Essential Results from Linear Algebra.

T.F. Coleman and C.F. Van Loan (1988). Handbook for Matrix Computations, SIAM Publications, Philadelphia, PA.
    Fortran 77. The Basic Linear Algebra Subprograms. Linpack. MATLAB.

W.W. Hager (1988). Applied Numerical Linear Algebra, Prentice-Hall, Englewood Cliffs, NJ.
    Introduction. Elimination Schemes. Conditioning. Nonlinear Systems. Least Squares. Eigenproblems. Iterative Methods.

P.G. Ciarlet (1989). Introduction to Numerical Linear Algebra and Optimization, Cambridge University Press.
    A Summary of Results on Matrices. General Results in the Numerical Analysis of Matrices. Sources of Problems in the Numerical Analysis of Matrices. Direct Methods for the Solution of Linear Systems. Iterative Methods for the Solution of Linear Systems. Methods for the Calculation of Eigenvalues and Eigenvectors. A Review of Differential Calculus. Some Applications. General Results on Optimization. Some Algorithms. Introduction to Nonlinear Programming. Linear Programming.

D.S. Watkins (1991). Fundamentals of Matrix Computations, John Wiley and Sons, New York.
    Gaussian Elimination and Its Variants. Sensitivity of Linear Systems; Effects of Roundoff Errors.

P. Gill, W. Murray, and M.H. Wright (1991). Numerical Linear Algebra and Optimization, Vol. 1, Addison-Wesley, Reading, MA.
    Introduction. Linear Algebra Background. Computation and Condition. Linear Equations. Compatible Systems. Linear Least Squares. Linear Constraints I. Linear Programming. The Simplex Method.

A. Jennings and J.J. McKeown (1992). Matrix Computation (2nd ed.), John Wiley and Sons, New York.
    Basic Algebraic and Numerical Concepts. Some Matrix Problems. Computer Implementation. Elimination Methods for Linear Equations. Sparse Matrix Elimination. Some Matrix Eigenvalue Problems. Transformation Methods for Eigenvalue Problems. Sturm Sequence Methods.
    Vector Iterative Methods for Partial Eigensolution. Orthogonalization and Re-Solution Techniques for Linear Equations. Iterative Methods for Linear Equations. Non-linear Equations. Parallel and Vector Computing.

B.N. Datta (1995). Numerical Linear Algebra and Applications, Brooks/Cole Publishing Company, Pacific Grove, California.
    Review of Required Linear Algebra Concepts. Floating Point Numbers and Errors in Computations. Stability of Algorithms and Conditioning of Problems. Numerically Effective Algorithms and Mathematical Software. Some Useful Transformations in Numerical Linear Algebra and Their Applications. Numerical Matrix Eigenvalue Problems. The Generalized Eigenvalue Problem. The Singular Value Decomposition. A Taste of Roundoff Error Analysis.

M.T. Heath (1997). Scientific Computing: An Introductory Survey, McGraw-Hill, New York.
    Scientific Computing. Systems of Linear Equations. Linear Least Squares. Eigenvalues and Singular Values. Nonlinear Equations. Optimization. Interpolation. Numerical Integration and Differentiation. Initial Value Problems for ODEs. Boundary Value Problems for ODEs. Partial Differential Equations. Fast Fourier Transform. Random Numbers and Simulation.

C.F. Van Loan (1997). Introduction to Scientific Computing: A Matrix-Vector Approach Using MATLAB, Prentice Hall, Upper Saddle River, NJ.
    Power Tools of the Trade. Polynomial Interpolation. Piecewise Polynomial Interpolation. Numerical Integration. Matrix Computations. Linear Systems. The QR and Cholesky Factorizations. Nonlinear Equations and Optimization. The Initial Value Problem.

Advanced (General)

N.J. Higham (1996). Accuracy and Stability of Numerical Algorithms, SIAM Publications, Philadelphia, PA.
    Principles of Finite Precision Computation. Floating Point Arithmetic. Basics. Summation. Polynomials. Norms. Perturbation Theory for Linear Systems. Triangular Systems. LU Factorization and Linear Equations. Cholesky Factorization. Iterative Refinement. Block LU Factorization. Matrix Inversion. Condition Number Estimation. The Sylvester Equation. Stationary Iterative Methods. Matrix Powers. QR Factorization. The Least Squares Problem. Underdetermined Systems. Vandermonde Systems. Fast Matrix Multiplication. The Fast Fourier Transform and Applications. Automatic Error Analysis. Software Issues in Floating Point Arithmetic. A Gallery of Test Matrices.

J.W. Demmel (1996). Numerical Linear Algebra, SIAM Publications, Philadelphia, PA.
    Introduction. Linear Equation Solving. Linear Least Squares Problems. Nonsymmetric Eigenvalue Problems. The Symmetric Eigenproblem and Singular Value Decomposition. Iterative Methods for Linear Systems and Eigenvalue Problems. Iterative Algorithms for Eigenvalue Problems.

L.N. Trefethen and D. Bau III (1997). Numerical Linear Algebra, SIAM Publications, Philadelphia, PA.
    Matrix-Vector Multiplication. Orthogonal Vectors and Matrices. Norms. The Singular Value Decomposition. More on the SVD. Projectors. QR Factorization. Gram-Schmidt Orthogonalization. Hessenberg/Tridiagonal Form. Rayleigh Quotient, Inverse Iteration. QR Algorithm Without Shifts. QR Algorithm With Shifts. Other Eigenvalue Algorithms. Computing the SVD. Overview of Iterative Methods. The Arnoldi Iteration. How Arnoldi Locates Eigenvalues. GMRES. The Lanczos Iteration. Orthogonal Polynomials and Gauss Quadrature. Conjugate Gradients. Biorthogonalization Methods. Preconditioning. The Definition of Numerical Analysis.

Analytical

F.R. Gantmacher (1959). The Theory of Matrices, Vol. 1, Chelsea, New York.
    Matrices and Operations on Matrices. The Algorithm of Gauss and Some of Its Applications.
F.R. Gantmacher (1959). The Theory of Matrices, Vol. 2, Chelsea, New York.
    Complex Symmetric, Skew-Symmetric, and Orthogonal Matrices. Singular Pencils of Matrices. Matrices with Nonnegative Elements. Application of the Theory of Matrices to the Investigation of Systems of Linear Differential Equations. The Problem of Routh-Hurwitz and Related Questions.

A. Berman and R.J. Plemmons (1979). Nonnegative Matrices in the Mathematical Sciences, Academic Press, New York. Reprinted with additions in 1994 by SIAM Publications, Philadelphia, PA.
    Matrices Which Leave a Cone Invariant. Nonnegative Matrices. Semigroups of Nonnegative Matrices. Symmetric Nonnegative Matrices. Generalized Inverse-Positivity. M-Matrices. Iterative Methods for Linear Systems. Finite Markov Chains. Input-Output Analysis in Economics. The Linear Complementarity Problem.

G.W. Stewart and J. Sun (1990). Matrix Perturbation Theory, Academic Press, San Diego.
    Preliminaries. Norms and Metrics. Linear Systems and Least Squares Problems. The Perturbation of Eigenvalues. Invariant Subspaces. Generalized Eigenvalue Problems.

R. Horn and C. Johnson (1985). Matrix Analysis, Cambridge University Press, New York.
    Review and Miscellanea. Eigenvalues, Eigenvectors, and Similarity. Unitary Equivalence and Normal Matrices. Canonical Forms. Hermitian and Symmetric Matrices. Norms for Vectors and Matrices. Location and Perturbation of Eigenvalues. Positive Definite Matrices.

R. Horn and C. Johnson (1991). Topics in Matrix Analysis, Cambridge University Press, New York.
    The Field of Values. Stable Matrices and Inertia. Singular Value Inequalities. Matrix Equations and the Kronecker Product. The Hadamard Product. Matrices and Functions.

Linear Equation Problems

D.M. Young (1971). Iterative Solution of Large Linear Systems, Academic Press, New York.
    Introduction. Matrix Preliminaries. Linear Stationary Iterative Methods. Convergence of the Basic Iterative Methods. Eigenvalues of the SOR Method for Consistently Ordered Matrices. Determination of the Optimum Relaxation Parameter. Norms of the SOR Method. The Modified SOR Method: Fixed Parameters. Nonstationary Linear Iterative Methods. The Modified SOR Method: Variable Parameters. Semi-iterative Methods. Extensions of the SOR Theory; Stieltjes Matrices. Generalized Consistently Ordered Matrices. Group Iterative Methods. Symmetric SOR Method and Related Methods. Second Degree Methods. Alternating Direction Implicit Methods. Selection of an Iterative Method.

L.A. Hageman and D.M. Young (1981). Applied Iterative Methods, Academic Press, New York.
    Background on Linear Algebra and Related Topics. Background on Basic Iterative Methods. Polynomial Acceleration. Chebyshev Acceleration. An Adaptive Chebyshev Procedure Using Special Norms. Adaptive Chebyshev Acceleration. Conjugate Gradient Acceleration. Special Methods for Red/Black Partitionings. Adaptive Procedures for the Successive Overrelaxation Method. The Use of Iterative Methods in the Solution of Partial Differential Equations. Case Studies. The Nonsymmetrizable Case.

A. George and J.W-H. Liu (1981). Computer Solution of Large Sparse Positive Definite Systems, Prentice-Hall, Englewood Cliffs, New Jersey.
    Introduction. Fundamentals. Some Graph Theory Notation and Its Use in the Study of Sparse Symmetric Matrices. Band and Envelope Methods. General Sparse Methods. Quotient Tree Methods for Finite Element and Finite Difference Problems. One-Way Dissection Methods for Finite Element Problems. Nested Dissection Methods. Numerical Experiments.
S. Pissanetsky (1984). Sparse Matrix Technology, Academic Press, New York.
    Fundamentals. Linear Algebraic Equations. Numerical Errors in Gaussian Elimination. Ordering for Gauss Elimination: Symmetric Matrices. Ordering for Gauss Elimination: General Matrices. Sparse Eigenanalysis. Sparse Matrix Algebra. Connectivity and Nodal Assembly. General Purpose Algorithms.

I.S. Duff, A.M. Erisman, and J.K. Reid (1986). Direct Methods for Sparse Matrices, Oxford University Press, New York.
    Introduction. Sparse Matrices: Storage Schemes and Simple Operations. Gaussian Elimination for Dense Matrices: The Algebraic Problem. Gaussian Elimination for Dense Matrices: Numerical Considerations. Gaussian Elimination for Sparse Matrices: An Introduction. Reduction to Block Triangular Form. Local Pivotal Strategies for Sparse Matrices. Ordering Sparse Matrices to Special Forms. Implementing Gaussian Elimination: Analyse with Numerical Values. Implementing Gaussian Elimination with Symbolic Analyse. Partitioning, Matrix Modification, and Tearing. Other Sparsity-Oriented Issues.

R. Barrett, M. Berry, T.F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. van der Vorst (1993). Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, SIAM Publications, Philadelphia, PA.
    Introduction. Why Use Templates? What Methods are Covered? Iterative Methods. Stationary Methods. Nonstationary Iterative Methods. Survey of Recent Krylov Methods. Jacobi, Incomplete, SSOR, and Polynomial Preconditioners. Complex Systems. Stopping Criteria. Data Structures. Parallelism. The Lanczos Connection. Block Iterative Methods. Reduced System Preconditioning. Domain Decomposition Methods. Multigrid Methods. Row Projection Methods.

W. Hackbusch (1994). Iterative Solution of Large Sparse Systems of Equations, Springer-Verlag, New York.
    Introduction. Recapitulation of Linear Algebra. Iterative Methods. Methods of Jacobi and Gauss-Seidel and SOR Iteration in the Positive Definite Case. Analysis in the 2-Cyclic Case. Analysis for M-Matrices. Semi-Iterative Methods. Transformations, Secondary Iterations, Incomplete Triangular Decompositions. Conjugate Gradient Methods. Multi-Grid Methods. Domain Decomposition Methods.

O. Axelsson (1994). Iterative Solution Methods, Cambridge University Press.
    Direct Solution Methods. Theory of Matrix Eigenvalues. Positive Definite Matrices and Generalized Eigenvalue Problems. Reducible and Irreducible Matrices. Conjugate Gradient and Lanczos-Type Methods. Generalized Conjugate Gradient Methods. The Rate of Convergence of the Conjugate Gradient Method.

Y. Saad (1996). Iterative Methods for Sparse Linear Systems, PWS Publishing Co., Boston.
    Background in Linear Algebra. Discretization of PDEs. Sparse Matrices. Basic Iterative Methods. Projection Methods. Krylov Subspace Methods – Part I. Krylov Subspace Methods – Part II. Methods Related to the Normal Equations. Preconditioned Iterations. Preconditioning Techniques. Parallel Implementations. Parallel Preconditioners. Domain Decomposition Methods.

Linear Fitting Problems

C.L. Lawson and R.J. Hanson (1974). Solving Least Squares Problems, Prentice-Hall, Englewood Cliffs, NJ. Reprinted with a detailed "new developments" appendix in 1996 by SIAM Publications, Philadelphia, PA.
    Introduction. Analysis of the Least Squares Problem. Orthogonal Decomposition by Certain Elementary Orthogonal Transformations. Computing the Solution for the Overdetermined or Exactly Determined Full Rank Problem. Computation of the Covariance Matrix of the Solution Parameters. Computing the Solution for the Underdetermined Full Rank Problem.
    Computing the Solution for Problem LS with Possibly Deficient Pseudorank. Analysis of Computing Errors for Householder Transformations. Analysis of Computing Errors for the Problem LS. Analysis of Computing Errors for the Problem LS Using Mixed Precision Arithmetic. Computation of the Singular Value Decomposition and the Solution of Problem LS. Other Methods for Least Squares Problems. Linear Least Squares with Linear Equality Constraints Using a Basis of the Null Space. Linear Least Squares with Linear Equality Constraints by Direct Elimination. Linear Least Squares with Linear Equality Constraints by Weighting. Linear Least Squares with Linear Inequality Constraints.

R.W. Farebrother (1987). Linear Least Squares Computations, Marcel Dekker, New York.
    Canonical Expressions for the Least Squares Estimators and Test Statistics. Traditional Expressions for the Least Squares Updating Formulas and Test Statistics. Least Squares Estimation Subject to Linear Constraints.

S. Van Huffel and J. Vandewalle (1991). The Total Least Squares Problem: Computational Aspects and Analysis, SIAM Publications, Philadelphia, PA.
    Introduction. Basic Principles of the Total Least Squares Problem. Extensions of the Basic Total Least Squares Problem. Direct Speed Improvement of the Total Least Squares Computations. Iterative Speed Improvement for Solving Slowly Varying Total Least Squares Problems. Algebraic Connections Between Total Least Squares and Least Squares Problems. Sensitivity Analysis of Total Least Squares and Least Squares Problems in the Presence of Errors in All Data. Statistical Properties of the Total Least Squares Problem.

Å. Björck (1996). Numerical Methods for Least Squares Problems, SIAM Publications, Philadelphia, PA.
    Mathematical and Statistical Properties of Least Squares Solutions. Basic Numerical Methods. Modified Least Squares Problems. Generalized Least Squares Problems. Constrained Least Squares Problems. Direct Methods for Sparse Least Squares Problems. Iterative Methods for Least Squares Problems. Least Squares with Special Bases. Nonlinear Least Squares Problems.

Eigenvalue Problems

B.N. Parlett (1980). The Symmetric Eigenvalue Problem, Prentice-Hall, Englewood Cliffs, NJ.
    Basic Facts about Self-Adjoint Matrices. Tasks, Obstacles, and Aids. Counting Eigenvalues. Subspace Iteration. The General Linear Eigenvalue Problem.

J. Cullum and R.A. Willoughby (1985a). Lanczos Algorithms for Large Symmetric Eigenvalue Computations, Vol. I Theory, Birkhäuser, Boston.
    Preliminaries: Notation and Definitions. Real Symmetric Problems. Lanczos Procedures, Real Symmetric Problems. Tridiagonal Matrices. Lanczos Procedures with No Reorthogonalization for Symmetric Problems. Real Rectangular Matrices. Nondefective Complex Symmetric Matrices. Block Lanczos Procedures, Real Symmetric Matrices.

J. Cullum and R.A. Willoughby (1985b). Lanczos Algorithms for Large Symmetric Eigenvalue Computations, Vol. II Programs, Birkhäuser, Boston.
    Lanczos Procedures. Real Symmetric Matrices. Hermitian Matrices. Factored Inverses of Real Symmetric Matrices. Real Symmetric Generalized Problems. Real Rectangular Problems. Nondefective Complex Symmetric Matrices. Real Symmetric Matrices, Block Lanczos Code. Factored Inverses, Real Symmetric Matrices, Block Lanczos Code.

Y. Saad (1992). Numerical Methods for Large Eigenvalue Problems: Theory and Algorithms, John Wiley and Sons, New York.
    Background in Matrix Theory and Linear Algebra. Perturbation Theory and Error Analysis. The Tools of Spectral Approximation. Subspace Iteration. Krylov Subspace Methods.
    Acceleration Techniques and Hybrid Methods. Preconditioning Techniques. Non-Standard Eigenvalue Problems. Origins of Matrix Eigenvalue Problems.

F. Chatelin (1993). Eigenvalues of Matrices, John Wiley and Sons, New York.
    Supplements from Linear Algebra. Elements of Spectral Theory. Why Compute Eigenvalues? Error Analysis. Foundations of Methods for Computing Eigenvalues. Numerical Methods for Large Matrices. Chebyshev's Iterative Methods.

High Performance

W. Schönauer (1987). Scientific Computing on Vector Computers, North Holland, Amsterdam.
    Introduction. The First Commercially Significant Vector Computer. The Arithmetic Performance of the First Commercially Significant Vector Computer. Hockney's n_{1/2} and Timing Formulae. Fortran and Autovectorization. Behavior of Programs. Some Basic Algorithms. Recurrences. Matrix Operations. Systems of Linear Equations with Full Matrices. Tridiagonal Linear Systems. The Iterative Solution of Linear Equations. Special Applications. The Fujitsu VPs and Other Japanese Vector Computers. The Cray-2. The IBM VF and Other Vector Processors. The Convex C1.

R.W. Hockney and C.R. Jesshope (1988). Parallel Computers 2, Adam Hilger, Bristol and Philadelphia.
    Introduction. Pipelined Computers. Processor Arrays. Parallel Languages. Parallel Algorithms. Future Developments.

J.J. Modi (1988). Parallel Algorithms and Matrix Computation, Oxford University Press, Oxford.
    General Principles of Parallel Computing. Parallel Techniques and Algorithms. Parallel Sorting Algorithms. Solution of a System of Linear Algebraic Equations. The Symmetric Eigenvalue Problem: Jacobi's Method. QR Factorization. Singular Value Decomposition and Related Problems.

J. Ortega (1988). Introduction to Parallel and Vector Solution of Linear Systems, Plenum Press, New York.
    Introduction. Direct Methods for Linear Equations. Iterative Methods for Linear Equations.

J. Dongarra, I. Duff, D. Sorensen, and H. van der Vorst (1990). Solving Linear Systems on Vector and Shared Memory Computers, SIAM Publications, Philadelphia, PA.
    Vector and Parallel Processing. Overview of Current High-Performance Computers. Implementation Details and Overhead. Performance Analysis, Modeling, and Measurement. Building Blocks in Linear Algebra. Direct Solution of Sparse Linear Systems. Iterative Solution of Sparse Linear Systems.

Y. Robert (1990). The Impact of Vector and Parallel Architectures on the Gaussian Elimination Algorithm, Halsted Press, New York.
    Introduction. Vector and Parallel Architectures. Vector Multiprocessor Computing. Hypercube Computing. Systolic Computing. Task Graph Scheduling. Analysis of Distributed Algorithms. Design Methodologies.

G.H. Golub and J.M. Ortega (1993). Scientific Computing: An Introduction with Parallel Computing, Academic Press, Boston.
    The World of Scientific Computing. Linear Algebra. Parallel and Vector Computing. Polynomial Approximation. Continuous Problems Solved Discretely. Direct Solution of Linear Equations. Parallel Direct Methods. Iterative Methods. Conjugate Gradient-Type Methods.

Edited Volumes

D.J. Rose and R.A. Willoughby, eds. (1972). Sparse Matrices and Their Applications, Plenum Press, New York.

J.R. Bunch and D.J. Rose, eds. (1976). Sparse Matrix Computations, Academic Press, New York.

I.S. Duff and G.W. Stewart, eds. (1979). Sparse Matrix Proceedings, 1978, SIAM Publications, Philadelphia, PA.

I.S. Duff, ed. (1981). Sparse Matrices and Their Uses, Academic Press, New York.
Å. Björck, R.J. Plemmons, and H. Schneider, eds. (1981). Large-Scale Matrix Problems, North-Holland, New York.

G. Rodrigue, ed. (1982). Parallel Computation, Academic Press, New York.

B. Kågström and A. Ruhe, eds. (1983). Matrix Pencils, Proc. Pite Havsbad, 1982, Lecture Notes in Mathematics 973, Springer-Verlag, New York and Berlin.

J. Cullum and R.A. Willoughby, eds. (1986). Large Scale Eigenvalue Problems, North-Holland, Amsterdam.

A. Wouk, ed. (1986). New Computing Environments: Parallel, Vector, and Systolic, SIAM Publications, Philadelphia, PA.

M.T. Heath, ed. (1986). Proceedings of First SIAM Conference on Hypercube Multiprocessors, SIAM Publications, Philadelphia, PA.

M.T. Heath, ed. (1987). Hypercube Multiprocessors, SIAM Publications, Philadelphia, PA.

G. Fox, ed. (1988). The Third Conference on Hypercube Concurrent Computers and Applications, Vol. II - Applications, ACM Press, New York.

M.H. Schultz, ed. (1988). Numerical Algorithms for Modern Parallel Computer Architectures, IMA Volumes in Mathematics and Its Applications, Number 13, Springer-Verlag, Berlin.

E.F. Deprettere, ed. (1988). SVD and Signal Processing, Elsevier, Amsterdam.

B.N. Datta, C.R. Johnson, M.A. Kaashoek, R. Plemmons, and E.D. Sontag, eds. (1988). Linear Algebra in Signals, Systems, and Control, SIAM Publications, Philadelphia, PA.

J. Dongarra, I. Duff, P. Gaffney, and S. McKee, eds. (1989). Vector and Parallel Computing, Ellis Horwood, Chichester, England.

O. Axelsson, ed. (1989). "Preconditioned Conjugate Gradient Methods," BIT 29.

K. Gallivan, M. Heath, E. Ng, J. Ortega, B. Peyton, R. Plemmons, C. Romine, A. Sameh, and R. Voigt (1990). Parallel Algorithms for Matrix Computations, SIAM Publications, Philadelphia, PA.

G.H. Golub and P. Van Dooren, eds. (1991). Numerical Linear Algebra, Digital Signal Processing, and Parallel Algorithms, Springer-Verlag, Berlin.

R. Vaccaro, ed. (1991). SVD and Signal Processing II: Algorithms, Analysis, and Applications, Elsevier, Amsterdam.

R. Beauwens and P. de Groen, eds. (1992). Iterative Methods in Linear Algebra, Elsevier (North-Holland), Amsterdam.

R.J. Plemmons and C.D. Meyer, eds. (1993). Linear Algebra, Markov Chains, and Queueing Models, Springer-Verlag, New York.

M.S. Moonen, G.H. Golub, and B.L.R. de Moor, eds. (1993). Linear Algebra for Large Scale and Real-Time Applications, Kluwer, Dordrecht, The Netherlands.

J.D. Brown, M.T. Chu, D.C. Ellison, and R.J. Plemmons, eds. (1994). Proceedings of the Cornelius Lanczos International Centenary Conference, SIAM Publications, Philadelphia, PA.

R.V. Patel, A.J. Laub, and P.M. Van Dooren, eds. (1994). Numerical Linear Algebra Techniques for Systems and Control, IEEE Press, Piscataway, New Jersey.

J. Lewis, ed. (1994). Proceedings of the Fifth SIAM Conference on Applied Linear Algebra, SIAM Publications, Philadelphia, PA.

A. Bojanczyk and G. Cybenko, eds. (1995). Linear Algebra for Signal Processing, IMA Volumes in Mathematics and Its Applications, Springer-Verlag, New York.

M. Moonen and B. De Moor, eds. (1995). SVD and Signal Processing III: Algorithms, Analysis, and Applications, Elsevier, Amsterdam.

Matrix Computations

Chapter 1
Matrix Multiplication Problems

§1.1 Basic Algorithms and Notation
§1.2 Exploiting Structure
§1.3 Block Matrices and Algorithms
§1.4 Vectorization and Re-Use Issues

The proper study of matrix computations begins with the study of the matrix-matrix multiplication problem. Although this problem is simple mathematically, it is very rich from the computational point of view.
We begin in §1.1 by looking at the several ways that the matrix multiplication problem can be organized. The "language" of partitioned matrices is established and used to characterize several linear algebraic "levels" of computation.

If a matrix has structure, then it is usually possible to exploit it. For example, a symmetric matrix can be stored in half the space as a general matrix. A matrix-vector product that involves a matrix with many zero entries may require much less time to execute than a full matrix times a vector. These matters are discussed in §1.2.

In §1.3 block matrix notation is established. A block matrix is a matrix with matrix entries. This concept is very important from the standpoint of both theory and practice. On the theoretical side, block matrix notation allows us to prove important matrix factorizations very succinctly. These factorizations are the cornerstone of numerical linear algebra. From the computational point of view, block algorithms are important because they are rich in matrix multiplication, the operation of choice for many new high performance computer architectures.

These new architectures require the algorithm designer to pay as much attention to memory traffic as to the actual amount of arithmetic. This aspect of scientific computation is illustrated in §1.4 where the critical issues of vector pipeline computing are discussed: stride, vector length, the number of vector loads and stores, and the level of vector re-use.

Before You Begin

It is important to be familiar with the MATLAB language. See the texts by Pratap (1995) and Van Loan (1997). A richer introduction to high performance matrix computations is given in Dongarra, Duff, Sorensen, and van der Vorst (1991). This chapter's LAPACK connections include

LAPACK: Some General Operations
    _SCAL    x = ax                          Vector scale
    _DOT     c = x^T y                       Dot product
    _AXPY    y = ax + y                      Saxpy
    _GEMV    y = αAx + βy                    Matrix-vector multiplication
    _GER     A = αxy^T + A                   Rank-1 update
    _GEMM    C = αAB + βC                    Matrix multiplication

LAPACK: Some Symmetric Operations
    _SYMV    y = αAx + βy                    Matrix-vector multiplication
    _SPMV    y = αAx + βy                    Matrix-vector multiplication (packed)
    _SYR     A = αxx^T + A                   Rank-1 update
    _SYR2    A = αxy^T + αyx^T + A           Rank-2 update
    _SYRK    C = αAA^T + βC                  Rank-k update
    _SYR2K   C = αAB^T + αBA^T + βC          Rank-2k update
    _SYMM    C = αAB + βC or αBA + βC        Symmetric/general product

LAPACK: Some Band/Triangular Operations
    _GBMV    y = αAx + βy                    General band
    _SBMV    y = αAx + βy                    Symmetric band
    _TRMV    x = Ax                          Triangular
    _TPMV    x = Ax                          Triangular packed
    _TRMM    B = αAB (or αBA)                Triangular/general product

1.1 Basic Algorithms and Notation

Matrix computations are built upon a hierarchy of linear algebraic operations. Dot products involve the scalar operations of addition and multiplication. Matrix-vector multiplication is made up of dot products. Matrix-matrix multiplication amounts to a collection of matrix-vector products. All of these operations can be described in algorithmic form or in the language of linear algebra. Our primary objective in this section is to show how these two styles of expression complement each other. Along the way we pick up notation and acquaint the reader with the kind of thinking that underpins the matrix computation area. The discussion revolves around the matrix multiplication problem, a computation that can be organized in several ways.
1.1.1 Matrix Notation

Let R denote the set of real numbers. We denote the vector space of all m-by-n real matrices by R^{m×n}:

    A ∈ R^{m×n}  ⟺  A = (a_{ij}), a_{ij} ∈ R, i = 1:m, j = 1:n.

If a capital letter is used to denote a matrix (e.g. A, B, Δ), then the corresponding lower case letter with subscript ij refers to the (i,j) entry (e.g., a_{ij}, b_{ij}, δ_{ij}). As appropriate, we also use the notation [A]_{ij} and A(i,j) to designate the matrix elements.

1.1.2 Matrix Operations

Basic matrix operations include transposition (R^{m×n} → R^{n×m}),

    C = A^T  ⟹  c_{ij} = a_{ji},

addition (R^{m×n} × R^{m×n} → R^{m×n}),

    C = A + B  ⟹  c_{ij} = a_{ij} + b_{ij},

scalar-matrix multiplication (R × R^{m×n} → R^{m×n}),

    C = αA  ⟹  c_{ij} = αa_{ij},

and matrix-matrix multiplication (R^{m×p} × R^{p×n} → R^{m×n}),

    C = AB  ⟹  c_{ij} = Σ_{k=1}^{p} a_{ik}b_{kj}.

These are the building blocks of matrix computations.

1.1.3 Vector Notation

Let R^n denote the vector space of real n-vectors:

    x ∈ R^n  ⟺  x = (x_1, ..., x_n)^T, x_i ∈ R.

We refer to x_i as the ith component of x. Depending upon context, the alternative notations [x]_i and x(i) are sometimes used.

Notice that we are identifying R^n with R^{n×1} and so the members of R^n are column vectors. On the other hand, the elements of R^{1×n} are row vectors:

    x ∈ R^{1×n}  ⟺  x = (x_1, ..., x_n).

If x is a column vector, then y = x^T is a row vector.

1.1.4 Vector Operations

Assume a ∈ R, x ∈ R^n, and y ∈ R^n. Basic vector operations include scalar-vector multiplication,

    z = ax  ⟹  z_i = ax_i,

vector addition,

    z = x + y  ⟹  z_i = x_i + y_i,

the dot product (or inner product),

    c = x^T y  ⟹  c = Σ_{i=1}^{n} x_i y_i,

and vector multiply (or the Hadamard product),

    z = x .* y  ⟹  z_i = x_i y_i.

Another very important operation which we write in "update form" is the saxpy:

    y = ax + y  ⟹  y_i = ax_i + y_i.

Here, the symbol "=" is being used to denote assignment, not mathematical equality. The vector y is being updated. The name "saxpy" is used in LAPACK, a software package that implements many of the algorithms in this book. One can think of "saxpy" as a mnemonic for "scalar a x plus y."

1.1.5 The Computation of Dot Products and Saxpys

We have chosen to express algorithms in a stylized version of the MATLAB language. MATLAB is a powerful interactive system that is ideal for matrix computation work. We gradually introduce our stylized MATLAB notation in this chapter beginning with an algorithm for computing dot products.

Algorithm 1.1.1 (Dot Product) If x, y ∈ R^n, then this algorithm computes their dot product c = x^T y.

    c = 0
    for i = 1:n
        c = c + x(i)y(i)
    end

The dot product of two n-vectors involves n multiplications and n additions. It is an "O(n)" operation, meaning that the amount of work is linear in the dimension. The saxpy computation is also an O(n) operation, but it returns a vector instead of a scalar.

Algorithm 1.1.2 (Saxpy) If x, y ∈ R^n and a ∈ R, then this algorithm overwrites y with ax + y.

    for i = 1:n
        y(i) = ax(i) + y(i)
    end

It must be stressed that the algorithms in this book are encapsulations of critical computational ideas and not "production codes."
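The stylized loops above translate directly into executable MATLAB. The following minimal sketch renders Algorithms 1.1.1 and 1.1.2 as functions; the function names dotprod and saxpy are illustrative, not the text's, and in practice the vectorized expressions c = x'*y and y = a*x + y perform the same O(n) work.

    function c = dotprod(x, y)
    % Algorithm 1.1.1 rendered literally: c = x'*y via an explicit loop.
    n = length(x);
    c = 0;
    for i = 1:n
        c = c + x(i)*y(i);
    end
    end

    function y = saxpy(a, x, y)
    % Algorithm 1.1.2 rendered literally: overwrite y with a*x + y.
    n = length(x);
    for i = 1:n
        y(i) = a*x(i) + y(i);
    end
    end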
1.1.6 Matrix-Vector Multiplication and the Gaxpy

Suppose A ∈ R^{m×n} and that we wish to compute the update

    y = Ax + y

where x ∈ R^n and y ∈ R^m are given. This generalized saxpy operation is referred to as a gaxpy. A standard way that this computation proceeds is to update the components one at a time:

    y_i = Σ_{j=1}^{n} a_{ij}x_j + y_i,    i = 1:m.

This gives the following algorithm.

Algorithm 1.1.3 (Gaxpy: Row Version) If A ∈ R^{m×n}, x ∈ R^n, and y ∈ R^m, then this algorithm overwrites y with Ax + y.

    for i = 1:m
        for j = 1:n
            y(i) = A(i,j)x(j) + y(i)
        end
    end

An alternative algorithm results if we regard Ax as a linear combination of A's columns, e.g.,

    [1 2; 3 4; 5 6][7; 8] = 7[1; 3; 5] + 8[2; 4; 6] = [23; 53; 83].

Algorithm 1.1.4 (Gaxpy: Column Version) If A ∈ R^{m×n}, x ∈ R^n, and y ∈ R^m, then this algorithm overwrites y with Ax + y.

    for j = 1:n
        for i = 1:m
            y(i) = A(i,j)x(j) + y(i)
        end
    end

Note that the inner loop in either gaxpy algorithm carries out a saxpy operation. The column version was derived by rethinking what matrix-vector multiplication "means" at the vector level, but it could also have been obtained simply by interchanging the order of the loops in the row version. In matrix computations, it is important to relate loop interchanges to the underlying linear algebra.

1.1.7 Partitioning a Matrix into Rows and Columns

Algorithms 1.1.3 and 1.1.4 access the data in A by row and by column respectively. To highlight these orientations more clearly we introduce the language of partitioned matrices. From the row point of view, a matrix is a stack of row vectors:

    A ∈ R^{m×n}  ⟺  A = [ r_1^T ; ... ; r_m^T ],   r_k ∈ R^n.          (1.1.1)

This is called a row partition of A. Thus, if we row partition

    [1 2; 3 4; 5 6],

then we are choosing to think of A as a collection of rows with

    r_1^T = [1 2],   r_2^T = [3 4],   and   r_3^T = [5 6].

With the row partitioning (1.1.1) Algorithm 1.1.3 can be expressed as follows:

    for i = 1:m
        y(i) = r_i^T x + y(i)
    end

Alternatively, a matrix is a collection of column vectors:

    A ∈ R^{m×n}  ⟺  A = [c_1, ..., c_n],   c_k ∈ R^m.                  (1.1.2)

We refer to this as a column partition of A. In the 3-by-2 example above, we thus would set c_1 and c_2 to be the first and second columns of A respectively:

    c_1 = [1; 3; 5],   c_2 = [2; 4; 6].

With (1.1.2) we see that Algorithm 1.1.4 is a saxpy procedure that accesses A by columns:

    for j = 1:n
        y = x_j c_j + y
    end

In this context appreciate y as a running vector sum that undergoes repeated saxpy updates.

1.1.8 The Colon Notation

A handy way to specify a column or row of a matrix is with the "colon" notation. If A ∈ R^{m×n}, then A(k,:) designates the kth row, i.e.,

    A(k,:) = [a_{k1}, ..., a_{kn}].

The kth column is specified by

    A(:,k) = [a_{1k}; ... ; a_{mk}].

With these conventions we can rewrite Algorithms 1.1.3 and 1.1.4 as

    for i = 1:m
        y(i) = A(i,:)x + y(i)
    end

and

    for j = 1:n
        y = x(j)A(:,j) + y
    end

respectively. With the colon notation we are able to suppress iteration details. This frees us to think at the vector level and focus on larger computational issues.

1.1.9 The Outer Product Update

As a preliminary application of the colon notation, we use it to understand the outer product update

    A = A + xy^T,   A ∈ R^{m×n}, x ∈ R^m, y ∈ R^n.

The outer product operation xy^T "looks funny" but is perfectly legal, e.g.,

    [1; 2; 3][4 5] = [4 5; 8 10; 12 15].

This is because xy^T is the product of two "skinny" matrices and the number of columns in the left matrix x equals the number of rows in the right matrix y^T. The entries in the outer product update are prescribed by

    for i = 1:m
        for j = 1:n
            a_{ij} = a_{ij} + x_i y_j
        end
    end

The mission of the j loop is to add a multiple of y^T to the ith row of A, i.e.,

    for i = 1:m
        A(i,:) = A(i,:) + x(i)y^T
    end

On the other hand, if we make the i-loop the inner loop, then its task is to add a multiple of x to the jth column of A:

    for j = 1:n
        A(:,j) = A(:,j) + y(j)x
    end

Note that both outer product algorithms amount to a set of saxpy updates.
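As a concrete rendering of Algorithms 1.1.3 and 1.1.4 in colon notation, here is a sketch of both gaxpy orientations as MATLAB functions; the names gaxpy_row and gaxpy_col are illustrative. The row version is built from dot products, the column version from saxpy updates, and both overwrite y with Ax + y.

    function y = gaxpy_row(A, x, y)
    % Row-oriented gaxpy (Algorithm 1.1.3): each y(i) is updated by a
    % dot product with the i-th row of A.
    m = size(A, 1);
    for i = 1:m
        y(i) = A(i,:)*x + y(i);
    end
    end

    function y = gaxpy_col(A, x, y)
    % Column-oriented gaxpy (Algorithm 1.1.4): one saxpy per column of A.
    n = size(A, 2);
    for j = 1:n
        y = A(:,j)*x(j) + y;
    end
    end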
1.1.10 Matrix-Matrix Multiplication

Consider the 2-by-2 matrix-matrix multiplication AB. In the dot product formulation each entry is computed as a dot product:

    [1 2; 3 4][5 6; 7 8] = [1·5 + 2·7   1·6 + 2·8 ; 3·5 + 4·7   3·6 + 4·8].

In the saxpy version each column in the product is regarded as a linear combination of columns of A:

    [1 2; 3 4][5 6; 7 8] = [ 5[1; 3] + 7[2; 4],  6[1; 3] + 8[2; 4] ].

Finally, in the outer product version the result is regarded as a sum of outer products:

    [1 2; 3 4][5 6; 7 8] = [1; 3][5 6] + [2; 4][7 8].

Although equivalent mathematically, it turns out that these versions of matrix multiplication can have very different levels of performance because of their memory traffic properties. This matter is pursued in §1.4. For now, it is worth detailing the various approaches because it gives us a chance to review notation and to practice thinking at different linear algebraic levels.

1.1.11 Scalar-Level Specifications

To fix the discussion we focus on the following matrix multiplication update:

    C = AB + C,    A ∈ R^{m×p}, B ∈ R^{p×n}, C ∈ R^{m×n}.

The starting point is the familiar triply-nested loop algorithm:

Algorithm 1.1.5 (Matrix Multiplication: ijk Variant) If A ∈ R^{m×p}, B ∈ R^{p×n}, and C ∈ R^{m×n} are given, then this algorithm overwrites C with AB + C.

    for i = 1:m
        for j = 1:n
            for k = 1:p
                C(i,j) = A(i,k)B(k,j) + C(i,j)
            end
        end
    end

This is the "ijk variant" because we identify the rows of C (and A) with i, the columns of C (and B) with j, and the summation index with k. We consider the update C = AB + C instead of just C = AB for two reasons. We do not have to bother with C = 0 initializations and updates of the form C = AB + C arise more frequently in practice.

The three loops in the matrix multiplication update can be arbitrarily ordered giving 3! = 6 variations. Thus,

    for j = 1:n
        for k = 1:p
            for i = 1:m
                C(i,j) = A(i,k)B(k,j) + C(i,j)
            end
        end
    end

is the jki variant. Each of the six possibilities (ijk, jik, ikj, jki, kij, kji) features an inner loop operation (dot product or saxpy) and has its own pattern of data flow. For example, in the ijk variant, the inner loop oversees a dot product that requires access to a row of A and a column of B. The jki variant involves a saxpy that requires access to a column of C and a column of A. These attributes are summarized in Table 1.1.1 along with an interpretation of what is going on when the middle and inner loops are considered together.

    Loop    Inner    Middle                  Inner Loop
    Order   Loop     Loop                    Data Access
    ijk     dot      vector x matrix         A by row, B by column
    jik     dot      matrix x vector         A by row, B by column
    ikj     saxpy    row gaxpy               B by row, C by row
    jki     saxpy    column gaxpy            A by column, C by column
    kij     saxpy    row outer product       B by row, C by row
    kji     saxpy    column outer product    A by column, C by column

    TABLE 1.1.1  Matrix Multiplication: Loop Orderings and Properties

Each variant involves the same amount of floating point arithmetic, but accesses the A, B, and C data differently.

1.1.12 A Dot Product Formulation

The usual matrix multiplication procedure regards AB as an array of dot products to be computed one at a time in left-to-right, top-to-bottom order. This is the idea behind Algorithm 1.1.5. Using the colon notation we can highlight this dot-product formulation:

Algorithm 1.1.6 (Matrix Multiplication: Dot Product Version) If A ∈ R^{m×p}, B ∈ R^{p×n}, and C ∈ R^{m×n} are given, then this algorithm overwrites C with AB + C.

    for i = 1:m
        for j = 1:n
            C(i,j) = A(i,:)B(:,j) + C(i,j)
        end
    end

In the language of partitioned matrices, if

    A = [ a_1^T ; ... ; a_m^T ],   a_i ∈ R^p

and

    B = [b_1, ..., b_n],   b_j ∈ R^p,

then Algorithm 1.1.6 has this interpretation:

    for i = 1:m
        for j = 1:n
            c_{ij} = a_i^T b_j + c_{ij}
        end
    end

Note that the "mission" of the j-loop is to compute the ith row of the update. To emphasize this we could write

    for i = 1:m
        c_i^T = a_i^T B + c_i^T
    end

where

    C = [ c_1^T ; ... ; c_m^T ]

is a row partitioning of C. To say the same thing with the colon notation we write

    for i = 1:m
        C(i,:) = A(i,:)B + C(i,:)
    end

Either way we see that the inner two loops of the ijk variant define a row-oriented gaxpy operation.
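To make the loop-ordering discussion concrete, here is a minimal MATLAB sketch of the ijk and jki variants; the function names are illustrative. Both overwrite C with AB + C and perform identical arithmetic, differing only in how the data in A, B, and C is accessed.

    function C = matmul_ijk(A, B, C)
    % ijk variant (Algorithm 1.1.5): the inner loop is a dot product.
    [m, p] = size(A);
    n = size(B, 2);
    for i = 1:m
        for j = 1:n
            for k = 1:p
                C(i,j) = A(i,k)*B(k,j) + C(i,j);
            end
        end
    end
    end

    function C = matmul_jki(A, B, C)
    % jki variant: the inner two loops form a column gaxpy,
    % C(:,j) = A*B(:,j) + C(:,j).
    [m, p] = size(A);
    n = size(B, 2);
    for j = 1:n
        for k = 1:p
            for i = 1:m
                C(i,j) = A(i,k)*B(k,j) + C(i,j);
            end
        end
    end
    end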
1.1.13 A Saxpy Formulation

Suppose A and C are column-partitioned as follows:

    A = [a_1, ..., a_p],   a_k ∈ R^m,
    C = [c_1, ..., c_n],   c_j ∈ R^m.

By comparing jth columns in C = AB + C we see that

    c_j = Σ_{k=1}^{p} b_{kj}a_k + c_j,    j = 1:n.

These vector sums can be put together with a sequence of saxpy updates.

Algorithm 1.1.7 (Matrix Multiplication: Saxpy Version) If the matrices A ∈ R^{m×p}, B ∈ R^{p×n}, and C ∈ R^{m×n} are given, then this algorithm overwrites C with AB + C.

    for j = 1:n
        for k = 1:p
            C(:,j) = A(:,k)B(k,j) + C(:,j)
        end
    end

Note that the k-loop oversees a gaxpy operation:

    for j = 1:n
        C(:,j) = AB(:,j) + C(:,j)
    end

1.1.14 An Outer Product Formulation

Consider the kji variant of Algorithm 1.1.5:

    for k = 1:p
        for j = 1:n
            for i = 1:m
                C(i,j) = A(i,k)B(k,j) + C(i,j)
            end
        end
    end

The inner two loops oversee the outer product update

    C = a_k b_k^T + C

where

    A = [a_1, ..., a_p]   and   B = [ b_1^T ; ... ; b_p^T ]              (1.1.3)

with a_k ∈ R^m and b_k ∈ R^n. We therefore obtain

Algorithm 1.1.8 (Matrix Multiplication: Outer Product Version) If A ∈ R^{m×p}, B ∈ R^{p×n}, and C ∈ R^{m×n} are given, then this algorithm overwrites C with AB + C.

    for k = 1:p
        C = A(:,k)B(k,:) + C
    end

This implementation revolves around the fact that AB is the sum of p outer products.

1.1.15 The Notion of "Level"

The dot product and saxpy operations are examples of "level-1" operations. Level-1 operations involve an amount of data and an amount of arithmetic that is linear in the dimension of the operation. An m-by-n outer product update or gaxpy operation involves a quadratic amount of data (O(mn)) and a quadratic amount of work (O(mn)). They are examples of "level-2" operations.

The matrix update C = AB + C is a "level-3" operation. Level-3 operations involve a quadratic amount of data and a cubic amount of work. If A, B, and C are n-by-n matrices, then C = AB + C involves O(n^2) matrix entries and O(n^3) arithmetic operations.

The design of matrix algorithms that are rich in high-level linear algebra operations is a recurring theme in the book. For example, a high performance linear equation solver may require a level-3 organization of Gaussian elimination. This requires some algorithmic rethinking because that method is usually specified in level-1 terms, e.g., "multiply row 1 by a scalar and add the result to row 2."

1.1.16 A Note on Matrix Equations

In striving to understand matrix multiplication via outer products, we essentially established the matrix equation

    AB = Σ_{k=1}^{p} a_k b_k^T,

where the a_k and b_k are defined by the partitionings in (1.1.3).

Numerous matrix equations are developed in subsequent chapters. Sometimes they are established algorithmically like the above outer product expansion and other times they are proved at the ij-component level. As an example of the latter, we prove an important result that characterizes transposes of products.

Theorem 1.1.1 If A ∈ R^{m×p} and B ∈ R^{p×n}, then (AB)^T = B^T A^T.

Proof. If C = (AB)^T, then

    c_{ij} = [(AB)^T]_{ij} = [AB]_{ji} = Σ_{k=1}^{p} a_{jk}b_{ki}.

On the other hand, if D = B^T A^T, then

    d_{ij} = [B^T A^T]_{ij} = Σ_{k=1}^{p} [B^T]_{ik}[A^T]_{kj} = Σ_{k=1}^{p} b_{ki}a_{jk}.

Since c_{ij} = d_{ij} for all i and j, it follows that C = D.  □

Scalar-level proofs such as this one are usually not very insightful. However, they are sometimes the only way to proceed.
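Here is a minimal MATLAB sketch of the outer product version (Algorithm 1.1.8); the function name is illustrative. Each pass through the loop is a level-2 rank-1 update A(:,k)B(k,:), and the whole computation accumulates the expansion AB = Σ_k a_k b_k^T noted above.

    function C = matmul_outer(A, B, C)
    % Outer product version (Algorithm 1.1.8): C = AB + C built up
    % as a sum of p rank-1 updates.
    p = size(A, 2);
    for k = 1:p
        C = A(:,k)*B(k,:) + C;
    end
    end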
1.1.17 Complex Matrices

From time to time computations that involve complex matrices are discussed. The vector space of m-by-n complex matrices is designated by C^{m×n}. The scaling, addition, and multiplication of complex matrices correspond exactly to the real case. However, transposition becomes conjugate transposition:

    C = A^H  ⟹  c_{ij} = ā_{ji}.

The vector space of complex n-vectors is designated by C^n. The dot product of complex n-vectors x and y is prescribed by

    s = x^H y = Σ_{i=1}^{n} x̄_i y_i.

Finally, if A = B + iC ∈ C^{m×n}, then we designate the real and imaginary parts of A by Re(A) = B and Im(A) = C respectively.

Problems

P1.1.1 Suppose A ∈ R^{n×n} and x ∈ R^r are given. Give a saxpy algorithm for computing the first column of M = (A − x_1 I)···(A − x_r I).

P1.1.2 In the conventional 2-by-2 matrix multiplication C = AB, there are eight multiplications: a_{11}b_{11}, a_{11}b_{12}, a_{21}b_{11}, a_{21}b_{12}, a_{12}b_{21}, a_{12}b_{22}, a_{22}b_{21}, and a_{22}b_{22}. Make a table that indicates the order that these multiplications are performed for the ijk, jik, kij, ikj, jki, and kji matrix multiply algorithms.

P1.1.3 Give an algorithm for computing C = (xy^T)^k where x and y are n-vectors.

P1.1.4 Specify an algorithm for computing (XY^T)^k where X, Y ∈ R^{n×p}.

P1.1.5 Formulate an outer product algorithm for the update C = AB^T + C where A ∈ R^{m×r}, B ∈ R^{n×r}, and C ∈ R^{m×n}.

P1.1.6 Suppose we have real n-by-n matrices C, D, E, and F. Show how to compute real n-by-n matrices A and B with just three real n-by-n matrix multiplications so that A + iB = (C + iD)(E + iF). Hint: Compute W = (C + D)(E − F).

Notes and References for Sec. 1.1

It must be stressed that the development of quality software from any of our "semi-formal" algorithmic presentations is a long and arduous task. Even the implementation of the Level-1, 2, and 3 BLAS requires care:

C.L. Lawson, R.J. Hanson, D.R. Kincaid, and F.T. Krogh (1979). "Basic Linear Algebra Subprograms for FORTRAN Usage," ACM Trans. Math. Soft. 5, 308-323.

C.L. Lawson, R.J. Hanson, D.R. Kincaid, and F.T. Krogh (1979). "Algorithm 539: Basic Linear Algebra Subprograms for FORTRAN Usage," ACM Trans. Math. Soft. 5, 324-325.

J.J. Dongarra, J. Du Croz, S. Hammarling, and R.J. Hanson (1988). "An Extended Set of Fortran Basic Linear Algebra Subprograms," ACM Trans. Math. Soft. 14, 1-17.

J.J. Dongarra, J. Du Croz, S. Hammarling, and R.J. Hanson (1988). "Algorithm 656: An Extended Set of Fortran Basic Linear Algebra Subprograms: Model Implementation and Test Programs," ACM Trans. Math. Soft. 14, 18-32.

J.J. Dongarra, J. Du Croz, I.S. Duff, and S.J. Hammarling (1990). "A Set of Level 3 Basic Linear Algebra Subprograms," ACM Trans. Math. Soft. 16, 1-17.

J.J. Dongarra, J. Du Croz, I.S. Duff, and S.J. Hammarling (1990). "Algorithm 679: A Set of Level 3 Basic Linear Algebra Subprograms: Model Implementation and Test Programs," ACM Trans. Math. Soft. 16, 18-28.

Other BLAS references include

B. Kågström, P. Ling, and C. Van Loan (1991). "High-Performance Level-3 BLAS: Sample Routines for Double Precision Real Data," in High Performance Computing II, M. Durand and F. El Dabaghi (eds), North-Holland, 250-281.

B. Kågström, P. Ling, and C. Van Loan (1995). "GEMM-Based Level-3 BLAS: High-Performance Model Implementations and Performance Evaluation Benchmark," in Parallel Programming and Applications, P. Fritzson and L. Finmo (eds), IOS Press.

For an appreciation of the subtleties associated with software development we recommend

J.R. Rice (1981). Matrix Computations and Mathematical Software, Academic Press, New York.

and a browse through the LAPACK manual.

1.2 Exploiting Structure

The efficiency of a given matrix algorithm depends on many things. Most obvious, and what we treat in this section, is the amount of required arithmetic and storage. We continue to use matrix-vector and matrix-matrix multiplication as a vehicle for introducing the key ideas. As examples of exploitable structure we have chosen the properties of bandedness and symmetry. Band matrices have many zero entries and so it is no surprise that band matrix manipulation allows for many arithmetic and storage shortcuts. Arithmetic complexity and data structures are discussed in this context.

Symmetric matrices provide another set of examples that can be used to illustrate structure exploitation. Symmetric linear systems and eigenvalue problems have a very prominent role to play in matrix computations and so it is important to be familiar with their manipulation.

1.2.1 Band Matrices and the ×-0 Notation

We say that A ∈ R^{m×n} has lower bandwidth p if a_{ij} = 0 whenever i > j + p and upper bandwidth q if j > i + q implies a_{ij} = 0. Here is an example of an 8-by-5 matrix that has lower bandwidth 1 and upper bandwidth 2:

    × × × 0 0
    × × × × 0
    0 × × × ×
    0 0 × × ×
    0 0 0 × ×
    0 0 0 0 ×
    0 0 0 0 0
    0 0 0 0 0

The ×'s designate arbitrary nonzero entries. This notation is handy to indicate the zero-nonzero structure of a matrix and we use it extensively. Band structures that occur frequently are tabulated in Table 1.2.1.

    Type of Matrix      Lower Bandwidth    Upper Bandwidth
    diagonal                  0                  0
    upper triangular          0                 n−1
    lower triangular         m−1                 0
    tridiagonal               1                  1
    upper bidiagonal          0                  1
    lower bidiagonal          1                  0
    upper Hessenberg          1                 n−1
    lower Hessenberg         m−1                 1

    TABLE 1.2.1  Band Terminology for m-by-n Matrices

1.2.2 Diagonal Matrix Manipulation

Matrices with upper and lower bandwidth zero are diagonal. If D ∈ R^{m×n} is diagonal, then

    D = diag(d_1, ..., d_q),  q = min{m, n}  ⟺  d_i = d_{ii}.

If D is diagonal and A is a matrix, then DA is a row scaling of A and AD is a column scaling of A.

1.2.3 Triangular Matrix Multiplication

To introduce band matrix "thinking" we look at the matrix multiplication problem C = AB when A and B are both n-by-n and upper triangular. The 3-by-3 case is illuminating:

    C = [ a_{11}b_{11}   a_{11}b_{12} + a_{12}b_{22}   a_{11}b_{13} + a_{12}b_{23} + a_{13}b_{33}
              0                 a_{22}b_{22}                 a_{22}b_{23} + a_{23}b_{33}
              0                      0                              a_{33}b_{33}          ]

It suggests that the product is upper triangular and that its upper triangular entries are the result of abbreviated inner products. Indeed, since a_{ik}b_{kj} = 0 whenever k < i or j < k, we have

    c_{ij} = Σ_{k=i}^{j} a_{ik}b_{kj}

for all i ≤ j.
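A minimal MATLAB sketch of this structured multiply follows, assuming both factors are upper triangular n-by-n matrices; the function name trimul is illustrative. Only the entries with i ≤ j are touched, and each is formed by the abbreviated inner product just derived.

    function C = trimul(A, B)
    % C = A*B for upper triangular A, B in R^{n x n}.
    % c_{ij} = sum_{k=i}^{j} a_{ik}*b_{kj}, so only i <= j entries are computed.
    n = size(A, 1);
    C = zeros(n, n);
    for i = 1:n
        for j = i:n
            for k = i:j
                C(i,j) = C(i,j) + A(i,k)*B(k,j);
            end
        end
    end
    end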
Band metrices have many zero entries and so it is no surprise that band matrix manipulation allows for many arithmetic and storage short- cuts. Arithmetic complexity and data structures are discussed in this con- text. ‘Symmetric matrices provide another set of examples that can be used to illustrate structure exploitation, Symmetric inear systems and eigenvalue problems have a very prominent role to play in matrix computations and 0 it is important to be familiar with their manipulation, 1.2.1 Band Matrices and the x-0 Notation ‘We say that A ¢ IR™" has lower bandwidth p if ais = 0 whenever i > j +p. and upper bandwidth q if j > i+q implies ay =O. Here is an example of an 8-by-5 matrix that has lower bandwidth 1 and upper bandwidth 2: ‘The x's designates arbitrary nonzero entries. This notation is handy to Indicate the zero-nonzero structure of a matrix and we use it extensively, Band structures that occur frequently are tabulated in Table 1.2.1. 1.2.2 Diagonal Matrix Manipulation Matrices with upper and lower bandwidth zero are diagonal. If D ¢ ™" is diagonal, then D=ding(y, If D is diagonal and A is a matrix, then DA is a row scaling of A and AD is 0 colurnn scaling of A. d,), q=minfmn} ed dy = dy 1.2. Exptormne Srrucrure Type cof Matrix Giagonal upper triangular lower triangular | m—-1 tridiagonal 1 ‘upper bidiagonal 0 lower bidiagonal 1 1 upper Hessenberg lower Hessenberg | _m. ‘TABLE 1.2.1. Band Terminology for m-by-n Matrices 1.2.8 Triangular Matrix Multiplication ‘To introduce band matrix “thinking” we look at the matrix multiplication problem C = AB when A and B are both n-by-n and upper triangular ‘The 3-by-3 case is illuminating: aub: anbia+ aiabe2 ayibys + arabay + ayabas c= 0 ambra azabaa + aasbsy 0 o assbss It suggests that the product is upper triangular and that its upper trian- gular entries are the result of abbreviated inner products. Indeed, since ‘Oubyy =O whenever k < ior j j, then digng = Adiag(i + nk k(k-1)/2) (20) (123) Some notation simplifies the discussion of bow to use this data structure in ‘a matrix-vector multiplication. If AE R™", then let D(A, k) € R"*" designate the kth diagonal of A 1s follows: sith, I & Using thin data structure write & rmaicix-vector multiply function that computes Raz) aod Im() from Rafe) and T(z) so that ¢ = Az PLL2.6 Suppoon X ¢ FO*P and AG FOX", with A symmetric and stored by diagonal. ‘Give an algorithm that computen ¥ = XTAX and storm the result by diagonal Use separate arrays for A and Y. PL2.7 Suppose a RY ia given and that AG RO** han the property that ayy = ‘4-s}41- Give am algorithm that overwrites y with Az-+y where 2,9 € RP are given, PL28 Suppose a¢ RP is given and that A R** has the property that oxy = 44(:¢5-1) mod np44- Gi ae algorithm that overwrite y with Ax +y where 2,y € RO we given, 1.2.9 Develop 2 compact storeby-diagonal acheme for unsymunetric band matrices and write the comerpanding guepy algorithm. 1.2.10 Suppose p and qaren-vectom and that A = (oi) le defined by 3 = a34 = Pty for L$ ¢ AarByp + Cas iN. i If we organize a matrix multiplication procedure around this summation, then we obtain 6 block analog of Algorithm 1.1.5: fora=1:N i= (a- ttt hae for f= 1:N ja G-we+upe (1.33) for y=1:N k=(y-Metiat a oh, i, k)B(E,3) + C3) end LN, B= end Note that if € = 1, then and 7 = k and we revert to Algorithm 11s. ‘To obtain a block saxpy matrix multiply, we write C= AB +C as Bu Bw [a.. 
Ler}at ote | bom oF Jpvnnce Bu + Bw vthere dg, Ca RO, and Bag € RO, From this we obtain 30 CHAPTER 1, Marrux MULTIPLICATION PROBLEMS: for B= 1: f= (0-104 tpt for a= 1:N (a-1)e + Liat (134) Oli 3) = AG IBA) + C9) end end ‘This is the block version of Algorithm 1.1.7. ‘A block outer product scheme results if we work with the blocking A (Ate A »-(2] BE where A,B, €R™*, From Lemma 1.3.2 we have c= Saat +e and 80 for y=: ke (y-Nerint BEE, +O (135) end This is the block version of Algorithm 1.1.8. 1.3.6 Complex Matrix Multiplication Consider the complex matrix multiplication update CL iy = (As + iAa)(Bs + iBe) + (C1 + Cr) where all the matrices are real and # imaginary parts we find CL = AB ~ ABs + Cr Cr = ABr+ AaB + C2 - Comparing the real and and this can be expressed as follows: [a]-[4 “2 ][2]+(2] 1.3. BLOCK MATRICES AND ALGORITHMS a ‘This suggests how real matrix software might be applied to solve complex matrix problems. The only snag is that the explicit formation of requires the “double storage” of the matrices Ay and Ay. 1.3.7 A Divide and Conquer Matrix Multiplication ‘We conclude this section with a completely different spproach to the matrix- matrix multiplication problem. The starting point in the discussion is the ‘2-by-2 block matrix multiplication [& ¢ Ce a |= [a An | [ Bu | Cu An An || Bn Ba where each block is square. In the ordinary algorithm, Cy = AaBiy + AcgBay. There ore 8 multiplies and 4 adds. Strassen (1963) has shown how to compute C with just 7 multiplies and 18 adds: PL = (Au+An)(Bu + Bn) Ppoo= (Ant An)Bu B Au (Br2 ~ Bra) Py An(Bu — Bi) P= (Ant Au)Ba Pe = (An ~ Au)(Bu + B12) Py (Ara — An)(Bn + Bra) Cu = Ath Bath Cu = Path Cu = P+ Cn = Pi+R-Pt Pe “These equations are easily confirmed by substitution. Suppose n= 2m so that the blocks are m-by-m. Counting adds and multiplies in the compu- tation C = AB we find that conventional matrix multiplication involves (2m)? raultiplies and (2m)5 - (2mm)? adds. In contrast, if Stressen’s al- ‘gorithm is applied with conventional multiplication at the block level, then ‘Tm? multiplies and 7m? + Lim? adds are required. If m >> 1, then the Strassen method involves about 7/8ths the arithmetic of the fully conven tional algorithm. Now recognize that we can recur on the Strassen idea. In particular, we can apply the Strassen algorithm to each of the half-sized block multiplica- tions associated with the A. Thus, ifthe original A and B are n-by-n and 1 = 2, then we can repeatedly apply the Strassen multiplication algorithm. At the bottom “level,” the blocks are 1-by-1. Of course, there is no need to 32 CHAPTER 1. MaTRIx MULTIPLICATION PROBLEMS recur down to the n= 1 level. When the block size gets sufficiently small, ( < ryin), it may be sensible to use conventional matrix multiplication when finding the P, . Here is the overall procedure: Algorithm 1.3.1 (Strassen Multiplication) Suppose n = 2" and that ACR" and Be R™™. If tin = 2 with d 1, then it sufices to count multiplications ‘as the number of additions is roughly the same. If we just count the mul- tiplicatione, then it cuffices to examine the deepest level of the recursion ‘as that is where all the multiplications occur. In strass there are q ~ d subdivisions and thus, 7°~¢ conventional matrix-matrix multiplications to perform. These multiplications have size tmin and thus strass involves about s = (24)°7t~* multiplications compared to ¢ = (2*)°, the number of multiplications in the conventional approach. 
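As an illustration of Algorithm 1.3.1, the following NumPy sketch applies the seven Strassen products recursively until the blocks are of size nmin or smaller, at which point conventional multiplication takes over. This is a sketch only: the function name, the power-of-two restriction, and the default crossover nmin = 64 are assumptions made for this example, not part of the text.

    import numpy as np

    def strassen(A, B, nmin=64):
        # Recursive Strassen multiply for n-by-n matrices, n a power of 2.
        # Below the crossover dimension nmin, ordinary multiplication is used.
        n = A.shape[0]
        if n <= nmin:
            return A @ B
        m = n // 2
        A11, A12, A21, A22 = A[:m, :m], A[:m, m:], A[m:, :m], A[m:, m:]
        B11, B12, B21, B22 = B[:m, :m], B[:m, m:], B[m:, :m], B[m:, m:]
        P1 = strassen(A11 + A22, B11 + B22, nmin)
        P2 = strassen(A21 + A22, B11, nmin)
        P3 = strassen(A11, B12 - B22, nmin)
        P4 = strassen(A22, B21 - B11, nmin)
        P5 = strassen(A11 + A12, B22, nmin)
        P6 = strassen(A21 - A11, B11 + B12, nmin)
        P7 = strassen(A12 - A22, B21 + B22, nmin)
        C11 = P1 + P4 - P5 + P7
        C12 = P3 + P5
        C21 = P2 + P4
        C22 = P1 + P3 - P2 + P6
        return np.block([[C11, C12], [C21, C22]])

    A, B = np.random.rand(256, 256), np.random.rand(256, 256)
    assert np.allclose(strassen(A, B), A @ B)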
Notice that 1G @™ 1.3, BLOCK MATRICES AND ALGORITHMS 33 Ifd=0, ie., we recur on down to the I-by-1 level, then ) ce Ta ntl neg 3 ‘Thus, asymptotically, the number of multiplications in the Strassen proce- dure is O(n?40"). However, the number of additions (relative to the number cof multiplications) becomes significant 08 rimin gets small. Example 1.8.1 Ifn = 1024 and tain ~ 64, then strane involves (7/8)!9-8 m6 the asthmatic of the conventional algorithm. Probleme 1.3.1. Generalize (1.3.3) a0 that it can handle the variable block-eiza problem covered. by Theorem 1.33. : 3.9.2 Generalize (1.24) and (1.3.5) eo thet they can handle the variable Blockine 2.3.3 Adapt stramn ap that it can handle square matrix multiplication of any order. Hint Ifthe "current" A haa odd dimension, appand a zero row and column, iad Pret ba hee la blocking of the matrix A, then ae 13.5 Spon even and cin te flowing function from RO to Re 2 Ja) = ttamFatny © Senta (0) Show that i 2,9 € RP thea oa Py = Sener + vader +91) = 12) ~ fo) (8) Now consider the mty-n matrs makiptiation @ = AB. Ghre an algoithmn for By} Ae ‘and ote from Lemma 13.2 that Now analyze each Ay By with the help of Lama 13.1. Notes and References for Sec. 1.3 For quite some time fast methods for matrix multiplication have atractad « ft of at- tention within computer science, See 8. Winograd (1968), “A New Algorithm for Inner Produc,” BBE Trans. Comp. C-17, 6st, YY. Straseen (1969). “Gaussian Elimination is Not Optimal,” Numer. Math. 15, 354-356, YV. Pan (1984), “How Can We Speed Up Matrix Multiplicatio?,” SIAM Repiew 26, so4i6, “Maay of thase methods have dubious practical value. However, withthe publication of D. Bailey (1068). “Extra High Spend Matrix Multiplication on the Cray.2," SIAM J. Sei and Stat. Comp. 9, 603-607. {ie clar that the blankatdieminnal of those fast procedures ia unwiae, The “etability” (of the Straaen algorithm ix diced in $24.10. Seo aleo [NJ Higham (1990). “Exploiting Fast Matrix Multiplication withix the Level 3 BLAS” ACH Trona, Mosh. Soft 16, 352-968. ‘CC. Douglas, M. Heroux, G, Slshman, and RLM. Smith (1994), “CEMMW: A Portable Level 3 BLAS Winograd Variant of Struaen's Matris-Mutrie Multiply Algorithm.” I Comput, Phys. 110, 1-10. 1.4 Vectorization and Re-Use Issues ‘The matrix manipulations discussed in this book are mostly built upon dot products and saxpy operations. Vector pipeline computers are able to perform vector operations such ax these very fast because of special hardware that is able to exploit the fact that a vector operation is a very regular sequence of scalar operations. Whether or not high performance is extracted from such a computer depends upon the length of the vector operands and a number of other factors that pertain to the movement of data such as vector stride, the number of vector loads and stores, and the level of data re-use. Our goal is to build a useful awareness of these issues, We are not trying to build e comprebensive model of vector pipeline 1.4, VectoRmaTion AnD Re-Use Issuzs 35 ‘computing that might be used to predict performance. We simply want to Identify the kind of thinking that goes into the design of an effective vector pipeline code. We do not mention any particular machine. The literature is filled with case studies. 1.4.1 Pipelining Arithmetic Operations ‘The primary reason why vector computers are fast has to do with pipelin- ing. The concept of pipelining is best understood by making an analogy to scsembly line production. 
Suppose the sssembly of an individual automo- bile requires one minute at each of sixty workstations along an assembly line. the line is well staffed and able to initiste the assembly of a new car ‘every minute, then 1000 cars can be produced from seratch in about 1000 + 60 = 1060 minutes. For a work order of this size the line hos an effective “vector speed” of 1000/1060 automobiles per minute, On the other hand, if the assembly line is understalfed and 8 new assembly can be initiated just once an hour, then 1000 hours are required to produce 1000 cars. In this case the line has an effective “scalar speed” of 1/60th automobile per minute, So itis with a pipelined vector operation such as the vector add z = <-+y. The sealar operations = = xj + y are the cars. The aumber of elements is the size of the work order. If the start-to-finish tive requited for each is 7, then & pipelined, length vector add could be completed in time much less than nr. This gives vector speed. Without the pipelining, the ‘vector computation would proceed at s.scalar rate and would approximately require time nr for completion. Let us see how a sequence of fosting point operetions can be pipelined. Floating point operations usually require several cycles to complete. For example, 3 dcycle addition of two scalars x and y may proceed as in Fic.1.4.1. To visualize the operation, continue with the above metaphor :—[ Xa : Spe] tet] Horas == Fig. 14.1 A 3-Cycle Adder ‘and think of the addition unit as an assembly line with throe “work sta- tions”. The input scslars x and y proceed along the assembly line spending ‘one cycle at each of three stations. The sum = emerges after three cycles. « cuarsen 1, Mannix Mesmucamon Pontos sa ite, hak mae a Boe: . Fig. 1.4.2 Pipelined Addition Note that when o single, “iree standing” addition is performed, only one of the three stations is active during the computation. ‘Now consider a vector addition z =+y. With pipelining, the x and y vectors are streamed through the addition unit. Once the pipeline is filed and steady state reached, a 2 is produced every cycle. In Fic.1.4.2 we depict what the pipeline might look like once this steady state is achieved. In this case, vector speed is about three times scalar speed because the time for on individual add is three cycles. 1.4.2 Vector Operations ‘A vector pipeline computer comes with a repertoire of wector instructions, such as vector add, vector multiply, vector scale, dot product, and saxpy. ‘We assume for clarity that these operations take place in vector registers. ‘Vectors travel between the registers and memory by means of vector load and vector store instructions. ‘An important attribute of a vector processor is the length of its vector registers which we designate by v,. A length-n vector operation must be broken down into subvector operations of length v, or less. Here is how such ‘2 partitioning might be managed in the case of a vector addition z = =+y whore = and y are n-vectors: first =1 while first Aaj =; eR} If {01,---.dn} is independent and 6 € span{ai,...,0a}, then b is a unique linear combination of the a;. If $1, ..+4Se are subspaces of FO, then their sum is the subspace defined by $= (ay $aa +++ +04: a, € Si, f= Lk }. Sis gald to be a direct sum if each v € S bas a unique representation sso ay with a € Si. In this case we write $ = $1 ®---@ S,. The intersection of the S; is also a subspace, 5 = 31S Se. 
+1044} is @ mazimal linearly independent subset of span{a linearly independent subsot of {a1,...52q}- If fojyy.+-s044} is maximal, On} = span{aiye--0i,) and {aiyoo-sdhy) 18 a basis ay} . If SC IR™ is a subspace, then it is possible to find independent basic vectors ‘ay € 5 such that S = span{ay,...,ax} . All bases for a subspace S have the same number of elements. This number is the dimension and is denoted by dim(S) 2.1.2 Range, Null Space, and Rank ‘There aze two important subspaces ossociated with an m-by-n matrix A. ‘The range of A is defined by ran(A) = {y¢R™: y = Az for some 2 € RY}, and the mull space of A is defined by null(A) = {z €R" : Az =0}. +4] is @ columa partitioning, then ran(A) = span{ar,...saq} - ‘The rank of a matrix A is defined by rank(A) = dim (ran(A)). Tt ean be shown thot rank(A) = rank(AT). We say that A ¢ Re™*" is rank deficient if rank(A) < min{m,n). If A€ R™*, then im(null()) + ronk(A) KA=[a, 50 CHAPTER 2. MATRIX ANALYSIS 2.1.3 Matrix Inverse ‘The n-by-n identity matriz Iq is defined by the column partitioning Ine len en] where e4 is the kth “canonical” vector: ee =(Qye-50 1, 0, ‘The canonical vectors arise frequently in matrix analysis and if their di- mension is ever ambiguous, we use superscripts, ie., ef") € IR". If A and X are in FE" and satisfy AX = 1, then X is the inverse of A and is denoted by A™!, If A~' exists, then A is said to be nonsingular. Otherwise, we say A is singular. ‘Several matrix inverse properties have an important role to play in mae trix computations. The inverse of a product is the reverse product of the (AB) = BA", 11) ‘The transpose of the inverse is the inverse of the transpose: (AYP a (Attar (212) ‘The identity Be At BB AA (2.13) shows how the inverse changes if the matrix changes. ‘The Sherman-Morrison- Woodbury formula gives convenient expres- sion for the inverse of (A+UV") where 4 ¢ RO*™ and U and V are n-by-k: (AF UVT) AAMT VATU) VTA! (2.1.4) ‘A rank & correction to a matrix results in a rank k correction of the inverse. Ja (2.1.4) we assume that both A and (I+ V?A~'U) are nonsingular. ‘Any of these facts can be verified by just showing that the “proposed” inverse does the job. For example, here is how to confirm (2.1.3): B(A*—~ B-4B — AYA“) = BA (B - A)A™ 2.14 The Determinant If A = (0) € R™!, then ita determinant is given by det() = a. The determinant of A ¢ R"*" is defined in terms of order n — 1 determinanta: det(A) = De 1)? *ayydet(Ay)) 2.1. BAsic IpeAs FROM LINEAR ALGEBRA 51 Here, A1y is an (n ~ 1)-by-(n— 1) matrix obtained by deleting the frst row ‘and jth column of A. Useful properties of the determinant include det(A)det(B) ABER" det(AB) = det(A) = det(A) Aer" det(cA) = det A) ceRAcR™ det(A) #0 4 Ais nonsingular Ae RO" 2.1.5 Differentiation Suppooe a is a scalar ood that A(a) is an m-by-n matrix with entries a(c2). If ay(a) is a differentiable finetion of a for all and j, then by A(a) we mean the matrix a Ha) = Zale) = (Lasla)) = @yla)- ‘The diferentiation of a parameterized matrix tums out to be a handy way to examine the sensitivity of various matrix problems. Probleme 2.1.1 Show that if Ae FO" bas rank p, then therw existe an X€R™? and 9 Y¥ @ RO*? wach that A = XY, where rank(X) = raak(Y) op. 2.1.2 Suppose A(a) € Fad B(a) € FE** are matrices whose entree are difer- cotiable fanctions of the scalar @. Show Emorocen = [Zac] 210+ 40 [2 aa] P2.1.3 Suppome A(a) ¢ FO%™ haa enti that ar diferntiabe functions of the salar ‘a. Amuming Aa) almye noaingula, obow $ acer] = -A0o-* [Zara] Acad. 
2.14 Suppose A GFE, BG HE and that g(e) = JT A 27% Show that the aden of § i gion by VAs) = HAF + Ale 8 P2.1.5 Amume that both A and +97 are soosagulr whare A € FO** sad, € RE Show that f= saves (A 4-wo?o = then 1 alo soles perturbed ight bead ide problem ofthe form Az= b +e. Give an xpremi fr aia tem of A, 3, and Notes and Referanons for Sec. 2:1 ‘There ae many introductory near lgebra vets, Among them, the flowing are par- ‘Heulary ef PR Halmos (1858). Finite Dimensional Vector Spaces, od od, Van Nostrand Reinhold, Prinoron, 52 CHAPTER 2. MATRIX ANALYSIS 8.3, Loon (1980). Linaor Algebra ith Applications. Macmillan, New York. Strang (1989), Intreduction to Linear Alger, Waleley-Casbegs Prem, Wellley D, Lay (1994). Linear Algebra and fis Applications, Addioon- Wesley, Reading, MA. © Meyer (1907. Course Appin Lineer Agere, SIAM Pubatons,Phiadephi PAL More advanced treatments include Gantmacher (1050), Hora nad Johnson (1985, 2991), aod AS, Honma (104), The Theory of Mabe Numeral Analy, Gas (Bate ‘dell, Boston. M, Marcus and BL. Mine (1964). A Survey of Matrie Theory ond Matric Inorualiten Allyn tnd Bacon, Boston. LN. Fraaldin (1968). Matris Theory Prentice Hall, Englewood Cif, NJ. R. Baliman (1970). introduction to Matrez Analyes, Second Editon, McGraw-Hill, New York. P. Lancaster and M. Tiamenetaky (1085). ‘The Theory of Matrices, Sesond dition JMC Oncega (1087), Matrie Theory: A Second Course, Plus Pres, New York. 2.2 Vector Norms ‘Norms serve the same purpose on vector spaces that absolute value docs fon the real line: they furnish a measure of distance. More precisely, R” ‘together with a norm on R® defines a metric space. Therefore, we have the familiar notions of neighborhood, open sets, convergence, and continuity ‘when working with vectors and vector-valued functions, 2.2.1 Definitions A vector norm on IR" is a function /-R" —~ R that satisfies the following properties: f(z) 20 zeER', (f(z) =0iffz=0) f(z+y) < f(z) + fy) ye R” faz) = |alf(z) aeRzeR" We denote such a function with a double ber notation: f(2) = J. Sub- ‘scripts on the double bar are used to distinguish between various norms. ‘A useful class of vector norms are the p-norms defined by Hod, = (eu? ++ leal)# pan (223) ‘Of these the 1, 2, and oo norms are the most important: Veh, = false lea lz = (lei? +--+ Jeni)? = (Pa)! Wells = max lel 2.2. Vector Nonms 53 ‘A unit vector with respect to the norm l| || is a vector x that satisfies Iz}=1. 2.2.2 Some Vector Norm Properties A classic result concerning p-norms is the Holder inequality: By || we say that tobe = 2-24] isthe absolute error in . If #0, then 2-21) ot TT prescribes the relative error in 4. Relative error in the oo-norm can be ‘translated into o statement about the number of correct significant digits in @. In particular, if Hele a 1 then the largest component of has approximately p correct significant digits Bxample 2.21 I= (170 06674)7 and w (1.235 .06128)7, ben 12 #lln/I = las 0043 = 10-*. Nota than 4y has about three signcant digits that are correct while aly coe significant digit in 23 ia comect. 54 CHAPTER 2. MATRIX ANALYSIS 2.2.4 Convergence We say that 0 sequence {2(*)} of n-vectors converges to = if fm 2 zi =0. ‘Note that becuse of (2.2.4), convergence in the a-norm implies convergence {in the A-norm and vice versa. Problems P22. Show that i € RO, then lity oii, = 2 by P22. Prove the Cauchy-Schwarts inequality (223) by comakdering the inegsality OS (ax + by)" (az + by) for suitable scalars a and 6. 
PB.23 Verify tat + Hy I Nyy 804 [Nag ate Yetor norms 2.24 Verify (22.5}(227). Whe is euality achieved in each em? 2.25 Show that in RY, 20) + if ad only 20 24 fr B= 2.2.6. Show that aay vector norm on R* is waiforly continuous by veritying the inequality | lel ~ vil ll Alla Bll. For the most part we work with norms that satisfy (2.3.4), ‘The p-norms have the important property that for every A € R"™" and ZER" we have | Az |, < |] All,] 2[,. More generally, for any vector norm ff «lq 08 Re" and’ - jon R™ we have | Azlly < Algal? la where A fag is « matrix norm defined by Arig Tle” (2.3.5) HAllas = Wesay that | lug is subordinate tothe vector norm fg and | «By. Since the set {z €'R": |x, = 1} is compact and | = [1 i continuous, © follows that Walaa jaa LAr ly = Lae" ly (238) for some z* ¢ IR having unit a-norm. 2.3.2 Some Matrix Norm Properties ‘The Frobenius and p-norms (especially p = 1, 2, co) satisfy certain ineqtal- ities that are frequently used in the analysis of matrix computations. For AER™® we have UA: $f Allp < Vala ean max lou] < Ala S Vein max layl (2.38) VAh = me Seu (239) Vly = ae Lieut 230) Fal Ale SUAb < VIALS sn) JRiAh sib s vila (32) 2.3. Marmix Noms 37 HAER™", 1 1, a contradiction. ‘Thus, fF is nonsingular. To obtain’ an expression for its inverse consider the identity . (E*) (+P) = 1-FH, = Siace | FI, <1 i flows that im F* = 0 because 1F*Y, < WIS Thus, (vde)e-n al w It follows thot (I'-F)-* = Jim 7 F*. From this itis easy to show that is 1. IF il, ta-ry s Died = & Note that |(I-F)--Iil, < [Fllp/(- Il F'ill,) a8 @ consequence of the lemma. ‘Thus, if ¢ <1, then O(6) perturbations in I induce O(¢) perturbations in the inverse. We next extend this result to general matrices. ‘Thoorem 2.3.4 If A is nonsingular and r= AE J, <1, then A+ E is nonsingular and (A+ 2)" AI, WEL WAT —2)- Proof. Since A is nonsingular A +E = A(I ~ F) where F = —A-'E. Since | Fil, = r < Lit follows from Lemma 2.3.3 that IF is nonsingular and {(2- F)-" fl, < 1/(1-r). Now (A+ £)-! = (I= F)"1A! and so taser, s Abe 2.4. Foire PRECISION MaTREx COMPUTATIONS 59 Equation (2.1.3) seys that (A+ E)~! — An! = —A~18(A + B)~! and oo by taking norms we find VAs et- A, < PAM EE NAS, 193 < MULE g Problems P2.3.1 Show | ABI, SHAI,II Bil, where 1

M or 0 < |a op bl < m respectively. ‘The handling of these and other exceptions is hardware/system dependent. 2.4.3 Cancellation Another important aspect of finite precision arithmetic is the phenomenon of catastrophic cancellation. Roughly speaking, this term refers to the ex ‘treme loss of correct significant digits when small numbers are additively computed from large numbers. A well-known example taken from Forsythe, Malcolm and Moles (1977, pp. 14-16) i the computation of e* via Tay- Jor series with a > 0. The roundoff error associated with this method is ‘There are important camps of machines whose additive floating point operations satiety 7020) = (1+ e)0-© (1+ 49)0 whee [elle] bj =lay,i=im jain BSA > by Sayi ‘With this notation we see that (2.4.6) has the form m, j= in If(A) ~ A} < ulAl. A relation such as this csa be easily tuned into a norm inequality, eg., HAA) = Al}, S all All, However, when quantifying the rounding errors in a matrix manipulation, the absolute value notation can be lot more Informative because it, provides a comment on esch (i,j) entry. 2.4.5 Roundoff in Dot Products We begin our study of finite precision matrix computations by considering ‘the rounding errors that result in the standard dot product algorithm: s=0 for k= In saotnn (24.7) end Here, x and y are n-by-1 floating point vectors. In trying to quantify the rounding errors in this algorithm, we are immediately confronted with notational problem: the distinction be- ‘tween computed abd exact quantities. When the underlying computations are clear, we shall use the /l(-) operator to signify computed quantities. 2.4. Favre Precision MATRIX COMPUTATIONS 63 Thus, fi(2"y) denotes the computed output of (2.4.7). Let us bound Uste"9) = 37). It . y=Tl (Eas) , i then #1 = 21ys(1 +61) with [64] 0 and B20, then conventional matrix multiplication produces a product € that ‘has small componentwise relative error: [-C| ¢ mula} [Bl +O(u?) = aujo] +0(u") This follows from (2.4.18). Because we cannot say the same for the Strassen. approach, we conclude that Algorithm 1.3.1 is not attractive for certain nonnegative matrix multiplicstion problems if relatively accurate ij are required. Extrapolating from this discussion we reach two fairly obvious but im- portant conclusions: ‘© Different methods for computing the same quantity can produce sub- stantially diferent results. '* Whether or not on algorithm produces satisfactory results depends upon the type of problem solved and the goals of the ser. ‘These observations are clarified in subsequent chapters and are intimately related to the concepts of algorithin stability and problem condition. Probleme Pad Show thet if (24.7) la applied with y = 2, then fi(2"2) = 272(1 +a) where al ¢ na +0(02), Pada Prov (243). PAA9 Show thet if B ¢ HE™M® with m > m, thon 151 a < VA Ef This eeu is ‘met when deriving norm bounds fom absolute value bounds. 244 Amume the existence of « aqoare rot function stiying f1(/2) = VE() +6) with [el (0? + ww). But o? =A} =|] Ar fi}, and 0 we must have w = 0. An obvious induction argument completes the proof of the theorem, © ‘The a; are the singular values of A and the vectors uw; and 1 are the ith left singular vector and the ith right singular vector respectively. It 2.5. ORTHOGONALITY AND THE SVD. 
n is easy to verify by comparing columns in the equations AV = UE and ATU = VSP that Ay = 014 Ay = om It is convenient to have the following notation for designating singular val- } i= 1min(mn} (A) = the ith largest singular value of A, Omes(A) = the largest singular value of A, Cmin(A) = the smallest singular value of A. ‘The singular values of @ matrix A are precisely the lengths of the semi-axes of the hyperellipsoid E defined by B= { Az: [la =1}. Bxample 2.5.1 (8 Ble -(S ME ay ‘The SVD reveals s great deal about the structure of a metrix. If the SVD of A is given by Theorem 2.5.2, and we define r by OB DOe> orgy soy =O, theo rank(A) = or (2.5.3) aull(A) = spam{eegty.--+¥a} (254) rand) = epan(yeste} 5 (283) and we have the SVD expansion A= Dowsl (250) st ‘Various 2-norm and Frobenius notm properties have connections to the SVD. If Ae R™", then WAIL = oft--403 pa minimn} (25.7) Ab =m (2.5.8) in "Azle oo (m2n) (259) 2 CHAPTER 2, MATRIX ANALYSIS: 2.5.4 The Thin SVD If A= ULV? € R™ is the SVD of A and m > n, then A=UiEVT where sooty RE" and Ey = E(t, Ln) = diag(oy,...,09) € RO We refer to this much-used, trimmed down version of the SVD as the thin sup. 2.5.5 Rank Deficiency and the SVD One of the most valuable aspects of the SVD is that it enables us to deal sensibly with the concept of matrix rank. Numerous theorems in linear ‘algebra have the form “if such-and-ouch # matrix has full rank, then such- ‘and-such a property holds.” While neat and sesthetic, results of this favor do not help us address the numerical difficulties frequently encountered in situations where near rank deficiency prevails. Rounding errors and fuzzy data moke rank determination a nontrivial exercise. Indeed, for some small ‘ce may be interested in the crank of a matrix which we define by rank(A.e) = min rank) Thus, if A is obtained in a laboratory with each a; correct to within +001, then it might make sense to look at raak(A,.001). Along the same lines, if Ais an m-by-n floating point matrix then itis reasonable to regard A ss numerically rank deficient if rank(A,¢) < min{m,n} with e= ull A [2. Numerical rank deficiency and e-rank are nicely characterized in terms of the SVD because the singular values indicate how near a given matrix is to a matrix of lower rank. ‘Theorem 2.5.3 Let the SVD of A R™" be given by Theorem 2.5.2. If eer =rank(A) and £ Ay= Dovw?, (25.10) min A-Blha = A Avia = ons. (2.5.11) ait [A- Bla = WA~ dul (say 2.5. ORTHOGONALITY AND THE SVD a ) it Follows that rank( Ax) 109) and so | A — Ax {la = Ost ‘Now suppose rank(B) = k for some B éR™™". It follows that we can find orthonormal vectors z1;.--,2q-n 90 oull(B) = span{2t,...,2n-1} - ‘A dimension argument shows that wwe have kat WA-BUE > M(A~ Ble =WAZHE = DoT e)? > ods completing the proof of the theorem. © ‘Theorem 2.5.3 says that the smallest singular value of A is the 2-norm distance of A to the set of all rank-deficient matrices. Tt also follows that the set of full rank matrices in IR™*" is both open and dense. Finally, ifre = rank(A,), then a1 Do Borg > CD organ Bo Boy p= minfmn). ‘We have more to say about the numerical rank issue in §5.5 and §12.2. 2.5.6 Unitary Matrices ‘Over the complex field the unitary matrices correspond to the orthogonal ‘matrices. In particular, Q € ©" is unitaryif Q7Q = QQ" = Iq. Unitary matrices preserve 2-norm. The SVD of a complex matrix involves unitary matrices. If A€ (™™, then there exist unitary matrices U € C™™™ and Vee" guch that UM AV = ding(o1,.-.,09) ER" — p= minfm,n} where oy 2 042 ...2 0,20. 
Problecse 2.5.1 Show that if ts ral and ST =~5, then J ~ $ a nonsingular and the mate (C= S)-*(F% 5) orthogonal. Thin in known 0a the Capley ronaform of 5. ™ CHAPTER 2. MATRIX ANALYSIS; 2.6.2 Stow that angular orthogooal matric ia dagpal. 2.63. Show that FQ = Qi +402 is unitary with Qi, € FE, then the nbn palmate z-(& & ix othogona P25. Brabah properties (25.)(259) 25.8 Prove that emee(A) = mae “ay veR™z6R™ Trl P25 Foc the 2-2 matix A = [ 9 rnin) tata unetion of, ¥, ad = P2S.7 Show that any matrix ia RO ithe limi ofa mquese of fil enk mates. (2.5.8 Show thet if A € R™** has rank n, then || A(A74)-'A? Jja = 1. P2.5.9 What isthe neat asi manixto A= [4 | inthe Robeaus nom? 5 ] deems at an 2.5.10 Show that if A € R™*® then || Ally $ /Fank(A) EA lay thereby sharpening aan. Notes and Referonces for Sec. 2.5 Forsythe and Moler (1967) offer » good account of the SVD's roe in tho analysis of the ‘Ax = problem. Their proof of the decomposition a more traditional than ours is that it maton ue ofthe eigeavale theory for eymetic matrices. Historical SVD relerencer inelade E. Boltrami (1873). ‘Sulla Punsioni Bilinear,” Gionale i Mathematiche 11, 98-106. C. Beart ane G. Young (1999). *A Principal Axia Transformation for Noo- Hermitian Matrices” Bull. Amer. Math Soe. 46, 118-21 G.W. Stewart (1989). "On the. Early History of the Singular Value Decomposition,” ‘SIAM Review $5, 351-566, (One ofthe most significant developments in sclemifc computation hasbeen the increased ‘se of the SVD in application area that roqure the inteligens handling of matrix rank. ‘The range of applications i impremive. One of the mort interesting is CB, Moler and D. Morrison (1988). “Singular Value Analysis of Cryptogramng,” Amer. ‘Math. Monthy 90, 78-27 For geoeraliantions of the SVD to infinite dimensional Uber space, see LC. Gohberg and M.G. Krein (1960). introduction to the Theory of Linear Now-Self ‘Adjoint Operators, Amer. Math. Soc, Providence, Rl . Smithiee (1070). Inter Squations, Cambridge University Press, Cambridge. Reducing the rank of o matrix as in Theorem 25.3 when the perturbing matrix i com ‘erined in dicuamed i IW, Demme (1987). “The smallest perturbation ofa submatri< which lowers the rank ‘and constrained total least squares problemas, STAM J. emer. Anal. #f, 199-206. 2.6. PROJECTIONS AND THE CS DECOMPOSITION % GH, Golub, A. Hotiman, and G.W. Stewart (1088). "A Ganeralisation ofthe Eelare- ‘Young Mirsky Approsimation Theorem.” Lin. Alp. and fle Applic 68/80, 317-328, GA. Wotaon (1988). “The Smalls Perturbatioa of « Submmatrox which Lowers the Raa of the Matrix” IMA J. Numer, Anal. 8, 295-304. 2.6 Projections and the CS Decomposition If the object of a computation is to compute a matrix or a vector, thea norms are useful for assessing the sccuracy of the answer or for measuring progress during an iteration. Ifthe object of a computation is to compute a subspece, then to make similar comments we need to be able to quantify the distance between two subspaces. Orthogonal projections are critica in this regard. After the elementary concepts are established we discuss the CS decomposition. This is an SVD-like decomposition that is handy when having to compare & pair of subspaces, We begin with the notion of an ‘orthogonal projection. 2.6.1 Orthogonal Projections Let SC IR" be a subspace. P ¢ R°*" is the orthogonal projection onto Sif ran(P) = $, P! = P, and P? = P. 
From this definition itis ensy to show that if x €R", then Pz € S$ and (I — P)z¢ $+ If Py and Py are each orthogonal projections, then for any z € R° we have NCP ~ Paz = (Piz) ~ Pade + (Paz) "UE - Pde If ran(P,) = ran(P,) = S, then the right-hand side of this expression is zero showing that the orthogonal projection for a subspace is unique. If the columns of V = [v4,...,1% ] are an orthonormal basis for a subspace S, then it is easy to show that P = VV" i the unique orthogonal projection onto S. Note that if v €R°, then P = w7/u7v is the orthogonal projection onto $ = span{v}. 2.6.2 SVD-Related Projections ‘There are several important orthogonal projections associated with the sin- gular value decomposition. Suppose A = UEVT e R™*" is the SVD of A and thot r = rank(A). If we have the J and V parttionings u=(Uu 6) vel% %] rom=r roner VVE projection on to null()* = ran(AT) V,VP_ = projection on to null(A) U,UT = projection on to ran(A) U-0F = projection on to ran(A)* = null A?) 76 CHAPTER 2. MATRIX ANALYSIS 2.6.3 Distance Between Subspaces ‘The one-to-one correspondence between subspaces and orthogonal projec- tions enables us to devise a notion of distance between subspaces. Suppose ‘S, and Sz are subspaces of Rand that dim(S;) = dim(S;). We define the distance between these two spaces by dist(Si, Ss) = 1 Pi- Palla (2.6.1) where P, is the orthogonal projection onto Sj. The distance between a pair of subspaces can be characterized in terms of the blocks of a certain ‘orthogonal matrix. ‘Theorem 2.6.1 Suppose we=lm We] Z=(% hl Bonk kon-k are n-by-n orthogonal matrices. If Sy = ran(W1) and Sp = ran(Zi), the dist(Si,52) = |WYZe la = 127M ll. Proof. dist(S,52) AWAWT ~ 227 I, = WW™(WAWE - 2.28)2 fy = [fees "JL. Note that the matrices WZ and WY Z; are subinstrices of the orthogonal matrix Qu Qa] 2 | WE 2 o=[8: 2] = [WE whe |-#7 ‘Our goal is to show that || Qa fly = laeh ‘Since Q is orthogons! it follows from } = [SE] @ LehQuelh + 1Qax2i} for all unit 2-norm x € RY. Thus, that Qari = max zi min H Vnih = ymax Bnei pit, Hue = 1G min( Qs)? 2.6, PROJECTIONS AND THE CS DzcoMPostrion n Anslogously, by working with Q7 (which is also orthogonal) it is possible to show that ; 1 Qf 1B = 1 - ermal). and therefore Qa = 2 ~omn(Qas)? ‘Thus, Qa lly = 1x2 tO Note that if 5, and $2 are subspaces in IR" with the same dimension, then 0 < dist(S1, 52) < 1. ‘The distance is zero if S, = Sz and one if S:\St # {0}. ‘A more refined analysia of the blocks of the Q matrix above sheds more light on the difference between a pair of subspaces, This requires a special SVD-like decomposition for orthogonal matrices. 2.6.4 The CS Decomposition ‘The blocks of an orthogonal metrix partitioned into 2-by-2 form have highly related SVDs. This is the gist of the CS decomposition. We prove a very useful special case first. ‘Theorem 2.6.2 (The CS Decomposition (Thin Version)) Consider the rs 2-[%] str 2m nia 3 i ef eer ey ee enist orthogonal matrices U, € R™*™, Uz €R™*™, and Vi € R™™ such (2 ay [&le-[8] C= dingloon(Oy), S = ding(sin(®),.. eR", Qa eR /con(n)), sin(@n)), Och che Proof. Since || Qui lla $ II @ lla = 1, the singular values of Qi, are all in he interval (0,1). Let UFQuV, = C = dingler,... 60) = [% S] mice tnt 8 CHAPTER 2. MarRix ANALYSIS be the SVD of @; where we assume ea a> Bs Dey 20. ‘To complete the proof of the theorem we must construct the orthogonal matrix Ua if am = 1% He) ton 7 hoo nu os fa (5 2 Tale-[§ 3] * WW Since the columns of this matrix have unit 2-nona, W; = 0. 
The columas of Wz are nonzero and mutually orthogonal because WY Wa = Inne — ETE = dlag(t — 2,1,...,1-4) JIG for k = I:n, then the columns of Z = Wa ding(h ses, +05 1/8n) are orthonormal. By Theorem 2.5.1 there exists an orthogonal matrix U, € R™*™ with Up(:,t + In) = Z. It is easy to verify that UF Qav; = dlag(s4,---44n) = 5. Since +s = 1 for k ~ lin, it follow that these quantities are the required cosines and snes, © Using the same sort of techniques it is possible to prove the following more general version of the decomposition: ‘Theorem 2.6.3 (CS Decomposition (General Version)) if otée] is a 2-by-2 (arbitrary) partitioning of an n-by-n orthogonal matriz, then there exist orthogonal o- Fete] = bite] is nonsingular. If 35 such that vTov = 2.6. PROJECTIONS AND THE CS DECOMPOSITION nm where C = diag(eys..-s69) and S = diag(ar,. matrices with 0< c4 8 <1. Proof. See Peige and Saunders (1981) for details, We have suppressed the dimensions of the zero submatrices, some of which may be empty. ‘The eavential message of the decomposition is that the SVDs ofthe Qi, are highly related. Example 2.6.1 The matric 07876 o3eoT 04077 =o. 147) are square diagonal oss 0.2198 mz 0287601817 Qe =01301 00502 05805 0162 0.895 The angles amociated with the cosines and sines turn out to be very im Portant in a number of applications. See §12.4. Problems 2.6.1 Show chat i Pw an ortogonal projection, then Q-= I 2P i orchogool. 72.6.2 Whet ar te iagular value of an orthogooal projection? 2.63. Suppom Si = man{s} and 5: = span(y), where © and y are unit -n0na vector in R2. Working ony with the definition of dit), stow that dat( Si, $) = Vi (=F oF verifying that th datnce between Ss and S ual the ina of the angle erween and. Notes and References for Sec. 2.8 ‘The following papers diacunt various aspects of the CS decomposition: . Davia aad W. Kaba (1970). “The Rotation of Eigeuvectort by » Perturbation I,” SIAM J. Num. Anal 7, 1-48, G.W. Stawart (1977). "On the Perturbation of Preudo-Lrversn, Projections and Linear eaot Squares Problems.” SIAM Review 19, 634-682. (CLC. Paige and M. Saunders (1081). “Toward a Generalized Singular Value Decomspoui- on," SIAM J. Num. Anak 18, 308-405, (COC: Paige and M. Wat (1984). “History aad Gaveraity of tbe C8 Decomposition,” Lin. ‘Alp. and Its Applic. 208/200, 303-328. 80 CHAPTER 2. MATRIX ANALYSIS ‘Sea $8.7 for soxme computational details Foca deoper geometrical understanding ofthe CS decomposition and the notion of datance between subspaom, wo TTA, Aris, A. Edelman, and 8, Smith (1008). “Conjugate Gradimt and Newton's ‘Method! on the Graaarian end Stiefel Maaifokin” to appear in STAM J. Matrie Ancl. Ap 2.7 The Sensitivity of Square Systems ‘We now use some of the tools developed in previous sections to analyze the linear system problem Az = 5 where A € R'™" is nonsingular andé ¢ R". ur aim is to examine how perturbations in A and 8 affect the solution 2. A much more detailed treatment may be found in Higham (1996). 2.7.1 An SVD Analysis It A= Soa? = uEvt is the SVD of A, then ‘This expansion shows that small changes in A or b can induce relatively large changes in z if on is small. Tt sbould come as no surprise that the magnitude of om should have 1 bearing on the sensitivity of the Az = b problem when we recall from ‘Theorem 2.5.3 that op isthe distance from A to the set of singular matrices. ‘As the matrix of coefficients approsches this set, itis intuitively clear that ‘the solution z should be increasingly sensitive to perturbations. 
2.7.2 Condition A precise measure of linear system sensitivity can be obtained by consider- ing the parameterized aymam (A+ePla() =b+ef 20) =z where F € R°*" and f € R”. If A is nonsingular, then it is clear that 2(¢) is differentiable in a neighborhood of zero. Moreover, #(0) = A-*(f-Fz) and thus, the Taylor series expansion for z(c) has the form 20) = 2 + €2(0) 4012). 2.7, Tue SeNsITIviTy OF SQUARE SYSTEMS aL Using ony vector norm and consistent matrix norm we obtain t-te jepatn{tanen} +o. ara et Vel For square matrices A define the condition number n(A) by (A) = AN RATE (2.7.3) with the convention that x(A) = 00 for singular A. Using the inequality OY < [All hel it follows from (2.7.2) that EG) Tap < Alea + m+ oe) (2.7.4) ster ua Wel represent the relative errors in A and 6, respectively. Thus, the relative error in z can be x(A} times the relative error in A and 6. In this sense, the ret suber Sa) ques the santa othe Aa 8 probles Note that x(-) depends on the underlying norm and subscripts are used secondly e+ we HEl ele ea el Tay aad = lel ou(A) on(A) ‘Thus, the 2-norm condition of a matrix A messures the elongation of the hhyperellipsoid {Az Iz lla = 1}. ‘We mention two other characterizations of the condition number. For pnorm condition numbers, we have (A) = 1 A lial] A“ I (2.7.5) (2.78) a (A) asa. ‘This reoult may be found in Keban (1966) and shows that (A) measures the relative p-norm distance from A to the set of singular metrices. For any norm, we also have n(A)= im up (AAAI AY (2.2.7) eo waaay ‘This imposing result merely says thatthe condition number isa normalized Frechet derivative of the map A A-'. Further details may be found in Rice (19666). Recall that we were initially le to n(A) through diferenti- sion. 82 Cuapren 2. Maran ANALYSIS ‘If x(A) is large, then A is anid to be an ill-conditioned matrix. Note that this is a norm-dependent property?, However, any two condition numbers ‘o(-) and ma(-) on IRO*” are equivalent in that constants ¢; and ¢; can be found for which cutalA) S Ka(A) S cate(A) AER. For example, oa R'™" we have dla) ia mi(A) < neq(A) 1” Leta) $ mld) S matt) ar) Seta) < wold) above. For any of the p-norms, we have (A) 2 1. Matrices with small con- dition numbers are said to be well-conditioned . In the 2-norm, orthogonal ‘matrices are perfectly conditioned in that n2(Q) = 1 if @ ia orthogonal. 2.7.3 Determinants and Nearness to Singularity It is natural to consider bow well determinant size measures ill-conditioning. If det(A) = 0 is equivalent to singularity, i det(A) ~ 0 equivalent to near singularity? Unfortunately, there is lttle correlation between det(A) and the condition of Ax = b. For example, the matrix 3, defined by bate at Oe al By = , | ere (279) OO nd thas determinant 1, but oo(By) = n2"1, On the other hand, a very well conditioned matrix can have a very small determinant. For example, ding(10-4,...,10-1) € Re" although det(D,) = 10-*. satisfies rg( Dn 2.7.4 A Rigorous Norm Bound Recall that the derivation of (2.7.4) was valuable because it highlighted the connection between «(A) and the rate of change of 2(«) at « = 0. However, "It also depend upon the definition of “large.” The matter in purrued in 52.5 2.7. THE SENSITIVITY OP SQUARE SYSTEMS 83 it is a Ltt unsatisfying becouse it is contingent on ¢ being “small enough” ‘and because it sheds no light on the size of the O(c?) term. In this and the next subsection we develop some additional Az = b perturbation theorems ‘that are completely rigorous. 
‘We first cotoblish o useful lemma that indicates in terms of x(4) when we can expect a perturbed system to be nonsingular, Lemma 2.7.1 Suppose Az=b AER™", OZER” (A+ Ady = 5445 AACR, AbERT with || OAl| Sel] Al andj Ab|| Selo. en(A) =r <1, then A+ dA is nonsingular and Uivil 2 ler Tel © T= Proof. Since | A“AA|| < eA“! Al] = 7 < 1 it follows from ‘Theorem 2.3.4 that (A+ AA) is nonsingular. Using Lemma 2.3.3 ond the equality (I+ A™'AA)y = 2+ A“1Ab we find Wl < N+ 4-aay (Ded ef Am) L & > (Islet) Qeh+rizi) 2 (iz|+e] A“? ot) Since {| {| = Arf < || All] zl] it follows that 1 Wis 4 We are now set to establish a rigorous Ax = 6 perturbation bound. ‘Theorem 2.7.2 If the conditions of Lemma 2.7.1 hold, then dy-=i) me Ter $ r=7"4) (2740) Proof. Since yore = AMAb ~ AB Ay (27:1) wehave fy—zi] < ef A* Il] bl] + Amt AMM yl and so, w ex(ay ell tz (ragien tA ex(A) (+}4) = oy a w oy CHAPTER 2. MATRIX ANALYSIS Example 2.7.1 The Az =) problem (3 te ][3 +L] hha solution 2 = (1, 1)F and condition rae(A) = 108. IAb=(10-*, 0)", A= 0, Be At aay 848, cheng (he 0-8) 2)? nd te aay (20) ye Late tp w < Mable a) = 10th = 1 Tale © fol6 ‘Thus, che upper bound ia (2.7.10) can be arom oyerentimate ofthe eror induced by the persusbation. Ou the other band, if Ab= (0, 10°*)", AA = 0, and (A+A.Aly = bea, hen this inequality sys IS caxiotiot “Thus, thre ace perturbation for Which te bound in (2.710) semen axa. 2.7.5 Some Rigorous Componentwise Bounds ‘We conclude this section by showing that a more refined perturbation the- ‘ory is possible if componentwise perturbation bounds are in effect and if ‘we make use of the absolute value notation. ‘Theorem 2.7.8 Suppose A= AcR™, 04¢bER™ (AtAdly = b+Ab AAERM™™, AbERT ‘and that |AAJ < eA} and [Ab] < el. If éao(A) {is nonsingular and y= 2 oo 2 lleo Proof. Since j AA lle $ el Ao 804 |] 6 lo < el blo the conditions of Lemma 2.7.1 are satisfied in the infinity norm. This implies that A+ AA fs nonsingular and Iyllo . L+r T Welle * I=r Now using (2.7.11) we find ly—al < |ATHIAH + JA“ Al Tot S ATH Bl + Am Allyl 0 such that (A+ AA) = b+ Ab [AA n) to the Znorm condition of the matrionr oa[e x o in a] sod c= Notes and References for Sec. 2.7 ‘The condition concept ia thoroughly investigated in 4. Rice (1966). “A Theory of Condition” SIAM J. Num. Anal $, 287-310 ‘W. Kahan (1966). "Numerical Linear Algebra,” Canadian Math. Bull 9, 757-801. References for componentwise perurbation theory include 86 (CHAPTER 2, MATRIX ANALYSIS 'W. Outtll and W. Prager (1964). “Compatibility of Approximate Solutions of Linear ‘Equations with Given Esror Bounds lor Coeficiente and Right Hand Sides,” Numer. Math. 6, 405-409. LE. Cope aad BLW. Rust (1970). "Bounds on solutions of nystems with accurate dat” ‘SIAM J. Nar. Anok. 16, 950-63. RD. Shoe (1979), “Scaling for numerical stability in Gaussian Elimination” J. ACM 26, 404-828, LW, Demme (1992), "Phe Componeatwige Distance to the Nearest Singular Matrix,” ‘SIAM J. Motris Anal. Appl. 13, 10-19. DJ, Higham and 8 J. Higham (1992). “Componeatwiie Perturbation Theory for Linear ‘Syrtem with Maltiple Right-Hand Sides,” Lan Alg. and fts Applic. 174, 111-129, NJ. Higham (1994). °A Survey of Componenti Perturbation Theory in Numerical Linear Algebra,” ia Mathematics of Computation 1949-1988: A Half Century of Computational Mathematica, W. Gautsch (ed), Volume 48 of Proceedings of Sym- ona sn Applied Mathematics, American Mathematical Society, Providence, Rhode fend. ‘8. Chandrasaren and 1.C-P. Ipoea (1998). 
"On the Sensitivity of Solution Components in Linear Syetemw of Equations,” SIAM J. Matris Anal Appl 16, 99-112. ‘The reciprocal of the condition number measures how near 8 given Az = b problem ia to singularity. The importance of knowing bow nea given problem isto a dificult or fnsoluble problem bas come to be appreciated in many computational settings. See ‘A. Laub(1905). “Numerical Linear Algebra Aspects of Control Design Computations,” IBEE Trans, Auto. Cont. AC-S0, 9-108. 4. L, Baslow (1086). *On the Samallor: Positive Singulat Value of an Af-Matrix with Applications to Ergodic Matkov Chal,” SIAM J. Alp. and Disc. Struct. 7, 414 a EW, Demme! (1967). “On the Distance to the Nearot I-Powod Problem,” Numer: ‘Mach. 51, 281-229. LW, Demme (1988). “The Probability that a Numerical Analysis Problem is Difficult,” ‘Math. Comp. 50, 49-420. 1N4J. Higham (1989). “Matrox Nearnens Problems aad Applications,” in Appiiations of ‘Matrix Theory, M.J.C. Gover and 8. Baroets (eds), Oxford University Prem, Oxford UK 1-27. Chapter 3 General Linear Systems §3.1 Triangular Systems §3.2 The LU Factorization §3.3 Roundoff Analysis of Gaussian Elimination §3.4 Pivoting §3.5 Improving and Estimating Accuracy ‘The problem of solving a linear system Az = b is central in scientific computation. In this chapter we focus on the method of Gaussian elimi- notion, the algorithm of choice when A is square, dense, and unstructured. When A does not fall into this category, then the algorithms of Chapters 4, 5, and 10 are of interest. Some parallel Az = b colvers are discussed in Chapter 6. ‘We motivate the method of Gaussian elimination in §3.1 by discussing the ease with which triangular systems can be solved. The conversion of 1 general system to triangular form via Gauss transformations is then pro- sented in §3.2 where the “language” of matrix factorizations is introduced. Unfortunately, the derived method behaves very poorly on 8 nontrivial class of problems. Our error analysis in §3.3 pinpoints the diffcalty and moti- vates $3.4, where the concept of pivoting is introduced. In the final section ‘we comment upon the important practical issues associated with scaling, iterative improvement, and condition estimation. Before You Begin Chapter 1, §§2.1-2.5, and §2.7 are assumed. Complementary references include Forsythe and Moler (1967), Stewart (1973), Hager (1988), Watkins 88 Cuarrer 3. Genenat Linzar SYSTEMS (1991), Ciarlet (1992), Datta (1995), Highamn (1996), Trefethen and Bau (1996), and Demmel (1996). Some MaTLARfunctions important to this chapter are lu, cond, rcond, and the “backslash” operator “\". LAPACK connections include solutions with error bounds ‘with condition entinate Solve AX = B, ATX = BANK = B via PA = LU a Equitbration 3.1 Triangular Systems ‘Traditional factorization methods for linear systems involve the conversion of the given square system to. triangular system that has the same solution, ‘This section is about the solution of triangular systems. Waeeaee 3.1.1 Forward Substitution Consider the following 2-by-2 lower triangular system: 4: 0 ][n] fh fa ta} | con If falas #0, then the unknowns can be determined sequentially: a= hfs t= (bo~fnt1)/tn. ‘This is the 2-by-2 version of an algorithm known as forward substitution. ‘The general procedure is obtained by solving the ith equation in Lz = forse a ( - ¥en) / 3.1. ‘TRIANGULAR SySTEMS 89 If this is evaluated for i = 1:n, then 0 comploto specification of zis obtained. 
Note that at the ith stage the dot produet of L(j,1:i ~1) and e(1:i~ 1) i required. Since & only is involved in the formula for 24, the former may be overwritten by the latter: Algorithm 3.1.1 (Forward Substitution: Row Version) If L.¢ R™ 1s lower triangular and b € RR, then this slgorithum overwrites 5 with the solution to Lz = 6, L is assumed to be nonsingular, (1) = O(1)/E(1, 1) for i= 2m 2(4) = (6G) — LG Asi — 1)o0L4 — 1))/E64,3) end ‘This algorithm requires n? flops. Note that L is accemsed by row. ‘The computed solution 2 satisfies: (E+F)E = 6 [F| < nals) + O(W) Ga) For a proof, see Higham (1996). It says that the computed solution exactly satisfies a slightly perturbed system. Moreover, each entry in the perturbing ‘matrix F is small relative to the corresponding element of L. 3.1.2 Back Substitution ‘The analogous algorithm for upper triangular systems Uz = 6 is called back-substitution. The recipe for 2, is preseribed by = (Ee) and once again 6 can be overwritten by 2;. Algorithm 9.1.2 (Back Substitution: Row Version) If U ¢ RO" is upper triangular ond 6 € R°, then the folowing algorithm overwrites & ‘with the solution to Uz = 6. U'is assumed to be nonsingular, Hn) = H(n)/U (nn) fori=n- 1-11 H) = (OG) — UU 4 + Ln) + 1:0))/0G,4) end ‘This algorithm requires n? flops and accesses U by row. The computed solution 2 obtained by the algorithm can be shown to eatiefy (+r =o IF < nul|+0(u*), (8.1.2) % CHAPTER 3. GENERAL LINEAR SYSTEMS 3.1.3 Column Oriented Versions Column oriented versions of the above procedures can be obtained by re- versing loop orders. To understand what this means from the algebraic point of view, consider forward substitution. Once x; is resolved, it can >be removed from equations 2 through n and we proceed with the reduced system L{2:n, 2:n)z(2:n) = 6(2:n)—2(1)E(2:n, 1). We thea compute x2 and remove it from equations 3 through n, etc. ‘Thus, if this approach is applied (35 2]/2]- [2 wwe find 2; = 3 and then deal with the 2-by-2 system [3 8][2]=[2]-9[2] = [8] Here is the complete procedure with overwriting. Algorithm 3.1.3 (Forward Substitution: Columa Version) IfL ¢ R'™" is lower triangular and b€ RR", then this algorithm overwrites ® with the solution to Lz =. L is assumed to be nonsingular. for j=in=1 664) = G/L.) OG + Lin) = Dj + In) ~ BY)LG + 1:0, 7) end Hn) = Bln) /L (nn) It is also possible to obtain a column-oriented saxpy procedure for back- substitution. Algorithm 3.1.4 (Back Substitution: Column Version) IfU € R™" is upper triangular and b € R", then this algorithm overwrites 6 with the solution to Uz = b. U is assumed to be nonsingular. 2-12 86) = GG.) (ls = 1) = Hy ~ 3) BG) ~ 1,4) end a1) = 6q)/00,1) [Note that the dominant operation in both Algorithms 3.1.3 and 3.1.4 is the saxpy operation. The roundoff behavior of these saxpy implementations is essentially the same as for the dot product versions. ‘The accuracy of a computed solution to a triangular system is often surprisingly good. See Higham (1996). for j 3.1, TRIANGULAR SvsTEMS a 3.1.4 Multiple Right Hand Sides Consider the problem of computing a solution X ¢ R™* to LX = B where Le Re*” is lower triangular and Be RO™*. This is the multiple right hand side forward substitution problem. We show thet such » problem ccan be solved by a block algorithm that is rich in matrix multiplication assuming that q and n are large coough. This turns out to be important in subsequent sections where various block fectorization schemes are discussed. 
‘We mention that slthough we are considering here just the lower triangular problem, everything we say applies to the upper triangular case as well. ‘To develop a block forward substitution algorithm we partition the equar ton LX = B os follows: Lo | ox a cr) In Ia 0 Lin Ene Assume that the diagonal Menesan Paralleling the development of Algorithm 3.1.3, we solve the system Lu1X; = By for X, and then remove /X; from block equations 2 oe N: In 0 By— In Lon yy By— Lau X La bws «> Law By-LmXy Coming i tte vay we ca he tng oe spy vd elimi sation scheme: for j=1:N (4) Notice that the i-loop oversees a single block saxpy update of the form Bi Bysr sais (Z}-TEH EE)» By By ing For this to be handled as a matrix multiplication in » given architec ture it is clear that the blocking in (3.1.3) must. give sufficiently “big” X;. Let us assume thot this is the case if each X, has at least r rows. plished if NV = ceil(n/r) and X1,...,Xw—1 €R™* and 9 Cuapren 3. General Lingar SysTEMs 3.1.5 The Level-3 Fraction It is handy to adopt a measure that quantifies the amount of matrix multi- plication in a given algorithm. To this end we define the level-3 fraction of an algorithm to be the fraction of flops that occur in the context of matrix multiplication. We call such flops level-3 flops, Let us determine the level-3 fraction for (3.1.4) with the simplifying assumption that n = rN. (The seme conclusions hold with the unequal ‘blocking described above.) Because there are N applications of r-by-r forward elimination (the level-2 portion of the computation) and n? flops overall, the level-3 fraction is approximately given by Nr? 1 loa thoy ‘Thus, for large N almost all flops are level-3 flops and it makes sense to choose NV as large as possible subject to the constraint thet the underlying architecture can achieve a high level of performance when processing block saxpy's of width at least r= n/N. 3.1.6 Non-square Triangular System Solving ‘The problem of solving nonsquare, m-by-n triangular systems deserves some mention. Consider first the lower triangular case when m > 1, i Inj, - [% IneR™™ — eR" Jn }* = IneR™™ beRe Asoume thet Zi is lower triangular, and nonsingular. If we apply forward elimination to 1,2 = by then z solves the system provided Lai(Lj;'61) = ba, Otherwise, there is no solution to the overall system. In auch a case Jeost squares minimization may be appropriste. See Chapter 5 Now consider the lower triangular system Lx = 6 when the number of columns n excceds the number of rows ra. In this case apply forward substitution to the square system L(I:m, 1:ma)x(1:m, :m) = b and preseribe ‘an arbitrary value for 2(m + 1:n). See §5.7 for additional comments on systems that have more unknowns than equations. ‘The handling of nonsquare upper triangular systems is similar. Details are left to the reader. 3.1.7 Unit Triangular Systems A undt triangular matrix is a triangular matrix with ones on the diagonal. Many of the triangular matrix computations that follow have this added bit of structure. It clearly poses no difficulty in the above procedures. 3.1. TRIANGULAR SYSTEMS 93 3.1.8 The Algebra of Triangular Matrices For future reference we list a few properties about products and inverses of triangular and unit trlangular matrices. 4 The inverse of an upper (lower) triangular matrix is upper (lower) ‘triangular. ‘+ The product of two upper (lower) triangular matrices is upper (lower) triangular. 
Problems

P3.1.1 Give an algorithm for computing a nonzero z such that Uz = 0, where U is an n-by-n upper triangular matrix with u_nn = 0 and u_11 ... u_{n-1,n-1} != 0.

P3.1.2 Discuss how the determinant of a square triangular matrix could be computed with minimum risk of overflow and underflow.

P3.1.3 Rewrite Algorithm 3.1.4 given that U is stored by column in a length n(n+1)/2 array.

P3.1.4 Write a detailed version of (3.1.4). Do not assume that r divides n.

P3.1.5 Suppose S and T are n-by-n upper triangular matrices and that (ST - lambda*I)x = b is a nonsingular system. Give an O(n^2) algorithm for computing x. Note that the explicit formation of ST - lambda*I requires O(n^3) flops. Hint: Partition S and T.

P3.1.6 Suppose S and T are as in the previous problem and that we must solve (ST - lambda_k*I)x_k = b_k for k = 1:p. Observe that x_k and w_k = T x_k each require O(n^2) flops, so the overall cost is O(pn^2).

P3.1.7 Suppose the matrices R_1,...,R_p are all n-by-n and upper triangular. Give an O(pn^2) algorithm for solving the system (R_1 ... R_p - lambda*I)x = b assuming that the matrix of coefficients is nonsingular. Hint: Generalize the solution to the previous problem.

Notes and References for Sec. 3.1

The accuracy of triangular system solvers is analyzed in

N.J. Higham (1989). "The Accuracy of Solutions to Triangular Systems," SIAM J. Numer. Anal. 26, 1252-1265.

3.2 The LU Factorization

As we have just seen, triangular systems are "easy" to solve. The idea behind Gaussian elimination is to convert a given system Ax = b to an equivalent triangular system. The conversion is achieved by taking appropriate linear combinations of the equations. For example, in the system

    3x1 + 5x2 = 9
    6x1 + 7x2 = 4

if we multiply the first equation by 2 and subtract it from the second we obtain

    3x1 + 5x2 =  9
        - 3x2 = -14

This is n = 2 Gaussian elimination. Our objective in this section is to give a complete specification of this central procedure and to describe what it does in the language of matrix factorizations. This means showing that the algorithm computes a unit lower triangular matrix L and an upper triangular matrix U so that A = LU, e.g.,

    [ 3 5 ]   [ 1 0 ] [ 3  5 ]
    [ 6 7 ] = [ 2 1 ] [ 0 -3 ]

The solution to the original Ax = b problem is then found by a two-step triangular solve process:

    Ly = b,  Ux = y   ==>   Ax = LUx = Ly = b.

The LU factorization is a "high-level" algebraic description of Gaussian elimination. Expressing the outcome of a matrix algorithm in the "language" of matrix factorizations is a worthwhile activity. It facilitates generalization and highlights connections between algorithms that may appear very different at the scalar level.

3.2.1 Gauss Transformations

To obtain a factorization description of Gaussian elimination we need a matrix description of the zeroing process. At the n = 2 level, if x1 != 0 and tau = x2/x1, then

    [   1   0 ] [ x1 ]   [ x1 ]
    [ -tau  1 ] [ x2 ] = [  0 ]

More generally, suppose x is an n-vector with x_k != 0. If

    tau^T = ( 0, ..., 0, tau_{k+1}, ..., tau_n ),    tau_i = x_i / x_k,  i = k+1:n,
              (k zeros)

and we define M_k = I - tau e_k^T, then

            [ 1 ...  0        0  ... 0 ] [ x_1     ]   [ x_1 ]
            [ :  .   :        :      : ] [  :      ]   [  :  ]
    M_k x = [ 0 ...  1        0  ... 0 ] [ x_k     ] = [ x_k ]
            [ 0 ... -tau_{k+1} 1 ... 0 ] [ x_{k+1} ]   [  0  ]
            [ :      :        :  .   : ] [  :      ]   [  :  ]
            [ 0 ... -tau_n    0  ... 1 ] [ x_n     ]   [  0  ]

In general, a matrix of the form M_k = I - tau e_k^T is a Gauss transformation if the first k components of the n-vector tau are zero. Such a matrix is unit lower triangular. The components of tau(k+1:n) are called multipliers. The vector tau is called the Gauss vector.

3.2.2 Applying Gauss Transformations

Multiplication by a Gauss transformation is particularly simple. If C is n-by-r and M_k = I - tau e_k^T is a Gauss transformation, then

    M_k C = (I - tau e_k^T) C = C - tau (e_k^T C) = C - tau C(k,:)

is an outer product update. Since tau(1:k) = 0, only C(k+1:n,:) is affected, and the update C = M_k C can be computed row-by-row as follows:

    for i = k+1:n
        C(i,:) = C(i,:) - tau_i C(k,:)
    end

This computation requires 2(n - k)r flops.
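As an illustration that is not part of the text, the outer-product update above can be coded in a few lines of NumPy; the function names and the 0-based indices are assumptions of this sketch.

    import numpy as np

    def gauss_vector(x, k):
        # Multipliers tau for the Gauss transformation M_k = I - tau e_k^T
        # that zeros x(k+1:n); k is 0-based and x[k] must be nonzero.
        tau = np.zeros_like(x, dtype=float)
        tau[k + 1:] = x[k + 1:] / x[k]
        return tau

    def apply_gauss_transform(tau, k, C):
        # Return M_k C computed as the outer-product update C - tau * C(k,:).
        C = C.astype(float).copy()
        C[k + 1:, :] -= np.outer(tau[k + 1:], C[k, :])
        return C

    if __name__ == "__main__":
        A = np.array([[3.0, 5.0], [6.0, 7.0]])
        tau = gauss_vector(A[:, 0], 0)           # tau = [0, 2]
        print(apply_gauss_transform(tau, 0, A))  # [[3, 5], [0, -3]], cf. A = LU above

Only the rows below row k are touched, at a cost of 2(n - k)r flops, in agreement with the count just given.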
3.2.3 Roundoff Properties of Gauss Transforms

If tau^ is the computed version of an exact Gauss vector tau, then it is easy to verify that

    tau^ = tau + e,    |e| <= u |tau|.

If tau^ is used in a Gauss transform update and fl( (I - tau^ e_k^T) C ) denotes the computed result, then

    fl( (I - tau^ e_k^T) C ) = (I - tau e_k^T) C + E,

where |E| <= 3u( |C| + |tau| |C(k,:)| ) + O(u^2). Clearly, if tau has large components, then the errors in the update may be large in comparison to |C|. For this reason, care must be exercised when Gauss transformations are employed, a matter that is pursued in section 3.4.

3.2.4 Upper Triangularizing

Assume that A is n-by-n. Gauss transformations M_1,...,M_{n-1} can usually be found such that M_{n-1}...M_2 M_1 A = U is upper triangular. To see this we first look at the n = 3 case. Suppose

    A = [ 1 4  7 ]
        [ 2 5  8 ]
        [ 3 6 10 ]

and note that

    M_1 = [  1 0 0 ]                 [ 1  4   7 ]
          [ -2 1 0 ]   ==>   M_1 A = [ 0 -3  -6 ]
          [ -3 0 1 ]                 [ 0 -6 -11 ]

Likewise,

    M_2 = [ 1  0 0 ]                       [ 1  4  7 ]
          [ 0  1 0 ]   ==>   M_2(M_1 A) =  [ 0 -3 -6 ]
          [ 0 -2 1 ]                       [ 0  0  1 ]

Extrapolating from this example we observe that during the kth step

- We are confronted with a matrix A^(k-1) = M_{k-1}...M_1 A that is upper triangular in columns 1 to k-1.
- The multipliers in M_k are based on A^(k-1)(k+1:n,k). In particular, we need the (k,k) entry of A^(k-1) to be nonzero in order to proceed.

Noting that complete upper triangularization is achieved after n-1 steps, we therefore obtain

    k = 1
    while (A(k,k) != 0) & (k <= n-1)
        Determine the Gauss transformation M_k that zeros A(k+1:n,k); overwrite A with M_k A.
        k = k+1
    end

3.2.8 Solving a Linear System

Once A has been factored via Algorithm 3.2.1, then L and U are represented in the array A. We can then solve the system Ax = b via the triangular systems Ly = b and Ux = y by using the methods of section 3.1.

Example 3.2.2 If Algorithm 3.2.1 is applied to the matrix A of section 3.2.4 and b = (1,1,1)^T, then y = (1,-1,0)^T solves Ly = b and x = (-1/3, 1/3, 0)^T solves Ux = y.

3.2.9 Other Versions

Gaussian elimination, like matrix multiplication, is a triple-loop procedure that can be arranged in several ways. Algorithm 3.2.1 corresponds to the "kij" version of Gaussian elimination if we compute the outer product update row-by-row:

    for k = 1:n-1
        A(k+1:n,k) = A(k+1:n,k)/A(k,k)
        for i = k+1:n
            for j = k+1:n
                A(i,j) = A(i,j) - A(i,k) A(k,j)
            end
        end
    end

There are five other versions: kji, ikj, ijk, jik, and jki. The last of these results in an implementation that features a sequence of gaxpy's and forward eliminations. In this formulation, the Gauss transformations are not immediately applied to A as they are in the outer product version. Instead, their application is delayed. The original A(:,j) is untouched until step j. At that point in the algorithm A(:,j) is overwritten by M_{j-1}...M_1 A(:,j) and the jth Gauss transformation is then computed. To be precise, at step j the entry A(i,j) is overwritten with U(i,j) for i <= j and with L(i,j) for i > j.

The LU factorization of a rectangular m-by-n matrix A can also be computed. The m > n case is illustrated by

    [ 1 2 ]   [ 1 0 ]
    [ 3 4 ] = [ 3 1 ] [ 1  2 ]
    [ 5 6 ]   [ 5 2 ] [ 0 -2 ]

while

    [ 1 2 3 ]   [ 1 0 ] [ 1  2  3 ]
    [ 4 5 6 ] = [ 4 1 ] [ 0 -3 -6 ]

depicts the m < n case. To handle the m >= n case we modify Algorithm 3.2.1 as follows:

    for k = 1:n
        rows = k+1:m
        A(rows,k) = A(rows,k)/A(k,k)
        if k < n
            A(rows,k+1:n) = A(rows,k+1:n) - A(rows,k) A(k,k+1:n)
        end
    end

Since each interchange permutation E_j with j > k satisfies E_j(1:j-1,1:j-1) = I_{j-1}, it follows that each M~_k = E_{n-1}...E_{k+1} M_k E_{k+1}...E_{n-1} is a Gauss transformation with Gauss vector tau~^(k) = E_{n-1}...E_{k+1} tau^(k). (End of proof.)

As a consequence of the theorem, it is easy to see how to change Algorithm 3.4.1 so that upon completion, A(i,j) houses L(i,j) for all i > j. We merely apply each E_k to all the previously computed Gauss vectors. This is accomplished by changing the line "A(k,k:n) <-> A(mu,k:n)" in Algorithm 3.4.1 to "A(k,1:n) <-> A(mu,1:n)".

Example 3.4.2 The factorization PA = LU of the matrix in Example 3.4.1 is given by

    [ 0 0 1 ] [ 3 17  10 ]   [  1    0   0 ] [ 6 18 -12 ]
    [ 1 0 0 ] [ 2  4  -2 ] = [ 1/2   1   0 ] [ 0  8  16 ]
    [ 0 1 0 ] [ 6 18 -12 ]   [ 1/3 -1/4  1 ] [ 0  0   6 ]
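A compact NumPy rendering of outer-product Gaussian elimination with partial pivoting may help make the preceding remarks concrete. This is only a sketch (the function names, the permutation-vector convention, and the 0-based indexing are choices of this example, not of the text); full-row interchanges are performed so that, as noted above, the stored multipliers end up being the entries of L for the permuted matrix.

    import numpy as np

    def lu_partial_pivoting(A):
        # Outer-product Gaussian elimination with partial pivoting.
        # Returns (piv, F): piv[k] is the row interchanged with row k at step k;
        # F holds the multipliers below the diagonal and U on and above it.
        F = A.astype(float).copy()
        n = F.shape[0]
        piv = np.arange(n)
        for k in range(n - 1):
            mu = k + np.argmax(np.abs(F[k:, k]))   # largest entry in F(k:n, k)
            F[[k, mu], :] = F[[mu, k], :]          # interchange entire rows k and mu
            piv[k] = mu
            if F[k, k] != 0.0:
                F[k + 1:, k] /= F[k, k]                                    # multipliers
                F[k + 1:, k + 1:] -= np.outer(F[k + 1:, k], F[k, k + 1:])  # rank-1 update
        return piv, F

    if __name__ == "__main__":
        A = np.array([[3.0, 17.0, 10.0], [2.0, 4.0, -2.0], [6.0, 18.0, -12.0]])
        piv, F = lu_partial_pivoting(A)
        L = np.tril(F, -1) + np.eye(3)
        U = np.triu(F)
        P = np.eye(3)
        for k, mu in enumerate(piv):               # accumulate P = E_{n-1} ... E_1
            P[[k, mu], :] = P[[mu, k], :]
        print(np.allclose(P @ A, L @ U))           # True, cf. Example 3.4.2

Running the driver on the matrix of Example 3.4.1 reproduces the P, L, and U displayed in Example 3.4.2.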
3.4.5 The Gaxpy Version

In section 3.2 we developed outer product and gaxpy schemes for computing the LU factorization. Having just incorporated pivoting in the outer product version, it is natural to do the same with the gaxpy approach. Recall from (3.2.5) the general structure of the gaxpy LU process:

    L = I_n; U = 0
    for j = 1:n
        if j = 1
            v(j:n) = A(j:n,j)
        else
            Solve L(1:j-1,1:j-1) z = A(1:j-1,j) for z and set U(1:j-1,j) = z
            v(j:n) = A(j:n,j) - L(j:n,1:j-1) z
        end
        if j < n
            L(j+1:n,j) = v(j+1:n)/v(j)
        end
        U(j,j) = v(j)
    end

When row interchanges are incorporated into this process we emerge with the factorization PA = LU where P = E_{n-1}...E_1 and E_k is obtained by interchanging rows k and p(k) of the n-by-n identity. As with Algorithm 3.4.1, this procedure requires 2n^3/3 flops and O(n^2) comparisons.

3.4.6 Error Analysis

We now examine the stability that is obtained with partial pivoting. This requires an accounting of the rounding errors that are sustained during elimination and during the triangular system solving. Bearing in mind that there are no rounding errors associated with permutation, it is not hard to show using Theorem 3.3.2 that the computed solution x^ satisfies (A + E)x^ = b where

    |E| <= n u ( 3|A| + 5 P^T |L^| |U^| ) + O(u^2).

Here we are assuming that P, L^, and U^ are the computed analogs of P, L, and U as produced by the above algorithms. Pivoting implies that the elements of L^ are bounded by one, so the size of E is essentially governed by the growth in the entries of U^ during elimination. The 2^{n-1} growth bound associated with partial pivoting can be attained: if

    a_ij = 1 (i = j or j = n),    a_ij = -1 (i > j),    a_ij = 0 otherwise,

then A has an LU factorization with |l_ij| <= 1 and u_nn = 2^{n-1}.

3.4.7 Block Gaussian Elimination

Gaussian elimination with partial pivoting can be organized so that it is rich in level-3 operations. We detail a block outer product procedure, but block gaxpy and block dot product formulations are also possible. See Dayde and Duff (1988). Assume A is n-by-n and for clarity that n = rN. Partition A as follows:

    A = [ A11 A12 ]      A11: r-by-r.
        [ A21 A22 ],

The first step in the block reduction is typical and proceeds as follows:

- Use scalar Gaussian elimination with partial pivoting (e.g., a rectangular version of Algorithm 3.4.1) to compute a permutation P1 (n-by-n), a unit lower triangular [L11; L21], and an upper triangular U11 (r-by-r) so that

      P1 [ A11 ]   [ L11 ]
         [ A21 ] = [ L21 ] U11.

- Apply P1 across the rest of A:

      [ A~12 ]      [ A12 ]
      [ A~22 ] = P1 [ A22 ].

- Solve the lower triangular multiple right hand side problem L11 U12 = A~12 for U12.

- Perform the level-3 update A~22 = A~22 - L21 U12.

With these computations we obtain the factorization

    P1 A = [ L11  0 ] [ U11  U12 ]
           [ L21  I ] [  0  A~22 ]

The process is then repeated on the first r columns of A~22.

In general, during step k (1 <= k <= N-1) of the block algorithm we apply scalar Gaussian elimination to a matrix of size (n - (k-1)r)-by-r. An r-by-(n - kr) multiple right hand side system is solved and a level-3 update of size (n - kr)-by-(n - kr) is performed. The level-3 fraction for the overall process is approximately given by 1 - 3/(2N). Thus, for large N the procedure is rich in matrix multiplication.
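To make the block procedure concrete, here is one possible NumPy/SciPy sketch of a right-looking blocked LU with partial pivoting. It is an illustration only: the function name, the use of scipy.linalg.lu for the panel factorization, the explicit permutation matrices, and the return convention are assumptions of this example rather than anything prescribed by the text.

    import numpy as np
    from scipy.linalg import lu, solve_triangular

    def block_lu_partial_pivoting(A, r):
        # Right-looking blocked LU with partial pivoting, in the spirit of 3.4.7.
        # Returns P, L, U with P @ A = L @ U;  r is the panel (block column) width.
        A = A.astype(float).copy()
        n = A.shape[0]
        P = np.eye(n)
        L = np.eye(n)
        U = np.zeros((n, n))
        for k in range(0, n, r):
            e = min(k + r, n)
            # panel factorization: A(k:n, k:e) = Pp * Lp * Up (scalar GE with pivoting)
            Pp, Lp, Up = lu(A[k:, k:e])
            # apply the panel's row interchanges to the trailing columns, to the
            # already-computed part of L, and to the accumulated permutation
            A[k:, e:] = Pp.T @ A[k:, e:]
            L[k:, :k] = Pp.T @ L[k:, :k]
            P[k:, :] = Pp.T @ P[k:, :]
            L[k:, k:e] = Lp                      # L11 (unit lower) and L21 blocks
            U[k:e, k:e] = Up                     # U11 block
            if e < n:
                # multiple right hand side triangular solve:  L11 * U12 = A12
                U[k:e, e:] = solve_triangular(Lp[:e - k, :], A[k:e, e:],
                                              lower=True, unit_diagonal=True)
                # level-3 update of the trailing submatrix:  A22 <- A22 - L21 * U12
                A[e:, e:] -= L[e:, k:e] @ U[k:e, e:]
        return P, L, U

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        A = rng.standard_normal((8, 8))
        P, L, U = block_lu_partial_pivoting(A, r=3)
        print(np.allclose(P @ A, L @ U))   # True

For n = rN almost all of the work is in the trailing-submatrix update, which is a matrix multiplication, in line with the 1 - 3/(2N) level-3 fraction quoted above.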
3.4.8 Complete Pivoting

Another pivot strategy called complete pivoting has the property that the associated growth factor bound is considerably smaller than 2^{n-1}. Recall that in partial pivoting, the kth pivot is determined by scanning the current subcolumn A(k:n,k). In complete pivoting, the largest entry in the current submatrix A(k:n,k:n) is permuted into the (k,k) position. Thus, we compute the upper triangularization M_{n-1}E_{n-1}...M_1E_1 A F_1...F_{n-1} = U with the property that in step k we are confronted with the matrix

    A^(k-1) = M_{k-1}E_{k-1}...M_1E_1 A F_1...F_{k-1}

and determine interchange permutations E_k and F_k such that the entry of largest absolute value in A^(k-1)(k:n,k:n) is brought into the (k,k) position. We have the analog of Theorem 3.4.1.

Theorem 3.4.2 If Gaussian elimination with complete pivoting is used to compute the upper triangularization

    M_{n-1}E_{n-1}...M_1E_1 A F_1...F_{n-1} = U                                 (3.4.7)

then PAQ = LU where P = E_{n-1}...E_1, Q = F_1...F_{n-1}, and L is a unit lower triangular matrix with |l_ij| <= 1. The kth column of L below the diagonal is a permuted version of the kth Gauss vector. In particular, if M_k = I - tau^(k) e_k^T, then L(k+1:n,k) = g(k+1:n) where g = E_{n-1}...E_{k+1} tau^(k).

Proof. The proof is similar to the proof of Theorem 3.4.1. Details are left to the reader.

Here is Gaussian elimination with complete pivoting in detail:

Algorithm 3.4.2 (Gaussian Elimination with Complete Pivoting) This algorithm computes the complete pivoting factorization PAQ = LU where L is unit lower triangular and U is upper triangular. P = E_{n-1}...E_1 and Q = F_1...F_{n-1} are products of interchange permutations. A(1:k,k) is overwritten by U(1:k,k), k = 1:n. A(k+1:n,k) is overwritten by L(k+1:n,k), k = 1:n-1. E_k interchanges rows k and p(k). F_k interchanges columns k and q(k).

    for k = 1:n-1
        Determine mu (k <= mu <= n) and lambda (k <= lambda <= n) so
            |A(mu,lambda)| = max{ |A(i,j)| : i = k:n, j = k:n }
        A(k,1:n) <-> A(mu,1:n)
        A(1:n,k) <-> A(1:n,lambda)
        p(k) = mu;  q(k) = lambda
        if A(k,k) != 0
            rows = k+1:n
            A(rows,k) = A(rows,k)/A(k,k)
            A(rows,rows) = A(rows,rows) - A(rows,k) A(k,rows)
        end
    end

In exact arithmetic the elements of the matrix A^(k) satisfy

    |a_ij^(k)| <= k^{1/2} ( 2 * 3^{1/2} * 4^{1/3} ... k^{1/(k-1)} )^{1/2} max |a_ij|.   (3.4.8)

The upper bound is a rather slow-growing function of k. This fact, coupled with vast empirical evidence suggesting that the growth factor is always modestly sized (e.g., rho = 10), permits us to conclude that Gaussian elimination with complete pivoting is stable. The method solves a nearby linear system (A + E)x = b exactly in the sense of (3.3.1). However, there appears to be no practical justification for choosing complete pivoting over partial pivoting except in cases where rank determination is an issue.

Example 3.4.4 If Gaussian elimination with complete pivoting is applied to the problem of Example 3.3.1 using beta = 10, t = 3 floating point arithmetic, then the computed solution is x^ = [1.00, 1.00]^T. Compare with Examples 3.3.1 and 3.4.3.

3.4.10 The Avoidance of Pivoting

For certain classes of matrices it is not necessary to pivot. It is important to identify such classes because pivoting usually degrades performance. To illustrate the kind of analysis required to prove that pivoting can be safely avoided, we consider the case of diagonally dominant matrices. We say that A (n-by-n) is strictly diagonally dominant if

    |a_ii| > sum_{j != i} |a_ij|,    i = 1:n.

The following theorem shows how this property can ensure a nice, no-pivoting LU factorization.

Theorem 3.4.3 If A^T is strictly diagonally dominant, then A has an LU factorization and |l_ij| <= 1. In other words, if Algorithm 3.4.1 is applied, then P = I.

Proof. Partition A as follows:

    A = [ alpha  w^T ]
        [   v     C  ]

where alpha is 1-by-1, and note that after one step of the outer product LU process we have the factorization

    [ alpha  w^T ]   [    1      0 ] [ alpha        w^T        ]
    [   v     C  ] = [ v/alpha   I ] [   0   C - v w^T / alpha ]

The theorem follows by induction on n if we can show that the transpose of B = C - v w^T / alpha is strictly diagonally dominant. This is because we may then assume that B has an LU factorization B = L1 U1, and that implies

    A = [    1      0  ] [ alpha  w^T ]
        [ v/alpha   L1 ] [   0     U1 ]

But the proof that B^T is strictly diagonally dominant is straightforward. From the definitions we have

    sum_{i != j} |b_ij| = sum_{i != j} | c_ij - v_i w_j / alpha |
                       <= sum_{i != j} |c_ij| + (|w_j| / |alpha|) sum_{i != j} |v_i|
                        < ( |c_jj| - |w_j| ) + (|w_j| / |alpha|)( |alpha| - |v_j| )
                        = |c_jj| - |v_j w_j| / |alpha|
                       <= | c_jj - v_j w_j / alpha | = |b_jj|.
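As an illustration of Theorem 3.4.3 that is not part of the text, the following NumPy/SciPy snippet builds a matrix whose transpose is strictly diagonally dominant and checks that partial pivoting performs no interchanges; the helper name and the way the test matrix is generated are assumptions of this sketch.

    import numpy as np
    from scipy.linalg import lu

    def columns_strictly_diagonally_dominant(A):
        # True if |a_jj| > sum_{i != j} |a_ij| for every column j,
        # i.e. A^T is strictly diagonally dominant.
        absA = np.abs(A)
        return bool(np.all(2 * np.diag(absA) > absA.sum(axis=0)))

    rng = np.random.default_rng(0)
    n = 6
    A = rng.standard_normal((n, n))
    A += np.diag(np.abs(A).sum(axis=0) + 1.0)   # force column diagonal dominance

    assert columns_strictly_diagonally_dominant(A)
    P, L, U = lu(A)                             # A = P L U via partial pivoting
    print(np.allclose(P, np.eye(n)))            # True: no row interchanges occur
    print(np.max(np.abs(np.tril(L, -1))) <= 1)  # True: multipliers bounded by one

The dominance check costs only O(n^2) comparisons, which is cheap relative to the 2n^3/3 flops of the factorization, so it can be worthwhile to test for such structure before deciding whether pivoting is needed.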
3.4.11 Some Applications

We conclude with some examples that illustrate how to think in terms of matrix factorizations when confronted with various linear equation situations. (A sketch in code of these ideas is given after the notes at the end of this section.)

Suppose A is nonsingular and n-by-n and that B is n-by-p, and consider the problem of finding X (n-by-p) so that AX = B, i.e., the multiple right hand side problem. If X = [x1,...,xp] and B = [b1,...,bp] are column partitionings, then

    Compute PA = LU.
    for k = 1:p
        Solve Ly = Pb_k                                                         (3.4.9)
        Solve Ux_k = y
    end

Note that A is factored just once. If B = I_n, then we emerge with a computed A^{-1}.

As another example of getting the LU factorization "outside the loop," suppose we want to solve the linear system A^k x = b where A is n-by-n, b is an n-vector, and k is a positive integer. One approach is to compute C = A^k and then solve Cx = b. However, the matrix multiplications can be avoided altogether:

    Compute PA = LU.
    for j = 1:k
        Overwrite b with the solution to Ly = Pb.                               (3.4.10)
        Overwrite b with the solution to Ux = b.
    end

As a final example we show how to avoid the pitfall of explicit inverse computation. Suppose we are given an n-by-n matrix A and n-vectors c and d, and that we want to compute s = c^T A^{-1} d. One approach is to compute X = A^{-1} as suggested above and then compute s = c^T X d. A more economical procedure is to compute PA = LU and then solve the triangular systems Ly = Pd and Uz = y. It follows that s = c^T z. The point of this example is to stress that when a matrix inverse is encountered in a formula, we must think in terms of solving equations rather than in terms of explicit inverse formation.

Problems

P3.4.1 Let A = LU be the LU factorization of the n-by-n matrix A with |l_ij| <= 1. Let a_i^T and u_i^T denote the ith rows of A and U, respectively. Verify the equation

    u_i^T = a_i^T - sum_{j=1}^{i-1} l_ij u_j^T

and use it to show that ||U||_inf <= 2^{n-1} ||A||_inf. (Hint: Take norms and use induction.)

P3.4.2 Show that if PAQ = LU is obtained via Gaussian elimination with complete pivoting, then no element of U(i,i:n) is larger in absolute value than |u_ii|.

P3.4.3 Suppose A (n-by-n) has an LU factorization and that L and U are known. Give an algorithm which can compute the (i,j) entry of A^{-1} in approximately (n-j)^2 + (n-i)^2 flops.

P3.4.4 Suppose X^ is the computed inverse obtained via (3.4.9). Give an upper bound for ||AX^ - I||_F.

P3.4.5 Prove Theorem 3.4.2.

P3.4.6 Extend Algorithm 3.4.2 so that it can factor an arbitrary rectangular matrix.

P3.4.7 Write a detailed version of the block elimination algorithm outlined in section 3.4.7.

Notes and References for Sec. 3.4

An Algol version of Algorithm 3.4.1 is given in

H.J. Bowdler, R.S. Martin, G. Peters, and J.H. Wilkinson (1966). "Solution of Real and Complex Systems of Linear Equations," Numer. Math. 8, 217-234.

See also Wilkinson and Reinsch (1971, pp. 93-110). The conjecture that the growth factor associated with complete pivoting is bounded by n was eventually shown to be false; see Gould (1991).
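To round out the applications of section 3.4.11, here is a hedged NumPy/SciPy sketch of the "factor once, solve many times" idea; the function names are illustrative, and scipy.linalg.lu_factor / lu_solve stand in for the PA = LU computation and the two triangular solves.

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    def solve_power_system(A, b, k):
        # Solve A^k x = b as in (3.4.10): factor A once, then apply k pairs of
        # triangular solves instead of forming A^k explicitly.
        lu_piv = lu_factor(A)          # one PA = LU factorization, about 2n^3/3 flops
        x = np.array(b, dtype=float)
        for _ in range(k):
            x = lu_solve(lu_piv, x)    # Ly = Pb and Ux = y, about 2n^2 flops together
        return x

    def bilinear_form(c, A, d):
        # Compute s = c^T A^{-1} d without ever forming A^{-1}:
        # solve A z = d, then take the inner product with c.
        lu_piv = lu_factor(A)
        z = lu_solve(lu_piv, d)
        return float(c @ z)

    if __name__ == "__main__":
        rng = np.random.default_rng(2)
        A = rng.standard_normal((5, 5))
        b = rng.standard_normal(5)
        print(np.allclose(solve_power_system(A, b, 3),
                          np.linalg.solve(np.linalg.matrix_power(A, 3), b)))

The same lu_piv object also serves the multiple right hand side scheme (3.4.9), since lu_solve accepts a matrix of right hand sides.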
