Contents

1. LINEAR SPACES
1.1 Introduction
…
1.11 Inner products, Euclidean spaces. Norms
1.12 Orthogonality in a Euclidean space
1.13 Exercises
1.14 Construction of orthogonal sets. The Gram-Schmidt process
1.15 Orthogonal complements. Projections
1.16 Best approximation of elements in a Euclidean space by elements in a finite-dimensional subspace
1.17 Exercises

2. LINEAR TRANSFORMATIONS AND MATRICES
2.1 Linear transformations
2.2 Null space and range
2.3 Nullity and rank
2.4 Exercises
2.5 Algebraic operations on linear transformations
2.6 Inverses
2.7 One-to-one linear transformations
2.8 Exercises
2.9 Linear transformations with prescribed values
2.10 Matrix representations of linear transformations
2.11 Construction of a matrix representation in diagonal form
2.12 Exercises
2.13 Linear spaces of matrices
2.14 Isomorphism between linear transformations and matrices
2.15 Multiplication of matrices
2.16 Exercises
2.17 Systems of linear equations
2.18 Computation techniques
2.19 Inverses of square matrices
2.20 Exercises
2.21 Miscellaneous exercises on matrices

3. DETERMINANTS
3.1 Introduction
3.2 Motivation for the choice of axioms for a determinant function
3.3 A set of axioms for a determinant function
3.4 Computation of determinants
3.5 The uniqueness theorem
3.6 Exercises
3.7 The product formula for determinants
3.8 The determinant of the inverse of a nonsingular matrix
3.9 Determinants and independence of vectors
3.10 The determinant of a block-diagonal matrix
3.11 Exercises
3.12 Expansion formulas for determinants. Minors and cofactors
3.13 Existence of the determinant function
3.14 The determinant of a transpose
3.15 The cofactor matrix
3.16 Cramer's rule
3.17 Exercises

4. EIGENVALUES AND EIGENVECTORS
4.1 Linear transformations with diagonal matrix representations
4.2 Eigenvectors and eigenvalues of a linear transformation
4.3 Linear independence of eigenvectors corresponding to distinct eigenvalues
4.4 Exercises
4.5 The finite-dimensional case. Characteristic polynomials
4.6 Calculation of eigenvalues and eigenvectors in the finite-dimensional case
4.7 Trace of a matrix
4.8 Exercises
4.9 Matrices representing the same linear transformation. Similar matrices
4.10 Exercises

5. EIGENVALUES OF OPERATORS ACTING ON EUCLIDEAN SPACES
5.1 Eigenvalues and inner products
5.2 Hermitian and skew-Hermitian transformations
5.3 Eigenvalues and eigenvectors of Hermitian and skew-Hermitian operators
5.4 Orthogonality of eigenvectors corresponding to distinct eigenvalues
5.5 Exercises
5.6 Existence of an orthonormal set of eigenvectors for Hermitian and skew-Hermitian operators acting on finite-dimensional spaces
5.7 Matrix representations for Hermitian and skew-Hermitian operators
5.8 Hermitian and skew-Hermitian matrices. The adjoint of a matrix
5.9 Diagonalization of a Hermitian or skew-Hermitian matrix
5.10 Unitary matrices. Orthogonal matrices
5.11 Exercises
5.12 Quadratic forms
5.13 Reduction of a real quadratic form to a diagonal form
5.14 Applications to analytic geometry
5.15 Exercises
5.16 Eigenvalues of a symmetric transformation obtained as values of its quadratic form
5.17 Extremal properties of eigenvalues of a symmetric transformation
5.18 The finite-dimensional case
5.19 Unitary transformations
5.20 Exercises

6. LINEAR DIFFERENTIAL EQUATIONS
6.1 Historical introduction
6.2 Review of results concerning linear equations of first and second orders
6.3 Exercises
6.4 Linear differential equations of order n
6.5 The existence-uniqueness theorem
6.6 The dimension of the solution space of a homogeneous linear equation
6.7 The algebra of constant-coefficient operators
6.8 Determination of a basis of solutions for linear equations with constant coefficients by factorization of operators
6.9 Exercises
6.10 The relation between the homogeneous and nonhomogeneous equations
6.11 Determination of a particular solution of the nonhomogeneous equation. The method of variation of parameters
6.12 Nonsingularity of the Wronskian matrix of n independent solutions of a homogeneous linear equation
6.13 Special methods for determining a particular solution of the nonhomogeneous equation. Reduction to a system of first-order linear equations
6.14 The annihilator method for determining a particular solution of the nonhomogeneous equation
6.15 Exercises
6.16 Miscellaneous exercises on linear differential equations
6.17 Linear equations of second order with analytic coefficients
6.18 The Legendre equation
6.19 The Legendre polynomials
6.20 Rodrigues' formula for the Legendre polynomials
6.21 Exercises
6.22 The method of Frobenius
6.23 The Bessel equation
6.24 Exercises

7. SYSTEMS OF DIFFERENTIAL EQUATIONS
7.1 Introduction
7.2 Calculus of matrix functions
7.3 Infinite series of matrices. Norms of matrices
7.4 Exercises
7.5 The exponential matrix
7.6 The differential equation satisfied by $e^{tA}$
7.7 Uniqueness theorem for the matrix differential equation $F'(t) = AF(t)$
7.8 The law of exponents for exponential matrices
7.9 Existence and uniqueness theorems for homogeneous linear systems with constant coefficients
7.10 The problem of calculating $e^{tA}$
7.11 The Cayley-Hamilton theorem
7.12 Exercises
7.13 Putzer's method for calculating $e^{tA}$
7.14 Alternate methods for calculating $e^{tA}$ in special cases
7.15 Exercises
7.16 Nonhomogeneous linear systems with constant coefficients
7.17 Exercises
7.18 The general linear system $Y'(t) = P(t)Y(t) + Q(t)$
7.19 A power-series method for solving homogeneous linear systems
7.20 Exercises
7.21 Proof of the existence theorem by the method of successive approximations
7.22 The method of successive approximations applied to first-order nonlinear systems
7.23 Proof of an existence-uniqueness theorem for first-order nonlinear systems
7.24 Exercises
7.25 Successive approximations and fixed points of operators
7.26 Normed linear spaces
*7.27 Contraction operators
*7.28 Fixed-point theorem for contraction operators
*7.29 Applications of the fixed-point theorem

PART 2. NONLINEAR ANALYSIS

8. DIFFERENTIAL CALCULUS OF SCALAR AND VECTOR FIELDS
8.1 Functions from $\mathbf{R}^n$ to $\mathbf{R}^m$. Scalar and vector fields
8.2 Open balls and open sets
8.3 Exercises
8.4 Limits and continuity
8.5 Exercises
8.6 The derivative of a scalar field with respect to a vector
8.7 Directional derivatives and partial derivatives
8.8 Partial derivatives of higher order
8.9 Exercises
8.10 Directional derivatives and continuity
8.11 The total derivative
8.12 The gradient of a scalar field
8.13 A sufficient condition for differentiability
8.14 Exercises
8.15 A chain rule for derivatives of scalar fields
8.16 Applications to geometry. Level sets. Tangent planes
8.17 Exercises
8.18 Derivatives of vector fields
8.19 Differentiability implies continuity
8.20 The chain rule for derivatives of vector fields
8.21 Matrix form of the chain rule
8.22 Exercises
*8.23 Sufficient conditions for the equality of mixed partial derivatives
8.24 Miscellaneous exercises

9. APPLICATIONS OF THE DIFFERENTIAL CALCULUS
9.1 Partial differential equations
9.2 A first-order partial differential equation with constant coefficients
9.3 Exercises
9.4 The one-dimensional wave equation
9.5 Exercises
9.6 Derivatives of functions defined implicitly
9.7 Worked examples
9.8 Exercises
9.9 Maxima, minima, and saddle points
9.10 Second-order Taylor formula for scalar fields
9.11 The nature of a stationary point determined by the eigenvalues of the Hessian matrix
9.12 Second-derivative test for extrema of functions of two variables
9.13 Exercises
9.14 Extrema with constraints. Lagrange's multipliers
9.15 Exercises
9.16 The extreme-value theorem for continuous scalar fields
9.17 The small-span theorem for continuous scalar fields (uniform continuity)

10. LINE INTEGRALS
10.1 Introduction
10.2 Paths and line integrals
10.3 Other notations for line integrals
10.4 Basic properties of line integrals
10.5 Exercises
10.6 The concept of work as a line integral
10.7 Line integrals with respect to arc length
10.8 Further applications of line integrals
10.9 Exercises
10.10 Open connected sets. Independence of the path
10.11 The second fundamental theorem of calculus for line integrals
10.12 Applications to mechanics
10.13 Exercises
10.14 The first fundamental theorem of calculus for line integrals
10.15 Necessary and sufficient conditions for a vector field to be a gradient
10.16 Necessary conditions for a vector field to be a gradient
10.17 Special methods for constructing potential functions
10.18 Exercises
10.19 Applications to exact differential equations of first order
10.20 Exercises
10.21 Potential functions on convex sets

11. MULTIPLE INTEGRALS
11.1 Introduction
11.2 Partitions of rectangles. Step functions
11.3 The double integral of a step function
11.4 The definition of the double integral of a function defined and bounded on a rectangle
11.5 Upper and lower double integrals
11.6 Evaluation of a double integral by repeated one-dimensional integration
11.7 Geometric interpretation of the double integral as a volume
11.8 Worked examples
11.9 Exercises
11.10 Integrability of continuous functions
11.11 Integrability of bounded functions with discontinuities
11.12 Double integrals extended over more general regions
11.13 Applications to area and volume
11.14 Worked examples
11.15 Exercises
11.16 Further applications of double integrals
11.17 Two theorems of Pappus
11.18 Exercises
11.19 Green's theorem in the plane
11.20 Some applications of Green's theorem
11.21 A necessary and sufficient condition for a two-dimensional vector field to be a gradient
11.22 Exercises
11.23 Green's theorem for multiply connected regions
*11.24 The winding number
11.25 Exercises
11.26 Change of variables in a double integral
11.27 Special cases of the transformation formula
11.28 Exercises
11.29 Proof of the transformation formula in a special case
11.30 Proof of the transformation formula in the general case
11.31 Extensions to higher dimensions
11.32 Change of variables in an n-fold integral
11.33 Worked examples
11.34 Exercises

12. SURFACE INTEGRALS
12.1 Parametric representation of a surface
12.2 The fundamental vector product
12.3 The fundamental vector product as a normal to the surface
12.4 Exercises
12.5 Area of a parametric surface
12.6 Exercises
12.7 Surface integrals
12.8 Change of parametric representation
12.9 Other notations for surface integrals
12.10 Exercises
12.11 The theorem of Stokes
12.12 The curl and divergence of a vector field
12.13 Exercises
12.14 Further properties of the curl and divergence
12.15 Exercises
12.16 Reconstruction of a vector field from its curl
12.17 Exercises
12.18 Extensions of Stokes' theorem
12.19 The divergence theorem (Gauss' theorem)
12.20 Applications of the divergence theorem
12.21 Exercises

PART 3. SPECIAL TOPICS

13. SET FUNCTIONS AND ELEMENTARY PROBABILITY
13.1 Historical introduction
13.2 Finitely additive set functions
13.3 Finitely additive measures
13.4 Exercises
13.5 The definition of probability for finite sample spaces
13.6 Special terminology peculiar to probability theory
13.7 Exercises
13.8 Worked examples
13.9 Exercises
13.10 Some basic principles of combinatorial analysis
13.11 Exercises
13.12 Conditional probability
13.13 Independence
13.14 Exercises
13.15 Compound experiments
13.16 Bernoulli trials
13.17 The most probable number of successes in n Bernoulli trials
13.18 Exercises
13.19 Countable and uncountable sets
13.20 Exercises
13.21 The definition of probability for countably infinite sample spaces
13.22 Exercises
13.23 Miscellaneous exercises on probability

14. CALCULUS OF PROBABILITIES
14.1 The definition of probability for uncountable sample spaces
14.2 Countability of the set of points with positive probability
14.3 Random variables
14.4 Exercises
14.5 Distribution functions
14.6 Discontinuities of distribution functions
14.7 Discrete distributions. Probability mass functions
14.8 Exercises
14.9 Continuous distributions. Density functions
14.10 Uniform distribution over an interval
14.11 Cauchy's distribution
14.12 Exercises
14.13 Exponential distributions
14.14 Normal distributions
14.15 Remarks on more general distributions
14.16 Exercises
14.17 Distributions of functions of random variables
14.18 Exercises
14.19 Distributions of two-dimensional random variables
14.20 Two-dimensional discrete distributions
14.21 Two-dimensional continuous distributions. Density functions
14.22 Exercises
14.23 Distributions of functions of two random variables
14.24 Exercises
14.25 Expectation and variance
14.26 Expectation of a function of a random variable
14.27 Exercises
14.28 Chebyshev's inequality
14.29 Laws of large numbers
14.30 The central limit theorem of the calculus of probabilities
14.31 Exercises
Suggested References

15. INTRODUCTION TO NUMERICAL ANALYSIS
15.1 Historical introduction
15.2 Approximations by polynomials
15.3 Polynomial approximation and normed linear spaces
15.4 Fundamental problems in polynomial approximation
15.5 Exercises
15.6 Interpolating polynomials
15.7 Equally spaced interpolation points
15.8 Error analysis in polynomial interpolation
15.9 Exercises
15.10 Newton's interpolation formula
15.11 Equally spaced interpolation points. The forward difference operator
15.12 Factorial polynomials
15.13 Exercises
15.14 A minimum problem relative to the max norm
15.15 Chebyshev polynomials
15.16 A minimal property of Chebyshev polynomials
15.17 Application to the error formula for interpolation
15.18 Exercises
15.19 Approximate integration. The trapezoidal rule
15.20 Simpson's rule
15.21 Exercises
15.22 The Euler summation formula
15.23 Exercises
Suggested References
Answers to exercises
Index

Calculus

PART 1. LINEAR ANALYSIS

1. LINEAR SPACES

1.1 Introduction

Throughout mathematics we encounter many examples of mathematical objects that can be added to each other and multiplied by real numbers. First of all, the real numbers themselves are such objects. Other examples are real-valued functions, the complex numbers, infinite series, vectors in n-space, and vector-valued functions. In this chapter we discuss a general mathematical concept, called a linear space, which includes all these examples and many others as special cases.
Briefly, a linear space is a set of elements of any kind on which certain operations (called addition and multiplication by numbers) can be performed. In defining a linear space, we do not specify the nature of the elements nor do we tell how the operations are to be performed on them. Instead, we require that the operations have certain properties which we take as axioms for a linear space. We turn now to a detailed description of these axioms.

1.2 The definition of a linear space

Let V denote a nonempty set of objects, called elements. The set V is called a linear space if it satisfies the following ten axioms which we list in three groups.

Closure axioms

AXIOM 1. CLOSURE UNDER ADDITION. For every pair of elements x and y in V there corresponds a unique element in V called the sum of x and y, denoted by x + y.

AXIOM 2. CLOSURE UNDER MULTIPLICATION BY REAL NUMBERS. For every x in V and every real number a there corresponds an element in V called the product of a and x, denoted by ax.

Axioms for addition

AXIOM 3. COMMUTATIVE LAW. For all x and y in V, we have x + y = y + x.

AXIOM 4. ASSOCIATIVE LAW. For all x, y, and z in V, we have (x + y) + z = x + (y + z).

AXIOM 5. EXISTENCE OF ZERO ELEMENT. There is an element in V, denoted by 0, such that x + 0 = x for all x in V.

AXIOM 6. EXISTENCE OF NEGATIVES. For every x in V, the element (-1)x has the property x + (-1)x = 0.

Axioms for multiplication by numbers

AXIOM 7. ASSOCIATIVE LAW. For every x in V and all real numbers a and b, we have a(bx) = (ab)x.

AXIOM 8. DISTRIBUTIVE LAW FOR ADDITION IN V. For all x and y in V and all real a, we have a(x + y) = ax + ay.

AXIOM 9. DISTRIBUTIVE LAW FOR ADDITION OF NUMBERS. For all x in V and all real a and b, we have (a + b)x = ax + bx.

AXIOM 10. EXISTENCE OF IDENTITY. For every x in V, we have 1x = x.

Linear spaces, as defined above, are sometimes called real linear spaces to emphasize the fact that we are multiplying the elements of V by real numbers.
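The axioms can be spot-checked numerically on a concrete candidate space. The following is a minimal sketch, assuming elements of V are pairs of exact rationals (Python's Fraction standing in for real numbers) with componentwise operations; the names add, scale, ZERO, and check_axioms are hypothetical helpers introduced only for this illustration. A check on finitely many samples can expose a failing axiom, but it is not a proof that the axioms hold.

```python
from fractions import Fraction
from itertools import product

# Hypothetical model: an element of V is a pair of Fractions.
def add(x, y):
    """Componentwise sum x + y (closure, Axiom 1, holds by construction)."""
    return tuple(a + b for a, b in zip(x, y))

def scale(a, x):
    """Product ax of a scalar a and an element x (closure, Axiom 2)."""
    return tuple(a * xi for xi in x)

ZERO = (Fraction(0), Fraction(0))

def check_axioms(vectors, scalars):
    """Return True if Axioms 3 through 10 hold on the given samples."""
    for x, y, z in product(vectors, repeat=3):
        if add(x, y) != add(y, x):                      # Axiom 3
            return False
        if add(add(x, y), z) != add(x, add(y, z)):      # Axiom 4
            return False
    for x in vectors:
        if add(x, ZERO) != x:                           # Axiom 5
            return False
        if add(x, scale(Fraction(-1), x)) != ZERO:      # Axiom 6
            return False
        if scale(Fraction(1), x) != x:                  # Axiom 10
            return False
    for a, b in product(scalars, repeat=2):
        for x in vectors:
            if scale(a, scale(b, x)) != scale(a * b, x):              # Axiom 7
                return False
            if scale(a + b, x) != add(scale(a, x), scale(b, x)):      # Axiom 9
                return False
    for a in scalars:
        for x, y in product(vectors, repeat=2):
            if scale(a, add(x, y)) != add(scale(a, x), scale(a, y)):  # Axiom 8
                return False
    return True

samples = [(Fraction(1), Fraction(2)),
           (Fraction(-3, 2), Fraction(0)),
           (Fraction(5), Fraction(-7, 3))]
scalars = [Fraction(0), Fraction(2), Fraction(-1, 3)]
print(check_axioms(samples, scalars))
```

Exact rationals are used instead of floating-point numbers so that the equality tests are not disturbed by rounding.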
If real number is replaced by complex number in Axioms 2, 7, 8 and 9, the resulting structure is called a complex linear space. Sometimes a linear space is referred to as a linear vector space or simply a vector space; the numbers used as multipliers are also called scalars. A real linear space has real numbers as scalars; a complex linear space has complex numbers as scalars. Although we shall deal primarily with examples of real linear spaces, all the theorems are valid for complex linear spaces as well. When we use the term linear space without further designation, it is to be understood that the space can be real or complex.

1.3 Examples of linear spaces

If we specify the set V and tell how to add its elements and how to multiply them by numbers, we get a concrete example of a linear space. The reader can easily verify that each of the following examples satisfies all the axioms for a real linear space.

EXAMPLE 1. Let V = R, the set of all real numbers, and let x + y and ax be ordinary addition and multiplication of real numbers.

EXAMPLE 2. Let V = C, the set of all complex numbers, define x + y to be ordinary addition of complex numbers, and define ax to be multiplication of the complex number x by the real number a. Even though the elements of V are complex numbers, this is a real linear space because the scalars are real.

EXAMPLE 3. Let V = $V_n$, the vector space of all n-tuples of real numbers, with addition and multiplication by scalars defined in the usual way in terms of components.

EXAMPLE 4. Let V be the set of all vectors in $V_n$ orthogonal to a given nonzero vector N. If n = 2, this linear space is a line through 0 with N as a normal vector. If n = 3, it is a plane through 0 with N as normal vector.

The following examples are called function spaces. The elements of V are real-valued functions, with addition of two functions f and g defined in the usual way:

(f + g)(x) = f(x) + g(x)

for every real x in the intersection of the domains of f and g.
Multiplication of a function f by a real scalar a is defined as follows: af is that function whose value at each x in the domain of f is af(x). The zero element is the function whose values are everywhere zero. The reader can easily verify that each of the following sets is a function space.

EXAMPLE 5. The set of all functions defined on a given interval.

EXAMPLE 6. The set of all polynomials.

EXAMPLE 7. The set of all polynomials of degree $\le n$, where n is fixed. (Whenever we consider this set it is understood that the zero polynomial is also included.) The set of all polynomials of degree equal to n is not a linear space because the closure axioms are not satisfied. For example, the sum of two polynomials of degree n need not have degree n.

EXAMPLE 8. The set of all functions continuous on a given interval. If the interval is [a, b], we denote this space by C(a, b).

EXAMPLE 9. The set of all functions differentiable at a given point.

EXAMPLE 10. The set of all functions integrable on a given interval.

EXAMPLE 11. The set of all functions f defined at 1 with f(1) = 0. The number 0 is essential in this example. If we replace 0 by a nonzero number c, we violate the closure axioms.

EXAMPLE 12. The set of all solutions of a homogeneous linear differential equation $y'' + ay' + by = 0$, where a and b are given constants. Here again 0 is essential. The set of solutions of a nonhomogeneous differential equation does not satisfy the closure axioms.

These examples and many others illustrate how the linear space concept permeates algebra, geometry, and analysis. When a theorem is deduced from the axioms of a linear space, we obtain, in one stroke, a result valid for each concrete example. By unifying diverse examples in this way we gain a deeper insight into each. Sometimes special knowledge of one particular example helps to anticipate or interpret results valid for other examples and reveals relationships which might otherwise escape notice.
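The closure failures mentioned in Examples 7 and 11 can be made concrete. The sketch below assumes a polynomial is stored as a list of coefficients [c0, c1, ...] in increasing powers of t; the helpers poly_add, degree, and evaluate are hypothetical names chosen for this illustration, not notation from the text.

```python
def poly_add(p, q):
    """Add two polynomials given as coefficient lists [c0, c1, ...]."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))   # pad the shorter list with zeros
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def degree(p):
    """Degree of p, or None for the zero polynomial."""
    nonzero = [k for k, c in enumerate(p) if c != 0]
    return nonzero[-1] if nonzero else None

def evaluate(p, t):
    """Value of the polynomial p at the point t."""
    return sum(c * t**k for k, c in enumerate(p))

# Example 7: two polynomials of degree exactly 2 whose sum has degree 1.
p = [1, 0, 3]        # 1 + 3t^2
q = [0, 2, -3]       # 2t - 3t^2
s = poly_add(p, q)   # 1 + 2t
print(degree(p), degree(q), degree(s))

# Example 11 with 0 replaced by c = 1: f(1) = 1 and g(1) = 1,
# but (f + g)(1) = 2, so the sum leaves the set.
f = [0, 1]           # f(t) = t
g = [1]              # g(t) = 1
print(evaluate(poly_add(f, g), 1))

# With c = 0 the condition survives addition: u(1) = 0 and v(1) = 0.
u = [-1, 1]          # u(t) = t - 1
v = [2, -2]          # v(t) = 2 - 2t
print(evaluate(poly_add(u, v), 1))
```

In both failing cases the operations themselves are the usual ones; what breaks is Axiom 1, since the sum is no longer an element of the set under consideration.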
1.4 Elementary consequences of the axioms

The following theorems are easily deduced from the axioms for a linear space.

THEOREM 1.1. UNIQUENESS OF THE ZERO ELEMENT. In any linear space there is one and only one zero element.

Proof. Axiom 5 tells us that there is at least one zero element. Suppose there were two, say $0_1$ and $0_2$. Taking $x = 0_1$ and $0 = 0_2$ in Axiom 5, we obtain $0_1 + 0_2 = 0_1$. Similarly, taking $x = 0_2$ and $0 = 0_1$, we find $0_2 + 0_1 = 0_2$. But $0_1 + 0_2 = 0_2 + 0_1$ because of the commutative law, so $0_1 = 0_2$.

THEOREM 1.2. UNIQUENESS OF NEGATIVE ELEMENTS. In any linear space every element has exactly one negative. That is, for every x there is one and only one y such that x + y = 0.

Proof. Axiom 6 tells us that each x has at least one negative, namely (-1)x. Suppose x has two negatives, say $y_1$ and $y_2$. Then $x + y_1 = 0$ and $x + y_2 = 0$. Adding $y_2$ to both members of the first equation and using Axioms 5, 4, and 3, we find that

$y_2 + (x + y_1) = y_2 + 0 = y_2$,

and

$y_2 + (x + y_1) = (y_2 + x) + y_1 = 0 + y_1 = y_1 + 0 = y_1$.

Therefore $y_1 = y_2$, so x has exactly one negative, the element (-1)x.

Notation. The negative of x is denoted by -x. The difference y - x is defined to be the sum y + (-x).

The next theorem describes a number of properties which govern elementary algebraic manipulations in a linear space.

THEOREM 1.3. In a given linear space, let x and y denote arbitrary elements and let a and b denote arbitrary scalars. Then we have the following properties:
(a) 0x = 0.
(b) a0 = 0.
(c) (-a)x = -(ax) = a(-x).
(d) If ax = 0, then either a = 0 or x = 0.
(e) If ax = ay and $a \ne 0$, then x = y.
(f) If ax = bx and $x \ne 0$, then a = b.
(g) -(x + y) = (-x) + (-y) = -x - y.
(h) x + x = 2x, x + x + x = 3x, and in general, $\sum_{i=1}^{n} x = nx$.

We shall prove (a), (b), and (c) and leave the proofs of the other properties as exercises.

Proof of (a). Let z = 0x. We wish to prove that z = 0. Adding z to itself and using Axiom 9, we find that

z + z = 0x + 0x = (0 + 0)x = 0x = z.

Now add -z to both members to get z = 0.

Proof of (b). Let z = a0, add z to itself, and use Axiom 8.

Proof of (c). Let z = (-a)x.
Adding z to ax and using Axiom 9, we find that

z + ax = (-a)x + ax = (-a + a)x = 0x = 0,

so z is the negative of ax, z = -(ax). Similarly, if we add a(-x) to ax and use Axiom 8 and property (b), we find that a(-x) = -(ax).

1.5 Exercises

In Exercises 1 through 28, determine whether each of the given sets is a real linear space, if addition and multiplication by real scalars are defined in the usual way. For those that are not, tell which axioms fail to hold. The functions in Exercises 1 through 17 are real-valued. In Exercises 3, 4, and 5, each function has domain containing 0 and 1. In Exercises 7 through 12, each domain contains all real numbers.

1. All rational functions.
2. All rational functions f/g, with the degree of f $\le$ the degree of g (including f = 0).
3. All f with f(0) = f(1).
4. All f with 2f(0) = f(1).
5. All f with f(1) = 1 + f(0).
6. All step functions defined on [0, 1].
7. All f with $f(x) \to 0$ as $x \to +\infty$.
8. All even functions.
9. All odd functions.
10. All bounded functions.
11. All increasing functions.
12. All functions with period $2\pi$.
13. All f integrable on [0, 1] with $\int_0^1 f(x)\,dx = $ …
14. All f integrable on [0, 1] with $\int_0^1 f(x)\,dx = 0$.
15. All f satisfying f(x) = f(1 - x) for all x.
16. All Taylor polynomials of degree …

… $\sum c_i x_i = 0$ with not all $c_i = 0$ is said to be a nontrivial representation of 0. The set S is called independent if it is not dependent. In this case, for all choices of distinct elements $x_1, \ldots, x_k$ in S and scalars $c_1, \ldots, c_k$,

$\sum_{i=1}^{k} c_i x_i = 0$ implies $c_1 = c_2 = \cdots = c_k = 0$.

Although dependence and independence are properties of sets of elements, we also apply these terms to the elements themselves. For example, the elements in an independent set are called independent elements.

If S is a finite set, the foregoing definition agrees with that given in Volume I for the space $V_n$. However, the present definition is not restricted to finite sets.

EXAMPLE 1.
If a subset T of a set S is dependent, then S itself is dependent. This is logically equivalent to the statement that every subset of an independent set is independent.

EXAMPLE 2. If one element in S is a scalar multiple of another, then S is dependent.

EXAMPLE 3. If $0 \in S$, then S is dependent.

EXAMPLE 4. The empty set is independent.

Many examples of dependent and independent sets of vectors in $V_n$ were discussed in Volume I. The following examples illustrate these concepts in function spaces. In each case the underlying linear space V is the set of all real-valued functions defined on the real line.

EXAMPLE 5. Let $u_1(t) = \cos^2 t$, $u_2(t) = \sin^2 t$, $u_3(t) = 1$ for all real t. The Pythagorean identity shows that $u_1 + u_2 - u_3 = 0$, so the three functions $u_1, u_2, u_3$ are dependent.

EXAMPLE 6. Let $u_k(t) = t^k$ for k = 0, 1, 2, ..., and t real. The set $S = \{u_0, u_1, u_2, \ldots\}$ is independent. To prove this, it suffices to show that for each n the n + 1 polynomials $u_0, u_1, \ldots, u_n$ are independent. A relation of the form $\sum c_k u_k = 0$ means that

(1.1) $\sum_{k=0}^{n} c_k t^k = 0$

for all real t. When t = 0, this gives $c_0 = 0$. Differentiating (1.1) and setting t = 0 we find that $c_1 = 0$. Repeating the process, we find that each coefficient $c_k$ is zero.

EXAMPLE 7. If $a_1, \ldots, a_n$ are distinct real numbers, the n exponential functions

$u_1(x) = e^{a_1 x}, \ldots, u_n(x) = e^{a_n x}$

are independent. We can prove this by induction on n. The result holds trivially when n = 1. Therefore, assume it is true for n - 1 exponential functions and consider scalars $c_1, \ldots, c_n$ such that

(1.2) $\sum_{k=1}^{n} c_k e^{a_k x} = 0$.

Let $a_M$ be the largest of the n numbers $a_1, \ldots, a_n$. Multiplying both members of (1.2) by $e^{-a_M x}$, we obtain

(1.3) $\sum_{k=1}^{n} c_k e^{(a_k - a_M)x} = 0$.

If $k \ne M$, the number $a_k - a_M$ is negative. Therefore, when $x \to +\infty$ in Equation (1.3), each term with $k \ne M$ tends to zero and we find that $c_M = 0$. Deleting the Mth term from (1.2) and applying the induction hypothesis, we find that each of the remaining n - 1 coefficients $c_k$ is zero.

THEOREM 1.5. Let $S = \{x_1, \ldots, x_k\}$ be an independent set consisting of k elements in a linear space V and let L(S) be the subspace spanned by S. Then every set of k + 1 elements in L(S) is dependent.

Proof. The proof is by induction on k, the number of elements in S. First suppose k = 1. Then, by hypothesis, S consists of one element $x_1$, where $x_1 \ne 0$ since S is independent. Now take any two distinct elements $y_1$ and $y_2$ in L(S). Then each is a scalar multiple of $x_1$, say $y_1 = c_1 x_1$ and $y_2 = c_2 x_1$, where $c_1$ and $c_2$ are not both 0. Multiplying $y_1$ by $c_2$ and $y_2$ by $c_1$ and subtracting, we find that

$c_2 y_1 - c_1 y_2 = 0$.

This is a nontrivial representation of 0 so $y_1$ and $y_2$ are dependent. This proves the theorem when k = 1.

Now we assume the theorem is true for k - 1 and prove that it is also true for k. Take any set of k + 1 elements in L(S), say $T = \{y_1, y_2, \ldots, y_{k+1}\}$. We wish to prove that T is dependent. Since each $y_i$ is in L(S) we may write

(1.4) $y_i = \sum_{j=1}^{k} a_{ij} x_j$

for each i = 1, 2, ..., k + 1. We examine all the scalars $a_{i1}$ that multiply $x_1$ and split the proof into two cases according to whether all these scalars are 0 or not.

CASE 1. $a_{i1} = 0$ for every i = 1, 2, ..., k + 1. In this case the sum in (1.4) does not involve $x_1$, so each $y_i$ in T is in the linear span of the set $S' = \{x_2, \ldots, x_k\}$. But $S'$ is independent and consists of k - 1 elements. By the induction hypothesis, the theorem is true for k - 1 so the set T is dependent. This proves the theorem in Case 1.

CASE 2. Not all the scalars $a_{i1}$ are zero. Let us assume that $a_{11} \ne 0$. (If necessary, we can renumber the y's to achieve this.) Taking i = 1 in Equation (1.4) and multiplying both members by $c_i$, where $c_i = a_{i1}/a_{11}$, we get

$c_i y_1 = a_{i1} x_1 + \sum_{j=2}^{k} c_i a_{1j} x_j$.

From this we subtract Equation (1.4) to get

$c_i y_1 - y_i = \sum_{j=2}^{k} (c_i a_{1j} - a_{ij}) x_j$, for i = 2, ..., k + 1.

This equation expresses each of the k elements $c_i y_1 - y_i$ as a linear combination of the k - 1 independent elements $x_2, \ldots, x_k$.
By the induction hypothesis, the k elements $c_i y_1 - y_i$ must be dependent. Hence, for some choice of scalars $t_2, \ldots, t_{k+1}$, not all zero, we have

$\sum_{i=2}^{k+1} t_i (c_i y_1 - y_i) = 0$,

from which we find

$\left( \sum_{i=2}^{k+1} t_i c_i \right) y_1 - \sum_{i=2}^{k+1} t_i y_i = 0$.

But this is a nontrivial linear combination of $y_1, \ldots, y_{k+1}$ which represents the zero element, so the elements $y_1, \ldots, y_{k+1}$ must be dependent. This completes the proof.

1.8 Bases and dimension

DEFINITION. A finite set S of elements in a linear space V is called a finite basis for V if S is independent and spans V. The space V is called finite-dimensional if it has a finite basis, or if V consists of 0 alone. Otherwise, V is called infinite-dimensional.

THEOREM 1.6. Let V be a finite-dimensional linear space. Then every finite basis for V has the same number of elements.

Proof. Let S and T be two finite bases for V. Suppose S consists of k elements and T consists of m elements. Since S is independent and spans V, Theorem 1.5 tells us that every set of k + 1 elements in V is dependent. Therefore, every set of more than k elements in V is dependent. Since T is an independent set, we must have $m \le k$. The same argument with S and T interchanged shows that $k \le m$. Therefore k = m.

DEFINITION. If a linear space V has a basis of n elements, the integer n is called the dimension of V. We write n = dim V. If V = {0}, we say V has dimension 0.

EXAMPLE 1. The space $V_n$ has dimension n. One basis is the set of n unit coordinate vectors.

EXAMPLE 2. The space of all polynomials p(t) of degree …

… $(c_1, \ldots, c_n)$ that is uniquely determined by x. In fact, if we have another representation of x as a linear combination of $e_1, \ldots, e_n$, say $x = \sum_{i=1}^{n} d_i e_i$, then by subtraction from (1.5), we find that $\sum_{i=1}^{n} (c_i - d_i) e_i = 0$. But since the basis elements are independent, this implies $c_i = d_i$ for each i, so we have $(c_1, \ldots, c_n) = (d_1, \ldots, d_n)$.

The components of the ordered n-tuple $(c_1, \ldots, c_n)$ determined by Equation (1.5) are called the components of x relative to the ordered basis $(e_1, \ldots, e_n)$.
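As an illustration of components relative to an ordered basis, the following sketch computes them in $V_2$ by solving the equation $c_1 e_1 + c_2 e_2 = x$ with Cramer's rule. This is one convenient technique chosen for the example, not a method prescribed by the text; the function name components and the integer test vectors are assumptions made here.

```python
from fractions import Fraction

def components(e1, e2, x):
    """Components (c1, c2) of x relative to the ordered basis (e1, e2) of V_2.

    Solves c1*e1 + c2*e2 = x by Cramer's rule, using exact rationals.
    Inputs are pairs of integers.
    """
    det = e1[0] * e2[1] - e2[0] * e1[1]
    if det == 0:
        raise ValueError("e1 and e2 are dependent, so they do not form a basis")
    c1 = Fraction(x[0] * e2[1] - e2[0] * x[1], det)
    c2 = Fraction(e1[0] * x[1] - x[0] * e1[1], det)
    return c1, c2

e1, e2 = (1, 1), (1, -1)
x = (3, 1)

c1, c2 = components(e1, e2, x)
print(c1, c2)                      # x = c1*e1 + c2*e2

# Reversing the order of the basis elements reverses the components.
print(components(e2, e1, x))
```

The nonzero determinant is exactly the condition that $e_1$ and $e_2$ are independent, which is what makes the pair $(c_1, c_2)$ unique. The second call shows why the basis must be regarded as ordered.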
1.10 Exercises

In each of Exercises 1 through 10, let S denote the set of all vectors (x, y, z) in $V_3$ whose components satisfy the condition given. Determine whether S is a subspace of $V_3$. If S is a subspace, compute dim S.

1. x = 0.
2. x + y = 0.
3. x + y + z …
4. …
5. …
6. …
7. …
8. x + y = 1.
9. y = 2x and z = 3x.
10. x + y + z = 0 and x + y - z = 0.

Let $P_n$ denote the linear space of all real polynomials of degree $\le n$, where n is fixed. In each of Exercises 11 through 20, let S denote the set of all polynomials f in $P_n$ satisfying the condition given. Determine whether or not S is a subspace of $P_n$. If S is a subspace, compute dim S.

11. f(0) = 0.
12. f'(0) = 0.
13. f''(0) = 0.
14. f(0) + f'(0) …
16. f(0) = f(2).
17. f is even.
18. f is odd.
19. f has degree $\le k$, where k …

… > 0 if $x \ne 0$ (positivity).

A real linear space with an inner product is called a real Euclidean space.

Note: Taking c = 0 in (3), we find that (0, y) = 0 for all y.

In a complex linear space, an inner product (x, y) is a complex number satisfying the same axioms as those for a real inner product, except that the symmetry axiom is replaced by the relation

(1') $(x, y) = \overline{(y, x)}$ (Hermitian symmetry)

where $\overline{(y, x)}$ denotes the complex conjugate of (y, x). In the homogeneity axiom, the scalar multiplier c can be any complex number. From the homogeneity axiom and (1'), we get the companion relation

(3') $(x, cy) = \overline{(cy, x)} = \bar{c}\,\overline{(y, x)} = \bar{c}(x, y)$.

A complex linear space with an inner product is called a complex Euclidean space. (Sometimes the term unitary space is also used.) One example is the complex vector space $V_n(\mathbf{C})$ discussed briefly in Section 12.16 of Volume I.

Although we are interested primarily in examples of real Euclidean spaces, the theorems of this chapter are valid for complex Euclidean spaces as well. When we use the term Euclidean space without further designation, it is to be understood that the space can be real or complex.

The reader should verify that each of the following satisfies all the axioms for an inner product.

EXAMPLE 1.
In $V_n$ let $(x, y) = x \cdot y$, the usual dot product of $x$ and $y$.

EXAMPLE 2. If $x = (x_1, x_2)$ and $y = (y_1, y_2)$ are any two vectors in $V_2$, define $(x, y)$ by the formula

$$(x, y) = 2x_1 y_1 + x_1 y_2 + x_2 y_1 + x_2 y_2.$$

This example shows that there may be more than one inner product in a given linear space.

EXAMPLE 3. Let $C(a, b)$ denote the linear space of all real-valued functions continuous on an interval $[a, b]$. Define an inner product of two functions $f$ and $g$ by the formula

$$(f, g) = \int_a^b f(t) g(t)\, dt.$$

This formula is analogous to Equation (1.6) which defines the dot product of two vectors in $V_n$. The function values $f(t)$ and $g(t)$ play the role of the components $x_i$ and $y_i$, and integration takes the place of summation.

† In honor of Charles Hermite (1822-1901), a French mathematician who made many contributions to algebra and analysis.

EXAMPLE 4. In the space $C(a, b)$, define

$$(f, g) = \int_a^b w(t) f(t) g(t)\, dt,$$

where $w$ is a fixed positive function in $C(a, b)$. The function $w$ is called a weight function. In Example 3 we have $w(t) = 1$ for all $t$.

EXAMPLE 5. In the linear space of all real polynomials, define

$$(f, g) = \int_0^{\infty} e^{-t} f(t) g(t)\, dt.$$

Because of the exponential factor, this improper integral converges for every choice of polynomials $f$ and $g$.

THEOREM 1.8. In a Euclidean space $V$, every inner product satisfies the Cauchy-Schwarz inequality:

$$|(x, y)|^2 \le (x, x)(y, y) \quad \text{for all } x \text{ and } y \text{ in } V.$$

Moreover, the equality sign holds if and only if $x$ and $y$ are dependent.

Proof. If either $x = O$ or $y = O$ the result holds trivially, so we can assume that both $x$ and $y$ are nonzero. Let $z = ax + by$, where $a$ and $b$ are scalars to be specified later. We have the inequality $(z, z) \ge 0$ for all $a$ and $b$. When we express this inequality in terms of $x$ and $y$ with an appropriate choice of $a$ and $b$ we will obtain the Cauchy-Schwarz inequality.

To express $(z, z)$ in terms of $x$ and $y$ we use properties (1'), (2) and (3') to obtain

$$(z, z) = (ax + by,\, ax + by) = (ax, ax) + (ax, by) + (by, ax) + (by, by)$$
$$= a\bar{a}(x, x) + a\bar{b}(x, y) + b\bar{a}(y, x) + b\bar{b}(y, y) \ge 0.$$
Taking $a = (y, y)$ and cancelling the positive factor $(y, y)$ in the inequality we obtain

$$(y, y)(x, x) + \bar{b}(x, y) + b(y, x) + b\bar{b} \ge 0.$$

Now we take $b = -(x, y)$. Then $\bar{b} = -(y, x)$ and the last inequality simplifies to

$$(y, y)(x, x) \ge (x, y)(y, x) = |(x, y)|^2.$$

This proves the Cauchy-Schwarz inequality. The equality sign holds throughout the proof if and only if $z = O$. This holds, in turn, if and only if $x$ and $y$ are dependent.

EXAMPLE. Applying Theorem 1.8 to the space $C(a, b)$ with the inner product $(f, g) = \int_a^b f(t) g(t)\, dt$, we find that the Cauchy-Schwarz inequality becomes

$$\Bigl(\int_a^b f(t) g(t)\, dt\Bigr)^2 \le \Bigl(\int_a^b f^2(t)\, dt\Bigr)\Bigl(\int_a^b g^2(t)\, dt\Bigr).$$

The inner product can be used to introduce the metric concept of length in any Euclidean space.

DEFINITION. In a Euclidean space $V$, the nonnegative number $\|x\|$ defined by the equation

$$\|x\| = (x, x)^{1/2}$$

is called the norm of the element $x$.

When the Cauchy-Schwarz inequality is expressed in terms of norms, it becomes

$$|(x, y)| \le \|x\|\, \|y\|.$$

Since it may be possible to define an inner product in many different ways, the norm of an element will depend on the choice of inner product. This lack of uniqueness is to be expected. It is analogous to the fact that we can assign different numbers to measure the length of a given line segment, depending on the choice of scale or unit of measurement. The next theorem gives fundamental properties of norms that do not depend on the choice of inner product.

THEOREM 1.9. In a Euclidean space, every norm has the following properties for all elements $x$ and $y$ and all scalars $c$:
(a) $\|x\| = 0$ if $x = O$.
(b) $\|x\| > 0$ if $x \neq O$ (positivity).
(c) $\|cx\| = |c|\, \|x\|$ (homogeneity).
(d) $\|x + y\| \le \|x\| + \|y\|$ (triangle inequality).
The equality sign holds in (d) if $x = O$, if $y = O$, or if $y = cx$ for some $c > 0$.

Proof. Properties (a), (b) and (c) follow at once from the axioms for an inner product. To prove (d), we note that

$$\|x + y\|^2 = (x + y,\, x + y) = (x, x) + (y, y) + (x, y) + (y, x) = \|x\|^2 + \|y\|^2 + (x, y) + \overline{(x, y)}.$$

The sum $(x, y) + \overline{(x, y)}$ is real. The Cauchy-Schwarz inequality shows that $|(x, y)| \le \|x\|\, \|y\|$ and $|\overline{(x, y)}| \le \|x\|\, \|y\|$, so we have

$$\|x + y\|^2 \le \|x\|^2 + \|y\|^2 + 2\|x\|\, \|y\| = (\|x\| + \|y\|)^2.$$

This proves (d). When $y = cx$, where $c > 0$, we have

$$\|x + y\| = \|x + cx\| = (1 + c)\|x\| = \|x\| + \|cx\| = \|x\| + \|y\|.$$

DEFINITION. In a real Euclidean space $V$, the angle between two nonzero elements $x$ and $y$ is defined to be that number $\theta$ in the interval $0 \le \theta \le \pi$ which satisfies the equation

$$\cos\theta = \frac{(x, y)}{\|x\|\, \|y\|}. \tag{1.7}$$

Note: The Cauchy-Schwarz inequality shows that the quotient on the right of (1.7) lies in the interval $[-1, 1]$, so there is exactly one $\theta$ in $[0, \pi]$ whose cosine is equal to this quotient.

1.12 Orthogonality in a Euclidean space

DEFINITION. In a Euclidean space $V$, two elements $x$ and $y$ are called orthogonal if their inner product is zero. A subset $S$ of $V$ is called an orthogonal set if $(x, y) = 0$ for every pair of distinct elements $x$ and $y$ in $S$. An orthogonal set is called orthonormal if each of its elements has norm 1.

The zero element is orthogonal to every element of $V$; it is the only element orthogonal to itself. The next theorem shows a relation between orthogonality and independence.

THEOREM 1.10. In a Euclidean space $V$, every orthogonal set of nonzero elements is independent. In particular, in a finite-dimensional Euclidean space with $\dim V = n$, every orthogonal set consisting of $n$ nonzero elements is a basis for $V$.

Proof. Let $S$ be an orthogonal set of nonzero elements in $V$, and suppose some finite linear combination of elements of $S$ is zero, say

$$\sum_{i=1}^{k} c_i x_i = O,$$

where each $x_i \in S$. Taking the inner product of each member with $x_1$ and using the fact that $(x_i, x_1) = 0$ if $i \neq 1$, we find that $c_1(x_1, x_1) = 0$. But $(x_1, x_1) \neq 0$ since $x_1 \neq O$, so $c_1 = 0$. Repeating the argument with $x_1$ replaced by $x_j$, we find that each $c_j = 0$. This proves that $S$ is independent. If $\dim V = n$ and if $S$ consists of $n$ elements, Theorem 1.7(b) shows that $S$ is a basis for $V$.

EXAMPLE.
In the real linear space $C(0, 2\pi)$ with the inner product $(f, g) = \int_0^{2\pi} f(x) g(x)\, dx$, let $S$ be the set of trigonometric functions $\{u_0, u_1, u_2, \dots\}$ given by

$$u_0(x) = 1, \quad u_{2n-1}(x) = \cos nx, \quad u_{2n}(x) = \sin nx, \quad \text{for } n = 1, 2, \dots.$$

If $m \neq n$, we have the orthogonality relations

$$\int_0^{2\pi} u_n(x) u_m(x)\, dx = 0,$$

so $S$ is an orthogonal set. Since no member of $S$ is the zero element, $S$ is independent. The norm of each element of $S$ is easily calculated. We have $(u_0, u_0) = \int_0^{2\pi} dx = 2\pi$ and, for $n \ge 1$, we have

$$(u_{2n-1}, u_{2n-1}) = \int_0^{2\pi} \cos^2 nx\, dx = \pi, \qquad (u_{2n}, u_{2n}) = \int_0^{2\pi} \sin^2 nx\, dx = \pi.$$

Therefore, $\|u_0\| = \sqrt{2\pi}$ and $\|u_n\| = \sqrt{\pi}$ for $n \ge 1$. Dividing each $u_n$ by its norm, we obtain an orthonormal set $\{\varphi_0, \varphi_1, \varphi_2, \dots\}$ where $\varphi_n = u_n / \|u_n\|$. Thus, we have

$$\varphi_0(x) = \frac{1}{\sqrt{2\pi}}, \quad \varphi_{2n-1}(x) = \frac{\cos nx}{\sqrt{\pi}}, \quad \varphi_{2n}(x) = \frac{\sin nx}{\sqrt{\pi}}, \quad \text{for } n \ge 1.$$

In Section 1.14 we shall prove that every finite-dimensional Euclidean space has an orthogonal basis. The next theorem shows how to compute the components of an element relative to such a basis.

THEOREM 1.11. Let $V$ be a finite-dimensional Euclidean space with dimension $n$, and assume that $S = \{e_1, \dots, e_n\}$ is an orthogonal basis for $V$. If an element $x$ is expressed as a linear combination of the basis elements, say

$$x = \sum_{i=1}^{n} c_i e_i, \tag{1.8}$$

then its components relative to the ordered basis $(e_1, \dots, e_n)$ are given by the formulas

$$c_j = \frac{(x, e_j)}{(e_j, e_j)} \quad \text{for } j = 1, 2, \dots, n. \tag{1.9}$$

In particular, if $S$ is an orthonormal basis, each $c_j$ is given by

$$c_j = (x, e_j). \tag{1.10}$$

Proof. Taking the inner product of each member of (1.8) with $e_j$, we obtain

$$(x, e_j) = \sum_{i=1}^{n} c_i (e_i, e_j) = c_j (e_j, e_j)$$

since $(e_i, e_j) = 0$ if $i \neq j$. This implies (1.9), and when $(e_j, e_j) = 1$, we obtain (1.10).

If $\{e_1, \dots, e_n\}$ is an orthonormal basis, Equation (1.9) can be written in the form

$$x = \sum_{i=1}^{n} (x, e_i) e_i. \tag{1.11}$$

The next theorem shows that in a finite-dimensional Euclidean space with an orthonormal basis the inner product of two elements can be computed in terms of their components.

THEOREM 1.12. Let $V$ be a finite-dimensional Euclidean space of dimension $n$, and assume that $\{e_1, \dots, e_n\}$ is an orthonormal basis for $V$.
Then for every puir of elements x and y in V, we have (12) (9) =Z(,exG08) — Parseval’s formula). In particular, when x = y , we have (1.13) xl? = SIG, ed. Proof, ‘Taking the inner product of both members of Equation (1.11) withy and using the linearity property of the inner product, we obtain (1.12), When x = y, Equation (1.12) reduces to (1.13), Note: Equation (1.12) is named in honor of M, A. Parseval (circa 1776-1836), who obtained this type of formula in a special function space. Equation (1.13) is a generalization of the theorem of Pythagoras. 1.13 Exercises 1. Let x = (xy, ...,) andy = (5... + yn) be arbitrary vectors in Vq . In each case, determine whether (x, y) is an inner product for V;, if (x, y) is defined by the formula given, In case (x, y) is not an inner product, tell which axioms are not satisfied. : oe Hi cs yal (d) (x, y) = y xy? . 9 y= 3 a @9=( Sav) © &» ~ [Sou : © «9-3 t+ Sa- Sy. © @y=Sudy Suppose we retain the first three axioms for a real inner product (symmetry, Tinearity, and homogeneity but replace the fourth axiom by a new axiom (4’): (x, x) = 0 if and only if x = 0. Prove that either (x, x) > 0 for all x # 0 or else (x, x) < 0 for all x # 0. [Hint: Assume (x, x) > 0 for some x ¥ 0 and (y, y) <0 for some y ¥ 0. In the space spanned by {x, y), find an element 2 # 0 with (z, 2) = 0.] Prove that each of the statements in Exercises 3 through 7 is valid for all elements x and y in a real Euclidean space. 3. (%, y) = 0 if and only if lx + yll= lle - yl. 4. (, y) = 0 if and only if ix + yl? = lixl® + ily IP 5. (x,y) = 0 if and only if {|x + ey!) > ljx! for all real c. 6. & +y,x = y) =0 if and only if x! = Iipl- 7. If x and y are nonzero elements making an angle @ with each other, then fle — yl* = Yxi* + ly? — 2 [xl iyll cos 6. Exercises a1 8. In the real linear space C(l, e), define an inner product by the equation (Ka= J. dog 2 floygts) ae. 
@ IffG) = Vx, compute |i/\ (b) Find a linear polynomial g(x) = a + bx that is orthogonal to the constant function f(x) = 1 9. In the real linear space CC — 1, 1), let (f, g) = f1, f(g@) det . Consider the three functions Uy» lg, Hy given by m@=1, a= tg() = 1+ te Prove that two of them are orthogonal, two make an angle 7/3 with each other, and two make an angle 7/6 with each other. 10. In the linear space P,, of all real polynomials of degree 1. 4. Let V be the set‘of all real functions f continuous on [0, + 2) and such that the integral JF “FO ae converges. Define (f, 2) = fF e-"f(Ng(s) dt 22 Linear spaces (@ Prove that the integral for (f, g) converges absolutely for each pair of functions f and g inV. [Hin Use the Cauchy-Schwarz inequality to estimate the integral {34 et | (Ng(A)| dt] (b) Prove that V is a linear space with ce, g) as an inner product, (©) Compute (f,g) if ((O = et and g(0) = 1, where n= 0, 1,2, 15, In a complex’ Euclidean space, prove that the inner product has the following. properties for all elements x, y and z, and all complex a and 5, . (a) (ax, by) = able, y). (©) (x, ay + 62) = a(x, y) + B(x, 2). 16. Prove that the following identities are valid in every Euclidean space. (a) lx + yl? = lixl* + Il? + Gy) +O, 2. (b) lin + yl? = Ne = yl? = 2, y) + 20, 9). (© ix + yl? + be = yl? = 2 Hel + 2 My 17. Prove that the space of all complex-valued functions continuous on an interval fa, 6] becomes unitary space if we define an inner product by the formula aa) - fh woreorWae, where w is a fixed positive function, continuous on fa, b/. 1.14 Construction of orthogonal s The Gram-Scltmidt process Every finite-dimensional linear space has a finite basis, If the space is Euclidean, we can always construct an orthogonal basis. This result will be deduced as a consequence of a general theorem whose proof shows how to construct orthogonal sets in any Euclidean space, finite or infinite dimensional. 
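The construction is called the Gram-Schmidt orthogonalization process, in honor of J. P. Gram (1850-1916) and E. Schmidt (1876-1959).

THEOREM 1.13. ORTHOGONALIZATION THEOREM. Let $x_1, x_2, \dots$ be a finite or infinite sequence of elements in a Euclidean space $V$, and let $L(x_1, \dots, x_k)$ denote the subspace spanned by the first $k$ of these elements. Then there is a corresponding sequence of elements $y_1, y_2, \dots$ in $V$ which has the following properties for each integer $k$:
(a) The element $y_k$ is orthogonal to every element in the subspace $L(y_1, \dots, y_{k-1})$.
(b) The subspace spanned by $y_1, \dots, y_k$ is the same as that spanned by $x_1, \dots, x_k$:
$$L(y_1, \dots, y_k) = L(x_1, \dots, x_k).$$
(c) The sequence $y_1, y_2, \dots$ is unique, except for scalar factors. That is, if $y_1', y_2', \dots$ is another sequence of elements in $V$ satisfying properties (a) and (b) for all $k$, then for each $k$ there is a scalar $c_k$ such that $y_k' = c_k y_k$.

Proof. We construct the elements $y_1, y_2, \dots$ by induction. To start the process, we take $y_1 = x_1$. Now assume we have constructed $y_1, \dots, y_r$ so that (a) and (b) are satisfied when $k = r$. Then we define $y_{r+1}$ by the equation

$$y_{r+1} = x_{r+1} - \sum_{i=1}^{r} a_i y_i, \tag{1.14}$$

where the scalars $a_1, \dots, a_r$ are to be determined. For $j \le r$, the inner product of $y_{r+1}$ with $y_j$ is given by

$$(y_{r+1}, y_j) = (x_{r+1}, y_j) - \sum_{i=1}^{r} a_i (y_i, y_j) = (x_{r+1}, y_j) - a_j (y_j, y_j),$$

since $(y_i, y_j) = 0$ if $i \neq j$. If $y_j \neq O$, we can make $y_{r+1}$ orthogonal to $y_j$ by taking

$$a_j = \frac{(x_{r+1}, y_j)}{(y_j, y_j)}. \tag{1.15}$$

If $y_j = O$, then $y_{r+1}$ is orthogonal to $y_j$ for any choice of $a_j$, and in this case we choose $a_j = 0$. Thus, the element $y_{r+1}$ is well defined and is orthogonal to each of the earlier elements $y_1, \dots, y_r$, and therefore to every element in the subspace $L(y_1, \dots, y_r)$. This proves (a) when $k = r + 1$.

To prove (b) when $k = r + 1$, we must show that $L(y_1, \dots, y_{r+1}) = L(x_1, \dots, x_{r+1})$, given that $L(y_1, \dots, y_r) = L(x_1, \dots, x_r)$. The first $r$ elements $y_1, \dots, y_r$ are in $L(x_1, \dots, x_r)$ and hence they are in the larger subspace $L(x_1, \dots, x_{r+1})$. The new element $y_{r+1}$ given by (1.14) is a difference of two elements in $L(x_1, \dots, x_{r+1})$ so it, too, is in $L(x_1, \dots, x_{r+1})$. This proves that

$$L(y_1, \dots, y_{r+1}) \subseteq L(x_1, \dots, x_{r+1}).$$

Equation (1.14) shows that $x_{r+1}$ is the sum of two elements in $L(y_1, \dots, y_{r+1})$ so a similar argument gives the inclusion in the other direction:

$$L(x_1, \dots, x_{r+1}) \subseteq L(y_1, \dots, y_{r+1}).$$

This proves (b) when $k = r + 1$. Therefore both (a) and (b) are proved by induction on $k$.

Finally we prove (c) by induction on $k$. The case $k = 1$ is trivial. Therefore, assume (c) is true for $k = r$ and consider the element $y_{r+1}'$.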
Because of (b), this element is in $L(y_1, \dots, y_{r+1})$, so we can write

$$y_{r+1}' = \sum_{i=1}^{r+1} c_i y_i = z_r + c_{r+1} y_{r+1},$$

where $z_r \in L(y_1, \dots, y_r)$. We wish to prove that $z_r = O$. By property (a), both $y_{r+1}'$ and $c_{r+1} y_{r+1}$ are orthogonal to $z_r$. Therefore, their difference, $z_r$, is orthogonal to $z_r$. In other words, $z_r$ is orthogonal to itself, so $z_r = O$. This completes the proof of the orthogonalization theorem.

In the foregoing construction, suppose we have $y_{r+1} = O$ for some $r$. Then (1.14) shows that $x_{r+1}$ is a linear combination of $y_1, \dots, y_r$, and hence of $x_1, \dots, x_r$, so the elements $x_1, \dots, x_{r+1}$ are dependent. In other words, if the first $k$ elements $x_1, \dots, x_k$ are independent, then the corresponding elements $y_1, \dots, y_k$ are nonzero. In this case the coefficients $a_i$ in (1.14) are given by (1.15), and the formulas defining $y_1, \dots, y_k$ become

$$y_1 = x_1, \qquad y_{r+1} = x_{r+1} - \sum_{i=1}^{r} \frac{(x_{r+1}, y_i)}{(y_i, y_i)}\, y_i \quad \text{for } r = 1, 2, \dots, k - 1. \tag{1.16}$$

These formulas describe the Gram-Schmidt process for constructing an orthogonal set of nonzero elements $y_1, \dots, y_k$ which spans the same subspace as a given independent set $x_1, \dots, x_k$. In particular, if $x_1, \dots, x_k$ is a basis for a finite-dimensional Euclidean space, then $y_1, \dots, y_k$ is an orthogonal basis for the same space. We can also convert this to an orthonormal basis by normalizing each element $y_i$, that is, by dividing it by its norm. Therefore, as a corollary of Theorem 1.13 we have the following.

THEOREM 1.14. Every finite-dimensional Euclidean space has an orthonormal basis.

If $x$ and $y$ are elements in a Euclidean space, with $y \neq O$, the element

$$\frac{(x, y)}{(y, y)}\, y$$

is called the projection of $x$ along $y$. In the Gram-Schmidt process (1.16), we construct the element $y_{r+1}$ by subtracting from $x_{r+1}$ the projection of $x_{r+1}$ along each of the earlier elements $y_1, \dots, y_r$. Figure 1.1 illustrates the construction geometrically in the vector space $V_3$.

FIGURE 1.1 The Gram-Schmidt process in $V_3$. An orthogonal set $\{y_1, y_2, y_3\}$ is constructed from a given independent set $\{x_1, x_2, x_3\}$.
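The Gram-Schmidt construction described above can be sketched in a few lines for the space $V_n$ with the usual dot product. This is a minimal illustration under stated assumptions: vectors are tuples of integers or rationals, exact fractions are used to avoid rounding, and the helper names `inner` and `gram_schmidt` as well as the three sample vectors are illustrative choices. As in the proof of the orthogonalization theorem, a zero $y_i$ contributes nothing to later steps.

```python
from fractions import Fraction

def inner(x, y):
    """Usual dot product on V_n."""
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(xs):
    """Return y_1, ..., y_k built from x_1, ..., x_k.

    Each new y is x minus its projections along the earlier nonzero y_i.
    When an earlier y_i is the zero vector, its coefficient is taken to be 0.
    """
    ys = []
    for x in xs:
        y = [Fraction(v) for v in x]
        for prev in ys:
            norm_sq = inner(prev, prev)
            if norm_sq == 0:
                continue  # zero element: nothing to subtract
            a = inner(x, prev) / norm_sq          # coefficient (x, y_i) / (y_i, y_i)
            y = [yi - a * pi for yi, pi in zip(y, prev)]
        ys.append(y)
    return ys

# Three sample vectors in V_4, chosen for illustration.
xs = [(1, -1, 1, -1), (5, 1, 1, 1), (-3, -3, 1, -3)]
ys = gram_schmidt(xs)
print([[int(v) for v in y] for y in ys])
# prints [[1, -1, 1, -1], [4, 2, 0, 2], [0, 0, 0, 0]]
```

The third output is the zero vector, which signals that the three inputs are dependent. The two nonzero outputs form an orthogonal basis for the subspace they span, and dividing each by its norm would give an orthonormal basis.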
EXAMPLE 1. In $V_4$, find an orthonormal basis for the subspace spanned by the three vectors $x_1 = (1, -1, 1, -1)$, $x_2 = (5, 1, 1, 1)$, and $x_3 = (-3, -3, 1, -3)$.

Solution. Applying the Gram-Schmidt process, we find

$$y_1 = x_1 = (1, -1, 1, -1),$$
$$y_2 = x_2 - \frac{(x_2, y_1)}{(y_1, y_1)}\, y_1 = x_2 - y_1 = (4, 2, 0, 2),$$
$$y_3 = x_3 - \frac{(x_3, y_1)}{(y_1, y_1)}\, y_1 - \frac{(x_3, y_2)}{(y_2, y_2)}\, y_2 = x_3 - y_1 + y_2 = (0, 0, 0, 0).$$

Since $y_3 = O$, the three vectors $x_1, x_2, x_3$ must be dependent. But since $y_1$ and $y_2$ are nonzero, the vectors $x_1$ and $x_2$ are independent. Therefore $L(x_1, x_2, x_3)$ is a subspace of dimension 2. The set $\{y_1, y_2\}$ is an orthogonal basis for this subspace. Dividing each of $y_1$ and $y_2$ by its norm we get an orthonormal basis consisting of the two vectors

$$\frac{y_1}{\|y_1\|} = \tfrac{1}{2}(1, -1, 1, -1) \quad \text{and} \quad \frac{y_2}{\|y_2\|} = \frac{1}{\sqrt{6}}(2, 1, 0, 1).$$

EXAMPLE 2. The Legendre polynomials. In the linear space of all polynomials, with the inner product $(x, y) = \int_{-1}^{1} x(t) y(t)\, dt$, consider the infinite sequence $x_0, x_1, x_2, \dots$, where $x_n(t) = t^n$. When the orthogonalization theorem is applied to this sequence it yields another sequence of polynomials $y_0, y_1, y_2, \dots$, first encountered by the French mathematician A. M. Legendre (1752-1833) in his work on potential theory. The first few polynomials are easily calculated by the Gram-Schmidt process. First of all, we have $y_0(t) = x_0(t) = 1$. Since

$$(y_0, y_0) = \int_{-1}^{1} dt = 2 \quad \text{and} \quad (x_1, y_0) = \int_{-1}^{1} t\, dt = 0,$$

we find that

$$y_1(t) = x_1(t) - \frac{(x_1, y_0)}{(y_0, y_0)}\, y_0(t) = x_1(t) = t.$$

Next, we use the relations

$$(x_2, y_0) = \int_{-1}^{1} t^2\, dt = \tfrac{2}{3}, \quad (x_2, y_1) = \int_{-1}^{1} t^3\, dt = 0, \quad (y_1, y_1) = \int_{-1}^{1} t^2\, dt = \tfrac{2}{3}$$

to obtain

$$y_2(t) = x_2(t) - \frac{(x_2, y_0)}{(y_0, y_0)}\, y_0(t) - \frac{(x_2, y_1)}{(y_1, y_1)}\, y_1(t) = t^2 - \tfrac{1}{3}.$$

Similarly, we find that

$$y_3(t) = t^3 - \tfrac{3}{5} t, \qquad y_4(t) = t^4 - \tfrac{6}{7} t^2 + \tfrac{3}{35}, \qquad y_5(t) = t^5 - \tfrac{10}{9} t^3 + \tfrac{5}{21} t.$$

We shall encounter these polynomials again in Chapter 6 in our further study of differential equations, and we shall prove that

$$y_n(t) = \frac{n!}{(2n)!} \frac{d^n}{dt^n} (t^2 - 1)^n.$$

The polynomials $P_n$ given by

$$P_n(t) = \frac{(2n)!}{2^n (n!)^2}\, y_n(t) = \frac{1}{2^n n!} \frac{d^n}{dt^n} (t^2 - 1)^n$$

are known as the Legendre polynomials. The polynomials in the corresponding orthonormal sequence $\varphi_0, \varphi_1, \varphi_2, \dots$, given by $\varphi_n = y_n / \|y_n\|$, are called the normalized Legendre polynomials. From the formulas for $y_0, \dots,$
$y_5$ given above, we find that

$$\varphi_0(t) = \frac{1}{\sqrt{2}}, \quad \varphi_1(t) = \sqrt{\tfrac{3}{2}}\, t, \quad \varphi_2(t) = \tfrac{1}{2}\sqrt{\tfrac{5}{2}}\,(3t^2 - 1), \quad \varphi_3(t) = \tfrac{1}{2}\sqrt{\tfrac{7}{2}}\,(5t^3 - 3t),$$
$$\varphi_4(t) = \tfrac{1}{8}\sqrt{\tfrac{9}{2}}\,(35t^4 - 30t^2 + 3), \quad \varphi_5(t) = \tfrac{1}{8}\sqrt{\tfrac{11}{2}}\,(63t^5 - 70t^3 + 15t).$$

1.15 Orthogonal complements. Projections

Let $V$ be a Euclidean space and let $S$ be a finite-dimensional subspace. We wish to consider the following type of approximation problem: Given an element $x$ in $V$, to determine an element in $S$ whose distance from $x$ is as small as possible. The distance between two elements $x$ and $y$ is defined to be the norm $\|x - y\|$.

Before discussing this problem in its general form, we consider a special case, illustrated in Figure 1.2. Here $V$ is the vector space $V_3$ and $S$ is a two-dimensional subspace, a plane through the origin. Given $x$ in $V$, the problem is to find, in the plane $S$, that point $s$ nearest to $x$.

If $x \in S$, then clearly $s = x$ is the solution. If $x$ is not in $S$, then the nearest point $s$ is obtained by dropping a perpendicular from $x$ to the plane. This simple example suggests an approach to the general approximation problem and motivates the discussion that follows.

DEFINITION. Let $S$ be a subset of a Euclidean space $V$. An element in $V$ is said to be orthogonal to $S$ if it is orthogonal to every element of $S$. The set of all elements orthogonal to $S$ is denoted by $S^\perp$ and is called "$S$ perpendicular."

It is a simple exercise to verify that $S^\perp$ is a subspace of $V$, whether or not $S$ itself is one. In case $S$ is a subspace, then $S^\perp$ is called the orthogonal complement of $S$.

EXAMPLE. If $S$ is a plane through the origin, as shown in Figure 1.2, then $S^\perp$ is a line through the origin perpendicular to this plane. This example also gives a geometric interpretation for the next theorem.

FIGURE 1.2 Geometric interpretation of the orthogonal decomposition theorem in $V_3$.

THEOREM 1.15. ORTHOGONAL DECOMPOSITION THEOREM. Let $V$ be a Euclidean space and let $S$ be a finite-dimensional subspace of $V$.
Then every element $x$ in $V$ can be represented uniquely as a sum of two elements, one in $S$ and one in $S^\perp$. That is, we have

$$x = s + s^\perp, \tag{1.17}$$

where $s \in S$ and $s^\perp \in S^\perp$. Moreover, the norm of $x$ is given by the Pythagorean formula

$$\|x\|^2 = \|s\|^2 + \|s^\perp\|^2. \tag{1.18}$$

Proof. First we prove that an orthogonal decomposition (1.17) actually exists. Since $S$ is finite-dimensional, it has a finite orthonormal basis, say $\{e_1, \dots, e_n\}$. Given $x$, define the elements $s$ and $s^\perp$ as follows:

$$s = \sum_{i=1}^{n} (x, e_i) e_i, \qquad s^\perp = x - s. \tag{1.19}$$

Note that each term $(x, e_i) e_i$ is the projection of $x$ along $e_i$. The element $s$ is the sum of the projections of $x$ along each basis element. Since $s$ is a linear combination of the basis elements, $s$ lies in $S$. The definition of $s^\perp$ shows that Equation (1.17) holds. To prove that $s^\perp$ lies in $S^\perp$, we consider the inner product of $s^\perp$ and any basis element $e_j$. We have

$$(s^\perp, e_j) = (x - s, e_j) = (x, e_j) - (s, e_j).$$

But from (1.19), we find that $(s, e_j) = (x, e_j)$, so $s^\perp$ is orthogonal to $e_j$. Therefore $s^\perp$ is orthogonal to every element in $S$, which means that $s^\perp \in S^\perp$.

Next we prove that the orthogonal decomposition (1.17) is unique. Suppose that $x$ has two such representations, say

$$x = s + s^\perp \quad \text{and} \quad x = t + t^\perp, \tag{1.20}$$

where $s$ and $t$ are in $S$, and $s^\perp$ and $t^\perp$ are in $S^\perp$. We wish to prove that $s = t$ and $s^\perp = t^\perp$. From (1.20), we have $s - t = t^\perp - s^\perp$, so we need only prove that $s - t = O$. But $s - t \in S$ and $t^\perp - s^\perp \in S^\perp$, so $s - t$ is both orthogonal to $t^\perp - s^\perp$ and equal to $t^\perp - s^\perp$. Since the zero element is the only element orthogonal to itself, we must have $s - t = O$. This shows that the decomposition is unique.

Finally, we prove that the norm of $x$ is given by the Pythagorean formula. We have

$$\|x\|^2 = (x, x) = (s + s^\perp,\, s + s^\perp) = (s, s) + (s^\perp, s^\perp),$$

the remaining terms being zero since $s$ and $s^\perp$ are orthogonal. This proves (1.18).

DEFINITION. Let $S$ be a finite-dimensional subspace of a Euclidean space $V$, and let $\{e_1, \dots, e_n\}$ be an orthonormal basis for $S$. If $x \in V$, the element $s$ defined by the equation

$$s = \sum_{i=1}^{n} (x, e_i) e_i$$

is called the projection of $x$ on the subspace $S$.
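The decomposition $x = s + s^\perp$ and the Pythagorean formula (1.18) can be checked on a small case. The sketch below works in $V_3$ with the usual dot product and assumes the subspace is given by an orthogonal (not necessarily orthonormal) basis; each term is then $\frac{(x, e)}{(e, e)}\, e$, which is the same element as $(x, e')e'$ for the normalized vector $e' = e/\|e\|$. The names `inner` and `project` and the sample vectors are illustrative choices.

```python
from fractions import Fraction

def inner(x, y):
    """Usual dot product, computed exactly with rationals."""
    return sum(Fraction(a) * Fraction(b) for a, b in zip(x, y))

def project(x, orthogonal_basis):
    """Projection of x on the subspace spanned by mutually orthogonal nonzero vectors."""
    s = [Fraction(0)] * len(x)
    for e in orthogonal_basis:
        coeff = inner(x, e) / inner(e, e)   # component of x along e
        s = [si + coeff * ei for si, ei in zip(s, e)]
    return s

# A plane S in V_3 spanned by two orthogonal vectors, and a sample element x.
basis = [(1, 1, 1), (1, -1, 0)]
x = (1, 2, 3)

s = project(x, basis)                          # the part of x lying in S
s_perp = [xi - si for xi, si in zip(x, s)]     # the part orthogonal to S

# s_perp is orthogonal to each basis element of S.
print(all(inner(s_perp, e) == 0 for e in basis))             # True
# Pythagorean formula: ||x||^2 = ||s||^2 + ||s_perp||^2.
print(inner(x, x) == inner(s, s) + inner(s_perp, s_perp))    # True
```

Here $s = (\tfrac{3}{2}, \tfrac{5}{2}, 2)$ and $s^\perp = (-\tfrac{1}{2}, -\tfrac{1}{2}, 1)$, with $\|x\|^2 = 14 = \tfrac{25}{2} + \tfrac{3}{2}$.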
We prove next that the projection of $x$ on $S$ is the solution to the approximation problem stated at the beginning of this section.

1.16 Best approximation of elements in a Euclidean space by elements in a finite-dimensional subspace

THEOREM 1.16. APPROXIMATION THEOREM. Let $S$ be a finite-dimensional subspace of a Euclidean space $V$, and let $x$ be any element of $V$. Then the projection of $x$ on $S$ is nearer to $x$ than any other element of $S$. That is, if $s$ is the projection of $x$ on $S$, we have

$$\|x - s\| \le \|x - t\|$$

for all $t$ in $S$; the equality sign holds if and only if $t = s$.

Proof. By Theorem 1.15 we can write $x = s + s^\perp$, where $s \in S$ and $s^\perp \in S^\perp$. Then, for any $t$ in $S$, we have

$$x - t = (x - s) + (s - t).$$

Since $s - t \in S$ and $x - s = s^\perp \in S^\perp$, this is an orthogonal decomposition of $x - t$, so its norm is given by the Pythagorean formula

$$\|x - t\|^2 = \|x - s\|^2 + \|s - t\|^2.$$

But $\|s - t\|^2 \ge 0$, so we have $\|x - t\|^2 \ge \|x - s\|^2$, with equality holding if and only if $s = t$. This completes the proof.

EXAMPLE 1. Approximation of continuous functions on $[0, 2\pi]$ by trigonometric polynomials. Let $V = C(0, 2\pi)$, the linear space of all real functions continuous on the interval $[0, 2\pi]$, and define an inner product by the equation $(f, g) = \int_0^{2\pi} f(x) g(x)\, dx$. In Section 1.12 we exhibited an orthonormal set of trigonometric functions $\varphi_0, \varphi_1, \varphi_2, \dots$, where

$$\varphi_0(x) = \frac{1}{\sqrt{2\pi}}, \quad \varphi_{2k-1}(x) = \frac{\cos kx}{\sqrt{\pi}}, \quad \varphi_{2k}(x) = \frac{\sin kx}{\sqrt{\pi}}, \quad \text{for } k \ge 1. \tag{1.21}$$

The $2n + 1$ elements $\varphi_0, \varphi_1, \dots, \varphi_{2n}$ span a subspace $S$ of dimension $2n + 1$. The elements of $S$ are called trigonometric polynomials.

If $f \in C(0, 2\pi)$, let $f_n$ denote the projection of $f$ on the subspace $S$. Then we have

$$f_n = \sum_{k=0}^{2n} (f, \varphi_k) \varphi_k, \quad \text{where } (f, \varphi_k) = \int_0^{2\pi} f(x) \varphi_k(x)\, dx. \tag{1.22}$$

The numbers $(f, \varphi_k)$ are called Fourier coefficients of $f$. Using the formulas in (1.21), we can rewrite (1.22) in the form

$$f_n(x) = \tfrac{1}{2} a_0 + \sum_{k=1}^{n} (a_k \cos kx + b_k \sin kx), \tag{1.23}$$

where

$$a_k = \frac{1}{\pi} \int_0^{2\pi} f(x) \cos kx\, dx, \qquad b_k = \frac{1}{\pi} \int_0^{2\pi} f(x) \sin kx\, dx$$

for $k = 0, 1, 2, \dots, n$.
The approximation theorem tells us that the trigonometric poly- nomial in (1.23) approximates f beter than any other trigonometric polynomial in S, in the sense that the norm |f — f,| is as small as possible. wxwerz 2. Approximation of continuous functions on [— 1, 1] by polynomials of degree Since (f, 7) = 0, this is also the nearest quadratic approximation. 1.17 Exercises 1. In each case, find an orthonormal basis for the subspace of Vg spanned by the given vectors. 1,1), x = (1,0,1), Xs = 2,3) (b) x 1), my =(-b1-), m= 0, D 2, In each case, find an orthonormal basis for the subspace of V, spanned by the given vectors. @ m= (61,0,0, m=O1,10, x= (0,01, iD, = (1,0,0, D. () =O, 1,0,D, = 1,0,2,0, — ¥% = (1,2, -2, D. In the real linear space C(O, 7), with inner product (x, 3) = f§ x(y(t) dé, let x,(0) = cos nt fom =0,1,2,.... Prove that the functions yo, Vis Yes... given by for n > 1, form an orthonormal set spanning the same subspace as Xo. 15 Xe». « 4, In the linear space of all real polynomials, with inner product (x,y) = f§ x@y(@ dr, let x() = tM for m= 0, 1, 2, Prove that the functions WO =1, WO =VI@-D, xf) = V5 GP - 6 + 1) form an orthonormal set spanning the same subspace as {x9 , x1, 3) 5. Let V be the linear space of all real functions £ continuous on [0, + 9) and such that the integral Jz? e~'f*(t) dt converges. Define (f,g) = fe etf(g() dt, and let Yo. yrsY2s.- «5 be the set Obtained by applying the Gram-Schmide process (© x9, xy, Xe + - + > Where x,(1) = for n= 0. Prove that y(t) =1, w= t= 1, w= P= He + 2 y(t) = P= 9 + 181 ~ 6 © C(I, 3) with inner product (fag) = ff (g(a) de let fd) = Ve and show that the constant polynomial g nearest to f is g = } log 3. impute lig =f |? for this g 7. In the real linear space C(O, 2) with inner product (/, g) = a sf g(x) dx, let foo) = & and show that the cor this g. 8, In the real linear space CC — 1, 1) with inner product (f. g)= J14 f (g(X) dee. 
let foe) = and find the linear polynomial g nearest to f Compute [ig — a for this. g. 9. In the real linear space C(O, 2) with inner product (f,g) = J3" flags) dx, let fle) = In the subspace spanned by u(x) = 1, (x) = cos X, w(x) = sin x, find the trigonometric polynomial nearest tof. 10. In the linear space V of Exercise 5, let f (2) tof tant polynomial g nearest to f is g = }(e? = 1). Compute lig -f |? for e~* and find the linear polynomial that is nearest : LINEAR TRANSFORMATIONS AND MATRICES 2.1 Linear transformations One of the ultimate goals of analysis is a comprehensive study of functions whose domains and ranges are subsets of linear spaces. Such funetions are called transformations, ‘mappings, ot operators. This chapter treats the simplest examples, called linear transforma- tions, which occur in all branches of mathematics. Properties of more general transforma- tions are often obtained by approximating them by linear transformations. First we introduce some notation and terminology conceming arbitrary functions. Let V and W be two sets. The symbol ce ae will be used to indicate that T is a function whose domain is V and whose values are in W. For each x in V, the element Ys) in W is called the image of x under T, and we say that T maps x onto Tix). Wf A is any subset of V, the set of all images T(x) for x in A is called the image of A under T and is denoted by TVA). ‘The image of the domain V, TYV), is the range of T. Now we assume that V and Ware linear spaces having the same set of scalars, and we define a linear transformation as follows. perimtiox. If Vand Ware linear spaces, a function T: V > W is called a linear trans- formation of V into W if it has the following two propertie (@ T(x + y) = Tie) + Ty) — forall xand yin V, () T(ex) = eT) forall x in Vand all scalars c. ‘These properties are verbalized by saying that T’ preserves addition and multiplication by scalars. 
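The two defining properties of a linear transformation can be spot-checked numerically. The sketch below is only an illustration: passing the check on finitely many samples does not prove linearity, but a single failure does disprove it. The maps `T` and `U`, the helper names, and the sample vectors and scalars are hypothetical choices made for this example.

```python
def add(x, y):
    """Componentwise sum of two vectors given as tuples."""
    return tuple(a + b for a, b in zip(x, y))

def scale(c, x):
    """Product of the scalar c with the vector x."""
    return tuple(c * a for a in x)

def respects_linearity(T, vectors, scalars):
    """Check additivity and homogeneity on the given samples only."""
    for x in vectors:
        for y in vectors:
            if T(add(x, y)) != add(T(x), T(y)):
                return False          # additivity fails for this pair
        for c in scalars:
            if T(scale(c, x)) != scale(c, T(x)):
                return False          # homogeneity fails for this scalar
    return True

def T(v):
    """A map of V_2 into V_2 that is linear."""
    x, y = v
    return (x + y, x - y)

def U(v):
    """A translation of V_2, which is not linear since it moves the origin."""
    x, y = v
    return (x + 1, y + 1)

vectors = [(0, 0), (1, 0), (0, 1), (2, -3)]
scalars = [0, 1, -2, 5]
print(respects_linearity(T, vectors, scalars))  # True
print(respects_linearity(U, vectors, scalars))  # False
```

The translation `U` fails at once because it does not map the zero element onto the zero element.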
The two properties can be combined into one formula which states that

$$T(ax + by) = aT(x) + bT(y)$$

for all $x, y$ in $V$ and all scalars $a$ and $b$. By induction, we also have the more general relation

$$T\Bigl(\sum_{i=1}^{n} a_i x_i\Bigr) = \sum_{i=1}^{n} a_i T(x_i)$$

for any $n$ elements $x_1, \dots, x_n$ in $V$ and any $n$ scalars $a_1, \dots, a_n$.

The reader can easily verify that the following examples are linear transformations.

EXAMPLE 1. The identity transformation. The transformation $T: V \to V$, where $T(x) = x$ for each $x$ in $V$, is called the identity transformation and is denoted by $I$ or by $I_V$.

EXAMPLE 2. The zero transformation. The transformation $T: V \to V$ which maps each element of $V$ onto $O$ is called the zero transformation and is denoted by $O$.

EXAMPLE 3. Multiplication by a fixed scalar $c$. Here we have $T: V \to V$, where $T(x) = cx$ for all $x$ in $V$. When $c = 1$, this is the identity transformation. When $c = 0$, it is the zero transformation.

EXAMPLE 4. Linear equations. Let $V = V_n$ and $W = V_m$. Given $mn$ real numbers $a_{ik}$, where $i = 1, 2, \dots, m$ and $k = 1, 2, \dots, n$, define $T: V_n \to V_m$ as follows: $T$ maps each vector $x = (x_1, \dots, x_n)$ in $V_n$ onto the vector $y = (y_1, \dots, y_m)$ in $V_m$ according to the equations

$$y_i = \sum_{k=1}^{n} a_{ik} x_k \quad \text{for } i = 1, 2, \dots, m.$$

EXAMPLE 5. Inner product with a fixed element. Let $V$ be a real Euclidean space. For a fixed element $z$ in $V$, define $T: V \to \mathbf{R}$ as follows: If $x \in V$, then $T(x) = (x, z)$, the inner product of $x$ with $z$.

EXAMPLE 6. Projection on a subspace. Let $V$ be a Euclidean space and let $S$ be a finite-dimensional subspace of $V$. Define $T: V \to S$ as follows: If $x \in V$, then $T(x)$ is the projection of $x$ on $S$.

EXAMPLE 7. The differentiation operator. Let $V$ be the linear space of all real functions $f$ differentiable on an open interval $(a, b)$. The linear transformation which maps each function $f$ in $V$ onto its derivative $f'$ is called the differentiation operator and is denoted by $D$. Thus, we have $D: V \to W$, where $D(f) = f'$ for each $f$ in $V$. The space $W$ consists of all derivatives $f'$.

EXAMPLE 8. The integration operator.
Let $V$ be the linear space of all real functions continuous on an interval $[a, b]$. If $f \in V$, define $g = T(f)$ to be that function in $V$ given by

$$g(x) = \int_a^x f(t)\, dt \quad \text{if } a \le x \le b.$$

This transformation $T$ is called the integration operator.

2.2 Null space and range

In this section, $T$ denotes a linear transformation of a linear space $V$ into a linear space $W$.

THEOREM 2.1. The set $T(V)$ (the range of $T$) is a subspace of $W$. Moreover, $T$ maps the zero element of $V$ onto the zero element of $W$.

Proof. To prove that $T(V)$ is a subspace of $W$, we need only verify the closure axioms. Take any two elements of $T(V)$, say $T(x)$ and $T(y)$. Then $T(x) + T(y) = T(x + y)$, so $T(x) + T(y)$ is in $T(V)$. Also, for any scalar $c$ we have $cT(x) = T(cx)$, so $cT(x)$ is in $T(V)$. Therefore, $T(V)$ is a subspace of $W$. Taking $c = 0$ in the relation $T(cx) = cT(x)$, we find that $T(O) = O$.

DEFINITION. The set of all elements in $V$ that $T$ maps onto $O$ is called the null space of $T$ and is denoted by $N(T)$. Thus, we have

$$N(T) = \{x \mid x \in V \text{ and } T(x) = O\}.$$

The null space is sometimes called the kernel of $T$.

THEOREM 2.2. The null space of $T$ is a subspace of $V$.

Proof. If $x$ and $y$ are in $N(T)$, then so are $x + y$ and $cx$ for all scalars $c$, since

$$T(x + y) = T(x) + T(y) = O \quad \text{and} \quad T(cx) = cT(x) = O.$$

The following examples describe the null spaces of the linear transformations given in Section 2.1.

EXAMPLE 1. Identity transformation. The null space is $\{O\}$, the subspace consisting of the zero element alone.

EXAMPLE 2. Zero transformation. Since every element of $V$ is mapped onto zero, the null space is $V$ itself.

EXAMPLE 3. Multiplication by a fixed scalar $c$. If $c \neq 0$, the null space contains only $O$. If $c = 0$, the null space is $V$.

EXAMPLE 4. Linear equations. The null space consists of all vectors $(x_1, \dots, x_n)$ in $V_n$ for which

$$\sum_{k=1}^{n} a_{ik} x_k = 0 \quad \text{for } i = 1, 2, \dots, m.$$

EXAMPLE 5. Inner product with a fixed element $z$. The null space consists of all elements in $V$ orthogonal to $z$.

EXAMPLE 6. Projection on a subspace $S$.
If $x \in V$, we have the unique orthogonal decomposition $x = s + s^\perp$ (by Theorem 1.15). Since $T(x) = s$, we have $T(x) = O$ if and only if $x = s^\perp$. Therefore, the null space is $S^\perp$, the orthogonal complement of $S$.

EXAMPLE 7. Differentiation operator. The null space consists of all functions that are constant on the given interval.

EXAMPLE 8. Integration operator. The null space contains only the zero function.

2.3 Nullity and rank

Again in this section $T$ denotes a linear transformation of a linear space $V$ into a linear space $W$. We are interested in the relation between the dimensionality of $V$, of the null space $N(T)$, and of the range $T(V)$. If $V$ is finite-dimensional, then the null space is also finite-dimensional since it is a subspace of $V$. The dimension of $N(T)$ is called the nullity of $T$. In the next theorem, we prove that the range is also finite-dimensional; its dimension is called the rank of $T$.

THEOREM 2.3. NULLITY PLUS RANK THEOREM. If $V$ is finite-dimensional, then $T(V)$ is also finite-dimensional, and we have

$$\dim N(T) + \dim T(V) = \dim V. \tag{2.1}$$

In other words, the nullity plus the rank of a linear transformation is equal to the dimension of its domain.

Proof. Let $n = \dim V$ and let $e_1, \dots, e_k$ be a basis for $N(T)$, where $k = \dim N(T) \le n$. By Theorem 1.7, these elements are part of some basis for $V$, say the basis

$$e_1, \dots, e_k, e_{k+1}, \dots, e_{k+r}, \tag{2.2}$$

where $k + r = n$. We shall prove that the $r$ elements

$$T(e_{k+1}), \dots, T(e_{k+r}) \tag{2.3}$$

form a basis for $T(V)$, thus proving that $\dim T(V) = r$. Since $k + r = n$, this also proves (2.1).

First we show that the $r$ elements in (2.3) span $T(V)$. If $y \in T(V)$, we have $y = T(x)$ for some $x$ in $V$, and we can write $x = c_1 e_1 + \cdots + c_{k+r} e_{k+r}$. Hence, we have

$$y = T(x) = \sum_{i=1}^{k+r} c_i T(e_i) = \sum_{i=1}^{k} c_i T(e_i) + \sum_{i=k+1}^{k+r} c_i T(e_i) = \sum_{i=k+1}^{k+r} c_i T(e_i)$$

since $T(e_1) = \cdots = T(e_k) = O$. This shows that the elements in (2.3) span $T(V)$.

Now we show that these elements are independent. Suppose that there are scalars $c_{k+1}, \dots, c_{k+r}$ such that

$$\sum_{i=k+1}^{k+r} c_i T(e_i) = O.$$
This implies that
$$T\Bigl(\sum_{i=k+1}^{k+r} c_i e_i\Bigr) = O,$$
so the element $x = c_{k+1} e_{k+1} + \cdots + c_{k+r} e_{k+r}$ is in the null space $N(T)$. This means there are scalars $c_1, \ldots, c_k$ such that $x = c_1 e_1 + \cdots + c_k e_k$, so we have
$$x - x = \sum_{i=1}^{k} c_i e_i - \sum_{i=k+1}^{k+r} c_i e_i = O.$$
But since the elements in (2.2) are independent, this implies that all the scalars $c_i$ are zero. Therefore, the elements in (2.3) are independent.

Note: If $V$ is infinite-dimensional, then at least one of $N(T)$ or $T(V)$ is infinite-dimensional. A proof of this fact is outlined in Exercise 30 of Section 2.4.

2.4 Exercises

In each of Exercises 1 through 10, a transformation $T: V_2 \to V_2$ is defined by the formula given for $T(x, y)$, where $(x, y)$ is an arbitrary point in $V_2$. In each case determine whether $T$ is linear. If $T$ is linear, describe its null space and range, and compute its nullity and rank.
1. $T(x, y) = (y, x)$.
2. $T(x, y) = (x, -y)$.
3. $T(x, y) = (x, 0)$.
4. $T(x, y) = (x, x)$.
5. $T(x, y) = (x^2, y^2)$.
6. $T(x, y) = (e^x, e^y)$.
7. $T(x, y) = (x, 1)$.
8. $T(x, y) = (x + 1, y + 1)$.
9. $T(x, y) = (x - y, x + y)$.
10. $T(x, y) = (2x - y, x + y)$.

Do the same as above for each of Exercises 11 through 15 if the transformation $T: V_2 \to V_2$ is described as indicated.
11. $T$ rotates every point through the same angle $\phi$ about the origin. That is, $T$ maps a point with polar coordinates $(r, \theta)$ onto the point with polar coordinates $(r, \theta + \phi)$, where $\phi$ is fixed. Also, $T$ maps $O$ onto itself.
12. $T$ maps each point onto its reflection with respect to a fixed line through the origin.
13. $T$ maps every point onto the point $(1, 1)$.
14. $T$ maps each point with polar coordinates $(r, \theta)$ onto the point with polar coordinates $(2r, \theta)$. Also, $T$ maps $O$ onto itself.
15. $T$ maps each point with polar coordinates $(r, \theta)$ onto the point with polar coordinates $(r, 2\theta)$. Also, $T$ maps $O$ onto itself.

Do the same as above in each of Exercises 16 through 23 if a transformation $T: V_3 \to V_3$ is defined by the formula given for $T(x, y, z)$, where $(x, y, z)$ is an arbitrary point of $V_3$.
16. $T(x, y, z) = (z, y, x)$.
17. $T(x, y, z) = (x, y, 0)$.
18. $T(x, y, z) = (x, 2y, 3z)$.
19. $T(x, y, z) = (x, y, 1)$.
20. $T(x, y, z) = (x + 1, y + 1, z - 1)$.
21. $T(x, y, z) = (x + 1, y + 2, z + 3)$.
22. $T(x, y, z) = (x, y^2, z^3)$.
23. $T(x, y, z) = (x + z, 0, x + y)$.

In each of Exercises 24 through 27, a transformation $T: V \to V$ is described as indicated. In each case, determine whether $T$ is linear. If $T$ is linear, describe its null space and range, and compute the nullity and rank when they are finite.
24. Let $V$ be the linear space of all real polynomials $p(x)$ of degree $\le n$. If $p \in V$, $q = T(p)$ means that $q(x) = p(x + 1)$ for all real $x$.
25. Let $V$ be the linear space of all real functions differentiable on the open interval $(-1, 1)$. If $f \in V$, $g = T(f)$ means that $g(x) = x f'(x)$ for all $x$ in $(-1, 1)$.
26. Let $V$ be the linear space of all real functions continuous on $[a, b]$. If $f \in V$, $g = T(f)$ means that
$$g(x) = \int_a^b f(t) \sin(x - t)\,dt \quad \text{for } a \le x \le b.$$
[...]
Let $T: V \to V$ be the linear transformation defined as follows: If $f \in V$, $g = T(f)$ means that
$$g(x) = \int_{-\pi}^{\pi} \{1 + \cos(x - t)\} f(t)\,dt.$$
(d) Prove that $T(V)$, the range of $T$, is finite-dimensional and find a basis for $T(V)$.
(e) Determine the null space of $T$.
(f) Find all real $c \ne 0$ and all nonzero $f$ in $V$ such that $T(f) = cf$. (Note that such an $f$ lies in the range of $T$.)
30. Let $T: V \to W$ be a linear transformation of a linear space $V$ into a linear space $W$. If $V$ is infinite-dimensional, prove that at least one of $T(V)$ or $N(T)$ is infinite-dimensional. [Hint: Assume $\dim N(T) = k$, $\dim T(V) = r$, let $e_1, \ldots, e_k$ be a basis for $N(T)$ and let $e_1, \ldots, e_k, e_{k+1}, \ldots, e_{k+n}$ be independent elements in $V$, where $n > r$. The elements $T(e_{k+1}), \ldots, T(e_{k+n})$ are dependent since $n > r$. Use this fact to obtain a contradiction.]

2.5 Algebraic operations on linear transformations

Functions whose values lie in a given linear space $W$ can be added to each other and can be multiplied by the scalars in $W$ according to the following definition.

DEFINITION. Let $S: V \to W$ and $T: V \to W$ be two functions with a common domain $V$ and with values in a linear space $W$.
If $c$ is any scalar in $W$, we define the sum $S + T$ and the product $cT$ by the equations
$$(S + T)(x) = S(x) + T(x), \qquad (cT)(x) = cT(x) \tag{2.4}$$
for all $x$ in $V$.

We are especially interested in the case where $V$ is also a linear space having the same scalars as $W$. In this case we denote by $\mathscr{L}(V, W)$ the set of all linear transformations of $V$ into $W$.

If $S$ and $T$ are two linear transformations in $\mathscr{L}(V, W)$, it is an easy exercise to verify that $S + T$ and $cT$ are also linear transformations in $\mathscr{L}(V, W)$. More than this is true. With the operations just defined, the set $\mathscr{L}(V, W)$ itself becomes a new linear space. The zero transformation serves as the zero element of this space, and the transformation $(-1)T$ is the negative of $T$. It is a straightforward matter to verify that all ten axioms for a linear space are satisfied. Therefore, we have the following.

THEOREM 2.4. The set $\mathscr{L}(V, W)$ of all linear transformations of $V$ into $W$ is a linear space with the operations of addition and multiplication by scalars defined as in (2.4).

A more interesting algebraic operation on linear transformations is composition or multiplication of transformations. This operation makes no use of the algebraic structure of a linear space and can be defined quite generally as follows.

FIGURE 2.1 Illustrating the composition of two transformations.

DEFINITION. Let $U$, $V$, $W$ be sets. Let $T: U \to V$ be a function with domain $U$ and values in $V$, and let $S: V \to W$ be another function with domain $V$ and values in $W$. Then the composition $ST$ is the function $ST: U \to W$ defined by the equation
$$(ST)(x) = S[T(x)] \quad \text{for every } x \text{ in } U.$$

Thus, to map $x$ by the composition $ST$, we first map $x$ by $T$ and then map $T(x)$ by $S$. This is illustrated in Figure 2.1.

Composition of real-valued functions has been encountered repeatedly in our study of calculus, and we have seen that the operation is, in general, not commutative. However, as in the case of real-valued functions, composition does satisfy an associative law.
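The definition of composition can be tried out on small concrete maps. The following is a minimal Python sketch, not part of the original text: the helper `compose` and the three maps `R`, `S`, `T` on pairs of numbers are illustrative choices made here. It checks at one sample point that $ST$ and $TS$ can differ, while $R(ST)$ and $(RS)T$ agree.

```python
def compose(S, T):
    """Return the composition ST, defined by (ST)(x) = S(T(x))."""
    return lambda x: S(T(x))


# Three illustrative maps of the plane into itself (assumptions of this sketch).
def T(p):
    x, y = p
    return (x + y, y)        # a shear


def S(p):
    x, y = p
    return (-y, x)           # rotation through a right angle


def R(p):
    x, y = p
    return (2 * x, 3 * y)    # scaling along the coordinate axes


point = (1, 2)
ST = compose(S, T)
TS = compose(T, S)

print(ST(point))  # (-2, 3): first apply T, then S
print(TS(point))  # (-1, 1): first apply S, then T, a different result

# Associativity: R(ST) and (RS)T send the point to the same image.
left = compose(R, compose(S, T))
right = compose(compose(R, S), T)
print(left(point), right(point))  # (-4, 9) (-4, 9)
```

A single sample point only illustrates the associative law; the general statement needs the argument given in the text.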
THEOREM 2.5. If $T: U \to V$, $S: V \to W$, and $R: W \to X$ are three functions, then we have
$$R(ST) = (RS)T.$$

Proof. Both functions $R(ST)$ and $(RS)T$ have domain $U$ and values in $X$. For each $x$ in $U$, we have
$$[R(ST)](x) = R[(ST)(x)] = R[S[T(x)]] \quad \text{and} \quad [(RS)T](x) = (RS)[T(x)] = R[S[T(x)]],$$
which proves that $R(ST) = (RS)T$.

DEFINITION. Let $T: V \to V$ be a function which maps $V$ into itself. We define integral powers of $T$ inductively as follows:
$$T^0 = I, \qquad T^n = TT^{n-1} \quad \text{for } n \ge 1.$$
Here $I$ is the identity transformation. The reader may verify that the associative law implies the law of exponents $T^m T^n = T^{m+n}$ for all nonnegative integers $m$ and $n$.

The next theorem shows that the composition of linear transformations is again linear.

THEOREM 2.6. If $U$, $V$, $W$ are linear spaces with the same scalars, and if $T: U \to V$ and $S: V \to W$ are linear transformations, then the composition $ST: U \to W$ is linear.

Proof. For all $x$, $y$ in $U$ and all scalars $a$ and $b$, we have
$$(ST)(ax + by) = S[T(ax + by)] = S[aT(x) + bT(y)] = aST(x) + bST(y).$$

Composition can be combined with the algebraic operations of addition and multiplication of scalars in $\mathscr{L}(V, W)$ to give us the following.

THEOREM 2.7. Let $U$, $V$, $W$ be linear spaces with the same scalars, assume $S$ and $T$ are in $\mathscr{L}(V, W)$, and let $c$ be any scalar.
(a) For any function $R$ with values in $V$, we have
$$(S + T)R = SR + TR \quad \text{and} \quad (cS)R = c(SR).$$
(b) For any linear transformation $R: W \to U$, we have
$$R(S + T) = RS + RT \quad \text{and} \quad R(cS) = c(RS).$$

The proof is a straightforward application of the definition of composition and is left as an exercise.

2.6 Inverses

In our study of real-valued functions we learned how to construct new functions by inversion of monotonic functions. Now we wish to extend the process of inversion to a more general class of functions.

Given a function $T$, our goal is to find, if possible, another function $S$ whose composition with $T$ is the identity transformation. Since composition is in general not commutative, we have to distinguish between $ST$ and $TS$.
Therefore we introduce two kinds of inverses which we call left and right inverses.

DEFINITION. Given two sets $V$ and $W$ and a function $T: V \to W$. A function $S: T(V) \to V$ is called a left inverse of $T$ if $S[T(x)] = x$ for all $x$ in $V$, that is, if
$$ST = I_V,$$
where $I_V$ is the identity transformation on $V$. A function $R: T(V) \to V$ is called a right inverse of $T$ if $T[R(y)] = y$ for all $y$ in $T(V)$, that is, if
$$TR = I_{T(V)},$$
where $I_{T(V)}$ is the identity transformation on $T(V)$.

EXAMPLE. A function with no left inverse but with two right inverses. Let $V = \{1, 2\}$ and let $W = \{0\}$. Define $T: V \to W$ as follows: $T(1) = T(2) = 0$. This function has two right inverses $R: W \to V$ and $R': W \to V$ given by
$$R(0) = 1, \qquad R'(0) = 2.$$
It cannot have a left inverse $S$ since this would require
$$1 = S[T(1)] = S(0) \quad \text{and} \quad 2 = S[T(2)] = S(0).$$
This simple example shows that left inverses need not exist and that right inverses need not be unique.

Every function $T: V \to W$ has at least one right inverse. In fact, each $y$ in $T(V)$ has the form $y = T(x)$ for at least one $x$ in $V$. If we select one such $x$ and define $R(y) = x$, then $T[R(y)] = T(x) = y$ for each $y$ in $T(V)$, so $R$ is a right inverse. Nonuniqueness may occur because there may be more than one $x$ in $V$ which maps onto a given $y$ in $T(V)$. We shall prove presently (in Theorem 2.9) that if each $y$ in $T(V)$ is the image of exactly one $x$ in $V$, then right inverses are unique.

First we prove that if a left inverse exists it is unique and, at the same time, is a right inverse.

THEOREM 2.8. A function $T: V \to W$ can have at most one left inverse. If $T$ has a left inverse $S$, then $S$ is also a right inverse.

Proof. Assume $T$ has two left inverses, $S: T(V) \to V$ and $S': T(V) \to V$. Choose any $y$ in $T(V)$. We shall prove that $S(y) = S'(y)$. Now $y = T(x)$ for some $x$ in $V$, so we have
$$S[T(x)] = x \quad \text{and} \quad S'[T(x)] = x,$$
since both $S$ and $S'$ are left inverses. Therefore $S(y) = x$ and $S'(y) = x$, so $S(y) = S'(y)$ for all $y$ in $T(V)$. Therefore $S = S'$, which proves that left inverses are unique.
Now we prove that every left inverse $S$ is also a right inverse. Choose any element $y$ in $T(V)$. We shall prove that $T[S(y)] = y$. Since $y \in T(V)$, we have $y = T(x)$ for some $x$ in $V$. But $S$ is a left inverse, so
$$x = S[T(x)] = S(y).$$
Applying $T$, we get $T(x) = T[S(y)]$. But $y = T(x)$, so $y = T[S(y)]$, which completes the proof.

The next theorem characterizes all functions having left inverses.

THEOREM 2.9. A function $T: V \to W$ has a left inverse if and only if $T$ maps distinct elements of $V$ onto distinct elements of $W$; that is, if and only if, for all $x$ and $y$ in $V$,
$$x \ne y \quad \text{implies} \quad T(x) \ne T(y). \tag{2.5}$$

Note: Condition (2.5) is equivalent to the statement
$$T(x) = T(y) \quad \text{implies} \quad x = y. \tag{2.6}$$
A function $T$ satisfying (2.5) or (2.6) for all $x$ and $y$ in $V$ is said to be one-to-one on $V$.

Proof. Assume $T$ has a left inverse $S$, and assume that $T(x) = T(y)$. We wish to prove that $x = y$. Applying $S$, we find $S[T(x)] = S[T(y)]$. Since $S[T(x)] = x$ and $S[T(y)] = y$, this implies $x = y$. This proves that a function with a left inverse is one-to-one on its domain.

Now we prove the converse. Assume $T$ is one-to-one on $V$. We shall exhibit a function $S: T(V) \to V$ which is a left inverse of $T$. If $y \in T(V)$, then $y = T(x)$ for some $x$ in $V$. By (2.6), there is exactly one $x$ in $V$ for which $y = T(x)$. Define $S(y)$ to be this $x$. That is, we define $S$ on $T(V)$ as follows:
$$S(y) = x \quad \text{means that} \quad T(x) = y.$$
Then we have $S[T(x)] = x$ for each $x$ in $V$, so $ST = I_V$. Therefore, the function $S$ so defined is a left inverse of $T$.

DEFINITION. Let $T: V \to W$ be one-to-one on $V$. The unique left inverse of $T$ (which we know is also a right inverse) is denoted by $T^{-1}$. We say that $T$ is invertible, and we call $T^{-1}$ the inverse of $T$.

The results of this section refer to arbitrary functions. Now we apply these ideas to linear transformations.

2.7 One-to-one linear transformations

In this section, $V$ and $W$ denote linear spaces with the same scalars, and $T: V \to W$ denotes a linear transformation in $\mathscr{L}(V, W)$.
The linearity of $T$ enables us to express the one-to-one property in several equivalent forms.

THEOREM 2.10. Let $T: V \to W$ be a linear transformation in $\mathscr{L}(V, W)$. Then the following statements are equivalent.
(a) $T$ is one-to-one on $V$.
(b) $T$ is invertible and its inverse $T^{-1}: T(V) \to V$ is linear.
(c) For all $x$ in $V$, $T(x) = O$ implies $x = O$. That is, the null space $N(T)$ contains only the zero element of $V$.

Proof. We shall prove that (a) implies (b), (b) implies (c), and (c) implies (a). First assume (a) holds. Then $T$ has an inverse (by Theorem 2.9), and we must show that $T^{-1}$ is linear. Take any two elements $u$ and $v$ in $T(V)$. Then $u = T(x)$ and $v = T(y)$ for some $x$ and $y$ in $V$. For any scalars $a$ and $b$, we have
$$au + bv = aT(x) + bT(y) = T(ax + by),$$
since $T$ is linear. Hence, applying $T^{-1}$, we have
$$T^{-1}(au + bv) = ax + by = aT^{-1}(u) + bT^{-1}(v),$$
so $T^{-1}$ is linear. Therefore (a) implies (b).

Next assume that (b) holds. Take any $x$ in $V$ for which $T(x) = O$. Applying $T^{-1}$, we find that $x = T^{-1}(O) = O$, since $T^{-1}$ is linear. Therefore, (b) implies (c).

Finally, assume (c) holds. Take any two elements $u$ and $v$ in $V$ with $T(u) = T(v)$. By linearity, we have $T(u - v) = T(u) - T(v) = O$, so $u - v = O$. Therefore, $T$ is one-to-one on $V$, and the proof of the theorem is complete.

When $V$ is finite-dimensional, the one-to-one property can be formulated in terms of independence and dimensionality, as indicated by the next theorem.

THEOREM 2.11. Let $T: V \to W$ be a linear transformation in $\mathscr{L}(V, W)$ and assume that $V$ is finite-dimensional, say $\dim V = n$. Then the following statements are equivalent.
(a) $T$ is one-to-one on $V$.
(b) If $e_1, \ldots, e_p$ are independent elements in $V$, then $T(e_1), \ldots, T(e_p)$ are independent elements in $T(V)$.
(c) $\dim T(V) = n$.
(d) If $\{e_1, \ldots, e_n\}$ is a basis for $V$, then $\{T(e_1), \ldots, T(e_n)\}$ is a basis for $T(V)$.

Proof. We shall prove that (a) implies (b), (b) implies (c), (c) implies (d), and (d) implies (a). Assume (a) holds. Let $e_1, \ldots, e_p$ be independent elements of $V$ and consider the elements $T(e_1), \ldots, T(e_p)$ in $T(V)$. Suppose that
$$\sum_{i=1}^{p} c_i T(e_i) = O$$
for certain scalars $c_1, \ldots, c_p$. By linearity, we obtain
$$T\Bigl(\sum_{i=1}^{p} c_i e_i\Bigr) = O, \quad \text{and hence} \quad \sum_{i=1}^{p} c_i e_i = O,$$
since $T$ is one-to-one. But $e_1, \ldots, e_p$ are independent, so $c_1 = \cdots = c_p = 0$. Therefore (a) implies (b).

Now assume (b) holds. Let $\{e_1, \ldots, e_n\}$ be a basis for $V$. By (b), the $n$ elements $T(e_1), \ldots, T(e_n)$ in $T(V)$ are independent. Therefore, $\dim T(V) \ge n$. But by Theorem 2.3, we have $\dim T(V) \le n$. Therefore $\dim T(V) = n$, so (b) implies (c).

Next, assume (c) holds and let $\{e_1, \ldots, e_n\}$ be a basis for $V$. Take any element $y$ in $T(V)$. Then $y = T(x)$ for some $x$ in $V$, so we have
$$x = \sum_{i=1}^{n} c_i e_i, \quad \text{and hence} \quad y = T(x) = \sum_{i=1}^{n} c_i T(e_i).$$
Therefore $\{T(e_1), \ldots, T(e_n)\}$ spans $T(V)$. But we are assuming $\dim T(V) = n$, so $\{T(e_1), \ldots, T(e_n)\}$ is a basis for $T(V)$. Therefore (c) implies (d).

Finally, assume (d) holds. We will prove that $T(x) = O$ implies $x = O$. Let $\{e_1, \ldots, e_n\}$ be a basis for $V$. If $x \in V$, we may write
$$x = \sum_{i=1}^{n} c_i e_i, \quad \text{and hence} \quad T(x) = \sum_{i=1}^{n} c_i T(e_i).$$
If $T(x) = O$, then $c_1 = \cdots = c_n = 0$, since the elements $T(e_1), \ldots, T(e_n)$ are independent. Therefore $x = O$, so $T$ is one-to-one on $V$. Thus, (d) implies (a) and the proof is complete.

2.8 Exercises

1. Let $V = \{0, 1\}$. Describe all functions $T: V \to V$. There are four altogether. Label them as $T_1, T_2, T_3, T_4$ and make a multiplication table showing the composition of each pair. Indicate which functions are one-to-one on $V$ and give their inverses.
2. Let $V = \{0, 1, 2\}$. Describe all functions $T: V \to V$ for which $T(V) = V$. There are six altogether. Label them as $T_1, \ldots, T_6$ and make a multiplication table showing the composition of each pair. Indicate which functions are one-to-one on $V$, and give their inverses.

In each of Exercises 3 through 12, a function $T: V_2 \to V_2$ is defined by the formula given for $T(x, y)$, where $(x, y)$ is an arbitrary point in $V_2$. In each case determine whether $T$ is one-to-one on $V_2$. If it is, describe its range $T(V_2)$; for each point $(u, v)$ in $T(V_2)$, let $(x, y) = T^{-1}(u, v)$ and give formulas for determining $x$ and $y$ in terms of $u$ and $v$.
3. $T(x, y) = (y, x)$.
4. $T(x, y) = (x, -y)$.
5. $T(x, y) = (x, 0)$.
6. $T(x, y) = (x, x)$.
7. $T(x, y) = (x^2, y^2)$.
8. $T(x, y) = (e^x, e^y)$.
9. $T(x, y) = (x, 1)$.
10. $T(x, y) = (x + 1, y + 1)$.
11. $T(x, y) = (x - y, x + y)$.
12. $T(x, y) = (2x - y, x + y)$.

In each of Exercises 13 through 20, a function $T: V_3 \to V_3$ is defined by the formula given for $T(x, y, z)$, where $(x, y, z)$ is an arbitrary point in $V_3$. In each case, determine whether $T$ is one-to-one on $V_3$.
If it is, describe its range $T(V_3)$; for each point $(u, v, w)$ in $T(V_3)$, let $(x, y, z) = T^{-1}(u, v, w)$ and give formulas for determining $x$, $y$, and $z$ in terms of $u$, $v$, and $w$.
13. $T(x, y, z) = (z, y, x)$.
14. $T(x, y, z) = (x, y, 0)$.
15. $T(x, y, z) = (x, 2y, 3z)$.
16. $T(x, y, z) = (x, y, x + y + z)$.
17. $T(x, y, z) = (x + 1, y + 1, z - 1)$.
18. $T(x, y, z) = (x + 1, y + 2, z + 3)$.
19. $T(x, y, z) = (x, x + y, x + y + z)$.
20. $T(x, y, z) = (x + y, y + z, x + z)$.
21. Let $T: V \to V$ be a function which maps $V$ into itself. Powers are defined inductively by the formulas $T^0 = I$, $T^n = TT^{n-1}$ for $n \ge 1$. Prove that the associative law for composition implies the law of exponents: $T^m T^n = T^{m+n}$. If $T$ is invertible, prove that $T^n$ is also invertible and that $(T^n)^{-1} = (T^{-1})^n$.

In Exercises 22 through 25, $S$ and $T$ denote functions with domain $V$ and values in $V$. In general, $ST \ne TS$. If $ST = TS$, we say that $S$ and $T$ commute.
22. If $S$ and $T$ commute, prove that $(ST)^n = S^n T^n$ for all integers $n \ge 0$.
23. If $S$ and $T$ are invertible, prove that $ST$ is also invertible and that $(ST)^{-1} = T^{-1}S^{-1}$. In other words, the inverse of $ST$ is the composition of inverses, taken in reverse order.
24. If $S$ and $T$ are invertible and commute, prove that their inverses also commute.
25. Let $V$ be a linear space. If $S$ and $T$ commute, prove that
$$(S + T)^2 = S^2 + 2ST + T^2 \quad \text{and} \quad (S + T)^3 = S^3 + 3S^2T + 3ST^2 + T^3.$$
Indicate how these formulas must be altered if $ST \ne TS$.
26. Let $S$ and $T$ be the linear transformations of $V_3$ into $V_3$ defined by the formulas $S(x, y, z) = (z, y, x)$ and $T(x, y, z) = (x, x + y, x + y + z)$, where $(x, y, z)$ is an arbitrary point of $V_3$.
(a) Determine the image of $(x, y, z)$ under each of the following transformations: $ST$, $TS$, $ST - TS$, $S^2$, $T^2$, $(ST)^2$, $(TS)^2$, $(ST - TS)^2$.
(b) Prove that $S$ and $T$ are one-to-one on $V_3$ and find the image of $(u, v, w)$ under each of the following transformations: $S^{-1}$, $T^{-1}$, $(ST)^{-1}$, $(TS)^{-1}$.
(c) Find the image of $(x, y, z)$ under $(T - I)^n$ for each $n \ge 1$.
27. Let $V$ be the linear space of all real polynomials $p(x)$.
Let $D$ denote the differentiation operator and let $T$ denote the integration operator which maps each polynomial $p$ onto the polynomial $q$ given by $q(x) = \int_0^x p(t)\,dt$. Prove that $DT = I_V$ but that $TD \ne I_V$. Describe the null space and range of $TD$.
28. Let $V$ be the linear space of all real polynomials $p(x)$. Let $D$ denote the differentiation operator and let $T$ be the linear transformation that maps $p(x)$ onto $xp'(x)$.
(a) Let $p(x) = 2 + 3x - x^2 + 4x^3$ and determine the image of $p$ under each of the following transformations: $D$, $T$, $DT$, $TD$, $DT - TD$, $T^2D^2 - D^2T^2$.
(b) Determine those $p$ in $V$ for which $T(p) = p$.
(c) Determine those $p$ in $V$ for which $(DT - 2D)(p) = 0$.
(d) Determine those $p$ in $V$ for which $(DT - TD)^n(p) = D^n(p)$.
29. Let $V$ and $D$ be as in Exercise 28 but let $T$ be the linear transformation that maps $p(x)$ onto $xp(x)$. Prove that $DT - TD = I$ and that $DT^n - T^nD = nT^{n-1}$ for $n \ge 2$.
30. Let $S$ and $T$ be in $\mathscr{L}(V, V)$ and assume that $ST - TS = I$. Prove that $ST^n - T^nS = nT^{n-1}$ for all $n \ge 1$.
31. Let $V$ be the linear space of all real polynomials $p(x)$. Let $R$, $S$, $T$ be the functions which map an arbitrary polynomial $p(x) = c_0 + c_1 x + \cdots + c_n x^n$ in $V$ onto the polynomials $r(x)$, $s(x)$, and $t(x)$, respectively, where
$$r(x) = p(0), \qquad s(x) = \sum_{k=1}^{n} c_k x^{k-1}, \qquad t(x) = \sum_{k=0}^{n} c_k x^{k+1}.$$
(a) Let $p(x) = 2 + 3x - x^2 + x^3$ and determine the image of $p$ under each of the following transformations: $R$, $S$, $T$, $ST$, $TS$, $(TS)^2$, $T^2S^2$, $S^2T^2$, $TRS$, $RST$.
(b) Prove that $R$, $S$, and $T$ are linear and determine the null space and range of each.
(c) Prove that $T$ is one-to-one on $V$ and determine its inverse.
(d) If $n \ge 1$, express $(TS)^n$ and $S^nT^n$ in terms of $I$ and $R$.
32. Refer to Exercise 28 of Section 2.4. Determine whether $T$ is one-to-one on $V$. If it is, describe its inverse.

2.9 Linear transformations with prescribed values

If $V$ is finite-dimensional, we can always construct a linear transformation $T: V \to W$ with prescribed values at the basis elements of $V$, as described in the next theorem.

THEOREM 2.12.
Let $e_1, \ldots, e_n$ be a basis for an $n$-dimensional linear space $V$. Let $u_1, \ldots, u_n$ be $n$ arbitrary elements in a linear space $W$. Then there is one and only one linear transformation $T: V \to W$ such that
$$T(e_k) = u_k \quad \text{for } k = 1, 2, \ldots, n. \tag{2.7}$$
This $T$ maps an arbitrary element $x$ in $V$ as follows:
$$\text{If } x = \sum_{k=1}^{n} x_k e_k, \quad \text{then} \quad T(x) = \sum_{k=1}^{n} x_k u_k. \tag{2.8}$$

Proof. Every $x$ in $V$ can be expressed uniquely as a linear combination of $e_1, \ldots, e_n$, the multipliers $x_1, \ldots, x_n$ being the components of $x$ relative to the ordered basis $(e_1, \ldots, e_n)$. If we define $T$ by (2.8), it is a straightforward matter to verify that $T$ is linear. If $x = e_k$ for some $k$, then all components of $x$ are 0 except the $k$th, which is 1, so (2.8) gives $T(e_k) = u_k$, as required.

To prove that there is only one linear transformation satisfying (2.7), let $T'$ be another and compute $T'(x)$. We find that
$$T'(x) = T'\Bigl(\sum_{k=1}^{n} x_k e_k\Bigr) = \sum_{k=1}^{n} x_k T'(e_k) = \sum_{k=1}^{n} x_k u_k = T(x).$$
Since $T'(x) = T(x)$ for all $x$ in $V$, we have $T' = T$, which completes the proof.

EXAMPLE. Determine the linear transformation $T: V_2 \to V_2$ which maps the basis elements $i = (1, 0)$ and $j = (0, 1)$ as follows:
$$T(i) = i + j, \qquad T(j) = 2i - j.$$

Solution. If $x = x_1 i + x_2 j$ is an arbitrary element of $V_2$, then $T(x)$ is given by
$$T(x) = x_1 T(i) + x_2 T(j) = x_1(i + j) + x_2(2i - j) = (x_1 + 2x_2)i + (x_1 - x_2)j.$$

2.10 Matrix representations of linear transformations

Theorem 2.12 shows that a linear transformation $T: V \to W$ of a finite-dimensional linear space $V$ is completely determined by its action on a given set of basis elements $e_1, \ldots, e_n$. Now, suppose the space $W$ is also finite-dimensional, say $\dim W = m$, and let $w_1, \ldots, w_m$ be a basis for $W$. (The dimensions $n$ and $m$ may or may not be equal.) Since $T$ has values in $W$, each element $T(e_k)$ can be expressed uniquely as a linear combination of the basis elements $w_1, \ldots, w_m$, say
$$T(e_k) = \sum_{i=1}^{m} t_{ik} w_i,$$
where $t_{1k}, \ldots, t_{mk}$ are the components of $T(e_k)$ relative to the ordered basis $(w_1, \ldots, w_m)$. We shall display the $m$-tuple $(t_{1k}, \ldots, t_{mk})$ vertically, as follows:
$$\begin{bmatrix} t_{1k} \\ t_{2k} \\ \vdots \\ t_{mk} \end{bmatrix} \tag{2.9}$$
This array is called a column vector or a column matrix. We have such a column vector for each of the $n$ elements $T(e_1), \ldots, T(e_n)$. We place them side by side and enclose them in one pair of brackets to obtain the following rectangular array:
$$\begin{bmatrix} t_{11} & t_{12} & \cdots & t_{1n} \\ t_{21} & t_{22} & \cdots & t_{2n} \\ \vdots & \vdots & & \vdots \\ t_{m1} & t_{m2} & \cdots & t_{mn} \end{bmatrix}$$
This array is called a matrix consisting of $m$ rows and $n$ columns. We call it an $m$ by $n$ matrix, or an $m \times n$ matrix. The first row is the $1 \times n$ matrix $(t_{11}, t_{12}, \ldots, t_{1n})$. The $m \times 1$ matrix displayed in (2.9) is the $k$th column. The scalars $t_{ik}$ are indexed so the first subscript $i$ indicates the row, and the second subscript $k$ indicates the column in which $t_{ik}$ occurs. We call $t_{ik}$ the $ik$-entry or the $ik$-element of the matrix. The more compact notation
$$(t_{ik}) \quad \text{or} \quad (t_{ik})_{i,k=1}^{m,n}$$
is also used to denote the matrix whose $ik$-entry is $t_{ik}$.

Thus, every linear transformation $T$ of an $n$-dimensional space $V$ into an $m$-dimensional space $W$ gives rise to an $m \times n$ matrix $(t_{ik})$ whose columns consist of the components of $T(e_1), \ldots, T(e_n)$ relative to the basis $(w_1, \ldots, w_m)$. We call this the matrix representation of $T$ relative to the given choice of ordered bases $(e_1, \ldots, e_n)$ for $V$ and $(w_1, \ldots, w_m)$ for $W$. Once we know the matrix $(t_{ik})$, the components of any element $T(x)$ relative to the basis $(w_1, \ldots, w_m)$ can be determined as described in the next theorem.

THEOREM 2.13. Let $T$ be a linear transformation in $\mathscr{L}(V, W)$, where $\dim V = n$ and $\dim W = m$. Let $(e_1, \ldots, e_n)$ and $(w_1, \ldots, w_m)$ be ordered bases for $V$ and $W$, respectively, and let $(t_{ik})$ be the $m \times n$ matrix whose entries are determined by the equations
$$T(e_k) = \sum_{i=1}^{m} t_{ik} w_i \quad \text{for } k = 1, 2, \ldots, n. \tag{2.10}$$
Then an arbitrary element
$$x = \sum_{k=1}^{n} x_k e_k \tag{2.11}$$
in $V$ with components $(x_1, \ldots, x_n)$ relative to $(e_1, \ldots, e_n)$ is mapped by $T$ onto the element
$$T(x) = \sum_{i=1}^{m} y_i w_i \tag{2.12}$$
in $W$ with components $(y_1, \ldots, y_m)$ relative to $(w_1, \ldots, w_m)$. The $y_i$ are related to the components of $x$ by the linear equations
$$y_i = \sum_{k=1}^{n} t_{ik} x_k \quad \text{for } i = 1, 2, \ldots, m. \tag{2.13}$$

Proof.
Applying $T$ to each member of (2.11) and using (2.10), we obtain
$$T(x) = \sum_{k=1}^{n} x_k T(e_k) = \sum_{k=1}^{n} x_k \sum_{i=1}^{m} t_{ik} w_i = \sum_{i=1}^{m} \Bigl(\sum_{k=1}^{n} t_{ik} x_k\Bigr) w_i = \sum_{i=1}^{m} y_i w_i,$$
where each $y_i$ is given by (2.13). This completes the proof.

Having chosen a pair of bases $(e_1, \ldots, e_n)$ and $(w_1, \ldots, w_m)$ for $V$ and $W$, respectively, every linear transformation $T: V \to W$ has a matrix representation $(t_{ik})$. Conversely, if we start with any $mn$ scalars arranged as a rectangular matrix $(t_{ik})$ and choose a pair of ordered bases for $V$ and $W$, then it is easy to prove that there is exactly one linear transformation $T: V \to W$ having this matrix representation. We simply define $T$ at the basis elements of $V$ by the equations in (2.10). Then, by Theorem 2.12, there is one and only one linear transformation $T: V \to W$ with these prescribed values. The image $T(x)$ of an arbitrary point $x$ in $V$ is then given by Equations (2.12) and (2.13).

EXAMPLE 1. Construction of a linear transformation from a given matrix. Suppose we start with the $2 \times 3$ matrix
$$\begin{bmatrix} 3 & 1 & -2 \\ 1 & 0 & 4 \end{bmatrix}.$$
Choose the usual bases of unit coordinate vectors for $V_3$ and $V_2$. Then the given matrix represents a linear transformation $T: V_3 \to V_2$ which maps an arbitrary vector $(x_1, x_2, x_3)$ in $V_3$ onto the vector $(y_1, y_2)$ in $V_2$ according to the linear equations
$$y_1 = 3x_1 + x_2 - 2x_3, \qquad y_2 = x_1 + 0x_2 + 4x_3.$$

EXAMPLE 2. Construction of a matrix representation of a given linear transformation. Let $V$ be the linear space of all real polynomials $p(x)$ of degree $\le 3$. This space has dimension 4, and we choose the basis $(1, x, x^2, x^3)$. Let $D$ be the differentiation operator which maps each polynomial $p(x)$ in $V$ onto its derivative $p'(x)$. We can regard $D$ as a linear transformation of $V$ into $W$, where $W$ is the 3-dimensional space of all real polynomials of degree $\le 2$. In $W$ we choose the basis $(1, x, x^2)$. To find the matrix representation of $D$ relative to this choice of bases, we transform (differentiate) each basis element of $V$ and express it as a linear combination of the basis elements of $W$.
Thus, we find that
$$D(1) = 0 = 0 + 0x + 0x^2, \qquad D(x) = 1 = 1 + 0x + 0x^2,$$
$$D(x^2) = 2x = 0 + 2x + 0x^2, \qquad D(x^3) = 3x^2 = 0 + 0x + 3x^2.$$
The coefficients of these polynomials determine the columns of the matrix representation of $D$. Therefore, the required representation is given by the following $3 \times 4$ matrix:
$$\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \end{bmatrix}.$$

To emphasize that the matrix representation depends not only on the basis elements but also on their order, let us reverse the order of the basis elements in $W$ and use, instead, the ordered basis $(x^2, x, 1)$. Then the basis elements of $V$ are transformed into the same polynomials obtained above, but the components of these polynomials relative to the new basis $(x^2, x, 1)$ appear in reversed order. Therefore, the matrix representation of $D$ now becomes
$$\begin{bmatrix} 0 & 0 & 0 & 3 \\ 0 & 0 & 2 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}.$$

Let us compute a third matrix representation for $D$, using the basis $(1, 1 + x, 1 + x + x^2, 1 + x + x^2 + x^3)$ for $V$, and the basis $(1, x, x^2)$ for $W$. The basis elements of $V$ are transformed as follows:
$$D(1) = 0, \quad D(1 + x) = 1, \quad D(1 + x + x^2) = 1 + 2x, \quad D(1 + x + x^2 + x^3) = 1 + 2x + 3x^2,$$
so the matrix representation in this case is
$$\begin{bmatrix} 0 & 1 & 1 & 1 \\ 0 & 0 & 2 & 2 \\ 0 & 0 & 0 & 3 \end{bmatrix}.$$

2.11 Construction of a matrix representation in diagonal form

Since it is possible to obtain different matrix representations of a given linear transformation by different choices of bases, it is natural to try to choose the bases so that the resulting matrix will have a particularly simple form. The next theorem shows that we can make all the entries 0 except possibly along the diagonal starting from the upper left-hand corner of the matrix. Along this diagonal there will be a string of ones followed by zeros, the number of ones being equal to the rank of the transformation. A matrix $(t_{ik})$ with all entries $t_{ik} = 0$ when $i \ne k$ is said to be a diagonal matrix.

THEOREM 2.14. Let $V$ and $W$ be finite-dimensional linear spaces, with $\dim V = n$ and $\dim W = m$. Assume $T \in \mathscr{L}(V, W)$ and let $r = \dim T(V)$ denote the rank of $T$. Then there exists a basis $(e_1, \ldots, e_n)$ for $V$ and a basis $(w_1, \ldots, w_m)$ for $W$ such that
$$T(e_i) = w_i \quad \text{for } i = 1, 2, \ldots, r, \tag{2.14}$$
and
$$T(e_i) = O \quad \text{for } i = r + 1, \ldots, n. \tag{2.15}$$
Therefore, the matrix $(t_{ik})$ of $T$ relative to these bases has all entries zero except for the $r$ diagonal entries
$$t_{11} = t_{22} = \cdots = t_{rr} = 1.$$

Proof. First we construct a basis for $W$. Since $T(V)$ is a subspace of $W$ with $\dim T(V) = r$, the space $T(V)$ has a basis of $r$ elements in $W$, say $w_1, \ldots, w_r$. By Theorem 1.7, these elements form a subset of some basis for $W$. Therefore we can adjoin elements $w_{r+1}, \ldots, w_m$ so that
$$(w_1, \ldots, w_r, w_{r+1}, \ldots, w_m) \tag{2.16}$$
is a basis for $W$.

Now we construct a basis for $V$. Each of the first $r$ elements $w_i$ in (2.16) is the image of at least one element in $V$. Choose one such element in $V$ and call it $e_i$. Then $T(e_i) = w_i$ for $i = 1, 2, \ldots, r$, so (2.14) is satisfied. Now let $k$ be the dimension of the null space $N(T)$. By Theorem 2.3 we have $n = k + r$. Since $\dim N(T) = k$, the space $N(T)$ has a basis consisting of $k$ elements in $V$ which we designate as $e_{r+1}, \ldots, e_{r+k}$. For each of these elements, Equation (2.15) is satisfied. Therefore, to complete the proof, we must show that the ordered set
$$(e_1, \ldots, e_r, e_{r+1}, \ldots, e_{r+k}) \tag{2.17}$$
is a basis for $V$. Since $\dim V = n = r + k$, we need only show that these elements are independent. Suppose that some linear combination of them is zero, say
$$\sum_{i=1}^{r+k} c_i e_i = O. \tag{2.18}$$
Applying $T$ and using Equations (2.14) and (2.15), we find that
$$\sum_{i=1}^{r+k} c_i T(e_i) = \sum_{i=1}^{r} c_i w_i = O.$$
But $w_1, \ldots, w_r$ are independent, and hence $c_1 = \cdots = c_r = 0$. Therefore, the first $r$ terms in (2.18) are zero, so (2.18) reduces to
$$\sum_{i=r+1}^{r+k} c_i e_i = O.$$
But $e_{r+1}, \ldots, e_{r+k}$ are independent since they form a basis for $N(T)$, and hence $c_{r+1} = \cdots = c_{r+k} = 0$. Therefore, all the $c_i$ in (2.18) are zero, so the elements in (2.17) form a basis for $V$. This completes the proof.

EXAMPLE. We refer to Example 2 of Section 2.10, where $D$ is the differentiation operator which maps the space $V$ of polynomials of degree $\le 3$ into the space $W$ of polynomials of degree $\le 2$.
In this example, the range $T(V) = W$, so $T$ has rank 3. Applying the method used to prove Theorem 2.14, we choose any basis for $W$, for example the basis $(1, x, x^2)$. A set of polynomials in $V$ which map onto these elements is given by $(x, \tfrac{1}{2}x^2, \tfrac{1}{3}x^3)$. We extend this set to get a basis for $V$ by adjoining the constant polynomial 1, which is a basis for the null space of $D$. Therefore, if we use the basis $(x, \tfrac{1}{2}x^2, \tfrac{1}{3}x^3, 1)$ for $V$ and the basis $(1, x, x^2)$ for $W$, the corresponding matrix representation for $D$ has the diagonal form
$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}.$$

2.12 Exercises

In all exercises involving the vector space $V_n$, the usual basis of unit coordinate vectors is to be chosen unless another basis is specifically mentioned. In exercises concerned with the matrix of a linear transformation $T: V \to W$ where $V = W$, we take the same basis in both $V$ and $W$ unless another choice is indicated.
1. Determine the matrix of each of the following linear transformations of $V_n$ into $V_n$:
(a) the identity transformation,
(b) the zero transformation,
(c) multiplication by a fixed scalar $c$.
2. Determine the matrix for each of the following projections.
(a) $T: V_3 \to V_2$, where $T(x_1, x_2, x_3) = (x_1, x_2)$.
(b) $T: V_3 \to V_2$, where $T(x_1, x_2, x_3) = (x_2, x_3)$.
(c) $T: V_5 \to V_3$, where $T(x_1, x_2, x_3, x_4, x_5) = (x_2, x_3, x_4)$.
3. A linear transformation $T: V_2 \to V_2$ maps the basis vectors $i$ and $j$ as follows:
$$T(i) = i + j, \qquad T(j) = 2i - j.$$
(a) Compute $T(3i - 4j)$ and $T^2(3i - 4j)$ in terms of $i$ and $j$.
(b) Determine the matrix of $T$ and of $T^2$.
(c) Solve part (b) if the basis $(i, j)$ is replaced by $(e_1, e_2)$, where $e_1 = i - j$, $e_2 = 3i + j$.
4. A linear transformation $T: V_2 \to V_2$ is defined as follows: Each vector $(x, y)$ is reflected in the $y$-axis and then doubled in length to yield $T(x, y)$. Determine the matrix of $T$ and of $T^2$.
5. Let $T: V_3 \to V_3$ be a linear transformation such that
$$T(k) = 2i + 3j + 5k, \qquad T(j + k) = i, \qquad T(i + j + k) = j - k.$$
(a) Compute $T(i + 2j + 3k)$ and determine the nullity and rank of $T$.
(b) Determine the matrix of $T$.
6.
For the linear transformation in Exercise 5, choose both bases to be $(e_1, e_2, e_3)$, where $e_1 = (2, 3, 5)$, $e_2 = (1, 0, 0)$, $e_3 = (0, 1, -1)$, and determine the matrix of $T$ relative to the new bases.
7. A linear transformation $T: V_3 \to V_2$ maps the basis vectors as follows: $T(i) = (0, 0)$, $T(j) = (1, 1)$, $T(k) = (1, -1)$.
(a) Compute $T(4i - j + k)$ and determine the nullity and rank of $T$.
(b) Determine the matrix of $T$.
(c) Use the basis $(i, j, k)$ in $V_3$ and the basis $(w_1, w_2)$ in $V_2$, where $w_1 = (1, 1)$, $w_2 = (1, 2)$. Determine the matrix of $T$ relative to these bases.
(d) Find bases $(e_1, e_2, e_3)$ for $V_3$ and $(w_1, w_2)$ for $V_2$ relative to which the matrix of $T$ will be in diagonal form.
8. A linear transformation $T: V_2 \to V_3$ maps the basis vectors as follows: $T(i) = (1, 0, 1)$, $T(j) = (-1, 0, 1)$.
(a) Compute $T(2i - 3j)$ and determine the nullity and rank of $T$.
(b) Determine the matrix of $T$.
(c) Find bases $(e_1, e_2)$ for $V_2$ and $(w_1, w_2, w_3)$ for $V_3$ relative to which the matrix of $T$ will be in diagonal form.
9. Solve Exercise 8 if $T(i) = (1, 0, 1)$ and $T(j) = (1, 1, 1)$.
10. Let $V$ and $W$ be linear spaces, each with dimension 2 and each with basis $(e_1, e_2)$. Let $T: V \to W$ be a linear transformation such that
$$T(e_1 + e_2) = 3e_1 + 9e_2, \qquad T(3e_1 + 2e_2) = 7e_1 + 23e_2.$$
(a) Compute $T(e_2 - e_1)$ and determine the nullity and rank of $T$.
(b) Determine the matrix of $T$ relative to the given basis.
(c) Use the basis $(e_1, e_2)$ for $V$ and find a new basis of the form $(e_1 + ae_2, 2e_1 + be_2)$ for $W$, relative to which the matrix of $T$ will be in diagonal form.

In the linear space of all real-valued functions, each of the following sets is independent and spans a finite-dimensional subspace $V$. Use the given set as a basis for $V$ and let $D: V \to V$ be the differentiation operator. In each case, find the matrix of $D$ and of $D^2$ relative to this choice of basis.
11. $(\sin x, \cos x)$.
12. $(1, x, e^x)$.
13. $(1, 1 + x, 1 + x + e^x)$.
14. $(e^x, xe^x)$.
15. $(-\cos x, \sin x)$.
16. $(\sin x, \cos x, x \sin x, x \cos x)$.
17. $(e^x \sin x, e^x \cos x)$.
18. $(e^{2x} \sin 3x, e^{2x} \cos 3x)$.
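19. Choose the basis $(1, x, x^2, x^3)$ in the linear space $V$ of all real polynomials of degree $\le 3$. Let $D$ denote the differentiation operator and let $T: V \to V$ be the linear transformation which maps $p(x)$ onto $xp'(x)$. Relative to the given basis, determine the matrix of each of the following transformations: (a) $T$; (b) $DT$; (c) $TD$; (d) $TD - DT$; (e) $T^2$; (f) $T^2D^2 - D^2T^2$.

20. Refer to Exercise 19. Let $W$ be the image of $V$ under $TD$. Find bases for $V$ and for $W$ relative to which the matrix of $TD$ is in diagonal form.

2.13 Linear spaces of matrices

We have seen how matrices arise in a natural way as representations of linear transformations. Matrices can also be considered as objects existing in their own right, without necessarily being connected to linear transformations. As such, they form another class of mathematical objects on which algebraic operations can be defined. The connection with linear transformations serves as motivation for these definitions, but this connection will be ignored for the moment.

Let $m$ and $n$ be two positive integers, and let $I_{m,n}$ be the set of all pairs of integers $(i, j)$ such that $1 \le i \le m$, $1 \le j \le n$. Any function $A$ whose domain is $I_{m,n}$ is called an $m \times n$ matrix; its value at $(i, j)$ is the $ij$-entry $a_{ij}$, and we write $A = (a_{ij})$. If $A = (a_{ij})$ and $B = (b_{ij})$ are two $m \times n$ matrices and $c$ is any scalar, the matrices $A + B$ and $cA$ are defined by

$$A + B = (a_{ij} + b_{ij}), \qquad cA = (ca_{ij}).$$

EXAMPLE. If

$$A = \begin{bmatrix} 1 & 2 & -3 \\ -1 & 0 & 4 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 5 & 0 & 1 \\ 1 & -2 & 3 \end{bmatrix},$$

then we have

$$A + B = \begin{bmatrix} 6 & 2 & -2 \\ 0 & -2 & 7 \end{bmatrix}, \qquad 2A = \begin{bmatrix} 2 & 4 & -6 \\ -2 & 0 & 8 \end{bmatrix}, \qquad (-1)B = \begin{bmatrix} -5 & 0 & -1 \\ -1 & 2 & -3 \end{bmatrix}.$$

We define the zero matrix $O$ to be the $m \times n$ matrix all of whose elements are 0. With these definitions, it is a straightforward exercise to verify that the collection of all $m \times n$ matrices is a linear space. We denote this linear space by $M_{m,n}$. If the entries are real numbers, the space $M_{m,n}$ is a real linear space. If the entries are complex, $M_{m,n}$ is a complex linear space. It is also easy to prove that this space has dimension $mn$. In fact, a basis for $M_{m,n}$ consists of the $mn$ matrices having one entry equal to 1 and all others equal to 0. For example, the six matrices

$$\begin{bmatrix} 1&0&0\\0&0&0 \end{bmatrix},\ \begin{bmatrix} 0&1&0\\0&0&0 \end{bmatrix},\ \begin{bmatrix} 0&0&1\\0&0&0 \end{bmatrix},\ \begin{bmatrix} 0&0&0\\1&0&0 \end{bmatrix},\ \begin{bmatrix} 0&0&0\\0&1&0 \end{bmatrix},\ \begin{bmatrix} 0&0&0\\0&0&1 \end{bmatrix}$$

form a basis for the set of all $2 \times 3$ matrices.

2.14 Isomorphism between linear transformations and matrices

We return now to the connection between matrices and linear transformations.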
Let $V$ and $W$ be finite-dimensional linear spaces with $\dim V = n$ and $\dim W = m$. Choose a basis $(e_1, \ldots, e_n)$ for $V$ and a basis $(w_1, \ldots, w_m)$ for $W$. In this discussion, these bases are kept fixed. Let $\mathcal{L}(V, W)$ denote the linear space of all linear transformations of $V$ into $W$. If $T \in \mathcal{L}(V, W)$, let $m(T)$ denote the matrix of $T$ relative to the given bases. We recall that $m(T)$ is defined as follows.

The image of each basis element $e_k$ is expressed as a linear combination of the basis elements in $W$:

$$T(e_k) = \sum_{i=1}^{m} t_{ik} w_i \quad\text{for } k = 1, 2, \ldots, n. \tag{2.19}$$

The scalar multipliers $t_{ik}$ are the $ik$-entries of $m(T)$. Thus, we have

$$m(T) = (t_{ik})_{i,k=1}^{m,n}. \tag{2.20}$$

Equation (2.20) defines a new function $m$ whose domain is $\mathcal{L}(V, W)$ and whose values are matrices in $M_{m,n}$. Since every $m \times n$ matrix is the matrix $m(T)$ for some $T$ in $\mathcal{L}(V, W)$, the range of $m$ is $M_{m,n}$. The next theorem shows that the transformation $m: \mathcal{L}(V, W) \to M_{m,n}$ is linear and one-to-one on $\mathcal{L}(V, W)$.

THEOREM 2.15. ISOMORPHISM THEOREM. For all $S$ and $T$ in $\mathcal{L}(V, W)$ and all scalars $c$, we have

$$m(S + T) = m(S) + m(T) \quad\text{and}\quad m(cT) = c\,m(T).$$

Moreover, $m(S) = m(T)$ implies $S = T$, so $m$ is one-to-one on $\mathcal{L}(V, W)$.

Proof. The matrix $m(T)$ is formed from the multipliers $t_{ik}$ in (2.19). Similarly, the matrix $m(S)$ is formed from the multipliers $s_{ik}$ in the equations

$$S(e_k) = \sum_{i=1}^{m} s_{ik} w_i \quad\text{for } k = 1, 2, \ldots, n. \tag{2.21}$$

Since we have

$$(S + T)(e_k) = \sum_{i=1}^{m} (s_{ik} + t_{ik}) w_i \quad\text{and}\quad (cT)(e_k) = \sum_{i=1}^{m} (ct_{ik}) w_i,$$

we obtain $m(S + T) = (s_{ik} + t_{ik}) = m(S) + m(T)$ and $m(cT) = (ct_{ik}) = c\,m(T)$. This proves that $m$ is linear.

To prove that $m$ is one-to-one, suppose that $m(S) = m(T)$, where $m(S) = (s_{ik})$ and $m(T) = (t_{ik})$. Equations (2.19) and (2.21) show that $S(e_k) = T(e_k)$ for each basis element $e_k$, so $S(x) = T(x)$ for all $x$ in $V$, and hence $S = T$.

Note: The function $m$ is called an isomorphism.
For a given choice of bases, $m$ establishes a one-to-one correspondence between the set of linear transformations $\mathcal{L}(V, W)$ and the set of $m \times n$ matrices $M_{m,n}$. The operations of addition and multiplication by scalars are preserved under this correspondence. The linear spaces $\mathcal{L}(V, W)$ and $M_{m,n}$ are said to be isomorphic. Incidentally, Theorem 2.11 shows that the domain of a one-to-one linear transformation has the same dimension as its range. Therefore, $\dim \mathcal{L}(V, W) = \dim M_{m,n} = mn$.

If $V = W$ and if we choose the same basis in both $V$ and $W$, then the matrix $m(I)$ which corresponds to the identity transformation $I: V \to V$ is an $n \times n$ diagonal matrix with each diagonal entry equal to 1 and all others equal to 0. This is called the identity or unit matrix and is denoted by $I$ or by $I_n$.

2.15 Multiplication of matrices

Some linear transformations can be multiplied by means of composition. Now we shall define multiplication of matrices in such a way that the product of two matrices corresponds to the composition of the linear transformations they represent.

We recall that if $T: U \to V$ and $S: V \to W$ are linear transformations, their composition $ST: U \to W$ is a linear transformation given by

$$ST(x) = S[T(x)] \quad\text{for all } x \text{ in } U.$$

Suppose that $U$, $V$, and $W$ are finite-dimensional, say

$$\dim U = n, \qquad \dim V = p, \qquad \dim W = m.$$

Choose bases for $U$, $V$, and $W$. Relative to these bases, the matrix $m(S)$ is an $m \times p$ matrix, the matrix $m(T)$ is a $p \times n$ matrix, and the matrix of $ST$ is an $m \times n$ matrix. The following definition of matrix multiplication will enable us to deduce the relation $m(ST) = m(S)m(T)$. This extends the isomorphism property to products.

DEFINITION. Let $A$ be any $m \times p$ matrix, and let $B$ be any $p \times n$ matrix, say

$$A = (a_{ij})_{i,j=1}^{m,p} \quad\text{and}\quad B = (b_{ij})_{i,j=1}^{p,n}.$$

Then the product $AB$ is defined to be the $m \times n$ matrix $C = (c_{ij})$ whose $ij$-entry is given by

$$c_{ij} = \sum_{k=1}^{p} a_{ik} b_{kj}. \tag{2.22}$$

Note: The product $AB$ is not defined unless the number of columns of $A$ is equal to the number of rows of $B$.
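Formula (2.22) translates directly into a sum over the shared index $k$. The following Python sketch is only an illustration: the function name `mat_mul` and the sample matrices are assumptions chosen here, and a matrix is represented as a list of rows.

```python
def mat_mul(a, b):
    """Product of an m x p matrix a and a p x n matrix b, following (2.22)."""
    m, p, n = len(a), len(b), len(b[0])
    # The product is defined only when columns of A equal rows of B.
    if any(len(row) != p for row in a):
        raise ValueError("number of columns of A must equal number of rows of B")
    # c[i][j] is the sum over k of a[i][k] * b[k][j].
    return [[sum(a[i][k] * b[k][j] for k in range(p)) for j in range(n)]
            for i in range(m)]


A = [[3, 1, 2], [-1, 1, 0]]        # illustrative 2 x 3 matrix
B = [[4, 6], [5, -1], [0, 2]]      # illustrative 3 x 2 matrix

print(mat_mul(A, B))               # a 2 x 2 matrix
print(mat_mul(B, A))               # a 3 x 3 matrix, so AB and BA differ even in size
```

Calling `mat_mul(A, A)` raises `ValueError`, which mirrors the note above: a $2 \times 3$ matrix cannot be multiplied by a $2 \times 3$ matrix.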
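If we write $A_i$ for the $i$th row of $A$, and $B^j$ for the $j$th column of $B$, and think of these as $p$-dimensional vectors, then the sum in (2.22) is simply the dot product $A_i \cdot B^j$. In other words, the $ij$-entry of $AB$ is the dot product of the $i$th row of $A$ with the $j$th column of $B$:

$$AB = (A_i \cdot B^j)_{i,j=1}^{m,n}.$$

Thus, matrix multiplication can be regarded as a generalization of the dot product.

EXAMPLE 1. Let

$$A = \begin{bmatrix} 3 & 1 & 2 \\ -1 & 1 & 0 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 4 & 6 \\ 5 & -1 \\ 0 & 2 \end{bmatrix}.$$

Since $A$ is $2 \times 3$ and $B$ is $3 \times 2$, the product $AB$ is the $2 \times 2$ matrix

$$AB = \begin{bmatrix} A_1 \cdot B^1 & A_1 \cdot B^2 \\ A_2 \cdot B^1 & A_2 \cdot B^2 \end{bmatrix} = \begin{bmatrix} 17 & 21 \\ 1 & -7 \end{bmatrix}.$$

The entries of $AB$ are computed as follows:

$$A_1 \cdot B^1 = 3 \cdot 4 + 1 \cdot 5 + 2 \cdot 0 = 17, \qquad A_1 \cdot B^2 = 3 \cdot 6 + 1 \cdot (-1) + 2 \cdot 2 = 21,$$
$$A_2 \cdot B^1 = (-1) \cdot 4 + 1 \cdot 5 + 0 \cdot 0 = 1, \qquad A_2 \cdot B^2 = (-1) \cdot 6 + 1 \cdot (-1) + 0 \cdot 2 = -7.$$

EXAMPLE 2. Let

$$A = \begin{bmatrix} 2 & 1 & -3 \\ 1 & 2 & 4 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} -2 \\ 1 \\ 2 \end{bmatrix}.$$

Here $A$ is $2 \times 3$ and $B$ is $3 \times 1$, so $AB$ is the $2 \times 1$ matrix given by

$$AB = \begin{bmatrix} A_1 \cdot B^1 \\ A_2 \cdot B^1 \end{bmatrix} = \begin{bmatrix} -9 \\ 8 \end{bmatrix},$$

since $A_1 \cdot B^1 = 2 \cdot (-2) + 1 \cdot 1 + (-3) \cdot 2 = -9$ and $A_2 \cdot B^1 = 1 \cdot (-2) + 2 \cdot 1 + 4 \cdot 2 = 8$.

EXAMPLE 3. If $A$ and $B$ are both square matrices of the same size, then both $AB$ and $BA$ are defined. For example, if

$$A = \begin{bmatrix} 1 & 2 \\ -1 & 1 \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} 3 & 4 \\ 5 & 2 \end{bmatrix},$$

we find that

$$AB = \begin{bmatrix} 13 & 8 \\ 2 & -2 \end{bmatrix}, \qquad BA = \begin{bmatrix} -1 & 10 \\ 3 & 12 \end{bmatrix}.$$

This example shows that in general $AB \ne BA$. If $AB = BA$, we say $A$ and $B$ commute.

EXAMPLE 4. If $I_p$ is the $p \times p$ identity matrix, then $I_pA = A$ for every $p \times n$ matrix $A$, and $BI_p = B$ for every $m \times p$ matrix $B$. For example,

$$\begin{bmatrix} 1&0&0\\0&1&0\\0&0&1 \end{bmatrix} \begin{bmatrix} 1\\2\\3 \end{bmatrix} = \begin{bmatrix} 1\\2\\3 \end{bmatrix}, \qquad \begin{bmatrix} 1&2&3\\4&5&6 \end{bmatrix} \begin{bmatrix} 1&0&0\\0&1&0\\0&0&1 \end{bmatrix} = \begin{bmatrix} 1&2&3\\4&5&6 \end{bmatrix}.$$

Now we prove that the matrix of a composition $ST$ is the product of the matrices $m(S)$ and $m(T)$.

THEOREM 2.16. Let $T: U \to V$ and $S: V \to W$ be linear transformations, where $U$, $V$, $W$ are finite-dimensional linear spaces. Then, for a fixed choice of bases, the matrices of $S$, $T$, and $ST$ are related by the equation

$$m(ST) = m(S)m(T).$$

Proof. Assume $\dim U = n$, $\dim V = p$, $\dim W = m$. Let $(u_1, \ldots, u_n)$ be a basis for $U$, $(v_1, \ldots, v_p)$ a basis for $V$, and $(w_1, \ldots, w_m)$ a basis for $W$.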
Relative to these bases, we have

$$m(S) = (s_{ij})_{i,j=1}^{m,p}, \quad\text{where}\quad S(v_k) = \sum_{i=1}^{m} s_{ik} w_i \quad\text{for } k = 1, 2, \ldots, p,$$

and

$$m(T) = (t_{ij})_{i,j=1}^{p,n}, \quad\text{where}\quad T(u_j) = \sum_{k=1}^{p} t_{kj} v_k \quad\text{for } j = 1, 2, \ldots, n.$$

Therefore, we have

$$ST(u_j) = S[T(u_j)] = \sum_{k=1}^{p} t_{kj} S(v_k) = \sum_{k=1}^{p} t_{kj} \sum_{i=1}^{m} s_{ik} w_i = \sum_{i=1}^{m} \Bigl( \sum_{k=1}^{p} s_{ik} t_{kj} \Bigr) w_i,$$

so we find that

$$m(ST) = \Bigl( \sum_{k=1}^{p} s_{ik} t_{kj} \Bigr)_{i,j=1}^{m,n} = m(S)m(T).$$

We have already noted that matrix multiplication does not always satisfy the commutative law. The next theorem shows that it does satisfy the associative and distributive laws.

THEOREM 2.17. ASSOCIATIVE AND DISTRIBUTIVE LAWS FOR MATRIX MULTIPLICATION. Given matrices $A$, $B$, $C$.
(a) If the products $A(BC)$ and $(AB)C$ are meaningful, we have $A(BC) = (AB)C$ (associative law).
(b) Assume $A$ and $B$ are of the same size. If $AC$ and $BC$ are meaningful, we have $(A + B)C = AC + BC$ (right distributive law), whereas if $CA$ and $CB$ are meaningful, we have $C(A + B) = CA + CB$ (left distributive law).

Proof. These properties can be deduced directly from the definition of matrix multiplication, but we prefer the following type of argument. Introduce finite-dimensional linear spaces $U$, $V$, $W$, $X$ and linear transformations $T: U \to V$, $S: V \to W$, $R: W \to X$ such that, for a fixed choice of bases, we have

$$A = m(R), \qquad B = m(S), \qquad C = m(T).$$

By Theorem 2.16, we have $m(RS) = AB$ and $m(ST) = BC$. From the associative law for composition, we find that $R(ST) = (RS)T$. Applying Theorem 2.16 once more to this equation, we obtain $m(R)m(ST) = m(RS)m(T)$ or $A(BC) = (AB)C$, which proves (a). The proof of (b) can be given by a similar type of argument.

DEFINITION. If $A$ is a square matrix, we define integral powers of $A$ inductively as follows:

$$A^0 = I, \qquad A^n = AA^{n-1} \quad\text{for } n \ge 1.$$

2.16 Exercises

1. [The matrices $A$, $B$, $C$ of this exercise are illegible in the source.] Compute $B + C$, $AB$, $BA$, $AC$, $CA$, $A(2B - 3C)$.

2. Let $A$ be the $2 \times 2$ matrix given in the text [entries illegible in the source]. Find all $2 \times 2$ matrices $B$ such that (a) $AB = O$; (b) $BA = O$.

3. In each case find $a$, $b$, $c$, $d$ to satisfy the given equation. [Equations illegible in the source.]

4. Calculate $AB - BA$ in each case. [Matrices illegible in the source.]

5.
If $A$ is a square matrix, prove that $A^nA^m = A^{m+n}$ for all integers $m \ge 0$, $n \ge 0$.

6. Let $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$. Verify that $A^2 = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}$ and compute $A^n$.

7. Let $A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$. Verify that $A^2 = \begin{bmatrix} \cos 2\theta & -\sin 2\theta \\ \sin 2\theta & \cos 2\theta \end{bmatrix}$ and compute $A^n$.

8. Let $A = \begin{bmatrix} 1&1&1\\0&1&1\\0&0&1 \end{bmatrix}$. Verify that $A^2 = \begin{bmatrix} 1&2&3\\0&1&2\\0&0&1 \end{bmatrix}$. Compute $A^3$ and $A^4$. Guess a general formula for $A^n$ and prove it by induction.

9. Let $A = \begin{bmatrix} 1 & 0 \\ -1 & 1 \end{bmatrix}$. Prove that $A^2 = 2A - I$ and compute $A^{100}$.

10. Find all $2 \times 2$ matrices $A$ such that $A^2 = O$.

11. (a) Prove that a $2 \times 2$ matrix $A$ commutes with every $2 \times 2$ matrix if and only if $A$ commutes with each of the four matrices

$$\begin{bmatrix} 1&0\\0&0 \end{bmatrix}, \quad \begin{bmatrix} 0&1\\0&0 \end{bmatrix}, \quad \begin{bmatrix} 0&0\\1&0 \end{bmatrix}, \quad \begin{bmatrix} 0&0\\0&1 \end{bmatrix}.$$

(b) Find all such matrices $A$.

12. The equation $A^2 = I$ is satisfied by each of the $2 \times 2$ matrices

$$\begin{bmatrix} 1&0\\0&1 \end{bmatrix}, \quad \begin{bmatrix} 1&b\\0&-1 \end{bmatrix}, \quad \begin{bmatrix} 1&0\\c&-1 \end{bmatrix},$$

where $b$ and $c$ are arbitrary real numbers. Find all $2 \times 2$ matrices $A$ such that $A^2 = I$.

13. If $A = \begin{bmatrix} 2 & -1 \\ -2 & 3 \end{bmatrix}$ and $B = \begin{bmatrix} 7 & 6 \\ 9 & 8 \end{bmatrix}$, find $2 \times 2$ matrices $C$ and $D$ such that $AC = B$ and $DA = B$.

14. (a) Verify that the algebraic identities

$$(A + B)^2 = A^2 + 2AB + B^2 \quad\text{and}\quad (A + B)(A - B) = A^2 - B^2$$

do not hold for the $2 \times 2$ matrices $A = \begin{bmatrix} 1 & -1 \\ 0 & 2 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & 0 \\ 1 & 2 \end{bmatrix}$.
(b) Amend the right-hand members of these identities to obtain formulas valid for all square matrices $A$ and $B$.
(c) For which matrices $A$ and $B$ are the identities valid as stated in (a)?

2.17 Systems of linear equations

Let $A = (a_{ik})$ be a given $m \times n$ matrix of numbers, and let $c_1, \ldots, c_m$ be $m$ further numbers. A set of $m$ equations of the form

$$\sum_{k=1}^{n} a_{ik} x_k = c_i \quad\text{for } i = 1, 2, \ldots, m, \tag{2.23}$$

is called a system of $m$ linear equations in $n$ unknowns. Here $x_1, \ldots, x_n$ are regarded as unknown. A solution of the system is any $n$-tuple of numbers $(x_1, \ldots, x_n)$ for which all the equations are satisfied. The matrix $A$ is called the coefficient-matrix of the system.

Linear systems can be studied with the help of linear transformations. Choose the usual bases of unit coordinate vectors in $V_n$ and in $V_m$. The coefficient-matrix $A$ determines a linear transformation, $T: V_n \to V_m$, which maps an arbitrary vector $x = (x_1, \ldots, x_n)$ in $V_n$ onto the vector $y = (y_1, \ldots, y_m)$ in $V_m$ given by the $m$ linear equations

$$y_i = \sum_{k=1}^{n} a_{ik} x_k \quad\text{for } i = 1, 2, \ldots, m.$$

Let $c = (c_1, \ldots, c_m)$ be the vector in $V_m$ whose components are the numbers appearing in system (2.23). This system can be written more simply as

$$T(x) = c.$$

The system has a solution if and only if $c$ is in the range of $T$. If exactly one $x$ in $V_n$ maps onto $c$, the system has exactly one solution. If more than one $x$ maps onto $c$, the system has more than one solution.

EXAMPLE 1. A system with no solution. The system $x + y = 1$, $x + y = 2$ has no solution. The sum of two numbers cannot be both 1 and 2.

EXAMPLE 2. A system with exactly one solution. The system $x + y = 1$, $x - y = 0$ has exactly one solution: $(x, y) = (\tfrac12, \tfrac12)$.

EXAMPLE 3. A system with more than one solution. The system $x + y = 1$, consisting of one equation in two unknowns, has more than one solution. Any two numbers whose sum is 1 give a solution.

With each linear system (2.23), we can associate another system

$$\sum_{k=1}^{n} a_{ik} x_k = 0 \quad\text{for } i = 1, 2, \ldots, m,$$

obtained by replacing each $c_i$ in (2.23) by 0. This is called the homogeneous system corresponding to (2.23). If $c \ne O$, system (2.23) is called a nonhomogeneous system. A vector $x$ in $V_n$ will satisfy the homogeneous system if and only if

$$T(x) = O,$$

where $T$ is the linear transformation determined by the coefficient-matrix. The homogeneous system always has one solution, namely $x = O$, but it may have others. The set of solutions of the homogeneous system is the null space of $T$. The next theorem describes the relation between solutions of the homogeneous system and those of the nonhomogeneous system.

THEOREM 2.18. Assume the nonhomogeneous system (2.23) has a solution, say $b$.
(a) If a vector $x$ is a solution of the nonhomogeneous system, then the vector $v = x - b$ is a solution of the corresponding homogeneous system.
(b) If a vector $v$ is a solution of the homogeneous system, then the vector $x = v + b$ is a solution of the nonhomogeneous system.

Proof.
Let $T: V_n \to V_m$ be the linear transformation determined by the coefficient-matrix, as described above. Since $b$ is a solution of the nonhomogeneous system we have $T(b) = c$. Let $x$ and $v$ be two vectors in $V_n$ such that $v = x - b$. Then we have

$$T(v) = T(x - b) = T(x) - T(b) = T(x) - c.$$

Therefore $T(x) = c$ if and only if $T(v) = O$. This proves both (a) and (b).

This theorem shows that the problem of finding all solutions of a nonhomogeneous system splits naturally into two parts: (1) finding all solutions $v$ of the homogeneous system, that is, determining the null space of $T$; and (2) finding one particular solution $b$ of the nonhomogeneous system. By adding $b$ to each vector $v$ in the null space of $T$, we thereby obtain all solutions $x = v + b$ of the nonhomogeneous system.

Let $k$ denote the dimension of $N(T)$ (the nullity of $T$). If we can find $k$ independent solutions $v_1, \ldots, v_k$ of the homogeneous system, they will form a basis for $N(T)$, and we can obtain every $v$ in $N(T)$ by forming all possible linear combinations

$$v = t_1v_1 + \cdots + t_kv_k,$$

where $t_1, \ldots, t_k$ are arbitrary scalars. This linear combination is called the general solution of the homogeneous system. If $b$ is one particular solution of the nonhomogeneous system, then all solutions $x$ are given by

$$x = b + t_1v_1 + \cdots + t_kv_k.$$

This linear combination is called the general solution of the nonhomogeneous system. Theorem 2.18 can now be restated as follows.

THEOREM 2.19. Let $T: V_n \to V_m$ be the linear transformation such that $T(x) = y$, where $x = (x_1, \ldots, x_n)$, $y = (y_1, \ldots, y_m)$ and

$$y_i = \sum_{k=1}^{n} a_{ik} x_k \quad\text{for } i = 1, 2, \ldots, m.$$

Let $k$ denote the nullity of $T$. If $v_1, \ldots, v_k$ are $k$ independent solutions of the homogeneous system $T(x) = O$, and if $b$ is one particular solution of the nonhomogeneous system $T(x) = c$, then the general solution of the nonhomogeneous system is

$$x = b + t_1v_1 + \cdots + t_kv_k,$$

where $t_1, \ldots, t_k$ are arbitrary scalars.
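The statement of Theorem 2.19 can be checked numerically on a small case. The sketch below is only an illustration: the coefficient-matrix, the right-hand members, the particular solution and the null-space vector are all assumptions chosen here (with $m = 2$, $n = 3$, so the nullity is 1), and the helper name `apply` is hypothetical.

```python
def apply(a, x):
    """Image T(x) of x under the transformation with coefficient-matrix a."""
    return [sum(a_ik * x_k for a_ik, x_k in zip(row, x)) for row in a]


A = [[1, 1, 1], [1, 2, 3]]   # illustrative coefficient-matrix
c = [6, 14]                  # right-hand members
b = [1, 2, 3]                # one particular solution: T(b) = c
v1 = [1, -2, 1]              # basis of the null space: T(v1) = O

assert apply(A, b) == c
assert apply(A, v1) == [0, 0]

# Every vector x = b + t*v1 solves the nonhomogeneous system.
for t in (-2, 0, 5):
    x = [b_i + t * v_i for b_i, v_i in zip(b, v1)]
    assert apply(A, x) == c
```

The loop only samples three values of $t$; the theorem itself guarantees the result for every scalar $t$.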
This theorem does not tell us how to decide if a nonhomogeneous system has a particular solution $b$, nor does it tell us how to determine solutions $v_1, \ldots, v_k$ of the homogeneous system. It does tell us what to expect when the nonhomogeneous system has a solution. The following example, although very simple, illustrates the theorem.

EXAMPLE. The system $x + y = 2$ has for its associated homogeneous system the equation $x + y = 0$. Therefore, the null space consists of all vectors in $V_2$ of the form $(t, -t)$, where $t$ is arbitrary. Since $(t, -t) = t(1, -1)$, this is a one-dimensional subspace of $V_2$ with basis $(1, -1)$. A particular solution of the nonhomogeneous system is $(0, 2)$. Therefore the general solution of the nonhomogeneous system is given by

$$(x, y) = (0, 2) + t(1, -1) \quad\text{or}\quad x = t, \quad y = 2 - t,$$

where $t$ is arbitrary.

2.18 Computation techniques

We turn now to the problem of actually computing the solutions of a nonhomogeneous linear system. Although many methods have been developed for attacking this problem, all of them require considerable computation if the system is large. For example, to solve a system of ten equations in as many unknowns can require several hours of hand computation, even with the aid of a desk calculator.

We shall discuss a widely-used method, known as the Gauss-Jordan elimination method, which is relatively simple and can be easily programmed for high-speed electronic computing machines. The method consists of applying three basic types of operations on the equations of a linear system:
(1) Interchanging two equations;
(2) Multiplying all the terms of an equation by a nonzero scalar;
(3) Adding to one equation a multiple of another.
Each time we perform one of these operations on the system we obtain a new system having exactly the same solutions. Two such systems are called equivalent.
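The three operations listed above can be combined into a systematic procedure. The following Python sketch is one possible arrangement, not the text's own program: the function name `gauss_jordan` is hypothetical, exact rational arithmetic via `fractions.Fraction` is an assumption made here to avoid rounding, and the input is an augmented matrix given as a list of rows (coefficients followed by the right-hand member). It is applied to the system $2x - 5y + 4z = -3$, $x - 2y + z = 5$, $x - 4y + 6z = 10$ treated in Example 1 of this section.

```python
from fractions import Fraction


def gauss_jordan(aug):
    """Reduce an augmented matrix using the three basic row operations."""
    m = [[Fraction(x) for x in row] for row in aug]
    rows, cols = len(m), len(m[0])
    pivot_row = 0
    for col in range(cols - 1):            # last column holds right-hand members
        # Look for a row at or below pivot_row with a nonzero entry in this column.
        pivot = next((r for r in range(pivot_row, rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[pivot_row], m[pivot] = m[pivot], m[pivot_row]       # operation (1)
        p = m[pivot_row][col]
        m[pivot_row] = [x / p for x in m[pivot_row]]           # operation (2)
        for r in range(rows):                                  # operation (3)
            if r != pivot_row and m[r][col] != 0:
                f = m[r][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == rows:
            break
    return m


example_1 = [[2, -5, 4, -3], [1, -2, 1, 5], [1, -4, 6, 10]]
reduced = gauss_jordan(example_1)
solution = [row[-1] for row in reduced]
print(*solution)   # prints: 124 75 31
```

The sketch picks the first available nonzero entry as pivot, so its intermediate matrices need not coincide with those of a hand computation that chooses a more convenient pivot; the final reduced matrix is the same.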
By performing these operations over and over again in a systematic fashion we finally arrive at an equivalent system which can be solved by inspection. We shall illustrate the method with some specific examples. It will then be clear how the method is to be applied in general.

EXAMPLE 1. A system with a unique solution. Consider the system

$$\begin{aligned} 2x - 5y + 4z &= -3 \\ x - 2y + z &= 5 \\ x - 4y + 6z &= 10. \end{aligned}$$

This particular system has a unique solution, $x = 124$, $y = 75$, $z = 31$, which we shall obtain by the Gauss-Jordan elimination process. To save labor we do not bother to copy the letters $x$, $y$, $z$ and the equals sign over and over again, but work instead with the augmented matrix

$$\left[\begin{array}{ccc|c} 2 & -5 & 4 & -3 \\ 1 & -2 & 1 & 5 \\ 1 & -4 & 6 & 10 \end{array}\right] \tag{2.24}$$

obtained by adjoining the right-hand members of the system to the coefficient matrix. The three basic types of operation mentioned above are performed on the rows of the augmented matrix and are called row operations. At any stage of the process we can put the letters $x$, $y$, $z$ back again and insert equals signs along the vertical line to obtain equations. Our ultimate goal is to arrive at the augmented matrix

$$\left[\begin{array}{ccc|c} 1 & 0 & 0 & 124 \\ 0 & 1 & 0 & 75 \\ 0 & 0 & 1 & 31 \end{array}\right] \tag{2.25}$$

after a succession of row operations. The corresponding system of equations is $x = 124$, $y = 75$, $z = 31$, which gives the desired solution.

The first step is to obtain a 1 in the upper left-hand corner of the matrix. We can do this by interchanging the first row of the given matrix (2.24) with either the second or third row. Or, we can multiply the first row by $\tfrac12$. Interchanging the first and second rows, we get

$$\left[\begin{array}{ccc|c} 1 & -2 & 1 & 5 \\ 2 & -5 & 4 & -3 \\ 1 & -4 & 6 & 10 \end{array}\right].$$

The next step is to make all the remaining entries in the first column equal to zero, leaving the first row intact. To do this we multiply the first row by $-2$ and add the result to the second row. Then we multiply the first row by $-1$ and add the result to the third row.
After these two operations, we obtain

$$\left[\begin{array}{ccc|c} 1 & -2 & 1 & 5 \\ 0 & -1 & 2 & -13 \\ 0 & -2 & 5 & 5 \end{array}\right]. \tag{2.26}$$

Now we repeat the process on the smaller matrix $\left[\begin{array}{cc|c} -1 & 2 & -13 \\ -2 & 5 & 5 \end{array}\right]$ which appears adjacent to the two zeros. We can obtain a 1 in its upper left-hand corner by multiplying the second row of (2.26) by $-1$. This gives us the matrix

$$\left[\begin{array}{ccc|c} 1 & -2 & 1 & 5 \\ 0 & 1 & -2 & 13 \\ 0 & -2 & 5 & 5 \end{array}\right].$$

Multiplying the second row by 2 and adding the result to the third, we get

$$\left[\begin{array}{ccc|c} 1 & -2 & 1 & 5 \\ 0 & 1 & -2 & 13 \\ 0 & 0 & 1 & 31 \end{array}\right]. \tag{2.27}$$

At this stage, the corresponding system of equations is given by

$$\begin{aligned} x - 2y + z &= 5 \\ y - 2z &= 13 \\ z &= 31. \end{aligned}$$

These equations can be solved in succession, starting with the third one and working backwards, to give us

$$z = 31, \qquad y = 13 + 2z = 13 + 62 = 75, \qquad x = 5 + 2y - z = 5 + 150 - 31 = 124.$$

Or, we can continue the Gauss-Jordan process by making all the entries zero above the diagonal elements in the second and third columns. Multiplying the second row of (2.27) by 2 and adding the result to the first row, we obtain

$$\left[\begin{array}{ccc|c} 1 & 0 & -3 & 31 \\ 0 & 1 & -2 & 13 \\ 0 & 0 & 1 & 31 \end{array}\right].$$

Finally, we multiply the third row by 3 and add the result to the first row, and then multiply the third row by 2 and add the result to the second row to get the matrix in (2.25).

EXAMPLE 2. A system with more than one solution. Consider the following system of 3 equations in 5 unknowns:

$$\begin{aligned} 2x - 5y + 4z + u - v &= -3 \\ x - 2y + z - u + v &= 5 \\ x - 4y + 6z + 2u - v &= 10. \end{aligned} \tag{2.28}$$

The corresponding augmented matrix is

$$\left[\begin{array}{ccccc|c} 2 & -5 & 4 & 1 & -1 & -3 \\ 1 & -2 & 1 & -1 & 1 & 5 \\ 1 & -4 & 6 & 2 & -1 & 10 \end{array}\right].$$

The coefficients of $x$, $y$, $z$ and the right-hand members are the same as those in Example 1. If we perform the same row operations used in Example 1, we finally arrive at the augmented matrix

$$\left[\begin{array}{ccccc|c} 1 & 0 & 0 & -16 & 19 & 124 \\ 0 & 1 & 0 & -9 & 11 & 75 \\ 0 & 0 & 1 & -3 & 4 & 31 \end{array}\right].$$

The corresponding system of equations can be solved for $x$, $y$, and $z$ in terms of $u$ and $v$, giving us

$$\begin{aligned} x &= 124 + 16u - 19v \\ y &= 75 + 9u - 11v \\ z &= 31 + 3u - 4v. \end{aligned}$$

If we let $u = t_1$ and $v = t_2$, where $t_1$ and $t_2$ are arbitrary real numbers, and determine $x$, $y$, $z$ by these equations, the vector $(x, y, z, u, v)$ in $V_5$ given by

$$(x, y, z, u, v) = (124 + 16t_1 - 19t_2,\ 75 + 9t_1 - 11t_2,\ 31 + 3t_1 - 4t_2,\ t_1,\ t_2)$$

is a solution. By separating the parts involving $t_1$ and $t_2$, we can rewrite this as follows:

$$(x, y, z, u, v) = (124, 75, 31, 0, 0) + t_1(16, 9, 3, 1, 0) + t_2(-19, -11, -4, 0, 1).$$

This equation gives the general solution of the system. The vector $(124, 75, 31, 0, 0)$ is a particular solution of the nonhomogeneous system (2.28). The two vectors $(16, 9, 3, 1, 0)$ and $(-19, -11, -4, 0, 1)$ are solutions of the corresponding homogeneous system. Since they are independent, they form a basis for the space of all solutions of the homogeneous system.

EXAMPLE 3. A system with no solution. Consider the system

$$\begin{aligned} 2x - 5y + 4z &= -3 \\ x - 2y + z &= 5 \\ x - 4y + 5z &= 10. \end{aligned} \tag{2.29}$$

This system is almost identical to that of Example 1 except that the coefficient of $z$ in the third equation has been changed from 6 to 5. The corresponding augmented matrix is

$$\left[\begin{array}{ccc|c} 2 & -5 & 4 & -3 \\ 1 & -2 & 1 & 5 \\ 1 & -4 & 5 & 10 \end{array}\right].$$

Applying the same row operations used in Example 1 to transform (2.24) into (2.27), we arrive at the augmented matrix

$$\left[\begin{array}{ccc|c} 1 & -2 & 1 & 5 \\ 0 & 1 & -2 & 13 \\ 0 & 0 & 0 & 31 \end{array}\right]. \tag{2.30}$$

When the bottom row is expressed as an equation, it states that $0 = 31$. Therefore the original system has no solution since the two systems (2.29) and (2.30) are equivalent.

In each of the foregoing examples, the number of equations did not exceed the number of unknowns. If there are more equations than unknowns, the Gauss-Jordan process is still applicable. For example, suppose we consider the system of Example 1, which has the solution $x = 124$, $y = 75$, $z = 31$.
If we adjoin a new equation to this system which is also satisfied by the same triple, for example, the equation $2x - 3y + z = 54$, then the elimination process leads to the augmented matrix

$$\left[\begin{array}{ccc|c} 1 & 0 & 0 & 124 \\ 0 & 1 & 0 & 75 \\ 0 & 0 & 1 & 31 \\ 0 & 0 & 0 & 0 \end{array}\right]$$

with a row of zeros along the bottom. But if we adjoin a new equation which is not satisfied by the triple $(124, 75, 31)$, for example the equation $x + y + z = 1$, then the elimination process leads to an augmented matrix of the form

$$\left[\begin{array}{ccc|c} 1 & 0 & 0 & 124 \\ 0 & 1 & 0 & 75 \\ 0 & 0 & 1 & 31 \\ 0 & 0 & 0 & a \end{array}\right],$$

where $a \ne 0$. The last row now gives a contradictory equation $0 = a$ which shows that the system has no solution.

2.19 Inverses of square matrices

Let $A = (a_{ij})$ be a square $n \times n$ matrix. If there is another $n \times n$ matrix $B$ such that $BA = I$, where $I$ is the $n \times n$ identity matrix, then $A$ is called nonsingular and $B$ is called a left inverse of $A$.

Choose the usual basis of unit coordinate vectors in $V_n$ and let $T: V_n \to V_n$ be the linear transformation with matrix $m(T) = A$. Then we have the following.

THEOREM 2.20. The matrix $A$ is nonsingular if and only if $T$ is invertible. If $BA = I$ then $B = m(T^{-1})$.

Proof. Assume that $A$ is nonsingular and that $BA = I$. We shall prove that $T(x) = O$ implies $x = O$. Given $x$ such that $T(x) = O$, let $X$ be the $n \times 1$ column matrix formed from the components of $x$. Since $T(x) = O$, the matrix product $AX$ is an $n \times 1$ column matrix consisting of zeros, so $B(AX)$ is also a column matrix of zeros. But $B(AX) = (BA)X = IX = X$, so every component of $x$ is 0. Therefore, $T$ is invertible, and the equation $TT^{-1} = I$ implies that $m(T)m(T^{-1}) = I$ or $Am(T^{-1}) = I$. Multiplying on the left by $B$, we find $m(T^{-1}) = B$. Conversely, if $T$ is invertible, then $T^{-1}T$ is the identity transformation so $m(T^{-1})m(T)$ is the identity matrix. Therefore $A$ is nonsingular and $m(T^{-1})A = I$.

All the properties of invertible linear transformations have their counterparts for nonsingular matrices.
In particular, left inverses (if they exist) are unique, and every left inverse is also a right inverse. In other words, if $A$ is nonsingular and $BA = I$, then $AB = I$. We call $B$ the inverse of $A$ and denote it by $A^{-1}$. The inverse $A^{-1}$ is also nonsingular and its inverse is $A$.

Now we show that the problem of actually determining the entries of the inverse of a nonsingular matrix is equivalent to solving $n$ separate nonhomogeneous linear systems.

Let $A = (a_{ij})$ be nonsingular and let $A^{-1} = (b_{ij})$ be its inverse. The entries of $A$ and $A^{-1}$ are related by the $n^2$ equations

$$\sum_{k=1}^{n} a_{ik} b_{kj} = \delta_{ij}, \tag{2.31}$$

where $\delta_{ij} = 1$ if $i = j$, and $\delta_{ij} = 0$ if $i \ne j$. For each fixed choice of $j$, we can regard this as a nonhomogeneous system of $n$ linear equations in $n$ unknowns $b_{1j}, b_{2j}, \ldots, b_{nj}$. Since $A$ is nonsingular, each of these systems has a unique solution, the $j$th column of $A^{-1}$. All these systems have the same coefficient-matrix $A$ and differ only in their right members. For example, if $A$ is a $3 \times 3$ matrix, there are 9 equations in (2.31) which can be expressed as 3 separate linear systems having the following augmented matrices:

$$\left[\begin{array}{ccc|c} a_{11} & a_{12} & a_{13} & 1 \\ a_{21} & a_{22} & a_{23} & 0 \\ a_{31} & a_{32} & a_{33} & 0 \end{array}\right], \quad \left[\begin{array}{ccc|c} a_{11} & a_{12} & a_{13} & 0 \\ a_{21} & a_{22} & a_{23} & 1 \\ a_{31} & a_{32} & a_{33} & 0 \end{array}\right], \quad \left[\begin{array}{ccc|c} a_{11} & a_{12} & a_{13} & 0 \\ a_{21} & a_{22} & a_{23} & 0 \\ a_{31} & a_{32} & a_{33} & 1 \end{array}\right].$$

If we apply the Gauss-Jordan process, we arrive at the respective augmented matrices

$$\left[\begin{array}{ccc|c} 1 & 0 & 0 & b_{11} \\ 0 & 1 & 0 & b_{21} \\ 0 & 0 & 1 & b_{31} \end{array}\right], \quad \left[\begin{array}{ccc|c} 1 & 0 & 0 & b_{12} \\ 0 & 1 & 0 & b_{22} \\ 0 & 0 & 1 & b_{32} \end{array}\right], \quad \left[\begin{array}{ccc|c} 1 & 0 & 0 & b_{13} \\ 0 & 1 & 0 & b_{23} \\ 0 & 0 & 1 & b_{33} \end{array}\right].$$

In actual practice we exploit the fact that all three systems have the same coefficient-matrix and solve all three systems at once by working with the enlarged matrix

$$\left[\begin{array}{ccc|ccc} a_{11} & a_{12} & a_{13} & 1 & 0 & 0 \\ a_{21} & a_{22} & a_{23} & 0 & 1 & 0 \\ a_{31} & a_{32} & a_{33} & 0 & 0 & 1 \end{array}\right].$$

The elimination process then leads to

$$\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & b_{11} & b_{12} & b_{13} \\ 0 & 1 & 0 & b_{21} & b_{22} & b_{23} \\ 0 & 0 & 1 & b_{31} & b_{32} & b_{33} \end{array}\right].$$

The matrix on the right of the vertical line is the required inverse. The matrix on the left of the line is the $3 \times 3$ identity matrix.

It is not necessary to know in advance whether $A$ is nonsingular.
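The simultaneous elimination on the enlarged matrix $[A \mid I]$ described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `inverse` and the sample matrix are chosen here, exact arithmetic uses `fractions.Fraction`, and the sketch raises `ValueError` when no nonzero pivot can be found in some column, which is how it reports a singular matrix.

```python
from fractions import Fraction


def inverse(a):
    """Invert a square matrix by Gauss-Jordan elimination on [A | I]."""
    n = len(a)
    # Build the enlarged matrix: row i of A followed by row i of the identity.
    m = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]          # interchange two rows
        p = m[col][col]
        m[col] = [x / p for x in m[col]]             # make the pivot equal to 1
        for r in range(n):                           # clear the rest of the column
            if r != col and m[r][col] != 0:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    # The right-hand half of the reduced matrix is the inverse.
    return [row[n:] for row in m]


A = [[2, 1], [5, 3]]               # illustrative nonsingular matrix
for row in inverse(A):
    print(*row)
```

For this sample matrix the two printed rows are `3 -1` and `-5 2`, and multiplying them by `A` in either order gives the identity matrix.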
If $A$ is singular (not nonsingular), we can still apply the Gauss-Jordan method, but somewhere in the process one of the diagonal elements will become zero, and it will not be possible to transform $A$ to the identity matrix.

A system of $n$ linear equations in $n$ unknowns, say

$$\sum_{k=1}^{n} a_{ik} x_k = c_i, \qquad i = 1, 2, \ldots, n,$$

can be written more simply as a matrix equation,

$$AX = C,$$

where $A = (a_{ij})$ is the coefficient matrix, and $X$ and $C$ are column matrices with entries $x_1, \ldots, x_n$ and $c_1, \ldots, c_n$, respectively. If $A$ is nonsingular there is a unique solution of the system given by $X = A^{-1}C$.

2.20 Exercises

Apply the Gauss-Jordan elimination process to each of the following systems. If a solution exists, determine the general solution.

[Exercises 1 through 4 are missing from the source.]

5. $3x - 2y + 5z + u = 1$, $x + y - 3z + 2u = 2$, $6x + y - 4z + 3u = 7$.

6. $x + y - 3z + u = 5$, $2x - y + z - 2u = 2$, $7x + y - 7z + 3u = 3$.

7. $x + y + 2z + 3u + 4v = 0$, $2x + 2y + 7z + 11u + 14v = 0$, $3x + 3y + 6z + 10u + 15v = 0$.

8. $x - 2y + z + 2u = -2$, $2x + 3y - z - 5u = 9$, $4x - y + z - u = 5$, $5x - 3y + 2z + u = 3$.

9. Prove that the system $x + y + 2z = 2$, $2x - y + 3z = 2$, $5x - y + az = 6$, has a unique solution if $a \ne 8$. Find all solutions when $a = 8$.

10. (a) Determine all solutions of the system

$$\begin{aligned} 5x + 2y - 6z + 2u &= -1 \\ x - y + z - u &= -2. \end{aligned}$$

(b) Determine all solutions of the system

$$\begin{aligned} 5x + 2y - 6z + 2u &= -1 \\ x - y + z - u &= -2 \\ x + y + z &= 6. \end{aligned}$$

11. This exercise tells how to determine all nonsingular $2 \times 2$ matrices. Prove that

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} = (ad - bc)I.$$

Deduce that $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$ is nonsingular if and only if $ad - bc \ne 0$, in which case its inverse is

$$\frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.$$

Determine the inverse of each of the matrices in Exercises 12 through 16.

[The matrices for Exercises 12 through 16 are illegible in the source.]

2.21 Miscellaneous exercises on matrices

1. If a square matrix has a row of zeros or a column of zeros, prove that it is singular.

2. For each of the following statements about $n \times n$ matrices, give a proof or exhibit a counter example.
(a) If $AB + BA = O$, then $A^2B^3 = B^3A^2$.
(b) If $A$ and $B$ are nonsingular, then $A + B$ is nonsingular.
(c) If $A$ and $B$ are nonsingular, then $AB$ is nonsingular.
(d) If $A$, $B$, and $A + B$ are nonsingular, then $A - B$ is nonsingular.
(e) If $A^3$
$= O$, then $A - I$ is nonsingular.
(f) If the product of $k$ matrices $A_1 \cdots A_k$ is nonsingular, then each matrix $A_i$ is nonsingular.

3. If $A = \begin{bmatrix} 1 & 2 \\ 5 & 4 \end{bmatrix}$, find a nonsingular matrix $P$ such that $P^{-1}AP = \begin{bmatrix} 6 & 0 \\ 0 & -1 \end{bmatrix}$.

4. The matrix $A = \begin{bmatrix} a & i \\ i & b \end{bmatrix}$, where $i^2 = -1$, $a = \tfrac12(1 + \sqrt5)$, and $b = \tfrac12(1 - \sqrt5)$, has the property that $A^2 = A$. Describe completely all $2 \times 2$ matrices $A$ with complex entries such that $A^2 = A$.

5. If $A^2 = A$, prove that $(A + I)^k = I + (2^k - 1)A$.

6. The special theory of relativity makes use of a set of equations of the form $x' = a(x - vt)$, $y' = y$, $z' = z$, $t' = a(t - vx/c^2)$. Here $v$ represents the velocity of a moving object, $c$ the speed of light, and $a = c/\sqrt{c^2 - v^2}$, where $|v| < c$. [...]

[The remainder of Exercise 6 and the exercises that follow it, up to the statement below, are missing from the source.]

THEOREM. If $A$ is an $n \times n$ Hadamard matrix, where $n > 2$, then $n$ is a multiple of 4.

The proof is based on two very simple lemmas concerning vectors in $n$-space. Prove each of these lemmas and apply them to the rows of a Hadamard matrix to prove the theorem.

LEMMA 1. If $X$, $Y$, $Z$ are orthogonal vectors in $V_n$, then we have $(X + Y) \cdot (X + Z) = \|X\|^2$.

LEMMA 2. Write $X = (x_1, \ldots, x_n)$, $Y = (y_1, \ldots, y_n)$, $Z = (z_1, \ldots, z_n)$. If each component $x_i$, $y_i$, $z_i$ is either 1 or $-1$, then the product $(x_i + y_i)(x_i + z_i)$ is either 0 or 4.

3. DETERMINANTS

3.1 Introduction

In many applications of linear algebra to geometry and analysis the concept of a determinant plays an important part. This chapter studies the basic properties of determinants and some of their applications.

Determinants of order two and three were introduced in Volume I as a useful notation for expressing certain formulas in a compact form. We recall that a determinant of order two was defined by the formula

$$\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21}. \tag{3.1}$$

Despite similarity in notation, the determinant $\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}$ (written with vertical bars) is conceptually distinct from the matrix $\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$ (written with square brackets). The determinant is a number assigned to the matrix according to Formula (3.1).
To emphasize this connection we also write

$$\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = \det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}.$$

Determinants of order three were defined in Volume I in terms of second-order determinants by the formula

$$\det \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} = a_{11} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12} \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13} \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}. \tag{3.2}$$

This chapter treats the more general case, the determinant of a square matrix of order $n$ for any integer $n \ge 1$. Our point of view is to treat the determinant as a function which assigns to each square matrix $A$ a number called the determinant of $A$ and denoted by $\det A$. It is possible to define this function by an explicit formula generalizing (3.1) and (3.2). This formula is a sum containing $n!$ products of entries of $A$. For large $n$ the formula is unwieldy and is rarely used in practice. It seems preferable to study determinants from another point of view which emphasizes more clearly their essential properties. These properties, which are important in the applications, will be taken as axioms for a determinant function. Initially, our program will consist of three parts: (1) To motivate the choice of axioms. (2) To deduce further properties of determinants from the axioms. (3) To prove that there is one and only one function which satisfies the axioms.

3.2 Motivation for the choice of axioms for a determinant function

In Volume I we proved that the scalar triple product of three vectors $A_1$, $A_2$, $A_3$ in 3-space can be expressed as the determinant of a matrix whose rows are the given vectors. Thus we have

$$A_1 \times A_2 \cdot A_3 = \det \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix},$$

where $A_1 = (a_{11}, a_{12}, a_{13})$, $A_2 = (a_{21}, a_{22}, a_{23})$, and $A_3 = (a_{31}, a_{32}, a_{33})$.

If the rows are linearly independent the scalar triple product is nonzero; the absolute value of the product is equal to the volume of the parallelepiped determined by the three vectors $A_1$, $A_2$, $A_3$. If the rows are dependent the scalar triple product is zero. In this case the vectors $A_1$, $A_2$, $A_3$ are coplanar and the parallelepiped degenerates to a plane figure of zero volume.
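The agreement between the scalar triple product and the third-order determinant of formula (3.2) can be checked on sample vectors. In the Python sketch below, the helper names `cross`, `dot` and `det3` and the three sample vectors are assumptions made for illustration.

```python
def cross(u, v):
    """Cross product of two vectors in 3-space."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def det3(r1, r2, r3):
    """Third-order determinant expanded along the first row, as in (3.2)."""
    return (r1[0] * (r2[1] * r3[2] - r2[2] * r3[1])
            - r1[1] * (r2[0] * r3[2] - r2[2] * r3[0])
            + r1[2] * (r2[0] * r3[1] - r2[1] * r3[0]))


A1, A2, A3 = (1, 2, 3), (0, 1, 4), (5, 6, 0)      # illustrative independent rows
assert dot(cross(A1, A2), A3) == det3(A1, A2, A3)

# Dependent rows: the third row is the sum of the first two, so both vanish.
S = tuple(a + b for a, b in zip(A1, A2))
assert dot(cross(A1, A2), S) == det3(A1, A2, S) == 0
```

With integer entries both computations are exact, so the equalities hold without any rounding tolerance.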
Some of the properties of the scalar triple product will serve as motivation for the choice of axioms for a determinant function in the higher-dimensional case. To state these properties in a form suitable for generalization, we consider the scalar triple product as a function of the three row-vectors $A_1$, $A_2$, $A_3$. We denote this function by $d$; thus,

$$d(A_1, A_2, A_3) = A_1 \times A_2 \cdot A_3.$$

We focus our attention on the following properties:
(a) Homogeneity in each row. For example, homogeneity in the first row states that $d(tA_1, A_2, A_3) = t\,d(A_1, A_2, A_3)$ for every scalar $t$.
(b) Additivity in each row. For example, additivity in the second row states that $d(A_1, A_2 + C, A_3) = d(A_1, A_2, A_3) + d(A_1, C, A_3)$ for every vector $C$.
(c) The scalar triple product is zero if two of the rows are equal.
(d) Normalization: $d(i, j, k) = 1$, where $i = (1, 0, 0)$, $j = (0, 1, 0)$, $k = (0, 0, 1)$.

Each of these properties can be easily verified from properties of the dot and cross product. Some of these properties are suggested by the geometric relation between the scalar triple product and the volume of the parallelepiped determined by the geometric vectors $A_1$, $A_2$, $A_3$. The geometric meaning of the additive property (b) in a special case is of particular interest. If we take $C = A_1$ in (b) the second term on the right is zero because of (c), and relation (b) becomes

$$d(A_1, A_1 + A_2, A_3) = d(A_1, A_2, A_3). \tag{3.3}$$

This property is illustrated in Figure 3.1 which shows a parallelepiped determined by $A_1$, $A_2$, $A_3$ and another parallelepiped determined by $A_1$, $A_1 + A_2$, $A_3$. Equation (3.3) merely states that these two parallelepipeds have equal volumes. This is evident geometrically because the parallelepipeds have equal altitudes and bases of equal area.

FIGURE 3.1 Geometric interpretation of the property $d(A_1, A_2, A_3) = d(A_1, A_1 + A_2, A_3)$. One parallelepiped has volume $d(A_1, A_2, A_3)$, the other has volume $d(A_1, A_1 + A_2, A_3)$. The two parallelepipeds have equal volumes.
3.3 A set of axioms for a determinant function

The properties of the scalar triple product mentioned in the foregoing section can be suitably generalized and used as axioms for determinants of order $n$. If $A = (a_{ij})$ is an $n \times n$ matrix with real or complex entries, we denote its rows by $A_1, \ldots, A_n$. Thus, the $i$th row of $A$ is a vector in $n$-space given by

$$A_i = (a_{i1}, a_{i2}, \ldots, a_{in}).$$

We regard the determinant as a function of the rows $A_1, \ldots, A_n$, and denote its values by $d(A_1, \ldots, A_n)$ or by $\det A$.

AXIOMATIC DEFINITION OF A DETERMINANT FUNCTION. A real- or complex-valued function $d$, defined for each ordered $n$-tuple of vectors $A_1, \ldots, A_n$ in $n$-space, is called a determinant function of order $n$ if it satisfies the following axioms for all choices of vectors $A_1, \ldots, A_n$ and $C$ in $n$-space.

AXIOM 1. HOMOGENEITY IN EACH ROW. If the $k$th row $A_k$ is multiplied by a scalar $t$, then the determinant is also multiplied by $t$:

$$d(\ldots, tA_k, \ldots) = t\, d(\ldots, A_k, \ldots).$$

AXIOM 2. ADDITIVITY IN EACH ROW. For each $k$ we have

$$d(A_1, \ldots, A_k + C, \ldots, A_n) = d(A_1, \ldots, A_k, \ldots, A_n) + d(A_1, \ldots, C, \ldots, A_n).$$

AXIOM 3. THE DETERMINANT VANISHES IF ANY TWO ROWS ARE EQUAL:

$$d(A_1, \ldots, A_n) = 0 \quad \text{if } A_i = A_j \text{ for some } i \text{ and } j \text{ with } i \ne j.$$

AXIOM 4. THE DETERMINANT OF THE IDENTITY MATRIX IS EQUAL TO 1:

$$d(I_1, \ldots, I_n) = 1, \quad \text{where } I_k \text{ is the } k\text{th unit coordinate vector.}$$

The first two axioms state that the determinant of a matrix is a linear function of each of its rows. This is often described by saying that the determinant is a multilinear function of its rows. By repeated application of linearity in the first row we can write

$$d\Bigl(\sum_{k=1}^{p} t_k C_k, A_2, \ldots, A_n\Bigr) = \sum_{k=1}^{p} t_k\, d(C_k, A_2, \ldots, A_n),$$

where $t_1, \ldots, t_p$ are scalars and $C_1, \ldots, C_p$ are any vectors in $n$-space.

Sometimes a weaker version of Axiom 3 is used:

AXIOM 3'. THE DETERMINANT VANISHES IF TWO ADJACENT ROWS ARE EQUAL:

$$d(A_1, \ldots, A_n) = 0 \quad \text{if } A_k = A_{k+1} \text{ for some } k = 1, 2, \ldots, n - 1.$$

It is a remarkable fact that for a given $n$ there is one and only one function $d$ which satisfies Axioms 1, 2, 3' and 4. The proof of this fact, one of the principal results of this chapter, will be given later.
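To make the axioms concrete, one can test them against the explicit $n!$-term formula that the introduction to this chapter alludes to but does not write out. The sketch below assumes the standard permutation-sum form of that formula (each product takes one entry from every row and column, signed by the parity of the permutation); the function names and the sample data are hypothetical.

```python
from itertools import permutations

def sign(perm):
    """Return +1 for an even permutation and -1 for an odd one."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm))
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det_explicit(rows):
    """Sum of n! signed products, one entry from each row and each column."""
    n = len(rows)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for i in range(n):
            term *= rows[i][perm[i]]
        total += term
    return total

def replace_row(rows, k, new_row):
    """Copy of the matrix with row k (0-based) replaced by new_row."""
    return [list(new_row) if i == k else list(row) for i, row in enumerate(rows)]

# Illustrative integer data (not from the text).
A = [[2, 0, 1, 3], [1, 4, -1, 0], [0, 2, 5, 1], [3, 1, 0, 2]]
C = [1, -2, 0, 4]
t, k = 7, 1

d_A = det_explicit(A)
axiom1 = det_explicit(replace_row(A, k, [t * x for x in A[k]])) == t * d_A
axiom2 = (det_explicit(replace_row(A, k, [x + y for x, y in zip(A[k], C)]))
          == d_A + det_explicit(replace_row(A, k, C)))
axiom3 = det_explicit(replace_row(A, 2, A[0])) == 0   # rows 0 and 2 equal
identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
axiom4 = det_explicit(identity) == 1
print(axiom1, axiom2, axiom3, axiom4)
```

A check on one matrix is of course not a proof; it only illustrates what each axiom asserts. The cost grows like $n!$, which is why the text prefers the axiomatic route.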
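The next theorem gives properties of determinants deduced from Axioms 1, 2, and 3' alone. One of these properties is Axiom 3. It should be noted that Axiom 4 is not used in the proof of this theorem. This observation will be useful later when we prove uniqueness of the determinant function.

THEOREM 3.1. A determinant function satisfying Axioms 1, 2, and 3' has the following further properties:
(a) The determinant vanishes if some row is $O$: $d(A_1, \ldots, A_n) = 0$ if $A_k = O$ for some $k$.
(b) The determinant changes sign if two adjacent rows are interchanged: $d(\ldots, A_k, A_{k+1}, \ldots) = -d(\ldots, A_{k+1}, A_k, \ldots)$.
(c) The determinant changes sign if any two rows $A_i$ and $A_j$ with $i \ne j$ are interchanged.
(d) The determinant vanishes if any two rows are equal: $d(A_1, \ldots, A_n) = 0$ if $A_i = A_j$ for some $i$ and $j$ with $i \ne j$.
(e) The determinant vanishes if its rows are dependent.

Proof. To prove (a) we simply take $t = 0$ in Axiom 1. To prove (b), let $B$ be a matrix having the same rows as $A$ except for row $k$ and row $k + 1$. Let both rows $B_k$ and $B_{k+1}$ be equal to $A_k + A_{k+1}$. Then $\det B = 0$ by Axiom 3'. Thus we may write

$$d(\ldots, A_k + A_{k+1}, A_k + A_{k+1}, \ldots) = 0.$$

Applying the additive property to row $k$ and to row $k + 1$ we can rewrite this equation as follows:

$$d(\ldots, A_k, A_k, \ldots) + d(\ldots, A_k, A_{k+1}, \ldots) + d(\ldots, A_{k+1}, A_k, \ldots) + d(\ldots, A_{k+1}, A_{k+1}, \ldots) = 0.$$

The first and fourth terms are zero by Axiom 3'. Hence the second and third terms are negatives of each other, which proves (b).

To prove (c) we can assume that $j > i$. Rows $A_i$ and $A_j$ can be interchanged by an odd number of interchanges of adjacent rows: first move $A_j$ upward past the $j - i$ rows $A_{j-1}, \ldots, A_i$, then move $A_i$ downward past the $j - i - 1$ rows $A_{i+1}, \ldots, A_{j-1}$. This uses $2(j - i) - 1$ adjacent interchanges, each of which reverses the sign by (b), so the determinant changes sign.

To prove (d), let $B$ be the matrix obtained from $A$ by interchanging the equal rows $A_i$ and $A_j$. Then $B = A$, so $\det B = \det A$; but $\det B = -\det A$ by (c). Therefore $\det A = 0$.

To prove (e), suppose some row is a linear combination of the others. For simplicity, suppose $A_1 = \sum_{k=2}^{n} t_k A_k$. By linearity in the first row we have

$$d(A_1, A_2, \ldots, A_n) = \sum_{k=2}^{n} t_k\, d(A_k, A_2, \ldots, A_n).$$

But each term $d(A_k, A_2, \ldots, A_n)$ in the last sum is zero since $A_k$ is equal to at least one of $A_2, \ldots, A_n$. Hence the whole sum is zero. If row $A_i$ is a linear combination of the other rows we argue the same way, using linearity in the $i$th row. This proves (e).

3.4 Computation of determinants

At this stage it may be instructive to compute some determinants, using only the axioms and the properties in Theorem 3.1, assuming throughout that determinant functions exist.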
In each of the following examples we do not use Axiom 4 until the very last step in the computation.

EXAMPLE 1. Determinant of a $2 \times 2$ matrix. We shall prove that

$$\det \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = a_{11}a_{22} - a_{12}a_{21}. \tag{3.4}$$

Write the row vectors as linear combinations of the unit coordinate vectors $i = (1, 0)$ and $j = (0, 1)$:

$$A_1 = (a_{11}, a_{12}) = a_{11} i + a_{12} j, \qquad A_2 = (a_{21}, a_{22}) = a_{21} i + a_{22} j.$$

Using linearity in the first row we have

$$d(A_1, A_2) = d(a_{11} i + a_{12} j, A_2) = a_{11}\, d(i, A_2) + a_{12}\, d(j, A_2).$$

Now we use linearity in the second row to obtain

$$d(i, A_2) = d(i, a_{21} i + a_{22} j) = a_{21}\, d(i, i) + a_{22}\, d(i, j) = a_{22}\, d(i, j),$$

since $d(i, i) = 0$. Similarly we find

$$d(j, A_2) = d(j, a_{21} i + a_{22} j) = a_{21}\, d(j, i) = -a_{21}\, d(i, j).$$

Hence we obtain

$$d(A_1, A_2) = (a_{11}a_{22} - a_{12}a_{21})\, d(i, j).$$

But $d(i, j) = 1$ by Axiom 4, so $d(A_1, A_2) = a_{11}a_{22} - a_{12}a_{21}$, as asserted.

This argument shows that if a determinant function exists for $2 \times 2$ matrices, then it must necessarily have the form (3.4). Conversely, it is easy to verify that this formula does, indeed, define a determinant function of order 2. Therefore we have shown that there is one and only one determinant function of order 2.

EXAMPLE 2. Determinant of a diagonal matrix. A square matrix of the form

$$A = \begin{bmatrix} a_{11} & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{bmatrix}$$

is called a diagonal matrix. Each entry $a_{ij}$ off the main diagonal ($i \ne j$) is zero. We shall prove that the determinant of $A$ is equal to the product of its diagonal elements,

$$\det A = a_{11}a_{22} \cdots a_{nn}. \tag{3.5}$$

The $k$th row of $A$ is simply a scalar multiple of the $k$th unit coordinate vector, $A_k = a_{kk} I_k$. Applying the homogeneity property repeatedly to factor out the scalars one at a time we get

$$\det A = d(A_1, \ldots, A_n) = d(a_{11} I_1, \ldots, a_{nn} I_n) = a_{11} \cdots a_{nn}\, d(I_1, \ldots, I_n).$$

This formula can be written in the form $\det A = a_{11} \cdots a_{nn} \det I$, where $I$ is the identity matrix. Axiom 4 tells us that $\det I = 1$, so we obtain (3.5).

EXAMPLE 3. Determinant of an upper triangular matrix. A square matrix of the form

$$U = \begin{bmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ 0 & u_{22} & \cdots & u_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & u_{nn} \end{bmatrix}$$

is called an upper triangular matrix. All the entries below the main diagonal are zero.
We shall prove that the determinant of such a matrix is equal to the product of its diagonal elements,

$$\det U = u_{11}u_{22} \cdots u_{nn}.$$

First we prove that $\det U = 0$ if some diagonal element $u_{ii} = 0$. If the last diagonal element $u_{nn}$ is zero, then the last row is $O$ and $\det U = 0$ by Theorem 3.1(a). Suppose, then, that some earlier diagonal element $u_{ii}$ is zero. To be specific, say $u_{22} = 0$. Then each of the $n - 1$ row-vectors $U_2, \ldots, U_n$ has its first two components zero. Hence these vectors span a subspace of dimension at most $n - 2$. Therefore these $n - 1$ rows (and hence all the rows) are dependent. By Theorem 3.1(e), $\det U = 0$. In the same way we find that $\det U = 0$ if any diagonal element is zero.

Now we treat the general case. First we write the first row $U_1$ as a sum of two row-vectors,

$$U_1 = V_1 + V_1',$$

where $V_1 = [u_{11}, 0, \ldots, 0]$ and $V_1' = [0, u_{12}, u_{13}, \ldots, u_{1n}]$. By linearity in the first row we have

$$\det U = \det(V_1, U_2, \ldots, U_n) + \det(V_1', U_2, \ldots, U_n).$$

But $\det(V_1', U_2, \ldots, U_n) = 0$ since this is the determinant of an upper triangular matrix with a diagonal element equal to 0. Hence we have

$$\det U = \det(V_1, U_2, \ldots, U_n). \tag{3.6}$$

Now we treat row-vector $U_2$ in a similar way, expressing it as a sum,

$$U_2 = V_2 + V_2',$$

where $V_2 = [0, u_{22}, 0, \ldots, 0]$ and $V_2' = [0, 0, u_{23}, \ldots, u_{2n}]$. We use this on the right of (3.6) and apply linearity in the second row to obtain

$$\det U = \det(V_1, V_2, U_3, \ldots, U_n), \tag{3.7}$$

since $\det(V_1, V_2', U_3, \ldots, U_n) = 0$. Repeating the argument on each of the succeeding rows in the right member of (3.7) we finally obtain

$$\det U = \det(V_1, V_2, \ldots, V_n),$$

where $(V_1, V_2, \ldots, V_n)$ is a diagonal matrix with the same diagonal elements as $U$. Therefore, by Example 2 we have

$$\det U = u_{11}u_{22} \cdots u_{nn},$$

as required.

EXAMPLE 4. Computation by the Gauss-Jordan process. The Gauss-Jordan elimination process for solving systems of linear equations is also one of the best methods for computing determinants.
We recall that the method consists in applying three types of operations to the rows of a matrix:
(1) Interchanging two rows.
(2) Multiplying all the elements in a row by a nonzero scalar.
(3) Adding to one row a scalar multiple of another.
By performing these operations over and over again in a systematic fashion we can transform any square matrix $A$ to an upper triangular matrix $U$ whose determinant we now know how to compute. It is easy to determine the relation between $\det A$ and $\det U$. Each time operation (1) is performed the determinant changes sign. Each time (2) is performed with a scalar $c \ne 0$, the determinant is multiplied by $c$. Each time (3) is performed the determinant is unaltered. Therefore, if operation (1) is performed $p$ times and if $c_1, \ldots, c_q$ are the nonzero scalar multipliers used in connection with operation (2), then we have

$$\det A = (-1)^p (c_1 c_2 \cdots c_q)^{-1} \det U. \tag{3.8}$$

Again we note that this formula is a consequence of the first three axioms alone. Its proof does not depend on Axiom 4.

3.5 The uniqueness theorem

In Example 3 of the foregoing section we showed that Axioms 1, 2, and 3 imply the formula $\det U = u_{11}u_{22} \cdots u_{nn} \det I$. Combining this with (3.8) we see that for every $n \times n$ matrix $A$ there is a scalar $c$ (depending on $A$) such that

$$d(A_1, \ldots, A_n) = c\, d(I_1, \ldots, I_n). \tag{3.9}$$

Moreover, this formula is a consequence of Axioms 1, 2, and 3 alone. From this we can easily prove that there cannot be more than one determinant function.

THEOREM 3.2. UNIQUENESS THEOREM FOR DETERMINANTS. Let $d$ be a function satisfying all four axioms for a determinant function of order $n$, and let $f$ be another function satisfying Axioms 1, 2, and 3. Then for every choice of vectors $A_1, \ldots, A_n$ in $n$-space we have

$$f(A_1, \ldots, A_n) = d(A_1, \ldots, A_n)\, f(I_1, \ldots, I_n). \tag{3.10}$$

In particular, if $f$ also satisfies Axiom 4 we have $f(A_1, \ldots, A_n) = d(A_1, \ldots, A_n)$.

Proof. Let $g(A_1, \ldots, A_n) = f(A_1, \ldots, A_n) - d(A_1, \ldots, A_n)\, f(I_1, \ldots, I_n)$. We will prove that $g(A_1, \ldots, A_n) = 0$ for every choice of $A_1, \ldots, A_n$.
Since both $d$ and $f$ satisfy Axioms 1, 2, and 3 the same is true of $g$. Hence $g$ also satisfies Equation (3.9) since this was deduced from the first three axioms alone. Therefore we can write

$$g(A_1, \ldots, A_n) = c\, g(I_1, \ldots, I_n), \tag{3.11}$$

where $c$ is a scalar depending on $A$. Taking $A = I$ in the definition of $g$ and noting that $d$ satisfies Axiom 4 we find

$$g(I_1, \ldots, I_n) = f(I_1, \ldots, I_n) - f(I_1, \ldots, I_n) = 0.$$

Therefore Equation (3.11) becomes $g(A_1, \ldots, A_n) = 0$. This completes the proof.

3.6 Exercises

In this set of exercises you may assume existence of a determinant function. Determinants of order 3 may be computed by Equation (3.2).

1. Compute each of the following determinants. [Matrix entries illegible in the source.]

2. If $\det \begin{bmatrix} x & y & z \\ 3 & 0 & 2 \\ 1 & 1 & 1 \end{bmatrix} = 1$, compute the determinant of each of the following matrices:

(a) $\begin{bmatrix} 2x & 2y & 2z \\ \tfrac{3}{2} & 0 & 1 \\ 1 & 1 & 1 \end{bmatrix}$, (b) $\begin{bmatrix} x & y & z \\ 3x + 3 & 3y & 3z + 2 \\ x + 1 & y + 1 & z + 1 \end{bmatrix}$, (c) $\begin{bmatrix} x - 1 & y - 1 & z - 1 \\ 4 & 1 & 3 \\ 1 & 1 & 1 \end{bmatrix}$.

3. (a) Prove that $\begin{vmatrix} 1 & 1 & 1 \\ a & b & c \\ a^2 & b^2 & c^2 \end{vmatrix} = (b - a)(c - a)(c - b)$.
(b) Find corresponding formulas for two further determinants of the same kind. [Matrix entries illegible in the source.]

4. Compute the determinant of each of the following matrices by transforming each of them to an upper triangular matrix. [Matrix entries illegible in the source.]

5. A lower triangular matrix $A = (a_{ij})$ is a square matrix with all entries above the main diagonal equal to 0; that is, $a_{ij} = 0$ whenever $j > i$. [The remainder of this exercise set is missing from the source.]

3.7 The product formula for determinants

In this section we use the uniqueness theorem to prove that the determinant of a product of two square matrices is equal to the product of their determinants,

$$\det(AB) = (\det A)(\det B),$$

assuming that a determinant function exists.
We recall that the product $AB$ of two matrices $A = (a_{ij})$ and $B = (b_{ij})$ is the matrix $C = (c_{ij})$ whose $i, j$ entry is given by the formula

$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}. \tag{3.12}$$

The product is defined only if the number of columns of the left-hand factor $A$ is equal to the number of rows of the right-hand factor $B$. This is always the case if both $A$ and $B$ are square matrices of the same size.

The proof of the product formula will make use of a simple relation which holds between the rows of $AB$ and the rows of $A$. We state this as a lemma. As usual, we let $A_i$ denote the $i$th row of matrix $A$.

LEMMA 3.3. If $A$ is an $m \times n$ matrix and $B$ an $n \times p$ matrix, then we have

$$(AB)_i = A_i B.$$

That is, the $i$th row of the product $AB$ is equal to the product of row matrix $A_i$ with $B$.

Proof. Let $B^j$ denote the $j$th column of $B$ and let $C = AB$. Then the sum in (3.12) can be regarded as the dot product of the $i$th row of $A$ with the $j$th column of $B$,

$$c_{ij} = A_i \cdot B^j.$$

Therefore the $i$th row $C_i$ is the row matrix

$$C_i = [A_i \cdot B^1, A_i \cdot B^2, \ldots, A_i \cdot B^p].$$

But this is also the result of matrix multiplication of row matrix $A_i$ with $B$, since

$$A_i B = [a_{i1}, \ldots, a_{in}] \begin{bmatrix} b_{11} & \cdots & b_{1p} \\ \vdots & & \vdots \\ b_{n1} & \cdots & b_{np} \end{bmatrix} = [A_i \cdot B^1, \ldots, A_i \cdot B^p].$$

Therefore $C_i = A_i B$, which proves the lemma.

THEOREM 3.4. PRODUCT FORMULA FOR DETERMINANTS. For any $n \times n$ matrices $A$ and $B$ we have

$$\det(AB) = (\det A)(\det B).$$

Proof. Since $(AB)_i = A_i B$, we are to prove that

$$d(A_1 B, \ldots, A_n B) = d(A_1, \ldots, A_n)\, d(B_1, \ldots, B_n).$$

Using the lemma again we also have $B_i = (IB)_i = I_i B$, where $I$ is the $n \times n$ identity matrix. Therefore $d(B_1, \ldots, B_n) = d(I_1 B, \ldots, I_n B)$, and we are to prove that

$$d(A_1 B, \ldots, A_n B) = d(A_1, \ldots, A_n)\, d(I_1 B, \ldots, I_n B).$$

We keep $B$ fixed and introduce a function $f$ defined by the formula

$$f(A_1, \ldots, A_n) = d(A_1 B, \ldots, A_n B).$$

The equation we wish to prove states that

$$f(A_1, \ldots, A_n) = d(A_1, \ldots, A_n)\, f(I_1, \ldots, I_n). \tag{3.13}$$

But now it is a simple matter to verify that $f$ satisfies Axioms 1, 2, and 3 for a determinant function so, by the uniqueness theorem, Equation (3.13) holds for every matrix $A$. This completes the proof.
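The product formula can be checked numerically with the elimination method of Example 4. The sketch below is one possible implementation, not the text's own: it uses only row interchanges and additions of multiples of one row to another, so relation (3.8) reduces to $\det A = (-1)^p \det U$. Exact rational arithmetic via `fractions.Fraction`, the function names, and the sample matrices are all assumptions made for illustration.

```python
from fractions import Fraction

def det_by_elimination(matrix):
    """Reduce to upper triangular U with operations (1) and (3) only,
    then return (-1)^p times the product of the diagonal entries of U."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    sign = 1
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # U would have a zero on its diagonal
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]  # operation (1): sign change
            sign = -sign
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            # operation (3): determinant unaltered
            a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    result = Fraction(sign)
    for i in range(n):
        result *= a[i][i]
    return result

def matmul(a, b):
    """Matrix product as in (3.12)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# Illustrative matrices (not from the text).
A = [[0, 2, 1], [1, 1, 0], [3, 0, 2]]
B = [[1, 0, 2], [2, 1, 0], [0, 3, 1]]
det_A = det_by_elimination(A)
det_B = det_by_elimination(B)
det_AB = det_by_elimination(matmul(A, B))
print(det_A, det_B, det_AB, det_AB == det_A * det_B)
```

Matrix `A` has a zero in its upper left corner, so the run exercises a row interchange as well as the row additions.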
Applications of the product formula are given in the next two sections.

3.8 The determinant of the inverse of a nonsingular matrix

We recall that a square matrix $A$ is called nonsingular if it has a left inverse $B$ such that $BA = I$. If a left inverse exists it is unique and is also a right inverse, $AB = I$. We denote the inverse by $A^{-1}$. The relation between $\det A$ and $\det A^{-1}$ is as natural as one could expect.

THEOREM 3.5. If matrix $A$ is nonsingular, then $\det A \ne 0$ and we have

$$\det A^{-1} = \frac{1}{\det A}. \tag{3.14}$$

Proof. From the product formula we have

$$(\det A)(\det A^{-1}) = \det(AA^{-1}) = \det I = 1.$$

Hence $\det A \ne 0$ and (3.14) holds.

Theorem 3.5 shows that nonvanishing of $\det A$ is a necessary condition for $A$ to be nonsingular. Later we will prove that this condition is also sufficient. That is, if $\det A \ne 0$ then $A^{-1}$ exists.

3.9 Determinants and independence of vectors

A simple criterion for testing independence of vectors can be deduced from Theorem 3.5.

THEOREM 3.6. A set of $n$ vectors $A_1, \ldots, A_n$ in $n$-space is independent if and only if $d(A_1, \ldots, A_n) \ne 0$.

Proof. We already proved in Theorem 3.1(e) that dependence implies $d(A_1, \ldots, A_n) = 0$. To prove the converse, we assume that $A_1, \ldots, A_n$ are independent and prove that $d(A_1, \ldots, A_n) \ne 0$.

Let $V_n$ denote the linear space of $n$-tuples of scalars. Since $A_1, \ldots, A_n$ are $n$ independent elements in an $n$-dimensional space they form a basis for $V_n$. By Theorem 2.12 there is a linear transformation $T: V_n \to V_n$ which maps these $n$ vectors onto the unit coordinate vectors,

$$T(A_k) = I_k \quad \text{for } k = 1, 2, \ldots, n.$$

Therefore there is an $n \times n$ matrix $B$ such that

$$A_k B = I_k \quad \text{for } k = 1, 2, \ldots, n.$$

But by Lemma 3.3 we have $A_k B = (AB)_k$, where $A$ is the matrix with rows $A_1, \ldots, A_n$. Hence $AB = I$, so $A$ is nonsingular and $\det A \ne 0$.

3.10 The determinant of a block-diagonal matrix

A square matrix $C$ of the form

$$C = \begin{bmatrix} A & O \\ O & B \end{bmatrix},$$

where $A$ and $B$ are square matrices and each $O$ denotes a matrix of zeros, is called a block-diagonal matrix with two diagonal blocks $A$ and $B$. An example is the $5 \times 5$ matrix

$$C = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 2 & 3 \\ 0 & 0 & 4 & 5 & 6 \\ 0 & 0 & 7 & 8 & 9 \end{bmatrix}.$$

The diagonal blocks in this case are

$$A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}.$$

The next theorem shows that the determinant of a block-diagonal matrix is equal to the product of the determinants of its diagonal blocks.

THEOREM 3.7. For any two square matrices $A$ and $B$ we have

$$\det \begin{bmatrix} A & O \\ O & B \end{bmatrix} = (\det A)(\det B). \tag{3.15}$$

Proof. Assume $A$ is $n \times n$ and $B$ is $m \times m$. We note that the given block-diagonal matrix can be expressed as a product of the form

$$\begin{bmatrix} A & O \\ O & B \end{bmatrix} = \begin{bmatrix} A & O \\ O & I_m \end{bmatrix} \begin{bmatrix} I_n & O \\ O & B \end{bmatrix},$$

where $I_n$ and $I_m$ are identity matrices of orders $n$ and $m$, respectively. Therefore, by the product formula for determinants we have

$$\det \begin{bmatrix} A & O \\ O & B \end{bmatrix} = \det \begin{bmatrix} A & O \\ O & I_m \end{bmatrix} \det \begin{bmatrix} I_n & O \\ O & B \end{bmatrix}. \tag{3.16}$$

Now we regard the determinant $\det \begin{bmatrix} A & O \\ O & I_m \end{bmatrix}$ as a function of the $n$ rows of $A$. This is possible because of the block of zeros in the upper right-hand corner. It is easily verified that this function satisfies all four axioms for a determinant function of order $n$. Therefore, by the uniqueness theorem, we must have

$$\det \begin{bmatrix} A & O \\ O & I_m \end{bmatrix} = \det A.$$

A similar argument shows that $\det \begin{bmatrix} I_n & O \\ O & B \end{bmatrix} = \det B$. Hence (3.16) implies (3.15).

3.11 Exercises

1. For each of the following statements about square matrices, give a proof or exhibit a counterexample.
(a) $\det(A + B) = \det A + \det B$.
(b) $\det\{(A + B)^2\} = \{\det(A + B)\}^2$.
(c) $\det\{(A + B)^2\} = \det(A^2 + 2AB + B^2)$.
(d) $\det\{(A + B)^2\} = \det(A^2 + B^2)$.

2. (a) Extend Theorem 3.7 to block-diagonal matrices with three diagonal blocks:

$$\det \begin{bmatrix} A & O & O \\ O & B & O \\ O & O & C \end{bmatrix} = (\det A)(\det B)(\det C).$$

(b) State and prove a generalization for block-diagonal matrices with any number of diagonal blocks.

3. Let $A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ a & b & c & d \\ e & f & g & h \end{bmatrix}$, $B = \begin{bmatrix} a & b & c & d \\ e & f & g & h \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$. Prove that $\det A = \det \begin{bmatrix} c & d \\ g & h \end{bmatrix}$ and that $\det B = \det \begin{bmatrix} a & b \\ e & f \end{bmatrix}$.

4. State and prove a generalization of Exercise 3 for $n \times n$ matrices.

5. Let $A = \begin{bmatrix} a & b & 0 & 0 \\ c & d & 0 & 0 \\ e & f & g & h \\ x & y & z & w \end{bmatrix}$. Prove that $\det A = \det \begin{bmatrix} a & b \\ c & d \end{bmatrix} \det \begin{bmatrix} g & h \\ z & w \end{bmatrix}$.

6.
State and prove a generalization of Exercise 5 for $n \times n$ matrices of the form

$$A = \begin{bmatrix} B & O \\ C & D \end{bmatrix},$$

where $B$, $C$, $D$ denote square matrices and $O$ denotes a matrix of zeros.

7. Use Theorem 3.6 to determine whether the following sets of vectors are linearly dependent or independent.
(a) $A_1 = (1, -1, 0)$, $A_2 = (0, 1, -1)$, $A_3 = (2, 3, -1)$.
(b) $A_1 = (1, -1, 2, 1)$, $A_2 = (-1, 2, -1, 0)$, $A_3 = (3, -1, 1, 0)$, $A_4 = (1, 0, 0, 1)$.
(c) $A_1 = (1, 0, 0, 0, 1)$, $A_2 = (1, 1, 0, 0, 0)$, $A_3 = (1, 0, 1, 0, 1)$, $A_4 = (1, 1, 0, 1, 1)$, $A_5 = (0, 1, 0, 1, 0)$.

3.12 Expansion formulas for determinants. Minors and cofactors

We still have not shown that a determinant function actually exists, except in the $2 \times 2$ case. In this section we exploit the linearity property and the uniqueness theorem to show that if determinants exist they can be computed by a formula which expresses every determinant of order $n$ as a linear combination of determinants of order $n - 1$. Equation (3.2) in Section 3.1 is an example of this formula in the $3 \times 3$ case. The general formula will suggest a method for proving existence of determinant functions by induction.

Every row of an $n \times n$ matrix $A$ can be expressed as a linear combination of the $n$ unit coordinate vectors $I_1, \ldots, I_n$. For example, the first row $A_1$ can be written as follows:

$$A_1 = \sum_{j=1}^{n} a_{1j} I_j.$$

Since determinants are linear in the first row we have

$$d(A_1, A_2, \ldots, A_n) = d\Bigl(\sum_{j=1}^{n} a_{1j} I_j, A_2, \ldots, A_n\Bigr) = \sum_{j=1}^{n} a_{1j}\, d(I_j, A_2, \ldots, A_n). \tag{3.17}$$

Therefore to compute $\det A$ it suffices to compute $d(I_j, A_2, \ldots, A_n)$ for each unit coordinate vector $I_j$.

Let us use the notation $A'_{1j}$ to denote the matrix obtained from $A$ by replacing the first row $A_1$ by the unit vector $I_j$. For example, if $n = 3$ there are three such matrices:

$$A'_{11} = \begin{bmatrix} 1 & 0 & 0 \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}, \quad A'_{12} = \begin{bmatrix} 0 & 1 & 0 \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}, \quad A'_{13} = \begin{bmatrix} 0 & 0 & 1 \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}.$$

Note that $\det A'_{1j} = d(I_j, A_2, \ldots, A_n)$. Equation (3.17) can now be written in the form

$$\det A = \sum_{j=1}^{n} a_{1j} \det A'_{1j}. \tag{3.18}$$

This is called an expansion formula; it expresses the determinant of $A$ as a linear combination of the elements in its first row.
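As a worked illustration of the first-row expansion formula (3.18), consider the following $3 \times 3$ matrix, which is chosen here only as an example and does not appear in the text. Each determinant of order 3 is evaluated with Equation (3.2).

```latex
A = \begin{bmatrix} 2 & 1 & 3 \\ 0 & 1 & 4 \\ 1 & 0 & 2 \end{bmatrix},
\qquad
\det A'_{11} = \det \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 4 \\ 1 & 0 & 2 \end{bmatrix} = 2,
\quad
\det A'_{12} = \det \begin{bmatrix} 0 & 1 & 0 \\ 0 & 1 & 4 \\ 1 & 0 & 2 \end{bmatrix} = 4,
\quad
\det A'_{13} = \det \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 4 \\ 1 & 0 & 2 \end{bmatrix} = -1,

\det A = a_{11} \det A'_{11} + a_{12} \det A'_{12} + a_{13} \det A'_{13}
       = 2 \cdot 2 + 1 \cdot 4 + 3 \cdot (-1) = 5.
```

Applying (3.2) directly to $A$ gives the same value, $2(2) - 1(-4) + 3(-1) = 5$.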
The argument used to derive (3.18) can be applied to the $k$th row instead of the first row. The result is an expansion formula in terms of elements of the $k$th row.

THEOREM 3.8. EXPANSION BY COFACTORS. Let $A'_{kj}$ denote the matrix obtained from $A$ by replacing the $k$th row $A_k$ by the unit coordinate vector $I_j$. Then we have the expansion formula

$$\det A = \sum_{j=1}^{n} a_{kj} \det A'_{kj}, \tag{3.19}$$

which expresses the determinant of $A$ as a linear combination of the elements of the $k$th row. The number $\det A'_{kj}$ is called the cofactor of entry $a_{kj}$.

In the next theorem we shall prove that each cofactor is, except for a plus or minus sign, equal to a determinant of a matrix of order $n - 1$. These smaller matrices are called minors.

DEFINITION. Given a square matrix $A$ of order $n \ge 2$, the square matrix of order $n - 1$ obtained by deleting the $k$th row and the $j$th column of $A$ is called the $k, j$ minor of $A$ and is denoted by $A_{kj}$.

EXAMPLE. A matrix $A = (a_{kj})$ of order 3 has nine minors. Three of them are

$$A_{11} = \begin{bmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{bmatrix}, \quad A_{12} = \begin{bmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{bmatrix}, \quad A_{13} = \begin{bmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix}.$$

Equation (3.2) expresses the determinant of a $3 \times 3$ matrix as a linear combination of determinants of these three minors. The formula can be written as follows:

$$\det A = a_{11} \det A_{11} - a_{12} \det A_{12} + a_{13} \det A_{13}.$$

The next theorem extends this formula to the $n \times n$ case for any $n \ge 2$.

THEOREM 3.9. EXPANSION BY KTH-ROW MINORS. For any $n \times n$ matrix $A$, $n \ge 2$, the cofactor of $a_{kj}$ is related to the minor $A_{kj}$ by the formula

$$\det A'_{kj} = (-1)^{k+j} \det A_{kj}. \tag{3.20}$$

Therefore the expansion of $\det A$ in terms of elements of the $k$th row is given by

$$\det A = \sum_{j=1}^{n} (-1)^{k+j} a_{kj} \det A_{kj}. \tag{3.21}$$

Proof. We illustrate the idea of the proof by considering first the special case $k = j = 1$. The matrix $A'_{11}$ has the form

$$A'_{11} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{bmatrix}.$$

By applying elementary row operations of type (3) we can make every entry below the 1 in the first column equal to zero, leaving all the remaining entries intact.
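For example, if we multiply the first row of $A'_{11}$ by $-a_{21}$ and add the result to the second row, the new second row becomes $(0, a_{22}, a_{23}, \ldots, a_{2n})$. By a succession of such elementary row operations we obtain a new matrix which we shall denote by $A^0_{11}$ and which has the form

$$A^0_{11} = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ 0 & a_{n2} & \cdots & a_{nn} \end{bmatrix}.$$

Since row operations of type (3) leave the determinant unchanged we have

$$\det A^0_{11} = \det A'_{11}. \tag{3.22}$$

But $A^0_{11}$ is a block-diagonal matrix so, by Theorem 3.7, we have

$$\det A^0_{11} = \det A_{11},$$

where $A_{11}$ is the $1, 1$ minor of $A$,

$$A_{11} = \begin{bmatrix} a_{22} & \cdots & a_{2n} \\ \vdots & & \vdots \\ a_{n2} & \cdots & a_{nn} \end{bmatrix}.$$

Therefore $\det A'_{11} = \det A_{11}$, which proves (3.20) for $k = j = 1$.

We consider next the special case $k = 1$, $j$ arbitrary, and prove that

$$\det A'_{1j} = (-1)^{j-1} \det A_{1j}. \tag{3.23}$$

Once we prove (3.23) the more general formula (3.20) follows at once because matrix $A'_{kj}$ can be transformed to a matrix of the form $B'_{1j}$ by $k - 1$ successive interchanges of adjacent rows. The determinant changes sign at each interchange so we have

$$\det A'_{kj} = (-1)^{k-1} \det B'_{1j}, \tag{3.24}$$

where $B$ is an $n \times n$ matrix whose first row is $I_j$ and whose $1, j$ minor $B_{1j}$ is $A_{kj}$. By (3.23), we have

$$\det B'_{1j} = (-1)^{j-1} \det B_{1j} = (-1)^{j-1} \det A_{kj},$$

so (3.24) gives us

$$\det A'_{kj} = (-1)^{k-1}(-1)^{j-1} \det A_{kj} = (-1)^{k+j} \det A_{kj}.$$

Therefore if we prove (3.23) we also prove (3.20).

We turn now to the proof of (3.23). The matrix $A'_{1j}$ has the form

$$A'_{1j} = \begin{bmatrix} 0 & \cdots & 1 & \cdots & 0 \\ a_{21} & \cdots & a_{2j} & \cdots & a_{2n} \\ \vdots & & \vdots & & \vdots \\ a_{n1} & \cdots & a_{nj} & \cdots & a_{nn} \end{bmatrix}.$$

By elementary row operations of type (3) we introduce a column of zeros below the 1 and transform this to

$$A^0_{1j} = \begin{bmatrix} 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \\ a_{21} & \cdots & a_{2,j-1} & 0 & a_{2,j+1} & \cdots & a_{2n} \\ \vdots & & \vdots & \vdots & \vdots & & \vdots \\ a_{n1} & \cdots & a_{n,j-1} & 0 & a_{n,j+1} & \cdots & a_{nn} \end{bmatrix}.$$

As before, the determinant is unchanged so $\det A^0_{1j} = \det A'_{1j}$. The $1, j$ minor $A_{1j}$ has the form

$$A_{1j} = \begin{bmatrix} a_{21} & \cdots & a_{2,j-1} & a_{2,j+1} & \cdots & a_{2n} \\ \vdots & & \vdots & \vdots & & \vdots \\ a_{n1} & \cdots & a_{n,j-1} & a_{n,j+1} & \cdots & a_{nn} \end{bmatrix}.$$

Now we regard $\det A^0_{1j}$ as a function of the $n - 1$ rows of $A_{1j}$, say $\det A^0_{1j} = f(A_{1j})$. The function $f$ satisfies the first three axioms for a determinant function of order $n - 1$. Therefore, by the uniqueness theorem we can write

$$f(A_{1j}) = f(J) \det A_{1j}, \tag{3.25}$$

where $J$ is the identity matrix of order $n - 1$.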
Therefore, to prove (3.23) we must show that $f(J) = (-1)^{j-1}$. Now $f(J)$ is by definition the determinant of the $n \times n$ matrix $C$ whose first row is $I_j$, whose $i$th row is $I_{i-1}$ for $2 \le i \le j$, and whose $i$th row is $I_i$ for $i > j$:

$$C = \begin{bmatrix} 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \\ 1 & & & 0 & & & \\ & \ddots & & \vdots & & & \\ & & 1 & 0 & & & \\ & & & 0 & 1 & & \\ & & & \vdots & & \ddots & \\ & & & 0 & & & 1 \end{bmatrix} \quad (\text{the last 1 of the upper sloping line lies in the } j\text{th row; the column of zeros is the } j\text{th column}).$$

The entries along the sloping lines are all 1. The remaining entries not shown are all 0. By interchanging the first row of $C$ successively with rows $2, 3, \ldots, j$ we arrive at the $n \times n$ identity matrix $I$ after $j - 1$ interchanges. The determinant changes sign at each interchange, so $\det C = (-1)^{j-1}$. Hence $f(J) = (-1)^{j-1}$, which proves (3.23) and hence (3.20).

3.13 Existence of the determinant function

In this section we use induction on $n$, the size of a matrix, to prove that determinant functions of every order exist. For $n = 2$ we have already shown that a determinant function exists. We can also dispense with the case $n = 1$ by defining $\det[a_{11}] = a_{11}$.

Assuming a determinant function exists of order $n - 1$, a logical candidate for a determinant function of order $n$ would be one of the expansion formulas of Theorem 3.9, for example, the expansion in terms of the first-row minors. However, it is easier to verify the axioms if we use a different but analogous formula expressed in terms of the first-column minors.

THEOREM 3.10. Assume determinants of order $n - 1$ exist. For any $n \times n$ matrix $A = (a_{ik})$, let $f$ be the function defined by the formula

$$f(A_1, \ldots, A_n) = \sum_{j=1}^{n} (-1)^{j+1} a_{j1} \det A_{j1}. \tag{3.26}$$

Then $f$ satisfies all four axioms for a determinant function of order $n$. Therefore, by induction, determinants of order $n$ exist for every $n$.

Proof. We regard each term of the sum in (3.26) as a function of the rows of $A$ and we write

$$f_j(A_1, \ldots, A_n) = (-1)^{j+1} a_{j1} \det A_{j1}.$$

If we verify that each $f_j$ satisfies Axioms 1 and 2 the same will be true for $f$.

Consider the effect of multiplying the first row of $A$ by a scalar $t$. The minor $A_{11}$ is not affected since it does not involve the first row. The coefficient $a_{11}$ is multiplied by $t$, so we have

$$f_1(tA_1, A_2, \ldots, A_n) = t a_{11} \det A_{11} = t f_1(A_1, \ldots, A_n).$$
If $j > 1$ the first row of each minor $A_{j1}$ gets multiplied by $t$ and the coefficient $a_{j1}$ is not affected, so again we have

$$f_j(tA_1, A_2, \ldots, A_n) = t f_j(A_1, A_2, \ldots, A_n).$$

Therefore each $f_j$ is homogeneous in the first row.

If the $k$th row of $A$ is multiplied by $t$, where $k > 1$, the minor $A_{k1}$ is not affected but $a_{k1}$ is multiplied by $t$, so $f_k$ is homogeneous in the $k$th row. If $j \ne k$, the coefficient $a_{j1}$ is not affected but some row of $A_{j1}$ gets multiplied by $t$. Hence every $f_j$ is homogeneous in the $k$th row.

A similar argument shows that each $f_j$ is additive in every row, so $f$ satisfies Axioms 1 and 2. We prove next that $f$ satisfies Axiom 3', the weak version of Axiom 3. From Theorem 3.1, it then follows that $f$ satisfies Axiom 3.

To verify that $f$ satisfies Axiom 3', assume two adjacent rows of $A$ are equal, say $A_k = A_{k+1}$. Then, except for minors $A_{k1}$ and $A_{k+1,1}$, each minor $A_{j1}$ has two equal rows so $\det A_{j1} = 0$. Therefore the sum in (3.26) consists only of the two terms corresponding to $j = k$ and $j = k + 1$,

$$f(A_1, \ldots, A_n) = (-1)^{k+1} a_{k1} \det A_{k1} + (-1)^{k+2} a_{k+1,1} \det A_{k+1,1}. \tag{3.27}$$

But $A_{k1} = A_{k+1,1}$ and $a_{k1} = a_{k+1,1}$ since $A_k = A_{k+1}$. Therefore the two terms in (3.27) differ only in sign, so $f(A_1, \ldots, A_n) = 0$. Thus, $f$ satisfies Axiom 3'.

Finally, we verify that $f$ satisfies Axiom 4. When $A = I$ we have $a_{11} = 1$ and $a_{j1} = 0$ for $j > 1$. Also, $A_{11}$ is the identity matrix of order $n - 1$, so each term in (3.26) is zero except the first, which is equal to 1. Hence $f(I_1, \ldots, I_n) = 1$ so $f$ satisfies Axiom 4.

In the foregoing proof we could just as well have used a function $f$ defined in terms of the $k$th-column minors $A_{jk}$ instead of the first-column minors $A_{j1}$. In fact, if we let

$$f(A_1, \ldots, A_n) = \sum_{j=1}^{n} (-1)^{j+k} a_{jk} \det A_{jk}, \tag{3.28}$$

exactly the same type of proof shows that this $f$ satisfies all four axioms for a determinant function. Since determinant functions are unique, the expansion formulas in (3.28) and those in (3.21) are all equal to $\det A$.
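The inductive construction in Theorem 3.10 translates directly into a recursive procedure. The sketch below is a minimal Python rendering under stated assumptions: indices are 0-based, so the sign $(-1)^{j+1}$ of (3.26) becomes `(-1) ** j`; the function names and the sample matrix are illustrative and not from the text. It also evaluates the column expansions (3.28) and the row expansions (3.21) to confirm that they agree on this example.

```python
def minor(matrix, k, j):
    """Delete row k and column j (0-based) to obtain the k, j minor."""
    return [row[:j] + row[j + 1:] for i, row in enumerate(matrix) if i != k]

def det_first_column(matrix):
    """Formula (3.26): expand by first-column minors, recursing on order n - 1."""
    n = len(matrix)
    if n == 1:
        return matrix[0][0]  # det [a11] = a11
    # With 0-based j, (-1) ** j equals (-1)^(j+1) for the 1-based index.
    return sum((-1) ** j * matrix[j][0] * det_first_column(minor(matrix, j, 0))
               for j in range(n))

def det_by_column(matrix, k):
    """Formula (3.28): expansion by kth-column minors (k is 0-based here)."""
    return sum((-1) ** (j + k) * matrix[j][k] * det_first_column(minor(matrix, j, k))
               for j in range(len(matrix)))

def det_by_row(matrix, k):
    """Formula (3.21): expansion by kth-row minors (k is 0-based here)."""
    return sum((-1) ** (k + j) * matrix[k][j] * det_first_column(minor(matrix, k, j))
               for j in range(len(matrix)))

# Illustrative matrix (not from the text).
M = [[2, 1, 3], [0, 1, 4], [1, 0, 2]]
values = [det_first_column(M)]
values += [det_by_column(M, k) for k in range(3)]
values += [det_by_row(M, k) for k in range(3)]
print(values)
```

The recursion performs on the order of $n!$ multiplications, so it is useful for illustrating the existence proof rather than for computation; the elimination method of Example 4 is far cheaper for large $n$.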
The expansion formulas (3.28) not only establish the existence of determinant functions but also reveal a new aspect of the theory of determinants: a connection between row-properties and column-properties. This connection is discussed further in the next section.

3.14 The determinant of a transpose

Associated with each matrix $A$ is another matrix called the transpose of $A$ and denoted by $A^t$. The rows of $A^t$ are the columns of $A$. For example, if

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}, \quad \text{then} \quad A^t = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}.$$

A formal definition may be given as follows.

DEFINITION OF TRANSPOSE. The transpose of an $m \times n$ matrix $A = (a_{ij})_{i,j=1}^{m,n}$ is the $n \times m$ matrix $A^t$ whose $i, j$ entry is $a_{ji}$.

Although transposition can be applied to any rectangular matrix we shall be concerned primarily with square matrices. We prove next that transposition of a square matrix does not alter its determinant.

THEOREM 3.11. For any $n \times n$ matrix $A$ we have

$$\det A = \det A^t.$$

Proof. The proof is by induction on $n$. For $n = 1$ and $n = 2$ the result is easily verified. Assume, then, that the theorem is true for matrices of order $n - 1$. Let $A = (a_{ij})$ and let $B = A^t = (b_{ij})$. Expanding $\det A$ by its first-column minors and $\det B$ by its first-row minors we have

$$\det A = \sum_{j=1}^{n} (-1)^{j+1} a_{j1} \det A_{j1}, \qquad \det B = \sum_{j=1}^{n} (-1)^{j+1} b_{1j} \det B_{1j}.$$

But from the definition of transpose we have $b_{1j} = a_{j1}$ and $B_{1j} = (A_{j1})^t$. Since we are assuming the theorem is true for matrices of order $n - 1$ we have $\det B_{1j} = \det A_{j1}$. Hence the foregoing sums are equal term by term, so $\det A = \det B$.

3.15 The cofactor matrix

Theorem 3.5 showed that if $A$ is nonsingular then $\det A \ne 0$. The next theorem proves the converse. That is, if $\det A \ne 0$, then $A^{-1}$ exists. Moreover, it gives an explicit formula for expressing $A^{-1}$ in terms of a matrix formed from the cofactors of the entries of $A$.

In Theorem 3.9 we proved that the $i, j$ cofactor of $a_{ij}$ is equal to $(-1)^{i+j} \det A_{ij}$, where $A_{ij}$ is the $i, j$ minor of $A$. Let us denote this cofactor by $\operatorname{cof} a_{ij}$.
Thus, by definition,

$$\operatorname{cof} a_{ij} = (-1)^{i+j} \det A_{ij}.$$

DEFINITION OF THE COFACTOR MATRIX. The matrix whose $i, j$ entry is $\operatorname{cof} a_{ij}$ is called the cofactor matrix† of $A$ and is denoted by $\operatorname{cof} A$. Thus, we have

$$\operatorname{cof} A = (\operatorname{cof} a_{ij})_{i,j=1}^{n} = \bigl((-1)^{i+j} \det A_{ij}\bigr)_{i,j=1}^{n}.$$

† The transpose of the cofactor matrix is called the adjoint of $A$ in some of the older literature. Current nomenclature reserves the term adjoint for an entirely different object, discussed in Section 5.8.

The next theorem shows that the product of $A$ with the transpose of its cofactor matrix is, apart from a scalar factor, the identity matrix $I$.

THEOREM 3.12. For any $n \times n$ matrix $A$ with $n \ge 2$ we have

$$A (\operatorname{cof} A)^t = (\det A) I. \tag{3.29}$$

In particular, if $\det A \ne 0$ the inverse of $A$ exists and is given by

$$A^{-1} = \frac{1}{\det A} (\operatorname{cof} A)^t.$$

Proof. Using Theorem 3.9 we express $\det A$ in terms of its $k$th-row cofactors by the formula

$$\det A = \sum_{j=1}^{n} a_{kj} \operatorname{cof} a_{kj}. \tag{3.30}$$

Keep $k$ fixed and apply this relation to a new matrix $B$ whose $i$th row is equal to the $k$th row of $A$ for some $i \ne k$, and whose remaining rows are the same as those of $A$. Then $\det B = 0$ because the $i$th and $k$th rows of $B$ are equal. Expressing $\det B$ in terms of its $i$th-row cofactors we have

$$\det B = \sum_{j=1}^{n} b_{ij} \operatorname{cof} b_{ij} = 0. \tag{3.31}$$

But since the $i$th row of $B$ is equal to the $k$th row of $A$ we have

$$b_{ij} = a_{kj} \quad \text{and} \quad \operatorname{cof} b_{ij} = \operatorname{cof} a_{ij} \quad \text{for every } j.$$

Hence (3.31) states that

$$\sum_{j=1}^{n} a_{kj} \operatorname{cof} a_{ij} = 0 \quad \text{if } k \ne i. \tag{3.32}$$

Equations (3.30) and (3.32) together can be written as follows:

$$\sum_{j=1}^{n} a_{kj} \operatorname{cof} a_{ij} = \begin{cases} \det A & \text{if } i = k, \\ 0 & \text{if } i \ne k. \end{cases} \tag{3.33}$$

But the sum appearing on the left of (3.33) is the $k, i$ entry of the product $A(\operatorname{cof} A)^t$. Therefore (3.33) implies (3.29).

As a direct corollary of Theorems 3.5 and 3.12 we have the following necessary and sufficient condition for a square matrix to be nonsingular.

THEOREM 3.13. A square matrix $A$ is nonsingular if and only if $\det A \ne 0$.

3.16 Cramer's rule

Theorem 3.12 can also be used to give explicit formulas for the solutions of a system of linear equations with a nonsingular coefficient matrix. The formulas are called Cramer's rule, in honor of the Swiss mathematician Gabriel Cramer (1704-1752).

THEOREM 3.14. CRAMER'S RULE.
If a system of $n$ linear equations in $n$ unknowns $x_1, \ldots, x_n$,

$$\sum_{j=1}^{n} a_{ij} x_j = b_i \quad (i = 1, 2, \ldots, n),$$

has a nonsingular coefficient-matrix $A = (a_{ij})$, then there is a unique solution for the system given by the formulas

$$x_j = \frac{1}{\det A} \sum_{k=1}^{n} b_k \operatorname{cof} a_{kj}, \quad \text{for } j = 1, 2, \ldots, n. \tag{3.34}$$

Proof. The system can be written as a matrix equation,

$$AX = B,$$

where $X$ and $B$ are column matrices, $X = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$, $B = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}$. Since $A$ is nonsingular there is a unique solution $X$ given by

$$X = A^{-1} B = \frac{1}{\det A} (\operatorname{cof} A)^t B. \tag{3.35}$$

The formulas in (3.34) follow by equating components in (3.35).

It should be noted that the formula for $x_j$ in (3.34) can be expressed as the quotient of two determinants,

$$x_j = \frac{\det C_j}{\det A},$$

where $C_j$ is the matrix obtained from $A$ by replacing the $j$th column of $A$ by the column matrix $B$.

3.17 Exercises

1. Determine the cofactor matrix of each of the following matrices. [Matrix entries illegible in the source.]

2. Determine the inverse of each of the nonsingular matrices in Exercise 1.

3. Find all values of the scalar $\lambda$ for which the matrix $\lambda I - A$ is singular, if $A$ is equal to each of the following. [Matrix entries illegible in the source.]

4. If $A$ is an $n \times n$ matrix with $n \ge 2$, prove each of the following properties of its cofactor matrix:
(a) $\operatorname{cof}(A^t) = (\operatorname{cof} A)^t$.
(b) $(\operatorname{cof} A)^t A = (\det A) I$.
(c) $A(\operatorname{cof} A)^t = (\operatorname{cof} A)^t A$ ($A$ commutes with the transpose of its cofactor matrix).

5. Use Cramer's rule to solve each of the following systems:
(a) $x + 2y + 3z = 8$, $2x - y + 4z = 7$, $-y + z = 1$.
(b) $x + y + 2z = 0$, $3x - y - z = 3$, $2x + 5y + 3z = 4$.

6. (a) Explain why each of the following is a Cartesian equation for a straight line in the $xy$-plane passing through two distinct points $(x_1, y_1)$ and $(x_2, y_2)$:

$$\det \begin{bmatrix} x - x_1 & y - y_1 \\ x_2 - x_1 & y_2 - y_1 \end{bmatrix} = 0; \qquad \det \begin{bmatrix} x & y & 1 \\ x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \end{bmatrix} = 0.$$

(b) State and prove corresponding relations for a plane in 3-space passing through three distinct points.
(c) State and prove corresponding relations for a circle in the $xy$-plane passing through three noncollinear points.

7. Given $n^2$ functions $f_{ij}$, each differentiable on an interval $(a, b)$, define $F(x) = \det[f_{ij}(x)]$ for each $x$ in $(a, b)$.
Prove that the derivative $F'(x)$ is a sum of $n$ determinants,

    $F'(x) = \sum_{i=1}^{n} \det A_i(x),$

where $A_i(x)$ is the matrix obtained by differentiating the functions in the $i$th row of $[f_{ij}(x)]$.
8. An $n \times n$ matrix of functions of the form $W(x) = [u_j^{(i-1)}(x)]$, in which each row after the first is the derivative of the previous row, is called a Wronskian matrix in honor of the Polish mathematician J. M. H. Wronski (1778-1853). Prove that the derivative of the determinant of $W(x)$ is the determinant of the matrix obtained by differentiating each entry in the last row of $W(x)$. [Hint: Use Exercise 7.]

4 EIGENVALUES AND EIGENVECTORS

4.1 Linear transformations with diagonal matrix representations

Let $T: V \to V$ be a linear transformation on a finite-dimensional linear space $V$. Those properties of $T$ which are independent of any coordinate system (basis) for $V$ are called intrinsic properties of $T$. They are shared by all the matrix representations of $T$. If a basis can be chosen so that the resulting matrix has a particularly simple form it may be possible to detect some of the intrinsic properties directly from the matrix representation.

Among the simplest types of matrices are the diagonal matrices. Therefore we might ask whether every linear transformation has a diagonal matrix representation. In Chapter 2 we treated the problem of finding a diagonal matrix representation of a linear transformation $T: V \to W$, where $\dim V = n$ and $\dim W = m$. In Theorem 2.14 we proved that there always exists a basis $(e_1, \ldots, e_n)$ for $V$ and a basis $(w_1, \ldots, w_m)$ for $W$ such that the matrix of $T$ relative to this pair of bases is a diagonal matrix. In particular, if $W = V$ the matrix will be a square diagonal matrix. The new feature now is that we want to use the same basis for both $V$ and $W$. With this restriction it is not always possible to find a diagonal matrix representation for $T$. We turn, then, to the problem of determining which transformations do have a diagonal matrix representation.
Notation: If $A = (a_{ij})$ is a diagonal matrix, we write $A = \operatorname{diag}(a_{11}, a_{22}, \ldots, a_{nn})$.

It is easy to give a necessary and sufficient condition for a linear transformation to have a diagonal matrix representation.

THEOREM 4.1. Given a linear transformation $T: V \to V$, where $\dim V = n$. If $T$ has a diagonal matrix representation, then there exists an independent set of elements $u_1, \ldots, u_n$ in $V$ and a corresponding set of scalars $\lambda_1, \ldots, \lambda_n$ such that

(4.1)    $T(u_k) = \lambda_k u_k$  for $k = 1, 2, \ldots, n$.

Conversely, if there is an independent set $u_1, \ldots, u_n$ in $V$ and a corresponding set of scalars $\lambda_1, \ldots, \lambda_n$ satisfying (4.1), then the matrix $A = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$ is a representation of $T$ relative to the basis $(u_1, \ldots, u_n)$.

Proof. Assume first that $T$ has a diagonal matrix representation $A = (a_{ik})$ relative to some basis $(e_1, \ldots, e_n)$. The action of $T$ on the basis elements is given by the formula

    $T(e_k) = \sum_{i=1}^{n} a_{ik} e_i = a_{kk} e_k$

since $a_{ik} = 0$ for $i \ne k$. This proves (4.1) with $u_k = e_k$ and $\lambda_k = a_{kk}$.

Now suppose independent elements $u_1, \ldots, u_n$ and scalars $\lambda_1, \ldots, \lambda_n$ exist satisfying (4.1). Since $u_1, \ldots, u_n$ are independent they form a basis for $V$. If we define $a_{kk} = \lambda_k$ and $a_{ik} = 0$ for $i \ne k$, then the matrix $A = (a_{ik})$ is a diagonal matrix which represents $T$ relative to the basis $(u_1, \ldots, u_n)$.

Thus the problem of finding a diagonal matrix representation of a linear transformation has been transformed to another problem, that of finding independent elements $u_1, \ldots, u_n$ and scalars $\lambda_1, \ldots, \lambda_n$ to satisfy (4.1). Elements $u_k$ and scalars $\lambda_k$ satisfying (4.1) are called eigenvectors and eigenvalues of $T$, respectively. In the next section we study eigenvectors and eigenvalues† in a more general setting.

4.2 Eigenvectors and eigenvalues of a linear transformation

In this discussion $V$ denotes a linear space and $S$ denotes a subspace of $V$. The spaces $S$ and $V$ are not required to be finite dimensional.

DEFINITION. Let $T: S \to V$ be a linear transformation of $S$ into $V$.
A scalar $\lambda$ is called an eigenvalue of $T$ if there is a nonzero element $x$ in $S$ such that

(4.2)    $T(x) = \lambda x.$

The element $x$ is called an eigenvector of $T$ belonging to $\lambda$. The scalar $\lambda$ is called an eigenvalue corresponding to $x$.

There is exactly one eigenvalue corresponding to a given eigenvector $x$. In fact, if we have $T(x) = \lambda x$ and $T(x) = \mu x$ for some $x \ne O$, then $\lambda x = \mu x$ so $\lambda = \mu$.

Note: Although Equation (4.2) always holds for $x = O$ and any scalar $\lambda$, the definition excludes $O$ as an eigenvector. One reason for this prejudice against $O$ is to have exactly one eigenvalue $\lambda$ associated with a given eigenvector $x$.

The following examples illustrate the meaning of these concepts.

EXAMPLE 1. Multiplication by a fixed scalar. Let $T: S \to V$ be the linear transformation defined by the equation $T(x) = cx$ for each $x$ in $S$, where $c$ is a fixed scalar. In this example every nonzero element of $S$ is an eigenvector belonging to the scalar $c$.

† The words eigenvector and eigenvalue are partial translations of the German words Eigenvektor and Eigenwert, respectively. Some authors use the terms characteristic vector, or proper vector as synonyms for eigenvector. Eigenvalues are also called characteristic values, proper values, or latent roots.

EXAMPLE 2. The eigenspace $E(\lambda)$ consisting of all $x$ such that $T(x) = \lambda x$. Let $T: S \to V$ be a linear transformation having an eigenvalue $\lambda$. Let $E(\lambda)$ be the set of all elements $x$ in $S$ such that $T(x) = \lambda x$. This set contains the zero element $O$ and all eigenvectors belonging to $\lambda$. It is easy to prove that $E(\lambda)$ is a subspace of $S$, because if $x$ and $y$ are in $E(\lambda)$ we have

    $T(ax + by) = aT(x) + bT(y) = a\lambda x + b\lambda y = \lambda(ax + by)$

for all scalars $a$ and $b$. Hence $(ax + by) \in E(\lambda)$ so $E(\lambda)$ is a subspace. The space $E(\lambda)$ is called the eigenspace corresponding to $\lambda$. It may be finite- or infinite-dimensional. If $E(\lambda)$ is finite-dimensional then $\dim E(\lambda) \ge 1$, since $E(\lambda)$ contains at least one nonzero element $x$ corresponding to $\lambda$.

EXAMPLE 3. Existence of zero eigenvalues.
If an eigenvector exists it cannot be zero, by definition. However, the zero scalar can be an eigenvalue. In fact, if 0 is an eigenvalue for $x$ then $T(x) = 0x = O$, so $x$ is in the null space of $T$. Conversely, if the null space of $T$ contains any nonzero elements then each of these is an eigenvector with eigenvalue 0. In general, $E(\lambda)$ is the null space of $T - \lambda I$.

EXAMPLE 4. Reflection in the $xy$-plane. Let $S = V = V_3(\mathbf{R})$ and let $T$ be a reflection in the $xy$-plane. That is, let $T$ act on the basis vectors $i, j, k$ as follows: $T(i) = i$, $T(j) = j$, $T(k) = -k$. Every nonzero vector in the $xy$-plane is an eigenvector with eigenvalue 1. The remaining eigenvectors are those of the form $ck$, where $c \ne 0$; each of them has eigenvalue $-1$.

EXAMPLE 5. Rotation of the plane through a fixed angle $\alpha$. This example is of special interest because it shows that the existence of eigenvectors may depend on the underlying field of scalars. The plane can be regarded as a linear space in two different ways: (1) as a 2-dimensional real linear space, $V = V_2(\mathbf{R})$, with two basis elements $(1, 0)$ and $(0, 1)$, and with real numbers as scalars; or (2) as a 1-dimensional complex linear space, $V = V_1(\mathbf{C})$, with one basis element 1, and complex numbers as scalars.

Consider the second interpretation first. Each element $z \ne 0$ of $V_1(\mathbf{C})$ can be expressed in polar form, $z = re^{i\theta}$. If $T$ rotates $z$ through an angle $\alpha$ then $T(z) = re^{i(\theta + \alpha)} = e^{i\alpha} z$. Thus, each $z \ne 0$ is an eigenvector with eigenvalue $\lambda = e^{i\alpha}$. Note that the eigenvalue $e^{i\alpha}$ is not real unless $\alpha$ is an integer multiple of $\pi$.

Now consider the plane as a real linear space, $V_2(\mathbf{R})$. Since the scalars of $V_2(\mathbf{R})$ are real numbers the rotation $T$ has real eigenvalues only if $\alpha$ is an integer multiple of $\pi$. In other words, if $\alpha$ is not an integer multiple of $\pi$ then $T$ has no real eigenvalues and hence no eigenvectors. Thus the existence of eigenvectors and eigenvalues may depend on the choice of scalars for $V$.

EXAMPLE 6. The differentiation operator.
Let $V$ be the linear space of all real functions $f$ having derivatives of every order on a given open interval. Let $D$ be the linear transformation which maps each $f$ onto its derivative, $D(f) = f'$. The eigenvectors of $D$ are those nonzero functions $f$ satisfying an equation of the form

    $f' = \lambda f$

for some real $\lambda$. This is a first order linear differential equation. All its solutions are given by the formula

    $f(x) = ce^{\lambda x},$

where $c$ is an arbitrary real constant. Therefore the eigenvectors of $D$ are all exponential functions $f(x) = ce^{\lambda x}$ with $c \ne 0$. The eigenvalue corresponding to $f(x) = ce^{\lambda x}$ is $\lambda$. In examples like this one where $V$ is a function space the eigenvectors are called eigenfunctions.

EXAMPLE 7. The integration operator. Let $V$ be the linear space of all real functions continuous on a finite interval $[a, b]$. If $f \in V$ define $g = T(f)$ to be that function given by

    $g(x) = \int_a^x f(t)\,dt$  if $a \le x \le b$.

The eigenfunctions of $T$ (if any exist) are those nonzero $f$ satisfying an equation of the form

(4.3)    $\int_a^x f(t)\,dt = \lambda f(x)$

for some real $\lambda$. If an eigenfunction exists we may differentiate this equation to obtain the relation $f(x) = \lambda f'(x)$, from which we find $f(x) = ce^{x/\lambda}$, provided $\lambda \ne 0$. (If $\lambda = 0$, differentiating (4.3) gives $f(x) = 0$ for all $x$, so $f$ would be the zero function.) In other words, the only candidates for eigenfunctions are those exponential functions of the form $f(x) = ce^{x/\lambda}$ with $c \ne 0$ and $\lambda \ne 0$. However, if we put $x = a$ in (4.3) we obtain

    $0 = \lambda f(a) = \lambda c e^{a/\lambda}.$

Since $e^{a/\lambda}$ is never zero we see that the equation $T(f) = \lambda f$ cannot be satisfied with a nonzero $f$, so $T$ has no eigenfunctions and no eigenvalues.

EXAMPLE 8. The subspace spanned by an eigenvector. Let $T: S \to V$ be a linear transformation having an eigenvalue $\lambda$. Let $x$ be an eigenvector belonging to $\lambda$ and let $L(x)$ be the subspace spanned by $x$. That is, $L(x)$ is the set of all scalar multiples of $x$. It is easy to show that $T$ maps $L(x)$ into itself. In fact, if $y = cx$ we have

    $T(y) = T(cx) = cT(x) = c(\lambda x) = \lambda(cx) = \lambda y.$
If $c \ne 0$ then $y \ne O$ so every nonzero element $y$ of $L(x)$ is also an eigenvector belonging to $\lambda$.

A subspace $U$ of $S$ is called invariant under $T$ if $T$ maps each element of $U$ onto an element of $U$. We have just shown that the subspace spanned by an eigenvector is invariant under $T$.

4.3 Linear independence of eigenvectors corresponding to distinct eigenvalues

One of the most important properties of eigenvalues is described in the next theorem. As before, $S$ denotes a subspace of a linear space $V$.

THEOREM 4.2. Let $u_1, \ldots, u_k$ be eigenvectors of a linear transformation $T: S \to V$, and assume that the corresponding eigenvalues $\lambda_1, \ldots, \lambda_k$ are distinct. Then the eigenvectors $u_1, \ldots, u_k$ are independent.

Proof. The proof is by induction on $k$. The result is trivial when $k = 1$. Assume, then, that it has been proved for every set of $k - 1$ eigenvectors. Let $u_1, \ldots, u_k$ be $k$ eigenvectors belonging to distinct eigenvalues, and assume that scalars $c_i$ exist such that

(4.4)    $\sum_{i=1}^{k} c_i u_i = O.$

Applying $T$ to both members of (4.4) and using the fact that $T(u_i) = \lambda_i u_i$ we find

(4.5)    $\sum_{i=1}^{k} c_i \lambda_i u_i = O.$

Multiplying (4.4) by $\lambda_k$ and subtracting from (4.5) we obtain the equation

    $\sum_{i=1}^{k-1} c_i(\lambda_i - \lambda_k) u_i = O.$

But since $u_1, \ldots, u_{k-1}$ are independent we must have $c_i(\lambda_i - \lambda_k) = 0$ for each $i = 1, 2, \ldots, k - 1$. Since the eigenvalues are distinct we have $\lambda_i \ne \lambda_k$ for $i \ne k$ so $c_i = 0$ for $i = 1, 2, \ldots, k - 1$. From (4.4) we see that $c_k$ is also 0 so the eigenvectors $u_1, \ldots, u_k$ are independent.

Note that Theorem 4.2 would not be true if the zero element were allowed to be an eigenvector. This is another reason for excluding $O$ as an eigenvector.

Warning: The converse of Theorem 4.2 does not hold. That is, if $T$ has independent eigenvectors $u_1, \ldots, u_k$ then the corresponding eigenvalues $\lambda_1, \ldots, \lambda_k$ need not be distinct. For example, if $T$ is the identity transformation, $T(x) = x$ for all $x$, then every $x \ne O$ is an eigenvector but there is only one eigenvalue, $\lambda = 1$.

Theorem 4.2 has important consequences for the finite-dimensional case.

THEOREM 4.3.
If $\dim V = n$, every linear transformation $T: V \to V$ has at most $n$ distinct eigenvalues. If $T$ has exactly $n$ distinct eigenvalues, then the corresponding eigenvectors form a basis for $V$ and the matrix of $T$ relative to this basis is a diagonal matrix with the eigenvalues as diagonal entries.

Proof. If there were $n + 1$ distinct eigenvalues then, by Theorem 4.2, $V$ would contain $n + 1$ independent elements. This is not possible since $\dim V = n$. The second assertion follows from Theorems 4.1 and 4.2.

Note: Theorem 4.3 tells us that the existence of $n$ distinct eigenvalues is a sufficient condition for $T$ to have a diagonal matrix representation. This condition is not necessary. There exist linear transformations with fewer than $n$ distinct eigenvalues that can be represented by diagonal matrices. The identity transformation is an example. All its eigenvalues are equal to 1 but it can be represented by the identity matrix. Theorem 4.1 tells us that the existence of $n$ independent eigenvectors is necessary and sufficient for $T$ to have a diagonal matrix representation.

4.4 Exercises

1. (a) If $T$ has an eigenvalue $\lambda$, prove that $aT$ has the eigenvalue $a\lambda$.
   (b) If $x$ is an eigenvector for both $T_1$ and $T_2$, prove that $x$ is also an eigenvector for $aT_1 + bT_2$. How are the eigenvalues related?
2. Assume $T: V \to V$ has an eigenvector $x$ belonging to an eigenvalue $\lambda$. Prove that $x$ is an eigenvector of $T^2$ belonging to $\lambda^2$ and, more generally, $x$ is an eigenvector of $T^n$ belonging to $\lambda^n$. Then use the result of Exercise 1 to show that if $P$ is a polynomial, then $x$ is an eigenvector of $P(T)$ belonging to $P(\lambda)$.
3. Consider the plane as a real linear space, $V = V_2(\mathbf{R})$, and let $T$ be a rotation of $V$ through an angle of $\pi/2$ radians. Although $T$ has no eigenvectors, prove that every nonzero vector is an eigenvector for $T^2$.
4. If $T: V \to V$ has the property that $T^2$ has a nonnegative eigenvalue $\lambda^2$, prove that at least one of $\lambda$ or $-\lambda$ is an eigenvalue for $T$. [Hint: $T^2 - \lambda^2 I = (T + \lambda I)(T - \lambda I)$.]
5. Let $V$ be the linear space of all real functions differentiable on $(0, 1)$. If $f \in V$, define $g = T(f)$ to mean that $g(t) = t f'(t)$ for all $t$ in $(0, 1)$. Prove that every real $\lambda$ is an eigenvalue for $T$, and determine the eigenfunctions corresponding to $\lambda$.
6. Let $V$ be the linear space of all real polynomials $p(x)$ of degree [text lost in the source: the rest of Exercise 6, Exercises 7 through 9, and the opening of Exercise 10 are missing]
10. [...] $V$ as follows: If $x = \{x_n\}$ is a convergent sequence with limit $a$, let $T(x) = \{y_n\}$, where $y_n = a - x_n$ for $n \ge 1$. Prove that $T$ has only two eigenvalues, $\lambda = 0$ and $\lambda = -1$, and determine the eigenvectors belonging to each such $\lambda$.
11. Assume that a linear transformation $T$ has two eigenvectors $x$ and $y$ belonging to distinct eigenvalues $\lambda$ and $\mu$. If $ax + by$ is an eigenvector of $T$, prove that $a = 0$ or $b = 0$.
12. Let $T: S \to V$ be a linear transformation such that every nonzero element of $S$ is an eigenvector. Prove that there exists a scalar $c$ such that $T(x) = cx$. In other words, the only transformation with this property is a scalar times the identity. [Hint: Use Exercise 11.]

4.5 The finite-dimensional case. Characteristic polynomials

If $\dim V = n$, the problem of finding the eigenvalues of a linear transformation $T: V \to V$ can be solved with the help of determinants. We wish to find those scalars $\lambda$ such that the equation $T(x) = \lambda x$ has a solution $x$ with $x \ne O$. The equation $T(x) = \lambda x$ can be written in the form

    $(\lambda I - T)(x) = O,$

where $I$ is the identity transformation. If we let $T_\lambda = \lambda I - T$, then $\lambda$ is an eigenvalue if and only if the equation

(4.6)    $T_\lambda(x) = O$

has a nonzero solution $x$, in which case $T_\lambda$ is not invertible (because of Theorem 2.10). Therefore, by Theorem 2.20, a nonzero solution of (4.6) exists if and only if the matrix of $T_\lambda$ is singular. If $A$ is a matrix representation for $T$, then $\lambda I - A$ is a matrix representation for $T_\lambda$. By Theorem 3.13, the matrix $\lambda I - A$ is singular if and only if $\det(\lambda I - A) = 0$. Thus, if $\lambda$ is an eigenvalue for $T$ it satisfies the equation

(4.7)    $\det(\lambda I - A) = 0.$
Conversely, any $\lambda$ in the underlying field of scalars which satisfies (4.7) is an eigenvalue. This suggests that we should study the determinant $\det(\lambda I - A)$ as a function of $\lambda$.

THEOREM 4.4. If $A$ is any $n \times n$ matrix and if $I$ is the $n \times n$ identity matrix, the function $f$ defined by the equation

    $f(\lambda) = \det(\lambda I - A)$

is a polynomial in $\lambda$ of degree $n$. Moreover, the term of highest degree is $\lambda^n$, and the constant term is $f(0) = \det(-A) = (-1)^n \det A$.

Proof. The statement $f(0) = \det(-A)$ follows at once from the definition of $f$. We prove that $f$ is a polynomial of degree $n$ only for the case $n \le 3$. The proof in the general case can be given by induction and is left as an exercise. (See Exercise 9 in Section 4.8.)

For $n = 1$ the determinant is the linear polynomial $f(\lambda) = \lambda - a_{11}$. For $n = 2$ we have

    $\det(\lambda I - A) = \begin{vmatrix} \lambda - a_{11} & -a_{12} \\ -a_{21} & \lambda - a_{22} \end{vmatrix} = (\lambda - a_{11})(\lambda - a_{22}) - a_{12}a_{21} = \lambda^2 - (a_{11} + a_{22})\lambda + (a_{11}a_{22} - a_{12}a_{21}),$

a quadratic polynomial in $\lambda$. For $n = 3$ we have

    $\det(\lambda I - A) = \begin{vmatrix} \lambda - a_{11} & -a_{12} & -a_{13} \\ -a_{21} & \lambda - a_{22} & -a_{23} \\ -a_{31} & -a_{32} & \lambda - a_{33} \end{vmatrix}$
    $= (\lambda - a_{11}) \begin{vmatrix} \lambda - a_{22} & -a_{23} \\ -a_{32} & \lambda - a_{33} \end{vmatrix} + a_{12} \begin{vmatrix} -a_{21} & -a_{23} \\ -a_{31} & \lambda - a_{33} \end{vmatrix} - a_{13} \begin{vmatrix} -a_{21} & \lambda - a_{22} \\ -a_{31} & -a_{32} \end{vmatrix}.$

The last two terms are linear polynomials in $\lambda$. The first term is a cubic polynomial, the term of highest degree being $\lambda^3$.

DEFINITION. If $A$ is an $n \times n$ matrix the determinant

    $f(\lambda) = \det(\lambda I - A)$

is called the characteristic polynomial of $A$.

The roots of the characteristic polynomial of $A$ are complex numbers, some of which may be real. If we let $F$ denote either the real field $\mathbf{R}$ or the complex field $\mathbf{C}$, we have the following theorem.

THEOREM 4.5. Let $T: V \to V$ be a linear transformation, where $V$ has scalars in $F$, and $\dim V = n$. Let $A$ be a matrix representation of $T$. Then the set of eigenvalues of $T$ consists of those roots of the characteristic polynomial of $A$ which lie in $F$.

Proof.
The discussion preceding Theorem 4.4 shows that every eigenvalue of $T$ satisfies the equation $\det(\lambda I - A) = 0$ and that any root of the characteristic polynomial of $A$ which lies in $F$ is an eigenvalue of $T$.

The matrix $A$ depends on the choice of basis for $V$, but the eigenvalues of $T$ were defined without reference to a basis. Therefore, the set of roots of the characteristic polynomial of $A$ must be independent of the choice of basis. More than this is true. In a later section we shall prove that the characteristic polynomial itself is independent of the choice of basis. We turn now to the problem of actually calculating the eigenvalues and eigenvectors in the finite-dimensional case.

4.6 Calculation of eigenvalues and eigenvectors in the finite-dimensional case

In the finite-dimensional case the eigenvalues and eigenvectors of a linear transformation $T$ are also called eigenvalues and eigenvectors of each matrix representation of $T$. Thus, the eigenvalues of a square matrix $A$ are the roots of the characteristic polynomial $f(\lambda) = \det(\lambda I - A)$. The eigenvectors corresponding to an eigenvalue $\lambda$ are those nonzero vectors $X = (x_1, \ldots, x_n)$ regarded as $n \times 1$ column matrices satisfying the matrix equation

    $AX = \lambda X$,  or  $(\lambda I - A)X = O$.

This is a system of $n$ linear equations for the components $x_1, \ldots, x_n$. Once we know $\lambda$ we can obtain the eigenvectors by solving this system. We illustrate with three examples that exhibit different features.

EXAMPLE 1. A matrix with all its eigenvalues distinct. The matrix

    $A = \begin{bmatrix} 2 & 1 & 1 \\ 2 & 3 & 4 \\ -1 & -1 & -2 \end{bmatrix}$

has the characteristic polynomial

    $\det(\lambda I - A) = \det \begin{bmatrix} \lambda - 2 & -1 & -1 \\ -2 & \lambda - 3 & -4 \\ 1 & 1 & \lambda + 2 \end{bmatrix} = (\lambda - 1)(\lambda + 1)(\lambda - 3),$

so there are three distinct eigenvalues: 1, $-1$, and 3. To find the eigenvectors corresponding to $\lambda = 1$ we solve the system $AX = X$, or

    $\begin{bmatrix} 2 & 1 & 1 \\ 2 & 3 & 4 \\ -1 & -1 & -2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}.$

This gives us

    $2x_1 + x_2 + x_3 = x_1$
    $2x_1 + 3x_2 + 4x_3 = x_2$
    $-x_1 - x_2 - 2x_3 = x_3,$

which can be rewritten as

    $x_1 + x_2 + x_3 = 0$
    $2x_1 + 2x_2 + 4x_3 = 0$
    $-x_1 - x_2 - 3x_3 = 0.$

Adding the first and third equations we find $x_3 = 0$, and all three equations then reduce to $x_1 + x_2 = 0$. Thus the eigenvectors corresponding to $\lambda = 1$ are $X = t(1, -1, 0)$, where $t$ is any nonzero scalar.

By similar calculations we find the eigenvectors $X = t(0, 1, -1)$ corresponding to $\lambda = -1$, and $X = t(2, 3, -1)$ corresponding to $\lambda = 3$, with $t$ any nonzero scalar. Since the eigenvalues are distinct the corresponding eigenvectors $(1, -1, 0)$, $(0, 1, -1)$, and $(2, 3, -1)$ are independent. The results can be summarized in tabular form as follows. In the third column we have listed the dimension of the eigenspace $E(\lambda)$.

    Eigenvalue $\lambda$    Eigenvectors                  $\dim E(\lambda)$
    1                       $t(1, -1, 0)$, $t \ne 0$      1
    $-1$                    $t(0, 1, -1)$, $t \ne 0$      1
    3                       $t(2, 3, -1)$, $t \ne 0$      1

EXAMPLE 2. A matrix with repeated eigenvalues. The matrix

    $A = \begin{bmatrix} 2 & -1 & 1 \\ 0 & 3 & -1 \\ 2 & 1 & 3 \end{bmatrix}$

has the characteristic polynomial

    $\det(\lambda I - A) = \det \begin{bmatrix} \lambda - 2 & 1 & -1 \\ 0 & \lambda - 3 & 1 \\ -2 & -1 & \lambda - 3 \end{bmatrix} = (\lambda - 2)(\lambda - 2)(\lambda - 4).$

The eigenvalues are 2, 2, and 4. (We list the eigenvalue 2 twice to emphasize that it is a double root of the characteristic polynomial.) To find the eigenvectors corresponding to $\lambda = 2$ we solve the system $AX = 2X$, which reduces to

    $-x_2 + x_3 = 0$
    $x_2 - x_3 = 0$
    $2x_1 + x_2 + x_3 = 0.$

This has the solution $x_2 = x_3 = -x_1$ so the eigenvectors corresponding to $\lambda = 2$ are $t(-1, 1, 1)$, where $t \ne 0$. Similarly we find the eigenvectors $t(1, -1, 1)$ corresponding to the eigenvalue $\lambda = 4$. The results can be summarized as follows:

    Eigenvalue $\lambda$    Eigenvectors                  $\dim E(\lambda)$
    2, 2                    $t(-1, 1, 1)$, $t \ne 0$      1
    4                       $t(1, -1, 1)$, $t \ne 0$      1

EXAMPLE 3. Another matrix with repeated eigenvalues. The matrix

    $A = \begin{bmatrix} 2 & 1 & 1 \\ 2 & 3 & 2 \\ 3 & 3 & 4 \end{bmatrix}$

has the characteristic polynomial $(\lambda - 1)(\lambda - 1)(\lambda - 7)$. When $\lambda = 7$ the system $AX = 7X$ becomes

    $5x_1 - x_2 - x_3 = 0$
    $-2x_1 + 4x_2 - 2x_3 = 0$
    $-3x_1 - 3x_2 + 3x_3 = 0.$

This has the solution $x_2 = 2x_1$, $x_3 = 3x_1$, so the eigenvectors corresponding to $\lambda = 7$ are $t(1, 2, 3)$, where $t \ne 0$. For the eigenvalue $\lambda = 1$, the system $AX = X$ consists of the equation

    $x_1 + x_2 + x_3 = 0$

repeated three times.
To solve this equation we may take $x_1 = a$, $x_2 = b$, where $a$ and $b$ are arbitrary, and then take $x_3 = -a - b$. Thus every eigenvector corresponding to $\lambda = 1$ has the form

    $(a, b, -a - b) = a(1, 0, -1) + b(0, 1, -1),$

where $a \ne 0$ or $b \ne 0$. This means that the vectors $(1, 0, -1)$ and $(0, 1, -1)$ form a basis for $E(1)$. Hence $\dim E(\lambda) = 2$ when $\lambda = 1$. The results can be summarized as follows:

    Eigenvalue $\lambda$    Eigenvectors                                           $\dim E(\lambda)$
    7                       $t(1, 2, 3)$, $t \ne 0$                                1
    1, 1                    $a(1, 0, -1) + b(0, 1, -1)$, $a$, $b$ not both 0       2

Note that in this example there are three independent eigenvectors but only two distinct eigenvalues.

4.7 Trace of a matrix

Let $f(\lambda)$ be the characteristic polynomial of an $n \times n$ matrix $A$. We denote the $n$ roots of $f(\lambda)$ by $\lambda_1, \ldots, \lambda_n$, with each root written as often as its multiplicity indicates. Then we have the factorization

    $f(\lambda) = (\lambda - \lambda_1) \cdots (\lambda - \lambda_n).$

We can also write $f(\lambda)$ in decreasing powers of $\lambda$ as follows,

    $f(\lambda) = \lambda^n + c_{n-1}\lambda^{n-1} + \cdots + c_1\lambda + c_0.$

Comparing this with the factored form we find that the constant term $c_0$ and the coefficient of $\lambda^{n-1}$ are given by the formulas

    $c_0 = (-1)^n \lambda_1 \cdots \lambda_n$  and  $c_{n-1} = -(\lambda_1 + \cdots + \lambda_n).$

Since we also have $c_0 = (-1)^n \det A$, we see that

    $\lambda_1 \cdots \lambda_n = \det A.$

That is, the product of the roots of the characteristic polynomial of $A$ is equal to the determinant of $A$.

The sum of the roots of $f(\lambda)$ is called the trace of $A$, denoted by $\operatorname{tr} A$. Thus, by definition,

    $\operatorname{tr} A = \sum_{i=1}^{n} \lambda_i.$

The coefficient of $\lambda^{n-1}$ is given by $c_{n-1} = -\operatorname{tr} A$. We can also compute this coefficient from the determinant form for $f(\lambda)$ and we find that

    $c_{n-1} = -(a_{11} + \cdots + a_{nn}).$

(A proof of this formula is requested in Exercise 12 of Section 4.8.) The two formulas for $c_{n-1}$ show that

    $\operatorname{tr} A = \sum_{i=1}^{n} a_{ii}.$

That is, the trace of $A$ is also equal to the sum of the diagonal elements of $A$.

Since the sum of the diagonal elements is easy to compute it can be used as a numerical check in calculations of eigenvalues. Further properties of the trace are described in the next set of exercises.

4.8 Exercises

Determine the eigenvalues and eigenvectors of each of the matrices in Exercises 1 through 3. Also, for each eigenvalue $\lambda$ compute the dimension of the eigenspace $E(\lambda)$.

1.-3. [Matrices illegible in the source. Surviving fragments: the condition $a > 0$, $b > 0$ in Exercise 2, and entries involving $\cos\theta$ and $\sin\theta$.]
4. The matrices $P_1 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$, $P_2 = \begin{bmatrix} 0 & -i \\ i & 0 \end{bmatrix}$, $P_3 = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$ occur in the quantum mechanical theory of electron spin and are called Pauli spin matrices, in honor of the physicist Wolfgang Pauli (1900-1958). Verify that they all have eigenvalues 1 and $-1$. Then determine all $2 \times 2$ matrices with complex entries having the two eigenvalues 1 and $-1$.
5. Determine all $2 \times 2$ matrices with real entries whose eigenvalues are (a) real and distinct, (b) real and equal, (c) complex conjugates.
6. Determine $a, b, c, d, e, f$, given that the vectors $(1, 1, 1)$, $(1, 0, -1)$, and $(1, -1, 0)$ are eigenvectors of the matrix whose second and third rows are $(a, b, c)$ and $(d, e, f)$. [First row illegible in the source.]
7. Calculate the eigenvalues and eigenvectors of each of the following matrices. Also, compute the dimension of the eigenspace $E(\lambda)$ for each eigenvalue $\lambda$. [Matrices illegible in the source.]
8. Calculate the eigenvalues of each of the five matrices [five $4 \times 4$ matrices, illegible in the source]. These are called Dirac matrices in honor of Paul A. M. Dirac (1902-1984), the English physicist. They occur in the solution of the relativistic wave equation in quantum mechanics.
9. If $A$ and $B$ are $n \times n$ matrices, with $B$ a diagonal matrix, prove (by induction) that the determinant $f(\lambda) = \det(\lambda B - A)$ is a polynomial in $\lambda$ with $f(0) = (-1)^n \det A$, and with the coefficient of $\lambda^n$ equal to the product of the diagonal entries of $B$.
10. Prove that a square matrix $A$ and its transpose $A^t$ have the same characteristic polynomial.
11. If $A$ and $B$ are $n \times n$ matrices, with $A$ nonsingular, prove that $AB$ and $BA$ have the same set of eigenvalues. Note: It can be shown that $AB$ and $BA$ have the same characteristic polynomial, even if $A$ is singular, but you are not required to prove this.
12. Let $A$ be an $n \times n$ matrix with characteristic polynomial $f(\lambda)$. Prove (by induction) that the coefficient of $\lambda^{n-1}$ in $f(\lambda)$ is $-\operatorname{tr} A$.
13.
Let $A$ and $B$ be $n \times n$ matrices with $\det A = \det B$ and $\operatorname{tr} A = \operatorname{tr} B$. Prove that $A$ and $B$ have the same characteristic polynomial if $n = 2$ but that this need not be the case if $n > 2$.
14. Prove each of the following statements about the trace.
    (a) $\operatorname{tr}(A + B) = \operatorname{tr} A + \operatorname{tr} B$.
    (b) $\operatorname{tr}(cA) = c \operatorname{tr} A$.
    (c) $\operatorname{tr}(AB) = \operatorname{tr}(BA)$.
    (d) $\operatorname{tr} A^t = \operatorname{tr} A$.

4.9 Matrices representing the same linear transformation. Similar matrices

In this section we prove that two different matrix representations of a linear transformation have the same characteristic polynomial. To do this we investigate more closely the relation between matrices which represent the same transformation.

Let us recall how matrix representations are defined. Suppose $T: V \to W$ is a linear mapping of an $n$-dimensional space $V$ into an $m$-dimensional space $W$. Let $(e_1, \ldots, e_n)$ and $(w_1, \ldots, w_m)$ be ordered bases for $V$ and $W$ respectively. The matrix representation of $T$ relative to this choice of bases is the $m \times n$ matrix whose columns consist of the components of $T(e_1), \ldots, T(e_n)$ relative to the basis $(w_1, \ldots, w_m)$. Different matrix representations arise from different choices of the bases.

We consider now the case in which $V = W$, and we assume that the same ordered basis $(e_1, \ldots, e_n)$ is used for both $V$ and $W$. Let $A = (a_{ik})$ be the matrix of $T$ relative to this basis. This means that we have

(4.8)    $T(e_k) = \sum_{i=1}^{n} a_{ik} e_i$  for $k = 1, 2, \ldots, n$.

Now choose another ordered basis $(u_1, \ldots, u_n)$ for both $V$ and $W$ and let $B = (b_{kj})$ be the matrix of $T$ relative to this new basis. Then we have

(4.9)    $T(u_j) = \sum_{k=1}^{n} b_{kj} u_k$  for $j = 1, 2, \ldots, n$.

Since each $u_j$ is in the space spanned by $e_1, \ldots, e_n$ we can write

(4.10)    $u_j = \sum_{k=1}^{n} c_{kj} e_k$  for $j = 1, 2, \ldots, n$,

for some set of scalars $c_{kj}$. The $n \times n$ matrix $C = (c_{kj})$ determined by these scalars is nonsingular because it represents a linear transformation which maps a basis of $V$ onto another basis of $V$.
Applying $T$ to both members of (4.10) we also have the equations

(4.11)    $T(u_j) = \sum_{k=1}^{n} c_{kj} T(e_k)$  for $j = 1, 2, \ldots, n$.

The systems of equations in (4.8) through (4.11) can be written more simply in matrix form by introducing matrices with vector entries. Let

    $E = [e_1, \ldots, e_n]$  and  $U = [u_1, \ldots, u_n]$

be $1 \times n$ row matrices whose entries are the basis elements in question. Then the set of equations in (4.10) can be written as a single matrix equation,

(4.12)    $U = EC.$

Similarly, if we introduce

    $E' = [T(e_1), \ldots, T(e_n)]$  and  $U' = [T(u_1), \ldots, T(u_n)],$

Equations (4.8), (4.9), and (4.11) become, respectively,

(4.13)    $E' = EA$,  $U' = UB$,  $U' = E'C$.

From (4.12) we also have $E = UC^{-1}$.

To find the relation between $A$ and $B$ we express $U'$ in two ways in terms of $U$. From (4.13) we have

    $U' = UB$

and

    $U' = E'C = EAC = UC^{-1}AC.$

Therefore $UB = UC^{-1}AC$. But each entry in this matrix equation is a linear combination of the basis vectors $u_1, \ldots, u_n$. Since the $u_i$ are independent we must have

    $B = C^{-1}AC.$

Thus, we have proved the following theorem.

THEOREM 4.6. If two $n \times n$ matrices $A$ and $B$ represent the same linear transformation $T$, then there is a nonsingular matrix $C$ such that

    $B = C^{-1}AC.$

Moreover, if $A$ is the matrix of $T$ relative to a basis $E = [e_1, \ldots, e_n]$ and if $B$ is the matrix of $T$ relative to a basis $U = [u_1, \ldots, u_n]$, then for $C$ we can take the nonsingular matrix relating the two bases according to the matrix equation $U = EC$.

The converse of Theorem 4.6 is also true.

THEOREM 4.7. Let $A$ and $B$ be two $n \times n$ matrices related by an equation of the form $B = C^{-1}AC$, where $C$ is a nonsingular $n \times n$ matrix. Then $A$ and $B$ represent the same linear transformation.

Proof. Choose a basis $E = [e_1, \ldots, e_n]$ for an $n$-dimensional space $V$. Let $u_1, \ldots, u_n$ be the vectors determined by the equations

(4.14)    $u_j = \sum_{k=1}^{n} c_{kj} e_k$  for $j = 1, 2, \ldots, n$,

where the scalars $c_{kj}$ are the entries of $C$. Since $C$ is nonsingular it represents an invertible linear transformation, so $U = [u_1, \ldots, u_n]$ is also a basis for $V$, and we have $U = EC$.
Let $T$ be the linear transformation having the matrix representation $A$ relative to the basis $E$, and let $S$ be the transformation having the matrix representation $B$ relative to the basis $U$. Then we have

(4.15)    $T(e_k) = \sum_{i=1}^{n} a_{ik} e_i$  for $k = 1, 2, \ldots, n$

and

(4.16)    $S(u_j) = \sum_{k=1}^{n} b_{kj} u_k$  for $j = 1, 2, \ldots, n$.

We shall prove that $S = T$ by showing that $T(u_j) = S(u_j)$ for each $j$.

Equations (4.15) and (4.16) can be written more simply in matrix form as follows,

    $[T(e_1), \ldots, T(e_n)] = EA$,  $[S(u_1), \ldots, S(u_n)] = UB$.

Applying $T$ to (4.14) we also obtain the relation $T(u_j) = \sum_{k=1}^{n} c_{kj} T(e_k)$, or

    $[T(u_1), \ldots, T(u_n)] = EAC.$

But we have

    $UB = ECB = EC(C^{-1}AC) = EAC,$

which shows that $T(u_j) = S(u_j)$ for each $j$. Therefore $T(x) = S(x)$ for each $x$ in $V$, so $T = S$. In other words, the matrices $A$ and $B$ represent the same linear transformation.

DEFINITION. Two $n \times n$ matrices $A$ and $B$ are called similar if there is a nonsingular matrix $C$ such that $B = C^{-1}AC$.

Theorems 4.6 and 4.7 can be combined to give us

THEOREM 4.8. Two $n \times n$ matrices are similar if and only if they represent the same linear transformation.

Similar matrices share many properties. For example, they have the same determinant since

    $\det(C^{-1}AC) = \det(C^{-1})(\det A)(\det C) = \det A.$

This property gives us the following theorem.

THEOREM 4.9. Similar matrices have the same characteristic polynomial and therefore the same eigenvalues.

Proof. If $A$ and $B$ are similar there is a nonsingular matrix $C$ such that $B = C^{-1}AC$. Therefore we have

    $\lambda I - B = \lambda I - C^{-1}AC = \lambda C^{-1}IC - C^{-1}AC = C^{-1}(\lambda I - A)C.$

This shows that $\lambda I - B$ and $\lambda I - A$ are similar, so $\det(\lambda I - B) = \det(\lambda I - A)$.

Theorems 4.8 and 4.9 together show that all matrix representations of a given linear transformation $T$ have the same characteristic polynomial. This polynomial is also called the characteristic polynomial of $T$.

The next theorem is a combination of Theorems 4.5, 4.2, and 4.6. In Theorem 4.10, $F$ denotes either the real field $\mathbf{R}$ or the complex field $\mathbf{C}$.
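Theorem 4.9 can be checked on a concrete pair of similar matrices. The following is a minimal Python sketch, not part of the text: it takes the matrix $A$ of Example 1 in Section 4.6, forms a matrix $C$ whose columns are the eigenvectors found there, and confirms that $A$ and $B = \operatorname{diag}(1, -1, 3)$ satisfy $AC = CB$ (equivalent to $B = C^{-1}AC$ once $C$ is known to be nonsingular) and have the same characteristic polynomial. The helper names `matmul` and `char_poly` are hypothetical, and the coefficients are computed with the Faddeev-LeVerrier recurrence, a technique chosen for this sketch rather than one used in the text.

```python
from fractions import Fraction


def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly(A):
    """Coefficients [1, c_{n-1}, ..., c_0] of f(lambda) = det(lambda*I - A).

    Hypothetical helper: uses the Faddeev-LeVerrier recurrence with exact
    rational arithmetic (an assumption of this sketch, not the text's method).
    """
    n = len(A)
    A = [[Fraction(a) for a in row] for row in A]
    M = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    coeffs = [Fraction(1)]
    for k in range(1, n + 1):
        AM = matmul(A, M)
        c = -sum(AM[i][i] for i in range(n)) / k  # c_{n-k} = -tr(A*M_k)/k
        coeffs.append(c)
        M = [[AM[i][j] + (c if i == j else 0) for j in range(n)]
             for i in range(n)]
    return coeffs


# Matrix of Example 1, Section 4.6.
A = [[2, 1, 1], [2, 3, 4], [-1, -1, -2]]
# Columns of C: the eigenvectors (1,-1,0), (0,1,-1), (2,3,-1) found there.
C = [[1, 0, 2], [-1, 1, 3], [0, -1, -1]]
# Diagonal matrix of the corresponding eigenvalues.
B = [[1, 0, 0], [0, -1, 0], [0, 0, 3]]

# The constant term is c_0 = (-1)^n det C, so det C = (-1)^n c_0.
det_C = (-1) ** len(C) * char_poly(C)[-1]

print(det_C != 0)                       # C is nonsingular
print(matmul(A, C) == matmul(C, B))     # A C = C B, i.e. B = C^{-1} A C
print(char_poly(A) == char_poly(B))     # same characteristic polynomial
```

Checking $AC = CB$ avoids computing $C^{-1}$ explicitly. The coefficient list returned for both matrices corresponds to $\lambda^3 - 3\lambda^2 - \lambda + 3 = (\lambda - 1)(\lambda + 1)(\lambda - 3)$, in agreement with Example 1.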
THEOREM 4.10. Let $T: V \to V$ be a linear transformation, where $V$ has scalars in $F$, and $\dim V = n$. Assume that the characteristic polynomial of $T$ has $n$ distinct roots $\lambda_1, \ldots, \lambda_n$ in $F$. Then we have:
(a) The corresponding eigenvectors $u_1, \ldots, u_n$ form a basis for $V$.
(b) The matrix of $T$ relative to the ordered basis $U = [u_1, \ldots, u_n]$ is the diagonal matrix $\Lambda$ having the eigenvalues as diagonal entries: $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$.
(c) If $A$ is the matrix of $T$ relative to another basis $E = [e_1, \ldots, e_n]$, then $\Lambda = C^{-1}AC$, where $C$ is the nonsingular matrix relating the two bases by the equation $U = EC$.

Proof. By Theorem 4.5 each root $\lambda_i$ is an eigenvalue. Since there are $n$ distinct roots, Theorem 4.2 tells us that the corresponding eigenvectors $u_1, \ldots, u_n$ are independent. Hence they form a basis for $V$. This proves (a). Since $T(u_i) = \lambda_i u_i$ the matrix of $T$ relative to $U$ is the diagonal matrix $\Lambda$, which proves (b). To prove (c) we use Theorem 4.6.

Note: The nonsingular matrix $C$ in Theorem 4.10 is called a diagonalizing matrix. If $(e_1, \ldots, e_n)$ is the basis of unit coordinate vectors $(I_1, \ldots, I_n)$, then the equation $U = EC$ in Theorem 4.10 shows that the $k$th column of $C$ consists of the components of the eigenvector $u_k$ relative to $(I_1, \ldots, I_n)$.

If the eigenvalues of $A$ are distinct then $A$ is similar to a diagonal matrix. If the eigenvalues are not distinct then $A$ still might be similar to a diagonal matrix. This will happen if and only if there are $k$ independent eigenvectors corresponding to each eigenvalue of multiplicity $k$. Examples occur in the next set of exercises.

4.10 Exercises

1. Prove that the matrices $\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ and $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ have the same eigenvalues but are not similar.
2. In each case find a nonsingular matrix $C$ such that $C^{-1}AC$ is a diagonal matrix or explain why no such $C$ exists. [Four $2 \times 2$ matrices (a) through (d), illegible in the source.]
3. Three bases in the plane are given.
With respect to these bases a point has components $(x_1, x_2)$, $(y_1, y_2)$, and $(z_1, z_2)$, respectively. Suppose that $[y_1, y_2] = [x_1, x_2]A$, $[z_1, z_2] = [x_1, x_2]B$, and $[z_1, z_2] = [y_1, y_2]C$, where $A$, $B$, $C$ are $2 \times 2$ matrices. Express $C$ in terms of $A$ and $B$.
4. In each case, show that the eigenvalues of $A$ are not distinct but that $A$ has three independent eigenvectors. Find a nonsingular matrix $C$ such that $C^{-1}AC$ is a diagonal matrix.
(a) $A = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}$. (b) [matrix illegible in source]
5. Show that none of the following matrices is similar to a diagonal matrix, but that each is s [remainder of the sentence missing in source]
(a) $\begin{bmatrix} 2 & -1 \\ 0 & 2 \end{bmatrix}$. (b) $\begin{bmatrix} 2 & 1 \\ -1 & 4 \end{bmatrix}$.
6. Determine the eigenvalues and eigenvectors of the matrix $\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -1 & -3 & -3 \end{bmatrix}$ and thereby show that it is not similar to a diagonal matrix.
7. (a) Prove that a square matrix $A$ is nonsingular if and only if 0 is not an eigenvalue of $A$.
(b) If $A$ is nonsingular, prove that the eigenvalues of $A^{-1}$ are the reciprocals of the eigenvalues of $A$.
8. Given an $n \times n$ matrix $A$ with real entries such that $A^2 = -I$. Prove the following statements about $A$.
(a) $A$ is nonsingular.
(b) $n$ is even.
(c) $A$ has no real eigenvalues.
(d) $\det A = 1$.

5 EIGENVALUES OF OPERATORS ACTING ON EUCLIDEAN SPACES

5.1 Eigenvalues and inner products

This chapter describes some properties of eigenvalues and eigenvectors of linear transformations that operate on Euclidean spaces, that is, on linear spaces having an inner product. We recall the fundamental properties of inner products.

In a real Euclidean space an inner product $(x, y)$ of two elements $x$ and $y$ is a real number satisfying the following properties:
(1) $(x, y) = (y, x)$ (symmetry)
(2) $(x + z, y) = (x, y) + (z, y)$ (linearity)
(3) $(cx, y) = c(x, y)$ (homogeneity)
(4) $(x, x) > 0$ if $x \neq 0$ (positivity).

In a complex Euclidean space the inner product is a complex number satisfying the same properties, with the exception that symmetry is replaced by Hermitian symmetry,
(1') $(x, y) = \overline{(y, x)}$,
where the bar denotes the complex conjugate.
In (3) the scalar $c$ is complex. From (1') and (3) we obtain
(3') $(x, cy) = \bar{c}(x, y)$,
which tells us that scalars are conjugated when taken out of the second factor. Taking $x = y$ in (1') we see that $(x, x)$ is real so property (4) is meaningful if the space is complex.

When we use the term Euclidean space without further designation it is to be understood that the space can be real or complex. Although most of our applications will be to finite-dimensional spaces, we do not require this restriction at the outset.

The first theorem shows that eigenvalues (if they exist) can be expressed in terms of the inner product.

THEOREM 5.1. Let $E$ be a Euclidean space, let $V$ be a subspace of $E$, and let $T: V \to E$ be a linear transformation having an eigenvalue $\lambda$ with a corresponding eigenvector $x$. Then we have

$$\lambda = \frac{(T(x), x)}{(x, x)}. \tag{5.1}$$

Proof. Since $T(x) = \lambda x$ we have

$$(T(x), x) = (\lambda x, x) = \lambda (x, x).$$

Since $x \neq 0$ we can divide by $(x, x)$ to get (5.1).

Several properties of eigenvalues are easily deduced from Equation (5.1). For example, from the Hermitian symmetry of the inner product we have the companion formula

$$\bar{\lambda} = \frac{(x, T(x))}{(x, x)} \tag{5.2}$$

for the complex conjugate of $\lambda$. From (5.1) and (5.2) we see that $\lambda$ is real ($\lambda = \bar{\lambda}$) if and only if $(T(x), x)$ is real, that is, if and only if

$$(T(x), x) = (x, T(x)) \quad \text{for the eigenvector } x.$$

(This condition is trivially satisfied in a real Euclidean space.) Also, $\lambda$ is pure imaginary ($\lambda = -\bar{\lambda}$) if and only if $(T(x), x)$ is pure imaginary, that is, if and only if

$$(T(x), x) = -(x, T(x)) \quad \text{for the eigenvector } x.$$

5.2 Hermitian and skew-Hermitian transformations

In this section we introduce two important types of linear operators which act on Euclidean spaces. These operators have two categories of names, depending on whether the underlying Euclidean space has a real or complex inner product. In the real case the transformations are called symmetric and skew-symmetric. In the complex case they are called Hermitian and skew-Hermitian.
These transformations occur in many different applications. For example, Hermitian operators on infinite-dimensional spaces play an important role in quantum mechanics. We shall discuss primarily the complex case since it presents no added difficulties.

DEFINITION. Let $E$ be a Euclidean space and let $V$ be a subspace of $E$. A linear transformation $T: V \to E$ is called Hermitian on $V$ if

$$(T(x), y) = (x, T(y)) \quad \text{for all } x \text{ and } y \text{ in } V.$$

Operator $T$ is called skew-Hermitian on $V$ if

$$(T(x), y) = -(x, T(y)) \quad \text{for all } x \text{ and } y \text{ in } V.$$

In other words, a Hermitian operator $T$ can be shifted from one factor of an inner product to the other without changing the value of the product. Shifting a skew-Hermitian operator changes the sign of the product.

Note: As already mentioned, if $E$ is a real Euclidean space, Hermitian transformations are also called symmetric; skew-Hermitian transformations are called skew-symmetric.

EXAMPLE 1. Symmetry and skew-symmetry in the space $C(a, b)$. Let $C(a, b)$ denote the space of all real functions continuous on a closed interval $[a, b]$, with the real inner product

$$(f, g) = \int_a^b f(t) g(t)\, dt.$$

Let $V$ be a subspace of $C(a, b)$. If $T: V \to C(a, b)$ is a linear transformation then $(f, T(g)) = \int_a^b f(t)\, Tg(t)\, dt$, where we have written $Tg(t)$ for $T(g)(t)$. Therefore the conditions for symmetry and skew-symmetry become

$$\int_a^b \{ f(t)\, Tg(t) - g(t)\, Tf(t) \}\, dt = 0 \quad \text{if } T \text{ is symmetric}, \tag{5.3}$$

and

$$\int_a^b \{ f(t)\, Tg(t) + g(t)\, Tf(t) \}\, dt = 0 \quad \text{if } T \text{ is skew-symmetric}. \tag{5.4}$$

EXAMPLE 2. Multiplication by a fixed function. In the space $C(a, b)$ of Example 1, choose a fixed function $p$ and define $T(f) = pf$, the product of $p$ and $f$. For this $T$, Equation (5.3) is satisfied for all $f$ and $g$ in $C(a, b)$ since the integrand is zero. Therefore, multiplication by a fixed function is a symmetric operator.

EXAMPLE 3. The differentiation operator.
In the space $C(a, b)$ of Example 1, let $V$ be the subspace consisting of all functions $f$ which have a continuous derivative in the open interval $(a, b)$ and which also satisfy the boundary condition $f(a) = f(b)$. Let $D: V \to C(a, b)$ be the differentiation operator given by $D(f) = f'$. It is easy to prove that $D$ is skew-symmetric. In this case the integrand in (5.4) is the derivative of the product $fg$, so the integral is equal to

$$\int_a^b (fg)'(t)\, dt = f(b)g(b) - f(a)g(a).$$

Since both $f$ and $g$ satisfy the boundary condition, we have $f(b)g(b) - f(a)g(a) = 0$. Thus, the boundary condition implies skew-symmetry of $D$. The only eigenfunctions in the subspace $V$ are the constant functions. They belong to the eigenvalue 0.

EXAMPLE 4. Sturm-Liouville operators. This example is important in the theory of linear second-order differential equations. We use the space $C(a, b)$ of Example 1 once more and let $V$ be the subspace consisting of all $f$ which have a continuous second derivative in $[a, b]$ and which also satisfy the two boundary conditions

$$p(a)f(a) = 0, \qquad p(b)f(b) = 0, \tag{5.5}$$

where $p$ is a fixed function in $C(a, b)$ with a continuous derivative on $[a, b]$. Let $q$ be another fixed function in $C(a, b)$ and let $T: V \to C(a, b)$ be the operator defined by the equation

$$T(f) = (pf')' + qf.$$

This is called a Sturm-Liouville operator. To test for symmetry we note that $fT(g) - gT(f) = f(pg')' - g(pf')'$. Using this in (5.3) and integrating both $\int_a^b f \cdot (pg')'\, dt$ and $\int_a^b g \cdot (pf')'\, dt$ by parts, we find

$$\int_a^b \{ fT(g) - gT(f) \}\, dt = fpg' \Big|_a^b - \int_a^b pg'f'\, dt - gpf' \Big|_a^b + \int_a^b pf'g'\, dt = 0,$$

since both $f$ and $g$ satisfy the boundary conditions (5.5). Hence $T$ is symmetric on $V$. The eigenfunctions of $T$ are those nonzero $f$ which satisfy, for some real $\lambda$, a differential equation of the form

$$(pf')' + qf = \lambda f$$

on $[a, b]$, and also satisfy the boundary conditions (5.5).

5.3 Eigenvalues and eigenvectors of Hermitian and skew-Hermitian operators

Regarding eigenvalues we have the following theorem:

THEOREM 5.2.
Assume $T$ has an eigenvalue $\lambda$. Then we have:
(a) If $T$ is Hermitian, $\lambda$ is real: $\lambda = \bar{\lambda}$.
(b) If $T$ is skew-Hermitian, $\lambda$ is pure imaginary: $\lambda = -\bar{\lambda}$.

Proof. Let $x$ be an eigenvector corresponding to $\lambda$. Then we have

$$\lambda = \frac{(T(x), x)}{(x, x)} \quad \text{and} \quad \bar{\lambda} = \frac{(x, T(x))}{(x, x)}.$$

If $T$ is Hermitian we have $(T(x), x) = (x, T(x))$ so $\lambda = \bar{\lambda}$. If $T$ is skew-Hermitian we have $(T(x), x) = -(x, T(x))$ so $\lambda = -\bar{\lambda}$.

Note: If $T$ is symmetric, Theorem 5.2 tells us nothing new about the eigenvalues of $T$ since all the eigenvalues must be real if the inner product is real. If $T$ is skew-symmetric, the eigenvalues of $T$ must be both real and pure imaginary. Hence all the eigenvalues of a skew-symmetric operator must be zero (if any exist).

5.4 Orthogonality of eigenvectors corresponding to distinct eigenvalues

Distinct eigenvalues of any linear transformation correspond to independent eigenvectors (by Theorem 4.2). For Hermitian and skew-Hermitian transformations more is true.

THEOREM 5.3. Let $T$ be a Hermitian or skew-Hermitian transformation, and let $\lambda$ and $\mu$ be distinct eigenvalues of $T$ with corresponding eigenvectors $x$ and $y$. Then $x$ and $y$ are orthogonal; that is, $(x, y) = 0$.

Proof. We write $T(x) = \lambda x$, $T(y) = \mu y$ and compare the two inner products $(T(x), y)$ and $(x, T(y))$. We have

$$(T(x), y) = (\lambda x, y) = \lambda (x, y) \quad \text{and} \quad (x, T(y)) = (x, \mu y) = \bar{\mu}(x, y).$$

If $T$ is Hermitian this gives us $\lambda(x, y) = \bar{\mu}(x, y) = \mu(x, y)$ since $\mu = \bar{\mu}$. Therefore $(x, y) = 0$ since $\lambda \neq \mu$. If $T$ is skew-Hermitian we obtain $\lambda(x, y) = -\bar{\mu}(x, y) = \mu(x, y)$ which again implies $(x, y) = 0$.

EXAMPLE. We apply Theorem 5.3 to those nonzero functions which satisfy a differential equation of the form

$$(pf')' + qf = \lambda f \tag{5.6}$$

on an interval $[a, b]$, and which also satisfy the boundary conditions $p(a)f(a) = p(b)f(b) = 0$. The conclusion is that any two solutions $f$ and $g$ corresponding to two distinct values of $\lambda$ are orthogonal. For example, consider the differential equation of simple harmonic motion,

$$f'' + k^2 f = 0$$

on the interval $[0, \pi]$, where $k \neq 0$.
This has the form (5.6) with $p = 1$, $q = 0$, and $\lambda = -k^2$. All solutions are given by $f(t) = c_1 \cos kt + c_2 \sin kt$. The boundary condition $f(0) = 0$ implies $c_1 = 0$. The second boundary condition, $f(\pi) = 0$, implies $c_2 \sin k\pi = 0$. Since $c_2 \neq 0$ for a nonzero solution, we must have $\sin k\pi = 0$, which means that $k$ is an integer. In other words, nonzero solutions which satisfy the boundary conditions exist if and only if $k$ is an integer. These solutions are $f(t) = \sin nt$, $n = \pm 1, \pm 2, \dots$. The orthogonality condition implied by Theorem 5.3 now becomes the familiar relation

$$\int_0^{\pi} \sin nt \, \sin mt \, dt = 0$$

if $m^2$ and $n^2$ are distinct integers.

5.5 Exercises

1. Let $E$ be a Euclidean space, let $V$ be a subspace, and let $T: V \to E$ be a given linear transformation. Let $\lambda$ be a scalar and $x$ a nonzero element of $V$. Prove that $\lambda$ is an eigenvalue of $T$ with $x$ as an eigenvector if and only if

$$(T(x), y) = \lambda (x, y) \quad \text{for every } y \text{ in } E.$$

2. Let $T(x) = cx$ for every $x$ in a linear space $V$, where $c$ is a fixed scalar. Prove that $T$ is symmetric if $V$ is a real Euclidean space.
3. Assume $T: V \to V$ is a Hermitian transformation.
(a) Prove that $T^n$ is Hermitian for every positive integer $n$, and that $T^{-1}$ is Hermitian if $T$ is invertible.
(b) What can you conclude about $T^n$ and $T^{-1}$ if $T$ is skew-Hermitian?
4. Let $T_1: V \to E$ and $T_2: V \to E$ be two Hermitian transformations.
(a) Prove that $aT_1 + bT_2$ is Hermitian for all real scalars $a$ and $b$.
(b) Prove that the product (composition) $T_1T_2$ is Hermitian if $T_1$ and $T_2$ commute, that is, if $T_1T_2 = T_2T_1$.
5. Let $V = V_3(\mathbf{R})$ with the usual dot product as inner product. Let $T$ be a reflection in the $xy$-plane; that is, let $T(i) = i$, $T(j) = j$, and $T(k) = -k$. Prove that $T$ is symmetric.
6. Let $C(0, 1)$ be the real linear space of all real functions continuous on $[0, 1]$ with inner product $(f, g) = \int_0^1 f(t)g(t)\, dt$. Let $V$ be the subspace of all $f$ such that $\int_0^1 f(t)\, dt = 0$. Let $T: V \to C(0, 1)$ be the integration operator defined by $Tf(x) = \int_0^x f(t)\, dt$. Prove that $T$ is skew-symmetric.
7.
Let $V$ be the real Euclidean space of all real polynomials with the inner product $(f, g) = \int_{-1}^{1} f(t)g(t)\, dt$. Determine which of the following transformations $T: V \to V$ is symmetric or skew-symmetric:
(a) $Tf(x) = f(-x)$.
(b) $Tf(x) = f(x)f(-x)$.
(c) $Tf(x) = f(x) + f(-x)$.
(d) $Tf(x) = f(x) - f(-x)$.
8. Refer to Example 4 of Section 5.2. Modify the inner product as follows:

$$(f, g) = \int_a^b f(t)g(t)w(t)\, dt,$$

where $w$ is a fixed positive function in $C(a, b)$. Modify the Sturm-Liouville operator $T$ by writing

$$T(f) = \frac{(pf')' + qf}{w}.$$

Prove that the modified operator is symmetric on the subspace $V$.
9. Let $V$ be a subspace of a complex Euclidean space $E$. Let $T: V \to E$ be a linear transformation and define a scalar-valued function $Q$ on $V$ as follows: $Q(x) = (T(x), x)$ for all $x$ in $V$.
(a) If $T$ is Hermitian on $V$, prove that $Q(x)$ is real for all $x$.
(b) If $T$ is skew-Hermitian, prove that $Q(x)$ is pure imaginary for all $x$.
(c) Prove that $Q(tx) = t\bar{t}\,Q(x)$ for every scalar $t$.
(d) Prove that $Q(x + y) = Q(x) + Q(y) + (T(x), y) + (T(y), x)$, and find a corresponding formula for $Q(x + ty)$.
(e) If $Q(x) = 0$ for all $x$ prove that $T(x) = 0$ for all $x$.
(f) If $Q(x)$ is real for all $x$ prove that $T$ is Hermitian. [Hint: Use the fact that $Q(x + ty)$ equals its conjugate for every scalar $t$.]
10. This exercise shows that the Legendre polynomials (introduced in Section 1.14) are eigenfunctions of a Sturm-Liouville operator. The Legendre polynomials are defined by the equation

$$P_n(t) = \frac{1}{2^n n!} f_n^{(n)}(t), \quad \text{where } f_n(t) = (t^2 - 1)^n.$$

(a) Verify that $(t^2 - 1) f_n'(t) = 2nt f_n(t)$.
(b) Differentiate the equation in (a) $n + 1$ times, using Leibniz's formula (see p. 222 of Volume I) to obtain

$$(t^2 - 1) f_n^{(n+2)}(t) + 2t(n + 1) f_n^{(n+1)}(t) + n(n + 1) f_n^{(n)}(t) = 2nt f_n^{(n+1)}(t) + 2n(n + 1) f_n^{(n)}(t).$$

(c) Show that the equation in (b) can be rewritten in the form

$$[(t^2 - 1) P_n'(t)]' = n(n + 1) P_n(t).$$

This shows that $P_n(t)$ is an eigenfunction of the Sturm-Liouville operator $T$ given on the interval $[-1, 1]$ by $T(f) = (pf')'$, where $p(t) = t^2 - 1$. The eigenfunction $P_n(t)$ belongs to the eigenvalue $\lambda = n(n + 1)$.
In this example the boundary conditions for symmetry are automatically satisfied since $p(1) = p(-1) = 0$.

5.6 Existence of an orthonormal set of eigenvectors for Hermitian and skew-Hermitian operators acting on finite-dimensional spaces

Both Theorems 5.2 and 5.3 are based on the assumption that $T$ has an eigenvalue. As we know, eigenvalues need not exist. However, if $T$ acts on a finite-dimensional complex space, then eigenvalues always exist since they are the roots of the characteristic polynomial. If $T$ is Hermitian, all the eigenvalues are real. If $T$ is skew-Hermitian, all the eigenvalues are pure imaginary.

We also know that two distinct eigenvalues belong to orthogonal eigenvectors if $T$ is Hermitian or skew-Hermitian. Using this property we can prove that $T$ has an orthonormal set of eigenvectors which spans the whole space. (We recall that an orthogonal set is called orthonormal if each of its elements has norm 1.)

THEOREM 5.4. Assume $\dim V = n$ and let $T: V \to V$ be Hermitian or skew-Hermitian. Then there exist $n$ eigenvectors $u_1, \dots, u_n$ of $T$ which form an orthonormal basis for $V$. Hence the matrix of $T$ relative to this basis is the diagonal matrix $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$, where $\lambda_k$ is the eigenvalue belonging to $u_k$.

Proof. We use induction on the dimension $n$. If $n = 1$, then $T$ has exactly one eigenvalue. Any eigenvector $u_1$ of norm 1 is an orthonormal basis for $V$.

Now assume the theorem is true for every Euclidean space of dimension $n - 1$. To prove it is also true for $V$ we choose an eigenvalue $\lambda_1$ for $T$ and a corresponding eigenvector $u_1$ of norm 1. Then $T(u_1) = \lambda_1 u_1$ and $\|u_1\| = 1$. Let $S$ be the subspace spanned by $u_1$. We shall apply the induction hypothesis to the subspace $S^{\perp}$ consisting of all elements in $V$ which are orthogonal to $u_1$,

$$S^{\perp} = \{ x \mid x \in V,\ (x, u_1) = 0 \}.$$

To do this we need to know that $\dim S^{\perp} = n - 1$ and that $T$ maps $S^{\perp}$ into itself.

From Theorem 1.7(a) we know that $u_1$ is part of a basis for $V$, say the basis $(u_1, v_2, \dots, v_n)$. We can assume, without loss in generality, that this is an orthonormal basis.
(If not, we apply the Gram-Schmidt process to convert it into an orthonormal basis, keeping $u_1$ as the first basis element.) Now take any $x$ in $S^{\perp}$ and write

$$x = x_1 u_1 + x_2 v_2 + \cdots + x_n v_n.$$

Then $x_1 = (x, u_1) = 0$ since the basis is orthonormal, so $x$ is in the space spanned by $v_2, \dots, v_n$. Hence $\dim S^{\perp} = n - 1$.

Next we show that $T$ maps $S^{\perp}$ into itself. Assume $T$ is Hermitian. If $x \in S^{\perp}$ we have

$$(T(x), u_1) = (x, T(u_1)) = (x, \lambda_1 u_1) = \bar{\lambda}_1 (x, u_1) = 0,$$

so $T(x) \in S^{\perp}$. Since $T$ is Hermitian on $S^{\perp}$ we can apply the induction hypothesis to find that $T$ has $n - 1$ eigenvectors $u_2, \dots, u_n$ which form an orthonormal basis for $S^{\perp}$. Therefore the orthogonal set $u_1, \dots, u_n$ is an orthonormal basis for $V$. This proves the theorem if $T$ is Hermitian. A similar argument works if $T$ is skew-Hermitian.

5.7 Matrix representations for Hermitian and skew-Hermitian operators

In this section we assume that $V$ is a finite-dimensional Euclidean space. A Hermitian or skew-Hermitian transformation can be characterized in terms of its action on the elements of any basis.

THEOREM 5.5. Let $(e_1, \dots, e_n)$ be a basis for $V$ and let $T: V \to V$ be a linear transformation. Then we have:
(a) $T$ is Hermitian if and only if $(T(e_j), e_i) = (e_j, T(e_i))$ for all $i$ and $j$.
(b) $T$ is skew-Hermitian if and only if $(T(e_j), e_i) = -(e_j, T(e_i))$ for all $i$ and $j$.

Proof. Take any two elements $x$ and $y$ in $V$ and express each in terms of the basis elements, say $x = \sum x_j e_j$ and $y = \sum y_i e_i$. Then we have

$$(T(x), y) = \Bigl( \sum_{j=1}^{n} x_j T(e_j),\ y \Bigr) = \sum_{j=1}^{n} x_j \Bigl( T(e_j),\ \sum_{i=1}^{n} y_i e_i \Bigr) = \sum_{j=1}^{n} \sum_{i=1}^{n} x_j \bar{y}_i (T(e_j), e_i).$$

Similarly we find

$$(x, T(y)) = \sum_{j=1}^{n} \sum_{i=1}^{n} x_j \bar{y}_i (e_j, T(e_i)).$$

Statements (a) and (b) follow immediately from these equations.

Now we characterize these concepts in terms of a matrix representation of $T$.

THEOREM 5.6. Let $(e_1, \dots, e_n)$ be an orthonormal basis for $V$, and let $A = (a_{ij})$ be the matrix representation of a linear transformation $T: V \to V$ relative to this basis. Then we have:
(a) $T$ is Hermitian if and only if $a_{ij} = \bar{a}_{ji}$ for all $i$ and $j$.
(b) $T$ is skew-Hermitian if and only if $a_{ij} = -\bar{a}_{ji}$ for all $i$ and $j$.
Proof. Since $A$ is the matrix of $T$ we have $T(e_j) = \sum_{k=1}^{n} a_{kj} e_k$. Taking the inner product of $T(e_j)$ with $e_i$ and using the linearity of the inner product we obtain

$$(T(e_j), e_i) = \Bigl( \sum_{k=1}^{n} a_{kj} e_k,\ e_i \Bigr) = \sum_{k=1}^{n} a_{kj} (e_k, e_i).$$

But $(e_k, e_i) = 0$ unless $k = i$, so the last sum simplifies to $a_{ij}(e_i, e_i) = a_{ij}$ since $(e_i, e_i) = 1$. Hence we have

$$a_{ij} = (T(e_j), e_i) \quad \text{for all } i, j.$$

Interchanging $i$ and $j$, taking conjugates, and using the Hermitian symmetry of the inner product, we find

$$\bar{a}_{ji} = (e_j, T(e_i)) \quad \text{for all } i, j.$$

Now we apply Theorem 5.5 to complete the proof.

5.8 Hermitian and skew-Hermitian matrices. The adjoint of a matrix

The following definition is suggested by Theorem 5.6.

DEFINITION. A square matrix $A = (a_{ij})$ is called Hermitian if $a_{ij} = \bar{a}_{ji}$ for all $i$ and $j$. Matrix $A$ is called skew-Hermitian if $a_{ij} = -\bar{a}_{ji}$ for all $i$ and $j$.

Theorem 5.6 states that a transformation $T$ on a finite-dimensional space $V$ is Hermitian or skew-Hermitian according as its matrix relative to an orthonormal basis is Hermitian or skew-Hermitian.

These matrices can be described in another way. Let $\bar{A}$ denote the matrix obtained by replacing each entry of $A$ by its complex conjugate. Matrix $\bar{A}$ is called the conjugate of $A$. Matrix $A$ is Hermitian if and only if it is equal to the transpose of its conjugate, $A = \bar{A}^t$. It is skew-Hermitian if $A = -\bar{A}^t$.

The transpose of the conjugate is given a special name.

DEFINITION OF THE ADJOINT OF A MATRIX. For any matrix $A$, the transpose of the conjugate, $\bar{A}^t$, is also called the adjoint of $A$ and is denoted by $A^*$.

Thus, a square matrix $A$ is Hermitian if $A = A^*$, and skew-Hermitian if $A = -A^*$. A Hermitian matrix is also called self-adjoint.

Note: Much of the older matrix literature uses the term adjoint for the transpose of the cofactor matrix, an entirely different object.
The definition given here conforms to the current nomenclature in the theory of linear operators.

5.9 Diagonalization of a Hermitian or skew-Hermitian matrix

THEOREM 5.7. Every $n \times n$ Hermitian or skew-Hermitian matrix $A$ is similar to the diagonal matrix $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$ of its eigenvalues. Moreover, we have

$$\Lambda = C^{-1}AC,$$

where $C$ is a nonsingular matrix whose inverse is its adjoint, $C^{-1} = C^*$.

Proof. Let $V$ be the space of $n$-tuples of complex numbers, and let $(e_1, \dots, e_n)$ be the orthonormal basis of unit coordinate vectors. If $x = \sum x_i e_i$ and $y = \sum y_i e_i$ let the inner product be given by $(x, y) = \sum x_i \bar{y}_i$. For the given matrix $A$, let $T$ be the transformation represented by $A$ relative to the chosen basis. Then Theorem 5.4 tells us that $V$ has an orthonormal basis of eigenvectors $(u_1, \dots, u_n)$, relative to which $T$ has the diagonal matrix representation $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$, where $\lambda_k$ is the eigenvalue belonging to $u_k$. Since both $A$ and $\Lambda$ represent $T$ they are similar, so we have $\Lambda = C^{-1}AC$, where $C = (c_{ij})$ is the nonsingular matrix relating the two bases:

$$[u_1, \dots, u_n] = [e_1, \dots, e_n]\, C.$$

This equation shows that the $j$th column of $C$ consists of the components of $u_j$ relative to $(e_1, \dots, e_n)$. Therefore $c_{ij}$ is the $i$th component of $u_j$. The inner product of $u_j$ and $u_i$ is given by

$$(u_j, u_i) = \sum_{k=1}^{n} c_{kj} \bar{c}_{ki}.$$

Since $\{u_1, \dots, u_n\}$ is an orthonormal set, this shows that $CC^* = I$, so $C^{-1} = C^*$.

Note: The proof of Theorem 5.7 also tells us how to determine the diagonalizing matrix $C$. We find an orthonormal set of eigenvectors $u_1, \dots, u_n$ and then use the components of $u_j$ (relative to the basis of unit coordinate vectors) as the entries of the $j$th column of $C$.

EXAMPLE 1. The real Hermitian matrix $A = \begin{bmatrix} 2 & 2 \\ 2 & 5 \end{bmatrix}$ has eigenvalues $\lambda_1 = 1$ and $\lambda_2 = 6$. The eigenvectors belonging to 1 are $t(2, -1)$, $t \neq 0$. Those belonging to 6 are $t(1, 2)$, $t \neq 0$. The two eigenvectors $u_1 = t(2, -1)$ and $u_2 = t(1, 2)$ with $t = 1/\sqrt{5}$ form an orthonormal set. Therefore the matrix

$$C = \frac{1}{\sqrt{5}} \begin{bmatrix} 2 & 1 \\ -1 & 2 \end{bmatrix}$$

is a diagonalizing matrix for $A$. In this case $C^* = C^t$ since $C$ is real.
It is easily verified that $C^tAC = \begin{bmatrix} 1 & 0 \\ 0 & 6 \end{bmatrix}$.

EXAMPLE 2. If $A$ is already a diagonal matrix, then the diagonalizing matrix $C$ of Theorem 5.7 either leaves $A$ unchanged or merely rearranges the diagonal entries.

5.10 Unitary matrices. Orthogonal matrices

DEFINITION. A square matrix $A$ is called unitary if $AA^* = I$. It is called orthogonal if $AA^t = I$.

Note: Every real unitary matrix is orthogonal since $A^* = A^t$.

Theorem 5.7 tells us that a Hermitian or skew-Hermitian matrix can always be diagonalized by a unitary matrix. A real Hermitian matrix has real eigenvalues and the corresponding eigenvectors can be taken real. Therefore a real Hermitian matrix can be diagonalized by a real orthogonal matrix. This is not true for real skew-Hermitian matrices. (See Exercise 11 in Section 5.11.)

We also have the following related concepts.

DEFINITION. A square matrix $A$ with real or complex entries is called symmetric if $A = A^t$; it is called skew-symmetric if $A = -A^t$.

EXAMPLE 3. If $A$ is real, its adjoint is equal to its transpose, $A^* = A^t$. Thus, every real Hermitian matrix is symmetric, but a symmetric matrix need not be Hermitian.

EXAMPLE 4. If $A = \begin{bmatrix} 1+i & 2 \\ 3-i & 4i \end{bmatrix}$, then $\bar{A} = \begin{bmatrix} 1-i & 2 \\ 3+i & -4i \end{bmatrix}$, $A^t = \begin{bmatrix} 1+i & 3-i \\ 2 & 4i \end{bmatrix}$, and $A^* = \begin{bmatrix} 1-i & 3+i \\ 2 & -4i \end{bmatrix}$.

EXAMPLE 5. Both matrices $\begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}$ and $\begin{bmatrix} 1 & 2+i \\ 2-i & 3 \end{bmatrix}$ are Hermitian. The first is symmetric, the second is not.

EXAMPLE 6. Both matrices $\begin{bmatrix} 0 & -2 \\ 2 & 0 \end{bmatrix}$ and $\begin{bmatrix} i & -2 \\ 2 & 3i \end{bmatrix}$ are skew-Hermitian. The first is skew-symmetric, the second is not.

EXAMPLE 7. All the diagonal elements of a Hermitian matrix are real. All the diagonal elements of a skew-Hermitian matrix are pure imaginary. All the diagonal elements of a skew-symmetric matrix are zero.

EXAMPLE 8. For any square matrix $A$, the matrix $B = \tfrac{1}{2}(A + A^*)$ is Hermitian, and the matrix $C = \tfrac{1}{2}(A - A^*)$ is skew-Hermitian. Their sum is $A$. Thus, every square matrix $A$ can be expressed as a sum $A = B + C$, where $B$ is Hermitian and $C$ is skew-Hermitian.
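The splitting of a square matrix into a Hermitian part and a skew-Hermitian part can be checked on a concrete case. The sketch below is only an illustration: the sample matrix `A` and the helper names `adjoint` and `combine` are choices made here, not notation from the text. It forms $B = \tfrac{1}{2}(A + A^*)$ and $C = \tfrac{1}{2}(A - A^*)$ with Python's built-in complex numbers and tests the three claims $B = B^*$, $C = -C^*$, and $B + C = A$.

```python
def adjoint(M):
    # Conjugate transpose: entry (i, j) of M* is the conjugate of entry (j, i) of M.
    n = len(M)
    return [[M[j][i].conjugate() for j in range(n)] for i in range(n)]

def combine(M, N, sign):
    # Entrywise (M + sign * N) / 2.
    n = len(M)
    return [[0.5 * (M[i][j] + sign * N[i][j]) for j in range(n)]
            for i in range(n)]

# Hypothetical sample matrix with complex entries.
A = [[1 + 1j, 2 + 0j],
     [3 - 1j, 4j]]

A_star = adjoint(A)
B = combine(A, A_star, +1)   # candidate Hermitian part
C = combine(A, A_star, -1)   # candidate skew-Hermitian part

is_hermitian = adjoint(B) == B
is_skew_hermitian = adjoint(C) == [[-z for z in row] for row in C]
sums_to_A = [[B[i][j] + C[i][j] for j in range(2)] for i in range(2)] == A

print(is_hermitian, is_skew_hermitian, sums_to_A)
```

The entries here are multiples of $\tfrac{1}{2}$, so the floating-point comparisons are exact; for general entries a tolerance would be the safer choice.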
It is an easy exercise to verify that this decomposition is unique. Also every square matrix $A$ can be expressed uniquely as the sum of a symmetric matrix, $\tfrac{1}{2}(A + A^t)$, and a skew-symmetric matrix, $\tfrac{1}{2}(A - A^t)$.

EXAMPLE 9. If $A$ is orthogonal we have $1 = \det(AA^t) = (\det A)(\det A^t) = (\det A)^2$, so $\det A = \pm 1$.

5.11 Exercises

1. Determine which of the following matrices are symmetric, skew-symmetric, Hermitian, skew-Hermitian. [matrices illegible in source]
2. (a) Verify that the $2 \times 2$ matrix $A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$ is an orthogonal matrix.
(b) Let $T$ be the linear transformation with the above matrix $A$ relative to the usual basis $\{i, j\}$. Prove that $T$ maps each point in the plane with polar coordinates $(r, \alpha)$ onto the point with polar coordinates $(r, \alpha + \theta)$. Thus, $T$ is a rotation of the plane about the origin, $\theta$ being the angle of rotation.
3. Let $V$ be real 3-space with the usual basis vectors $i$, $j$, $k$. Prove that each of the following matrices is orthogonal and represents the transformation indicated.
(a) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -1 \end{bmatrix}$ (reflection in the $xy$-plane).
(b) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{bmatrix}$ (reflection through the $x$-axis).
(c) $\begin{bmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{bmatrix}$ (reflection through the origin).
(d) $\begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix}$ (rotation about the $x$-axis).
(e) $\begin{bmatrix} -1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix}$ (rotation about $x$-axis followed by reflection in the $yz$-plane).
4. A real orthogonal matrix $A$ is called proper if $\det A = 1$, and improper if $\det A = -1$.
(a) If $A$ is a proper $2 \times 2$ matrix, prove that $A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$ for some $\theta$. This represents a rotation through an angle $\theta$.
(b) Prove that $\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$ and $\begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}$ are improper matrices. The first matrix represents a reflection of the $xy$-plane through the $x$-axis; the second represents a reflection through the $y$-axis. Find all improper $2 \times 2$ matrices.

In each of Exercises 5 through 8, find (a) an orthogonal set of eigenvectors for $A$, and (b) a unitary matrix $C$ such that $C^{-1}AC$ is a diagonal matrix.
5. $A = \begin{bmatrix} 9 & 12 \\ 12 & 16 \end{bmatrix}$.
6. [matrix illegible in source]
7. [matrix illegible in source]
8. [matrix illegible in source]
9. Determine which of the following matrices are unitary, and which are orthogonal ($a$, $b$, $\theta$ real).
(a) [matrix illegible in source]
(b) $\begin{bmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{bmatrix}$.
(c) $\begin{bmatrix} \tfrac{1}{2}\sqrt{2} & -\tfrac{1}{3}\sqrt{3} & \tfrac{1}{6}\sqrt{6} \\ 0 & \tfrac{1}{3}\sqrt{3} & \tfrac{1}{3}\sqrt{6} \\ \tfrac{1}{2}\sqrt{2} & \tfrac{1}{3}\sqrt{3} & -\tfrac{1}{6}\sqrt{6} \end{bmatrix}$.
10. The special theory of relativity makes use of the equations

$$x' = a(x - vt), \qquad y' = y, \qquad z' = z, \qquad t' = a(t - vx/c^2).$$

Here $v$ is the velocity of a moving object, $c$ the speed of light, and $a = c/\sqrt{c^2 - v^2}$. The linear transformation which maps $(x, y, z, t)$ onto $(x', y', z', t')$ is called a Lorentz transformation.
(a) Let $(x_1, x_2, x_3, x_4) = (x, y, z, ict)$ and $(x_1', x_2', x_3', x_4') = (x', y', z', ict')$. Show that the four equations can be written as one matrix equation,

$$[x_1', x_2', x_3', x_4'] = [x_1, x_2, x_3, x_4] \begin{bmatrix} a & 0 & 0 & -iav/c \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ iav/c & 0 & 0 & a \end{bmatrix}.$$

(b) Prove that the $4 \times 4$ matrix in (a) is orthogonal but not unitary.
11. Let $a$ be a nonzero real number and let $A$ be the skew-symmetric matrix $A = \begin{bmatrix} 0 & a \\ -a & 0 \end{bmatrix}$.
(a) Find an orthonormal set of eigenvectors for $A$.
(b) Find a unitary matrix $C$ such that $C^{-1}AC$ is a diagonal matrix.
(c) Prove that there is no real orthogonal matrix $C$ such that $C^{-1}AC$ is a diagonal matrix.
12. If the eigenvalues of a Hermitian or skew-Hermitian matrix $A$ are all equal to $c$, prove that $A = cI$.
13. If $A$ is a real skew-symmetric matrix, prove that both $I - A$ and $I + A$ are nonsingular and that $(I - A)(I + A)^{-1}$ is orthogonal.
14. For each of the following statements about $n \times n$ matrices, give a proof or exhibit a counter example.
(a) If $A$ and $B$ are unitary, then $A + B$ is unitary.
(b) If $A$ and $B$ are unitary, then $AB$ is unitary.
(c) If $A$ and $AB$ are unitary, then $B$ is unitary.
(d) If $A$ and $B$ are unitary, then $A + B$ is not unitary.

5.12 Quadratic forms

Let $V$ be a real Euclidean space and let $T: V \to V$ be a symmetric operator. This means that $T$ can be shifted from one factor of an inner product to the other,

$$(T(x), y) = (x, T(y)) \quad \text{for all } x \text{ and } y \text{ in } V.$$

Given $T$, we define a real-valued function $Q$ on $V$ by the equation

$$Q(x) = (T(x), x).$$
The function $Q$ is called the quadratic form associated with $T$. The term "quadratic" is suggested by the following theorem which shows that in the finite-dimensional case $Q(x)$ is a quadratic polynomial in the components of $x$.

THEOREM 5.8. Let $(e_1, \dots, e_n)$ be an orthonormal basis for a real Euclidean space $V$. Let $T: V \to V$ be a symmetric transformation, and let $A = (a_{ij})$ be the matrix of $T$ relative to this basis. Then the quadratic form $Q(x) = (T(x), x)$ is related to $A$ as follows:

$$Q(x) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j \quad \text{if } x = \sum_{i=1}^{n} x_i e_i. \tag{5.7}$$

Proof. By linearity we have $T(x) = \sum x_i T(e_i)$. Therefore

$$Q(x) = \Bigl( \sum_{i=1}^{n} x_i T(e_i),\ \sum_{j=1}^{n} x_j e_j \Bigr) = \sum_{i=1}^{n} \sum_{j=1}^{n} x_i x_j (T(e_i), e_j).$$

This proves (5.7) since $a_{ij} = a_{ji} = (T(e_i), e_j)$.

The sum appearing in (5.7) is meaningful even if the matrix $A$ is not symmetric.

DEFINITION. Let $V$ be any real Euclidean space with an orthonormal basis $(e_1, \dots, e_n)$, and let $A = (a_{ij})$ be any $n \times n$ matrix of scalars. The scalar-valued function $Q$ defined at each element $x = \sum x_i e_i$ of $V$ by the double sum

$$Q(x) = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j \tag{5.8}$$

is called the quadratic form associated with $A$.

If $A$ is a diagonal matrix, then $a_{ij} = 0$ if $i \neq j$ so the sum in (5.8) contains only squared terms and can be written more simply as

$$Q(x) = \sum_{i=1}^{n} a_{ii} x_i^2.$$

In this case the quadratic form is called a diagonal form.

The double sum appearing in (5.8) can also be expressed as a product of three matrices.

THEOREM 5.9. Let $X = [x_1, \dots, x_n]$ be a $1 \times n$ row matrix, and let $A = (a_{ij})$ be an $n \times n$ matrix. Then $XAX^t$ is a $1 \times 1$ matrix with entry

$$\sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j. \tag{5.9}$$

Proof. The product $XA$ is a $1 \times n$ matrix, $XA = [y_1, \dots, y_n]$, where entry $y_j$ is the dot product of $X$ with the $j$th column of $A$,

$$y_j = \sum_{i=1}^{n} x_i a_{ij}.$$

Therefore the product $XAX^t$ is a $1 \times 1$ matrix whose single entry is the dot product

$$\sum_{j=1}^{n} y_j x_j = \sum_{j=1}^{n} \sum_{i=1}^{n} x_i a_{ij} x_j.$$

Note: It is customary to identify the $1 \times 1$ matrix $XAX^t$ with the sum in (5.9) and to call the product $XAX^t$ a quadratic form. Equation (5.8) is written more simply as follows:

$$Q(x) = XAX^t.$$

EXAMPLE 1. Let $A = \begin{bmatrix} 1 & -1 \\ -3 & 5 \end{bmatrix}$, $X = [x_1, x_2]$.
Then we have

$$XA = [x_1, x_2] \begin{bmatrix} 1 & -1 \\ -3 & 5 \end{bmatrix} = [x_1 - 3x_2,\ -x_1 + 5x_2],$$

and hence

$$XAX^t = [x_1 - 3x_2,\ -x_1 + 5x_2] \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = x_1^2 - 3x_2x_1 - x_1x_2 + 5x_2^2.$$

EXAMPLE 2. Let $B = \begin{bmatrix} 1 & -2 \\ -2 & 5 \end{bmatrix}$, $X = [x_1, x_2]$. Then we have

$$XBX^t = [x_1, x_2] \begin{bmatrix} 1 & -2 \\ -2 & 5 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = x_1^2 - 2x_2x_1 - 2x_1x_2 + 5x_2^2.$$

In both Examples 1 and 2 the two mixed product terms add up to $-4x_1x_2$, so $XAX^t = XBX^t$. These examples show that different matrices can lead to the same quadratic form. Note that one of these matrices is symmetric. This illustrates the next theorem.

THEOREM 5.10. For any $n \times n$ matrix $A$ and any $1 \times n$ row matrix $X$ we have $XAX^t = XBX^t$ where $B$ is the symmetric matrix $B = \tfrac{1}{2}(A + A^t)$.

Proof. Since $XAX^t$ is a $1 \times 1$ matrix it is equal to its transpose, $XAX^t = (XAX^t)^t$. But the transpose of a product is the product of transposes in reversed order, so we have $(XAX^t)^t = XA^tX^t$. Therefore $XAX^t = \tfrac{1}{2}XAX^t + \tfrac{1}{2}XA^tX^t = XBX^t$.

5.13 Reduction of a real quadratic form to a diagonal form

A real symmetric matrix $A$ is Hermitian. Therefore, by Theorem 5.7 it is similar to the diagonal matrix $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$ of its eigenvalues. Moreover, we have $\Lambda = C^tAC$, where $C$ is an orthogonal matrix. Now we show that $C$ can be used to convert the quadratic form $XAX^t$ to a diagonal form.

THEOREM 5.11. Let $XAX^t$ be the quadratic form associated with a real symmetric matrix $A$, and let $C$ be an orthogonal matrix that converts $A$ to a diagonal matrix $\Lambda = C^tAC$. Then we have

$$XAX^t = Y\Lambda Y^t = \sum_{i=1}^{n} \lambda_i y_i^2,$$

where $Y = [y_1, \dots, y_n]$ is the row matrix $Y = XC$, and $\lambda_1, \dots, \lambda_n$ are the eigenvalues of $A$.

Proof. Since $C$ is orthogonal we have $C^{-1} = C^t$. Therefore the equation $Y = XC$ implies $X = YC^t$, and we obtain

$$XAX^t = (YC^t)A(YC^t)^t = Y(C^tAC)Y^t = Y\Lambda Y^t.$$

Note: Theorem 5.11 is described by saying that the linear transformation $Y = XC$ reduces the quadratic form $XAX^t$ to a diagonal form $Y\Lambda Y^t$.

EXAMPLE 1. The quadratic form belonging to the identity matrix is

$$XIX^t = \sum_{i=1}^{n} x_i^2 = \|X\|^2,$$

the square of the length of the vector $X = (x_1, \dots, x_n)$.
A linear transformation $Y = XC$, where $C$ is an orthogonal matrix, gives a new quadratic form $Y\Lambda Y^t$ with $\Lambda = C^tIC = C^tC = I$. Since $XIX^t = YIY^t$ we have $\|X\|^2 = \|Y\|^2$, so $Y$ has the same length as $X$. A linear transformation which preserves the length of each vector is called an isometry. These transformations are discussed in more detail in Section 5.19.

EXAMPLE 2. Determine an orthogonal matrix $C$ which reduces the quadratic form $Q(x) = 2x_1^2 + 4x_1x_2 + 5x_2^2$ to a diagonal form.

Solution. We write $Q(x) = XAX^t$, where $A = \begin{bmatrix} 2 & 2 \\ 2 & 5 \end{bmatrix}$. This symmetric matrix was diagonalized in Example 1 following Theorem 5.7. It has the eigenvalues $\lambda_1 = 1$, $\lambda_2 = 6$, and an orthonormal set of eigenvectors $u_1$, $u_2$, where $u_1 = t(2, -1)$, $u_2 = t(1, 2)$, $t = 1/\sqrt{5}$. An orthogonal diagonalizing matrix is $C = t \begin{bmatrix} 2 & 1 \\ -1 & 2 \end{bmatrix}$. The corresponding diagonal form is

$$Y\Lambda Y^t = \lambda_1 y_1^2 + \lambda_2 y_2^2 = y_1^2 + 6y_2^2.$$

The result of Example 2 has a simple geometric interpretation, illustrated in Figure 5.1. The linear transformation $Y = XC$ can be regarded as a rotation which maps the basis $i$, $j$ onto the new basis $u_1$, $u_2$. A point with coordinates $(x_1, x_2)$ relative to the first basis has new coordinates $(y_1, y_2)$ relative to the second basis. Since $XAX^t = Y\Lambda Y^t$, the set of points $(x_1, x_2)$ satisfying the equation $XAX^t = c$ for some $c$ is identical with the set of points $(y_1, y_2)$ satisfying $Y\Lambda Y^t = c$. The second equation, written as $y_1^2 + 6y_2^2 = c$, is the Cartesian equation of an ellipse if $c > 0$. Therefore the equation $XAX^t = c$, written as $2x_1^2 + 4x_1x_2 + 5x_2^2 = c$, represents the same ellipse in the original coordinate system. Figure 5.1 shows the ellipse corresponding to $c = 9$.

FIGURE 5.1 Rotation of axes by an orthogonal matrix. The ellipse has Cartesian equation $XAX^t = 9$ in the $x_1x_2$-system, and equation $Y\Lambda Y^t = 9$ in the $y_1y_2$-system.
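The reduction in Example 2 can be confirmed numerically. The following sketch is an illustration under stated assumptions rather than part of the text's argument: the helper names `transpose`, `matmul`, `Q`, and `to_y` are introduced here, and the test point $(3, -1)$ is an arbitrary choice. It computes $C^tAC$ for the matrices of Example 2 and compares $Q(x)$ with $y_1^2 + 6y_2^2$ after the change of coordinates $Y = XC$.

```python
import math

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(P, Q):
    # Product of two matrices stored as nested lists (rows of P times columns of Q).
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)]
            for row in P]

t = 1 / math.sqrt(5)
A = [[2, 2], [2, 5]]
C = [[2 * t, t], [-t, 2 * t]]      # columns are u1 = t(2, -1) and u2 = t(1, 2)

Lam = matmul(matmul(transpose(C), A), C)   # should be close to diag(1, 6)

def Q(x1, x2):
    # The quadratic form of Example 2 in the original coordinates.
    return 2 * x1**2 + 4 * x1 * x2 + 5 * x2**2

def to_y(x1, x2):
    # New coordinates Y = XC of the row vector X = [x1, x2].
    y1, y2 = matmul([[x1, x2]], C)[0]
    return y1, y2

y1, y2 = to_y(3, -1)               # arbitrary test point
print(Lam)
print(Q(3, -1), y1**2 + 6 * y2**2) # the two values agree up to rounding
```

Because $t = 1/\sqrt{5}$ is irrational, the comparison holds only up to floating-point rounding, so the off-diagonal entries of `Lam` may appear as tiny nonzero numbers.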
5.14 Applications to analytic geometry

The reduction of a quadratic form to a diagonal form can be used to identify the set of all points $(x, y)$ in the plane which satisfy a Cartesian equation of the form

(5.10) $ax^2 + bxy + cy^2 + dx + ey + f = 0$.

We shall find that this set is always a conic section, that is, an ellipse, hyperbola, parabola, or one of the degenerate cases (the empty set, a single point, or one or two straight lines). The type of conic is governed by the second-degree terms, that is, by the quadratic form $ax^2 + bxy + cy^2$. To conform with the notation used earlier, we write $x_1$ for $x$, $x_2$ for $y$, and express this quadratic form as a matrix product,

\[ XAX^t = ax_1^2 + bx_1x_2 + cx_2^2, \]

where $X = [x_1, x_2]$ and $A = \begin{bmatrix} a & b/2 \\ b/2 & c \end{bmatrix}$. By a rotation $Y = XC$ we reduce this form to a diagonal form $\lambda_1y_1^2 + \lambda_2y_2^2$, where $\lambda_1, \lambda_2$ are the eigenvalues of $A$. An orthonormal set of eigenvectors $u_1, u_2$ determines a new set of coordinate axes, relative to which the Cartesian equation (5.10) becomes

(5.11) $\lambda_1y_1^2 + \lambda_2y_2^2 + d'y_1 + e'y_2 + f = 0$,

with new coefficients $d'$ and $e'$ in the linear terms. In this equation there is no mixed product term $y_1y_2$, so the type of conic is easily identified by examining the eigenvalues $\lambda_1$ and $\lambda_2$. If the conic is not degenerate, Equation (5.11) represents an ellipse if $\lambda_1, \lambda_2$ have the same sign, a hyperbola if $\lambda_1, \lambda_2$ have opposite signs, and a parabola if either $\lambda_1$ or $\lambda_2$ is zero. The three cases correspond to $\lambda_1\lambda_2 > 0$, $\lambda_1\lambda_2 < 0$, and $\lambda_1\lambda_2 = 0$. We illustrate with some specific examples.

EXAMPLE 1. $2x^2 + 4xy + 5y^2 + 4x + 13y - \tfrac{1}{4} = 0$. We rewrite this as

(5.12) $2x_1^2 + 4x_1x_2 + 5x_2^2 + 4x_1 + 13x_2 - \tfrac{1}{4} = 0$.

The quadratic form $2x_1^2 + 4x_1x_2 + 5x_2^2$ is the one treated in Example 2 of the foregoing section. Its matrix has eigenvalues $\lambda_1 = 1$, $\lambda_2 = 6$, and an orthonormal set of eigenvectors $u_1 = t(2, -1)$, $u_2 = t(1, 2)$, where $t = 1/\sqrt{5}$. An orthogonal diagonalizing matrix is $C = t\begin{bmatrix} 2 & 1 \\ -1 & 2 \end{bmatrix}$.
This reduces the quadratic part of (5.12) to the form $y_1^2 + 6y_2^2$. To determine the effect on the linear part we write the equation of rotation $Y = XC$ in the form $X = YC^t$ and obtain

\[ [x_1, x_2] = \frac{1}{\sqrt{5}}[y_1, y_2]\begin{bmatrix} 2 & -1 \\ 1 & 2 \end{bmatrix}, \qquad x_1 = \frac{1}{\sqrt{5}}(2y_1 + y_2), \qquad x_2 = \frac{1}{\sqrt{5}}(-y_1 + 2y_2). \]

Therefore the linear part $4x_1 + 13x_2$ is transformed to

\[ \frac{4}{\sqrt{5}}(2y_1 + y_2) + \frac{13}{\sqrt{5}}(-y_1 + 2y_2) = -\sqrt{5}\,y_1 + 6\sqrt{5}\,y_2. \]

The transformed Cartesian equation becomes

\[ y_1^2 + 6y_2^2 - \sqrt{5}\,y_1 + 6\sqrt{5}\,y_2 - \tfrac{1}{4} = 0. \]

By completing the squares in $y_1$ and $y_2$ we rewrite this as follows:

\[ (y_1 - \tfrac{1}{2}\sqrt{5})^2 + 6(y_2 + \tfrac{1}{2}\sqrt{5})^2 = 9. \]

This is the equation of an ellipse with its center at the point $(\tfrac{1}{2}\sqrt{5}, -\tfrac{1}{2}\sqrt{5})$ in the $y_1y_2$-system. The positive directions of the $y_1$ and $y_2$ axes are determined by the eigenvectors $u_1$ and $u_2$, as indicated in Figure 5.2. We can simplify the equation further by writing

\[ z_1 = y_1 - \tfrac{1}{2}\sqrt{5}, \qquad z_2 = y_2 + \tfrac{1}{2}\sqrt{5}. \]

Geometrically, this is the same as introducing a new system of coordinate axes parallel to the $y_1y_2$ axes but with the new origin at the center of the ellipse. In the $z_1z_2$-system the equation of the ellipse is simply

\[ z_1^2 + 6z_2^2 = 9, \qquad \text{or} \qquad \frac{z_1^2}{9} + \frac{z_2^2}{3/2} = 1. \]

The ellipse and all three coordinate systems are shown in Figure 5.2.

FIGURE 5.2 Rotation and translation of coordinate axes. The rotation $Y = XC$ is followed by the translation $z_1 = y_1 - \tfrac{1}{2}\sqrt{5}$, $z_2 = y_2 + \tfrac{1}{2}\sqrt{5}$.

EXAMPLE 2. $2x^2 - 4xy - y^2 - 4x + 10y - 13 = 0$. We rewrite this as

\[ 2x_1^2 - 4x_1x_2 - x_2^2 - 4x_1 + 10x_2 - 13 = 0. \]

The quadratic part is $XAX^t$, where $A = \begin{bmatrix} 2 & -2 \\ -2 & -1 \end{bmatrix}$. This matrix has the eigenvalues $\lambda_1 = 3$, $\lambda_2 = -2$. An orthonormal set of eigenvectors is $u_1 = t(2, -1)$, $u_2 = t(1, 2)$, where $t = 1/\sqrt{5}$. An orthogonal diagonalizing matrix is $C = t\begin{bmatrix} 2 & 1 \\ -1 & 2 \end{bmatrix}$. The equation of rotation $X = YC^t$ gives us

\[ x_1 = \frac{1}{\sqrt{5}}(2y_1 + y_2), \qquad x_2 = \frac{1}{\sqrt{5}}(-y_1 + 2y_2). \]

Therefore the transformed equation becomes

\[ 3y_1^2 - 2y_2^2 - \frac{4}{\sqrt{5}}(2y_1 + y_2) + \frac{10}{\sqrt{5}}(-y_1 + 2y_2) - 13 = 0, \]

or

\[ 3y_1^2 - 2y_2^2 - \frac{18}{\sqrt{5}}\,y_1 + \frac{16}{\sqrt{5}}\,y_2 - 13 = 0. \]

FIGURE 5.3 The curves in Examples 2 and 3. (a) Hyperbola: $3z_1^2 - 2z_2^2 = 12$. (b) Parabola: $y_1^2 + y_2 = 0$.

By completing the squares in $y_1$ and $y_2$ we obtain the equation

\[ 3(y_1 - \tfrac{3}{5}\sqrt{5})^2 - 2(y_2 - \tfrac{4}{5}\sqrt{5})^2 = 12, \]
which represents a hyperbola with its center at $(\tfrac{3}{5}\sqrt{5}, \tfrac{4}{5}\sqrt{5})$ in the $y_1y_2$-system. The translation $z_1 = y_1 - \tfrac{3}{5}\sqrt{5}$, $z_2 = y_2 - \tfrac{4}{5}\sqrt{5}$ simplifies this equation further to

\[ 3z_1^2 - 2z_2^2 = 12, \qquad \text{or} \qquad \frac{z_1^2}{4} - \frac{z_2^2}{6} = 1. \]

The hyperbola is shown in Figure 5.3(a). The eigenvectors $u_1$ and $u_2$ determine the directions of the positive $y_1$ and $y_2$ axes.

EXAMPLE 3. $9x^2 + 24xy + 16y^2 - 20x + 15y = 0$. We rewrite this as

\[ 9x_1^2 + 24x_1x_2 + 16x_2^2 - 20x_1 + 15x_2 = 0. \]

The symmetric matrix for the quadratic part is $A = \begin{bmatrix} 9 & 12 \\ 12 & 16 \end{bmatrix}$. Its eigenvalues are $\lambda_1 = 25$, $\lambda_2 = 0$. An orthonormal set of eigenvectors is $u_1 = \tfrac{1}{5}(3, 4)$, $u_2 = \tfrac{1}{5}(-4, 3)$. An orthogonal diagonalizing matrix is $C = \tfrac{1}{5}\begin{bmatrix} 3 & -4 \\ 4 & 3 \end{bmatrix}$. The equation of rotation $X = YC^t$ gives us

\[ x_1 = \tfrac{1}{5}(3y_1 - 4y_2), \qquad x_2 = \tfrac{1}{5}(4y_1 + 3y_2). \]

Therefore the transformed Cartesian equation becomes

\[ 25y_1^2 - \tfrac{20}{5}(3y_1 - 4y_2) + \tfrac{15}{5}(4y_1 + 3y_2) = 0. \]

This simplifies to $y_1^2 + y_2 = 0$, the equation of a parabola with its vertex at the origin. The parabola is shown in Figure 5.3(b).

EXAMPLE 4. Degenerate cases. A knowledge of the eigenvalues alone does not reveal whether the Cartesian equation represents a degenerate conic section. For example, the three equations $x^2 + 2y^2 = 1$, $x^2 + 2y^2 = 0$, and $x^2 + 2y^2 = -1$ all have the same eigenvalues; the first represents a nondegenerate ellipse, the second is satisfied only by $(x, y) = (0, 0)$, and the third represents the empty set. The last two can be regarded as degenerate cases of the ellipse.

The graph of the equation $y^2 = 0$ is the $x$-axis. The equation $y^2 - 1 = 0$ represents the two parallel lines $y = 1$ and $y = -1$. These can be regarded as degenerate cases of the parabola. The equation $x^2 - 4y^2 = 0$ represents two intersecting lines since it is satisfied if either $x - 2y = 0$ or $x + 2y = 0$. This can be regarded as a degenerate case of the hyperbola.

However, if the Cartesian equation $ax^2 + bxy + cy^2 + dx + ey + f = 0$ represents a nondegenerate conic section, then the type of conic can be determined quite easily.
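The eigenvalue test described earlier in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `eigenvalues_2x2_symmetric` and `conic_type` and the tolerance `tol` are hypothetical choices, the closed form for the eigenvalues is the usual quadratic-formula solution for a symmetric $2 \times 2$ matrix, and the classification is meaningful only when the conic is known to be nondegenerate.

```python
import math


def eigenvalues_2x2_symmetric(a, b, c):
    """Eigenvalues of [[a, b/2], [b/2, c]], the matrix of a*x^2 + b*x*y + c*y^2."""
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b / 2.0)
    return mean - radius, mean + radius


def conic_type(a, b, c, tol=1e-9):
    """Classify a nondegenerate conic by the sign of the product of the eigenvalues."""
    lam1, lam2 = eigenvalues_2x2_symmetric(a, b, c)
    product = lam1 * lam2
    if abs(product) <= tol:
        return "parabola"
    return "ellipse" if product > 0 else "hyperbola"


# Second-degree coefficients (a, b, c) of Examples 1, 2 and 3.
print(conic_type(2, 4, 5))     # Example 1
print(conic_type(2, -4, -1))   # Example 2
print(conic_type(9, 24, 16))   # Example 3
```

Only the second-degree coefficients enter the test, which matches the remark that the linear terms do not affect the type of a nondegenerate conic.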
The characteristic polynomial of the matrix of the quadratic form $ax^2 + bxy + cy^2$ is

\[ \det\begin{bmatrix} \lambda - a & -b/2 \\ -b/2 & \lambda - c \end{bmatrix} = \lambda^2 - (a + c)\lambda + (ac - \tfrac{1}{4}b^2) = (\lambda - \lambda_1)(\lambda - \lambda_2). \]

Therefore the product of the eigenvalues is

\[ \lambda_1\lambda_2 = ac - \tfrac{1}{4}b^2 = \tfrac{1}{4}(4ac - b^2). \]

Since the type of conic is determined by the algebraic sign of the product $\lambda_1\lambda_2$, we see that the conic is an ellipse, hyperbola, or parabola, according as $4ac - b^2$ is positive, negative, or zero. The number $4ac - b^2$ is called the discriminant of the quadratic form $ax^2 + bxy + cy^2$. In Examples 1, 2 and 3 the discriminant has the values 24, $-24$, and 0, respectively.

5.15 Exercises

In each of Exercises 1 through 7, find (a) a symmetric matrix $A$ for the quadratic form; (b) the eigenvalues of $A$; (c) an orthonormal set of eigenvectors; (d) an orthogonal diagonalizing matrix $C$.

1. $4x_1^2 + 4x_1x_2 + x_2^2$.
2. $x_1x_2$.
3. $x_1^2 + 2x_1x_2 - x_2^2$.
4. $34x_1^2 - 24x_1x_2 + 41x_2^2$.
5. $x_1^2 + x_1x_2 + x_1x_3 + x_2x_3$.
6. $2x_1^2 + 4x_1x_3 + x_2^2 - x_3^2$.
7. $3x_1^2 + 4x_1x_2 + 8x_1x_3 + 4x_2x_3 + 3x_3^2$.

In each of Exercises 8 through 18, identify and make a sketch of the conic section represented by the Cartesian equation.

8. $y^2 - 2xy + 2x^2 - 5 = 0$.
9. $y^2 - 2xy + 5x = 0$.
10. $y^2 - 2xy + x^2 - 5x = 0$.
11. $5x^2 - 4xy + 2y^2 - 6 = 0$.
12. $19x^2 + 4xy + 16y^2 - 212x + 104y = 356$.
13. $9x^2 + 24xy + 16y^2 - 52x + 14y = 6$.
14. $5x^2 + 6xy + 5y^2 - 2 = 0$.
15. $x^2 + 2xy + y^2 - 2x + 2y + 3 = 0$.
16. $2x^2 + 4xy + 5y^2 - 2x - y - 4 = 0$.
17. $x^2 + 4xy - 2y^2 - 12 = 0$.
18. $xy + y - 2x - 2 = 0$.
19. For what value (or values) of $c$ will the graph of the Cartesian equation $2xy - 4x + 7y + c = 0$ be a pair of lines?
20. If the equation $ax^2 + bxy + cy^2 = 1$ represents an ellipse, prove that the area of the region it bounds is $2\pi/\sqrt{4ac - b^2}$. This gives a geometric meaning to the discriminant $4ac - b^2$.

*5.16 Eigenvalues of a symmetric transformation obtained as values of its quadratic form

Now we drop the requirement that $V$ be finite-dimensional and we find a relation between the eigenvalues of a symmetric operator and its quadratic form.
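Suppose $x$ is an eigenvector with norm 1 belonging to an eigenvalue $\lambda$. Then $T(x) = \lambda x$ so we have

(5.13) $Q(x) = (T(x), x) = (\lambda x, x) = \lambda(x, x) = \lambda$,

since $(x, x) = 1$. The set of all $x$ in $V$ satisfying $(x, x) = 1$ is called the unit sphere in $V$. Equation (5.13) proves the following theorem.

THEOREM 5.12. Let $T: V \to V$ be a symmetric transformation on a real Euclidean space $V$, and let $Q(x) = (T(x), x)$. Then the eigenvalues of $T$ (if any exist) are to be found among the values that $Q$ takes on the unit sphere in $V$.

EXAMPLE. Let $V = V_2(\mathbf{R})$ with the usual basis $(i, j)$ and the usual dot product as inner product. Let $T$ be the symmetric transformation with matrix $A = \begin{bmatrix} 4 & 0 \\ 0 & 8 \end{bmatrix}$. Then the quadratic form of $T$ is given by

\[ Q(x) = \sum_{i=1}^{2}\sum_{j=1}^{2} a_{ij}x_ix_j = 4x_1^2 + 8x_2^2. \]

The eigenvalues of $T$ are $\lambda_1 = 4$, $\lambda_2 = 8$. It is easy to see that these eigenvalues are, respectively, the minimum and maximum values which $Q$ takes on the unit circle $x_1^2 + x_2^2 = 1$. In fact, on this circle we have

\[ Q(x) = 4(x_1^2 + x_2^2) + 4x_2^2 = 4 + 4x_2^2, \qquad \text{where } -1 \le x_2 \le 1. \]

This has its smallest value, 4, when $x_2 = 0$ and its largest value, 8, when $x_2 = \pm 1$.

THEOREM 5.13. Let $T: V \to V$ be a symmetric transformation on a real Euclidean space $V$ with quadratic form $Q(x) = (T(x), x)$. Assume that $Q$ does not change sign on $V$. Then if $Q(x) = 0$ for some $x$ in $V$ we also have $T(x) = 0$.

Proof. Assume $Q(x) = 0$ for some $x$ in $V$ and let $y$ be any element in $V$. Choose any real $t$ and consider $Q(x + ty)$. Using linearity of $T$, linearity of the inner product, and symmetry of $T$, we have

\[ Q(x + ty) = (T(x) + tT(y),\ x + ty) = Q(x) + 2t(T(x), y) + t^2Q(y) = at + bt^2, \]

where $a = 2(T(x), y)$ and $b = Q(y)$. If $Q$ is nonnegative on $V$ we have the inequality $at + bt^2 \ge 0$ for all real $t$. In other words, the quadratic polynomial $p(t) = at + bt^2$ has its minimum at $t = 0$. Hence $p'(0) = 0$. But $p'(0) = a = 2(T(x), y)$, so $(T(x), y) = 0$. Since $y$ was arbitrary, we can in particular take $y = T(x)$, getting $(T(x), T(x)) = 0$. This proves that $T(x) = 0$. If $Q$ is nonpositive on $V$ we get $p(t) = at + bt^2 \le 0$ for all $t$, so $p$ has its maximum at $t = 0$, and hence $p'(0) = 0$ as before.

*5.17 Extremal properties of eigenvalues of a symmetric transformation

Now we shall prove that the extreme values of a quadratic form on the unit sphere are eigenvalues.

THEOREM 5.14. Let $T: V \to V$ be a symmetric linear transformation on a real Euclidean space $V$, and let $Q(x) = (T(x), x)$. Among all values that $Q$ takes on the unit sphere, assume there is an extremum (maximum or minimum) at a point $u$ with $(u, u) = 1$. Then $u$ is an eigenvector for $T$; the corresponding eigenvalue is $Q(u)$, the extreme value of $Q$ on the unit sphere.

Proof.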
Assume $Q$ has a minimum at $u$. Then we have

(5.14) $Q(x) \ge Q(u)$ for all $x$ with $(x, x) = 1$.

Let $\lambda = Q(u)$. If $(x, x) = 1$ we have $Q(u) = \lambda(x, x) = (\lambda x, x)$, so inequality (5.14) can be written as

(5.15) $(T(x), x) \ge (\lambda x, x)$,

provided $(x, x) = 1$. Now we prove that (5.15) is valid for all $x$ in $V$. Suppose $\|x\| = a$. Then $x = ay$, where $\|y\| = 1$. Hence

\[ (T(x), x) = (T(ay), ay) = a^2(T(y), y) \qquad \text{and} \qquad (\lambda x, x) = a^2(\lambda y, y). \]

But $(T(y), y) \ge (\lambda y, y)$ since $(y, y) = 1$. Multiplying both members of this inequality by $a^2$ we get (5.15) for $x = ay$.

Since $(T(x), x) - (\lambda x, x) = (T(x) - \lambda x, x)$, we can rewrite inequality (5.15) in the form $(T(x) - \lambda x, x) \ge 0$, or

(5.16) $(S(x), x) \ge 0$, where $S = T - \lambda I$.

When $x = u$ we have equality in (5.14) and hence also in (5.16). The linear transformation $S$ is symmetric. Inequality (5.16) states that the quadratic form $Q_1$ given by $Q_1(x) = (S(x), x)$ is nonnegative on $V$. When $x = u$ we have $Q_1(u) = 0$. Therefore, by Theorem 5.13 we must have $S(u) = 0$. In other words, $T(u) = \lambda u$, so $u$ is an eigenvector for $T$, and $\lambda = Q(u)$ is the corresponding eigenvalue. This completes the proof if $Q$ has a minimum at $u$.

If there is a maximum at $u$ all the inequalities in the foregoing proof are reversed and we apply Theorem 5.13 to the nonpositive quadratic form $Q_1$.

5.18 The finite-dimensional case

Suppose now that $\dim V = n$. Then $T$ has $n$ real eigenvalues which can be arranged in increasing order, say

\[ \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n. \]

According to Theorem 5.14, the smallest eigenvalue $\lambda_1$ is the minimum of $Q$ on the unit sphere, and the largest eigenvalue is the maximum of $Q$ on the unit sphere. Now we shall show that the intermediate eigenvalues also occur as extreme values of $Q$, restricted to certain subsets of the unit sphere.

If $V$ is infinite-dimensional, the quadratic form $Q$ need not have an extremum on the unit sphere. This will be the case when $T$ has no eigenvalues. In the finite-dimensional case, $Q$ always has a maximum and a minimum somewhere on the unit sphere.
This follows as a consequence of a more general theorem on extreme values of continuous functions. For a special case of this theorem see Section 9.16.

Let $u_1$ be an eigenvector on the unit sphere which minimizes $Q$. Then $\lambda_1 = Q(u_1)$. If $\lambda$ is an eigenvalue different from $\lambda_1$, any eigenvector belonging to $\lambda$ must be orthogonal to $u_1$. Therefore it is natural to search for such an eigenvector on the orthogonal complement of the subspace spanned by $u_1$.

Let $S$ be the subspace spanned by $u_1$. The orthogonal complement $S^\perp$ consists of all elements in $V$ orthogonal to $u_1$. In particular, $S^\perp$ contains all eigenvectors belonging to eigenvalues $\lambda \ne \lambda_1$. It is easily verified that $\dim S^\perp = n - 1$ and that $T$ maps $S^\perp$ into itself.† Let $S_{n-1}$ denote the unit sphere in the $(n - 1)$-dimensional subspace $S^\perp$. (The unit sphere $S_{n-1}$ is a subset of the unit sphere in $V$.) Applying Theorem 5.14 to the subspace $S^\perp$ we find that $\lambda_2 = Q(u_2)$, where $u_2$ is a point which minimizes $Q$ on $S_{n-1}$.

The next eigenvalue $\lambda_3$ can be obtained in a similar way as the minimum value of $Q$ on the unit sphere $S_{n-2}$ in the $(n - 2)$-dimensional space consisting of those elements orthogonal to both $u_1$ and $u_2$. Continuing in this manner we find that each eigenvalue $\lambda_k$ is the minimum value which $Q$ takes on a unit sphere $S_{n-k+1}$ in a subspace of dimension $n - k + 1$. The largest of these minima, $\lambda_n$, is also the maximum value which $Q$ takes on each of the spheres $S_{n-k+1}$. The corresponding set of eigenvectors $u_1, \ldots, u_n$ form an orthonormal basis for $V$.

5.19 Unitary transformations

We conclude this chapter with a brief discussion of another important class of transformations known as unitary transformations. In the finite-dimensional case they are represented by unitary matrices.

DEFINITION. Let $E$ be a Euclidean space and $V$ a subspace of $E$. A linear transformation $T: V \to E$ is called unitary on $V$ if we have

(5.17) $(T(x), T(y)) = (x, y)$ for all $x$ and $y$ in $V$.
When $E$ is a real Euclidean space a unitary transformation is also called an orthogonal transformation.

Equation (5.17) is described by saying that $T$ preserves inner products. Therefore it is natural to expect that $T$ also preserves orthogonality and norms, since these are derived from the inner product.

THEOREM 5.15. If $T: V \to E$ is a unitary transformation on $V$, then for all $x$ and $y$ in $V$ we have
(a) $(x, y) = 0$ implies $(T(x), T(y)) = 0$ ($T$ preserves orthogonality).
(b) $\|T(x)\| = \|x\|$ ($T$ preserves norms).
(c) $\|T(x) - T(y)\| = \|x - y\|$ ($T$ preserves distances).
(d) $T$ is invertible, and $T^{-1}$ is unitary on $T(V)$.

† This was done in the proof of Theorem 5.4, Section 5.6.

Proof. Part (a) follows at once from Equation (5.17). Part (b) follows by taking $x = y$ in (5.17). Part (c) follows from (b) because $T(x) - T(y) = T(x - y)$.

To prove (d) we use (b) which shows that $T(x) = 0$ implies $x = 0$, so $T$ is invertible. If $x \in T(V)$ and $y \in T(V)$ we can write $x = T(u)$, $y = T(v)$, so we have

\[ (T^{-1}(x), T^{-1}(y)) = (u, v) = (T(u), T(v)) = (x, y). \]

Therefore $T^{-1}$ is unitary on $T(V)$.

Regarding eigenvalues and eigenvectors we have the following theorem.

THEOREM 5.16. Let $T: V \to E$ be a unitary transformation on $V$.
(a) If $T$ has an eigenvalue $\lambda$, then $|\lambda| = 1$.
(b) If $x$ and $y$ are eigenvectors belonging to distinct eigenvalues $\lambda$ and $\mu$, then $x$ and $y$ are orthogonal.
(c) If $V = E$ and $\dim V = n$, and if $V$ is a complex space, then there exist eigenvectors $u_1, \ldots, u_n$ of $T$ which form an orthonormal basis for $V$. The matrix of $T$ relative to this basis is the diagonal matrix $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$, where $\lambda_k$ is the eigenvalue belonging to $u_k$.

Proof. To prove (a), let $x$ be an eigenvector belonging to $\lambda$. Then $x \ne 0$ and $T(x) = \lambda x$. Taking $y = x$ in Equation (5.17) we get

\[ (\lambda x, \lambda x) = (x, x) \qquad \text{or} \qquad \lambda\bar{\lambda}(x, x) = (x, x). \]

Since $(x, x) > 0$ and $\lambda\bar{\lambda} = |\lambda|^2$, this implies $|\lambda| = 1$.

To prove (b), write $T(x) = \lambda x$, $T(y) = \mu y$ and compute the inner product $(T(x), T(y))$ in two ways. We have $(T(x), T(y)) = (x, y)$ since $T$ is unitary.
We also have

\[ (T(x), T(y)) = (\lambda x, \mu y) = \lambda\bar{\mu}(x, y) \]

since $x$ and $y$ are eigenvectors. Therefore $\lambda\bar{\mu}(x, y) = (x, y)$, so $(x, y) = 0$ unless $\lambda\bar{\mu} = 1$. But $\lambda\bar{\lambda} = 1$ by (a), so if we had $\lambda\bar{\mu} = 1$ we would also have $\lambda\bar{\lambda} = \lambda\bar{\mu}$, $\bar{\lambda} = \bar{\mu}$, $\lambda = \mu$, which contradicts the assumption that $\lambda$ and $\mu$ are distinct. Therefore $\lambda\bar{\mu} \ne 1$ and $(x, y) = 0$.

Part (c) is proved by induction on $n$ in much the same way that we proved Theorem 5.4, the corresponding result for Hermitian operators. The only change required is in that part of the proof which shows that $T$ maps $S^\perp$ into itself, where

\[ S^\perp = \{x \mid x \in V,\ (x, u_1) = 0\}. \]

Here $u_1$ is an eigenvector of $T$ with eigenvalue $\lambda_1$. From the equation $T(u_1) = \lambda_1u_1$ we find

\[ u_1 = \lambda_1^{-1}T(u_1) = \bar{\lambda}_1T(u_1), \]

since $\lambda_1\bar{\lambda}_1 = |\lambda_1|^2 = 1$. Now choose any $x$ in $S^\perp$ and note that

\[ (T(x), u_1) = (T(x), \bar{\lambda}_1T(u_1)) = \lambda_1(T(x), T(u_1)) = \lambda_1(x, u_1) = 0. \]

Hence $T(x) \in S^\perp$ if $x \in S^\perp$, so $T$ maps $S^\perp$ into itself. The rest of the proof is identical with that of Theorem 5.4, so we shall not repeat the details.

The next two theorems describe properties of unitary transformations on a finite-dimensional space. We give only a brief outline of the proofs.

THEOREM 5.17. Assume $\dim V = n$ and let $E = (e_1, \ldots, e_n)$ be a fixed basis for $V$. Then a linear transformation $T: V \to V$ is unitary if and only if

(5.18) $(T(e_i), T(e_j)) = (e_i, e_j)$ for all $i$ and $j$.

In particular, if $E$ is orthonormal then $T$ is unitary if and only if $T$ maps $E$ onto an orthonormal basis.

Sketch of proof. Write $x = \sum x_ie_i$, $y = \sum y_je_j$. Then we have

\[ (x, y) = \Big(\sum_{i=1}^{n} x_ie_i,\ \sum_{j=1}^{n} y_je_j\Big) = \sum_{i=1}^{n}\sum_{j=1}^{n} x_i\bar{y}_j(e_i, e_j) \]

and

\[ (T(x), T(y)) = \Big(\sum_{i=1}^{n} x_iT(e_i),\ \sum_{j=1}^{n} y_jT(e_j)\Big) = \sum_{i=1}^{n}\sum_{j=1}^{n} x_i\bar{y}_j(T(e_i), T(e_j)). \]

Now compare $(x, y)$ with $(T(x), T(y))$.

THEOREM 5.18. Assume $\dim V = n$ and let $(e_1, \ldots, e_n)$ be an orthonormal basis for $V$. Let $A = (a_{ij})$ be the matrix representation of a linear transformation $T: V \to V$ relative to this basis. Then $T$ is unitary if and only if $A$ is unitary, that is, if and only if

(5.19) $A^*A = I$.

Sketch of proof. Since $(e_i, e_j)$ is the $ij$-entry of the identity matrix, Equation (5.19) implies

(5.20) $(e_i, e_j) = \sum_{k=1}^{n} \bar{a}_{ki}a_{kj} = \sum_{k=1}^{n} a_{ki}\bar{a}_{kj}$.
Since $A$ is the matrix of $T$ we have $T(e_i) = \sum_{k=1}^{n} a_{ki}e_k$, $T(e_j) = \sum_{r=1}^{n} a_{rj}e_r$, so

\[ (T(e_i), T(e_j)) = \Big(\sum_{k=1}^{n} a_{ki}e_k,\ \sum_{r=1}^{n} a_{rj}e_r\Big) = \sum_{k=1}^{n}\sum_{r=1}^{n} a_{ki}\bar{a}_{rj}(e_k, e_r) = \sum_{k=1}^{n} a_{ki}\bar{a}_{kj}. \]

Now compare this with (5.20) and use Theorem 5.17.

THEOREM 5.19. Every unitary matrix $A$ has the following properties:
(a) $A$ is nonsingular and $A^{-1} = A^*$.
(b) Each of $A^t$, $\bar{A}$, and $A^*$ is a unitary matrix.
(c) The eigenvalues of $A$ are complex numbers of absolute value 1.
(d) $|\det A| = 1$; if $A$ is real, then $\det A = \pm 1$.

The proof of Theorem 5.19 is left as an exercise for the reader.

5.20 Exercises

1. (a) Let $T: V \to V$ be the transformation given by $T(x) = cx$, where $c$ is a fixed scalar. Prove that $T$ is unitary if and only if $|c| = 1$.
(b) If $V$ is one-dimensional, prove that the only unitary transformations on $V$ are those described in (a). In particular, if $V$ is a real one-dimensional space, there are only two orthogonal transformations, $T(x) = x$ and $T(x) = -x$.
2. Prove each of the following statements about a real orthogonal $n \times n$ matrix $A$.
(a) If $\lambda$ is a real eigenvalue of $A$, then $\lambda = 1$ or $\lambda = -1$.
(b) If $\lambda$ is a complex eigenvalue of $A$, then the complex conjugate $\bar{\lambda}$ is also an eigenvalue of $A$. In other words, the nonreal eigenvalues of $A$ occur in conjugate pairs.
(c) If $n$ is odd, then $A$ has at least one real eigenvalue.
3. Let $V$ be a real Euclidean space of dimension $n$. An orthogonal transformation $T: V \to V$ with determinant 1 is called a rotation. If $n$ is odd, prove that 1 is an eigenvalue for $T$. This shows that every rotation of an odd-dimensional space has a fixed axis. [Hint: Use Exercise 2.]
4. Given a real orthogonal matrix $A$ with $-1$ as an eigenvalue of multiplicity $k$. Prove that $\det A = (-1)^k$.
5. If $T$ is linear and norm-preserving, prove that $T$ is unitary.
6. If $T: V \to V$ is both unitary and Hermitian, prove that $T^2 = I$.
7. Let $(e_1, \ldots, e_n)$ and $(u_1, \ldots, u_n)$ be two orthonormal bases for a Euclidean space $V$. Prove that there is a unitary transformation $T$ which maps one of these bases onto the other.
8.
Find a real $a$ such that the following matrix is unitary:

\[ \begin{bmatrix} a & \tfrac{1}{2}i & \tfrac{1}{2}a(2i - 1) \\ ia & \tfrac{1}{2}(1 + i) & \tfrac{1}{2}a(1 - i) \\ a & -\tfrac{1}{2} & \tfrac{1}{2}a(2 - i) \end{bmatrix}. \]

9. If $A$ is a skew-Hermitian matrix, prove that both $I - A$ and $I + A$ are nonsingular and that $(I - A)(I + A)^{-1}$ is unitary.
10. If $A$ is a unitary matrix and if $I + A$ is nonsingular, prove that $(I - A)(I + A)^{-1}$ is skew-Hermitian.
11. If $A$ is Hermitian, prove that $A - iI$ is nonsingular and that $(A - iI)^{-1}(A + iI)$ is unitary.
12. Prove that any unitary matrix can be diagonalized by a unitary matrix.
13. A square matrix is called normal if $AA^* = A^*A$. Determine which of the following types of matrices are normal.
(a) Hermitian matrices.
(b) Skew-Hermitian matrices.
(c) Symmetric matrices.
(d) Skew-symmetric matrices.
(e) Unitary matrices.
(f) Orthogonal matrices.
14. If $A$ is a normal matrix ($AA^* = A^*A$) and if $U$ is a unitary matrix, prove that $U^*AU$ is normal.

6 LINEAR DIFFERENTIAL EQUATIONS

6.1 Historical introduction

The history of differential equations began in the 17th century when Newton, Leibniz, and the Bernoullis solved some simple differential equations of the first and second order arising from problems in geometry and mechanics. These early discoveries, beginning about 1690, seemed to suggest that the solutions of all differential equations based on geometric and physical problems could be expressed in terms of the familiar elementary functions of calculus. Therefore, much of the early work was aimed at developing ingenious techniques for solving differential equations by elementary means, that is to say, by addition, subtraction, multiplication, division, composition, and integration, applied only a finite number of times to the familiar functions of calculus.

Special methods such as separation of variables and the use of integrating factors were devised more or less haphazardly before the end of the 17th century. During the 18th century, more systematic procedures were developed, primarily by Euler, Lagrange, and Laplace.
It soon became apparent that relatively few differential equations could be solved by elementary means. Little by little, mathematicians began to realize that it was hopeless to try to discover methods for solving all differential equations. Instead, they found it more fruitful to ask whether or not a given differential equation has any solution at all and, when it has, to try to deduce properties of the solution from the differential equation itself. Within this framework, mathematicians began to think of differential equations as new sources of functions.

An important phase in the theory developed early in the 19th century, paralleling the general trend toward a more rigorous approach to the calculus. In the 1820's, Cauchy obtained the first "existence theorem" for differential equations. He proved that every first-order equation of the form

\[ y' = f(x, y) \]

has a solution whenever the right member, $f(x, y)$, satisfies certain general conditions. One important example is the Riccati equation

\[ y' = P(x)y^2 + Q(x)y + R(x), \]

where $P$, $Q$, and $R$ are given functions. Cauchy's work implies the existence of a solution of the Riccati equation in any open interval $(-r, r)$ about the origin, provided $P$, $Q$, and $R$ have power-series expansions in $(-r, r)$. In 1841 Joseph Liouville (1809-1882) showed that in some cases this solution cannot be obtained by elementary means.

Experience has shown that it is difficult to obtain results of much generality about solutions of differential equations, except for a few types. Among these are the so-called linear differential equations which occur in a great variety of scientific problems. Some simple types were discussed in Volume I: linear equations of first order and linear equations of second order with constant coefficients. The next section gives a review of the principal results concerning these equations.
6.2 Review of results concerning linear equations of first and second orders

A linear differential equation of first order is one of the form

(6.1) $y' + P(x)y = Q(x)$,

where $P$ and $Q$ are given functions. In Volume I we proved an existence-uniqueness theorem for this equation (Theorem 8.3) which we restate here.

THEOREM 6.1. Assume $P$ and $Q$ are continuous on an open interval $J$. Choose any point $a$ in $J$ and let $b$ be any real number. Then there is one and only one function $y = f(x)$ which satisfies the differential equation (6.1) and the initial condition $f(a) = b$. This function is given by the explicit formula

(6.2) $f(x) = be^{-A(x)} + e^{-A(x)}\int_a^x Q(t)e^{A(t)}\,dt$,

where $A(x) = \int_a^x P(t)\,dt$.

Linear equations of second order are those of the form

\[ P_0(x)y'' + P_1(x)y' + P_2(x)y = R(x). \]

If the coefficients $P_0$, $P_1$, $P_2$ and the right-hand member $R$ are continuous on some interval $J$, and if $P_0$ is never zero on $J$, an existence theorem (discussed in Section 6.5) guarantees that solutions always exist over the interval $J$. Nevertheless, there is no general formula analogous to (6.2) for expressing these solutions in terms of $P_0$, $P_1$, $P_2$, and $R$. Thus, even in this relatively simple generalization of (6.1), the theory is far from complete, except in special cases. If the coefficients are constants and if $R$ is zero, all the solutions can be determined explicitly in terms of polynomials, exponential and trigonometric functions by the following theorem which was proved in Volume I (Theorem 8.7).

THEOREM 6.2. Consider the differential equation

(6.3) $y'' + ay' + by = 0$,

where $a$ and $b$ are given real constants. Let $d = a^2 - 4b$. Then every solution of (6.3) on the interval $(-\infty, +\infty)$ has the form

(6.4) $y = e^{-ax/2}[c_1u_1(x) + c_2u_2(x)]$,

where $c_1$ and $c_2$ are constants, and the functions $u_1$ and $u_2$ are determined according to the algebraic sign of $d$ as follows:
(a) If $d = 0$, then $u_1(x) = 1$ and $u_2(x) = x$.
(b) If $d > 0$, then $u_1(x) = e^{kx}$ and $u_2(x) = e^{-kx}$, where $k = \tfrac{1}{2}\sqrt{d}$.
(c) If $d < 0$, then $u_1(x) = \cos kx$ and $u_2(x) = \sin kx$, where $k = \tfrac{1}{2}\sqrt{-d}$.
The number $d = a^2 - 4b$ is the discriminant of the quadratic equation

(6.5) $r^2 + ar + b = 0$.

This is called the characteristic equation of the differential equation (6.3). Its roots are given by

\[ r_1 = \frac{-a + \sqrt{d}}{2}, \qquad r_2 = \frac{-a - \sqrt{d}}{2}. \]

The algebraic sign of $d$ determines the nature of these roots. If $d > 0$ both roots are real and the solution in (6.4) can be expressed in the form

\[ y = c_1e^{r_1x} + c_2e^{r_2x}. \]

If $d < 0$, the roots $r_1$ and $r_2$ are conjugate complex numbers. Each of the complex exponential functions $f_1(x) = e^{r_1x}$ and $f_2(x) = e^{r_2x}$ is a complex solution of the differential equation (6.3). We obtain real solutions by examining the real and imaginary parts of $f_1$ and $f_2$. Writing $r_1 = -\tfrac{1}{2}a + ik$, $r_2 = -\tfrac{1}{2}a - ik$, where $k = \tfrac{1}{2}\sqrt{-d}$, we have

\[ f_1(x) = e^{r_1x} = e^{-ax/2}e^{ikx} = e^{-ax/2}\cos kx + ie^{-ax/2}\sin kx \]

and

\[ f_2(x) = e^{r_2x} = e^{-ax/2}e^{-ikx} = e^{-ax/2}\cos kx - ie^{-ax/2}\sin kx. \]

The general solution appearing in Equation (6.4) is a linear combination of the real and imaginary parts of $f_1(x)$ and $f_2(x)$.

6.3 Exercises

These exercises have been selected from Chapter 8 in Volume I and are intended as a review of the introductory material on linear differential equations of first and second orders.

Linear equations of first order. In Exercises 1, 2, 3, solve the initial-value problem on the specified interval.

1. $y' - 3y = e^{2x}$ on $(-\infty, +\infty)$, with $y = 0$ when $x = 0$.
2. $xy' - 2y = x^5$ on $(0, +\infty)$, with $y = 1$ when $x = 1$.
3. $y' + y\tan x = \sin 2x$ on $(-\tfrac{1}{2}\pi, \tfrac{1}{2}\pi)$, with $y = 2$ when $x = 0$.
4. If a strain of bacteria grows at a rate proportional to the amount present and if the population doubles in one hour, by how much will it increase at the end of two hours?
5. A curve with Cartesian equation $y = f(x)$ passes through the origin. Lines drawn parallel to the coordinate axes through an arbitrary point of the curve form a rectangle with two sides on the axes. The curve divides every such rectangle into two regions $A$ and $B$, one of which has an area equal to $n$ times the other. Find the function $f$.
6.
(a) Let $u$ be a nonzero solution of the second-order equation $y'' + P(x)y' + Q(x)y = 0$. Show that the substitution $y = uv$ converts the equation

\[ y'' + P(x)y' + Q(x)y = R(x) \]

into a first-order linear equation for $v'$.
(b) Obtain a nonzero solution of the equation $y'' - 4y' + x^2(y' - 4y) = 0$ by inspection and use the method of part (a) to find a solution of

\[ y'' - 4y' + x^2(y' - 4y) = 2xe^{-x^3/3} \]

such that $y = 0$ and $y' = 4$ when $x = 0$.

Linear equations of second order with constant coefficients. In each of Exercises 7 through 10, find all solutions on $(-\infty, +\infty)$.

7. $y'' - 4y = 0$.
8. $y'' + 4y = 0$.
9. $y'' - 2y' + 5y = 0$.
10. $y'' + 2y' + y = 0$.
11. Find all values of the constant $k$ such that the differential equation $y'' + ky = 0$ has a nontrivial solution $y = f_k(x)$ for which $f_k(0) = f_k(1) = 0$. For each permissible value of $k$, determine the corresponding solution $y = f_k(x)$. Consider both positive and negative values of $k$.
12. If $(a, b)$ is a given point in the plane and if $m$ is a given real number, prove that the differential equation $y'' + ky = 0$ has exactly one solution whose graph passes through $(a, b)$ and has slope $m$ there. Discuss separately the case $k = 0$.
13. In each case, find a linear differential equation of second order satisfied by $u_1$ and $u_2$.
(a) $u_1(x) = e^x$, $u_2(x) = e^{-x}$.
(b) $u_1(x) = e^{2x}$, $u_2(x) = xe^{2x}$.
(c) $u_1(x) = e^{-x/2}\cos x$, $u_2(x) = e^{-x/2}\sin x$.
(d) $u_1(x) = \sin(2x + 1)$, $u_2(x) = \sin(2x + 2)$.
(e) $u_1(x) = \cosh x$, $u_2(x) = \sinh x$.
14. A particle undergoes simple harmonic motion. Initially its displacement is 1, its velocity is 1 and its acceleration is $-12$. Compute its displacement and acceleration when the velocity is $\sqrt{8}$.

6.4 Linear differential equations of order n

A linear differential equation of order $n$ is one of the form

(6.6) $P_0(x)y^{(n)} + P_1(x)y^{(n-1)} + \cdots + P_n(x)y = R(x)$.

The functions $P_0, P_1, \ldots, P_n$ multiplying the various derivatives of the unknown function $y$ are called the coefficients of the equation. In our general discussion of the linear equation we shall assume that all the coefficients are continuous on some interval $J$.
The word "interval" will refer either to a bounded or to an unbounded interval.

In the differential equation (6.6) the leading coefficient $P_0$ plays a special role, since it determines the order of the equation. Points at which $P_0(x) = 0$ are called singular points of the equation. The presence of singular points sometimes introduces complications that require special investigation. To avoid these difficulties we assume that the function $P_0$ is never zero on $J$. Then we can divide both sides of Equation (6.6) by $P_0$ and rewrite the differential equation in a form with leading coefficient 1. Therefore, in our general discussion we shall assume that the differential equation has the form

(6.7) $y^{(n)} + P_1(x)y^{(n-1)} + \cdots + P_n(x)y = R(x)$.

The discussion of linear equations can be simplified by the use of operator notation. Let $\mathcal{C}(J)$ denote the linear space consisting of all real-valued functions continuous on an interval $J$. Let $\mathcal{C}^n(J)$ denote the subspace consisting of all functions $f$ whose first $n$ derivatives $f', f'', \ldots, f^{(n)}$ exist and are continuous on $J$. Let $P_1, \ldots, P_n$ be $n$ given functions in $\mathcal{C}(J)$ and consider the operator $L: \mathcal{C}^n(J) \to \mathcal{C}(J)$ given by

\[ L(f) = f^{(n)} + P_1f^{(n-1)} + \cdots + P_nf. \]

The operator $L$ itself is sometimes written as

\[ L = D^n + P_1D^{n-1} + \cdots + P_n, \]

where $D^k$ denotes the $k$th derivative operator. In operator notation the differential equation in (6.7) is written simply as

(6.8) $L(y) = R$.

A solution of this equation is any function $y$ in $\mathcal{C}^n(J)$ which satisfies (6.8) on the interval $J$.

It is easy to verify that $L(y_1 + y_2) = L(y_1) + L(y_2)$, and that $L(cy) = cL(y)$ for every constant $c$. Therefore $L$ is a linear operator. This is why the equation $L(y) = R$ is referred to as a linear equation. The operator $L$ is called a linear differential operator of order $n$.
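The linearity of $L$ can be illustrated on polynomials, where differentiation is exact. The sketch below is only an illustration under stated assumptions: polynomials are stored as coefficient lists in ascending powers of $x$, the operator has order $n = 2$, and the names `derivative`, `add`, `scale`, `multiply`, `trim`, and `L` as well as the sample coefficients $P_1(x) = x$, $P_2(x) = 1$ are choices made here rather than anything taken from the text.

```python
def derivative(p):
    """Derivative of a polynomial given as ascending coefficients [c0, c1, ...]."""
    return [k * p[k] for k in range(1, len(p))] or [0]


def add(p, q):
    """Sum of two polynomials."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [p[k] + q[k] for k in range(n)]


def scale(c, p):
    """Constant multiple c*p."""
    return [c * coeff for coeff in p]


def multiply(p, q):
    """Product of two polynomials."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def trim(p):
    """Drop trailing zero coefficients so that equal polynomials compare equal."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p


def L(y, P1, P2):
    """Apply L = D^2 + P1*D + P2 to the polynomial y."""
    dy = derivative(y)
    d2y = derivative(dy)
    return trim(add(d2y, add(multiply(P1, dy), multiply(P2, y))))


P1, P2 = [0, 1], [1]            # P1(x) = x, P2(x) = 1
y1, y2 = [1, 0, 3], [0, 2, 0, 1]  # y1 = 1 + 3x^2, y2 = 2x + x^3

# L(y1 + y2) = L(y1) + L(y2) and L(c*y1) = c*L(y1).
assert L(add(y1, y2), P1, P2) == trim(add(L(y1, P1, P2), L(y2, P1, P2)))
assert L(scale(5, y1), P1, P2) == trim(scale(5, L(y1, P1, P2)))
```

Integer coefficients keep every comparison exact, so the two assertions test linearity itself rather than floating-point agreement.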
With each linear equation $L(y) = R$ we may associate the equation

\[ L(y) = 0, \]

in which the right-hand side has been replaced by zero. This is called the homogeneous equation corresponding to $L(y) = R$. When $R$ is not identically zero, the equation $L(y) = R$ is called a nonhomogeneous equation. We shall find that we can always solve the nonhomogeneous equation whenever we can solve the corresponding homogeneous equation. Therefore, we begin our study with the homogeneous case.

The set of solutions of the homogeneous equation is the null space $N(L)$ of the operator $L$. This is also called the solution space of the equation. The solution space is a subspace of $\mathcal{C}^n(J)$. Although $\mathcal{C}^n(J)$ is infinite-dimensional, it turns out that the solution space $N(L)$ is always finite-dimensional. In fact, we shall prove that

(6.9) $\dim N(L) = n$,

where $n$ is the order of the operator $L$. Equation (6.9) is called the dimensionality theorem for linear differential operators. The dimensionality theorem will be deduced as a consequence of an existence-uniqueness theorem which we discuss next.

6.5 The existence-uniqueness theorem

THEOREM 6.3. EXISTENCE-UNIQUENESS THEOREM FOR LINEAR EQUATIONS OF ORDER n. Let $P_1, P_2, \ldots, P_n$ be continuous functions on an open interval $J$ and let $L$ be the linear differential operator

\[ L = D^n + P_1D^{n-1} + \cdots + P_n. \]

If $x_0 \in J$ and if $k_0, k_1, \ldots, k_{n-1}$ are $n$ given real numbers, then there exists one and only one function $y = f(x)$ which satisfies the homogeneous differential equation $L(y) = 0$ on $J$ and which also satisfies the initial conditions

\[ f(x_0) = k_0, \quad f'(x_0) = k_1, \quad \ldots, \quad f^{(n-1)}(x_0) = k_{n-1}. \]

Note: The vector in $n$-space given by $(f(x_0), f'(x_0), \ldots, f^{(n-1)}(x_0))$ is called the initial-value vector of $f$ at $x_0$. Theorem 6.3 tells us that if we choose a point $x_0$ in $J$ and choose a vector in $n$-space, then the homogeneous equation $L(y) = 0$ has exactly one solution $y = f(x)$ on $J$ with this vector as initial-value vector at $x_0$.
For example, when $n = 2$ there is exactly one solution with prescribed value $f(x_0)$ and prescribed derivative $f'(x_0)$ at a prescribed point $x_0$.

The proof of the existence-uniqueness theorem will be obtained as a corollary of more general existence-uniqueness theorems discussed in Chapter 7. An alternate proof for the case of equations with constant coefficients is given in Section 7.9.

6.6 The dimension of the solution space of a homogeneous linear equation

THEOREM 6.4. DIMENSIONALITY THEOREM. Let $L: \mathcal{C}^n(J) \to \mathcal{C}(J)$ be a linear differential operator of order $n$ given by

(6.10)    $L = D^n + P_1 D^{n-1} + \cdots + P_n$.

Then the solution space of the equation $L(y) = 0$ has dimension $n$.

Proof. Let $V_n$ denote the n-dimensional linear space of n-tuples of scalars. Let $T$ be the linear transformation that maps each function $f$ in the solution space $N(L)$ onto the initial-value vector of $f$ at $x_0$,

$T(f) = (f(x_0), f'(x_0), \dots, f^{(n-1)}(x_0))$,

where $x_0$ is a fixed point in $J$. The uniqueness theorem tells us that $T(f) = O$ implies $f = 0$. Therefore, by Theorem 2.10, $T$ is one-to-one on $N(L)$. By the existence part of Theorem 6.3, every vector in $V_n$ is the initial-value vector of some solution, so $T$ maps $N(L)$ onto $V_n$. Hence $T^{-1}$ is also one-to-one and maps $V_n$ onto $N(L)$, and Theorem 2.11 shows that $\dim N(L) = \dim V_n = n$.

Now that we know that the solution space has dimension $n$, any set of $n$ independent solutions will serve as a basis. Therefore, as a corollary of the dimensionality theorem we have:

THEOREM 6.5. Let $L: \mathcal{C}^n(J) \to \mathcal{C}(J)$ be a linear differential operator of order $n$. If $u_1, \dots, u_n$ are $n$ independent solutions of the homogeneous differential equation $L(y) = 0$ on $J$, then every solution $y = f(x)$ on $J$ can be expressed in the form

(6.11)    $f(x) = \sum_{k=1}^{n} c_k u_k(x)$,

where $c_1, \dots, c_n$ are constants.

Note: Since all solutions of the differential equation $L(y) = 0$ are contained in formula (6.11), the linear combination on the right, with arbitrary constants $c_1, \dots, c_n$, is sometimes called the general solution of the differential equation.
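As a small numerical illustration of how a basis of solutions and an initial-value vector determine one solution, the following sketch uses the equation $y'' + y = 0$ with the basis $\cos x$, $\sin x$. The equation, the sample point, and the initial values are assumptions chosen for illustration; they do not come from the text. The constants are found by solving the $2 \times 2$ linear system that the initial conditions impose.

```python
import math

# Illustrative equation (an assumption, not from the text): y'' + y = 0 on the
# real line, with the two independent solutions cos x and sin x.
def u1(x): return math.cos(x)
def u2(x): return math.sin(x)
def du1(x): return -math.sin(x)
def du2(x): return math.cos(x)

def coefficients(x0, k0, k1):
    """Return (c1, c2) with f = c1*u1 + c2*u2, f(x0) = k0, f'(x0) = k1."""
    a, b = u1(x0), u2(x0)      # first row of the 2 x 2 system
    c, d = du1(x0), du2(x0)    # second row
    det = a * d - b * c        # equals cos^2 + sin^2 = 1 here, never zero
    c1 = (k0 * d - b * k1) / det   # Cramer's rule
    c2 = (a * k1 - c * k0) / det
    return c1, c2

x0, k0, k1 = 0.7, 2.0, -1.5    # hypothetical initial data
c1, c2 = coefficients(x0, k0, k1)

def f(x): return c1 * u1(x) + c2 * u2(x)
def df(x): return c1 * du1(x) + c2 * du2(x)

print(f(x0), df(x0))   # should reproduce k0 and k1 up to rounding
```

Because the system has exactly one solution for every choice of $(k_0, k_1)$, the map from solutions to initial-value vectors is one-to-one and onto in this example, which is the content of the dimensionality theorem for $n = 2$.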
The dimensionality theorem tells us that the solution space of a homogeneous linear differential equation of order $n$ always has a basis of $n$ solutions, but it does not tell us how to determine such a basis. In fact, no simple method is known for determining a basis of solutions for every linear equation. However, special methods have been devised for special equations. Among these are differential equations with constant coefficients, to which we turn now.

6.7 The algebra of constant-coefficient operators

A constant-coefficient operator $A$ is a linear operator of the form

(6.12)    $A = a_0 D^n + a_1 D^{n-1} + \cdots + a_{n-1} D + a_n$,

where $D$ is the derivative operator and $a_0, a_1, \dots, a_n$ are real constants. If $a_0 \neq 0$ the operator is said to have order $n$. The operator $A$ can be applied to any function $y$ with $n$ derivatives on some interval, the result being a function $A(y)$ given by

$A(y) = a_0 y^{(n)} + a_1 y^{(n-1)} + \cdots + a_{n-1} y' + a_n y$.

In this section, we restrict our attention to functions having derivatives of every order on $(-\infty, +\infty)$. The set of all such functions will be denoted by $\mathcal{C}^\infty$ and will be referred to as the class of infinitely differentiable functions. If $y \in \mathcal{C}^\infty$ then $A(y)$ is also in $\mathcal{C}^\infty$.

The usual algebraic operations on linear transformations (addition, multiplication by scalars, and composition or multiplication) can be applied, in particular, to constant-coefficient operators. Let $A$ and $B$ be two constant-coefficient operators (not necessarily of the same order). Since the sum $A + B$ and all scalar multiples $\lambda A$ are also constant-coefficient operators, the set of all constant-coefficient operators is a linear space. The product of $A$ and $B$ (in either order) is also a constant-coefficient operator.
Therefore, sums, products, and scalar multiples of constant-coefficient operators satisfy the usual commutative, associative, and distributive laws satisfied by all linear transformations. Also, since we have $D^r D^s = D^s D^r$ for all positive integers $r$ and $s$, any two constant-coefficient operators commute: $AB = BA$.

With each constant-coefficient operator $A$ we associate a polynomial $p_A$ called the characteristic polynomial of $A$. If $A$ is given by (6.12), $p_A$ is that polynomial which has the same coefficients as $A$. That is, for every real $r$ we have

$p_A(r) = a_0 r^n + a_1 r^{n-1} + \cdots + a_n$.

Conversely, given any real polynomial $p$, there is a corresponding operator $A$ whose coefficients are the same as those of $p$. The next theorem shows that this association between operators and polynomials is a one-to-one correspondence. Moreover, this correspondence associates with sums, products, and scalar multiples of operators the respective sums, products, and scalar multiples of their characteristic polynomials.

THEOREM 6.6. Let $A$ and $B$ denote constant-coefficient operators with characteristic polynomials $p_A$ and $p_B$, respectively, and let $\lambda$ be a real number. Then we have:
(a) $A = B$ if and only if $p_A = p_B$,
(b) $p_{A+B} = p_A + p_B$,
(c) $p_{AB} = p_A \cdot p_B$,
(d) $p_{\lambda A} = \lambda \cdot p_A$.

Proof. We consider part (a) first. Assume $p_A = p_B$. We wish to prove that $A(y) = B(y)$ for every $y$ in $\mathcal{C}^\infty$. Since $p_A = p_B$, both polynomials have the same degree and the same coefficients. Therefore $A$ and $B$ have the same order and the same coefficients, so $A(y) = B(y)$ for every $y$ in $\mathcal{C}^\infty$.

Next we prove that $A = B$ implies $p_A = p_B$. The relation $A = B$ means that $A(y) = B(y)$ for every $y$ in $\mathcal{C}^\infty$. Take $y = e^{rx}$, where $r$ is a constant. Since $y^{(k)} = r^k e^{rx}$ for every $k \geq 0$, we have

$A(y) = p_A(r)e^{rx}$ and $B(y) = p_B(r)e^{rx}$.

The equation $A(y) = B(y)$ implies $p_A(r) = p_B(r)$. Since $r$ is arbitrary we must have $p_A = p_B$. This completes the proof of part (a).

Parts (b), (c), and (d) follow at once from the definition of the characteristic polynomial.
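The correspondence between operators and polynomials can be tried out on a computer. The sketch below is only an illustration under stated assumptions: operators and test functions are both stored as coefficient lists (index $k$ holds the coefficient of $D^k$, respectively of $x^k$), the helper names are hypothetical, and the operators are applied to polynomials only, because their derivatives can be computed exactly. It checks, for one choice of $A$, $B$, and $y$, that $A(B(y)) = B(A(y))$ and that both agree with the operator whose characteristic polynomial is $p_A \cdot p_B$.

```python
def strip(p):
    """Drop trailing zero coefficients (keep at least one entry)."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def poly_mul(p, q):
    """Product of two coefficient lists (index k holds the kth-power coefficient)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def derivative(y):
    """Derivative of the polynomial y(x) = sum of y[k] * x**k."""
    return [k * y[k] for k in range(1, len(y))] or [0]

def apply_op(op, y):
    """Apply the operator sum of op[k] * D**k to the polynomial y."""
    result = [0] * len(y)
    dk = list(y)                      # dk holds the kth derivative of y
    for coeff in op:
        for i, t in enumerate(dk):
            result[i] += coeff * t
        dk = derivative(dk)
    return strip(result)

A = [6, -5, 1]          # A = D^2 - 5D + 6
B = [3, 1]              # B = D + 3
AB = poly_mul(A, B)     # coefficients of p_A * p_B
y = [0, 2, 0, 0, 1]     # y(x) = x^4 + 2x, an arbitrary test polynomial

print(AB)
print(apply_op(A, apply_op(B, y)))
print(apply_op(B, apply_op(A, y)))
print(apply_op(AB, y))
```

The three printed results of applying operators should coincide, in line with parts (a) and (c) of Theorem 6.6 and with the commutativity $AB = BA$.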
From Theorem 6.6 it follows that every algebraic relation involving sums, products, and scalar multiples of polynomials $p_A$ and $p_B$ also holds for the operators $A$ and $B$. In particular, if the characteristic polynomial $p_A$ can be factored as a product of two or more polynomials, each factor must be the characteristic polynomial of some constant-coefficient operator, so, by Theorem 6.6, there is a corresponding factorization of the operator $A$. For example, if $p_A(r) = p_B(r)p_C(r)$, then $A = BC$. If $p_A(r)$ can be factored as a product of $n$ linear factors, say

(6.13)    $p_A(r) = a_0 (r - r_1)(r - r_2) \cdots (r - r_n)$,

the corresponding factorization of $A$ takes the form

$A = a_0 (D - r_1)(D - r_2) \cdots (D - r_n)$.

The fundamental theorem of algebra tells us that every polynomial $p_A(r)$ of degree $n \geq 1$ has a factorization of the form (6.13), where $r_1, r_2, \dots, r_n$ are the roots of the equation

$p_A(r) = 0$,

called the characteristic equation of $A$. Each root is written as often as its multiplicity indicates. The roots may be real or complex. Since $p_A(r)$ has real coefficients, the complex roots occur in conjugate pairs, $\alpha + i\beta$, $\alpha - i\beta$, if $\beta \neq 0$. The two linear factors corresponding to each such pair can be combined to give one quadratic factor $r^2 - 2\alpha r + \alpha^2 + \beta^2$ whose coefficients are real. Therefore, every polynomial $p_A(r)$ can be factored as a product of linear and quadratic polynomials with real coefficients. This gives a corresponding factorization of the operator $A$ as a product of first-order and second-order constant-coefficient operators with real coefficients.

EXAMPLE 1. Let $A = D^2 - 5D + 6$. Since the characteristic polynomial $p_A(r)$ has the factorization

$r^2 - 5r + 6 = (r - 2)(r - 3)$,

the operator $A$ has the factorization

$D^2 - 5D + 6 = (D - 2)(D - 3)$.

EXAMPLE 2. Let $A = D^4 - 2D^3 + 2D^2 - 2D + 1$. The characteristic polynomial $p_A(r)$ has the factorization

$r^4 - 2r^3 + 2r^2 - 2r + 1 = (r - 1)(r - 1)(r^2 + 1)$,

so $A$ has the factorization $A = (D - 1)(D - 1)(D^2 + 1)$.
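The remark about conjugate pairs can be checked by multiplying out linear factors with complex arithmetic. The following sketch (helper names are hypothetical, coefficient lists are stored with the constant term first) rebuilds the polynomial of Example 2 from its roots $1, 1, i, -i$ and also forms the quadratic factor belonging to the illustrative pair $2 \pm i$; in both cases the imaginary parts of the coefficients cancel.

```python
def poly_mul(p, q):
    """Product of two coefficient lists (constant term first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def from_roots(roots, leading=1):
    """Coefficients of leading * (r - r_1)(r - r_2)...(r - r_n)."""
    p = [leading]
    for root in roots:
        p = poly_mul(p, [-root, 1])   # multiply by the linear factor (r - root)
    return p

# Roots of the characteristic polynomial in Example 2: 1, 1, i, -i.
coeffs = from_roots([1, 1, 1j, -1j])
real_coeffs = [c.real for c in coeffs]

# Quadratic factor for the conjugate pair alpha +/- i*beta with alpha = 2, beta = 1
# (an illustrative choice): r^2 - 2*alpha*r + alpha^2 + beta^2.
quadratic = from_roots([2 + 1j, 2 - 1j])

print(coeffs)
print(quadratic)
```

Read from the constant term upward, the first list should correspond to $r^4 - 2r^3 + 2r^2 - 2r + 1$ and the second to $r^2 - 4r + 5$.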
6.8 Determination of a basis of solutions for linear equations with constant coefficients by factorization of operators

The next theorem shows how factorization of constant-coefficient operators helps us to solve linear differential equations with constant coefficients.

THEOREM 6.7. Let $L$ be a constant-coefficient operator which can be factored as a product of constant-coefficient operators, say

$L = A_1 A_2 \cdots A_k$.

Then the solution space of the linear differential equation $L(y) = 0$ contains the solution space of each differential equation $A_i(y) = 0$. In other words,

(6.14)    $N(A_i) \subseteq N(L)$ for each $i = 1, 2, \dots, k$.

Proof. If $u$ is in the null space of the last factor $A_k$ we have $A_k(u) = 0$, so

$L(u) = (A_1 A_2 \cdots A_k)(u) = (A_1 \cdots A_{k-1})A_k(u) = (A_1 \cdots A_{k-1})(0) = 0$.

Therefore the null space of $L$ contains the null space of the last factor $A_k$. But since constant-coefficient operators commute, we can rearrange the factors so that any one of them is the last factor. This proves (6.14).

If $L(u) = 0$, the operator $L$ is said to annihilate $u$. Theorem 6.7 tells us that if a factor $A_i$ of $L$ annihilates $u$, then $L$ also annihilates $u$.

We illustrate how the theorem can be used to solve homogeneous differential equations with constant coefficients. We have chosen examples to illustrate different features, depending on the nature of the roots of the characteristic equation.

CASE I. Real distinct roots.

EXAMPLE 1. Find a basis of solutions for the differential equation

(6.15)    $(D^3 - 7D + 6)y = 0$.

Solution. This has the form $L(y) = 0$ with

$L = D^3 - 7D + 6 = (D - 1)(D - 2)(D + 3)$.

The null space of $D - 1$ contains $u_1(x) = e^x$, that of $D - 2$ contains $u_2(x) = e^{2x}$, and that of $D + 3$ contains $u_3(x) = e^{-3x}$. In Chapter 1 (p. 10) we proved that $u_1, u_2, u_3$ are independent. Since three independent solutions of a third-order equation form a basis for the solution space, the general solution of (6.15) is given by

$y = c_1 e^x + c_2 e^{2x} + c_3 e^{-3x}$.
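The result of Example 1 is easy to sanity-check numerically. The sketch below assumes nothing beyond the example itself, apart from the arbitrary sample constants and sample points: it confirms that $1$, $2$, $-3$ are zeros of $r^3 - 7r + 6$ and evaluates $y''' - 7y' + 6y$ for one combination $y = c_1 e^x + c_2 e^{2x} + c_3 e^{-3x}$, using the fact that the kth derivative of $e^{rx}$ is $r^k e^{rx}$.

```python
import math

ROOTS = (1, 2, -3)     # roots of the characteristic equation r^3 - 7r + 6 = 0

def p(r):
    """Characteristic polynomial of L = D^3 - 7D + 6."""
    return r**3 - 7 * r + 6

def y_deriv(order, x, c):
    """The derivative of the given order of y = sum of c_i * exp(r_i * x)."""
    return sum(ci * ri**order * math.exp(ri * x) for ci, ri in zip(c, ROOTS))

def L(x, c):
    """Evaluate (D^3 - 7D + 6) y at the point x."""
    return y_deriv(3, x, c) - 7 * y_deriv(1, x, c) + 6 * y_deriv(0, x, c)

c = (2.0, -1.0, 0.5)   # arbitrary sample constants
for x in (-1.0, 0.0, 0.3):
    print(x, L(x, c))  # each value should vanish up to rounding error
```

This is only a spot check at a few points, not a proof; the proof is the factorization argument of Theorem 6.7.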
The method used to solve Example 1 enables us to find a basis for the solution space of any constant-coefficient operator that can be factored into distinct linear factors.

THEOREM 6.8. Let $L$ be a constant-coefficient operator whose characteristic equation $p_L(r) = 0$ has $n$ distinct real roots $r_1, r_2, \dots, r_n$. Then the general solution of the differential equation $L(y) = 0$ on the interval $(-\infty, +\infty)$ is given by the formula

(6.16)    $y = \sum_{k=1}^{n} c_k e^{r_k x}$.

Proof. We have the factorization

$L = a_0 (D - r_1)(D - r_2) \cdots (D - r_n)$.

Since the null space of $(D - r_k)$ contains $u_k(x) = e^{r_k x}$, the null space of $L$ contains the $n$ functions

(6.17)    $u_1(x) = e^{r_1 x}, \quad u_2(x) = e^{r_2 x}, \quad \dots, \quad u_n(x) = e^{r_n x}$.

In Chapter 1 (p. 10) we proved that these functions are independent. Therefore they form a basis for the solution space of the equation $L(y) = 0$, so the general solution is given by (6.16).

CASE II. Real roots, some of which are repeated.

If all the roots are real but not distinct, the functions in (6.17) are not independent and therefore do not form a basis for the solution space. If a root $r$ occurs with multiplicity $m$, then $(D - r)^m$ is a factor of $L$. The next theorem shows how to obtain $m$ independent solutions in the null space of this factor.

THEOREM 6.9. The $m$ functions

$u_1(x) = e^{rx}, \quad u_2(x) = xe^{rx}, \quad \dots, \quad u_m(x) = x^{m-1}e^{rx}$

are $m$ independent elements annihilated by the operator $(D - r)^m$.

Proof. The independence of these functions follows from the independence of the polynomials $1, x, x^2, \dots, x^{m-1}$. To prove that $u_1, u_2, \dots, u_m$ are annihilated by $(D - r)^m$ we use induction on $m$.

If $m = 1$ there is only one function, $u_1(x) = e^{rx}$, which is clearly annihilated by $(D - r)$. Suppose, then, that the theorem is true for $m - 1$. This means that the functions $u_1, \dots, u_{m-1}$ are annihilated by $(D - r)^{m-1}$. Since

$(D - r)^m = (D - r)(D - r)^{m-1}$,

the functions $u_1, \dots, u_{m-1}$ are also annihilated by $(D - r)^m$. To complete the proof we must show that $(D - r)^m$ annihilates $u_m$.
Therefore we consider

$(D - r)^m u_m = (D - r)^{m-1}(D - r)(x^{m-1}e^{rx})$.

We have

$(D - r)(x^{m-1}e^{rx}) = D(x^{m-1}e^{rx}) - rx^{m-1}e^{rx} = (m - 1)x^{m-2}e^{rx} + x^{m-1}re^{rx} - rx^{m-1}e^{rx} = (m - 1)x^{m-2}e^{rx} = (m - 1)u_{m-1}(x)$.

When we apply $(D - r)^{m-1}$ to both members of this last equation we get 0 on the right since $(D - r)^{m-1}$ annihilates $u_{m-1}$. Hence $(D - r)^m u_m = 0$, so $u_m$ is annihilated by $(D - r)^m$. This completes the proof.

EXAMPLE 2. Find the general solution of the differential equation $L(y) = 0$, where $L = D^3 - D^2 - 8D + 12$.

Solution. The operator $L$ has the factorization

$L = (D - 2)^2(D + 3)$.

By Theorem 6.9, the two functions

$u_1(x) = e^{2x}, \quad u_2(x) = xe^{2x}$

are in the null space of $(D - 2)^2$. The function $u_3(x) = e^{-3x}$ is in the null space of $(D + 3)$. Since $u_1, u_2, u_3$ are independent (see Exercise 17 of Section 6.9) they form a basis for the null space of $L$, so the general solution of the differential equation is

$y = c_1 e^{2x} + c_2 xe^{2x} + c_3 e^{-3x}$.

Theorem 6.9 tells us how to find a basis of solutions for any nth order linear equation with constant coefficients whose characteristic equation has only real roots, some of which are repeated. If the distinct roots are $r_1, r_2, \dots, r_k$ and if they occur with respective multiplicities $m_1, m_2, \dots, m_k$, that part of the basis corresponding to $r_p$ is given by the $m_p$ functions

$u_{q,p}(x) = x^{q-1}e^{r_p x}$, where $q = 1, 2, \dots, m_p$.

As $p$ takes the values $1, 2, \dots, k$ we get $m_1 + \cdots + m_k$ functions altogether. In Exercise 17 of Section 6.9 we outline a proof showing that all these functions are independent. Since the sum of the multiplicities $m_1 + \cdots + m_k$ is equal to $n$, the order of the equation, the functions $u_{q,p}$ form a basis for the solution space of the equation.

EXAMPLE 3. Solve the equation $(D^6 + 2D^5 - 2D^3 - D^2)y = 0$.

Solution. We have $D^6 + 2D^5 - 2D^3 - D^2 = D^2(D - 1)(D + 1)^3$. The part of the basis corresponding to the factor $D^2$ is $u_1(x) = 1$, $u_2(x) = x$; the part corresponding to the factor $(D - 1)$ is $u_3(x) = e^x$; and the part corresponding to the factor $(D + 1)^3$ is $u_4(x) = e^{-x}$, $u_5(x) = xe^{-x}$, $u_6(x) = x^2e^{-x}$.
The six functions $u_1, \dots, u_6$ are independent, so the general solution of the equation is

$y = c_1 + c_2 x + c_3 e^x + (c_4 + c_5 x + c_6 x^2)e^{-x}$.

CASE III. Complex roots.

If complex exponentials are used, there is no need to distinguish between real and complex roots of the characteristic equation of the differential equation $L(y) = 0$. If real-valued solutions are desired, we factor the operator $L$ into linear and quadratic factors with real coefficients. Each pair of conjugate complex roots $\alpha + i\beta$, $\alpha - i\beta$ corresponds to a quadratic factor,

(6.18)    $D^2 - 2\alpha D + \alpha^2 + \beta^2$.

The null space of this second-order operator contains the two independent functions $u(x) = e^{\alpha x}\cos\beta x$ and $v(x) = e^{\alpha x}\sin\beta x$. If the pair of roots $\alpha \pm i\beta$ occurs with multiplicity $m$, the quadratic factor occurs to the mth power. The null space of the operator

$[D^2 - 2\alpha D + \alpha^2 + \beta^2]^m$

contains $2m$ independent functions,

$u_q(x) = x^{q-1}e^{\alpha x}\cos\beta x, \quad v_q(x) = x^{q-1}e^{\alpha x}\sin\beta x, \quad q = 1, 2, \dots, m$.

These facts can be easily proved by induction on $m$. (Proofs are outlined in Exercise 20 of Section 6.9.) The following examples illustrate some of the possibilities.

EXAMPLE 4. $y''' - 4y'' + 13y' = 0$. The characteristic equation, $r^3 - 4r^2 + 13r = 0$, has the roots $0, 2 \pm 3i$; the general solution is

$y = c_1 + e^{2x}(c_2\cos 3x + c_3\sin 3x)$.

EXAMPLE 5. $y''' - 2y'' + 4y' - 8y = 0$. The characteristic equation is

$r^3 - 2r^2 + 4r - 8 = (r - 2)(r^2 + 4) = 0$;

its roots are $2, 2i, -2i$, so the general solution of the differential equation is

$y = c_1 e^{2x} + c_2\cos 2x + c_3\sin 2x$.

EXAMPLE 6. $y^{(5)} - 9y^{(4)} + 34y''' - 66y'' + 65y' - 25y = 0$. The characteristic equation can be written as

$(r - 1)(r^2 - 4r + 5)^2 = 0$;

its roots are $1, 2 \pm i, 2 \pm i$, so the general solution of the differential equation is

$y = c_1 e^x + e^{2x}[(c_2 + c_3 x)\cos x + (c_4 + c_5 x)\sin x]$.

6.9 Exercises

Find the general solution of each of the differential equations in Exercises 1 through 12.

1. $y''' - 2y'' - 3y' = 0$.
2. $y''' - y' = 0$.
3. $y''' + 4y'' + 4y' = 0$.
4. $y''' - 3y'' + 3y' - y = 0$.
5. $y^{(4)} + 4y''' + 6y'' + 4y' + y = 0$.
6. $y^{(4)} - 16y = 0$.
7. $y^{(4)} + 16y = 0$.
8. $y''' - y = 0$.
9. $y^{(4)} + 4y''' + 8y'' + 8y' + 4y = 0$.
10. $y^{(4)} + 2y'' + y = 0$.
11. $y^{(6)} + 4y^{(4)} + 4y'' = 0$.
12. $y^{(6)} + 8y^{(4)} + 16y'' = 0$.

13. If $m$ is a positive constant, find that particular solution $y = f(x)$ of the differential equation

$y''' - my'' + m^2y' - m^3y = 0$

which satisfies the conditions $f(0) = f'(0) = 0$, $f''(0) = 1$.

14. A linear differential equation with constant coefficients has characteristic equation $f(r) = 0$. If all the roots of the characteristic equation are negative, prove that every solution of the differential equation approaches zero as $x \to +\infty$. What can you conclude about the behavior of all solutions on the interval $[0, +\infty)$ if all the roots of the characteristic equation are nonpositive?

15. In each case, find a linear differential equation with constant coefficients satisfied by all the given functions.
(a) $u_1(x) = e^x$, $u_2(x) = e^{-x}$, $u_3(x) = e^{2x}$, $u_4(x) = e^{-2x}$.
(b) $u_1(x) = e^{-2x}$, $u_2(x) = xe^{-2x}$, $u_3(x) = x^2e^{-2x}$.
(c) $u_1(x) = 1$, $u_2(x) = x$, $u_3(x) = e^x$, $u_4(x) = xe^x$.
(d) $u_1(x) = x$, $u_2(x) = e^x$, $u_3(x) = xe^x$.
(e) $u_1(x) = x^2$, $u_2(x) = e^x$, $u_3(x) = xe^x$.
(f) $u_1(x) = e^{-2x}\cos 3x$, $u_2(x) = e^{-2x}\sin 3x$, $u_3(x) = e^{-2x}$, $u_4(x) = xe^{-2x}$.
(g) $u_1(x) = \cosh x$, $u_2(x) = \sinh x$, $u_3(x) = x\cosh x$, $u_4(x) = x\sinh x$.
(h) $u_1(x) = \cosh x\,\sin x$, $u_2(x) = \sinh x\,\cos x$, $u_3(x) = x$.

16. Let $r_1, \dots, r_n$ be $n$ distinct real numbers, and let $Q_1, \dots, Q_n$ be $n$ polynomials, none of which is the zero polynomial. Prove that the $n$ functions

$u_1(x) = Q_1(x)e^{r_1 x}, \quad \dots, \quad u_n(x) = Q_n(x)e^{r_n x}$

are independent.
Outline of proof. Use induction on $n$. For $n = 1$ and $n = 2$ the result is easily verified. Assume the statement is true for $n = p$ and let $c_1, \dots, c_{p+1}$ be $p + 1$ real scalars such that

$\sum_{k=1}^{p+1} c_k Q_k(x)e^{r_k x} = 0$.

Multiply both sides by $e^{-r_{p+1}x}$ and differentiate the resulting equation. Then use the induction hypothesis to show that all the scalars $c_k$ are 0. An alternate proof can be given based on order of magnitude as $x \to +\infty$, as was done in Example 7 of Section 1.7 (p. 10).

17. Let $m_1, m_2, \dots, m_k$ be $k$ positive integers, let $r_1, r_2, \dots, r_k$ be $k$ distinct real numbers, and let $n = m_1 + \cdots + m_k$.
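For each pair of integers $p, q$ satisfying $1 \leq p \leq k$, $1 \leq q \leq m_p$, let

$u_{q,p}(x) = x^{q-1}e^{r_p x}$.

Prove that the $n$ functions $u_{q,p}$ so defined are independent. [Hint: Use Exercise 16.]

18. Let $L$ be a constant-coefficient linear differential operator of order $n$ with characteristic polynomial $p(r)$. Let $L'$ be the constant-coefficient operator whose characteristic polynomial is the derivative $p'(r)$. For example, if $L = 2D^2 - 3D + 1$ then $L' = 4D - 3$. More generally, define the mth derivative $L^{(m)}$ to be the operator whose characteristic polynomial is the mth derivative $p^{(m)}(r)$. (The operator $L^{(m)}$ should not be confused with the mth power $L^m$.)
(a) If $u$ has $n$ derivatives, prove that

$L(u) = \sum_{k=0}^{n} \frac{p^{(k)}(0)}{k!}u^{(k)}$.

(b) If $u$ has $n - m$ derivatives, prove that

$L^{(m)}(u) = \sum_{k=0}^{n-m} \frac{p^{(k+m)}(0)}{k!}u^{(k)}$ for $m = 0, 1, 2, \dots, n$,

where $L^{(0)} = L$.

19. Refer to the notation of Exercise 18. If $u$ and $v$ have $n$ derivatives, prove that

$L(uv) = \sum_{k=0}^{n} \frac{L^{(k)}(u)}{k!}v^{(k)}$.

[Hint: Use Exercise 18 along with Leibniz's formula for the kth derivative of a product:

$(uv)^{(k)} = \sum_{r=0}^{k} \binom{k}{r}u^{(k-r)}v^{(r)}$.]

20. (a) Let $p(t) = q(t)^m r(t)$, where $q$ and $r$ are polynomials and $m$ is a positive integer. Prove that $p'(t) = q(t)^{m-1}s(t)$, where $s$ is a polynomial.
(b) Let $L$ be a constant-coefficient operator which annihilates $u$, where $u$ is a given function of $x$. Let $M = L^m$, the mth power of $L$, where $m > 1$. Prove that each of the derivatives $M', M'', \dots, M^{(m-1)}$ also annihilates $u$.
(c) Use part (b) and Exercise 19 to prove that $M$ annihilates each of the functions $u, xu, \dots, x^{m-1}u$.
(d) Use part (c) to show that the operator $(D^2 - 2\alpha D + \alpha^2 + \beta^2)^m$ annihilates each of the functions $x^q e^{\alpha x}\sin\beta x$ and $x^q e^{\alpha x}\cos\beta x$ for $q = 1, 2, \dots, m - 1$.

21. Let $L$ be a constant-coefficient operator of order $n$ with characteristic polynomial $p(r)$. If $\alpha$ is constant and if $u$ has $n$ derivatives, prove that

$L(e^{\alpha x}u(x)) = e^{\alpha x}\sum_{k=0}^{n} \frac{p^{(k)}(\alpha)}{k!}u^{(k)}(x)$.

6.10 The relation between the homogeneous and nonhomogeneous equations

We return now to the general linear differential equation of order $n$ with coefficients that are not necessarily constant. The next theorem describes the relation between solutions of a homogeneous equation $L(y) = 0$ and those of a nonhomogeneous equation $L(y) = R(x)$.

THEOREM 6.10. Let $L: \mathcal{C}^n(J) \to \mathcal{C}(J)$ be a linear differential operator of order $n$. Let $u_1, \dots, u_n$ be $n$ independent solutions of the homogeneous equation $L(y) = 0$, and let $y_1$ be a particular solution of the nonhomogeneous equation $L(y) = R$, where $R \in \mathcal{C}(J)$. Then every solution $y = f(x)$ of the nonhomogeneous equation has the form

(6.19)    $f(x) = y_1(x) + \sum_{k=1}^{n} c_k u_k(x)$,

where $c_1, \dots, c_n$ are constants.

Proof. By linearity we have $L(f - y_1) = L(f) - L(y_1) = R - R = 0$.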
Therefore $f - y_1$ is in the solution space of the homogeneous equation $L(y) = 0$, so $f - y_1$ is a linear combination of $u_1, \dots, u_n$, say $f - y_1 = c_1 u_1 + \cdots + c_n u_n$. This proves (6.19).

Since all solutions of $L(y) = R$ are found in (6.19), the sum on the right of (6.19) (with arbitrary constants $c_1, c_2, \dots, c_n$) is called the general solution of the nonhomogeneous equation. Theorem 6.10 states that the general solution of the nonhomogeneous equation is obtained by adding to $y_1$ the general solution of the homogeneous equation.

Note: Theorem 6.10 has a simple geometric analogy which helps give an insight into its meaning. To determine all points on a plane we find a particular point on the plane and add to it all points on the parallel plane through the origin. To find all solutions of $L(y) = R$ we find a particular solution and add to it all solutions of the homogeneous equation $L(y) = 0$. The set of solutions of the nonhomogeneous equation is analogous to a plane through a particular point. The solution space of the homogeneous equation is analogous to a parallel plane through the origin.

To use Theorem 6.10 in practice we must solve two problems: (1) Find the general solution of the homogeneous equation $L(y) = 0$, and (2) find a particular solution of the nonhomogeneous equation $L(y) = R$. In the next section we show that we can always solve problem (2) if we can solve problem (1).

6.11 Determination of a particular solution of the nonhomogeneous equation. The method of variation of parameters

We turn now to the problem of determining one particular solution $y_1$ of the nonhomogeneous equation $L(y) = R$. We shall describe a method known as variation of parameters which tells us how to determine $y_1$ if we know $n$ independent solutions $u_1, \dots, u_n$ of the homogeneous equation $L(y) = 0$. The method provides a particular solution of the form

(6.20)    $y_1 = v_1 u_1 + \cdots + v_n u_n$,

where $v_1, \dots, v_n$ are functions that can be calculated in terms of $u_1, \dots, u_n$ and the right-hand member $R$. The method leads to a system of $n$ linear algebraic equations satisfied by the derivatives $v_1', \dots, v_n'$. This system can always be solved because it has a nonsingular coefficient matrix. Integration of the derivatives then gives the required functions $v_1, \dots, v_n$. The method was first used by Johann Bernoulli to solve linear equations of first order, and then by Lagrange in 1774 to solve linear equations of second order.

For the nth order case the details can be simplified by using vector and matrix notation. The right-hand member of (6.20) can be written as an inner product,

(6.21)    $y_1 = (v, u)$,

where $v$ and $u$ are n-dimensional vector functions given by

$v = (v_1, \dots, v_n), \quad u = (u_1, \dots, u_n)$.

We try to choose $v$ so that the inner product defining $y_1$ will satisfy the nonhomogeneous equation $L(y) = R$, given that $L(u) = O$, where $L(u) = (L(u_1), \dots, L(u_n))$.

We begin by calculating the first derivative of $y_1$. We find

(6.22)    $y_1' = (v, u') + (v', u)$.

We have $n$ functions $v_1, \dots, v_n$ to determine, so we should be able to put $n$ conditions on them. If we impose the condition that the second term on the right of (6.22) should vanish, the formula for $y_1'$ simplifies to

$y_1' = (v, u')$, provided that $(v', u) = 0$.

Differentiating the relation for $y_1'$ we find

$y_1'' = (v, u'') + (v', u')$.

If we can choose $v$ so that $(v', u') = 0$ then the formula for $y_1''$ also simplifies and we get

$y_1'' = (v, u'')$, provided that also $(v', u') = 0$.

If we continue in this manner for the first $n - 1$ derivatives of $y_1$ we find

$y_1^{(n-1)} = (v, u^{(n-1)})$, provided that also $(v', u^{(n-2)}) = 0$.

So far we have put $n - 1$ conditions on $v$. Differentiating once more we get

$y_1^{(n)} = (v, u^{(n)}) + (v', u^{(n-1)})$.

This time we impose the condition $(v', u^{(n-1)}) = R(x)$, and the last equation becomes

$y_1^{(n)} = (v, u^{(n)}) + R(x)$, provided that also $(v', u^{(n-1)}) = R(x)$.

Suppose, for the moment, that we can satisfy the $n$ conditions imposed on $v$. Let $L = D^n + P_1(x)D^{n-1} + \cdots + P_n(x)$.
When we apply $L$ to $y_1$ we find

$L(y_1) = y_1^{(n)} + P_1(x)y_1^{(n-1)} + \cdots + P_n(x)y_1$
$= \{(v, u^{(n)}) + R(x)\} + P_1(x)(v, u^{(n-1)}) + \cdots + P_n(x)(v, u)$
$= (v, L(u)) + R(x) = (v, O) + R(x) = R(x)$.

Thus $L(y_1) = R(x)$, so $y_1$ is a solution of the nonhomogeneous equation.

The method will succeed if we can satisfy the $n$ conditions we have imposed on $v$. These conditions state that $(v', u^{(k)}) = 0$ for $k = 0, 1, \dots, n - 2$, and that $(v', u^{(n-1)}) = R(x)$. We can write these $n$ equations as a single matrix equation,

(6.23)    $W(x)v'(x) = R(x)\begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}$,

where $v'(x)$ is regarded as an $n \times 1$ column matrix, and where $W$ is the $n \times n$ matrix function whose rows consist of the components of $u$ and its successive derivatives:

$W = \begin{bmatrix} u_1 & u_2 & \cdots & u_n \\ u_1' & u_2' & \cdots & u_n' \\ \vdots & \vdots & & \vdots \\ u_1^{(n-1)} & u_2^{(n-1)} & \cdots & u_n^{(n-1)} \end{bmatrix}$.

The matrix $W$ is called the Wronskian matrix of $u_1, \dots, u_n$, after J. M. H. Wronski (1778-1853).

In the next section we shall prove that the Wronskian matrix is nonsingular. Therefore we can multiply both sides of (6.23) by $W(x)^{-1}$ to obtain

$v'(x) = R(x)W(x)^{-1}\begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}$.

Choose two points $c$ and $x$ in the interval $J$ under consideration and integrate this vector equation over the interval from $c$ to $x$ to obtain

$v(x) = v(c) + \int_c^x R(t)W(t)^{-1}\begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} dt = v(c) + z(x)$,

where

$z(x) = \int_c^x R(t)W(t)^{-1}\begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} dt$.

The formula $y_1 = (u, v)$ for the particular solution now becomes

$y_1 = (u, v) = (u, v(c) + z) = (u, v(c)) + (u, z)$.

The first term $(u, v(c))$ satisfies the homogeneous equation since it is a linear combination of $u_1, \dots, u_n$. Therefore we can omit this term and use the second term $(u, z)$ as a particular solution of the nonhomogeneous equation. In other words, a particular solution of $L(y) = R$ is given by the inner product

$(u(x), z(x)) = \left( u(x), \int_c^x R(t)W(t)^{-1}\begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} dt \right)$.

Note that it is not necessary that the function $R$ be continuous on the interval $J$. All that is required is that $R$ be integrable on $[c, x]$.

We can summarize the results of this section by the following theorem.

THEOREM 6.11.
Let $u_1, \dots, u_n$ be $n$ independent solutions of the homogeneous nth order linear differential equation $L(y) = 0$ on an interval $J$. Then a particular solution $y_1$ of the nonhomogeneous equation $L(y) = R$ is given by the formula

$y_1(x) = \sum_{k=1}^{n} u_k(x)v_k(x)$,

where $v_1, \dots, v_n$ are the entries in the $n \times 1$ column matrix $v$ determined by the equation

(6.24)    $v(x) = \int_c^x R(t)W(t)^{-1}\begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} dt$.

In this formula, $W$ is the Wronskian matrix of $u_1, \dots, u_n$, and $c$ is any point in $J$.

Note: The definite integral in (6.24) can be replaced by any indefinite integral

$\int R(x)W(x)^{-1}\begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} dx$.

EXAMPLE 1. Find the general solution of the differential equation

$y'' - y = \frac{2}{1 + e^x}$

on the interval $(-\infty, +\infty)$.

Solution. The homogeneous equation, $(D^2 - 1)y = 0$, has the two independent solutions $u_1(x) = e^x$, $u_2(x) = e^{-x}$. The Wronskian matrix of $u_1$ and $u_2$ is

$W(x) = \begin{bmatrix} e^x & e^{-x} \\ e^x & -e^{-x} \end{bmatrix}$.

Since $\det W(x) = -2$, the matrix is nonsingular and its inverse is given by

$W(x)^{-1} = -\frac{1}{2}\begin{bmatrix} -e^{-x} & -e^{-x} \\ -e^x & e^x \end{bmatrix}$.

Therefore

$W(x)^{-1}\begin{bmatrix} 0 \\ 1 \end{bmatrix} = -\frac{1}{2}\begin{bmatrix} -e^{-x} \\ e^x \end{bmatrix}$

and we have

$R(x)W(x)^{-1}\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \frac{2}{1 + e^x}\cdot\left(-\frac{1}{2}\right)\begin{bmatrix} -e^{-x} \\ e^x \end{bmatrix} = \begin{bmatrix} \dfrac{e^{-x}}{1 + e^x} \\ \dfrac{-e^x}{1 + e^x} \end{bmatrix}$.

Integrating each component of the vector on the right we find

$v_1(x) = \int \frac{e^{-x}}{1 + e^x}\,dx = \int \left( e^{-x} - 1 + \frac{e^x}{1 + e^x} \right) dx = -e^{-x} - x + \log(1 + e^x)$

and

$v_2(x) = \int \frac{-e^x}{1 + e^x}\,dx = -\log(1 + e^x)$.

Therefore the general solution of the differential equation is

$y = c_1 u_1(x) + c_2 u_2(x) + v_1(x)u_1(x) + v_2(x)u_2(x) = c_1 e^x + c_2 e^{-x} - 1 - xe^x + (e^x - e^{-x})\log(1 + e^x)$.

6.12 Nonsingularity of the Wronskian matrix of n independent solutions of a homogeneous linear equation

In this section we prove that the Wronskian matrix $W$ of $n$ independent solutions $u_1, \dots, u_n$ of a homogeneous equation $L(y) = 0$ is nonsingular. We do this by proving that the determinant of $W$ is an exponential function which is never zero on the interval $J$ under consideration.

Let $w(x) = \det W(x)$ for each $x$ in $J$, and assume that the differential equation satisfied by $u_1, \dots, u_n$ has the form

(6.25)    $y^{(n)} + P_1(x)y^{(n-1)} + \cdots + P_n(x)y = 0$.

Then we have:

THEOREM 6.12. The Wronskian determinant satisfies the first-order differential equation

(6.26)    $w' + P_1(x)w = 0$

on $J$. Therefore, if $c \in J$ we have

(6.27)    $w(x) = w(c)\exp\left[ -\int_c^x P_1(t)\,dt \right]$    (Abel's formula).
Moreover, $w(x) \neq 0$ for all $x$ in $J$.

Proof. Let $u$ be the row-vector $u = (u_1, \dots, u_n)$. Since each component of $u$ satisfies the differential equation (6.25) the same is true of $u$. The rows of the Wronskian matrix $W$ are the vectors $u, u', \dots, u^{(n-1)}$. Hence we can write

$w = \det W = \det(u, u', \dots, u^{(n-1)})$.

The derivative of $w$ is the determinant of the matrix obtained by differentiating the last row of $W$ (see Exercise 8 of Section 3.17). That is,

$w' = \det(u, u', \dots, u^{(n-2)}, u^{(n)})$.

Multiplying the last row of $W$ by $P_1(x)$ we also have

$P_1(x)w = \det(u, u', \dots, u^{(n-2)}, P_1(x)u^{(n-1)})$.

Adding these last two equations we find

$w' + P_1(x)w = \det(u, u', \dots, u^{(n-2)}, u^{(n)} + P_1(x)u^{(n-1)})$.

But the rows of this last determinant are dependent since $u$ satisfies the differential equation (6.25), which expresses the last row $u^{(n)} + P_1(x)u^{(n-1)}$ as a linear combination of the rows $u, u', \dots, u^{(n-2)}$. Therefore the determinant is zero, which means that $w$ satisfies (6.26). Solving (6.26) we obtain Abel's formula (6.27).

Next we prove that $w(c) \neq 0$ for some $c$ in $J$. We do this by a contradiction argument. Suppose that $w(t) = 0$ for all $t$ in $J$. Choose a fixed $t$ in $J$, say $t = t_0$, and consider the linear system of algebraic equations

$W(t_0)X = O$,

where $X$ is a column vector. Since $\det W(t_0) = 0$, the matrix $W(t_0)$ is singular, so this system has a nonzero solution, say $X = (c_1, \dots, c_n) \neq (0, \dots, 0)$. Using the components of this nonzero vector, let $f$ be the linear combination

$f(t) = c_1 u_1(t) + \cdots + c_n u_n(t)$.

The function $f$ so defined satisfies $L(f) = 0$ on $J$ since it is a linear combination of $u_1, \dots, u_n$. The matrix equation $W(t_0)X = O$ implies that

$f(t_0) = f'(t_0) = \cdots = f^{(n-1)}(t_0) = 0$.

Therefore $f$ has the initial-value vector $O$ at $t = t_0$ so, by the uniqueness theorem, $f$ is the zero solution. Since $u_1, \dots, u_n$ are independent, this means $c_1 = \cdots = c_n = 0$, which is a contradiction. Therefore $w(t) \neq 0$ for some $t$ in $J$. Taking $c$ to be this $t$ in Abel's formula we see that $w(x) \neq 0$ for all $x$ in $J$. This completes the proof of Theorem 6.12.
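Abel's formula can be spot-checked on a concrete equation. The sketch below is an illustration under assumptions that are not in the text: it takes $y'' - 3y' + 2y = 0$, so that $P_1(x) = -3$, with the independent solutions $e^x$ and $e^{2x}$, computes the Wronskian determinant directly, and compares it with the value predicted by (6.27) from the base point $c = 0.5$.

```python
import math

# Illustrative equation (an assumption): y'' - 3y' + 2y = 0, so P1(x) = -3,
# with independent solutions u1 = e^x and u2 = e^(2x).
P1 = -3.0

def wronskian_det(x):
    """Determinant of the 2 x 2 Wronskian matrix [[u1, u2], [u1', u2']]."""
    u1, u2 = math.exp(x), math.exp(2 * x)
    du1, du2 = math.exp(x), 2 * math.exp(2 * x)
    return u1 * du2 - u2 * du1

def abel(x, c):
    """w(c) * exp(-integral of P1 from c to x); P1 is constant here."""
    return wronskian_det(c) * math.exp(-P1 * (x - c))

c = 0.5
for x in (-1.0, 0.0, 2.0):
    print(x, wronskian_det(x), abel(x, c))   # the two columns should agree
```

In this example the determinant works out to $e^{3x}$, an exponential that never vanishes, which matches the conclusion of Theorem 6.12.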
6.13 Special methods for determining a particular solution of the nonhomogeneous equation. Reduction to a system of first-order linear equations

Although variation of parameters provides a general method for determining a particular solution of $L(y) = R$, special methods are available that are often easier to apply when the equation has certain special forms. For example, if the equation has constant coefficients we can reduce the problem to that of solving a succession of linear equations of first order. The general method is best illustrated with a simple example.

EXAMPLE 1. Find a particular solution of the equation

(6.28)    $(D - 1)(D - 2)y = xe^{x + x^2}$.

Solution. Let $u = (D - 2)y$. Then the equation becomes

$(D - 1)u = xe^{x + x^2}$.

This is a first-order linear equation in $u$ which can be solved using Theorem 6.1. A particular solution is

$u = \tfrac{1}{2}e^{x + x^2}$.

Substituting this in the equation $u = (D - 2)y$ we obtain

$(D - 2)y = \tfrac{1}{2}e^{x + x^2}$,

a first-order linear equation for $y$. Solving this by Theorem 6.1 we find that a particular solution (with $y_1(0) = 0$) is given by

$y_1(x) = \tfrac{1}{2}e^{2x}\int_0^x e^{t^2 - t}\,dt$.

Although the integral cannot be evaluated in terms of elementary functions we consider the equation as having been solved, since the solution is expressed in terms of integrals of familiar functions. The general solution of (6.28) is

$y = c_1 e^x + c_2 e^{2x} + \tfrac{1}{2}e^{2x}\int_0^x e^{t^2 - t}\,dt$.

6.14 The annihilator method for determining a particular solution of the nonhomogeneous equation

We describe next a method which can be used if the equation $L(y) = R$ has constant coefficients and if the right-hand member $R$ is itself annihilated by a constant-coefficient operator, say $A(R) = 0$. In principle, the method is very simple.
We apply the operator $A$ to both members of the differential equation $L(y) = R$ and obtain a new equation $AL(y) = 0$ which must be satisfied by all solutions of the original equation. Since $AL$ is another constant-coefficient operator we can determine its null space by calculating the roots of the characteristic equation of $AL$. Then the problem remains of choosing from this null space a particular function $y_1$ that satisfies $L(y_1) = R$. The following examples illustrate the process.

EXAMPLE 1. Find a particular solution of the equation

$(D^4 - 16)y = x^4 + x + 1$.

Solution. The right-hand member, a polynomial of degree 4, is annihilated by the operator $D^5$. Therefore any solution of the given equation is also a solution of the equation

(6.29)    $D^5(D^4 - 16)y = 0$.

The roots of the characteristic equation are $0, 0, 0, 0, 0, 2, -2, 2i, -2i$, so all the solutions of (6.29) are to be found in the linear combination

$y = c_1 + c_2 x + c_3 x^2 + c_4 x^3 + c_5 x^4 + c_6 e^{2x} + c_7 e^{-2x} + c_8\cos 2x + c_9\sin 2x$.

We want to choose the $c_i$ so that $L(y) = x^4 + x + 1$, where $L = D^4 - 16$. Since the last four terms are annihilated by $L$, we can take $c_6 = c_7 = c_8 = c_9 = 0$ and try to find $c_1, \dots, c_5$ so that

$L(c_1 + c_2 x + c_3 x^2 + c_4 x^3 + c_5 x^4) = x^4 + x + 1$.

In other words, we seek a particular solution $y_1$ which is a polynomial of degree 4 satisfying $L(y_1) = x^4 + x + 1$. To simplify the algebra we write

$16y_1 = ax^4 + bx^3 + cx^2 + dx + e$.

This gives us $16y_1^{(4)} = 24a$, so $y_1^{(4)} = 3a/2$. Substituting in the differential equation $L(y_1) = x^4 + x + 1$, we must determine $a, b, c, d, e$ to satisfy

$\tfrac{3}{2}a - ax^4 - bx^3 - cx^2 - dx - e = x^4 + x + 1$.

Equating coefficients of like powers of $x$ we obtain

$a = -1, \quad b = c = 0, \quad d = -1, \quad e = -\tfrac{5}{2}$,

so the particular solution $y_1$ is given by

$y_1 = -\tfrac{1}{16}x^4 - \tfrac{1}{16}x - \tfrac{5}{32}$.

EXAMPLE 2. Solve the differential equation $y'' - 5y' + 6y = xe^x$.

Solution. The differential equation has the form

(6.30)    $L(y) = R$,

where $R(x) = xe^x$ and $L = D^2 - 5D + 6$.
The corresponding homogeneous equation can be written as

$$(D - 2)(D - 3)y = 0;$$

it has the independent solutions $u_1(x) = e^{2x}$, $u_2(x) = e^{3x}$. Now we seek a particular solution $y_1$ of the nonhomogeneous equation. We recognize the function $R(x) = xe^x$ as a solution of the homogeneous equation

$$(D - 1)^2y = 0.$$

Therefore, if we operate on both sides of (6.30) with the operator $(D - 1)^2$ we find that any function which satisfies (6.30) must also satisfy the equation

$$(D - 1)^2(D - 2)(D - 3)y = 0.$$

This differential equation has the characteristic roots 1, 1, 2, 3, so all its solutions are to be found in the linear combination

$$y = ae^x + bxe^x + ce^{2x} + de^{3x},$$

where $a, b, c, d$ are constants. We want to choose $a, b, c, d$ so that the resulting solution $y_1$ satisfies $L(y_1) = xe^x$. Since $L(ce^{2x} + de^{3x}) = 0$ for every choice of $c$ and $d$, we need only choose $a$ and $b$ so that $L(ae^x + bxe^x) = xe^x$ and take $c = d = 0$. If we put

$$y_1 = ae^x + bxe^x,$$

we have

$$D(y_1) = (a + b)e^x + bxe^x, \qquad D^2(y_1) = (a + 2b)e^x + bxe^x,$$

so the equation $(D^2 - 5D + 6)y_1 = xe^x$ becomes

$$(2a - 3b)e^x + 2bxe^x = xe^x.$$

Canceling $e^x$ and equating coefficients of like powers of $x$ we find $a = \tfrac{3}{4}$, $b = \tfrac{1}{2}$. Therefore $y_1 = \tfrac{3}{4}e^x + \tfrac{1}{2}xe^x$ and the general solution of $L(y) = R$ is given by the formula

$$y = c_1e^{2x} + c_2e^{3x} + \tfrac{3}{4}e^x + \tfrac{1}{2}xe^x.$$

The method used in the foregoing examples is called the annihilator method. It will always work if we can find a constant-coefficient operator $A$ that annihilates $R$. From our knowledge of homogeneous linear differential equations with constant coefficients, we know that the only real-valued functions annihilated by constant-coefficient operators are linear combinations of terms of the form

$$x^{m-1}e^{\alpha x}, \qquad x^{m-1}e^{\alpha x}\cos\beta x, \qquad x^{m-1}e^{\alpha x}\sin\beta x,$$

where $m$ is a positive integer and $\alpha$ and $\beta$ are real constants. The function $y = x^{m-1}e^{\alpha x}$ is a solution of a differential equation with a characteristic root $\alpha$ having multiplicity $m$. Therefore, this function has the annihilator $(D - \alpha)^m$.
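As a quick mechanical check of the last statement and of Example 2, the following sketch represents a function $p(x)e^{rx}$ by the coefficient list of the polynomial $p$ and applies first-order factors $D - c$ exactly, using $(D - c)\bigl(p\,e^{rx}\bigr) = \bigl(p' + (r - c)p\bigr)e^{rx}$. The helper names `poly_derivative` and `apply_D_minus` are ours, introduced only for this illustration; they are not part of any library.

```python
from fractions import Fraction

def poly_derivative(p):
    """Derivative of a polynomial given by its coefficients [c0, c1, c2, ...]."""
    return [k * p[k] for k in range(1, len(p))] or [Fraction(0)]

def apply_D_minus(c, p, r):
    """Apply (D - c) to p(x) * e^(r x).

    The result has the form q(x) * e^(r x) with q = p' + (r - c) p;
    the coefficient list of q is returned (same length as p).
    """
    dp = poly_derivative(p)
    dp = dp + [Fraction(0)] * (len(p) - len(dp))  # pad p' to the length of p
    return [dp[k] + (r - c) * p[k] for k in range(len(p))]

# (D - alpha)^m annihilates x^(m-1) e^(alpha x): here alpha = 2, m = 3.
alpha, m = 2, 3
q = [Fraction(0)] * (m - 1) + [Fraction(1)]       # the polynomial x^(m-1)
after_each_step = []
for _ in range(m):
    q = apply_D_minus(alpha, q, alpha)
    after_each_step.append(q)
annihilated = all(coeff == 0 for coeff in after_each_step[-1])

# Example 2: y1 = (3/4 + x/2) e^x should satisfy (D - 2)(D - 3) y1 = x e^x.
y1 = [Fraction(3, 4), Fraction(1, 2)]
lhs = apply_D_minus(2, apply_D_minus(3, y1, 1), 1)  # coefficients of L(y1) / e^x
print(annihilated, lhs)
```

Only after the $m$th application does the coefficient list vanish, which is one way to see why the multiplicity $m$ is needed in the annihilator.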
Each of the functions $y = x^{m-1}e^{\alpha x}\cos\beta x$ and $y = x^{m-1}e^{\alpha x}\sin\beta x$ is a solution of a differential equation with complex characteristic roots $\alpha \pm i\beta$, each occurring with multiplicity $m$, so they are annihilated by the operator $[D^2 - 2\alpha D + (\alpha^2 + \beta^2)]^m$. For ease of reference, we list these annihilators in Table 6.1, along with some of their special cases.

TABLE 6.1

Function                                                                          Annihilator
$y = x^{m-1}$                                                                     $D^m$
$y = e^{\alpha x}$                                                                $D - \alpha$
$y = x^{m-1}e^{\alpha x}$                                                         $(D - \alpha)^m$
$y = \cos\beta x$ or $y = \sin\beta x$                                            $D^2 + \beta^2$
$y = x^{m-1}\cos\beta x$ or $y = x^{m-1}\sin\beta x$                              $(D^2 + \beta^2)^m$
$y = e^{\alpha x}\cos\beta x$ or $y = e^{\alpha x}\sin\beta x$                    $D^2 - 2\alpha D + (\alpha^2 + \beta^2)$
$y = x^{m-1}e^{\alpha x}\cos\beta x$ or $y = x^{m-1}e^{\alpha x}\sin\beta x$      $[D^2 - 2\alpha D + (\alpha^2 + \beta^2)]^m$

Although the annihilator method is very efficient when applicable, it is limited to equations whose right members $R$ have a constant-coefficient annihilator. If $R(x)$ has the form $e^{x^2}$, $\log x$, or $\tan x$, the method will not work; we must then use variation of parameters or some other method to find a particular solution.

6.15 Exercises

In each of Exercises 1 through 10, find the general solution on the interval $(-\infty, +\infty)$.

1. $y'' - y' = x^2$.
2. $y'' - 4y = e^{2x}$.
3. $y'' + 2y' = 3xe^x$.
4. $y'' + 4y = \sin x$.
5. $y'' - 2y' + y = e^x + e^{2x}$.
6. $y''' - y' = e^x$.
7. $y''' - y' = e^x + e^{-x}$.
8. $y''' + 3y'' + 3y' + y = xe^{-x}$.
9. $y'' + y = xe^x\sin 2x$.
10. $y^{(4)} - y = x^2e^{-x}$.

11. If a constant-coefficient operator $A$ annihilates $f$ and if a constant-coefficient operator $B$ annihilates $g$, show that the product $AB$ annihilates $f + g$.

12. Let $A$ be a constant-coefficient operator with characteristic polynomial $p_A$.
(a) Use the annihilator method to prove that the differential equation $A(y) = e^{\alpha x}$ has a particular solution of the form
$$y_1 = \frac{e^{\alpha x}}{p_A(\alpha)}$$
if $\alpha$ is not a zero of the polynomial $p_A$.
(b) If $\alpha$ is a simple zero of $p_A$ (multiplicity 1), prove that the equation $A(y) = e^{\alpha x}$ has the particular solution
$$y_1 = \frac{xe^{\alpha x}}{p_A'(\alpha)}.$$
(c) Generalize the results of (a) and (b) when $\alpha$ is a zero of $p_A$ with multiplicity $m$.

13. Given two constant-coefficient operators $A$ and $B$ whose characteristic polynomials have no zeros in common. Let $C = AB$.
(@) Prove that every solution of the differential equation C{y) = 0 has the form y= yy + Jay where A(j,) = 0 and B(y) = 0. (b) Prove that the functions y, and y, in part (a) are uniquely determined, That is, for a given y satisfying C(y) = 0 there is only one pair yy. 2 with the properties in part (a). 14. If L(y) = y" + ay’ + by, where @ and 6 are constants, let _f be that particular solution of LQ) = 0 satisfying the conditions (0) = 0 and FO) = 1. Show that a particular solution of L(y) = R is given by the formula si) = [7 flr - OR] de for any choice of c. In particular, if the roots of the characteristic equation are equal, say ry = r= m_, show that the formula for y.(x) becomes ios = emt [Fe = perme 15. Let Q be the operator “multiplication by x.” That is, Q(y)%) = x . y(x) for each y in class 6% and each real x. Let Z denote the identity operator, defined by Z(y) = y for each y in 6”. (a) Prove that DQ ~ QD = J. @) Show’that D8Q ~ QD* is a constant-coefficient operator of first order, and determine this ‘operator explicitly as a linear polynomial in . (©) Show that D®°Q ~ QD* is a constant-coefficient operator of second order, and determine this operator explicitly as a quadratic polynomial in D. (@) Guess the generalization suggested for the operator D"Q — QD", and prove your result by induction. In each of Exercises 16 through 20, find the general solution of the differential equation in the given interval 16. y" -y Ux, 0, 40). 18, y” ~y = sectx — seox, -2.2), 19. y= dy’ +y = eM(eZ = 1)", (— 0, +2). 20. y" = Ty" + l4y’ — By = log x, ©, +0). 6.16 Miscellaneous exercises on linear differential equations 1. An integral curve y = u(x) of the differential equation y" = 3p" = 4y = 0 intersects an integral curve y = v(x) of the differential equation y"+ 4y’ = Sy = 0 at the origin. Determine the functions w and v if the two curves have equal slopes at the origin and if 2. 
An integral curve y = u(x) of the differential equation y" ~ 4y' + 29 = 0 intersects an integral curve y = v(x) of the differential equation y” + 4y’ + 13y = 0 at the origin. The two curves have equal slopes at the origin, Determine w and v if a’ (x/2 168 Linear differential equations 3. Given that the differential equation y" + 4xy’ + QG)y = 0 has two solutions of the form Da =uCx) and yy = xu(x), where u(Q) = 1. Determine both u(x) and Qc) explicitly in terms of x. 4. Let LQ) = y"+ Pry" + Pay . To solve the nonhomogeneous equation L(y) = ® by variation ‘of parameters, we need to know two linearly independent solutions of the homogeneous equation, This exercise shows that if one solution u, of L(y) = 0 is known, and if u, is never zero on an interval J, a second solution 1, of the homogeneous equation is given by the formula QF uC f 7 i oo where Q(x) = e~!P:(2d2, and ¢ is any point in J These two solutions are independent on of (@) Prove that the function w, does, indeed, satisfy L(y) = (b) Prove that 1 and uy are independent on J. 5. Find the general solution of the equation ay" = 20+ Dy + (e+ Dy = xe for x > 0, given that the homogeneous equation has a solution of the form y = e"* . 6. Obtain one nonzero solution by inspection and then find the general solution of the differential equation QO" = 4y’) + 32G)" = 4y) = 0. 7. Find the general solution of the differential equation arty” + 4x) given that there is a particular solution of the form y = x" for x > 0. 8, Find a solution of the homogeneous equation by trial, and then find the general solution of the equation x(L = ayy" = (1 = ay’ + G8 — 3x + Dy = — 9, 9, Find the general solution of the equation (Ox = 3xdy" + ay" + Oxy = given that it has a solution that is a polynomial in x, 10. Find the general solution of the equation C1 = 0p" + 2x2 = xy'+ 20+ Dye, given that the homogeneous equation has a solution of the form y = x¢ 11. Let g(x) = Jf el/t de if x > 0. (Do not attempt to evaluate this integral.) 
Find all values of the constant $a$ such that the function $f$ defined by

$$f(x) = \frac{1}{x}e^{ag(x)}$$

satisfies the linear differential equation

$$x^2y'' + (3x - x^2)y' + (1 - x - e^{2x})y = 0.$$

Use this information to determine the general solution of the equation on the interval $(0, +\infty)$.

6.17 Linear equations of second order with analytic coefficients

A function $f$ is said to be analytic on an interval $(x_0 - r, x_0 + r)$ if $f$ has a power-series expansion in this interval, convergent for $|x - x_0| < r$. If the coefficients of a homogeneous linear differential equation

$$y^{(n)} + P_1(x)y^{(n-1)} + \cdots + P_n(x)y = 0$$

are analytic in an interval $(x_0 - r, x_0 + r)$, then it can be shown that there exist $n$ independent solutions $u_1, \ldots, u_n$, each of which is analytic on the same interval. We shall prove this theorem for equations of second order and then discuss an important example that occurs in many applications.

THEOREM 6.13. Let $P_1$ and $P_2$ be analytic on an open interval $(x_0 - r, x_0 + r)$, say

$$P_1(x) = \sum_{n=0}^{\infty} b_n(x - x_0)^n, \qquad P_2(x) = \sum_{n=0}^{\infty} c_n(x - x_0)^n.$$

Then the differential equation

$$y'' + P_1(x)y' + P_2(x)y = 0 \tag{6.31}$$

has two independent solutions $u_1$ and $u_2$ which are analytic on the same interval.

Proof. We try to find a power-series solution of the form

$$y = \sum_{n=0}^{\infty} a_n(x - x_0)^n, \tag{6.32}$$

convergent in the given interval. To do this, we substitute the given series for $P_1$ and $P_2$ in the differential equation and then determine relations which the coefficients $a_n$ must satisfy so that the function $y$ given by (6.32) will satisfy the equation.

The derivatives $y'$ and $y''$ can be obtained by differentiating the power series for $y$ term by term (see Theorem 11.9 in Volume I). This gives us

$$y' = \sum_{n=1}^{\infty} na_n(x - x_0)^{n-1} = \sum_{n=0}^{\infty} (n+1)a_{n+1}(x - x_0)^n,$$

$$y'' = \sum_{n=2}^{\infty} n(n-1)a_n(x - x_0)^{n-2} = \sum_{n=0}^{\infty} (n+2)(n+1)a_{n+2}(x - x_0)^n.$$

The products $P_1(x)y'$ and $P_2(x)y$ are given by the power series†

$$P_1(x)y' = \sum_{n=0}^{\infty}\Bigl(\sum_{k=0}^{n}(k+1)a_{k+1}b_{n-k}\Bigr)(x - x_0)^n$$

and

$$P_2(x)y = \sum_{n=0}^{\infty}\Bigl(\sum_{k=0}^{n}a_kc_{n-k}\Bigr)(x - x_0)^n.$$
When these series are substituted in the differential equation (6.31) we find

$$\sum_{n=0}^{\infty}\Bigl\{(n+2)(n+1)a_{n+2} + \sum_{k=0}^{n}\bigl[(k+1)a_{k+1}b_{n-k} + a_kc_{n-k}\bigr]\Bigr\}(x - x_0)^n = 0.$$

Therefore the differential equation will be satisfied if we choose the coefficients $a_n$ so that they satisfy the recursion formula

$$(n+2)(n+1)a_{n+2} = -\sum_{k=0}^{n}\bigl[(k+1)a_{k+1}b_{n-k} + a_kc_{n-k}\bigr] \tag{6.33}$$

for $n = 0, 1, 2, \ldots$. This formula expresses $a_{n+2}$ in terms of the earlier coefficients $a_0, a_1, \ldots, a_{n+1}$ and the coefficients of the given functions $P_1$ and $P_2$. We choose arbitrary values of the first two coefficients $a_0$ and $a_1$ and use the recursion formula to define the remaining coefficients $a_2, a_3, \ldots$, in terms of $a_0$ and $a_1$. This guarantees that the power series in (6.32) will satisfy the differential equation (6.31). The next step in the proof is to show that the series so defined actually converges for every $x$ in the interval $(x_0 - r, x_0 + r)$. This is done by dominating the series in (6.32) by another power series known to converge. Finally, we show that we can choose $a_0$ and $a_1$ to obtain two independent solutions.

We prove now that the series (6.32) whose coefficients are defined by (6.33) converges in the required interval. Choose a fixed point $x_1 \ne x_0$ in the interval $(x_0 - r, x_0 + r)$ and let $t = |x_1 - x_0|$. Since the series for $P_1$ and $P_2$ converge absolutely for $x = x_1$ the terms of these series are bounded, say

$$|b_k|\,t^k \le M_1 \quad\text{and}\quad |c_k|\,t^k \le M_2,$$

for some $M_1 > 0$, $M_2 > 0$. Let $M$ be the larger of $M_1$ and $tM_2$. Then we have

$$|b_k| \le \frac{M}{t^k} \quad\text{and}\quad |c_k| \le \frac{M}{t^{k+1}}.$$

The recursion formula (6.33) then implies the inequality

$$\begin{aligned}
(n+2)(n+1)|a_{n+2}| &\le \sum_{k=0}^{n}\Bigl\{(k+1)|a_{k+1}|\frac{M}{t^{n-k}} + |a_k|\frac{M}{t^{n-k+1}}\Bigr\} \\
&= \frac{M}{t^{n+1}}\Bigl\{\sum_{k=0}^{n}(k+1)|a_{k+1}|t^{k+1} + \sum_{k=0}^{n}|a_{k+1}|t^{k+1} + |a_0| - |a_{n+1}|t^{n+1}\Bigr\} \\
&\le \frac{M}{t^{n+1}}\Bigl\{\sum_{k=0}^{n}(k+2)|a_{k+1}|t^{k+1} + |a_0|\Bigr\} = \frac{M}{t^{n+1}}\sum_{k=0}^{n+1}(k+1)|a_k|t^k.
\end{aligned}$$

† Those readers not familiar with multiplication of power series may consult Exercise 7 of Section 6.21.

Now let $A_0 = |a_0|$, $A_1 = |a_1|$, and define $A_2, A_3, \ldots$ successively by the recursion formula

$$(n+2)(n+1)A_{n+2} = \frac{M}{t^{n+1}}\sum_{k=0}^{n+1}(k+1)A_kt^k \tag{6.34}$$

for $n \ge 0$. Then $|a_n| \le A_n$ for all $n \ge 0$, so the series $\sum a_n(x - x_0)^n$ is dominated by the series $\sum A_n|x - x_0|^n$. Now we use the ratio test to show that $\sum A_n|x - x_0|^n$ converges if $|x - x_0| < t$.
Replacing $n$ by $n - 1$ in (6.34) and subtracting $t^{-1}$ times the resulting equation from (6.34) we find that

$$(n+2)(n+1)A_{n+2} - t^{-1}(n+1)nA_{n+1} = M(n+2)A_{n+1}.$$

Therefore

$$A_{n+2} = A_{n+1}\,\frac{(n+1)n + (n+2)Mt}{(n+2)(n+1)t},$$

and we find

$$\frac{A_{n+2}|x - x_0|^{n+2}}{A_{n+1}|x - x_0|^{n+1}} = \frac{(n+1)n + (n+2)Mt}{(n+2)(n+1)t}\,|x - x_0| \to \frac{|x - x_0|}{t}$$

as $n \to \infty$. This limit is less than 1 if $|x - x_0| < t$. Hence $\sum a_n(x - x_0)^n$ converges if $|x - x_0| < t$. But since $t = |x_1 - x_0|$ and since $x_1$ was an arbitrary point in the interval $(x_0 - r, x_0 + r)$, the series $\sum a_n(x - x_0)^n$ converges for all $x$ in $(x_0 - r, x_0 + r)$.

The first two coefficients $a_0$ and $a_1$ represent the initial values of $y$ and its derivative at the point $x_0$. If we let $u_1$ be the power-series solution with $a_0 = 1$ and $a_1 = 0$, so that

$$u_1(x_0) = 1 \quad\text{and}\quad u_1'(x_0) = 0,$$

and let $u_2$ be the solution with $a_0 = 0$ and $a_1 = 1$, so that

$$u_2(x_0) = 0 \quad\text{and}\quad u_2'(x_0) = 1,$$

then the solutions $u_1$ and $u_2$ will be independent. This completes the proof.

6.18 The Legendre equation

In this section we find power-series solutions for the Legendre equation,

$$(1 - x^2)y'' - 2xy' + \alpha(\alpha + 1)y = 0, \tag{6.35}$$

where $\alpha$ is any real constant. This equation occurs in problems of attraction and in heat-flow problems with spherical symmetry. When $\alpha$ is a positive integer we shall find that the equation has polynomial solutions called Legendre polynomials. These are the same polynomials we encountered earlier in connection with the Gram-Schmidt process (Chapter 1, page 26).

The Legendre equation can be written as

$$[(x^2 - 1)y']' = \alpha(\alpha + 1)y,$$

which has the form

$$T(y) = \lambda y,$$

where $T$ is a Sturm-Liouville operator, $T(f) = (pf')'$, with $p(x) = x^2 - 1$ and $\lambda = \alpha(\alpha + 1)$. Therefore the nonzero solutions of the Legendre equation are eigenfunctions of $T$ belonging to the eigenvalue $\alpha(\alpha + 1)$. Since $p(x)$ satisfies the boundary conditions

$$p(1) = p(-1) = 0,$$

the operator $T$ is symmetric with respect to the inner product

$$(f, g) = \int_{-1}^{1} f(x)g(x)\,dx.$$

The general theory of symmetric operators tells us that eigenfunctions belonging to distinct eigenvalues are orthogonal (Theorem 5.3).
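In the differential equation treated in Theorem 6.13 the coefficient of $y''$ is 1. The Legendre equation can be put in this form if we divide through by $1 - x^2$. From (6.35) we obtain

$$y'' + P_1(x)y' + P_2(x)y = 0,$$

where

$$P_1(x) = -\frac{2x}{1 - x^2} \quad\text{and}\quad P_2(x) = \frac{\alpha(\alpha + 1)}{1 - x^2},$$

if $x^2 \ne 1$. Since $1/(1 - x^2) = \sum_{n=0}^{\infty} x^{2n}$ for $|x| < 1$, both $P_1$ and $P_2$ have power-series expansions in the open interval $(-1, 1)$ so Theorem 6.13 is applicable. To find the recursion formula for the coefficients it is simpler to leave the equation in the form (6.35) and try to find a power-series solution of the form

$$y = \sum_{n=0}^{\infty} a_nx^n$$

valid in the open interval $(-1, 1)$. Differentiating this series term by term we obtain

$$y' = \sum_{n=1}^{\infty} na_nx^{n-1} \quad\text{and}\quad y'' = \sum_{n=2}^{\infty} n(n-1)a_nx^{n-2}.$$

Therefore we have

$$2xy' = \sum_{n=1}^{\infty} 2na_nx^n = \sum_{n=0}^{\infty} 2na_nx^n,$$

and

$$\begin{aligned}
(1 - x^2)y'' &= \sum_{n=2}^{\infty} n(n-1)a_nx^{n-2} - \sum_{n=2}^{\infty} n(n-1)a_nx^n \\
&= \sum_{n=0}^{\infty} (n+2)(n+1)a_{n+2}x^n - \sum_{n=0}^{\infty} n(n-1)a_nx^n \\
&= \sum_{n=0}^{\infty} \bigl[(n+2)(n+1)a_{n+2} - n(n-1)a_n\bigr]x^n.
\end{aligned}$$

If we substitute these series in the differential equation (6.35), we see that the equation will be satisfied if, and only if, the coefficients satisfy the relation

$$(n+2)(n+1)a_{n+2} - n(n-1)a_n - 2na_n + \alpha(\alpha + 1)a_n = 0$$

for all $n \ge 0$. This equation is the same as

$$(n+2)(n+1)a_{n+2} - (n - \alpha)(n + 1 + \alpha)a_n = 0,$$

or

$$a_{n+2} = -\frac{(\alpha - n)(\alpha + n + 1)}{(n+1)(n+2)}\,a_n. \tag{6.36}$$

This relation enables us to determine $a_2, a_4, a_6, \ldots$, successively in terms of $a_0$. Similarly, we can compute $a_3, a_5, a_7, \ldots$, in terms of $a_1$. For the coefficients with even subscripts we have

$$a_2 = -\frac{\alpha(\alpha + 1)}{1 \cdot 2}\,a_0,$$

$$a_4 = -\frac{(\alpha - 2)(\alpha + 3)}{3 \cdot 4}\,a_2 = (-1)^2\,\frac{\alpha(\alpha - 2)(\alpha + 1)(\alpha + 3)}{4!}\,a_0,$$

and, in general,

$$a_{2n} = (-1)^n\,\frac{\alpha(\alpha - 2)\cdots(\alpha - 2n + 2)\cdot(\alpha + 1)(\alpha + 3)\cdots(\alpha + 2n - 1)}{(2n)!}\,a_0.$$

This can be proved by induction. For the coefficients with odd subscripts we find

$$a_{2n+1} = (-1)^n\,\frac{(\alpha - 1)(\alpha - 3)\cdots(\alpha - 2n + 1)\cdot(\alpha + 2)(\alpha + 4)\cdots(\alpha + 2n)}{(2n+1)!}\,a_1.$$

Therefore the series for $y$ can be written as

$$y = a_0u_1(x) + a_1u_2(x), \tag{6.37}$$

where

$$u_1(x) = 1 + \sum_{n=1}^{\infty} (-1)^n\,\frac{\alpha(\alpha - 2)\cdots(\alpha - 2n + 2)\cdot(\alpha + 1)(\alpha + 3)\cdots(\alpha + 2n - 1)}{(2n)!}\,x^{2n} \tag{6.38}$$

and

$$u_2(x) = x + \sum_{n=1}^{\infty} (-1)^n\,\frac{(\alpha - 1)(\alpha - 3)\cdots(\alpha - 2n + 1)\cdot(\alpha + 2)(\alpha + 4)\cdots(\alpha + 2n)}{(2n+1)!}\,x^{2n+1}. \tag{6.39}$$

The ratio test shows that each of these series converges for $|x| < 1$.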
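The two routes to the coefficients, the general recursion (6.33) of Theorem 6.13 applied to $P_1$ and $P_2$, and the simpler two-term relation (6.36), must produce the same numbers. The sketch below compares them in exact rational arithmetic for one sample value of $\alpha$. The function names and the choice $\alpha = 4$ are ours, made only for illustration.

```python
from fractions import Fraction

def general_recursion(b, c, a0, a1, n_terms):
    """Coefficients a_0 .. a_{n_terms-1} from the recursion (6.33).

    b(j) and c(j) return the j-th power-series coefficients of P1 and P2
    (expansions about x0 = 0); n_terms is assumed to be at least 2.
    """
    a = [Fraction(a0), Fraction(a1)]
    for n in range(n_terms - 2):
        s = sum((k + 1) * a[k + 1] * b(n - k) + a[k] * c(n - k)
                for k in range(n + 1))
        a.append(-s / ((n + 2) * (n + 1)))
    return a

def legendre_recursion(alpha, a0, a1, n_terms):
    """Coefficients a_0 .. a_{n_terms-1} from the two-term relation (6.36)."""
    a = [Fraction(a0), Fraction(a1)]
    for n in range(n_terms - 2):
        factor = Fraction((alpha - n) * (alpha + n + 1), (n + 1) * (n + 2))
        a.append(-factor * a[n])
    return a

ALPHA = 4  # sample value, chosen for illustration

def p1_coeff(j):
    """P1(x) = -2x/(1 - x^2) = -2x - 2x^3 - 2x^5 - ..."""
    return -2 if j % 2 == 1 else 0

def p2_coeff(j):
    """P2(x) = alpha(alpha+1)/(1 - x^2) = alpha(alpha+1)(1 + x^2 + x^4 + ...)"""
    return ALPHA * (ALPHA + 1) if j % 2 == 0 else 0

# Solution with a0 = 1, a1 = 0 (the series u1), computed both ways.
u1_general = general_recursion(p1_coeff, p2_coeff, 1, 0, 8)
u1_two_term = legendre_recursion(ALPHA, 1, 0, 8)
print(u1_general == u1_two_term)
print([str(coeff) for coeff in u1_two_term])
```

With an integer value of $\alpha$ the factor $\alpha - n$ in (6.36) eventually vanishes, so one of the two series stops after finitely many terms; running `legendre_recursion` with other integer values of `alpha` makes this visible.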
Also, since the relation (6.36) is satisfied separately by the even and odd coefficients, each of $u_1$ and $u_2$ is a solution of the differential equation (6.35). These solutions satisfy the initial conditions

$$u_1(0) = 1, \quad u_1'(0) = 0, \quad u_2(0) = 0, \quad u_2'(0) = 1.$$

Since $u_1$ and $u_2$ are independent, the general solution of the Legendre equation (6.35) over the open interval $(-1, 1)$ is given by the linear combination (6.37) with arbitrary constants $a_0$ and $a_1$.

When $\alpha$ is 0 or a positive even integer, say $\alpha = 2m$, the series for $u_1(x)$ becomes a polynomial of degree $2m$ containing only even powers of $x$. Since we have

$$\alpha(\alpha - 2)\cdots(\alpha - 2n + 2) = 2m(2m - 2)\cdots(2m - 2n + 2) = \frac{2^n m!}{(m - n)!}$$

and

$$(\alpha + 1)(\alpha + 3)\cdots(\alpha + 2n - 1) = (2m + 1)(2m + 3)\cdots(2m + 2n - 1) = \frac{(2m + 2n)!\,m!}{2^n(2m)!\,(m + n)!},$$

the formula for $u_1(x)$ in this case becomes

$$u_1(x) = 1 + \frac{(m!)^2}{(2m)!}\sum_{k=1}^{m} (-1)^k\,\frac{(2m + 2k)!}{(m - k)!\,(m + k)!\,(2k)!}\,x^{2k}. \tag{6.40}$$

For example, when $\alpha = 0, 2, 4, 6$ ($m = 0, 1, 2, 3$) the corresponding polynomials are

$$u_1(x) = 1, \qquad 1 - 3x^2, \qquad 1 - 10x^2 + \tfrac{35}{3}x^4, \qquad 1 - 21x^2 + 63x^4 - \tfrac{231}{5}x^6.$$

The series for $u_2(x)$ is not a polynomial when $\alpha$ is even because the coefficient of $x^{2n+1}$ is never zero.

When $\alpha$ is an odd positive integer, the roles of $u_1$ and $u_2$ are reversed; the series for $u_2(x)$ becomes a polynomial and the series for $u_1(x)$ is not a polynomial. Specifically, if $\alpha = 2m + 1$ we have

$$u_2(x) = x + \frac{(m!)^2}{(2m + 1)!}\sum_{k=1}^{m} (-1)^k\,\frac{(2m + 2k + 1)!}{(m - k)!\,(m + k)!\,(2k + 1)!}\,x^{2k+1}. \tag{6.41}$$

For example, when $\alpha = 1, 3, 5$ ($m = 0, 1, 2$), the corresponding polynomials are

$$u_2(x) = x, \qquad x - \tfrac{5}{3}x^3, \qquad x - \tfrac{14}{3}x^3 + \tfrac{21}{5}x^5.$$

6.19 The Legendre polynomials

Some of the properties of the polynomial solutions of the Legendre equation can be deduced directly from the differential equation or from the formulas in (6.40) and (6.41). Others are more easily deduced from an alternative formula for these polynomials which we shall now derive.

First we shall obtain a single formula which contains (aside from constant factors) both the polynomials in (6.40) and (6.41). Let

$$P_n(x) = \frac{1}{2^n}\sum_{r=0}^{[n/2]} (-1)^r\,\frac{(2n - 2r)!}{r!\,(n - r)!\,(n - 2r)!}\,x^{n-2r},$$

where $[n/2]$ denotes the greatest integer $\le n/2$.

…

For each $n \ge 0$ we have $P_n(1) = 1$. Moreover, $P_n(x)$ is the only polynomial which satisfies the Legendre equation

$$(1 - x^2)y'' - 2xy' + n(n + 1)y = 0$$

and has the value 1 when $x = 1$.

For each $n \ge 0$ we have $P_n(-x) = (-1)^nP_n(x)$. This shows that $P_n$ is an even function when $n$ is even, and an odd function when $n$ is odd.

We have already mentioned the orthogonality relation,

$$\int_{-1}^{1} P_n(x)P_m(x)\,dx = 0 \quad\text{if } m \ne n.$$

When $m = n$ we have the norm relation

$$\|P_n\|^2 = \int_{-1}^{1} [P_n(x)]^2\,dx = \frac{2}{2n + 1}.$$

Every polynomial of degree $n$ can be expressed as a linear combination of the Legendre polynomials $P_0, P_1, \ldots, P_n$. In fact, if $f$ is a polynomial of degree $n$ we have

$$f(x) = \sum_{k=0}^{n} c_kP_k(x),$$

where

$$c_k = \frac{2k + 1}{2}\int_{-1}^{1} f(x)P_k(x)\,dx.$$

From the orthogonality relation it follows that

$$\int_{-1}^{1} g(x)P_n(x)\,dx = 0$$

for every polynomial $g$ of degree less than $n$. This property can be used to prove that the Legendre polynomial $P_n$ has $n$ distinct real zeros and that they all lie in the open interval $(-1, 1)$.

6.21 Exercises

1. The Legendre equation (6.35) with $\alpha = 0$ has the polynomial solution $u_1(x) = 1$ and a solution $u_2$, not a polynomial, given by the series in Equation (6.39).
(a) Show that the sum of the series for $u_2$ is given by
$$u_2(x) = \frac{1}{2}\log\frac{1 + x}{1 - x} \quad\text{for } |x| < 1.$$
(b) Verify directly that the function $u_2$ in part (a) is a solution of the Legendre equation when $\alpha = 0$.

2. Show that the function $f$ defined by the equation
$$f(x) = 1 - \frac{x}{2}\log\frac{1 + x}{1 - x}$$
for $|x| < 1$ satisfies the Legendre equation (6.35) with $\alpha = 1$. Express this function as a linear combination of the solutions $u_1$ and $u_2$ given in Equations (6.38) and (6.39).

3. The Legendre equation (6.35) can be written in the form
$$[(x^2 - 1)y']' - \alpha(\alpha + 1)y = 0.$$
(a) If $a, b, c$ are constants with $a > b$ and $4c + 1 > 0$, show that a differential equation of the type
$$[(x - a)(x - b)y']' - cy = 0$$
can be transformed to a Legendre equation by a change of variable of the form $x = At + B$, with $A > 0$. Determine $A$ and $B$ in terms of $a$ and $b$.
(b) Use the method suggested in part (a) to transform the equation
$$(x^2 - x)y'' + (2x - 1)y' - 2y = 0$$
to a Legendre equation.

4. Find two independent power-series solutions of the Hermite equation
$$y'' - 2xy' + 2\alpha y = 0$$
on an interval of the form $(-r, r)$. Show that one of these solutions is a polynomial when $\alpha$ is a nonnegative integer.

5. Find a power-series solution of the differential equation
$$xy'' + (3 + x^3)y' + 3x^2y = 0$$
valid for all $x$. Find a second solution of the form $y = x^{-2}\sum a_nx^n$ valid for all $x \ne 0$.

6. Find a power-series solution of the differential equation
$$x^2y'' + x^2y' - (\alpha x + 2)y = 0$$
valid on an interval of the form $(-r, r)$.

7. Given two functions $A$ and $B$ analytic on an interval $(x_0 - r, x_0 + r)$, say
$$A(x) = \sum_{n=0}^{\infty} a_n(x - x_0)^n, \qquad B(x) = \sum_{n=0}^{\infty} b_n(x - x_0)^n.$$
It can be shown that the product $C(x) = A(x)B(x)$ is also analytic on $(x_0 - r, x_0 + r)$. This exercise shows that $C$ has the power-series expansion
$$C(x) = \sum_{n=0}^{\infty} c_n(x - x_0)^n, \quad\text{where } c_n = \sum_{k=0}^{n} a_kb_{n-k}.$$
(a) Use Leibniz's rule for the $n$th derivative of a product to show that the $n$th derivative of $C$ is given by
$$C^{(n)}(x) = \sum_{k=0}^{n} \binom{n}{k}A^{(k)}(x)B^{(n-k)}(x).$$
(b) Now use the fact that $A^{(k)}(x_0) = k!\,a_k$ and $B^{(n-k)}(x_0) = (n - k)!\,b_{n-k}$ to obtain
$$C^{(n)}(x_0) = n!\sum_{k=0}^{n} a_kb_{n-k}.$$
Since $C^{(n)}(x_0) = n!\,c_n$, this proves the required formula for $c_n$.

In Exercises 8 through 14, $P_n(x)$ denotes the Legendre polynomial of degree $n$. These exercises outline proofs of the properties of the Legendre polynomials described in Section 6.20.

8. (a) Use Rodrigues' formula to show that
$$P_n(x) = \frac{1}{2^n}(x + 1)^n + (x - 1)Q_n(x),$$
where $Q_n(x)$ is a polynomial.
(b) Prove that $P_n(1) = 1$ and that $P_n(-1) = (-1)^n$.
(c) Prove that $P_n(x)$ is the only polynomial solution of Legendre's equation (with $\alpha = n$) having the value 1 when $x = 1$.

9. (a) Use the differential equations satisfied by $P_n$ and $P_m$ to show that
$$[(1 - x^2)(P_nP_m' - P_n'P_m)]' = [n(n + 1) - m(m + 1)]P_nP_m.$$
(b) If $m \ne n$, integrate the equation in (a) from $-1$ to 1 to give an alternate proof of the orthogonality relation
$$\int_{-1}^{1} P_n(x)P_m(x)\,dx = 0.$$

10. (a) Let $f(x) = (x^2 - 1)^n$.
Use integration by parts to show that
$$\int_{-1}^{1} f^{(n)}(x)f^{(n)}(x)\,dx = -\int_{-1}^{1} f^{(n+1)}(x)f^{(n-1)}(x)\,dx.$$
Apply this formula repeatedly to deduce that the integral on the left is equal to
$$2(2n)!\int_0^1 (1 - x^2)^n\,dx.$$
(b) The substitution $x = \cos t$ transforms the integral $\int_0^1 (1 - x^2)^n\,dx$ to $\int_0^{\pi/2} \sin^{2n+1}t\,dt$. Use the relation
$$\int_0^{\pi/2} \sin^{2n+1}t\,dt = \frac{2n(2n - 2)\cdots 2}{(2n + 1)(2n - 1)\cdots 3 \cdot 1}$$
and Rodrigues' formula to obtain
$$\int_{-1}^{1} [P_n(x)]^2\,dx = \frac{2}{2n + 1}.$$

11. (a) Show that
$$P_n(x) = \frac{(2n)!}{2^n(n!)^2}\,x^n + Q_n(x),$$
where $Q_n(x)$ is a polynomial of degree less than $n$.
(b) Express the polynomial $f(x) = x^4$ as a linear combination of $P_0, P_1, P_2, P_3$, and $P_4$.
(c) Show that every polynomial $f$ of degree $n$ can be expressed as a linear combination of the Legendre polynomials $P_0, P_1, \ldots, P_n$.

12. (a) If $f$ is a polynomial of degree $n$, write
$$f(x) = \sum_{k=0}^{n} c_kP_k(x).$$
[This is possible because of Exercise 11(c).] For a fixed $m$, $0 \le m \le n$, multiply both sides of this equation by $P_m(x)$ and integrate from $-1$ to 1. Use Exercises 9(b) and 10(b) to deduce the relation
$$c_m = \frac{2m + 1}{2}\int_{-1}^{1} f(x)P_m(x)\,dx.$$

13. Use Exercises 9 and 11 to show that $\int_{-1}^{1} g(x)P_n(x)\,dx = 0$ for every polynomial $g$ of degree less than $n$.

14. (a) Use Rolle's theorem to show that $P_n$ cannot have any multiple zeros in the open interval $(-1, 1)$. In other words, any zeros of $P_n$ which lie in $(-1, 1)$ must be simple zeros.
(b) Assume $P_n$ has $m$ zeros in the interval $(-1, 1)$. If $m = 0$, let $Q_0(x) = 1$. If $m \ge 1$, let
$$Q_m(x) = (x - x_1)(x - x_2)\cdots(x - x_m),$$
where $x_1, x_2, \ldots, x_m$ are the $m$ zeros of $P_n$ in $(-1, 1)$. Show that at each point $x$ in $(-1, 1)$, $Q_m(x)$ has the same sign as $P_n(x)$.
(c) Use part (b), along with Exercise 13, to show that the inequality $m < n$ leads to a contradiction. This shows that $P_n$ has $n$ distinct real zeros, all of which lie in the open interval $(-1, 1)$.

15. (a) Show that the value of the integral $\int_{-1}^{1} P_n(x)P_{n+1}'(x)\,dx$ is independent of $n$.
(b) Evaluate the integral $\int_{-1}^{1} xP_n(x)P_{n-1}(x)\,dx$.
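The properties that these exercises establish can be spot-checked for small $n$ in exact arithmetic. The following sketch builds $P_n$ from the explicit sum $P_n(x) = 2^{-n}\sum_r (-1)^r \frac{(2n-2r)!}{r!\,(n-r)!\,(n-2r)!}x^{n-2r}$ and integrates products of polynomials over $[-1, 1]$ term by term. It is only a numerical sanity check under that formula, not a substitute for the proofs, and the function names `legendre` and `inner` are ours.

```python
from fractions import Fraction
from math import factorial

def legendre(n):
    """Coefficient list [c0, c1, ..., cn] of P_n, from the explicit sum over r."""
    coeffs = [Fraction(0)] * (n + 1)
    for r in range(n // 2 + 1):
        numerator = (-1) ** r * factorial(2 * n - 2 * r)
        denominator = (2 ** n * factorial(r) * factorial(n - r)
                       * factorial(n - 2 * r))
        coeffs[n - 2 * r] = Fraction(numerator, denominator)
    return coeffs

def inner(p, q):
    """Exact value of the integral of p(x) q(x) over [-1, 1]."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:              # odd powers integrate to zero
                total += a * b * Fraction(2, i + j + 1)
    return total

# Orthogonality, norm relation, and the value at x = 1 for n, m up to 5.
orthogonal = all(inner(legendre(n), legendre(m)) == 0
                 for n in range(6) for m in range(6) if n != m)
norms_ok = all(inner(legendre(n), legendre(n)) == Fraction(2, 2 * n + 1)
               for n in range(6))
values_at_one = [sum(legendre(n)) for n in range(6)]  # P_n(1) = sum of coefficients
print(orthogonal, norms_ok, values_at_one)
```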
6.22 The method of Frobenius

In Section 6.17 we learned how to find power-series solutions of the differential equation

$$y'' + P_1(x)y' + P_2(x)y = 0 \tag{6.43}$$

in an interval about a point $x_0$ where the coefficients $P_1$ and $P_2$ are analytic. If either $P_1$ or $P_2$ is not analytic near $x_0$, power-series solutions valid near $x_0$ may or may not exist. For example, suppose we try to find a power-series solution of the differential equation

$$x^2y'' - y' - y = 0 \tag{6.44}$$

near $x_0 = 0$. If we assume that a solution $y = \sum a_kx^k$ exists and substitute this series in the differential equation we are led to the recursion formula

$$a_{n+1} = \frac{n^2 - n - 1}{n + 1}\,a_n.$$

Although this gives us a power series $y = \sum a_kx^k$ which formally satisfies (6.44), the ratio test shows that this power series converges only for $x = 0$. Thus, there is no power-series solution of (6.44) valid in any open interval about $x_0 = 0$. This example does not violate Theorem 6.13 because when we put Equation (6.44) in the form (6.43) we find that the coefficients $P_1$ and $P_2$ are given by

$$P_1(x) = -\frac{1}{x^2} \quad\text{and}\quad P_2(x) = -\frac{1}{x^2}.$$

These functions do not have power-series expansions about the origin. The difficulty here is that the coefficient of $y''$ in (6.44) has the value 0 when $x = 0$; in other words, the differential equation has a singular point at $x = 0$.

A knowledge of the theory of functions of a complex variable is needed to appreciate the difficulties encountered in the investigation of differential equations near a singular point. However, some important special cases of equations with singular points can be treated by elementary methods. For example, suppose the differential equation in (6.43) is equivalent to an equation of the form

$$(x - x_0)^2y'' + (x - x_0)P(x)y' + Q(x)y = 0, \tag{6.45}$$

where $P$ and $Q$ have power-series expansions in some open interval $(x_0 - r, x_0 + r)$. In this case we say that $x_0$ is a regular singular point of the equation. If we divide both sides of (6.45) by $(x - x_0)^2$ the equation becomes

$$y'' + \frac{P(x)}{x - x_0}\,y' + \frac{Q(x)}{(x - x_0)^2}\,y = 0$$

for $x \ne x_0$.
If $P(x_0) \ne 0$ or $Q(x_0) \ne 0$, or if $Q(x_0) = 0$ and $Q'(x_0) \ne 0$, either the coefficient of $y'$ or the coefficient of $y$ will not have a power-series expansion about the point $x_0$, so Theorem 6.13 will not be applicable. In 1873 the German mathematician Georg Frobenius (1849-1917) developed a useful method for treating such equations. We shall describe the theorem of Frobenius but we shall not present its proof.† In the next section we give the details of the proof for an important special case, the Bessel equation.

Frobenius' theorem splits into two parts, depending on the nature of the roots of the quadratic equation

$$t(t - 1) + P(x_0)t + Q(x_0) = 0. \tag{6.46}$$

This quadratic equation is called the indicial equation of the given differential equation (6.45). The coefficients $P(x_0)$ and $Q(x_0)$ are the constant terms in the power-series expansions of $P$ and $Q$. Let $\alpha_1$ and $\alpha_2$ denote the roots of the indicial equation. These roots may be real or complex, equal or distinct. The type of solution obtained by the Frobenius method depends on whether or not these roots differ by an integer.

† For a proof see E. Hille, Analysis, Vol. II, Blaisdell Publishing Co., 1966, or E. A. Coddington, An Introduction to Ordinary Differential Equations, Prentice-Hall, 1961.

THEOREM 6.14. FIRST CASE OF FROBENIUS' THEOREM. Let $\alpha_1$ and $\alpha_2$ be the roots of the indicial equation and assume that $\alpha_1 - \alpha_2$ is not an integer. Then the differential equation (6.45) has two independent solutions $u_1$ and $u_2$ of the form

$$u_1(x) = |x - x_0|^{\alpha_1}\sum_{n=0}^{\infty} a_n(x - x_0)^n, \quad\text{with } a_0 = 1, \tag{6.47}$$

and

$$u_2(x) = |x - x_0|^{\alpha_2}\sum_{n=0}^{\infty} b_n(x - x_0)^n, \quad\text{with } b_0 = 1. \tag{6.48}$$

Both series converge in the interval $|x - x_0| < r$ …

… $> 0$, the constant $C$ may or may not be zero. As in Case 1, both series converge in the interval $|x - x_0| < r$ …

… $x > 0$, so that $|x|^t = x^t$. Differentiation of (6.50) gives us

$$y' = tx^{t-1}\sum_{n=0}^{\infty} a_nx^n + x^t\sum_{n=0}^{\infty} na_nx^{n-1} = x^{t-1}\sum_{n=0}^{\infty} (n + t)a_nx^n.$$

Similarly, we obtain

$$y'' = x^{t-2}\sum_{n=0}^{\infty} (n + t)(n + t - 1)a_nx^n.$$
If $L(y) = x^2y'' + xy' + (x^2 - \alpha^2)y$, we find

$$\begin{aligned}
L(y) &= x^t\sum_{n=0}^{\infty} (n + t)(n + t - 1)a_nx^n + x^t\sum_{n=0}^{\infty} (n + t)a_nx^n + x^t\sum_{n=0}^{\infty} a_nx^{n+2} - x^t\sum_{n=0}^{\infty} \alpha^2a_nx^n \\
&= x^t\sum_{n=0}^{\infty} \bigl[(n + t)^2 - \alpha^2\bigr]a_nx^n + x^t\sum_{n=0}^{\infty} a_nx^{n+2}.
\end{aligned}$$

Now we put $L(y) = 0$, cancel $x^t$, and try to determine the $a_n$ so that the coefficient of each power of $x$ will vanish. For the constant term we need $(t^2 - \alpha^2)a_0 = 0$. Since we seek a solution with $a_0 \ne 0$, this requires that

$$t^2 - \alpha^2 = 0. \tag{6.51}$$

This is the indicial equation. Its roots $\alpha$ and $-\alpha$ are the only possible values of $t$ that can give us a solution of the desired type.

Consider first the choice $t = \alpha$. For this $t$ the remaining equations for determining the coefficients become

$$[(1 + \alpha)^2 - \alpha^2]a_1 = 0 \quad\text{and}\quad [(n + \alpha)^2 - \alpha^2]a_n + a_{n-2} = 0 \tag{6.52}$$

for $n \ge 2$. Since $\alpha \ge 0$, the first of these implies that $a_1 = 0$. The second formula can be written as

$$a_n = -\frac{a_{n-2}}{(n + \alpha)^2 - \alpha^2} = -\frac{a_{n-2}}{n(n + 2\alpha)}, \tag{6.53}$$

so $a_3 = a_5 = a_7 = \cdots = 0$. For the coefficients with even subscripts we have

$$a_2 = \frac{-a_0}{2(2 + 2\alpha)} = \frac{-a_0}{2^2(1 + \alpha)}, \qquad a_4 = \frac{-a_2}{4(4 + 2\alpha)} = \frac{(-1)^2a_0}{2^4\,2!\,(1 + \alpha)(2 + \alpha)},$$

$$a_6 = \frac{-a_4}{6(6 + 2\alpha)} = \frac{(-1)^3a_0}{2^6\,3!\,(1 + \alpha)(2 + \alpha)(3 + \alpha)},$$

and, in general,

$$a_{2n} = \frac{(-1)^na_0}{2^{2n}\,n!\,(1 + \alpha)(2 + \alpha)\cdots(n + \alpha)}.$$

Therefore the choice $t = \alpha$ gives us the solution

$$y = a_0x^{\alpha}\Bigl(1 + \sum_{n=1}^{\infty} \frac{(-1)^nx^{2n}}{2^{2n}\,n!\,(1 + \alpha)(2 + \alpha)\cdots(n + \alpha)}\Bigr).$$

The ratio test shows that the power series appearing in this formula converges for all real $x$.

In this discussion we assumed that $x > 0$. If $x < 0$ we can repeat the discussion with $x^t$ replaced by $(-x)^t$. We again find that $t$ must satisfy the equation $t^2 - \alpha^2 = 0$. Taking $t = \alpha$ we then obtain the same solution, except that the outside factor $x^{\alpha}$ is replaced by $(-x)^{\alpha}$. Therefore the function $f_{\alpha}$ given by the equation

$$f_{\alpha}(x) = a_0|x|^{\alpha}\Bigl(1 + \sum_{n=1}^{\infty} \frac{(-1)^nx^{2n}}{2^{2n}\,n!\,(1 + \alpha)(2 + \alpha)\cdots(n + \alpha)}\Bigr) \tag{6.54}$$

is a solution of the Bessel equation valid for all real $x \ne 0$. For those values of $\alpha$ for which $f_{\alpha}'(0)$ and $f_{\alpha}''(0)$ exist the solution is also valid for $x = 0$.

Now consider the root $t = -\alpha$ of the indicial equation. We obtain, in place of (6.52), the equations

$$[(1 - \alpha)^2 - \alpha^2]a_1 = 0 \quad\text{and}\quad [(n - \alpha)^2 - \alpha^2]a_n + a_{n-2} = 0,$$

which become

$$(1 - 2\alpha)a_1 = 0 \quad\text{and}\quad n(n - 2\alpha)a_n + a_{n-2} = 0.$$

If $2\alpha$ is not an integer these equations give us $a_1 = 0$ and

$$a_n = -\frac{a_{n-2}}{n(n - 2\alpha)}$$

for $n \ge 2$. Since this recursion formula is the same as (6.53), with $\alpha$ replaced by $-\alpha$, we are led to the solution

$$f_{-\alpha}(x) = a_0|x|^{-\alpha}\Bigl(1 + \sum_{n=1}^{\infty} \frac{(-1)^nx^{2n}}{2^{2n}\,n!\,(1 - \alpha)(2 - \alpha)\cdots(n - \alpha)}\Bigr) \tag{6.55}$$

for all real $x \ne 0$.

The solution $f_{-\alpha}$ was obtained under the hypothesis that $2\alpha$ is not a positive integer. However, the series for $f_{-\alpha}$ is meaningful even if $2\alpha$ is a positive integer, so long as $\alpha$ is not a positive integer. It can be verified that $f_{-\alpha}$ satisfies the Bessel equation for such $\alpha$. Therefore, for each $\alpha \ge 0$ we have the series solution $f_{\alpha}$, given by Equation (6.54); and if $\alpha$ is not a nonnegative integer we have found another solution $f_{-\alpha}$ given by Equation (6.55). The two solutions $f_{\alpha}$ and $f_{-\alpha}$ are independent, since one of them $\to \infty$ as $x \to 0$, and the other does not. Next we shall simplify the form of the solutions. To do this we need some properties of Euler's gamma function, and we digress briefly to recall these properties.

For each real $s > 0$ we define $\Gamma(s)$ by the improper integral

$$\Gamma(s) = \int_{0+}^{\infty} t^{s-1}e^{-t}\,dt.$$

This integral converges if $s > 0$ and diverges if $s \le 0$. Integration by parts leads to the functional equation

$$\Gamma(s + 1) = s\,\Gamma(s). \tag{6.56}$$

This implies that

$$\Gamma(s + 2) = (s + 1)\Gamma(s + 1) = (s + 1)s\,\Gamma(s),$$

$$\Gamma(s + 3) = (s + 2)\Gamma(s + 2) = (s + 2)(s + 1)s\,\Gamma(s),$$

and, in general,

$$\Gamma(s + n) = (s + n - 1)\cdots(s + 1)s\,\Gamma(s) \tag{6.57}$$

for every positive integer $n$. Since $\Gamma(1) = \int_0^{\infty} e^{-t}\,dt = 1$, when we put $s = 1$ in (6.57) we find

$$\Gamma(n + 1) = n!.$$

Thus, the gamma function is an extension of the factorial function from integers to positive real numbers.

The functional equation (6.56) can be used to extend the definition of $\Gamma(s)$ to negative values of $s$ that are not integers. We write (6.56) in the form

$$\Gamma(s) = \frac{\Gamma(s + 1)}{s}. \tag{6.58}$$

The right-hand member is meaningful if $s + 1 > 0$ and $s \ne 0$. Therefore, we can use this equation to define $\Gamma(s)$ if $-1 < s < 0$.
The right-hand member of (6.58) is now meaningful if $s + 2 > 0$, $s \ne -1$, $s \ne 0$, and we can use this equation to define $\Gamma(s)$ for $-2 < s < -1$. Continuing in this manner, we can extend the definition of $\Gamma(s)$ by induction to every open interval of the form $-n < s < -n + 1$, where $n$ is a positive integer. The functional equation (6.56) and its extension in (6.57) are now valid for all real $s$ for which both sides are meaningful.

We return now to the discussion of the Bessel equation. The series for $f_{\alpha}$ in Equation (6.54) contains the product $(1 + \alpha)(2 + \alpha)\cdots(n + \alpha)$. We can express this product in terms of the gamma function by taking $s = 1 + \alpha$ in (6.57). This gives us

$$(1 + \alpha)(2 + \alpha)\cdots(n + \alpha) = \frac{\Gamma(n + 1 + \alpha)}{\Gamma(1 + \alpha)}.$$

Therefore, if we choose $a_0 = 2^{-\alpha}/\Gamma(1 + \alpha)$ in Equation (6.54) and denote the resulting function $f_{\alpha}(x)$ by $J_{\alpha}(x)$ when $x > 0$, the solution for $x > 0$ can be written as

$$J_{\alpha}(x) = \Bigl(\frac{x}{2}\Bigr)^{\alpha}\sum_{n=0}^{\infty} \frac{(-1)^n}{n!\,\Gamma(n + 1 + \alpha)}\Bigl(\frac{x}{2}\Bigr)^{2n}. \tag{6.59}$$

The function $J_{\alpha}$ defined by this equation for $x > 0$ and $\alpha \ge 0$ is called the Bessel function of the first kind of order $\alpha$. When $\alpha$ is a nonnegative integer, say $\alpha = p$, the Bessel function $J_p$ is given by the power series

$$J_p(x) = \sum_{n=0}^{\infty} \frac{(-1)^n}{n!\,(n + p)!}\Bigl(\frac{x}{2}\Bigr)^{2n+p} \qquad (p = 0, 1, 2, \ldots).$$

This is also a solution of the Bessel equation for $x < 0$. Extensive tables of Bessel functions have been constructed. The graphs of the two functions $J_0$ and $J_1$ are shown in Figure 6.2.

FIGURE 6.2 Graphs of the Bessel functions $J_0$ and $J_1$.

We can define a new function $J_{-\alpha}$ by replacing $\alpha$ by $-\alpha$ in Equation (6.59), if $\alpha$ is such that $\Gamma(n + 1 - \alpha)$ is meaningful; that is, if $\alpha$ is not a positive integer. Therefore, if $x > 0$ and $\alpha > 0$, $\alpha \ne 1, 2, 3, \ldots$, we define

$$J_{-\alpha}(x) = \Bigl(\frac{x}{2}\Bigr)^{-\alpha}\sum_{n=0}^{\infty} \frac{(-1)^n}{n!\,\Gamma(n + 1 - \alpha)}\Bigl(\frac{x}{2}\Bigr)^{2n}.$$

Taking $s = 1 - \alpha$ in (6.57) we obtain

$$\Gamma(n + 1 - \alpha) = (1 - \alpha)(2 - \alpha)\cdots(n - \alpha)\,\Gamma(1 - \alpha)$$

and we see that the series for $J_{-\alpha}(x)$ is the same as that for $f_{-\alpha}(x)$ in Equation (6.55) with $a_0 = 2^{\alpha}/\Gamma(1 - \alpha)$, $x > 0$. Therefore, if $\alpha$ is not a positive integer, $J_{-\alpha}$ is a solution of the Bessel equation for $x > 0$.
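The series (6.59) converges quickly and is easy to sum numerically. The sketch below evaluates a partial sum with the standard-library function `math.gamma`, which also accepts negative non-integer arguments, so the same routine covers $J_{-\alpha}$ when $\alpha$ is not a positive integer. As a sanity check it compares the orders $\pm\tfrac{1}{2}$ with the closed forms $J_{1/2}(x) = \sqrt{2/(\pi x)}\,\sin x$ and $J_{-1/2}(x) = \sqrt{2/(\pi x)}\,\cos x$, which are assumed here rather than derived. The function name `bessel_j` and the cutoff of 40 terms are our own choices, adequate only for moderate values of $x$.

```python
import math

def bessel_j(alpha, x, terms=40):
    """Partial sum of the series (6.59) for J_alpha(x), with x > 0.

    alpha may be negative provided it is not a negative integer; in that
    case math.gamma would be asked for a pole and raises ValueError.
    """
    half = x / 2.0
    total = 0.0
    for n in range(terms):
        total += ((-1) ** n * half ** (2 * n)
                  / (math.factorial(n) * math.gamma(n + 1 + alpha)))
    return half ** alpha * total

x = 2.0
closed_sin = math.sqrt(2.0 / (math.pi * x)) * math.sin(x)
closed_cos = math.sqrt(2.0 / (math.pi * x)) * math.cos(x)
err_plus = abs(bessel_j(0.5, x) - closed_sin)
err_minus = abs(bessel_j(-0.5, x) - closed_cos)
print(err_plus, err_minus)
```

For large $x$ the alternating terms grow before they decay, so a plain partial sum loses accuracy to cancellation; the sketch is meant for illustration near the origin only.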
If $\alpha$ is not an integer, the two solutions $J_\alpha(x)$ and $J_{-\alpha}(x)$ are linearly independent on the positive real axis (since their ratio is not constant) and the general solution of the Bessel equation for $x > 0$ is

$$y = c_1 J_\alpha(x) + c_2 J_{-\alpha}(x).$$

If $\alpha$ is a nonnegative integer, say $\alpha = p$, we have found only the solution $J_p$ and its constant multiples valid for $x > 0$. Another solution, independent of this one, can be found by the method described in Exercise 4 of Section 6.16. This states that if $u_1$ is a solution of $y'' + P_1 y' + P_2 y = 0$ that never vanishes on an interval $I$, a second solution $u_2$ independent of $u_1$ is given by the integral

$$u_2(x) = u_1(x) \int_c^x \frac{Q(t)}{[u_1(t)]^2}\,dt,$$

where $Q(x) = e^{-\int P_1(x)\,dx}$. For the Bessel equation we have $P_1(x) = 1/x$, so $Q(x) = 1/x$ and a second solution $u_2$ is given by the formula

(6.60)  $$u_2(x) = J_p(x) \int_c^x \frac{1}{t\,[J_p(t)]^2}\,dt,$$

if $c$ and $x$ lie in an interval $I$ in which $J_p$ does not vanish.

This second solution can be put in other forms. For example, from Equation (6.59) we may write

$$\frac{1}{[J_p(t)]^2} = \frac{1}{t^{2p}}\,g_p(t),$$

where $g_p(0) \ne 0$. In the interval $I$ the function $g_p$ has a power-series expansion

$$g_p(t) = \sum_{n=0}^{\infty} A_n t^n$$

which could be determined by equating coefficients in the identity $g_p(t)\,[J_p(t)]^2 = t^{2p}$. If we assume the existence of such an expansion, the integrand in (6.60) takes the form

$$\frac{1}{t\,[J_p(t)]^2} = \frac{1}{t^{2p+1}} \sum_{n=0}^{\infty} A_n t^n.$$

Integrating this formula term by term from $c$ to $x$ we obtain a logarithmic term $A_{2p} \log x$ (from the power $t^{-1}$) plus a series of the form $x^{-2p} \sum B_n x^n$. Therefore Equation (6.60) takes the form

$$u_2(x) = A_{2p} J_p(x) \log x + J_p(x)\,x^{-2p} \sum_{n=0}^{\infty} B_n x^n.$$

It can be shown that the coefficient $A_{2p} \ne 0$. If we multiply $u_2(x)$ by $1/A_{2p}$, the resulting solution is denoted by $K_p(x)$ and has the form

$$K_p(x) = J_p(x) \log x + x^{-p} \sum_{n=0}^{\infty} C_n x^n.$$

This is the form of the solution promised by the second case of Frobenius' theorem.
Having arrived at this formula, we can verify that a solution of this form actually exists by substituting the right-hand member in the Bessel equation and determining the coefficients $C_n$ so as to satisfy the equation. The details of this calculation are lengthy and will be omitted. The final result can be expressed as

$$K_p(x) = J_p(x) \log x - \frac{1}{2}\left(\frac{x}{2}\right)^{-p} \sum_{n=0}^{p-1} \frac{(p - n - 1)!}{n!} \left(\frac{x}{2}\right)^{2n} - \frac{1}{2}\left(\frac{x}{2}\right)^{p} \sum_{n=0}^{\infty} (-1)^n \frac{h_n + h_{n+p}}{n!\,(n + p)!} \left(\frac{x}{2}\right)^{2n},$$

where $h_0 = 0$ and $h_n = 1 + \frac{1}{2} + \cdots + \frac{1}{n}$ for $n \ge 1$. The series on the right converges for all real $x$. The function $K_p$ defined for $x > 0$ by this formula is called the Bessel function of the second kind of order $p$. Since $K_p$ is not a constant multiple of $J_p$, the general solution of the Bessel equation in this case for $x > 0$ is

$$y = c_1 J_p(x) + c_2 K_p(x).$$

Further properties of the Bessel functions are discussed in the next set of exercises.

6.24 Exercises

1. (a) Let $f$ be any solution of the Bessel equation of order $\alpha$ and let $g(x) = x^{1/2} f(x)$ for $x > 0$. Show that $g$ satisfies the differential equation

$$y'' + \left(1 + \frac{1 - 4\alpha^2}{4x^2}\right) y = 0.$$

(b) When $4\alpha^2 = 1$ the differential equation in (a) becomes $y'' + y = 0$; its general solution is $y = A \cos x + B \sin x$. Use this information and the equation† $\Gamma(\tfrac{1}{2}) = \sqrt{\pi}$ to show that, for $x > 0$,

$$J_{1/2}(x) = \left(\frac{2}{\pi x}\right)^{1/2} \sin x \quad\text{and}\quad J_{-1/2}(x) = \left(\frac{2}{\pi x}\right)^{1/2} \cos x.$$

(c) Deduce the formulas in part (b) directly from the series for $J_{1/2}(x)$ and $J_{-1/2}(x)$.

2. Use the series representation for Bessel functions to show that

(a) $\dfrac{d}{dx}\bigl(x^{\alpha} J_\alpha(x)\bigr) = x^{\alpha} J_{\alpha-1}(x)$,

(b) $\dfrac{d}{dx}\bigl(x^{-\alpha} J_\alpha(x)\bigr) = -x^{-\alpha} J_{\alpha+1}(x)$.

3. Let $F_\alpha(x) = x^{\alpha} J_\alpha(x)$ and $G_\alpha(x) = x^{-\alpha} J_\alpha(x)$ for $x > 0$. Note that each positive zero of $J_\alpha$ is a zero of $F_\alpha$ and is also a zero of $G_\alpha$. Use Rolle's theorem and Exercise 2 to prove that the positive zeros of $J_\alpha$ and $J_{\alpha+1}$ interlace. That is, there is a zero of $J_\alpha$ between each pair of positive zeros of $J_{\alpha+1}$, and a zero of $J_{\alpha+1}$ between each pair of positive zeros of $J_\alpha$. (See Figure 6.2.)

† The change of variable $t = u^2$ gives us $\Gamma(\tfrac{1}{2}) = \int_{0+}^{\infty} t^{-1/2} e^{-t}\,dt = 2\int_0^{\infty} e^{-u^2}\,du$. (See Exercise 16 of Section 11.28 for a proof that $2\int_0^{\infty} e^{-u^2}\,du = \sqrt{\pi}$.)

4. (a) From the relations in Exercise 2 deduce the recurrence relations

$$\frac{\alpha}{x} J_\alpha(x) + J_\alpha'(x) = J_{\alpha-1}(x) \quad\text{and}\quad \frac{\alpha}{x} J_\alpha(x) - J_\alpha'(x) = J_{\alpha+1}(x).$$

(b) Use the relations in part (a) to deduce the formulas

$$J_{\alpha-1}(x) + J_{\alpha+1}(x) = \frac{2\alpha}{x} J_\alpha(x) \quad\text{and}\quad J_{\alpha-1}(x) - J_{\alpha+1}(x) = 2 J_\alpha'(x).$$

5. Use Exercise 1(b) and a suitable recurrence formula to show that

$$J_{3/2}(x) = \left(\frac{2}{\pi x}\right)^{1/2} \left(\frac{\sin x}{x} - \cos x\right).$$

Find a similar formula for $J_{-3/2}(x)$. Note: $J_\alpha(x)$ is an elementary function for every $\alpha$ which is half an odd integer.

6. Prove that

$$\frac{1}{2}\frac{d}{dx}\bigl(J_\alpha^2(x) + J_{\alpha+1}^2(x)\bigr) = \frac{\alpha}{x} J_\alpha^2(x) - \frac{\alpha + 1}{x} J_{\alpha+1}^2(x)$$

and

$$\frac{d}{dx}\bigl(x J_\alpha(x) J_{\alpha+1}(x)\bigr) = x\bigl(J_\alpha^2(x) - J_{\alpha+1}^2(x)\bigr).$$

7. (a) Use the identities in Exercise 6 to show that

$$J_0^2(x) + 2\sum_{n=1}^{\infty} J_n^2(x) = 1 \quad\text{and}\quad \sum_{n=0}^{\infty} (2n + 1) J_n(x) J_{n+1}(x) = \tfrac{1}{2}x.$$

(b) From part (a), deduce that $|J_0(x)| \le 1$ and $|J_n(x)| \le \tfrac{1}{2}\sqrt{2}$ for $n = 1, 2, 3, \ldots$, and all $x \ge 0$.

8. Let $g_\alpha(x) = x^{1/2} f_\alpha(a x^b)$ for $x > 0$, where $a$ and $b$ are nonzero constants. Show that $g_\alpha$ satisfies the differential equation

$$x^2 y'' + \left(a^2 b^2 x^{2b} + \tfrac{1}{4} - \alpha^2 b^2\right) y = 0$$

if, and only if, $f_\alpha$ is a solution of the Bessel equation of order $\alpha$.

9. Use Exercise 8 to express the general solution of each of the following differential equations in terms of Bessel functions for $x > 0$.
(a) $y'' + xy = 0$.
(b) $y'' + x^2 y = 0$.
(c) $y'' + x^m y = 0$.
(d) $x^2 y'' + (x^4 + \tfrac{1}{8})y = 0$.

10. Generalize Exercise 8 when $f_\alpha$ and $g_\alpha$ are related by the equation $g_\alpha(x) = x^c f_\alpha(a x^b)$ for $x > 0$. Then find the general solution of each of the following equations in terms of Bessel functions for $x > 0$.
(a) $xy'' + 6y' + y = 0$.
(b) $xy'' + 6y' + xy = 0$.
(c) $xy'' + 6y' + x^4 y = 0$.
(d) $x^2 y'' - xy' + (x + 1)y = 0$.

11. A Bessel function identity exists of the form

$$J_2(x) - J_0(x) = a J_c''(x),$$

where $a$ and $c$ are constants. Determine $a$ and $c$.

12. Find a power series solution of the differential equation $xy'' + y' + y = 0$ convergent for $-\infty < x < +\infty$. Show that for $x > 0$ it can be expressed in terms of a Bessel function.

13. Consider a linear second-order differential equation of the form
$$x^2 A(x) y'' + x P(x) y' + Q(x) y = 0,$$

where $A(x)$, $P(x)$, and $Q(x)$ have power series expansions,

$$A(x) = \sum_{k=0}^{\infty} a_k x^k, \qquad P(x) = \sum_{k=0}^{\infty} p_k x^k, \qquad Q(x) = \sum_{k=0}^{\infty} q_k x^k,$$

with $a_0 \ne 0$, each convergent in an open interval $(-r, r)$. If the differential equation has a series solution of the form

$$y = x^t \sum_{n=0}^{\infty} c_n x^n,$$

valid for $0 < x$ . . . Determine these solutions.

16. The nonlinear differential equation $y'' + y + \alpha y^2 = 0$ is only "mildly" nonlinear if $\alpha$ is a small nonzero constant. Assume there is a solution which can be expressed as a power series in $\alpha$ of the form

$$y = \sum_{n=0}^{\infty} u_n(x)\,\alpha^n \qquad (\text{valid in some interval } 0 < \alpha < r)$$

and that this solution satisfies the initial conditions $y = 1$ and $y' = 0$ when $x = 0$. To conform with these initial conditions, we try to choose the coefficients $u_n(x)$ so that $u_0(0) = 1$, $u_0'(0) = 0$ and $u_n(0) = u_n'(0) = 0$ for $n \ge 1$. Substitute this series in the differential equation, equate suitable powers of $\alpha$ and thereby determine $u_0(x)$ and $u_1(x)$.

7 SYSTEMS OF DIFFERENTIAL EQUATIONS

7.1 Introduction

Although the study of differential equations began in the 17th century, it was not until the 19th century that mathematicians realized that relatively few differential equations could be solved by elementary means. The work of Cauchy, Liouville, and others showed the importance of establishing general theorems to guarantee the existence of solutions to certain specific classes of differential equations. Chapter 6 illustrated the use of an existence-uniqueness theorem in the study of linear differential equations. This chapter is concerned with a proof of this theorem and related topics.

Existence theory for differential equations of higher order can be reduced to the first-order case by the introduction of systems of equations. For example, the second-order equation

(7.1)  $$y'' + 2t y' - y = e^t$$

can be transformed to a system of two first-order equations by introducing two unknown functions $y_1$ and $y_2$, where

$$y_1 = y, \qquad y_2 = y_1'.$$

Then we have $y_2' = y_1'' = y''$, so (7.1) can be written as a system of two first-order equations:

(7.2)  $$y_1' = y_2, \qquad y_2' = y_1 - 2t y_2 + e^t.$$
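The reduction of (7.1) to the first-order system (7.2) is what makes the equation accessible to standard numerical solvers. The sketch below is only an illustration under stated assumptions: the initial data $y(0) = 1$, $y'(0) = 0$, the step size, and the use of the classical Runge-Kutta method are all choices made here, not part of the text. It integrates (7.2) and then checks by finite differences that $y_1$ nearly satisfies the original second-order equation.

```python
import math

def rhs(t, y):
    """Right-hand side of system (7.2): y1' = y2, y2' = y1 - 2 t y2 + e^t."""
    y1, y2 = y
    return (y2, y1 - 2.0 * t * y2 + math.exp(t))

def rk4(f, t0, y0, h, steps):
    """Classical fourth-order Runge-Kutta; returns the list of (t, y) pairs."""
    path = [(t0, tuple(y0))]
    for i in range(steps):
        t, y = path[-1]
        k1 = f(t, y)
        k2 = f(t + h / 2, tuple(a + h / 2 * k for a, k in zip(y, k1)))
        k3 = f(t + h / 2, tuple(a + h / 2 * k for a, k in zip(y, k2)))
        k4 = f(t + h, tuple(a + h * k for a, k in zip(y, k3)))
        y_next = tuple(a + h / 6 * (p + 2 * q + 2 * r + s)
                       for a, p, q, r, s in zip(y, k1, k2, k3, k4))
        path.append((t0 + (i + 1) * h, y_next))
    return path

h = 1e-3
path = rk4(rhs, 0.0, (1.0, 0.0), h, 1000)   # assumed initial data y(0) = 1, y'(0) = 0

# Residual of y'' + 2 t y' - y - e^t at t = 0.5, with y'' from a second difference of y1
i = 500
t, (y1, y2) = path[i]
second_diff = (path[i + 1][1][0] - 2 * y1 + path[i - 1][1][0]) / h ** 2
residual = second_diff + 2 * t * y2 - y1 - math.exp(t)
print(residual)
```

The residual is small but not zero, since the second difference only approximates $y''$.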
We cannot solve the equations separately by the methods of Chapter 6 because each of them involves two unknown functions.

In this chapter we consider systems consisting of $n$ linear differential equations of first order involving $n$ unknown functions $y_1, \ldots, y_n$. These systems have the form

(7.3)  $$\begin{aligned} y_1' &= p_{11}(t)y_1 + p_{12}(t)y_2 + \cdots + p_{1n}(t)y_n + q_1(t) \\ &\;\;\vdots \\ y_n' &= p_{n1}(t)y_1 + p_{n2}(t)y_2 + \cdots + p_{nn}(t)y_n + q_n(t). \end{aligned}$$

The functions $p_{ik}$ and $q_i$ which appear in (7.3) are considered as given functions defined on a given interval $J$. The functions $y_1, \ldots, y_n$ are unknown functions to be determined. Systems of this type are called first-order linear systems. In general, each equation in the system involves more than one unknown function so the equations cannot be solved separately.

A linear differential equation of order $n$ can always be transformed to a linear system. Suppose the given $n$th order equation is

(7.4)  $$y^{(n)} + a_1 y^{(n-1)} + \cdots + a_n y = R(t),$$

where the coefficients $a_i$ are given functions. To transform this to a system we write $y_1 = y$ and introduce a new unknown function for each of the successive derivatives of $y$. That is, we put

$$y_1 = y, \quad y_2 = y_1', \quad y_3 = y_2', \quad \ldots, \quad y_n = y_{n-1}',$$

and rewrite (7.4) as the system

(7.5)  $$\begin{aligned} y_1' &= y_2 \\ y_2' &= y_3 \\ &\;\;\vdots \\ y_{n-1}' &= y_n \\ y_n' &= -a_n y_1 - a_{n-1} y_2 - \cdots - a_1 y_n + R(t). \end{aligned}$$

The discussion of systems may be simplified considerably by the use of vector and matrix notation. Consider the general system (7.3) and introduce vector-valued functions $Y = (y_1, \ldots, y_n)$, $Q = (q_1, \ldots, q_n)$, and a matrix-valued function $P = [p_{ij}]$, defined by the equations

$$Y(t) = (y_1(t), \ldots, y_n(t)), \qquad Q(t) = (q_1(t), \ldots, q_n(t)), \qquad P(t) = [p_{ij}(t)]$$

for each $t$ in $J$. We regard the vectors as $n \times 1$ column matrices and write the system (7.3) in the simpler form

(7.6)  $$Y' = P(t)\,Y + Q(t).$$

For example, in system (7.2) we have

$$Y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}, \qquad P(t) = \begin{bmatrix} 0 & 1 \\ 1 & -2t \end{bmatrix}, \qquad Q(t) = \begin{bmatrix} 0 \\ e^t \end{bmatrix}.$$

In system (7.5) we have

$$Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \qquad P(t) = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -a_n & -a_{n-1} & -a_{n-2} & \cdots & -a_1 \end{bmatrix}, \qquad Q(t) = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ R(t) \end{bmatrix}.$$

An initial-value problem for system (7.6) is to find a vector-valued function $Y$ which satisfies (7.6) and which also
satisfies an initial condition of the form $Y(a) = B$, where $a \in J$ and $B = (b_1, \ldots, b_n)$ is a given $n$-dimensional vector.

In the case $n = 1$ (the scalar case) we know from Theorem 6.1 that, if $P$ and $Q$ are continuous on $J$, all solutions of (7.6) are given by the explicit formula

(7.7)  $$Y(x) = e^{A(x)} Y(a) + e^{A(x)} \int_a^x e^{-A(t)} Q(t)\,dt,$$

where $A(x) = \int_a^x P(t)\,dt$, and $a$ is any point in $J$. We will show that this formula can be suitably generalized for systems, that is, when $P(t)$ is an $n \times n$ matrix function and $Q(t)$ is an $n$-dimensional vector function. To do this we must assign a meaning to integrals of matrices and to exponentials of matrices. Therefore, we digress briefly to discuss the calculus of matrix functions.

7.2 Calculus of matrix functions

The generalization of the concepts of integral and derivative for matrix functions is straightforward. If $P(t) = [p_{ij}(t)]$, we define the integral $\int_a^b P(t)\,dt$ by the equation

$$\int_a^b P(t)\,dt = \left[\int_a^b p_{ij}(t)\,dt\right].$$

That is, the integral of matrix $P(t)$ is the matrix obtained by integrating each entry of $P(t)$, assuming of course, that each entry is integrable on $[a, b]$. The reader can verify that the linearity property for integrals generalizes to matrix functions.

Continuity and differentiability of matrix functions are also defined in terms of the entries. We say that a matrix function $P = [p_{ij}]$ is continuous at $t$ if each entry $p_{ij}$ is continuous at $t$. The derivative $P'$ is defined by differentiating each entry,

$$P'(t) = [p_{ij}'(t)],$$

whenever all derivatives $p_{ij}'(t)$ exist. It is easy to verify the basic differentiation rules for sums and products. For example, if $P$ and $Q$ are differentiable matrix functions, we have

$$(P + Q)' = P' + Q'$$

if $P$ and $Q$ are of the same size, and we also have

$$(PQ)' = PQ' + P'Q$$

if the product $PQ$ is defined. The chain rule also holds. That is, if $F(t) = P[g(t)]$, where $P$ is a differentiable matrix function and $g$ is a differentiable scalar function, then $F'(t) = g'(t)\,P'[g(t)]$.
The zero-derivative theorem, and the first and second fundamental theorems of calculus are also valid for matrix functions. Proofs of these properties are requested in the next set of exercises.

The definition of the exponential of a matrix is not so simple and requires further preparation. This is discussed in the next section.

7.3 Infinite series of matrices. Norms of matrices

Let $A = [a_{ij}]$ be an $n \times n$ matrix of real or complex entries. We wish to define the exponential $e^A$ in such a way that it possesses some of the fundamental properties of the ordinary real or complex-valued exponential. In particular, we shall require the law of exponents in the form

(7.8)  $$e^{tA} e^{sA} = e^{(t+s)A}$$

for all real $s$ and $t$, and the relation

(7.9)  $$e^{0} = I,$$

where $0$ and $I$ are the $n \times n$ zero and identity matrices, respectively. It might seem natural to define $e^A$ to be the matrix $[e^{a_{ij}}]$. However, this is unacceptable since it satisfies neither of properties (7.8) or (7.9). Instead, we shall define $e^A$ by means of a power series expansion,

$$e^A = \sum_{k=0}^{\infty} \frac{A^k}{k!}.$$

We know that this formula holds if $A$ is a real or complex number, and we will prove that it implies properties (7.8) and (7.9) if $A$ is a matrix. Before we can do this we need to explain what is meant by a convergent series of matrices.

DEFINITION OF CONVERGENT SERIES OF MATRICES. Given an infinite sequence of $m \times n$ matrices $\{C_k\}$ whose entries are real or complex numbers, denote the $ij$-entry of $C_k$ by $c_{ij}^{(k)}$. If all $mn$ series

(7.10)  $$\sum_{k=1}^{\infty} c_{ij}^{(k)} \qquad (i = 1, \ldots, m;\; j = 1, \ldots, n)$$

are convergent, then we say the series of matrices $\sum_{k=1}^{\infty} C_k$ is convergent, and its sum is defined to be the $m \times n$ matrix whose $ij$-entry is the series in (7.10).

A simple and useful test for convergence of a series of matrices can be given in terms of the norm of a matrix, a generalization of the absolute value of a number.
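The contrast between the entrywise exponential and the power series definition can be seen in a few lines. The sketch below is an illustration only: the helper names `mat_mul`, `mat_exp`, and `entrywise_exp` are hypothetical, and the series is simply truncated after 30 terms rather than summed with any convergence test.

```python
import math

def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def mat_exp(A, terms=30):
    """Truncated power series I + A + A^2/2! + ... for a square matrix A."""
    n = len(A)
    total = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]   # term = A^k / k!
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

def entrywise_exp(A):
    """The rejected candidate [e^{a_ij}]."""
    return [[math.exp(x) for x in row] for row in A]

zero = [[0.0, 0.0], [0.0, 0.0]]
print(entrywise_exp(zero))   # every entry is e^0 = 1, so property (7.9) fails
print(mat_exp(zero))         # the series gives the identity matrix

# Law of exponents (7.8) for the series definition, with A = [[0, 1], [0, 0]]
t, s = 0.5, 0.25
tA, sA, uA = [[0.0, t], [0.0, 0.0]], [[0.0, s], [0.0, 0.0]], [[0.0, t + s], [0.0, 0.0]]
print(mat_mul(mat_exp(tA), mat_exp(sA)), mat_exp(uA))
```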
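DEFINITION OF NORM OF A MATRIX. If $A = [a_{ij}]$ is an $m \times n$ matrix of real or complex entries, the norm of $A$, denoted by $\|A\|$, is defined to be the nonnegative number given by the formula

(7.11)  $$\|A\| = \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|.$$

In other words, the norm of $A$ is the sum of the absolute values of all its entries. There are other definitions of norms that are sometimes used, but we have chosen this one because of the ease with which we can prove the following properties.

THEOREM 7.1. FUNDAMENTAL PROPERTIES OF NORMS. For rectangular matrices $A$ and $B$, and all real or complex scalars $c$ we have $\|A + B\|$ . . .

. . . $(CDC^{-1})(CDC^{-1}) = CD^2C^{-1}$, and, more generally, $A^k = CD^kC^{-1}$. Therefore in this case we have

$$e^{tA} = C e^{tD} C^{-1}.$$

Here the difficulty lies in determining $C$ and its inverse. Once these are known, $e^{tA}$ is easily calculated. Of course, not every matrix can be diagonalized so the usefulness of the foregoing remarks is limited.

EXAMPLE 1. Calculate $e^{tA}$ for the $2 \times 2$ matrix $A = \begin{bmatrix} 5 & 4 \\ 1 & 2 \end{bmatrix}$.

Solution. This matrix has distinct eigenvalues $\lambda_1 = 6$, $\lambda_2 = 1$, so there is a nonsingular matrix $C = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ such that $C^{-1}AC = D$, where $D = \operatorname{diag}(\lambda_1, \lambda_2) = \begin{bmatrix} 6 & 0 \\ 0 & 1 \end{bmatrix}$. To determine $C$ we can write $AC = CD$, or

$$\begin{bmatrix} 5 & 4 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} 6 & 0 \\ 0 & 1 \end{bmatrix}.$$

Multiplying the matrices, we find that this equation is satisfied for any scalars $a, b, c, d$ with $a = 4c$, $b = -d$. Taking $c = d = 1$ we choose

$$C = \begin{bmatrix} 4 & -1 \\ 1 & 1 \end{bmatrix}, \qquad C^{-1} = \frac{1}{5}\begin{bmatrix} 1 & 1 \\ -1 & 4 \end{bmatrix}.$$

Therefore

$$e^{tA} = C e^{tD} C^{-1} = \frac{1}{5}\begin{bmatrix} 4 & -1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} e^{6t} & 0 \\ 0 & e^{t} \end{bmatrix} \begin{bmatrix} 1 & 1 \\ -1 & 4 \end{bmatrix} = \frac{1}{5}\begin{bmatrix} 4e^{6t} + e^{t} & 4e^{6t} - 4e^{t} \\ e^{6t} - e^{t} & e^{6t} + 4e^{t} \end{bmatrix}.$$

EXAMPLE 2. Solve the linear system

$$y_1' = 5y_1 + 4y_2, \qquad y_2' = y_1 + 2y_2,$$

subject to the initial conditions $y_1(0) = 2$, $y_2(0) = 3$.

Solution. In matrix form the system can be written as

$$Y'(t) = A\,Y(t), \qquad Y(0) = \begin{bmatrix} 2 \\ 3 \end{bmatrix}, \qquad\text{where } A = \begin{bmatrix} 5 & 4 \\ 1 & 2 \end{bmatrix}.$$

By Theorem 7.7 the solution is $Y(t) = e^{tA}\,Y(0)$. Using the matrix $e^{tA}$ calculated in Example 1 we find

$$\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \frac{1}{5}\begin{bmatrix} 4e^{6t} + e^{t} & 4e^{6t} - 4e^{t} \\ e^{6t} - e^{t} & e^{6t} + 4e^{t} \end{bmatrix} \begin{bmatrix} 2 \\ 3 \end{bmatrix},$$

from which we obtain

$$y_1 = 4e^{6t} - 2e^{t}, \qquad y_2 = e^{6t} + 2e^{t}.$$

There are many methods known for calculating $e^{tA}$ when $A$ cannot be diagonalized.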
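The closed form obtained in Example 1 can be checked against a direct summation of the exponential series. The following is a rough sketch, not a recommended numerical method: the helper names are hypothetical, the series is truncated at 80 terms, and the initial vector $(2, 3)$ for Example 2 is the one consistent with the solution stated there.

```python
import math

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def mat_exp(A, terms=80):
    """Truncated power series for e^A."""
    n = len(A)
    total = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in total]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total

def closed_form(t):
    """e^{tA} for A = [[5, 4], [1, 2]] as computed by diagonalization in Example 1."""
    e6, e1 = math.exp(6 * t), math.exp(t)
    return [[(4 * e6 + e1) / 5, (4 * e6 - 4 * e1) / 5],
            [(e6 - e1) / 5, (e6 + 4 * e1) / 5]]

t = 0.5
A = [[5.0, 4.0], [1.0, 2.0]]
series = mat_exp([[t * x for x in row] for row in A])
print(series)
print(closed_form(t))

# Example 2: Y(t) = e^{tA} Y(0) with Y(0) = (2, 3)
y = [row[0] * 2 + row[1] * 3 for row in series]
print(y, [4 * math.exp(6 * t) - 2 * math.exp(t), math.exp(6 * t) + 2 * math.exp(t)])
```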
Most of these methods are rather complicated and require preliminary matrix transformations, the nature of which depends on the multiplicities of the eigenvalues of $A$. In a later section we shall discuss a practical and straightforward method for calculating $e^{tA}$ which can be used whether or not $A$ can be diagonalized. It is valid for all matrices $A$ and requires no preliminary transformations of any kind. This method was developed by E. J. Putzer in a paper in the American Mathematical Monthly, Vol. 73 (1966), pp. 2-7. It is based on a famous theorem attributed to Arthur Cayley (1821-1895) and William Rowan Hamilton (1805-1865) which states that every square matrix satisfies its characteristic equation. First we shall prove the Cayley-Hamilton theorem and then we shall use it to obtain Putzer's formulas for calculating $e^{tA}$.

7.11 The Cayley-Hamilton theorem

THEOREM 7.8. CAYLEY-HAMILTON THEOREM. Let $A$ be an $n \times n$ matrix and let

(7.19)  $$f(\lambda) = \det(\lambda I - A) = \lambda^n + c_{n-1}\lambda^{n-1} + \cdots + c_1\lambda + c_0$$

be its characteristic polynomial. Then $f(A) = 0$. In other words, $A$ satisfies the equation

(7.20)  $$A^n + c_{n-1}A^{n-1} + \cdots + c_1 A + c_0 I = 0.$$

Proof. The proof is based on Theorem 3.12 which states that for any square matrix $A$ we have

(7.21)  $$A\,(\operatorname{cof} A)^t = (\det A)\,I.$$

We apply this formula with $A$ replaced by $\lambda I - A$. Since $\det(\lambda I - A) = f(\lambda)$, Equation (7.21) becomes

(7.22)  $$(\lambda I - A)\{\operatorname{cof}(\lambda I - A)\}^t = f(\lambda)\,I.$$

This equation is valid for all real $\lambda$. The idea of the proof is to show that it is also valid when $\lambda$ is replaced by $A$.

The entries of the matrix $\operatorname{cof}(\lambda I - A)$ are the cofactors of $\lambda I - A$. Except for a factor $\pm 1$, each such cofactor is the determinant of a minor of $\lambda I - A$ of order $n - 1$. Therefore each entry of $\operatorname{cof}(\lambda I - A)$, and hence of $\{\operatorname{cof}(\lambda I - A)\}^t$, is a polynomial in $\lambda$ of degree $\le n - 1$. Therefore

$$\{\operatorname{cof}(\lambda I - A)\}^t = \sum_{k=0}^{n-1} \lambda^k B_k,$$

where each coefficient $B_k$ is an $n \times n$ matrix with scalar entries. Using this in (7.22) we obtain the relation

(7.23)  $$(\lambda I - A) \sum_{k=0}^{n-1} \lambda^k B_k = f(\lambda)\,I$$

which can be rewritten in the form

(7.24)  $$\lambda^n B_{n-1} + \sum_{k=1}^{n-1} \lambda^k (B_{k-1} - AB_k) - AB_0 = \lambda^n I + \sum_{k=1}^{n-1} \lambda^k c_k I + c_0 I.$$
At this stage we equate coefficients of like powers of $\lambda$ in (7.24) to obtain the equations

(7.25)  $$\begin{aligned} B_{n-1} &= I \\ B_{n-2} - AB_{n-1} &= c_{n-1} I \\ &\;\;\vdots \\ B_0 - AB_1 &= c_1 I \\ -AB_0 &= c_0 I. \end{aligned}$$

Equating coefficients is permissible because (7.24) is equivalent to $n^2$ scalar equations, in each of which we may equate coefficients of like powers of $\lambda$. Now we multiply the equations in (7.25) in succession by $A^n, A^{n-1}, \ldots, A, I$ and add the results. The terms on the left cancel and we obtain

$$0 = A^n + c_{n-1}A^{n-1} + \cdots + c_1 A + c_0 I.$$

This proves the Cayley-Hamilton theorem.

Note: Hamilton proved the theorem in 1853 for a special class of matrices. A few years later, Cayley announced that the theorem is true for all matrices, but gave no proof.

EXAMPLE. The matrix $A = \begin{bmatrix} 5 & 4 & 0 \\ 1 & 2 & 0 \\ 1 & 2 & 2 \end{bmatrix}$ has characteristic polynomial

$$f(\lambda) = (\lambda - 1)(\lambda - 2)(\lambda - 6) = \lambda^3 - 9\lambda^2 + 20\lambda - 12.$$

The Cayley-Hamilton theorem states that $A$ satisfies the equation

(7.26)  $$A^3 - 9A^2 + 20A - 12I = 0.$$

This equation can be used to express $A^3$ and all higher powers of $A$ in terms of $I$, $A$, and $A^2$. For example, we have

$$A^3 = 9A^2 - 20A + 12I,$$

$$A^4 = 9A^3 - 20A^2 + 12A = 9(9A^2 - 20A + 12I) - 20A^2 + 12A = 61A^2 - 168A + 108I.$$

It can also be used to express $A^{-1}$ as a polynomial in $A$. From (7.26) we write $A(A^2 - 9A + 20I) = 12I$, and we obtain

$$A^{-1} = \tfrac{1}{12}(A^2 - 9A + 20I).$$

7.12 Exercises

In each of Exercises 1 through 4, (a) express $A^{-1}$, $A^2$ and all higher powers of $A$ as a linear combination of $I$ and $A$. (The Cayley-Hamilton theorem can be of help.) (b) Calculate $e^{tA}$.

1. $A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$.  2. $A = \begin{bmatrix} 1 & 0 \\ 1 & 2 \end{bmatrix}$.  3. $A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$.  4. $A = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}$.

5. (a) If $A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$, prove that $e^{tA} = \begin{bmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{bmatrix}$.
(b) Find a corresponding formula for $e^{tA}$ when $A = \begin{bmatrix} a & b \\ -b & a \end{bmatrix}$, $a$, $b$ real.

6. If $F(t) = \begin{bmatrix} t & t - 1 \\ 0 & 1 \end{bmatrix}$, prove that $e^{F(t)} = e\,F(e^{t-1})$.

7. If $A(t)$ is a scalar function of $t$, the derivative of $e^{A(t)}$ is $e^{A(t)}A'(t)$. Compute the derivative of $e^{A(t)}$ when $A(t) = \begin{bmatrix} 1 & t \\ 0 & 0 \end{bmatrix}$ and show that the result is not equal to either of the two products $e^{A(t)}A'(t)$ or $A'(t)e^{A(t)}$.

In each of Exercises 8, 9, 10, (a) calculate $A^n$, and express $A^3$ in terms of $I$, $A$, $A^2$. (b) Calculate $e^{tA}$.

8. $A = \begin{bmatrix} 0 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$.  9. $A = \begin{bmatrix} 0 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 0 \end{bmatrix}$.  10. $A = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}$.

11. If $A = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}$, express $e^{tA}$ as a linear combination of $I$, $A$, $A^2$.

12. If $A = \begin{bmatrix} 0 & 1 & 0 \\ 2 & 0 & 2 \\ 0 & 1 & 0 \end{bmatrix}$, prove that $e^{tA} = \begin{bmatrix} x^2 & xy & y^2 \\ 2xy & x^2 + y^2 & 2xy \\ y^2 & xy & x^2 \end{bmatrix}$, where $x = \cosh t$ and $y = \sinh t$.

13. This example shows that the equation $e^{A+B} = e^A e^B$ is not always true for matrix exponentials. Compute each of the matrices $e^A e^B$, $e^B e^A$, $e^{A+B}$ when $A = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}$ and $B = \begin{bmatrix} 1 & -1 \\ 0 & 0 \end{bmatrix}$, and note that the three results are distinct.

7.13 Putzer's method for calculating $e^{tA}$

The Cayley-Hamilton theorem shows that the $n$th power of any $n \times n$ matrix $A$ can be expressed as a linear combination of the lower powers $I, A, A^2, \ldots, A^{n-1}$. It follows that each of the higher powers $A^{n+1}, A^{n+2}, \ldots$, can also be expressed as a linear combination of $I, A, A^2, \ldots, A^{n-1}$. Therefore, in the infinite series defining $e^{tA}$, each term $t^kA^k/k!$ with $k \ge n$ is a linear combination of $t^kI, t^kA, t^kA^2, \ldots, t^kA^{n-1}$. Hence we can expect that $e^{tA}$ should be expressible as a polynomial in $A$ of the form

(7.27)  $$e^{tA} = \sum_{k=0}^{n-1} q_k(t) A^k,$$

where the scalar coefficients $q_k(t)$ depend on $t$. Putzer developed two useful methods for expressing $e^{tA}$ as a polynomial in $A$. The next theorem describes the simpler of the two methods.

THEOREM 7.9. Let $\lambda_1, \ldots, \lambda_n$ be the eigenvalues of an $n \times n$ matrix $A$, and define a sequence of polynomials in $A$ as follows:

(7.28)  $$P_0(A) = I, \qquad P_k(A) = \prod_{m=1}^{k} (A - \lambda_m I), \quad\text{for } k = 1, 2, \ldots, n.$$

Then we have

(7.29)  $$e^{tA} = \sum_{k=0}^{n-1} r_{k+1}(t)\,P_k(A),$$

where the scalar coefficients $r_1(t), \ldots, r_n(t)$ are determined recursively from the system of linear differential equations

(7.30)  $$\begin{aligned} r_1'(t) &= \lambda_1 r_1(t), & r_1(0) &= 1, \\ r_{k+1}'(t) &= \lambda_{k+1} r_{k+1}(t) + r_k(t), & r_{k+1}(0) &= 0, \qquad (k = 1, 2, \ldots, n - 1). \end{aligned}$$

Note: Equation (7.29) does not express $e^{tA}$ directly in powers of $A$ as indicated in (7.27), but as a linear combination of the polynomials $P_0(A), P_1(A), \ldots, P_{n-1}(A)$. These polynomials are easily calculated once the eigenvalues of $A$ are determined. Also the multipliers $r_1(t), \ldots, r_n(t)$ in (7.30) are easily calculated.
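Putzer's formula (7.29) translates almost line for line into a short program. The sketch below makes several assumptions that are not part of the theorem: the eigenvalues are supplied by the caller and are real, the triangular system (7.30) is integrated numerically with the classical Runge-Kutta method instead of being solved in closed form, and the names `putzer_exp` and `mat_mul` are hypothetical.

```python
import math

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def putzer_exp(A, eigenvalues, t, steps=2000):
    """Approximate e^{tA} as the sum of r_{k+1}(t) P_k(A), following (7.28)-(7.30)."""
    n = len(A)
    lam = list(eigenvalues)

    def deriv(r):
        # System (7.30): r_1' = lam_1 r_1,  r_{k+1}' = lam_{k+1} r_{k+1} + r_k
        return [lam[0] * r[0]] + [lam[k] * r[k] + r[k - 1] for k in range(1, n)]

    r = [1.0] + [0.0] * (n - 1)            # r_1(0) = 1, r_{k+1}(0) = 0
    h = t / steps
    for _ in range(steps):                 # classical RK4 (an assumed choice of solver)
        k1 = deriv(r)
        k2 = deriv([a + h / 2 * b for a, b in zip(r, k1)])
        k3 = deriv([a + h / 2 * b for a, b in zip(r, k2)])
        k4 = deriv([a + h * b for a, b in zip(r, k3)])
        r = [a + h / 6 * (p + 2 * q + 2 * u + v)
             for a, p, q, u, v in zip(r, k1, k2, k3, k4)]

    identity = [[float(i == j) for j in range(n)] for i in range(n)]
    P = identity                           # P_0(A) = I
    total = [[0.0] * n for _ in range(n)]
    for k in range(n):
        total = [[total[i][j] + r[k] * P[i][j] for j in range(n)] for i in range(n)]
        shifted = [[A[i][j] - lam[k] * identity[i][j] for j in range(n)] for i in range(n)]
        P = mat_mul(shifted, P)            # P_{k+1}(A) = (A - lam_{k+1} I) P_k(A)
    return total

# A diagonalizable matrix (eigenvalues 6 and 1) and a non-diagonalizable one (eigenvalue 1 twice)
print(putzer_exp([[5.0, 4.0], [1.0, 2.0]], [6.0, 1.0], 0.5))
print(putzer_exp([[1.0, 1.0], [0.0, 1.0]], [1.0, 1.0], 1.0), math.e)
```

The second call illustrates the point of the method: no diagonalization is attempted, so a repeated eigenvalue with a single eigenvector causes no difficulty.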
Although this requires solving a system of linear differential equations, this particular system has a triangular matrix and the solutions can be determined in succession.

Proof. Let $r_1(t), \ldots, r_n(t)$ be the scalar functions determined by (7.30) and define a matrix function $F$ by the equation

(7.31)  $$F(t) = \sum_{k=0}^{n-1} r_{k+1}(t)\,P_k(A).$$

Note that $F(0) = r_1(0)P_0(A) = I$. We will prove that $F(t) = e^{tA}$ by showing that $F$ satisfies the same differential equation as $e^{tA}$, namely, $F'(t) = AF(t)$.

Differentiating (7.31) and using the recursion formulas (7.30) we obtain

$$F'(t) = \sum_{k=0}^{n-1} r_{k+1}'(t)\,P_k(A) = \sum_{k=0}^{n-1} \{r_k(t) + \lambda_{k+1} r_{k+1}(t)\}\,P_k(A),$$

where $r_0(t)$ is defined to be $0$. We rewrite this in the form

$$F'(t) = \sum_{k=0}^{n-2} r_{k+1}(t)\,P_{k+1}(A) + \sum_{k=0}^{n-1} \lambda_{k+1} r_{k+1}(t)\,P_k(A),$$

then subtract $\lambda_n F(t) = \sum_{k=0}^{n-1} \lambda_n r_{k+1}(t)P_k(A)$ to obtain the relation

(7.32)  $$F'(t) - \lambda_n F(t) = \sum_{k=0}^{n-2} r_{k+1}(t)\{P_{k+1}(A) + (\lambda_{k+1} - \lambda_n)P_k(A)\}.$$

But from (7.28) we see that $P_{k+1}(A) = (A - \lambda_{k+1}I)P_k(A)$, so

$$P_{k+1}(A) + (\lambda_{k+1} - \lambda_n)P_k(A) = (A - \lambda_{k+1}I)P_k(A) + (\lambda_{k+1} - \lambda_n)P_k(A) = (A - \lambda_n I)P_k(A).$$

Therefore Equation (7.32) becomes

$$F'(t) - \lambda_n F(t) = (A - \lambda_n I)\sum_{k=0}^{n-2} r_{k+1}(t)P_k(A) = (A - \lambda_n I)\{F(t) - r_n(t)P_{n-1}(A)\} = (A - \lambda_n I)F(t) - r_n(t)P_n(A).$$

The Cayley-Hamilton theorem implies that $P_n(A) = 0$, so the last equation becomes

$$F'(t) - \lambda_n F(t) = (A - \lambda_n I)F(t) = AF(t) - \lambda_n F(t),$$

from which we find $F'(t) = AF(t)$. Since $F(0) = I$, the uniqueness theorem (Theorem 7.7) shows that $F(t) = e^{tA}$.

EXAMPLE 1. Express $e^{tA}$ as a linear combination of $I$ and $A$ if $A$ is a $2 \times 2$ matrix with both its eigenvalues equal to $\lambda$.

Solution. Writing $\lambda_1 = \lambda_2 = \lambda$, we are to solve the system of differential equations

$$\begin{aligned} r_1'(t) &= \lambda r_1(t), & r_1(0) &= 1, \\ r_2'(t) &= \lambda r_2(t) + r_1(t), & r_2(0) &= 0. \end{aligned}$$

Solving these first-order equations in succession we find

$$r_1(t) = e^{\lambda t}, \qquad r_2(t) = t e^{\lambda t}.$$

Since $P_0(A) = I$ and $P_1(A) = A - \lambda I$, the required formula for $e^{tA}$ is

(7.33)  $$e^{tA} = e^{\lambda t} I + t e^{\lambda t}(A - \lambda I) = e^{\lambda t}(1 - \lambda t) I + t e^{\lambda t} A.$$

EXAMPLE 2. Solve Example 1 if the eigenvalues of $A$ are $\lambda$ and $\mu$, where $\lambda \ne \mu$.

Solution. In this case the system of differential equations is

$$\begin{aligned} r_1'(t) &= \lambda r_1(t), & r_1(0) &= 1, \\ r_2'(t) &= \mu r_2(t) + r_1(t), & r_2(0) &= 0. \end{aligned}$$
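Its solutions are given by

$$r_1(t) = e^{\lambda t}, \qquad r_2(t) = \frac{e^{\lambda t} - e^{\mu t}}{\lambda - \mu}.$$

Since $P_0(A) = I$ and $P_1(A) = A - \lambda I$ the required formula for $e^{tA}$ is

(7.34)  $$e^{tA} = e^{\lambda t} I + \frac{e^{\lambda t} - e^{\mu t}}{\lambda - \mu}(A - \lambda I) = \frac{\lambda e^{\mu t} - \mu e^{\lambda t}}{\lambda - \mu}\,I + \frac{e^{\lambda t} - e^{\mu t}}{\lambda - \mu}\,A.$$

If the eigenvalues $\lambda$, $\mu$ are complex numbers, the exponentials $e^{\lambda t}$ and $e^{\mu t}$ will also be complex numbers. But if $\lambda$ and $\mu$ are complex conjugates, the scalars multiplying $I$ and $A$ in (7.34) will be real. For example, suppose

$$\lambda = \alpha + i\beta, \qquad \mu = \alpha - i\beta, \qquad \beta \ne 0.$$

Then $\lambda - \mu = 2i\beta$ so Equation (7.34) becomes

$$\begin{aligned} e^{tA} &= e^{(\alpha + i\beta)t} I + \frac{e^{(\alpha + i\beta)t} - e^{(\alpha - i\beta)t}}{2i\beta}\,[A - (\alpha + i\beta)I] \\ &= e^{\alpha t}\left\{e^{i\beta t} I + \frac{e^{i\beta t} - e^{-i\beta t}}{2i\beta}(A - \alpha I - i\beta I)\right\} \\ &= e^{\alpha t}\left\{(\cos \beta t + i \sin \beta t) I + \frac{\sin \beta t}{\beta}(A - \alpha I - i\beta I)\right\}. \end{aligned}$$

The terms involving $i$ cancel and we get

(7.35)  $$e^{tA} = \frac{e^{\alpha t}}{\beta}\{(\beta \cos \beta t - \alpha \sin \beta t) I + \sin \beta t\, A\}.$$

7.14 Alternate methods for calculating $e^{tA}$ in special cases

Putzer's method for expressing $e^{tA}$ as a polynomial in $A$ is completely general because it is valid for all square matrices $A$. A general method is not always the simplest method to use in certain special cases. In this section we give simpler methods for computing $e^{tA}$ in three special cases: (a) When all the eigenvalues of $A$ are equal, (b) when all the eigenvalues of $A$ are distinct, and (c) when $A$ has two distinct eigenvalues, exactly one of which has multiplicity 1.

THEOREM 7.10. If $A$ is an $n \times n$ matrix with all its eigenvalues equal to $\lambda$, then we have

(7.36)  $$e^{tA} = e^{\lambda t} \sum_{k=0}^{n-1} \frac{t^k}{k!}(A - \lambda I)^k.$$

Proof. Since the matrices $\lambda t I$ and $t(A - \lambda I)$ commute we have

$$e^{tA} = e^{\lambda t I} e^{t(A - \lambda I)} = (e^{\lambda t} I) \sum_{k=0}^{\infty} \frac{t^k}{k!}(A - \lambda I)^k.$$

The Cayley-Hamilton theorem implies that $(A - \lambda I)^k = 0$ for every $k \ge n$, so the theorem is proved.

THEOREM 7.11. If $A$ is an $n \times n$ matrix with $n$ distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$, then we have

$$e^{tA} = \sum_{k=1}^{n} e^{t\lambda_k} L_k(A),$$

where $L_k(A)$ is a polynomial in $A$ of degree $n - 1$ given by the formula

$$L_k(A) = \prod_{\substack{j=1 \\ j \ne k}}^{n} \frac{A - \lambda_j I}{\lambda_k - \lambda_j} \qquad\text{for } k = 1, 2, \ldots, n.$$

Note: The polynomials $L_k(A)$ are called Lagrange interpolation coefficients.

Proof. We define a matrix function $F$ by the equation

(7.37)  $$F(t) = \sum_{k=1}^{n} e^{t\lambda_k} L_k(A)$$

and verify that $F$ satisfies the differential equation $F'(t) = AF(t)$ and the initial condition $F(0) = I$.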
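From (7.37) we see that

$$AF(t) - F'(t) = \sum_{k=1}^{n} e^{t\lambda_k}(A - \lambda_k I)L_k(A).$$

By the Cayley-Hamilton theorem we have $(A - \lambda_k I)L_k(A) = 0$ for each $k$, so $F$ satisfies the differential equation $F'(t) = AF(t)$. To complete the proof we need to show that $F$ satisfies the initial condition $F(0) = I$, which becomes

(7.38)  $$\sum_{k=1}^{n} L_k(A) = I.$$

A proof of (7.38) is outlined in Exercise 16 of Section 7.15.

The next theorem treats the case when $A$ has two distinct eigenvalues, exactly one of which has multiplicity 1.

THEOREM 7.12. Let $A$ be an $n \times n$ matrix ($n \ge 3$) with two distinct eigenvalues $\lambda$ and $\mu$, where $\lambda$ has multiplicity $n - 1$ and $\mu$ has multiplicity 1. Then we have

$$e^{tA} = e^{\lambda t} \sum_{k=0}^{n-2} \frac{t^k}{k!}(A - \lambda I)^k + \left\{\frac{e^{\mu t}}{(\mu - \lambda)^{n-1}} - \frac{e^{\lambda t}}{(\mu - \lambda)^{n-1}} \sum_{k=0}^{n-2} \frac{t^k}{k!}(\mu - \lambda)^k\right\}(A - \lambda I)^{n-1}.$$

Proof. As in the proof of Theorem 7.10 we begin by writing

$$\begin{aligned} e^{tA} &= e^{\lambda t} \sum_{k=0}^{\infty} \frac{t^k}{k!}(A - \lambda I)^k = e^{\lambda t} \sum_{k=0}^{n-2} \frac{t^k}{k!}(A - \lambda I)^k + e^{\lambda t} \sum_{k=n-1}^{\infty} \frac{t^k}{k!}(A - \lambda I)^k \\ &= e^{\lambda t} \sum_{k=0}^{n-2} \frac{t^k}{k!}(A - \lambda I)^k + e^{\lambda t} \sum_{r=0}^{\infty} \frac{t^{n-1+r}}{(n - 1 + r)!}(A - \lambda I)^{n-1+r}. \end{aligned}$$

Now we evaluate the series over $r$ in closed form by using the Cayley-Hamilton theorem. Since

$$A - \mu I = A - \lambda I - (\mu - \lambda)I$$

we find

$$(A - \lambda I)^{n-1}(A - \mu I) = (A - \lambda I)^n - (\mu - \lambda)(A - \lambda I)^{n-1}.$$

The left member is $0$ by the Cayley-Hamilton theorem so

$$(A - \lambda I)^n = (\mu - \lambda)(A - \lambda I)^{n-1}.$$

Using this relation repeatedly we find

$$(A - \lambda I)^{n-1+r} = (\mu - \lambda)^r (A - \lambda I)^{n-1}.$$

Therefore the series over $r$ becomes

$$\begin{aligned} \sum_{r=0}^{\infty} \frac{t^{n-1+r}}{(n - 1 + r)!}(\mu - \lambda)^r (A - \lambda I)^{n-1} &= \frac{1}{(\mu - \lambda)^{n-1}} \sum_{k=n-1}^{\infty} \frac{t^k}{k!}(\mu - \lambda)^k (A - \lambda I)^{n-1} \\ &= \frac{1}{(\mu - \lambda)^{n-1}} \left\{e^{t(\mu - \lambda)} - \sum_{k=0}^{n-2} \frac{t^k}{k!}(\mu - \lambda)^k\right\}(A - \lambda I)^{n-1}. \end{aligned}$$

This completes the proof.

The explicit formula in Theorem 7.12 can also be deduced by applying Putzer's method, but the details are more complicated.

The explicit formulas in Theorems 7.10, 7.11 and 7.12 cover all matrices of order $n \le 3$. Since the $3 \times 3$ case often arises in practice, the formulas in this case are listed below for easy reference.

CASE 1. If a $3 \times 3$ matrix $A$ has eigenvalues $\lambda, \lambda, \lambda$, then

$$e^{tA} = e^{\lambda t}\{I + t(A - \lambda I) + \tfrac{1}{2}t^2(A - \lambda I)^2\}.$$

CASE 2. If a $3 \times 3$ matrix $A$ has distinct eigenvalues $\lambda, \mu, \nu$, then

$$e^{tA} = e^{\lambda t}\frac{(A - \mu I)(A - \nu I)}{(\lambda - \mu)(\lambda - \nu)} + e^{\mu t}\frac{(A - \lambda I)(A - \nu I)}{(\mu - \lambda)(\mu - \nu)} + e^{\nu t}\frac{(A - \lambda I)(A - \mu I)}{(\nu - \lambda)(\nu - \mu)}.$$

CASE 3. If a $3 \times 3$ matrix $A$ has eigenvalues $\lambda, \lambda, \mu$, with $\lambda \ne \mu$, then

$$e^{tA} = e^{\lambda t}\{I + t(A - \lambda I)\} + \frac{e^{\mu t} - e^{\lambda t}}{(\mu - \lambda)^2}(A - \lambda I)^2 - \frac{t e^{\lambda t}}{\mu - \lambda}(A - \lambda I)^2.$$

EXAMPLE. Compute $e^{tA}$ when $A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 2 & -5 & 4 \end{bmatrix}$.

Solution.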
The eigenvalues of $A$ are 1, 1, 2, so the formula of Case 3 gives us

(7.39)  $$e^{tA} = e^{t}\{I + t(A - I)\} + (e^{2t} - e^{t})(A - I)^2 - t e^{t}(A - I)^2.$$

By collecting powers of $A$ we can also write this as follows:

(7.40)  $$e^{tA} = (-2te^{t} + e^{2t})I + \{(3t + 2)e^{t} - 2e^{2t}\}A - \{(t + 1)e^{t} - e^{2t}\}A^2.$$

At this stage we can calculate $(A - I)^2$ or $A^2$ and perform the indicated operations in (7.39) or (7.40) to write the result as a $3 \times 3$ matrix,

$$e^{tA} = \begin{bmatrix} -2te^{t} + e^{2t} & (3t + 2)e^{t} - 2e^{2t} & -(t + 1)e^{t} + e^{2t} \\ -2(t + 1)e^{t} + 2e^{2t} & (3t + 5)e^{t} - 4e^{2t} & -(t + 2)e^{t} + 2e^{2t} \\ -2(t + 2)e^{t} + 4e^{2t} & (3t + 8)e^{t} - 8e^{2t} & -(t + 3)e^{t} + 4e^{2t} \end{bmatrix}.$$

7.15 Exercises

For each of the matrices in Exercises 1 through 6, express $e^{tA}$ as a polynomial in $A$.

te fs -2 12 ae LAs 3.4=]0 1 3 {4 -1 2 -1] oo 1 1 6 OF Tao Seer Oana a OO eee Si Alo Oe ee 3. 00 6 ll -6 et: oo 0

7. (a) A $3 \times 3$ matrix $A$ is known to have all its eigenvalues equal to $\lambda$. Prove that

$$e^{tA} = \tfrac{1}{2}e^{\lambda t}\{(\lambda^2 t^2 - 2\lambda t + 2)I + (-2\lambda t^2 + 2t)A + t^2 A^2\}.$$

(b) Find a corresponding formula if $A$ is a $4 \times 4$ matrix with all its eigenvalues equal to $\lambda$.

In each of Exercises 8 through 15, solve the system $Y' = AY$ subject to the given initial condition.

3 \ 1 20 0 cd A 0 ¥@ =|-1 ia=lo 1 of, vo=|ea]- 10. a 2 or) Ca, eo 1 22 -3 1 10) Ol ia Ae a a) oye 0. 6 -ll -6 0 -1 -24 0 1g 0 0 0 2 | ' qo qo 1 0004 2 A= o 1 ooo 1

16. This exercise outlines a proof of Equation (7.38) used in the proof of Theorem 7.11. Let $L_k(\lambda)$ be the polynomial in $\lambda$ of degree $n - 1$ defined by the equation

$$L_k(\lambda) = \prod_{\substack{j=1 \\ j \ne k}}^{n} \frac{\lambda - \lambda_j}{\lambda_k - \lambda_j},$$

where $\lambda_1, \ldots, \lambda_n$ are $n$ distinct scalars.
(a) Prove that

$$L_k(\lambda_i) = \begin{cases} 0 & \text{if } \lambda_i \ne \lambda_k, \\ 1 & \text{if } \lambda_i = \lambda_k. \end{cases}$$

(b) Let $y_1, \ldots, y_n$ be $n$ arbitrary scalars, and let

$$p(\lambda) = \sum_{k=1}^{n} y_k L_k(\lambda).$$

Prove that $p(\lambda)$ is the only polynomial of degree . . .

(c) Compute $Y(x)$ explicitly when . . .

3. Let $A$ be an $n \times n$ constant matrix, let $B$ and $C$ be $n$-dimensional constant vectors, and let $\alpha$ be a given scalar.
(@) Prove that the nonhomogeneous system Z’() = AZ(t) + eC has a solution of the form Zit) = eB if, and only if, (af = A)B = C (b) If a is not an eigenvalue of A, prove that the vector B can always be chosen so that the system in (a) has a solution of the form Z(t) = e*B (©) If @ is not an eigenvalue of A, prove that every solution of the system Y'(t) = A YG) + e#C has the form Y@ = e'4( YO) = B) + eB, where B = (al ~ APC 4, Use the method suggested by Exercise 3 to find a solution of the nonhomogeneous system YW = AYO + eC, with Sina =I 0 a-| \. c-[ |: vo =]. a | -1 1 216 Systems of differential equations 5. Let A be an a x m constant matrix, let B and C be n-dimensional constant vectors, and let m be a positive integer. (a) Prove that the nonhomogeneous system Y°() = A YO + PC, YO) = B, has a particular solution of the form Y(f) = By + tBy + PB, +++ + Buy where By, By... +, Bm are constant vectors, if and only if 1 C=a- samp m Determine the coefficients By, By...» By for such a solution (b) If 4 is nonsingular, prove that the initial vector B can always be chosen so that the system in (a) has a solution of the specified form. 6. Consider the nonhomogeneous system N= nt te Je = 2y1 + 2yn4 Pr (a) Find a particular solution of the form Y(f) = By + 1B, + 2B, + BBs. (b) Find a solution of the system with y,(0) = y4(0) = 1 7. Let A be ann x 1 constant matrix, let B, C, D be n-dimensional constant vectors, and let a be a given nonzero real number. Prove that the nonhomogeneous system Y() = A Y() + (cos at)\C + Gin at)D, (0) = B has a particular solution of the form Y@) = (os aE + (Sin af\F, where £ and Fare constant vectors, if and only if (42 + 2B = (AC + 2D). Determine £ and Fin terms of A, B, C for such a solution, Note that if 4? + 227 is nonsingular, the initial vector 8 can always be chosen so that the system has a solution of the specified form. 8. 
(a) Find a particular solution of the nonhomogeneous system

Ye + 3yn +4 sin 2 ae a

(b) Find a solution of the system with $y_1(0) = y_2(0)$ . . .

In each of Exercises 9 through 12, solve the nonhomogeneous system $Y'(t) = AY(t) + Q(t)$ subject to the given initial condition.

a 07 1 [ |: aM = [ '} YO) = [ | 21 ~2¢' 0. 5 e es [ |: ao -[ |. ro =| | 1-3 et ca -5 -1 Tet — 27 UA= eee > YO) = [2 a) e-Loce} LT

7.18 The general linear system $Y'(t) = P(t)Y(t) + Q(t)$

Theorem 7.13 gives an explicit formula for the solution of the linear system

$$Y'(t) = A\,Y(t) + Q(t), \qquad Y(a) = B,$$

where $A$ is a constant $n \times n$ matrix and $Q(t)$, $Y(t)$ are $n \times 1$ column matrices. We turn now to the more general case

(7.45)  $$Y'(t) = P(t)\,Y(t) + Q(t), \qquad Y(a) = B,$$

where the $n \times n$ matrix $P(t)$ is not necessarily constant.

If $P$ and $Q$ are continuous on an open interval $J$, a general existence-uniqueness theorem which we shall prove in a later section tells us that for each $a$ in $J$ and each initial vector $B$ there is exactly one solution to the initial-value problem (7.45). In this section we use this result to obtain a formula for the solution, generalizing Theorem 7.13.

In the scalar case ($n = 1$) the differential equation (7.45) can be solved as follows. We let $A(x) = \int_a^x P(t)\,dt$, then multiply both members of (7.45) by $e^{-A(t)}$ to rewrite the differential equation in the form

(7.46)  $$e^{-A(t)}\{Y'(t) - P(t)Y(t)\} = e^{-A(t)}Q(t).$$

Now the left member is the derivative of the product $e^{-A(t)}Y(t)$. Therefore, we can integrate both members from $a$ to $x$, where $a$ and $x$ are points in $J$, to obtain

$$e^{-A(x)}Y(x) - e^{-A(a)}Y(a) = \int_a^x e^{-A(t)}Q(t)\,dt.$$

Multiplying by $e^{A(x)}$ we obtain the explicit formula

(7.47)  $$Y(x) = e^{A(x)}e^{-A(a)}Y(a) + e^{A(x)}\int_a^x e^{-A(t)}Q(t)\,dt.$$
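The scalar formula (7.47) can be tried out numerically. In the sketch below everything beyond the formula itself is an assumption made for illustration: the test problem $y' = 2ty + 2t$, $y(0) = 1$ (whose solution is $y = 2e^{x^2} - 1$), the composite Simpson rule used for both integrals, and the helper names.

```python
import math

def simpson(f, a, b, n=200):
    """Composite Simpson rule with n (even) subintervals."""
    if a == b:
        return 0.0
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def scalar_solution(P, Q, a, y_a, x):
    """Formula (7.47) with A(u) = integral of P from a to u, so that A(a) = 0."""
    def A(u):
        return simpson(P, a, u)
    integral = simpson(lambda t: math.exp(-A(t)) * Q(t), a, x)
    return math.exp(A(x)) * (y_a + integral)

P = lambda t: 2 * t
Q = lambda t: 2 * t
x = 1.0
print(scalar_solution(P, Q, 0.0, 1.0, x), 2 * math.exp(x ** 2) - 1)
```

Recomputing $A(t)$ by quadrature at every node is wasteful but keeps the code close to the formula; a practical implementation would tabulate $A$ once.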
The only part of this argument that does not apply immediately to matrix functions is the statement that the left-hand member of (7.46) is the derivative of the product $e^{-A(t)}Y(t)$. At this stage we used the fact that the derivative of $e^{-A(t)}$ is $-P(t)e^{-A(t)}$. In the scalar case this is a consequence of the following formula for differentiating exponential functions:

If $E(t) = e^{A(t)}$, then $E'(t) = A'(t)e^{A(t)}$.

Unfortunately, this differentiation formula is not always true when A is a matrix function. For example, it is false for certain 2 × 2 matrix functions A(t). (See Exercise 7 of Section 7.12.) Therefore a modified argument is needed to extend Equation (7.47) to the matrix case.

Suppose we multiply each member of (7.45) by an unspecified n × n matrix F(t). This gives us the relation

$F(t)Y'(t) = F(t)P(t)Y(t) + F(t)Q(t).$

Now we add $F'(t)Y(t)$ to both members in order to transform the left member to the derivative of the product F(t)Y(t). If we do this, the last equation gives us

$\{F(t)Y(t)\}' = \{F'(t) + F(t)P(t)\}Y(t) + F(t)Q(t).$

If we can choose the matrix F(t) so that the sum $\{F'(t) + F(t)P(t)\}$ on the right is the zero matrix, the last equation simplifies to

$\{F(t)Y(t)\}' = F(t)Q(t).$

Integrating this from a to x we obtain

$F(x)Y(x) - F(a)Y(a) = \int_a^x F(t)Q(t)\,dt.$

If, in addition, the matrix F(x) is nonsingular, we obtain the explicit formula

(7.48) $Y(x) = F(x)^{-1}F(a)Y(a) + F(x)^{-1}\int_a^x F(t)Q(t)\,dt.$

This is a generalization of the scalar formula (7.47). The process will work if we can find an n × n matrix function F(t) which satisfies the matrix differential equation

$F'(t) = -F(t)P(t)$

and which is nonsingular. Note that this differential equation is very much like the original differential equation (7.45) with Q(t) = 0, except that the unknown function F(t) is a square matrix instead of a column matrix. Also, the unknown function is multiplied on the right by $-P(t)$ instead of on the left by P(t).

We shall prove next that the differential equation for F always has a nonsingular solution.
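The failure of the differentiation formula $E'(t) = A'(t)e^{A(t)}$ for matrix functions can be seen numerically. The sketch below is an illustration under stated assumptions: the matrix function $A(t) = \begin{bmatrix} 1 & t \\ 0 & 0 \end{bmatrix}$ is chosen here for convenience and is not claimed to be the example printed in the text; the exponential is approximated by a truncated power series, and $E'(t)$ by a central difference. All function names are hypothetical.

```python
import math

def mat_mul(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_exp(X, terms=30):
    """Approximate e^X by the partial sum I + X + X^2/2! + ... of its power series."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        term = mat_mul(term, X)
        term = [[entry / k for entry in row] for row in term]  # now X^k / k!
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

def A(t):
    """Illustrative matrix function (an assumption, not taken from the text)."""
    return [[1.0, t], [0.0, 0.0]]

A_PRIME = [[0.0, 1.0], [0.0, 0.0]]  # derivative of A(t), independent of t

def E_prime(t, h=1e-5):
    """Central-difference approximation to the derivative of E(t) = e^{A(t)}."""
    plus, minus = mat_exp(A(t + h)), mat_exp(A(t - h))
    return [[(plus[i][j] - minus[i][j]) / (2 * h) for j in range(2)]
            for i in range(2)]

t0 = 0.5
lhs = E_prime(t0)                       # E'(t0)
rhs = mat_mul(A_PRIME, mat_exp(A(t0)))  # A'(t0) E(t0)
print(lhs[0][1], rhs[0][1])
```

For this choice $A(t)^2 = A(t)$, so $e^{A(t)} = I + (e - 1)A(t)$; the upper-right entry of $E'(t)$ is $e - 1$, while that of $A'(t)E(t)$ is 1, and the two printed numbers differ accordingly.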
The proof will depend on the following existence theorem for homogeneous linear systems.

THEOREM 7.14. Assume A(t) is an n × n matrix function continuous on an open interval J. If a ∈ J and if B is a given n-dimensional vector, the homogeneous linear system

$Y'(t) = A(t)Y(t), \qquad Y(a) = B,$

has an n-dimensional vector solution Y on J.

A proof of Theorem 7.14 appears in Section 7.21. With the help of this theorem we can prove the following.

THEOREM 7.15. Given an n × n matrix function P, continuous on an open interval J, and given any point a in J, there exists an n × n matrix function F which satisfies the matrix differential equation

(7.49) $F'(x) = -F(x)P(x)$

on J with initial value F(a) = I. Moreover, F(x) is nonsingular for each x in J.

Proof. Let $Y_k(x)$ be a vector solution of the differential equation

$Y_k'(x) = -P(x)^t\,Y_k(x)$

on J with initial vector $Y_k(a) = I_k$, where $I_k$ is the kth column of the n × n identity matrix I. Here $P(x)^t$ denotes the transpose of P(x). Let G(x) be the n × n matrix whose kth column is $Y_k(x)$. Then G satisfies the matrix differential equation

(7.50) $G'(x) = -P(x)^t\,G(x)$

on J with initial value G(a) = I. Now take the transpose of each member of (7.50). Since the transpose of a product is the product of transposes in reverse order, we obtain

$\{G'(x)\}^t = -G(x)^t\,P(x).$

Also, the transpose of the derivative G' is the derivative of the transpose $G^t$. Therefore the matrix $F(x) = G(x)^t$ satisfies the differential equation (7.49) with initial value F(a) = I.

Now we prove that F(x) is nonsingular by exhibiting its inverse. Let H be the n × n matrix function whose kth column is the solution of the differential equation

$Y'(x) = P(x)Y(x)$

with initial vector $Y(a) = I_k$, the kth column of I. Then H satisfies the initial-value problem

$H'(x) = P(x)H(x), \qquad H(a) = I,$

on J. The product F(x)H(x) has derivative

$F(x)H'(x) + F'(x)H(x) = F(x)P(x)H(x) - F(x)P(x)H(x) = 0$

for each x in J.
Therefore the product F(x)H(x) is constant, F(x)H(x) = F(a)H(a) = I, so H(x) is the inverse of F(x). This completes the proof.

The results of this section are summarized in the following theorem.

THEOREM 7.16. Given an n × n matrix function P and an n-dimensional vector function Q, both continuous on an open interval J, the solution of the initial-value problem

(7.51) $Y'(x) = P(x)Y(x) + Q(x), \qquad Y(a) = B,$

on J is given by the formula

(7.52) $Y(x) = F(x)^{-1}Y(a) + F(x)^{-1}\int_a^x F(t)Q(t)\,dt.$

The n × n matrix F(x) is the transpose of the matrix whose kth column is the solution of the initial-value problem

(7.53) $Y'(x) = -P(x)^t\,Y(x), \qquad Y(a) = I_k,$

where $I_k$ is the kth column of the identity matrix I.

Although Theorem 7.16 provides an explicit formula for the solution of the general linear system (7.51), the formula is not always a useful one for calculating the solution because of the difficulty involved in determining the matrix function F. The determination of F requires the solution of n homogeneous linear systems (7.53). The next section describes a power-series method that is sometimes used to solve homogeneous linear systems.

We remind the reader once more that the proof of Theorem 7.16 was based on Theorem 7.14, the existence theorem for homogeneous linear systems, which we have not yet proved.

7.19 A power-series method for solving homogeneous linear systems

Consider a homogeneous linear system

(7.54) $Y'(x) = A(x)Y(x), \qquad Y(0) = B,$

in which the given n × n matrix A(x) has a power-series expansion in x convergent in some open interval containing the origin, say

$A(x) = A_0 + xA_1 + x^2A_2 + \cdots + x^kA_k + \cdots, \qquad \text{for } |x| < r_1,$

where the coefficients $A_0, A_1, A_2, \ldots$ are given n × n matrices. Let us try to find a power-series solution of the form

$Y(x) = B_0 + xB_1 + x^2B_2 + \cdots + x^kB_k + \cdots,$

with vector coefficients $B_0, B_1, B_2, \ldots$. Since $Y(0) = B_0$, the initial condition will be satisfied by taking $B_0 = B$, the prescribed initial vector.
To determine the remaining coefficients we substitute the power series for Y(x) in the differential equation and equate coefficients of like powers of x to obtain the following system of equations:

(7.55) $B_1 = A_0B, \qquad (k + 1)B_{k+1} = \sum_{r=0}^{k} A_rB_{k-r} \quad \text{for } k = 1, 2, \ldots.$

These equations can be solved in succession for the vectors $B_1, B_2, \ldots$. If the resulting power series for Y(x) converges in some interval $|x| < r_2$, then Y(x) will be a solution of the initial-value problem (7.54) in the interval $|x| < r$, where $r = \min\{r_1, r_2\}$.

For example, if A(x) is a constant matrix A, then $A_0 = A$ and $A_k = 0$ for k ≥ 1, so the system of equations in (7.55) becomes

$B_1 = AB, \qquad (k + 1)B_{k+1} = AB_k \quad \text{for } k \ge 1.$

Solving these equations in succession we find

$B_k = \frac{1}{k!}A^kB \quad \text{for } k \ge 1.$

Therefore the series solution in this case becomes

$Y(x) = B + \sum_{k=1}^{\infty} \frac{x^k}{k!}A^kB = e^{xA}B.$

This agrees with the result obtained earlier for homogeneous linear systems with constant coefficients.

7.20 Exercises

1. Let p be a real-valued function and Q an n × 1 matrix function, both continuous on an interval J. Let A be an n × n constant matrix. Prove that the initial-value problem

$Y'(x) = p(x)AY(x) + Q(x), \qquad Y(a) = B,$

has the solution

$Y(x) = e^{q(x)A}B + e^{q(x)A}\int_a^x e^{-q(t)A}Q(t)\,dt$

on J, where $q(x) = \int_a^x p(t)\,dt$.

2. Consider the special case of Exercise 1 in which A is nonsingular, a = 0, p(x) = 2x, and Q(x) = xC, where C is a constant vector. Show that the solution becomes

$Y(x) = e^{x^2A}\left(B + \tfrac{1}{2}A^{-1}C\right) - \tfrac{1}{2}A^{-1}C.$

3. Let A(t) be an n × n matrix function and let $E(t) = e^{A(t)}$. Let Q(t), Y(t), and B be n × 1 column matrices. Assume that

$E'(t) = A'(t)E(t)$

on an open interval J. If a ∈ J and if A' and Q are continuous on J, prove that the initial-value problem

$Y'(t) = A'(t)Y(t) + Q(t), \qquad Y(a) = B,$

has the following solution on J:

$Y(x) = e^{A(x)}e^{-A(a)}B + e^{A(x)}\int_a^x e^{-A(t)}Q(t)\,dt.$

4. Let $E(t) = e^{A(t)}$. This exercise describes examples of matrix functions A(t) for which $E'(t) = A'(t)E(t)$.
(a) Let $A(t) = t^rA$, where A is an n × n constant matrix and r is a positive integer. Prove that $E'(t) = A'(t)E(t)$ on (−∞, ∞).
(b) Let A(t) be a polynomial in t with matrix coefficients, say

$A(t) = \sum_{r=0}^{m} t^rA_r,$

where the coefficients commute, $A_rA_s = A_sA_r$ for all r and s.
Prove that EW = A’@EW on (—@, ©). (©) Solve the homogeneous linear system YO =U + ay, 0) = B on the interval ( = co, ©), where A is an mx m constant matrix. 5. Assume that the a x m matrix function A(x) has a power-series expansion convergent for |x| co we shall prove that the infinite series (7.60) E Yul) — Yn} converges for each x in J. For this purpose it suffices to prove that the series sv) 0 Yau) = Yai converges. In this series we use the matrix norm introduced in Section 7.3; the norm of a matrix is the sum of the absolute values of all its entries. Consider a closed and bounded subinterval J, of J containing a. We shall prove that for every x in J; the series in (7.61) is dominated by a convergent series of constants inde- pendent of x. This implies that the series converges uniformly on Jy. To estimate the size of the terms in (7.61) we use the recursion formula repeatedly. Initially, we have Ye) — KO) = |F AWB at. Proof of the existence theorem by the method of successive approximations 225 For simplicity, we assume that @ a in Jy. Now we use the recu of ¥,— Yo, and then ion formula once more to express the difference Ye — Y; in terms se the estimate just obtained for Y, — Yo to obtain 109 — ¥C9) = | fF AcoER~ — KO} at | < fF LAOH BIL MG = a) at @ in Jy. By induction we find Yq) = Fath < VBL MPI orm and for all x > @ in Jy. If x eS = UBL eM — 0. This proves that the series in (7.61) converges uniformly on Jy. ‘The foregoing argument shows that the sequence of successive approximations always converges and the convergence is uniform on Jy. Let Y denote the limit function, That is, define Y(x) for each x in Jy by the equation Y(x) = lim ¥,(x). kaw 226 Systems of differential equations We shall prove that Y has the following properties: (a) Y is continuous on Jy. (b) Yo) = B+ feA@Y (dt — for all x in Jy. (c) Ya) = Band Y@) = AQ) Y(x) for all x in Jy. Part (c) shows that Y is a solution of the initial-value problem on J, Proof of (a). 
Each function $Y_k$ is a column matrix whose entries are scalar functions, continuous on $J_1$. Each entry of the limit function Y is the limit of a uniformly convergent sequence of continuous functions so, by Theorem 11.1 of Volume I, each entry of Y is also continuous on $J_1$. Therefore Y itself is continuous on $J_1$.

Proof of (b). The recursion formula (7.58) states that

$Y_{k+1}(x) = B + \int_a^x A(t)Y_k(t)\,dt.$

Therefore

$Y(x) = \lim_{k\to\infty} Y_{k+1}(x) = B + \lim_{k\to\infty}\int_a^x A(t)Y_k(t)\,dt = B + \int_a^x A(t)\lim_{k\to\infty}Y_k(t)\,dt = B + \int_a^x A(t)Y(t)\,dt.$

The interchange of the limit symbol with the integral sign is valid because of the uniform convergence of the sequence $\{Y_k\}$ on $J_1$.

Proof of (c). The equation Y(a) = B follows at once from (b). Because of (a), the integrand in (b) is continuous on $J_1$ so, by the first fundamental theorem of calculus, Y'(x) exists and equals A(x)Y(x) on $J_1$.

The interval $J_1$ was any closed and bounded subinterval of J containing a. If $J_1$ is enlarged, the process for obtaining Y(x) doesn't change because it only involves integration from a to x. Since for every x in J there is a closed bounded subinterval of J containing a and x, a solution exists over the full interval J.

THEOREM 7.17. UNIQUENESS THEOREM FOR HOMOGENEOUS LINEAR SYSTEMS. If A(t) is continuous on an open interval J, the differential equation

$Y'(t) = A(t)Y(t)$

has at most one solution on J satisfying a given initial condition Y(a) = B.

Proof. Let Y and Z be two solutions on J. Let $J_1$ be any closed and bounded subinterval of J containing a. We will prove that Z(x) = Y(x) for every x in $J_1$. This implies that Z = Y on the full interval J.

Since both Y and Z are solutions we have

$Z'(t) - Y'(t) = A(t)\{Z(t) - Y(t)\}.$

Choose x in $J_1$ and integrate this equation from a to x to obtain

$Z(x) - Y(x) = \int_a^x A(t)\{Z(t) - Y(t)\}\,dt.$

This implies the inequality

(7.63) $\|Z(x) - Y(x)\| \le M\left|\int_a^x \|Z(t) - Y(t)\|\,dt\right|,$

where M is an upper bound for $\|A(t)\|$ on $J_1$. Let $M_1$ be an upper bound for the continuous function $\|Z(t) - Y(t)\|$ on $J_1$. Then (7.63) gives $\|Z(x) - Y(x)\| \le MM_1|x - a|$. Using this estimate in the right-hand member of (7.63) we obtain $\|Z(x) - Y(x)\| \le M^2M_1|x - a|^2/2$, and by induction we find

$\|Z(x) - Y(x)\| \le M^mM_1\frac{|x - a|^m}{m!}$

for every m ≥ 1. When m → ∞ the right-hand member approaches 0, so Z(x) = Y(x). This completes the proof.
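The recursion $Y_{k+1}(x) = B + \int_a^x A(t)Y_k(t)\,dt$ can be carried out exactly when A is constant and each component is stored as a list of polynomial coefficients. The following minimal sketch assumes a = 0 and the illustrative data $A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$, B = (1, 0), none of which come from the text; the helper names are hypothetical.

```python
from fractions import Fraction

def integrate(poly):
    """Integrate a polynomial (coefficients, lowest degree first) from 0 to x."""
    return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(poly)]

def add(p, q):
    """Add two polynomials given as coefficient lists."""
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return [u + v for u, v in zip(p, q)]

def scale(c, p):
    """Multiply a polynomial by the scalar c."""
    return [c * u for u in p]

def picard_step(A, B, Y):
    """One step Y_{k+1}(x) = B + integral_0^x A Y_k(t) dt for a constant matrix A."""
    n = len(B)
    new = []
    for i in range(n):
        integrand = [Fraction(0)]
        for j in range(n):
            integrand = add(integrand, scale(A[i][j], Y[j]))
        new.append(add([B[i]], integrate(integrand)))
    return new

# Illustrative data (assumptions): y1' = y2, y2' = -y1, y1(0) = 1, y2(0) = 0.
A = [[Fraction(0), Fraction(1)], [Fraction(-1), Fraction(0)]]
B = [Fraction(1), Fraction(0)]

Y = [[b] for b in B]  # Y_0(x) = B
for _ in range(4):
    Y = picard_step(A, B, Y)

print(Y[0])  # coefficients of the first component of Y_4
print(Y[1])  # coefficients of the second component of Y_4
```

After four steps the components are $1 - x^2/2 + x^4/24$ and $-x + x^3/6$, which are partial sums of the Taylor series of cos x and −sin x, the exact solution of this particular system.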
The results of this section can be summarized in the following existence-uniqueness theorem.

THEOREM 7.18. Let A be an n × n matrix function continuous on an open interval J. If a ∈ J and if B is any n-dimensional vector, the homogeneous linear system

$Y'(t) = A(t)Y(t), \qquad Y(a) = B,$

has one and only one n-dimensional vector solution on J.

7.22 The method of successive approximations applied to first-order nonlinear systems

The method of successive approximations can also be applied to some nonlinear systems. Consider a first-order system of the form

(7.66) $Y' = F(t, Y),$

where F is a given n-dimensional vector-valued function, and Y is an unknown n-dimensional vector-valued function to be determined. We seek a solution Y which satisfies the equation

$Y'(t) = F[t, Y(t)]$

for each t in some interval J and which also satisfies a given initial condition, say Y(a) = B, where a ∈ J and B is a given n-dimensional vector.

In a manner parallel to the linear case, we construct a sequence of successive approximations $Y_0, Y_1, Y_2, \ldots$, by taking $Y_0 = B$ and defining $Y_{k+1}$ in terms of $Y_k$ by the recursion formula

(7.67) $Y_{k+1}(x) = B + \int_a^x F[t, Y_k(t)]\,dt \quad \text{for } k = 0, 1, 2, \ldots.$

Under certain conditions on F, this sequence will converge to a limit function Y which will satisfy the given differential equation and the given initial condition.

Before we investigate the convergence of the process we discuss some one-dimensional examples chosen to illustrate some of the difficulties that can arise in practice.

EXAMPLE 1. Consider the nonlinear initial-value problem $y' = x^2 + y^2$ with y = 0 when x = 0. We shall compute a few approximations to the solution. We choose $Y_0(x) = 0$ and determine the next three approximations as follows:

$Y_1(x) = \int_0^x t^2\,dt = \frac{x^3}{3},$

$Y_2(x) = \int_0^x \left[t^2 + Y_1^2(t)\right]dt = \int_0^x \left(t^2 + \frac{t^6}{9}\right)dt = \frac{x^3}{3} + \frac{x^7}{63},$

$Y_3(x) = \int_0^x \left[t^2 + \left(\frac{t^3}{3} + \frac{t^7}{63}\right)^2\right]dt = \frac{x^3}{3} + \frac{x^7}{63} + \frac{2x^{11}}{2079} + \frac{x^{15}}{59535}.$

It is now apparent that a great deal of labor will be needed to compute further approximations. For example, the next two approximations $Y_4$ and $Y_5$ will be polynomials of degrees 31 and 63, respectively.
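The labor in Example 1 is easy to delegate to a short program. The sketch below is one possible way to do it, under the assumption that each approximation is stored as a list of exact rational coefficients (lowest degree first); the helper names `poly_mul`, `poly_add`, `poly_integrate`, `degree`, and `picard_step` are hypothetical.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, u in enumerate(p):
        for j, v in enumerate(q):
            out[i + j] += u * v
    return out

def poly_add(p, q):
    """Sum of two polynomials given as coefficient lists."""
    n = max(len(p), len(q))
    zero = Fraction(0)
    return [(p[k] if k < len(p) else zero) + (q[k] if k < len(q) else zero)
            for k in range(n)]

def poly_integrate(p):
    """Integral from 0 to x of the polynomial p."""
    return [Fraction(0)] + [c / (k + 1) for k, c in enumerate(p)]

def degree(p):
    """Largest index with a nonzero coefficient (0 for the zero polynomial)."""
    return max((k for k, c in enumerate(p) if c != 0), default=0)

def picard_step(y):
    """Y_{k+1}(x) = integral_0^x [t^2 + Y_k(t)^2] dt, for y' = x^2 + y^2, y(0) = 0."""
    t_squared = [Fraction(0), Fraction(0), Fraction(1)]
    return poly_integrate(poly_add(t_squared, poly_mul(y, y)))

approximations = [[Fraction(0)]]  # Y_0 = 0
for _ in range(5):
    approximations.append(picard_step(approximations[-1]))

for k, y in enumerate(approximations):
    print(k, degree(y))
```

The nonzero coefficients of `approximations[3]` reproduce the expression for $Y_3$ in Example 1, and the printed degrees of the fourth and fifth approximations are 31 and 63.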
The next example exhibits a further difficulty that can arise in the computation of the successive approximations. roagt gas a= tye =. ] 3 * 63 2079 * 59535 wxaeze 2, Consider the nonlinear initial-value problem y’ = 2x + e” , withy = 0 when 0. We begin with the initial guess Yj(x) = 0 and we find x Hi) ¥4x), = [ees eden xta fear, Here further progress is impeded by the fact that the last integral cannot be evaluated in terms of elementary functions. However, for a given x it is possible to calculate a numerical approximation to the integral and thereby obiain an approximation to Yo(x). Because of the difficulties displayed in the last two examples, the method of successive approximations is sometimes not very useful for the explicit determination of solutions in practice. The real value of the method is its use in establishing existence theorems, Proof of an existence-uniqueness theorem for first-order nonlinear systems 7.23 Proof of an existence-uniqueness theorem for first-order nonlinear systems We tum now to an existence-uniqueness theorem for first-order nonlinear systems. By placing suitable restrictions on the function which appears in the righthand member of the differential equation FY), we can extend the method of proof used for the linear case in Section 7.21. Let Jdenote the open interval over which we seek a solution, Assume a € Jand let B be a given n-dimensional vector, Let S denote a set in (n + 1)-space given by S={%,Y)| Ik-aish,l¥-Bi 0 and k > 0. [If n = 1 this is a rectangle with center at (a, B) and with base 2h and altitude 2k.] We assume that the domain of F includes a set $ of this type and that F is bounded on S, say (7.68) FG, Y)I| 0 and every x in J. ‘The convergence of the sequence {Y,(x)} is now established exactly as in Section 7.21 We write WO)< We, Led Yal)} and prove that Y,(x) tends to a limit as k —» 00 by proving that the infinite series >, Yonei(X) — Yall) converges on I. This is deduced from the inequality MA™|x — a\™? 
— MA™em*! Nae) oe gl Sot eee TT (m+! (m+! which is proved by induction, using the recursion formula and the Lipschitz condition. We then define the limit function Y by the equation Y(x) = lim YQ) mm for each x in J and verify that it satisfies the integral equation Yox)= Bs Fla voldt, se. This proves the existence of a solution, The uniqueness may ne -method used to prove Theorem 7.17. exactly as in the linear then be proved by the si 7.24 Exercises 1, Consider the linear initial-value problem yy =2e, with y = 1 when x = 0. Exercises 231 . Apply the method of succ (@) Find the exact solution Y of this problem, (b) Apply the method of successive approximations, starting with the initial guess ¥o(x) = 1 Determine ¥,(x) explicitly and show that lim Y,@) = YO) for all real x. Apply the method of successive approximations to the nonlinear initia value problem yo=x+y, with y =Owhenx =0. Take Y,(x) = 0 as the initial guess and compute ¥9(x), ive approximations to the nonlinear init value problem y=1+4xy®, with y =Owhenx =O Take Y(x) = 0 as the initial guess and compute Y.(x). Apply the method of successive approximations 10 the nonlinear initial-value problem v Owhenx =O. f+ yt, with Start with the “bad” initial guess ¥,6) = 1, compute Y4(x), and compare with the results of Example 1 in Section 7.22. . Consider the nonlinear initial-value problem yextty with y =I whenx =0. (@) Apply the method of successive approximations, starting with the initial guess Y,0) = 1, and compute ¥2(x). b) Let R=[— 1, 1) x [= 1, 1]. Find the smallest M such that I/(x, y)) < Mon R. Find an interval Z, ) such that the graph of every approximating function Y,, over Z, will liein R. (©) Assume the solution y = Y(x) has a power-series expansion in a neighborhood of the origin, Determine the first six nonzero terms of this expansion and compare with the result of part (@). 
Consider the jal-value problem y=ity, with y =Owhenx = (@) Apply the method of successive approximations, starting with the initial guess Y,(x) and compute Y.(x). (b) Prove that every approximating function Y,, is defined on the entire real axis. (©) Use Theorem 7.19 to show that the initial-value problem has at most one solution in any interval of the form (—/ , hy. (@) Solve the differential equation by separation of variables and thereby show that there is exactly one solution Y of the initial-value problem on the interval (=x/2, 7/2) and no solution on any larger interval. In this example, the successive approximations are defined on the entire real axis, but they converge to a limit function only on the interval (—x/2, 7/2). |. We seek two functions y = Y(x) and z = Z(x) that simultaneously satisfy the system of equations yes, x0 +2) 232 Systems of differential equations with initial conditions y = 1 and z = 1/2 when x = 0. Start with the initial guesses ¥(x) = 1, Zo(X) = 1/2, and use the method of successive approximations to obtain the approximating functions coe eee YO=145+ 75+ H+ ig? xo 3x8 Text a et i0t ef + 360 + 356° 1 2 8. Consider the system of equations 2x +2, y+ xz, with initial conditions y = 2 and z = 0 when x Start with the initial guesses Y,00) = 2, Z,Ax) = 0, use the method of successive approximations, and determine Y,(x) and. Zg(.x). 9. Consider the initial-value problem Via ‘+ xy, with y andy’ = 1 whenx = Change this problem to an equivalent problem involving a system of two equations for two unknown functions y = Y(x) and z = Z(x), where z = y’. Then use the method of successive approximations, starting with initial guesses Yx) = 5 and Z,(x) = 1, and determine Y,(x) and Z,(x). 10. Letfbe defined on the rectangle R= [-1, 1} x [-1, 1] as follows: 0 if x=0, { dle if x #0 and ly sat, ape es Ol andy ad Se Y= (@) Prove that [f(x, y)] < 2 for all (x,y) in R (b) Show thatfoes not satisfy a Lipschitz condition on R. 
(c) For each constant C satisfying |C| < 1, show that $y = Cx^2$ is a solution of the initial-value problem y' = f(x, y), with y = 0 when x = 0. Show also that the graph of each of these solutions over (−1, 1) lies in R.
(d) Apply the method of successive approximations to this initial-value problem, starting with initial guess $Y_0(x) = 0$. Determine $Y_n(x)$ and show that the approximations converge to a solution of the problem on the interval (−1, 1).
(e) Repeat part (d), starting with initial guess $Y_0(x) = x$. Determine $Y_n(x)$ and show that the approximating functions converge to a solution different from any of those in part (c).
(f) Repeat part (d), starting with the initial guess Y.(x) = x?
(g) Repeat part (d), starting with the initial guess Y,,(x) = x!/* 4

7.25 Successive approximations and fixed points of operators

The basic idea underlying the method of successive approximations can be used not only to establish existence theorems for differential equations but also for many other important problems in analysis. The rest of this chapter reformulates the method of successive approximations in a setting that greatly increases the scope of its applications.

In the proof of Theorem 7.18 we constructed a sequence of functions $\{Y_k\}$ according to the recursion formula

$Y_{k+1}(x) = B + \int_a^x A(t)Y_k(t)\,dt.$

The right-hand member of this formula can be regarded as an operator T which converts certain functions Y into new functions T(Y) according to the equation

$T(Y) = B + \int_a^x A(t)Y(t)\,dt.$

In the proof of Theorem 7.18 we found that the solution Y of the initial-value problem Y'(t) = A(t)Y(t), Y(a) = B, satisfies the integral equation

$Y = B + \int_a^x A(t)Y(t)\,dt.$

In operator notation this states that Y = T(Y). In other words, the solution Y remains unaltered by the operator T. Such a function Y is called a fixed point of the operator T.

Many important problems in analysis can be formulated so their solution depends on the existence of a fixed point for some operator.
Therefore it is worthwhile to try to discover properties of operators that guarantee the existence of a fixed point. We turn now to a systematic treatment of this problem.

7.26 Normed linear spaces

To formulate the method of successive approximations in a general form it is convenient to work within the framework of linear spaces. Let S be an arbitrary linear space. When we speak of approximating one element x in S by another element y in S, we consider the difference x − y, which we call the error of the approximation. To measure the size of this error we introduce a norm in the space.

DEFINITION OF A NORM. Let S be any linear space. A real-valued function N defined on S is called a norm if it has the following properties:
(a) N(x) ≥ 0 for each x in S.
(b) N(cx) = |c| N(x) for each x in S and each scalar c.
(c) N(x + y) ≤ N(x) + N(y) for all x and y in S.
(d) N(x) = 0 implies x = 0.
A linear space with a norm assigned to it is called a normed linear space.

The norm of x is sometimes written $\|x\|$ instead of N(x). In this notation the fundamental properties become:
(a) $\|x\| \ge 0$ for all x in S.
(b) $\|cx\| = |c|\,\|x\|$ for all x in S and all scalars c.
(c) $\|x + y\| \le \|x\| + \|y\|$ for all x and y in S.
(d) $\|x\| = 0$ implies x = 0.
If x and y are in S, we refer to $\|x - y\|$ as the distance from x to y.

If the space S is Euclidean, then it always has a norm which it inherits from the inner product, namely, $\|x\| = (x, x)^{1/2}$. However, we shall be interested in a particular norm which does not arise from an inner product.

The max norm. Let C(J) denote the linear space of real-valued functions continuous on a closed and bounded interval J. If φ ∈ C(J), define

$\|\varphi\| = \max_{x \in J} |\varphi(x)|,$

where the symbol on the right stands for the maximum absolute value of φ on J. The reader may verify that this norm has the four fundamental properties.
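The fundamental properties of the max norm can be spot-checked numerically. The sketch below is only an illustration: it approximates the maximum over J = [0, 1] by sampling an evenly spaced grid (an assumption that is adequate for the two simple sample functions chosen here, φ(t) = t and ψ(t) = 1 − t, which are not prescribed by the definition), and the name `max_norm` is hypothetical.

```python
def max_norm(phi, lo=0.0, hi=1.0, samples=1001):
    """Approximate max |phi| on [lo, hi] by sampling an evenly spaced grid."""
    step = (hi - lo) / (samples - 1)
    return max(abs(phi(lo + k * step)) for k in range(samples))

# Two sample functions in C([0, 1]) (illustrative choices).
phi = lambda t: t
psi = lambda t: 1.0 - t

norm_phi = max_norm(phi)
norm_psi = max_norm(psi)
norm_sum = max_norm(lambda t: phi(t) + psi(t))      # norm of phi + psi
norm_scaled = max_norm(lambda t: -3.0 * phi(t))     # norm of (-3) * phi

print(norm_phi, norm_psi, norm_sum, norm_scaled)
```

For these functions the sampled values illustrate property (b), since the norm of −3φ is three times the norm of φ, and property (c), since the norm of φ + ψ does not exceed the sum of the two norms. A grid check of this kind is evidence only, not a proof.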
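The max norm is not derived from an inner product. To prove this we show that the max norm violates some property possessed by all inner-product norms. For example, if a norm is derived from an inner product, then the "parallelogram law"

$\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2$

holds for all x and y in S. (See Exercise 16 in Section 1.13.) The parallelogram law is not always satisfied by the max norm. For example, let x and y be the functions on the interval [0, 1] given by

$x(t) = t, \qquad y(t) = 1 - t.$

Then we have $\|x\| = \|y\| = \|x + y\| = \|x - y\| = 1$, so the left member of the parallelogram law equals 2 while the right member equals 4, and the parallelogram law is violated.

7.27 Contraction operators

In this section we consider the normed linear space C(J) of all real functions continuous on a closed bounded interval J in which $\|\varphi\|$ is the max norm. Consider an operator

$T: C(J) \to C(J)$

whose domain is C(J) and whose range is a subset of C(J). That is, if φ is continuous on J, then T(φ) is also continuous on J. The following formulas illustrate a few simple examples of such operators. In each case φ is an arbitrary function in C(J) and T(φ)(x) is defined for each x in J by the formula given:

$T(\varphi)(x) = \lambda\varphi(x),$ where λ is a fixed real number,

$T(\varphi)(x) = \int_c^x \varphi(t)\,dt,$ where c is a given point in J,

$T(\varphi)(x) = b + \int_c^x f[t, \varphi(t)]\,dt,$ where b is a constant and the composition f[t, φ(t)] is continuous on J.

We are interested in those operators T for which the distance $\|T(\varphi) - T(\psi)\|$ is less than a fixed constant multiple α < 1 of $\|\varphi - \psi\|$. These are called contraction operators; they are defined as follows.

DEFINITION OF A CONTRACTION OPERATOR. An operator T: C(J) → C(J) is called a contraction operator if there is a constant α satisfying 0 ≤ α < 1 such that for every pair of functions φ and ψ in C(J) we have

(7.70) $\|T(\varphi) - T(\psi)\| \le \alpha\,\|\varphi - \psi\|.$

The constant α is called a contraction constant for T.

7.28 Fixed-point theorem for contraction operators

THEOREM 7.20. Let T: C(J) → C(J) be a contraction operator. Then there exists one and only one function φ in C(J) such that T(φ) = φ.

Proof. Let $\varphi_0$ be any function in C(J) and define a sequence of functions $\{\varphi_n\}$ by the recursion formula

$\varphi_{n+1} = T(\varphi_n) \quad \text{for } n = 0, 1, 2, \ldots.$

We shall prove that the sequence $\{\varphi_n\}$ converges to a limit function φ in C(J). We write each $\varphi_n$ as a telescoping sum,

(7.72) $\varphi_n(x) = \varphi_0(x) + \sum_{k=0}^{n-1}\{\varphi_{k+1}(x) - \varphi_k(x)\},$

and prove convergence of $\{\varphi_n\}$ by showing that the infinite series

(7.73) $\varphi_0(x) + \sum_{k=0}^{\infty}\{\varphi_{k+1}(x) - \varphi_k(x)\}$

converges uniformly on J. The uniform convergence will be established by comparing the series with the convergent geometric series $M\sum \alpha^k$, where $M = \|\varphi_0\| + \|\varphi_1\|$ and α is a contraction constant for T. The comparison is provided by the inequality

(7.74) $|\varphi_{k+1}(x) - \varphi_k(x)| \le M\alpha^k,$

which holds for every x in J and every k ≥ 1. To prove (7.74) we note that

$|\varphi_{k+1}(x) - \varphi_k(x)| = |T(\varphi_k)(x) - T(\varphi_{k-1})(x)| \le \alpha\,\|\varphi_k - \varphi_{k-1}\|.$

Therefore the inequality in (7.74) will be proved if we show that

(7.75) $\|\varphi_k - \varphi_{k-1}\| \le M\alpha^{k-1}$

for every k ≥ 1. We now prove (7.75) by induction.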
For k = 1 we have

$\|\varphi_1 - \varphi_0\| \le \|\varphi_1\| + \|\varphi_0\| = M,$

which is the same as (7.75). To prove that (7.75) holds for k + 1 if it holds for k we note that

$|\varphi_{k+1}(x) - \varphi_k(x)| = |T(\varphi_k)(x) - T(\varphi_{k-1})(x)| \le \alpha\,\|\varphi_k - \varphi_{k-1}\| \le M\alpha^k.$

Since this is valid for each x in J we must also have

$\|\varphi_{k+1} - \varphi_k\| \le M\alpha^k.$

This proves (7.75) by induction. Therefore the series in (7.73) converges uniformly on J. If we let φ(x) denote its sum we have

(7.76) $\varphi(x) = \lim_{n\to\infty}\varphi_n(x) = \varphi_0(x) + \sum_{k=0}^{\infty}\{\varphi_{k+1}(x) - \varphi_k(x)\}.$

The function φ is continuous on J because it is the sum of a uniformly convergent series of continuous functions. To prove that φ is a fixed point of T we compare T(φ) with $\varphi_{n+1} = T(\varphi_n)$. Using the contraction property of T we have

(7.77) $|T(\varphi)(x) - \varphi_{n+1}(x)| = |T(\varphi)(x) - T(\varphi_n)(x)| \le \alpha\,\|\varphi - \varphi_n\|.$

But from (7.72) and (7.76) we find

$|\varphi(x) - \varphi_n(x)| = \left|\sum_{k=n}^{\infty}\{\varphi_{k+1}(x) - \varphi_k(x)\}\right| \le \sum_{k=n}^{\infty}|\varphi_{k+1}(x) - \varphi_k(x)| \le M\sum_{k=n}^{\infty}\alpha^k,$

where in the last step we used (7.74). Therefore (7.77) implies

$|T(\varphi)(x) - \varphi_{n+1}(x)| \le M\sum_{k=n}^{\infty}\alpha^{k+1}.$

When n → ∞ the series on the right tends to 0, so $\varphi_{n+1}(x) \to T(\varphi)(x)$. But since $\varphi_{n+1}(x) \to \varphi(x)$ as n → ∞, this proves that φ(x) = T(φ)(x) for each x in J. Therefore φ = T(φ), so φ is a fixed point.

Finally we prove that the fixed point is unique. Let ψ be another function in C(J) such that T(ψ) = ψ. Then we have

$\|\varphi - \psi\| = \|T(\varphi) - T(\psi)\| \le \alpha\,\|\varphi - \psi\|.$

This gives us $(1 - \alpha)\|\varphi - \psi\| \le 0$. Since α < 1 we may divide by 1 − α to obtain the inequality $\|\varphi - \psi\| \le 0$. But since we also have $\|\varphi - \psi\| \ge 0$ this means that $\|\varphi - \psi\| = 0$, and hence φ − ψ = 0. The proof of the fixed-point theorem is now complete.

7.29 Applications of the fixed-point theorem

To indicate the broad scope of applications of the fixed-point theorem we use it to prove two important theorems. The first gives a sufficient condition for an equation of the form f(x, y) = 0 to define y as a function of x.

THEOREM 7.21. AN IMPLICIT-FUNCTION THEOREM.
Let f be defined on a rectangular strip B of the form R={(@ylacxdb,-w 0, then for each i such that 785) 4) <—+— a “ M(b — a) there is one and only one function @ in C that Satisfies the integral equation (7. Proof. We shall prove that 7 G2 in C and consider the difference 4 contraction operator. Take any two functions g and Toe) — Tepe) = 4 |? Kes Dipl — gal] dt. Using the inequality (7.84) we may write ITE) = TIONS A Mb = 4) bos = gall = 21g = gl, where a = [2! M(b — a). Because of (7.85) we have 0 < a < 1, so T is a contraction operator with contraction constant a. Therefore T has a unique fixed point g in C. This function g satisfies (7.83). PART 2 NONLINEAR ANALYSIS 8 DIFFERENTIAL CALCULUS OF SCALAR AND VECTOR FIELDS 8.1 Functions from R” to R”. Scalar and yector fields Parl 1 of this volume dealt primarily with linear transformations T:V>Ww from one linear space V into another linear space W. In Part 2 we drop the requirement that T be linear but restrict the spaces V and W to be finite-dimensional. Specifically, we shall consider funetions with domain in n-space R" and with range in m-space R”™. When both 1 and m are equal to 1, such a function is called a real-valued function of a real variable. When n = 1 and m > 1 it is called a vector-valued function of a real variable. Examples of such functions were studied extensively in Volume 1 In this chapter we assume that 2 > 1 and m >. When m = 1 the function is called a real-valued function of a vector variable or, more briefly, a scalar field. When m > 1 it is called a vector-valued function of a vector variable, or simply a vector field. This chapter extends the concepts of limit, continuity, and derivative to scalar and vector fields. Chapters 10 and 11 extend the concept of the integral Notation: Scalar will be denoted by light-faced type, and vectors by bold-faced type. It f is a scalar field defined at a point x = (4, ... , x,) in R”, the notations f(x) and SQr,+. 
45%) are both used to denote the value of fat that particular point. Iffis a vector field we write fix) or fit, ... , x) for the function value at x. We shall use the inner product ry = dae and the corresponding norm |x|] = (x + x), where x = (x, +x,)and y = Q1s-+- . Px)+ Points in R® are usually denoted by (x, ¥) instead of (x,, x,); points in by (x,y, 9 insteade (4, 03, %). jar_and vector fields defined on subsets of R® and R® occur frequently in the appli- cations of mathematics to science and engineering, For example, if at each point x of the atmosphere we assign a real number f(x) which represents the temperature at x, the function 243 244 Differential calculus of scalar and vector fields £ 50 defined is a scalar field. If we assign a vector which represents the wind velocity at that point, we obtain an example of a vector field. In physical problems dealing with either scalar or vector fields it is important to know how the field changes as we move from one point to another. In the one-dimensional case the derivative is the mathematical tool used to study such changes. Derivative theory in the one-dimensional case deals with functions defined on open intervals. To extend the theory to R" we consider generalizations of open intervals called open sets. 82 Open balls and open sets Let @ be a given point in R" and let r be a given positive number. The set of all. points x in R™ such that |x -—al 1. The boundary of $ consists of all x with |lx| = 1. 8.3 Exercises 1. Letf be a scalar field defined on a set S and let ¢ be a given real number. The set of all points x in S such that f(x) = ¢ is called a level set off. (Geometric and physical problems dealing with level sets will be discussed later in this chapter.) For each of the following scalar fields S is the whole space R". Make a sketch t describe the level sets corresponding to the given values of c. @ fx, Y= 8+ 9%, © =0,1,4,9 (b) fl, =e, csehe lee. 
(©) fx, y) = cos (x + ¥), c= -1,0,4,492, 1 @) fle, 2) =x FY +2, c= -1,0, 1. TL ec c =0, 6, 12, © fl, y.2) = sin G+ v4 2), c= -1, 4,042.1. 246 Differential calculus of scalar and vector fields 2. In each of the following cases, let $ be the set of all points (x,y) in the plane satisfying the given inequalities, Make a sketch showing the set S and explain, by a geometric argument, whether or not S is open, Indicate the boundary of $ on your sketch, @ + y0 ©) |x| <1 and |yl <1 @ xy. (@x >Oand y>0 Wx >y. (e) |x| S 1 and ly] S$ 1 (1) y > x* and [x| < 2. () x >Oand y 0. (xy <1 (a) Cx — a? — yDG# + y* x) >0. 3, In each of the following, let S be the set of all points (x, y, 2) in 3-space satisfying the given inequalities and determine whether or not $ is open. (@ 2 = x2 —y? = 1 >0. (b) [x1 < 1, yl] <1, and [2] <1 ©xty+z<1 (@) tal $1 yl <1, and [zl <1. ©xty+z0,y>0,27>0. (x2 +°4y? + 42% — 2x + 16y +402 + 113 0. @i x and |x| < 2. 7. (a) If A is a closed set in n-space and x is a point not in 4, prove that 4 « {2x} is also closed. (b) Prove that a closed interval [a, 6] on the real line is a closed set. (©) If A and B are closed intervals on the real line, show that A U B and AN B are closed. tf A and B are sets, the difference A — B (called the complement of B relative to A) is the set of all elements of A which are not in B. Limits and continuity 247 8. Prove the following properties of closed sets in R", You may use the results of Exercise 5. (a) The empty set 2 is closed. (b) R® is closed. (©) The intersection of any collection of closed sets is closed. @ The union of a finite number of closed sets is. closed. (©) Give an example to show that the union of an infinite collection of closed sets is not necessarily closed. 9. Let S be a subset of R”. (a) Prove that both int S and ext S are open sets. (b) Prove that R* = (int S) U (ext S) U aS, a union of disjoint sets, and use this to deduce that the boundary @S is always a closed set. 10. 
Given a set S in $R^n$ and a point x with the property that every ball B(x) contains both interior points of S and points exterior to S. Prove that x is a boundary point of S. Is the converse statement true? That is, does every boundary point of S necessarily have this property?
11. Let S be a subset of $R^n$. Prove that ext S = int($R^n$ − S).
12. Prove that a set S in $R^n$ is closed if and only if S = (int S) ∪ ∂S.

8.4 Limits and continuity

The concepts of limit and continuity are easily extended to scalar and vector fields. We shall formulate the definitions for vector fields; they apply also to scalar fields.

We consider a function $\mathbf{f}: S \to R^m$, where S is a subset of $R^n$. If $\mathbf{a} \in R^n$ and $\mathbf{b} \in R^m$ we write

(8.1) $\lim_{\mathbf{x}\to\mathbf{a}} \mathbf{f}(\mathbf{x}) = \mathbf{b}$ (or, $\mathbf{f}(\mathbf{x}) \to \mathbf{b}$ as $\mathbf{x} \to \mathbf{a}$)

to mean that

(8.2) $\lim_{\|\mathbf{x}-\mathbf{a}\|\to 0} \|\mathbf{f}(\mathbf{x}) - \mathbf{b}\| = 0.$

The limit symbol in equation (8.2) is the usual limit of elementary calculus. In this definition it is not required that f be defined at the point a itself.

If we write $\mathbf{h} = \mathbf{x} - \mathbf{a}$, Equation (8.2) becomes

$\lim_{\|\mathbf{h}\|\to 0} \|\mathbf{f}(\mathbf{a} + \mathbf{h}) - \mathbf{b}\| = 0.$

For points in $R^2$ we write (x, y) for $\mathbf{x}$ and (a, b) for $\mathbf{a}$ and express the limit relation (8.1) as follows:

$\lim_{(x,y)\to(a,b)} \mathbf{f}(x, y) = \mathbf{b}.$

For points in $R^3$ we put $\mathbf{x} = (x, y, z)$ and $\mathbf{a} = (a, b, c)$ and write

$\lim_{(x,y,z)\to(a,b,c)} \mathbf{f}(x, y, z) = \mathbf{b}.$

A function f is said to be continuous at a if f is defined at a and if

$\lim_{\mathbf{x}\to\mathbf{a}} \mathbf{f}(\mathbf{x}) = \mathbf{f}(\mathbf{a}).$

We say f is continuous on a set S if f is continuous at each point of

