
Anthony N. Michel
Charles J. Herget

Algebra and Analysis
for Engineers and Scientists

Birkhäuser
Boston • Basel • Berlin

Anthony N. Michel                          Charles J. Herget
Department of Electrical Engineering       Herget Associates
University of Notre Dame                   P.O. Box 1425
Notre Dame, IN 46556                       Alameda, CA 94501
U.S.A.                                     U.S.A.

Cover design by Dutton and Sherman, Hamden, CT.

Mathematics Subject Classification (2000): 03Exx, 03E20, 08-XX, 08-01, 15-XX, 15-01, 15A03,
15A04, 15A06, 15A09, 15A15, 15A18, 15A21, 15A57, 15A60, 15A63, 20-XX, 20-01, 26-XX,
26-01, 26Axx, 26A03, 26A15, 26Bxx, 34-XX, 34-01, 34Axx, 34A12, 34A30, 34H05, 45B05, 46-XX,
46-01, 46Axx, 46A22, 46A50, 46A55, 46Bxx, 46B20, 46B25, 46Cxx, 46C05, 46Exx, 46N10, 46N20,
47-XX, 47-01, 47Axx, 47A05, 47A07, 47A10, 47A25, 47A30, 47A67, 47B15, 47H10, 47N20, 47N70,
54-XX, 54-01, 54A20, 54Cxx, 54C05, 54C30, 54Dxx, 54D05, 54D30, 54D35, 54D45, 54E35, 54E45,
54E50, 93E10

Library of Congress Control Number: 2007931687

ISBN-13: 978-0-8176-4706-3    e-ISBN-13: 978-0-8176-4707-0

Printed on acid-free paper.

©2007 Birkhäuser Boston

Originally published as Mathematical Foundations in Engineering and Science by Prentice-Hall,
Englewood Cliffs, NJ, 1981. A subsequent paperback edition under the title Applied Algebra and
Functional Analysis was published by Dover, New York, 1993. For the Birkhäuser Boston printing,
the authors have revised the original preface.

All rights reserved. This work may not be translated or copied in whole or in part without the writ-
ten permission of the publisher (Birkhäuser Boston, c/o Springer Science+Business Media LLC, 233
Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or
scholarly analysis. Use in connection with any form of information storage and retrieval, electronic
adaptation, computer software, or by similar or dissimilar methodology now known or hereafter de-
veloped is forbidden.
The use in this publication of trade names, trademarks, service marks and similar terms, even if they
are not identified as such, is not to be taken as an expression of opinion as to whether or not they are
subject to proprietary rights.

9 8 7 6 5 4 3 2 1

www.birkhauser.com (IBT)
CONTENTS

PREFACE ix

CHAPTER 1: FUNDAMENTAL CONCEPTS 1

1.1 Sets 1
1.2 Functions 12
1.3 Relations and Equivalence Relations 25
1.4 Operations on Sets 26
1.5 Mathematical Systems Considered in This Book 30
1.6 References and Notes 31
References 32

CHAPTER 2: ALGEBRAIC STRUCTURES 33

2.1 Some Basic Structures of Algebra 34


A. Semigroups and Groups 36
B. Rings and Fields 46
C. Modules, Vector Spaces, and Algebras 53
D. Overview 61
2.2 Homomorphisms 62
2.3 Application to Polynomials 69
2.4 References and Notes 74
References 74


CHAPTER 3: VECTOR SPACES AND LINEAR


TRANSFORMATIONS 75
3.1 Linear Spaces 75
3.2 Linear Subspaces and Direct Sums 81
3.3 Linear Independence, Bases, and Dimension 85
3.4 Linear Transformations 95
3.5 Linear Functionals 109
3.6 Bilinear Functionals 113
3.7 Projections 119
3.8 Notes and References 123
References 123

CHAPTER 4: FINITE-DIMENSIONAL VECTOR


SPACES AND MATRICES 124
4.1 Coordinate Representation of Vectors 124
4.2 Matrices 129
A. Representation of Linear Transformations
by Matrices 129
B. Rank of a Matrix 134
C. Properties of Matrices 136
4.3 Equivalence and Similarity 148
4.4 Determinants of Matrices 155
4.5 Eigenvalues and Eigenvectors 163
4.6 Some Canonical Forms of Matrices 169
4.7 Minimal Polynomials, Nilpotent Operators
and the Jordan Canonical Form 178
A. Minimal Polynomials 178
B. Nilpotent Operators 185
C. The Jordan Canonical Form 190
4.8 Bilinear Functionals and Congruence 194
4.9 Euclidean Vector Spaces 202
A. Euclidean Spaces: Definition and Properties 202
B. Orthogonal Bases 209
4.10 Linear Transformations on Euclidean Vector Spaces 216
A. Orthogonal Transformations 216
B. Adjoint Transformations 218
C. Self-Adjoint Transformations 221
D. Some Examples 227
E. Further Properties of Orthogonal
Transformations 231

4.11 Applications to Ordinary Differential Equations 238


A. Initial-Value Problem: Definition 238
B. Initial-Value Problem: Linear Systems 244
4.12 Notes and References 261
References 262

CHAPTER 5: METRIC SPACES 263

5.1 Definition of Metric Spaces 264


5.2 Some Inequalities 268
5.3 Examples of Important Metric Spaces 271
5.4 Open and Closed Sets 275
5.5 Complete Metric Spaces 286
5.6 Compactness 298
5.7 Continuous Functions 307
5.8 Some Important Results in Applications 314
5.9 Equivalent and Homeomorphic Metric Spaces.
Topological Spaces 317
5.10 Applications 323
A. Applications of the Contraction Mapping
Principle 323
B. Further Applications to Ordinary Differential
Equations 329
5.11 References and Notes 341
References 341

CHAPTER 6: NORMED SPACES AND INNER PRODUCT


SPACES 343

6.1 Normed Linear Spaces 344


6.2 Linear Subspaces 348
6.3 Infinite Series 350
6.4 Convex Sets 351
6.5 Linear Functionals 355
6.6 Finite-Dimensional Spaces 360
6.7 Geometric Aspects of Linear Functionals 363
6.8 Extension of Linear Functionals 367
6.9 Dual Space and Second Dual Space 370
6.10 Weak Convergence 372
6.11 Inner Product Spaces 375
6.12 Orthogonal Complements 381

6.13 Fourier Series 387


6.14 The Riesz Representation Theorem 393
6.15 Some Applications 394
A. Approximation of Elements in Hilbert Space
(Normal Equations) 395
B. Random Variables 397
C. Estimation of Random Variables 398
6.16 Notes and References 404
References 404

CHAPTER 7: LINEAR OPERATORS 406

7.1 Bounded Linear Transformations 407


7.2 Inverses 415
7.3 Conjugate and Adjoint Operators 419
7.4 Hermitian Operators 427
7.5 Other Linear Operators: Normal Operators, Projections,
Unitary Operators, and Isometric Operators 431
7.6 The Spectrum of an Operator 439
7.7 Completely Continuous Operators 447
7.8 The Spectral Theorem for Completely Continuous
Normal Operators 454
7.9 Differentiation of Operators 458
7.10 Some Applications 465
A. Applications to Integral Equations 465
B. An Example from Optimal Control 468
C. Minimization of Functionals: Method of Steepest
Descent 471
7.11 References and Notes 473
References 473
Index 475
PREFACE

This book evolved from a one-year sequence of courses offered by the authors
at Iowa State University. The audience for this book typically included theoreti-
cally oriented first- or second-year graduate students in various engineering or
science disciplines. Subsequently, while serving as Chair of the Department of
Electrical Engineering, and later, as Dean of the College of Engineering at the
University of Notre Dame, the first author continued using this book in courses
aimed primarily at graduate students in control systems. Since administrative
demands precluded the possibility of regularly scheduled classes, the Socratic
method was used in guiding students in self study. This method of course deliv-
ery turned out to be very effective and satisfying to student and teacher alike.
Feedback from colleagues and students suggests that this book has been used in
a similar manner elsewhere.
The original objectives in writing this book were to provide the reader with ap-
propriate mathematical background for graduate study in engineering or science;
to provide the reader with appropriate prerequisites for more advanced subjects
in mathematics; to allow the student in engineering or science to become famil-
iar with a great deal of pertinent mathematics in a rapid and efficient manner
without sacrificing rigor; to give the reader a unified overview of applicable
mathematics, thus enabling him or her to choose additional courses in math-
ematics more intelligently; and to make it possible for the student to understand
at an early stage of his or her graduate studies the mathematics used in the cur-


rent literature (e.g., journal articles, monographs, and the like).


Whereas the objectives enumerated above for writing this book were certain-
ly pertinent over twenty years ago, they are even more compelling today. The
reasons for this are twofold. First, today's graduate students in engineering or
science are expected to be more knowledgeable and sophisticated in mathemat-
ics than students in the past. Second, today's graduate students in engineering
or science are expected to be familiar with a great deal of ancillary material
(primarily in the computer science area), acquired in courses that did not even
exist a couple of decades ago. In view of these added demands on the students'
time, to become familiar with a great deal of mathematics in an efficient manner,
without sacrificing rigor, seems essential.
Since the original publication of this book, progress in technology, and con-
sequently, in applications of mathematics in engineering and science, has been
phenomenal. However, it must be emphasized that the type of mathematics it-
self that is being utilized in these applications did not experience corresponding
substantial changes. This is particularly the case for algebra and analysis at the
intermediate level, as addressed in the present book. Accordingly, the material
of the present book is as current today as it was at the time when this book first
appeared. (Plus ça change, plus c'est la même chose. - Alphonse Karr, 1849.)
This book may be viewed as consisting essentially of three parts: set theory
(Chapter 1), algebra (Chapters 2-4), and analysis (Chapters 5-7). Chapter 1 is
a prerequisite for all subsequent chapters. Chapter 2 emphasizes abstract alge-
bra (semigroups, groups, rings, etc.) and may essentially be skipped by those
who are not interested in this topic. Chapter 3, which addresses linear spaces
and linear transformations, is a prerequisite for Chapters 4, 6, and 7. Chap-
ter 4, which treats finite-dimensional vector spaces and linear transformations
on such spaces (matrices) is required for Chapters 6 and 7. In Chapter 5, metric
spaces are treated. This chapter is a prerequisite for the subsequent chapters. Fi-
nally, Chapters 6 and 7 consider Banach and Hilbert spaces and linear operators
on such spaces, respectively.
The choice of applications in a book of this kind is subjective and will al-
ways be susceptible to criticisms. We have attempted to include applications of
algebra and analysis that have broad appeal. These applications, which may be
omitted without loss of continuity, are presented at the ends of Chapters 2, 4, 5,
6, and 7 and include topics dealing with ordinary differential equations, integral
equations, applications of the contraction mapping principle, minimization of
functionals, an example from optimal control, and estimation of random vari-
ables.
All exercises are an integral part of the text and are given when they arise,
rather than at the end of a chapter. Their intent is to further the reader's under-
standing of the subject matter on hand.

The prerequisites for this book include the usual background in undergraduate
mathematics offered to students in engineering or in the sciences at universities
in the United States. Thus, in addition to graduate students, this book is suit-
able for advanced senior undergraduate students as well, and for self study by
practitioners.
Concerning the labeling of items in the book, some comments are in order. Sec-
tions are assigned numerals that reflect the chapter and the section numbers. For
example, Section 2.3 signifies the third section in the second chapter. Extensive
sections are usually divided into subsections identified by upper-case com-
mon letters A, B, C, etc. Equations, definitions, theorems, corollaries, lemmas,
examples, exercises, figures, and special remarks are assigned monotonically
increasing numerals which identify the chapter, section, and item number. For
example, Theorem 4.4.7 denotes the seventh identified item in the fourth section
of Chapter 4. This theorem is followed by Eq. (4.4.8), the eighth identified item
in the same section. Within a given chapter, figures are identified by upper-case
letters A, B, C, etc., while outside of the chapter, the same figure is identified
by the above numbering scheme. Finally, the end of a proof or of an example is
signified by the symbol ■.

Suggested Course Outlines


Because of the flexibility described above, this book can be used either in a one-
semester course, or a two-semester course. In either case, mastery of the material
presented will give the student an appreciation of the power and the beauty of
the axiomatic method; will increase the student's ability to construct proofs;
will enable the student to distinguish between purely algebraic and topological
structures and combinations of such structures in mathematical systems; and of
course, it will broaden the student's background in algebra and analysis.

A one-semester course
Chapters 1, 3, 4, 5, and Sections 6.1 and 6.11 in Chapter 6 can serve as the basis
for a one-semester course, emphasizing basic aspects of Linear Algebra and
Analysis in a metric space setting.
The coverage of Chapter 1 should concentrate primarily on functions (Sec-
tion 1.2) and relations and equivalence relations (Section 1.3), while the material
concerning sets (Section 1.1) and operations on sets (Section 1.4) may be cov-
ered as reading assignments. On the other hand, Section 1.5 (on mathematical
systems) merits formal coverage, since it gives the student a good overview of
the book's aims and contents.

The material in this book has been organized so that Chapter 2, which ad-
dresses the important algebraic structures encountered in Abstract Algebra, may
be omitted without any loss of continuity. In a one-semester course emphasizing
Linear Algebra, this chapter may be omitted in its entirety.
In Chapter 3, which addresses general vector spaces and linear transforma-
tions, the material concerning linear spaces (Section 3.1), linear subspaces and
direct sums (Section 3.2), linear independence and bases (Section 3.3), and lin-
ear transformations (Section 3.4) should be covered in its entirety, while selected
topics on linear functionals (Section 3.5), bilinear functionals (Section 3.6), and
projections (Section 3.7) should be deferred until they are required in Chap-
ter 4.
Chapter 4 addresses finite-dimensional vector spaces and linear transforma-
tions (matrices) defined on such spaces. The material on determinants (Section
4.4) and some of the material concerning linear transformations on Euclidean
vector spaces (Subsections 4.10D and 4.10E), as well as applications to ordinary
differential equations (Section 4.11) may be omitted without any loss of conti-
nuity. The emphasis in this chapter should be on coordinate representations of
vectors (Section 4.1), the representation of linear transformations by matrices
and the properties of matrices (Section 4.2), equivalence and similarity of ma-
trices (Section 4.3), eigenvalues and eigenvectors (Section 4.5), some canonical
forms of matrices (Section 4.6), minimal polynomials, nilpotent operators and
the Jordan canonical form (Section 4.7), bilinear functionals and congruence
(Section 4.8), Euclidean vector spaces (Section 4.9), and linear transformations
on Euclidean vector spaces (Subsections 4.10A, 4.10B, and 4.10C).
Chapter 5 addresses metric spaces, which constitute some of the most impor-
tant topological spaces. In a one-semester course, the emphasis in this chapter
should be on the definition of metric space and the presentation of important
classes of metric spaces (Sections 5.1 and 5.3), open and closed sets (Sec-
tion 5.4), complete metric spaces (Section 5.5), compactness (Section 5.6), and
continuous functions (Section 5.7). The development of many classes of metric
spaces requires important inequalities, including the Hölder and the Minkowski
inequalities for finite and infinite sums and for integrals. These are presented
in Section 5.2 and need to be included in the course. Sections 5.8 and 5.10 ad-
dress specific applications and may be omitted without any loss of continuity.
However, time permitting, the material in Section 5.9, concerning equivalent and
homeomorphic metric spaces and topological spaces, should be considered for
inclusion in the course, since it provides the student a glimpse into other areas
of mathematics.
To demonstrate mathematical systems endowed with both algebraic and to-
pological structures, the one-semester course should include the material of
Sections 6.1 and 6.2 in Chapter 6, concerning normed linear spaces (resp., Ban-
ach spaces) and inner product spaces (resp., Hilbert spaces), respectively.

A two-semester course
In addition to the material outlined above for a one-semester course, a two-se-
mester course should include most of the material in Chapters 2, 6, and 7.
Chapter 2 addresses algebraic structures. The coverage of semigroups and
groups, rings and fields, and modules, vector spaces and algebras (Section 2.1)
should be in sufficient detail to give the student an appreciation of the various
algebraic structures summarized in Figure B on page 61. Important mappings
defined on these algebraic structures (homomorphisms) should also be empha-
sized (Section 2.2) in a two-semester course, as should the brief treatment of
polynomials in Section 2.3.
The first ten sections of Chapter 6 address normed linear spaces (resp., Ban-
ach spaces) while the next four sections address inner product spaces (resp.,
Hilbert spaces). The last section of this chapter, which includes applications (to
random variables and estimates of random variables), may be omitted without
any loss of continuity. The material concerning normed linear spaces (Sec-
tion 6.1), linear subspaces (Section 6.2), infinite series (Section 6.3), convex sets
(Section 6.4), linear functionals (Section 6.5), finite-dimensional spaces (Sec-
tion 6.6), inner product spaces (Section 6.11), orthogonal complements (Section
6.12), and Fourier series (Section 6.13) should be covered in its entirety. Cov-
erage of the material on geometric aspects of linear functionals (Section 6.7),
extensions of linear functionals (Section 6.8), dual space and second dual space
(Section 6.9), weak convergence (Section 6.10), and the Riesz representation
theorem (Section 6.14) should be selective and tailored to the availability of time
and the students' areas of interest. (For example, students interested in optimiza-
tion and estimation problems may want a detailed coverage of the Hahn-Banach
theorem included in Section 6.8.)
Chapter 7 addresses (bounded) linear operators defined on Banach and Hilbert
spaces. The first nine sections of this chapter should be covered in their entirety
in a two-semester course. The material of this chapter includes bounded lin-
ear transformations (Section 7.1), inverses (Section 7.2), conjugate and adjoint
operators (Section 7.3), Hermitian operators (Section 7.4), normal, projection,
unitary and isometric operators (Section 7.5), the spectrum of an operator (Sec-
tion 7.6), completely continuous operators (Section 7.7), the spectral theorem
for completely continuous normal operators (Section 7.8), and differentiation of
(not necessarily linear and bounded) operators (Section 7.9). The last section,
which includes applications to integral equations, an example from optimal con-
trol, and minimization of functionals by the method of steepest descent, may be
omitted without loss of continuity.
Both one-semester and two-semester courses offered by the present authors,
based on this book, usually included a project conducted by each course par-
ticipant to demonstrate the applicability of the course material. Each project

involved a formal presentation to the entire class at the end of the semester.
The courses described above were also offered using the Socratic method, fol-
lowing the outlines given above. These courses typically involved half a dozen
participants. While most of the material was self taught by the students them-
selves, the classroom meetings served as a forum for guidance, clarifications, and
challenges by the teacher, usually resulting in lively discussions of the subject on
hand not only among teacher and students, but also among students themselves.
For the current printing of this book, we have created a supplementary web-
site of additional resources for students and instructors: http://Michel.Herget.
net. Available at this website are additional current references concerning the
subject matter of the book and a list of several areas of applications (including
references). Since the latter reflects mostly the authors' interests, it is by defini-
tion rather subjective. Among several additional items, the website also includes
some reviews of the present book. In this regard, the authors would like to invite
readers to submit reviews of their own for inclusion into the website.
The present publication of Algebra and Analysis for Engineers and Scientists
was made possible primarily because of Tom Grasso, Birkhäuser's Computa-
tional Sciences and Engineering Editor, whom we would like to thank for his
considerations and professionalism.

Anthony N. Michel
Charles J. Herget
Summer 2007
1

FUNDAMENTAL CONCEPTS

In this chapter we present fundamental concepts required throughout the


remainder of this book. We begin by considering sets in Section 1.1. In
Section 1.2 we discuss functions; in Section 1.3 we introduce relations and
equivalence relations; and in Section 1.4 we concern ourselves with operations
on sets. In Section 1.5 we give a brief indication of the types of mathematical
systems which we will consider in this book. The chapter concludes with a
brief discussion of references.

1.1. SETS

Virtually every area of modern mathematics is developed by starting from


an undefined object called a set. There are several reasons for doing this.
One of these is to develop a mathematical discipline in a completely axiomatic
and totally abstract manner. Another reason is to present a unified approach
to what may seem to be highly diverse topics in mathematics. Our reason is
the latter, for our interest is not in abstract mathematics for its own sake.
However, by using abstraction, many of the underlying principles of modern
mathematics are more clearly understood.
Thus, we begin by assuming that a set is a well defined collection of
elements or objects. We denote sets by common capital letters A, B, C, etc.,
and elements or objects of sets by lower case letters a, b, c, etc. For example,
we write
A = {a, b, c}
to indicate that A is the collection of elements a, b, c. If an element x belongs
to a set A, we write
x ∈ A.

In this case we say that "x belongs to A," or "x is contained in A," or "x is
a member of A," etc. If x is any element and if A is a set, then we assume that
one knows whether x belongs to A or whether x does not belong to A. If x
does not belong to A we write
x ∉ A.
To illustrate some of the concepts, we assume that the reader is familiar
with the set of real numbers. Thus, if we say
R is the set of all real numbers,
then this is a well defined collection of objects. We point out that it is possible
to characterize the set of real numbers in a purely abstract manner based on
an axiomatic approach. We shall not do so here.
To illustrate a non-well defined collection of objects, consider the state-
ment "the set of all tall people in Ames, Iowa." This is clearly not precise
enough to be considered here.
We will agree that any set A may not contain any given element x more
than once unless we explicitly say so. Moreover, we assume that the concept
of "order" will play no role when representing elements of a set, unless we
say so. Thus, the sets A = {a, b, c} and B = {c, b, a} are to be viewed as being
exactly the same set.
We usually do not describe a set by listing every element between the
curly brackets { } as we did for set A above. A convenient method of charac-
terizing sets is as follows. Suppose that for each element x of a set A there is a
statement P(x) which is either true or false. We may then define a set B which
consists of all elements x ∈ A such that P(x) is true, and we may write
B = {x ∈ A: P(x) is true}.
For example, let A denote the set of all people who live in Ames, Iowa, and
let B denote the set of all males who live in Ames. We can write, then,
B = {x ∈ A: x is a male}.
When it is clear which set x belongs to, we sometimes write {x: P(x) is true}
(instead of, say, {x ∈ A: P(x) is true}).
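For finite sets this method of characterization has a direct computational analog. The sketch below, in Python, mirrors B = {x ∈ A: P(x) is true} with a set comprehension; the particular set A and the predicate P are our own illustrative choices, not part of the text.

```python
# B = {x in A : P(x) is true}, realized as a Python set comprehension.
A = {1, 2, 3, 4, 5, 6}

def P(x):
    return x % 2 == 0          # the statement "x is even"

B = {x for x in A if P(x)}
print(B)                       # {2, 4, 6}
print(B <= A)                  # True: every element of B belongs to A
```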
It is also necessary to consider a set which has no members. Since a set is
determined by its elements, there is only one such set which is called the
empty set, or the vacuous set, or the null set, or the void set and which is
denoted by ∅. Any set, A, consisting of one or more elements is said to be
non-empty or non-void. If A is non-void we write A ≠ ∅.
If A and B are sets and if every element of B also belongs to A, then we
say that B is a subset of A or A includes B, and we write B ⊂ A or A ⊃ B.
Furthermore, if B ⊂ A and if there is an x ∈ A such that x ∉ B, then we
say that B is a proper subset of A. Some texts make a distinction between
a proper subset and any subset by using the notation ⊂ and ⊆, respectively.
We shall not use the symbol ⊆ in this book. We note that if A is any set,
then ∅ ⊂ A. Also, ∅ ⊂ ∅. If B is not a subset of A, we write B ⊄ A or
A ⊅ B.

1.1.1. Example. Let R denote the set of all real numbers, let Z denote
the set of all integers, let J denote the set of all positive integers, and let Q
denote the set of all rational numbers. We could alternately describe the set
Z as
Z = {x ∈ R: x is an integer}.
Thus, for every x ∈ R, the statement x is an integer is either true or false.
We frequently also specify sets such as J in the following obvious manner,
J = {x ∈ Z: x = 1, 2, ...}.
We can specify the set Q as
Q = {x ∈ R: x = p/q, p, q ∈ Z, q ≠ 0}.
It is clear that ∅ ⊂ J ⊂ Z ⊂ Q ⊂ R, and that each of these subsets is a
proper subset. We note that ∅ ∉ J. ■

We now wish to state what is meant by equality of sets.

1.1.2. Definition. Two sets, A and B, are said to be equal if A ⊂ B and
B ⊂ A. In this case we write A = B. If two sets, A and B, are not equal,
we write A ≠ B. If x and y denote the same element of a set, we say that they
are equal and we write x = y. If x and y denote distinct elements of a set,
we write x ≠ y.

We emphasize that all definitions are "if and only if" statements. Thus, in
the above definition we should actually have said: A and B are equal if and
only if A ⊂ B and B ⊂ A. Since this is always understood, hereafter all
definitions will imply the "only if" portion. Thus, we simply say: two sets A
and B are said to be equal if A ⊂ B and B ⊂ A.
In Definition 1.1.2 we introduced two concepts of equality, one of equality
of sets and one of equality of elements. We shall encounter many forms of
equality throughout this book.
Now let X be a set and let A ⊂ X. The complement of subset A with
respect to X is the set of elements of X which do not belong to A. We denote
the complement of A with respect to X by CₓA. When it is clear that the com-
plement is with respect to X, we simply say the complement of A (instead of
the complement of A with respect to X), and simply write A⁻. Thus, we have
A⁻ = {x ∈ X: x ∉ A}. (1.1.3)
In every discussion involving sets, we will always have a given fixed set
in mind from which we take elements and subsets. We will call this set the
universal set, and we will usually denote this set by X.
Throughout the remainder of the present section, X always denotes an
arbitrary non-void fixed set.
We now establish some properties of sets.

1.1.4. Theorem. Let A, B, and C be subsets of X. Then
(i) if A ⊂ B and B ⊂ C, then A ⊂ C;
(ii) X⁻ = ∅;
(iii) ∅⁻ = X;
(iv) (A⁻)⁻ = A;
(v) A ⊂ B if and only if A⁻ ⊃ B⁻; and
(vi) A = B if and only if A⁻ = B⁻.
Proof. To prove (i), first assume that A is non-void and let x ∈ A. Since
A ⊂ B, x ∈ B, and since B ⊂ C, x ∈ C. Since x is arbitrary, every element
of A is also an element of C and so A ⊂ C. Finally, if A = ∅, then A ⊂ C
follows trivially.
The proofs of parts (ii) and (iii) follow immediately from (1.1.3).
To prove (iv), we must show that A ⊂ (A⁻)⁻ and (A⁻)⁻ ⊂ A. If A = ∅,
then clearly A ⊂ (A⁻)⁻. Now suppose that A is non-void. We note from
(1.1.3) that
(A⁻)⁻ = {x ∈ X: x ∉ A⁻}. (1.1.5)
If x ∈ A, it follows from (1.1.3) that x ∉ A⁻, and hence we have from
(1.1.5) that x ∈ (A⁻)⁻. This proves that A ⊂ (A⁻)⁻.
If (A⁻)⁻ = ∅, then A = ∅; otherwise we would have a contradiction by
what we have already shown, i.e., A ⊂ (A⁻)⁻. So let us assume that (A⁻)⁻
≠ ∅. If x ∈ (A⁻)⁻, it follows from (1.1.5) that x ∉ A⁻, and thus we have
x ∈ A in view of (1.1.3). Hence, (A⁻)⁻ ⊂ A.
We leave the proofs of parts (v) and (vi) as an exercise. ■

1.1.6. Exercise. Prove parts (v) and (vi) of Theorem 1.1.4.

The proofs given in parts (i) and (iv) of Theorem 1.1.4 are intentionally
quite detailed in order to demonstrate the exact procedure required to prove
containment and equality of sets. Frequently, the manipulations required to
prove some seemingly obvious statements are quite long. It is suggested that
the reader carry out all the details in the manipulations of the above exercise
and the exercises that follow.
Next, let A and B be subsets of X. We define the union of sets A and B,
denoted by A ∪ B, as the set of all elements that are in A or B; i.e.,
A ∪ B = {x ∈ X: x ∈ A or x ∈ B}.
When we say x ∈ A or x ∈ B, we mean x is in either A or in B or in both
A and B. This inclusive use of "or" is standard in mathematics and logic.
If A and B are subsets of X, we define their intersection to be the set of all
elements which belong to both A and B and denote the intersection by A ∩ B.
Specifically,
A ∩ B = {x ∈ X: x ∈ A and x ∈ B}.
If the intersection of two sets A and B is empty, i.e., if A ∩ B = ∅, we
say that A and B are disjoint.
For example, let X = {1, 2, 3, 4, 5}, let A = {1, 2}, let B = {3, 4, 5}, let
C = {2, 3}, and let D = {4, 5}. Then A⁻ = B, B⁻ = A, D ⊂ B, A ∪ B = X,
A ∩ B = ∅, A ∪ C = {1, 2, 3}, B ∩ D = D, A ∩ C = {2}, etc.
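The reader may verify such computations mechanically. In the sketch below (Python, using its built-in set type) the operators |, &, and - play the roles of union, intersection, and difference, and the complement with respect to X is computed as X - A; the data are those of the preceding paragraph.

```python
# The worked example above, replayed with Python sets.
X = {1, 2, 3, 4, 5}
A, B, C, D = {1, 2}, {3, 4, 5}, {2, 3}, {4, 5}

print(X - A == B)           # complement of A with respect to X equals B
print(D <= B)               # D is a subset of B
print(A | B == X)           # A union B equals X
print(A & B == set())       # A and B are disjoint
print(A | C == {1, 2, 3})
print(B & D == D)
print(A & C == {2})         # every line prints True
```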
In the next result we summarize some of the important properties of
union and intersection of sets.

1.1.7. Theorem. Let A, B, and C be subsets of X. Then
(i) A ∩ B = B ∩ A;
(ii) A ∪ B = B ∪ A;
(iii) A ∩ ∅ = ∅;
(iv) A ∪ ∅ = A;
(v) A ∩ X = A;
(vi) A ∪ X = X;
(vii) A ∩ A = A;
(viii) A ∪ A = A;
(ix) A ∪ A⁻ = X;
(x) A ∩ A⁻ = ∅;
(xi) A ∩ B ⊂ A;
(xii) A ∩ B = A if and only if A ⊂ B;
(xiii) A ⊂ A ∪ B;
(xiv) A = A ∪ B if and only if B ⊂ A;
(xv) (A ∩ B) ∩ C = A ∩ (B ∩ C);
(xvi) (A ∪ B) ∪ C = A ∪ (B ∪ C);
(xvii) A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C);
(xviii) (A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C);
(xix) (A ∪ B)⁻ = A⁻ ∩ B⁻; and
(xx) (A ∩ B)⁻ = A⁻ ∪ B⁻.
Proof. We only prove part (xviii) of this theorem, again as an illustration of
the manipulations involved. We will first show that (A ∩ B) ∪ C ⊂ (A ∪ C)
∩ (B ∪ C), and then we show that (A ∩ B) ∪ C ⊃ (A ∪ C) ∩ (B ∪ C).
Clearly, if (A ∩ B) ∪ C = ∅, the assertion is true. So let us assume that
(A ∩ B) ∪ C ≠ ∅, and let x be any element of (A ∩ B) ∪ C. Then x ∈
A ∩ B or x ∈ C. Suppose x ∈ A ∩ B. Then x belongs to both A and B, and
hence x ∈ A ∪ C and x ∈ B ∪ C. From this it follows that x ∈ (A ∪ C)
∩ (B ∪ C). On the other hand, let x ∈ C. Then x ∈ A ∪ C and x ∈ B ∪ C,
and hence x ∈ (A ∪ C) ∩ (B ∪ C). Thus, if x ∈ (A ∩ B) ∪ C, then
x ∈ (A ∪ C) ∩ (B ∪ C), and we have
(A ∩ B) ∪ C ⊂ (A ∪ C) ∩ (B ∪ C). (1.1.8)
To show that (A ∩ B) ∪ C ⊃ (A ∪ C) ∩ (B ∪ C) we need to prove the
assertion only when (A ∪ C) ∩ (B ∪ C) ≠ ∅. So let x be any element of
(A ∪ C) ∩ (B ∪ C). Then x ∈ A ∪ C and x ∈ B ∪ C. Since x ∈ A ∪ C,
then x ∈ A or x ∈ C. Furthermore, x ∈ B ∪ C implies that x ∈ B or
x ∈ C. We know that either x ∈ C or x ∉ C. If x ∈ C, then x ∈ (A ∩ B)
∪ C. If x ∉ C, then it follows from the above comments that x ∈ A and
also x ∈ B. Then x ∈ A ∩ B, and hence x ∈ (A ∩ B) ∪ C. Thus, if x ∉ C,
then x ∈ (A ∩ B) ∪ C. Since this exhausts all the possibilities, we conclude
that
(A ∪ C) ∩ (B ∪ C) ⊂ (A ∩ B) ∪ C. (1.1.9)
From (1.1.8) and (1.1.9) it follows that (A ∪ C) ∩ (B ∪ C) = (A ∩ B)
∪ C. ■

1.1.10. Exercise. Prove parts (i) through (xvii) and parts (xix) and (xx)
of Theorem 1.1.7.

In view of part (xvi) of Theorem 1.1.7, there is no ambiguity in writing
A ∪ B ∪ C. Extending this concept, let n be any positive integer and let
A₁, A₂, ..., Aₙ denote subsets of X. The set A₁ ∪ A₂ ∪ ... ∪ Aₙ is defined
to be the set of all x ∈ X which belong to at least one of the subsets Aᵢ,
and we write
⋃ᵢ₌₁ⁿ Aᵢ = A₁ ∪ A₂ ∪ ... ∪ Aₙ = {x ∈ X: x ∈ Aᵢ for some i = 1, ..., n}.
Similarly, by part (xv) of Theorem 1.1.7, there is no ambiguity in writing
A ∩ B ∩ C. We define
⋂ᵢ₌₁ⁿ Aᵢ = A₁ ∩ A₂ ∩ ... ∩ Aₙ = {x ∈ X: x ∈ Aᵢ for all i = 1, ..., n}.
That is, ⋂ᵢ₌₁ⁿ Aᵢ consists of those members of X which belong to all the
subsets A₁, A₂, ..., Aₙ.
We will consider the union and the intersection of an infinite number of
subsets Aᵢ at a later point in the present section.
The following is a generalization of parts (xix) and (xx) of Theorem
1.1.7.

1.1.11. Theorem. Let A₁, ..., Aₙ be subsets of X. Then
(i) [⋃ᵢ₌₁ⁿ Aᵢ]⁻ = ⋂ᵢ₌₁ⁿ Aᵢ⁻, and (1.1.12)
(ii) [⋂ᵢ₌₁ⁿ Aᵢ]⁻ = ⋃ᵢ₌₁ⁿ Aᵢ⁻. (1.1.13)

1.1.14. Exercise. Prove Theorem 1.1.11.

The results expressed in Eqs. (1.1.12) and (1.1.13) are usually referred
to as De Morgan's laws. We will see later in this section that these laws hold
under more general conditions.
Next, let A and B be two subsets of X. We define the difference of B and
A, denoted (B − A), as the set of elements in B which are not in A, i.e.,
B − A = {x ∈ X: x ∈ B and x ∉ A}.
We note here that A is not required to be a subset of B. It is clear that
B − A = B ∩ A⁻.
Now let A and B again be subsets of the set X. The symmetric difference
of A and B is denoted by A Δ B and is defined as
A Δ B = (A − B) ∪ (B − A).
The following properties follow immediately.

1.1.15. Theorem. Let A, B, and C denote subsets of X. Then
(i) A Δ B = B Δ A;
(ii) A Δ B = (A ∪ B) − (A ∩ B);
(iii) A Δ A = ∅;
(iv) A Δ ∅ = A;
(v) A Δ (B Δ C) = (A Δ B) Δ C;
(vi) A ∩ (B Δ C) = (A ∩ B) Δ (A ∩ C); and
(vii) A Δ B ⊂ (A Δ C) ∪ (C Δ B).

1.1.16. Exercise. Prove Theorem 1.1.15.
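Because each identity in Theorem 1.1.15 is a statement about finitely many membership possibilities, it can be checked exhaustively on a small universal set. The following Python sketch does so (the operator ^ on Python sets is precisely the symmetric difference); such a check is a sanity test, not a substitute for the proofs requested in the exercise.

```python
# Brute-force check of parts (i), (ii), (v), (vi), (vii) of Theorem 1.1.15
# over all subsets of a three-element universal set X.
from itertools import chain, combinations

X = [0, 1, 2]
subsets = [set(s) for s in
           chain.from_iterable(combinations(X, r) for r in range(len(X) + 1))]

for A in subsets:
    for B in subsets:
        assert A ^ B == B ^ A                        # (i)
        assert A ^ B == (A | B) - (A & B)            # (ii)
        for C in subsets:
            assert A ^ (B ^ C) == (A ^ B) ^ C        # (v)
            assert A & (B ^ C) == (A & B) ^ (A & C)  # (vi)
            assert A ^ B <= (A ^ C) | (C ^ B)        # (vii)
print("Theorem 1.1.15 holds on all subsets of", set(X))
```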



In passing, we point out that the use of Venn diagrams is highly useful in
visualizing properties of sets; however, under no circumstances should such
diagrams take the place of a proof. In Figure A we illustrate the concepts of
union, intersection, difference, and symmetric difference of two sets, and the
complement of a set, by making use of Venn diagrams. Here, the shaded
regions represent the indicated sets.

1.1.17. Figure A. Venn diagrams.

1.1.18. Definition. A non-void set A is said to be finite if A contains n
distinct elements, where n is some positive integer; such a set A is said to be
of order n. The null set is defined to be finite with order zero. A set consisting
of exactly one element, say A = {a}, is called a singleton or the singleton of a.
If a set A is not finite, then we say that A is infinite.

In Section 1.2 we will further categorize infinite sets as being countable
or uncountable.
Next, we need to consider sets whose elements are sets themselves. For
example, if A, B, and C are subsets of X, then the collection 𝒜 = {A, B, C}
is a set whose elements are A, B, and C. We usually call a set whose elements
are subsets of X a family of subsets of X or a collection of subsets of X.
We will usually employ a hierarchical system of notation where lower
case letters, e.g., a, b, c, are elements of X, upper case letters, e.g., A, B, C,
are subsets of X, and script letters, e.g., 𝒜, ℬ, 𝒞, are families of subsets of X.
We could, of course, continue this process and consider a set whose elements
are families of subsets, e.g., {𝒜, ℬ, 𝒞}.
In connection with the above comments, we point out that the empty
set, ∅, is a subset of X. It is possible to form a non-empty set whose only
element is the empty set, i.e., {∅}. In this case, {∅} is a singleton. We see that
∅ ∈ {∅} and ∅ ⊂ {∅}.
In principle, we could also consider sets made up of both elements of X
and subsets of X. For example, if x ∈ X and A ⊂ X, then {x, A} is a valid
set. However, we shall not make use of sets of this nature in this book.
There is a special family of subsets of X to which we give a special name.

1.1.19. Definition. Let A be any subset of X. We define the power class
of A or the power set of A to be the family of all subsets of A. We denote the
power class of A by 𝒫(A). Specifically,
𝒫(A) = {B: B ⊂ A}.

1.1.20. Example. The power class of the empty set, 𝒫(∅) = {∅}, i.e.,
the singleton of ∅. The power class of a singleton, 𝒫({a}) = {∅, {a}}. For the
set A = {a, b}, 𝒫(A) = {∅, {a}, {b}, {a, b}}. In general, if A is a finite set with
n elements, then 𝒫(A) contains 2ⁿ elements. ■
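The power class of a finite set is easy to generate by machine, which makes Example 1.1.20 convenient to experiment with. A minimal Python sketch follows; the helper name power_class is ours, and frozenset is used because Python sets cannot themselves contain mutable sets.

```python
# Generating P(A), the family of all subsets of a finite set A.
from itertools import chain, combinations

def power_class(A):
    items = list(A)
    return {frozenset(s) for s in
            chain.from_iterable(combinations(items, r)
                                for r in range(len(items) + 1))}

A = {"a", "b"}
P = power_class(A)
print(P)                        # the 4 subsets {}, {'a'}, {'b'}, {'a','b'}
print(len(P) == 2 ** len(A))    # True: P(A) has 2^n elements
```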

Before proceeding further, it should be pointed out that a free and uncrit-
ical use of set theory can lead to contradictions and that set theory has had
a careful development with various devices used to exclude the contradictions.
Roughly speaking, contradictions arise when one uses sets which are "too
big," such as trying to speak of a set which contains everything. In all of our
subsequent discussions we will keep away from these contradictions by always
having some set or space X fixed for a given discussion and by considering
only sets whose elements are elements of X, or sets (collections) whose ele-
ments are subsets of X, or sets (families) whose elements are collections of
subsets of X, etc.
Let us next consider ordered sets. Above, we defined set in such a manner
that the ordering of the elements is immaterial, and furthermore that each
element is distinct. Thus, if a and b are elements of X, then {a, b} = {b, a};
i.e., there is no preference given to a or b. Furthermore, we have {a, a, b}
= {a, b}. In this case we sometimes speak of an unordered pair {a, b}.
Frequently, we will need to consider the ordered pair (a, b) (a and b need
not belong to the same set), where we distinguish between the first element a
and the second element b. In this case (a, b) = (u, v) if and only if u = a and
v = b. Thus, (a, b) ≠ (b, a) if a ≠ b. Also, we will consider ordered triplets
(a, b, c), ordered quadruplets (a, b, c, d), etc., where we need to distinguish
between the first element, second element, third element, fourth element,
etc. Ordered pairs, ordered triplets, ordered quadruplets, etc., are examples of
ordered sets.
We point out here that our characterization of ordered sets is not axiom-
atic, since we are assuming that the reader knows what is meant by the first
element, second element, third element, etc. (However, it is possible to define
ordered sets in a totally abstract fashion without assuming this simple fact.
We shall forego these subtle distinctions and accept the preceding as a defi-
nition.)
Now let X and Y be two non-void sets. We define the Cartesian or direct
product of X and Y, denoted by X × Y, as the set of all ordered pairs whose
first element belongs to X and whose second element belongs to Y. Thus,
X × Y = {(x, y): x ∈ X, y ∈ Y}. (1.1.21)
Next, let X₁, ..., Xₙ denote n arbitrary non-void sets. We similarly
define the (n-fold) Cartesian product of X₁, ..., Xₙ, denoted by X₁ × X₂
× ... × Xₙ, as
X₁ × X₂ × ... × Xₙ = {(x₁, x₂, ..., xₙ):
x₁ ∈ X₁, x₂ ∈ X₂, ..., xₙ ∈ Xₙ}. (1.1.22)
We call xᵢ the ith element of the ordered set (x₁, ..., xₙ) ∈ X₁ × X₂ × ...
× Xₙ, i = 1, ..., n. Here again, two ordered sets (x₁, ..., xₙ) and (y₁,
..., yₙ) are said to be equal if and only if xᵢ = yᵢ, i = 1, ..., n.
In the following example, the symbol ≜ means equal by definition.

1.1.23. Example. Let R be the set of all real numbers. We denote the
Cartesian product R × R by R² ≜ R × R. Thus, if x, y ∈ R, the ordered
pair (x, y) ∈ R × R. We may interpret (x, y) geometrically as being the
coordinates of a point in the plane, x being the first coordinate and y the
second coordinate. ■

1.1.24. Example. Let A = {0, 1}, and let B = {a, b, c}. Then
A × B = {(0, a), (0, b), (0, c), (1, a), (1, b), (1, c)}
and
B × A = {(a, 0), (a, 1), (b, 0), (b, 1), (c, 0), (c, 1)}.
From this example it follows that, in general, if A and B are distinct sets,
then
A × B ≠ B × A. ■
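Cartesian products of finite sets can likewise be enumerated directly; the sketch below reproduces Example 1.1.24 with Python's itertools.product and confirms that A × B and B × A differ.

```python
# The Cartesian products of Example 1.1.24.
from itertools import product

A = [0, 1]
B = ["a", "b", "c"]

AxB = set(product(A, B))
BxA = set(product(B, A))
print(sorted(AxB))   # [(0,'a'), (0,'b'), (0,'c'), (1,'a'), (1,'b'), (1,'c')]
print(AxB == BxA)    # False: in general A x B != B x A
```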

Next, we consider some generalizations to an ordered set. To this end,
let I denote any non-void set which we call an index set. Now for each α ∈ I,
suppose there is a unique Aα ⊂ X. We call {Aα: α ∈ I} an indexed family of
sets. This notation requires some clarification. Strictly speaking, the
notation {Aα: α ∈ I} would normally indicate that none of the sets Aα,
α ∈ I may be repeated. However, in the case of an indexed family we agree to
permit the possibility that the sets Aα, α ∈ I need not be distinct.
We define an indexed set in a similar manner. Let I be an index set, and
for each α ∈ I let there be a unique element xα ∈ X. Then the set {xα: α ∈ I}
is called an indexed set. Here again, we agree to permit the possibility that the
elements xα, α ∈ I need not be distinct. Clearly, if I is a finite non-void set,
then an indexed set is simply an ordered set.
In the next definition, and throughout the remainder of this section, J
denotes the set of positive integers.

1.1.25. Definition. A sequence is an indexed set whose index set is J. A
sequence of sets is an indexed family of sets whose index set is J.

We usually abbreviate the sequence {xₙ ∈ X: n ∈ J} by {xₙ}, when no
possibility for confusion exists. (Even though the same notation is used for
the sequence {xₙ} and the singleton of xₙ, the meaning as to which is meant
will always be clear from context.) Some authors write {xₙ}ₙ₌₁^∞ to indicate
that the index set of the sequence is J. Also, some authors allow the index set
of a sequence to be finite.
We are now in a position to consider the following additional generaliza-
tions.

1.1.26. Definition. Let {Aα: α ∈ I} be an indexed family of sets, and let
K be any subset of I. If K is non-void, we define
⋃α∈K Aα = {x ∈ X: x ∈ Aα for some α ∈ K}
and
⋂α∈K Aα = {x ∈ X: x ∈ Aα for all α ∈ K}.
If K = ∅, we define ⋃α∈∅ Aα = ∅ and ⋂α∈∅ Aα = X.

The union and intersection of families of sets which are not necessarily
indexed are defined in a similar fashion. Thus, if ℱ is any non-void family of
subsets of X, then we define
⋃F∈ℱ F = {x ∈ X: x ∈ F for some F ∈ ℱ}
and
⋂F∈ℱ F = {x ∈ X: x ∈ F for all F ∈ ℱ}.
When, in Definition 1.1.26, K is of the form K = {k, k + 1, k + 2, ...},
where k is an integer, we sometimes write ⋃ₙ₌ₖ^∞ Aₙ and ⋂ₙ₌ₖ^∞ Aₙ.

1.1.27. Example. Let X = R, the set of real numbers, and let I = {x ∈ R:
0 ≤ x ≤ 1}. Let Aα = {x ∈ R: 0 ≤ x ≤ α} for all α ∈ I. Then ⋃α∈I Aα = I
and ⋂α∈I Aα = {0}, i.e., the singleton containing only the element 0. ■
1.1.28. Example. Let X = R, the set of real numbers, and let I = J. Let
Aₙ = {x: −n < x < n}. Then ⋃ₙ₌₁^∞ Aₙ = R and ⋂ₙ₌₁^∞ Aₙ =
{x: −1 < x < 1} = {x: |x| < 1}.
Let Bₙ = {x ∈ R: −1/n < x < (n + 1)/n}. Then ⋃ₙ₌₁^∞ Bₙ =
{x: −1 < x < 2} and ⋂ₙ₌₁^∞ Bₙ = {x: 0 ≤ x ≤ 1}. ■

The reader is now in a position to prove the following results.

1.1.29. Theorem. Let {Aα: α ∈ I} be an indexed family of sets. Let B be
any subset of X, and let K be any subset of I. Then
(i) B ∩ [⋃α∈K Aα] = ⋃α∈K [B ∩ Aα];
(ii) B ∪ [⋂α∈K Aα] = ⋂α∈K [B ∪ Aα];
(iii) B − ⋃α∈K Aα = ⋂α∈K (B − Aα);
(iv) B − ⋂α∈K Aα = ⋃α∈K (B − Aα);
(v) [⋃α∈K Aα]⁻ = ⋂α∈K Aα⁻; and
(vi) [⋂α∈K Aα]⁻ = ⋃α∈K Aα⁻.
1.1.30. Exercise. Prove Theorem 1.1.29.
Parts (v) and (vi) of Theorem 1.1.29 are called De Morgan's laws.
We conclude the present section with the following:
1.1.31. Definition. Let ℱ be any family of subsets of X. ℱ is said to be a
family of disjoint sets if for all A, B ∈ ℱ such that A ≠ B, A ∩ B = ∅.
A sequence of sets {Eₙ} is said to be a sequence of disjoint sets if for every m,
n ∈ J such that m ≠ n, Eₘ ∩ Eₙ = ∅.
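For a finite index set, the unions and intersections of Definition 1.1.26 and the disjointness test of Definition 1.1.31 reduce to simple folds over the family. The following sketch (our own arrangement, in Python) illustrates both; the family chosen is arbitrary.

```python
# Union and intersection over an indexed family {A_k : k in K}, K = {1, 2, 3},
# plus a pairwise-disjointness test in the sense of Definition 1.1.31.
from functools import reduce

family = {1: {0, 1}, 2: {1, 2}, 3: {2, 3}}
union = reduce(set.union, family.values(), set())
inter = reduce(set.intersection, family.values())
print(union, inter)                          # {0, 1, 2, 3} set()

def pairwise_disjoint(sets):
    sets = list(sets)
    return all(a.isdisjoint(b)
               for i, a in enumerate(sets) for b in sets[i + 1:])

print(pairwise_disjoint([{0}, {1}, {2}]))    # True
print(pairwise_disjoint(family.values()))    # False: A_1 and A_2 share 1
```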
1.2. FUNCTIONS

We first give the definition of a function in a set theoretic manner. Then
we discuss the meaning of function in more intuitive terms.

1.2.1. Definition. Let X and Y be non-void sets. A function f from X into
Y is a subset of X × Y such that for every x ∈ X there is one and only one
y ∈ Y (i.e., there is a unique y ∈ Y) such that (x, y) ∈ f. The set X is called
the domain of f (or the domain of definition of f), and we say that f is defined
on X. The set {y ∈ Y: (x, y) ∈ f for some x ∈ X} is called the range of f
and is denoted by ℛ(f). For each (x, y) ∈ f, we call y the value of f at x

and denote it by f(x). We sometimes write f: X → Y to denote the function
f from X into Y.

The terms mapping, map, operator, transformation, and function are used
interchangeably. When using the term mapping, we usually say "a mapping
of X into Y." Although the distinction between the words "of X" and "from
X" is immaterial, as we shall see, the wording "into Y" becomes important
as opposed to the wording "onto Y," which we will encounter later.
Sometimes it is convenient not to insist that the domain of definition of f
be all of X; i.e., a function is sometimes defined on a subset of X rather than
on all of X. In any case, the domain of definition of f is denoted by 𝒟(f) ⊂ X.
Unless specified otherwise, we shall always assume that 𝒟(f) = X.
Intuitively, a function f is a "rule" whereby for each x ∈ X a unique y ∈ Y
is assigned to x. When viewed in this manner, the term mapping is quite
descriptive. However, defining a function as a "rule" involves usage of yet
another undefined term.
Concerning functions, some additional comments are in order.
1. So-called "multivalued functions" are not allowed by the above
definition. They will be treated later under the topic of relations
(Section 1.3).
2. The set X (or Y) may be the Cartesian product of sets, e.g., X = X₁
× X₂ × ... × Xₙ. In this case we think of f as being a function of n
variables. We write f(x₁, ..., xₙ) to denote the value of f at (x₁, ...,
xₙ) ∈ X = X₁ × ... × Xₙ.
3. It is important that the distinction between a function and the value
of a function be clearly understood. The value of a function, f(x),
is an element of Y. The function f is a much larger entity, and it is
to be thought of as a single object. Note that f ∈ 𝒫(X × Y) (the power
set of X × Y), but not every element of 𝒫(X × Y) is a function. The
set of all functions from X into Y is a subset of 𝒫(X × Y) and is some-
times denoted by Y^X.

1.2.2. Example. Let A and B be the sets defined in Example 1.1.24. Let
f be the subset of A × B given by f = {(0, a), (1, b)}. Then f is a function from
A into B. We see that f(0) = a and f(1) = b. The range of f is the set {a, b},
which is a proper subset of B. ■
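Definition 1.2.1 makes a function a set of ordered pairs in which each first element occurs exactly once; a Python dict enforces exactly this uniqueness of first elements, so Example 1.2.2 can be rendered as follows (the variable names are our own).

```python
# The function f = {(0, a), (1, b)} of Example 1.2.2 as a dict.
f = {0: "a", 1: "b"}
print(f[0], f[1])            # a b  (the values of f at 0 and 1)
domain = set(f)              # {0, 1}, the set A
rng = set(f.values())        # {'a', 'b'}, the range R(f)
print(domain, rng)
B = {"a", "b", "c"}
print(rng < B)               # True: R(f) is a proper subset of B
```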

Although we have defined a function as being a set, we usually characterize
a function according to a rule as shown, for example, in the following.

1.2.3. Example. Let R denote the real numbers, and let f be a function from
R into R whose value at each x ∈ R is given by f(x) = sin x. The function
f is the sine function. Expressed explicitly as a set, we see that f = {(x, y):
y = sin x}. Note that the subset {(x, y): x = sin y} ⊂ R × R is not a
function. ■

The preceding example also illustrates the notion of the graph of a
function. Let X and Y denote the set of real numbers, let X × Y denote
their Cartesian product, and let f be a function from X into Y. The collection
of ordered pairs (x, f(x)) in X × Y is called the graph of the function f.
Thus, a subset G of X × Y is the graph of a function defined on X if and
only if for each x ∈ X there is a unique ordered pair in G whose first element
is x. In fact, the graph of a function and the function itself are one and the
same thing.
Since functions are defined as sets, equality of functions is to be interpreted
in the sense of equality of sets. With this in mind, the reader will have no
difficulty in proving the following.

1.2.4. Theorem. Two mappings f and g of X into Y are equal if and only
if f(x) = g(x) for every x ∈ X.

1.2.5. Exercise. Prove Theorem 1.2.4.

We now wish to further characterize and classify functions. If f is a
function from X into Y, we denote the range of f by ℛ(f). In general, ℛ(f)
⊂ Y may or may not be a proper subset of Y. Thus, we have the following
definition.

1.2.6. Definition. Let f be a function from X into Y. If ℛ(f) = Y, then
f is said to be surjective or a surjection, and we say that f maps X onto Y.
If f is a function such that for every x₁, x₂ ∈ X, f(x₁) = f(x₂) implies that
x₁ = x₂, then f is said to be injective or a one-to-one mapping, or an injection.
If f is both injective and surjective, we say that f is bijective or one-to-one and
onto, or a bijection.

Let's go over this again. Every function f: X → Y is a mapping of X into
Y. If the range of f happens to be all of Y, then we say f maps X onto Y.
For each x ∈ X, there is always a unique y ∈ Y such that y = f(x). However,
there may be distinct elements x₁ and x₂ in X such that f(x₁) = f(x₂). If
there is a unique x ∈ X such that f(x) = y for each y ∈ ℛ(f), then we say
that f is a one-to-one mapping. If f maps X onto Y and is one-to-one, we say
that f is one-to-one and onto. In Figure B an attempt is made to illustrate
these concepts pictorially. In this figure the dots denote elements of sets and
the arrows indicate the rules of the various functions.
The reader should commit to memory the following associations: surjec-
tive ↔ onto; injective ↔ one-to-one; bijective ↔ one-to-one and onto.
Frequently, the term one-to-one is abbreviated as (1-1).
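For a function on a finite set represented as a dict, the classifications of Definition 1.2.6 can be tested directly. A minimal sketch follows; the helper names are our own.

```python
# Tests for the classifications of Definition 1.2.6.
def is_injective(f):
    # no two elements of the domain share the same value
    return len(set(f.values())) == len(f)

def is_surjective(f, Y):
    # the range of f is all of the codomain Y
    return set(f.values()) == set(Y)

def is_bijective(f, Y):
    return is_injective(f) and is_surjective(f, Y)

Y = {"a", "b", "c"}
f = {1: "a", 2: "b", 3: "a"}
g = {1: "a", 2: "b", 3: "c"}
print(is_injective(f), is_surjective(f, Y))   # False False
print(is_bijective(g, Y))                     # True
```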

[Figure B shows four mappings: f₁: X₁ → Y₁ is into, f₂: X₂ → Y₂ is onto,
f₃: X₃ → Y₃ is (1-1), and f₄: X₄ → Y₄ is bijective.]

1.2.7. Figure B. Illustration of different types of mappings.

We now prove the following important but obvious result.

1.2.8. Theorem. Let f be a function from X into Y, and let Z = ℛ(f),
the range of f. Let g denote the set {(y, x) ∈ Z × X: (x, y) ∈ f}. Then, clearly,
g is a subset of Z × X, and f is injective if and only if g is a function from Z
into X.
Proof. Let f be injective, and let y ∈ Z. Since y ∈ ℛ(f), there is an x ∈ X
such that (x, y) ∈ f, and hence (y, x) ∈ g. Now suppose there is another
x₁ ∈ X such that (y, x₁) ∈ g. Then (x₁, y) ∈ f. Since f is injective and
y = f(x) = f(x₁), this implies that x = x₁, and so x is unique. This means
that g is a function from Z into X.
Conversely, suppose g is a function from Z into X. Let x₁, x₂ ∈ X be such
that f(x₁) = f(x₂). This implies that (x₁, f(x₁)) and (x₂, f(x₂)) ∈ f, and so
(f(x₁), x₁) and (f(x₂), x₂) ∈ g. Since f(x₁) = f(x₂) and g is a function,
we must have x₁ = x₂. Therefore, f is injective. ■
The above result motivates the following definition.

1.2.9. Definition. Let f be an injective mapping of X into Y. Then we say
that f has an inverse, and we call the mapping g defined in Theorem 1.2.8
the inverse of f. Hereafter, we will denote the inverse of f by f⁻¹.

Clearly, if f has an inverse, then f⁻¹ is a mapping from ℛ(f) onto X.

1.2.10. Theorem. Let f be an injective mapping of X into Y. Then
(i) f is a one-to-one mapping of X onto ℛ(f);
(ii) f⁻¹ is a one-to-one mapping of ℛ(f) onto X;
(iii) for every x ∈ X, f⁻¹(f(x)) = x; and
(iv) for every y ∈ ℛ(f), f(f⁻¹(y)) = y.

1.2.11. Exercise. Prove Theorem 1.2.10.


Note that in the above definition, the domain of f⁻¹ is ℛ(f), which need
not be all of Y.
Some texts insist that in order for a function f to have an inverse, it must
be bijective. Thus, when reading the literature it is important to note which
definition of f⁻¹ the author has in mind. (Note that an injective function
f: X → Y is a bijective function from X onto ℛ(f).)

1.2.12. Example. Let X = Y = R, the set of real numbers. Let f: X → Y
be given by f(x) = x³ for every x ∈ R. Then f is a (1-1) mapping of X onto Y,
and f⁻¹(y) = y^(1/3) for all y. ■

1.2.13. Example. Let X = Y = J, the set of positive integers. Let f: X
→ Y be given by f(n) = n + 3 for all n ∈ J. Then f is a (1-1) mapping of X
into Y. However, the range of f, ℛ(f) = {y ∈ Y: y ≥ 4} = {4, 5, ...} ≠ Y.
Therefore, f has an inverse, f⁻¹, which is defined only on ℛ(f) and not on all
of Y. In this case we have f⁻¹(y) = y − 3 for all y ∈ ℛ(f). ■

1.2.14. Example. Let X = Y = R, the set of all real numbers. Let f: X
→ Y be given by f(x) = x/(1 + |x|) for all x ∈ R. Then f is an injective mapping,
and ℛ(f) = {y ∈ Y: −1 < y < +1}. Also, f⁻¹ is a mapping from ℛ(f)
into R given by f⁻¹(y) = y/(1 − |y|) for all y ∈ ℛ(f). ■
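Example 1.2.14 is convenient for checking parts (iii) and (iv) of Theorem 1.2.10 numerically. The sketch below samples a few points; the tolerance is an artifact of floating-point arithmetic, not of the mathematics.

```python
# f(x) = x / (1 + |x|) is injective with range (-1, 1), and
# f_inv(y) = y / (1 - |y|) satisfies f_inv(f(x)) = x and f(f_inv(y)) = y.
def f(x):
    return x / (1 + abs(x))

def f_inv(y):
    return y / (1 - abs(y))

for x in [-10.0, -1.0, 0.0, 0.5, 3.0]:
    y = f(x)
    assert -1 < y < 1                      # y lies in the range of f
    assert abs(f_inv(y) - x) < 1e-12       # Theorem 1.2.10(iii)
    assert abs(f(f_inv(y)) - y) < 1e-12    # Theorem 1.2.10(iv)
print("inverse relations verified on the sample points")
```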

Next, let X, Y, and Z be non-void sets. Suppose that f: X → Y and g: Y
→ Z. For each x ∈ X, we have f(x) ∈ Y and g(f(x)) ∈ Z. Since f and g
are mappings from X into Y and from Y into Z, respectively, it follows that
for each x ∈ X there is one and only one element g(f(x)) ∈ Z. Hence, the
set
{(x, z) ∈ X × Z: z = g(f(x)), x ∈ X} (1.2.15)
is a function from X into Z. We call this function the composite function of
g and f and denote it by g ∘ f. The value of g ∘ f at x is given by
(g ∘ f)(x) = g ∘ f(x) ≜ g(f(x)).
In Figure C, a pictorial interpretation of a composite function is given.

1.2.17. Theorem. If f is a mapping of a set X onto a set Y and g is a mapping
of the set Y onto a set Z, then g ∘ f is a mapping of X onto Z.
Proof. In order to show that g ∘ f is an onto mapping we must show that
for any z ∈ Z there exists an x ∈ X such that g(f(x)) = z. If z ∈ Z, then since
g is a mapping of Y onto Z, there is an element y ∈ Y such that g(y) = z.
Furthermore, since f is a mapping of X onto Y, there is an x ∈ X such that
f(x) = y. Since g ∘ f(x) = g(f(x)) = g(y) = z, it readily follows that g ∘ f
is a mapping of X onto Z, which proves the theorem. ■
1.2.16. Figure C. Illustration of a composite function.

We also have

1.2.18. Theorem. If f is a (1-1) mapping of a set X onto a set Y, and if g
is a (1-1) mapping of the set Y onto a set Z, then g ∘ f is a (1-1) mapping of X
onto Z.

1.2.19. Exercise. Prove Theorem 1.2.18.

Next we prove:

1.2.20. Theorem. If f is a (1-1) mapping of a set X onto a set Y, and if
g is a (1-1) mapping of Y onto a set Z, then (g ∘ f)⁻¹ = f⁻¹ ∘ g⁻¹.
Proof. Let z ∈ Z. Then there exists an x ∈ X such that g ∘ f(x) = z, and
hence (g ∘ f)⁻¹(z) = x. Also, since g ∘ f(x) = g(f(x)) = z, it follows that
g⁻¹(z) = f(x), from which we have f⁻¹(g⁻¹(z)) = x. But f⁻¹(g⁻¹(z)) =
f⁻¹ ∘ g⁻¹(z), and since this is equal to x, we have f⁻¹ ∘ g⁻¹(z) = (g ∘ f)⁻¹(z).
Since z is arbitrary, the theorem is proved. ■

Note carefully that in Theorem 1.2.20, f is a mapping of X onto Y. If it
had simply been an injective mapping, the composite function f⁻¹ ∘ g⁻¹
may not be defined. That is, the range of g⁻¹ is Y; however, the domain of
f⁻¹ is ℛ(f). Clearly, the domain of f⁻¹ must include the range of g⁻¹ in order
that the composition f⁻¹ ∘ g⁻¹ be defined.

1.2.21. Example. Let A = {r, s, t, u}, B = {u, v, w, x}, and C = {w, x, y,
z}. Let the function f: A → B be defined as
f = {(r, u), (s, w), (t, v), (u, x)}.
We find it convenient to represent this function in the following way:
f = (r s t u)
    (u w v x).
That is, the top row identifies the domain of f, and the bottom row contains
each unique element in the range of f directly below the appropriate element
in the domain. Clearly, this representation can be used for any function
defined on a finite set. In a similar fashion, let the function g: B → C be
defined as
g = (u v w x)
    (x w z y).
Clearly, both f and g are bijective. Also, g ∘ f is the (1-1) mapping of A onto
C given by
g ∘ f = (r s t u)
        (x z w y).
Furthermore,
f⁻¹ = (u w v x)      g⁻¹ = (x w z y)
      (r s t u),           (u v w x),
and
(g ∘ f)⁻¹ = (x z w y)
            (r s t u).
Now
f⁻¹ ∘ g⁻¹ = (x z w y)
            (r s t u),
i.e., f⁻¹ ∘ g⁻¹ = (g ∘ f)⁻¹. ■
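Example 1.2.21 can be replayed with dicts, inverting a bijection by swapping its ordered pairs; the final comparison confirms the identity (g ∘ f)⁻¹ = f⁻¹ ∘ g⁻¹ of Theorem 1.2.20 for these particular functions.

```python
# The functions of Example 1.2.21 as dicts.
f = {"r": "u", "s": "w", "t": "v", "u": "x"}   # f : A -> B
g = {"u": "x", "v": "w", "w": "z", "x": "y"}   # g : B -> C

g_of = {a: g[f[a]] for a in f}                 # g o f : A -> C
print(g_of)                                    # {'r':'x','s':'z','t':'w','u':'y'}

f_inv = {v: k for k, v in f.items()}           # inverse of a bijection:
g_inv = {v: k for k, v in g.items()}           # swap each ordered pair
finv_o_ginv = {c: f_inv[g_inv[c]] for c in g_inv}
gof_inv = {v: k for k, v in g_of.items()}
print(gof_inv == finv_o_ginv)                  # True
```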

The reader can prove the next result readily.

1.2.22. Theorem. Let W, X, Y, and Z be non-void sets. If f is a mapping
of set W into set X, if g is a mapping of X into set Y, and if h is a mapping
of Y into set Z (sets W, X, Y, Z are not necessarily distinct), then h ∘ (g ∘ f)
= (h ∘ g) ∘ f.

1.2.23. Exercise. Prove Theorem 1.2.22.

1.2.24. Example. Let A = {m, n, p, q}, B = {m, r, s}, C = {r, t, u, v}, D = {w, x, y, z}, and define f: A → B, g: B → C, and h: C → D as

f = ( m  n  p  q ),    g = ( m  r  s ),    h = ( r  t  u  v )
    ( m  r  s  r )          ( r  t  u )         ( w  x  y  z )

Then

g ∘ f = ( m  n  p  q )    and    h ∘ g = ( m  r  s )
        ( r  t  u  t )                   ( w  x  y )

Thus,

h ∘ (g ∘ f) = ( m  n  p  q )    and    (h ∘ g) ∘ f = ( m  n  p  q ),
              ( w  x  y  x )                          ( w  x  y  x )

i.e., h ∘ (g ∘ f) = (h ∘ g) ∘ f. ■

There is a special mapping which is so important that we give it a special name. We have:

1.2.25. Definition. Let X be a non-void set. Let e: X → X be defined by e(x) = x for all x ∈ X. We call e the identity function on X.

It is clear that the identity function is bijective.

1.2.26. Theorem. Let X and Y be non-void sets, and let f: X → Y. Let e_X, e_Y, and e₁ be the identity functions on X, Y, and ℜ(f), respectively. Then
(i) if f is injective, then f⁻¹ ∘ f = e_X and f ∘ f⁻¹ = e₁; and
(ii) f is bijective if and only if there is a g: Y → X such that g ∘ f = e_X and f ∘ g = e_Y.

Proof. Part (i) follows immediately from parts (iii) and (iv) of Theorem 1.2.10.
The proof of part (ii) is left as an exercise. ■

1.2.27. Exercise. Prove part (ii) of Theorem 1.2.26.

Another special class of important functions consists of the permutations.

1.2.28. Definition. A permutation on a set X is a (1-1) mapping of X onto X.

It is clear that the identity mapping on X is a permutation on X. For this reason it is sometimes called the identity permutation on X. It is also clear that the inverse of a permutation is also a permutation.

1.2.29. Exercise. Let X = {a, b, c}, and define f: X → X and g: X → X as

f = ( a  b  c ),    g = ( a  b  c )
    ( c  b  a )         ( b  c  a )

Show that f, g, f⁻¹, and g⁻¹ are permutations on X.

1.2.30. Exercise. Let Z denote the set of integers, and let f: Z → Z be defined by f(n) = n + 3 for all n ∈ Z. Show that f and f⁻¹ are permutations on Z and that f⁻¹ ∘ f = f ∘ f⁻¹.

The reader can readily prove the following results.

1.2.31. Theorem. If f is a (1-1) mapping of a set A onto a set B and if g is a (1-1) mapping of the set B onto the set A, then g ∘ f is a permutation on A.

1.2.32. Corollary. If f and g are both permutations on a set A, then g ∘ f is a permutation on A.

1.2.33. Exercise. Prove Theorem 1.2.31 and Corollary 1.2.32.

1.2.34. Exercise. Show that if a set A consists of n elements, then there are exactly n! (n factorial) distinct permutations on A.
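The count in Exercise 1.2.34 is easily confirmed experimentally for small n; a sketch using the Python standard library:

```python
from itertools import permutations
from math import factorial

for n in range(1, 6):
    # each tuple generated below is the bottom row of a (1-1) mapping
    # of {0, ..., n-1} onto itself, i.e., a permutation
    count = sum(1 for _ in permutations(range(n)))
    assert count == factorial(n)
print("n! distinct permutations confirmed for n = 1, ..., 5")
```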

Now let f be a mapping of a set X into a set Y. If X₁ is a subset of X, then for each element x′ ∈ X₁ there is a unique element f(x′) ∈ Y. Thus, f may be used to define a mapping f′ of X₁ into Y defined by
f′(x′) = f(x′)   (1.2.35)
for all x′ ∈ X₁. This motivates the following definition.

1.2.36. Definition. The mapping f′ of subset X₁ ⊂ X into Y of Eq. (1.2.35) is called the mapping of X₁ into Y induced by the mapping f: X → Y. In this case f′ is called the restriction of f to the set X₁.

We also have:

1.2.37. Definition. If f is a mapping of X₁ into Y and if X₁ ⊂ X, then any mapping f̄ of X into Y is said to be an extension of f if
f̄(x) = f(x)   (1.2.38)
for every x ∈ X₁.

Thus, if f̄ is an extension of f, then f is a mapping of a set X₁ ⊂ X into Y which is induced by the mapping f̄ of X into Y.

1.2.39. Example. Let X₁ = {u, v, x}, X = {u, v, x, y, z}, and Y = {n, p, q, r, s, t}. Clearly X₁ ⊂ X. Define f: X₁ → Y as

f = ( u  v  x )
    ( n  p  q )

Also, define f̄, f̃: X → Y as

f̄ = ( u  v  x  y  z ),    f̃ = ( u  v  x  y  z )
    ( n  p  q  r  s )          ( n  p  q  n  t )

Then f̄ and f̃ are two different extensions of f. Moreover, f is the mapping of X₁ into Y induced either by f̄ or by f̃. In general, two distinct mappings may induce the same mapping on a subset. ■

Let us next consider the image and the inverse image of sets under
mappings. Specifically, we have

1.2.40. Definition. Let f be a function from a set X into a set Y. Let A ⊂ X, and let B ⊂ Y. We define the image of A under f, denoted by f(A), to be the set
f(A) = {y ∈ Y: y = f(x), x ∈ A}.
We define the inverse image of B under f, denoted by f⁻¹(B), to be the set
f⁻¹(B) = {x ∈ X: f(x) ∈ B}.
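Computationally, the image and the inverse image of Definition 1.2.40 are one-line set comprehensions; note that the inverse image requires no inverse function. A sketch with a hypothetical f:

```python
def image(f, A):
    """f(A) = {y : y = f(x), x in A}."""
    return {f[x] for x in A}

def preimage(f, B):
    """f^(-1)(B) = {x : f(x) in B}; defined even when f has no inverse."""
    return {x for x in f if f[x] in B}

f = {1: 'a', 2: 'a', 3: 'b'}    # not injective: 1 and 2 share an image
print(image(f, {1, 2}))         # {'a'}
print(preimage(f, {'a'}))       # {1, 2}
```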

Note that f⁻¹(B) is always defined for any f: X → Y. That is, there is no implication here that f has an inverse. The notation is somewhat unfortunate in this respect. Note also that the range of f is f(X).
In the next result, some of the important properties of images and inverse images of functions are summarized.

1.2.41. Theorem. Let f be a function from X into Y, let A, A₁, and A₂ be subsets of X, and let B, B₁, and B₂ be subsets of Y. Then
(i) if A₁ ⊂ A, then f(A₁) ⊂ f(A);
(ii) f(A₁ ∪ A₂) = f(A₁) ∪ f(A₂);
(iii) f(A₁ ∩ A₂) ⊂ f(A₁) ∩ f(A₂);
(iv) f⁻¹(B₁ ∪ B₂) = f⁻¹(B₁) ∪ f⁻¹(B₂);
(v) f⁻¹(B₁ ∩ B₂) = f⁻¹(B₁) ∩ f⁻¹(B₂);
(vi) f⁻¹(Bᶜ) = [f⁻¹(B)]ᶜ, where ᶜ denotes complement;
(vii) f⁻¹[f(A)] ⊃ A; and
(viii) f[f⁻¹(B)] ⊂ B.

Proof. We prove parts (i) and (ii) to demonstrate the method of proof. The remaining parts are left as an exercise.
To prove part (i), let y ∈ f(A₁). Then there is an x ∈ A₁ such that y = f(x). But A₁ ⊂ A and so x ∈ A. Hence, f(x) = y ∈ f(A). This proves that f(A₁) ⊂ f(A).
To prove part (ii), let y ∈ f(A₁ ∪ A₂). Then there is an x ∈ A₁ ∪ A₂ such that y = f(x). If x ∈ A₁, then f(x) = y ∈ f(A₁). If x ∈ A₂, then f(x) = y ∈ f(A₂). Since x is in A₁ or in A₂, f(x) must be in f(A₁) or f(A₂). Therefore, f(A₁ ∪ A₂) ⊂ f(A₁) ∪ f(A₂). To prove that f(A₁) ∪ f(A₂) ⊂ f(A₁ ∪ A₂), we note that A₁ ⊂ A₁ ∪ A₂. So by part (i), f(A₁) ⊂ f(A₁ ∪ A₂). Similarly, f(A₂) ⊂ f(A₁ ∪ A₂). From this it follows that f(A₁) ∪ f(A₂) ⊂ f(A₁ ∪ A₂). We conclude that f(A₁ ∪ A₂) = f(A₁) ∪ f(A₂). ■

1.2.42. Exercise. Prove parts (iii) through (viii) of Theorem 1.2.41.

We note that, in general, equality is not attained in parts (iii), (vii), and
(viii) of Theorem 1.2.41. However, by considering special types of mappings
we can obtain the following results for these cases.

1.2.43. Theorem. Let f be a function from X into Y, let A, A₁, and A₂ be subsets of X, and let B be a subset of Y. Then
(i) f(A₁ ∩ A₂) = f(A₁) ∩ f(A₂) for all pairs of subsets A₁, A₂ of X if and only if f is injective;
(ii) f⁻¹[f(A)] = A for all A ⊂ X if and only if f is injective; and
(iii) f[f⁻¹(B)] = B for all B ⊂ Y if and only if f is surjective.

Proof. We will prove only part (i) and leave the proofs of parts (ii) and (iii) as an exercise.
To prove sufficiency, let f be injective and let A₁ and A₂ be subsets of X. In view of part (iii) of Theorem 1.2.41, we need only show that f(A₁) ∩ f(A₂) ⊂ f(A₁ ∩ A₂). In doing so, let y ∈ f(A₁) ∩ f(A₂). Then y ∈ f(A₁) and y ∈ f(A₂). This means there is an x₁ ∈ A₁ and an x₂ ∈ A₂ such that y = f(x₁) = f(x₂). Since f is injective, x₁ = x₂. Hence, x₁ ∈ A₁ ∩ A₂. This implies that y ∈ f(A₁ ∩ A₂); i.e., f(A₁) ∩ f(A₂) ⊂ f(A₁ ∩ A₂).
To prove necessity, assume that f(A₁ ∩ A₂) = f(A₁) ∩ f(A₂) for all subsets A₁ and A₂ of X. For purposes of contradiction, suppose there are x₁, x₂ ∈ X such that x₁ ≠ x₂ and f(x₁) = f(x₂) = y. Let A₁ = {x₁} and A₂ = {x₂}; i.e., A₁ and A₂ are singletons of x₁ and x₂, respectively. Then A₁ ∩ A₂ = ∅, and so f(A₁ ∩ A₂) = ∅. However, f(A₁) = {y} and f(A₂) = {y}, and thus f(A₁) ∩ f(A₂) = {y} ≠ ∅. This contradicts the fact that f(A₁) ∩ f(A₂) = f(A₁ ∩ A₂) for all subsets A₁ and A₂ of X. Thus, f is injective. ■
1.2.44. Exercise. Prove parts (ii) and (iii) of Theorem 1.2.43.
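Theorem 1.2.43, part (i), can be observed concretely: for a non-injective mapping the inclusion of Theorem 1.2.41, part (iii), may be strict, while for an injective mapping it becomes an equality. A sketch with hypothetical mappings:

```python
def image(f, A):
    return {f[x] for x in A}

f = {1: 'y', 2: 'y', 3: 'z'}     # not injective: f(1) = f(2)
A1, A2 = {1}, {2}
print(image(f, A1 & A2))              # set(), the image of the empty set
print(image(f, A1) & image(f, A2))    # {'y'}, strictly larger

g = {1: 'u', 2: 'v', 3: 'w'}     # injective
B1, B2 = {1, 2}, {2, 3}
assert image(g, B1 & B2) == image(g, B1) & image(g, B2)
```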

Some of the preceding results can be extended to families of sets. For example, we have:

1.2.45. Theorem. Let f be a function from X into Y, let {Aα: α ∈ I} be an indexed family of sets in X, and let {Bα: α ∈ K} be an indexed family of sets in Y. Then
(i) f(⋃_{α∈I} Aα) = ⋃_{α∈I} f(Aα);
(ii) f(⋂_{α∈I} Aα) ⊂ ⋂_{α∈I} f(Aα);
(iii) f⁻¹(⋃_{α∈K} Bα) = ⋃_{α∈K} f⁻¹(Bα);
(iv) f⁻¹(⋂_{α∈K} Bα) = ⋂_{α∈K} f⁻¹(Bα); and
(v) if B ⊂ Y, f⁻¹(Bᶜ) = [f⁻¹(B)]ᶜ.

Proof. We prove parts (i) and (iii) and leave the proofs of the remaining parts as an exercise.
To prove part (i), let y ∈ f(⋃_{α∈I} Aα). This means that there is an x ∈ ⋃_{α∈I} Aα such that y = f(x). Thus, for some α ∈ I, x ∈ Aα. This implies that f(x) ∈ f(Aα) and so y ∈ f(Aα). Hence, y ∈ ⋃_{α∈I} f(Aα). This shows that f(⋃_{α∈I} Aα) ⊂ ⋃_{α∈I} f(Aα).
To prove the converse, let y ∈ ⋃_{α∈I} f(Aα). Then y ∈ f(Aα) for some α ∈ I. This means there is an x ∈ Aα such that f(x) = y. Now x ∈ ⋃_{α∈I} Aα, and so f(x) = y ∈ f(⋃_{α∈I} Aα). Therefore, ⋃_{α∈I} f(Aα) ⊂ f(⋃_{α∈I} Aα). This completes the proof of part (i).
To prove part (iii), let x ∈ f⁻¹(⋃_{α∈K} Bα). This means that f(x) ∈ ⋃_{α∈K} Bα. Hence, f(x) ∈ Bα for some α ∈ K. Thus, x ∈ f⁻¹(Bα), and so x ∈ ⋃_{α∈K} f⁻¹(Bα). Therefore, f⁻¹(⋃_{α∈K} Bα) ⊂ ⋃_{α∈K} f⁻¹(Bα).
Conversely, let x ∈ ⋃_{α∈K} f⁻¹(Bα). Then x ∈ f⁻¹(Bα) for some α ∈ K. Thus, f(x) ∈ Bα. Hence, f(x) ∈ ⋃_{α∈K} Bα, and so x ∈ f⁻¹(⋃_{α∈K} Bα). This means that ⋃_{α∈K} f⁻¹(Bα) ⊂ f⁻¹(⋃_{α∈K} Bα), which completes the proof of part (iii). ■

1.2.46. Exercise. Prove parts (ii), (iv), and (v) of Theorem 1.2.45.

Having introduced the concept of mapping, we are in a position to consider an important classification of infinite sets. We first consider the following definition.

1.2.47. Definition. Let A and B be any two sets. The set A is said to be
equivalent to set B if there exists a bijective mapping of A onto B.

Clearly, if A is equivalent to B, then B is equivalent to A.

1.2.48. Definition. Let J be the set of positive integers, and let A be any set. Then A is said to be countably infinite if A is equivalent to J. A set is said to be countable or denumerable if it is either finite or countably infinite. If a set is not countable, it is said to be uncountable.

We have:

1.2.49. Theorem. Let J be the set of positive integers, and let I ⊂ J. If I is infinite, then I is equivalent to J.

Proof. We shall construct a bijective mapping, f, from J onto I. Let {Jₙ: n ∈ J} be the family of sets given by Jₙ = {1, 2, ..., n} for n = 1, 2, .... Clearly, each Jₙ is finite and of order n. Therefore, Jₙ ∩ I is finite. Since I is infinite, I − Jₙ ≠ ∅ for all n. Let us now define f: J → I as follows. Let f(1) be the smallest integer in I. We now proceed inductively. Assume f(n) ∈ I has been defined, and let f(n + 1) be the smallest integer in I which is greater than f(n). Now f(n + 1) > f(n), and so f(n₁) > f(n₂) for any n₁ > n₂. This implies that f is injective.
Next, we want to show that f is surjective. We do so by contradiction. Suppose that f(J) ≠ I. Since f(J) ⊂ I, this implies that I − f(J) ≠ ∅. Let q be the smallest integer in I − f(J). Then q ≠ f(1) because f(1) ∈ f(J), and so q > f(1). This implies that I ∩ J_{q−1} ≠ ∅. Since I ∩ J_{q−1} is non-void and finite, we may find the largest integer in this set, say r. It follows that r ≤ q − 1 < q. Now r is the largest integer in I which is less than q. But r < q implies that r ∈ f(J). This means there is an s ∈ J such that r = f(s). By definition of f, f(s + 1) = q. Hence, q ∈ f(J) and we have arrived at a contradiction. Thus, f is surjective. This completes the proof. ■

We now have the following corollary.

1.2.50. Corollary. Let A ⊂ B ⊂ X. If B is a countable set, then A is countable.

Proof. If A is finite, then there is nothing to prove. So let us assume that A is infinite. This means that B is countably infinite, and so there exists a bijective mapping f: B → J. Let g be the restriction of f to A. Then for all x₁, x₂ ∈ A such that x₁ ≠ x₂, g(x₁) = f(x₁) ≠ f(x₂) = g(x₂). Thus, g is an injective mapping of A into J. By part (i) of Theorem 1.2.10, g is a bijective mapping of A onto g(A). This means A is equivalent to g(A), and thus g(A) is an infinite set. Since g(A) ⊂ J, g(A) is equivalent to J. Hence, there is a bijective mapping of g(A) onto J, which we call h. By Theorem 1.2.18, the composite mapping h ∘ g is a bijective mapping of A onto J. This means that J is equivalent to A. Therefore, A is countable. ■

We conclude the present section by considering the cardinality of sets. Specifically, if a set is finite, we say the cardinal number of the set is equal to the number of elements of the set. If two sets are countably infinite, then we say they have the same cardinal number, which we can define to be the cardinal number of the positive integers. More generally, two arbitrary sets are said to have the same cardinal number if we can establish a bijective mapping between the two sets (i.e., the sets are equivalent).
1.3. RELATIONS AND EQUIVALENCE RELATIONS

Throughout the present section, X denotes a non-void set.


We begin by introducing the notion of relation, which is a generalization
of the concept of function.

1.3.1. Definition. Let X and Y be non-void sets. Any subset of X × Y is called a relation from X to Y. Any subset of X × X is called a relation in X.

1.3.2. Example. Let A = {u, v, x, y} and B = {a, b, c, d}. Let ρ = {(u, a), (v, b), (u, c), (x, a)}. Then ρ is a relation from A into B. It is clearly not a function from A into B (why?). ■

1.3.3. Example. Let X = Y = R, the set of real numbers. The set {(x, y) ∈ R × R: x ≤ y} is a relation in R. Also, the set {(x, y) ∈ R × R: x = sin y} is a relation in R. This shows that so-called multivalued functions are actually relations rather than mappings. ■

As in the case of mappings, it makes sense to speak of the domain and the
range of a relation. We have:

1.3.4. Definition. Let ρ be a relation from X to Y. The subset of X,
{x ∈ X: (x, y) ∈ ρ, y ∈ Y},
is called the domain of ρ. The subset of Y,
{y ∈ Y: (x, y) ∈ ρ, x ∈ X},
is called the range of ρ.

Now let ρ be a relation from X to Y. Then, clearly, the set ρ⁻¹ ⊂ Y × X, defined by
ρ⁻¹ = {(y, x) ∈ Y × X: (x, y) ∈ ρ},
is a relation from Y to X. The relation ρ⁻¹ is called the inverse relation of ρ. Note that whereas the inverse of a function does not always exist, the inverse of a relation does always exist.
Next, we consider equivalence relations. Let ρ denote a relation in X; i.e., ρ ⊂ X × X. Then for any x, y ∈ X, either (x, y) ∈ ρ or (x, y) ∉ ρ, but not both. If (x, y) ∈ ρ, then we write x ρ y, and if (x, y) ∉ ρ, we write x ρ̸ y.

1.3.5. Definition. Let ρ be a relation in X.
(i) If x ρ x for all x ∈ X, then ρ is said to be reflexive;
(ii) if x ρ y implies y ρ x for all x, y ∈ X, then ρ is said to be symmetric; and
(iii) if for all x, y, z ∈ X, x ρ y and y ρ z implies x ρ z, then ρ is said to be transitive.

1.3.6. Example. Let R denote the set of real numbers. The relation in R given by {(x, y): x < y} is transitive but not reflexive and not symmetric. The relation in R given by {(x, y): x ≠ y} is symmetric but not reflexive and not transitive. ■

1.3.7. Example. Let ρ be the relation in P(X) defined by ρ = {(A, B): A ⊂ B}. That is, A ρ B if and only if A ⊂ B. Then ρ is reflexive and transitive but not symmetric. ■

In the following, we use the symbol ∼ to denote a relation in X. If (x, y) ∈ ∼, then we write, as before, x ∼ y.

1.3.8. Definition. Let ∼ be a relation in X. Then ∼ is said to be an equivalence relation in X if ∼ is reflexive, symmetric, and transitive. If ∼ is an equivalence relation and if x ∼ y, we say that x is equivalent to y.

In particular, the equivalence relation in X characterized by the statement "x ∼ y if and only if x = y" is called the equals relation in X or the identity relation in X.

1.3.9. Example. Let X be a finite set, and let A, B, C ∈ P(X). Let ∼ on P(X) be defined by saying that A ∼ B if and only if A and B have the same number of elements. Clearly A ∼ A. Also, if A ∼ B, then B ∼ A. Furthermore, if A ∼ B and B ∼ C, then A ∼ C. Hence, ∼ is reflexive, symmetric, and transitive. Therefore, ∼ is an equivalence relation in P(X). ■

1.3.10. Example. Let R² = R × R, the real plane. Let X be the family of all triangles in R². Then each of the following statements can be used to define an equivalence relation in X: "is similar to," "is congruent to," "has the same area as," and "has the same perimeter as." ■
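For a relation on a small finite set, the three properties of Definition 1.3.8 can be verified exhaustively. A sketch checking the relation of Example 1.3.9 ("has the same number of elements") on the subsets of a three-element set:

```python
from itertools import combinations

X = {1, 2, 3}
# the power set of X, with each subset stored as a frozenset
PX = [frozenset(c) for r in range(len(X) + 1)
      for c in combinations(X, r)]

def related(A, B):
    return len(A) == len(B)      # A ~ B iff A and B have the same size

reflexive  = all(related(A, A) for A in PX)
symmetric  = all(related(B, A) for A in PX for B in PX if related(A, B))
transitive = all(related(A, C) for A in PX for B in PX for C in PX
                 if related(A, B) and related(B, C))
print(reflexive, symmetric, transitive)   # True True True
```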

1.4. OPERATIONS ON SETS

In the present section we introduce the concept of an operation on a set, and we consider some of the properties of operations. Throughout this section, X denotes a non-void set.

1.4.1. Definition. A binary operation on X is a mapping of X × X into X. A ternary operation on X is a mapping of X × X × X into X.

We could proceed in an obvious manner and define an n-ary operation on X. Since our primary concern in this book will be with binary operations, we will henceforth simply say "an operation on X" when we actually mean a binary operation on X.
If α: X × X → X is an operation, then we usually use the notation
α(x, y) ≜ x α y.

1.4.2. Example. Let R denote the real numbers. Let f: R × R → R be given by f(x, y) = x + y for all x, y ∈ R, where x + y denotes the customary sum of x plus y (i.e., + denotes the usual operation of addition of real numbers). Then f is clearly an operation on R, in the sense of Definition 1.4.1. We could just as well have defined "+" as being the operation on R, i.e., +: R × R → R, where +(x, y) ≜ x + y. Similarly, the ordinary rules of subtraction and multiplication on R, "−" and "·", respectively, are also operations on R. Notice that division, "÷", is not an operation on R, because x ÷ y is not defined for all y ∈ R (i.e., x ÷ y is not defined for y = 0). However, if we let R# = R − {0}, then "÷" is an operation on R#. ■

1.4.3. Exercise. Show that if A is a set consisting of n distinct elements, then there exist exactly n^(n²) distinct operations on A.

1.4.4. Example. Let A = {a, b}. An example of an operation on A is the mapping α: A × A → A defined by

α(a, a) ≜ a α a = a,    α(a, b) ≜ a α b = b,
α(b, a) ≜ b α a = b,    α(b, b) ≜ b α b = a.

It is convenient to utilize the following operation table to define α:

α | a  b
a | a  b        (1.4.5)
b | b  a

If, in general, α is an operation on an arbitrary finite set A, or sometimes even on a countably infinite set A, then we can construct an operation table as follows: the entry in row x and column y is x α y.

If A = {a, b}, as at the beginning of this example, then in addition to α given in (1.4.5), we can define, for example, the operations β, γ, and δ on A as

β | a  b        γ | a  b        δ | a  b
a | b  b        a | a  b        a | a  a
b | a  a        b | a  b        b | b  b    ■

We now consider operations with important special properties.

1.4.6. Definition. An operation α on X is said to be commutative if x α y = y α x for all x, y ∈ X.

1.4.7. Definition. An operation α on X is said to be associative if (x α y) α z = x α (y α z) for all x, y, z ∈ X.

In the case of the real numbers R, the operations of addition and multiplication are both associative and commutative. The operation of subtraction is neither associative nor commutative.

1.4.8. Definition. If α and β are operations on X (not necessarily distinct), then
(i) α is said to be left distributive over β if
x α (y β z) = (x α y) β (x α z)
for every x, y, z ∈ X;
(ii) α is said to be right distributive over β if
(x β y) α z = (x α z) β (y α z)
for every x, y, z ∈ X; and
(iii) α is said to be distributive over β if α is both left and right distributive over β.

In Example 1.4.4, α is the only commutative operation. The operation β of Example 1.4.4 is not associative. The operations α, γ, and δ of this example are associative. In this example, γ is distributive over δ and δ is distributive over γ.
In the case of the real numbers R, multiplication, "·", is distributive over addition, "+". The converse is not true.
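Statements of this kind are finite checks and can be automated. The sketch below encodes an operation table as a nested dictionary and tests commutativity, associativity, and distributivity; run on the tables α, γ, and δ transcribed from Example 1.4.4, it confirms the classifications just stated.

```python
def commutative(op, X):
    return all(op[x][y] == op[y][x] for x in X for y in X)

def associative(op, X):
    return all(op[op[x][y]][z] == op[x][op[y][z]]
               for x in X for y in X for z in X)

def distributive(a, b, X):
    """True if operation a is left and right distributive over b."""
    left  = all(a[x][b[y][z]] == b[a[x][y]][a[x][z]]
                for x in X for y in X for z in X)
    right = all(a[b[x][y]][z] == b[a[x][z]][a[y][z]]
                for x in X for y in X for z in X)
    return left and right

X = ('a', 'b')
alpha = {'a': {'a': 'a', 'b': 'b'}, 'b': {'a': 'b', 'b': 'a'}}
gamma = {'a': {'a': 'a', 'b': 'b'}, 'b': {'a': 'a', 'b': 'b'}}
delta = {'a': {'a': 'a', 'b': 'a'}, 'b': {'a': 'b', 'b': 'b'}}

print(commutative(alpha, X), associative(alpha, X))   # True True
print(distributive(gamma, delta, X))                  # True
print(distributive(delta, gamma, X))                  # True
```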

1.4.9. Definition. If α is an operation on X, and if X₁ is a subset of X, then X₁ is said to be closed relative to α if for every x, y ∈ X₁, x α y ∈ X₁.

Clearly, every set is closed with respect to an operation on it.
The set of all integers Z, which is a subset of the real numbers R, is closed with respect to the operations of addition and multiplication defined on R. The even integers are also closed with respect to both of these operations, whereas the odd integers are not a closed set relative to addition.

1.4.10. Definition. If a subset X₁ of X is closed relative to an operation α on X, then the operation α′ on X₁ defined by
α′(x, y) = x α′ y = x α y
for all x, y ∈ X₁, is called the operation on X₁ induced by α.

If X₁ = X, then α′ = α. If X₁ ⊂ X but X₁ ≠ X, then α′ ≠ α, since α′ and α are operations on different sets, namely X₁ and X, respectively. In general, an induced operation α′ differs from its predecessor α; however, it does inherit the essential properties which α possesses, as shown in the following result.

1.4.11. Theorem. Let α be an operation on X, let X₁ ⊂ X, where X₁ is closed relative to α, and let α′ be the operation on X₁ induced by α. Then
(i) if α is commutative, then α′ is commutative;
(ii) if α is associative, then α′ is associative; and
(iii) if β is an operation on X and X₁ is closed relative to β, and if α is left (right) distributive over β, then α′ is left (right) distributive over β′, where β′ is the operation on X₁ induced by β.
1.4.12. Exercise. Prove Theorem 1.4.11.

The operation α′ on a subset X₁ induced by an operation α on X will frequently be denoted by α, and we will refer to α as an operation on X₁. In such cases one must keep in mind that we are actually referring to the induced operation α′ and not to α.

1.4.13. Definition. Let X₁ be a subset of X. An operation ᾱ on X is called an extension of an operation α on X₁ if X₁ is closed relative to ᾱ and if α is equal to the operation on X₁ induced by ᾱ.

A given operation α on a subset X₁ of a set X may, in general, have many different extensions.

1.4.14. Example. Let X₁ = {a, b, c}, and let X = {a, b, c, d, e}. Define α on X₁ and ᾱ and α̂ on X as

α | a  b  c      ᾱ | a  b  c  d  e      α̂ | a  b  c  d  e
a | a  c  b      a | a  c  b  e  d      a | a  c  b  d  e
b | c  b  a      b | c  b  a  d  e      b | c  b  a  e  d
c | b  a  c      c | b  a  c  e  d      c | b  a  c  d  e
                 d | c  d  a  b  e      d | d  c  b  a  e
                 e | d  c  a  b  e      e | d  a  c  b  e

Clearly, α is an operation on X₁ and ᾱ and α̂ are operations on X. Moreover, both ᾱ and α̂ (ᾱ ≠ α̂) are extensions of α. Also, α may be viewed as being induced by ᾱ and by α̂. ■

1.5. MATHEMATICAL SYSTEMS CONSIDERED IN THIS BOOK

We will concern ourselves with several different types of mathematical systems in the subsequent chapters. Although it is possible to give an abstract definition of the term mathematical system, we will not do so. Instead, we will briefly indicate which types of mathematical systems we shall consider in this book.
1. In Chapter 2 we will begin by considering mathematical systems which are made up of an underlying set X and an operation α defined on X. We will identify such systems by writing {X; α}. We will be able to characterize a system {X; α} according to certain properties which X and α possess. Two important cases of such systems that we will consider are semigroups and groups.
In Chapter 2 we will also consider mathematical systems consisting of a basic set X and two operations, say α and β, defined on X, where a special relation exists between α and β. We will identify such systems by writing {X; α, β}. Included among the mathematical systems of this kind which we will consider are rings and fields.
In Chapter 2 we will also consider composite mathematical systems. Such systems are endowed with two underlying sets, say X and F, and possess a much more complex (algebraic) structure than semigroups, groups, rings, and fields. Composite systems which we will consider include modules, vector spaces over a field F (which are also called linear spaces), and algebras.
In Chapter 2 we will also study various types of important mappings (e.g., homomorphisms and isomorphisms) defined on semigroups, groups, rings, etc.
Mathematical systems of the type considered in Chapter 2 are sometimes called algebraic systems.
2. In Chapters 3 and 4 we will study in some detail vector spaces and special types of mappings on vector spaces, called linear transformations. An important class of linear transformations can be represented by matrices, which we will consider in Chapter 4. In this chapter we will also study in some detail important vector spaces, called Euclidean spaces.
3. Most of Chapter 5 is devoted to mathematical systems consisting of a basic set X and a function ρ: X × X → R (R denotes the real numbers), where ρ possesses certain properties (namely, the properties of distance between points or elements in X). The function ρ is called a metric (or a distance function) and the pair {X; ρ} is called a metric space.
In Chapter 5 we will also consider mathematical systems consisting of a basic set X and a family of subsets of X (called open sets) denoted by 𝒯. The pair {X; 𝒯} is called a topological space. It turns out that all metric spaces are in a certain sense topological spaces.
We will also study functions and their properties on metric (topological) spaces in Chapter 5.
4. In Chapters 6 and 7 we will consider normed linear spaces, inner product spaces, and an important class of functions (linear operators) defined on such spaces.
A normed linear space is a mathematical system consisting of a vector space X and a real-valued function, denoted by ‖·‖, which takes elements of X into R and which possesses the properties which characterize the "length" of a vector. We will denote normed spaces by {X; ‖·‖}.
An inner product space consists of a vector space X (over the field of real numbers R or over the field of complex numbers C) and a function (·,·), which takes elements from X × X into R (or into C) and possesses certain properties which allow us to introduce, among other items, the concept of orthogonality. We will identify such mathematical systems by writing {X; (·,·)}.
It turns out that in a certain sense all inner product spaces are normed linear spaces, that all normed linear spaces are metric spaces, and, as indicated before, that all metric spaces are topological spaces. Since normed linear spaces and inner product spaces are also vector spaces, it should be clear that, in the case of such spaces, properties of algebraic systems (called algebraic structure) and properties of topological systems (called topological structure) are combined.
A class of normed linear spaces which are very important are Banach spaces, and among the more important inner product spaces are Hilbert spaces. Such spaces will be considered in some detail in Chapter 6. Also, in Chapter 7, linear transformations defined on Banach and Hilbert spaces will be considered.
5. Applications are considered at the ends of Chapters 4, 5, and 7.

1.6. REFERENCES AND NOTES

A classic reference on set theory is the book by Hausdorff [1.5]. The many excellent references on the present topics include the elegant text by Hanneken [1.4], the standard reference by Halmos [1.3], as well as the books by Gleason [1.1] and Goldstein and Rosenbaum [1.2].
REFERENCES

[1.1] A. M. GLEASON, Fundamentals of Abstract Analysis. Reading, Mass.: Addison-Wesley Publishing Co., Inc., 1966.
[1.2] M. E. GOLDSTEIN and B. M. ROSENBAUM, "Introduction to Abstract Analysis," National Aeronautics and Space Administration, Report No. SP-203, Washington, D.C., 1969.
[1.3] P. R. HALMOS, Naive Set Theory. Princeton, N.J.: D. Van Nostrand Company, Inc., 1960.
[1.4] C. B. HANNEKEN, Introduction to Abstract Algebra. Belmont, Calif.: Dickenson Publishing Co., Inc., 1968.
[1.5] F. HAUSDORFF, Mengenlehre. New York: Dover Publications, Inc., 1944.

2

ALGEBRAIC STRUCTURES

The subject matter of the previous chapter is concerned with set theoretic
structure. We emphasized essential elements of set theory and introduced
related concepts such as mappings, operations, and relations.
In the present chapter we concern ourselves with algebraic structure.
The material of this chapter usually falls under the heading of abstract algebra or modern algebra. In the next two chapters we will continue our investigation of algebraic structure. The topics of those chapters usually go under the heading of linear algebra.
This chapter is divided into three parts. The first section is concerned
with some basic algebraic structures, including semigroups, groups, rings,
fields, modules, vector spaces, and algebras. In the second section we study
properties of special important mappings on the above structures, including
homomorphisms, isomorphisms, endomorphisms, and automorphisms of
semigroups, groups and rings. Because of their importance in many areas
of mathematics, as well as in applications, polynomials are considered in
the third section. Some appropriate references for further reading are sug-
gested at the end of the chapter.
The subject matter of the present chapter is widely used in pure as well as
in applied mathematics, and it has found applications in diverse areas, such
as modern physics, automata theory, systems engineering, information
theory, graph theory, and the like.


Our presentation of modern algebra is by necessity very brief. However, mastery of the topics covered in the present chapter will provide the reader with the foundation required to make contact with the literature in applications, and it will enable the interested reader to pursue this subject further at a more advanced level.

2.1. SOME BASIC STRUCTURES OF ALGEBRA

We begin by developing some of the more important properties of mathematical systems, {X; α}, where α is an operation on a non-void set X.

2.1.1. Definition. Let α be an operation on X. If for all x, y, z ∈ X, x α y = x α z implies that y = z, then we say that {X; α} possesses the left cancellation property. If x α y = z α y implies that x = z, then {X; α} is said to possess the right cancellation property. If {X; α} possesses both the left and right cancellation properties, then we say that the cancellation laws hold in {X; α}.

In the following exercise, some specific cases are given.

2.1.2. Exercise. Let X = {x, y} and let α, β, γ, and δ be defined as

α | x  y      β | x  y      γ | x  y      δ | x  y
x | x  y      x | x  x      x | x  y      x | x  x
y | y  x      y | y  x      y | x  y      y | y  y

Show that (i) {X; β} possesses neither the right nor the left cancellation property; (ii) {X; γ} possesses the left cancellation property but not the right cancellation property; (iii) {X; δ} possesses the right cancellation property but not the left cancellation property; and (iv) {X; α} possesses both the left and the right cancellation property.
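In terms of the operation table, left cancellation says that no row repeats an entry, and right cancellation says that no column does. This makes Exercise 2.1.2 easy to check mechanically; a sketch for two of the four tables:

```python
def left_cancel(op, X):
    # x op y = x op z implies y = z: every row has distinct entries
    return all(len({op[x][y] for y in X}) == len(X) for x in X)

def right_cancel(op, X):
    # x op y = z op y implies x = z: every column has distinct entries
    return all(len({op[x][y] for x in X}) == len(X) for y in X)

X = ('x', 'y')
beta  = {'x': {'x': 'x', 'y': 'x'}, 'y': {'x': 'y', 'y': 'x'}}
gamma = {'x': {'x': 'x', 'y': 'y'}, 'y': {'x': 'x', 'y': 'y'}}

print(left_cancel(beta, X),  right_cancel(beta, X))    # False False
print(left_cancel(gamma, X), right_cancel(gamma, X))   # True False
```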

In an arbitrary mathematical system {X; α} there are sometimes special elements in X which possess important properties relative to the operation α. We have:

2.1.3. Definition. Let α be an operation on a set X and let X contain an element eᵣ such that
x α eᵣ = x
for all x ∈ X. We call eᵣ a right identity element of X relative to α, or simply a right identity of the system {X; α}. If X contains an element eₗ which satisfies the condition
eₗ α x = x
for all x ∈ X, then eₗ is called a left identity element of X relative to α, or simply a left identity of the system {X; α}.

We note that a system {X; α} may contain more than one right identity element of X (e.g., system {X; δ} of Exercise 2.1.2) or left identity element of X (e.g., system {X; γ} of Exercise 2.1.2).

2.1.4. Definition. An element e of a set X is called an identity element of X relative to an operation α on X if
e α x = x α e = x
for every x ∈ X.

2.1.5. Exercise. Let X = {0, 1} and define the operations "+" and "·" by

+ | 0  1      · | 0  1
0 | 0  1      0 | 0  0
1 | 1  0      1 | 0  1

Does either {X; +} or {X; ·} have an identity element?

Identity elements have the following properties.

2.1.6. Theorem. Let α be an operation on X.
(i) If {X; α} has an identity element e, then e is unique.
(ii) If {X; α} has a right identity eᵣ and a left identity eₗ, then eᵣ = eₗ.
(iii) If α is a commutative operation and if {X; α} has a right identity element eᵣ, then eᵣ is also a left identity.

Proof. To prove the first part, let e′ and e″ be identity elements of {X; α}. Then e′ α e″ = e′ and e′ α e″ = e″. Hence, e′ = e″.
To prove the second part, note that since eᵣ is a right identity, eₗ α eᵣ = eₗ. Also, since eₗ is a left identity, eₗ α eᵣ = eᵣ. Thus, eₗ = eᵣ.
To prove the last part, note that for all x ∈ X we have x = x α eᵣ = eᵣ α x. ■

In summary, if {X; α} has an identity element, then that element is unique. Furthermore, if {X; α} has both a right identity and a left identity element, then these elements are equal, and in fact they are equal to the unique identity element. Also, if {X; α} has a right (or left) identity element and α is a commutative operation, then {X; α} has an identity element.

2.1.7. Definition. Let α be an operation on X and let e be an identity of X relative to α. If x ∈ X, then x′ ∈ X is called a right inverse of x relative to α provided that
x α x′ = e.
An element x″ ∈ X is called a left inverse of x relative to α if
x″ α x = e.

The following exercise shows that some elements may not possess any right
or left inverses. Some other elements may possess several inverses of one kind
and none of the other, and other elements may possess a number of inverses
of both kinds.

2.1.8. Exercise. Let X = {x, y, u, v} and define α as

α | x  y  u  v
x | x  y  x  y
y | x  y  y  x
u | x  y  u  v
v | x  y  v  u

(i) Show that {X; α} contains an identity element.
(ii) Which elements possess neither left inverses nor right inverses?
(iii) Which element has a left and a right inverse?
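Identities and inverses can likewise be read off an operation table by exhaustive search. A sketch applied to the table of Exercise 2.1.8:

```python
def identities(op, X):
    """All (two-sided) identity elements of {X; op}."""
    return [e for e in X if all(op[e][x] == x == op[x][e] for x in X)]

def right_inverses(op, X, e, x):
    """All x' with x op x' = e."""
    return [xp for xp in X if op[x][xp] == e]

X = ('x', 'y', 'u', 'v')
op = {'x': {'x': 'x', 'y': 'y', 'u': 'x', 'v': 'y'},
      'y': {'x': 'x', 'y': 'y', 'u': 'y', 'v': 'x'},
      'u': {'x': 'x', 'y': 'y', 'u': 'u', 'v': 'v'},
      'v': {'x': 'x', 'y': 'y', 'u': 'v', 'v': 'u'}}

e = identities(op, X)[0]
print(e)                                    # 'u' is the identity
for z in X:
    print(z, right_inverses(op, X, e, z))   # 'x' and 'y' have no inverses
```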

A. Semigroups and Groups

Of crucial importance are mathematical systems called semigroups. Such mathematical systems serve as the natural setting for many important results in algebra and are used in several diverse areas of applications (e.g., qualitative analysis of dynamical systems, automata theory, etc.).

2.1.9. Definition. Let α be an operation on X. We call {X; α} a semigroup if α is an associative operation on X.

Now let x, y, z ∈ X, and let α be an associative operation on X. Then x α (y α z) = (x α y) α z = u ∈ X. Henceforth, we will often simply write u = x α y α z. As a result of this convention we see that for x, y, u, v ∈ X,
x α y α u α v = x α (y α u) α v = x α y α (u α v)
             = (x α y) α (u α v) = (x α y) α u α v.   (2.1.10)
As a generalization of the above we have the so-called generalized associative law, which asserts that if x₁, x₂, ..., xₙ are elements of a semigroup {X; α}, then any two products, each involving these elements in a particular order, are equal. This allows us to simply write x₁ α x₂ α ⋯ α xₙ.
In view of Theorem 2.1.6, part (i), if a semigroup has an identity element, then such an element is unique. We give a special name to such a semigroup.

2.1.11. Definition. A semigroup {X; α} is called a monoid if X contains an identity element relative to α. Henceforth, the unique identity element of a monoid {X; α} will be denoted by e.

Subsequently, we frequently single out elements of monoids which possess inverses.

2.1.12. Definition. Let {X; α} be a monoid. If x ∈ X possesses a right inverse x′ ∈ X, then x is called a right invertible element in X. If x ∈ X possesses a left inverse x″ ∈ X, then x is called a left invertible element in X. If x ∈ X is both right invertible and left invertible in X, then we say that x is an invertible element or a unit of X.

Clearly, if e ∈ X, then e is an invertible element.

2.1.13. Theorem. Let {X; α} be a monoid, and let x ∈ X. If there exists a left inverse of x, say x′, and a right inverse of x, say x″, then x′ = x″ and x′ is unique.

Proof. Since α is associative, we have (x′ α x) α x″ = x″ and x′ α (x α x″) = x′. Thus, x′ = x″. Now suppose there is another left inverse of x, say x‴. Then x‴ = x″ and therefore x‴ = x′. ■

Theorem 2.1.13 does not, in general, hold for arbitrary mathematical systems {X; α} with identity, as is evident from the following:

2.1.14. Exercise. Let X = {u, v, x, y} and define α as

α | u  v  x  y
u | v  v  u  u
v | u  u  v  x
x | u  v  x  y
y | x  v  y  x

Use this operation table to demonstrate that Theorem 2.1.13 does not, in general, hold if monoid {X; α} is replaced by system {X; α} with identity.

By Theorem 2.1.13, any invertible element of a monoid possesses a unique right inverse and a unique left inverse, and moreover these inverses are equal. This gives rise to the following.

2.1.15. Definition. Let {X; α} be a monoid. If x ∈ X has a left inverse and a right inverse, x′ and x″, respectively, then this unique element x′ = x″ is called the inverse of x and is denoted by x⁻¹.

Concerning inverses, we have:

2.1.16. Theorem. Let {X; α} be a monoid.
(i) If x ∈ X has an inverse, x⁻¹, then x⁻¹ has an inverse (x⁻¹)⁻¹ = x.
(ii) If x, y ∈ X have inverses x⁻¹, y⁻¹, respectively, then x α y has an inverse, and moreover (x α y)⁻¹ = y⁻¹ α x⁻¹.
(iii) The identity element e ∈ X has an inverse e⁻¹, and e⁻¹ = e.

Proof. To prove the first part, note that x α x⁻¹ = e and x⁻¹ α x = e. Thus, x is both a left and a right inverse of x⁻¹, and (x⁻¹)⁻¹ = x.
To prove the second part, note that
(x α y) α (y⁻¹ α x⁻¹) = x α (y α y⁻¹) α x⁻¹ = e
and
(y⁻¹ α x⁻¹) α (x α y) = y⁻¹ α (x⁻¹ α x) α y = e.
The third part of the theorem follows trivially from e α e = e. ■

In the remainder of the present chapter we will often use the symbols "+" and "·" to denote operations in place of α, β, etc. We will call these "addition" and "multiplication." However, we strongly emphasize here that "+" and "·" will, in general, not denote addition and multiplication of real numbers but, instead, arbitrary operations. In cases where there exists an identity element relative to "+", we will denote this element by "0" and call it "zero." If there exists an identity element relative to "·", we will denote this element either by "1" or by e. Our usual notation for representing an identity relative to an arbitrary operation α will still be e. If in a system {X; +} an element x ∈ X possesses an inverse, we will denote this element by −x and we will call it "minus x". For example, if {X; +} is a semigroup, then we denote the inverse of an invertible element x ∈ X by −x, and in this case we have x + (−x) = (−x) + x = 0, and also, −(−x) = x. Furthermore, if x, y ∈ X are invertible elements, then the "sum" x + y is also invertible, and −(x + y) = (−y) + (−x). Note, however, that unless "+" is commutative, −(x + y) ≠ (−x) + (−y). Finally, if x, y ∈ X and if y is an invertible element, then −y ∈ X. In this case we often will simply write x + (−y) = x − y.

2.1.17. Example. Let X = {0, 1, 2, 3}, and let the systems {X; +} and {X; ·} be defined by means of the operation tables

+ | 0  1  2  3      · | 0  1  2  3
0 | 0  1  2  3      0 | 0  0  0  0
1 | 1  2  3  0      1 | 0  1  2  3
2 | 2  3  0  1      2 | 0  2  0  2
3 | 3  0  1  2      3 | 0  3  2  1

The reader should readily show that the systems {X; +} and {X; ·} are monoids. In this case the operation "+" is called "addition mod 4" and "·" is called "multiplication mod 4."

The most important special type of semigroup that we will encounter in
this chapter is the group.

2.1.18. Definition. A group is a monoid in which every element is invertible; i.e., a group is a semigroup, {X; α}, with identity in which every element is invertible.

The set R of real numbers with the operation of addition is an example of a group. The set of real numbers with the operation of multiplication does not form a group, since the number zero does not have an inverse relative to multiplication. However, the latter system is a monoid. If we let R# = R − {0}, then {R#; ·} is a group.
Groups possess several important properties. Some of these are summa-
rized in the next result.

2.1.19. Theorem. Let {X; α} be a group, and let e denote the identity element of X relative to α. Let x and y be arbitrary elements in X. Then
(i) if x α x = x, then x = e;
(ii) if z ∈ X and x α y = x α z, then y = z;
(iii) if z ∈ X and x α y = z α y, then x = z;
(iv) there exists a unique w ∈ X such that
w α x = y;   (2.1.20)
and
(v) there exists a unique z ∈ X such that
x α z = y.   (2.1.21)

Proof. To prove the first part, let x α x = x. Then x⁻¹ α (x α x) = x⁻¹ α x, and so (x⁻¹ α x) α x = e. This implies that x = e.
To prove the second part, let x α y = x α z. Then x⁻¹ α (x α y) = x⁻¹ α (x α z), and so (x⁻¹ α x) α y = (x⁻¹ α x) α z. This implies that y = z.
The proof of part (iii) is similar to that of part (ii).
To prove part (iv), let w = y α x⁻¹. Then w α x = (y α x⁻¹) α x = y α (x⁻¹ α x) = y. To show that w is unique, suppose there is a v ∈ X such that v α x = y. Then w α x = v α x. By part (iii), w = v.
The proof of the last part of the theorem is similar to the proof of part (iv). ■

In part (iv) of Theorem 2.1.19 the element w is called the left solution of Eq. (2.1.20), and in part (v) of this theorem the element z is called the right solution of Eq. (2.1.21).
We can classify groups in a variety of ways. Some of these classifications are as follows. Let {X; α} be a group. If the set X possesses a finite number of elements, then we speak of a finite group. If the operation α is commutative, then we have a commutative group, also called an abelian group. If α is not commutative, then we speak of a non-commutative group or a non-abelian group. Also, by the order of a group we understand the order of the set X.
Now let {X; α} be a semigroup and let X₁ be a non-void subset of X which is closed relative to α. Then by Theorem 1.4.11, the operation α₁ on X₁ induced by the associative operation α is also associative, and thus the mathematical system {X₁; α₁} is also a semigroup. The system {X₁; α₁} is called a subsystem of {X; α}. This gives rise to the following concept.

2.1.22. Definition. Let {X; α} be a semigroup, let X₁ be a non-void subset of X which is closed relative to α, and let α₁ be the operation on X₁ induced by α. The semigroup {X₁; α₁} is called a subsemigroup of {X; α}.

In order to simplify our notation, we will henceforth use the notation {X₁; α} to denote the subsemigroup {X₁; α₁} (i.e., we will suppress the subscript of α₁).
The following result allows us to generate subsemigroups in a variety of ways.

2.1.23. Theorem. Let {X; α} be a semigroup and let Xᵢ ⊂ X for all i ∈ I, where I denotes some index set. Let Y = ⋂_{i∈I} Xᵢ. If {Xᵢ; α} is a subsemigroup of {X; α} for every i ∈ I, and if Y is not empty, then {Y; α} is a subsemigroup of {X; α}.

Proof. Let x, y ∈ Y. Then x, y ∈ Xᵢ for all i ∈ I, and so x α y ∈ Xᵢ for every i, and hence x α y ∈ Y. This implies that {Y; α} is a subsemigroup. ■

Now let W be any non-void subset of X, where {X; α} is a semigroup, and let
𝒴 = {Y: W ⊂ Y ⊂ X and {Y; α} is a subsemigroup of {X; α}}.
Then 𝒴 is non-empty, since X ∈ 𝒴. Also, let
G = ⋂_{Y∈𝒴} Y.
Then W ⊂ G, and by Theorem 2.1.23, {G; α} is a subsemigroup of {X; α}. This subsemigroup is called the subsemigroup generated by W.

2.1.24. Theorem. Let {X; α} be a monoid with e its identity element, and let {X₁; α₁} be a subsemigroup of {X; α}. If e ∈ X₁, then e is an identity element of {X₁; α₁} and {X₁; α₁} is a monoid.

2.1.25. Exercise. Prove Theorem 2.1.24.

Next we define subgroups.

2.1.26. Definition. Let {X; α} be a semigroup, and let {X₁; α₁} be a subsemigroup of {X; α}. If {X₁; α₁} is a group, then {X₁; α₁} is called a subgroup of {X; α}. We denote this subgroup by {X₁; α}, and we say the set X₁ determines a subgroup of {X; α}.

We consider a specific example in the following:

2.1.27. Exercise. Let Z₆ = {0, 1, 2, 3, 4, 5} and define the operation + on Z₆ by means of the following operation table:

+ | 0  1  2  3  4  5
0 | 0  1  2  3  4  5
1 | 1  0  4  5  2  3
2 | 2  5  0  4  3  1
3 | 3  4  5  0  1  2
4 | 4  3  1  2  5  0
5 | 5  2  3  1  0  4

(a) Show that {Z₆; +} is a group.
(b) Let K = {0, 1}. Show that {K; +} is a subgroup of {Z₆; +}.
(c) Are there any other subgroups of {Z₆; +}?
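Part (c) can be explored programmatically: Theorem 2.1.30 (below) reduces the subgroup test over a finite table to three finite conditions, so one may simply scan all subsets. A sketch, with the table transcribed as a list of rows:

```python
from itertools import combinations

T = [[0, 1, 2, 3, 4, 5],
     [1, 0, 4, 5, 2, 3],
     [2, 5, 0, 4, 3, 1],
     [3, 4, 5, 0, 1, 2],
     [4, 3, 1, 2, 5, 0],
     [5, 2, 3, 1, 0, 4]]

def is_subgroup(S):
    """Contains the identity 0, closed under +, closed under inverses."""
    if 0 not in S:
        return False
    closed   = all(T[x][y] in S for x in S for y in S)
    inverses = all(any(T[x][y] == 0 for y in S) for x in S)
    return closed and inverses

subsets = (set(c) for r in range(1, 7) for c in combinations(range(6), r))
print([sorted(S) for S in subsets if is_subgroup(S)])
# [[0], [0, 1], [0, 2], [0, 3], [0, 4, 5], [0, 1, 2, 3, 4, 5]]
```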

We have seen in Theorem 2.1.24 that if e ∈ X₁ ⊂ X, then it is also an identity of the subsemigroup {X₁; α}. We can state something further.

2.1.28. Theorem. Let {X; α} be a group with identity element e, and let {X₁; α} be a subgroup of {X; α}. Then e₁ is the identity element of {X₁; α} if and only if e₁ = e.

2.1.29. Exercise. Prove Theorem 2.1.28.

It should be noted that a semigroup {X; α} which has no identity element may contain a subgroup {X₁; α₁}, since it is possible for a subsystem to possess an identity element while the original system may not possess an identity. If {X; α} is a semigroup with an identity element and if {X₁; α} is a subgroup, then the identity element of X may or may not be the identity element of X₁. However, if {X; α} is a group, then the subgroup must satisfy the conditions given in the following:

2.1.30. Theorem. Let {X; α} be a group, and let X₁ be a non-empty subset of X. Then {X₁; α} is a subgroup if and only if
(i) e ∈ X₁;
(ii) for every x ∈ X₁, x⁻¹ ∈ X₁; and
(iii) for every x, y ∈ X₁, x α y ∈ X₁.

Proof. Assume that {X₁; α} is a subgroup. Then (i) follows from Theorem 2.1.28, and (ii) and (iii) follow from the definition of a group.
Conversely, assume that hypotheses (i), (ii), and (iii) hold. Condition (iii) implies that X₁ is closed relative to α, and therefore {X₁; α} is a subsemigroup. Condition (i) along with Theorem 2.1.24 imply that {X₁; α} is a monoid, and condition (ii) implies that {X₁; α} is a group. ■

Analogous to Theorem 2.1.23 we have:

2.1.31. Theorem. Let {X; α} be a group, and let Xᵢ ⊂ X for all i ∈ I, where I is some index set. Let Y = ⋂_{i∈I} Xᵢ. If {Xᵢ; α} is a subgroup of {X; α} for every i ∈ I, then {Y; α} is a subgroup of {X; α}.

Proof. Since e ∈ Xᵢ for every i ∈ I, it follows that e ∈ Y. Therefore, Y is non-empty. Now let y ∈ Y. Then y ∈ Xᵢ for all i ∈ I, and thus y⁻¹ ∈ Xᵢ for all i, so that y⁻¹ ∈ Y. Since y ∈ X, it follows that Y ⊂ X. Also, for every x, y ∈ Y, we have x, y ∈ Xᵢ for every i ∈ I, and thus x α y ∈ Xᵢ for every i and hence x α y ∈ Y. Therefore, we conclude from Theorem 2.1.30 that {Y; α} is a subgroup of {X; α}. ■

A direct consequence of the above result is the following:

2.1.32. Corollary. Let {X; α} be a group, and let {X₁; α} and {X₂; α} be subgroups of {X; α}. Let X₃ = X₁ ∩ X₂. Then {X₃; α} is a subgroup of {X₁; α} and {X₂; α}.

2.1.33. Exercise. Prove Corollary 2.1.32.



We can define a generated subgroup in a similar manner as was done in the case of semigroups. To this end let W be any subset of X, where {X; α} is a group, and let
𝒴 = {Y: W ⊂ Y ⊂ X and {Y; α} is a subgroup of {X; α}}.
The set 𝒴 is clearly non-empty because X ∈ 𝒴. Now let
G = ⋂_{Y∈𝒴} Y.
Then W ⊂ G, and by Theorem 2.1.31, {G; α} is a subgroup of {X; α}. This subgroup is called the subgroup generated by W.

2.1.34. Exercise. Let W be defined as above. Show that if {W; α} is a subgroup of {X; α}, then it is the subgroup generated by W.

Let us now consider the following:

2.1.35. Example. Let Z denote the set of integers, and let "+" denote the usual operation of addition of integers. Let W = {1}. If Y is any subset of Z such that {Y; +} is a subgroup of {Z; +} and W ⊂ Y, then Y = Z. To prove this statement, let n be any positive integer. Since Y is closed with respect to +, we must have 1 + 1 = 2 ∈ Y. Similarly, we must have 1 + 1 + ⋯ + 1 = n ∈ Y. Also, the inverse of n is −n, and therefore all the negative integers are in Y. Also, n − n = 0 ∈ Y; i.e., Y = Z. Thus, G = ⋂_{Y∈𝒴} Y = Z, and so the group {Z; +} is the subgroup generated by {1}. ■

The above is an example of a special class of generated subgroups, the so-called cyclic groups, which we will define after our next result.

2.1.36. Theorem. Let Z denote the set of all integers, and let {X; α} be a group. Let x ∈ X and define xᵏ = x α x α ⋯ α x (k times), for k a positive integer. Let x⁻ᵏ = (xᵏ)⁻¹, and let x⁰ = e. Let Y = {xᵏ: k ∈ Z}. Then {Y; α} is the subgroup of {X; α} generated by {x}.

Proof. We first show that {Y; α} is a subgroup of {X; α}. Clearly, Y ⊂ X and e ∈ Y, and for every y ∈ Y we have y⁻¹ ∈ Y. Also, for every x, y ∈ Y we have x α y ∈ Y. Thus, by Theorem 2.1.30, {Y; α} is a subgroup of {X; α}. Next, we must show that {Y; α} is the subgroup generated by {x}. To do so, it suffices to show that Y ⊂ Yⱼ for every Yⱼ such that x ∈ Yⱼ and such that {Yⱼ; α} is a subgroup of {X; α}. But this is certainly true, since y ∈ Y implies y = xᵏ for some k ∈ Z. Since x ∈ Yⱼ, it follows that xᵏ ∈ Yⱼ and therefore y ∈ Yⱼ. ■

The preceding result motivates the following:



2.1.37. Definition. Let {X; α} be a group. If there exists an element x ∈ X such that the subgroup generated by {x} is equal to {X; α}, then {X; α} is called the cyclic group generated by x.

By Theorem 2.1.36, we see that a cyclic group has elements of such a form that X = {…, x⁻³, x⁻², x⁻¹, e, x, x², …}. Now suppose there is some positive integer n such that xⁿ = e. Then we see that xⁿ⁺¹ = x. Similarly, x⁻ⁿ = e, and x⁻ⁿ⁺¹ = x. Thus, X = {e, x, …, xⁿ⁻¹}, and X is a finite set of order n. If there is no n such that xⁿ = e, then X is an infinite set.
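For a finite group, the set Y = {xᵏ: k ∈ Z} of Theorem 2.1.36 can be computed by taking successive powers of x until the identity recurs; the negative powers are then included automatically. A sketch, reusing the operation table of Exercise 2.1.27:

```python
def cyclic_subgroup(T, x, e=0):
    """The set {e, x, x^2, ...}; in a finite group this is the subgroup
    generated by {x} of Theorem 2.1.36."""
    Y, p = {e}, x
    while p not in Y:
        Y.add(p)
        p = T[p][x]          # p := p alpha x
    return Y

T = [[0, 1, 2, 3, 4, 5],
     [1, 0, 4, 5, 2, 3],
     [2, 5, 0, 4, 3, 1],
     [3, 4, 5, 0, 1, 2],
     [4, 3, 1, 2, 5, 0],
     [5, 2, 3, 1, 0, 4]]

print(cyclic_subgroup(T, 1))   # {0, 1}: the element 1 has order 2
print(cyclic_subgroup(T, 4))   # {0, 4, 5}: the element 4 has order 3
```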
We consider next another important class of groups, the so-called permutation groups. To this end let X be a non-empty set and let M(X) denote the set of all mappings of X into itself. Now, if α, β ∈ M(X), then it follows from (1.2.15) that the composite mapping β ∘ α also belongs to M(X), and we can define an operation on M(X) (i.e., a mapping from M(X) × M(X) into M(X)) by associating with each ordered pair (β, α) the element β ∘ α. We denote this operation by "·" and write
β · α = β ∘ α,  α, β ∈ M(X).   (2.1.38)
We call this operation "multiplication," we refer to β · α as the product of β and α, and we note that (β ∘ α)(x) = (β · α)(x) for all x ∈ X. We also note that "·" is associative, for if α, β, γ ∈ M(X), then
(α · β) · γ = (α ∘ β) ∘ γ = α ∘ (β ∘ γ) = α · (β · γ).
Thus, the system {M(X); ·} is a semigroup, which we call the semigroup of transformations on X.
Next, let us recall that a permutation on X is a one-to-one mapping of X onto X. Clearly, any permutation on X belongs to M(X). In particular, the identity permutation e: X → X, defined by
e(x) = x for all x ∈ X,
belongs to M(X). We thus can readily prove the following:

2.1.39. Theorem. {M(X); ·} is a monoid whose identity element is the identity permutation of M(X).

Proof. Let α ∈ M(X). Then (e · α)(x) = e(α(x)) = α(x) for every x ∈ X, and so e · α = α. Similarly, (α · e)(x) = α(e(x)) = α(x) for all x ∈ X, and so α · e = α. ■
Next, we prove:

2.1.40. Theorem. Let {M(X); ·} be the semigroup of transformations on the set X. An element α ∈ M(X) has an inverse in M(X) if and only if α is a permutation on X. Moreover, the inverse of a unit α is the inverse mapping α⁻¹ determined by the permutation α.

Proof. Suppose that α ∈ M(X) is a permutation on X. Then it follows from Theorem 1.2.10, part (ii), that α⁻¹ is a permutation on X and hence α⁻¹ ∈ M(X). Since α ∘ α⁻¹ = α⁻¹ ∘ α = e, it follows that α · α⁻¹ = α⁻¹ · α = e, and thus α has an inverse.
Next, suppose that α has an inverse in M(X) and let α′ denote that inverse relative to "·". Then α′ ∈ M(X) and α · α′ = α′ · α = e. To show that α is a permutation on X we must show that α is a one-to-one mapping of X onto X. To prove that α is onto, we must show that for any x ∈ X there exists a y ∈ X such that α(y) = x. Since α′ ∈ M(X), it follows that α′(x) ∈ X for every x ∈ X and α ∘ α′(x) = e(x) = x. Letting y = α′(x), it follows that α is onto. To show that α is one-to-one, we assume that α(x) = α(y). Then α′(α(x)) = α′(α(y)), and since α′ ∘ α = e, we have
x = e(x) = α′ ∘ α(x) = α′ ∘ α(y) = e(y) = y.
Therefore, α is one-to-one. Hence, if α ∈ M(X) has an inverse, α⁻¹, it is a permutation on X. ■

Henceforth, we employ the following notation: the set of all permutations on a given set X is denoted by P(X). As pointed out in Chapter 1, if a set X has n elements, then there are n! distinct permutations on X.
The reader is now in a position to prove the following result.

2.1.41. Theorem. {P(X); ·} is a subgroup of {M(X); ·}.


2.1.42. Exercise. Prove Theorem 2.1.41.

The preceding result gives rise to a very important class of groups.

2.1.43. Definition. Any subgroup of the group {P(X); ·} is called a permutation group or a transformation group on X, and {P(X); ·} is called the permutation group or the transformation group on X.

Occasionally, we speak of a permutation group on X, say {Y; ·}, without making reference to the set X. In such cases it is assumed that {Y; ·} is a subgroup of the permutation group P(X) for some set X.

2.1.44. Example. Let X = {x, y, z}. Then P(X) consists of 3! = 6 permutations, namely,

α₁ = ( x  y  z ),   α₂ = ( x  y  z ),   α₃ = ( x  y  z ),
     ( x  y  z )         ( x  z  y )         ( y  x  z )

α₄ = ( x  y  z ),   α₅ = ( x  y  z ),   α₆ = ( x  y  z ).
     ( y  z  x )         ( z  x  y )         ( z  y  x )

We can readily verify that α₁ = e. If X₁ = {e, α₂}, then {X₁; ·} is a subgroup of P(X) and hence a permutation group on X. Let X₂ = {e, α₄, α₅}. Then {X₂; ·} is also a permutation group on X. Note that {X₁; ·} is of order 2 and {X₂; ·} is of order 3. ■
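The permutations of Example 2.1.44 and the claim about X₂ can be verified directly; permutations on X = {x, y, z} are stored as dictionaries and multiplied by composition as in Eq. (2.1.38):

```python
from itertools import permutations

X = ('x', 'y', 'z')
perms = [dict(zip(X, bottom)) for bottom in permutations(X)]
print(len(perms))              # 6 = 3! permutations

def mult(b, a):
    """beta . alpha = beta o alpha, Eq. (2.1.38)."""
    return {t: b[a[t]] for t in X}

e      = {'x': 'x', 'y': 'y', 'z': 'z'}
alpha4 = {'x': 'y', 'y': 'z', 'z': 'x'}
alpha5 = {'x': 'z', 'y': 'x', 'z': 'y'}

# X2 = {e, alpha4, alpha5} is closed under multiplication, hence a
# permutation group of order 3
X2 = [e, alpha4, alpha5]
assert all(mult(p, q) in X2 for p in X2 for q in X2)
```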

B. Rings and Fields

Thus far we have concerned ourselves with mathematical systems consisting of a set and an operation on the set. Presently we consider mathematical systems consisting of a basic set X with two operations α and β defined on the set, denoted by {X; α, β}. Associated with such systems there are two mathematical systems (called subsystems), {X; α} and {X; β}. By insisting that the systems {X; α} and {X; β} possess certain properties and that one of the operations be distributive over the other, we introduce the important mathematical systems known as rings. We then concern ourselves with special types of important rings called integral domains, division rings, and fields.

2.1.45. Definition. Let X be a non-empty set, and let α and β be operations on X. The set X together with the operations α and β on X, denoted by {X; α, β}, is called a ring if
(i) {X; α} is an abelian group;
(ii) {X; β} is a semigroup; and
(iii) β is distributive over α.

We refer to {X; α} as the group component of the ring, to {X; β} as the semigroup component of the ring, to α as the group operation of the ring, and to β as the semigroup operation of the ring. For convenience we often denote a ring {X; α, β} by X and simply refer to "ring X". For obvious reasons, we often use the symbols "+" and "·" ("addition" and "multiplication") in place of α and β, respectively. Thus, if X is a ring we may write {X; +, ·} and assume that {X; +} is the group component of X, and {X; ·} is the semigroup component of X. We call {X; +} the additive group of ring X, {X; ·} the multiplicative semigroup of ring X, x + y the sum of x and y, and x · y the product of x and y.
We use 0 ("zero") to denote the identity element of {X; +}. If {X; ·} has an identity element, we denote that identity by e.
The inverse of an element x relative to + is denoted by −x. If x has an inverse relative to "·", we denote it by x⁻¹. Furthermore, we denote x + (−y) by x − y (the "difference of x and y") and (−x) + y by −x + y. Note that the elements 0, e, −x, and x⁻¹ are unique.
Subsequently, we adopt the convention that when operations "+" and "·" appear mixed without parentheses to clarify the order of operation, the operation should be taken with respect to "·" first and then with respect to +. For example,
x · y + z = (x · y) + z
2.1. Some Basic Structures ofAlgebra 74

and not x · (y + z). The latter would have to be written with parentheses. Thus, we have

x · (y + z) = (x · y) + (x · z) = x · y + x · z.
In general, the semigroup {X; ·} does not contain an identity. However, if it does, we have:

2.1.46. Definition. Let {X; +, ·} be a ring. If the semigroup {X; ·} has an identity element, we say that X is a ring with identity.

There should be no ambiguity concerning the above statement. The group {X; +} always has an identity, so if we say "ring with identity," we must refer to {X; ·}.
We note that the operation "+" is always commutative for a given ring. If, in addition, the operation "·" is also commutative, we have:

2.1.47. Definition. Let {X; +, ·} be a ring. If the operation "·" is commutative on the set X, then the ring X is called a commutative ring.

For rings we also have:

2.1.48. Definition. Let {X; +, ·} be a ring with identity. An element x ∈ X is called a unit of X if x has an inverse as an element of the semigroup {X; ·}. We denote this inverse of x by x⁻¹.

The reader can readily verify that the following examples are rings.

2.1.49. Exercise. Letting "+" and "·" denote the usual operations of addition and multiplication, show that {X; +, ·} is a commutative ring with identity if
(i) X is the set of integers;
(ii) X is the set of rational numbers; and
(iii) X is the set of real numbers.

2.1.50. Exercise. Let X = {0, 1} and define "+" and "·" by the following operation tables:

    +  |  0   1        ·  |  0   1
    ---+--------       ---+--------
    0  |  0   1        0  |  0   0
    1  |  1   0        1  |  0   1

Show that {X; +, ·} is a commutative ring with identity.
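Checks of this kind are easy to automate. The following Python sketch (ours, not part of the text) verifies the requirements of Definition 2.1.45, together with commutativity and the identities, directly from the two tables:

    X = (0, 1)
    add = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}   # the "+" table
    mul = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}   # the "." table

    for x in X:
        for y in X:
            assert add[x, y] == add[y, x]          # "+" is commutative
            assert mul[x, y] == mul[y, x]          # "." is commutative
            for z in X:
                assert add[add[x, y], z] == add[x, add[y, z]]   # "+" associative
                assert mul[mul[x, y], z] == mul[x, mul[y, z]]   # "." associative
                assert mul[x, add[y, z]] == add[mul[x, y], mul[x, z]]  # distributive
        assert add[x, 0] == x                      # 0 is the additive identity
        assert mul[x, 1] == x                      # 1 is the multiplicative identity
        assert any(add[x, y] == 0 for y in X)      # additive inverses exist

    print("{0, 1} with these tables is a commutative ring with identity")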

2.1.51. Exercise. Let {X; α} be an abelian group with identity element e. Define the operation β on X as x β y = e for every x, y ∈ X. Show that {X; α, β} is a ring.

For rings we have:

2.1.52. Theorem. If {X; +, ·} is a ring, then for every x, y ∈ X we have
(i) x + 0 = 0 + x = x;
(ii) −(x + y) = (−x) + (−y) = (−x) − y = −x − y;
(iii) if x + y = 0, then x = −y;
(iv) −(−x) = x;
(v) 0 = x · 0 = 0 · x;
(vi) (−x) · y = −(x · y) = x · (−y); and
(vii) (−x) · (−y) = x · y.

Proof. Parts (i)-(iv) follow from the fact that {X; +} is an abelian group and from our notation convention.
To prove part (v) we note that since z + 0 = z for every z ∈ X, we have for every x ∈ X, 0 · x + 0 = 0 · x = (0 + 0) · x = 0 · x + 0 · x, and thus 0 = 0 · x. Also, x · 0 + 0 = x · 0 = x · (0 + 0) = x · 0 + x · 0, so that 0 = x · 0. Hence, 0 = x · 0 = 0 · x for every x ∈ X.
To prove part (vi), note that 0 · y = 0 for every y ∈ X, and since x + (−x) = 0 we have 0 = 0 · y = [x + (−x)] · y = x · y + (−x) · y. This implies that −(x · y) = (−x) · y, since −(x · y) is the additive inverse of x · y. Similarly, 0 = x · 0 = x · [y + (−y)] = x · y + x · (−y). This implies that x · (−y) = −(x · y). Thus, (−x) · y = −(x · y) = x · (−y).
Finally, to prove part (vii), we note that since −(−z) = z for every z ∈ X and since part (vi) holds for any x ∈ X, we obtain, replacing x by −x, (−x) · (−y) = −[(−x) · y] = −[−(x · y)] = x · y. ■

Now let {X; +, ·} denote a ring for which the two operations are equal, i.e., "+" = "·". Then x + y = x · y for all x, y ∈ X. In particular, if y = 0, then x + 0 = x · 0 = 0 for all x ∈ X, and we conclude that 0 is the only element of the set X. This gives rise to:

2.1.53. Definition. A ring {X; +, ·} is called a trivial ring if X = {0}.

We next introduce:

2.1.54. Definition. Let {X; +, ·} be a ring. If there exist non-zero elements x, y ∈ X (not necessarily distinct) such that x · y = 0, then x and y are both called divisors of zero.

We have:

2.1.55. Theorem. Let {X; +, ·} be a ring, and let X# = X − {0}. Then X has no divisors of zero if and only if {X#; ·} is a subsemigroup of {X; ·}.

Proof. Assume that X has no divisors of zero. Then x, y ∈ X# implies x · y ≠ 0, so x · y ∈ X# and X# is a subsemigroup.
Conversely, if x, y ∈ X# implies x · y ∈ X#, then x · y ≠ 0 whenever x ≠ 0 and y ≠ 0. ■

We now consider special types of rings called integral domains.

2.1.56. Definition. A ring {X; +, ·} is called an integral domain if it has no divisors of zero.

Our next result enables us to characterize integral domains in another equivalent fashion.

2.1.57. Theorem. A ring X is an integral domain if and only if for every x ≠ 0, the following three statements are equivalent for every y, z ∈ X:
(i) y = z;
(ii) x · y = x · z; and
(iii) y · x = z · x.

Proof. Assume that X is an integral domain. Clearly (i) implies (ii) and (iii). To show that (ii) implies (i), let x · y = x · z. Then x · (y − z) = 0. Since x ≠ 0 and X has no zero divisors, y − z = 0, or y = z. Thus, (ii) implies (i). Similarly, it follows that (iii) implies (i). This proves that (i), (ii), and (iii) are equivalent.
Conversely, assume that x ≠ 0 and that (i), (ii), and (iii) are equivalent. Let x · y = 0. Then x · 0 = x · y, and it follows that y must be zero since (ii) implies (i). Thus, x · y ≠ 0 for y ≠ 0, and X has no zero divisors. ■

We now introduce divisors of elements.

2.1.58. Definition. Let {X; +, ·} be a commutative integral domain with identity, and let x, y ∈ X. We say y is a divisor of x if there exists an element z ∈ X such that x = y · z. If y is a divisor of x, we write y | x.

If y | x, it is customary to say that y divides x.

2.1.59. Theorem. Let {X; +, ·} be a commutative integral domain with identity, and let x ∈ X. Then x is a unit of X if and only if x | e.

Proof. Let x | e. Then there is a z ∈ X such that e = x · z = z · x. Thus, z is an inverse of x, i.e., z = x⁻¹.
Conversely, let x be a unit of X. Then there exists x⁻¹ ∈ X such that e = x · x⁻¹, and thus x | e. ■

We notice that if in an integral domain x · y = 0, then either x = 0 or y = 0. Now a divisor of zero cannot have an inverse. To show this, we let x and y be divisors of zero, i.e., x · y = 0. Suppose that y has an inverse. Then x · y · y⁻¹ = 0 · y⁻¹, or x = 0, which contradicts the fact that x and y are zero divisors. However, the fact that an element is not a zero divisor does not imply that it has an inverse. If all of the elements except zero have an inverse, we have yet another special type of ring.

2.1.60. Definition. Let {X; +, ·} be a non-trivial ring, and let X# = X − {0}. The ring X is called a division ring if {X#; ·} is a subgroup of {X; ·}.

In the case of division rings we have:

2.1.61. Theorem. Let {X; +, ·} be a division ring. Then X is a ring with identity.

Proof. Let X# = X − {0}. Then {X#; ·} has an identity element e. Let x ∈ X. If x ∈ X#, then e · x = x · e = x. If x ∉ X#, then x = 0 and 0 · e = e · 0 = 0. Therefore, e is an identity element of X. ■

Of utmost importance is the following special type of ring.

2.1.62. Definition. Let {X; +, ·} be a division ring. Then X is called a field if the operation "·" is commutative.

Because of the prominence of fields in mathematics as well as in applications, and because we will have occasion to make repeated use of fields, it may be worthwhile to restate the above definition by listing all the properties of fields.

2.1.63. Definition. Let X be a set containing more than one element, and let there be two operations "+" and "·" defined on X. Then {X; +, ·} is a field provided that:
(i) x + (y + z) = (x + y) + z and x · (y · z) = (x · y) · z for all x, y, z ∈ X (i.e., "+" and "·" are associative operations);
(ii) x + y = y + x and x · y = y · x for all x, y ∈ X (i.e., "+" and "·" are commutative operations);
(iii) there exists an element 0 ∈ X such that 0 + x = x for all x ∈ X;
(iv) for every x ∈ X there exists an element −x ∈ X such that x + (−x) = 0;
(v) x · (y + z) = x · y + x · z for all x, y, z ∈ X (i.e., "·" is distributive over "+");
(vi) there exists an element e ≠ 0 such that e · x = x for all x ∈ X; and
(vii) for any x ≠ 0, there exists an x⁻¹ ∈ X such that x · (x⁻¹) = e.

2.1.64. Example. Perhaps the most widely known field is the set of real numbers with the usual rules for addition and multiplication. ■

2.1.65. Exercise. Let Z denote the set of all integers, and let "+" and "·" denote the usual operations of addition and multiplication on Z. Show that {Z; +, ·} is an integral domain, but not a division ring, and hence not a field.

The above example and exercise yield:

2.1.66. Definition. Let R denote the set of all real numbers, let Z denote the set of all integers, and let "+" and "·" denote the usual operations of addition and multiplication, respectively. We call {R; +, ·} the field of real numbers and {Z; +, ·} the ring of integers.

Another very important field is considered in the following:

2.1.67. Exercise. Let C = R × R, where R is given in Definition 2.1.66. For any x, y ∈ C, let x = (a, b) and y = (c, d), where a, b, c, d ∈ R. We define x = y if and only if a = c and b = d. Also, we define the operations "+" and "·" on C by

x + y = (a + c, b + d)

and

x · y = (ac − bd, ad + bc).

Show that {C; +, ·} is a field.

In view of the last exercise we have:

2.1.68. Definition. The field {C; +, ·} defined in Exercise 2.1.67 is called the field of complex numbers.
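The operations of Exercise 2.1.67 can be carried out mechanically on ordered pairs. The Python sketch below (our own illustration, not part of the text) implements them with exact rational arithmetic and checks the decisive field property, the existence of multiplicative inverses: for (a, b) ≠ (0, 0), the inverse is (a/(a² + b²), −b/(a² + b²)).

    from fractions import Fraction as Fr

    def add(x, y):
        (a, b), (c, d) = x, y
        return (a + c, b + d)

    def mul(x, y):
        (a, b), (c, d) = x, y
        return (a * c - b * d, a * d + b * c)

    def inv(x):
        # Multiplicative inverse of a non-zero pair (a, b) over the rationals.
        a, b = x
        n = a * a + b * b
        return (a / n, -b / n)

    x, y = (Fr(3), Fr(4)), (Fr(1), Fr(-2))
    e = (Fr(1), Fr(0))                  # the multiplicative identity
    assert mul(x, inv(x)) == e          # x · x⁻¹ = e
    assert mul(x, y) == mul(y, x)       # "·" is commutative
    print(mul(x, y))                    # (3·1 − 4·(−2), 3·(−2) + 4·1) = (11, −2)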

2.1.69. Exercise. Let Q denote the set of rational numbers, let P denote the set of irrational numbers, and let "+" and "·" denote the usual operations of addition and multiplication on P and Q.
(a) Discuss the system {Q; +, ·}.
(b) Discuss the system {P; +, ·}.
2.1.70. Exercise. (This exercise shows that the family of 2 × 2 matrices forms a ring but not a field.) Let {R; +, ·} denote the field of real numbers. Define M to be the set characterized as follows. If u, v ∈ M, then u and v are of the form

    u = [a  b],    v = [m  n],
        [c  d]         [p  q]

where a, b, c, d and m, n, p, q ∈ R. Define the operations "+" and "·" on M by

    u + v = [a  b] + [m  n] = [a + m   b + n]
            [c  d]   [p  q]   [c + p   d + q]

and

    u · v = [a  b] · [m  n] = [a·m + b·p   a·n + b·q]
            [c  d]   [p  q]   [c·m + d·p   c·n + d·q].

(Note that in the preceding, the operations + and · defined on M are entirely different from the operations + and · for the field R.)
(a) Show that {M; +} is a monoid.
(b) Show that {M; +} is an abelian group.
(c) Show that {M; +, ·} is a ring.
(d) Show that {M; +, ·} has divisors of zero.
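Part (d) is the essential obstruction to M being a field. A quick Python illustration (ours, not the text's) exhibits two non-zero elements of M whose product, under the "·" defined above, is the zero matrix:

    def mat_mul(u, v):
        (a, b), (c, d) = u
        (m, n), (p, q) = v
        return ((a * m + b * p, a * n + b * q),
                (c * m + d * p, c * n + d * q))

    u = ((1, 0), (0, 0))    # non-zero
    v = ((0, 0), (0, 1))    # non-zero
    zero = ((0, 0), (0, 0))

    assert mat_mul(u, v) == zero        # u · v = 0 although u ≠ 0 and v ≠ 0
    print("u and v are divisors of zero in M")

Since a divisor of zero cannot have an inverse, M cannot be a division ring, let alone a field.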
Next, we introduce the concept of subring.

2.1.71. Definition. Let X be a ring, and let Y be a non-void subset of X which is closed relative to both operations "+" and "·" of the ring X. The set Y, together with the (induced) operations "+" and "·", {Y; +, ·}, is called a subring of the ring X provided that {Y; +, ·} is itself a ring.

In connection with the above definition we say that the subset Y determines the subring {Y; +, ·}. We have:
2.1.72. Theorem. If X is a ring, then a non-void subset Y of X determines a subring of the ring X if and only if
(i) Y is closed with respect to both operations "+" and "·"; and
(ii) −x ∈ Y whenever x ∈ Y.

2.1.73. Exercise. Prove Theorem 2.1.72.

Using the concept of subring, we now introduce subdomains.

2.1.74. Definition. Let X be a ring, and let Y be a subring of X. If Y is an integral domain, then it is called a subdomain of X.

We also define subfield in a natural way.

2.1.75. Definition. Let X be a ring, and let Y be a subring of X. If Y is a field, then it is called a subfield of X.

Earlier, we characterized a trivial ring as a ring for which the set X consists only of the 0 element. In the case of subrings we have:
2.1. Some Basic Structures ofAlbegra 53

2.1.76. Definition. Let {X; +, ·} be a ring, and let {Y; +, ·} be a subring. Then subring Y is called a trivial subring if either
(i) Y = {0}, or
(ii) Y = X.

For subdomains we have:

2.1.77. Theorem. Let X be an integral domain, and let Y be a non-trivial subring of X. Then Y is a subdomain of X.

Proof. Let x, y ∈ Y, and let x · y = 0. Since x, y ∈ X, x and y cannot be zero divisors. Thus, Y has no zero divisors. ■

For subfields we have:

2.1.78. Theorem. Let X be a field, and let Y be a subring of X. Then Y is a subfield of X if and only if for every x ∈ Y, x ≠ 0, we have x⁻¹ ∈ Y.

2.1.79. Exercise. Prove Theorem 2.1.78.

For the intersection of arbitrary subrings we have the following:

2.1.80. Theorem. Let X be a ring, and let Xᵢ be a subring of X for each i ∈ I, where I is some index set. Let Y = ∩_{i∈I} Xᵢ. Then {Y; +, ·} is a subring of {X; +, ·}.

Proof. Since 0 ∈ Xᵢ for all i ∈ I, it follows that 0 ∈ Y and Y is non-empty. Let x, y ∈ Y. Then x, y ∈ Xᵢ for all i ∈ I. Hence, x + y ∈ Xᵢ and x · y ∈ Xᵢ for all i ∈ I, so that Y is closed with respect to "+" and "·". Also, −x ∈ Xᵢ for every i ∈ I. Thus, by Theorem 2.1.72, Y is a subring of X. ■

Now let {X; +, ·} be a ring and let W be any subset of X. Also, let

𝒴 = {Y: W ⊂ Y ⊂ X and Y is a subring of X}.

Then 𝒴 is non-empty because X ∈ 𝒴. Now let R = ∩_{Y∈𝒴} Y. Then W ⊂ R and, by Theorem 2.1.80, {R; +, ·} is a subring of {X; +, ·}. This subring is called the subring generated by W.

C. Modules, Vector Spaces, and Algebras

Thus far we have considered mathematical systems consisting of a set X of elements and of mappings from X × X into X called operations on X. Since a mapping may be regarded as a set and since an operation is a mapping (see Chapter 1), the various components of the mathematical systems considered up to this point may be thought of as being derived from one set X.

Next, we concern ourselves with mathematical systems which are not restricted to possessing one single fundamental set. We have seen that a single set admits a number of basic derived sets. Clearly, the number of sets that may be derived from two sets, say X and Y, will increase considerably. For example, there are sets which may be generated by utilizing operations on X and Y, and then there are sets which may be derived from mappings of X × Y into X or into Y.
Mathematical systems which possess several fundamental sets and operations on at least one of these sets may, at least in part, be analyzed by making use of the development given thus far in the present section. Indeed, one may view many such complex systems as a composite of simpler mathematical systems and refer to such systems simply as composite mathematical systems. Important examples of such systems include vector spaces, algebras, and modules.

2.1.81. Definition. Let {R; +, ·} be a ring with identity e, and let {X; +} be an abelian group. Let μ: R × X → X be any function satisfying the following four conditions for all α, β ∈ R and for all x, y ∈ X:
(i) μ(α + β, x) = μ(α, x) + μ(β, x);
(ii) μ(α, x + y) = μ(α, x) + μ(α, y);
(iii) μ(α, μ(β, x)) = μ(α · β, x); and
(iv) μ(e, x) = x.
Then the composite system {R, X, μ} is called a module.

Since the function μ is defined on R × X, the module defined above is sometimes called a left R-module. A right R-module is defined in an analogous manner. We will consider only left R-modules and simply refer to them as modules, or R-modules.
The mapping μ: R × X → X is usually abbreviated by writing μ(α, x) = αx, i.e., in the same manner as "multiplication of α times x." Using this notation, conditions (i) to (iv) above become
(i) (α + β)x = αx + βx;
(ii) α(x + y) = αx + αy;
(iii) α(βx) = (α · β)x; and
(iv) ex = x,
respectively. We usually refer to the module {R, X, μ} by simply referring to X and calling it an R-module or a module over R.
To simplify notation, we used in the preceding the same operation symbol, +, for ring R as well as for group X. However, this should cause no confusion, since it will always be clear from context which operation is used. We will follow similar practices on numerous other occasions in this book.
2.1. Some Basic Structures ofAlgebra 55

2.1.82. Example. Let {Z; +, ·} denote the ring of integers, and let {X; +} be any abelian group. Define μ: Z × X → X by μ(n, x) = x + ... + x, where the summation includes x n times. We abbreviate this as μ(n, x) = nx and think of it as "n times x." The identity element in Z is 1, and we see that the conditions (i) to (iv) in Definition 2.1.81 are satisfied. Thus, any abelian group may be viewed as a module over the ring of integers. ■

2.1.83. Example. Let {X; +, ·} be a ring with identity, and let R be a subring of X with e ∈ R. By defining μ: R × X → X as μ(α, x) = α · x, it is clear that X is an R-module. In particular, if R = X, we see that any ring with identity can be made into a module over itself. ■

For modules we have:

2.1.84. Theorem. Let X be an R-module. Then for all α ∈ R and x ∈ X we have
(i) α0 = 0;
(ii) α(−x) = −(αx);
(iii) 0x = 0; and
(iv) (−α)x = −(αx).

Proof. To prove the first part, we note that for 0 ∈ X we have 0 + 0 = 0. Thus, α(0 + 0) = α0 + α0 = α0, and so α0 = 0.
To prove the second part, note that for any x ∈ X we have x + (−x) = 0, and thus α(x + (−x)) = αx + α(−x) = α0 = 0. Therefore, α(−x) = −(αx).
To prove the third part, observe that for 0 ∈ R we have 0 + 0 = 0. Hence, (0 + 0)x = 0x + 0x = 0x, and therefore 0x = 0.
To prove the last part, note that since α + (−α) = 0, it follows that (α + (−α))x = 0x = 0. Therefore, αx + (−α)x = 0, and (−α)x = −(αx). ■

We next introduce the important concept of vector space.

2.1.85. Definition. Let {F; +, ·} be a field, and let {X; +} be an abelian group. If X is an F-module, then X is called a vector space over F.

The notion of vector space, also called linear space, is among the most important concepts encountered in mathematics. We will devote the next two chapters and a large portion of the remainder of this book to vector spaces and to mappings on such spaces.

2.1.86. Theorem. Let {R; +, ·} be a ring, and let Rⁿ = R × ... × R; i.e., Rⁿ denotes the n-fold Cartesian product of R. We denote the element x ∈ Rⁿ by x = (x₁, x₂, ..., xₙ) and define the operation "+" on Rⁿ by

x + y = (x₁ + y₁, ..., xₙ + yₙ)

for all x, y ∈ Rⁿ. Also, we define μ: R × Rⁿ → Rⁿ by

αx = (αx₁, ..., αxₙ)

for all α ∈ R and x ∈ Rⁿ. Then Rⁿ is an R-module.

2.1.87. Exercise. Prove Theorem 2.1.86.

We also have:

2.1.88. Theorem. Let {F; +, ·} be a field, and let Fⁿ = F × ... × F be the n-fold Cartesian product of F. Denote the element x ∈ Fⁿ by x = (ξ₁, ξ₂, ..., ξₙ), denote y ∈ Fⁿ by y = (η₁, η₂, ..., ηₙ), and define the operation "+" on Fⁿ by

x + y = (ξ₁ + η₁, ..., ξₙ + ηₙ)

for all x, y ∈ Fⁿ. Also, define μ: F × Fⁿ → Fⁿ by

αx = (αξ₁, ..., αξₙ)

for all α ∈ F and x ∈ Fⁿ. Then Fⁿ is a vector space over F.

2.1.89. Exercise. Prove Theorem 2.1.88.

In view of Theorem 2.1.88 we have:

2.1.90. Definition. Let {F; +, ·} be a field. The vector space Fⁿ over F is called the vector space of n-tuples over F.

Another very important concept encountered in mathematics is that of an algebra. We have:

2.1.91. Definition. Let X be a vector space over a field F. Let a binary operation called "multiplication" and denoted by "·" be defined on X, satisfying the following axioms:
(i) x · (y + z) = x · y + x · z;
(ii) (x + y) · z = x · z + y · z; and
(iii) (αx) · (βy) = (α · β)(x · y)
for all x, y, z ∈ X and for all α, β ∈ F. Then X is called an algebra over F. If, in addition to the above axioms, the binary operation of multiplication is associative, then X is called an associative algebra. If the operation is commutative, then X is called a commutative algebra. If X has an identity element, then X is called an algebra with identity.

Note that in hypothesis (iii) the symbol "·" is used to denote two different operations. Thus, in the case of x · y the operation used is defined on X, while in the case of α · β the operation used is defined on F.
The reader is cautioned that in some texts the term algebra means what we defined to be an associative algebra.

2.1.92. Exercise. Let {M; +, ·} denote the ring of 2 × 2 matrices defined in Exercise 2.1.70, and let {R; +, ·} be the field of real numbers. For u ∈ M given by

    u = [a  b],
        [c  d]

where a, b, c, d ∈ R, define αu for α ∈ R by

    αu = [αa  αb].
         [αc  αd]

Show that M is an associative algebra over R.

In some areas of application, so-called Lie algebras are of importance. We have:

2.1.93. Definition. A non-associative algebra R is called a Lie algebra if x · x = 0 for every x ∈ R and if

x · (y · z) + y · (z · x) + z · (x · y) = 0    (2.1.94)

for every x, y, z ∈ R. Equation (2.1.94) is called the Jacobi identity.

Let us now consider some specific cases of Lie algebras. Our first exercise shows that any associative algebra can be made into a Lie algebra.

2.1.95. Exercise. Let R be an associative algebra over F, and define the operation "∗" on R by

x ∗ y = x · y − y · x

for all x, y ∈ R (where "·" is the operation on the associative algebra R over F). Show that R with "∗" defined on it is a Lie algebra.

2.1.96. Example. In Exercise 2.1.70 we showed that the set of 2 × 2 matrices forms a ring but not a field, and in Exercise 2.1.92 we showed that this set forms an algebra over R, the field of real numbers. This set can be made into a Lie algebra by Exercise 2.1.95. ■
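As a spot check of this construction, the Python sketch below (our own; a sample-based check, not a proof) forms the bracket x ∗ y = x · y − y · x on a few 2 × 2 integer matrices and verifies x ∗ x = 0 and the Jacobi identity (2.1.94):

    from itertools import product

    def mul(u, v):
        (a, b), (c, d) = u
        (m, n), (p, q) = v
        return ((a*m + b*p, a*n + b*q), (c*m + d*p, c*n + d*q))

    def add(u, v):
        return tuple(tuple(x + y for x, y in zip(r, s)) for r, s in zip(u, v))

    def sub(u, v):
        return tuple(tuple(x - y for x, y in zip(r, s)) for r, s in zip(u, v))

    def bracket(u, v):
        # The Lie product of Exercise 2.1.95: x * y = x · y − y · x.
        return sub(mul(u, v), mul(v, u))

    zero = ((0, 0), (0, 0))
    mats = [((1, 2), (0, 1)), ((0, 1), (1, 0)), ((3, 0), (1, -1))]

    for x in mats:
        assert bracket(x, x) == zero                       # x * x = 0
    for x, y, z in product(mats, repeat=3):
        s = add(add(bracket(x, bracket(y, z)),
                    bracket(y, bracket(z, x))),
                bracket(z, bracket(x, y)))
        assert s == zero                                   # Jacobi identity (2.1.94)
    print("commutator bracket: x * x = 0 and Jacobi hold on the samples")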

2.1.97. Exercise. Let X denote the usual "three-dimensional space," and let i, j, k denote the elements of X depicted in Figure A.

[Figure: the unit vectors i = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1) drawn along the three coordinate axes.]

2.1.98. Figure A. Unit vectors i, j, k in three-dimensional space.

Define the operation "×" on X by the table

    ×  |  i    j    k
    ---+---------------
    i  |  0    k   −j
    j  | −k    0    i
    k  |  j   −i    0

i.e., "×" denotes the usual "cross product," also called "outer product," encountered in vector analysis. Show that X is a Lie algebra.
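In coordinates the table corresponds to the familiar formula x × y = (ξ₂η₃ − ξ₃η₂, ξ₃η₁ − ξ₁η₃, ξ₁η₂ − ξ₂η₁) for x = (ξ₁, ξ₂, ξ₃) and y = (η₁, η₂, η₃). The Python sketch below (ours; again a sample-based check rather than a proof) confirms that this formula reproduces the table and satisfies x × x = 0 and the Jacobi identity:

    from itertools import product

    def cross(x, y):
        return (x[1]*y[2] - x[2]*y[1],
                x[2]*y[0] - x[0]*y[2],
                x[0]*y[1] - x[1]*y[0])

    def add(x, y):
        return tuple(a + b for a, b in zip(x, y))

    i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)
    zero = (0, 0, 0)

    # The entries of the multiplication table above:
    assert cross(i, j) == k and cross(j, k) == i and cross(k, i) == j

    for x, y, z in product([i, j, k, (1, 2, 3), (-1, 0, 4)], repeat=3):
        assert cross(x, x) == zero                         # x × x = 0
        s = add(add(cross(x, cross(y, z)),
                    cross(y, cross(z, x))),
                cross(z, cross(x, y)))
        assert s == zero                                   # Jacobi identity (2.1.94)
    print("the cross product makes three-dimensional space a Lie algebra (on samples)")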

Let us next consider submodules.

2.1.99. Definition. Let {R; +, ·} be a ring with identity, and let {X; +} be an abelian group, where X is an R-module. Let {Y; +} be a subgroup of {X; +}. If Y is an R-module, then Y is called an R-submodule of X.

We can characterize submodules by the following:

2.1.100. Theorem. Let X be an R-module, and let Y be a non-empty subset of X. Then Y is an R-submodule if and only if
(i) {Y; +} is a subgroup of {X; +}; and
(ii) for all α ∈ R and x ∈ Y, we have αx ∈ Y.

Proof. We give the sufficiency part of the proof and leave the necessity part as an exercise.
Let α, β ∈ R and let x ∈ Y. Then αx, βx, (α + β)x ∈ Y by hypothesis (ii). Since Y is a group, it follows that αx + βx ∈ Y, and since x ∈ X we have (α + β)x = αx + βx. Now let α ∈ R and let x, y ∈ Y. Then (x + y) ∈ Y and, also, α(x + y), αx, αy ∈ Y. Thus, α(x + y) = αx + αy, since Y is a subgroup of X. Now let α, β ∈ R, and let x ∈ Y. Then βx ∈ Y, and hence α(βx) ∈ Y. We have (α · β)x ∈ Y, and so α(βx) = (α · β)x. Also, since e ∈ R, we have ex ∈ Y for all x ∈ Y and furthermore, since x ∈ X, we have ex = x. This proves that Y is an R-module and hence an R-submodule of X. ■

2.1.101. Exercise. Prove the necessity part of the preceding theorem.

We next introduce the notion of vector subspace, also called linear subspace.

2.1.102. Definition. Let F be a field, and let X be a vector space over F. Let Y be a subset of X. If Y is an F-submodule of X, then Y is called a vector subspace.

Let us consider some specific cases.

2.1.103. Example. Let R be a ring, let X be an R-module, and let xᵢ ∈ X for i = 1, ..., n. Then the subset of X given by {x ∈ X: x = α₁x₁ + ... + αₙxₙ, αᵢ ∈ R} is an R-submodule of X. ■
2.1.104. Example. Let F be a field, and let Fⁿ be the vector space of n-tuples over F. Let x₁ = (1, 0, ..., 0) and x₂ = (0, 1, 0, ..., 0). Then x₁, x₂ ∈ Fⁿ. Let Y = {x ∈ Fⁿ: x = α₁x₁ + α₂x₂, α₁, α₂ ∈ F}. Then Y is a vector subspace. We see that if x ∈ Y, then x is of the form x = (α₁, α₂, 0, ..., 0). ■
We next prove:

2.1.105. Theorem. Let X be an R-module, and let 𝒴 denote a family of R-submodules of X; i.e., Yᵢ is a submodule of X for every Yᵢ ∈ 𝒴, where i ∈ I and I is some index set. Let Y = ∩_{i∈I} Yᵢ. Then Y is an R-submodule of X.

Proof. Since Yᵢ is a subgroup of X for all Yᵢ ∈ 𝒴, it follows that Y is a subgroup of X by Theorem 2.1.31. Now let α ∈ R and let y ∈ Y. Then y ∈ Yᵢ for all Yᵢ ∈ 𝒴. Hence, αy ∈ Yᵢ for all Yᵢ ∈ 𝒴, and so αy ∈ Y. Therefore, by Theorem 2.1.100, Y is an R-submodule of X. ■

The above result gives rise to:



2.1.106. Definition. Let X be an R-module, and let W be a subset of X. Let 𝒴 be the family of subsets of X given by

𝒴 = {Y: W ⊂ Y ⊂ X and Y is an R-submodule of X}.

Let G = ∩_{Y∈𝒴} Y. Then G is called the R-submodule of X generated by W.

Let us next prove:

2.1.107. Theorem. Let X be an R-module, and let x₁, ..., xₙ ∈ X. Let Y(x₁, ..., xₙ) denote the subset of X given by

Y(x₁, ..., xₙ) = {x ∈ X: x = α₁x₁ + ... + αₙxₙ, α₁, ..., αₙ ∈ R}.

Then Y(x₁, ..., xₙ) is an R-submodule of X.

Proof. For brevity let Y = Y(x₁, ..., xₙ). To show that Y is a subgroup of X we first note that 0 ∈ Y. Next, for x = α₁x₁ + ... + αₙxₙ ∈ Y, let y = (−α₁)x₁ + ... + (−αₙ)xₙ. Then y ∈ Y and x + y = 0, and hence y = −x. Next, let z = β₁x₁ + ... + βₙxₙ ∈ Y. Then x + z = (α₁ + β₁)x₁ + ... + (αₙ + βₙ)xₙ ∈ Y. Therefore, by Theorem 2.1.30, Y is a subgroup of X.
Finally, note that for any α ∈ R,

αx = α(α₁x₁ + ... + αₙxₙ) = (α · α₁)x₁ + ... + (α · αₙ)xₙ ∈ Y.

Thus, by Theorem 2.1.100, Y is an R-submodule of X. ■

We see that Y(x₁, ..., xₙ) belongs to the family 𝒴 of Definition 2.1.106 if we let W = {x₁, ..., xₙ}, in which case ∩_{Y∈𝒴} Y = Y(x₁, ..., xₙ). This leads to:

2.1.108. Definition. Let X be an R-module, let x₁, ..., xₙ ∈ X, and let

Y(x₁, ..., xₙ) = {x ∈ X: x = α₁x₁ + ... + αₙxₙ, α₁, ..., αₙ ∈ R}.

Then Y(x₁, ..., xₙ) is called the R-submodule of X generated by x₁, ..., xₙ.

Also of interest to us is:

2.1.109. Definition. Let X be an R-module. If there exist elements x₁, ..., xₙ ∈ X such that for every x ∈ X there exist α₁, ..., αₙ ∈ R such that x = α₁x₁ + ... + αₙxₙ, then X is said to be finitely generated, and x₁, ..., xₙ are called the generators of X.

It can happen that the indexed set {x₁, ..., xₙ} in the above definition is not unique. That is to say, for x ∈ X we may have x = α₁x₁ + ... + αₙxₙ = β₁x₁ + ... + βₙxₙ, where αᵢ ≠ βᵢ for some i. However, if it turns out that the above representation of x in terms of x₁, ..., xₙ is unique, then we have:

2.1.110. Definition. Let X be an R-module which is finitely generated. Let x₁, ..., xₙ be generators of X. If for every x ∈ X the relation

x = α₁x₁ + ... + αₙxₙ = β₁x₁ + ... + βₙxₙ

implies that αᵢ = βᵢ for all i = 1, ..., n, then the set {x₁, ..., xₙ} is called a basis for X.

D. Overview

We conclude this section with the flow chart of Figure B, which attempts to put into perspective most of the algebraic systems considered thus far.

[Flow chart omitted. Among the structures it relates are integral domains, commutative rings, modules, and associative and commutative algebras.]

2.1.111. Figure B. Some basic structures of algebra.


2.2. HOMOMORPHISMS

Thus far we have concerned ourselves with various aspects of different mathematical systems (e.g., semigroups, groups, rings, etc.). In the present section we study special types of mappings defined on such algebraic structures. We begin by first considering mappings on semigroups.
2.2.1. Definition. Let {X; α} and {Y; β} be two semigroups (not necessarily distinct). A mapping ρ of set X into set Y is called a homomorphism of the semigroup {X; α} into the semigroup {Y; β} if

ρ(x α y) = ρ(x) β ρ(y)    (2.2.2)

for every x, y ∈ X. The image of X under ρ, denoted by ρ(X), is called the homomorphic image of X. If x ∈ X, then ρ(x) is called the homomorphic image of x.

In Figure C, the significance of Eq. (2.2.2) is depicted pictorially. From this figure and from Eq. (2.2.2) it is evident why homomorphisms are said to "preserve the operations α and β."

[Figure omitted.]

2.2.3. Figure C. Homomorphism of semigroup {X; α} into semigroup {Y; β}.

In the above definition we have used arbitrary semigroups {X; α} and {Y; β}. As mentioned in Section 2.1, it is often convenient to use the symbol "+" for operations. When using the notation {X; +} and {Y; +} to denote two different semigroups, it should of course be understood that the operation + associated with set X will, in general, be different from the operation + associated with set Y. Since it will usually be clear from context which particular operation is being used, the same symbol will be employed for both semigroups (however, on rare occasions we may wish to distinguish between different operations on different sets).
Using the notation {X; +} and {Y; +} in Definition 2.2.1, Eq. (2.2.2)


now assumes the form

ρ(x + y) = ρ(x) + ρ(y)    (2.2.4)

for every x, y ∈ X. This relation looks very much like the "linearity property" which will be the central topic of a large portion of the remainder of this book, and with which the reader is no doubt familiar. However, we emphasize here that the definition of "linear" will be reserved for a later occasion, and that the term homomorphism is not to be taken as being synonymous with linear. Nevertheless, we will see that many of the subsequent results for homomorphisms will reoccur with appropriate counterparts throughout this book.

2.2.5. Example. Let R denote the set of real numbers, and let "+" and "·" denote the usual operations of addition and multiplication on R. Then {R; +} and {R; ·} are semigroups. Let

f(x) = eˣ

for all x ∈ R. Then f is a homomorphism from {R; +} to {R; ·}. ■

2.2.6. Exercise. Let {X; +} and {X; ·} denote the semigroups defined in Example 2.1.17. Let f: X → X be defined as follows: f(0) = 1, f(1) = 3, f(2) = 1, and f(3) = 3. Show that f is a homomorphism from {X; +} into {X; ·}.

In order to simplify our notation even further, we will often use the symbol "·" in the remainder of the present chapter to denote operations for semigroups (or groups), say {X; ·}, {Y; ·}, and we will often refer to these simply as semigroup (or group) X and Y, respectively. In this case, if ρ denotes a homomorphism of X into Y, we write

ρ(x · y) = ρ(x) · ρ(y)

for all x, y ∈ X.
In Chapter 1 we classified mappings as being into, onto, one-to-one and into, and one-to-one and onto. Now if ρ is a homomorphism of a semigroup X into a semigroup Y, we can also classify homomorphisms as being into, onto, one-to-one and into, and one-to-one and onto. This classification gives rise to the following concepts.

2.2.7. Definition. Let ρ be a homomorphism of a semigroup X into a semigroup Y.
(i) If ρ is a mapping of X onto Y, we say that X and Y are homomorphic semigroups, and we refer to X as being homomorphic to Y.
(ii) If ρ is a one-to-one mapping of X into Y, then ρ is called an isomorphism of X into Y.
(iii) If ρ is a mapping which is onto and one-to-one, we say that semigroup X is isomorphic to semigroup Y.
(iv) If X = Y (i.e., ρ is a homomorphism of semigroup X into itself), then ρ is called an endomorphism.
(v) If X = Y and if ρ is an isomorphism (i.e., ρ is an isomorphism of semigroup X into itself), then ρ is called an automorphism of X.

We note that since all groups are semigroups, the concepts introduced in the above definition apply necessarily also to groups.
In connection with isomorphic semigroups (or groups) a very important observation is in order. We first note that if a semigroup (or group) X is isomorphic to a semigroup Y, then there exists a mapping ρ from X into Y which is one-to-one and onto. Thus, the inverse of ρ, ρ⁻¹, exists, and we can associate with each element of X one and only one element of Y, and vice versa. Secondly, we note that ρ is a homomorphism, i.e., ρ preserves the properties of the respective operations associated with semigroup (or group) X and semigroup (or group) Y; or, to put it another way, under ρ the (algebraic) properties of semigroups (or groups) X and Y are preserved. Hence, it should be clear that isomorphic semigroups (or groups) are essentially indistinguishable, the homomorphism (which is one-to-one and onto in this case) amounting to a mere relabeling of elements of one set by elements of a second set. We will encounter this type of phenomenon on several other occasions in this book.
We are now ready to prove several results.
2.2.8. Theorem. Let ρ be a homomorphism from a semigroup X into a semigroup Y. Then
(i) ρ(X) is a subsemigroup of Y;
(ii) if X has an identity element e, ρ(e) is an identity element of ρ(X);
(iii) if X has an identity element e, and if x ∈ X has an inverse x⁻¹, then ρ(x) has an inverse in ρ(X) and, in fact, [ρ(x)]⁻¹ = ρ(x⁻¹);
(iv) if X₁ is a subsemigroup of X, then ρ(X₁) is a subsemigroup of ρ(X); and
(v) if Y₁ is a subsemigroup of ρ(X), then

X₁ = {x ∈ X: ρ(x) ∈ Y₁}

is a subsemigroup of X.
Proof. To prove the first part we must show that the subset ρ(X) of Y is closed relative to the operation "·" on Y. Now if x′, y′ ∈ ρ(X), then there exists at least one x ∈ X and at least one y ∈ X such that ρ(x) = x′ and ρ(y) = y′. Since ρ is a homomorphism, we have

x′ · y′ = ρ(x) · ρ(y) = ρ(x · y),

and since x · y ∈ X it follows that x′ · y′ ∈ ρ(X) because ρ(x · y) ∈ ρ(X). Thus, ρ(X) is closed and, hence, is a subsemigroup of Y.
To prove the second part, note that since e ∈ X we have ρ(e) ∈ ρ(X), and since for any x′ ∈ ρ(X) there exists x ∈ X such that ρ(x) = x′, we have

ρ(e) · x′ = ρ(e) · ρ(x) = ρ(e · x) = ρ(x) = x′.

Since this is true for every x′ ∈ ρ(X), it follows that ρ(e) is a left identity element of ρ(X). Similarly, we can show that x′ · ρ(e) = x′ for every x′ ∈ ρ(X). Thus, ρ(e) is an identity element of the subsemigroup ρ(X) of Y.
To prove the third part of the theorem, note that since ρ is a homomorphism, we have

ρ(x) · ρ(x⁻¹) = ρ(x · x⁻¹) = ρ(e),

and

ρ(x⁻¹) · ρ(x) = ρ(x⁻¹ · x) = ρ(e);

i.e., ρ(e) is an identity element of ρ(X). Also, since ρ(x⁻¹) ∈ ρ(X), ρ(x) has an inverse in ρ(X), and [ρ(x)]⁻¹ = ρ(x⁻¹).
The proofs of parts (iv) and (v) of this theorem are left as an exercise. ■

2.2.9. Exercise. Complete the proof of Theorem 2.2.8.

We emphasize that although ρ(e) in the above theorem is an identity element of the subsemigroup ρ(X) of Y, it is not necessarily true that ρ(e) has to be an identity element of Y.

2.2.10. Definition. Let ρ be a homomorphism of a semigroup X into a semigroup Y. If ρ(X) has an identity element, say e′, then the subset of X, K_ρ, defined by

K_ρ = {x ∈ X: ρ(x) = e′}

is called the kernel of the homomorphism ρ.

It turns out that K_ρ is a semigroup; i.e., we have:

2.2.11. Theorem. K_ρ is a subsemigroup of X.

2.2.12. Exercise. Prove Theorem 2.2.11.

Now let X and Y be groups (instead of semigroups, as above), and let ρ be a homomorphism of X into Y. We have:

2.2.13. Theorem. Let ρ be a homomorphism from a group X into a group Y. Then
(i) ρ(X) is a subgroup of Y; and
(ii) if e is the identity element of X, then ρ(e) is the identity element of Y.

Proof. To prove the first part, let e denote the identity element of X. By part (i) of Theorem 2.2.8, ρ(X) is a subsemigroup of Y; by part (ii) of Theorem 2.2.8, ρ(e) is an identity element of ρ(X); and by part (iii) of the same theorem, it follows that every element of ρ(X) has an inverse. Thus, ρ(X) is a subgroup of Y.
The second part of this theorem follows from Theorem 2.1.28 and from part (ii) of Theorem 2.2.8. ■

The following result is known as Cayley's theorem.

2.2.14. Theorem. Let {X; ·} be a group, and let {P(X); ·} denote the permutation group on X. Then X is isomorphic to a subgroup of P(X).

Proof. For each a ∈ X, define the mapping f_a: X → X by f_a(x) = a · x for each x ∈ X. If x, y ∈ X and f_a(x) = f_a(y), then a · x = a · y, and so x = y. Hence, f_a is an injective mapping. Now let y ∈ X. Then a⁻¹ · y ∈ X, and so f_a(a⁻¹ · y) = y. This implies that f_a is surjective. Hence, f_a is a (1-1) mapping of X onto X, which implies that f_a is a permutation on X; i.e., f_a ∈ P(X). Now define the function φ: X → P(X) by φ(a) = f_a for each a ∈ X. Now let u, v ∈ X. For each x ∈ X, f_{u·v}(x) = (u · v) · x = u · (v · x) = f_u(v · x) = f_u(f_v(x)) = f_u ∘ f_v(x). Thus, f_{u·v} = f_u ∘ f_v for all u, v ∈ X. Since φ(u · v) = f_{u·v} and φ(u) ∘ φ(v) = f_u ∘ f_v, it follows that φ(u · v) = φ(u) ∘ φ(v), and so φ is a homomorphism. Suppose u, v ∈ X are such that φ(u) = φ(v). Then f_u = f_v, which implies that f_u(x) = f_v(x) for all x ∈ X. In particular, f_u(e) = f_v(e). Hence, u · e = v · e, so that u = v. This implies that φ is injective. It follows that φ is a (1-1) mapping of X onto φ(X). By Theorem 2.2.13, part (i), φ(X) is a subgroup of P(X). This completes the proof. ■
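The mechanism of the proof, sending each a ∈ X to the left-translation permutation f_a, can be displayed on a small example. The Python sketch below (our own illustration, using the group {0, 1, 2, 3} under addition mod 4, which is not discussed in the text) builds φ and confirms that it is injective and carries the group operation into composition of permutations:

    X = (0, 1, 2, 3)                       # the group Z4 under addition mod 4
    op = lambda a, b: (a + b) % 4

    def f(a):
        # The left-translation permutation f_a(x) = a · x, stored as a tuple of images.
        return tuple(op(a, x) for x in X)

    def compose(p, q):
        # (p ∘ q)(x) = p(q(x)), with permutations stored as image tuples.
        return tuple(p[q[x]] for x in X)

    phi = {a: f(a) for a in X}

    # phi is injective: distinct elements give distinct permutations.
    assert len(set(phi.values())) == len(X)
    # phi is a homomorphism: phi(a · b) = phi(a) ∘ phi(b).
    assert all(phi[op(a, b)] == compose(phi[a], phi[b]) for a in X for b in X)
    print("Z4 is isomorphic to a subgroup of P(X), as Cayley's theorem asserts")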

We also have:

2.2.15. Theorem. Let ρ be a homomorphism of a semigroup X into a semigroup Y, and let ρ be an isomorphism of X with ρ(X). Then
(i) ρ⁻¹ is an isomorphism of ρ(X) with X; and
(ii) if ρ(X) contains an identity element e′, then ρ⁻¹(e′) = e is an identity element of X and K_ρ = {e} and K_{ρ⁻¹} = {e′} (K_ρ denotes the kernel of the homomorphism ρ).

Proof. To prove the first part of the theorem, let x′, y′ ∈ ρ(X). Then there exist unique x, y ∈ X such that ρ(x) = x′ and ρ(y) = y′, and ρ⁻¹(x′) = x and ρ⁻¹(y′) = y. Since

ρ(x · y) = ρ(x) · ρ(y) = x′ · y′,

we have

ρ⁻¹(x′ · y′) = x · y = ρ⁻¹(x′) · ρ⁻¹(y′).

Since this is true for all x′, y′ ∈ ρ(X), it follows that ρ⁻¹ is an isomorphism of ρ(X) with X.
To prove the second part of the theorem, we first note that ρ(X) is a subsemigroup of Y by Theorem 2.2.8. It follows from Theorem 2.2.13 that e = ρ⁻¹(e′) is an identity element of X. Now let ρ(k) = e′. Since ρ(e) = e′, it follows that k = e and that K_ρ = {e}. We can similarly show that K_{ρ⁻¹} = {e′}. ■

From the above result we can now conclude that if a semigroup X is isomorphic to a semigroup Y, then the semigroup Y is isomorphic to the semigroup X.
For endomorphisms and automorphisms we have:

2.2.16. Theorem. Let η and ψ be homomorphisms of a semigroup X into itself.
(i) If η and ψ are endomorphisms of X, then the composite mapping ψ ∘ η is likewise an endomorphism of X.
(ii) If η and ψ are automorphisms of X, then ψ ∘ η is an automorphism of X.
(iii) If η is an automorphism of X, then η⁻¹ is also an automorphism of X.

Proof. To prove the first part, note that η and ψ are both mappings of X into X, and thus ψ ∘ η is a mapping of X into X. Also, by definition, (ψ ∘ η)(x) = ψ(η(x)) for every x ∈ X. Now since η(x · y) = η(x) · η(y) and ψ(x · y) = ψ(x) · ψ(y) for every x, y ∈ X, we have

ψ ∘ η(x · y) = ψ(η(x · y)) = ψ(η(x) · η(y)) = ψ(η(x)) · ψ(η(y)) = (ψ ∘ η(x)) · (ψ ∘ η(y)).

This implies that the mapping ψ ∘ η is an endomorphism of X.
The proofs of the second and third parts of this theorem are left as an exercise. ■

2.2.17. Exercise. Complete the proof of the above theorem.

Let us next consider homomorphisms of rings. To this end let, henceforth, X and Y be arbitrary rings, and without loss of generality let the operations of these two rings be denoted by "+" and "·".
2.2.18. Definition. Let X and Y be two rings. A mapping ρ of set X into set Y is called a homomorphism of the ring X into the ring Y if
(i) ρ(x + y) = ρ(x) + ρ(y); and
(ii) ρ(x · y) = ρ(x) · ρ(y)
for every x, y ∈ X. The image of X in Y, denoted by ρ(X), is called the homomorphic image of X.

If a homomorphism ρ is a one-to-one mapping of a ring X into a ring Y, then ρ is called an isomorphism of X into Y. If the isomorphism ρ is an onto mapping of X into Y, then ρ is called an isomorphism of X with Y. Furthermore, if ρ is a homomorphism of X into X, then ρ is called an endomorphism of the ring X. Finally, an isomorphism of X with itself is called an automorphism of ring X.
The properties associated with homomorphisms of groups and semigroups can, of course, be utilized when discussing homomorphisms of rings.

2.2.19. Theorem. Let ρ be a homomorphism of a ring X into a ring Y.
(i) The homomorphic image ρ(X) is a subring of Y.
(ii) If X₁ is a subring of X, then ρ(X₁) is a subring of ρ(X).
(iii) Let Y₁ be a subring of ρ(X). Then the subset X₁ ⊂ X defined by

X₁ = {x ∈ X: ρ(x) ∈ Y₁}

is a subring of X.
(iv) Let Z be a ring and let ψ be a homomorphism of Y into Z. Then the composite mapping ψ ∘ ρ is a homomorphism of X into Z.

Proof. To prove the first part of the theorem we note that the homomorphic image ρ(X) is clearly the homomorphic image of the group {X; +} and of the semigroup {X; ·}. Since this homomorphic image is a subgroup of {Y; +} and a subsemigroup of {Y; ·}, it follows from Theorem 2.1.72 that ρ(X) is a subring of Y.
The proofs of the remaining parts of this theorem are left as an exercise. ■

2.2.20. Exercise. Prove parts (ii), (iii), and (iv) of Theorem 2.2.19.

Analogous to Definition 2.2.10, we make the following definition.

2.2.21. Definition. If ρ is a homomorphism of a ring X into a ring Y, then the subset K_ρ of X defined by

K_ρ = {z ∈ X: ρ(z) = 0}

is called the kernel of the homomorphism ρ of the ring X into Y.

We close the present section by introducing one more concept.

2.2.22. Definition. Let {R; +, ·} be a ring with identity and let X and Y be two R-modules. A mapping f: X → Y is called an R-homomorphism if, for all u, v ∈ X and α ∈ R, the relations
(i) f(u + v) = f(u) + f(v); and
(ii) f(αu) = αf(u)
hold.

In the next chapter we will consider in great detail a special class of vector spaces and homomorphisms, and for this reason we will not pursue this subject any further at this time.

2.3. APPLICATION TO POLYNOMIALS

Polynomials play an important role in many branches of mathematics as well as in science and engineering. In the present section we briefly consider applications of some of the concepts of the preceding sections to polynomials.
First, we wish to give an abstract definition for a polynomial function. Basically, we want this function to take the form

f(t) = a₀ + a₁t + ... + aₙtⁿ.

However, we are not looking for a way of defining the value of f(t) for each t, but instead we seek a definition of f in terms of the indexed set {a₀, ..., aₙ}. To this end we let the aᵢ belong to some field.
More formally, let F be a field and define a set P as follows. If a ∈ P, then a denotes an infinite sequence of elements from F in which all except a finite number are zero. Thus, if a ∈ P, then

a = {a₀, a₁, ..., aₙ, 0, 0, ...}.

That is to say, there exists some integer n > 0 such that aᵢ = 0 for all i > n. Now let b be another element of P, where

b = {b₀, b₁, ..., bₘ, 0, 0, ...}.

We say that a = b if and only if aᵢ = bᵢ for all i. We now define the operation "+" on P by

a + b = {a₀ + b₀, a₁ + b₁, ...}.

Thus, if n ≥ m, then aᵢ + bᵢ = 0 for all i > n, and P is clearly closed with respect to "+". Next, we define the operation "·" on P by

a · b = c = {c₀, c₁, ...},

where

cₖ = a₀bₖ + a₁bₖ₋₁ + ... + aₖb₀

for all k. In this case cₖ = 0 for all k > m + n, and P is also closed with respect to the operation "·". Now let us define

0 = {0, 0, ...}.
Then 0 ∈ P and {P; +} is clearly an abelian group with identity 0. Next, define

e = {1, 0, 0, ...}.

Then e ∈ P and {P; ·} is obviously a monoid with e as its identity element. We can now easily prove the following:

2.3.1. Theorem. The mathematical system {P; +, ·} is a commutative ring with identity. It is called the ring of polynomials over the field F.

2.3.2. Exercise. Prove Theorem 2.3.1.

Let us next complete the connection between our abstract characterization of polynomials and the function f(t) we originally introduced. To this end we let

t⁰ = {1, 0, 0, ...},
t¹ = {0, 1, 0, 0, ...},
t² = {0, 0, 1, 0, ...},
t³ = {0, 0, 0, 1, 0, ...},

and so forth. At this point we still cannot give meaning to aᵢtⁱ, because aᵢ ∈ F and tⁱ ∈ P. However, if we make the obvious identification {aᵢ, 0, 0, ...} ∈ P, and if we denote this element simply by aᵢ ∈ P, then we have

f(t) = a₀ · t⁰ + a₁ · t¹ + ... + aₙ · tⁿ.

Thus, we can represent f(t) uniquely by the sequence {a₀, a₁, ..., aₙ, 0, ...}. By convention, we henceforth omit the symbol "·" and write, e.g.,

f(t) = a₀ + a₁t + ... + aₙtⁿ.
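The sequence representation is directly computable. Below is a minimal Python sketch (ours, not the text's) storing a polynomial as its list of coefficients {a₀, a₁, ...} and implementing "+" and "·", the latter by the convolution cₖ = a₀bₖ + a₁bₖ₋₁ + ... + aₖb₀ used above:

    def p_add(a, b):
        n = max(len(a), len(b))
        a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
        return [x + y for x, y in zip(a, b)]

    def p_mul(a, b):
        # Convolution product: c_k = a_0*b_k + a_1*b_(k-1) + ... + a_k*b_0.
        c = [0] * (len(a) + len(b) - 1)
        for i, ai in enumerate(a):
            for j, bj in enumerate(b):
                c[i + j] += ai * bj
        return c

    t2 = [0, 0, 1]            # t^2, i.e., {0, 0, 1, 0, ...}
    t3 = [0, 0, 0, 1]         # t^3
    assert p_mul(t2, t3) == [0, 0, 0, 0, 0, 1]   # t^2 · t^3 = t^5

    f = [1, 2]                # f(t) = 1 + 2t
    g = [3, 0, 1]             # g(t) = 3 + t^2
    print(p_mul(f, g))        # [3, 6, 1, 2], i.e., 3 + 6t + t^2 + 2t^3

With this encoding, the indeterminate t is the list [0, 1], and the identification of aᵢtⁱ with the sequence having aᵢ in position i is automatic.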
We assign t appearing in the argument of f(t) a special name.

2.3.3. Definition. Let {P; +, ·} be the polynomial ring over a field F. The element t ∈ P, t = {0, 1, 0, ...}, is called the indeterminate of P.

To simplify notation, we denote by F[t] the ring of polynomials over a field F, and we identify elements of F[t] (i.e., polynomials) by making use of the argument t, e.g., f(t) ∈ F[t].

2.3.4. Definition. Let f(t) ∈ F[t], and let f(t) = {f₀, f₁, ..., fₙ, ...} ≠ 0, where fᵢ ∈ F for all i. The polynomial f(t) is said to be of order n or of degree n if fₙ ≠ 0 and if fᵢ = 0 for all i > n. In this case we write deg f(t) = n, and we call fₙ the leading coefficient of f. If fₙ = 1 and fᵢ = 0 for all i > n, then f(t) is said to be monic.

If every coefficient of a polynomial f is zero, then f ≜ 0 is called the zero polynomial. The order of the zero polynomial is not defined.

2.3.5. Theorem. Let f(t) be a polynomial of order n and let g(t) be a polynomial of order m. Then f(t)g(t) is a polynomial of order m + n.

Proof. Let f(t) = f₀ + f₁t + ... + fₙtⁿ, let g(t) = g₀ + g₁t + ... + gₘtᵐ, and let h(t) = f(t)g(t). Then

hₖ = f₀gₖ + f₁gₖ₋₁ + ... + fₖg₀.

Since fᵢ = 0 for i > n and gⱼ = 0 for j > m, the largest possible value of k such that hₖ is non-zero occurs for k = m + n; i.e.,

hₘ₊ₙ = fₙgₘ.

Since F is a field, fₙ and gₘ cannot be zero divisors, and thus hₘ₊ₙ ≠ 0. Therefore, hₘ₊ₙ ≠ 0, and hₖ = 0 for all k > m + n. ■

The reader can readily prove the next result.

2.3.6. Theorem. The ring F[t] of polynomials over a field F is an integral domain.

2.3.7. Exercise. Prove Theorem 2.3.6.

Our next result shows that, in general, we cannot go any further than integral domain for F[t].

2.3.8. Theorem. Let f(t) ∈ F[t]. Then f(t) has an inverse relative to "·" if and only if f(t) is of order zero.

Proof. Let f(t) ∈ F[t] be of order n, and assume that f(t) has an inverse relative to "·", denoted by f⁻¹(t), which is of order m. Then

f(t)f⁻¹(t) = e,

where e = {1, 0, 0, ...} is of order zero. By Theorem 2.3.5 the degree of f(t)f⁻¹(t) is m + n. Thus, m + n = 0, and since m ≥ 0 and n ≥ 0, we must have m = n = 0.
Conversely, let f(t) = f₀ = {f₀, 0, 0, ...}, where f₀ ≠ 0. Then f⁻¹(t) = f₀⁻¹ = {f₀⁻¹, 0, 0, ...}. ■

In the case of polynomials of order zero we omit the notation t, and we say f(t) is a scalar. Thus, if c(t) is a polynomial of order zero, we have c(t) = c, where c ≠ 0. We see immediately that cf(t) = cf₀ + cf₁t + ... + cfₙtⁿ for all f(t) ∈ F[t].
The following result, which we will require in Chapter 4, is sometimes called the division algorithm.

2.3.9. Theorem. Let f(t), g(t) ∈ F[t] and assume that g(t) ≠ 0. Then there exist unique elements q(t) and r(t) in F[t] such that

f(t) = q(t)g(t) + r(t),    (2.3.10)

where either r(t) = 0 or deg r(t) < deg g(t).

Proof. If f(t) = 0 or if deg f(t) < deg g(t), then Eq. (2.3.10) is satisfied with q(t) = 0 and r(t) = f(t). If deg g(t) = 0, i.e., g(t) = c, then f(t) = [c⁻¹ · f(t)] · c, and Eq. (2.3.10) holds with q(t) = c⁻¹f(t) and r(t) = 0.
Assume now that deg f(t) ≥ deg g(t) ≥ 1. The proof is by induction on the degree of the polynomial f(t). Thus, let us assume that Eq. (2.3.10) holds for deg f(t) = n. We first prove our assertion for n = 1 and then for n + 1.
Assume that deg f(t) = 1, i.e., f(t) = a₀ + a₁t, where a₁ ≠ 0. We need only consider the case g(t) = b₀ + b₁t, where b₁ ≠ 0. We readily see that Eq. (2.3.10) is satisfied with q(t) = a₁b₁⁻¹ and r(t) = a₀ − a₁b₁⁻¹b₀.
Now assume that Eq. (2.3.10) holds for deg f(t) = k, where k = 1, ..., n. We want to show that this implies the validity of Eq. (2.3.10) for deg f(t) = n + 1. Let

f(t) = a₀ + a₁t + ... + aₙ₊₁tⁿ⁺¹,

where aₙ₊₁ ≠ 0. Let deg g(t) = m. We may assume that 0 < m ≤ n + 1. Let g(t) = b₀ + b₁t + ... + bₘtᵐ, where bₘ ≠ 0. It is now readily verified that

f(t) = bₘ⁻¹aₙ₊₁tⁿ⁺¹⁻ᵐg(t) + [f(t) − bₘ⁻¹aₙ₊₁tⁿ⁺¹⁻ᵐg(t)].    (2.3.11)

Now let h(t) = f(t) − bₘ⁻¹aₙ₊₁tⁿ⁺¹⁻ᵐg(t). It can readily be verified that the coefficient of tⁿ⁺¹ in h(t) is 0. Hence, either h(t) = 0 or deg h(t) < n + 1. By our induction hypothesis, this implies there exist polynomials s(t) and r(t) such that h(t) = s(t)g(t) + r(t), where r(t) = 0 or deg r(t) < deg g(t). Substituting the expression for h(t) into Eq. (2.3.11), we have

f(t) = [bₘ⁻¹aₙ₊₁tⁿ⁺¹⁻ᵐ + s(t)]g(t) + r(t).

Thus, Eq. (2.3.10) is satisfied, and the proof of the existence of r(t) and q(t) is complete.
The proof of the uniqueness of q(t) and r(t) is left as an exercise. ■

2.3.12. Exercise. Prove that q(t) and r(t) in Theorem 2.3.9 are unique.

The preceding result motivates the following definition.

2.3.13. Definition. Let f(t) and g(t) be any non-zero polynomials. Let q(t) and r(t) be the unique polynomials such that f(t) = q(t)g(t) + r(t), where either r(t) = 0 or deg r(t) < deg g(t). We call q(t) the quotient and r(t) the remainder in the division of f(t) by g(t). If r(t) = 0, we say that g(t) divides f(t) or is a factor of f(t).
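The existence proof of Theorem 2.3.9 is constructive: it repeatedly subtracts bₘ⁻¹aₙ₊₁tⁿ⁺¹⁻ᵐg(t) to cancel the leading term, which is precisely polynomial long division. The Python sketch below (our own, with coefficients listed lowest degree first and exact rational arithmetic) computes the quotient and remainder of Eq. (2.3.10):

    from fractions import Fraction as Fr

    def poly_divmod(f, g):
        """Return (q, r) with f = q·g + r and deg r < deg g; coefficients are
        listed lowest degree first, e.g., [a0, a1, a2] is a0 + a1·t + a2·t^2."""
        f, g = [Fr(c) for c in f], [Fr(c) for c in g]
        q = [Fr(0)] * max(len(f) - len(g) + 1, 1)
        r = f[:]
        while len(r) >= len(g) and any(r):
            shift = len(r) - len(g)              # the exponent n+1−m of the proof
            coef = r[-1] / g[-1]                 # b_m^(-1) times the leading coefficient
            q[shift] = coef
            # r := r − coef · t^shift · g, which cancels the leading term of r.
            for i, gc in enumerate(g):
                r[shift + i] -= coef * gc
            while r and r[-1] == 0:              # drop the cancelled leading zeros
                r.pop()
        return q, r

    # Divide f(t) = 2 + t^3 by g(t) = 1 + t:
    q, r = poly_divmod([2, 0, 0, 1], [1, 1])
    print(q, r)   # q(t) = 1 − t + t^2 and r(t) = 1, so f = (1 − t + t^2)(1 + t) + 1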

Next, we prove:

2.3.14. Theorem. Let F[t] denote the ring of polynomials over a field F. Let f(t) and g(t) be non-zero polynomials in F[t]. Then there exists a unique monic polynomial, d(t), such that (i) d(t) divides f(t) and g(t), and (ii) if d′(t) is any polynomial which divides f(t) and g(t), then d′(t) divides d(t).

Proof. Let

K[t] = {x(t) ∈ F[t]: x(t) = m(t)f(t) + n(t)g(t), where m(t), n(t) ∈ F[t]}.

We note that f(t), g(t) ∈ K[t]. Furthermore, if a(t), b(t) ∈ K[t], then a(t) − b(t) ∈ K[t] and a(t)b(t) ∈ K[t]. Also, if c is a scalar, then ca(t) ∈ K[t] for all a(t) ∈ K[t]. Now let d(t) be a polynomial of lowest degree in K[t]. Since all scalar multiples of d(t) belong to K[t], we may assume that d(t) is monic. We now show that for any h(t) ∈ K[t], there is a q(t) ∈ F[t] such that h(t) = d(t)q(t). To prove this, we know from Theorem 2.3.9 that there exist unique elements q(t) and r(t) in F[t] such that h(t) = q(t)d(t) + r(t), where either r(t) = 0 or deg r(t) < deg d(t). Since d(t) ∈ K[t] and q(t) ∈ F[t], it follows that q(t)d(t) ∈ K[t]. Also, since h(t) ∈ K[t], it follows that r(t) = h(t) − q(t)d(t) ∈ K[t]. Since d(t) is a polynomial of smallest degree in K[t], it follows that r(t) = 0. Hence, d(t) divides every polynomial in K[t].
To show that d(t) is unique, suppose d₁(t) is another monic polynomial in K[t] which divides every polynomial in K[t]. Then d(t) = a(t)d₁(t) and d₁(t) = b(t)d(t) for some a(t), b(t) ∈ F[t]. It can readily be verified that this is true only when a(t) = b(t) = 1. Now, since f(t), g(t) ∈ K[t], part (i) of the theorem has been proven.
To prove part (ii), let a(t), b(t) ∈ F[t] be such that f(t) = a(t)d′(t) and g(t) = b(t)d′(t). Since d(t) ∈ K[t], there exist polynomials m(t), n(t) such that d(t) = m(t)f(t) + n(t)g(t). Hence,

d(t) = m(t)a(t)d′(t) + n(t)b(t)d′(t) = [m(t)a(t) + n(t)b(t)]d′(t).

This implies that d′(t) divides d(t) and completes the proof of the theorem. ■

The polynomial d(t) in the preceding theorem is called the greatest common divisor of f(t) and g(t). If d(t) = 1, then f(t) and g(t) are said to be relatively prime.

2.3.15. Exercise. Show that if d(t) is the greatest common divisor of f(t) and g(t), then there exist polynomials m(t) and n(t) such that

d(t) = m(t)f(t) + n(t)g(t).

If f(t) and g(t) are relatively prime, then

1 = m(t)f(t) + n(t)g(t).
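One standard way to compute the d(t) of Theorem 2.3.14 (not spelled out in the text) is the Euclidean algorithm: repeatedly replace the pair (f, g) by (g, r), where r is the remainder in the division of f by g, until the remainder vanishes, and normalize the last non-zero remainder to be monic. The Python sketch below (ours) reuses the poly_divmod routine from the previous sketch:

    def poly_gcd(f, g):
        """Greatest common divisor of two non-zero polynomials, returned monic.
        Uses poly_divmod from the previous sketch; coefficients are listed
        lowest degree first."""
        while any(g):
            _, r = poly_divmod(f, g)
            f, g = g, r if r else [0]
        return [c / f[-1] for c in f]        # normalize the leading coefficient to 1

    # f(t) = (t − 1)(t + 2) = −2 + t + t^2 and g(t) = (t − 1)(t + 3) = −3 + 2t + t^2:
    print(poly_gcd([-2, 1, 1], [-3, 2, 1]))  # coefficients of d(t) = t − 1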

Now let f(t) ∈ F[t] be of positive degree. If f(t) = g(t)h(t) implies that either g(t) is a scalar or h(t) is a scalar, then f(t) is said to be irreducible.
We close the present section with a statement of the fundamental theorem of algebra.

2.3.16. Theorem. Let f(t) ∈ F[t] be a non-zero polynomial. Let R denote the field of real numbers and let C denote the field of complex numbers.
(i) If F = C, then f(t) can be written uniquely, except for order, as a product

f(t) = c(t − c₁)(t − c₂) ... (t − cₙ),

where c, c₁, ..., cₙ ∈ C.
(ii) If F = R, then f(t) can be written uniquely, except for order, as a product

f(t) = cf₁(t)f₂(t) ... fₘ(t),

where c ∈ R and the f₁(t), ..., fₘ(t) are monic irreducible polynomials of degree one or two.

2.4. REFERENCES AND NOTES

There are many excellent texts on abstract algebra. For an introductory exposition of this subject refer, e.g., to Birkhoff and MacLane [2.1], Hanneken [2.2], Hu [2.3], Jacobson [2.4], and McCoy [2.6]. The books by Birkhoff and MacLane and by Jacobson are standard references. The texts by Hu and McCoy are very readable. The excellent presentation by Hanneken is concise, somewhat abstract, yet very readable. Polynomials over a field are treated extensively in these references. For a brief summary of the properties of polynomials over a field, refer also to Lipschutz [2.5].

REFERENCES

[2.1] G. BIRKHOFF and S. MACLANE, A Survey of Modern Algebra. New York: The Macmillan Company, 1965.

[2.2] C. B. HANNEKEN, Introduction to Abstract Algebra. Belmont, Calif.: Dickenson Publishing Co., Inc., 1968.

[2.3] S. T. HU, Elements of Modern Algebra. San Francisco, Calif.: Holden-Day, Inc., 1965.

[2.4] N. JACOBSON, Lectures in Abstract Algebra. New York: D. Van Nostrand Company, Inc., 1951.

[2.5] S. LIPSCHUTZ, Linear Algebra. New York: McGraw-Hill Book Company, 1968.

[2.6] N. H. MCCOY, Fundamentals of Abstract Algebra. Boston: Allyn & Bacon, Inc., 1972.
3

VECTOR SPACES AND LINEAR TRANSFORMATIONS

In Chapter 1 we considered the set-theoretic structure of mathematical systems, and in Chapter 2 we developed to various degrees of complexity the algebraic structure of mathematical systems. One of the mathematical systems introduced in Chapter 2 was the linear or vector space, a concept of great importance in mathematics and applications.
In the present chapter we further examine properties of linear spaces. Then we consider special types of mappings defined on linear spaces, called linear transformations, and establish several important properties of linear transformations.
In the next chapter we will concern ourselves with finite-dimensional vector spaces, and we will consider matrices, which are used to represent linear transformations on finite-dimensional vector spaces.

3.1. LINEAR SPACES

We begin by restating the definition of linear space.

3.1.1. Definition. Let X be a non-empty set, let F be a field, let "+" denote a mapping of X × X into X, and let "·" denote a mapping of F × X into X. Let the members x ∈ X be called vectors, let the elements α ∈ F be called scalars, let the operation "+" defined on X be called vector addition, and let the mapping "·" be called scalar multiplication or multiplication of vectors by scalars. Then for each x, y ∈ X there is a unique element, x + y ∈ X, called the sum of x and y, and for each x ∈ X and α ∈ F there is a unique element, α · x ≜ αx ∈ X, called the multiple of x by α. We say that the non-empty set X and the field F, along with the two mappings of vector addition and scalar multiplication, constitute a vector space or a linear space if the following axioms are satisfied:

(i) x + y = y + x for every x, y ∈ X;
(ii) x + (y + z) = (x + y) + z for every x, y, z ∈ X;
(iii) there is a unique vector in X, called the zero vector or the null vector or the origin, which is denoted by 0 and which has the property that 0 + x = x for all x ∈ X;
(iv) α(x + y) = αx + αy for all α ∈ F and for all x, y ∈ X;
(v) (α + β)x = αx + βx for all α, β ∈ F and for all x ∈ X;
(vi) (αβ)x = α(βx) for all α, β ∈ F and for all x ∈ X;
(vii) 0x = 0 for all x ∈ X; and
(viii) 1x = x for all x ∈ X.

The reader may find it instructive to review the axioms of a field which
are summarized in Definition 2.1.63. In (v) the "+" on the left-hand side
denotes the operation of addition on F ; the "+"
on the right-hand side
denotes vector addition. Also, in (vi) IXP I!. IX · p, where "." denotes the
operation of mulitplication on .F In (vii) the symbol 0 on the left-hand side is
a scalar; the same symbol on the right-hand side denotes a vector. The I
on the left-hand side of (viii) is the identity element of F r elative to ".".
To indicate the relationship between the set of vectors X and the underlying
field F, we sometimes refer to a vector space X over field F. However, usually
we speak of a vector space X without making explicit reference to the field F
and to the operations of vector addition and scalar multiplication. If F is
the field of real numbers we call our vector space a real vector space. Similarly,
if F is the field of complex numbers, we speak of a complex vector space.
Throughout this chapter we will usually use lower case Latin letters (e.g.,
x, y, z) to denote vectors (i.e., elements of X) and lower case Greek letters
(e.g., α, β, γ) to denote scalars (i.e., elements of F).
If we agree to denote the element (−1)x ∈ X simply by −x, i.e., (−1)x
≜ −x, then we have x − x = 1x + (−1)x = (1 − 1)x = 0x = 0. Thus,
if X is a vector space, then for every x ∈ X there is a unique vector, denoted
−x, such that x − x = 0. There are several other elementary properties of
vector spaces which are a direct consequence of the above axioms. Some of
these are summarized below. The reader will have no difficulties in verifying
these.

3.1.2. Theorem. Let X be a vector space. If x, y, z are elements in X and
if α, β are any members of F, then the following hold:
(i) if αx = αy and α ≠ 0, then x = y;
(ii) if αx = βx and x ≠ 0, then α = β;
(iii) if x + y = x + z, then y = z;
(iv) α0 = 0;
(v) α(x − y) = αx − αy;
(vi) (α − β)x = αx − βx; and
(vii) x + y = 0 implies that x = −y.

3.1.3. Exercise. Prove Theorem 3.1.2.

We now consider several important examples of vector spaces.

3.1.4. Example. Let X be the set of all "arrows" in the "plane" emanating
from a reference point which we call the origin or the zero vector or the null
vector, and which we denote by 0. Let F denote the set of real numbers, and
let vector addition and scalar multiplication be defined in the usual way,
as shown in Figure A.

3.1.5. Figure A. Vectors x and y, their sum x + y, and the scalar multiples
αy (0 < α < 1), βy (β > 1), and γy (γ < 0).

The reader can readily verify that, for the space described above, all the
axioms of a linear space are satisfied, and hence X is a vector space. _

The purpose of the above example is to provide an intuitive idea of a
linear space. We will utilize this space occasionally for purposes of motivation
in our development. We must point out, however, that the terms "plane" and
"arrows" were not formally defined, and thus the space X was not really
properly defined. In the examples which follow, we give a more precise for-
mulation of vector spaces.

3.1.6. Example. Let X = R denote the set of real numbers, and let F also
denote the set of real numbers. We define vector addition to be the usual
addition of real numbers and multiplication of vectors x ∈ R by scalars
α ∈ F to be multiplication of real numbers. It is a simple matter to show that
this space is a linear space. ■

3.1.7. Example. Let X = F^n denote the set of all ordered n-tuples of
elements from field F. Thus, if x ∈ F^n, then x = (ξ_1, ξ_2, ..., ξ_n), where
ξ_i ∈ F, i = 1, ..., n. With x, y ∈ F^n and α ∈ F, let vector addition and
scalar multiplication be defined as

x + y = (ξ_1, ξ_2, ..., ξ_n) + (η_1, η_2, ..., η_n)
      ≜ (ξ_1 + η_1, ξ_2 + η_2, ..., ξ_n + η_n)   (3.1.8)
and
αx = α(ξ_1, ξ_2, ..., ξ_n) ≜ (αξ_1, αξ_2, ..., αξ_n).   (3.1.9)

It should be noted that the symbol "+" on the right-hand side of Eq. (3.1.8)
denotes addition on the field F, and the symbol "+" on the left-hand side of
Eq. (3.1.8) designates vector addition. (See Theorem 2.1.88.)
In the present case the null vector is defined as 0 = (0, 0, ..., 0) and the
vector −x is defined by −x = −(ξ_1, ξ_2, ..., ξ_n) = (−ξ_1, −ξ_2, ..., −ξ_n).
Utilizing the properties of the field F, all axioms of Definition 3.1.1 are
readily verified, and F^n is thus a vector space. We call this space the space
F^n of n-tuples of elements of F. ■
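The componentwise rules (3.1.8) and (3.1.9) are easily realized computationally. The following Python sketch (our illustration, not part of the development) represents vectors in R^n as tuples; the names vec_add and scalar_mul are our own labels for the two mappings of Definition 3.1.1.

# A sketch of the space F^n of Example 3.1.7 with F = R; vec_add and
# scalar_mul realize Eqs. (3.1.8) and (3.1.9) componentwise.

def vec_add(x, y):
    # x + y = (xi_1 + eta_1, ..., xi_n + eta_n), Eq. (3.1.8)
    assert len(x) == len(y)
    return tuple(xi + eta for xi, eta in zip(x, y))

def scalar_mul(alpha, x):
    # alpha x = (alpha xi_1, ..., alpha xi_n), Eq. (3.1.9)
    return tuple(alpha * xi for xi in x)

x, y = (1.0, 2.0, 3.0), (4.0, 5.0, 6.0)
zero = (0.0, 0.0, 0.0)                    # the null vector of axiom (iii)
assert vec_add(x, y) == vec_add(y, x)     # axiom (i)
assert vec_add(zero, x) == x              # axiom (iii)
assert scalar_mul(1.0, x) == x            # axiom (viii)

Similar one-line checks can be written for the remaining axioms of Definition 3.1.1.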

3.1.10. Example. In Example 3.1.7 let F = R, the field of real numbers.
Then X = R^n denotes the set of all n-tuples of real numbers. We call the
vector space R^n the n-dimensional real coordinate space. Similarly, in Example
3.1.7 let F = C, the field of complex numbers. Then X = C^n designates the
set of all n-tuples of complex numbers. The linear space C^n is called the n-
dimensional complex coordinate space. ■

In the previous example we used the term dimension. At a later point in


the present chapter the concept of dimension will be defined precisely and
some of its properties will be examined in detail.

3.1.11. Example. Let X denote the set of all infinite sequences of real
numbers of the form
x = (ξ_1, ξ_2, ..., ξ_k, ...),   (3.1.12)
let F denote the field of real numbers, let vector addition be defined similarly
as in Eq. (3.1.8), and let scalar multiplication be defined similarly as in Eq.
(3.1.9). It is again an easy matter to show that this space is a vector space.
We point out that this space, which we denote by R^∞, is simply the collection

of all infinite sequences; i.e., there is no requirement that any type of conver-
gence of the sequence be implied. ■

3.1.13. Example. Let X = C^∞ denote the set of all infinite sequences of
complex numbers of the form (3.1.12), let F represent the field of complex
numbers, let vector addition be defined similarly as in Eq. (3.1.8), and let
scalar multiplication be defined similarly as in Eq. (3.1.9). Then C^∞ is a
vector space. ■

3.1.14. Example. Let X denote the set of all sequences of real numbers
having only a finite number of non-zero terms. Thus, if x ∈ X, then
x = (ξ_1, ξ_2, ..., ξ_l, 0, 0, ...)   (3.1.15)
for some positive integer l. If we define vector addition similarly as in Eq.
(3.1.8), if we define scalar multiplication similarly as in Eq. (3.1.9), and if we
let F be the field of real numbers, then we can readily show that X is a real
vector space. We call this space the space of finitely non-zero sequences.
If X denotes the set of all sequences of complex numbers of the form
(3.1.15), if vector addition and scalar multiplication are defined similarly as
in equations (3.1.8) and (3.1.9), respectively, then X is again a vector space
(a complex vector space). ■

3.1.16. Example. Let X be the set of infinite sequences of real numbers
of the form (3.1.12), with the property that lim_{k→∞} ξ_k = 0. If F is the field of real
numbers, if vector addition is defined similarly as in Eq. (3.1.8), and if scalar
multiplication is defined similarly as in Eq. (3.1.9), then X is a vector space.
This is so because the sum of two sequences which converge to zero also
converges to zero, and because the scalar multiple of a sequence converging
to zero also converges to zero. ■

3.1.17. Example. Let X be the set of infinite sequences of real numbers
of the form (3.1.12) which are bounded. If vector addition and scalar multi-
plication are again defined similarly as in (3.1.8) and (3.1.9), respectively,
and if F denotes the field of real numbers, then X is a vector space. This
space is called the space of bounded real sequences.
There also exists, of course, a complex counterpart to this space, the
space of bounded complex sequences. ■

3.1.18. Example. Let X denote the set of infinite sequences of real numbers
of the form (3.1.12), with the property that ∑_{i=1}^∞ |ξ_i| < ∞. Let F be the field
of real numbers, let vector addition be defined similarly as in (3.1.8), and let
scalar multiplication be defined similarly as in Eq. (3.1.9). Then X is a vector
space. ■

3.1.19. Example. Let X be the set of all real-valued continuous functions
defined on the interval [a, b]. Thus, if x ∈ X, then x: [a, b] → R is a real,
continuous function defined for all a ≤ t ≤ b. We note that x = y if and only
if x(t) = y(t) for all t ∈ [a, b], and that the null vector is the function which
is zero for all t ∈ [a, b]. Let F denote the field of real numbers, let α ∈ F,
and let vector addition and scalar multiplication be defined pointwise by
(x + y)(t) = x(t) + y(t) for all t ∈ [a, b]   (3.1.20)
and
(αx)(t) = αx(t) for all t ∈ [a, b].   (3.1.21)
Then clearly x + y ∈ X whenever x, y ∈ X, αx ∈ X whenever α ∈ F and
x ∈ X, and all the axioms of a vector space are satisfied. We call this vector
space the space of real-valued continuous functions on [a, b] and we denote it
by C[a, b]. ■
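Since the operations (3.1.20) and (3.1.21) are defined pointwise, functions can be added and scaled computationally without sampling them on a grid. The following Python sketch is ours; add_fn and scale_fn are hypothetical names for the two operations, realized as closures.

import math

def add_fn(x, y):
    # (x + y)(t) = x(t) + y(t), Eq. (3.1.20)
    return lambda t: x(t) + y(t)

def scale_fn(alpha, x):
    # (alpha x)(t) = alpha x(t), Eq. (3.1.21)
    return lambda t: alpha * x(t)

s = add_fn(math.sin, math.cos)
assert abs(s(0.5) - (math.sin(0.5) + math.cos(0.5))) < 1e-12
assert abs(scale_fn(3.0, math.sin)(0.5) - 3.0 * math.sin(0.5)) < 1e-12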

3.1.22. Example. Let X be the set of all real-valued functions defined on
the interval [a, b] such that

∫_a^b |x(t)| dt < ∞,

where integration is taken in the Riemann sense. Let F denote the field of
real numbers, and let vector addition and scalar multiplication be defined as
in equations (3.1.20) and (3.1.21), respectively. We can readily verify that X
is a vector space. ■

3.1.23. Example. Let X denote the set of all real-valued polynomials
defined on the interval [a, b], let F be the field of real numbers, and let vector
addition and scalar multiplication be defined as in equations (3.1.20) and
(3.1.21), respectively. We note that the null vector is the function which is
zero for all t ∈ [a, b], and also, if x(t) is a polynomial, then so is −x(t).
Furthermore, we observe that the sum of two polynomials is again a poly-
nomial, and that a scalar multiple of a polynomial is also a polynomial.
We can now readily verify that X is a linear space. ■

3.1.24. Example. Let X denote the set of real numbers between −a < 0
and +a > 0; i.e., if x ∈ X then x ∈ [−a, a]. Let F be the field of real
numbers. Let vector addition and scalar multiplication be as defined in
Example 3.1.6. Now, if α ∈ F is such that α > 1, then αa > a and αa ∉ X.
From this it follows that X is not a vector space. ■

Vector spaces such as those encountered in Examples 3.1.19, 3.1.22,
and 3.1.23 are called function spaces. In Chapter 6 we will consider some addi-
tional linear spaces.
tional linear spaces.

3.1.25. Exercise. Verify the assertions made in Examples 3.1.6, 3.1.7,
3.1.10, 3.1.11, 3.1.13, 3.1.14, 3.1.16, 3.1.17, 3.1.18, 3.1.19, 3.1.22, and 3.1.23.

3.2. LINEAR SUBSPACES AND DIRECT SUMS

We first introduce the notion of linear subspace. (See also Definition


2.1.102.)

3.2.1. Definition. A non-empty subset Y of a vector space X is called a
linear manifold or a linear subspace in X if (i) x + y is in Y whenever x and y
are in Y, and (ii) αx is in Y whenever α ∈ F and x ∈ Y.

It is an easy matter to verify that a linear manifold Y satisfies all the


axioms of a vector space and may as such be regarded as a linear space itself.

3.2.2. Example. The set consisting of the null vector 0 is a linear subspace;
i.e., the set Y = {0} is a linear subspace. Also, the vector space X is a linear
subspace of itself. If a linear subspace Y is not all of X, then we say that Y
is a proper subspace of X. ■

3.2.3. Example. The set of all real-valued polynomials defined on the
interval [a, b] (see Example 3.1.23) is a linear subspace of the vector space
consisting of all real-valued continuous functions defined on the interval
[a, b] (see Example 3.1.19). ■

Concerning linear subspaces we now state and prove the following result.

3.2.4. Theorem. Let Y and Z be linear subspaces of a vector space X.
The intersection of Y and Z, Y ∩ Z, is also a linear subspace of X.
Proof. Since Y and Z are linear subspaces, it follows that 0 ∈ Y and 0 ∈ Z,
and thus 0 ∈ Y ∩ Z. Hence, Y ∩ Z is non-empty. Now let α, β ∈ F, let
x, y ∈ Y, and let x, y ∈ Z. Then αx + βy ∈ Y and also αx + βy ∈ Z,
because Y and Z are both linear subspaces. Hence, αx + βy ∈ Y ∩ Z and
Y ∩ Z is a linear subspace of X. ■

We can extend the above theorem to a more general result.

3.2.5. Theorem. Let X be a vector space and let X_i be a linear subspace
of X for every i ∈ I, where I denotes some index set. Then ∩_{i∈I} X_i is a linear
subspace of X.

3.2.6. Exercise. Prove Theorem 3.2.5.



Now consider in the vector space of Example 3.1.4 the subsets Y and Z
consisting of two lines intersecting at the origin 0, as shown in Figure B.
Clearly, Y and Z are linear subspaces of the vector space X. On the other
hand, the union of Y and Z, Y ∪ Z, obviously does not contain arbitrary
sums αy + βz, where α, β ∈ F and y ∈ Y and z ∈ Z. From this it follows
that if Y and Z are linear subspaces then, in general, the union Y ∪ Z is
not a linear subspace of X.

3.2.7. Figure B

3.2.8. Definition. Let X be a linear space, and let Y and Z be arbitrary
subsets of X. The sum of sets Y and Z, denoted by Y + Z, is the set of all
vectors in X which are of the form y + z, where y ∈ Y and z ∈ Z.

The above concept is depicted pictorially in Figure C by utilizing the


vector space of Example 3.1.4. With the aid of our next result we can generate
various linear subspaces.

3.2.9. Figure C. Sum of Two Subsets.



3.2.10. Theorem. Let Y and Z be linear subspaces of a vector space X.
Then their sum, Y + Z, is also a linear subspace of X.

3.2.11. Exercise. Prove Theorem 3.2.10.

Now let Y and Z be linear subspaces of a vector space X. If Y ∩ Z = {0},
we say that the spaces Y and Z are disjoint. We emphasize that this termi-
nology is not consistent with that used in connection with sets. We now
have:

3.2.12. Theorem. Let Y and Z be linear subspaces of a vector space X.
Then for every x ∈ Y + Z there exist unique elements y ∈ Y and z ∈ Z
such that x = y + z if and only if Y ∩ Z = {0}.
Proof. Let x ∈ Y + Z be such that x = y_1 + z_1 = y_2 + z_2, where y_1,
y_2 ∈ Y and where z_1, z_2 ∈ Z. Then clearly y_1 − y_2 = z_2 − z_1. Now y_1 − y_2
∈ Y and z_2 − z_1 ∈ Z, and since by assumption Y ∩ Z = {0}, it follows that
y_1 − y_2 = 0 and z_2 − z_1 = 0, i.e., y_1 = y_2 and z_1 = z_2. Thus, every x ∈ Y + Z
has a unique representation x = y + z, where y ∈ Y and z ∈ Z, provided
that Y ∩ Z = {0}.
Conversely, let us assume that for each x = y + z ∈ Y + Z the y ∈ Y
and the z ∈ Z are uniquely determined. Let us further assume that the linear
subspaces Y and Z are not disjoint. Then there exists a non-zero vector
v ∈ Y ∩ Z. In this case we can write x = y + z = y + αv + z − αv = (y
+ αv) + (z − αv) for all α ∈ F. But this implies that y and z are not unique,
which is a contradiction to our hypothesis. Hence, the spaces Y and Z must
be disjoint. ■
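Theorem 3.2.12 can be illustrated numerically. In the sketch below (ours, not part of the text), Y and Z are the lines in R^2 spanned by two linearly independent vectors u and w, so that Y ∩ Z = {0}; the unique components y and z of x are obtained by solving a 2 × 2 linear system.

import numpy as np

u = np.array([1.0, 0.0])                 # spans the line Y
w = np.array([1.0, 1.0])                 # spans the line Z; u, w independent
x = np.array([3.0, 2.0])

# Solve [u w][a b]^T = x for the unique coefficients a, b.
a, b = np.linalg.solve(np.column_stack([u, w]), x)
y, z = a * u, b * w                      # y in Y, z in Z
assert np.allclose(y + z, x)             # the unique decomposition x = y + z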

Theorem 3.2.10 is readily extended to any number of linear subspaces
of X. Specifically, if X_1, ..., X_r are linear subspaces of X, then X_1 + ...
+ X_r is also a linear subspace of X. This enables us to introduce the fol-
lowing:

3.2.13. Definition. Let X_1, ..., X_r be linear subspaces of the vector
space X. The sum X_1 + ... + X_r is said to be a direct sum if for each
x ∈ X_1 + ... + X_r there is a unique set of x_i ∈ X_i, i = 1, ..., r, such that
x = x_1 + ... + x_r. We denote the direct sum of X_1, ..., X_r by X_1 ⊕ ...
⊕ X_r.
There is a connection between the Cartesian product of two vector spaces
and their direct sum. Let Y and Z be two arbitrary linear spaces over the same
field F and let V = Y × Z. Thus, if v ∈ V, then v is the ordered pair
v = (y, z),

where y ∈ Y and z ∈ Z. Now let us define vector addition as

(y_1, z_1) + (y_2, z_2) = (y_1 + y_2, z_1 + z_2)   (3.2.14)

and scalar multiplication as

α(y, z) = (αy, αz),   (3.2.15)

where (y_1, z_1), (y_2, z_2) ∈ V = Y × Z and where α ∈ F. Noting that for each
vector (y, z) ∈ V there is a vector −(y, z) = (−y, −z) ∈ V, and observing
that (0, 0) = (y, z) − (y, z) for all elements in V, it is an easy matter to show
that the space V = Y × Z is a linear space. We note that Y is not a linear
subspace of V, because, in fact, it is not even a subset of V. However, if we
let
Y′ = {(y, 0): y ∈ Y}
and
Z′ = {(0, z): z ∈ Z},

then Y′ and Z′ are linear subspaces of V and V = Y′ ⊕ Z′. By abuse of
notation, we frequently express this simply as V = Y ⊕ Z.
Once more, making use of Example 3.1.4, let Y and Z denote two lines
intersecting at the origin 0, as shown in Figure D. The direct sum of linear
subspaces Y and Z is in this case the "entire plane."

3.2.16. Figure D

In order that a subset be a linear subspace of a vector space, it is necessary
that this subset contain the null vector. Thus, in Figure D, the lines Y and Z
passing through the origin 0 are linear subspaces of the plane (see Example
3.1.4). In many applications this requirement is too restrictive and a general-
ization is called for. We have:

3.2.17. Definition. Let Y be a linear subspace of a vector space X, and let
x be a fixed vector in X. We call the translation

Z = x + Y ≜ {z ∈ X: z = x + y, y ∈ Y}
a linear variety or a flat or an affine linear subspace of X.

In Figure E, an example of a linear variety is given for the vector space of
Example 3.1.4.

3.2.18. Figure E

3.3. LINEAR INDEPENDENCE, BASES,
AND DIMENSION

Throughout the remainder of this chapter and in the following chapter we
use the following notation: {α_1, ..., α_n}, α_i ∈ F, denotes an indexed set of
scalars, and {x_1, ..., x_n}, x_i ∈ X, denotes an indexed set of vectors.
Before introducing the notions of linear dependence and independence
of a set of vectors in a linear space X, we first consider the following.

3.3.1. Definition. Let Y be a set in a linear space X (Y may be a finite set
or an infinite set). We say that a vector x ∈ X is a finite linear combination
of vectors in Y if there is a finite set of elements {y_1, y_2, ..., y_n} in Y and a
finite set of scalars {α_1, α_2, ..., α_n} in F such that
x = α_1 y_1 + α_2 y_2 + ... + α_n y_n.   (3.3.2)
In Eq. (3.3.2) vector addition has been extended in an obvious way from
the case of two vectors to the case of n vectors. In later chapters we will
consider linear combinations which are not necessarily finite. The represen-

tation of x in Eq. (3.3.2) is, of course, not necessarily unique. Thus, in the
case of Example 3.1.10, if X = R^2 and if x = (1, 1), then x can be represented
as
x = α_1 y_1 + α_2 y_2 = 1(1, 0) + 1(0, 1)
or as
x = β_1 z_1 + β_2 z_2 = 2(1/2, 0) + 3(0, 1/3),
etc. This situation is depicted in Figure F.

3.3.3. Figure F. The vector x = (1, 1) expressed as 1(1, 0) + 1(0, 1) in
terms of y_1 = (1, 0), y_2 = (0, 1), and as 2(1/2, 0) + 3(0, 1/3) in terms of
z_1 = (1/2, 0), z_2 = (0, 1/3).

3.3.4. Theorem. Let Y be a non-empty subset of a linear space X. Let
V(Y) be the set of all finite linear combinations of the vectors from Y; i.e.,
y ∈ V(Y) if and only if there is some set of scalars {α_1, ..., α_m} and some
finite subset {y_1, ..., y_m} of Y such that
y = α_1 y_1 + α_2 y_2 + ... + α_m y_m,
where m may be any positive integer. Then V(Y) is a linear subspace of X.

3.3.5. Exercise. Prove Theorem 3.3.4.

Our previous result motivates the following concepts.

3.3.6. Definition. We say the linear space V(Y) in Theorem 3.3.4 is the
linear subspace generated by the set Y.

3.3.7. Definition. Let Z be a linear subspace of a vector space X. If there
exists a set of vectors Y ⊂ X such that the linear space V(Y) generated by
Y is Z, then we say Y spans Z.

If, in particular, the space of Example 3.1.4 is considered and if V and W
are linear subspaces of X as depicted in Figure G, then the set Y = {e_1}
spans W, the set Z = {e_2} spans V, and the set M = {e_1, e_2} spans the vector
space X. The set N = {e_1, e_2, e_3} also spans the vector space X.

3.3.8. Figure G. V and W are Lines Intersecting at Origin 0.

3.3.9. Exercise. Show that V(Y) is the smallest linear subspace of a vector
space X containing the subset Y of X. Specifically, show that if Z is a linear
subspace of X and if Z contains Y, then Z also contains V(Y).

And now the important notion of linear dependence.

3.3.10. Definition. Let {x_1, x_2, ..., x_m} be a finite non-empty set in a
linear space X. If there exist scalars α_1, ..., α_m ∈ F, not all zero, such that
α_1 x_1 + ... + α_m x_m = 0   (3.3.11)
then the set {x_1, x_2, ..., x_m} is said to be linearly dependent. If a set is not
linearly dependent, then it is said to be linearly independent. In this case the
relation (3.3.11) implies that α_1 = α_2 = ... = α_m = 0. An infinite set of
vectors Y in X is said to be linearly independent if every finite subset of Y
is linearly independent.

Note that the null vector cannot be contained in a set which is linearly
independent. Also, if a set of vectors contains a linearly dependent subset,
then the whole set is linearly dependent.
If X denotes the space of Example 3.1.4, the set of vectors {y, z} in Figure
H is linearly independent, while the set of vectors {u, v} is linearly dependent.

3.3.12. Figure H. Linearly Independent and Linearly Dependent Vec-
tors.

3.3.13. Exercise. Let X = C[a, b], the set of all real-valued continuous
functions on [a, b], where b > a. As we saw in Example 3.1.19, this set forms

a vector space. Let n be a fixed positive integer, and let us define x_i ∈ X for
i = 0, 1, 2, ..., n, as follows. For all t ∈ [a, b], let
x_0(t) = 1
and
x_i(t) = t^i, i = 1, ..., n.
Let Y = {x_0, x_1, ..., x_n}. Then V(Y) is the set of all polynomials on [a, b]
of degree less than or equal to n.
(a) Show that Y is a linearly independent set in X.
(b) Let X_i = {x_i}, i = 0, 1, ..., n; i.e., each X_i is a singleton subset
of X. Show that
V(Y) = V(X_0) ⊕ V(X_1) ⊕ ... ⊕ V(X_n).
(c) Let z_0(t) = 1 for all t ∈ [a, b] and let
z_k(t) = 1 + t + ... + t^k
for all t ∈ [a, b] and k = 1, ..., n. Show that Z = {z_0, z_1, ..., z_n}
is a linearly independent set in V(Y).
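Part (a) of the above exercise can be checked numerically. Since a polynomial of degree at most n which vanishes at n + 1 distinct points must be the zero polynomial, the set {x_0, x_1, ..., x_n} is linearly independent exactly when the matrix of samples t_j^i at n + 1 distinct points t_j is nonsingular. The following Python sketch (ours) verifies this, together with the analogous statement for the set Z of part (c).

import numpy as np

n = 4
t = np.linspace(0.0, 1.0, n + 1)                   # n + 1 distinct points
V = np.column_stack([t**i for i in range(n + 1)])  # column i samples x_i(t) = t^i
assert np.linalg.matrix_rank(V) == n + 1           # full rank: Y is independent

# The set z_k(t) = 1 + t + ... + t^k of part (c) is likewise independent,
# since its sample matrix is V times an invertible triangular matrix.
Z = np.column_stack([V[:, :k + 1].sum(axis=1) for k in range(n + 1)])
assert np.linalg.matrix_rank(Z) == n + 1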

3.3.14. Theorem. Let {x_1, x_2, ..., x_m} be a linearly independent set in
a vector space X. If ∑_{i=1}^m α_i x_i = ∑_{i=1}^m β_i x_i, then α_i = β_i for all i = 1, 2, ..., m.
Proof. If ∑_{i=1}^m α_i x_i = ∑_{i=1}^m β_i x_i, then ∑_{i=1}^m (α_i − β_i)x_i = 0. Since the set {x_1, ...,
x_m} is linearly independent, we have (α_i − β_i) = 0 for all i = 1, ..., m.
Therefore α_i = β_i for all i. ■

The next result provides us with an alternate way of defining linear


dependence.

3.3.15. Theorem. A set of vectors {x_1, x_2, ..., x_m} in a linear space X
is linearly dependent if and only if for some index i, 1 ≤ i ≤ m, we can find
scalars α_1, ..., α_{i−1}, α_{i+1}, ..., α_m such that
x_i = α_1 x_1 + ... + α_{i−1} x_{i−1} + α_{i+1} x_{i+1} + ... + α_m x_m.   (3.3.16)
Proof. Assume that Eq. (3.3.16) is satisfied. Then
α_1 x_1 + ... + α_{i−1} x_{i−1} + (−1)x_i + α_{i+1} x_{i+1} + ... + α_m x_m = 0.
Thus, α_i = −1 ≠ 0 is a non-trivial choice of coefficient for which Eq.
(3.3.11) holds, and therefore the set {x_1, x_2, ..., x_m} is linearly dependent.
Conversely, assume that the set {x_1, x_2, ..., x_m} is linearly dependent.
Then there exist coefficients α_1, ..., α_m which are not all zero, such that
α_1 x_1 + α_2 x_2 + ... + α_m x_m = 0.   (3.3.17)
Suppose that index i is chosen such that α_i ≠ 0. Rearranging Eq. (3.3.17) to

−α_i x_i = α_1 x_1 + ... + α_{i−1} x_{i−1} + α_{i+1} x_{i+1} + ... + α_m x_m,   (3.3.18)

and multiplying both sides of Eq. (3.3.18) by −1/α_i, we obtain
x_i = β_1 x_1 + β_2 x_2 + ... + β_{i−1} x_{i−1} + β_{i+1} x_{i+1} + ... + β_m x_m,
where β_k = −α_k/α_i, k = 1, ..., i − 1, i + 1, ..., m. This concludes our
proof. ■

The proof of the next result is left as an exercise.

3.3.19. Theorem. A finite non-empty set Y in a linear space X is linearly
independent if and only if for each y ∈ V(Y), y ≠ 0, there is a unique finite
subset of Y, say {x_1, x_2, ..., x_m}, and a unique set of scalars {α_1, α_2, ..., α_m},
such that
y = α_1 x_1 + α_2 x_2 + ... + α_m x_m.

3.3.20. Exercise. Prove Theorem 3.3.19.

3.3.21. Exercise. Let Y be a finite set in a linear space X. Show that Y
is linearly independent if and only if there is no proper subset Z of Y such
that V(Z) = V(Y).

A concept which is of utmost importance in the study of vector spaces is


that of basis of a linear space.

3.3.22. Definition. A set Y in a linear space X is called a Hamel basis,
or simply a basis, for X if
(i) Y is linearly independent; and
(ii) the span of Y is the linear space X itself; i.e., V(Y) = X.

As an immediate consequence of this definition we have:

3.3.23. Theorem. Let X be a linear space, and let Y be a linearly indepen-
dent set in X. Then Y is a basis for V(Y).

3.3.24. Exercise. Prove Theorem 3.3.23.

In order to introduce the notion of dimension of a vector space we show


that if a linear space X is generated by a finite number of linearly independent
elements, then this number of elements must be unique. We first prove the
following result.

3.3.25. Theorem. Let {x_1, x_2, ..., x_n} be a basis for a linear space X.
Then for each vector x ∈ X there exist unique scalars α_1, ..., α_n such that

x = α_1 x_1 + ... + α_n x_n.



Proof. Since x_1, ..., x_n span X, every vector x ∈ X can be expressed as
a linear combination of them; i.e.,
x = α_1 x_1 + α_2 x_2 + ... + α_n x_n
for some choice of scalars α_1, ..., α_n. We now must show that these scalars
are unique. To this end, suppose that
x = α_1 x_1 + α_2 x_2 + ... + α_n x_n
and
x = β_1 x_1 + β_2 x_2 + ... + β_n x_n.
Then
x + (−x) = (α_1 x_1 + α_2 x_2 + ... + α_n x_n) + (−β_1 x_1 − β_2 x_2 − ... − β_n x_n)
= (α_1 − β_1)x_1 + (α_2 − β_2)x_2 + ... + (α_n − β_n)x_n = 0.
Since the vectors x_1, x_2, ..., x_n form a basis for X, it follows that they
are linearly independent, and therefore we must have (α_i − β_i) = 0 for
i = 1, ..., n. From this it follows that α_1 = β_1, α_2 = β_2, ..., α_n = β_n. ■

We also have:

3.3.26. Theorem. Let {x_1, x_2, ..., x_n} be a basis for vector space X, and
let {y_1, ..., y_m} be any linearly independent set of vectors. Then m ≤ n.
Proof. We need to consider only the case m ≥ n and prove that then we
actually have m = n. Consider the set of vectors {y_1, x_1, ..., x_n}. Since the
vectors x_1, ..., x_n span X, y_1 can be expressed as a linear combination of
them. Thus, the set {y_1, x_1, ..., x_n} is not linearly independent. Therefore,
there exist scalars β_1, α_1, ..., α_n, not all zero, such that
β_1 y_1 + α_1 x_1 + ... + α_n x_n = 0.   (3.3.27)
If all the α_i are zero, then β_1 ≠ 0 and β_1 y_1 = 0. Thus, we can write
β_1 y_1 + 0 · y_2 + ... + 0 · y_m = 0.
But this contradicts the hypothesis of the theorem and can't happen because
the y_1, ..., y_m are linearly independent. Therefore, at least one of the α_i ≠ 0.
Renumbering all the x_i, if necessary, we can assume that α_n ≠ 0. Solving for
x_n, we now obtain

x_n = (−β_1/α_n)y_1 + (−α_1/α_n)x_1 + ... + (−α_{n−1}/α_n)x_{n−1}.   (3.3.28)

Now we show that the set {y_1, x_1, ..., x_{n−1}} is also a basis for X. Since
{x_1, ..., x_n} is a basis for X, for any x ∈ X we have ξ_1, ξ_2, ..., ξ_n ∈ F such that

x = ξ_1 x_1 + ... + ξ_n x_n.

Substituting (3.3.28) into the above expression we note that

x = ξ_1 x_1 + ξ_2 x_2 + ... + ξ_n[(−β_1/α_n)y_1 + ... + (−α_{n−1}/α_n)x_{n−1}]
= η y_1 + η_1 x_1 + ... + η_{n−1} x_{n−1},
where η and the η_i are defined in an obvious way. In any case, every x ∈ X can
be expressed as a linear combination of the set of vectors {y_1, x_1, ..., x_{n−1}},
and thus this set must span X. To show that this set is also linearly indepen-
dent, let us assume that there are scalars λ, λ_1, ..., λ_{n−1} such that
λ y_1 + λ_1 x_1 + ... + λ_{n−1} x_{n−1} = 0,
and assume that λ ≠ 0. Then

y_1 = (−λ_1/λ)x_1 + ... + (−λ_{n−1}/λ)x_{n−1} + 0 · x_n.   (3.3.29)

In view of Eq. (3.3.27) we have, since β_1 ≠ 0, the relation

y_1 = (−α_1/β_1)x_1 + ... + (−α_{n−1}/β_1)x_{n−1} + (−α_n/β_1)x_n.   (3.3.30)

Now the term (−α_n/β_1)x_n in Eq. (3.3.30) is not zero, because we solved for
x_n in Eq. (3.3.28), so that α_n ≠ 0; yet the coefficient multiplying x_n in Eq. (3.3.29) is zero.
Since {x_1, ..., x_n} is a basis, we have arrived at a contradiction, in view of
Theorem 3.3.25. Therefore, we must have λ = 0. Thus, we have
λ_1 x_1 + ... + λ_{n−1} x_{n−1} + 0 · x_n = 0
and since {x_1, ..., x_n} is a linearly independent set it follows that λ_1 = 0,
..., λ_{n−1} = 0. Therefore, the set {y_1, x_1, ..., x_{n−1}} is indeed a basis for X.
By a similar argument as the preceding one we can show that the set
{y_2, y_1, x_1, ..., x_{n−2}} is a basis for X, that the set {y_3, y_2, y_1, x_1, ..., x_{n−3}}
is a basis for X, etc. Now if m > n, then we would not utilize y_{n+1} in our
process. Since {y_n, ..., y_1} is a basis by the preceding argument, there exist
coefficients η_1, ..., η_n such that
y_{n+1} = η_n y_n + ... + η_1 y_1.
But by Theorem 3.3.15 this means the y_i, i = 1, ..., n + 1, are linearly
dependent, a contradiction to the hypothesis of our theorem. From this it
now follows that if m ≥ n, then we must have m = n. This concludes the
proof of the theorem. ■

As a direct consequence of Theorem 3.3.26 we have:

3.3.31. Theorem. If a linear space X has a basis containing a finite number
of vectors n, then any other basis for X consists of exactly n elements.
Proof. Let {x_1, ..., x_n} be a basis for X, and let also {y_1, ..., y_m} be a
basis for X. Then in view of Theorem 3.3.26 we have m ≤ n. Interchanging
the role of the x_i and y_i, we also have n ≤ m. Hence, m = n. ■

Our preceding result enables us to make the following definition.

3.3.32. Definition. If a linear space X has a basis consisting of a finite
number of vectors, say {x_1, ..., x_n}, then X is said to be a finite-dimensional
vector space and the dimension of X is n, abbreviated dim X = n. In this
case we speak of an n-dimensional vector space. If X is not a finite-dimensional
vector space, it is said to be an infinite-dimensional vector space.

We will agree that the linear space consisting of the null vector is finite
dimensional, and we will say that the dimension of this space is zero.
Our next result provides us with an alternate characterization of (finite)
dimension of a linear space.

3.3.33. Theorem. Let X be a vector space which contains n linearly inde-
pendent vectors. If every set of n + 1 vectors in X is linearly dependent,
then X is finite dimensional and dim X = n.
Proof. Let {x_1, ..., x_n} be a linearly independent set in X, and let x ∈ X.
Then there exists a set of scalars {α_1, ..., α_{n+1}}, not all zero, such that
α_1 x_1 + ... + α_n x_n + α_{n+1} x = 0.
Now α_{n+1} ≠ 0, otherwise we would contradict the fact that x_1, ..., x_n are
linearly independent. Hence,

x = (−α_1/α_{n+1})x_1 − ... − (α_n/α_{n+1})x_n

and x ∈ V({x_1, ..., x_n}); i.e., {x_1, ..., x_n} is a basis for X. Therefore, X
is n-dimensional. ■

From our preceding result follows:

3.3.34. Corollary. Let X be a vector space. If for given n every set of n + 1
vectors in X is linearly dependent, then X is finite dimensional and dim X
≤ n.
3.3.35. Exercise. Prove Corollary 3.3.34.

We are now in a position to speak of coordinates of a vector. We have:

3.3.36. Definition. Let X be a finite-dimensional vector space, and let
{x_1, ..., x_n} be a basis for X. Let x ∈ X be represented by
x = ξ_1 x_1 + ... + ξ_n x_n.
The unique scalars ξ_1, ξ_2, ..., ξ_n are called the coordinates of x with respect
to the basis {x_1, x_2, ..., x_n}.
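In R^n the coordinates of Definition 3.3.36 are computed by solving a linear system: if the basis vectors are placed as the columns of a matrix B, the coordinates ξ of x with respect to that basis are the unique solution of Bξ = x. A Python sketch (ours):

import numpy as np

B = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])     # columns x_1, x_2, x_3 form a basis of R^3
x = np.array([2.0, 3.0, 1.0])

xi = np.linalg.solve(B, x)          # coordinates of x with respect to the basis
assert np.allclose(B @ xi, x)       # x = xi_1 x_1 + xi_2 x_2 + xi_3 x_3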
It is possible to prove results similar to Theorems 3.3.26 and 3.3.31 for
infinite-dimensional linear spaces. Since we will not make further use of

these results in this book, their proofs will be omitted. In the following
theorems, X is an arbitrary vector space (i.e., finite dimensional or infinite
dimensional).

3.3.37. Theorem. If Y is a linearly independent set in a linear space X,
then there exists a Hamel basis Z for X such that Y ⊂ Z.

3.3.38. Theorem. If Y and Z are Hamel bases for a linear space X, then
Y and Z have the same cardinal number.

The notion of Hamel basis is not the only concept of basis with which we
will deal. Such other concepts (to be specified later) reduce to Hamel basis
on finite-dimensional vector spaces but differ significantly on infinite-dimen-
sional spaces. We will find that on infinite-dimensional spaces the concept
of Hamel basis is not very useful. However, in the case of finite-dimensional
spaces the concept of Hamel basis is most crucial.
In view of the results presented thus far, the reader can readily prove the
following facts.

3.3.39. Theorem. Let X be a finite-dimensional linear space with dim X
= n.
(i) No linearly independent set in X contains more than n vectors.
(ii) A linearly independent set in X is a basis if and only if it contains
exactly n vectors.
(iii) Every spanning or generating set for X contains a basis for X.
(iv) Every set of vectors which spans X contains at least n vectors.
(v) Every linearly independent set of vectors in X is contained in a basis
for X.
(vi) If Y is a linear subspace of X, then Y is finite dimensional and
dim Y ≤ n.
(vii) If Y is a linear subspace of X and if dim X = dim Y, then Y = X.

3.3.40. Exercise. Prove Theorem 3.3.39.

From Theorem 3.3.39 follows directly our next result.

3.3.41. Theorem. Let X be a finite-dimensional linear space of dimension
n, and let Y be a collection of vectors in X. Then any two of the three con-
ditions listed below imply the third condition:
(i) the vectors in Y are linearly independent;
(ii) the vectors in Y span X; and
(iii) the number of vectors in Y is n.

3.3.42. Exercise. Prove Theorem 3.3.41.

Another way of restating Theorem 3.3.41 is as follows:
(a) the dimension of a finite-dimensional linear space X is equal to the
smallest number of vectors that can be used to span X; and
(b) the dimension of a finite-dimensional linear space X is the largest
number of vectors that can be linearly independent in X.
For the direct sum of two linear subspaces we have the following result.

3.3.43. Theorem. Let X be a finite-dimensional vector space. If there
exist linear subspaces Y and Z of X such that X = Y ⊕ Z, then dim(X)
= dim(Y) + dim(Z).
Proof. Since X is finite dimensional it follows from part (vi) of Theorem
3.3.39 that Y and Z are finite-dimensional linear spaces. Thus, there exists
a basis, say {y_1, ..., y_n}, for Y, and a basis, say {z_1, ..., z_m}, for Z. Let
W = {y_1, ..., y_n, z_1, ..., z_m}. We must show that W is a linearly independent
set in X and that V(W) = X. Now suppose that
α_1 y_1 + ... + α_n y_n + β_1 z_1 + ... + β_m z_m = 0.
Since the representation for 0 ∈ X must be unique in terms of its components
in Y and Z, we must have
α_1 y_1 + ... + α_n y_n = 0
and
β_1 z_1 + ... + β_m z_m = 0.
But this implies that α_1 = α_2 = ... = α_n = β_1 = β_2 = ... = β_m = 0.
Thus, W is a linearly independent set in X. Since X is the direct sum of Y
and Z, it is clear that W generates X. Thus, dim X = m + n. This completes
the proof of the theorem. ■

We conclude the present section with the following results.

3.3.44. Theorem. Let X be an n-dimensional vector space, and let {y_1,
..., y_m} be a linearly independent set of vectors in X, where m < n. Then it
is possible to form a basis for X consisting of n vectors x_1, ..., x_n, where
x_i = y_i for i = 1, ..., m.
Proof. Let {e_1, ..., e_n} be a basis for X. Let S_1 be the set of vectors {y_1,
..., y_m, e_1, ..., e_n}, where {y_1, ..., y_m} is a linearly independent set of
vectors in X and where m < n. We note that S_1 spans X and is linearly

dependent, since it contains more than n vectors. Now let

∑_{i=1}^m α_i y_i + ∑_{i=1}^n β_i e_i = 0.

Then there must be some β_j ≠ 0, otherwise the linear independence of
{y_1, ..., y_m} would be contradicted. But this means that e_j is a linear combi-
nation of the set of vectors S_2 = {y_1, ..., y_m, e_1, ..., e_{j−1}, e_{j+1}, ..., e_n};
i.e., S_2 is the set S_1 with e_j eliminated. Clearly, S_2 still spans X. Now either
S_2 contains n vectors or else it is a linearly dependent set. If it contains n
vectors, then by Theorem 3.3.41 these vectors must be linearly independent,
in which case S_2 is a basis for X. We then relabel the remaining e_i's as
x_{m+1}, ..., x_n, and the theorem is proved.
On the other hand, if S_2 contains more than n vectors, then we continue the
above procedure to eliminate vectors from the remaining e_i's until exactly
n − m of them are left. Letting e_{j_1}, ..., e_{j_{n−m}} be the remaining vectors and
letting x_{m+1} = e_{j_1}, ..., x_n = e_{j_{n−m}}, we have completed the proof of the
theorem. ■
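The proof of Theorem 3.3.44 is constructive, and the construction is easily carried out numerically in R^n: scan the vectors of a known basis and keep each one that enlarges the span of the set accumulated so far. The following Python sketch is ours; extend_to_basis is a hypothetical name.

import numpy as np

def extend_to_basis(Y):
    # Y: m x n array whose rows are linearly independent vectors in R^n.
    rows = [row for row in Y]
    for e in np.eye(Y.shape[1]):                  # candidates e_1, ..., e_n
        if np.linalg.matrix_rank(np.vstack(rows + [e])) > len(rows):
            rows.append(e)                        # e_j enlarges the span; keep it
    return np.vstack(rows)

Y = np.array([[1.0, 1.0, 0.0]])                   # m = 1 independent vector, n = 3
B = extend_to_basis(Y)
assert B.shape == (3, 3)
assert np.linalg.matrix_rank(B) == 3              # a basis containing y_1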

3.3.45. Corollary. Let X be an n-dimensional vector space, and let Y be
an m-dimensional subspace of X. Then there exists a subspace Z of X of
dimension (n − m) such that X = Y ⊕ Z.

3.3.46. Exercise. Prove Corollary 3.3.45.

Referring to Figure 3.3.8, it is easy to see that the subspace Z in Corollary


3.3.45 need not be unique.

3.4. LINEAR TRANSFORMATIONS

Among the most important notions which we will encounter are special
types of mappings on vector spaces, called linear transformations.

3.4.1. Definition. A mapping T of a linear space X into a linear space Y,
where X and Y are vector spaces over the same field F, is called a linear
transformation or linear operator provided that
(i) T(x + y) = T(x) + T(y) for all x, y ∈ X; and
(ii) T(αx) = αT(x) for all x ∈ X and for all α ∈ F.
A transformation which is not linear is called a non-linear transformation.

We will find it convenient to write T ∈ L(X, Y) to indicate that T is
a linear transformation from a linear space X into a linear space Y (i.e.,

L(X, Y) denotes the set of all linear transformations from linear space X
into linear space Y).
It follows immediately from the above definition that T is a linear transfor-
mation from a linear space X into a linear space Y if and only if
T(∑_{i=1}^n α_i x_i) = ∑_{i=1}^n α_i T(x_i) for all x_i ∈ X and for all α_i ∈ F, i = 1, ..., n. In engineering
and science this is called the principle of superposition and is among the most
important concepts in those disciplines.
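The principle of superposition is easily demonstrated numerically. In the sketch below (ours), the linear transformation is taken to be multiplication by a fixed matrix A on R^3, a representation to which we return in the next chapter.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))        # T(x) = A @ x is linear on R^3
T = lambda v: A @ v

x, y = rng.standard_normal(3), rng.standard_normal(3)
alpha, beta = 2.0, -0.5
assert np.allclose(T(alpha * x + beta * y), alpha * T(x) + beta * T(y))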

3.4.2. Example. Let X = Y denote the space of real-valued continuous
functions on the interval [a, b] as described in Example 3.1.19. Let T: X → Y
be defined by

[Tx](t) = ∫_a^t x(s) ds, a ≤ t ≤ b,

where integration is in the Riemann sense. By the properties of integrals
it follows readily that T is a linear transformation. ■

3.4.3. Example. Let X = C^n(a, b) denote the set of functions x(t) with n
continuous derivatives on the interval (a, b), and let vector addition and scalar
multiplication be defined by equations (3.1.20) and (3.1.21), respectively.
It is readily verified that C^n(a, b) is a linear space. Now let T: C^n(a, b)
→ C^{n−1}(a, b) be defined by

[Tx](t) = dx(t)/dt.

From the properties of derivatives it follows that T is a linear transformation
from C^n(a, b) to C^{n−1}(a, b). ■

3.4.4. Example. Let X denote the space of all complex-valued functions x(t)
defined on the half-open interval [0, ∞) such that x(t) is Riemann integrable
and such that

|x(t)| ≤ k e^{at}, t ≥ 0,

where k is some positive constant and a is any real number. Defining vector
addition and scalar multiplication as in Eqs. (3.1.20) and (3.1.21), respectively,
it is easily shown that X is a linear space. Now let Y denote the linear space of
complex functions of a complex variable s (s = σ + iω, i = √−1). The
reader can readily verify that the mapping T: X → Y defined by

[Tx](s) = ∫_0^∞ e^{−st} x(t) dt   (3.4.5)

is a linear transformation (called the Laplace transform of x(t)). ■
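As a numeric check (ours, consistent with the stated growth condition), the Laplace transform of x(t) = e^{−t} is 1/(s + 1), so [Tx](2) should equal 1/3. In the sketch below, the improper integral in Eq. (3.4.5) is truncated to [0, 40] and approximated by a Riemann sum; the truncation error is negligible here.

import numpy as np

def laplace(x, s, T=40.0, dt=1e-3):
    # Truncated Riemann-sum approximation of Eq. (3.4.5).
    t = np.arange(0.0, T, dt)
    return np.sum(np.exp(-s * t) * x(t)) * dt

approx = laplace(lambda t: np.exp(-t), s=2.0)
assert abs(approx - 1.0 / 3.0) < 1e-3     # exact transform of e^{-t} is 1/(s + 1)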

3.4.6. Example. Let X be the space of real-valued continuous functions
on [a, b] as described in Example 3.1.19. Let k(s, t) be a real-valued function

defined for a ≤ s ≤ b, a ≤ t ≤ b, such that for each x ∈ X the Riemann
integral
∫_a^b k(s, t)x(t) dt   (3.4.7)

exists and defines a continuous function of s on [a, b]. Let T_1: X → X be
defined by
[T_1 x](s) = y(s) = ∫_a^b k(s, t)x(t) dt.   (3.4.8)

It is readily shown that T_1 ∈ L(X, X). The equation (3.4.8) is called the
Fredholm integral equation of the first type. ■

3.4.9. Example. If in place of (3.4.8) we define T_2: X → X by

[T_2 x](s) = y(s) = x(s) − ∫_a^b k(s, t)x(t) dt,   (3.4.10)

then it is again readily shown that T_2 ∈ L(X, X). Equation (3.4.10) is known
as the Fredholm integral equation of the second type. ■

3.4.11. Example. In Examples 3.4.6 and 3.4.9, assume that k(s, t) = 0
when t > s. In place of (3.4.7) we now have

∫_a^s k(s, t)x(t) dt.   (3.4.12)

Equations (3.4.8) and (3.4.10) now become

[T_3 x](s) = y(s) = ∫_a^s k(s, t)x(t) dt   (3.4.13)

and

[T_4 x](s) = y(s) = x(s) − ∫_a^s k(s, t)x(t) dt,   (3.4.14)

respectively. Equations (3.4.13) and (3.4.14) are called Volterra integral
equations (of the first type and the second type, respectively). Again, the
mappings T_3 and T_4 are linear transformations from X into X. ■
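Discretizing the Volterra operator (3.4.13) on a uniform grid makes the role of the condition k(s, t) = 0 for t > s transparent: the resulting matrix is lower triangular. The following Python sketch (ours) uses the simple kernel k(s, t) = 1 for t ≤ s, for which T_3 reduces to the integration operator of Example 3.4.2.

import numpy as np

a, b, n = 0.0, 1.0, 200
t = np.linspace(a, b, n)
dt = t[1] - t[0]

k = lambda s, tt: np.where(tt <= s, 1.0, 0.0)     # kernel vanishing for t > s
K = k(t[:, None], t[None, :]) * dt                # discretized T_3
assert np.allclose(K, np.tril(K))                 # lower triangular, as claimed

x = np.sin(np.pi * t)
y = K @ x                                         # y(s) ≈ ∫_a^s x(t) dt here
assert np.allclose(y, (1.0 - np.cos(np.pi * t)) / np.pi, atol=1e-2)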

3.4.15. Example. Let X = C, the set of complex numbers. If x ∈ C,
let x̄ denote the complex conjugate of x. Define T: X → X as
T(x) = x̄.
Then, clearly, T(x + y) = x̄ + ȳ = T(x) + T(y). Now if F = C,
the field of complex numbers, and if α ∈ F, then
T(αx) = ᾱx̄ = ᾱT(x) ≠ αT(x).
Therefore, T is not a linear transformation. ■



Example 3.4.15 demonstrates the important fact that condition (i) of


Definition 3.4.1 does not imply condition (ii) of this definition.

Henceforth, when dealing with linear transformations T: X → Y, we will
write Tx in place of T(x).

3.4.16. Definition. Let T ∈ L(X, Y). We call the set

𝔑(T) = {x ∈ X: Tx = 0}   (3.4.17)
the null space of T. The set
ℜ(T) = {y ∈ Y: y = Tx, x ∈ X}   (3.4.18)
is called the range space of T.

Since T0 = 0 it follows that 𝔑(T) and ℜ(T) are never empty. The next
two important assertions are readily proved.

3.4.19. Theorem. Let T ∈ L(X, Y). Then
(i) the null space 𝔑(T) is a linear subspace of X; and
(ii) the range space ℜ(T) is a linear subspace of Y.

3.4.20. Exercise. Prove Theorem 3.4.19.

For the dimension of the range space ℜ(T) we have:

3.4.21. Theorem. Let T ∈ L(X, Y). If X is finite dimensional with dimen-
sion n, then ℜ(T) is finite dimensional and dim[ℜ(T)] ≤ n.
Proof. We assume that ℜ(T) ≠ {0} and X ≠ {0}, for if ℜ(T) = {0} or X
= {0}, then dim[ℜ(T)] = 0, and the theorem is proved. Thus, assume that
n > 0 and let y_1, ..., y_{n+1} ∈ ℜ(T). Then there exist x_1, ..., x_{n+1} ∈ X
such that Tx_i = y_i for i = 1, ..., n + 1. Since X is of dimension n, there
exist α_1, ..., α_{n+1} ∈ F such that not all α_i = 0 and

α_1 x_1 + ... + α_{n+1} x_{n+1} = 0.

This implies that
T(α_1 x_1 + ... + α_{n+1} x_{n+1}) = 0.
Thus,
α_1 Tx_1 + ... + α_{n+1} Tx_{n+1} = 0,
or
α_1 y_1 + ... + α_{n+1} y_{n+1} = 0.
Therefore, by Corollary 3.3.34, ℜ(T) is finite dimensional and dim[ℜ(T)]
≤ n. ■
3.4.22. Example. Let T: R^2 → R^∞, where R^2 and R^∞ are defined in Exam-
ples 3.1.10 and 3.1.11, respectively. For x ∈ R^2 we write x = (ξ_1, ξ_2). Define

T by
T(ξ_1, ξ_2) = (0, ξ_1, 0, ξ_2, 0, 0, ...).
The mapping T is clearly a linear transformation. The vectors (0, 1, 0, 0, ...)
and (0, 0, 0, 1, 0, 0, ...) span ℜ(T), and dim[ℜ(T)] = 2 = dim[R^2]. ■

We also have:

3.4.23. Theorem. Let T ∈ L(X, Y), and let X be finite dimensional. Let
{y_1, ..., y_n} be a basis for ℜ(T) and let x_i be such that Tx_i = y_i for i = 1,
..., n. Then x_1, ..., x_n are linearly independent in X.

3.4.24. Exercise. Prove Theorem 3.4.23.

Our next result, which as we will see is of utmost importance, is sometimes


called the fundamental theorem of linear equations.

3.4.25. Theorem. Let T ∈ L(X, Y). If X is finite dimensional, then

dim 𝔑(T) + dim ℜ(T) = dim X.   (3.4.26)

Proof. Let dim X = n, let dim 𝔑(T) = s, and let r = n − s. We must
show that dim ℜ(T) = r.
First, let us assume that 0 < s < n, and let {e_1, e_2, ..., e_n} be a basis for
X chosen in such a way that the last s vectors, e_{r+1}, e_{r+2}, ..., e_n, form a
basis for the linear subspace 𝔑(T) (see Theorem 3.3.44). Then the vectors
Te_1, Te_2, ..., Te_r, Te_{r+1}, ..., Te_n generate the linear subspace ℜ(T). But
e_{r+1}, e_{r+2}, ..., e_n are vectors in 𝔑(T), and thus Te_{r+1} = 0, ..., Te_n = 0.
From this it now follows that the vectors Te_1, Te_2, ..., Te_r must generate
ℜ(T). Now let f_1 = Te_1, f_2 = Te_2, ..., f_r = Te_r. We must show that the
vectors {f_1, f_2, ..., f_r} are linearly independent and as such form a basis
for ℜ(T).
Next, we observe that γ_1 f_1 + γ_2 f_2 + ... + γ_r f_r ∈ ℜ(T). If the γ_1, γ_2,
..., γ_r are chosen in such a fashion that γ_1 f_1 + γ_2 f_2 + ... + γ_r f_r = 0, then
0 = γ_1 f_1 + γ_2 f_2 + ... + γ_r f_r = γ_1 Te_1 + γ_2 Te_2 + ... + γ_r Te_r
= T(γ_1 e_1 + γ_2 e_2 + ... + γ_r e_r),
and from this it follows that x = γ_1 e_1 + γ_2 e_2 + ... + γ_r e_r ∈ 𝔑(T). Now,
by assumption, the set {e_{r+1}, ..., e_n} is a basis for 𝔑(T). Thus there must
exist scalars γ_{r+1}, γ_{r+2}, ..., γ_n such that
γ_1 e_1 + γ_2 e_2 + ... + γ_r e_r = γ_{r+1} e_{r+1} + ... + γ_n e_n.
This can be rewritten as
γ_1 e_1 + ... + γ_r e_r − γ_{r+1} e_{r+1} − ... − γ_n e_n = 0.

But fel, e", ... ,en} is a basis for .X F r om this it follows that 71 = 7" = ...
= Y r = 7r+ I = ... = Y n = O. eH nce, fltf", ... ,fr are linearly independent
and therefore dim R < (T) = r. If s = 0, the preceding proof remains valid if
we let fel, ... ,e.} be any basis for X and ignore the remarks about the
vectors e{ r + I ' • • ,en}' If s = n, then ffi.(T) = .X eH nce, R
< (T) = O
{ J and so
dim R< (T) = O. This concludes the proof of the theorem. _
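Equation (3.4.26) can be verified numerically for transformations given by matrices, the subject of the next chapter. In the sketch below (ours), the rank is the number of nonzero singular values of A, and the trailing rows of V^T in the singular value decomposition A = U S V^T furnish a basis for the null space.

import numpy as np

A = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
              [0.0, 1.0, 0.0, 1.0, 0.0],
              [1.0, 3.0, 3.0, 5.0, 5.0]])   # third row = first + second

n = A.shape[1]                              # dim X = 5
U, S, Vt = np.linalg.svd(A)
rank = int(np.sum(S > 1e-10))               # dim R(T) = 2
null_basis = Vt[rank:]                      # rows spanning N(T)
assert np.allclose(A @ null_basis.T, 0.0)   # each row is a null vector
assert rank + null_basis.shape[0] == n      # dim R(T) + dim N(T) = dim X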

Our preceding result gives rise to the next definition.

3.4.27. Definition. The rank ρ(T) of a linear transformation T of a finite-
dimensional vector space X into a vector space Y is the dimension of the
range space ℜ(T). The nullity ν(T) of the linear transformation T is the dimen-
sion of the null space 𝔑(T).

The reader is now in a position to prove the next result.

3.4.28. Theorem. Let T ∈ L(X, Y). Let X be finite dimensional, and let
s = dim 𝔑(T). Let {x_1, ..., x_s} be a basis for 𝔑(T). Then
(i) a vector x ∈ X satisfies the equation
Tx = 0
if and only if x = α_1 x_1 + ... + α_s x_s for some set of scalars {α_1,
..., α_s}. Furthermore, for each x ∈ X such that Tx = 0 is satisfied,
the set of scalars {α_1, ..., α_s} is unique;
(ii) if y_0 is a fixed vector in Y, then Tx = y_0 holds for at least one x ∈ X
(called a solution of the equation Tx = y_0) if and only if y_0 ∈ ℜ(T);
and
(iii) if y_0 is any fixed vector in Y and if x_0 is some vector in X such that
Tx_0 = y_0 (i.e., x_0 is a solution of the equation Tx = y_0), then a
vector x ∈ X satisfies Tx = y_0 if and only if x = x_0 + β_1 x_1 + ...
+ β_s x_s for some set of scalars {β_1, β_2, ..., β_s}. Furthermore, for
each x ∈ X such that Tx = y_0, the set of scalars {β_1, β_2, ..., β_s}
is unique.

3.4.29. Exercise. Prove Theorem 3.4.28.

Since a linear transformation T of a linear space X into a linear space Y
is a mapping, we can distinguish, as in Chapter 1, between linear transforma-
tions that are surjective (i.e., onto), injective (i.e., one-to-one), and bijective
(i.e., onto and one-to-one). We will often be particularly interested in
knowing when a linear transformation T has an inverse, which we denote by
T^{−1}. In this connection, the following terms are used interchangeably: T^{−1}
exists, T has an inverse, T is invertible, and T is non-singular. Also, a linear

transformation which is not non-singular is said to be singular. We recall,
if T has an inverse, then
T^{−1}(Tx) = x for all x ∈ X   (3.4.30)
and
T(T^{−1}y) = y for all y ∈ ℜ(T).   (3.4.31)
The following theorem is a fundamental result concerning inverses of
linear transformations.

3.4.32. Theorem. Let T ∈ L(X, Y).
(i) The inverse of T exists if and only if Tx = 0 implies x = 0.
(ii) If T^{−1} exists, then T^{−1} is a linear transformation from ℜ(T) onto X.
Proof. To prove part (i), assume first that Tx = 0 implies x = 0. Let
x_1, x_2 ∈ X with Tx_1 = Tx_2. Then T(x_1 − x_2) = 0 and therefore x_1 − x_2
= 0. Thus, x_1 = x_2 and T has an inverse.
Conversely, assume that T has an inverse. Let Tx = 0. Since T0 = 0, we
have T0 = Tx. Since T has an inverse, x = 0.
To prove part (ii), assume that T^{−1} exists. To establish the linearity of
T^{−1}, let y_1, y_2 ∈ ℜ(T), and let x_1, x_2 ∈ X be
such that y_1 = Tx_1 and y_2 = Tx_2. Then
T^{−1}(y_1 + y_2) = T^{−1}(Tx_1 + Tx_2) = T^{−1}T(x_1 + x_2) = x_1 + x_2
= T^{−1}(y_1) + T^{−1}(y_2).
Also, for α ∈ F we have
T^{−1}(αy_1) = T^{−1}(αTx_1) = T^{−1}(T(αx_1)) = αx_1 = αT^{−1}(y_1).

Thus, T^{−1} is linear. It is also a mapping onto X: if x ∈ X, then
y = Tx ∈ ℜ(T) and x = T^{−1}y, so every x ∈ X belongs to ℜ(T^{−1}). ■

3.4.33. Example. Consider the linear transformation T: R^2 → R^∞ of
Example 3.4.22. Since Tx = 0 implies x = 0, T has an inverse. We see that T
is not a mapping of R^2 onto R^∞; however, T is clearly a one-to-one mapping
of R^2 onto ℜ(T). ■

For finite-dimensional vector spaces we have:

3.4.34. Theorem. Let T ∈ L(X, Y). If X is finite dimensional, T has an
inverse if and only if ℜ(T) has the same dimension as X; i.e., ρ(T) = dim X.
Proof. By Theorem 3.4.25 we have
dim 𝔑(T) + dim ℜ(T) = dim X.

Since T has an inverse if and only if 𝔑(T) = {0}, it follows that ρ(T) = dim X
if and only if T has an inverse. ■

For finite-dimensional linear spaces we also have:

3.4.35. Theorem. Let X and Y be finite-dimensional vector spaces of the
same dimension, say dim X = dim Y = n. Let T ∈ L(X, Y). Then ℜ(T) = Y
if and only if T has an inverse.
Proof. Assume that T has an inverse. By Theorem 3.4.34 we know that
dim ℜ(T) = n. Thus, dim ℜ(T) = dim Y and it follows from Theorem 3.3.39,
part (vii), that ℜ(T) = Y.
Conversely, assume that ℜ(T) = Y. Let {y_1, y_2, ..., y_n} be a basis for
ℜ(T). Let x_i be such that Tx_i = y_i for i = 1, ..., n. Then, by Theorem
3.4.23, the vectors x_1, ..., x_n are linearly independent. Since the dimension
of X is n, it follows that the vectors x_1, ..., x_n span X. Now let Tx = 0 for
some x ∈ X. We can represent x as x = α_1 x_1 + ... + α_n x_n. Hence, 0 = Tx
= α_1 y_1 + ... + α_n y_n. Since the vectors y_1, ..., y_n are linearly independent,
we must have α_1 = ... = α_n = 0, and thus x = 0. This implies that T has
an inverse. ■

At this point we find it instructive to summarize the preceding results


which characterize injective, surjective, and bijective linear transformations.
In so doing, it is useful to keep Figure J in mind.

3.4.36. Figure J. Linear transformation T from vector space X into
vector space Y.

3.4.37. Summary (Injective Linear Transformations). Let X and Y be
vector spaces over the same field F, and let T ∈ L(X, Y). The following
are equivalent:
(i) T is injective;
(ii) T has an inverse;

(iii) Tx = 0 implies x = 0;
(iv) for each y ∈ ℜ(T), there is a unique x ∈ X such that Tx = y;
(v) if Tx_1 = Tx_2, then x_1 = x_2; and
(vi) if x_1 ≠ x_2, then Tx_1 ≠ Tx_2.
If X is finite dimensional, then the following are equivalent:
(i) T is injective; and
(ii) ρ(T) = dim X.

3.4.38. Summary (Surjective Linear Transformations). Let X and Y be
vector spaces over the same field F, and let T ∈ L(X, Y). The following are
equivalent:
(i) T is surjective; and
(ii) for each y ∈ Y, there is an x ∈ X such that Tx = y.
If X and Y are finite dimensional, then the following are equivalent:
(i) T is surjective; and
(ii) dim Y = ρ(T).

3.4.39. Summary (Bijective Linear Transformations). Let X and Y be
vector spaces over the same field F, and let T ∈ L(X, Y). The following are
equivalent:
(i) T is bijective; and
(ii) for every y ∈ Y there is a unique x ∈ X such that Tx = y.
If X and Y are finite dimensional, then the following are equivalent:
(i) T is bijective; and
(ii) dim X = dim Y = ρ(T).

3.4.40. Summary (Injective, Surjective, and Bijective Linear Transforma-
tions). Let X and Y be finite-dimensional vector spaces over the same
field F, and let dim X = dim Y. (Note: this is true if, e.g., X = Y.) The
following are equivalent:
(i) T is injective;
(ii) T is surjective;
(iii) T is bijective; and
(iv) T has an inverse.

3.4.41. Exercise. Verify the assertions made in summaries (3.4.37)–
(3.4.40).

Let us next examine some of the properties of the set L(X, Y), the set
of all linear transformations from a vector space X into a vector space Y.
As before, we assume that X and Y are linear spaces over the same field F.
Let S, T ∈ L(X, Y), and define the sum of S and T by
(S + T)x ≜ Sx + Tx   (3.4.42)
for all x ∈ X. Also, with α ∈ F and T ∈ L(X, Y), define multiplication of T
by a scalar α as
(αT)x ≜ αTx   (3.4.43)
for all x ∈ X. It is an easy matter to show that (S + T) ∈ L(X, Y) and also
that αT ∈ L(X, Y). Let us further note that there exists a zero element in
L(X, Y), called the zero transformation and denoted by 0, which is defined by
0x = 0   (3.4.44)
for all x ∈ X. Moreover, to each T ∈ L(X, Y) there corresponds a unique
linear transformation −T ∈ L(X, Y) defined by
(−T)x = −Tx   (3.4.45)
for all x ∈ X. In this case it follows trivially that −T + T = 0.

3.4.46. Exercise. Let X be a finite-dimensional space, and let T ∈ L(X, Y).
Let {e_1, ..., e_n} be a basis for X. Show that Te_i = 0 for i = 1, ..., n if and only
if T = 0 (i.e., T is the zero transformation).

With the above definitions it is now easy to establish the following result.

3.4.47. Theorem. Let X and Y be two linear spaces over the same field
of scalars F, and let L(X, Y) denote the set of all linear transformations from
X into Y. Then L(X, Y) is itself a linear space over F, called the space of
linear transformations (here, vector addition is defined by Eq. (3.4.42) and
multiplication of vectors by scalars is defined by Eq. (3.4.43)).

3.4.48. Exercise. Prove Theorem 3.4.47.

Next, let us recall the definition of an algebra, considered in Chapter 2.

3.4.49. Definition. A set X is called an algebra if it is a linear space and
if in addition to each x, y ∈ X there corresponds an element in X, denoted
by x · y and called the product of x times y, satisfying the following axioms:
(i) x · (y + z) = x · y + x · z for all x, y, z ∈ X;
(ii) (x + y) · z = x · z + y · z for all x, y, z ∈ X; and
(iii) (αx) · (βy) = (αβ)(x · y) for all x, y ∈ X and for all α, β ∈ F.
If in addition to the above,
If in addition to the above,

(iv) (x · y) · z = x · (y · z) for all x, y, z ∈ X,
then X is called an associative algebra.
If there exists an element i ∈ X such that i · x = x · i = x for every
x ∈ X, then i is called the identity of the algebra. It can be readily shown
that if i exists, then it is unique. Furthermore, if x · y = y · x for all x, y ∈ X,
then X is said to be a commutative algebra. Finally, if Y is a subset of X
(X is an algebra) and (a) if x + y ∈ Y whenever x, y ∈ Y, (b) if αx ∈ Y
whenever α ∈ F and x ∈ Y, and (c) if x · y ∈ Y whenever x, y ∈ Y, then
Y is called a subalgebra of X.

Now let us return to the subject at hand. Let X, Y, and Z be linear spaces
over F, and consider the vector spaces L(X, Y) and L(Y, Z). If S ∈ L(Y, Z)
and if T ∈ L(X, Y), then we define the product ST as the mapping of X into
Z characterized by
(ST)x = S(Tx)   (3.4.50)
for all x ∈ X. The reader can readily verify that ST ∈ L(X, Z).
Next, let X = Y = Z. If S, T, U ∈ L(X, X) and if α, β ∈ F, then it is
easily shown that
S(TU) = (ST)U,   (3.4.51)
S(T + U) = ST + SU,   (3.4.52)
(S + T)U = SU + TU,   (3.4.53)
and
(αS)(βT) = (αβ)ST.   (3.4.54)
For example, to verify (3.4.52), we observe that
[S(T + U)]x = S[(T + U)x] = S[Tx + Ux]
= (ST)x + (SU)x = (ST + SU)x
for all x ∈ X, and hence Eq. (3.4.52) follows.
We emphasize at this point that, in general, commutativity of linear
transformations does not hold; i.e., in general,
ST ≠ TS.   (3.4.55)
There is a special mapping from a linear space X into X, called the identity
transformation, defined by
Ix = x   (3.4.56)
for all x ∈ X. We note that I is linear, i.e., I ∈ L(X, X), that I ≠ 0 if and only
if X ≠ {0}, that I is unique, and that
TI = IT = T   (3.4.57)
for all T ∈ L(X, X). Also, we can readily verify that the transformation
for all T E L(X, X). Also, we can readily verify that the transformation
αI, α ∈ F, defined by
(αI)x = αIx = αx   (3.4.58)
is also a linear transformation.
The above discussion gives rise to the following result.

3.4.59. Theorem. The set of linear transformations of a linear space X
into X, denoted by L(X, X), is an associative algebra with identity I. This
algebra is, in general, not commutative.

We further have:

3.4.60. Theorem. Let T ∈ L(X, X). If T is bijective, then T^{−1} ∈ L(X, X)
and
T^{−1}T = TT^{−1} = I,   (3.4.61)
where I denotes the identity transformation defined in Eq. (3.4.56).

3.4.62. Exercise. Prove Theorem 3.4.60.

F o r invertible linear transformations defined on finite-dimensional linear


spaces we have the following result.

3.4.63. Theorem. Let X be a finite-dimensional vector space, and let


T E L(X, X). Then the following are equivalent:
(i) T is invertible;
(ii) rank T = dim X ;
(iii) T is one-to-one;
(iv) T is onto; and
(v) Tx = 0 implies x = O.

3.4.64. Exercise. Prove Theorem 3.4.63.

Bijective linear transformations are further characterized by our next


result.

3.4.65. Theorem. Let X be a linear space, and let S, T, U E L(X, X). Let
IE L ( X , X ) denote the identity transformation.
(i) If ST = S U = I, then S is bijective and S- I = T = .U
(ii) IfSand Tare bijective, then STis bijective, and (Sn- I = T- I S- I .
(iii) If S is bijective, then (S- I )- I = S.
(iv) If S is bijective, then a.S is bijective and (a.S>1- = ~ S- I for all a. E

F a nd a. *' O.
3.4. iL near Transformations 107

3.4.66. Exercise. Prove Theorem 3.4.65.

With the aid of the above concepts and results we can now construct
certain classes of functions of linear transformations. Since relation (3.4.51)
allows us to write the product of three or more linear transformations without
the use of parentheses, we can define T", where T E L ( ,X X ) and n is a positive
integer, as
T"I1T· T · ... · T . (3.4.67)
n times
Similarly, if T- I is the inverse of T, then we can define T- " ' , where m is a
positive integer, as
T- ' " 11 (T- I )' " = T- I • T- I ... • T- t .
. (3.4.68)
mtfmes

m ti'ines n tImes
= (T. T· .... T)
m + n·times
= T"'"+ = (T • T • . ..• T) • (T • T • . .• T)

n times
.
mtimes
= 1'" • T"'. (3.4.69)
In a similar fashion we have
(T"')" = T"" = T- = (1"')"' (3.4.70)
and
(3.4.71)
where m and n are positive integers. Consistent with this notation we also
have
TI = T (3.4.72)
and
TO = 1. (3.4.73)
We are now in a position to consider polynomials of linear transformations.
Thus, if f(A) is a polynomial, i.e.,
f(A) = 0« + A
\« + ... + "« A", (3.4.74)
where 0
« ' ... ,« " E ,F then by f(T) we mean
f(T) = f1, 0 1 + f1,tT + ... + ,« ,1"'. (3.4.75)
The reader is cautioned that the above concept can, in general, not be
108 Chapter 3 I Vector Spaces and iL near Transformations

extended to functions of two or more linear transformations, because linear


transformations in general do not commute.
Next, we consider the important concept of isomorphic linear spaces.
In Chapter 2we encountered the notion of isomorphisms of groups and rings.
We saw that such mappings, if they exist, preserve the algebraic properties
of groups and rings. Thus, in many cases two algebraic systems (such as
groups or rings) may differ only in the nature ofthe elements ofthe underlying
set and may thus be considered as being the same in all other respects. We
n.ow extend this concept to linear spaces.

3.4.76. Definition. eL t X and Y be vector spaces over the same field .F


Ifthere exists T E L ( X , Y) such that Tis a one-to-one mapping of X into ,Y
then T is said to be an isomorphism of X into .Y If in addition, T maps X
onto Y then X and Yare said to be isomorphic.

Note that if X and aY re isomorphic, then clearly aY nd X are isomorphic.


Our next result shows that all n-dimensional linear spaces over the same
field are isomorphic.

3.4.77. Theorem. Every n-dimensional vector space X over a field F is


isomorphic to F".
Proof eL t e{ l, ... ,e,,} be a basis for .X Then every x E X has the unique
representation
x = ele l + ... + e"e",
where {el, e1., ... ,~,,} is a unique set of scalars (belonging to F ) . Now let
us define a linear transformation T from X into P by
Tx = (~1> ~1., •• ,e,,)·
It is an easy matter to verify that T is a linear transformation of X onto P,
and that it is one-to-one (the reader is invited to do so). Thus, X is isomorphic
to P . •

It is not difficult to establish the next result.

3.4.78. Theorem. Two finite-dimensional vector spaces X and Yover the


same field F are isomorphic if and only if dim X = dim .Y

3.4.79. Exercise. Prove Theorem 3.4.78.

Theorem 3.4.77 points out the importance ofthe spaces R" and C". Namely,
every n-dimensional vector space over the field of real numbers is isomorphic
to R" and every n-dimensional vector space over the field of complex numbers
is isomorphic to eft (see Example 3.I.lO).
3.5. IL NEAR N
UF CTIONALS

There is a special type of linear transformation which is so important


that we give it a special name: linear functional.
We showed in Example 3.1.7 that if F is a field, then "F is a vector space
over .F If, in particular, n = I, then we may view F as being a vector space
over itself. This enables us to consider linear transformations of a vector
space X over F into .F

3.5.1. Definition. Let X be a vector space over a field .F A mapping f


of X into F is called a functional on .X If1 is a linear transformation of X
into ,F then we call 1 a linear functional on X .

. We cite some specific examples of linear functionals.

3.5.2. Example. Consider the space era, b]. Then the mapping

II(x ) = s: ex s) ds, x E era, b] (3.5.3)

is a linear functional on era, b]. Also, the function defined by


Il(X) = (x so), X E era, b], So E a[ , b] (3.5.4)
is also a linear functional on era, b]. Furthermore, the mapping

f,ex) = r (x s)xo(s) ds, (3.5.5)

where X o is a fixed element of era, b] and where x is any element in era, b],
is also a linear functional on era, b]. •

3.5.6. Example. eL t X = P, and denote x E X by x = (e I' •.• , e.).


The mappingf, defined by
f,(x ) = el (3.5.7)
is a linear functional on .X A more general form of I, is as follows. eL t
a = (~I' ... , ~.) E X be fixed and let x = (el' ... ,e.) be an arbitrary
element of .X It is readily shown that the function

Is(x ) = :E ,~ e, (3.5.8)
I- I

is a linear functional on .X •

3.5.9. Exercise. Show that the mappings (3.5.3), (3.5.4), (3.5.5), (3.5.7),
and (3.5.8) are linear functionals.

Now let X be a linear space and let X ' denote the set of all linear func-

109
110 Chapter 3 I Vector Spaces and iL near Transformations

tionals on .X Iff E X ' is evaluated at a point x E ,X we write f(x ) . Fre-


quently we will also find the notation
f(x ) A (x , J ) (3.5.10)
useful. In addition to Eq. (3.5.10), the notation x'(x) or x ' x is sometimes
used. In this case Eq. (3.5.10) becomes
f(x ) = (x , J ) = (x , x ' ) , (3.5.11)
where x ' is used in place of f Now letfl = t' x ,J1. = ~ belong to IX , and let
« E .F Let us define fl + f1. = t'x + ~ and « f = « x ' by
(fl + f1.)(x) = (x , t'x + ~) A (x , ;X ) + (x,~)
= fl(x ) + f1.(x), (3.5.12)
and
(<f< )(x) = (x , « x ' ) A «(x, x') = « f (x ) , (3.5.13)
respectively. We denote the functional f = x ' such that f(x ) = x'(x) = 0
for all x E X by O. Iffis a linear functional then we note that
f(x l + 1X .) = (XI + 1X .' x ' )
= IX < ' x') + 1X< .' x') = f(x l ) + f(x1.), (3.5.14)
and also,
f(<)x< = ,x< x') = ,x < « x') = « f (x ) . (3.5.15)
It is now a simple matter to prove the following:

3.5.16. Theorem. The space X ' with vector addition and multiplication
of vectors by scalars defined by equations (3.5.12) and (3.5.13), respectively,
is a vector space over .F

3.5.17. Exercise. Prove Theorem 3.5.16.

3.5.18. Definition. The linear space X ' is called the algebraic conjugate
of .X

Let us now examine some of the propeties of X ' for the case of finite-
dimensional linear spaces. We have:

3.5.19. T ' heorem. Let X be a finite-dimensional vector space, and let


e{ l , • • ,e,,} be a basis for .X IfI« { t ... ,« . } is an arbitrary set of scalars,
then there is a unique linear functional x ' E X ' such that (e" x ' ) = ,« for
i = 1, ... , n.
Proof F o r every X E ,X we have
X = ~1l'1 + ~1.e1. + ... + ~"e .•
3.5. iL near uF nctionals 111

Now let x ' E X' be given by


I'
(x , x ' ) = ~ rt~"

J'
1= 1

Ifx = e, for some i, we have = I and = 0 if i *- j. Thus, (e x ' ) = rt,


for i = I, ... , n. To show that x ' is unique, suppose there is an" ~ E X '
such that (e,,~) = rt, for i = I, ... ,n. It then follows that (e,,~) - (e
x ' ) = 0 for i = I, ... ,n, and so (e ~ - x ) = 0 for i = I, ... ,n. This "
implies ~ - x = 0; i.e., ~ = x . • "

In our next result and on several other occasions throughout this book,
we make use of the Kronecker delta.

3.5.20. DefIDitioD. Let


~ _ I{ if i = j
(3.5.21)
/J - 0 ifi*- j
for i,j = I, ... ,n. Then ~'J is called the Kronecker delta.

We now have:

3.5.22. Theorem. Let X be a finite-dimensional vector space. If e{ l' e~,


... , e.} is a basis for ,X then there is a unique basis {e~, e;, ... , e:.} in X '
with the property that (e e~) = ~/J' F r om this it follows that if X is n-
dimensional, then so is XI. "
Proof F r om Theorem 3.5.19 it follows that for eachj = I, ... , n, a unique
~ E X' can be found such that .< e e~) = ~/J' Thus, we only have to show
"
that the set e{ ;, e;, ... , e:.} is a linearly independent set which spans X I .
To show that e{ ;, e;, ... , e~} is linearly independent, let
PIe; + pze; + ... + P.e:. = o.

(eJ , ~ . Pie;)= .~ .
Then
o= 1= 1 1= 1
p,(eJ , e;) = ~
1= 1
P'~/J = PJ'
and therefore we have PI = P~ = ... = P. = O. This proves that {e~, e;,
... , e~} is a linearly independent set.
To show that the set Ie;, e;, ... , e~} spans X ' , let ' x E X ' and define

rt , = (e x'). Let x = ~ 'lei' We then have
" 1= 1

(x , x ' ) = (' l eI + ... + ' . e., x ' ) = l'< eI> x') + + (' . e., x ' )
= ' I (e l , x ' ) + ... + ' . (e., x ' ) = ' I rt l + + ' . rt .•
Also,
(x , e~) = t:t

~ "<e ,, ~) = 'J
112 Chapter 3 I Vector Spaces and iL near Transformations

Combining the above relations we now have


,x < x') = I« ,X< e;) + + «.(X, e.)
= ,x< I« e; + + .« e.).
F r om this it now follows that for any x ' E X' we have
x' = I« e; + ... + .« e.,
which proves our theorem. _

The previous result motivates the following definition.

3.5.23. DefinitiOD. The basis (e;, e;, ... , e.} of X ' in Theorem 3.5.22 is
called the dual basis of (e" e2 , • • , e.}.

We are now in a position to consider the algebraic transpose of a linear


transformation. L e t S be a linear transformation of a linear space X into a
linear space Y a nd let X ' and yl denote the algebraic conjugates of X and ,Y
respectively (the spaces X and Y need not be finite dimensional). F o r each
y' E yl let us establish a correspondence with an element x ' E X ' according
to the rule
x ' ( x ) = ,x < x ' ) = <Sx, y' ) = y' ( Sx ) , (3.5.24)
where x E .X L e t us denote the mapping defined in this way by ST: STy' = x'
and let us rewrite Eq. (3.5.24) as
,x < STy' ) = S< ,x y' ) , x E ,X y' E yl, (3.5.25)
to define ST. It should be noted that if S is a mapping of X into ,Y then ST
is a mapping of yl into X I , as depicted in F i gure K. We now state the
following formal definition.

s
y
x

lX

3.5.26. iF gure .K Transpose of a linear transformation.

3.5.27. DefinitioD. L e t S be a linear transformation of a linear space X


into a linear space Y over the same field F and let X ' and yl denote the
3.6. Bilinear uF nctiona[s 113

algebraic conjugates of X and ,Y respectively. A transformation ST from yl


into X ' such that
,x< STy') = S
< ,x y' )
for all x E X and all y' E yl is called the (algebraic) traaspose of S.

We now show that ST is a linear transformation.

3.5.28. Theorem. Let S E L ( X , y), and let ST be the transpose of S. Then


ST is a linear transformation from yl into X ' .
Proof L e t« E ,F and let y;, y~ E yl. Then for all x E ,X
,x < ST(Y; + h» = S
< ,x (y; + h» = S
< ,x y;) + S
< ,x h)
= ,x < STY;) + ,x< ST h ).
Thus, ST(y; + y~) = ST(y;) + ST(y~). Also,
,x < ST(<y< ;» = S
< ,x «Y;) = S
< « ,x Y;)
= ,x < « sry;) = ,x< s« ry~).
eH nce, ST(<y< ;) = S
« T(Y;). Therefore, ST E L ( yl, X ' ) . •

The reader should have now no difficulties in proving the following results.

3.5.29. Theorem. Let R, S E L ( X , )Y , and let T E L ( y, Z). Let RT, ST,


and TT be the transpose transformations of R, S, and T, respectively. Then,
(i) (R + Sf = RT + ST; and
(ii) (TSf = STTT.

3.5.30. Theorem. Let I denote the identity element of L ( X , X). Then TJ


is the identity element of L ( X ' , X I ).

3.5.31. Theorem. Let 0 be the null transformation in L ( X , )Y . Then OT


is the null transformation in L ( yl, X I ).

3.5.32. Exercise. Prove Theorems 3.5.29-3.5.31.

We will consider an important class of transpose linear transformations


in Chapter 4 (transpose of a matrix).

3.6. BILINEAR N
UF CTIONALS

In the present section we introduce the notion of bilinear functional and


examine some of the properties of this concept. Throughout the present
section we concern ourselves only with real vector spaces or complex vector
114 Chapter 3 I Vector Spaces and Linear Transformations

spaces. Thus, if X is a linear space over a field ,F it will be assumed that F


is either the field of real numbers, R, or the field of complex numbers, C.

3.6.1. DefiDitiOD. L e t X be a vector space over C. A mapping g from X


into C is said to be a coojugate f.ClioDal if
g(czx + py) = lig(x ) + pg(y) (3.6.2)
for all ,x y E X and for all cz, p E C, where d denotes the complex conjugate
p
of cz and denotes the complex conjugate of p.

Ifin Definition 3.6.1 the complex vector space is replaced by a real linear
space, then the concept of conjugate functional reduces to that of linear
functional, for in this case Eq . (3.6.2) assumes the form
g(czx + py) = czg(x) + pg(y) (3.6.3)
for all ,x y E X and for all cz, pER.

3.6.4. DeftDitioD. L e t X be a vector space over C. A mapping g of X x X


into C is called a bilioear f.ClioDal or a bilioear form if
(i) for each fixed y, g(x , y) is a linear functional in x ; and
(ii) for each fixed x, g(x , y) is a conjugate functional in y.
Thus, if g is a bilinear functional, then
(a) g(czx + py, )z = czg(x, )z + pg(y, z ) ; and
(b) g(x , czy + pz ) = iig(x , y) + pg(x , z )
for all ,x y, z E X and for all cz, pE C.

F o r the case of real linear spaces the definition of bilinear functional


is modified in an obvious way by deleting in Definition 3.6.4 the symbol for
complex conjugates.
We leave it as an exercise to verify that the examples cited below are
bilinear functionals.

3.6.5. Example. L e t ,x y E Cl, where Cl denotes the linear space of


ordered pairs of complex numbers (if ,x y E Cl, then x = (~I> 1~ ) and
y = (111,111». The function
g(x , y) = ~I;;I + ~7.; 7.
is a bilinear functional. _

3.6.6. Example. L e t ,x y E Rl, where R7. denotes the linear space of


ordered pairs of real numbers (if ,x y E R7., then x = (~I> ~7.) and y =
(111) 111»' L e t (J denote the angle hetween ,x y E l} 7.. The dot product of two
3.6. Bilinear uF nctionals 115

vectors, defined by
I'll +
g(x , y) = ~ ~z'lz = (~t + ~DI/2('It + I' DI/2 cos (J
is a bilinear functional. _

3.6.7. Example. Let X be an arbitrary linear space over C, and let L ( x )


and P(y) denote two linear functionals on .X The transformation
g(x , y) = L ( x ) P(y)
is a bilinear functional. _

3.6.8. Example. eL t X be any linear space over C, and let g be a bilinear


functional. The transformation h defined by
h(x, y) = g(x , y)
is a bilinear functional. _

3.6.9. Exercise. Verify that the transformations given in Examples 3.6.5


through 3.6.8 are bilinear functionals.

We note that for any bilinear functional, g, we have g(O, y) = g(O • 0, y)


= 0 • g(O,y) = 0 for all y E .X Also, g(x , 0) = 0 for all x E .X
Frequently, we find it convenient to impose certain restrictions on
bilinear functionals.

3.6.10. Definition. Let X be a complex linear space. A bilinear functional


g is said to be symmetric if g(x , y) = g(y, x ) for all x, y E .X Ifg(x , x) 2 0
for all x E ,X then g is said to be positive. Ifg(x , x ) > 0 for all x 0, then *"
g is said to be strictly positive.

3.6.11. Definition. eL t X be a complex vector space, and let g be a bilinear


functional. We call the function g: X ......... C defined by
g(x ) = g(x , x )
for all x E X, the quadratic fonn induced by g (we frequently omit the phrase
"induced by g").

F o r example, if g(x , y) = ~ 1;;1 + ~2;;2' as in Example 3.6.5, then g(x )


= ~I~l + ~2~Z = I~dz + l~zI2. This is a quadratic form as studied in
analytic geometry.
F o r real linear spaces, Definitions 3.6.10 and 3.6.11 are again modified in
an obvious way by ignoring complex conjugates.

3.6.12. Theorem. If g is the quadratic form induced by a bilinear functional


116 Chapter 3 I Vectof Spaces and iL near Transformations

g, then

~ [ g (x , y) + g(y,x » ) = t(X t Y) - te 2 Y ) .

Proof By direct expansion we have,


y)
t ( -x +2 - Y ) = g (x- 2
+y - ' - 2 - x+ = 4 I g(x + y,x + y)

= I
4 [ g (x , x + y) + g(y, x + y»)

=
I
4 [ g (x , x) + g(x, y) + g(y, x ) + g(y, y»),
and also,

t e 2 Y) = ~ g[ (x, x) - g(x, y) - g(y, x ) + g(y, y»).


Thus,

- } [ g (x , y) + g(y, )» x = te t Y) - t e 2 y). -

Our next result is commonly referred to as polarization.

3.6.13. 1beorem. Ift is the quadratic form induced by a bilinear form g


on a complex vector space ,X then
g(x , y) = t[ ! ( x + y») - t[ ! ( x - y») + it[ ! ( x + iy»)
- it[ ! ( x - iy)] (3.6.14)
for every ,x y E X (here i = ~ - I ).
Proof F r om the proof of the last theorem we have

te t Y) = ~ g[ (x, x) + g(x, y) + g(y, x ) + g(y, y»)


and

t e 2 Y) = ~ g[ (x, x) - g(x , y) - g(y, x ) + g(y, y»).


Also,
it(X i iY ) = ~ g[ (x, x ) ._ ig(x, y) + ig(y, x ) + g(y, y»)

and
it(X - ; iY ) = ~ g[ (x, x) + ig(x, y) - ig(y, x ) + g(y, y»).

After combining the above four expressions, Eq. (3.6.14) results. _

The reader can prove the next result readily.


3.6. Bilinear uF nctiona/s 117

3.6.15. Theorem. Let X be a complex vector space. If two bilinear func-


tionals g and h are such that g = h, then g = h.

3.6.16. Exercise. Prove Theorem 3.6.15.

F o r symmetric bilinear functionals we have:

3.6.17. Theorem. A bilinear functional g on a complex vector space X is


symmetric if and only if g is real (i.e., g(x) is real for all x E )X .
Proof Suppose that g is symmetric; i.e., suppose that
g(x, y) = g(y, x)
for all ,x y E .X Setting x = y, we obtain
g(x) = g(x, x) = g(x, x) = g(x)
for all x E .X But this implies that g is real.
Conversely, if g(x) is real for all x E ,X then for h(x, y) = g(y, x) we
have h(x) = g(x, x ) = g(x, x ) = g(x). Since h = g, it now follows from
Theorem 3.6.15 that h = g, and thus
g(x, y) = g(y, )x . •

Note that Theorems 3.6.13, 3.6.15, and 3.6.17 hold only for complex
vector spaces. Theorem 3.6.15 implies that a bilinear form is uniquely
determined by its induced quadratic form, and Theorem 3.6.13 gives an
explicit connection between g and g. In the case of real spaces, these conclu-
sions do not follow.

3.6.18. Example. Let X = R2 with x = (el' e2) E R2 and y = (171,172)


E R2. Define the bilinear functionals g and h by

g(x, y) = el171 + 2'217. + 1{4 172 + 2' 172


and
h(x, y) = ' I ' l l + 3e2'11 + 3,1'12 + 2' 1' 2'
Then g(x) = h(x), but g ;#: h. Note that h is symmetric whereas g is not. •

Using bilinear functionals, we now introduce the very important concept


of inner product.

3.6.19. Definition. A strictly positive, symmetric bilinear functional g on


a complex linear space X is called an inner product.

F o r the case of real linear spaces, the definition of inner product is identical
to the above definition.
Since in a given discussion the particular bilinear functional g is always
118 Chapter 3 I Vector Spaces and iL near Transformations

specified, we will write (x , y) in place of g(x, y) to denote an inner product.


Utilizing this notation, the inner product can alternatively be defined as a
rule which assigns a scalar (x, y) to every x , y E X (X is a complex vector
space), having the following properties:
(i) (x , x ) 0 for all x 1= = 0 and (x , x ) = 0 if x = 0;
>
(ii) (x , y) = (Y, x ) for all x , y E X ;
(iii) (Ctx + Py, )z = cz(,x )z + P(Y, )z for all x , y, Z E X and for all
Ct, P E C; and
(iv) (x , Cty + pz) = « ( x , y) + p(x , )z for all x , y, Z E X and for all
Ct, p E C.

In the case of real linear spaces, the preceding characterization of inner


product is identical, except, of course, that we omit conjugates in (i}(- iv).
We are now in a position to introduce the concept of inner product space.

3.6.20. DefiDition. A complex (real) linear space X on which a complex


(real) inner product, (" ' ) , is defined is called a complex (real) inner product
space. In general, we denote this space by { X ; (0, • )}. If the particular inner
product is understood, we simply write X to denote such a space (and we
usually speak of an inner product space rather than a complex or real inner
product space).

It should be noted that if two different inner products are defined on the
same linear space ,X say (' , )' 1 and (' , • )2' then we have two different inner
product spaces, namely, { X ; (' , .).} and { X ; (0, ')2}'
Now let { X ; (0, .)' } be an inner product space, let Y be a linear subspace
of ,X and let (' , .)" denote the inner product on Y induced by the inner
product on X ; i.e.,
(x, y)' = (x, y)" (3.6.21)
for all ,x y EY e .X Then { Y ; (' , ' )"} is an inner product space in its own
right, and we say that Y is an inner product subspace of X.
Using the concept of inner product, we are in a position to introduce the
notion of orthogonality. We have:

3.6.22. Definition. eL t X be an inner product space. The vectors ,x y E X


are said to be orthogonal if (x, y) = O. In this case we write x - l y. If a vector
x E X is orthogonal to every vector of a set Y c X, then x is said to be
orthogonal to set ,Y and we write x - l .Y If every vector of set Y c X is
orthogonal to every vector of set Z c X, then set Y is said to be orthogonal
to set Z, and we write Y ...L Z.

Clearly, if x is orthogonal to y, then y is orthogonal to .x Note that if


0, then it is not possible that x - l x , because (x, x ) > 0 for all x 1= = O.
x 1= =
Also note that 0 - l x for all x E X.
3.7. Projections 119

Before closing the present section, let us consider a few specific examples.

3.6.23. Example. Let X = R"o F o r x = (~I' 00" ~") E R" and y = (' I I'
o ,' I .) E R· , we can readily verify that
••


(x, y) = ~ ~,'Il
I~
is an inner product, and { X ; ( ., .)} is a real inner product space. _

3.6.24. Example. Let X = C", F o r x = (~I' .. " ~.) E C" and y = ('II>
... ,' I .) E C· , let

(x, y) = :E ,~ ; "
1- 1

Then (x, y) is an inner product and ;X{ (., .)} is a complex inner product
space. _

3.6.25. Example. Let X denote the space of continuous complex valued


functions on the interval 0[ , 1). The reader can readily show that for f, g E ,X

(f,g) f'= f(t)g(t)dt

is an inner product. Now consider the family of functions {f.} defined by


f.(t) = e1_ 1
, t E 0[ , 1],
n= 0, ± l , ± 2 , .... Clearly, f. E X for all n. It is easily shown that (frn,
f.) = 0 if m *' n. Thus, f .. ..L f .. if m n. •*'

3.7. PROJECTIONS

In the present section we consider another special class of linear transfor-


mations, called prOjectiODS. Such transformations which utilize direct sums
(introduced in Section 3.2) as their natural setting will find wide applications
in later parts of this book.

3.7.1. Definition. Let X be the direct sum of linear spaces X I and X 1 ;


i.e., let X = X I ® X 1 • eL t x = X I + 2X , be the unique representation of
x E X , where X I E X I and 2X , E X 1 • We say that the projection on X I along
2X , is the transformation defined by
P(x ) = XI'

Referring to Figure ,L we note that elements in the plane X can uniquely


be represented as x = X I + 2X " where X I E X I and X 2 E X 2 (X I and X 1
are one-dimensional linear spaces represented by the indicated lines inter-
secting at the origin 0). In this case, a projection P can be defined as that
120 Chapter 3 I Vector Spaces and iL near Transformations

x = Xl + 2X

3.7.1. Figure L . Projection on IX along 1'X ..

transformation which maps every point x in the plane X onto the subspace
XI along the subspace 1'X .'

3.7.3. Theorem. eL t X be the direct sum of two linear subspaces X I and


1'X ., and let P be the projection on X I along 1'X .' Then
(i) P E L(X, X);
(ii) R
< (P) = X I ; and
(iii) ~(P) = X 2•
Proof To prove the first part, note that if x = X I + 1'X . and Y = YI + 1'Y .'
where x " Y I E X I and 1'X .' 1'Y . E X 2 , then clearly
P(f1.X + Py) = P(f1.X I + f1.X1' . + PYI + PY1' .) = f1.X I + PYI
= f1.P(x l) + PP(YI) = f1.P(x I + 1'X .) + PP(YI + 1'Y .)
= f1.P(x) + pP(y),
and therefore P is a linear transformation.
To prove the second part of the theorem. we note that from the definition
of P it follows that R < (P) C X I ' Now assume that IX E X I ' Then PX I = IX >
and thus x I E R < (P). This implies that XI C R< (P) and proves that R< (P) = X I '
To prove the last part of the theorem, let 1'X . E X 2 • Then PX1' . = 0 so that
1'X . C ~(P). On the other hand, if x E ~(P), then Px = O. Since x =
X I + 1' X .' where X I E XI and 1'X . E 1'X .' it follows that X I = 0 and X E 1'X .'
Thus, 1'X . ::J ~(P). Therefore, 1'X . = ~(P). •

Our next result enables us to characterize projections in an alternative


way.

3.7.4. Theorem. eL t P E L ( X , X). Then P is a projection on R


< (P) along
~(P) if and only if PP = p'1.= P.
3.7. Projections 111

Proof Assume that P is the projection on the linear subspace X l of X


along the linear subspace :X h where X = X I EB X I ' By the preceding theorem,
Xl = R< (P) and X I = m(p). F o r x E ,X we have x = lX XI' where +
X I E X I and IX E XI' Then
p'1. x = P(Px) = PX I = XI = Px,
and thus p'1. = P.
let us assume that p2 = P. Let 1'X . = m(p) and let X I = R
C~n> versely, < (P).
Clearly, m(p) and R < (P) are linear subspaces of .X We must show that
X = R< (P) EB m(P) = X I EB X I ' In particular, we must show that R < (P)
n m(p) = O{ J and that R< (P) and m(p) span .X
Now if y E R < (P) there exists an x E X such that Px = y. Thus, p'1. x = Py
= Px = y. If y E m(p) then Py = O. Thus, if y is in both m(p) and m(p),
then we must have y = 0; i.e., R < (P) n m(p) = O { .J
Next, let x be an arbitrary element in .X Then we have
x = Px + (I - P)x.
Letting Px = lX and (I - P)x = IX ' we have PX I = pIX = Px = X I and
also PX I = P(I - P)x = Px - p'1. x = Px - Px = 0; i.e., X I E X I and
IX E X I ' F r om this it follows that X = X I EB X I and that the projection on
X I along X I is P. •

The preceding result gives rise to the following:

3.7.5. Definition. Let P E L(X, X). Then P is said to be idempotent if


pI = P .

Now let P be the projection on a linear subspace X l along a linear subspace


XI' Then the projection on X I along X I is characterized in the following way.

3.7.6. Theorem. A linear transformation P is a projection on a linear


subspace if and only if (I - P) is a projection. If P is the projection on X I
along 1'X .' then (I - P) is the projection on 1'X . along X l '

3.7.7. Exercise. Prove Theorem 3.7.6.

In view of the preceding results there is no ambiguity in simply saying a


transformation P is a projection (rather than P is a projection on X I along
1'X .)'
We emphasize here that if P is a projection, then
X = R
< (P) EB m(p). (3.7.8)
This is not necessarily the case for arbitrary linear transformations T E L ( X ,

*'
X ) for, in general, R< (T) and meT) need not be disjoint. F o r example, if
there exists a vector X E X such that Tx 0 and such that T2 x = 0,
. then Tx E R < (T) and Tx E meT).
121 Chapter 3 I Vector Spaces and iL near Transformations

eL t us now consider:

3.7.9. Definition. eL t T E U.X, X). A linear subspace Y of a vector space


X is said to be invariant under the linear transformation T if y E Y implies
that Ty E .Y

Note that this definition does not imply that every element in Y can be
written in the form z = Ty, with y E .Y It is not even assumed that Ty E Y
implies y E .Y
F o r invariant subspaces under a transformation T E U.X, X ) we can
readily prove the following result.

3.7.10. Theorem. eL t T E U.X, X). Then


(i) X is an invariant subspace under T;
(ii) O
{ J is an invariant subspace under T;
(iii) R< (T) is an invariant subspace under T; and
(iv) (~ T) is an invariant subspace under T.

3.7.11. Exercise. Prove Theorem 3.7.10.

Next we consider:

3.7.12. Definition. eL t X be a linear space which is the direct sum of two


linear subspaces Y and Z; i.e., X = Y EEl Z. If Y a nd Z are both invariant
under a linear transformation T, then T is said to be reduced by Y a nd Z.

We are now in a position to prove the following result.

3.7.13. Theorem. Let Y and Z be two linear subspaces of a vector space


X such that X = Y EEl Z. Let T E L ( X , X). Then T is reduced by Y and Z
if and only if PT = TP, where P is the projection on Y along Z.
Proof Assume that PT = TP. If y E ,Y then Ty = TPy = PTy so that
Ty E Y and Y is invariant under T. Now let y E Z. Then Py = 0 and PTy
= TPy = TO = O. Thus, Ty E Z and Z is also invariant under T. eH nce, T
is reduced by Y and Z.
Conversely, let us assume that T is reduced by Y and Z. If x E ,X then
x = y + ,z where y E Y and Z E Z. Then Px = yand TPx = Ty E .Y
eH nce, PTPx = Ty = TPx ; i.e.,
PTPx = TPx (3.7.14)
for all x E .X On the other hand, since Y a nd Z are invariant under T, we
have Tx = Ty + Tz with Ty E Y and Tz E Z. eH nce, PTx = Ty = PTy
= PTPx ; i.e.,
PTPx = PTx (3.7.15)
3.8. Notes and References 123

for all x E .X Equations (3.7.14) and (3.7.15) imply that PT = TP. •

We close the present section by considering the following special type of


projection.

3.7.16. Definition. A projection P on an inner product space X is said


to be an orthogonal projection if the range of P and the null space of Pare
< (P) l.. &(P).
orthogonal; i.e., if R

We will consider examples and additional properties of projections in


much greater detail in Chapters 4 and 7.

3.8. NOTES AND REFERENCES

The material of the present chapter as well as that of the next chapter is
usually referred to as linear algebra. Thus, these two chapters should be
viewed as one package. F o r this reason, applications (dealing with ordinary
differential equations) are presented at the end of the next chapter.
There are many textbooks and reference works dealing with vector spaces
and linear transformations. Some of these which we have found to be very
useful are cited in the references for this chapter. The reader should consult
these for further study.

REFERENCES
3[ .1] P. R. A H M
L OS, iF nite Dimensional Vector Spaces. Princeton, N.J . : D. Van
Nostrand Company, Inc., 1958.
3[ .2] K. O H M
F AN and R. N U K ZE, Linear Algebra. Englewood Cliffs, N.J . : Prentice-
H a ll, Inc., 1971.
3[ .3] A. W. NAYO L R and G. R. SEL,L Linear Operator Theory in Engineering and
Science. New Y o rk: H o lt, Rinehart and Winston, 1971.
3[ .4] A. E. TAYO L R, Introduction to u F nctional Analysis. New Y o rk: J o hn Wiley &
Sons, Inc., 1966.
4

IF NITE-DIMENSIONAL
VECTOR SPACES AND
MATRICES

In the present chapter we examine some of the properties offinite-dimensional


linear spaces. We will show how elements of such spaces are represented by
coordinate vectors and how linear transformations on such spaces are repre-
sented by means of matrices. We then will study some of the important prop-
erties of matrices. Also, we will investigate in some detail a special type of
vector space, called the Euclidean space. This space is one of the most impor-
tant spaces encountered in applied mathematics.
Throughout this chapter { « " ... , .« ,} /« E ,F and { x " ... ,x . } , /x E ,X
denote an indexed set of scalars and an indexed set of vectors, respectively.

.4 1. COORDINATE REPRESENTATION
OF VECTORS

Let X be a finite-dimensional linear space over a field ,F and let x { I' • • ,


x.} be a basis for .X Now if x E ,X then according to Theorem 3.3.25 and
Definition 3.3.36, there exist unique scalars ~ I' . . . ,~., called coordinates
of x with respect to this basis such that
x = ~IXI + ... + ~.x.. (4.1.1)

124
.4 1. Coordinate Representation of Vectors U5

This enables us to represent x unambiguously in terms of its coordinates


as

(4.1.2)

or as
(4.1.3)
We call x (or x T) the coordinate representation of the underlying object (vector)
x with respect to the basis { x " ... ,x,,}. We call x a column vector and x T a
row vector. Also, we say that x T is the transpose vector, or simply the transpose
of the vector x. F u rthermore, we define (x T f to be x.

I'
It is important to note that in the coordinate representation (4.1.2) or
(4.1.3) of the vector (4.1.1), an "ordering" of the basis IX{ ' ... ,x,,} is em-
ployed (i.e., the coefficient of X, is the ith entry in Eqs. (4.1.2) and (4.1.3».
If the members of this basis were to be relabeled, thus specifying a different
"ordering," then the corresponding coordinate representation of the vector
X would have to be altered, to reflect this change. However, this does not pose
any difficulties, because in a given discussion we will always agree on a par-
ticular "ordering" of the basis vectors.
Now let« E .F Then
x« = «('IX I+ ... + ,,,x.) = (<I' < )X I + ... + (<e< ")x,,. (4.1.4)
In view of Eqs. (4. t.I)-.4{- 1.4) it now follows that the coordinate representa-
tion of x « with respect to the basis {XI' ... ,x . } is given by

I'
z{
« e l-
e« z
«x= « (4.1.5)

I'_ t e«_ .
or
«x T= «({I' 'z,' .. ",,) = (<I'< ' «ez,' .. ,«e,,). (4.1.6)
Next, let y E ,X where

y = "IX I + "z X z + ... + ""X". (4.1.7)


The coordinate representation of y with respect to the basis IX { " .. ,x,,}
is, of course,
126 ChtJpter 4 I iF nite-Dimensional Vector Spaces and Matrices

=Y (4.1.8)

or
(4.1.9)
Now
x + y = (~.x. + ... + ~RX.) + ('1IXI + ... + 1' RX.)
= (~I + 1' .)x l + ... + (~. + 1' .)x.. (4.1.10)
F r om Eq. (4.1.10) it now follows that the coordinate representation of the
vector x + Y E X with respect to the basis { X I "" ,x . l is given by

x+y=
[ ~.] . + .[~I]
~.
. . =
1' .
[~I . ~
R~
.
+ 1' R
1' 1] (4.1.11)

or
x T+ yT = (~ ..... , ~.) + ('11" .. ,'1R)
= (~. + 1' ..... ,~. + 1' R)' (4.1.12)
Next, let lU { t • . , u.l and V { It .• • , vRl be two different bases for the linear
space .X Then clearly there exist two different but unique sets of scalars
(i.e., coordinates) t{ .¥ , .• • ,t¥.l and fP..... ,P.l such that
X = t¥lu. + ... + t¥.u. = P.v l + ... + P.v.. (4.I.I3)
This enables us to represent the same vector x E X with respect to two dif-
ferent bases in terms of two different but unique sets of coordinates, namely,

[] ~d []
(4.1.14)

The next two examples are intended to throw additional light on the above
discussion.

.4 1.15. Example. eL t X = (~I'" . ,~.) E R·. eL t "I = (1,0, ... ,0),


U2 = (0, 1,0, ... ,0), ... , U . = (0, ... ,0, I). It is readily shown that the
set u{ l , • . , u.l is a basis for RR. We call this basis the natural basis for RR.
Noting that
(4.1.16)
.4 1. Coordinate Representation of Vectors 117

the unambiguous coordinate representation of x E R" with respect to the


natural basis of R" is

(4.1.17)

or
x T= (~1' ... , ~.).
Moreover, the coordinate representations of the basis vectors U I> •• , tU I are
I 0 0
0 I
u. = , U z = 0 , ... , u =
"
, (4.1.18)
0
0 0 I
respectively. We call the coordinates in Eq. (4.1.17) the natural coordinates
of x E R". (The natural basis for F " and the natural coordinates of x E F "
are similarly defined.)
{ ., ... , v.J, given by v. = (1,0, ... ,0),
Next, consider the set of vectors v
Vz = (I, 1,0, ... ,0), ... ,v" = (I, ... , I). We see that the vectors { V I>
... , v.J form a basis for R". We can express the vector x given in Eq. (4.1.16)
in terms of this basis by
(4.1.19)
where ot" = ,,, and ott = " - ,'.+
for i = 1, 2, ... , n - 1. Thus, the coor-
dinate representation of x relative to {v., ... , v.) is given by
,. - z' -
z' - 3'
ot.
ot z

- (4.1.20)

ot,,_. ,,,-. "~-


_ ot" "~
Hence, we have represented the same vector x E R· by two different coordi-
nate vectors with respect to two different bases for R". •

.4 1.21. Example. Let X = e[a, b}, the set of all real-valued continuous
functions on the interval a[ , b]. Let Y = x{ o, x . , .• . , "x J c ,X where ox (t)
= 1 and x,(t) = I' for all I E a[ , b}, i = 1, ... ,n. As we saw in Exercise
3.3.13, Y is a linearly independent set in X and as such it is a basis for V( )Y .
128 Chapter 4 I iF nite-Dimensional Vector Spaces and Matrices

eH nce, for any y E V(Y) there exists a unique set of scalars ('Io, I' I> • . , 1' .1
such that
y = I' oXo + ... + I' . X .• (4.1.22)
Since y is a polynomial in t we can write, mote explicitly,
y(t) = 1' 0 + ' l It + ... + 'I.t·, t E a[ , b). (4.1.23)
In the present example there is also a coordinate representation; i.e., we can
represent y E V( )Y by

(4.1.24)

I' .
This representation is with respect to the basis (x o, IX > • , .x l in V(Y).
We could, of course, also have used another basis fo~ V(Y). F o r example,
let us choose the basis (zo, z I' • . , .z l for V( )Y given in Exercise 3.3.13. Then
we have
y = 1X0Z o + IX I Z I + ... + IX"Z", (4.1.25)
where IX. = I' " and IXt = I' t - I' t+I' i = 0, 1, ... ,n - 1. Thus, y E V(Y )
may also be represented with respect to the basis (ZO,ZI' " • ,z . } by
1X 0 1' 0 - 'II

IX I I' I - 1' 2
(4.1.26)

IX._ I 1' ,,-1 - 'I.


_ IX" 'I.
Thus, two different coordinate vectors were used above in representing the
same vector y E V( )Y with respect to two different bases for V( )Y . •

Summarizing, we observe:
1. Every vector X belonging to an n-dimensional linear space X over
a field F can be represented in terms of a coordinate vector x, or its
transpose x T , with respect to a given basis e{ I' • • , e.l c .X We note
that x T E P (the space P is defined in Example 3.1.7). By convention
we will henceforth also write x E P. To indicate the coordinate repre-
sentation of x E X by x E P, we write x ~ .x
2. In representing x by x, an "ordering" of the basis e{ l t • • , ell} c X
is implied.
.4 2. Matrices 129

3. sU age of different bases for X results in different coordinate repre-


sentations of x E .X

.4 2. MATRICES

In this section we will first concern ourselves with the representation


of linear transformations on finite-dimensional vector spaces. Such represen-
tations of linear transformations are called matrices. We will then examine
the properties of matrices in great detail. Throughout the present section X
will denote an n-dimensional vector space and Yan m-dimensional vector space
over the same field .F

A. Representation of Linear Transformations by Matrices

We first prove the following result.

.4 2.1. Theorem. Let e{ ., e2, ... ,e ..} be a basis for a linear space .X
(i) eL t A be a linear transformation from X into vector space Y and
set e; = Ae1 , e~ = Ae2 , • • is any vector in X and if ,I" = Ae... If x
(el> e2"' " e..) are the coordinates of x with respect to e{ ., e2, ... ,
e..}, then Ax = e1e; + e2e~ + ... + e.. e~.
(ii) L e t {e~, e;, ... , e~J be any set of vectors in .Y Then there exists a
unique linear transformation A from X into Y such that Ae l = e;,
Ae 2 = e~, .• . , Ae.. = 1".
Proof To prove (i) we note that
Ax = A(e1e l + e2e2 + + e"e,,) = elAe l + e2Ae2 + ... + e"Ae"
= el~ + e2e~ + + e"e~.
To prove (ii), we first observe that for eachx E X we have unique scalars
e., e2" .. , e.. such that
x = e.e l + e2e2 + ... + e..e".
Now define a mapping A from X into Y as
A(x) = ele; + e2e~ + ... + e.. l".
Clearly, A(e,) = e; for i = 1, ,n. We first must show that A is linear.
Given x = ele l + e2e2 + + e..e.. and y = ' l Ie. + I' 2e2 + ... + ' I ..e..,
we have
A(x + y) = A[(el + I' 1)e. + + (e .. + ' I ..)e ..l
= (el + I' 1)e'l + + (e .. + ' I ..)e' ...
130 Chapter 4 I iF nite-Dimensional Vector Spaces and Matrices

On the other hand.

and

Thus.
A(x) + A(y) = ell. + e~e~ + ... + e"e:. + 111e~ + 11~~ + ... + l1"e~
= (el + 111)e~ + (e~ + 11~)~ + ... + (e" + 11,,)e:.
= A(x + y).
In an identical way we establish that
lXA(x) = A(lX)X
for all x E X and all lX E .F It thus follows that A E L ( X . )Y .
To show that A is uniq u e. suppose there exists aBE L ( X . )Y such that
Be, = e; for i = I• . ..• n. It follows that (A - B)e, = 0 for all i = I• . ..• n.
and thus it follows from Exercise 3.4.6 4 that A = B. •

We point out that part (i) of Theorem .4 2.1 implies that a linear transfor-
mation is completely determined by knowing how it transforms the basis vectors
in its domain. and part (ii) of Theorem .4 2.1 states that this linear transfor-
mation is uniquely determined in this way. We will utilize these facts in the
following.
Now let X be an n-dimensional vector space. and let {el' ez • . ..• ell} be
a basis for .X L e t Y b e an m-dimensional vector space. and let {fIJ~ • ... J " ,}
be a basis for .Y L e t A E L ( X . )Y . and let e; = Ae, for i = I • . ..• n. Since
{[IJ~ • ... J " ,} is a basis for .Y there are uniq u e scalars a{ o.} i = I• . ..• m.
j = I • . ..• n. such that
Ael = I. = allfl + azt!~ + + a",t!",
Aez = ~ = aufl + aufz + + a",d", (4.2.2)

+ + ... +
I
Ae" = e:. = at..!1 az,,[z a",..!",.
Now let x E .X Then x has the uniq u e representation
x = elel + e~ez + .,. + e"e"
with respect to the basis e{ l• •• ell}' In view of part (i) of Theorem 4.2.1
we have
Ax = ele~ + ... + e"e~. (4.2.3)
Since Ax E .Y Ax has a uniq u e representation with respect to the basis
ffIJ~.· .. ,fIlII. say.
Ax = 11t!1 + l1dz + ... + 11",[",. (4.2.4)
.4 2. Matrices 131

Combining Equations (4.2.2) and (4.2.3), we have


Ax = el(aldl + ... + a",d",)
+ e,,(au/l + + a",,,/,,,)
+ .
+ e8(a l J I + + a",Jm)'
Rearranging the last expression we have
Ax = (allel +
al"e" + ... + a h e8)/1
+ (a"lel + aue" + ... + a"8e8)/"
+ (a"'lel + a",,,e,, + ... + a"'8en)/",·
However, in view of the uniqueness of the representation in Eq. (4.2.4) we
have
111 aue" +
= allel + + alnen,
11" = a"lel + aue" + + ah e8' (4.2.5)

11", = amlel + a",,,e,, + ... + a",ne8'


This set of equations enables us to represent the linear transformation A
from linear space X into linear space Y by the unique scalars lao}, i = I,
... , m,j = I, ... , n. F o r convenience we let

ail au a 18 ]
A -- [ a,}] - - a" I au ... ah . (4.2.6)

r a"'l
We see that once the bases {el, e", . .. ,e { / h/", ... ,I",} are fixed, we can
a",,, .. , a"'8
8 },

represent the linear transformation A by the array of scalars in Eq. (4.2.6)


which are uniquely determined by Eq. (4.2.2).
In view of part (ii) of Theorem .4 2.1, the converse to the preceding also
holds. Specifically, with the bases for X and Y still fixed, the array given in
Eq. (4.2.6) is uniquely associated with the linear transformation A of X into .Y
The above discussion justifies the following important definition.

.4 2.7. Definition. The array given in Eq. (4.2.6) is called the matrix A of
tbe linear transformation A from linear space X into linear space Y with respect
to the basis e{ 1> • • , en} of X and the basis { I I' ... ,fIll} of .Y

If, in Definition .4 2.7, X = ,Y and if for both X and Y the same basis
e{ l' ... , e is used, then we simply speak of the matrix A of the linear trans-
8 }

formation A with respect to the basis e{ l, ... ,e8 } .


In Eq. (4.2.6), the scalars (all, 0,,,, ... ,0'8) form the ith row of A and the
132 Chapter 4 I iF nite-Dimensional Vector Spaces and Matrices

scalars (all' 0 2/ , ... , 0"'/) form the jth column of A. The scalar a'l refers to
that element of matrix A which can be found in the ith row and jth column of
A. The array in Eq. (4.2.6) is said to be an (m X n) matrix. Ifm = n, we speak
of a square matrix (i.e., an (n X n) matrix).
In accordance with our discussion of Section .4 1, an (n X 1) matrix
is called a column vector, column matrix, or n-vector, and a (1 x n) matrix
is called a row vector.
We say that two (m X n) matrices A = [ 0 1/] and B = b[ l/] are equal if
and only if 01/ = bl/ for all i = I, ... , m and for allj = I, ... , n.
F r om the preceding discussion it should be clear that the same linear
transformation A from linear space X into linear space Y may be represented
by different matrices, depending on the particular choice of bases in X and .Y
Since it is always clear from context which particular bases are being used,
we usually don' t refer to them explicitly, thus avoiding cumbersome notation.
Now let AT denote the transpose of A E L ( X , Y) (refer to Definition
3.5.27). Our next result provides the matrix representation of AT.

.4 2.8. Theorem. Let A E L ( X , Y ) and let A denote the matrix of A with


respect to the bases e{ I' ... , e~} in X and { f l' ... ,I.} in .Y Let X I and
yl be the algebraic conjugates of X and Y, respectively. Let AT E L ( Y I , X I )
be the transpose of A. Let {f~, ... ,f~} and {e~, ... , e:.}, denote the dual
bases of { f l' ... , f",} and e{ u ... , e~}, respectively. If the matrix A is given by
Eq. (4.2.6), then the matrix of AT with respect to {f~, ... ,f~} of yl and
{e~, ... , e:.} of X ' is given by

all a21 0"'1]


(4.2.9)
AT = [ 01.2.. .~2.2 • """ • ~."'2 •

al~ a2~ ... a",~

Proof. Let B = b[ l' ] denote the (n x m) matrix of the linear transformation


AT with respect to the bases f{ ,~ ... ,f~} and {e~, ... , e:.J. We want to show
that B is the matrix in Eq. (4.2.9). By Eq. (4.2.2) we have

for i = I, ... ,n, and

for j = I, ... , m. By Theorem 3.5.22, e< e~> = 6,,, and <I",f~> = 6,,}.
Therefore, "
.4 2. Matrices 133

Also,
A< el,f>~ = e< l, AT/~> = (el, tl bkje~)
= k=L 1" bklel, e~> = bl}'
Therefore, b,j = ajl' which proves the theorem. _
The preceding result gives rise to the following concept.

.4 2.10. Definition. The matrix AT in Eq. (4.2.9) is called the transpose


of matrix A.

Our next result follows trivially from the discussion leading up to Defini-
tion .4 2.7.

.4 2.11. Theorem. Let A be a linear transformation of an n-dimensional


vector space X into an m-dimensional vector space ,Y and let y = Ax. Let
the coordinates of x with respect to the basis e{ l , el' ... , e,,} be (e \ J el' ... ,
e.), and let the coordinates of y with respect to the basis { f l,fl' ... ,f..} be
('I I ' 1' 1' ... , 'I.). eL t
all 011 ala
all au ala
(4.2.12)

be the matrix of A with respect to the bases reI' e1 , •• , e.} and { f l,fl, ... ,
I.}.Then
allel + a l1 el + + alae. = ' I I'
auel + a 21 el + + a 1"e. = 1' 1' (4.2.13)

or, equivalently,

I' I = L
jml
a,je j, i = I, ... , m. (4.2.14)

.4 2.15. Exercise. Prove Theorem .4 2.11.

Using matrix and vector notation, let us agree to express the system of
linear equations given by Eq. (4.2.13) equivalently as
134 Chapter 4 I iF nite-Dimensional Vector Spaces and Matrices

all
au
aU.
au
aa.
a~ h
I'
2'
"1
"2
(4.2.16)

or, more succinctly, as


a. 1 a.2 a• • ,- . ".
~ ~ y, (4.2.17)
T
where x ~ (' I ' 2
' " .. ".) and yT ~ ("1> "2' ... ,,,.).
In terms ofx T, yT, and AT, let us agree to express Eq. (4.2.13) equivalently
as
all a21 a. 1
au a.2
2' ' ,' ,.)
aU.

(' I t ~ ("It "2' .. " "",) (4.2.18)

a_ In ab a•
or, in short, as
T
x AT ~ yT. (4 . 2.19)
We note that in Eq. (4.2.17), x E P, Y E F"', and A is an m X n matrix.
F r om our discussion thus far it should be clear that we can utilize matrices
to study systems of linear eq u ations which are of the form of Eq. (4.2.13).
It should also be clear that an m x n matrix A is nothing more than a uniq u e
representation of a linear transformation A of an n-dimensional vector space
X into an m-dimensional vector space Y over the same field .F As such,
A possesses all the properties of such transformations. We could, in fact,
utilize matrices in place of general linear transformations to establish many
facts concerning linear transformations defined on finite-dimensional linear
spaces. However, since a given matrix is dependent upon the selection of two
particular sets of bases (not necessarily distinct), such practice will, in general,
be avoided whenever possible.
We emphasize that a matrix and a linear transformation are not one and
the same thing. In many texts no distinction in symbols is made between
linear transformations and their matrix representation. We will not follow
this custom.

B. Rank of a Matrix

We begin by proving the following result.

4.2.20. Theorem. L e t A be a linear transformation from X into .Y Then


A has rank r if and only if it is possible to choose a basis e{ l> e2 , • • , e.}
.4 2. Matrices 135

for X and a basis { I I' ... ,fIll} for Y such that the matrix A of A with respect
to these bases is of the form

r..
- 100 6 0 0 0-
010 o 0 0 0

A= 0 0 0 ... 1 0 0 ... 0 m= dim .Y (4.2.21)

000···000···0

000···000···0
....
n= dim X

Proof. We choose a basis for X of the form e{ l, e2.' ... ,e" e,+I' • . . , e.},
where e{ l+ ' > ... , e.} isa basisfodJt(A). Ifll = Ae l ,f2. = Ae2.' ... ,/, = Ae"
then {l1,f2.," .,/,} is a basis for R < (A), as we saw in the proof of Theorem
3.4.25. Now choose vectors 1,+1, ... ,fin in Y such that the set of vectors
l{ 1,f2., .. . ,f",} forms a basis for Y (see Theorem 3.3.4)4 . Then

II = Ae l = (1)/1 + (0)/2. + + (O)/, + (0)/'1+ + + (O)/In'


12. = Ae2 = (0)/1 + (1)12. + + (0)/, + (0)/'1+ + + (0)/""
..................................................................................................... ,
I, = Ae, = (0)/1 + (0)/2 + + (1)/, +
+ "' + (O)/In' (4.2.22)
(0)/'1+
o= Ae,+ I = (0)/1 + (0)/2. + + (0)/, + (O)/,+ 1 + ... + (O)/In'
...................................................................................................... ,
0= Ae" = (0)/1 + (0)/2. + ... + (0)/, + (0)/'1+ + ... + (O)/In'

The necessity is proven by applying Definition 4.2.7 (and also Eq. (4.2.2»
to the set of equations (4.2.22); the desired result given by Eq. (4.2.21)
follows.
Sufficiency follows from the fact that the basis for R
< (A) contains r linearly
independent vectors. _

A question of practical significance is the following: if A is the matrix


of a linear transformation A from linear space X into linear space Y with
respect to arbitrary bases e{ l , • • , e.} for X and { I I' ... , /In} for ,Y what is
the rank of A in terms of matrix A? Let R < (A) be the subspace of Y gener-
ated by Ae l , Ae2.' ... , Ae". Then, in view of Eq. (4.2.2), the coordinate repre-
sentation of Ae/> i = I, ... ,n, in Y with respect to { I I' ... ,fin} is given
by
136 Chapter 4 I iF nite-Dimensional Vector Spaces and Matrices

... , Ae,,~

F r om this it follows that R


< (A) consists of vectors y whose coordinate repre-
sentation is

y= + ... + " (4.2.23)

"_ ... a_ ... I a_ ... 2 a..."


where" I' • • , "" are scalars. Since every spanning or generating set of a linear
space contains a basis, we are able to select from among the vectors Ael •
Ae 2• ... ,Ae" a basis for R < (A). Suppose that the set A
{ e l , Ae2 • ...• Aek}
is this basis. Then the vectors Ae I. Ae 2• ..• , Ae k are linearly independent.
and the vectors Aek+I' ... , Ae" are linear combinations of the vectors Ae l •
Ae2 , • • • • Aek • F r om this there now follows:

.4 2.24. Theorem. Let A E L ( X . )Y , and let A be the matrix of A with


respect to the (arbitrary) basis eel' e2 • ... , e,,} for X and with respect to the
(arbitrary) basis { l 1.l2 • ... .I...} for .Y Let the coordinate representation of
y = Ax be Y = Ax. Then
(i) the rank of A is the number of vectors in the largest possible linearly
independent set of columns of A; and
(ii) the rank of A is the number of vectors in the smallest possible set of
columns of A which has the property that all columns not in it can
be expressed as linear combinations of the columns in it.

In view of this result we make the following definition.

.4 2.25. Definition. The rank of an m X n matrix A is the largest number


of linearly independent columns of A.

c. Properties of Matrices

Now let X be an n-dimensional linear space. let Y be an m-dimensional


linear space, let F b e the field for X and ,Y and let A and B be linear transfor-
mations of X into .Y eL t A = a[ o ] be the matrix of A. and let B = h[ o ]
be the matrix of B with respect to the bases felt e2 • • • , e,,} in X and { f t.f2.
.4 2. Matrices 137

... ,/",} in .Y Using Eq. (3.4.2


4 ) as well as Definition .4 2.7, the reader can
readily verify that the matrix of A + D, denoted by C A A + B, is given by
A + B = a[ lj] + b[ IJ] = a[ lJ + blj] = e[ IJ] = C. (4.2.26)
Using Eq. (3.4.34 ) and Definition .4 2.7, the reader can also easily show
that the matrix of A« , denoted by D A «A, is given by
«A = a[ « IJ] = a«[ lj] = d[ IJ] = D. (4.2.27)
F r om Eq. (4.2.26) we note that, in order to be able to add two matrices A
and B, they must have the same number of row.5 and columns. In this case
we say that A and B are comparable matrices. Also, from Eq. (4.2.27) it is
clear that if A is an m X n matrix, then so is A « .
Next, let Z be an r-dimensional vector space, let A E L ( X , )Y , and let
D E L ( ,Y Z). L e t A be the matrix of A with respect to the basis e{ I' e", ... ,
e in X and with respect to the basis { f l' ! ' " ... ,f",} in .Y Let B be the matrix
K }

of D with respect to the basis { f l ,f", ... ,!m} in Y a nd with respect to the basis
{ g l' g", ... , g,} in Z. The product mapping DA as defined by Eq. (3.4.50)
is a linear transformation of X into Z. We now ask: what is the matrix C
of DA with respect to the bases e{ l, e", ... , e of X and g{ I ' g", ... , g,} K }

of Z? By definition of matrices A and B (see Eq. (4.2.2», we have

and
,
B! J = 1 :bljg/t j= I , ... ,m.
1= 1

Now

, "'
= 1=1:1 J=I1: blj aJkgl'
for k = I, ... , n. Thus, the matrix C of BA with respect to basis e{ I' .•. , eK }

in X and { g " ... , g,} in Z is e[ IJ' ] where

(4.2.28)

for i = I, ... , r andj = I, ... , n. We write this as


C= B A. (4.2.29)
F r om the preceding discussion it is clear that two matrices A and B can
be multiplied to form the product BA if and only if the number of columns
ofB is equal to the number of rows of A. In this case we say that the matrices
B and A are conformal matrices.
138 Chapter 4 I iF nite-Dimensional Vector Spaces and Matrices

In arriving at Equations (4.2.28) and (4.2.29) we established the result


given below.

.4 2.30. Theorem. Let A be the matrix of A E L ( X , )Y with respect to the


basis leu ez , ... , e.} in X and basis { l u! z , ,fill} in .Y Let B be the matrix
of BEL ( ,Y Z) with respect to basis { I I' ,z ! ,fill} in Y and basis {g" g,z
... ,g,} in Z. Then BA is the matrix of BA.

We now summarize the above discussion in the following definition.

.4 2.31. Definition. Let A = a[ l' ] and B = b[ ll] be two m X n matrices,


let C = C[ II] be an n X r matrix, and let ~ E .F Then
(i) the som of A and B is the m x n matrix
D= A + B
where
dll = a'l + bl'
for all i = I, ... , m and for allj = 1, ... ,n;
(ii) the product of matrix A by scalar ~ is the m x n matrix
E=~A
where
ell = ~all

for all i = 1, ... ,m and for allj = I, ... ,n; and


(iii) the product of matrix A and matrix C is the m x r matrix
G= A C,
where

for each i = I, ... , m and for eachj = 1, ... , r.

The properties of general linear transformations established in Section


3.4 hold, of course, in the case of their matrix representation. We summarize
some of these in the remainder of the present section.

.4 2.32. Theorem.
(i) Let A and B be (m x n) matrices, and let C be an (n X r) matrix.
Then
(A B)C = AC + BC. + (4.2.33)
(ii) Let A be an (m X n) matrix, and let Band C be (n x r) matrices.
Then
A(B + C) = AD + AC. (4.2.34)
.4 2. Matrices 139

(iii) Let A be an (m X n) matrix, let B be an (n X r) matrix, and let


C be an (r X s) matrix. Then
A(BC) = (AB)C. (4.2.35)
(iv) Let t¥, pE ,F and let A be an (m X n) matrix. Then

(t¥ + P)A = t¥A + pA. (4.2.36)


(v) Let t¥ E ,F and let A and B be (m x n) matrices. Then
t¥(A + B) = t¥A + t¥B. (4.2.37)
(vi) Let t¥, P E ,F let A be an (m X n) matrix, and let B be an (n X r)
matrix. Then
(t¥A)(pB) = (t¥P)(AB). (4.2.38)
(vii) Let A and B be (m x n) matrices. Then
A +B= B+ A . (4.2.39)
(viii) Let A, B, and C be (m x n) matrices. Then

(A + B) + C = A+ (B + C). (4.2.40)

The proofs of the next two results are left as an exercise.

.4 2.41. Theorem. L e t 0 E L ( X , Y ) be the zero transformation defined by


Eq. (3.4.)4 . Then for any bases e{ l' ... , e.J and { f l' ... ,I.. J for X and ,Y
respectively, the linear transformation 0 is represented by the (m x n) matrix

(4.2.42)

The matrix 0 is called the Dull matrix.

.4 2.43. Theorem. Let I E L ( X , X ) be the identity transformation defined


by Eq. (3.4.56). L e t e{ l> ... , e.J be an arbitrary basis for .X Then the matrix
representation of the linear transformation I from X into X with respect to
the basis e{ l> ... , e.J is given by

I ~ ~ [ : .. ..: ..:.:.:.. :J (4.2.4)4

I is called the n x n identity matrix.

.4 2.45. Exercise. Prove Theorems 4.2.32,4.2.41, and .4 2.43.


140 Chapter 4 I iF nite-Dimensional Vector Spaces and Matrices

For any (m × n) matrix A we have

A + 0 = 0 + A = A,   (4.2.46)

and for any (n × n) matrix B we have

BI = IB = B,   (4.2.47)

where I is the (n × n) identity matrix.
If A = [a_ij] is a matrix of the linear transformation A, then correspondingly −A is a matrix of the linear transformation −A, where

−A = (−1)A = [−a11 −a12 ... −a1n]
             [−a21 −a22 ... −a2n]   (4.2.48)
             [  .     .        . ]
             [−am1 −am2 ... −amn].

It follows immediately that A + (−A) = 0, where 0 denotes the null matrix.
By convention we usually write A + (−A) = A − A.
Let A and B be (n × n) matrices. Then we have, in general,

AB ≠ BA,   (4.2.49)

as was the case in Eq. (3.4.55).
Next, let A ∈ L(X, X) and assume that A is non-singular. Let A^-1 denote
the inverse of A. Then, by Theorem 3.4.60, AA^-1 = A^-1 A = I. Now if A is
the (n × n) matrix of A with respect to the basis {e1, ..., en} in X, then
there is an (n × n) matrix B of A^-1 with respect to the basis {e1, ..., en} in
X, such that

BA = AB = I.   (4.2.50)

We call B the inverse of A and we denote it by A^-1. In this connection we use
the following terms interchangeably: A^-1 exists, A has an inverse, A is
invertible, or A is non-singular. If A is not non-singular, we say A is singular.
With the aid of Theorem 3.4.63 the reader can readily establish the following result for matrices.

4.2.51. Theorem. Let A be an (n × n) matrix. The following are equivalent:
(i) rank A = n;
(ii) Ax = 0 implies x = 0;
(iii) for every y0 ∈ F^n, there is a unique x0 ∈ F^n such that y0 = Ax0;
(iv) the columns of A are linearly independent; and
(v) A^-1 exists.

4.2.52. Exercise. Prove Theorem 4.2.51.

We have shown that we can represent n linear equations by the matrix
equation (4.2.17). Now let A be a non-singular (n × n) matrix and consider
the equation

y = Ax.   (4.2.53)

If we premultiply both sides of this equation by A^-1 we obtain

x = A^-1 y,   (4.2.54)

the solution to Eq. (4.2.53). Thus, knowledge of the inverse of A enables
us to solve the system of linear equations (4.2.53).
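As a computational aside, the following minimal Python sketch (assuming NumPy; the non-singular matrix is an arbitrary example) carries out Eq. (4.2.54) and checks the result against numpy.linalg.solve, the numerically preferred route:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])             # an arbitrary non-singular matrix
    y = np.array([5.0, 10.0])

    x = np.linalg.inv(A) @ y               # x = A^-1 y, Eq. (4.2.54)
    assert np.allclose(A @ x, y)           # x indeed solves y = Ax
    assert np.allclose(x, np.linalg.solve(A, y))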
In our next result, which is readily verified, some of the important proper-
ties of non-singular matrices are given.

4.2.55. Theorem.
(i) An (n × n) non-singular matrix has one and only one inverse.
(ii) If A and B are non-singular (n × n) matrices, then (AB)^-1 = B^-1 A^-1.
(iii) If A and B are (n × n) matrices and if AB is non-singular, then so are A and B.

4.2.56. Exercise. Prove Theorem 4.2.55.

Our next theorem summarizes some of the important properties of the
transpose of matrices. The proof of this theorem is a direct consequence of
the definition of the transpose of a matrix (see Eq. (4.2.9)).

4.2.57. Theorem.
(i) For any matrix A, (A^T)^T = A.
(ii) Let A and B be conformal matrices. Then (AB)^T = B^T A^T.
(iii) Let A be a non-singular matrix. Then (A^T)^-1 = (A^-1)^T.
(iv) Let A be an (n × n) matrix. Then A^T is non-singular if and only if A is non-singular.
(v) Let A and B be comparable matrices. Then (A + B)^T = A^T + B^T.
(vi) Let α ∈ F and let A be a matrix. Then (αA)^T = αA^T.

4.2.58. Exercise. Prove Theorem 4.2.57.

Now let A be an (n × n) matrix, and let m be a positive integer. Similarly
as in Eq. (3.4.67), we define the (n × n) matrix A^m by

A^m = A · A · ... · A  (m times),   (4.2.59)

and if A^-1 exists, then similarly as in Eq. (3.4.68), we define the (n × n)
matrix A^-m as

A^-m = (A^-1)^m = A^-1 · A^-1 · ... · A^-1  (m times).   (4.2.60)

As in the case of Eqs. (3.4.69) through (3.4.71), the usual laws of exponents
follow from the above definitions. Specifically, if A is an (n × n) matrix and
if r and s are positive integers, then

A^r · A^s = A^(r+s) = A^(s+r) = A^s · A^r,   (4.2.61)

(A^r)^s = A^(rs) = A^(sr) = (A^s)^r,   (4.2.62)

and if A^-1 exists, then

A^-r · A^-s = A^-(r+s) = A^-s · A^-r.   (4.2.63)

Consistent with the above notation we have

A^1 = A   (4.2.64)

and

A^0 = I.   (4.2.65)

We are now once more in a position to consider functions of linear transformations, where in the present case the linear transformations are represented
by matrices. For example, if f(λ) is the polynomial in λ given in Eq. (3.4.74),
and if A is any (n × n) matrix, then by f(A) we mean

f(A) = α0 I + α1 A + ... + αn A^n.   (4.2.66)
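The polynomial f(A) of Eq. (4.2.66) can be evaluated by accumulating powers of A. The sketch below (Python with NumPy; the routine name and the test data are ours, chosen for illustration) does exactly that:

    import numpy as np

    def poly_of_matrix(coeffs, A):
        # Evaluate f(A) = a0 I + a1 A + ... + ak A^k, cf. Eq. (4.2.66).
        n = A.shape[0]
        result = np.zeros((n, n))
        power = np.eye(n)                # A^0 = I, Eq. (4.2.65)
        for a in coeffs:
            result = result + a * power
            power = power @ A            # next power of A
        return result

    A = np.array([[1.0, 2.0],
                  [0.0, 3.0]])
    f_of_A = poly_of_matrix([2.0, -1.0, 1.0], A)   # 2I - A + A^2
    assert np.allclose(f_of_A, 2 * np.eye(2) - A + A @ A)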

4.2.67. Exercise. Let A ∈ L(X, X), and let A be the matrix of A with
respect to the basis {e1, ..., en} in X. Let f(λ) be given by Eq. (3.4.74). Show
that f(A) is the matrix of f(A) with respect to the basis {e1, ..., en}.

We noted earlier that in general linear transformations and matrices do
not commute (see (3.4.55) and (4.2.49)). However, in the case of square
matrices, the reader can verify the following result easily.

4.2.68. Theorem. Let A, B, C denote (n × n) matrices, let 0 denote the
(n × n) null matrix, and let I denote the (n × n) identity matrix. Then:
(i) 0 commutes with any A;
(ii) A^p commutes with A^q, where p and q are positive integers;
(iii) αI commutes with any A, where α ∈ F; and
(iv) if A commutes with B and if A commutes with C, then A commutes with αB + βC, where α, β ∈ F.

4.2.69. Exercise. Prove Theorem 4.2.68.

Let us now consider some specific examples.

4.2.70. Example. Let F denote the field of real numbers, and let

A = [2 4 1]
    [4 5 6]
    [1 6 3]
    [2 0 1]

and let B be another (4 × 3) matrix over F. Then A + B and A − B are
formed entry by entry in accordance with Definition 4.2.31. If α = 3, then

αA = [ 6 12  3]
     [12 15 18]
     [ 3 18  9]
     [ 6  0  3]. •

4.2.71. Example. Let F denote the field of complex numbers, and let
i = √−1. For (3 × 3) matrices C and D over F, the sum C + D is again
formed entry by entry (the entries now being complex numbers), and for a
complex scalar such as α = −i, αC is obtained by multiplying every entry
of C by α. •
4.2.72. Example. Let F denote the field of real numbers, let G be a (3 × 2)
matrix over F, and let H be a (2 × 2) matrix over F. Then the product GH
is defined and is a (3 × 2) matrix. Notice that in this case HG is not
defined. •

4.2.73. Example. Let F be the field of real numbers, and let K and L be
(2 × 2) matrices over F. Computing the two products KL and LK entry by
entry shows that, in general, KL ≠ LK. •

4.2.74. Example. There exist non-zero (2 × 2) matrices M and N whose
product is the null matrix; i.e., MN = 0 even though M ≠ 0 and N ≠ 0. •
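A concrete instance is easy to exhibit; the particular matrices in the following Python sketch are our own choices for illustration (they are not the ones of the example above):

    import numpy as np

    M = np.array([[1, 0],
                  [1, 0]])
    N = np.array([[0, 0],
                  [1, 1]])

    print(M @ N)   # [[0 0], [0 0]]: MN = 0 although M != 0 and N != 0
    print(N @ M)   # [[0 0], [2 0]]: note that NM != 0, so order matters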


4.2.75. Example. If A is as defined in Example 4.2.70, then

A^T = [2 4 1 2]
      [4 5 6 0]. •
      [1 6 3 1]

4.2.76. Example. Let P and Q be (3 × 3) matrices over the field of real
numbers (the entries of Q being rational numbers with denominator 24)
for which direct computation yields

P · Q = Q · P = [1 0 0]
                [0 1 0],
                [0 0 1]

i.e., Q = P^-1 or, equivalently, P = Q^-1. •

4.2.77. Example. Consider the set of simultaneous linear equations

4ξ1 + 2ξ2 + ξ3 + 3ξ4 = 0,
6ξ1 + 3ξ2 + ξ3 + 4ξ4 = 0,   (4.2.78)
2ξ1 + ξ2 + 0·ξ3 + ξ4 = 0.

Equation (4.2.78) can be rewritten as

[4 2 1 3] [ξ1]   [0]
[6 3 1 4] [ξ2] = [0].   (4.2.79)
[2 1 0 1] [ξ3]   [0]
          [ξ4]

Let

A = [4 2 1 3]
    [6 3 1 4].   (4.2.80)
    [2 1 0 1]

Matrix A is the coordinate representation of a linear transformation A ∈ L(X,
Y). In this case dim X = 4 and dim Y = 3. Observe now that the first column
of A is a linear combination of the second column of A. Also, by adding the
third column of A to the second column we obtain the fourth column of A.
It follows that A has only two linearly independent columns. Hence, the rank
of A is 2. Now since dim X = dim N(A) + dim R(A), the nullity of A is
also 2. •
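The rank and nullity found in this example are easy to confirm numerically; a minimal Python sketch (assuming NumPy) for the matrix of Eq. (4.2.80):

    import numpy as np

    A = np.array([[4, 2, 1, 3],
                  [6, 3, 1, 4],
                  [2, 1, 0, 1]])

    rank = np.linalg.matrix_rank(A)
    nullity = A.shape[1] - rank      # dim X = dim N(A) + dim R(A)
    print(rank, nullity)             # prints: 2 2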

Next, we discuss briefly partitioned vectors and matrices. Such vectors and
matrices arise in a natural way when linear transformations acting on the
direct sum of linear spaces are considered.
Let X be an n-dimensional vector space, and let Y be an m-dimensional
vector space. Suppose that X = U ⊕ W, where U is an r-dimensional linear
subspace of X, and suppose that Y = R ⊕ Q, where R is a p-dimensional
linear subspace of Y. Let A ∈ L(X, Y), let {e1, ..., en} be a basis for X such
that {e1, ..., er} is a basis for U, and let {f1, ..., fm} be a basis for Y such
that {f1, ..., fp} is a basis for R. Let A be the matrix of A with respect to
these bases. Now if x ∈ F^n is the coordinate representation of x ∈ X with
respect to the basis {e1, ..., en}, we can partition x into two components,

x = [ξ1]   [u]
    [ .] = [--],   (4.2.81)
    [ξn]   [w]

where u ∈ F^r and w ∈ F^(n−r). Similarly, we can express y ∈ F^m as

y = [η1]   [r]
    [ .] = [--],   (4.2.82)
    [ηm]   [q]

where y is the coordinate representation of y with respect to {f1, ..., fm}
and where r ∈ F^p and q ∈ F^(m−p). We say the vector x in Eq. (4.2.81) is
partitioned into components u and w. Clearly, the vector u is determined by
the coordinates of x corresponding to the basis vectors {e1, ..., er} in U.
We can similarly divide the matrix A into the partition

A = [A11 | A12]
    [----+----],   (4.2.83)
    [A21 | A22]

where A11 is a (p × r) matrix, A12 is a (p × (n − r)) matrix, A21 is an
((m − p) × r) matrix, and A22 is an ((m − p) × (n − r)) matrix. In this case,
the equation

y = Ax   (4.2.84)

is equivalent to the pair of equations

r = A11 u + A12 w,
q = A21 u + A22 w.   (4.2.85)

A matrix in the form of Eq. (4.2.83) is called a partitioned matrix. The
matrices A11, A12, A21, and A22 are called submatrices of A.
The generalization of partitioning the matrix A into more than four
submatrices is accomplished in an obvious way, when the linear space X
and/or the linear space Y are the direct sum of more than two linear
subspaces.
Now let the linear spaces X and Y and the linear transformation A and
the matrix A of A still be defined as in the preceding discussion. Let Z be a
k-dimensional vector space (the spaces X, Y, and Z are vector spaces over
the same field F). Let Z = M ⊕ N, where M is a j-dimensional linear subspace of Z. Let B ∈ L(Y, Z). In a manner analogous to our preceding discussion, we represent B by the partitioned matrix

B = [B11 | B12]
    [----+----].   (4.2.86)
    [B21 | B22]

It is now a simple matter to show that the linear transformation BA ∈ L(X,
Z) is represented by the partitioned matrix

BA = [B11 A11 + B12 A21 | B11 A12 + B12 A22]
     [-------------------+-------------------].   (4.2.87)
     [B21 A11 + B22 A21 | B21 A12 + B22 A22]
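Eq. (4.2.87) is simply block-wise matrix multiplication and can be checked numerically. In the following Python sketch the block sizes are arbitrary assumptions, chosen only so that all products conform:

    import numpy as np

    rng = np.random.default_rng(0)
    A11, A12 = rng.integers(0, 5, (2, 3)), rng.integers(0, 5, (2, 2))
    A21, A22 = rng.integers(0, 5, (1, 3)), rng.integers(0, 5, (1, 2))
    B11, B12 = rng.integers(0, 5, (2, 2)), rng.integers(0, 5, (2, 1))
    B21, B22 = rng.integers(0, 5, (2, 2)), rng.integers(0, 5, (2, 1))

    A = np.block([[A11, A12], [A21, A22]])   # Eq. (4.2.83)
    B = np.block([[B11, B12], [B21, B22]])   # Eq. (4.2.86)

    BA = np.block([[B11 @ A11 + B12 @ A21, B11 @ A12 + B12 @ A22],
                   [B21 @ A11 + B22 @ A21, B21 @ A12 + B22 @ A22]])
    assert np.array_equal(BA, B @ A)         # Eq. (4.2.87)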

We now prove:

4.2.88. Theorem. Let X be an n-dimensional vector space, and let
P ∈ L(X, X). If P is a projection, then there exists a basis {e1, ..., en} for
X such that the matrix P of P with respect to this basis is of the form

P = [I_r | 0]
    [----+--],   (4.2.89)
    [ 0  | 0]

where I_r denotes the (r × r) identity matrix and r = dim R(P).

Proof. Since P is a projection we have, from Eq. (3.7.8),

X = R(P) ⊕ N(P).

Now let r = dim R(P), and let {e1, ..., en} be a basis for X such that {e1,
..., er} is a basis for R(P). Let P be the matrix of P with respect to this basis,
and the theorem follows. •

We leave the next result as an exercise.

4.2.90. Theorem. Let X be a finite-dimensional vector space, and let
A ∈ L(X, X). If W is a p-dimensional invariant subspace of X and if X = W
⊕ Z, then there exists a basis for X such that the matrix A of A with respect
to this basis has the form

A = [A11 | A12]
    [----+----],
    [ 0  | A22]

where A11 is a (p × p) matrix and the remaining submatrices are of appropriate dimension.

4.2.91. Exercise. Prove Theorem 4.2.90.

4.3. EQUIVALENCE AND SIMILARITY

From the previous section it is clear that a linear transformation A of
a finite-dimensional vector space X into a finite-dimensional vector space Y
can be represented by means of different matrices, depending on the particular
choice of bases in X and Y. The choice of bases may in different cases result
in matrices that are "easy" or "hard" to utilize. Many of the resulting
"standard" forms of matrices, called canonical forms, arise because of practical considerations. Such canonical forms often exhibit inherent characteristics of the underlying transformation A. Before we can consider some of the
more important canonical forms of matrices, we need to introduce several
new concepts which are of great importance in their own right.
Throughout the present section X and Y are finite-dimensional vector spaces
over the same field F, dim X = n and dim Y = m. We begin our discussion
with the following result.

4.3.1. Theorem. Let {e1, ..., en} be a basis for a linear space X, and let
{e'1, ..., e'n} be a set of vectors in X given by

e'_i = Σ_{j=1}^n p_ji e_j,  i = 1, ..., n,   (4.3.2)

where p_ij ∈ F for all i, j = 1, ..., n. The set {e'1, ..., e'n} forms a basis for
X if and only if P = [p_ij] is non-singular.

Proof. Let {e'1, ..., e'n} be linearly independent, and let p_j denote the
jth column vector of P. Let

α1 p_1 + ... + αn p_n = 0

for some scalars α1, ..., αn ∈ F. This implies that

Σ_{i=1}^n α_i p_ji = 0,  j = 1, ..., n.

It follows that

Σ_{i=1}^n α_i e'_i = Σ_{i=1}^n α_i (Σ_{j=1}^n p_ji e_j) = Σ_{j=1}^n (Σ_{i=1}^n α_i p_ji) e_j = 0.

Since e'1, ..., e'n are linearly independent, it follows that α1 = ... = αn = 0.
Thus, the columns of P are linearly independent. Therefore, P is non-singular.
Conversely, let P be non-singular, i.e., let {p_1, ..., p_n} be a linearly independent set of vectors in F^n. Let Σ_{i=1}^n α_i e'_i = 0 for some scalars α1, ..., αn ∈ F.
Then

0 = Σ_{i=1}^n α_i e'_i = Σ_{j=1}^n (Σ_{i=1}^n α_i p_ji) e_j.

Since {e1, ..., en} is a linearly independent set, it follows that Σ_{i=1}^n α_i p_ji = 0
for j = 1, ..., n, and thus Σ_{i=1}^n α_i p_i = 0. Since {p_1, ..., p_n} is a linearly
independent set, it now follows that α1 = ... = αn = 0, and therefore
{e'1, ..., e'n} is a linearly independent set. •

The preceding result gives rise to:

4.3.3. Definition. The matrix P of Theorem 4.3.1 is called the matrix of
basis {e'1, ..., e'n} with respect to basis {e1, ..., en}.

We note that since P is non-singular, P^-1 exists. Thus, we can readily
prove the next result.

4.3.4. Theorem. Let {e1, ..., en} and {e'1, ..., e'n} be two bases for X,
and let P be the matrix of basis {e'1, ..., e'n} with respect to basis {e1, ..., en}.
Then P^-1 is the matrix of basis {e1, ..., en} with respect to the basis
{e'1, ..., e'n}.

4.3.5. Exercise. Prove Theorem 4.3.4.

The next result is also easily verified.

4.3.6. Theorem. Let X be a linear space, and let the sets of vectors {e1,
..., en}, {e'1, ..., e'n}, and {e''1, ..., e''n} be bases for X. If P is the matrix
of basis {e'1, ..., e'n} with respect to basis {e1, ..., en} and if Q is the matrix
of basis {e''1, ..., e''n} with respect to basis {e'1, ..., e'n}, then PQ is the
matrix of basis {e''1, ..., e''n} with respect to basis {e1, ..., en}.

4.3.7. Exercise. Prove Theorem 4.3.6.

We now prove:

4.3.8. Theorem. Let {e1, ..., en} and {e'1, ..., e'n} be two bases for a linear
space X, and let P be the matrix of basis {e'1, ..., e'n} with respect to basis
{e1, ..., en}. Let x ∈ X, and let x denote the coordinate representation of x
with respect to the basis {e1, ..., en}. Let x' denote the coordinate representation of x with respect to the basis {e'1, ..., e'n}. Then Px' = x.

Proof. Let x^T = (ξ1, ..., ξn), and let (x')^T = (ξ'1, ..., ξ'n). Then

x = Σ_{j=1}^n ξ_j e_j  and  x = Σ_{j=1}^n ξ'_j e'_j.

Thus,

Σ_{j=1}^n ξ_j e_j = Σ_{j=1}^n ξ'_j (Σ_{i=1}^n p_ij e_i) = Σ_{i=1}^n (Σ_{j=1}^n p_ij ξ'_j) e_i,

which implies that

ξ_i = Σ_{j=1}^n p_ij ξ'_j,  i = 1, ..., n.

Therefore,

x = Px'. •
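Theorem 4.3.8 can be exercised numerically: given P, the new coordinates x' are obtained by solving Px' = x. A minimal Python sketch (assuming NumPy; the basis matrix P is an arbitrary invertible example):

    import numpy as np

    # Columns of P hold the coordinates of the new basis vectors
    # with respect to the old basis (an arbitrary invertible example).
    P = np.array([[1.0, 4.0],
                  [1.0, -1.0]])

    x = np.array([3.0, 2.0])            # coordinates w.r.t. the old basis
    x_prime = np.linalg.solve(P, x)     # coordinates w.r.t. the new basis
    assert np.allclose(P @ x_prime, x)  # Theorem 4.3.8: P x' = x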

4.3.9. Exercise. Let X = R^n, and let {u1, ..., un} be the natural basis
for R^n (see Example 4.1.15). Let {e1, ..., en} be another basis for R^n, and let
e1, ..., en be the coordinate representations of e1, ..., en, respectively, with
respect to the natural basis. Show that the matrix of basis {e1, ..., en} with
respect to basis {u1, ..., un} is given by P = [e1, e2, ..., en], i.e., the matrix
whose columns are the column vectors e1, ..., en.

4.3.10. Theorem. Let A ∈ L(X, Y), and let {e1, ..., en} and {f1, ..., fm}
be bases for X and Y, respectively. Let A be the matrix of A with respect
to the bases {e1, ..., en} in X and {f1, ..., fm} in Y. Let {e'1, ..., e'n} be
another basis for X, and let the matrix of {e'1, ..., e'n} with respect to {e1, ...,
en} be P. Let {f'1, ..., f'm} be another basis for Y, and let Q be the matrix of
{f1, ..., fm} with respect to {f'1, ..., f'm}. Let A' be the matrix of A with respect
to the bases {e'1, ..., e'n} in X and {f'1, ..., f'm} in Y. Then

A' = QAP.

Proof. We have

Ae'_i = A(Σ_{k=1}^n p_ki e_k) = Σ_{k=1}^n p_ki Ae_k = Σ_{k=1}^n p_ki (Σ_{l=1}^m a_lk f_l)
      = Σ_{k=1}^n p_ki [Σ_{l=1}^m a_lk (Σ_{j=1}^m q_jl f'_j)] = Σ_{j=1}^m (Σ_{l=1}^m Σ_{k=1}^n q_jl a_lk p_ki) f'_j.

Now, by definition, Ae'_i = Σ_{j=1}^m a'_ji f'_j. Since a matrix of a linear transformation
is uniquely determined once the bases are specified, we conclude that

a'_ji = Σ_{l=1}^m Σ_{k=1}^n q_jl a_lk p_ki

for j = 1, ..., m and i = 1, ..., n. Therefore, A' = QAP. •

In Figure A, Theorem 4.3.10 is depicted schematically: A maps X (with
basis {e1, ..., en}) into Y (with basis {f1, ..., fm}) via y = Ax, while A'
represents the same transformation with respect to the bases {e'1, ..., e'n}
and {f'1, ..., f'm}; the coordinates are related by x = Px' and y' = Qy.

4.3.11. Figure A. Schematic diagram of Theorem 4.3.10.

The preceding result motivates the following definition.

4.3.12. Definition. An (m × n) matrix A' is said to be equivalent to an
(m × n) matrix A if there exists an (m × m) non-singular matrix Q and an
(n × n) non-singular matrix P such that

A' = QAP.   (4.3.13)

If A' is equivalent to A, we write A' ~ A.

Thus, an (m × n) matrix A' is equivalent to an (m × n) matrix A if and
only if A and A' can be interpreted as both being matrices of the same linear
transformation A of a linear space X into a linear space Y, but with respect
to possibly different choices of bases.
Our next result shows that ~ is reflexive, symmetric, and transitive,
and as such is an equivalence relation.

4.3.14. Theorem. Let A, B, and C be (m × n) matrices. Then
(i) A is always equivalent to A;
(ii) if A is equivalent to B, then B is equivalent to A; and
(iii) if A is equivalent to B and B is equivalent to C, then A is equivalent to C.

4.3.15. Exercise. Prove Theorem 4.3.14.

The reader can prove the next result readily.

4.3.16. Theorem. Let A and B be (m × n) matrices. Then
(i) every matrix A is equivalent to a matrix of the form

[I_r | 0]
[----+--],   (4.3.17)
[ 0  | 0]

where I_r denotes the (r × r) identity matrix and r = rank A;
(ii) two (m × n) matrices A and B are equivalent if and only if they have the same rank; and
(iii) A and A^T have the same rank.

4.3.18. Exercise. Prove Theorem 4.3.16.

Our definition of the rank of a matrix given in the last section (Definition
4.2.25) is sometimes called the column rank of a matrix. Sometimes, an analogous definition for the row rank of a matrix is also considered. The above theorem
shows that the row rank of a matrix is equal to its column rank.

Next, let us consider the special case when X = Y. We have:

4.3.19. Theorem. Let A ∈ L(X, X), let {e1, ..., en} be a basis for X, and
let A be the matrix of A with respect to {e1, ..., en}. Let {e'1, ..., e'n} be
another basis for X whose matrix with respect to {e1, ..., en} is P. Let A'
be the matrix of A with respect to {e'1, ..., e'n}. Then

A' = P^-1 AP.   (4.3.20)

The meaning of the above theorem is depicted schematically in Figure B.
The proof of this theorem is just a special application of Theorem 4.3.10.

4.3.21. Figure B. Schematic diagram of Theorem 4.3.19: A is represented
by A with respect to the basis {e1, ..., en} and by A' = P^-1 AP with respect
to the basis {e'1, ..., e'n}.

Theorem 4.3.19 gives rise to the following concept.

4.3.22. Definition. An (n × n) matrix A' is said to be similar to an (n × n)
matrix A if there exists an (n × n) non-singular matrix P such that

A' = P^-1 AP.   (4.3.23)

If A' is similar to A, we write A' ~ A. We call P a similarity transformation.

It is a simple matter to prove the following:

4.3.24. Theorem. Let A' be similar to A; i.e., A' = P^-1 AP, where P is
non-singular. Then A is similar to A', and A = PA'P^-1.

In view of this result, there is no ambiguity in saying two matrices are
similar.
To sum up, if two matrices A and A' represent the same linear transformation A ∈ L(X, X), possibly with respect to two different bases for X,
then A and A' are similar matrices.

Our next result shows that ~ given in Definition 4.3.22 is an equivalence
relation.

4.3.25. Theorem. Let A, B, and C be (n × n) matrices. Then
(i) A is similar to A;
(ii) if A is similar to B, then B is similar to A; and
(iii) if A is similar to B and if B is similar to C, then A is similar to C.

4.3.26. Exercise. Prove Theorem 4.3.25.

For similar matrices we also have the following result.

4.3.27. Theorem.
(i) If an (n × n) matrix A is similar to an (n × n) matrix B, then A^k is
similar to B^k, where k is a positive integer.
(ii) Let

f(λ) = α0 + α1 λ + ... + αm λ^m,   (4.3.28)

where α0, ..., αm ∈ F. Then

f(P^-1 AP) = P^-1 f(A) P.   (4.3.29)

This implies that if B is similar to A, then f(B) is similar to f(A). In fact,
the same matrix P is involved.
(iii) Let A' be similar to A, and let f(λ) denote the polynomial of Eq.
(4.3.28). Then f(A) = 0 if and only if f(A') = 0.
(iv) Let A ∈ L(X, X), and let A be the matrix of A with respect to a
basis {e1, ..., en} in X. Let f(λ) denote the polynomial of Eq.
(4.3.28). Then f(A) is the matrix of f(A) with respect to the basis
{e1, ..., en}.
(v) Let A ∈ L(X, X), and let f(λ) denote the polynomial of Eq. (4.3.28).
Let A be any matrix of A. Then f(A) = 0 if and only if f(A) = 0.

4.3.30. Exercise. Prove Theorem 4.3.27.
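Part (ii) of Theorem 4.3.27 is easily verified numerically. In the following Python sketch the polynomial f and the matrices A and P are arbitrary illustrative choices:

    import numpy as np

    def f(M):
        return 2 * np.eye(2) + 3 * M + M @ M   # f(λ) = 2 + 3λ + λ^2

    A = np.array([[1.0, 2.0],
                  [0.0, 3.0]])
    P = np.array([[1.0, 1.0],
                  [0.0, 1.0]])                 # any non-singular matrix
    P_inv = np.linalg.inv(P)

    lhs = f(P_inv @ A @ P)                     # f(P^-1 A P)
    rhs = P_inv @ f(A) @ P                     # P^-1 f(A) P, Eq. (4.3.29)
    assert np.allclose(lhs, rhs)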

We can use results such as the preceding ones to good advantage. For
example, let A' denote the matrix

A' = [λ1          0]
     [   λ2        ]   (4.3.31)
     [      ...    ]
     [0          λn].

Then

(A')^k = [λ1^k          0 ]
         [     λ2^k       ]
         [       ...      ]
         [0          λn^k].

Now let f(λ) be given by Eq. (4.3.28). Then

f(A') = α0 I + α1 A' + ... + αm (A')^m = [f(λ1)           0  ]
                                         [      f(λ2)        ]
                                         [        ...        ]
                                         [0           f(λn)].

We conclude the present section with the following definition.

4.3.32. Definition. We call a matrix of the form (4.3.31) a diagonal matrix.
Specifically, a square (n × n) matrix A = [a_ij] is said to be a diagonal
matrix if a_ij = 0 for all i ≠ j. In this case we write A = diag(a_11, a_22, ...,
a_nn).

4.4. DETERMINANTS OF MATRICES

At this point of our development we need to consider the important
topic of determinants. After stating the definition of the determinant of a
matrix, we explore some of the commonly used properties of determinants.
We then characterize singular and non-singular linear transformations on
finite-dimensional vector spaces in terms of determinants. Finally, we give
a method of determining the inverse of non-singular matrices.
Let N = {1, 2, ..., n}. We recall (see Definition 1.2.28) that a permutation
on N is a one-to-one mapping of N onto itself. For example, if σ denotes a
permutation on N, then we can represent it as

σ = (1  2  ... n )
    (j1 j2 ... jn),

where j_i ∈ N for i = 1, ..., n and j_i ≠ j_k for i ≠ k. Henceforth, we represent
σ given above, more compactly, as

σ = j1 j2 ... jn.

Clearly, there are n! possible permutations on N. We let P(N) denote the
set of all permutations on N, and we distinguish between odd and even
permutations. Specifically, if there is an even number of pairs (i, k) such
that i > k but i precedes k in σ, then we say that σ is even. Otherwise σ is
said to be odd. Finally, we define the function sgn from P(N) into F by

sgn(σ) = {+1 if σ is even,
          {−1 if σ is odd,

for all σ ∈ P(N).
Before giving the definition of the determinant of a matrix, let us consider
a specific example.

4.4.1. Example. As indicated in the accompanying table, there are six
permutations on N = {1, 2, 3}. In this table the odd and even permutations
are identified and the function sgn is given.

σ      (j1, j2)   (j1, j3)   (j2, j3)   σ is odd or even   sgn σ
123    (1, 2)     (1, 3)     (2, 3)     even               +1
132    (1, 3)     (1, 2)     (3, 2)     odd                −1
213    (2, 1)     (2, 3)     (1, 3)     odd                −1
231    (2, 3)     (2, 1)     (3, 1)     even               +1
312    (3, 1)     (3, 2)     (1, 2)     even               +1
321    (3, 2)     (3, 1)     (2, 1)     odd                −1

Now let A denote the (n × n) matrix

A = [a11 a12 ... a1n]
    [a21 a22 ... a2n]
    [ .    .       . ]
    [an1 an2 ... ann].

We form the product of n elements from A by taking one and only one
element from each row and one and only one element from each column. We
represent this product as

a_{1 j1} · a_{2 j2} · ... · a_{n jn},

where σ = (j1 j2 ... jn) ∈ P(N). It is possible to find n! such products, one
for each σ ∈ P(N). We now define the determinant of A, denoted by det(A),
by the sum

det(A) = Σ_{σ ∈ P(N)} sgn(σ) · a_{1 j1} · a_{2 j2} · ... · a_{n jn},   (4.4.2)

where σ = j1 ... jn. We also denote the determinant of A by writing

det(A) = |a11 a12 ... a1n|
         |a21 a22 ... a2n|   (4.4.3)
         | .    .       . |
         |an1 an2 ... ann|.
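Eq. (4.4.2) translates directly into a (very inefficient, n!-term) algorithm. The following Python sketch computes sgn(σ) by counting the pairs described above and checks the permutation sum against numpy.linalg.det; the test matrix is an arbitrary example:

    import itertools
    import numpy as np

    def det_by_permutations(A):
        # Determinant via the permutation sum of Eq. (4.4.2).
        n = A.shape[0]
        total = 0.0
        for perm in itertools.permutations(range(n)):
            # sgn: count pairs (i, k) with i preceding k but perm[i] > perm[k]
            inversions = sum(1 for i in range(n) for k in range(i + 1, n)
                             if perm[i] > perm[k])
            sign = 1 if inversions % 2 == 0 else -1
            prod = 1.0
            for row, col in enumerate(perm):
                prod *= A[row, col]   # one element from each row and column
            total += sign * prod
        return total

    A = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 4.0],
                  [0.0, 1.0, 1.0]])
    assert np.isclose(det_by_permutations(A), np.linalg.det(A))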

We now present some of the fundamental properties of determinants.

4.4.4. Theorem. Let A and B be (n × n) matrices.
(i) det(A^T) = det(A).
(ii) If all elements of a column (or row) of A are zero, then det(A) = 0.
(iii) If B is the matrix obtained by multiplying every element in a column
(or row) of A by a constant α, while all other columns of B are the
same as those in A, then det(B) = α det(A).
(iv) If B is the same as A, except that two columns (or rows) are interchanged, then det(B) = −det(A).
(v) If two columns (or rows) of A are identical, then det(A) = 0.
(vi) If the columns (or rows) of A are linearly dependent, then det(A) = 0.

Proof. To prove the first part, we note first that each product in the sum
given in Eq. (4.4.2) has as a factor one and only one element from each
column and each row of A. Thus, transposing matrix A will not affect the
n! products appearing in the summation. We now must check to see that
the sign of each term is the same.
For σ ∈ P(N), the term in det(A) corresponding to σ is sgn(σ) a_{1 j1} a_{2 j2}
... a_{n jn}. There is a product term in det(A^T) of the form a_{j'1 1} a_{j'2 2} ... a_{j'n n} such
that a_{1 j1} a_{2 j2} ... a_{n jn} = a_{j'1 1} a_{j'2 2} ... a_{j'n n}. The right-hand side of this equation
is just a rearrangement of the left-hand side. The number of j_i > j_{i+1} for
i = 1, ..., n − 1 is the same as the number of j'_i > j'_{i+1} for i = 1, ...,
n − 1. Thus, if σ' = (j'1 j'2 ... j'n), then sgn(σ') = sgn(σ), which means det(A^T)
= det(A). Note that this result implies that any property below which
is proved for columns holds equally as well for rows.
To prove the second part, we note from Eq. (4.4.2) that if for some i,
a_ik = 0 for all k, then det(A) = 0. This proves that if every element in a row
of A is zero, then det(A) = 0. By part (i) it follows that this result holds
also for columns. •

4.4.5. Exercise. Prove parts (iii)-(vi) of Theorem 4.4.4.

We now introduce some additional concepts for determinants.

4.4.6. Definition. Let A = [a_ij] be an (n × n) matrix. If the ith row and
jth column of A are deleted, the remaining (n − 1) rows and (n − 1) columns
can be used to form another matrix M_ij whose determinant is det(M_ij).
We call det(M_ij) the minor of a_ij. If the diagonal elements of M_ij are diagonal
elements of A, i.e., i = j, then we speak of a principal minor of A. The cofactor
c_ij of a_ij is defined as (−1)^(i+j) det(M_ij).

For example, if A is a (3 × 3) matrix, then

det(A) = |a11 a12 a13|
         |a21 a22 a23|,
         |a31 a32 a33|

the minor of element a23 is

det(M23) = |a11 a12|
           |a31 a32|,

and the cofactor of a23 is

c23 = (−1)^(2+3) det(M23) = −det(M23).

The next result provides us with a convenient method of evaluating
determinants.

4.4.7. Theorem. Let A be an (n × n) matrix. Let c_ij denote the cofactor of
a_ij, i, j = 1, ..., n. Then the determinant of A is equal to the sum of the
products of the elements of any column (or row) of A, each by its own
cofactor. Specifically,

det(A) = Σ_{i=1}^n a_ij c_ij   (4.4.8)

for j = 1, ..., n, and

det(A) = Σ_{j=1}^n a_ij c_ij   (4.4.9)

for i = 1, ..., n.

For example, if A is a (2 × 2) matrix, then we have

det(A) = |a11 a12| = a11 c11 + a12 c12 = a11 a22 − a12 a21.
         |a21 a22|

If A is a (3 × 3) matrix, then we have

det(A) = |a11 a12 a13|
         |a21 a22 a23| = a11 c11 + a12 c12 + a13 c13.
         |a31 a32 a33|

In this case five other possibilities exist. For example, we also have

det(A) = a11 c11 + a21 c21 + a31 c31.

4.4.10. Exercise. Prove Theorem 4.4.7.
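Theorem 4.4.7 also yields a recursive algorithm (computationally expensive, but instructive). A minimal Python sketch expanding along the first row, tested on an arbitrary matrix:

    import numpy as np

    def det_by_cofactors(A):
        # Cofactor expansion along the first row, Eq. (4.4.9) with i = 1.
        n = A.shape[0]
        if n == 1:
            return A[0, 0]
        total = 0.0
        for j in range(n):
            M = np.delete(np.delete(A, 0, axis=0), j, axis=1)   # minor M_1j
            cofactor = (-1) ** j * det_by_cofactors(M)  # (-1)^(1+j), 0-based j
            total += A[0, j] * cofactor
        return total

    A = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 4.0],
                  [0.0, 1.0, 1.0]])
    assert np.isclose(det_by_cofactors(A), np.linalg.det(A))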

We also have:

4.4.11. Theorem. If the ith row of an (n × n) matrix A consists of elements
of the form a_i1 + a'_i1, a_i2 + a'_i2, ..., a_in + a'_in, then det(A) equals the sum
of the determinants of the two matrices obtained from A by replacing its
ith row by (a_i1, ..., a_in) and by (a'_i1, ..., a'_in), respectively, all other rows
remaining unchanged.

4.4.12. Exercise. Prove Theorem 4.4.11.

Furthermore, we have:

4.4.13. Theorem. Let A and B be (n × n) matrices. If B is obtained from
the matrix A by adding a constant α times any column (or row) to any other
column (or row) of A, then det(B) = det(A).

4.4.14. Exercise. Prove Theorem 4.4.13.

In addition, we can prove:

4.4.15. Theorem. Let A be an (n × n) matrix, and let c_ij denote the
cofactor of a_ij, i, j = 1, ..., n. Then the sum of products of the elements
of any column (or row) by the corresponding cofactors of the elements of
any other column (or row) is zero. That is,

Σ_{i=1}^n a_ij c_ik = 0  for j ≠ k,   (4.4.16a)

and

Σ_{j=1}^n a_ij c_kj = 0  for i ≠ k.   (4.4.16b)

4.4.17. Exercise. Prove Theorem 4.4.15.

We can combine Eqs. (4.4.8) and (4.4.16a) to obtain

Σ_{i=1}^n a_ij c_ik = det(A) δ_jk,   (4.4.18)

j, k = 1, ..., n, where δ_jk denotes the Kronecker delta. Similarly, we can
combine Eqs. (4.4.9) and (4.4.16b) to obtain

Σ_{j=1}^n a_ij c_kj = det(A) δ_ik,   (4.4.19)

i, k = 1, ..., n.
We are now in a position to prove the following important result.

4.4.20. Theorem. Let A and B be (n × n) matrices. Then

det(AB) = det(A) det(B).   (4.4.21)

Proof. The (k, j)th element of AB is Σ_{i=1}^n a_ki b_ij, so each column of AB is a
linear combination of the columns of A. By Theorem 4.4.11 and Theorem
4.4.4, part (iii), det(AB) may therefore be expanded as a sum of terms of the
form

b_{i1 1} b_{i2 2} ... b_{in n} · det[a_{i1}, a_{i2}, ..., a_{in}],

where a_i denotes the ith column of A. Any such determinant will vanish
whenever two or more of the indices i_j, j = 1, ..., n, are identical. Thus,
we need to sum only over σ ∈ P(N). We have

det(AB) = Σ_{σ ∈ P(N)} b_{i1 1} b_{i2 2} ... b_{in n} · det[a_{i1}, ..., a_{in}],

where σ = i1 i2 ... in and P(N) is the set of all permutations of N = {1, ...,
n}. It is now straightforward to show that

det[a_{i1}, ..., a_{in}] = sgn(σ) det(A),

and hence it follows that

det(AB) = det(A) det(B). •

Our next result is readily verified.

4.4.22. Theorem. Let I be the (n × n) identity matrix, and let 0 be the
(n × n) zero matrix. Then det(I) = 1 and det(0) = 0.

4.4.23. Exercise. Prove Theorem 4.4.22.

The next theorem allows us to characterize non-singular matrices in terms
of their determinants.

4.4.24. Theorem. An (n × n) matrix A is non-singular if and only if
det(A) ≠ 0.

Proof. Suppose that A is non-singular. Then A^-1 exists and A^-1 A = AA^-1
= I. From this it follows that det(A^-1 A) = det(I) = 1 ≠ 0, and thus, in view
of Eq. (4.4.21), det(A^-1) ≠ 0 and det(A) ≠ 0.
Next, assume that A is singular. By Theorem 4.3.16, there exist non-singular
matrices Q and P such that

A' = QAP = [I_r | 0]
           [----+--],
           [ 0  | 0]

where r = rank A < n. This shows that det(A') = 0. But

det(QAP) = [det(Q)] · [det(A)] · [det(P)] = 0,

and det(P) ≠ 0 and det(Q) ≠ 0. Therefore, if A is singular, then det(A)
= 0. •

Let us now turn to the problem of finding the inverse A^-1 of a non-singular matrix A. In doing so, we need to introduce the classical adjoint of A.

4.4.25. Definition. Let A be an (n × n) matrix, and let c_ij be the cofactor
of a_ij for i, j = 1, ..., n. Let C be the matrix formed by the cofactors of A,
i.e., C = [c_ij]. The matrix C^T is called the classical adjoint of A. We write
adj(A) to denote the classical adjoint of A.

We now have:

4.4.26. Theorem. Let A be an (n × n) matrix. Then

A[adj(A)] = [adj(A)]A = [det(A)] · I.

Proof. The proof follows by direct computation, using Eqs. (4.4.18) and
(4.4.19). •

As an immediate consequence of Theorem 4.4.26 we now have the following practical result.

4.4.27. Corollary. Let A be a non-singular (n × n) matrix. Then

A^-1 = (1/det(A)) · adj(A).   (4.4.28)

4.4.29. Example. For a suitable (3 × 3) matrix A one finds det(A) = −1,
computes adj(A) by forming the cofactors of A, and then obtains the inverse
from Eq. (4.4.28) as A^-1 = (1/det(A)) adj(A) = −adj(A). •
The proofs of the next two theorems are left as an exercise.

4.4.30. Theorem. If A and B are similar matrices, then det(A) = det(B).

4.4.31. Theorem. Let A ∈ L(X, X). Let A be the matrix of A with respect
to a basis {e1, ..., en} in X, and let A' be the matrix of A with respect to
another basis {e'1, ..., e'n} in X. Then det(A) = det(A').

4.4.32. Exercise. Prove Theorems 4.4.30 and 4.4.31.

In view of the preceding results, there is no ambiguity in the following
definition.

4.4.33. Definition. The determinant of a linear transformation A of a
finite-dimensional vector space X into X is the determinant of any matrix
A representing it; i.e., det(A) ≜ det(A).

The last result of the present section is a consequence of Theorems 4.4.20
and 4.4.24.

4.4.34. Theorem. Let X be a finite-dimensional vector space, and let
A, B ∈ L(X, X). Then A is non-singular if and only if det(A) ≠ 0. Also,

det(AB) = [det(A)] · [det(B)].

4.5. EIGENVALUES AND EIGENVECTORS

In the present section we consider eigenvalues and eigenvectors of linear
transformations defined on finite-dimensional vector spaces. Later, in
Chapter 7, we will reconsider these concepts in a more general setting.
Eigenvalues and eigenvectors play, of course, a crucial role in the study of
linear transformations.
Throughout the present section, X denotes an n-dimensional vector space
over a field F.
Let A ∈ L(X, X), and let us assume that there exist sets of vectors
{e1, ..., en} and {e'1, ..., e'n}, which are bases for X, such that

e'_1 = Ae1 = λ1 e1,
  ...               (4.5.1)
e'_n = Aen = λn en,

where λ_i ∈ F, i = 1, ..., n. If this is the case, then the matrix A' of A with
respect to the given basis is

A' = diag(λ1, λ2, ..., λn).

This motivates the following result.

4.5.2. Theorem. Let A ∈ L(X, X), and let λ ∈ F. Then the set of all x ∈ X
such that

Ax = λx   (4.5.3)

is a linear subspace of X. In fact, it is the null space of the linear transformation (A − λI), where I is the identity element of L(X, X).

Proof. Since the zero vector satisfies Eq. (4.5.3) for any λ ∈ F, the set is
non-void. If the zero vector is the only such vector, then we are done, for
{0} is a linear subspace of X (of dimension zero). In any case, Eq. (4.5.3)
holds if and only if (A − λI)x = 0. Thus, x belongs to the null space of
A − λI, and it follows from Theorem 3.4.19 that the set of all x ∈ X satisfying Eq. (4.5.3) is a linear subspace of X. •

Henceforth we let

N_λ = {x ∈ X : (A − λI)x = 0}.   (4.5.4)

The preceding result gives rise to several important concepts which we
introduce in the following definition.

4.5.5. Definition. Let X, A ∈ L(X, X), and N_λ be defined as in Theorem
4.5.2 and Eq. (4.5.4). A scalar λ such that N_λ contains more than just the
zero vector is called an eigenvalue of A (i.e., if there is an x ≠ 0 such that
Ax = λx, then λ is called an eigenvalue of A). When λ is an eigenvalue of A,
then each x ≠ 0 in N_λ is called an eigenvector of A corresponding to the
eigenvalue λ. The dimension of the linear subspace N_λ is called the multiplicity of the eigenvalue λ. If N_λ is of dimension one, then λ is called a simple
eigenvalue. The set of all eigenvalues of A is called the spectrum of A.

Some authors call an eigenvalue a proper value or a characteristic value
or a latent value or a secular value. Similarly, other names for eigenvector are
proper vector or characteristic vector. The space N_λ is called the λth proper
subspace of X.
For matrices we give the following corresponding definition.

4.5.6. Definition. Let A be an (n × n) matrix whose elements belong to
the field F. If there exists λ ∈ F and a non-zero vector x ∈ F^n such that

Ax = λx,   (4.5.7)

then λ is called an eigenvalue of A and x is called an eigenvector of A corresponding to the eigenvalue λ.

Our next result provides the connection between Definitions 4.5.5 and
4.5.6.

4.5.8. Theorem. Let A ∈ L(X, X), and let A be the matrix of A with respect
to the basis {e1, ..., en}. Then λ is an eigenvalue of A if and only if λ is an
eigenvalue of A. Also, x ∈ X is an eigenvector of A corresponding to λ if
and only if its coordinate representation x with respect to the basis
{e1, ..., en} is an eigenvector of A corresponding to λ.

4.5.9. Exercise. Prove Theorem 4.5.8.

Note that if x (or x) is an eigenvector of A (of A), then any non-zero
multiple of x (of x) is also an eigenvector of A (of A).
In the next result, the proof of which is left as an exercise, we use determinants to characterize eigenvalues. We have:

4.5.10. Theorem. Let A ∈ L(X, X). Then λ ∈ F is an eigenvalue of A if
and only if det(A − λI) = 0.

4.5.11. Exercise. Prove Theorem 4.5.10.

Let us next examine the equation

det(A − λI) = 0   (4.5.12)

in terms of the parameter λ. We ask: can we determine which values of λ,
if any, satisfy Eq. (4.5.12)? Let {e1, ..., en} be an arbitrary basis for X and
let A be the matrix of A with respect to this basis. We then have

det(A − λI) = det(A − λI).   (4.5.13)

The right-hand side of Eq. (4.5.13) may be rewritten as

det(A − λI) = |(a11 − λ)   a12     ...    a1n    |
              |  a21     (a22 − λ) ...    a2n    |   (4.5.14)
              |   .          .             .     |
              |  an1        an2    ... (ann − λ)|.

It is clear from Eq. (4.4.2) that expansion of the determinant (4.5.14) yields
a polynomial in λ of degree n. In order for λ to be an eigenvalue of A it must
(a) satisfy Eq. (4.5.12), and (b) belong to F. Requirement (b) warrants
further comment: note that there is no guarantee that there exists λ ∈ F
such that Eq. (4.5.12) is satisfied, or, equivalently, we have no assurance that
the nth-order polynomial equation

det(A − λI) = 0

has any roots in F. There is, however, a special class of fields for which
requirement (b) is automatically satisfied. We have:

4.5.15. Definition. A field F is said to be algebraically closed if for every
polynomial p(λ) there is at least one λ ∈ F such that

p(λ) = 0.   (4.5.16)

Any λ which satisfies Eq. (4.5.16) is said to be a root of the polynomial equation (4.5.16).

In particular, the field of complex numbers is algebraically closed, whereas
the field of real numbers is not (e.g., consider the equation λ² + 1 = 0).
There are other fields besides the field of complex numbers which are
algebraically closed. However, since we will not develop these, we will restrict
ourselves to the field of complex numbers, C, whenever the algebraic closure
property of Definition 4.5.15 is required. When considering results that are
valid for a vector space over an arbitrary field, we will (as before) make usage
of the symbol F or frequently (as before) make no reference to F at all.
We summarize the above discussion in the following theorem.

4.5.17. Theorem. Let A ∈ L(X, X). Then
(i) det(A − λI) is a polynomial of degree n in the parameter λ; i.e.,
there exist scalars α0, α1, ..., αn, depending only on A, such that

det(A − λI) = α0 + α1 λ + α2 λ² + ... + αn λ^n   (4.5.18)

(note that α0 = det(A) and αn = (−1)^n);
(ii) the eigenvalues of A are precisely the roots of the equation det(A − λI) = 0; i.e., they are the roots of

α0 + α1 λ + α2 λ² + ... + αn λ^n = 0;   (4.5.19)

and
(iii) A has, at most, n distinct eigenvalues.
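For numerical experiments, the coefficients of the characteristic polynomial and its roots can be obtained directly. In the Python sketch below (with an arbitrary example matrix), note that numpy.poly returns the monic polynomial det(λI − A), which differs from Eq. (4.5.18) only by the factor (−1)^n:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [4.0, 2.0]])
    coeffs = np.poly(A)             # monic char. polynomial, highest power first
    eigenvalues = np.roots(coeffs)  # roots of Eq. (4.5.19)
    assert np.allclose(np.sort(eigenvalues), np.sort(np.linalg.eigvals(A)))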

The above result motivates the following definition.

4.5.20. Definition. Let A ∈ L(X, X), and let A be a matrix of A. We call

det(A − λI) = det(A − λI) = α0 + α1 λ + ... + αn λ^n   (4.5.21)

the characteristic polynomial of A (or of A) and

det(A − λI) = det(A − λI) = 0   (4.5.22)

the characteristic equation of A (or of A).

From the fundamental properties of polynomials over the field of complex
numbers there now follows:

4.5.23. Theorem. If X is an n-dimensional vector space over C and if
A ∈ L(X, X), then it is possible to write the characteristic polynomial of A
in the form

det(A − λI) = (λ1 − λ)^m1 (λ2 − λ)^m2 ... (λp − λ)^mp,   (4.5.24)

where λ_i, i = 1, ..., p, are the distinct roots of Eq. (4.5.19) (i.e., λ_i ≠ λ_j
for i ≠ j). In Eq. (4.5.24), m_i is called the algebraic multiplicity of the root λ_i.
The m_i are positive integers, and Σ_{i=1}^p m_i = n.

Note the distinction between the concept of the algebraic multiplicity of λ_i
given in Theorem 4.5.23 and the multiplicity of λ_i as given in Definition
4.5.5. In general, these need not be the same, as will be seen later.
We now state and prove one of the most important results of linear
algebra, the Cayley-Hamilton theorem.

4.5.25. Theorem. Let A be an (n × n) matrix, and let p(λ) = det(A − λI)
be the characteristic polynomial of A. Then

p(A) = 0.

Proof. Let the characteristic polynomial for A be

p(λ) = α0 + α1 λ + ... + αn λ^n.

Now let B(λ) be the classical adjoint of (A − λI). Since the elements b_ij(λ)
of B(λ) are cofactors of the matrix A − λI, they are polynomials in λ of
degree not more than n − 1. Thus,

b_ij(λ) = β_ij0 + β_ij1 λ + ... + β_ij(n−1) λ^(n−1).

Letting B_k = [β_ijk] for k = 0, 1, ..., n − 1, we have

B(λ) = B0 + λ B1 + ... + λ^(n−1) B_(n−1).

By Theorem 4.4.26,

(A − λI) B(λ) = [det(A − λI)] I.

Thus,

(A − λI)(B0 + λ B1 + ... + λ^(n−1) B_(n−1)) = (α0 + α1 λ + ... + αn λ^n) I.

Expanding the left-hand side of this equation and equating like powers of
λ, we have

−B_(n−1) = αn I,  A B_(n−1) − B_(n−2) = α_(n−1) I,  ...,  A B1 − B0 = α1 I,
A B0 = α0 I.

Premultiplying the above matrix equations by A^n, A^(n−1), ..., A, I, respectively, we have

−A^n B_(n−1) = αn A^n,  A^n B_(n−1) − A^(n−1) B_(n−2) = α_(n−1) A^(n−1),  ...,
A² B1 − A B0 = α1 A,  A B0 = α0 I.

Adding these matrix equations, we obtain

0 = α0 I + α1 A + ... + αn A^n = p(A),

which was to be shown. •

As an immediate consequence of the Cayley-Hamilton theorem, we have:

4.5.26. Theorem. Let A be an (n × n) matrix with characteristic polynomial given by Eq. (4.5.21). Then
(i) A^n = (−1)^(n+1) [α0 I + α1 A + ... + α_(n−1) A^(n−1)]; and
(ii) if f(λ) is any polynomial in λ, then there exist β0, β1, ..., β_(n−1) ∈ F
such that

f(A) = β0 I + β1 A + ... + β_(n−1) A^(n−1).

Proof. Part (i) follows from Theorem 4.5.25 and from the fact that αn
= (−1)^n.
To prove part (ii), let f(λ) be any polynomial in λ and let p(λ) denote
the characteristic polynomial of A. Then there exist two polynomials g(λ)
and r(λ) (see Theorem 2.3.9) such that

f(λ) = p(λ) g(λ) + r(λ),   (4.5.27)

where deg[r(λ)] ≤ n − 1. Using the fact that p(A) = 0, we have f(A) = r(A),
and the theorem follows. •

The Cayley-Hamilton theorem holds also in the case of linear transformations. Specifically, we have the following result.

4.5.28. Theorem. Let A ∈ L(X, X), and let p(λ) denote the characteristic
polynomial of A. Then p(A) = 0.

4.5.29. Exercise. Prove Theorem 4.5.28.

Let us now consider a specific example.

4.5.30. Example. Consider the matrix

A = [1 0]
    [1 2].

Let us use Theorem 4.5.26 to evaluate A^37. Since n = 2, we assume that
A^37 is of the form

A^37 = β0 I + β1 A.

The characteristic polynomial of A is

p(λ) = (1 − λ)(2 − λ),

and the eigenvalues of A are λ1 = 1 and λ2 = 2. In the present case f(λ)
= λ^37, and r(λ) in Eq. (4.5.27) is

r(λ) = β0 + β1 λ.

We must determine β0 and β1. Using the fact that p(λ1) = p(λ2) = 0, it
follows that f(λ1) = r(λ1) and f(λ2) = r(λ2). Thus, we have

β0 + β1 = 1^37 = 1,  β0 + 2β1 = 2^37.

Hence, β1 = 2^37 − 1 and β0 = 2 − 2^37. Therefore,

A^37 = (2 − 2^37) I + (2^37 − 1) A,

or

A^37 = [    1       0  ]
       [2^37 − 1  2^37]. •
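The computation of A^37 can be confirmed exactly (avoiding floating-point overflow) with integer arithmetic; a minimal Python sketch:

    import numpy as np

    A = np.array([[1, 0],
                  [1, 2]], dtype=object)  # object dtype keeps exact big integers

    A37 = np.eye(2, dtype=object)
    for _ in range(37):
        A37 = A37.dot(A)                  # build A^37 by repeated multiplication

    expected = (2 - 2**37) * np.eye(2, dtype=object) + (2**37 - 1) * A
    assert (A37 == expected).all()        # A^37 = (2 - 2^37) I + (2^37 - 1) A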
Before closing the present section, let us introduce another important
concept for matrices.

4.5.31. Definition. If A is an (n × n) matrix, then the trace of A, denoted
by trace A or by tr A, is defined as

trace A = tr A = a11 + a22 + ... + ann   (4.5.32)

(i.e., the trace of a square matrix is the sum of its diagonal elements).

It turns out that if F = C, the field of complex numbers, then there is a
relationship between the trace, determinant, and eigenvalues of an (n × n)
matrix A. We have:

4.5.33. Theorem. Let X be a vector space over C. Let A be a matrix of
A ∈ L(X, X), and let det(A − λI) be given by Eq. (4.5.24). Then

(i) det(A) = Π_{j=1}^p λ_j^(m_j);

(ii) trace(A) = Σ_{j=1}^p m_j λ_j;

(iii) if B is any matrix similar to A, then trace(B) = trace(A); and
(iv) if f(λ) denotes the polynomial

f(λ) = γ0 + γ1 λ + ... + γm λ^m,

then the roots of the characteristic polynomial of f(A) are f(λ1),
..., f(λp), and

det[f(A) − λI] = [f(λ1) − λ]^m1 ... [f(λp) − λ]^mp.

4.5.34. Exercise. Prove Theorem 4.5.33.
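Parts (i)-(iii) of Theorem 4.5.33 are easily checked numerically; in the following Python sketch the matrices A and P are arbitrary illustrative choices:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 3.0]])
    eigs = np.linalg.eigvals(A)

    assert np.isclose(np.prod(eigs), np.linalg.det(A))   # part (i)
    assert np.isclose(np.sum(eigs), np.trace(A))         # part (ii)

    P = np.array([[1.0, 2.0],
                  [0.0, 1.0]])
    B = np.linalg.inv(P) @ A @ P                         # B is similar to A
    assert np.isclose(np.trace(B), np.trace(A))          # part (iii)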

4.6. SOME CANONICAL FORMS OF MATRICES

In the present section we investigate under which conditions a linear
transformation of a vector space into itself can be represented by special
types of matrices, namely, by (a) a diagonal matrix, (b) a so-called triangular
matrix, and (c) a so-called "block diagonal matrix." We will also investigate
when a linear transformation cannot be represented by a diagonal matrix.
Throughout the present section X denotes an n-dimensional vector space
over a field F.

4.6.1. Theorem. Let λ1, ..., λp be distinct eigenvalues of a linear transformation A ∈ L(X, X). Let e'_1 ≠ 0, ..., e'_p ≠ 0 be eigenvectors of A
corresponding to λ1, ..., λp, respectively. Then the set {e'_1, ..., e'_p} is
linearly independent.

Proof. The proof is by contradiction. Assume that the set {e'_1, ..., e'_p}
is linearly dependent, so that there exist scalars α1, ..., αp, not all zero, such
that

α1 e'_1 + ... + αp e'_p = 0.

We assume that these scalars have been chosen in such a fashion that as
few of them as possible are non-zero. Relabeling, if necessary, we thus have

α1 e'_1 + ... + αr e'_r = 0,   (4.6.2)

where α1 ≠ 0, ..., αr ≠ 0 and where r ≤ p is the smallest number for which
we can get such an expression.
Since λ1, ..., λr are eigenvalues and since e'_1, ..., e'_r are eigenvectors,
we have

0 = A(0) = A(α1 e'_1 + ... + αr e'_r) = α1 A e'_1 + ... + αr A e'_r
  = (α1 λ1) e'_1 + ... + (αr λr) e'_r.   (4.6.3)

Also,

0 = λr · 0 = λr (α1 e'_1 + ... + αr e'_r)
  = (α1 λr) e'_1 + ... + (αr λr) e'_r.   (4.6.4)

Subtracting Eq. (4.6.4) from Eq. (4.6.3), we obtain

0 = α1 (λ1 − λr) e'_1 + ... + α_(r−1) (λ_(r−1) − λr) e'_(r−1).

Since by assumption the λ_i are distinct, we have found an expression
involving only (r − 1) vectors satisfying Eq. (4.6.2). But r was chosen to
be the smallest number for which Eq. (4.6.2) holds. We have thus arrived
at a contradiction, and our theorem is proved. •

We note that if, in the above theorem, A has n distinct eigenvalues, then
the corresponding n eigenvectors span the linear space X (recall that
dim X = n).
Our next result enables us to represent a linear transformation with n
distinct eigenvalues in a very convenient form.

4.6.5. Theorem. Let A ∈ L(X, X). Assume that the characteristic polynomial of A has n distinct roots, so that

det(A − λI) = (λ1 − λ)(λ2 − λ) ... (λn − λ),

where λ1, λ2, ..., λn are distinct eigenvalues. Then there exists a basis
{e'_1, e'_2, ..., e'_n} of X such that e'_i is an eigenvector corresponding to λ_i for
i = 1, 2, ..., n. The matrix A' of A with respect to the basis {e'_1, e'_2, ..., e'_n} is

A' = [λ1          0]
     [   λ2        ]   (4.6.6)
     [      ...    ]
     [0          λn].

Proof. Let e'_i denote the eigenvector corresponding to the eigenvalue λ_i.
In view of Theorem 4.6.1, the set {e'_1, e'_2, ..., e'_n} is linearly independent
because λ1, λ2, ..., λn are all different. Moreover, since there are n of the
e'_i, the set {e'_1, e'_2, ..., e'_n} forms a basis for the n-dimensional space X. Also,
from the definition of eigenvalue and eigenvector, we have

A e'_1 = λ1 e'_1,
A e'_2 = λ2 e'_2,   (4.6.7)
  ...
A e'_n = λn e'_n.

From Eq. (4.6.7) we obtain the desired matrix given in Eq. (4.6.6). •

The reader can prove the following useful result readily.

4.6.8. Theorem. Let A ∈ L(X, X), and let A be the matrix of A with respect
to a basis {e1, e2, ..., en}. If the characteristic polynomial

det(A − λI) = α0 + α1 λ + α2 λ² + ... + αn λ^n

has n distinct roots λ1, ..., λn, then A is similar to the matrix A' of A
with respect to a basis {e'_1, ..., e'_n}, where

A' = [λ1          0]
     [   λ2        ]   (4.6.9)
     [      ...    ]
     [0          λn].

In this case there exists a non-singular matrix P such that

A' = P^-1 AP.   (4.6.10)

The matrix P is the matrix of basis {e'_1, e'_2, ..., e'_n} with respect to basis
{e1, e2, ..., en}, and P^-1 is the matrix of basis {e1, ..., en} with respect to
basis {e'_1, ..., e'_n}. The matrix P can be constructed by letting its columns
be eigenvectors of A corresponding to λ1, ..., λn, respectively. That is,

P = [x1, x2, ..., xn],   (4.6.11)

where x1, ..., xn are eigenvectors of A corresponding to the eigenvalues
λ1, ..., λn, respectively.
The similarity transformation P given in Eq. (4.6.11) is called a modal
matrix. If the conditions of Theorem 4.6.8 are satisfied and if, in particular,
Eq. (4.6.9) holds, then we say that matrix A has been diagonalized.

4.6.12. Exercise. Prove Theorem 4.6.8.

Let us now consider some specific examples.

4.6.13. Example. Let X be a two-dimensional vector space over the field
of real numbers. Let A ∈ L(X, X), and let {e1, e2} be a basis for X. Suppose
the matrix A of A with respect to this basis is given by

A = [−2  4]
    [ 1  1].

The characteristic polynomial of A is

p(λ) = det(A − λI) = det(A − λI) = λ² + λ − 6.

Now det(A − λI) = 0 if and only if λ² + λ − 6 = 0, or (λ − 2)(λ + 3)
= 0. Thus, the eigenvalues of A are λ1 = 2 and λ2 = −3. To find an
eigenvector corresponding to λ1, we solve the equation (A − λ1 I)x = 0, or

[−4  4] [ξ1]   [0]
[ 1 −1] [ξ2] = [0].

The last equation yields the equations

−4ξ1 + 4ξ2 = 0,  ξ1 − ξ2 = 0.

These are satisfied whenever ξ1 = ξ2. Thus, any vector of the form

x = [ξ]
    [ξ],  ξ ≠ 0,

is an eigenvector of A corresponding to the eigenvalue λ1. For convenience,
let us choose ξ = 1. Then

x1 = [1]
     [1]

is an eigenvector. In a similar fashion we obtain an eigenvector x2 corresponding to λ2, given by

x2 = [ 4]
     [−1].

The diagonal matrix A' given in Eq. (4.6.9) is, in the present case,

A' = [λ1  0 ] = [2  0]
     [0   λ2]   [0 −3].

We can arrive at A', using Eq. (4.6.10). Specifically, let

P = [x1, x2] = [1  4]
               [1 −1].

Then

P^-1 = [0.2  0.8]
       [0.2 −0.2],

and

P^-1 AP = [2  0 ] = [λ1  0 ]
          [0 −3]   [0   λ2].

By Eq. (4.3.2), the basis {e'_1, e'_2} ⊂ X with respect to which A' represents
A is given by

e'_1 = Σ_{i=1}^2 p_i1 e_i = e1 + e2,  e'_2 = Σ_{i=1}^2 p_i2 e_i = 4e1 − e2.

In view of Theorem 4.3.8, if x is the coordinate representation of x with
respect to {e1, e2}, then x' = P^-1 x is the coordinate representation of x
with respect to {e'_1, e'_2}. The vectors e'_1, e'_2 are, of course, eigenvectors of A
corresponding to λ1 and λ2, respectively. •
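The diagonalization carried out in this example can be reproduced in a few lines of Python (assuming NumPy):

    import numpy as np

    A = np.array([[-2.0, 4.0],
                  [ 1.0, 1.0]])

    x1 = np.array([1.0, 1.0])       # eigenvector for lambda_1 = 2
    x2 = np.array([4.0, -1.0])      # eigenvector for lambda_2 = -3
    P = np.column_stack([x1, x2])   # modal matrix, Eq. (4.6.11)

    A_prime = np.linalg.inv(P) @ A @ P            # Eq. (4.6.10)
    assert np.allclose(A_prime, np.diag([2.0, -3.0]))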

When the algebraic multiplicity of one or more of the eigenvalues of a
linear transformation is greater than one, then the linear transformation
is said to have repeated eigenvalues. Unfortunately, in this case it is not always
possible to represent the linear transformation by a diagonal matrix. To put
it another way, if a square matrix has repeated eigenvalues, then it is not
always possible to diagonalize it. However, from the preceding results of the
present section it should be clear that a linear transformation with repeated
eigenvalues can be represented by a diagonal matrix if the number of linearly
independent eigenvectors corresponding to any eigenvalue is the same as the
algebraic multiplicity of the eigenvalue. The following examples throw additional light on these comments.

4.6.14. Example. The characteristic equation of the matrix

A = [1 3 −2]
    [0 4 −2]
    [0 3 −1]

is

det(A − λI) = (1 − λ)²(2 − λ) = 0,

and the eigenvalues of A are λ1 = 1 and λ2 = 2. The algebraic multiplicity
of λ1 is two. Corresponding to λ1 we can find two linearly independent
eigenvectors

x1 = [1]        x2 = [0]
     [0]  and        [2].
     [0]             [3]

Corresponding to λ2 we have an eigenvector

x3 = [1]
     [1].
     [1]

Letting P denote a modal matrix, we have

P = [1 0 1]          P^-1 = [1 −3  2]
    [0 2 1]   and           [0 −1  1],
    [0 3 1]                 [0  3 −2]

and

A' = P^-1 AP = [1 0 0]
               [0 1 0].
               [0 0 2]

In this example, dim N_{λ1} = 2, which happens to be the same as the algebraic
multiplicity of λ1. For this reason we were able to diagonalize matrix A. •

The next example shows that the multiplicity of an eigenvalue need not
be the same as its algebraic multiplicity. In this case we are not able to
diagonalize the matrix.

4.6.15. Example. The characteristic equation of the matrix

A = [2  1 −2]
    [0  0  1]
    [0 −2  3]

is

det(A − λI) = (1 − λ)(2 − λ)² = 0,

and the eigenvalues of A are λ1 = 1 and λ2 = 2. The algebraic multiplicity
of λ2 is two. An eigenvector corresponding to λ1 is x^T = (1, 1, 1). An
eigenvector corresponding to λ2 must be of the form

x = [ξ]
    [0],  ξ ≠ 0.
    [0]

Setting x^T = (1, 0, 0), we see that dim N_{λ2} = 1, and thus we have not been
able to determine a basis for R³ consisting of eigenvectors. Consequently,
we have not been able to diagonalize A. •

When a matrix cannot be diagonalized we seek, for practical reasons,
to represent a linear transformation by a matrix which is as nearly diagonal
as possible. Our next result provides the basis for representing linear transformations by such matrices, which we call block diagonal matrices. In the next
section we will consider the "simplest" type of block diagonal matrix, called
the Jordan canonical form.

4.6.16. Theorem. Let X be an n-dimensional vector space, and let A
∈ L(X, X). Let Y and Z be linear subspaces of X such that X = Y ⊕ Z
and such that A is reduced by Y and Z. Then there exists a basis for X such
that the matrix A of A with respect to this basis has the form

A = [A1 | 0 ]
    [---+---],
    [ 0 | A2]

where dim Y = r, A1 is an (r × r) matrix, and A2 is an ((n − r) × (n − r))
matrix.

4.6.17. Exercise. Prove Theorem 4.6.16.

We can generalize the preceding result. Suppose that X is the direct sum
of linear subspaces X1, ..., Xp that are invariant under A ∈ L(X, X).
We can define linear transformations A_i ∈ L(X_i, X_i), i = 1, ..., p, by
A_i x = Ax for x ∈ X_i. That is to say, A_i is the restriction of A to X_i. We now
can find for each A_i a matrix representation A_i, which will lead us to the
following result.

4.6.18. Theorem. Let X be a finite-dimensional vector space, and let
A ∈ L(X, X). If X is the direct sum of p linear subspaces, X1, ..., Xp, which
are invariant under A, then there exists a basis for X such that the matrix
representation for A is in the block diagonal form given by

A = [A1          0]
    [   A2        ]
    [      ...    ]
    [0          Ap].

Moreover, A_i is a matrix representation of A_i, the restriction of A to X_i,
i = 1, ..., p.

4.6.19. Exercise. Prove Theorem 4.6.18.

From the preceding it is clear that, in order to carry out the block diagonalization of a matrix A, we need to find an appropriate set of invariant
subspaces of X and, furthermore, to find a simple matrix representation on
each of these subspaces.

4.6.20. Example. Let X be an n-dimensional vector space. If A ∈ L(X, X)
has n distinct eigenvalues, λ1, ..., λn, and if we let

N_{λj} = {x : (A − λj I)x = 0},  j = 1, ..., n,

then N_{λj} is an invariant linear subspace under A and

X = N_{λ1} ⊕ ... ⊕ N_{λn}.

For any x ∈ N_{λj}, we have Ax = λj x and hence A_j x = λj x for x ∈ N_{λj}.
A basis for N_{λj} is any non-zero x_j ∈ N_{λj}. Thus, with respect to this basis, A_j
is represented by the matrix λj (in this case, simply a scalar). With respect to a
basis of n linearly independent eigenvectors, {x1, ..., xn}, A is represented
by Eq. (4.6.6). •

In addition to the diagonal form and the block diagonal form, there
are many other useful forms for matrices to represent linear transformations
on finite-dimensional vector spaces. One of these canonical forms involves
triangular matrices, which we consider in the last result of the present section.
We say that an (n × n) matrix is a triangular matrix if it either has the form

[a11 a12 a13 ... a1n]
[ 0  a22 a23 ... a2n]
[ 0   0  a33 ... a3n]   (4.6.21)
[ .   .   .       . ]
[ 0   0   0  ... ann]

or the form

[a11  0   0  ...  0 ]
[a21 a22  0  ...  0 ]
[a31 a32 a33 ...  0 ]   (4.6.22)
[ .   .   .       . ]
[an1 an2 an3 ... ann].

In the case of Eq. (4.6.21) we speak of an upper triangular matrix, whereas in
the case of Eq. (4.6.22) we say the matrix is in the lower triangular form.

4.6.23. Theorem. Let X be an n-dimensional vector space over C, and let
A ∈ L(X, X). Then there exists a basis for X such that A is represented by
an upper triangular matrix.

Proof. We will show that if A is a matrix of A, then A is similar to an upper
triangular matrix A'. Our proof is by induction on n.
If n = 1, then the assertion is clearly true. Now assume that for n = k,
and C any (k × k) matrix, there exists a non-singular matrix Q such that
C' = Q^-1 CQ is an upper triangular matrix. We now must show the validity
of the assertion for n = k + 1. Let X be a (k + 1)-dimensional vector space
over C. Let λ1 be an eigenvalue of A, and let f1 be a corresponding eigenvector.
Let {f2, ..., f_(k+1)} be any set of vectors in X such that {f1, ..., f_(k+1)} is
a basis for X. Let B be the matrix of A with respect to the basis {f1, ...,
f_(k+1)}. Since Af1 = λ1 f1, B must be of the form

B = [λ1  b12        ...  b1,k+1    ]
    [ 0  b22        ...  b2,k+1    ]
    [ .    .               .       ]
    [ 0  b_(k+1),2  ...  b_(k+1),k+1].

Now let C be the (k × k) matrix

C = [b22        ...  b2,k+1    ]
    [  .              .        ]
    [b_(k+1),2  ...  b_(k+1),k+1].

By our induction hypothesis, there exists a non-singular matrix Q such that

C' = Q^-1 CQ,

where C' is an upper triangular matrix. Now let

P = [1 | 0 ... 0]
    [--+--------]
    [0 |        ]
    [. |   Q    ]
    [0 |        ].

By direct computation we have

P^-1 = [1 | 0 ... 0]
       [--+--------]
       [0 |        ]
       [. |  Q^-1  ]
       [0 |        ],

and

P^-1 BP = [λ1 | *  ...  *]
          [---+----------]
          [ 0 |          ]
          [ . |    C'    ]
          [ 0 |          ],

where the *'s denote elements which may be non-zero. Letting A' = P^-1 BP,
it follows that A' is upper triangular and is similar to B. Hence, any (k + 1)
× (k + 1) matrix which represents A ∈ L(X, X) is similar to the upper
triangular matrix A', by Theorem 4.3.19. This completes the proof of the
theorem. •

Note that if A is in the triangular form of either Eq. (4.6.21) or (4.6.22), then

    det (A − λI) = (a_11 − λ)(a_22 − λ) ... (a_nn − λ).

In this case the diagonal elements of A are the eigenvalues of A.
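As a quick numerical illustration of this remark (our own sketch in Python with numpy; the matrix is an arbitrary example), the eigenvalues of a triangular matrix may be read off its diagonal:

    import numpy as np

    A = np.array([[1.0, 5.0, -2.0],
                  [0.0, 4.0,  7.0],
                  [0.0, 0.0, -3.0]])         # upper triangular, as in Eq. (4.6.21)

    eigs = np.linalg.eigvals(A)
    assert np.allclose(np.sort(eigs), np.sort(np.diag(A)))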

4.7. MINIMAL POLYNOMIALS, NILPOTENT OPERATORS, AND THE JORDAN CANONICAL FORM

In the present section we develop the Jordan canonical form of a matrix. To do so, we need to introduce the concepts of minimal polynomial and nilpotent operator and to study some of the properties of such polynomials and operators. Unless otherwise specified, X denotes an n-dimensional vector space over a field F throughout the present section.

A. Minimal Polynomials

For purposes of motivation, consider the matrix

    A = [ 1   0  0 ]
        [ 0   1  0 ]
        [ 3  −1  2 ].

The characteristic polynomial of A is

    p(λ) = (1 − λ)^2 (2 − λ),

and we know from the Cayley–Hamilton theorem that

    p(A) = 0.                                              (4.7.1)

Now let us consider the polynomial

    m(λ) = (1 − λ)(2 − λ) = 2 − 3λ + λ^2.

Then

    m(A) = 2I − 3A + A^2 = 0.                              (4.7.2)

Thus, matrix A satisfies Eq. (4.7.2), which is of lower degree than Eq. (4.7.1), the characteristic equation of A.
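These two identities can be checked directly. The fragment below (ours; Python with numpy) evaluates p(A) and m(A) for the matrix above and confirms Eqs. (4.7.1) and (4.7.2):

    import numpy as np

    A = np.array([[1.0,  0.0, 0.0],
                  [0.0,  1.0, 0.0],
                  [3.0, -1.0, 2.0]])
    I = np.eye(3)

    p_of_A = (I - A) @ (I - A) @ (2*I - A)   # p(A) = (1 - A)^2 (2 - A)
    m_of_A = 2*I - 3*A + A @ A               # m(A) = 2I - 3A + A^2

    assert np.allclose(p_of_A, 0) and np.allclose(m_of_A, 0)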
Before stating our first result, we recall that an nth-order polynomial in λ is said to be monic if the coefficient of λ^n is unity (see Definition 2.3.4).

4.7.3. Theorem. Let A be an (n × n) matrix. Then there exists a unique polynomial m(λ) such that
(i) m(A) = 0;
(ii) m(λ) is monic; and
(iii) if m'(λ) is any other polynomial such that m'(A) = 0, then the degree of m(λ) is less than or equal to the degree of m'(λ) (i.e., m(λ) is of the lowest degree such that m(A) = 0).
Proof. We know that a polynomial p(λ) exists such that p(A) = 0, namely, the characteristic polynomial. Furthermore, the degree of p(λ) is n. Thus, there exists a polynomial, say f(λ), of degree m ≤ n such that f(A) = 0. Let us choose m to be the lowest degree for which f(A) = 0. Since f(λ) is of degree m, we may divide f(λ) by the coefficient of λ^m, thus obtaining a monic polynomial m(λ) such that m(A) = 0. To show that m(λ) is unique, suppose there is another monic polynomial m'(λ) of degree m such that m'(A) = 0. Then m(λ) − m'(λ), if not identically zero, is a polynomial of degree less than m (the leading terms cancel). Furthermore, m(A) − m'(A) = 0, which contradicts our assumption that m(λ) is the polynomial of lowest degree such that m(A) = 0. This completes the proof. ■

The preceding result gives rise to the notion of minimal polynomial.

4.7.4. Definition. The polynomial m(λ) defined in Theorem 4.7.3 is called the minimal polynomial of A.

Other names for minimal polynomial are minimum polynomial and reduced characteristic function. In the following we will develop an explicit form for the minimal polynomial of A, which makes it possible to determine it systematically, rather than by trial and error.
In the remainder of this section we let A denote an (n × n) matrix, we let p(λ) denote the characteristic polynomial of A, and we let m(λ) denote the minimal polynomial of A.
4.7.5. Theorem. Let f(λ) be any polynomial such that f(A) = 0. Then m(λ) divides f(λ).
180 Chapter 4 I iF nite-Dimensional Vector Spaces and Matrices

Proof. Let ν denote the degree of m(λ). Then there exist polynomials q(λ) and r(λ) such that (see Theorem 2.3.9)

    f(λ) = q(λ)m(λ) + r(λ),

where deg [r(λ)] < ν or r(λ) = 0. Since f(A) = 0, we have

    0 = q(A)m(A) + r(A),

and hence r(A) = 0. This means r(λ) = 0, for otherwise we would have a contradiction to the fact that m(λ) is the minimal polynomial of A. Hence, f(λ) = q(λ)m(λ), and m(λ) divides f(λ). ■

4.7.6. Corollary. The minimal polynomial of A, m(λ), divides the characteristic polynomial of A, p(λ).

4.7.7. Exercise. Prove Corollary 4.7.6.

We now prove:

4.7.8. Theorem. The polynomial p(λ) divides [m(λ)]^n.

Proof. We want to show that [m(λ)]^n = p(λ)q(λ) for some polynomial q(λ). Let m(λ) be of degree ν and be given by

    m(λ) = λ^ν + β_1 λ^{ν−1} + ... + β_ν.

Let us now define the matrices B_0, B_1, ..., B_{ν−1} as

    B_0 = I,  B_1 = A + β_1 I,  B_2 = A^2 + β_1 A + β_2 I,  ...,
    B_{ν−1} = A^{ν−1} + β_1 A^{ν−2} + ... + β_{ν−1} I.

Then

    B_0 = I,  B_1 − AB_0 = β_1 I,  B_2 − AB_1 = β_2 I,  ...,
    B_{ν−1} − AB_{ν−2} = β_{ν−1} I,

and

    −AB_{ν−1} = β_ν I − [A^ν + β_1 A^{ν−1} + ... + β_ν I] = β_ν I − m(A) = β_ν I.

Now let

    B(λ) = λ^{ν−1} B_0 + λ^{ν−2} B_1 + ... + B_{ν−1}.

Then

    (λI − A)B(λ) = λ^ν B_0 + λ^{ν−1} B_1 + ... + λ B_{ν−1}
                     − [λ^{ν−1} AB_0 + λ^{ν−2} AB_1 + ... + AB_{ν−1}]
                 = λ^ν B_0 + λ^{ν−1}[B_1 − AB_0] + λ^{ν−2}[B_2 − AB_1] + ...
                     + λ[B_{ν−1} − AB_{ν−2}] − AB_{ν−1}
                 = λ^ν I + β_1 λ^{ν−1} I + ... + β_{ν−1} λ I + β_ν I = m(λ)I.

Taking the determinant of both sides of this equation, we have

    [det (λI − A)] · [det B(λ)] = [m(λ)]^n.

But det (λI − A) = (−1)^n det (A − λI) = (−1)^n p(λ), and det B(λ) is a polynomial in λ. Thus, we have proved that p(λ)q(λ) = [m(λ)]^n, where q(λ) = (−1)^n det B(λ). ■

The next result establishes the form of the minimal polynomial.

4.7.9. Theorem. Let p(λ) be given by Eq. (4.5.24); i.e.,

    p(λ) = (λ_1 − λ)^{m_1} (λ_2 − λ)^{m_2} ... (λ_p − λ)^{m_p},

where m_1, ..., m_p are the algebraic multiplicities of the distinct eigenvalues λ_1, ..., λ_p of A, respectively. Then

    m(λ) = (λ − λ_1)^{ν_1} (λ − λ_2)^{ν_2} ... (λ − λ_p)^{ν_p},      (4.7.10)

where 1 ≤ ν_i ≤ m_i for i = 1, ..., p.

4.7.11. Exercise. Prove Theorem 4.7.9. (Hint: Assume that m(λ) = (λ − μ_1)^{η_1} ... (λ − μ_r)^{η_r}, and use Corollary 4.7.6 and Theorem 4.7.8.)

The only unknowns left to determine in the minimal polynomial of A are ν_1, ..., ν_p in Eq. (4.7.10). These can be determined in several ways; one such way is sketched below.
Our next result is an immediate consequence of Theorem 4.3.27.

4.7.12. Theorem. Let A' be similar to A, and let m'(λ) be the minimal polynomial of A'. Then m'(λ) = m(λ).

This result justifies the following definition.

4.7.13. Definition. Let A ∈ L(X, X). The minimal polynomial of A is the minimal polynomial of any matrix A which represents A.

In order to develop the Jordan canonical form (for linear transformations with repeated eigenvalues), we need to establish several additional preliminary results which are important in their own right.

4.7.14. Theorem. Let A ∈ L(X, X), and let f(λ) be any polynomial in λ. Let N_f = {x : f(A)x = 0}. Then N_f is an invariant linear subspace of X under A.

Proof. The proof that N_f is a linear subspace of X is straightforward and is left as an exercise. To show that N_f is invariant under A, let x ∈ N_f, so that f(A)x = 0. We want to show that Ax ∈ N_f.
Since A commutes with every polynomial in A, we have

    f(A)(Ax) = A[f(A)x] = A·0 = 0;

i.e., Ax ∈ N_f, which completes the proof. ■

Before proceeding further, we establish some additional notation. Let λ_1, ..., λ_p be distinct eigenvalues of A ∈ L(X, X). For j = 1, ..., p and for any positive integer q, let

    X_j^q = {x : (A − λ_j I)^q x = 0}.                     (4.7.15)

Note that this notation is consistent with that used in Example 4.6.20 if we define X_j^1 = X_j. Note also that, in view of Theorem 4.7.14, X_j^q is an invariant linear subspace of X under A.
We will need the following result concerning the restriction of a linear transformation.

4.7.16. Theorem. Let A ∈ L(X, X). Let X_1 and X_2 be linear subspaces of X such that X = X_1 ⊕ X_2, and let A_1 be the restriction of A to X_1. Let f(λ) be any polynomial in λ. If A is reduced by X_1 and X_2, then, for all x_1 ∈ X_1,

    f(A_1)x_1 = f(A)x_1.

4.7.17. Exercise. Prove Theorem 4.7.16.

Next we prove:

4.7.18. Theorem. Let X be a vector space over C, and let A ∈ L(X, X). Let m(λ) be the minimal polynomial of A as given in Eq. (4.7.10). Let g(λ) = (λ − λ_1)^{ν_1}; let h(λ) = (λ − λ_2)^{ν_2} ... (λ − λ_p)^{ν_p} if p ≥ 2, and let h(λ) = 1 if p = 1. Let A_1 be the restriction of A to X_1^{ν_1}; i.e., A_1 x = Ax for all x ∈ X_1^{ν_1}. Let N_1 = {x ∈ X : h(A)x = 0}. Then
(i) X = X_1^{ν_1} ⊕ N_1; and
(ii) (λ − λ_1)^{ν_1} is the minimal polynomial for A_1.

Proof. By Theorem 4.7.14, N_1 and X_1^{ν_1} are invariant linear subspaces under A. Since g(λ) and h(λ) are relatively prime, there exist polynomials q(λ) and r(λ) such that (see Exercise 2.3.15)

    q(λ)g(λ) + r(λ)h(λ) = 1.

Hence, for the linear transformation A we have

    q(A)g(A) + r(A)h(A) = I.                               (4.7.19)

Thus, for x ∈ X we have

    x = q(A)g(A)x + r(A)h(A)x.

Now since

    h(A)q(A)g(A)x = q(A)g(A)h(A)x = q(A)m(A)x = 0,

it follows that q(A)g(A)x ∈ N_1. We can similarly show that r(A)h(A)x ∈ X_1^{ν_1}. Thus, for every x ∈ X we have x = x_1 + x_2, where x_1 ∈ X_1^{ν_1} and x_2 ∈ N_1.
Let us now show that this representation of x is unique. Let x = x_1 + x_2 = x_1' + x_2', where x_1, x_1' ∈ X_1^{ν_1} and x_2, x_2' ∈ N_1. Then

    r(A)h(A)x = r(A)h(A)x_1 = r(A)h(A)x_1'.

Applying Eq. (4.7.19) to x_1 and x_1' we get

    x_1 = r(A)h(A)x_1  and  x_1' = r(A)h(A)x_1'.

From this we conclude that x_1 = x_1'. Similarly, we can show that x_2 = x_2'. Therefore, X = X_1^{ν_1} ⊕ N_1.
To prove the second part of the theorem, let A_1 be the restriction of A to X_1^{ν_1}, and let A_2 be the restriction of A to N_1. Let m_1(λ) and m_2(λ) be the minimal polynomials for A_1 and A_2, respectively. Since g(A_1) = 0 and h(A_2) = 0, it follows that m_1(λ) divides g(λ) and m_2(λ) divides h(λ), by Theorem 4.7.5. Hence, we can write

    m_1(λ) = (λ − λ_1)^{k_1}

and

    m_2(λ) = (λ − λ_2)^{k_2} ... (λ − λ_p)^{k_p},

where 0 ≤ k_i ≤ ν_i for i = 1, ..., p. Now let f(λ) = m_1(λ)m_2(λ). Then f(A) = m_1(A)m_2(A). Let x ∈ X with x = x_1 + x_2, where x_1 ∈ X_1^{ν_1} and x_2 ∈ N_1. Then

    f(A)x = m_1(A)m_2(A)x_1 + m_1(A)m_2(A)x_2 = m_2(A)m_1(A)x_1 + m_1(A)m_2(A)x_2 = 0.

Therefore, f(A) = 0. But this implies that m(λ) divides f(λ), and hence ν_i ≤ k_i, i = 1, ..., p. We thus conclude that k_i = ν_i for i = 1, ..., p, which completes the proof of the theorem. ■

We are now in a position to prove the following important result, called the primary decomposition theorem.

4.7.20. Theorem. Let X be an n-dimensional vector space over C, let λ_1, ..., λ_p be the distinct eigenvalues of A ∈ L(X, X), let the characteristic polynomial of A be

    p(λ) = (λ_1 − λ)^{m_1} ... (λ_p − λ)^{m_p},            (4.7.21)

and let the minimal polynomial of A be

    m(λ) = (λ − λ_1)^{ν_1} ... (λ − λ_p)^{ν_p}.            (4.7.22)

Let

    X_i = {x : (A − λ_i I)^{ν_i} x = 0},  i = 1, ..., p.

Then
(i) X_i, i = 1, ..., p, are invariant linear subspaces of X under A;
(ii) X = X_1 ⊕ ... ⊕ X_p;
(iii) (λ − λ_i)^{ν_i} is the minimal polynomial of A_i, where A_i is the restriction of A to X_i; and
(iv) dim X_i = m_i, i = 1, ..., p.

Proof. The proofs of parts (i), (ii), and (iii) follow from the preceding theorem by a simple induction argument and are left as an exercise.
To prove the last part of the theorem, we first show that the only eigenvalue of A_i ∈ L(X_i, X_i) is λ_i, i = 1, ..., p. Let v ∈ X_i, v ≠ 0, and consider (A_i − λI)v = 0; i.e., let λ be an eigenvalue of A_i with corresponding eigenvector v, so that A_i v = λv. From part (iii) it follows that

    0 = (A_i − λ_i I)^{ν_i} v = (A_i − λ_i I)^{ν_i −1}(A_i − λ_i I)v
      = (A_i − λ_i I)^{ν_i −1}(λ − λ_i)v = (λ − λ_i)(A_i − λ_i I)^{ν_i −1} v
      = ... = (λ − λ_i)^{ν_i} v.

From this we conclude that λ = λ_i.
We can now find a matrix representation of A in the form given in Theorem 4.6.18. Furthermore, from this theorem it follows that

    p(λ) = det (A − λI) = ∏_{i=1}^{p} det (A_i − λI).

Now since the only eigenvalue of A_i is λ_i, the determinant of A_i − λI must be of the form

    det (A_i − λI) = (λ_i − λ)^{q_i},

where q_i = dim X_i. Since p(λ) is given by Eq. (4.7.21), we must have

    (λ_1 − λ)^{m_1} ... (λ_p − λ)^{m_p} = (λ_1 − λ)^{q_1} ... (λ_p − λ)^{q_p},

from which we conclude that m_i = q_i. Thus, dim X_i = m_i, i = 1, ..., p. This concludes the proof of the theorem. ■

4.7.23. Exercise. Prove parts (i)–(iii) of Theorem 4.7.20.

The preceding result shows that we can always represent A ∈ L(X, X) by a matrix in block diagonal form, where the number of diagonal blocks (in the matrix A of Theorem 4.6.18) is equal to the number of distinct eigenvalues of A. We will next find a convenient representation for each of the diagonal submatrices A_i. It may turn out that one or more of the submatrices A_i will be diagonal. Our next result tells us specifically when A ∈ L(X, X) is representable by a diagonal matrix.

4.7.24. Theorem. Let X be an n-dimensional vector space over C, and let A ∈ L(X, X). Let λ_1, ..., λ_p, p ≤ n, be the distinct eigenvalues of A. Then there exists a basis for X such that the matrix A of A with respect to this basis is diagonal if and only if the minimal polynomial for A is of the form

    m(λ) = (λ − λ_1)(λ − λ_2) ... (λ − λ_p).
4.7.25. Exercise. Prove Theorem 4.7.24.

4.7.26. Exercise. Apply the above theorem to the matrices in Examples 4.6.14 and 4.6.15.
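In computational terms (our sketch; Python with numpy, with a crude rounding step to group equal eigenvalues), Theorem 4.7.24 says that A is diagonalizable exactly when each factor (λ − λ_i) occurs in m(λ) to the first power, i.e., when the null spaces of (A − λ_i I) and (A − λ_i I)^2 coincide for every eigenvalue:

    import numpy as np

    def is_diagonalizable(A, tol=1e-9):
        n = A.shape[0]
        for lam in np.unique(np.round(np.linalg.eigvals(A), 8)):
            B = A - lam * np.eye(n)
            # the exponent of (λ - lam) in m(λ) exceeds 1 iff null(B) != null(B^2)
            if np.linalg.matrix_rank(B, tol) != np.linalg.matrix_rank(B @ B, tol):
                return False
        return True

    print(is_diagonalizable(np.array([[1.0, 1.0], [0.0, 1.0]])))  # False
    print(is_diagonalizable(np.array([[3.0, 0.0], [0.0, 1.0]])))  # True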

B. Nilpotent Operators

Let us now proceed to find a representation for each of the A_i ∈ L(X_i, X_i) in Theorem 4.7.20 so that the block diagonal matrix representation of A ∈ L(X, X) (see Theorem 4.6.18) is as simple as possible. To accomplish this, we first need to define and examine so-called nilpotent operators.

4.7.27. Definition. Let N ∈ L(X, X). Then N is said to be nilpotent if there exists an integer q > 0 such that N^q = 0. A nilpotent operator is said to be of index q if N^q = 0 but N^{q−1} ≠ 0.

Recall now that Theorem 4.7.20 enables us to write X = X_1 ⊕ X_2 ⊕ ... ⊕ X_p. Furthermore, the linear transformation (A_i − λ_i I) is nilpotent on X_i. If we let N_i = A_i − λ_i I, then A_i = λ_i I + N_i. Now λ_i I is clearly represented by a diagonal matrix. However, the transformation N_i forces the matrix representation of A_i to be, in general, non-diagonal. So our next task is to seek a simple representation of the nilpotent operator N_i.
In the next few results, which are concerned with properties of nilpotent operators, we drop the subscript i for convenience.

4.7.28. Theorem. Let N ∈ L(V, V), where V is an m-dimensional vector space. If N is a nilpotent linear transformation of index q and if x ∈ V is such that N^{q−1}x ≠ 0, then the vectors x, Nx, ..., N^{q−1}x in V are linearly independent.

Proof. We first note that if N^{q−1}x ≠ 0, then N^j x ≠ 0 for j = 0, 1, ..., q − 1. Our proof is now by contradiction. Suppose that

    ∑_{i=0}^{q−1} α_i N^i x = 0.

Let j be the smallest integer such that α_j ≠ 0. Then we can write

    N^j x = − ∑_{i=j+1}^{q−1} (α_i / α_j) N^i x.

Thus,

    N^j x = N^{j+1} [ ∑_{i=j+1}^{q−1} (−α_i / α_j) N^{i−j−1} x ] = N^{j+1} y,

where y is defined in an obvious way. Now we can write

    N^{q−1} x = N^{q−j−1} N^j x = N^{q−j−1} N^{j+1} y = N^q y = 0.

We have thus arrived at a contradiction, which proves our result. ■

Next, let us examine the matrix representation of nilpotent transformations.

4.7.29. Theorem. Let V be a q-dimensional vector space, and let N ∈ L(V, V) be nilpotent of index q. Let x_0 ∈ V be such that N^{q−1}x_0 ≠ 0. Then the matrix N of N with respect to the basis {N^{q−1}x_0, N^{q−2}x_0, ..., x_0} in V is given by

    N = [ 0 1 0 ... 0 0 ]
        [ 0 0 1 ... 0 0 ]
        [ :           : ]                                  (4.7.30)
        [ 0 0 0 ... 0 1 ]
        [ 0 0 0 ... 0 0 ].

Proof. By the previous theorem we know that {N^{q−1}x_0, ..., x_0} is a linearly independent set. By hypothesis, there are q vectors in this set, and thus {N^{q−1}x_0, ..., x_0} forms a basis for V. Let e_i = N^{q−i}x_0 for i = 1, ..., q. Then

    N e_i = 0        if i = 1,
    N e_i = e_{i−1}  if i = 2, ..., q.

Hence,

    N e_1 = 0·e_1 + 0·e_2 + ... + 0·e_{q−1} + 0·e_q
    N e_2 = 1·e_1 + 0·e_2 + ... + 0·e_{q−1} + 0·e_q
      .
      .
    N e_q = 0·e_1 + 0·e_2 + ... + 1·e_{q−1} + 0·e_q.

From Eq. (4.2.2) and Definition 4.2.7, it follows that the representation of N is that given by Eq. (4.7.30). This completes the proof of the theorem. ■
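The theorem can be replayed numerically (our sketch; Python with numpy): starting from a nilpotent N of index q and a vector x_0 with N^{q−1} x_0 ≠ 0, the string basis reduces N to the form (4.7.30).

    import numpy as np

    q = 4
    J = np.diag(np.ones(q - 1), k=1)            # the shift matrix of Eq. (4.7.30)
    rng = np.random.default_rng(0)
    S = rng.normal(size=(q, q))                 # a generic change of coordinates
    N = S @ J @ np.linalg.inv(S)                # nilpotent of index q, not in shift form
    x0 = S[:, -1]                               # then N^{q-1} x0 = S[:, 0] != 0

    # basis e_i = N^{q-i} x0, i = 1, ..., q, as columns of a matrix
    basis = np.column_stack([np.linalg.matrix_power(N, q - i) @ x0
                             for i in range(1, q + 1)])

    assert np.allclose(np.linalg.inv(basis) @ N @ basis, J)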

The above theorem establishes the matrix representation of a nilpotent linear transformation of index q on a q-dimensional vector space. We will next determine the representation of a nilpotent operator of index ν on a vector space of dimension m, where ν < m. The following lemma shows that we can dismiss the case ν > m.

4.7.31. Lemma. Let N ∈ L(V, V) be nilpotent of index ν, where dim V = m. Then ν ≤ m.

Proof. Assume x ∈ V, N^ν x = 0, N^{ν−1}x ≠ 0, and ν > m. Then, by Theorem 4.7.28, the ν vectors x, Nx, ..., N^{ν−1}x are linearly independent, which contradicts the fact that dim V = m. ■

To prove the next theorem, we require the following result.

4.7.32. Lemma. Let V be an m-dimensional vector space, let N ∈ L(V, V), let ν be any positive integer, and let

    W_1 = {x : Nx = 0},     dim W_1 = l_1,
    W_2 = {x : N^2 x = 0},  dim W_2 = l_2,
      .
      .
    W_ν = {x : N^ν x = 0},  dim W_ν = l_ν.

Also, for any i such that 1 < i < ν, let {e_1, ..., e_m} be a basis for V such that {e_1, ..., e_{l_i}} is a basis for W_i. Then
(i) W_1 ⊂ W_2 ⊂ ... ⊂ W_ν; and
(ii) {e_1, ..., e_{l_{i−1}}, Ne_{l_i +1}, ..., Ne_{l_{i+1}}} is a linearly independent set of vectors in W_i.

Proof. To prove the first part, let x ∈ W_i for any i < ν. Then N^i x = 0. Hence, N^{i+1}x = 0, which implies x ∈ W_{i+1}.
To prove the second part, let r = l_{i−1} and let t = l_{i+1} − l_i. We note that if x ∈ W_{i+1}, then N^i(Nx) = 0, and so Nx ∈ W_i. This implies that Ne_j ∈ W_i for j = l_i + 1, ..., l_{i+1}. This means that the set of vectors {e_1, ..., e_r, Ne_{l_i+1}, ..., Ne_{l_{i+1}}} is in W_i. We show that this set is linearly independent by contradiction. Assume there are scalars α_1, ..., α_r and β_1, ..., β_t, not all zero, such that

    α_1 e_1 + ... + α_r e_r + β_1 Ne_{l_i+1} + ... + β_t Ne_{l_{i+1}} = 0.

Since {e_1, ..., e_r} is a linearly independent set, at least one of the β_i must be non-zero. Rearranging the last equation we have

    N(β_1 e_{l_i+1} + ... + β_t e_{l_{i+1}}) = −(α_1 e_1 + ... + α_r e_r) ∈ W_{i−1}.

Hence,

    N^{i−1} N(β_1 e_{l_i+1} + ... + β_t e_{l_{i+1}}) = 0.

Thus,

    N^i (β_1 e_{l_i+1} + ... + β_t e_{l_{i+1}}) = 0,

and (β_1 e_{l_i+1} + ... + β_t e_{l_{i+1}}) ∈ W_i. If β_1 e_{l_i+1} + ... + β_t e_{l_{i+1}} ≠ 0, it can be written as a linear combination of e_1, ..., e_{l_i}, which contradicts the fact that {e_1, ..., e_{l_{i+1}}} is a linearly independent set. If β_1 e_{l_i+1} + ... + β_t e_{l_{i+1}} = 0, we again contradict the fact that {e_{l_i+1}, ..., e_{l_{i+1}}} is a linearly independent set. Hence, we conclude that α_i = 0 for i = 1, ..., r and β_i = 0 for i = 1, ..., t. This completes the proof of the lemma. ■

We are now in a position to consider the general representation of a nilpotent operator on a finite-dimensional vector space.

4.7.33. Theorem. Let V be an m-dimensional vector space over C, and let N ∈ L(V, V) be nilpotent of index ν. Let W_1 = {x : Nx = 0}, ..., W_ν = {x : N^ν x = 0}, and let l_i = dim W_i, i = 1, ..., ν. Then there exists a basis for V such that the matrix N of N is of block diagonal form,

    N = [ N_1          0  ]
        [      .          ]                                (4.7.34)
        [         .       ]
        [ 0           N_r ],

where

    N_i = [ 0 1 0 ... 0 0 ]
          [ 0 0 1 ... 0 0 ]
          [ :           : ]                                (4.7.35)
          [ 0 0 0 ... 0 1 ]
          [ 0 0 0 ... 0 0 ],

i = 1, ..., r, where r = l_1, N_i is a (k_i × k_i) matrix, 1 ≤ k_i ≤ ν, and k_i is determined in the following way: there are

    l_ν − l_{ν−1}              (ν × ν) matrices,
    2l_i − l_{i+1} − l_{i−1}   (i × i) matrices, i = 2, ..., ν − 1, and
    2l_1 − l_2                 (1 × 1) matrices.

The basis for V consists of strings of vectors of the form

    x, Nx, ..., N^{k_i −1} x.
Proof. By Lemma 4.7.32, W_1 ⊂ W_2 ⊂ ... ⊂ W_ν. Let {e_1, ..., e_m} be a basis for V such that {e_1, ..., e_{l_i}} is a basis for W_i. We see that W_ν = V. Since N is nilpotent of index ν, W_{ν−1} ≠ W_ν and l_{ν−1} < l_ν.
We now proceed to select a new basis for V which yields the desired result. We find it convenient to use double subscripting of vectors. Let f_{1,ν} = e_{l_{ν−1}+1}, ..., f_{(l_ν − l_{ν−1}),ν} = e_{l_ν}, and let f_{1,ν−1} = N f_{1,ν}, ..., f_{(l_ν − l_{ν−1}),ν−1} = N f_{(l_ν − l_{ν−1}),ν}. By Lemma 4.7.32, it follows that {e_1, ..., e_{l_{ν−2}}, f_{1,ν−1}, ..., f_{(l_ν − l_{ν−1}),ν−1}} is a linearly independent subset of W_{ν−1}, which may or may not be a basis for W_{ν−1}. If it is not, we adjoin additional elements from W_{ν−1}, denoted by f_{(l_ν − l_{ν−1})+1,ν−1}, ..., f_{(l_{ν−1} − l_{ν−2}),ν−1}, so as to form a basis for W_{ν−1}. Now let f_{1,ν−2} = N f_{1,ν−1}, f_{2,ν−2} = N f_{2,ν−1}, ..., f_{(l_{ν−1} − l_{ν−2}),ν−2} = N f_{(l_{ν−1} − l_{ν−2}),ν−1}. By Lemma 4.7.32 it follows, as before, that {e_1, ..., e_{l_{ν−3}}, f_{1,ν−2}, ..., f_{(l_{ν−1} − l_{ν−2}),ν−2}} is a linearly independent set in W_{ν−2}. If this set is not a basis, we adjoin vectors from W_{ν−2} so that we do have a basis. We denote the vectors that we adjoin by f_{(l_{ν−1} − l_{ν−2})+1,ν−2}, ..., f_{(l_{ν−2} − l_{ν−3}),ν−2}. We continue in this manner until we have formed a basis for V. We express this basis in the manner indicated in Figure C, in which the row at level j lists the l_j − l_{j−1} vectors f_{1,j}, ..., f_{(l_j − l_{j−1}),j}, and each vector sits directly above its image under N:

    f_{1,ν}, ..., f_{(l_ν − l_{ν−1}),ν}
    f_{1,ν−1}, ..., f_{(l_ν − l_{ν−1}),ν−1}, ..., f_{(l_{ν−1} − l_{ν−2}),ν−1}
      .
      .
    f_{1,2}, ..., f_{(l_2 − l_1),2}
    f_{1,1}, ..., f_{(l_2 − l_1),1}, ..., f_{l_1,1}

4.7.36. Figure C. Basis for V.

The desired result follows now, for we have

    N f_{i,j} = f_{i,j−1},  j > 1,
    N f_{i,1} = 0.

Hence, if we let x_i = f_{i,k_i}, we see that the i-th column in Figure C, reading from bottom to top, is

    N^{k_i −1} x_i, ..., N x_i, x_i.

We see that each column of Figure C determines a string consisting of k_i entries, where k_i = ν for i = 1, ..., (l_ν − l_{ν−1}). Note that (l_ν − l_{ν−1}) > 0, so there is at least one string. In general, the number of strings with j entries is (l_j − l_{j−1}) − (l_{j+1} − l_j) = 2l_j − l_{j+1} − l_{j−1} for j = 2, ..., ν − 1. Also, there are l_1 − (l_2 − l_1) = 2l_1 − l_2 strings with one entry.
Finally, to show that the number of blocks N_i in N is l_1, we see that there are a total of (l_ν − l_{ν−1}) + (2l_{ν−1} − l_ν − l_{ν−2}) + ... + (2l_2 − l_3 − l_1) + (2l_1 − l_2) = l_1 columns in the table of Figure C. This completes the proof of the theorem. ■

The reader should study Figure C to obtain an appreciation of the structure of the basis for the space V.
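The counting rule of the theorem lends itself to computation. In the sketch below (ours; Python with numpy), the numbers l_i = dim W_i are obtained as nullities of N^i, and the number of strings of each length follows from the formulas above; the test matrix is already in the form (4.7.34), with strings of lengths 3, 2, 1, 1.

    import numpy as np

    def block_counts(N, tol=1e-9):
        n = N.shape[0]
        l = [0]                                 # l_0 = 0 by convention
        Nk = N.copy()
        while True:
            l.append(n - np.linalg.matrix_rank(Nk, tol))   # l_i = dim W_i
            if l[-1] == l[-2]:                  # null spaces stopped growing
                l.pop()
                break
            Nk = Nk @ N
        nu = len(l) - 1                         # index of nilpotency
        counts = {}
        for j in range(1, nu + 1):
            l_next = l[j + 1] if j + 1 <= nu else l[nu]
            counts[j] = (l[j] - l[j - 1]) - (l_next - l[j])
        return counts                           # {string length: number of strings}

    N = np.zeros((7, 7))
    N[0, 1] = N[1, 2] = N[3, 4] = 1.0           # strings of lengths 3, 2, 1, 1
    print(block_counts(N))                      # {1: 2, 2: 1, 3: 1}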

C. The Jordan Canonical Form

We are finally now in a position to state and prove the result which establishes the Jordan canonical form of matrices.

4.7.37. Theorem. Let X be an n-dimensional vector space over C, and let A ∈ L(X, X). Let the characteristic polynomial of A be

    p(λ) = (λ_1 − λ)^{m_1} ... (λ_p − λ)^{m_p},

and let the minimal polynomial of A be

    m(λ) = (λ − λ_1)^{ν_1} ... (λ − λ_p)^{ν_p},

where λ_1, ..., λ_p are the distinct eigenvalues of A. Let

    X_i = {x ∈ X : (A − λ_i I)^{ν_i} x = 0}.

Then
(i) X_1, ..., X_p are invariant subspaces of X under A;
(ii) X = X_1 ⊕ ... ⊕ X_p;
(iii) dim X_i = m_i, i = 1, ..., p; and
(iv) there exists a basis for X such that the matrix A of A with respect to this basis is of the form

    A = [ A_1   0   ...   0  ]
        [  0   A_2  ...   0  ]
        [  :    :         :  ]                             (4.7.38)
        [  0    0   ...  A_p ],

where A_i is an (m_i × m_i) matrix of the form

    A_i = λ_i I + N_i                                      (4.7.39)

and where N_i is the matrix of the nilpotent operator (A_i − λ_i I) of index ν_i on X_i, given by Eq. (4.7.34) and Eq. (4.7.35).

Proof. Parts (i)–(iii) are restatements of the primary decomposition theorem (Theorem 4.7.20). From this theorem we also know that (λ − λ_i)^{ν_i} is the minimal polynomial of A_i, the restriction of A to X_i. Hence, if we let N_i = A_i − λ_i I, then N_i is a nilpotent operator of index ν_i on X_i. We are thus able to represent N_i as shown in Eq. (4.7.35). This completes the proof of the theorem. ■

A little extra work shows that the representation of A ∈ L(X, X) by a matrix A of the form given in Eqs. (4.7.38) and (4.7.39) is unique, except for the order in which the diagonal blocks A_1, ..., A_p appear in A.

4.7.40. Definition. The matrix A of A ∈ L(X, X) given by Eqs. (4.7.38) and (4.7.39) is called the Jordan canonical form of A.

We conclude the present section with an example.

4.7.41. Example. Let X = R^7, and let {u_1, ..., u_7} be the natural basis for X (see Example 4.1.15). Let A ∈ L(X, X) be represented by the matrix

    A = [ −1   0  −1   1   1   3   0 ]
        [  0   1   0   0   0   0   0 ]
        [  2   1   2  −1  −1  −6   0 ]
        [ −2   0  −1   2   1   3   0 ]
        [  0   0   0   0   1   0   0 ]
        [  0   0   0   0   0   1   0 ]
        [ −1  −1   0   1   2   4   1 ]

with respect to {u_1, ..., u_7}. Let us find the matrix A' which represents A in the Jordan canonical form.
We first find that the characteristic polynomial of A is

    p(λ) = (1 − λ)^7.

This implies that λ_1 = 1 is the only distinct eigenvalue of A. Its algebraic multiplicity is m_1 = 7. In order to find the minimal polynomial of A, let

    N = A − λ_1 I,

where I is the identity operator in L(X, X). The representation for N with respect to the natural basis in X is

    N = A − I = [ −2   0  −1   1   1   3   0 ]
                [  0   0   0   0   0   0   0 ]
                [  2   1   1  −1  −1  −6   0 ]
                [ −2   0  −1   1   1   3   0 ]
                [  0   0   0   0   0   0   0 ]
                [  0   0   0   0   0   0   0 ]
                [ −1  −1   0   1   2   4   0 ].
192 Chapter 4 I iF nite-Dimensional Vector Spaces and Matrices

We assume the minimal polynomial is of the form m(λ) = (λ − 1)^{ν_1} and proceed to find the smallest ν_1 such that N^{ν_1} = (A − I)^{ν_1} = 0. We first obtain

    N^2 = [ 0  −1   0   0   0   3   0 ]
          [ 0   0   0   0   0   0   0 ]
          [ 0   1   0   0   0  −3   0 ]
          [ 0  −1   0   0   0   3   0 ]
          [ 0   0   0   0   0   0   0 ]
          [ 0   0   0   0   0   0   0 ]
          [ 0   0   0   0   0   0   0 ].

Next, we get that

    N^3 = 0,

and so ν_1 = 3. Hence, N is a nilpotent operator of index 3, and we see that X = X_1 = {x : (A − I)^3 x = 0}. We will now apply Theorem 4.7.33 to obtain a representation for N on this space.
Using the notation of Theorem 4.7.33, we let W_1 = {x : Nx = 0}, W_2 = {x : N^2 x = 0}, and W_3 = {x : N^3 x = 0}. We see that N has three linearly independent rows. This means that the rank of N is 3, and so dim (W_1) = l_1 = 4. Similarly, the rank of N^2 is 1, and so dim (W_2) = l_2 = 6. Clearly, dim (W_3) = l_3 = 7. We can conclude that N will have a representation N' of the form in Eq. (4.7.34) with r = 4. Each of the N_i' will be of the form in Eq. (4.7.35). There will be l_3 − l_2 = 1 (3 × 3) matrix. There will be 2l_2 − l_3 − l_1 = 1 (2 × 2) matrix, and 2l_1 − l_2 = 2 (1 × 1) matrices. Hence, there is a basis for X such that N may be represented by the matrix

    N' = [ 0 1 0 0 0 0 0 ]
         [ 0 0 1 0 0 0 0 ]
         [ 0 0 0 0 0 0 0 ]
         [ 0 0 0 0 1 0 0 ]
         [ 0 0 0 0 0 0 0 ]
         [ 0 0 0 0 0 0 0 ]
         [ 0 0 0 0 0 0 0 ].

The corresponding basis will consist of strings of vectors of the form

    N^2 x_1, N x_1, x_1,
    N x_2,  x_2,
    x_3,  x_4.

We will represent the vectors x_1, x_2, x_3, and x_4 by x_1, x_2, x_3, and x_4, their coordinate representations, respectively, with respect to the natural basis {u_1, ..., u_7} in X. We begin by choosing x_1 ∈ W_3 such that x_1 ∉ W_2; i.e., we find an x_1 such that N^3 x_1 = 0 but N^2 x_1 ≠ 0. The vector x_1^T = (0, 1, 0, 0, 0, 0, 0) will do. We see that (N x_1)^T = (0, 0, 1, 0, 0, 0, −1) and (N^2 x_1)^T = (−1, 0, 1, −1, 0, 0, 0). Hence, N x_1 ∈ W_2 but N x_1 ∉ W_1, and N^2 x_1 ∈ W_1. We see there will be only one string of length three, and so we next choose x_2 ∈ W_2 such that x_2 ∉ W_1. Also, the pair {N x_1, x_2} must be linearly independent. The vector x_2^T = (1, 0, 0, 0, 0, 0, 0) will do. Now (N x_2)^T = (−2, 0, 2, −2, 0, 0, −1), and N x_2 ∈ W_1. We complete the basis for X by selecting two more vectors x_3, x_4 ∈ W_1 such that {N^2 x_1, N x_2, x_3, x_4} are linearly independent. The vectors x_3^T = (0, 0, −1, −2, 1, 0, 0) and x_4^T = (1, 3, 1, 0, 0, 1, 0) will suffice.
It follows that the matrix

    P = [N^2 x_1, N x_1, x_1, N x_2, x_2, x_3, x_4]

is the matrix of the new basis with respect to the natural basis (see Exercise 4.3.9). The reader can readily show that

    N' = P^{-1} N P,

where

    P = [ −1   0   0  −2   1   0   1 ]
        [  0   0   1   0   0   0   3 ]
        [  1   1   0   2   0  −1   1 ]
        [ −1   0   0  −2   0  −2   0 ]
        [  0   0   0   0   0   1   0 ]
        [  0   0   0   0   0   0   1 ]
        [  0  −1   0  −1   0   0   0 ]

and

    P^{-1} = [ 0   0   2   1   4  −2   2 ]
             [ 0   0   1   1   3  −1   0 ]
             [ 0   1   0   0   0  −3   0 ]
             [ 0   0  −1  −1  −3   1  −1 ]
             [ 1   0   0  −1  −2  −1   0 ]
             [ 0   0   0   0   1   0   0 ]
             [ 0   0   0   0   0   1   0 ].
Finally, the Jordan canonical form for A is given by

    A' = N' + I.

(Recall that the matrix representation for I is the same for any basis in X.) Thus,

    A' = [ 1 1 0 0 0 0 0 ]
         [ 0 1 1 0 0 0 0 ]
         [ 0 0 1 0 0 0 0 ]
         [ 0 0 0 1 1 0 0 ]
         [ 0 0 0 0 1 0 0 ]
         [ 0 0 0 0 0 1 0 ]
         [ 0 0 0 0 0 0 1 ].

Again, the reader can show that A' = P^{-1} A P. In general, it is more convenient as a check to show that PA' = AP. ■
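The arithmetic of this example is easy to confirm on a machine. The following check (ours; Python with numpy) verifies PA' = AP for the matrices obtained above:

    import numpy as np

    A = np.array([[-1, 0, -1, 1, 1, 3, 0],
                  [ 0, 1,  0, 0, 0, 0, 0],
                  [ 2, 1,  2,-1,-1,-6, 0],
                  [-2, 0, -1, 2, 1, 3, 0],
                  [ 0, 0,  0, 0, 1, 0, 0],
                  [ 0, 0,  0, 0, 0, 1, 0],
                  [-1,-1,  0, 1, 2, 4, 1]], dtype=float)

    P = np.array([[-1, 0, 0,-2, 1, 0, 1],
                  [ 0, 0, 1, 0, 0, 0, 3],
                  [ 1, 1, 0, 2, 0,-1, 1],
                  [-1, 0, 0,-2, 0,-2, 0],
                  [ 0, 0, 0, 0, 0, 1, 0],
                  [ 0, 0, 0, 0, 0, 0, 1],
                  [ 0,-1, 0,-1, 0, 0, 0]], dtype=float)

    A_jordan = np.eye(7)
    A_jordan[0, 1] = A_jordan[1, 2] = A_jordan[3, 4] = 1.0

    assert np.allclose(P @ A_jordan, A @ P)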
4.7.42. Exercise. Let X = R^6, and let {u_1, ..., u_6} denote the natural basis for X. Let A ∈ L(X, X) be represented by the matrix

    A = [ 5  −1   1   1   0   0 ]
        [ 1   3  −1  −1   0   0 ]
        [ 0   0   4   0   1   1 ]
        [ 0   0   0   4  −1  −1 ]
        [ 0   0   0   0   3   1 ]
        [ 0   0   0   0   1   3 ].

Show that the Jordan canonical form of A is given by

    A' = [ 4 1 0 0 0 0 ]
         [ 0 4 1 0 0 0 ]
         [ 0 0 4 0 0 0 ]
         [ 0 0 0 4 1 0 ]
         [ 0 0 0 0 4 0 ]
         [ 0 0 0 0 0 2 ],

and find a basis for X for which A' represents A.

4.8. BILINEAR FUNCTIONALS AND CONGRUENCE

In the present section we consider the representation and some of the properties of bilinear functionals on real finite-dimensional vector spaces. (We will consider bilinear functionals defined on complex vector spaces in Chapter 6.)

Throughout this section X is assumed to be an n-dimensional vector space over the field of real numbers. We recall that if f is a bilinear functional on a real vector space X, then f: X × X → R and

    f(αx_1 + βx_2, y) = αf(x_1, y) + βf(x_2, y)

and

    f(x, αy_1 + βy_2) = αf(x, y_1) + βf(x, y_2)

for all α, β ∈ R and for all x, x_1, x_2, y, y_1, y_2 ∈ X. As a consequence of these properties we have, more generally,

    f( ∑_{j=1}^{r} α_j x_j , ∑_{k=1}^{s} β_k y_k ) = ∑_{j=1}^{r} ∑_{k=1}^{s} α_j β_k f(x_j, y_k)

for all α_j, β_k ∈ R and x_j, y_k ∈ X, j = 1, ..., r and k = 1, ..., s.

4.8.1. Definition. Let {e_1, ..., e_n} be a basis for the vector space X, and let

    f_ij = f(e_i, e_j),  i, j = 1, ..., n.

The matrix F = [f_ij] is called the matrix of the bilinear functional f with respect to {e_1, ..., e_n}.

Our first result provides us with the representation of bilinear functionals on finite-dimensional vector spaces.

4.8.2. Theorem. Let f be a bilinear functional on a vector space X, and let {e_1, ..., e_n} be a basis for X. Let F be the matrix of the bilinear functional f with respect to the basis {e_1, ..., e_n}. If x and y are arbitrary vectors in X and if x = (ξ_1, ..., ξ_n)^T and y = (η_1, ..., η_n)^T are their coordinate representations with respect to the basis {e_1, ..., e_n}, then

    f(x, y) = ∑_{i=1}^{n} ∑_{j=1}^{n} f_ij ξ_i η_j = x^T F y.        (4.8.3)

Proof. We have x = ξ_1 e_1 + ... + ξ_n e_n and y = η_1 e_1 + ... + η_n e_n. Therefore,

    f(x, y) = ∑_{i=1}^{n} ∑_{j=1}^{n} ξ_i η_j f(e_i, e_j) = ∑_{i=1}^{n} ∑_{j=1}^{n} f_ij ξ_i η_j = x^T F y,

which was to be shown. ■

Conversely, if we are given any (n × n) matrix F, we can use formula (4.8.3) to define the bilinear functional f whose matrix with respect to the given basis {e_1, ..., e_n} is, in turn, F again. In general, it therefore follows that on finite-dimensional vector spaces, bilinear functionals correspond in a one-to-one fashion to matrices. The particular one-to-one correspondence depends on the particular basis chosen.
Now recall that if X is a real vector space, then f is said to be symmetric if f(x, y) = f(y, x) for all x, y ∈ X. We also have the following related concept.

4.8.4. Definition. A bilinear functional f on a vector space X is said to be skew symmetric if

    f(x, y) = −f(y, x)                                     (4.8.5)

for all x, y ∈ X.

For symmetric and skew symmetric bilinear functionals we have the following result.

4.8.6. Theorem. Let {e_1, ..., e_n} be a basis for X, and let F be the matrix for a bilinear functional f with respect to {e_1, ..., e_n}. Then
(i) f is symmetric if and only if F = F^T;
(ii) f is skew symmetric if and only if F = −F^T; and
(iii) for every bilinear functional f, there exists a unique symmetric bilinear functional f_1 and a unique skew symmetric bilinear functional f_2 such that

    f = f_1 + f_2.

We call f_1 the symmetric part of f and f_2 the skew symmetric part of f.

4.8.7. Exercise. Prove Theorem 4.8.6.

The preceding result motivates the following definitions.

4.8.8. Definition. An (n × n) matrix F is said to be
(i) symmetric if F = F^T; and
(ii) skew symmetric if F = −F^T.

The next result is easily verified.

4.8.9. Theorem. Let f be a bilinear functional on X, and let f_1 and f_2 be the symmetric and skew symmetric parts of f, respectively. Then

    f_1(x, y) = ½[f(x, y) + f(y, x)]

and

    f_2(x, y) = ½[f(x, y) − f(y, x)]

for all x, y ∈ X.

4.8.10. Exercise. Prove Theorem 4.8.9.
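In matrix terms (our sketch; Python with numpy, on an arbitrary 2 × 2 example), Theorem 4.8.9 is the familiar decomposition of a square matrix into its symmetric and skew symmetric parts:

    import numpy as np

    F = np.array([[1.0, 4.0],
                  [2.0, 3.0]])

    F1 = (F + F.T) / 2       # symmetric part:       F1 = F1^T
    F2 = (F - F.T) / 2       # skew symmetric part:  F2 = -F2^T

    assert np.allclose(F, F1 + F2)
    assert np.allclose(F1, F1.T) and np.allclose(F2, -F2.T)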

Now let us recall that the quadratic form induced by f was defined as f̂(x) = f(x, x). For quadratic forms on a real finite-dimensional vector space X we have the following result.

4.8.11. Theorem. Let f and g be bilinear functionals on X. The quadratic forms induced by f and g are equal if and only if f and g have the same symmetric part. In other words, f̂(x) = ĝ(x) for all x ∈ X if and only if

    ½[f(x, y) + f(y, x)] = ½[g(x, y) + g(y, x)]

for all x, y ∈ X.

Proof. We note that

    f(x − y, x − y) = f(x, x) − f(x, y) − f(y, x) + f(y, y).

From this it follows that

    ½[f(x, y) + f(y, x)] = ½[f(x, x) + f(y, y) − f(x − y, x − y)].

Now if g(x, x) = f(x, x) for all x ∈ X, then

    ½[f(x, y) + f(y, x)] = ½[g(x, x) + g(y, y) − g(x − y, x − y)] = ½[g(x, y) + g(y, x)],

so that

    ½[f(x, y) + f(y, x)] = ½[g(x, y) + g(y, x)].        (4.8.12)

Conversely, assume that Eq. (4.8.12) holds for all x, y ∈ X. Then, in particular, if we let x = y, we have f(x, x) = g(x, x) for all x ∈ X. This concludes our proof. ■

From Theorem 4.8.11 the following useful result follows: when treating quadratic functionals, it suffices to work with symmetric bilinear functionals.
We leave the proof of the next result as an exercise.

4.8.13. Theorem. A bilinear functional f on a vector space X is skew symmetric if and only if f(x, x) = 0 for all x ∈ X.

4.8.14. Exercise. Prove Theorem 4.8.13.

The next result enables us to introduce the concept of congruence.

4.8.15. Theorem. Let f be a bilinear functional on a vector space X, let {e_1, ..., e_n} be a basis for X, and let F be the matrix of f with respect to this basis. Let {e_1', ..., e_n'} be another basis whose matrix with respect to {e_1, ..., e_n} is P. Then the matrix F' of f with respect to the basis {e_1', ..., e_n'} is given by

    F' = P^T F P.                                          (4.8.16)

Proof. Let F' = [f_ij'], where, by definition, f_ij' = f(e_i', e_j') and e_i' = ∑_{k=1}^{n} p_ki e_k. Then

    f(e_i', e_j') = f( ∑_{k=1}^{n} p_ki e_k , ∑_{l=1}^{n} p_lj e_l ) = ∑_{k=1}^{n} ∑_{l=1}^{n} p_ki f_kl p_lj.

Hence, F' = P^T F P. ■
We now have:

4.8.17. Definition. An (n × n) matrix F' is said to be congruent to an (n × n) matrix F if there exists a non-singular matrix P such that

    F' = P^T F P.                                          (4.8.18)

We express this congruence by writing F' ≈ F.

Note that congruent matrices are also equivalent matrices. The next theorem shows that ≈ in Definition 4.8.17 is reflexive, symmetric, and transitive, and as such it is an equivalence relation.

4.8.19. Theorem. Let A, B, and C be (n × n) matrices. Then,
(i) A is congruent to A;
(ii) if A is congruent to B, then B is congruent to A; and
(iii) if A is congruent to B and B is congruent to C, then A is congruent to C.

Proof. Clearly A = I^T A I, which proves the first part.
To prove the second part, let A = P^T B P, where P is non-singular. Then

    B = (P^T)^{-1} A P^{-1} = (P^{-1})^T A (P^{-1}),

which proves the second part.
Let A = P^T B P and B = Q^T C Q, where P and Q are non-singular matrices. Then

    A = P^T Q^T C Q P = (QP)^T C (QP),

where QP is non-singular. This proves the third part. ■

For practical reasons we are interested in determining the "nicest" (i.e., the simplest) matrix congruent to a given matrix, or what amounts to the same thing, the "nicest" (i.e., the most convenient) basis to use in expressing a given bilinear functional. If, in particular, we confine our interest to quadratic functionals, then it suffices, in view of Theorem 4.8.11, to consider symmetric bilinear functionals.
.4 8. Bilinear uF nctionals and Congruence 199

We come now to the main result of this section, called Sylvester's theorem.

4.8.20. Theorem. Let f be any symmetric bilinear functional on a real n-dimensional vector space X. Then there exists a basis {e_1, ..., e_n} of X such that the matrix of f with respect to this basis is of the form

    F = diag(+1, ..., +1, −1, ..., −1, 0, ..., 0),         (4.8.21)

where +1 appears p times, −1 appears r − p times, and 0 appears n − r times. The integers r and p in the above matrix are uniquely determined by the bilinear functional f.

Proof. Since the proof of this theorem is somewhat long, it will be carried out in several steps.
Step 1. We first show that there exists a basis {v_1, ..., v_n} of X such that f(v_i, v_j) = 0 for i ≠ j. The proof of this step is by induction on the dimension of X. The statement is trivial if dim X = 1. Suppose that the assertion is true for dim X = n − 1. Let f be a symmetric bilinear functional on X, where dim X = n. If f(x, y) = 0 for all x, y ∈ X, any basis has the desired property, so assume f is not identically zero. Let v_1 ∈ X be such that f(v_1, v_1) ≠ 0. There must be such a v_1; otherwise, by Theorem 4.8.13, f would be skew symmetric as well as symmetric, and we would conclude that f(x, y) = 0 for all x, y. Now let M = {x ∈ X : f(v_1, x) = 0}. We first show that M is a linear subspace of X. Let x_1, x_2 ∈ M, so that f(v_1, x_1) = f(v_1, x_2) = 0. Then f(v_1, x_1 + x_2) = f(v_1, x_1) + f(v_1, x_2) = 0 + 0 = 0. Similarly, f(v_1, αx_1) = 0 for all α ∈ R. Therefore, M is a linear subspace of X. Furthermore, M ≠ X because v_1 ∉ M. Hence, dim M ≤ n − 1. Now let dim M = q ≤ n − 1. Since f is a symmetric bilinear functional on M, it follows by the induction hypothesis that there is a basis for M consisting of a set of q vectors {v_2, ..., v_{q+1}} such that f(v_i, v_j) = 0 for i ≠ j, 2 ≤ i, j ≤ q + 1. Also, f(v_1, v_j) = 0 for j = 2, ..., q + 1, by definition of M. Furthermore, f(v_j, v_1) = f(v_1, v_j); hence, f(v_j, v_1) = f(v_1, v_j) = 0 for j = 2, ..., q + 1. It follows that f(v_i, v_j) = 0 for i ≠ j and 1 ≤ i, j ≤ q + 1.
We now show that {v_1, ..., v_{q+1}} is a basis for X. Let x ∈ X and let x' = x − α_1 v_1, where α_1 = f(v_1, x)/f(v_1, v_1). Then f(v_1, x') = f(v_1, x) − α_1 f(v_1, v_1) = f(v_1, x) − f(v_1, x) = 0. Thus, x' ∈ M. Since {v_2, ..., v_{q+1}} is a basis for M, there exist α_2, ..., α_{q+1} such that x' = α_2 v_2 + ... + α_{q+1} v_{q+1}; i.e., x = α_1 v_1 + ... + α_{q+1} v_{q+1}. Thus, {v_1, ..., v_{q+1}} spans X. To show that the set {v_1, ..., v_{q+1}} is linearly independent, assume that α_1 v_1 + ... + α_{q+1} v_{q+1} = 0. Then

    0 = f(v_1, 0) = f(v_1, α_1 v_1 + ... + α_{q+1} v_{q+1}) = α_1 f(v_1, v_1),

which implies that α_1 = 0. Hence, α_2 v_2 + ... + α_{q+1} v_{q+1} = 0. Since the set {v_2, ..., v_{q+1}} forms a basis for M, we must have α_2 = ... = α_{q+1} = 0. Thus, {v_1, ..., v_{q+1}} forms a basis for X, and we conclude that q + 1 = n. This completes the proof of step 1.
Step 2. Let {v_1, ..., v_n} be a basis for X such that f(v_i, v_j) = 0 for i ≠ j, and let β_i = f(v_i, v_i) for i = 1, ..., n. Let e_i = γ_i v_i for i = 1, ..., n, where γ_i = 1/√|β_i| if β_i ≠ 0 and γ_i = 1 if β_i = 0. Now suppose that β_i = f(v_i, v_i) ≠ 0. Then we have f(e_i, e_i) = f(γ_i v_i, γ_i v_i) = γ_i^2 f(v_i, v_i) = β_i/|β_i| = ±1. Also, if β_i = f(v_i, v_i) = 0, then f(e_i, e_i) = γ_i^2 f(v_i, v_i) = 0. Finally, we see that f(e_i, e_j) = f(γ_i v_i, γ_j v_j) = γ_i γ_j f(v_i, v_j) = 0 if i ≠ j. Thus, after a reordering of the e_i, if necessary, we have established a basis for X such that f_ij = f(e_i, e_j) = 0 if i ≠ j and f_ii = f(e_i, e_i) = +1, −1, or 0, with the +1's appearing first along the diagonal and the −1's next.
Step 3. We now show that the integers p and r in matrix (4.8.21) are uniquely determined by f. Let {e_1, ..., e_n} and {e_1', ..., e_n'} be bases for X, and let F and F' be matrices of f with respect to {e_1, ..., e_n} and {e_1', ..., e_n'}, respectively, where

    F = diag(+1, ..., +1, −1, ..., −1, 0, ..., 0),

with p entries equal to +1, and

    F' = diag(+1, ..., +1, −1, ..., −1, 0, ..., 0),

with q entries equal to +1.
To prove that p = q, we show that e_1, ..., e_p, e_{q+1}', ..., e_n' are linearly independent. From this it must follow that p + (n − q) ≤ n, or p ≤ q. By the same argument, q ≤ p, and so p = q. Let

    γ_1 e_1 + ... + γ_p e_p + γ_{q+1}' e_{q+1}' + ... + γ_n' e_n' = 0,

where γ_i ∈ R, i = 1, ..., p, and γ_i' ∈ R, i = q + 1, ..., n. Rewriting the above equation we have

    γ_1 e_1 + ... + γ_p e_p = −(γ_{q+1}' e_{q+1}' + ... + γ_n' e_n') ≜ x_0.

Then

    f(x_0, x_0) = f(γ_1 e_1 + ... + γ_p e_p, γ_1 e_1 + ... + γ_p e_p) = γ_1^2 + ... + γ_p^2 ≥ 0,

by choice of {e_1, ..., e_p}. On the other hand,

    f(x_0, x_0) = f(−(γ_{q+1}' e_{q+1}' + ... + γ_n' e_n'), −(γ_{q+1}' e_{q+1}' + ... + γ_n' e_n'))
                = (−1)^2 [−(γ_{q+1}')^2 − (γ_{q+2}')^2 − ... ] ≤ 0,

by choice of {e_{q+1}', ..., e_n'}. From this we conclude that γ_1^2 + ... + γ_p^2 = 0; i.e., γ_1 = ... = γ_p = 0. Hence, γ_{q+1}' e_{q+1}' + ... + γ_n' e_n' = 0. But the set {e_{q+1}', ..., e_n'} is linearly independent, and thus γ_{q+1}' = ... = γ_n' = 0. Hence, the vectors e_1, ..., e_p, e_{q+1}', ..., e_n' are linearly independent, and it follows that p = q.
To prove that r is unique, let r be the number of non-zero elements of F and let r' be the number of non-zero elements of F'. By Theorem 4.8.15, F and F' are congruent and hence equivalent. Thus, it follows from Theorem 4.3.16 that F and F' must have the same rank, and therefore r = r'. This concludes the proof of the theorem. ■
Sylvester's theorem allows the following classification of symmetric bilinear functionals.

4.8.22. Definition. The integer r in Theorem 4.8.20 is called the rank of the symmetric bilinear functional f. The integer p is called the index of f. The integer n is called the order of f. The integer s = 2p − r (i.e., the number of +1's minus the number of −1's) is called the signature of f.

Since every real symmetric matrix is congruent to a unique matrix of the form (4.8.21), we define the index, order, and rank of a real symmetric matrix analogously as in Definition 4.8.22.
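Since a real symmetric matrix can be diagonalized by an orthogonal matrix P (so that P^T F P is diagonal with the eigenvalues of F on the diagonal), the signs of the eigenvalues determine p, r, and s. The following sketch (ours; Python with numpy) computes these quantities:

    import numpy as np

    def inertia(F, tol=1e-9):
        w = np.linalg.eigvalsh(F)             # real eigenvalues of F = F^T
        p = int(np.sum(w > tol))              # number of +1's in (4.8.21)
        r = int(np.sum(np.abs(w) > tol))      # rank
        return p, r, 2 * p - r                # index, rank, signature

    F = np.array([[2.0,  0.0, 0.0],
                  [0.0, -5.0, 0.0],
                  [0.0,  0.0, 0.0]])
    print(inertia(F))                         # (1, 2, 0)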
Now let us recall that a bilinear functional f on a vector space X is said to be positive if f(x, x) ≥ 0 for all x ∈ X. Also, a bilinear functional f is said to be strictly positive if f(x, x) > 0 for all x ≠ 0, x ∈ X (it should be noted that f(x, x) = 0 for x = 0). Our final result of the present section, which is a consequence of Theorem 4.8.20, enables us now to classify symmetric bilinear functionals.

4.8.23. Theorem. Let p, r, and n be defined as in Theorem 4.8.20. A symmetric bilinear functional on a real n-dimensional vector space X is
(i) strictly positive if and only if p = r = n; and
(ii) positive if and only if p = r.

4.8.24. Exercise. Prove Theorem 4.8.23.

4.9. EUCLIDEAN VECTOR SPACES

A. Euclidean Spaces: Definition and Properties

Among the various linear spaces which we will encounter, the so-called
Euclidean spaces are so important that we devote the next two sections to
them. These spaces will allow us to make many generalizations to facts
established in plane geometry, and they will enable us to consider several
important special types of linear transformations. In order to characterize
these spaces properly, we must make use of two important notions, that of
the norm of a vector and that of the inner product of two vectors (refer to
Section 3.6). In the real plane, these concepts are related to the length of a
vector and to the angle between two vectors, respectively. Before considering
the matter on hand, some preliminary remarks are in order.
To begin with, we would like to point out that from a strictly logical
point of view Euclidean spaces should actually be treated at a later point of
our development. This is so because these spaces are specific examples of


metric spaces (to be treated in the next chapter), of normed spaces (to be dealt
with in Chapter 6), and of inner product spaces (also to be considered in
Chapter 6). However, there are several good reasons for considering Euclidean
spaces and their properties at this point. These include: Euclidean spaces are
so important in applications that the reader should be exposed to them as
early as possible; these spaces and their properties will provide the motivation
for subsequent topics treated in this book; and the material covered in the
present section and in the next section (dealing with linear transformations
defined on Euclidean spaces) constitutes a natural continuation and conclusion of the topics considered thus far in the present chapter.
In order to provide proper motivation for the present section, it is useful to utilize certain facts from plane geometry to indicate the way. To this end let us consider the space R^2, and let x = (ξ_1, ξ_2) and y = (η_1, η_2) be vectors in R^2. Let {u_1, u_2} be the natural basis for R^2. Then the natural coordinate representation of x and y is

    x = [ ξ_1 ]   and   y = [ η_1 ]                        (4.9.1)
        [ ξ_2 ]             [ η_2 ],

respectively (see Example 4.1.15). The representation of these vectors in the plane is shown in Figure D. In this figure, |x|, |y|, and |x − y| denote the lengths of vectors x, y, and (x − y), respectively, and θ represents the angle between x and y.

4.9.2. Figure D. Length of vectors and angle between vectors.

The length of vector x is equal to (ξ_1^2 + ξ_2^2)^{1/2}, and the length of vector (x − y) is equal to {(ξ_1 − η_1)^2 + (ξ_2 − η_2)^2}^{1/2}. By convention, we say in this case that "the distance from x to y" is equal to {(ξ_1 − η_1)^2 + (ξ_2 − η_2)^2}^{1/2}, that "the distance from the origin 0 (the null vector) to x" is equal to (ξ_1^2 + ξ_2^2)^{1/2}, and the like. Using the notation of the present chapter, we have

    |x| = √(x^T x)                                         (4.9.3)

and

    |x − y| = √((x − y)^T (x − y)) = √((y − x)^T (y − x)) = |y − x|.   (4.9.4)

The angle θ between vectors x and y can easily be characterized by its cosine, namely,

    cos θ = (ξ_1 η_1 + ξ_2 η_2) / (√(ξ_1^2 + ξ_2^2) · √(η_1^2 + η_2^2)).   (4.9.5)

Utilizing the notation of the present chapter, we have

    cos θ = x^T y / (√(x^T x) · √(y^T y)).                 (4.9.6)

It turns out that the real-valued function x^T y, which we used in both Eqs. (4.9.3) and (4.9.6) to characterize the length of any vector x and the angle between any vectors x and y, is of fundamental importance. For this reason we denote it by a special symbol; i.e., we write

    (x, y) ≜ x^T y.                                        (4.9.7)

Now if we let x = y in Eq. (4.9.7), then in view of Eq. (4.9.3) we have

    |x| = √(x, x).                                         (4.9.8)

By inspection of Eq. (4.9.3) we note that

    (x, x) > 0 for all x ≠ 0                               (4.9.9)

and

    (x, x) = 0 for x = 0.                                  (4.9.10)

Also, from Eq. (4.9.7) we have

    (x, y) = (y, x)                                        (4.9.11)

for all x and y. Moreover, for any vectors x, y, and z and for any real scalars α and β we have, in view of Eq. (4.9.7), the relations

    (x + y, z) = (x, z) + (y, z),                          (4.9.12)
    (x, y + z) = (x, y) + (x, z),                          (4.9.13)
    (αx, y) = α(x, y),                                     (4.9.14)

and

    (x, αy) = α(x, y).                                     (4.9.15)

In connection with Eq. (4.9.6) we can make several additional observations. First, we note that if x = y, then cos θ = +1; if x = −y, then cos θ = −1; if x^T = (ξ_1, 0) and y^T = (0, η_2), then cos θ = 0; etc. It is easily verified, using Eq. (4.9.6), that cos θ assumes all values between +1 and −1; i.e., −1 ≤ cos θ ≤ +1.
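All of the above quantities are immediate to compute. A small numerical sketch (ours; Python with numpy, on two arbitrary plane vectors) of Eqs. (4.9.3)–(4.9.6):

    import numpy as np

    x = np.array([3.0, 0.0])
    y = np.array([0.0, 4.0])

    length_x  = np.sqrt(x @ x)                          # |x|, Eq. (4.9.3)
    dist_xy   = np.sqrt((x - y) @ (x - y))              # |x - y|, Eq. (4.9.4)
    cos_theta = (x @ y) / (np.sqrt(x @ x) * np.sqrt(y @ y))   # Eq. (4.9.6)

    print(length_x, dist_xy, cos_theta)                 # 3.0 5.0 0.0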
The above formulation agrees, of course, with our notions of length of a vector, distance between two vectors, and angle between two vectors. From Eqs. (4.9.9)–(4.9.15) it is also apparent that relation (4.9.7) satisfies all the axioms of an inner product (see Section 3.6).
Using the above discussion as motivation, let us now begin our treatment of Euclidean vector spaces.
First, we recall the definition of a real inner product: a bilinear functional f on a real vector space X is said to be an inner product on X if (i) f is symmetric and (ii) f is strictly positive. We also recall that a real vector space X on which an inner product is defined is called a real inner product space. We now have the following important

4.9.16. Definition. A real finite-dimensional vector space on which an inner product is defined is called a Euclidean space. A finite-dimensional vector space over the field of complex numbers on which an inner product is defined is called a unitary space.

We point out that some authors do not restrict Euclidean spaces to be finite dimensional.
Although many of the results for unitary spaces are essentially identical to those for Euclidean spaces, we postpone our treatment of complex inner product spaces until Chapter 6, where we consider spaces that, in general, may be infinite dimensional.
Throughout the remainder of the present section, X denotes an n-dimensional Euclidean space, unless otherwise specified. Since we will always be concerned with a given bilinear functional on X, we will henceforth write (x, y) in place of f(x, y) to denote the inner product of x and y. Finally, for purposes of completeness, we give a summary of the axioms of a real inner product. We have
(i) (x, x) > 0 for all x ≠ 0, and (x, x) = 0 if x = 0;
(ii) (x, y) = (y, x) for all x, y ∈ X;
(iii) (αx + βy, z) = α(x, z) + β(y, z) for all x, y, z ∈ X and all α, β ∈ R; and
(iv) (x, αy + βz) = α(x, y) + β(x, z) for all x, y, z ∈ X and all α, β ∈ R.
We note that Eqs. (4.9.9)–(4.9.15) are clearly in agreement with these axioms.

4.9.17. Theorem. The inner product (x, y) = 0 for all x ∈ X if and only if y = 0.

Proof. If y = 0, then y = 0·x, and (x, 0) = (x, 0·x) = 0·(x, x) = 0 for all x ∈ X.
On the other hand, let (x, y) = 0 for all x ∈ X. Then, in particular, it must be true that (x, y) = 0 if x = y. We thus have (y, y) = 0, which implies that y = 0. ■

The reader can prove the next results readily.

4.9.18. Corollary. Let A ∈ L(X, X). Then (x, Ay) = 0 for all x, y ∈ X if and only if A = 0.

4.9.19. Corollary. Let A, B ∈ L(X, X). If (x, Ay) = (x, By) for all x, y ∈ X, then A = B.

4.9.20. Corollary. Let A be a real (n × n) matrix. If x^T A y = 0 for all x, y ∈ R^n, then A = 0.

4.9.21. Exercise. Prove Corollaries 4.9.18–4.9.20.

Of crucial importance is the notion of norm. We have:

4.9.22. Definition. For each x ∈ X, let

    |x| = (x, x)^{1/2}.

We call |x| the norm of x.

Let us consider a specific case.

4.9.23. Example. Let X = R^n, and let x, y ∈ X, where x = (ξ_1, ..., ξ_n) and y = (η_1, ..., η_n). From Example 3.6.23 it follows that

    (x, y) = ∑_{i=1}^{n} ξ_i η_i                           (4.9.24)

is an inner product on X. The coordinate representations of x and y with respect to the natural basis in R^n are x = (ξ_1, ..., ξ_n)^T and y = (η_1, ..., η_n)^T, respectively (see Example 4.1.15). We thus have

    (x, y) = x^T y,                                        (4.9.25)

and

    |x| = ( ∑_{i=1}^{n} ξ_i^2 )^{1/2} = (x^T x)^{1/2}. ■   (4.9.26)

The above example gives rise to:

4.9.27. Definition. The vector space R^n with the inner product defined in Eq. (4.9.24) is denoted by E^n. The norm of x given by Eq. (4.9.26) is called the Euclidean norm on R^n.

Relation (4.9.29) of the next result is called the Schwarz inequality.

4.9.28. Theorem. Let x and y be any elements of X. Then

    |(x, y)| ≤ |x| · |y|,                                  (4.9.29)

where in Eq. (4.9.29) |(x, y)| denotes the absolute value of a real scalar and |x| denotes the norm of x.

Proof. For any x and y in X and for any real scalar α we have

    (x + αy, x + αy) = (x, x) + 2α(x, y) + α^2 (y, y) ≥ 0.

Now assume first that y ≠ 0, and let

    α = −(x, y)/(y, y).

Then

    (x + αy, x + αy) = (x, x) + 2α(x, y) + α^2 (y, y)
                     = (x, x) − 2(x, y)^2/(y, y) + (x, y)^2 (y, y)/(y, y)^2
                     = (x, x) − (x, y)^2/(y, y) ≥ 0,

or

    (x, x)(y, y) ≥ (x, y)^2.

Taking the square root of both sides, we have the desired inequality

    |(x, y)| ≤ |x| · |y|.

To complete the proof, consider the case y = 0. Then (x, y) = 0 and |y| = 0, and in this case the inequality follows trivially. ■

4.9.30. Exercise. For x, y ∈ X, show that

    |(x, y)| = |x| · |y|

if and only if x and y are linearly dependent.

In the next result we establish the axioms of a norm.

4.9.31. Theorem. For all x and y in X and for all real scalars α, the following hold:
(i) |x| > 0 unless x = 0, in which case |x| = 0;
(ii) |αx| = |α| · |x|, where |α| denotes the absolute value of the scalar α; and
(iii) |x + y| ≤ |x| + |y|.

Proof. The proof of part (i) follows from the definition of an inner product.
To prove part (ii), we note that

    |αx|^2 = (αx, αx) = α^2 (x, x) = |α|^2 |x|^2.

Taking the square root of both sides, we have the desired relation

    |αx| = |α| · |x|.

To verify the last part of the theorem, we note that

    |x + y|^2 = (x + y, x + y) = (x, x) + 2(x, y) + (y, y) = |x|^2 + 2(x, y) + |y|^2.

Using the Schwarz inequality we obtain

    |x + y|^2 ≤ |x|^2 + 2|x| · |y| + |y|^2 = (|x| + |y|)^2.

Taking the square root of both sides, we have

    |x + y| ≤ |x| + |y|,

which is the desired result. ■

Part (iii) of Theorem 4.9.31 is called the triangle inequality. Part (ii) is called the homogeneous property of a norm. In Chapter 6 we will define functions on general vector spaces satisfying axioms (i), (ii), and (iii) of Theorem 4.9.31 without making use of inner products. In such cases we will speak of normed linear spaces (Euclidean spaces are examples of normed linear spaces).
Our next result is called the parallelogram law. Its meaning in the plane is evident from Figure E.

4.9.32. Figure E. Interpretation of the parallelogram law.

4.9.33. Theorem. For all x, y ∈ X the equality

    |x + y|^2 + |x − y|^2 = 2|x|^2 + 2|y|^2

holds.

4.9.34. Exercise. Prove Theorem 4.9.33.

Generalizing Eq. (4.9.4), we define the distance between two vectors x and y of X as

    ρ(x, y) = |x − y|.                                     (4.9.35)

It is not difficult for the reader to prove the next result.

4.9.36. Theorem. For all x, y, z ∈ X, the following hold:
(i) ρ(x, y) = ρ(y, x);
(ii) ρ(x, y) ≥ 0, and ρ(x, y) = 0 if and only if x = y; and
(iii) ρ(x, y) ≤ ρ(x, z) + ρ(z, y).

A function ρ(x, y) having properties (i), (ii), and (iii) of Theorem 4.9.36 is called a metric. Without making use of inner products, we will in Chapter 5 define such functions on non-empty sets (not necessarily linear spaces), and we will in such cases speak of metric spaces (Euclidean spaces are examples of metric spaces).

4.9.37. Exercise. Prove Theorem 4.9.36.

B. Orthogonal Bases

Following further our discussion at the beginning of the present section, we now recall the important concept of orthogonality, using inner products. In accordance with Definition 3.6.22, two vectors x, y ∈ X are said to be orthogonal (to one another) if (x, y) = 0. We recall that this is written as x ⊥ y. From the discussion at the beginning of this section it is clear that in the plane, x ≠ 0 is orthogonal to y ≠ 0 if and only if the angle between x and y is some odd multiple of 90°.
The reader has undoubtedly encountered a special case of our next result, known as the Pythagorean theorem.

4.9.38. Theorem. Let x, y ∈ X. If x ⊥ y, then

    |x + y|^2 = |x|^2 + |y|^2.

Proof. Since by assumption x ⊥ y, we have (x, y) = 0. Thus,

    |x + y|^2 = (x + y, x + y) = (x, x) + (x, y) + (y, x) + (y, y) = |x|^2 + |y|^2,

which is the desired result. ■

4.9.39. Definition. A vector x ∈ X is said to be a unit vector if |x| = 1.

Let us choose any vector y ≠ 0 and let z = (1/|y|)y. Then the norm of z is

    |z| = |(1/|y|)y| = (1/|y|)|y| = 1;

i.e., z is a unit vector. This process is called normalizing the vector y.
Next, let {f_1, ..., f_n} be an arbitrary basis for X, and let F = [f_ij] denote the matrix of the inner product with respect to this basis; i.e., f_ij = (f_i, f_j) for all i and j. More specifically, F denotes the matrix of the bilinear functional f that is used in determining the inner product on X with respect to the indicated basis (see Definition 4.8.1). Let x and y denote the coordinate representations of x and y, respectively, with respect to {f_1, ..., f_n}. Then we have, by Theorem 4.8.2,

    (x, y) = x^T F y = y^T F x = ∑_{i=1}^{n} ∑_{j=1}^{n} f_ij ξ_i η_j.

Now by Theorems 4.8.20 and 4.8.23, since the inner product is symmetric and strictly positive, there exists a basis {e_1, ..., e_n} for X such that the matrix of the inner product with respect to this basis is the (n × n) identity matrix I; i.e.,

    (e_i, e_j) = δ_ij = { 1 if i = j,
                          0 if i ≠ j.

This motivates the following:

4.9.40. Definition. If {e_1, ..., e_n} is a basis for X such that (e_i, e_j) = 0 for all i ≠ j, i.e., if e_i ⊥ e_j for all i ≠ j, then {e_1, ..., e_n} is called an orthogonal basis. If, in addition, (e_i, e_i) = 1, i.e., if |e_i| = 1 for all i, then {e_1, ..., e_n} is said to be an orthonormal basis for X (thus, {e_1, ..., e_n} is orthonormal if and only if (e_i, e_j) = δ_ij).

Using the properties of inner products and the definitions of orthogonal and orthonormal bases, we are now in a position to establish several useful results.

4.9.41. Theorem. Let {e_1, ..., e_n} be an orthonormal basis for X. Let x and y be arbitrary vectors in X, and let the coordinate representations of x and y with respect to this basis be x^T = (ξ_1, ..., ξ_n) and y^T = (η_1, ..., η_n), respectively. Then

    (x, y) = ξ_1 η_1 + ... + ξ_n η_n = x^T y               (4.9.42)

and

    |x| = (x^T x)^{1/2} = √(ξ_1^2 + ... + ξ_n^2).          (4.9.43)

Proof. From the above discussion we have

    (x, y) = x^T F y = ∑_{i,j=1}^{n} f_ij ξ_i η_j = ∑_{i,j=1}^{n} δ_ij ξ_i η_j = ∑_{i=1}^{n} ξ_i η_i.

In particular, we have

    (x, x) = ∑_{i=1}^{n} ξ_i^2. ■
The reader should note that Eqs. (4.9.7) and (4.9.8) introduced at the beginning of this section are, of course, in agreement with Eqs. (4.9.42) and (4.9.43). (See also Example 4.9.23.)
Our next result enables us to determine the coordinates of a vector with respect to a given orthonormal basis.

4.9.44. Theorem. Let {e₁, ..., eₙ} be an orthonormal basis for X and let x be an arbitrary vector. The coordinates of x with respect to {e₁, ..., eₙ} are given by the formulas

ξ₁ = (x, e₁), ..., ξₙ = (x, eₙ).

Proof. Since x = ξ₁e₁ + ⋯ + ξₙeₙ, we have

(x, e₁) = (ξ₁e₁ + ⋯ + ξₙeₙ, e₁) = ξ₁(e₁, e₁) + ⋯ + ξₙ(eₙ, e₁) = ξ₁.

Repeating this procedure for (x, eᵢ), i = 2, ..., n, yields the desired result. ■
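In computational terms, Theorem 4.9.44 says that finding the coordinates of a vector relative to an orthonormal basis requires only n inner products. The following sketch (Python with NumPy, assuming the standard inner product on E³; the particular basis is merely an example) illustrates this:

```python
import numpy as np

# An orthonormal basis for E^3 (columns of E), here obtained by
# rotating the natural basis; any orthonormal set would do.
theta = 0.3
E = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])

x = np.array([2.0, -1.0, 4.0])

# Theorem 4.9.44: the i-th coordinate of x is the inner product (x, e_i).
xi = np.array([x @ E[:, i] for i in range(3)])

# Reconstructing x from its coordinates recovers the original vector.
assert np.allclose(sum(xi[i] * E[:, i] for i in range(3)), x)
```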

Let us consider some specific cases.

4.9.45. Example. Let X = E² (see Definition 4.9.27). Let x, y ∈ E², where x = (ξ₁, ξ₂) and y = (η₁, η₂). Then

(x, y) = ξ₁η₁ + ξ₂η₂.

The natural basis for E² is given by u₁ = (1, 0) and u₂ = (0, 1). Since (uᵢ, uⱼ) = δᵢⱼ, it follows that {u₁, u₂} is an orthonormal basis for E². Furthermore, we have ξ₁ = (x, u₁) and ξ₂ = (x, u₂). ■

4.9.46. Example. Let X = R², and let the inner product on R² be defined by

(x, y) = ξ₁η₁ + 4ξ₂η₂.        (4.9.47)

(The reader may verify that this is indeed an inner product.) Let {u₁, u₂} denote the natural basis for R²; i.e., u₁ = (1, 0) and u₂ = (0, 1). The matrix representation of the bilinear functional which determines the above inner product with respect to the basis {u₁, u₂} is given by

(x, y) = xᵀ [ 1  0 ] y,
            [ 0  4 ]

where x and y are the coordinate vectors of x and y with respect to {u₁, u₂}. We see that (u₁, u₂) = 1·0 + 4·0·1 = 0; i.e., u₁ and u₂ are orthogonal with respect to the inner product (4.9.47). Note however that |u₁| = 1 and |u₂| = 2; i.e., the vectors u₁ and u₂ are not orthonormal. Now let e₁ = (1, 0) and e₂ = (0, ½). Then it is readily verified that {e₁, e₂} is an orthonormal basis for X. Furthermore, for x = ξ₁′e₁ + ξ₂′e₂, we have

ξ₁′ = (x, e₁) and ξ₂′ = (x, e₂). If we let

x′ = [ ξ₁′ ]   and   y′ = [ η₁′ ]
     [ ξ₂′ ]              [ η₂′ ]

denote the coordinate representations of x and y, respectively, with respect to {e₁, e₂}, then

(x, y) = (x′)ᵀy′.

This illustrates the fact that the norm of a vector must be interpreted with respect to the inner product used in determining the norm. ■
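The computations of Example 4.9.46 can also be checked numerically. The sketch below (Python with NumPy; the function name is ours, not the text's) encodes the inner product (4.9.47) through the matrix F = diag(1, 4):

```python
import numpy as np

F = np.diag([1.0, 4.0])          # matrix of the inner product (4.9.47)
inner = lambda x, y: x @ F @ y   # (x, y) = x^T F y in natural coordinates

u1, u2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 0.5])

print(inner(u1, u2))             # 0.0 : u1 and u2 are orthogonal
print(np.sqrt(inner(u2, u2)))    # 2.0 : but |u2| = 2, so not orthonormal
print(inner(e1, e1), inner(e2, e2), inner(e1, e2))   # 1.0 1.0 0.0
```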

Our next result allows us to represent vectors in X in a convenient way.

4.9.48. Theorem. Let {e₁, ..., eₙ} be an orthogonal basis for X. Then for all x ∈ X we have

x = [(x, e₁)/(e₁, e₁)]e₁ + ⋯ + [(x, eₙ)/(eₙ, eₙ)]eₙ.

Proof. Normalizing e₁, ..., eₙ, we obtain the orthonormal basis {e₁′, ..., eₙ′}, where eᵢ′ = eᵢ/|eᵢ|, i = 1, ..., n. By Theorem 4.9.44 we have

x = (x, e₁′)e₁′ + ⋯ + (x, eₙ′)eₙ′
  = (x, e₁/|e₁|)(e₁/|e₁|) + ⋯ + (x, eₙ/|eₙ|)(eₙ/|eₙ|)
  = [(x, e₁)/|e₁|²]e₁ + ⋯ + [(x, eₙ)/|eₙ|²]eₙ
  = [(x, e₁)/(e₁, e₁)]e₁ + ⋯ + [(x, eₙ)/(eₙ, eₙ)]eₙ. ■

We are now in a position to characterize inner products by means of Parseval's identity, given in our next result.

4.9.49. Corollary. Let {e₁, ..., eₙ} be an orthogonal basis for X. Then for any x, y ∈ X we have

(x, y) = Σ_{i=1}^{n} (x, eᵢ)(y, eᵢ)/(eᵢ, eᵢ).

4.9.50. Exercise. Verify Corollary 4.9.49.
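A numerical sketch of Theorem 4.9.48 and Corollary 4.9.49, assuming the standard inner product on E³ and an orthogonal (not normalized) basis chosen purely for illustration:

```python
import numpy as np

# An orthogonal, but not normalized, basis for E^3.
e = [np.array([1.0, 1.0, 0.0]),
     np.array([1.0, -1.0, 0.0]),
     np.array([0.0, 0.0, 2.0])]

x = np.array([3.0, 1.0, -2.0])
y = np.array([0.5, 2.0, 1.0])

# Theorem 4.9.48: x = sum of (x, e_i)/(e_i, e_i) e_i.
expansion = sum((x @ ei) / (ei @ ei) * ei for ei in e)
assert np.allclose(expansion, x)

# Corollary 4.9.49 (Parseval): (x, y) = sum of (x, e_i)(y, e_i)/(e_i, e_i).
parseval = sum((x @ ei) * (y @ ei) / (ei @ ei) for ei in e)
assert np.isclose(parseval, x @ y)
```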

Our next result establishes the linear independence of orthogonal vectors. We have:

4.9.51. Theorem. Suppose that x₁, ..., x_k are mutually orthogonal non-zero vectors in X; i.e., xᵢ ⊥ xⱼ, i ≠ j. Then x₁, ..., x_k are linearly independent.

Proof. Assume that for real scalars α₁, ..., α_k we have

α₁x₁ + ⋯ + α_k x_k = 0.

For arbitrary i = 1, ..., k, we have

0 = (0, xᵢ) = (α₁x₁ + ⋯ + α_k x_k, xᵢ) = α₁(x₁, xᵢ) + ⋯ + α_k(x_k, xᵢ) = αᵢ(xᵢ, xᵢ);

i.e., αᵢ(xᵢ, xᵢ) = 0. This implies that αᵢ = 0 for arbitrary i, which proves the linear independence of x₁, ..., x_k. ■

Note that the converse to the above theorem is not true. We leave the
proofs of the next two results as an exercise.

4.9.52. Corollary. A set of k non-zero mutually orthogonal vectors is a basis for X if and only if k = dim X = n.

4.9.53. Corollary. For X there exist not more than n mutually orthonormal vectors. (In this case we speak of a complete orthonormal set of vectors.)

4.9.54. Exercise. Prove Corollaries 4.9.52 and 4.9.53.

Our next result, which is called the Gram-Schmidt process, allows us to construct an orthonormal basis from an arbitrary basis.

4.9.55. Theorem. Let {f₁, ..., fₙ} be an arbitrary basis for X. Set

g₁ = f₁,                              e₁ = g₁/|g₁|,
g₂ = f₂ − (f₂, e₁)e₁,                 e₂ = g₂/|g₂|,
⋮
gₙ = fₙ − Σ_{j=1}^{n−1} (fₙ, eⱼ)eⱼ,   eₙ = gₙ/|gₙ|.

Then {e₁, ..., eₙ} is an orthonormal basis for X.

4.9.56. Exercise. Prove Theorem 4.9.55. To accomplish this, show that (eᵢ, eⱼ) = 0 for i ≠ j, that |eᵢ| = 1 for i = 1, ..., n, and that {e₁, ..., eₙ} forms a basis for X.
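The construction in Theorem 4.9.55 is directly algorithmic. The following sketch (Python with NumPy, assuming the standard inner product and linearly independent input columns) implements the process as stated:

```python
import numpy as np

def gram_schmidt(F):
    """Orthonormalize the columns f_1, ..., f_n of F, as in Theorem 4.9.55:
    g_i = f_i - sum_{j<i} (f_i, e_j) e_j,   e_i = g_i / |g_i|."""
    n = F.shape[1]
    E = np.zeros_like(F, dtype=float)
    for i in range(n):
        g = F[:, i].astype(float)
        for j in range(i):
            g -= (F[:, i] @ E[:, j]) * E[:, j]   # subtract projections
        E[:, i] = g / np.linalg.norm(g)          # normalize
    return E

F = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
E = gram_schmidt(F)
assert np.allclose(E.T @ E, np.eye(3))   # columns are orthonormal
```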

The next result is a direct consequence of Theorem 4.9.55 and Theorem 3.3.44.

4.9.57. Corollary. If e₁, ..., e_k, k < n, are mutually orthogonal non-zero vectors in X, then we can find a set of vectors e_{k+1}, ..., eₙ such that the set {e₁, ..., eₙ} forms a basis for X.

Our next result is known as the Bessel inequality.



4.9.58. Theorem. If {x₁, ..., x_k} is an arbitrary set of mutually orthonormal vectors in X, then

Σ_{i=1}^{k} (x, xᵢ)² ≤ |x|²

for all x ∈ X. Moreover, the vector

y = x − Σ_{i=1}^{k} (x, xᵢ)xᵢ

is orthogonal to each xᵢ, i = 1, ..., k.

Proof. Let βᵢ = (x, xᵢ). We have

0 ≤ |x − Σ_{i=1}^{k} βᵢxᵢ|² = (x − Σ_{i=1}^{k} βᵢxᵢ, x − Σ_{j=1}^{k} βⱼxⱼ)
  = (x, x) − Σ_{i=1}^{k} βᵢβᵢ − Σ_{j=1}^{k} βⱼβⱼ + Σ_{i=1}^{k} Σ_{j=1}^{k} βᵢβⱼ(xᵢ, xⱼ).

Now since the vectors x₁, ..., x_k are mutually orthonormal, we have

0 ≤ |x|² − Σ_{i=1}^{k} βᵢ² = |x|² − Σ_{i=1}^{k} (x, xᵢ)²,

which proves the first part of the theorem.
To prove the second part, we note that

(y, xⱼ) = (x − Σ_{i=1}^{k} βᵢxᵢ, xⱼ) = (x, xⱼ) − βⱼ = 0, j = 1, ..., k. ■

In Theorem 4.9.58, let U denote the linear subspace of X which is spanned by the set of vectors {x₁, ..., x_k}. Then clearly each vector y defined in this theorem is orthogonal to each vector of U; i.e., y ⊥ U (see Definition 3.6.22).
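Theorem 4.9.58 is easy to confirm numerically. A sketch, assuming the standard inner product and an orthonormal set that does not span E³:

```python
import numpy as np

x1 = np.array([1.0, 0.0, 0.0])
x2 = np.array([0.0, 1.0, 0.0])
X = [x1, x2]                      # mutually orthonormal, k = 2 < n = 3

x = np.array([1.0, 2.0, 3.0])
coeffs = [x @ xi for xi in X]

# Bessel: sum of (x, x_i)^2 <= |x|^2  (here 1 + 4 = 5 <= 14).
assert sum(c**2 for c in coeffs) <= x @ x

# The residual y = x - sum (x, x_i) x_i is orthogonal to each x_i.
y = x - sum(c * xi for c, xi in zip(coeffs, X))
assert all(np.isclose(y @ xi, 0.0) for xi in X)
```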
Let us next consider:

4.9.59. Theorem. Let Y be a linear subspace of X, and let

Y^⊥ = {x ∈ X : (x, y) = 0 for all y ∈ Y}.        (4.9.60)

(i) Let {f₁, ..., f_k} span Y. Then x ∈ Y^⊥ if and only if x ⊥ fⱼ for j = 1, ..., k.
(ii) Y^⊥ is a linear subspace of X.
(iii) n = dim X = dim Y + dim Y^⊥.
(iv) (Y^⊥)^⊥ = Y.
(v) X = Y ⊕ Y^⊥.
(vi) Let x, y ∈ X. If x = x₁ + x₂ and y = y₁ + y₂, where x₁, y₁ ∈ Y and x₂, y₂ ∈ Y^⊥, then

(x, y) = (x₁, y₁) + (x₂, y₂)

and

|x| = √(|x₁|² + |x₂|²).
Proof. To prove the first part, note that if x ∈ Y^⊥, then x ⊥ f₁, ..., x ⊥ f_k, since fᵢ ∈ Y for i = 1, ..., k. On the other hand, let x ⊥ fᵢ, i = 1, ..., k. Then for any y ∈ Y there exist scalars ηᵢ, i = 1, ..., k, such that y = η₁f₁ + ⋯ + η_k f_k. Hence,

(x, y) = (x, Σ_{i=1}^{k} ηᵢfᵢ) = Σ_{i=1}^{k} ηᵢ(x, fᵢ) = 0.

Thus, x ∈ Y^⊥.
The remaining parts of the theorem are left as an exercise. ■

4.9.61. Exercise. Prove parts (ii) through (vi) of Theorem 4.9.59.

4.9.62. Definition. Let Y be a linear subspace of X. The subspace Y^⊥ defined in Eq. (4.9.60) is called the orthogonal complement of Y.
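In coordinates, an orthonormal basis of Y^⊥ can be computed from any spanning set of Y. The sketch below is illustrative only; it uses the singular value decomposition, a tool not discussed in this section:

```python
import numpy as np

def orthogonal_complement(F):
    """Orthonormal basis of Y^perp, where Y = span of the columns of F."""
    U, s, _ = np.linalg.svd(F, full_matrices=True)
    rank = int(np.sum(s > 1e-12))
    return U[:, rank:]            # remaining left singular vectors span Y^perp

F = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])        # Y is a plane in E^3
W = orthogonal_complement(F)
assert W.shape[1] == 1            # dim Y + dim Y^perp = 3 (Theorem 4.9.59(iii))
assert np.allclose(F.T @ W, 0.0)  # each basis vector of Y^perp is orthogonal to Y
```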

Before closing the present section we state and prove the following impor-
tant result.

4.9.63. Theorem. Let f be a linear functional on X. There exists a unique y ∈ X such that

f(x) = (x, y)        (4.9.64)

for all x ∈ X.

Proof. If f(x) = 0 for all x ∈ X, then y = 0 is the unique vector such that Eq. (4.9.64) is satisfied for all x ∈ X, by Theorem 4.9.17. So let us suppose that f(x) ≠ 0 for some x ∈ X, and let

𝔑 = {x ∈ X : f(x) = 0}.

Then 𝔑 is a linear subspace of X. Let 𝔑^⊥ be the orthogonal complement of 𝔑. Then it follows from Theorem 4.9.59 that X = 𝔑 ⊕ 𝔑^⊥. Furthermore, 𝔑^⊥ contains a non-zero vector. Let y₀ ∈ 𝔑^⊥ and, without loss of generality, let y₀ be chosen in such a fashion that |y₀| = 1. Now let y = f(y₀)y₀, and for any x ∈ X let x₀ = x − αy₀, where α = f(x)/f(y₀). Then f(x₀) = 0, and thus x₀ ∈ 𝔑. We now have x = x₀ + αy₀, and

(x, y) = (x₀, f(y₀)y₀) + (αy₀, f(y₀)y₀)
       = f(y₀)(x₀, y₀) + αf(y₀)(y₀, y₀)
       = αf(y₀) = f(x);

i.e., for all x ∈ X, f(x) = (x, y).
To show that y is unique, suppose that (x, y₁) = (x, y₂) for all x ∈ X. Then (x, y₁ − y₂) = 0 for all x ∈ X. But this implies that y₁ − y₂ = 0, or y₁ = y₂. This completes the proof of the theorem. ■
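Theorem 4.9.63 is constructive: relative to an orthonormal basis and the standard inner product, the representing vector y has coordinates f(e₁), ..., f(eₙ). A sketch, with a hypothetical functional f chosen purely for illustration:

```python
import numpy as np

# A linear functional on E^3, here f(x) = 2x_1 - x_2 + 5x_3.
f = lambda x: 2.0 * x[0] - 1.0 * x[1] + 5.0 * x[2]

# Representing vector: y_i = f(e_i) for the natural (orthonormal) basis,
# since f(x) = f(sum xi_i e_i) = sum xi_i f(e_i) = (x, y).
y = np.array([f(e) for e in np.eye(3)])

x = np.array([0.3, -1.2, 2.0])
assert np.isclose(f(x), x @ y)    # f(x) = (x, y) for all x
```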
4.10. LINEAR TRANSFORMATIONS
ON EUCLIDEAN VECTOR SPACES

A. Orthogonal Transformations

In the present section we concern ourselves with special types of linear transformations defined on Euclidean vector spaces. We will have occasion to reconsider similar types of transformations again in Chapter 7, in a much more general setting. Unless otherwise specified, X will denote an n-dimensional Euclidean vector space throughout the present section.
The first special type of linear transformation defined on Euclidean vector spaces which we consider is the so-called "orthogonal transformation." Let {e₁, ..., eₙ} be an orthonormal basis for X, let

eᵢ′ = Σ_{j=1}^{n} pⱼᵢeⱼ, i = 1, ..., n,

and let P denote the matrix determined by the real scalars pᵢⱼ. The following question arises: when is the set {e₁′, ..., eₙ′} also an orthonormal basis for X? To determine the desired properties of P, we consider

(eᵢ′, eⱼ′) = (Σ_{k=1}^{n} pₖᵢeₖ, Σ_{l=1}^{n} pₗⱼeₗ) = Σ_{k=1}^{n} Σ_{l=1}^{n} pₖᵢpₗⱼ(eₖ, eₗ).

In order that (eᵢ′, eⱼ′) = 0 for i ≠ j and (eᵢ′, eⱼ′) = 1 for i = j, we require that

(eᵢ′, eⱼ′) = Σ_{k=1}^{n} Σ_{l=1}^{n} pₖᵢpₗⱼδₖₗ = Σ_{k=1}^{n} pₖᵢpₖⱼ = δᵢⱼ;

i.e., we require that

PᵀP = I,

where, as usual, I denotes the n × n identity matrix. We summarize.

4.10.1. Theorem. Let {e₁, ..., eₙ} be an orthonormal basis for X. Let eᵢ′ = Σ_{j=1}^{n} pⱼᵢeⱼ, i = 1, ..., n. Then {e₁′, ..., eₙ′} is an orthonormal basis for X if and only if Pᵀ = P⁻¹.

This result gives rise to the following:

4.10.2. Definition. A matrix P such that Pᵀ = P⁻¹, i.e., such that PᵀP = P⁻¹P = I, is called an orthogonal matrix.

4.10.3. Exercise. Show that if P is an orthogonal matrix, then either det P = 1 or det P = −1. Also, show that if P and Q are (n × n) orthogonal matrices, then so is PQ.
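A numerical check of Definition 4.10.2 and Exercise 4.10.3 (a sketch; the rotation and reflection matrices below are merely convenient examples of orthogonal matrices):

```python
import numpy as np

def is_orthogonal(P, tol=1e-12):
    """Definition 4.10.2: P is orthogonal iff P^T P = I."""
    return np.allclose(P.T @ P, np.eye(P.shape[0]), atol=tol)

t = 0.7
P = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])       # a rotation
Q = np.array([[1.0, 0.0], [0.0, -1.0]])       # a reflection

assert is_orthogonal(P) and is_orthogonal(Q)
print(np.linalg.det(P), np.linalg.det(Q))     # 1.0 and -1.0
assert is_orthogonal(P @ Q)                   # products stay orthogonal
```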

The nomenclature used in our next definition will become clear shortly.


4.10.4. Definition. A linear transformation A from X into X is called an orthogonal linear transformation if (Ax, Ay) = (x, y) for all x, y ∈ X.

Let us now establish some of the properties of orthogonal transformations.

4.10.5. Theorem. Let A ∈ L(X, X). Then A is orthogonal if and only if |Ax| = |x| for all x ∈ X.

Proof. If A is orthogonal, then (Ax, Ax) = (x, x) and |Ax| = |x|. Conversely, if |Ax| = |x| for all x ∈ X, then

|A(x + y)|² = (A(x + y), A(x + y)) = (Ax + Ay, Ax + Ay)
            = |Ax|² + 2(Ax, Ay) + |Ay|²
            = |x|² + 2(Ax, Ay) + |y|².

Also,

|A(x + y)|² = |x + y|² = (x + y, x + y) = |x|² + 2(x, y) + |y|²,

and therefore

(Ax, Ay) = (x, y)

for all x, y ∈ X. ■

We note that if A is an orthogonal linear transformation, then x ⊥ y for all x, y ∈ X if and only if Ax ⊥ Ay. For (x, y) = 0 if and only if (Ax, Ay) = 0.

4.10.6. Corollary. Every orthogonal linear transformation of X into X is non-singular.

Proof. Let Ax = 0. Then |Ax| = |x| = 0. Thus, x = 0 and A is non-singular. ■

Our next result establishes the link between Definitions 4.10.2 and 4.10.4.

4.10.7. Theorem. Let {e₁, ..., eₙ} be an orthonormal basis for X. Let A ∈ L(X, X), and let A be the matrix of A with respect to this basis. Then A is orthogonal if and only if A is orthogonal.

Proof. Let x and y be arbitrary vectors in X, and let x and y denote their coordinate representations, respectively, with respect to the basis {e₁, ..., eₙ}. Then Ax and Ay denote the coordinate representations of Ax and Ay, respectively, with respect to this basis. Now,

(Ax, Ay) = (Ax)ᵀ(Ay) = xᵀAᵀAy,

and

(x, y) = xᵀy.

Now suppose that the matrix A is orthogonal. Then AᵀA = I and (Ax, Ay) = xᵀAᵀAy = xᵀy = (x, y) for all x, y ∈ X. On the other hand, if the transformation A is orthogonal, then (Ax, Ay) = xᵀAᵀAy = xᵀy = (x, y) for all x, y ∈ X. Thus, xᵀ(AᵀA − I)y = 0. Since this holds for all x, y ∈ X, we conclude from Corollary 4.9.20 that AᵀA − I = 0; i.e., AᵀA = I. ■
The next two results are left as an exercise.

4.10.8. Corollary. Let A ∈ L(X, X). If A is orthogonal, then det A = ±1.

4.10.9. Corollary. Let A, B ∈ L(X, X). If A and B are orthogonal transformations, then AB is also an orthogonal linear transformation.

4.10.10. Exercise. Prove Corollaries 4.10.8 and 4.10.9.

For reasons that will become apparent later, we introduce the following convention.

4.10.11. Definition. Let A ∈ L(X, X) be an orthogonal linear transformation. If det A = +1, then A is called a rotation. If det A = −1, then A is called a reflection.

B. Adjoint Transformations

The next important class of linear transformations on Euclidean spaces which we consider are the so-called adjoint linear transformations. Our next result enables us to introduce such transformations in a natural way.

4.10.12. Theorem. Let G ∈ L(X, X) and define g: X × X → R by g(x, y) = (x, Gy) for all x, y ∈ X. Then g is a bilinear functional on X. Moreover, if {e₁, ..., eₙ} is an orthonormal basis for X, then the matrix of g with respect to this basis, denoted by G, is the matrix of G with respect to {e₁, ..., eₙ}. Conversely, given an arbitrary bilinear functional g defined on X, there exists a unique linear transformation G ∈ L(X, X) such that (x, Gy) = g(x, y) for all x, y ∈ X.

Proof. Let G ∈ L(X, X), and let g(x, y) = (x, Gy). Then

g(x₁ + x₂, y) = (x₁ + x₂, Gy) = (x₁, Gy) + (x₂, Gy) = g(x₁, y) + g(x₂, y).

Also,

g(x, y₁ + y₂) = (x, G(y₁ + y₂)) = (x, Gy₁ + Gy₂) = (x, Gy₁) + (x, Gy₂)
             = g(x, y₁) + g(x, y₂).

Furthermore,

g(αx, y) = (αx, Gy) = α(x, Gy) = αg(x, y),

and

g(x, αy) = (x, G(αy)) = (x, αG(y)) = α(x, Gy) = αg(x, y),

where α is a real scalar. Therefore, g is a bilinear functional.
Next, let {e₁, ..., eₙ} be an orthonormal basis for X. Then the matrix G of g with respect to this basis is determined by the elements gᵢⱼ = g(eᵢ, eⱼ). Now let G′ = [gᵢⱼ′] be the matrix of G with respect to {e₁, ..., eₙ}. Then Geⱼ = Σ_{k=1}^{n} g′ₖⱼeₖ for j = 1, ..., n. Hence, (eᵢ, Geⱼ) = (eᵢ, Σ_{k=1}^{n} g′ₖⱼeₖ) = g′ᵢⱼ. Since gᵢⱼ = g(eᵢ, eⱼ) = (eᵢ, Geⱼ) = g′ᵢⱼ, it follows that G′ = G; i.e., G is the matrix of G.
To prove the last part of the theorem, choose any orthonormal basis {e₁, ..., eₙ} for X. Given a bilinear functional g defined on X, let G = [gᵢⱼ] denote its matrix with respect to this basis, and let G be the linear transformation corresponding to G. Then (x, Gy) = g(x, y) by the identical argument given above. Finally, since the matrix of the bilinear functional and the matrix of the linear transformation were determined independently, this correspondence is unique. ■

It should be noted that the correspondence between bilinear functionals and linear transformations determined by the relation (x, Gy) = g(x, y) for all x, y ∈ X does not depend on the particular basis chosen for X; however, it does depend on the way the inner product is chosen for X at the outset.
Now let G ∈ L(X, X), set g(x, y) = (x, Gy), and let h(x, y) = g(y, x) = (y, Gx) = (Gx, y). By Theorem 4.10.12, there exists a unique linear transformation, denote it by G*, such that h(x, y) = (x, G*y) for all x, y ∈ X. We call the linear transformation G* ∈ L(X, X) the adjoint of G.

4.10.13. Theorem
(i) For each G ∈ L(X, X), there is a unique G* ∈ L(X, X) such that (x, G*y) = (Gx, y) for all x, y ∈ X.
(ii) Let {e₁, ..., eₙ} be an orthonormal basis for X, and let G be the matrix of the linear transformation G ∈ L(X, X) with respect to this basis. Let G* be the matrix of G* with respect to {e₁, ..., eₙ}. Then G* = Gᵀ.

Proof. The proof of the first part follows from the discussion preceding the present theorem.
To prove the second part, let {e₁, ..., eₙ} be an orthonormal basis for X, and let G* denote the matrix of G* with respect to this basis. Let x and y be the coordinate representations of x and y, respectively, with respect to this basis. Then

(x, G*y) = xᵀG*y = (Gx, y) = (Gx)ᵀy = xᵀGᵀy.

Thus, for all x and y we have xᵀ(G* − Gᵀ)y = 0. Hence, G* = Gᵀ. ■

The above result allows the following equivalent definition of the adjoint
linear transformation.

4.10.14. Definition. Let G ∈ L(X, X). The adjoint transformation G* is defined by the formula

(x, G*y) = (Gx, y)

for all x, y ∈ X.
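With an orthonormal basis and the standard inner product, Theorem 4.10.13 reduces the computation of the adjoint to transposition. A sketch with randomly generated data:

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.standard_normal((4, 4))   # matrix of G in an orthonormal basis
G_star = G.T                      # Theorem 4.10.13: G* = G^T

x = rng.standard_normal(4)
y = rng.standard_normal(4)

# Definition 4.10.14: (x, G*y) = (Gx, y) for all x, y.
assert np.isclose(x @ (G_star @ y), (G @ x) @ y)
```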

Although there is obviously great similarity between the adjoint linear transformation and the transpose of a linear transformation, it should be noted that these two transformations constitute different concepts. The differences between them will become more apparent in our subsequent discussion of linear transformations defined on complex vector spaces in Chapter 7.
Our next result includes some of the elementary properties of the adjoint of linear transformations. The reader should compare these with the properties of the transpose of linear transformations.

4.10.15. Theorem. Let A, B ∈ L(X, X), let A*, B* denote their respective adjoints, and let α be a real scalar. Then
(i) (A*)* = A;
(ii) (A + B)* = A* + B*;
(iii) (αA)* = αA*;
(iv) (AB)* = B*A*;
(v) I* = I, where I denotes the identity transformation;
(vi) 0* = 0, where 0 denotes the null transformation;
(vii) A is non-singular if and only if A* is non-singular; and
(viii) if A is non-singular, then (A*)⁻¹ = (A⁻¹)*.

4.10.16. Exercise. Prove Theorem 4.10.15.

Our next result enables us to characterize orthogonal transformations in terms of their adjoints.

4.10.17. Theorem. Let A ∈ L(X, X). Then A is orthogonal if and only if A* = A⁻¹.

Proof. We have (Ax, Ay) = (A*Ax, y). But A is orthogonal if and only if (Ax, Ay) = (x, y) for all x, y ∈ X. Therefore,

(A*Ax, y) = (x, y)

for all x and y. From this it follows that A*A = I, which implies that A* = A⁻¹. ■

The proof of the next theorem is left as an exercise.

4.10.18. Theorem. Let A ∈ L(X, X). Then A is orthogonal if and only if A⁻¹ is orthogonal, and A⁻¹ is orthogonal if and only if A* is orthogonal.

4.10.19. Exercise. Prove Theorem 4.10.18.

C. Self-Adjoint Transformations

Using adjoints, we now introduce two additional important types of linear transformations.

4.10.20. Definition. Let A ∈ L(X, X). Then A is said to be self-adjoint if A* = A, and it is said to be skew-adjoint if A* = −A.

Some of the properties of such transformations are as follows.

4.10.21. Theorem. Let A ∈ L(X, X). Let {e₁, ..., eₙ} be an orthonormal basis for X, and let A be the matrix of A with respect to this basis. The following are equivalent:
(i) A is self-adjoint;
(ii) A is symmetric; and
(iii) (Ax, y) = (x, Ay) for all x, y ∈ X.

4.10.22. Theorem. Let A ∈ L(X, X), and let {e₁, ..., eₙ} be an orthonormal basis for X. Let A be the matrix of A with respect to this basis. The following are equivalent:
(i) A is skew-adjoint;
(ii) A is skew-symmetric (see Definition 4.8.8); and
(iii) (Ax, y) = −(x, Ay) for all x, y ∈ X.

4.10.23. Exercise. Prove Theorems 4.10.21 and 4.10.22.

The following corollary follows from part (iii) of Theorem 4.10.22.

4.10.24. Corollary. Let A be as defined in Theorem 4.10.22. Then the following are equivalent:
(i) A is skew-symmetric;
(ii) (x, Ax) = 0 for all x ∈ X; and
(iii) Ax ⊥ x for all x ∈ X.

Our next result enables us to represent arbitrary linear transformations


as the sum of self-adjoint and skew-adjoint transformations.

4.10.25. Corollary. Let A ∈ L(X, X). Then there exist unique A₁, A₂ ∈ L(X, X) such that A = A₁ + A₂, where A₁ is self-adjoint and A₂ is skew-adjoint.

4.10.26. Exercise. Prove Corollaries 4.10.24 and 4.10.25.

4.10.27. Exercise. Show that every real n × n matrix can be written in one and only one way as the sum of a symmetric and a skew-symmetric matrix.
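The decomposition in Corollary 4.10.25 and Exercise 4.10.27 is explicit: the symmetric part is ½(A + Aᵀ) and the skew-symmetric part is ½(A − Aᵀ). A sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))

A1 = 0.5 * (A + A.T)              # symmetric part (self-adjoint)
A2 = 0.5 * (A - A.T)              # skew-symmetric part (skew-adjoint)

assert np.allclose(A1, A1.T)
assert np.allclose(A2, -A2.T)
assert np.allclose(A1 + A2, A)    # the decomposition recovers A uniquely
```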

Our next result is applicable to real as well as complex vector spaces.

4.10.28. Theorem. Let X be a complex vector space. Then the eigenvalues of a real symmetric matrix A are all real. (If all eigenvalues of A are positive (negative), then A is called positive (negative) definite.)

Proof. Let λ = r + is denote an eigenvalue of A, where r and s are real numbers and where i = √−1. We must show that s = 0.
Since λ is an eigenvalue we know that the matrix (A − λI) is singular. So is the matrix

B = [A − (r + is)I][A − (r − is)I]
  = A² − (r + is)A − (r − is)A + (r + is)(r − is)I
  = A² − 2rA + (r² + s²)I = (A − rI)² + s²I.

Since B is singular, there exists an x ≠ 0 such that Bx = 0. Also,

0 = xᵀBx = xᵀ[(A − rI)² + s²I]x = xᵀ(A − rI)²x + s²xᵀx.

Since A and I are symmetric,

(A − rI)ᵀ = Aᵀ − rIᵀ = A − rI.

Therefore,

0 = xᵀ(A − rI)ᵀ(A − rI)x + s²xᵀx;

i.e.,

0 = yᵀy + s²xᵀx,

where y = (A − rI)x. Now yᵀy = Σᵢ ηᵢ² ≥ 0 and xᵀx = Σᵢ ξᵢ² > 0, because by assumption x ≠ 0. Thus, we have

0 = yᵀy + s²xᵀx ≥ s²xᵀx ≥ 0.

The only way that this last relation can hold is if s = 0. Therefore, λ = r, and λ is real. ■

Now let A be the matrix of the linear transformation A ∈ L(X, X) with respect to some basis. If A is symmetric, then all its eigenvalues are real. In this case A is self-adjoint and all its eigenvalues are also real; in fact, the eigenvalues of A and A are identical. Thus, there exist unique real scalars λ₁, ..., λ_p, p ≤ n, such that

det (A − λI) = det (A − λI) = (λ₁ − λ)^{m₁}(λ₂ − λ)^{m₂} ⋯ (λ_p − λ)^{m_p}.        (4.10.29)

We summarize these observations in the following:

4.10.30. Corollary. Let A ∈ L(X, X). If A is self-adjoint, then all eigenvalues of A are real and there exist unique real numbers λ₁, ..., λ_p, p ≤ n, such that Eq. (4.10.29) holds.

As in Section 4.5, we say that in Corollary 4.10.30 the eigenvalues λᵢ, i = 1, ..., p ≤ n, have algebraic multiplicities mᵢ, i = 1, ..., p, respectively.
Another direct consequence of Theorem 4.10.28 is the following result.

4.10.31. Corollary. Let A ∈ L(X, X). If A is self-adjoint, then A has at least one eigenvalue.

4.10.32. Exercise. Prove Corollary 4.10.31.

Let us now examine some of the properties of the eigenvalues and eigenvectors of self-adjoint linear transformations. First, we have:

4.10.33. Theorem. Let A ∈ L(X, X) be a self-adjoint transformation, and let λ₁, ..., λ_p, p ≤ n, denote the distinct eigenvalues of A. If xᵢ is an eigenvector for λᵢ, and if xⱼ is an eigenvector for λⱼ, then xᵢ ⊥ xⱼ for all i ≠ j.

Proof. Assume that λᵢ ≠ λⱼ, and consider Axᵢ = λᵢxᵢ and Axⱼ = λⱼxⱼ, where xᵢ ≠ 0 and xⱼ ≠ 0. We have

λᵢ(xᵢ, xⱼ) = (λᵢxᵢ, xⱼ) = (Axᵢ, xⱼ) = (xᵢ, Axⱼ) = (xᵢ, λⱼxⱼ) = λⱼ(xᵢ, xⱼ).

Thus,

(λᵢ − λⱼ)(xᵢ, xⱼ) = 0.

Since λᵢ ≠ λⱼ, we have (xᵢ, xⱼ) = 0, which means xᵢ ⊥ xⱼ. ■

Now let A ∈ L(X, X), and let λᵢ be an eigenvalue of A. Recall that 𝔑ᵢ denotes the null space of the linear transformation A − λᵢI, i.e.,

𝔑ᵢ = {x ∈ X : (A − λᵢI)x = 0}.        (4.10.34)

Recall also that 𝔑ᵢ is a linear subspace of X. From Theorem 4.10.33 we now have immediately:

4.10.35. Corollary. Let A ∈ L(X, X) be a self-adjoint transformation, and let λᵢ and λⱼ be eigenvalues of A. If λᵢ ≠ λⱼ, then 𝔑ᵢ ⊥ 𝔑ⱼ.

4.10.36. Exercise. Prove Corollary 4.10.35.

Making use of Theorem 4.9.59, we now prove the following important result.

4.10.37. Theorem. Let A ∈ L(X, X) be a self-adjoint transformation, and let λ₁, ..., λ_p, p ≤ n, denote the distinct eigenvalues of A. Then

dim X = n = dim 𝔑₁ + dim 𝔑₂ + ⋯ + dim 𝔑_p.

Proof. Let dim 𝔑ᵢ = nᵢ, and let {e₁, ..., e_{n₁}} be an orthonormal basis for 𝔑₁. Next, let {e_{n₁+1}, ..., e_{n₁+n₂}} be an orthonormal basis for 𝔑₂. We continue in this manner, finally letting {e_{n₁+⋯+n_{p−1}+1}, ..., e_{n₁+⋯+n_p}} be an orthonormal basis for 𝔑_p. Let n₁ + ⋯ + n_p = m. Since 𝔑ᵢ ⊥ 𝔑ⱼ, i ≠ j, it follows that the vectors e₁, ..., e_m, relabeled in an obvious way, are orthonormal in X. We can conclude, by Corollary 4.9.52, that these vectors are a basis for X, if we can prove that m = n.
Let Y be the linear subspace of X generated by the orthonormal vectors e₁, ..., e_m. Then {e₁, ..., e_m} is an orthonormal basis for Y and dim Y = m. Since dim Y + dim Y^⊥ = dim X = n (see Theorem 4.9.59), we need only prove that dim Y^⊥ = 0. To this end let x be an arbitrary vector in Y^⊥. Then (x, e₁) = 0, ..., (x, e_m) = 0; i.e., x ⊥ e₁, ..., x ⊥ e_m, by Theorem 4.9.59. So, in particular, again by Theorem 4.9.59, we have x ⊥ 𝔑ᵢ, i = 1, ..., p. Now let y be in 𝔑ᵢ. Then

(Ax, y) = (x, Ay) = (x, λᵢy) = λᵢ(x, y) = 0,

since A is self-adjoint, since y is in 𝔑ᵢ, and since x ⊥ 𝔑ᵢ. Thus, Ax ⊥ 𝔑ᵢ for i = 1, ..., p, and again by Theorem 4.9.59, Ax ⊥ eᵢ, i = 1, ..., m. Thus, by Theorem 4.9.59, Ax ⊥ Y. Therefore, for each x ∈ Y^⊥ we also have Ax ∈ Y^⊥. Hence, A induces a linear transformation, say A′, from Y^⊥ into Y^⊥, where A′x = Ax for all x ∈ Y^⊥. Now A′ is a self-adjoint linear transformation from Y^⊥ into Y^⊥, because for all x and y in Y^⊥ we have

(A′x, y) = (Ax, y) = (x, Ay) = (x, A′y).

Assume now that dim Y^⊥ > 0. Then by Corollary 4.10.31, A′ has an eigenvalue, say λ₀, and a corresponding eigenvector x₀ ≠ 0. Thus, x₀ ≠ 0 is in Y^⊥ and A′x₀ = Ax₀ = λ₀x₀; i.e., λ₀ is also an eigenvalue of A, say λ₀ = λᵢ. So now it follows that x₀ ∈ 𝔑ᵢ. But from above, x₀ ∈ Y^⊥, which means x₀ ⊥ 𝔑ᵢ. This implies that x₀ ⊥ x₀, or (x₀, x₀) = 0, which in turn implies that x₀ = 0. But this contradicts our earlier assumption that x₀ ≠ 0. Hence, we have arrived at a contradiction, and it therefore follows that dim Y^⊥ = 0. This proves the theorem. ■

Our next result is a direct consequence of Theorem 4.10.37.

4.10.38. Corollary. Let A ∈ L(X, X). If A is self-adjoint, then
(i) there exists an orthonormal basis in X such that the matrix of A with respect to this basis is diagonal; and
(ii) for each eigenvalue λᵢ of A we have dim 𝔑ᵢ = multiplicity of λᵢ.

Proof. As in the proof of Theorem 4.10.37 we choose an orthonormal basis {e₁, ..., e_m}, where m = n. We have Ae₁ = λ₁e₁, ..., Ae_{n₁} = λ₁e_{n₁}, Ae_{n₁+1} = λ₂e_{n₁+1}, ..., Ae_{n₁+⋯+n_p} = λ_p e_{n₁+⋯+n_p}. Thus, the matrix A of A with respect to {e₁, ..., eₙ} is the block diagonal matrix

A = [ λ₁I_{n₁}                          0 ]
    [           λ₂I_{n₂}                  ]
    [                      ⋱            ]
    [ 0                        λ_pI_{n_p} ],

where I_{nᵢ} denotes the nᵢ × nᵢ identity matrix; i.e., λᵢ is repeated nᵢ times along the diagonal.
To prove the second part, we note that the characteristic polynomial of A is

det (A − λI) = det (A − λI) = (λ₁ − λ)^{n₁}(λ₂ − λ)^{n₂} ⋯ (λ_p − λ)^{n_p},

and, hence, nᵢ = dim 𝔑ᵢ = multiplicity of λᵢ, i = 1, ..., p. ■

Another consequence of Theorem 4.10.37 is the following:

4.10.39. Corollary. Let A be a real (n × n) symmetric matrix. Then there exists an orthogonal matrix P such that the matrix A′ defined by

A′ = P⁻¹AP = PᵀAP

is diagonal.

4.10.40. Exercise. Prove Corollary 4.10.39.
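Corollary 4.10.39 is exactly what numerical symmetric eigensolvers deliver: they return the orthogonal matrix P directly. A sketch using NumPy's eigh (the example matrix is arbitrary):

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])           # real symmetric

eigvals, P = np.linalg.eigh(A)            # columns of P: orthonormal eigenvectors
assert np.allclose(P.T @ P, np.eye(3))    # P is orthogonal
assert np.allclose(P.T @ A @ P, np.diag(eigvals))   # A' = P^T A P is diagonal
```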

For symmetric bilinear functionals defined on Euclidean vector spaces we have the following result.

4.10.41. Corollary. Let f(x, y) be a symmetric bilinear functional on X. Then there exists an orthonormal basis for X such that the matrix of f with respect to this basis is diagonal.

Proof. By Theorem 4.10.12 there exists an F ∈ L(X, X) such that f(x, y) = (x, Fy) for all x, y ∈ X. Since f is symmetric, (x, Fy) = f(x, y) = f(y, x) = (y, Fx) = (Fx, y) for all x, y ∈ X, and thus, by Theorem 4.10.21, F is self-adjoint. Hence, by Corollary 4.10.38, there is an orthonormal basis for X such that the matrix of F is diagonal. By Theorem 4.10.12, this matrix is also the representation of f with respect to the same basis. ■
The proof of the next result is left as an exercise.

4.10.42. Corollary. Let f̂(x) be a quadratic form defined on X. Then there exists an orthonormal basis for X such that if xᵀ = (ξ₁, ..., ξₙ) is the coordinate representation of x with respect to this basis, then

f̂(x) = α₁ξ₁² + ⋯ + αₙξₙ²

for some real scalars α₁, ..., αₙ.
4.10.43. Exercise. Prove Corollary 4.10.42.

Next, we state and prove the spectral theorem for self-adjoint linear transformations. First, we recall that a transformation P ∈ L(X, X) is a projection on a linear subspace of X if and only if P² = P (see Theorem 3.7.4). Also, for any projection P, X = ℜ(P) ⊕ 𝔑(P), where ℜ(P) is the range of P and 𝔑(P) is the null space of P (see Eq. (3.7.8)). Furthermore, recall that a projection P is called an orthogonal projection if ℜ(P) ⊥ 𝔑(P) (see Definition 3.7.16).

4.10.44. Theorem. Let A ∈ L(X, X) be a self-adjoint transformation, let λ₁, ..., λ_p denote the distinct eigenvalues of A, and let 𝔑ᵢ be the null space of A − λᵢI (see Eq. (4.10.34)). For each i = 1, ..., p, let Pᵢ denote the projection on 𝔑ᵢ along 𝔑ᵢ^⊥. Then
(i) Pᵢ is an orthogonal projection for each i = 1, ..., p;
(ii) PᵢPⱼ = 0 for i ≠ j, i, j = 1, ..., p;
(iii) Σ_{j=1}^{p} Pⱼ = I, where I ∈ L(X, X) denotes the identity transformation; and
(iv) A = Σ_{j=1}^{p} λⱼPⱼ.

Proof. To prove the first part, note that X = 𝔑ᵢ ⊕ 𝔑ᵢ^⊥, i = 1, ..., p, by Theorem 4.9.59. Thus, by Theorem 3.7.3, ℜ(Pᵢ) = 𝔑ᵢ and 𝔑(Pᵢ) = 𝔑ᵢ^⊥, and hence, Pᵢ is an orthogonal projection.
To prove the second part, let i ≠ j and let x ∈ X. Then Pⱼx ≜ xⱼ ∈ 𝔑ⱼ. Since ℜ(Pᵢ) = 𝔑ᵢ and since 𝔑ᵢ ⊥ 𝔑ⱼ, we must have xⱼ ∈ 𝔑(Pᵢ); i.e., PᵢPⱼx = 0 for all x ∈ X.
To prove the third part, let P = Σ_{i=1}^{p} Pᵢ. We must show that P = I. To do so, we first show that P is a projection. This follows immediately from the fact that for arbitrary x ∈ X, P²x = (P₁ + ⋯ + P_p)(P₁x + ⋯ + P_px) = P₁²x + ⋯ + P_p²x, because PᵢPⱼ = 0 for i ≠ j. Hence, P²x = (P₁ + ⋯ + P_p)x = Px, and thus P is a projection. Next, we show that dim [ℜ(P)] = n. It is straightforward to show that

dim [ℜ(P)] = Σ_{i=1}^{p} dim [𝔑ᵢ].

But by Theorem 4.10.37, Σ_{i=1}^{p} dim [𝔑ᵢ] = n, and thus dim [ℜ(P)] = n. Since X = ℜ(P) ⊕ 𝔑(P), we conclude that ℜ(P) = X. Finally, since P is a projection with range X, we conclude that Px = x for all x ∈ X; i.e., P = I.
To prove the last part of the theorem, let x ∈ X. From part (iii) we have

x = P₁x + P₂x + ⋯ + P_px.

Let xᵢ = Pᵢx for i = 1, ..., p. Then xᵢ ∈ 𝔑ᵢ and Axᵢ = λᵢxᵢ. Hence,

Ax = A(x₁ + ⋯ + x_p) = Ax₁ + ⋯ + Ax_p = λ₁x₁ + ⋯ + λ_px_p
   = λ₁P₁x + ⋯ + λ_pP_px = (λ₁P₁ + ⋯ + λ_pP_p)x,

which concludes the proof of the theorem. ■

Any set of linear transformations {P₁, ..., P_p} satisfying parts (i)-(iii) of Theorem 4.10.44 is said to be a resolution of the identity in the setting of a Euclidean space. We shall give a more general definition of this concept in Chapter 7.
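The projections Pᵢ of Theorem 4.10.44 can be assembled from an orthonormal basis of eigenvectors: Pᵢ is the sum of vvᵀ over the eigenvectors v belonging to λᵢ. The sketch below (standard inner product; eigenvalues grouped with a numerical tolerance) verifies properties (i)-(iv):

```python
import numpy as np

A = np.array([[2.0, 0.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 1.0, 3.0]])          # symmetric, eigenvalues 2, 2, 4

w, V = np.linalg.eigh(A)

# Group equal eigenvalues; P_i = sum of v v^T over that eigenspace.
projections, values = [], []
for lam in np.unique(np.round(w, 10)):
    cols = V[:, np.isclose(w, lam)]
    projections.append(cols @ cols.T)
    values.append(lam)

assert np.allclose(sum(projections), np.eye(3))       # (iii) sum P_i = I
assert np.allclose(sum(l * P for l, P in zip(values, projections)), A)  # (iv)
for i, Pi in enumerate(projections):
    assert np.allclose(Pi @ Pi, Pi)                    # each P_i is a projection
    for j, Pj in enumerate(projections):
        if i != j:
            assert np.allclose(Pi @ Pj, 0.0)           # (ii) P_i P_j = 0
```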

D. Some Examples

At this point it is appropriate to consider some specific cases.

4.10.45. Example. Let X = E², let A ∈ L(X, X), and let {e₁, e₂} be an arbitrary basis for X. Suppose that

A = [ a₁₁  a₁₂ ]
    [ a₂₁  a₂₂ ]

is the matrix of A with respect to the basis {e₁, e₂}. Let x ∈ E², and let xᵀ = (ξ₁, ξ₂) denote the coordinate representation of x with respect to this basis. Then Ax is the coordinate representation of Ax with respect to this basis, and we have

Ax = [ a₁₁ξ₁ + a₁₂ξ₂ ] ≜ [ η₁ ] = y.
     [ a₂₁ξ₁ + a₂₂ξ₂ ]   [ η₂ ]

This transformation is depicted pictorially in Figure F.
Now assume that A is a self-adjoint linear transformation. Then there exists an orthonormal basis {e₁′, e₂′} such that

Ae₁′ = λ₁e₁′,   Ae₂′ = λ₂e₂′,

4.10.46. Figure F

4.10.47. Figure G

where λ₁ and λ₂ denote the eigenvalues of A. Suppose that the coordinates of x with respect to {e₁′, e₂′} are ξ₁′ and ξ₂′, respectively. Then

Ax = A(ξ₁′e₁′ + ξ₂′e₂′) = ξ₁′Ae₁′ + ξ₂′Ae₂′ = ξ₁′λ₁e₁′ + ξ₂′λ₂e₂′;

i.e., the coordinate representation of Ax with respect to {e₁′, e₂′} is (λ₁ξ₁′, λ₂ξ₂′). Thus, in order to determine Ax, we merely "stretch" or "compress" the coordinates ξ₁′, ξ₂′ along lines colinear with e₁′ and e₂′, respectively. This is illustrated in Figure G. ■

4.10.48. Example. Consider a transformation R from E² into E² which rotates vectors as shown in Figure H. By inspection we can characterize R, with respect to the indicated orthonormal basis {e₁, e₂}, as

Re₁ = cos θ e₁ + sin θ e₂,
Re₂ = −sin θ e₁ + cos θ e₂.

4.10.49. Figure H (rotation on the unit circle)

The reader can readily verify that R is indeed a linear transformation. The matrix of R with respect to this basis is

R_θ = [ cos θ  −sin θ ]
      [ sin θ   cos θ ].

By direct computation we can verify that

R_θᵀ = R_θ⁻¹ = [  cos θ  sin θ ]
               [ −sin θ  cos θ ],

and, moreover, that

det R_θ = cos² θ + sin² θ = 1.

Thus, R is indeed a rotation as defined in Definition 4.10.11.
For the matrix R_θ we also note that R₀ = I, R_θ⁻¹ = R₋θ, and R_θR_φ = R_{θ+φ}. ■
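The identities R₀ = I, R_θ⁻¹ = R₋θ, and R_θR_φ = R_{θ+φ} noted above are easily confirmed numerically. A sketch:

```python
import numpy as np

def R(theta):
    """Matrix of the rotation in Example 4.10.48."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

a, b = 0.4, 1.1
assert np.allclose(R(0.0), np.eye(2))            # R_0 = I
assert np.allclose(np.linalg.inv(R(a)), R(-a))   # R_theta^{-1} = R_{-theta}
assert np.allclose(R(a) @ R(b), R(a + b))        # R_theta R_phi = R_{theta+phi}
assert np.isclose(np.linalg.det(R(a)), 1.0)      # det R_theta = 1
```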

4.10.50. Example. Consider now a transformation A from E³ into E³, as depicted in Figure J. The vectors e₁, e₂, e₃ form an orthonormal basis for E³.

4.10.51. Figure J (showing the plane Z, spanned by e₁ and e₂, and the set Y₊)

The plane Z is spanned by e₁ and e₂. This transformation accomplishes a rotation about the vector e₃ in the plane Z. By inspection of Figure J it is clear that this transformation is characterized by the set of equations

Ae₁ = cos θ e₁ + sin θ e₂ + 0·e₃,
Ae₂ = −sin θ e₁ + cos θ e₂ + 0·e₃,
Ae₃ = 0·e₁ + 0·e₂ + 1·e₃.

The reader can readily verify that A is a linear transformation. The matrix of A with respect to the basis {e₁, e₂, e₃} is

A = [ cos θ  −sin θ  0 ]
    [ sin θ   cos θ  0 ]
    [ 0       0      1 ].

For this transformation the following facts are immediately evident (assume sin θ ≠ 0): (a) e₃ is an eigenvector with eigenvalue 1; (b) plane Z is a linear subspace of E³; (c) Ax ∈ Z whenever x ∈ Z; (d) the set Y₊ is a linear subspace of E³; (e) Ax ∈ Y₊ whenever x ∈ Y₊; (f) Z ⊥ Y₊; and (g) dim Y₊ = 1, dim Z = 2, and dim Y₊ + dim Z = dim E³. ■

E. Further Properties of Orthogonal Transformations

The preceding example motivates several of our subsequent results. Let A ∈ L(X, X). We recall that a linear subspace Y of X is invariant under A if Ax ∈ Y whenever x ∈ Y. We now prove the following:

4.10.52. Theorem. Let A ∈ L(X, X) be an orthogonal transformation. Then
(i) the only possible real eigenvalues of A, if there are any, are +1 and −1;
(ii) if Y is a linear subspace of X which is invariant under A, then the restriction A′ of A to Y is an orthogonal transformation from Y into Y; and
(iii) if Y is a linear subspace of X which is invariant under A, then Y^⊥ is also a linear subspace of X which is invariant under A.

Proof. To prove the first part, assume that A has a real eigenvalue, say λ₀. (The definition of eigenvalue of A ∈ L(X, X) excludes the possibility of complex eigenvalues, since X is a vector space over the field R of real numbers.) Then Ax = λ₀x for x ≠ 0 and

|Ax| = |λ₀x| = |λ₀||x|.

But |Ax| = |x|, because A is by assumption an orthogonal linear transformation. Therefore, |λ₀| = 1, and we have λ₀ = +1 or −1.
To prove the second part, assume that Y is invariant under A. Then Ax ∈ Y whenever x ∈ Y, and thus the restriction A′ of A to Y, defined by

A′x = Ax

for all x in Y, is clearly a linear transformation of Y into Y. Now, trivially, for all x in Y we have

|A′x| = |Ax| = |x|,

since A ∈ L(X, X) is an orthogonal transformation. Therefore, A′ is an orthogonal transformation from Y into Y.
To prove the last part, let Y be an invariant subspace of X under A. Then x ∈ Y^⊥ if and only if x ⊥ y for all y ∈ Y. Suppose then that x ∈ Y^⊥ and consider Ax. Then for each y ∈ Y we have

(Ax, y) = (x, A*y) = (x, A⁻¹y),

because A is orthogonal. But A⁻¹y is also in Y, for the following reasons. The restriction A′ of A to Y is orthogonal on Y by part (ii) and is therefore a non-singular transformation from Y into Y. Hence, (A′)⁻¹ exists and, moreover, (A′)⁻¹ must be a transformation from Y into Y. Thus, (A′)⁻¹y = A⁻¹y and A⁻¹y is in Y. We finally have

(Ax, y) = (x, A⁻¹y) = 0

for each y in Y. Thus, Ax ∈ Y^⊥ whenever x ∈ Y^⊥. This proves that Y^⊥ is invariant under A. ■

We also have:

4.10.53. Theorem. Let A ∈ L(X, X) be an orthogonal transformation, let Y₊ denote the set of all x ∈ X such that Ax = x, and let Y₋ denote the set of all x ∈ X such that Ax = −x. Then Y₊ and Y₋ are linear subspaces of X and Y₊ ⊥ Y₋.

Proof. Since Y₊ = 𝔑(A − I) and Y₋ = 𝔑(A + I), it follows that Y₊ and Y₋ are linear subspaces of X. Now let x ∈ Y₊ and let y ∈ Y₋. Then

(x, y) = (Ax, Ay) = (x, −y) = −(x, y),

which implies that (x, y) = 0. Therefore, x ⊥ y and Y₊ ⊥ Y₋. ■

Using the above theorem we can now prove the following result.

4.10.54. Corollary. Let A, Y₊, and Y₋ be defined as in Theorem 4.10.53, and let Z denote the set of all x ∈ X such that x ⊥ Y₊ and x ⊥ Y₋. Then Z is a linear subspace of X and dim Y₊ + dim Y₋ + dim Z = dim X = n. Furthermore, the restriction of A to Z has no (real) eigenvalues.

Proof. Let {e₁, ..., e_{n₁}} be an orthonormal basis for Y₊ and let {e_{n₁+1}, ..., e_{n₁+n₂}} be an orthonormal basis for Y₋, where dim Y₊ = n₁ and dim Y₋ = n₂. Then the set {e₁, ..., e_{n₁+n₂}} is orthonormal. Let Y denote the linear subspace generated by {e₁, ..., e_{n₁+n₂}}. Then dim Y = n₁ + n₂. By the definition of Z and by Theorem 4.9.59 we have Z = Y^⊥, and thus Z is a linear subspace of X. Therefore,

n = dim X = dim Y + dim Y^⊥ = n₁ + n₂ + dim Z = dim Y₊ + dim Y₋ + dim Z,

which was to be shown.

To prove the second assertion, let A′ denote the restriction of A to Z. Suppose there exists a non-zero vector x ∈ Z such that A′x = λ₀x. Since A′ is orthogonal by part (ii) of Theorem 4.10.52, we have λ₀ = ±1 by part (i) of Theorem 4.10.52. Thus, x is either in Y₊ or in Y₋. But by assumption, x ∈ Z and Z ⊥ Y₊ and Z ⊥ Y₋. Therefore, x = 0, a contradiction to our earlier assumption. Hence, the restriction A′ of A to Z cannot have a real eigenvalue. ■

Our next result is concerned with orthogonal transformations on two-dimensional Euclidean spaces.

4.10.55. Theorem. Let A ∈ L(X, X) be an orthogonal transformation, where dim X = 2.
(i) If det A = +1 (i.e., A is a rotation), there exists some real θ such that for every orthonormal basis {e₁, e₂} the corresponding matrix of A is

R_θ = [ cos θ  −sin θ ]        (4.10.56)
      [ sin θ   cos θ ].

(ii) If det A = −1 (i.e., A is a reflection), there exists some orthonormal basis {e₁, e₂} such that the matrix of A with respect to this basis is

Q = [ 1   0 ]        (4.10.57)
    [ 0  −1 ].

Proof. To prove the first part assume that det A = +1 and choose an arbitrary orthonormal basis {e₁, e₂}. Let

A = [ a₁₁  a₁₂ ]
    [ a₂₁  a₂₂ ]

denote the matrix of A with respect to this basis. Then, since A is orthogonal, so is A, and we have

AᵀA = I        (4.10.58)

and

det A = 1.        (4.10.59)

Solving Eqs. (4.10.58) and (4.10.59) (we leave the details to the reader) yields a₁₁ = cos θ, a₁₂ = −sin θ, a₂₁ = sin θ, and a₂₂ = cos θ.
To prove the second part assume that A is orthogonal and that det A = −1. Consider the characteristic polynomial of A,

p(λ) = λ² + α₁λ + α₀.

Since det A = −1 we have α₀ = −1. Solving for λ₁ and λ₂ we have

λ₁, λ₂ = [−α₁ ± √(α₁² + 4)]/2,

which implies that both λ₁ and λ₂ are real and that λ₁ ≠ λ₂. From Theorem 4.10.52 these eigenvalues are +1 and −1. Therefore, there exists an orthonormal basis such that the matrix of A with respect to this basis is

[ λ₁  0  ] = [ 1   0 ]
[ 0   λ₂ ]   [ 0  −1 ]. ■

In the above proof we have e₁ ∈ Y₊ and e₂ ∈ Y₋, in view of Theorem 4.10.53. Also, from the preceding theorem it is clear that if A is orthogonal and (a) if det A = 1, then det (A − λI) = 1 − 2(cos θ)λ + λ², and (b) if det A = −1, then det (A − λI) = λ² − 1.

4.10.60. Theorem. Let A ∈ L(X, X) be an orthogonal transformation having no (real) eigenvalues. Then there exist linear subspaces Y₁, ..., Y_r of X such that
(i) dim Yᵢ = 2, i = 1, ..., r;
(ii) Yᵢ ⊥ Yⱼ for all i ≠ j;
(iii) dim Y₁ + ⋯ + dim Y_r = dim X = n; and
(iv) each subspace Yᵢ is invariant under A; in fact, the restriction of A to Yᵢ is a non-trivial rotation (i.e., for the matrix given by Eq. (4.10.56) we have θ ≠ kπ, k = 0, 1, 2, ...).

Proof. Since by assumption A does not have any (real) eigenvalues, we have

det (A − λI) = (α₁ + β₁λ + λ²) ⋯ (α_r + β_rλ + λ²),

where the αᵢ, βᵢ, i = 1, ..., r, are real (i.e., det (A − λI) does not have any linear factors (λᵢ − λ) with λᵢ real). Solving the first quadratic factor we have

λ₁ = [−β₁ + √(β₁² − 4α₁)]/2   and   λ₂ = [−β₁ − √(β₁² − 4α₁)]/2,

where λ₁ and λ₂ are complex. By Theorem 4.5.33, part (iv), if f(·) is any polynomial function, then f(λ₁) will be an eigenvalue of f(A). In particular, if f(λ) = α₁ + β₁λ + λ², we know that one of the eigenvalues of the linear transformation α₁I + β₁A + A² will be α₁ + β₁λ₁ + λ₁² = 0, by choice. Thus, the linear transformation (α₁I + β₁A + A²) has 0 as an eigenvalue. Therefore, there exists a vector f₁ ≠ 0 in X such that

(α₁I + β₁A + A²)f₁ = 0·f₁,

or

A²f₁ = −α₁f₁ − β₁Af₁.        (4.10.61)

Now let f₂ = Af₁. We assert that f₁ and f₂ are linearly independent. For if they were not, we would have f₂ = ηf₁ = Af₁, where η is a real scalar, and f₁ would be an eigenvector corresponding to a real eigenvalue η of A, which is impossible by hypothesis. Next, let Y₁ be the linear subspace of X generated by f₁ and f₂. Then Y₁ is two-dimensional. We now show that Y₁ is invariant under A. Let x ∈ Y₁. Then

x = ξ₁f₁ + ξ₂f₂

for some ξ₁ and ξ₂, and

Ax = ξ₁Af₁ + ξ₂Af₂ = ξ₁Af₁ + ξ₂A²f₁.

But from Eq. (4.10.61) it follows that

A²f₁ = −α₁f₁ − β₁Af₁.

Thus,

Ax = ξ₁Af₁ + ξ₂(−α₁f₁ − β₁Af₁) = −ξ₂α₁f₁ + (ξ₁ − ξ₂β₁)Af₁
   = −ξ₂α₁f₁ + (ξ₁ − ξ₂β₁)f₂,

which shows that Ax ∈ Y₁ whenever x ∈ Y₁. Thus, Y₁ is invariant under A.
By Theorem 4.10.52, the restriction A′ of A to Y₁ is an orthogonal transformation from Y₁ into Y₁. This restriction cannot have any (real) eigenvalues, for then A would also have (real) eigenvalues.
From Theorem 4.10.55, A′ cannot be a reflection, for in that case A′ would have eigenvalues equal to +1 and −1. Moreover, A′ cannot be a trivial rotation, for then the eigenvalues of A′ would be equal to 1 if θ = 0° and −1 if θ = 180°. But from Corollary 4.10.8 we know that if A is orthogonal, then det A = ±1. Therefore, it follows now from Theorem 4.10.55 that the restriction of A to Y₁ is a non-trivial rotation.
Now let Z₁ = Y₁^⊥. Since Y₁ is invariant under A, so is Z₁, by Theorem 4.10.52, part (iii), and dim Z₁ = dim X − 2. The restriction A₁ of A to Z₁ is an orthogonal transformation from Z₁ into Z₁, and it cannot have any (real) eigenvalues. Applying the argument already given for A and X now to A₁ and Z₁, we can conclude that there exists a two-dimensional linear subspace Y₂ of Z₁ such that the restriction of A₁ to Y₂ is a non-trivial rotation. Now since Y₂ is contained in Z₁ and since by definition Z₁ = Y₁^⊥, we have Y₁ ⊥ Y₂.
Next, let Z₂ be the linear subspace which is orthogonal to both Y₁ and Y₂, and let A₂ be the restriction of A to Z₂. Repeating the argument given thus far, we can conclude that there exists a two-dimensional linear subspace Y₃ of Z₂ such that the restriction of A₂ to Y₃ is a non-trivial rotation and such that Y₂ ⊥ Y₃ and Y₁ ⊥ Y₃.
To conclude the proof of the theorem, we continue the above process until we have exhausted the original space X. ■

Combining Theorems 4.10.53 and 4.10.60, we obtain the following:



4.10.62. Corollary. Let A ∈ L(X, X) be an orthogonal linear transformation. Then there exist linear subspaces Y₊, Y₋, Y₁, ..., Y_r of X such that
(i) all of the above linear subspaces are orthogonal to one another;
(ii) n = dim X = dim Y₊ + dim Y₋ + dim Y₁ + ⋯ + dim Y_r;
(iii) x ∈ Y₊ if and only if Ax = x;
(iv) x ∈ Y₋ if and only if Ax = −x; and
(v) the restriction of A to each Yᵢ, i = 1, ..., r, is a non-trivial rotation.

Since in the above corollary the dimension of each Yᵢ, i = 1, ..., r, is two, we have the following additional result.
4.10.63. Corollary. If in Corollary 4.10.62 dim X is odd, then A has a real eigenvalue.

We leave the proof of the next result as an exercise.

4.10.64. Theorem. If A is an orthogonal transformation from X into X, then the characteristic polynomial of A is of the form

(1 − λ)^{n₊}(−1 − λ)^{n₋}(1 − 2 cos θ₁ λ + λ²) ⋯ (1 − 2 cos θ_r λ + λ²) = det (A − λI),

where n₊ = dim Y₊ and n₋ = dim Y₋. Moreover, there exists an orthonormal basis {e₁, ..., eₙ} of X such that the matrix of A with respect to this basis is of the block diagonal form

A = [ R_{θ₁}                            0 ]
    [         ⋱                          ]
    [            R_{θ_r}                 ]
    [                     −I_{n₋}        ]
    [ 0                           I_{n₊} ],

where each R_{θᵢ} is the 2 × 2 rotation matrix

R_{θᵢ} = [ cos θᵢ  −sin θᵢ ]
         [ sin θᵢ   cos θᵢ ],

and where I_{n₋} and I_{n₊} denote identity matrices of the indicated orders.

4.10.65. Exercise. Prove Theorem 4.10.64.

In our next result the canonical form of skew-adjoint linear transformations is established.

4.10.66. Theorem. Let A be a skew-adjoint linear transformation from X into X. Then there exists an orthonormal basis {e₁, ..., eₙ} such that the matrix A of A with respect to this basis is of the block diagonal form

A = [ 0    ν₁                      0 ]
    [ −ν₁  0                         ]
    [           ⋱                   ]
    [              0    ν_r          ]
    [              −ν_r  0           ]
    [ 0                        0     ],

consisting of 2 × 2 blocks of the indicated form, followed by zeros on the diagonal, where the νᵢ, i = 1, ..., r, are real and where some of the νᵢ may be zero.

4.10.67. Exercise. Prove Theorem 4.10.66.

Before closing the present section, we briefly introduce so-called "normal transformations." We will have quite a bit more to say about such transformations and their representation in Chapter 7.

4.10.68. Definition. A transformation A ∈ L(X, X) is said to be a normal linear transformation if A*A = AA*.

Some of the properties of such transformations are as follows.

4.10.69. Theorem. Let A ∈ L(X, X). Then
(i) if A is a self-adjoint transformation, then it is also a normal transformation;
(ii) if A is a skew-adjoint transformation, then it is also a normal transformation;
(iii) if A is an orthogonal transformation, then it is also a normal transformation; and
(iv) if A is a normal linear transformation, then there exists an orthonormal basis {e₁, ..., eₙ} of X such that the matrix A of A with respect to this basis is of the block diagonal form

A = [ β₁   ν₁                                0 ]
    [ −ν₁  β₁                                  ]
    [           ⋱                             ]
    [              β_r   ν_r                   ]
    [              −ν_r  β_r                   ]
    [                         λ_{2r+1}         ]
    [                                  ⋱      ]
    [ 0                                   λₙ   ],

where the βᵢ, νᵢ, i = 1, ..., r, and the λⱼ, j = 2r + 1, ..., n, are real.

The proofs of parts (i)-(iii) follow from the definitions of normal, self-adjoint, skew-adjoint, and orthogonal linear transformations. To prove part (iv), let A = A₁ + A₂, where A₁ = ½(A + A*) and A₂ = ½(A − A*), and note that A₁ is self-adjoint and A₂ is skew-adjoint. This representation is unique by Corollary 4.10.25. Making use of Theorem 4.10.66 and Corollary 4.10.38, we obtain the desired result. We leave the details of the proof of this theorem as an exercise.

4.10.70. Exercise. Prove Theorem 4.10.69.

4.11. APPLICATIONS TO ORDINARY
DIFFERENTIAL EQUATIONS

In the present section we present applications of the material covered in the present chapter and the preceding chapter. Because of their importance in almost all branches of science and engineering, we consider some topics in ordinary differential equations. Specifically, we concern ourselves with initial-value problems described by ordinary differential equations. The present section is divided into two parts. In subsection A, we define the initial-value problem, while in subsection B we treat linear initial-value problems. At the end of the next chapter, we will continue our discussion of ordinary differential equations.

A. Initial-Value Problem: Definition

Let R denote the set of real numbers, and let D ⊂ R² be a domain (i.e., D is an open and connected subset of R²). We will call R² the (t, x) plane. Let f be a real-valued function which is defined and continuous on D, and let ẋ ≜ dx/dt (i.e., ẋ denotes the derivative of x with respect to t). We call

ẋ = f(t, x)        (4.11.1)

an ordinary differential equation of the first order. Let T = (t₁, t₂) ⊂ R be an open interval which we call a t interval (i.e., T = (t₁, t₂) = {t ∈ R : t₁ < t < t₂}). A real differentiable function φ (if it exists) defined on T such that the points (t, φ(t)) ∈ D for all t ∈ T and such that

φ̇(t) = f(t, φ(t))        (4.11.2)

for all t ∈ T is called a solution of the differential equation (4.11.1).
4.11.3. Definition. Let (τ, ξ) ∈ D. If φ is a solution of the differential equation (4.11.1) and if φ(τ) = ξ, then φ is called a solution of the initial-value problem

ẋ = f(t, x)
x(τ) = ξ.        (4.11.4)
In Figure K a typical solution of an initial-value problem is depicted.

4.11.5. Figure K. Typical solution of an initial-value problem (the solution φ is defined on the t interval T = (t₁, t₂) and passes through (τ, ξ), where the line L through that point has slope m = f(τ, φ(τ))).

We can represent the initial-value problem given in Eq. (4.11.4) equivalently by means of the integral equation

φ(t) = ξ + ∫_τ^t f(s, φ(s)) ds.        (4.11.6)

Here we say that two problems are equivalent if they have the same solution. To prove this equivalence, let φ be a solution of the initial-value problem (4.11.4). Then φ(τ) = ξ and

φ̇(t) = f(t, φ(t))
for all t ∈ T. Integrating from τ to t we have

∫_τ^t φ̇(s) ds = ∫_τ^t f(s, φ(s)) ds,

or

φ(t) = ξ + ∫_τ^t f(s, φ(s)) ds.

Thus, φ is a solution of the integral equation (4.11.6).
Conversely, let φ be a solution of the integral equation (4.11.6). Then φ(τ) = ξ, and differentiating both sides of Eq. (4.11.6) with respect to t we have

φ̇(t) = f(t, φ(t)),

and thus φ is a solution of the initial-value problem (4.11.4).
Next, we consider initial-value problems described by means of several first-order ordinary differential equations. Let D ⊂ R^{n+1} be a domain (i.e., D is an open and connected subset of R^{n+1}). We will call R^{n+1} the (t, x₁, ..., xₙ) space. Let f₁, ..., fₙ be n real-valued functions which are defined and continuous on D (i.e., fᵢ(t, x₁, ..., xₙ), i = 1, ..., n, are defined for all points in D and are continuous with respect to all arguments t, x₁, ..., xₙ). We call

ẋᵢ = fᵢ(t, x₁, ..., xₙ), i = 1, ..., n,        (4.11.7)

a system of n ordinary differential equations of the first order. A set of n real differentiable functions {φ₁, ..., φₙ} (if it exists) defined on a real t interval T = (t₁, t₂) ⊂ R such that the points (t, φ₁(t), ..., φₙ(t)) ∈ D for all t ∈ T and such that

φ̇ᵢ(t) = fᵢ(t, φ₁(t), ..., φₙ(t)), i = 1, ..., n,        (4.11.8)

for all t ∈ T, is called a solution of the system of ordinary differential equations (4.11.7).

4.11.9. Definition. Let (τ, ξ₁, ..., ξₙ) ∈ D. If the set {φ₁, ..., φₙ} is a solution of the system of equations (4.11.7) and if (φ₁(τ), ..., φₙ(τ)) = (ξ₁, ..., ξₙ), then the set {φ₁, ..., φₙ} is called a solution of the initial-value problem

ẋᵢ = fᵢ(t, x₁, ..., xₙ), i = 1, ..., n
xᵢ(τ) = ξᵢ, i = 1, ..., n.        (4.11.10)
It is convenient to use vector notation to represent Eq. (4.11.10). Let

f(t, x) = [ f₁(t, x₁, ..., xₙ) ]   [ f₁(t, x) ]
          [         ⋮         ] = [    ⋮     ]
          [ fₙ(t, x₁, ..., xₙ) ]   [ fₙ(t, x) ]

and define ẋ = dx/dt componentwise; i.e., ẋᵀ = (dx₁/dt, ..., dxₙ/dt). We can express Eq. (4.11.10) equivalently as

ẋ = f(t, x)
x(τ) = ξ.        (4.11.11)

If in Eq. (4.11.11) f(t, x) does not depend on t (i.e., f(t, x) = f(x) for all (t, x) ∈ D), then we have

ẋ = f(x).        (4.11.12)

In this case we speak of an autonomous system of first-order ordinary differential equations.
Of special importance are systems of first-order ordinary differential equations described by

ẋ = A(t)x + v(t),        (4.11.13)
ẋ = A(t)x,        (4.11.14)

and

ẋ = Ax,        (4.11.15)

where x is a real n-vector, A(t) = [aᵢⱼ(t)] is a real (n × n) matrix with elements aᵢⱼ(t) that are defined and continuous on a t interval T, A = [aᵢⱼ] is an (n × n) matrix with real constant coefficients, and v(t) is a real n-vector with components vᵢ(t), i = 1, ..., n, which are defined and at least piecewise continuous on T. These equations are clearly a special case of Eq. (4.11.7). For example, if in Eq. (4.11.7) we let

fᵢ(t, x₁, ..., xₙ) = fᵢ(t, x) = Σ_{j=1}^{n} aᵢⱼ(t)xⱼ, i = 1, ..., n,

then Eq. (4.11.14) results. In the case of Eqs. (4.11.14) and (4.11.15), we speak of a linear homogeneous system of ordinary differential equations, in the case of Eq. (4.11.13) we have a linear non-homogeneous system of ordinary differential equations, and in the case of Eq. (4.11.15) we speak of a linear system of ordinary differential equations with constant coefficients.
Next, we consider initial-value problems described by means of nth-order ordinary differential equations. Let f be a real function which is defined and continuous in a domain D of the real (t, x₁, ..., xₙ) space, and let x^(k) ≜ d^k x/dt^k. We call

x^(n) = f(t, x, x^(1), ..., x^(n−1))        (4.11.16)

an nth-order ordinary differential equation. A real function φ (if it exists) which is defined on a t interval T = (t₁, t₂) ⊂ R and which has n derivatives on T is called a solution of Eq. (4.11.16) if (t, φ(t), ..., φ^(n−1)(t)) ∈ D for all t ∈ T and if

φ^(n)(t) = f(t, φ(t), ..., φ^(n−1)(t))        (4.11.17)

for all t ∈ T.

4.11.18. Definition. Let (τ, ξ₁, ..., ξₙ) ∈ D. If φ is a solution of Eq. (4.11.16) and if φ(τ) = ξ₁, ..., φ^(n−1)(τ) = ξₙ, then φ is called a solution of the initial-value problem

x^(n) = f(t, x, x^(1), ..., x^(n−1))
x(τ) = ξ₁, ..., x^(n−1)(τ) = ξₙ.        (4.11.19)

Of particular interest are nth-order ordinary differential equations

aₙ(t)x^(n) + a_{n−1}(t)x^(n−1) + ⋯ + a₁(t)x^(1) + a₀(t)x = v(t),        (4.11.20)
aₙ(t)x^(n) + a_{n−1}(t)x^(n−1) + ⋯ + a₁(t)x^(1) + a₀(t)x = 0,        (4.11.21)

and

aₙx^(n) + a_{n−1}x^(n−1) + ⋯ + a₁x^(1) + a₀x = 0,        (4.11.22)

where aₙ(t), ..., a₀(t) are real continuous functions defined on the interval T, where aₙ(t) ≠ 0 for all t ∈ T, where aₙ, ..., a₀ are real constants, where aₙ ≠ 0, and where v(t) is a real function defined and piecewise continuous on T. We call Eq. (4.11.21) a linear homogeneous ordinary differential equation of order n, Eq. (4.11.20) a linear non-homogeneous ordinary differential equation of order n, and Eq. (4.11.22) a linear ordinary differential equation of order n with constant coefficients.
We now show that the theory of nth-order ordinary differential equations reduces to the theory of a system of n first-order ordinary differential equations. To this end, let in Eq. (4.11.19) x = x₁, and let

ẋ₁ = x₂ = x^(1)
ẋ₂ = x₃ = x^(2)
⋮
ẋ_{n−1} = xₙ = x^(n−1)
ẋₙ = f(t, x₁, ..., xₙ) = x^(n).        (4.11.23)

This system of equations is clearly defined for all (t, x₁, ..., xₙ) ∈ D. Now assume that the vector φᵀ = (φ₁, ..., φₙ) is a solution of Eq. (4.11.23) on an interval T. Since φ₂ = φ̇₁, φ₃ = φ̇₂, ..., φₙ = φ₁^(n−1), and since

f(t, φ₁(t), ..., φₙ(t)) = f(t, φ₁(t), ..., φ₁^(n−1)(t)) = φ₁^(n)(t),

it follows that the first component φ₁ of the vector φ is a solution of Eq. (4.11.16) on the interval T. Conversely, assume that φ₁ is a solution of Eq. (4.11.16) on the interval T. Then the vector φᵀ = (φ₁, φ₁^(1), ..., φ₁^(n−1)) is clearly a solution of the system of equations (4.11.23). Note that if φ₁(τ) = ξ₁, ..., φ₁^(n−1)(τ) = ξₙ, then the vector φ satisfies φ(τ) = ξ, where ξᵀ = (ξ₁, ..., ξₙ). The converse is also true.
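This reduction is precisely how nth-order equations are presented to numerical integrators in practice. The sketch below (illustrative only; the forward Euler method is used purely for brevity) converts the second-order equation ẍ = −x into the system (4.11.23) and integrates it:

```python
import numpy as np

def F(t, x):
    """First-order system (4.11.23) for the 2nd-order equation x'' = -x:
    x1' = x2,  x2' = f(t, x1, x2) with f = -x1."""
    return np.array([x[1], -x[0]])

# Initial-value problem: x(0) = 1, x'(0) = 0; the exact solution is cos t.
t, x, h = 0.0, np.array([1.0, 0.0]), 1e-4
while t < 1.0:
    x = x + h * F(t, x)      # forward Euler step (crude but adequate here)
    t += h

print(x[0], np.cos(1.0))     # first component approximates cos(1)
```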
Thus far we have concerned ourselves with initial-value problems characterized by real ordinary differential equations. It is possible to consider initial-value problems involving complex ordinary differential equations. For example, let t be real and let zᵀ = (z₁, ..., zₙ) be a complex vector (i.e., z_k is of the form u_k + iv_k, where u_k and v_k are real and i = √−1). Let D be a domain in the (t, z) space, and let f₁, ..., fₙ be n continuous complex-valued functions defined on D. Let fᵀ = (f₁, ..., fₙ), and let ż = dz/dt. We call

ż = f(t, z)        (4.11.24)

a system of n complex ordinary differential equations of the first order. A complex vector φᵀ = (φ₁, ..., φₙ) which is defined and differentiable on a real t interval T = (τ₁, τ₂) ⊂ R such that the points (t, φ₁(t), ..., φₙ(t)) ∈ D for all t ∈ T and such that

φ̇(t) = f(t, φ(t))

for all t ∈ T, is called a solution of the system of equations (4.11.24). If in addition, (τ, ξ₁, ..., ξₙ) ∈ D and if (φ₁(τ), ..., φₙ(τ)) = (ξ₁, ..., ξₙ) = ξᵀ, then φ is said to be a solution of the initial-value problem

ż = f(t, z)
z(τ) = ξ.        (4.11.25)

Of particular interest in applications are initial-value problems characterized by complex linear ordinary differential equations having forms analogous to those given in Eqs. (4.11.13)-(4.11.15).
We can similarly consider initial-value problems described by complex nth-order ordinary differential equations.
Let us look now at some specific examples. The first example demonstrates that the solution to an initial-value problem may not be unique.

4.11.26. Example. Consider the initial-value problem

ẋ = x^(1/3), x(0) = 0.

We can readily verify that this problem has infinitely many solutions passing through the origin of the (t, x) plane, given by

φ_p(t) = 0 for 0 ≤ t ≤ p, and φ_p(t) = [(2/3)(t − p)]^(3/2) for p < t ≤ 1,

where p is any real number such that 0 < p < 1. ∎
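Since only elementary functions are involved, the claimed family of solutions is easy to test numerically. The following minimal Python sketch (NumPy assumed; the value p = 0.4 and the grid are arbitrary choices made here for illustration) compares the derivative of φ_p with φ_p^(1/3):

import numpy as np

p = 0.4  # arbitrary parameter, 0 < p < 1

def phi(t):
    # candidate solution of x' = x**(1/3) with x(0) = 0
    return (2.0 * np.clip(t - p, 0.0, None) / 3.0) ** 1.5

t = np.linspace(0.0, 1.0, 1001)
lhs = np.gradient(phi(t), t)          # numerical derivative of phi
rhs = phi(t) ** (1.0 / 3.0)           # right-hand side x**(1/3)
mask = np.abs(t - p) > 0.01           # finite differences smear the corner at t = p
print(np.max(np.abs(lhs[mask] - rhs[mask])))   # small, so phi_p is indeed a solution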

The next example shows that the t interval for which a solution to the initial-value problem exists may be restricted.

4.11.27. Example. Consider the initial-value problem

ẋ = x², x(t₁) = ξ,

where ξ is any real number. By direct computation we can verify that

φ(t) = ξ[1 − (t − t₁)ξ]⁻¹

is a solution of this problem. We note that if t = t₁ + 1/ξ, then the solution φ(t) is not defined. Thus, there is a restriction on the t interval for which a solution to the above problem exists. Namely, if ξ > 0, the above solution is valid over any interval (t₁, t₂) such that

t₁ < t₂ < t₁ + 1/ξ.

In this case we say the solution fails to exist for t ≥ t₁ + 1/ξ. On the other hand, if ξ < 0, the solution given above is valid for any t > t₁, and we say the solution exists on any interval (t₁, t₂). ∎
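A similar numerical check illustrates the finite escape time. The sketch below (again only an illustration; the values t₁ = 0 and ξ = 2 are arbitrary, giving escape time t₁ + 1/ξ = 0.5) verifies that φ satisfies ẋ = x² and grows without bound as t approaches 0.5:

import numpy as np

t1, xi = 0.0, 2.0                    # escape time is t1 + 1/xi = 0.5

def phi(t):
    # candidate solution of x' = x**2, x(t1) = xi
    return xi / (1.0 - (t - t1) * xi)

t = np.linspace(t1, 0.4, 400)        # stay safely below the escape time
residual = np.gradient(phi(t), t) - phi(t) ** 2
print(np.max(np.abs(residual)))      # small relative to phi**2
print(phi(0.499))                    # ~1000: phi grows without bound as t -> 0.5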

The preceding examples give rise to several important questions:

1. When does an initial-value problem possess a solution?
2. When is a solution unique?
3. What is the extent of the interval over which such a solution exists?
4. Is the solution continuously dependent on the initial condition ξ?

At the end of the next chapter we will state and prove results which give answers to these questions.

B. Initial-Value Problem: Linear Systems

In the remainder of the present section we concern ourselves exclusively with initial-value problems described by linear ordinary differential equations. Let again T = (t₁, t₂) be a real t interval, let xᵀ = (x₁, ..., xₙ) denote an n-dimensional vector, let A = [aᵢⱼ] be a constant (n × n) matrix, let A(t) = [aᵢⱼ(t)] be an (n × n) matrix with elements aᵢⱼ(t) that are defined and continuous on the interval T, and let v(t)ᵀ = (v₁(t), ..., vₙ(t)) denote an n-vector with components vᵢ(t) that are defined and piecewise continuous on T. In the following we consider matrices and vectors with components which may be either real- or complex-valued. In the former case the field for the x space is the field of real numbers, while in the latter case the field for the x space is the field of complex numbers. Also, let

D = {(t, x): t ∈ T, x ∈ Rⁿ (or Cⁿ)}.   (4.11.28)

At first we consider systems of ordinary differential equations given by

ẋ = A(t)x + v(t),   (4.11.29)

ẋ = A(t)x,   (4.11.30)

and

ẋ = Ax.   (4.11.31)

In the applications section of the next chapter we will show that, with the above assumptions, equations (4.11.29)-(4.11.31) possess unique solutions for every (τ, ξ) ∈ D which exist over the entire interval T = (t₁, t₂) and which depend continuously on the initial conditions. This is an extremely important result in applications, where we usually require that T = (−∞, ∞).

4.11.32. Theorem. The set S of all solutions of Eq. (4.11.30) on T forms an n-dimensional vector space.

Proof. Let φ₁ and φ₂ be solutions of Eq. (4.11.30), let F denote the field for the x space, and let α₁, α₂ ∈ F. Since

d/dt [α₁φ₁(t) + α₂φ₂(t)] = α₁φ̇₁(t) + α₂φ̇₂(t) = α₁A(t)φ₁(t) + α₂A(t)φ₂(t) = A(t)[α₁φ₁(t) + α₂φ₂(t)],

it follows that α₁φ₁ + α₂φ₂ ∈ S whenever φ₁, φ₂ ∈ S and whenever α₁, α₂ ∈ F. Furthermore, the trivial solution φ = 0, defined by φ(t) = 0 for all t ∈ T, is clearly in S, and for every φ ∈ S there exists a φ' = −φ ∈ S such that φ + φ' = 0. It is now an easy matter to verify that all the axioms of a vector space are satisfied for S (we leave the details to the reader to verify).

Next, we must show that S is n-dimensional; i.e., we must find a set of solutions φ₁, ..., φₙ which is linearly independent and which spans S. Let ξ₁, ..., ξₙ be a set of linearly independent vectors in the n-dimensional x space. By the existence results which we will prove in the next chapter (and which we will accept here on faith), if τ ∈ T, there exist n solutions φ₁, ..., φₙ of Eq. (4.11.30) such that φᵢ(τ) = ξᵢ, i = 1, ..., n. We first show that these solutions are linearly independent. For purposes of contradiction, assume that these solutions are linearly dependent. Then there exist scalars α₁, ..., αₙ ∈ F, not all zero, such that

α₁φ₁(t) + ... + αₙφₙ(t) = 0

for all t ∈ T. This implies that

α₁ξ₁ + ... + αₙξₙ = α₁φ₁(τ) + ... + αₙφₙ(τ) = 0.

But this last equation contradicts the assumption that the ξᵢ are linearly independent. Thus, the φᵢ, i = 1, ..., n, are linearly independent. Finally, to show that these solutions span S, let φ be any solution of Eq. (4.11.30) on T such that φ(τ) = ξ. Then there exist unique scalars α₁, ..., αₙ ∈ F such that

ξ = α₁ξ₁ + ... + αₙξₙ,

because the vectors ξᵢ, i = 1, ..., n, form a basis for the x space. It now follows that

ψ = α₁φ₁ + ... + αₙφₙ

is a solution of Eq. (4.11.30) on T such that ψ(τ) = ξ. By the uniqueness results which we will prove in the next chapter (and which we accept here on faith),

φ = α₁φ₁ + ... + αₙφₙ.

Since φ was chosen arbitrarily, it follows that the solutions φᵢ, i = 1, ..., n, span S. This concludes the proof. ∎

The above result motivates the following two definitions.

4.11.33. Definition. A set of n linearly independent solutions of Eq. (4.11.30) on T is called a fundamental set of solutions of (4.11.30). An (n × n) matrix Ψ whose n columns are linearly independent solutions of Eq. (4.11.30) on T is called a fundamental matrix.

Thus, if {φ₁, ..., φₙ} is a set of n linearly independent solutions of Eq. (4.11.30) and if φⱼᵀ = (ψ₁ⱼ, ..., ψₙⱼ), then

Ψ = [ ψ₁₁  ψ₁₂  ...  ψ₁ₙ ]
    [ ψ₂₁  ψ₂₂  ...  ψ₂ₙ ]
    [ ...               ]
    [ ψₙ₁  ψₙ₂  ...  ψₙₙ ]  =  [φ₁ | φ₂ | ... | φₙ]

is a fundamental matrix.

In our next definition we employ the natural basis for the x space, given by

e₁ = (1, 0, ..., 0)ᵀ, e₂ = (0, 1, ..., 0)ᵀ, ..., eₙ = (0, 0, ..., 1)ᵀ.

4.11.34. Definition. A fundamental matrix Φ (for Eq. (4.11.30)) whose columns are determined by the linearly independent solutions φᵢ, i = 1, ..., n, with

φᵢ(τ) = eᵢ, i = 1, ..., n,

τ ∈ T, is called the state transition matrix Φ of Eq. (4.11.30).

Let X = [xᵢⱼ] be an (n × n) matrix, and define differentiation of X with respect to t ∈ T componentwise; i.e., Ẋ ≜ [ẋᵢⱼ]. We now have:

4.11.35. Theorem. Let Ψ be a fundamental matrix of Eq. (4.11.30) and let X denote an (n × n) matrix. Then Ψ satisfies the matrix equation

Ẋ = A(t)X, t ∈ T.   (4.11.36)

Proof. We have

Ψ̇ = [φ̇₁ | φ̇₂ | ... | φ̇ₙ] = [A(t)φ₁ | A(t)φ₂ | ... | A(t)φₙ] = A(t)[φ₁ | φ₂ | ... | φₙ] = A(t)Ψ. ∎

We also have:

4.11.37. Theorem. If Ψ is a solution of the matrix equation (4.11.36) on T and if t, τ ∈ T, then

det Ψ(t) = det Ψ(τ) e^{∫_τ^t tr A(s) ds}, t ∈ T.   (4.11.38)

Proof. Recall that if C = [cᵢⱼ] is an (n × n) matrix, then tr C = Σ_{i=1}^{n} cᵢᵢ. Let Ψ = [ψᵢⱼ] and A(t) = [aᵢⱼ(t)]. Then ψ̇ᵢⱼ = Σ_{k=1}^{n} aᵢₖ(t)ψₖⱼ. Now

d/dt (det Ψ) = D₁ + D₂ + ... + Dₙ,   (4.11.39)

where Dᵢ denotes the determinant obtained from det Ψ by replacing its ith row (ψᵢ₁, ..., ψᵢₙ) by the derivative (ψ̇ᵢ₁, ..., ψ̇ᵢₙ).

Consider D₁. Its first row is (Σₖ a₁ₖ(t)ψₖ₁, ..., Σₖ a₁ₖ(t)ψₖₙ), and D₁ is unchanged if we subtract from this first row a₁₂ times the second row plus a₁₃ times the third row, up to a₁ₙ times the nth row. This leaves a first row equal to (a₁₁(t)ψ₁₁, ..., a₁₁(t)ψ₁ₙ), and hence D₁ = a₁₁(t) det Ψ. Repeating the above procedure for the remaining determinants we get

d/dt [det Ψ(t)] = a₁₁(t) det Ψ(t) + a₂₂(t) det Ψ(t) + ... + aₙₙ(t) det Ψ(t) = [tr A(t)] det Ψ(t).

This now implies

det Ψ(t) = det Ψ(τ) e^{∫_τ^t tr A(η) dη}

for all t ∈ T. ∎

4.11.40. Exercise. Verify Eq. (4.11.39).
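Equation (4.11.38) can also be checked numerically by integrating the matrix equation (4.11.36) for a particular choice of A(t). A minimal sketch in Python (SciPy assumed; the 2 × 2 matrix A(t) below is an arbitrary illustrative choice with tr A(t) = −t):

import numpy as np
from scipy.integrate import solve_ivp

def A(t):
    # arbitrary illustrative time-varying coefficient matrix; tr A(t) = -t
    return np.array([[0.0, 1.0], [-1.0, -t]])

def rhs(t, y):
    # matrix equation Psi' = A(t) Psi, with Psi flattened into a vector
    return (A(t) @ y.reshape(2, 2)).ravel()

tau, tf = 0.0, 2.0
sol = solve_ivp(rhs, (tau, tf), np.eye(2).ravel(), rtol=1e-10, atol=1e-12)
Psi_tf = sol.y[:, -1].reshape(2, 2)

# Eq. (4.11.38): det Psi(t) = det Psi(tau) * exp(integral of tr A);
# here det Psi(tau) = 1 and the integral of tr A(s) = -s from 0 to tf is -tf**2/2
print(np.linalg.det(Psi_tf), np.exp(-tf**2 / 2))   # the two numbers agree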

We now prove:

4.11.41. Theorem. A solution Ψ of the matrix equation (4.11.36) is a fundamental matrix for Eq. (4.11.30) if and only if det Ψ(t) ≠ 0 for all t ∈ T.

Proof. Assume that Ψ = [φ₁ | φ₂ | ... | φₙ] is a fundamental matrix for Eq. (4.11.30), and let φ be a nontrivial solution of (4.11.30). By Theorem 4.11.32 there exist unique scalars α₁, ..., αₙ ∈ F, not all zero, such that

φ = α₁φ₁ + ... + αₙφₙ,

or

φ = Ψa,   (4.11.42)

where aᵀ = (α₁, ..., αₙ). Equation (4.11.42) constitutes a system of n linear equations with unknowns α₁, ..., αₙ at any τ ∈ T and has a unique solution for any choice of φ(τ). Hence, we have det Ψ(τ) ≠ 0, and it now follows from Theorem 4.11.37 that det Ψ(t) ≠ 0 for any t ∈ T.

Conversely, let Ψ be a solution of the matrix equation (4.11.36), and assume that det Ψ(t) ≠ 0 for all t ∈ T. Then the columns of Ψ are linearly independent for all t ∈ T. ∎

The reader can readily prove the next result.

4.11.43. Theorem. Let Ψ be a fundamental matrix for Eq. (4.11.30), and let C be an arbitrary (n × n) non-singular constant matrix. Then ΨC is also a fundamental matrix for Eq. (4.11.30). Moreover, if Φ is any other fundamental matrix for Eq. (4.11.30), then there exists a constant (n × n) non-singular matrix P such that Φ = ΨP.

4.11.44. Exercise. Prove Theorem 4.11.43.

Now let R(t) = [rᵢⱼ(t)] be an arbitrary matrix such that the scalar-valued functions rᵢⱼ(t) are Riemann integrable on T. We define integration of R(t) componentwise, i.e.,

∫ R(t) dt = ∫ [rᵢⱼ(t)] dt = [∫ rᵢⱼ(t) dt].

Integration of vectors is defined similarly.

In the next result we establish some of the properties of the state transition matrix Φ. Hereafter, in order to indicate the dependence of Φ on τ as well as t, we will write Φ(t, τ). By Φ̇(t, τ) we mean ∂Φ(t, τ)/∂t.

4.11.45. Theorem. Let D be defined by Eq. (4.11.28), let τ ∈ T, let φ(τ) = ξ, let (τ, ξ) ∈ D, and let Φ(t, τ) denote the state transition matrix for Eq. (4.11.30) for all t ∈ T. Then

(i) Φ̇(t, τ) = A(t)Φ(t, τ) with Φ(τ, τ) = I, where I denotes the (n × n) identity matrix;
(ii) the unique solution of Eq. (4.11.30) is given by

φ(t) = Φ(t, τ)ξ   (4.11.46)

for all t ∈ T;
(iii) Φ(t, τ) is non-singular for all t ∈ T;
(iv) for any t, σ ∈ T we have

Φ(t, τ) = Φ(t, σ)Φ(σ, τ);

(v) [Φ(t, τ)]⁻¹ ≜ Φ⁻¹(t, τ) = Φ(τ, t) for all t ∈ T; and
(vi) the unique solution of Eq. (4.11.29) is given by

φ(t) = Φ(t, τ)ξ + ∫_τ^t Φ(t, η)v(η) dη.   (4.11.47)

Proof. The first part of the theorem follows from the definition of the state transition matrix.

To prove the second part, assume that φ(t) = Φ(t, τ)ξ. Differentiating with respect to t we have

φ̇(t) = Φ̇(t, τ)ξ = A(t)Φ(t, τ)ξ = A(t)φ(t).

Furthermore, φ(τ) = Φ(τ, τ)ξ = ξ. From the uniqueness results (to be presented in the next chapter) it follows that the specified φ is indeed the solution of Eq. (4.11.30).

The third part of the theorem is a consequence of Theorem 4.11.41.

To prove the fourth part of the theorem we note that φ(t) = Φ(t, τ)ξ is the unique solution of Eq. (4.11.30) satisfying φ(τ) = ξ, and also that φ(σ) = Φ(σ, τ)ξ, σ ∈ T. Now consider the solution of Eq. (4.11.30) with initial condition given at σ in place of τ; i.e., φ(t) = Φ(t, σ)φ(σ). Then

φ(t) = Φ(t, τ)ξ = Φ(t, σ)Φ(σ, τ)ξ.

Since this equation holds for arbitrary ξ in the x space, we have

Φ(t, τ) = Φ(t, σ)Φ(σ, τ).

To prove the fifth part of the theorem we note that Φ⁻¹(t, τ) exists by part (iii). From part (iv) it now follows that

I = Φ(t, τ)Φ(τ, t),

where I denotes the (n × n) identity matrix. Thus,

Φ⁻¹(t, τ) = Φ(τ, t)

for all t ∈ T.

In the next chapter we will show that under the present assumptions, Eq. (4.11.29) possesses a unique solution for every (τ, ξ) ∈ D, where φ(τ) = ξ. Thus, to prove the last part of the theorem, we must show that the function (4.11.47) is this solution. Differentiating with respect to t we have

φ̇(t) = Φ̇(t, τ)ξ + Φ(t, t)v(t) + ∫_τ^t Φ̇(t, η)v(η) dη
      = A(t)Φ(t, τ)ξ + v(t) + ∫_τ^t A(t)Φ(t, η)v(η) dη
      = A(t)[Φ(t, τ)ξ + ∫_τ^t Φ(t, η)v(η) dη] + v(t)
      = A(t)φ(t) + v(t).

Also, φ(τ) = ξ. Therefore, φ is the unique solution of Eq. (4.11.29). ∎
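Properties (iv) and (v) lend themselves to a direct numerical check. In the sketch below (an illustration only; the same arbitrary A(t) as in the earlier sketch is used, and the helper name transition is chosen here), Φ(t, τ) is approximated by integrating Φ̇ = A(t)Φ from Φ(τ, τ) = I:

import numpy as np
from scipy.integrate import solve_ivp

def A(t):
    # the same arbitrary illustrative A(t) as before
    return np.array([[0.0, 1.0], [-1.0, -t]])

def transition(t, s):
    # approximate Phi(t, s): integrate Phi' = A(u) Phi from Phi(s, s) = I
    sol = solve_ivp(lambda u, y: (A(u) @ y.reshape(2, 2)).ravel(),
                    (s, t), np.eye(2).ravel(), rtol=1e-10, atol=1e-12)
    return sol.y[:, -1].reshape(2, 2)

tau, sigma, t = 0.0, 0.7, 1.5
lhs = transition(t, tau)
print(np.max(np.abs(lhs - transition(t, sigma) @ transition(sigma, tau))))  # part (iv)
print(np.max(np.abs(transition(tau, t) - np.linalg.inv(lhs))))              # part (v)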

In engineering and physics, x is interpreted as representing the "state" of a physical system described by appropriate ordinary differential equations. In Eq. (4.11.46), the matrix Φ(t, τ) relates the "states" of the system at the points t ∈ T and τ ∈ T; hence the name "state transition matrix."

Next, we wish to examine the properties of linear ordinary differential equations with constant coefficients given by Eq. (4.11.31). We require the following preliminary result.

4.11.48. Theorem. Let A be a constant (n × n) matrix (A may be real or complex). Let S_N(t) denote the matrix

S_N(t) = I + Σ_{k=1}^{N} (t^k/k!) A^k.

Then each element of the matrix S_N(t) converges absolutely and uniformly on any finite interval (−t₁, t₁), t₁ > 0, as N → ∞.

Proof. Let a_ij^(k) denote the (i, j)th element of the matrix A^k, where i, j = 1, ..., n, and k = 0, 1, 2, .... Then the (i, j)th element of S_N(t) is equal to

δᵢⱼ + Σ_{k=1}^{N} a_ij^(k) t^k/k!,

where δᵢⱼ is the Kronecker delta. We now show that

δᵢⱼ + Σ_{k=1}^{∞} |a_ij^(k)| |t|^k/k! < ∞ for all i, j.

Let m = max_{1≤i≤n} (Σ_{j=1}^{n} |aᵢⱼ|). Then m is a constant which depends on the elements of the matrix A. Since A^{k+1} = A·A^k, we have

max_{i,j} |a_ij^(k+1)| = max_{i,j} |Σ_{p=1}^{n} a_ip a_pj^(k)| ≤ max_{i,j} (Σ_{p=1}^{n} |a_ip|·|a_pj^(k)|) ≤ (max_i Σ_{p=1}^{n} |a_ip|)(max_{p,j} |a_pj^(k)|).

Therefore, max_{i,j} |a_ij^(k+1)| ≤ m·max_{i,j} |a_ij^(k)|. When k = 1 we have max_{i,j} |a_ij^(1)| ≤ m, and by induction it follows that max_{i,j} |a_ij^(k)| ≤ m^k. Now let M_k = (mt₁)^k/k!. Then we have, for any t ∈ (−t₁, t₁), t₁ > 0, and for any i, j,

|a_ij^(k) t^k/k!| ≤ M_k.

Since 1 + Σ_{k=1}^{∞} M_k = e^{mt₁}, we now have that

δᵢⱼ + Σ_{k=1}^{∞} a_ij^(k) t^k/k!

is an absolutely and uniformly convergent series for each i, j over the interval (−t₁, t₁) by the Weierstrass M-test. ∎

We are now in a position to consider the following:

4.11.49. Definition. Let A be a constant (n × n) matrix. We define e^{At} to be the matrix

e^{At} = I + Σ_{k=1}^{∞} (t^k/k!) A^k

for any −∞ < t < ∞.

We note immediately that e^{At}|_{t=0} = I.
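The definition suggests a simple computation: truncate the series at N terms. The sketch below (Python; the matrix A, the time t, and the truncation level are arbitrary choices, and scipy.linalg.expm serves as an independent reference) illustrates the rapid convergence guaranteed by Theorem 4.11.48:

import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-2.0, -3.0]])   # arbitrary constant matrix
t = 0.7

S = np.eye(2)                               # partial sum S_N(t)
term = np.eye(2)
for k in range(1, 25):
    term = term @ (t * A) / k               # term is now t**k A**k / k!
    S = S + term

print(np.max(np.abs(S - expm(A * t))))      # ~1e-16: rapid convergence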


We now prove:

4.11.50. Theorem. Let T = (−∞, ∞), let τ ∈ T, and let A be a constant (n × n) matrix. Then

(i) the state transition matrix for Eq. (4.11.31) is given by Φ(t, τ) = e^{A(t−τ)} for all t ∈ T;
(ii) the matrix e^{At} is non-singular for all t ∈ T;
(iii) e^{At₁} e^{At₂} = e^{A(t₁+t₂)} for all t₁, t₂ ∈ T;
(iv) A e^{At} = e^{At} A for all t ∈ T; and
(v) (e^{At})⁻¹ = e^{−At} for all t ∈ T.

Proof. To prove the first part we must show that Φ(t, τ) satisfies the matrix equation

Φ̇(t, τ) = AΦ(t, τ)

for all t ∈ T, with Φ(τ, τ) = I. Now, by definition,

Φ(t, τ) = e^{A(t−τ)} = I + Σ_{k=1}^{∞} ((t − τ)^k/k!) A^k.

In view of Theorem 4.11.48 we may differentiate the above series term by term. In doing so we obtain

d/dt [e^{A(t−τ)}] = A + Σ_{k=1}^{∞} ((t − τ)^k/k!) A^{k+1} = A[I + Σ_{k=1}^{∞} ((t − τ)^k/k!) A^k] = A e^{A(t−τ)},

and thus we have

Φ̇(t, τ) = AΦ(t, τ)

for all t ∈ T, with Φ(τ, τ) = e^{A(τ−τ)} = I. Therefore, e^{A(t−τ)} is the state transition matrix for Eq. (4.11.31).

The second part of the theorem is obvious.

To prove the third part of the theorem, we note that for any t₁, t₂ ∈ T, we have

Φ(t₁, −t₂) = Φ(t₁, 0)Φ(0, −t₂).

Now Φ(t₁, −t₂) = e^{A(t₁+t₂)}, Φ(t₁, 0) = e^{At₁}, and Φ(0, −t₂) = e^{At₂}, which yields the desired result.

To prove the fourth part of the theorem we note that for all t ∈ T,

A(I + Σ_{k=1}^{∞} (t^k/k!) A^k) = A + Σ_{k=1}^{∞} (t^k/k!) A^{k+1} = (I + Σ_{k=1}^{∞} (t^k/k!) A^k)A.

Finally, to prove the last part of the theorem, note that for all t ∈ T,

e^{At} · e^{A(−t)} = e^{A(t−t)} = I.

Therefore, (e^{At})⁻¹ = e^{−At}. ∎

The following natural question arises: can we find an expression similar to e^{At} for the case when A = A(t), t ∈ T? The answer is, in general, no. However, there is a special case when such a generalization is valid.

4.11.51. Theorem. If for Eq. (4.11.30) A(t₁)A(t₂) = A(t₂)A(t₁) for all t₁, t₂ ∈ T, then the state transition matrix Φ(t, τ) is given by

Φ(t, τ) = e^{∫_τ^t A(η) dη} ≜ e^{B(t,τ)} = I + Σ_{k=1}^{∞} (1/k!) B^k(t, τ),

where B(t, τ) ≜ ∫_τ^t A(η) dη.

4.11.52. Exercise. Prove Theorem 4.11.51.

We note that a sufficient condition for A(t₁) to commute with A(t₂) for all t₁, t₂ ∈ T is that A(t) be a diagonal matrix.

4.11.53. Exercise. Find the state transition matrix for ẋ = A(t)x, where

A(t) = [; ~].
The reader will find it instructive to verify the following additional results.

4.11.54. Exercise. Let Λ denote the (n × n) diagonal matrix

Λ = [ λ₁  0   ...  0  ]
    [ 0   λ₂  ...  0  ]
    [ ...             ]
    [ 0   0   ...  λₙ ].

Show that

e^{Λt} = [ e^{λ₁t}  0        ...  0       ]
         [ 0        e^{λ₂t}  ...  0       ]
         [ ...                            ]
         [ 0        0        ...  e^{λₙt} ]

for all t ∈ T = (−∞, ∞).

4.11.55. Exercise. Let t ∈ T = (−∞, ∞), let τ ∈ T, and let ξ ∈ Rⁿ (or Cⁿ). Let A be the (n × n) matrix for Eq. (4.11.31), and let φ denote the unique solution of Eq. (4.11.31) with φ(τ) = ξ. Let P be a similarity transformation for A, and let B = P⁻¹AP.

(a) Show that e^{At} = P e^{Bt} P⁻¹ for all t ∈ T.
(b) Show that the unique solution of Eq. (4.11.31) is given by

φ = Pψ,

where ψ is the unique solution of the initial-value problem

ẏ = By

with

ψ(τ) = P⁻¹φ(τ) = P⁻¹ξ.

4.11.56. Exercise. Let D be defined by (4.11.28). In Eq. (4.11.29), let A(t) = A for all t ∈ T; i.e.,

ẋ = Ax + v(t).   (4.11.57)

Let τ ∈ T, and let φ denote the unique solution of Eq. (4.11.57) with φ(τ) = ξ. Let P be a similarity transformation for A, and let B = P⁻¹AP. Show that the unique solution of Eq. (4.11.57) is given by

φ = Pψ,

where ψ is the unique solution of the initial-value problem

ẏ = By + P⁻¹v(t)

with

ψ(τ) = P⁻¹ξ, (τ, ψ(τ)) ∈ D, t ∈ T.

4.11.58. Exercise. Let J denote the Jordan canonical form of the (n × n) matrix A of Eq. (4.11.31), and let M denote the non-singular (n × n) matrix which transforms A into J; i.e., J = M⁻¹AM. Then J has the block diagonal form

J = [ J₀  0   ...  0   ]
    [ 0   J₁  ...  0   ]
    [ ...              ]
    [ 0   0   ...  J_p ],

where

J₀ = [ λ₁  0   ...  0  ]
     [ 0   λ₂  ...  0  ]
     [ ...             ]
     [ 0   0   ...  λₖ ]

and where

J_m = [ λ_{k+m}  1        0   ...  0       ]
      [ 0        λ_{k+m}  1   ...  0       ]
      [ ...                        1       ]
      [ 0        0        0   ...  λ_{k+m} ],

m = 1, ..., p, where J_m is a (ν_m × ν_m) matrix with k + ν₁ + ... + ν_p = n, and where λ₁, ..., λₖ, λ_{k+1}, ..., λ_{k+p} denote the (not necessarily distinct) eigenvalues of A. Show that

e^{J₀t} = [ e^{λ₁t}  0        ...  0       ]
          [ 0        e^{λ₂t}  ...  0       ]
          [ ...                            ]
          [ 0        0        ...  e^{λₖt} ]

and

e^{J_mt} = e^{λ_{k+m}t} [ 1  t  t²/2!  ...  t^{ν_m−1}/(ν_m − 1)! ]
                        [ 0  1  t      ...  t^{ν_m−2}/(ν_m − 2)! ]
                        [ ...                                    ]
                        [ 0  0  0      ...  1                    ],

m = 1, ..., p.
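The closed-form exponential of a single Jordan block asserted in this exercise is easy to check numerically. A minimal sketch (the eigenvalue λ = −0.5, block size ν = 4, and t = 1.3 are arbitrary illustrative choices):

import numpy as np
from math import factorial
from scipy.linalg import expm

lam, nu, t = -0.5, 4, 1.3                              # arbitrary illustrative choices

J = lam * np.eye(nu) + np.diag(np.ones(nu - 1), k=1)   # one Jordan block

closed = np.zeros((nu, nu))                            # the asserted closed form
for i in range(nu):
    for j in range(i, nu):
        closed[i, j] = t ** (j - i) / factorial(j - i)
closed *= np.exp(lam * t)

print(np.max(np.abs(closed - expm(J * t))))            # agrees to machine precision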
Next, we consider initial-value problems characterized by linear nth-order ordinary differential equations given by

aₙ(t)x^(n) + aₙ₋₁(t)x^(n−1) + ... + a₁(t)x^(1) + a₀(t)x = v(t),   (4.11.59)

aₙ(t)x^(n) + aₙ₋₁(t)x^(n−1) + ... + a₁(t)x^(1) + a₀(t)x = 0,   (4.11.60)

and

aₙx^(n) + aₙ₋₁x^(n−1) + ... + a₁x^(1) + a₀x = 0.   (4.11.61)

In Eqs. (4.11.59) and (4.11.60), v(t) and aᵢ(t), i = 0, ..., n, are functions which are defined and continuous on a real t interval T, and in Eq. (4.11.61), the aᵢ, i = 0, ..., n, are constant coefficients. We assume that aₙ ≠ 0, that aₙ(t) ≠ 0 for any t ∈ T, and that v(t) is not identically zero. Furthermore, the coefficients aᵢ, aᵢ(t), i = 0, ..., n, may be either real or complex.

In accordance with Eq. (4.11.23), we can reduce the study of Eq. (4.11.60) to the study of the system of n first-order ordinary differential equations

ẋ = A(t)x,   (4.11.62)

where

A(t) = [ 0             1             0             ...  0              ]
       [ 0             0             1             ...  0              ]
       [ ...                                                           ]
       [ 0             0             0             ...  1              ]
       [ −a₀(t)/aₙ(t)  −a₁(t)/aₙ(t)  −a₂(t)/aₙ(t)  ...  −aₙ₋₁(t)/aₙ(t) ].   (4.11.63)

In this case the matrix A(t) is said to be in companion form. Since A(t) is continuous on T, there exists for all t ∈ T a unique solution φ to the initial-value problem

ẋ = A(t)x, x(τ) = ξ = (ξ₁, ..., ξₙ)ᵀ,   (4.11.64)

where τ ∈ T and ξ ∈ Rⁿ (or Cⁿ) (this will be proved in the next chapter). Moreover, the first component of φ, which we denote by ψ, is the solution of Eq. (4.11.60) satisfying

ψ(τ) = ξ₁, ψ^(1)(τ) = ξ₂, ..., ψ^(n−1)(τ) = ξₙ.
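The construction of the companion matrix (4.11.63) is mechanical and is easily expressed in code. The following minimal Python sketch (the function name companion is chosen here purely for illustration) builds A(t) from the coefficients a₀(t), ..., aₙ(t):

import numpy as np

def companion(coeffs, t):
    # build the companion matrix (4.11.63); coeffs(t) returns [a_0(t), ..., a_n(t)]
    a = np.asarray(coeffs(t), dtype=float)
    n = len(a) - 1
    A = np.zeros((n, n))
    A[:-1, 1:] = np.eye(n - 1)      # ones on the superdiagonal
    A[-1, :] = -a[:-1] / a[-1]      # last row: -a_i(t)/a_n(t)
    return A

# the equation t**2 x'' + t x' - x = 0 of Example 4.11.69 below:
print(companion(lambda t: [-1.0, t, t**2], 2.0))   # [[0, 1], [0.25, -0.5]]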

Now let ψ₁, ..., ψₙ be solutions of Eq. (4.11.60). Then we can readily verify that the matrix

Ψ = [ ψ₁        ψ₂        ...  ψₙ        ]
    [ ψ₁^(1)    ψ₂^(1)    ...  ψₙ^(1)    ]   (4.11.65)
    [ ...                                ]
    [ ψ₁^(n−1)  ψ₂^(n−1)  ...  ψₙ^(n−1)  ]

is a solution of the matrix equation

Ψ̇ = A(t)Ψ,   (4.11.66)

where A(t) is defined by Eq. (4.11.63). We call the determinant of Ψ the Wronskian of Eq. (4.11.60) with respect to the solutions ψ₁, ..., ψₙ, and we denote it by

det Ψ = W(ψ₁, ..., ψₙ).   (4.11.67)

Note that for a fixed set of solutions ψ₁, ..., ψₙ (and considering τ fixed), the Wronskian is a function of t. To indicate this, we write W(ψ₁, ..., ψₙ)(t). In view of Theorem 4.11.37 we have, for all t ∈ T,

W(ψ₁, ..., ψₙ)(t) = det Ψ(t) = det Ψ(τ) e^{∫_τ^t tr A(η) dη}
                  = W(ψ₁, ..., ψₙ)(τ) e^{−∫_τ^t [aₙ₋₁(η)/aₙ(η)] dη}.   (4.11.68)

4.11.69. Example. Consider the second-order ordinary differential equation

t²x^(2) + tx^(1) − x = 0, 0 < t < ∞.   (4.11.70)

The functions ψ₁(t) = t and ψ₂(t) = 1/t are clearly solutions of Eq. (4.11.70). Consider now the matrix

Ψ(t) = [ t  1/t   ]
       [ 1  −1/t² ].

Then

W(ψ₁, ψ₂)(t) = det Ψ(t) = −2/t, t > 0.

Using the notation of Eq. (4.11.63), we have in the present case a₁(t)/a₂(t) = 1/t. From Eq. (4.11.68) we have, for any τ > 0,

W(ψ₁, ψ₂)(t) = det Ψ(t) = W(ψ₁, ψ₂)(τ) e^{∫_τ^t (−1/η) dη} = −(2/τ) e^{ln(τ/t)} = −2/t, t > 0,

which checks. ∎
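The Wronskian computation in this example is easy to reproduce numerically; a short check (Python, arbitrary grid on t > 0):

import numpy as np

t = np.linspace(0.5, 5.0, 10)
psi1, dpsi1 = t, np.ones_like(t)           # psi_1(t) = t
psi2, dpsi2 = 1.0 / t, -1.0 / t**2         # psi_2(t) = 1/t

W = psi1 * dpsi2 - psi2 * dpsi1            # Wronskian of the pair
print(np.allclose(W, -2.0 / t))            # True, as computed above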

The reader will have no difficulty in proving the following:

4.11.71. Theorem. A set of n solutions of Eq. (4.11.60), ψ₁, ..., ψₙ, is linearly independent on a t interval T if and only if W(ψ₁, ..., ψₙ)(t) ≠ 0 for all t ∈ T. Moreover, every solution of Eq. (4.11.60) is a linear combination of any set of n linearly independent solutions.

4.11.72. Exercise. Prove Theorem 4.11.71.

We call a set of n solutions of Eq. (4.11.60), ψ₁, ..., ψₙ, which is linearly independent on T a fundamental set for Eq. (4.11.60).

Let us next turn our attention to the non-homogeneous linear nth-order ordinary differential equation (4.11.59). Without loss of generality, let us assume that aₙ(t) = 1 for all t ∈ T; i.e., let us consider

x^(n) + aₙ₋₁(t)x^(n−1) + ... + a₁(t)x^(1) + a₀(t)x = v(t).   (4.11.73)
The study of this equation reduces to the study of the system of n first-order ordinary differential equations

ẋ = A(t)x + b(t),   (4.11.74)

where

A(t) = [ 0       1       0       ...  0        ]         [ 0    ]
       [ 0       0       1       ...  0        ]         [ 0    ]
       [ ...                                   ],  b(t) = [ ...  ].   (4.11.75)
       [ 0       0       0       ...  1        ]         [ 0    ]
       [ −a₀(t)  −a₁(t)  −a₂(t)  ...  −aₙ₋₁(t) ]         [ v(t) ]

In the next chapter we will show that for all t ∈ T there exists a unique solution φ to the initial-value problem

ẋ = A(t)x + b(t), x(τ) = ξ = (ξ₁, ..., ξₙ)ᵀ,   (4.11.76)

where τ ∈ T and ξ ∈ Rⁿ (or Cⁿ). The first component of φ, which we denote by ζ, is the solution of Eq. (4.11.59), with aₙ(t) = 1 for all t ∈ T, satisfying

ζ(τ) = ξ₁, ζ^(1)(τ) = ξ₂, ..., ζ^(n−1)(τ) = ξₙ.
We now have:

4.11.77. Theorem. Let {ψ₁, ..., ψₙ} be a fundamental set for the equation

x^(n) + aₙ₋₁(t)x^(n−1) + ... + a₁(t)x^(1) + a₀(t)x = 0.   (4.11.78)

Then the solution ζ of the equation

x^(n) + aₙ₋₁(t)x^(n−1) + ... + a₁(t)x^(1) + a₀(t)x = v(t),   (4.11.79)

satisfying (ζ(τ), ζ^(1)(τ), ..., ζ^(n−1)(τ))ᵀ = (ξ₁, ..., ξₙ)ᵀ = ξ, τ ∈ T, ξ ∈ Rⁿ (or Cⁿ), is given by the expression

ζ(t) = ζ_h(t) + Σ_{i=1}^{n} ψᵢ(t) ∫_τ^t [Wᵢ(ψ₁, ..., ψₙ)(s) / W(ψ₁, ..., ψₙ)(s)] v(s) ds,   (4.11.80)

where ζ_h is the solution of Eq. (4.11.78) with ζ_h(τ) = ξ, and where Wᵢ(ψ₁, ..., ψₙ)(t) is obtained from W(ψ₁, ..., ψₙ)(t) by replacing the ith column of W(ψ₁, ..., ψₙ)(t) by (0, 0, ..., 1)ᵀ.

4.11.81. Exercise. Prove Theorem 4.11.77.

Let us consider a specific case.

4.11.82. Example. Consider the second-order ordinary differential equation

t²x^(2) + tx^(1) − x = b(t), t > 0,   (4.11.83)

where b(t) is a real continuous function for all t > 0. This equation is equivalent to

x^(2) + (1/t)x^(1) − (1/t²)x = v(t),   (4.11.84)

where v(t) = b(t)/t². From Example 4.11.69 we have ψ₁(t) = t, ψ₂(t) = 1/t, and W(ψ₁, ψ₂)(t) = −2/t. Also,

W₁(ψ₁, ψ₂)(t) = det [ 0  1/t   ] = −1/t,  W₂(ψ₁, ψ₂)(t) = det [ t  0 ] = t.
                    [ 1  −1/t² ]                              [ 1  1 ]

In view of Eq. (4.11.80), the solution ζ of Eq. (4.11.84) with initial vector ξ at τ is therefore given by

ζ(t) = ζ_h(t) + (t/2) ∫_τ^t v(s) ds − (1/(2t)) ∫_τ^t s²v(s) ds. ∎
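The expression obtained above can be compared with a direct numerical integration of Eq. (4.11.84). In the sketch below (an illustration only; the forcing term b(t) = sin t and the initial point τ = 1 are arbitrary, and zero initial data make ζ_h vanish identically, so ζ reduces to the integral terms):

import numpy as np
from scipy.integrate import quad, solve_ivp

b = lambda t: np.sin(t)                 # arbitrary continuous forcing term
v = lambda t: b(t) / t**2
tau = 1.0

def particular(t):
    # the integral terms of zeta(t) derived above
    i1, _ = quad(v, tau, t)
    i2, _ = quad(lambda s: s**2 * v(s), tau, t)
    return t / 2.0 * i1 - i2 / (2.0 * t)

def rhs(t, y):
    # Eq. (4.11.84) written as a first-order system
    return [y[1], -y[1] / t + y[0] / t**2 + v(t)]

# zero initial data at tau, so zeta_h vanishes identically
sol = solve_ivp(rhs, (tau, 3.0), [0.0, 0.0], rtol=1e-10, atol=1e-12)
print(sol.y[0, -1], particular(3.0))    # the two values agree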

Let us next focus our attention on linear nth-order ordinary differential equations with constant coefficients. Without loss of generality, let us assume that, in Eq. (4.11.61), aₙ = 1. We have

x^(n) + aₙ₋₁x^(n−1) + ... + a₁x^(1) + a₀x = 0.   (4.11.85)

We call the algebraic equation

P(λ) = λ^n + aₙ₋₁λ^{n−1} + ... + a₁λ + a₀ = 0   (4.11.86)

the characteristic equation of the differential equation (4.11.85).

As was done before, we see that the study of Eq. (4.11.85) reduces to the study of the system of first-order ordinary differential equations given by

ẋ = Ax,   (4.11.87)

where

A = [ 0    1    0    ...  0     ]
    [ 0    0    1    ...  0     ]
    [ ...                       ]
    [ 0    0    0    ...  1     ]
    [ −a₀  −a₁  −a₂  ...  −aₙ₋₁ ].   (4.11.88)
- a ._ I
We now show that the eigenvalues of the matrix A of Eq. (4.11.88) are precisely the roots of the characteristic equation (4.11.86). First we consider

det(A − λI) = det [ −λ   1    0    ...  0          ]
                  [ 0    −λ   1    ...  0          ]
                  [ ...                            ]
                  [ 0    0    0    ...  1          ]
                  [ −a₀  −a₁  −a₂  ...  −(λ+aₙ₋₁)  ].

Expanding by the first column, we obtain

det(A − λI) = −λ det [ −λ   1    ...  0          ]  + (−1)^{n+1}(−a₀) det [ 1   0   ...  0 ]
                     [ ...                       ]                        [ −λ  1   ...  0 ]
                     [ 0    0    ...  1          ]                        [ ...            ]
                     [ −a₁  −a₂  ...  −(λ+aₙ₋₁)  ]                        [ 0   0   ...  1 ].

Using induction we arrive at the expression

det(A − λI) = (−1)^n {λ^n + aₙ₋₁λ^{n−1} + ... + a₁λ + a₀}.   (4.11.89)

It follows from Eq. (4.11.89) that λ is an eigenvalue of A if and only if λ is a root of the characteristic equation (4.11.86).
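The fact just proved is easy to observe numerically: the eigenvalues of a companion matrix coincide with the roots of its characteristic polynomial. A minimal sketch (arbitrary coefficients for n = 3):

import numpy as np

a = [2.0, -3.0, 0.5]                     # a_0, a_1, a_2 for n = 3 (with a_3 = 1)
n = len(a)

A = np.zeros((n, n))                     # companion matrix (4.11.88)
A[:-1, 1:] = np.eye(n - 1)
A[-1, :] = [-c for c in a]

eigs = np.sort_complex(np.linalg.eigvals(A))
roots = np.sort_complex(np.roots([1.0] + a[::-1]))   # lambda**3 + 0.5 lambda**2 - 3 lambda + 2
print(eigs)
print(roots)                             # the two lists coincide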

4.11.90. Exercise. Assume that the eigenvalues of the matrix A given in Eq. (4.11.88) are all real and distinct. Let Λ denote the diagonal matrix

Λ = [ λ₁  0   ...  0  ]
    [ 0   λ₂  ...  0  ]   (4.11.91)
    [ ...             ]
    [ 0   0   ...  λₙ ],

where λ₁, ..., λₙ denote the eigenvalues of matrix A. Let V denote the Vandermonde matrix given by

V = [ 1        1        ...  1        ]
    [ λ₁       λ₂       ...  λₙ       ]
    [ λ₁²      λ₂²      ...  λₙ²      ]
    [ ...                             ]
    [ λ₁^{n−1}  λ₂^{n−1}  ...  λₙ^{n−1} ].

(a) Show that V is non-singular.
(b) Show that Λ = V⁻¹AV.

Before closing the present section, let us consider so-called "adjoint systems." To this end let us consider once more Eq. (4.11.30); i.e.,

ẋ = A(t)x.   (4.11.92)

Let A*(t) denote the conjugate transpose of A(t). (That is, if A(t) = [aᵢⱼ(t)], then A*(t) = [āᵢⱼ(t)]ᵀ = [āⱼᵢ(t)], where āᵢⱼ(t) denotes the complex conjugate of aᵢⱼ(t).) We call the system of linear first-order ordinary differential equations

ẏ = −A*(t)y   (4.11.93)

the adjoint system to (4.11.92).

4.11.94. Exercise. Let Ψ be a fundamental matrix of Eq. (4.11.92). Show that Φ is a fundamental matrix for Eq. (4.11.93) if and only if

Φ*Ψ = C,

where C is a constant non-singular matrix, and where Φ* denotes the conjugate transpose of Φ.
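The property stated in this exercise can be observed numerically by integrating both systems from the identity. In the sketch below (only an illustration; the matrix A(t) is an arbitrary continuous choice, and both fundamental matrices are normalized to I at t = 0, so the constant matrix C comes out as the identity):

import numpy as np
from scipy.integrate import solve_ivp

def A(t):
    # arbitrary continuous (real) coefficient matrix
    return np.array([[0.0, 1.0 + t], [-2.0, np.cos(t)]])

def fundamental(M, ts):
    # fundamental matrix with value I at t = 0 for x' = M(t) x
    sol = solve_ivp(lambda t, y: (M(t) @ y.reshape(2, 2)).ravel(),
                    (ts[0], ts[-1]), np.eye(2).ravel(),
                    t_eval=ts, rtol=1e-10, atol=1e-12)
    return [sol.y[:, k].reshape(2, 2) for k in range(len(ts))]

ts = np.linspace(0.0, 2.0, 5)
Psis = fundamental(A, ts)                          # for Eq. (4.11.92)
Phis = fundamental(lambda t: -A(t).conj().T, ts)   # adjoint system (4.11.93)

for Psi, Phi in zip(Psis, Phis):
    print(np.round(Phi.conj().T @ Psi, 8))         # the identity matrix at every t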

It is also possible to consider adjoint equations for linear nth-order ordinary differential equations. Let us for example consider Eq. (4.11.85), the study of which can be reduced to that of Eq. (4.11.87), with A specified by Eq. (4.11.88). Now consider the adjoint system to Eq. (4.11.87), given by

ẏ = −A*y,   (4.11.95)

where

−A* = [ 0   0   ...  0   ā₀   ]
      [ −1  0   ...  0   ā₁   ]
      [ 0   −1  ...  0   ā₂   ]   (4.11.96)
      [ ...                   ]
      [ 0   0   ...  −1  āₙ₋₁ ],

where āᵢ denotes the complex conjugate of aᵢ, i = 0, ..., n − 1. Equation (4.11.95) represents the system of equations

ẏ₁ = ā₀yₙ,
ẏ₂ = −y₁ + ā₁yₙ,   (4.11.97)
  ⋮
ẏₙ = −yₙ₋₁ + āₙ₋₁yₙ.

Differentiating the last expression in Eq. (4.11.97) (n − 1) times, eliminating y₁, ..., yₙ₋₁, and letting yₙ = y, we obtain

(−1)^n y^(n) + (−1)^{n−1} āₙ₋₁ y^(n−1) + ... + (−1) ā₁ y^(1) + ā₀ y = 0.   (4.11.98)

Equation (4.11.98) is called the adjoint of Eq. (4.11.85).

4.12. NOTES AND REFERENCES

There are many excellent texts on finite-dimensional vector spaces and matrices that can be used to supplement this chapter (see, e.g., [4.1], [4.2], [4.4], and [4.6]-[4.10]). References [4.1], [4.2], [4.6], and [4.10] include applications. (In particular, consult the references in [4.10] for a list of diversified areas of applications.)

Excellent references on ordinary differential equations include [4.3], [4.5], and [4.11].

REFERENCES

[4.1] N. R. AMUNDSON, Mathematical Methods in Chemical Engineering: Matrices and Their Applications. Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1966.
[4.2] R. E. BELLMAN, Introduction to Matrix Algebra. New York: McGraw-Hill Book Company, Inc., 1970.
[4.3] F. BRAUER and J. A. NOHEL, Qualitative Theory of Ordinary Differential Equations: An Introduction. New York: W. A. Benjamin, Inc., 1969.*
[4.4] E. T. BROWNE, Introduction to the Theory of Determinants and Matrices. Chapel Hill, N.C.: The University of North Carolina Press, 1958.
[4.5] E. A. CODDINGTON and N. LEVINSON, Theory of Ordinary Differential Equations. New York: McGraw-Hill Book Company, Inc., 1955.
[4.6] F. R. GANTMACHER, Theory of Matrices. Vols. I, II. New York: Chelsea Publishing Company, 1959.
[4.7] P. R. HALMOS, Finite Dimensional Vector Spaces. Princeton, N.J.: D. Van Nostrand Company, Inc., 1958.
[4.8] K. HOFFMAN and R. KUNZE, Linear Algebra. Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1961.
[4.9] S. LIPSCHUTZ, Linear Algebra. New York: McGraw-Hill Book Company, 1968.
[4.10] B. NOBLE, Applied Linear Algebra. Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1969.
[4.11] L. S. PONTRYAGIN, Ordinary Differential Equations. Reading, Mass.: Addison-Wesley Publishing Co., Inc., 1962.

*Reprinted by Dover Publications, Inc., New York, 1989.
5

METRIC SPACES

Up to this point in our development we have concerned ourselves primarily with the algebraic structure of mathematical systems. In the present chapter we focus our attention on topological structure. In doing so, we introduce the concepts of "distance" and "closeness." In the final two chapters we will consider mathematical systems endowed with algebraic as well as topological structure.

A generalization of the concept of "distance" is the notion of metric. Using the terminology from geometry, we will refer to elements of an arbitrary set X as points, and we will characterize a metric as a real-valued, non-negative function on X × X satisfying the properties of "distance" between two points of X. We will refer to a mathematical system consisting of a basic set X and a metric defined on it as a metric space. We emphasize that in the present chapter the underlying space X need not be a linear space.

In the first nine sections of the present chapter we establish several basic facts from the theory of metric spaces, while in the last section of the present chapter, which consists of two parts, we consider some applications of the material of the present chapter.
5.1. DEFINITION OF METRIC SPACE

We begin with the following definition of metric and metric space.

5.1.1. Definition. Let X be an arbitrary non-empty set, and let ρ be a real-valued function on X × X, i.e., ρ: X × X → R, where ρ has the following properties:

(i) ρ(x, y) ≥ 0 for all x, y ∈ X, and ρ(x, y) = 0 if and only if x = y;
(ii) ρ(x, y) = ρ(y, x) for all x, y ∈ X; and
(iii) ρ(x, y) ≤ ρ(x, z) + ρ(z, y) for all x, y, z ∈ X.

The function ρ is called a metric on X, and the mathematical system consisting of ρ and X, {X; ρ}, is called a metric space.

The set X is often called the underlying set of the metric space, the elements of X are often called points, and ρ(x, y) is frequently called the distance from a point x ∈ X to a point y ∈ X. In view of axiom (i) the distance between two different points is a unique positive number, and the distance is equal to zero if and only if the two points coincide. Axiom (ii) indicates that the distance between points x and y is equal to the distance between points y and x. Axiom (iii) represents the well-known triangle inequality encountered, for example, in plane geometry. Clearly, if ρ is a metric for X and if α is any real positive number, then the function αρ(x, y) is also a metric for X. We are thus in a position to define infinitely many metrics on X.

The above definition of metric was motivated by our notion of distance. Our next result enables us to define a metric in an equivalent (and often convenient) way.

5.1.2. Theorem. Let ρ: X × X → R. Then ρ is a metric if and only if

(i) ρ(x, y) = 0 if and only if x = y; and
(ii) ρ(y, z) ≤ ρ(x, y) + ρ(x, z) for all x, y, z ∈ X.

Proof. The necessity is obvious. To prove sufficiency, let x, y, z ∈ X with y = z. Then 0 = ρ(y, y) ≤ 2ρ(x, y). Hence, ρ(x, y) ≥ 0 for all x, y ∈ X. Next, let z = x. Then ρ(y, x) ≤ ρ(x, y). Since x and y are arbitrary, we can reverse their roles and conclude ρ(x, y) ≤ ρ(y, x). Therefore, ρ(x, y) = ρ(y, x) for all x, y ∈ X. The triangle inequality in the form of axiom (iii) of Definition 5.1.1 now follows from (ii) together with this symmetry. This proves that ρ is a metric. ∎

Different metrics defined on the same underlying set X yield different metric spaces. In applications, the choice of a specific metric is often dictated by the particular problem on hand. If in a particular situation the metric ρ is understood, then we simply write X in place of {X; ρ} to denote the particular metric space under consideration.

Let us now consider a few examples of metric spaces.

5.1.3. Example. Let X be the set of real numbers R, and let the function ρ on R × R be defined as

ρ(x, y) = |x − y|   (5.1.4)

for all x, y ∈ R, where |x| denotes the absolute value of x. Now clearly ρ(x, y) = |x − y| = 0 if and only if x = y. Also, for all x, y, z ∈ R, we have ρ(y, z) = |y − z| = |(y − x) + (x − z)| ≤ |x − y| + |x − z| = ρ(x, y) + ρ(x, z). Therefore, by Theorem 5.1.2, ρ is a metric and {R; ρ} is a metric space. We call ρ(x, y) defined by Eq. (5.1.4) the usual metric on R, and we call the metric space {R; ρ} the real line. ∎

5.1.5. Example. Let X be the set of all complex numbers C. If z ∈ C, then z = a + ib, where i = √−1, and where a, b are real numbers. Let z̄ = a − ib and define ρ as

ρ(z₁, z₂) = [(z₁ − z₂)(z̄₁ − z̄₂)]^(1/2).   (5.1.6)

It can readily be shown that {C; ρ} is a metric space. We call (5.1.6) the usual metric for C. ∎

5.1.7. Example. Let X be an arbitrary non-empty set, and define the function ρ on X × X as

ρ(x, y) = 0 if x = y, and ρ(x, y) = 1 if x ≠ y.   (5.1.8)

Clearly ρ(x, y) ≥ 0 for all x, y ∈ X, ρ(x, x) = 0 for all x ∈ X, and ρ(x, y) ≤ ρ(x, z) + ρ(z, y) for all x, y, z ∈ X. Therefore, (5.1.8) is a metric on X. The function defined in Eq. (5.1.8) is called the discrete metric and is important in analysis because it can be used to metrize any set X. ∎

We distinguish between bounded and unbounded metric spaces.

5.1.9. Definition. Let {X; ρ} be a metric space. If there exists a positive number r such that ρ(x, y) < r for all x, y ∈ X, we say {X; ρ} is a bounded metric space. If {X; ρ} is not bounded, we say {X; ρ} is an unbounded metric space.

If {X; ρ} is an unbounded metric space, then ρ takes on arbitrarily large values. The metric spaces in Examples 5.1.3 and 5.1.5 are unbounded, whereas the metric space in Example 5.1.7 is clearly bounded.

5.1.10. Exercise. Let {X; ρ} be an arbitrary metric space. Define the function ρ₁: X × X → R by

ρ₁(x, y) = ρ(x, y) / (1 + ρ(x, y)).   (5.1.11)

Show that ρ₁(x, y) is a metric. Show that {X; ρ₁} is a bounded metric space, even though {X; ρ} may not be bounded. Thus, the function (5.1.11) can be used to generate a bounded metric space from any unbounded metric space. (Hint: Show that if f: R → R is given by f(t) = t/(1 + t), then f(t₁) ≤ f(t₂) for all t₁, t₂ such that 0 ≤ t₁ ≤ t₂.)
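The effect of the transformation (5.1.11) is easy to observe numerically. The sketch below (Python with NumPy; random sample points on the real line with the usual metric) checks on samples that ρ₁ is bounded by 1 and satisfies the triangle inequality:

import numpy as np

rho = lambda x, y: abs(x - y)                       # the usual metric on R
rho1 = lambda x, y: rho(x, y) / (1.0 + rho(x, y))   # the metric (5.1.11)

rng = np.random.default_rng(0)
pts = rng.uniform(-1e6, 1e6, size=100)

print(all(rho1(x, y) < 1.0 for x in pts for y in pts))  # bounded by 1
print(all(rho1(x, y) <= rho1(x, z) + rho1(z, y) + 1e-12
          for x, y, z in rng.choice(pts, (1000, 3))))   # triangle inequality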

Subsequently, we will call

R* = R ∪ {−∞} ∪ {+∞}

the extended real numbers. In the following exercise, we define a useful metric on R*. This metric is, of course, not the only metric possible.

5.1.12. Exercise. Let X = R* and define the function f: R* → R as

f(x) = x/(1 + |x|) for x ∈ R, f(+∞) = 1, f(−∞) = −1.

Let ρ*: R* × R* → R be defined by ρ*(x, y) = |f(x) − f(y)| for all x, y ∈ R*. Show that {R*; ρ*} is a bounded metric space. The function ρ* is called the usual metric for R*, and {R*; ρ*} is called the extended real line.

We will have occasion to use the next result.

5.1.13. Theorem. Let {X; ρ} be a metric space, and let x, y, and z be any elements of X. Then

|ρ(x, z) − ρ(y, z)| ≤ ρ(x, y)   (5.1.14)

for all x, y, z ∈ X.

Proof. From axiom (iii) of Definition 5.1.1 it follows that

ρ(x, z) ≤ ρ(x, y) + ρ(y, z)   (5.1.15)

and

ρ(y, z) ≤ ρ(y, x) + ρ(x, z).   (5.1.16)

From (5.1.15) we have

ρ(x, z) − ρ(y, z) ≤ ρ(x, y),   (5.1.17)

and from (5.1.16) we have

−ρ(y, x) ≤ ρ(x, z) − ρ(y, z).   (5.1.18)

In view of axiom (ii) of Definition 5.1.1 we have ρ(x, y) = ρ(y, x), and thus relations (5.1.17) and (5.1.18) imply

−ρ(x, y) ≤ ρ(x, z) − ρ(y, z) ≤ ρ(x, y).

This proves that |ρ(x, z) − ρ(y, z)| ≤ ρ(x, y) for all x, y, z ∈ X. ∎

The notion of metric makes it possible to consider various geometric concepts. We have:

5.1.19. Definition. Let {X; ρ} be a metric space, and let Y be a non-void subset of X. If ρ(x, y) is bounded for all x, y ∈ Y, we define the diameter of set Y, denoted δ(Y) or diam (Y), as

δ(Y) = sup {ρ(x, y): x, y ∈ Y}.

If ρ(x, y) is unbounded, we write δ(Y) = +∞ and we say that Y has infinite diameter, or Y is unbounded. If Y is empty, we define δ(Y) = 0.

5.1.20. Exercise. Show that if Y ⊂ Z ⊂ X, where {X; ρ} is a metric space, then δ(Y) ≤ δ(Z). Also, show that if Z is non-empty, then δ(Z) = 0 if and only if Z is a singleton.

We also have:

5.1.21. Definition. Let {X; ρ} be a metric space, and let Y and Z be two non-void subsets of X. We define the distance between sets Y and Z as

d(Y, Z) = inf {ρ(y, z): y ∈ Y, z ∈ Z}.

Let p ∈ X and define

d(p, Z) = inf {ρ(p, z): z ∈ Z}.

We call d(p, Z) the distance between point p and set Z.

Since ρ(y, z) = ρ(z, y) for all y ∈ Y and z ∈ Z, it follows that d(Y, Z) = d(Z, Y). We note that, in general, d(Y, Z) = 0 does not imply that Y and Z have points in common. For example, let X be the real line with the usual metric ρ. If Y = {x ∈ X: 0 < x < 1} and Z = {x ∈ X: 1 < x < 2}, then clearly d(Y, Z) = 0, even though Y ∩ Z = ∅. Similarly, d(p, Z) = 0 does not imply that p ∈ Z.

5.1.22. Theorem. Let {X; ρ} be a metric space, and let Y be any non-void subset of X. If ρ' denotes the restriction of ρ to Y × Y, i.e., if

ρ'(x, y) = ρ(x, y) for all x, y ∈ Y,

then {Y; ρ'} is a metric space.

5.1.23. Exercise. Prove Theorem 5.1.22.

We call ρ' the metric induced by ρ on Y, and we say that {Y; ρ'} is a metric subspace of {X; ρ} or simply a subspace of X. Since usually there is no room for confusion, we drop the prime from ρ' and simply denote the metric subspace by {Y; ρ}. We emphasize that any non-void subset of a metric space can be made into a metric subspace. This is not so in the case of linear subspaces. If Y ≠ X, then we speak of a proper subspace.

5.2. SOME INEQUALITIES

In order to present some of the important metric spaces that arise in applications, we first need to establish some important inequalities. These are summarized and proved in the following:

5.2.1. Theorem. Let R denote the set of real numbers, and let C denote the set of complex numbers.

(i) Let p, q ∈ R such that 1 < p < ∞ and such that 1/p + 1/q = 1. Then for all α, β ∈ R such that α ≥ 0 and β ≥ 0, we have

αβ ≤ α^p/p + β^q/q.   (5.2.2)

(ii) (Hölder's inequality) Let p, q ∈ R be such that 1 < p < ∞ and 1/p + 1/q = 1.

(a) Finite Sums. Let n be any positive integer, and let ξ₁, ..., ξₙ and η₁, ..., ηₙ belong either to R or to C. Then

Σ_{i=1}^{n} |ξᵢηᵢ| ≤ (Σ_{i=1}^{n} |ξᵢ|^p)^{1/p} (Σ_{i=1}^{n} |ηᵢ|^q)^{1/q}.   (5.2.3)

(b) Infinite Sums. Let {ξᵢ} and {ηᵢ} be infinite sequences in either R or C. If Σ_{i=1}^{∞} |ξᵢ|^p < ∞ and Σ_{i=1}^{∞} |ηᵢ|^q < ∞, then

Σ_{i=1}^{∞} |ξᵢηᵢ| ≤ (Σ_{i=1}^{∞} |ξᵢ|^p)^{1/p} (Σ_{i=1}^{∞} |ηᵢ|^q)^{1/q}.   (5.2.4)

(c) Integrals. Let [a, b] be an interval on the real line, and let f, g: [a, b] → R. If ∫_a^b |f(t)|^p dt < ∞ and ∫_a^b |g(t)|^q dt < ∞ (integration is in the Riemann sense), then

∫_a^b |f(t)g(t)| dt ≤ [∫_a^b |f(t)|^p dt]^{1/p} [∫_a^b |g(t)|^q dt]^{1/q}.   (5.2.5)

(iii) (Minkowski's inequality) Let p ∈ R, where 1 ≤ p < ∞.

(a) Finite Sums. Let n be any positive integer, and let ξ₁, ..., ξₙ and η₁, ..., ηₙ belong either to R or to C. Then

[Σ_{i=1}^{n} |ξᵢ ± ηᵢ|^p]^{1/p} ≤ [Σ_{i=1}^{n} |ξᵢ|^p]^{1/p} + [Σ_{i=1}^{n} |ηᵢ|^p]^{1/p}.   (5.2.6)

(b) Infinite Sums. Let {ξᵢ} and {ηᵢ} be infinite sequences in either R or C. If Σ_{i=1}^{∞} |ξᵢ|^p < ∞ and Σ_{i=1}^{∞} |ηᵢ|^p < ∞, then

[Σ_{i=1}^{∞} |ξᵢ ± ηᵢ|^p]^{1/p} ≤ [Σ_{i=1}^{∞} |ξᵢ|^p]^{1/p} + [Σ_{i=1}^{∞} |ηᵢ|^p]^{1/p}.   (5.2.7)

(c) Integrals. Let [a, b] be an interval on the real line, and let f, g: [a, b] → R. If ∫_a^b |f(t)|^p dt < ∞ and ∫_a^b |g(t)|^p dt < ∞, then

[∫_a^b |f(t) ± g(t)|^p dt]^{1/p} ≤ [∫_a^b |f(t)|^p dt]^{1/p} + [∫_a^b |g(t)|^p dt]^{1/p}.   (5.2.8)

Proof. To prove part (i), consider the graph of η = ξ^{p−1} in the (ξ, η) plane, depicted in Figure A. Let A₁ = ∫_0^α ξ^{p−1} dξ and A₂ = ∫_0^β η^{q−1} dη. We have A₁ = α^p/p and A₂ = β^q/q. From Figure A it is clear that A₁ + A₂ ≥ αβ for any choice of α, β ≥ 0, and hence relation (5.2.2) follows.

To prove part (iia) we first note that if (Σ_{i=1}^{n} |ξᵢ|^p)^{1/p} = 0 or if (Σ_{i=1}^{n} |ηᵢ|^q)^{1/q} = 0, then inequality (5.2.3) follows trivially. Therefore, we assume that (Σ_{i=1}^{n} |ξᵢ|^p)^{1/p} ≠ 0 and (Σ_{i=1}^{n} |ηᵢ|^q)^{1/q} ≠ 0. From (5.2.2) we now have

[|ξᵢ| / (Σⱼ |ξⱼ|^p)^{1/p}] · [|ηᵢ| / (Σⱼ |ηⱼ|^q)^{1/q}] ≤ (1/p)·|ξᵢ|^p / (Σⱼ |ξⱼ|^p) + (1/q)·|ηᵢ|^q / (Σⱼ |ηⱼ|^q).

[5.2.9. Figure A. The curve η = ξ^{p−1}, with the areas A₁ = α^p/p and A₂ = β^q/q together covering the rectangle of area αβ.]
Summing the above inequality with respect to i from 1 to n, and noting that 1/p + 1/q = 1, it now follows that

Σ_{i=1}^{n} |ξᵢηᵢ| = Σ_{i=1}^{n} |ξᵢ||ηᵢ| ≤ (Σ_{i=1}^{n} |ξᵢ|^p)^{1/p} (Σ_{i=1}^{n} |ηᵢ|^q)^{1/q},

which was to be proved.

To prove part (iib), we note that for any positive integer n,

Σ_{i=1}^{n} |ξᵢηᵢ| ≤ (Σ_{i=1}^{n} |ξᵢ|^p)^{1/p} (Σ_{i=1}^{n} |ηᵢ|^q)^{1/q} ≤ (Σ_{i=1}^{∞} |ξᵢ|^p)^{1/p} (Σ_{i=1}^{∞} |ηᵢ|^q)^{1/q}.

If we let n → ∞ in the above inequality, then (5.2.4) follows.

The proof of part (iic) is established in a similar fashion. We leave the details of the proof to the reader.

To prove part (iiia), we first note that if p = 1, then inequality (5.2.6) follows trivially. It therefore suffices to consider the case 1 < p < ∞. We observe that for any ξᵢ and ηᵢ we have

(|ξᵢ| + |ηᵢ|)^p = (|ξᵢ| + |ηᵢ|)^{p−1}|ξᵢ| + (|ξᵢ| + |ηᵢ|)^{p−1}|ηᵢ|.

Summing the above identity with respect to i from 1 to n, we now have

Σ_{i=1}^{n} (|ξᵢ| + |ηᵢ|)^p = Σ_{i=1}^{n} [|ξᵢ| + |ηᵢ|]^{p−1}|ξᵢ| + Σ_{i=1}^{n} [|ξᵢ| + |ηᵢ|]^{p−1}|ηᵢ|.

Applying the Hölder inequality (5.2.3) to each of the sums on the right side of the above relation and noting that (p − 1)q = p, we now obtain

Σ_{i=1}^{n} [|ξᵢ| + |ηᵢ|]^p ≤ [Σ_{i=1}^{n} (|ξᵢ| + |ηᵢ|)^p]^{1/q} [Σ_{i=1}^{n} |ξᵢ|^p]^{1/p} + [Σ_{i=1}^{n} (|ξᵢ| + |ηᵢ|)^p]^{1/q} [Σ_{i=1}^{n} |ηᵢ|^p]^{1/p}.

If we assume that [Σ_{i=1}^{n} (|ξᵢ| + |ηᵢ|)^p]^{1/q} ≠ 0 and divide both sides of the above inequality by this term, we have

[Σ_{i=1}^{n} (|ξᵢ| + |ηᵢ|)^p]^{1/p} ≤ [Σ_{i=1}^{n} |ξᵢ|^p]^{1/p} + [Σ_{i=1}^{n} |ηᵢ|^p]^{1/p}.

Since [Σ_{i=1}^{n} |ξᵢ ± ηᵢ|^p]^{1/p} ≤ [Σ_{i=1}^{n} (|ξᵢ| + |ηᵢ|)^p]^{1/p}, the desired result follows. We note that in case [Σ_{i=1}^{n} (|ξᵢ| + |ηᵢ|)^p]^{1/q} = 0, inequality (5.2.6) follows trivially.

Applying the same reasoning as above, the reader can now prove the Minkowski inequality for infinite sums and for integrals. ∎
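Both inequalities are easy to test numerically on sample data. A minimal sketch (Python with NumPy; random vectors, the arbitrary choice p = 3, and q determined by 1/p + 1/q = 1):

import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(50)
y = rng.standard_normal(50)
p = 3.0
q = p / (p - 1.0)                      # conjugate exponent: 1/p + 1/q = 1

holder_lhs = np.sum(np.abs(x * y))
holder_rhs = np.sum(np.abs(x)**p)**(1/p) * np.sum(np.abs(y)**q)**(1/q)

mink_lhs = np.sum(np.abs(x + y)**p)**(1/p)
mink_rhs = np.sum(np.abs(x)**p)**(1/p) + np.sum(np.abs(y)**p)**(1/p)

print(holder_lhs <= holder_rhs, mink_lhs <= mink_rhs)   # True True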

If in (5.2.3), (5.2.4), or (5.2.5) we let p = q = 2, then we speak of the Schwarz inequality for finite sums, infinite sums, and integrals, respectively.

5.2.10. Exercise. Prove Hölder's inequality for integrals (5.2.5), Minkowski's inequality for infinite sums (5.2.7), and Minkowski's inequality for integrals (5.2.8).

5.3. EXAMPLES OF IMPORTANT METRIC SPACES

In the present section we consider specific examples of metric spaces which are very important in applications. It turns out that all of the spaces of this section are also vector spaces.

As in Chapter 4, we denote elements x, y ∈ Rⁿ (elements x, y ∈ Cⁿ) by x = (ξ₁, ..., ξₙ) and y = (η₁, ..., ηₙ), respectively, where ξᵢ, ηᵢ ∈ R for i = 1, ..., n (where ξᵢ, ηᵢ ∈ C for i = 1, ..., n). Similarly, elements x, y ∈ R^∞ (elements x, y ∈ C^∞) are denoted by x = (ξ₁, ξ₂, ...) and y = (η₁, η₂, ...), respectively, where ξᵢ, ηᵢ ∈ R for all i (where ξᵢ, ηᵢ ∈ C for all i).
5.3.1. Example. Let X = Rⁿ (let X = Cⁿ), let 1 ≤ p < ∞, and let

ρ_p(x, y) = [Σ_{i=1}^{n} |ξᵢ − ηᵢ|^p]^{1/p}.   (5.3.2)

We now show that {Rⁿ; ρ_p} ({Cⁿ; ρ_p}) is a metric space. Axioms (i) and (ii) of Definition 5.1.1 are readily verified. To show that axiom (iii) is satisfied, let a, b, d ∈ Rⁿ (let a, b, d ∈ Cⁿ), where a = (α₁, ..., αₙ), b = (β₁, ..., βₙ), and d = (δ₁, ..., δₙ). If x = a − b and y = b − d, then we have from inequality (5.2.6),

ρ_p(a, d) = {Σ_{i=1}^{n} |αᵢ − δᵢ|^p}^{1/p} = {Σ_{i=1}^{n} |αᵢ − βᵢ + βᵢ − δᵢ|^p}^{1/p}
          ≤ {Σ_{i=1}^{n} |αᵢ − βᵢ|^p}^{1/p} + {Σ_{i=1}^{n} |βᵢ − δᵢ|^p}^{1/p} = ρ_p(a, b) + ρ_p(b, d),

the triangle inequality. It thus follows that {Rⁿ; ρ_p} ({Cⁿ; ρ_p}) is a metric space; in fact, it is an unbounded metric space.

We frequently abbreviate {Rⁿ; ρ_p} by Rⁿ_p and {Cⁿ; ρ_p} by Cⁿ_p. For the case p = 2, we call ρ₂ the Euclidean metric or the usual metric on Rⁿ. ∎
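The family of metrics (5.3.2), together with the metric (5.3.4) defined in the next example, is conveniently computed by a single routine. A minimal sketch (the function name rho_p is chosen here purely for illustration):

import numpy as np

def rho_p(x, y, p):
    # the metric of Eq. (5.3.2); p = np.inf gives Eq. (5.3.4)
    if np.isinf(p):
        return np.max(np.abs(x - y))
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

x = np.array([1.0, -2.0, 0.5])
y = np.array([0.0, 1.0, 2.0])
for p in (1, 2, 3, np.inf):
    print(p, rho_p(x, y, p))   # the values decrease as p grows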

5.3.3. Example. For x, y ∈ Rⁿ (for x, y ∈ Cⁿ), let

ρ_∞(x, y) = max {|ξ₁ − η₁|, ..., |ξₙ − ηₙ|}.   (5.3.4)

It is readily shown that {Rⁿ; ρ_∞} ({Cⁿ; ρ_∞}) is a metric space. ∎

5.3.5. Example. Let 1 ≤ p < ∞, let X = R^∞ (or X = C^∞), and define

l_p = {x ∈ X: Σ_{i=1}^{∞} |ξᵢ|^p < ∞}.   (5.3.6)

For x, y ∈ l_p, let

ρ_p(x, y) = [Σ_{i=1}^{∞} |ξᵢ − ηᵢ|^p]^{1/p}.   (5.3.7)

We can readily verify that {l_p; ρ_p} is a metric space. ∎

5.3.8. Example. Let X = R^∞ (or X = C^∞), and let

l_∞ = {x ∈ X: sup_i |ξᵢ| < ∞}.   (5.3.9)

For x, y ∈ l_∞, define

ρ_∞(x, y) = sup_i |ξᵢ − ηᵢ|.   (5.3.10)

We can easily show that {l_∞; ρ_∞} is a metric space. ∎

5.3.11. Exercise. Use the inequalities of Section 5.2 to show that the spaces of Examples 5.3.3, 5.3.5, and 5.3.8 are metric spaces.

5.3.12. Example. Let [a, b], a < b, be an interval on the real line, and let C[a, b] be the set of all real-valued continuous functions defined on [a, b]. Let 1 ≤ p < ∞, and for x, y ∈ C[a, b] define

ρ_p(x, y) = [∫_a^b |x(t) − y(t)|^p dt]^{1/p}.   (5.3.13)

We now show that {C[a, b]; ρ_p} is a metric space.

Clearly, ρ_p(x, y) = ρ_p(y, x), and ρ_p(x, y) ≥ 0 for all x, y ∈ C[a, b]. If x(t) = y(t) for all t ∈ [a, b], then ρ_p(x, y) = 0. To prove the converse of this statement, suppose that x(t) ≠ y(t) for some t ∈ [a, b]. Since x, y ∈ C[a, b], x − y ∈ C[a, b], and there is some interval in [a, b], i.e., a subinterval of [a, b], such that |x(t) − y(t)| > 0 for all t in that subinterval. Hence,

[∫_a^b |x(t) − y(t)|^p dt]^{1/p} > 0.

Therefore, ρ_p(x, y) = 0 if and only if x(t) = y(t) for all t ∈ [a, b].

To show that the triangle inequality holds, let u, v, w ∈ C[a, b], and let x = u − v and y = v − w. Then we have, from inequality (5.2.8),

ρ_p(u, w) = {∫_a^b |u(t) − w(t)|^p dt}^{1/p} = {∫_a^b |u(t) − v(t) + v(t) − w(t)|^p dt}^{1/p}
          ≤ {∫_a^b |u(t) − v(t)|^p dt}^{1/p} + {∫_a^b |v(t) − w(t)|^p dt}^{1/p}
          = ρ_p(u, v) + ρ_p(v, w),

the triangle inequality. It now follows that {C[a, b]; ρ_p} is a metric space. It is easy to see that this space is an unbounded metric space. ∎

5.3.14. Example. Let C[a, b] be defined as in the preceding example. For x, y ∈ C[a, b], let

ρ_∞(x, y) = sup_{a≤t≤b} |x(t) − y(t)|.   (5.3.15)

To show that {C[a, b]; ρ_∞} is a metric space we first note that ρ_∞(x, y) = ρ_∞(y, x), that ρ_∞(x, y) ≥ 0 for all x, y, and that ρ_∞(x, y) = 0 if and only if x(t) = y(t) for all t ∈ [a, b]. To show that ρ_∞ satisfies the triangle inequality we note that

ρ_∞(x, y) = sup_{a≤t≤b} |x(t) − y(t)| = sup_{a≤t≤b} |x(t) − z(t) + z(t) − y(t)|
          ≤ sup_{a≤t≤b} {|x(t) − z(t)| + |z(t) − y(t)|}
          ≤ sup_{a≤t≤b} |x(t) − z(t)| + sup_{a≤t≤b} |z(t) − y(t)|
          = ρ_∞(x, z) + ρ_∞(z, y).

It thus follows that {C[a, b]; ρ_∞} is a metric space. ∎

[5.3.16. Figure B. Illustration of various metrics: the distances ρ(x, y) = |x − y| on R, ρ₂ and ρ_∞(x, y) = max {|ξ₁ − η₁|, |ξ₂ − η₂|} on R², and ρ_∞(x₁, x₂) = sup_{a≤t≤b} |x₁(t) − x₂(t)| on C[a, b].]

In Figure B, several metrics considered in Section 5.1 and in the present section are depicted pictorially.

5.3.17. Exercise. Show that the metric defined in Eq. (5.3.4) is equivalent to

ρ_∞(x, y) = lim_{p→∞} [Σ_{i=1}^{n} |ξᵢ − ηᵢ|^p]^{1/p}.

5.3.18. Exercise. Let X = R denote the set of real numbers, and define d(x, y) = (x − y)² for all x, y ∈ R. Show that the function d is not a metric. This illustrates the necessity for the exponent 1/p in Eq. (5.3.2).
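Both exercises can be explored numerically. The sketch below (an illustration with arbitrary points) shows how [Σ|ξᵢ − ηᵢ|^p]^{1/p} approaches the maximum difference as p grows, and exhibits a triple of real numbers for which d(x, y) = (x − y)² violates the triangle inequality:

import numpy as np

x = np.array([3.0, -1.0, 0.25])
y = np.array([0.5, 2.0, 0.0])

for p in (1, 2, 8, 32, 128):
    print(p, np.sum(np.abs(x - y)**p)**(1.0/p))   # tends to max|xi_i - eta_i| = 3

# d(x, y) = (x - y)**2 fails the triangle inequality on R:
d = lambda u, w: (u - w)**2
print(d(0, 2), d(0, 1) + d(1, 2))   # 4 > 2, so d is not a metric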

We conclude the present section by considering Cartesian products of metric spaces. Let {X; ρ_x} and {Y; ρ_y} be two metric spaces, and let Z = X × Y. Utilizing the metrics ρ_x and ρ_y we can define metrics on Z in an infinite variety of ways. Some of the more interesting cases are given in the following:

5.3.19. Theorem. Let {X; ρ_x} and {Y; ρ_y} be metric spaces, and let Z = X × Y. Let z₁ = (x₁, y₁) and z₂ = (x₂, y₂) be two points of Z = X × Y. Define the functions

ρ_p(z₁, z₂) = {[ρ_x(x₁, x₂)]^p + [ρ_y(y₁, y₂)]^p}^{1/p}, 1 ≤ p < ∞,

and

ρ_∞(z₁, z₂) = max {ρ_x(x₁, x₂), ρ_y(y₁, y₂)}.

Then {Z; ρ_p} and {Z; ρ_∞} are metric spaces.

The spaces {Z; ρ_p} and {Z; ρ_∞} are examples of product (metric) spaces.

5.3.20. Exercise. Prove Theorem 5.3.19.

We can extend the above concept to the product of n metric spaces. We have:

5.3.21. Theorem. Let {X₁; ρ₁}, ..., {Xₙ; ρₙ} be n metric spaces, and let X = X₁ × ... × Xₙ = Π_{i=1}^{n} Xᵢ. For x = (x₁, ..., xₙ) ∈ X, y = (y₁, ..., yₙ) ∈ X, define the functions

ρ'(x, y) = Σ_{i=1}^{n} ρᵢ(xᵢ, yᵢ)

and

ρ''(x, y) = (Σ_{i=1}^{n} [ρᵢ(xᵢ, yᵢ)]²)^{1/2}.

Then {X; ρ'} and {X; ρ''} are metric spaces.

5.3.22. Exercise. Prove Theorem 5.3.21.

5.4. OPEN AND CLOSED SETS

Having introduced the notion of metric, we are now in a position to consider several important fundamental concepts which we will need throughout the remainder of this book. In the present section {X; ρ} will denote an arbitrary metric space.

5.4.1. Definition. Let x₀ ∈ X and let r ∈ R, r > 0. An open sphere or open ball, denoted by S(x₀; r), is defined as the set

S(x₀; r) = {x ∈ X: ρ(x, x₀) < r}.

We call the fixed point x₀ the center and the number r the radius of S(x₀; r). For simplicity, we often call an open sphere simply a sphere.

The radius of a sphere is always positive and finite. In place of the terms ball or sphere we also use the term spherical neighborhood of x₀. In Figure C, spheres in several types of metric spaces considered in the previous sections are depicted. Note that in these figures the indicated spheres do not include boundaries.

5.4.3. Exercise. Describe the open sphere in Rⁿ as a function of r if the metric is the discrete metric of Example 5.1.7.

We can now categorize the points or elements of a metric space in several ways.

5.4.4. Definition. Let Y be a subset of X. A point x ∈ X is called a contact point or adherent point of set Y if every open sphere with center x contains at least one point of Y. The set of all adherent points of Y is called the closure of Y and is denoted by Ȳ.

We note that every point of Y is an adherent point of Y; however, there may be points not in Y which are also adherent points of Y.

[5.4.2. Figure C. Open spheres S(x₀; r) in various metric spaces: on the real line with ρ(x, y) = |x − y|; on R² with ρ₂, ρ₁, and ρ_∞; and on C[a, b] with ρ_∞(x, y) = sup_{a≤t≤b} |x(t) − y(t)|.]

5.4.5. Definition. Let Y be a subset of X, and let x ∈ X be an adherent point of Y. Then x is called an isolated point if there is a sphere with center x which contains no point of Y other than x itself. The point x is called a limit point or point of accumulation of set Y if every sphere with center at x contains an infinite number of points of Y. The set of all limit points of Y is called the derived set of Y and is denoted by Y'.

Our next result shows that adherent points are either limit points or isolated points.

5.4.6. Theorem. Let Y be a subset of X and let x ∈ X. If x is an adherent point of Y, then x is either a limit point or an isolated point.

Proof. We prove the theorem by assuming that x is an adherent point of Y but not an isolated point. We must then show that x is a limit point of Y. To do so, consider the family of spheres S(x; 1/n) for n = 1, 2, .... Let xₙ ∈ S(x; 1/n) be such that xₙ ∈ Y but xₙ ≠ x for each n. Now suppose there are only a finite number of distinct such points xₙ, say {x₁, ..., x_k}. If we let d = min_{1≤i≤k} ρ(x, xᵢ), then d > 0. But this contradicts the fact that there is an xₙ ∈ S(x; 1/n) for every n = 1, 2, 3, .... Hence, there are infinitely many xₙ, and thus x is a limit point of Y. ∎

We can now categorize adherent points of Y ⊂ X into the following three classes: (a) isolated points of Y, which always belong to Y; (b) points of accumulation which belong to Y; and (c) points of accumulation which do not belong to Y.

5.4.7. Example. Let X = R, let ρ be the usual metric, and let Y = {x ∈ R: 0 < x < 1, x = 2}, as depicted in Figure D. The element x = 2 is an isolated point of Y, the elements 0 and 1 are adherent points of Y which do not belong to Y, and each point of the set {x ∈ R: 0 < x < 1} is a limit point of Y belonging to Y. ∎

[5.4.8. Figure D. The set Y = {x ∈ R: 0 < x < 1, x = 2} of Example 5.4.7, consisting of the open interval (0, 1) and the isolated point 2.]

5.4.9. Example. Let {R; ρ} be the real line with the usual metric, and let Q be the set of rational numbers in R. For every x ∈ R, any open sphere S(x; r) contains a point in Q. Thus, every point in R is an adherent point of Q; i.e., R ⊂ Q̄. Since Q̄ ⊂ R, it follows that Q̄ = R. Clearly, there are no isolated points in Q. Also, for any x ∈ R, every sphere S(x; r) contains an infinite number of points in Q. Therefore, every point in R is a limit point of Q; i.e., R ⊂ Q'. This implies that Q' = R. ∎

Let us now consider the following basic results.

5.4.10. Theorem. Let Y and Z be subsets of X, and let Ȳ and Z̄ denote the closures of Y and Z, respectively. Let Y' be the derived set of Y. Then

(i) Y ⊂ Ȳ;
(ii) the closure of Ȳ equals Ȳ;
(iii) if Y ⊂ Z, then Ȳ ⊂ Z̄;
(iv) the closure of Y ∪ Z equals Ȳ ∪ Z̄;
(v) the closure of Y ∩ Z is contained in Ȳ ∩ Z̄; and
(vi) Ȳ = Y ∪ Y'.

Proof. To prove the first part, let x ∈ Y. Then x ∈ S(x; r) for every r > 0. Hence, x ∈ Ȳ. Therefore, Y ⊂ Ȳ.

To prove the second part, let x be an adherent point of Ȳ, and let r > 0. Then there is an x₁ ∈ Ȳ such that x₁ ∈ S(x; r), and hence ρ(x, x₁) = r₁ < r. Let r₀ = r − r₁ > 0. We now wish to show that S(x₁; r₀) ⊂ S(x; r). In doing so, let y ∈ S(x₁; r₀). Then ρ(y, x₁) < r₀. By the triangle inequality we have ρ(x, y) ≤ ρ(x, x₁) + ρ(x₁, y) < r₁ + (r − r₁) = r, and hence y ∈ S(x; r). Since x₁ ∈ Ȳ, the sphere S(x₁; r₀) contains a point x₂ ∈ Y. Thus, x₂ ∈ S(x; r). Since S(x; r) is an arbitrary spherical neighborhood of x, we have x ∈ Ȳ. This proves that the closure of Ȳ is contained in Ȳ. Also, in view of part (i), Ȳ is contained in its closure. Therefore, the closure of Ȳ equals Ȳ.

To prove the third part of the theorem, let r > 0 and let x ∈ Ȳ. Then there is a y ∈ Y such that y ∈ S(x; r). Since Y ⊂ Z, y ∈ Z, and thus x is an adherent point of Z.

To prove the fourth part, note that Y ⊂ Y ∪ Z and Z ⊂ Y ∪ Z. From part (iii) it now follows that Ȳ and Z̄ are both contained in the closure of Y ∪ Z; thus Ȳ ∪ Z̄ is contained in the closure of Y ∪ Z. To show the reverse inclusion, let x be an adherent point of Y ∪ Z, and suppose that x ∉ Ȳ ∪ Z̄. Then there exist spheres S(x; r₁) and S(x; r₂) such that S(x; r₁) ∩ Y = ∅ and S(x; r₂) ∩ Z = ∅. Let r = min {r₁, r₂}. Then S(x; r) ∩ [Y ∪ Z] = ∅. But this is impossible, since x is an adherent point of Y ∪ Z. Hence, x ∈ Ȳ ∪ Z̄, and thus the closure of Y ∪ Z is contained in Ȳ ∪ Z̄.

The proof of the remainder of the theorem is left as an exercise. ∎

5.4.11. Exercise. Prove parts (v) and (vi) of Theorem 5.4.10.

We can further classify points and subsets of metric spaces.

5.4.12. Definition. Let Y be a subset of X, and let Y⁻ denote the complement
of Y. A point x ∈ X is called an interior point of the set Y if there
exists a sphere S(x; r) such that S(x; r) ⊂ Y. The set of all interior points of
set Y is called the interior of Y and is denoted by Y°. A point x ∈ X is an
exterior point of Y if it is an interior point of the complement of Y. The
exterior of Y is the set of all exterior points of set Y. The set of all points
x ∈ X such that x ∈ Ȳ ∩ (Y⁻)‾ is called the frontier of set Y. The boundary
of a set Y is the set of all points in the frontier of Y which belong to Y.

5.4.13. Example. Let {R; p} be the real line with the usual metric, and
let Y = {y ∈ R: 0 < y ≤ 1} = (0, 1]. The interior of Y is the set (0, 1) =
{y ∈ R: 0 < y < 1}. The exterior of Y is the set (−∞, 0) ∪ (1, +∞), Ȳ =
{y ∈ R: 0 ≤ y ≤ 1} = [0, 1], and (Y⁻)‾ = (−∞, 0] ∪ [1, +∞). Thus, the
frontier of Y is the set {0, 1}, and the boundary of Y is the singleton {1}. ■
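To make the classification in Example 5.4.13 concrete, the following Python
sketch tests numerically whether a point of R is interior, exterior, or on the
frontier of Y = (0, 1]. The helper names, the probe radii, and the sample
points are our own choices and are not part of the original text.

def in_Y(x):
    return 0 < x <= 1                      # membership in Y = (0, 1]

def is_interior(x, contains, radii=tuple(10.0 ** -k for k in range(1, 12))):
    # x is interior to a set iff some sphere S(x; r) lies inside it; since the
    # sets probed here are unions of intervals and the radii are small, it
    # suffices to test the two endpoints x - r and x + r.
    return any(contains(x - r) and contains(x + r) for r in radii)

for x in (0.0, 0.5, 1.0, 2.0):
    if is_interior(x, in_Y):
        label = "interior"
    elif is_interior(x, lambda t: not in_Y(t)):
        label = "exterior"
    else:
        label = "frontier"
    print(x, label)

The output classifies 0.5 as interior, 2.0 as exterior, and 0.0, 1.0 as frontier
points, in agreement with the example; the boundary, i.e., the set of frontier
points belonging to Y, is {1}.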

We now introduce the following important concepts.

5.4.14. Definition. A subset Y of X is said to be an open subset of X if
every point of Y is an interior point of Y; i.e., Y = Y°. A subset Z of X is
said to be a closed subset of X if Z̄ = Z.

When there is no room for confusion, we usually call Y an open set and
Z a closed set. On occasions when we want to be very explicit, we will say
that Y is open relative to {X; p} or with respect to {X; p}.
In our next result we establish some of the important properties of open
sets.

5.4.15. Theorem.
(i) X and ∅ are open sets.
(ii) If {Y_α}_{α∈A} is an arbitrary family of open subsets of X, then
∪_{α∈A} Y_α is an open set.
(iii) The intersection of a finite number of open sets of X is open.

Proof. To prove the first part, note that for every x ∈ X, any sphere
S(x; r) ⊂ X. Hence, every point in X is an interior point. Thus, X is open.
Also, observe that ∅ has no points, and therefore every point of ∅ is an
interior point of ∅. Hence, ∅ is an open subset of X.
To prove the second part, let {Y_α}_{α∈A} be a family of open sets in X, and
let Y = ∪_{α∈A} Y_α. If Y_α is empty for every α ∈ A, then Y = ∅ is an open
subset of X. Now suppose that Y ≠ ∅ and let x ∈ Y. Then x ∈ Y_α for some
α ∈ A. Since Y_α is an open set, there is a sphere S(x; r) such that S(x; r)
⊂ Y_α. Hence, S(x; r) ⊂ Y, and thus x is an interior point of Y. Therefore,
Y is an open set.
To prove the third part, let Y₁ and Y₂ be open subsets of X. If Y₁ ∩ Y₂
= ∅, then Y₁ ∩ Y₂ is open. So let us assume that Y₁ ∩ Y₂ ≠ ∅, and let
x ∈ Y = Y₁ ∩ Y₂. Since x ∈ Y₁, there is an r₁ > 0 such that x ∈ S(x; r₁)
⊂ Y₁. Similarly, there is an r₂ > 0 such that x ∈ S(x; r₂) ⊂ Y₂. Let
r = min {r₁, r₂}. Then x ∈ S(x; r), where S(x; r) ⊂ S(x; r₁) and S(x; r)
⊂ S(x; r₂). Thus, S(x; r) ⊂ Y₁ ∩ Y₂, and x is an interior point of Y₁ ∩ Y₂.
Hence, Y₁ ∩ Y₂ is an open subset of X. By induction, we can show that the
intersection of any finite number of open subsets of X is open. ■

We now make the following

5.4.16. Definition. Let {X; p} be a metric space. The topology of X
determined by p is defined to be the family of all open subsets of X.

In our next result we establish a connection between open and closed
subsets of X.

5.4.17. Theorem.
(i) X and ∅ are closed sets.
(ii) If Y is an open subset of X, then Y⁻ is closed.
(iii) If Z is a closed subset of X, then Z⁻ is open.

Proof. The first part of this theorem follows immediately from the definitions
of X, ∅, and closed set.
To prove the second part, let Y be any open subset of X. We may assume
that Y ≠ ∅ and Y ≠ X. Let x be any adherent point of Y⁻. Then x cannot
belong to Y, for if it did, then there would exist a sphere S(x; r) ⊂ Y, which
is impossible. Therefore, every adherent point of Y⁻ belongs to Y⁻, and thus
Y⁻ is closed if Y is open.
To prove the third part, let Z be any closed subset of X. Again, we may
assume that Z ≠ ∅ and Z ≠ X. Let x ∈ Z⁻. Then there exists a sphere
S(x; r) which contains no point of Z. This is so because if every such sphere
contained a point of Z, then x would be an adherent point of Z and
consequently would belong to Z, since Z is closed. Thus, there is a sphere
S(x; r) ⊂ Z⁻; i.e., x is an interior point of Z⁻. Since this holds for arbitrary
x ∈ Z⁻, Z⁻ is an open set. ■

In the next result we present additional important properties of open
sets.

5.4.18. Theorem.
(i) Every open sphere in X is an open set.
(ii) If Y is an open subset of X, then there is a family of open spheres,
{S_α}_{α∈A}, such that Y = ∪_{α∈A} S_α.
(iii) The interior of any subset Y of X is the largest open set contained
in Y.

Proof. To prove the first part, let S(x; r) be any open sphere in X. Let
x₁ ∈ S(x; r), and let p(x, x₁) = r₁. If we let r₀ = r − r₁, then according to
the proof of part (ii) of Theorem 5.4.10 we have S(x₁; r₀) ⊂ S(x; r). Hence,
x₁ is an interior point of S(x; r). Since this is true for any x₁ ∈ S(x; r), it
follows that S(x; r) is an open subset of X.
To prove the second part of the theorem, we first note that if Y = ∅,
then Y is open and is the union of an empty family of spheres. So assume
that Y ≠ ∅ and that Y is open. Then each point x ∈ Y is the center of a
sphere S(x; r) ⊂ Y, and moreover Y is the union of the family of all such
spheres.
The proof of the last part of the theorem is left as an exercise. ■

5.4.19. Exercise. Prove part (iii) of Theorem 5.4.18.

Let {Y; p} be a subspace of a metric space {X; p}, and suppose that V
is a subset of Y. It can happen that V may be an open subset of Y and at
the same time not be an open subset of X. Thus, when a set is described as
open, it is important to know in what space it is open. We have:

5.4.20. Theorem. Let {Y; p} be a metric subspace of {X; p}.
(i) A subset V ⊂ Y is open relative to {Y; p} if and only if there is a
subset U ⊂ X such that U is open relative to {X; p} and V = Y ∩ U.
(ii) A subset G ⊂ Y is closed relative to {Y; p} if and only if there is a
subset F of X such that F is closed relative to {X; p} and G = F ∩ Y.

Proof. Let S(x₀; r) = {x ∈ X: p(x, x₀) < r} and S′(x₀; r) = {x ∈ Y:
p(x, x₀) < r}. Then S′(x₀; r) = Y ∩ S(x₀; r).
To prove the necessity of part (i), let V be an open set relative to {Y; p},
and let x ∈ V. Then there is a sphere S′(x; r) ⊂ V (r may depend on x). Now

V = ∪_{x∈V} S′(x; r) = Y ∩ [∪_{x∈V} S(x; r)].

By part (ii) of Theorem 5.4.15, ∪_{x∈V} S(x; r) = U is an open set in {X; p},
and V = Y ∩ U.
To prove the sufficiency of part (i), let V = Y ∩ U, where U is an open
subset of X. Let x ∈ V. Then x ∈ U, and hence there is a sphere S(x; r) ⊂ U.
Thus, S′(x; r) = Y ∩ S(x; r) ⊂ Y ∩ U = V. This proves that x is an interior
point of V and that V is an open subset of Y.
The proof of part (ii) of the theorem is left as an exercise. ■

5.4.21. Exercise. Prove part (ii) of Theorem 5.4.20.

The first part of the preceding theorem may be stated in another equivalent
way. Let 𝔗 and 𝔗′ be the topologies of {X; p} and {Y; p}, respectively,
generated by p. Then 𝔗′ = {Y ∩ U: U ∈ 𝔗}.
Let us now consider some specific examples.

5.4.22. Example. Let X = R, and let p be the usual metric on R; i.e.,
p(x, y) = |x − y|. Any set Y = (a, b) = {x: a < x < b}, where a < b, is an
open subset of X. We call (a, b) an open interval on R. ■

5.4.23. Example. We now show that the word "finite" is crucial in part
(iii) of Theorem 5.4.15. Let {R; p} denote again the real line with the usual
metric, and let a < b. If Yₙ = {x ∈ R: a < x < b + 1/n}, then for each
positive integer n, Yₙ is an open subset of the real line. However, the set

∩_{n=1}^{∞} Yₙ = {x ∈ R: a < x ≤ b} = (a, b]

is not an open subset of R. (This can readily be verified, since every sphere
S(b; r) contains a point greater than b and hence is not contained in
∩_{n=1}^{∞} Yₙ.) ■
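A small numerical sketch of Example 5.4.23, with the particular values a = 0
and b = 1 and probe points of our own choosing, shows how a point slightly
larger than b is excluded from the intersection once n is large enough:

def in_Yn(x, n):
    return 0 < x < 1 + 1.0 / n             # Y_n = (0, 1 + 1/n), an open set

def in_intersection(x, probes=(1, 10, 100, 1000, 10**6)):
    # finite probe of the infinite intersection; x belongs to it iff 0 < x <= 1
    return all(in_Yn(x, n) for n in probes)

for x in (0.5, 1.0, 1.0005, 1.01):
    print(x, in_intersection(x))
# 0.5 True, 1.0 True, 1.0005 False, 1.01 False: every sphere S(1; r) contains
# points x > 1 that are excluded, so the intersection (0, 1] is not open.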

In the above example, let Y = (a, b]. We saw that Y is not an open subset
of R; i.e., b is not an interior point of Y. However, if we were to consider
{Y; p} as a metric space by itself, then Y is an open set.

5.4.24. Example. Let {𝒞[a, b]; p∞} denote the metric space of Example
5.3.14. Let λ be an arbitrary finite positive number. Then the set of continuous
functions satisfying the condition |x(t)| < λ for all a ≤ t ≤ b is an open
subset of the metric space {𝒞[a, b]; p∞}. ■

Theorems 5.4.15 and 5.4.17 tell us that the sets X and ∅ are both open
and closed in any metric space. In some metric spaces there may be proper
subsets of X which are both open and closed, as illustrated in the following
example.

5.4.25. Example. Let X be the set of real numbers given by X = (−2, −1)
∪ (+1, +2), and let p(x, y) = |x − y| for x, y ∈ X. Then {X; p} is
clearly a metric space. Let Y = (−2, −1) ⊂ X and Z = (+1, +2) ⊂ X.
Note that both Y and Z are open subsets of X. However, Y⁻ = Z, Z⁻ = Y,
and thus Y and Z are also closed subsets of X. Therefore, Y and Z are proper
subsets of the metric space {X; p} which are both open and closed. (Note that
in the preceding we are not viewing X as a subset of R. As such, X would be
open. Considering {X; p} as our metric space, X is both open and closed.) ■

5.4.26. Exercise. Let {X; p} be a metric space with p the discrete metric
defined in Example 5.1.7. Show that every subset of X is both open and
closed.

In our next result we summarize several important properties of closed
sets.

5.4.27. Theorem.
(i) Every subset of X consisting of a finite number of elements is closed.
(ii) Let x₀ ∈ X, let r > 0, and let K(x₀; r) = {x ∈ X: p(x, x₀) ≤ r}.
Then K(x₀; r) is closed.
(iii) A subset Y ⊂ X is closed if and only if Ȳ ⊂ Y.
(iv) A subset Y ⊂ X is closed if and only if Y′ ⊂ Y.
(v) Let {Y_α}_{α∈A} be any family of closed sets in X. Then ∩_{α∈A} Y_α
is closed.
(vi) The union of a finite number of closed sets in X is closed.
(vii) The closure of a subset Y of X is the intersection of all closed sets
containing Y.

Proof. Only the proof of part (v) is given. Let {Y_α}_{α∈A} be any family of
closed subsets of X. Then {Y_α⁻}_{α∈A} is a family of open sets. Now
(∩_{α∈A} Y_α)⁻ = ∪_{α∈A} Y_α⁻ is an open set, and hence ∩_{α∈A} Y_α is a
closed subset of X. ■

5.4.28. Exercise. Prove parts (i) to (iv), (vi), and (vii) of Theorem 5.4.27.

We now consider several specific examples of closed sets.

5.4.29. Example. Let X = R, and let p be the usual metric, p(x, y)
= |x − y|. Any set Y = {x ∈ R: a ≤ x ≤ b}, where a < b, is a closed subset
of R. We call Y a closed interval on R and denote it by [a, b]. ■

5.4.30. Example. We now show that the word "finite" is essential in part
(vi) of Theorem 5.4.27. Let {R; p} denote the real line with the usual metric,
and let a > 0. If Yₙ = {x ∈ R: 1/n ≤ x ≤ a} for each positive integer n,
then Yₙ is a closed subset of the real line. However, the set

∪_{n=1}^{∞} Yₙ = {x ∈ R: 0 < x ≤ a} = (0, a]

is not a closed subset of the real line, as can readily be verified, since 0 is an
adherent point of (0, a]. ■

5.4.31. Exercise. The set K(x₀; r) defined in part (ii) of Theorem 5.4.27
is sometimes called a closed sphere. It need not coincide with S̄(x₀; r), i.e.,
the closure of the open sphere S(x₀; r).
(i) Show that S̄(x₀; r) ⊂ K(x₀; r).
(ii) Let {X; p} be the discrete metric space defined in Example 5.1.7.
Describe the sets S(x; 1), S̄(x; 1), and K(x; 1) for any x ∈ X, and conclude
that, in general, S̄(x; 1) ≠ K(x; 1) if X contains more than one point.
(iii) Let X = (−∞, 0) ∪ J, where J denotes the set of positive integers,
and let p(x, y) = |x − y|. Describe S(0; 1), S̄(0; 1), and K(0; 1),
and conclude that S̄(0; 1) ≠ K(0; 1).
We are now in a position to introduce certain additional concepts which
are important in analysis and applications.

5.4.32. Definition. Let Y and Z be subsets of X. The set Y is said to be
dense in Z (or dense with respect to Z) if Ȳ ⊃ Z. The set Y is said to be
everywhere dense in {X; p} (or simply, everywhere dense in X) if Ȳ = X.
If the exterior of Y is everywhere dense in X, then Y is said to be nowhere
dense in X. A subset Y of X is said to be dense-in-itself if every point of Y
is a limit point of Y. A subset Y of X which is both closed and dense-in-itself
is called a perfect set.

5.4.33. Definition. A metric space {X; p} is said to be separable if there
is a countable subset Y in X which is everywhere dense in X.

The following result enables us to characterize separable metric spaces
in an equivalent way. We have:

5.4.34. Theorem. A metric space {X; p} is separable if and only if there
is a countable set S = {x₁, x₂, ...} ⊂ X such that for every x ∈ X and every
given ε > 0 there is an xₙ ∈ S such that p(x, xₙ) < ε.

5.4.35. Exercise. Prove Theorem 5.4.34.

Let us now consider some specific cases.

5.4.36. Example. The real line with the usual metric is a separable space.
As we saw in Example 5.4.9, if Q is the set of rational numbers, then Q̄ = R. ■
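The characterization in Theorem 5.4.34 can be illustrated computationally.
The sketch below, whose function name and sample values are our own,
produces for a given x ∈ R and ε > 0 a rational number within ε of x:

from fractions import Fraction
import math

def rational_in_sphere(x, eps):
    # pick a denominator n with 1/n < eps; then |x - floor(n*x)/n| <= 1/n < eps
    n = int(1 / eps) + 1
    return Fraction(math.floor(n * x), n)

x, eps = math.pi, 1e-6
q = rational_in_sphere(x, eps)
print(q, abs(x - float(q)) < eps)          # a point of S = Q inside S(x; eps)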

5.4.37. Example. Let {Rⁿ; pₚ} be the metric space defined in Example
5.3.1 (recall that 1 ≤ p < ∞). The set of vectors x = (ξ₁, ..., ξₙ) with
rational coordinates (i.e., ξᵢ is a rational real number, i = 1, ..., n) is a
denumerable everywhere dense set in Rⁿ and, therefore, {Rⁿ; pₚ} is a
separable metric space. ■

5.4.38. Example. Let {lₚ; pₚ} be the metric space defined in Example 5.3.5
(recall that 1 ≤ p < ∞). We can show that this space is separable in the
following manner. Let

Y = {y ∈ lₚ: y = (η₁, ..., ηₙ, 0, 0, ...) for some n,
where ηᵢ is a rational real number, i = 1, ..., n}.

Then Y is a countable subset of lₚ. To show that it is everywhere dense, let
ε > 0 and let x ∈ lₚ, where x = (ξ₁, ξ₂, ...). Choose n sufficiently large so
that

Σ_{k=n+1}^{∞} |ξ_k|^p < ε^p/2.

We can now find a yₙ ∈ Y such that

Σ_{k=1}^{n} |ξ_k − η_k|^p < ε^p/2.

Hence,

[pₚ(x, yₙ)]^p = Σ_{k=1}^{n} |ξ_k − η_k|^p + Σ_{k=n+1}^{∞} |ξ_k|^p < ε^p;

i.e., pₚ(x, yₙ) < ε. By Theorem 5.4.34, {lₚ; pₚ} is separable. ■
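The density argument of Example 5.4.38 can be mimicked numerically for
p = 2. In the following sketch the particular sequence x, the truncation
index n, and the denominator bound are our own choices:

from fractions import Fraction

def l2_dist_sq(x, y):
    # squared l2 distance of two finitely supported sequences
    m = max(len(x), len(y))
    xs = list(x) + [0.0] * (m - len(x))
    ys = list(y) + [0.0] * (m - len(y))
    return sum((a - float(b)) ** 2 for a, b in zip(xs, ys))

x = [2.0 ** -k for k in range(60)]         # (1, 1/2, 1/4, ...) lies in l2
eps, n = 1e-3, 25
tail = sum(v * v for v in x[n:])           # tail sum of squares
y = [Fraction(v).limit_denominator(10 ** 9) for v in x[:n]]   # rationalize
print(tail < eps ** 2 / 2, l2_dist_sq(x, y) < eps ** 2)       # True True

The truncated, rationalized y is a member of the countable set Y of the
example, and it lies within ε of x in the p₂ metric.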

In order to establish the separability of the space of continuous functions,
it is necessary to use the Weierstrass approximation theorem, which we state
without proof.

5.4.39. Theorem. Let 𝒞[a, b] be the space of real continuous functions
on the interval [a, b], and let 𝒫(t) be the family of all polynomials (defined
on [a, b]). Let ε > 0, and let x ∈ 𝒞[a, b]. Then there is a p ∈ 𝒫(t) such that

sup_{a≤t≤b} |x(t) − p(t)| < ε.

5.4.40. Exercise. Using the Weierstrass approximation theorem, show
that the metric spaces {𝒞[a, b]; pₚ}, defined in Example 5.3.12, and
{𝒞[a, b]; p∞}, defined in Example 5.3.14, are separable.

5.4.41. Exercise. Show that the metric space {X; p}, where p is the discrete
metric defined in Example 5.1.7, is separable if and only if X is a countable
set.

We conclude the present section by considering an example of a metric
space which is not separable.

5.4.42. Example. Let {l∞; p∞} be the metric space defined in Example
5.3.8. Let Y ⊂ R^∞ denote the set

Y = {y ∈ R^∞: y = (η₁, η₂, ...), where ηᵢ = 0 or 1}.

Clearly then Y ⊂ l∞. Now for every real number α ∈ [0, 1] there is a y ∈ Y,
y = (η₁, η₂, ...), such that α = Σ_{k=1}^{∞} η_k 2^{−k}. Thus, Y is an uncountable
set. Notice now that for every y₁, y₂ ∈ Y, p∞(y₁, y₂) = 0 or 1. That is, p∞
restricted to Y is the discrete metric. It follows from Exercise 5.4.41 that Y
cannot be separable and, consequently, {l∞; p∞} is not separable. ■

5.5. COMPLETE METRIC SPACES

The set of real numbers R with the usual metric p defined on it has many
remarkable properties, several of which are attributable to the so-called
"completeness property" of this space. For this reason we speak of {R; p}
as being a complete metric space. In the present section we consider general
complete metric spaces.
Throughout this section {X; p} is our underlying metric space, and J denotes
the set of positive integers. Before considering the completeness of metric
spaces we need to consider a few facts about sequences on metric spaces (cf.
Definition 1.1.25).

5.5.1. Definition. A sequence {xₙ} in a set Y ⊂ X is a function f: J → Y.
Thus, if {xₙ} is a sequence in Y, then f(n) = xₙ for each n ∈ J.

5.5.2. Definition. Let {xₙ} be a sequence of points in X, and let x be a
point of X. The sequence {xₙ} is said to converge to x if for every ε > 0
there is an integer N such that for all n ≥ N, p(x, xₙ) < ε (i.e., xₙ ∈ S(x; ε)
for all n ≥ N). In general, N depends on ε; i.e., N = N(ε). We call x the
limit of {xₙ}, and we usually write

lim_{n→∞} xₙ = x,

or xₙ → x as n → ∞. If there is no x ∈ X to which the sequence converges,
then we say that {xₙ} diverges.

Thus, xₙ → x if and only if the sequence of real numbers {p(xₙ, x)}
converges to zero. In view of the above definition we note that for every
ε > 0 there is a finite number N such that all terms of {xₙ} except the first
(N − 1) terms must lie in the sphere with center x and radius ε. Hence, the
convergence of a sequence depends on the infinite tail of terms (x_{N+1},
x_{N+2}, ...), and no amount of alteration of a finite number of terms of a
divergent sequence can make it converge. Moreover, if a convergent sequence
is changed by omitting or adding a finite number of terms, then the resulting
sequence is still convergent to the same limit as the original sequence.
Note that in Definition 5.5.2 we called x the limit of the sequence {xₙ}.
We will show that if {xₙ} has a limit in X, then that limit is unique.
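As a concrete illustration of Definition 5.5.2 (the sequence and the tolerance
below are our own choices), for xₙ = a + 1/n in {R; p} one may take
N(ε) = ⌊1/ε⌋ + 1, since p(xₙ, a) = 1/n:

import math

def N_of_eps(eps):
    return math.floor(1 / eps) + 1         # any n >= N gives 1/n < eps

a, eps = 2.0, 1e-4
N = N_of_eps(eps)
assert all(abs((a + 1.0 / n) - a) < eps for n in range(N, N + 1000))
print(N)                                   # from this index on, x_n in S(a; eps)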

5.5.3. Definition. Let {xₙ} be a sequence of points in X, where f(n) = xₙ
for each n ∈ J. If the range of f is bounded, then {xₙ} is said to be a bounded
sequence.

The range of f in the above definition may consist of a finite number of
points or of an infinite number of points. Specifically, if the range of f
consists of one point, then we speak of a constant sequence. Clearly, all
constant sequences are convergent.

5.5.4. Example. Let {R; p} denote the set of real numbers with the usual
metric. If n ∈ J, then the sequence {n²} diverges and is unbounded, and the
range of this sequence is an infinite set. The sequence {(−1)ⁿ} diverges, is
bounded, and its range is a finite set. The sequence {a + (−1)ⁿ/n} converges
to a, is bounded, and its range is an infinite set. ■

5.5.5. Definition. Let {xₙ} be a sequence in X. Let n₁, n₂, ..., n_k, ... be
a sequence of positive integers which is strictly increasing; i.e., n_j > n_k for
all j > k. Then the sequence {x_{n_k}} is called a subsequence of {xₙ}. If the
subsequence {x_{n_k}} converges, then its limit is called a subsequential limit
of {xₙ}.

It turns out that many of the important properties of convergence on R
can be extended to the setting of arbitrary metric spaces. In the next result
several of these properties are summarized.

5.5.6. Theorem. Let {xₙ} be a sequence in X. Then
(i) there is at most one point x ∈ X such that lim_n xₙ = x;
(ii) if {xₙ} is convergent, then it is bounded;
(iii) {xₙ} converges to a point x ∈ X if and only if every sphere about x
contains all but a finite number of terms in {xₙ};
(iv) {xₙ} converges to a point x ∈ X if and only if every subsequence
of {xₙ} converges to x;
(v) if {xₙ} converges to x ∈ X and if y ∈ X, then lim_n p(xₙ, y) = p(x, y);
(vi) if {xₙ} converges to x ∈ X and if the sequence {yₙ} of X converges
to y ∈ X, then lim_n p(xₙ, yₙ) = p(x, y); and
(vii) if {xₙ} converges to x ∈ X, and if there is a y ∈ X and a γ > 0 such
that p(xₙ, y) ≤ γ for all n ∈ J, then p(x, y) ≤ γ.

Proof. To prove part (i), assume that x, y ∈ X and that lim_n xₙ = x and
lim_n xₙ = y. Then for every ε > 0 there are positive integers N_x and N_y
such that p(xₙ, x) < ε/2 whenever n > N_x and p(xₙ, y) < ε/2 whenever
n > N_y. If we let N = max (N_x, N_y), then it follows that

p(x, y) ≤ p(x, xₙ) + p(xₙ, y) < ε whenever n > N.

Now ε is any positive number. Since the only non-negative number which
is less than every positive number is zero, it follows that p(x, y) = 0 and
therefore x = y.
To prove part (iii), assume that lim_n xₙ = x and let S(x; ε) be any sphere
about x. Then there is a positive integer N such that the only terms of the
sequence {xₙ} which are possibly not in S(x; ε) are the terms x₁, x₂, ..., x_{N−1}.
Conversely, assume that every sphere about x contains all but a finite number
of terms from the sequence {xₙ}. With ε > 0 specified, let M = max {n ∈ J:
xₙ ∉ S(x; ε)}. If we set N = M + 1, then xₙ ∈ S(x; ε) for all n ≥ N, which
was to be shown.
To prove part (v), we note from Theorem 5.1.13 that

|p(y, x) − p(y, xₙ)| ≤ p(x, xₙ).

By hypothesis, lim_n xₙ = x. Therefore, lim_n p(x, xₙ) = 0, and so
lim_n |p(y, x) − p(y, xₙ)| = 0; i.e., lim_n p(y, xₙ) = p(y, x).
Finally, to prove part (vii), suppose to the contrary that p(x, y) > γ.
Then δ = p(x, y) − γ > 0. Now γ − p(xₙ, y) ≥ 0 for all n ∈ J, and thus

0 < δ ≤ p(x, y) − p(xₙ, y) ≤ p(x, xₙ)

for all n ∈ J. But this is impossible, since lim_n xₙ = x. Thus, p(x, y) ≤ γ.
We leave the proofs of the remaining parts as an exercise. ■

5.5.7. Exercise. Prove parts (ii), (iv), and (vi) of Theorem 5.5.6.

In Definition 5.4.5, we introduced the concept of limit point of a set
Y ⊂ X. In Definition 5.5.2, we defined the limit of a sequence of points,
{xₙ}, in X. These two concepts are closely related; however, the reader should
carefully note the distinction between the two. The limit point of a set is
strictly a property of the set itself. On the other hand, a sequence is not a set.
Furthermore, the elements of a sequence are ordered and not necessarily
distinct, while the elements of a set are not ordered but are distinct. However,
the range of a sequence is a subset of X. We now give a result relating these
concepts.

5.5.8. Theorem. Let Y be a subset of X. Then
(i) x ∈ X is an adherent point of Y if and only if there is a sequence
{yₙ} in Y (i.e., yₙ ∈ Y for all n) such that lim_n yₙ = x;
(ii) x ∈ X is a limit point of the set Y if and only if there is a sequence
{yₙ} of distinct points in Y such that lim_n yₙ = x; and
(iii) Y is closed if and only if for every convergent sequence {yₙ} such
that yₙ ∈ Y for all n, lim_n yₙ = x ∈ Y.

Proof. To prove part (i), assume that lim_n yₙ = x. Then every sphere about
x contains at least one term of the sequence {yₙ} and, since every term of
{yₙ} is a point of Y, it follows that x is an adherent point of Y. Conversely,
assume that x is an adherent point of Y. Then every sphere about x contains
at least one point of Y. Now let us choose for each positive integer n a point
yₙ ∈ Y such that yₙ ∈ S(x; 1/n). Then it follows readily that the sequence
{yₙ} chosen in this fashion converges to x. Specifically, if ε > 0 is given,
then we choose a positive integer N such that 1/N < ε. Then for every n > N
we have yₙ ∈ S(x; 1/n) ⊂ S(x; ε). This concludes the proof of part (i).
To prove part (ii), assume that x is a limit point of the set Y. Then every
sphere S(x; 1/n) contains an infinite number of points of Y, and so we can
choose a yₙ ∈ S(x; 1/n) such that yₙ ≠ y_m for all m < n. The sequence {yₙ}
consists of distinct points and converges to x. Conversely, if {yₙ} is a sequence
of distinct points convergent to x and if S(x; ε) is any sphere with center at x,
then by definition of convergence there is an N such that for all n > N,
yₙ ∈ S(x; ε). That is, there are infinitely many points of Y in S(x; ε).
To prove part (iii), assume that Y is closed and let {yₙ} be a convergent
sequence with yₙ ∈ Y for all n and lim_n yₙ = x. We want to show that x ∈ Y.
By part (i), x must be an adherent point of Y. Since Y is closed, x ∈ Y.
Next, we prove the converse. Let x be an adherent point of Y. Then by part
(i), there is a sequence {yₙ} in Y such that lim_n yₙ = x. By hypothesis, we must
have x ∈ Y. Since Y contains all of its adherent points, it must be closed. ■

Statement (iii) of Theorem 5.5.8 is often used as an alternate way of
defining a closed set.
The next theorem provides us with conditions under which a sequence is
convergent in a product metric space.

5.5.9. Theorem. Let {X; p_x} and {Y; p_y} be two metric spaces, let Z = X
× Y, let p be any of the metrics defined on Z in Theorem 5.3.19, and let
{Z; p} denote the product metric space of {X; p_x} and {Y; p_y}. If z ∈ Z
= X × Y, then z = (x, y), where x ∈ X and y ∈ Y. Let {xₙ} be a sequence
in X, and let {yₙ} be a sequence in Y. Then,
(i) the sequence {(xₙ, yₙ)} converges in Z if and only if {xₙ} converges in
X and {yₙ} converges in Y; and
(ii) lim_n (xₙ, yₙ) = (lim_n xₙ, lim_n yₙ) whenever this limit exists.
5.5.10. Exercise. Prove Theorem 5.5.9.

In many situations the limit to which a given sequence may converge is


unknown. The following concept enables us to consider the convergence
of a sequence without knowing the limit to which the sequence may converge.

5.5.11. Definition. A sequence {xₙ} of points in a metric space {X; p} is
said to be a Cauchy sequence or a fundamental sequence if for every ε > 0
there is an integer N such that p(xₙ, x_m) < ε whenever m, n ≥ N.

The next result follows directly from the triangle inequality.

5.5.12. Theorem. Every convergent sequence in a metric space {X; p} is
a Cauchy sequence.

Proof. Assume that lim_n xₙ = x. Then for arbitrary ε > 0 we can find an
integer N such that p(xₙ, x) < ε/2 and p(x_m, x) < ε/2 whenever m, n > N.
In view of the triangle inequality we now have

p(xₙ, x_m) ≤ p(xₙ, x) + p(x_m, x) < ε

whenever m, n > N. This proves the theorem. ■

We emphasize that in an arbitrary metric space {X; p} a Cauchy sequence
is not necessarily convergent.

5.5.13. Theorem. Let {xₙ} be a Cauchy sequence. Then {xₙ} is a bounded
sequence.

Proof. We need to show that there is a constant γ such that 0 < γ < ∞ and
such that p(x_m, xₙ) ≤ γ for all m, n ∈ J.
Letting ε = 1, we can find N such that p(x_m, xₙ) < 1 whenever m, n ≥ N.
Now let λ = max {p(x₁, x₂), p(x₁, x₃), ..., p(x₁, x_N)}. Then, by the triangle
inequality,

p(x₁, xₙ) ≤ p(x₁, x_N) + p(x_N, xₙ) < λ + 1

if n > N. Thus, for all n ∈ J, p(x₁, xₙ) ≤ λ + 1. Again, by the triangle
inequality,

p(x_m, xₙ) ≤ p(x_m, x₁) + p(x₁, xₙ)

for all m, n ∈ J. Thus, p(x_m, xₙ) ≤ 2(λ + 1), and {xₙ} is a bounded sequence. ■

We also have:

5.5.14. Theorem. If a Cauchy sequence {xₙ} contains a convergent
subsequence {x_{n_k}}, then the sequence {xₙ} is convergent.

5.5.15. Exercise. Prove Theorem 5.5.14.

We now give the definition of complete metric space.

5.5.16. Definition. If every Cauchy sequence in a metric space {X; p}
converges to an element in X, then {X; p} is said to be a complete metric space.

Complete metric spaces are of utmost importance in analysis and applications.
We will have occasion to make extensive use of the properties of such
spaces in the remainder of this book.

5.5.17. Example. Let X = (0, 1), and let p(x, y) = |x − y| for all x,
y ∈ X. Let xₙ = 1/n for n ∈ J. Then the sequence {xₙ} is Cauchy (i.e., it
is a Cauchy sequence), since |xₙ − x_m| < 1/N for all n > m ≥ N. Since
there is no x ∈ X to which {xₙ} converges, the metric space {X; p} is not
complete. ■

5.5.18. Example. Let X = Q, the set of rational numbers, and let p(x, y)
= |x − y|. Let

xₙ = 1 + 1/2! + ... + 1/n!

for n ∈ J. The sequence {xₙ} is Cauchy. Since there is no limit in Q to which
{xₙ} converges, the metric space {Q; p} is not complete. ■
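A numerical sketch of Example 5.5.18, using exact rational arithmetic (the
index values are our own choices), shows the partial sums clustering; in R the
sequence converges to e − 1, which is irrational, so it has no limit in Q:

from fractions import Fraction
import math

def x(n):
    # exact rational partial sum x_n = 1 + 1/2! + ... + 1/n!
    s, fact = Fraction(1), 1
    for k in range(2, n + 1):
        fact *= k
        s += Fraction(1, fact)
    return s

print(abs(float(x(20)) - (math.e - 1)) < 1e-15)   # True: x_n approaches e - 1
print(float(abs(x(40) - x(30))))                  # ~ 1e-34: a Cauchy tail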

5.5.19. Example. Let R# = R − {0}, and let p(x, y) = |x − y| for all
x, y ∈ R#. Let xₙ = 1/n, n ∈ J. The sequence {xₙ} is Cauchy; however, it
does not converge to a limit in R#. Thus, {R#; p} is not complete. Some
further comments are in order here. If we view R# as a subset of R in the
metric space {R; p} (p denotes the usual metric on R), then the sequence {xₙ}
converges to zero; i.e., lim_n xₙ = 0. By Theorem 5.5.8, R# cannot be a closed
subset of R. However, R# is a closed subset of the metric space {R#; p},
since it is the whole space. There is no contradiction here to Theorem 5.5.8,
for the sequence {xₙ} does not converge to a limit in R#. Specifically, Theorem
5.5.8 states that if a sequence does converge to a limit, then the limit must
belong to the space. The requirement for completeness is that every Cauchy
sequence must converge to an element in the space. ■

We now consider several specific examples of important complete metric
spaces.

5.5.20. Example. Let p denote the usual metric on R, the set of real
numbers. The completeness of {R; p} is one of the fundamental results of
analysis. ■

5.5.21. Example. Let {X; p_x} and {Y; p_y} be arbitrary complete metric
spaces. If Z = X × Y and if z ∈ Z, then z = (x, y), where x ∈ X and y ∈ Y
(see Theorem 5.3.19). Define

p₂(z₁, z₂) = p₂((x₁, y₁), (x₂, y₂)) = {[p_x(x₁, x₂)]² + [p_y(y₁, y₂)]²}^{1/2}.

It can readily be shown that the metric space {Z; p₂} is complete. ■

5.5.22. Exercise. Verify the completeness of {Z; p₂} in the above example.

5.5.23. Example. Let p be the usual metric defined on C, the set of
complex numbers. Utilizing Example 5.5.21 along with the completeness of
{R; p} (see Example 5.5.20), we can readily show that {C; p} is a complete
metric space. ■

5.5.24. Exercise. Verify the completeness of {C; p}.


5.5.25. Exercise. eL t X = R" (let X = C") denote the set of all real (of
all complex) ordered n-tuples x = (~I' ... ,~,,). Let y = ('11J ... ,'1,,), let
/
p,(x , y) = [~I~I - 1' 11'T " I sp < 00,
and let
p..(x, y) =
max I{ I~ - 1' 11 • . .. ,I~" - 1' "n. i.e.• p = 00.
tU ilizing the completeness of the real line (of the complex plane), show that
{R"; p,} = R;({C;'; p,} = C;) is a complete metric space for 1 S p S 00.
In particular, show that if lX { } ' is a Cauchy sequence in R; (in C;), where
lX ' = (~\kJ ... '~"l')' then {~/l'} is a Cauchy sequence in R (in C) for j = I,
... ,n, and lX { } ' converges to x, where x = (~I' ... ,~,,) and ~, = lim l'~
l'
for j = 1, ... , n.
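The coordinatewise nature of convergence asserted in Exercise 5.5.25 can be
observed numerically. In this sketch the sample sequence in R³ and the
chosen values of p are our own:

def p_metric(x, y, p):
    if p == float("inf"):
        return max(abs(a - b) for a, b in zip(x, y))
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

x = (0.0, 1.0, 2.0)                        # coordinatewise limit of x_k below
for p in (1, 2, float("inf")):
    dists = [p_metric((1.0 / k, 1 - 1.0 / k, 2.0), x, p) for k in (10, 100, 1000)]
    print(p, dists)                        # distances shrink to 0 in every p-metric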

5.5.26. Example. Let {lₚ; pₚ} be the metric space defined in Example
5.3.5. We now show that this space is a complete metric space.
Let {x_k} be a Cauchy sequence in lₚ, where x_k = (ξ_{1k}, ξ_{2k}, ..., ξ_{mk}, ...).
Let ε > 0. Then there is an N ∈ J such that

pₚ(x_k, x_j) = [Σ_{m=1}^{∞} |ξ_{mk} − ξ_{mj}|^p]^{1/p} < ε

for all k, j ≥ N. This implies that |ξ_{mk} − ξ_{mj}| < ε for every m ∈ J and all
k, j ≥ N. Thus, {ξ_{mk}} is a Cauchy sequence in R for every m ∈ J, and hence
{ξ_{mk}} is convergent to some limit, say lim_k ξ_{mk} = ξ_m for m ∈ J. Now let
x = (ξ₁, ξ₂, ..., ξ_m, ...). We want to show that (i) x ∈ lₚ and (ii) lim_k x_k = x.
Since {x_k} is a Cauchy sequence, we know by Theorem 5.5.13 that there
exists a γ > 0 such that

pₚ(0, x_k) = [Σ_{m=1}^{∞} |ξ_{mk}|^p]^{1/p} ≤ γ

for all k ∈ J. Now let n be any positive integer, let pₚⁿ be the metric on Rⁿ
defined in Exercise 5.5.25, and let x_kⁿ = (ξ_{1k}, ..., ξ_{nk}). Then pₚⁿ(x_kⁿ, x_jⁿ)
≤ pₚ(x_k, x_j), and thus {x_kⁿ} is a Cauchy sequence in Rₚⁿ. It also follows that
pₚⁿ(0, x_kⁿ) ≤ γ for all k ∈ J. Now by Exercise 5.5.25, {x_kⁿ} converges to xⁿ,
where xⁿ = (ξ₁, ..., ξₙ). It follows from Theorem 5.5.6, part (vii), that
pₚⁿ(0, xⁿ) ≤ γ; i.e., [Σ_{m=1}^{n} |ξ_m|^p]^{1/p} ≤ γ. Since this must hold for all
n ∈ J, it follows that x ∈ lₚ. To show that lim_k x_k = x, let ε > 0. Then there
is an integer N such that pₚ(x_j, x_k) < ε for all k, j ≥ N. Again, let n be any
positive integer. Then we have pₚⁿ(x_jⁿ, x_kⁿ) < ε for all j, k ≥ N. For fixed
n, we conclude from Theorem 5.5.6, part (vii), that pₚⁿ(xⁿ, x_kⁿ) ≤ ε for all
k ≥ N. Hence, [Σ_{m=1}^{n} |ξ_m − ξ_{mk}|^p]^{1/p} ≤ ε for all k ≥ N, where N
depends only on ε (and not on n). Since this must hold for all n ∈ J, we
conclude that pₚ(x, x_k) ≤ ε for all k ≥ N. This implies that lim_k x_k = x. ■

5.5.27. Exercise. Show that the discrete metric space of Example 5.1.7
is complete.

5.5.28. Example. Let {𝒞[a, b]; p∞} be the metric space defined in Example
5.3.14. Thus, 𝒞[a, b] is the set of all continuous functions on [a, b], and

p∞(x, y) = sup_{a≤t≤b} |x(t) − y(t)|.

We now show that {𝒞[a, b]; p∞} is a complete metric space. If {xₙ} is a Cauchy
sequence in 𝒞[a, b], then for each ε > 0 there is an N such that |xₙ(t) − x_m(t)|
< ε whenever m, n ≥ N, for all t ∈ [a, b]. Thus, for fixed t, the sequence
{xₙ(t)} converges to, say, x₀(t). Since t is arbitrary, the sequence of functions
{xₙ(·)} converges pointwise to a function x₀(·). Also, since N = N(ε)
is independent of t, the sequence {xₙ(·)} converges uniformly to x₀(·).
Now from the calculus we know that if a sequence of continuous functions
{xₙ(·)} converges uniformly to a function x₀(·), then x₀(·) is continuous.
Therefore, every Cauchy sequence in {𝒞[a, b]; p∞} converges to an element
in this space in the sense of the metric p∞. Therefore, the metric space
{𝒞[a, b]; p∞} is complete. ■

5.5.29. Example. Let {𝒞[a, b]; p₂} be the metric space defined in Example
5.3.12, with p = 2; i.e.,

p₂(x, y) = {∫_{a}^{b} [x(t) − y(t)]² dt}^{1/2}.

We now show that this metric space is not complete. Without loss of generality
let the closed interval be [−1, 1]. In particular, consider the sequence {xₙ}
of continuous functions defined by

xₙ(t) = 0 for −1 ≤ t ≤ 0,
xₙ(t) = nt for 0 ≤ t ≤ 1/n,
xₙ(t) = 1 for 1/n ≤ t ≤ 1,

n = 1, 2, .... This sequence is depicted pictorially in Figure F.

5.5.30. Figure F. Sequence {xₙ} for {𝒞[a, b]; p₂}. (The figure shows the
ramp functions xₙ(t) for n = 1, 2, 3.)

Now let m > n and note that

[p₂(x_m, xₙ)]² = (m − n)² ∫_{0}^{1/m} t² dt + ∫_{1/m}^{1/n} (1 − nt)² dt
= (m − n)²/(3m²n) < 1/(3n) < ε

whenever n > 1/(3ε). Therefore, {xₙ} is a Cauchy sequence.
For purposes of contradiction, let us now assume that {xₙ} converges to
a continuous function x, where convergence is taken with respect to the metric
p₂. In other words, assume that

∫_{−1}^{1} |xₙ(t) − x(t)|² dt → 0 as n → ∞.

This implies that the above integral with any limits between −1 and +1
also approaches zero as n → ∞. Since xₙ(t) = 0 whenever t ∈ [−1, 0], we
have

∫_{−1}^{0} |xₙ(t) − x(t)|² dt = ∫_{−1}^{0} |x(t)|² dt,

which is independent of n and must therefore vanish. From this it follows
that the continuous function x is such that

∫_{−1}^{0} |x(t)|² dt = 0,

and x(t) = 0 whenever t ∈ [−1, 0]. Now if 0 < a ≤ 1, then

∫_{a}^{1} |xₙ(t) − x(t)|² dt → 0 as n → ∞.

Choosing n > 1/a, we have xₙ(t) = 1 on [a, 1], so that

∫_{a}^{1} |1 − x(t)|² dt → 0 as n → ∞.

Since this integral is independent of n, it vanishes. Also, since x is continuous,
it follows that x(t) = 1 for t ≥ a. Since a can be chosen arbitrarily close to
zero, we end up with a function x such that

x(t) = 0 for t ∈ [−1, 0], and x(t) = 1 for t ∈ (0, 1].

Therefore, the Cauchy sequence {xₙ} does not converge to a point in 𝒞[a, b],
and the metric space is not complete. ■
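The computations in Example 5.5.29 can be checked by crude numerical
quadrature; the grid size and the indices below are our own choices, and the
distances behave as the closed-form expressions predict:

import math

def x_n(t, n):
    return 0.0 if t <= 0 else (n * t if t <= 1.0 / n else 1.0)

def step(t):
    return 0.0 if t <= 0 else 1.0          # the discontinuous limit function

def p2(f, g, pts=200001):
    # Riemann-sum approximation of the p2 metric on [-1, 1]
    h = 2.0 / (pts - 1)
    return math.sqrt(sum((f(-1 + i * h) - g(-1 + i * h)) ** 2 * h for i in range(pts)))

print(p2(lambda t: x_n(t, 100), lambda t: x_n(t, 200)))  # ~ sqrt(1/1200) ~ 0.029
print(p2(lambda t: x_n(t, 100), step))                   # ~ sqrt(1/300) ~ 0.058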

The completeness property of certain metric spaces is an essential and
important property which we will use and encounter frequently in the
remainder of this book. The preceding example demonstrates that not all
metric spaces are complete. However, this space {𝒞[a, b]; p₂} is a subspace
of a larger metric space which is complete. To discuss this complete metric
space (i.e., the completion of {𝒞[a, b]; p₂}), it is necessary to make use of
the Lebesgue theory of measure and integration. For a thorough treatment
of this theory, we refer the reader to the texts by Royden [5.9] and Taylor
[5.10]. Although knowledge of this theory is not an essential requirement in
the development of the subsequent results in this book, we will want to make
reference to certain examples of important metric spaces which are defined
in terms of the Lebesgue integral. For this reason, we provide the following
heuristic comments for those readers who are unfamiliar with this subject.
The Lebesgue measure space on the real numbers, R, consists of the triple
{R, 𝔐, μ}, where 𝔐 is a certain family of subsets of R, called the Lebesgue
measurable sets in R, and μ is a mapping, μ: 𝔐 → R*, called Lebesgue
measure, which may be viewed as a generalization of the concept of length
in R. While it is not possible to characterize 𝔐 without providing additional
details concerning the Lebesgue theory, it is quite simple to enumerate
several important examples of elements in 𝔐. For instance, 𝔐 contains all
intervals of the form (a, b) = {x ∈ R: a < x < b}, [c, d) = {x ∈ R: c ≤ x
< d}, (e, f] = {x ∈ R: e < x ≤ f}, [g, h] = {x ∈ R: g ≤ x ≤ h}, as well as
all countable unions and intersections of such intervals. It is emphasized
that 𝔐 does not include all subsets of R. Now if A ∈ 𝔐 is an interval, then
the measure of A, μ(A), is the length of A. For example, if A = [a, b], then
μ(A) = b − a. Also, if B is a countable union of disjoint intervals, then
μ(B) is the sum of the lengths of the disjoint intervals (this sum may be
infinite). Of particular interest are subsets of R having measure zero.
Essentially, this means it is possible to "cover" the set with an arbitrarily
small subset of R. Thus, every subset of R containing at most a countable
number of points has Lebesgue measure equal to zero. For example, the set
of rational numbers has Lebesgue measure zero. (There are also uncountable
subsets of R having Lebesgue measure zero.)
In connection with the above discussion, we say that a proposition P(x)
is true almost everywhere (abbreviated a.e.) if the set S = {x ∈ R: P(x) is
not true} has Lebesgue measure zero. For example, two functions f, g:
R → R are said to be equal a.e. if the set S = {x ∈ R: f(x) ≠ g(x)} ∈ 𝔐
and if μ(S) = 0.
Let us now consider the integral of real-valued functions defined on the
interval [a, b] ⊂ R. It can be shown that a bounded function f: [a, b] → R
is Riemann integrable (where the Riemann integral is denoted, as usual, by
∫_{a}^{b} f(x) dx) if and only if f is continuous almost everywhere on [a, b].
The class of Riemann integrable functions with a metric defined in the same
manner as in Example 5.5.29 (for continuous functions on [a, b]) is not a
complete metric space. However, as pointed out before, it is possible to
generalize the concept of integral and make it applicable to a class of functions
significantly larger than the class of functions which are continuous a.e.
In doing so, we must consider the class of measurable functions. Specifically,
a function f: R → R is said to be a Lebesgue measurable function if f⁻¹(𝒰)
∈ 𝔐 for every open set 𝒰 ⊂ R. Now let f be a Lebesgue measurable function
which is bounded on the interval [a, b], let M = sup {f(x): x ∈ [a, b]}, and
let m = inf {f(x): x ∈ [a, b]}. In the Lebesgue approach to integration, the
range of f is partitioned into intervals. (This is in contrast with the Riemann
approach, where the domain of f is partitioned in developing the integral.)
Specifically, let us divide the range of f into the n parts specified by
m = y₀ < y₁ < ... < y_{n−1} < yₙ = M, let E_k = {x ∈ [a, b]: y_{k−1} ≤ f(x)
< y_k} for k = 1, ..., n, and let ζ_k be such that y_{k−1} ≤ ζ_k < y_k for
k = 1, ..., n. The sum

Σ_{k=1}^{n} ζ_k μ(E_k)

approximates the area under the graph of f, and it can serve as the definition
of the integral of f between a and b, after an appropriate limiting process
has been performed. Provided that this limit exists, it is called the Lebesgue
integral of f over [a, b], and it is denoted by ∫_{[a,b]} f dμ. It can be shown
that any bounded function f which is Riemann integrable over [a, b] is
Lebesgue integrable over [a, b], and furthermore

∫_{[a,b]} f dμ = ∫_{a}^{b} f(x) dx.

On the other hand, there are functions which are Lebesgue integrable but
not Riemann integrable over [a, b]. For example, consider the function
f: [a, b] → R defined by f(x) = 0 if x is rational and f(x) = 1 if x is irrational.
This function is so erratic that the Riemann integral does not exist in this
case. However, since the interval [a, b] = A ∪ B, where A = {x: f(x) = 1}
and B = {x: f(x) = 0}, it follows from the preceding characterization of the
Lebesgue integral that

∫_{[a,b]} f dμ = 1·μ(A) + 0·μ(B) = b − a.
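The claim that the set of rationals has measure zero can be made tangible:
enumerating the rationals in [0, 1] as q₁, q₂, ... and covering q_k by an
interval of length ε/2^k yields a countable cover of total length at most
ε·Σ 2^{−k} = ε. The following sketch, whose enumeration scheme and value
of ε are our own, carries this out for finitely many terms:

from fractions import Fraction

def rationals_01():
    # enumerate the rationals p/q in [0, 1] by increasing denominator
    seen, q = set(), 1
    while True:
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                yield r
        q += 1

eps, total, gen = 0.01, 0.0, rationals_01()
for k in range(1, 10001):
    qk = next(gen)                         # k-th rational in the enumeration
    total += eps / 2 ** k                  # length of the interval covering it
print(total <= eps)                        # True: total length stays below eps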
Let us now consider an important class of complete metric spaces, given
in the next example.

5.5.31. Example. Let p ≥ 1 (p not necessarily an integer), let {R, 𝔐, μ}
denote the Lebesgue measure space on the real numbers, and let [a, b] be
a subset of R. Let ℒ_p[a, b] denote the family of functions f: R → R which
are Lebesgue measurable and such that ∫_{[a,b]} |f|^p dμ exists and is finite.
We define an equivalence relation ~ on ℒ_p[a, b] by saying that f ~ g
if f(x) = g(x) except on a subset of [a, b] having Lebesgue measure zero.
Now denote the family of equivalence classes into which ℒ_p[a, b] is divided
by L_p[a, b]. Specifically, let us denote the equivalence class [f] = {g ∈
ℒ_p[a, b]: g ~ f} for f ∈ ℒ_p[a, b]. Then L_p[a, b] = {[f]: f ∈ ℒ_p[a, b]}.
Now let X = L_p[a, b] and define pₚ: X × X → R by

pₚ([f], [g]) = {∫_{[a,b]} |f − g|^p dμ}^{1/p}.          (5.5.32)

It can be shown that the value of pₚ([f], [g]) defined by Eq. (5.5.32) is the
same for any f and g in the equivalence classes [f] and [g], respectively.
Furthermore, pₚ satisfies all the axioms of a metric, and as such {L_p[a, b]; pₚ}
is a metric space. One of the important results of the Lebesgue theory is that
this space is complete.
It is important to note that the right-hand side of Eq. (5.5.32) cannot be
used to define a metric on ℒ_p[a, b], since there are functions f ≠ g such that
∫_{[a,b]} |f − g|^p dμ = 0; however, in the literature the distinction between
L_p[a, b] and ℒ_p[a, b] is usually suppressed. That is, we usually write f ∈
L_p[a, b] instead of [f] ∈ L_p[a, b], where f ∈ ℒ_p[a, b].
Finally, in the particular case when p = 2, the space {𝒞[a, b]; p₂} of
Example 5.5.29 is a subspace of the space {L₂[a, b]; p₂}. ■

Before closing the present section we consider some important general
properties of complete metric spaces.

5.5.33. Theorem. Let {X; p} be a complete metric space, and let {Y; p} be
a metric subspace of {X; p}. Then {Y; p} is complete if and only if Y is a
closed subset of X.

Proof. Assume that {Y; p} is complete. To show that Y is a closed subset
of X we must show that Y contains all of its adherent points. Let y be an
adherent point of Y; i.e., let y ∈ Ȳ. Then each open sphere S(y; 1/n), n = 1,
2, ..., contains at least one point yₙ in Y. Since p(yₙ, y) < 1/n, it follows that
the sequence {yₙ} converges to y. Since {yₙ} is a Cauchy sequence in the
complete space {Y; p}, we have {yₙ} converging to a point y′ ∈ Y. But the
limit of a sequence of points in a metric space is unique by Theorem 5.5.6.
Therefore, y′ = y; i.e., y ∈ Y, and Y is closed.
Conversely, assume that Y is a closed subset of X. To show that the
space {Y; p} is complete, let {yₙ} be an arbitrary Cauchy sequence in {Y; p}.
Then {yₙ} is a Cauchy sequence in the complete metric space {X; p}, and as
such it has a limit y ∈ X. However, in view of Theorem 5.5.8, part (iii),
the closed subset Y of X contains all its adherent points. Therefore, y ∈ Y,
and {Y; p} is complete. ■

We emphasize that completeness and closure are not necessarily equivalent
in arbitrary metric spaces. For example, a metric space is always closed, yet
it is not necessarily complete.
Before characterizing a complete metric space in an alternate way, we
need to introduce the following concept.

5.5.34. Definition. A sequence {S_k} of subsets of a metric space {X; p}
is called a nested sequence of sets if

S₁ ⊃ S₂ ⊃ S₃ ⊃ ....
We leave the proof of the last result of the present section as an exercise.

5.5.35. Theorem. Let {X; p} be a metric space. Then,
(i) {X; p} is complete if and only if every sequence of closed nested
spheres in {X; p} with radii tending to zero has non-void intersection; and
(ii) if {X; p} is complete, if {S_k} is a nested sequence of non-empty closed
subsets of X, and if lim_n diam (Sₙ) = 0, then the intersection ∩_{n=1}^{∞} Sₙ
is not empty; in fact, it consists of a single point.


5.5.36. Exercise. Prove Theorem 5.5.35.

5.6. COMPACTNESS

We recall the Bolzano-Weierstrass theorem from the calculus: Every
bounded, infinite subset of the real line (i.e., the set of real numbers with the
usual metric) has at least one point of accumulation. Thus, if Y is an arbitrary
bounded infinite subset of R, then in view of this theorem we know that any
sequence formed from elements of Y has a convergent subsequence. For
example, let Y = [0, 2], and let {xₙ} be the sequence of real numbers given by

xₙ = (1 − (−1)ⁿ)/2 + 1/n,   n = 1, 2, ....

Then the range of this sequence lies in Y and is thus bounded. Hence, the
range has at least one accumulation point. It, in fact, has two.

A theorem from the calculus which is closely related to the Bolzano-Weierstrass
theorem is the Heine-Borel theorem. We need the following
terminology.

5.6.1. Definition. Let Y be a set in a metric space {X; p}, and let A be an
index set. A collection of sets {Y_α: α ∈ A} in {X; p} is called a covering
of Y if Y ⊂ ∪_{α∈A} Y_α. A subcollection {Y_β: β ∈ B} of the covering
{Y_α: α ∈ A}, i.e., B ⊂ A, such that Y ⊂ ∪_{β∈B} Y_β, is called a subcovering
of {Y_α: α ∈ A}. If all the members Y_α and Y_β are open sets, then we speak
of an open covering and open subcovering. If A is a finite set, then we speak
of a finite covering. In general, A may be an uncountable set.

We now recall the Heine-Borel theorem as it applies to subsets of the real
line (i.e., of R): Let Y be a closed and bounded subset of R. If {Y_α: α ∈ A}
is any family of open sets on the real line which covers Y, then it is possible
to find a finite subcovering of sets from {Y_α: α ∈ A}.
Many important properties of the real line follow from the Bolzano-Weierstrass
theorem and from the Heine-Borel theorem. In general, these
properties cannot be carried over directly to arbitrary metric spaces. The
concept of compactness, to be introduced in the present section, will enable
us to isolate those metric spaces which possess the Heine-Borel and Bolzano-Weierstrass
property.
Because of its close relationship to compactness, we first introduce the
concept of total boundedness.

5.6.2. Definition. Let Y be any set in a metric space {X; p}, and let ε
be an arbitrary positive number. A set S_ε in X is said to be an ε-net for Y
if for any point y ∈ Y there exists at least one point s ∈ S_ε such that
p(s, y) < ε. The ε-net S_ε is said to be finite if S_ε contains a finite number
of points. A subset Y of X is said to be totally bounded if X contains a finite
ε-net for Y for every ε > 0.

Some authors use the terminology ε-dense set for ε-net and precompact
for totally bounded sets.
An obvious equivalent characterization of total boundedness is contained
in the following result.

5.6.3. Theorem. A subset Y ⊂ X is totally bounded if and only if Y can
be covered by a finite number of spheres of radius ε for any ε > 0.

5.6.4. Exercise. Prove Theorem 5.6.3.


In Figure G a pictorial demonstration of the preceding concepts is given.
If in this figure the size of ε were decreased, then correspondingly, the
number of elements in S_ε would increase. If for arbitrarily small ε the number
of elements in S_ε remains finite, then we have a totally bounded set Y.

5.6.5. Figure G. Total boundedness of a set Y. (The figure depicts a set Y
contained in a set X; S_ε is the finite set consisting of the dots within the
set X.)

Total boundedness is a stronger property than boundedness. We leave
the proof of the next result as an exercise.

5.6.6. Theorem. Let {X; p} be a metric space, and let Y be a subset of X.
Then,
(i) if Y is totally bounded, then it is bounded;
(ii) if Y is totally bounded, then its closure Ȳ is totally bounded; and
(iii) if the metric space {X; p} is totally bounded, then it is separable.
5.6.7. Exercise. Prove Theorem 5.6.6.

We note, for example, that all finite sets (including the empty set) are
totally bounded. Whereas all totally bounded sets are also bounded, the
converse does not, in general, hold. We demonstrate this by means of the
following example.

5.6.8. Example. Let {l₂; p₂} be the metric space defined in Example 5.3.5.
Consider the subset Y ⊂ l₂ defined by

Y = {y ∈ l₂: Σ_{i=1}^{∞} |ηᵢ|² ≤ 1}.

We show that Y is bounded but not totally bounded. For any x, y ∈ Y, we
have by the Minkowski inequality (5.2.7),

p₂(x, y) = [Σ_{i=1}^{∞} |ξᵢ − ηᵢ|²]^{1/2} ≤ [Σ_{i=1}^{∞} |ξᵢ|²]^{1/2}
+ [Σ_{i=1}^{∞} |ηᵢ|²]^{1/2} ≤ 2.

Thus, Y is bounded. To show that Y is not totally bounded, consider the
set of points E = {e₁, e₂, ...} ⊂ Y, where e₁ = (1, 0, 0, ...), e₂ = (0, 1,
0, ...), etc. Then p₂(eᵢ, eⱼ) = √2 for i ≠ j. Now suppose there is a finite
ε-net for Y for, say, ε = 1/2. Let {s₁, ..., sₙ} be the net S_ε. Now if eⱼ is such
that p(eⱼ, sᵢ) < 1/2 for some i, then p(e_k, sᵢ) ≥ p(e_k, eⱼ) − p(eⱼ, sᵢ) > 1/2
for k ≠ j. Hence, there can be at most one element of the set E in each sphere
S(sᵢ; 1/2) for i = 1, ..., n. Since there are infinitely many points in E and
only a finite number of spheres S(sᵢ; 1/2), this contradicts the fact that S_ε is
an ε-net. Hence, there is no finite ε-net for ε = 1/2, and Y is not totally
bounded. ■
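The obstruction in Example 5.6.8 is easy to verify in finite truncations. The
sketch below (the truncation dimension is our own choice) confirms that the
unit vectors are pairwise √2 apart, so no sphere of radius 1/2 can contain
two of them:

import math

def e(i, dim=10):
    return [1.0 if j == i else 0.0 for j in range(dim)]

def p2(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

dists = [p2(e(i), e(j)) for i in range(10) for j in range(i + 1, 10)]
print(all(abs(d - math.sqrt(2)) < 1e-12 for d in dists))   # True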

Let us now consider an example of a totally bounded set.

5.6.9. Example. Let {Rⁿ; p₂} be the metric space defined in Example 5.3.1,
and let Y be the subset of Rⁿ defined by Y = {y ∈ Rⁿ: Σ_{i=1}^{n} |ηᵢ|² ≤ 1}.
Clearly, Y is bounded. To show that Y is totally bounded, we construct an
ε-net for Y for an arbitrary ε > 0. To this end, let N be a positive integer
such that Nε > √n, and let S_ε be the set of all n-tuples given by

S_ε = {s = (q₁, ..., qₙ) ∈ Y: qᵢ = mᵢ/N for some integer mᵢ,
where −N ≤ mᵢ ≤ N, i = 1, ..., n}.

Then clearly S_ε ⊂ Y and S_ε is finite. Now for any y = (η₁, ..., ηₙ) ∈ Y,
there is an s ∈ S_ε such that |qᵢ − ηᵢ| ≤ 1/N for i = 1, ..., n. Thus,

p₂(y, s) ≤ [Σ_{i=1}^{n} (1/N)²]^{1/2} = √n/N < ε.

Therefore, S_ε is a finite ε-net. Since ε is arbitrary, Y is totally bounded. ■

In general, any bounded subset of R₂ⁿ = {Rⁿ; p₂} is totally bounded.
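The grid construction of Example 5.6.9 translates directly into code. The
following sketch, with n = 3 and ε = 0.5 as our sample values, builds the
finite ε-net S_ε and spot-checks it against random points of Y:

import itertools, math, random

n, eps = 3, 0.5
N = int(math.sqrt(n) / eps) + 1            # guarantees N * eps > sqrt(n)

net = [s for s in itertools.product([m / N for m in range(-N, N + 1)], repeat=n)
       if sum(q * q for q in s) <= 1.0]    # grid points m_i / N lying in Y

def dist(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

for _ in range(1000):
    y = [random.uniform(-1, 1) for _ in range(n)]
    if sum(v * v for v in y) <= 1.0:       # keep only points of the unit ball Y
        assert min(dist(y, s) for s in net) < eps
print(len(net), "grid points form a finite eps-net for eps =", eps)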

5.6.10. Exercise. Let {l₂; p₂} be the metric space defined in Example
5.3.5, and let Y ⊂ l₂ be the subset defined by

Y = {y ∈ l₂: |η₁| ≤ 1, |η₂| ≤ 1/2, ..., |ηₙ| ≤ (1/2)^{n−1}, ...}.

Show that Y is totally bounded.

In studying compactness of metric spaces, we will find it convenient to
introduce the following concept.

5.6.11. Definition. A metric space {X; p} is said to be sequentially compact
if every sequence of elements in X contains a subsequence which converges
to some element x ∈ X. A set Y in the metric space {X; p} is said to be
sequentially compact if the subspace {Y; p} is sequentially compact; i.e.,
every sequence in Y contains a subsequence which converges to a point in Y.

5.6.12. Example. Let X = (0, 1], and let p be the usual metric on the real
line R. Consider the sequence {xₙ}, where xₙ = 1/n, n = 1, 2, .... This
sequence has no subsequence which converges to a point in X, and thus
{X; p} is not sequentially compact. ■

We now define compactness.

5.6.13. Definition. A metric space {X; p} is said to be compact, or to
possess the Heine-Borel property, if every open covering of {X; p} contains
a finite open subcovering. A set Y in a metric space {X; p} is said to be
compact if the subspace {Y; p} is compact.

Some authors use the term bicompact for Heine-Borel compactness and
the term compact for what we call sequentially compact. As we shall see
shortly, in the case of metric spaces, compactness and sequential compactness
are equivalent, so no confusion should arise.
We will also show that compact metric spaces can equivalently be
characterized by means of the Bolzano-Weierstrass property, given by the
following.

5.6.14. Definition. A metric space {X; p} possesses the Bolzano-Weierstrass
property if every infinite subset of X has at least one point of accumulation.
A set Y in X possesses the Bolzano-Weierstrass property if the subspace
{Y; p} possesses the Bolzano-Weierstrass property.

Before setting out on proving the assertions made above, i.e., the equivalence
of compactness, sequential compactness, and the Bolzano-Weierstrass
property in metric spaces, a few comments concerning some of these concepts
may be of benefit.
Informally, we may view a sequentially compact metric space as having
such an abundance of elements that no matter how we choose a sequence,
there will always be a clustering of an infinite number of points around at
least one point in the metric space. A similar interpretation can be made
concerning metric spaces which possess the Bolzano-Weierstrass property.
Utilizing the concepts of sequential compactness and total boundedness,
we first state and prove the following result.

5.6.15. Theorem. Let {X; p} be a metric space, and let Y be a subset of
X. The following properties hold:
(i) if Y is sequentially compact, then Y is bounded;
(ii) if Y is sequentially compact, then Y is closed;
(iii) if {X; p} is sequentially compact, then {X; p} is totally bounded;
(iv) if {X; p} is sequentially compact, then {X; p} is complete; and
(v) if {X; p} is totally bounded and complete, then it is sequentially
compact.
Proof. To prove (i), assume that Y is a sequentially compact subset of X
and assume, for purposes of contradiction, that Y is unbounded. Then we
can construct a sequence {yₙ} with elements arbitrarily far apart. Specifically,
let y₁ ∈ Y and choose y₂ ∈ Y such that p(y₁, y₂) > 1. Next, choose
y₃ ∈ Y such that p(y₁, y₃) > 1 + p(y₁, y₂). Continuing this process, choose
yₙ ∈ Y such that p(y₁, yₙ) > 1 + p(y₁, y_{n−1}). If m > n, then p(y₁, y_m) >
1 + p(y₁, yₙ), and p(y_m, yₙ) ≥ |p(y₁, y_m) − p(y₁, yₙ)| > 1. But this implies
that {yₙ} contains no convergent subsequence. However, we assumed that Y
is sequentially compact; i.e., every sequence in Y contains a convergent
subsequence. Therefore, we have arrived at a contradiction. Hence, Y must
be bounded. In the above argument we assumed that Y is an infinite set. We
note that if Y is a finite set then there is nothing to prove.
To prove part (ii), let Ȳ denote the closure of Y and assume that y ∈ Ȳ.
Then there is a sequence of points {yₙ} in Y which converges to y, and every
subsequence of {yₙ} converges to y, by Theorem 5.5.6, part (iv). But, by
hypothesis, Y is sequentially compact. Thus, the sequence {yₙ} in Y contains
a subsequence which converges to some element in Y; since every subsequence
converges to y, it follows that y ∈ Y. Therefore, Y = Ȳ and Y is closed.
We now prove part (iii). Let {X; p} be a sequentially compact metric
space, and let x₁ ∈ X. With ε > 0 fixed we choose, if possible, x₂ ∈ X such
that p(x₁, x₂) > ε. Next, if possible choose x₃ ∈ X such that p(x₁, x₃) > ε
and p(x₂, x₃) > ε. Continuing this process we have, for every n, p(xₙ, x₁) > ε,
p(xₙ, x₂) > ε, ..., p(xₙ, x_{n−1}) > ε. We now show that this process must
ultimately terminate. Clearly, if {X; p} is a bounded metric space then we can
pick ε sufficiently large to terminate the process after the first step; i.e.,
there is no point x ∈ X such that p(x₁, x) ≥ ε. Now suppose that, in general,
the process does not terminate. Then we have constructed a sequence {xₙ}
such that for any two members xᵢ, xⱼ of this sequence we have p(xᵢ, xⱼ) > ε.
But, by hypothesis, {X; p} is sequentially compact, and thus {xₙ} contains a
subsequence which is convergent to an element in X. Hence, we have arrived
at a contradiction, and the process must terminate. Using this procedure
we now have for arbitrary ε > 0 a finite set of points {x₁, x₂, ..., x_l} such
that the spheres S(xᵢ; ε), i = 1, ..., l, cover X; i.e., for any ε > 0, X
contains a finite ε-net. Therefore, the metric space {X; p} is totally bounded.
We now prove part (iv) of the theorem. Let {xₙ} be a Cauchy sequence.
Then for every ε > 0 there is an integer N such that p(x_m, xₙ) < ε whenever
m > n ≥ N. Since {X; p} is sequentially compact, the sequence {xₙ} contains
a subsequence {x_{l_n}} convergent to a point x ∈ X, so that
lim_{n→∞} p(x_{l_n}, x) = 0. The sequence {lₙ} is an increasing sequence with
l_m ≥ m. It now follows that

p(xₙ, x) ≤ p(xₙ, x_{l_m}) + p(x_{l_m}, x) < ε + p(x_{l_m}, x)

whenever m > n ≥ N. Letting m → ∞, we have 0 ≤ p(xₙ, x) ≤ ε whenever
n ≥ N. Hence, the Cauchy sequence {xₙ} converges to x ∈ X. Therefore,
X is complete.
In connection with parts (iv) and (v) we note that a totally bounded metric
space is not necessarily sequentially compact. We leave the proof of part
(v) as an exercise. ■

5.6.16. Exercise. Prove part (v) of Theorem 5.6.15.

Parts (iii), (iv) and (v) of the above theorem allow us to define a sequentially
compact metric space equivalently as a metric space which is complete and
totally bounded. We now show that a metric space is sequentially compact if
and only if it satisfies the Bolzano-Weierstrass property.

5.6.17. Theorem. A metric space {X; p} is sequentially compact if and
only if every infinite subset of X has at least one point of accumulation.

Proof. Assume that Y is an infinite subset of a sequentially compact metric
space {X; p}. If {yₙ} is any sequence of distinct points in Y, then {yₙ} contains
a convergent subsequence {y_{n_k}}, because {X; p} is sequentially compact.
The limit y of the subsequence is a point of accumulation of Y.
Conversely, assume that {X; p} is a metric space such that every infinite
subset of X has a point of accumulation. Let {yₙ} be any sequence of points
in X. If a point occurs an infinite number of times in {yₙ}, then this sequence
contains a convergent subsequence, namely a constant subsequence, and we
are finished. If this is not the case, then we can assume that all elements of
{yₙ} are distinct. Let Z denote the set of all points yₙ, n = 1, 2, .... By
hypothesis, the infinite set Z has at least one point of accumulation. If z is
such a point of accumulation, then we can choose a sequence of points of Z
which converges to z (see Theorem 5.5.8, part (i)), and this sequence can be
taken to be a subsequence {y_{n_k}} of {yₙ}. Therefore, {X; p} is sequentially
compact. This concludes the proof. ■

Our next objective is to show that in metric spaces the concepts of com-
pactness and sequential compactness are equivalent. In doing so we employ
the following lemma, the proof of which is left as an exercise.

5.6.18. Lemma. Let {X; p} be a sequentially compact metric space. If
{Y_α : α ∈ A} is an infinite open covering of {X; p}, then there exists a number
ε > 0 such that every sphere in X of radius ε is contained in at least one of
the open sets Y_α.

5.6.19. Exercise. Prove Lemma 5.6.18.

5.6.20. Theorem. A metric space {X; p} is compact if and only if it is
sequentially compact.
Proof. From Theorem 5.6.17, a metric space is sequentially compact if and
only if it has the Bolzano-Weierstrass property. Therefore, we first show
that every infinite subset of a compact metric space has a point of accumulation.

Let {X; p} be a compact metric space, and let Y be an infinite subset of
X. For purposes of contradiction, assume that Y has no point of accumulation.
Then each x ∈ X is the center of a sphere which contains no point of
Y, except possibly x itself. These spheres form an infinite open covering of
X. But, by hypothesis, {X; p} is compact, and therefore we can choose from
this infinite covering a finite number of spheres which also cover X. Now
each sphere from this finite subcovering contains at most one point of Y, and
therefore Y is finite. But this is contrary to our original assumption, and we
have arrived at a contradiction. Therefore, Y has at least one point of accumulation,
and {X; p} is sequentially compact.

Conversely, assume that {X; p} is a sequentially compact metric space,
and let {Y_α : α ∈ A} be an arbitrary infinite open covering of X. From
Lemma 5.6.18 there exists an ε > 0 such that every sphere in X of radius
ε is contained in at least one of the open sets Y_α. Now, by hypothesis, {X; p}
is sequentially compact and is therefore totally bounded by part (iii) of
Theorem 5.6.15. Thus, with arbitrary ε fixed we can find a finite ε-net,
{x_1, x_2, …, x_l}, such that X ⊂ ∪_{i=1}^{l} S(x_i; ε). Now in view of Lemma 5.6.18,
S(x_i; ε) ⊂ Y_{α_i}, i = 1, …, l, where the sets Y_{α_i} are from the family {Y_α :
α ∈ A}. Hence,

    X ⊂ ∪_{i=1}^{l} Y_{α_i},

and X has a finite open subcovering chosen from the infinite open covering
{Y_α : α ∈ A}. Therefore, the metric space {X; p} is compact. This proves the
theorem. ■

There is yet another way of characterizing a compact metric space. Before
doing so, we give the following definition.

5.6.21. Definition. Let {F_α : α ∈ A} be an infinite family of closed sets.
The family {F_α : α ∈ A} is said to have the finite intersection property if
for every finite set B ⊂ A the set ∩_{α∈B} F_α is not empty.

5.6.22. Theorem. A metric space {X; p} is compact if and only if every
infinite family {F_α : α ∈ A} of closed sets in X with the finite intersection
property has a nonvoid intersection; i.e., ∩_{α∈A} F_α ≠ ∅.

5.6.23. Exercise. Prove Theorem 5.6.22.

We now summarize the above results as follows.


5.6.24. Theorem. In a metric space {X; p} the following are equivalent:

(i) {X; p} is compact;
(ii) {X; p} is sequentially compact;
(iii) {X; p} possesses the Bolzano-Weierstrass property;
(iv) {X; p} is complete and totally bounded; and
(v) every infinite family of closed sets in {X; p} with the finite intersection
property has a nonvoid intersection.

Concerning product spaces we offer the following exercise.

5.6.25. Exercise. Let {X_1; p_1}, {X_2; p_2}, …, {X_n; p_n} be n compact metric
spaces. Let X = X_1 × X_2 × ⋯ × X_n, and let

    p(x, y) = p_1(x_1, y_1) + ⋯ + p_n(x_n, y_n),    (5.6.26)

where x_i, y_i ∈ X_i, i = 1, …, n, and where x, y ∈ X. Show that the product
space {X; p} is also a compact metric space.

The next result constitutes an important characterization of compact sets
in the spaces R^n and C^n.

5.6.27. Theorem. Let {R^n; p_2} (let {C^n; p_2}) be the metric space defined
in Example 5.3.1. A set Y ⊂ R^n (a set Y ⊂ C^n) is compact if and only if it
is closed and bounded.

5.6.28. Exercise. Prove Theorem 5.6.27.

Recall that every non-void compact set in the real line R contains its
infimum and its supremum.
In general, it is not an easy task to apply the results of Theorem 5.6.24
to specific spaces in order to establish necessary and sufficient conditions
for compactness. From the point of view of applications, criteria such as
those established in Theorem 5.6.27 are much more desirable.
We now give a condition which tells us when a subset of a metric space is
compact. We have:

5.6.29. Theorem. Let {X; p} be a compact metric space, and let Y ⊂ X.
If Y is closed, then Y is compact.

Proof. Let {Y_α : α ∈ A} be any open covering of Y; i.e., each Y_α is open
relative to {Y; p}. Then, by Theorem 5.4.20, for each Y_α there is a U_α which
is open relative to {X; p} such that Y_α = Y ∩ U_α. Since Y is closed, its
complement Ỹ is an open set in {X; p}. Also, since X = Y ∪ Ỹ, the family
consisting of Ỹ together with {U_α : α ∈ A} is an open covering of X. Since
X is compact, it is possible to find a finite subcovering from this family;
i.e., there is a finite set B ⊂ A such that X = Ỹ ∪ [∪_{α∈B} U_α]. Since
Y ⊂ ∪_{α∈B} U_α, Y = ∪_{α∈B} (Y ∩ U_α); i.e., {Y_α : α ∈ B} covers Y.
This implies that Y is compact. ■

We close the present section by introducing the concept of relative
compactness.

5.6.30. Definition. Let {X; p} be a metric space and let Y ⊂ X. The subset
Y is said to be relatively compact in X if Ȳ, the closure of Y, is a compact
subset of X.

One of the essential features of a relatively compact set is that every
sequence has a convergent subsequence, just as in the case of compact
subsets; however, the limit of the subsequence need not be in the subset.
Thus, we have the following result.

5.6.31. Theorem. Let {X; p} be a metric space and let Y ⊂ X. Then Y is
relatively compact in X if and only if every sequence of elements in Y contains
a subsequence which converges to some x ∈ X.

Proof. Let Y be relatively compact in X, and let {y_n} be any sequence in
Y. Then {y_n} belongs to Ȳ also and hence has a convergent subsequence in
Ȳ, since Ȳ is sequentially compact. Hence, {y_n} contains a subsequence
which converges to an element x ∈ Ȳ ⊂ X.

Conversely, let {y_n} be a sequence in Ȳ. Then for each n = 1, 2, …, there
is an x_n ∈ Y such that p(x_n, y_n) < 1/n. Since {x_n} is a sequence in Y, it
contains a convergent subsequence, say {x_{n_k}}, which converges to some
x ∈ X. Since {x_{n_k}} is also in Ȳ, it follows from part (iii) of Theorem 5.5.8
that x ∈ Ȳ. Since p(y_{n_k}, x) ≤ p(y_{n_k}, x_{n_k}) + p(x_{n_k}, x) → 0, the subsequence
{y_{n_k}} also converges to x. Hence, Ȳ is sequentially compact, and so Y is
relatively compact in X. ■

5.7. CONTINUOUS FUNCTIONS

Having introduced the concept of metric space, we are in a position to
give a generalization of the concept of continuity of functions encountered
in calculus.

5.7.1. Definition. Let {X; p_x} and {Y; p_y} be two metric spaces, and let
f: X → Y be a mapping of X into Y. The mapping f is said to be continuous
at the point x_0 ∈ X if for every ε > 0 there is a δ > 0 such that

    p_y[f(x), f(x_0)] < ε

whenever p_x(x, x_0) < δ. The mapping f is said to be continuous on X, or
simply continuous, if it is continuous at each point x ∈ X.

We note that in the above definition the δ is dependent on the choice
of x_0 and ε; i.e., δ = δ(ε, x_0). Now if for each ε > 0 there exists a δ = δ(ε)
> 0 such that for any x_0 we have p_y[f(x), f(x_0)] < ε whenever p_x(x, x_0) < δ,
then we say that the function f is uniformly continuous on X. Henceforth,
if we simply say f is continuous, we mean f is continuous on X.

5.7.2. Example. Let {X; p_x} = R_2^n, and let {Y; p_y} = R_2^m (see Example
5.3.1). Let A = [a_{ij}] denote the real (m × n) matrix

    A = [ a_11  a_12  …  a_1n ]
        [  ⋮                ⋮ ]
        [ a_m1  a_m2  …  a_mn ].

We denote x ∈ R^n and y ∈ R^m by x = (ξ_1, …, ξ_n) and y = (η_1, …, η_m).
Let us define the function f: R^n → R^m by

    f(x) = Ax

for each x ∈ R^n. We now show that f is continuous on R^n. If x, x_0 ∈ R^n and
y, y_0 ∈ R^m are such that y = f(x) and y_0 = f(x_0), then we have

    η_i = Σ_{j=1}^{n} a_{ij} ξ_j  and  η_{0i} = Σ_{j=1}^{n} a_{ij} ξ_{0j},  i = 1, …, m,

and

    [p_y(y, y_0)]² = Σ_{i=1}^{m} [Σ_{j=1}^{n} a_{ij}(ξ_j − ξ_{0j})]².

Using the Schwarz inequality, it follows that

    [p_y(y, y_0)]² ≤ [Σ_{i=1}^{m} Σ_{j=1}^{n} a_{ij}²][Σ_{j=1}^{n} (ξ_j − ξ_{0j})²].

Now let M = [Σ_{i=1}^{m} Σ_{j=1}^{n} a_{ij}²]^{1/2} ≠ 0 (if M = 0, then we are done). Given any
ε > 0 and choosing δ = ε/M, it follows that p_y(y, y_0) < ε whenever p_x(x, x_0)
< δ, and any mapping f: R^n → R^m which is represented by a real, constant
(m × n) matrix A is continuous on R^n. ■
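The bound just derived is easy to test numerically. The sketch below is not from the text; it assumes the NumPy library, and the matrix and sample points are random, hypothetical choices. It checks that p_y(f(x), f(x_0)) ≤ M p_x(x, x_0) with M = [Σ_i Σ_j a_{ij}²]^{1/2}.

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(3, 5))        # a real (m x n) matrix, m = 3, n = 5
    M = np.sqrt((A ** 2).sum())        # M = (sum_i sum_j a_ij^2)^(1/2)

    for _ in range(1000):
        x, x0 = rng.normal(size=5), rng.normal(size=5)
        lhs = np.linalg.norm(A @ x - A @ x0)   # p_y(f(x), f(x0))
        rhs = M * np.linalg.norm(x - x0)       # M * p_x(x, x0)
        assert lhs <= rhs + 1e-12
    print("bound verified; delta = eps/M works with M =", M)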

5.7.3. Example. Let {X; p_x} = {Y; p_y} = {C[a, b]; p_2}, the metric space
defined in Example 5.3.12, and let us define a function f: X → Y in the
following way. For x ∈ X, y = f(x) is given by

    y(t) = ∫_a^b k(t, s) x(s) ds,  t ∈ [a, b],

where k: R^2 → R is continuous in the usual sense, i.e., with respect to the
metric spaces R_2^2 and R_1. We now show that f is continuous on X. Let x,
x_0 ∈ X and y, y_0 ∈ Y be such that y = f(x) and y_0 = f(x_0). Then

    [p_y(y, y_0)]² = ∫_a^b {∫_a^b k(t, s)[x(s) − x_0(s)] ds}² dt.

It follows from Hölder's inequality for integrals (5.2.5) that

    p_y(y, y_0) ≤ M p_x(x, x_0),

where M = [∫_a^b ∫_a^b k²(t, s) ds dt]^{1/2}. Hence, for any ε > 0, p_y(y, y_0) < ε whenever
p_x(x, x_0) < δ, where δ = ε/M. ■

5.7.4. Example. Consider the metric space {C[a, b]; p_∞} defined in Example
5.3.14. Let C¹[a, b] be the subset of C[a, b] of all functions having continuous
first derivatives (on (a, b)), and let {X; p_x} be the metric subspace {C¹[a, b];
p_∞}. Let {Y; p_y} = {C[a, b]; p_∞}, and define the function f: X → Y as follows.
For x ∈ X, y = f(x) is given by

    y(t) = dx(t)/dt.

To show that f is not continuous, we show that for any δ > 0 there is a pair
x, x_0 ∈ X such that p_x(x, x_0) < δ but p_y(f(x), f(x_0)) ≥ 1. Let x_0(t) = 0 for
all t ∈ [a, b], and let x(t) = α sin ωt, α > 0, ω > 0. Then p(x_0, x) ≤ α.
Now if y_0 = f(x_0) and y = f(x), then y_0(t) = 0 for all t ∈ [a, b] and y(t)
= αω cos ωt. Hence, p(y_0, y) = αω, provided that ω is sufficiently large, i.e.,
so that cos ωt = ±1 for some t ∈ [a, b]. Now no matter what value of δ
we choose, there is an x ∈ X such that p(x, x_0) < δ if we pick α < δ. However,
p(y, y_0) = 1 if we let ω = 1/α. Therefore, f is not continuous on X. ■
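The pair used in this example can also be exhibited numerically. In the sketch below (not from the text; NumPy and the interval [0, 1] are hypothetical illustration choices), p_∞(x, x_0) shrinks with α while p_∞(f(x), f(x_0)) stays at αω = 1.

    import numpy as np

    t = np.linspace(0.0, 1.0, 200001)             # [a, b] = [0, 1]
    for alpha in [0.1, 0.01, 0.001]:
        omega = 1.0 / alpha
        x = alpha * np.sin(omega * t)             # p_inf(x, 0) <= alpha
        y = alpha * omega * np.cos(omega * t)     # f(x) = dx/dt, exactly
        print(f"alpha={alpha}: p_inf(x, 0)={np.max(np.abs(x)):.4f}, "
              f"p_inf(f(x), 0)={np.max(np.abs(y)):.4f}")
    # p_inf(x, 0) -> 0 while p_inf(f(x), 0) remains 1, so f is not continuous.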

We can interpret the notion of continuity of functions in the following
equivalent way.

5.7.5. Theorem. Let {X; p_x} and {Y; p_y} be metric spaces, and let f:
X → Y. Then f is continuous at a point x_0 ∈ X if and only if for every
ε > 0, there exists a δ > 0 such that

    f(S(x_0; δ)) ⊂ S(f(x_0); ε).
5.7.6. Exercise. Prove Theorem 5.7.5.

Intuitively, Theorem 5.7.5 tells us that f is continuous at x_0 if f(x) is arbitrarily
close to f(x_0) when x is sufficiently close to x_0. The concept of continuity
is depicted in Figure H for the case where {X; p_x} = {Y; p_y} = R_2^2.
5.7.7. Figure H. Illustration of continuity.

As we did in Chapter 1, we distinguish between mappings on metric
spaces which are injective, surjective, or bijective.

It turns out that the concepts of continuity and convergence of sequences
are related. Our next result yields a connection between convergence and
continuity.

5.7.8. Theorem. Let {X; p_x} and {Y; p_y} be two metric spaces. A function
f: X → Y is continuous at a point x_0 ∈ X if and only if for every sequence
{x_n} of points in X which converges to a point x_0 the corresponding sequence
{f(x_n)} converges to the point f(x_0) in Y; i.e.,

    lim f(x_n) = f(lim x_n) = f(x_0)

whenever lim x_n = x_0.

Proof. Assume that f is continuous at a point x_0 ∈ X, and let {x_n} be a
sequence such that lim x_n = x_0. Then for every ε > 0 there is a δ > 0 such
that p_y(f(x), f(x_0)) < ε whenever p_x(x, x_0) < δ. Also, there is an N such
that p_x(x_n, x_0) < δ whenever n > N. Hence, p_y(f(x_n), f(x_0)) < ε whenever
n > N. Thus, if f is continuous at x_0 and if lim x_n = x_0, then lim f(x_n)
= f(x_0).

Conversely, assume that f(x_n) → f(x_0) whenever x_n → x_0. For purposes
of contradiction, assume that f is not continuous at x_0. Then there exists
an ε > 0 such that for each δ > 0 there is an x with the property that
p_x(x, x_0) < δ and p_y(f(x), f(x_0)) ≥ ε. This implies that for each positive
integer n there is an x_n such that p_x(x_n, x_0) < 1/n and p_y(f(x_n), f(x_0)) ≥ ε
for all n; i.e., x_n → x_0 but {f(x_n)} does not converge to f(x_0). But we assumed
that f(x_n) → f(x_0) whenever x_n → x_0. Hence, we have arrived at a contradiction,
and f must be continuous at x_0. This concludes the proof of our
theorem. ■

Continuous mappings on metric spaces possess the following important
properties.

5.7.9. Theorem. Let {X; p_x} and {Y; p_y} be two metric spaces, and let f
be a mapping of X into Y. Then

(i) f is continuous on X if and only if the inverse image of each open
subset of {Y; p_y} is open in {X; p_x}; and
(ii) f is continuous on X if and only if the inverse image of each closed
subset of {Y; p_y} is closed in {X; p_x}.

Proof. Let f be continuous on X, and let V ≠ ∅ be an open subset of
{Y; p_y}. Let U = f⁻¹(V); if U is empty it is open, so assume U ≠ ∅. Now let
x ∈ U. Then there exists a unique y = f(x) ∈ V. Since V is open, there is a
sphere S(y; ε) which is entirely contained in V. Since f is continuous at x,
there is a sphere S(x; δ) such that its image f(S(x; δ)) is entirely contained
in S(y; ε) and therefore in V. But from this it follows that S(x; δ) ⊂ U.
Hence, every x ∈ U is the center of a sphere which is contained in U.
Therefore, U is open.

Conversely, assume that the inverse image of each non-empty open
subset of Y is open. For arbitrary x ∈ X we have y = f(x). Since S(y; ε)
⊂ Y is open, the set f⁻¹(S(y; ε)) is open for every ε > 0, and x ∈ f⁻¹(S(y; ε)).
Hence, there is a sphere S(x; δ) such that S(x; δ) ⊂ f⁻¹(S(y; ε)). From
this it follows that for every ε > 0 there is a δ > 0 such that f(S(x; δ))
⊂ S(y; ε). Therefore, f is continuous at x. But x ∈ X was arbitrarily chosen.
Hence, f is continuous on X. This concludes the proof of part (i).

To prove part (ii) we utilize part (i) and take complements of open sets. ■
The reader is cautioned that the image of an open subset of X under
a continuous mapping f: X → Y is not necessarily an open subset of Y.
For example, let f: R → R be defined by f(x) = x² for every x ∈ R. Clearly,
f is continuous on R. Yet the image of the open interval (−1, 1) is the interval
[0, 1). But the interval [0, 1) is not open.

We leave the proof of the next result as an exercise to the reader.

5.7.10. Theorem. Let {X; p_x}, {Y; p_y}, and {Z; p_z} be metric spaces, let f
be a mapping of X into Y, and let g be a mapping of Y into Z. If f is continuous
on X and g is continuous on Y, then the composite mapping h = g ∘ f
of X into Z is continuous on X.

5.7.11. Exercise. Prove Theorem 5.7.10.


F o r continuous mappings on compact spaces we state and prove the
following result.
312 Chapter 5 I Metric Spaces

5.7.12. Theorem. Let {X; p_x} and {Y; p_y} be two metric spaces, and let
f: X → Y be continuous on X.

(i) If {X; p_x} is compact, then f(X) is a compact subset of {Y; p_y}.
(ii) If U is a compact subset of the metric space {X; p_x}, then f(U) is
a compact subset of the metric space {Y; p_y}.
(iii) If {X; p_x} is compact and if U is a closed subset of X, then f(U)
is a closed subset of {Y; p_y}.
(iv) If {X; p_x} is compact, then f is uniformly continuous on X.

Proof. To prove part (i), let {y_n} be a sequence in f(X). Then there are
points {x_n} in X such that y_n = f(x_n). Since {X; p_x} is compact we can find
a subsequence {x_{n_k}} of {x_n} which converges to a point in X; i.e., x_{n_k} → x.
In view of Theorem 5.7.8 we have, since f is continuous at x, f(x_{n_k}) → f(x)
∈ f(X). From this it follows that the sequence {y_n} has a convergent subsequence
and f(X) is compact.

To prove part (ii), let U be a compact subset of X. Then {U; p_x} is a
compact metric space. In view of part (i) it now follows that f(U) is also a
compact subset of the metric space {Y; p_y}.

To prove part (iii), we first observe that a closed subset U of a compact
metric space {X; p_x} is itself compact and {U; p_x} is itself a compact metric
space. In view of part (ii), f(U) is a compact subset of the metric space
{Y; p_y} and as such is bounded and closed.

To prove part (iv), let ε > 0. For every x ∈ X there is some positive
number η(x) such that f(S(x; 2η(x))) ⊂ S(f(x); ε/2). Now the family
{S(x; η(x)): x ∈ X} is an open covering of X. Since X is compact, there is
a finite set, say F ⊂ X, such that {S(x; η(x)): x ∈ F} is a covering of X.
Now let

    δ = min {η(x): x ∈ F}.

Since F is a finite set, δ is some positive number. Now let x, y ∈ X be such
that p_x(x, y) < δ. Choose z ∈ F such that x ∈ S(z; η(z)). Since δ ≤ η(z),
y ∈ S(z; 2η(z)). Since f(S(z; 2η(z))) ⊂ S(f(z); ε/2), it follows that f(x) and
f(y) are in S(f(z); ε/2). Hence, p_y(f(x), f(y)) < ε. Since δ does not depend
on x ∈ X, f is uniformly continuous on X. This completes the proof of the
theorem. ■

Let us next consider some additional generalizations of concepts encountered
in the calculus.

5.7.13. Definition. Let {X; p_x} and {Y; p_y} be metric spaces, and let {f_n}
be a sequence of functions from X into Y. If {f_n(x)} converges at each x ∈ X,
then we say that {f_n} is pointwise convergent. In this case we write lim_n f_n = f,
where f is defined for every x ∈ X.

Equivalently, we say that the sequence {f_n} is pointwise convergent to
a function f if for every ε > 0 and for every x ∈ X there is an integer N
= N(ε, x) such that

    p_y(f_n(x), f(x)) < ε

whenever n > N(ε, x). In general, N(ε, x) is not necessarily bounded. However,
if N(ε, x) is bounded for all x ∈ X, then we say that the sequence
{f_n} converges to f uniformly on X. Let M(ε) = sup_{x∈X} N(ε, x) < ∞. Equivalently,
we say that the sequence {f_n} converges uniformly to f on X if for every
ε > 0 there is an M(ε) such that

    p_y(f_n(x), f(x)) < ε

whenever n > M(ε), for all x ∈ X.


In the next result a connection between uniform convergence of functions
and continuity is established. (We used a special case of this result in the
proof of Example 5.5.28.)

5.7.14. Theorem. Let {X; p_x} and {Y; p_y} be two metric spaces, and let
{f_n} be a sequence of functions from X into Y such that f_n is continuous on
X for each n. If the sequence {f_n} converges uniformly to f on X, then f is
continuous on X.

Proof. Assume that the sequence {f_n} converges uniformly to f on X. Then
for every ε > 0 there is an N such that p_y(f_n(x), f(x)) < ε whenever n > N,
for all x ∈ X. If M > N is a fixed integer, then f_M is continuous on X. Letting
x_0 ∈ X be fixed, we can find a δ > 0 such that p_y(f_M(x), f_M(x_0)) < ε whenever
p_x(x, x_0) < δ. Therefore, we have

    p_y(f(x), f(x_0)) ≤ p_y(f(x), f_M(x)) + p_y(f_M(x), f_M(x_0)) + p_y(f_M(x_0), f(x_0)) < 3ε

whenever p_x(x, x_0) < δ. From this it follows that f is continuous at x_0.
Since x_0 was arbitrarily chosen, f is continuous at all x ∈ X. This proves the
theorem. ■

The reader will recognize in the last result of the present section several
generalizations from the calculus to real-valued functions defined on metric
spaces.

5.7.15. Theorem. Let {X; p_x} be a metric space, and let {R; p} denote the
real line R with the usual metric. Let f: X → R, and let U ⊂ X. If f is continuous
on X and if U is a compact subset of {X; p_x}, then

(i) f is uniformly continuous on U;
(ii) f is bounded on U; and
(iii) if U ≠ ∅, f attains its infimum and supremum on U; i.e., there
exist x_0, x_1 ∈ U such that f(x_0) = inf {f(x): x ∈ U} and f(x_1) = sup
{f(x): x ∈ U}.

Proof. Part (i) follows from part (iv) of Theorem 5.7.12. Since U is a compact
subset of X, it follows that f(U) is a compact subset of R. Thus, f(U) is
bounded and closed. From this it follows that f is bounded. To prove part
(iii), note that if U is a non-empty compact subset of {X; p_x}, then f(U) is
a non-empty compact subset of R. This implies that f attains its infimum
and supremum on U. ■

5.8. SOME IMPORTANT RESULTS IN APPLICATIONS

In this section we present two results which are used widely in applica-
tions. The first of these is called the fixed point principle while the second is
known as the Ascoli-Arzela theorem. Both of these results are widely utilized,
for example, in establishing existence and uniqueness of solutions of various
types of equations (ordinary differential equations, integral equations,
algebraic equations, functional differential equations, and the like).
We begin by considering a special class of continuous mappings on metric
spaces, so-called contraction mappings.

5.8.1. Definition. Let {X; p} be a metric space and let f: X → X. The
function f is said to be a contraction mapping if there exists a real number
c such that 0 < c < 1 and

    p(f(x), f(y)) ≤ c p(x, y)    (5.8.2)

for all x, y ∈ X.

The reader can readily verify the following result.

5.8.3. Theorem. Every contraction mapping is uniformly continuous on
X.
5.8.4. Exercise. Prove Theorem 5.8.3.

The following result is known as the fixed point principle or the principle
of contraction mappings.

5.8.5. Theorem. Let {X; p} be a complete metric space, and let f be a
contraction mapping of X into X. Then

(i) there exists a unique point x_0 ∈ X such that

    f(x_0) = x_0,    (5.8.6)

and

(ii) for any x_1 ∈ X, the sequence {x_n} in X defined by

    x_{n+1} = f(x_n),  n = 1, 2, …    (5.8.7)

converges to the unique element x_0 given in (5.8.6).
The unique point x_0 satisfying Eq. (5.8.6) is called a fixed point of f. In
this case we say that x_0 is obtained by the method of successive approximations.

Proof. We first show that if there is an x_0 ∈ X satisfying (5.8.6), then it
must be unique. Suppose that x_0 and y_0 satisfy (5.8.6). Then by inequality
(5.8.2) we have p(x_0, y_0) ≤ c p(x_0, y_0). Since 0 < c < 1, it follows that
p(x_0, y_0) = 0 and therefore x_0 = y_0.

Now let x_1 be any point in X. We want to show that the sequence {x_n}
generated by Eq. (5.8.7) is a Cauchy sequence. For any n > 1 we have
p(x_{n+1}, x_n) ≤ c p(x_n, x_{n−1}). By induction we see that p(x_{n+1}, x_n) ≤ c^{n−1} p(x_2, x_1)
for n = 1, 2, …. Thus, for any m > n we have

    p(x_m, x_n) ≤ Σ_{k=n}^{m−1} p(x_{k+1}, x_k) ≤ c^{n−1} p(x_2, x_1)[1 + c + ⋯ + c^{m−1−n}]
               ≤ c^{n−1} p(x_2, x_1)/(1 − c).

Since 0 < c < 1, the right-hand side of the above inequality can be made
arbitrarily small by choosing n sufficiently large. Thus, {x_n} is a Cauchy
sequence.

Next, since {X; p} is complete, it follows that {x_n} converges; i.e., lim_n x_n
exists. Let lim_n x_n = x. Now since f is continuous on X, we have

    lim_n f(x_n) = f(lim_n x_n).

But f(lim_n x_n) = f(x) and lim_n f(x_n) = lim_n x_{n+1} = x. Thus, f(x) = x, and we
have proven the existence of a fixed point of f. Since we have already proven
uniqueness, the proof is complete. ■
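The estimate p(x_m, x_n) ≤ c^{n−1} p(x_2, x_1)/(1 − c) obtained in the proof also furnishes a stopping rule for the method of successive approximations. The following Python sketch is an illustration, not code from the text; the function f and the constant c in the usage example are hypothetical choices. It iterates until this a priori bound guarantees the requested accuracy.

    def fixed_point(f, x1, c, tol=1e-10, max_iter=10000):
        # Successive approximations x_{n+1} = f(x_n) for a contraction f
        # with constant 0 < c < 1 (Theorem 5.8.5).  Stop when the a priori
        # bound c^(n-1) * p(x2, x1) / (1 - c) drops below tol.
        x2 = f(x1)
        gap = abs(x2 - x1)          # p(x2, x1) on the real line
        x, n = x2, 1
        while c ** (n - 1) * gap / (1.0 - c) > tol and n < max_iter:
            x = f(x)
            n += 1
        return x, n

    # f(x) = (x + 2/x)/2 maps [1, 2] into itself with |f'(x)| <= 1/2 there.
    root, steps = fixed_point(lambda x: 0.5 * (x + 2.0 / x), x1=1.0, c=0.5)
    print(root, steps)              # converges to the fixed point sqrt(2)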

It may turn out that the composite function f^(n) ≜ f ∘ f ∘ ⋯ ∘ f is a
contraction mapping, whereas f is not. The following result shows that such
a mapping still has a unique fixed point.

5.8.8. Corollary. Let {X; p} be a complete metric space, and let f: X → X
be continuous on X. If the composite function f^(n) = f ∘ f ∘ ⋯ ∘ f is a
contraction mapping, then there is a unique point x_0 ∈ X such that

    f(x_0) = x_0.    (5.8.9)

Moreover, the fixed point can be determined by the method of successive
approximations (see Theorem 5.8.5).

5.8.10. Exercise. Prove Corollary 5.8.8.

We will consider several applications of the above results in the last
section of this chapter.

Before we can consider the Arzela-Ascoli theorem, we need to introduce
the following concept.

5.8.11. Definition. Let C[a, b] denote the set of all continuous real-valued
functions defined on the interval [a, b] of the real line R. A subset Y of
C[a, b] is said to be equicontinuous on [a, b] if for every ε > 0 there exists a
δ > 0 such that |x(t) − x(t_0)| < ε for all x ∈ Y and all t, t_0 such that
|t − t_0| < δ.

Note that in this definition δ depends only on ε and not on x or t and t_0.
We now state and prove the Arzela-Ascoli theorem.

5.8.12. Theorem. Let {C[a, b]; p_∞} be the metric space defined in Example
5.3.14. Let Y be a bounded subset of C[a, b]. If Y is equicontinuous on [a, b],
then Y is relatively compact in C[a, b].

Proof. For each positive integer k, let us divide the interval [a, b] into k equal
parts by the set of points V_k = {t_{0k}, t_{1k}, …, t_{kk}} ⊂ [a, b]. That is, a = t_{0k}
< t_{1k} < ⋯ < t_{kk} = b, where t_{ik} = a + (i/k)(b − a), i = 0, 1, …, k, and
[a, b] = ∪_{i=1}^{k} [t_{(i−1)k}, t_{ik}] for all k = 1, 2, …. Since each V_k is a finite set,
∪_{k=1}^{∞} V_k is a countable set. For convenience of notation, let us denote this set
by {τ_1, τ_2, …}. The ordering of this set is immaterial. Next, since Y is
bounded, there is a γ > 0 such that p_∞(x, y) ≤ γ for all x, y ∈ Y. Let x_0 be
held fixed in Y, and let y ∈ Y be arbitrary. Let 0 ∈ C[a, b] be the function
which is zero for all t ∈ [a, b]. Then p_∞(y, 0) ≤ p_∞(y, x_0) + p_∞(x_0, 0). Hence,
p_∞(y, 0) ≤ M for all y ∈ Y, where M = γ + p_∞(x_0, 0). This implies that
sup_{t∈[a,b]} |y(t)| ≤ M for all y ∈ Y.

Now let {y_n} be an arbitrary sequence in Y. We want to show that {y_n}
contains a convergent subsequence. Since |y_n(τ_1)| ≤ M for all n, the sequence
of real numbers {y_n(τ_1)} contains a convergent subsequence which we shall
call {y_{1n}(τ_1)}. Again, since |y_{1n}(τ_2)| ≤ M for all n, the sequence of real
numbers {y_{1n}(τ_2)} contains a convergent subsequence which we shall call
{y_{2n}(τ_2)}. We see that {y_{2n}(τ_1)} is a subsequence of {y_{1n}(τ_1)}, and hence
it is convergent. Proceeding in a similar fashion, we obtain sequences {y_{1n}},
{y_{2n}}, … such that {y_{kn}} is a subsequence of {y_{jn}} for all k ≥ j. Furthermore,
each sequence is such that lim_n y_{kn}(τ_i) exists for each i such that 1 ≤ i ≤ k.
Now let {x_n} be the sequence {y_{nn}}. Then {x_n} is a subsequence of {y_n},
and lim_n x_n(τ_i) exists for i = 1, 2, ….

We now wish to show that {x_n} is a Cauchy sequence in {C[a, b]; p_∞}.
Let ε > 0 be given. Since Y is equicontinuous on [a, b], we can find a positive
integer k such that |x_n(t) − x_n(t′)| < ε/3 for every n whenever |t − t′| < 1/k.
Since {x_n(τ_i)} is a convergent sequence of real numbers, there exists a positive
integer N such that |x_n(τ_i) − x_m(τ_i)| < ε/3 whenever m > N and n > N, for all
τ_i ∈ V_k. Now, if t ∈ [a, b], there is some τ_i ∈ V_k such that |t − τ_i| < 1/k.
Hence, for all m > N and n > N, we have

    |x_n(t) − x_m(t)| ≤ |x_n(t) − x_n(τ_i)| + |x_n(τ_i) − x_m(τ_i)| + |x_m(τ_i) − x_m(t)| < ε.

This implies that p_∞(x_m, x_n) ≤ ε for all m, n > N. Therefore, {x_n} is a Cauchy
sequence in C[a, b]. Since {C[a, b]; p_∞} is a complete metric space (see Example
5.5.28), {x_n} converges to some point in C[a, b]. This implies that {y_n}
has a subsequence which converges to a point in C[a, b], and so, by Theorem
5.6.31, Y is relatively compact in C[a, b]. This completes the proof of the
theorem. ■

Our next result follows directly from Theorem 5.8.12. It is sometimes
referred to as Ascoli's lemma.

5.8.13. Corollary. Let {φ_n} be a sequence of functions in {C[a, b]; p_∞}.
If {φ_n} is equicontinuous on [a, b] and uniformly bounded on [a, b] (i.e., there
exists an M > 0 such that sup_{a≤t≤b} |φ_n(t)| ≤ M for all n), then there exists a
φ ∈ C[a, b] and a subsequence {φ_{n_k}} of {φ_n} such that {φ_{n_k}} converges to φ
uniformly on [a, b].

5.8.14. Exercise. Prove Corollary 5.8.13.

We close the present section with the following converse to Theorem
5.8.12.

5.8.15. Theorem. Let Y be a subset of C[a, b] which is relatively compact
in the metric space {C[a, b]; p_∞}. Then Y is a bounded set and is equicontinuous
on [a, b].

5.8.16. Exercise. Prove Theorem 5.8.15.

5.9. EQUIVALENT AND HOMEOMORPHIC METRIC SPACES.
TOPOLOGICAL SPACES

It is possible that seemingly different metric spaces may exhibit properties
which are very similar with regard to such concepts as open sets, limits
of sequences, and continuity of functions. For example, for each p, 1 ≤ p
≤ ∞, the spaces R_p^n (see Examples 5.3.1, 5.3.3) are different metric spaces.
However, it turns out that the family of all open sets is the same in all of
these metric spaces for 1 ≤ p ≤ ∞ (e.g., the family of open sets in R_1^n is the
same as the family of open sets in R_2^n, which is the same as the family of open
sets in R_3^n, etc.). Furthermore, metric spaces which are not even defined on
the same underlying set (e.g., the metric spaces {X; p_x} and {Y; p_y}, where
X ≠ Y) may have many similar properties of the type mentioned above.

We begin with equivalence of metric spaces defined on the same underlying
set.

5.9.1. Definition. Let {X; p_1} and {X; p_2} be two metric spaces defined on
the same underlying set X. Let 𝒯_1 and 𝒯_2 be the topology of X determined
by p_1 and p_2, respectively. Then the metrics p_1 and p_2 are said to be equivalent
metrics if 𝒯_1 = 𝒯_2.

Throughout the present section we use the notation

    f: {X; p_1} → {Y; p_2}

to indicate a mapping from X into Y, where the metric on X is p_1 and the
metric on Y is p_2. This distinction becomes important in the case where
X = Y, i.e., in the case f: {X; p_1} → {X; p_2}.

Let us denote by i the identity mapping from X onto X; i.e., i(x) = x
for all x ∈ X. Clearly, i is a bijective mapping, and the inverse is simply
i itself. However, since the domain and range of i may have different metrics
associated with them, we shall write

    i: {X; p_1} → {X; p_2}

and

    i⁻¹: {X; p_2} → {X; p_1}.

With the foregoing statements in mind, we provide in the following
theorem a number of equivalent statements to characterize equivalent metrics.

5.9.2. Theorem. Let {X; p_1}, {X; p_2}, and {Y; p_3} be metric spaces. Then
the following statements are equivalent:

(i) p_1 and p_2 are equivalent metrics;
(ii) for any mapping f: X → Y, f: {X; p_1} → {Y; p_3} is continuous on X
if and only if f: {X; p_2} → {Y; p_3} is continuous on X;
(iii) the mapping i: {X; p_1} → {X; p_2} is continuous on X, and the
mapping i⁻¹: {X; p_2} → {X; p_1} is continuous on X; and
(iv) for any sequence {x_n} in X, {x_n} converges to a point x in {X; p_1}
if and only if {x_n} converges to x in {X; p_2}.
Proof. To prove this theorem we show that statement (i) implies statement
(ii); that statement (ii) implies statement (iii); that statement (iii) implies
statement (iv); and that statement (iv) implies statement (i).

To show that (i) implies (ii), assume that p_1 and p_2 are equivalent metrics,
and let f be any continuous mapping from {X; p_1} into {Y; p_3}. Let U be any
open set in {Y; p_3}. Since f is continuous, f⁻¹(U) is an open set in {X; p_1}.
Since p_1 and p_2 are equivalent metrics, f⁻¹(U) is also an open set in {X; p_2}.
Hence, the mapping f: {X; p_2} → {Y; p_3} is continuous. The proof of the
converse in statement (ii) is identical.

We now show that (ii) implies (iii). Clearly, the mapping i: {X; p_2} →
{X; p_2} is continuous. Now assume the validity of statement (ii), and let
{Y; p_3} = {X; p_2}. Then i: {X; p_1} → {X; p_2} is continuous. Again, it is clear
that i⁻¹: {X; p_1} → {X; p_1} is continuous. Letting {Y; p_3} = {X; p_1} in
statement (ii), it follows that i⁻¹: {X; p_2} → {X; p_1} is continuous.

Next, we show that (iii) implies (iv). Let i: {X; p_1} → {X; p_2} be continuous,
and let the sequence {x_n} in metric space {X; p_1} converge to x. By
Theorem 5.7.8, lim i(x_n) = i(x); i.e., lim x_n = x in {X; p_2}. The converse is
proven in the same manner.

Finally, we show that (iv) implies (i). Let U be an open set in {X; p_1}.
Then its complement Ũ is closed in {X; p_1}. Now let {x_n} be a sequence in Ũ which converges
to x in {X; p_1}. Then x ∈ Ũ by part (iii) of Theorem 5.5.8. By assumption,
{x_n} converges to x in {X; p_2} also. Furthermore, since x ∈ Ũ, Ũ is closed
in {X; p_2} by part (iii) of Theorem 5.5.8. Hence, U is open in {X; p_2}. Letting
U be an open set in {X; p_2}, by the same reasoning we conclude that U is open
in {X; p_1}. Thus, p_1 and p_2 are equivalent metrics.

This concludes the proof of the theorem. ■

The next result establishes sufficient conditions for two metrics to be
equivalent. These conditions are not necessary, however.

5.9.3. Theorem. Let {X; p_1} and {X; p_2} be two metric spaces. If there
exist two positive real numbers, γ and λ, such that

    γ p_2(x, y) ≤ p_1(x, y) ≤ λ p_2(x, y)

for all x, y ∈ X, then p_1 and p_2 are equivalent metrics.

5.9.4. Exercise. Prove Theorem 5.9.3.

Let us now consider some specific examples of equivalent metric spaces.

5.9.5. Exercise. Let {X; p} be any metric space. For the example of
Exercise 5.1.10 the reader showed that {X; p_1} is a metric space, where

    p_1(x, y) = p(x, y) / [1 + p(x, y)]

for all x, y ∈ X. Show that p and p_1 are equivalent metrics.

5.9.6. Theorem. Let {R^n; p_1} = R_1^n and {R^n; p_2} = R_2^n be the metric spaces
defined in Example 5.3.1, and let {R^n; p_∞} be the metric space defined in
Example 5.3.3. Then

(i) p_∞(x, y) ≤ p_2(x, y) ≤ √n p_∞(x, y) for all x, y ∈ R^n;
(ii) p_∞(x, y) ≤ p_1(x, y) ≤ n p_∞(x, y) for all x, y ∈ R^n; and
(iii) p_1, p_2, and p_∞ are equivalent metrics.

5.9.7. Exercise. Prove Theorem 5.9.6.
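Parts (i) and (ii) of Theorem 5.9.6 are also easy to check numerically. The sketch below is not from the text; NumPy, the dimension n = 7, and random sample points are hypothetical illustration choices.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 7
    for _ in range(1000):
        x, y = rng.normal(size=n), rng.normal(size=n)
        d = np.abs(x - y)
        p1, p2, pinf = d.sum(), np.sqrt((d ** 2).sum()), d.max()
        assert pinf <= p2 <= np.sqrt(n) * pinf + 1e-12   # part (i)
        assert pinf <= p1 <= n * pinf + 1e-12            # part (ii)
    print("inequalities of Theorem 5.9.6 hold on all samples")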

It can be shown that for the metric spaces {R^n; p_p} and {R^n; p_q}, p_p and p_q
are equivalent metrics for any p, q such that 1 ≤ p ≤ ∞, 1 ≤ q ≤ ∞.

In Example 5.1.12 we defined a metric p*, called the usual metric for R*.
Up until now, it has not been apparent that there is any meaningful connection
between p* and the usual metric for R. The following result shows that
when p* is restricted to R, it is equivalent to the usual metric on R.

5.9.8. Theorem. Let {R; p} denote the real line with the usual metric,
and let {R*; p*} denote the extended real line (see Exercise 5.1.12). Consider
{R; p*}, which is a metric subspace of {R*; p*}. Then

(i) for the metric spaces {R; p} and {R; p*}, p and p* are equivalent
metrics;
(ii) if U ⊂ R, then U is open in {R; p} if and only if U is open in
{R*; p*}; and
(iii) if U is open in {R*; p*}, then U ∩ R, U − {+∞}, and U − {−∞}
are open in {R*; p*}.
5.9.9. Exercise. Prove Theorem 5.9.8. (Hint: use part (iii) of Theorem
5.9.2 to prove part (i) of this theorem.)

Our next example shows that i⁻¹ need not be continuous, even though
i is continuous.

5.9.10. Example. Let X be any non-empty set, and let p_1 be the discrete
metric on X (see Example 5.1.7). In Exercise 5.4.26 the reader was asked
to show that every subset of X is open in {X; p_1}. Now let {X; p} be an
arbitrary metric space with the same underlying set X. Clearly, i: {X; p_1}
→ {X; p} is continuous. However, i⁻¹: {X; p} → {X; p_1} is not continuous
unless every subset of {X; p} is open. Since this is usually not true, i⁻¹ need
not be continuous. ■

Next, we introduce the concepts of homeomorphism and homeomorphic
metric spaces.

5.9.11. Definition. Two metric spaces {X; p_x} and {Y; p_y} are said to be
homeomorphic if there exists a mapping φ: {X; p_x} → {Y; p_y} such that (i)
φ is a bijective mapping of X onto Y, and (ii) E ⊂ X is open in {X; p_x} if
and only if φ(E) is open in {Y; p_y}. The mapping φ is called a homeomorphism.

We immediately have the following generalization of Theorem 5.9.2.

5.9.12. Theorem. Let {X; p_x}, {Y; p_y}, and {Z; p_z} be metric spaces, and
let φ be a bijective mapping of {X; p_x} onto {Y; p_y}. Then the following
statements are equivalent:

(i) φ is a homeomorphism;
(ii) for any mapping f: X → Z, f: {X; p_x} → {Z; p_z} is continuous on
X if and only if f ∘ φ⁻¹: {Y; p_y} → {Z; p_z} is continuous on Y;
(iii) φ: {X; p_x} → {Y; p_y} is continuous and φ⁻¹: {Y; p_y} → {X; p_x} is
continuous; and
(iv) for any sequence {x_n} in X, {x_n} converges to a point x in {X; p_x}
if and only if {φ(x_n)} converges to φ(x) in {Y; p_y}.

5.9.13. Exercise. Prove Theorem 5.9.12.

The connection between homeomorphic metric spaces defined on the same
underlying set and equivalent metrics is provided by the next result.

5.9.14. Theorem. Let {X; p_1} and {X; p_2} be two metric spaces with the
same underlying set X. Then p_1 and p_2 are equivalent if and only if the identity
mapping i: {X; p_1} → {X; p_2} is a homeomorphism.

5.9.15. Exercise. Prove Theorem 5.9.14.

It is possible for {X; p_1} and {X; p_2} to be homeomorphic, even though
p_1 and p_2 may not be equivalent.

There are important cases for which the metric relations between the
elements of two distinct metric spaces are the same. In such cases only the
nature of the elements of the metric spaces differs. Since this difference may
be of no importance, such spaces may often be viewed as being essentially
identical. Such metric spaces are said to be isometric. Specifically, we have:

5.9.16. Definition. Let {X; p_x} and {Y; p_y} be two metric spaces, and let
φ: {X; p_x} → {Y; p_y} be a bijective mapping of X onto Y. The mapping φ is
said to be an isometry if

    p_x(x, y) = p_y(φ(x), φ(y))

for all x, y ∈ X. If such an isometry exists, then the metric spaces {X; p_x}
and {Y; p_y} are said to be isometric.

5.9.17. Theorem. Let φ be an isometry. Then φ is a homeomorphism.

5.9.18. Exercise. Prove Theorem 5.9.17.

We close the present section by introducing the concept of topological
space. It turns out that metric spaces are special cases of such spaces.

In Theorem 5.4.15 we showed that, in the case of a metric space {X; p},
(i) the empty set ∅ and the entire space X are open; (ii) the union of an
arbitrary collection of open sets is open; and (iii) the intersection of a finite
collection of open sets is open. Examining the various proofs of the present
chapter, we note that a great deal of the development of metric spaces is not
a consequence of the metric but, rather, depends only on the properties of
certain open and closed sets. Taking the notion of open set as basic (instead
of the concept of distance, as in the case of metric spaces) and taking the
aforementioned properties of open sets as postulates, we can form a mathematical
structure which is much more general than the metric space.

5.9.19. Definition. Let X be a non-void set of points, and let 𝒯 be a family
of subsets of X which we will call open. We call the pair {X; 𝒯} a topological space
if the following hold:

(i) X ∈ 𝒯 and ∅ ∈ 𝒯;
(ii) if U_1 ∈ 𝒯 and U_2 ∈ 𝒯, then U_1 ∩ U_2 ∈ 𝒯; and
(iii) for any index set A, if U_α ∈ 𝒯 for all α ∈ A, then ∪_{α∈A} U_α ∈ 𝒯.

The family 𝒯 is called the topology for the set X. The complement of an
open set U ∈ 𝒯 with respect to X is called a closed set.

The reader can readily verify the following results:

5.9.20. Theorem. Let {X; 𝒯} be a topological space. Then

(i) ∅ is closed;
(ii) X is closed;
(iii) the union of a finite number of closed sets is closed; and
(iv) the intersection of an arbitrary collection of closed sets is closed.

5.9.21. Exercise. Prove Theorem 5.9.20.

We close the present section by citing several specific examples of
topological spaces.

5.9.22. Example. In view of Theorem 5.4.15, every metric space is a
topological space. ■

5.9.23. Example. Let X = {x, y}, and let the open sets in X be the void set
∅, the set X itself, and the set {x}. If 𝒯 is defined in this way, then {X; 𝒯}
is a topological space. In this case the closed sets are ∅, X, and {y}. ■

5.9.24. Example. Although many fundamental concepts carry over from
metric spaces to topological spaces, it turns out that the concept of topological
space is often too general. Therefore, it is convenient to suppose that certain
topological spaces satisfy some additional conditions which are also true
in metric spaces. These conditions, called the separation axioms, are imposed
on topological spaces {X; 𝒯} to form the following important special cases:

T_1-spaces: A topological space {X; 𝒯} is called a T_1-space if every set
consisting of a single point is closed. Equivalently, a space is called a T_1-space
provided that if x and y are distinct points, there is an open set containing
y but not x. Clearly, metric spaces satisfy the T_1-axiom.

T_2-spaces: A topological space {X; 𝒯} is called a T_2-space if for all
distinct points x, y ∈ X there are disjoint open sets U_x and U_y such that
x ∈ U_x and y ∈ U_y. T_2-spaces are also called Hausdorff spaces. All metric
spaces are Hausdorff spaces. Also, all T_2-spaces are T_1-spaces. However,
there are T_1-spaces which do not satisfy the T_2-separation axiom.

T_3-spaces: A topological space {X; 𝒯} is called a T_3-space if (i) it is a
T_1-space, and (ii) given a closed set Y and a point x not in Y, there are
disjoint open sets U_1 and U_2 such that x ∈ U_1 and Y ⊂ U_2. T_3-spaces are
also called regular topological spaces. All metric spaces are T_3-spaces. All
T_3-spaces are T_2-spaces; however, not all T_2-spaces are T_3-spaces.

T_4-spaces: A topological space {X; 𝒯} is called a T_4-space if (i) it is a
T_1-space, and (ii) for each pair of disjoint closed sets Y_1, Y_2 in X there
exists a pair of disjoint open sets U_1, U_2 such that Y_1 ⊂ U_1 and Y_2 ⊂ U_2.
T_4-spaces are also called normal topological spaces. Such spaces are clearly
T_3-spaces. However, there are T_3-spaces which are not normal topological
spaces. On the other hand, all metric spaces are T_4-spaces. ■

5.10. APPLICATIONS

The present section consists of two parts (subsections A and B).
In the first part we make extensive use of the contraction mapping principle
to establish existence and uniqueness results for various types of equations.
This part consists essentially of some specific examples.

In the second part, we continue the discussion of Section 4.11, dealing
with ordinary differential equations. Specifically, we will apply Ascoli's
lemma, and we will answer the questions raised at the end of subsection 4.11A.

A. Applications of the Contraction Mapping Principle

In our first example we consider a scalar algebraic equation which may be
linear or nonlinear.

5.10.1. Example. Consider the equation

    x = f(x),    (5.10.2)

where f: [a, b] → [a, b] and where [a, b] is a closed interval of R. Let L > 0,
and assume that f satisfies the condition

    |f(x_2) − f(x_1)| ≤ L |x_2 − x_1|    (5.10.3)

for all x_1, x_2 ∈ [a, b]. In this case f is said to satisfy a Lipschitz condition,
and L is called a Lipschitz constant.

Now consider the complete metric space {R; p}, where p denotes the usual
metric on the real line. Then {[a, b]; p} is a complete metric subspace of {R; p}
(see Theorem 5.5.33). If in (5.10.3) we assume that L < 1, then f is clearly a
contraction mapping, and Theorem 5.8.5 applies. It follows that if L < 1,
then Eq. (5.10.2) possesses a unique solution. Specifically, if x_0 ∈ [a, b], then
the sequence {x_n}, n = 1, 2, …, determined by x_n = f(x_{n−1}) converges to
the unique solution of Eq. (5.10.2).

Note that if |df(x)/dx| = |f′(x)| ≤ c < 1 on the interval [a, b] (in this
case f′(a) denotes the right-hand derivative of f at a, and f′(b) denotes the
left-hand derivative of f at b), then f is clearly a contraction.

In Figures J and K below, the applicability of the contraction mapping
principle is demonstrated pictorially. As indicated, the sequence {x_n} determined
by successive approximations converges to the fixed point x. ■

5.10.4. Figure J. Successive approximations (convergent case).

5.10.5. Figure K. Successive approximations (convergent case).


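As a concrete illustration of Example 5.10.1 (not from the text), take f(x) = cos x on [a, b] = [0, 1]: f maps [0, 1] into [cos 1, 1] ⊂ [0, 1] and |f′(x)| = |sin x| ≤ sin 1 < 1 there, so f is a contraction with Lipschitz constant L = sin 1. The Python sketch below uses the a posteriori contraction estimate |x* − x_n| ≤ L/(1 − L) |x_n − x_{n−1}| as a stopping rule.

    import math

    def solve_scalar(f, x0, L, tol=1e-12):
        # Successive approximations x_n = f(x_{n-1}) for Eq. (5.10.2),
        # assuming f satisfies (5.10.3) with Lipschitz constant L < 1.
        x = x0
        while True:
            x_next = f(x)
            if L / (1.0 - L) * abs(x_next - x) < tol:
                return x_next
            x = x_next

    x_star = solve_scalar(math.cos, x0=0.5, L=math.sin(1.0))
    print(x_star, abs(math.cos(x_star) - x_star))   # root of x = cos x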

In our next example we consider a system of linear equations.

5.10.6. Example. Consider the system of n linear equations

    ξ_i = Σ_{j=1}^{n} a_{ij} ξ_j + β_i,  i = 1, …, n.    (5.10.7)

Assume that x = (ξ_1, …, ξ_n) ∈ R^n, b = (β_1, …, β_n) ∈ R^n, and a_{ij} ∈ R.
Here the constants a_{ij}, β_i are known and the ξ_i are unknown. In the following
we use the contraction mapping principle to determine conditions for the
existence and uniqueness of solutions of Eq. (5.10.7). In doing so we consider
different metric spaces. In all cases we let

    y = f(x)

denote the mapping determined by the system of linear equations

    η_i = Σ_{j=1}^{n} a_{ij} ξ_j + β_i,  i = 1, …, n,

where y = (η_1, …, η_n) ∈ R^n.

First we consider the complete space {R^n; p_1} = R_1^n. Let y′ = f(x′),
y″ = f(x″), x′ = (ξ′_1, …, ξ′_n), and x″ = (ξ″_1, …, ξ″_n). We have

    p_1(y′, y″) = p_1(f(x′), f(x″)) = Σ_{i=1}^{n} |Σ_{j=1}^{n} a_{ij} ξ′_j + β_i − Σ_{j=1}^{n} a_{ij} ξ″_j − β_i|
                = Σ_{i=1}^{n} |Σ_{j=1}^{n} a_{ij}(ξ′_j − ξ″_j)| ≤ Σ_{i=1}^{n} Σ_{j=1}^{n} |a_{ij}| |ξ′_j − ξ″_j|
                ≤ max_j {Σ_{i=1}^{n} |a_{ij}|} p_1(x′, x″),

where in the preceding the Hölder inequality for finite sums was used (see
Theorem 5.2.1). Clearly, f is a contraction if the inequality

    Σ_{i=1}^{n} |a_{ij}| ≤ k < 1    (5.10.8)

holds for all j. Thus, Eq. (5.10.7) possesses a unique solution if (5.10.8) holds for all j.

Next, we consider the complete space {R^n; p_2} = R_2^n. We have

    [p_2(y′, y″)]² = [p_2(f(x′), f(x″))]² = Σ_{i=1}^{n} {Σ_{j=1}^{n} a_{ij} ξ′_j + β_i − Σ_{j=1}^{n} a_{ij} ξ″_j − β_i}²
                  = Σ_{i=1}^{n} {Σ_{j=1}^{n} a_{ij}(ξ′_j − ξ″_j)}² ≤ {Σ_{i=1}^{n} Σ_{j=1}^{n} a_{ij}²}[p_2(x′, x″)]²,

where, in the preceding, the Schwarz inequality for finite sums was employed
(see Theorem 5.2.1). It follows that f is a contraction, provided that the
inequality

    Σ_{i=1}^{n} Σ_{j=1}^{n} a_{ij}² ≤ k < 1    (5.10.9)

holds. Therefore, Eq. (5.10.7) possesses a unique solution if (5.10.9) is
satisfied.

Lastly, let us consider the complete metric space {R^n; p_∞} = R_∞^n. We have

    p_∞(y′, y″) = p_∞(f(x′), f(x″)) = max_i |Σ_{j=1}^{n} a_{ij}(ξ′_j − ξ″_j)|
                ≤ {max_i Σ_{j=1}^{n} |a_{ij}|} p_∞(x′, x″).

Thus, f is a contraction if

    max_i Σ_{j=1}^{n} |a_{ij}| ≤ k < 1.    (5.10.10)

Hence, if (5.10.10) holds, then Eq. (5.10.7) has a unique solution.

In summary, if any one of the conditions (5.10.8), (5.10.9), or (5.10.10)
holds, then Eq. (5.10.7) possesses a unique solution, namely x. This solution
can be determined by the successive approximations

    ξ_i^{(k)} = Σ_{j=1}^{n} a_{ij} ξ_j^{(k−1)} + β_i,  k = 1, 2, …,    (5.10.11)

for all i = 1, …, n, with starting point x^{(0)} = (ξ_1^{(0)}, …, ξ_n^{(0)}). ■
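A sketch of the iteration (5.10.11) in Python follows (not from the text; NumPy and the particular matrix and vector are hypothetical choices). The matrix below satisfies all three conditions (5.10.8), (5.10.9), and (5.10.10), so the successive approximations converge in each of the metrics p_1, p_2, and p_∞.

    import numpy as np

    A = np.array([[0.2, 0.1, 0.0],
                  [0.1, 0.3, 0.1],
                  [0.0, 0.1, 0.2]])
    b = np.array([1.0, 2.0, 3.0])

    print(np.abs(A).sum(axis=0).max(),   # max column sum: condition (5.10.8)
          (A ** 2).sum(),                # sum of squares:  condition (5.10.9)
          np.abs(A).sum(axis=1).max())   # max row sum:     condition (5.10.10)

    x = np.zeros(3)                      # starting point x^(0)
    for k in range(200):
        x = A @ x + b                    # Eq. (5.10.11): x^(k) = A x^(k-1) + b
    print(x, np.max(np.abs(x - (A @ x + b))))   # residual ~ 0: x solves x = Ax + b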

Next, let us consider an integral equation.

5.10.12. Example. Let φ ∈ C[a, b], and let K(s, t) be a real-valued function
which is continuous on the square [a, b] × [a, b]. Let λ ∈ R. We call

    x(s) = φ(s) + λ ∫_a^b K(s, t) x(t) dt    (5.10.13)

a Fredholm non-homogeneous linear integral equation of the second kind.
In this equation x is the unknown, K(s, t) and φ are specified, and λ is regarded
as an arbitrary parameter.

We now show that for all |λ| sufficiently small, Eq. (5.10.13) has a unique
solution which is continuous on [a, b]. To this end, consider the complete
metric space {C[a, b]; p_∞}, and let y = f(x) denote the mapping determined by

    y(s) = φ(s) + λ ∫_a^b K(s, t) x(t) dt.

Clearly, y ∈ C[a, b]. We thus have f: C[a, b] → C[a, b]. Now let M =
sup_{a≤s≤b, a≤t≤b} |K(s, t)|. Then

    p_∞(f(x_1), f(x_2)) ≤ |λ| M (b − a) p_∞(x_1, x_2).

Therefore, if we choose λ so that

    |λ| < 1/[M(b − a)],    (5.10.14)

then f is a contraction mapping. From Theorem 5.8.5 it now follows that
Eq. (5.10.13) possesses a unique solution x ∈ C[a, b] if (5.10.14) holds.
Starting at x_0 ∈ C[a, b], successive approximations to this solution are given
by

    x_n(s) = φ(s) + λ ∫_a^b K(s, t) x_{n−1}(t) dt,  n = 1, 2, 3, ….    (5.10.15)  ■
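The successive approximations (5.10.15) can be sketched numerically by discretizing the integral. In the Python fragment below (not from the text; NumPy, the kernel K(s, t) = st, and φ(s) = s on [0, 1] are hypothetical choices) we have M = 1 and b − a = 1, so (5.10.14) requires |λ| < 1; for λ = 1/2 the exact solution is x(s) = s/(1 − λ/3).

    import numpy as np

    a, b, m = 0.0, 1.0, 201
    s = np.linspace(a, b, m)
    w = np.full(m, (b - a) / (m - 1))     # trapezoidal quadrature weights
    w[0] = w[-1] = w[0] / 2

    K = np.outer(s, s)          # kernel K(s, t) = s t
    phi = s.copy()              # phi(s) = s
    lam = 0.5                   # |lam| < 1/(M(b - a)) = 1: f is a contraction

    x = np.zeros(m)             # x_0 = 0
    for n in range(60):         # Eq. (5.10.15), with the integral discretized
        x = phi + lam * K @ (w * x)

    print(np.max(np.abs(x - s / (1.0 - lam / 3.0))))   # small discretization error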

Next, we consider yet another type of integral equation.

5.10.16. Example. Let φ ∈ C[a, b], let K(s, t) be a real continuous function
on the triangle a ≤ t ≤ s ≤ b, and let λ ∈ R. We call

    x(s) = φ(s) + λ ∫_a^s K(s, t) x(t) dt,  a ≤ s ≤ b,    (5.10.17)

a linear Volterra integral equation. Here x is unknown, K(s, t) and φ are
specified, and λ is an arbitrary parameter.

We now show that, for all λ, Eq. (5.10.17) possesses a unique continuous
solution. We consider again the complete metric space {C[a, b]; p_∞}, and we
let y = f(x) be the mapping determined by

    y(s) = φ(s) + λ ∫_a^s K(s, t) x(t) dt.

Since the right-hand side of this expression is continuous, it follows that
f: C[a, b] → C[a, b]. Moreover, since K is continuous, there is an M such
that |K(s, t)| ≤ M. Let y_1 = f(x_1), and let y_2 = f(x_2). As in the preceding
example, we have

    p_∞(f(x_1), f(x_2)) = p_∞(y_1, y_2) ≤ |λ| M (b − a) p_∞(x_1, x_2).

Now let f^(n) denote the composite mapping f ∘ f ∘ ⋯ ∘ f, and let f^(n)(x)
= y^(n). A little bit of algebra yields

    p_∞(f^(n)(x_1), f^(n)(x_2)) = p_∞(y_1^(n), y_2^(n)) ≤ (1/n!) |λ|^n M^n (b − a)^n p_∞(x_1, x_2).
                                                         (5.10.18)

However, (1/n!) |λ|^n M^n (b − a)^n → 0 as n → ∞. Thus, for an arbitrary value of
λ, n can be chosen so large that

    k ≜ (1/n!) |λ|^n M^n (b − a)^n < 1.

Hence, we have

    p_∞(f^(n)(x_1), f^(n)(x_2)) ≤ k p_∞(x_1, x_2),  0 < k < 1.

Therefore, the composite mapping f^(n) is a contraction mapping. It follows
from Corollary 5.8.8 that Eq. (5.10.17) possesses a unique continuous solution
for arbitrary λ. This solution can be determined by the method of successive
approximations. ■
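Unlike the Fredholm case, the iteration converges here for every λ. The sketch below (not from the text; NumPy, the kernel K(s, t) = 1, and φ(s) = 1 on [0, 1] are hypothetical choices) uses λ = 5, so |λ|M(b − a) = 5 and f itself is not a contraction; nevertheless the successive approximations converge to the exact solution x(s) = e^{λs}.

    import numpy as np

    a, b, m = 0.0, 1.0, 20001
    s = np.linspace(a, b, m)
    h = (b - a) / (m - 1)
    lam = 5.0                   # f is not a contraction, but f^(n) is for large n

    x = np.zeros(m)
    for n in range(200):        # x_n(s) = 1 + lam * int_a^s x_{n-1}(t) dt
        integral = np.concatenate(([0.0],
                                   np.cumsum((x[1:] + x[:-1]) * h / 2.0)))
        x = 1.0 + lam * integral   # cumulative trapezoid rule

    print(np.max(np.abs(x - np.exp(lam * s))))   # small error vs exact e^(lam*s)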

5.10.19. Exercise. Verify inequality (5.10.18).

Next we consider initial-value problems characterized by scalar ordinary
differential equations.

5.10.20. Example. Consider the initial-value problem

    ẋ = f(t, x),  x(τ) = ξ    (5.10.21)

discussed in Section 4.11. We would like to determine conditions for the
existence and uniqueness of a solution φ(t) of (5.10.21) for τ ≤ t ≤ T.

Let k > 0, and assume that f satisfies the condition

    |f(t, x_1) − f(t, x_2)| ≤ k |x_1 − x_2|

for all t ∈ [τ, T] and for all x_1, x_2 ∈ R. In this case we say that f satisfies
a Lipschitz condition in x, and we call k a Lipschitz constant.

As was pointed out in Section 4.11, Eq. (5.10.21) is equivalent to the
integral equation

    φ(t) = ξ + ∫_τ^t f(s, φ(s)) ds.    (5.10.22)

Consider now the complete metric space {C[τ, T]; p_∞}, and let

    F(φ)(t) = ξ + ∫_τ^t f(s, φ(s)) ds,  τ ≤ t ≤ T.

Then clearly F: C[τ, T] → C[τ, T]. Now

    p_∞(F(φ_1), F(φ_2)) = sup_{τ≤t≤T} |∫_τ^t [f(s, φ_1(s)) − f(s, φ_2(s))] ds|
                        ≤ sup_{τ≤t≤T} ∫_τ^t k |φ_1(s) − φ_2(s)| ds ≤ k(T − τ) p_∞(φ_1, φ_2).

Thus, F is a contraction if k < 1/(T − τ).

Next, let F^(n) denote the composite mapping F ∘ F ∘ ⋯ ∘ F. Similarly
as in (5.10.18), the reader can verify that

    p_∞(F^(n)(φ_1), F^(n)(φ_2)) ≤ (1/n!) k^n (T − τ)^n p_∞(φ_1, φ_2).    (5.10.23)

Since (1/n!) k^n (T − τ)^n → 0 as n → ∞, it follows that for sufficiently large n,
(1/n!) k^n (T − τ)^n < 1. Therefore, F^(n) is a contraction. It now follows from
Corollary 5.8.8 that Eq. (5.10.21) possesses a unique solution on [τ, T].
Furthermore, this solution can be obtained by the method of successive
approximations. ■
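The successive approximations φ_n = F(φ_{n−1}) can be carried out numerically by discretizing the integral in (5.10.22). In the Python sketch below (not from the text; NumPy, τ = 0, T = 1, ξ = 1, and the right-hand side f(t, x) = x are hypothetical choices), k(T − τ) = 1, so F itself need not be a contraction, yet the iterates converge, as guaranteed through (5.10.23).

    import numpy as np

    tau, T, m = 0.0, 1.0, 1001
    t = np.linspace(tau, T, m)
    h = (T - tau) / (m - 1)
    xi = 1.0

    def f(t, x):                # Lipschitz in x with constant k = 1;
        return x                # the exact solution of (5.10.21) is then e^t

    phi = np.full(m, xi)        # phi_0(t) = xi
    for n in range(60):         # phi_n(t) = xi + int_tau^t f(s, phi_{n-1}(s)) ds
        g = f(t, phi)
        integral = np.concatenate(([0.0],
                                   np.cumsum((g[1:] + g[:-1]) * h / 2.0)))
        phi = xi + integral

    print(np.max(np.abs(phi - np.exp(t))))   # Picard iterates approach e^t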
5.10.24. Exercise. Generalize Example 5.10.20 to the initial-value problem

    ẋ_i = f_i(t, x_1, …, x_n),  x_i(τ) = ξ_i,  i = 1, …, n,

which is discussed in Section 4.11.

B. Further Applications to Ordinary Differential Equations

At the end of Section 4.11A we raised the following questions: (i) When
does an initial-value problem possess solutions? (ii) When are these solutions
unique? (iii) What is the extent of the interval over which such solutions
exist? (iv) Are these solutions continuously dependent on initial conditions?

In Example 5.10.20 we have already given a partial answer to the first
two questions. In the remainder of the present section we refine the type of
result given in Example 5.10.20, and we give an answer to the remaining
items raised above.
As in the beginning of Section 4.11A, we call R^2 the (t, x) plane, we let
D ⊂ R^2 denote an open connected set (i.e., D is a domain), we assume that
f is a real-valued function which is defined and continuous on D, we call
T = (t_1, t_2) ⊂ R a t interval, and we let φ denote a solution of the differential
equation

    ẋ = f(t, x).    (5.10.25)

The reader should refer to Section 4.11A for the definition of solution φ.

We first concern ourselves with the initial-value problem

    ẋ = f(t, x),  x(τ) = ξ    (5.10.26)

characterized in Definition 4.11.3. Our first result is concerned with existence
of solutions of this problem. It is convenient to establish this result in two
stages, using the notion of ε-approximate solution of Eq. (5.10.25).

5.10.27. Definition. A function φ defined and continuous on a t interval
T is called an ε-approximate solution of Eq. (5.10.25) on T if

(i) (t, φ(t)) ∈ D for all t ∈ T;
(ii) φ has a continuous derivative on T except possibly on a finite set
S of points in T, where there are jump discontinuities allowed; and
(iii) |φ̇(t) − f(t, φ(t))| ≤ ε for all t ∈ T − S.

If S is not empty, φ is said to have piecewise continuous derivatives on T.

We now prove:

5.10.28. Theorem. In Eq. (5.10.25), let f be continuous on the rectangle

    D_0 = {(t, x): |t − τ| ≤ a, |x − ξ| ≤ b}.
Given any ε > 0, there exists an ε-approximate solution φ of Eq. (5.10.25)
on an interval |t − τ| ≤ α ≤ a such that φ(τ) = ξ.

5.10.29. Figure L. Construction of an ε-approximate solution.

Proof. Let M = max_{(t,x)∈D_0} |f(t, x)|, and let α = min (a, b/M). Note that
α = a if a < b/M and α = b/M if a > b/M (refer to Figure L). We will show
that an ε-approximate solution exists on the interval [τ, τ + α]. The proof is
similar for the interval [τ − α, τ]. In our proof we will construct an ε-approximate
solution starting at (τ, ξ), consisting of a finite number of straight line
segments joined end to end (see Figure L).

Since f is continuous on the compact set D_0, it is uniformly continuous
on D_0 (see Theorem 5.7.12). Hence, given ε > 0, there exists δ = δ(ε) > 0
such that |f(t, x) − f(t′, x′)| ≤ ε whenever (t, x), (t′, x′) ∈ D_0, |t − t′| ≤ δ
and |x − x′| ≤ δ. Now let τ = t_0 and τ + α = t_n. We divide the half-open
interval (t_0, t_n] into n half-open subintervals (t_0, t_1], (t_1, t_2], …, (t_{n−1}, t_n] in
such a fashion that

    max_i |t_i − t_{i−1}| ≤ min (δ, δ/M).    (5.10.30)

Next, we construct a polygonal path consisting of n straight lines joined end
to end, starting at the point (τ, ξ) ≜ (t_0, ξ_0) and having slopes equal to
m_{i−1} = f(t_{i−1}, ξ_{i−1}) over the intervals (t_{i−1}, t_i], i = 1, …, n, respectively,
where ξ_i = ξ_{i−1} + m_{i−1} |t_i − t_{i−1}|. A typical polygonal path is shown in
Figure L. Note that the graph of this path is confined to the triangular region
indicated in Figure L. Let us denote the polygonal path constructed in this way by φ.
Note that φ is continuous on the interval [τ, τ + α], that φ is a piecewise
linear function, and that φ is piecewise continuously differentiable. Indeed,
we have φ(τ) = ξ_0 = ξ and

    φ(t) = φ(t_{i−1}) + f(t_{i−1}, φ(t_{i−1}))(t − t_{i−1}),  t_{i−1} < t ≤ t_i,  i = 1, …, n.
                                                          (5.10.31)

Also note that

    |φ(t) − φ(t′)| ≤ M |t − t′|    (5.10.32)

for all t, t′ ∈ [τ, τ + α]. We now show that φ is an ε-approximate solution.
Let t ∈ (t_{i−1}, t_i]. Then it follows from (5.10.30) and (5.10.32) that |φ(t)
− φ(t_{i−1})| ≤ δ. Now since |f(t, x) − f(t′, x′)| ≤ ε whenever (t, x), (t′, x′)
∈ D_0, |t − t′| ≤ δ, and |x − x′| ≤ δ, it follows from Eq. (5.10.31) that

    |φ̇(t) − f(t, φ(t))| = |f(t_{i−1}, φ(t_{i−1})) − f(t, φ(t))| ≤ ε.

Therefore, the function φ is an ε-approximate solution on the interval
|t − τ| ≤ α ≤ a. ■
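The polygonal path (5.10.31) is exactly the Euler construction: advance with the slope f(t_{i−1}, ξ_{i−1}) over each subinterval. A minimal Python sketch follows (not from the text; the right-hand side f(t, x) = x with (τ, ξ) = (0, 1) is a hypothetical choice). It shows the construction and that the polygon approaches the true solution as the subdivision is refined.

    import math

    def euler_polygon(f, tau, xi, alpha, n):
        # Polygonal path of Eq. (5.10.31): on (t_{i-1}, t_i] the path has
        # slope m_{i-1} = f(t_{i-1}, xi_{i-1}); for n large enough this is
        # an eps-approximate solution in the sense of Definition 5.10.27.
        h = alpha / n
        ts, xs = [tau], [xi]
        for i in range(n):
            xs.append(xs[-1] + f(ts[-1], xs[-1]) * h)
            ts.append(ts[-1] + h)
        return ts, xs

    for n in [10, 100, 1000]:
        ts, xs = euler_polygon(lambda t, x: x, 0.0, 1.0, 1.0, n)
        print(n, abs(xs[-1] - math.e))    # endpoint error shrinks as n grows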
We are now in a position to establish conditions for the existence of solu-
tions of the initial-value problem (5.10.26).
5.10.33. Theorem. In Eq. (5.10.25), let $f$ be continuous on the rectangle
$D_0 = \{(t, x) \colon |t - \tau| \le a,\ |x - \xi| \le b\}$. Then the initial-value problem
(5.10.26) has a solution on some $t$ interval given by $|t - \tau| \le \alpha \le a$.

Proof. Let $\{\epsilon_n\}$, $n = 1, 2, \ldots$, be a monotone decreasing sequence of
positive numbers tending to zero (i.e., $\epsilon_{n+1} < \epsilon_n$ and $\lim_{n \to \infty} \epsilon_n = 0$).
By Theorem 5.10.28, there exists for every $\epsilon_n$ an $\epsilon_n$-approximate solution of
Eq. (5.10.25), call it $\varphi_n$, on some interval $|t - \tau| \le \alpha$ such that $\varphi_n(\tau) = \xi$.
Now for each $\varphi_n$ it is true, by construction of $\varphi_n$, that
$$|\varphi_n(t) - \varphi_n(t')| \le M|t - t'|. \tag{5.10.34}$$
This shows that $\{\varphi_n\}$ is an equicontinuous set of functions (see Definition
5.8.11). Letting $t' = \tau$ in (5.10.34), we have $|\varphi_n(t) - \xi| \le M|t - \tau| \le M\alpha$,
and thus $|\varphi_n(t)| \le |\xi| + M\alpha$ for all $n$ and for all $t \in [\tau - \alpha, \tau + \alpha]$. Thus, the
sequence $\{\varphi_n\}$ is uniformly bounded. In view of the Ascoli lemma (see Corollary
5.8.13) there exists a subsequence $\{\varphi_{n_k}\}$, $k = 1, 2, \ldots$, of the sequence
$\{\varphi_n\}$ which converges uniformly on the interval $[\tau - \alpha, \tau + \alpha]$ to a limit
function $\varphi$; i.e.,
$$\lim_{k \to \infty} \varphi_{n_k}(t) = \varphi(t) \quad \text{uniformly on } [\tau - \alpha, \tau + \alpha].$$

This function is continuous (see Theorem 5.7.14) and, in addition, $|\varphi(t) - \varphi(t')| \le M|t - t'|$.
To complete the proof, we must show that $\varphi$ is a solution of Eq. (5.10.26)
or, equivalently, that $\varphi$ satisfies the integral equation
$$\varphi(t) = \xi + \int_\tau^t f(s, \varphi(s))\, ds. \tag{5.10.35}$$
Let $\varphi_{n_k}$ be an $\epsilon_{n_k}$-approximate solution, let $\Delta_{n_k}(t) = \dot{\varphi}_{n_k}(t) - f(t, \varphi_{n_k}(t))$ at
those points where $\varphi_{n_k}$ is differentiable, and let $\Delta_{n_k}(t) = 0$ at the points where
$\varphi_{n_k}$ is not differentiable. Then $\varphi_{n_k}$ can be expressed in integral form as
$$\varphi_{n_k}(t) = \xi + \int_\tau^t [f(s, \varphi_{n_k}(s)) + \Delta_{n_k}(s)]\, ds. \tag{5.10.36}$$
Since $\varphi_{n_k}$ is an $\epsilon_{n_k}$-approximate solution, we have $|\Delta_{n_k}(t)| < \epsilon_{n_k}$. Also, since
$f$ is uniformly continuous on $D_0$ and since $\varphi_{n_k} \to \varphi$ uniformly on $[\tau - \alpha, \tau + \alpha]$
as $k \to \infty$, it follows that $|f(t, \varphi_{n_k}(t)) - f(t, \varphi(t))| < \epsilon$ on the interval
$[\tau - \alpha, \tau + \alpha]$ whenever $k$ is so large that $|\varphi_{n_k}(t) - \varphi(t)| < \delta$ on $[\tau - \alpha, \tau + \alpha]$.
Using Eq. (5.10.36) we now have
$$\left| \int_\tau^t [f(s, \varphi_{n_k}(s)) - f(s, \varphi(s)) + \Delta_{n_k}(s)]\, ds \right| \le \left| \int_\tau^t |f(s, \varphi_{n_k}(s)) - f(s, \varphi(s))|\, ds \right| + \left| \int_\tau^t |\Delta_{n_k}(s)|\, ds \right| < \alpha(\epsilon_{n_k} + \epsilon).$$
Therefore, $\lim_{k \to \infty} \int_\tau^t [f(s, \varphi_{n_k}(s)) + \Delta_{n_k}(s)]\, ds = \int_\tau^t f(s, \varphi(s))\, ds$. It now follows that
$$\varphi(t) = \xi + \int_\tau^t f(s, \varphi(s))\, ds,$$
which completes the proof. •

Using Theorem 5.10.33, the reader can readily prove the next result.

5.10.37. Corollary. In Eq. (5.10.25), let $f$ be continuous on a domain $D$
of the $(t, x)$ plane, and let $(\tau, \xi) \in D$. Then the initial-value problem (5.10.26)
has a solution $\varphi$ on some $t$ interval containing $\tau$.

5.10.38. Exercise. Prove Corollary 5.10.37.

Theorem 5.10.33 (along with Corollary 5.10.37) is known in the literature
as the Cauchy-Peano existence theorem. Note that in these results the solution
$\varphi$ is not guaranteed to be unique.
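A classical scalar example (a standard illustration, not drawn from the text) shows that continuity of $f$ alone cannot guarantee uniqueness: the initial-value problem
$$\dot{x} = 3x^{2/3}, \qquad x(0) = 0,$$
is satisfied both by $\varphi(t) \equiv 0$ and by $\varphi(t) = t^3$. Here $f(t, x) = 3x^{2/3}$ is continuous everywhere but fails to satisfy a Lipschitz condition in any neighborhood of $x = 0$.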
Next, we seek conditions under which uniqueness of solutions is assured.
We require the following preliminary result, called the Gronwall inequality.

5.10.39. Theorem. Let $r$ and $k$ be real continuous functions on an interval
$[a, b]$. Suppose $r(t) \ge 0$ and $k(t) \ge 0$ for all $t \in [a, b]$, and let $\delta \ge 0$ be a
given non-negative constant. If
$$r(t) \le \delta + \int_a^t k(s) r(s)\, ds \tag{5.10.40}$$
for all $t \in [a, b]$, then
$$r(t) \le \delta e^{\int_a^t k(s)\, ds} \tag{5.10.41}$$
for all $t \in [a, b]$.

Proof. Let $R(t) = \delta + \int_a^t k(s) r(s)\, ds$. Then $r(t) \le R(t)$, $R(a) = \delta$,
$\dot{R}(t) = k(t) r(t) \le k(t) R(t)$, and
$$\dot{R}(t) - k(t) R(t) \le 0 \tag{5.10.42}$$
for all $t \in [a, b]$. Let $K(t) = e^{-\int_a^t k(s)\, ds}$. Then
$$\dot{K}(t) = -k(t) e^{-\int_a^t k(s)\, ds} = -K(t) k(t).$$
Multiplying both sides of (5.10.42) by $K(t)$ we have
$$K(t)\dot{R}(t) - K(t) k(t) R(t) \le 0,$$
or
$$K(t)\dot{R}(t) + \dot{K}(t) R(t) \le 0,$$
or
$$\frac{d}{dt}\left[ K(t) R(t) \right] \le 0.$$
Integrating this last expression from $a$ to $t$ we obtain
$$K(t) R(t) - K(a) R(a) \le 0,$$
or
$$K(t) R(t) - \delta \le 0,$$
or
$$R(t) \le \delta e^{\int_a^t k(s)\, ds},$$
or
$$r(t) \le R(t) \le \delta e^{\int_a^t k(s)\, ds},$$
which is the desired inequality. •
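For later use we record the special case of Theorem 5.10.39 with a constant function $k(t) \equiv k > 0$ and lower limit $\tau$ (a direct substitution into (5.10.41), stated here for convenience):
$$r(t) \le \delta + k \int_\tau^t r(s)\, ds \text{ for } t \ge \tau \quad \Longrightarrow \quad r(t) \le \delta e^{k(t - \tau)}.$$
This is the form invoked in the proofs of Theorem 5.10.43 and Lemma 5.10.52 below.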

In our next result we will require that the function $f$ in Eq. (5.10.25)
satisfy a Lipschitz condition
$$|f(t, x') - f(t, x'')| \le k|x' - x''|$$
for all $(t, x'), (t, x'') \in D$.

5.10.43. Theorem. In Eq. (5.10.25), let $f$ be continuous on a domain $D$
of the $(t, x)$ plane, and let $f$ satisfy a Lipschitz condition with respect to $x$
on $D$. Let $(\tau, \xi) \in D$. Then the initial-value problem (5.10.26) has a unique
solution on some $t$ interval containing $\tau$ (i.e., if $\varphi_1$ and $\varphi_2$ are two solutions
of Eq. (5.10.25) on an interval $(a, b)$, if $\tau \in (a, b)$, and if $\varphi_1(\tau) = \varphi_2(\tau) = \xi$,
then $\varphi_1 = \varphi_2$).
Proof. By Corollary 5.10.37, at least one solution exists on some interval
$(a, b)$, $\tau \in (a, b)$. Now suppose there is more than one solution, say $\varphi_1$
and $\varphi_2$, to the initial-value problem (5.10.26). Then
$$\varphi_i(t) = \xi + \int_\tau^t f(s, \varphi_i(s))\, ds, \quad i = 1, 2,$$
for all $t \in (a, b)$, and
$$\varphi_1(t) - \varphi_2(t) = \int_\tau^t [f(s, \varphi_1(s)) - f(s, \varphi_2(s))]\, ds.$$
Let $r(t) = |\varphi_1(t) - \varphi_2(t)|$, and let $k > 0$ denote the Lipschitz constant for
$f$. In the following we consider the case when $t \ge \tau$, and we leave the details
of the proof for $t < \tau$ as an exercise. We have
$$r(t) \le \int_\tau^t |f(s, \varphi_1(s)) - f(s, \varphi_2(s))|\, ds \le \int_\tau^t k|\varphi_1(s) - \varphi_2(s)|\, ds = \int_\tau^t k r(s)\, ds;$$
i.e.,
$$r(t) \le \int_\tau^t k r(s)\, ds$$
for all $t \in [\tau, b)$. The conditions of Theorem 5.10.39 are clearly satisfied and
we have: if $r(t) \le \delta + \int_\tau^t k r(s)\, ds$, then $r(t) \le \delta e^{\int_\tau^t k\, ds}$. Since in the present
case $\delta = 0$, it follows that
$$r(t) = 0 \quad \text{for all } t \in [\tau, b).$$
Therefore, $|\varphi_1(t) - \varphi_2(t)| = 0$ for all $t \in [\tau, b)$, and $\varphi_1(t) = \varphi_2(t)$ for all $t$
in this interval. •

Now suppose that in Eq. (5.10.25) $f$ is continuous on some domain $D$
of the $(t, x)$ plane and assume that $f$ is bounded on $D$; i.e., suppose there
exists a constant $M > 0$ such that
$$\sup_{(t, x) \in D} |f(t, x)| \le M.$$
Also, assume that $\tau \in (a, b)$, that $(\tau, \xi) \in D$, and that the initial-value
problem (5.10.26) has a solution $\varphi$ on a $t$ interval $(a, b)$ such that $(t, \varphi(t)) \in D$
for all $t \in (a, b)$. Then
$$\lim_{t \to a^+} \varphi(t) = \varphi(a^+) \quad \text{and} \quad \lim_{t \to b^-} \varphi(t) = \varphi(b^-)$$
exist. To prove this, let $t \in (a, b)$. Then
$$\varphi(t) = \xi + \int_\tau^t f(s, \varphi(s))\, ds.$$
If $a < t_1 < t_2 < b$, then
$$|\varphi(t_1) - \varphi(t_2)| \le \int_{t_1}^{t_2} |f(s, \varphi(s))|\, ds \le M|t_1 - t_2|.$$
Now let $t_1 \to b^-$ and $t_2 \to b^-$. Then $|t_1 - t_2| \to 0$, and therefore $|\varphi(t_1) - \varphi(t_2)| \to 0$.
Thus, for any sequence $t_n \to b^-$ the values $\{\varphi(t_n)\}$ form a convergent Cauchy
sequence; i.e., $\varphi(b^-)$ exists. The existence of $\varphi(a^+)$ is established similarly.
Next, let us assume that the points $(a, \varphi(a^+))$, $(b, \varphi(b^-))$ are in the domain
$D$. We now show that the solution $\varphi$ can be continued to the right of $t = b$.
An identical procedure can be used to show that the solution $\varphi$ can be
continued to the left of $t = a$.
We define a function
$$\bar{\varphi}(t) = \begin{cases} \varphi(t), & t \in (a, b), \\ \varphi(b^-), & t = b. \end{cases}$$
Then
$$\bar{\varphi}(t) = \xi + \int_\tau^t f(s, \bar{\varphi}(s))\, ds$$
for all $t \in (a, b]$. Thus, the derivative of $\bar{\varphi}(t)$ exists on the interval $(a, b)$,
and the left-hand derivative of $\bar{\varphi}(t)$ at $t = b$ is given by
$$\dot{\bar{\varphi}}(b^-) = f(b, \bar{\varphi}(b)).$$
Next, we consider the initial-value problem
$$\dot{x} = f(t, x), \qquad x(b) = \varphi(b^-).$$
By Corollary 5.10.37, the differential equation $\dot{x} = f(t, x)$ has a solution
$\psi$ which passes through the point $(b, \varphi(b^-))$ and which exists on some interval
$[b, b + \beta]$, $\beta > 0$. Now let
$$\hat{\varphi}(t) = \begin{cases} \bar{\varphi}(t), & t \in (a, b], \\ \psi(t), & t \in [b, b + \beta]. \end{cases}$$
To show that $\hat{\varphi}$ is a solution of the differential equation on the interval
$(a, b + \beta]$, with $\hat{\varphi}(\tau) = \xi$, we must show that $\hat{\varphi}$ is continuous at $t = b$.
Since
$$\bar{\varphi}(b^-) = \xi + \int_\tau^b f(s, \hat{\varphi}(s))\, ds$$
and since
$$\hat{\varphi}(t) = \varphi(b^-) + \int_b^t f(s, \hat{\varphi}(s))\, ds,$$
we have
$$\hat{\varphi}(t) = \xi + \int_\tau^t f(s, \hat{\varphi}(s))\, ds$$
for all $t \in (a, b + \beta]$. The continuity of $\hat{\varphi}$ in the last equation implies the
continuity of $f(s, \hat{\varphi}(s))$. Differentiating the last equation, we have
$$\dot{\hat{\varphi}}(t) = f(t, \hat{\varphi}(t))$$
for all $t \in (a, b + \beta]$.
We call $\hat{\varphi}$ a continuation of the solution $\varphi$ to the interval $(a, b + \beta]$. If
$f$ satisfies a Lipschitz condition on $D$ with respect to $x$, then $\hat{\varphi}$ is unique,
and we call $\hat{\varphi}$ the continuation of $\varphi$ to the interval $(a, b + \beta]$.
We can repeat the above procedure of continuing solutions until the
boundary of $D$ is reached.
Now let the domain $D$ be, in particular, a rectangle, as shown in Figure
M. It is important to notice that, in general, we cannot continue solutions
over the entire $t$ interval $T$ shown in this figure.

5.10.44. Figure M. Continuation of a solution to the boundary of the
domain $D = \{(t, x) \colon T_1 < t < T_2,\ \xi_1 < x < \xi_2\}$, where $T = (T_1, T_2)$.

We summarize the above discussion in the following:

5.10.45. Theorem. In Eq. (5.10.25), let $f$ be continuous and bounded on a
domain $D$ of the $(t, x)$ plane, and let $(\tau, \xi) \in D$. Then all solutions of the
initial-value problem (5.10.26) can be continued to the boundary of $D$.

We can readily extend Theorems 5.10.28 and 5.10.33, Corollary 5.10.37,
and Theorems 5.10.43 and 5.10.45 to initial-value problems characterized by
systems of $n$ first-order ordinary differential equations, as given in Definition
4.11.9 and Eq. (4.11.11). In doing so we replace $D \subset R^2$ by $D \subset R^{n+1}$, $x \in R$
by $x \in R^n$, $f \colon D \to R$ by $f \colon D \to R^n$, the absolute value $|x|$ by the quantity
$$|x| = \sum_{i=1}^n |x_i|, \tag{5.10.46}$$
and the metric $\rho(x, y) = |x - y|$ on $R$ by the metric $\rho(x, y) = \sum_{i=1}^n |x_i - y_i|$
on $R^n$. (The reader can readily verify that the function given in Eq. (5.10.46)
satisfies the axioms of a norm (see Theorem 4.9.31).) The definition of $\epsilon$-approximate
solution for the differential equation $\dot{x} = f(t, x)$ is identical
to that given in Definition 5.10.27, save that scalars are replaced by vectors
(e.g., the scalar function $\varphi$ is replaced by the $n$-vector valued function $\boldsymbol{\varphi}$).

Also, the modifications involved in defining a Lipschitz condition for $f(t, x)$
on $D \subset R^{n+1}$ are obvious.

5.10.47. Exercise. For the ordinary differential equation
$$\dot{x} = f(t, x) \tag{5.10.48}$$
and for the initial-value problem
$$\dot{x} = f(t, x), \qquad x(\tau) = \xi, \tag{5.10.49}$$
characterized in Eq. (4.11.7) and Definition 4.11.9, respectively, state and
prove results for existence, uniqueness, and continuation of solutions which
are analogous to Theorems 5.10.28 and 5.10.33, Corollary 5.10.37, and
Theorems 5.10.43 and 5.10.45.
In connection with Theorem 5.10.45 we noted that the solutions of initial-value
problems described by non-linear ordinary differential equations cannot,
in general, be extended to the entire $t$ interval $T$ depicted in Figure M.
We now show that in the case of initial-value problems characterized by
linear ordinary differential equations it is possible to extend solutions to the
entire interval $T$. First, we need some preliminary results.
Let
$$D = \{(t, x) \colon a < t < b,\ x \in R^n\}, \tag{5.10.50}$$
where the function $|\cdot|$ is defined in Eq. (5.10.46). Consider the system of linear
equations
$$\dot{x}_i = \sum_{j=1}^n a_{ij}(t) x_j \triangleq f_i(t, x), \quad i = 1, \ldots, n, \tag{5.10.51}$$
where the $a_{ij}(t)$, $i, j = 1, \ldots, n$, are assumed to be real and continuous
functions defined on the interval $[a, b]$. We first show that $f(t, x) = [f_1(t, x), \ldots, f_n(t, x)]^T$
satisfies a Lipschitz condition on $D$,
$$|f(t, x') - f(t, x'')| \le k|x' - x''|$$
for all $(t, x'), (t, x'') \in D$, where $x' = (x_1', \ldots, x_n')^T$, $x'' = (x_1'', \ldots, x_n'')^T$,
and $k = \max_{1 \le j \le n} \max_{a \le t \le b} \sum_{i=1}^n |a_{ij}(t)|$. Indeed, we have
$$|f(t, x') - f(t, x'')| = \sum_{i=1}^n |f_i(t, x') - f_i(t, x'')| = \sum_{i=1}^n \left| \sum_{j=1}^n a_{ij}(t) x_j' - \sum_{j=1}^n a_{ij}(t) x_j'' \right|$$
$$= \sum_{i=1}^n \left| \sum_{j=1}^n a_{ij}(t)(x_j' - x_j'') \right| \le \sum_{j=1}^n \left( \sum_{i=1}^n |a_{ij}(t)| \right) |x_j' - x_j''| \le k \sum_{j=1}^n |x_j' - x_j''| = k|x' - x''|.$$

Next, we prove the following:

5.10.52. Lemma. In Eq. (5.10.48), let $f(t, x) = (f_1(t, x), \ldots, f_n(t, x))^T$ be
continuous on a domain $D \subset R^{n+1}$, and let $f(t, x)$ satisfy a Lipschitz condition
on $D$ with respect to $x$, with Lipschitz constant $k$. If $\varphi_1$ and $\varphi_2$ are unique
solutions of the initial-value problem (5.10.49), with $\varphi_1(\tau) = \xi_1$, $\varphi_2(\tau) = \xi_2$,
and with $(\tau, \xi_1), (\tau, \xi_2) \in D$, then
$$|\varphi_1(t) - \varphi_2(t)| \le |\xi_1 - \xi_2| e^{k|t - \tau|} \tag{5.10.53}$$
for all $(t, \varphi_1(t)), (t, \varphi_2(t)) \in D$.

Proof. We assume that $t \ge \tau$, and we leave the details of the proof for
$t < \tau$ as an exercise. We have
$$\varphi_1(t) = \xi_1 + \int_\tau^t f(s, \varphi_1(s))\, ds, \qquad \varphi_2(t) = \xi_2 + \int_\tau^t f(s, \varphi_2(s))\, ds,$$
and
$$|\varphi_1(t) - \varphi_2(t)| \le |\xi_1 - \xi_2| + k \int_\tau^t |\varphi_1(s) - \varphi_2(s)|\, ds. \tag{5.10.54}$$
Applying Theorem 5.10.39 to inequality (5.10.54), the desired inequality
(5.10.53) results. •

We are now in a position to prove the following important result for
systems of linear ordinary differential equations.

5.10.55. Theorem. Let $D \subset R^{n+1}$ be given by Eq. (5.10.50), and let the
real functions $a_{ij}(t)$, $i, j = 1, \ldots, n$, be continuous on the $t$ interval $[a, b]$.
Then there exists a unique solution to the initial-value problem
$$\dot{x}_i = \sum_{j=1}^n a_{ij}(t) x_j \triangleq f_i(t, x), \qquad x_i(\tau) = \xi_i, \qquad i = 1, \ldots, n, \tag{5.10.56}$$
with $(\tau, \xi_1, \ldots, \xi_n) \in D$. This solution can be extended to the entire interval
$[a, b]$.

Proof. Since the vector $f(t, x) = (f_1(t, x), \ldots, f_n(t, x))^T$ is continuous on
$D$, since $f(t, x)$ satisfies a Lipschitz condition with respect to $x$ on $D$, and
since $(\tau, \xi) \in D$ (where $\xi = (\xi_1, \ldots, \xi_n)^T$), it follows from Theorem 5.10.43
(interpreted for systems of first-order ordinary differential equations) that
the initial-value problem (5.10.56) has a unique solution $\varphi$ through the point
$(\tau, \xi)$ over some interval $[c, d] \subset [a, b]$. We must show that $\varphi$ can be continued
to a unique solution over the entire interval $[a, b]$.

Let $\tilde{\varphi}$ be any solution of Eq. (5.10.56) through $(\tau, \xi)$ which exists on some
subinterval of $[a, b]$. Applying Lemma 5.10.52 to $\varphi_1 = \tilde{\varphi}$ and $\varphi_2 = 0$, we
have
$$|\tilde{\varphi}(t)| \le |\xi| e^{k|t - \tau|} \tag{5.10.57}$$
for all $t$ in the domain of definition of $\tilde{\varphi}$. For purposes of contradiction,
suppose that $\varphi$ does not have a continuation to $[a, b]$ and assume that $\varphi$
has a continuation $\tilde{\varphi}$ existing up to $t' < b$ which cannot be continued beyond
$t'$. But inequality (5.10.57) implies that the path $(t, \tilde{\varphi}(t))$ remains inside a
closed bounded subset of $D$. It follows from Theorem 5.10.45, interpreted
for systems of first-order ordinary differential equations, that $\tilde{\varphi}$ may be
continued beyond $t'$. We have thus arrived at a contradiction, which proves
that a continuation $\tilde{\varphi}$ of $\varphi$ exists on the entire interval $[a, b]$. This continuation
is unique because $f(t, x)$ satisfies a Lipschitz condition with respect to $x$
on $D$. •

5.10.58. Exercise. In Theorem 5.10.55, let $a_{ij}(t)$, $i, j = 1, \ldots, n$, be
continuous on the open interval $(-\infty, \infty)$. Show that the initial-value problem
(5.10.56) possesses unique solutions for every $(\tau, \xi) \in R^{n+1}$ which can
be extended to the $t$ interval $(-\infty, \infty)$.

5.10.59. Exercise. Let $D \subset R^{n+1}$ be given by Eq. (5.10.50), and let the real
functions $a_{ij}(t)$, $v_i(t)$, $i, j = 1, \ldots, n$, be continuous on the $t$ interval $[a, b]$.
Show that there exists a unique solution to the initial-value problem
$$\dot{x}_i = \sum_{j=1}^n a_{ij}(t) x_j + v_i(t), \qquad x_i(\tau) = \xi_i, \qquad i = 1, \ldots, n, \tag{5.10.60}$$
with $(\tau, \xi_1, \ldots, \xi_n) \in D$. Show that this solution can be extended to the
entire interval $[a, b]$.

It is possible to relax the conditions on $v_i(t)$, $i = 1, \ldots, n$, in the above
exercise considerably. For example, it can be shown that if $v_i(t)$ is piecewise
continuous on $[a, b]$, then the assertions of Exercise 5.10.59 still hold.
We now address ourselves to the last item of the present section. Consider
the initial-value problem (5.10.49), which we characterized in Definition
4.11.9. Assume that $f(t, x)$ satisfies a Lipschitz condition on a domain $D
\subset R^{n+1}$ and that $(\tau, \xi) \in D$. Then the initial-value problem possesses a
unique solution $\varphi$ over some $t$ interval containing $\tau$. To indicate the dependence
of $\varphi$ on the initial point $(\tau, \xi)$, we write
$$\varphi(t; \tau, \xi),$$
where $\varphi(\tau; \tau, \xi) = \xi$. We now ask: what are the effects of different initial
conditions on the solution of Eq. (5.10.48)? Our next result provides the
answer.

5.10.61. Theorem. In Eq. (5.10.49), let $f(t, x)$ satisfy a Lipschitz condition
with respect to $x$ on $D \subset R^{n+1}$. Let $(\tau, \xi) \in D$. Then the unique solution
$\varphi(t; \tau, \xi)$ of Eq. (5.10.49), existing on some bounded $t$ interval containing $\tau$,
depends continuously on $\xi$ on any such bounded interval. (This means that if
$\xi_n \to \xi$, then $\varphi(t; \tau, \xi_n) \to \varphi(t; \tau, \xi)$.)

Proof. We have
$$\varphi(t; \tau, \xi_n) = \xi_n + \int_\tau^t f[s, \varphi(s; \tau, \xi_n)]\, ds$$
and
$$\varphi(t; \tau, \xi) = \xi + \int_\tau^t f[s, \varphi(s; \tau, \xi)]\, ds.$$
It follows that for $t \ge \tau$ (the proof for $t < \tau$ is left as an exercise),
$$|\varphi(t; \tau, \xi_n) - \varphi(t; \tau, \xi)| \le |\xi_n - \xi| + \int_\tau^t |f[s, \varphi(s; \tau, \xi_n)] - f[s, \varphi(s; \tau, \xi)]|\, ds$$
$$\le |\xi_n - \xi| + k \int_\tau^t |\varphi(s; \tau, \xi_n) - \varphi(s; \tau, \xi)|\, ds,$$
where $k$ denotes a Lipschitz constant for $f(t, x)$. Using Theorem 5.10.39,
we obtain
$$|\varphi(t; \tau, \xi_n) - \varphi(t; \tau, \xi)| \le |\xi_n - \xi| e^{\int_\tau^t k\, ds} = |\xi_n - \xi| e^{k(t - \tau)}.$$
Thus, if $\xi_n \to \xi$, then $\varphi(t; \tau, \xi_n) \to \varphi(t; \tau, \xi)$. •

It follows from the proof of the above theorem that the convergence is
uniform with respect to $t$ on any interval $[a, b]$ on which the solutions are
defined.

5.10.62. Example. The initial-value problem
$$\dot{x} = 2x, \qquad x(\tau) = \xi, \tag{5.10.63}$$
where $-\infty < \tau < \infty$, $-\infty < \xi < \infty$, has the unique solution
$$\varphi(t; \tau, \xi) = \xi e^{2(t - \tau)}, \quad -\infty < t < \infty,$$
which depends continuously on the initial value $\xi$. •
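For this example the bound of Theorem 5.10.61 reads $|\varphi(t; \tau, \xi_n) - \varphi(t; \tau, \xi)| \le |\xi_n - \xi| e^{k(t - \tau)}$ with Lipschitz constant $k = 2$, and it can be checked numerically (a sketch we add for illustration):

    import math

    # For dx/dt = 2x, the solution is phi(t; tau, xi) = xi * exp(2 (t - tau)).
    def phi(t, tau, xi):
        return xi * math.exp(2.0 * (t - tau))

    tau, xi, k, t = 0.0, 1.0, 2.0, 1.0
    for delta in (1e-1, 1e-2, 1e-3):          # perturbations xi_n -> xi
        gap = abs(phi(t, tau, xi + delta) - phi(t, tau, xi))
        bound = delta * math.exp(k * (t - tau))
        print(gap <= bound + 1e-12)           # True; here the bound is attained

For this linear equation the bound holds with equality, and the gap shrinks to zero as the perturbation does, exactly as the theorem asserts.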

Thus far in the present section we have concerned ourselves with problems
characterized by real ordinary differential equations. It is an easy matter
to verify that all the existence, uniqueness, continuation, and dependence
(on initial conditions) results proved in the present section are also valid for
initial-value problems described by complex ordinary differential equations,
such as those given, e.g., in Eq. (4.11.25). In this case the norm of a complex
vector $z = (z_1, \ldots, z_n)^T$, $z_k = u_k + i v_k$, $k = 1, \ldots, n$, is given by
$$|z| = \sum_{k=1}^n |z_k|,$$
where $|z_k| = (u_k^2 + v_k^2)^{1/2}$. The metric on $C^n$ is in this case given by
$\rho(z_1, z_2) = |z_1 - z_2|$.

5.11. REFERENCES AND NOTES

There are numerous excellent texts on metric spaces. Books which are
especially readable include Copson [5.2], Gleason [5.3], Goldstein and
Rosenbaum [5.4], Kantorovich and Akilov [5.5], Kolmogorov and Fomin
[5.7], Naylor and Sell [5.8], and Royden [5.9]. Reference [5.8] includes some
applications. The book by Kelley [5.6] is a standard reference on topology.
An excellent reference on ordinary differential equations is the book by
Coddington and Levinson [5.1].

REFERENCES

[5.1] E. A. CODDINGTON and N. LEVINSON, Theory of Ordinary Differential Equations. New York: McGraw-Hill Book Company, Inc., 1955.
[5.2] E. T. COPSON, Metric Spaces. Cambridge, England: Cambridge University Press, 1968.
[5.3] A. M. GLEASON, Fundamentals of Abstract Analysis. Reading, Mass.: Addison-Wesley Publishing Co., Inc., 1966.
[5.4] M. E. GOLDSTEIN and B. M. ROSENBAUM, "Introduction to Abstract Analysis," National Aeronautics and Space Administration, Report No. SP-203, Washington, D.C., 1969.
[5.5] L. V. KANTOROVICH and G. P. AKILOV, Functional Analysis in Normed Spaces. New York: The Macmillan Company, 1964.
[5.6] J. KELLEY, General Topology. Princeton, N.J.: D. Van Nostrand Company, Inc., 1955.
[5.7] A. N. KOLMOGOROV and S. V. FOMIN, Elements of the Theory of Functions and Functional Analysis. Vol. I. Albany, N.Y.: Graylock Press, 1957.
[5.8] A. W. NAYLOR and G. R. SELL, Linear Operator Theory in Engineering and Science. New York: Holt, Rinehart and Winston, 1971.
[5.9] H. L. ROYDEN, Real Analysis. New York: The Macmillan Company, 1965.
[5.10] A. E. TAYLOR, General Theory of Functions and Integration. New York: Blaisdell Publishing Company, 1965.
6

NORMED SPACES AND


INNER PRODUCT SPACES

In Chapters 2-4 we concerned ourselves primarily with algebraic aspects of


certain mathematical systems, while in Chapter 5 we addressed ourselves to
topological properties of some mathematical systems. The stage is now set
to combine topological and algebraic structures. In doing so, we arrive at
linear topological spaces, namely normed linear spaces and inner product
spaces, in general, and Banach spaces and Hilbert spaces, in particular. The
properties of such spaces are the topic of the present chapter. In the next
chapter we will study linear transformations defined on Banach and Hilbert
spaces. The material of the present chapter and the next chapter constitutes
part of a branch of mathematics called functional analysis.
Since normed linear spaces and inner product spaces are vector spaces as
well as metric spaces, the results of Chapters 3 and 5 are applicable to the
spaces considered in this chapter. Furthermore, since the Euclidean spaces
considered in Chapter 4 are important examples of normed linear spaces and
inner product spaces, the reader may find it useful to refer to Section 4.9 for
proper motivation of the material to follow.
The present chapter consists of 16 sections. In the first 10 sections we
consider some of the important general properties of normed linear spaces
and Banach spaces. In sections 11 through 14 we examine some of the
important general characteristics of inner product spaces and Hilbert spaces.
(Inner product spaces are special types of normed linear spaces; Hilbert


spaces are special cases of Banach spaces; Banach spaces are special kinds of
normed linear spaces; and Hilbert spaces are special types of inner product
spaces.) In section 15, we consider two applications. This chapter is con-
cluded with a brief discussion of pertinent references in the last section.

6.1. NORMED LINEAR SPACES

Throughout this chapter, $R$ denotes the field of real numbers, $C$ denotes the
field of complex numbers, $F$ denotes either $R$ or $C$, and $X$ denotes a vector
space over $F$.

6.1.1. Definition. Let $\|\cdot\|$ denote a mapping from $X$ into $R$ which satisfies
the following properties for every $x, y \in X$ and every $\alpha \in F$:
(i) $\|x\| \ge 0$;
(ii) $\|x\| = 0$ if and only if $x = 0$;
(iii) $\|\alpha x\| = |\alpha| \cdot \|x\|$; and
(iv) $\|x + y\| \le \|x\| + \|y\|$.
The function $\|\cdot\|$ is called a norm on $X$, the mathematical system consisting
of $\|\cdot\|$ and $X$, $\{X; \|\cdot\|\}$, is called a normed linear space, and $\|x\|$
is called the norm of $x$. If $F = C$ we speak of a complex normed linear space,
and if $F = R$ we speak of a real normed linear space.

Different norms defined on the same linear space $X$ yield different normed
linear spaces. If in a given discussion it is clear which particular norm is
being used, we simply write $X$ in place of $\{X; \|\cdot\|\}$ to denote the normed
linear space under consideration. Properties (iii) and (iv) in Definition 6.1.1
are called the homogeneity property and the triangle inequality of a norm,
respectively.
Let $\{X; \|\cdot\|\}$ be a normed linear space and let $x_i \in X$, $i = 1, \ldots, n$.
Repeated use of the triangle inequality yields
$$\|x_1 + \cdots + x_n\| \le \|x_1\| + \cdots + \|x_n\|.$$
The following result shows that every normed linear space has a metric
associated with it, induced by the norm $\|\cdot\|$. Therefore, every normed
linear space is also a metric space.

6.1.2. Theorem. Let $\{X; \|\cdot\|\}$ be a normed linear space, and let $\rho$ be a
real-valued function defined on $X \times X$ given by $\rho(x, y) = \|x - y\|$ for all
$x, y \in X$. Then $\rho$ is a metric on $X$ and $\{X; \rho\}$ is a metric space.

6.1.3. Exercise. Prove Theorem 6.1.2.



This theorem tells us that all of the results in the previous chapter on metric
spaces apply to normed linear spaces as well, provided we let $\rho(x, y) = \|x - y\|$.
We will adopt the convention that when using the terminology of metric spaces
(e.g., completeness, compactness, convergence, continuity, etc.) in a normed
linear space $\{X; \|\cdot\|\}$, we mean with respect to the metric space $\{X; \rho\}$,
where $\rho(x, y) = \|x - y\|$. Also, whenever we use metric space properties on $F$,
i.e., on $R$ or $C$, we mean with respect to the usual metric on $R$ or $C$, respectively.
With the foregoing in mind, we now introduce the following important
concept.

6.1.4. Definition. A complete normed linear space is called a Banach
space.

Thus, $\{X; \|\cdot\|\}$ is a Banach space if and only if $\{X; \rho\}$ is a complete
metric space, where $\rho(x, y) = \|x - y\|$.

6.1.5. Example. Let $X = R^n$, the space of $n$-tuples of real numbers, or let
$X = C^n$, the space of $n$-tuples of complex numbers. From Example 3.1.10 we
see that $X$ is a vector space. For $x \in X$ given by $x = (\xi_1, \ldots, \xi_n)$, and for
$p \in R$ such that $1 \le p < \infty$, define
$$\|x\|_p = [|\xi_1|^p + \cdots + |\xi_n|^p]^{1/p}.$$
We can readily verify that $\|\cdot\|_p$ satisfies the axioms of a norm. Axioms
(i), (ii), (iii) of Definition 6.1.1 follow trivially, while axiom (iv) is a direct
consequence of Minkowski's inequality for finite sums (5.2.6). Letting
$\rho_p(x, y) = \|x - y\|_p$, then $\{X; \rho_p\}$ is the metric space of Exercise 5.5.25.
Since $\{X; \rho_p\}$ is complete, it follows that $\{R^n; \|\cdot\|_p\}$ and $\{C^n; \|\cdot\|_p\}$ are
Banach spaces.
We may also define a norm on $X$ by letting
$$\|x\|_\infty = \max_{1 \le i \le n} |\xi_i|.$$
It can readily be verified that $\{R^n; \|\cdot\|_\infty\}$ and $\{C^n; \|\cdot\|_\infty\}$ are also Banach
spaces (see Exercise 5.5.25). •
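As a concrete illustration (a sketch of our own, not from the text), the norms of Example 6.1.5 and axiom (iv) can be checked numerically:

    # Compute ||x||_p and ||x||_inf from Example 6.1.5, and spot-check the
    # triangle inequality ||x + y||_p <= ||x||_p + ||y||_p (Minkowski).

    def norm_p(x, p):
        return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

    def norm_inf(x):
        return max(abs(xi) for xi in x)

    x, y = [3.0, -4.0], [1.0, 2.0]
    for p in (1, 2, 5):
        lhs = norm_p([a + b for a, b in zip(x, y)], p)
        rhs = norm_p(x, p) + norm_p(y, p)
        print(p, lhs <= rhs + 1e-12)   # True for every p >= 1
    print(norm_inf(x))                 # 4.0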

6.1.6. Example. Let $X = R^\infty$ (see Example 3.1.11) or $X = C^\infty$ (see Example
3.1.13), let $1 \le p < \infty$, and as in Example 5.3.5, let
$$l_p = \left\{ x \in X \colon \sum_{i=1}^\infty |\xi_i|^p < \infty \right\}.$$
Define
$$\|x\|_p = \left( \sum_{i=1}^\infty |\xi_i|^p \right)^{1/p}. \tag{6.1.7}$$
It is readily verified that $\|\cdot\|_p$ is a norm on the linear space $l_p$. Axioms
(i), (ii), (iii) of Definition 6.1.1 follow trivially, while axiom (iv), the triangle
inequality, follows from Minkowski's inequality for infinite sums (5.2.7).
Invoking Example 5.5.26, it also follows that $\{l_p; \|\cdot\|_p\}$ is a Banach space.
Henceforth, when we simply refer to the Banach space $l_p$, we assume that the
norm on this space is given by Eq. (6.1.7).
Letting $p = \infty$ and
$$l_\infty = \left\{ x \in X \colon \sup_i |\xi_i| < \infty \right\}$$
(refer to Example 5.3.8), and defining
$$\|x\|_\infty = \sup_i |\xi_i|, \tag{6.1.8}$$
it is readily verified that $\{l_\infty; \|\cdot\|_\infty\}$ is also a Banach space. When we simply
refer to the Banach space $l_\infty$, we have in mind the norm given in Eq.
(6.1.8). •

6.1.9. Example
(a) Let $\mathcal{C}[a, b]$ denote the linear space of real continuous functions on
the interval $[a, b]$, as given in Example 3.1.19. For $x \in \mathcal{C}[a, b]$ define
$$\|x\|_p = \left[ \int_a^b |x(t)|^p\, dt \right]^{1/p}, \quad 1 \le p < \infty.$$
It is easily shown that $\{\mathcal{C}[a, b]; \|\cdot\|_p\}$ is a normed linear space. Axioms
(i)-(iii) of Definition 6.1.1 follow trivially, while axiom (iv) follows from the
Minkowski inequality for integrals (5.2.8). Let $\rho_p(x, y) = \|x - y\|_p$. Then
$\{\mathcal{C}[a, b]; \rho_p\}$ is a metric space which is not complete (see Example 5.5.29,
where we considered the special case $p = 2$). It follows that $\{\mathcal{C}[a, b]; \|\cdot\|_p\}$
is not a Banach space.
Next, define on the linear space $\mathcal{C}[a, b]$ the function $\|\cdot\|_\infty$ by
$$\|x\|_\infty = \sup_{t \in [a, b]} |x(t)|.$$
It is readily shown that $\{\mathcal{C}[a, b]; \|\cdot\|_\infty\}$ is a normed linear space. Let
$\rho_\infty(x, y) = \|x - y\|_\infty$. In accordance with Example 5.5.28, $\{\mathcal{C}[a, b]; \rho_\infty\}$ is a
complete metric space, and thus $\{\mathcal{C}[a, b]; \|\cdot\|_\infty\}$ is a Banach space.
The above discussion can be modified in an obvious way for the case
where $\mathcal{C}[a, b]$ consists of complex-valued continuous functions defined on
$[a, b]$. Here vector addition and multiplication of vectors by scalars are defined
similarly as in Eqs. (3.1.20) and (3.1.21), respectively. Furthermore, it is
easy to show that $\{\mathcal{C}[a, b]; \|\cdot\|_p\}$, $1 \le p < \infty$, and $\{\mathcal{C}[a, b]; \|\cdot\|_\infty\}$ are
normed linear spaces with norms defined similarly as above. Once more, the
space $\{\mathcal{C}[a, b]; \|\cdot\|_p\}$, $1 \le p < \infty$, is not a Banach space, while the space
$\{\mathcal{C}[a, b]; \|\cdot\|_\infty\}$ is.
(b) The metric space $\{L_p[a, b]; \rho_p\}$ was defined in Example 5.5.31. It
can be shown that $L_p[a, b]$ is a vector space over $R$. If we let
$$\|f\|_p = \left[ \int_{[a, b]} |f|^p\, d\mu \right]^{1/p},$$
$p \ge 1$, for $f \in L_p[a, b]$, where the integral is the Lebesgue integral, then
$\{L_p[a, b]; \|\cdot\|_p\}$ is a Banach space, since $\{L_p[a, b]; \rho_p\}$ is complete, where
$\rho_p(x, y) = \|x - y\|_p$. •
6.1.10. Example. Let $\{X; \|\cdot\|_x\}$, $\{Y; \|\cdot\|_y\}$ be two normed linear spaces
over $F$, and let $X \times Y$ denote the Cartesian product of $X$ and $Y$. Defining
vector addition on $X \times Y$ by
$$(x_1, y_1) + (x_2, y_2) = (x_1 + x_2, y_1 + y_2)$$
and multiplication of vectors by scalars as
$$\alpha(x, y) = (\alpha x, \alpha y),$$
we can readily show that $X \times Y$ is a linear space (see Eqs. (3.2.14), (3.2.15)
and the related discussion). This space can be used to generate a normed
linear space $\{X \times Y; \|\cdot\|\}$ by defining the norm $\|\cdot\|$ as
$$\|(x, y)\| = \|x\|_x + \|y\|_y.$$
Furthermore, if $\{X; \|\cdot\|_x\}$ and $\{Y; \|\cdot\|_y\}$ are Banach spaces, then it is
easily shown that $\{X \times Y; \|\cdot\|\}$ is also a Banach space. •

6.1.11. Exercise. Verify the assertions made in Examples 6.1.5 through


6.1.10.

We note that in a normed linear space $\{X; \|\cdot\|\}$ a sphere $S(x_0; r)$ with
center $x_0 \in X$ and radius $r > 0$ is given by
$$S(x_0; r) = \{x \in X \colon \|x - x_0\| < r\}. \tag{6.1.12}$$
Referring to Theorem 5.4.27 and Exercise 5.4.31, recall that in a metric
space the closure of a sphere (denoted by $\bar{S}(x_0; r)$) need not coincide with the
closed sphere (denoted by $K(x_0; r)$). In a normed linear space we have the
following result.

6.1.13. Theorem. Let $X$ be a normed linear space, let $x_0 \in X$, and let
$r > 0$. Let $\bar{S}(x_0; r)$ denote the closure of the open sphere $S(x_0; r)$ given by
Eq. (6.1.12). Then $\bar{S}(x_0; r) = K(x_0; r)$, the closed sphere, where
$$K(x_0; r) = \{x \in X \colon \|x - x_0\| \le r\}. \tag{6.1.14}$$
Proof. By Exercise 5.4.31 we know that $\bar{S}(x_0; r) \subset K(x_0; r)$. Thus, we
need only show that $K(x_0; r) \subset \bar{S}(x_0; r)$. It is clearly sufficient to show that
$\{x \in X \colon \|x - x_0\| = r\} \subset \bar{S}(x_0; r)$. To do so, let $x$ be such that $\|x - x_0\| = r$,
and let $0 < \epsilon < 1$. Let $y = \epsilon x_0 + (1 - \epsilon)x$. Then $y - x_0 = (1 - \epsilon)(x - x_0)$.
Thus, $\|y - x_0\| = |1 - \epsilon| \cdot \|x - x_0\| < r$, and so $y \in S(x_0; r)$.
Also, $y - x = \epsilon(x_0 - x)$. Therefore, $\|y - x\| = \epsilon \cdot r$. This means that
$x \in \bar{S}(x_0; r)$, which completes the proof. •

Thus, in a normed linear space we may call $\bar{S}(x_0; r)$ the closed sphere
given by Eq. (6.1.14).
When regarded as a function from $X$ into $R$, a norm has the following
important property.

6.1.15. Theorem. Let $\{X; \|\cdot\|\}$ be a normed linear space. Then $\|\cdot\|$ is
a continuous mapping of $X$ into $R$.

Proof. We view $\|\cdot\|$ as a mapping from the metric space $\{X; \rho\}$, $\rho(x, y) =
\|x - y\|$, into the real numbers with the usual metric for $R$. Thus, for given
$\epsilon > 0$, we wish to show that there is a $\delta > 0$ such that $\|x - y\| < \delta$ implies
$|\|x\| - \|y\|| < \epsilon$. Now let $z = x - y$. Then $x = z + y$ and so $\|x\| \le
\|z\| + \|y\|$. This implies that $\|x\| - \|y\| \le \|z\|$. Similarly, $y = x - z$,
and so $\|y\| \le \|x\| + \|{-z}\| = \|x\| + \|z\|$. Thus, $\|y\| - \|x\| \le \|z\|$. It
now follows that $|\|x\| - \|y\|| \le \|z\| = \|x - y\|$. Letting $\delta = \epsilon$, the
desired result follows. •

In this chapter we will not always require that a particular normed linear
space be a Banach space. Nonetheless, many important results of analysis
require the completeness property. This is also true in applications. For
example, in the solution of various types of equations (such as non-linear
differential equations, integral equations, etc.), in optimization problems,
in non-linear feedback problems, and in approximation theory, as well as in
many other areas of application, we frequently obtain the desired solution
in the form of a sequence generated by means of some iterative scheme, in
which each succeeding member is closer to the desired solution than its
predecessor. Even though the precise solution to which a sequence of this
type may converge is unknown, it is usually imperative that the sequence
converge to an element in the space which happens to be the setting of the
particular problem in question.

6.2. LINEAR SUBSPACES

We now turn our attention briefly to linear subspaces of a normed linear
space. We first recall Definition 3.2.1: a non-empty subset $Y$ of a vector
space $X$ is called a linear subspace in $X$ if (i) $x + y \in Y$ whenever $x$ and $y$
are in $Y$, and (ii) $\alpha x \in Y$ whenever $\alpha \in F$ and $x \in Y$. Next, consider a
normed linear space $\{X; \|\cdot\|\}$, let $Y$ be a linear subspace in $X$, and let $\|\cdot\|_1$
denote the restriction of $\|\cdot\|$ to $Y$; i.e.,
$$\|x\|_1 = \|x\| \quad \text{for all } x \in Y.$$
Then it is easy to show that $\{Y; \|\cdot\|_1\}$ is also a normed linear space. We
call $\|\cdot\|_1$ the norm induced by $\|\cdot\|$ on $Y$, and we say that $\{Y; \|\cdot\|_1\}$ is a
normed linear subspace of $\{X; \|\cdot\|\}$, or simply a linear subspace of $X$. Since
there is usually no room for confusion, we drop the subscript and simply
denote this subspace by $\{Y; \|\cdot\|\}$. In fact, when it is clear which norm is
being used, we usually refer to the normed linear spaces $X$ and $Y$.
Our first result is an immediate consequence of Theorem 5.5.33.

6.2.1. Theorem. Let $X$ be a Banach space, and let $Y$ be a linear subspace
of $X$. Then $Y$ is a Banach space if and only if $Y$ is closed.

In the following we give an example of a linear subspace of a Banach space
which is not closed.

6.2.2. Example. Let $X$ be the Banach space $l_1$ of Example 6.1.6, and let
$Y$ be the space of finitely non-zero sequences given in Example 3.1.14. It is
easily shown that $Y$ is a linear subspace of $X$. To show that $Y$ is not closed,
consider the sequence $\{y_n\}$ in $Y$ defined by
$$y_1 = (1, 0, 0, \ldots),$$
$$y_2 = (1, 1/2, 0, 0, \ldots),$$
$$y_3 = (1, 1/2, 1/4, 0, 0, \ldots),$$
$$\vdots$$
$$y_n = (1, 1/2, \ldots, 1/2^{n-1}, 0, 0, \ldots).$$
This sequence converges to the point $x = (1, 1/2, \ldots, 1/2^{n-1}, 1/2^n, \ldots) \in X$.
Since $x \notin Y$, it follows from part (iii) of Theorem 5.5.8 that $Y$ is not a
closed subset of $X$. •
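To make the convergence claimed here explicit (a one-line computation we supply), note that in the $l_1$ norm
$$\|y_n - x\|_1 = \sum_{k = n+1}^{\infty} \frac{1}{2^{k-1}} = \frac{1}{2^{n-1}} \to 0 \quad \text{as } n \to \infty,$$
so $y_n \to x$, while $x$ has infinitely many non-zero entries and hence lies outside $Y$.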

Next, we prove:

6.2.3. Theorem. Let $X$ be a Banach space, let $Y$ be a linear subspace of
$X$, and let $\bar{Y}$ denote the closure of $Y$. Then $\bar{Y}$ is a closed linear subspace of $X$.

Proof. Since $\bar{Y}$ is closed, we only have to show that $\bar{Y}$ is a linear subspace.
Let $x, y \in \bar{Y}$, and let $\epsilon > 0$. Then there exist elements $x', y' \in Y$ such that
$\|x - x'\| < \epsilon$ and $\|y - y'\| < \epsilon$. Hence, for arbitrary $\alpha, \beta \in F$, $\alpha x' +
\beta y' \in Y$. Now $\|(\alpha x + \beta y) - (\alpha x' + \beta y')\| = \|\alpha(x - x') + \beta(y - y')\| \le
|\alpha| \cdot \|x - x'\| + |\beta| \cdot \|y - y'\| < (|\alpha| + |\beta|)\epsilon$. Since $\epsilon > 0$ is arbitrary,
this implies that $\alpha x + \beta y$ is an adherent point of $Y$; i.e., $\alpha x + \beta y \in \bar{Y}$.
This completes the proof of the theorem. •

We conclude this section with the following useful result.

6.2.4. Theorem. Let $X$ be a normed linear space, and let $Y$ be a linear
subspace of $X$. If $Y$ is an open subset of $X$, then $Y = X$.

Proof. Let $x \in X$. We wish to show that $x \in Y$. Since $0 \in Y$, we may
assume that $x \ne 0$. Since $Y$ is open and $0 \in Y$, there is some $\epsilon > 0$ such that
the sphere $S(0; \epsilon) \subset Y$. Let $z = \dfrac{\epsilon}{2\|x\|} x$. Then $\|z\| < \epsilon$ and so $z \in Y$. Since
$Y$ is a linear subspace, it follows that $\dfrac{2\|x\|}{\epsilon} z = x \in Y$. •

6.3. INFINITE SERIES

Having defined a norm on a linear space, we are in a position to consider
the concept of infinite series in a meaningful way. Throughout this section
we refer to a normed linear space $\{X; \|\cdot\|\}$ simply as $X$.

6.3.1. Definition. Let $\{x_n\}$ be a sequence of elements in $X$. For each positive
integer $m$, let
$$y_m = x_1 + \cdots + x_m.$$
We call $\{y_m\}$ the sequence of partial sums of $\{x_n\}$. If the sequence $\{y_m\}$
converges to a limit $y \in X$, we say the infinite series
$$x_1 + x_2 + \cdots + x_k + \cdots = \sum_{n=1}^\infty x_n$$
converges, and we write
$$y = \sum_{n=1}^\infty x_n.$$
We say the infinite series $\sum_{n=1}^\infty x_n$ diverges if the sequence $\{y_m\}$ diverges.

The following result yields sufficient conditions for an infinite series to
converge.

6.3.2. Theorem. Let $X$ be a Banach space, and let $\{x_n\}$ be a sequence in $X$.
If $\sum_{n=1}^\infty \|x_n\| < \infty$, then
(i) the infinite series $\sum_{n=1}^\infty x_n$ converges; and
(ii) $\left\| \sum_{n=1}^\infty x_n \right\| \le \sum_{n=1}^\infty \|x_n\|$.

Proof. To prove the first part, let $y_m = x_1 + \cdots + x_m$. If $n > m$, then
$y_n - y_m = x_{m+1} + \cdots + x_n$. Hence,
$$\|y_n - y_m\| \le \|x_{m+1}\| + \cdots + \|x_n\|.$$
Since $\sum_{n=1}^\infty \|x_n\|$ is a convergent infinite series of real numbers, the sequence
of partial sums $s_m = \|x_1\| + \cdots + \|x_m\|$ is Cauchy. Hence, given $\epsilon > 0$,
there is a positive integer $N$ such that $n > m > N$ implies $|s_n - s_m| \le \epsilon$.
But $|s_n - s_m| \ge \|y_n - y_m\|$, and so $\{y_m\}$ is a Cauchy sequence. Since $X$ is
complete, $\{y_m\}$ is convergent and conclusion (i) follows.
To prove the second part, let $y_m = x_1 + \cdots + x_m$, and let $y =
\lim_{m \to \infty} y_m = \sum_{n=1}^\infty x_n$. Then for each positive integer $m$ we have $y = (y - y_m) + y_m$ and
$\|y\| \le \|y - y_m\| + \|y_m\| \le \|y - y_m\| + \sum_{i=1}^m \|x_i\|$. Taking the limit as
$m \to \infty$, we have $\left\| \sum_{i=1}^\infty x_i \right\| \le \sum_{i=1}^\infty \|x_i\|$. •
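A familiar special case may help fix ideas (a standard observation we add here, not part of the text): in the Banach space $\{\mathcal{C}[a, b]; \|\cdot\|_\infty\}$ of Example 6.1.9, Theorem 6.3.2 is precisely the Weierstrass M-test. Indeed, if $\|x_n\|_\infty \le M_n$ with $\sum_n M_n < \infty$, then part (i) guarantees that $\sum_n x_n$ converges in the $\|\cdot\|_\infty$ norm, i.e., uniformly on $[a, b]$.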

6.4. CONVEX SETS

In the present section we consider the concepts of convexity and cones,
which arise naturally in many applications. Throughout this section, $X$ is a
real normed linear space.
Let $x$ and $y$ be two elements of $X$. We call the set $\overline{xy}$, defined by
$$\overline{xy} = \{z \in X \colon z = \alpha x + (1 - \alpha)y \text{ for all } \alpha \in R \text{ such that } 0 \le \alpha \le 1\},$$
the line segment joining $x$ and $y$. Convex sets are now characterized as
follows.

6.4.1. Definition. Let $Y$ be a subset of $X$. Then $Y$ is said to be convex if
$Y$ contains the line segment $\overline{xy}$ whenever $x$ and $y$ are two arbitrary points
in $Y$. A convex set is called a convex body if it contains at least one interior
point, i.e., if it completely contains some sphere.

In Figure A we depict a line segment $\overline{xy}$, a convex set, and a non-convex
set in $R^2$.

6.4.2. Figure A. A line segment $\overline{xy}$, a convex set, and a non-convex set.

Note that an equivalent statement for $Y$ to be convex is that if $x, y \in Y$,
then $\alpha x + \beta y \in Y$ whenever $\alpha$ and $\beta$ are positive constants such that
$\alpha + \beta = 1$.
We cite a few examples.

6.4.3. Example. The empty set is convex. Also, a set consisting of one
point is convex. In $R^3$, a cube and a sphere are convex bodies, while a plane
and a line segment are convex sets but not convex bodies. Any linear subspace
of $X$ is a convex set. Also, any linear variety of $X$ (see Definition 3.2.17)
is a convex set. •

6.4.4. Example. Let $Y$ and $Z$ be convex sets in $X$, let $\alpha, \beta \in R$, and let
$\alpha Y = \{x \in X \colon x = \alpha y,\ y \in Y\}$. Then the set $\alpha Y + \beta Z$ is a convex set in
$X$. •
6.4.5. Exercise. Prove the assertions made in Examples 6.4.3 and 6.4.4.

6.4.6. Theorem. Let $Y$ be a convex set in $X$, and let $\alpha, \beta \in R$ be positive
scalars. Then $(\alpha + \beta)Y = \alpha Y + \beta Y$.

Proof. Regardless of convexity, if $x \in (\alpha + \beta)Y$, then $x = (\alpha + \beta)y =
\alpha y + \beta y \in \alpha Y + \beta Y$, and thus $(\alpha + \beta)Y \subset \alpha Y + \beta Y$. Now let $Y$ be
convex, and let $x = \alpha y + \beta z$, where $y, z \in Y$. Then
$$\frac{1}{\alpha + \beta} x = \frac{\alpha}{\alpha + \beta} y + \frac{\beta}{\alpha + \beta} z \in Y,$$
because
$$\frac{\alpha}{\alpha + \beta} + \frac{\beta}{\alpha + \beta} = 1.$$
Therefore, $x \in (\alpha + \beta)Y$, and thus $\alpha Y + \beta Y \subset (\alpha + \beta)Y$. This completes
the proof. •

We leave the proof of the next result as an exercise.

6.4.7. Theorem. Let $\mathcal{C}$ be an arbitrary collection of convex sets. Then the
intersection $\bigcap_{Y \in \mathcal{C}} Y$ is also a convex set.

6.4.8. Exercise. Prove Theorem 6.4.7.

The preceding result gives rise to the following concept.

6.4.9. Definition. Let $Y$ be any set in $X$. The convex hull of $Y$, also called
the convex cover of $Y$, denoted by $Y_c$, is the intersection of all convex sets
which contain $Y$.

We note that the convex hull of $Y$ is the smallest convex set which contains
$Y$. Examples of convex covers of sets in $R^2$ are depicted in Figure B.

6.4.10. Figure B. Convex hulls.

6.4.11. Theorem. Let $Y$ be any set in $X$. The convex hull of $Y$ is the set
of points expressible as $\alpha_1 y_1 + \alpha_2 y_2 + \cdots + \alpha_n y_n$, where $y_1, \ldots, y_n \in Y$,
where $\alpha_i > 0$, $i = 1, \ldots, n$, where $\sum_{i=1}^n \alpha_i = 1$, and where $n$ is not fixed.

Proof. If $Z$ is the set of elements expressible as described above, then clearly
$Z$ is convex. Moreover, $Y \subset Z$, and hence $Y_c \subset Z$. To show that $Z \subset Y_c$,
we show that $Z$ is contained in every convex set which contains $Y$. We do so
by induction on the number of elements of $Y$ that appear in the representation
of an element of $Z$. Let $U$ be a convex set with $U \supset Y$. If $z = \alpha_1 z_1 \in Z$ for
which $n = 1$, then $\alpha_1 = 1$ and $z \in U$. Now assume that an element of $Z$ is
in $U$ if it is represented in terms of $n - 1$ elements of $Y$. Let $z = \alpha_1 z_1 + \cdots
+ \alpha_n z_n$ be in $Z$, let $\beta = \alpha_1 + \cdots + \alpha_{n-1}$, let $\beta_i = \alpha_i/\beta$, $i = 1, \ldots, n - 1$,
and let $u = \beta_1 z_1 + \cdots + \beta_{n-1} z_{n-1}$. Then $u \in U$, by the induction hypothesis.
But $z_n \in U$, $\alpha_n = 1 - \beta$, and $z = \beta u + (1 - \beta)z_n \in U$, since $U$ is convex.
This completes the induction, and thus $Z \subset U$, from which it follows that
$Z \subset Y_c$. •

6.4.12. Theorem. Let $Y$ be a convex set in $X$. Then the closure $\bar{Y}$ of $Y$
is also a convex set.

6.4.13. Exercise. Prove Theorem 6.4.12.

Since the intersection of any number of closed sets is always closed, it
follows from Theorem 6.4.7 that the intersection of an arbitrary number of
closed convex sets is also a closed convex set.
We now consider some interesting aspects of norms in terms of convex
sets.

6.4.14. Theorem. Any sphere in $X$ is a convex set.

Proof. We consider, without loss of generality, the unit sphere
$$Y = \{x \in X \colon \|x\| < 1\}.$$
If $x_0, y_0 \in Y$, then $\|x_0\| < 1$ and $\|y_0\| < 1$. Now if $\alpha > 0$ and $\beta > 0$, where
$\alpha + \beta = 1$, then $\|\alpha x_0 + \beta y_0\| \le \|\alpha x_0\| + \|\beta y_0\| = \alpha \|x_0\| + \beta \|y_0\| <
\alpha + \beta = 1$, and thus $\alpha x_0 + \beta y_0 \in Y$. •

In view of Theorems 6.1.13, 6.4.12, and 6.4.14, it follows that a closed
sphere $\bar{S}(x_0; r)$ is also convex. The following example, cast in $R^2$, is rather
instructive.

6.4.15. Example. On $R^2$ we define the norms $\|\cdot\|_p$ of Example 6.1.5. A
moment's reflection reveals that in the case of $\|\cdot\|_2$, the unit sphere is a circle
of radius 1; when the norm is $\|\cdot\|_\infty$, the unit sphere is a square with vertices
$(1, 1)$, $(1, -1)$, $(-1, 1)$, $(-1, -1)$; if the norm is $\|\cdot\|_1$, the unit sphere is
the square with vertices $(0, 1)$, $(1, 0)$, $(-1, 0)$, $(0, -1)$. If for the unit sphere
corresponding to $\|\cdot\|_p$ we let $p$ increase from 1 to $\infty$, then this sphere will
deform in a continuous manner from the square corresponding to $\|\cdot\|_1$
to the square corresponding to $\|\cdot\|_\infty$. This is depicted in Figure C. We note
that in all cases the unit sphere results in a convex set.
For the case of the real-valued function
$$\|x\|_p = [|\xi_1|^p + |\xi_2|^p]^{1/p}, \quad 0 < p < 1, \tag{6.4.16}$$
the set determined by $\|x\|_p \le 1$ results in a set which is not convex. In particular,
if $p = 2/3$, the set determined by $\|x\|_p \le 1$ yields the boundary and the
interior of an astroid, as shown in Figure C. The reason for the non-convexity
of this set can be found in the fact that the function (6.4.16) does not represent
a norm. In particular, it can be shown that (6.4.16) does not satisfy the triangle
inequality. •

6.4.17. Figure C. Unit spheres for Example 6.4.15.
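The failure of the triangle inequality for (6.4.16) is easy to exhibit concretely (a check of our own, not from the text): with $p = 2/3$, $x = (1, 0)$, and $y = (0, 1)$, we get $\|x + y\|_p = 2^{3/2} \approx 2.83$ while $\|x\|_p + \|y\|_p = 2$.

    # For p < 1 the functional (|xi_1|^p + |xi_2|^p)^(1/p) violates the
    # triangle inequality; p = 2/3 with x = (1, 0), y = (0, 1) is a witness.

    def f_p(x, p):
        return (abs(x[0]) ** p + abs(x[1]) ** p) ** (1.0 / p)

    p = 2.0 / 3.0
    x, y = (1.0, 0.0), (0.0, 1.0)
    lhs = f_p((x[0] + y[0], x[1] + y[1]), p)   # (1 + 1)^(3/2) = 2.828...
    rhs = f_p(x, p) + f_p(y, p)                # 1 + 1 = 2
    print(lhs > rhs)                           # True: axiom (iv) fails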



6.4.18. Exercise. Verify the assertions made in Example 6.4.15.

We conclude this section by introducing the notion of a cone.

6.4.19. Definition. A set $Y$ in $X$ is called a cone with vertex at the origin
if $y \in Y$ implies that $\alpha y \in Y$ for all $\alpha > 0$. If $Y$ is a cone with vertex at the
origin, then the set $x_0 + Y$, $x_0 \in X$, is called a cone with vertex $x_0$. A convex
cone is a set which is both convex and a cone.

In Figure D examples of cones are shown.

6.4.20. Figure D. (a) Cone. (b) Convex cone.

6.5. LINEAR FUNCTIONALS

Throughout this section $X$ is a normed linear space.
We recall that a mapping $f$ from $X$ into $F$ is called a functional on $X$
(see Definition 3.5.1). If $f$ is also linear, i.e., $f(\alpha x + \beta y) = \alpha f(x) + \beta f(y)$
for all $\alpha, \beta \in F$ and all $x, y \in X$, then $f$ is called a linear functional (refer to
Definition 3.5.1). Recall further that $X'$, the set of all linear functionals on
$X$, is a linear space over $F$ (see Theorem 3.5.16). Let $f \in X'$ and $x \in X$. In
accordance with Eq. (3.5.10), we use the notation
$$f(x) = \langle x, f \rangle \tag{6.5.1}$$
to denote the value of $f$ at $x$. Alternatively, we sometimes find it convenient
to let $x' \in X'$ denote a linear functional defined on $X$ and write (see Eq.
(3.5.11))
$$x'(x) = \langle x, x' \rangle. \tag{6.5.2}$$
Invoking Definition 5.7.1, we note that continuity of a functional at a
point $x_0 \in X$ means, in the present context, that for every $\epsilon > 0$ there is a
$\delta > 0$ such that $|f(x) - f(x_0)| < \epsilon$ whenever $\|x - x_0\| < \delta$. Our first

result shows that if a linear functional on $X$ is continuous at one point of $X$,
then it is continuous at all points of $X$.

6.5.3. Theorem. If a linear functional $f$ on $X$ is continuous at some point
$x_0 \in X$, then it is continuous for all $x \in X$.

Proof. If $\{y_n\}$ is a sequence in $X$ such that $y_n \to x_0$, then $f(y_n) \to f(x_0)$, by
Theorem 5.7.8. Now let $\{x_n\}$ be a sequence in $X$ converging to $x \in X$. Then
the sequence $\{y_n\}$ in $X$ given by $y_n = x_n - x + x_0$ converges to $x_0$. By the
linearity of $f$, we have
$$f(x_n) - f(x) = f(y_n) - f(x_0).$$
Since $|f(y_n) - f(x_0)| \to 0$ as $y_n \to x_0$, we have $|f(x_n) - f(x)| \to 0$ as
$x_n \to x$, and therefore $f$ is continuous at $x \in X$. Since $x$ is arbitrary, the
proof of the theorem is complete. •

It is clear that if $f$ is a linear functional and if $f(x) \ne 0$ for some $x \in X$,
then the range of $f$ is all of $F$; i.e., $\Re(f) = F$.
For linear functionals we define boundedness as follows.

6.5.4. Definition. A linear functional $f$ on $X$ is said to be bounded if there
exists a real constant $M > 0$ such that
$$|f(x)| \le M\|x\|$$
for all $x \in X$. If $f$ is not bounded, then it is said to be unbounded.

The following theorem shows that continuity and boundedness of linear
functionals are equivalent.

6.5.5. Theorem. A linear functional $f$ on a normed linear space $X$ is
bounded if and only if it is continuous.

Proof. Assume that $f$ is bounded, and let $M$ be such that $|f(x)| \le M\|x\|$
for all $x \in X$. If $x_n \to 0$, then $|f(x_n)| \le M\|x_n\| \to 0$. Hence, $f$ is continuous
at $x = 0$. From Theorem 6.5.3 it follows that $f$ is continuous for all $x \in X$.
Conversely, assume that $f$ is continuous at $x = 0$ and hence at any $x \in X$.
There is a $\delta > 0$ such that $|f(x)| \le 1$ whenever $\|x\| \le \delta$. Now for any
$x \ne 0$ we have $\left\| \delta x / \|x\| \right\| = \delta$, and thus
$$|f(x)| = \left| f\!\left( \frac{\|x\|}{\delta} \cdot \frac{\delta x}{\|x\|} \right) \right| = \frac{\|x\|}{\delta} \left| f\!\left( \frac{\delta x}{\|x\|} \right) \right| \le \frac{\|x\|}{\delta}.$$
If we let $M = 1/\delta$, then $|f(x)| \le M\|x\|$, and $f$ is bounded. •

We will see later, in Example 6.5.17, that there may exist linear functionals
on a normed linear space which are unbounded. The class of linear functionals
which are bounded has some interesting properties.

6.5.6. Theorem. Let $X'$ be the vector space of all linear functionals on
$X$, and let $X^*$ denote the family of all bounded linear functionals on $X$.
Define the function $\|\cdot\| \colon X^* \to R$ by
$$\|f\| = \sup_{x \ne 0} \frac{|f(x)|}{\|x\|} \quad \text{for } f \in X^*. \tag{6.5.7}$$
Then
(i) $X^*$ is a linear subspace of $X'$;
(ii) the function $\|\cdot\|$ defined in Eq. (6.5.7) is a norm on $X^*$; and
(iii) the normed space $\{X^*; \|\cdot\|\}$ is complete.

Proof. The proof of part (i) is straightforward and is left as an exercise.
To prove part (ii), note that if $f \ne 0$, then $\|f\| > 0$, and if $f = 0$, then
$\|f\| = 0$. Also, since
$$\sup_{x \ne 0} \frac{|\alpha f(x)|}{\|x\|} = |\alpha| \sup_{x \ne 0} \frac{|f(x)|}{\|x\|},$$
it follows that $\|\alpha f\| = |\alpha| \|f\|$. Finally,
$$\|f_1 + f_2\| = \sup_{x \ne 0} \frac{|f_1(x) + f_2(x)|}{\|x\|} \le \sup_{x \ne 0} \frac{|f_1(x)| + |f_2(x)|}{\|x\|} \le \sup_{x \ne 0} \frac{|f_1(x)|}{\|x\|} + \sup_{x \ne 0} \frac{|f_2(x)|}{\|x\|} = \|f_1\| + \|f_2\|.$$
Hence, $\|\cdot\|$ satisfies the axioms of a norm.
To prove part (iii), let $\{x_n'\} \subset X^*$ be a Cauchy sequence. Then $\|x_n' - x_m'\|
\to 0$ as $m, n \to \infty$. If we evaluate this sequence at any $x \in X$, then $\{x_n'(x)\}$
is a Cauchy sequence of scalars, because $|x_n'(x) - x_m'(x)| \le \|x_n' - x_m'\| \cdot \|x\|$.
This implies that for each $x \in X$ there is a scalar $x'(x)$ such that $x_n'(x) \to
x'(x)$. We observe that
$$x'(\alpha x + \beta y) = \lim_{n \to \infty} x_n'(\alpha x + \beta y) = \lim_{n \to \infty} [\alpha x_n'(x) + \beta x_n'(y)] = \alpha \lim_{n \to \infty} x_n'(x) + \beta \lim_{n \to \infty} x_n'(y) = \alpha x'(x) + \beta x'(y),$$
and thus $x'$ is a linear functional. Next we show that $x'$
is bounded. Since $\{x_n'\}$ is a Cauchy sequence, for $\epsilon > 0$ there is an $M$ such that
$|x_n'(x) - x_m'(x)| \le \epsilon \|x\|$ for all $m, n \ge M$ and for all $x \in X$. But $x_n'(x) \to
x'(x)$, and hence $|x'(x) - x_m'(x)| \le \epsilon \|x\|$ for all $m \ge M$. It now follows that
$$|x'(x)| = |x'(x) - x_m'(x) + x_m'(x)| \le |x'(x) - x_m'(x)| + |x_m'(x)| \le \epsilon \|x\| + \|x_m'\| \cdot \|x\|,$$
and thus $x'$ is a bounded linear functional. Finally, to show that $x_m' \to
x' \in X^*$, we note that $|x'(x) - x_m'(x)| \le \epsilon \|x\|$ whenever $m \ge M$, from which
we have $\|x' - x_m'\| \le \epsilon$ whenever $m \ge M$. This proves the theorem. •

6.5.8. Exercise. Prove part (i) of Theorem 6.5.6.

It is especially interesting to note that $X^*$ is a Banach space whether $X$
is or is not a Banach space. We are now in a position to make the following
definition.

6.5.9. Definition. The set of all bounded linear functionals on a normed
space $X$ is called the normed conjugate space of $X$, or the normed dual of $X$,
or simply the dual of $X$, and is denoted by $X^*$. For $f \in X^*$ we call $\|f\|$
defined by Eq. (6.5.7) the norm of $f$.

The next result states that the norm of a functional can be represented
in various equivalent ways.

6.5.10. Theorem. Let $f$ be a bounded linear functional on $X$, and let $\|f\|$
be the norm of $f$. Then
(i) $\|f\| = \inf_M \{M \colon |f(x)| \le M\|x\| \text{ for all } x \in X\}$;
(ii) $\|f\| = \sup_{\|x\| \le 1} \{|f(x)|\}$; and
(iii) $\|f\| = \sup_{\|x\| = 1} \{|f(x)|\}$.

6.5.11. Exercise. Prove Theorem 6.5.10.

Let us now consider the norms of some specific linear functionals.

6.5.12. Example. Consider the normed linear space $\{\mathcal{C}[a, b]; \|\cdot\|_\infty\}$. The
mapping
$$f(x) = \int_a^b x(s)\, ds, \quad x \in \mathcal{C}[a, b],$$
is a linear functional on $\mathcal{C}[a, b]$ (cf. Example 3.5.2). The norm of this functional
equals $(b - a)$, because
$$|f(x)| = \left| \int_a^b x(s)\, ds \right| \le (b - a) \max_{a \le s \le b} |x(s)|,$$
and the constant function $x(s) \equiv 1$, with $\|x\|_\infty = 1$, attains $f(x) = b - a$. •

6.5.13. Example. Consider the space $\{\mathcal{C}[a, b]; \|\cdot\|_\infty\}$, let $x_0$ be a fixed
element of $\mathcal{C}[a, b]$, and let $x$ be any element of $\mathcal{C}[a, b]$. The mapping
$$f(x) = \int_a^b x(s) x_0(s)\, ds$$
is a linear functional on $\mathcal{C}[a, b]$ (cf. Example 3.5.2). This functional is
bounded, because
$$|f(x)| = \left| \int_a^b x(s) x_0(s)\, ds \right| \le \left( \int_a^b |x_0(s)|\, ds \right) \|x\|_\infty.$$
Since $f$ is bounded and linear, it follows that it is continuous. We leave it to
the reader to show that
$$\|f\| = \int_a^b |x_0(s)|\, ds. \; \text{•}$$

6.5.14. Example. Let $a = (\eta_1, \ldots, \eta_n)$ be a fixed element of $F^n$, and let
$x = (\xi_1, \ldots, \xi_n)$ denote an arbitrary element of $F^n$. Then if
$$f(x) = \sum_{i=1}^n \eta_i \xi_i,$$
it follows that $f$ is a linear functional on $F^n$ (cf. Example 3.5.6). Letting
$\|x\| = [|\xi_1|^2 + \cdots + |\xi_n|^2]^{1/2}$, it follows from the Schwarz inequality
(4.9.29) that
$$|f(x)| \le \|a\| \cdot \|x\|. \tag{6.5.15}$$
Thus, $f$ is bounded and continuous. In order to determine the norm of $f$,
we rewrite (6.5.15) as
$$\frac{|f(x)|}{\|x\|} \le \sup_{x \ne 0} \frac{|f(x)|}{\|x\|} \le \|a\|,$$
from which it follows that $\|f\| \le \|a\|$. Next, by setting $x = a$, we have
$|f(a)| = \|a\|^2$. Thus,
$$\frac{|f(a)|}{\|a\|} = \|a\|.$$
Therefore $\|f\| = \|a\|$. •
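A small numerical illustration of this computation (a sketch of our own, not from the text):

    import math, random

    # f(x) = sum_i eta_i * xi_i on R^n; Example 6.5.14 shows ||f|| = ||a||.
    a = [2.0, -1.0, 3.0]
    norm_a = math.sqrt(sum(t * t for t in a))

    def f(x):
        return sum(ai * xi for ai, xi in zip(a, x))

    # The ratio |f(x)| / ||x|| never exceeds ||a||, and x = a attains it.
    best = 0.0
    for _ in range(10000):
        x = [random.uniform(-1, 1) for _ in a]
        nx = math.sqrt(sum(t * t for t in x))
        if nx > 0:
            best = max(best, abs(f(x)) / nx)
    print(best <= norm_a + 1e-9)                       # True
    print(math.isclose(abs(f(a)) / norm_a, norm_a))    # True: supremum attained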
6.5.16. Example. Analogous to the above example, let $a = (\eta_1, \eta_2, \ldots)$
be a fixed element of the Banach space $l_q$ (see Example 6.1.6), and let $x =
(\xi_1, \xi_2, \ldots)$ be an arbitrary element of $l_p$, where $1/p + 1/q = 1$. It follows
that if
$$f(x) = \sum_{i=1}^\infty \eta_i \xi_i,$$
then $f$ is a linear functional on $l_p$. We can show that $f$ is bounded by observing
that
$$|f(x)| = \left| \sum_{i=1}^\infty \eta_i \xi_i \right| \le \sum_{i=1}^\infty |\eta_i \xi_i| \le \|a\| \cdot \|x\|,$$
which follows from Hölder's inequality for infinite sums (5.2.4). Thus, $f$ is
bounded and, hence, continuous. In a manner similar to that of Example
6.5.14, we can show that $\|f\| = \|a\|$. •

We conclude this section with an example of an unbounded linear
functional.

6.5.17. Example. Consider the space $X$ of finitely non-zero sequences $x =
(\xi_1, \xi_2, \ldots, \xi_n, 0, 0, \ldots)$ (cf. Example 3.1.14). Define $\|\cdot\| \colon X \to R$ as
$\|x\| = \max_i |\xi_i|$. It is easy to show that $\{X; \|\cdot\|\}$ is a normed linear space.
Furthermore, it is readily verified that the mapping
$$f(x) = \sum_{i=1}^\infty i\, \xi_i$$
is an unbounded linear functional on $X$. •
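With this choice of $f$ (a representative unbounded functional on this space; the sum is finite for each $x$, since each $x$ has only finitely many non-zero coordinates), the unboundedness is immediate (a brief verification we supply): for $x_n = (1, 1, \ldots, 1, 0, 0, \ldots)$ with $n$ leading ones, $\|x_n\| = 1$ while
$$f(x_n) = 1 + 2 + \cdots + n = \frac{n(n+1)}{2},$$
so no constant $M$ can satisfy $|f(x)| \le M\|x\|$ for all $x \in X$.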

6.5.18. Exercise. Verify the assertions made in Examples 6.5.12, 6.5.13,


6.5.14, 6.5.16, and 6.5.17.

6.6. FINITE-DIMENSIONAL SPACES

We now briefly turn our attention to finite-dimensional vector spaces.
Throughout this section $X$ denotes a normed linear space.
We recall that if $\{x_1, \ldots, x_n\}$ is a basis for a linear space $X$, then for each
$x \in X$ there is a unique set of scalars $\{\xi_1, \ldots, \xi_n\}$ in $F$, called the coordinates
of $x$ with respect to this basis (see Definition 3.3.36). We now prove the
following result.

6.6.1. Theorem. Let $X$ be a finite-dimensional normed linear space, and
let $\{x_1, \ldots, x_n\}$ be a basis for $X$. For each $x \in X$, let the coordinates of $x$
with respect to this basis be denoted by $(\xi_1, \ldots, \xi_n) \in F^n$. For $i = 1, \ldots,
n$, define the linear functionals $f_i \colon X \to F$ by $f_i(x) = \xi_i$. Then each $f_i$ is a
continuous linear functional.

Proof. The proof that $f_i$ is linear is straightforward. To show that $f_i$ is a
bounded linear functional, we let
$$S = \{a = (\alpha_1, \ldots, \alpha_n) \in F^n \colon |\alpha_1| + |\alpha_2| + \cdots + |\alpha_n| = 1\}.$$
It is left as an exercise to show that $S$ is a compact set in the metric space
$\{F^n; \rho_1\}$ (see Example 5.3.1). Now let us define the function $g \colon S \to R$ by
$$g(a) = \|\alpha_1 x_1 + \cdots + \alpha_n x_n\|.$$
The reader can readily verify that $g$ is a continuous function on $S$. Now let
$m = \inf\{g(a) \colon a \in S\}$. It follows from Theorem 5.7.15 that there is an
$a_0 \in S$ such that $g(a_0) = m$. Note that $m \ne 0$ since $\{x_1, \ldots, x_n\}$ is a basis for
$X$, and also $a_0 \ne 0$. Hence $m > 0$. It now follows that
$$\|\alpha_1 x_1 + \cdots + \alpha_n x_n\| \ge m$$
for every $a = (\alpha_1, \ldots, \alpha_n) \in S$. Since $|\alpha_1| + \cdots + |\alpha_n| = 1$ for $a \in S$, we
see that
$$\|\alpha_1 x_1 + \cdots + \alpha_n x_n\| \ge m(|\alpha_1| + \cdots + |\alpha_n|) \tag{6.6.2}$$
for all $a \in S$.
Next, for arbitrary $x \in X$ with coordinates $(\xi_1, \ldots, \xi_n) \in F^n$, we let
$p = |\xi_1| + \cdots + |\xi_n|$. First, we suppose that $p > 0$. Then
$$\|x\| = \|\xi_1 x_1 + \cdots + \xi_n x_n\| = p \left\| \frac{\xi_1}{p} x_1 + \cdots + \frac{\xi_n}{p} x_n \right\|
\ge p\, m\!\left( \frac{|\xi_1|}{p} + \cdots + \frac{|\xi_n|}{p} \right) = m(|\xi_1| + \cdots + |\xi_n|),$$
where inequality (6.6.2) has been used. Therefore, if $p \ne 0$, we have
$$|\xi_1| + \cdots + |\xi_n| \le \frac{1}{m} \|x\|. \tag{6.6.3}$$
Noting that inequality (6.6.3) is also true if $p = 0$, we conclude that this
inequality is true for all $x \in X$. Since $|f_i(x)| = |\xi_i| \le |\xi_1| + \cdots + |\xi_n|$,
$i = 1, \ldots, n$, we see that $|f_i(x)| \le (1/m)\|x\|$ for any $x \in X$. Hence, $f_i$ is a
bounded linear functional and, consequently, it is continuous. •

6.6.4. Exercise. Prove that the set S and the function g have the properties
asserted in the proof of Theorem 6.6.1.

The preceding theorem allows us to prove the following important result.

6.6.5. Theorem. Let $X$ be a finite-dimensional normed linear space. Then
$X$ is complete.

Proof. Let $\{x_1, \ldots, x_n\}$ be a basis for $X$, let $\{y_k\}$ be a Cauchy sequence in
$X$, and for each $k$ let the coordinates of $y_k$ with respect to $\{x_1, \ldots, x_n\}$ be
given by $(\eta_{k1}, \ldots, \eta_{kn})$. It follows from Theorem 6.6.1 that there is a constant
$M$ such that $|\eta_{kj} - \eta_{ij}| \le M\|y_k - y_i\|$ for $j = 1, \ldots, n$ and all $i, k =
1, 2, \ldots$. Hence, each sequence $\{\eta_{kj}\}$ is a Cauchy sequence in $F$, i.e., in $R$
or $C$, and is therefore convergent. Let $\eta_{0j} = \lim_k \eta_{kj}$ for $j = 1, \ldots, n$. If we
let
$$y_0 = \eta_{01} x_1 + \cdots + \eta_{0n} x_n,$$
it follows that $\{y_k\}$ converges to $y_0$. This proves that $X$ is complete. •

The next result follows from Theorems 6.6.5 and 6.2.1.

6.6.6. Theorem. Let $X$ be a normed linear space, and let $Y$ be a finite-dimensional
linear subspace of $X$. Then (i) $Y$ is complete, and (ii) $Y$ is closed.

6.6.7. Exercise. Prove Theorem 6.6.6.

Our next result is an immediate consequence of Theorem 6.6.1.

6.6.8. Theorem. Let $X$ be a finite-dimensional normed linear space, and
let $f$ be a linear functional on $X$. Then $f$ is continuous.

6.6.9. Exercise. Prove Theorem 6.6.8.

We recall from Definition 5.6.30 and Theorem 5.6.31 that a subset $Y$ of
a metric space $X$ is relatively compact if every sequence of elements in $Y$
contains a subsequence which converges to an element in $X$. This property
can be useful in characterizing finite-dimensional subspaces in an arbitrary
normed linear space, as we shall see in the next theorem. Note also that in
view of Definition 5.1.19 a subset $Y$ in a normed linear space $X$ is bounded
if and only if there is a $\lambda > 0$ such that $\|y\| \le \lambda$ for all $y \in Y$.

6.6.10. Theorem. Let $X$ be a normed linear space, and let $Y$ be a linear subspace of $X$. Then $Y$ is finite dimensional if and only if every bounded subset of $Y$ is relatively compact.

Proof. (Necessity) Assume that $Y$ is finite dimensional, and let $\{x_1, \ldots, x_n\}$ be a basis for $Y$. Then for any $y \in Y$ there is a unique set $\{\eta_1, \ldots, \eta_n\}$ such that $y = \eta_1 x_1 + \cdots + \eta_n x_n$. Let $A$ be a bounded subset of $Y$, and let $\{y_k\}$ be a sequence in $A$. Then we can write $y_k = \eta_{1k} x_1 + \cdots + \eta_{nk} x_n$ for $k = 1, 2, \ldots$. There exists a $\lambda > 0$ such that $\|y_k\| \le \lambda$ for all $k$. Consider $|\eta_{1k}| + \cdots + |\eta_{nk}|$. We wish to show that this sum is bounded. Suppose that it is not. Then for each positive integer $m$ we can find a $y_{k_m}$ such that $|\eta_{1k_m}| + \cdots + |\eta_{nk_m}| = \gamma_m > m$. Now let $y_{k_m}' = (1/\gamma_m)y_{k_m}$. It follows that

$$\|y_{k_m}'\| = \frac{1}{\gamma_m}\|y_{k_m}\| \le \frac{\lambda}{\gamma_m} < \frac{\lambda}{m}.$$

Thus, $y_{k_m}' \to 0$ as $m \to \infty$. On the other hand,

$$y_{k_m}' = \eta_{1k_m}' x_1 + \cdots + \eta_{nk_m}' x_n,$$

where $\eta_{ik_m}' = \eta_{ik_m}/\gamma_m$ for $i = 1, \ldots, n$. Since $|\eta_{1k_m}'| + \cdots + |\eta_{nk_m}'| = 1$, the coordinates $\{(\eta_{1k_m}', \ldots, \eta_{nk_m}')\}$ form a bounded sequence in $F^n$ and as such contain a convergent subsequence. Let $(\eta_{10}', \ldots, \eta_{n0}')$ be the limit of such a convergent subsequence, whose indices we denote by $k_{m_j}$. If we let $y_0' = \eta_{10}' x_1 + \cdots + \eta_{n0}' x_n$, then we have

$$\|y_0' - y_{k_{m_j}}'\| \le |\eta_{10}' - \eta_{1k_{m_j}}'|\,\|x_1\| + \cdots + |\eta_{n0}' - \eta_{nk_{m_j}}'|\,\|x_n\| \to 0 \ \text{ as } m_j \to \infty.$$

Thus, $y_{k_{m_j}}' \to y_0'$. Since $y_{k_{m_j}}' \to 0$, it follows that $y_0' = 0$. But this is impossible, because $\{x_1, \ldots, x_n\}$ is a linearly independent set and $|\eta_{10}'| + \cdots + |\eta_{n0}'| = 1$, so the coefficients $\eta_{i0}'$ cannot all vanish. We conclude that the sum $|\eta_{1k}| + \cdots + |\eta_{nk}|$ is bounded. Consequently, there is a subsequence $\{(\eta_{1k_j}, \ldots, \eta_{nk_j})\}$ which is convergent in $F^n$. Let $(\eta_{10}, \ldots, \eta_{n0})$ be the limit of this convergent subsequence, and let $y_0 = \eta_{10} x_1 + \cdots + \eta_{n0} x_n$. Then $y_{k_j} \to y_0$. Thus, $\{y_k\}$ contains a convergent subsequence, and this proves that $A$ is relatively compact.

(Sufficiency) Assume that every bounded subset of $Y$ is relatively compact. Let $x_1 \in Y$ be such that $\|x_1\| = 1$, and let $V_1 = V(\{x_1\})$ be the linear subspace generated by $\{x_1\}$ (see Definition 3.3.6). If $V_1 = Y$, then we are done. If $V_1 \neq Y$, let $y_2 \in Y$ be such that $y_2 \notin V_1$. Let $d = \inf_{x \in V_1}\|y_2 - x\|$. Since $V_1$ is closed by Theorem 6.6.6, we must have $d > 0$; otherwise $y_2 \in V_1$. For every $\eta > 0$ there is an $x_0 \in V_1$ such that $d \le \|y_2 - x_0\| < d + \eta$. Now let $x_2 = (y_2 - x_0)/\|y_2 - x_0\|$. Then $x_2 \notin V_1$, $\|x_2\| = 1$, and for every $x \in V_1$,

$$\|x_2 - x\| = \Big\|\frac{y_2 - x_0}{\|y_2 - x_0\|} - x\Big\| = \frac{\|y_2 - x'\|}{\|y_2 - x_0\|} \ge \frac{d}{d + \eta} = 1 - \frac{\eta}{d + \eta},$$

where $x' = x_0 + \|y_2 - x_0\|\,x \in V_1$. Since $\eta$ is arbitrary, we can choose $\eta$ so that $\|x_2 - x_1\| \ge \tfrac{1}{2}$.

Now let $V_2$ be the linear subspace generated by $\{x_1, x_2\}$. If $V_2 = Y$, we are done. If not, we can proceed in the manner used above to select an $x_3 \notin V_2$ with $\|x_3\| = 1$, $\|x_1 - x_3\| \ge \tfrac{1}{2}$, and $\|x_2 - x_3\| \ge \tfrac{1}{2}$. If we continue this process, then either $V(\{x_1, \ldots, x_n\}) = Y$ for some $n$, or else we obtain an infinite sequence $\{x_n\}$ such that $\|x_n\| = 1$ and $\|x_n - x_m\| \ge \tfrac{1}{2}$ for all $m \neq n$. The second alternative is impossible, since $\{x_n\}$ is a bounded sequence and as such must contain a convergent subsequence. This completes the proof. ■
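The dichotomy in this proof can be made concrete in $l_2$. The orthonormal vectors $e_n$ form a bounded set in which any two distinct elements are at distance $\sqrt{2}$, so no subsequence can converge; the following Python sketch (our own illustration, not part of the original text, with finite truncations standing in for elements of $l_2$) checks this:

```python
import numpy as np

# The sequence e_1, e_2, ... in l2 is bounded (||e_n|| = 1), yet
# ||e_n - e_m|| = sqrt(2) for n != m, so it has no convergent subsequence:
# the bounded set {e_n} is not relatively compact, and by Theorem 6.6.10
# l2 cannot be finite dimensional.
N = 6        # number of sequence elements to inspect
dim = 10     # truncation dimension; the remaining entries are zero anyway

E = np.eye(dim)[:N]              # rows are (truncations of) e_1, ..., e_N
for n in range(N):
    for m in range(n + 1, N):
        d = np.linalg.norm(E[n] - E[m])
        assert abs(d - np.sqrt(2)) < 1e-12
print("all pairwise distances equal sqrt(2) =", np.sqrt(2))
```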

6.7. GEOMETRIC ASPECTS OF LINEAR FUNCTIONALS

Throughout this section $X$ denotes a real normed linear space. Before giving geometric interpretations of linear functionals we introduce the notions of maximal subspace and hyperplane.

6.7.1. Definition. A linear subspace $Y$ of a linear space $X$ is called maximal if it is not all of $X$ and if there exists no linear subspace $Z$ of $X$ such that $Y \neq Z$, $Z \neq X$, and $Y \subset Z$.

Recall that if $Y$ is a linear subspace of $X$ and if $z \in X$, then we call the set $Z = z + Y$ a linear variety (see Definition 3.2.17). In this case we also say that $Z$ is a translation of $Y$.
6.7.2. Definition. A hyperplane $Y$ in a linear space $X$ is a maximal linear variety, i.e., a linear variety resulting from the translation of a maximal linear subspace.

If a hyperplane $Y$ contains the origin, then it is simply a maximal linear subspace, and all hyperplanes $Z$ obtained by translating $Y$ are said to be parallel to $Y$.

The following theorem provides us with an important characterization of hyperplanes in terms of linear functionals.

6.7.3. Theorem. If $f \neq 0$ is a linear functional on $X$ and if $\alpha$ is any fixed scalar, then the set $Y = \{x : f(x) = \alpha\}$ is a hyperplane. It contains the origin $0$ if and only if $\alpha = 0$. Conversely, if $Y$ is a hyperplane in a linear space $X$, then there is a linear functional $f$ on $X$ and a fixed scalar $\alpha$ such that $Y = \{x : f(x) = \alpha\}$.

Proof. Consider the first part. Since $f \neq 0$ there is an $x_1$ such that $f(x_1) = \beta \neq 0$. If $x_0 = (\alpha/\beta)x_1$, then $f(x_0) = (\alpha/\beta)f(x_1) = \alpha$, and thus $x_0 \in Y$. Let $Y_0 = -x_0 + Y$. It is readily verified that $Y_0 = \{x : f(x) = 0\}$ and that $Y_0$ is a linear subspace, so that $Y$ is a linear variety. Since $Y_0 \neq X$, we can write every element of $X$ as the sum of an element of $Y_0$ and a multiple of $y$, where $y \in X - Y_0$. Indeed, if $x \in X$, if $y$ is any element in $X - Y_0$ so that $f(y) \neq 0$, and if

$$z = x - \frac{f(x)}{f(y)}\,y,$$

then $f(z) = 0$, and thus $x$ has the required form. Now assume that $Y_1$ is a linear subspace of $X$ for which $Y_0 \subset Y_1$ and $Y_1 \neq Y_0$. We can choose $y \in Y_1 - Y_0$, and the above argument shows that $X \subset Y_1$, so that $Y_1 = X$. This shows that $Y_0$ is maximal and that $Y$ is a hyperplane.

The assertion that $Y$ contains $0$ if and only if $\alpha = 0$ follows readily.

Consider now the last part of the theorem. If $Y$ is a hyperplane in $X$, then $Y$ is the translation of a maximal linear subspace $Z$ in $X$; i.e., $Y = x_0 + Z$, with $x_0$ fixed. If $x_0 \notin Z$, and if $V(Y + x_0)$ denotes the linear subspace generated by the set $Y + x_0$, then $V(Y + x_0) = X$. If for $x = \alpha x_0 + z$, $z \in Z$, we define $f(x) = \alpha$, then $Y = \{x : f(x) = 1\}$. On the other hand, if $x_0 \in Z$, then we take $x_1 \notin Z$, $X = V(Z + x_1)$, $Y = Z$, and define for $x = \alpha x_1 + z$, $f(x) = \alpha$. Then $Y = \{x : f(x) = 0\}$. This concludes the proof of the theorem. ■

In the proof of the above theorem we also established the following result:

6.7.4. Theorem. Let $f \neq 0$ be a linear functional on the linear space $X$, and let $Z = \{x : f(x) = 0\}$. If $x_0 \in X - Z$, then every $x \in X$ can be expressed as

$$x = \frac{f(x)}{f(x_0)}\,x_0 + z, \qquad z \in Z.$$

The next result shows that it is possible to establish a unique correspondence between hyperplanes and linear functionals. This result follows readily from Theorem 6.7.3.

6.7.5. Theorem. Let $Y$ be a hyperplane in a linear space $X$. If $Y$ does not contain the origin, there is a unique linear functional $f$ on $X$ such that $Y = \{x : f(x) = 1\}$.
6.7.6. Exercise. Prove Theorem 6.7.5.

6.7.7. Theorem. Let $Y$ be a maximal linear subspace in a Banach space $X$. Then either $Y = \bar{Y}$ or $\bar{Y} = X$; i.e., either $Y$ is closed or else $Y$ is dense in $X$.

Proof. Since $Y$ is a linear subspace, $\bar{Y}$ is a linear subspace of $X$ by Theorem 6.2.3. Now $Y \subset \bar{Y}$. Hence, if $Y \neq \bar{Y}$ we must have $\bar{Y} = X$, since $Y$ is a maximal linear subspace. ■

In the next result we will show that $Y$ is closed if and only if the functional $f$ associated with $Y$ is bounded (i.e., continuous). Thus, corresponding to any hyperplane in a normed linear space there is a functional that is bounded whenever the hyperplane is closed, and vice versa.

6.7.8. Theorem. Let $f$ be a non-zero linear functional on $X$, and let $Y = \{x : f(x) = \alpha\}$ be a hyperplane in $X$. Then $Y$ is closed for every $\alpha$ if and only if $f$ is bounded.

Proof. Assume first that $f$ is bounded; then it is continuous. If $\{x_n\}$ is a sequence in $Y$ which converges to $x \in X$, then $f(x_n) \to f(x) = \alpha$, so that $x \in Y$, and thus $Y$ is closed.

Conversely, let $Z = \{x : f(x) = 0\}$ be closed. In view of Theorem 6.7.4, there exists an $x_0 \in X - Z$ such that every element of $X$ can be expressed as $c x_0 + z$ with $z \in Z$. Now let $\{x_n\}$ be a sequence in $X$ such that $x_n \to x \in X$. Then it is possible to express each $x_n$ and $x$ as $x_n = c_n x_0 + z_n$ and $x = c x_0 + z$, where $z_n, z \in Z$. Let $d = \inf_{z' \in Z}\|x_0 - z'\|$. Since $Z$ is closed, $d > 0$. Now

$$\|x - x_n\| = \|(c - c_n)x_0 - (z_n - z)\| \ge \inf_{z' \in Z}\|(c - c_n)x_0 - z'\| = |c - c_n|\,d.$$

Thus $c_n \to c$. Moreover, since $f(x_n) = c_n f(x_0) + f(z_n) = c_n f(x_0) \to c f(x_0) = f(x)$, it follows that $f$ is continuous on $X$, and hence bounded. ■

We now introduce the concept of a half-space.

6.7.9. Definition. Let $f$ be a non-zero linear functional on $X$, and let $\alpha \in R$. Let $Y$ be the hyperplane given by $Y = \{x : f(x) = \alpha\}$. Let $Y_1$, $Y_2$, $Y_3$, and $Y_4$ be subsets of $X$ defined by $Y_1 = \{x : f(x) < \alpha\}$, $Y_2 = \{x : f(x) \le \alpha\}$, $Y_3 = \{x : f(x) > \alpha\}$, and $Y_4 = \{x : f(x) \ge \alpha\}$. Then each of the sets $Y_1$, $Y_2$, $Y_3$, and $Y_4$ is called a half space determined by $Y$. In addition, let $Z_1$ and $Z_2$ be subsets of $X$. We say that $Y$ separates $Z_1$ and $Z_2$ if either (i) $Z_1 \subset Y_2$ and $Z_2 \subset Y_4$ or (ii) $Z_1 \subset Y_4$ and $Z_2 \subset Y_2$.

6.7.10. Exercise. Show that each of the sets $Y_1$, $Y_2$, $Y_3$, $Y_4$ in the preceding definition is convex. Also, show that if in the above definition $f$ is continuous, then $Y_1$ and $Y_3$ are open sets in $X$, and $Y_2$ and $Y_4$ are closed sets in $X$.

In order to demonstrate some of the notions introduced, we conclude this section with the following example.

6.7.11. Example. Let $X = R^2$, and let $x = (\xi_1, \xi_2) \in X$. Let $y = (\eta_1, \eta_2)$ be any fixed non-zero vector in $X$, and define the linear functional $f$ on $X$ as

$$f(x) = \eta_1\xi_1 + \eta_2\xi_2.$$

The set

$$Y_0 = \{x \in R^2 : f(x) = \eta_1\xi_1 + \eta_2\xi_2 = 0\}$$

is a line through the origin of $R^2$ which is normal to the vector $y$. If $x_1 \notin Y_0$, the hyperplane $Y = x_1 + Y_0$ is a linear variety which is parallel to $Y_0$. The hyperplane $Y$ divides $R^2$ into two open half-spaces $Z_1$ and $Z_2$, as depicted in Figure E. It should be noted that $x \in X$ can now be written as $x = z + \rho y$, $z \in Y$, where $x \in Z_1$ if $\rho > 0$ and $x \in Z_2$ if $\rho < 0$. ■

6.7.12. Figure E. Half spaces.
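To make the half-space classification concrete, the following Python fragment (our own numerical sketch, not part of the original text; the vector $y$, the level $\alpha$, and the sample points are arbitrary choices) evaluates $f(x) = \eta_1\xi_1 + \eta_2\xi_2$ and reports on which side of the hyperplane $\{x : f(x) = \alpha\}$ each point lies:

```python
import numpy as np

# Hyperplane in R^2 determined by the linear functional f(x) = <y, x>.
y = np.array([1.0, 2.0])   # fixed vector defining f (illustrative choice)
alpha = 3.0                # level defining the hyperplane {x : f(x) = alpha}

def f(x):
    return float(y @ x)    # f(x) = eta1*xi1 + eta2*xi2

for x in [np.array(p) for p in [(3.0, 0.0), (0.0, 0.0), (5.0, 5.0), (1.0, 0.0)]]:
    v = f(x)
    if np.isclose(v, alpha):
        side = "on the hyperplane"
    else:
        side = "half-space f(x) > alpha" if v > alpha else "half-space f(x) < alpha"
    print(x, "->", side)
```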

6.8. EXTENSION OF LINEAR FUNCTIONALS

In this section we state and prove the Hahn-Banach theorem. This result is very important in analysis and has significant implications in applications. We would like to point out that the present form of this theorem is not the most general version of the Hahn-Banach theorem.

Throughout this section $X$ will denote a real normed linear space.

6.8.1. Definition. Let $Y$ be a linear subspace of $X$, let $Z$ be a proper linear subspace of $Y$, let $f$ be a bounded linear functional defined on $Z$, and let $\bar{f}$ be a bounded linear functional defined on $Y$. If $\bar{f}(x) = f(x)$ whenever $x \in Z$, then $\bar{f}$ is called an extension of $f$ from $Z$ to $Y$. If the spaces $X$, $Y$, $Z$ are normed and if $\|f\|_Z = \|\bar{f}\|_Y$, then $\bar{f}$ is called a norm preserving extension of $f$.

We now prove the following version of the Hahn-Banach theorem.

6.8.2. Theorem. Every bounded linear functional $f$ defined on a linear subspace $Y$ of a real normed linear space $X$ can be extended to the entire space $X$ with preservation of norm. Specifically, one can find a bounded linear functional $\bar{f}$ such that

(i) $\bar{f}(x) = f(x)$ for every $x \in Y$; and
(ii) $\|\bar{f}\|_X = \|f\|_Y$.

Proof. Although this theorem is true for $X$ not separable, we shall give the proof only for the case where $X$ is separable (see Definition 5.4.33 for separability). We assume that $Y$ is a proper linear subspace of $X$, for otherwise there is nothing to prove. Let $x_1 \in X$ but $x_1 \notin Y$, and let us define the subset

$$Y_1 = \{x \in X : x = \alpha x_1 + y,\ \alpha \in R,\ y \in Y\}.$$

It is straightforward to verify that $Y_1$ is a linear subspace of $X$, and furthermore that for each $x \in Y_1$ there is a unique $\alpha \in R$ and a unique $y \in Y$ such that $x = \alpha x_1 + y$. If an extension $\bar{f}$ of $f$ from $Y$ to $Y_1$ exists, then it has the form

$$\bar{f}(x) = \alpha\bar{f}(x_1) + f(y),$$

and if we let $c = -\bar{f}(x_1)$, then $\bar{f}(x) = f(y) - c\alpha$. From this it is clear that the extension is specified by prescribing the constant $c$. In order that the norm of the functional not be increased when it is continued from $Y$ to $Y_1$, we must find a $c$ such that the inequality

$$|f(y) - \alpha c| \le \|f\|\,\|y + \alpha x_1\|$$

holds for all $y \in Y$ and all $\alpha \in R$. If $y \in Y$ and $\alpha \neq 0$, then $z = y/\alpha \in Y$ and the above inequality can be written as

$$|f(\alpha z) - \alpha c| \le \|f\|\,\|\alpha z + \alpha x_1\|$$

or

$$|f(z) - c| \le \|f\|\,\|z + x_1\|.$$

This inequality can be rewritten as

$$-\|f\|\,\|z + x_1\| \le f(z) - c \le \|f\|\,\|z + x_1\|$$

or, equivalently, as

$$f(z) - \|f\|\,\|z + x_1\| \le c \le f(z) + \|f\|\,\|z + x_1\| \qquad (6.8.3)$$

for all $z \in Y$. We now must show that such a number $c$ does indeed always exist. To do this, it suffices to show that for any $y_1, y_2 \in Y$ we have

$$c_1 \triangleq f(y_1) - \|f\|\,\|y_1 + x_1\| \le f(y_2) + \|f\|\,\|y_2 + x_1\| \triangleq c_2. \qquad (6.8.4)$$

But this inequality follows directly from

$$f(y_1) - f(y_2) \le \|f\|\,\|y_1 - y_2\| = \|f\|\,\|y_1 + x_1 - x_1 - y_2\| \le \|f\|\,\|y_1 + x_1\| + \|f\|\,\|y_2 + x_1\|.$$

In view of (6.8.3) and (6.8.4) it follows that a constant $c$ with $c_1 \le c \le c_2$ can be chosen. If we now let

$$\bar{f}(x) = f(y) - \alpha c, \qquad x \in Y_1,$$

we have $\|\bar{f}\| = \|f\|$, and $\bar{f}$ is an extension of $f$ from $Y$ to $Y_1$.

Next, since $X$ is separable it contains a denumerable everywhere dense set $\{x_1, x_2, \ldots, x_n, \ldots\}$. From this set of vectors we select, one at a time, a linearly independent subset of vectors $\{y_1, y_2, \ldots, y_n, \ldots\}$ which belongs to $X - Y$. The set $\{y_1, y_2, \ldots, y_n, \ldots\}$ together with the linear subspace $Y$ generates a subspace $W$ dense in $X$.

Following the above procedure, we now extend the functional $f$ to a functional $\bar{f}$ on the subspace $W$ by extending $f$ from $Y$ to $Y_1$, then to $Y_2$, etc., where

$$Y_1 = \{x : x = \alpha y_1 + y;\ y \in Y,\ \alpha \in R\}$$

and

$$Y_2 = \{x : x = \alpha y_2 + y;\ y \in Y_1,\ \alpha \in R\}, \ \text{etc.}$$
Finally, we extend $\bar{f}$ from the dense subspace $W$ to the space $X$. At the remaining points of $X$ the functional is defined by continuity. If $x \in X$, then there exists a sequence $\{w_n\}$ of vectors in $W$ converging to $x$. By continuity, if $\lim_{n \to \infty} w_n = x$, then $\bar{f}(x) = \lim_{n \to \infty}\bar{f}(w_n)$. The inequality $|\bar{f}(x)| \le \|f\|\,\|x\|$ follows from

$$|\bar{f}(x)| = \lim_{n \to \infty}|\bar{f}(w_n)| \le \lim_{n \to \infty}\|f\|\,\|w_n\| = \|f\|\,\|x\|.$$

This completes the proof of the theorem. ■
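In finite dimensions the norm-preserving extension can be computed explicitly. The sketch below (ours; it assumes the Euclidean norm on $R^2$ and a one-dimensional subspace $Y$, so it merely illustrates Theorem 6.8.2 rather than proving it) exhibits an extension $\bar{f}(x) = (a, x)$ that agrees with $f$ on $Y$ and has the same norm:

```python
import numpy as np

# Norm-preserving extension of a functional from Y = span{u} to all of
# Euclidean R^2 (a finite-dimensional illustration of Theorem 6.8.2).
u = np.array([1.0, 1.0])   # Y = span{u}
c = 5.0                    # f(alpha * u) = c * alpha defines f on Y

norm_f_on_Y = abs(c) / np.linalg.norm(u)   # sup |f(y)| / ||y|| over Y

# For the Euclidean norm, an extension is f_bar(x) = <a, x>, with the
# representing vector a chosen inside Y itself:
a = (c / (u @ u)) * u      # a = (2.5, 2.5)

assert np.isclose(a @ u, c)                        # f_bar agrees with f on Y
assert np.isclose(np.linalg.norm(a), norm_f_on_Y)  # the norm is preserved
print("||f||_Y =", norm_f_on_Y, " extension vector a =", a)
```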

The next result is a direct consequence of Theorem 6.8.2.

6.8.5. Corollary. Let $x_0 \in X$, $x_0 \neq 0$. Then there exists a bounded non-zero linear functional $f$ defined on all of $X$ such that $f(x_0) = \|x_0\|$ and $\|f\| = 1$.

Proof. Let $Y$ be the linear subspace of $X$ given by $Y = \{y \in X : y = \alpha x_0,\ \alpha \in R\}$. For $y \in Y$, define $f_0(y) = \alpha\|x_0\|$, where $y = \alpha x_0$. Then $\|y\| = |\alpha|\,\|x_0\|$, and so $|f_0(y)|/\|y\| = 1$ for all non-zero $y \in Y$. This implies that $\|f_0\| = 1$. The proof now follows from Theorem 6.8.2. ■

The next result is also a consequence of the Hahn-Banach theorem.

6.8.6. Corollary. Let $x_0 \in X$, $x_0 \neq 0$, and let $\gamma > 0$. Then there exists a bounded non-zero linear functional $f$ defined on all of $X$ such that $\|f\| = \gamma$ and $f(x_0) = \|f\|\,\|x_0\|$.

The above corollary guarantees the existence of non-trivial bounded linear functionals.

6.8.7. Exercise. Prove Corollary 6.8.6.

In the next example a geometric interpretation of Corollary 6.8.5 is given.

6.8.8. Example. Let $x_0 \in X$, $x_0 \neq 0$, and let $f$ be a linear functional defined on $X$ such that $f(x_0) = \|x_0\|$ and $\|f\| = 1$. Let $K$ be the closed sphere given by $K = \{x \in X : \|x\| \le \|x_0\|\}$. Now if $x \in K$, then $f(x) \le |f(x)| \le \|f\|\,\|x\| \le \|x_0\|$, and so $x$ belongs to the half-space $\{x \in X : f(x) \le \|x_0\|\}$. Thus, the hyperplane $\{x \in X : f(x) = \|x_0\|\}$ is tangent to the closed sphere $K$ (as illustrated in Figure F). ■

6.8.9. Figure F. Illustration of Corollary 6.8.5.

In closing this section, we mention two of the more important consequences of the Hahn-Banach theorem with significant practical implications. One of these states that given a convex set $Y$ in $X$ containing an interior point and given a fixed point not in the interior of $Y$, there is a hyperplane separating the fixed point and the convex set $Y$. The second of these asserts that if $Y_1$ and $Y_2$ are convex sets in $X$, if $Y_1$ has interior points, and if $Y_2$ contains no interior point of $Y_1$, then there is a closed hyperplane which separates $Y_1$ and $Y_2$.

6.9. DUAL SPACE AND SECOND DUAL SPACE

In this section we briefly reconsider the dual space $X^*$ (see Definition 6.5.9), and we introduce the dual space of $X^*$, called the second dual space. Throughout this section $X$ is a real normed linear space, and $X'$ is the algebraic conjugate of $X$.

We begin by determining the dual spaces of some common normed linear spaces.

6.9.1. Example. Let $X = R^n$, let $x = (\xi_1, \ldots, \xi_n)$ denote an arbitrary element of $R^n$, let $a = (\alpha_1, \ldots, \alpha_n)$ be some fixed element in $R^n$, let $\|x\| = (\xi_1^2 + \cdots + \xi_n^2)^{1/2}$, and recall from Example 6.5.14 that the functional $f(x) = \alpha_1\xi_1 + \cdots + \alpha_n\xi_n$ is a bounded linear functional on $X$ with $\|f\| = \|a\|$. If we define a set of basis vectors in $R^n$ as $e_1 = (1, 0, \ldots, 0), \ldots, e_n = (0, \ldots, 0, 1)$, then $x \in R^n$ may be expressed as $x = \sum_{i=1}^{n}\xi_i e_i$. If we let $\alpha_i = f(e_i)$, where $f$ is any bounded linear functional on $R^n$, then

$$f(x) = \sum_{i=1}^{n}\alpha_i\xi_i.$$

Thus, the dual space $X^*$ of $X = R^n$ is itself the space $R^n$ in the sense that the elements of $X^*$ consist of all functionals of the form $f(x) = \sum_{i=1}^{n}\alpha_i\xi_i$. Furthermore, the norm on $X^*$ is

$$\|f\| = \Big(\sum_{i=1}^{n}\alpha_i^2\Big)^{1/2} = \|a\|. \ \blacksquare$$

6.9.2. Exercise. Let $X = R^n$, where the norm of $x = (\xi_1, \ldots, \xi_n) \in X$ is given by $\|x\| = \max_{1 \le i \le n}|\xi_i|$ (see Example 6.1.5). Show that if $f \in X^*$, then there is an $a = (\alpha_1, \ldots, \alpha_n) \in R^n$ such that $f(x) = \alpha_1\xi_1 + \cdots + \alpha_n\xi_n$, so that $X^* = R^n$, and show that the norm on $X^*$ is given by $\|f\| = \sum_{i=1}^{n}|\alpha_i|$.

6.9.3. Exercise. Let $X = R^n$, and define the norm of $x = (\xi_1, \ldots, \xi_n) \in X$ by $\|x\| = (|\xi_1|^p + \cdots + |\xi_n|^p)^{1/p}$, where $1 < p < \infty$ (see Example 6.1.5). Show that if $f \in X^*$, then there is an $a = (\alpha_1, \ldots, \alpha_n) \in R^n$ such that $f(x) = \alpha_1\xi_1 + \cdots + \alpha_n\xi_n$, i.e., $X^* = R^n$, and show that the norm on $X^*$ is given by $\|f\| = (|\alpha_1|^q + \cdots + |\alpha_n|^q)^{1/q}$, where $q$ is such that $\frac{1}{p} + \frac{1}{q} = 1$.
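This duality is easy to probe numerically. In the following Python sketch (ours; the vector $a$ and the exponent $p$ are arbitrary choices, and the supremum defining $\|f\|$ is only estimated by sampling random directions), the estimated norm of $f$ is compared with $\|a\|_q$:

```python
import numpy as np

# Estimate ||f|| = sup{ |<a, x>| : ||x||_p = 1 } on R^4 and compare it
# with ||a||_q, where 1/p + 1/q = 1 (Exercise 6.9.3).
rng = np.random.default_rng(0)
a = rng.standard_normal(4)
p = 3.0
q = p / (p - 1.0)

X = rng.standard_normal((200_000, 4))                 # random directions
X /= np.linalg.norm(X, ord=p, axis=1, keepdims=True)  # rescale to p-norm one
estimate = np.abs(X @ a).max()

print("estimated ||f|| =", estimate)
print("||a||_q         =", np.linalg.norm(a, ord=q))  # nearly equal
```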

6.9.4. Exercise. Let $X$ be the space $l_p$, $1 \le p < \infty$, defined in Example 6.1.6, and let $\frac{1}{p} + \frac{1}{q} = 1$; if $p = 1$, we take $q = \infty$. Show that the dual space of $l_p$ is $l_q$. Specifically, show that every bounded linear functional on $l_p$ is uniquely representable as

$$f(x) = \sum_{i=1}^{\infty}\alpha_i\xi_i,$$

where $a = (\alpha_1, \ldots, \alpha_k, \ldots)$ is an element of $l_q$. Also, show that every element $a$ of $l_q$ defines an element of $(l_p)^*$ in the same way and that

$$\|f\| = \Big(\sum_{i=1}^{\infty}|\alpha_i|^q\Big)^{1/q} \ \text{ if } 1 < p < \infty, \qquad \|f\| = \sup_i |\alpha_i| \ \text{ if } p = 1.$$

Since $X^*$ is a normed linear space (see Theorem 6.5.6), it is possible to form the dual space of $X^*$, which we will denote by $X^{**}$ and which will be referred to as the second dual space of $X$. As before, we will use the notation $x''$ for elements of $X^{**}$, and we will write

$$x''(x') = \langle x', x''\rangle,$$

where $x' \in X^*$. If $X'$ denotes the algebraic conjugate of $X$, then the reader can readily show that even though $X^* \subset X'$ and $X^{**} \subset (X^*)'$, in general $X^{**}$ is not a linear subspace of $X''$.

Let us define a mapping $J$ of $X$ into $X^{**}$ by the relation

$$\langle x', Jx\rangle = \langle x, x'\rangle, \qquad x \in X,\ x' \in X^*, \qquad (6.9.5)$$

or, equivalently, by

$$Jx = x'', \qquad x''(x') = x'(x). \qquad (6.9.6)$$

We call this mapping $J$ the canonical mapping of $X$ into $X^{**}$. The functional $x''$ defined on $X^*$ in this way is linear, because

$$x''(\alpha x_1' + \beta x_2') = \langle x, \alpha x_1' + \beta x_2'\rangle = \alpha\langle x, x_1'\rangle + \beta\langle x, x_2'\rangle = \alpha x''(x_1') + \beta x''(x_2'),$$

and thus $x'' \in (X^*)'$. Since

$$|x''(x')| = |x'(x)| = |\langle x, x'\rangle| \le \|x\|\,\|x'\|,$$

it follows that $\|x''\| \le \|x\|$ and thus $x'' \in X^{**}$. We can actually show that $\|x''\| = \|x\|$. This is obvious for $x = 0$. If $x \neq 0$, then in view of Corollary 6.8.6 there exists a non-zero $x' \in X^*$ such that $\langle x, x'\rangle = \|x\|\,\|x'\|$, and thus $\|x''\| = \|x\|$. From this it follows that the norm of every $x \in X$ can be defined in two ways: as the norm of an element in $X$ and as the norm of a linear functional on $X^*$, i.e., as the norm of an element in $X^{**}$. We summarize this discussion in the following result:

6.9.7. Theorem. X is isometric to some linear subspace in X**.

If we agree not to distinguish between isometric spaces, then Theorem 6.9.7 can simply be stated as $X \subset X^{**}$.

6.9.8. Definition. A normed linear space $X$ is said to be reflexive if the canonical mapping (6.9.6), $J: X \to X^{**}$, is onto. If we again agree not to distinguish between isometric spaces, we write in this case $X^{**} = X$. If $X \neq X^{**}$, then $X$ is said to be irreflexive.

6.9.9. Example. The space $R_p^n$, $1 \le p < \infty$, is reflexive. ■

6.9.10. Example. The spaces $l_p$, $1 < p < \infty$, are reflexive. ■

6.9.11. Example. The space $l_1$ is irreflexive. ■

6.9.12. Exercise. Prove the assertions made in Examples 6.9.9 through 6.9.11.

6.10. WEAK CONVERGENCE

Having introduced the normed dual space, we are now in a position to consider the notion of weak convergence, a concept which arises frequently in analysis and which plays an important role in certain applications. Throughout this section $X$ denotes a normed linear space and $X^*$ is the dual space of $X$.
6.10.1. Definition. A sequence $\{x_n\}$ of elements in $X$ is said to converge weakly to the element $x \in X$ if for every $x' \in X^*$, $\langle x_n, x'\rangle \to \langle x, x'\rangle$. In this case we write $x_n \to x$ weakly. If a sequence $\{x_n\}$ converges to $x \in X$, i.e., if $\|x_n - x\| \to 0$ as $n \to \infty$, then we call this convergence strong convergence or convergence in norm, to distinguish it from weak convergence.

6.10.2. Theorem. Let $\{x_n\}$ be a sequence in $X$ which converges in norm to $x \in X$. Then $\{x_n\}$ converges weakly to $x$.

Proof. Assume that $\|x_n - x\| \to 0$ as $n \to \infty$. Then for any $x' \in X^*$ we have

$$|\langle x_n, x'\rangle - \langle x, x'\rangle| \le \|x'\|\,\|x_n - x\| \to 0 \ \text{ as } n \to \infty,$$

and thus $x_n \to x$ weakly. ■

Thus, strong convergence implies weak convergence. However, the converse is not true, in general, as the following example shows.

6.10.3. Example. Consider in $l_2$ the sequence of vectors $x_1 = (1, 0, \ldots, 0, \ldots)$, $x_2 = (0, 1, 0, \ldots, 0, \ldots)$, $x_3 = (0, 0, 1, \ldots, 0, \ldots)$, .... To show that $\{x_n\}$ converges weakly, we note that every $x' \in l_2 = X^*$ can be represented as the scalar product with some fixed vector $y = (\eta_1, \eta_2, \ldots, \eta_n, \ldots)$; i.e., if $x = (\xi_1, \xi_2, \ldots, \xi_n, \ldots)$, then

$$\langle x, x'\rangle = \sum_{i=1}^{\infty}\xi_i\eta_i$$

(see Exercise 6.9.4). For the case of the sequence $\{x_n\}$ we now have

$$\langle x_n, x'\rangle = \eta_n,$$

and since $\eta_n \to 0$ as $n \to \infty$ for every $y \in l_2$, it follows that $\langle x_n, x'\rangle \to 0$ as $n \to \infty$ for every $x' \in l_2$. Thus, $\{x_n\}$ converges to $0$ weakly. However, $x_n$ does not converge to $0$ strongly, because $\|x_n\| = 1$. ■

We leave the proof of the next result as an exercise to the reader.

6.10.4. Theorem. If $X$ is finite dimensional, weak and strong convergence are equivalent.

6.10.5. Exercise. Prove Theorem 6.10.4.

Analogous to the concept of weak convergence of elements of a normed linear space $X$, we can introduce the notion of weak convergence of elements of $X^*$.
6.10.6. Definition. A sequence of functionals $\{x_n'\}$ in $X^*$ converges weak-star (i.e., weak*) to the linear functional $x' \in X^*$ if for every $x \in X$ we have $\langle x, x_n'\rangle \to \langle x, x'\rangle$. We say that $x_n' \to x'$ weak*.

Since strong convergence in $X^*$ implies weak convergence in $X^*$, it follows that if a sequence of linear functionals $\{x_n'\}$ in $X^*$ converges to the linear functional $x' \in X^*$, then $x_n' \to x'$ weak*.
Let us consider an example.

6.10.7. Example. Let $[a, b]$ be an interval on the real line containing the origin, i.e., $a < 0 < b$, and let $\{C[a, b]; \|\cdot\|_{\infty}\}$ be the Banach space of real-valued continuous functions as defined in Example 6.1.9. Let $\{\varphi_n\}$ be a sequence of functions in $C[a, b]$ satisfying the following conditions for $n = 1, 2, \ldots$:

(i) $\varphi_n(t) \ge 0$ for all $t \in [a, b]$;
(ii) $\varphi_n(t) = 0$ if $|t| \ge 1/n$ and $t \in [a, b]$; and
(iii) $\int_a^b \varphi_n(t)\,dt = 1$.

For each $n = 1, 2, \ldots$, we can define a continuous linear functional $x_n'$ on $X$ (see Example 6.5.13) by

$$\langle x, x_n'\rangle = \int_a^b x(t)\varphi_n(t)\,dt,$$

where $x \in C[a, b]$. Now let $x_0'$ be defined on $C[a, b]$ by

$$\langle x, x_0'\rangle = x(0)$$

for all $x \in C[a, b]$. It is clear that $x_0' \in X^*$. We now show that $x_n' \to x_0'$ weak*. By the mean value theorem from the calculus, there is a $t_n$ such that $-1/n \le t_n \le 1/n$ and

$$\int_{-1/n}^{1/n}\varphi_n(t)x(t)\,dt = x(t_n)\int_{-1/n}^{1/n}\varphi_n(t)\,dt = x(t_n)$$

for each $n = 1, 2, \ldots$ and $x \in C[a, b]$. Thus, $\langle x, x_n'\rangle \to x(0)$ for every $x \in C[a, b]$; i.e., $x_n' \to x_0'$ weak*. We see that the sequence of functions $\{\varphi_n\}$ does not approach a limit in $C[a, b]$. In particular, there is no $\varphi_0 \in C[a, b]$ such that $x(0) = \int_a^b x(t)\varphi_0(t)\,dt$. Frequently, in applications, it is convenient to say that the sequence $\{\varphi_n\}$ converges to the so-called "$\delta$ function," which has this property. We see that the sequence $\{\varphi_n\}$ converges to the $\delta$ function in the sense of weak* convergence. ■
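The weak* convergence in this example can be observed numerically. In the sketch below (ours), we take $[a, b] = [-1, 1]$, choose the triangular bumps $\varphi_n(t) = n(1 - n|t|)$ for $|t| \le 1/n$ (one concrete family satisfying conditions (i)-(iii)), and approximate the integrals by the trapezoidal rule:

```python
import numpy as np

def phi(n, t):
    # triangular bump of height n supported on [-1/n, 1/n]; its integral is 1
    return np.where(np.abs(t) <= 1.0 / n, n * (1.0 - n * np.abs(t)), 0.0)

def x(t):
    return np.cos(t) + t**2            # a continuous test function with x(0) = 1

t = np.linspace(-1.0, 1.0, 400_001)    # [a, b] = [-1, 1]

def trap(values):                      # trapezoidal approximation of the integral
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(t)))

for n in [1, 4, 16, 64]:
    print(f"n={n:3d}  <x, x_n'> = {trap(x(t) * phi(n, t)):.6f}")
print("x(0) =", x(0.0))                # the integrals approach this value
```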

6.10.8. Theorem. Let $X$ be a separable normed linear space. Every bounded sequence of linear functionals in $X^*$ contains a weak* convergent subsequence.

Proof. Since $X$ is separable, we can choose a denumerable everywhere dense set $\{x_1, x_2, \ldots, x_n, \ldots\}$ in $X$. Now let $\{x_n'\}$ be a bounded sequence in $X^*$. Since this sequence is bounded in norm, the sequence $\{\langle x_1, x_n'\rangle\}$ is a bounded sequence in either $R$ or $C$. It now follows that we can select from $\{x_n'\}$ a subsequence $\{x_{n_1}'\}$ such that the sequence $\{\langle x_1, x_{n_1}'\rangle\}$ converges. Again, from the subsequence $\{x_{n_1}'\}$ we can select another subsequence $\{x_{n_2}'\}$ such that the sequence $\{\langle x_2, x_{n_2}'\rangle\}$ converges. Continuing this procedure, we obtain the array of sequences

$$x_{11}', x_{12}', x_{13}', \ldots$$
$$x_{21}', x_{22}', x_{23}', \ldots$$
$$x_{31}', x_{32}', x_{33}', \ldots$$

By taking the diagonal of the above array, we obtain the subsequence of linear functionals $x_{11}', x_{22}', x_{33}', \ldots$. For this subsequence, the sequence $x_{11}'(x_n), x_{22}'(x_n), x_{33}'(x_n), \ldots$ converges for all $n$. But then $x_{11}'(x), x_{22}'(x), x_{33}'(x), \ldots$ converges for all $x \in X$. This completes the proof of the theorem. ■

The concepts of weak convergence and weak* convergence give rise to various generalizations, some of which we briefly mention.

Let $X$ be a normed linear space, and let $X^*$ be its normed dual. We call a set $Y \subset X^*$ weak* compact if every infinite sequence from $Y$ contains a weak* convergent subsequence. We say that a functional defined on $X$, which in general may be non-linear, is weakly continuous at a point $x_0 \in X$ if for every $\epsilon > 0$ there is a $\delta > 0$ and a finite collection $\{x_1', x_2', \ldots, x_n'\}$ in $X^*$ such that $|f(x) - f(x_0)| < \epsilon$ for all $x$ such that $|\langle x - x_0, x_i'\rangle| < \delta$ for $i = 1, 2, \ldots, n$. We can define weak* continuity of a functional similarly, by interchanging the roles of $X$ and $X^*$.

It can be shown that if $X$ is a real normed linear space and $X^*$ is its normed dual, then any closed sphere in $X^*$ is weak* compact.

The reader can readily show that if $f$ is a weakly continuous functional, then $x_n \to x$ weakly implies that $f(x_n) \to f(x)$.

6.11. INNER PRODUCT SPACES

We recall (see Definition 3.6.19 and the discussion following this definition) that if $X$ is a complex linear space, a function defined on $X \times X$ into $C$, which we denote by $(x, y)$ for $x, y \in X$, is called an inner product if

(i) $(x, x) > 0$ for all $x \neq 0$, and $(x, x) = 0$ if $x = 0$;
(ii) $(x, y) = \overline{(y, x)}$ for all $x, y \in X$;
(iii) $(\alpha x + \beta y, z) = \alpha(x, z) + \beta(y, z)$ for all $x, y, z \in X$ and for all $\alpha, \beta \in C$; and
(iv) $(x, \alpha y + \beta z) = \bar{\alpha}(x, y) + \bar{\beta}(x, z)$ for all $x, y, z \in X$ and for all $\alpha, \beta \in C$.
In the case of real linear spaces, the preceding characterization of an inner product is identical, except that we omit the complex conjugates in (ii) and (iv).

We call a complex (real) linear space $X$ on which an inner product $(\cdot, \cdot)$ is defined a complex (real) inner product space, which we denote by $\{X; (\cdot, \cdot)\}$ (see Definition 3.6.20). If the particular inner product being used in a given discussion is understood, we simply write $X$ to denote the inner product space. In accordance with our discussion following Definition 3.6.20, recall also that different inner products defined on the same linear space yield different inner product spaces. Finally, refer also to the discussion following Definition 3.6.20 for the characterization of an (inner product) subspace.

We have already extensively studied finite-dimensional real inner product spaces, i.e., Euclidean vector spaces, in Sections 4.9 and 4.10. Our subsequent presentation will be in a more general setting, where $X$ need not be finite dimensional and where $X$ may be a complex vector space. In fact, unless otherwise stated, $\{X; (\cdot, \cdot)\}$ will denote in this section an arbitrary complex inner product space. Since the proofs of several of the following theorems are nearly identical to corresponding ones in Sections 4.9 and 4.10, we will leave such proofs as exercises.

One of our first objectives will be to show that every inner product space $\{X; (\cdot, \cdot)\}$ has a norm associated with it which is induced by its inner product $(\cdot, \cdot)$. We find it convenient to consider first the Schwarz inequality, given in the following theorem.

6.11.1. Theorem. For any $x \in X$, let us define the function $\|\cdot\|: X \to R$ by $\|x\| = (x, x)^{1/2}$. Then for all $x, y \in X$,

$$|(x, y)| \le \|x\|\,\|y\|. \qquad (6.11.2)$$

6.11.3. Exercise. Prove Theorem 6.11.1 (see Theorem 4.9.28).

Using the above result, we can now readily show that the function $\|\cdot\|$ defined by $\|x\| = (x, x)^{1/2}$ is a norm.

6.11.4. Theorem. Let $X$ be an inner product space. Then the function $\|\cdot\|: X \to R$ defined by

$$\|x\| = (x, x)^{1/2} \qquad (6.11.5)$$

is a norm; i.e., for every $x, y \in X$ and for every $\alpha \in C$, we have

(i) $\|x\| \ge 0$;
(ii) $\|x\| = 0$ if and only if $x = 0$;
(iii) $\|\alpha x\| = |\alpha|\,\|x\|$; and
(iv) $\|x + y\| \le \|x\| + \|y\|$.

6.11.6. Exercise. Prove Theorem 6.11.4 (see Theorem 4.9.31).
Theorem 6.11.4 allows us to view every inner product space as a normed linear space, provided that we use Eq. (6.11.5) to define the norm on $X$. Moreover, in view of Theorem 6.1.2, we may view every inner product space as a metric space, provided that we define the metric by $\rho(x, y) = \|x - y\|$. Subsequently, we adopt the convention that when using the properties and terminology of a normed linear space in connection with an inner product space, we mean the norm induced by the inner product, as given in Eq. (6.11.5).

We are now in a position to make the following important definition.

6.11.7. Definition. A complete inner product space is called a Hilbert space.

Thus, every Hilbert space is also a Banach space (and also a complete metric space). Some authors insist that Hilbert spaces be infinite dimensional. We shall not follow that practice. An arbitrary inner product space (not necessarily complete) is sometimes also called a pre-Hilbert space.

6.11.8. Example. Let $X$ be a finite-dimensional (real or complex) inner product space. It follows from Theorem 6.6.5 that $X$ is a Hilbert space. ■

6.11.9. Example. Let $l_2$ be the (complex) linear space defined in Example 6.1.6. Let $x = (\xi_1, \xi_2, \ldots) \in l_2$, $y = (\eta_1, \eta_2, \ldots) \in l_2$, and define $(x, y): l_2 \times l_2 \to C$ as

$$(x, y) = \sum_{i=1}^{\infty}\xi_i\bar{\eta}_i.$$

It can readily be shown that $(\cdot, \cdot)$ is an inner product on $X$. Since $l_2$ is complete relative to the norm induced by this inner product (see Example 6.1.6), it follows that $l_2$ is a Hilbert space. ■

6.11.10. Example.
(a) Let $X = C[a, b]$ denote the linear space of complex-valued continuous functions defined on $[a, b]$ (see Example 6.1.9). For $x, y \in C[a, b]$ define

$$(x, y) = \int_a^b x(t)\overline{y(t)}\,dt.$$

It is readily verified that this space is a pre-Hilbert space. In view of Example 6.1.9 this space is not complete relative to the norm $\|x\| = (x, x)^{1/2}$, and hence it is not a Hilbert space.

(b) We extend the space of real-valued functions $L_p[a, b]$, defined in Example 5.5.31 for the case $p = 2$, to complex-valued functions: we take the set of all functions $f: [a, b] \to C$ such that $f = u + iv$ for $u, v \in L_2[a, b]$. Denoting this space also by $L_2[a, b]$, we define

$$(f, g) = \int_{[a, b]} f\bar{g}\,d\mu$$

for $f, g \in L_2[a, b]$, where integration is in the Lebesgue sense. The space $\{L_2[a, b]; (\cdot, \cdot)\}$ is a Hilbert space. ■

In the next example we consider the Cartesian product of Hilbert spaces.

6.11.11. Example. Let $\{X_i\}$, $i = 1, \ldots, n$, denote a finite collection of Hilbert spaces over $C$, and let $X = X_1 \times \cdots \times X_n$. If $x \in X$, then $x = (x_1, \ldots, x_n)$ with $x_i \in X_i$. Defining vector addition and multiplication of vectors by scalars in the usual manner (see Eqs. (3.2.14), (3.2.15), and the related discussion, and see Example 6.1.10), it follows that $X$ is a linear space. If $x, y \in X$ and if $(x_i, y_i)_i$ denotes the inner product of $x_i$ and $y_i$ on $X_i$, then it is easy to show that

$$(x, y) = \sum_{i=1}^{n}(x_i, y_i)_i$$

defines an inner product on $X$. The norm induced on $X$ by this inner product is

$$\|x\| = (x, x)^{1/2} = \Big(\sum_{i=1}^{n}\|x_i\|_i^2\Big)^{1/2},$$

where $\|x_i\|_i = (x_i, x_i)_i^{1/2}$. It is readily verified that $X$ is complete, and thus $X$ is a Hilbert space. ■

6.11.12. Exercise. Verify the assertions made in Example 6.11.11.

In Theorem 6.1.15 we saw that in a normed linear space $\{X; \|\cdot\|\}$ the norm $\|\cdot\|$ is a continuous mapping of $X$ into $R$. Our next result establishes the continuity of the inner product. In the following, $x_n \to x$ means convergence with respect to the norm induced by the inner product $(\cdot, \cdot)$ on $X$.

6.11.13. Theorem. Let $\{x_n\}$ be a sequence in $X$ such that $x_n \to x$, where $x \in X$, and let $\{y_n\}$ be a sequence in $X$. Then

(i) $(z, x_n) \to (z, x)$ for all $z \in X$;
(ii) $(x_n, z) \to (x, z)$ for all $z \in X$;
(iii) $\|x_n\| \to \|x\|$; and
(iv) if $\sum_{n=1}^{\infty} y_n$ is convergent in $X$, then $\Big(\sum_{n=1}^{\infty} y_n, z\Big) = \sum_{n=1}^{\infty}(y_n, z)$ for all $z \in X$.

6.11.14. Exercise. Prove Theorem 6.11.13.

Next, let us recall that two vectors $x, y \in X$ are said to be orthogonal if $(x, y) = 0$ (see Definition 3.6.22). In this case we write $x \perp y$. If $Y \subset X$ and $x \in X$ is such that $x \perp y$ for all $y \in Y$, then we write $x \perp Y$. Also, if $Z \subset X$ and $Y \subset X$ and if $z \perp Y$ for all $z \in Z$, then we write $Y \perp Z$. Furthermore, observe that $x \perp x$ implies that $x = 0$. Finally, the notion of inner product allows us to consider the concepts of alignment and colinearity of vectors.

6.11.15. Definition. Let $X$ be an inner product space. The vectors $x, y \in X$ are said to be colinear if $(x, y) = \pm\|x\|\,\|y\|$, and aligned if $(x, y) = \|x\|\,\|y\|$.
Our next result is proved by straightforward computation.

6.11.16. Theorem. For all $x, y \in X$ we have

(i) $\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2$; and
(ii) if $x \perp y$, then $\|x + y\|^2 = \|x\|^2 + \|y\|^2$.
6.11.17. Exercise. Prove Theorem 6.11.16.

Parts (i) and (ii) of Theorem 6.11.16 are referred to as the parallelogram law and the Pythagorean theorem, respectively (refer to Theorems 4.9.33 and 4.9.38).

6.11.18. Definition. Let $\{x_{\alpha} : \alpha \in I\}$ be an indexed set of elements in $X$, where $I$ is an arbitrary index set (i.e., $I$ is not necessarily the integers). Then $\{x_{\alpha} : \alpha \in I\}$ is said to be an orthogonal set of vectors if $x_{\alpha} \perp x_{\beta}$ for all $\alpha, \beta \in I$ such that $\alpha \neq \beta$. A vector $x \in X$ is called a unit vector if $\|x\| = 1$. An orthogonal set of vectors is called an orthonormal set if every element of the set is a unit vector. Finally, if $\{x_i\}$ is a sequence of elements in $X$, we define an orthogonal sequence and an orthonormal sequence in an obvious manner.

Using an inductive process, we can generalize part (ii) of Theorem 6.11.16 as follows.

6.11.19. Theorem. Let $\{x_1, \ldots, x_n\}$ be a finite orthogonal set in $X$. Then

$$\Big\|\sum_{j=1}^{n} x_j\Big\|^2 = \sum_{j=1}^{n}\|x_j\|^2.$$

We note that if $x \neq 0$ and if $y = x/\|x\|$, then $\|y\| = 1$. Hence, it is possible to convert every orthogonal set of non-zero vectors into an orthonormal set. Let us now consider a specific example.

6.11.20. Example. Let $X$ denote the space of continuous complex-valued functions on the interval $[0, 1]$. In accordance with Example 6.11.10, we define an inner product on $X$ by

$$(f, g) = \int_0^1 f(t)\overline{g(t)}\,dt. \qquad (6.11.21)$$

We now show that the set of vectors defined by

$$f_n(t) = e^{2\pi i n t}, \qquad n = 0, \pm 1, \pm 2, \ldots, \quad i = \sqrt{-1}, \qquad (6.11.22)$$

is an orthonormal set in $X$. Substituting Eq. (6.11.22) into Eq. (6.11.21), we obtain, for $n \neq m$,

$$(f_n, f_m) = \int_0^1 f_n(t)\overline{f_m(t)}\,dt = \int_0^1 e^{2\pi(n-m)it}\,dt = \frac{e^{2\pi(n-m)i} - 1}{2\pi(n - m)i}.$$

Since $e^{2\pi k i} = \cos 2\pi k + i\sin 2\pi k = 1$ for every integer $k$, we have

$$(f_n, f_m) = 0, \qquad m \neq n;$$

i.e., if $m \neq n$, then $f_n \perp f_m$. On the other hand,

$$(f_n, f_n) = \int_0^1 e^{2\pi(n-n)it}\,dt = 1;$$

i.e., if $n = m$, then $(f_n, f_n) = 1$ and $\|f_n\| = 1$. ■
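The orthonormality computed above is easy to confirm numerically (a sketch of ours; the quadrature grid is an arbitrary choice):

```python
import numpy as np

# Numerical check that f_n(t) = exp(2*pi*i*n*t) satisfy
# (f_n, f_m) = integral_0^1 f_n(t) * conj(f_m(t)) dt = 1 if n = m, else 0.
t = np.linspace(0.0, 1.0, 100_001)
dt = t[1] - t[0]

def f(n):
    return np.exp(2j * np.pi * n * t)

def inner(u, v):                   # trapezoidal approximation of (u, v)
    w = u * np.conj(v)
    return complex(np.sum(0.5 * (w[1:] + w[:-1])) * dt)

for n, m in [(0, 0), (1, 1), (1, 2), (3, -3)]:
    print(f"(f_{n}, f_{m}) = {inner(f(n), f(m)):.6f}")
```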

The next result arises often in applications.

6.11.23. Theorem. If $\{x_1, \ldots, x_n\}$ is a finite orthonormal set in $X$, then

(i) $\sum_{i=1}^{n}|(x, x_i)|^2 \le \|x\|^2$ for all $x \in X$; and $\qquad$ (6.11.24)

(ii) $\Big(x - \sum_{i=1}^{n}(x, x_i)x_i\Big) \perp x_j$ for any $j = 1, \ldots, n$.

6.11.25. Exercise. Prove Theorem 6.11.23 (see Theorem 4.9.58).

On passing to the limit as $n \to \infty$ in (6.11.24), we obtain the following result.

6.11.26. Theorem. If $\{x_i\}$ is any countable orthonormal set in $X$, then

$$\sum_{i=1}^{\infty}|(x, x_i)|^2 \le \|x\|^2 \qquad (6.11.27)$$

for every $x \in X$.

The relationship (6.11.27) is known as the Bessel inequality. The scalars $\alpha_i = (x, x_i)$ are called the Fourier coefficients of $x$ with respect to the orthonormal set $\{x_i\}$.

The next result is a generalization of Theorem 4.9.17.
6.11.28. Theorem. In an inner product space $X$ we have $(x, y) = 0$ for all $x \in X$ if and only if $y = 0$.

6.11.29. Exercise. Prove Theorem 6.11.28.

From our discussion thus far it should be clear that not every normed linear space can be made into an inner product space. The following theorem gives us a sufficient condition under which a normed linear space is also an inner product space.

6.11.30. Theorem. Let $X$ be a normed linear space. If for all $x, y \in X$,

$$\|x + y\|^2 + \|x - y\|^2 = 2(\|x\|^2 + \|y\|^2), \qquad (6.11.31)$$

then it is possible to define an inner product on $X$ by

$$(x, y) = \tfrac{1}{4}\{\|x + y\|^2 - \|x - y\|^2 + i\|x + iy\|^2 - i\|x - iy\|^2\} \qquad (6.11.32)$$

for all $x, y \in X$, where $i = \sqrt{-1}$.

6.11.33. Exercise. Prove Theorem 6.11.30.

6.11.34. Corollary. If $X$ is a real normed linear space whose norm satisfies Eq. (6.11.31) for all $x, y \in X$, then it is possible to define an inner product on $X$ by

$$(x, y) = \tfrac{1}{4}\{\|x + y\|^2 - \|x - y\|^2\}$$

for all $x, y \in X$.
6.11.35. Exercise. Prove Corollary 6.11.34.

In view of part (i) of Theorem 6.11.16 and in view of Theorem 6.11.30, condition (6.11.31) is both necessary and sufficient for a normed linear space to also be an inner product space. Furthermore, it can also be shown that Eq. (6.11.32) uniquely defines the inner product on such a normed linear space.

We conclude this section with the following exercise.

6.11.36. Exercise. Let $l_p$, $1 \le p < \infty$, be the normed linear space defined in Example 6.1.6. Show that $l_p$ is an inner product space if and only if $p = 2$.
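The "only if" direction of this exercise is made plausible by the following numerical sketch (ours): for $p \neq 2$ the parallelogram law (6.11.31) already fails for simple vectors in $R^2$, so by Theorem 6.11.30 no inner product can induce the $p$-norm.

```python
import numpy as np

# Test the parallelogram law ||x+y||^2 + ||x-y||^2 = 2(||x||^2 + ||y||^2)
# for p-norms on R^2; it holds only for p = 2 (cf. Exercise 6.11.36).
x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])

for p in [1.0, 1.5, 2.0, 3.0]:
    nrm = lambda v: np.linalg.norm(v, ord=p)
    lhs = nrm(x + y) ** 2 + nrm(x - y) ** 2
    rhs = 2 * (nrm(x) ** 2 + nrm(y) ** 2)
    verdict = "holds" if np.isclose(lhs, rhs) else "fails"
    print(f"p = {p}: lhs = {lhs:.4f}, rhs = {rhs:.4f}  ->  {verdict}")
```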

6.12. ORTHOGONAL COMPLEMENTS

In this section we establish some interesting structural properties of Hilbert spaces. Specifically, we will show that any vector $x$ of a Hilbert space $X$ can uniquely be represented as the sum of two vectors $y$ and $z$, where $y$ is in a subspace $Y$ of $X$ and $z$ is orthogonal to $Y$. This is known as the projection theorem. In proving this theorem we employ the so-called "classical projection theorem," a result of great importance in its own right. This theorem extends the following familiar result to the case of (infinite-dimensional) Hilbert spaces: in the three-dimensional Euclidean space the shortest distance between a point and a plane is along a vector through the point and perpendicular to the plane. Both the classical projection theorem and the projection theorem are of great importance in applications.

Throughout this section, $\{X; (\cdot, \cdot)\}$ is a complex inner product space.

6.12.1. Definition. Let $Y$ be a non-void subset of $X$. The set of all vectors orthogonal to $Y$, denoted by $Y^{\perp}$, is called the orthogonal complement of $Y$. The orthogonal complement of $Y^{\perp}$ is denoted by $(Y^{\perp})^{\perp} \triangleq Y^{\perp\perp}$, the orthogonal complement of $Y^{\perp\perp}$ is denoted by $(Y^{\perp\perp})^{\perp} \triangleq Y^{\perp\perp\perp}$, etc.

6.12.2. Example. Let $X$ be the space $E^3$ depicted in Figure G, and let $Y$ be the $x_1$-axis. Then $Y^{\perp}$ is the $x_2x_3$-plane, $Y^{\perp\perp}$ is the $x_1$-axis, $Y^{\perp\perp\perp}$ is again the $x_2x_3$-plane, etc. Thus, in the present case, $Y^{\perp\perp} = Y$, $Y^{\perp\perp\perp} = Y^{\perp}$, $Y^{\perp\perp\perp\perp} = Y^{\perp\perp}$, etc. ■

6.12.3. Figure G.

We now state and prove several properties of the orthogonal complement. The proof of the first result is left as an exercise.

6.12.4. Theorem. In an inner product space $X$, $\{0\}^{\perp} = X$ and $X^{\perp} = \{0\}$.

6.12.5. Exercise. Prove Theorem 6.12.4.

6.12.6. Theorem. Let $Y$ be a non-void subset of $X$. Then $Y^{\perp}$ is a closed linear subspace of $X$.

Proof. If $x, y \in Y^{\perp}$, then $(x, z) = 0$ and $(y, z) = 0$ for all $z \in Y$. Hence, $(\alpha x + \beta y, z) = \alpha(x, z) + \beta(y, z) = 0$, and thus $(\alpha x + \beta y) \perp z$ for all $z \in Y$, or $(\alpha x + \beta y) \in Y^{\perp}$. Therefore, $Y^{\perp}$ is a linear subspace of $X$.

To show that $Y^{\perp}$ is closed, assume that $x_0$ is a point of accumulation of $Y^{\perp}$. Then there is a sequence $\{x_n\}$ from $Y^{\perp}$ such that $\|x_n - x_0\| \to 0$ as $n \to \infty$. By Theorem 6.11.13 we have $0 = (x_n, z) \to (x_0, z)$ as $n \to \infty$ for all $z \in Y$. Therefore $x_0 \in Y^{\perp}$, and $Y^{\perp}$ is closed. ■

Before considering the next result we require the following concept.

6.12.7. Definition. Let $Y$ be a non-void subset of $X$, and let $V(Y)$ be the linear subspace generated by $Y$ (see Definition 3.3.6). Let $\overline{V(Y)}$ denote the closure of $V(Y)$. We call $\overline{V(Y)}$ the closed linear subspace generated by $Y$.

Note that in view of Theorem 6.2.3, $\overline{V(Y)}$ is indeed a linear subspace of $X$.

6.12.8. Theorem. Let $Y$ and $Z$ be non-void subsets of $X$. Then

(i) either $Y \cap Y^{\perp} = \varnothing$ or $Y \cap Y^{\perp} = \{0\}$;
(ii) $Y \subset Y^{\perp\perp}$;
(iii) if $Y \subset Z$, then $Z^{\perp} \subset Y^{\perp}$;
(iv) $Y^{\perp} = Y^{\perp\perp\perp}$; and
(v) $Y^{\perp\perp}$ is the smallest closed linear subspace of $X$ which contains $Y$; i.e., $Y^{\perp\perp} = \overline{V(Y)}$.

Proof. To prove part (i), assume that $Y \cap Y^{\perp} \neq \varnothing$, and let $x \in Y \cap Y^{\perp}$. Then $x \in Y$ and $x \in Y^{\perp}$, and so $(x, x) = 0$. This implies that $x = 0$.

The proof of part (ii) is left as an exercise.

To prove part (iii), let $y \in Z^{\perp}$. Then $y \perp x$ for all $x \in Z$. Since $Z \supset Y$, it follows that $y \perp x$ for all $x \in Y$. Thus, $y \in Y^{\perp}$ whenever $y \in Z^{\perp}$, and $Y^{\perp} \supset Z^{\perp}$.

To prove part (iv) we note that, by part (ii) of this theorem, $Y^{\perp} \subset (Y^{\perp})^{\perp\perp} = Y^{\perp\perp\perp}$. On the other hand, since $Y \subset Y^{\perp\perp}$, by part (iii) of this theorem, $Y^{\perp} \supset Y^{\perp\perp\perp}$. Thus, $Y^{\perp} = Y^{\perp\perp\perp}$.

The proof of part (v) is also left as an exercise. ■

6.12.9. Exercise. Prove parts (ii) and (v) of Theorem 6.12.8.

In view of part (iv) of the above theorem, we can write $Y^{\perp} = Y^{\perp\perp\perp} = Y^{\perp\perp\perp\perp\perp} = \cdots$, and $Y^{\perp\perp} = Y^{\perp\perp\perp\perp} = \cdots$.

Before giving the classical projection theorem, we state and prove the following preliminary result.

6.12.10. Theorem. Let $Y$ be a linear subspace of $X$, and let $x$ be an arbitrary vector in $X$. Let

$$\delta = \inf\{\|y - x\| : y \in Y\}.$$

If there exists a $y_0 \in Y$ such that $\|y_0 - x\| = \delta$, then $y_0$ is unique. Moreover, $y_0 \in Y$ is the unique element in $Y$ such that $\|y_0 - x\| = \delta$ if and only if $(x - y_0) \perp Y$.

Proof. Let us first show that if $\|y_0 - x\| = \delta$, then $(x - y_0) \perp Y$. In doing so, we assume to the contrary that there is a $y \in Y$ not orthogonal to $x - y_0$. We also assume, without loss of generality, that $y$ is a unit vector and that $(x - y_0, y) = \alpha \neq 0$. Defining a vector $z \in Y$ as $z = y_0 + \alpha y$, we have

$$\|x - z\|^2 = \|x - y_0 - \alpha y\|^2 = (x - y_0 - \alpha y, x - y_0 - \alpha y)$$
$$= (x - y_0, x - y_0) - (x - y_0, \alpha y) - (\alpha y, x - y_0) + (\alpha y, \alpha y)$$
$$= \|x - y_0\|^2 - |\alpha|^2 - |\alpha|^2 + |\alpha|^2\|y\|^2 = \|x - y_0\|^2 - |\alpha|^2 < \|x - y_0\|^2;$$

i.e., $\|x - z\| < \|x - y_0\|$. From this it follows that if $x - y_0$ is not orthogonal to every $y \in Y$, then $\|y_0 - x\| \neq \delta$. This completes the first part of the proof.

Next, assume that $(x - y_0) \perp Y$. We must show that $y_0$ is the unique vector such that $\|x - y\| > \|x - y_0\|$ for all $y \neq y_0$. For any $y \in Y$ we have, in view of part (ii) of Theorem 6.11.16,

$$\|x - y\|^2 = \|x - y_0 + y_0 - y\|^2 = \|x - y_0\|^2 + \|y_0 - y\|^2.$$

From this it follows that $\|x - y\| > \|x - y_0\|$ for all $y \neq y_0$. This completes the proof of the theorem. ■

In Figure H the meaning of Theorem 6.12.10 is illustrated pictorially for a subset $Y$ of $E^3$.

6.12.11. Figure H.

The preceding theorem does not ensure the existence of the vector $y_0$. However, if we require in Theorem 6.12.10 that $Y$ be a closed linear subspace in a Hilbert space $X$, then the existence of the unique vector $y_0$ is guaranteed.
This important result, which we will prove below, is called the classical projection theorem.

6.12.12. Theorem. Let $X$ be a Hilbert space, and let $Y$ be a closed linear subspace of $X$. Let $x$ be an arbitrary vector in $X$, and let

$$\delta = \inf\{\|y - x\| : y \in Y\}.$$

Then there exists a unique vector $y_0 \in Y$ such that $\|y_0 - x\| = \delta$. Moreover, $y_0 \in Y$ is the unique vector such that $\|y_0 - x\| = \inf\{\|y - x\| : y \in Y\}$ if and only if the vector $(x - y_0) \perp Y$.

Proof. In view of Theorem 6.12.10 we only have to establish the existence of a vector $y_0 \in Y$ such that $\|x - y_0\| = \delta$. Assume that $x \notin Y$ (if $x \in Y$, then $x = y_0$ and we are done). Since $\delta$ is the infimum of $\|y - x\|$ over all $y \in Y$, there is a sequence $\{y_n\}$ in $Y$ such that $\|x - y_n\| \to \delta$ as $n \to \infty$. We now show that $\{y_n\}$ is a Cauchy sequence. By part (i) of Theorem 6.11.16 we have

$$\|(y_m - x) + (x - y_n)\|^2 + \|(y_m - x) - (x - y_n)\|^2 = 2\|y_m - x\|^2 + 2\|x - y_n\|^2.$$

This equation yields, after some straightforward manipulations, the relation

$$\|y_m - y_n\|^2 = 2\|y_m - x\|^2 + 2\|x - y_n\|^2 - 4\Big\|x - \frac{y_m + y_n}{2}\Big\|^2.$$

Since $Y$ is a linear subspace, it follows that for each $y_m, y_n \in Y$ we have $(y_m + y_n)/2 \in Y$. Thus, $\|x - (y_m + y_n)/2\| \ge \delta$ and

$$\|y_m - y_n\|^2 \le 2\|y_m - x\|^2 + 2\|x - y_n\|^2 - 4\delta^2.$$

Also, since $\|y_m - x\|^2 \to \delta^2$ as $m \to \infty$, it follows that $\|y_m - y_n\|^2 \to 0$ as $m, n \to \infty$. Hence, $\{y_n\}$ is a Cauchy sequence. Since $Y$ is a closed linear subspace of a Hilbert space, it is itself a Hilbert space, and as such $\{y_n\}$ has a limit $y_0 \in Y$. Finally, by the continuity of the norm (see Theorem 6.1.15), it follows that $\lim_{n \to \infty}\|x - y_n\| = \|x - y_0\| = \delta$. This proves the theorem. ■

The next result is a consequence of the preceding theorem.

6.12.13. Theorem. If $Y$ and $Z$ are closed linear subspaces of a Hilbert space $X$, if $Y \subset Z$, and if $Y \neq Z$, then there exists a non-zero vector $z \in Z$ such that $z \perp Y$.

Proof. Let $x$ be any vector in $Z$ which is not in $Y$ (there is one such vector by hypothesis). If we define $\delta$ as above, i.e., $\delta = \inf\{\|y - x\| : y \in Y\}$, then there exists by Theorem 6.12.12 a vector $y_0 \in Y$ such that $\|x - y_0\| = \delta$. Now let $z = y_0 - x$; note that $z \neq 0$, since $x \notin Y$. Then $z \perp Y$ by Theorem 6.12.12. ■
From part (ii) of Theorem 6.12.8 we have, in general, $Y^{\perp\perp} \supset Y$. Under certain conditions equality holds.

6.12.14. Theorem. Let $Y$ be a linear subspace of a Hilbert space $X$. Then $\bar{Y} = Y^{\perp\perp}$.

Proof. From part (ii) of Theorem 6.12.8 we have $Y \subset Y^{\perp\perp}$. Since $Y^{\perp\perp}$ is closed by Theorem 6.12.6, it follows that $\bar{Y} \subset Y^{\perp\perp}$. For purposes of contradiction, let us now assume that $\bar{Y} \neq Y^{\perp\perp}$. Then Theorem 6.12.13 establishes the existence of a vector $z \in Y^{\perp\perp}$ such that $z \neq 0$ and such that $z \perp \bar{Y}$. Thus, $z \in \bar{Y}^{\perp}$. Since $Y \subset \bar{Y}$, it follows that $z \in Y^{\perp}$. Therefore, we have $z \in Y^{\perp} \cap Y^{\perp\perp}$ and $z \neq 0$, which is a contradiction to part (i) of Theorem 6.12.8. Hence, we must have $\bar{Y} = Y^{\perp\perp}$. ■

We note that if, in particular, $Y$ is a closed linear subspace of $X$, then $Y = Y^{\perp\perp}$.

In connection with the next result, recall the definition of the sum of two subsets of $X$ (see Definition 3.2.8).

6.12.15. Theorem. If $Y$ and $Z$ are closed linear subspaces of a Hilbert space $X$, and if $Y \perp Z$, then $Y + Z$ is a closed linear subspace of $X$.

Proof. In view of Theorem 3.2.10, $Y + Z$ is a linear subspace of $X$. To show that $Y + Z$ is closed, it suffices to show that if $u$ is a point of accumulation of $Y + Z$, then $u = y + z$ for some $y \in Y$ and some $z \in Z$. Let $u$ be a point of accumulation of $Y + Z$. Then there is a sequence of vectors $\{u_n\}$ in $Y + Z$ with $\|u_n - u\| \to 0$ as $n \to \infty$, where for each $n$ we have $u_n = y_n + z_n$ with $y_n \in Y$ and $z_n \in Z$. By the Pythagorean theorem (see Theorem 6.11.16) we have

$$\|u_n - u_m\|^2 = \|y_n - y_m + z_n - z_m\|^2 = \|y_n - y_m\|^2 + \|z_n - z_m\|^2.$$

But $\|u_n - u_m\| \to 0$ as $m, n \to \infty$, because $\{u_n\}$, having a limit, is a Cauchy sequence. Therefore, $\|y_n - y_m\|^2 \to 0$ and $\|z_n - z_m\|^2 \to 0$ as $m, n \to \infty$. But this implies that the sequences $\{y_n\}$, $\{z_n\}$ are also Cauchy sequences. Since $Y$ and $Z$ are closed, these sequences have limits $y \in Y$ and $z \in Z$, respectively. Finally, we note that

$$\|u_n - (y + z)\| = \|y_n - y + z_n - z\| \le \|y_n - y\| + \|z_n - z\| \to 0$$

as $n \to \infty$. Therefore, since $\{u_n\}$ cannot approach two distinct limits, we have $u = y + z$. This completes the proof. ■

Before proceeding to the next result, we recall from Definition 3.2.13 that a linear space $X$ is the direct sum of two linear subspaces $Y$ and $Z$ if for every $x \in X$ there is a unique $y \in Y$ and a unique $z \in Z$ such that $x = y + z$. We write, in this case, $X = Y \oplus Z$. The following result is known as the projection theorem.

6.12.16. Theorem. If $Y$ is a closed linear subspace of a Hilbert space $X$, then $X = Y \oplus Y^{\perp}$.

Proof. Let $Z = Y + Y^{\perp}$. By hypothesis, $Y$ is a closed linear subspace, and so is $Y^{\perp}$ in view of Theorem 6.12.6. From the previous result it now follows that $Z$ is also a closed linear subspace. Next, we show that $Z = X$. Since $Y \subset Z$ and $Y^{\perp} \subset Z$, it follows from part (iii) of Theorem 6.12.8 that $Z^{\perp} \subset Y^{\perp}$ and also that $Z^{\perp} \subset Y^{\perp\perp}$, so that $Z^{\perp} \subset Y^{\perp} \cap Y^{\perp\perp}$. But from part (i) of Theorem 6.12.8 we have $Y^{\perp} \cap Y^{\perp\perp} = \{0\}$. Therefore, the zero vector is the only element in both $Y^{\perp}$ and $Y^{\perp\perp}$, and thus $Z^{\perp} = \{0\}$. Since $Z$ is a closed linear subspace we have, from Theorems 6.12.4 and 6.12.14,

$$Z = Z^{\perp\perp} = (Z^{\perp})^{\perp} = \{0\}^{\perp} = X.$$

We have thus shown that we can represent every $x \in X$ as the sum $x = y + z$, where $y \in Y$ and $z \in Y^{\perp}$. To show that this representation is unique, we consider $x = y_1 + z_1$ and $x = y_2 + z_2$, where $y_1, y_2 \in Y$ and $z_1, z_2 \in Y^{\perp}$. Then $(x - x) = 0 = y_1 - y_2 + z_1 - z_2$, or $y_1 - y_2 = z_2 - z_1$. Now clearly $(y_1 - y_2) \in Y$ and $(z_2 - z_1) \in Y^{\perp}$. Since $y_1 - y_2 = z_2 - z_1$ we also have $(y_1 - y_2) \in Y^{\perp}$ and $(z_2 - z_1) \in Y$. From this it follows that $y_1 - y_2 = z_2 - z_1 = 0$; i.e., $y_1 = y_2$ and $z_1 = z_2$. Therefore, the representation of $x$ is unique. ■

The above theorem allows us to write any vector $x$ of a Hilbert space $X$ as the sum of two vectors $y$ and $z$; i.e., $x = y + z$, where $y$ is in a closed linear subspace $Y$ of $X$ and $z$ is in $Y^{\perp}$. It is this theorem which gave rise to the expression orthogonal complement.

If $X$ is a Hilbert space, if $Y$ is a closed linear subspace of $X$, and if $x = y + z$, where $y \in Y$ and $z \in Y^{\perp}$, then we define the mapping $P$ by

$$Px = y.$$

We call the function $P$ the projection of $x$ onto $Y$. Note that $P(Px) \triangleq P^2x = Py = y$; i.e., $P^2 = P$. We will examine the properties of projections in greater detail in the next chapter. (Refer also to Definition 3.7.1 and Theorem 3.7.4.)
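In a finite-dimensional Hilbert space the projection $Px$ onto a subspace $Y$ spanned by given vectors can be computed from the orthogonality condition $(x - Px) \perp Y$, which reduces to the normal equations. The following Python sketch (ours; the basis of $Y$ and the vector $x$ are random choices) computes $Px$ and verifies that $P^2 = P$:

```python
import numpy as np

# Projection onto Y = span of the columns of B in Euclidean R^5.
# The condition (x - Px) _|_ Y gives B^T (x - B c) = 0, i.e. the normal
# equations B^T B c = B^T x, and then Px = B c.
rng = np.random.default_rng(1)
B = rng.standard_normal((5, 2))        # basis vectors of Y as columns
x = rng.standard_normal(5)

def project(v):
    c = np.linalg.solve(B.T @ B, B.T @ v)
    return B @ c

Px = project(x)
assert np.allclose(B.T @ (x - Px), 0)  # the residual is orthogonal to Y
assert np.allclose(project(Px), Px)    # idempotence: P^2 = P
print("Px =", Px)
```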

6.13. FOURIER SERIES

In the previous section we examined some of the structural properties of Hilbert spaces. Presently, we will concern ourselves with the representation of elements in Hilbert space. We will see that the vectors of a Hilbert space can, under certain conditions, be represented as a linear combination of a finite or infinite number of vectors from an orthonormal set. In this connection we will touch upon the concept of basis in Hilbert space. The property which makes all this possible is, of course, the inner product.

Much of the material in this section is concerned with an abstract approach to the topic of Fourier series. Since the reader is probably already familiar with certain facets of Fourier analysis, he or she is now in a position to recognize the power and the beauty of the abstract approach.

Throughout this section $\{X; (\cdot, \cdot)\}$ is a complex inner product space, and convergence of an infinite series is to be understood in the sense of Definition 6.3.1.

We now consider the representation of a vector $y$ of a finite-dimensional linear subspace $Y$ in an inner product space.

6.13.1. Theorem. Let $X$ be an inner product space, let $\{y_1, \ldots, y_n\}$ be a finite orthonormal set in $X$, and let $Y$ be the linear subspace of $X$ generated by $\{y_1, \ldots, y_n\}$. Then the vectors $\{y_1, \ldots, y_n\}$ form a basis for $Y$ and, moreover, in the representation of a vector $y \in Y$ by the sum

$$y = \alpha_1 y_1 + \cdots + \alpha_n y_n,$$

the coefficients $\alpha_i$ are specified by

$$\alpha_i = (y, y_i), \qquad i = 1, \ldots, n.$$

6.13.2. Exercise. Prove Theorem 6.13.1 (refer to Theorems 4.9.44 and 4.9.51).

We now generalize the preceding result.

6.13.3. Theorem. Let $X$ be a Hilbert space and let $\{x_i\}$ be a countably infinite orthonormal sequence in $X$. A series $\sum_{i=1}^{\infty}\alpha_i x_i$ is convergent to an element $x \in X$ if and only if $\sum_{i=1}^{\infty}|\alpha_i|^2 < \infty$. In this case we have the relation

$$\alpha_i = (x, x_i), \qquad i = 1, 2, \ldots.$$

Proof. Assume that $\sum_{i=1}^{\infty}|\alpha_i|^2 < \infty$, and let $s_n = \sum_{i=1}^{n}\alpha_i x_i$. If $n > m$, then

$$\|s_n - s_m\|^2 = \Big\|\sum_{i=m+1}^{n}\alpha_i x_i\Big\|^2 = \sum_{i=m+1}^{n}|\alpha_i|^2 \to 0$$

as $n, m \to \infty$. Therefore, $\{s_n\}$ is a Cauchy sequence and as such it has a limit, say $x$, in the Hilbert space $X$. Thus $\lim_{n \to \infty} s_n = x$.

Conversely, if $\{s_n\}$ converges, then it is a Cauchy sequence and $\|s_n - s_m\|^2 = \sum_{i=m+1}^{n}|\alpha_i|^2 \to 0$ as $n, m \to \infty$. From this it follows that $\sum_{i=1}^{\infty}|\alpha_i|^2 < \infty$.

Now assume that $\sum_{i=1}^{\infty}|\alpha_i|^2 < \infty$, and let $x = \lim_{n \to \infty} s_n$. We must show that $\alpha_i = (x, x_i)$. From Theorem 6.13.1 we have $\alpha_i = (s_n, x_i)$, $i = 1, \ldots, n$. But $s_n \to x$, and hence by the continuity of the inner product we have $(s_n, x_i) \to (x, x_i)$ as $n \to \infty$. Therefore, $\alpha_i = (x, x_i)$, which completes the proof. ■

In the next result we use the concept of a closed linear subspace generated by a set (see Definition 6.12.7).

6.13.4. Theorem. Let $\{x_i\}$ be an orthonormal sequence in a Hilbert space $X$, and let $Y$ be the closed linear subspace generated by $\{x_i\}$. Corresponding to each $x \in X$, the series

$$\sum_{i=1}^{\infty}(x, x_i)x_i \qquad (6.13.5)$$

converges to an element $\bar{x} \in Y$. Moreover, $(x - \bar{x}) \perp Y$.

6.13.6. Exercise. Prove Theorem 6.13.4. (Hint: Utilize Theorems 6.11.26 and 6.13.3, and the continuity of the inner product.)

A more general version of Theorem 6.13.4 can be established by replacing the orthonormal sequence $\{x_i\}$ by an arbitrary orthonormal set $Z$.

In view of Theorem 6.13.4, any element $x$ of a Hilbert space $X$ can unambiguously be represented by a series of the form (6.13.5), provided that the closed linear subspace $Y$ generated by the orthonormal sequence $\{x_i\}$ is equal to the space $X$. The scalars $(x, x_i)$ in (6.13.5) are called the Fourier coefficients of $x$ with respect to the $\{x_i\}$.

6.13.7. Definition. Let $X$ be a Hilbert space. An orthonormal set $Y$ in $X$ is said to be complete if there exists no orthonormal set of which $Y$ is a proper subset.

The next result enables us to characterize complete orthonormal sets.

6.13.8. Theorem. Let $X$ be a Hilbert space, and let $Y$ be an orthonormal set in $X$. Then the following statements are equivalent:

(i) $Y$ is complete;
(ii) if $(x, y) = 0$ for all $y \in Y$, then $x = 0$; and
(iii) $\overline{V(Y)} = X$.

6.13.9. Exercise. Prove Theorem 6.13.8 for the case where $Y$ is an orthonormal sequence $\{x_i\}$.

As a specific example of a complete orthonormal set, we consider the set of elements $e_1 = (1, 0, \ldots, 0, \ldots)$, $e_2 = (0, 1, 0, \ldots, 0, \ldots)$, $e_3 = (0, 0, 1, 0, \ldots, 0, \ldots)$, ... in the Hilbert space $l_2$ (see Example 6.11.9). It is readily verified that $Y = \{e_i\}$ is an orthonormal set in $l_2$. Now let $x = (\xi_1, \xi_2, \ldots, \xi_n, \ldots) \in l_2$, and corresponding to $x$ let $x_k = \sum_{i=1}^{k}\xi_i e_i$. Then

$$\|x - x_k\|^2 = \sum_{i=k+1}^{\infty}|\xi_i|^2,$$

and thus $\lim_{k \to \infty}\|x - x_k\| = 0$. Hence, $\overline{V(Y)} = l_2$, and $Y$ is complete by the preceding theorem.
Many of the subsequent results involving countable orthonormal sets
may be shown to hold for uncountable orthonormal sets as well (refer to
Definition 1.2.48). The proofs of these generalized results usually require a
postulate known as Zorn's lemma. (Consult the references cited at the end
of this chapter for a discussion of this lemma.) Although the proofs of such
generalized results are not particularly difficult, they do involve an added
level of abstraction which we do not wish to pursue in this book. In connec-
tion with generalized results of this type, it is also necessary to use the notion
of cardinal number of a set, introduced at the end of Section 1.2.
The next result is known as Parseval's formula (refer also to Corollary 4.9.49).

6.13.10. Theorem. Let $X$ be a Hilbert space and let the sequence $\{x_i\}$ be orthonormal in $X$. Then

$$\|x\|^2 = \sum_{i=1}^{\infty}|(x, x_i)|^2 \qquad (6.13.11)$$

for every $x \in X$ if and only if the sequence $\{x_i\}$ is complete.

Proof. Assume first that the sequence $\{x_i\}$ is not complete. Then there exists some $z \neq 0$ such that $(z, x_i) = 0$ for all $i$. Thus, there exists a $z \in X$ such that $\|z\|^2 \neq \sum_{i=1}^{\infty}|(z, x_i)|^2$. This proves the first part.

Now assume that the sequence $\{x_i\}$ is complete. In view of Theorems 6.13.4 and 6.13.8 we have

$$x = \sum_{i=1}^{\infty}(x, x_i)x_i = \sum_{i=1}^{\infty}\alpha_i x_i.$$

Since $\{x_i\}$ is orthonormal, we obtain

$$\|x\|^2 = \Big(\sum_{i=1}^{\infty}\alpha_i x_i,\ \sum_{j=1}^{\infty}\alpha_j x_j\Big) = \sum_{i=1}^{\infty}\sum_{j=1}^{\infty}\alpha_i\bar{\alpha}_j(x_i, x_j) = \sum_{i=1}^{\infty}|\alpha_i|^2.$$

This completes the proof. ■
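Parseval's formula can be verified numerically in a finite-dimensional Hilbert space, where any orthonormal basis is complete. A sketch of ours, using a random orthonormal basis of $R^5$ obtained from a QR factorization:

```python
import numpy as np

# Check Parseval's formula ||x||^2 = sum |(x, x_i)|^2 for a complete
# orthonormal set {x_i} (here: the columns of an orthogonal matrix Q).
rng = np.random.default_rng(2)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # orthonormal columns

x = rng.standard_normal(5)
coeffs = Q.T @ x                  # Fourier coefficients (x, x_i)

print("||x||^2          =", np.linalg.norm(x) ** 2)
print("sum |(x, x_i)|^2 =", float((coeffs ** 2).sum()))   # the two agree
```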

A more general version of Theorem 6.13.10 can be established by replacing the orthonormal sequence by an orthonormal set.

The next result, known as the Gram-Schmidt procedure, allows us to construct orthonormal sets in inner product spaces (compare with Theorem 4.9.55).

6.13.12. Theorem. Let X be an inner-product space. Let $\{x_i\}$ be a finite or a countably infinite sequence of linearly independent vectors. Then there exists an orthonormal sequence $\{y_i\}$ having the same cardinal number as the sequence $\{x_i\}$ and generating the same linear subspace as $\{x_i\}$.

Proof. Since $x_1 \neq 0$, let us define $y_1$ as
$$y_1 = \frac{x_1}{\|x_1\|}.$$
It is clear that $y_1$ and $x_1$ generate the same linear subspace. Next, let
$$z_2 = x_2 - (x_2, y_1)y_1.$$
Since
$$(z_2, y_1) = (x_2 - (x_2, y_1)y_1,\ y_1) = (x_2, y_1) - (x_2, y_1)(y_1, y_1) = (x_2, y_1) - (x_2, y_1) = 0,$$
it follows that $z_2 \perp y_1$. We now let $y_2 = z_2/\|z_2\|$. Note that $z_2 \neq 0$, because $x_2$ and $y_1$ are linearly independent. Also, $y_1$ and $y_2$ generate the same linear subspace as $x_1$ and $x_2$, because $y_2$ is a linear combination of $y_1$ and $x_2$.

Proceeding in the fashion described above we define $z_1, z_2, \ldots$ and $y_1, y_2, \ldots$ recursively as
$$z_n = x_n - \sum_{i=1}^{n-1} (x_n, y_i)y_i$$
and
$$y_n = \frac{z_n}{\|z_n\|}.$$
As before, we can readily verify that $z_n \perp y_i$ for all i < n, that $z_n \neq 0$, and that the $\{y_i\}$, i = 1, ..., n, generate the same linear subspace as the $\{x_i\}$, i = 1, ..., n. If the set $\{x_i\}$ is finite, the process terminates. Otherwise it is continued indefinitely by induction.

The sequence $\{y_n\}$ thus constructed can be put into a one-to-one correspondence with the sequence $\{x_n\}$. Therefore, these sequences have the same cardinal number. ∎
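In coordinates, the recursion in this proof is immediately programmable. The sketch below is our illustration (assuming NumPy and the standard dot product on $R^n$), not part of the text; for linearly independent inputs it produces the orthonormal vectors $y_n$ of Theorem 6.13.12.

```python
import numpy as np

def gram_schmidt(xs):
    """Orthonormalize the rows of xs by the recursion of Theorem 6.13.12:
    z_n = x_n - sum_{i<n} (x_n, y_i) y_i,   y_n = z_n / ||z_n||."""
    ys = []
    for x in xs:
        z = x.astype(float).copy()
        for y in ys:
            z -= np.dot(x, y) * y           # subtract the projection onto y_i
        norm = np.linalg.norm(z)
        if norm == 0.0:                     # would mean the x's are dependent
            raise ValueError("input vectors are linearly dependent")
        ys.append(z / norm)
    return np.array(ys)

Y = gram_schmidt(np.array([[1.0, 1.0, 0.0],
                           [1.0, 0.0, 1.0],
                           [0.0, 1.0, 1.0]]))
print(np.round(Y @ Y.T, 10))                # identity: the y_n are orthonormal
```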

The following result can be established by use of Zorn's lemma.

6.13.13. Theorem. Let X be an inner product space containing a non-zero element. Then X contains a complete orthonormal set. If Y is any orthonormal set in X, then there is a complete orthonormal set containing Y as a subset.

Indeed, it is also possible to prove the following result: if $Y_1$ and $Y_2$ are two complete orthonormal sets in an inner product space, then $Y_1$ and $Y_2$ have the same cardinal number, so that a one-to-one mapping of set $Y_1$ onto set $Y_2$ can be established. This result, along with Theorem 6.13.13, allows us to conclude that with each Hilbert space X there is associated in a natural way a cardinal number. This, in turn, enables us to consider this cardinal number as the dimension of the Hilbert space X. For the case of finite-dimensional spaces this concept and the usual definition of dimension coincide. However, in general, these two notions are not to be viewed as one and the same concept.
Next, recall that in Chapter 5 we defined a metric space X to be separable
if there is a countable subset everywhere dense in X (see Definition 5.4.33).
Since normed linear spaces and inner product spaces are also metric spaces,
we speak also of separable Banach spaces and separable Hilbert spaces. In
the case of Hilbert spaces, we can characterize separability in the following
equivalent way.

6.13.14. Theorem. A Hilbert space X is separable if and only if it contains


a complete orthonormal sequence.

6.13.15. Exercise. Prove Theorem 6.13.14.

Since in a separable Hilbert space X with a complete orthonormal sequence $\{x_i\}$ one can represent every $x \in X$ as
$$x = \sum_{i=1}^{\infty} (x, x_i)x_i,$$
we refer to a complete orthonormal sequence $\{x_i\}$ in a separable Hilbert space X as a basis for X. Caution should be taken here not to confuse this concept with the definition of basis introduced in Chapter 3. (See Definitions 3.3.6 and 3.3.22.) In that case we defined each x in a vector space to have a representation as a finite linear combination of vectors $x_i$. Indeed, the concept of Hamel basis (see Definition 3.3.22), which is a purely algebraic concept, is of very little value in spaces which are not finite dimensional. In such spaces, an orthonormal basis as defined above is much more useful.
We conclude this section with the following result.

6.13.16. Theorem. Let Y be an orthonormal set in a separable Hilbert space X. Then Y is either a finite set or a countably infinite set.

6.13.17. Exercise. Prove Theorem 6.13.16.

6.14. THE RIESZ REPRESENTATION THEOREM

In this section we state and prove an important result known as the Riesz representation theorem. A direct consequence of this theorem is that the dual space X* of a Hilbert space X is itself a Hilbert space. Throughout this section, {X; (·, ·)} is a Hilbert space.

We begin by first noting that for a fixed y ∈ X,
$$f(x) = (x, y) \qquad (6.14.1)$$
is a linear functional in x. By means of (6.14.1) distinct vectors y ∈ X are associated with distinct functionals. From the Schwarz inequality we have
$$|(x, y)| \leq \|x\|\,\|y\|.$$
Hence, $\|f\| \leq \|y\|$ and f is bounded (i.e., f ∈ X*). From this it follows that if X is a Hilbert space, then bounded linear functionals are determined by the elements of X itself. In the next theorem we show that every element y of X determines a unique bounded linear functional f (i.e., a unique element of X*) of the form (6.14.1) and that $\|f\| = \|y\|$. From this we conclude that the dual space X* of the Hilbert space X is itself a Hilbert space. (Compare the following with Theorem 4.9.63.)

6.14.2. Theorem. (Riesz) Let f be a bounded linear functional on X. Then there is a unique y ∈ X such that f(x) = (x, y) for all x ∈ X. Moreover, $\|f\| = \|y\|$, and every y determines a unique element of the dual space X* in this way.

Proof. For fixed y ∈ X, define the linear functional f on X by Eq. (6.14.1). From the Schwarz inequality we have $|f(x)| = |(x, y)| \leq \|y\|\,\|x\|$, so that f is a bounded linear functional and $\|f\| \leq \|y\|$. Letting x = y we have $|f(y)| = |(y, y)| = \|y\|\,\|y\|$, from which it follows that $\|f\| = \|y\|$.

Next, let f be a bounded linear functional defined on the Hilbert space X. Let Z be the set of all vectors z ∈ X such that f(z) = 0. By Theorem 3.4.19, Z is a linear subspace of X. Now let $\{z_n\}$ be a sequence of vectors in Z, and let $x_0 \in X$ be a point of accumulation of $\{z_n\}$. In view of the continuity of f we now have $0 = f(z_n) \to f(x_0)$ as $n \to \infty$. Thus, $x_0 \in Z$ and Z is closed.

If Z = X, then for all x ∈ X we have f(x) = 0, and the equality f(x) = (x, y) = 0 for all x ∈ X holds if and only if y = 0.

Now consider the case Z ⊂ X, X ≠ Z. From above, Z is a closed linear subspace of X. We can therefore utilize Theorem 6.12.16 to represent X by the direct sum
$$X = Z \oplus Z^{\perp}.$$
Since Z ⊂ X and Z ≠ X, there exists in view of Theorem 6.12.13 a non-zero vector u ∈ X such that u ⊥ Z; i.e., $u \in Z^{\perp}$. Also, since u ≠ 0 and since $u \in Z^{\perp}$, it follows from part (i) of Theorem 6.12.8 that $u \notin Z$, and hence f(u) ≠ 0. Since $Z^{\perp}$ is a linear subspace of X, we may assume without loss of generality that f(u) = 1. We now show that u is a scalar multiple of our desired vector y in Eq. (6.14.1).

For any fixed x ∈ X we can write
$$f(x - f(x)u) = f(x) - f(x)f(u) = f(x) - f(x) = 0,$$
and thus $(x - f(x)u) \in Z$. From before, we have u ⊥ Z and hence $(x - f(x)u, u) = 0$, or $(x, u) = f(x)\|u\|^2$, or $f(x) = (x, u/\|u\|^2)$. Letting $y = u/\|u\|^2$ now yields the desired form
$$f(x) = (x, y).$$
To show that the vector y is unique, we assume that f(x) = (x, y′) and f(x) = (x, y″) for all x ∈ X. Then (x, y′) − (x, y″) = 0, or (x, y′ − y″) = 0, for all x ∈ X. It now follows from Theorem 6.11.28 that y′ = y″. This completes the proof of the theorem. ∎

6.14.3. Exercise. Show that every Hilbert space X is reflexive (refer to Definition 6.9.8).

6.14.4. Exercise. Two normed linear spaces over the same field are said to be congruent if they are isomorphic (see Definition 3.4.76) and isometric (see Definition 5.9.16). Let X be a Hilbert space. Show that X is congruent to X*.

6.15. SOME APPLICATIONS

We now consider two applications of some of the material of the present chapter. This section consists of three parts. In the first of these we consider the problem of approximating elements in a Hilbert space by elements in a finite-dimensional subspace. In the second part we briefly consider random variables, while in the third part we concern ourselves with the estimation of random variables.

A. Approximation of Elements in Hilbert Space
(Normal Equations)

In many applications it is necessary to approximate functions by simpler ones. This problem can often be implemented by approximating elements from an appropriate Hilbert space by elements belonging to a suitable linear subspace. In other words, we need to consider the problem of approximating a vector x in a Hilbert space X by a vector $y_0$ in a linear subspace Y of X.

Let $y_i \in X$ for i = 1, ..., n, and let $Y = V(\{y_i\})$ denote the linear subspace of X generated by $\{y_1, \ldots, y_n\}$. Since Y is finite dimensional, it is closed. Now for any fixed x ∈ X we wish to find that element of Y which minimizes $\|x - y\|$ for all y ∈ Y. If $y_0 \in Y$ is that element, then we say that $y_0$ approximates x. We call $(x - y_0)$ the error vector and $\|x - y_0\|$ the error.

Since any vector in Y can be expressed as a linear combination $y = \alpha_1 y_1 + \cdots + \alpha_n y_n$, our problem is reduced to finding the set of $\alpha_i$, i = 1, ..., n, for which the error $\|x - \alpha_1 y_1 - \cdots - \alpha_n y_n\|$ is minimized. But in view of the classical projection theorem (Theorem 6.12.12), the $y_0 \in Y$ which minimizes the error is unique and, moreover, $(x - y_0) \perp y_i$, i = 1, ..., n. From this we obtain the n simultaneous linear equations
$$G^T(y_1, \ldots, y_n)\begin{bmatrix} \alpha_1 \\ \vdots \\ \alpha_n \end{bmatrix} = \begin{bmatrix} (x, y_1) \\ \vdots \\ (x, y_n) \end{bmatrix}, \qquad (6.15.1)$$
where in Eq. (6.15.1) $G^T(y_1, \ldots, y_n)$ is the transpose of the matrix
$$G(y_1, \ldots, y_n) = \begin{bmatrix} (y_1, y_1) & \cdots & (y_1, y_n) \\ (y_2, y_1) & \cdots & (y_2, y_n) \\ \vdots & & \vdots \\ (y_n, y_1) & \cdots & (y_n, y_n) \end{bmatrix}. \qquad (6.15.2)$$
The matrix (6.15.2) is called the Gram matrix of $y_1, \ldots, y_n$. The determinant of (6.15.2) is called the Gram determinant and is denoted by $\Delta(y_1, \ldots, y_n)$. The equations (6.15.1) are called the normal equations. It is clear that in a real Hilbert space $G(y_1, \ldots, y_n) = G^T(y_1, \ldots, y_n)$, and that in a complex Hilbert space $G(y_1, \ldots, y_n) = \overline{G^T(y_1, \ldots, y_n)}$.

In order to approximate x ∈ X by $y_0 \in Y$ we only need to solve Eq. (6.15.1) for the $\alpha_i$, i = 1, ..., n. The next result gives conditions under which Eq. (6.15.1) possesses a unique solution for the $\alpha_i$.
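The following sketch (ours, assuming NumPy; the vectors chosen are arbitrary illustrations) forms the Gram matrix (6.15.2) and solves the normal equations (6.15.1) in $R^4$, then verifies that the error vector is orthogonal to each $y_i$, as the projection theorem requires.

```python
import numpy as np

# Basis vectors y_1, y_2 (columns of Y) and the vector x to approximate.
Y = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 2.0]])
x = np.array([1.0, 2.0, 3.0, 4.0])

G = Y.T @ Y                       # Gram matrix: G[i, j] = (y_i, y_j), real case
rhs = Y.T @ x                     # right-hand side: (x, y_i)
alpha = np.linalg.solve(G, rhs)   # normal equations (6.15.1)

y0 = Y @ alpha                    # best approximation of x in V({y_1, y_2})
print("alpha       =", alpha)
print("error       =", np.linalg.norm(x - y0))
print("(x-y0, y_i) =", Y.T @ (x - y0))   # ~0: error vector orthogonal to Y
```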

6.15.3. Theorem. A set of elements $\{y_1, \ldots, y_n\}$ of a Hilbert space X is linearly independent if and only if the Gram determinant $\Delta(y_1, \ldots, y_n) \neq 0$.

Proof. We prove this result by proving the equivalent statement $\Delta(y_1, \ldots, y_n) = 0$ if and only if the vectors $\{y_1, \ldots, y_n\}$ are linearly dependent.

Assume that $\{y_1, \ldots, y_n\}$ is a set of linearly dependent vectors in X. Then there exists a set of scalars $\{\alpha_1, \ldots, \alpha_n\}$, not all zero, such that
$$\alpha_1 y_1 + \cdots + \alpha_n y_n = 0. \qquad (6.15.4)$$
Taking the inner product of Eq. (6.15.4) with the vectors $\{y_1, \ldots, y_n\}$ yields the n linear equations
$$\begin{aligned} \alpha_1(y_1, y_1) + \cdots + \alpha_n(y_1, y_n) &= 0 \\ \vdots \qquad\qquad & \\ \alpha_1(y_n, y_1) + \cdots + \alpha_n(y_n, y_n) &= 0. \end{aligned} \qquad (6.15.5)$$
Taking the $\{\alpha_1, \ldots, \alpha_n\}$ as unknowns, we see that for a non-trivial solution $(\alpha_1, \ldots, \alpha_n)$ to exist we must have $\Delta(y_1, \ldots, y_n) = 0$.

Conversely, assume that $\Delta(y_1, \ldots, y_n) = 0$. Then a non-trivial solution $(\alpha_1, \ldots, \alpha_n)$ exists for Eq. (6.15.5). After rewriting Eq. (6.15.5) as
$$\left(\sum_{i=1}^{n} \alpha_i y_i,\ y_j\right) = 0, \qquad j = 1, \ldots, n,$$
we obtain
$$\left(\sum_{i=1}^{n} \alpha_i y_i,\ \sum_{j=1}^{n} \alpha_j y_j\right) = \left\|\sum_{i=1}^{n} \alpha_i y_i\right\|^2 = 0,$$
which implies that $\sum_{i=1}^{n} \alpha_i y_i = 0$. Therefore, the set $\{y_1, \ldots, y_n\}$ is linearly dependent. This completes the proof. ∎

The next result establishes an expression for the error $\|x - y_0\|$. The proof of this result follows directly from the classical projection theorem.

6.15.6. Theorem. Let X be a Hilbert space, let x ∈ X, let $\{y_1, \ldots, y_n\}$ be a set of linearly independent vectors in X, let Y be the linear subspace of X generated by $\{y_1, \ldots, y_n\}$, and let $y_0 \in Y$ be such that
$$\|x - y_0\| = \min_{y \in Y} \|x - y\| = \min_{\alpha_i} \|x - \alpha_1 y_1 - \cdots - \alpha_n y_n\|.$$
Then
$$\|x - y_0\|^2 = \frac{\Delta(y_1, \ldots, y_n, x)}{\Delta(y_1, \ldots, y_n)},$$
where
$$\Delta(y_1, \ldots, y_n, x) = \det\begin{bmatrix} (y_1, y_1) & \cdots & (y_1, y_n) & (y_1, x) \\ (y_2, y_1) & \cdots & (y_2, y_n) & (y_2, x) \\ \vdots & & \vdots & \vdots \\ (y_n, y_1) & \cdots & (y_n, y_n) & (y_n, x) \\ (x, y_1) & \cdots & (x, y_n) & (x, x) \end{bmatrix}.$$

6.15.7. Exercise. Prove Theorem 6.15.6.
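Theorem 6.15.6 is easy to check numerically. In the sketch below (ours, assuming NumPy; the random data are illustrative), the squared error obtained by solving the normal equations agrees with the ratio of Gram determinants.

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.standard_normal((5, 3))        # columns are y_1, y_2, y_3 in R^5
x = rng.standard_normal(5)

# Direct minimization via the normal equations (6.15.1).
alpha = np.linalg.solve(Y.T @ Y, Y.T @ x)
err_sq = np.linalg.norm(x - Y @ alpha) ** 2

# Ratio of Gram determinants from Theorem 6.15.6.
M = np.column_stack([Y, x])            # the vectors (y_1, ..., y_n, x)
ratio = np.linalg.det(M.T @ M) / np.linalg.det(Y.T @ Y)

print(err_sq, ratio)                   # the two numbers agree
```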

B. Random Variables

A rigorous development of the theory of probability is based on measure and integration theory. Since knowledge of this theory by the reader has not been assumed, a brief discussion of some essential concepts will now be given.

We begin by introducing some terminology. If Ω is a non-void set, a family of subsets, ℱ, of Ω is called a σ-algebra (or a σ-field) if (i) for all E, F ∈ ℱ we have E ∪ F ∈ ℱ and E − F ∈ ℱ, (ii) for any countable sequence of sets $\{E_n\}$ in ℱ we have $\bigcup_{n=1}^{\infty} E_n \in \mathcal{F}$, and (iii) Ω ∈ ℱ. It readily follows that a σ-algebra is a family of subsets of Ω which is closed under all countable set operations.

A function P: ℱ → R, where ℱ is a σ-algebra, is called a probability measure if (i) $0 \leq P(E) \leq 1$ for all E ∈ ℱ, (ii) $P(\emptyset) = 0$ and $P(\Omega) = 1$, and (iii) for any countable collection of sets $\{E_n\}$ in ℱ such that $E_i \cap E_j = \emptyset$ if $i \neq j$, we have $P\left(\bigcup_{n=1}^{\infty} E_n\right) = \sum_{n=1}^{\infty} P(E_n)$.

A probability space is a triple {Ω, ℱ, P}, where Ω is a non-void set, ℱ is a σ-algebra of subsets of Ω, and P is a probability measure on ℱ. We call elements ω ∈ Ω outcomes (usually thought of as occurring at random), and we call elements E ∈ ℱ events.

A function X: Ω → R is called a random variable if $\{\omega: X(\omega) \leq x\} \in \mathcal{F}$ for all x ∈ R. The set $\{\omega: X(\omega) \leq x\}$ is usually written in shorter form as $\{X \leq x\}$. If X is a random variable, then the function $F_X: R \to R$ defined by $F_X(x) = P\{X \leq x\}$ for x ∈ R is called the distribution function of X. If $X_i$, i = 1, ..., n, are random variables, we define the random vector X as $X = (X_1, \ldots, X_n)^T$. Also, for $x = (x_1, \ldots, x_n)^T \in R^n$, the event $\{X_1 \leq x_1, \ldots, X_n \leq x_n\}$ is defined to be $\{\omega: X_1(\omega) \leq x_1\} \cap \{\omega: X_2(\omega) \leq x_2\} \cap \cdots \cap \{\omega: X_n(\omega) \leq x_n\}$. Furthermore, for a random vector X, the function $F_X: R^n \to R$,

defined by $F_X(x) = P\{X_1 \leq x_1, \ldots, X_n \leq x_n\}$, is called the distribution function of X.

If X is a random variable and g is a function, g: R → R, such that the Stieltjes integral $\int_{-\infty}^{\infty} g(x)\,dF_X(x)$ exists, then the expected value of g(X) is defined to be $E\{g(X)\} = \int_{-\infty}^{\infty} g(x)\,dF_X(x)$. Similarly, if X is a random vector and if g is a function, g: $R^n \to R$, such that $\int_{R^n} g(x)\,dF_X(x)$ exists, then the expected value of g(X) is defined to be $E\{g(X)\} = \int_{R^n} g(x)\,dF_X(x)$. Some of the expected values of primary interest are E(X), the expected value of X; $E(X^2)$, the second moment of X; and $E\{[X - E(X)]^2\}$, the variance of X.

If we let $\mathcal{L}_2$ denote the family of random variables defined on a probability space {Ω, ℱ, P} such that $E(X^2) < \infty$, then this space is a vector space over R with the usual definition of addition and multiplication by a scalar. We say two random variables, $X_1$ and $X_2$, are equal almost surely if $P\{\omega: X_1(\omega) \neq X_2(\omega)\} = 0$. If we let $L_2$ denote the family of equivalence classes of all random variables which are almost surely equal (as in Example 5.5.31), then $\{L_2; (\cdot, \cdot)\}$ is a real Hilbert space, where the inner product is defined by $(X, Y) = E\{XY\}$ for X, Y ∈ $L_2$.

Throughout the remainder of this section, we let {Ω, ℱ, P} denote our underlying probability space, and we assume that all random variables belong to the Hilbert space $L_2$ with inner product $(X, Y) = E\{XY\}$.

C. Estimation of Random Variables

The special class of estimation problems which we consider may be formulated as follows: given a set of random variables $\{Y_1, \ldots, Y_m\}$, find the best estimate of another random variable, X. The sense in which an estimate is "best" will be defined shortly. Here we view the set $\{Y_1, \ldots, Y_m\}$ to be observations and the random variable X as the unknown.

For any mapping f: $R^m \to R$ such that $f(Y_1, \ldots, Y_m) \in L_2$ for all observations $\{Y_1, \ldots, Y_m\}$, we call $\hat{X} = f(Y_1, \ldots, Y_m)$ an estimate of X. If f is linear, we call $\hat{X}$ a linear estimate.

Next, let f be linear; i.e., let f be a linear functional on $R^m$. Then there is a vector $a^T = (\alpha_1, \ldots, \alpha_m) \in R^m$ such that $f(y) = a^T y$ for all $y^T = (\eta_1, \ldots, \eta_m) \in R^m$. Now a linear estimate, $\hat{X} = \alpha_1 Y_1 + \cdots + \alpha_m Y_m$, is called the best linear estimate of X, given $\{Y_1, \ldots, Y_m\}$, if $E\{[X - \alpha_1 Y_1 - \cdots - \alpha_m Y_m]^2\}$ is minimum with respect to $a \in R^m$.
The classical projection theorem (see Theorem 6.12.12) tells us that the best linear estimate of X is the projection of X onto the linear vector space $V(\{Y_1, \ldots, Y_m\})$. Furthermore, Eq. (6.15.1) gives us the explicit form for the $\alpha_i$, i = 1, ..., m. We are now in a position to summarize the above discussion in the following theorem, which is usually called the orthogonality principle.

6.15.8. Theorem. Let X, $Y_1, \ldots, Y_m$ belong to $L_2$. Then $\hat{X} = \alpha_1 Y_1 + \cdots + \alpha_m Y_m$ is the best linear estimate of X if and only if $\{\alpha_1, \ldots, \alpha_m\}$ are such that $E\{[X - \hat{X}]Y_i\} = 0$ for i = 1, ..., m.

We also have the following result.

6.15.9. Corollary. Let X, $Y_1, \ldots, Y_m$ belong to $L_2$. Let $G = [\gamma_{ij}]$, where $\gamma_{ij} = E\{Y_i Y_j\}$, i, j = 1, ..., m, and let $b^T = (\beta_1, \ldots, \beta_m) \in R^m$, where $\beta_i = E\{XY_i\}$ for i = 1, ..., m. If G is non-singular, then $\hat{X} = \alpha_1 Y_1 + \cdots + \alpha_m Y_m$ is the best linear estimate of X if and only if $a^T = b^T G^{-1}$.

6.15.10. Exercise. Prove Theorem 6.15.8 and Corollary 6.15.9.

Let us now consider a specific case.

6.15.11. Example. Let X, $V_1, \ldots, V_m$ be random variables in $L_2$ such that $E\{X\} = E\{V_i\} = E\{XV_i\} = 0$ for i = 1, ..., m, and let $R = [\rho_{ij}]$ be non-singular, where $\rho_{ij} = E\{V_i V_j\}$ for i, j = 1, ..., m. Suppose that the measurements $\{Y_1, \ldots, Y_m\}$ of X are given by $Y_i = X + V_i$ for i = 1, ..., m. Then we have $E\{Y_i Y_j\} = E\{[X + V_i][X + V_j]\} = \sigma_x^2 + \rho_{ij}$ for i, j = 1, ..., m, where $\sigma_x^2 \triangleq E\{X^2\}$. Also, $E\{XY_i\} = E\{X(X + V_i)\} = \sigma_x^2$ for i = 1, ..., m. Thus, $G = [\gamma_{ij}]$, where $\gamma_{ij} = \sigma_x^2 + \rho_{ij}$ for i, j = 1, ..., m; $b^T = (\beta_1, \ldots, \beta_m)$, where $\beta_i = \sigma_x^2$ for i = 1, ..., m; and $a^T = b^T G^{-1}$. ∎

6.15.12. Exercise. In the preceding example, show that if $\rho_{ij} = \sigma_v^2 \delta_{ij}$ for i, j = 1, ..., m, where $\delta_{ij}$ is the Kronecker delta, then
$$\alpha_i = \frac{\sigma_x^2}{m\sigma_x^2 + \sigma_v^2} \quad \text{for } i = 1, \ldots, m.$$
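The closed form asked for in this exercise can be confirmed against Corollary 6.15.9 (a sketch of ours, assuming NumPy): with $G = \sigma_x^2\mathbf{1}\mathbf{1}^T + \sigma_v^2 I$ and $b = \sigma_x^2\mathbf{1}$, every component of $a = G^{-1}b$ equals $\sigma_x^2/(m\sigma_x^2 + \sigma_v^2)$.

```python
import numpy as np

m, sx2, sv2 = 4, 2.0, 0.5               # m measurements, var(X), var(V_i)
G = sx2 * np.ones((m, m)) + sv2 * np.eye(m)   # gamma_ij = sx2 + sv2*delta_ij
b = sx2 * np.ones(m)                          # beta_i = E{X Y_i} = sx2
a = np.linalg.solve(G, b)

print(a)                                # every entry equals the value below
print(sx2 / (m * sx2 + sv2))
```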
The next result provides us with a useful means for finding the best linear estimate of a random variable X, given a set of random variables $\{Y_1, \ldots, Y_k\}$, if we already have the best linear estimate, given $\{Y_1, \ldots, Y_{k-1}\}$.

6.15.13. Theorem. Let k ≥ 2, and let $Y_1, \ldots, Y_k$ be random variables in $L_2$. Let $\mathcal{Y}_j = V(\{Y_1, \ldots, Y_j\})$, the linear vector space generated by the random variables $\{Y_1, \ldots, Y_j\}$, for 1 ≤ j ≤ k. Let $\hat{Y}_k(k-1)$ denote the best linear estimate of $Y_k$, given $\{Y_1, \ldots, Y_{k-1}\}$, and let $\tilde{Y}_k(k-1) = Y_k - \hat{Y}_k(k-1)$. Then $\mathcal{Y}_k = \mathcal{Y}_{k-1} \oplus V(\{\tilde{Y}_k(k-1)\})$.

Proof. By the classical projection theorem (see Theorem 6.12.12), $\tilde{Y}_k(k-1) \perp \mathcal{Y}_{k-1}$. Now for arbitrary $Z \in \mathcal{Y}_k$, we must have $Z = c_1 Y_1 + \cdots + c_{k-1} Y_{k-1} + c_k Y_k$ for some $(c_1, \ldots, c_k)$. We can rewrite this as $Z = Z_1 + Z_2$, where $Z_1 = c_1 Y_1 + \cdots + c_{k-1} Y_{k-1} + c_k \hat{Y}_k(k-1)$ and $Z_2 = c_k \tilde{Y}_k(k-1)$. Since $Z_1 \in \mathcal{Y}_{k-1}$ and $Z_2 \perp \mathcal{Y}_{k-1}$, it follows from Theorem 6.12.12 that $Z_1$ and $Z_2$ are unique. Since $Z_1 \in \mathcal{Y}_{k-1}$ and $Z_2 \in V(\{\tilde{Y}_k(k-1)\})$, the theorem is proved. ∎

We can extend the problem of estimation of (scalar) random variables to random vectors. Let $X_1, \ldots, X_n$ be random variables in $L_2$, and let $X = (X_1, \ldots, X_n)^T$ be a random vector. Let $Y_1, \ldots, Y_m$ be random variables in $L_2$. We call $\hat{X} = (\hat{X}_1, \ldots, \hat{X}_n)^T$ the best linear estimate of X, given $\{Y_1, \ldots, Y_m\}$, if $\hat{X}_i$ is the best linear estimate of $X_i$, given $\{Y_1, \ldots, Y_m\}$, for i = 1, ..., n. Clearly, the orthogonality principle must hold for each $X_i$; i.e., we must have $E\{(X_i - \hat{X}_i)Y_j\} = 0$ for i = 1, ..., n and j = 1, ..., m. In this case $\hat{X}$ can be expressed as $\hat{X} = AY$, where A is an (n × m) matrix of real numbers and $Y = (Y_1, \ldots, Y_m)^T$. Corollary 6.15.9 assumes now the following matrix form.

6.15.14. Theorem. Let $X_1, \ldots, X_n$, $Y_1, \ldots, Y_m$ be random variables in $L_2$. Let $G = [\gamma_{ij}]$, where $\gamma_{ij} = E\{Y_i Y_j\}$ for i, j = 1, ..., m, and let $B = [\beta_{ij}]$, where $\beta_{ij} = E\{X_i Y_j\}$ for i = 1, ..., n and j = 1, ..., m. If G is non-singular, then $\hat{X} = AY$ is the best linear estimate of X, given Y, if and only if $A = BG^{-1}$.

6.15.15. Exercise. Prove Theorem 6.15.14.

We note that B and G in the above theorem can be written in an alternate way. That is, we can say that
$$\hat{X} = E\{XY^T\}[E\{YY^T\}]^{-1}Y \qquad (6.15.16)$$
is the best linear estimate of X. By the expected value of a matrix of random variables, we mean the expected value of each element of the matrix.
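Eq. (6.15.16) also suggests a direct numerical check. The sketch below (ours, assuming NumPy; the system matrices and sample size are illustrative) replaces the expectations $E\{XY^T\}$ and $E\{YY^T\}$ by sample averages and forms the estimate $\hat{X} = AY$.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples = 200_000
X = rng.standard_normal((2, n_samples))        # zero-mean random vector X
noise = 0.1 * rng.standard_normal((3, n_samples))
C = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Y = C @ X + noise                              # observations Y = CX + V

Exy = X @ Y.T / n_samples                      # sample estimate of E{X Y^T}
Eyy = Y @ Y.T / n_samples                      # sample estimate of E{Y Y^T}
A = Exy @ np.linalg.inv(Eyy)                   # Eq. (6.15.16): X_hat = A Y

X_hat = A @ Y
print("mean square error per component:",
      np.mean((X - X_hat) ** 2, axis=1))
```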
In the remainder of this section we apply the preceding development to
dynamic systems.
Let J = {1, 2, ...} denote the set of positive integers. We use the notation {X(k)} to denote a sequence of random vectors; i.e., X(k) is a random vector for each k ∈ J. Let {U(k)} be a sequence of random vectors, $U(k) = [U_1(k), \ldots, U_p(k)]^T$, with the properties
$$E\{U(k)\} = 0 \qquad (6.15.17)$$
and
$$E\{U(k)U^T(j)\} = Q(k)\delta_{jk} \qquad (6.15.18)$$
for all j, k ∈ J, where Q(k) is a symmetric positive definite (p × p) matrix for all k ∈ J. Next, let {V(k)} be a sequence of random vectors, V(k) =

$[V_1(k), \ldots, V_m(k)]^T$, with the properties
$$E\{V(k)\} = 0 \qquad (6.15.19)$$
and
$$E\{V(k)V^T(j)\} = R(k)\delta_{jk} \qquad (6.15.20)$$
for all j, k ∈ J, where R(k) is a symmetric positive definite (m × m) matrix for all k ∈ J.

Now let X(1) be a random vector, $X(1) = [X_1(1), \ldots, X_n(1)]^T$, with the properties
$$E\{X(1)\} = 0 \qquad (6.15.21)$$
and
$$E\{X(1)X^T(1)\} = P(1), \qquad (6.15.22)$$
where P(1) is an (n × n) symmetric positive definite matrix. We assume further that the relationships among the random vectors are such that
$$E\{U(k)V^T(j)\} = 0, \qquad (6.15.23)$$
$$E\{X(1)U^T(k)\} = 0, \qquad (6.15.24)$$
and
$$E\{X(1)V^T(k)\} = 0 \qquad (6.15.25)$$
for all k, j ∈ J.
Next, let A(k) be a real (n × n) matrix for each k ∈ J, let B(k) be a real (n × p) matrix for each k ∈ J, and let C(k) be a real (m × n) matrix for each k ∈ J. We let {X(k)} and {Y(k)} be the sequences of random vectors generated by the difference equations
$$X(k+1) = A(k)X(k) + B(k)U(k) \qquad (6.15.26)$$
and
$$Y(k) = C(k)X(k) + V(k) \qquad (6.15.27)$$
for k = 1, 2, ....
We are now in a position to consider the following estimation problem: given the set of observations {Y(1), ..., Y(k)}, find the best linear estimate of the random vector X(k). We could view the observed random variables as a single random vector, say $\mathcal{Y}^T = [Y^T(1), Y^T(2), \ldots, Y^T(k)]$, and apply Theorem 6.15.14; however, it turns out that a rather elegant and significant algorithm exists for this problem, due to R. E. Kalman, which we consider next.

In the following, we adopt some additional convenient notation. For each k, j ∈ J, we let $\hat{X}(j|k)$ denote the best linear estimate of X(j), given {Y(1), ..., Y(k)}. This notation is valid for j < k and j ≥ k; however, we shall limit our attention to the situation where j ≥ k. In the present context, a recursive algorithm means that $\hat{X}(k+1|k+1)$ is a function only of $\hat{X}(k|k)$ and Y(k+1). The following theorem, which is the last result of this section, provides the desired algorithm explicitly.

6.15.28. Theorem (Kalman). Given the foregoing assumptions for the dynamic system described by Eqs. (6.15.26) and (6.15.27), the best linear estimate of X(k), given {Y(1), ..., Y(k)}, is provided by the following set of difference equations:
$$\hat{X}(k|k) = \hat{X}(k|k-1) + K(k)[Y(k) - C(k)\hat{X}(k|k-1)] \qquad (6.15.29)$$
and
$$\hat{X}(k+1|k) = A(k)\hat{X}(k|k), \qquad (6.15.30)$$
where
$$K(k) = P(k|k-1)C^T(k)[C(k)P(k|k-1)C^T(k) + R(k)]^{-1}, \qquad (6.15.31)$$
$$P(k|k) = [I - K(k)C(k)]P(k|k-1), \qquad (6.15.32)$$
and
$$P(k+1|k) = A(k)P(k|k)A^T(k) + B(k)Q(k)B^T(k) \qquad (6.15.33)$$
for k = 1, 2, ..., with initial conditions
$$\hat{X}(1|0) = 0$$
and
$$P(1|0) = P(1).$$
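Before turning to the proof, we note that Eqs. (6.15.29) through (6.15.33) translate line for line into a recursive program. The following sketch is ours, assuming NumPy; the particular matrices A, B, C, Q, R, the horizon, and the random seed are illustrative choices, not part of the theorem.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative time-invariant system matrices (the theorem allows A(k), etc.).
A = np.array([[1.0, 1.0], [0.0, 1.0]])   # n x n
B = np.array([[0.5], [1.0]])             # n x p
C = np.array([[1.0, 0.0]])               # m x n
Q = np.array([[0.01]])                   # E{U U^T}, p x p
R = np.array([[0.25]])                   # E{V V^T}, m x m
P = np.eye(2)                            # P(1|0) = P(1)

x = rng.standard_normal(2)               # true initial state X(1)
x_hat = np.zeros(2)                      # X_hat(1|0) = 0

for k in range(50):
    y = C @ x + rng.multivariate_normal(np.zeros(1), R)   # Eq. (6.15.27)
    # Measurement update, Eqs. (6.15.31), (6.15.29), (6.15.32):
    K = P @ C.T @ np.linalg.inv(C @ P @ C.T + R)
    x_hat = x_hat + K @ (y - C @ x_hat)
    P = (np.eye(2) - K @ C) @ P
    # Time update, Eqs. (6.15.30), (6.15.33):
    x_hat = A @ x_hat
    P = A @ P @ A.T + B @ Q @ B.T
    # Propagate the true state, Eq. (6.15.26):
    x = A @ x + B @ rng.multivariate_normal(np.zeros(1), Q)

print("final estimation error:", x - x_hat)
```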
Proof. Assume that $\hat{X}(k|k-1)$ is known for k ∈ J. We may interpret $\hat{X}(1|0)$ as the best linear estimate of X(1), given no observations. We wish to find $\hat{X}(k|k)$ and $\hat{X}(k+1|k)$. It follows from Theorem 6.15.13 (extended to the case of random vectors) that there is a matrix K(k) such that $\hat{X}(k|k) = \hat{X}(k|k-1) + K(k)\tilde{Y}(k|k-1)$, where $\tilde{Y}(k|k-1) = Y(k) - \hat{Y}(k|k-1)$, and $\hat{Y}(k|k-1)$ is the best linear estimate of Y(k), given {Y(1), ..., Y(k−1)}. It follows immediately from Eqs. (6.15.23) and (6.15.27) and the orthogonality principle that $\hat{Y}(k|k-1) = C(k)\hat{X}(k|k-1)$. Thus, we have shown that Eq. (6.15.29) must be true. In order to determine K(k), let $\tilde{X}(k|k-1) = X(k) - \hat{X}(k|k-1)$. Then it follows from Eqs. (6.15.26) and (6.15.29) that
$$\tilde{X}(k|k) = \tilde{X}(k|k-1) - K(k)[C(k)\tilde{X}(k|k-1) + V(k)].$$
To satisfy the orthogonality principle, we must have $E\{\tilde{X}(k|k)Y^T(j)\} = 0$ for j = 1, ..., k. We see that this is satisfied for any K(k) for j = 1, ..., k − 1. In order to satisfy $E\{\tilde{X}(k|k)Y^T(k)\} = 0$, K(k) must satisfy
$$0 = E\{\tilde{X}(k|k-1)Y^T(k)\} - K(k)[C(k)E\{\tilde{X}(k|k-1)Y^T(k)\} + E\{V(k)Y^T(k)\}]. \qquad (6.15.34)$$
Let us first consider the term
$$E\{\tilde{X}(k|k-1)Y^T(k)\} = E\{\tilde{X}(k|k-1)X^T(k)C^T(k) + \tilde{X}(k|k-1)V^T(k)\}. \qquad (6.15.35)$$
We observe that X(k), the solution to the difference equation (6.15.26) at (time) k, is a linear combination of X(1) and U(1), ..., U(k−1). In view of Eqs. (6.15.23) and (6.15.25) it follows that $E\{X(j)V^T(k)\} = 0$ for all k, j ∈ J. Hence, $E\{\tilde{X}(k|k-1)V^T(k)\} = 0$, since $\tilde{X}(k|k-1)$ is a linear combination of X(k) and Y(1), ..., Y(k−1).

Next, we consider the term
$$E\{\tilde{X}(k|k-1)X^T(k)\} = E\{\tilde{X}(k|k-1)[X^T(k) - \hat{X}^T(k|k-1) + \hat{X}^T(k|k-1)]\} = E\{\tilde{X}(k|k-1)[\tilde{X}^T(k|k-1) + \hat{X}^T(k|k-1)]\} = P(k|k-1), \qquad (6.15.36)$$
where
$$P(k|k-1) \triangleq E\{\tilde{X}(k|k-1)\tilde{X}^T(k|k-1)\}$$
and $E\{\tilde{X}(k|k-1)\hat{X}^T(k|k-1)\} = 0$, since $\hat{X}(k|k-1)$ is a linear combination of {Y(1), ..., Y(k−1)}.

Now consider
$$E\{V(k)Y^T(k)\} = E\{V(k)[X^T(k)C^T(k) + V^T(k)]\} = R(k). \qquad (6.15.37)$$
Using Eqs. (6.15.35), (6.15.36), and (6.15.37), Eq. (6.15.34) becomes
$$0 = P(k|k-1)C^T(k) - K(k)[C(k)P(k|k-1)C^T(k) + R(k)]. \qquad (6.15.38)$$
Solving for K(k), we obtain Eq. (6.15.31).

To obtain Eq. (6.15.32), let $\tilde{X}(k|k) = X(k) - \hat{X}(k|k)$ and $P(k|k) = E\{\tilde{X}(k|k)\tilde{X}^T(k|k)\}$. In view of Eqs. (6.15.27) and (6.15.29) we have
$$\tilde{X}(k|k) = \tilde{X}(k|k-1) - K(k)[C(k)\tilde{X}(k|k-1) + V(k)] = [I - K(k)C(k)]\tilde{X}(k|k-1) - K(k)V(k).$$
From this it follows that
$$P(k|k) = [I - K(k)C(k)]P(k|k-1) - [I - K(k)C(k)]P(k|k-1)C^T(k)K^T(k) + K(k)R(k)K^T(k) = [I - K(k)C(k)]P(k|k-1) - \{P(k|k-1)C^T(k) - K(k)[C(k)P(k|k-1)C^T(k) + R(k)]\}K^T(k).$$
Using Eq. (6.15.38), it follows that Eq. (6.15.32) must be true.

To show that $\hat{X}(k+1|k)$ is given by Eq. (6.15.30), we simply show that the orthogonality principle is satisfied. That is,
$$E\{[X(k+1) - A(k)\hat{X}(k|k)]Y^T(j)\} = E\{A(k)[X(k) - \hat{X}(k|k)]Y^T(j)\} + E\{B(k)U(k)Y^T(j)\} = 0$$
for j = 1, ..., k.

Finally, to verify Eq. (6.15.33), we have from Eqs. (6.15.26) and (6.15.30)
$$\tilde{X}(k+1|k) = A(k)\tilde{X}(k|k) + B(k)U(k).$$
From this, Eq. (6.15.33) follows immediately. We note that $\hat{X}(1|0) = 0$ and P(1|0) = P(1). This completes the proof. ∎

6.16. NOTES AND REFERENCES

The material of the present chapter as well as that of the next chapter constitutes part of what usually goes under the heading of functional analysis. Thus, these two chapters should be viewed as a whole rather than as two separate parts.

There are numerous excellent sources dealing with Hilbert and Banach spaces. We cite a representative sample of these which the reader should consult for further study. References [6.6]–[6.8], [6.10], and [6.12] are at an introductory or intermediate level, whereas references [6.2]–[6.4] and [6.13] are at a more advanced level. The books by Dunford and Schwartz and by Hille and Phillips are standard and encyclopedic references on functional analysis; the text by Yosida constitutes a concise treatment of this subject, while the monograph by Halmos contains a compact exposition on Hilbert space. The book by Taylor is a standard reference on functional analysis at the intermediate level. The texts by Kantorovich and Akilov, by Kolmogorov and Fomin, and by Liusternik and Sobolev are very readable presentations of this subject. The book by Naylor and Sell, which presents a very nice introduction to functional analysis, includes some interesting examples. For references with applications of functional analysis to specific areas, including those in Section 6.15, see, e.g., Byron and Fuller [6.1], Kalman et al. [6.5], Luenberger [6.9], and Porter [6.11].

REFERENCES

[6.1] F. W. BYRON and R. W. FULLER, Mathematics of Classical and Quantum Physics. Vols. I, II. Reading, Mass.: Addison-Wesley Publishing Co., Inc., 1969 and 1970.*
[6.2] N. DUNFORD and J. SCHWARTZ, Linear Operators. Parts I and II. New York: Interscience Publishers, 1958 and 1964.
[6.3] P. R. HALMOS, Introduction to Hilbert Space. New York: Chelsea Publishing Company, 1957.
[6.4] E. HILLE and R. S. PHILLIPS, Functional Analysis and Semi-Groups. Providence, R.I.: American Mathematical Society, 1957.
[6.5] R. E. KALMAN, P. L. FALB, and M. A. ARBIB, Topics in Mathematical System Theory. New York: McGraw-Hill Book Company, 1969.
[6.6] L. V. KANTOROVICH and G. P. AKILOV, Functional Analysis in Normed Spaces. New York: The Macmillan Company, 1964.
[6.7] A. N. KOLMOGOROV and S. V. FOMIN, Elements of the Theory of Functions and Functional Analysis. Vols. I, II. Albany, N.Y.: Graylock Press, 1957 and 1961.
[6.8] L. A. LIUSTERNIK and V. J. SOBOLEV, Elements of Functional Analysis. New York: Frederick Ungar Publishing Company, 1961.
[6.9] D. G. LUENBERGER, Optimization by Vector Space Methods. New York: John Wiley & Sons, Inc., 1969.
[6.10] A. W. NAYLOR and G. R. SELL, Linear Operator Theory. New York: Holt, Rinehart and Winston, 1971.
[6.11] W. A. PORTER, Modern Foundations of Systems Engineering. New York: The Macmillan Company, 1966.
[6.12] A. E. TAYLOR, Introduction to Functional Analysis. New York: John Wiley & Sons, Inc., 1958.
[6.13] K. YOSIDA, Functional Analysis. Berlin: Springer-Verlag, 1965.

*Reprinted in one volume by Dover Publications, Inc., New York, 1992.
7

LINEAR OPERATORS

In the present chapter we concern ourselves with linear operators defined on Banach and Hilbert spaces, and we study some of the important properties of such operators. We also consider selected applications in this chapter.

This chapter consists of ten parts. Throughout, we consider primarily bounded linear operators, which we introduce in the first section. In the second section we look at inverses of linear transformations, in section three we introduce conjugate and adjoint operators, and in section four we study hermitian operators. In the fifth section we present additional special linear transformations, including normal operators, projections, unitary operators, and isometric operators. The spectrum of an operator is considered in the sixth section, while completely continuous operators are introduced in the seventh section. In the eighth section we present one of the main results of the present chapter, the spectral theorem for completely continuous normal operators. Finally, in section nine we study differentiation of operators (which need not be linear) defined on Banach and Hilbert spaces.

Section ten, which consists of three subsections, is devoted to selected topics in applications. Items touched upon include applications to integral equations, an example from optimal control, and minimization of functionals (method of steepest descent). The chapter is concluded with a brief discussion of pertinent references in the eleventh section.
7.1. BOUNDED LINEAR TRANSFORMATIONS

Throughout this section X and Y denote vector spaces over the same field F, where F is either R (the real numbers) or C (the complex numbers).

We begin by pointing to several concepts considered previously. Recall from Chapter 1 that a transformation or operator T is a mapping of a subset 𝔇(T) of X into Y. Unless specified to the contrary, we will assume that X = 𝔇(T). Since a transformation is a mapping we distinguish, as in Chapter 1, between operators which are onto or surjective, one-to-one or injective, and one-to-one and onto or bijective. If T is a transformation of X into Y we write T: X → Y. If x ∈ X we call y = T(x) the image of x in Y under T, and if V ⊂ X we define the image of set V in Y under T as the set
$$T(V) = \{y \in Y: y = T(v),\ v \in V \subset X\}.$$
On the other hand, if W ⊂ Y, then the inverse image of set W under T is the set
$$T^{-1}(W) = \{x \in X: y = T(x) \in W \subset Y\}.$$
We define the range of T, denoted ℜ(T), by
$$\mathfrak{R}(T) = \{y \in Y: y = T(x),\ x \in X\};$$
i.e., ℜ(T) = T(X). Recall that if a transformation T of X into Y is injective, then the inverse of T, denoted $T^{-1}$, exists (see Definition 1.2.9). Thus, if y = T(x) and if T is injective, then $x = T^{-1}(y)$.

In Definition 3.4.1 we defined a linear operator (or a linear transformation) as a mapping of X into Y having the property that
(i) T(x + y) = T(x) + T(y) for all x, y ∈ X; and
(ii) T(αx) = αT(x) for all α ∈ F and all x ∈ X.
As in Chapter 3, we denote the class of all linear transformations from X into Y by L(X, Y). Also, in the case of linear transformations we write Tx in place of T(x).
Of great importance are bounded linear operators, which turn out to
be also continuous. We have the following definition.

7.1.1. Definition. Let X and Y be normed linear spaces. A linear operator T: X → Y is said to be bounded if there is a real number γ > 0 such that
$$\|Tx\|_Y \leq \gamma\|x\|_X$$
for all x ∈ X.

The notation $\|x\|_X$ indicates that the norm on X is used, while the notation $\|Tx\|_Y$ indicates that the norm on Y is employed. However, since the norms of the various spaces are usually understood, it is customary to drop the subscripts and simply write $\|x\|$ and $\|Tx\|$.


Our first result allows us to characterize a bounded linear operator in an equivalent way.

7.1.2. Theorem. Let T ∈ L(X, Y). Then T is bounded if and only if T maps the unit sphere into a bounded subset of Y.

7.1.3. Exercise. Prove Theorem 7.1.2.

In Chapter 5 we introduced continuous functions (see Definition 5.7.1). The definition of continuity of an operator in the setting of normed linear spaces can now be rephrased as follows.

7.1.4. Definition. An operator T: X → Y (not necessarily linear) is said to be continuous at a point $x_0 \in X$ if for every ε > 0 there is a δ > 0 such that
$$\|T(x) - T(x_0)\| < \epsilon$$
whenever $\|x - x_0\| < \delta$.

The reader can readily prove the next result.

7.1.5. Theorem. Let T ∈ L(X, Y). If T is continuous at a single point $x_0 \in X$, then it is continuous at all x ∈ X.

7.1.6. Exercise. Prove Theorem 7.1.5.

In this chapter we will mainly concern ourselves with bounded linear operators. Our next result shows that in the case of linear operators boundedness and continuity are equivalent.

7.1.7. Theorem. Let T ∈ L(X, Y). Then T is continuous if and only if it is bounded.

Proof. Assume that T is bounded, and let γ be such that $\|Tx\| \leq \gamma\|x\|$ for all x ∈ X. Now consider a sequence $\{x_n\}$ in X such that $x_n \to 0$ as $n \to \infty$. Then $\|Tx_n\| \leq \gamma\|x_n\| \to 0$ as $n \to \infty$, and hence T is continuous at the point 0 ∈ X. From Theorem 7.1.5 it follows that T is continuous at all points x ∈ X.

Conversely, assume that T is continuous at x = 0, and hence at all x ∈ X. Since T0 = 0 we can find a δ > 0 such that $\|Tx\| \leq 1$ whenever $\|x\| \leq \delta$. For any x ≠ 0 we have $\|(\delta x)/\|x\|\| = \delta$, and hence
$$\|Tx\| = \left\|T\left(\frac{\|x\|}{\delta} \cdot \frac{\delta x}{\|x\|}\right)\right\| = \frac{\|x\|}{\delta}\left\|T\left(\frac{\delta x}{\|x\|}\right)\right\| \leq \frac{\|x\|}{\delta}.$$
If we let γ = 1/δ, then $\|Tx\| \leq \gamma\|x\|$, and T is bounded. ∎

Now let S, T ∈ L(X, Y). In Eq. (3.4.42) we defined the sum of linear operators (S + T) by
$$(S + T)x = Sx + Tx, \quad x \in X,$$
and in Eq. (3.4.43) we defined multiplication of T by a scalar α ∈ F as
$$(\alpha T)x = \alpha(Tx), \quad x \in X,\ \alpha \in F.$$
We also recall (see Eq. (3.4.44)) that the zero transformation, 0, of X into Y is defined by 0x = 0 for all x ∈ X, and that the negative of a transformation T, denoted by −T, is defined by (−T)x = −Tx for all x ∈ X (see Eq. (3.4.45)). Furthermore, the identity transformation I ∈ L(X, X) is defined by Ix = x for all x ∈ X (see Eq. (3.4.56)). Referring to Theorem 3.4.47, we recall that L(X, Y) is a linear space over F.

Next, let X, Y, Z be vector spaces over F, and let S ∈ L(Y, Z) and T ∈ L(X, Y). The product of S and T, denoted by ST, was defined in Eq. (3.4.50) as the mapping of X into Z such that
$$(ST)x = S(Tx), \quad x \in X.$$
It can readily be shown that ST ∈ L(X, Z). Furthermore, if X = Y = Z, then L(X, X) is an associative algebra with identity I (see Theorem 3.4.59). Note, however, that the algebra L(X, X) is, in general, not commutative because, in general,
$$ST \neq TS.$$
In the following, we will use the notation B(X, Y) to denote the set of all bounded linear transformations from X into Y; i.e.,
$$B(X, Y) \triangleq \{T \in L(X, Y): T \text{ is bounded}\}. \qquad (7.1.8)$$
The reader should have no difficulty in proving the next theorem.

7.1.9. Theorem. The space B(X, Y) is a linear space over F.

7.1.10. Exercise. Prove Theorem 7.1.9.

Next, we wish to define a norm on B(X, Y).

7.1.11. Definition. Let T ∈ B(X, Y). The norm of T, denoted $\|T\|$, is defined by
$$\|T\| = \inf\{\gamma: \|Tx\| \leq \gamma\|x\| \text{ for all } x \in X\}. \qquad (7.1.12)$$

Note that $\|T\|$ is finite and that
$$\|Tx\| \leq \|T\| \cdot \|x\|$$
for all x ∈ X. In proving that the function $\|\cdot\|$: B(X, Y) → R satisfies all the axioms of a norm (see Definition 6.1.1), we need the following result.

7.1.13. Theorem. Let T ∈ B(X, Y). Then $\|T\|$ can equivalently be expressed in any one of the following forms:
(i) $\|T\| = \inf\{\gamma: \|Tx\| \leq \gamma\|x\| \text{ for all } x \in X\}$;
(ii) $\|T\| = \sup\{\|Tx\|/\|x\|: x \in X,\ x \neq 0\}$;
(iii) $\|T\| = \sup\{\|Tx\|: x \in X,\ \|x\| \leq 1\}$; and
(iv) $\|T\| = \sup\{\|Tx\|: x \in X,\ \|x\| = 1\}$.

7.1.14. Exercise. Prove Theorem 7.1.13.

We now show that the function $\|\cdot\|$ defined in Eq. (7.1.12) satisfies all the axioms of a norm.

7.1.15. Theorem. The linear space B(X, Y) is a normed linear space (with norm defined by Eq. (7.1.12)); i.e.,
(i) for every T ∈ B(X, Y), $\|T\| \geq 0$, and $\|T\| = 0$ if and only if T = 0;
(ii) $\|S + T\| \leq \|S\| + \|T\|$ for every S, T ∈ B(X, Y); and
(iii) $\|\alpha T\| = |\alpha|\,\|T\|$ for every T ∈ B(X, Y) and for every α ∈ F.

Proof. The proof of part (i) is obvious. To verify (ii) we note that
$$\|(S + T)x\| = \|Sx + Tx\| \leq \|Sx\| + \|Tx\| \leq (\|S\| + \|T\|)\|x\|.$$
If x = 0, then we are finished. If x ≠ 0, then
$$\|S + T\| = \sup_{x \neq 0} \frac{\|(S + T)x\|}{\|x\|} \leq \|S\| + \|T\|.$$
We leave the proof of part (iii), which is similar, as an exercise. ∎

For the space B(X, X) we have the following results.

7.1.16. Theorem. If S, T ∈ B(X, X), then ST ∈ B(X, X) and
$$\|ST\| \leq \|S\| \cdot \|T\|.$$
Proof. For each x ∈ X we have
$$\|(ST)x\| = \|S(Tx)\| \leq \|S\| \cdot \|Tx\| \leq \|S\| \cdot \|T\| \cdot \|x\|,$$
which shows that ST ∈ B(X, X). If x ≠ 0, then
$$\|ST\| = \sup_{x \neq 0} \frac{\|(ST)x\|}{\|x\|} \leq \|S\| \cdot \|T\|,$$
completing the proof. ∎

7.1.17. Theorem. Let I denote the identity operator on X. Then I ∈ B(X, X), and $\|I\| = 1$.

7.1.18. Exercise. Prove Theorem 7.1.17.

We now consider some specific cases.

7.1.19. Example. Let $X = l_2$, the Banach space of Example 6.1.6. For $x = (\xi_1, \xi_2, \ldots) \in X$, let us define T: X → X by
$$Tx = (0, \xi_2, \xi_3, \ldots).$$
The reader can readily verify that T is a linear operator which is neither injective nor surjective. We see that
$$\|Tx\|^2 = \sum_{i=2}^{\infty} |\xi_i|^2 \leq \sum_{i=1}^{\infty} |\xi_i|^2 = \|x\|^2.$$
Thus, T is a bounded linear operator. To compute $\|T\|$ we observe that $\|Tx\| \leq \|x\|$, which implies that $\|T\| \leq 1$. Choosing, in particular, x = (0, 1, 0, ...) ∈ X, we have $\|Tx\| = \|x\| = 1$ and
$$1 = \|Tx\| \leq \|T\| \cdot \|x\| = \|T\|.$$
Thus, it must be that $\|T\| = 1$. ∎

7.1.20. Example. Let X = C[a, b], and let $\|\cdot\|_\infty$ be the norm on C[a, b] defined in Example 6.1.9. Let k: [a, b] × [a, b] → R be a real-valued function, continuous on the square a ≤ s ≤ b, a ≤ t ≤ b. Define the operator T: X → X by
$$[Tx](s) = \int_a^b k(s, t)x(t)\,dt$$
for x ∈ X. Then T ∈ L(X, X) (see Example 3.4.6). Also,
$$\|Tx\| = \sup_{a \leq s \leq b}\left|\int_a^b k(s, t)x(t)\,dt\right| \leq \left[\sup_{a \leq s \leq b} \int_a^b |k(s, t)|\,dt\right] \cdot \left[\sup_{a \leq t \leq b} |x(t)|\right] = \gamma_0 \cdot \|x\|,$$
where $\gamma_0 = \sup_{a \leq s \leq b} \int_a^b |k(s, t)|\,dt$. This shows that T ∈ B(X, X) and that $\|T\| \leq \gamma_0$. It can, in fact, be shown that $\|T\| = \gamma_0$. ∎
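The bound $\gamma_0 = \sup_s \int_a^b |k(s, t)|\,dt$ is straightforward to approximate by quadrature. The sketch below (ours, assuming NumPy and a simple Riemann sum) estimates $\gamma_0$ for the illustrative kernel k(s, t) = st on [0, 1] × [0, 1], for which $\gamma_0 = 1/2$.

```python
import numpy as np

a, b, N = 0.0, 1.0, 2001
s = np.linspace(a, b, N)
t = np.linspace(a, b, N)
K = np.outer(s, t)                 # kernel k(s, t) = s * t on the square

# gamma_0 = sup_s integral_a^b |k(s, t)| dt, approximated by a Riemann sum.
dt = t[1] - t[0]
gamma0 = np.max(np.abs(K).sum(axis=1)) * dt
print(gamma0)                      # ~0.5 = sup_s (s / 2), attained at s = 1
```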

For norms of linear operators on finite-dimensional spaces, we have the following important result.

7.1.21. Theorem. Let T ∈ L(X, Y). If X is finite dimensional, then T is continuous.

Proof. Let $\{x_1, \ldots, x_n\}$ be a basis for X. For each x ∈ X, there is a unique set of scalars $\{\xi_1, \ldots, \xi_n\}$ such that $x = \xi_1 x_1 + \cdots + \xi_n x_n$. If we define the linear functionals $f_i: X \to F$ by $f_i(x) = \xi_i$, i = 1, ..., n, then by Theorem 6.6.1 we know that each $f_i$ is a continuous linear functional. Thus, there exists a set of real numbers $\{\gamma_1, \ldots, \gamma_n\}$ such that $|f_i(x)| \leq \gamma_i\|x\|$ for i = 1, ..., n. Now
$$Tx = \xi_1 Tx_1 + \cdots + \xi_n Tx_n.$$
If we let $\beta = \max_i \|Tx_i\|$ and $\gamma_0 = \max_i \gamma_i$, then it follows that $\|Tx\| \leq n\beta\gamma_0\|x\|$. Thus, T is bounded and hence continuous. ∎

Next, we concern ourselves with various norms of linear transformations on the finite-dimensional space $R^n$.

7.1.22. Example. Let $X = R^n$, and let $\{u_1, \ldots, u_n\}$ be the natural basis for $R^n$ (see Example 4.1.15). For any A ∈ L(X, X) there is an n × n matrix, say $A = [a_{ij}]$ (see Definition 4.2.7), which represents A with respect to $\{u_1, \ldots, u_n\}$. Thus, if Ax = y, where $x = (\xi_1, \ldots, \xi_n) \in X$ and $y = (\eta_1, \ldots, \eta_n) \in X$, we may represent this transformation by y = Ax (see Eq. (4.2.17)). In Example 6.1.5 we defined several norms on $R^n$, namely
$$\|x\|_p = [|\xi_1|^p + \cdots + |\xi_n|^p]^{1/p}, \quad 1 \leq p < \infty,$$
and
$$\|x\|_\infty = \max_i \{|\xi_i|\}.$$
It turns out that different norms on $R^n$ give rise to different norms of the transformation A. (In this case we speak of the norm of A induced by the norm defined on $R^n$.) In the present example we derive expressions for the norm of A in terms of the elements of matrix A when the norm on $R^n$ is given by $\|\cdot\|_1$, $\|\cdot\|_2$, and $\|\cdot\|_\infty$.

(i) Let p = 1; i.e., $\|x\| = |\xi_1| + \cdots + |\xi_n|$. Then $\|A\| = \max_{1 \leq j \leq n} \sum_{i=1}^{n} |a_{ij}|$. To prove this, we see that
$$\|Ax\| = \sum_{i=1}^{n}\left|\sum_{j=1}^{n} a_{ij}\xi_j\right| \leq \sum_{i=1}^{n}\sum_{j=1}^{n} |a_{ij}|\,|\xi_j| = \sum_{j=1}^{n} |\xi_j| \sum_{i=1}^{n} |a_{ij}| \leq \left[\sum_{j=1}^{n} |\xi_j|\right] \cdot \max_{1 \leq j \leq n}\left\{\sum_{i=1}^{n} |a_{ij}|\right\} = \max_{1 \leq j \leq n}\left\{\sum_{i=1}^{n} |a_{ij}|\right\} \cdot \|x\|.$$
Let $j_0$ be such that $\sum_{i=1}^{n} |a_{ij_0}| = \max_{1 \leq j \leq n} \sum_{i=1}^{n} |a_{ij}| \triangleq \gamma_0$. Then $\|A\| \leq \gamma_0$. To show that equality must hold, let $x_0 = (\xi_1, \ldots, \xi_n) \in R^n$ be given by $\xi_{j_0} = 1$, and $\xi_i = 0$ if $i \neq j_0$. Then
$$\|Ax_0\| = \sum_{i=1}^{n} |a_{ij_0}| \quad \text{and} \quad \|x_0\| = 1.$$
From this it follows that $\|A\| \geq \gamma_0$, and so we conclude that $\|A\| = \gamma_0$.

(ii) Let p = 2; i.e., $\|x\| = (|\xi_1|^2 + \cdots + |\xi_n|^2)^{1/2}$. Let $A^T$ denote the transpose of A (see Eq. (4.2.9)), and let $\{\lambda_1, \ldots, \lambda_k\}$ be the distinct eigenvalues of the matrix $A^T A$ (see Definition 4.5.6). Let $\lambda_0 = \max_j \lambda_j$. Then $\|A\| = \sqrt{\lambda_0}$.
To prove this we note first that by Theorem 4.10.28 the eigenvalues of $A^T A$ are all real. We show first that they are, in fact, non-negative. Let $\{x_1, \ldots, x_k\}$ be eigenvectors of $A^T A$ corresponding to the eigenvalues $\{\lambda_1, \ldots, \lambda_k\}$, respectively. Then for each i = 1, ..., k we have $A^T A x_i = \lambda_i x_i$. Thus, $x_i^T A^T A x_i = \lambda_i x_i^T x_i$. From this it follows that
$$\lambda_i = \frac{x_i^T A^T A x_i}{x_i^T x_i} \geq 0.$$
For arbitrary x ∈ X it follows from Theorem 4.10.44 that $x = \bar{x}_1 + \cdots + \bar{x}_k$, where $A^T A \bar{x}_i = \lambda_i \bar{x}_i$, i = 1, ..., k. Hence, $A^T A x = \lambda_1 \bar{x}_1 + \cdots + \lambda_k \bar{x}_k$. By Theorem 4.9.41 we have $\|Ax\|^2 = x^T A^T A x$. Thus,
$$\|Ax\|^2 = x^T A^T A x = \sum_{i=1}^{k} \lambda_i \|\bar{x}_i\|^2 \leq \lambda_0 \sum_{i=1}^{k} \|\bar{x}_i\|^2 = \lambda_0 \|x\|^2,$$
from which it follows that $\|A\| \leq \sqrt{\lambda_0}$. If we let x be an eigenvector corresponding to $\lambda_0$, then we must have $\|Ax\|^2 = \lambda_0\|x\|^2$, and so equality is achieved. Thus, $\|A\| = \sqrt{\lambda_0}$.
(iii) Let $\|x\| = \max_i \{|\xi_i|\}$. Then $\|A\| = \max_i \left(\sum_{j=1}^{n} |a_{ij}|\right)$. The proof of this part is left as an exercise. ∎

7.1.23. Exercise. Prove part (iii) of Example 7.1.22.
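The three formulas of Example 7.1.22 can be checked against library routines. In the sketch below (ours, assuming NumPy; the matrix is an arbitrary illustration), the column-sum, eigenvalue, and row-sum expressions agree with NumPy's induced matrix norms.

```python
import numpy as np

A = np.array([[1.0, -2.0, 3.0],
              [0.0,  4.0, -1.0],
              [2.0,  1.0,  0.5]])

norm1 = np.abs(A).sum(axis=0).max()                    # max column sum, (i)
norm2 = np.sqrt(np.linalg.eigvalsh(A.T @ A).max())     # sqrt(lambda_0), (ii)
norm_inf = np.abs(A).sum(axis=1).max()                 # max row sum, (iii)

print(norm1, np.linalg.norm(A, 1))        # both give the induced 1-norm
print(norm2, np.linalg.norm(A, 2))        # both give the induced 2-norm
print(norm_inf, np.linalg.norm(A, np.inf))  # both give the induced inf-norm
```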

Next, we prove the following important result concerning the completeness of B(X, Y).

7.1.24. Theorem. If Y is complete, then the normed linear space B(X, Y) is also complete.
Proof. Let $\{T_n\}$ be a Cauchy sequence in the normed linear space B(X, Y). Choose N such that for a given ε > 0, $\|T_m - T_n\| < \epsilon$ whenever m > N and n > N. Since the $T_n$ are bounded, we have for each x ∈ X,
$$\|T_m x - T_n x\| \leq \|T_m - T_n\|\,\|x\| < \epsilon\|x\|$$
whenever m, n ≥ N. From this it follows that $\{T_n x\}$ is a Cauchy sequence in Y. But Y is complete, by hypothesis. Therefore, $T_n x$ has a limit in Y which depends on x ∈ X. Let us denote this limit by Tx; i.e., $\lim_{n \to \infty} T_n x = Tx$. To show that T is linear we note that
$$T(x + y) = \lim T_n(x + y) = \lim T_n x + \lim T_n y = Tx + Ty$$
and
$$T(\alpha x) = \lim T_n(\alpha x) = \alpha \lim T_n x = \alpha Tx.$$
Thus, T is a linear operator of X into Y. We show next that T is bounded and hence continuous. Since every Cauchy sequence in a normed linear space is bounded, it follows that the sequence $\{T_n\}$ is bounded, and thus $\|T_n\| \leq M$ for all n, where M is some constant. We have
$$\|Tx\| = \|\lim T_n x\| = \lim \|T_n x\| \leq \sup_n (\|T_n\|\,\|x\|) \leq M\|x\|.$$
This proves that T is bounded and therefore continuous, and T ∈ B(X, Y). Finally, we must show that $T_n \to T$ as $n \to \infty$ in the norm of B(X, Y). From before, we have $\|T_m x - T_n x\| < \epsilon\|x\|$ whenever m, n > N. If we let $n \to \infty$, then $\|T_m x - Tx\| \leq \epsilon\|x\|$ for every x ∈ X provided that m > N. This implies that $\|T_m - T\| \leq \epsilon$ whenever m > N. Thus $T_m \to T$ as $m \to \infty$ with respect to the norm defined on B(X, Y). Therefore, B(X, Y) is complete and the theorem is proved. ∎

In Definition 3.4.16 we defined the null space of T ∈ L(X, Y) as
$$\mathfrak{N}(T) = \{x \in X: Tx = 0\}. \qquad (7.1.25)$$
We then showed that the range space ℜ(T) is a linear subspace of Y and that 𝔑(T) is a linear subspace of X. For the case of bounded linear transformations we have the following result.

7.1.26. Theorem. Let T ∈ B(X, Y). Then 𝔑(T) is a closed linear subspace of X.

Proof. 𝔑(T) is a linear subspace of X by Theorem 3.4.19. That it is closed follows from part (ii) of Theorem 5.7.9, since $\mathfrak{N}(T) = T^{-1}(\{0\})$ and since {0} is a closed subset of Y. ∎

We conclude this section with the following useful result for continuous linear transformations.

7.1.27. Theorem. Let T ∈ L(X, Y). Then T is continuous if and only if
$$T\left(\sum_{i=1}^{\infty} \alpha_i x_i\right) = \sum_{i=1}^{\infty} \alpha_i T x_i$$
for every convergent series $\sum_{i=1}^{\infty} \alpha_i x_i$ in X.

The proof of this theorem follows readily from Theorem 5.7.8. We leave the details as an exercise.

7.1.28. Exercise. Prove Theorem 7.1.27.


7.2. INVERSES

Throughout this section X and Y denote vector spaces over the same field F, where F is either R (the real numbers) or C (the complex numbers).

We recall that a linear operator T: X → Y has an inverse, $T^{-1}$, if it is injective, and if this is so, then $T^{-1}$ is a linear operator from ℜ(T) onto X (see Theorem 3.4.32). We have the following result concerning the continuity of $T^{-1}$.

7.2.1. Theorem. Let T ∈ L(X, Y). Then $T^{-1}$ exists, and $T^{-1} \in B(\mathfrak{R}(T), X)$, if and only if there is an α > 0 such that $\|Tx\| \geq \alpha\|x\|$ for all x ∈ X. If this is so, $\|T^{-1}\| \leq 1/\alpha$.

Proof. Assume that there is a constant α > 0 such that $\alpha\|x\| \leq \|Tx\|$ for all x ∈ X. Then Tx = 0 implies x = 0, and $T^{-1}$ exists by Theorem 3.4.32. For y ∈ ℜ(T) there is an x ∈ X such that y = Tx and $T^{-1}y = x$. Thus,
$$\alpha\|x\| = \alpha\|T^{-1}y\| \leq \|Tx\| = \|y\|,$$
or
$$\|T^{-1}y\| \leq \frac{1}{\alpha}\|y\|.$$
Hence, $T^{-1}$ is bounded and $\|T^{-1}\| \leq 1/\alpha$.

Conversely, assume that $T^{-1}$ exists and is bounded. Then, for x ∈ X there is a y ∈ ℜ(T) such that y = Tx, and also $x = T^{-1}y$. Since $T^{-1}$ is bounded we have
$$\|x\| = \|T^{-1}y\| \leq \|T^{-1}\|\,\|y\|,$$
or
$$\|Tx\| = \|y\| \geq \frac{1}{\|T^{-1}\|}\|x\|. \quad \blacksquare$$
The next result, called the Neumann expansion theorem, gives us important information concerning the existence of the inverse of a certain class of bounded linear transformations.

7.2.2. Theorem. Let X be a Banach space, let T ∈ B(X, X), let I ∈ B(X, X) denote the identity operator, and let $\|T\| < 1$. Then the range of (I − T) is X, and the inverse of (I − T) exists, is bounded, and satisfies the inequality
$$\|(I - T)^{-1}\| \leq \frac{1}{1 - \|T\|}. \qquad (7.2.3)$$
Furthermore, the series $\sum_{n=0}^{\infty} T^n$ in B(X, X) converges uniformly to $(I - T)^{-1}$ with respect to the norm of B(X, X); i.e.,
$$(I - T)^{-1} = I + T + T^2 + \cdots + T^n + \cdots. \qquad (7.2.4)$$


Proof. Since $\|T\| < 1$, it follows that the series $\sum_{n=0}^{\infty} \|T\|^n$ converges. In view of Theorem 7.1.16 we have $\|T^n\| \leq \|T\|^n$, and hence the series $\sum_{n=0}^{\infty} T^n$ converges in the space B(X, X), because this space is complete in view of Theorem 7.1.24. If we set
$$S = \sum_{n=0}^{\infty} T^n,$$
then
$$ST = TS = \sum_{n=0}^{\infty} T^{n+1},$$
and
$$(I - T)S = S(I - T) = I.$$
It now follows from Theorem 3.4.65 that $(I - T)^{-1}$ exists and is equal to S. Furthermore, S ∈ B(X, X). The inequality (7.2.3) now follows readily and is left as an exercise. ∎

7.2.5. Exercise. Prove inequality (7.2.3).
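In the matrix case the Neumann expansion can be computed term by term. The following sketch (ours, assuming NumPy; the matrix T is an arbitrary illustration with $\|T\| < 1$) compares the partial sums of (7.2.4) with $(I - T)^{-1}$ and checks the bound (7.2.3).

```python
import numpy as np

T = np.array([[0.2, 0.1], [0.0, 0.3]])
norm_T = np.linalg.norm(T, 2)            # induced 2-norm, < 1 here

S = np.zeros_like(T)
term = np.eye(2)
for n in range(60):                      # partial sums of I + T + T^2 + ...
    S += term
    term = term @ T

inv = np.linalg.inv(np.eye(2) - T)
print(np.max(np.abs(S - inv)))           # ~0: series converges to (I - T)^-1
print(np.linalg.norm(inv, 2), 1.0 / (1.0 - norm_T))   # bound (7.2.3)
```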

The next result, which is of great significance, is known as the Banach inverse theorem.

7.2.6. Theorem. Let X and Y be Banach spaces, and let T ∈ B(X, Y). If T is bijective, then $T^{-1}$ is bounded.

Proof. The proof of this theorem is rather lengthy and requires two preliminary results, which we state and prove separately.

7.2.7. Proposition. If A is any subset of X such that $\bar{A} = X$ ($\bar{A}$ denotes the closure of A), then any x ∈ X such that x ≠ 0 can be written in the form
$$x = x_1 + x_2 + \cdots + x_n + \cdots,$$
where $x_n \in A$ and $\|x_n\| \leq 3\|x\|/2^n$, n = 1, 2, ....

Proof. The sequence $\{x_k\}$ is constructed as follows. Let $x_1 \in A$ be such that $\|x - x_1\| \leq \frac{1}{2}\|x\|$. This can certainly be done since $\bar{A} = X$. Now choose $x_2 \in A$ such that $\|x - x_1 - x_2\| \leq \frac{1}{4}\|x\|$. We continue in this manner and obtain
$$\|x - x_1 - \cdots - x_n\| \leq \frac{1}{2^n}\|x\|.$$
We can always choose such an $x_n \in A$, because $x - x_1 - \cdots - x_{n-1} \in X$ and $\bar{A} = X$. By construction of $\{x_k\}$, $\|x - \sum_{k=1}^{n} x_k\| \to 0$ as $n \to \infty$. Hence,

$x = \sum_{k=1}^{\infty} x_k$. We now compute $\|x_n\|$. First, we see that
$$\|x_1\| = \|x_1 - x + x\| \leq \|x_1 - x\| + \|x\| \leq \tfrac{3}{2}\|x\|,$$
$$\|x_2\| = \|x_2 + x_1 - x - x_1 + x\| \leq \|x - x_1 - x_2\| + \|x - x_1\| \leq \tfrac{3}{4}\|x\|,$$
and, in general,
$$\|x_n\| = \|x_n + x_{n-1} + \cdots + x_1 - x + x - x_1 - \cdots - x_{n-1}\| \leq \|x - x_1 - \cdots - x_n\| + \|x - x_1 - \cdots - x_{n-1}\| \leq \frac{3}{2^n}\|x\|,$$
which proves the proposition. ∎

7.2.8. Proposition. If $\{A_n\}$ is any countable collection of subsets of X such that $X = \bigcup_{n=1}^{\infty} A_n$, then there is a sphere $S(x_0; \epsilon) \subset X$ and a set $A_n$ such that $S(x_0; \epsilon) \subset \bar{A}_n$.

Proof. The proof is by contradiction. Without loss of generality, assume that
$$A_1 \subset A_2 \subset A_3 \subset \cdots.$$
For purposes of contradiction assume that for every x ∈ X and every n there is an $\epsilon_n > 0$ such that $S(x; \epsilon_n) \cap A_n = \emptyset$. Now let $x_1 \in X$ and $\epsilon_1 > 0$ be such that $S(x_1; \epsilon_1) \cap A_1 = \emptyset$. Let $x_2 \in X$ and $\epsilon_2 > 0$ be such that $S(x_2; \epsilon_2) \subset S(x_1; \epsilon_1)$ and $S(x_2; \epsilon_2) \cap A_2 = \emptyset$. We see that it is possible to construct a sequence of closed nested spheres, $\{K_n\}$ (see Definition 5.5.34), in such a fashion that the diameter of these spheres, diam($K_n$), converges to zero. In view of part (ii) of Theorem 5.5.35, $\bigcap_{k=1}^{\infty} K_k \neq \emptyset$. Let
$$x \in \bigcap_{k=1}^{\infty} K_k.$$
Then $x \notin A_n$ for all n. But this contradicts the fact that $X = \bigcup_{n=1}^{\infty} A_n$. This completes the proof of the proposition. ∎

Proof of Theorem 7.2.6. Let
$$A_k = \{y \in Y: \|T^{-1}y\| \leq k\|y\|\}, \quad k = 1, 2, \ldots.$$
Clearly, $Y = \bigcup_{k=1}^{\infty} A_k$. By Proposition 7.2.8 there is a sphere $S(y_0; \epsilon) \subset Y$ and a set $A_n$ such that $S(y_0; \epsilon) \subset \bar{A}_n$. We may assume that $y_0 \in A_n$. Let ρ be such that 0 < ρ < ε, and let us define the sets B and $B_0$ by
$$B = \{y \in S(y_0; \epsilon): \rho \leq \|y - y_0\|\}$$
and
$$B_0 = \{y \in Y: y = z - y_0,\ z \in B\}.$$

We now show that there is an $A_K$ such that $B_0 \subset \bar{A}_K$. Let $y \in B \cap A_n$. Then $y - y_0 \in B_0$. We then have
$$\|T^{-1}(y - y_0)\| \leq \|T^{-1}y\| + \|T^{-1}y_0\| \leq n[\|y\| + \|y_0\|] \leq n[\|y - y_0\| + 2\|y_0\|] = n\|y - y_0\|\left[1 + \frac{2\|y_0\|}{\|y - y_0\|}\right] \leq n\|y - y_0\|\left[1 + \frac{2\|y_0\|}{\rho}\right].$$
Now let K be a positive integer such that
$$n\left[1 + \frac{2\|y_0\|}{\rho}\right] \leq K.$$
It then follows that $y - y_0 \in A_K$. It follows readily that $B_0 \subset \bar{A}_K$.

Now let y be an arbitrary element in Y. It is always possible to choose a real number λ such that $\lambda y \in B_0$. Thus, there is a sequence $\{y_i\}$ such that $y_i \in A_K$ for all i and $\lim y_i = \lambda y$. This means that the sequence $\{\frac{1}{\lambda}y_i\}$ converges to y. We observe from the definition of $A_K$ that if $y_i \in A_K$, then $\frac{1}{\lambda}y_i \in A_K$ for any real number λ. Hence, we have shown that $Y \subset \bar{A}_K$.

Finally, for arbitrary y ∈ Y we can write, by Proposition 7.2.7,
$$y = y_1 + y_2 + \cdots + y_n + \cdots,$$
where $\|y_n\| \leq 3\|y\|/2^n$. Let $x_k = T^{-1}y_k$, k = 1, 2, ..., and consider the infinite series $\sum_{k=1}^{\infty} x_k$. This series converges to an element x ∈ X, since $\|x_k\| \leq K\|y_k\| \leq 3K\|y\|/2^k$, so that
$$\|x\| \leq \sum_{k=1}^{\infty} \|x_k\| \leq 3K\|y\| \sum_{k=1}^{\infty} \frac{1}{2^k} = 3K\|y\|.$$
Since T is continuous and since $\sum_{k=1}^{\infty} x_k$ converges, it follows that $Tx = T\left(\sum_{k=1}^{\infty} x_k\right) = \sum_{k=1}^{\infty} Tx_k = \sum_{k=1}^{\infty} y_k = y$. Hence, $x = T^{-1}y$. Therefore, $\|x\| = \|T^{-1}y\| \leq 3K\|y\|$. This implies that $T^{-1}$ is bounded, which was to be proved. ∎

Utilizing the principle of contraction mappings (see Theorem 5.8.5), we now establish results related to inverses which are important in applications. In the setting of normed linear spaces we can restate the definition of a contraction mapping as being a function T: X → X (T is not necessarily linear) such that
$$\|T(x) - T(y)\| \leq \alpha\|x - y\|$$
for all x, y ∈ X, with 0 ≤ α < 1. The principle of contraction mappings asserts that if T is a contraction mapping, then the equation
$$T(x) = x$$
has one and only one solution x ∈ X.

We now state and prove the following result.

7.2.9. Theorem. Let X be a Banach space, let T ∈ B(X, X), let λ ∈ F, and let λ ≠ 0.
(i) If $|\lambda| > \|T\|$, then Tx = λx has a unique solution, namely x = 0;
(ii) if $|\lambda| > \|T\|$, then $(T - \lambda I)^{-1}$ exists and is continuous on X;
(iii) if $|\lambda| > \|T\|$, then for a given y ∈ X there is one and only one vector x ∈ X such that $(T - \lambda I)x = y$, and
$$x = -\left[\frac{y}{\lambda} + \frac{Ty}{\lambda^2} + \frac{T^2 y}{\lambda^3} + \cdots\right];$$
and
(iv) if $\|I - T\| < 1$, then $T^{-1}$ exists and is continuous on X.

Proof. (i) For any x, y ∈ X, we have
$$\|\lambda^{-1}Tx - \lambda^{-1}Ty\| = |\lambda^{-1}|\,\|T(x - y)\| \leq |\lambda^{-1}|\,\|T\|\,\|x - y\|.$$
Thus, if $\|T\| < |\lambda|$, then $\lambda^{-1}T$ is a contraction mapping. In view of the principle of contraction mappings there is a unique x ∈ X with $\lambda^{-1}Tx = x$, or Tx = λx. The unique solution has to be x = 0, because T0 = 0.
(ii) Let $L = \frac{1}{\lambda}T$. Then $\|L\| = \|T\|/|\lambda| < 1$. It now follows from Theorem 7.2.2 that $(L - I)^{-1}$ exists and is continuous on X. Thus, $(\lambda L - \lambda I)^{-1} = (T - \lambda I)^{-1}$ exists and is continuous on X. This completes the proof of part (ii). The proofs of the remaining parts are left as an exercise. ∎

7.3. CONJUGATE AND ADJOINT OPERATORS

Associated with every bounded linear operator defined on a normed linear


space is a transformation called its conjugate, and associated with every
bounded linear operator defined on an inner product space is a transformation
called its adjoint. These operators, which we consider in this section, are of
utmost importance in analysis as well as in applications.

Throughout this section X and Y a re normed linear spaces over ,F where


F is either R (the real numbers) or C (the complex numbers). In some cases
we may further assume that X and Y a re inner product spaces, and in other
instances we may require that X and/or Y be complete.
eL t X f and yf denote the algebraic conjugate of X and ,Y respectively
(refer to Definition 3.5.18). tU ilizing the notation of Section 3.5, we write
x ' E X f and y' E yf to denote elements of these spaces. If T E L ( X , )Y ,
we defined the transpose of T, TT, to be a mapping from yf to X f determined
by the equation
,x < TTy' ) = T
< ,x y' ) for all x E ,X y' E yf
(see Definition 3.5.27), and we showed that TT E L ( yf, Xf).
Now let us assume that T: X → Y is a bounded linear operator on X
into Y. Let X* and Y* denote the normed conjugate spaces of X and Y,
respectively (refer to Definition 6.5.9). If y' ∈ Y*, then y'(y) = ⟨y, y'⟩ is
defined for every y ∈ Y and, in particular, it is defined for every y = Tx,
x ∈ X. The quantity ⟨Tx, y'⟩ = y'(Tx) is a scalar for each x ∈ X. Writing
x'(x) = ⟨Tx, y'⟩ = y'(Tx), we have defined a functional x' on X. Since y'
is a linear transformation (it is a bounded linear functional) and since T is a
linear transformation (it is a bounded linear operator), it follows readily
that x' is a linear functional. Also, since T is bounded, we have

|x'(x)| = |y'(Tx)| = |⟨Tx, y'⟩| ≤ ‖y'‖ ‖Tx‖ ≤ ‖y'‖ ‖T‖ ‖x‖,

and therefore x' is a bounded linear functional and x' ∈ X*. We have thus
assigned to each functional y' ∈ Y* a functional x' ∈ X*; i.e., we have
established a linear operator which maps Y* into X*. This operator is called
the conjugate operator of the operator T and is denoted by T'. We now have

x' = T'y'.

The definition of T': Y* → X* is usually expressed by the relation

⟨x, T'y'⟩ = ⟨Tx, y'⟩, x ∈ X, y' ∈ Y*.
Utilizing operator notation rather than bracket notation, the definition
of the conjugate operator T' satisfies the equation

x'(x) = y'(Tx) = (T'y')(x), x ∈ X,

and we may therefore write

y'T = T'y',

where y'T denotes the functional on X formed by composing y' with T,
and T'y' is the functional obtained by operating on y' by T'.
The reader can readily show that T' is unique and linear. If Y* = Yᶠ,
which is the case if Y is finite dimensional, then the conjugate T' and the
transpose Tᵀ are identical concepts. However, since, in general, Y* is a proper
subspace of Yᶠ, Tᵀ is an extension of T' or, conversely, T' is a restriction of
Tᵀ to the space Y*.
We summarize the above discussion in the following definition and
Figure A.

7.3.1. Figure A. (Diagram depicting T: X → Y together with its conjugate T': Y* → X*.)

7.3.2. Definition. Let T be a bounded linear operator on X into Y. The
conjugate operator of T, T': Y* → X*, is defined by the formula

⟨x, T'y'⟩ = ⟨Tx, y'⟩, x ∈ X, y' ∈ Y*.

7.3.3. Exercise. Show that the conjugate operator T' is unique and linear.

Before exploring the properties of conjugate operators, we introduce


another important operator which is closely related to the conjugate operator,
the so-called "adjoint operator." In this case we focus our attention on
Hilbert spaces.
Let X and Y denote Hilbert spaces, and let the symbol (·, ·) denote the
inner product on both X and Y. If T is a bounded linear transformation on
X into Y, then in view of the above discussion there is a unique bounded linear
operator from Y* into X*, called the conjugate of T. But in view of Theorem
6.14.2, the dual spaces X*, Y* may be identified with X and Y, respectively,
because X and Y are Hilbert spaces. This gives rise to a new type of bounded
linear operator from Y into X, called the adjoint of T, which we consider in
place of T'.
Let y₀ ∈ Y be fixed, and let x'(x) = (Tx, y₀), where T ∈
B(X, Y) and x' ∈ X*. By Theorem 6.14.2 there is a unique x₀ ∈ X such that
x'(x) = (x, x₀). Writing x₀ = T*y₀, we define in this way a transformation of
Y into X. We call this transformation the adjoint of T. Dropping the subscript
zero, we characterize the adjoint of T by the formula

(Tx, y) = (x, T*y), x ∈ X, y ∈ Y.
We will now show that T*: Y → X is linear, unique, and bounded. To prove
linearity, let x ∈ X, y₁, y₂ ∈ Y, let α, β ∈ F, and note that

(x, T*(αy₁ + βy₂)) = (Tx, αy₁ + βy₂) = ᾱ(Tx, y₁) + β̄(Tx, y₂)
= ᾱ(x, T*y₁) + β̄(x, T*y₂) = (x, αT*y₁ + βT*y₂).

From this it follows that

T*(αy₁ + βy₂) = αT*y₁ + βT*y₂,

and therefore T* is linear.
To show that T* is unique we note that if (x, T*y) = (x, S*y), then
(x, T*y) − (x, S*y) = 0 implies (x, (T* − S*)y) = 0 for all x ∈ X. From
this it follows that (T* − S*)y ⊥ x for all x ∈ X, and thus (T* − S*)y = 0
for all y ∈ Y. Therefore, T* = S*.
To verify that T* is bounded we observe that

‖T*x‖² = |(T*x, T*x)| = |(T(T*x), x)| ≤ ‖T(T*x)‖ ‖x‖ ≤ ‖T‖ ‖T*x‖ ‖x‖,

and thus

‖T*x‖ ≤ ‖T‖ ‖x‖.

From this it follows that T* is bounded and furthermore ‖T*‖ ≤ ‖T‖.
We now give the following formal definition.

7.3.4. Definition. Let X and Y be Hilbert spaces, and let T be a bounded
linear operator on X into Y. The adjoint operator T*: Y → X is defined by
the formula

(Tx, y) = (x, T*y), x ∈ X, y ∈ Y.

Summarizing the above discussion we have the following result.

7.3.5. Theorem. The adjoint operator T* given in Definition 7.3.4 is


linear, unique, and bounded.

The reader is cautioned that many authors use the terms conjugate operator
and adjoint operator interchangeably. Also, the symbol T* is used by many
authors to denote both adjoint and conjugate operators.
Some of the important properties of conjugate operators are summarized
in the following result.

7.3.6. Theorem. Conjugate transformations have the following properties:
(i) ‖T'‖ = ‖T‖;
(ii) I' = I, where I is the identity operator on a normed linear space X;
(iii) 0' = 0, where 0 is the zero operator on a normed linear space X;
(iv) (S + T)' = S' + T', where S, T ∈ B(X, Y) and where X, Y are
normed linear spaces;
(v) (αT)' = αT', where T ∈ B(X, Y), α ∈ F, and X, Y are normed
linear spaces;
(vi) (ST)' = T'S', where T ∈ B(X, Y), S ∈ B(Y, Z), and X, Y, Z are
normed linear spaces; and
(vii) if T⁻¹ exists and if T⁻¹ ∈ B(Y, X), then (T')⁻¹ exists, and moreover
(T')⁻¹ = (T⁻¹)'.
Proof. To prove part (i) we note that

|⟨x, T'y'⟩| = |⟨Tx, y'⟩| ≤ ‖y'‖ ‖Tx‖ ≤ ‖y'‖ ‖T‖ ‖x‖.

From this it follows that ‖T'y'‖ ≤ ‖T‖ ‖y'‖, and therefore

‖T'‖ ≤ ‖T‖.

Next, let x₀ ∈ X, x₀ ≠ 0. In view of the Hahn-Banach theorem (see
Corollary 6.8.5) there is a y₀' ∈ Y*, ‖y₀'‖ = 1, such that ⟨Tx₀, y₀'⟩ = ‖Tx₀‖.
Therefore,

‖Tx₀‖ = |⟨x₀, T'y₀'⟩| ≤ ‖T'y₀'‖ ‖x₀‖ ≤ ‖T'‖ ‖x₀‖,

from which it follows that

‖T‖ ≤ ‖T'‖.

Therefore, ‖T‖ = ‖T'‖.
The proofs of properties (ii)-(vi) are straightforward. To prove (iv), for
example, we note that

⟨x, (S + T)'y'⟩ = ⟨(S + T)x, y'⟩ = ⟨Sx + Tx, y'⟩
= ⟨Sx, y'⟩ + ⟨Tx, y'⟩ = ⟨x, S'y'⟩ + ⟨x, T'y'⟩
= ⟨x, S'y' + T'y'⟩ = ⟨x, (S' + T')y'⟩.

From this it follows that (S + T)' = S' + T'.
To prove part (vii) assume that T ∈ B(X, Y) has a bounded inverse T⁻¹:
Y → X. To show that T': Y* → X* has an inverse we must show that it is
injective. Let y₁', y₂' ∈ Y* be such that y₁' ≠ y₂'. Then

⟨x, T'y₂'⟩ − ⟨x, T'y₁'⟩ = ⟨Tx, y₂' − y₁'⟩ ≠ 0

for some x ∈ X. From this it follows that T'y₁' ≠ T'y₂', and T' is one-to-one.
We can, in fact, show that T' is onto. We note that for any x' ∈ X* and any
x ∈ X, Tx = y, and we have

⟨x, x'⟩ = ⟨T⁻¹y, x'⟩ = ⟨y, (T⁻¹)'x'⟩ = ⟨Tx, (T⁻¹)'x'⟩
= ⟨x, T'(T⁻¹)'x'⟩.

From this it follows that

x' = T'(T⁻¹)'x'.

This shows that x' ∈ R(T') and that (T')⁻¹ = (T⁻¹)'. ■

7.3.7. Exercise. Prove parts (ii), (iii), (v), and (vi) of Theorem 7.3.6.

In the next theorem some of the important properties of adjoint operators


are summarized.

7.3.8. Theorem. Let X, Y, and Z be Hilbert spaces, and let I and 0 denote
the identity and zero transformations on X, respectively. Then
(i) ‖T*‖ = ‖T‖, where T ∈ B(X, Y);
(ii) I* = I;
(iii) 0* = 0;
(iv) (S + T)* = S* + T*, where S, T ∈ B(X, Y);
(v) (αT)* = ᾱT*, where T ∈ B(X, Y) and α ∈ F;
(vi) (ST)* = T*S*, where T ∈ B(X, Y), S ∈ B(Y, Z);
(vii) if T⁻¹ ∈ B(Y, X) exists, then (T*)⁻¹ ∈ B(X, Y) exists, and more-
over (T*)⁻¹ = (T⁻¹)*;
(viii) if for T ∈ B(X, Y) we define (T*)* = T**, then T** = T; and
(ix) ‖T*T‖ = ‖T‖², where T ∈ B(X, Y).
Proof. To prove part (i) we note that

‖T*x‖² = |(T*x, T*x)| = |(T(T*x), x)| ≤ ‖T(T*x)‖ ‖x‖
≤ ‖T‖ ‖T*x‖ ‖x‖,

or

‖T*x‖ ≤ ‖T‖ ‖x‖.

From the last inequality it follows that ‖T*‖ ≤ ‖T‖. Reversing the roles of
T and T* we obtain

‖Tx‖² = |(Tx, Tx)| = |(T*(Tx), x)| ≤ ‖T*(Tx)‖ ‖x‖ ≤ ‖T*‖ ‖Tx‖ ‖x‖,

or

‖Tx‖ ≤ ‖T*‖ ‖x‖.

From this it follows that ‖T‖ ≤ ‖T*‖, and therefore ‖T‖ = ‖T*‖.
The proofs of properties (ii)-(viii) are trivial. To prove part (ix), we first
note that

‖T*T‖ ≤ ‖T*‖ ‖T‖ = ‖T‖ ‖T‖ = ‖T‖².

On the other hand,

‖Tx‖² = (Tx, Tx) = (T*Tx, x) ≤ ‖T*Tx‖ ‖x‖ ≤ ‖T*T‖ ‖x‖ ‖x‖.

Taking the square root on both sides of the above inequality we obtain

‖Tx‖ ≤ √(‖T*T‖) ‖x‖,

and thus ‖T‖ ≤ √(‖T*T‖), or ‖T‖² ≤ ‖T*T‖. Hence, ‖T*T‖ = ‖T‖². ■

7.3.9. Exercise. Prove parts (ii)-(viii) of Theorem 7.3.8.

From the above discussion it is obvious that adjoint operators are distinct
from conjugate operators even though many of their properties appear to
be identical, especially for the case of real spaces. We now cite a few examples
to illustrate some of the concepts considered above.

7.3.10. Example. Let X = Cⁿ be the Hilbert space with inner product
defined in Example 3.6.24, and let A ∈ L(X, X) be represented (with respect
to the natural basis for X) by the n × n matrix A = [aᵢⱼ]. The transformation
y = Ax can be written in the form

yᵢ = Σⱼ₌₁ⁿ aᵢⱼxⱼ, i = 1, 2, …, n,

where yᵢ is the ith component of the vector y ∈ X. Let A* denote the adjoint
of A on the Hilbert space X, and let A* be represented by the n × n matrix
[aᵢⱼ*]. Now if u = (u₁, …, uₙ) ∈ X, then

(Ax, u) = (y, u) = Σᵢ₌₁ⁿ yᵢūᵢ = Σᵢ₌₁ⁿ ūᵢ(Σⱼ₌₁ⁿ aᵢⱼxⱼ),

and

(x, A*u) = Σᵢ₌₁ⁿ xᵢ(Σⱼ₌₁ⁿ āᵢⱼ*ūⱼ).

In order that (Ax, u) = (x, A*u) we must have aᵢⱼ* = āⱼᵢ; i.e., the matrix of
A* is the transpose of the conjugate of the matrix of A. ■
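
Remark (a numerical aside, not part of the original text). The conclusion of this example is easily checked with NumPy: the adjoint relation (Ax, u) = (x, A*u) holds when A* is taken to be the conjugate transpose of A.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 4
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    u = rng.standard_normal(n) + 1j * rng.standard_normal(n)

    def inner(a, b):
        # Inner product of Example 3.6.24: sum of a_i * conj(b_i).
        return np.sum(a * np.conj(b))

    A_star = A.conj().T                # conjugate transpose of A
    assert np.isclose(inner(A @ x, u), inner(x, A_star @ u))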

7.3.11. Example. Let X = Y = L₂[a, b], a < b (see Example 6.11.10),
and define the Fredholm operator T by

y(t) = (Tx)(t) = ∫ₐᵇ k(s, t)x(s) ds, t ∈ [a, b],

where it is assumed that the kernel function k(s, t) is well enough behaved so
that

∫ₐᵇ ∫ₐᵇ |k(s, t)|² dt ds < ∞.

Now if u ∈ L₂[a, b], then

(Tx, u) = (y, u) = ∫ₐᵇ y(t)ū(t) dt = ∫ₐᵇ ū(t)(∫ₐᵇ k(s, t)x(s) ds) dt

= ∫ₐᵇ x(s)(∫ₐᵇ k(s, t)ū(t) dt) ds.

From this it follows that the adjoint T* of T maps u into the function

z(t) = (T*u)(t) = ∫ₐᵇ k̄(t, s)u(s) ds;

i.e., the adjoint of T is obtained by interchanging the roles of s and t in the
kernel and by utilizing the complex conjugate of k. ■
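
Remark (a numerical aside, not part of the original text). Discretizing the integral on a grid turns T into a matrix, and the adjoint computed as the conjugate transpose of that matrix corresponds exactly to the conjugated, argument-swapped kernel. A sketch on [0, 1] with a hypothetical kernel:

    import numpy as np

    n = 200
    h = 1.0 / n
    grid = (np.arange(n) + 0.5) * h               # midpoint rule on [0, 1]

    k = lambda s, t: np.exp(1j * s * t)           # hypothetical kernel k(s, t)
    K = h * k(grid[None, :], grid[:, None])       # K[i, j] ~ k(s_j, t_i) h

    def inner(f, g):
        # Discrete analogue of the L2[0, 1] inner product.
        return h * np.sum(f * np.conj(g))

    x = np.sin(np.pi * grid)
    u = np.cos(np.pi * grid)

    # The adjoint corresponds to the conjugate transpose of K, i.e., to
    # the kernel conj(k(t, s)) with the roles of s and t interchanged.
    assert np.isclose(inner(K @ x, u), inner(x, K.conj().T @ u))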

7.3.12. Exercise. Let X = Y = l₂ (see Example 6.1.6) and define T:
l₂ → l₂ by

T(ξ₁, ξ₂, …, ξₙ, …) = (0, ξ₁, ξ₂, …) = y

for all x = (ξ₁, ξ₂, …, ξₙ, …) ∈ l₂. Show that T*: l₂ → l₂ is the operator
defined by

T*(η₁, η₂, …, ηₙ, …) = (η₂, η₃, …, ηₙ₊₁, …)

for all y = (η₁, η₂, …, ηₙ, …) ∈ l₂.
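
Remark (a numerical aside, not part of the original text). Truncating the shifts to n × n matrices makes the claim of this exercise visible: the right shift is represented by the matrix with ones on the subdiagonal, and its transpose (its adjoint, since the entries are real) implements the left shift.

    import numpy as np

    n = 6
    T = np.diag(np.ones(n - 1), k=-1)  # right shift: (x1, x2, ...) -> (0, x1, ...)

    x = np.arange(1.0, n + 1)
    assert np.allclose(T @ x, np.concatenate(([0.0], x[:-1])))

    # The adjoint is the transpose here, and it shifts to the left.
    assert np.allclose(T.T @ x, np.concatenate((x[1:], [0.0])))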

Recalling the definition of orthogonal complement (refer to Definition


6.12.1), we have the following important results for bounded linear operators
on Hilbert spaces.

7.3.13. Theorem. Let T be a bounded linear operator on a Hilbert space
X into a Hilbert space Y. Then,
(i) [R̄(T)]⊥ = N(T*);
(ii) R̄(T) = [N(T*)]⊥;
(iii) N(T) = [R̄(T*)]⊥;
(iv) R̄(T*) = [N(T)]⊥;
(v) N(T*) = N(TT*); and
(vi) R̄(T) = R̄(TT*),
where N(·) denotes the null space and R̄(·) the closure of the range.
Proof. We prove (i) and (v) and leave the proofs of (ii)-(iv) and (vi) as an
exercise.
To prove (i), we first show that R(T)⊥ = N(T*). Let y ∈ R(T)⊥. Then
(y, Tx) = 0 for all x ∈ X, and hence (T*y, x) = 0 for all x ∈ X. This can be
true only if T*y = 0; i.e., y ∈ N(T*). On the other hand, if y ∈ N(T*),
then (T*y, x) = 0 for all x ∈ X. Thus, (y, Tx) = 0 for every x ∈ X, which
implies that y ∈ R(T)⊥. Now R(T) need not be closed. However, by Theorem
6.12.14, R̄(T) = R(T)⊥⊥. Therefore, [R̄(T)]⊥ = R(T)⊥⊥⊥ = R(T)⊥ = N(T*).
To prove (v), let y ∈ N(T*). Then T*y = 0 and TT*y = 0. This implies
that N(T*) ⊂ N(TT*). Next, let y ∈ N(TT*). Then TT*y = 0 and (y, TT*y)
= 0. This implies that (T*y, T*y) = 0, so that T*y = 0. Therefore, y ∈ N(T*)
and N(TT*) ⊂ N(T*), completing the proof of part (v). ■

7.3.14. Exercise. Prove parts (ii)-(iv) and (vi) of Theorem 7.3.13.

We conclude this section with the following results.

7.3.15. Theorem. Let T ∈ B(X, X), where X is a Hilbert space, and let
M and N be subsets of X. Define T(M) as

T(M) = {y: y = Tx, x ∈ M}.

If T(M) ⊂ N, then T*(N⊥) ⊂ M⊥.
Proof. Let z ⊥ N. Then for x ∈ M we have (Tx, z) = 0 = (x, T*z). There-
fore, T*z ⊥ x for all x ∈ M and T*z ∈ M⊥. ■

7.3.16. Theorem. Let T ∈ B(X, X), where X is a Hilbert space, and let
M and N be closed linear subspaces of X. Then T(M) ⊂ N if and only if
T*(N⊥) ⊂ M⊥.
Proof. If T(M) ⊂ N, then by Theorem 7.3.15, T*(N⊥) ⊂ M⊥. Conversely,
if T*(N⊥) ⊂ M⊥, then by Theorem 7.3.15, T**(M⊥⊥) ⊂ N⊥⊥. But T** = T,
and if M and N are closed linear subspaces, then M⊥⊥ = M and N⊥⊥ = N.
Therefore, T(M) ⊂ N. ■

7.4. HERMITIAN OPERATORS

Throughout this section X denotes a complex Hilbert space. We shall
be primarily concerned with operators T ∈ B(X, X). By T* we shall always
mean the adjoint of T.
For our first result, recall the definition of a bilinear functional (Definition
3.6.4).

7.4.1. Theorem. Let T ∈ B(X, X) and define the function φ: X × X → C
by φ(x, y) = (Tx, y) for all x, y ∈ X. Then φ is a bilinear functional.

7.4.2. Exercise. Prove Theorem 7.4.1.

Of central importance in this section is the following class of operators.

7.4.3. Definition. A bounded linear transformation T ∈ B(X, X) is said
to be hermitian if T = T*.

Some authors call such transformations self-adjoint operators (see Defini-
tion 4.10.20).
The next two results allow us to characterize a hermitian operator in an
equivalent manner. The first of these involves symmetric bilinear forms (see
Definition 3.6.10).

7.4.4. Theorem. Let T ∈ B(X, X). Then T is hermitian if and only if the
bilinear functional φ(x, y) = (Tx, y) is symmetric.
Proof. If T* = T, then φ(x, y) = (Tx, y) = (x, T*y) = (x, Ty), which is
the complex conjugate of (Ty, x) = φ(y, x); therefore φ is symmetric.
Conversely, assume that φ(x, y) is the complex conjugate of φ(y, x). Then
φ(y, x) = (Ty, x), whose complex conjugate is (x, Ty); thus (x, Ty) =
φ(x, y) = (Tx, y) = (x, T*y); i.e., (x, Ty) = (x, T*y) for all x, y ∈ X.
From this it follows that

(x, (T* − T)y) = 0,

and thus (T* − T)y ⊥ x for all x ∈ X. This implies that T*y = Ty for all
y ∈ X, or T* = T. ■

7.4.5. Theorem. Let T ∈ B(X, X). Then T is hermitian if and only if
(Tx, x) is real for every x ∈ X.

Proof. If T is hermitian, then (Tx, y) is the complex conjugate of (Ty, x).
Setting x = y, we obtain that (Tx, x) equals its own complex conjugate,
which implies that (Tx, x) is real.
Conversely, suppose (x, Tx) is real for all x ∈ X. Then (x, Tx) =
(Tx, x). Now consider (x, Ty) for arbitrary x, y ∈ X. It is easily verified that

(x + y, T(x + y)) − (x − y, T(x − y)) + i(x + iy, T(x + iy))
− i(x − iy, T(x − iy)) = 4(x, Ty), (7.4.6)

where i = √(−1). Also,

(T(x + y), x + y) − (T(x − y), x − y) + i(T(x + iy), x + iy)
− i(T(x − iy), x − iy) = 4(Tx, y). (7.4.7)

Since the left-hand sides of Eqs. (7.4.6) and (7.4.7) are equal, it follows
that (x, Ty) = (Tx, y) for all x, y ∈ X, and hence T = T*. ■

The norm of a hermitian operator can be found as follows.

7.4.8. Theorem. Let T ∈ B(X, X) be a hermitian operator. Then the norm
of T can be expressed in the following equivalent ways:
(i) ‖T‖ = sup {|(Tx, x)|: ‖x‖ = 1}; and
(ii) ‖T‖ = sup {|(Tx, y)|: ‖x‖ = ‖y‖ = 1}.

7.4.9. Exercise. Prove Theorem 7.4.8.

In the next theorem, some of the more important properties of hermitian


operators are given.

7.4.10. Theorem. Let S, T ∈ B(X, X) be hermitian operators, and let α
be a real scalar. Then
(i) (S + T) is a hermitian operator;
(ii) αT is a hermitian operator;
(iii) if T is bijective, then T⁻¹ is hermitian; and
(iv) ST is hermitian if and only if ST = TS.

7.4.11. Exercise. Prove Theorem 7.4.10.

Since in the case of hermitian operators (Tx, x) is real for all x ∈ X,
the following definition concerning definiteness applies (recall Definition
3.6.10).

7.4.12. Definition. Let T ∈ B(X, X) be a hermitian operator. Then T is
said to be positive if (Tx, x) ≥ 0 for all x ∈ X. In this case we write T ≥ 0.
If (Tx, x) > 0 for all x ≠ 0, we say that T is strictly positive.

7.4.13. Definition. Let S, T ∈ B(X, X) be hermitian operators. If the
hermitian operator T + (−S) = T − S ≥ 0, then we write T ≥ S.

7.4.14. Theorem. Let S, T, U ∈ B(X, X) be hermitian operators, and let
α be a real scalar. Then,
(i) if S ≥ 0, T ≥ 0, then (S + T) ≥ 0;
(ii) if α ≥ 0, T ≥ 0, then αT ≥ 0;
(iii) if S ≤ T, T ≤ U, then S ≤ U; and
(iv) for any V ∈ B(X, X), if T ≥ 0, then V*TV ≥ 0. In particular,
V*V ≥ 0.
Proof. The proofs of parts (i)-(iii) are obvious. For example, if S ≥ 0,
T ≥ 0, then (Sx, x) + (Tx, x) = (Sx + Tx, x) = ((S + T)x, x) ≥ 0 and
(S + T) ≥ 0.
To prove part (iv) we note that (V*TVx, x) = (TVx, Vx) ≥ 0, since Vx
= y is a vector in X and (Ty, y) ≥ 0 for all y ∈ X. If we consider, in par-
ticular, T = I = I*, then V*V ≥ 0. ■

The proof of the next result follows by direct verification of the formulas
involved.

7.4.15. Theorem. Let A ∈ B(X, X), and let

U = ½[A + A*] and V = (1/2i)[A − A*],

where i = √(−1). Then
(i) U and V are hermitian operators; and
(ii) if A = C + iD, where C and D are hermitian, then C = U and
D = V.

7.4.16. Exercise. Prove Theorem 7.4.15.

Let us now consider some specific cases.

7.4.17. Example. Let X = Cⁿ with inner product given in Example 3.6.24.
Let A ∈ B(X, X), and let {e₁, …, eₙ} be any orthonormal basis for X. As
we saw in Example 7.3.10, if A is represented by the matrix A, then A* is
represented by A* = Āᵀ. In this case A is hermitian if and only if A = Āᵀ. ■

7.4.18. Example. Let X = L₂[a, b] (see Example 6.11.10), and define
T ∈ B(X, X) by

(Tx)(t) = tx(t).

Then for any z ∈ X we have

(Tx, z) = ∫ₐᵇ tx(t)z̄(t) dt = ∫ₐᵇ x(t)tz̄(t) dt

= (x, Tz) = (T*x, z).

Thus, T = T* and T is hermitian. ■

7.4.19. Exercise. Let X = L₂[a, b], and define T: X → X by

(Tx)(t) = ∫ₐᵗ x(s) ds.

Show that T* ≠ T and therefore T is not hermitian.


7.4.20. Exercise. eL t X = L 2 a[ , b] and consider the Fredholm operator
given in Example 7.3.11; i.e.,

y(t) = (Tx ) (t) = s: k(s, t)x(s)ds, t E a[ , b].

Show that T = T* if and only if k(t, s) = k(s, t).

We conclude this section with the following result, which we will sub-
sequently require.

7.4.21. Theorem. Let X be a Hilbert space, let T ∈ B(X, X) be a hermitian
operator, and let λ ∈ R. Then there exists a real number γ > 0 such that
γ‖x‖ ≤ ‖(T − λI)x‖ for all x ∈ X if and only if (T − λI) is bijective and
(T − λI)⁻¹ ∈ B(X, X), in which case ‖(T − λI)⁻¹‖ ≤ 1/γ.
Proof. Let T_λ = T − λI. It follows from Theorem 7.4.10 that T_λ is also
hermitian.
To prove sufficiency, let T_λ⁻¹ ∈ B(X, X). It follows that for all y ∈ X,
‖T_λ⁻¹y‖ ≤ ‖T_λ⁻¹‖·‖y‖. Letting y = T_λx and γ = ‖T_λ⁻¹‖⁻¹, we have ‖T_λx‖
≥ γ‖x‖ for all x ∈ X.
To prove necessity, let γ > 0 be such that γ‖x‖ ≤ ‖T_λx‖ for all x ∈ X.
We see that T_λx = 0 implies x = 0; i.e., N(T_λ) = {0}, and so T_λ is injective.
We next show that R(T_λ) = X. It follows from Theorem 6.12.16 that X
= R̄(T_λ) ⊕ [R̄(T_λ)]⊥. From Theorem 7.3.13, we have [R̄(T_λ)]⊥ = N(T_λ*). Since
T_λ is hermitian, N(T_λ*) = N(T_λ) = {0}. Hence, R̄(T_λ) = X. We next show that
R̄(T_λ) = R(T_λ), i.e., the range of T_λ is closed. Let {yₙ} be a sequence in
R(T_λ) such that yₙ → y. Then there is a sequence {xₙ} in X such that T_λxₙ = yₙ.
For any positive integers m, n, γ‖xₘ − xₙ‖ ≤ ‖T_λxₘ − T_λxₙ‖ = ‖yₘ − yₙ‖.
Since {yₙ} is Cauchy, {xₙ} must also be Cauchy. Let xₙ → x. Then yₙ = T_λxₙ
→ T_λx = y. Thus, y ∈ R(T_λ), and so R(T_λ) is closed. This proves that T_λ
is bijective. Finally, γ‖T_λ⁻¹y‖ ≤ ‖y‖ for all y ∈ X implies T_λ⁻¹ ∈ B(X, X)
and ‖T_λ⁻¹‖ ≤ 1/γ. This completes the proof of the theorem. ■

7.5. OTHER LINEAR OPERATORS:
NORMAL OPERATORS, PROJECTIONS,
UNITARY OPERATORS,
AND ISOMETRIC OPERATORS

In this section we consider additional important types of linear operators.
Throughout this section X is a complex Hilbert space, T* denotes the adjoint
of T ∈ B(X, X), and I ∈ B(X, X) denotes the identity operator.

7.5.1. Definition. An operator T ∈ B(X, X) is said to be a normal operator
if T*T = TT*.

7.5.2. Definition. An operator T ∈ B(X, X) is said to be an isometric
operator if T*T = I.

7.5.3. Definition. An operator T ∈ B(X, X) is said to be a unitary opera-
tor if T*T = TT* = I.

Our first result is for normal operators.



7.5.4. Theorem. Let T ∈ B(X, X). Let U, V ∈ B(X, X) be hermitian
operators such that T = U + iV. Then T is normal if and only if UV = VU.

7.5.5. Exercise. Prove Theorem 7.5.4. Recall that U and V are unique by
Theorem 7.4.15.

For the next result, recall that a linear subspace Y of X is invariant
under a linear transformation T if T(Y) ⊂ Y (see Definition 3.7.9). Also,
recall that a closed linear subspace Y of a Hilbert space X is itself a Hilbert
space with inner product induced by the inner product on X (see Theorem
6.2.1).

7.5.6. Theorem. Let T ∈ B(X, X) be a normal operator, and let Y be a
closed linear subspace of X which is invariant under T. Let T₁ be the restric-
tion of T to Y. Then T₁ ∈ B(Y, Y) and T₁ is normal.

7.5.7. Exercise. Prove Theorem 7.5.6.

For isometric operators we have the following result.

7.5.8. Theorem. Let T ∈ B(X, X). Then the following are equivalent:
(i) T is isometric;
(ii) (Tx, Ty) = (x, y) for all x, y ∈ X; and
(iii) ‖Tx − Ty‖ = ‖x − y‖ for all x, y ∈ X.
Proof. If T is isometric, then (x, y) = (Ix, y) = (T*Tx, y) = (Tx, Ty) for
all x, y ∈ X.
Next, assume that (Tx, Ty) = (x, y). Then ‖Tx − Ty‖² = ‖T(x − y)‖²
= (T(x − y), T(x − y)) = ((x − y), (x − y)) = ‖x − y‖²; i.e., ‖Tx − Ty‖
= ‖x − y‖.
Finally, assume that ‖Tx − Ty‖ = ‖x − y‖. Then (T*Tx, x) = (Tx, Tx)
= ‖Tx‖² = ‖x‖² = (x, x); i.e., (T*Tx, x) = (x, x) for all x ∈ X. But this
implies that T*T = I; i.e., T is isometric. ■

From Theorem 7.5.8 we obtain the following corollary.

7.5.9. Corollary. If T ∈ B(X, X) is an isometric operator, then ‖Tx‖
= ‖x‖ for all x ∈ X and ‖T‖ = 1.

For unitary operators we have the following result.

7.5.10. Theorem. Let T ∈ B(X, X). Then the following are equivalent:
(i) T is unitary;
(ii) T* is unitary;
(iii) T and T* are isometric;
(iv) T is isometric and T* is injective;
(v) T is isometric and surjective; and
(vi) T is bijective and T⁻¹ = T*.

7.5.11. Exercise. Prove Theorem 7.5.10.

Before considering projections, let us briefly return to Section 3.7. Recall
that if (a linear space) X is the direct sum of two linear subspaces X₁ and X₂,
i.e., X = X₁ ⊕ X₂, then for each x ∈ X there exist unique x₁ ∈ X₁ and
x₂ ∈ X₂ such that x = x₁ + x₂. We call the mapping P: X → X defined by
Px = x₁ the projection on X₁ along X₂. Recall that P ∈ L(X, X), R(P) = X₁,
and N(P) = X₂. Furthermore, recall that if P ∈ L(X, X) is such that P² = P,
then P is said to be idempotent, and this condition is both necessary and suffi-
cient for P to be a projection on R(P) along N(P) (see Theorem 3.7.4). Now
if X is a Hilbert space and if X₁ = Y is a closed linear subspace of X, then
X₂ = Y⊥ and X = Y ⊕ Y⊥ (see Theorem 6.12.16). If for this particular
case P is the projection on Y along Y⊥, then P is an orthogonal projection
(see Definition 3.7.16). In this case we shall simply call P the orthogonal
projection on Y.

7.5.12. Theorem. Let Y be a closed linear subspace of X such that Y ≠ {0}
and Y ≠ X. Let P be the orthogonal projection onto Y. Then
(i) P ∈ B(X, X);
(ii) ‖P‖ = 1; and
(iii) P* = P.
Proof. We know that P ∈ L(X, X). To show that P is bounded let x = x₁
+ x₂, where x₁ ∈ Y and x₂ ∈ Y⊥. Then ‖Px‖ = ‖x₁‖ ≤ ‖x‖. Hence, P
is bounded and ‖P‖ ≤ 1. If x₂ = 0, then ‖Px‖ = ‖x‖ and so ‖P‖ = 1.
To prove (iii), let x, y ∈ X be given by x = x₁ + x₂ and y = y₁ + y₂,
respectively, where x₁, y₁ ∈ Y and x₂, y₂ ∈ Y⊥. Then (x, Py) = (x₁ + x₂,
y₁) = (x₁, y₁) and (Px, y) = (x₁, y₁ + y₂) = (x₁, y₁). Thus, (x, Py) = (Px, y)
for all x, y ∈ X.
This implies that P = P*. ■
From the above theorem it follows that an orthogonal projection is a
hermitian operator.

7.5.13. Theorem. Let Y be a closed linear subspace of X, and let P be the
orthogonal projection onto Y. If

Y₁ = {x ∈ X: Px = x}

and if Y₂ is the range of P, then Y = Y₁ = Y₂.
Proof. Since Y₂ = Y, since Y ⊂ Y₁, and since Y₁ ⊂ Y₂, it follows that
Y = Y₁ = Y₂. ■
7.5.14. Theorem. Let P ∈ L(X, X). If P is idempotent and hermitian, then

Y = {x ∈ X: Px = x}

is a closed linear subspace of X and P is the orthogonal projection onto Y.
Proof. Since P is a linear operator we have

P(αx + βy) = αPx + βPy.

If x, y ∈ Y, then Px = x and Py = y, and it follows that

P(αx + βy) = αx + βy.

Therefore, (αx + βy) ∈ Y and Y is a linear subspace of X. We must show
that Y is a closed linear subspace. First, however, we show that P is bounded
and therefore continuous. Since

‖Pz‖² = (Pz, Pz) = (P*Pz, z) = (P²z, z) = (Pz, z) ≤ ‖Pz‖ ‖z‖,

we have ‖Pz‖ ≤ ‖z‖, and thus ‖P‖ ≤ 1.
To show that Y is a closed linear subspace of X, let x₀ be a point of accu-
mulation of the space Y. Then there is a sequence of vectors {xₙ} in Y such
that lim ‖xₙ − x₀‖ = 0. Since xₙ ∈ Y, we can put Pxₙ = xₙ and we have
‖Pxₙ − x₀‖ → 0 as n → ∞. Since P is bounded, it is continuous, and thus
we also have ‖Pxₙ − Px₀‖ → 0 as n → ∞; since Pxₙ = xₙ → x₀, it follows
that Px₀ = x₀, and hence x₀ ∈ Y.
Finally, we must show that P is an orthogonal projection. Let x ∈ Y,
and let y ∈ Y⊥. Then (Py, x) = (y, Px) = (y, x) = 0, since x ⊥ y. Therefore,
Py ⊥ x and Py ∈ Y⊥. But P(Py) = Py, since P² = P, and thus Py ∈ Y.
Therefore, it follows that Py = 0, because Py ∈ Y and Py ∈ Y⊥. Now let
z = x + y ∈ X, where x ∈ Y and y ∈ Y⊥. Then Pz = Px + Py = x + 0
= x. Hence, P is an orthogonal projection onto Y. ■
The next result is a direct consequence of Theorem 7.5.14.

7.5.15. Corollary. Let Y be a closed linear subspace of X, and let P be
the orthogonal projection onto Y. Then P(Y⊥) = {0}.

7.5.16. Exercise. Prove Corollary 7.5.15.

The next result yields the representation of an orthogonal projection onto
a finite-dimensional subspace of X.

7.5.17. Theorem. Let {x₁, …, xₙ} be a finite orthonormal set in X, and
let Y be the linear subspace of X generated by {x₁, …, xₙ}. Then the
orthogonal projection of X onto Y is given by

Px = Σᵢ₌₁ⁿ (x, xᵢ)xᵢ for all x ∈ X.
Proof. We first note that Y is a closed linear subspace of X by Theorem
6.6.6. We now show that P is a projection by proving that P² = P. For
any j = 1, …, n we have

Pxⱼ = Σᵢ₌₁ⁿ (xⱼ, xᵢ)xᵢ = xⱼ. (7.5.18)

Hence, for any x ∈ X we have

P²x = P(Σᵢ₌₁ⁿ (x, xᵢ)xᵢ) = Σᵢ₌₁ⁿ (x, xᵢ)Pxᵢ = Σᵢ₌₁ⁿ (x, xᵢ)xᵢ = Px.

Next, we show that R(P) = Y. It is clear that R(P) ⊂ Y. To show that
Y ⊂ R(P), let y ∈ Y. Then

y = α₁x₁ + ⋯ + αₙxₙ

for some {α₁, …, αₙ}. It follows from Eq. (7.5.18) that Py = y and so
y ∈ R(P).
Finally, to show that P is an orthogonal projection, we must show that
R(P) ⊥ N(P). To do so, let x ∈ N(P) and let y ∈ R(P). Then

(x, y) = (x, Py) = (x, Σᵢ₌₁ⁿ (y, xᵢ)xᵢ) = Σᵢ₌₁ⁿ (xᵢ, y)(x, xᵢ)
= (Σᵢ₌₁ⁿ (x, xᵢ)xᵢ, y) = (Px, y) = (0, y) = 0.

This completes the proof. ■
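
Remark (a numerical aside, not part of the original text). For X = Cⁿ the formula of Theorem 7.5.17 gives the matrix P = Σᵢ xᵢxᵢ*, which can be verified to be idempotent and hermitian, as Theorems 7.5.12 and 7.5.14 require.

    import numpy as np

    rng = np.random.default_rng(2)
    # Orthonormalize three random vectors in C^5 via a QR factorization.
    Z = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))
    Q, _ = np.linalg.qr(Z)              # columns of Q are orthonormal

    # Px = sum_i (x, x_i) x_i corresponds to the matrix P = sum_i x_i x_i^*.
    P = sum(np.outer(Q[:, i], Q[:, i].conj()) for i in range(3))

    assert np.allclose(P @ P, P)        # idempotent
    assert np.allclose(P.conj().T, P)   # hermitian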

Referring to Definition 3.7.12 we recall that if Y and Z are linear subspaces
of (a linear space) X such that X = Y ⊕ Z, and if T ∈ L(X, X) is such
that both Y and Z are invariant under T, then T is said to be reduced by Y
and Z. When X is a Hilbert space, we make the following definition.

7.5.19. Definition. Let Y be a closed linear subspace of X, and let T ∈
L(X, X). Then Y is said to reduce T if Y and Y⊥ are invariant under T.

Note that in view of Theorem 6.12.16, Definitions 3.7.12 and 7.5.19 are
consistent.
The proof of the next theorem is straightforward.

7.5.20. Theorem. Let Y be a closed linear subspace of X, and let T ∈
B(X, X). Then
(i) Y is invariant under T if and only if Y⊥ is invariant under T*; and
(ii) Y reduces T if and only if Y is invariant under T and T*.

7.5.21. Exercise. Prove Theorem 7.5.20.

7.5.22. Theorem. Let Y be a closed linear subspace of X, let P be the
orthogonal projection onto Y, let T ∈ B(X, X), and let I denote the identity
operator on X. Then
(i) Y is invariant under T if and only if TP = PTP;
(ii) Y reduces T if and only if TP = PT; and
(iii) (I − P) is the orthogonal projection onto Y⊥.
Proof. To prove (i), assume that TP = PTP. Then for any x ∈ Y we have
Tx = T(Px) = P(TPx) ∈ Y, since P applied to any vector of X is in Y.
Conversely, if Y is invariant under T, then for any vector x ∈ X we have
T(Px) ∈ Y, because Px ∈ Y. Thus, P(TPx) = TPx for every x ∈ X.
To prove (ii), assume that PT = TP. Then PTP = P²T = PT = TP.
Therefore, PTP = TP, and it follows from (i) that Y is invariant under T.
To prove that Y reduces T we must show that Y is invariant under T*.
Since P is hermitian we have T*P = (PT)* = (TP)* = P*T* = PT*; i.e.,
T*P = PT*. But above we showed that PTP = TP. Applying this to T* we
obtain T*P = PT*P. In view of (i), Y is now invariant under T*. Therefore,
the closed linear subspace Y reduces the linear operator T.
Conversely, assume that Y reduces T. By part (i), TP = PTP and T*P
= PT*P. Thus, PT = (T*P)* = (PT*P)* = PTP = TP; i.e., TP = PT.
To prove (iii) we first show that (I − P) is hermitian. We note that
(I − P)* = I* − P* = I − P. Next, we show that (I − P) is idempotent.
We observe that (I − P)² = (I − 2P + P²) = (I − 2P + P) = (I − P).
Finally, we note that (I − P)x = x if and only if Px = 0, which implies
that x ∈ Y⊥. Thus,

Y⊥ = {x ∈ X: (I − P)x = x}.

It follows from Theorem 7.5.14 that (I − P) is a projection onto Y⊥. ■

The next result follows immediately from part (iii) of the preceding
theorem.

7.5.23. Theorem. Let Y be a closed linear subspace of X, and let P be the
orthogonal projection on Y. If ‖Px‖ = ‖x‖, then Px = x, and consequently
x ∈ Y.

7.5.24. Exercise. Prove Theorem 7.5.23.

We leave the proof of the following result as an exercise.

7.5.25. Theorem. Let Y and Z be closed linear subspaces of X, and let P
and Q be the orthogonal projections on Y and Z, respectively. Let 0 denote
the zero transformation in B(X, X). The following are equivalent:
(i) Y ⊥ Z;
(ii) PQ = 0;
(iii) QP = 0;
(iv) P(Z) = {0}; and
(v) Q(Y) = {0}.
7.5.26. Exercise. Prove Theorem 7.5.25.

For the product of two orthogonal projections we have the following
result.

7.5.27. Theorem. Let Y₁ and Y₂ be closed linear subspaces of X, and let
P₁ and P₂ be the orthogonal projections onto Y₁ and Y₂, respectively. The
product transformation P₁P₂ is an orthogonal projection if and only if P₁
commutes with P₂. In this case the range of P₁P₂ is Y₁ ∩ Y₂.
Proof. Assume that P₁P₂ = P₂P₁. Then (P₁P₂)* = P₂*P₁* = P₂P₁ = P₁P₂;
i.e., if P₁P₂ = P₂P₁, then (P₁P₂)* = (P₁P₂). Also, (P₁P₂)² = P₁P₂P₁P₂
= P₁P₁P₂P₂ = P₁P₂; i.e., if P₁P₂ = P₂P₁, then P₁P₂ is idempotent. There-
fore, P₁P₂ is an orthogonal projection.
Conversely, assume that P₁P₂ is an orthogonal projection. Then (P₁P₂)*
= P₂*P₁* = P₂P₁, and also (P₁P₂)* = P₁P₂. Hence, P₁P₂ = P₂P₁.
Finally, we must show that the range of P₁P₂ is equal to Y₁ ∩ Y₂. Assume
that x ∈ R(P₁P₂). Then P₁P₂x = x, because P₁P₂ is an orthogonal projection.
Also, P₁P₂x = P₁(P₂x) ∈ Y₁, because any vector operated on by P₁ is in Y₁.
Similarly, P₂P₁x = P₂(P₁x) ∈ Y₂. Now, by hypothesis, P₁P₂ = P₂P₁, and
therefore P₁P₂x = P₂P₁x = x ∈ Y₁ ∩ Y₂. Thus, whenever x ∈ R(P₁P₂),
then x ∈ Y₁ ∩ Y₂. This implies that R(P₁P₂) ⊂ Y₁ ∩ Y₂. To show that
R(P₁P₂) ⊃ Y₁ ∩ Y₂, assume that x ∈ Y₁ ∩ Y₂. Then P₁P₂x = P₁(P₂x)
= P₁x = x ∈ R(P₁P₂). Thus, Y₁ ∩ Y₂ ⊂ R(P₁P₂). Therefore, R(P₁P₂)
= Y₁ ∩ Y₂. ■

7.5.28. Theorem. Let Y and Z be closed linear subspaces of X, and let P
and Q be the orthogonal projections onto Y and Z, respectively. The following
are equivalent:
(i) P ≤ Q;
(ii) ‖Px‖ ≤ ‖Qx‖ for all x ∈ X;
(iii) Y ⊂ Z;
(iv) QP = P; and
(v) PQ = P.

Proof. Assume that P ≤ Q. Since P and Q are orthogonal projections, they
are hermitian. For a hermitian operator, P ≥ 0 means (Px, x) ≥ 0 for all
x ∈ X. If P ≤ Q, then (Px, x) ≤ (Qx, x) for all x ∈ X, or (P²x, x) ≤ (Q²x,
x), or (Px, Px) ≤ (Qx, Qx), or ‖Px‖² ≤ ‖Qx‖², and hence ‖Px‖ ≤ ‖Qx‖
for all x ∈ X.
Next, assume that ‖Px‖ ≤ ‖Qx‖ for all x ∈ X. If x ∈ Y, then Px = x
and

(x, x) = (Px, Px) = ‖Px‖² ≤ ‖Qx‖² ≤ ‖Q‖²‖x‖² = ‖x‖² = (x, x),

and therefore ‖Qx‖ = ‖x‖. From Theorem 7.5.23 it now follows that
Qx = x, and hence x ∈ Z. Thus, whenever x ∈ Y then x ∈ Z and Z ⊃ Y.
Now assume that Z ⊃ Y and let y = Px, where x is any vector in X.
Then QPx = Qy = y = Px for all x ∈ X and QP = P.
Suppose now that QP = P. Then (QP)* = P*, or P*Q* = PQ = P* = P;
i.e., PQ = P.
Finally, assume that PQ = P. For any x ∈ X we have (Px, x) = ‖Px‖²
= ‖PQx‖² ≤ ‖P‖²‖Qx‖² = ‖Qx‖² = (Qx, Qx) = (Q²x, x) = (Qx, x);
i.e., (Px, x) ≤ (Qx, x), from which we have P ≤ Q. ■

We leave the proof of the next result as an exercise.

7.5.29. Theorem. Let Y₁ and Y₂ be closed linear subspaces of X, and let
P₁ and P₂ be the orthogonal projections onto Y₁ and Y₂, respectively. The
difference transformation P = P₁ − P₂ is an orthogonal projection if and
only if P₂ ≤ P₁. The range of P is Y₁ ∩ Y₂⊥.

7.5.30. Exercise. Prove Theorem 7.5.29.

We close this section by considering some specific cases.

7.5.31. Example. Let R denote the transformation from E² into E² given
in Example 4.10.48. That transformation is represented by the matrix

R = [ cos θ   −sin θ ]
    [ sin θ    cos θ ]

with respect to an orthonormal basis {e₁, e₂}. By direct computation we
obtain

R* = [  cos θ   sin θ ]
     [ −sin θ   cos θ ].

It readily follows that R*R = RR* = I. Therefore, R is a linear transforma-
tion which is isometric, unitary, and normal. ■
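
Remark (a numerical aside, not part of the original text). The claims of this example are easily confirmed for any angle θ:

    import numpy as np

    theta = 0.7
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])

    # R* = R^T since R is real; R is unitary (hence isometric and normal).
    assert np.allclose(R.T @ R, np.eye(2))
    assert np.allclose(R @ R.T, np.eye(2))

    x = np.array([1.0, 2.0])
    assert np.isclose(np.linalg.norm(R @ x), np.linalg.norm(x))  # isometry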

7.5.32. Exercise. Let X = L₂[0, ∞) and define the truncation operator P_T
by y = P_T x, where

y(t) = x(t) for all 0 ≤ t ≤ T,
y(t) = 0 for all t > T.

Show that P_T is an orthogonal projection with range

R(P_T) = {x ∈ X: x(t) = 0 for all t > T},

and null space

N(P_T) = {x ∈ X: x(t) = 0 for all t ≤ T}.

Additional examples of different types of operators are considered in
Section 7.10.

7.6. THE SPECTRUM OF AN OPERATOR

In Chapter 4 we introduced and discussed eigenvalues and eigenvectors


of linear transformations defined on finite-dimensional vector spaces. In
the present section we continue this discussion in the setting of infinite-
dimensional spaces.
Unless otherwise stated, X will denote a complex Banach space and I will
denote the identity operator on X. However, in our first definition, X may be
an arbitrary vector space over a field F.

7.6.1. Definition. Let T ∈ L(X, X). A scalar λ ∈ F is called an eigenvalue
of T if there exists an x ∈ X such that x ≠ 0 and such that Tx = λx. Any
vector x ≠ 0 satisfying the equation Tx = λx is called an eigenvector of T
corresponding to the eigenvalue λ.

7.6.2. Definition. Let X be a complex Banach space and let T: X → X.
The set of all λ ∈ F = C such that
(i) R(T − λI) is dense in X;
(ii) (T − λI)⁻¹ exists; and
(iii) (T − λI)⁻¹ is continuous (i.e., bounded)
is called the resolvent set of T and is denoted by ρ(T). The complement of
ρ(T) is called the spectrum of T and is denoted by σ(T).

The preceding definitions require some comments. First, note that if λ
is an eigenvalue of T, there is an x ≠ 0 such that (T − λI)x = 0. From
Theorem 3.4.32 this is true if and only if (T − λI) does not have an inverse.
Hence, if λ is an eigenvalue of T, then λ ∈ σ(T). Note, however, that there
are other ways that a complex number λ may fail to be in ρ(T). These possi-
bilities are enumerated in the following definition.

7.6.3. Definition. The set of all eigenvalues of T is called the point spectrum
of T. The set of all λ such that (T − λI)⁻¹ exists but R(T − λI) is not dense
in X is called the residual spectrum of T. The set of all λ such that (T − λI)⁻¹
exists and such that R(T − λI) is dense in X but (T − λI)⁻¹ is not continuous
is called the continuous spectrum. We denote these sets by Pσ(T), Rσ(T),
and Cσ(T), respectively.

Clearly, σ(T) = Pσ(T) ∪ Cσ(T) ∪ Rσ(T). Furthermore, when X is finite
dimensional, then σ(T) = Pσ(T). We summarize the preceding definition in
the following table.

                            (T − λI)⁻¹ exists    (T − λI)⁻¹ exists but   (T − λI)⁻¹ does
                            and is continuous    is not continuous       not exist

R(T − λI) dense in X        λ ∈ ρ(T)             λ ∈ Cσ(T)               λ ∈ Pσ(T)

R(T − λI) not dense in X    λ ∈ Rσ(T)            λ ∈ Rσ(T)               λ ∈ Pσ(T)

7.6.4. Table A. Characterization of the resolvent set and the spectrum
of an operator

7.6.5. Example. Let X = l₂ be the Hilbert space of Example 6.11.9, let
x = (ξ₁, ξ₂, …) ∈ X, and define T ∈ B(X, X) by

Tx = (ξ₁, ½ξ₂, ⅓ξ₃, …).

For each λ ∈ C we want to determine (a) whether (T − λI)⁻¹ exists; (b) if so,
whether (T − λI)⁻¹ is continuous; and (c) whether R(T − λI) is dense in X.
First we consider the point spectrum of T. If Tx = λx, then (1/k − λ)ξₖ = 0,
k = 1, 2, …. This holds for non-trivial x if and only if λ = 1/k for some k.
Hence,

Pσ(T) = {1/k: k = 1, 2, …}.

Next, assume that λ ∉ Pσ(T), so that (T − λI)⁻¹ exists, and let us inves-
tigate the continuity of (T − λI)⁻¹. We see that if y = (η₁, η₂, …) ∈ R(T
− λI), then (T − λI)⁻¹y = x is given by

ξₖ = ηₖ/(1/k − λ) = kηₖ/(1 − λk), k = 1, 2, ….

Now if λ = 0, then ‖(T − λI)⁻¹y‖² = Σₖ k²|ηₖ|², and (T − λI)⁻¹ is not
bounded and hence not continuous. On the other hand, if λ ≠ 0, then
(T − λI)⁻¹ is continuous, since |ξₖ| ≤ γ|ηₖ| for all k, where
γ = supₖ |k/(1 − λk)| < ∞. It follows that Cσ(T) = {0} and

ρ(T) = [Pσ(T) ∪ Cσ(T)]ᶜ. ■
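
Remark (a numerical aside, not part of the original text). The λ = 0 behavior can be seen on finite truncations of T: for the n × n section diag(1, 1/2, …, 1/n), the norm of the inverse grows like n, whereas for a fixed λ ∉ Pσ(T) with λ ≠ 0 the inverse norms remain bounded as n → ∞.

    import numpy as np

    for n in (10, 100, 1000):
        d = 1.0 / np.arange(1, n + 1)          # diagonal entries 1/k
        # lam = 0: norm of the inverse section is max_k k, unbounded in n.
        norm_at_zero = np.max(1.0 / d)
        # lam = -1 (never an eigenvalue): inverse norms stay below 1.
        norm_at_minus_one = np.max(1.0 / np.abs(d + 1.0))
        print(n, norm_at_zero, norm_at_minus_one)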

7.6.6. Exercise. Let X = l₂, the Hilbert space of Example 6.11.9, let
x = (ξ₁, ξ₂, ξ₃, …), and define the right shift operator T_r: X → X and the
left shift operator T_l: X → X by

y = T_r x = (0, ξ₁, ξ₂, …)

and

y = T_l x = (ξ₂, ξ₃, ξ₄, …),

respectively. Show that

ρ(T_r) = ρ(T_l) = {λ ∈ C: |λ| > 1},
Cσ(T_r) = Cσ(T_l) = {λ ∈ C: |λ| = 1},
Rσ(T_r) = Pσ(T_l) = {λ ∈ C: |λ| < 1},
Pσ(T_r) = Rσ(T_l) = ∅.
We now examine some of the properties of the resolvent set and the
spectrum.

7.6.7. Theorem. Let T ∈ B(X, X). If |λ| > ‖T‖, then λ ∈ ρ(T) or, equiva-
lently, if λ ∈ σ(T), then |λ| ≤ ‖T‖.

7.6.8. Exercise. Prove Theorem 7.6.7 (use Theorem 7.2.2).

7.6.9. Theorem. Let T ∈ B(X, X). Then ρ(T) is open and σ(T) is closed.
Proof. Since σ(T) is the complement of ρ(T), it is closed if and only if ρ(T)
is open. Let λ₀ ∈ ρ(T). Then (T − λ₀I) has a continuous inverse. For arbi-
trary λ we now have

‖I − (T − λ₀I)⁻¹(T − λI)‖
= ‖(T − λ₀I)⁻¹(T − λ₀I) − (T − λ₀I)⁻¹(T − λI)‖
= ‖(T − λ₀I)⁻¹[(T − λ₀I) − (T − λI)]‖
= ‖(λ − λ₀)(T − λ₀I)⁻¹‖
= |λ − λ₀| ‖(T − λ₀I)⁻¹‖.

Now for |λ − λ₀| sufficiently small, we have

‖I − (T − λ₀I)⁻¹(T − λI)‖ = |λ − λ₀| ‖(T − λ₀I)⁻¹‖ < 1.

Now in Theorem 7.2.2 we showed that if T ∈ B(X, X), then T has a contin-
uous inverse if ‖I − T‖ < 1. In our case it now follows that (T − λ₀I)⁻¹(T
− λI) has a continuous inverse, and therefore (T − λI) has a continuous
inverse whenever |λ − λ₀| is sufficiently small. This implies that λ ∈ ρ(T)
and ρ(T) is open. Hence, σ(T) is closed. ■

For normal, hermitian, and isometric operators we have the following
result.

7.6.10. Theorem. Let X be a Hilbert space, let T ∈ B(X, X), let λ be an
eigenvalue of T, and let Tx = λx, x ≠ 0. Then
(i) if T is hermitian, then λ is real;
(ii) if T is isometric, then |λ| = 1;
(iii) if T is normal, then λ̄ is an eigenvalue of T* and T*x = λ̄x; and
(iv) if T is normal, if μ is an eigenvalue of T such that μ ≠ λ, and if
Ty = μy, then x ⊥ y.
Proof. Without loss of generality, assume that x is a unit vector.
To prove (i) note that λ = λ‖x‖² = λ(x, x) = (λx, x) = (Tx, x), which
is real by Theorem 7.4.5. Hence λ = λ̄; i.e., λ is real.
To verify (ii), note that if T is isometric, then ‖Tx‖ = ‖x‖ = 1, by
Corollary 7.5.9. Since Tx = λx it follows that ‖λx‖ = 1, or |λ| ‖x‖ = 1,
and hence |λ| = 1.
To prove (iii), assume that T is normal; i.e., T*T = TT*. Then

(T − λI)(T − λI)* = (T − λI)(T* − λ̄I)
= (T − λI)T* − (T − λI)λ̄I
= TT* − λT* − λ̄T + λλ̄I
= T*T − λ̄T − λT* + λλ̄I
= (T* − λ̄I)(T − λI) = (T − λI)*(T − λI);

i.e.,

(T − λI)(T − λI)* = (T − λI)*(T − λI),

and (T − λI) is normal. Also, we can readily verify that ‖(T − λI)x‖ =
‖(T − λI)*x‖. Since (T − λI)x = 0, it follows that (T − λI)*x = 0, or
(T* − λ̄I)x = 0, or T*x = λ̄x. Therefore, λ̄ is an eigenvalue of T* with eigen-
vector x.
To prove the last part assume that λ ≠ μ and that T is normal. Then

(λ − μ)(x, y) = λ(x, y) − μ(x, y) = (λx, y) − (x, μ̄y)
= (Tx, y) − (x, T*y) = (Tx, y) − (Tx, y) = 0;

i.e., (λ − μ)(x, y) = 0. Since λ ≠ μ we have x ⊥ y. ■

The next two results indicate what happens to the spectrum of an operator
T when it is subjected to various elementary transformations.

7.6.11. Theorem. Let T ∈ B(X, X), and let p(T) denote a polynomial in
T. Then

σ(p(T)) = p(σ(T)) = {p(λ): λ ∈ σ(T)}.
7.6.12. Exercise. Prove Theorem 7.6.11.

7.6.13. Theorem. Let T ∈ B(X, X) be a bijective mapping. Then

σ(T⁻¹) = [σ(T)]⁻¹ ≜ {1/λ: λ ∈ σ(T)}.

Proof. Since T⁻¹ exists, 0 ∉ σ(T) and so the definition of [σ(T)]⁻¹ makes
sense. Now for any λ ≠ 0, consider the identity

(T⁻¹ − λI) = λ((1/λ)I − T)T⁻¹.

It follows that if 1/λ ∉ σ(T), then (T⁻¹ − λI) has a continuous inverse; i.e.,
1/λ ∉ σ(T) implies that λ ∉ σ(T⁻¹). In other words, σ(T⁻¹) ⊂ [σ(T)]⁻¹. To
prove that [σ(T)]⁻¹ ⊂ σ(T⁻¹) we proceed similarly, interchanging the roles of
T and T⁻¹. ■

We now introduce the concept of the approximate point spectrum of an


operator.

7.6.14. Definition. Let T ∈ B(X, X). Then λ ∈ C is said to belong to the
approximate point spectrum of T if for every ε > 0 there exists a non-zero
vector x ∈ X such that ‖Tx − λx‖ < ε‖x‖. We denote the approximate
point spectrum by π(T). If λ ∈ π(T), then λ is called an approximate eigenvalue
of T.

Clearly, Pσ(T) ⊂ π(T). Other properties of π(T) are as follows.

7.6.15. Theorem. Let X be a Hilbert space, and let T ∈ B(X, X). Then
π(T) ⊂ σ(T).
Proof. Assume that λ ∉ σ(T). Then (T − λI) has a continuous inverse,
and for any x ∈ X we have

‖x‖ = ‖(T − λI)⁻¹(T − λI)x‖ ≤ ‖(T − λI)⁻¹‖ ‖(T − λI)x‖.

Now let ε = 1/‖(T − λI)⁻¹‖. Then we have, from above, ‖Tx − λx‖ ≥
ε‖x‖ for every x ∈ X, and λ ∉ π(T). Therefore, σ(T) ⊃ π(T). ■
We leave the proof of the next result as an exercise.
7.6.16. Theorem. Let X be a Hilbert space, and let T ∈ B(X, X) be a
normal operator. Then π(T) = σ(T).
7.6.17. Exercise. Prove Theorem 7.6.16.
We can use the approximate point spectrum to establish some of the
properties of the spectrum of hermitian operators.
7.6.18. Theorem. Let X be a Hilbert space, and let T ∈ B(X, X) be
hermitian. Then
(i) σ(T) is a subset of the real line;
(ii) ‖T‖ = sup {|λ|: λ ∈ σ(T)}; and
(iii) σ(T) is not empty and either +‖T‖ or −‖T‖ belongs to σ(T).
Proof. To prove (i), note that if T is hermitian it is normal and σ(T) = π(T).
Let λ ∈ π(T), and assume that λ is complex, λ ≠ λ̄. Then for any x ≠ 0 we have

0 < |λ − λ̄| ‖x‖² = |λ − λ̄| (x, x) = |((T − λI)x, x) − ((T − λ̄I)x, x)|
≤ |((T − λI)x, x)| + |((T − λ̄I)x, x)| ≤ ‖(T − λI)x‖ ‖x‖
+ ‖(T − λ̄I)x‖ ‖x‖
= 2‖(T − λI)x‖ ‖x‖

(note that ‖(T − λ̄I)x‖ = ‖(T − λI)*x‖ = ‖(T − λI)x‖, since T − λI is
normal); i.e.,

0 < |λ − λ̄| ‖x‖ ≤ 2‖(T − λI)x‖

for all x ∈ X. But this implies that λ ∉ π(T), contrary to the original assump-
tion. Hence, it must follow that λ = λ̄, which implies that λ is real.
To prove (ii), first note that ‖T‖ ≥ sup {|λ|: λ ∈ σ(T)} for any T ∈
B(X, X) (see Theorem 7.6.7). To show that equality holds, if T is hermitian,
we first must show that ‖T‖² ∈ π(T²) = σ(T²). For all real λ and all x ∈ X
we can write

‖T²x − λ²x‖² = (T²x − λ²x, T²x − λ²x)
= (T²x, T²x) − (T²x, λ²x) − (λ²x, T²x) + λ⁴(x, x).

Since (T²x, x) = (Tx, T*x) = (Tx, Tx), we now have

(T²x − λ²x, T²x − λ²x) = (T²x, T²x) − 2λ²(Tx, Tx) + λ⁴(x, x),

or

‖T²x − λ²x‖² = ‖T²x‖² − 2λ²‖Tx‖² + λ⁴‖x‖². (7.6.19)

Now let {xₙ} be a sequence of unit vectors such that ‖Txₙ‖ → ‖T‖. If
λ = ‖T‖, then we have, from Eq. (7.6.19),

‖T²xₙ − λ²xₙ‖² = ‖T²xₙ‖² − 2λ²‖Txₙ‖² + λ⁴
≤ (‖T‖ ‖Txₙ‖)² − 2λ²‖Txₙ‖² + λ⁴ = λ²‖Txₙ‖² − 2λ²‖Txₙ‖² + λ⁴
= λ⁴ − λ²‖Txₙ‖² → 0 as n → ∞;

i.e., ‖T²xₙ − λ²xₙ‖ → 0 as n → ∞, and thus λ² ∈ π(T²) = σ(T²).
Using Theorems 7.6.11 and 7.6.15 and the fact that λ² = ‖T‖² ∈ π(T²),
it now follows that

‖T‖ = sup {|λ|: λ ∈ σ(T)}.

The proof of (iii) is left as an exercise. ■

7.6.20. Exercise. Prove part (iii) of Theorem 7.6.18.

7.6.21. Theorem. Let X be a Hilbert space, and let T ∈ B(X, X). Then
π(T) is closed.

7.6.22. Exercise. Prove Theorem 7.6.21.

In the following we let T ∈ B(X, X), λ ∈ C, and we let N_λ(T) be the null
space of T − λI; i.e.,

N_λ(T) = {x ∈ X: (T − λI)x = 0} = N(T − λI). (7.6.23)

It follows from Theorem 7.1.26 that N_λ(T) is a closed linear subspace of X.
For the next result, recall Definition 3.7.9 for the meaning of an invariant
subspace.

7.6.24. Theorem. Let X be a Hilbert space, let λ ∈ C, and let S, T ∈
B(X, X). If ST = TS, then N_λ(T) is invariant under S.
Proof. Let x ∈ N_λ(T). We want to show that Sx ∈ N_λ(T); i.e., TSx = λSx.
Since x ∈ N_λ(T), we have Tx = λx. Thus, STx = λSx. Since ST = TS, we
have TSx = λSx. ■

7.6.25. Corollary. N_λ(T) is invariant under T.

Proof. Since TT = TT, the result follows from Theorem 7.6.24. ■

For the next result, recall Definition 7.5.19.

7.6.26. Theorem. Let X be a Hilbert space, let λ ∈ C, and let T ∈ B(X, X).
If T is normal, then
(i) N_λ(T) = N_λ̄(T*);
(ii) N_λ(T) ⊥ N_μ(T) if λ ≠ μ; and
(iii) N_λ(T) reduces T.
Proof. The proofs of parts (i) and (ii) are left as an exercise.
To prove (iii), we see that N_λ(T) is invariant under T from Corollary
7.6.25. To prove that N_λ(T)⊥ is invariant under T, let y ∈ N_λ(T)⊥. We want
to show that (x, Ty) = 0 for all x ∈ N_λ(T). If x ∈ N_λ(T), we have Tx = λx,
and so, by part (i), T*x = λ̄x. Now (x, Ty) = (T*x, y) = (λ̄x, y) =
λ̄(x, y) = 0. This implies that Ty ∈ N_λ(T)⊥, and so N_λ(T)⊥ is invariant under
T. This completes the proof of part (iii). ■

7.6.27. Exercise. Prove parts (i) and (ii) of Theorem 7.6.26.

Before considering the last result of this section, we make the following
definition.

7.6.28. Definition. A family of closed linear subspaces in a Hilbert space
X is said to be total if the only vector y ∈ X orthogonal to each member of
the family is y = 0.
7.6.29. Theorem. Let X be a Hilbert space and let S, T ∈ B(X, X). If the
family of closed linear subspaces of X given by {N_λ(T): λ ∈ C} is total,
then TS = ST if and only if N_λ(T) is invariant under S for all λ ∈ C.
Proof. The necessity follows from Theorem 7.6.24. To prove sufficiency,
assume that N_λ(T) is invariant under S for all λ ∈ C. Let M denote the null
space of TS − ST; i.e., M = N(TS − ST). If x ∈ N_λ(T), then Sx ∈ N_λ(T)
by hypothesis. Hence, TSx = T(Sx) = λ(Sx) = S(λx) = S(Tx) = STx for
all x ∈ N_λ(T). Thus, (TS − ST)x = 0 for any x ∈ N_λ(T), and so N_λ(T) ⊂ M.
If there is a vector y ⊥ M, then it follows that y ⊥ N_λ(T) for all λ ∈ C. By
hypothesis, the family {N_λ(T): λ ∈ C} is total, and thus y = 0. It follows that
M⊥ = {0}, and M⊥⊥ = {0}⊥ = X, and M⊥⊥ = M, because M is a closed linear
subspace of X. Therefore, M = X; i.e., (TS − ST)x = 0 for all x ∈ X.
Hence, TS = ST. ■

7.7. COMPLETELY CONTINUOUS OPERATORS

Throughout this section X is a normed linear space over the field of complex
numbers C.
Recall that a set Y ⊂ X is bounded if there is a constant k such that for
all x ∈ Y we have ‖x‖ ≤ k. Also, recall that a set Y is relatively compact
if each sequence {xₙ} of elements chosen from Y contains a convergent
subsequence (see Definition 5.6.30 and Theorem 5.6.31). When Y contains
only a finite number of elements, then any sequence constructed from Y must
include some elements infinitely many times, and thus Y contains a convergent
subsequence. From this it follows that any set containing a finite number
of elements is relatively compact. Every relatively compact set is contained
in a compact set and hence is bounded. For the finite-dimensional case it is
also true that every bounded set is relatively compact (e.g., in Rⁿ the Bolzano-
Weierstrass theorem guarantees this). However, in the infinite-dimensional
case it does not follow that every bounded set is also relatively compact.
In analysis and in applications, linear operators which transform bounded
sets into relatively compact sets are of great importance. Such operators are
called completely continuous operators or compact operators. We give the
following formal definition.

7.7.1. Definition. Let X and Y be normed linear spaces, and let T be a
linear transformation with domain X and range in Y. Then T is said to be
completely continuous or compact if for each bounded sequence {xₙ} in X, the
sequence {Txₙ} contains a subsequence converging to some element y ∈ Y.

We have the following equivalent characterization of a completely con-


tinuous operator.

7.7.2. Theorem. Let X and Y be normed linear spaces, and let T ∈
B(X, Y). Then T is completely continuous if and only if the sequence {Txₙ}
contains a subsequence convergent to some y ∈ Y for all sequences {xₙ} such
that ‖xₙ‖ ≤ 1 for all n.

7.7.3. Exercise. Prove Theorem 7.7.2.

Clearly, if an operator T is completely continuous, then it is continuous.


On the other hand, the fact that T may be continuous does not ensure that
it is completely continuous.
We now cite some examples.

7.7.4. Example. Let T: X → X be the zero operator; i.e., Tx = 0 for all
x ∈ X. Then T is clearly completely continuous. ■

7.7.5. Example. Let X = C[a, b], and let ‖·‖_∞ be the norm on C[a, b]
as defined in Example 6.1.9. Let k: [a, b] × [a, b] → R be a real-valued
function continuous on the square a ≤ s ≤ b, a ≤ t ≤ b. Defining T:
X → X by

[Tx](s) = ∫ₐᵇ k(s, t)x(t) dt

for all x ∈ X, we saw in Example 7.1.20 that T is a bounded linear operator.
We now show that T is completely continuous.
Let {xₙ} be a bounded sequence in X; i.e., there is a K > 0 such that
‖xₙ‖_∞ ≤ K for all n. It readily follows that if yₙ = Txₙ, then ‖yₙ‖ ≤ γ₀‖xₙ‖,
where

γ₀ = sup_{a ≤ s ≤ b} ∫ₐᵇ |k(s, t)| dt

(see Example 7.1.20). We now show that {yₙ} is an equicontinuous set of
functions on [a, b] (see Definition 5.8.11). Let ε > 0. Then, because of the
uniform continuity of k on [a, b] × [a, b], there is a δ > 0 such that
|k(s₁, t) − k(s₂, t)| < ε/(K(b − a)) if |s₁ − s₂| < δ, for every t ∈ [a, b]. Thus,

|yₙ(s₁) − yₙ(s₂)| ≤ ∫ₐᵇ |k(s₁, t) − k(s₂, t)| |xₙ(t)| dt < ε

for all n and all s₁, s₂ such that |s₁ − s₂| < δ. This implies that the set {yₙ} is
equicontinuous, and so by the Arzelà-Ascoli theorem (Theorem 5.8.12),
the set {yₙ} is relatively compact in C[a, b]; i.e., it has a convergent subse-
quence. This implies that T is completely continuous.
It can be shown that if X = L₂[a, b] and if T is the Fredholm operator
defined in Example 7.3.11, then T is also a completely continuous operator. ■
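
Remark (a numerical aside, not part of the original text). Discretizing such a kernel operator produces matrices whose singular values decay rapidly, so T is well approximated in norm by finite-dimensional operators; this is consistent with the results on finite-dimensional operators that follow (Theorems 7.7.16 and 7.7.17). A sketch, with a hypothetical smooth kernel:

    import numpy as np

    n = 200
    h = 1.0 / n
    grid = (np.arange(n) + 0.5) * h
    # Hypothetical smooth kernel on [0, 1] x [0, 1], midpoint quadrature.
    K = h * np.exp(-(grid[:, None] - grid[None, :]) ** 2)

    s = np.linalg.svd(K, compute_uv=False)
    print(s[:8] / s[0])                # rapid decay: nearly low rank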

The next result provides us with an example of a continuous linear trans-


formation which is not completely continuous.

7.7.6. Theorem. Let I ∈ B(X, X) denote the identity operator on X.
Then I is completely continuous if and only if X is finite dimensional.
Proof. The proof is an immediate consequence of Theorem 6.6.10. ■

We now consider some of the general properties of completely continuous


operators.

7.7.7. Theorem. Let X and Y be normed linear spaces, let S, T ∈ B(X, Y)
be completely continuous operators, and let α, β ∈ C. Then the operator
(αS + βT) is completely continuous.
Proof. Given a sequence {xₙ} with ‖xₙ‖ ≤ 1, there is a subsequence {x_{n_k}}
such that the sequence {Sx_{n_k}} has a limit u; i.e., Sx_{n_k} → u. From the sequence
{x_{n_k}} we pick another subsequence {x_{n_{k_j}}} such that Tx_{n_{k_j}} → v. Then

(αS + βT)x_{n_{k_j}} = αSx_{n_{k_j}} + βTx_{n_{k_j}} → αu + βv

as n_{k_j} → ∞. ■

We leave the proofs of the next results as an exercise.

7.7.8. Theorem. Let T ∈ B(X, X) be completely continuous. Let Y be a
closed linear subspace of X which is invariant under T. Let T₁ be the restric-
tion of T to Y. Then T₁ ∈ B(Y, Y) and T₁ is completely continuous.

7.7.9. Exercise. Prove Theorem 7.7.8.

7.7.10. Theorem. Let T ∈ B(X, X) be a completely continuous operator,
and let S ∈ B(X, X) be any bounded linear operator. Then ST and TS are
completely continuous.

7.7.11. Exercise. Prove Theorem 7.7.10.

7.7.12. Corollary. Let X and Y be normed linear spaces, and let T ∈
B(X, Y) and S ∈ B(Y, X). If T is completely continuous, then ST is com-
pletely continuous.

7.7.13. Exercise. Prove Corollary 7.7.12.

7.7.14. Example. A consequence of the above corollary is that if T ∈
B(X, X) is completely continuous and X is infinite dimensional, then T cannot
be a bijective mapping of X onto X. For, suppose T were bijective. Then we
would have T⁻¹T = I. By the Banach inverse theorem (see Theorem 7.2.6),
T⁻¹ would then be continuous, and by the preceding theorem the identity
mapping would be completely continuous. However, according to Theorem
7.7.6, this is possible only when X is finite dimensional.
Pursuing this example further, let X = C[a, b] with ‖·‖_∞ as defined
in Example 6.1.9. Let T: X → X be defined by

(Tx)(t) = ∫ₐᵗ x(τ) dτ

for a ≤ t ≤ b and x ∈ X. It is easily shown that T is a completely continuous
operator on X. It is, however, not bijective, since R(T) is the family of all
functions which are continuously differentiable in X, and thus R(T) is clearly
a proper subset of X. The operator T is injective, since Tx = 0 implies x = 0.
The inverse T⁻¹ is given by (T⁻¹y)(t) = dy(t)/dt for y ∈ R(T) and a ≤ t ≤ b.
We saw in Example 5.7.4 that T⁻¹ is not continuous. ■

In our next result we require the following definition.



7.7.15. Definition. Let X and Y be normed linear spaces, and let T ∈
B(X, Y). The operator T is said to be finite dimensional if T(X) is finite dimen-
sional; i.e., the range of T is finite dimensional.

7.7.16. Theorem. Let X and Y be normed linear spaces, and let T ∈
B(X, Y). If T is a finite-dimensional operator, then it is a completely con-
tinuous operator.
Proof. Let {xₙ} be a sequence in X such that ‖xₙ‖ ≤ 1 for all n. Then {Txₙ}
is a bounded sequence in T(X). It follows from Theorem 6.6.10 that the set
{Txₙ} is relatively compact, and as such this set has a convergent subsequence
in T(X). It follows from Theorem 7.7.2 that T is completely continuous. ■

The proof of the next result utilizes what is called the diagonalization
process.

7.7.17. Theorem. Let X and Y be Banach spaces, and let {Tₙ} be a sequence
of completely continuous operators mapping X into Y. If the sequence {Tₙ}
converges in norm to an operator T, then T is completely continuous.
Proof. Let {xₙ} be an arbitrary sequence in X with ‖xₙ‖ ≤ 1. We must show
that the sequence {Txₙ} contains a convergent subsequence.
By assumption, T₁ is a completely continuous operator, and thus we can
select a convergent subsequence from the sequence {T₁xₙ}. Let

x₁₁, x₁₂, x₁₃, …, x₁ₙ, …

denote the inverse images of the members of this convergent subsequence.
Next, let us apply T₂ to each member of the above subsequence. Since T₂
is completely continuous, we can again select a convergent subsequence
from the sequence {T₂x₁ₙ}. The inverse images of the terms of this sequence
are

x₂₁, x₂₂, x₂₃, …, x₂ₙ, ….

Continuing this process we can generate the array

x₁₁, x₁₂, x₁₃, …
x₂₁, x₂₂, x₂₃, …
x₃₁, x₃₂, x₃₃, …
. . . . . . . . .

Using this array, let us now form the diagonal sequence

x₁₁, x₂₂, x₃₃, ….

Now each of the operators T₁, T₂, T₃, …, Tₙ, … transforms this sequence
into a convergent sequence. To show that T is completely continuous we must
show that T also transforms this sequence into a convergent sequence. Now

‖Txₙₙ − Txₘₘ‖ = ‖Txₙₙ − Tₖxₙₙ + Tₖxₙₙ − Tₖxₘₘ + Tₖxₘₘ − Txₘₘ‖
≤ ‖Txₙₙ − Tₖxₙₙ‖ + ‖Tₖxₙₙ − Tₖxₘₘ‖ + ‖Tₖxₘₘ − Txₘₘ‖
≤ ‖T − Tₖ‖(‖xₙₙ‖ + ‖xₘₘ‖) + ‖Tₖxₙₙ − Tₖxₘₘ‖;

i.e.,

‖Txₙₙ − Txₘₘ‖ ≤ ‖T − Tₖ‖(‖xₙₙ‖ + ‖xₘₘ‖) + ‖Tₖxₙₙ − Tₖxₘₘ‖.

Let ε > 0. We first choose k so that ‖T − Tₖ‖ < ε/4; since the sequence
{Tₖxₙₙ} converges, we can then choose N such that ‖Tₖxₙₙ − Tₖxₘₘ‖ < ε/2
for all m, n > N. We now have

‖Txₙₙ − Txₘₘ‖ < ε

whenever m, n > N, and {Txₙₙ} is a Cauchy sequence. Since Y is a complete
space, it follows that this sequence converges in Y, and by Theorem 7.7.2
the desired result follows. ■

Theorem 7.7.7 implies that the family of completely continuous operators forms a linear subspace of B(X, Y). The preceding theorem states that if Y is complete, then this linear subspace is closed.
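Theorems 7.7.16 and 7.7.17 together give the standard route for exhibiting completely continuous operators: approximate in norm by finite-dimensional ones. A minimal numerical sketch (our own illustration; the diagonal operator is chosen only for convenience) follows.

```python
import numpy as np

# An n-dimensional section of the diagonal operator T e_k = e_k / k.
n = 500
d = 1.0 / np.arange(1, n + 1)
T = np.diag(d)

# T_N keeps only the first N diagonal entries; its range is N-dimensional,
# so T_N is completely continuous by Theorem 7.7.16.
for N in (5, 50, 250):
    TN = np.diag(np.where(np.arange(n) < N, d, 0.0))
    err = np.linalg.norm(T - TN, 2)       # operator norm ||T - T_N||
    print(N, err, 1.0 / (N + 1))          # equals 1/(N+1) exactly

# Since ||T - T_N|| -> 0, Theorem 7.7.17 shows the limiting operator
# (on the sequence space, as n -> infinity) is completely continuous.
```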

7.7.18. Theorem. Let X be a Hilbert space, and let T ∈ B(X, X). Then
(i) T is completely continuous if and only if T*T is completely continuous; and
(ii) T is completely continuous if and only if T* is completely continuous.

Proof. We prove (i) and leave the proof of (ii) as an exercise. Assume that T is completely continuous. It then follows from Theorem 7.7.10 that T*T is completely continuous.

Conversely, assume that T*T is completely continuous, and let {xₙ} be a sequence in X such that ‖xₙ‖ ≤ 1. It follows that there is a subsequence {x_{n_k}} such that T*Tx_{n_k} → x ∈ X as n_k → ∞. Now
‖Tx_{n_j} − Tx_{n_k}‖² = ‖T(x_{n_j} − x_{n_k})‖² = (T(x_{n_j} − x_{n_k}), T(x_{n_j} − x_{n_k}))
= (T*T(x_{n_j} − x_{n_k}), x_{n_j} − x_{n_k}) ≤ ‖T*T(x_{n_j} − x_{n_k})‖ · ‖x_{n_j} − x_{n_k}‖
≤ 2‖T*Tx_{n_j} − T*Tx_{n_k}‖ → 0
as n_j, n_k → ∞. Thus, {Tx_{n_k}} is a Cauchy sequence and so it is convergent. It follows from Theorem 7.7.2 that T is completely continuous. ■

7.7.19. Exercise. Prove part (ii) of Theorem 7.7.18.

In the remainder of this section we turn our attention to the properties of eigenvalues of completely continuous operators.

7.7.20. Theorem. Let X be a Hilbert space, let T ∈ B(X, X), and let λ ∈ C. If T is completely continuous and if λ ≠ 0, then
𝔑_λ(T) = {x : Tx = λx}
is finite dimensional.

Proof. The proof is by contradiction. Assume that 𝔑_λ(T) is not finite dimensional. Then there is an infinite orthonormal sequence x₁, x₂, ..., xₙ, ... in 𝔑_λ(T), and
‖Txₙ − Txₘ‖² = ‖λxₙ − λxₘ‖² = |λ|²‖xₙ − xₘ‖² = 2|λ|²;
i.e., ‖Txₙ − Txₘ‖ = √2·|λ| ≠ 0 for all m ≠ n. Therefore, no subsequence of {Txₙ} can be a Cauchy sequence, and hence no subsequence of {Txₙ} can converge. This completes the proof. ■

In the next result, π(T) denotes the approximate point spectrum of T.

7.7.21. Theorem. Let X be a Hilbert space, let T ∈ B(X, X), and let λ ∈ C. If T is completely continuous, if λ ≠ 0, and if λ ∈ π(T), then λ is an eigenvalue.

Proof. Since λ ∈ π(T), for each positive integer n there is an xₙ ∈ X such that ‖Txₙ − λxₙ‖ < (1/n)‖xₙ‖. We may assume that ‖xₙ‖ = 1. Since T is completely continuous, there is a subsequence of {xₙ}, say {x_{n_k}}, such that {Tx_{n_k}} is convergent. Let lim Tx_{n_k} = y ∈ X. It now follows that ‖y − λx_{n_k}‖ → 0 as n_k → ∞; i.e., λx_{n_k} → y. Now ‖y‖ ≠ 0, because ‖y‖ = lim ‖λx_{n_k}‖ = |λ| lim ‖x_{n_k}‖ = |λ| ≠ 0. By the continuity of T, we now have
Ty = T(lim λx_{n_k}) = lim T(λx_{n_k}) = λ lim Tx_{n_k} = λy.
Hence, Ty = λy with y ≠ 0. Thus, λ is an eigenvalue of T and y is a corresponding eigenvector. ■

The proof of the next result is an immediate consequence of Theorems 7.6.16 and 7.7.21.

7.7.22. Theorem. Let X be a Hilbert space, and let T ∈ B(X, X) be completely continuous and normal. If λ ∈ σ(T) and λ ≠ 0, then λ is an eigenvalue of T.

7.7.23. Exercise. Prove Theorem 7.7.22.

The above theorem states that, with the possible exception of λ = 0, the spectrum of a completely continuous normal operator consists entirely of eigenvalues; i.e., if λ ≠ 0, then either λ ∈ Pσ(T) or λ ∈ ρ(T).

7.7.24. Theorem. Let X be a Hilbert space, and let T ∈ B(X, X). If T is completely continuous and hermitian, then T has an eigenvalue λ with |λ| = ‖T‖.

Proof. The proof follows directly from part (iii) of Theorem 7.6.18 and Theorem 7.7.22. ■

7.7.25. Theorem. Let X be a Hilbert space, and let T ∈ B(X, X). If T is normal and completely continuous, then T has at least one eigenvalue.

Proof. If T = 0, then λ = 0 clearly satisfies the conclusion of the theorem. So let us assume that T ≠ 0. Also, if T = T*, the conclusion of the theorem follows from Theorem 7.7.24. So let us assume that T ≠ T*. Let U = (1/2)(T + T*) and V = (1/(2i))(T − T*). It follows from Theorem 7.4.15 that U and V are hermitian. Furthermore, by Theorem 7.5.4 we have UV = VU. From Theorems 7.7.7 and 7.7.18, U and V are completely continuous. By assumption, V ≠ 0. By the preceding theorem, V has a non-zero eigenvalue, which we shall call β. It follows from Theorem 7.1.26 that 𝔑_β(V) = 𝔑(V − βI) ≜ N is a closed linear subspace of X. Since UV = VU, Theorem 7.6.24 implies that N is invariant under U. Now let U₁ be the restriction of U to the linear subspace N. It follows that U₁ is completely continuous by Theorem 7.7.8. It is readily verified that U₁ is a hermitian operator on the inner product subspace N (see Eq. (3.6.21)). Hence, U₁ is completely continuous and hermitian. This implies that there is an α ∈ C and an x ∈ N such that x ≠ 0 and U₁x = αx. This means Ux = αx. Now since x ∈ N, we must have Vx = βx. It follows that λ = α + iβ is an eigenvalue of T with corresponding eigenvector x, since Tx = [U + iV]x = αx + iβx = (α + iβ)x = λx. This completes the proof. ■

We now state and prove the last result of this section.

7.7.26. Theorem. Let X be a Hilbert space, and let T ∈ B(X, X). If T is normal and completely continuous, then T has an eigenvalue λ such that |λ| = ‖T‖.

Proof. Let S = T*T. Then S is hermitian and completely continuous by Theorem 7.7.18. Also, S ≥ 0 because (Sx, x) = (T*Tx, x) = (Tx, Tx) = ‖Tx‖² ≥ 0. This last condition implies that S has no negative eigenvalues. Specifically, if λ is an eigenvalue of S, then there is an x ≠ 0 in X such that Sx = λx. Now
0 ≤ (Sx, x) = (λx, x) = λ(x, x) = λ‖x‖²,
and since ‖x‖ ≠ 0, we have λ ≥ 0. By Theorem 7.7.24, S has an eigenvalue μ with |μ| = ‖S‖ = ‖T*T‖ = ‖T‖²; since S has no negative eigenvalues, μ = ‖T‖². Now let N ≜ 𝔑(S − μI) = 𝔑_μ(S), and note that N contains a non-zero vector. Since T is normal, TS = T(T*T) = (TT*)T = ST. Similarly, we have T*S = ST*. By Theorem 7.6.24, N is invariant under T and under T*. By Theorem 7.5.6 this means that T remains normal when its domain of definition is restricted to N. By Theorem 7.7.25, there is a λ ∈ C and a vector x ≠ 0 in N such that Tx = λx, and thus T*x = λ̄x. Now since Sx = T*Tx = T*(λx) = λT*x = λλ̄x = |λ|²x for this x ≠ 0, and since Sx = μx for all x ∈ N, it follows that |λ|² = μ = ‖S‖ = ‖T*T‖ = ‖T‖². Therefore, |λ| = ‖T‖ and λ is an eigenvalue of T. ■

7.8. THE SPECTRAL THEOREM FOR COMPLETELY CONTINUOUS NORMAL OPERATORS

The main result of this section is referred to as the spectral theorem (for completely continuous operators). Some of the direct consequences of this theorem provide an insight into the geometric properties of normal operators. Results such as the spectral theorem play a central role in applications. In Section 7.10 we will apply this theorem to integral equations.

Throughout this section, X is a complex Hilbert space. We require some preliminary results.

7.8.1. Theorem. Let T ∈ B(X, X) be completely continuous and normal. For each ε > 0, let A_ε be the annulus in the complex plane defined by
A_ε = {λ ∈ C : ε ≤ |λ| ≤ ‖T‖}.
Then the number of eigenvalues of T contained in A_ε is finite.

Proof. To the contrary, let us assume that for some ε > 0 the annulus A_ε contains an infinite number of eigenvalues. By the Bolzano–Weierstrass theorem, there is a point of accumulation λ₀ of the eigenvalues in the annulus A_ε. Let {λₙ} be a sequence of distinct eigenvalues such that λₙ → λ₀ as n → ∞, and let Txₙ = λₙxₙ with ‖xₙ‖ = 1. Since T is a completely continuous operator, there is a subsequence {x_{n_k}} of {xₙ} for which the sequence {Tx_{n_k}} converges to an element u ∈ X; i.e., Tx_{n_k} → u as n_k → ∞. Thus, since Tx_{n_k} = λ_{n_k}x_{n_k}, we have λ_{n_k}x_{n_k} → u. But 1/λ_{n_k} → 1/λ₀ because λ₀ ≠ 0. Therefore x_{n_k} → (1/λ₀)u. But the x_{n_k} are eigenvectors corresponding to distinct eigenvalues. By part (iv) of Theorem 7.6.10, {x_{n_k}} is an orthonormal sequence, so that ‖x_{n_k} − x_{n_j}‖ = √2 for k ≠ j, and thus {x_{n_k}} cannot be a Cauchy sequence. Yet, as shown above, it is convergent; i.e., we have arrived at a contradiction. Therefore, our initial assumption is false and the theorem is proved. ■

Our next result is a direct consequence of the preceding theorem.

7.8.2. Theorem. Let T ∈ B(X, X) be completely continuous and normal. Then the number of eigenvalues of T is at most denumerable. If the set of eigenvalues is denumerable, then it has a point of accumulation at zero and only at zero (in the complex plane). The non-zero eigenvalues can be ordered so that
|λ₁| ≥ |λ₂| ≥ ⋯ ≥ |λₙ| ≥ ⋯.
7.8.3. Exercise. Prove Theorem 7.8.2.

The next result is known as the spectral theorem. Here we let λ₀ = 0, and we let {λ₁, λ₂, ...} be the non-zero eigenvalues of a completely continuous operator T ∈ B(X, X). Note that λ₀ may or may not be an eigenvalue of T. If λ₀ is an eigenvalue, then 𝔑(T) need not be finite dimensional. However, by Theorem 7.7.20, 𝔑(T − λᵢI) is finite dimensional for i = 1, 2, ....

7.8.4. Theorem. Let T ∈ B(X, X) be completely continuous and normal, let λ₀ = 0, and let {λ₁, λ₂, ...} be the non-zero distinct eigenvalues of T (this collection may be finite). Let 𝔑ᵢ = 𝔑(T − λᵢI) for i = 0, 1, 2, .... Then the family of closed linear subspaces {𝔑ᵢ}, i = 0, 1, 2, ..., of X is total.

Proof. The fact that each 𝔑ᵢ is a closed linear subspace of X follows from Theorem 7.1.26. Now let Y = ∪_{n=0}^∞ 𝔑ₙ, and let N = Y^⊥. We wish to show that N = {0}. By Theorem 6.12.6, N is a closed linear subspace of X. We will show first that Y is invariant under T*. Let x ∈ Y. Then x ∈ 𝔑ₙ for some n, and Tx = λₙx. Now λₙ(T*x) = T*(λₙx) = T*Tx = T(T*x); i.e., T(T*x) = λₙ(T*x), and so T*x ∈ 𝔑ₙ, which implies T*x ∈ Y. Therefore, Y is invariant under T*. From Theorem 7.3.15 it follows that Y^⊥ is invariant under T. Hence, N is an invariant closed linear subspace under T. It follows from Theorems 7.7.8 and 7.5.6 that if T₁ is the restriction of T to N, then T₁ ∈ B(N, N) and T₁ is completely continuous and normal. Now let us suppose that N ≠ {0}. By Theorem 7.7.25 there is a non-zero x ∈ N and a λ ∈ C such that T₁x = λx. But if this is so, λ is an eigenvalue of T, and it follows that x ∈ 𝔑ₙ for some n. Hence, x ∈ N ∩ Y, which is impossible unless x = 0. This completes the proof. ■

In proving an alternate form of the spectral theorem, we require the following result.

7.8.5. Theorem. Let {Nₖ} be a sequence of orthogonal closed linear subspaces of X; i.e., Nₖ ⊥ Nⱼ for all j ≠ k. Then the following statements are equivalent:
(i) {Nₖ} is a total family;
(ii) X is the smallest closed linear subspace which contains every Nₖ; and
(iii) for every x ∈ X there is a unique sequence {xₖ} such that
(a) xₖ ∈ Nₖ for every k, and
(b) ∑_{k=1}^∞ xₖ = x.

Proof. We first prove the equivalence of statements (i) and (ii). Let Y = ∪ₖ Nₖ. Then Y ⊂ Y^⊥⊥ by Theorem 6.12.8. Furthermore, Y^⊥⊥ is the smallest closed linear subspace which contains Y, by Theorem 6.12.8. Now suppose {Nₖ} is a total family. Then Y^⊥ = {0}. Hence, Y^⊥⊥ = X, and so X is the smallest closed linear subspace which contains every Nₖ.

On the other hand, suppose X is the smallest closed linear subspace which contains every Nₖ. Then X = Y^⊥⊥ and Y^⊥⊥⊥ = {0}. But Y^⊥⊥⊥ = Y^⊥. Thus, Y^⊥ = {0}, and so {Nₖ} is a total family.

We now prove the equivalence of statements (i) and (iii). Let {Nₖ} be a total family, and let x ∈ X. For every k = 1, 2, ..., there is an xₖ ∈ Nₖ and a yₖ ∈ Nₖ^⊥ such that x = xₖ + yₖ. If xₖ = 0, then (x, xₖ) = 0. If xₖ ≠ 0, then (x, xₖ/‖xₖ‖) = (xₖ + yₖ, xₖ/‖xₖ‖) = ‖xₖ‖. Thus, it follows from Bessel's inequality that
∑_{k=1}^∞ ‖xₖ‖² < ∞.
Hence, the orthogonal series ∑_{k=1}^∞ xₖ converges; let x₀ = ∑_{k=1}^∞ xₖ. Then x₀ ∈ X. For fixed j, let y ∈ Nⱼ. Then (x − x₀, y) = (xⱼ + yⱼ − x₀, y) = (xⱼ, y) + (yⱼ, y) − (x₀, y) = (xⱼ, y) − (∑_{k=1}^∞ xₖ, y) = (xⱼ, y) − ∑_{k=1}^∞ (xₖ, y) = (xⱼ, y) − (xⱼ, y) = 0. Thus, (x − x₀) is orthogonal to every element of Nⱼ for every j. Since {Nₖ} is a total family, we have x = x₀. To prove uniqueness, suppose that x = ∑_{k=1}^∞ xₖ = ∑_{k=1}^∞ xₖ′, where xₖ, xₖ′ ∈ Nₖ for all k. Then ∑_{k=1}^∞ (xₖ − xₖ′) = 0. Since xₖ − xₖ′ ∈ Nₖ, we have (xₖ − xₖ′) ⊥ (xⱼ − xⱼ′) for j ≠ k, and so
‖∑_{k=1}^∞ (xₖ − xₖ′)‖² = ∑_{k=1}^∞ ‖xₖ − xₖ′‖² = 0.
Thus, ‖xₖ − xₖ′‖ = 0 for all k, and xₖ is unique for each k.

To prove that (iii) implies (i), assume that x ∈ Nₖ^⊥ for every k. By hypothesis, x = ∑_{k=1}^∞ xₖ, where xₖ ∈ Nₖ for all k. Hence, for any j we have
0 = (x, xⱼ) = ∑_{k=1}^∞ (xₖ, xⱼ) = ‖xⱼ‖²,
and so xⱼ = 0 for all j. This means x = 0, and so {Nₖ} is a total family. This completes the proof. ■

In Definition 3.2.13 we introduced the direct sum of a finite number of linear subspaces. The preceding theorem permits us to extend this definition in a meaningful way to a countable number of linear subspaces.

7.8.6. Definition. Let {Yₖ} be a sequence of mutually orthogonal closed linear subspaces of X, and let V({Yₖ}) be the closed linear subspace generated by {Yₖ}. If every x ∈ V({Yₖ}) is uniquely representable as x = ∑_{k=1}^∞ xₖ, where xₖ ∈ Yₖ for every k, then we say V({Yₖ}) is the direct sum of {Yₖ}. In this case we write
V({Yₖ}) = Y₁ ⊕ Y₂ ⊕ ⋯.
We are now in a position to present another version of the spectral theorem.

7.8.7. Theorem. Let T ∈ B(X, X) be completely continuous and normal, let λ₀ = 0, and let {λ₁, λ₂, ..., λₙ, ...} be the non-zero distinct eigenvalues of T. Let 𝔑ᵢ = 𝔑(T − λᵢI) for i = 0, 1, 2, ..., and let Pᵢ be the projection on 𝔑ᵢ along 𝔑ᵢ^⊥. Then
(i) Pᵢ is an orthogonal projection for each i;
(ii) PᵢPⱼ = 0 for all i, j such that i ≠ j;
(iii) ∑_{j=0}^∞ Pⱼ = I; and
(iv) T = ∑_{j=1}^∞ λⱼPⱼ.

Proof. The proof of each part follows readily from results already obtained. We simply indicate the principal results needed and leave the details as an exercise. Part (i) follows from the definition of orthogonal projection. Part (ii) follows from part (ii) of Theorem 7.6.26. Parts (iii) and (iv) follow from Theorems 7.1.27 and 7.8.5. ■

7.8.8. Exercise. Prove Theorem 7.8.7.
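For a normal operator on a finite-dimensional space, conclusions (i)–(iv) of Theorem 7.8.7 can be verified by direct computation. The sketch below (our own illustration) constructs a normal operator with prescribed eigenvalues, including a repeated one and λ₀ = 0, and checks each conclusion.

```python
import numpy as np

rng = np.random.default_rng(0)

# A normal operator T = Q diag(lams) Q* with Q unitary.
lams = np.array([2.0, 2.0, -1.0 + 1.0j, 0.0])
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)))
T = Q @ np.diag(lams) @ Q.conj().T

# Orthogonal projection P_i onto the eigenspace of each distinct eigenvalue:
# the columns of Q with eigenvalue lam are an orthonormal basis of N_i.
distinct = [2.0, -1.0 + 1.0j, 0.0]
P = [Q[:, np.isclose(lams, lam)] @ Q[:, np.isclose(lams, lam)].conj().T
     for lam in distinct]

print(np.allclose(P[0] @ P[1], 0))                                # (ii)
print(np.allclose(sum(P), np.eye(4)))                             # (iii)
print(np.allclose(sum(l * Pj for l, Pj in zip(distinct, P)), T))  # (iv)
```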

In Chapter 4 we defined the resolution of the identity operator for Euclidean spaces. We conclude this section with a more general definition.

7.8.9. Definition. Let {Pₙ} be a sequence of linear transformations on X such that Pₙ ∈ B(X, X) for each n. If conditions (i), (ii), and (iii) of Theorem 7.8.7 are satisfied, then {Pₙ} is said to be a resolution of the identity.
7.9. DIFFERENTIATION OF OPERATORS

In this section we consider differentiation of operators on normed linear spaces. Such operators need not be linear. Throughout this section, X and Y are normed linear spaces over a field F, where F may be either R, the real numbers, or C, the complex numbers. We will identify mappings which are, in general, not linear by f: X → Y. As usual, L(X, Y) will denote the class of all linear operators from X into Y, while B(X, Y) will denote the class of all bounded linear operators from X into Y.

7.9.1. Definition. Let x₀ ∈ X be a fixed element, and let f: X → Y. If there exists a function δf(x₀, ·): X → Y such that
δf(x₀, h) = lim_{t→0} (1/t)[f(x₀ + th) − f(x₀)]  (7.9.2)
(where t ∈ F) for all h ∈ X, then f is said to be Gateaux differentiable at x₀, and δf(x₀, h) is called the Gateaux differential of f at x₀ with increment h.

The Gateaux differential of f is sometimes also called the weak differential of f or the G-differential of f. If f is Gateaux differentiable at x₀, then δf(x₀, h) need not be linear or continuous as a function of h ∈ X. However, we shall primarily be concerned with functions f: X → Y which have these properties. This gives rise to the following concept.

7.9.3. Definition. Let x₀ ∈ X be a fixed element, and let f: X → Y. If there exists a bounded linear operator F(x₀) ∈ B(X, Y) such that
lim_{‖h‖→0} (1/‖h‖)‖f(x₀ + h) − f(x₀) − F(x₀)h‖ = 0
(where h ∈ X), then f is said to be Fréchet differentiable at x₀, and F(x₀) is called the Fréchet derivative of f at x₀. We define
f′(x₀) = F(x₀).
If f is Fréchet differentiable for each x ∈ D, where D ⊂ X, then f is said to be Fréchet differentiable on D.

We now show that Fréchet differentiability implies Gateaux differentiability.

7.9.4. Theorem. Let f: X → Y, and let x₀ ∈ X be a fixed element. If f is Fréchet differentiable at x₀, then f is Gateaux differentiable, and furthermore the Gateaux differential is given by
δf(x₀, h) = f′(x₀)h for all h ∈ X.

Proof. Let F(x₀) = f′(x₀), let ε > 0, and let h ∈ X, h ≠ 0. Then there is a δ > 0 such that
(1/‖th‖)‖f(x₀ + th) − f(x₀) − F(x₀)(th)‖ < ε/‖h‖
provided that 0 < ‖th‖ < δ. This implies that
‖(1/t)[f(x₀ + th) − f(x₀)] − F(x₀)h‖ < ε
provided that 0 < |t| < δ/‖h‖. Hence, f is Gateaux differentiable at x₀ and δf(x₀, h) = F(x₀)h. ■

Because of the preceding theorem, if f: X → Y is Fréchet differentiable at x₀ ∈ X, the Gateaux differential δf(x₀, h) = f′(x₀)h is also called the Fréchet differential of f at x₀ with increment h.

Let us now consider some examples.

7.9.5. Example. Let X be a Hilbert space, and let f be a functional defined on X; i.e., f: X → F. If f has a Fréchet derivative at some x₀ ∈ X, then that derivative must be a bounded linear functional on X; i.e., f′(x₀) ∈ X*. In view of Theorem 6.14.2, there is an element y₀ ∈ X such that f′(x₀)h = (h, y₀) for each h ∈ X. Although f′(x₀) ∈ X* and y₀ ∈ X, we know by Exercise 6.14.4 that X and X* are congruent and thus isometric. It is customary to view the corresponding elements of isometric spaces as being one and the same element. With this in mind, we say f′(x₀) = y₀, and we call f′(x₀) the gradient of f at x₀. ■

We now consider a specific instance of the preceding example.

7.9.6. Example. Let X = Rⁿ and let ‖·‖ be any norm on X. By Theorem 6.6.5, X is a Banach space. Now let f be a functional defined on X; i.e., f: X → R. Let x = (ξ₁, ..., ξₙ) ∈ X and h = (h₁, ..., hₙ) ∈ X. If f has continuous partial derivatives with respect to ξᵢ, i = 1, ..., n, then the Fréchet differential of f is given by
δf(x, h) = (∂f(x)/∂ξ₁)h₁ + ⋯ + (∂f(x)/∂ξₙ)hₙ.
For fixed x₀ ∈ X, we define the bounded linear functional F(x₀) on X by
F(x₀)h = ∑_{i=1}^n (∂f(x)/∂ξᵢ)|_{x=x₀} hᵢ for h ∈ X.
Then F(x₀) is the Fréchet derivative of f at x₀. As in the preceding example, we do not distinguish between X and X*, and we write the gradient of f at x as
f′(x) = (∂f(x)/∂ξ₁, ..., ∂f(x)/∂ξₙ).  (7.9.7) ■
In the following, we consider another example of the gradient of a functional.

7.9.8. Example. Let X be a real Hilbert space, let L: X → X be a bounded linear operator, and let f: X → R be given by f(x) = (x, Lx). Then f has a Fréchet derivative, which is given by f′(x) = (L + L*)x. To verify this, we let h be an arbitrary element in X and we let F(x) = (L + L*)x. Then
f(x + h) − f(x) − F(x)h = (x + h, Lx + Lh) − (x, Lx) − (h, Lx) − (h, L*x)
= (h, Lh).
From this it follows that
lim_{‖h‖→0} |f(x + h) − f(x) − F(x)h| / ‖h‖ = 0. ■
In the next example we consider a functional which frequently arises in optimization problems.

7.9.9. Example. Let X and Y be real Hilbert spaces, and let L be a bounded linear operator from X into Y; i.e., L ∈ B(X, Y). Let L* be the adjoint of L. Let v be a fixed element in Y, and let f be a real-valued functional defined on X by
f(x) = ‖v − Lx‖² for all x ∈ X.
Then f has a Fréchet derivative, which is given by
f′(x) = −2L*v + 2L*Lx.
To verify this, observe that
f(x) = (v − Lx, v − Lx) = (v, v) − 2(v, Lx) + (Lx, Lx)
= (v, v) − 2(L*v, x) + (x, L*Lx).
The conclusion now follows from Examples 7.9.5 and 7.9.8. ■
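In Rⁿ the formula f′(x) = −2L*v + 2L*Lx is easy to corroborate against a finite-difference quotient. A short check (our own sketch, with randomly generated data):

```python
import numpy as np

rng = np.random.default_rng(1)
L = rng.normal(size=(5, 3))     # a bounded linear operator from R^3 to R^5
v = rng.normal(size=5)
x = rng.normal(size=3)

f = lambda x: np.dot(v - L @ x, v - L @ x)   # f(x) = ||v - Lx||^2
grad = -2 * L.T @ v + 2 * L.T @ (L @ x)      # f'(x) = -2 L*v + 2 L*L x

# Central differences approximate each partial derivative of f at x.
eps = 1e-6
fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
               for e in np.eye(3)])
print(np.allclose(grad, fd, atol=1e-4))      # True
```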

In the next example we introduce the Jacobian matrix of a function f: Rⁿ → Rᵐ.

7.9.10. Example. Let X = Rⁿ, and let Y = Rᵐ. Since X and Y are finite dimensional, we may assume arbitrary norms on each of these spaces, and they will both be Banach spaces. Let f: X → Y. For x = (ξ₁, ..., ξₙ) ∈ X, let us write
f(x) = (f₁(x), ..., fₘ(x))ᵀ = (f₁(ξ₁, ..., ξₙ), ..., fₘ(ξ₁, ..., ξₙ))ᵀ.
For x₀ ∈ X, assume that the partial derivatives
∂fᵢ(x₀)/∂ξⱼ ≜ ∂fᵢ(x)/∂ξⱼ|_{x=x₀}
exist and are continuous for i = 1, ..., m and j = 1, ..., n. The Fréchet differential of f at x₀ with increment h = (h₁, ..., hₙ) ∈ X is the m-vector whose i-th component is ∑_{j=1}^n (∂fᵢ(x₀)/∂ξⱼ)hⱼ; i.e., δf(x₀, h) = J(x₀)h, where J(x₀) is the m × n matrix [∂fᵢ(x₀)/∂ξⱼ]. The Fréchet derivative of f at x₀ is given by
f′(x₀) = J(x₀) = [∂fᵢ(x₀)/∂ξⱼ], i = 1, ..., m, j = 1, ..., n,
which is also called the Jacobian matrix of f at x₀. We sometimes write f′(x) = ∂f(x)/∂x. ■
7.9.11. Example. Let X = C[a, b], the family of real-valued continuous functions defined on [a, b], and let {X; ‖·‖_∞} be the Banach space given in Example 6.1.9. Let k(s, t) be a real-valued function defined and continuous on [a, b] × [a, b], and let g(t, x) be a real-valued function such that g(t, x) and ∂g(t, x)/∂x are defined and continuous for t ∈ [a, b] and x ∈ R. Let f: X → X be defined by
[f(x)](s) = ∫ₐᵇ k(s, t)g(t, x(t))dt, x ∈ X.
For fixed x₀ ∈ X, the Fréchet differential of f at x₀ with increment h ∈ X is given by
[δf(x₀, h)](s) = ∫ₐᵇ k(s, t)(∂g(t, x₀(t))/∂x)h(t)dt. ■

7.9.12. Exercise. Verify the assertions made in Examples 7.9.5 to 7.9.11.

We now establish some of the properties of Fréchet differentials.

7.9.13. Theorem. Let f, g: X → Y be Fréchet differentiable at x₀ ∈ X. Then
(i) f is continuous at x₀ ∈ X; and
(ii) for all α, β ∈ F, αf + βg is Fréchet differentiable at x₀, and (αf + βg)′(x₀) = αf′(x₀) + βg′(x₀).

Proof. To prove (i), let f be Fréchet differentiable at x₀, and let F(x₀) be the Fréchet derivative of f at x₀. Then
f(x₀ + h) − f(x₀) = f(x₀ + h) − f(x₀) − F(x₀)h + F(x₀)h,
and
‖f(x₀ + h) − f(x₀)‖ ≤ ‖f(x₀ + h) − f(x₀) − F(x₀)h‖ + ‖F(x₀)h‖.
Since F(x₀) is bounded, there is an M > 0 such that ‖F(x₀)h‖ ≤ M‖h‖. Furthermore, for given ε > 0 there is a δ > 0 such that ‖f(x₀ + h) − f(x₀) − F(x₀)h‖ < ε‖h‖ provided that ‖h‖ < δ. Hence, ‖f(x₀ + h) − f(x₀)‖ < (M + ε)‖h‖ whenever ‖h‖ < δ. This implies that f is continuous at x₀.

The proof of part (ii) is straightforward and is left as an exercise. ■

7.9.14. Exercise. Prove part (ii) of Theorem 7.9.13.

We now show that the chain rule encountered in calculus applies to Fréchet derivatives as well.

7.9.15. Theorem. Let X, Y, and Z be normed linear spaces. Let g: X → Y, let f: Y → Z, and let φ: X → Z be the composite function φ = f ∘ g. Let g be Fréchet differentiable on an open set D ⊂ X, and let f be Fréchet differentiable on an open set E ⊂ g(D). If x ∈ D is such that g(x) ∈ E, then φ is Fréchet differentiable at x and φ′(x) = f′(g(x))g′(x).

Proof. Let y = g(x) and d = g(x + h) − g(x), where h ∈ X is such that x + h ∈ D. Then
φ(x + h) − φ(x) − f′(y)g′(x)h = f(y + d) − f(y) − f′(y)d + f′(y)[d − g′(x)h]
= f(y + d) − f(y) − f′(y)d + f′(y)[g(x + h) − g(x) − g′(x)h].
Thus, given ε > 0 there is a δ > 0 such that ‖d‖ < δ and ‖h‖ < δ imply
‖φ(x + h) − φ(x) − f′(y)g′(x)h‖ ≤ ε‖d‖ + ‖f′(y)‖·‖h‖·ε.
By the continuity of g (see the proof of part (i) of Theorem 7.9.13), it follows that ‖d‖ ≤ M‖h‖ for some constant M. Hence, there is a constant k such that
‖φ(x + h) − φ(x) − f′(y)g′(x)h‖ ≤ kε‖h‖.
This implies that φ′(x) exists and φ′(x) = f′(g(x))g′(x). ■

We next consider the Fréchet derivative of bounded linear operators.

7.9.16. Theorem. Let T be a linear operator from X into Y. If f(x) = Tx for all x ∈ X, then f is Fréchet differentiable on X if and only if T is a bounded linear operator. In this case, f′(x) = T for all x ∈ X.

Proof. Let T be a bounded linear operator. Then ‖f(x + h) − f(x) − Th‖ = ‖T(x + h) − Tx − Th‖ = 0 for all x, h ∈ X. From this it follows that f′(x) = T. Conversely, suppose T is unbounded. Then, by Theorem 7.9.13, f cannot be Fréchet differentiable. ■

Let us consider a specific case.

7.9.17. Example. Let X = Rⁿ and Y = Rᵐ, and let us assume that the natural basis for each of these spaces is being used (see Example 4.1.15). If A ∈ B(X, Y), then Ax is given in matrix representation by
Ax = [aᵢⱼ]x, i = 1, ..., m, j = 1, ..., n,
where [aᵢⱼ] is the m × n matrix of A relative to these bases. Hence, if f(x) = Ax, then f′(x) = A, and the matrix representation of f′(x) = df(x)/dx is [aᵢⱼ]. ■

The next result is useful in obtaining bounds on Fréchet differentiable functions.

7.9.18. Theorem. Let f: X → Y, let D be an open set in X, and let f be Fréchet differentiable on D. Let x₀ ∈ D, and let h ∈ X be such that x₀ + th ∈ D for all t with 0 ≤ t ≤ 1. Let N = sup_{0<t<1} ‖f′(x₀ + th)‖. Then
‖f(x₀ + h) − f(x₀)‖ ≤ N·‖h‖.

Proof. Let y = f(x₀ + h) − f(x₀), and let φ be a bounded linear functional defined on Y (i.e., φ ∈ Y*) such that φ(y) = ‖φ‖·‖y‖ (see Corollary 6.8.6). Define g: [0, 1] → R by g(t) = φ(f(x₀ + th)) for 0 ≤ t ≤ 1. By Theorems 7.9.15 and 7.9.16, g′(t) = φ(f′(x₀ + th)h). By the mean value theorem of calculus, there is a t₀ such that 0 < t₀ < 1 and g(1) − g(0) = g′(t₀). Thus,
|φ(f(x₀ + h)) − φ(f(x₀))| ≤ ‖φ‖ · sup_{0<t<1} ‖f′(x₀ + th)‖ · ‖h‖.
Since
|φ(f(x₀ + h)) − φ(f(x₀))| = |φ(f(x₀ + h) − f(x₀))| = |φ(y)| = ‖φ‖·‖f(x₀ + h) − f(x₀)‖,
it follows that ‖f(x₀ + h) − f(x₀)‖ ≤ sup_{0<t<1} ‖f′(x₀ + th)‖·‖h‖. ■

If a function f: X → Y is Fréchet differentiable on an open set D ⊂ X, and if f′ is Fréchet differentiable at x ∈ D, then f is said to be twice Fréchet differentiable at x, and we call the Fréchet derivative of f′ at x the second derivative of f. We denote the second derivative of f by f″. Note that f″(x) is a bounded linear operator defined on X with range in the normed linear space B(X, Y).

We leave the proof of the next result as an exercise.

7.9.19. Theorem. Let f: X → Y be twice Fréchet differentiable on an open set D ⊂ X. Let x₀ ∈ D, and let h ∈ X be such that x₀ + th ∈ D for all t with 0 ≤ t ≤ 1. Let N = sup_{0<t<1} ‖f″(x₀ + th)‖. Then
‖f(x₀ + h) − f(x₀) − f′(x₀)h‖ ≤ (1/2)N·‖h‖².


7.9.20. Exercise. Prove Theorem 7.9.19.

We conclude the present section by showing that the Gateaux and Fréchet differentials play a role in maximizing and minimizing functionals which is similar to that of the ordinary derivative of functions of real variables.

Let F = R, and let f be a functional on X; i.e., f: X → R. Clearly, for fixed x₀, h ∈ X, we may define a function g: R → R by the relation g(t) = f(x₀ + th) for all t ∈ R. In this case, if f is Gateaux differentiable at x₀, we see that δf(x₀, h) = g′(t)|_{t=0}, where g′(t) is the usual derivative of g(t). We will need this property in proving our next result, Theorem 7.9.22. First, however, we require the following important concept.

7.9.21. Definition. Let f be a real-valued functional defined on a domain 𝔇 ⊂ X; i.e., f: 𝔇 → R. Let x₀ ∈ 𝔇. Then f is said to have a relative minimum (relative maximum) at x₀ if there exists an open sphere S(x₀; r) ⊂ X such that for all x ∈ S(x₀; r) ∩ 𝔇 the relation f(x₀) ≤ f(x) (respectively, f(x₀) ≥ f(x)) holds. If f has either a relative minimum or a relative maximum at x₀, then f is said to have a relative extremum at x₀.

For relative extrema, we have the following result.

7.9.22. Theorem. Let f: X → R be Gateaux differentiable at x₀ ∈ X. If f has a relative extremum at x₀, then δf(x₀, h) = 0 for all h ∈ X.

Proof. As pointed out in the remark preceding Definition 7.9.21, the real-valued function g(t) = f(x₀ + th) must have an extremum at t = 0. From the ordinary calculus we must have g′(t)|_{t=0} = 0. Hence, δf(x₀, h) = 0 for all h ∈ X. ■

We leave the proof of the next result as an exercise.

7.9.23. Corollary. Let f: X → R be Fréchet differentiable at x₀ ∈ X. If f has a relative extremum at x₀, then f′(x₀) = 0.

7.9.24. Exercise. Prove Corollary 7.9.23.

We conclude this section with the following example.

7.9.25. Example. Consider the real-valued functional f defined in Example 7.9.9; i.e., f(x) = ‖v − Lx‖². For a given v ∈ Y, a necessary condition for f to have a minimum at x₀ ∈ X is that
L*Lx₀ = L*v. ■
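When X = Rⁿ and L is a matrix, the condition L*Lx₀ = L*v is precisely the system of normal equations of the least squares problem, and it can be solved directly. A minimal sketch (our own, using a library least-squares routine only as a cross-check):

```python
import numpy as np

rng = np.random.default_rng(2)
L = rng.normal(size=(8, 4))     # full column rank, so L^T L is invertible
v = rng.normal(size=8)

# Solve the normal equations L*L x0 = L*v of Example 7.9.25.
x0 = np.linalg.solve(L.T @ L, L.T @ v)

# Cross-check against a direct least-squares solve of min ||v - Lx||.
x_ls, *_ = np.linalg.lstsq(L, v, rcond=None)
print(np.allclose(x0, x_ls))    # True
```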

7.10. SOME APPLICATIONS

In this section we consider selected applications of the material of the present chapter. The section consists of three parts. In the first part we consider integral equations, in the second part we give an example in optimal control, while in the third part we address the problem of minimizing functionals by the method of steepest descent.

A. Applications to Integral Equations

Throughout this part, X is a complex Hilbert space, while T denotes a completely continuous normal operator defined on X.

We recall that if, e.g., X = L₂[a, b] and T is defined by (see Example 7.3.11 and the comment at the end of Example 7.7.5)
Tx(s) = ∫ₐᵇ k(s, t)x(t)dt,  (7.10.1)
then T is a completely continuous operator defined on X. Furthermore, if k(s, t) = k̄(t, s) for all s, t ∈ [a, b], then T is hermitian (see Exercise 7.4.20) and, hence, normal.

In the following, we shall focus our attention on equations of the form
Tx − λx = y,  (7.10.2)

where λ ∈ C and x, y ∈ X. If, in particular, T is defined by Eq. (7.10.1), then Eq. (7.10.2) includes a large class of integral equations. Indeed, it was the study of such equations which gave rise to much of the development of functional analysis.

We now prove the following existence and uniqueness result.

7.10.3. Theorem. If λ ≠ 0 and if λ is not an eigenvalue of T, then Eq. (7.10.2) has a unique solution, which is given by
x = −(1/λ)P₀y + ∑_{n=1}^∞ Pₙy/(λₙ − λ),  (7.10.4)
where {λₙ} are the non-zero distinct eigenvalues of T, Pₙ is the projection of X onto 𝔑ₙ = 𝔑(T − λₙI) along 𝔑ₙ^⊥ for n = 1, 2, ..., and P₀x is the projection of x onto 𝔑(T).

Proof. We first prove that the infinite series on the right-hand side of Eq. (7.10.4) is convergent. Since λ ≠ 0, it cannot be an accumulation point of {λₙ}. Thus, we can find a d > 0 such that |λ| > d and |λ − λₖ| > d for k = 1, 2, .... We note from Theorem 7.8.7 that PᵢPⱼ = 0 for i ≠ j. Now for N < ∞, we have by the Pythagorean theorem,
‖−(1/λ)P₀y + ∑_{k=1}^N Pₖy/(λₖ − λ)‖² = (1/|λ|²)‖P₀y‖² + ∑_{k=1}^N ‖Pₖy‖²/|λₖ − λ|²
≤ (1/d²)‖P₀y‖² + (1/d²)∑_{k=1}^N ‖Pₖy‖²
= (1/d²)[‖P₀y‖² + ∑_{k=1}^N ‖Pₖy‖²]
= (1/d²)‖P₀y + ∑_{k=1}^N Pₖy‖²
≤ (1/d²)‖P₀y + ∑_{k=1}^∞ Pₖy‖²
= (1/d²)‖y‖².
This implies that ∑_{k=1}^∞ ‖Pₖy‖²/|λₖ − λ|² is convergent, and so it follows from Theorem 6.13.3 that ∑_{n=1}^∞ Pₙy/(λₙ − λ) converges to an element in X.

Let j be a positive integer. By Theorem 7.5.12, Pⱼ is continuous, and so by Theorem 7.1.27,
Pⱼ(∑_{n=1}^∞ Pₙy/(λₙ − λ)) = Pⱼy/(λⱼ − λ).
Now let x be given by Eq. (7.10.4) for arbitrary y ∈ X. We want to show that Tx − λx = y. From Eq. (7.10.4) we have
P₀x = −(1/λ)P₀y
and
Pⱼx = Pⱼy/(λⱼ − λ) for j = 1, 2, ....
Thus, P₀y = −λP₀x and Pⱼy = λⱼPⱼx − λPⱼx. Now from the spectral theorem (Theorem 7.8.7), we have y = P₀y + ∑_{j=1}^∞ Pⱼy, Tx = ∑_{j=1}^∞ λⱼPⱼx, and λx = λP₀x + ∑_{j=1}^∞ λPⱼx. Hence, y = Tx − λx.

Finally, to show that x given by Eq. (7.10.4) is unique, let x and z be such that Tx − λx = Tz − λz = y. Then it follows that T(x − z) − λ(x − z) = y − y = 0. Hence, T(x − z) = λ(x − z). Since λ is by assumption not an eigenvalue of T, we must have x − z = 0. This completes the proof. ■

In the next result we consider the case where λ is a non-zero eigenvalue of T.

7.10.5. Theorem. Let {λₙ} denote the non-zero distinct eigenvalues of T, and let λ = λⱼ for some positive integer j. Then there is a (non-unique) x ∈ X satisfying Eq. (7.10.2) if and only if Pⱼy = 0, where Pⱼ is the orthogonal projection of X onto 𝔑ⱼ = {x : (T − λI)x = 0}. If Pⱼy = 0, then a solution to Eq. (7.10.2) is given by
x = x₀ − (1/λ)P₀y + ∑_{k=1, k≠j}^∞ Pₖy/(λₖ − λ),  (7.10.6)
where P₀ is the orthogonal projection of X onto 𝔑(T) and x₀ is any element in 𝔑ⱼ.

Proof. We first observe that 𝔑ⱼ reduces T by part (iii) of Theorem 7.6.26. It therefore follows from part (ii) of Theorem 7.5.22 that TPⱼ = PⱼT. Now suppose that y is such that Eq. (7.10.2) is satisfied for some x ∈ X. Then it follows that Pⱼy = Pⱼ(Tx − λⱼx) = TPⱼx − λⱼPⱼx = λⱼPⱼx − λⱼPⱼx = 0. In the preceding, we used the facts that Tx = λⱼx for x ∈ 𝔑ⱼ and that Pⱼx ∈ 𝔑ⱼ for all x ∈ X. Hence, Pⱼy = 0.

Conversely, suppose that Pⱼy = 0, and let x be given by Eq. (7.10.6). The proof that x satisfies Eq. (7.10.2) follows along the same lines as the proof of Theorem 7.10.3, and the details are left as an exercise. The non-uniqueness of the solution is apparent, since (T − λI)x₀ = 0 for any x₀ ∈ 𝔑ⱼ. ■

7.10.7. Exercise. Complete the proof of Theorem 7.10.5.
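Formula (7.10.4) can be exercised concretely when T is a hermitian matrix, whose eigenspaces supply the projections Pₙ. The sketch below is our own illustration; with probability one the random matrix has simple non-zero eigenvalues, so P₀y = 0 and Pₙy = (xₙ, y)xₙ for unit eigenvectors xₙ.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(5, 5))
T = (A + A.T) / 2                       # hermitian, hence normal
lam = 0.37                              # lam != 0, not an eigenvalue of T

w, Q = np.linalg.eigh(T)                # T = Q diag(w) Q^T with Q orthogonal
y = rng.normal(size=5)

# Eq. (7.10.4): x = sum over n of P_n y / (lam_n - lam); here each
# eigenspace is spanned by the single unit eigenvector Q[:, n].
x = sum((Q[:, n] @ y) / (w[n] - lam) * Q[:, n] for n in range(5))

print(np.allclose(T @ x - lam * x, y))  # True: x solves Tx - lam*x = y
```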



B. An Example from Optimal Control

In this example we consider systems which can appropriately be described by the system of first-order ordinary differential equations
ẋ(t) = Ax(t) + Bu(t),  (7.10.8)
where x(0) = x₀ is given. Here x(t) ∈ Rⁿ and u(t) ∈ Rᵐ for every t such that 0 ≤ t ≤ T for some T > 0, A is an n × n matrix, and B is an n × m matrix. As we saw in part (vi) of Theorem 4.11.45, if each element of the vector u(t) is a continuous function of t, then the unique solution to Eq. (7.10.8) at time t is given by
x(t) = Φ(t, 0)x(0) + ∫₀ᵗ Φ(t, τ)Bu(τ)dτ,  (7.10.9)
where Φ(t, τ) is the state transition matrix for the system of equations given in Eq. (7.10.8).

Let us now define the class of vector-valued functions L₂ᵐ[0, T] by
L₂ᵐ[0, T] = {u : uᵀ = (u₁, ..., uₘ), where uᵢ ∈ L₂[0, T], i = 1, ..., m}.
If we define the inner product by
(u, v) = ∫₀ᵀ uᵀ(t)v(t)dt
for u, v ∈ L₂ᵐ[0, T], then it follows that L₂ᵐ[0, T] is a Hilbert space (see Example 6.11.11). Next, let us define the linear operator L: L₂ᵐ[0, T] → L₂ⁿ[0, T] by
[Lu](t) = ∫₀ᵗ Φ(t, τ)Bu(τ)dτ  (7.10.10)
for all u ∈ L₂ᵐ[0, T]. Since the elements of Φ(t, τ) are continuous functions on [0, T] × [0, T], it follows that L is completely continuous.

Now recall from Exercise 5.10.59 that Eq. (7.10.9) is the unique solution to Eq. (7.10.8) when the elements of the vector u(t) are continuous functions of t. It can be shown that the solution of Eq. (7.10.8) exists in an extended sense if we permit u ∈ L₂ᵐ[0, T]. Allowing for this generalization, we can now consider the following optimal control problem. Let γ ∈ R be such that γ > 0, and let f be the real-valued functional defined on L₂ᵐ[0, T] given by
f(u) = ∫₀ᵀ xᵀ(t)x(t)dt + γ∫₀ᵀ uᵀ(t)u(t)dt,  (7.10.11)
where x(t) is given by Eq. (7.10.9) for u ∈ L₂ᵐ[0, T]. The linear quadratic cost control problem is to find u ∈ L₂ᵐ[0, T] such that f(u) in Eq. (7.10.11) is minimum, where x(t) is the solution to the set of ordinary differential equations (7.10.8). This problem can be cast into a minimization problem in a Hilbert space as follows.

Let
v(t) = −Φ(t, 0)x₀ for 0 ≤ t ≤ T.
Then we can rewrite Eq. (7.10.9) as
x = Lu − v,
and Eq. (7.10.11) assumes the form
f(u) = ‖Lu − v‖² + γ‖u‖².
We can find the desired minimizing u in the more general context of arbitrary real Hilbert spaces by means of the following result.

7.10.12. Theorem. Let X and Y be real Hilbert spaces, let L: X → Y be a completely continuous operator, and let L* denote the adjoint of L. Let v be a given fixed element in Y, let γ ∈ R, and define the functional f: X → R by
f(u) = ‖Lu − v‖² + γ‖u‖²  (7.10.13)
for u ∈ X. (In Eq. (7.10.13) we use the norms induced by the inner products and note that ‖u‖ is the norm of u ∈ X, while ‖Lu − v‖ is the norm of (Lu − v) ∈ Y.) If in Eq. (7.10.13), γ > 0, then there exists a unique u₀ ∈ X such that f(u₀) ≤ f(u) for all u ∈ X. Furthermore, u₀ is the solution to the equation
L*Lu₀ + γu₀ = L*v.  (7.10.14)

Proof. Let us first examine Eq. (7.10.14). Since L is a completely continuous operator, by Corollary 7.7.12, so is L*L. Furthermore, the eigenvalues of L*L cannot be negative, and so −γ cannot be an eigenvalue of L*L. Making the association T = L*L, λ = −γ, and y = L*v in Eq. (7.10.2), it is clear that T is normal, and it follows from Theorem 7.10.3 that Eq. (7.10.14) has a unique solution. In fact, this solution is given by Eq. (7.10.4), using the above definitions of symbols.

Next, let us assume that u₀ is the unique element in X satisfying Eq. (7.10.14), and let h ∈ X be arbitrary. It follows from Eq. (7.10.13) that
f(u₀ + h) = (Lu₀ + Lh − v, Lu₀ + Lh − v) + γ(u₀ + h, u₀ + h)
= (Lu₀ − v, Lu₀ − v) + 2(Lh, Lu₀ − v) + (Lh, Lh) + γ(u₀, u₀) + 2γ(u₀, h) + γ(h, h)
= (Lu₀ − v, Lu₀ − v) + γ(u₀, u₀) + 2(h, L*Lu₀ + γu₀ − L*v) + (Lh, Lh) + γ(h, h)
= ‖Lu₀ − v‖² + γ‖u₀‖² + ‖Lh‖² + γ‖h‖².
Therefore, f(u₀ + h) is minimum if and only if h = 0. ■
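In matrix form, Eq. (7.10.14) reads (LᵀL + γI)u₀ = Lᵀv, the Tikhonov-regularized normal equations. A brief sketch (our own, with random data) confirming that u₀ minimizes f, in agreement with the identity f(u₀ + h) = f(u₀) + ‖Lh‖² + γ‖h‖²:

```python
import numpy as np

rng = np.random.default_rng(4)
L = rng.normal(size=(6, 4))
v = rng.normal(size=6)
gamma = 0.5

f = lambda u: np.dot(L @ u - v, L @ u - v) + gamma * np.dot(u, u)

# The unique minimizer from Eq. (7.10.14): (L*L + gamma I) u0 = L*v.
u0 = np.linalg.solve(L.T @ L + gamma * np.eye(4), L.T @ v)

# f(u0 + h) = f(u0) + ||Lh||^2 + gamma ||h||^2 > f(u0) for every h != 0.
for _ in range(3):
    h = 0.1 * rng.normal(size=4)
    print(f(u0) < f(u0 + h))    # True
```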

The solution to Eq. (7.10.14) can be obtained from Eq. (7.10.4); however, a more convenient method is available for finding the solution when L is given by Eq. (7.10.10). This is summarized in the following result.

7.10.15. Theorem. Let γ > 0, and let f(u) be defined by Eq. (7.10.11), where x(t) is the solution to Eq. (7.10.8). If
u(t) = −(1/γ)BᵀP(t)x(t)
for all t such that 0 ≤ t ≤ T, where P(t) is the solution to the matrix differential equation
Ṗ(t) = −AᵀP(t) − P(t)A + (1/γ)P(t)BBᵀP(t) − I  (7.10.16)
with P(T) = 0, then u minimizes f(u).

Proof. We want to show that u satisfies Eq. (7.10.14), where Lu is given by Eq. (7.10.10). We note that if u satisfies Eq. (7.10.14), then u = −(1/γ)L*(Lu − v) = −(1/γ)L*x. We now find the expression for evaluating L*w for arbitrary w ∈ L₂ⁿ[0, T]. We compute
(w, Lu) = ∫₀ᵀ [∫₀ˢ Φ(s, t)Bu(t)dt]ᵀ w(s)ds
= ∫₀ᵀ ∫₀ˢ uᵀ(t)BᵀΦᵀ(s, t)w(s)dt ds
= ∫₀ᵀ uᵀ(t)[∫ₜᵀ BᵀΦᵀ(s, t)w(s)ds]dt.
In order for this last expression to equal (L*w, u), we must have
[L*w](t) = ∫ₜᵀ BᵀΦᵀ(s, t)w(s)ds.
Thus, u must satisfy
u(t) = −(1/γ)Bᵀ∫ₜᵀ Φᵀ(s, t)x(s)ds
for all t such that 0 ≤ t ≤ T. Now assume there exists a matrix P(t) such that
P(t)x(t) = ∫ₜᵀ Φᵀ(s, t)x(s)ds.  (7.10.17)
We now find conditions for such a matrix P(t) to exist. First, we see that P(T) = 0. Next, differentiating both sides of Eq. (7.10.17) with respect to t, and noting that (∂/∂t)Φᵀ(s, t) = −AᵀΦᵀ(s, t), we have
Ṗ(t)x(t) + P(t)ẋ(t) = −x(t) − Aᵀ∫ₜᵀ Φᵀ(s, t)x(s)ds
= −x(t) − AᵀP(t)x(t).
Therefore,
Ṗ(t)x(t) + P(t)[Ax(t) + Bu(t)] = −x(t) − AᵀP(t)x(t).
But
u(t) = −(1/γ)[L*x](t) = −(1/γ)BᵀP(t)x(t),
so that
Ṗ(t)x(t) + P(t)Ax(t) − (1/γ)P(t)BBᵀP(t)x(t) = −x(t) − AᵀP(t)x(t).
Hence, P(t) must satisfy
Ṗ(t) = −AᵀP(t) − P(t)A + (1/γ)P(t)BBᵀP(t) − I
with P(T) = 0.

If
u(t) = −(1/γ)BᵀP(t)x(t),
it follows that u satisfies
L*Lu + γu = L*v,
where v(t) = −Φ(t, 0)x₀, and so, by Theorem 7.10.12, u minimizes f given by Eq. (7.10.11). This completes the proof of the theorem. ■

The differential equation for P(t) in Eq. (7.10.16) is called a matrix Riccati equation and can be shown to have a unique solution for all t ≤ T.
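Numerically, P(t) is obtained by integrating Eq. (7.10.16) backward in time from the terminal condition P(T) = 0. A sketch (our own, using SciPy's initial value solver; the matrices A, B and the values of γ and T are illustrative choices):

```python
import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # a double integrator, for example
B = np.array([[0.0], [1.0]])
gamma, T = 1.0, 5.0

def riccati(t, p):
    # Eq. (7.10.16): Pdot = -A'P - PA + (1/gamma) P B B' P - I.
    P = p.reshape(2, 2)
    Pdot = -A.T @ P - P @ A + (1.0 / gamma) * P @ B @ B.T @ P - np.eye(2)
    return Pdot.ravel()

# Integrate backward from t = T (where P(T) = 0) down to t = 0.
sol = solve_ivp(riccati, (T, 0.0), np.zeros(4))
P0 = sol.y[:, -1].reshape(2, 2)
print(P0)   # P(0); the minimizing control is u(t) = -(1/gamma) B' P(t) x(t)
```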

C. Minimization of Functionals: Method of Steepest Descent

The problem of finding the minimum (or maximum) of functionals arises frequently in many diverse areas in applications. In this part we turn our attention to an iterative method of obtaining the minimum of a functional f defined on a real Hilbert space X.

Consider a functional f: X → R of the form
f(x) = (x, Mx) − 2(w, x) + β,  (7.10.18)
where w is a fixed vector in X, where β ∈ R, and where M is a linear self-adjoint operator having the property
c₁‖x‖² ≤ (x, Mx) ≤ c₂‖x‖²  (7.10.19)
for all x ∈ X and some constants c₂ ≥ c₁ > 0. The reader can readily verify that the functional given in Eq. (7.10.13) is a special case of f given in Eq. (7.10.18), where we make the association M = L*L + γI (provided γ > 0), w = L*v, and β = (v, v).

Under the above conditions, the equation
Mx = w  (7.10.20)
has a unique solution, say x₀, and x₀ minimizes f(x). Iterative methods are based on beginning with an initial guess to the solution of Eq. (7.10.20) and then successively attempting to improve the estimate according to a recursive relationship of the form
xₙ₊₁ = xₙ + αₙrₙ,  (7.10.21)
where αₙ ∈ R and rₙ ∈ X. Different methods of selecting αₙ and rₙ give rise to various algorithms for minimizing f(x) given in Eq. (7.10.18) or, equivalently, for finding the solution to Eq. (7.10.20). In this part we shall in particular consider the method of steepest descent. In doing so we let
rₙ = w − Mxₙ, n = 1, 2, ....  (7.10.22)
The term rₙ defined by Eq. (7.10.22) is called the residual of the approximation xₙ. If, in particular, xₙ satisfies Eq. (7.10.20), we see that the residual is zero. For f(x) given in Eq. (7.10.18), we see that
f′(xₙ) = −2rₙ,
where f′(xₙ) denotes the gradient of f at xₙ. That is, the residual rₙ is "pointing" in the direction of the negative of the gradient, or in the direction of steepest descent. Equation (7.10.21) indicates that the correction term αₙrₙ is to be a scalar multiple of the gradient, and thus the steepest descent method constitutes an example of one of the so-called "gradient methods." With rₙ given by Eq. (7.10.22), αₙ is chosen so that f(xₙ + αₙrₙ) is minimum. Substituting xₙ + αₙrₙ into Eq. (7.10.18), it is readily shown that
αₙ = (rₙ, rₙ)/(rₙ, Mrₙ)
is the minimizing value. This method is illustrated pictorially in Figure B.

7.10.23. Figure B. Illustration of the method of steepest descent.



In the following result we show that under appropriate conditions the sequence {xₙ} generated in the heuristic discussion above converges to the unique minimizing element x₀ satisfying Eq. (7.10.20).

7.10.24. Theorem. Let M ∈ B(X, X) be a self-adjoint operator such that for some pair of positive real numbers η and μ we have η‖x‖² ≤ (x, Mx) ≤ μ‖x‖² for all x ∈ X. Let x₁ ∈ X be arbitrary, let w ∈ X, and let rₙ = w − Mxₙ, where xₙ₊₁ = xₙ + αₙrₙ for n = 1, 2, ..., and αₙ = (rₙ, rₙ)/(rₙ, Mrₙ). Then the sequence {xₙ} converges to x₀, where x₀ is the unique solution to Eq. (7.10.20).

Proof. In view of the Schwarz inequality we have (x, Mx) ≤ ‖Mx‖‖x‖. This implies that η‖x‖ ≤ ‖Mx‖ for all x ∈ X, and so M is a bijective mapping by Theorem 7.4.21, with M⁻¹ ∈ B(X, X) and ‖M⁻¹‖ ≤ 1/η. By Theorem 7.4.10, M⁻¹ is also self-adjoint. Let x₀ be the unique solution to Eq. (7.10.20), and define F: X → R by
F(x) = (x − x₀, M(x − x₀)) for x ∈ X.
We see that F is minimized uniquely by x = x₀, and furthermore F(x₀) = 0. We now show that limₙ F(xₙ) = 0. If for some n, F(xₙ) = 0, the process terminates and we are done. So assume in the following that F(xₙ) ≠ 0. Note also that since M is positive, we have F(x) ≥ 0 for all x ∈ X.

We begin with the fact that
F(xₙ₊₁) = F(xₙ) − 2αₙ(rₙ, Myₙ) + αₙ²(rₙ, Mrₙ),
where we have let yₙ = x₀ − xₙ. Noting that rₙ = Myₙ, so that F(xₙ) = (yₙ, Myₙ) = (M⁻¹rₙ, rₙ), we have
[F(xₙ) − F(xₙ₊₁)]/F(xₙ) = (rₙ, rₙ)²/[(rₙ, Mrₙ)(M⁻¹rₙ, rₙ)] ≥ η/μ.
Hence, F(xₙ₊₁) ≤ (1 − η/μ)F(xₙ) ≤ (1 − η/μ)ⁿF(x₁). Thus, limₙ F(xₙ) = 0, and since η‖xₙ − x₀‖² ≤ F(xₙ), it follows that xₙ → x₀, which was to be proven. ■
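The iteration of Theorem 7.10.24 is immediately programmable when X = Rⁿ and M is a symmetric positive definite matrix. A minimal sketch (our own illustration, with random data):

```python
import numpy as np

rng = np.random.default_rng(5)
G = rng.normal(size=(6, 6))
M = G.T @ G + np.eye(6)            # self-adjoint with eta I <= M <= mu I
w = rng.normal(size=6)

x = np.zeros(6)                    # x_1: an arbitrary initial guess
for n in range(5000):
    r = w - M @ x                  # residual, Eq. (7.10.22)
    if np.linalg.norm(r) < 1e-12:  # the process terminates when F(x_n) = 0
        break
    alpha = (r @ r) / (r @ (M @ r))   # alpha_n = (r_n, r_n)/(r_n, M r_n)
    x = x + alpha * r              # Eq. (7.10.21)

print(np.allclose(M @ x, w))       # True: x_n converged to the solution x_0
```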

7.11. REFERENCES AND NOTES

Many excellent sources dealing with linear operators on Banach and Hilbert spaces include Balakrishnan [7.2], Dunford and Schwartz [7.5], Kantorovich and Akilov [7.6], Kolmogorov and Fomin [7.7], Liusternik and Sobolev [7.8], Naylor and Sell [7.11], and Taylor [7.12]. The exposition by Naylor and Sell is especially well suited from the viewpoint of applications in science and engineering.

For applications of the type considered in Section 7.10, as well as additional applications, refer to Antosiewicz and Rheinboldt [7.1], Balakrishnan [7.2], Byron and Fuller [7.3], Curtain and Pritchard [7.4], Kantorovich and Akilov [7.6], Lovitt [7.9], and Luenberger [7.10]. Applications to integral equations (see Section 7.10A) are treated in [7.3] and [7.9]. Optimal control problems (see Section 7.10B) in a Banach and Hilbert space setting are presented in [7.2], [7.4], and [7.10]. Methods for minimization of functionals (see Section 7.10C) are developed in [7.1], [7.6], and [7.10].

REFERENCES

[7.1] H. A. ANTOSIEWICZ and W. C. RHEINBOLDT, "Numerical Analysis and Functional Analysis," Chapter 14 in Survey of Numerical Analysis, ed. by J. TODD. New York: McGraw-Hill Book Company, 1962.
[7.2] A. V. BALAKRISHNAN, Applied Functional Analysis. New York: Springer-Verlag, 1976.
[7.3] F. W. BYRON and R. W. FULLER, Mathematics of Classical and Quantum Physics. Vols. I, II. Reading, Mass.: Addison-Wesley Publishing Co., Inc., 1969 and 1970.*
[7.4] R. F. CURTAIN and A. J. PRITCHARD, Functional Analysis in Modern Applied Mathematics. London: Academic Press, Inc., 1977.
[7.5] N. DUNFORD and J. SCHWARTZ, Linear Operators, Parts I and II. New York: Interscience Publishers, 1958 and 1964.
[7.6] L. V. KANTOROVICH and G. P. AKILOV, Functional Analysis in Normed Spaces. New York: The Macmillan Company, 1964.
[7.7] A. N. KOLMOGOROV and S. V. FOMIN, Elements of the Theory of Functions and Functional Analysis. Vols. I, II. Albany, N.Y.: Graylock Press, 1957 and 1961.
[7.8] L. A. LIUSTERNIK and V. J. SOBOLEV, Elements of Functional Analysis. New York: Frederick Ungar Publishing Company, 1961.
[7.9] W. V. LOVITT, Linear Integral Equations. New York: Dover Publications, Inc., 1950.
[7.10] D. G. LUENBERGER, Optimization by Vector Space Methods. New York: John Wiley & Sons, Inc., 1969.
[7.11] A. W. NAYLOR and G. R. SELL, Linear Operator Theory. New York: Holt, Rinehart and Winston, 1971.
[7.12] A. E. TAYLOR, Introduction to Functional Analysis. New York: John Wiley & Sons, Inc., 1958.

*Reprinted in one volume by Dover Publications, Inc., New York, 1992.
INDEX

A algebraic structure, 31
algebraic system, 30
algebra with identity,S 7, 105
Abelian group, 4 0 aligned,379
abstract algebra, 33 almost everywhere, 295
additive group, 46 approximate eigenvalue, 4
adherent point, 275 approximate point
adjoint system of spectrum, 4
ordinary differential approximation, 395
equations, 261 Arzela-Ascoli theorem, 316
adjoint transformation, 219, 220,4 2 2 Ascoli's lemma, 317
affine linear subspace, 85 associative algebra, 56,105
algebra, 30, 56, 57,104 associative operation, 28
algebraically closed automorphism, 64 , 68
field, 165 autonomous system of
algebraic conjugate, 110 differential equations, 241
algebraic multiplicity, 167,223 Axioms of norm, 207

475
476 Index

B closed set, 279


closed sphere, 283
Banach inverse theorem, 416 closure, 275
Banach space, 31, 345 cn,78
basis, 61 , 89 cofactor, 158
Bessel inequality, 213, 380 colinear, 379
bicompact, 302 collection of subsets, 8
bijection 14 column matrix, 132
bijective, 14, 100 column of a matriX, 132
bilinear form, 114 column rank of a matrix, 152
bilinear functional, 1141- 15 column vector, 125
binary operation, 26 commutative algebra, 57, lOS
block diagonal matrix, 175 commutative group, 4 0
Bolzano-Weierstrass commutative operation, 28
property, 302 commutative ring, 47
Bolzano-Weierstrass compact, 302
theorem, 298 compact operator, 4 7
boundary, 279 companion form, 256
bounded linear comparable matrices, 137
functional, 356 complement of a subset, 4
bounded linear opera tor, 407 completely continuous
bounded metric space, 265 operator, 4 7
bounded sequence, 286 complete metric space, 290
B(X , Y ) ,4 0 9 complete ortghonormal
set of vectors, 213,389
completion, 295
complex vector space, 76
c composite function, 16
composite mathematical
CIa, bl, 80 system, 30, 54
cancellation laws, 34 conformal matrices, 137
canonical mapping, 372 congruent matrices, 198
cardinal number, 24 conjugate functional, 114
cartesian product, 10 conjugate operator, 24 1
Cauchy-Peano constant coefficients, 241
existence theorem, 332 contact point, 275
Cauchy sequence, 290 continuation of a
Cayley-aH milton theorem, 167 solution, 336
Cayley's theorem, 66 continuous function, 307,4 0 8
characteristic equation, 166,259 continuous spectrum, 4 0
characteristic polynomial, 166 contraction mapping, 314
characteristic value, 164 converge, 286,350
characteristic vector, 164 convex, 351-355
7
~ 9 coordinate representation
classical adjoint of a of a vector, 125
matrix, 162 coordinates of a vector
closed interval, 283 with respect to a basis, 92, 124
closed relative to an countable set, 23
operation, 28 countably infinite set, 23
Index 477

covering, 299 dual, 358


cyclic group, 4 3 , 4 dual basis, 112

o E

degree of a polynomial, 70 e-approximate solution, 329


DeMorgan's laws, 7,12 e-dense set, 299
dense-in-itself, 284 e- n et,299
denumerable set, 23 e~ e nvalue, 164 , 4 3 9
derived set, 277-278 eigenvector, 164 , 4 3 9
determinant of a element, 2
linear transformation, 163 element of ordered set, 10
determinant of a matrix , 157 empty set, 3
diagonalization of a endomorphism, 64 , 68
matrix, 172 equal by definition, 10
diagonalization process, 4 5 0 equality of functions, 14
diagonal matrix, 155 equality of matrices, 132
diameter of a set, 267 equality of sets, 3
difference of sets, 7 equals relation, 26
differentiation: eq u icontinuous,316
of matrices, 247 equivalence relation, 26
of vectors, 241 equivalent matrices, lSI
dimenmon,78,92,392 equivalent metrics, 318
direct product, 10 equivalent sets, 23
direct sum of linear, error vector, 395
subspaces 83,4 5 7 estimate, 398
discrete metric, 265 Euclidean metric, 271
disjoint sets, 5 Euclidean norm, 207
disjoint vector spaces, 83 Euclidean space, 30,124 , 205
distance 264 even permutation, 156
between a point events, 397
and a set, 267 everywhere dense, 284
between sets, 267 expected value, 398
between vectors, 208 ex t ended reatline, 266
distribution function, 397 extended real numbers, 266
distributive, 28 extension of a function, 20
diverge, 286,350 extenmon of an
division algorithm, 71 operation, 29
division (of exterior, 279
polynomials), 72 ex t remum, 4 6 4
division ring, 4 6 ,50
divisor, 4 9
divisors of zero, 4 8 F
divisors of zero, 4 8
domain of a function, 12 factor, 72
domain of a relation, 25 family of disjoint sets, 12
dot product, 114 family of subsets, 8
478 Index

field, 30, 46, 50 group component, 4 6


field of complex group operation, 4 6
numbers, 51
field of real numbers, 51
finite covering, 299 H
finite-dimensional
operator, 4 5 0 aH hn-Banach theorem, 367-370
finite-dimensional half space, 366
vector space, 92, 124 aH mel basis, 89
finite group, 4 0 Hausdorff spaces, 323
finite intersection eH ine-Borel property, 302
property, 305 eH ine-Borel theorem, 299
finite linear hermitian operator, 427
combination of vectors, 85 Hilbert space, 31,377
finite set, 8 homeomorphism, 320
fixed point, 315 homogeneous property
flat, 85 of a norm, 208, 344
F n ,78 homogeneous system, 241-242
Fourier coefficients, 380, 389 homomorphic image, 62, 68
Frechet derivative, 458 homomorphic rings, 67
Fredholm equation, 97,326 homomorphic semigroups, 63
Fredholm operator, 425 homomorphism, 30, 62
function, 12 hyperplane, 364
functional, 109,355
functional analysis, 343
function space, 80
fundamental matrix, 246
fundamental sequence, 290 idempotent operator, 121
fundamental set, 246 identity:
fundamental theorem element, 35
of algebra, 74 function, I 9
fundamental theorem of matrix, 139
linear equations, 99 permutation, 19,4 4
relation, 26
transformation, 105,4 0 9
G image of a set under f, 21
indeterminate of a
Gateaux differential, 458 polynomial ring, 70
generalized associative index:
law, 36 of a nilpotent
generated subspace, 383 operator, 185
generators of a set, 60 of a symmetric
Gram matrix, 395 bilinear functional, 202
Gram-Schmidt process, 213,391 set, 10
graph of a function, 14 indexed family of sets, 10
greatest common divisor, 73 indexed set, II
Gronwall inequality, 332 induced:
group, 30, 39 mapping, 20
Index 479

induced (cont.) K
metric, 267
norm,34 9 ,4 1 2 Kalman's theorem, 4 0 14 0 2
operation, 29 kernel of a homomorphism, 65, 68
inequalities, 268-271 Kronecker delta, III
infinite-dim ensional
vector space, 92
infinite series, 350 L
infinite set, 8
initial value problem, 238-261,328-341 Laplace transform, 96
injection, 14 latent value, 164
injective, 14, 100 leading coefficient of
inner product, 117,205,375 a polynomial, 70
inner product space, 31, 118,205 eL besgue integral, 296
inner product subspace, 118 eL besgue measurable
integral domain, 4 6 , 4 9 function, 296
integration: eL besgue measurable
of matrices 249 sets, 295
of vectors 249 eL besgue measure, 295
interior, 278 left cancellation
intersection of sets,S . property, 34
invariant linear left distributive, 28
subspace, 122 left identity, 35
inverse: left inverse, 36
image 21 left invertible element, 37
of a function, IS, 100 left R-module, 54
of a matrix, 140 left solution, 4 0
of an element, 38 iL e algebra, 57
relation, 25 limit, 286
invertible element, 37 limit point, 277,288
invertible linear line segment, 351
transformation, 100 linear:
invertible matrix, 140 algebra, 33
irreducible polynomial, 74 functional, 109,355- 3 60
irreflexive,372 manifold, 81
isolated point, 275 operator, 31,95
isometric operator, 34 1 quadratic cost
isometry, 321 control,4 6 8
isomorphic, 108 space, 30, 55, 76
isomorphic semigroups, 64 subspace, 59, 81, 348
isomorphism, 30, 63, 68,108 subspace generated
by a set, 86
transformation, 30, 95, 100
J variety, 85
linearly dependent, 87
Jacobian matrix, 64 1 linearly independent, 87
Jacobi identity, 57 Lipschitz condition, 324 , 328
J o rdan canonical form, I 75, 19·1 Lipschitz c.onstant, 324 , 328
480 Index

lower triangular matrix, 176 n~imensional vector


l!p,271 space, 92
nL , 297 negative definite
(L ,X V), 104 matrix, 222
nested sequence
of sets, 298
M Neumann expansion
theorem,4 1 5
map, 13 nilpotent operator, 185
mapping, 13 non-abelian group, 4 0
mathematical system, 30 non-eommutative group, 4 0
matrices, 30 non-empty set, 3
matrix, 132 non-homogeneous system, 241-242
matrix of: non-linear
a bilinear functional, 195 transformation, 95
a linear transformation, 131 non-singular linear
one basis with respect transformation, 100
to a second basis, 149 non-singular matrix, 140
maximal linear subspace, 363 non-void set, 3
metric, 31, 209, 264 norm, 206, 344
normal:
metric space, 31 , 209, 263-342
equations, 395
metric subspace, 267
linear
minimal polynomial, 179, 181
minor of a matrix, 158 transformation, 237
operator, 34 1
modal matrix, 172
topological space, 323
modern algebra, 33
normalizing a vector, 209
module, 30, 54
normed conjugate space, 358
monic polynomial, 70
normed dual space, 358
monoid,37
normed linear space, 31,208,34 4
multiplication of a
norm of a bounded
linear transformation
linear transformation, 4 0 9
by a scalar, 104
norm preserving, 367
multiplication of
nowhere dense, 284
vectors by scalars, 76, 409
null:
multiplicative semigroup, 46
matrix, 139
multiplicity of an
set, 3
eigenvalue, 164
space, 98, 224
multivalued function, 25
vector, 76, 77
nullity of a linear
transformation, 100
N n-vector, 132

natural basis, 126


natural coordinates, 127 o
n~imensional complex
coordinate space, 78 object, 2
n-dimensional real observations, 398
coordinate space, 78 odd permutation, 156
Index 84 1

one-to-one and onto mapping, 14, 100
one-to-one mapping, 14, 100
onto mapping, 14, 100
open:
  ball, 275
  covering, 299
  interval, 282
  set, 279
  sphere, 275
operation table, 27
operator, 13
optimal control problem, 468
ordered sets, 9
order of a group, 40
order of a polynomial, 70
order of a set, 8
ordinary differential equations, 238-261
origin, 76, 77
orthogonal:
  basis, 210
  complement, 215, 382
  linear transformation, 217, 231-237
  matrix, 216, 226
  projection, 123, 433
  set of vectors, 379
  vectors, 118, 209
orthogonality principle, 399
orthonormal set of vectors, 379
outcomes, 397

P

parallel, 364
parallelogram law, 208, 379
Parseval's formula, 390
Parseval's identity, 212
partial sums, 350
partitioned matrix, 147
permutation group, 4, 54
permutation on a set, 19
piecewise continuous derivatives, 329
point of accumulation, 277
points, 264
point spectrum, 440
polarization, 116
polynomial, 69
positive definite matrix, 222
positive operator, 429
power class, 9
power set, 9
precompact, 299
predecessor of an operation, 29
pre-Hilbert space, 377
primary decomposition theorem, 183
principal minor of a matrix, 158
principle of superposition, 96
probability space, 397
product metric spaces, 274
product of:
  a matrix by a scalar, 138
  linear transformations, 105, 409
  two elements, 46, 104
  two matrices, 138
projection, 119, 226, 387
projection theorem, 387, 400
proper:
  subset, 3
  subspace, 81, 164
  value, 164
  vector, 164
Pythagorean theorem, 209, 379

Q

quadratic form, 115, 226
quotient, 72

R

radius, 275
random variable, 397
range of a function, 12
range of a relation, 25
range space, 98
rank of a linear transformation, 100
rank of a matrix, 136
rank of a symmetric bilinear functional, 202
real inner product space, 205
real line, 265
real vector space, 76
reduce, 435
reduced characteristic function, 179
reduced linear transformation, 122
reflection, 218
reflexive, 372
reflexive relation, 25
regular topological space, 323
relation, 25
relatively compact, 307
relatively prime, 73
remainder, 72
repeated eigenvalues, 173
residual, 472
residual spectrum, 440
resolution of the identity, 226, 457
resolvent set, 439
restriction of a mapping, 20
R-homomorphism, 68
Riccati equation, 471
Riemann integrable, 296
Riesz representation theorem, 393
right:
  cancellation property, 34
  distributive, 28
  identity, 34
  inverse, 35
  invertible element, 37
  R-module, 54
  solution, 40
R∞, 78
ring, 30, 46
ring of integers, 51
ring of polynomials, 70
ring with identity, 47
R-module, 54
Rn, 78
rotation, 218, 230
row of a matrix, 131
row rank of a matrix, 152
row vector, 125, 132
R*, 266
R-submodule, 58
R-submodule generated by a set, 60

S

scalar, 75
scalar multiplication, 76
Schwarz inequality, 207, 376
second dual space, 371
secular value, 164
self-adjoint linear transformation, 221, 224-225
self-adjoint operators, 428
semigroup, 30, 36
semigroup component, 46
semigroup of transformations, 4
semigroup operation, 46
separable, 284, 300
separates, 366
sequence, 11, 286
sequence of disjoint sets, 12
sequence of sets, 11
sequentially compact, 301-305
set, 1
set of order zero, 8
shift operator, 41
σ-algebra, 397
σ-field, 397
signature of a symmetric bilinear functional, 202
similarity transformation, 153
similar matrices, 153
simple eigenvalues, 164
singleton set, 8
singular linear transformation, 101
singular matrix, 140
skew-adjoint linear transformation, 221, 237
skew symmetric bilinear functional, 196
skew symmetric matrix, 196
skew symmetric part of a linear functional, 196
solution of a differential equation, 239
solution of an initial value problem, 239
space of:
  bounded complex sequences, 79
  bounded real sequences, 79
  finitely non-zero sequences, 79
  linear transformations, 104
  real-valued continuous functions, 80
span, 86
spectral theorem, 226, 455, 457
spectrum, 164, 439
sphere, 275
spherical neighborhood, 275
square matrix, 132
state transition matrix, 247-255
steepest descent, 472
strictly positive, 429
strong convergence, 373
subalgebra, 105
subcovering, 299
subdomain, 52
subfield, 52
subgroup, 41
subgroup generated by a set, 43
submatrix, 147
subring, 52
subring generated by a set, 53
subsemigroup, 40
subsemigroup generated by a set, 41
subsequence, 287
subset, 3
subsystem, 40, 46
successive approximations, 315, 324-328
sum of:
  elements, 46
  linear operators, 409
  linear transformations, 104
  matrices, 138
  sets, 82
  vectors, 76
surjective, 14, 100
Sylvester's theorem, 199
symmetric difference of sets, 7
symmetric matrix, 196, 226
symmetric part of a linear functional, 196
symmetric relation, 26
system of differential equations, 240, 255-260

T

ternary operation, 26
Ti spaces, 323
topological space, 31
topological structure, 31
topology, 280, 318, 322-323
totally bounded, 299
T', 421
trace of a matrix, 169
transformation, 13
transformation group, 54
transitive relation, 26
transpose of a linear transformation, 113, 420
transpose of a matrix, 133
transpose of a vector, 125
triangle inequality, 208, 264, 344
triangular matrix, 176
trivial ring, 48
trivial solution, 245
trivial subring, 53
truncation operator, 439
T*, 422
T^T, 113

U

unbounded linear functional, 356
unbounded metric space, 265
uncountable set, 23
uniform convergence, 313
uniformly continuous, 308
union of sets, 5
unit, 37
unitary operator, 431
unitary space, 205
unit of a ring, 47
unit vector, 209
unordered pair of elements, 9
upper triangular matrix, 176
usual metric for R*, 266, 320
usual metric on R, 265
usual metric on Rn, 271

V

vacuous set, 3
Vandermonde matrix, 260
variance, 398
vector, 75
vector addition, 75
vector space, 30, 55, 76
vector space of n-tuples over F, 56
vector space over a field, 76
vector subspace, 59
Venn diagram, 8
void set, 3
Volterra equation, 327
Volterra integral equation, 97

W

weak convergence, 373
weakly continuous, 375
weak* compact, 375
weak-star convergence, 373
Weierstrass approximation theorem, 285
Wronskian, 256-259

XYZ

CX, 357
X*, 357-358
zero:
  polynomial, 70
  transformation, 104, 409
  vector, 76, 77
Zorn's lemma, 390
REVIEWS OF
Algebra and Analysis
for Engineers and Scientists

"This book is a useful compendium of the mathematics of (mostly) finite-di-


mensionallinear vector spaces (plus two final chapters on infinite-dimensional
spaces), which do find increasing application in many branches of engineering
and science .... The treatment is thorough; the book will certainly serve as a valu-
able reference." - A merican Scientist
"The authors present topics in algebra and analysis for students in engineering
and science .... Each chapter is organized to include a brief overview, detailed
topical discussions and references for further study. Notes about the references
guide the student to collateral reading. Theorems, definitions, and corollaries are
illustrated with examples. The student is encouraged to prove some theorems
and corollaries as models for proving others in exercises. In most chapters, the
authors discuss constructs used to illustrate examples of applications. Discus-
sions are tied together by frequent, well written notes. The tables and index are
good. The type faces are nicely chosen. The text should prepare a student well in
mathematical matters." - S cience Books and iF lms
"This is an intermediate level text, with exercises, whose avowed purpose is to
provide the science and engineering graduate student with an appropriate mod-
ern mathematical (analysis and algebra) background in a succinct, but nontrivial,
manner. After some fundamentals, algebraic structures are introduced followed
by linear spaces, matrices, metric spaces, normed and inner product spaces and
linear operators.... While one can quarrel with the choice of specific topics and
the omission of others, the book is quite thorough and can serve as a text, for
self-study or as a reference." - M athematical Reviews
"The authors designed a typical work from graduate mathematical lectures: for-
mal definitions, theorems, corollaries, proofs, examples, and exercises. It is to
be noted that problems to challenge students' comprehension are interspersed
throughout each chapter rather than at the end." - C H O ICE