Anthony N. Michel
Charles J. Herget

Birkhäuser
Boston • Basel • Berlin

Anthony N. Michel
Department of Electrical Engineering
University of Notre Dame
Notre Dame, IN 46556
U.S.A.

Charles J. Herget
Herget Associates
P.O. Box 1425
Alameda, CA 94501
U.S.A.
Mathematics Subject Classification (2000): 03Exx, 03E20, 08-XX, 08-01, 15-XX, 15-01, 15A03,
15A04, 15A06, 15A09, 15A15, 15A18, 15A21, 15A57, 15A60, 15A63, 20-XX, 20-01, 26-XX,
26-01, 26Axx, 26A03, 26A15, 26Bxx, 34-XX, 34-01, 34Axx, 34A12, 34A30, 34H05, 45B05, 46-XX,
46-01, 46Axx, 46A22, 46A50, 46A55, 46Bxx, 46B20, 46B25, 46Cxx, 46C05, 46Exx, 46N10, 46N20,
47-XX, 47-01, 47Axx, 47A05, 47A07, 47A10, 47A25, 47A30, 47A67, 47B15, 47H10, 47N20, 47N70,
54-XX, 54-01, 54A20, 54Cxx, 54C05, 54C30, 54Dxx, 54D05, 54D30, 54D35, 54D45, 54E35, 54E45,
54E50, 93E10
All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Birkhäuser Boston, c/o Springer Science+Business Media LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

9 8 7 6 5 4 3 2 1

www.birkhauser.com (IBT)
CONTENTS
PREFACE IX
1.1 Sets 1
1.2 Functions 12
1.3 Relations and Equivalence Relations 25
1.4 Operations on Sets 26
1.5 Mathematical Systems Considered in This Book 30
1.6 References and Notes 31
References 32
PREFACE
This book evolved from a one-year sequence of courses offered by the authors
at Iowa State University. The audience for this book typically included theoreti-
cally oriented first- or second-year graduate students in various engineering or
science disciplines. Subsequently, while serving as Chair of the Department of
Electrical Engineering, and later, as Dean of the College of Engineering at the
University of Notre Dame, the first author continued using this book in courses
aimed primarily at graduate students in control systems. Since administrative
demands precluded the possibility of regularly scheduled classes, the Socratic
method was used in guiding students in self study. This method of course deliv-
ery turned out to be very effective and satisfying to student and teacher alike.
Feedback from colleagues and students suggests that this book has been used in
a similar manner elsewhere.
The original objectives in writing this book were to provide the reader with ap-
propriate mathematical background for graduate study in engineering or science;
to provide the reader with appropriate prerequisites for more advanced subjects
in mathematics; to allow the student in engineering or science to become famil-
iar with a great deal of pertinent mathematics in a rapid and efficient manner
without sacrificing rigor; to give the reader a unified overview of applicable
mathematics, thus enabling him or her to choose additional courses in math-
ematics more intelligently; and to make it possible for the student to understand
at an early stage of his or her graduate studies the mathematics used in the
current literature.
The prerequisites for this book include the usual background in undergraduate
mathematics offered to students in engineering or in the sciences at universities
in the United States. Thus, in addition to graduate students, this book is suit-
able for advanced senior undergraduate students as well, and for self study by
practitioners.
Concerning the labeling of items in the book, some comments are in order. Sec-
tions are assigned numerals that reflect the chapter and the section numbers. For
example, Section 2.3 signifies the third section in the second chapter. Extensive
sections are usually divided into subsections identified by upper-case com-
mon letters A, B, C, etc. Equations, definitions, theorems, corollaries, lemmas,
examples, exercises, figures, and special remarks are assigned monotonically
increasing numerals which identify the chapter, section, and item number. For
example, Theorem 4.4.7 denotes the seventh identified item in the fourth section
of Chapter 4. This theorem is followed by Eq. (4.4.8), the eighth identified item
in the same section. Within a given chapter, figures are identified by upper-case
letters A, B, C, etc., while outside of the chapter, the same figure is identified
by the above numbering scheme. Finally, the end of a proof or of an example is
signified by the symbol •.
A one-semester course
Chapters 1, 3, 4, 5, and Sections 6.1 and 6.11 in Chapter 6 can serve as the basis
for a one-semester course, emphasizing basic aspects of Linear Algebra and
Analysis in a metric space setting.
The coverage of Chapter 1 should concentrate primarily on functions (Sec-
tion 1.2) and relations and equivalence relations (Section 1.3), while the material
concerning sets (Section 1.1) and operations on sets (Section 1.4) may be cov-
ered as reading assignments. On the other hand, Section 1.5 (on mathematical
systems) merits formal coverage, since it gives the student a good overview of
the book's aims and contents.
The material in this book has been organized so that Chapter 2, which ad-
dresses the important algebraic structures encountered in Abstract Algebra, may
be omitted without any loss of continuity. In a one-semester course emphasizing
Linear Algebra, this chapter may be omitted in its entirety.
In Chapter 3, which addresses general vector spaces and linear transforma-
tions, the material concerning linear spaces (Section 3.1), linear subspaces and
direct sums (Section 3.2), linear independence and bases (Section 3.3), and lin-
ear transformations (Section 3.4) should be covered in its entirety, while selected
topics on linear functionals (Section 3.5), bilinear functionals (Section 3.6), and
projections (Section 3.7) should be deferred until they are required in Chapter 4.
Chapter 4 addresses finite-dimensional vector spaces and linear transforma-
tions (matrices) defined on such spaces. The material on determinants (Section
4.4) and some of the material concerning linear transformations on Euclidean
vector spaces (Subsections 4.10D and 4.10E), as well as applications to ordinary
differential equations (Section 4.11) may be omitted without any loss of conti-
nuity. The emphasis in this chapter should be on coordinate representations of
vectors (Section 4.1), the representation of linear transformations by matrices
and the properties of matrices (Section 4.2), equivalence and similarity of ma-
trices (Section 4.3), eigenvalues and eigenvectors (Section 4.5), some canonical
forms of matrices (Section 4.6), minimal polynomials, nilpotent operators and
the Jordan canonical form (Section 4.7), bilinear functionals and congruence
(Section 4.8), Euclidean vector spaces (Section 4.9), and linear transformations
on Euclidean vector spaces (Subsections 4.10A, 4.10B, and 4.10C).
Chapter 5 addresses metric spaces, which constitute some of the most impor-
tant topological spaces. In a one-semester course, the emphasis in this chapter
should be on the definition of metric space and the presentation of important
classes of metric spaces (Sections 5.1 and 5.3), open and closed sets (Sec-
tion 5.4), complete metric spaces (Section 5.5), compactness (Section 5.6), and
continuous functions (Section 5.7). The development of many classes of metric
spaces requires important inequalities, including the Holder and the Minkowski
inequalities for finite and infinite sums and for integrals. These are presented
in Section 5.2 and need to be included in the course. Sections 5.8 and 5.10 ad-
dress specific applications and may be omitted without any loss of continuity.
However, time permitting, the material in Section 5.9, concerning equivalent and
homeomorphic metric spaces and topological spaces, should be considered for
inclusion in the course, since it provides the student a glimpse into other areas
of mathematics.
To demonstrate mathematical systems endowed with both algebraic and to-
pological structures, the one-semester course should include the material of
Sections 6.1 and 6.2 in Chapter 6, concerning normed linear spaces (resp., Ban-
ach spaces) and inner product spaces (resp., Hilbert spaces), respectively.
A two-semester course
In addition to the material outlined above for a one-semester course, a two-se-
mester course should include most of the material in Chapters 2, 6, and 7.
Chapter 2 addresses algebraic structures. The coverage of semigroups and
groups, rings and fields, and modules, vector spaces and algebras (Section 2.1)
should be in sufficient detail to give the student an appreciation of the various
algebraic structures summarized in Figure B on page 61. Important mappings
defined on these algebraic structures (homomorphisms) should also be empha-
sized (Section 2.2) in a two-semester course, as should the brief treatment of
polynomials in Section 2.3.
The first ten sections of Chapter 6 address normed linear spaces (resp., Ban-
ach spaces) while the next four sections address inner product spaces (resp.,
Hilbert spaces). The last section of this chapter, which includes applications (to
random variables and estimates of random variables), may be omitted without
any loss of continuity. The material concerning normed linear spaces (Sec-
tion 6.1), linear subspaces (Section 6.2), infinite series (Section 6.3), convex sets
(Section 6.4), linear functionals (Section 6.5), finite-dimensional spaces (Sec-
tion 6.6), inner product spaces (Section 6.11), orthogonal complements (Section
6.12), and Fourier series (Section 6.13) should be covered in its entirety. Cov-
erage of the material on geometric aspects of linear functionals (Section 6.7),
extensions of linear functionals (Section 6.8), dual space and second dual space
(Section 6.9), weak convergence (Section 6.10), and the Riesz representation
theorem (Section 6.14) should be selective and tailored to the availability of time
and the students' areas of interest. (For example, students interested in optimiza-
tion and estimation problems may want a detailed coverage of the Hahn-Banach
theorem included in Section 6.8.)
Chapter 7 addresses (bounded) linear operators defined on Banach and Hilbert
spaces. The first nine sections of this chapter should be covered in their entirety
in a two-semester course. The material of this chapter includes bounded lin-
ear transformations (Section 7.1), inverses (Section 7.2), conjugate and adjoint
operators (Section 7.3), Hermitian operators (Section 7.4), normal, projection,
unitary and isometric operators (Section 7.5), the spectrum of an operator (Sec-
tion 7.6), completely continuous operators (Section 7.7), the spectral theorem
for completely continuous normal operators (Section 7.8), and differentiation of
(not necessarily linear and bounded) operators (Section 7.9). The last section,
which includes applications to integral equations, an example from optimal con-
trol, and minimization of functionals by the method of steepest descent, may be
omitted without loss of continuity.
Both one-semester and two-semester courses offered by the present authors,
based on this book, usually included a project conducted by each course par-
ticipant to demonstrate the applicability of the course material. Each project
involved a formal presentation to the entire class at the end of the semester.
The courses described above were also offered using the Socratic method, fol-
lowing the outlines given above. These courses typically involved half a dozen
participants. While most of the material was self taught by the students them-
selves, the classroom meetings served as a forum for guidance, clarifications, and
challenges by the teacher, usually resulting in lively discussions of the subject on
hand not only among teacher and students, but also among students themselves.
For the current printing of this book, we have created a supplementary web-
site of additional resources for students and instructors: http://Michel.Herget.
net. Available at this website are additional current references concerning the
subject matter of the book and a list of several areas of applications (including
references). Since the latter reflects mostly the authors' interests, it is by defini-
tion rather subjective. Among several additional items, the website also includes
some reviews of the present book. In this regard, the authors would like to invite
readers to submit reviews of their own for inclusion into the website.
The present publication of Algebra and Analysis for Engineers and Scientists
was made possible primarily because of Tom Grasso, Birkhäuser's Computational Sciences and Engineering Editor, whom we would like to thank for his considerations and professionalism.

Anthony N. Michel
Charles J. Herget
Summer 2007
1

FUNDAMENTAL CONCEPTS

1.1. SETS
In this case we say that "x belongs to A," or "x is contained in A," or "x is
a member of A," etc. If x is any element and if A is a set, then we assume that
one knows whether x belongs to A or whether x does not belong to A. If x
does not belong to A we write

x ∉ A.
To illustrate some of the concepts, we assume that the reader is familiar
with the set of real numbers. Thus, if we say
R is the set of all real numbers,
then this is a well defined collection of objects. We point out that it is possible
to characterize the set of real numbers in a purely abstract manner based on
an axiomatic approach. We shall not do so here.
To illustrate a non-well defined collection of objects, consider the state-
ment "the set of all tall people in Ames, Iowa." This is clearly not precise
enough to be considered here.
We will agree that any set A may not contain any given element x more
than once unless we explicitly say so. Moreover, we assume that the concept
of "order" will play no role when representing elements of a set, unless we
say so. Thus, the sets A = {a, b, c} and B = {c, b, a} are to be viewed as being
exactly the same set.
We usually do not describe a set by listing every element between the
curly brackets { } as we did for set A above. A convenient method of charac-
terizing sets is as follows. Suppose that for each element x of a set A there is a
statement P(x) which is either true or false. We may then define a set B which
consists of all elements x ∈ A such that P(x) is true, and we may write

B = {x ∈ A: P(x) is true}.

For example, let A denote the set of all people who live in Ames, Iowa, and
let B denote the set of all males who live in Ames. We can write, then,

B = {x ∈ A: x is a male}.

When it is clear which set x belongs to, we sometimes write {x: P(x) is true}
(instead of, say, {x ∈ A: P(x) is true}).
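The set-builder notation above maps directly onto a set comprehension. The following Python sketch is not from the text; the set A and the predicate P are illustrative choices.

```python
# A: a finite set; P: a statement that is either true or false for each x in A.
A = {1, 2, 3, 4, 5, 6}

def P(x):
    return x % 2 == 0  # "x is even"

# B = {x in A : P(x) is true}
B = {x for x in A if P(x)}
print(B)
```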
It is also necessary to consider a set which has no members. Since a set is
determined by its elements, there is only one such set which is called the
empty set, or the vacuous set, or the null set, or the void set and which is
denoted by ∅. Any set, A, consisting of one or more elements is said to be
non-empty or non-void. If A is non-void we write A ≠ ∅.
If A and B are sets and if every element of B also belongs to A, then we
say that B is a subset of A or A includes B, and we write B ⊂ A or A ⊃ B.
Furthermore, if B ⊂ A and if there is an x ∈ A such that x ∉ B, then we
say that B is a proper subset of A. Some texts make a distinction between
proper subset and any subset by using the notation ⊂ and ⊆, respectively.
We shall not use the symbol ⊆ in this book. We note that if A is any set,
then ∅ ⊂ A. Also, ∅ ⊂ ∅. If B is not a subset of A, we write B ⊄ A or
A ⊅ B.
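These subset notions have direct counterparts in Python's set operators; a short sketch (the specific sets are illustrative, not from the text):

```python
A = {1, 2, 3}
B = {1, 2}

# B ⊂ A: every element of B also belongs to A.
assert B.issubset(A)
# B is a proper subset of A: B ⊂ A and some x ∈ A is not in B.
assert B < A
# The empty set is a subset of any set, and ∅ ⊂ ∅.
assert set() <= A and set() <= set()
```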
1.1.1. Example. Let R denote the set of all real numbers, let Z denote
the set of all integers, let J denote the set of all positive integers, and let Q
denote the set of all rational numbers. We could alternately describe the set
Z as

Z = {x ∈ R: x is an integer}.

Thus, for every x ∈ R, the statement x is an integer is either true or false.
We frequently also specify sets such as J in the following obvious manner,

J = {x ∈ Z: x = 1, 2, ...}.

We can specify the set Q as

Q = {x ∈ R: x = p/q, where p, q ∈ Z and q ≠ 0}.
We emphasize that all definitions are "if and only if" statements. Thus, in
the above definition we should actually have said: A and B are equal if and
only if A ⊂ B and B ⊂ A. Since this is always understood, hereafter all
definitions will imply the "only if" portion. Thus, we simply say: two sets A
and B are said to be equal if A ⊂ B and B ⊂ A.
In Definition 1.1.2 we introduced two concepts of equality, one of equality
of sets and one of equality of elements. We shall encounter many forms of
equality throughout this book.
The proofs given in parts (i) and (iv) of Theorem 1.1.4 are intentionally
quite detailed in order to demonstrate the exact procedure required to prove
(xviii) (A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C);
(xix) (A ∪ B)⁻ = A⁻ ∩ B⁻; and
(xx) (A ∩ B)⁻ = A⁻ ∪ B⁻.
Proof. We only prove part (xviii) of this theorem, again as an illustration of
the manipulations involved. We will first show that (A ∩ B) ∪ C ⊂ (A ∪ C)
∩ (B ∪ C), and then we show that (A ∩ B) ∪ C ⊃ (A ∪ C) ∩ (B ∪ C).
Clearly, if (A ∩ B) ∪ C = ∅, the assertion is true. So let us assume that
(A ∩ B) ∪ C ≠ ∅, and let x be any element of (A ∩ B) ∪ C. Then x ∈
A ∩ B or x ∈ C. Suppose x ∈ A ∩ B. Then x belongs to both A and B, and
hence x ∈ A ∪ C and x ∈ B ∪ C. From this it follows that x ∈ (A ∪ C)
∩ (B ∪ C). On the other hand, let x ∈ C. Then x ∈ A ∪ C and x ∈ B ∪ C,
and hence x ∈ (A ∪ C) ∩ (B ∪ C). Thus, if x ∈ (A ∩ B) ∪ C, then
x ∈ (A ∪ C) ∩ (B ∪ C), and we have

(A ∩ B) ∪ C ⊂ (A ∪ C) ∩ (B ∪ C). (1.1.8)

To show that (A ∩ B) ∪ C ⊃ (A ∪ C) ∩ (B ∪ C) we need to prove the
assertion only when (A ∪ C) ∩ (B ∪ C) ≠ ∅. So let x be any element of
(A ∪ C) ∩ (B ∪ C). Then x ∈ A ∪ C and x ∈ B ∪ C. Since x ∈ A ∪ C,
then x ∈ A or x ∈ C. Furthermore, x ∈ B ∪ C implies that x ∈ B or
x ∈ C. We know that either x ∈ C or x ∉ C. If x ∈ C, then x ∈ (A ∩ B)
∪ C. If x ∉ C, then it follows from the above comments that x ∈ A and
also x ∈ B. Then x ∈ A ∩ B, and hence x ∈ (A ∩ B) ∪ C. Thus, if x ∉ C,
then x ∈ (A ∩ B) ∪ C. Since this exhausts all the possibilities, we conclude
that

(A ∪ C) ∩ (B ∪ C) ⊂ (A ∩ B) ∪ C. (1.1.9)

From (1.1.8) and (1.1.9) it follows that (A ∪ C) ∩ (B ∪ C) = (A ∩ B)
∪ C. •
1.1.10. Exercise. Prove parts (i) through (xvii) and parts (xix) and (xx)
of Theorem 1.1.7.
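Identities such as (xviii), (xix), and (xx) can also be checked mechanically on a small finite universe by brute force over all subsets. The sketch below is illustrative only (the universe X is an arbitrary choice), and of course a finite check is no substitute for the proof above.

```python
from itertools import chain, combinations

X = {1, 2, 3, 4}

def subsets(s):
    """All subsets of the finite set s."""
    s = list(s)
    return [set(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

comp = lambda s: X - s  # complement relative to X

for A in subsets(X):
    for B in subsets(X):
        for C in subsets(X):
            # (xviii): (A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C)
            assert (A & B) | C == (A | C) & (B | C)
        # (xix) and (xx): De Morgan's laws
        assert comp(A | B) == comp(A) & comp(B)
        assert comp(A & B) == comp(A) | comp(B)
print("verified")
```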
⋂_{i=1}^n A_i = A₁ ∩ A₂ ∩ ... ∩ Aₙ = {x ∈ X: x ∈ A_i for all i = 1, ..., n}.
(i) [⋃_{i=1}^n A_i]⁻ = ⋂_{i=1}^n A_i⁻, (1.1.12)

and

(ii) [⋂_{i=1}^n A_i]⁻ = ⋃_{i=1}^n A_i⁻. (1.1.13)
The results expressed in Eqs. (1.1.12) and (1.1.13) are usually referred
to as De Morgan's laws. We will see later in this section that these laws hold
under more general conditions.
Next, let A and B be two subsets of X. We define the difference of B and
A, denoted (B - A), as the set of elements in B which are not in A, i.e.,

B - A = {x ∈ X: x ∈ B and x ∉ A}.

We note here that A is not required to be a subset of B. It is clear that

B - A = B ∩ A⁻.

Now let A and B again be subsets of the set X. The symmetric difference
of A and B is denoted by A Δ B and is defined as

A Δ B = (A - B) ∪ (B - A).
The following properties follow immediately.
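Python's set type implements the symmetric difference directly, which makes the defining identity easy to exhibit; a brief sketch with illustrative sets (not from the text):

```python
A = {1, 2, 3, 4}
B = {3, 4, 5}

# A Δ B = (A - B) ∪ (B - A)
delta = (A - B) | (B - A)
assert delta == A ^ B      # ^ is Python's built-in symmetric difference
assert A ^ B == B ^ A      # Δ is commutative
assert A ^ A == set()      # A Δ A = ∅
assert A ^ set() == A      # A Δ ∅ = A
```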
In passing, we point out that the use of Venn diagrams is highly useful in
visualizing properties of sets; however, under no circumstances should such
diagrams take the place of a proof. In Figure A we illustrate the concepts of
union, intersection, difference, and symmetric difference of two sets, and the
complement of a set, by making use of Venn diagrams. Here, the shaded
regions represent the indicated sets.
[Figure A: Venn diagrams illustrating A ∪ B, A ∩ B, A - B, A Δ B, and the complement of a set in X.]
1.1.20. Example. The power class of the empty set, 𝒫(∅) = {∅}, i.e.,
the singleton of ∅. The power class of a singleton, 𝒫({a}) = {∅, {a}}. For the
set A = {a, b}, 𝒫(A) = {∅, {a}, {b}, {a, b}}. In general, if A is a finite set with
n elements, then 𝒫(A) contains 2ⁿ elements. •
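The 2ⁿ count can be confirmed by enumerating subsets; a sketch (the helper name `power_class` is ours, not the book's):

```python
from itertools import chain, combinations

def power_class(A):
    """All subsets of the finite set A, as a list of frozensets."""
    s = list(A)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

assert power_class(set()) == [frozenset()]          # P(∅) = {∅}
assert len(power_class({'a', 'b'})) == 4            # {∅, {a}, {b}, {a, b}}
assert len(power_class({1, 2, 3, 4, 5})) == 2 ** 5  # 2^n elements in general
```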
Before proceeding further, it should be pointed out that a free and uncritical
use of set theory can lead to contradictions and that set theory has had
a careful development with various devices used to exclude the contradictions.
Roughly speaking, contradictions arise when one uses sets which are "too
big," such as trying to speak of a set which contains everything. In all of our
subsequent discussions we will keep away from these contradictions by always
having some set or space X fixed for a given discussion and by considering
only sets whose elements are elements of X, or sets (collections) whose elements
are subsets of X, or sets (families) whose elements are collections of
subsets of X, etc.
Let us next consider ordered sets. Above, we defined set in such a manner
that the ordering of the elements is immaterial, and furthermore that each
element is distinct. Thus, if a and b are elements of X, then {a, b} = {b, a};
i.e., there is no preference given to a or b. Furthermore, we have {a, a, b}
= {a, b}. In this case we sometimes speak of an unordered pair {a, b}.
Frequently, we will need to consider the ordered pair (a, b), (a and b need
not belong to the same set) where we distinguish between the first element a
and the second element b. In this case (a, b) = (u, v) if and only if u = a and
v = b. Thus, (a, b) ≠ (b, a) if a ≠ b. Also, we will consider ordered triplets
(a, b, c), ordered quadruplets (a, b, c, d), etc., where we need to distinguish
between the first element, second element, third element, fourth element,
etc. Ordered pairs, ordered triplets, ordered quadruplets, etc., are examples of
ordered sets.
We point out here that our characterization of ordered sets is not axiom-
atic, since we are assuming that the reader knows what is meant by the first
is called an indexed set. Here again, we agree to permit the possibility that the
elements x_α, α ∈ I need not be distinct. Clearly, if I is a finite non-void set,
then an indexed set is simply an ordered set.
In the next definition, and throughout the remainder of this section, J
denotes the set of positive integers.
If K = ∅, we define

⋃_{α∈∅} A_α = ∅ and ⋂_{α∈∅} A_α = X.
The union and intersection of families of sets which are not necessarily
indexed is defined in a similar fashion. Thus, if ℱ is any non-void family of
subsets of X, then we define

⋃_{F∈ℱ} F = {x ∈ X: x ∈ F for some F ∈ ℱ}

and

⋂_{F∈ℱ} F = {x ∈ X: x ∈ F for all F ∈ ℱ}.
When, in Definition 1.1.26, K is of the form K = {k, k + 1, k + 2, ...},
where k is an integer, we sometimes write ⋃_{n=k}^∞ A_n and ⋂_{n=k}^∞ A_n.
1.1.28. Example. Let X = R, the set of real numbers, and let I = J. Let

A_n = {x: -n < x < n + 1}.

Then ⋃_{n=1}^∞ A_n = R and ⋂_{n=1}^∞ A_n = {x: -1 < x < 2}. Let

B_n = {x ∈ R: -1/n < x < 1 + 1/n}.

Then ⋃_{n=1}^∞ B_n = {x: -1 < x < 2} and ⋂_{n=1}^∞ B_n = {x: 0 ≤ x ≤ 1}. •
(iii) B - ⋃_{α∈K} A_α = ⋂_{α∈K} (B - A_α);
(iv) B - ⋂_{α∈K} A_α = ⋃_{α∈K} (B - A_α);
(v) [⋃_{α∈K} A_α]⁻ = ⋂_{α∈K} A_α⁻; and
(vi) [⋂_{α∈K} A_α]⁻ = ⋃_{α∈K} A_α⁻.
1.1.30. Exercise. Prove Theorem 1.1.29.
Parts (v) and (vi) of Theorem 1.1.29 are called De Morgan's laws.
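For a finite indexed family, De Morgan's laws and the difference identities can be checked directly; the sketch below uses an arbitrary illustrative family (not from the text):

```python
X = set(range(10))
family = {0: {1, 2, 3}, 1: {2, 3, 4}, 2: {3, 4, 5}}  # indexed sets A_k, k in K

union = set().union(*family.values())
inter = set.intersection(*family.values())
comp = lambda s: X - s  # complement relative to X

# (v): [∪ A_k]⁻ = ∩ A_k⁻ ;  (vi): [∩ A_k]⁻ = ∪ A_k⁻
assert comp(union) == set.intersection(*[comp(a) for a in family.values()])
assert comp(inter) == set().union(*[comp(a) for a in family.values()])

# (iii) and (iv): differences distribute dually over the family
B = {0, 2, 4, 6, 8}
assert B - union == set.intersection(*[B - a for a in family.values()])
assert B - inter == set().union(*[B - a for a in family.values()])
```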
We conclude the present section with the following:
1.1.31. Definition. Let ℱ be any family of subsets of X. ℱ is said to be a
family of disjoint sets if for all A, B ∈ ℱ such that A ≠ B, we have A ∩ B = ∅.
A sequence of sets {E_n} is said to be a sequence of disjoint sets if for every m,
n ∈ J such that m ≠ n, E_m ∩ E_n = ∅.
1.2. FUNCTIONS
The terms mapping, map, operator, transformation, and function are used
interchangeably. When using the term mapping, we usually say "a mapping
of X into Y." Although the distinction between the words "of X " and "from
X " is immaterial, as we shall see, the wording "into Y " becomes important
as opposed to the wording "onto Y," which we will encounter later.
Sometimes it is convenient not to insist that the domain of definition of f
be all of X; i.e., a function is sometimes defined on a subset of X rather than
on all of X. In any case, the domain of definition of f is denoted by 𝔇(f) ⊂ X.
Unless specified otherwise, we shall always assume that 𝔇(f) = X.
Intuitively, a function f is a "rule" whereby for each x ∈ X a unique y ∈ Y
is assigned to x. When viewed in this manner, the term mapping is quite
descriptive. However, defining a function as a "rule" involves usage of yet
another undefined term.
Concerning functions, some additional comments are in order.
1. So-called "multivalued functions" are not allowed by the above
definition. They will be treated later under the topic of relations
(Section 1.3).
2. The set X (or Y) may be the Cartesian product of sets, e.g., X = X₁
× X₂ × ... × Xₙ. In this case we think of f as being a function of n
variables. We write f(x₁, ..., xₙ) to denote the value of f at (x₁, ...,
xₙ) ∈ X = X₁ × ... × Xₙ.
3. It is important that the distinction between a function and the value
of a function be clearly understood. The value of a function, f(x),
is an element of Y. The function f is a much larger entity, and it is
to be thought of as a single object. Note that f ∈ 𝒫(X × Y) (the power
set of X × Y), but not every element of 𝒫(X × Y) is a function. The
set of all functions from X into Y is a subset of 𝒫(X × Y) and is some-
times denoted by Y^X.
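The view of a function as a single-valued subset of X × Y can be made concrete; this sketch (with illustrative sets and a helper name of our own choosing) also counts Y^X:

```python
from itertools import product

X = {'a', 'b'}
Y = {0, 1, 2}

def is_function(f, X, Y):
    """f ⊂ X × Y is a function iff each x in X appears in exactly one pair."""
    return (all(x in X and y in Y for x, y in f)
            and len({x for x, _ in f}) == len(f) == len(X))

f = {('a', 0), ('b', 2)}
assert is_function(f, X, Y)
assert not is_function({('a', 0), ('a', 1), ('b', 2)}, X, Y)  # "multivalued"

# The set of all functions from X into Y has |Y|^|X| elements.
all_fns = [set(zip(sorted(X), values))
           for values in product(sorted(Y), repeat=len(X))]
assert len(all_fns) == len(Y) ** len(X)
```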
1.2.3. Example. Let R denote the real numbers, and let f be a function from
R into R whose value at each x ∈ R is given by f(x) = sin x. The function
f is the sine function. Expressed explicitly as a set, we see that f = {(x, y):
x ∈ R and y = sin x}.
[Figure: diagrams of four mappings. f₁: X₁ → Y₁ is into; f₂: X₂ → Y₂ is onto; f₃: X₃ → Y₃ is (1-1); f₄: X₄ → Y₄ is bijective.]
Note that in the above definition, the domain of f⁻¹ is ℜ(f), which need
not be all of Y.
Some texts insist that in order for a function f to have an inverse, it must
be bijective. Thus, when reading the literature it is important to note which
definition of f⁻¹ the author has in mind. (Note that an injective function
f: X → Y is a bijective function from X onto ℜ(f).)
We also have
1.2.18. Theorem. If f is a (1-1) mapping of a set X onto a set Y, and if g
is a (1-1) mapping of the set Y onto a set Z, then g ∘ f is a (1-1) mapping of X
onto Z.
1.2.19. Exercise. Prove Theorem 1.2.18.
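Theorem 1.2.18 can be illustrated on small finite sets, representing mappings as dictionaries; the particular sets below are our own illustrative choices:

```python
f = {'r': 1, 's': 2, 't': 3}      # a (1-1) mapping of X = {r, s, t} onto Y
g = {1: 'A', 2: 'B', 3: 'C'}      # a (1-1) mapping of Y onto Z = {A, B, C}

def is_bijective(h, codomain):
    """h is (1-1) and onto the given codomain."""
    return len(set(h.values())) == len(h) and set(h.values()) == set(codomain)

g_of_f = {x: g[f[x]] for x in f}  # (g ∘ f)(x) = g(f(x))
assert is_bijective(f, g.keys())
assert is_bijective(g, 'ABC')
assert is_bijective(g_of_f, 'ABC')  # g ∘ f is (1-1) from X onto Z
```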
Next we prove:
f = ( r s t u
      u w v x ).
That is, the top row identifies the domain of f and the bottom row contains
each unique element in the range of f directly below the appropriate element
in the domain. Clearly, this representation can be used for any function
defined on a finite set. In a similar fashion, let the function g: B → C be
defined as
g = ( u v w x
      x w z y ).
Clearly, both f and g are bijective. Also, g ∘ f is the (1-1) mapping of A onto
C given by

g ∘ f = ( r s t u
          x z w y ).

Furthermore,

f⁻¹ = ( u w v x           g⁻¹ = ( x w z y
        r s t u ),                u v w x ),

(g ∘ f)⁻¹ = ( x z w y
              r s t u ).
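Treating the two-row arrays of this example as dictionaries (our reconstruction of the garbled original), composition and inversion become one-liners, and the identity (g ∘ f)⁻¹ = f⁻¹ ∘ g⁻¹ can be verified:

```python
# f: A → B and g: B → C from the example above, as dictionaries.
f = {'r': 'u', 's': 'w', 't': 'v', 'u': 'x'}
g = {'u': 'x', 'v': 'w', 'w': 'z', 'x': 'y'}

g_of_f = {a: g[f[a]] for a in f}
assert g_of_f == {'r': 'x', 's': 'z', 't': 'w', 'u': 'y'}

inv = lambda h: {v: k for k, v in h.items()}  # valid since f and g are bijective
f_inv, g_inv = inv(f), inv(g)

# (g ∘ f)⁻¹ = f⁻¹ ∘ g⁻¹
assert inv(g_of_f) == {c: f_inv[g_inv[c]] for c in g_of_f.values()}
```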
Now
1.2.22. Theorem. Let W, X, Y, and Z be non-void sets. If f is a mapping
of set W into set X, if g is a mapping of X into set Y, and if h is a mapping
of Y into set Z (sets W, X, Y, Z are not necessarily distinct), then h ∘ (g ∘ f)
= (h ∘ g) ∘ f.
1.2.23. Exercise. Prove Theorem 1.2.22.
Thus, h ∘ (g ∘ f) and (h ∘ g) ∘ f have the same two-row representation;
i.e., h ∘ (g ∘ f) = (h ∘ g) ∘ f. •
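The associativity asserted in Theorem 1.2.22 is easy to exhibit on finite mappings; the sets and element names below are illustrative, not the book's:

```python
f = {'w1': 'x1', 'w2': 'x2'}  # f: W → X
g = {'x1': 'y1', 'x2': 'y2'}  # g: X → Y
h = {'y1': 'z1', 'y2': 'z2'}  # h: Y → Z

compose = lambda outer, inner: {k: outer[inner[k]] for k in inner}

# h ∘ (g ∘ f) = (h ∘ g) ∘ f
assert compose(h, compose(g, f)) == compose(compose(h, g), f)
```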
We also have:
j = ( u v x y z        ĵ = ( u v x y z
      n p q r s ),           n p q n t ).

Then j and ĵ are two different extensions of f. Moreover, f is the mapping
Let us next consider the image and the inverse image of sets under
mappings. Specifically, we have
Note that f⁻¹(B) is always defined for any f: X → Y. That is, there is no
implication here that f has an inverse. The notation is somewhat unfortunate
in this respect. Note also that the range of f is f(X).
In the next result, some of the important properties of images and inverse
images of functions are summarized.
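Images and inverse images can be sketched directly in Python; the mapping below is an illustrative choice, picked to be non-injective so that f⁻¹(B) exists even though f has no inverse function:

```python
def image(f, A):
    """f(A) = {f(x): x in A}."""
    return {f[x] for x in A}

def preimage(f, B):
    """f⁻¹(B) = {x: f(x) in B}; defined whether or not f has an inverse."""
    return {x for x in f if f[x] in B}

f = {1: 'a', 2: 'a', 3: 'b'}  # not injective, so no inverse function exists
assert image(f, {1, 2, 3}) == {'a', 'b'}
assert preimage(f, {'a'}) == {1, 2}

# f(A1 ∩ A2) ⊂ f(A1) ∩ f(A2), and the inclusion can be proper:
A1, A2 = {1, 3}, {2, 3}
assert image(f, A1 & A2) == {'b'}
assert image(f, A1) & image(f, A2) == {'a', 'b'}
```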
Proof. We prove parts (i) and (ii) to demonstrate the method of proof.
The remaining parts are left as an exercise.
To prove part (i), let y ∈ f(A₁). Then there is an x ∈ A₁ such that
y = f(x). But A₁ ⊂ A and so x ∈ A. Hence, f(x) = y ∈ f(A). This proves
that f(A₁) ⊂ f(A).
To prove part (ii), let y ∈ f(A₁ ∪ A₂). Then there is an x ∈ A₁ ∪ A₂
such that y = f(x). If x ∈ A₁, then f(x) = y ∈ f(A₁). If x ∈ A₂, then
f(x) = y ∈ f(A₂). Since x is in A₁ or in A₂, f(x) must be in f(A₁) or f(A₂).
Therefore, f(A₁ ∪ A₂) ⊂ f(A₁) ∪ f(A₂). To prove that f(A₁) ∪ f(A₂)
⊂ f(A₁ ∪ A₂), we note that A₁ ⊂ A₁ ∪ A₂. So by part (i), f(A₁) ⊂ f(A₁ ∪ A₂);
similarly, f(A₂) ⊂ f(A₁ ∪ A₂), and hence f(A₁)
∪ f(A₂) ⊂ f(A₁ ∪ A₂). We conclude that f(A₁ ∪ A₂) = f(A₁) ∪ f(A₂). •
We note that, in general, equality is not attained in parts (iii), (vii), and
(viii) of Theorem 1.2.41. However, by considering special types of mappings
we can obtain the following results for these cases.
(ii) f(⋂_{α∈I} A_α) = ⋂_{α∈I} f(A_α).
1.2.46. Exercise. Prove parts (ii), (iv), and (v) of Theorem 1.2.45.
1.2.47. Definition. Let A and B be any two sets. The set A is said to be
equivalent to set B if there exists a bijective mapping of A onto B.
1.2.48. Definition. Let J be the set of positive integers, and let A be any set.
Then A is said to be countably infinite if A is equivalent to J. A set is said to
be countable or denumerable if it is either finite or countably infinite. If a set
is not countable, it is said to be uncountable.
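A standard illustration (not taken from the text) is that the integers Z are countably infinite: the alternating enumeration 0, 1, -1, 2, -2, ... is a bijective mapping of J onto Z.

```python
def j_to_z(n):
    """A bijective mapping of J = {1, 2, 3, ...} onto the integers Z:
    1, 2, 3, 4, 5, ... maps to 0, 1, -1, 2, -2, ..."""
    return n // 2 if n % 2 == 0 else -(n // 2)

values = [j_to_z(n) for n in range(1, 12)]
assert values == [0, 1, -1, 2, -2, 3, -3, 4, -4, 5, -5]
assert len(set(values)) == len(values)  # injective on this initial segment
```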
We have:
As in the case of mappings, it makes sense to speak of the domain and the
range of a relation. We have:
1.3.6. Example. Let R denote the set of real numbers. The relation in R
given by {(x, y): x < y} is transitive but not reflexive and not symmetric.
The relation in R given by {(x, y): x ≠ y} is symmetric but not reflexive and
not transitive. •
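The properties claimed in this example can be verified exhaustively on a finite slice of R; a sketch (the finite range is an illustrative restriction of ours):

```python
X = range(-3, 4)
lt  = {(x, y) for x in X for y in X if x < y}
neq = {(x, y) for x in X for y in X if x != y}

reflexive  = lambda R: all((x, x) in R for x in X)
symmetric  = lambda R: all((y, x) in R for (x, y) in R)
transitive = lambda R: all((x, w) in R
                           for (x, y) in R for (z, w) in R if y == z)

assert transitive(lt) and not reflexive(lt) and not symmetric(lt)
assert symmetric(neq) and not reflexive(neq) and not transitive(neq)
```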
1.3.10. Example. Let R² = R × R, the real plane. Let X be the family of
all triangles in R². Then each of the following statements can be used to define
an equivalence relation in X: "is similar to," "is congruent to," "has the same
area as," and "has the same perimeter as." •
α   a   b
a   a   b
b   b   a        (1.4.5)
If, in general, α is an operation on an arbitrary finite set A, or sometimes
even on a countably infinite set A, then we can construct an operation table
as follows: the rows and columns are labeled by the elements of A, and the
entry in the row labeled x and the column labeled y is x α y:

α   ...   y
...
x   ...   x α y
given in (1.4.5), we can define, for example, the operations β and γ on
A as

β   a   b        γ   a   b
a   b   a        a   a   a
b   a   a        b   b   b
•
In the case of the real numbers R, the operations of addition and multiplication
are both associative and commutative. The operation of subtraction is
neither associative nor commutative.
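A quick numerical check of these claims (with arbitrary illustrative values):

```python
a, b, c = 5.0, 3.0, 2.0

# Addition and multiplication on R are associative and commutative:
assert (a + b) + c == a + (b + c) and a + b == b + a
assert (a * b) * c == a * (b * c) and a * b == b * a

# Subtraction is neither:
assert (a - b) - c != a - (b - c)
assert a - b != b - a
```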
A classic reference on set theory is the book by Hausdorff [1.5]. The many
excellent references on the present topics include the elegant text by Hanneken
[1.4], the standard reference by Halmos [1.3], as well as the books by Gleason
[1.1] and Goldstein and Rosenbaum [1.2].
REFERENCES
2
ALGEBRAIC STRUCTURES
The subject matter of the previous chapter is concerned with set theoretic
structure. We emphasized essential elements of set theory and introduced
related concepts such as mappings, operations, and relations.
In the present chapter we concern ourselves with algebraic structure.
The material of this chapter usually falls under the heading of abstract
algebra or modern algebra. In the next two chapters we will continue our
investigation of algebraic structure. The topics of those chapters usually
go under the heading of linear algebra.
This chapter is divided into three parts. The first section is concerned
with some basic algebraic structures, including semigroups, groups, rings,
fields, modules, vector spaces, and algebras. In the second section we study
properties of certain important special mappings on the above structures,
including homomorphisms, isomorphisms, endomorphisms, and automorphisms of
semigroups, groups and rings. Because of their importance in many areas
of mathematics, as well as in applications, polynomials are considered in
the third section. Some appropriate references for further reading are sug-
gested at the end of the chapter.
The subject matter of the present chapter is widely used in pure as well as
in applied mathematics, and it has found applications in diverse areas, such
as modern physics, automata theory, systems engineering, information
theory, graph theory, and the like.
34 Chapter 2 / Algebraic Structures
2.1.2. Exercise. Let X = {x, y}, and define the operations α, β, γ, and δ
on X by the following operation tables:

α | x y      β | x y      γ | x y      δ | x y
--+-----     --+-----     --+-----     --+-----
x | x y      x | x x      x | x y      x | x x
y | y x      y | y x      y | x y      y | y y
Show that (i) {X; β} possesses neither the right nor the left cancellation property;
(ii) {X; γ} possesses the left cancellation property but not the right cancellation
property; (iii) {X; δ} possesses the right cancellation property but
not the left cancellation property; and (iv) {X; α} possesses both the left and
the right cancellation property.
We note that a system {X; α} may contain more than one right identity
element of X (e.g., system {X; δ} of Exercise 2.1.2) or left identity element of
X (e.g., system {X; γ} of Exercise 2.1.2).
Let X = {0, 1}, and define the operations "+" and "·" on X by the following
operation tables:

+ | 0 1        · | 0 1
--+-----       --+-----
0 | 0 1        0 | 0 0
1 | 1 0        1 | 0 1

Does either {X; +} or {X; ·} have an identity element?
An element x′ ∈ X is called a right inverse of x relative to α provided that
x α x′ = e.
An element x″ ∈ X is called a left inverse of x relative to α if
x″ α x = e.
The following exercise shows that some elements may not possess any right
or left inverses. Some other elements may possess several inverses of one kind
and none of the other, and other elements may possess a number of inverses
of both kinds.
In the remainder of the present chapter we will often use the symbols
"+" and "·" to denote operations in place of α, β, etc. We will call these
"addition" and "multiplication." However, we strongly emphasize here that
"+" and "·" will, in general, not denote addition and multiplication of real
numbers but, instead, arbitrary operations. In cases where there exists an
identity element relative to "+", we will denote this element by "0" and call
it "zero." If there exists an identity element relative to "·", we will denote this
element either by "1" or by e. Our usual notation for representing an identity
relative to an arbitrary operation α will still be e. If in a system {X; +} an
element x ∈ X possesses an inverse, we will denote this element by −x and
we will call it "minus x." For example, if {X; +} is a semigroup, then we
denote the inverse of an invertible element x ∈ X by −x, and in this case we
have x + (−x) = (−x) + x = 0, and also −(−x) = x. Furthermore, if
x, y ∈ X are invertible elements, then the "sum" x + y is also invertible,
and −(x + y) = (−y) + (−x). Note, however, that unless "+" is commutative,
−(x + y) ≠ (−x) + (−y). Finally, if x, y ∈ X and if y is an
invertible element, then −y ∈ X. In this case we often will simply write
x + (−y) = x − y.
Let X = {0, 1, 2, 3}, and define the operations "+" and "·" on X by the
following operation tables:

+ | 0 1 2 3        · | 0 1 2 3
--+--------        --+--------
0 | 0 1 2 3        0 | 0 0 0 0
1 | 1 2 3 0        1 | 0 1 2 3
2 | 2 3 0 1        2 | 0 2 0 2
3 | 3 0 1 2        3 | 0 3 2 1

The reader should readily show that the systems {X; +} and {X; ·} are
monoids. In this case the operation "+" is called "addition mod 4" and "·"
is called "multiplication mod 4." ∎
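The claims of this example are easy to verify by machine. The following sketch (our addition, not the text's) rebuilds both mod-4 tables and checks associativity and the identity elements:

```python
# Build the mod-4 addition and multiplication tables and verify that
# {X; +} and {X; .} are monoids (associative, with an identity element).
X = range(4)
add = {(x, y): (x + y) % 4 for x in X for y in X}
mul = {(x, y): (x * y) % 4 for x in X for y in X}

def is_associative(op):
    return all(op[(op[(x, y)], z)] == op[(x, op[(y, z)])]
               for x in X for y in X for z in X)

def identity(op):
    # return an e with e op x = x op e = x for every x, if one exists
    for e in X:
        if all(op[(e, x)] == x and op[(x, e)] == x for x in X):
            return e
    return None

assert is_associative(add) and identity(add) == 0   # {X; +} is a monoid
assert is_associative(mul) and identity(mul) == 1   # {X; .} is a monoid
assert all(mul[(2, x)] != 1 for x in X)             # but 2 has no inverse mod 4
```

The last assertion shows that {X; ·} is a monoid without being a group, since the element 2 has no inverse relative to "·".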
The most important special type of semigroup that we will encounter in
this chapter is the group.
2.1.19. Theorem. Let {X; α} be a group, and let e denote the identity
element of X relative to α. Let x and y be arbitrary elements in X. Then
(i) if x α x = x, then x = e;
(ii) if z ∈ X and x α y = x α z, then y = z;
(iii) if z ∈ X and x α y = z α y, then x = z;
(iv) there exists a unique w ∈ X such that
w α x = y; and (2.1.20)
(v) there exists a unique z ∈ X such that
x α z = y. (2.1.21)
Proof. To prove the first part, let x α x = x. Then x⁻¹ α (x α x) = x⁻¹ α x,
and so (x⁻¹ α x) α x = e. This implies that x = e.
To prove the second part, let x α y = x α z. Then x⁻¹ α (x α y) = x⁻¹ α (x α z),
and so (x⁻¹ α x) α y = (x⁻¹ α x) α z. This implies that y = z.
The proof of part (iii) is similar to that of part (ii).
In part (iv) of Theorem 2.1.19 the element w is called the left solution
of Eq. (2.1.20), and in part (v) of this theorem the element z is called the
right solution of Eq. (2.1.21).
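In concrete terms, the left solution is w = y α x⁻¹ and the right solution is z = x⁻¹ α y. A small sketch, using the additive group of integers mod 7 as a stand-in group (our choice, not the text's):

```python
# Left and right solutions of w op x = y and x op z = y in the group {Z_7; +},
# together with an exhaustive uniqueness check.
n = 7
def op(a, b): return (a + b) % n      # the group operation
def inv(a): return (-a) % n           # the inverse relative to op

x, y = 3, 5
w = op(y, inv(x))     # candidate left solution of w op x = y
z = op(inv(x), y)     # candidate right solution of x op z = y
assert op(w, x) == y and op(x, z) == y

# uniqueness: no other element solves either equation
assert [u for u in range(n) if op(u, x) == y] == [w]
assert [u for u in range(n) if op(x, u) == y] == [z]
```

In this abelian example w and z coincide; in a non-commutative group they need not.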
We can classify groups in a variety of ways. Some of these classifications
are as follows. Let {X; α} be a group. If the set X possesses a finite number
of elements, then we speak of a finite group. If the operation α is commutative,
then we have a commutative group, also called an abelian group. If α is not
commutative, then we speak of a non-commutative group or a non-abelian
group. Also, by the order of a group we understand the order of the set X.
Now let {X; α} be a semigroup, and let X₁ be a non-void subset of X
which is closed relative to α. Then by Theorem 1.4.11, the operation α₁ on X₁
induced by the associative operation α is also associative, and thus the
mathematical system {X₁; α₁} is also a semigroup. The system {X₁; α₁} is
called a subsystem of {X; α}. This gives rise to the following concept.
2.1.24. Theorem. Let {X; α} be a monoid with e its identity element, and let
{X₁; α₁} be a subsemigroup of {X; α}. If e ∈ X₁, then e is an identity element
of {X₁; α₁} and {X₁; α₁} is a monoid.
2.1.27. Exercise. Let Z₆ = {0, 1, 2, 3, 4, 5} and define the operation +
on Z₆ by means of the following operation table:

+ | 0 1 2 3 4 5
--+------------
0 | 0 1 2 3 4 5
1 | 1 0 4 5 2 3
2 | 2 5 0 4 3 1
3 | 3 4 5 0 1 2
4 | 4 3 1 2 5 0
5 | 5 2 3 1 0 4
(a) Show that {Z₆; +} is a group.
(b) Let K = {0, 1}. Show that {K; +} is a subgroup of {Z₆; +}.
(c) Are there any other subgroups of {Z₆; +}?
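The parts of this exercise can be settled by brute-force search over the operation table; the encoding below is ours, not the text's:

```python
# Encode the operation table of Exercise 2.1.27, check the group axioms,
# and enumerate every subgroup by testing all subsets containing 0.
from itertools import combinations

T = [[0, 1, 2, 3, 4, 5],
     [1, 0, 4, 5, 2, 3],
     [2, 5, 0, 4, 3, 1],
     [3, 4, 5, 0, 1, 2],
     [4, 3, 1, 2, 5, 0],
     [5, 2, 3, 1, 0, 4]]
Z6 = range(6)

# group axioms: associativity, identity 0, and an inverse for each element
assert all(T[T[a][b]][c] == T[a][T[b][c]] for a in Z6 for b in Z6 for c in Z6)
assert all(T[0][a] == a and T[a][0] == a for a in Z6)
assert all(any(T[a][b] == 0 for b in Z6) for a in Z6)

def closed(S):
    # a finite subset containing the identity is a subgroup iff it is closed
    return all(T[a][b] in S for a in S for b in S)

subgroups = [set(S) for r in range(1, 7) for S in combinations(Z6, r)
             if 0 in S and closed(set(S))]
assert len(subgroups) == 6
assert {0, 1} in subgroups and {0, 4, 5} in subgroups
```

The search finds six subgroups in all: {0}, {0, 1}, {0, 2}, {0, 3}, {0, 4, 5}, and Z₆ itself. (Note that the table is not addition mod 6; the group it defines is non-abelian, e.g., 1 + 2 = 4 while 2 + 1 = 5.)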
2.1.35. Example. Let Z denote the set of integers, and let "+" denote
the usual operation of addition of integers. Let W = {1}. If Y is any subset
of Z such that {Y; +} is a subgroup of {Z; +} and W ⊂ Y, then Y = Z.
To prove this statement, let n be any positive integer. Since Y is closed with
respect to +, we must have 1 + 1 = 2 ∈ Y. Similarly, we must have 1 + 1
+ ... + 1 = n ∈ Y. Also, n⁻¹ = −n, and therefore all the negative integers
are in Y. Also, n − n = 0 ∈ Y; i.e., Y = Z. Thus, G = Y = Z, and so
the group {Z; +} is the subgroup generated by {1}. ∎
2.1.36. Theorem. Let Z denote the set of all integers, and let {X; α}
be a group. Let x ∈ X and define xᵏ = x α x α ··· α x (k times), for k a
positive integer. Let x⁻ᵏ = (xᵏ)⁻¹, and let x⁰ = e. Let Y = {xᵏ: k ∈ Z}.
Then {Y; α} is the subgroup of {X; α} generated by {x}.
Proof. We first show that {Y; α} is a subgroup of {X; α}. Clearly, Y ⊂ X
and e ∈ Y, and for every y ∈ Y we have y⁻¹ ∈ Y. Also, for every x, y ∈ Y
we have x α y ∈ Y. Thus, by Theorem 2.1.30, {Y; α} is a subgroup of {X; α}.
Next, we must show that {Y; α} is the subgroup generated by {x}. To do so,
it suffices to show that Y ⊂ Yⱼ for every Yⱼ such that x ∈ Yⱼ and such that
{Yⱼ; α} is a subgroup of {X; α}. But this is certainly true, since y ∈ Y implies
y = xᵏ for some k ∈ Z. Since x ∈ Yⱼ, it follows that xᵏ ∈ Yⱼ and therefore
y ∈ Yⱼ. ∎
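Theorem 2.1.36 can be illustrated computationally. The sketch below (ours, not the text's) uses the multiplicative group of the non-zero residues mod 7 and collects the powers xᵏ of a fixed element:

```python
# The subgroup generated by x in the multiplicative group of non-zero
# residues mod 7 is the set of all powers x^k.
n = 7

def powers(x):
    """Return {x^k : k in Z} in the multiplicative group mod n."""
    Y, y = set(), 1        # x^0 = e = 1
    while True:
        Y.add(y)
        y = (y * x) % n
        if y == 1:         # the powers have cycled back to the identity
            return Y

assert powers(3) == {1, 2, 3, 4, 5, 6}   # 3 generates the whole group
assert powers(2) == {1, 2, 4}            # a proper subgroup

# closure of the generated set, as the theorem requires
Y = powers(2)
assert all((a * b) % n in Y for a in Y for b in Y)
```

Since the group is finite, the negative powers are automatically included: x⁻¹ is the last power computed before the identity recurs.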
α₁ = (x y z; x y z),  α₂ = (x y z; y x z),  α₃ = (x y z; x z y),
α₄ = (x y z; y z x),  α₅ = (x y z; z x y),  α₆ = (x y z; z y x),

where in each case the second list gives the images of x, y, z under the
permutation.
We can readily verify that α₁ = e. If X₁ = {e, α₂}, then {X₁; ·} is a subgroup
of P(X) and hence a permutation group on X. Let X₂ = {e, α₄, α₅}. Then
2.1.45. Definition. Let X be a non-empty set, and let α and β be operations
on X. The set X together with the operations α and β on X, denoted by
{X; α, β}, is called a ring if
(i) {X; α} is an abelian group;
(ii) {X; β} is a semigroup; and
(iii) β is distributive over α.
We refer to {X; α} as the group component of the ring, to {X; β} as the
semigroup component of the ring, to α as the group operation of the ring, and
to β as the semigroup operation of the ring. For convenience we often denote
a ring {X; α, β} by X and simply refer to "ring X". For obvious reasons,
we often use the symbols "+" and "·" ("addition" and "multiplication")
in place of α and β, respectively. Thus, if X is a ring we may write {X; +, ·}
and assume that {X; +} is the group component of X and {X; ·} is the
semigroup component of X. We call {X; +} the additive group of ring X,
{X; ·} the multiplicative semigroup of ring X, x + y the sum of x and y, and
x · y the product of x and y.
We use 0 ("zero") to denote the identity element of {X; +}. If {X; ·} has
an identity element, we denote that identity by e.
The inverse of an element x relative to + is denoted by −x. If x has an
inverse relative to "·", we denote it by x⁻¹. Furthermore, we denote x + (−y)
by x − y (the "difference of x and y") and (−x) + y by −x + y. Note that
the elements 0, e, −x, and x⁻¹ are unique.
Subsequently, we adopt the convention that when operations "+" and
"·" appear mixed without parentheses to clarify the order of operation,
the operation should be taken with respect to "·" first and then with respect
to "+". For example,
x · y + z = (x · y) + z.
2.1. Some Basic Structures of Algebra 47
2.1.47. Definition. Let {X; +, ·} be a ring. If the operation "·" is commutative
on the set X, then the ring X is called a commutative ring.
2.1.48. Definition. Let {X; +, ·} be a ring with identity. An element x ∈ X
is called a unit of X if x has an inverse as an element of the semigroup {X; ·}.
We denote this inverse of x by x⁻¹.
The reader can readily verify that the following examples are rings.
2.1.50. Exercise. Let X = {0, 1} and define "+" and "·" by the following
operation tables:

+ | 0 1        · | 0 1
--+-----       --+-----
0 | 0 1        0 | 0 0
1 | 1 0        1 | 0 1

Show that {X; +, ·} is a commutative ring with identity.
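For this exercise, "+" coincides with the exclusive-or of bits and "·" with logical and; the ring axioms can then be checked exhaustively. The identification with XOR and AND is our reading of the tables, not the text's:

```python
# Exhaustive check of the commutative-ring-with-identity axioms on {0, 1},
# where the tables of the exercise are XOR (addition) and AND (multiplication).
X = (0, 1)
add = lambda a, b: a ^ b
mul = lambda a, b: a & b

assert all(add(a, b) == add(b, a) for a in X for b in X)       # + commutative
assert all(add(add(a, b), c) == add(a, add(b, c))
           for a in X for b in X for c in X)                   # + associative
assert all(add(0, a) == a and add(a, a) == 0 for a in X)       # 0 identity, -a = a
assert all(mul(mul(a, b), c) == mul(a, mul(b, c))
           for a in X for b in X for c in X)                   # . associative
assert all(mul(a, b) == mul(b, a) for a in X for b in X)       # . commutative
assert all(mul(a, add(b, c)) == add(mul(a, b), mul(a, c))
           for a in X for b in X for c in X)                   # distributivity
assert all(mul(1, a) == a for a in X)                          # identity e = 1
```

This ring is the field of two elements, usually denoted GF(2).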
2.1.51. Exercise. Let {X; α} be an abelian group with identity element e.
Define the operation β on X as x β y = e for every x, y ∈ X. Show that
{X; α, β} is a ring.
For rings we have:
Now let {X; +, ·} denote a ring for which the two operations are equal,
i.e., "+" = "·". Then x + y = x · y for all x, y ∈ X. In particular, if
y = 0, then x + 0 = x · 0 = 0 for all x ∈ X, and we conclude that 0 is the
only element of the set X.
This gives rise to:
We next introduce:
2.1.54. Definition. Let {X; +, ·} be a ring. If there exist non-zero elements
x, y ∈ X (not necessarily distinct) such that x · y = 0, then x and y are both
called divisors of zero.
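In the ring of integers mod m (recall the mod-4 tables shown earlier), the divisors of zero can be found by direct search; this sketch is our addition:

```python
# List the divisors of zero in the ring of integers mod m: the non-zero
# elements x for which some non-zero y satisfies x * y = 0 (mod m).
def zero_divisors(m):
    return sorted({x for x in range(1, m)
                   for y in range(1, m) if (x * y) % m == 0})

assert zero_divisors(4) == [2]          # 2 . 2 = 0 in arithmetic mod 4
assert zero_divisors(6) == [2, 3, 4]    # e.g., 2 . 3 = 0 mod 6
assert zero_divisors(5) == []           # mod a prime there are none
```

The empty result for m = 5 reflects the fact that the integers mod a prime form a field, and a field has no divisors of zero.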
We have:
2.1.64. Example. Perhaps the most widely known field is the set of real
numbers with the usual rules for addition and multiplication. _
2.1.65. Exercise. Let Z denote the set of all integers, and let "+" and "·"
denote the usual operations of addition and multiplication on Z. Show that
{Z; +, ·} is an integral domain, but not a division ring, and hence not a
field.
2.1.66. Definition. Let R denote the set of all real numbers, let Z denote
the set of all integers, and let "+" and "·" denote the usual operations of
addition and multiplication, respectively. We call {R; +, ·} the field of real
numbers and {Z; +, ·} the ring of integers.
Before, we characterized a trivial ring as a ring for which the set X consists
only of the 0 element. In the case of subrings we have:
For subdomains we have:
For subfields we have:
"*
2.1.78. Theorem. Let X be a field, and let Y be a subring of .X Then Y
is a subfield of X if and only if for every x E ,Y X 0, X - I E .Y
2.1.83. Example. Let {X; +, ·} be a ring with identity, and let R be a
subring of X with e ∈ R. By defining ρ: R × X → X as ρ(α, x) = α · x, it
is clear that X is an R-module. In particular, if R = X, we see that any ring
with identity can be made into a module over itself. ∎
For modules we have:
The notion of vector space, also called linear space, is among the most
important concepts encountered in mathematics. We will devote the next two
chapters and a large portion of the remainder of this book to vector spaces
and to mappings on such spaces.
We also have:
2.1.88. Theorem. Let {F; +, ·} be a field, and let Fⁿ = F × ··· × F
be the n-fold Cartesian product of F. Denote the element x ∈ Fⁿ by x = (ξ₁,
ξ₂, ..., ξₙ), and define the operation "+" on Fⁿ by
Note that in hypothesis (iii) the symbol "·" is used to denote two different
operations. Thus, in the case of x · y the operation used is defined on X, while
in the case of α · β the operation used is defined on F.
The reader is cautioned that in some texts the term algebra means what
we defined to be an associative algebra.
Let M denote the set of all 2 × 2 matrices of the form

u = ( α_a   α_b
      α_c   α_d ),   α_a, α_b, α_c, α_d ∈ R.

Show that M is an associative algebra over R.
Let us now consider some specific cases of Lie algebras. Our first exercise
shows that any associative algebra can be made into a Lie algebra.
i = (1, 0, 0),  j = (0, 1, 0),  k = (0, 0, 1)
Proof. We give the sufficiency part of the proof and leave the necessity part
as an exercise.
Let α, β ∈ R and let x ∈ Y. Then αx, βx, (α + β)x ∈ Y by hypothesis
(ii). Since Y is a group, it follows that αx + βx ∈ Y, and since x ∈ X we have
We next introduce the notion of vector subspace, also called linear subspace.
∈ R} is an R-submodule of X. ∎
2.1.104. Example. Let F be a field, and let Fⁿ be the vector space of n-tuples
over F. Let x₁ = (1, 0, ..., 0) and x₂ = (0, 1, 0, ..., 0). Then x₁,
x₂ ∈ Fⁿ. Let Y = {x ∈ Fⁿ: x = α₁x₁ + α₂x₂, α₁, α₂ ∈ F}. Then Y is a vector
subspace. We see that if x ∈ Y, then x is of the form x = (α₁, α₂, 0, ..., 0).
∎
We next prove:
Let us next prove:
It can happen that the indexed set {x₁, ..., xₙ} in the above definition is
not unique. That is to say, for x ∈ X we may have x = α₁x₁ + ... + αₙxₙ
= β₁x₁ + ... + βₙxₙ, where αᵢ ≠ βᵢ for some i. However, if it turns out
that the above representation of x in terms of x₁, ..., xₙ is unique, then we
have:
D. Overview
We conclude this section with the flow chart of Figure D, which attempts
to put into perspective most of the algebraic systems considered thus far.
2.2. Homomorphisms
2.2.5. Example. Let R denote the set of real numbers, and let "+" and
"·" denote the usual operations of addition and multiplication on R. Then
{R; +} and {R; ·} are semigroups. Let
f(x) = eˣ
for all x ∈ R. Then f is a homomorphism from {R; +} to {R; ·}. ∎
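A numerical spot-check of this homomorphism property (our addition, not the text's):

```python
# Verify numerically that f(x) = e^x carries the operation of {R; +}
# onto the operation of {R; .}: f(x + y) = f(x) * f(y).
import math

f = math.exp
for x, y in [(0.0, 1.0), (2.5, -1.3), (-4.0, 4.0)]:
    assert math.isclose(f(x + y), f(x) * f(y))

# the identity of {R; +} maps to the identity of {R; .}
assert f(0) == 1.0
```

Floating-point arithmetic makes the comparison approximate, hence `math.isclose` rather than exact equality.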
In order to simplify our notation even further, we will often use the
symbol "·" in the remainder of the present chapter to denote operations for
semigroups (or groups), say {X; ·}, {Y; ·}, and we will often refer to these
simply as semigroup (or group) X and Y, respectively. In this case, if ρ
denotes a homomorphism of X into Y, we write
ρ(x · y) = ρ(x) · ρ(y)
for all x, y ∈ X.
In Chapter 1 we classified mappings as being into, onto, one-to-one and into,
and one-to-one and onto. Now if ρ is a homomorphism of a semigroup X
into a semigroup Y, we can also classify homomorphisms as being into, onto,
one-to-one and into, and one-to-one and onto. This classification gives rise
to the following concepts.
We note that since all groups are semigroups, the concepts introduced
in the above definition apply necessarily also to groups.
In connection with isomorphic semigroups (or groups) a very important
observation is in order. We first note that if a semigroup (or group) X is
isomorphic to a semigroup Y, then there exists a mapping ρ from X into Y
which is one-to-one and onto. Thus, the inverse of ρ, ρ⁻¹, exists, and we can
associate with each element of X one and only one element of Y, and vice
versa. Secondly, we note that ρ is a homomorphism, i.e., ρ preserves the
properties of the respective operations associated with semigroup (or group)
X and semigroup (or group) Y; or, to put it another way, under ρ the (algebraic)
properties of semigroups (or groups) X and Y are preserved. Hence,
it should be clear that isomorphic semigroups (or groups) are essentially
indistinguishable, the homomorphism (which is one-to-one and onto in this
case) amounting to a mere relabeling of elements of one set by elements of
a second set. We will encounter this type of phenomenon on several other
occasions in this book.
We are now ready to prove several results.
2.2.8. Theorem. Let ρ be a homomorphism from a semigroup X into a
semigroup Y. Then
(i) ρ(X) is a subsemigroup of Y;
(ii) if X has an identity element e, then ρ(e) is an identity element of ρ(X);
(iii) if X has an identity element e, and if x ∈ X has an inverse x⁻¹,
then ρ(x) has an inverse in ρ(X) and, in fact, [ρ(x)]⁻¹ = ρ(x⁻¹);
(iv) if X₁ is a subsemigroup of X, then ρ(X₁) is a subsemigroup of ρ(X);
and
(v) if Y₁ is a subsemigroup of ρ(X), then
X₁ = {x ∈ X: ρ(x) ∈ Y₁}
is a subsemigroup of X.
Proof. To prove the first part we must show that the subset ρ(X) of Y
is closed relative to the operation "·" on Y. Now if x′, y′ ∈ ρ(X), then there
exists at least one x ∈ X and at least one y ∈ X such that ρ(x) = x′ and
ρ(y) = y′. Since ρ is a homomorphism, we have
x′ · y′ = ρ(x) · ρ(y) = ρ(x · y),
and since x · y ∈ X it follows that x′ · y′ ∈ ρ(X) because ρ(x · y) ∈ ρ(X).
Thus, ρ(X) is closed and, hence, is a subsemigroup of Y.
To prove the second part, note that since e ∈ X we have ρ(e) ∈ ρ(X),
and since for any x′ ∈ ρ(X) there exists x ∈ X such that ρ(x) = x′, we have
ρ(e) · x′ = ρ(e) · ρ(x) = ρ(e · x) = ρ(x) = x′.
Since this is true for every x′ ∈ ρ(X), it follows that ρ(e) is a left identity
element of ρ(X). Similarly, we can show that x′ · ρ(e) = x′ for every x′
∈ ρ(X). Thus, ρ(e) is an identity element of the subsemigroup ρ(X) of Y.
To prove the third part of the theorem, note that since ρ is a homomorphism,
we have
ρ(x) · ρ(x⁻¹) = ρ(x · x⁻¹) = ρ(e),
and
ρ(x⁻¹) · ρ(x) = ρ(x⁻¹ · x) = ρ(e);
i.e., ρ(e) is an identity element of ρ(X). Also, since ρ(x⁻¹) ∈ ρ(X), ρ(x) has
an inverse in ρ(X), and [ρ(x)]⁻¹ = ρ(x⁻¹).
The proofs of parts (iv) and (v) of this theorem are left as an exercise. ∎
Proof. To prove the first part, let e denote the identity element of X. By
part (i) of Theorem 2.2.8, ρ(X) is a subsemigroup of Y; by part (ii) of Theorem
2.2.8, ρ(e) is an identity element of ρ(X); and by part (iii) of the same theorem,
it follows that every element of ρ(X) has an inverse. Thus, ρ(X) is a subgroup
of Y.
The second part of this theorem follows from Theorem 2.1.28 and from
part (ii) of Theorem 2.2.8. ∎
We also have:
Since this is true for all x′, y′ ∈ ρ(X), it follows that ρ⁻¹ is an isomorphism
of ρ(X) with X.
To prove the second part of the theorem, we first note that ρ(X) is a
subsemigroup of Y by Theorem 2.2.8. It follows from Theorem 2.2.13 that
e = ρ⁻¹(e′) is an identity element of X. Now let ρ(k) = e′. Since ρ(e) = e′,
it follows that k = e and that K_ρ = {e}. We can similarly show that
K_{ρ⁻¹} = {e′}. ∎
define
e = {1, 0, 0, ...}.
Then e ∈ P and {P; ·} is obviously a monoid with e as its identity element.
We can now easily prove the following
Let us next complete the connection between our abstract characterization
of polynomials and the function f(t) we originally introduced. To this
end we let
t⁰ = {1, 0, 0, ...},
t¹ = {0, 1, 0, 0, ...},
t² = {0, 0, 1, 0, ...},
t³ = {0, 0, 0, 1, 0, ...}.
At this point we still cannot give meaning to aᵢtⁱ, because aᵢ ∈ F and tⁱ ∈ P.
However, if we make the obvious identification {aᵢ, 0, 0, ...} ∈ P, and if
we denote this element simply by aᵢ ∈ P, then we have
f(t) = a₀ · t⁰ + a₁ · t¹ + ... + aₙ · tⁿ.
Thus, we can represent f(t) uniquely by the sequence {a₀, a₁, ..., aₙ,
0, ...}. By convention, we henceforth omit the symbol "·" and write, e.g.,
f(t) = a₀ + a₁t + ... + aₙtⁿ.
We assign t appearing in the argument of f(t) a special name.
2.3.4. Definition. Let f(t) ∈ F[t], and let f(t) = {f₀, f₁, ..., fₙ, ...} ≠ 0,
where fᵢ ∈ F for all i. The polynomial f(t) is said to be of order n or of
degree n if fₙ ≠ 0 and if fᵢ = 0 for all i > n. In this case we write deg f(t) = n,
and we call fₙ the leading coefficient of f. If fₙ = 1 and fᵢ = 0 for all i > n,
then f(t) is said to be monic.
Since fᵢ = 0 for i > n and gⱼ = 0 for j > m, the largest possible value of k
such that hₖ is non-zero occurs for k = m + n; i.e.,
hₘ₊ₙ = fₙgₘ.
Since F is a field, fₙ and gₘ cannot be zero divisors, and thus fₙgₘ ≠ 0.
Therefore, hₘ₊ₙ ≠ 0, and hₖ = 0 for all k > m + n. ∎
Our next result shows that, in general, we cannot go any further than
integral domain for F[t].
2.3.8. Theorem. Let f(t) ∈ F[t]. Then f(t) has an inverse relative to "·"
if and only if f(t) is of order zero.
Proof. Let f(t) ∈ F[t] be of order n, and assume that f(t) has an inverse
relative to "·", denoted by f⁻¹(t), which is of order m. Then
f(t)f⁻¹(t) = e,
where e = {1, 0, 0, ...} is of order zero. By Theorem 2.3.5 the degree of
f(t)f⁻¹(t) is m + n. Thus, m + n = 0, and since m ≥ 0 and n ≥ 0, we must
have m = n = 0.
Conversely, let f(t) = f₀ = {f₀, 0, 0, ...}, where f₀ ≠ 0. Then f⁻¹(t)
= f₀⁻¹ = {f₀⁻¹, 0, 0, ...}. ∎
2.3.9. Theorem. Let f(t), g(t) ∈ F[t] and assume that g(t) ≠ 0. Then there
exist unique elements q(t) and r(t) in F[t] such that
f(t) = q(t)g(t) + r(t), (2.3.10)
where either r(t) = 0 or deg r(t) < deg g(t).
Proof. If f(t) = 0 or if deg f(t) < deg g(t), then Eq. (2.3.10) is satisfied with
q(t) = 0 and r(t) = f(t). If deg g(t) = 0, i.e., g(t) = c, then f(t) = [c⁻¹ · f(t)]
· c, and Eq. (2.3.10) holds with q(t) = c⁻¹f(t) and r(t) = 0.
Assume now that deg f(t) ≥ deg g(t) ≥ 1. The proof is by induction on
the degree of the polynomial f(t). Thus, let us assume that Eq. (2.3.10)
holds for deg f(t) = n. We first prove our assertion for n = 1 and then for
n + 1.
Assume that deg f(t) = 1, i.e., f(t) = a₀ + a₁t, where a₁ ≠ 0. We need
only consider the case g(t) = b₀ + b₁t, where b₁ ≠ 0. We readily see that
Eq. (2.3.10) is satisfied with q(t) = a₁b₁⁻¹ and r(t) = a₀ − a₁b₁⁻¹b₀.
Now assume that Eq. (2.3.10) holds for deg f(t) = k, where k = 1, ...,
n. We want to show that this implies the validity of Eq. (2.3.10) for
deg f(t) = n + 1. Let
f(t) = a₀ + a₁t + ... + aₙ₊₁tⁿ⁺¹,
where aₙ₊₁ ≠ 0. Let deg g(t) = m. We may assume that 0 < m ≤ n + 1. Let
g(t) = b₀ + b₁t + ... + bₘtᵐ, where bₘ ≠ 0. It is now readily verified that
f(t) = bₘ⁻¹aₙ₊₁tⁿ⁺¹⁻ᵐg(t) + [f(t) − bₘ⁻¹aₙ₊₁tⁿ⁺¹⁻ᵐg(t)]. (2.3.11)
Now let h(t) = f(t) − bₘ⁻¹aₙ₊₁tⁿ⁺¹⁻ᵐg(t). It can readily be verified that the
coefficient of tⁿ⁺¹ in h(t) is 0. Hence, either h(t) = 0 or deg h(t) < n + 1.
By our induction hypothesis, this implies there exist polynomials s(t) and r(t)
such that h(t) = s(t)g(t) + r(t), where r(t) = 0 or deg r(t) < deg g(t).
Substituting the expression for h(t) into Eq. (2.3.11), we have
f(t) = [bₘ⁻¹aₙ₊₁tⁿ⁺¹⁻ᵐ + s(t)]g(t) + r(t).
Thus, Eq. (2.3.10) is satisfied and the proof of the existence of r(t) and q(t)
is complete.
The proof of the uniqueness of q(t) and r(t) is left as an exercise. ∎
2.3.12. Exercise. Prove that q(t) and r(t) in Theorem 2.3.9 are unique.
2.3.13. Definition. Let f(t) and g(t) be any non-zero polynomials. Let
q(t) and r(t) be the unique polynomials such that f(t) = q(t)g(t) + r(t), where
either r(t) = 0 or deg r(t) < deg g(t). We call q(t) the quotient and r(t) the
remainder in the division of f(t) by g(t). If r(t) = 0, we say that g(t) divides
f(t) or is a factor of f(t).
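The construction used in the proof of Theorem 2.3.9, dividing off leading coefficients and recursing, translates directly into an algorithm. A sketch over the field of rationals, with a polynomial stored as its coefficient sequence {f₀, f₁, ...} in ascending powers (the representation is our choice, not the text's):

```python
# Polynomial division with remainder, following the leading-coefficient
# cancellation step of the proof of Theorem 2.3.9.
from fractions import Fraction

def polydiv(f, g):
    """Divide f by g (g != 0); coefficients listed by ascending powers.

    Returns (q, r) with f = q*g + r and either r empty or deg r < deg g.
    """
    f = [Fraction(c) for c in f]
    g = [Fraction(c) for c in g]
    while g[-1] == 0:              # strip trailing zero coefficients of g
        g.pop()
    q = [Fraction(0)] * max(len(f) - len(g) + 1, 1)
    r = f[:]
    while len(r) >= len(g):
        shift = len(r) - len(g)
        c = r[-1] / g[-1]          # quotient of the leading coefficients
        q[shift] = c
        r = [a - c * b for a, b in zip(r, [Fraction(0)] * shift + g)]
        r.pop()                    # the leading term cancels exactly
    return q, r

# t^3 - 2t + 1 divided by t - 1: quotient t^2 + t - 1, remainder 0
q, r = polydiv([1, -2, 0, 1], [-1, 1])
assert q == [-1, 1, 1] and all(c == 0 for c in r)

# dividing by a non-zero constant c: q = c^{-1} f and r = 0, as in the proof
q, r = polydiv([2, 4], [2])
assert q == [1, 2] and r == []
```

Exact rational arithmetic (`Fraction`) is used so that the cancellation of the leading term is exact, mirroring the field arithmetic of the proof.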
2.3. Application to Polynomials 73
Next, we prove:
2.3.15. Exercise. Show that if d(t) is the greatest common divisor of f(t)
and g(t), then there exist polynomials m(t) and n(t) such that
d(t) = m(t)f(t) + n(t)g(t).
If f(t) and g(t) are relatively prime, then
1 = m(t)f(t) + n(t)g(t).
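A concrete instance of this exercise, with our own choice of f and g: for f(t) = t² − 1 and g(t) = (t − 1)(t − 2) = t² − 3t + 2, the greatest common divisor is d(t) = t − 1, and since f(t) − g(t) = 3t − 3, the polynomials m(t) = 1/3 and n(t) = −1/3 work. The identity can be spot-checked at sample points:

```python
# Spot-check the Bezout-type identity d = m*f + n*g for a worked example,
# evaluating at enough sample points to pin down degree-2 polynomials.
from fractions import Fraction

third = Fraction(1, 3)
f = lambda t: t * t - 1            # f(t) = t^2 - 1
g = lambda t: t * t - 3 * t + 2    # g(t) = t^2 - 3t + 2
d = lambda t: t - 1                # the greatest common divisor

for t in [Fraction(k, 2) for k in range(-6, 7)]:
    assert third * f(t) - third * g(t) == d(t)

# d(t) divides both: f = (t + 1) d and g = (t - 2) d
for t in [Fraction(k) for k in range(-5, 6)]:
    assert f(t) == (t + 1) * d(t) and g(t) == (t - 2) * d(t)
```

Agreement at thirteen distinct points is more than enough to verify an identity between polynomials of degree at most two.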
76 Chapter 3 / Vector Spaces and Linear Transformations
(i) x + y = y + x for every x, y ∈ X;
(ii) x + (y + z) = (x + y) + z for every x, y, z ∈ X;
(iii) there is a unique vector in X, called the zero vector or the null
vector or the origin, which is denoted by 0 and which has the property
that 0 + x = x for all x ∈ X;
(iv) α(x + y) = αx + αy for all α ∈ F and for all x, y ∈ X;
(v) (α + β)x = αx + βx for all α, β ∈ F and for all x ∈ X;
(vi) (αβ)x = α(βx) for all α, β ∈ F and for all x ∈ X;
(vii) 0x = 0 for all x ∈ X; and
(viii) 1x = x for all x ∈ X.
The reader may find it instructive to review the axioms of a field, which
are summarized in Definition 2.1.63. In (v) the "+" on the left-hand side
denotes the operation of addition on F; the "+" on the right-hand side
denotes vector addition. Also, in (vi) αβ ≜ α · β, where "·" denotes the
operation of multiplication on F. In (vii) the symbol 0 on the left-hand side is
a scalar; the same symbol on the right-hand side denotes a vector. The 1
on the left-hand side of (viii) is the identity element of F relative to "·".
To indicate the relationship between the set of vectors X and the underlying
field F, we sometimes refer to a vector space X over field F. However, usually
we speak of a vector space X without making explicit reference to the field F
and to the operations of vector addition and scalar multiplication. If F is
the field of real numbers, we call our vector space a real vector space. Similarly,
if F is the field of complex numbers, we speak of a complex vector space.
Throughout this chapter we will usually use lower case Latin letters (e.g.,
x, y, z) to denote vectors (i.e., elements of X) and lower case Greek letters
(e.g., α, β, γ) to denote scalars (i.e., elements of F).
If we agree to denote the element (−1)x ∈ X simply by −x, i.e., (−1)x
≜ −x, then we have x − x = 1x + (−1)x = (1 − 1)x = 0x = 0. Thus,
if X is a vector space, then for every x ∈ X there is a unique vector, denoted
−x, such that x − x = 0. There are several other elementary properties of
vector spaces which are a direct consequence of the above axioms. Some of
these are summarized below. The reader will have no difficulty in verifying
these.
3.1. Linear Spaces 77
3.1.5. Figure A. Vectors x, y, and x + y in the plane of Example 3.1.4,
together with the scalar multiples αy (0 < α < 1), βy (β > 1), and γy (γ < 0).
The reader can readily verify that, for the space described above, all the
axioms of a linear space are satisfied, and hence X is a vector space. _
3.1.6. Example. Let X = R denote the set of real numbers, and let F also
denote the set of real numbers. We define vector addition to be the usual
addition of real numbers and multiplication of vectors x ∈ R by scalars
α ∈ F to be multiplication of real numbers. It is a simple matter to show that
this space is a linear space. ∎
x + y = (ξ₁, ξ₂, ..., ξₙ) + (η₁, η₂, ..., ηₙ)
      ≜ (ξ₁ + η₁, ξ₂ + η₂, ..., ξₙ + ηₙ) (3.1.8)
and
αx = α(ξ₁, ξ₂, ..., ξₙ) ≜ (αξ₁, αξ₂, ..., αξₙ). (3.1.9)
It should be noted that the symbol "+" on the right-hand side of Eq. (3.1.8)
denotes addition on the field F, and the symbol "+" on the left-hand side of
Eq. (3.1.8) designates vector addition. (See Theorem 2.1.88.)
In the present case the null vector is defined as 0 = (0, 0, ..., 0), and the
vector −x is defined by −x = −(ξ₁, ξ₂, ..., ξₙ) = (−ξ₁, −ξ₂, ..., −ξₙ).
Utilizing the properties of the field F, all axioms of Definition 3.1.1 are
readily verified, and Fⁿ is thus a vector space. We call this space the space
Fⁿ of n-tuples of elements of F. ∎
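The coordinatewise operations (3.1.8) and (3.1.9) are immediate to implement; the following sketch (ours, not the text's) checks a few of the axioms of Definition 3.1.1 with F the rationals and n = 3:

```python
# Coordinatewise vector addition and scalar multiplication on F^3,
# with F the field of rationals, plus spot-checks of the space axioms.
from fractions import Fraction as Q

def vadd(x, y):
    return tuple(a + b for a, b in zip(x, y))

def smul(alpha, x):
    return tuple(alpha * a for a in x)

x = (Q(1), Q(2), Q(3))
y = (Q(4), Q(5), Q(6))
zero = (Q(0),) * 3

assert vadd(x, y) == vadd(y, x)                    # axiom (i)
assert vadd(zero, x) == x                          # axiom (iii)
assert vadd(x, smul(Q(-1), x)) == zero             # -x = (-1)x
assert smul(Q(2), vadd(x, y)) == vadd(smul(Q(2), x), smul(Q(2), y))  # (iv)
```

Since every axiom reduces coordinatewise to a field axiom of F, these checks are representative of the full verification the text describes.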
of all infinite sequences; i.e., there is no requirement that any type of convergence
of the sequence be implied. ∎
3.1.18. Example. Let X denote the set of infinite sequences of real numbers
of the form (3.1.12), with the property that ∑_{i=1}^∞ |ξᵢ| < ∞. Let F be the field
of real numbers, let vector addition be defined similarly as in Eq. (3.1.8), and let
scalar multiplication be defined similarly as in Eq. (3.1.9). Then X is a vector
space. ∎
3.2.2. Example. The set consisting of the null vector 0 is a linear subspace;
i.e., the set Y = {0} is a linear subspace. Also, the vector space X is a linear
subspace of itself. If a linear subspace Y is not all of X, then we say that Y
is a proper subspace of X. ∎
Concerning linear subspaces we now state and prove the following result.
3.1.7. Figure B. Y + Z.
3.2.13. Definition. Let X₁, ..., Xᵣ be linear subspaces of the vector
space X. The sum X₁ + ... + Xᵣ is said to be a direct sum if for each
x ∈ X₁ + ... + Xᵣ there is a unique set of xᵢ ∈ Xᵢ, i = 1, ..., r, such that
x = x₁ + ... + xᵣ. We denote the direct sum of X₁, ..., Xᵣ by X₁ ⊕ ...
⊕ Xᵣ.
There is a connection between the Cartesian product of two vector spaces
and their direct sum. Let Y and Z be two arbitrary linear spaces over the same
field F, and let V = Y × Z. Thus, if v ∈ V, then v is the ordered pair
v = (y, z). (3.2.14)
Then Y′ and Z′ are linear subspaces of V, and V = Y′ ⊕ Z′. By abuse of
notation, we frequently express this simply as V = Y ⊕ Z.
Once more making use of Example 3.1.4, let Y and Z denote two lines
intersecting at the origin 0, as shown in Figure D. The direct sum of the linear
subspaces Y and Z is in this case the "entire plane."
3.2.16. Figure D
We call the set
x + Y ≜ {z ∈ X: z = x + y, y ∈ Y}
a linear variety or a flat or an affine linear subspace of X.
3.2.18. Figure E
Throughout the remainder of this chapter and in the following chapter we use
the following notation: {α₁, ..., αₙ}, αᵢ ∈ F, denotes an indexed set of scalars,
and {x₁, ..., xₙ}, xᵢ ∈ X, denotes an indexed set of vectors.
Before introducing the notions of linear dependence and independence
of a set of vectors in a linear space X, we first consider the following.
tation of x in Eq. (3.3.2) is, of course, not necessarily unique. Thus, in the
case of Example 3.1.10, if X = R² and if x = (1, 1), then x can be represented
as
x = α₁y₁ + α₂y₂ = 1(1, 0) + 1(0, 1)
or as
x = β₁z₁ + β₂z₂ = 2(½, 0) + 3(0, ⅓),
etc. This situation is depicted in Figure F.
3.3.3. Figure F. y₁ = (1, 0), y₂ = (0, 1), z₁ = (½, 0), z₂ = (0, ⅓);
x = (1, 1) = 1(1, 0) + 1(0, 1) = 2(½, 0) + 3(0, ⅓), etc.
Note that the null vector cannot be contained in a set which is linearly
independent. Also, if a set of vectors contains a linearly dependent subset,
then the whole set is linearly dependent.
If X denotes the space of Example 3.1.4, the set of vectors {y, z} in Figure
H is linearly independent, while the set of vectors {u, v} is linearly dependent.
3.3.12. Figure H. Linearly Independent and Linearly Dependent Vectors.
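Linear independence of finitely many vectors in Fⁿ is decidable by Gaussian elimination: the vectors are independent if and only if the matrix having them as rows has rank equal to their number. A sketch with exact rational arithmetic (our addition, not the text's):

```python
# Decide linear independence by computing the row rank of the matrix
# whose rows are the given vectors, using exact rational arithmetic.
from fractions import Fraction

def rank(rows):
    rows = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    for c in range(len(rows[0])):
        piv = next((i for i in range(rk, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[rk], rows[piv] = rows[piv], rows[rk]
        pv = rows[rk][c]
        rows[rk] = [x / pv for x in rows[rk]]      # normalize the pivot row
        for i in range(len(rows)):
            if i != rk and rows[i][c] != 0:        # eliminate the column
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk

def independent(vectors):
    return rank([list(v) for v in vectors]) == len(vectors)

assert independent([(1, 0), (0, 1)])       # like {y, z} of Figure H
assert not independent([(1, 2), (2, 4)])   # one vector a multiple of the other
assert not independent([(0, 0), (1, 3)])   # a set containing the null vector
```

The last assertion illustrates the remark above: no linearly independent set can contain the null vector.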
Then
x + (−x) = (α₁x₁ + α₂x₂ + ... + αₙxₙ) + (−β₁x₁ − β₂x₂ − ... − βₙxₙ)
= (α₁ − β₁)x₁ + (α₂ − β₂)x₂ + ... + (αₙ − βₙ)xₙ = 0.
Since the vectors x₁, x₂, ..., xₙ form a basis for X, it follows that they
are linearly independent, and therefore we must have (αᵢ − βᵢ) = 0 for
i = 1, ..., n. From this it follows that α₁ = β₁, α₂ = β₂, ..., αₙ = βₙ. ∎
We also have:
3.3.26. Theorem. Let {x₁, x₂, ..., xₙ} be a basis for vector space X, and
let {y₁, ..., yₘ} be any linearly independent set of vectors. Then m ≤ n.
Proof. We need to consider only the case m ≥ n and prove that then we
actually have m = n. Consider the set of vectors {y₁, x₁, ..., xₙ}. Since the
vectors x₁, ..., xₙ span X, y₁ can be expressed as a linear combination of
them. Thus, the set {y₁, x₁, ..., xₙ} is not linearly independent. Therefore,
there exist scalars β₁, α₁, ..., αₙ, not all zero, such that
β₁y₁ + α₁x₁ + ... + αₙxₙ = 0. (3.3.27)
If all the αᵢ are zero, then β₁ ≠ 0 and β₁y₁ = 0. Thus, we can write
β₁y₁ + 0·y₂ + ... + 0·yₘ = 0.
But this contradicts the hypothesis of the theorem and can't happen because
the y₁, ..., yₘ are linearly independent. Therefore, at least one of the αᵢ ≠ 0.
Renumbering the xᵢ, if necessary, we can assume that αₙ ≠ 0. Solving for
xₙ, we now obtain
xₙ = (−β₁/αₙ)y₁ + (−α₁/αₙ)x₁ + ... + (−αₙ₋₁/αₙ)xₙ₋₁. (3.3.28)
Now we show that the set {y₁, x₁, ..., xₙ₋₁} is also a basis for X. Since
{x₁, ..., xₙ} is a basis for X and since, by Eq. (3.3.28), xₙ ∈ V({y₁, x₁, ...,
xₙ₋₁}), the set {y₁, x₁, ..., xₙ₋₁} spans X. To verify linear independence,
suppose that λy₁ + λ₁x₁ + ... + λₙ₋₁xₙ₋₁ = 0 for some λ, λ₁, ..., λₙ₋₁ ∈ F.
If λ ≠ 0, then
y₁ = (−λ₁/λ)x₁ + ... + (−λₙ₋₁/λ)xₙ₋₁ + 0·xₙ, (3.3.29)
while solving Eq. (3.3.27) for y₁ yields
y₁ = (−α₁/β₁)x₁ + ... + (−αₙ/β₁)xₙ. (3.3.30)
Now the term (−αₙ/β₁)xₙ in Eq. (3.3.30) is not zero, because we solved for
xₙ in Eq. (3.3.28); yet the coefficient multiplying xₙ in Eq. (3.3.29) is zero.
Since {x₁, ..., xₙ} is a basis, we have arrived at a contradiction, in view of
Theorem 3.3.25. Therefore, we must have λ = 0. Thus, we have
λ₁x₁ + ... + λₙ₋₁xₙ₋₁ + 0·xₙ = 0,
and since {x₁, ..., xₙ} is a linearly independent set it follows that λ₁ = 0,
..., λₙ₋₁ = 0. Therefore, the set {y₁, x₁, ..., xₙ₋₁} is indeed a basis for X.
By an argument similar to the preceding one we can show that the set
{y₂, y₁, x₁, ..., xₙ₋₂} is a basis for X, that the set {y₃, y₂, y₁, x₁, ..., xₙ₋₃}
is a basis for X, etc. Now if m > n, then we would next utilize yₙ₊₁ in our
process. Since {yₙ, ..., y₁} is a basis by the preceding argument, there exist
coefficients η₁, ..., ηₙ such that
yₙ₊₁ = ηₙyₙ + ... + η₁y₁.
But by Theorem 3.3.15 this means the yᵢ, i = 1, ..., n + 1, are linearly
dependent, a contradiction to the hypothesis of our theorem. From this it
now follows that if m ≥ n, then we must have m = n. This concludes the
proof of the theorem. ∎
We will agree that the linear space consisting of the null vector is finite
dimensional, and we will say that the dimension of this space is ez ro.
Our next result provides us with an alternate characterization of (finite)
dimension of a linear space.
x = (−α₁/αₙ₊₁)x₁ − ... − (αₙ/αₙ₊₁)xₙ
and x ∈ V({x₁, ..., xₙ}); i.e., {x₁, ..., xₙ} is a basis for X. Therefore, X
is n-dimensional. ∎
these results in this book, their proofs will be omitted. In the following
theorems, X is an arbitrary vector space (i.e., finite dimensional or infinite
dimensional).
3.3.38. Theorem. If Y and Z are Hamel bases for a linear space ,X then
Y and Z have the same cardinal number.
The notion of Hamel basis is not the only concept of basis with which we
will deal. Such other concepts (to be specified later) reduce to Hamel basis
on finite-dimensional vector spaces but differ significantly on infinite-dimen-
sional spaces. We will find that on infinite-dimensional spaces the concept
of Hamel basis is not very useful. However, in the case of finite-dimensional
spaces the concept of Hamel basis is most crucial.
In view of the results presented thus far, the reader can readily prove the
following facts.
and
Σᵢ₌₁ᵐ αᵢyᵢ + Σᵢ₌₁ⁿ βᵢeᵢ = 0.
Then there must be some βⱼ ≠ 0, otherwise the linear independence of
{y₁, ..., yₘ} would be contradicted. But this means that eⱼ is a linear combi-
nation of the set of vectors S₂ = {y₁, ..., yₘ, e₁, ..., eⱼ₋₁, eⱼ₊₁, ..., eₙ};
i.e., S₂ is the set S₁ with eⱼ eliminated. Clearly, S₂ still spans X. Now either
S₂ contains n vectors or else it is a linearly dependent set. If it contains n
vectors, then by Theorem 3.3.41 these vectors must be linearly independent,
in which case S₂ is a basis for X. We then let xₘ₊₁ = eⱼ, and the theorem is
proved. On the other hand, if S₂ contains more than n vectors, then we
continue the above procedure to eliminate vectors from the remaining eᵢ's
until exactly n − m of them are left. Letting e_{j₁}, ..., e_{j_{n−m}} denote the
remaining vectors and letting xₘ₊₁ = e_{j₁}, ..., xₙ = e_{j_{n−m}}, we have
completed the proof of the theorem. ∎
Among the most important notions which we will encounter are special
types of mappings on vector spaces, called linear transformations.
L(X, )Y denotes the set of all linear transformations from linear space X
into linear space Y). .
It follows immediately from the above definition that T is a linear transfor-
mation from a linear space X into a linear space Y if and only if
T(Σᵢ₌₁ⁿ αᵢxᵢ) = Σᵢ₌₁ⁿ αᵢT(xᵢ)
for all xᵢ ∈ X and for all αᵢ ∈ F, i = 1, ..., n. In engineering
and science this is called the principle of superposition and is among the most
important concepts in those disciplines.
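As a numerical aside (not part of the text), the superposition principle is easy to check for the map x ↦ Ax on R³ defined by an arbitrarily chosen matrix:

```python
import numpy as np

# T(x) = A x is a linear transformation from R^3 into R^2;
# the entries of A are arbitrary choices for illustration.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0]])

def T(x):
    return A @ x

x1 = np.array([1.0, 0.0, 2.0])
x2 = np.array([0.5, -1.0, 4.0])
a1, a2 = 2.0, -3.0

# Principle of superposition: T(a1 x1 + a2 x2) = a1 T(x1) + a2 T(x2).
lhs = T(a1 * x1 + a2 * x2)
rhs = a1 * T(x1) + a2 * T(x2)
print(np.allclose(lhs, rhs))  # True
```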
3.4.4. Example. Let X denote the space of all complex-valued functions x(t)
defined on the half-open interval [0, ∞) such that x(t) is Riemann integrable
and such that
|x(t)| ≤ k e^{at}, t ≥ 0,
where k is some positive constant and a is any real number. Defining vector
addition and scalar multiplication as in Eqs. (3.1.20) and (3.1.21), respectively,
it is easily shown that X is a linear space. Now let Y denote the linear space of
complex functions of a complex variable s (s = σ + iω, i = √(−1)). The
reader can readily verify that the mapping T: X → Y defined by
T[x](s) = ∫₀^∞ e^{−st} x(t) dt (3.4.5)
defined for a ≤ s ≤ b, a ≤ t ≤ b, such that for each x ∈ X the Riemann
integral
∫ₐᵇ k(s, t)x(t) dt (3.4.7)
T[x](s) = y(s) = x(s) − ∫ₐᵇ k(s, t)x(t) dt, (3.4.10)
and
Thus,
or
β₁y₁ + ... + βₘ₊₁yₘ₊₁ = 0.
Therefore, by Corollary 3.3.34, R(T) is finite dimensional and
dim[R(T)] ≤ n. ∎
3.4. Linear Transformations

3.4.22. Example. Let T: R² → R^∞, where R² and R^∞ are defined in Exam-
ples 3.1.10 and 3.1.11, respectively. For x ∈ R² we write x = (ξ₁, ξ₂). Define
T by
T(ξ₁, ξ₂) = (0, ξ₁, 0, ξ₂, 0, 0, ...).
The mapping T is clearly a linear transformation. The vectors (0, 1, 0, ...)
and (0, 0, 0, 1, 0, 0, ...) span R(T), and dim R(T) = 2 = dim R². ∎
We also have:
But {e₁, e₂, ..., eₙ} is a basis for X. From this it follows that γ₁ = γ₂ = ...
= γᵣ = γᵣ₊₁ = ... = γₙ = 0. Hence, f₁, f₂, ..., fᵣ are linearly independent
and therefore dim R(T) = r. If s = 0, the preceding proof remains valid if
we let {e₁, ..., eₙ} be any basis for X and ignore the remarks about the
vectors {eᵣ₊₁, ..., eₙ}. If s = n, then N(T) = X. Hence, R(T) = {0} and so
dim R(T) = 0. This concludes the proof of the theorem. ∎
Since T has an inverse if and only if N(T) = {0}, it follows that ρ(T) = dim X
if and only if T has an inverse. ∎
𝔇(T) = X;
(iii) Tx = 0 implies x = 0;
(iv) for each y ∈ R(T), there is a unique x ∈ X such that Tx = y;
(v) if Tx₁ = Tx₂, then x₁ = x₂; and
(vi) if x₁ ≠ x₂, then Tx₁ ≠ Tx₂.
With the above definitions it is now easy to establish the following result.
3.4.74. Theorem. Let X and Y be two linear spaces over the same field
of scalars F, and let L(X, Y) denote the set of all linear transformations from
X into Y. Then L(X, Y) is itself a linear space over F, called the space of
linear transformations (here, vector addition is defined by Eq. (3.4.24) and
multiplication of vectors by scalars is defined by Eq. (3.4.43)).
Now let us return to the subject at hand. Let X, Y, and Z be linear spaces
over F, and consider the vector spaces L(X, Y) and L(Y, Z). If S ∈ L(Y, Z)
and if T ∈ L(X, Y), then we define the product ST as the mapping of X into
Z characterized by
(ST)x = S(Tx) (3.4.50)
for all x ∈ X. The reader can readily verify that ST ∈ L(X, Z).
Next, let X = Y = Z. If S, T, U ∈ L(X, X) and if α, β ∈ F, then it is
easily shown that
S(TU) = (ST)U, (3.4.51)
S(T + U) = ST + SU, (3.4.52)
(S + T)U = SU + TU, (3.4.53)
and
(αS)(βT) = (αβ)ST. (3.4.54)
For example, to verify (3.4.52), we observe that
[S(T + U)]x = S[(T + U)x] = S[Tx + Ux]
= (ST)x + (SU)x = (ST + SU)x
for all x ∈ X, and hence Eq. (3.4.52) follows.
We emphasize at this point that, in general, commutativity of linear
transformations does not hold; i.e., in general,
ST ≠ TS. (3.4.55)
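A concrete instance of (3.4.55), with two arbitrarily chosen transformations on R² represented by matrices:

```python
import numpy as np

# Two linear transformations on R^2, represented by (2 x 2) matrices;
# the entries are arbitrary choices.
S = np.array([[0.0, 1.0],
              [0.0, 0.0]])
T = np.array([[1.0, 0.0],
              [0.0, 0.0]])

ST = S @ T   # the product ST: first apply T, then S
TS = T @ S

print(np.array_equal(ST, TS))  # False: ST is the zero matrix, TS is not
```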
There is a special mapping from a linear space X into X, called the identity
transformation, defined by
Ix = x (3.4.56)
for all x ∈ X. We note that I is linear, i.e., I ∈ L(X, X), that I ≠ O if and only
if X ≠ {0}, that I is unique, and that
TI = IT = T (3.4.57)
for all T ∈ L(X, X). Also, we can readily verify that the transformation
106 Chapter 3 / Vector Spaces and Linear Transformations
αI, α ∈ F, defined by
(αI)x = αIx = αx (3.4.58)
is also a linear transformation.
The above discussion gives rise to the following result.
We further have:
3.4.65. Theorem. Let X be a linear space, and let S, T, U ∈ L(X, X). Let
I ∈ L(X, X) denote the identity transformation.
(i) If ST = US = I, then S is bijective and S⁻¹ = T = U.
(ii) If S and T are bijective, then ST is bijective, and (ST)⁻¹ = T⁻¹S⁻¹.
(iii) If S is bijective, then (S⁻¹)⁻¹ = S.
(iv) If S is bijective, then αS is bijective and (αS)⁻¹ = (1/α)S⁻¹ for all
α ∈ F and α ≠ 0.
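Parts (ii) and (iii) are easy to verify numerically for matrix representations; the nonsingular matrices below are arbitrary illustrations:

```python
import numpy as np

# Arbitrary nonsingular (2 x 2) matrices standing in for bijective S, T.
S = np.array([[2.0, 1.0],
              [1.0, 1.0]])
T = np.array([[1.0, 3.0],
              [0.0, 1.0]])

# (ii): the inverse of a product reverses the order of the factors.
lhs = np.linalg.inv(S @ T)
rhs = np.linalg.inv(T) @ np.linalg.inv(S)
print(np.allclose(lhs, rhs))  # True

# (iii): inverting twice recovers the original transformation.
print(np.allclose(np.linalg.inv(np.linalg.inv(S)), S))  # True
```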
With the aid of the above concepts and results we can now construct
certain classes of functions of linear transformations. Since relation (3.4.51)
allows us to write the product of three or more linear transformations without
the use of parentheses, we can define Tⁿ, where T ∈ L(X, X) and n is a positive
integer, as
Tⁿ ≜ T·T· ... ·T (n times). (3.4.67)
Similarly, if T⁻¹ is the inverse of T, then we can define T⁻ᵐ, where m is a
positive integer, as
T⁻ᵐ ≜ (T⁻¹)ᵐ = T⁻¹·T⁻¹· ... ·T⁻¹ (m times). (3.4.68)
With these conventions it follows that
Tᵐ⁺ⁿ = T·T· ... ·T (m + n times)
= (T·T· ... ·T)·(T·T· ... ·T) (m times, n times)
= Tᵐ·Tⁿ. (3.4.69)
In a similar fashion we have
(Tᵐ)ⁿ = Tᵐⁿ = Tⁿᵐ = (Tⁿ)ᵐ (3.4.70)
and
(3.4.71)
where m and n are positive integers. Consistent with this notation we also
have
T¹ = T (3.4.72)
and
T⁰ = I. (3.4.73)
We are now in a position to consider polynomials of linear transformations.
Thus, if f(λ) is a polynomial, i.e.,
f(λ) = α₀ + α₁λ + ... + αₙλⁿ, (3.4.74)
where α₀, ..., αₙ ∈ F, then by f(T) we mean
f(T) = α₀I + α₁T + ... + αₙTⁿ. (3.4.75)
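For matrices, Eq. (3.4.75) can be evaluated directly; the polynomial and the matrix below are arbitrary illustrations, not examples from the text:

```python
import numpy as np

# f(lambda) = 2 + 3*lambda + lambda^2, evaluated at the matrix T per
# Eq. (3.4.75); T^0 = I by the convention (3.4.73).
alphas = [2.0, 3.0, 1.0]            # alpha_0, alpha_1, alpha_2
T = np.array([[1.0, 2.0],
              [0.0, 3.0]])

f_T = sum(a * np.linalg.matrix_power(T, k) for k, a in enumerate(alphas))
print(f_T)  # [[ 6. 14.]
            #  [ 0. 20.]]
```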
The reader is cautioned that the above concept can, in general, not be
Theorem 3.4.77 points out the importance of the spaces Rⁿ and Cⁿ. Namely,
every n-dimensional vector space over the field of real numbers is isomorphic
to Rⁿ and every n-dimensional vector space over the field of complex numbers
is isomorphic to Cⁿ (see Example 3.1.10).
3.5. LINEAR FUNCTIONALS
3.5.2. Example. Consider the space C[a, b]. Then the mapping
where x₀ is a fixed element of C[a, b] and where x is any element in C[a, b],
is also a linear functional on C[a, b]. ∎
is a linear functional on X. ∎
3.5.9. Exercise. Show that the mappings (3.5.3), (3.5.4), (3.5.5), (3.5.7),
and (3.5.8) are linear functionals.
Now let X be a linear space and let X′ denote the set of all linear func-
tionals on X.
3.5.16. Theorem. The space X′ with vector addition and multiplication
of vectors by scalars defined by equations (3.5.12) and (3.5.13), respectively,
is a vector space over F.
3.5.18. Definition. The linear space X′ is called the algebraic conjugate
of X.
Let us now examine some of the properties of X′ for the case of finite-
dimensional linear spaces. We have:
⟨x, x′⟩ = Σᵢ₌₁ⁿ ξᵢαᵢ.
In our next result and on several other occasions throughout this book,
we make use of the Kronecker delta.
We now have:
⟨eⱼ, Σᵢ₌₁ⁿ βᵢeᵢ′⟩ = 0.
Then
0 = Σᵢ₌₁ⁿ βᵢ⟨eⱼ, eᵢ′⟩ = Σᵢ₌₁ⁿ βᵢδⱼᵢ = βⱼ,
and therefore we have β₁ = β₂ = ... = βₙ = 0. This proves that {e₁′, e₂′,
..., eₙ′} is a linearly independent set.
To show that the set {e₁′, e₂′, ..., eₙ′} spans X′, let x′ ∈ X′ and define
αᵢ = ⟨eᵢ, x′⟩. Let x = Σᵢ₌₁ⁿ ξᵢeᵢ. We then have
⟨x, x′⟩ = ⟨ξ₁e₁ + ... + ξₙeₙ, x′⟩ = ⟨ξ₁e₁, x′⟩ + ... + ⟨ξₙeₙ, x′⟩
= ξ₁⟨e₁, x′⟩ + ... + ξₙ⟨eₙ, x′⟩ = ξ₁α₁ + ... + ξₙαₙ.
Also,
⟨x, eⱼ′⟩ = Σᵢ₌₁ⁿ ξᵢ⟨eᵢ, eⱼ′⟩ = ξⱼ.
3.5.23. Definition. The basis {e₁′, e₂′, ..., eₙ′} of X′ in Theorem 3.5.22 is
called the dual basis of {e₁, e₂, ..., eₙ}.
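For X = Rⁿ this can be made concrete: if the basis vectors e₁, ..., eₙ are the columns of a nonsingular matrix E, then the dual-basis functionals are represented by the rows of E⁻¹, since ⟨eᵢ, eⱼ′⟩ = δᵢⱼ. A sketch with an arbitrarily chosen basis (not one from the text):

```python
import numpy as np

# Basis of R^2 as the columns of E: e1 = (1, 0), e2 = (1, 1).
E = np.array([[1.0, 1.0],
              [0.0, 1.0]])

# Rows of E^(-1) represent the dual-basis functionals e'_1, e'_2.
E_inv = np.linalg.inv(E)

# Defining property <e_i, e'_j> = delta_ij: E_inv @ E is the identity.
print(np.allclose(E_inv @ E, np.eye(2)))  # True

# e'_2 applied to x recovers the second coordinate of x in {e1, e2}.
x = 2.0 * E[:, 0] + 3.0 * E[:, 1]
print(E_inv[1] @ x)  # 3.0
```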
The reader should now have no difficulty in proving the following results.
3.6. BILINEAR FUNCTIONALS
If in Definition 3.6.1 the complex vector space is replaced by a real linear
space, then the concept of conjugate functional reduces to that of linear
functional, for in this case Eq. (3.6.2) assumes the form
g(αx + βy) = αg(x) + βg(y) (3.6.3)
for all x, y ∈ X and for all α, β ∈ R.
vectors, defined by
g(x, y) = ξ₁η₁ + ξ₂η₂ = (ξ₁² + ξ₂²)^(1/2)(η₁² + η₂²)^(1/2) cos θ,
is a bilinear functional. ∎
If ĝ denotes the quadratic form induced by g, then
½[g(x, y) + g(y, x)] = ¼ĝ(x + y) − ¼ĝ(x − y).
Indeed,
¼ĝ(x + y) = ¼[g(x, x + y) + g(y, x + y)]
= ¼[g(x, x) + g(x, y) + g(y, x) + g(y, y)],
and also,
¼ĝ(x − y) = ¼[g(x, x) − g(x, y) − g(y, x) + g(y, y)],
so that
½[g(x, y) + g(y, x)] = ¼ĝ(x + y) − ¼ĝ(x − y). ∎
and
(i/4)ĝ(x − iy) = (i/4)[g(x, x) + ig(x, y) − ig(y, x) + g(y, y)].
Note that Theorems 3.6.13, 3.6.15, and 3.6.17 hold only for complex
vector spaces. Theorem 3.6.15 implies that a bilinear form is uniquely
determined by its induced quadratic form, and Theorem 3.6.13 gives an
explicit connection between g and ĝ. In the case of real spaces, these
conclusions do not follow.
For the case of real linear spaces, the definition of inner product is identical
to the above definition.
Since in a given discussion the particular bilinear functional g is always
It should be noted that if two different inner products are defined on the
same linear space X, say (·,·)₁ and (·,·)₂, then we have two different inner
product spaces, namely, {X; (·,·)₁} and {X; (·,·)₂}.
Now let {X; (·,·)′} be an inner product space, let Y be a linear subspace
of X, and let (·,·)″ denote the inner product on Y induced by the inner
product on X; i.e.,
(x, y)′ = (x, y)″ (3.6.21)
for all x, y ∈ Y ⊂ X. Then {Y; (·,·)″} is an inner product space in its own
right, and we say that Y is an inner product subspace of X.
Using the concept of inner product, we are in a position to introduce the
notion of orthogonality. We have:
Before closing the present section, let us consider a few specific examples.
3.6.23. Example. Let X = Rⁿ. For x = (ξ₁, ..., ξₙ) ∈ Rⁿ and y = (η₁,
..., ηₙ) ∈ Rⁿ, we can readily verify that
(x, y) = Σᵢ₌₁ⁿ ξᵢηᵢ
is an inner product, and {X; (·,·)} is a real inner product space. ∎
3.6.24. Example. Let X = Cⁿ. For x = (ξ₁, ..., ξₙ) ∈ Cⁿ and y = (η₁,
..., ηₙ) ∈ Cⁿ, let
(x, y) = Σᵢ₌₁ⁿ ξᵢη̄ᵢ.
Then (x, y) is an inner product and {X; (·,·)} is a complex inner product
space. ∎
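A numerical check of Example 3.6.24 with arbitrarily chosen vectors; conjugating the second argument is what makes (x, x) real and nonnegative:

```python
import numpy as np

x = np.array([1 + 2j, 3 - 1j])
y = np.array([2 - 1j, 0 + 1j])

# (x, y) = sum_i xi_i * conj(eta_i), as in Example 3.6.24.
ip = np.sum(x * np.conj(y))

# np.vdot conjugates its *first* argument, so (x, y) equals vdot(y, x).
print(np.isclose(ip, np.vdot(y, x)))  # True

# (x, x) = |1+2j|^2 + |3-1j|^2 = 5 + 10 = 15, a nonnegative real number.
print(np.sum(x * np.conj(x)).real)  # 15.0
```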
3.7. PROJECTIONS

x = x₁ + x₂

transformation which maps every point x in the plane X onto the subspace
X₁ along the subspace X₂.

X) for, in general, R(T) and N(T) need not be disjoint. For example, if
there exists a vector x ∈ X such that Tx ≠ 0 and such that T²x = 0,
then Tx ∈ R(T) and Tx ∈ N(T).
Let us now consider:
Note that this definition does not imply that every element in Y can be
written in the form z = Ty, with y ∈ Y. It is not even assumed that Ty ∈ Y
implies y ∈ Y.
For invariant subspaces under a transformation T ∈ L(X, X) we can
readily prove the following result.
Next we consider:
The material of the present chapter as well as that of the next chapter is
usually referred to as linear algebra. Thus, these two chapters should be
viewed as one package. For this reason, applications (dealing with ordinary
differential equations) are presented at the end of the next chapter.
There are many textbooks and reference works dealing with vector spaces
and linear transformations. Some of these which we have found to be very
useful are cited in the references for this chapter. The reader should consult
these for further study.
REFERENCES

[3.1] P. R. HALMOS, Finite Dimensional Vector Spaces. Princeton, N.J.: D. Van
Nostrand Company, Inc., 1958.
[3.2] K. HOFFMAN and R. KUNZE, Linear Algebra. Englewood Cliffs, N.J.: Prentice-
Hall, Inc., 1971.
[3.3] A. W. NAYLOR and G. R. SELL, Linear Operator Theory in Engineering and
Science. New York: Holt, Rinehart and Winston, 1971.
[3.4] A. E. TAYLOR, Introduction to Functional Analysis. New York: John Wiley &
Sons, Inc., 1966.
4

FINITE-DIMENSIONAL
VECTOR SPACES AND
MATRICES

4.1. COORDINATE REPRESENTATION
OF VECTORS
x = (ξ₁, ξ₂, ..., ξₙ)ᵀ (4.1.2)
or as
xᵀ = (ξ₁, ξ₂, ..., ξₙ). (4.1.3)
We call x (or xᵀ) the coordinate representation of the underlying object (vector)
x with respect to the basis {x₁, ..., xₙ}. We call x a column vector and xᵀ a
row vector. Also, we say that xᵀ is the transpose vector, or simply the transpose
of the vector x. Furthermore, we define (xᵀ)ᵀ to be x.
It is important to note that in the coordinate representation (4.1.2) or
(4.1.3) of the vector (4.1.1), an "ordering" of the basis {x₁, ..., xₙ} is em-
ployed (i.e., the coefficient of xᵢ is the ith entry in Eqs. (4.1.2) and (4.1.3)).
If the members of this basis were to be relabeled, thus specifying a different
"ordering," then the corresponding coordinate representation of the vector
x would have to be altered to reflect this change. However, this does not pose
any difficulties, because in a given discussion we will always agree on a par-
ticular "ordering" of the basis vectors.
Now let α ∈ F. Then
αx = α(ξ₁x₁ + ... + ξₙxₙ) = (αξ₁)x₁ + ... + (αξₙ)xₙ. (4.1.4)
In view of Eqs. (4.1.1)–(4.1.4) it now follows that the coordinate representa-
tion of αx with respect to the basis {x₁, ..., xₙ} is given by
αx = (αξ₁, αξ₂, ..., αξₙ)ᵀ (4.1.5)
or
αxᵀ = α(ξ₁, ξ₂, ..., ξₙ) = (αξ₁, αξ₂, ..., αξₙ). (4.1.6)
Next, let y ∈ X, where
y = η₁x₁ + ... + ηₙxₙ, (4.1.7)
and let
y = (η₁, ..., ηₙ)ᵀ (4.1.8)
or
yᵀ = (η₁, ..., ηₙ). (4.1.9)
Now
x + y = (ξ₁x₁ + ... + ξₙxₙ) + (η₁x₁ + ... + ηₙxₙ)
= (ξ₁ + η₁)x₁ + ... + (ξₙ + ηₙ)xₙ. (4.1.10)
From Eq. (4.1.10) it now follows that the coordinate representation of the
vector x + y ∈ X with respect to the basis {x₁, ..., xₙ} is given by
x + y = (ξ₁ + η₁, ..., ξₙ + ηₙ)ᵀ (4.1.11)
or
xᵀ + yᵀ = (ξ₁, ..., ξₙ) + (η₁, ..., ηₙ)
= (ξ₁ + η₁, ..., ξₙ + ηₙ). (4.1.12)
Next, let {u₁, ..., uₙ} and {v₁, ..., vₙ} be two different bases for the linear
space X. Then clearly there exist two different but unique sets of scalars
(i.e., coordinates) {α₁, ..., αₙ} and {β₁, ..., βₙ} such that
x = α₁u₁ + ... + αₙuₙ = β₁v₁ + ... + βₙvₙ. (4.1.13)
This enables us to represent the same vector x ∈ X with respect to two dif-
ferent bases in terms of two different but unique sets of coordinates, namely,
(α₁, ..., αₙ)ᵀ and (β₁, ..., βₙ)ᵀ. (4.1.14)
The next two examples are intended to throw additional light on the above
discussion.
x = (ξ₁, ..., ξₙ)ᵀ (4.1.17)
or
xᵀ = (ξ₁, ..., ξₙ).
Moreover, the coordinate representations of the basis vectors u₁, ..., uₙ are
u₁ = (1, 0, ..., 0)ᵀ, u₂ = (0, 1, 0, ..., 0)ᵀ, ..., uₙ = (0, ..., 0, 1)ᵀ, (4.1.18)
respectively. We call the coordinates in Eq. (4.1.17) the natural coordinates
of x ∈ Rⁿ. (The natural basis for Fⁿ and the natural coordinates of x ∈ Fⁿ
are similarly defined.)
Next, consider the set of vectors {v₁, ..., vₙ}, given by v₁ = (1, 0, ..., 0),
v₂ = (1, 1, 0, ..., 0), ..., vₙ = (1, ..., 1). We see that the vectors {v₁,
..., vₙ} form a basis for Rⁿ. We can express the vector x given in Eq. (4.1.16)
in terms of this basis by
x = α₁v₁ + ... + αₙvₙ, (4.1.19)
where αₙ = ξₙ and αᵢ = ξᵢ − ξᵢ₊₁ for i = 1, 2, ..., n − 1. Thus, the coor-
dinate representation of x relative to {v₁, ..., vₙ} is given by
(α₁, α₂, ..., αₙ)ᵀ = (ξ₁ − ξ₂, ξ₂ − ξ₃, ..., ξₙ₋₁ − ξₙ, ξₙ)ᵀ. (4.1.20)
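The closed form αₙ = ξₙ, αᵢ = ξᵢ − ξᵢ₊₁ can be checked by solving the system x = α₁v₁ + ... + αₙvₙ numerically; the coordinates ξᵢ below are arbitrary:

```python
import numpy as np

n = 4
xi = np.array([3.0, 1.0, 4.0, 1.0])   # natural coordinates of x

# Columns of V are v1 = (1,0,0,0), v2 = (1,1,0,0), ..., v4 = (1,1,1,1).
V = np.triu(np.ones((n, n)))

# Coordinates relative to {v1, ..., vn}: solve V @ alpha = xi.
alpha = np.linalg.solve(V, xi)

# Closed form from the text: alpha_i = xi_i - xi_{i+1}, alpha_n = xi_n.
alpha_closed = np.append(xi[:-1] - xi[1:], xi[-1])
print(np.allclose(alpha, alpha_closed))  # True
```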
4.1.21. Example. Let X = C[a, b], the set of all real-valued continuous
functions on the interval [a, b]. Let Y = {x₀, x₁, ..., xₙ} ⊂ X, where x₀(t)
= 1 and xᵢ(t) = tⁱ for all t ∈ [a, b], i = 1, ..., n. As we saw in Exercise
3.3.13, Y is a linearly independent set in X and as such it is a basis for V(Y).
Hence, for any y ∈ V(Y) there exists a unique set of scalars {η₀, η₁, ..., ηₙ}
such that
y = η₀x₀ + ... + ηₙxₙ. (4.1.22)
Since y is a polynomial in t we can write, more explicitly,
y(t) = η₀ + η₁t + ... + ηₙtⁿ, t ∈ [a, b]. (4.1.23)
In the present example there is also a coordinate representation; i.e., we can
represent y ∈ V(Y) by
y = (η₀, η₁, ..., ηₙ)ᵀ. (4.1.24)
This representation is with respect to the basis {x₀, x₁, ..., xₙ} in V(Y).
We could, of course, also have used another basis for V(Y). For example,
let us choose the basis {z₀, z₁, ..., zₙ} for V(Y) given in Exercise 3.3.13. Then
we have
y = α₀z₀ + α₁z₁ + ... + αₙzₙ, (4.1.25)
where αₙ = ηₙ and αᵢ = ηᵢ − ηᵢ₊₁, i = 0, 1, ..., n − 1. Thus, y ∈ V(Y)
may also be represented with respect to the basis {z₀, z₁, ..., zₙ} by
(α₀, α₁, ..., αₙ)ᵀ = (η₀ − η₁, η₁ − η₂, ..., ηₙ₋₁ − ηₙ, ηₙ)ᵀ. (4.1.26)
Summarizing, we observe:
1. Every vector x belonging to an n-dimensional linear space X over
a field F can be represented in terms of a coordinate vector x, or its
transpose xᵀ, with respect to a given basis {e₁, ..., eₙ} ⊂ X. We note
that xᵀ ∈ Fⁿ (the space Fⁿ is defined in Example 3.1.7). By convention
we will henceforth also write x ∈ Fⁿ. To indicate the coordinate repre-
sentation of x ∈ X by x ∈ Fⁿ, we write x ∼ x.
2. In representing x by x, an "ordering" of the basis {e₁, ..., eₙ} ⊂ X
is implied.
4.2. MATRICES
4.2.1. Theorem. Let {e₁, e₂, ..., eₙ} be a basis for a linear space X.
(i) Let A be a linear transformation from X into vector space Y, and
set e₁′ = Ae₁, e₂′ = Ae₂, ..., eₙ′ = Aeₙ. If x is any vector in X and if
(ξ₁, ξ₂, ..., ξₙ) are the coordinates of x with respect to {e₁, e₂, ...,
eₙ}, then Ax = ξ₁e₁′ + ξ₂e₂′ + ... + ξₙeₙ′.
(ii) Let {e₁′, e₂′, ..., eₙ′} be any set of vectors in Y. Then there exists a
unique linear transformation A from X into Y such that Ae₁ = e₁′,
Ae₂ = e₂′, ..., Aeₙ = eₙ′.
Proof. To prove (i) we note that
Ax = A(ξ₁e₁ + ξ₂e₂ + ... + ξₙeₙ) = ξ₁Ae₁ + ξ₂Ae₂ + ... + ξₙAeₙ
= ξ₁e₁′ + ξ₂e₂′ + ... + ξₙeₙ′.
To prove (ii), we first observe that for each x ∈ X we have unique scalars
ξ₁, ξ₂, ..., ξₙ such that
x = ξ₁e₁ + ξ₂e₂ + ... + ξₙeₙ.
Now define a mapping A from X into Y as
A(x) = ξ₁e₁′ + ξ₂e₂′ + ... + ξₙeₙ′.
Clearly, A(eᵢ) = eᵢ′ for i = 1, ..., n. We first must show that A is linear.
Given x = ξ₁e₁ + ξ₂e₂ + ... + ξₙeₙ and y = η₁e₁ + η₂e₂ + ... + ηₙeₙ,
we have
A(x + y) = A[(ξ₁ + η₁)e₁ + ... + (ξₙ + ηₙ)eₙ]
= (ξ₁ + η₁)e₁′ + ... + (ξₙ + ηₙ)eₙ′
and
A(x) + A(y) = ξ₁e₁′ + ξ₂e₂′ + ... + ξₙeₙ′ + η₁e₁′ + η₂e₂′ + ... + ηₙeₙ′
= (ξ₁ + η₁)e₁′ + (ξ₂ + η₂)e₂′ + ... + (ξₙ + ηₙ)eₙ′
= A(x + y).
In an identical way we establish that
αA(x) = A(αx)
for all x ∈ X and all α ∈ F. It thus follows that A ∈ L(X, Y).
To show that A is unique, suppose there exists a B ∈ L(X, Y) such that
Beᵢ = eᵢ′ for i = 1, ..., n. It follows that (A − B)eᵢ = 0 for all i = 1, ..., n,
and thus it follows from Exercise 3.4.64 that A = B. ∎
We point out that part (i) of Theorem 4.2.1 implies that a linear transfor-
mation is completely determined by knowing how it transforms the basis vectors
in its domain, and part (ii) of Theorem 4.2.1 states that this linear transfor-
mation is uniquely determined in this way. We will utilize these facts in the
following.
Now let X be an n-dimensional vector space, and let {e₁, e₂, ..., eₙ} be
a basis for X. Let Y be an m-dimensional vector space, and let {f₁, f₂, ..., fₘ}
be a basis for Y. Let A ∈ L(X, Y), and let eᵢ′ = Aeᵢ for i = 1, ..., n. Since
{f₁, f₂, ..., fₘ} is a basis for Y, there are unique scalars {aᵢⱼ}, i = 1, ..., m,
j = 1, ..., n, such that
Ae₁ = e₁′ = a₁₁f₁ + a₂₁f₂ + ... + aₘ₁fₘ,
Ae₂ = e₂′ = a₁₂f₁ + a₂₂f₂ + ... + aₘ₂fₘ, (4.2.2)
.....................................
Aeₙ = eₙ′ = a₁ₙf₁ + a₂ₙf₂ + ... + aₘₙfₘ.
Now let x ∈ X. Then x has the unique representation
x = ξ₁e₁ + ξ₂e₂ + ... + ξₙeₙ
with respect to the basis {e₁, ..., eₙ}. In view of part (i) of Theorem 4.2.1
we have
Ax = ξ₁e₁′ + ... + ξₙeₙ′. (4.2.3)
Since Ax ∈ Y, Ax has a unique representation with respect to the basis
{f₁, f₂, ..., fₘ}, say,
Ax = η₁f₁ + η₂f₂ + ... + ηₘfₘ. (4.2.4)
A = [aᵢⱼ] = [ a₁₁ a₁₂ ... a₁ₙ ]
            [ a₂₁ a₂₂ ... a₂ₙ ]
            [  ⋮           ⋮  ]
            [ aₘ₁ aₘ₂ ... aₘₙ ]  (4.2.6)

We see that once the bases {e₁, e₂, ..., eₙ}, {f₁, f₂, ..., fₘ} are fixed, we can
represent the linear transformation A by the array given in Eq. (4.2.6).

4.2.7. Definition. The array given in Eq. (4.2.6) is called the matrix A of
the linear transformation A from linear space X into linear space Y with respect
to the basis {e₁, ..., eₙ} of X and the basis {f₁, ..., fₘ} of Y.

If, in Definition 4.2.7, X = Y, and if for both X and Y the same basis
{e₁, ..., eₙ} is used, then we simply speak of the matrix A of the linear trans-
formation A.

In Eq. (4.2.6) the scalars (a₁ⱼ, a₂ⱼ, ..., aₘⱼ) form the jth column of A. The
scalar aᵢⱼ refers to that element of matrix A which can be found in the ith row
and jth column of A. The array in Eq. (4.2.6) is said to be an (m × n) matrix.
If m = n, we speak of a square matrix (i.e., an (n × n) matrix).
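Equation (4.2.2) says that the jth column of A lists the coordinates of Aeⱼ. As an illustration not taken from the text, let X be the polynomials of degree less than 3 with basis {1, t, t²} and let A be differentiation:

```python
import numpy as np

# d/dt maps 1 -> 0, t -> 1, t^2 -> 2t; column j holds the coordinates
# of the image of the jth basis vector (cf. Eq. (4.2.2)).
A = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 2.0],
              [0.0, 0.0, 0.0]])

# p(t) = 1 + 4t + 9t^2 has coordinate vector (1, 4, 9).
p = np.array([1.0, 4.0, 9.0])

dp = A @ p
print(dp)  # [ 4. 18.  0.]  i.e., p'(t) = 4 + 18t
```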
In accordance with our discussion of Section 4.1, an (n × 1) matrix
is called a column vector, column matrix, or n-vector, and a (1 × n) matrix
is called a row vector.
We say that two (m × n) matrices A = [aᵢⱼ] and B = [bᵢⱼ] are equal if
and only if aᵢⱼ = bᵢⱼ for all i = 1, ..., m and for all j = 1, ..., n.
From the preceding discussion it should be clear that the same linear
transformation A from linear space X into linear space Y may be represented
by different matrices, depending on the particular choice of bases in X and Y.
Since it is always clear from context which particular bases are being used,
we usually don't refer to them explicitly, thus avoiding cumbersome notation.
Now let Aᵀ denote the transpose of A ∈ L(X, Y) (refer to Definition
3.5.27). Our next result provides the matrix representation of Aᵀ.
for j = 1, ..., m. By Theorem 3.5.22, ⟨eₖ, eᵢ′⟩ = δₖᵢ and ⟨fₗ, fⱼ′⟩ = δₗⱼ.
Therefore,
Also,
⟨Aeᵢ, fⱼ′⟩ = ⟨eᵢ, Aᵀfⱼ′⟩ = ⟨eᵢ, Σₖ₌₁ⁿ bₖⱼeₖ′⟩
= Σₖ₌₁ⁿ bₖⱼ⟨eᵢ, eₖ′⟩ = bᵢⱼ.
Therefore, bᵢⱼ = aⱼᵢ, which proves the theorem. ∎
Our next result follows trivially from the discussion leading up to Defini-
tion 4.2.7.
be the matrix of A with respect to the bases {e₁, e₂, ..., eₙ} and {f₁, f₂, ...,
fₘ}. Then
a₁₁ξ₁ + a₁₂ξ₂ + ... + a₁ₙξₙ = η₁,
a₂₁ξ₁ + a₂₂ξ₂ + ... + a₂ₙξₙ = η₂, (4.2.13)
.....................................
aₘ₁ξ₁ + aₘ₂ξ₂ + ... + aₘₙξₙ = ηₘ,
or, equivalently,
ηᵢ = Σⱼ₌₁ⁿ aᵢⱼξⱼ, i = 1, ..., m. (4.2.14)
Using matrix and vector notation, let us agree to express the system of
linear equations given by Eq. (4.2.13) equivalently as

[ a₁₁ a₁₂ ... a₁ₙ ] [ ξ₁ ]   [ η₁ ]
[ a₂₁ a₂₂ ... a₂ₙ ] [ ξ₂ ] = [ η₂ ]  (4.2.16)
[  ⋮           ⋮  ] [ ⋮  ]   [ ⋮  ]
[ aₘ₁ aₘ₂ ... aₘₙ ] [ ξₙ ]   [ ηₘ ]

i.e., as Ax = y (4.2.17), or, in transposed form, in short, as
xᵀAᵀ = yᵀ. (4.2.19)
We note that in Eq. (4.2.17), x ∈ Fⁿ, y ∈ Fᵐ, and A is an m × n matrix.
From our discussion thus far it should be clear that we can utilize matrices
to study systems of linear equations which are of the form of Eq. (4.2.13).
It should also be clear that an m × n matrix A is nothing more than a unique
representation of a linear transformation A of an n-dimensional vector space
X into an m-dimensional vector space Y over the same field F. As such,
A possesses all the properties of such transformations. We could, in fact,
utilize matrices in place of general linear transformations to establish many
facts concerning linear transformations defined on finite-dimensional linear
spaces. However, since a given matrix is dependent upon the selection of two
particular sets of bases (not necessarily distinct), such practice will, in general,
be avoided whenever possible.
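The correspondence between the system (4.2.13) and the matrix-vector product can be checked directly; the entries below are arbitrary:

```python
import numpy as np

# An (m x n) matrix with m = 2, n = 3, and the coordinates xi of x.
A = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 4.0]])
xi = np.array([2.0, -1.0, 1.0])

# eta_i = sum_j a_ij xi_j, Eq. (4.2.14), is exactly the ith entry of A @ xi.
eta = A @ xi
eta_by_hand = np.array([sum(A[i, j] * xi[j] for j in range(3))
                        for i in range(2)])
print(np.allclose(eta, eta_by_hand))  # True
print(eta)  # [3. 3.]
```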
We emphasize that a matrix and a linear transformation are not one and
the same thing. In many texts no distinction in symbols is made between
linear transformations and their matrix representation. We will not follow
this custom.
B. Rank of a Matrix

for X and a basis {f₁, ..., fₘ} for Y such that the matrix A of A with respect
to these bases is of the form

A = [ Iᵣ  0 ]
    [ 0   0 ]  (4.2.21)

where Iᵣ denotes the (r × r) identity matrix; i.e., the first r diagonal entries
of A are ones and all remaining entries are zeros (here n = dim X).

Proof. We choose a basis for X of the form {e₁, e₂, ..., eᵣ, eᵣ₊₁, ..., eₙ},
where {eᵣ₊₁, ..., eₙ} is a basis for N(A). If f₁ = Ae₁, f₂ = Ae₂, ..., fᵣ = Aeᵣ,
then {f₁, f₂, ..., fᵣ} is a basis for R(A), as we saw in the proof of Theorem
3.4.25. Now choose vectors fᵣ₊₁, ..., fₘ in Y such that the set of vectors
{f₁, f₂, ..., fₘ} forms a basis for Y (see Theorem 3.3.44). Then
Ae₁ = f₁, ..., Aeᵣ = fᵣ, Aeᵣ₊₁ = 0, ..., Aeₙ = 0. (4.2.22)
The necessity is proven by applying Definition 4.2.7 (and also Eq. (4.2.2))
to the set of equations (4.2.22); the desired result given by Eq. (4.2.21)
follows.
Sufficiency follows from the fact that the basis for R(A) contains r linearly
independent vectors. ∎
C. Properties of Matrices

of B with respect to the basis {f₁, f₂, ..., fₘ} in Y and with respect to the basis
{g₁, g₂, ..., gᵣ} in Z. The product mapping BA as defined by Eq. (3.4.50)
is a linear transformation of X into Z. We now ask: what is the matrix C
of BA with respect to the bases {e₁, e₂, ..., eₙ} of X and {g₁, g₂, ..., gᵣ}
of Z?
We have
Bfⱼ = Σₗ₌₁ʳ bₗⱼgₗ, j = 1, ..., m.
Now
BAeₖ = B(Σⱼ₌₁ᵐ aⱼₖfⱼ) = Σⱼ₌₁ᵐ aⱼₖBfⱼ
= Σₗ₌₁ʳ Σⱼ₌₁ᵐ bₗⱼaⱼₖgₗ
for k = 1, ..., n. Thus, the matrix C of BA with respect to basis {e₁, ..., eₙ}
of X and basis {g₁, ..., gᵣ} of Z is C = [cₗₖ], where
cₗₖ = Σⱼ₌₁ᵐ bₗⱼaⱼₖ. (4.2.28)
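The computation above is the familiar matrix product C = BA; a numerical sketch with arbitrarily chosen dimensions and entries:

```python
import numpy as np

A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0]])   # matrix of A: X = R^3 -> Y = R^2
B = np.array([[1.0, 1.0],
              [2.0, 0.0]])        # matrix of B: Y = R^2 -> Z = R^2

C = B @ A                          # matrix of the product mapping BA

x = np.array([1.0, 2.0, 3.0])
# Applying B after A agrees with applying C directly.
print(np.allclose(B @ (A @ x), C @ x))  # True
```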
4.2.32. Theorem.
(i) Let A and B be (m × n) matrices, and let C be an (n × r) matrix.
Then
(A + B)C = AC + BC. (4.2.33)
(ii) Let A be an (m × n) matrix, and let B and C be (n × r) matrices.
Then
A(B + C) = AB + AC. (4.2.34)
(A + B) + C = A + (B + C). (4.2.40)
(4.2.42)
4.2.55. Theorem.
(i) An (n × n) non-singular matrix has one and only one inverse.
(ii) If A and B are non-singular (n × n) matrices, then (AB)⁻¹ = B⁻¹A⁻¹.
(iii) If A and B are (n × n) matrices and if AB is non-singular, then so
are A and B.
4.2.57. Theorem.
(i) For any matrix A, (Aᵀ)ᵀ = A.
(ii) Let A and B be conformal matrices. Then (AB)ᵀ = BᵀAᵀ.
(iii) Let A be a non-singular matrix. Then (Aᵀ)⁻¹ = (A⁻¹)ᵀ.
(iv) Let A be an (n × n) matrix. Then Aᵀ is non-singular if and only if
A is non-singular.
(v) Let A and B be comparable matrices. Then (A + B)ᵀ = Aᵀ + Bᵀ.
(vi) Let α ∈ F and A be a matrix. Then (αA)ᵀ = αAᵀ.
[The numerical entries of the worked examples that follow are largely
illegible in this copy; only their structure is recoverable.]

Example. For specific (3 × 3) real matrices A and B, the sum A + B, the
difference A − B, and the scalar multiple αA with α = 3 are computed
entrywise. ∎

Example. For specific complex matrices C and D, the sum C + D and the
scalar multiple αC with α = −i are computed entrywise. ∎

4.2.71. Example. Let F denote the field of real numbers, and let G and H
be specific real matrices. The product GH is computed. ∎

Example. For specific square matrices K and L, the products KL and LK
are computed; note that KL ≠ LK. ∎

Example. For the specific matrices M and N shown, MN = 0 even though
M ≠ 0 and N ≠ 0. ∎

Example. The transpose Aᵀ of a specific matrix A is written out. ∎

Example. For the specific (3 × 3) matrices P and Q shown,

P·Q = Q·P = [ 1 0 0 ]
            [ 0 1 0 ]
            [ 0 0 1 ],

i.e., Q = P⁻¹ or, equivalently, P = Q⁻¹. ∎
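Since the numerical entries of the preceding examples are illegible in this copy, the following sketch, with freshly chosen matrices, reproduces the same phenomena: entrywise sum and scalar multiple, a product of nonzero matrices that is zero, and a pair P, Q with PQ = QP = I.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])
print(A + B)    # entrywise sum
print(3 * A)    # scalar multiple

# Nonzero matrices whose product is the zero matrix:
M = np.array([[0.0, 1.0], [0.0, 0.0]])
N = np.array([[1.0, 0.0], [0.0, 0.0]])
print(M @ N)    # the (2 x 2) zero matrix

# A matrix P and Q = P^(-1) satisfy P Q = Q P = I:
P = np.array([[2.0, 1.0], [1.0, 1.0]])
Q = np.linalg.inv(P)
print(np.allclose(P @ Q, np.eye(2)) and np.allclose(Q @ P, np.eye(2)))  # True
```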
[Equations (4.2.79) and (4.2.80), a worked numerical example involving a
matrix-vector equation and a specific matrix A, are illegible in this copy.]
Next, we discuss briefly partitioned vectors and matrices. Such vectors and
matrices arise in a natural way when linear transformations acting on the
direct sum of linear spaces are considered.
Let X be an n-dimensional vector space, and let Y be an m-dimensional
vector space. Suppose that X = U ⊕ W, where U is an r-dimensional linear
subspace of X, and suppose that Y = R ⊕ Q, where R is a p-dimensional
linear subspace of Y. Let A ∈ L(X, Y), let {e₁, ..., eₙ} be a basis for X such
that {e₁, ..., eᵣ} is a basis for U, and let {f₁, ..., fₘ} be a basis for Y such
that {f₁, ..., fₚ} is a basis for R. Let A be the matrix of A with respect to
these bases. Now if x ∈ Fⁿ is the coordinate representation of x ∈ X with
respect to the basis {e₁, ..., eₙ}, we can partition x into two components,

x = [ u ]
    [ w ]  (4.2.81)

where u ∈ Fʳ, w ∈ Fⁿ⁻ʳ, and where

y = [ r ]
    [ q ]  (4.2.82)

where y is the coordinate representation of y with respect to {f₁, ..., fₘ}
and where r ∈ Fᵖ and q ∈ Fᵐ⁻ᵖ. We say the vector x in Eq. (4.2.81) is parti-
tioned into components u and w. Clearly, the vector u is determined by the
coordinates of x corresponding to the basis vectors {e₁, ..., eᵣ} in U.
We can similarly divide a matrix B into the partition

B = [ B₁₁  B₁₂ ]
    [ B₂₁  B₂₂ ]  (4.2.86)
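Blockwise multiplication of a partitioned matrix against a partitioned vector can be verified with array slicing; the dimensions below are arbitrary:

```python
import numpy as np

m, n, p, r = 4, 5, 2, 3    # dim Y, dim X, dim R, dim U
rng = np.random.default_rng(0)
B = rng.standard_normal((m, n))
x = rng.standard_normal(n)

# Partition B into B11 (p x r), B12, B21, B22, and x into (u, w).
B11, B12 = B[:p, :r], B[:p, r:]
B21, B22 = B[p:, :r], B[p:, r:]
u, w = x[:r], x[r:]

# Blockwise products reproduce the full product B @ x.
top = B11 @ u + B12 @ w
bottom = B21 @ u + B22 @ w
print(np.allclose(np.concatenate([top, bottom]), B @ x))  # True
```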
We now prove:
P = [ Iᵣ  0 ]
    [ 0   0 ]  (4.2.89)

(the upper left block is the (r × r) identity matrix, and all entries in the
remaining n − r rows and columns are zero), where r = dim R(P).

Proof. Since P is a projection we have, from Eq. (3.7.8),
X = R(P) ⊕ N(P).
Now let r = dim R(P), and let {e₁, ..., eₙ} be a basis for X such that {e₁,
..., eᵣ} is a basis for R(P). Let P be the matrix of P with respect to this basis,
and the theorem follows. ∎
A = [ A₁₁  A₁₂
      0    A₂₂ ].
4.3.1. Theorem. Let {e₁, ..., e_n} be a basis for a linear space X, and let {e₁′, ..., e_n′} be a set of vectors in X given by

e_i′ = Σ_{j=1}^n p_{ji}e_j, i = 1, ..., n, (4.3.2)

where p_{ij} ∈ F for all i, j = 1, ..., n. The set {e₁′, ..., e_n′} forms a basis for X if and only if P = [p_{ij}] is non-singular.
Proof. Let {e₁′, ..., e_n′} be linearly independent, and let P_j denote the jth column vector of P. Let

Σ_{j=1}^n α_j P_j = 0

for some scalars α₁, ..., α_n ∈ F. Rearranging, we have

Σ_{j=1}^n α_j e_j′ = Σ_{i=1}^n ( Σ_{j=1}^n p_{ij}α_j ) e_i = 0.

Since e₁′, ..., e_n′ are linearly independent, it follows that α₁ = ⋯ = α_n = 0. Thus, the columns of P are linearly independent. Therefore, P is non-singular.
Conversely, let P be non-singular; i.e., let {P₁, ..., P_n} be a linearly independent set of vectors in Fⁿ. Let Σ_{i=1}^n α_i e_i′ = 0 for some scalars α₁, ..., α_n ∈ F. Then

0 = Σ_{i=1}^n α_i e_i′ = Σ_{j=1}^n ( Σ_{i=1}^n p_{ji}α_i ) e_j,

and since {e₁, ..., e_n} is a basis, Σ_{i=1}^n p_{ji}α_i = 0 for j = 1, ..., n. Since P is non-singular, this implies α₁ = ⋯ = α_n = 0, so {e₁′, ..., e_n′} is linearly independent and, consisting of n vectors, forms a basis for X. ∎
4.3.6. Theorem. Let X be a linear space, and let the sets of vectors {e₁, ..., e_n}, {e₁′, ..., e_n′}, and {e₁″, ..., e_n″} be bases for X. If P is the matrix of basis {e₁′, ..., e_n′} with respect to basis {e₁, ..., e_n} and if Q is the matrix of basis {e₁″, ..., e_n″} with respect to basis {e₁′, ..., e_n′}, then PQ is the matrix of basis {e₁″, ..., e_n″} with respect to basis {e₁, ..., e_n}.
We now prove: if x has coordinates ξ_i with respect to {e₁, ..., e_n} and coordinates ξ_i′ with respect to {e₁′, ..., e_n′}, then

Σ_{j=1}^n ξ_j′e_j′ = Σ_{j=1}^n ξ_j′ ( Σ_{i=1}^n p_{ij}e_i ) = Σ_{i=1}^n ( Σ_{j=1}^n p_{ij}ξ_j′ ) e_i,

which implies that

ξ_i = Σ_{j=1}^n p_{ij}ξ_j′, i = 1, ..., n.

Therefore,

x = Px′. ∎
Similarly, for A ∈ L(X, Y) with matrix A = [a_{lk}] with respect to the bases {e₁, ..., e_n} and {f₁, ..., f_m},

Ae_i′ = Σ_{k=1}^n p_{ki} Ae_k = Σ_{k=1}^n p_{ki} [ Σ_{l=1}^m a_{lk} ( Σ_{j=1}^m q_{jl}f_j′ ) ] = Σ_{j=1}^m ( Σ_{l=1}^m Σ_{k=1}^n q_{jl}a_{lk}p_{ki} ) f_j′,

so that the matrix A′ of A with respect to {e₁′, ..., e_n′} and {f₁′, ..., f_m′} is A′ = QAP, where x = Px′ and y′ = Qy.
(ii) two (m x n) matrices A and B are equivalent if and only if they
have the same rank; and
(iii) A and Aᵀ have the same rank.
(The accompanying figure depicts A ∈ L(X, X) represented by the matrix A with respect to the basis {e₁, ..., e_n} and by the matrix A′ with respect to the basis {e₁′, ..., e_n′}.)
Our next result shows that ∼ given in Definition 4.3.22 is an equivalence relation.
4.3.27. Theorem.
(i) If an (n × n) matrix A is similar to an (n × n) matrix B, then Aᵏ is similar to Bᵏ, where k is a positive integer.
(ii) Let f(λ) denote the polynomial

f(λ) = α₀ + α₁λ + ⋯ + α_mλᵐ, (4.3.28)

and suppose that A is similar to the diagonal matrix

A′ = [ λ₁  0  ⋯  0
       0  λ₂  ⋯  0
       ⋮
       0   0  ⋯  λ_n ]. (4.3.31)
Then

(A′)ᵏ = [ λ₁ᵏ  0   ⋯  0
          0   λ₂ᵏ  ⋯  0
          ⋮
          0    0   ⋯  λ_nᵏ ].

Now let f(λ) be given by Eq. (4.3.28). Then

f(A′) = [ f(λ₁)   0     ⋯  0
          0     f(λ₂)   ⋯  0
          ⋮
          0       0     ⋯  f(λ_n) ].
We conclude the present section with the following definition.
4.4. DETERMINANTS OF MATRICES
σ = ( 1  2  ⋯  n
      j₁ j₂ ⋯ j_n ),

where j_i ∈ N for i = 1, ..., n and j_i ≠ j_k for i ≠ k. Henceforth, we represent σ given above, more compactly, as

σ = j₁j₂ ⋯ j_n.

Clearly, there are n! possible permutations on N. We let P(N) denote the set of all permutations on N, and we distinguish between odd and even permutations. Specifically, if there is an even number of pairs (i, k) such that i > k but i precedes k in σ, then we say that σ is even. Otherwise σ is said to be odd. Finally, we define the function sgn from P(N) into F by

sgn (σ) = { +1 if σ is even
            −1 if σ is odd }

for all σ ∈ P(N).
Before giving the definition of the determinant of a matrix, let us consider
a specific example.
(The example tabulates, for each permutation σ = j₁j₂j₃ of N = {1, 2, 3}, whether σ is odd or even, together with sgn σ.)
where σ = j₁j₂ ⋯ j_n ∈ P(N). It is possible to find n! such products, one for each σ ∈ P(N). We now define the determinant of A, denoted by det (A), by the sum

det (A) = Σ_{σ∈P(N)} sgn (σ) · a_{1j₁} · a_{2j₂} · ⋯ · a_{nj_n}. (4.4.2)

For example, if A is a (3 × 3) matrix, we write

det (A) = | a₁₁  a₁₂  a₁₃
            a₂₁  a₂₂  a₂₃
            a₃₁  a₃₂  a₃₃ |. (4.4.3)

Furthermore, we have:
Σ_{i=1}^n a_{ij}c_{ik} = 0 for j ≠ k (4.4.16a)

and

Σ_{i=1}^n a_{ik}c_{ik} = det (A), (4.4.16b)

where c_{ik} denotes the cofactor of a_{ik}. Combining these relations,

Σ_{j=1}^n a_{ij}c_{kj} = δ_{ik} det (A), (4.4.19)

i, k = 1, ..., n.
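Definition (4.4.2) can be implemented verbatim: enumerate P(N), compute sgn (σ) from the inversion count, and sum the signed products. The test matrix below is an assumption chosen for illustration; numpy's built-in determinant serves as a cross-check.

```python
import itertools
import numpy as np

def sgn(perm):
    # +1 if the number of inversions (pairs a < b with perm[a] > perm[b])
    # is even, -1 otherwise.
    n = len(perm)
    inv = sum(1 for a in range(n) for b in range(a + 1, n) if perm[a] > perm[b])
    return 1 if inv % 2 == 0 else -1

def det_by_permutations(A):
    # Eq. (4.4.2): det(A) = sum over sigma of sgn(sigma)*a_{1j1}*...*a_{njn}.
    n = A.shape[0]
    return sum(sgn(p) * np.prod([A[i, p[i]] for i in range(n)])
               for p in itertools.permutations(range(n)))

A = np.array([[4.0, 2.0, 1.0], [0.0, 3.0, -1.0], [2.0, 1.0, 5.0]])
print(det_by_permutations(A))    # 54.0
print(np.linalg.det(A))          # agrees, up to rounding
```

The O(n · n!) cost makes this definition impractical beyond small n; it is shown only to make (4.4.2) concrete.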
We are now in a position to prove the following important result.
det (AB) = Σ_{i₁=1}^n ⋯ Σ_{i_n=1}^n b_{i₁1} ⋯ b_{i_nn} · det [a_{i₁}, ..., a_{i_n}],

where a_{i_j} denotes the i_jth column of A. This determinant will vanish whenever two or more of the indices i_j, j = 1, ..., n, are identical. Thus, we need to sum only over the permutations σ = i₁i₂ ⋯ i_n in P(N), where P(N) is the set of all permutations of N = {1, ..., n}. It is now straightforward to show that
A′ = QAP = [ I_r  0
             0    0 ], r < n.

This shows that rank A < n and det (A′) = 0. But

det (QAP) = [det (Q)] · [det (A)] · [det (P)] = 0,

and det (P) ≠ 0 and det (Q) ≠ 0. Therefore, if A is singular, then det (A) = 0. ∎
We now have:
4.4.x. Example. For a (3 × 3) matrix A we compute det (A) = −1; forming the matrix of cofactors yields adj (A), and then

A⁻¹ = (1/det (A)) adj (A) = −adj (A).
4.4.30. Theorem. If A and B are similar matrices, then det (A) = det (B).
4.5. EIGENVALUES AND EIGENVECTORS
Ae_i = λ_ie_i,

where λ_i ∈ F, i = 1, ..., n. If this is the case, then the matrix A′ of A with respect to the given basis is

A′ = [ λ₁  0  ⋯  0
       0  λ₂  ⋯  0
       ⋮
       0   0  ⋯  λ_n ].
Henceforth we let

𝔑_λ = {x ∈ X : (A − λI)x = 0}. (4.5.4)
The preceding result gives rise to several important concepts which we
introduce in the following definition.
Our next result provides the connection between Definitions 4.5.5 and 4.5.6.
4.5.8. Theorem. Let A ∈ L(X, X), and let A be the matrix of A with respect to the basis {e₁, ..., e_n}. Then λ is an eigenvalue of A if and only if λ is an eigenvalue of A. Also, x ∈ X is an eigenvector of A corresponding to λ if and only if its coordinate representation x is an eigenvector of A corresponding to λ.

det (A − λI) = | a₁₁ − λ   a₁₂      ⋯  a₁ₙ
                a₂₁       a₂₂ − λ   ⋯  a₂ₙ
                ⋮
                a_n1       a_n2     ⋯  a_nn − λ |. (4.5.14)
It is clear from Eq. (4.4.2) that expansion of the determinant (4.5.14) yields a polynomial in λ of degree n. In order for λ to be an eigenvalue of A it must (a) satisfy Eq. (4.5.12), and (b) belong to F. Requirement (b) warrants further comment: note that there is no guarantee that there exists λ ∈ F such that Eq. (4.5.12) is satisfied, or equivalently we have no assurance that the nth-order polynomial equation

det (A − λI) = 0

has any roots in F. There is, however, a special class of fields for which requirement (b) is automatically satisfied. We have:

4.5.15. Definition. A field F is said to be algebraically closed if for every polynomial p(λ) there is at least one λ ∈ F such that

p(λ) = 0. (4.5.16)

Any λ which satisfies Eq. (4.5.16) is said to be a root of the polynomial equation (4.5.16).
where λ_i, i = 1, ..., p, are the distinct roots of Eq. (4.5.19) (i.e., λ_i ≠ λ_j for i ≠ j). In Eq. (4.5.24), m_i is called the algebraic multiplicity of the root λ_i. The m_i are positive integers, and Σ_{i=1}^p m_i = n.
The Cayley–Hamilton theorem holds also in the case of linear transformations. Specifically, we have the following result.
Let us use Theorem 4.5.26 to evaluate A³⁷, where A is a (2 × 2) matrix whose eigenvalues are λ₁ = 1 and λ₂ = 2. Since n = 2, we assume that A³⁷ is of the form

A³⁷ = β₀I + β₁A.

The characteristic polynomial of A is

p(λ) = (1 − λ)(2 − λ).

In the present case f(λ) = λ³⁷, and r(λ) in Eq. (4.5.27) is

r(λ) = β₀ + β₁λ.

We must determine β₀ and β₁. Using the fact that p(λ₁) = p(λ₂) = 0, it follows that f(λ_i) = r(λ_i) for i = 1, 2; i.e., β₀ + β₁ = 1 and β₀ + 2β₁ = 2³⁷.
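Solving these two equations gives β₁ = 2³⁷ − 1 and β₀ = 2 − 2³⁷. The claim A³⁷ = β₀I + β₁A can then be checked numerically; the (2 × 2) matrix below is a stand-in with eigenvalues 1 and 2 (an assumption, since the example's own entries are illegible in this copy), and exact integer arithmetic avoids overflow:

```python
import numpy as np

beta1 = 2**37 - 1
beta0 = 2 - 2**37      # from beta0 + beta1 = 1 and beta0 + 2*beta1 = 2**37

# Stand-in matrix with eigenvalues 1 and 2 (not the book's A).
A = np.array([[1, 1], [0, 2]], dtype=object)

def mat_pow(M, k):
    # exact integer matrix power (dtype=object keeps Python ints)
    R = np.eye(2, dtype=object)
    for _ in range(k):
        R = R.dot(M)
    return R

rhs = beta0 * np.eye(2, dtype=object) + beta1 * A
print((mat_pow(A, 37) == rhs).all())   # True: A^37 = beta0*I + beta1*A
```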
(iii) if B is any matrix similar to A, then trace (B) = trace (A); and
(iv) let f(λ) denote the polynomial

f(λ) = α₀ + α₁λ + ⋯ + α_mλᵐ;

then the roots of the characteristic polynomial of f(A) are f(λ₁), ..., f(λ_n), and

det [f(A) − λI] = [f(λ₁) − λ]^{m₁} ⋯ [f(λ_p) − λ]^{m_p}.
4.6. SOME CANONICAL FORMS OF MATRICES
We note that if, in the above theorem, A has n distinct eigenvalues, then
the corresponding n eigenvectors span the linear space X (recall that dim
X = n ).
Our next result enables us to represent a linear transformation with n
distinct eigenvalues in a very convenient form.
A′ = [ λ₁  0  ⋯  0
       0  λ₂  ⋯  0
       ⋮
       0   0  ⋯  λ_n ]. (4.6.6)
Proof. Let e_i′ denote the eigenvector corresponding to the eigenvalue λ_i. In view of Theorem 4.6.1, the set {e₁′, e₂′, ..., e_n′} is linearly independent because λ₁, λ₂, ..., λ_n are all different. Moreover, since there are n of the e_i′, the set {e₁′, e₂′, ..., e_n′} forms a basis for the n-dimensional space X. Also, from the definition of eigenvalue and eigenvector, we have

Ae₁′ = λ₁e₁′,
Ae₂′ = λ₂e₂′, (4.6.7)
⋮
has n distinct roots, λ₁, ..., λ_n, then A is similar to the matrix A′ of A with respect to a basis {e₁′, ..., e_n′}, where

A′ = [ λ₁  0  ⋯  0
       0  λ₂  ⋯  0
       ⋮
       0   0  ⋯  λ_n ]. (4.6.9)

In this case there exists a non-singular matrix P such that

A′ = P⁻¹AP. (4.6.10)

The matrix P is the matrix of basis {e₁′, e₂′, ..., e_n′} with respect to basis {e₁, e₂, ..., e_n}, and P⁻¹ is the matrix of basis {e₁, ..., e_n} with respect to basis {e₁′, ..., e_n′}. The matrix P can be constructed by letting its columns be eigenvectors of A corresponding to λ₁, ..., λ_n, respectively. That is,

P = [x₁, x₂, ..., x_n], (4.6.11)

where x₁, ..., x_n are eigenvectors of A corresponding to the eigenvalues λ₁, ..., λ_n, respectively.
The similarity transformation P given in Eq. (4.6.11) is called a modal matrix. If the conditions of Theorem 4.6.8 are satisfied and if, in particular, Eq. (4.6.9) holds, then we say that matrix A has been diagonalized.
A = [ −2  4
       1  1 ].

The characteristic polynomial of A is

p(λ) = det (A − λI) = λ² + λ − 6.

Now det (A − λI) = 0 if and only if λ² + λ − 6 = 0, or (λ − 2)(λ + 3) = 0. Thus, the eigenvalues of A are λ₁ = 2 and λ₂ = −3. To find an eigenvector corresponding to λ₁, we solve the equation (A − λ₁I)x = 0; this yields x₁ᵀ = (1, 1), and similarly for λ₂ we obtain x₂ᵀ = (4, −1). The diagonal matrix A′ given in Eq. (4.6.9) is, in the present case,

A′ = [ 2   0
       0  −3 ].

Then

P = [x₁, x₂] = [ 1   4
                 1  −1 ],

P⁻¹ = [ 0.2   0.8
        0.2  −0.2 ],

and

P⁻¹AP = [ 2   0       [ λ₁  0
          0  −3 ]  =    0   λ₂ ].

By Eq. (4.3.2), the basis {e₁′, e₂′} ⊂ X with respect to which A′ represents A is given by

e₁′ = Σ_{i=1}^2 p_{i1}e_i = e₁ + e₂,  e₂′ = Σ_{i=1}^2 p_{i2}e_i = 4e₁ − e₂.

In view of Theorem 4.3.8, if x is the coordinate representation of x with respect to {e₁, e₂}, then x′ = P⁻¹x is the coordinate representation of x with respect to {e₁′, e₂′}. The vectors e₁′, e₂′ are, of course, eigenvectors of A corresponding to λ₁ and λ₂, respectively. ∎
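The example can be verified numerically. A sketch (np.linalg.eig normalizes eigenvectors differently, but any non-zero scalar multiples serve equally well as columns of a modal matrix):

```python
import numpy as np

A = np.array([[-2.0, 4.0], [1.0, 1.0]])
P = np.array([[1.0, 4.0], [1.0, -1.0]])   # columns are the eigenvectors x1, x2

D = np.linalg.inv(P) @ A @ P              # should be diag(2, -3)
print(np.round(D, 10))

evals, _ = np.linalg.eig(A)
print(np.sort(evals.real))                # [-3.  2.]
```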
The characteristic polynomial of the matrix

A = [ 1  3  −2
      0  4  −2
      0  3  −1 ]

is

det (A − λI) = (1 − λ)²(2 − λ) = 0,

and the eigenvalues of A are λ₁ = 1 and λ₂ = 2. The algebraic multiplicity of λ₁ is two. Corresponding to λ₁ we can choose the two linearly independent eigenvectors x₁ᵀ = (1, 0, 0) and x₂ᵀ = (0, 2, 3), and corresponding to λ₂ we have the eigenvector x₃ᵀ = (1, 1, 1). Letting P denote a modal matrix, we have

P = [ 1  0  1
      0  2  1
      0  3  1 ]  and  P⁻¹ = [ 1  −3   2
                              0  −1   1
                              0   3  −2 ],

and

A′ = P⁻¹AP = [ 1  0  0
               0  1  0
               0  0  2 ].

In this example, dim 𝔑_{λ₁} = 2, which happens to be the same as the algebraic multiplicity of λ₁. For this reason we were able to diagonalize matrix A. ∎
The next example shows that the dimension of 𝔑_λ corresponding to an eigenvalue λ need not be the same as its algebraic multiplicity. In this case we are not able to diagonalize the matrix.

The characteristic polynomial of the matrix

A = [ 2   1  −2
      0   0   1
      0  −2   3 ]

is

det (A − λI) = (1 − λ)(2 − λ)² = 0,

and the eigenvalues of A are λ₁ = 1 and λ₂ = 2. The algebraic multiplicity of λ₂ is two. An eigenvector corresponding to λ₁ is x₁ᵀ = (1, 1, 1). An eigenvector corresponding to λ₂ must be of the form

x₂ = [ α
       0
       0 ], α ≠ 0.

Setting x₂ᵀ = (1, 0, 0), we see that dim 𝔑_{λ₂} = 1, and thus we have not been able to determine a basis for R³ consisting of eigenvectors. Consequently, we have not been able to diagonalize A. ∎
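The failure can be seen numerically from the rank of A − λ₂I. A sketch, using the matrix entries as they have been reconstructed here:

```python
import numpy as np

A = np.array([[2.0,  1.0, -2.0],
              [0.0,  0.0,  1.0],
              [0.0, -2.0,  3.0]])

# dim N(A - 2I) = 3 - rank(A - 2I): the geometric multiplicity of lambda2 = 2.
rank = np.linalg.matrix_rank(A - 2.0 * np.eye(3))
print(3 - rank)    # 1, although the algebraic multiplicity of 2 is two
```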
A = [ A₁  ∗
      0   A₂ ],

where dim Y = r, A₁ is an (r × r) matrix, and A₂ is an (n − r) × (n − r) matrix.
We can generalize the preceding result. Suppose that X is the direct sum of linear subspaces X₁, ..., X_p that are invariant under A ∈ L(X, X). We can define linear transformations A_i ∈ L(X_i, X_i), i = 1, ..., p, by A_ix = Ax for x ∈ X_i. That is to say, A_i is the restriction of A to X_i. We now can find for each A_i a matrix representation A_i, which will lead us to the following result.

A = [ A₁  0  ⋯  0
      0  A₂  ⋯  0
      ⋮
      0   0  ⋯  A_p ].

Moreover, A_i is a matrix representation of A_i, the restriction of A to X_i.
From the preceding it is clear that, in order to carry out the block diagonalization of a matrix A, we need to find an appropriate set of invariant subspaces of X and, furthermore, to find a simple matrix representation on each of these subspaces.
In addition to the diagonal form and the block diagonal form, there are many other useful forms for matrices to represent linear transformations on finite-dimensional vector spaces. One of these canonical forms involves triangular matrices, which we consider in the last result of the present section. We say that an (n × n) matrix is a triangular matrix if it either has the form

[ a₁₁  a₁₂  a₁₃  ⋯  a₁ₙ
  0   a₂₂  a₂₃  ⋯  a₂ₙ
  ⋮
  0    0    0   ⋯  a_{n−1,n}
  0    0    0   ⋯  a_nn ] (4.6.21)

or the form

[ a₁₁   0    0   ⋯  0
  a₂₁  a₂₂   0   ⋯  0
  ⋮
  a_n1  a_n2  a_n3 ⋯  a_nn ]. (4.6.22)
B = [ λ₁   b₁₂       ⋯  b₁,k+1
      0    b₂₂       ⋯  b₂,k+1
      ⋮
      0    b_{k+1,2} ⋯  b_{k+1,k+1} ].
Now let C be the k x k matrix
P = [ 1  0
      0  Q ]  and  P⁻¹ = [ 1  0
                           0  Q⁻¹ ],

partitioned conformably, and

P⁻¹BP = [ λ₁  ∗  ⋯  ∗
          0   ∗  ⋯  ∗
          ⋮        ⋱  ⋮
          0   0  ⋯  ∗ ],

where the ∗'s denote elements which may be non-zero. Letting A = P⁻¹BP, it follows that A is upper triangular and is similar to B. Hence, any (k + 1) × (k + 1) matrix which represents A ∈ L(X, X) is similar to the upper triangular matrix A, by Theorem 4.3.19. This completes the proof of the theorem. ∎
4.7. MINIMAL POLYNOMIALS, NILPOTENT OPERATORS, AND THE JORDAN CANONICAL FORM
A. Minimal Polynomials
A = [ 1  3  −2
      0  4  −2
      0  3  −1 ].

The characteristic polynomial of A is

p(λ) = (1 − λ)²(2 − λ),
and we know from the Cayley–Hamilton theorem that

p(A) = 0. (4.7.1)
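Although p(A) = 0, a polynomial of lower degree already annihilates this A: since A is diagonalizable, m(λ) = (λ − 1)(λ − 2) suffices. A numerical sketch (the entries follow the diagonalizable example of Section 4.6, an assumption since this copy's display is garbled):

```python
import numpy as np

A = np.array([[1.0, 3.0, -2.0],
              [0.0, 4.0, -2.0],
              [0.0, 3.0, -1.0]])
I = np.eye(3)

print(np.allclose((A - I) @ (A - I) @ (A - 2 * I), 0))  # p(A) = 0
print(np.allclose((A - I) @ (A - 2 * I), 0))            # m(A) = 0 already
```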
Proof. Let ν denote the degree of m(λ). Then there exist polynomials q(λ) and r(λ) such that (see Theorem 2.3.9)

f(λ) = q(λ)m(λ) + r(λ),

where deg [r(λ)] < ν or r(λ) = 0. Since f(A) = 0, we have

0 = q(A)m(A) + r(A),

and hence r(A) = 0. This means r(λ) = 0, for otherwise we would have a contradiction to the fact that m(λ) is the minimal polynomial of A. Hence, f(λ) = q(λ)m(λ) and m(λ) divides f(λ). ∎
We now prove:
Then

(A − λI)B(λ) = λ^νB₀ + λ^{ν−1}B₁ + ⋯ + λB_{ν−1} − [λ^{ν−1}AB₀ + λ^{ν−2}AB₁ + ⋯ + AB_{ν−1}]
= λ^νB₀ + λ^{ν−1}[B₁ − AB₀] + λ^{ν−2}[B₂ − AB₁] + ⋯ + λ[B_{ν−1} − AB_{ν−2}] − AB_{ν−1}
= λ^νI + β₁λ^{ν−1}I + ⋯ + β_{ν−1}λI + β_νI = m(λ)I.
4.7.11. Exercise. Prove Theorem 4.7.9. (Hint: assume that m(λ) = (λ − λ₁)^{ν₁} ⋯ (λ − λ_p)^{ν_p}, and use Corollary 4.7.6 and Theorem 4.7.8.)

4.7.12. Theorem. Let A′ be similar to A, and let m′(λ) be the minimal polynomial of A′. Then m′(λ) = m(λ).

This result justifies the following definition.
Next we prove:
4.7.18. Theorem. Let X be a vector space over C, and let A ∈ L(X, X). Let m(λ) be the minimal polynomial of A as given in Eq. (4.7.10). Let g(λ) = (λ − λ₁)^{ν₁}, let h(λ) = (λ − λ₂)^{ν₂} ⋯ (λ − λ_p)^{ν_p} if p ≥ 2, and let h(λ) = 1 if p = 1. Let 𝔑₁ = {x ∈ X : g(A)x = 0}, and let A₁ be the restriction of A to 𝔑₁; i.e., A₁x = Ax for all x ∈ 𝔑₁. Let 𝔐 = {x ∈ X : h(A)x = 0}. Then
(i) X = 𝔑₁ ⊕ 𝔐; and
(ii) (λ − λ₁)^{ν₁} is the minimal polynomial for A₁.
Proof. By Theorem 4.7.14, 𝔐 and 𝔑₁ are invariant linear subspaces under A. Since g(λ) and h(λ) are relatively prime, there exist polynomials q(λ) and r(λ) such that (see Exercise 2.3.15)

q(λ)g(λ) + r(λ)h(λ) = 1.
polynomial of A be

p(λ) = (λ₁ − λ)^{m₁} ⋯ (λ_p − λ)^{m_p}, (4.7.21)

and let the minimal polynomial of A be

m(λ) = (λ − λ₁)^{ν₁} ⋯ (λ − λ_p)^{ν_p}. (4.7.22)

Let

X_i = {x ∈ X : (A − λ_iI)^{ν_i}x = 0}, i = 1, ..., p.

Then
(i) X_i, i = 1, ..., p, are invariant linear subspaces of X under A;
(ii) X = X₁ ⊕ ⋯ ⊕ X_p;
(iii) (λ − λ_i)^{ν_i} is the minimal polynomial of A_i, where A_i is the restriction of A to X_i; and
(iv) dim X_i = m_i, i = 1, ..., p.

Proof. The proofs of parts (i), (ii), and (iii) follow from the preceding theorem by a simple induction argument and are left as an exercise.
To prove the last part of the theorem, we first show that the only eigenvalue of A_i ∈ L(X_i, X_i) is λ_i, i = 1, ..., p. Let v ∈ X_i, v ≠ 0, and consider (A_i − λI)v = 0. From part (iii) it follows that

0 = (A_i − λ_iI)^{ν_i}v = (A_i − λ_iI)^{ν_i−1}(A_i − λ_iI)v
= (A_i − λ_iI)^{ν_i−1}(λ − λ_i)v = (λ − λ_i)(A_i − λ_iI)^{ν_i−2}(A_i − λ_iI)v
= (λ − λ_i)²(A_i − λ_iI)^{ν_i−2}v = ⋯ = (λ − λ_i)^{ν_i}v.

From this we conclude that λ = λ_i.
We can now find a matrix representation of A in the form given in Theorem 4.6.18.
B. Nilpotent Operators
Let V be a vector space. If N is a nilpotent linear transformation of index q and if x ∈ V is such that N^{q−1}x ≠ 0, then the vectors x, Nx, ..., N^{q−1}x in V are linearly independent.
Ne_i = 0·e₁ + 0·e₂ + ⋯ + 1·e_{i−1} + 0·e_i.

Hence,
Thus,

N^ν(β₁e_{r+1} + ⋯ + β_te_{r+t}) = 0,

and (β₁e_{r+1} + ⋯ + β_te_{r+t}) ∈ W_ν. If β₁e_{r+1} + ⋯ + β_te_{r+t} ≠ 0, it can be written as a linear combination of e₁, ..., e_r, which contradicts the fact that {e₁, ..., e_{r+t}} is a linearly independent set. If β₁e_{r+1} + ⋯ + β_te_{r+t} = 0, we again contradict the fact that {e₁, ..., e_{r+t}} is a linearly independent set. Hence, we conclude that α_i = 0 for i = 1, ..., r and β_i = 0 for i = 1, ..., t. This completes the proof of the theorem. ∎
N′ = [ N₁  0  ⋯  0
       0  N₂  ⋯  0
       ⋮
       0   0  ⋯  N_r ],

where

N_i = [ 0 1 0 ⋯ 0 0
        0 0 1 ⋯ 0 0
        ⋮
        0 0 0 ⋯ 0 1
        0 0 0 ⋯ 0 0 ], (4.7.35)

i = 1, ..., r, where r = l₁, N_i is a (k_i × k_i) matrix, 1 ≤ k_i ≤ ν, and k_i is determined in the following way: there are

l_ν − l_{ν−1} (ν × ν) matrices,
2l_i − l_{i+1} − l_{i−1} (i × i) matrices, i = 2, ..., ν − 1, and
2l₁ − l₂ (1 × 1) matrices.
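These block counts can be computed from the nullities l_i = dim 𝔑(Nⁱ) exactly as stated. The sketch below (with an illustrative nilpotent N assembled from blocks of sizes 3, 2, 1, 1) handles the size-ν case by the convention l_{ν+1} = n, which makes the middle formula cover all block sizes:

```python
import numpy as np

def jordan_block_counts(N):
    n = N.shape[0]
    l = [0]                                  # l[i] = dim of null space of N^i
    Pk = np.eye(n)
    while l[-1] < n:
        Pk = Pk @ N
        l.append(n - np.linalg.matrix_rank(Pk))
    nu = len(l) - 1                          # index of nilpotency
    l.append(n)                              # convention l_{nu+1} = n
    return {i: 2 * l[i] - l[i + 1] - l[i - 1] for i in range(1, nu + 1)}

N = np.zeros((7, 7))
N[0, 1] = N[1, 2] = 1.0                      # one 3x3 block
N[3, 4] = 1.0                                # one 2x2 block; rows 5, 6: 1x1 blocks
print(jordan_block_counts(N))                # {1: 2, 2: 1, 3: 1}
```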
The basis for V consists of strings of vectors of the form f_{i,ν}, f_{i,ν−1} = Nf_{i,ν}, f_{i,ν−2} = Nf_{i,ν−1}, .... Specifically, choose vectors f_{1,ν}, ..., f_{(l_ν−l_{ν−1}),ν} which extend a basis of W_{ν−1} to a basis of W_ν = V, and let f_{i,ν−1} = Nf_{i,ν}. By Lemma 4.7.32, the resulting vectors form a linearly independent subset of W_{ν−1}, which may or may not be a basis for W_{ν−1}. If it is not, we adjoin additional elements from W_{ν−1}, denoted by f_{(l_ν−l_{ν−1})+1,ν−1}, ..., f_{(l_{ν−1}−l_{ν−2}),ν−1}, so as to form a basis for W_{ν−1}. Now let f_{1,ν−2} = Nf_{1,ν−1}, f_{2,ν−2} = Nf_{2,ν−1}, ..., f_{(l_{ν−1}−l_{ν−2}),ν−2} = Nf_{(l_{ν−1}−l_{ν−2}),ν−1}. By Lemma 4.7.32 it follows, as before, that these vectors form a linearly independent set in W_{ν−2}. If this set is not a basis, we adjoin vectors from W_{ν−2} so that we do have a basis, denoting the adjoined vectors by f_{(l_{ν−1}−l_{ν−2})+1,ν−2}, ..., f_{(l_{ν−2}−l_{ν−3}),ν−2}. We continue in this manner until we have formed a basis for V. We express this basis in the manner indicated in Figure C, each column of which is a string: the entry in each row is obtained by applying N to the entry above it, so that

Nf_{i,j} = f_{i,j−1} for j > 1, and Nf_{i,1} = 0.

Hence, if we let x₁ = f_{1,ν}, we see that the first column in Figure C, reading bottom to top, is

N^{ν−1}x₁, ..., Nx₁, x₁.
We are finally now in a position to state and prove the result which establishes the Jordan canonical form of matrices.
A′ = [ A₁  0  ⋯  0
       0  A₂  ⋯  0
       ⋮
       0   0  ⋯  A_p ],

where A_i is an (m_i × m_i) matrix of the form

A_i = λ_iI + N_i (4.7.39)

and where N_i is the matrix of the nilpotent operator (A_i − λ_iI) of index ν_i on X_i given by Eq. (4.7.34) and Eq. (4.7.35).
Proof. Parts (i)–(iii) are restatements of the primary decomposition theorem (Theorem 4.7.20). From this theorem we also know that (λ − λ_i)^{ν_i} is the minimal polynomial of A_i, the restriction of A to X_i. Hence, if we let N_i = A_i − λ_iI, then N_i is a nilpotent operator of index ν_i on X_i. We are thus able to represent N_i as shown in Eq. (4.7.35). This completes the proof of the theorem. ∎
A = [ −1   0  −1   1   1   3   0
       0   1   0   0   0   0   0
       2   1   2  −1  −1  −6   0
      −2   0  −1   2   1   3   0
       0   0   0   0   1   0   0
       0   0   0   0   0   1   0
      −1  −1   0   1   2   4   1 ]

with respect to {u₁, ..., u₇}. Let us find the matrix A′ which represents A in the Jordan canonical form.
We first find that the characteristic polynomial of A is

p(λ) = (1 − λ)⁷.

Next, we form

N = A − I = [ −2   0  −1   1   1   3   0
               0   0   0   0   0   0   0
               2   1   1  −1  −1  −6   0
              −2   0  −1   1   1   3   0
               0   0   0   0   0   0   0
               0   0   0   0   0   0   0
              −1  −1   0   1   2   4   0 ].
The reader can verify that N is nilpotent of index 3 (N³ = 0 while N² ≠ 0) and that N′ in Eq. (4.7.34) consists, in this case, of the four blocks

N₁ = [ 0 1 0
       0 0 1
       0 0 0 ],  N₂ = [ 0 1
                        0 0 ],  N₃ = N₄ = [0];

i.e., N′ = diag (N₁, N₂, N₃, N₄).
The corresponding basis will consist of strings of vectors of the form

N²x₁, Nx₁, x₁;  Nx₂, x₂;  x₃;  x₄.

We will represent the vectors x₁, x₂, x₃, and x₄ by x₁, x₂, x₃, and x₄, their coordinate representations, respectively, with respect to the natural basis {u₁, ..., u₇} in X. We begin by choosing x₁ ∈ W₃ such that x₁ ∉ W₂; i.e., we find an x₁ such that N³x₁ = 0 but N²x₁ ≠ 0. The vector x₁ᵀ = (0, 1, 0, 0, 0, 0, 0) will do. Proceeding in the manner described above, we next obtain x₂ᵀ = (1, 0, 0, 0, 0, 0, 0), x₃ᵀ = (0, 0, −1, −2, 1, 0, 0), and x₄ᵀ = (1, 3, 1, 0, 0, 1, 0). Then

P = [N²x₁, Nx₁, x₁, Nx₂, x₂, x₃, x₄]

is the matrix of the new basis with respect to the natural basis (see Exercise 4.3.9).
The reader can readily show that

N′ = P⁻¹NP,

where

P = [ −1   0   0  −2   1   0   1
       0   0   1   0   0   0   3
       1   1   0   2   0  −1   1
      −1   0   0  −2   0  −2   0
       0   0   0   0   0   1   0
       0   0   0   0   0   0   1
       0  −1   0  −1   0   0   0 ]

and

P⁻¹ = [ 0   0   2   1   4  −2   2
        0   0   1   1   3  −1   0
        0   1   0   0   0  −3   0
        0   0  −1  −1  −3   1  −1
        1   0   0  −1  −2  −1   0
        0   0   0   0   1   0   0
        0   0   0   0   0   1   0 ].

Finally, the Jordan canonical form for A is given by

A′ = N′ + I.

(Recall that the matrix representation for I is the same for any basis in X.) Thus,
A′ = [ 1 1 0 0 0 0 0
       0 1 1 0 0 0 0
       0 0 1 0 0 0 0
       0 0 0 1 1 0 0
       0 0 0 0 1 0 0
       0 0 0 0 0 1 0
       0 0 0 0 0 0 1 ].
Again, the reader can show that A′ = P⁻¹AP. In general, it is more convenient as a check to show that PA′ = AP. ∎
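This check is easily mechanized. The sketch below uses the matrices as reconstructed above (some entries were illegible in this copy) and confirms that N is nilpotent of index 3 and that PA′ = AP:

```python
import numpy as np

A = np.array([[-1, 0,-1, 1, 1, 3, 0],
              [ 0, 1, 0, 0, 0, 0, 0],
              [ 2, 1, 2,-1,-1,-6, 0],
              [-2, 0,-1, 2, 1, 3, 0],
              [ 0, 0, 0, 0, 1, 0, 0],
              [ 0, 0, 0, 0, 0, 1, 0],
              [-1,-1, 0, 1, 2, 4, 1]], dtype=float)
N = A - np.eye(7)

P = np.array([[-1, 0, 0,-2, 1, 0, 1],
              [ 0, 0, 1, 0, 0, 0, 3],
              [ 1, 1, 0, 2, 0,-1, 1],
              [-1, 0, 0,-2, 0,-2, 0],
              [ 0, 0, 0, 0, 0, 1, 0],
              [ 0, 0, 0, 0, 0, 0, 1],
              [ 0,-1, 0,-1, 0, 0, 0]], dtype=float)

Aprime = np.eye(7)
Aprime[0, 1] = Aprime[1, 2] = Aprime[3, 4] = 1.0  # Jordan blocks of sizes 3, 2, 1, 1

print(np.allclose(np.linalg.matrix_power(N, 3), 0))      # N^3 = 0
print(not np.allclose(np.linalg.matrix_power(N, 2), 0))  # but N^2 != 0
print(np.allclose(P @ Aprime, A @ P))                    # PA' = AP
```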
4.7.42. Exercise. Let X = R⁶, and let {u₁, ..., u₆} denote the natural basis for X. Let A ∈ L(X, X) be represented by the matrix

A = [ 5  −1   1   1   0   0
      1   3  −1  −1   0   0
      0   0   4   0   1   1
      0   0   0   4  −1  −1
      0   0   0   0   3   1
      0   0   0   0   1   3 ].

Show that the Jordan canonical form of A is given by

A′ = [ 4 1 0 0 0 0
       0 4 1 0 0 0
       0 0 4 0 0 0
       0 0 0 4 1 0
       0 0 0 0 4 0
       0 0 0 0 0 2 ],

and find a basis for X for which A′ represents A.
4.8. BILINEAR FUNCTIONALS AND CONGRUENCE
4.8.1. Definition. Let {e₁, ..., e_n} be a basis for the vector space X, and let

f_{ij} = f(e_i, e_j), i, j = 1, ..., n.

The matrix F = [f_{ij}] is called the matrix of the bilinear functional f with respect to {e₁, ..., e_n}.

Now let us recall that the quadratic form induced by f was defined as f̂(x) = f(x, x). On a real finite-dimensional vector space X we now have
is given by

F′ = PᵀFP. (4.8.16)

Proof. Let e_i′ = Σ_{k=1}^n p_{ki}e_k. Then

f(e_i′, e_j′) = f( Σ_{k=1}^n p_{ki}e_k, Σ_{l=1}^n p_{lj}e_l ) = Σ_{k=1}^n Σ_{l=1}^n p_{ki}f(e_k, e_l)p_{lj} = Σ_{k=1}^n Σ_{l=1}^n p_{ki}f_{kl}p_{lj} ≜ f_{ij}′,

where, by definition, f_{ij}′ = f(e_i′, e_j′). Hence, F′ = PᵀFP. ∎
We now have:
Note that congruent matrices are also equivalent matrices. The next theorem shows that the relation ≈ given in Definition 4.8.17 is reflexive, symmetric, and transitive, and as such it is an equivalence relation.
F′ = diag (1, ..., 1, −1, ..., −1, 0, ..., 0), (4.8.21)

with p entries equal to +1, r − p entries equal to −1, and n − r entries equal to 0. The integers r and p in the above matrix are uniquely determined by the bilinear form.
Proof. Since the proof of this theorem is somewhat long, it will be carried
out in several steps.
Step 1. We first show that there exists a basis {v₁, ..., v_n} of X such that f(v_i, v_j) = 0 for i ≠ j. The proof of this step is by induction on the dimension of X. The statement is trivial if dim X = 1. Suppose that the assertion is true for dim X = n − 1. Let f be a bilinear functional on X, where dim X = n. Let v₁ ∈ X be such that f(v₁, v₁) ≠ 0. There must be such a v₁; otherwise, by Theorem 4.8.13, f would be skew symmetric, and we would conclude that f(x, y) = 0 for all x, y. Now let 𝔐 = {x ∈ X : f(v₁, x) = 0}. We now show that 𝔐 is a linear subspace of X. Let x₁, x₂ ∈ 𝔐, so that f(v₁, x₁) = f(v₁, x₂) = 0. Then f(v₁, x₁ + x₂) = f(v₁, x₁) + f(v₁, x₂) = 0 + 0 = 0. Similarly, f(v₁, αx₁) = 0 for all α ∈ R. Therefore, 𝔐 is a linear subspace of X. Furthermore, 𝔐 ≠ X because v₁ ∉ 𝔐. Hence, dim 𝔐 ≤ n − 1. Now let dim 𝔐 = q ≤ n − 1. Since f is a bilinear functional on 𝔐, it follows by the induction hypothesis that there is a basis for 𝔐 consisting of a set of q vectors {v₂, ..., v_{q+1}} such that f(v_i, v_j) = 0 for i ≠ j, 2 ≤ i, j ≤ q + 1. Also, f(v₁, v_j) = 0 for j = 2, ..., q + 1, by definition of 𝔐. Furthermore, f(v_j, v₁) = f(v₁, v_j); hence, f(v_j, v₁) = f(v₁, v_j) = 0 for j = 2, ..., q + 1. It follows that f(v_i, v_j) = 0 for i ≠ j and 1 ≤ i, j ≤ q + 1.
We now show that {v₁, ..., v_{q+1}} is a basis for X. Let x ∈ X and let x′ = x − α₁v₁, where α₁ = f(v₁, x)/f(v₁, v₁). Then f(v₁, x′) = f(v₁, x) − α₁f(v₁, v₁) = f(v₁, x) − f(v₁, x) = 0. Thus, x′ ∈ 𝔐. Since {v₂, ..., v_{q+1}} is a basis for 𝔐, there exist α₂, ..., α_{q+1} such that x′ = α₂v₂ + ⋯ + α_{q+1}v_{q+1}; i.e., x = α₁v₁ + ⋯ + α_{q+1}v_{q+1}. Thus, {v₁, ..., v_{q+1}} spans X.
To show that the set {v₁, ..., v_{q+1}} is linearly independent, assume that α₁v₁ + ⋯ + α_{q+1}v_{q+1} = 0. Then 0 = f(v₁, 0) = f(v₁, α₁v₁ + ⋯ + α_{q+1}v_{q+1}) = α₁f(v₁, v₁), which implies that α₁ = 0. Hence, α₂v₂ + ⋯ + α_{q+1}v_{q+1} = 0. Since the set {v₂, ..., v_{q+1}} forms a basis for 𝔐, we must have α₂ = ⋯ = α_{q+1} = 0. Thus, {v₁, ..., v_{q+1}} forms a basis for X, and we conclude that q + 1 = n. This completes the proof of step 1.
Step 2. Let {v₁, ..., v_n} be a basis for X such that f(v_i, v_j) = 0 for i ≠ j, and let β_i = f(v_i, v_i) for i = 1, ..., n. Let e_i = γ_iv_i for i = 1, ..., n, where γ_i = 1/√|β_i| if β_i ≠ 0 and γ_i = 1 if β_i = 0. Now suppose that β_i = f(v_i, v_i) ≠ 0. Then we have f(e_i, e_i) = f(γ_iv_i, γ_iv_i) = γ_i²f(v_i, v_i) = β_i/|β_i| = ±1. Also, if β_i = f(v_i, v_i) = 0, then f(e_i, e_i) = γ_i²f(v_i, v_i) = 0. Finally, we see that f(e_i, e_j) = f(γ_iv_i, γ_jv_j) = γ_iγ_jf(v_i, v_j) = 0 if i ≠ j. Thus, we have established a basis for X such that f_{ij} = f(e_i, e_j) = 0 if i ≠ j and f_{ii} = f(e_i, e_i) = +1, −1, or 0.
Step 3. We now show that the integers p and r in matrix (4.8.21) are uniquely determined by f. Let {e₁, ..., e_n} and {e₁′, ..., e_n′} be bases for X, and let F and F′ be matrices of f with respect to {e₁, ..., e_n} and {e₁′, ..., e_n′}, respectively, where

F = diag (1, ..., 1, −1, ..., −1, 0, ..., 0),

with p entries equal to +1 and r − p entries equal to −1, and

F′ = diag (1, ..., 1, −1, ..., −1, 0, ..., 0),

with q entries equal to +1 and r′ − q entries equal to −1.
To prove that p = q we show that e₁, ..., e_p, e′_{q+1}, ..., e′_n are linearly independent. From this it must follow that p + (n − q) ≤ n, or p ≤ q. By the same argument, q ≤ p, and so p = q. Let

γ₁e₁ + ⋯ + γ_pe_p + γ′_{q+1}e′_{q+1} + ⋯ + γ′_ne′_n = 0,

where γ_i ∈ R, i = 1, ..., p and γ′_i ∈ R, i = q + 1, ..., n. Rewriting the above equation we have

γ₁e₁ + ⋯ + γ_pe_p = −(γ′_{q+1}e′_{q+1} + ⋯ + γ′_ne′_n) ≜ x₀.

Then

f(x₀, x₀) = f(γ₁e₁ + ⋯ + γ_pe_p, γ₁e₁ + ⋯ + γ_pe_p) = γ₁² + ⋯ + γ_p² ≥ 0,

by choice of {e₁, ..., e_p}. On the other hand,

f(x₀, x₀) = f[−(γ′_{q+1}e′_{q+1} + ⋯ + γ′_ne′_n), −(γ′_{q+1}e′_{q+1} + ⋯ + γ′_ne′_n)]
= (−1)²[−(γ′_{q+1})² − (γ′_{q+2})² − ⋯ − (γ′_{r′})²] ≤ 0,

by choice of {e′_{q+1}, ..., e′_n}. From this we conclude that γ₁² + ⋯ + γ_p² = 0; i.e., γ₁ = ⋯ = γ_p = 0. Hence, γ′_{q+1}e′_{q+1} + ⋯ + γ′_ne′_n = 0. But the set {e′_{q+1}, ..., e′_n} is linearly independent, and thus γ′_{q+1} = ⋯ = γ′_n = 0. Hence, the vectors e₁, ..., e_p, e′_{q+1}, ..., e′_n are linearly independent, and it follows that p = q.
To prove that r is unique, let r be the number of non-zero elements of F and let r′ be the number of non-zero elements of F′. By Theorem 4.8.15, F and F′ are congruent and hence equivalent. Thus, it follows from Theorem 4.3.16 that F and F′ must have the same rank, and therefore r = r′.
This concludes the proof of the theorem. ∎
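For a real symmetric F this invariance is Sylvester's law of inertia, and the signature (p, r − p, n − r) can be read off from the signs of the eigenvalues. A sketch with an illustrative F and non-singular P:

```python
import numpy as np

def inertia(F, tol=1e-9):
    # counts of positive, negative, and zero eigenvalues of a symmetric F
    w = np.linalg.eigvalsh(F)
    return (int((w > tol).sum()), int((w < -tol).sum()),
            int((np.abs(w) <= tol).sum()))

F = np.diag([1.0, 1.0, -1.0, 0.0])     # p = 2, r = 3, n = 4
P = np.array([[2.0, 1, 0, 0],
              [0,   1, 0, 1],
              [1,   0, 3, 0],
              [0,   0, 1, 1]])          # non-singular (det = 5)

print(inertia(F))                       # (2, 1, 1)
print(inertia(P.T @ F @ P))             # (2, 1, 1): congruence preserves it
```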
Among the various linear spaces which we will encounter, the so-called
Euclidean spaces are so important that we devote the next two sections to
them. These spaces will allow us to make many generalizations to facts
established in plane geometry, and they will enable us to consider several
important special types of linear transformations. In order to characterize
these spaces properly, we must make use of two important notions, that of
the norm of a vector and that of the inner product of two vectors (refer to
Section 3.6). In the real plane, these concepts are related to the length of a
vector and to the angle between two vectors, respectively. Before considering
the matter on hand, some preliminary remarks are in order.
To begin with, we would like to point out that from a strictly logical
point of view Euclidean spaces should actually be treated at a later point of
of vector (x − y) is equal to {(ξ₁ − η₁)² + (ξ₂ − η₂)²}^{1/2}. By convention, we say in this case that "the distance from x to y" is equal to {(ξ₁ − η₁)² + (ξ₂ − η₂)²}^{1/2}, that "the distance from the origin 0 (the null vector) to x" is equal to (ξ₁² + ξ₂²)^{1/2}, and the like. Using the notation of the present chapter, we have

|x| = √(xᵀx) (4.9.3)

and

|x − y| = √((x − y)ᵀ(x − y)) = √((y − x)ᵀ(y − x)) = |y − x|. (4.9.4)

The angle θ between vectors x and y can easily be characterized by its cosine, namely,

cos θ = (ξ₁η₁ + ξ₂η₂) / (√(ξ₁² + ξ₂²) · √(η₁² + η₂²)). (4.9.5)

Utilizing the notation of the present chapter, we have

cos θ = xᵀy / (√(xᵀx) · √(yᵀy)). (4.9.6)
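Eq. (4.9.6) in code form, with vectors chosen for illustration:

```python
import numpy as np

def cos_angle(x, y):
    # Eq. (4.9.6): cos(theta) = x'y / (sqrt(x'x) * sqrt(y'y))
    return (x @ y) / (np.sqrt(x @ x) * np.sqrt(y @ y))

x = np.array([1.0, 0.0])
y = np.array([0.0, 2.0])
print(cos_angle(x, y))    # 0.0: perpendicular vectors
print(cos_angle(x, x))    # 1.0
print(cos_angle(x, -x))   # -1.0
```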
It turns out that the real-valued function xᵀy, which we used in both Eqs. (4.9.3) and (4.9.6) to characterize the length of any vector x and the angle between any vectors x and y, is of fundamental importance. For this reason we denote it by a special symbol; i.e., we write

(x, y) ≜ xᵀy. (4.9.7)
Now if we let x = y in Eq. (4.9.7), then in view of Eq. (4.9.3) we have

|x| = √(x, x). (4.9.8)

By inspection of Eq. (4.9.3) we note that

(x, x) > 0 for all x ≠ 0 (4.9.9)

and

(x, x) = 0 for x = 0. (4.9.10)

Also, from Eq. (4.9.7) we have

(x, y) = (y, x) (4.9.11)

for all x and y. Moreover, for any vectors x, y, and z and for any real scalars α and β we have, in view of Eq. (4.9.7), the relations

(x + y, z) = (x, z) + (y, z), (4.9.12)
(x, y + z) = (x, y) + (x, z), (4.9.13)
(αx, y) = α(x, y), (4.9.14)

and

(x, αy) = α(x, y). (4.9.15)
In connection with Eq. (4.9.6) we can make several additional observations. First, we note that if x = y, then cos θ = +1; if x = −y, then cos θ = −1; if xᵀ = (ξ₁, 0) and yᵀ = (0, η₂), then cos θ = 0; etc. It is easily verified, using Eq. (4.9.6), that cos θ assumes all values between +1 and −1; i.e., −1 ≤ cos θ ≤ +1.
The above formulation agrees, of course, with our notions of length of a vector, distance between two vectors, and angle between two vectors. From Eqs. (4.9.9)–(4.9.15) it is also apparent that relation (4.9.7) satisfies all the axioms of an inner product (see Section 3.6).
Using the above discussion as motivation, let us now begin our treatment of Euclidean vector spaces.
First, we recall the definition of a real inner product: a bilinear functional f on a real vector space X is said to be an inner product on X if (i) f is symmetric and (ii) f is strictly positive. We also recall that a real vector space X on which an inner product is defined is called a real inner product space. We now have the following important result.
4.9.17. Theorem. The inner product (x, y) = 0 for all x ∈ X if and only if y = 0.
4.9.19. Corollary. Let A, B ∈ L(X, X). If (x, Ay) = (x, By) for all x, y ∈ X, then A = B.
(x, y) = Σ_{i=1}^n ξ_iη_i. (4.9.24)

4.9.27. Definition. The vector space Rⁿ with the inner product defined in Eq. (4.9.24) is denoted by Eⁿ. The norm of x given by Eq. (4.9.26) is called the Euclidean norm on Rⁿ.
4.9.31. Theorem. For all x and y in X and for all real scalars α, the following hold:
(i) |x| > 0 unless x = 0, in which case |x| = 0;
(ii) |αx| = |α| · |x|, where |α| denotes the absolute value of the scalar α; and
(iii) |x + y| ≤ |x| + |y|.
Proof. The proof of part (i) follows from the definition of an inner product. To prove part (ii), we note that

|αx|² = (αx, αx) = α(x, αx) = α²(x, x) = α²|x|².

Taking the square root of both sides we have the desired relation

|αx| = |α| · |x|.

To verify the last part of the theorem we note that

|x + y|² = (x + y, x + y) = (x, x) + 2(x, y) + (y, y) = |x|² + 2(x, y) + |y|².

Using the Schwarz inequality we obtain

|x + y|² ≤ |x|² + 2|x| · |y| + |y|² = (|x| + |y|)².

Taking the square root of both sides we have

|x + y| ≤ |x| + |y|,

which is the desired result. ∎
Part (iii) of Theorem 4.9.31 is called the triangle inequality. Part (ii) is
called the homogeneous property of a norm. In Chapter 6 we will define
functions on general vector spaces satisfying axioms (i), (ii), and (iii) of
Theorem 4.9.31 without making use of inner products. In such cases we will
speak of normed linear spaces (Euclidean spaces are examples of normed
linear spaces).
Our next result is called the parallelogram law. Its meaning in the plane is
evident from Figure E.
[Figure E: the parallelogram in the plane spanned by the vectors x and y, with diagonal x + y.]
We now define the distance between two vectors x and y of X as
p(x, y) = |x − y|. (4.9.35)
It is not difficult for the reader to prove the next result.
A function p(x, y) having properties (i), (ii), and (iii) of Theorem 4.9.36
is called a metric. Without making use of inner products, we will in Chapter 5
define such functions on non-empty sets (not necessarily linear spaces), and
we will in such cases speak of metric spaces (Euclidean spaces are examples
of metric spaces).
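As a numerical illustration of the metric p(x, y) = |x − y| of Eq. (4.9.35), the following Python sketch (illustrative only, not part of the text) spot-checks the triangle inequality on random points of R³:

```python
import math, itertools, random

def norm(x):
    return math.sqrt(sum(a * a for a in x))

def rho(x, y):
    """The metric p(x, y) = |x - y| of Eq. (4.9.35)."""
    return norm(tuple(a - b for a, b in zip(x, y)))

random.seed(0)
pts = [tuple(random.uniform(-1, 1) for _ in range(3)) for _ in range(5)]
# Spot-check rho(x, z) <= rho(x, y) + rho(y, z) over all triples.
ok = all(rho(x, z) <= rho(x, y) + rho(y, z) + 1e-12
         for x, y, z in itertools.product(pts, repeat=3))
print(ok)
```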
B. Orthogonal Bases
Next, let {f₁, ..., fₙ} be an arbitrary basis for X and let F = [fᵢⱼ] denote
the matrix of the inner product with respect to this basis; i.e., fᵢⱼ = (fᵢ, fⱼ)
for all i and j. More specifically, F denotes the matrix of the bilinear functional
f that is used in determining the inner product on X with respect to the
indicated basis (see Definition 4.8.1). Let x and y denote the coordinate
representations of x and y, respectively, with respect to {f₁, ..., fₙ}. Then
we have, by Theorem 4.8.2,
(x, y) = xᵀFy = yᵀFx = Σ_{i=1}^n Σ_{j=1}^n fᵢⱼ ξᵢηⱼ.
Now by Theorems 4.8.20 and 4.8.23, since the inner product is symmetric
and strictly positive, there exists a basis {e₁, ..., eₙ} for X such that the
matrix of the inner product with respect to this basis is the (n × n) identity
matrix I; i.e.,
(eᵢ, eⱼ) = δᵢⱼ = { 1 if i = j, 0 if i ≠ j }.
This motivates the following:
The reader should note that Eqs. (4.9.7) and (4.9.8) introduced at the
beginning of this section are, of course, in agreement with Eqs. (4.9.42)
and (4.9.43). (See also Example 4.9.23.)
Our next result enables us to determine the coordinates of a vector with
respect to a given orthonormal basis.
4.9.46. Example. Let X = R², and let the inner product on R² be defined
by
(x, y) = ξ₁η₁ + 4ξ₂η₂. (4.9.47)
(The reader may verify that this is indeed an inner product.) Let {u₁, u₂}
denote the natural basis for R²; i.e., u₁ = (1, 0) and u₂ = (0, 1). The matrix
representation of the bilinear functional which determines the above inner
product with respect to the basis {u₁, u₂} is
x = (x, e₁')e₁' + ⋯ + (x, eₙ')eₙ'.
4.9.49. Corollary. Let {e₁, ..., eₙ} be an orthogonal basis for X. Then
for any x, y ∈ X we have
(x, y) = Σ_{i=1}^n (x, eᵢ)(y, eᵢ) / (eᵢ, eᵢ).
4.9.51. Theorem. Suppose that x₁, ..., xₖ are mutually orthogonal non-
zero vectors in X; i.e., xᵢ ⊥ xⱼ, i ≠ j. Then x₁, ..., xₖ are linearly inde-
pendent.
4.9. Euclidean Vector Spaces
= αᵢ(xᵢ, xᵢ);
i.e., αᵢ(xᵢ, xᵢ) = 0. This implies that αᵢ = 0 for arbitrary i, which proves the
linear independence of x₁, ..., xₖ. ∎
Note that the converse to the above theorem is not true. We leave the
proofs of the next two results as an exercise.
0 ≤ |x − Σ_{j=1}^k ηⱼxⱼ|² = (x − Σ_{j=1}^k ηⱼxⱼ, x − Σ_{j=1}^k ηⱼxⱼ)
= (x, x) − 2 Σ_{j=1}^k ηⱼ(x, xⱼ) + Σ_{i=1}^k Σ_{j=1}^k ηᵢηⱼ(xᵢ, xⱼ),
and
|x| = √(|x₁|² + |x₂|²).
Proof. To prove the first part, note that if x ∈ Y⊥, then x ⊥ f₁, ..., x ⊥ fₖ,
since fᵢ ∈ Y for i = 1, ..., k. On the other hand, let x ⊥ fᵢ, i = 1, ..., k.
Then for any y ∈ Y there exist scalars ηᵢ, i = 1, ..., k, such that y = η₁f₁
+ ⋯ + ηₖfₖ. Hence,
(x, y) = (x, Σ_{i=1}^k ηᵢfᵢ) = Σ_{i=1}^k ηᵢ(x, fᵢ) = 0.
Thus, x ∈ Y⊥.
The remaining parts of the theorem are left as an exercise. ∎
Before closing the present section we state and prove the following important result.
A. Orthogonal Transformations
In order that (eᵢ', eⱼ') = 0 for i ≠ j and (eᵢ', eⱼ') = 1 for i = j, we require that
(eᵢ', eⱼ') = Σ_{k=1}^n pₖᵢpₖⱼ = δᵢⱼ,
i.e., we require that
PᵀP = I,
where, as usual, I denotes the n × n identity matrix. We summarize.
4.10.1. Theorem. Let {e₁, ..., eₙ} be an orthonormal basis for X. Let
eᵢ' = Σ_{j=1}^n pⱼᵢeⱼ, i = 1, ..., n. Then {e₁', ..., eₙ'} is an orthonormal basis for
X if and only if Pᵀ = P⁻¹.
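Theorem 4.10.1 can be illustrated numerically: a rotation matrix P satisfies PᵀP = I, so it carries one orthonormal basis into another. The following Python sketch is illustrative only (the helper names are our own, not from the text):

```python
import math

theta = 0.3
# P rotates the natural orthonormal basis of R^2; a rotation satisfies P^T P = I.
P = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(r) for r in zip(*A)]

PtP = matmul(transpose(P), P)
print(PtP)  # numerically the 2 x 2 identity matrix
```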
The nomenclature used in our next definition will become clear shortly.
4.10. Linear Transformations on Euclidean Vector Spaces
Our next result establishes the link between Definitions 4.10.2 and 4.10.4.
B. Adjoint Transformations
Furthermore,
g(αx, y) = (αx, Gy) = α(x, Gy) = αg(x, y),
and
g(x, αy) = (x, G(αy)) = (x, αGy) = α(x, Gy) = αg(x, y),
where α is a real scalar. Therefore, g is a bilinear functional.
Next, let {e₁, ..., eₙ} be an orthonormal basis for X. Then the matrix
G of g with respect to this basis is determined by the elements gᵢⱼ = g(eᵢ, eⱼ).
Now let G' = [gᵢⱼ'] be the matrix of G with respect to {e₁, ..., eₙ}. Then
Geⱼ = Σ_{k=1}^n gₖⱼ'eₖ for j = 1, ..., n. Hence, (eᵢ, Geⱼ) = (eᵢ, Σ_{k=1}^n gₖⱼ'eₖ) = gᵢⱼ'.
Since gᵢⱼ = g(eᵢ, eⱼ) = (eᵢ, Geⱼ) = gᵢⱼ', it follows that G' = G; i.e., G is the
matrix of G.
To prove the last part of the theorem, choose any orthonormal basis
{e₁, ..., eₙ} for X. Given a bilinear functional g defined on X, let G = [gᵢⱼ]
denote its matrix with respect to this basis, and let G be the linear transfor-
mation corresponding to G. Then (x, Gy) = g(x, y) by the identical argument
given above. Finally, since the matrix of the bilinear functional and the matrix
of the linear transformation were determined independently, this correspon-
dence is unique. ∎
4.10.13. Theorem
(i) For each G ∈ L(X, X), there is a unique G* ∈ L(X, X) such that
(x, G*y) = (Gx, y) for all x, y ∈ X.
(ii) Let {e₁, ..., eₙ} be an orthonormal basis for X, and let G be the
matrix of the linear transformation G ∈ L(X, X) with respect to this
basis. Let G* be the matrix of G* with respect to {e₁, ..., eₙ}. Then
G* = Gᵀ.
Proof The proof of the first part follows from the discussion preceding
the present theorem.
To prove the second part, let {e₁, ..., eₙ} be an orthonormal basis for
X, and let G* denote the matrix of G* with respect to this basis. Let x and y
be the coordinate representations of x and y, respectively, with respect to this
basis. Then
(x, G*y) = xᵀG*y = (Gx, y) = (Gx)ᵀy = xᵀGᵀy.
Thus, for all x and y we have xᵀ(G* − Gᵀ)y = 0. Hence, G* = Gᵀ. ∎
The above result allows the following equivalent definition of the adjoint
linear transformation.
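The identity (x, G*y) = (Gx, y) with G* = Gᵀ is easy to spot-check numerically. A Python sketch (illustrative only, not from the text):

```python
import random

random.seed(1)
n = 3
G = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
Gt = [list(r) for r in zip(*G)]          # G* = G^T in an orthonormal basis

def mv(A, v):
    return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

x = [random.uniform(-1, 1) for _ in range(n)]
y = [random.uniform(-1, 1) for _ in range(n)]
lhs = inner(x, mv(Gt, y))   # (x, G*y)
rhs = inner(mv(G, x), y)    # (Gx, y)
print(abs(lhs - rhs) < 1e-12)
```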
4.10.25. Corollary. Let A ∈ L(X, X). Then there exist unique A₁, A₂
∈ L(X, X) such that A = A₁ + A₂, where A₁ is self-adjoint and A₂ is skew-
adjoint.
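In matrix terms, the decomposition of Corollary 4.10.25 splits a matrix into its symmetric and skew-symmetric parts, A₁ = ½(A + Aᵀ) and A₂ = ½(A − Aᵀ). A Python sketch (illustrative only, not from the text):

```python
n = 3
A = [[1.0, 2.0, 0.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
At = [list(r) for r in zip(*A)]
A1 = [[(A[i][j] + At[i][j]) / 2 for j in range(n)] for i in range(n)]  # self-adjoint part
A2 = [[(A[i][j] - At[i][j]) / 2 for j in range(n)] for i in range(n)]  # skew-adjoint part
assert all(A1[i][j] == A1[j][i] for i in range(n) for j in range(n))
assert all(A2[i][j] == -A2[j][i] for i in range(n) for j in range(n))
assert all(abs(A1[i][j] + A2[i][j] - A[i][j]) < 1e-12 for i in range(n) for j in range(n))
print("decomposition verified")
```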
i.e.,
A = [ λ₁I_{n₁}    0        ⋯   0
      0        λ₂I_{n₂}    ⋯   0
      ⋮                    ⋱   ⋮
      0           0        ⋯   λₚI_{nₚ} ],
where I_{nᵢ} denotes the nᵢ × nᵢ identity matrix.
To prove the second part, we note that the characteristic polynomial of
A is
det (A − λI) = (λ₁ − λ)^{n₁}(λ₂ − λ)^{n₂} ⋯ (λₚ − λ)^{nₚ},
and, hence, nᵢ = dim 𝔑ᵢ = multiplicity of λᵢ, i = 1, ..., p. ∎
Next, we state and prove the spectral theorem for self-adjoint linear
transformations. First, we recall that a transformation P ∈ L(X, X) is a
projection on a linear subspace of X if and only if P² = P (see Theorem
3.7.4). Also, for any projection P, X = ℜ(P) ⊕ 𝔑(P), where ℜ(P) is the range
of P and 𝔑(P) is the null space of P (see Eq. (3.7.8)). Furthermore, recall
that a projection P is called an orthogonal projection if ℜ(P) ⊥ 𝔑(P) (see
Definition 3.7.16).
(iii) Σ_{j=1}^p Pⱼ = I, where I ∈ L(X, X) denotes the identity transformation;
and
(iv) A = Σ_{j=1}^p λⱼPⱼ.
Proof. To prove the first part, note that X = 𝔑ᵢ ⊕ 𝔑ᵢ⊥, i = 1, ..., p,
by Theorem 4.9.59. Thus, by Theorem 3.7.3, ℜ(Pᵢ) = 𝔑ᵢ and 𝔑(Pᵢ) = 𝔑ᵢ⊥,
and hence, Pᵢ is an orthogonal projection.
To prove the second part, let i ≠ j and let x ∈ X. Then Pⱼx ≜ xⱼ ∈ 𝔑ⱼ.
Since ℜ(Pᵢ) = 𝔑ᵢ and since 𝔑ᵢ ⊥ 𝔑ⱼ, we must have xⱼ ∈ 𝔑(Pᵢ); i.e.,
PᵢPⱼx = 0 for all x ∈ X.
To prove the third part, let P = Σ_{i=1}^p Pᵢ. We must show that P = I. To
do so, we first show that P is a projection. This follows immediately from the
fact that for arbitrary x ∈ X, P²x = (P₁ + ⋯ + Pₚ)(P₁x + ⋯ + Pₚx) =
P₁x + ⋯ + Pₚx, because PᵢPⱼ = 0 for i ≠ j. Hence, P²x = (P₁ + ⋯
+ Pₚ)x = Px, and thus P is a projection. Next, we show that dim [ℜ(P)] = n.
It is straightforward to show that
dim [ℜ(P)] = Σ_{i=1}^p dim [𝔑ᵢ].
But by Theorem 4.10.37, Σ_{i=1}^p dim [𝔑ᵢ] = n, and thus dim [ℜ(P)] = n. Since
X = ℜ(P) ⊕ 𝔑(P), we conclude that ℜ(P) = X. Finally, since P is a pro-
jection with range X, we conclude that Px = x for all x ∈ X; i.e., P = I.
To prove the last part of the theorem, let x ∈ X. From part (iii) we have
x = P₁x + P₂x + ⋯ + Pₚx.
Let xᵢ = Pᵢx for i = 1, ..., p. Then xᵢ ∈ 𝔑ᵢ and Axᵢ = λᵢxᵢ. Hence,
Ax = A(x₁ + ⋯ + xₚ) = Ax₁ + ⋯ + Axₚ = λ₁x₁ + ⋯ + λₚxₚ
= λ₁P₁x + ⋯ + λₚPₚx = (λ₁P₁ + ⋯ + λₚPₚ)x,
which concludes the proof of the theorem. ∎
D. Some Examples
A = [ a₁₁  a₁₂
      a₂₁  a₂₂ ]
is the matrix of A with respect to the basis {e₁, e₂}. Let x ∈ E², and let
xᵀ = (ξ₁, ξ₂) denote the coordinate representation of x with respect to this
basis. Then Ax is the coordinate representation of Ax with respect to this
basis, and we have
Ax = [ a₁₁ξ₁ + a₁₂ξ₂
       a₂₁ξ₁ + a₂₂ξ₂ ] ≜ [ η₁
                            η₂ ] = y.
This transformation is depicted pictorially in Figure F.
Now assume that A is a self-adjoint linear transformation. Then there
exists an orthonormal basis {e₁', e₂'} such that
Ae₁' = λ₁e₁', Ae₂' = λ₂e₂'.
4.10.46. Figure F
4.10.47. Figure G
4.10.49. Figure H (the unit circle)
The reader can readily verify that R is indeed a linear transformation. The
matrix of R with respect to this basis is
R = [ cos θ  −sin θ
      sin θ   cos θ ].
[Figure: the plane Z and the set Y, related by a rotation through 90°.]
We also have:
Using the above theorem we now can prove the following result.
R = [ cos θ  −sin θ
      sin θ   cos θ ]. (4.10.56)
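The rotation matrix (4.10.56) has determinant +1 and preserves the Euclidean norm; both properties can be spot-checked numerically. A Python sketch (illustrative only, not from the text):

```python
import math

def R(theta):
    """Rotation matrix of the form (4.10.56)."""
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta),  math.cos(theta)]]

M = R(math.pi / 6)
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
x = (2.0, -1.0)
y = (M[0][0] * x[0] + M[0][1] * x[1],
     M[1][0] * x[0] + M[1][1] * x[1])
print(det)                             # rotations have determinant +1
print(math.hypot(*x), math.hypot(*y))  # the norm is preserved
```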
(ii) If det A = −1 (i.e., A is a reflection), there exists some orthonormal
basis {e₁, e₂} such that the matrix of A with respect to this basis is
Q = [ 1   0
      0  −1 ]. (4.10.57)
Proof. To prove the first part assume that det A = +1 and choose an
arbitrary orthonormal basis {e₁, e₂}. Let
A = [ a₁₁  a₁₂
      a₂₁  a₂₂ ]
denote the matrix of A with respect to this basis. Then, since A is orthogonal,
so is A and we have
(4.10.58)
and
det A = 1. (4.10.59)
Solving Eqs. (4.10.58) and (4.10.59) (we leave the details to the reader) yields
a₁₁ = cos θ, a₁₂ = −sin θ, a₂₁ = sin θ, and a₂₂ = cos θ.
To prove the second part assume that A is orthogonal and that det A
= −1. Consider the characteristic polynomial of A,
p(λ) = λ² + α₁λ + α₀,
which implies that both λ₁ and λ₂ are real and that λ₁ ≠ λ₂. From Theorem
4.10.52 these eigenvalues are +1 and −1. Therefore, there exists an orthonor-
mal basis such that the matrix of A with respect to this basis is
[ λ₁  0     [ 1   0
  0   λ₂ ] =  0  −1 ]. ∎
where the αᵢ, βᵢ, i = 1, ..., r, are real (i.e., det (A − λI) does not have any
linear factors (λᵢ − λ) with λᵢ real). Solving the first quadratic factor we have
λ₁ = (−β₁ + √(β₁² − 4α₁)) / 2
and
λ₂ = (−β₁ − √(β₁² − 4α₁)) / 2,
where λ₁ and λ₂ are complex. By Theorem 4.5.33, part (iv), if f(·) is any
polynomial function, then f(λ₁) will be an eigenvalue of f(A). In particular,
if f(λ) = α₁ + β₁λ + λ², we know that one of the eigenvalues of the linear
transformation α₁I + β₁A + A² will be α₁ + β₁λ₁ + λ₁² = 0, by choice.
Thus, the linear transformation (α₁I + β₁A + A²) has 0 as an eigenvalue.
Therefore, there exists a vector f₁ ≠ 0 in X such that
(α₁I + β₁A + A²)f₁ = 0,
or
A²f₁ = −α₁f₁ − β₁Af₁. (4.10.61)
x = ξ₁f₁ + ξ₂f₂
for some ξ₁ and ξ₂, and
Ax = ξ₁Af₁ + ξ₂Af₂ = ξ₁Af₁ + ξ₂A²f₁.
But from Eq. (4.10.61) it follows that
A²f₁ = −α₁f₁ − β₁Af₁.
Thus,
Ax = ξ₁Af₁ + ξ₂(−α₁f₁ − β₁Af₁) = −ξ₂α₁f₁ + (ξ₁ − ξ₂β₁)Af₁
= −ξ₂α₁f₁ + (ξ₁ − ξ₂β₁)f₂,
which shows that Ax ∈ Y₁ whenever x ∈ Y₁. Thus, Y₁ is invariant under A.
By Theorem 4.10.52, the restriction A' of A to Y₁ is an orthogonal trans-
formation from Y₁ into Y₁. This restriction cannot have any (real) eigenvalues,
for then A would also have (real) eigenvalues.
From Theorem 4.10.55, A' cannot be a reflection, for in that case A'
would have eigenvalues equal to +1 and −1. Moreover, A' cannot be a
trivial rotation, for then the eigenvalues of A' would be equal to 1 if θ = 0°
and −1 if θ = 180°. But from Corollary 4.10.8 we know that if A is
orthogonal, then det A = ±1. Therefore, it follows now from Theorem
4.10.55 that the restriction of A to Y₁ is a non-trivial rotation.
Now let Z₁ = Y₁⊥. Since Y₁ is invariant under A, so is Z₁, by Theorem
4.10.52, part (iii), and dim Z₁ = dim X − 2. The restriction A₁ of A to Z₁
is an orthogonal transformation from Z₁ into Z₁, and it cannot have any
(real) eigenvalues. Applying the argument already given for A and X now to
A₁ and Z₁, we can conclude that there exists a two-dimensional linear subspace
Y₂ of Z₁ such that the restriction of A₁ to Y₂ is a non-trivial rotation. Now
since Y₂ is contained in Z₁ and since by definition Z₁ = Y₁⊥, we have Y₁ ⊥ Y₂.
Next, let Z₂ be the linear subspace which is orthogonal to both Y₁ and
Y₂, and let A₂ be the restriction of A to Z₂. Repeating the argument given
thus far, we can conclude that there exists a two-dimensional linear subspace
Y₃ of Z₂ such that the restriction of A₂ to Y₃ is a non-trivial rotation and
such that Y₂ ⊥ Y₃ and Y₁ ⊥ Y₃.
To conclude the proof of the theorem, we continue the above process
until we have exhausted the original space X. ∎
Since in the above corollary the dimension of each Yᵢ, i = 1, ..., r,
is two, we have the following additional result.
A = [ R₁                          0
          ⋱
              Rᵣ
                  −1
                      ⋱
                          −1
                              +1
                                  ⋱
      0                              +1 ],
where each Rᵢ is the 2 × 2 rotation block
Rᵢ = [ cos θᵢ  −sin θᵢ
       sin θᵢ   cos θᵢ ].
A = [ λ₁      0
          ⋱
      0       λᵣ ],
where the λᵢ, i = 1, ..., r, are real and where some of the λᵢ may be
zero.
A = [ B₁          0
          ⋱
      0       Bₛ ],
where each Bⱼ is a 2 × 2 block of the form
Bⱼ = [ βⱼ   λⱼ
      −λⱼ   βⱼ ].
The proofs of parts (i)-(iii) follow from the definitions of normal, self-
adjoint, skew-adjoint, and orthogonal linear transformations. To prove
part (iv), let A = A₁ + A₂, where A₁ = ½(A + A*) and A₂ = ½(A − A*),
and note that A₁ is self-adjoint and A₂ is skew-adjoint. This representation
is unique by Corollary 4.10.25. Making use of Theorem 4.10.66 and Corollary
4.10.38, we obtain the desired result. We leave the details of the proof of
this theorem as an exercise.
Let R denote the set of real numbers, and let D ⊂ R² be a domain (i.e.,
D is an open and connected subset of R²). We will call R² the (t, x) plane.
Let f be a real-valued function which is defined and continuous on D, and
4.11. Applications to Ordinary Differential Equations
let ẋ ≜ dx/dt (i.e., ẋ denotes the derivative of x with respect to t). We call
ẋ = f(t, x) (4.11.1)
an ordinary differential equation of the first order. Let T = (t₁, t₂) ⊂ R be
an open interval which we call a t interval (i.e., T = (t₁, t₂) = {t ∈ R:
t₁ < t < t₂}). A real differentiable function φ (if it exists) defined on T such
that the points (t, φ(t)) ∈ D for all t ∈ T and such that
φ̇(t) = f(t, φ(t)) (4.11.2)
for all t ∈ T is called a solution of the differential equation (4.11.1).
4.11.3. Definition. Let (τ, ξ) ∈ D. If φ is a solution of the differential
equation (4.11.1) and if φ(τ) = ξ, then φ is called a solution of the initial-
value problem
ẋ = f(t, x), x(τ) = ξ. (4.11.4)
We can represent the initial-value problem given in Eq. (4.11.4) equiva-
lently by means of the integral equation.
Here we say that two problems are equivalent if they have the same solution.
To prove this equivalence, let φ be a solution of the initial-value problem
(4.11.4). Then φ(τ) = ξ and
φ̇(t) = f(t, φ(t)).
4.11.9. Definition. Let (τ, ξ₁, ..., ξₙ) ∈ D. If the set {φ₁, ..., φₙ} is
a solution of the system of equations (4.11.7) and if (φ₁(τ), ..., φₙ(τ)) = (ξ₁,
..., ξₙ), then the set {φ₁, ..., φₙ} is called a solution of the initial-value
problem
ẋᵢ = fᵢ(t, x₁, ..., xₙ), i = 1, ..., n,
xᵢ(τ) = ξᵢ, i = 1, ..., n. (4.11.10)
It is convenient to use vector notation to represent Eq. (4.11.10). Let
then Eq. (4.11.14) results. In the case of Eqs. (4.11.14) and (4.11.15), we
speak of a linear homogeneous system of ordinary differential equations, in
the case of Eq. (4.11.13) we have a linear non-homogeneous system of ordinary
differential equations, and in the case of Eq. (4.11.15) we speak of a linear
system of ordinary differential equations with constant coefficients.
Next, we consider initial-value problems described by means of nth-order
ordinary differential equations. Let f be a real function which is defined and
continuous in a domain D of the real (t, x₁, ..., xₙ) space, and let x^(k)
denote the kth derivative of x with respect to t. This yields the system
ẋ₁ = x₂, ẋ₂ = x₃, ..., ẋₙ₋₁ = xₙ, ẋₙ = f(t, x₁, ..., xₙ). (4.11.23)
This system of equations is clearly defined for all (t, x₁, ..., xₙ) ∈ D. Now
assume that the vector φᵀ = (φ₁, ..., φₙ) is a solution of Eq. (4.11.23) on an
interval T. Since φ₂ = φ̇₁, φ₃ = φ̈₁, ..., φₙ = φ₁^(n−1), and since
f(t, φ₁(t), ..., φₙ(t)) = f(t, φ₁(t), ..., φ₁^(n−1)(t)) = φ₁^(n)(t),
it follows that the first component φ₁ of the vector φ is a solution of Eq.
(4.11.16) on the interval T. Conversely, assume that φ₁ is a solution of Eq.
(4.11.16) on the interval T. Then the vector φᵀ = (φ₁, φ₁^(1), ..., φ₁^(n−1)) is
clearly a solution of the system of equations (4.11.23). Note that if φ₁(τ) = ξ₁,
..., φ₁^(n−1)(τ) = ξₙ, then the vector φ satisfies φ(τ) = ξ, where ξᵀ = (ξ₁,
..., ξₙ). The converse is also true.
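The reduction of an nth-order equation to the first-order system (4.11.23) can be sketched programmatically. In the following Python sketch the function name `to_first_order` and the harmonic-oscillator example are illustrative, not from the text:

```python
# Sketch: reduce x^(n) = f(t, x, x', ..., x^(n-1)) to the system (4.11.23)
# via x1 = x, x2 = x', ..., xn = x^(n-1).  Names here are illustrative.
def to_first_order(f, n):
    """Return F with F(t, X)[i] = X[i+1] for i < n-1 and F(t, X)[n-1] = f(t, *X)."""
    def F(t, X):
        return [X[i + 1] for i in range(n - 1)] + [f(t, *X)]
    return F

# Example: x'' = -x becomes x1' = x2, x2' = -x1.
F = to_first_order(lambda t, x1, x2: -x1, 2)
print(F(0.0, [1.0, 0.0]))   # [0.0, -1.0]
```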
Thus far we have concerned ourselves with initial-value problems charac-
terized by real ordinary differential equations. It is possible to consider initial-
value problems involving complex ordinary differential equations. For exam-
ple, let t be real and let zᵀ = (z₁, ..., zₙ) be a complex vector (i.e., zₖ is of
the form uₖ + ivₖ, where uₖ and vₖ are real and i = √−1). Let D be a domain
in the (t, z) space, and let f₁, ..., fₙ be n continuous complex-valued functions
defined on D. Let fᵀ = (f₁, ..., fₙ), and let ż = dz/dt. We call
ż = f(t, z) (4.11.24)
a system of n complex ordinary differential equations of the first order.
A complex vector φᵀ = (φ₁, ..., φₙ) which is defined and differentiable on
a real t interval T = (τ₁, τ₂) ⊂ R such that the points (t, φ₁(t), ..., φₙ(t))
∈ D for all t ∈ T and such that
φ̇(t) = f(t, φ(t))
for all t ∈ T, is called a solution of the system of equations (4.11.24). If in
addition, (τ, ξ₁, ..., ξₙ) ∈ D and if (φ₁(τ), ..., φₙ(τ)) = (ξ₁, ..., ξₙ) = ξᵀ,
then φ is said to be a solution of the initial-value problem
ż = f(t, z), z(τ) = ξ. (4.11.25)
Of particular interest in applications are initial-value problems characterized
by complex linear ordinary differential equations having forms analogous
to those given in equations (4.11.13)-(4.11.15).
We can similarly consider initial-value problems described by complex
nth- o rder ordinary differential equations.
Let us look now at some specific examples. The first example demonstrates
that the solution to an initial-value problem may not be unique.
x(0) = 0.
We can readily verify that this problem has infinitely many solutions passing
through the origin; namely, for each p with 0 < p < 1,
φₚ(t) = 0 for 0 ≤ t ≤ p, and φₚ(t) = [(2/3)(t − p)]^{3/2} for p < t ≤ 1.
The next example shows that the t interval for which a solution to the
initial-value problem exists may be restricted.
x(t₁) = ξ,
where ξ is any real number. By direct computation we can verify that
φ(t) = ξ[1 − (t − t₁)ξ]⁻¹
is a solution of this problem. We note that if t = t₁ + 1/ξ, then the solution
φ(t) is not defined. Thus, there is a restriction on the t interval for which a
solution to the above problem exists. Namely, if ξ > 0, the above solution
is valid over any interval (t₁, t₂) such that t₂ < t₁ + 1/ξ.
In this case we say the solution fails to exist for t ≥ t₁ + 1/ξ. On the other hand,
if ξ < 0, the solution given above is valid for any t > t₁, and we say the solu-
tion exists on any interval (t₁, t₂). ∎
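The finite escape time in this example can be seen by evaluating the closed-form solution near t₁ + 1/ξ. A Python sketch (illustrative only, not from the text):

```python
def phi(t, t1, xi):
    """Closed-form solution xi * [1 - (t - t1) * xi]**(-1) from the example."""
    return xi / (1.0 - (t - t1) * xi)

t1, xi = 0.0, 2.0          # escape time is t1 + 1/xi = 0.5
print(phi(0.0, t1, xi))    # initial value 2.0
print(phi(0.49, t1, xi))   # already very large just before t = 0.5
```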
vector with components vᵢ(t) that are defined and piecewise continuous on T.
In the following we consider matrices and vectors with components which
may be either real- or complex-valued. In the former case the field for the x
space is the field of real numbers, while in the latter case the field for the x
space is the field of complex numbers. Also, let
D = {(t, x): t ∈ T, x ∈ Rⁿ (or Cⁿ)}. (4.11.28)
At first we consider systems of ordinary differential equations given by
ẋ = A(t)x + v(t), (4.11.29)
ẋ = A(t)x, (4.11.30)
and
ẋ = Ax. (4.11.31)
In the applications section of the next chapter we will show that, with the
above assumptions, equations (4.11.29)-(4.11.31) possess unique solutions
for every (τ, ξ) ∈ D which exist over the entire interval T = (t₁, t₂) and which
depend continuously on the initial conditions. This is an extremely important
result in applications, where we usually require that T = (- 00, 00).
it follows that α₁ψ₁ + α₂ψ₂ ∈ S whenever ψ₁, ψ₂ ∈ S and whenever α₁, α₂ ∈ F.
Furthermore, the trivial solution ψ = 0 defined by ψ(t) = 0 for all
t ∈ T is clearly in S, and for every ψ ∈ S there exists a −ψ ∈ S such
that ψ + (−ψ) = 0. It is now an easy matter to verify that all the axioms of a
vector space are satisfied for S (we leave the details to the reader to verify).
But this last equation contradicts the assumption that the ξᵢ are linearly
independent. Thus, the ψᵢ, i = 1, ..., n, are linearly independent. Finally,
to show that these solutions span S, let φ be any solution of Eq. (4.11.30) on
T such that φ(τ) = ξ. Then there exist unique scalars α₁, ..., αₙ ∈ F such
that
ξ = α₁ξ₁ + ⋯ + αₙξₙ,
because the vectors ξᵢ, i = 1, ..., n, form a basis for the x space. It now
follows that φ = α₁ψ₁ + ⋯ + αₙψₙ.
" II , U , 111]
,,= [
:~ .. ..:::.. :.1~
:.1~ = [ . II.1! · 1· .,,]
""I "111 ... ",,"
is a fundamental matrix.
In our next definition we employ the natural basis for the x space, given by
u₁ = (1, 0, ..., 0)ᵀ, u₂ = (0, 1, 0, ..., 0)ᵀ, ..., uₙ = (0, ..., 0, 1)ᵀ.
We also have:
4.11.37. Theorem. If Ψ is a solution of the matrix equation (4.11.36)
on T and if t, τ ∈ T, then
det Ψ(t) = det Ψ(τ) exp[∫_τ^t tr A(s) ds], t ∈ T. (4.11.38)
Proof. Recall that if C = [cᵢⱼ] is an (n × n) matrix, then tr C = Σ_{i=1}^n cᵢᵢ. Let
Ψ(t) = [ψᵢⱼ(t)], so that ψ̇ᵢⱼ(t) = Σ_{k=1}^n aᵢₖ(t)ψₖⱼ(t). The derivative of det Ψ(t)
is a sum of n determinants, the ith of which is obtained from det Ψ(t) by
differentiating its ith row. In the first of these determinants the first row is
(ψ̇₁₁, ..., ψ̇₁ₙ) = (Σₖ a₁ₖψₖ₁, ..., Σₖ a₁ₖψₖₙ).
This determinant is unchanged if we subtract from the first row a₁₂ times
the second row plus a₁₃ times the third row up to a₁ₙ times the nth row.
This yields a determinant whose first row is (a₁₁ψ₁₁, ..., a₁₁ψ₁ₙ). Treating
the remaining determinants similarly, we obtain
(d/dt)[det Ψ(t)] = a₁₁(t) det Ψ(t) + a₂₂(t) det Ψ(t) + ⋯ + aₙₙ(t) det Ψ(t)
= [tr A(t)] det Ψ(t).
This now implies
det Ψ(t) = det Ψ(τ) exp[∫_τ^t tr A(s) ds]
for all t ∈ T. ∎
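For a constant matrix A, Ψ(t) = e^{At} solves the matrix equation, and (4.11.38) specializes to det e^{At} = e^{t tr A}. The following Python sketch checks this using a truncated power series for the matrix exponential (illustrative only, not from the text):

```python
import math

A = [[0.5, 1.0], [-0.3, 0.2]]

def expm2(A, t, terms=40):
    """Truncated series I + At + (At)^2/2! + ... for a 2 x 2 matrix."""
    M = [[1.0, 0.0], [0.0, 1.0]]   # running term A^k t^k / k!
    S = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        M = [[sum(M[i][p] * A[p][j] * t / k for p in range(2))
              for j in range(2)] for i in range(2)]
        S = [[S[i][j] + M[i][j] for j in range(2)] for i in range(2)]
    return S

E = expm2(A, 0.7)
det = E[0][0] * E[1][1] - E[0][1] * E[1][0]
print(abs(det - math.exp(0.7 * (0.5 + 0.2))) < 1e-9)  # det e^{At} = e^{t tr A}
```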
We now prove:
or
φ = Ψa, (4.11.42)
where aᵀ = (α₁, ..., αₙ). Equation (4.11.42) constitutes a system of n linear
equations with unknowns α₁, ..., αₙ at any τ ∈ T and has a unique solution
for any choice of φ(τ). Hence, we have det Ψ(τ) ≠ 0, and it now follows from
Theorem 4.11.37 that det Ψ(t) ≠ 0 for any t ∈ T.
Conversely, let Ψ be a solution of the matrix equation (4.11.36) and assume
that det Ψ(t) ≠ 0 for all t ∈ T. Then the columns of Ψ are linearly inde-
pendent for all t ∈ T. ∎
Now let R(t) = [rᵢⱼ(t)] be an arbitrary matrix such that the scalar-valued
functions rᵢⱼ(t) are Riemann integrable on T. We define integration of R(t)
componentwise, i.e.,
Proof. The first part of the theorem follows from the definition of the state
transition matrix.
= A(t)φ(t) + v(t).
Also, φ(τ) = ξ. Therefore, φ is the unique solution of Eq. (4.11.29). ∎
equations with constant coefficients given by Eq. (4.11.31). We require the
following preliminary result.
|δᵢⱼ + Σ_{k=1}^∞ aᵢⱼ^(k) tᵏ/k!| < ∞ for all i, j.
Let m = max_{1≤i≤n} (Σ_{j=1}^n |aᵢⱼ|). Then m is a constant which depends on the elements
of A. Denoting the elements of Aᵏ by aᵢⱼ^(k), we have
|aᵢⱼ^(k+1)| = |Σ_{p=1}^n aᵢₚaₚⱼ^(k)| ≤ (max_i Σ_{p=1}^n |aᵢₚ|)(max_{p,j} |aₚⱼ^(k)|). Therefore,
max_{i,j} |aᵢⱼ^(k+1)| ≤ m · max_{i,j} |aᵢⱼ^(k)|. When k = 1, we have max_{i,j} |aᵢⱼ^(1)| ≤ m, and
by induction it follows that max_{i,j} |aᵢⱼ^(k)| ≤ mᵏ. Now let Mₖ = (mt₁)ᵏ/k!.
Then we have, for any t ∈ (−t₁, t₁), t₁ > 0, and for any i, j,
|aᵢⱼ^(k) tᵏ/k!| ≤ Mₖ.
Since 1 + Σ_{k=1}^∞ Mₖ = e^{mt₁}, we now have that
δᵢⱼ + Σ_{k=1}^∞ aᵢⱼ^(k) tᵏ/k!
is an absolutely and uniformly convergent series for each i, j over the interval
(−t₁, t₁) by the Weierstrass M-test. ∎
Now Φ(t₁, −t₂) = e^{A(t₁+t₂)}, Φ(t₁, 0) = e^{At₁}, and Φ(0, −t₂) = e^{At₂}, which
yields the desired result.
To prove the fourth part of the theorem we note that for all t ∈ T,
A(I + Σ_{k=1}^∞ tᵏAᵏ/k!) = A + Σ_{k=1}^∞ tᵏA^{k+1}/k! = (I + Σ_{k=1}^∞ tᵏAᵏ/k!)A.
Finally, to prove the last part of the theorem, note that for all t ∈ T,
e^{At} · e^{−At} = e^{A(t−t)} = I.
Therefore, (e^{At})⁻¹ = e^{−At}. ∎
Φ(t, τ) = e^{B(t,τ)} = I + Σ_{k=1}^∞ (1/k!) Bᵏ(t, τ),
where B(t, τ) = ∫_τ^t A(s) ds.
4.11.53. Exercise. Find the state transition matrix for ẋ = A(t)x, where
A{ t ) = [; ~l
The reader will find it instructive to verify the following additional
results.
A = [ λ₁  0
      0   λ₂ ].
Show that
e^{At} = [ e^{λ₁t}  0
           0        e^{λ₂t} ]
for all t ∈ T = (−∞, ∞).
Let A be in the Jordan canonical form
A = [ J₀              0
          J₁
              ⋱
      0               Jₚ ],
where J₀ = diag(λ₁, ..., λₖ) and where
Jₘ = [ λₖ₊ₘ  1     0    ⋯  0
       0     λₖ₊ₘ  1       ⋮
       ⋮                ⋱  1
       0     ⋯     0       λₖ₊ₘ ],
m = 1, ..., p, and where λ₁, ..., λₖ, λₖ₊₁, ..., λₖ₊ₚ denote the (not
necessarily distinct) eigenvalues of A. Show that
e^{At} = [ e^{J₀t}               0
               e^{J₁t}
                    ⋱
           0                  e^{Jₚt} ],
where
e^{J₀t} = diag(e^{λ₁t}, ..., e^{λₖt})
and
e^{Jₘt} = e^{λₖ₊ₘt} [ 1  t  t²/2!  ⋯  t^{νₘ−1}/(νₘ−1)!
                      0  1  t      ⋯  t^{νₘ−2}/(νₘ−2)!
                      ⋮            ⋱
                      0  0  0  ⋯   1 ],
where Jₘ is a νₘ × νₘ matrix and k + ν₁ + ⋯ + νₚ = n.
Next, we consider initial-value problems characterized by linear nth-order
ordinary differential equations given by
aₙ(t)x^(n) + aₙ₋₁(t)x^(n−1) + ⋯ + a₁(t)x^(1) + a₀(t)x = v(t), (4.11.59)
aₙ(t)x^(n) + aₙ₋₁(t)x^(n−1) + ⋯ + a₁(t)x^(1) + a₀(t)x = 0, (4.11.60)
and
aₙx^(n) + aₙ₋₁x^(n−1) + ⋯ + a₁x^(1) + a₀x = 0. (4.11.61)
In Eqs. (4.11.59) and (4.11.60), v(t) and aᵢ(t), i = 0, ..., n, are functions
which are defined and continuous on a real t interval T, and in Eq. (4.11.61),
Now let ψ₁, ..., ψₙ be solutions of Eq. (4.11.60). Then we can readily
verify that the matrix
Then
W(ψ₁, ψ₂)(t) = det Ψ(t) = −2/t, t > 0.
Using the notation of Eq. (4.11.63), we have in the present case a₁(t)/a₂(t)
= 1/t. From Eq. (4.11.68) we have, for any τ > 0,
W(ψ₁, ψ₂)(t) = det Ψ(t) = W(ψ₁, ψ₂)(τ) exp[∫_τ^t (−1/s) ds]
= (−2/τ) e^{ln(τ/t)} = −2/t, t > 0,
which checks. ∎
We call a set of n solutions of Eq. (4.11.60), ψ₁, ..., ψₙ, which is linearly
independent on T a fundamental set for Eq. (4.11.60).
Let us next turn our attention to the non-homogeneous linear nth-order
ordinary differential equation (4.11.59). Without loss of generality, let us
assume that aₙ(t) = 1 for all t ∈ T; i.e., let us consider
x^(n) + aₙ₋₁(t)x^(n−1) + ⋯ + a₁(t)x^(1) + a₀(t)x = v(t). (4.11.73)
The study of this equation reduces to the study of the system of n first-order
4.11.77. Theorem. Let {ψ₁, ..., ψₙ} be a fundamental set for the equation
x^(n) + aₙ₋₁(t)x^(n−1) + ⋯ + a₁(t)x^(1) + a₀(t)x = 0. (4.11.78)
Then the solution φ of the equation
x^(n) + aₙ₋₁(t)x^(n−1) + ⋯ + a₁(t)x^(1) + a₀(t)x = v(t), (4.11.79)
satisfying φ(τ) = ξ = (φ(τ), φ^(1)(τ), ..., φ^(n−1)(τ))ᵀ = (ξ₁, ..., ξₙ)ᵀ, τ ∈ T,
ξ ∈ Rⁿ (or Cⁿ), is given by the expression
where φₕ is the solution of Eq. (4.11.78) with φₕ(τ) = ξ, and where Wᵢ(ψ₁,
..., ψₙ)(t) is obtained from W(ψ₁, ..., ψₙ)(t) by replacing the ith column of
W(ψ₁, ..., ψₙ)(t) by (0, 0, ..., 1)ᵀ.
where b(t) is a real continuous function for all t > 0. This equation is
equivalent to
(4.11.84)
where v(t) = b(t)/t². From Example 4.11.69 we have ψ₁(t) = t, ψ₂(t) = 1/t,
and W(ψ₁, ψ₂)(t) = −2/t. Also,
W₁(ψ₁, ψ₂)(t) = det [ 0   1/t
                      1  −1/t² ] = −1/t,
and
W₂(ψ₁, ψ₂)(t) = det [ t  0
                      1  1 ] = t.
Expanding det (A − λI) about the first column, we obtain
det (A − λI) = (−λ) det [ −λ   1    ⋯   0
                           ⋮        ⋱
                           0   ⋯   −λ   1
                          −a₁  −a₂  ⋯  −(λ + aₙ₋₁) ]
+ (−1)^{n+1}(−a₀) det [ 1    0   ⋯  0
                       −λ    1   ⋯  0
                        ⋮        ⋱
                        0   ⋯   −λ  1 ].
Using induction we arrive at the expression
det (A − λI) = (−1)ⁿ{λⁿ + aₙ₋₁λⁿ⁻¹ + ⋯ + a₁λ + a₀}. (4.11.89)
It follows from Eq. (4.11.89) that λ is an eigenvalue of A if and only if λ
is a root of the characteristic equation (4.11.86).
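Equation (4.11.89) can be spot-checked numerically for a companion matrix by comparing det (A − λI) with (−1)ⁿ(λⁿ + aₙ₋₁λⁿ⁻¹ + ⋯ + a₀) at sample values of λ. A Python sketch for n = 3 (illustrative only, not from the text):

```python
# Companion matrix of lambda^3 + a2*lambda^2 + a1*lambda + a0.
a0, a1, a2 = 2.0, -1.0, 0.5
A = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [-a0, -a1, -a2]]

def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

# Compare det(A - l*I) against (-1)^3 (l^3 + a2 l^2 + a1 l + a0).
ok = all(abs(det3([[A[i][j] - (l if i == j else 0.0) for j in range(3)]
                   for i in range(3)])
             - (-1) ** 3 * (l ** 3 + a2 * l ** 2 + a1 * l + a0)) < 1e-9
         for l in (-2.0, 0.0, 1.3, 4.0))
print(ok)
```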
(4.11.91)
where λ₁, ..., λₙ denote the eigenvalues of matrix A. Let V denote the
Vandermonde matrix given by
V = [ 1      1      ⋯  1
      λ₁     λ₂     ⋯  λₙ
      λ₁²    λ₂²    ⋯  λₙ²
      ⋮
      λ₁ⁿ⁻¹  λ₂ⁿ⁻¹  ⋯  λₙⁿ⁻¹ ].
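The Vandermonde matrix is nonsingular exactly when the λᵢ are distinct; its determinant is the product of the differences λⱼ − λᵢ, i < j. A Python sketch for n = 3 (the eigenvalues used are illustrative, not from the text):

```python
l = [1.0, 2.0, 4.0]   # hypothetical distinct eigenvalues
V = [[li ** k for li in l] for k in range(3)]   # rows: 1, lambda_i, lambda_i^2

def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

expected = (l[1] - l[0]) * (l[2] - l[0]) * (l[2] - l[1])   # product of differences
print(abs(det3(V) - expected) < 1e-12)   # nonzero iff the eigenvalues are distinct
```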
ẏₙ = −yₙ₋₁ + aₙ₋₁yₙ.
Differentiating the last expression in Eq. (4.11.97) (n − 1) times, eliminating
y₁, ..., yₙ₋₁, and letting yₙ = y, we obtain
(−1)ⁿy^(n) + (−1)ⁿ⁻¹aₙ₋₁y^(n−1) + ⋯ + (−1)a₁y^(1) + a₀y = 0. (4.11.98)
Equation (4.11.98) is called the adjoint of Eq. (4.11.85).
cations. (In particular, consult the references in [4.10] for a list of diversified
areas of applications.)
Excellent references on ordinary differential equations include [4.3], [4.5],
and [4.11].
REFERENCES
[4.1] N. R. AMUNDSON, Mathematical Methods in Chemical Engineering: Matrices and Their Applications. Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1966.
[4.2] R. E. BELLMAN, Introduction to Matrix Algebra. New York: McGraw-Hill Book Company, Inc., 1970.
[4.3] F. BRAUER and J. A. NOHEL, Qualitative Theory of Ordinary Differential Equations: An Introduction. New York: W. A. Benjamin, Inc., 1969.*
[4.4] E. T. BROWNE, Introduction to the Theory of Determinants and Matrices. Chapel Hill, N.C.: The University of North Carolina Press, 1958.
[4.5] E. A. CODDINGTON and N. LEVINSON, Theory of Ordinary Differential Equations. New York: McGraw-Hill Book Company, Inc., 1955.
[4.6] F. R. GANTMACHER, Theory of Matrices. Vols. I, II. New York: Chelsea Publishing Company, 1959.
[4.7] P. R. HALMOS, Finite Dimensional Vector Spaces. Princeton, N.J.: D. Van Nostrand Company, Inc., 1958.
[4.8] K. HOFFMAN and R. KUNZE, Linear Algebra. Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1961.
[4.9] S. LIPSCHUTZ, Linear Algebra. New York: McGraw-Hill Book Company, 1968.
[4.10] B. NOBLE, Applied Linear Algebra. Englewood Cliffs, N.J.: Prentice-Hall, Inc., 1969.
[4.11] L. S. PONTRYAGIN, Ordinary Differential Equations. Reading, Mass.: Addison-Wesley Publishing Co., Inc., 1962.
*Reprinted by Dover Publications, Inc., New York, 1989.
5
METRIC SPACES
The set X is often called the underlying set of the metric space, the elements
of X are often called points, and p(x, y) is frequently called the distance
from a point x ∈ X to a point y ∈ X. In view of axiom (i) the distance
between two different points is a unique positive number and is equal to zero
if and only if two points coincide. Axiom (ii) indicates that the distance
between points x and y is equal to the distance between points y and x.
Axiom (iii) represents the well-known triangle inequality encountered, for
example, in plane geometry. Clearly, if p is a metric for X and if α is any
real positive number, then the function αp(x, y) is also a metric for X. We
are thus in a position to define infinitely many metrics on X.
The above definition of metric was motivated by our notion of distance.
Our next result enables us to define metric in an equivalent (and often con-
venient) way.
5.1.3. Example. Let X be the set of real numbers R, and let the function ρ on R × R be defined as

ρ(x, y) = |x − y|   (5.1.4)

for all x, y ∈ R, where |x| denotes the absolute value of x. Now clearly ρ(x, y) = |x − y| = 0 if and only if x = y. Also, for all x, y, z ∈ R, we have ρ(y, z) = |y − z| = |(y − x) + (x − z)| ≤ |x − y| + |x − z| = ρ(x, y) + ρ(x, z). Therefore, by Theorem 5.1.2, ρ is a metric and {R; ρ} is a metric space. We call ρ(x, y) defined by Eq. (5.1.4) the usual metric on R, and we call the metric space {R; ρ} the real line. ■
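Although the text predates such tools, the verification in this example is easy to spot-check numerically. The following sketch is ours, not the book's; the helper name rho is an assumption:

```python
# Spot-check of the metric axioms for the usual metric rho(x, y) = |x - y| on R.
# Illustrative only; the helper name "rho" is ours.

def rho(x, y):
    return abs(x - y)

points = [-2.5, 0.0, 1.0, 3.75]
for x in points:
    for y in points:
        assert rho(x, y) >= 0.0                  # distances are non-negative
        assert (rho(x, y) == 0.0) == (x == y)    # zero iff the points coincide
        assert rho(x, y) == rho(y, x)            # symmetry (axiom (ii))
        for z in points:
            # triangle inequality (axiom (iii))
            assert rho(x, z) <= rho(x, y) + rho(y, z)

# As noted above, any positive multiple of a metric is again a metric:
alpha = 2.0
assert all(alpha * rho(x, y) >= 0.0 for x in points for y in points)
```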
even though {X; ρ} may not be bounded. Thus, the function (5.1.11) can be used to generate a bounded metric space from any unbounded metric space. (Hint: Show that if φ: R → R is given by φ(t) = t/(1 + t), then φ(t₁ + t₂) ≤ φ(t₁) + φ(t₂) for all t₁, t₂ ≥ 0, and that φ is increasing on [0, ∞).)

Let R* denote the extended real numbers, and let f: R* → R be given by

f(x) = x/(1 + |x|), x ∈ R;   f(x) = 1, x = +∞;   f(x) = −1, x = −∞.

Let ρ*: R* × R* → R be defined by ρ*(x, y) = |f(x) − f(y)| for all x, y ∈ R*. Show that {R*; ρ*} is a bounded metric space. The function ρ* is called the usual metric for R*, and {R*; ρ*} is called the extended real line.
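A quick computation (ours; the names f and rho_star merely mirror the exercise's notation, and the treatment of the infinities is our assumption about the intended map) illustrates why the resulting space is bounded: every distance is at most 2.

```python
import math

def f(x):
    # Assumed form of the exercise's map: R is sent into (-1, 1),
    # and +infinity, -infinity are sent to +1, -1, respectively.
    if math.isinf(x):
        return 1.0 if x > 0 else -1.0
    return x / (1.0 + abs(x))

def rho_star(x, y):
    return abs(f(x) - f(y))

# Even for wildly separated points the distance stays below 2 ...
assert rho_star(-1e12, 1e12) < 2.0
# ... and the bound 2 is attained only between the two infinities.
assert rho_star(float("-inf"), float("inf")) == 2.0
```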
We also have:
We call ρ′ the metric induced by ρ on Y, and we say that {Y; ρ′} is a metric subspace of {X; ρ} or simply a subspace of X. Since usually there is no room for confusion, we drop the prime from ρ′ and simply denote the metric subspace by {Y; ρ}. We emphasize that any non-void subset of a metric space can be made into a metric subspace. This is not so in the case of linear subspaces. If Y ≠ X, then we speak of a proper subspace.
5.2.1. Theorem. Let R denote the set of real numbers, and let C denote the set of complex numbers.

(ii) (Hölder's inequality) Let p, q ∈ R be such that 1 < p < ∞, and

1/p + 1/q = 1.

(a) Finite Sums. Let n be any positive integer, and let ξ₁, ..., ξ_n and η₁, ..., η_n belong either to R or to C. Then

Σ_{i=1}^{n} |ξ_i η_i| ≤ [Σ_{i=1}^{n} |ξ_i|^p]^{1/p} [Σ_{i=1}^{n} |η_i|^q]^{1/q}.   (5.2.3)

(b) Infinite Sums. If Σ_{i=1}^{∞} |ξ_i|^p < ∞ and Σ_{i=1}^{∞} |η_i|^q < ∞, then

Σ_{i=1}^{∞} |ξ_i η_i| ≤ [Σ_{i=1}^{∞} |ξ_i|^p]^{1/p} [Σ_{i=1}^{∞} |η_i|^q]^{1/q}.   (5.2.4)

(c) Integrals. Let [a, b] be an interval on the real line, and let f, g: [a, b] → R. If ∫_a^b |f(t)|^p dt < ∞ and ∫_a^b |g(t)|^q dt < ∞, then

∫_a^b |f(t)g(t)| dt ≤ [∫_a^b |f(t)|^p dt]^{1/p} [∫_a^b |g(t)|^q dt]^{1/q}.   (5.2.5)
Σ_{i=1}^{n} [|ξ_i| + |η_i|]^p = Σ_{i=1}^{n} [|ξ_i| + |η_i|]^{p−1} |ξ_i| + Σ_{i=1}^{n} [|ξ_i| + |η_i|]^{p−1} |η_i|.

Applying the Hölder inequality (5.2.3) to each of the sums on the right side of the above relation and noting that (p − 1)q = p, we now obtain the desired inequality.

Elements x, y ∈ R^∞ (elements x, y ∈ C^∞) are denoted by x = (ξ₁, ξ₂, ...) and y = (η₁, η₂, ...), respectively, where ξ_i, η_i ∈ R for all i (where ξ_i, η_i ∈ C for all i).
5.3.1. Example. Let X = R^n (let X = C^n), let 1 ≤ p < ∞, and for x = (ξ₁, ..., ξ_n), y = (η₁, ..., η_n) ∈ X define

ρ_p(x, y) = [Σ_{i=1}^{n} |ξ_i − η_i|^p]^{1/p}.   (5.3.2)

To verify the triangle inequality, let a = (α₁, ..., α_n), b = (β₁, ..., β_n), and d = (δ₁, ..., δ_n). Then, by Minkowski's inequality,

ρ_p(a, d) = [Σ_{i=1}^{n} |α_i − δ_i|^p]^{1/p} = [Σ_{i=1}^{n} |(α_i − β_i) + (β_i − δ_i)|^p]^{1/p}
≤ [Σ_{i=1}^{n} |α_i − β_i|^p]^{1/p} + [Σ_{i=1}^{n} |β_i − δ_i|^p]^{1/p} = ρ_p(a, b) + ρ_p(b, d),

the triangle inequality. It thus follows that {R^n; ρ_p} ({C^n; ρ_p}) is a metric space; in fact, it is an unbounded metric space.

We frequently abbreviate {R^n; ρ_p} by R^n_p and {C^n; ρ_p} by C^n_p. For the case p = 2, we call ρ₂ the Euclidean metric or the usual metric on R^n. ■
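For concreteness, the metrics ρ_p can be evaluated directly; the sketch below is an illustration of ours (the helper name rho_p is an assumption). Note how ρ₂ reproduces the Euclidean 3-4-5 triangle and how large p approaches the maximum coordinate distance (cf. Exercise 5.3.17 below):

```python
def rho_p(x, y, p):
    # The metric of Example 5.3.1 on R^n (helper name ours).
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y)) ** (1.0 / p)

x, y = (1.0, 2.0, 3.0), (4.0, 6.0, 3.0)
assert rho_p(x, y, 1) == 7.0           # sum of coordinate distances
assert rho_p(x, y, 2) == 5.0           # the Euclidean (usual) metric
# For large p, rho_p tends to the largest coordinate distance, here 4:
assert abs(rho_p(x, y, 50) - 4.0) < 0.1
```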
5.3.5. Example. Let 1 ≤ p < ∞, let X = R^∞ (or X = C^∞), and define

l_p = {x ∈ X: Σ_{i=1}^{∞} |ξ_i|^p < ∞}.   (5.3.6)

For x, y ∈ l_p, let

ρ_p(x, y) = [Σ_{i=1}^{∞} |ξ_i − η_i|^p]^{1/p}.   (5.3.7)
5.3.12. Example. Let [a, b], a < b, be an interval on the real line, and let C[a, b] be the set of all real-valued continuous functions defined on [a, b]. Let 1 ≤ p < ∞ and for x, y ∈ C[a, b], define

ρ_p(x, y) = [∫_a^b |x(t) − y(t)|^p dt]^{1/p}.   (5.3.13)
[Figure B: pictorial representation of several metrics. Panels depict X = R with ρ(x, y) = |x − y|; X = R² with the Euclidean metric ρ₂; X = R² with ρ∞(x, y) = max (|ξ₁ − η₁|, |ξ₂ − η₂|); and two functions x, y on [a, b] at distance sup_{a≤t≤b} |x(t) − y(t)|.]

In Figure B, several metrics considered in Section 5.1 and in the present section are depicted pictorially.
5.3.17. Exercise. Show that the metric defined in Eq. (5.3.4) is equivalent to

ρ∞(x, y) = lim_{p→∞} [Σ_{i=1}^{n} |ξ_i − η_i|^p]^{1/p}.
5.3.18. Exercise. Let X = R denote the set of real numbers, and define d(x, y) = (x − y)² for all x, y ∈ R. Show that the function d is not a metric. This illustrates the necessity for the exponent 1/p in Eq. (5.3.2).
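The failure asked for in the exercise can be exhibited concretely (our illustration): with three equally spaced points the triangle inequality breaks down.

```python
def d(x, y):
    # The candidate "metric" of Exercise 5.3.18, lacking the exponent 1/p.
    return (x - y) ** 2

# With x = 0, y = 1, z = 2: d(x, z) = 4, while d(x, y) + d(y, z) = 1 + 1 = 2,
# so the triangle inequality fails and d is not a metric.
x, y, z = 0.0, 1.0, 2.0
assert d(x, z) > d(x, y) + d(y, z)
```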
5.3.19. Theorem. Let {X; ρ_x} and {Y; ρ_y} be metric spaces, and let Z = X × Y. Let z₁ = (x₁, y₁) and z₂ = (x₂, y₂) be two points of Z = X × Y. Define the functions

ρ_p(z₁, z₂) = {[ρ_x(x₁, x₂)]^p + [ρ_y(y₁, y₂)]^p}^{1/p}, 1 ≤ p < ∞,

and

ρ∞(z₁, z₂) = max {ρ_x(x₁, x₂), ρ_y(y₁, y₂)}.

Then {Z; ρ_p} and {Z; ρ∞} are metric spaces.

The spaces {Z; ρ_p} and {Z; ρ∞} are examples of product (metric) spaces.

5.3.21. Theorem. Let {X₁; ρ₁}, ..., {X_n; ρ_n} be n metric spaces, and let X = X₁ × ⋯ × X_n = Π_{i=1}^{n} X_i. For x = (x₁, ..., x_n) ∈ X, y = (y₁, ..., y_n) ∈ X, define the functions

ρ′(x, y) = Σ_{i=1}^{n} ρ_i(x_i, y_i)

and

ρ″(x, y) = [Σ_{i=1}^{n} [ρ_i(x_i, y_i)]²]^{1/2}.

Then {X; ρ′} and {X; ρ″} are metric spaces.
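As a small illustration of Theorem 5.3.19 (ours, not the text's), the max-type product metric built from two copies of the real line can be checked against the triangle inequality on sample points:

```python
def rho_inf_product(z1, z2):
    # The metric rho-infinity of Theorem 5.3.19, specialized (for
    # illustration) to X = Y = R, each carrying the usual metric.
    (x1, y1), (x2, y2) = z1, z2
    return max(abs(x1 - x2), abs(y1 - y2))

pts = [(0.0, 0.0), (1.0, 3.0), (-2.0, 0.5)]
for a in pts:
    for b in pts:
        for c in pts:
            # triangle inequality holds in the product space
            assert rho_inf_product(a, c) <= rho_inf_product(a, b) + rho_inf_product(b, c)
```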
The radius of a sphere is always positive and finite. In place of the terms ball or sphere we also use the term spherical neighborhood of x₀.
In Figure C, spheres in several types of metric spaces considered in the
previous sections are depicted. Note that in these figures the indicated spheres
do not include boundaries.
[Figure C: open spheres S(x₀; r) depicted for X = R with ρ(x, y) = |x − y| (an open interval); for X = R² with ρ₂(x, y) = [(ξ₁ − η₁)² + (ξ₂ − η₂)²]^{1/2} (an open disk); for X = R² with ρ₁(x, y) = |ξ₁ − η₁| + |ξ₂ − η₂| (an open diamond); for X = R² with ρ∞(x, y) = max {|ξ₁ − η₁|, |ξ₂ − η₂|} (an open square); and for X = C[a, b] with ρ∞ (a tube of radius r about the graph of x₀ over [a, b]).]
which contains no point of Y other than x itself. The point x is called a limit point or point of accumulation of set Y if every sphere with center at x contains an infinite number of points of Y. The set of all limit points of Y is called the derived set of Y and is denoted by Y′.
Our next result shows that adherent points are either limit points or isolated points.
5.4.9. Example. Let {R; ρ} be the real line with the usual metric, and let Q be the set of rational numbers in R. For every x ∈ R, any open sphere S(x; r) contains a point in Q. Thus, every point in R is an adherent point of Q; i.e., R ⊂ Q̄. Since Q̄ ⊂ R, it follows that Q̄ = R. Clearly, there are no isolated points in Q. Also, for any x ∈ R, every sphere S(x; r) contains an infinite number of points of Q, so that every point of R is also a limit point of Q.
exists a sphere S(x; r) such that S(x; r) ⊂ Y. The set of all interior points of set Y is called the interior of Y and is denoted by Y⁰. A point x ∈ X is an exterior point of Y if it is an interior point of the complement of Y. The exterior of Y is the set of all exterior points of set Y. The set of all points x ∈ X which are adherent both to Y and to its complement is called the frontier of set Y. The boundary of a set Y is the set of all points in the frontier of Y which belong to Y.
5.4.13. Example. Let {R; ρ} be the real line with the usual metric, and let Y = {y ∈ R: 0 < y ≤ 1} = (0, 1]. The interior of Y is the set (0, 1) = {y ∈ R: 0 < y < 1}. The exterior of Y is the set (−∞, 0) ∪ (1, +∞), Ȳ = {y ∈ R: 0 ≤ y ≤ 1} = [0, 1], and the closure of the complement of Y is (−∞, 0] ∪ [1, +∞). Thus, the frontier of Y is the set {0, 1}, and the boundary of Y is the singleton {1}. ■
When there is no room for confusion, we usually call Y an open set and Z a closed set. On occasions when we want to be very explicit, we will say that Y is open relative to {X; ρ} or with respect to {X; ρ}.
In our next result we establish some of the important properties of open
sets.
5.4.15. Theorem.
(i) X and ∅ are open sets.
(ii) If {Y_α}_{α∈A} is an arbitrary family of open subsets of X, then ∪_{α∈A} Y_α is an open set.
(iii) The intersection of a finite number of open sets of X is open.

Proof. To prove the first part, note that for every x ∈ X, any sphere S(x; r) ⊂ X. Hence, every point in X is an interior point. Thus, X is open. Also, observe that ∅ has no points and therefore every point of ∅ is an interior point of ∅. Hence, ∅ is an open subset of X.

To prove the second part, let {Y_α}_{α∈A} be a family of open sets in X, and let Y = ∪_{α∈A} Y_α. If Y_α is empty for every α ∈ A, then Y = ∅ is an open subset of X. Now suppose that Y ≠ ∅ and let x ∈ Y. Then x ∈ Y_α for some α ∈ A. Since Y_α is an open set, there is a sphere S(x; r) such that S(x; r) ⊂ Y_α. Hence, S(x; r) ⊂ Y, and thus x is an interior point of Y. Therefore, Y is an open set.

To prove the third part, let Y₁ and Y₂ be open subsets of X. If Y₁ ∩ Y₂ = ∅, then Y₁ ∩ Y₂ is open. So let us assume that Y₁ ∩ Y₂ ≠ ∅, and let x ∈ Y₁ ∩ Y₂. Then there are spheres S(x; r₁) ⊂ Y₁ and S(x; r₂) ⊂ Y₂. Letting r = min (r₁, r₂), we have S(x; r) ⊂ Y₁ ∩ Y₂, so that x is an interior point of Y₁ ∩ Y₂; the case of finitely many open sets follows by induction.
5.4.17. Theorem.
(i) X and ∅ are closed sets.
(ii) If Y is an open subset of X, then its complement Y⁻ is closed.
(iii) If Z is a closed subset of X, then its complement Z⁻ is open.

Proof. The first part of this theorem follows immediately from the definitions of X, ∅, and closed set.

To prove the second part, let Y be any open subset of X. We may assume that Y ≠ ∅ and Y ≠ X. Let x be any adherent point of Y⁻. Then x cannot belong to Y, for if it did, then there would exist a sphere S(x; r) ⊂ Y, which is impossible. Therefore, every adherent point of Y⁻ belongs to Y⁻, and thus Y⁻ is closed if Y is open.

To prove the third part, let Z be any closed subset of X. Again, we may assume that Z ≠ ∅ and Z ≠ X. Let x ∈ Z⁻. Then there exists a sphere S(x; r) which contains no point of Z. This is so because if every such sphere would contain a point of Z, then x would be an adherent point of Z and consequently would belong to Z, since Z is closed. Thus, there is a sphere S(x; r) ⊂ Z⁻; i.e., x is an interior point of Z⁻. Since this holds for arbitrary x ∈ Z⁻, Z⁻ is an open set. ■
5.4.18. Theorem.
(i) Every open sphere in X is an open set.
(ii) If Y is an open subset of X, then there is a family of open spheres, {S_α}_{α∈A}, such that Y = ∪_{α∈A} S_α.
(iii) The interior of any subset Y of X is the largest open set contained in Y.

Proof. To prove the first part, let S(x; r) be any open sphere in X. Let x₁ ∈ S(x; r), and let ρ(x, x₁) = r₁. If we let r₀ = r − r₁, then according to the proof of part (ii) of Theorem 5.4.10 we have S(x₁; r₀) ⊂ S(x; r). Hence, x₁ is an interior point of S(x; r). Since this is true for any x₁ ∈ S(x; r), it follows that S(x; r) is an open subset of X.

To prove the second part of the theorem, we first note that if Y = ∅, then Y is open and is the union of an empty family of spheres. So assume that Y ≠ ∅ and that Y is open. Then each point x ∈ Y is the center of a sphere S(x; r) ⊂ Y, and moreover Y is the union of the family of all such spheres.

The proof of the last part of the theorem is left as an exercise. ■
Let {Y; ρ} be a subspace of a metric space {X; ρ}, and suppose that V is a subset of Y. It can happen that V may be an open subset of Y and at the same time not be an open subset of X. Thus, when a set is described as open, it is important to know in what space it is open. We have:

The first part of the preceding theorem may be stated in another equivalent way. Let T and T′ be the topology of {X; ρ} and {Y; ρ}, respectively, generated by ρ. Then T′ = {Y ∩ U: U ∈ T}.

Let us now consider some specific examples.
5.4.23. Example. We now show that the word "finite" is crucial in part (iii) of Theorem 5.4.15. Let {R; ρ} denote again the real line with the usual metric, and let a < b. If Y_n = {x ∈ R: a < x < b + 1/n}, then for each positive integer n, Y_n is an open subset of the real line. However, the set

∩_{n=1}^{∞} Y_n = {x ∈ R: a < x ≤ b} = (a, b]

is not an open subset of R. (This can readily be verified, since every sphere S(b; r) contains a point greater than b and hence is not in ∩_{n=1}^{∞} Y_n.) ■
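The role of b in this example can be probed numerically (our sketch; the helper name in_Yn is an assumption): b lies in every Y_n, yet every sphere about b leaks past b.

```python
a, b = 0.0, 1.0

def in_Yn(x, n):
    # Y_n = {x in R : a < x < b + 1/n}, an open set for each positive integer n.
    return a < x < b + 1.0 / n

# b belongs to every Y_n, hence to the intersection (a, b]:
assert all(in_Yn(b, n) for n in range(1, 1000))

# ... but no sphere S(b; r) fits inside (a, b]: the point b + r/2 escapes.
for r in (1.0, 0.1, 1e-6):
    escape = b + r / 2.0
    assert abs(escape - b) < r          # escape lies inside the sphere S(b; r)
    assert not (a < escape <= b)        # yet it lies outside (a, b]
```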
In the above example, let Y = (a, b]. We saw that Y is not an open subset of R; i.e., b is not an interior point of Y. However, if we were to consider {Y; ρ} as a metric space by itself, then Y is an open set.
5.4.24. Example. Let {C[a, b]; ρ∞} denote the metric space of Example 5.3.14. Let γ be an arbitrary finite positive number. Then the set of continuous functions satisfying the condition |x(t)| < γ for all a ≤ t ≤ b is an open subset of the metric space {C[a, b]; ρ∞}. ■
Theorems 5.4.15 and 5.4.17 tell us that the sets X and ∅ are both open and closed in any metric space. In some metric spaces there may be proper subsets of X which are both open and closed, as illustrated in the following example.
5.4.27. Theorem.
(i) Every subset of X consisting of a finite number of elements is closed.
(ii) Let x₀ ∈ X, let r > 0, and let K(x₀; r) = {x ∈ X: ρ(x, x₀) ≤ r}. Then K(x₀; r) is closed.
(iii) A subset Y ⊂ X is closed if and only if Ȳ ⊂ Y.
(iv) A subset Y ⊂ X is closed if and only if Y′ ⊂ Y.
(v) Let {Y_α}_{α∈A} be any family of closed sets in X. Then ∩_{α∈A} Y_α is closed.
(vi) The union of a finite number of closed sets in X is closed.
(vii) The closure of a subset Y of X is the intersection of all closed sets containing Y.

Proof. Only the proof of part (v) is given. Let {Y_α}_{α∈A} be any family of closed subsets of X. Then {Y_α⁻}_{α∈A} is a family of open sets. Now (∩_{α∈A} Y_α)⁻ = ∪_{α∈A} Y_α⁻ is an open set, and hence ∩_{α∈A} Y_α is a closed subset of X. ■
5.4.28. Exercise. Prove parts (i) to (iv), (vi), and (vii) of Theorem 5.4.27.
5.4.30. Example. We now show that the word "finite" is essential in part (vi) of Theorem 5.4.27. Let {R; ρ} denote the real line with the usual metric, and let a > 0. If Y_n = {x ∈ R: 1/n ≤ x ≤ a} for each positive integer n, then Y_n is a closed subset of the real line. However, the set

∪_{n=1}^{∞} Y_n = {x ∈ R: 0 < x ≤ a} = (0, a]

is not a closed subset of the real line, as can readily be verified, since 0 is an adherent point of (0, a]. ■
5.4.36. Example. The real line with the usual metric is a separable space. As we saw in Example 5.4.9, if Q is the set of rational numbers, then Q̄ = R. ■
5.4.37. Example. Let {R^n; ρ_p} be the metric space defined in Example 5.3.1 (recall that 1 ≤ p < ∞). The set of vectors x = (ξ₁, ..., ξ_n) with rational coordinates (i.e., ξ_i is a rational number, i = 1, ..., n) is a denumerable everywhere dense set in R^n and, therefore, {R^n; ρ_p} is a separable metric space. ■
5.4.38. Example. Let {l_p; ρ_p} be the metric space defined in Example 5.3.5 (recall that 1 ≤ p < ∞). We can show that this space is separable in the following manner. Let

Y = {y ∈ l_p: y = (η₁, ..., η_n, 0, 0, ...) for some n, where η_i is a rational number, i = 1, ..., n}.

Then Y is a denumerable set which can be shown to be everywhere dense in l_p. Hence, {l_p; ρ_p} is separable. ■
5.4.41. Exercise. Show that the metric space {X; ρ}, where ρ is the discrete metric defined in Example 5.1.7, is separable if and only if X is a countable set.
Notice now that for every y₁, y₂ ∈ Y, ρ∞(y₁, y₂) = 0 or 1; that is, ρ∞ restricted to Y is the discrete metric. It follows from Exercise 5.4.41 that Y cannot be separable and, consequently, {l∞; ρ∞} is not separable. ■
5.5. COMPLETE METRIC SPACES

The set of real numbers R with the usual metric ρ defined on it has many remarkable properties, several of which are attributable to the so-called "completeness property" of this space. For this reason we speak of {R; ρ} as being a complete metric space. In the present section we consider general complete metric spaces.

Throughout this section {X; ρ} is our underlying metric space, and J denotes the set of positive integers. Before considering the completeness of metric spaces we need to consider a few facts about sequences on metric spaces (cf. Definition 1.1.25).
The range of f in the above definition may consist of a finite number of points or of an infinite number of points. Specifically, if the range of f
5.5.4. Example. Let {R; ρ} denote the set of real numbers with the usual metric. If n ∈ J, then the sequence {n²} diverges and is unbounded, and the range of this sequence is an infinite set. The sequence {(−1)ⁿ} diverges, is bounded, and its range is a finite set. The sequence {a + (1/n)} converges to a, is bounded, and its range is an infinite set. ■
Now ε is any positive number. Since the only non-negative number which is less than every positive number is zero, it follows that ρ(x, y) = 0 and therefore x = y.

To prove part (iii), assume that lim_{n→∞} x_n = x and let S(x; ε) be any sphere about x. Then there is a positive integer N such that the only terms of the sequence {x_n} which are possibly not in S(x; ε) are the terms x₁, x₂, ..., x_{N−1}. Conversely, assume that every sphere about x contains all but a finite number of terms from the sequence {x_n}. With ε > 0 specified, let M = max {n ∈ J: x_n ∉ S(x; ε)}. If we set N = M + 1, then x_n ∈ S(x; ε) for all n > N, which was to be shown.

To prove part (v), we note from Theorem 5.1.13 that

|ρ(y, x) − ρ(y, x_n)| ≤ ρ(x, x_n).

By hypothesis, lim_{n→∞} x_n = x. Therefore, lim_{n→∞} ρ(x, x_n) = 0 and so lim_{n→∞} |ρ(y, x) − ρ(y, x_n)| = 0; i.e., lim_{n→∞} ρ(y, x_n) = ρ(y, x).

Finally, to prove part (vii), suppose to the contrary that ρ(x, y) > γ. Then δ = ρ(x, y) − γ > 0. Now γ − ρ(x_n, y) ≥ 0 for all n ∈ J, and thus

0 < δ ≤ ρ(x, y) − ρ(x_n, y) ≤ ρ(x, x_n)

for all n ∈ J. But this is impossible, since lim_{n→∞} x_n = x. Thus, ρ(x, y) ≤ γ.

We leave the proofs of the remaining parts as an exercise. ■
5.5.7. Exercise. Prove parts (ii), (iv), and (vi) of Theorem 5.5.6.
Proof. To prove part (i), assume that lim_{n→∞} y_n = x. Then every sphere about x contains at least one term of the sequence {y_n} and, since every term of {y_n} is a point of Y, it follows that x is an adherent point of Y. Conversely, assume that x is an adherent point of Y. Then every sphere about x contains at least one point of Y. Now let us choose for each positive integer n a point y_n ∈ Y such that y_n ∈ S(x; 1/n). Then it follows readily that the sequence {y_n} chosen in this fashion converges to x. Specifically, if ε > 0 is given, then we choose a positive integer N such that 1/N < ε. Then for every n > N we have y_n ∈ S(x; 1/n) ⊂ S(x; ε). This concludes the proof of part (i).

To prove part (ii), assume that x is a limit point of the set Y. Then every sphere S(x; 1/n) contains an infinite number of points, and so we can choose a y_n ∈ S(x; 1/n) such that y_n ≠ y_m for all m < n. The sequence {y_n} consists of distinct points and converges to x. Conversely, if {y_n} is a sequence of distinct points convergent to x and if S(x; ε) is any sphere with center at x, then by definition of convergence there is an N such that for all n > N, y_n ∈ S(x; ε). That is, there are infinitely many points of Y in S(x; ε).

To prove part (iii), assume that Y is closed and let {y_n} be a convergent sequence with y_n ∈ Y for all n and lim_{n→∞} y_n = x. We want to show that x ∈ Y. By part (i), x must be an adherent point of Y. Since Y is closed, x ∈ Y. Next, we prove the converse. Let x be an adherent point of Y. Then by part (i), there is a sequence {y_n} in Y such that lim_{n→∞} y_n = x. By hypothesis, we must have x ∈ Y. Since Y contains all of its adherent points, it must be closed. ■
Statement (iii) of Theorem 5.5.8 is often used as an alternate way of
defining a closed set.
The next theorem provides us with conditions under which a sequence is
convergent in a product metric space.
5.5.9. Theorem. Let {X; ρ_x} and {Y; ρ_y} be two metric spaces, let Z = X × Y, let ρ be any of the metrics defined on Z in Theorem 5.3.19, and let {Z; ρ} denote the product metric space of {X; ρ_x} and {Y; ρ_y}. If z ∈ Z = X × Y, then z = (x, y), where x ∈ X and y ∈ Y. Let {x_n} be a sequence in X, and let {y_n} be a sequence in Y. Then,

(i) the sequence {(x_n, y_n)} converges in Z if and only if {x_n} converges in X and {y_n} converges in Y; and
(ii) lim_{n→∞} (x_n, y_n) = (lim_{n→∞} x_n, lim_{n→∞} y_n) whenever this limit exists.
5.5.10. Exercise. Prove Theorem 5.5.9.
We also have:
5.5.18. Example. Let X = Q, the set of rational numbers, and let ρ(x, y) = |x − y|. Let

x_n = 1 + 1/2! + ⋯ + 1/n!

for n ∈ J. The sequence {x_n} is Cauchy. Since there is no limit in Q to which {x_n} converges, the metric space {Q; ρ} is not complete. ■
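This example can be made concrete (our illustration, using exact rational arithmetic): the partial sums are rational and Cauchy, while their limit is e − 1, which is irrational.

```python
from fractions import Fraction
from math import e, factorial

def x(n):
    # x_n = 1 + 1/2! + ... + 1/n!, computed exactly as a rational number.
    return sum(Fraction(1, factorial(k)) for k in range(1, n + 1))

# The terms are rational and form a Cauchy sequence ...
assert abs(x(30) - x(20)) < Fraction(1, 10**18)
# ... whose limit is e - 1, an irrational number, so no limit exists in Q.
assert abs(float(x(30)) - (e - 1.0)) < 1e-12
```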
5.5.26. Example. Let {l_p; ρ_p} be the metric space defined in Example 5.3.5. We now show that this space is a complete metric space.

Let {x_k} be a Cauchy sequence in l_p, where x_k = (ξ_{1k}, ξ_{2k}, ..., ξ_{mk}, ...). Let ε > 0. Then there is an N ∈ J such that ρ_p(x_k, x_j) < ε for all k, j ≥ N. This implies that |ξ_{mk} − ξ_{mj}| < ε for every m ∈ J and all k, j ≥ N. Thus, {ξ_{mk}} is a Cauchy sequence in R for every m ∈ J, and hence {ξ_{mk}} is convergent to some limit, say lim_{k→∞} ξ_{mk} = ξ_m for m ∈ J. Now let x = (ξ₁, ξ₂, ..., ξ_m, ...). We want to show that (i) x ∈ l_p and (ii) lim_{k→∞} x_k = x.

Since {x_k} is a Cauchy sequence, we know by Theorem 5.5.13 that there exists a γ > 0 such that

ρ_p(0, x_k) = [Σ_{m=1}^{∞} |ξ_{mk}|^p]^{1/p} ≤ γ

for all k ∈ J. Now let n be any positive integer, let ρ_p^n be the metric on R^n defined in Exercise 5.5.25, and let x_k^n = (ξ_{1k}, ..., ξ_{nk}). Then ρ_p^n(x_k^n, x_j^n) ≤ ρ_p(x_k, x_j), and thus {x_k^n} is a Cauchy sequence in R^n_p. It also follows that ρ_p^n(0, x_k^n) ≤ γ for all k ∈ J. Now by Exercise 5.5.25, {x_k^n} converges to x^n, where x^n = (ξ₁, ..., ξ_n). It follows from Theorem 5.5.6, part (vii), that ρ_p^n(0, x^n) ≤ γ; i.e., [Σ_{m=1}^{n} |ξ_m|^p]^{1/p} ≤ γ. Since this must hold for all n ∈ J, it follows that x ∈ l_p.

To show that lim_{k→∞} x_k = x, let ε > 0. Then there is an integer N such that ρ_p(x_j, x_k) < ε for all k, j ≥ N. Again, let n be any positive integer. Then we have ρ_p^n(x_j^n, x_k^n) < ε for all j, k ≥ N. For fixed n, we conclude from Theorem 5.5.6, part (vii), that ρ_p^n(x^n, x_k^n) ≤ ε for all k ≥ N. Hence,

[Σ_{m=1}^{n} |ξ_m − ξ_{mk}|^p]^{1/p} ≤ ε

for all k ≥ N, where N depends only on ε (and not on n). Since this must hold for all n ∈ J, we conclude that ρ_p(x, x_k) ≤ ε for all k ≥ N. This implies that lim_{k→∞} x_k = x. ■
5.5.27. Exercise. Show that the discrete metric space of Example 5.1.7
is complete.
5.5.28. Example. Let {C[a, b]; ρ∞} be the metric space defined in Example 5.3.14. Thus, C[a, b] is the set of all continuous functions on [a, b] and

ρ∞(x, y) = sup_{a≤t≤b} |x(t) − y(t)|.

We now show that {C[a, b]; ρ∞} is a complete metric space. If {x_n} is a Cauchy sequence in C[a, b], then for each ε > 0 there is an N such that |x_m(t) − x_n(t)| < ε whenever m, n ≥ N for all t ∈ [a, b]. Thus, for fixed t, the sequence {x_n(t)} converges to, say, x₀(t). Since t is arbitrary, the sequence of functions {x_n(·)} converges pointwise to a function x₀(·). Also, since N = N(ε) is independent of t, the sequence {x_n(·)} converges uniformly to x₀(·). Now from the calculus we know that if a sequence of continuous functions {x_n(·)} converges uniformly to a function x₀(·), then x₀(·) is continuous. Therefore, every Cauchy sequence in {C[a, b]; ρ∞} converges to an element in this space in the sense of the metric ρ∞. Therefore, the metric space {C[a, b]; ρ∞} is complete. ■
5.5.29. Example. Let {C[a, b]; ρ₂} be the metric space defined in Example 5.3.12, with p = 2; i.e.,

ρ₂(x, y) = [∫_a^b |x(t) − y(t)|² dt]^{1/2}.

We now show that this metric space is not complete. Without loss of generality let the closed interval be [−1, 1]. In particular, consider the sequence {x_n} of continuous functions defined by

x_n(t) = 0, −1 ≤ t ≤ 0;   x_n(t) = nt, 0 ≤ t ≤ 1/n;   x_n(t) = 1, 1/n ≤ t ≤ 1.

[Figure: graphs of x_n(t) for n = 1, 2, 3.]

This sequence is Cauchy in {C[−1, 1]; ρ₂}. Suppose it had a limit x in this space. Then ∫_{−1}^{0} |x(t)|² dt ≤ [ρ₂(x_n, x)]² → 0, and since x is continuous, x(t) = 0 whenever t ∈ [−1, 0]. Now if 0 < a ≤ 1, then for n > 1/a we have x_n(t) = 1 on [a, 1], so that

∫_a^1 |1 − x(t)|² dt ≤ [ρ₂(x_n, x)]² → 0 as n → ∞.

Since this integral is independent of n it vanishes. Also, since x is continuous it follows that x(t) = 1 for t ≥ a. Since a can be chosen arbitrarily close to zero, we end up with a function x such that x(t) = 0 for t ≤ 0 and x(t) = 1 for t > 0; such an x is not continuous, a contradiction. Hence, {C[a, b]; ρ₂} is not complete. ■
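The Cauchy property of {x_n} can be observed numerically (our sketch; rho2 is a crude Riemann-sum stand-in for the integral metric): distances between later terms shrink, even though the pointwise limit is a discontinuous step function.

```python
def x_n(t, n):
    # The continuous ramp functions of Example 5.5.29 on [-1, 1].
    if t <= 0.0:
        return 0.0
    if t <= 1.0 / n:
        return n * t
    return 1.0

def rho2(f, g, a=-1.0, b=1.0, steps=20000):
    # Midpoint-rule approximation of the rho_2 metric (illustrative only).
    h = (b - a) / steps
    total = sum((f(a + (i + 0.5) * h) - g(a + (i + 0.5) * h)) ** 2
                for i in range(steps))
    return (total * h) ** 0.5

d1 = rho2(lambda t: x_n(t, 10), lambda t: x_n(t, 20))
d2 = rho2(lambda t: x_n(t, 100), lambda t: x_n(t, 200))
# In fact rho_2(x_m, x_2m) = 1/sqrt(12 m), so the sequence is Cauchy:
assert d2 < d1 < 0.2
```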
A bounded function f: [a, b] → R is Riemann integrable (where the Riemann integral is denoted, as usual, by ∫_a^b f(x) dx) if and only if f is continuous almost everywhere on [a, b]. The class of Riemann integrable functions with a metric defined in the same manner as Eq. (5.3.13) is likewise not a complete metric space.
5.5.31. Example. Let p ≥ 1 (p not necessarily an integer), let (R, 𝔐, μ) denote the Lebesgue measure space on the real numbers, and let [a, b] be a subset of R. Let ℒ_p[a, b] denote the family of functions f: R → R which are Lebesgue measurable and such that ∫_{[a,b]} |f|^p dμ exists and is finite.

We define an equivalence relation ~ on ℒ_p[a, b] by saying that f ~ g if f(x) = g(x) except on a subset of [a, b] having Lebesgue measure zero. Now denote the family of equivalence classes into which ℒ_p[a, b] is divided by L_p[a, b]. Specifically, let us denote the equivalence class [f] = {g ∈ ℒ_p[a, b]: g ~ f} for f ∈ ℒ_p[a, b]. Then L_p[a, b] = {[f]: f ∈ ℒ_p[a, b]}. Now let X = L_p[a, b] and define ρ_p: X × X → R by

ρ_p([f], [g]) = [∫_{[a,b]} |f − g|^p dμ]^{1/p}.   (5.5.32)

It can be shown that the value of ρ_p([f], [g]) defined by Eq. (5.5.32) is the same for any f and g in the equivalence classes [f] and [g], respectively. Furthermore, ρ_p satisfies all the axioms of a metric, and as such {L_p[a, b]; ρ_p} is a metric space. One of the important results of the Lebesgue theory is that this space is complete.

It is important to note that the right-hand side of Eq. (5.5.32) cannot be used to define a metric on ℒ_p[a, b], since there are functions f ≠ g such that ∫_{[a,b]} |f − g|^p dμ = 0; however, in the literature the distinction between L_p[a, b] and ℒ_p[a, b] is usually suppressed. That is, we usually write f ∈ L_p[a, b] instead of [f] ∈ L_p[a, b], where f ∈ ℒ_p[a, b].

Finally, in the particular case when p = 2, the space {C[a, b]; ρ₂} of Example 5.5.29 is a subspace of the space {L₂; ρ₂}. ■
We leave the proof of the last result of the present section as an exercise.
5.6. COMPACTNESS
[5.6.5. Figure G. Total boundedness of a set Y: within the set X, the finite set S_ε, consisting of the indicated dots, forms an ε-net for the subset Y.]
We note, for example, that all finite sets (including the empty set) are totally bounded. Whereas all totally bounded sets are also bounded, the converse does not, in general, hold. We demonstrate this by means of the following example. Let

Y = {y ∈ l₂: Σ_{i=1}^{∞} |η_i|² ≤ 1}.
Let e₁ = (1, 0, 0, ...), e₂ = (0, 1, 0, ...), etc., and let E = {e₁, e₂, ...} ⊂ Y. Then ρ₂(e_i, e_j) = √2 for i ≠ j. Now suppose there is a finite ε-net for Y for, say, ε = ½. Let {s₁, ..., s_n} be the net S_ε. Now if e_j is such that ρ(e_j, s_i) < ½ for some i, then ρ(e_k, s_i) ≥ ρ(e_k, e_j) − ρ(e_j, s_i) > ½ for k ≠ j. Hence, there can be at most one element of the set E in each sphere S(s_i; ½) for i = 1, ..., n. Since there are infinitely many points in E and only a finite number of spheres S(s_i; ½), this contradicts the fact that S_ε is an ε-net. Hence, there is no finite ε-net for ε = ½, and Y is not totally bounded. ■
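The key distance computation can be confirmed in a finite-dimensional stand-in (our illustration; the helper names are assumptions): any two distinct unit vectors sit √2 apart.

```python
import math

def e_vec(i, dim):
    # Finite-dimensional stand-in for the unit vector e_i of the example.
    v = [0.0] * dim
    v[i] = 1.0
    return v

def rho2(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

vecs = [e_vec(i, 10) for i in range(10)]
for i in range(10):
    for j in range(10):
        if i != j:
            # Distinct unit vectors are always sqrt(2) apart, so no finite
            # (1/2)-net can cover all of them.
            assert abs(rho2(vecs[i], vecs[j]) - math.sqrt(2.0)) < 1e-12
```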
5.6.12. Example. Let X = (0, 1], and let ρ be the usual metric on the real line R. Consider the sequence {x_n}, where x_n = 1/n, n = 1, 2, .... This sequence has no subsequence which converges to a point of X.
Some authors use the term bicompact for Heine-Borel compactness and the term compact for what we call sequentially compact. As we shall see shortly, in the case of metric spaces, compactness and sequential compactness are equivalent, so no confusion should arise.
We will also show that compact metric spaces can equivalently be charac-
terized by means of the Bolzano-Weierstrass property, given by the following.
Before setting out on proving the assertions made above, i.e., the equiva-
lence of compactness, sequential compactness, and the Bolzano-Weierstrass
property, in metric spaces, a few comments concerning some of these concepts
may be of benefit.
Informally, we may view a sequentially compact metric space as having
such an abundance of elements that no matter how we choose a sequence,
there will always be a clustering of an infinite number of points around at
least one point in the metric space. A similar interpretation can be made
concerning metric spaces which possess the Bolzano-Weierstrass property.
Utilizing the concepts of sequential compactness and total boundedness,
we first state and prove the following result.
whenever m > n ≥ N. Letting m → ∞, we have 0 ≤ ρ(x_n, x) ≤ ε whenever n ≥ N. Hence, the Cauchy sequence {x_n} converges to x ∈ X. Therefore, X is complete.
In connection with parts (iv) and (v) we note that a totally bounded metric
Parts (iii), (iv) and (v) of the above theorem allow us to define a sequentially
compact metric space equivalently as a metric space which is complete and
totally bounded. We now show that a metric space is sequentially compact if
and only if it satisfies the Bolzano-Weierstrass property.
Our next objective is to show that in metric spaces the concepts of com-
pactness and sequential compactness are equivalent. In doing so we employ
the following lemma, the proof of which is left as an exercise.
that every infinite subset of a compact metric space has a point of accumulation.

Let {X; ρ} be a compact metric space, and let Y be an infinite subset of X. For purposes of contradiction, assume that Y has no point of accumulation. Then each x ∈ X is the center of a sphere which contains no point of Y, except possibly x itself. These spheres form an infinite open covering of X. But, by hypothesis, {X; ρ} is compact, and therefore we can choose from this infinite covering a finite number of spheres which also cover X. Now each sphere from this finite subcovering contains at most one point of Y, and therefore Y is finite. But this is contrary to our original assumption, and we have arrived at a contradiction. Therefore, Y has at least one point of accumulation, and {X; ρ} is sequentially compact.

Conversely, assume that {X; ρ} is a sequentially compact metric space, and let {Y_α; α ∈ A} be an arbitrary infinite open covering of X. From Lemma 5.6.18 there exists an ε > 0 such that every sphere in X of radius ε is contained in at least one of the open sets Y_α. Now, by hypothesis, {X; ρ} is sequentially compact and is therefore totally bounded by part (iii) of Theorem 5.6.15. Thus, with arbitrary ε fixed we can find a finite ε-net, {x₁, x₂, ..., x_l}, such that X ⊂ ∪_{i=1}^{l} S(x_i; ε). Now in view of Lemma 5.6.18, S(x_i; ε) ⊂ Y_{α_i}, i = 1, ..., l, where the sets Y_{α_i} are from the family {Y_α; α ∈ A}. Hence,

X ⊂ ∪_{i=1}^{l} Y_{α_i},

and X has a finite open subcovering chosen from the infinite open covering {Y_α; α ∈ A}. Therefore, the metric space {X; ρ} is compact. This proves the theorem. ■
Recall that every non-void compact set in the real line R contains its
infimum and its supremum.
In general, it is not an easy task to apply the results of Theorem 5.6.24 to specific spaces in order to establish necessary and sufficient conditions for compactness. From the point of view of applications, criteria such as those established in Theorem 5.6.27 are much more desirable.
We now give a condition which tells us when a subset of a metric space is
compact. We have:
∪ [∪_{α∈B} V_α]. Since Y ⊂ ∪_{α∈B} V_α, Y = ∪_{α∈B} (Y ∩ V_α); i.e., {Y_α; α ∈ B} covers Y. This implies that Y is compact. ■
5.7. CONTINUOUS FUNCTIONS
5.7.1. Definition. Let {X; ρ_x} and {Y; ρ_y} be two metric spaces, and let f: X → Y be a mapping of X into Y. The mapping f is said to be continuous at the point x₀ ∈ X if for every ε > 0 there is a δ > 0 such that

ρ_y[f(x), f(x₀)] < ε

whenever ρ_x(x, x₀) < δ. The mapping f is said to be continuous on X or simply continuous if it is continuous at each point x ∈ X.
5.7.2. Example. Let {X; ρ_x} = {Rⁿ; ρ₂} and let {Y; ρ_y} = {Rᵐ; ρ₂} (see Example 5.3.1). Let A denote the real m × n matrix

A = [a_11 ... a_1n; ...; a_m1 ... a_mn].

We denote x ∈ Rⁿ and y ∈ Rᵐ by x = (ξ₁, ..., ξₙ)ᵀ and y = (η₁, ..., ηₘ)ᵀ, and we define f: Rⁿ → Rᵐ by y = f(x) = Ax. Now let M = [Σ_i Σ_j a_ij²]^{1/2} ≠ 0 (if M = 0 then we are done). Given any ε > 0 and choosing δ = ε/M, it follows that ρ_y(y, y₀) < ε whenever ρ_x(x, x₀) < δ, and any mapping f: Rⁿ → Rᵐ which is represented by a real, constant (m × n) matrix A is continuous on Rⁿ. ∎
5.7.3. Example. Let {X; ρ_x} = {Y; ρ_y} = {C[a, b]; ρ₂}, the metric space defined in Example 5.3.12, and let us define a function f: X → Y in the following manner. For x ∈ X, let y = f(x) be given by

y(t) = ∫_a^b k(t, s) x(s) ds,

where k: R² → R is continuous in the usual sense, i.e., with respect to the metric spaces R₂² and R¹. We now show that f is continuous on X. Let x, x₀ ∈ X and y, y₀ ∈ Y be such that y = f(x) and y₀ = f(x₀). Then

ρ_y(y, y₀) ≤ M ρ_x(x, x₀),

where M = [∫_a^b ∫_a^b k²(t, s) ds dt]^{1/2}. Hence, for any ε > 0, ρ_y(y, y₀) < ε whenever ρ_x(x, x₀) < δ, where δ = ε/M. ∎
5.7.4. Example. Consider the metric space {C[a, b]; ρ∞} defined in Example 5.3.14. Let C¹[a, b] be the subset of C[a, b] of all functions having continuous first derivatives (on (a, b)), and let {X; ρ_x} be the metric subspace {C¹[a, b]; ρ∞}. Let {Y; ρ_y} = {C[a, b]; ρ∞} and define the function f: X → Y as follows. For x ∈ X, y = f(x) is given by

y(t) = dx(t)/dt.

To show that f is not continuous, we show that for any δ > 0 there is a pair x, x₀ ∈ X such that ρ_x(x, x₀) < δ but ρ_y(f(x), f(x₀)) ≥ 1. Let x₀(t) = 0 for all t ∈ [a, b], and let x(t) = α sin ωt, α > 0, ω > 0. Then ρ(x₀, x) ≤ α. Now if y₀ = f(x₀) and y = f(x), then y₀(t) = 0 for all t ∈ [a, b] and y(t) = αω cos ωt. Hence, ρ(y₀, y) = αω, provided that ω is sufficiently large, i.e., so that cos ωt = ±1 for some t ∈ [a, b]. Now no matter what value of δ we choose, there is an x ∈ X such that ρ(x, x₀) < δ if we pick α < δ. However, ρ(y, y₀) = 1 if we let ω = 1/α. Therefore, f is not continuous on X. ∎
5.7.5. Theorem. Let {X; ρ_x} and {Y; ρ_y} be metric spaces, and let f: X → Y. Then f is continuous at a point x₀ ∈ X if and only if for every ε > 0, there exists a δ > 0 such that

f(S(x₀; δ)) ⊂ S(f(x₀); ε).
5.7.8. Theorem. Let {X; ρ_x} and {Y; ρ_y} be two metric spaces. A function f: X → Y is continuous at a point x₀ ∈ X if and only if for every sequence {xₙ} of points in X which converges to a point x₀ the corresponding sequence {f(xₙ)} converges to the point f(x₀) in Y; i.e.,

lim_n f(xₙ) = f(lim_n xₙ) = f(x₀).
tion, and f must be continuous at x₀. This concludes the proof of our theorem. ∎
5.7.9. Theorem. Let {X; ρ_x} and {Y; ρ_y} be two metric spaces, and let f be a mapping of X into Y. Then

(i) f is continuous on X if and only if the inverse image of each open subset of {Y; ρ_y} is open in {X; ρ_x}; and
(ii) f is continuous on X if and only if the inverse image of each closed subset of {Y; ρ_y} is closed in {X; ρ_x}.

Proof. Let f be continuous on X, and let V ≠ ∅ be an open subset of {Y; ρ_y}. Let U = f⁻¹(V). Clearly, U ≠ ∅. Now let x ∈ U. Then there exists a unique y = f(x) ∈ V. Since V is open, there is a sphere S(y; ε) which is entirely contained in V. Since f is continuous at x, there is a sphere S(x; δ) such that its image f(S(x; δ)) is entirely contained in S(y; ε) and therefore in V. But from this it follows that S(x; δ) ⊂ U. Hence, every x ∈ U is the center of a sphere which is contained in U. Therefore, U is open.
Conversely, assume that the inverse image of each non-empty open
5.7.10. Theorem. Let {X; ρ_x}, {Y; ρ_y}, and {Z; ρ_z} be metric spaces, let f be a mapping of X into Y, and let g be a mapping of Y into Z. If f is continuous on X and g is continuous on Y, then the composite mapping h = g ∘ f of X into Z is continuous on X.
5.7.12. Theorem. Let {X; ρ_x} and {Y; ρ_y} be two metric spaces, and let f: X → Y be continuous on X.

(i) If {X; ρ_x} is compact, then f(X) is a compact subset of {Y; ρ_y}.
(ii) If U is a compact subset of the metric space {X; ρ_x}, then f(U) is a compact subset of the metric space {Y; ρ_y}.
(iii) If {X; ρ_x} is compact and if U is a closed subset of X, then f(U) is a closed subset of {Y; ρ_y}.
(iv) If {X; ρ_x} is compact, then f is uniformly continuous on X.
Proof. To prove part (i) let {yₙ} be a sequence in f(X). Then there are points {xₙ} in X such that yₙ = f(xₙ). Since {X; ρ_x} is compact we can find a subsequence {x_{n_k}} of {xₙ} which converges to a point in X; i.e., x_{n_k} → x. In view of Theorem 5.7.8 we have, since f is continuous at x, f(x_{n_k}) → f(x) ∈ f(X). From this it follows that the sequence {yₙ} has a convergent subsequence and f(X) is compact.

To prove part (ii), let U be a compact subset of X. Then {U; ρ_x} is a compact metric space. In view of part (i) it now follows that f(U) is also a compact subset of the metric space {Y; ρ_y}.

To prove part (iii), we first observe that a closed subset U of a compact metric space {X; ρ_x} is itself compact and {U; ρ_x} is itself a compact metric space. In view of part (ii), f(U) is a compact subset of the metric space {Y; ρ_y} and as such is bounded and closed.

To prove part (iv), let ε > 0. For every x ∈ X, there is some positive number η(x) such that f(S(x; 2η(x))) ⊂ S(f(x); ε/2). Now the family {S(x; η(x)): x ∈ X} is an open covering of X. Since X is compact, there is a finite set, say F ⊂ X, such that {S(x; η(x)): x ∈ F} is a covering of X. Now let

δ = min {η(x): x ∈ F}.

Since F is a finite set, δ is some positive number. Now let x, y ∈ X be such that ρ(x, y) < δ. Choose z ∈ F such that x ∈ S(z; η(z)). Since δ ≤ η(z), y ∈ S(z; 2η(z)). Since f(S(z; 2η(z))) ⊂ S(f(z); ε/2), it follows that f(x) and f(y) are in S(f(z); ε/2). Hence, ρ_y(f(x), f(y)) < ε. Since δ does not depend on x ∈ X, f is uniformly continuous on X. This completes the proof of the theorem. ∎
5.7.13. Definition. Let {X; ρ_x} and {Y; ρ_y} be metric spaces, and let {fₙ} be a sequence of functions from X into Y. If {fₙ(x)} converges at each x ∈ X, then we say that {fₙ} is pointwise convergent. In this case we write lim_n fₙ = f, where f is defined for every x ∈ X.
whenever n > N(ε, x). In general, N(ε, x) is not necessarily bounded. However, if N(ε, x) is bounded for all x ∈ X, then we say that the sequence {fₙ} converges to f uniformly on X. Let M(ε) = sup_{x∈X} N(ε, x) < ∞. Equivalently, we say that the sequence {fₙ} converges uniformly to f on X if for every ε > 0 there is an M(ε) such that

ρ_y(fₙ(x), f(x)) < ε

for all x ∈ X whenever n > M(ε).
5.7.14. Theorem. Let {X; ρ_x} and {Y; ρ_y} be two metric spaces, and let {fₙ} be a sequence of functions from X into Y such that fₙ is continuous on X for each n. If the sequence {fₙ} converges uniformly to f on X, then f is continuous on X.

Proof. Assume that the sequence {fₙ} converges uniformly to f on X. Then for every ε > 0 there is an N such that ρ_y(fₙ(x), f(x)) < ε whenever n > N for all x ∈ X. If M > N is a fixed integer then f_M is continuous on X. Letting x₀ ∈ X be fixed, we can find a δ > 0 such that ρ_y(f_M(x), f_M(x₀)) < ε whenever ρ_x(x, x₀) < δ. Therefore, we have

ρ_y(f(x), f(x₀)) ≤ ρ_y(f(x), f_M(x)) + ρ_y(f_M(x), f_M(x₀)) + ρ_y(f_M(x₀), f(x₀)) < 3ε

whenever ρ_x(x, x₀) < δ.
The reader will recognize in the last result of the present section several
generalizations from the calculus to real-valued functions defined on metric
spaces.
5.7.15. Theorem. Let {X; ρ_x} be a metric space, and let {R; ρ} denote the real line R with the usual metric. Let f: X → R, and let U ⊂ X. If f is continuous on X and if U is a compact subset of {X; ρ_x}, then

(i) f is uniformly continuous on U;
(ii) f is bounded on U; and
(iii) if U ≠ ∅, f attains its infimum and supremum on U; i.e., there exist x₀, x₁ ∈ U such that f(x₀) = inf {f(x): x ∈ U} and f(x₁) = sup {f(x): x ∈ U}.
Proof. Part (i) follows from part (iv) of Theorem 5.7.12. Since U is a compact subset of X it follows that f(U) is a compact subset of R. Thus, f(U) is bounded and closed. From this it follows that f is bounded. To prove part (iii), note that if U is a non-empty compact subset of {X; ρ_x}, then f(U) is a non-empty compact subset of R. This implies that f attains its infimum and supremum on U. ∎
In this section we present two results which are used widely in applica-
tions. The first of these is called the fixed point principle while the second is
known as the Ascoli-Arzela theorem. Both of these results are widely utilized,
for example, in establishing existence and uniqueness of solutions of various
types of equations (ordinary differential equations, integral equations,
algebraic equations, functional differential equations, and the like).
We begin by considering a special class of continuous mappings on metric
spaces, so-called contraction mappings.
The following result is known as the fixed point principle or the principle
of contraction mappings.
The unique point x₀ satisfying Eq. (5.8.6) is called a fixed point of f. In this case we say that x₀ is obtained by the method of successive approximations.

Proof. We first show that if there is an x₀ ∈ X satisfying (5.8.6), then it must be unique. Suppose that x₀ and y₀ satisfy (5.8.6). Then by inequality (5.8.2), we have ρ(x₀, y₀) ≤ c ρ(x₀, y₀). Since 0 ≤ c < 1, it follows that ρ(x₀, y₀) = 0 and therefore x₀ = y₀.
Now let x₁ be any point in X. We want to show that the sequence {xₙ} generated by Eq. (5.8.7) is a Cauchy sequence. For any n > 1, we have ρ(x_{n+1}, xₙ) ≤ c ρ(xₙ, x_{n−1}). By induction we see that ρ(x_{n+1}, xₙ) ≤ c^{n−1} ρ(x₂, x₁) for n = 1, 2, .... Thus, for any m > n we have

ρ(x_m, xₙ) ≤ Σ_{k=n}^{m−1} ρ(x_{k+1}, x_k) ≤ c^{n−1} ρ(x₂, x₁)[1 + c + ⋯ + c^{m−1−n}] ≤ (c^{n−1}/(1 − c)) ρ(x₂, x₁).

Since 0 ≤ c < 1, the right-hand side of the above inequality can be made arbitrarily small by choosing n sufficiently large. Thus, {xₙ} is a Cauchy sequence.
Next, since {X; ρ} is complete, it follows that {xₙ} converges; i.e., lim_n xₙ exists. Let lim_n xₙ = x. Now since f is continuous on X, we have

lim_n f(xₙ) = f(lim_n xₙ).

But f(lim_n xₙ) = f(x) and lim_n f(xₙ) = lim_n x_{n+1} = x. Thus, f(x) = x and we have proven the existence of a fixed point of f. Since we have already proven uniqueness, the proof is complete. ∎
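The method of successive approximations is easy to carry out numerically. The following sketch (not part of the text) iterates f(x) = cos x, which maps [0, 1] into itself with |f′(x)| ≤ sin 1 < 1 there, and is therefore a contraction with a unique fixed point:

```python
import math

def fixed_point(f, x0, tol=1e-12, max_iter=1000):
    """Successive approximations x_{n+1} = f(x_n) for a contraction f."""
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("no convergence within max_iter steps")

p = fixed_point(math.cos, 0.0)
# the limit satisfies f(p) = p
assert abs(math.cos(p) - p) < 1e-10
```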
It may turn out that the composite function f^{(n)} ≜ f ∘ f ∘ ⋯ ∘ f is a contraction mapping, whereas f is not. The following result shows that such a mapping still has a unique fixed point.

5.8.8. Corollary. Let {X; ρ} be a complete metric space, and let f: X → X be continuous on X. If the composite function f^{(n)} = f ∘ f ∘ ⋯ ∘ f is a contraction mapping, then there is a unique point x₀ ∈ X such that

f(x₀) = x₀. (5.8.9)

Moreover, the fixed point can be determined by the method of successive approximations (see Theorem 5.8.5).
5.8.11. Definition. Let C[a, b] denote the set of all continuous real-valued functions defined on the interval [a, b] of the real line R. A subset Y of C[a, b] is said to be equicontinuous on [a, b] if for every ε > 0 there exists a δ > 0 such that |x(t) − x(t₀)| < ε for all x ∈ Y and all t, t₀ such that |t − t₀| < δ.

Note that in this definition δ depends only on ε and not on x or t and t₀. We now state and prove the Arzelà–Ascoli theorem.

5.8.12. Theorem. Let {C[a, b]; ρ∞} be the metric space defined in Example 5.3.14. Let Y be a bounded subset of C[a, b]. If Y is equicontinuous on [a, b], then Y is relatively compact in C[a, b].
Proof. For each positive integer k, let us divide the interval [a, b] into k equal parts by the set of points V_k = {t_0k, t_1k, ..., t_kk} ⊂ [a, b]. That is, a = t_0k < t_1k < ... < t_kk = b, where t_ik = a + (i/k)(b − a), i = 0, 1, ..., k, and [a, b] = ∪_{i=1}^{k} [t_{i−1,k}, t_ik] for all k = 1, 2, .... Since each V_k is a finite set, ∪_{k=1}^{∞} V_k is a countable set. For convenience of notation, let us denote this set by {τ₁, τ₂, ...}. The ordering of this set is immaterial. Next, since Y is bounded, there is a γ > 0 such that ρ∞(x, y) < γ for all x, y ∈ Y. Let x₀ be held fixed in Y, and let y ∈ Y be arbitrary. Let 0 ∈ C[a, b] be the function which is zero for all t ∈ [a, b]. Then ρ∞(y, 0) ≤ ρ∞(y, x₀) + ρ∞(x₀, 0). Hence, ρ∞(y, 0) < M for all y ∈ Y, where M = γ + ρ∞(x₀, 0). This implies that sup_{t∈[a,b]} |y(t)| < M for all y ∈ Y. Now, let {yₙ} be an arbitrary sequence in Y. We want to show that {yₙ} contains a convergent subsequence. Since |yₙ(τ₁)| < M for all n, the sequence of real numbers {yₙ(τ₁)} contains a convergent subsequence which we shall call {y_{1n}(τ₁)}. Again, since |y_{1n}(τ₂)| < M for all n, the sequence of real numbers {y_{1n}(τ₂)} contains a convergent subsequence which we shall call {y_{2n}(τ₂)}. We see that {y_{2n}(τ₁)} is a subsequence of {y_{1n}(τ₁)}, and hence it is convergent. Proceeding in a similar fashion, we obtain sequences {y_{1n}}, {y_{2n}}, ... such that {y_{kn}} is a subsequence of {y_{jn}} for all k > j. Furthermore, each sequence is such that lim_n y_{kn}(τᵢ) exists for each i such that 1 ≤ i ≤ k. Now let {xₙ} be the sequence {yₙₙ}. Then {xₙ} is a subsequence of {y_{kn}} and lim_n xₙ(τᵢ) exists for i = 1, 2, .... We now wish to show that {xₙ} is a Cauchy sequence in {C[a, b]; ρ∞}. Let ε > 0 be given. Since Y is equicontinuous on [a, b], we can find a positive integer k such that |xₙ(t) − xₙ(t′)| < ε/3 for every n whenever |t − t′| < 1/k. Since {xₙ(τᵢ)} is a convergent sequence of real numbers, there exists a positive integer N such that |xₙ(τᵢ) − x_m(τᵢ)| < ε/3 whenever m > N and n > N for all τᵢ ∈ V_k. Now, if t ∈ [a, b], there is some τᵢ ∈ V_k such that |t − τᵢ| < 1/k.
5.9. Equivalent and Homeomorphic Metric Spaces. Topological Spaces 317
This implies that ρ∞(x_m, xₙ) ≤ ε for all m, n > N. Therefore, {xₙ} is a Cauchy sequence in C[a, b]. Since {C[a, b]; ρ∞} is a complete metric space (see Example 5.5.28), {xₙ} converges to some point in C[a, b]. This implies that {yₙ} has a subsequence which converges to a point in C[a, b] and so, by Theorem 5.6.31, Y is relatively compact in C[a, b]. This completes the proof of the theorem. ∎
the same underlying set (e.g., the metric spaces {X; ρ_x} and {Y; ρ_y}, where X ≠ Y) may have many similar properties of the type mentioned above. We begin with equivalence of metric spaces defined on the same underlying set.

5.9.1. Definition. Let {X; ρ₁} and {X; ρ₂} be two metric spaces defined on the same underlying set X. Let 𝒯₁ and 𝒯₂ be the topology of X determined by ρ₁ and ρ₂, respectively. Then the metrics ρ₁ and ρ₂ are said to be equivalent metrics if 𝒯₁ = 𝒯₂.
5.9.2. Theorem. Let {X; ρ₁}, {X; ρ₂}, and {Y; ρ₃} be metric spaces. Then the following statements are equivalent:

(i) ρ₁ and ρ₂ are equivalent metrics;
(ii) for any mapping f: X → Y, f: {X; ρ₁} → {Y; ρ₃} is continuous on X if and only if f: {X; ρ₂} → {Y; ρ₃} is continuous on X;
(iii) the mapping i: {X; ρ₁} → {X; ρ₂} is continuous on X, and the mapping i⁻¹: {X; ρ₂} → {X; ρ₁} is continuous on X; and
(iv) for any sequence {xₙ} in X, {xₙ} converges to a point x in {X; ρ₁} if and only if {xₙ} converges to x in {X; ρ₂}.

Proof. To prove this theorem we show that statement (i) implies statement (ii); that statement (ii) implies statement (iii); that statement (iii) implies statement (iv); and that statement (iv) implies statement (i).

To show that (i) implies (ii), assume that ρ₁ and ρ₂ are equivalent metrics, and let f be any continuous mapping from {X; ρ₁} into {Y; ρ₃}. Let U be any open set in {Y; ρ₃}. Since f is continuous, f⁻¹(U) is an open set in {X; ρ₁}. Since ρ₁ and ρ₂ are equivalent metrics, f⁻¹(U) is also an open set in {X; ρ₂}. Hence, the mapping f: {X; ρ₂} → {Y; ρ₃} is continuous. The proof of the converse in statement (ii) is identical.
We now show that (ii) implies (iii). Clearly, the mapping i: {X; ρ₂} → {X; ρ₂} is continuous. Now assume the validity of statement (ii), and let {Y; ρ₃} = {X; ρ₂}. Then i: {X; ρ₁} → {X; ρ₂} is continuous. Again, it is clear that i⁻¹: {X; ρ₁} → {X; ρ₁} is continuous. Letting {Y; ρ₃} = {X; ρ₁} in statement (ii), it follows that i⁻¹: {X; ρ₂} → {X; ρ₁} is continuous.

Next, we show that (iii) implies (iv). Let i: {X; ρ₁} → {X; ρ₂} be continuous, and let the sequence {xₙ} in metric space {X; ρ₁} converge to x. By Theorem 5.7.8, lim i(xₙ) = i(x); i.e., lim xₙ = x in {X; ρ₂}. The converse is
5.9.3. Theorem. Let {X; ρ₁} and {X; ρ₂} be two metric spaces. If there exist two positive real numbers, λ and μ, such that

λ ρ₂(x, y) ≤ ρ₁(x, y) ≤ μ ρ₂(x, y)

for all x, y ∈ X, then ρ₁ and ρ₂ are equivalent metrics.
5.9.6. Theorem. Let {Rⁿ; ρ₁} = R₁ⁿ and {Rⁿ; ρ₂} = R₂ⁿ be the metric spaces defined in Example 5.3.1, and let {Rⁿ; ρ∞} be the metric space defined in Example 5.3.3. Then

(i) ρ∞(x, y) ≤ ρ₂(x, y) ≤ √n ρ∞(x, y) for all x, y ∈ Rⁿ;
(ii) ρ∞(x, y) ≤ ρ₁(x, y) ≤ n ρ∞(x, y) for all x, y ∈ Rⁿ; and
(iii) ρ₁, ρ₂, and ρ∞ are equivalent metrics.
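The inequalities in parts (i) and (ii) can be spot-checked numerically. The sketch below (illustrative only; the dimension and random points are arbitrary choices) verifies both chains of inequalities:

```python
import math
import random

def rho1(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def rho2(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def rho_inf(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

random.seed(0)
n = 5
for _ in range(100):
    x = [random.uniform(-10, 10) for _ in range(n)]
    y = [random.uniform(-10, 10) for _ in range(n)]
    # part (i): rho_inf <= rho2 <= sqrt(n) * rho_inf
    assert rho_inf(x, y) <= rho2(x, y) <= math.sqrt(n) * rho_inf(x, y) + 1e-9
    # part (ii): rho_inf <= rho1 <= n * rho_inf
    assert rho_inf(x, y) <= rho1(x, y) <= n * rho_inf(x, y) + 1e-9
```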
In Example 5.1.12, we defined a metric ρ*, called the usual metric for R*. Up until now, it has not been apparent that there is any meaningful connection between ρ* and the usual metric for R. The following result shows that when ρ* is restricted to R, it is equivalent to the usual metric on R.

5.9.8. Theorem. Let {R; ρ} denote the real line with the usual metric, and let {R*; ρ*} denote the extended real line (see Exercise 5.1.12). Consider {R; ρ*}, which is a metric subspace of {R*; ρ*}. Then

(i) for the metric spaces {R; ρ} and {R; ρ*}, ρ and ρ* are equivalent metrics;
(ii) if U ⊂ R, then U is open in {R; ρ} if and only if U is open in {R*; ρ*}; and
(iii) if U is open in {R*; ρ*}, then U ∩ R, U − {+∞}, and U − {−∞} are open in {R*; ρ*}.

5.9.9. Exercise. Prove Theorem 5.9.8. (Hint: Use part (iii) of Theorem 5.9.2 to prove part (i) of this theorem.)
Our next example shows that i⁻¹ need not be continuous, even though i is continuous.

5.9.10. Example. Let X be any non-empty set, and let ρ₁ be the discrete metric on X (see Example 5.1.7). In Exercise 5.4.26 the reader was asked to show that every subset of X is open in {X; ρ₁}. Now let {X; ρ} be an arbitrary metric space with the same underlying set X. Clearly, i: {X; ρ₁} → {X; ρ} is continuous. However, i⁻¹: {X; ρ} → {X; ρ₁} is not continuous unless every subset of {X; ρ} is open. Since this is usually not true, i⁻¹ need not be continuous. ∎
5.9.11. Definition. Two metric spaces {X; ρ_x} and {Y; ρ_y} are said to be homeomorphic if there exists a mapping φ: {X; ρ_x} → {Y; ρ_y} such that (i) φ is a bijective mapping of X onto Y, and (ii) E ⊂ X is open in {X; ρ_x} if and only if φ(E) is open in {Y; ρ_y}. The mapping φ is called a homeomorphism.

5.9.12. Theorem. Let {X; ρ_x}, {Y; ρ_y}, and {Z; ρ_z} be metric spaces, and let φ be a bijective mapping of {X; ρ_x} onto {Y; ρ_y}. Then the following statements are equivalent.

(i) φ is a homeomorphism;
(ii) for any mapping f: X → Z, f: {X; ρ_x} → {Z; ρ_z} is continuous on X if and only if f ∘ φ⁻¹: {Y; ρ_y} → {Z; ρ_z} is continuous on Y;
(iii) φ: {X; ρ_x} → {Y; ρ_y} is continuous and φ⁻¹: {Y; ρ_y} → {X; ρ_x} is continuous; and
(iv) for any sequence {xₙ} in X, {xₙ} converges to a point x in {X; ρ_x} if and only if {φ(xₙ)} converges to φ(x) in {Y; ρ_y}.
5.9.14. Theorem. Let {X; ρ₁} and {X; ρ₂} be two metric spaces with the same underlying set X. Then ρ₁ and ρ₂ are equivalent if and only if the identity mapping i: {X; ρ₁} → {X; ρ₂} is a homeomorphism.

5.9.16. Definition. Let {X; ρ_x} and {Y; ρ_y} be two metric spaces, and let φ: {X; ρ_x} → {Y; ρ_y} be a bijective mapping of X onto Y. The mapping φ is said to be an isometry if

ρ_x(x, y) = ρ_y(φ(x), φ(y))

for all x, y ∈ X. If such an isometry exists, then the metric spaces {X; ρ_x} and {Y; ρ_y} are said to be isometric.
chapter, we note that a great deal of the development of metric spaces is not a consequence of the metric but, rather, depends only on the properties of certain open and closed sets. Taking the notion of open set as basic (instead of the concept of distance, as in the case of metric spaces) and taking the aforementioned properties of open sets as postulates, we can form a mathematical structure which is much more general than the metric space.

The family 𝒯 is called the topology for the set X. The complement of an open set U ∈ 𝒯 with respect to X is called a closed set.

5.9.23. Example. Let X = {x, y}, and let the open sets in X be the void set ∅, the set X itself, and the set {x}. If 𝒯 is defined in this way, then {X; 𝒯} is a topological space. In this case the closed sets are ∅, X, and {y}. ∎
5.10. APPLICATIONS
[Figure: the lines y = x and y = f(x) on [a, b], illustrating the method of successive approximations x₁, x₂, x₃, ....]
ξᵢ = Σ_{j=1}^{n} a_ij ξ_j + βᵢ, i = 1, ..., n, (5.10.7)
where in the preceding the Hölder inequality for finite sums was used (see Theorem 5.2.1). Clearly, f is a contraction if the inequality

Σ_{i=1}^{n} |a_ij| ≤ k < 1 (5.10.8)

holds. Thus, Eq. (5.10.7) possesses a unique solution if (5.10.8) holds for all j.
Next, we consider the complete space {Rⁿ; ρ∞} = R∞ⁿ. We have

ρ∞(f(x), f(y)) = max_i |Σ_{j=1}^{n} a_ij(ξ_j − η_j)| ≤ (max_i Σ_{j=1}^{n} |a_ij|) ρ∞(x, y).

Thus, f is a contraction if

max_i Σ_{j=1}^{n} |a_ij| ≤ k < 1. (5.10.10)

In this case the successive approximations are given by

ξᵢ^{(k)} = Σ_{j=1}^{n} a_ij ξ_j^{(k−1)} + βᵢ, k = 1, 2, ..., (5.10.11)

for all i = 1, ..., n, with starting point x^{(0)} = (ξ₁^{(0)}, ..., ξₙ^{(0)}). ∎
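The iteration (5.10.11) is straightforward to program. In the sketch below (the matrix and vector are arbitrary, hypothetical data), the rows satisfy max_i Σ_j |a_ij| < 1, a sufficient condition for the iteration map to be a contraction in the ρ∞ metric:

```python
def successive_approx(A, b, x0, iters=200):
    """Iterate x^(k) = A x^(k-1) + b, the scheme of Eq. (5.10.11)."""
    x = x0[:]
    for _ in range(iters):
        x = [sum(a * xj for a, xj in zip(row, x)) + bi
             for row, bi in zip(A, b)]
    return x

A = [[0.2, 0.3], [0.1, 0.4]]   # arbitrary 2 x 2 example data
b = [1.0, -1.0]

# a sufficient contraction condition: max_i sum_j |a_ij| < 1
assert max(sum(abs(a) for a in row) for row in A) < 1

x = successive_approx(A, b, [0.0, 0.0])
# the limit satisfies x = A x + b
resid = max(abs(sum(a * xj for a, xj in zip(row, x)) + bi - xi)
            for row, bi, xi in zip(A, b, x))
assert resid < 1e-10
```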
Clearly y ∈ C[a, b]. We thus have f: C[a, b] → C[a, b]. Moreover, since K is continuous on the square a ≤ s ≤ b, a ≤ t ≤ b, there is an M such that |K(s, t)| ≤ M. Let y₁ = f(x₁), and let y₂ = f(x₂). As in the preceding example, we have

ρ∞(f(x₁), f(x₂)) = ρ∞(y₁, y₂) ≤ |λ| M(b − a) ρ∞(x₁, x₂).

Now let f^{(n)} denote the composite mapping f ∘ f ∘ ⋯ ∘ f, and let f^{(n)}(x₁), f^{(n)}(x₂) be as in (5.10.18). Hence, we have

ρ∞(f^{(n)}(x₁), f^{(n)}(x₂)) ≤ k ρ∞(x₁, x₂), 0 ≤ k < 1.

Therefore, the composite mapping f^{(n)} is a contraction mapping. It follows from Corollary 5.8.8 that Eq. (5.10.17) possesses a unique continuous solution for arbitrary λ. This solution can be determined by the method of successive approximations. ∎
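The successive approximations for an integral equation of this type can be sketched numerically. In the following illustration (the kernel K(s, t) = st, the function v(s) = s, λ = 1/2, and the midpoint-rule quadrature are all hypothetical choices, not from the text), the exact solution is x(s) = cs with c = 1/(1 − λ/3):

```python
def fredholm_iterate(K, v, lam, a, b, n=200, iters=100):
    """Successive approximations for x(s) = lam * int_a^b K(s,t) x(t) dt + v(s),
    discretized with the midpoint rule on n subintervals."""
    h = (b - a) / n
    ts = [a + (i + 0.5) * h for i in range(n)]
    x = [v(s) for s in ts]                 # starting point x^(0) = v
    for _ in range(iters):
        x = [v(s) + lam * h * sum(K(s, t) * xt for t, xt in zip(ts, x))
             for s in ts]
    return ts, x

ts, x = fredholm_iterate(lambda s, t: s * t, lambda s: s, 0.5, 0.0, 1.0)
c = 1.0 / (1.0 - 0.5 / 3.0)
err = max(abs(xi - c * s) for s, xi in zip(ts, x))
assert err < 1e-3
```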
ẋ = f(t, x), x(τ) = ξ. (5.10.21)

Consider now the complete metric space {C[τ, T]; ρ∞}, and let

(5.10.23)

(1/n!) kⁿ(T − τ)ⁿ < 1. Therefore, f^{(n)} is a contraction. It now follows from Corollary 5.8.8 that Eq. (5.10.21) possesses a unique solution for [τ, T]. Furthermore, this solution can be obtained by the method of successive approximations. ∎
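Picard's successive approximations for the initial-value problem can be sketched numerically. The following illustration (with trapezoidal quadrature on a uniform grid, an implementation detail not in the text) applies the scheme to ẋ = x, x(0) = 1, whose solution is eᵗ:

```python
import math

def picard(f, tau, xi, T, n_grid=200, iters=30):
    """Picard iterates phi_{k+1}(t) = xi + int_tau^t f(s, phi_k(s)) ds,
    computed on a uniform grid with the trapezoidal rule."""
    h = (T - tau) / n_grid
    ts = [tau + i * h for i in range(n_grid + 1)]
    phi = [xi] * (n_grid + 1)               # phi_0(t) = xi
    for _ in range(iters):
        vals = [f(t, p) for t, p in zip(ts, phi)]
        new, acc = [xi], 0.0
        for i in range(n_grid):
            acc += 0.5 * h * (vals[i] + vals[i + 1])
            new.append(xi + acc)
        phi = new
    return ts, phi

ts, phi = picard(lambda t, x: x, 0.0, 1.0, 1.0)
assert abs(phi[-1] - math.e) < 1e-3   # phi(1) should approximate e
```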
We now prove:
"' -r-' - ~ - I
," I
I
(1", l~ I
I
I
I
L . .- - J=- ~
I......-"""-- _ _ __ _ L. _ _ '_ t
........."- ' - "~'- ' :_ ' t-
~
(1" - I , ~l
(1" + a,~)
Proof. Let M = max_{(t,x)∈D₀} |f(t, x)|, and let δ = min (a, b/M). Note that δ = a if a < b/M and δ = b/M if a > b/M (refer to the accompanying figure). We will show that an ε-approximate solution exists on the interval [τ, τ + δ]. The proof is similar for the interval [τ − δ, τ]. In our proof we will construct an ε-approximate solution starting at (τ, ξ), consisting of a finite number of straight line segments joined end to end (see the accompanying figure).

Since f is continuous on the compact set D₀, it is uniformly continuous on D₀ (see Theorem 5.7.12). Hence, given ε > 0, there exists δ_ε = δ_ε(ε) > 0 such that |f(t, x) − f(t′, x′)| < ε whenever (t, x), (t′, x′) ∈ D₀, |t − t′| < δ_ε and |x − x′| < δ_ε. Now let τ = t₀ and τ + δ = tₙ. We divide the half-open interval (t₀, tₙ] into n half-open subintervals (t₀, t₁], (t₁, t₂], ..., (t_{n−1}, tₙ] in such a fashion that

(5.10.30)
or, equivalently, that φ satisfies the integral equation

φ(t) = ξ + ∫_τ^t f(s, φ(s)) ds,

except at the points where φ_m is not differentiable. Then φ_m can be expressed in integral form as

φ_m(t) = ξ + ∫_τ^t [f(s, φ_m(s)) + Δ_m(s)] ds

on [τ, τ + α]. Using Eq. (5.10.36) we now have

|∫_τ^t [f(s, φ_m(s)) − f(s, φ(s)) + Δ_m(s)] ds| ≤ |∫_τ^t |f(s, φ_m(s)) − f(s, φ(s))| ds| + |∫_τ^t |Δ_m(s)| ds| < α(ε_m + ε).

Using Theorem 5.10.33, the reader can readily prove the next result.
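The polygonal construction used in the preceding proof is, in effect, Euler's method. A numerical sketch (illustrative only; the step size and the test equation ẋ = x are arbitrary choices):

```python
import math

def euler_polygon(f, tau, xi, delta, n):
    """A polygonal approximate solution on [tau, tau + delta]: straight-line
    segments whose slopes are f evaluated at the left endpoint of each piece."""
    h = delta / n
    ts, xs = [tau], [xi]
    for _ in range(n):
        xs.append(xs[-1] + h * f(ts[-1], xs[-1]))
        ts.append(ts[-1] + h)
    return ts, xs

# x' = x, x(0) = 1: as the mesh is refined, the polygon approaches e^t.
ts, xs = euler_polygon(lambda t, x: x, 0.0, 1.0, 1.0, 1000)
assert abs(xs[-1] - math.e) < 0.01
```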
Proof. Let R(t) = δ + ∫_a^t k(s)r(s) ds. Then r(t) ≤ R(t), R(a) = δ, Ṙ(t) = k(t)r(t) ≤ k(t)R(t), and

Ṙ(t) − k(t)R(t) ≤ 0 (5.10.42)

for all t ∈ [a, b). Let K(t) = e^{−∫_a^t k(s) ds}. Then

K̇(t) = −k(t) e^{−∫_a^t k(s) ds} = −K(t)k(t).

Multiplying both sides of (5.10.42) by K(t) we have

K(t)Ṙ(t) − K(t)k(t)R(t) ≤ 0

or

K(t)Ṙ(t) + K̇(t)R(t) ≤ 0

or

d/dt [K(t)R(t)] ≤ 0.

Integrating this last expression from a to t we obtain

K(t)R(t) − K(a)R(a) ≤ 0

or

K(t)R(t) − δ ≤ 0

or

R(t) ≤ δ e^{∫_a^t k(s) ds},

so that

r(t) ≤ R(t) ≤ δ e^{∫_a^t k(s) ds}.
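The bound just derived can be checked on a concrete case. In the following sketch (with the hypothetical choices k(s) ≡ 1, δ = 1/2, and r(t) = (1/2)eᵗ, for which the hypothesis holds with equality), the conclusion r(t) ≤ δ e^{∫_a^t k(s) ds} is verified on a grid:

```python
import math

a, delta = 0.0, 0.5
k = lambda s: 1.0                  # k(s) = 1 on [0, 2]
r = lambda t: 0.5 * math.exp(t)    # satisfies r(t) = delta + int_a^t k(s) r(s) ds

for i in range(21):
    t = a + 0.1 * i
    bound = delta * math.exp(t - a)    # delta * exp(int_a^t k(s) ds)
    assert r(t) <= bound + 1e-12
```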
In our next result we will require that the function f in Eq. (5.10.25) satisfy a Lipschitz condition

|f(t, x′) − f(t, x″)| ≤ k|x′ − x″|

for all (t, x′), (t, x″) ∈ D.

of Eq. (5.10.25) on an interval (a, b), if τ ∈ (a, b), and if φ₁(τ) = φ₂(τ) = ξ, then φ₁ = φ₂.
Proof. By Corollary 5.10.37, at least one solution exists on some interval (a, b), τ ∈ (a, b). Now suppose there is more than one solution, say φ₁ and φ₂, to the initial-value problem (5.10.26). Then

φ₁(t) − φ₂(t) = ∫_τ^t [f(s, φ₁(s)) − f(s, φ₂(s))] ds.

Let r(t) = |φ₁(t) − φ₂(t)|, and let k > 0 denote the Lipschitz constant for f. In the following we consider the case when t ≥ τ, and we leave the details of the proof for t < τ as an exercise. We have

r(t) = |∫_τ^t [f(s, φ₁(s)) − f(s, φ₂(s))] ds| ≤ ∫_τ^t k r(s) ds;

i.e.,

r(t) ≤ ∫_τ^t k r(s) ds

for all t ∈ [τ, b). The conditions of Theorem 5.10.39 are clearly satisfied and we have: if r(t) ≤ δ + ∫_τ^t k r(s) ds, then r(t) ≤ δ e^{∫_τ^t k ds}. Since in the present case δ = 0, it follows that

r(t) = 0 for all t ∈ [τ, b).

Therefore, |φ₁(t) − φ₂(t)| = 0 for all t ∈ [τ, b), and φ₁(t) = φ₂(t) for all t in this interval. ∎
Since

φ(b⁻) = ξ + ∫_τ^b f(s, φ(s)) ds

and since

φ̂(t) = φ(b⁻) + ∫_b^t f(s, φ̂(s)) ds,

we have

φ̂(t) = ξ + ∫_τ^t f(s, φ̂(s)) ds

for all t ∈ (a, b + β]. The continuity of φ̂ in the last equation implies the
where the a_ij(t), i, j = 1, ..., n, are assumed to be real and continuous functions defined on the interval [a, b]. We first show that f(t, x) = (f₁(t, x), ..., fₙ(t, x))ᵀ satisfies a Lipschitz condition on D,

|f(t, x′) − f(t, x″)| ≤ k|x′ − x″|

for all (t, x′), (t, x″) ∈ D, where x′ = (x₁′, ..., xₙ′)ᵀ, x″ = (x₁″, ..., xₙ″)ᵀ, and k = max_{a≤t≤b} max_i Σ_{j=1}^{n} |a_ij(t)|. Indeed, we have

|f(t, x′) − f(t, x″)| = max_i |Σ_{j=1}^{n} a_ij(t)(x_j′ − x_j″)| ≤ k Σ_{j=1}^{n} |x_j′ − x_j″| = k|x′ − x″|.
5.10.52. Lemma. In Eq. (5.10.48), let f(t, x) = (f₁(t, x), ..., fₙ(t, x))ᵀ be continuous on a domain D ⊂ R^{n+1}, and let f(t, x) satisfy a Lipschitz condition on D with respect to x, with Lipschitz constant k. If φ₁ and φ₂ are unique solutions of the initial-value problem (5.10.49), with φ₁(τ) = ξ₁, φ₂(τ) = ξ₂ and with (τ, ξ₁), (τ, ξ₂) ∈ D, then

|φ₁(t) − φ₂(t)| ≤ |ξ₁ − ξ₂| e^{k|t−τ|} (5.10.53)

for all (t, φ₁(t)), (t, φ₂(t)) ∈ D.

Proof. We assume that t ≥ τ, and we leave the details of the proof for t < τ as an exercise. We have

φ₁(t) = ξ₁ + ∫_τ^t f(s, φ₁(s)) ds

and

φ₂(t) = ξ₂ + ∫_τ^t f(s, φ₂(s)) ds.
ẋᵢ = Σ_{j=1}^{n} a_ij(t) x_j ≜ fᵢ(t, x), i = 1, ..., n,
xᵢ(τ) = ξᵢ, i = 1, ..., n, (5.10.56)

with (τ, ξ₁, ..., ξₙ) ∈ D. This solution can be extended to the entire interval [a, b].
Proof. Since the vector f(t, x) = (f₁(t, x), ..., fₙ(t, x))ᵀ is continuous on D, since f(t, x) satisfies a Lipschitz condition with respect to x on D, and since (τ, ξ) ∈ D (where ξ = (ξ₁, ..., ξₙ)ᵀ), it follows from Theorem 5.10.43 (interpreted for systems of first-order ordinary differential equations) that the initial-value problem (5.10.56) has a unique solution φ through the point (τ, ξ) over some interval [c, d] ⊂ [a, b]. We must show that φ can be continued to a unique solution over the entire interval [a, b].
5.10.59. Exercise. Let D ⊂ R^{n+1} be given by Eq. (5.10.50), and let the real functions a_ij(t), vᵢ(t), i, j = 1, ..., n, be continuous on the t interval [a, b]. Show that there exists a unique solution to the initial-value problem

It is possible to relax the conditions on vⱼ(t), j = 1, ..., n, in the above exercise considerably. For example, it can be shown that if vⱼ(t) is piecewise continuous on [a, b], then the assertions of Exercise 5.10.59 still hold.
We now address ourselves to the last item of the present section. Consider the initial-value problem (5.10.49) which we characterized in Definition 4.11.9. Assume that f(t, x) satisfies a Lipschitz condition on a domain D ⊂ R^{n+1} and that (τ, ξ) ∈ D. Then the initial-value problem possesses a unique solution over some t interval containing τ. To indicate the dependence of the solution on the initial data, we write it as φ(t; τ, ξ), where φ(τ; τ, ξ) = ξ. We now ask: What are the effects of different initial conditions on the solution of Eq. (5.10.48)? Our next result provides the answer.
Proof. We have

φ(t; τ, ξₙ) = ξₙ + ∫_τ^t f[s, φ(s; τ, ξₙ)] ds

and

φ(t; τ, ξ) = ξ + ∫_τ^t f[s, φ(s; τ, ξ)] ds.

It follows for t ≥ τ (the proof for t < τ is left as an exercise) that

|φ(t; τ, ξₙ) − φ(t; τ, ξ)| ≤ |ξₙ − ξ| + ∫_τ^t |f[s, φ(s; τ, ξₙ)] − f[s, φ(s; τ, ξ)]| ds ≤ |ξₙ − ξ| + ∫_τ^t k |φ(s; τ, ξₙ) − φ(s; τ, ξ)| ds,

and hence, by Theorem 5.10.39,

|φ(t; τ, ξₙ) − φ(t; τ, ξ)| ≤ |ξₙ − ξ| e^{k(t−τ)}.

Thus if ξₙ → ξ, then φ(t; τ, ξₙ) → φ(t; τ, ξ). ∎

It follows from the proof of the above theorem that the convergence is uniform with respect to t on any interval [a, b] on which the solutions are defined.
ẋ = 2x, x(τ) = ξ, (5.10.63)

where −∞ < τ < ∞, −∞ < ξ < ∞, has the unique solution

φ(t; τ, ξ) = ξ e^{2(t−τ)}, −∞ < t < ∞,

which depends continuously on the initial value ξ. ∎
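For this example the estimate of Lemma 5.10.52 can be verified directly; in the sketch below (illustrative only; here the Lipschitz constant is k = 2, and the bound holds with equality), two solutions with nearby initial values are compared:

```python
import math

k, tau = 2.0, 0.0
phi = lambda t, xi: xi * math.exp(2.0 * (t - tau))   # phi(t; tau, xi)

xi1, xi2 = 1.0, 1.001
for i in range(11):
    t = 0.1 * i
    gap = abs(phi(t, xi1) - phi(t, xi2))
    # the estimate |phi1(t) - phi2(t)| <= |xi1 - xi2| * e^{k|t - tau|}
    assert gap <= abs(xi1 - xi2) * math.exp(k * abs(t - tau)) + 1e-12
```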
5.11. References and Notes 341
Thus far, in the present section, we have concerned ourselves with problems characterized by real ordinary differential equations. It is an easy matter to verify that all the existence, uniqueness, continuation, and dependence (on initial conditions) results proved in the present section are also valid for initial-value problems described by complex ordinary differential equations such as those given, e.g., in Eq. (4.11.25). In this case, the norm of a complex vector z = (z₁, ..., zₙ)ᵀ, z_k = u_k + i v_k, k = 1, ..., n, is given by

|z| = [Σ_{k=1}^{n} |z_k|²]^{1/2},

where |z_k| = (u_k² + v_k²)^{1/2}. The metric on Cⁿ is in this case given by ρ(z₁, z₂) = |z₁ − z₂|.
There are numerous excellent texts on metric spaces. Books which are especially readable include Copson [5.2], Gleason [5.3], Goldstein and Rosenbaum [5.4], Kantorovich and Akilov [5.5], Kolmogorov and Fomin [5.7], Naylor and Sell [5.8], and Royden [5.9]. Reference [5.8] includes some applications. The book by Kelley [5.6] is a standard reference on topology. An excellent reference on ordinary differential equations is the book by Coddington and Levinson [5.1].
REFERENCES

[5.1] E. A. CODDINGTON and N. LEVINSON, Theory of Ordinary Differential Equations. New York: McGraw-Hill Book Company, Inc., 1955.
[5.2] E. T. COPSON, Metric Spaces. Cambridge, England: Cambridge University Press, 1968.
[5.3] A. M. GLEASON, Fundamentals of Abstract Analysis. Reading, Mass.: Addison-Wesley Publishing Co., Inc., 1966.
[5.4] M. E. GOLDSTEIN and B. M. ROSENBAUM, "Introduction to Abstract Analysis," National Aeronautics and Space Administration, Report No. SP-203, Washington, D.C., 1969.
[5.5] L. V. KANTOROVICH and G. P. AKILOV, Functional Analysis in Normed Spaces. New York: The Macmillan Company, 1964.
[5.6] J. KELLEY, General Topology. Princeton, N.J.: D. Van Nostrand Company, Inc., 1955.
[5.7] A. N. KOLMOGOROV and S. V. FOMIN, Elements of the Theory of Functions and Functional Analysis. Vol. I. Albany, N.Y.: Graylock Press, 1957.
344 Chapter 6 / Normed Spaces and Inner Product Spaces
spaces are special cases of Banach spaces; Banach spaces are special kinds of normed linear spaces; and Hilbert spaces are special types of inner product spaces.) In section 15, we consider two applications. This chapter is concluded with a brief discussion of pertinent references in the last section.

Throughout this chapter, R denotes the field of real numbers, C denotes the field of complex numbers, F denotes either R or C, and X denotes a vector space over F.

Different norms defined on the same linear space X yield different normed linear spaces. If in a given discussion it is clear which particular norm is being used, we simply write X in place of {X; ‖·‖} to denote the normed linear space under consideration. Properties (iii) and (iv) in Definition 6.1.1 are called the homogeneity property and the triangle inequality of a norm, respectively.
Let { X ; II • II} be a normed linear space and let ,x E ,X i = I, ... ,n.
Repeated use of the triangle inequality yields
II X I + ... + .x 1I ~ I\ x I " + ... + IIx.lI·
The following result shows that every normed linear space has a metric
associated with it, induced by the norm ||·||. Therefore, every normed
linear space is also a metric space.
This theorem tells us that all of the results in the previous chapter on metric
spaces apply to normed linear spaces as well, provided we let ρ(x, y) = ||x − y||.
We will adopt the convention that when using the terminology of metric spaces
(e.g., completeness, compactness, convergence, continuity, etc.) in a normed
linear space {X; ||·||}, we mean with respect to the metric space {X; ρ},
where ρ(x, y) = ||x − y||. Also, whenever we use metric space properties on F,
i.e., on R or C, we mean with respect to the usual metric on R or C, respectively.
With the foregoing in mind, we now introduce the following important
concept.
6.1.5. Example. Let X = R^n, the space of n-tuples of real numbers, or let
X = C^n, the space of n-tuples of complex numbers. From Example 3.1.10 we
see that X is a vector space. For x ∈ X given by x = (ξ_1, ..., ξ_n), and for
p ∈ R such that 1 ≤ p < ∞, define
||x||_p = [|ξ_1|^p + ... + |ξ_n|^p]^{1/p}.
We can readily verify that ||·||_p satisfies the axioms of a norm. Axioms
(i), (ii), (iii) of Definition 6.1.1 follow trivially, while axiom (iv) is a direct
consequence of Minkowski's inequality for finite sums (5.2.6). Letting
ρ_p(x, y) = ||x − y||_p, then {X; ρ_p} is the metric space of Exercise 5.5.25.
Since {X; ρ_p} is complete, it follows that {R^n; ||·||_p} and {C^n; ||·||_p} are
Banach spaces.
We may also define a norm on X by letting
||x||_∞ = max_{1≤i≤n} |ξ_i|.
It can readily be verified that {R^n; ||·||_∞} and {C^n; ||·||_∞} are also Banach
spaces (see Exercise 5.5.25). ∎
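The norms of Example 6.1.5 can be checked numerically. The following sketch (function names are ours, not the book's) computes ||·||_p and ||·||_∞ on R^n and spot-checks the triangle inequality, axiom (iv):

```python
# A numerical sketch of Example 6.1.5: the p-norms and the sup-norm on R^n.
def p_norm(x, p):
    # ||x||_p = (|xi_1|^p + ... + |xi_n|^p)^(1/p), 1 <= p < infinity
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def sup_norm(x):
    # ||x||_inf = max_i |xi_i|
    return max(abs(t) for t in x)

x, y = [3.0, -4.0], [1.0, 2.0]
print(p_norm(x, 2), sup_norm(x))   # 5.0 4.0

# Spot check of the triangle inequality (axiom (iv)) for several p
s = [a + b for a, b in zip(x, y)]
for p in (1, 1.5, 2, 3):
    assert p_norm(s, p) <= p_norm(x, p) + p_norm(y, p) + 1e-12
```

Of course, finitely many checks are no proof; the inequality itself rests on Minkowski's inequality, as the text notes.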
6.1.6. Example. Let X = R^∞ (see Example 3.1.11) or X = C^∞ (see Example 3.1.13), let 1 ≤ p < ∞, and as in Example 5.3.5, let
l_p = {x ∈ X : Σ_{i=1}^∞ |ξ_i|^p < ∞}.
Define
||x||_p = [Σ_{i=1}^∞ |ξ_i|^p]^{1/p}.
It is readily verified that ||·||_p is a norm on the linear space l_p. Axioms
(i), (ii), (iii) of Definition 6.1.1 follow trivially, while axiom (iv), the triangle
inequality, follows from Minkowski's inequality for infinite sums.
6.1.9. Example
(a) Let C[a, b] denote the linear space of real continuous functions on
the interval [a, b], as given in Example 3.1.19. For x ∈ C[a, b] define
||x||_p = [∫_a^b |x(t)|^p dt]^{1/p}, 1 ≤ p < ∞.
It is easily shown that {C[a, b]; ||·||_p} is a normed linear space. Axioms
(i)–(iii) of Definition 6.1.1 follow trivially, while axiom (iv) follows from the
Minkowski inequality for integrals (5.2.8). Let ρ_p(x, y) = ||x − y||_p. Then
{C[a, b]; ρ_p} is a metric space which is not complete (see Example 5.5.29,
where we considered the special case p = 2). It follows that {C[a, b]; ||·||_p}
is not a Banach space.
Next, define on the linear space C[a, b] the function ||·||_∞ by
||x||_∞ = sup_{t∈[a,b]} |x(t)|.
It is readily shown that {C[a, b]; ||·||_∞} is a normed linear space. Let ρ_∞(x, y)
= ||x − y||_∞. In accordance with Example 5.5.28, {C[a, b]; ρ_∞} is a complete
metric space, and thus {C[a, b]; ||·||_∞} is a Banach space.
The above discussion can be modified in an obvious way for the case
where C[a, b] consists of complex-valued continuous functions defined on
[a, b]. Here vector addition and multiplication of vectors by scalars are defined
similarly as in Eqs. (3.1.20) and (3.1.21), respectively. Furthermore, it is
easy to show that {C[a, b]; ||·||_p}, 1 ≤ p < ∞, and {C[a, b]; ||·||_∞} are
normed linear spaces with norms defined similarly as above. Once more, the
space {C[a, b]; ||·||_p}, 1 ≤ p < ∞, is not a Banach space, while the space
{C[a, b]; ||·||_∞} is.
(b) The metric space {L_p[a, b]; ρ_p} was defined in Example 5.5.31. It
can be shown that L_p[a, b] is a vector space over R. If we let
||f||_p = [∫_{[a,b]} |f|^p dμ]^{1/p},
p ≥ 1, for f ∈ L_p[a, b], where the integral is the Lebesgue integral, then
{L_p[a, b]; ||·||_p} is a Banach space since {L_p[a, b]; ρ_p} is complete, where
ρ_p(x, y) = ||x − y||_p. ∎
6.1.10. Example. Let {X; ||·||_x}, {Y; ||·||_y} be two normed linear spaces
over F, and let X × Y denote the Cartesian product of X and Y. Defining
vector addition on X × Y by
(x_1, y_1) + (x_2, y_2) = (x_1 + x_2, y_1 + y_2)
and multiplication of vectors by scalars as
α(x, y) = (αx, αy),
we can readily show that X × Y is a linear space (see Eqs. (3.2.14), (3.2.15)
and the related discussion). This space can be used to generate a normed
linear space {X × Y; ||·||} by defining the norm ||·|| as
||(x, y)|| = ||x||_x + ||y||_y.
Furthermore, if {X; ||·||_x} and {Y; ||·||_y} are Banach spaces, then it is
easily shown that {X × Y; ||·||} is also a Banach space. ∎
Thus, in a normed linear space we may call S̄(x_0; r) the closed sphere
given by Eq. (6.1.14).
When regarded as a function from X into R, a norm has the following
important property.
In this chapter we will not always require that a particular normed linear
space be a Banach space. Nonetheless, many important results of analysis
require the completeness property. This is also true in applications. For
example, in the solution of various types of equations (such as non-linear
differential equations, integral equations, etc.), in optimization problems,
in non-linear feedback problems, and in approximation theory, as well as in
many other areas of application, we frequently obtain our desired solution
in the form of a sequence generated by means of some iterative scheme. In
such a sequence, each succeeding member is closer to the desired solution
than its predecessor. Now even though the precise solution to which a
sequence of this type may converge is unknown, it is usually imperative that
the sequence converge to an element in that space which happens to be the
setting of the particular problem in question.
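A minimal illustration of such an iterative scheme (our own example, not one from the text): successive approximations x_{n+1} = cos(x_n) in R. The iterates form a Cauchy sequence, and it is the completeness of R that guarantees the limit exists in the space and solves x = cos(x).

```python
import math

# Successive approximations x_{n+1} = cos(x_n) in the complete space R.
x = 1.0
for _ in range(100):
    x = math.cos(x)

print(x)                             # ~ 0.739085, the fixed point
assert abs(x - math.cos(x)) < 1e-10  # x (numerically) solves x = cos(x)
```

In an incomplete space, the same kind of iteration can be Cauchy and yet have no limit in the space; this is exactly the failure the text warns about.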
6.2.2. Example. Let X be the Banach space l_1 of Example 6.1.6, and let
Y be the space of finitely non-zero sequences given in Example 3.1.14. It is
easily shown that Y is a linear subspace of X. To show that Y is not closed,
consider the sequence {y_n} in Y defined by
y_1 = (1, 0, 0, ...),
y_2 = (1, 1/2, 0, 0, ...),
y_3 = (1, 1/2, 1/4, 0, 0, ...),
.........................
y_n = (1, 1/2, ..., 1/2^{n−1}, 0, 0, ...).
This sequence converges to the point x = (1, 1/2, ..., 1/2^{n−1}, 1/2^n, ...) ∈ X.
Since x ∉ Y, it follows from part (iii) of Theorem 5.5.8 that Y is not a
closed subset of X. ∎
Next, we prove:

6.3.2. Theorem. Let X be a Banach space, and let {x_n} be a sequence in X.
If Σ_{n=1}^∞ ||x_n|| < ∞, then the series
x_1 + x_2 + ... + x_k + ... = Σ_{n=1}^∞ x_n
converges; i.e., there is a y ∈ X such that we may write
y = Σ_{n=1}^∞ x_n,
and moreover ||y|| ≤ Σ_{n=1}^∞ ||x_n||.
Proof. To prove the first part, let y_m = x_1 + ... + x_m. If n > m, then
y_n − y_m = x_{m+1} + ... + x_n. Hence,
||y_n − y_m|| ≤ ||x_{m+1}|| + ... + ||x_n||.
Since Σ_{n=1}^∞ ||x_n|| is a convergent infinite series of real numbers, the sequence
of partial sums s_m = ||x_1|| + ... + ||x_m|| is Cauchy. Hence, given ε > 0,
there is a positive integer N such that n > m > N implies |s_n − s_m| ≤ ε.
But |s_n − s_m| ≥ ||y_n − y_m||, and so {y_m} is a Cauchy sequence. Since X is
complete, {y_m} is convergent and conclusion (i) follows.
To prove the second part, let y_m = x_1 + ... + x_m, and let y = lim_{m→∞} y_m =
Σ_{n=1}^∞ x_n. Then for each positive integer m we have y = (y − y_m) + y_m and
||y|| ≤ ||y − y_m|| + ||y_m|| ≤ ||y − y_m|| + Σ_{i=1}^m ||x_i||.
Taking the limit as m → ∞, we obtain ||y|| ≤ Σ_{n=1}^∞ ||x_n||, which proves
the second part. ∎

6.4. CONVEX SETS
6.4.3. Example. The empty set is convex. Also, a set consisting of one
point is convex. In R^3, a cube and a sphere are convex bodies, while a plane
and a line segment are convex sets but not convex bodies. Any linear subspace
of X is a convex set. Also, any linear variety of X (see Definition 3.2.17)
is a convex set. ∎
6.4.4. Example. Let Y and Z be convex sets in X, let α, β ∈ R, and let
αY = {x ∈ X : x = αy, y ∈ Y}. Then the set αY + βZ is a convex set in
X. ∎

6.4.5. Exercise. Prove the assertions made in Examples 6.4.3 and 6.4.4.
"i:+«P +~-tiJ1i:+
-1 J - i:+P - .
Therefore, x E (<< + P)Y and thus « Y + pY c (<< + P)Y. This completes
the proof. _
We note that the convex hull of Y is the smallest convex set which contains
Y. Examples of convex hulls of sets in R^2 are depicted in Figure B.

6.4.10. Figure B. Convex hulls.
6.4.11. Theorem. Let Y be any set in X. The convex hull of Y is the set
of points expressible as α_1y_1 + α_2y_2 + ... + α_ny_n, where y_1, ..., y_n ∈ Y,
where α_i ≥ 0, i = 1, ..., n, where Σ_{i=1}^n α_i = 1, and where n is not fixed.
6.4.20. Figure D

It is clear that if f is a linear functional and if f(x) ≠ 0 for some x ∈ X,
then the range of f is all of F; i.e., ℜ(f) = F.
For linear functionals we define boundedness as follows.
|f(x)| = |f((δx/||x||) · (||x||/δ))| = (||x||/δ) |f(δx/||x||)| ≤ ||x||/δ,
since ||δx/||x|||| = δ. If we let M = 1/δ, then |f(x)| ≤ M||x||, and f is
bounded. ∎
We will see later, in Example 6.5.17, that there may exist linear functionals
on a normed linear space which are unbounded. The class of linear functionals
which are bounded has some interesting properties.
lim_{n→∞} x'_n(αx + βy) = α lim_{n→∞} x'_n(x) + β lim_{n→∞} x'_n(y) = αx'(x) + βx'(y);
i.e., x'(αx + βy) = αx'(x) + βx'(y), and thus x' is a linear functional. Next we show that x'
is bounded. Since {x'_n} is a Cauchy sequence, for ε > 0 there is an M such that
|x'_n(x) − x'_m(x)| ≤ ε||x|| for all m, n > M and for all x ∈ X. But x'_n(x) →
x'(x), and hence |x'(x) − x'_m(x)| ≤ ε||x|| for all m > M. It now follows that
The next result states that the norm of a functional can be represented
in various equivalent ways.
6.5.12. Example. Consider the normed linear space {C[a, b]; ||·||_∞}. The
mapping
f(x) = ∫_a^b x(s) ds, x ∈ C[a, b]
is a linear functional on C[a, b] (cf. Example 3.5.2). The norm of this functional
equals (b − a), because
|f(x)| = |∫_a^b x(s) ds| ≤ (b − a) max_{a≤s≤b} |x(s)|. ∎
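The bound in Example 6.5.12 can be seen numerically: the constant function x ≡ 1 has ||x||_∞ = 1 and attains |f(x)| = b − a. The sketch below (the midpoint-rule integrator is our own) checks this:

```python
# f(x) = integral_a^b x(s) ds on {C[a,b]; ||.||_inf}; the functional bound
# |f(x)| <= (b - a) ||x||_inf is attained by x = 1, so ||f|| = b - a.
def f(x, a, b, n=10000):
    h = (b - a) / n
    return sum(x(a + (i + 0.5) * h) for i in range(n)) * h  # midpoint rule

a, b = 0.0, 2.0
one = lambda s: 1.0
print(f(one, a, b))                      # ~ 2.0 = b - a
assert abs(f(one, a, b) - (b - a)) < 1e-9
```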
6.5.13. Example. Consider the space {C[a, b]; ||·||_∞}, let x_0 be a fixed
element of C[a, b], and let x be any element of C[a, b]. The mapping
f(x) = ∫_a^b x(s)x_0(s) ds
is a linear functional on C[a, b]. It is bounded, because
|f(x)| ≤ [∫_a^b |x_0(s)| ds] max_{a≤s≤b} |x(s)|. We leave it to
the reader to show that ||f|| = ∫_a^b |x_0(s)| ds.
f(x) = Σ_{i=1}^n η_i ξ_i,
then f is a linear functional on this space. We can show that f is bounded by observing
that
see that
||α_1x_1 + ... + α_nx_n|| ≥ m(|α_1| + ... + |α_n|) (6.6.2)
for all a ∈ S.
Next, for arbitrary x ∈ X with coordinates (ξ_1, ..., ξ_n) ∈ F^n, we let
p = |ξ_1| + ... + |ξ_n|. First, we suppose that p > 0. Then
||x|| = ||ξ_1x_1 + ... + ξ_nx_n|| = p ||(ξ_1/p)x_1 + ... + (ξ_n/p)x_n||
≥ pm(|ξ_1/p| + ... + |ξ_n/p|)
= m(|ξ_1| + ... + |ξ_n|),
where inequality (6.6.2) has been used. Therefore, if p ≠ 0, we have
6.6.4. Exercise. Prove that the set S and the function g have the properties
asserted in the proof of Theorem 6.6.1.
Thus, y_{n_j} → y'. Since y_{n_j} → 0, it follows that y' = 0. But this is impossible,
because {x_1, ..., x_n} is a linearly independent set. We conclude that the sum
6.7. GEOMETRIC ASPECTS OF LINEAR FUNCTIONALS
Throughout this section X denotes a real normed linear space. Before giving
geometric interpretations of linear functionals we introduce the notions of
maximal subspace and hyperplane.
In the proof of the above theorem we established also the following result:
In the next result we will show that Y is closed if and only if the functional
f associated with Y is bounded (i.e., continuous). Thus, corresponding to
any hyperplane in a normed linear space there is a functional that is bounded
whenever the hyperplane is closed and vice versa.
6.7.10. Exercise. Show that each of the sets Y_1, Y_2, Y_3, Y_4 in the preceding
definition is convex. Also, show that if in the above definition f is
continuous, then Y_1 and Y_3 are open sets in X, and Y_2 and Y_4 are closed sets
in X.
6.7.11. Example. Let X = R^2, and let x = (ξ_1, ξ_2) ∈ X. Let y = (η_1, η_2)
be any fixed vector in X, and define the linear functional f on X as
f(x) = η_1ξ_1 + η_2ξ_2.
The set
Y_0 = {x ∈ R^2 : f(x) = η_1ξ_1 + η_2ξ_2 = 0}
is a line through the origin of R^2 which is normal to the vector y. If x_1 ∉ Y_0,
the hyperplane
In this section we state and prove the Hahn-Banach theorem. This result
is very important in analysis and has important implications in applications.
We would like to point out that the present form of this theorem is not the
most general version of the Hahn-Banach theorem.
Throughout this section X will denote a real normed linear space.
6.8.6. Corollary. Let x_0 ∈ X, x_0 ≠ 0, and let γ > 0. Then there exists a
bounded non-zero linear functional f defined on all of X such that ||f|| = γ
and f(x_0) = ||f||·||x_0||.
Thus, the dual space X* of X = R^n is itself the space R^n, in the sense that
the elements of X* consist of all functionals of the form
f(x) = Σ_{i=1}^n α_i ξ_i.
Furthermore, the norm on X* is
||f|| = [Σ_{i=1}^n |α_i|^2]^{1/2} = |a|.
6.9.2. Exercise. Let X = R^n, where the norm of x = (ξ_1, ..., ξ_n) ∈ X is
given by ||x|| = max_{1≤i≤n} |ξ_i| (see Example 6.1.5). Show that if f ∈ X*, then
there is an a = (α_1, ..., α_n) ∈ R^n such that f(x) = α_1ξ_1 + ... + α_nξ_n, so
that X* = R^n, and show that the norm on X* is given by ||f|| = Σ_{i=1}^n |α_i|.
6.9.3. Exercise. Let X = R^n, and define the norm of x = (ξ_1, ..., ξ_n) ∈ X
by ||x|| = (|ξ_1|^p + ... + |ξ_n|^p)^{1/p}, where 1 < p < ∞ (see Example 6.1.5).
Show that if f ∈ X*, then there is an a = (α_1, ..., α_n) ∈ R^n such that
f(x) = α_1ξ_1 + ... + α_nξ_n, i.e., X* = R^n, and show that the norm on X*
is given by ||f|| = (|α_1|^q + ... + |α_n|^q)^{1/q}, where q is such that
1/p + 1/q = 1.
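The duality asserted in Exercise 6.9.3 can be checked numerically. A standard fact (used here, not proved in the text excerpt) is that the supremum of f over the unit p-ball is attained at ξ_i = sgn(α_i)|α_i|^{q−1}/||a||_q^{q−1}; the sketch below, with our own choice of a and p, verifies this and samples random unit vectors as a sanity check against Hölder's inequality:

```python
import random

# For f(x) = sum a_i xi_i on (R^n, ||.||_p), check ||f|| = ||a||_q, 1/p + 1/q = 1.
def pnorm(v, p):
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

p = 3.0
q = p / (p - 1.0)
a = [2.0, -1.0, 0.5]
aq = pnorm(a, q)

# The maximizing unit vector: xi_i = sgn(a_i)|a_i|^(q-1) / ||a||_q^(q-1)
xi = [(1 if t >= 0 else -1) * abs(t) ** (q - 1) / aq ** (q - 1) for t in a]
assert abs(pnorm(xi, p) - 1.0) < 1e-9                      # on the unit p-ball
assert abs(sum(t * u for t, u in zip(a, xi)) - aq) < 1e-9  # f(xi) = ||a||_q

# No random unit vector exceeds ||a||_q (Hoelder's inequality)
random.seed(0)
for _ in range(1000):
    v = [random.uniform(-1, 1) for _ in a]
    v = [t / pnorm(v, p) for t in v]
    assert sum(t * u for t, u in zip(a, v)) <= aq + 1e-9
```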
6.9.4. Exercise. Let X be the space l_p, 1 ≤ p < ∞, defined in Example
6.1.6, and let 1/p + 1/q = 1. If p = 1, we take q = ∞. Show that the dual
space of l_p is l_q. Specifically, show that every bounded linear functional on
l_p is uniquely representable as
f(x) = Σ_{i=1}^∞ α_i ξ_i,
where a = (α_1, ..., α_k, ...) is an element of l_q. Also, show that every
element a of l_q defines an element of (l_p)* in the same way and that
or, equivalently, by
Jx = x″, x″(x′) = x′(x). (6.9.6)
We call this mapping J the canonical mapping of X into X**. The functional
x″ defined on X* in this way is linear, because
x″(αx′_1 + βx′_2) = (αx′_1 + βx′_2)(x) = αx′_1(x) + βx′_2(x) = αx″(x′_1) + βx″(x′_2),
and thus x″ is a linear functional on X*. Since
|x″(x′)| = |x′(x)| ≤ ||x|| ||x′||,
it follows that ||x″|| ≤ ||x|| and thus x″ ∈ X**. We can actually show that
||x″|| = ||x||. This is obvious for x = 0. If x ≠ 0, then in view of Corollary
6.8.6 there exists a non-zero x′ ∈ X* such that x′(x) = ||x|| ||x′||, and thus
||x″|| = ||x||. From this it follows that the norm of every x ∈ X can be
defined in two ways: as the norm of an element in X and as the norm of a
linear functional on X*, i.e., as the norm of an element in X**. We
summarize this discussion in the following result:
in this case we write x_n → x weakly. If a sequence {x_n} converges to x ∈ X,
i.e., if ||x_n − x|| → 0 as n → ∞, then we call this convergence strong
convergence, or convergence in norm, to distinguish it from weak convergence.
(iii) ∫_a^b φ_n(t) dt = 1.
Define x′_n ∈ X* by
⟨x, x′_n⟩ = ∫_a^b x(t)φ_n(t) dt,
where x ∈ C[a, b]. Now let x′_0 be defined on C[a, b] by
⟨x, x′_0⟩ = x(0)
for all x ∈ C[a, b]. It is clear that x′_0 ∈ X*. We now show that x′_n → x′_0
weak*. By the mean value theorem from the calculus, there is a t_n such that
−1/n ≤ t_n ≤ 1/n and
∫_{−1/n}^{1/n} φ_n(t)x(t) dt = x(t_n) ∫_{−1/n}^{1/n} φ_n(t) dt = x(t_n)
for each n = 1, 2, ..., and x ∈ C[a, b]. Thus, ⟨x, x′_n⟩ → x(0) for every
x ∈ C[a, b]; i.e., x′_n → x′_0 weak*. We see that the sequence of functions {φ_n}
does not approach a limit in C[a, b]. In particular, there is no φ_0 ∈ C[a, b]
such that x(0) = ∫_a^b x(t)φ_0(t) dt for all x. Frequently, in applications, it is
convenient to say that the sequence {φ_n} converges to the so-called "δ function,"
which has this property. We see that the sequence {φ_n} converges to the δ function
in the sense of weak* convergence. ∎
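A concrete delta-sequence makes this visible numerically. Below we take φ_n = n/2 on [−1/n, 1/n] and 0 elsewhere; this is one specific choice satisfying conditions (i)–(iii), not necessarily the book's φ_n:

```python
import math

# <x, x'_n> = integral of x(t) * phi_n(t) dt, by the midpoint rule,
# with phi_n = n/2 on [-1/n, 1/n] and 0 elsewhere.
def xn_prime(x, n, steps=20000):
    lo, hi = -1.0 / n, 1.0 / n
    h = (hi - lo) / steps
    return sum(x(lo + (i + 0.5) * h) * (n / 2.0) * h for i in range(steps))

for n in (1, 10, 100):
    print(n, xn_prime(math.cos, n))   # tends to cos(0) = 1.0

assert abs(xn_prime(math.cos, 100) - 1.0) < 1e-3
```

The functionals converge (weak*) even though the functions φ_n themselves converge to nothing in C[a, b], exactly as the example asserts.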
{x′_{n,1}} such that the sequence {⟨x_1, x′_{n,1}⟩} converges. Again, from the
subsequence {x′_{n,1}} we can select another subsequence {x′_{n,2}} such that the sequence
{⟨x_2, x′_{n,2}⟩} converges. Continuing this procedure, we obtain the sequences
x′_{1,1}, x′_{2,1}, x′_{3,1}, ...
x′_{1,2}, x′_{2,2}, x′_{3,2}, ...
x′_{1,3}, x′_{2,3}, x′_{3,3}, ...
We recall (see Definition 3.6.19 and the discussion following this definition)
that if X is a complex linear space, a function defined on X × X into C,
which we denote by (x, y) for x, y ∈ X, is called an inner product if
(i) (x, x) > 0 for all x ≠ 0, and (x, x) = 0 if x = 0;
(ii) (x, y) = conjugate of (y, x) for all x, y ∈ X;
(iii) (αx + βy, z) = α(x, z) + β(y, z) for all x, y, z ∈ X and for all
α, β ∈ C; and
(iv) (x, αy + βz) = ᾱ(x, y) + β̄(x, z) for all x, y, z ∈ X and for all
α, β ∈ C.
Using the above results, we can now readily show that the function ||·||
defined by ||x|| = (x, x)^{1/2} is a norm.
Thus, every Hilbert space is also a Banach space (and also a complete
metric space). Some authors insist that Hilbert spaces be infinite dimensional.
We shall not follow that practice. An arbitrary inner product space (not
necessarily complete) is sometimes also called a pre-Hilbert space.
6.11.10. Example
(a) Let X = C[a, b] denote the linear space of complex-valued continuous
functions defined on [a, b] (see Example 6.1.9). For x, y ∈ C[a, b] define
(x, y) = ∫_a^b x(t)ȳ(t) dt.
It is readily verified that this space is a pre-Hilbert space. In view of Example
6.1.9 this space is not complete relative to the norm ||x|| = (x, x)^{1/2}, and hence
it is not a Hilbert space.
(b) We extend the space of real-valued functions, L_2[a, b], defined in
Example 5.5.31 for the case p = 2, to complex-valued functions to be the
set of all functions f: [a, b] → C such that f = u + iv for u, v ∈ L_2[a, b].
Denoting this space also by L_2[a, b], we define
(f, g) = ∫_{[a,b]} f ḡ dμ
for f, g ∈ L_2[a, b], where integration is in the Lebesgue sense. The space
{L_2[a, b]; (·, ·)} is a Hilbert space. ∎
where ||x_1||_1 = (x_1, x_1)^{1/2}. It is readily verified that X is complete, and thus
X is a Hilbert space. ∎
is convergent in X, then (Σ_{n=1}^∞ x_n, z) = Σ_{n=1}^∞ (x_n, z) for every
z ∈ X.
Parts (i) and (ii) of Theorem 6.11.16 are referred to as the parallelogram
law and the Pythagorean theorem, respectively (refer to Theorems 4.9.33 and
4.9.38). In the latter case,
||Σ_{j=1}^n x_j||^2 = Σ_{j=1}^n ||x_j||^2.
We note that if x ≠ 0 and if y = x/||x||, then ||y|| = 1. Hence, it is
possible to convert every orthogonal set of vectors into an orthonormal set.
Let us now consider a specific example.
we obtain, for m ≠ n,
(f_n, f_m) = ∫_0^1 f_n(t) f̄_m(t) dt = ∫_0^1 e^{2πi(n−m)t} dt
= (e^{2πi(n−m)} − 1)/(2πi(n − m)).
Since e^{2πik} = cos 2πk + i sin 2πk = 1 for every integer k, we have
(f_n, f_m) = 0, m ≠ n;
i.e., if m ≠ n, then f_n ⊥ f_m. On the other hand,
(i) Σ_{i=1}^n |(x, x_i)|^2 ≤ ||x||^2 for all x ∈ X; and (6.11.24)
(ii) (x − Σ_{i=1}^n (x, x_i)x_i) ⊥ x_j for any j = 1, ..., n, (6.11.27)
for every x ∈ X.
From our discussion thus far it should be clear that not every normed
linear space can be made into an inner product space. The following theorem
gives sufficient conditions under which a normed linear space is also an
inner product space.
6.11.36. Exercise. Let l_p, 1 ≤ p < ∞, be the normed linear space defined
in Example 6.1.6. Show that {l_p; ||·||_p} is an inner product space if and only
if p = 2.

6.11.37. Figure G
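The parallelogram law ||x + y||² + ||x − y||² = 2||x||² + 2||y||² is the standard test here: it holds for a norm induced by an inner product, and fails for the p-norms with p ≠ 2. A quick numerical check (our own vectors):

```python
# Parallelogram-law defect for the p-norms on R^2: zero only for p = 2.
def pnorm(v, p):
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

x, y = [1.0, 0.0], [0.0, 1.0]
s = [1.0, 1.0]    # x + y
d = [1.0, -1.0]   # x - y

def defect(p):
    return (pnorm(s, p) ** 2 + pnorm(d, p) ** 2
            - 2 * pnorm(x, p) ** 2 - 2 * pnorm(y, p) ** 2)

assert abs(defect(2)) < 1e-12   # the law holds for p = 2
assert defect(1) > 1.0          # and fails badly for p = 1
```

This is the computational face of the "only if" half of Exercise 6.11.36: a single violating pair of vectors rules out any compatible inner product.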
In view of part (iv) of the above theorem, we can write Y⊥ = Y⊥⊥⊥ =
Y⊥⊥⊥⊥⊥ = ..., and Y⊥⊥ = Y⊥⊥⊥⊥ = Y⊥⊥⊥⊥⊥⊥ = ....
Before giving the classical projection theorem, we state and prove the
following preliminary result.

6.12.11. Figure H
The preceding theorem does not ensure the existence of the vector y_0.
However, if we require in Theorem 6.12.10 that Y be a closed linear subspace
in a Hilbert space X, then the existence of the unique vector y_0 is guaranteed.
This important result, which we will prove below, is called the classical
projection theorem.
Since Y is a linear subspace, it follows that for each y_m, y_n ∈ Y we have
(y_m + y_n)/2 ∈ Y. Thus, ||x − (y_m + y_n)/2|| ≥ δ and
||y_m − y_n||^2 ≤ 2||y_m − x||^2 + 2||x − y_n||^2 − 4δ^2.
Also, since ||y_m − x||^2 → δ^2 as m → ∞, it follows that ||y_m − y_n||^2 → 0
as m, n → ∞. Hence, {y_n} is a Cauchy sequence. Since Y is a closed linear
subspace of a Hilbert space, it is itself a Hilbert space and as such {y_n} has
a limit y_0 ∈ Y. Finally, by the continuity of the norm (see Theorem 6.1.15),
it follows that lim ||x − y_n|| = ||x − y_0|| = δ. This proves the theorem. ∎
From part (ii) of Theorem 6.12.8 we have, in general, Y⊥⊥ ⊃ Y. Under
certain conditions equality holds.
Before proceeding to the next result, we recall from Definition 3.2.13 that
a linear space X is the direct sum of two linear subspaces Y and Z if for
every x ∈ X there is a unique y ∈ Y and a unique z ∈ Z such that x = y + z.
6.13. FOURIER SERIES
6.13.1. Theorem. Let X be an inner product space, let {y_1, ..., y_n} be a
finite orthonormal set in X, and let Y be the linear subspace of X generated
by {y_1, ..., y_n}. Then the vectors {y_1, ..., y_n} form a basis for Y and,
moreover, in the representation of a vector y ∈ Y by the sum
y = α_1y_1 + ... + α_ny_n,
the coefficients α_i are specified by
α_i = (y, y_i), i = 1, ..., n.
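In coordinates, Theorem 6.13.1 says the coefficients are just inner products with the orthonormal vectors. A small sketch in R^3 (the vectors and numbers are our own illustration):

```python
# alpha_i = (y, y_i) recovers y from an orthonormal set {y1, y2} in R^3.
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

y1 = [1.0, 0.0, 0.0]
y2 = [0.0, 0.6, 0.8]                               # unit, orthogonal to y1
y = [2.0 * a + 3.0 * b for a, b in zip(y1, y2)]    # some y in span{y1, y2}

alpha = [dot(y, y1), dot(y, y2)]                   # approximately (2, 3)
rebuilt = [alpha[0] * a + alpha[1] * b for a, b in zip(y1, y2)]
assert all(abs(r - t) < 1e-12 for r, t in zip(rebuilt, y))
```

Orthonormality is what makes the recovery a mere projection; for a non-orthogonal basis one would instead have to solve the normal equations of Section 6.15.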
6.13.3. Theorem. Let X be a Hilbert space and let {x_i} be a countably
infinite orthonormal sequence in X. A series Σ_{i=1}^∞ α_i x_i is convergent to an
element x ∈ X, i.e.,
x = Σ_{i=1}^∞ α_i x_i,
if and only if Σ_{i=1}^∞ |α_i|^2 < ∞. In this case α_i = (x, x_i) for each i.

Proof. Let s_n = α_1x_1 + ... + α_nx_n. For n > m, the Pythagorean theorem
yields
||s_n − s_m||^2 = Σ_{i=m+1}^n |α_i|^2,
and the right-hand side tends to 0 as m, n → ∞ if and only if Σ_{i=1}^∞ |α_i|^2 < ∞.
Thus {s_n} is a Cauchy sequence, and hence convergent in the complete space X,
if and only if Σ_{i=1}^∞ |α_i|^2 < ∞.
Now assume that Σ_{i=1}^∞ |α_i|^2 < ∞, and let x = lim_{n→∞} s_n. We must show that
α_i = (x, x_i). From Theorem 6.13.1 we have α_i = (s_n, x_i), i = 1, ..., n.
But s_n → x, and hence by the continuity of the inner product we have
(s_n, x_i) → (x, x_i) as n → ∞. Therefore, α_i = (x, x_i), which completes the
proof. ∎
In the next result we use the concept of the closed linear subspace generated by
a set (see Definition 6.12.7).
(i) Y is complete;
(ii) if (x, y) = 0 for all y ∈ Y, then x = 0; and
(iii) V̄(Y) = X.
6.13.9. Exercise. Prove Theorem 6.13.8 for the case where Y is an orthonormal
sequence {x_n}.
From Theorems 6.13.4 and 6.13.8 we have
x = Σ_{i=1}^∞ (x, x_i)x_i = Σ_{i=1}^∞ α_i x_i. (6.13.11)
||x||^2 = (Σ_{i=1}^∞ α_i x_i, Σ_{j=1}^∞ α_j x_j)
= Σ_{i=1}^∞ Σ_{j=1}^∞ α_i ᾱ_j (x_i, x_j)
= Σ_{i=1}^∞ |α_i|^2.
y_1 = x_1/||x_1||.
It is clear that y_1 and x_1 generate the same linear subspace. Next, let
z_2 = x_2 − (x_2, y_1)y_1.
Since
(z_2, y_1) = (x_2 − (x_2, y_1)y_1, y_1) = (x_2, y_1) − (x_2, y_1)(y_1, y_1)
= (x_2, y_1) − (x_2, y_1) = 0,
it follows that z_2 ⊥ y_1. We now let y_2 = z_2/||z_2||. Note that z_2 ≠ 0, because
x_2 and y_1 are linearly independent. Also, y_1 and y_2 generate the same linear
subspace as x_1 and x_2, because x_2 is a linear combination of y_1 and y_2.
Proceeding in the fashion described above we define z_1, z_2, ... and
y_1, y_2, ... recursively as
z_n = x_n − Σ_{i=1}^{n−1} (x_n, y_i)y_i
and
y_n = z_n/||z_n||.
As before, we can readily verify that z_n ⊥ y_i for all i < n, that z_n ≠ 0, and
that the {y_i}, i = 1, ..., n, generate the same linear subspace as the {x_i},
i = 1, ..., n. If the set {x_i} is finite, the process terminates. Otherwise it is
continued indefinitely by induction.
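The recursion just described (the Gram-Schmidt process) translates directly into code. The sketch below runs it for vectors in R^3 with the usual dot product; the input vectors are our own example:

```python
import math

# Gram-Schmidt as in the text: z_n = x_n - sum_{i<n} (x_n, y_i) y_i,
# then y_n = z_n / ||z_n||.
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(xs):
    ys = []
    for x in xs:
        z = list(x)
        for y in ys:
            c = dot(x, y)
            z = [zi - c * yi for zi, yi in zip(z, y)]
        nz = math.sqrt(dot(z, z))          # nonzero iff xs are independent
        ys.append([zi / nz for zi in z])
    return ys

ys = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
for i, yi in enumerate(ys):
    assert abs(dot(yi, yi) - 1.0) < 1e-12  # unit length
    for yj in ys[:i]:
        assert abs(dot(yi, yj)) < 1e-12    # mutually orthogonal
```

As the text notes, the process fails (z_n = 0) exactly when the x_i are linearly dependent.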
x = Σ_i (x, x_i)x_i,
concept, is of very little value in spaces which are not finite dimensional. In
such spaces, an orthonormal basis as defined above is much more useful.
We conclude this section with the following result.
6.14.4. Exercise. Two normed linear spaces over the same field are said
to be congruent if they are isomorphic (see Definition 3.4.76) and isometric
(see Definition 5.9.16). Let X be a Hilbert space. Show that X is congruent
to X*.
variables, while in the third part we concern ourselves with the estimation of
random variables.
(y_n, y_1) ⋯ (y_n, y_n)
The matrix (6.15.2) is called the Gram matrix of y_1, ..., y_n. The determinant
of (6.15.2) is called the Gram determinant and is denoted by Δ(y_1, ..., y_n).
The equations (6.15.1) are called the normal equations. It is clear that in a
real Hilbert space G(y_1, ..., y_n) = Gᵀ(y_1, ..., y_n), and that in a complex
Hilbert space G(y_1, ..., y_n) = ḠT(y_1, ..., y_n).
In order to approximate x ∈ X by y_0 ∈ Y we only need to solve Eq.
(6.15.1) for the α_i, i = 1, ..., n. The next result gives conditions under
which Eq. (6.15.1) possesses a unique solution for the α_i.
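A small worked instance of the normal equations (our own numbers, in R^3 with the usual inner product): solve G α = [(x, y_i)] for the best approximation y_0 = α_1y_1 + α_2y_2 to x in span{y_1, y_2}:

```python
# Best approximation of x in span{y1, y2} via the Gram matrix.
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

y1, y2 = [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]
x = [1.0, 2.0, 3.0]

# Gram matrix entries and right-hand side (x, y_i)
g11, g12, g22 = dot(y1, y1), dot(y1, y2), dot(y2, y2)
b1, b2 = dot(x, y1), dot(x, y2)

det = g11 * g22 - g12 * g12          # Gram determinant; nonzero here
a1 = (b1 * g22 - b2 * g12) / det     # Cramer's rule for the 2x2 system
a2 = (g11 * b2 - g12 * b1) / det

y0 = [a1 * p + a2 * q for p, q in zip(y1, y2)]
err = [xi - yi for xi, yi in zip(x, y0)]
# Projection theorem: the error x - y0 is orthogonal to the subspace
assert abs(dot(err, y1)) < 1e-12 and abs(dot(err, y2)) < 1e-12
print(y0)   # [1.0, 2.0, 0.0]
```

The orthogonality of the residual to y_1 and y_2 is precisely the content of the normal equations.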
(6.15.5)
we obtain
The next result establishes an expression for the error ||x − y_0||. The
proof of this result follows directly from the classical projection theorem.
Then
where
(y_1, y_1) ⋯ (y_1, x)
(y_2, y_1) ⋯ (y_2, x)
B. Random Variables
Throughout the remainder of this section, we let {Ω, 𝔉, P} denote our
underlying probability space, and we assume that all random variables belong
to the Hilbert space L_2 with inner product (X, Y) = E(XY).
6.15.12. Exercise. In the preceding example, show that if ρ_ij = σ_i^2 δ_ij for
i, j = 1, ..., m, where δ_ij is the Kronecker delta, then
α_i = σ_i^2/(mσ_i^2 + σ_v^2), i = 1, ..., m.
The next result provides us with a useful means for finding the best linear
estimate of a random variable X, given a set of random variables {Y_1, ...,
Y_k}, if we already have the best linear estimate, given {Y_1, ..., Y_{k−1}}.
Proof. By the classical projection theorem (see Theorem 6.12.12), Ỹ_k(k − 1)
⊥ 𝒴_{k−1}. Now for arbitrary Z ∈ 𝒴_k we must have Z = c_1Y_1 + ... +
c_{k−1}Y_{k−1} + c_kY_k for some (c_1, ..., c_k). We can rewrite this as Z = Z_1 + Z_2,
where Z_1 = c_1Y_1 + ... + c_{k−1}Y_{k−1} + c_kŶ_k(k − 1) and Z_2 = c_kỸ_k(k − 1).
Since Z_1 ∈ 𝒴_{k−1} and Z_2 ⊥ 𝒴_{k−1}, it follows from Theorem 6.12.12 that Z_1
and Z_2 are unique. Since Z_1 ∈ 𝒴_{k−1} and Z_2 ∈ V({Ỹ_k(k − 1)}), the theorem
is proved. ∎
6.15.14. Theorem. Let X_1, ..., X_n, Y_1, ..., Y_m be random variables in
L_2. Let G = [γ_ij], where γ_ij = E{Y_iY_j} for i, j = 1, ..., m, and let B =
[β_ij], where β_ij = E{X_iY_j} for i = 1, ..., n and j = 1, ..., m. If G is
non-singular, then X̂ = AY is the best linear estimate of X, given Y, if and
only if A = BG^{−1}.
for j = 1, ... , k.
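The estimate A = BG⁻¹ of Theorem 6.15.14 can be illustrated with simulated data. Below we take one X that is exactly linear in two independent Y's (all distributions and coefficients are our own illustration), estimate the moments from samples, and recover the coefficients:

```python
import random

# Sample-based sketch of A = B G^{-1}: gamma_ij = E{Y_i Y_j}, beta_j = E{X Y_j}.
random.seed(0)
n = 50000
samples = []
for _ in range(n):
    y1 = random.gauss(0, 1)
    y2 = random.gauss(0, 1)
    x = 2.0 * y1 - 1.0 * y2          # X is exactly linear in Y here
    samples.append((x, y1, y2))

E = lambda f: sum(f(*s) for s in samples) / n
g11 = E(lambda x, a, b: a * a)
g12 = E(lambda x, a, b: a * b)
g22 = E(lambda x, a, b: b * b)
b1 = E(lambda x, a, b: x * a)
b2 = E(lambda x, a, b: x * b)

det = g11 * g22 - g12 * g12
a1 = (b1 * g22 - b2 * g12) / det     # the row of A = B G^{-1}
a2 = (g11 * b2 - g12 * b1) / det
print(a1, a2)                        # close to (2, -1)
```

The recovered (a1, a2) approach the true coefficients as the sample size grows, since the sample moments converge to the entries of G and B.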
Finally, to verify Eq. (6.15.33), we have from Eqs. (6.15.26) and (6.15.30)
X̂(k + 1 | k) = A(k)X̂(k | k) + B(k)U(k).
From this, Eq. (6.15.33) follows immediately. We note that X̂(1 | 0) = 0 and
P(1 | 0) = P(1). This completes the proof. ∎
The material of the present chapter as well as that of the next chapter
constitutes part of what usually goes under the heading of functional analysis.
Thus, these two chapters should be viewed as a whole rather than two separate
parts.
There are numerous excellent sources dealing with Hilbert and Banach
spaces. We cite a representative sample of these which the reader should
consult for further study. References [6.6]–[6.8], [6.10], and [6.12] are at an
introductory or intermediate level, whereas references [6.2]–[6.4] and [6.13]
are at a more advanced level. The books by Dunford and Schwartz and by
Hille and Phillips are standard and encyclopedic references on functional
analysis; the text by Yosida constitutes a concise treatment of this subject,
while the monograph by Halmos contains a compact exposition on Hilbert
space. The book by Taylor is a standard reference on functional analysis at
the intermediate level. The texts by Kantorovich and Akilov, by Kolmogorov
and Fomin, and by Liusternik and Sobolev are very readable presentations
of this subject. The book by Naylor and Sell, which presents a very nice
introduction to functional analysis, includes some interesting examples. For
references with applications of functional analysis to specific areas, including
those in Section 6.15, see, e.g., Byron and Fuller [6.1], Kalman et al. [6.5],
Luenberger [6.9], and Porter [6.11].
REFERENCES
[6.6] L. V. KANTOROVICH and G. P. AKILOV, Functional Analysis in Normed Spaces. New York: The Macmillan Company, 1964.
[6.7] A. N. KOLMOGOROV and S. V. FOMIN, Elements of the Theory of Functions and Functional Analysis. Vols. I, II. Albany, N.Y.: Graylock Press, 1957 and 1961.
[6.8] L. A. LIUSTERNIK and V. J. SOBOLEV, Elements of Functional Analysis. New York: Frederick Ungar Publishing Company, 1961.
[6.9] D. G. LUENBERGER, Optimization by Vector Space Methods. New York: John Wiley & Sons, Inc., 1969.
[6.10] A. W. NAYLOR and G. R. SELL, Linear Operator Theory. New York: Holt, Rinehart and Winston, 1971.
[6.11] W. A. PORTER, Modern Foundations of Systems Engineering. New York: The Macmillan Company, 1966.
[6.12] A. E. TAYLOR, Introduction to Functional Analysis. New York: John Wiley & Sons, Inc., 1958.
[6.13] K. YOSIDA, Functional Analysis. Berlin: Springer-Verlag, 1965.
7
LINEAR OPERATORS
Throughout this section X and Y denote vector spaces over the same field
F, where F is either R (the real numbers) or C (the complex numbers).
We begin by pointing to several concepts considered previously. Recall
from Chapter 1 that a transformation or operator T is a mapping of a subset
𝔇(T) of X into Y. Unless specified to the contrary, we will assume that
X = 𝔇(T). Since a transformation is a mapping, we distinguish, as in Chapter
1, between operators which are onto or surjective, one-to-one or injective,
and one-to-one and onto or bijective. If T is a transformation of X into Y, we
write T: X → Y. If x ∈ X, we call y = T(x) the image of x in Y under T, and
if V ⊂ X, we define the image of set V in Y under T as the set
T(V) = {y ∈ Y : y = T(v), v ∈ V ⊂ X}.
On the other hand, if W ⊂ Y, then the inverse image of set W under T is the
set
T^{−1}(W) = {x ∈ X : y = T(x) ∈ W ⊂ Y}.
We define the range of T, denoted ℜ(T), by
ℜ(T) = {y ∈ Y : y = T(x), x ∈ X};
i.e., ℜ(T) = T(X). Recall that if a transformation T of X into Y is injective,
then the inverse of T, denoted T^{−1}, exists (see Definition 1.2.9). Thus, if
y = T(x) and if T is injective, then x = T^{−1}(y).
In Definition 3.4.1 we defined a linear operator (or a linear transformation)
as a mapping T of X into Y having the property that
(i) T(x + y) = T(x) + T(y) for all x, y ∈ X; and
(ii) T(αx) = αT(x) for all α ∈ F and all x ∈ X.
As in Chapter 3, we denote the class of all linear transformations from
X into Y by L(X, Y). Also, in the case of linear transformations we write
Tx in place of T(x).
Of great importance are bounded linear operators, which turn out to
be also continuous. We have the following definition.
whenever ||x − x_0|| < δ.
The reader can readily prove the next result.
We now show that the function ||·|| defined in Eq. (7.1.12) satisfies all
the axioms of a norm.

7.1.15. Theorem. The linear space B(X, Y) is a normed linear space (with
norm defined by Eq. (7.1.12)); i.e.,
(i) for every T ∈ B(X, Y), ||T|| ≥ 0, and ||T|| = 0 if and only if T = 0;
(ii) ||S + T|| ≤ ||S|| + ||T|| for every S, T ∈ B(X, Y); and
(iii) ||αT|| = |α| ||T|| for every T ∈ B(X, Y) and for every α ∈ F.
Proof. The proof of part (i) is obvious. To verify (ii) we note that
||(S + T)x|| = ||Sx + Tx|| ≤ ||Sx|| + ||Tx|| ≤ (||S|| + ||T||)||x||.
If x = 0, then we are finished. If x ≠ 0, then
7.1.20. Example. Let X = C[a, b], and let ||·||_∞ be the norm on C[a, b] defined
in Example 6.1.9. Let k: [a, b] × [a, b] → R be a real-valued function, continuous
on the square a ≤ s ≤ b, a ≤ t ≤ b. Define the operator T: X → X
by
(Tx)(s) = ∫_a^b k(s, t)x(t) dt.
7.1.22. Example. Let X = R^n, and let {u_1, ..., u_n} be the natural basis for
R^n (see Example 4.1.15). For any A ∈ L(X, X) there is an n × n matrix, say
A = [a_ij] (see Definition 4.2.7), which represents A with respect to {u_1, ...,
u_n}. Thus, if Ax = y, where x = (ξ_1, ..., ξ_n) ∈ X and y = (η_1, ..., η_n) ∈ X,
we may represent this transformation by y = Ax (see Eq. (4.2.17)). In Example
6.1.5 we defined several norms on R^n, namely
||x||_p = [|ξ_1|^p + ... + |ξ_n|^p]^{1/p}, 1 ≤ p < ∞
and
||x||_∞ = max_i {|ξ_i|}.
It turns out that different norms on R^n give rise to different norms of the
transformation A. (In this case we speak of the norm of A induced by the norm
defined on R^n.) In the present example we derive expressions for the norm
of A in terms of the elements of matrix A when the norm on R^n is given
by ||·||_1, ||·||_2, and ||·||_∞.
(i) Let p = 1; i.e., ||x|| = |ξ_1| + ... + |ξ_n|. Then
||A|| = γ_0 = max_{1≤j≤n} Σ_{i=1}^n |a_ij|.
To prove this, one first verifies that ||Ax|| ≤ γ_0 ||x|| for all x, so that ||A|| ≤ γ_0;
evaluating A at the natural basis vector of the maximizing column shows that
||A|| ≥ γ_0, and so we conclude that ||A|| = γ_0.
(ii) Let p = 2; i.e., ||x||₂ = (|ξ₁|² + ... + |ξₙ|²)^(1/2). Let Aᵀ denote the transpose of A (see Eq. (4.2.9)), and let {λ₁, ..., λₖ} be the distinct eigenvalues of the matrix AᵀA (see Definition 4.5.6). Let λ₀ = maxⱼ λⱼ. Then ||A|| = √λ₀.

To prove this we note first that by Theorem 4.10.28 the eigenvalues of AᵀA are all real. We show first that they are, in fact, non-negative. Let {x₁, ..., xₖ} be eigenvectors of AᵀA corresponding to the eigenvalues {λ₁, ..., λₖ}, respectively. Then for each i = 1, ..., k we have AᵀAxᵢ = λᵢxᵢ. Thus, xᵢᵀAᵀAxᵢ = λᵢxᵢᵀxᵢ. From this it follows that

λᵢ = (xᵢᵀAᵀAxᵢ)/(xᵢᵀxᵢ) ≥ 0.

For arbitrary x ∈ X it follows from Theorem 4.10.44 that x = x₁ + ... + xₖ, where AᵀAxᵢ = λᵢxᵢ, i = 1, ..., k. Hence, AᵀAx = λ₁x₁ + ... + λₖxₖ. By Theorem 4.9.41 we have ||Ax||² = xᵀAᵀAx. Thus,

||Ax||² = xᵀAᵀAx = Σ_{i=1}^k λᵢ||xᵢ||² ≤ λ₀ Σ_{i=1}^k ||xᵢ||² = λ₀||x||².
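The induced-norm formulas of this example can be checked numerically; the matrix below is an arbitrary illustration (the ||·||_∞ case, the maximum row sum, is treated later in the text).

```python
import numpy as np

# Induced norms of A per the formulas of Example 7.1.22:
#   ||A||_1   = max_j sum_i |a_ij|   (maximum column sum)
#   ||A||_2   = sqrt(max eigenvalue of A^T A)
#   ||A||_inf = max_i sum_j |a_ij|   (maximum row sum)
A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

norm1 = np.abs(A).sum(axis=0).max()
norm2 = np.sqrt(np.linalg.eigvalsh(A.T @ A).max())
norminf = np.abs(A).sum(axis=1).max()

# numpy computes the same induced operator norms directly.
print(norm1, norm2, norminf)
assert np.isclose(norm1, np.linalg.norm(A, 1))
assert np.isclose(norm2, np.linalg.norm(A, 2))
assert np.isclose(norminf, np.linalg.norm(A, np.inf))
```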
We conclude this section with the following useful result for continuous linear transformations.

7.1.27. Theorem. Let T ∈ L(X, Y). Then T is continuous if and only if T is bounded.

The proof of this theorem follows readily from Theorem 5.7.8. We leave the details as an exercise.
Throughout this section X and Y denote vector spaces over the same field F, where F is either R (the real numbers) or C (the complex numbers).

We recall that a linear operator T: X → Y has an inverse, T⁻¹, if it is injective, and if this is so, then T⁻¹ is a linear operator from ℜ(T) onto X (see Theorem 3.4.32). We have the following result concerning the continuity of T⁻¹.

The next result, called the Neumann expansion theorem, gives us important information concerning the existence of the inverse of a certain class of bounded linear transformations.
7.2.2. Theorem. Let T ∈ B(X, X), and let I denote the identity operator on X. If ||T|| < 1, then (I − T)⁻¹ exists, (I − T)⁻¹ ∈ B(X, X),

(I − T)⁻¹ = Σ_{n=0}^∞ Tⁿ,

and

||(I − T)⁻¹|| ≤ 1/(1 − ||T||).  (7.2.3)

Proof. Since ||T|| < 1, it follows that the series Σ_{n=0}^∞ ||T||ⁿ converges. In view of Theorem 7.1.16 we have ||Tⁿ|| ≤ ||T||ⁿ, and hence the series Σ_{n=0}^∞ Tⁿ converges in the space B(X, X), because this space is complete in view of Theorem 7.1.24. If we set

S = Σ_{n=0}^∞ Tⁿ,

then

ST = TS = Σ_{n=0}^∞ Tⁿ⁺¹,

and

(I − T)S = S(I − T) = I.

It now follows from Theorem 3.4.65 that (I − T)⁻¹ exists and is equal to S. Furthermore, S ∈ B(X, X). The inequality (7.2.3) now follows readily and is left as an exercise. ∎
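A finite-dimensional sketch of the Neumann expansion; the matrix T is an arbitrary choice with ||T|| < 1, not taken from the text.

```python
import numpy as np

# If ||T|| < 1, then (I - T)^{-1} = sum_{n=0}^inf T^n and
# ||(I - T)^{-1}|| <= 1/(1 - ||T||).
T = np.array([[0.2, 0.1],
              [0.0, 0.3]])
I = np.eye(2)
assert np.linalg.norm(T, 2) < 1    # hypothesis of the theorem

S = np.zeros_like(T)
term = I.copy()
for _ in range(200):               # partial sums of the Neumann series
    S += term
    term = term @ T

direct = np.linalg.inv(I - T)
bound = 1.0 / (1.0 - np.linalg.norm(T, 2))
print(np.max(np.abs(S - direct)))  # the partial sums converge to (I - T)^{-1}
assert np.linalg.norm(direct, 2) <= bound + 1e-12   # inequality (7.2.3)
```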
x = Σ_{k=1}^∞ xₖ. We now compute ||xₙ||. First, we see that
7.2.8. Proposition. If {Aₙ} is any countable collection of subsets of X such that X = ∪_{n=1}^∞ Aₙ, then there is a sphere S(x₀; ε) ⊂ X and a set Aₙ such that S(x₀; ε) ⊂ Āₙ.

x ∈ ∩_{k=1}^∞ Kₖ.

Then x ∉ Aₙ for all n. But this contradicts the fact that X = ∪_{n=1}^∞ Aₙ. This completes the proof of the proposition. ∎
Clearly, Y = ∪_{k=1}^∞ Aₖ. By Proposition 7.2.8 there is a sphere S(y₀; ε) ⊂ Y and a set Aₙ such that S(y₀; ε) ⊂ Āₙ. We may assume that y₀ ∈ Aₙ. Let ρ be such that 0 < ρ < ε, and let us define the sets B and B₀ by

B = {y ∈ S(y₀; ε): ρ < ||y − y₀||}

and

B₀ = {y ∈ Y: y = z − y₀, z ∈ B}.
We now show that there is an Aₙ such that B₀ ⊂ Āₙ. Let y ∈ B ∩ Aₙ. Then y − y₀ ∈ B₀. We then have

||T⁻¹(y − y₀)|| ≤ ||T⁻¹y|| + ||T⁻¹y₀||
≤ n[||y|| + ||y₀||]
≤ n[||y − y₀|| + 2||y₀||]
= n||y − y₀||[1 + 2||y₀||/||y − y₀||]
so that

T(Σ_{k=1}^∞ xₖ) = Σ_{k=1}^∞ Txₖ = Σ_{k=1}^∞ yₖ = y.

Hence, x = T⁻¹y. Therefore, ||x|| = ||T⁻¹y|| ≤ 3K||y||. This implies that T⁻¹ is bounded, which was to be proved. ∎
or Tx = λx. The unique solution has to be x = 0, because T0 = 0.

(ii) Let L = I − T. Then ||L|| = ||I − T|| < 1. It now follows from Theorem 7.2.2 that (I − L)⁻¹ exists and is continuous on X. Thus, (I − L)⁻¹ = T⁻¹ exists and is continuous on X. This completes the proof of part (ii).

The proofs of the remaining parts are left as an exercise. ∎
7.3. CONJUGATE AND ADJOINT OPERATORS

7.3.1. Figure A
7.3.3. Exercise. Show that the conjugate operator T' is unique and linear.
The reader is cautioned that many authors use the terms conjugate operator
and adjoint operator interchangeably. Also, the symbol T* is used by many
authors to denote both adjoint and conjugate operators.
Some of the important properties of conjugate operators are summarized
in the following result.
x ∈ X, Tx = y, and we have

⟨x, x′⟩ = ⟨T⁻¹y, x′⟩ = ⟨y, (T⁻¹)′x′⟩ = ⟨Tx, (T⁻¹)′x′⟩ = ⟨x, T′(T⁻¹)′x′⟩.
7.3.7. Exercise. Prove parts (ii), (iii), (v), and (vi) of Theorem 7.3.6.
7.3.8. Theorem. Let X, Y, and Z be Hilbert spaces, and let I and 0 denote the identity and zero transformations on X, respectively. Then
(i) ||T*|| = ||T||, where T ∈ B(X, Y);
(ii) I* = I;
(iii) 0* = 0;
(iv) (S + T)* = S* + T*, where S, T ∈ B(X, Y);
(v) (αT)* = ᾱT*, where T ∈ B(X, Y) and α ∈ F;
(vi) (ST)* = T*S*, where T ∈ B(X, Y), S ∈ B(Y, Z);
(vii) if T⁻¹ ∈ B(Y, X) exists, then (T*)⁻¹ ∈ B(X, Y) exists, and moreover (T*)⁻¹ = (T⁻¹)*;
(viii) if for T ∈ B(X, Y) we define (T*)* = T**, then T** = T; and
(ix) ||T*T|| = ||T||², where T ∈ B(X, Y).
Proof. To prove part (i) we note that

||T*x||² = |(T*x, T*x)| = |(T(T*x), x)| ≤ ||T(T*x)|| ||x|| ≤ ||T|| ||T*x|| ||x||,

or

||T*x|| ≤ ||T|| ||x||.

From the last inequality it follows that ||T*|| ≤ ||T||. Reversing the roles of T and T* we obtain

||Tx||² = |(Tx, Tx)| = |(T*(Tx), x)| ≤ ||T*(Tx)|| ||x|| ≤ ||T*|| ||Tx|| ||x||,

or

||Tx|| ≤ ||T*|| ||x||.

From this it follows that ||T|| ≤ ||T*||, and therefore ||T|| = ||T*||.
The proofs of properties (ii)-(viii) are trivial. To prove part (ix), we first note that

||T*T|| ≤ ||T*|| ||T|| = ||T|| ||T|| = ||T||².

On the other hand,

||Tx||² = (Tx, Tx) = (T*Tx, x) ≤ ||T*Tx|| ||x|| ≤ ||T*T|| ||x|| ||x||.

Taking the square root on both sides of the above inequality we obtain

||Tx|| ≤ √(||T*T||) ||x||,

and thus ||T|| ≤ √(||T*T||), or ||T||² ≤ ||T*T||. Hence, ||T*T|| = ||T||². ∎
7.3.10. Example. Let X = Cⁿ be the Hilbert space with inner product defined in Example 3.6.24, and let A ∈ L(X, X) be represented (with respect to the natural basis for X) by the n × n matrix A = [aᵢⱼ]. The transformation y = Ax can be written in the form

ηᵢ = Σ_{j=1}^n aᵢⱼξⱼ, i = 1, 2, ..., n,

where ηᵢ is the ith component of the vector y ∈ X. Let A* denote the adjoint of A on the Hilbert space X, and let A* be represented by the n × n matrix [a*ᵢⱼ]. Now if u = (u₁, ..., uₙ) ∈ X, then

(Ax, u) = (y, u) = Σ_{i=1}^n ηᵢūᵢ = Σ_{i=1}^n ūᵢ(Σ_{j=1}^n aᵢⱼξⱼ),

and

(x, A*u) = Σ_{i=1}^n ξᵢ · conj(Σ_{j=1}^n a*ᵢⱼuⱼ).

In order that (Ax, u) = (x, A*u) we must have a*ᵢⱼ = āⱼᵢ; i.e., the matrix of A* is the transpose of the conjugate of the matrix of A. ∎
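The conclusion of this example, that the matrix of A* is the conjugate transpose of the matrix of A, can be verified numerically for the inner product (x, u) = Σᵢ xᵢūᵢ; the random matrix and vectors below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
x = rng.normal(size=n) + 1j * rng.normal(size=n)
u = rng.normal(size=n) + 1j * rng.normal(size=n)

def inner(a, b):
    # Inner product on C^n: (a, b) = sum_i a_i * conj(b_i)
    return np.sum(a * np.conj(b))

Astar = A.conj().T                 # transpose of the conjugate of A
print(inner(A @ x, u), inner(x, Astar @ u))
assert np.isclose(inner(A @ x, u), inner(x, Astar @ u))
```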
and define the Fredholm operator T by

[Tx](s) = ∫ₐᵇ k(s, t)x(t) dt,

where it is assumed that the kernel function k(s, t) is well enough behaved so that

∫ₐᵇ ∫ₐᵇ |k(s, t)|² dt ds < ∞.

Now if u ∈ L₂[a, b], then
To prove (v), let y ∈ 𝔑(T*). Then T*y = 0 and TT*y = 0. This implies that 𝔑(T*) ⊂ 𝔑(TT*). Next, let y ∈ 𝔑(TT*). Then TT*y = 0 and (y, TT*y) = 0. This implies that (T*y, T*y) = 0, so that T*y = 0. Therefore, y ∈ 𝔑(T*) and 𝔑(TT*) ⊂ 𝔑(T*), completing the proof of part (v). ∎
The proof of the next result follows by direct verification of the formulas
involved.
U = (1/2)[A + A*] and V = (1/2i)[A − A*],
7.4.18. Example. Let X = L₂[a, b] (see Example 6.11.10), and define T ∈ B(X, X) by

y = Tx = tx(t).

Then for any z ∈ X we have

z = Tx = ∫ x(s) ds.
We conclude this section with the following result, which we will sub-
sequently require.
7.5.5. Exercise. Prove Theorem 7.5.4. Recall that U and V are unique by
Theorem 7.4.15.
Note that in view of Theorem 6.12.16, Definitions 3.7.12 and 7.5.19 are
consistent.
The proof of the next theorem is straightforward.
The next result follows immediately from part (iii) of the preceding
theorem.
7.5.29. Theorem. Let Y₁ and Y₂ be closed linear subspaces of X, and let P₁ and P₂ be the orthogonal projections onto Y₁ and Y₂, respectively. The difference transformation P = P₁ − P₂ is an orthogonal projection if and only if P₂ ≤ P₁. The range of P is Y₁ ∩ Y₂⊥.
7.5.31. Example. Let R denote the transformation from E² into E² given in Example 4.10.48. That transformation is represented by the matrix

R = [cos θ  −sin θ
     sin θ   cos θ]

with respect to an orthonormal basis {e₁, e₂}. By direct computation we obtain

R* = [ cos θ  sin θ
      −sin θ  cos θ].

It readily follows that R*R = RR* = I. Therefore, R is a linear transformation which is isometric, unitary, and normal. ∎
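These computations are easy to reproduce numerically; the angle below is an arbitrary choice.

```python
import numpy as np

theta = 0.7                        # any rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Rstar = R.T                        # real matrix: the adjoint is the transpose

# R*R = RR* = I: R is unitary (hence normal) ...
assert np.allclose(Rstar @ R, np.eye(2))
assert np.allclose(R @ Rstar, np.eye(2))

# ... and isometric: rotations preserve the Euclidean norm.
x = np.array([3.0, -4.0])
print(np.linalg.norm(x), np.linalg.norm(R @ x))
```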
are other ways that a complex number λ may fail to be in ρ(T). These possibilities are enumerated in the following definition.

7.6.3. Definition. The set of all eigenvalues of T is called the point spectrum of T. The set of all λ such that (T − λI)⁻¹ exists but ℜ(T − λI) is not dense in X is called the residual spectrum of T. The set of all λ such that (T − λI)⁻¹ exists and such that ℜ(T − λI) is dense in X but (T − λI)⁻¹ is not continuous is called the continuous spectrum. We denote these sets by Pσ(T), Rσ(T), and Cσ(T), respectively.
Next, assume that λ ∉ Pσ(T), so that (T − λI)⁻¹ exists, and let us investigate the continuity of (T − λI)⁻¹. We see that if y = (η₁, η₂, ...) ∈ ℜ(T − λI), then (T − λI)⁻¹y = x is given by

ξₖ = ηₖ/(λₖ − λ), k = 1, 2, ....
and

ρ(T) = [Pσ(T) ∪ Cσ(T)]ᶜ. ∎

7.6.7. Theorem. Let T ∈ B(X, X). If |λ| > ||T||, then λ ∈ ρ(T) or, equivalently, if λ ∈ σ(T), then |λ| ≤ ||T||.
7.6.9. Theorem. Let T ∈ B(X, X). Then ρ(T) is open and σ(T) is closed.

Proof. Since σ(T) is the complement of ρ(T), it is closed if and only if ρ(T) is open. Let λ₀ ∈ ρ(T). Then (T − λ₀I) has a continuous inverse. For arbitrary λ we now have

||I − (T − λ₀I)⁻¹(T − λI)||
= ||(T − λ₀I)⁻¹(T − λ₀I) − (T − λ₀I)⁻¹(T − λI)||
= ||(T − λ₀I)⁻¹[(T − λ₀I) − (T − λI)]||
= ||(λ − λ₀)(T − λ₀I)⁻¹||
= |λ − λ₀| ||(T − λ₀I)⁻¹||.

Now for |λ − λ₀| sufficiently small, we have

||I − (T − λ₀I)⁻¹(T − λI)|| = |λ − λ₀| ||(T − λ₀I)⁻¹|| < 1.

Now in Theorem 7.2.2 we showed that if T ∈ B(X, X), then T has a continuous inverse if ||I − T|| < 1. In our case it now follows that (T − λ₀I)⁻¹(T − λI) has a continuous inverse, and therefore (T − λI) has a continuous inverse whenever |λ − λ₀| is sufficiently small. This implies that λ ∈ ρ(T) and ρ(T) is open. Hence, σ(T) is closed. ∎
The next two results indicate what happens to the spectrum of an operator
T when it is subjected to various elementary transformations.
7.6.11. Theorem. Let T ∈ B(X, X), and let p(T) denote a polynomial in T. Then

σ(p(T)) = p(σ(T)) = {p(λ): λ ∈ σ(T)}.
7.6.12. Exercise. Prove Theorem 7.6.11.
(T⁻¹ − λI) = (λ⁻¹I − T)λT⁻¹.

It follows that if λ⁻¹ ∉ σ(T), then (T⁻¹ − λI) has a continuous inverse; i.e., λ⁻¹ ∉ σ(T) implies that λ ∉ σ(T⁻¹). In other words, σ(T⁻¹) ⊂ [σ(T)]⁻¹. To prove that [σ(T)]⁻¹ ⊂ σ(T⁻¹) we proceed similarly, interchanging the roles of T and T⁻¹. ∎
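For matrices, where the spectrum consists of eigenvalues, the relation σ(T⁻¹) = [σ(T)]⁻¹ can be observed directly; the matrix below is an illustrative choice.

```python
import numpy as np

T = np.array([[2.0, 1.0],
              [0.0, 5.0]])         # invertible, eigenvalues 2 and 5

eig_T = np.sort(np.linalg.eigvals(T))
eig_Tinv = np.sort(np.linalg.eigvals(np.linalg.inv(T)))

# The spectrum of T^{-1} is the set of reciprocals of the spectrum of T.
print(eig_T, eig_Tinv)
assert np.allclose(np.sort(1.0 / eig_T), eig_Tinv)
```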
for all x ∈ X. But this implies that λ ∉ σ(T), contrary to the original assumption. Hence, it must follow that λ = λ̄, which implies that λ is real.

To prove (ii), first note that ||T|| ≥ sup{|λ|: λ ∈ σ(T)} for any T ∈ B(X, X) (see Theorem 7.6.7). To show that equality holds when T is hermitian, we first must show that ||T||² ∈ σ(T²). For all real λ and all x ∈ X we can write

||T²x − λ²x||² = (T²x − λ²x, T²x − λ²x) = (T²x, T²x) − (T²x, λ²x) − (λ²x, T²x) + λ⁴(x, x).

Since (T²x, x) = (Tx, T*x) = (Tx, Tx), we now have

(T²x − λ²x, T²x − λ²x) = (T²x, T²x) − 2λ²(Tx, Tx) + λ⁴(x, x),

or

||T²x − λ²x||² = ||T²x||² − 2λ²||Tx||² + λ⁴||x||².  (7.6.19)

Now let {xₙ} be a sequence of unit vectors such that ||Txₙ|| → ||T||. If λ = ||T||, then we have, from Eq. (7.6.19),

||T²xₙ − λ²xₙ||² = ||T²xₙ||² − 2λ²||Txₙ||² + λ⁴
≤ (||T|| ||Txₙ||)² − 2λ²||Txₙ||² + λ⁴ = λ²||Txₙ||² − 2λ²||Txₙ||² + λ⁴
= λ⁴ − λ²||Txₙ||² → 0 as n → ∞;
In the following we let T ∈ B(X, X), λ ∈ C, and we let 𝔑_λ(T) be the null space of T − λI; i.e.,

𝔑_λ(T) = {x ∈ X: (T − λI)x = 0} = 𝔑(T − λI).  (7.6.23)

It follows from Theorem 7.1.26 that 𝔑_λ(T) is a closed linear subspace of X.
F o r the next result, recall Definition 3.7.9 for the meaning of an invariant
subspace.
Before considering the last result of this section, we make the following
definition.
Throughout this section X is a normed linear space over the field ofcomplex
numbers C.
Recall that a set Y c X is bounded if there is a constant k such that for
all x E Y we have II x II < k. Also, recall that a set Y is relatively compact
if each sequence x{ n } of elements chosen from Y contains a convergent
subsequence (see Definition 5.6.30 and Theorem 5.6.31). When Y contains only a finite number of elements, any sequence constructed from Y must include some element infinitely many times, and thus such a sequence contains a convergent subsequence. From this it follows that any set containing a finite number
of elements is relatively compact. Every relatively compact set is contained
in a compact set and hence is bounded. F o r the finite-dimensional case it is
also true that every bounded set is relatively compact (e.g., in Rn the Bolzano-
Weierstrass theorem guarantees this). However, in the infinite-dimensional
case it does not follow that every bounded set is also relatively compact.
In analysis and in applications linear operators which transform bounded
sets into relatively compact sets are of great importance. Such operators are
called completely continuous operators or compact operators. We give the
following formal definition.
7.7.5. Example. Let X = C[a, b], and let || · ||_∞ be the norm on C[a, b] as defined in Example 6.1.9. Let k: [a, b] × [a, b] → R be a real-valued function continuous on the square a ≤ s ≤ b, a ≤ t ≤ b. Defining T: X → X by

[Tx](s) = ∫ₐᵇ k(s, t)x(t) dt

for all x ∈ X, we saw in Example 7.1.20 that T is a bounded linear operator. We now show that T is completely continuous.

Let {xₙ} be a bounded sequence in X; i.e., there is a K > 0 such that ||xₙ||_∞ ≤ K for all n. It readily follows that if yₙ = Txₙ, then ||yₙ|| ≤ γ₀||xₙ||, where γ₀ = sup_{a≤s≤b} ∫ₐᵇ |k(s, t)| dt (see Example 7.1.20). We now show that {yₙ} is an equicontinuous set of functions on [a, b] (see Definition 5.8.11). Let ε > 0. Then, because of the uniform continuity of k on [a, b] × [a, b], there is a δ > 0 such that |k(s₁, t) − k(s₂, t)| < ε/[K(b − a)] if |s₁ − s₂| < δ for every t ∈ [a, b]. Thus

|yₙ(s₁) − yₙ(s₂)| ≤ ∫ₐᵇ |k(s₁, t) − k(s₂, t)| |xₙ(t)| dt < ε

for all n and all s₁, s₂ such that |s₁ − s₂| < δ. This implies the set {yₙ} is equicontinuous, and so by the Arzelà-Ascoli theorem (Theorem 5.8.12), the set {yₙ} is relatively compact in C[a, b]; i.e., it has a convergent subsequence. This implies that T is completely continuous.
It can be shown that if X = L₂[a, b] and if T is the Fredholm operator defined in Example 7.3.11, then T is also a completely continuous operator.
The proof of the next result utilizes what is called the diagonalization process.

7.7.17. Theorem. Let X and Y be Banach spaces, and let {Tₙ} be a sequence of completely continuous operators mapping X into Y. If the sequence {Tₙ} converges in norm to an operator T, then T is completely continuous.

Proof. Let {xₙ} be an arbitrary sequence in X with ||xₙ|| ≤ 1. We must show that the sequence {Txₙ} contains a convergent subsequence.

By assumption, T₁ is a completely continuous operator, and thus we can select a convergent subsequence from the sequence {T₁xₙ}. Let

Now each of the operators T₁, T₂, T₃, ..., Tₖ, ... transforms this sequence into a convergent sequence. To show that T is completely continuous we must show that T also transforms this sequence into a convergent sequence. Now

||Tx_nn − Tx_mm|| = ||Tx_nn − Tₖx_nn + Tₖx_nn − Tₖx_mm + Tₖx_mm − Tx_mm||
≤ ||Tx_nn − Tₖx_nn|| + ||Tₖx_nn − Tₖx_mm|| + ||Tₖx_mm − Tx_mm||
≤ ||T − Tₖ||(||x_nn|| + ||x_mm||) + ||Tₖx_nn − Tₖx_mm||;

i.e.,

||Tx_nn − Tx_mm|| ≤ ||T − Tₖ||(||x_nn|| + ||x_mm||) + ||Tₖx_nn − Tₖx_mm||.

Since the sequence {Tₖx_nn} converges, we can choose m, n > N such that ||Tₖx_nn − Tₖx_mm|| < ε/2, and also we can choose k so that ||T − Tₖ|| < ε/4. We now have

||Tx_nn − Tx_mm|| < ε
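Norm limits of finite-rank operators are the motivating case for this theorem. A finite matrix only hints at the mechanism (every matrix is trivially compact), but the truncated singular value decomposition gives a concrete sequence of finite-rank operators Tₖ converging in norm to T; the random matrix below is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
T = rng.normal(size=(6, 6))
U, s, Vt = np.linalg.svd(T)

errors = []
for k in range(1, 7):
    T_k = (U[:, :k] * s[:k]) @ Vt[:k, :]   # rank-k truncation of T
    errors.append(np.linalg.norm(T - T_k, 2))

# ||T - T_k||_2 equals the (k+1)st singular value, so it decreases to 0.
print(errors)
assert all(e1 >= e2 - 1e-12 for e1, e2 in zip(errors, errors[1:]))
```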
(a) xₖ ∈ Nₖ for every k; and
(b) Σ_{k=1}^∞ xₖ = x.
Proof. We first prove the equivalence of statements (i) and (ii). Let Y = ∪ₙ Nₙ. Then Y ⊂ Y⊥⊥ by Theorem 6.12.8. Furthermore, Y⊥⊥ is the smallest
Hence, Σ_{k=1}^∞ ||xₖ||² < ∞. Next, let x₀ = Σ_{k=1}^∞ xₖ. Then x₀ ∈ X. For fixed j, let y ∈ Nⱼ. Then (x − x₀, y) = (xⱼ + yⱼ − x₀, y) = (xⱼ, y) + (yⱼ, y) − (x₀, y) = (xⱼ, y) − (Σ_{k=1}^∞ xₖ, y) = (xⱼ, y) − Σ_{k=1}^∞ (xₖ, y) = (xⱼ, y) − (xⱼ, y) = 0.

Now suppose that x = Σ_{k=1}^∞ xₖ = Σ_{k=1}^∞ x′ₖ, where xₖ, x′ₖ ∈ Nₖ for all k. Then Σ_{k=1}^∞ (xₖ − x′ₖ) = 0. Since xₖ − x′ₖ ∈ Nₖ we have (xₖ − x′ₖ) ⊥ (xⱼ − x′ⱼ) for j ≠ k, and so

||Σ_{k=1}^∞ (xₖ − x′ₖ)||² = Σ_{k=1}^∞ ||xₖ − x′ₖ||² = 0.

Thus, ||xₖ − x′ₖ|| = 0 for all k, and xₖ is unique for each k.

To prove that (iii) implies (i), assume that x ∈ Nₖ⊥ for every k. By hypothesis, x = Σ_{k=1}^∞ xₖ, where xₖ ∈ Nₖ for all k. Hence, for any j we have
(iv) T = Σ_{j=1}^∞ λⱼPⱼ.

Proof. The proof of each part follows readily from results already obtained. We simply indicate the principal results needed and leave the details as an exercise.

Part (i) follows from the definition of orthogonal projection. Part (ii) follows from part (ii) of Theorem 7.6.26. Parts (iii) and (iv) follow from Theorems 7.1.27 and 7.8.5. ∎
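The representation T = Σⱼ λⱼPⱼ is finite for a real symmetric (hence normal) matrix and can be formed explicitly; the matrix below is an illustrative choice.

```python
import numpy as np

T = np.array([[2.0, 1.0],
              [1.0, 2.0]])         # symmetric, hence normal
lam, V = np.linalg.eigh(T)         # eigenvalues with orthonormal eigenvectors

recon = np.zeros_like(T)
for j in range(len(lam)):
    v = V[:, [j]]
    P_j = v @ v.T                  # orthogonal projection onto the eigenspace
    recon += lam[j] * P_j          # accumulate lambda_j * P_j

print(recon)                       # equals T: T = sum_j lambda_j P_j
assert np.allclose(recon, T)
```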
(7.9.2)

The Gateaux differential of f is sometimes also called the weak differential of f or the G-differential of f. If f is Gateaux differentiable at x₀, then δf(x₀, h) need not be linear nor continuous as a function of h ∈ X. However, we shall primarily be concerned with functions f: X → Y which have these properties. This gives rise to the following concept.
Proof. Let F(x₀) = f′(x₀), let ε > 0, and let h ∈ X. Then there is a δ > 0 such that

||f(x₀ + th) − f(x₀) − F(x₀)th|| < ε ||th||

provided that ||th|| < δ and th ≠ 0. This implies that

||(1/t)[f(x₀ + th) − f(x₀)] − F(x₀)h|| < ε ||h||
7.9.6. Example. Let X = Rⁿ and let ||·|| be any norm on X. By Theorem 6.6.5, X is a Banach space. Now let f be a functional defined on X; i.e., f: X → R. Let x = (ξ₁, ..., ξₙ) ∈ X and h = (h₁, ..., hₙ) ∈ X. If f has continuous partial derivatives with respect to ξᵢ, i = 1, ..., n, then the Frechet differential of f is given by

δf(x, h) = (∂f(x)/∂ξ₁)h₁ + ... + (∂f(x)/∂ξₙ)hₙ.

For fixed x₀ ∈ X, define

F(x₀)h = Σ_{i=1}^n (∂f(x)/∂ξᵢ)|_{x=x₀} hᵢ for h ∈ X.

Then F(x₀) is the Frechet derivative of f at x₀. As in the preceding example, we do not distinguish between X and X*, and we write
7.9.9. Example. Let X and Y be real Hilbert spaces, and let L be a bounded
linear operator from X into ;Y i.e., L E B(X , )Y . eL t L * be the adjoint of L .
eL t v be a fixed element in ,Y and let/be a real-valued functional defined on
Xby
f(x ) = IIv - L x 11 1 for all x E .X
Then f has a Frechet derivative which is given by
f' ( x ) = -2L*v + 2L*Lx.
To verify this, observe that
f(x ) = (v - Lx, v- Lx) = (v, v) - 2(v, L x ) + (Lx, Lx)
= (v, v) - 2(L*v, )x + (x , L * L x ) .
The conclusion now follows from Examples 7.9.5 and 7.9.8. •
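The formula f′(x) = −2L*v + 2L*Lx can be checked against a difference quotient; for real matrices L* = Lᵀ, and the sizes and random data below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
L = rng.normal(size=(5, 3))
v = rng.normal(size=5)
x = rng.normal(size=3)
h = rng.normal(size=3)

def f(z):
    # f(z) = ||v - Lz||^2
    return np.sum((v - L @ z) ** 2)

grad = -2 * L.T @ v + 2 * L.T @ (L @ x)    # f'(x) from the example

t = 1e-6
fd = (f(x + t * h) - f(x)) / t             # directional difference quotient
print(fd, grad @ h)
assert abs(fd - grad @ h) < 1e-3
```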
7.9.10. Example. Let X = Rⁿ, and let Y = Rᵐ. Since X and Y are finite dimensional, we may assume arbitrary norms on each of these spaces and they will both be Banach spaces. Let f: X → Y. For x = (ξ₁, ..., ξₙ) ∈ X, let us write

f(x) = (f₁(x), ..., fₘ(x)) = (f₁(ξ₁, ..., ξₙ), ..., fₘ(ξ₁, ..., ξₙ)).

For x₀ ∈ X, assume that the partial derivatives

∂fᵢ(x)/∂ξⱼ |_{x=x₀} ≜ ∂fᵢ(x₀)/∂ξⱼ

exist and are continuous for i = 1, ..., m and j = 1, ..., n. The Frechet differential of f at x₀ with increment h = (h₁, ..., hₙ) ∈ X is given by

δf(x₀, h) = [∂fᵢ(x₀)/∂ξⱼ] h,

where [∂fᵢ(x₀)/∂ξⱼ] denotes the m × n matrix of partial derivatives. The Frechet derivative of f at x₀ is given by the Jacobian matrix

f′(x₀) = [∂fᵢ(x₀)/∂ξⱼ].
f given by

δf(x₀, h) = ∫ k(s, t) [∂g(t, x₀(t))/∂x₀(t)] h(t) dt. ∎
By the continuity of g (see the proof of part (i) of Theorem 7.9.13), it follows that ||d|| ≤ M·||h|| for some constant M. Hence, there is a constant k such that

||ψ(x + h) − ψ(x) − f′(y)g′(x)h|| ≤ kε ||h||.

This implies that ψ′(x) exists and ψ′(x) = f′(g(x))g′(x). ∎
7.9.17. Example. Let X = Rⁿ and Y = Rᵐ, and let us assume that the natural basis for each of these spaces is being used (see Example 4.1.15). If A ∈ B(X, Y), then Ax is given in matrix representation by

Ax = [aᵢⱼ]x,

where [aᵢⱼ] is the m × n matrix representing A with respect to these bases.
Since

|φ(f(x + h)) − φ(f(x))| = |φ(f(x + h) − f(x))| = |φ(y)| = ||φ|| · ||f(x₀ + h) − f(x₀)||,

it follows that

||f(x₀ + h) − f(x₀)|| ≤ sup_{0<t<1} ||f′(x₀ + th)|| · ||h||. ∎
We conclude the present section by showing that the Gateaux and Frechet
differentials play a role in maximizing and minimizing functionals which is
similar to that of the ordinary derivative of functions of real variables.
Let F = R, and let f be a functional on X; i.e., f: X → R. Clearly, for fixed x₀, h ∈ X, we may define a function g: R → R by the relation g(t) = f(x₀ + th) for all t ∈ R. In this case, if f is Gateaux differentiable at x₀, we see that δf(x₀, h) = g′(t)|_{t=0}, where g′(t) is the usual derivative of g(t). We will need this property in proving our next result, Theorem 7.9.22. First, however, we require the following important concept.

Proof. As pointed out in the remark preceding Definition 7.9.21, the real-valued function g(t) = f(x₀ + th) must have an extremum at t = 0. From the ordinary calculus we must have g′(t)|_{t=0} = 0. Hence, δf(x₀, h) = 0 for all h ∈ X. ∎
(7.10.4)

Eq. (7.10.4) for arbitrary y ∈ X. We want to show that Tx − λx = y. From

Thus, P₀y = −λP₀x and Pⱼy = λⱼPⱼx − λPⱼx. Now from the spectral theorem (Theorem 7.8.7), we have y = P₀y + Σ_{j=1}^∞ Pⱼy, Tx = Σ_{j=1}^∞ λⱼPⱼx, and

x = −(1/λ)P₀y + Σ_{k=1, k≠j}^∞ Pₖy/(λₖ − λ).  (7.10.6)
where x(0) = x₀ is given. Here x(t) ∈ Rⁿ and u(t) ∈ Rᵐ for every t such that 0 ≤ t ≤ T, and

x(t) = Φ(t, 0)x₀ + ∫₀ᵗ Φ(t, τ)Bu(τ) dτ,  (7.10.9)

where Φ(t, τ) is the state transition matrix for the system of equations given in Eq. (7.10.8).

Let us now define the class of vector-valued functions L₂ᵐ[0, T] by

L₂ᵐ[0, T] = {u: uᵀ = (u₁, ..., uₘ), where uᵢ ∈ L₂[0, T], i = 1, ..., m}.
r
If we define the inner product by
(u, v) = uT(t)v(l)dl
for u, v E Lr[O, 1',] then it follows that Lr[O, T] is a Hilbert space (see
Example 6.11.11). Next, let us define the linear operator L : Lr[O, T] - +
Li[O, 1'] by
[Lu](I) = I .(1, r- )BU(f)d-r (7.10.10)
for all U E Lr[O, 1'.] Since the elements of .(1, r- ) are continuous functions
on 0[ , T] X 0[ , T], it follows that L is completely continuous.
Now recall from Exercise 5.10.59 that Eq. (7.10.9) is the unique solution to Eq. (7.10.8) when the elements of the vector u(t) are continuous functions of t. It can be shown that the solution of Eq. (7.10.8) exists in an extended sense if we permit u ∈ L₂ᵐ[0, T]. Allowing for this generalization, we can now consider the following optimal control problem. Let γ ∈ R be such that γ > 0, and let f be the real-valued functional defined on L₂ᵐ[0, T] given by

f(u) = ∫₀ᵀ xᵀ(t)x(t) dt + γ ∫₀ᵀ uᵀ(t)u(t) dt.  (7.10.11)
Let

v(t) = −Φ(t, 0)x₀ for 0 < t ≤ T.

Then we can rewrite Eq. (7.10.9) as

x = Lu − v,

and Eq. (7.10.11) assumes the form

f(u) = ||Lu − v||² + γ||u||².

We can find the desired minimizing u in the more general context of arbitrary real Hilbert spaces by means of the following result.
The solution to Eq. (7.10.14) can be obtained from Eq. (7.10.4); however, a more convenient method is available for finding the solution when L is given by Eq. (7.10.10). This is summarized in the following result.

for arbitrary w ∈ L₂ⁿ[0, T]. We compute

(Lu, w) = ∫₀ᵀ ∫₀ˢ uᵀ(t)BᵀΦᵀ(s, t)w(s) dt ds = ∫₀ᵀ uᵀ(t)[∫ₜᵀ BᵀΦᵀ(s, t)w(s) ds] dt.

In order for this last expression to equal (L*w, u), we must have

[L*w](t) = ∫ₜᵀ BᵀΦᵀ(s, t)w(s) ds

for all t such that 0 ≤ t ≤ T. Now assume there exists a matrix P(t) such that
We now find conditions for such a matrix P(t) to exist. First, we see that P(T) = 0. Next, differentiating both sides of Eq. (7.10.17) with respect to t, and noting that (∂/∂t)Φᵀ(s, t) = −AᵀΦᵀ(s, t), we have

Ṗ(t)x(t) + P(t)ẋ(t) = −x(t) − AᵀP(t)x(t).

Therefore,

Ṗ(t)x(t) + P(t)[Ax(t) + Bu(t)] = −x(t) − AᵀP(t)x(t).

But

u(t) = −(1/γ)L*x(t) = −(1/γ)BᵀP(t)x(t),

so that

Ṗ(t)x(t) + P(t)Ax(t) − (1/γ)P(t)BBᵀP(t)x(t) = −x(t) − AᵀP(t)x(t).

Hence, P(t) must satisfy

Ṗ(t) = −AᵀP(t) − P(t)A + (1/γ)P(t)BBᵀP(t) − I

with P(T) = 0.
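A Riccati equation of this form can be integrated backward from the terminal condition P(T) = 0. The sketch below uses explicit Euler steps; the matrices A and B, the horizon, and γ are illustrative assumptions, not data from the text.

```python
import numpy as np

# Pdot = -A^T P - P A + (1/gamma) P B B^T P - I,  P(T) = 0.
A = np.array([[0.0, 1.0],
              [-1.0, -0.5]])
B = np.array([[0.0],
              [1.0]])
gamma, T_final, steps = 1.0, 2.0, 2000
dt = T_final / steps

P = np.zeros((2, 2))               # terminal condition P(T) = 0
for _ in range(steps):             # step backward from t = T to t = 0
    Pdot = -A.T @ P - P @ A + (1.0 / gamma) * P @ B @ B.T @ P - np.eye(2)
    P = P - dt * Pdot

print(P)                           # P(0); symmetric by construction
```

The control is then recovered pointwise as u(t) = −(1/γ)BᵀP(t)x(t).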
Eq. (7.10.20) has a unique solution, say x₀, and x₀ minimizes f(x). Iterative methods are based on beginning with an initial guess of the solution of Eq. (7.10.20) and then successively attempting to improve the estimate according to a recursive relationship of the form

xₙ₊₁ = xₙ + αₙrₙ,  (7.10.21)

where αₙ ∈ R and rₙ ∈ X. Different methods of selecting αₙ and rₙ give rise to various algorithms for minimizing f(x) given in Eq. (7.10.18) or, equivalently, for finding the solution to Eq. (7.10.20). In this part we shall in particular consider the method of steepest descent. In doing so we let

rₙ = w − Mxₙ, n = 1, 2, ....  (7.10.22)

The term rₙ defined by Eq. (7.10.22) is called the residual of the approximation xₙ. If, in particular, xₙ satisfies Eq. (7.10.20), we see that the residual is zero. For f(x) given in Eq. (7.10.18), we see that

f′(xₙ) = −2rₙ,

where f′(xₙ) denotes the gradient of f(xₙ). That is, the residual rₙ is "pointing" in the direction of the negative of the gradient, or in the direction of steepest descent. Equation (7.10.21) indicates that the correction term αₙrₙ is to be a scalar multiple of the gradient, and thus the steepest descent method constitutes an example of one of the so-called "gradient methods." With rₙ given by Eq. (7.10.22), αₙ is chosen so that f(xₙ + αₙrₙ) is minimum. Substituting xₙ + αₙrₙ into Eq. (7.10.18), it is readily shown that

αₙ = (rₙ, rₙ)/(rₙ, Mrₙ)

is the minimizing value. This method is illustrated pictorially in Figure B.
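The complete iteration, with the residual as search direction and the exact line-search step αₙ = (rₙ, rₙ)/(rₙ, Mrₙ), can be sketched as follows; the positive definite matrix M and the vector w are illustrative assumptions.

```python
import numpy as np

M = np.array([[4.0, 1.0],
              [1.0, 3.0]])         # symmetric, positive definite
w = np.array([1.0, 2.0])

x = np.zeros(2)                    # initial guess
for n in range(100):
    r = w - M @ x                  # residual; points down the gradient
    if np.linalg.norm(r) < 1e-12:
        break
    alpha = (r @ r) / (r @ (M @ r))    # minimizing step length
    x = x + alpha * r              # x_{n+1} = x_n + alpha_n r_n

print(x, np.linalg.solve(M, w))    # the iterates converge to M^{-1} w
```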
Proof. In view of the Schwarz inequality we have (x, Mx) ≤ ||Mx|| ||x||. This implies that η||x|| ≤ ||Mx|| for all x ∈ X, and so M is a bijective mapping by Theorem 7.4.21, with M⁻¹ ∈ B(X, X) and ||M⁻¹|| ≤ 1/η. By Theorem 7.4.10, M⁻¹ is also self-adjoint. Let x₀ be the unique solution to Eq. (7.10.20), and define F: X → R by

F(x) = (x − x₀, M(x − x₀)) for x ∈ X.

We see that F is minimized uniquely by x = x₀, and furthermore F(x₀) = 0. We now show that limₙ F(xₙ) = 0. If for some n, F(xₙ) = 0, the process

F(xₙ₊₁) = F(xₙ)[1 − (rₙ, rₙ)²/((rₙ, Mrₙ)(M⁻¹rₙ, rₙ))].

Hence, F(xₙ₊₁) ≤ (1 − η/||M||)F(xₙ) ≤ (1 − η/||M||)ⁿF(x₁). Thus, limₙ F(xₙ) = 0 and so xₙ → x₀, which was to be proven. ∎