
MAT1320-Linear Algebra Lecture Notes

Ismail Aydogdu

December 2019

1 Matrices
If we examine the method of elimination described in Section 1.1, we can make the
following observation: Only the numbers in front of the unknowns x1 , x2 , . . . , xn and
the numbers b1 , b2 , . . . , bm on the right side are being changed as we perform the steps in
the method of elimination. Thus we might think of looking for a way of writing a linear
system without having to carry along the unknowns. Matrices enable us to do this -
that is, to write linear systems in a compact form that makes it easier to automate the
elimination method by using computer software in order to obtain a fast and efficient
procedure for finding solutions. The use of matrices, however, is not merely that of
a convenient notation. We now develop operations on matrices and will work with
matrices according to the rules they obey; this will enable us to solve systems of linear
equations and to handle other computational problems in a fast and efficient manner.
Of course, as any good definition should do, the notion of a matrix not only provides a
new way of looking at old problems, but also gives rise to a great many new questions,
some of which we study in this lecture.

Definition 1. An m × n matrix A is a rectangular array of mn real or complex


numbers arranged in m horizontal rows and n vertical columns:

        [ a11  a12  · · ·  a1n ]
        [ a21  a22  · · ·  a2n ]
    A = [  .    .    aij    .  ]                                    (1)
        [  .    .     .     .  ]
        [ am1  am2  · · ·  amn ]

The entry aij appears at the intersection of the ith row and the jth column.

The ith row of A is

[ai1 , ai2 , . . . , ain ] (1 ≤ i ≤ m)

the jth column of A is

    [ a1j ]
    [ a2j ]
    [  .  ]        (1 ≤ j ≤ n)
    [  .  ]
    [ amj ]
We shall say that A is m by n (written as m × n). If m = n, we say that A is a square
matrix of order n and that the numbers a11 , a22 , . . . , ann form the main diagonal
of A. We refer to the number aij , which is in the ith row and jth column of A, as the
i, jth element of A, or the (i, j) entry of A, and we often write (1) as

    A = [aij ] .

Example 2. Let

    A = [  1  2  3 ] ,    B = [ 1 + i    4i ] ,    C = [  1 ]
        [ −1  0  1 ]          [ 2 − 3i   −3 ]          [ −1 ]
                                                       [  2 ]

    D = [ 1  1  0 ] ,    E = [3] ,    F = [ −1  0  2 ]
        [ 2  0  1 ]
        [ 3 −1  2 ]

Then A is a 2 × 3 matrix with a12 = 2, a13 = 3, a22 = 0, and a23 = 1; B is a 2 × 2


matrix with b11 = 1 + i, b12 = 4i, b21 = 2 − 3i, and b22 = −3; C is a 3 × 1 matrix with
c11 = 1, c21 = −1, and c31 = 2; D is a 3 × 3 matrix; E is a 1 × 1 matrix; and F is a
1 × 3 matrix. In D, the elements d11 = 1, d22 = 0, and d33 = 2 form the main diagonal.

An n × 1 matrix is also called an n-vector and is denoted by a lowercase boldface letter.
When n is understood, we refer to n-vectors merely as vectors. Vectors are discussed
in later sections.
 
Example 3. The vector

    u = [  1 ]
        [  2 ]
        [ −1 ]
        [  3 ]

is a 4-vector, and

    v = [  1 ]
        [ −1 ]
        [  0 ]

is a 3-vector.
The n-vector all of whose entries are zero is denoted by 0. Observe that if A is an
n × n matrix, then the rows of A are 1 × n matrices and the columns of A are n × 1
matrices. The set of all n-vectors with real entries is denoted by Rn . Similarly, the set
of all n-vectors with complex entries is denoted by Cn . As we have already pointed out,
we work almost entirely with vectors in Rn in these notes.

Definition 4. Two m × n matrices A = [aij ] and B = [bij ] are equal if they agree
entry by entry, that is, if aij = bij for i = 1, 2, . . . , m and j = 1, 2, . . . , n.

Example 5. The matrices


   
1 2 −1 1 2 w
A =  2 −3 4  and B =  2 x 4 
0 −4 5 y −4 z

are equal if and only if w = −1, x = −3, y = 0, and z = 5.

Definition 6. If A = [aij ] is an n × n matrix, then the trace of A, Tr(A), is defined
as the sum of all elements on the main diagonal of A:

    Tr(A) = Σ_{i=1}^{n} aii = a11 + a22 + · · · + ann .
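Definition 6 translates directly into code. Here, as a sketch, matrices are plain nested Python lists, and `trace` is an illustrative helper name rather than notation from the notes:

```python
# A minimal sketch of the trace from Definition 6.
def trace(A):
    n = len(A)
    # Tr(A) is defined only for square matrices.
    assert all(len(row) == n for row in A), "trace requires a square matrix"
    return sum(A[i][i] for i in range(n))
```

For the matrix D of Example 2, this sums the main diagonal d11 + d22 + d33 = 1 + 0 + 2 = 3.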

1.1 Matrix Operations


We next define a number of operations that will produce new matrices out of given
matrices. When we are dealing with linear systems, for example, this will enable us
to manipulate the matrices that arise and to avoid writing down systems over and
over again. These operations and manipulations are also useful in other applications of
matrices.

Definition 7 (Matrix Addition). If A = [aij ] and B = [bij ] are both m × n matri-


ces, then the sum A + B is an m × n matrix C = [cij ] defined by cij = aij + bij , i =
1, 2, . . . , m; j = 1, 2, . . . , n. Thus, to obtain the sum of A and B, we merely add corre-
sponding entries.

Example 8. Let

    A = [ 1 −2  3 ]        and        B = [ 0  2   1 ] .
        [ 2 −1  4 ]                       [ 1  3  −4 ]

Then

    A + B = [ 1 + 0   −2 + 2   3 + 1    ]  =  [ 1  0  4 ] .
            [ 2 + 1   −1 + 3   4 + (−4) ]     [ 3  2  0 ]

If x is an n-vector, then it is easy to show that x + 0 = x, where 0 is the n-vector all


of whose entries are zero.
It should be noted that the sum of the matrices A and B is defined only when A and
B have the same number of rows and the same number of columns, that is, only when
A and B are of the same size.
We now make the convention that when A + B is written, both A and B are of the
same size.
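The entrywise rule of Definition 7, together with the same-size convention just stated, can be sketched as follows (matrices as nested lists; `mat_add` is a hypothetical helper name):

```python
# Sketch of matrix addition (Definition 7): add corresponding entries.
def mat_add(A, B):
    # The sum is defined only when A and B are of the same size.
    assert len(A) == len(B) and len(A[0]) == len(B[0])
    return [[a + b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(A, B)]
```

Applied to the matrices of Example 8, `mat_add` returns [[1, 0, 4], [3, 2, 0]].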

Definition 9 (Scalar Multiplication). If A = [aij ] is an m × n matrix and r is a
real number, then the scalar multiple of A by r, rA, is the m × n matrix C = [cij ]
where cij = raij , i = 1, 2, . . . , m and j = 1, 2, . . . , n; that is, the matrix C is obtained by
multiplying each entry of A by r.
Example 10. We have

    −2 [ 4 −2 −3 ]  =  [ (−2)(4)  (−2)(−2)  (−2)(−3) ]  =  [  −8   4   6 ] .
       [ 7 −3  2 ]     [ (−2)(7)  (−2)(−3)  (−2)(2)  ]     [ −14   6  −4 ]
Thus far, addition of matrices has been defined for only two matrices. Our work with
matrices will call for adding more than two matrices. Theorem 1.1 in Section 1.4 shows
that addition of matrices satisfies the associative property: A + (B + C) = (A + B) + C.
If A and B are m×n matrices, we write A+(−1)B as A−B and call this the difference
between A and B.
Example 11. Let

    A = [ 2  3 −5 ]        and        B = [ 2 −1  3 ] .
        [ 4  2  1 ]                       [ 3  5 −2 ]

Then

    A − B = [ 2 − 2   3 + 1   −5 − 3 ]  =  [ 0   4  −8 ] .
            [ 4 − 3   2 − 5    1 + 2 ]     [ 1  −3   3 ]

We shall sometimes use the summation notation, and we now review this useful and
compact notation.
By Σ_{i=1}^{n} ai we mean a1 + a2 + · · · + an . The letter i is called the index of summation;
it is a dummy variable that can be replaced by another letter. Hence we can write

    Σ_{i=1}^{n} ui = Σ_{j=1}^{n} uj = Σ_{k=1}^{n} uk .

Thus

    Σ_{i=1}^{4} ai = a1 + a2 + a3 + a4 .
If A1 , A2 , . . . , Ak are m × n matrices and c1 , c2 , . . . , ck are real numbers, then an expres-
sion of the form

c1 A 1 + c2 A 2 + · · · + ck A k (2)

is called a linear combination of A1 , A2 , . . . , Ak , and c1 , c2 , . . . , ck are called coefficients.
The linear combination in Equation (2) can also be expressed in summation notation as

    Σ_{i=1}^{k} ci Ai = c1 A1 + c2 A2 + · · · + ck Ak .

Example 12. The following are linear combinations of matrices:

          [ 0 −3  5 ]            [  5  2  3 ]
    3     [ 2  3  4 ]  −  (1/2)  [  6  2  3 ] ,
          [ 1 −2 −3 ]            [ −1 −2  3 ]

    2 [ 3 −2 ] − 3 [ 5  0 ] + 4 [ −2  5 ] ,

           [  1 ]          [ 0.1 ]
    −0.5   [ −4 ]  +  0.4  [ −4  ] .
           [ −6 ]          [ 0.2 ]
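A linear combination as in Equation (2) can be sketched entrywise (matrices as nested lists; the helper name is illustrative):

```python
# Sketch of c1 A1 + c2 A2 + ... + ck Ak from Equation (2).
def linear_combination(coeffs, mats):
    rows, cols = len(mats[0]), len(mats[0][0])
    # Entry (i, j) of the result is the sum of c * (A)_{ij} over all pairs.
    return [[sum(c * A[i][j] for c, A in zip(coeffs, mats))
             for j in range(cols)]
            for i in range(rows)]
```

For the second combination of Example 12, `linear_combination([2, -3, 4], [[[3, -2]], [[5, 0]], [[-2, 5]]])` gives [[-17, 16]].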
 
Definition 13. If A = [aij ] is an m × n matrix, then the transpose of A, A^T = [a^T_ij ],
is the n × m matrix defined by a^T_ij = aji . Thus the transpose of A is obtained from A by
interchanging the rows and columns of A.
Example 14. Let

    A = [ 4 −2  3 ] ,    B = [ 6  2 −4 ] ,    C = [  5  4 ] ,
        [ 0  5 −2 ]          [ 3 −1  2 ]          [ −3  2 ]
                             [ 0  4  3 ]          [  2 −3 ]

    D = [ 3 −5 1 ] ,    E = [  2 ]
                            [ −1 ]
                            [  3 ]

Then

    A^T = [  4  0 ] ,    B^T = [  6  3  0 ] ,    C^T = [ 5 −3  2 ] ,
          [ −2  5 ]            [  2 −1  4 ]            [ 4  2 −3 ]
          [  3 −2 ]            [ −4  2  3 ]

    D^T = [  3 ] ,    and    E^T = [ 2 −1 3 ] .
          [ −5 ]
          [  1 ]
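Definition 13 can be sketched in one line with Python's `zip` (matrices as nested lists; `transpose` is an illustrative name):

```python
# Sketch of the transpose (Definition 13): rows and columns interchange,
# so (A^T)_{ij} = a_{ji}.
def transpose(A):
    return [list(column) for column in zip(*A)]
```

For A of Example 14, `transpose(A)` returns [[4, 0], [-2, 5], [3, -2]], matching A^T above.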

1.2 Matrix Multiplication


In this section we introduce the operation of matrix multiplication. Unlike matrix addi-
tion, matrix multiplication has some properties that distinguish it from multiplication
of real numbers.
Definition 15. The dot product, or inner product, of the n-vectors

    a = [ a1 ]        and        b = [ b1 ]
        [ a2 ]                       [ b2 ]
        [  . ]                       [  . ]
        [ an ]                       [ bn ]

in Rn is defined as

    a · b = a1 b1 + a2 b2 + · · · + an bn = Σ_{i=1}^{n} ai bi .

The dot product is an important operation that will be used here and in later sections.

Example 16. The dot product of

    u = [  1 ]        and        v = [  2 ]
        [ −2 ]                       [  3 ]
        [  3 ]                       [ −2 ]
        [  4 ]                       [  1 ]

is

    u · v = (1)(2) + (−2)(3) + (3)(−2) + (4)(1) = −6.
   
Example 17. Let

    a = [ x ]        and        b = [ 4 ] .
        [ 2 ]                       [ 1 ]
        [ 3 ]                       [ 2 ]

If a · b = −4, find x.
We have

    a · b = 4x + 2 + 6 = −4
    4x + 8 = −4
    x = −3.
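Definition 15 translates directly into code; vectors are plain Python lists and `dot` is an illustrative helper name:

```python
# Sketch of the dot product (Definition 15): sum of products of
# corresponding entries.
def dot(a, b):
    assert len(a) == len(b), "dot product requires vectors of the same length"
    return sum(x * y for x, y in zip(a, b))
```

This reproduces Example 16: `dot([1, -2, 3, 4], [2, 3, -2, 1])` gives −6.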

Definition 18 (Matrix Multiplication). If A = [aij ] is an m×p matrix and B = [bij ]


is a p×n matrix, then the product of A and B, denoted AB, is the m×n matrix C = [cij ],
defined by

    cij = ai1 b1j + ai2 b2j + · · · + aip bpj
        = Σ_{k=1}^{p} aik bkj        (1 ≤ i ≤ m, 1 ≤ j ≤ n).        (3)

Equation (3) says that the (i, j) entry of the product matrix is the dot product of
(rowi (A))^T , the transpose of the ith row of A, with colj (B), the jth column of B.
Observe that the product of A and B is defined only when the number of rows of B is
exactly the same as the number of columns of A.

Example 19. Let

    A = [ 1  2 −1 ]        and        B = [ −2  5 ] .
        [ 3  1  4 ]                       [  4 −3 ]
                                          [  2  1 ]

Then

    AB = [ (1)(−2) + (2)(4) + (−1)(2)    (1)(5) + (2)(−3) + (−1)(1) ]  =  [ 4 −2 ] .
         [ (3)(−2) + (1)(4) + (4)(2)     (3)(5) + (1)(−3) + (4)(1)  ]     [ 6 16 ]
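Definition 18 can be sketched as a triple loop over Equation (3); matrices are nested lists and `mat_mul` is an illustrative helper name:

```python
# Sketch of matrix multiplication (Definition 18): entry (i, j) of AB is
# the sum over k of a_{ik} b_{kj}, per Equation (3).
def mat_mul(A, B):
    p = len(B)
    assert len(A[0]) == p, "columns of A must equal rows of B"
    return [[sum(A[i][k] * B[k][j] for k in range(p))
             for j in range(len(B[0]))]
            for i in range(len(A))]
```

With the matrices of Example 19, `mat_mul(A, B)` returns [[4, -2], [6, 16]].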
Example 20. Let

    A = [ 1 −2  3 ]        and        B = [  1  4 ] .
        [ 4  2  1 ]                       [  3 −1 ]
        [ 0  1 −2 ]                       [ −2  2 ]

Compute the (3, 2) entry of AB.
If AB = C, then the (3, 2) entry of AB is c32 , which is (row3 (A))^T · col2 (B). We now
have

    (row3 (A))^T · col2 (B) = [  0 ] · [  4 ]  =  (0)(4) + (1)(−1) + (−2)(2) = −5.
                              [  1 ]   [ −1 ]
                              [ −2 ]   [  2 ]
Example 21. Let

    A = [ 1  x  3 ]        and        B = [ 2 ] .
        [ 2 −1  1 ]                       [ 4 ]
                                          [ y ]

If AB = [ 12 ], find x and y.
        [  6 ]

We have

    AB = [ 1  x  3 ] [ 2 ]  =  [ 2 + 4x + 3y ]  =  [ 12 ] .
         [ 2 −1  1 ] [ 4 ]     [ 4 − 4 + y   ]     [  6 ]
                     [ y ]

Then

    2 + 4x + 3y = 12
    y = 6,

so x = −2 and y = 6.

The basic properties of matrix multiplication will be considered in the next section.
However, multiplication of matrices requires much more care than their addition, since
the algebraic properties of matrix multiplication differ from those satisfied by the real
numbers. Part of the problem is due to the fact that AB is defined only when the
number of columns of A is the same as the number of rows of B. Thus, if A is an m × p
matrix and B is a p × n matrix, then AB is an m × n matrix. What about BA? Four
different situations may occur:

1. BA may not be defined; this will take place if n ≠ m.

2. If BA is defined, which means that m = n, then BA is p × p while AB is m × m;
thus if m ≠ p, AB and BA are of different sizes.

3. If AB and BA are both of the same size, they may be equal.

4. If AB and BA are both of the same size, they may be unequal.
Example 22. Let

    A = [  1  2 ]        and        B = [ 2  1 ] .
        [ −1  3 ]                       [ 0  1 ]

Then

    AB = [  2  3 ]        while        BA = [  1  7 ] .
         [ −2  2 ]                          [ −1  3 ]

Thus AB ≠ BA.

Remark 23. If u and v are n-vectors (n × 1 matrices), then it is easy to show by
matrix multiplication (Exercise 41) that

u·v = uT v.

1.3 Algebraic Properties of Matrix Operations


In this section we consider the algebraic properties of the matrix operations just de-
fined. Many of these properties are similar to the familiar properties that hold for real
numbers. However, there will be striking differences between the set of real numbers
and the set of matrices in their algebraic behavior under certain operations, for exam-
ple, under multiplication (as seen in Section 1.2). The proofs of most of the properties
will be left as exercises.

Theorem 24 (Properties of Matrix Addition). Let A, B, and C be m × n matrices.

a) A + B = B + A.

b) A + (B + C) = (A + B) + C.

c) There is a unique m × n matrix O such that

A+O =A (4)

for any m × n matrix A. The matrix O is called the m × n zero matrix.

d) For each m × n matrix A, there is a unique m × n matrix D such that

A + D = O (5)

We shall write D as −A, so (5) can be written as

A + (−A) = O.

The matrix −A is called the negative of A. We also note that −A is (−1)A.

Proof.

a) Let A = [aij ], B = [bij ], A + B = C = [cij ], and B + A = D = [dij ].
We must show that cij = dij for all i, j. Now cij = aij + bij and dij = bij + aij
for all i, j. Since aij and bij are real numbers, we have aij + bij = bij + aij which
implies that cij = dij for all i, j.

c) Let U = [uij ] . Then A + U = A if and only if aij + uij = aij , which holds if and
only if uij = 0. Thus U is the m × n matrix all of whose entries are zero: U is
denoted by O .

Example 25. If

    A = [ 4 −1 ]
        [ 2  3 ]

then

    [ 4 −1 ] + [ 0 0 ]  =  [ 4 + 0  −1 + 0 ]  =  [ 4 −1 ] .
    [ 2  3 ]   [ 0 0 ]     [ 2 + 0   3 + 0 ]     [ 2  3 ]

The 2 × 3 zero matrix is

    O = [ 0 0 0 ] .
        [ 0 0 0 ]

Theorem 26 (Properties of Matrix Multiplication).

a) If A, B, and C are matrices of the appropriate sizes, then

A(BC) = (AB)C.

b) If A, B, and C are matrices of the appropriate sizes, then

(A + B)C = AC + BC.

c) If A, B, and C are matrices of the appropriate sizes, then

C(A + B) = CA + CB. (6)

Proof. a) Suppose that A is m × n, B is n × p, and C is p × q. We shall prove the result


for the special case m = 2, n = 3, p = 4, and q = 3. The general proof is completely
analogous.
Let A = [aij ], B = [bij ], C = [cij ], AB = D = [dij ], BC = E = [eij ], (AB)C = F =
[fij ], and A(BC) = G = [gij ]. We must show that fij = gij for all i, j. Now

    fij = Σ_{k=1}^{4} dik ckj = Σ_{k=1}^{4} ( Σ_{r=1}^{3} air brk ) ckj

and

    gij = Σ_{r=1}^{3} air erj = Σ_{r=1}^{3} air ( Σ_{k=1}^{4} brk ckj ) .

Then, by the properties satisfied by the summation notation,

    fij = Σ_{k=1}^{4} ( ai1 b1k + ai2 b2k + ai3 b3k ) ckj
        = Σ_{k=1}^{4} ai1 b1k ckj + Σ_{k=1}^{4} ai2 b2k ckj + Σ_{k=1}^{4} ai3 b3k ckj
        = Σ_{r=1}^{3} air ( Σ_{k=1}^{4} brk ckj ) = gij .

Example 27. Let

    A = [ 5  2  3 ] ,    B = [ 2 −1  1  0 ]
        [ 2 −3  4 ]          [ 0  2  2  2 ]
                             [ 3  0 −1  3 ]

and

    C = [ 1  0  2 ]
        [ 2 −3  0 ]
        [ 0  0  3 ]
        [ 2  1  0 ] .

Then

    A(BC) = [ 5  2  3 ] [ 0  3  7 ]  =  [ 43 16 56 ]
            [ 2 −3  4 ] [ 8 −4  6 ]     [ 12 30  8 ]
                        [ 9  3  3 ]

and

    (AB)C = [ 19 −1  6 13 ] [ 1  0  2 ]  =  [ 43 16 56 ] .
            [ 16 −8 −8  6 ] [ 2 −3  0 ]     [ 12 30  8 ]
                            [ 0  0  3 ]
                            [ 2  1  0 ]
Example 28. Let

    A = [ 2  2  3 ] ,    B = [ 0  0  1 ] ,    and    C = [ 1  0 ] .
        [ 3 −1  2 ]          [ 2  3 −1 ]                 [ 2  2 ]
                                                         [ 3 −1 ]

Then

    (A + B)C = [ 2  2  4 ] [ 1  0 ]  =  [ 18  0 ]
               [ 5  2  1 ] [ 2  2 ]     [ 12  3 ]
                           [ 3 −1 ]

and (verify)

    AC + BC = [ 15  1 ] + [ 3 −1 ]  =  [ 18  0 ] .
              [  7 −4 ]   [ 5  7 ]     [ 12  3 ]

Theorem 29 (Properties of Scalar Multiplication). If r and s are real numbers


and A and B are matrices of the appropriate sizes, then

a) r(sA) = (rs)A

b) (r + s)A = rA + sA

c) r(A + B) = rA + rB

d) A(rB) = r(AB) = (rA)B

Example 30. Let

    A = [ 4  2  3 ]        and        B = [ 3 −2  1 ] .
        [ 2 −3  4 ]                       [ 2  0 −1 ]
                                          [ 0  1  2 ]

Then

    2(3A) = 2 [ 12  6  9 ]  =  [ 24  12  18 ]  =  6A.
              [  6 −9 12 ]     [ 12 −18  24 ]

We also have

    A(2B) = [ 4  2  3 ] [ 6 −4  2 ]  =  [ 32 −10 16 ]  =  2(AB).
            [ 2 −3  4 ] [ 4  0 −2 ]     [  0   0 26 ]
                        [ 0  2  4 ]

Example 31. Scalar multiplication can be used to change the size of entries in a matrix
to meet prescribed properties. Let

    A = [ 3 ]
        [ 7 ]
        [ 2 ]
        [ 1 ] .

Then for k = 1/7, the largest entry of kA is 1. Also, if the entries of A represent the
volume of products in gallons, then for k = 4, kA gives the volume in quarts.

So far we have seen that multiplication and addition of matrices have much in common
with multiplication and addition of real numbers. We now look at some properties of
the transpose.

Theorem 32 (Properties of Transpose). If r is a scalar and A and B are matrices


of the appropriate sizes, then

a) (AT )T = A.

b) (A + B)T = AT + B T .

c) (AB)T = B T AT .

d) (rA)T = rAT .

Proof. We leave the proofs of a), b), and d) as an exercise.

c) Let A = [aij ] and B = [bij ]; let AB = C = [cij ]. We must prove that c^T_ij is the
(i, j) entry in B^T A^T . Now

    c^T_ij = cji = Σ_{k=1}^{n} ajk bki = Σ_{k=1}^{n} a^T_kj b^T_ik
           = Σ_{k=1}^{n} b^T_ik a^T_kj = the (i, j) entry in B^T A^T .

Example 33. Let

    A = [  1  2  3 ]        and        B = [ 3 −1  2 ] .
        [ −2  0  1 ]                       [ 3  2 −1 ]

Then

    A^T = [ 1 −2 ]        and        B^T = [  3  3 ] .
          [ 2  0 ]                         [ −1  2 ]
          [ 3  1 ]                         [  2 −1 ]

Also,

    A + B = [ 4  1  5 ]        and        (A + B)^T = [ 4  1 ] .
            [ 1  2  0 ]                               [ 1  2 ]
                                                      [ 5  0 ]

Now

    A^T + B^T = [ 4  1 ]  =  (A + B)^T .
                [ 1  2 ]
                [ 5  0 ]
Example 34. Let

    A = [ 1  3  2 ]        and        B = [ 0  1 ] .
        [ 2 −1  3 ]                       [ 2  2 ]
                                          [ 3 −1 ]

Then

    AB = [ 12  5 ]        and        (AB)^T = [ 12  7 ] .
         [  7 −3 ]                            [  5 −3 ]

On the other hand,

    A^T = [ 1  2 ]        and        B^T = [ 0  2  3 ] .
          [ 3 −1 ]                         [ 1  2 −1 ]
          [ 2  3 ]

Then

    B^T A^T = [ 12  7 ]  =  (AB)^T .
              [  5 −3 ]

Remark 35. We also note two other peculiarities of matrix multiplication. If a and b
are real numbers, then ab = 0 can hold only if a or b is zero. However, this is not true
for matrices.

Example 36. If

    A = [ 1  2 ]        and        B = [  4 −6 ]
        [ 2  4 ]                       [ −2  3 ]

then neither A nor B is the zero matrix, but

    AB = [ 0  0 ] .
         [ 0  0 ]
Remark 37. If a, b, and c are real numbers for which ab = ac and a ≠ 0, it follows
that b = c. That is, we can cancel out the nonzero factor a. However, the cancellation
law does not hold for matrices, as the following example shows.

Example 38. If

    A = [ 1  2 ] ,    B = [ 2  1 ] ,    and    C = [ −2  7 ]
        [ 2  4 ]          [ 3  2 ]                 [  5 −1 ]

then

    AB = AC = [  8  5 ]
              [ 16 10 ]

but B ≠ C.

Remark 39. We summarize some of the differences between matrix multiplication and
the multiplication of real numbers as follows: For matrices A, B, and C of the appro-
priate sizes,

1. AB need not equal BA.

2. AB may be the zero matrix with A ≠ O and B ≠ O.

3. AB may equal AC with B ≠ C.

1.4 Special Types of Matrices and Partitioned Matrices


We have already introduced one special type of matrix O, the matrix all of whose
entries are zero. We now consider several other types of matrices whose structures are
rather specialized and for which it will be convenient to have special names. An n × n
matrix A = [aij ] is called a diagonal matrix if aij = 0 for i 6= j. Thus, for a diagonal
matrix, the terms off the main diagonal are all zero. Note that O is a diagonal matrix.
A scalar matrix is a diagonal matrix whose diagonal elements are equal. The scalar
matrix In = [dij ], where dii = 1 and dij = 0 for i 6= j, is called then n × n identity
matrix .

Example 40. Let
     
1 0 0 2 0 0 1 0 0
A =  0 2 0 , B =  0 2 0 , and I3 =  0 1 0 
0 0 3 0 0 2 0 0 1

Then A, B, and I3 are diagonal matrices; B and I3 are scalar matrices; and I3 is the
3 × 3 identity matrix.

It is easy to show (Exercise 1 ) that if A is any m × n matrix, then

AIn = A and Im A = A

Also, if A is a scalar matrix, then A = rIn for some scalar r.


Suppose that A is a square matrix. We now define the powers of a matrix, for p a
positive integer, by

    A^p = A · A · · · A        (p factors).

If A is n × n, we also define

    A^0 = In .

For non-negative integers p and q, the familiar laws of exponents for the real numbers
can also be proved for matrix multiplication of a square matrix A (Exercise 8):

    A^p A^q = A^(p+q)        and        (A^p)^q = A^(pq) .

It should also be noted that the rule

    (AB)^p = A^p B^p

does not hold for square matrices unless AB = BA (Exercise 9).
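The definition of A^p above, with A^0 = In, can be sketched as repeated multiplication (helper names are illustrative; a production version would use repeated squaring):

```python
# Sketch of matrix powers: A^p = A * A * ... * A (p factors), A^0 = I_n.
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_pow(A, p):
    n = len(A)
    # Start from the identity matrix, which realizes A^0 = I_n.
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(p):
        result = mat_mul(result, A)
    return result
```

For example, `mat_pow([[1, 1], [0, 1]], 3)` gives [[1, 3], [0, 1]].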


An n × n matrix A = [aij ] is called upper triangular if aij = 0 for i > j. It is called
lower triangular if aij = 0 for i < j. A diagonal matrix is both upper triangular and
lower triangular.

Example 41. The matrix

    A = [ 1  3  3 ]
        [ 0  3  5 ]
        [ 0  0  2 ]

is upper triangular, and

    B = [ 1  0  0 ]
        [ 2  3  0 ]
        [ 3  5  2 ]

is lower triangular.

Definition 42. A matrix A with real entries is called symmetric if AT = A.
Definition 43. A matrix A with real entries is called skew symmetric if AT = −A.
 
1 2 3
Example 44. A =  2 4 5  is a symmetric matrix.
3 5 6
 
0 2 3
Example 45. B =  −2 0 −4  is a skew symmetric matrix.
−3 4 0

It follows from the preceding definitions that:

• If A is symmetric or skew symmetric, then A is a square matrix.


• If A is a symmetric matrix, then the entries of A are symmetric with respect to
the main diagonal of A. Also, A is symmetric if and only if aij = aji .
• A is skew symmetric if and only if aij = −aji . Moreover, if A is skew symmetric,
then the entries on the main diagonal of A are all zero.
• An important property of symmetric and skew symmetric matrices is the follow-
ing: If A is an n × n matrix, then we can show that A = S + K, where S is
symmetric and K is skew symmetric. Moreover, this decomposition is unique
(Exercise 29) .
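The decomposition A = S + K in the last bullet is usually realized by the standard choice S = (A + A^T)/2 and K = (A − A^T)/2; the notes leave the construction to Exercise 29, so this particular choice is an assumption here, and the helper name is illustrative:

```python
# Sketch of splitting a square matrix into symmetric and skew symmetric
# parts: S = (A + A^T)/2 is symmetric, K = (A - A^T)/2 is skew symmetric,
# and S + K = A.
def sym_skew_parts(A):
    n = len(A)
    S = [[(A[i][j] + A[j][i]) / 2 for j in range(n)] for i in range(n)]
    K = [[(A[i][j] - A[j][i]) / 2 for j in range(n)] for i in range(n)]
    return S, K
```

For A = [[1, 2], [0, 3]] this gives S = [[1, 1], [1, 3]] and K = [[0, 1], [−1, 0]], which indeed sum to A.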
Definition 46. Let A be an n × n matrix and p be a positive integer. If Ap+1 = A then
A is called a periodic matrix. If p is the least such integer, then the matrix is said to
have period p. If p = 1 then A2 = A and A is called idempotent.
Definition 47. Let A be an n × n matrix and q be a positive integer. If Aq = O then
A is called a nilpotent matrix. The least such positive integer q is called the index
(or, degree) of nilpotency.
Definition 48. Let A be an n × n matrix. If A2 = In then A is called an involutory
matrix.
Definition 49. An n × n real matrix A is called orthogonal if AAT = AT A = In .
 
Example 50. The matrix

    A = (1/√2) [ 1  1 ]
               [ 1 −1 ]

is an orthogonal matrix.
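Definition 49 can be checked numerically. For a square real matrix, AA^T = In already forces A^T A = In (compare Remark 61), so one product suffices; a floating-point tolerance absorbs rounding error. `is_orthogonal` is an illustrative helper name:

```python
# Sketch of an orthogonality check: verify that A A^T is (numerically)
# the identity matrix I_n.
def is_orthogonal(A, tol=1e-9):
    n = len(A)
    At = [list(col) for col in zip(*A)]
    AAt = [[sum(A[i][k] * At[k][j] for k in range(n)) for j in range(n)]
           for i in range(n)]
    return all(abs(AAt[i][j] - (1.0 if i == j else 0.0)) < tol
               for i in range(n) for j in range(n))
```

With s = 1/√2, the matrix [[s, s], [s, −s]] of Example 50 passes this check.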

Partitioned Matrices

If we start with an m × n matrix A = [aij ] and then cross out some, but not all of its
rows or columns, we obtain a submatrix of A.

Example 51. Let  
1 2 3 4
A =  −2 4 −3 5 .
3 0 5 −3
If we cross out the second row and third column, we get the submatrix
 
1 2 4
.
3 0 −3

A matrix can be partitioned into submatrices by drawing horizontal lines between rows
and vertical lines between columns. Of course, the partitioning can be carried out in
many different ways.
Example 52. The matrix

        [ a11 a12 a13 a14 a15 ]
    A = [ a21 a22 a23 a24 a25 ]  =  [ A11 A12 ]
        [ a31 a32 a33 a34 a35 ]     [ A21 A22 ]
        [ a41 a42 a43 a44 a45 ]

can be partitioned as indicated previously. We could also write

        [ a11 a12 a13 a14 a15 ]
    A = [ a21 a22 a23 a24 a25 ]  =  [ Â11 Â12 Â13 ]
        [ a31 a32 a33 a34 a35 ]     [ Â21 Â22 Â23 ]
        [ a41 a42 a43 a44 a45 ]

which gives another partitioning of A. We thus speak of partitioned matrices.
Example 53. Let
 
1 0 1 0  
 0 2 3 −1  A11 A 12
A= =
 2 0 −4 0  A21 A22
0 1 0 3

and let  
2 0 0 1 1 −1  
 0 1 1 −1 2 2  B11 B12
B= =
 1 3 0 0 1 0  B21 B22
−3 −1 2 1 0 −1
Then  
3 3 0 1 2 −1  
 6 12 0 −3 7 5  C11 C12
AB = C = 
  =
0 −12 0 2 −2 −2  C21 C22
−9 −2 7 2 2 −1

where, for example, C11 = A11 B11 + A12 B21 . We verify this as follows:
     
1 0 2 0 0 1 0 1 3 0
A11 B11 + A12 B21 = +
0 2 0 1 1 3 −1 −3 −1 2
   
2 0 0 1 3 0
= +
0 2 2 6 10 −2
 
3 3 0
= = C11
6 12 0

• This method of multiplying partitioned matrices is also known as block multipli-


cation.

• Partitioned matrices can be used to great advantage when matrices exceed the
memory capacity of a computer.

• Thus, in multiplying two partitioned matrices, one can keep the matrices on disk
and bring into memory only the submatrices required to form the submatrix
products.

• The products, of course, can be written back to disk as they are formed. The
partitioning must be done in such a way that the products of corresponding
submatrices are defined.

Partitioning of a matrix implies a subdivision of the information into blocks, or units.


The reverse process is to consider individual matrices as blocks and adjoin them to form
a partitioned matrix. The only requirement is that after the blocks have been joined,
all rows have the same number of entries and all columns have the same number of
entries.
Definition 54. Let A be an m × n matrix with complex entries. The matrix Ā, whose
entries are the complex conjugates of the corresponding entries of A, is called the
(complex) conjugate of A.

Example 55. Let

    A = [ 1 + i   2 − i ] .
        [   3      4i   ]

Then the complex conjugate Ā of the matrix A is

    Ā = [ 1 − i   2 + i ] .
        [   3     −4i   ]

Properties of Conjugate Matrices. If A and B are complex matrices of appropriate
sizes and k is a scalar, then:

a) the conjugate of Ā is A;
b) the conjugate of kA is k̄ Ā;
c) the conjugate of A + B is Ā + B̄;
d) the conjugate of AB is Ā B̄;
e) the conjugate of A^T is (Ā)^T.

Definition 56. An n × n complex matrix A is called Hermitian if (Ā)^T = A. This
is equivalent to saying that āji = aij for all i and j.

∗ Every real symmetric matrix is Hermitian, so we may consider Hermitian matrices


as the analogs of real symmetric matrices.

∗ The entries on the main diagonal of a Hermitian matrix are all real numbers.
 
Example 57. The matrix

    A = [   1      i    1 + i ]
        [  −i     −5    2 + i ]
        [ 1 − i  2 − i    3   ]

is Hermitian since

    Ā = [   1     −i    1 − i ]        and        (Ā)^T = [   1      i    1 + i ]  =  A.
        [   i     −5    2 − i ]                           [  −i     −5    2 + i ]
        [ 1 + i  2 + i    3   ]                           [ 1 − i  2 − i    3   ]
Definition 58. An n × n complex matrix A is called skew-Hermitian if (Ā)^T = −A.
This is equivalent to saying that āji = −aij for all i and j.

∗ The entries on the main diagonal of a skew-Hermitian matrix are zero or purely
imaginary.
 
Example 59. Let

    A = [    i     1 − i   2 ]
        [ −1 − i    3i     i ]
        [   −2       i     0 ] .

Then we have

    Ā = [   −i     1 + i    2 ]        and        (Ā)^T = [   −i   −1 + i  −2 ]  =  −A.
        [ −1 + i   −3i     −i ]                           [ 1 + i   −3i    −i ]
        [   −2      −i      0 ]                           [   2     −i      0 ]

So, A is skew-Hermitian.
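Definition 56's entrywise test (āji = aij) can be sketched directly with Python's built-in complex numbers; `is_hermitian` is an illustrative helper name:

```python
# Sketch of the Hermitian test: A is Hermitian when the conjugate of
# a_{ji} equals a_{ij} for all i, j (equivalently, conj(A)^T = A).
def is_hermitian(A):
    n = len(A)
    return all(A[j][i].conjugate() == A[i][j]
               for i in range(n) for j in range(n))
```

The matrix of Example 57, written with Python's `1j` for the imaginary unit i, passes this test.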

Nonsingular Matrices

We now come to a special type of square matrix and formulate the notion corresponding
to the reciprocal of a nonzero real number.
Definition 60. An n × n matrix A is called nonsingular, or invertible (regular),
if there exists an n × n matrix B such that AB = BA = In ; such a B is called an
inverse of A. Otherwise, A is called singular, or noninvertible.
Remark 61. In Theorem 2.11, Section 2.3, we show that if AB = In , then BA = In .
Thus, to verify that B is an inverse of A, we need verify only that AB = In .

Example 62. Let

    A = [ 2  3 ]        and        B = [ −1   3/2 ] .
        [ 2  2 ]                       [  1   −1  ]

Since AB = BA = I2 , we conclude that B is an inverse of A.
Theorem 63. The inverse of a matrix, if it exists, is unique.

Proof. Let B and C be inverses of A . Then

AB = BA = In and AC = CA = In .

We then have B = BIn = B(AC) = (BA)C = In C = C, which proves that the inverse
of a matrix, if it exists, is unique.

Because of this uniqueness, we write the inverse of a nonsingular matrix A as A−1 . Thus

AA−1 = A−1 A = In .

Example 64. Let  


1 2
A=
3 4
If A−1 exists, let  
−1 a b
A = .
c d
Then we must have
    
−1 1 2 a b 1 0
AA = = I2 =
3 4 c d 0 1
so that    
a + 2c b + 2d 1 0
= .
3a + 4c 3b + 4d 0 1
Equating corresponding entries of these two matrices, we obtain the linear systems
a + 2c = 1 b + 2d = 0
and
3a + 4c = 0 3b + 4d = 1

The solutions are (verify) a = −2, c = 3/2, b = 1, and d = −1/2. Moreover, since the
matrix

    [ a  b ]  =  [ −2     1  ]
    [ c  d ]     [ 3/2  −1/2 ]

also satisfies the property that

    [ −2     1  ] [ 1  2 ]  =  [ 1  0 ] ,
    [ 3/2  −1/2 ] [ 3  4 ]     [ 0  1 ]

we conclude that A is nonsingular and that

    A^(−1) = [ −2     1  ] .
             [ 3/2  −1/2 ]
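Example 64 finds A^(−1) by solving linear systems. For 2 × 2 matrices the same inverse is given in closed form by the standard adjugate formula, an aside not derived in the notes; `inverse_2x2` is an illustrative helper name:

```python
# Sketch of the 2x2 inverse: for A = [[a, b], [c, d]] with ad - bc != 0,
# A^{-1} = (1/(ad - bc)) [[d, -b], [-c, a]].
def inverse_2x2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]
```

Applied to A = [[1, 2], [3, 4]], this reproduces the inverse of Example 64: [[-2.0, 1.0], [1.5, -0.5]].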
We next establish several properties of inverses of matrices.
Theorem 65. If A and B are both nonsingular n × n matrices, then AB is nonsingular
and (AB)−1 = B −1 A−1 .

Proof. We have (AB)(B^(−1) A^(−1)) = A(BB^(−1))A^(−1) = (AIn )A^(−1) = AA^(−1) = In . Simi-
larly, (B^(−1) A^(−1))(AB) = In . Therefore AB is nonsingular. Since the inverse of a matrix
is unique, we conclude that (AB)^(−1) = B^(−1) A^(−1) .
Corollary 66. If A1 , A2 , . . . , Ar are n × n nonsingular matrices, then A1 A2 · · · Ar is
nonsingular and (A1 A2 · · · Ar )^(−1) = Ar^(−1) A_{r−1}^(−1) · · · A1^(−1) .

Theorem 67. If A is a nonsingular matrix, then A−1 is nonsingular and (A−1 )−1 = A.
Theorem 68. If A is a nonsingular matrix, then A^T is nonsingular and (A^(−1))^T =
(A^T)^(−1) .

Proof. We have AA^(−1) = In . Taking transposes of both sides, we get

    (A^(−1))^T A^T = In^T = In .

Taking transposes of both sides of the equation A^(−1) A = In , we find, similarly, that

    A^T (A^(−1))^T = In .

These equations imply that (A^(−1))^T = (A^T)^(−1) .
Example 69. If

    A = [ 1  2 ]
        [ 3  4 ]

then from Example 64

    A^(−1) = [ −2     1  ]        and        (A^(−1))^T = [ −2   3/2 ] .
             [ 3/2  −1/2 ]                                [  1  −1/2 ]

Also (verify)

    A^T = [ 1  3 ]        and        (A^T)^(−1) = [ −2   3/2 ] .
          [ 2  4 ]                                [  1  −1/2 ]
Remark 70. Suppose that A is nonsingular. Then AB = AC implies that B = C (Ex-
ercise 50), and AB = O implies that B = O (Exercise 51). It follows from Theorem 68
that if A is a symmetric nonsingular matrix, then A^(−1) is symmetric. (See Exercise 54.)

2 Echelon Form of a Matrix
Definition 71. An m × n matrix A is said to be in reduced row echelon form if it
satisfies the following properties:

a) All zero rows, if there are any, appear at the bottom of the matrix.
b) The first nonzero entry from the left of a nonzero row is a 1. This entry is called
a leading one of its row.

c) For each nonzero row, the leading one appears to the right and below any leading
ones in preceding rows.

d) If a column contains a leading one, then all other entries in that column are zero.
Remark 72.
• A matrix in reduced row echelon form appears as a staircase ("echelon")
pattern of leading ones descending from the upper left corner of the matrix.

• An m × n matrix satisfying properties a), b), and c) is said to be in row echelon
form. In Definition 71, there may be no zero rows.

• A similar definition can be formulated in the obvious manner for reduced col-
umn echelon form and column echelon form.
Example 73. The following are matrices in reduced row echelon form, since they satisfy
properties a), b), c), and d) :
 
  1 0 0 0 −2 4
1 0 0 0  0 1 0 0 4
 0 1 0 0   8 

A=   , B =  0 0 0 1 7 −2 ,
0 0 1 0  
 0 0 0 0 0

0 
0 0 0 1
0 0 0 0 0 0

and  
1 2 0 0 1
C= 0 0 1 2 3 
0 0 0 0 0
The matrices that follow are not in reduced row echelon form. (Why not?)
   
1 2 0 4 1 0 3 4
D= 0  0 0 0  , E = 0 2 −2
 5 
0 0 1 −3 0 0 1 2
   
1 0 3 4 1 2 3 4
 0 1 −2 5   0 1 −2 5 
F =  0
, G =  
1 2 2   0 0 1 2 
0 0 0 0 0 0 0 0

Example 74. The following are matrices in row echelon form:

        [ 1 5 0 2 −2  4 ]        [ 1 0 0 0 ]
        [ 0 1 0 3  4  8 ]        [ 0 1 0 0 ]
    H = [ 0 0 0 1  7 −2 ] ,  I = [ 0 0 1 0 ]
        [ 0 0 0 0  0  0 ]        [ 0 0 0 1 ]
        [ 0 0 0 0  0  0 ]

    J = [ 0 0 1 3 5  7 9 ]
        [ 0 0 0 0 1 −2 3 ]
        [ 0 0 0 0 0  1 2 ]
        [ 0 0 0 0 0  0 1 ]
        [ 0 0 0 0 0  0 0 ] .

A useful property of matrices in reduced row echelon form (see Exercise 9) is that if
A is an n × n matrix in reduced row echelon form that is not equal to In , then A has a
row consisting entirely of zeros.
We shall now show that every matrix can be put into row (column) echelon form, or
into reduced row (column) echelon form, by means of certain row (column) operations.

Definition 75. An elementary row (column) operation on a matrix A is any one


of the following operations:

a) Type I: Interchange any two rows (columns).

b) Type II: Multiply a row (column) by a nonzero number.

c) Type III: Add a multiple of one row (column) to another.

We now introduce the following notation for elementary row and elementary column
operations on matrices:

• Interchange rows (columns) i and j, Type I:

    ri ↔ rj        (ci ↔ cj )

• Replace row (column) i by k times row (column ) i, Type II:

kri → ri (kci → ci )

• Replace row (column) j by k times row (column) i+ row (column) j, Type III:

kri + rj → rj (kci + cj → cj )

Using this notation, it is easy to keep track of the elementary row and column operations
performed on a matrix. For example, we indicate that we have interchanged the i th
and j th rows of A as Ari ↔rj . We proceed similarly for column operations.
Observe that when a matrix is viewed as the augmented matrix of a linear system, the
elementary row operations are equivalent, respectively, to interchanging two equations,
multiplying an equation by a nonzero constant, and adding a multiple of one equation
to another equation.
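The three operation types of Definition 75 can be sketched as small in-place routines on a matrix stored as a list of rows (helper names are illustrative; note that Python indexes rows from 0):

```python
# Sketches of the elementary row operations of Definition 75.
def swap_rows(A, i, j):          # Type I:  r_i <-> r_j
    A[i], A[j] = A[j], A[i]

def scale_row(A, i, k):          # Type II: k r_i -> r_i  (k nonzero)
    A[i] = [k * x for x in A[i]]

def add_multiple(A, i, j, k):    # Type III: k r_i + r_j -> r_j
    A[j] = [k * x + y for x, y in zip(A[i], A[j])]
```

With the matrix A of Example 76 below, `add_multiple(A, 1, 2, -2)` replaces the third row by (−2) times the second row plus the third row, matching the matrix D obtained there.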
Example 76. Let  
0 0 1 2
A =  2 3 0 −2 
3 3 6 −9
Interchanging rows 1 and 3 of A, we obtain
 
3 3 6 −9
B = Ar1 ↔r3 =  2 3 0 −2  .
0 0 1 2
Multiplying the third row of A by 1/3, we obtain

    C = A(1/3)r3 →r3 = [ 0 0 1  2 ]
                       [ 2 3 0 −2 ]
                       [ 1 1 2 −3 ] .

Adding (−2) times row 2 of A to row 3 of A, we obtain


 
0 0 1 2
D = A−2r2 +r3 →r3 =  2 3 0 −2  .
−1 −3 6 −5
Observe that in obtaining D from A, row 2 of A did not change.
Definition 77. An m × n matrix B is said to be row (column) equivalent to an
m × n matrix A if B can be produced by applying a finite sequence of elementary row
(column) operations to A.
Example 78. Let  
1 2 4 3
A= 2 1 3 2 .
1 −2 2 3
If we add 2 times row 3 of A to its second row, we obtain
 
1 2 4 3
B = A2r3 +r2 →r2 =  4 −3 7 8  ,
1 −2 2 3

so B is row equivalent to A.
Interchanging rows 2 and 3 of B, we obtain
 
1 2 4 3
C = Br2 ↔r3 =  1 −2 2 3 
4 −3 7 8

so C is row equivalent to B.
Multiplying row 1 of C by 2, we obtain

    D = C2r1 →r1 = [ 2  4 8 6 ]
                   [ 1 −2 2 3 ]
                   [ 4 −3 7 8 ]

so D is row equivalent to C. It then follows that D is row equivalent to A, since we
obtained D by applying three successive elementary row operations to A. Using the
notation for elementary row operations, we have

    D = A2r3 +r2 →r2 , r2 ↔r3 , 2r1 →r1 .

We adopt the convention that the row operations are applied in the order listed.

Theorem 79. Every nonzero m × n matrix A = [aij ] is row (column) equivalent to a


matrix in row (column) echelon form.

Definition 80. The number of non-zero rows (columns) of a row (column) echelon
form of the matrix A is called the rank of A and denoted by rA or rank(A).

Example 81. Consider the following matrices:
$$A = \begin{bmatrix} 1 & 4 & 3 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \quad \text{and} \quad C = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 0 & 1 \end{bmatrix}.$$
Then rank(A) = 3, rank(B) = 2 and rank(C) = 2.
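The rank of Definition 80 can be computed by reducing to row echelon form and counting the nonzero rows. A minimal Python sketch (the function name `rank` is my own), checked against the three matrices of Example 81:

```python
from fractions import Fraction

def rank(M):
    """Rank = number of nonzero rows in a row echelon form of M."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0                                    # index of the next pivot row
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue                         # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):         # eliminate below the pivot
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

A = [[1, 4, 3], [0, 1, 2], [0, 0, 1]]
B = [[1, 0, 0], [0, 1, 0], [0, 0, 0], [0, 0, 0]]
C = [[1, 2, 3], [0, 0, 1]]
```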

In a matrix, the first (leftmost) column with a nonzero entry is called the pivot column; the first (topmost) nonzero entry in the pivot column is called the pivot.

Example 82. Let
$$A = \begin{bmatrix} 0 & 2 & 3 & -4 & 1 \\ 0 & 0 & 2 & 3 & 4 \\ 2 & 2 & -5 & 2 & 4 \\ 2 & 0 & -6 & 9 & 7 \end{bmatrix}.$$
Column 1 is the first (counting from left to right) column in A with a nonzero entry,
so column 1 is the pivot column of A. The first (counting from top to bottom) nonzero
entry in the pivot column occurs in the third row, so the pivot is a31 = 2. We interchange
the first and third rows of A, obtaining
 
$$B = A_{r_1 \leftrightarrow r_3} = \begin{bmatrix} 2 & 2 & -5 & 2 & 4 \\ 0 & 0 & 2 & 3 & 4 \\ 0 & 2 & 3 & -4 & 1 \\ 2 & 0 & -6 & 9 & 7 \end{bmatrix}.$$
Multiply the first row of B by the reciprocal of the pivot, that is, by $\frac{1}{b_{11}} = \frac{1}{2}$, to obtain
$$C = B_{\frac{1}{2}r_1 \to r_1} = \begin{bmatrix} 1 & 1 & -\frac{5}{2} & 1 & 2 \\ 0 & 0 & 2 & 3 & 4 \\ 0 & 2 & 3 & -4 & 1 \\ 2 & 0 & -6 & 9 & 7 \end{bmatrix}$$
Add (−2) times the first row of C to the fourth row of C to produce a matrix D in which the only nonzero entry in the pivot column is $d_{11} = 1$:
$$D = C_{-2r_1+r_4 \to r_4} = \begin{bmatrix} 1 & 1 & -\frac{5}{2} & 1 & 2 \\ 0 & 0 & 2 & 3 & 4 \\ 0 & 2 & 3 & -4 & 1 \\ 0 & -2 & -1 & 7 & 3 \end{bmatrix}.$$

Identify A1 as the submatrix of D obtained by deleting the first row of D: Do not erase
the first row of D. Repeat the preceding steps with A1 instead of A.

Applying the same steps to A1 (keeping the first row of D fixed) produces a matrix D1 . Deleting the first row of D1 yields the matrix A2 . We repeat the procedure with A2 instead of A. No rows of A2 have to be interchanged.

The matrix
$$H = \begin{bmatrix} 1 & 1 & -\frac{5}{2} & 1 & 2 \\ 0 & 1 & \frac{3}{2} & -2 & \frac{1}{2} \\ 0 & 0 & 1 & \frac{3}{2} & 2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$
is in row echelon form and is row equivalent to A.
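The pivot procedure of Example 82 can be sketched as a short routine. This is an illustrative Python sketch (the function name `row_echelon` is my own), using `fractions.Fraction` for exact arithmetic; it reproduces the row echelon form H:

```python
from fractions import Fraction

def row_echelon(M):
    """Reduce M to row echelon form: find the pivot column, interchange
    rows to bring the pivot up, scale the pivot row to get a leading 1,
    zero out below it, then repeat on the submatrix below."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]          # interchange rows
        M[r] = [x / M[r][c] for x in M[r]]       # scale pivot row to a leading 1
        for i in range(r + 1, rows):             # zero out below the pivot
            M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return M

A = [[0, 2, 3, -4, 1],
     [0, 0, 2, 3, 4],
     [2, 2, -5, 2, 4],
     [2, 0, -6, 9, 7]]
H = row_echelon(A)
```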

Theorem 83. Every nonzero m × n matrix A = [aij ] is row (column) equivalent to a


unique matrix in reduced row (column) echelon form.

Finding A−1

Remark 84. Note that at this point we have shown that the following statements are
equivalent for an n × n matrix A:

1. A is nonsingular.

2. A is row (column) equivalent to In (The reduced row echelon form of A is In .)

For a nonsingular matrix A, we transform the partitioned matrix [A In ] to reduced row


echelon form, obtaining [In A−1 ] .

Example 85. Let  


1 1 1
A= 0 2 3 
5 5 1
Assuming that A is nonsingular, we form
 
1 1 1 1 0 0
[A I3 ] =  0 2 3 0 1 0 
5 5 1 0 0 1
We now perform elementary row operations that transform [A I3 ] to [I3 A−1 ] ; we
consider [A I3 ] as a 3 × 6 matrix, and whatever we do to a row of A we also do to
the corresponding row of I3 . In place of using elementary matrices directly, we arrange
our computations, using elementary row operations as follows:

Hence
$$A^{-1} = \begin{bmatrix} \frac{13}{8} & -\frac{1}{2} & -\frac{1}{8} \\ -\frac{15}{8} & \frac{1}{2} & \frac{3}{8} \\ \frac{5}{4} & 0 & -\frac{1}{4} \end{bmatrix}.$$
We can readily verify that AA−1 = A−1 A = I3 .
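The reduction $[A \mid I_n] \to [I_n \mid A^{-1}]$ can be sketched as follows. This is an illustrative Python sketch (the function name `inverse` is my own); when no pivot can be found in some column, the matrix is singular and the routine reports this instead of returning an inverse:

```python
from fractions import Fraction

def inverse(A):
    """Invert A by reducing the partitioned matrix [A | I] to reduced
    row echelon form [I | A^-1]; returns None if A is singular."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        pivot = next((i for i in range(c, n) if M[i][c] != 0), None)
        if pivot is None:
            return None                       # no pivot: A is singular
        M[c], M[pivot] = M[pivot], M[c]
        M[c] = [x / M[c][c] for x in M[c]]    # leading 1 in the pivot row
        for i in range(n):                    # zero out the rest of the column
            if i != c and M[i][c] != 0:
                M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[c])]
    return [row[n:] for row in M]

A = [[1, 1, 1], [0, 2, 3], [5, 5, 1]]
Ainv = inverse(A)
```

Running it on the singular matrix of Example 88 below returns `None`, while on the matrix of Example 85 it reproduces the inverse displayed above.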

Theorem 86. An n × n matrix A is singular if and only if A is row equivalent to a


matrix B that has a row of zeros. (That is, the reduced row echelon form of A has a
row of zeros.)

Remark 87. This means that in order to find A−1 , we do not have to determine, in
advance, whether it exists. We merely start to calculate A−1 ; if at any point in the

computation we find a matrix B that is row equivalent to A and has a row of zeros,
then A−1 does not exist.
Example 88. Let  
1 2 −3
A =  1 −2 1 
5 −2 −3
To find A−1 , we proceed as follows:

At this point A is row equivalent to


 
1 2 −3
B =  0 −4 4 ,
0 0 0
the last matrix in the row reduction of A. Since B has a row of zeros, we stop and conclude that A is a singular matrix.
Theorem 89. If A and B are n × n matrices such that AB = In then BA = In . Thus
B = A−1 .
Remark 90. Theorem 89 implies that if we want to check whether a given matrix B
is A−1 , we need merely check whether AB = In or BA = In . That is, we do not have
to check both equalities.

3 Determinants
Determinants first arose in the solution of linear systems. First, we deal briefly with
permutations, which are used in our definition of determinant.

Definition 91. Let A = [aij ] be an n × n matrix. The determinant function,
denoted by det, is defined by
X
det(A) = (±)a1j1 a2j2 · · · anjn

where the summation is over all permutations j1 j2 · · · jn of the set S = {1, 2, . . . , n}.
The sign is taken as + or − according to whether the permutation j1 j2 · · · jn is even or
odd.
In each term (±)a1j1 a2j2 · · · anjn of det(A), the row subscripts are in natural order and
the column subscripts are in the order j1 j2 · · · jn . Thus each term in det(A), with its
appropriate sign, is a product of n entries of A, with exactly one entry from each row
and exactly one entry from each column. Since we sum over all permutations of S,
det(A) has n! terms in the sum. Another notation for det(A) is |A|. We shall use both
det(A) and |A|.
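Definition 91 can be implemented directly by summing the signed products over all n! permutations. An illustrative Python sketch (the helper names `sign` and `det` are my own), checked against Examples 93 and 95:

```python
from itertools import permutations

def sign(p):
    """Parity of permutation p: +1 if even, -1 if odd (count inversions)."""
    inv = sum(1 for i in range(len(p))
                for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inv % 2 else 1

def det(A):
    """det(A) = sum over all permutations j1...jn of (+/-) a_{1 j1} ... a_{n jn}."""
    n = len(A)
    total = 0
    for p in permutations(range(n)):
        term = sign(p)
        for i in range(n):
            term *= A[i][p[i]]   # exactly one entry from each row and column
        total += term
    return total
```

Since the sum has n! terms, this definition is only practical for very small n; the properties and methods developed below are what make larger determinants computable.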
Example 92. If A = [a11 ] is a 1 × 1 matrix, then det(A) = a11 .
Example 93. If
$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$
then to obtain det(A), we write down the terms $a_{1-}a_{2-}$ and replace the dashes with all possible elements of $S_2$: the subscripts become 12 and 21. Now 12 is an even permutation and 21 is an odd permutation. Thus
$$\det(A) = a_{11}a_{22} - a_{12}a_{21}.$$
Hence we see that det(A) can be obtained by forming the product of the entries on the diagonal from upper left to lower right and subtracting from this number the product of the entries on the diagonal from upper right to lower left.
 
Thus, if $A = \begin{bmatrix} 2 & -3 \\ 4 & 5 \end{bmatrix}$, then $|A| = (2)(5) - (-3)(4) = 22$.
Example 94. If
a11 a12 a13
A =  a21 a22 a23 
a31 a32 a33
then to compute det(A), we write down the six terms a1− a2− a3− , a1− a2− a3− , a1− a2− a3− ,
a1− a2− a3− , a1− a2− a3− , a1− a2− a3− . All the elements of S3 are used to replace the dashes,
and if we prefix each term by + or − according to whether the permutation is even or
odd, we find that (verify)
det(A) =a11 a22 a33 + a12 a23 a31 + a13 a21 a32 − a11 a23 a32
(7)
− a12 a21 a33 − a13 a22 a31 .

We can also obtain |A| as follows: repeat the first and second columns of A to the right of the matrix; form the sum of the products of the entries on the diagonals running from upper left to lower right, and subtract from this number the products of the entries on the diagonals running from upper right to lower left (verify). This method of calculating a determinant is called the rule of Sarrus.
Example 95. Let  
1 2 3
A =  2 1 3 .
3 1 2
Evaluate |A|.
Substituting in (7), we find that
$$\begin{vmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \\ 3 & 1 & 2 \end{vmatrix} = (1)(1)(2) + (2)(3)(3) + (3)(2)(1) - (1)(3)(1) - (2)(2)(2) - (3)(1)(3) = 6$$
We could obtain the same result by using the easy method illustrated previously, as follows:

|A| =(1)(1)(2) + (2)(3)(3) + (3)(2)(1) − (3)(1)(3) − (1)(3)(1)


− (2)(2)(2) = 6

Warning: The methods used for computing det(A) in above examples do not apply
for n ≥ 4.
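The rule of Sarrus can be written out explicitly. A small Python sketch (the function name `sarrus` is my own), valid only for 3 × 3 matrices, as the warning above notes:

```python
def sarrus(A):
    """Rule of Sarrus, valid for 3x3 matrices only: sum the three
    left-to-right diagonal products, subtract the three right-to-left ones."""
    (a, b, c), (d, e, f), (g, h, i) = A
    return (a * e * i + b * f * g + c * d * h
            - c * e * g - a * f * h - b * d * i)

A = [[1, 2, 3], [2, 1, 3], [3, 1, 2]]
```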

3.1 Properties of Determinants


In this section we examine properties of determinants that simplify their computation.

Theorem 96. If A is a matrix, then $\det(A) = \det(A^T)$.
Example 97. Let A be the matrix such that
 
1 2 3
AT =  2 1 1  .
3 3 2

So, we have
$$\det(A^T) = (1)(1)(2) + (2)(1)(3) + (3)(2)(3) - (1)(1)(3) - (2)(2)(2) - (3)(1)(3) = 6 = |A|.$$
Theorem 98. If matrix B results from matrix A by interchanging two different rows
(columns) of A, then det(B) = − det(A).

Example 99. We have
$$|A| = \begin{vmatrix} 2 & -1 \\ 3 & 2 \end{vmatrix} = -\begin{vmatrix} 3 & 2 \\ 2 & -1 \end{vmatrix} = \begin{vmatrix} 2 & 3 \\ -1 & 2 \end{vmatrix} = |A^T| = 7.$$

Theorem 100. If two rows (columns) of A are equal, then det(A) = 0.

Proof. Suppose that rows r and s of A are equal. Interchange rows r and s of A to obtain
a matrix B. Then det(B) = − det(A). On the other hand, B = A, so det(B) = det(A).
Thus det(A) = − det(A), and so det(A) = 0.

Example 101. We have $\begin{vmatrix} 1 & 2 & 3 \\ -1 & 0 & 7 \\ 1 & 2 & 3 \end{vmatrix} = 0$. (Verify by the use of Definition 91.)

Theorem 102. If a row (column) of A consists entirely of zeros, then det(A) = 0.

Proof. Let the ith row of A consist entirely of zeros. Since each term in Definition 91
for the determinant of A contains a factor from the ith row, each term in det(A) is
zero. Hence det(A) = 0.

Theorem 103. If B is obtained from A by multiplying a row (column) of A by a real


number k, then det(B) = k det(A).

Example 104. We have
$$\begin{vmatrix} 2 & 6 \\ 1 & 12 \end{vmatrix} = 2\begin{vmatrix} 1 & 3 \\ 1 & 12 \end{vmatrix} = (2)(3)\begin{vmatrix} 1 & 1 \\ 1 & 4 \end{vmatrix} = 6(4 - 1) = 18.$$
Example 105. We have
$$\begin{vmatrix} 1 & 2 & 3 \\ 1 & 5 & 3 \\ 2 & 8 & 6 \end{vmatrix} = 2\begin{vmatrix} 1 & 2 & 3 \\ 1 & 5 & 3 \\ 1 & 4 & 3 \end{vmatrix} = (2)(3)\begin{vmatrix} 1 & 2 & 1 \\ 1 & 5 & 1 \\ 1 & 4 & 1 \end{vmatrix} = (2)(3)(0) = 0.$$
Here, we first factored out 2 from the third row and 3 from the third column, and then used Theorem 100, since the first and third columns are equal.

Theorem 106. If B = [bij ] is obtained from A = [aij ] by adding to each element of


the rth row (column) of A, k times the corresponding element of the sth row (column),
r 6= s, of A, then det(B) = det(A).

Example 107. We have
$$\begin{vmatrix} 1 & 2 & 3 \\ 2 & -1 & 3 \\ 1 & 0 & 1 \end{vmatrix} = \begin{vmatrix} 5 & 0 & 9 \\ 2 & -1 & 3 \\ 1 & 0 & 1 \end{vmatrix}$$
obtained by adding twice the second row to the first row. By applying the definition of determinant to the first and second determinants, both are seen to have the value 4.

Theorem 108. If a matrix A = [aij ] is upper (lower) triangular, then det(A) =


a11 a22 · · · ann that is, the determinant of a triangular matrix is the product of the ele-
ments on the main diagonal.

Recall that in Section 2.1 we introduced compact notation for elementary row and
elementary column operations on matrices. In this chapter, we use the same notation
for rows and columns of a determinant:

• Interchange rows (columns) i and j:

ri ↔ rj (ci ↔ cj )

• Replace row (column) i by k times row (column ) i:

kri → ri (kci → ci )

• Replace row (column) j by k times row (column) i+ row (column) j:

kri + rj → rj (kci + cj → cj )

Using this notation, it is easy to keep track of the elementary row and column operations
performed on a matrix. For example, we indicate that we have interchanged the ith and jth rows of A as $A_{r_i \leftrightarrow r_j}$. We proceed similarly for column operations.
We can now interpret the above theorems in terms of this notation as follows:

$$\det(A_{r_i \leftrightarrow r_j}) = -\det(A), \quad i \neq j$$
$$\det(A_{kr_i \to r_i}) = k\det(A)$$
$$\det(A_{kr_i+r_j \to r_j}) = \det(A), \quad i \neq j.$$

It is convenient to rewrite these properties in terms of det(A):



$$\det(A) = -\det(A_{r_i \leftrightarrow r_j}), \quad i \neq j$$
$$\det(A) = \frac{1}{k}\det(A_{kr_i \to r_i}), \quad k \neq 0$$
$$\det(A) = \det(A_{kr_i+r_j \to r_j}), \quad i \neq j.$$

The theorems above are useful in evaluating determinants. What we do is transform A by means of our elementary row or column operations to a triangular matrix. Of course, we must keep track of how the determinant of the resulting matrices changes as we perform the elementary row or column operations.
 
Example 109. Let $A = \begin{bmatrix} 4 & 3 & 2 \\ 3 & -2 & 5 \\ 2 & 4 & 6 \end{bmatrix}$. Compute det(A).
Solution: We have
$$\det(A) = 2\det\left(A_{\frac{1}{2}r_3 \to r_3}\right) = 2\det\begin{bmatrix} 4 & 3 & 2 \\ 3 & -2 & 5 \\ 1 & 2 & 3 \end{bmatrix} \qquad \text{(multiply row 3 by } \tfrac{1}{2}\text{)}$$
$$= (-1)\,2\det\begin{bmatrix} 1 & 2 & 3 \\ 3 & -2 & 5 \\ 4 & 3 & 2 \end{bmatrix} \qquad \text{(interchange rows 1 and 3)}$$
$$= -2\det\begin{bmatrix} 1 & 2 & 3 \\ 0 & -8 & -4 \\ 0 & -5 & -10 \end{bmatrix} \qquad (-3r_1+r_2 \to r_2 \text{ and } -4r_1+r_3 \to r_3 \text{ zero out below the } (1,1) \text{ entry})$$
$$= -2\det\begin{bmatrix} 1 & 2 & 3 \\ 0 & -8 & -4 \\ 0 & 0 & -\frac{30}{4} \end{bmatrix} \qquad \left(-\tfrac{5}{8}r_2+r_3 \to r_3 \text{ zeros out below the } (2,2) \text{ entry}\right)$$

Next we compute the determinant of the upper triangular matrix:
$$\det(A) = -2(1)(-8)\left(-\frac{30}{4}\right) = -120.$$
The operations chosen are not the most efficient, but we do avoid fractions during the first few steps.

Remark 110. The method used to compute a determinant in Example above will be
referred to as computation via reduction to triangular form.
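Computation via reduction to triangular form can be sketched as follows. This is an illustrative Python sketch (the function name `det_by_reduction` is my own): it tracks the sign flips from row interchanges, uses only operations of the form $kr_i + r_j \to r_j$ (which leave the determinant unchanged), and then multiplies the diagonal entries, reproducing det(A) = −120 from Example 109:

```python
from fractions import Fraction

def det_by_reduction(A):
    """Determinant via reduction to upper triangular form, keeping track
    of how each elementary row operation changes the determinant."""
    M = [[Fraction(x) for x in row] for row in A]
    n = len(M)
    factor = Fraction(1)              # det(A) = factor * det(triangular M)
    for c in range(n):
        pivot = next((i for i in range(c, n) if M[i][c] != 0), None)
        if pivot is None:
            return Fraction(0)        # no pivot in this column => det = 0
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            factor = -factor          # a row interchange flips the sign
        for i in range(c + 1, n):     # k r_c + r_i -> r_i leaves det unchanged
            f = M[i][c] / M[c][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[c])]
    d = factor
    for i in range(n):                # product of the main diagonal entries
        d *= M[i][i]
    return d

A = [[4, 3, 2], [3, -2, 5], [2, 4, 6]]
```

Unlike the definition, which sums n! terms, this method needs only on the order of n³ arithmetic operations.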

Theorem 111. If A is an n×n matrix, then A is nonsingular if and only if det(A) 6= 0.

Theorem 112. If A and B are n × n matrices, then det(AB) = det(A) det(B).

Example 113. Let
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \quad \text{and} \quad B = \begin{bmatrix} 2 & -1 \\ 1 & 2 \end{bmatrix}.$$
Then
|A| = −2 and |B| = 5
 
On the other hand, $AB = \begin{bmatrix} 4 & 3 \\ 10 & 5 \end{bmatrix}$, and $|AB| = -10 = |A||B|$.
Theorem 114. If A is nonsingular, then $\det(A^{-1}) = \dfrac{1}{\det(A)}$.

3.2 Cofactor Expansion


Thus far we have evaluated determinants by using Definition 91 and the properties established in Section 3.1. We now develop a method for evaluating the determinant of
an n × n matrix that reduces the problem to the evaluation of determinants of matrices
of order n − 1. We can then repeat the process for these (n − 1) × (n − 1) matrices until
we get to 2 × 2 matrices.

Definition 115. Let A = [aij ] be an n × n matrix. Let Mij be the (n − 1) × (n − 1)


submatrix of A obtained by deleting the ith row and jth column of A. The determinant
det (Mij ) is called the minor of aij .

Definition 116. Let A = [aij ] be an n × n matrix. The cofactor Aij of aij is defined
as Aij = (−1)i+j det (Mij ).

Example 117. Let  


3 −1 2
A= 4 5 6 .
7 1 2

Then
$$\det(M_{12}) = \begin{vmatrix} 4 & 6 \\ 7 & 2 \end{vmatrix} = 8 - 42 = -34, \quad \det(M_{23}) = \begin{vmatrix} 3 & -1 \\ 7 & 1 \end{vmatrix} = 3 + 7 = 10,$$
and
$$\det(M_{31}) = \begin{vmatrix} -1 & 2 \\ 5 & 6 \end{vmatrix} = -6 - 10 = -16.$$

Also,
A12 = (−1)1+2 det (M12 ) = (−1)(−34) = 34,
A23 = (−1)2+3 det (M23 ) = (−1)(10) = −10,
and
A31 = (−1)3+1 det (M31 ) = (1)(−16) = −16.

If we think of the sign $(-1)^{i+j}$ as being located in position (i, j) of an n × n matrix, then the signs form a checkerboard pattern that has a + in the (1, 1) position. The patterns for n = 3 and n = 4 are as follows:
$$\begin{bmatrix} + & - & + \\ - & + & - \\ + & - & + \end{bmatrix} \qquad\qquad \begin{bmatrix} + & - & + & - \\ - & + & - & + \\ + & - & + & - \\ - & + & - & + \end{bmatrix}$$
$$n = 3 \qquad\qquad\qquad\qquad n = 4$$

Theorem 118. Let A = [aij ] be an n × n matrix. Then

det(A) = ai1 Ai1 + ai2 Ai2 + · · · + ain Ain


[ expansion of det (A) along the ith row ]

and
det(A) = a1j A1j + a2j A2j + · · · + anj Anj
[ expansion of det (A) along the jth column ].
Example 119. To evaluate the determinant

1 2 −3 4
−4 2 1 3
3 0 0 −3
2 0 −2 3

it is best to expand along either the second column or the third row because they each
have two zeros. Obviously, the optimal course of action is to expand along the row or
column that has the largest number of zeros, because in that case the cofactors Aij of
those aij which are zero do not have to be evaluated, since aij Aij = (0) (Aij ) = 0. Thus,
expanding along the third row, we have

$$\begin{vmatrix} 1 & 2 & -3 & 4 \\ -4 & 2 & 1 & 3 \\ 3 & 0 & 0 & -3 \\ 2 & 0 & -2 & 3 \end{vmatrix} = (-1)^{3+1}(3)\begin{vmatrix} 2 & -3 & 4 \\ 2 & 1 & 3 \\ 0 & -2 & 3 \end{vmatrix} + (-1)^{3+2}(0)\begin{vmatrix} 1 & -3 & 4 \\ -4 & 1 & 3 \\ 2 & -2 & 3 \end{vmatrix}$$
$$+ (-1)^{3+3}(0)\begin{vmatrix} 1 & 2 & 4 \\ -4 & 2 & 3 \\ 2 & 0 & 3 \end{vmatrix} + (-1)^{3+4}(-3)\begin{vmatrix} 1 & 2 & -3 \\ -4 & 2 & 1 \\ 2 & 0 & -2 \end{vmatrix}$$
$$= (+1)(3)(20) + 0 + 0 + (-1)(-3)(-4) = 48.$$
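Cofactor expansion translates directly into a recursive routine. An illustrative Python sketch (the helper names `minor` and `det` are my own) expanding along the first row, with the recursion bottoming out at 1 × 1 matrices; it reproduces the value 48 of Example 119:

```python
def minor(A, i, j):
    """Submatrix M_ij: delete row i and column j (0-indexed here)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Cofactor expansion along the first row:
    det(A) = sum_j (-1)^j * a_{0j} * det(M_{0j})."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(n))

A = [[1, 2, -3, 4],
     [-4, 2, 1, 3],
     [3, 0, 0, -3],
     [2, 0, -2, 3]]
```

A hand computation would expand along the row or column with the most zeros, as the example explains; this sketch always uses the first row for simplicity, which gives the same value.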
Example 120. We have
$$\begin{vmatrix} 1 & 2 & -3 & 4 \\ -4 & 2 & 1 & 3 \\ 1 & 0 & 0 & -3 \\ 2 & 0 & -2 & 3 \end{vmatrix} \overset{c_4+3c_1 \to c_4}{=} \begin{vmatrix} 1 & 2 & -3 & 7 \\ -4 & 2 & 1 & -9 \\ 1 & 0 & 0 & 0 \\ 2 & 0 & -2 & 9 \end{vmatrix} = (-1)^{3+1}(1)\begin{vmatrix} 2 & -3 & 7 \\ 2 & 1 & -9 \\ 0 & -2 & 9 \end{vmatrix}$$
$$\overset{r_1-r_2 \to r_1}{=} (-1)^{4}(1)\begin{vmatrix} 0 & -4 & 16 \\ 2 & 1 & -9 \\ 0 & -2 & 9 \end{vmatrix} = (-1)^{4}(1)(8) = 8.$$

3.3 Inverse of a Matrix


We saw in Section 3.2 that Theorem 118 provides formulas for expanding det(A) along either a row or a column of A. Thus $\det(A) = a_{i1}A_{i1} + a_{i2}A_{i2} + \cdots + a_{in}A_{in}$ is the expansion of det(A) along the ith row. It is interesting to ask what $a_{i1}A_{k1} + a_{i2}A_{k2} + \cdots + a_{in}A_{kn}$ is for $i \neq k$, because as soon as we answer this question, we obtain another method for finding the inverse of a nonsingular matrix.
Theorem 121. If A = [aij ] is an n × n matrix, then
ai1 Ak1 + ai2 Ak2 + · · · + ain Akn = 0 for i 6= k,
a1j A1k + a2j A2k + · · · + anj Ank = 0 for j 6= k.
 
Example 122. Let $A = \begin{bmatrix} 1 & 2 & 3 \\ -2 & 3 & 1 \\ 4 & 5 & -2 \end{bmatrix}$. Then
$$A_{21} = (-1)^{2+1}\begin{vmatrix} 2 & 3 \\ 5 & -2 \end{vmatrix} = 19, \quad A_{22} = (-1)^{2+2}\begin{vmatrix} 1 & 3 \\ 4 & -2 \end{vmatrix} = -14, \quad A_{23} = (-1)^{2+3}\begin{vmatrix} 1 & 2 \\ 4 & 5 \end{vmatrix} = 3.$$
Now
a31 A21 + a32 A22 + a33 A23 = (4)(19) + (5)(−14) + (−2)(3) = 0,
and
a11 A21 + a12 A22 + a13 A23 = (1)(19) + (2)(−14) + (3)(3) = 0.

We may summarize our expansion results by writing
$$a_{i1}A_{k1} + a_{i2}A_{k2} + \cdots + a_{in}A_{kn} = \begin{cases} \det(A) & \text{if } i = k \\ 0 & \text{if } i \neq k \end{cases}$$
and
$$a_{1j}A_{1k} + a_{2j}A_{2k} + \cdots + a_{nj}A_{nk} = \begin{cases} \det(A) & \text{if } j = k \\ 0 & \text{if } j \neq k. \end{cases}$$

Definition 123. Let A = [aij ] be an n × n matrix. The n × n matrix adj A, called the
adjoint of A, is the matrix whose (i, j) th entry is the cofactor Aji of aji . Thus
 
$$\operatorname{adj} A = \begin{bmatrix} A_{11} & A_{21} & \cdots & A_{n1} \\ A_{12} & A_{22} & \cdots & A_{n2} \\ \vdots & \vdots & & \vdots \\ A_{1n} & A_{2n} & \cdots & A_{nn} \end{bmatrix}.$$
 
Example 124. Let $A = \begin{bmatrix} 3 & -2 & 1 \\ 5 & 6 & 2 \\ 1 & 0 & -3 \end{bmatrix}$. Compute adj A.
Solution: We first compute the cofactors of A. We have

$$A_{11} = (-1)^{1+1}\begin{vmatrix} 6 & 2 \\ 0 & -3 \end{vmatrix} = -18, \quad A_{12} = (-1)^{1+2}\begin{vmatrix} 5 & 2 \\ 1 & -3 \end{vmatrix} = 17, \quad A_{13} = (-1)^{1+3}\begin{vmatrix} 5 & 6 \\ 1 & 0 \end{vmatrix} = -6,$$
$$A_{21} = (-1)^{2+1}\begin{vmatrix} -2 & 1 \\ 0 & -3 \end{vmatrix} = -6, \quad A_{22} = (-1)^{2+2}\begin{vmatrix} 3 & 1 \\ 1 & -3 \end{vmatrix} = -10, \quad A_{23} = (-1)^{2+3}\begin{vmatrix} 3 & -2 \\ 1 & 0 \end{vmatrix} = -2,$$
$$A_{31} = (-1)^{3+1}\begin{vmatrix} -2 & 1 \\ 6 & 2 \end{vmatrix} = -10, \quad A_{32} = (-1)^{3+2}\begin{vmatrix} 3 & 1 \\ 5 & 2 \end{vmatrix} = -1, \quad A_{33} = (-1)^{3+3}\begin{vmatrix} 3 & -2 \\ 5 & 6 \end{vmatrix} = 28.$$
Then  
−18 −6 −10
adj A =  17 −10 −1  .
−6 −2 28

Theorem 125. If A = [aij ] is an n × n matrix, then A(adj A) = (adj A)A = det(A)In .

Example 126. Consider the matrix of example above, then
    
3 −2 1 −18 −6 −10 −94 0 0
 5 6 2   17 −10 −1  =  0 −94 0 
1 0 −3 −6 −2 28 0 0 −94
 
1 0 0
= −94  0 1 0 
0 0 1
    
−18 −6 −10 3 −2 1 1 0 0
and  17 −10 −1   5 6 2  = −94  0 1 0  .
−6 −2 28 1 0 −3 0 0 1

Corollary 127. If A is an n × n matrix and det(A) ≠ 0, then
$$A^{-1} = \frac{1}{\det(A)}(\operatorname{adj} A) = \begin{bmatrix} \dfrac{A_{11}}{\det(A)} & \dfrac{A_{21}}{\det(A)} & \cdots & \dfrac{A_{n1}}{\det(A)} \\ \dfrac{A_{12}}{\det(A)} & \dfrac{A_{22}}{\det(A)} & \cdots & \dfrac{A_{n2}}{\det(A)} \\ \vdots & \vdots & & \vdots \\ \dfrac{A_{1n}}{\det(A)} & \dfrac{A_{2n}}{\det(A)} & \cdots & \dfrac{A_{nn}}{\det(A)} \end{bmatrix}.$$
 
Example 128. Again consider the matrix $A = \begin{bmatrix} 3 & -2 & 1 \\ 5 & 6 & 2 \\ 1 & 0 & -3 \end{bmatrix}$. Then det(A) = −94 and
$$A^{-1} = \frac{1}{\det(A)}(\operatorname{adj} A) = \begin{bmatrix} \frac{18}{94} & \frac{6}{94} & \frac{10}{94} \\ -\frac{17}{94} & \frac{10}{94} & \frac{1}{94} \\ \frac{6}{94} & \frac{2}{94} & -\frac{28}{94} \end{bmatrix}.$$
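The adjoint-based inverse of Corollary 127 can be sketched for the 3 × 3 case. This is an illustrative Python sketch (the helper names `minor_det` and `adjugate` are my own; the base cases cover only the matrix sizes used here), reproducing adj A and det(A) = −94 from Examples 124 and 128:

```python
from fractions import Fraction

def minor_det(A, i, j):
    """Determinant of the submatrix obtained by deleting row i and column j
    (handles the 1x1 and 2x2 cases, enough for 2x2 and 3x3 matrices A)."""
    sub = [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]
    if len(sub) == 1:
        return sub[0][0]
    (a, b), (c, d) = sub
    return a * d - b * c

def adjugate(A):
    """(i, j) entry of adj A is the cofactor A_ji = (-1)^(j+i) det(M_ji)."""
    n = len(A)
    return [[(-1) ** (i + j) * minor_det(A, j, i) for j in range(n)]
            for i in range(n)]

A = [[3, -2, 1], [5, 6, 2], [1, 0, -3]]
adjA = adjugate(A)
# det(A) by cofactor expansion along the first row
detA = sum(A[0][j] * (-1) ** j * minor_det(A, 0, j) for j in range(3))
Ainv = [[Fraction(x, detA) for x in row] for row in adjA]
```

Note that `Fraction` reduces each entry, so 18/94 is stored as 9/47, and so on.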

Summary 129. Note that at this point we have shown that the following statements
are equivalent for an n × n matrix A:

1. A is nonsingular.

2. A is row (column) equivalent to In .

3. det(A) 6= 0.

Alternative Definition for the Rank of a Matrix: The rank of a matrix can also be calculated using determinants. The rank of a matrix A is r, where r is the size of the largest square submatrix of A with non-zero determinant.
 
2 1 3 2 0
 3 2 5 1 0 
 
Example 130. Let A =   −1 1 0 −7 0   . Find the rank of A.
 3 −2 1 17 0 
0 1 1 −4 0

∗ Column 5 can be discarded since it has all zero elements.


∗ Column 3 can be discarded since it is a linear combination of column 1 and column
2, that is, c3 = c1 + c2 .
 
Therefore we can consider the matrix
$$\begin{bmatrix} 2 & 1 & 2 \\ 3 & 2 & 1 \\ -1 & 1 & -7 \\ 3 & -2 & 17 \\ 0 & 1 & -4 \end{bmatrix}$$

i) Since the matrix is 5 × 3, we first look at the determinants of the 3 × 3 submatrices.
All 3 × 3 submatrices of the above matrix and their determinants are:
$$\begin{vmatrix} 2 & 1 & 2 \\ 3 & 2 & 1 \\ -1 & 1 & -7 \end{vmatrix} = -28 - 1 + 6 + 4 - 2 + 21 = 0, \qquad \begin{vmatrix} 2 & 1 & 2 \\ 3 & 2 & 1 \\ 3 & -2 & 17 \end{vmatrix} = 68 + 3 - 12 - 12 - 51 + 4 = 0,$$
$$\begin{vmatrix} 2 & 1 & 2 \\ 3 & 2 & 1 \\ 0 & 1 & -4 \end{vmatrix} = -16 + 6 - 2 + 12 = 0, \qquad \begin{vmatrix} 2 & 1 & 2 \\ -1 & 1 & -7 \\ 3 & -2 & 17 \end{vmatrix} = 34 - 21 + 4 - 6 - 28 + 17 = 0,$$
$$\begin{vmatrix} 2 & 1 & 2 \\ -1 & 1 & -7 \\ 0 & 1 & -4 \end{vmatrix} = -8 - 2 + 14 - 4 = 0, \qquad \begin{vmatrix} 2 & 1 & 2 \\ 3 & -2 & 17 \\ 0 & 1 & -4 \end{vmatrix} = 16 + 6 - 34 + 12 = 0,$$
$$\begin{vmatrix} 3 & 2 & 1 \\ -1 & 1 & -7 \\ 3 & -2 & 17 \end{vmatrix} = 51 - 42 + 2 - 3 - 42 + 34 = 0, \qquad \begin{vmatrix} 3 & 2 & 1 \\ 3 & -2 & 17 \\ 0 & 1 & -4 \end{vmatrix} = 24 + 3 - 51 + 24 = 0,$$
$$\begin{vmatrix} -1 & 1 & -7 \\ 3 & -2 & 17 \\ 0 & 1 & -4 \end{vmatrix} = -8 - 21 + 17 + 12 = 0, \qquad \begin{vmatrix} 3 & 2 & 1 \\ -1 & 1 & -7 \\ 0 & 1 & -4 \end{vmatrix} = -12 - 1 + 21 - 8 = 0.$$

Therefore, we have to look at the determinants of 2 × 2 submatrices. Since $\begin{vmatrix} 2 & 1 \\ 3 & 2 \end{vmatrix} = 2 \cdot 2 - 3 \cdot 1 = 1 \neq 0$, the rank of the matrix A is 2.
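The determinant-based rank computation can be automated by checking every 3 × 3 submatrix. A small Python sketch (the helper names `det3` and `det2` are my own) confirming that all ten 3 × 3 determinants vanish while a 2 × 2 one does not:

```python
from itertools import combinations

def det3(M):
    """3x3 determinant by the rule of Sarrus."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a*e*i + b*f*g + c*d*h - c*e*g - a*f*h - b*d*i

def det2(M):
    (a, b), (c, d) = M
    return a * d - b * c

# The 5x3 matrix left after discarding the zero column and c3 = c1 + c2.
A = [[2, 1, 2], [3, 2, 1], [-1, 1, -7], [3, -2, 17], [0, 1, -4]]

# Every 3x3 submatrix (choose 3 of the 5 rows) has zero determinant ...
all_3x3_zero = all(det3([A[i] for i in rows]) == 0
                   for rows in combinations(range(5), 3))
# ... but at least one 2x2 submatrix does not, so rank(A) = 2.
```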

The second midterm topics start here!!!

4 Systems of Linear Equations


One of the most frequently recurring practical problems in many fields of study, such as mathematics, physics, biology, chemistry, economics, all phases of engineering, operations research, and the social sciences, is that of solving a system of linear equations.
The equation
a1 x 1 + a2 x 2 + · · · + an x n = b (8)
which expresses b in terms of the unknowns x1 , x2 , . . . , xn and the constants a1 , a2 , . . . , an
is called a linear equation. In many applications we are given b and must find
numbers x1 , x2 , . . . , xn satisfying (8). A solution to linear equation (8) is a sequence of n numbers s1 , s2 , . . . , sn , which has the property that (8) is satisfied when x1 = s1 , x2 = s2 , . . . , xn = sn are substituted in (8). Thus x1 = 2, x2 = 3, and x3 = −4
is a solution to the linear equation

6x1 − 3x2 + 4x3 = −13

because
6(2) − 3(3) + 4(−4) = −13

More generally, a system of m linear equations in n unknowns, x1 , x2 , . . . , xn


or a linear system, is a set of m linear equations each in n unknowns. A linear system
can conveniently be written as

a11 x1 + a12 x2 + · · · + a1n xn = b1


a21 x1 + a22 x2 + · · · + a2n xn = b2
.. .. .. .. (9)
. . . .
am1 x1 + am2 x2 + · · · + amn xn = bm

Thus the ith equation is

ai1 x1 + ai2 x2 + · · · + ain xn = bi .

In (9) the aij are known constants. Given values of b1 , b2 , . . . , bm , we want to find values
of x1 , x2 , . . . , xn that will satisfy each equation in (9). A solution to linear system (9)
is a sequence of n numbers s1 , s2 , . . . , sn , which has the property that each equation in (9)
is satisfied when x1 = s1 , x2 = s2 , . . . , xn = sn are substituted.

If the linear system (9) has no solution, it is said to be inconsistent; if it has a solution,
it is called consistent. If b1 = b2 = · · · = bm = 0, then (9) is called a homogeneous
system. Note that x1 = x2 = · · · = xn = 0 is always a solution to a homogeneous
system; it is called the trivial solution. A solution to a homogeneous system in which
not all of x1 , x2 , . . . , xn are zero is called a nontrivial solution.
Consider another system of r linear equations in n unknowns:

c11 x1 + c12 x2 + · · · + c1n xn = d1


c21 x1 + c22 x2 + · · · + c2n xn = d2
.. .. .. .. . (10)
. . . .
cr1 x1 + cr2 x2 + · · · + crn xn = dr

We say that (9) and (10) are equivalent if they both have exactly the same solutions.

Example 131. The linear system

x1 − 3x2 = −7
2x1 + x2 = 7 (11)

has only the solution x1 = 2 and x2 = 3. The linear system

8x1 − 3x2 = 7
3x1 − 2x2 = 0 (12)
10x1 − 2x2 = 14

also has only the solution x1 = 2 and x2 = 3. Thus (11) and (12) are equivalent.

4.1 Solving Linear systems


Consider the linear system of m equations in n unknowns,

a11 x1 + a12 x2 + · · · + a1n xn = b1


a21 x1 + a22 x2 + · · · + a2n xn = b2
.. .. .. .. (13)
. . . .
am1 x1 + am2 x2 + · · · + amn xn = bm

Now define the following matrices:


     
a11 a12 · · · a1n x1 b1
 a21 a22 · · · a2n   x2   b2 
A =  .. ..  , x =  ..  , b =  .. 
     
.. ..
 . . . .  .  . 
am1 am2 · · · amn xn bm

Then
    
a11 a12 · · · a1n x1 a11 x1 +a12 x2 + · · · + a1n xn
 a21 a22   x2   a21 x1 +a22 x2 + · · · + a2n xn 
· · · a2n     
Ax =  .. ..   ..  =  .. (14)

.. .. .. .. 
 . . . .  .   . . . 
am1 am2 · · · amn xn am1 x1 +am2 x2 + · · · + amn xn

The entries in the product Ax at the end of (14) are merely the left sides of the equations
in (13). Hence the linear system (13) can be written in matrix form as

Ax = b.

The matrix A is called the coefficient matrix of the linear system (13), and the
matrix
 
$$\begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} & b_2 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} & b_m \end{bmatrix},$$
obtained by adjoining column b to A, is called the augmented matrix of the linear system (13). The augmented matrix of (13) is written as $[A \mid b]$. Conversely, any
matrix with more than one column can be thought of as the augmented matrix of a
linear system. The coefficient and augmented matrices play key roles in our method for
solving linear systems.
Recall from the previous section that if

b1 = b2 = · · · = bm = 0

in (14), the linear system is called a homogeneous system. A homogeneous system can
be written as

Ax = 0
where A is the coefficient matrix.
Let
$$[A \mid b] = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} & b_2 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ a_{m1} & a_{m2} & a_{m3} & \cdots & a_{mn} & b_m \end{bmatrix}$$
be the augmented matrix of the linear system (13). Using elementary row operations, we can transform $[A \mid b]$ to $[C \mid d]$, which is in reduced row echelon form. Therefore,

1) If $r_C \neq r_{[C \mid d]}$, where $r_C$ is the rank of the matrix C (the number of nonzero rows in C), then the linear system (13) has no solution (it is inconsistent).

2) If $r_C = r_{[C \mid d]} = r$, then the linear system (13) has a solution (it is consistent):

a) If r = n, then the linear system (13) has a unique solution.

b) If r < n, then the linear system (13) has infinitely many solutions, expressed in terms of n − r arbitrary variables.

This method of solving linear systems is called Gauss-Jordan reduction or solving


linear systems via equivalent matrices.

Example 132. The linear system

x + 2y + 3z = 9
2x − y + z = 8
3x − z = 3

has the augmented matrix


 
1 2 3 9
[A b] =  2 −1 1 8 
3 0 −1 3

Transforming this matrix to reduced row echelon form, we obtain (verify)


 
1 0 0 2
[C d] =  0 1 0 −1  .
0 0 1 3
Therefore, $r_C = r_{[C \mid d]} = 3 = n$ and the linear system has a unique solution, which is x = 2, y = −1, z = 3.
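Gauss-Jordan reduction of the augmented matrix can be sketched as follows. This is an illustrative Python sketch (the function name `gauss_jordan_solve` is my own; for brevity it assumes a unique solution exists, i.e. the case r = n above), reproducing the solution of Example 132:

```python
from fractions import Fraction

def gauss_jordan_solve(A, b):
    """Reduce the augmented matrix [A | b] to reduced row echelon form
    and read off the solution (assumes the system has a unique solution)."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(bi)]
         for row, bi in zip(A, b)]
    for c in range(n):
        pivot = next(i for i in range(c, n) if M[i][c] != 0)
        M[c], M[pivot] = M[pivot], M[c]          # bring the pivot up
        M[c] = [x / M[c][c] for x in M[c]]       # leading 1
        for i in range(n):                       # clear the rest of the column
            if i != c:
                M[i] = [u - M[i][c] * v for u, v in zip(M[i], M[c])]
    return [row[n] for row in M]                 # the d column is the solution

A = [[1, 2, 3], [2, -1, 1], [3, 0, -1]]
b = [9, 8, 3]
x = gauss_jordan_solve(A, b)
```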

Example 133. Solve the linear system

x + y + 2z = 1
2y + 7z = 4
3x + 3y + 6z = 3.

The augmented matrix of the linear system is


 
1 1 2 1
[A b] =  0 2 7 4 
3 3 6 3

By using elementary row operations,
$$\begin{bmatrix} 1 & 1 & 2 & 1 \\ 0 & 2 & 7 & 4 \\ 3 & 3 & 6 & 3 \end{bmatrix} \xrightarrow[-3r_1+r_3 \to r_3]{\frac{1}{2}r_2 \to r_2} \begin{bmatrix} 1 & 1 & 2 & 1 \\ 0 & 1 & 7/2 & 2 \\ 0 & 0 & 0 & 0 \end{bmatrix} \xrightarrow{-r_2+r_1 \to r_1} \begin{bmatrix} 1 & 0 & -3/2 & -1 \\ 0 & 1 & 7/2 & 2 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$

Hence, $r_C = r_{[C \mid d]} = 2 < 3 = n$. So, the linear system has infinitely many solutions with respect to 3 − 2 = 1 arbitrary variable. Using back substitution, we have
$$x - \frac{3}{2}z = -1$$
$$y + \frac{7}{2}z = 2$$
$$z = \text{any real number.}$$

So, the solution for the linear system is:
$$x = -1 + \frac{3}{2}r$$
$$y = 2 - \frac{7}{2}r$$
$$z = r, \text{ any real number.}$$
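The one-parameter family of solutions can be verified mechanically. A small Python sketch (the helper names `solution` and `satisfies` are my own) substituting several values of r into the original equations:

```python
from fractions import Fraction

def solution(r):
    """The one-parameter family x = -1 + (3/2)r, y = 2 - (7/2)r, z = r."""
    return (-1 + Fraction(3, 2) * r, 2 - Fraction(7, 2) * r, Fraction(r))

def satisfies(x, y, z):
    """Check all three equations of the original system."""
    return (x + y + 2 * z == 1
            and 2 * y + 7 * z == 4
            and 3 * x + 3 * y + 6 * z == 3)

checks = [satisfies(*solution(r)) for r in range(-3, 4)]
```

Each choice of r gives a different solution, illustrating why the system has infinitely many.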

Example 134. Solve the linear system

3x + 2y = 7
17x + y = 0
6x + 4y = 3.

The augmented matrix of the linear system is


 
3 2 7
[A b] =  17 1 0 
6 4 3

Transforming this matrix to row echelon form, we obtain (verify)


 
1 2/3 7/3
[C d] =  0 −31/3 −119/3  .
0 0 −11

Since $r_C = 2 \neq r_{[C \mid d]} = 3$, the linear system has no solution (it is inconsistent). This can be verified by the last equation: 0x + 0y = −11, which can never be satisfied.

Homogeneous Systems

Now we study a homogeneous system Ax = 0 of m linear equations in n unknowns.


Let $[C \mid 0]$ be a reduced row echelon form of the augmented matrix $[A \mid 0]$. Since we always have $r_C = r_{[C \mid 0]}$, a homogeneous system always has a solution:

1) If r = n, where r = rank(C), the homogeneous system has only zero (trivial)


solution.
2) If r < n then the homogeneous system has a nontrivial solution; infinitely many
solutions with respect to n − r arbitrary variables.
Example 135. Solve the homogeneous system
−3x + 2y − 3z = 0
2x + 5y + 2z = 0
4x + y − 3z = 0.
The augmented matrix of the homogeneous system is
 
−3 2 −3 0
[A 0] =  2 5 2 0 
4 1 −3 0
By using elementary row operations,
$$\begin{bmatrix} -3 & 2 & -3 & 0 \\ 2 & 5 & 2 & 0 \\ 4 & 1 & -3 & 0 \end{bmatrix} \xrightarrow{r_1+r_3 \to r_3} \begin{bmatrix} -3 & 2 & -3 & 0 \\ 2 & 5 & 2 & 0 \\ 1 & 3 & -6 & 0 \end{bmatrix} \xrightarrow{r_1 \leftrightarrow r_3} \begin{bmatrix} 1 & 3 & -6 & 0 \\ 2 & 5 & 2 & 0 \\ -3 & 2 & -3 & 0 \end{bmatrix}$$
$$\xrightarrow[3r_1+r_3 \to r_3]{-2r_1+r_2 \to r_2} \begin{bmatrix} 1 & 3 & -6 & 0 \\ 0 & -1 & 14 & 0 \\ 0 & 11 & -21 & 0 \end{bmatrix} \xrightarrow{-r_2 \to r_2} \begin{bmatrix} 1 & 3 & -6 & 0 \\ 0 & 1 & -14 & 0 \\ 0 & 11 & -21 & 0 \end{bmatrix} \xrightarrow[-11r_2+r_3 \to r_3]{-3r_2+r_1 \to r_1} \begin{bmatrix} 1 & 0 & 36 & 0 \\ 0 & 1 & -14 & 0 \\ 0 & 0 & 133 & 0 \end{bmatrix}$$
$$\xrightarrow{\frac{1}{133}r_3 \to r_3} \begin{bmatrix} 1 & 0 & 36 & 0 \\ 0 & 1 & -14 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \xrightarrow[-36r_3+r_1 \to r_1]{14r_3+r_2 \to r_2} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}.$$

Since r = 3 = n, the homogeneous system has only trivial (zero) solution, that is,
x = y = z = 0.
Theorem 136. A homogeneous system of m linear equations in n unknowns always has
a nontrivial solution if m < n, that is, if the number of unknowns exceeds the number
of equations.

Cramer’s Rule

Let
a11 x1 + a12 x2 + · · · + a1n xn = b1
a21 x1 + a22 x2 + · · · + a2n xn = b2
..
.
an1 x1 + an2 x2 + · · · + ann xn = bn
be a linear system of n equations in n unknowns, and let A = [aij ] be the coefficient
matrix so that we can write the given system as Ax = b, where
 
b1
 b2 
b =  ..  .
 
 . 
bn

If det(A) ≠ 0, then the system has the unique solution
$$x_1 = \frac{\det(A_1)}{\det(A)}, \quad x_2 = \frac{\det(A_2)}{\det(A)}, \quad \ldots, \quad x_n = \frac{\det(A_n)}{\det(A)},$$

where Ai is the matrix obtained from A by replacing the ith column of A by b.

Example 137. Consider the following linear system:

−2x1 + 3x2 − x3 =1
x1 + 2x2 − x3 =4
−2x1 − x2 + x3 = −3.

We have $|A| = \begin{vmatrix} -2 & 3 & -1 \\ 1 & 2 & -1 \\ -2 & -1 & 1 \end{vmatrix} = -2$. Then

$$x_1 = \frac{\begin{vmatrix} 1 & 3 & -1 \\ 4 & 2 & -1 \\ -3 & -1 & 1 \end{vmatrix}}{|A|} = \frac{-4}{-2} = 2, \qquad x_2 = \frac{\begin{vmatrix} -2 & 1 & -1 \\ 1 & 4 & -1 \\ -2 & -3 & 1 \end{vmatrix}}{|A|} = \frac{-6}{-2} = 3,$$
$$x_3 = \frac{\begin{vmatrix} -2 & 3 & 1 \\ 1 & 2 & 4 \\ -2 & -1 & -3 \end{vmatrix}}{|A|} = \frac{-8}{-2} = 4.$$
Remark 138. We note that Cramer’s rule is applicable only to the case in which we
have n equations in n unknowns and the coefficient matrix A is nonsingular. If we
have to solve a linear system of n equations in n unknowns whose coefficient matrix is
singular, then we must use the Gaussian elimination or Gauss-Jordan reduction methods
as discussed before. Cramer’s rule becomes computationally inefficient for n ≥ 4, so it is
better, in general, to use the Gaussian elimination or Gauss-Jordan reduction methods.

Summary 139. Note that at this point we have shown that the following statements
are equivalent for an n × n matrix A:

1. A is nonsingular.

2. Ax = 0 has only the trivial solution.

3. A is row (column) equivalent to In .

4. The linear system Ax = b has a unique solution for every n × 1 matrix b.

5. det(A) 6= 0.

Solving Linear Systems via the Inverse of a Coefficient Matrix

Consider the linear system of n equations in n unknowns.

a11 x1 + a12 x2 + · · · + a1n xn = b1


a21 x1 + a22 x2 + · · · + a2n xn = b2
..
.
an1 x1 + an2 x2 + · · · + ann xn = bn

Let A = [aij ] be the coefficient matrix of the linear system so that we can write the
given system as Ax = b, where  
b1
 b2 
b =  .. .
 
 . 
bn
If det(A) 6= 0 then A is nonsingular (invertible) and A−1 exists. Therefore, the equation
Ax = b becomes;

$$Ax = b \;\Rightarrow\; A^{-1}Ax = A^{-1}b \;\Rightarrow\; x = A^{-1}b.$$

So, we can solve the linear system by using the inverse of the coefficient matrix.

Example 140. Solve the following linear system by using the inverse of the coefficient matrix.
x + 3y + z = 5
2x + y + z = 4
−2x + 2y − z = 3.
   
We have $A = \begin{bmatrix} 1 & 3 & 1 \\ 2 & 1 & 1 \\ -2 & 2 & -1 \end{bmatrix}$. Then $A^{-1} = \dfrac{1}{\det(A)}\operatorname{adj} A = \begin{bmatrix} -1 & 5/3 & 2/3 \\ 0 & 1/3 & 1/3 \\ 2 & -8/3 & -5/3 \end{bmatrix}$.
Therefore,
$$x = \begin{bmatrix} x \\ y \\ z \end{bmatrix} = A^{-1}b = \begin{bmatrix} -1 & 5/3 & 2/3 \\ 0 & 1/3 & 1/3 \\ 2 & -8/3 & -5/3 \end{bmatrix}\begin{bmatrix} 5 \\ 4 \\ 3 \end{bmatrix} = \begin{bmatrix} 11/3 \\ 7/3 \\ -17/3 \end{bmatrix}.$$
So, $x = \frac{11}{3}$, $y = \frac{7}{3}$ and $z = -\frac{17}{3}$.

5 Vectors in the Plane and in 3-Space
Definition 141. A (nonzero) vector is a directed line segment drawn from a point P (called its initial point) to a point Q (called its terminal point), P and Q being distinct points. The vector is denoted by $\overrightarrow{PQ}$. Its magnitude is the length of the line segment, denoted by $\|\overrightarrow{PQ}\|$, and its direction is the same as that of the directed line segment. The zero vector is just a point, and it is denoted by 0.

To indicate the direction of a vector, we draw an arrow from its initial point to its terminal point. We will often denote a vector by a single bold-faced letter (e.g. v) and use the terms “magnitude” and “length” interchangeably. Note that our definition could apply to systems with any number of dimensions.

A few things need to be noted about the zero vector. Our motivation for what a vector
is included the notions of magnitude and direction. What is the magnitude of the zero
vector? We define it to be zero, i.e. ||0|| = 0. This agrees with the definition of the
zero vector as just a point, which has zero length. What about the direction of the
zero vector? A single point really has no well-defined direction. Notice that we were
careful to only define the direction of a nonzero vector, which is well-defined since the
initial and terminal points are distinct. Not everyone agrees on the direction of the
zero vector. Some contend that the zero vector has arbitrary direction (i.e. can take
any direction), some say that it has indeterminate direction (i.e. the direction can not
be determined), while others say that it has no direction.
Now that we know what a vector is, we need a way of determining when two vectors
are equal. This leads us to the following definition.

Definition 142. Two nonzero vectors are equal if they have the same magnitude and
the same direction. Any vector with zero magnitude is equal to the zero vector.

By this definition, vectors with the same magnitude and direction but with different initial points would be equal. For example, in Figure 1.1.5 the vectors u, v and w all have the same magnitude $\sqrt{5}$ (by the Pythagorean Theorem). And we see that u and w are parallel, since they lie on lines having the same slope $\frac{1}{2}$, and they point in the same direction. So u = w, even though they have different initial points. We also see that v is parallel to u but points in the opposite direction. So u ≠ v.
Recall the distance formula for points in the Euclidean plane:
For points P = (x1 , y1 ), Q = (x2 , y2 ) in R², the distance d between P and Q is:
$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

−→
By this formula, we have the following result: For a vector $\overrightarrow{PQ}$ in R² with initial point P = (x1 , y1 ) and terminal point Q = (x2 , y2 ), the magnitude of $\overrightarrow{PQ}$ is:
$$\|\overrightarrow{PQ}\| = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$

Finding the magnitude of a vector v = (a, b) in R² is a special case of the above formula with P = (0, 0) and Q = (a, b): for a vector v = (a, b) in R², the magnitude of v is:
$$\|v\| = \sqrt{a^2 + b^2}$$

Theorem 143. The distance d between points P = (x1 , y1 , z1 ) and Q = (x2 , y2 , z2 ) in
R3 is:
d = √((x2 − x1 )2 + (y2 − y1 )2 + (z2 − z1 )2 )

Theorem 144. For a vector v = (a, b, c) in R3 , the magnitude of v is:

kvk = √(a2 + b2 + c2 )
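These two formulas translate directly into code. A minimal Python sketch (the function names `magnitude` and `distance` are our own, not from the text):

```python
import math

def magnitude(v):
    """Length (norm) of a vector given as a tuple of components."""
    return math.sqrt(sum(x * x for x in v))

def distance(p, q):
    """Distance between points p and q = length of the vector from p to q."""
    return magnitude(tuple(b - a for a, b in zip(p, q)))

print(magnitude((1, 2)))               # sqrt(5), as in the figure above
print(distance((1, 1, 1), (2, 3, 3)))  # sqrt(1 + 4 + 4) = 3.0
```

Both helpers work for vectors in R2 and in R3, since they iterate over however many components are given.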

5.1 Vector Algebra


Definition 145. A scalar is a quantity that can be represented by a single number.

Definition 146. For a scalar k and a nonzero vector v, the scalar multiple of v by k,
denoted by kv, is the vector whose magnitude is |k|kvk, points in the same direction as
v if k > 0, points in the opposite direction as v if k < 0, and is the zero vector 0 if
k = 0. For the zero vector 0, we define k0 = 0 for any scalar k.

Two vectors v and w are parallel (denoted by v k w) if one is a scalar multiple of the
other. You can think of scalar multiplication of a vector as stretching or shrinking the
vector, and as flipping the vector in the opposite direction if the scalar is a negative
number (see the below figure).

Recall that translating a nonzero vector means that the initial point of the vector is
changed but the magnitude and direction are preserved. We are now ready to define
the sum of two vectors.

Definition 147. The sum of vectors v and w, denoted by v + w, is obtained by trans-


lating w so that its initial point is at the terminal point of v; the initial point of v + w
is the initial point of v, and its terminal point is the new terminal point of w.
The term scalar was coined by the 19th-century Irish mathematician, physicist and
astronomer William Rowan Hamilton, to convey the sense of something that could be
represented by a point on a scale or graduated ruler. The word vector comes from
Latin, where it means "carrier".

An alternate definition of scalars and vectors, used in physics, is that under certain
types of coordinate transformations (e.g. rotations), a quantity that is not affected is a
scalar, while a quantity that is affected (in a certain way) is a vector. See MARION for
details.

Intuitively, adding w to v means tacking on w to the end of v (see Figure below).

Notice that our definition is valid for the zero vector (which is just a point, and hence
can be translated), and so we see that v +0 = v = 0+v for any vector v. In particular,
0+0 = 0. Also, it is easy to see that v+(−v) = 0, as we would expect. In general, since
the scalar multiple −v = −1v is a well-defined vector, we can define vector subtraction
as follows: v − w = v + (−w).

The below figure shows the use of “geometric proofs” of various laws of vector algebra,
that is, it uses laws from elementary geometry to prove statements about vectors. For
example, (a) shows that v + w = w + v for any vectors v, w. And (c) shows how you
can think of v − w as the vector that is tacked on to the end of w to add up to v.

Theorem 148. Let v = (v1 , v2 , v3 ) and w = (w1 , w2 , w3 ) be vectors in R3 , and let k
be a scalar. Then
(a) kv = (kv1 , kv2 , kv3 )
(b) v + w = (v1 + w1 , v2 + w2 , v3 + w3 )
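Theorem 148 says that scalar multiplication and vector addition act componentwise, which is exactly how one computes them. A small Python sketch (the helper names are our own):

```python
def scalar_mult(k, v):
    """kv = (k*v1, k*v2, k*v3): multiply each component by k."""
    return tuple(k * x for x in v)

def add(v, w):
    """v + w = (v1+w1, v2+w2, v3+w3): add corresponding components."""
    return tuple(a + b for a, b in zip(v, w))

v, w = (1, 2, 3), (4, 5, 6)
print(scalar_mult(2, v))  # (2, 4, 6)
print(add(v, w))          # (5, 7, 9)
```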

The following theorem summarizes the basic laws of vector algebra.

Theorem 149. For any vectors u, v, w, and scalars k, l, we have


(a) v + w = w + v Commutative Law
(b) u + (v + w) = (u + v) + w Associative Law
(c) v + 0 = v = 0 + v Additive Law
(d) v + (−v) = 0 Additive Inverse
(e) k(lv) = (kl)v Associative Law
(f ) k(v + w) = kv + kw Distributive Law
(g) (k + l)v = kv + lv Distributive Law

A unit vector is a vector with magnitude 1. Notice that for any nonzero vector v, the
vector v/kvk is a unit vector which points in the same direction as v, since 1/kvk > 0
and kv/kvkk = kvk/kvk = 1. Dividing a nonzero vector v by kvk is often called
normalizing v.
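Normalizing is a one-line computation. A Python sketch (the name `normalize` is our own):

```python
import math

def normalize(v):
    """Return the unit vector v/||v||; v must be nonzero."""
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)

u = normalize((3, 4))
print(u)  # (0.6, 0.8), which has magnitude 1
```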

Analytic (Component) Form of Vectors

There are specific unit vectors which we will often use, called the basis vectors:
i = (1, 0, 0), j = (0, 1, 0), and k = (0, 0, 1) in R3 ; i = (1, 0) and j = (0, 1) in R2 .
These are useful for several reasons: they are mutually perpendicular, since they lie on
distinct coordinate axes; they are all unit vectors: kik = kjk = kkk = 1; every vector
can be written as a unique scalar combination of the basis vectors: v = (a, b) = ai + bj
in R2 , v = (a, b, c) = ai + bj + ck in R3 .
When a vector v = (a, b, c) is written as v = ai+bj+ck, we say that v is in component
form (analytic form), and that a, b, and c are the i, j, and k components, respectively,
of v. We have:

v = v1 i + v2 j + v3 k, α is a scalar ⇒ αv = αv1 i + αv2 j + αv3 k


v = v1 i + v2 j + v3 k, w = w1 i + w2 j + w3 k
⇒ v + w = (v1 + w1 ) i + (v2 + w2 ) j + (v3 + w3 ) k
q
v = v1 i + v2 j + v3 k ⇒ kvk = v12 + v22 + v32

The distance d between two points P = (x1 , y1 , z1 ) and Q = (x2 , y2 , z2 ) in R3 is the
same as the length of the vector w − v, where the vectors v and w are defined as
v = (x1 , y1 , z1 ) and w = (x2 , y2 , z2 ). Since w − v = (x2 − x1 , y2 − y1 , z2 − z1 ), then

d = kw − vk = √((x2 − x1 )2 + (y2 − y1 )2 + (z2 − z1 )2 )

Dot Product

Definition 150. Let v = (v1 , v2 , v3 ) and w = (w1 , w2 , w3 ) be vectors in R3 . The dot


product (inner product-scalar product) of v and w, denoted by v · w, is given by:

v · w = v1 w1 + v2 w2 + v3 w3

Similarly, for vectors v = (v1 , v2 ) and w = (w1 , w2 ) in R2 , the dot product is:

v · w = v1 w1 + v2 w2

Notice that the dot product of two vectors is a scalar, not a vector. So the associative
law that holds for multiplication of numbers and for addition of vectors does not hold
for the dot product of vectors. Why? Because for vectors u, v, w, the dot product
u · v is a scalar, and so (u · v) · w is not defined, since the left side of that dot product
(the part in parentheses) is a scalar and not a vector.
Definition 151. The angle between two nonzero vectors with the same initial point is
the smallest angle between them.

We do not define the angle between the zero vector and any other vector. Any two
nonzero vectors with the same initial point have two angles between them: θ and
360◦ − θ. We will always choose the smallest nonnegative angle θ between them, so
that 0◦ ≤ θ ≤ 180◦ .
Theorem 152. Let v, w be nonzero vectors, and let θ be the angle between them. Then
v·w
cos θ =
kvkkwk

For vectors v = v1 i + v2 j + v3 k and w = w1 i + w2 j + w3 k in component form, the
dot product is still v · w = v1 w1 + v2 w2 + v3 w3 , since i, j and k are unit vectors
perpendicular to each other, that is,

kik = kjk = kkk = 1, i2 = i · i = 1, j2 = j · j = 1, k2 = k · k = 1


i · j = 0, i · k = 0, j · k = 0

Two nonzero vectors are perpendicular if the angle between them is 90◦ . Since cos 90◦ =
0, we have the following important corollary:
Corollary 153. Two nonzero vectors v and w are perpendicular if and only if v·w = 0.

We will write v ⊥ w to indicate that v and w are perpendicular.


Theorem 154. For any vectors u, v, w, and scalar k, we have
(a) v · w = w · v Commutative Law
(b) (kv) · w = v · (kw) = k(v · w) Associative Law
(c) v · 0 = 0 · v = 0
(d) u · (v + w) = u · v + u · w Distributive Law
(e) (u + v) · w = u · w + v · w Distributive Law
(f ) |v · w| ≤ kvkkwk Cauchy-Schwarz Inequality

Using the theorem above, we see that if u · v = 0 and u · w = 0, then u · (kv + lw) =
k(u · v) + l(u · w) = k(0) + l(0) = 0 for all scalars k, l. Thus, we have the following fact:

If u ⊥ v and u ⊥ w, then u ⊥ (kv + lw) for all scalars k, l.

For vectors v and w, the collection of all scalar combinations kv +lw is called the span
of v and w. If nonzero vectors v and w are parallel, then their span is a line; if they
are not parallel, then their span is a plane. So what we showed above is that a vector
which is perpendicular to two other vectors is also perpendicular to their span.
Theorem 155. For any vectors v, w, we have
(a) kvk2 = v · v
(b) kv + wk ≤ kvk + kwk Triangle Inequality
(c) kv − wk ≥ kvk − kwk
Example 156. Find the angle θ between the vectors v = (2, 1, −1) and w = (3, −4, 1).
Solution: Since v · w = (2)(3) + (1)(−4) + (−1)(1) = 1, kvk = √6, and kwk = √26,
then

cos θ = (v · w)/(kvkkwk) = 1/(√6 √26) = 1/(2√39) ≈ 0.08 ⇒ θ ≈ 85.41◦
Example 157. Are the vectors v = (−1, 5, −2) and w = (3, 1, 1) perpendicular?
Solution: Yes, v ⊥ w since v · w = (−1)(3) + (5)(1) + (−2)(1) = 0
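Examples 156 and 157 can be checked numerically using the angle formula of Theorem 152. A Python sketch (the helper names are our own):

```python
import math

def dot(v, w):
    """Dot product: sum of the componentwise products."""
    return sum(a * b for a, b in zip(v, w))

def angle_deg(v, w):
    """Angle between nonzero v and w, from cos(theta) = v.w / (||v|| ||w||)."""
    n = math.sqrt(dot(v, v)) * math.sqrt(dot(w, w))
    return math.degrees(math.acos(dot(v, w) / n))

print(round(angle_deg((2, 1, -1), (3, -4, 1)), 2))  # 85.41, as in Example 156
print(dot((-1, 5, -2), (3, 1, 1)))                  # 0, so the vectors are perpendicular
```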

Cross (Vector) Product

In Section 1.3 we defined the dot product, which gave a way of multiplying two vectors.
The resulting product, however, was a scalar, not a vector. In this section we will define
a product of two vectors that does result in another vector. This product, called the
cross product, is only defined for vectors in R3 . The definition may appear strange and
lacking motivation, but we will see the geometric basis for it shortly.
Suppose that u = u1 i + u2 j + u3 k and v = v1 i + v2 j + v3 k and that we want to find
a vector w = (x, y, z) orthogonal (perpendicular) to both u and v. Thus we want
u · w = 0 and v · w = 0, which leads to the linear system

u1 x + u2 y + u3 z = 0
v1 x + v2 y + v3 z = 0

It can be shown that
w = (u2 v3 − u3 v2 , u3 v1 − u1 v3 , u1 v2 − u2 v1 )
is a solution to the system above (verify). Of course, we can also write w as
w = (u2 v3 − u3 v2 ) i + (u3 v1 − u1 v3 ) j + (u1 v2 − u2 v1 ) k
This vector is called the cross product of u and v and is denoted by u × v. Note that
the cross product, u × v, is a vector, while the dot product, u · v, is a scalar.

Definition 158. Let u = (u1 , u2 , u3 ) and v = (v1 , v2 , v3 ) be vectors in R3 . The cross
product of u and v, denoted by u × v, is the vector in R3 given by:

u × v = (u2 v3 − u3 v2 , u3 v1 − u1 v3 , u1 v2 − u2 v1 )

Example 159. Find i × j.


Solution: Since i = (1, 0, 0) and j = (0, 1, 0), then

i × j = ((0)(0) − (0)(1), (0)(0) − (1)(0), (1)(1) − (0)(0))


= (0, 0, 1)
=k

Similarly it can be shown that j × k = i and k × i = j.
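The defining formula makes the cross product easy to compute. A Python sketch (the name `cross` is our own) that confirms the basis-vector identities above:

```python
def cross(u, v):
    """u x v for 3-component vectors, straight from the definition."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)

i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)
print(cross(i, j) == k)  # True
print(cross(j, k) == i)  # True
print(cross(k, i) == j)  # True
```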


Theorem 160. If the cross product v × w of two nonzero vectors v and w is also a
nonzero vector, then it is perpendicular to both v and w.

Proof. We will show that (v × w) · v = 0:

(v × w) · v = (v2 w3 − v3 w2 , v3 w1 − v1 w3 , v1 w2 − v2 w1 ) · (v1 , v2 , v3 )
= v2 w3 v1 − v3 w2 v1 + v3 w1 v2 − v1 w3 v2 + v1 w2 v3 − v2 w1 v3
= (v1 v2 w3 − v1 v2 w3 ) + (w1 v2 v3 − w1 v2 v3 ) + (v1 w2 v3 − v1 w2 v3 )
= 0, after rearranging the terms.

∴ v × w ⊥ v by Corollary 153.

We will now derive a formula for the magnitude of v × w, for nonzero vectors v, w:

If θ is the angle between nonzero vectors v and w in R3 , then

kv × wk = kvkkwk sin θ

Example 161. Let △P QR and P QRS be a triangle and parallelogram, respectively,
as shown below.

Think of the triangle as existing in R3 , and identify the sides QR and QP with vectors
v and w, respectively, in R3 . Let θ be the angle between v and w. The area AP QR of
△P QR is (1/2)bh, where b is the base of the triangle and h is the height. So we see
that b = kvk and h = kwk sin θ, hence

AP QR = (1/2)kvkkwk sin θ = (1/2)kv × wk

So since the area AP QRS of the parallelogram P QRS is twice the area of the triangle
△P QR, then
AP QRS = kvkkwk sin θ = kv × wk

Theorem 162. Area of triangles and parallelograms.

(a) The area A of a triangle with adjacent sides v, w (as vectors in R3 ) is:
A = (1/2)kv × wk
(b) The area A of a parallelogram with adjacent sides v, w (as vectors in R3 ) is:
A = kv × wk

Example 163. Calculate the area of the triangle △P QR, where P = (2, 4, −7), Q =
(3, 7, 18), and R = (−5, 12, 8).
Solution: Let v = P Q and w = P R (the vectors from P to Q and from P to R). Then
v = (3, 7, 18) − (2, 4, −7) = (1, 3, 25) and w = (−5, 12, 8) − (2, 4, −7) = (−7, 8, 15),
so the area A of the triangle △P QR is

A = (1/2)kv × wk = (1/2)k(1, 3, 25) × (−7, 8, 15)k
= (1/2)k((3)(15) − (25)(8), (25)(−7) − (1)(15), (1)(8) − (3)(−7))k
= (1/2)k(−155, −190, 29)k
= (1/2)√((−155)2 + (−190)2 + 292 ) = (1/2)√60966
A ≈ 123.46
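Theorem 162 (a) turns Example 163 into a short computation. A Python sketch (the helper names are our own):

```python
import math

def cross(u, v):
    """u x v from the definition of the cross product."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)

def triangle_area(p, q, r):
    """Area of triangle PQR = ||PQ x PR|| / 2 (Theorem 162 (a))."""
    v = tuple(b - a for a, b in zip(p, q))  # vector PQ
    w = tuple(b - a for a, b in zip(p, r))  # vector PR
    c = cross(v, w)
    return math.sqrt(sum(x * x for x in c)) / 2

print(round(triangle_area((2, 4, -7), (3, 7, 18), (-5, 12, 8)), 2))  # 123.46
```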

Example 164. Calculate the area of the parallelogram P QRS, where P = (1, 1), Q =
(2, 3), R = (5, 4), and S = (4, 2).
Solution: Let v = SP and w = SR (the vectors from S to P and from S to R). Then
v = (1, 1) − (4, 2) = (−3, −1) and w = (5, 4) − (4, 2) = (1, 2). But these are vectors
in R2 , and the cross product is only defined for vectors in R3 . However, R2 can be
thought of as the subset of R3 such that the z-coordinate is always 0. So we can write
v = (−3, −1, 0) and w = (1, 2, 0). Then the area A of P QRS is

A = kv × wk = k(−3, −1, 0) × (1, 2, 0)k
= k((−1)(0) − (0)(2), (0)(1) − (−3)(0), (−3)(2) − (−1)(1))k
= k(0, 0, −5)k
A = 5

Theorem 165. For any vectors u, v, w in R3 , and scalar k, we have


(a) v × w = −w × v Anticommutative Law
(b) u × (v + w) = (u × v) + (u × w) Distributive Law
(c) (u + v) × w = (u × w) + (v × w) Distributive Law
(d) (kv) × w = v × (kw) = k(v × w) Associative Law
(e) v × 0 = 0 = 0 × v
(f ) v × v = 0
(g) v × w = 0 if and only if v k w
Example 166. We have

i × i = j × j = k × k = 0
i × j = k, j × k = i, k × i = j

Also,

j × i = −k, k × j = −i, i × k = −j

Volume of a parallelepiped: Scalar triple product

Let the vectors u, v, w in R3 represent adjacent sides of a parallelepiped P as in the


figure. We will show that the volume of P is the scalar triple product u · (v × w).

Recall that the volume vol(P ) of a parallelepiped P is the area A of the base
parallelogram times the height h. We know that the area A of the base parallelogram
is kv × wk. And since v × w is perpendicular to the base parallelogram determined by
v and w, the height h is kuk cos θ, where θ is the angle between u and v × w.
We also know that cos θ = u · (v × w)/(kukkv × wk). Hence,

vol(P ) = Ah
= kv × wk · kuk · u · (v × w)/(kukkv × wk)
= u · (v × w)

For any vectors u, v, w in R3 ,

u · (v × w) = w · (u × v) = v · (w × u)

Since v × w = −w × v for any vectors v, w in R3 , then picking the wrong order for
the three adjacent sides in the scalar triple product will give you the negative of the
volume of the parallelepiped. So taking the absolute value of the scalar triple product
for any order of the three adjacent sides will always give the volume.
Another type of triple product is the vector triple product u × (v × w).

Theorem 167. For any vectors u, v, w in R3 ,

u × (v × w) = (u · w)v − (u · v)w

Example 168. Find u × (v × w) for u = (1, 2, 4), v = (2, 2, 0), w = (1, 3, 0)


Solution: Since u · v = 6 and u · w = 7, then

u × (v × w) = (u · w)v − (u · v)w
= 7(2, 2, 0) − 6(1, 3, 0) = (14, 14, 0) − (6, 18, 0)
= (8, −4, 0)
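The identity of Theorem 167 can be verified on the data of Example 168. A Python sketch (the helper names are our own):

```python
def dot(v, w):
    """Dot product of two vectors."""
    return sum(a * b for a, b in zip(v, w))

def cross(u, v):
    """Cross product of two 3-component vectors."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)

u, v, w = (1, 2, 4), (2, 2, 0), (1, 3, 0)
lhs = cross(u, cross(v, w))
# Right side of Theorem 167: (u.w)v - (u.v)w
rhs = tuple(dot(u, w) * a - dot(u, v) * b for a, b in zip(v, w))
print(lhs, rhs)  # both sides equal (8, -4, 0)
```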

For vectors v = v1 i + v2 j + v3 k and w = w1 i + w2 j + w3 k in component form, the cross
product is written as: v × w = (v2 w3 − v3 w2 ) i + (v3 w1 − v1 w3 ) j + (v1 w2 − v2 w1 ) k.
It is often convenient to use the component form for the cross product, because it can
be represented as a determinant.

        | i   j   k  |
v × w = | v1  v2  v3 | = (v2 w3 − v3 w2 ) i − (v1 w3 − v3 w1 ) j + (v1 w2 − v2 w1 ) k
        | w1  w2  w3 |

      = (v2 w3 − v3 w2 ) i + (v3 w1 − v1 w3 ) j + (v1 w2 − v2 w1 ) k

Example 169. Let v = 4i − j + 3k and w = i + 2k. Then

        | i   j   k |
v × w = | 4  −1   3 | = ((−1)(2) − (3)(0)) i − ((4)(2) − (3)(1)) j + ((4)(0) − (−1)(1)) k
        | 1   0   2 |

      = −2i − 5j + k

Theorem 170. For any vectors u = (u1 , u2 , u3 ), v = (v1 , v2 , v3 ), w = (w1 , w2 , w3 )
in R3 :

              | u1  u2  u3 |
u · (v × w) = | v1  v2  v3 |
              | w1  w2  w3 |

Example 171. Find the volume of the parallelepiped with adjacent sides u = (2, 1, 3),
v = (−1, 3, 2), w = (1, 1, −2). By the previous theorem, the volume vol(P ) of the
parallelepiped P is the absolute value of the scalar triple product of the three adjacent
sides (in any order):

              |  2  1   3 |
u · (v × w) = | −1  3   2 |
              |  1  1  −2 |

= 2((3)(−2) − (2)(1)) − 1((−1)(−2) − (2)(1)) + 3((−1)(1) − (3)(1))
= 2(−8) − 1(0) + 3(−4) = −28, so
vol(P ) = | − 28| = 28.
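The same computation in code: a Python sketch of the scalar triple product for Example 171 (the helper names are our own):

```python
def dot(v, w):
    """Dot product of two vectors."""
    return sum(a * b for a, b in zip(v, w))

def cross(u, v):
    """Cross product of two 3-component vectors."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)

u, v, w = (2, 1, 3), (-1, 3, 2), (1, 1, -2)
triple = dot(u, cross(v, w))        # scalar triple product u . (v x w)
print(triple, abs(triple))          # -28 28: the volume is |−28| = 28
```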

6 Vector spaces
A useful procedure in mathematics and other disciplines involves classification schemes.
That is, we form classes of objects based on properties they have in common. This
allows us to treat all members of the class as a single unit. Thus, instead of dealing
with each distinct member of the class, we can develop notions that apply to each and
every member of the class based on the properties that they have in common. In many
ways this helps us work with more abstract and comprehensive ideas.
Linear algebra has such a classification scheme that has become very important. This
is the notion of a vector space. A vector space consists of a set of objects and two
operations on these objects that satisfy a certain set of rules. If we have a vector

space, we will automatically be able to attribute to it certain properties that hold for
all vector spaces. Thus, upon meeting some new vector space, we will not have to verify
everything from scratch.
The name "vector space" conjures up the image of directed line segments from the plane,
or 3-space, as discussed in the previous section. This is, of course, where the name of
the classification scheme is derived from. We will see that matrices and n-vectors will
give us examples of vector spaces, but other collections of objects, like polynomials,
functions, and solutions to various types of equations, will also be vector spaces. For
particular vector spaces, the members of the set of objects and the operations on these
objects can vary, but the rules governing the properties satisfied by the operations
involved will always be the same.
Definition 172. A real vector space is a set V of elements on which we have two
operations ⊕ and ⊙ defined with the following properties:

a) If u and v are any elements in V , then u ⊕ v is in V . (We say that V is closed
under the operation ⊕.)
(1) u ⊕ v = v ⊕ u for all u, v in V .
(2) u ⊕ (v ⊕ w) = (u ⊕ v) ⊕ w for all u, v, w in V .
(3) There exists an element 0 in V such that u ⊕ 0 = 0 ⊕ u = u for any u in V .
(4) For each u in V there exists an element −u in V such that u ⊕ (−u) =
(−u) ⊕ u = 0.

b) If u is any element in V and c is any real number, then c ⊙ u is in V (i.e., V is
closed under the operation ⊙).
(5) c ⊙ (u ⊕ v) = (c ⊙ u) ⊕ (c ⊙ v) for any u, v in V and any real number c.
(6) (c + d) ⊙ u = (c ⊙ u) ⊕ (d ⊙ u) for any u in V and any real numbers c and d.
(7) c ⊙ (d ⊙ u) = (cd) ⊙ u for any u in V and any real numbers c and d.
(8) 1 ⊙ u = u for any u in V .

• The elements of V are called vectors; the elements of the set of real numbers R
are called scalars.
• The operation ⊕ is called vector addition; the operation ⊙ is called scalar
multiplication.
• The vector 0 in property (3) is called a zero vector . The vector −u in property
(4) is called a negative of u. It can be shown that 0 and −u are unique.

In order to specify a vector space, we must be given a set V and two operations ⊕ and
⊙ satisfying all the properties of the definition. We shall often refer to a real vector

space merely as a vector space. Thus a "vector" is now an element of a vector space
and no longer needs to be interpreted as a directed line segment. In our examples we
shall see, however, how this name came about in a natural manner. We now consider
some examples of vector spaces, leaving it to the reader to verify that all the properties
of the definition above are satisfied.
Example 173. Consider Rn , the set of all n × 1 matrices
 
a1
 a2 
 
 .. 
 . 
an

with real entries. Let the operation ⊕ be matrix addition and let the operation ⊙ be
multiplication of a matrix by a real number (scalar multiplication). By the use of the
properties of matrices established in the previous section, it is not difficult to show that
Rn is a vector space by verifying the properties of Definition 172. Thus the column
matrix with entries a1 , a2 , . . . , an , as an element of Rn , is now called an n-vector or
merely a vector.

Example 174. The set of all m × n matrices with matrix addition as ⊕ and multi-
plication of a matrix by a real number as ⊙ is a vector space (verify). We denote this
vector space by Mm×n .
Example 175. The set of all real numbers with ⊕ as the usual addition of real numbers
and ⊙ as the usual multiplication of real numbers is a vector space (verify). In this case
the real numbers play the dual roles of both vectors and scalars. This vector space is
essentially the case n = 1 of Example 173.
Example 176. Let Rn be the set of all 1 × n matrices [a1 a2 . . . an ], where we define
⊕ by

[a1 a2 · · · an ] ⊕ [b1 b2 · · · bn ] = [a1 + b1 a2 + b2 · · · an + bn ]

and we define ⊙ by

c ⊙ [a1 a2 · · · an ] = [ca1 ca2 · · · can ].

Then Rn is a vector space (verify). This is just a special case of Example 174.
Example 177. Let V be the set of all 2 × 2 matrices with trace equal to zero; that is,

A = [ a b ; c d ] is in V provided Tr(A) = a + d = 0,

where the trace of a square matrix is the sum of its diagonal entries. The operation ⊕
is standard matrix addition and the operation ⊙ is standard scalar multiplication of
matrices; then V is a vector space. We verify properties (a), (3), (4), (b), and (7) of
Definition 172.
The remaining properties are left for the student to verify.
Let
A = [ a b ; c d ] and B = [ r s ; t p ]
be any elements of V . Then Tr(A) = a + d = 0 and Tr(B) = r + p = 0. For property
(a), we have
A ⊕ B = [ a + r  b + s ; c + t  d + p ] and
Tr(A ⊕ B) = (a + r) + (d + p) = (a + d) + (r + p) = 0 + 0 = 0,
so A ⊕ B is in V ; that is, V is closed under the operation ⊕. To verify property (3),
observe that the 2 × 2 zero matrix has trace equal to zero, so it is in V . Then it follows
from the definition of ⊕ that property (3) is valid in V , so [ 0 0 ; 0 0 ] is the zero
vector, which we denote as 0. To verify property (4), let A, as given previously, be an
element of V and let
C = [ −a −b ; −c −d ].
We first show that C is in V :

Tr(C) = (−a) + (−d) = −(a + d) = 0.

Then we have

A ⊕ C = [ a b ; c d ] + [ −a −b ; −c −d ] = [ 0 0 ; 0 0 ] = 0,

so C = −A. For property (b), let k be any real number. We have

k ⊙ A = [ ka kb ; kc kd ]

and Tr(k ⊙ A) = ka + kd = k(a + d) = 0, so k ⊙ A is in V ; that is, V is closed under
the operation ⊙. For property (7), let k and m be any real numbers. Then

k ⊙ (m ⊙ A) = k ⊙ [ ma mb ; mc md ] = [ kma kmb ; kmc kmd ]

and

(km) ⊙ A = [ kma kmb ; kmc kmd ].

It follows that k ⊙ (m ⊙ A) = (km) ⊙ A.

Example 178. Another source of examples is sets of polynomials; therefore, we recall
some well-known facts about such functions. A polynomial (in t) is a function that is
expressible as
p(t) = an tn + an−1 tn−1 + · · · + a1 t + a0
where a0 , a1 , . . . , an are real numbers and n is a nonnegative integer. If an ≠ 0, then
p(t) is said to have degree n. Thus the degree of a polynomial is the highest power of a
term having a nonzero coefficient; p(t) = 2t+1 has degree 1, and the constant polynomial
p(t) = 3 has degree 0. The zero polynomial, denoted by 0, has no degree. We now let
Pn be the set of all polynomials of degree ≤ n together with the zero polynomial. If p(t)
and q(t) are in Pn , we can write

p(t) = an tn + an−1 tn−1 + · · · + a1 t + a0 and q(t) = bn tn + bn−1 tn−1 + · · · + b1 t + b0 .

We define p(t) ⊕ q(t) as

p(t) ⊕ q(t) = (an + bn ) tn + (an−1 + bn−1 ) tn−1 + · · · + (a1 + b1 ) t + (a0 + b0 ) .

If c is a scalar, we also define c ⊙ p(t) as

c ⊙ p(t) = (can ) tn + (can−1 ) tn−1 + · · · + (ca1 ) t + (ca0 )

We now show that Pn is a vector space. Let p(t) and q(t), as before, be elements of Pn ;
that is, they are polynomials of degree ≤ n or the zero polynomial. Then the previous
definitions of the operations ⊕ and ⊙ show that p(t) ⊕ q(t) and c ⊙ p(t), for any scalar
c, are polynomials of degree ≤ n or the zero polynomial. That is, p(t) ⊕ q(t) and c ⊙ p(t)
are in Pn so that (a) and (b) in Definition 172 hold. To verify property (1), we observe
that

q(t) ⊕ p(t) = (bn + an ) tn + (bn−1 + an−1 ) tn−1 + · · · + (b1 + a1 ) t + (b0 + a0 ) ,

and since ai + bi = bi + ai holds for the real numbers, we conclude that p(t)⊕ q(t) =
q(t)⊕p(t). Similarly, we verify property (2). The zero polynomial is the element 0 needed
in property (3). If p(t) is as given previously, then its negative, −p(t), is

−an tn − an−1 tn−1 − · · · − a1 t − a0 .

We shall now verify property (6) and will leave the verification of the remaining prop-
erties to the reader. Thus

(c + d) ⊙ p(t) = (c + d)an tn + (c + d)an−1 tn−1 + · · · + (c + d)a1 t + (c + d)a0
= can tn + dan tn + can−1 tn−1 + dan−1 tn−1 + · · · + ca1 t + da1 t + ca0 + da0
= c (an tn + an−1 tn−1 + · · · + a1 t + a0 ) + d (an tn + an−1 tn−1 + · · · + a1 t + a0 )
= (c ⊙ p(t)) ⊕ (d ⊙ p(t)).

Example 179. Let V be the set of all real-valued continuous functions defined on R1 .
If f and g are in V , we define f ⊕ g by (f ⊕ g)(t) = f (t) + g(t). If f is in V and c is a
scalar, we define c ⊙ f by (c ⊙ f )(t) = cf (t). Then V is a vector space, which is denoted
by C(−∞, ∞).

Example 180. Let V be the set of all real numbers with the operations u ⊕ v = u − v
(⊕ is ordinary subtraction) and c ⊙ u = cu (⊙ is ordinary multiplication). Is V a vector
space? If it is not, which properties in Definition 172 fail to hold?
Solution: If u and v are in V , and c is a scalar, then u ⊕ v and c ⊙ u are in V , so
that (a) and (b) in Definition 172 hold. However, property (1) fails to hold, since

u ⊕ v = u − v and v ⊕ u = v − u

and these are not the same, in general. (Find u and v such that u − v ≠ v − u.) Also,
we shall let the reader verify that properties (2), (3), and (4) fail to hold. Properties
(5), (7), and (8) hold, but property (6) does not hold, because

(c + d) ⊙ u = (c + d)u = cu + du,

whereas
(c ⊙ u) ⊕ (d ⊙ u) = cu ⊕ du = cu − du,
and these are not equal, in general. Thus V is not a vector space.

Example 181. Let V be the set of all integers; define ⊕ as ordinary addition and ⊙
as ordinary multiplication. Here V is not a vector space, because if u is any nonzero
vector in V and c = √3, then c ⊙ u is not in V . Thus (b) fails to hold.

Remark 182. To verify that a given set V with two operations ⊕ and ⊙ is a real vector
space, we must show that it satisfies all the properties of Definition 172.

• If both (a) and (b) hold, it is recommended that (3), the existence of a zero element,
be verified next. Naturally, if (3) fails to hold, we do not have a vector space and
do not have to check the remaining properties.

6.1 Subspaces
In this section we begin to analyze the structure of a vector space. First, it is convenient
to have a name for a subset of a given vector space that is itself a vector space with
respect to the same operations as those in V. Thus we have a definition.

Definition 183. Let V be a vector space and W a nonempty subset of V . If W is a


vector space with respect to the operations in V , then W is called a subspace of V .

Theorem 184. Let V be a vector space with operations ⊕ and ⊙, and let W be a
nonempty subset of V . Then W is a subspace of V if and only if the following conditions
hold:
(a) If u and v are any vectors in W , then u ⊕ v is in W .
(b) If c is any real number and u is any vector in W , then c ⊙ u is in W .

Example 185. Every vector space has at least two subspaces, itself and the subspace
{0} consisting only of the zero vector. (Recall that 0 ⊕ 0 = 0 and c ⊙ 0 = 0 in any
vector space.) Thus {0} is closed for both operations and hence is a subspace of V . The
subspace {0} is called the zero subspace of V .

Example 186. Let P2 be the set consisting of all polynomials of degree ≤ 2 and the
zero polynomial; P2 is a subset of P , the vector space of all polynomials. To verify that
P2 is a subspace of P , show it is closed for ⊕ and ⊙. In general, the set Pn consisting
of all polynomials of degree ≤ n and the zero polynomial is a subspace of P . Also, Pn
is a subspace of Pn+1 .

Example 187. Let V be the set of all polynomials of degree exactly 2; V is a subset
of P , the vector space of all polynomials; but V is not a subspace of P , because the sum
of the polynomials 2t2 + 3t + 1 and −2t2 + t + 2 is not in V , since it is a polynomial of
degree 1.
 
Example 188. Let W be the set of all vectors in R3 of the form (a, b, a + b), where a
and b are any real numbers. To verify Theorem 184 (a) and (b), we let

u = (a1 , b1 , a1 + b1 ) and v = (a2 , b2 , a2 + b2 )

be two vectors in W . Then


   
u ⊕ v = (a1 + a2 , b1 + b2 , (a1 + b1 ) + (a2 + b2 )) = (a1 + a2 , b1 + b2 , (a1 + a2 ) + (b1 + b2 ))

is in W, for W consists of all those vectors whose third entry is the sum of the first two
entries. Similarly,

c ⊙ (a1 , b1 , a1 + b1 ) = (ca1 , cb1 , c (a1 + b1 )) = (ca1 , cb1 , ca1 + cb1 )

is in W . Hence W is a subspace of R3 .

Note 189. Henceforth, we shall usually denote u ⊕ v and c ⊙ u in a vector space V as
u + v and cu, respectively.

We can also show that a nonempty subset W of a vector space V is a subspace


of V if and only if cu + dv is in W for any vectors u and v in W and any scalars
c and d.

Definition 190. Let v1 , v2 , . . . , vk be vectors in a vector space V . A vector v in V is
called a linear combination of v1 , v2 , . . . , vk if

v = a1 v1 + a2 v2 + · · · + ak vk

for some real numbers a1 , a2 , . . . , ak .


Example 191. In R3 let

v1 = (1, 2, 1), v2 = (1, 0, 2), and v3 = (1, 1, 0).

The vector
v = (2, 1, 5)
is a linear combination of v1 , v2 , and v3 if we can find real numbers a1 , a2 , and a3 so
that
a1 v1 + a2 v2 + a3 v3 = v.
Substituting for v, v1 , v2 , and v3 , we have

a1 (1, 2, 1) + a2 (1, 0, 2) + a3 (1, 1, 0) = (2, 1, 5).

Equating corresponding entries leads to the linear system (verify)

a1 + a2 + a3 = 2
2a1 + a3 = 1
a1 + 2a2 = 5.

Solving this linear system by the methods of previous Chapters gives (verify) a1 =
1, a2 = 2, and a3 = −1, which means that v is a linear combination of v1 , v2 , and v3 .
Thus
v = v1 + 2v2 − v3 .
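Finding the coefficients of a linear combination amounts to solving a linear system. A sketch using NumPy (assuming NumPy is available) on the data of Example 191:

```python
import numpy as np

# Columns of A are v1, v2, v3; solve A a = v for the coefficients a1, a2, a3.
A = np.array([[1, 1, 1],
              [2, 0, 1],
              [1, 2, 0]], dtype=float)
v = np.array([2, 1, 5], dtype=float)

coeffs = np.linalg.solve(A, v)
print(coeffs)  # coefficients 1, 2, -1, i.e. v = v1 + 2 v2 - v3
```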

6.2 Span
Linear combinations play an important role in describing vector spaces. The set of all
possible linear combinations of a pair of vectors in a vector space V gives us a subspace.
We have the following definition to help with such constructions:
Definition 192. If S = {v1 , v2 , . . . , vk } is a set of vectors in a vector space V , then
the set of all vectors in V that are linear combinations of the vectors in S is denoted by

span S = ⟨S⟩ = {a1 v1 + a2 v2 + . . . + ak vk | a1 , a2 , . . . , ak ∈ R}.

Example 193. Consider the set S of 2 × 3 matrices given by

S = { [1 0 0; 0 0 0], [0 1 0; 0 0 0], [0 0 0; 0 1 0], [0 0 0; 0 0 1] }.

Then span S is the set in M2×3 consisting of all vectors of the form

a [1 0 0; 0 0 0] + b [0 1 0; 0 0 0] + c [0 0 0; 0 1 0] + d [0 0 0; 0 0 1] = [a b 0; 0 c d],

where a, b, c, d ∈ R. That is, span S is the subset of M2×3 consisting of all matrices of
the form

[ a b 0
  0 c d ]

where a, b, c, and d are real numbers.


Theorem 194. Let S = {v1 , v2 , . . . , vk } be a set of vectors in a vector space V . Then
span S is a subspace of V .

Proof. Let
k
X k
X
u= a j vj and w= bj vj
j=1 j=1

for some real numbers a1 , a2 , . . . , ak and b1 , b2 , . . . , bk . We have


k
X k
X k
X
u+w = a j vj + bj vj = (aj + bj ) vj .
j=1 j=1 j=1

Moreover, for any real number c,


k
! k
X X
cu = c a j vj = (caj ) vj .
j=1 j=1

Since the sum u + w and the scalar multiple cu are linear combinations of the vectors
in S, then span S is a subspace of V .

Definition 195. Let S be a set of vectors in a vector space V . If every vector in V
is a linear combination of the vectors in S, then the set S is said to span V , or V is
spanned (generated) by the set S; that is, span S = V .
Remark 196. If span S = V , S is called a spanning set of V . A vector space can
have many spanning sets.
Example 197. In R3 , let

v1 = (2, 1, 1) and v2 = (1, −1, 3).

Determine whether the vector
v = (1, 5, −7)
belongs to span{v1 , v2 }.
Solution: If we can find scalars a1 , a2 such that v = a1 v1 + a2 v2 , then v belongs to
span{v1 , v2 }. Substituting for v1 , v2 and v, we have

a1 (2, 1, 1) + a2 (1, −1, 3) = (1, 5, −7).

This expression corresponds to the linear system whose augmented matrix is (verify)

  2  1 |  1
  1 −1 |  5
  1  3 | −7

The reduced row echelon form of this matrix is (verify)

  1  0 |  2
  0  1 | −3
  0  0 |  0

which indicates that the linear system is consistent, with a1 = 2 and a2 = −3. Hence v
belongs to span{v1 , v2 }.
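Since the system in Example 197 has more equations than unknowns, a least-squares solve with a residual check decides membership in the span. A NumPy sketch (assuming NumPy is available):

```python
import numpy as np

v1 = np.array([2, 1, 1], dtype=float)
v2 = np.array([1, -1, 3], dtype=float)
v = np.array([1, 5, -7], dtype=float)

# Columns of A are v1 and v2; v is in span{v1, v2} iff A x = v has a solution.
A = np.column_stack([v1, v2])
coeffs, residual, *_ = np.linalg.lstsq(A, v, rcond=None)

in_span = np.allclose(A @ coeffs, v)  # exact solution <=> zero residual
print(coeffs, in_span)                # coefficients 2 and -3; True
```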
Example 198. In P2 , let

v1 = 2t2 + t + 2, v2 = t2 − 2t, v3 = 5t2 − 5t + 2, v4 = −t2 − 3t − 2.

Determine whether the vector


v = t2 + t + 2

belongs to span {v1 , v2 , v3 , v4 } .
Solution: If we can find scalars a1 , a2 , a3 , and a4 so that

a1 v1 + a2 v2 + a3 v3 + a4 v4 = v

then v belongs to span {v1 , v2 , v3 , v4 } . Substituting for v1 , v2 , v3 , and v4 , we have

a1 (2t2 + t + 2) + a2 (t2 − 2t) + a3 (5t2 − 5t + 2) + a4 (−t2 − 3t − 2) = t2 + t + 2,

or

(2a1 + a2 + 5a3 − a4 ) t2 + (a1 − 2a2 − 5a3 − 3a4 ) t + (2a1 + 2a3 − 2a4 ) = t2 + t + 2.

Now two polynomials agree for all values of t only if the coefficients of respective powers
of t agree. Thus we get the linear system

2a1 + a2 + 5a3 − a4 = 1
a1 − 2a2 − 5a3 − 3a4 = 1
2a1 + 2a3 − 2a4 = 2.

To determine whether this system of linear equations is consistent, we form the aug-
mented matrix and transform it to reduced row echelon form, obtaining (verify)
 
1 0 1 −1 0
 0 1 3 1 0 ,
0 0 0 0 1

which indicates that the system is inconsistent; that is, it has no solution. Hence v does
not belong to span{v1 , v2 , v3 , v4 }.
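The inconsistency found above is a rank condition: the system has a solution exactly when the coefficient matrix and the augmented matrix have the same rank. A small numerical check (a supplementary illustration assuming NumPy, not part of the notes):

```python
import numpy as np

# Coefficient matrix of the system in Example 198
# (rows: coefficients of t^2, t, and the constant term).
A = np.array([[2.0,  1.0,  5.0, -1.0],
              [1.0, -2.0, -5.0, -3.0],
              [2.0,  0.0,  2.0, -2.0]])
b = np.array([1.0, 1.0, 2.0])

# Consistent iff augmenting with b does not raise the rank.
rank_A = np.linalg.matrix_rank(A)
rank_Ab = np.linalg.matrix_rank(np.column_stack([A, b]))
consistent = (rank_A == rank_Ab)
```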

Remark 199. In general, to determine whether a specific vector v belongs to span S,


we investigate the consistency of an appropriate linear system.

Example 200. Let V be the vector space R3. Let

v1 = [1, 2, 1]T,  v2 = [1, 0, 2]T,  and  v3 = [1, 1, 0]T.

To find out whether v1, v2, v3 span V, we pick any vector v = [a, b, c]T in V (a, b, and
c are arbitrary real numbers) and determine whether there are constants a1, a2, and a3
such that

a1 v1 + a2 v2 + a3 v3 = v.

This leads to the linear system (verify)

a1 + a2 + a3 = a
2a1 + a3 = b
a1 + 2a2 = c.

A solution is (verify)

a1 = (−2a + 2b + c)/3,  a2 = (a − b + c)/3,  a3 = (4a − b − 2c)/3.

Thus v1, v2, v3 span V. This is equivalent to saying that span {v1, v2, v3} = R3.
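The spanning claim and the closed-form solution can both be checked numerically. The following sketch (a supplementary illustration assuming NumPy, not part of the notes) uses the fact that three vectors span R^3 exactly when the matrix having them as columns is invertible:

```python
import numpy as np

# Columns are v1, v2, v3 from Example 200.
A = np.array([[1.0, 1.0, 1.0],
              [2.0, 0.0, 1.0],
              [1.0, 2.0, 0.0]])

# Three vectors span R^3 exactly when this matrix is invertible.
spans_R3 = not np.isclose(np.linalg.det(A), 0.0)

# Spot-check the closed-form solution for one choice of (a, b, c).
a, b, c = 1.0, 2.0, 3.0
a1 = (-2*a + 2*b + c) / 3
a2 = (a - b + c) / 3
a3 = (4*a - b - 2*c) / 3
formulas_ok = bool(np.allclose(a1*A[:, 0] + a2*A[:, 1] + a3*A[:, 2], [a, b, c]))
```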
Example 201. Let V be the vector space P2. Let v1 = t^2 + 2t + 1 and v2 = t^2 + 2. Does
{v1, v2} span V ?
Solution: Let v = at^2 + bt + c be any vector in V, where a, b, and c are any real
numbers. We must find out whether there are constants a1 and a2 such that

a1 v1 + a2 v2 = v,

or

a1 (t^2 + 2t + 1) + a2 (t^2 + 2) = at^2 + bt + c.

Thus

(a1 + a2) t^2 + (2a1) t + (a1 + 2a2) = at^2 + bt + c.

Equating the coefficients of respective powers of t, we get the linear system

a1 + a2 = a
2a1 = b
a1 + 2a2 = c.

Transforming the augmented matrix of this linear system to reduced row echelon form, we obtain (verify)

[ 1  0 | 2a − c ]
[ 0  1 | c − a ]
[ 0  0 | b − 4a + 2c ].

If b − 4a + 2c ≠ 0, then the system is inconsistent and there is no solution. Since such
vectors v exist in V, {v1, v2} does not span V.
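The consistency condition b − 4a + 2c = 0 can be exercised directly; for instance (1, 2, 1) satisfies it while (1, 0, 0) does not. A supplementary sketch (assuming NumPy, not part of the notes):

```python
import numpy as np

# Coefficient matrix of the system in Example 201:
# a1 + a2 = a, 2a1 = b, a1 + 2a2 = c.
M = np.array([[1.0, 1.0],
              [2.0, 0.0],
              [1.0, 2.0]])

def in_span(a, b, c):
    """True when a*t^2 + b*t + c is a linear combination of v1 and v2."""
    rhs = np.array([a, b, c])
    return np.linalg.matrix_rank(M) == np.linalg.matrix_rank(np.column_stack([M, rhs]))

hit = in_span(1.0, 2.0, 1.0)   # b - 4a + 2c = 0: consistent
miss = in_span(1.0, 0.0, 0.0)  # b - 4a + 2c = -4: inconsistent
```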

6.3 Linear Independence


Definition 202. The vectors v1, v2, . . . , vk in a vector space V are said to be linearly
dependent if there exist constants a1, a2, . . . , ak, not all zero, such that

a1 v1 + a2 v2 + · · · + ak vk = 0.     (15)

Otherwise, v1, v2, . . . , vk are called linearly independent. That is,

v1, v2, . . . , vk are linearly independent if, whenever a1 v1 + a2 v2 + · · · + ak vk = 0,

a1 = a2 = · · · = ak = 0.

If S = {v1, v2, . . . , vk}, then we also say that the set S is linearly dependent or linearly
independent if the vectors have the corresponding property.

It should be emphasized that for any vectors v1, v2, . . . , vk, Equation (15) always holds
if we choose all the scalars a1, a2, . . . , ak equal to zero. The important point in this
definition is whether it is possible to satisfy (15) with at least one of the scalars different
from zero.

Example 203. Determine whether the vectors

v1 = [3, 2, 1]T,  v2 = [1, 2, 0]T,  v3 = [−1, 2, −1]T

are linearly independent.
Solution: Forming Equation (15),

a1 [3, 2, 1]T + a2 [1, 2, 0]T + a3 [−1, 2, −1]T = [0, 0, 0]T,

we obtain the homogeneous system (verify)

3a1 + a2 − a3 = 0
2a1 + 2a2 + 2a3 = 0
a1 − a3 = 0.

The corresponding augmented matrix is

[ 3  1  −1 | 0 ]
[ 2  2   2 | 0 ]
[ 1  0  −1 | 0 ],

whose reduced row echelon form is (verify)

[ 1  0  −1 | 0 ]
[ 0  1   2 | 0 ]
[ 0  0   0 | 0 ].

Thus there is a nontrivial solution

[k, −2k, k]T,  k ≠ 0 (verify),

so the vectors are linearly dependent.
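Both the dependence and the particular relation can be verified numerically. A supplementary sketch (assuming NumPy, not part of the notes):

```python
import numpy as np

# Columns are v1, v2, v3 from Example 203.
A = np.array([[3.0, 1.0, -1.0],
              [2.0, 2.0,  2.0],
              [1.0, 0.0, -1.0]])

# Dependent iff Ax = 0 has a nontrivial solution, i.e. rank(A) < 3.
dependent = np.linalg.matrix_rank(A) < 3

# The nontrivial solution (k, -2k, k) with k = 1:
relation_holds = bool(np.allclose(A @ np.array([1.0, -2.0, 1.0]), 0.0))
```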


   
Example 204. Are the vectors v1 = [1 0 1 2], v2 = [0 1 1 2], and v3 = [1 1 1 3]
in R4 linearly dependent or linearly independent?
Solution: We form Equation (15),

a1 v1 + a2 v2 + a3 v3 = 0,

and solve for a1, a2, and a3. The resulting homogeneous system is (verify)

a1 + a3 =0
a2 + a3 =0
a1 + a2 + a3 =0
2a1 + 2a2 + 3a3 = 0.

The corresponding augmented matrix is (verify)

[ 1  0  1 | 0 ]
[ 0  1  1 | 0 ]
[ 1  1  1 | 0 ]
[ 2  2  3 | 0 ]

and its reduced row echelon form is (verify)

[ 1  0  0 | 0 ]
[ 0  1  0 | 0 ]
[ 0  0  1 | 0 ]
[ 0  0  0 | 0 ].

Thus the only solution is the trivial solution a1 = a2 = a3 = 0, so the vectors are
linearly independent.
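Independence of these three vectors is equivalent to the matrix having full row rank, which is easy to confirm numerically. A supplementary sketch (assuming NumPy, not part of the notes):

```python
import numpy as np

# Rows are v1, v2, v3 from Example 204.
A = np.array([[1.0, 0.0, 1.0, 2.0],
              [0.0, 1.0, 1.0, 2.0],
              [1.0, 1.0, 1.0, 3.0]])

# Independent iff the rank equals the number of vectors.
independent = (np.linalg.matrix_rank(A) == 3)
```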

Example 205. Are the vectors

v1 = [ 2  1 ]    v2 = [ 1  2 ]    v3 = [  0  −3 ]
     [ 0  1 ],        [ 1  0 ],        [ −2   1 ]

in M22 linearly independent?
Solution: We form Equation (15),

a1 v1 + a2 v2 + a3 v3 = [ 0  0 ]
                        [ 0  0 ],

and solve for a1, a2, and a3. Performing the scalar multiplications and adding the
resulting matrices gives

[ 2a1 + a2     a1 + 2a2 − 3a3 ]   [ 0  0 ]
[ a2 − 2a3     a1 + a3        ] = [ 0  0 ].

Using the definition of equal matrices, we have the linear system

2a1 + a2 = 0
a1 + 2a2 − 3a3 = 0
a2 − 2a3 = 0
a1 + a3 = 0.

The corresponding augmented matrix is

[ 2  1   0 | 0 ]
[ 1  2  −3 | 0 ]
[ 0  1  −2 | 0 ]
[ 1  0   1 | 0 ],

and its reduced row echelon form is (verify)

[ 1  0   1 | 0 ]
[ 0  1  −2 | 0 ]
[ 0  0   0 | 0 ]
[ 0  0   0 | 0 ].

Thus there is a nontrivial solution

[−k, 2k, k]T,  k ≠ 0 (verify),

so the vectors are linearly dependent.
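The M22 example reduces to ordinary vectors once each matrix is flattened into R^4, and the dependence can then be checked numerically. A supplementary sketch (assuming NumPy; the row-by-row flattening is a choice made here for illustration):

```python
import numpy as np

# Each 2x2 matrix of the example flattened row by row into R^4;
# the flattened matrices are the columns of A.
A = np.array([[ 2.0, 1.0,  0.0],
              [ 1.0, 2.0, -3.0],
              [ 0.0, 1.0, -2.0],
              [ 1.0, 0.0,  1.0]])

dependent = np.linalg.matrix_rank(A) < 3

# The nontrivial solution (-k, 2k, k) with k = 1:
relation_holds = bool(np.allclose(A @ np.array([-1.0, 2.0, 1.0]), 0.0))
```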
Example 206. Are the vectors v1 = t^2 + t + 2, v2 = 2t^2 + t, and v3 = 3t^2 + 2t + 2 in
P2 linearly dependent or linearly independent?
Solution: Forming Equation (15), we have (verify)

a1 + 2a2 + 3a3 = 0
a1 + a2 + 2a3 = 0
2a1 + 2a3 = 0,

which has infinitely many solutions (verify). A particular solution is a1 = 1, a2 = 1,
a3 = −1, so v1 + v2 − v3 = 0. Hence the given vectors are linearly dependent.

Theorem 207. Let S = {v1, v2, . . . , vn} be a set of n vectors in Rn (or of n row
vectors with n entries). Let A be the matrix whose columns (rows) are the elements of
S. Then S is linearly independent if and only if det(A) ≠ 0.
     
Example 208. Is S = {[1 2 3], [0 1 2], [3 0 −1]} a linearly independent set of
vectors in R3 ?
Solution: We form the matrix A whose rows are the vectors in S :

A = [ 1  2   3 ]
    [ 0  1   2 ]
    [ 3  0  −1 ].

Since det(A) = 2 (verify), we conclude that S is linearly independent.
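The determinant criterion of Theorem 207 is a one-liner to confirm numerically. A supplementary sketch (assuming NumPy, not part of the notes):

```python
import numpy as np

# Rows are the vectors of S from Example 208.
A = np.array([[1.0, 2.0,  3.0],
              [0.0, 1.0,  2.0],
              [3.0, 0.0, -1.0]])

det_A = np.linalg.det(A)
# Nonzero determinant means the rows are linearly independent.
independent = not np.isclose(det_A, 0.0)
```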


Theorem 209. Let S1 and S2 be finite subsets of a vector space and let S1 be a subset
of S2 . Then the following statements are true:

a) If S1 is linearly dependent, so is S2 .
b) If S2 is linearly independent, so is S1 .

Summary 210. At this point, we have established the following results:

• The set S = {0} consisting only of 0 is linearly dependent, since, for example,
5 · 0 = 0 and 5 ≠ 0.

• From this it follows that if S is any set of vectors that contains 0, then S
must be linearly dependent.

• A set of vectors consisting of a single nonzero vector is linearly independent
(verify).

• If v1 , v2 , . . . , vk are vectors in a vector space V and any two of them are equal,
then v1 , v2 , . . . , vk are linearly dependent (verify).

Theorem 211. The nonzero vectors v1 , v2 , . . . , vn in a vector space V are linearly


dependent if and only if one of the vectors vj (j ≥ 2) is a linear combination of the
preceding vectors v1 , v2 , . . . , vj−1 .
   
Example 212. Let V = R3 and let v1 = [1 2 −1], v2 = [1 −2 1], v3 = [−3 2 −1],
and v4 = [2 0 0]. We find (verify) that

v1 + v2 + 0v3 − v4 = 0,

so v1, v2, v3, and v4 are linearly dependent. We then have

v4 = v1 + v2 + 0v3 .

Remark 213. 1) We observe that Theorem 211 does not say that every vector v is
a linear combination of the preceding vectors. Thus, in Example 212, we also have
v1 + 2v2 + v3 + 0v4 = 0. We cannot solve, in this equation, for v4 as a linear
combination of v1 , v2 , and v3 , since its coefficient is zero.

2) We can also prove that if S = {v1 , v2 , . . . , vk } is a set of vectors in a vector space


V, then S is linearly dependent if and only if one of the vectors in S is a linear
combination of all the other vectors in S. For instance, in Example 212
v1 = −v2 − 0v3 + v4   and   v2 = −(1/2) v1 − (1/2) v3 − 0v4 .

3) Observe that if v1 , v2 , . . . , vk are linearly independent vectors in a vector space,


then they must be distinct and nonzero.

Suppose that S = {v1 , v2 , . . . , vn } spans a vector space V, and vj is a linear combination


of the preceding vectors in S. Then the set

S1 = {v1 , v2 , . . . , vj−1 , vj+1 , . . . , vn }

consisting of S with vj deleted, also spans V. To show this result, observe that if v is
any vector in V, then, since S spans V, we can find scalars a1 , a2 , . . . , an such that

v = a1 v1 + a2 v2 + · · · + aj−1 vj−1 + aj vj + aj+1 vj+1 + · · · + an vn .

Now if
vj = b1 v1 + b2 v2 + · · · + bj−1 vj−1 ,
then
v = a1 v1 + a2 v2 + · · · + aj−1 vj−1 + aj (b1 v1 + b2 v2 + · · · + bj−1 vj−1)
      + aj+1 vj+1 + · · · + an vn
  = c1 v1 + c2 v2 + · · · + cj−1 vj−1 + cj+1 vj+1 + · · · + cn vn ,

where ci = ai + aj bi for 1 ≤ i ≤ j − 1 and ci = ai for j + 1 ≤ i ≤ n. This means that
span S1 = V.

Example 214. Consider the set of vectors S = {v1, v2, v3, v4} in R4 , where

v1 = [1, 1, 0, 0]T,  v2 = [1, 0, 1, 0]T,  v3 = [0, 1, 1, 0]T,  v4 = [2, 1, 1, 0]T,

and let W = span S. Since

v4 = v1 + v2 ,

we conclude that W = span S1 , where S1 = {v1 , v2 , v3 } .
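Deleting a redundant vector leaves the span (and hence the rank of the stacked vectors) unchanged, which is easy to confirm numerically. A supplementary sketch (assuming NumPy, not part of the notes):

```python
import numpy as np

v1 = np.array([1.0, 1.0, 0.0, 0.0])
v2 = np.array([1.0, 0.0, 1.0, 0.0])
v3 = np.array([0.0, 1.0, 1.0, 0.0])
v4 = np.array([2.0, 1.0, 1.0, 0.0])

# v4 is a combination of the earlier vectors ...
redundant = bool(np.allclose(v4, v1 + v2))

# ... so deleting it does not change the dimension of the span.
same_span = (np.linalg.matrix_rank(np.vstack([v1, v2, v3, v4]))
             == np.linalg.matrix_rank(np.vstack([v1, v2, v3])))
```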

7 Basis and Dimension
In this section we continue our study of the structure of a vector space V by determining
a set of vectors in V that completely describes V .

7.1 Basis
Definition 215. The vectors v1 , v2 , . . . , vk in a vector space V are said to form a basis
for V if
a) v1 , v2 , . . . , vk span V and
b) v1 , v2 , . . . , vk are linearly independent.

Remark 216. If v1, v2, . . . , vk form a basis for a vector space V , then they must be
distinct and nonzero.
     
Example 217. Let V = R3. The vectors [1, 0, 0]T, [0, 1, 0]T, and [0, 0, 1]T form a
basis for R3, called the natural basis or standard basis for R3. We can readily
see how to generalize this to obtain the natural basis for Rn. Similarly, the row vectors
[1 0 0], [0 1 0], and [0 0 1] form the natural basis for the vector space of 1 × 3 row
vectors.

The natural basis for Rn is denoted by {e1, e2, . . . , en}, where

ei = [0, . . . , 0, 1, 0, . . . , 0]T   (the 1 in the ith row);

that is, ei is an n × 1 matrix with a 1 in the ith row and zeros elsewhere. The natural
basis for R3 is also often denoted by

i = [1, 0, 0]T,  j = [0, 1, 0]T,  and  k = [0, 0, 1]T.

Example 218. Show that the set S = {t^2 + 1, t − 1, 2t + 2} is a basis for the vector
space P2.
Solution: To do this, we must show that S spans V = P2 and is linearly independent.
To show that it spans V, we take any vector in V, that is, a polynomial at^2 + bt + c, and
find constants a1, a2, and a3 such that

at^2 + bt + c = a1 (t^2 + 1) + a2 (t − 1) + a3 (2t + 2)
             = a1 t^2 + (a2 + 2a3) t + (a1 − a2 + 2a3).

Since two polynomials agree for all values of t only if the coefficients of respective powers
of t agree, we get the linear system

a1 = a
a2 + 2a3 = b
a1 − a2 + 2a3 = c.

Solving, we have

a1 = a,  a2 = (a + b − c)/2,  a3 = (b + c − a)/4.

Hence S spans V.
To illustrate this result, suppose that we are given the vector 2t^2 + 6t + 13. Here,
a = 2, b = 6, and c = 13. Substituting these values into the foregoing expressions for
a1, a2, and a3, we find that

a1 = 2,  a2 = −5/2,  a3 = 17/4.

Hence

2t^2 + 6t + 13 = 2(t^2 + 1) − (5/2)(t − 1) + (17/4)(2t + 2).
To show that S is linearly independent, we form

a1 (t^2 + 1) + a2 (t − 1) + a3 (2t + 2) = 0.

Then

a1 t^2 + (a2 + 2a3) t + (a1 − a2 + 2a3) = 0.

Again, this can hold for all values of t only if

a1 = 0
a2 + 2a3 = 0
a1 − a2 + 2a3 = 0.

The only solution to this homogeneous system is a1 = a2 = a3 = 0, which implies that
S is linearly independent. Thus S is a basis for P2 .
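Working in coordinates relative to the monomial basis (t^2, t, 1), the whole example becomes a matrix computation. A supplementary sketch (assuming NumPy, not part of the notes):

```python
import numpy as np

# Coordinates of t^2 + 1, t - 1, 2t + 2 in the monomial basis
# (t^2, t, 1), placed as the columns of A.
A = np.array([[1.0,  0.0, 0.0],
              [0.0,  1.0, 2.0],
              [1.0, -1.0, 2.0]])

# S is a basis of P2 exactly when these three coordinate vectors
# are independent, i.e. det(A) != 0.
is_basis = not np.isclose(np.linalg.det(A), 0.0)

# Coordinates of 2t^2 + 6t + 13 in the basis S, matching the
# worked values a1 = 2, a2 = -5/2, a3 = 17/4.
coords = np.linalg.solve(A, np.array([2.0, 6.0, 13.0]))
```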

446 Chapter 7 Eigenvalues and Eigenvectors

EXAMPLE 12. Compute the eigenvalues and associated eigenvectors of the matrix A
defined in Example 11.
Solution
In Example 11 we found the characteristic polynomial of A to be

p(λ) = λ^3 − 6λ^2 + 11λ − 6.

The possible integer roots of p(λ) are ±1, ±2, ±3, and ±6. By substituting these
values in p(λ), we find that p(1) = 0, so λ = 1 is a root of p(λ). Hence (λ − 1)
is a factor of p(λ). Dividing p(λ) by (λ − 1), we obtain

p(λ) = (λ − 1)(λ^2 − 5λ + 6)   (verify).

Factoring λ^2 − 5λ + 6, we have

p(λ) = (λ − 1)(λ − 2)(λ − 3).

The eigenvalues of A are then λ1 = 1, λ2 = 2, and λ3 = 3. To find an eigenvector
x1 associated with λ1 = 1, we substitute λ = 1 in (6) to get

[ 1 − 1    −2      1   ] [x1]   [0]
[  −1     1 − 0   −1   ] [x2] = [0]
[  −4      4     1 − 5 ] [x3]   [0].

The vector [−r/2, r/2, r]T is a solution for any number r. Thus

x1 = [−1, 1, 2]T

is an eigenvector of A associated with λ1 = 1 (r was taken as 2).

To find an eigenvector x2 associated with λ2 = 2, we substitute λ = 2 in (6),
obtaining

[ 2 − 1    −2      1   ] [x1]   [0]
[  −1     2 − 0   −1   ] [x2] = [0]
[  −4      4     2 − 5 ] [x3]   [0].

The vector [−r/2, r/4, r]T is a solution for any number r. Thus

x2 = [−2, 1, 4]T

is an eigenvector of A associated with λ2 = 2 (r was taken as 4).

To find an eigenvector x3 associated with λ3 = 3, we substitute λ = 3 in (6),
obtaining

[ 3 − 1    −2      1   ] [x1]   [0]
[  −1     3 − 0   −1   ] [x2] = [0]
[  −4      4     3 − 5 ] [x3]   [0].

The vector [−r/4, r/4, r]T is a solution for any number r. Thus

x3 = [−1, 1, 4]T

is an eigenvector of A associated with λ3 = 3 (r was taken as 4). •
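All three eigenpairs can be confirmed at once. The sketch below (supplementary, assuming NumPy; the matrix A is read off from the shifted matrices λI − A above, since Example 11 itself is not reproduced in this excerpt) checks Ax = λx for each pair:

```python
import numpy as np

# The matrix A of Example 11, reconstructed from the entries of
# lambda*I - A shown in the worked systems above (an assumption).
A = np.array([[1.0,  2.0, -1.0],
              [1.0,  0.0,  1.0],
              [4.0, -4.0,  5.0]])

eigenvalues = np.sort(np.linalg.eigvals(A).real)

# The eigenvectors computed by hand:
pairs = [(1.0, np.array([-1.0, 1.0, 2.0])),
         (2.0, np.array([-2.0, 1.0, 4.0])),
         (3.0, np.array([-1.0, 1.0, 4.0]))]
all_check = all(np.allclose(A @ x, lam * x) for lam, x in pairs)
```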
EXAMPLE 13. Compute the eigenvalues and associated eigenvectors of

A = [ 0  0   3 ]
    [ 1  0  −1 ]
    [ 0  1   3 ].

Solution
The characteristic polynomial of A is

p(λ) = det(λI3 − A) = det [  λ    0    −3   ]
                          [ −1    λ     1   ] = λ^3 − 3λ^2 + λ − 3
                          [  0   −1   λ − 3 ]

(verify). We find that λ = 3 is a root of p(λ). Dividing p(λ) by (λ − 3), we get
p(λ) = (λ − 3)(λ^2 + 1). The eigenvalues of A are then

λ1 = 3,  λ2 = i,  λ3 = −i.

To compute an eigenvector x1 associated with λ1 = 3, we substitute λ = 3 in (6),
obtaining

[ 3 − 0    0     −3   ] [x1]   [0]
[  −1    3 − 0    1   ] [x2] = [0]
[   0     −1    3 − 3 ] [x3]   [0].

We find that the vector [r, 0, r]T is a solution for any number r (verify). Letting
r = 1, we conclude that

x1 = [1, 0, 1]T

is an eigenvector of A associated with λ1 = 3. To obtain an eigenvector x2 associated
with λ2 = i, we substitute λ = i in (6), which yields

[ i − 0    0     −3   ] [x1]   [0]
[  −1    i − 0    1   ] [x2] = [0]
[   0     −1    i − 3 ] [x3]   [0].

We find that the vector [−3ir, (i − 3)r, r]T is a solution for any number r (verify).
Letting r = 1, we conclude that

x2 = [−3i, i − 3, 1]T

is an eigenvector of A associated with λ2 = i. Similarly, we find that

x3 = [3i, −i − 3, 1]T

is an eigenvector of A associated with λ3 = −i. •
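A numerical eigensolver reproduces the one real and two complex eigenvalues, and the hand-computed complex eigenvector can be checked directly. A supplementary sketch (assuming NumPy, not part of the original text):

```python
import numpy as np

# The matrix A of Example 13.
A = np.array([[0.0, 0.0,  3.0],
              [1.0, 0.0, -1.0],
              [0.0, 1.0,  3.0]])

eigs = np.linalg.eigvals(A)

# One real eigenvalue (3) and a complex-conjugate pair (i, -i).
has_3 = bool(np.any(np.isclose(eigs, 3.0)))
has_i = bool(np.any(np.isclose(eigs, 1j)))
has_minus_i = bool(np.any(np.isclose(eigs, -1j)))

# Check the hand-computed eigenvector for lambda = i.
x2 = np.array([-3.0j, -3.0 + 1.0j, 1.0])
x2_ok = bool(np.allclose(A @ x2, 1j * x2))
```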


EXAMPLE 14. Let L be the linear operator on P2 defined in Example 9. Using the
matrix B obtained there representing L with respect to the basis {t − 1, 1, t^2} for
P2, find the eigenvalues and associated eigenvectors of L.
Solution
The characteristic polynomial of the matrix B obtained in Example 9 is

p(λ) = λ(λ + 2)(λ + 1)

(verify), so the eigenvalues of L are λ1 = 0, λ2 = −2, and λ3 = −1. Associated
eigenvectors of B are (verify) the coordinate vectors

[0, 0, 1]T,  [1, 1, 0]T,  and  [0, 1, 0]T,

respectively. These are the coordinate vectors of the eigenvectors of L, so the
corresponding eigenvectors of L are

0(t − 1) + 0(1) + 1(t^2) = t^2,
1(t − 1) + 1(1) + 0(t^2) = t,

and

0(t − 1) + 1(1) + 0(t^2) = 1,

respectively. •
The procedure for finding the eigenvalues and associated eigenvectors of a matrix
is as follows:

Step 1. Determine the roots of the characteristic polynomial p(λ) = det(λIn − A).
These are the eigenvalues of A.

Step 2. For each eigenvalue λ, find all the nontrivial solutions to the homogeneous
system (λIn − A)x = 0. These are the eigenvectors of A associated with the
eigenvalue λ.

The characteristic polynomial of a matrix may have some complex roots, and
it may, as seen in Example 13, even have no real roots. However, in the important
case of symmetric matrices, all the roots of the characteristic polynomial are real.
We prove this in Section 7.3 (Theorem 7.6).

Eigenvalues and eigenvectors satisfy many important and interesting properties.
For example, if A is an upper (lower) triangular matrix, then the eigenvalues
of A are the elements on the main diagonal of A (Exercise 11). Other properties
are developed in the exercises for this section.

It must be pointed out that the method for finding the eigenvalues of a linear
transformation or matrix by obtaining the real roots of the characteristic polynomial
is not practical for n > 4, since it involves evaluating a determinant. Efficient
numerical methods for finding eigenvalues and associated eigenvectors are studied
in numerical analysis courses.

Warning: When finding the eigenvalues and associated eigenvectors of a matrix
A, do not make the common mistake of first transforming A to reduced row
echelon form B and then finding the eigenvalues and eigenvectors of B. To see
quickly how this approach fails, consider the matrix A defined in Example 10. Its
eigenvalues are λ1 = 2 and λ2 = 3. Since A is a nonsingular matrix, when we
transform it to reduced row echelon form B, we have B = I2. The eigenvalues of
I2 are λ1 = 1 and λ2 = 1.
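The warning is easy to demonstrate concretely. The sketch below (supplementary; the 2 × 2 matrix is a hypothetical stand-in with eigenvalues 2 and 3, since the Example 10 matrix is not reproduced in this excerpt) shows that row reduction destroys the eigenvalues:

```python
import numpy as np

# A hypothetical nonsingular matrix with eigenvalues 2 and 3.
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
eigs_A = np.sort(np.linalg.eigvals(A).real)

# Its reduced row echelon form is the identity I2, whose eigenvalues
# are both 1: row reduction does not preserve eigenvalues.
B = np.eye(2)
eigs_B = np.linalg.eigvals(B).real
```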

Key Terms
Eigenvalue, Eigenvector, Proper value, Characteristic value, Latent value,
Characteristic polynomial, Characteristic equation, Roots of the characteristic
polynomial
