
Course: Linear Algebra


Instructor: Ming-Xian Chang
Office: 92915 (9F)
E-mail: mxchang@ncku.edu.tw
Website: https://moodle.ncku.edu.tw/

Textbook:
“Elementary Linear Algebra,” 12th ed., H. Anton and C. Rorres.

(The content of this course comes from the above book and other
references.)

Reference:
1. “Linear Algebra,” Friedberg, Insel, Spence.

2. “Elementary Linear Algebra, a Matrix Approach,” Spence, Insel, Friedberg.
3. “Introduction to Linear Algebra,” Gilbert Strang.
4. “Linear Algebra and Its Applications,” David C. Lay.
5. “Linear Algebra with Applications,” Gareth Williams.
6. “Linear Algebra with Applications,” Steven J. Leon.
7. “Linear Algebra,” J. B. Fraleigh and R. A. Beauregard.
(YouTube contents)

Grading: Midterm exam, final exam, and quizzes (85%),
homework (15%)
■ Introduction

Algebra
1. Elementary algebra
2. Linear Algebra
3. Algebra (Modern algebra, Abstract algebra)
- Abstraction and generalization (Ex. Vector Spaces)
v + w, cv (vectors in R2 or R3)
f (t) + g(t), cf (t) (functions)
M + P, cM (matrices)

Notation
1. R: the set of real numbers.
2. C: the set of complex numbers.
3. A vector v is denoted by a boldface lowercase letter.
4. R2 and R3 denote the 2-D plane and the 3-D space, respectively.
v = [ 1 ]  ∈ R2 ,    u = [ 2 ]  ∈ R3
    [ 3 ]                [ 4 ]
                         [ 1 ]

5. A vector is denoted as a column vector. For a row vector, we use the
notation “T ” to denote “transpose”,

uT = [2 4 1]

6. We use the notation “H ” to denote the “conjugate transpose” or “Hermi-
tian” of a vector or a matrix. If w is a complex vector, we denote

wH = w̄T    (transpose together with complex conjugation)

For example,

    [ 1 + i2 ]
w = [ 4      ] ,    wH = [1 − i2  4  3]
    [ 3      ]

    [ 1 + i   2 − i3   0 ]         [ 1 − i    0   0 ]
M = [ 0       4        1 ] ,  MH = [ 2 + i3   4   0 ]
    [ 0       0        3 ]         [ 0        1   3 ]

Preview
1. Vectors in Rn (or Cn):

    [ a1 ]
    [ a2 ]
u = [ ..  ] ,    ak ∈ R (or C)    (an n-tuple)
    [ an ]

uT = [a1 a2 · · · an]

uH = ūT = [ā1 ā2 · · · ān]

2. Vectors and scalars: c1v1 + c2v2 + c3v3

Vector Space :
A vector space is a set of vectors, which will be defined later.

Inner product :
The inner product of two vectors x and y,
x · y,  ⟨x, y⟩,  ⟨x | y⟩
When we define an inner product in a vector space, we can use it
to define
1. the length (norm) of a vector ( ‖x‖2 = ⟨x, x⟩ ), and
2. the orthogonality between vectors ( ⟨x, y⟩ = 0 ).

Linear combination c 1 v 1 + c2 v 2 + · · · + c n v n

Linear transformation
v ∈ Rn ↦ w ∈ Rm

Contents
1. Systems of Linear Equations and Matrices (Chap. 1)
2. Determinants (Chap. 2)
3. Euclidean Vector Spaces (R2, R3, Rn) (Chap. 3)
4. General Vector Spaces (Chap. 4)
5. Eigenvalues and Eigenvectors (Chap. 5)
6. Inner Product Spaces (Chap. 6)
7. Diagonalization and Quadratic Forms (Chap. 7)
8. Linear Transformations (Chap. 8)
9. Additional Topics
(including Singular Value Decomposition and Jordan Forms.)
Homework :
Chap 1: 1.1): 8, 12 1.2): 38 1.3): 36 1.4): 42, 46 1.5): 31 1.6): 18, 24
1.7): 40(a), 47 1.8): 16, 45 1.9): 12

Chap 2: 2.1): 34 2.2): 32 2.3): 34


Chap 4: 4.1): 6,15 4.2): 12,24,28 4.3): 12 4.4): 6 4.5): 20 4.6): 26 4.7): 16
4.8): 34 4.9): 34
Chap 5: 5.1): 28 5.2): 34 5.3): 30
Chap 6: 6.1): 47 6.2): 37 6.3): 26 6.4): 30 6.5): (7)
Chap 7: 7.1): 8 7.2): 26 7.3): 32 7.4): 18 7.5): 34,43
Chap 8: 8.1): 32, 38 8.2): 40, 44 8.3): 4,6,8 8.4): 14 8.5): 6
Chap 9:

■ Linear Equations
Consider a system of m linear equations on n unknowns,

a11x1 + a12x2 + · · · + a1k xk + · · · + a1nxn = b1
a21x1 + a22x2 + · · · + a2k xk + · · · + a2nxn = b2
  ...                                                 (1)
am1x1 + am2x2 + · · · + amk xk + · · · + amnxn = bm

where a11, a12, a21, . . . , aℓk , . . . are constant coefficients.

    [ a11 ]      [ a12 ]              [ a1n ]   [ b1 ]
x1  [ a21 ] + x2 [ a22 ] + · · · + xn [ a2n ] = [ b2 ]
    [ ..  ]      [ ..  ]              [ ..  ]   [ .. ]
    [ am1 ]      [ am2 ]              [ amn ]   [ bm ]
which corresponds to a linear combination

x1a1 + x2a2 + · · · + xk ak + · · · + xnan = b

where the kth vector ak and b are defined by

     [ a1k ]        [ b1 ]
ak = [ a2k ]  , b = [ b2 ]      (∈ Rm)
     [ ..  ]        [ .. ]
     [ amk ]        [ bm ]
respectively.

Eq. (1) also corresponds to a transformation Ax = b

[ a11 a12 · · · a1n ] [ x1 ]   [ b1 ]
[ a21 a22 · · · a2n ] [ x2 ]   [ b2 ]
[  ..  ..  ...   ..  ] [ .. ] = [ .. ]
[ am1 am2 · · · amn ] [ xn ]   [ bm ]

    [ a11 a12 · · · a1n ]
A = [ a21 a22 · · · a2n ]
    [  ..  ..  ...   ..  ]
    [ am1 am2 · · · amn ]
A is an m × n matrix, x ∈ Rn, b ∈ Rm.


After multiplying by A, we transform x to b.
A system of linear equations (Ax = b) has either
1. no solutions, or
2. exactly one solution ( a unique solution), or
3. infinitely many solutions.

Suppose that Ax = b has more than one solution,

Ax1 = b
Ax2 = b

where x1 ≠ x2. Then

A( tx1 + (1 − t)x2 ) = tAx1 + (1 − t)Ax2 = b ,    ∀t ∈ R

and tx1 + (1 − t)x2 is also a solution.


For the system of linear equations in (1), Ax = b,


a11x1 + a12x2 + · · · + a1nxn = b1
a21x1 + a22x2 + · · · + a2nxn = b2
  ...
am1x1 + am2x2 + · · · + amnxn = bm

the augmented matrix is

          [ a11 a12 · · · a1n | b1 ]
          [ a21 a22 · · · a2n | b2 ]
[A | b] = [  ..  ..  ...   ..  | .. ]
          [ am1 am2 · · · amn | bm ]
We will apply elementary row operations to [A | b] to bring it into
a form that is easier to solve.

Elementary row operations on a matrix:


1. (Replacement) Add to one row a multiple of another row.
2. (Interchange) Interchange two rows.
3. (Scaling) Multiply all entries in a row by a non-zero constant.

Remarks
1. By taking elementary row operations, we do not affect the solutions
of Ax = b.
2. Each elementary row operation is reversible.
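A small sketch of the three elementary row operations in NumPy (the function names here are ours, not standard library calls), illustrating that each operation is reversible:

```python
import numpy as np

def replacement(A, i, j, c):
    """Add c times row j to row i (on a copy of A)."""
    B = A.astype(float).copy()
    B[i] += c * B[j]
    return B

def interchange(A, i, j):
    """Swap rows i and j."""
    B = A.copy()
    B[[i, j]] = B[[j, i]]
    return B

def scaling(A, i, c):
    """Multiply row i by a non-zero constant c."""
    B = A.astype(float).copy()
    B[i] *= c
    return B

A = np.array([[0., 3., -6.], [3., -9., 12.]])
# each operation is undone by a matching operation:
print(np.allclose(replacement(replacement(A, 0, 1, 5), 0, 1, -5), A))  # True
print(np.allclose(scaling(scaling(A, 0, 2.0), 0, 0.5), A))             # True
print(np.allclose(interchange(interchange(A, 0, 1), 0, 1), A))         # True
```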

[Figure: each elementary row operation is reversed by another one, e.g.
scaling a row by 2 is undone by scaling it by 1/2, and adding 3× a row
is undone by adding −3× that row.]
Definition A matrix is in (row) echelon form if it has
the following three properties: (Forward Gaussian elimination)

[ 1 ∗ ∗ ∗ ∗ ]        [ ■ ∗ ∗ ∗ ∗ ]
[ 0 1 ∗ ∗ ∗ ]   or   [ 0 ■ ∗ ∗ ∗ ]      (■ ≠ 0)
[ 0 0 0 1 ∗ ]        [ 0 0 0 ■ ∗ ]
[ 0 0 0 0 0 ]        [ 0 0 0 0 0 ]
1. If there are any rows that consist entirely of zeros, then they are
grouped together at the bottom of the matrix.
2. If a row does not consist entirely of zeros, then the first non-zero
number in the row is a 1. We call this a leading 1 (or pivot).
3. In any two successive non-zero rows, the leading 1 in the lower row
occurs farther to the right than the leading 1 in the higher row.

Definition A matrix is in reduced (row) echelon form
if it is in (row) echelon form and satisfies the following property.

4. Each column that contains a pivot (= 1) has zeros everywhere else.

Example A row echelon form (R1) and a reduced row echelon form (R2).

     [ 1 ∗ ∗ ∗ ∗ ]         [ 1 0 ∗ 0 ∗ ]
R1 = [ 0 1 ∗ ∗ ∗ ]    R2 = [ 0 1 ∗ 0 ∗ ]
     [ 0 0 0 1 ∗ ] ,       [ 0 0 0 1 ∗ ]
     [ 0 0 0 0 0 ]         [ 0 0 0 0 0 ]
After a sequence of row operations, we can make a matrix in a
row echelon form. A matrix may have many row echelon forms.
(For example, adding the 3rd row of R1 to the 2nd row of R1, we obtain
another matrix in row echelon form.)
     [ 1 ∗ ∗ ∗ ∗ ]
R1 = [ 0 1 ∗ ∗ ∗ ]
     [ 0 0 0 1 ∗ ]
     [ 0 0 0 0 0 ]

[ 1 2 3 5 7 ]      [ 1 2 3 5  7 ]
[ 0 1 4 6 8 ]      [ 0 1 4 7 17 ]
[ 0 0 0 1 9 ]  →   [ 0 0 0 1  9 ]
[ 0 0 0 0 0 ]      [ 0 0 0 0  0 ]

However, the reduced row echelon form of a matrix is unique.


(We will discuss this issue later.)

     [ 1 0 ∗ 0 ∗ ]
R2 = [ 0 1 ∗ 0 ∗ ]
     [ 0 0 0 1 ∗ ]
     [ 0 0 0 0 0 ]
• The pivots are always in the same positions in any echelon form
of A.
• We call these positions pivot positions.
• A pivot column is a column of A that contains a pivot position.
Example Reduced row-echelon forms:

[ 1 0 0  4 ]   [ 1 0 0 ]   [ 0 1 −2 0 1 ]
[ 0 1 0  7 ]   [ 0 1 0 ]   [ 0 0  0 1 3 ]   [ 0 0 ]
[ 0 0 1 −1 ] , [ 0 0 1 ] , [ 0 0  0 0 0 ] , [ 0 0 ]
                           [ 0 0  0 0 0 ]

Example Row-echelon forms:

[ 1 4 −3 7 ]   [ 1 1 0 ]   [ 0 1 2  6 0 ]
[ 0 1  6 2 ]   [ 0 1 0 ]   [ 0 0 1 −1 0 ]
[ 0 0  1 5 ] , [ 0 0 0 ] , [ 0 0 0  0 1 ]

We show in the following the process of reaching an echelon form
and a reduced echelon form.

          [ a11 a12 . . . a1n | b1 ]
          [ a21 a22 . . . a2n | b2 ]
[A | b] = [  ..  ..   ..   ..  | .. ]
          [ am1 am2 . . . amn | bm ]

1. If a11 ≠ 0, perform row operations to create zeros below a11
(add −(a21/a11) times row 1 to row 2, and so on).

[ a11 a12  . . . a1n  | b1  ]
[ 0   a′22 . . . a′2n | b′2 ]
[ ..   ..   ..    ..  | ..  ]
[ 0   a′m2 . . . a′mn | b′m ]

2. If a11 = 0, interchange the first row with another row (say, the kth
row) for which ak1 ≠ 0. Then perform the operations in step 1 on this
new matrix.

[ 0   . . . . . . | b1 ]
[ ..   ..    ..   | .. ]
[ ak1 . . . . . . | bk ]
[ ..   ..    ..   | .. ]

3. Apply the above steps to the following submatrix.

[ a′22 . . . a′2n | b′2 ]
[  ..   ..    ..  | ..  ]
[ a′m2 . . . a′mn | b′m ]

4. If all ak1’s are zeros,

          [ 0 a12 . . . a1n | b1 ]
          [ 0 a22 . . . a2n | b2 ]
[A | b] = [ ..  ..   ..   .. | .. ]
          [ 0 am2 . . . amn | bm ]

apply the above steps to the following submatrix.

[ a12 . . . a1n | b1 ]
[ a22 . . . a2n | b2 ]
[  ..   ..   .. | .. ]
[ am2 . . . amn | bm ]

Continuing these row operations, we obtain an echelon form of [A | b].

           [ ■ ∗ ∗ ∗ ∗ ]
           [ 0 ■ ∗ ∗ ∗ ]
[A | b] →  [ 0 0 0 ■ ∗ ]
           [ 0 0 0 0 0 ]

• In this step, we can determine whether the original equation Ax = b
is consistent or not.
• If a non-zero element of the last column (b-column) is a pivot, then
the original equation is inconsistent and no solution exists.

[ 0 . . . 0 | b′k ] ,    b′k ≠ 0

because this row corresponds to an equation

0x1 + 0x2 + · · · + 0xn = b′k

• Since elementary row operations do not change the solution(s)
of Ax = b, we conclude that it has no solutions.

Example Inconsistent equations.

x1 + 3x2 = 2
x1 + 3x2 = 5

which has the row-echelon form

[ 1 3 | 2 ]  →  [ 1 3 | 2 ]
[ 1 3 | 5 ]     [ 0 0 | 3 ]

corresponding to

0x1 + 0x2 = 3
On the other hand, if the equations are consistent, we can further
reduce the matrix to a reduced echelon form. Begin with the rightmost
pivot.

1. Make each pivot 1 by a scaling row operation.
2. Create zeros above each pivot.

Finally we obtain a reduced echelon form:

[ ■ ∗ ∗ ∗ ∗ ]    [ 1 ∗ ∗ ∗ ∗ ]    [ 1 ∗ ∗ 0 ∗ ]    [ 1 0 ∗ 0 ∗ ]
[ 0 ■ ∗ ∗ ∗ ]  → [ 0 1 ∗ ∗ ∗ ]  → [ 0 1 ∗ 0 ∗ ]  → [ 0 1 ∗ 0 ∗ ]
[ 0 0 0 ■ ∗ ]    [ 0 0 0 1 ∗ ]    [ 0 0 0 1 ∗ ]    [ 0 0 0 1 ∗ ]
[ 0 0 0 0 0 ]    [ 0 0 0 0 0 ]    [ 0 0 0 0 0 ]    [ 0 0 0 0 0 ]
Example Consider a system of linear equations
3x2 − 6x3 + 6x4 + 4x5 = −5
3x1 − 7x2 + 8x3 − 5x4 + 8x5 = 9
3x1 − 9x2 + 12x3 − 9x4 + 6x5 = 15
The augmented matrix is

          [ 0  3 −6  6 4 −5 ]
[A | b] = [ 3 −7  8 −5 8  9 ]
          [ 3 −9 12 −9 6 15 ]

          [ 0  3 −6  6 4 −5 ]      [ 3 −9 12 −9 6 15 ]
[A | b] = [ 3 −7  8 −5 8  9 ]  →   [ 3 −7  8 −5 8  9 ]
          [ 3 −9 12 −9 6 15 ]      [ 0  3 −6  6 4 −5 ]

    [ 3 −9 12 −9 6 15 ]      [ 3 −9 12 −9 6 15 ]
→   [ 0  2 −4  4 2 −6 ]  →   [ 0  2 −4  4 2 −6 ]    (pivots: 3, 2, 1)
    [ 0  3 −6  6 4 −5 ]      [ 0  0  0  0 1  4 ]

=⇒ Consistent.

    [ 3 −9 12 −9 0  −9 ]      [ 3 −9 12 −9 0  −9 ]
→   [ 0  2 −4  4 0 −14 ]  →   [ 0  1 −2  2 0  −7 ]
    [ 0  0  0  0 1   4 ]      [ 0  0  0  0 1   4 ]

    [ 3  0 −6  9 0 −72 ]      [ 1  0 −2  3 0 −24 ]
→   [ 0  1 −2  2 0  −7 ]  →   [ 0  1 −2  2 0  −7 ]
    [ 0  0  0  0 1   4 ]      [ 0  0  0  0 1   4 ]

We obtain the reduced row-echelon form.

[ 1 0 −2 3 0 | −24 ]
[ 0 1 −2 2 0 |  −7 ]
[ 0 0  0 0 1 |   4 ]

x1 + (−2)x3 + 3x4 = −24
x2 + (−2)x3 + 2x4 = −7
x5 = 4

Basic (leading) variables: x1, x2, x5 (at pivot positions).

Free variables: x3, x4.

Note that in this example, we have 3 equations in 5 unknowns.


• The number of basic variables = the number of (effective) equations.


• The number of free variables = the total number of variables − the
number of basic variables.

About effective equations,

x1 + 2x2 = 5         x1 + 2x2 = 5
3x1 − x2 = 1    ⇒    3x1 − x2 = 1
4x1 + x2 = 6

[ 1  2 5 ]     [ 1  2   5 ]     [ 1 2 5 ]
[ 3 −1 1 ]  →  [ 0 −7 −14 ]  →  [ 0 1 2 ]
[ 4  1 6 ]     [ 0 −7 −14 ]     [ 0 0 0 ]
x1 = −24 + 2x3 − 3x4
x2 = −7 + 2x3 − 2x4
x5 = 4
x3 = t1 ∈ R (free variable)
x4 = t2 ∈ R (free variable)

x1 = −24 + 2t1 − 3t2


x2 = −7 + 2t1 − 2t2
x3 = t1
x4 = t2
x5 = 4
t1, t2 ∈ R
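The whole computation above can be verified with SymPy, whose `Matrix.rref` returns the reduced row-echelon form together with the pivot columns (a sketch):

```python
import sympy as sp

M = sp.Matrix([[0, 3, -6,  6, 4, -5],
               [3, -7,  8, -5, 8,  9],
               [3, -9, 12, -9, 6, 15]])
R, pivots = M.rref()
print(R)        # rows: [1, 0, -2, 3, 0, -24], [0, 1, -2, 2, 0, -7], [0, 0, 0, 0, 1, 4]
print(pivots)   # (0, 1, 4)  ->  basic variables x1, x2, x5
```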

The solution set of Ax = b is

    [ x1 ]   [ −24 ]   [ 2 ]      [ −3 ]
    [ x2 ]   [  −7 ]   [ 2 ]      [ −2 ]
x = [ x3 ] = [   0 ] + [ 1 ] t1 + [  0 ] t2 ,    t1, t2 ∈ R
    [ x4 ]   [   0 ]   [ 0 ]      [  1 ]
    [ x5 ]   [   4 ]   [ 0 ]      [  0 ]

Note that

  [ −24 ]          [ 2 ]           [ −3 ]
  [  −7 ]          [ 2 ]           [ −2 ]
A [   0 ] = b ,  A [ 1 ] = 0 ,   A [  0 ] = 0
  [   0 ]          [ 0 ]           [  1 ]
  [   4 ]          [ 0 ]           [  0 ]

       [ −24 ]     [ 2 ]           [ −3 ]
       [  −7 ]     [ 2 ]           [ −2 ]
Ax = A [   0 ] + A [ 1 ] t1  +   A [  0 ] t2
       [   0 ]     [ 0 ]           [  1 ]
       [   4 ]     [ 0 ]           [  0 ]

   = b + 0 + 0
   = b

Summary of the process of solving Ax = b
1. Perform elementary row operations on the augmented matrix
[A | b] to make it into an echelon form.
2. Determine if this system is consistent. If it is inconsistent, no solu-
tions exist. Otherwise, further make it into a reduced echelon form.
3. If there are no free variables, we can immediately obtain the unique
solution of this system from the reduced echelon form.

[ 1 0 · · · 0 | b′1 ]
[ 0 1 · · · 0 | b′2 ]
[ ..  ..  ...  | ..  ]
[ 0 0 · · · 1 | b′n ]

4. If there are free variables, solve the reduced system of equations for
the basic variables in terms of free variables.
(In this case, there are infinitely many solutions.)

[ 1 0 −2 3 0 | −24 ]
[ 0 1 −2 2 0 |  −7 ]
[ 0 0  0 0 1 |   4 ]

Example

x + y + 2z = 9           [ 1 1  2 9 ]     [ 1 1   2    9 ]
3x + 6y − 5z = 0    ⇒    [ 3 6 −5 0 ]  ⇒  [ 0 3 −11  −27 ]
2x + 4y − 3z = 1         [ 2 4 −3 1 ]     [ 0 2  −7  −17 ]

   [ 1 1    2      9  ]     [ 1 1    2      9  ]
⇒  [ 0 3  −11    −27  ]  ⇒  [ 0 1 −7/2  −17/2  ]
   [ 0 1 −7/2  −17/2  ]     [ 0 0 −1/2   −3/2  ]

   [ 1 1    2     9  ]     [ 1 1 0 3 ]     [ 1 0 0 1 ]
⇒  [ 0 1 −7/2 −17/2  ]  ⇒  [ 0 1 0 2 ]  ⇒  [ 0 1 0 2 ]
   [ 0 0    1     3  ]     [ 0 0 1 3 ]     [ 0 0 1 3 ]

     [ x ]   [ 1 ]
⇒    [ y ] = [ 2 ]
     [ z ]   [ 3 ]
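For a square system with a unique solution like this one, the answer can also be checked numerically (a sketch with NumPy):

```python
import numpy as np

A = np.array([[1., 1.,  2.],
              [3., 6., -5.],
              [2., 4., -3.]])
b = np.array([9., 0., 1.])
print(np.linalg.solve(A, b))   # [1. 2. 3.]
```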

Note for Ax = b, A: m × n,

          [ 1 0 −2 3 0 | −24 ]
          [ 0 1 −2 2 0 |  −7 ]
[A | b] ~ [ 0 0  0 0 1 |   4 ]      (A : 4 × 5)
          [ 0 0  0 0 0 |   0 ]
(#: number)
(# of basic variables) + (# of free variables) = (# of variables) = n
(# of basic variables) = (# of effective equations) ≤ m

If n > m, then Ax = b must have free variables.


(The number of unknowns > the number of equations)

• Existence and uniqueness of solutions
Existence of solutions
1. The equation Ax = b has a solution (consistent) if and only if
the row echelon form of [A | b] does not have a row like
[0 . . . 0 | b′] ,    b′ ≠ 0

2. Since Ax = b can be written as

                 [ x1 ]
[a1 a2 · · · an] [ ..  ] = b
                 [ xn ]

or

x1a1 + x2a2 + · · · + xnan = b

we see that Ax = b has a solution if and only if


b is a linear combination of the columns of A.

Uniqueness of the solution


If the equations are consistent, and there are no free variables,
then the solution is unique.

• Homogeneous systems: Ax = 0

Ax = 0 always has a trivial solution: x = 0. (consistent)


For homogeneous equations, we do not have the following case:

[0 . . . 0 | b′] ,    b′ ≠ 0

Non-trivial solution: ∃ x ≠ 0 such that Ax = 0.

In the previous example, if we set b = 0, then

          [ 0  3 −6  6 4 | 0 ]
[A | 0] = [ 3 −7  8 −5 8 | 0 ]
          [ 3 −9 12 −9 6 | 0 ]

has the solutions
    [ x1 ]   [ 2 ]      [ −3 ]
    [ x2 ]   [ 2 ]      [ −2 ]
x = [ x3 ] = [ 1 ] t1 + [  0 ] t2 ,    t1, t2 ∈ R
    [ x4 ]   [ 0 ]      [  1 ]
    [ x5 ]   [ 0 ]      [  0 ]

with

  [ 2 ]           [ −3 ]
  [ 2 ]           [ −2 ]
A [ 1 ] = 0 ,   A [  0 ] = 0
  [ 0 ]           [  1 ]
  [ 0 ]           [  0 ]

Therefore, Ax = 0.


Therefore, if Ax = 0 has free variables, then it has non-trivial


solutions, since we can set each free variable as any (non-zero) value.
Ax = 0 has free variables ⇐⇒ Ax = 0 has non-trivial solutions

Conclusions:
Ax = 0 has free variables.
⇐⇒ Ax = 0 has non-trivial solutions.
⇐⇒ Ax = b has infinitely many solutions, if it is consistent.

Theorem Suppose Ax = b has a solution p. (Ap = b)


Then the solution set of Ax = b can be expressed as

{p + vh | Avh = 0}
(See the previous example.)


Proof :
Let S1 = {x | Ax = b} be the solution set of Ax = b, and
S2 = {p + vh | Avh = 0}, Ap = b.
1. For any p + vh ∈ S2,
A(p + vh) = Ap + Avh = b + 0 = b
so we have p + vh ∈ S1. Therefore, S2 ⊆ S1.
2. On the other hand, let w ∈ S1. Then Aw = b. Note
A(w − p) = b − b = 0
Since A(w − p) = 0, let vh = w − p. We have Avh = 0. and
w = p + vh ∈ S2. Therefore, S1 ⊆ S2.

Since S2 ⊆ S1 and S1 ⊆ S2, we conclude that S1 = S2.


• The equivalence relation
Examples of relations :
1. 5 > 3, −1 < 2
2. {1, 2, 3} ⊂ {1, 2, 3, 4, 5, 6},  {2, 3, 9} ⊄ {1, 2, 3, 4}
3. Two triangles are similar.
4. Relatives; Classmates; Friends.

We say that a relation (∼) is an equivalence relation if


1. A ∼ A. (Reflexivity)
2. If A ∼ B, then B ∼ A. (Symmetry)
3. If A ∼ B, B ∼ C then A ∼ C. (Transitivity)

Example (equivalence relations)
1. A triangle A is similar to a triangle B.
2. A person A is a relative of a person B.

Definition For two matrices A, B, we say A is row equivalent to B


if we can transform A into B by a sequence of elementary row opera-
tions.

A ∼ A1 ∼ A2 ∼ · · · ∼ B

A ∼ A1 ∼ A2 ∼ · · · ∼ B

“Row equivalent to” is also an equivalence relation.


For matrices A, B, and C, we have the following results.
1. A ∼ A. ( ∼ : “row equivalent to” )
2. If A ∼ B, then B ∼ A.
3. If A ∼ B, B ∼ C then A ∼ C.


Note that
1. A matrix A is row equivalent to all its row echelon forms.
2. A matrix A is row equivalent to its reduced row echelon form.
3. All the row echelon forms of A are row equivalent.

A ∼ R1 , A ∼ R2 , · · ·

R1 ∼ R2, R2 ∼ R3 · · ·
Recall that a matrix can have many row echelon forms.

     [ 1 ∗ ∗ ∗ ∗ ]
R1 = [ 0 1 ∗ ∗ ∗ ]
     [ 0 0 0 1 ∗ ]
     [ 0 0 0 0 0 ]

However, the reduced row echelon form of a matrix is unique.

     [ 1 0 ∗ 0 ∗ ]
R2 = [ 0 1 ∗ 0 ∗ ]
     [ 0 0 0 1 ∗ ]
     [ 0 0 0 0 0 ]

To see that the reduced row-echelon form of a matrix is unique, notice


the following facts.
• Let C1 and C2 be two matrices in reduced row echelon form. If
C1 ≠ C2, then it is impossible that C1 ∼ C2. (∼: row equivalent to)
• For a matrix C1 that is in reduced row echelon form, you cannot use
elementary row operations to transform C1 into another reduced row
echelon form.

     [ 1 0 ∗ 0 ∗ ]         [ 1 0 ∗ ∗ 0 ]
C1 = [ 0 1 ∗ 0 ∗ ]    C2 = [ 0 1 ∗ ∗ 0 ]
     [ 0 0 0 1 ∗ ] ,       [ 0 0 0 0 1 ]
     [ 0 0 0 0 0 ]         [ 0 0 0 0 0 ]

If a matrix A has two reduced row echelon forms C1 and C2, then
A ∼ C1 and A ∼ C2

which implies that C1 ∼ C2, and this causes a contradiction if C1 ≠ C2.

Therefore, we conclude that the reduced row echelon form of a matrix


is unique.

(Equivalence relation)
1. A ∼ A. (Reflexivity)
2. If A ∼ B, then B ∼ A. (Symmetry)
3. If A ∼ B, B ∼ C then A ∼ C. (Transitivity)
Other examples : Set inclusion (A ⊂ B); the finger-guessing game.


A system of linear equations Ax = b has either


1. no solutions, or
2. exactly one solution ( a unique solution), or
3. infinitely many solutions.

When there is no solution, one may want to find an x̃ such that
Ax̃ is nearest to b.

When there are infinitely many solutions, one may want to find
a solution x̃ that has the minimum “length”.

We will discuss the above issues later.


■ Matrices

Compared with a vector v in Rn, an m × n matrix A is a


two-dimensional (2D) array.

    [ a11 · · · a1j · · · a1n ]
    [  ..        ..        ..  ]
A = [ ai1 · · · aij · · · ain ]
    [  ..        ..        ..  ]
    [ am1 · · · amj · · · amn ]
An m × n matrix A is related to a linear transformation from Rn to Rm.


y = Ax, x ∈ Rn , y ∈ Rm


For an m × n matrix A, with m rows and n columns,

    [ a11 · · · a1j · · · a1n ]
    [  ..        ..        ..  ]
A = [ ai1 · · · aij · · · ain ] = [a1 a2 · · · an]
    [  ..        ..        ..  ]
    [ am1 · · · amj · · · amn ]

where aij is the (i, j)-entry of A, [A]ij = aij ∈ R or C, and

     [ a1j ]
aj = [ a2j ]  ∈ Rm or Cm    is the jth column of A.
     [  ..  ]
     [ amj ]
Remarks
1. When m = n, A is a square matrix of order m.

2. The diagonal entries of A are a11, a22, · · · .

3. An m × n zero matrix

    [ 0 · · · 0 ]
O = [ ..  ...  .. ]
    [ 0 · · · 0 ]

is an m × n matrix whose entries are all zeros.

4. A diagonal matrix is a square matrix whose entries are all zero except
possibly the diagonal entries.

    [ d11           O  ]
D = [      d22         ]
    [           ...     ]
    [ O           dnn  ]
The identity matrix

     [ 1          O ]
In = [    1         ] = [e1 e2 · · · en]
     [      ...      ]
     [ O          1 ]

is a diagonal matrix of order n, where

     [ 1 ]        [ 0 ]                [ 0 ]
e1 = [ 0 ] , e2 = [ 1 ] , · · · , en = [ .. ]
     [ .. ]       [ .. ]               [ 0 ]
     [ 0 ]        [ 0 ]                [ 1 ]

Sometimes we write only I if no ambiguity exists.

We now consider the definitions of the multiplication of


1. a matrix A and a vector v,
2. two matrices A and B,
with proper dimensions.

They are related to the linear transformation from Rn to Rm.

[Diagram: the transformation T maps v ∈ Rn to T (v) ∈ Rm.]
Consider a transformation T from Rn to Rm,
T : Rn �→ Rm : v �→ T (v)
which maps v to T (v). The domain of T is Rn and the codomain is
Rm .

A transformation T is called linear if it satisfies


1. T (u + v) = T (u) + T (v)
2. T (cu) = cT (u)
for all u, v in Rn and c in R.

The above is the superposition principle.


The standard basis vectors for Rn are

1 0 0
     
     
0    

0 0 1
e1 =   , e 2 =  1  , · · · , en =  0 
 ..   ..   .. 
     

For any vector v ∈ Rn, we can express it as


v
 
 1
 v2 

vn
v= 
 ..  = v1e1 + v2e2 + · · · + vnen
 

62
Then for a linear transformation T : Rn ↦ Rm, we have

T (v) = T (v1e1 + v2e2 + · · · + vnen)
      = v1T (e1) + v2T (e2) + · · · + vnT (en)

In the above, T (e1), . . . , T (en) are vectors in Rm.

                                      [ v1 ]
T (v) = [T (e1) T (e2) · · · T (en)]  [ v2 ]
                                      [ ..  ]
                                      [ vn ]

The matrix [T (e1) T (e2) · · · T (en)] is of size m × n.
Therefore, we define the multiplication of a matrix A and a vector
v as

                      [ v1 ]
Av = [a1 a2 · · · an] [ v2 ] = v1a1 + v2a2 + · · · + vnan
                      [ ..  ]
                      [ vn ]

Every linear transformation T from Rn to Rm is associated with an
m × n matrix A = [T (e1) T (e2) · · · T (en)], and T (v) = Av.
Now we consider the definition of matrix multiplication, which
corresponds to the composition of two linear transformations.

Consider two linear transformations

T : Rp → Rn     y = Bx,    B : n × p
S : Rn → Rm     z = Ay,    A : m × n

where x ∈ Rp, y ∈ Rn, z ∈ Rm.

[Diagram: x ∈ Rp ↦ y = Bx ∈ Rn ↦ z = Ay ∈ Rm; the composition is S ∘ T.]

Then the composition of S and T is

S ∘ T : Rp → Rm     z = ABx

in which we have AB, the product of the two matrices A and B.
Now for the m × n matrix A and n × p matrix B,
A = [a1 a2 · · · an] = [aij ] , ak ∈ R m

B = [b1 b2 · · · bp] = [bij ] , bk ∈ Rn

x = (x1, x2, . . . , xp)T

Bx = x1b1 + · · · + xpbp (2)

ABx = A(Bx) = A(x1b1 + · · · + xpbp)


= A(x1b1) + · · · + A(xpbp) (3)
= x1(Ab1) + · · · + xp(Abp)

Compared with (2), the kth column of AB should be defined as Abk .



AB = [Ab1 Ab2 · · · Abp]

and the (i, j)th entry of AB is the ith entry of Abj ,


[AB]ij = [Abj ]i
Since

                    [ b1j ]
Abj = [a1 · · · an] [  ..  ] = b1j a1 + b2j a2 + · · · + bnj an
                    [ bnj ]

we have

[AB]ij = [Abj ]i = ai1b1j + ai2b2j + · · · + ainbnj = Σ_{k=1}^{n} aik bkj

                                            [ b1j ]
[AB]ij = Σ_{k=1}^{n} aik bkj = [ ai1 · · · ain ] [  ..  ]
                                            [ bnj ]
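A direct sketch of this entry formula: a naive triple-loop implementation of the row-column rule (NumPy's built-in `A @ B` is the practical equivalent):

```python
import numpy as np

def matmul(A, B):
    """[AB]_ij = sum_k A_ik * B_kj  (row-column rule)."""
    m, n = A.shape
    n2, p = B.shape
    assert n == n2, "inner dimensions must agree"
    C = np.zeros((m, p))
    for i in range(m):
        for j in range(p):
            for k in range(n):
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.arange(6.).reshape(2, 3)
B = np.arange(12.).reshape(3, 4)
print(np.allclose(matmul(A, B), A @ B))   # True
```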
Remarks
1. A = B if size(A) = size(B) = (m, n), and
[A]ij = [B]ij , 1 ≤ i ≤ m, 1 ≤ j ≤ n

2. When A and B are of the same sizes, we define A + B, the sum of


A and B, by
[A + B]ij = [A]ij + [B]ij

3. If r is a scalar and A is a matrix, the scalar multiple rA is defined


by
[rA]ij = r[A]ij , 1 ≤ i ≤ m, 1 ≤ j ≤ n

4. A − B = A + (−B).

The algebraic properties of matrices,
1. A + B = B + A
2. (A + B) + C = A + (B + C)
3. A + O = A, A + (−A) = O
4. r(A + B) = rA + rB
5. (r + s)A = rA + sA
6. r(sA) = (rs)A, 1A = A
where A, B, and C are of the same sizes, and
r, s are real or complex numbers.

In fact, the set of all m × n matrices satisfies the definitions of


a vector space, which we will study later.
Theorem Let A be an m × n matrix, and B, C, D have sizes for
which the indicated sums and products are defined.
1. A(BC) = (AB)C Thus we can write ABC without ambiguity.
2. A(B + C) = AB + AC (left distributive law)
3. (B + C)D = BD + CD (right distributive law)
4. r(AB) = (rA)B for any scalar.
Thus we can write rAB without ambiguity.
5. ImA = A = AIn

Note for two square matrices A and B, in general, AB ≠ BA.

                               [ b1j ]
[AB]ij = [ ai1 · · · ain ]     [  ..  ]
                               [ bnj ]

                               [ a1j ]
[BA]ij = [ bi1 · · · bin ]     [  ..  ]
                               [ anj ]

Example (Commutative Property)

1. For the above matrices, we have A + B = B + A.
2. For any a, b ∈ R (or C), we have ab = ba, a + b = b + a.

In addition,

AB = O  ⇏  A = O or B = O

Cf. for any a, b ∈ R (or C), we have ab = 0 ⇒ a = 0 or b = 0.

For example, with

A = [ 1 1 ] ,    B = [  1  1 ]
    [ 1 1 ]          [ −1 −1 ]

we have

AB = [ 0 0 ] = O
     [ 0 0 ]

In this example, A ≠ O and B ≠ O, but AB = O.

For any a, b, c ∈ R or C, we have ab = ac ⇒ b = c if a ≠ 0.

However, if AB = AC, we have A(B − C) = O, but by the above result,
this does not imply that B − C = O or B = C, even if A ≠ O.

AB = AC  ⇏  B = C

Cf.
A+B =A+C ⇒ B =C

However, when A is an invertible matrix,


AB = AC ⇒ B = C
• Powers of a square matrix

If A is a square matrix, we define

Ak = A · A · · · A    (k factors),    k = 1, 2, 3, . . .

We define A0 = I, if A ≠ O.

Note that

(A + B)2 = (A + B)(A + B) = A2 + AB + BA + B 2
(A + B)3 = (A + B)(A2 + AB + BA + B 2) = · · ·

How to efficiently calculate Ak , the powers of a square matrix? We


will discuss this issue later.

• The transpose of a matrix

Given an m × n matrix A, the transpose of A is the n × m matrix,


denoted by AT .
1. The rows of AT are formed by the corresponding columns of A.
2. [AT ]ij = [A]ji

[Diagram: A maps u ∈ Rn to Au ∈ Rm ; AT maps v ∈ Rm to AT v ∈ Rn.]

    [ a11 · · · a1j · · · a1n ]
    [  ..        ..        ..  ]
A = [ ai1 · · · aij · · · ain ] = [a1 a2 · · · an]
    [  ..        ..        ..  ]
    [ am1 · · · amj · · · amn ]

     [ a11 a21 · · · am1 ]   [ aT1 ]
     [ a12 a22 · · · am2 ]   [ aT2 ]
AT = [  ..  ..         ..  ] = [  ..  ]
     [ a1n a2n · · · amn ]   [ aTn ]

Theorem Let A and B denote matrices whose sizes are appropriate


for the following sums and products.
1. (AT )T = A
2. (A + B)T = AT + B T
3. (rA)T = rAT for any scalar r.
4. (AB)T = B T AT
Proof of (4)
Consider the (i, j)-entry of (AB)T and B T AT :
[(AB)T ]ij is the jth row of A times the ith column of B, while
[B T AT ]ij is the ith row of B T times the jth column of AT .

[(AB)T ]ij = [AB]ji = aj1b1i + aj2b2i + · · · + ajnbni
           = b1iaj1 + b2iaj2 + · · · + bniajn = [B T AT ]ij
We further have
(ABCD)T = DT C T B T AT

since (ABCD)T = DT (ABC)T = DT C T (AB)T = DT C T B T AT

Recall the definition of the Hermitian (or Hermitian transpose) of a
vector,

uH = ūT = [ā1 ā2 · · · ān]

and the Hermitian of a matrix,

           [ āT1 ]
AH = ĀT =  [ āT2 ]
           [  ..  ]
           [ āTn ]
• The inverse of a square matrix

An n × n square matrix A is said to be invertible if there is an


n × n matrix C such that
CA = In and AC = In (4)
The matrix C in (4) is unique. Since if there is another matrix B such
that
BA = I and AB = I
then
C = CI = C(AB) = (CA)B = IB = B

Therefore, the inverse of a matrix is unique, and we can denote C in (4)


by A−1,
AA−1 = A−1A = I

Theorem If A is an invertible n × n matrix, then for each b in Rn,
the equation Ax = b has a unique solution x = A−1b.

We will extend the inverse of square matrices to a general m×n matrix


A. Then for a general linear equation Ax = b, we have a proper solution
x̂ = A+b
where A+ denotes the pseudo-inverse of A.

If a square matrix A is not invertible, then A is said to be singular.


( invertible ⇔ non-singular )

Theorem If R is the reduced row-echelon form of a square matrix A of
size n, then either R is the identity matrix In, or R has a row of zeros.

    [ 1 0 0 0 ]      [ 1 0 ∗ 0 ]
I = [ 0 1 0 0 ]      [ 0 1 ∗ 0 ]
    [ 0 0 1 0 ] ,    [ 0 0 0 1 ]
    [ 0 0 0 1 ]      [ 0 0 0 0 ]
Theorem Assume A and B are invertible matrices.
1. (A−1)−1 = A (So if C = A−1, then C −1 = A)
∵ AC = CA = I

2. (AB)−1 = B −1A−1 (thus AB is invertible)


∵ (AB)(B −1A−1) = (B −1A−1)(AB) = I

3. (AT )−1 = (A−1)T (thus AT is invertible) (can be written as A−T )


Proof :
AA−1 = A−1A = I
(AA−1)T = (A−1A)T = I T = I
(A−1)T AT = AT (A−1)T = I
⇒ (AT )−1 = (A−1)T
Furthermore, if A, B, C, D are invertible, then
(ABCD)−1 = D−1C −1B −1A−1
Since
(ABCD)(D−1C −1B −1A−1) = I

(D−1C −1B −1A−1)(ABCD) = I

Cf.
(ABCD)T = DT C T B T AT

From the above, since
(ABCD · · · )−1 = · · · D−1C −1B −1A−1
we have (let A = B = C · · · )
(An)−1 = (A−1)n = A−n

We also note
(cA)−1 = c−1A−1
Recall that a linear transformation T from Rn to Rm can be represented
by a matrix
A = [T (e1) T (e2) · · · T (en)]
and we can write T (x) = Ax.

On the other hand, given an m × n matrix A, we can use A to define a
linear transformation TA : Rn → Rm as TA(x) = Ax. We also call this
the matrix transformation.
For a square matrix A of size n, TA is a linear operator on Rn. If A is
invertible, then the inverse of TA is

(TA)−1 = TA−1


TA : x ↦ Ax

TA−1 : Ax ↦ x

Let y = Ax. Then x = A−1y. Therefore, we have

TA−1 : y ↦ A−1y

and we see that

(TA)−1 = TA−1

[Diagram: TA maps x to y = Ax; (TA)−1 maps y back to x = A−1y.]
Example Consider the linear transformation T on R3 defined as

    [ a ]     [ a + b ]
T ( [ b ] ) = [ b + c ]
    [ c ]     [ c + a ]

then the standard matrix of T is

                              [ 1 1 0 ]
A = [T (e1) T (e2) T (e3)] =  [ 0 1 1 ]
                              [ 1 0 1 ]

Note

  [ a ]   [ 1 1 0 ] [ a ]   [ a + b ]       [ a ]
A [ b ] = [ 0 1 1 ] [ b ] = [ b + c ] = T ( [ b ] )
  [ c ]   [ 1 0 1 ] [ c ]   [ c + a ]       [ c ]
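A sketch of building the standard matrix column by column from the images T (ek) (NumPy; the function `T` below is our implementation of this example's transformation):

```python
import numpy as np

def T(v):
    a, b, c = v
    return np.array([a + b, b + c, c + a])

# standard matrix: columns are T(e1), T(e2), T(e3)
A = np.column_stack([T(e) for e in np.eye(3)])
print(A)                          # [[1. 1. 0.], [0. 1. 1.], [1. 0. 1.]]
v = np.array([1., 2., 3.])
print(np.allclose(A @ v, T(v)))   # True
```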
• Elementary matrices
Recall the three elementary row operations on a matrix.
1. (Replacement) Add to one row a multiple of anther row.
2. (Interchange) Interchange two rows.
3. (Scaling) Multiply all entries in a row by a non-zero constant.


The three elementary row operations are reversible.


− By performing an elementary row operation on an identity matrix In,


we obtain an elementary matrix.
− Since there are three types of elementary row operations, we have
three types of elementary matrices.

1. (Replacement) (n = 3)

    [ 1 0 0 ]         [ 1 0 0 ]           [  1 0 0 ]
I = [ 0 1 0 ] ,  E1 = [ 0 1 0 ] ,  E1−1 = [  0 1 0 ]
    [ 0 0 1 ]         [ 5 0 1 ]           [ −5 0 1 ]

2. (Interchange)

    [ 1 0 0 ]         [ 1 0 0 ]           [ 1 0 0 ]
I = [ 0 1 0 ] ,  E2 = [ 0 0 1 ] ,  E2−1 = [ 0 0 1 ]
    [ 0 0 1 ]         [ 0 1 0 ]           [ 0 1 0 ]

3. (Scaling) Assume c ≠ 0,

    [ 1 0 0 ]         [ 1 0 0 ]           [ 1  0   0 ]
I = [ 0 1 0 ] ,  E3 = [ 0 c 0 ] ,  E3−1 = [ 0 c−1  0 ]
    [ 0 0 1 ]         [ 0 0 1 ]           [ 0  0   1 ]

Fact If we perform an elementary row operation on an m × n matrix
A, the resulting matrix can be written as EA,

A −→ EA
where the elementary matrix E is created by performing the same row
operation on Im.
Im −→ E


     [ 1 0 0 ]        [ 1 0 0 ]        [ 1 0 0 ]        [ a11 a12 a13 ]
E1 = [ 0 1 0 ] , E2 = [ 0 0 1 ] , E3 = [ 0 c 0 ] ,  A = [ a21 a22 a23 ]
     [ 5 0 1 ]        [ 0 1 0 ]        [ 0 0 1 ]        [ a31 a32 a33 ]

      [ a11          a12          a13         ]
E1A = [ a21          a22          a23         ]
      [ a31 + 5a11   a32 + 5a12   a33 + 5a13  ]

      [ a11 a12 a13 ]          [ a11   a12   a13  ]
E2A = [ a31 a32 a33 ] ,  E3A = [ ca21  ca22  ca23 ]
      [ a21 a22 a23 ]          [ a31   a32   a33  ]

Recall that every elementary row operation is reversible.


⇒ Every elementary matrix is invertible.
(See the examples of E1, E2, E3, and E1−1, E2−1, E3−1.)

Theorem (Equivalent Statements of Matrix Inversion)
If A is an n × n matrix, then the following statements are equivalent.
a. A is invertible.
b. Ax = 0 has only the trivial solution. (x = 0)
c. The reduced row-echelon form of A is In. ( A ∼ In)
d. A is expressible as a product of elementary matrices.

Proof : ( (a)⇒(b)⇒(c)⇒(d)⇒(a) )

(a)⇒(b) : A−1Ax = A−10 ⇒ x = 0


(b)⇒(c) : [A | 0] ∼ [In | 0]

(If the reduced row-echelon form of A were not In, it would have a row
of zeros, hence a free variable, and Ax = 0 would then have non-trivial
solutions, contradicting (b).)
(c)⇒(d) :

A ∼ In
Eq Eq−1 · · · E1 A = In
Eq−1 · · · E1 A = (Eq)−1
Eq−2 · · · E1 A = (Eq−1)−1 (Eq)−1
  ...
A = (E1)−1 · · · (Eq−1)−1 (Eq)−1

(d)⇒(a) : Since A = (E1)−1 · · · (Eq−1)−1 (Eq)−1, we have

A−1 = ( (E1)−1 (E2)−1 · · · (Eq)−1 )−1 = Eq · · · E2 E1
• An algorithm for finding A−1

Consider the augmented matrix [A I]. If A is invertible, then A is


row equivalent to I. Perform a sequence of elementary row operations
on [A I] to transform A into an identity matrix I,

Eq · · · E2E1 [A I] = [I (Eq · · · E2E1)] = [I A−1]

(See Example 4 in Section 1.5 of textbook.)
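A minimal sketch of this algorithm (Gauss–Jordan elimination on [A I], with a simple pivot interchange added for numerical safety; this is not the textbook's code):

```python
import numpy as np

def inverse_gauss_jordan(A):
    """Row-reduce [A | I] to [I | A^{-1}]."""
    n = A.shape[0]
    M = np.hstack([A.astype(float), np.eye(n)])
    for k in range(n):
        p = k + np.argmax(np.abs(M[k:, k]))   # interchange: pick a non-zero pivot
        M[[k, p]] = M[[p, k]]
        M[k] /= M[k, k]                       # scaling: make the pivot 1
        for i in range(n):
            if i != k:
                M[i] -= M[i, k] * M[k]        # replacement: clear the column
    return M[:, n:]

A = np.array([[1., 2.], [3., 4.]])
print(np.allclose(inverse_gauss_jordan(A), np.linalg.inv(A)))  # True
```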



Theorem Let A and B be two n × n matrices.


1. If B satisfies BA = I, then B = A−1.
2. If B satisfies AB = I, then B = A−1.

By this theorem, we have


BA = I ⇔ AB = I ⇔ B = A−1

Proof :
1. BA = I
(a) First we prove that A is invertible.
(b) We can show that Ax = 0 has only the trivial solution.
Multiplying both sides by B, we have BAx = B0, or x = 0,
the trivial solution.
By the previous theorem (equivalent statements of matrix inver-
sion), we see that A is invertible and A−1 exists.
(c) Now since A is invertible and
BA = I
Multiply both sides of the above by A−1 from the right, we have
B = A−1. By the previous results, B −1 = (A−1)−1 = A.

2. AB = I : By 1. (with the roles of A and B exchanged), B is invertible
with B −1 = A, and hence A−1 = B.

Theorem (Equivalent Statements of Matrix Inversion)

If A is an n × n matrix, then the following statements are equivalent.


a. A is invertible.
b. Ax = b has exactly one solution for every b ∈ Rn.
c. Ax = b is consistent for every b ∈ Rn.
Proof :
(a)⇒(b) : Ax = b has a unique solution x = A−1b for every b ∈ Rn.
(b)⇒(c) : Ax = b has a unique solution and hence it is also consistent.
(c)⇒(a) : If Ax = b is consistent for every b ∈ Rn, then each of the
following systems of linear equations has a solution,


       [ 1 ]          [ 0 ]                  [ 0 ]
       [ 0 ]          [ 1 ]                  [ 0 ]
Ax1 =  [ 0 ] , Ax2 =  [ 0 ] , · · · , Axn =  [ .. ]
       [ .. ]         [ .. ]                 [ 0 ]
       [ 0 ]          [ 0 ]                  [ 1 ]
Let X = [x1 x2 · · · xn], then we have

AX = A[x1 x2 · · · xn] = [e1 e2 · · · en] = I

and by the previous theorem we have X = A−1, and A is invertible.


• Diagonal matrices
A diagonal matrix is a square matrix whose entries are all zero except
the diagonal entries.
    [ d11            0   ]
D = [      d22           ]
    [           ...       ]
    [ 0            dnn   ]

If all dkk ’s are non-zero, D is invertible,

      [ d11−1               0     ]
D−1 = [        d22−1              ]
      [                ...         ]
      [ 0                 dnn−1   ]
The powers of D are

                  [ d11^k               0     ]
Dk = DD · · · D = [        d22^k              ]
                  [                ...         ]
                  [ 0                  dnn^k  ]

We will discuss later how to efficiently compute Ak for a square
matrix A: if A can be factored as A = EDE−1 with D diagonal, then

A2 = EDE−1EDE−1 = ED2E−1
Ak = (EDE−1)k = EDk E−1
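A numerical sketch of this idea, assuming A is diagonalizable (here E and the eigenvalues come from `numpy.linalg.eig`):

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])
d, E = np.linalg.eig(A)            # A = E diag(d) E^{-1}
k = 5
Ak = E @ np.diag(d**k) @ np.linalg.inv(E)
print(np.allclose(Ak, np.linalg.matrix_power(A, k)))  # True
```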
• Triangular matrices

Upper (lower) triangular matrices:
a square matrix whose entries below (above) the main diagonal are
zero.

[ ∗ ∗ ∗ ∗ ]                        [ ∗ 0 0 0 ]
[ 0 ∗ ∗ ∗ ]                        [ ∗ ∗ 0 0 ]
[ 0 0 ∗ ∗ ] (upper triangular) ,   [ ∗ ∗ ∗ 0 ] (lower triangular)
[ 0 0 0 ∗ ]                        [ ∗ ∗ ∗ ∗ ]

A diagonal matrix D is both an upper and a lower triangular matrix.

Theorem Let U and L denote upper and lower triangular matrices,
respectively.
a. U T is lower triangular, while LT is upper triangular.
b. U1U2 is upper triangular, L1L2 is lower triangular.
c. If all the diagonal entries of U (or L) are non-zero, then U (or L) is
invertible.
d. If U (or L) is invertible, then U −1 (or L−1) is upper (lower) triangular.
Proof : (a) is clear.
(b)

[ ∗ ∗ ∗ ∗ ] [ ∗ ∗ ∗ ∗ ]   [ x x x x ]
[ 0 ∗ ∗ ∗ ] [ 0 ∗ ∗ ∗ ]   [ 0 x x x ]
[ 0 0 ∗ ∗ ] [ 0 0 ∗ ∗ ] = [ 0 0 x x ]
[ 0 0 0 ∗ ] [ 0 0 0 ∗ ]   [ 0 0 0 x ]
The proofs of (c) and (d) will be given in the next chapter when we
discuss determinants.

• Symmetric matrices

Definition A square matrix A is called symmetric if AT = A.

Example The matrix

    [ a e h p ]
M = [ e b f k ] = M T
    [ h f c g ]
    [ p k g d ]

is a symmetric matrix.

Theorem If A and B are symmetric matrices with the same


sizes, and if k is a scalar, then
a. AT is symmetric. (AT = A)
b. A + B and A − B are symmetric.
c. kA is symmetric.
In general, AB is not symmetric.

(AB)T = B T AT = BA �= AB

Theorem If A is an invertible symmetric matrix, then A−1 is symmetric.


Proof :

(A−1)T = (AT )−1 = A−1

Let B be any m × n matrix. Then


• BB T is an m × m symmetric matrix.
• B T B is an n × n symmetric matrix.

(BB T )T = (B T )T B T = BB T

• Partitioned matrices

A matrix A can be regarded as a 2-D array of entries, or a list of col-


umn vectors or row vectors. For an m × n matrix A,

    [ a11 · · · a1j · · · a1n ]                       [ rT1 ]
    [  ..        ..        ..  ]                       [ rT2 ]
A = [ ai1 · · · aij · · · ain ] = [c1 c2 · · · cn] =   [  ..  ]
    [  ..        ..        ..  ]                       [ rTm ]
    [ am1 · · · amj · · · amn ]
Other partitions of matrices:

A = [ A11 A12 ]         B = [ B1 ]
    [ A21 A22 ] ,           [ B2 ] ,      C = [C1 C2]

Multiplication rules for partitioned matrices:

AB = [ A11B1 + A12B2 ]
     [ A21B1 + A22B2 ] ,    CA = [C1A11 + C2A21   C1A12 + C2A22]

if each product A11B1 etc. and C1A11 etc. is defined.
Note the order in multiplication of two matrices,

A11B1 + A12B2 ≠ B1A11 + B2A12
C1A11 + C2A21 ≠ A11C1 + A21C2

Remarks

1. If [A B]C is defined, note that

[A B]C ≠ [AC BC]

and the latter is even undefined. However,

D[A B] = [DA DB]

2. Similarly,

[ A ] C = [ AC ]          D [ A ]  ≠  [ DA ]
[ B ]     [ BC ] ,          [ B ]     [ DB ]

Let

A = [ A11 A12 ]
    [ A21 A22 ]

then

AT = [ AT11 AT21 ]
     [ AT12 AT22 ]

Exercise
Find the transposes of B and C:

B = [ B1 ]           C = [C1 C2]
    [ B2 ] ,
• Column-row expansion of AB

Recall that for matrix multiplication,

                               [ b1j ]
[AB]ij = [ ai1 · · · ain ]     [  ..  ] = Σ_{k=1}^{n} aik bkj
                               [ bnj ]

which is called the row-column rule.

If A is an m × n matrix and B is an n × p matrix,

                              [ bT1 ]
A = [a1 a2 · · · an],     B = [ bT2 ]      ( ak ∈ Rm, bk ∈ Rp )
                              [  ..  ]
                              [ bTn ]

then

AB = a1bT1 + a2bT2 + · · · + anbTn
which is called the column-row rule.

         [ a1k ]
ak bTk = [ a2k ] [bk1 bk2 · · · bkp] ,     1 ≤ k ≤ n
         [  ..  ]
         [ amk ]

Since

[ak bTk ]ij = aik bkj

we have

[ a1bT1 + a2bT2 + · · · + anbTn ]ij = Σ_{k=1}^{n} [ ak bTk ]ij
                                    = Σ_{k=1}^{n} aik bkj
                                    = [AB]ij
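A quick check of the column-row rule (NumPy; `np.outer` forms each rank-one term ak bTk):

```python
import numpy as np

A = np.random.rand(3, 4)
B = np.random.rand(4, 5)
S = sum(np.outer(A[:, k], B[k, :]) for k in range(4))
print(np.allclose(S, A @ B))   # True
```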
• Inverses of partitioned matrices

If A = [ A11 A12 ]  is invertible, then
       [ O   A22 ]

[ A11 A12 ]−1    [ A11−1   −A11−1 A12 A22−1 ]
[ O   A22 ]    = [ O         A22−1           ]

Note that A being invertible implies that both A11 and A22 are
invertible, and vice versa.

Exercise Check

[ A11−1   −A11−1 A12 A22−1 ] [ A11 A12 ]
[ O         A22−1           ] [ O   A22 ] = I
Exercise Find the inverse of

M = [ A11 O   ]
    [ A21 A22 ]

Hint

M T = [ A11 O   ]T   [ AT11 AT21 ]
      [ A21 A22 ]  = [ O    AT22 ]

(M T )−1 = (M −1)T

• LU decomposition (factorization)

    [ 1 0 0 0 ] [ • ∗ ∗ ∗ ∗ ]
    [ ∗ 1 0 0 ] [ 0 • ∗ ∗ ∗ ]
A = [ ∗ ∗ 1 0 ] [ 0 0 0 • ∗ ]
    [ ∗ ∗ ∗ 1 ] [ 0 0 0 0 0 ]
         L            U

A : m × n,  L : m × m,  U : m × n.

Note L is a lower triangular matrix, while U is in row echelon form.

For solving Ax = b, we can use the LU decomposition of A to


decompose the process into two steps. For A = LU , we can solve
LU x = b by
Ly = b
Ux = y
In the situation where we need to solve a sequence of equations,

Ax = b1, Ax = b2, · · · , Ax = bp

it is more efficient to solve them by the LU decomposition of A,

Ly = bk ,  k = 1, 2, · · ·
Ux = y

than by performing elementary row operations on each [A | bk ].
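A sketch of this reuse pattern with SciPy: `lu_factor` computes the factorization once (with pivoting), and `lu_solve` performs the two triangular solves for each right-hand side:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[6., -2., 0.],
              [9., -1., 1.],
              [3.,  7., 5.]])
lu, piv = lu_factor(A)             # factor once
for b in (np.array([1., 0., 0.]), np.array([0., 2., 1.])):
    x = lu_solve((lu, piv), b)     # two triangular solves per right-hand side
    print(np.allclose(A @ x, b))   # True
```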


Ax = b

    [ 1 0 0 0 ] [ • ∗ ∗ ∗ ∗ ]
    [ ∗ 1 0 0 ] [ 0 • ∗ ∗ ∗ ]
A = [ ∗ ∗ 1 0 ] [ 0 0 0 • ∗ ] = LU
    [ ∗ ∗ ∗ 1 ] [ 0 0 0 0 0 ]

L(U x) = b

Ly = b
Ux = y
− Algorithm for an LU factorization

Assume we can reduce a matrix A to an echelon form U by elementary
row operations, without row interchanges,

              [ • ∗ ∗ ∗ ∗ ]
              [ 0 • ∗ ∗ ∗ ]
A ∼ ··· ∼ U = [ 0 0 0 • ∗ ]
              [ 0 0 0 0 0 ]

Eq · · · E1 A = U

where each Ek is lower triangular, for example,

[  1 0 0 0 ]   [ 1 0 0 0 ]   [ 1 0 0 0 ]
[ −2 1 0 0 ]   [ 0 1 0 0 ]   [ 0 1 0 0 ]
[  0 0 1 0 ] , [ 3 0 1 0 ] , [ 0 0 1 0 ]
[  0 0 0 1 ]   [ 0 0 0 1 ]   [ 0 0 0 2 ]
In
Eq · · · E1A = U
the matrix (Eq · · · E1) is lower triangular.
Then
A = (Eq · · · E1)−1U = LU
where

L = (Eq · · · E1)−1
is lower triangular.

A = (E1−1 · · · Eq−1)U

Example

    [  2  4 −1  5 −2 ]
A = [ −4 −5  3 −8  1 ]
    [  2 −5 −4  1  8 ]
    [ −6  0  7 −3  1 ]

    [ 2  4 −1  5 −2 ]     [ 2 4 −1 5 −2 ]     [ 2 4 −1 5 −2 ]
∼   [ 0  3  1  2 −3 ]  ∼  [ 0 3  1 2 −3 ]  ∼  [ 0 3  1 2 −3 ] = U
    [ 0 −9 −3 −4 10 ]     [ 0 0  0 2  1 ]     [ 0 0  0 2  1 ]
    [ 0 12  4 12 −5 ]     [ 0 0  0 4  7 ]     [ 0 0  0 0  5 ]

Recording the multipliers used in the elimination gives

    [  1  0 0 0 ]
L = [ −2  1 0 0 ]
    [  1 −3 1 0 ]
    [ −3  4 2 1 ]
Example

    [ 6 −2 0 ]
A = [ 9 −1 1 ]
    [ 3  7 5 ]

    [ 1 −1/3 0 ]     [ 1 −1/3  0  ]     [ 1 −1/3   0  ]     [ 1 −1/3   0  ]
∼   [ 9  −1  1 ]  ∼  [ 0   2   1  ]  ∼  [ 0   1   1/2 ]  ∼  [ 0   1   1/2 ] = U
    [ 3   7  5 ]     [ 0   8   5  ]     [ 0   8    5  ]     [ 0   0    1  ]

Recording the pivots and multipliers gives

    [ 6 0 0 ]
L = [ 9 2 0 ]
    [ 3 8 1 ]
Note that

1. An LU decomposition does not necessarily exist for every m × n
matrix A.

             [ 1 2 4 ]
A ∼ ··· ∼    [ 0 3 7 ]
             [ 0 0 1 ]

2. In general, if row interchanges are required to reduce A to its row-
echelon form, then there is no LU decomposition of A.

However, in this case, we can extend the LU decomposition.

When row interchanges are required to reduce A to its row-echelon
form, we can first interchange the rows of A before the LU decompo-
sition,

P ′A = LU

where P ′ = · · · E2(b) E1(b) denotes a sequence of row interchange
operations.

3. Then

A = P LU

where P = (P ′)−1.
Definition For a square matrix A of size n, we define the trace of A as

tr(A) = Σ_{k=1}^{n} akk ,     A = [aij ]

the sum of the entries on the main diagonal of A.

[ a11               ]
[      a22          ]
[           ...      ]
[               ann ]

It is clear that

tr(A) = tr(AT )
Theorem
Suppose that A and B are two n × n square matrices. Then

tr(AB) = tr(BA)

Proof
Both AB and BA are n × n matrices.

tr(AB) = Σ_{k=1}^{n} [AB]kk = Σ_{k=1}^{n} Σ_{ℓ=1}^{n} [A]kℓ [B]ℓk

tr(BA) = Σ_{k=1}^{n} [BA]kk = Σ_{k=1}^{n} Σ_{ℓ=1}^{n} [B]kℓ [A]ℓk

and the two double sums agree term by term (swap the names of the
indices k and ℓ).
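A quick numerical check (NumPy):

```python
import numpy as np

A = np.random.rand(4, 4)
B = np.random.rand(4, 4)
print(np.isclose(np.trace(A @ B), np.trace(B @ A)))  # True, even though AB != BA
```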
Exercise
Suppose that A and B are two m × n matrices. Prove that

tr(B T A) = tr(AB T )

Note B T A is n × n, while AB T is m × m.

tr(B T A) = Σ_{k=1}^{n} [B T A]kk = Σ_{k=1}^{n} Σ_{ℓ=1}^{m} [B]ℓk [A]ℓk

tr(AB T ) = Σ_{k=1}^{m} [AB T ]kk = Σ_{k=1}^{m} Σ_{ℓ=1}^{n} [A]kℓ [B]kℓ
■ Determinants

The determinant of a square matrix A:

A ↦ det(A) ∈ R (or C)

The notation of the determinant of A: det(A).

Geometrical meaning of a determinant:

A = [a1 a2],     Area = |det(A)|

[Figure: the parallelogram in R2 spanned by a1 and a2.]

A = [a1 a2 a3],     Volume = |det(A)|

[Figure: the parallelepiped in R3 spanned by a1, a2 and a3.]

We will discuss the volume of the parallelepiped in Rn later.
Definition For a square matrix A, define
Aij : the submatrix formed by deleting the ith row and jth
column of A.

Example:

    [ 1 −2  5  0 ]
A = [ 2  0  4 −1 ]           [ 1  5  0 ]
    [ 3  1  0  7 ] ,   A32 = [ 2  4 −1 ]
    [ 0  4 −2  0 ]           [ 0 −2  0 ]

Note that if A is of size n, then each Aij is of size n − 1.
Definition of determinants. (Iterative)

• Define the determinant of a 1 × 1 matrix A = [a11].


• Based on the determinants of square matrices of size (n−1), we define
the determinant of a square matrix A of size n. (Cofactor expansion)

1. For a 1 × 1 matrix

A = [a11]

we define

det(A) = a11

2. For n ≥ 2, the determinant of an n × n matrix A = [aij ]

    [ a11 · · · a1j · · · a1n ]
    [  ..        ..        ..  ]
A = [ ai1 · · · aij · · · ain ]
    [  ..        ..        ..  ]
    [ an1 · · · anj · · · ann ]

is defined as

det(A) = a11 det(A11) − a12 det(A12) + · · · + (−1)^{1+n} a1n det(A1n)
       = Σ_{j=1}^{n} (−1)^{1+j} a1j det(A1j )

We call
(1) det(Aij ) : the minor of aij , or the (i, j)-minor.

(2) (−1)i+j det(Aij ) : the cofactor of aij , or the (i, j)-cofactor.

Therefore, the determinant of a square matrix A is defined by the


cofactor expansion of A.

Example:

det [ a11 a12 ] = a11 det[a22] − a12 det[a21] = a11a22 − a12a21
    [ a21 a22 ]

    [ a11 a12 a13 ]
det [ a21 a22 a23 ]
    [ a31 a32 a33 ]

= a11 det [ a22 a23 ] − a12 det [ a21 a23 ] + a13 det [ a21 a22 ]
          [ a32 a33 ]           [ a31 a33 ]           [ a31 a32 ]

= a11 (a22a33 − a23a32) − a12 (a21a33 − a23a31) + a13 (a21a32 − a22a31)
Let
Cij = (−1)i+j det(Aij )
then
det(A) = a11 C11 + a12 C12 + · · · + a1n C1n

• In fact, it can be proved that we can expand along any row, say
the ith row,

det(A) = ai1 Ci1 + ai2 Ci2 + · · · + ain Cin

• or expand down any column, say the jth column,

det(A) = a1j C1j + a2j C2j + · · · + anj Cnj

If A, B are triangular matrices,

    [ a11 a12 · · · a1n ]         [ b11  0  · · ·  0  ]
A = [  0  a22 · · · a2n ]     B = [ b21 b22 · · ·  0  ]
    [  ..      ...   ..  ] ,      [  ..       ...  ..  ]
    [  0   0  · · · ann ]         [ bn1 bn2 · · · bnn ]

then det(A) is the product of the entries on the main diagonal of A,

    [ a11 a12 · · · a1n ]              [ a22 · · · a2n ]
det [  0  a22 · · · a2n ] = a11 · det  [      ...   ..  ] = a11a22 · · · ann
    [  ..      ...   ..  ]             [  0  · · · ann ]
    [  0   0  · · · ann ]

by expanding along the first column.
Similarly, for the lower triangular matrix B,

    [ b11  0  · · ·  0  ]              [ b22 · · ·  0  ]
det [ b21 b22 · · ·  0  ] = b11 · det  [      ...   ..  ] = b11b22 · · · bnn
    [  ..       ...  ..  ]             [ bn2 · · · bnn ]
    [ bn1 bn2 · · · bnn ]

by expanding along the first row.

For a diagonal matrix, we have

    [ d11               0   ]
det [      d22              ] = d11d22 · · · dnn
    [            ...         ]
    [ 0                dnn  ]
• Row operations and determinants

Let A be an n × n matrix. We can use the above cofactor expansion


of a matrix to prove the following facts.
1. The replacement operation doesn’t affect det(A).
2. The interchange operation negates det(A).
3. The scaling (r) operation results in r det(A).

Recall that we can write the row operation of a matrix A as

EaA, or EbA, or EcA

where Ea, Eb, and Ec denote three types of elementary matrices,


respectively.
Recall the three types of elementary matrices,

     [ 1 0 0 ]
Ea = [ 0 1 0 ]      (replacement)
     [ k 0 1 ]

     [ 1 0 0 ]
Eb = [ 0 0 1 ]      (interchange)
     [ 0 1 0 ]

     [ 1 0 0 ]
Ec = [ 0 r 0 ] ,  r ≠ 0     (scaling)
     [ 0 0 1 ]

It is clear that

det Ea = 1,   det Eb = −1,   det Ec = r
Let A be a square matrix. We will show that
det (EA) = (det E)(det A)
where E is an elementary matrix.
1. A multiple of one row of A is added to another row.

det (EaA) = det A = (det Ea)(det A) ∵ det Ea = 1

2. Two rows of A are interchanged.

det (EbA) = −det A = (det Eb)(det A) ∵ det Eb = −1

3. One row of A is multiplied by r(�= 0).

det (EcA) = r det A = (det Ec)(det A) ∵ det Ec = r


We have
1. det (EaA) = (det Ea) (det A), since det Ea = 1.
2. det (EbA) = (det Eb) (det A), since det Eb = −1.
3. det (EcA) = (det Ec) (det A), since det Ec = r �= 0.

We conclude that
det (Ek A) = (det Ek ) (det A), k = a, b, c (5)

In the following, we will further prove that


det (AB) = (det A) (det B)
for any square matrices A and B.


We now consider the determinant of a matrix A when


(1) A is invertible. (2) A is not invertible.
• If A is invertible, we can write
A = E1 · · · Ep−1Ep
where each Ek is an elementary matrix.

By (5), we have
det A = det (E1E2 · · · · · Ep)
= (det E1) det (E2 · · · · · Ep)
= (det E1) (det E2) · · · · · (det Ep) �= 0

• If A is not invertible, its reduced row-echelon form Ĩ has a row of zeros:

E′p · · · E′1 A = Ĩ
A = E1 · · · Ep Ĩ

For example,

    [ 1 2 0 0 ]
Ĩ = [ 0 0 1 0 ]         det Ĩ = 0
    [ 0 0 0 1 ] ,
    [ 0 0 0 0 ]

and

det A = det (E1E2 · · · Ep Ĩ) = det (E1E2 · · · Ep) det Ĩ = 0
Theorem (Equivalent Statements of Matrix Inversion)

If A is an n × n matrix, then the following statements are equivalent.


1. A is invertible.
2. det (A) �= 0.

Theorem Let A and B be two square matrices of sizes n. If AB is


invertible, then both A and B are invertible.
Proof
Let D be the inverse of AB, then
ABD = DAB = I
and we have A−1 = BD and B −1 = DA.

Recall that if both A and B are invertible, then AB is also invertible,


and (AB)−1 = B −1A−1. Therefore, we have

Both A and B are invertible ⇔ AB is invertible.

Theorem : If A and B are n × n matrices, then


det (AB) = (det A) (det B)
Proof :
If A is not invertible, so is AB, therefore,
det AB = 0 = (det A) (det B)


If A is invertible, let
A = E1E2 · · · Ep
By (5),
det AB = det (E1E2 · · · EpB)
= (det E1) [det (E2 · · · EpB)] = · · ·
= (det E1) (det E2) · · · · · (det Ep)(det B)
= (det E1E2 · · · Ep)(det B)
= (det A) (det B)

Therefore, we have
det AB = det BA = (det A) (det B)
although AB ≠ BA in general.

Theorem Let U and L denote upper and lower triangular matrices,


respectively. If all the diagonal entries of U (or L) are non-zero, then
U (or L) is invertible.
Proof
Since det (U ) = u11u22 · · · unn ≠ 0, where

    [ u11 u12 · · · u1n ]
U = [  0  u22 · · · u2n ]
    [  ..       ...  ..  ]
    [  0   0  · · · unn ]
Theorem : If A is an n × n matrix, then det AT = det A.
Proof :
1. It holds for n = 1, since A = AT = [a11].
2. Assume it holds for (n − 1) × (n − 1) matrices, n ≥ 2.
3. For an n × n matrix A,

det A  = a11 · C11 + a12 · C12 + · · · + a1n · C1n
det AT = a11 · C′11 + a12 · C′21 + · · · + a1n · C′n1

where C1j = (−1)^{1+j} det(A1j ) and, for the cofactors C′ of AT ,

C′j1 = (−1)^{j+1} det[(AT )j1] = (−1)^{j+1} det[(A1j )T ]

Since by the inductive hypothesis C′j1 = C1j , we have det AT = det A.
Summary
1. det AB = (det A)(det B) = det BA
2. det AT = det A
3. det (A−1) = (det A)−1 (Exercise) (Hint : det (AA−1) = det I = 1)

Note that
1. det (A + B) �= det A + det B
2. det (cA) = cn(det A), if A is of size n × n

• Cramer’s Rule
For an n × n matrix A, and b ∈ Rn,
A = [a1 · · · ai · · · an]
define
Ai(b) = [a1 · · · ai−1 b ai+1 · · · an]
namely, replace ai by b.
Example

Ii(x) = [e1 · · · x · · · en]    (the identity matrix with its ith column
replaced by x)

         [ 1 0 · · · x1 · · · 0 ]
         [ 0 1 · · · x2 · · · 0 ]
         [  ..  ..    ..      ..  ]
Ii(x) =  [ 0 0 · · · xi · · · 0 ]      ⇒   det Ii(x) = xi
         [  ..  ..    ..      ..  ]
         [ 0 0 · · · xn · · · 1 ]

(expand along the ith row, whose only non-zero entry is xi)

Theorem (Cramer’s Rule) : Let A be an n × n invertible matrix.
For any b in Rn, the unique solution x of Ax = b has entries given by

xi = det Ai(b) / det A ,     i = 1, 2, · · · , n

Proof :

A Ii(x) = A[e1 · · · x · · · en]
        = [Ae1 · · · Ax · · · Aen]
        = [a1 · · · b · · · an]
        = Ai(b)

⇒ (det A)(det Ii(x)) = det Ai(b)
⇒ (det A) xi = det Ai(b)
⇒ xi = det Ai(b) / det A
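A sketch of Cramer's rule in NumPy (illustrative; for real computations `np.linalg.solve` is preferred):

```python
import numpy as np

def cramer(A, b):
    """x_i = det(A_i(b)) / det(A), with column i of A replaced by b."""
    detA = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b           # A_i(b)
        x[i] = np.linalg.det(Ai) / detA
    return x

A = np.array([[1., 1., 2.], [3., 6., -5.], [2., 4., -3.]])
b = np.array([9., 0., 1.])
print(cramer(A, b))                                       # [1. 2. 3.]
print(np.allclose(cramer(A, b), np.linalg.solve(A, b)))   # True
```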
• A formula for A−1 :

Assume A is invertible and A−1 = B = [b1 b2 · · · bn].
Since AB = I,

A[b1 b2 · · · bn] = [e1 e2 · · · en]

we have

Abj = ej ,    1 ≤ j ≤ n

By Cramer’s rule,

[bj ]i = det Ai(ej ) / det A = Cji / det A ,     1 ≤ i ≤ n

where Cji is the (j, i)-cofactor of A.

A = [a1 · · · ai · · · an]

Ai(ej ) = [a1 · · · ej · · · an] ,     ej = (0, . . . , 0, 1, 0, . . . , 0)T

and, expanding down the ith column of Ai(ej ),

det Ai(ej ) = (−1)^{j+i} det(Aji) = Cji

Ref.

det(A) = ai1 Ci1 + ai2 Ci2 + · · · + ain Cin
       = a1j C1j + a2j C2j + · · · + anj Cnj
Since [bj ]i = [B]ij , we have

[B]ij = Cji / det A

Hence

                      [ C11 C21 · · · Cn1 ]
A−1 = B = (1/det A)   [ C12 C22 · · · Cn2 ]  =  (1/det A) adj(A)      (6)
                      [  ..  ..         ..  ]
                      [ C1n C2n · · · Cnn ]

where we define the adjoint of a matrix A as

         [ C11 C21 · · · Cn1 ]
adj(A) = [ C12 C22 · · · Cn2 ]
         [  ..  ..         ..  ]
         [ C1n C2n · · · Cnn ]
We now prove the following theorem.

Theorem Let U and L denote upper and lower triangular matrices,
respectively. If U (or L) is invertible, then U −1 (or L−1) is upper (lower)
triangular.

Proof
Since U is upper triangular, the cofactor Cij = 0 for i < j (each such
minor has a zero column below the diagonal).
By (6), U −1 is then also an upper triangular matrix.

    [ ∗ ∗ ∗ ∗ ]          [ 0 ∗ ∗ ]          [ ∗ ∗ ∗ ]
U = [ 0 ∗ ∗ ∗ ]    U12 = [ 0 ∗ ∗ ]    U23 = [ 0 0 ∗ ]
    [ 0 0 ∗ ∗ ] ,        [ 0 0 ∗ ] ,        [ 0 0 ∗ ]
    [ 0 0 0 ∗ ]

Suppose that A and B are two square matrices of sizes m × m
and n × n, respectively. Let

C = [ A O ]
    [ O B ]

We can show that

det(C) = det(A) det(B)

This result can be proved by mathematical induction,
beginning with the case that A is a 1 × 1 matrix,

[ a11 O ]
[ O   B ]
• Geometric interpretation of determinants

If

A = [a1 a2] = [ a11 a12 ]         B = [b1 b2 b3] = [ b11 b12 b13 ]
              [ a21 a22 ] ,                        [ b21 b22 b23 ]
                                                   [ b31 b32 b33 ]

then

(1). the area of the parallelogram determined by a1 and a2 is |det A|,
(2). the volume of the parallelepiped determined by b1, b2 and b3 is
|det B|.

[Figure: the parallelogram spanned by a1 and a2.]

1. It holds if

A = [ ℓ1 0  ]          [ ℓ1 0  0  ]
    [ 0  ℓ2 ] ,    B = [ 0  ℓ2 0  ]
                       [ 0  0  ℓ3 ]

then det A = ℓ1ℓ2 and det B = ℓ1ℓ2ℓ3 are the corresponding area
and volume, respectively.

[Figure: an ℓ1 × ℓ2 rectangle.]

2. Since det [a1 a2] = det [a1 a2 − ka1],
we can select k such that a2 − ka1 is perpendicular to a1.

[Figure: a2 decomposed as ka1 plus a component a2 − ka1 ⊥ a1.]

3. Rotate the rectangle by a rotation matrix R,

[a1 a2 − ka1] = [ cos(φ) − sin(φ) ] [ s 0 ]
                [ sin(φ)   cos(φ) ] [ 0 t ]

where s = ‖a1‖, t = ‖a2 − ka1‖.

Since the rotation doesn’t change the area and det R = 1,
we have

det [a1 a2] = det [a1 a2 − ka1] = s t

4. In Rn, the volume spanned by a1, a2, . . . , an, where ak ∈ Rn, is
defined as

| det [a1 a2 · · · an] |

Ref:

a1
a2 − c21a1 ⊥ a1
a3 − c31a1 − c32a2 ⊥ a1 , a2
  ...
■ Vector Spaces

- We are familiar with vectors in R2 and R3.

v = (x0, y0)T ∈ R2   or   w = (x1, y1, z1)T ∈ R3.

[Figure: the point (x0, y0) in the plane.]

- We can extend many familiar ideas beyond the 3-D space.

- Ordered n-tuple: v = (v1, v2, . . . , vn)T , vk ∈ R or C.
- The set of all ordered n-tuples is denoted by Rn (Cn).

Rn = {(v1, v2, . . . , vn)T | vk ∈ R}
For vectors in Rn (or Cn), we have the following definitions.
1. For two vectors u = (u1, u2, . . . , un)T and v = (v1, v2, . . . , vn)T
in Rn, the sum u + v is defined by

u + v = (u1 + v1, u2 + v2, . . . , un + vn)T

2. If c is any scalar (c ∈ R or C), the scalar multiple cu is defined by

cu = (cu1, cu2, . . . , cun)T

3. The zero vector is denoted by 0 and is defined to be


the vector 0 = (0, 0, . . . , 0)T .
u = (u1, u2, . . . , un)T ,  v = (v1, v2, . . . , vn)T
4. We say u = v if
u1 = v1, u2 = v2, . . . , un = vn

5. We define the negative (or additive inverse) of u as


−u = (−u1, −u2, . . . , −un)T

6. The difference of vectors in Rn is defined by


v − u = v + (−u) = (v1 − u1, v2 − u2, . . . , vn − un)T

Using the above definitions and operations for vectors in Rn (Cn), we
can readily prove the following theorem.

Theorem If u = (u1, u2, . . . , un)T , v = (v1, v2, . . . , vn)T , and
w = (w1, w2, . . . , wn)T are vectors in Rn ( or Cn ) and c and d
are scalars in R ( or C), then:
are scalars in R ( or C), then:
1. u + v = v + u
2. (u + v) + w = u + (v + w)
3. u + 0 = u (0 = (0, 0, . . . , 0)T )
4. u + (−u) = 0 ( −u = (−1)u )
5. c(u + v) = cu + cv
6. (c + d)u = cu + du
7. c(du) = (cd)u
8. 1u = u
The above properties will be used to define a vector space.
• Motivation
Consider some mathematical sets and the related linear operations.
1. Rn (or Cn)
v1 + v2 , cv1, v = (v1, v2, . . . , vn)T , vk ∈ R or C

2. { p(t) = a0 + a1t + a2t2 + · · · + antn ,  ak ∈ R or C }

p1(t) + p2(t),   cp1(t)

3. Mmn(R), or Mmn(C)
M1 + M2, cM1

4. { f (t) ∈ R, t ∈ R }
f1(t) + f2(t), cf1(t)


5. { {s} = (s1, s2, . . .), sk ∈ R or C }


{r} + {s}, c{s}

6. { X : Ω → R }, where X is a random variable, and Ω denotes


the sample space.
X +Y, cX

Remarks
1. For the above sets, we have similar definitions of addition “+”
and scalar multiplication.
2. How to efficiently study further issues such as the concept of bases,
linear transformation, eigenvalues/eigenvectors, inner products, etc.?

We will define the Vector Space as a unified and generalized


framework, and develop the “linear algebra” of the above sets under the
vector space.

The properties of the above theorem are actually common for the
following sets.
1. Rn (or Cn)
2. { p(t) = a0 + a1t + a2t2 + · · · + antn , ak ∈ R or C }
3. Mmn(R), or Mmn(C)
4. { f (t) ∈ R, t ∈ R }
5. { {s} = (s1, s2, . . .), sk ∈ R or C }
6. { X : Ω → R }

We use the properties of the vectors in Rn (or Cn) to define
a vector space.

Theorem For vectors u, v in Rn, (or Cn) · · ·

············

Axiom (Definition) For vectors u, v in a vector space V , · · ·

············
Axiom (or Definition )
A vector space V over a field F (R or C) is a nonempty set V of
vectors on which are defined two operations, called
addition (+) and scalar multiplication,
- addition: u + v
- scalar multiplication: cu
u, v ∈ V and c ∈ F (R or C)

subject to the following ten rules.

For all u, v, w ∈ V and scalars c, d ∈ F (R or C),
1. (Closure) The sum of u and v, denoted by u + v, is in V .
2. (Closure) The scalar multiple of u by c, denoted by cu, is in V .

u v
V

u + v cu

For example, the addition of two polynomials, p1(t) + p2(t), is also


a polynomial.

3. u + v = v + u
4. (u + v) + w = u + (v + w)
5. There is a zero vector 0 in V such that u + 0 = u for any u.
6. For each u in V , there exists a vector w such that u + w = 0
We use −u to denote w.
7. c(u + v) = cu + cv
8. (c + d)u = cu + du
9. c(du) = (cd)u
10. 1u = u

More precisely, we denote a vector space by (V, F ).

By the above definition of a vector space, it can be verified that
each of the following sets can be regarded as a vector space.

1. Rn (or Cn) v = (v1, v2, . . . , vn)T

2. {p(t) = c0 + c1t + c2t2 + · · · + cntn, ck ∈ R or C}

3. Mmn(R), or Mmn(C)

4. {f (t) ∈ R, t ∈ R}

5. { {s} = (s1, s2, . . .), sk ∈ R or C }

6. {X : Ω → R}, where X is a random variable, and Ω denotes


the sample space.
Example The space of polynomials (with real coefficients) of degrees
no more than n.
V = {p(t) = c0 + c1t + · · · + cntn | c0, . . . , cn ∈ R}

[Figure: the vectors u, v, u + v, cu, and 0 in V , annotated with the ten
vector-space rules.]

Example The space of m × n matrices with its entries in R or C.


{M | [M ]ij ∈ R, 1 ≤ i ≤ m, 1 ≤ j ≤ n}
{M | [M ]ij ∈ C, 1 ≤ i ≤ m, 1 ≤ j ≤ n}

[Figure: the vectors u, v, u + v, cu, and 0 in V , annotated with the ten
vector-space rules.]
Example The set of real-valued functions defined on (a, b),
denoted by F (a, b), is a vector space. The space F (a, b) is not
of finite dimension.
F (a, b) = {f (t) ∈ R | t ∈ (a, b)}

[Figure: the vectors u, v, u + v, cu, and 0 in V , annotated with the ten
vector-space rules.]

(V, F )
Example (Rn, R), (Cn, C), (F n, F ) are vector spaces.

Exercise Is (Rn, C) a vector space?

Exercise Is (Cn, R) a vector space?

Example {0} can be a vector space.


Remarks
1. Based on the concept of vector space, we develop the theorems of
linear algebra.
2. The theorems and results developed under the vector space can be
applied in the sets of Rn, Cn, polynomials, matrices, functions, se-
quences, and other sets that satisfy the rules of vector space.
.. .. ..
⇑ ⇑ ⇑
Theorem Theorem Theorem
⇑ ⇑ ⇑ ⇑ ⇑
Theorem Theorem Theorem Theorem
⇑ ⇑ ⇑ ⇑

Axiom of vector spaces Axiom of inner product spaces

180
Remarks
• Any kind of set can be a vector space if the above ten rules are
satisfied.

181
Fact: The zero vector 0 in a vector space V is unique.
Assume we have two zero vectors 01 and 02 in V .
Then
0 1 = 0 1 + 0 2 = 02

Fact: Every vector u has a unique negative element.


Assume u has two negative vectors w1 and w2,
u + w1 = 0, u + w2 = 0
Then

w1 = w1 + 0 = w1 + (u + w2) = (w1 + u) + w2 = 0 + w2 = w2

182
Therefore, in a vector space,
1. the zero vector 0,
2. and the negative vector of a vector u, denoted by −u,
are well-defined.
183
Theorem (Cancellation law for vector addition)
Let V be a vector space and u, v, w are vectors in V . If
u+w =v+w
then u = v.
Proof.
There exists a z ∈ V such that w + z = 0.
u = u + 0 = u + (w + z)
= (u + w) + z
= (v + w) + z
= v + (w + z)
=v+0=v

184
Theorem Let (V, F ) be a vector space, u a vector in V , and
k a scalar in F . Then
1. 0u = 0
2. k0 = 0
3. (−1)u = −u
4. If ku = 0, then k = 0 or u = 0
Proof
1. 0u + 0u = (0 + 0)u = 0u = 0u + 0
By the cancellation law for vector addition, we have 0u = 0.
2. k0 = k(0 + 0) = k0 + k0
k0 = k0 + 0
⇒ k0 + k0 = k0 + 0 ⇒ k0 = 0 (by the cancellation law)
185

3. u + (−1)u = 1u + (−1)u = [1 + (−1)]u = 0u = 0


So (−1)u = −u.
4. Suppose k ≠ 0, then

(1/k)(ku) = ((1/k)k)u = (1)u = 1u = u (7)

However, since ku = 0, we have


(1/k)(ku) = (1/k)0 = 0 (8)
Comparing (7) and (8), we have u = 0.

186
Remarks
1. In the above, we define the vector space.
2. We use the abstract method to achieve generalization.
3. Abstraction in mathematics is the process of extracting the underly-
ing essence of a mathematical concept.
4. We have an abstract in the front of an article or a book.
5. When referring to an abstract topic, we can use the cases (or exam-
ples) of R2 or R3 to verify it.
187
Example Let V = R2. Define addition and scalar multiplication
operations as follows. For u = (u1, u2), v = (v1, v2), we define

u + v = (u1 + v1, u2 + v2)

ku = (ku1, 0)

The first nine rules of the axiom are satisfied. However, Axiom 10 fails
to hold:
1u = 1(u1, u2) = (u1, 0) ≠ u
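As a quick numerical check, here is a minimal Python sketch of this nonstandard scalar multiplication (the function names are ours, not from the text; NumPy is assumed):

import numpy as np

def scalar_mul(k, u):
    # Nonstandard rule from the example: k(u1, u2) = (k*u1, 0)
    return np.array([k * u[0], 0.0])

u = np.array([3.0, 5.0])
print(scalar_mul(1, u))                      # [3. 0.], not u itself
print(np.array_equal(scalar_mul(1, u), u))   # False: Axiom 10 fails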

188

Exercise Consider the set of all polynomials of degrees less than or equal to two. Define two operations on polynomials as

(a0 + a1t + a2t2) + (b0 + b1t + b2t2) = (a0 + b0, a1 + b1, a2 + b2)T

c(a0 + a1t + a2t2) = c (a0, a1, a2)T
Check each of the ten rules of the vector space.

In the following, we will study the issues of subspaces and the dimension
of a vector space.
189
• Subspaces
Definition A subspace of a vector space V is a subset H of V
that is also a vector space.

190
Example V = R3, and H = R2 (the xy-plane).
(Figure: the plane H inside R3.)
191
To determine if a subset H of a vector space V is a subspace, we
only need to examine the following three rules, for all u, v ∈ H, and
c ∈ F,
1. 0 ∈ H
2. (Closure) u ∈ H, v ∈ H ⇒ u + v ∈ H
3. (Closure) u ∈ H, c ∈ F ⇒ cu ∈ H

Since all vectors of H are in V , they inherit the properties of


a vector space, like commutativity, associativity, etc.
u + v = v + u, u, v ∈ H ⊂ V
(u + v) + w = u + (v + w)
..

192
For a subset H ⊂ V , H is a subspace if for all u, v ∈ H, and c ∈ F ,
1. 0 ∈ H
2. (Closure) u ∈ H, v ∈ H ⇒ u + v ∈ H
3. (Closure) u ∈ H, c ∈ F ⇒ cu ∈ H

(Figure: the remaining vector-space rules hold automatically in H, inherited from V .)
Note (−1)u = −u
193

Example The subset that contains only the zero element 0 ∈ V ,


{0}, is a subspace of V .

Example The subspaces of R2 include


• {0}
• Lines through the origin H = {(t, at) | t ∈ R}, a ∈ R.
• R2 = {(x, y) | x ∈ R, y ∈ R}

(Figure: the line {(x, y) | y = ax} through the origin is a subspace; a line {(x, y) | y = ax + b} with b ≠ 0 is not.)

194
Example The subspaces of R3 include
• {0}
• Lines through the origin H = {(t, at, bt) | t ∈ R}, a, b ∈ R.
• Planes through the origin.
• R3
195

We call H a proper subspace of V if H is a subspace of V
but H ≠ V .

196
Example Assume m < n, and
W = {c0 + c1t + · · · + cmtm | c0, . . . , cm ∈ R}
V = {c0 + c1t + · · · + cntn | c0, . . . , cn ∈ R}
Then W is a (proper) subspace of V .

In the above example, both V and W are subspaces of the space


of polynomials
P = {c0 + c1t + · · · + cmtm + · · · | c0, . . . , cm, . . . , ∈ R}
197
Example If m < n, we can pad (n − m) zeros to every vector in
Rm to form a subspace W of Rn.

Rm = {(a1, a2, . . . , am)T , a1, . . . , am ∈ R}

Rn = {(a1, a2, . . . , am, am+1, . . . , an)T , a1, . . . , an ∈ R}

W = {(a1, a2, . . . , am, 0, . . . , 0)T , a1, . . . , am ∈ R}

W ⊂ Rn

198
Example
P (a, b) ⊂ C ∞(a, b) ⊂ C m(a, b) ⊂ C 1(a, b) ⊂ C(a, b) ⊂ F (a, b)

In this example, all spaces are of infinitely large dimensions.

1. P (a, b) = {c0+c1t+· · ·+cmtm+· · · | c0, . . . , cm, . . . , ∈ R, t ∈ (a, b)}


2. C m(a, b) = {f (t) | f (m)(t) exists, t ∈ (a, b)}
3. C(a, b) = {f (t) | f (t) continuous, t ∈ (a, b)}
4. F (a, b) = {f (t) ∈ R | t ∈ (a, b)}
199

Example The solution set of Ax = 0 is a subspace of Rn.


(A : m × n)

Let S = {x | Ax = 0, x ∈ Rn} denote the solution set. S ⊂ Rn.


1. 0 ∈ S.
2. x1 ∈ S and x2 ∈ S ⇒ (x1 + x2) ∈ S, cx1 ∈ S.
Ax1 = 0, Ax2 = 0 ⇒ A(x1 + x2) = 0, A(cx1) = cAx1 = 0
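A small numerical illustration of this closure (a sketch assuming NumPy and SciPy are available; the matrix is our own example, not from the text):

import numpy as np
from scipy.linalg import null_space

# null_space returns an orthonormal basis for {x | Ax = 0}.
A = np.array([[1., 2., 0., -1.],
              [0., 1., 1.,  2.]])
N = null_space(A)                       # here N has two columns
x1, x2 = N[:, 0], N[:, 1]
print(np.allclose(A @ (x1 + x2), 0))    # True: closed under addition
print(np.allclose(A @ (5.0 * x1), 0))   # True: closed under scalar multiplication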

Exercise Is the solution set of Ax = b, b ≠ 0, a subspace of Rn ?

200
Example Let Mnn be the space of square matrices of size n.
1. The set of n × n symmetric matrices (A = AT ) is a subspace of Mnn.
2. The set of n × n upper triangular matrices (U ) is a subspace of Mnn.
3. The set of n × n lower triangular matrices (L) is a subspace of Mnn.
4. The set of n × n diagonal matrices (D) is a subspace of Mnn.

Theorem Let H and W be two subspaces of V . The intersection


H ∩ W is also a subspace of V .
201
Proof
1. Since both H and W are subspaces of V , we have 0 ∈ H and 0 ∈ W ,
⇒ 0 ∈ H ∩ W.
2. Suppose that u, v ∈ H ∩ W .
⇒ u + v ∈ H and u + v ∈ W
⇒ u+v ∈H ∩W
3. ⇒ cu ∈ H and cu ∈ W ⇒ cu ∈ H ∩ W

Exercise Is H ∪ W a subspace of V ?

202
The subspaces of R3:
1. The x-axis, y-axis, and z-axis.
2. The xy-plane, yz-plane, zx-plane.

The intersection of these subspaces is 0 = (0, 0, 0)T .


203
Let V be a vector space, and let W and U be two subspaces of V .
Is it possible that W and U are disjoint? (W ∩ U = ∅)

In fact, 0 ∈ W ∩ U , so two subspaces are never disjoint.

204
How to define the dimension of a vector space?

Definition Let V be a vector space and v1, v2, . . . , vp ∈ V .


We define the span of v1, v2, . . . , vp as

Span{v1, v2, . . . , vp}


= {c1v1 + c2v2 + · · · + cpvp | c1, . . . , cp are scalars.}
the set of all linear combinations of v1, v2, . . . , vp.

Example Consider the space of polynomials. The span of 1, t, t2 is


Span{1, t, t2} = {c0 + c1t + c2t2 | ck ∈ R}
205

Theorem If v1, v2, . . . , vp ∈ V , then span{v1, v2, . . . , vp} is


a subspace of V .

Proof : Let W = span{v1, v2, . . . , vp}


= {c1v1 + c2v2 + · · · + cpvp | c1, . . . , cp are scalars.}

1. 0 ∈ W (Let c1 = · · · = cp = 0)
2. Assume u ∈ W , s ∈ W , k ∈ F ,
u = c 1 v 1 + c2 v 2 + · · · + c p v p
s = d 1 v 1 + d2 v 2 + · · · + d p v p
then u + s = (c1 + d1)v1 + (c2 + d2)v2 + · · · + (cp + dp)vp ∈ W

ku = (kc1)v1 + (kc2)v2 + · · · + (kcp)vp ∈ W

206
• We call span{v1, v2, . . . , vp} the subspace spanned (or generated)
by {v1, v2, . . . , vp}.

span{v1, v2, . . . , vp}


207
Definition A set of vectors {v1, v2, . . . , vp} is said to be
linearly independent if

c1v1 + c2v2 + · · · + cpvp = 0

has only the trivial solution, c1 = c2 = · · · = cp = 0.

Otherwise, we say that v1, v2, . . . , vp are linearly dependent.

If v1, v2, . . . , vp are linearly dependent, there are c1, c2, . . . , cp,
not all zero,
|c1|2 + |c2|2 + · · · + |cp|2 ≠ 0
such that
c1v1 + c2v2 + · · · + cpvp = 0

208
In this case, say, ck ≠ 0, then

vk = (−1/ck )(c1v1 + · · · + ck−1vk−1 + ck+1vk+1 + · · · + cpvp)

and we can represent vk as a linear combination of the other vℓ's, ℓ ≠ k.

(Figure: two collinear vectors v1, v2 with c1v1 + c2v2 = 0, versus two independent vectors v1, v2 spanning u = d1v1 + d2v2.)
209

Example The set of vectors {(1, 0, 0)T , (0, 1, 0)T , (0, 0, 1)T } is a linearly
independent set,
c1 (1, 0, 0)T + c2 (0, 1, 0)T + c3 (0, 0, 1)T = (0, 0, 0)T ⇒ c1 = c2 = c3 = 0

while the set {(1, 0)T , (0, 1)T , (2, 3)T } is not an independent set, since

(2, 3)T = 2 (1, 0)T + 3 (0, 1)T
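In practice one can test independence numerically: stack the vectors as columns and compare the rank with the number of vectors (a sketch assuming NumPy):

import numpy as np

# Independent iff the rank equals the number of vectors.
V1 = np.column_stack([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
V2 = np.column_stack([[1, 0], [0, 1], [2, 3]])
print(np.linalg.matrix_rank(V1))  # 3 -> the three vectors are independent
print(np.linalg.matrix_rank(V2))  # 2 < 3 -> the set is dependent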

210
For a vector space V ,
v1 , . . . , v p ∈ V
x ∈ span{v1, . . . , vp} = W

If v1, . . . , vp are not linearly independent (linearly dependent),


it’s possible that
x = c 1 v 1 + c2 v 2 + · · · + c p v p
= d1v1 + d2v2 + · · · + dpvp
while if v1, . . . , vp are linearly independent, the coefficients ck ’s are
unique,
x = c 1 v 1 + c2 v 2 + · · · + c p v p
for any x ∈ W .
211
Proof :
Assume x has two linear combinations,
x = c 1 v 1 + c2 v 2 + · · · + c p v p
= d1v1 + d2v2 + · · · + dpvp
⇒ (c1 − d1)v1 + (c2 − d2)v2 + · · · + (cp − dp)vp = 0
Since v1, v2, . . . , vp are linearly independent, we have
(c1 − d1) = (c2 − d2) = · · · = (cp − dp) = 0
i.e.,
ck = dk , k = 1, . . . , p

212
Basis : For a vector space V , we want to find a set of vectors
S = {v1, v2, . . . , vn} such that

1. Every vector in V can be represented as a linear combination


of vectors in S, (i.e., V = span{v1, v2, . . . , vn} )
2. and this representation is unique.
x = c1 v 1 + · · · + c n v n , x∈V
Example
1. In R2, the standard basis is {(1, 0), (0, 1)}.
2. In R3, the standard basis is {(1, 0, 0), (0, 1, 0), (0, 0, 1)}.

(x, y)T = x (1, 0)T + y (0, 1)T
213


Definition A set of vectors β = {v1, . . . , vn} is a basis for V if


1. β spans (generates) V , and ( V = span{v1, v2, . . . , vn} )
2. β is a linearly independent set.

Then every vector x in V can be represented uniquely as


x = c 1 v 1 + c2 v 2 + · · · + c n v n

214
Suppose β = {v1, v2, . . . , vn} is a basis of a vector space V . For
each x ∈ V , we can write x = c1v1 + c2v2 + · · · + cnvn.

Then the β-coordinate vector of x is


[x]β = (c1, c2, . . . , cn)T ∈ Rn

(Figure: the coordinate map sends x ∈ V to [x]β ∈ Rn.)
215
For example, consider the space of polynomials of degree smaller
than 3,

V = {c0 + c1t + c2t2 | c0, c1, c2 ∈ R}

Let
β = {1, t, t2}
be a basis of V . For a polynomial

p(t) = d0 + d1t + d2t2 ∈ V

we have
[ p(t) ]β = (d0, d1, d2)T ∈ R3

216

(Figure: the standard basis {(1, 0), (0, 1)} and another basis {(1, 1), (1, −1)} of R2.)

A vector space V may have more than one basis. Any set of vectors
{v1, . . . , vn} in V can be a basis of V if it is linearly independent and
it spans V .

For example, in R2, we can choose β = {(1, 0), (0, 1)} as a basis,
or γ = {(1, 1), (1, −1)} as a basis.
217
Theorem A vector space V can have different bases, say,
β = {b1, . . . , bn}, γ = {f1, . . . , fm}.
However, they must have the same number of vectors, i.e., m = n.
Proof:
1. m > n
Consider the coordinates of f1, . . . , fm relative to β,

f f f
     
 11   21   m1 
 f12   f22   

f1n f2n fmn


[f1]β =  
 ..  , [f2]β = 
 .. 
 , . . . , [fm]β =  fm2 
 .. 
     

f1 = f11b1 + f12b2 + · · · + f1nbn

218

Assume c1f1 + c2f2 + · · · + cmfm = 0


or c1[f1]β + c2[f2]β + · · · + cm[fm]β = [0]β

[f11 f21 · · · fm1 ; f12 f22 · · · fm2 ; · · · ; f1n f2n · · · fmn] (c1, c2, . . . , cm)T = (0, 0, . . . , 0)T

Since m > n, there is a non-trivial solution (c1, c2, . . . , cm).


This means there are c1, c2, . . . , cm, not all zero, such that
c1[f1]β + c2[f2]β + · · · + cm[fm]β = [0]β
or
c1 f 1 + c 2 f 2 + · · · + c m f m = 0
219

This contradicts the fact that f1, . . . , fm are linearly independent.


2. m < n
Reversing the roles of β and γ and applying the same argument, we
again reach a contradiction.
We conclude that m = n.

Definition The dimension of a vector space V is defined as the number


of vectors in a basis.

This definition is well-defined, since all bases have the same number of
vectors.

220

Fact In a vector space V of dimension n, if S is a linearly independent


set and span(S) = V , then S is a basis and the number of vectors in S
is n.
Example Find the dimensions of the following vector spaces.
1. R2 (β = {(1, 0), (0, 1)} γ = {(1, 1), (1, −1)})

2. R3 (β = {(1, 0, 0), (0, 1, 0), (0, 0, 1)})

Example The space of polynomials (with real coefficients) of degrees


no more than n has dimension n + 1.
V = {c0 + c1t + · · · + cntn | c0, . . . , cn ∈ R}
β = {1, t, . . . , tn}
221

Example The dimension of Rn is n. (β = {e1, e2, . . . , en})

Example The standard basis for 2 × 2 matrices M22 is

M1 = [1 0; 0 0], M2 = [0 1; 0 0], M3 = [0 0; 1 0], M4 = [0 0; 0 1]

since

[a b; c d] = a M1 + b M2 + c M3 + d M4

Example The dimension of the space {0} is defined as 0.

222

• In the above, under the framework of vector space, we define the


basis and dimension of a vector space, and the result can be applied
to vector spaces like Rn (Cn), matrices, polynomials, and so on.

Theorem For a vector space V of dimension n, the largest number


of linearly independent vectors is n.
v1, v2, . . . , vr ∈ V, r ≤ n = dim(V )

Theorem Let S = {v1, v2, . . . , vr } be a set of vectors in Rn.


If r > n, then S is linearly dependent.
Proof :

(In the next page.)


223
Proof :
Consider the system of linear equation (homogeneous)

c1v1 + c2v2 + · · · + cr vr = 0 (vk ∈ Rn) (9)


or
[v1 v2 · · · vr ] (c1, c2, . . . , cr )T = 0

There are n equations and r unknowns. When r > n, there are


free variables, and there exist non-trivial solutions of (c1, c2, . . . , cr )
in (9).

224
Note that for a basis β of a vector space V , β is a set that
1. contains the maximum number of linearly independent vectors,
2. contains the minimum number of vectors that span V .

β = {v1, v2, . . . , vn}, n = dim(V )

Thus if we add a vector u ∈ V to β,


β ∪ {u}
it becomes a linearly dependent set, for we have
u = c1 v 1 + · · · + c n v n
because β is a basis.
225

If we remove a vector from β, then β cannot span V .


For example, suppose we remove v1 from β,

β → {v2, . . . , vn} = β′

Then
v1 ∉ span(β′)
since β is an independent set.
Therefore,
span(β′) ≠ V
and β′ cannot span V .

226
Summary:
For a vector space V , a basis β = {v1, v2, . . . , vn}
1. is a linearly independent set,
2. and it spans (generates) V .
3. n = dimV

All bases of V have the same number of vectors.


β = {v1, v2, . . . , vn}, γ = {w1, w2, . . . , wn}

Now, we further discuss the relations among the dimension,


independence, and span (or generation).
227
Theorem Let V be an n-dimensional vector space, n ≥ 1.
1. Any linearly independent set S of exactly n elements is
automatically a basis for V .
2. Any set S of exactly n elements that spans V is
automatically a basis for V .
Proof
1. S must span V . Otherwise, span(S) ⊊ V , and there exists
v ∉ span(S); then S ∪ {v} is a linearly independent set of n + 1
elements. This contradicts the fact that dim(V ) = n.
2. S must be linearly independent. Otherwise, we can remove a vector
from S and the remaining (n − 1) vectors in S still span V .
This also contradicts dim(V ) = n.

228
In summary, for a vector space V of dimension n,
a set S = {v1, v2, . . . , vn} is a basis if
• S spans V , or
• S is linearly independent.

Theorem Let V be a finite-dimensional vector space. Any linearly


independent set S in V can be extended, if necessary, to a basis of V .
Proof:
1. If span{S} = V , then S is a basis of V .
229

2. If span{S} ⊊ V , there is a vector u1 ∈ V , u1 ∉ span{S}.
Let S1 = S ∪ {u1}. Note S1 is also an independent set.


3. If span{S1} = V , then S1 is a basis of V .

230
4. If span{S1} ⊊ V , there is a vector u2 ∈ V , u2 ∉ span{S1}.
Let S2 = S1 ∪ {u2}.

5. Continue this process, until span{Sk } = V .


(since V is finite-dimensional)
S ∪ {u1} ∪ {u2} ∪ · · ·
231

The above proof also indicates how to construct a basis of a space V .

Theorem: If W is a subspace of a finite-dimensional vector space


V , then dim(W ) ≤ dim(V ). When dim(W ) = dim(V ), we have W = V .

dim(W ) ≤ dim(V )

232
Proof :
• Let βW be a basis of W . Since βW ⊂ W ⊆ V and V is
finite-dimensional, we have that βW is a finite set, and W is
finite-dimensional.
• Since βW can be extended, if necessary, to become a basis βV of V ,
we have dim(W ) ≤ dim(V ) for βW ⊆ βV .
• If dim(W ) = dim(V ) = n, then βW is an independent set of n vectors
in V , and βW becomes a basis for V . So V = W = span(βW ).

Note the above result may not be true if the dimensions of V and W
are not finite, dim(V ) = dim(W ) = ∞.
For example, P (a, b) ⊊ C(a, b).
233
• Change of basis in Rn
Let

ε = {e1, . . . , en} = {(1, 0, . . . , 0)T , (0, 1, . . . , 0)T , . . . , (0, 0, . . . , 1)T }
be the standard basis of Rn, and

β = {u1, . . . , un}, γ = {v1, . . . , vn}

be another two bases. We can express a vector x in Rn as

x = c1u1 + · · · + cnun = d1v1 + · · · + dnvn

234
Then

[x]β = (c1, c2, . . . , cn)T , [x]γ = (d1, d2, . . . , dn)T

and

x = [x]ε = (x1, x2, . . . , xn)T = [u1 · · · un] (c1, . . . , cn)T = [v1 · · · vn] (d1, . . . , dn)T = Pβ [x]β = Pγ [x]γ

where the matrices Pβ = [u1 · · · un], Pγ = [v1 · · · vn].


235
Remark:
1. We have
x = Pβ [x]β = Pγ [x]γ
and
[x]β = Pβ−1x, [x]γ = Pγ−1x


2. [x]γ = Pγ−1Pβ [x]β = Pβγ [x]β , where Pβγ = Pγ−1Pβ .

Pβγ is called the change-of-coordinates matrix from β to γ.


Note Pε = In.

236
Example Consider a basis β = {u1, u2} of R2, where

u1 = (1, 0)T , u2 = (1, 2)T

If

[x]β = (−2, 3)T

then

x = −2u1 + 3u2 = (−2)(1, 0)T + 3 (1, 2)T = (1, 6)T = [x]ε
237
Example Consider a basis β = {u1, u2} of R2, where

u1 = (1, 0)T , u2 = (1, 2)T

If

x = [x]ε = (1, 6)T

assume

[x]β = (c1, c2)T

Then

x = c1u1 + c2u2 = [1 1; 0 2] (c1, c2)T = (1, 6)T

⇒ [x]β = [1 1; 0 2]−1 (1, 6)T = [1 −0.5; 0 0.5] (1, 6)T = (−2, 3)T

238

Example Let β = {u1, u2} = {(−9, 1)T , (−5, −1)T } and
γ = {v1, v2} = {(1, −4)T , (3, −5)T } be two bases in R2.

Assume [x]β = (2, 3)T , and let [x]γ = (d1, d2)T . Then

x = [u1 u2] (2, 3)T = [v1 v2] (d1, d2)T

and

[−9 −5; 1 −1] (2, 3)T = [1 3; −4 −5] (d1, d2)T ⇒ (d1, d2)T = (24, −19)T
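The same computation in Python (a sketch assuming NumPy; Pβ and Pγ carry the basis vectors as columns):

import numpy as np

P_beta  = np.array([[-9., -5.],
                    [ 1., -1.]])      # columns u1, u2
P_gamma = np.array([[ 1.,  3.],
                    [-4., -5.]])      # columns v1, v2

x_beta = np.array([2., 3.])
x = P_beta @ x_beta                    # coordinates in the standard basis
x_gamma = np.linalg.solve(P_gamma, x)  # [x]_gamma = P_gamma^{-1} x
print(x_gamma)                         # [ 24. -19.]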
239
• Change of basis in a general vector space V

Example Let V be the space of polynomials of degrees less than or equal


to n.
V = {c0 + c1t + · · · + cntn | c0, . . . , cn ∈ R}

We consider two bases in V ,


β = {1, t, t2, . . . , tn} and γ = {1, 1 + t, t + t2, . . . , tn−1 + tn}

For the space of polynomials of degrees no more than 2,


β = {1, t, t2}, γ = {1, 1 + t, t + t2}

240

For p(t) = 2 + 3t + 4t2,

[p]β = (2, 3, 4)T , [p]γ = (d0, d1, d2)T

p(t) = 2 + 3t + 4t2 = d0 + d1(1 + t) + d2(t + t2) = (d0 + d1) + (d1 + d2)t + d2t2

⇒ d0 + d1 = 2, d1 + d2 = 3, d2 = 4

⇒ d2 = 4, d1 = −1, d0 = 3, [p]γ = (3, −1, 4)T
241
Consider an m × n matrix A,

A = [c1 c2 · · · cn], with columns ck ∈ Rm and rows rT1 , rT2 , . . . , rTm, rk ∈ Rn.

Definition
• The column space of A: Col(A) = span{c1, c2, . . . , cn} ⊆ Rm
• The row space of A: Row(A) = span{r1, r2, . . . , rm} ⊆ Rn
• The null space of A is
Nul(A) = {x | Ax = 0} ⊂ Rn

242
(Figure: A maps Rn into Rm; Nul(A) ⊆ Rn is sent to 0, and Col(A) ⊆ Rm is the image.)

A : m×n
Col(A) = {Ax | x ∈ Rn} x = (x1, x2, . . . , xn)T
= {x1c1 + x2c2 + · · · + xncn | xk ∈ R}
= span(c1, c2, · · · , cn)

Nul(A) = {x | Ax = 0}
243
Theorem Elementary row operations do not change the null space
of a matrix.
Proof
Nul(A) = {x | Ax = 0}
= {x | EAx = 0}
since the elementary row operation is reversible, and each elementary
matrix is invertible.

In general, we have
{x | Ax = 0} ⊆ {x | BAx = 0}
or
Nul(A) ⊆ Nul(BA)

244
Theorem Elementary row operations do not change the row space
of a matrix.


However, row operations can change the column space of a matrix.


245
Example

A = [1 −3 4 −2 5 4; 2 −6 9 −1 8 2; 2 −6 9 −1 9 7; −1 3 −4 2 −5 −4] ∼ R = [1 −3 4 −2 5 4; 0 0 1 3 −2 −6; 0 0 0 0 1 5; 0 0 0 0 0 0]
Row(A) = Row(R)
= span{(1, −3, 4, −2, 5, 4), (0, 0, 1, 3, −2, −6), (0, 0, 0, 0, 1, 5)}

and we can find a basis for Row(A).

In the above, note the row operation does change Col(A).

246

A = [a1 a2 · · · an] ∼ B = [b1 b2 · · · bn]

Ac = 0 ⇔ Bc = 0 c = (c1, c2, . . . , cn)T

c1 a 1 + c 2 a 2 + · · · + c n a n = 0
⇔ c1b1 + c2b2 + · · · + cnbn = 0

The columns of A and B have the same dependence.


That is, the same coefficients c1, . . . , cn combine both
a1, . . . , an and b1, . . . , bn into 0.
247
Since A ∼ B, for the submatrices of A and B, we have
A1 = [a1 a3] ∼ B1 = [b1 b3]

A2 = [a2 a4 a5] ∼ B2 = [b2 b4 b5]


..
Since
A = [a1 a2 · · · an] ∼ B = [b1 b2 · · · bn]
Eq · · · E2E1 [a1 a2 · · · an] = [b1 b2 · · · bn]

Eq · · · E2E1 [a1 a3] = [b1 b3]

Eq · · · E2E1 [a2 a4 a5] = [b2 b4 b5]


Therefore,
[a1 a3] ∼ [b1 b3]
[a2 a4 a5] ∼ [b2 b4 b5]

248

A = [a1 a2 · · · an] ∼ B = [b1 b2 · · · bn]


Furthermore, if
{a1, a2, a4}
forms a basis for Col(A), then
{b1, b2, b4}
also forms a basis for Col(B).
For example,

a5 = d2a2 + d4a4 ⇔ b5 = d2b2 + d4b4

(or d2a2 + d4a4 − a5 = 0 ⇔ d2b2 + d4b4 − b5 = 0 )


249
How to find a basis of Col(A)?
A = [a1 a2 . . . an]
That is, how to find a basis for span{a1 a2 . . . an}?

We illustrate the approach by an example.

A = [−3 6 −1 1 −7; 1 −2 2 3 −1; 2 −4 5 8 −4] ∼ R = [1 −2 0 −1 3; 0 0 1 2 −2; 0 0 0 0 0]

then {a1, a3} forms a basis for Col(A).

(Some examples, derivation, and explanation are given in the Appendix.)
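Computer algebra systems expose exactly this recipe; for instance, SymPy's rref reports the pivot columns directly (a sketch assuming SymPy is available):

import sympy as sp

A = sp.Matrix([[-3, 6, -1, 1, -7],
               [ 1, -2, 2, 3, -1],
               [ 2, -4, 5, 8, -4]])
R, pivots = A.rref()   # reduced row echelon form and pivot column indices
print(pivots)          # (0, 2): columns a1 and a3 of A form a basis of Col(A)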

250
Example For
A = [1 −3 4 −2 5 4; 2 −6 9 −1 8 2; 2 −6 9 −1 9 7; −1 3 −4 2 −5 −4] ∼ R = [1 −3 4 −2 5 4; 0 0 1 3 −2 −6; 0 0 0 0 1 5; 0 0 0 0 0 0]

we have
Row(A) = span{(1, −3, 4, −2, 5, 4), (0, 0, 1, 3, −2, −6), (0, 0, 0, 0, 1, 5)}

Col(A) = span{(1, 2, 2, −1)T , (4, 9, 9, −4)T , (5, 8, 9, −5)T }
251
The above examples illustrate how to find the bases of
Col(A) and Row(A).

Example Find bases for Nul(A), Row(A) and Col(A) of the matrix

A = [−3 6 −1 1 −7; 1 −2 2 3 −1; 2 −4 5 8 −4]

Sol : Find the solution of Ax = 0,

1 −2 0 −1 3 0 x1 − 2x2 − x4 3x5 = 0
 

x3 + 2x4 − 2x5 = 0
0 0 0 0 0 0 0 =0
� �  
A 0 ∼  0 0 1 2 −2 0 

252
(x1, x2, x3, x4, x5)T = (2x2 + x4 − 3x5, x2, −2x4 + 2x5, x4, x5)T = x2 (2, 1, 0, 0, 0)T + x4 (1, 0, −2, 1, 0)T + x5 (−3, 0, 2, 0, 1)T

Therefore, the solution set of Ax = 0, or Nul(A), is

Nul(A) = {x | Ax = 0}
= { t1 (2, 1, 0, 0, 0)T + t2 (1, 0, −2, 1, 0)T + t3 (−3, 0, 2, 0, 1)T | t1, t2, t3 ∈ R }
253
Note that

(2, 1, 0, 0, 0)T , (1, 0, −2, 1, 0)T , (−3, 0, 2, 0, 1)T

are linearly independent, and they form a basis of Nul A.

254

By the reduced row echelon form of A,

A = [−3 6 −1 1 −7; 1 −2 2 3 −1; 2 −4 5 8 −4] ∼ R = [1 −2 0 −1 3; 0 0 1 2 −2; 0 0 0 0 0]

we have a basis for Row(A)


{(1, −2, 0, −1, 3), (0, 0, 1, 2, −2)}

and a basis for Col(A)


{a1, a3} = {(−3, 1, 2)T , (−1, 2, 5)T }
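A quick numerical check of the null-space basis found above (a sketch assuming NumPy):

import numpy as np

A = np.array([[-3., 6., -1., 1., -7.],
              [ 1., -2., 2., 3., -1.],
              [ 2., -4., 5., 8., -4.]])
# The three basis vectors of Nul(A) found above.
for v in ([2., 1., 0., 0., 0.],
          [1., 0., -2., 1., 0.],
          [-3., 0., 2., 0., 1.]):
    print(np.allclose(A @ np.array(v), 0))   # True, three times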
255

Theorem For any matrix A, the two spaces Row(A) and Col(A)
have the same dimensions.
Proof
A ∼ R
Since Row(A) = Row(R), we have
dim(Row(A)) = dim(Row(R))

On the other hand, since A ∼ R, a given set of columns of A forms a basis
for Col(A) iff the corresponding columns of R form a basis for Col(R).
dim(Col(A)) = dim(Col(R))

256

dim(Row(A)) = dim(Row(R))
dim(Col(A)) = dim(Col(R))
However,
dim(Col(R)) = dim(Row(R)) = number of pivots

A ∼ R = [1 −3 4 −2 5 4; 0 0 1 3 −2 −6; 0 0 0 0 1 5; 0 0 0 0 0 0]
257
Definition For a matrix A, we define its rank as
rank(A) = dim(Col(A)) = dim(Row(A))

Definition For a matrix A, we define its nullity as


nullity(A) = dim(Nul(A))


258
For an m × n matrix A, we have
rank(A) = dim(Col(A)) ≤ min(m, n)

since rank(A) equals the number of pivots in R, which is no more
than m and no more than n.

A ∼ R = [1 −3 4 −2 5 4; 0 0 1 3 −2 −6; 0 0 0 0 1 5; 0 0 0 0 0 0], R : m × n (4 × 6)

Theorem For any matrix A, rank(A) = rank(AT ).


Proof
rank(A) = dim(Row(A)) = dim(Col(AT )) = rank(AT )
259
Theorem (Dimension Theorem for Matrices)
For any m × n matrix A, we have
rank(A) + nullity(A) = n


Col(A) = {Ax | x ∈ Rn} ⊂ Rm


Nul(A) = {x | Ax = 0} ⊂ Rn

260
Proof
Let a row echelon form of A be R. (Ax = 0)
rank(A) = dim(Col(A)) = (# of pivots in R)
= (# of basic variables)

nullity(A) = dim(Nul(A)) = (# of free variables)


(see the previous example.)
rank(A) + nullity(A) = (# of total variables) = n

A ∼ R = [1 −2 0 −1 3; 0 0 1 2 −2; 0 0 0 0 0]
261
Example As in the previous examples,

A = [1 −3 4 −2 5 4; 2 −6 9 −1 8 2; 2 −6 9 −1 9 7; −1 3 −4 2 −5 −4] ∼ R = [1 −3 4 −2 5 4; 0 0 1 3 −2 −6; 0 0 0 0 1 5; 0 0 0 0 0 0]

We have n = 6, rank(A) = dim(Col(A)) = 3, nullity(A) = 3.

Example As in the previous examples,

A = [−3 6 −1 1 −7; 1 −2 2 3 −1; 2 −4 5 8 −4] ∼ R′ = [1 −2 0 −1 3; 0 0 1 2 −2; 0 0 0 0 0]

We have n = 5, rank(A) = 2, nullity(A) = 3.
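Both examples can be verified numerically (a sketch assuming NumPy and SciPy):

import numpy as np
from scipy.linalg import null_space

A = np.array([[ 1., -3.,  4., -2.,  5.,  4.],
              [ 2., -6.,  9., -1.,  8.,  2.],
              [ 2., -6.,  9., -1.,  9.,  7.],
              [-1.,  3., -4.,  2., -5., -4.]])
rank = np.linalg.matrix_rank(A)
nullity = null_space(A).shape[1]
print(rank, nullity, rank + nullity)   # 3 3 6, and 6 = n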

262

Ax = b, A = [c1 c2 · · · cn], x = (x1, x2, . . . , xn)T

x1c1 + x2c2 + · · · + xncn = b

We have
1. Consistent ⇔ b ∈ Col(A)
263

2. When Nul(A) = {0}, (nullity(A) = 0)


Ax = b has a unique solution if consistent.
3. If Nul(A) ≠ {0},
Ax = b has infinitely many solutions if consistent.

Nul(A) = {0} ⇔ the columns of A are linearly independent.

Nul(A) = {x | Ax = 0} A = [c1 c2 · · · cn]

= {x | x1c1 + · · · + xncn = 0}

264

Ax = b, A : m × n, x ∈ Rn, b ∈ Rm

Overdetermined (m > n) and underdetermined (m < n)


(Figure: for a tall A (m > n) there are more equations than unknowns; for a wide A (m < n), fewer.)

More equations ⇒ more “determined”


265
Example (An overdetermined system)
x1 − 2x2 = b1
x1 − x2 = b2
x1 + x2 = b3
x1 + 2x2 = b4
x1 + 3x2 = b5
The corresponding [A | b] is row equivalent to

[1 0 | 2b2 − b1 ; 0 1 | b2 − b1 ; 0 0 | b3 − 3b2 + 2b1 ; 0 0 | b4 − 4b2 + 3b1 ; 0 0 | b5 − 5b2 + 4b1]

266

which is consistent only when


b3 − 3b2 + 2b1 = 0
b4 − 4b2 + 3b1 = 0
b5 − 5b2 + 4b1 = 0
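In general, consistency can be tested by comparing rank(A) with rank([A | b]), since Ax = b is consistent iff b ∈ Col(A); a sketch assuming NumPy:

import numpy as np

A = np.array([[1., -2.],
              [1., -1.],
              [1.,  1.],
              [1.,  2.],
              [1.,  3.]])

def consistent(b):
    # Ax = b is consistent iff appending b does not raise the rank.
    return np.linalg.matrix_rank(np.column_stack([A, b])) == np.linalg.matrix_rank(A)

print(consistent(A @ np.array([1., 1.])))          # True: this b lies in Col(A)
print(consistent(np.array([1., 0., 0., 0., 0.])))  # False: b3 - 3b2 + 2b1 = 2 != 0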


rank(A) + nullity(A) = n (A : m × n)

rank(A) ≤ min(m, n)
267

Overdetermined (m > n) Ax = b

Since rank(A) ≤ min(m, n) < m, we have Col(A) ⊊ Rm.

There exists b ∈ Rm with b ∉ Col(A), for which Ax = b is inconsistent.
(See the last Example.)


268

Underdetermined (m < n) Ax = b

Since rank(A) + nullity(A) = n, we have

nullity(A) = n − rank(A) ≥ n − m > 0

⇒ Nul(A) ≠ {0}, and


Ax = b has infinitely many solutions if it is consistent.

269
Let Ax = b be a consistent linear system of m equations
in n unknowns (A : m × n).

If A has rank r, then dim( Nul(A) ) = n − r = k.

Let {v1, v2, . . . , vk } be a basis for


Nul(A) = {x | Ax = 0}

Av1 = 0, Av2 = 0, . . . , Avk = 0

The solution set of Ax = b can be expressed as


x0 + c1v1 + c2v2 + · · · + ck vk (10)

where x0 is any solution that satisfies Ax0 = b, ck ∈ R.


(See the previous example.)

270
For the above results, recall the previous theorem:
Theorem Suppose Ax = b has a solution p. (Ap = b)
Then the solution set of Ax = b can be expressed as
{p + vh | Avh = 0}, vh ∈ Nul(A)
271
Theorem If A is an m × n matrix, then the following are
equivalent.
1. Ax = 0 has only the trivial solution. (Nul(A) = {0})
2. The column vectors of A are linearly independent.
3. Ax = b has at most one solution (none or one) for every
b ∈ Rm .

A = [c1 c2 . . . cn], ck ∈ Rm
Ax = 0, x = (x1, x2, . . . , xn)T
x1c1 + x2c2 + · · · + xncn = 0
x1c1 + x2c2 + · · · + xncn = b

272
Theorem (Equivalent Statements of Matrix Inversion)
If A is an n × n matrix, then the following statements are equivalent.
a. A is invertible.
b. The column vectors of A are linearly independent.
c. The row vectors of A are linearly independent.
d. The column vectors of A span Rn.
e. The row vectors of A span Rn.
f. A has rank n.
g. A has nullity 0.
273
Proof
Recall that for a square matrix A of size n,
A is invertible
⇔ Ax = 0 has only the trivial solution x = 0.
⇔ The column vectors of A are linearly independent. (b)
(the above is by (1) and (2) of the last Theorem)
⇔ Col(A) = Rn. (d)
⇔ Rank(A) = n. (f)


274
Furthermore, by the Dimension Theorem, since A: n × n, we have
rank(A) = n ⇔ nullity(A) = 0 (g)

rank(AT ) = rank(A) = n
⇔ the row vectors of A are linearly independent. (c)
⇔ the row vectors of A span Rn. (e)

(Cf. the column vectors of A are linearly independent. (b))


(Cf. the column vectors of A span Rn. (d))
275
Summary about the subspaces related to an m × n matrix A.

A ∼ [1 −5 7 3 2; 0 0 1 −1 4; 0 0 0 0 0] (row-echelon form)

rank(A) = dim(Col(A)) = (# of pivots) ≤ min{m, n}

nullity(A) = dim(Nul(A)) = (# of free variables)

rank(A) + nullity(A) = n

276
Theorem Suppose that A and B are two matrices such that the
product AB is defined. Then we have


rank(AB) ≤ rank(B)

rank(AB) ≤ rank(A)
Proof
Nul(AB) ⊇ Nul(B) Bx = 0 ⇒ ABx = 0

⇒ nullity(AB) ≥ nullity(B) (A : m × p, B : p × n)

⇒ rank(AB) ≤ rank(B)

since
rank(AB) + nullity(AB) = rank(B) + nullity(B) = n
277

Recall that for a square matrix M ,


rank(M ) = rank(M T )
We have

rank(AB) = rank((AB)T ) = rank(B T AT ) ≤ rank(AT ) = rank(A)


Therefore,
rank(AB) ≤ rank(A)

278
Note in general, rank(AB) ≠ rank(BA).
For example
A = [1 1; 0 0], B = [1 0; −1 0]

we have

AB = [1 1; 0 0] [1 0; −1 0] = [0 0; 0 0]

while

BA = [1 0; −1 0] [1 1; 0 0] = [1 1; −1 −1]

rank(AB) = 0, rank(BA) = 1
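The same pair of matrices checked in NumPy (a minimal sketch):

import numpy as np

A = np.array([[1., 1.],
              [0., 0.]])
B = np.array([[ 1., 0.],
              [-1., 0.]])
print(np.linalg.matrix_rank(A @ B))  # 0
print(np.linalg.matrix_rank(B @ A))  # 1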
279
� Linear Transformation

A transformation from a vector space V into a vector space W ,


T : V �→ W
is a function (or mapping) that maps a vector v ∈ V to a vector
T (v) ∈ W .
(Figure: T maps v ∈ V to T (v) ∈ W ; the image T (V ) sits inside W .)

V : domain, W : codomain
T (V ): range, T (V ) = {T (v) | v ∈ V } ⊆ W

280
Definition
A transformation T : V �→ W is called a linear transformation if for
all vectors u and v in V and all scalars c, we have
1. T (u + v) = T (u) + T (v)
2. T (cu) = cT (u)
(Figure: T sends u, v, and u + v in V to T (u), T (v), and T (u) + T (v) in W .)

In the special case where V = W , the linear transformation


T : V �→ V is called a linear operator on V .
281

Theorem For a linear transformation T : V �→ W , we have T (0) = 0.


More precisely, T (0V ) = 0W .
Proof :
T (0 + 0) = T (0)
T (0 + 0) = T (0) + T (0) = 2T (0)

Since T (0) = 2T (0), or T (0) + 0 = 2T (0), we have T (0) = 0.


282
For a linear transformation T : V �→ W , we have
T (c1v1 + c2v2 + · · · + cnvn)
= c1T (v1) + c2T (v2) + · · · + cnT (vn)

Example (Zero Transformation)


The mapping T : V �→ W such that T (v) = 0 for every v ∈ V is a
linear transformation called the zero transformation.

T (v1 + v2) = 0 = T (v1) + T (v2)


T (cv) = 0 = cT (v)
283
Example (Identity Operator)
The mapping I : V �→ V defined by I(v) = v is called the identity
operator on V.

Let T = I.
T (v1 + v2) = v1 + v2 = T (v1) + T (v2)
T (cv) = cv = cT (v)

Exercise
Consider the mapping T : V �→ W such that T (v) = w0 for every
v ∈ V , where w0 is a constant vector in W . Is T a linear transformation?

284
Example
Consider the differential operation on the space of polynomials,
V = {c0 + c1t + c2t2 + · · · + cntn | c0, . . . , cn ∈ R}

Let T denote the differentiation operator. Note

T (p1(t) + p2(t)) = p′1(t) + p′2(t) = T (p1(t)) + T (p2(t))

T (kp1(t)) = kp′1(t) = kT (p1(t))

Therefore, T is a linear transformation (operator).


285
Example
Define a transformation S from the space of n × n matrices Mn(R)
to R, as
S(A) = tr(A)
where A ∈ Mn(R). Note
S(A + B) = tr(A + B) = tr(A) + tr(B)
= S(A) + S(B)

S(cA) = tr(cA) = c tr(A) = cS(A)


and S is a linear transformation from Mn(R) to R.

286
Exercise
Define a transformation T from the space of n × n matrices Mn(R)
to R, as
T (A) = det(A)
where A ∈ Mn(R). Determine if T is a linear transformation.
287
For a linear transformation T : Rn → Rm,
we can express T (x) as
T (x) = Ax
where
A = [T (e1) T (e2) · · · T (en)]

is the standard matrix of T . (A has size m × n)

e1 = (1, 0, . . . , 0)T , e2 = (0, 1, . . . , 0)T , · · · , en = (0, 0, . . . , 1)T

− Every linear transformation T has a standard matrix.

288
Since any x ∈ Rn can be expressed as

x = (x1, x2, . . . , xn)T = x1e1 + x2e2 + · · · + xnen

we have

T (x) = T (x1e1 + x2e2 + · · · + xnen)
= x1T (e1) + x2T (e2) + · · · + xnT (en)
= [T (e1) T (e2) · · · T (en)] (x1, x2, . . . , xn)T , T (ek ) ∈ Rm
= Ax
289

Remarks
1. Every linear transformation T from Rn to Rm corresponds to a matrix
A such that T (x) = Ax, with
A = [T (e1) T (e2) · · · T (en)]

2. On the other hand, every m × n matrix A corresponds to a linear
transformation TA from Rn to Rm,
TA(x) = Ax
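The recipe A = [T (e1) · · · T (en)] is easy to check numerically; the map T below is our own toy example, not one from the text (a sketch assuming NumPy):

import numpy as np

def T(x):
    # A hypothetical linear map T : R^3 -> R^2.
    return np.array([x[0] + 2 * x[1], 3 * x[2]])

# Standard matrix: its columns are T(e1), T(e2), T(e3).
A = np.column_stack([T(e) for e in np.eye(3)])
x = np.array([1., 2., 3.])
print(np.allclose(T(x), A @ x))   # True: T(x) = Ax for every x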

290
Examples
1. Reflection operator in R3
– Reflection about the xy-plane: (x1, y1, z1) ↦ (x1, y1, −z1)

[1 0 0; 0 1 0; 0 0 −1] (x, y, z)T = (x, y, −z)T

Note
T ((1, 0, 0)T ) = (1, 0, 0)T , T ((0, 1, 0)T ) = (0, 1, 0)T , T ((0, 0, 1)T ) = (0, 0, −1)T
291

2. Projection operator on R3
– Orthogonal projection onto the xy-plane: (x1, y1, z1) ↦ (x1, y1, 0)

[1 0 0; 0 1 0; 0 0 0] (x, y, z)T = (x, y, 0)T

Note
T ((1, 0, 0)T ) = (1, 0, 0)T , T ((0, 1, 0)T ) = (0, 1, 0)T , T ((0, 0, 1)T ) = (0, 0, 0)T
292

3. Rotation operator
Define a linear operator on R2 that rotates a vector x = (x, y)T counter-clockwise through an angle θ:

T (x) = [cos(θ) − sin(θ); sin(θ) cos(θ)] (x, y)T

T ((1, 0)T ) = (cos(θ), sin(θ))T , T ((0, 1)T ) = (− sin(θ), cos(θ))T
293
Definition Let T : V �→ W be a linear transformation.
• The kernel (or null space) of T is defined as
Ker (T ) = N (T ) = {x | T (x) = 0} ⊂ V

• The range of T is defined as


R(T ) = {T (x) | x ∈ V } ⊂ W

• Nullity (T ) = dim (N (T ))
• Rank (T ) = dim (R(T ))

(Figure: T maps V into W ; the kernel N (T ) ⊆ V is sent to 0, and the range R(T ) ⊆ W .)

294

Exercise Show that N (T ) is a subspace of V , and


R(T ) is a subspace of W .

295

Example Let A be an m × n matrix. Define a linear transfromation


TA : Rn → Rm as TA(x) = Ax.
Then the kernel of TA is
N (TA) = Nul(A)
and the range of TA
R(TA) = Col (A)

Recall that
Nul(A) = {x | Ax = 0} ⊆ Rn
Col (A) = {Ax | x ∈ Rn} ⊆ Rm

296
Definition A linear transformation T : V → W is said to be
one-to-one if T maps distinct vectors in V to distinct vectors in W .

That is
x1 ≠ x2 ⇒ T (x1) ≠ T (x2)
or equivalently
T (x1) = T (x2) ⇒ x1 = x2

297

Theorem A linear transformation T is one-to-one iff


T (x) = 0 ⇒ x = 0 ( or N (T ) = {0} )
Proof :
(⇒)
Since T (0) = 0 for every linear transformation, if T is one-to-one and
T (x) = 0, we have x = 0.

(⇐)
Suppose that T is not one-to-one, then there exist x1 and x2 such that
x1 ≠ x2 but T (x1) = T (x2). Then T (x1 − x2) = 0, and x1 − x2 ∈ N (T ),
but x1 − x2 ≠ 0, which indicates N (T ) ≠ {0}, a contradiction.

298

Definition A linear transformation T : V → W is said to be onto


if each vector y in the codomain W is the image
of at least one vector x in the domain V .

That is, R(T ) = W .

299

In summary, for a linear transformation T : V �→ W ,


1. T is one-to-one: N (T ) = {0}
2. T is onto: R(T ) = W

300

Theorem (The Rank Theorem)


Let T : V → W be a linear transformation with dim V = n. Then
dim(R(T )) + dim(N(T )) = n


N (T ) = {x | T (x) = 0}
R(T ) = {T (x) | x ∈ V }
301

Proof:
Let S = {v1, . . . , vp} be a basis of N(T ), p ≤ n. Then

T (v1) = · · · = T (vp) = 0

Extend S to a basis β = {v1, . . . , vp, vp+1, . . . , vn} of V .

R(T ) = T (V ) = T (span{v1, . . . , vp, vp+1, . . . , vn})


= span{T (v1), . . . , T (vp), T (vp+1), . . . , T (vn)}
= span{T (vp+1), . . . , T (vn)}

dim(R(T )) + dim(N(T )) = (n − p) + p = n

302

In the above, T (vp+1), . . . , T (vn) are linearly independent.


If not, ∃ cp+1, . . . , cn, not all zero, such that
cp+1T (vp+1) + · · · + cnT (vn) = 0
T (cp+1vp+1 + · · · + cnvn) = 0

⇒ cp+1vp+1 + · · · + cnvn ∈ N(T )

⇒ cp+1vp+1 + · · · + cnvn = d1v1 + · · · + dpvp

⇒ d1v1 + · · · + dpvp − cp+1vp+1 − · · · − cnvn = 0

But v1, . . . , vp, vp+1, . . . , vn are linearly independent. So


d1 = · · · = dp = cp+1 = · · · = cn = 0
which leads to a contradiction.
303

Example Let T : R2 �→ R2 be the linear operator that rotates each


vector in the xy-plane through the angle θ.

− Since every vector in the xy-plane can be obtained by rotating some


vector through angle θ, we have R(T ) = R2.
− The only vector that rotates into 0 is 0, so Ker(T ) = N (T ) = {0}.
− In this example, T is both one-to-one and onto.

304

Example Let T : R3 �→ R3 be the orthogonal projection on the


xy-plane.
− The kernel of T is the z-axis, which is one-dimensional.
− The range of T is the xy-plane, which is two-dimensional.
− nullity(T ) = 1, rank(T ) = 2
(Figure: (x1, y1, z1) projects to (x1, y1, 0) in the xy-plane.)
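Numerically, with P the standard matrix of this projection (a sketch assuming NumPy and SciPy):

import numpy as np
from scipy.linalg import null_space

P = np.array([[1., 0., 0.],
              [0., 1., 0.],
              [0., 0., 0.]])   # orthogonal projection onto the xy-plane
print(np.linalg.matrix_rank(P))   # 2 = rank(T)
print(null_space(P).ravel())      # spans the z-axis (up to sign): nullity(T) = 1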
305

Definition If a linear transformation T : V �→ W is


both one-to-one and onto, we call T an isomorphism between V and
W , and V is isomorphic to W .


When V is isomorphic to W , we have


dim(V ) = dim(W )

306

Since by the rank theorem,


dim(R(T )) + dim(N(T )) = n = dim(V )


1. one-to-one ⇒ N (T ) = {0} ⇒ dim(R(T )) = dim(V )


2. onto ⇒ R(T ) = W ⇒ dim(R(T )) = dim(W )

So T is both one-to-one and onto ⇒ dim(V ) = dim(W ).


307

Note if there exists a one-to-one and onto linear transformation T from


V to W , then T −1 : W → V exists and T −1 is also one-to-one and onto.


308

Since T : V → W is both one-to-one and onto, for every y ∈ W ,


there is one and only one x ∈ V such that

T (x) = y
We can define an inverse of T , T −1 : W → V , by

T −1(y) = x

Note T −1 is also one-to-one and onto.

Therefore, if T : V → W is an isomorphism (one-to-one and onto),


then T −1 : W → V exists and T −1 is also an isomorphism.
309

In the above, we show that if T is an isomorphism from V to W ,


then
1. There exists T −1, which is also an isomorphism from W to V .
2. dim(V ) = dim(W ).
3. V is isomorphic to W .
4. W is isomorphic to V .

V is isomorphic to W ⇔ W is isomorphic to V
(V and W are isomorphic.)

310

Suppose that T1 : V → W and T2 : W → Z are both isomorphisms


(one-to-one and onto). Then T2 ◦ T1 is also an isomorphism.

(Figure: T1 : V → W and T2 : W → Z, with inverses T1−1 and T2−1.)
311

Therefore, if V is isomorphic to W , and W is isomorphic to Z, then


V is isomorphic to Z.

By the above, we see that the relation “isomorphic to” is also an


equivalence relation.
1. V is isomorphic to V .
2. If V is isomorphic to W , then W is isomorphic to V .
3. If V is isomorphic to W , and W is isomorphic to Z, then V is
isomorphic to Z.

We have shown that


T is both one-to-one and onto ⇒ dim(V ) = dim(W )

312

For any vector space V with dim(V ) = n, we can construct a


one-to-one and onto mapping between V and Rn. Assume β is a basis
of V .
V → Rn (isomorphic), u ↦ [u]β
313

Therefore, if dim(V ) = dim(W ) = n, then both V and W are isomor-


phic to Rn, and V and W are isomorphic.
So
dim(V ) = dim(W ) ⇔ V and W are isomorphic.

Therefore, every vector space of dimension n is isomorphic to Rn, and


vector spaces of the same dimensions are isomorphic.

n = dim(V ) = dim(W ), V ∼ Rn, W ∼ Rn


⇒ V ∼ Rn ∼ W .

314
Example V = {c0 + c1x + · · · + cnxn | ck ∈ R} is isomorphic to Rn+1:

c0 + c1x + · · · + cnxn ↦ (c0, c1, . . . , cn)T

Example The standard basis for 2 × 2 matrices M22 is

M1 = [1 0; 0 0], M2 = [0 1; 0 0], M3 = [0 0; 1 0], M4 = [0 0; 0 1]

since

[a b; c d] = a M1 + b M2 + c M3 + d M4

and M22 is isomorphic to R4.


315
Theorem (Equivalent Statements of Matrix Inversion)
If A is an n × n matrix, and the associated linear transformation
T : Rn �→ Rn, T (x) = Ax
then the following statements are equivalent.
a. A is invertible.
b. Ax = b is consistent for every n × 1 matrix b.
c. Ax = b has exactly one solution for every n × 1 matrix b.
d. T is onto. (The range of T is Rn.)
e. T is one-to-one.

Proof : We have proved (a) ⇔ (b) ⇔ (c). It is clear that (b) ⇔ (d).
In addition,
(a) ⇔ nullity(A) = 0 ⇔ N (T ) = {0} ⇔ (e)

316

Recall that when a square matrix A is not invertible, its reduced row
echelon form is like
[1 0 2; 0 1 1; 0 0 0] or [1 2 0; 0 0 1; 0 0 0]

and the mapping x ↦ Ax is not one-to-one:

[1 0 2; 0 1 1; 0 0 0] (2, 1, 0)T = [1 0 2; 0 1 1; 0 0 0] (0, 0, 1)T = (2, 1, 0)T
317

For a linear transformation T from V to W , note if


dim(V ) = dim(W ) < ∞, then
T is one-to-one ⇔ T is onto.
Since by the Rank Theorem,
dim(R(T )) + dim(N(T )) = dim(V )
therefore
T is one-to-one
⇔ N(T ) = {0}
⇔ dim(N(T )) = 0
⇔ dim(R(T )) = dim(V ) = dim(W ) < ∞
⇔ R(T ) = W
⇔ T is onto

318
• Affine Transformation
Definition An affine transformation from Rn to Rm is a mapping
of the form
S(u) = T (u) + f0, u ∈ Rn
where T is a linear transformation from Rn to Rm and f0 is a constant
vector in Rm.

Affine transformation = linear transformation + translation

Example
The mapping
S(u) = [0 1; −1 0] u + (1, 1)T

is an affine transformation on R2.
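A sketch of this affine map in Python (assuming NumPy; note S(0) ≠ 0, so S itself is not a linear transformation):

import numpy as np

M  = np.array([[ 0., 1.],
               [-1., 0.]])   # the linear part T
f0 = np.array([1., 1.])      # the translation

def S(u):
    return M @ u + f0

print(S(np.array([0., 0.])))   # [1. 1.]: S(0) != 0, so S is affine but not linear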


1

Appendix
• The development of number systems
1. Natural numbers (1, 2, 3, . . .)
x + 5 = 3, x =?

2. Integers (. . . , −2, −1, 0, 1, 2, . . .)


2x = 3, x =?
3. Rational numbers (p/q)
x^2 = 2, x =?

4. Real numbers (including √2, π, . . .) R
x^2 = −1, x =?

5. Complex numbers (a + ib, i^2 = −1) C
The equation
xn + an−1xn−1 + · · · + a1x + a0 = 0
has n roots in C. (Fundamental theorem of algebra)

• About the field


In a vector space (V, F ), the scalars are in the field F (Ex. R, C),
c1v1 + c2v2 + · · · + cnvn
Both R and C are examples of the field.
The following gives the axiom of a field.
3
Definition (or Axiom) of a field F (Ex. R, C)
1. a + b = b + a and a · b = b · a

2. (a + b) + c = a + (b + c) and (a · b) · c = a · (b · c)

3. There exist distinct elements 0 (zero) and 1 such that


0 + a = a and 1 · a = a

4. For each element a and each non-zero element b there exist


elements c and d such that
a + c = 0 and b · d = 1

5. a · (b + c) = a · b + a · c

4
Example R, C, Z2 = {0, 1} are examples of fields.

We may define the operations + and · in Z2 as


0 + 0 = 0, 0 + 1 = 1, 1 + 1 = 0,
0 · 0 = 0, 0 · 1 = 0, 1 · 1 = 1.
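In code, the two Z2 operations are just exclusive-or and logical-and on the bits 0 and 1 (a minimal Python sketch):

# Z2 = {0, 1}: "+" is exclusive-or, "·" is logical and.
def add(a, b):
    return a ^ b

def mul(a, b):
    return a & b

print(add(1, 1))  # 0, matching 1 + 1 = 0 above
print(mul(1, 1))  # 1; the only nonzero element is its own inverse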
5

• The operations of addition and multiplication in a vector space may


not have any relationship or similarity to the standard vector opera-
tions on Rn.
addition: (u, v) �→ w = u + v
multiplication: (c, u) �→ z = cu

• The scalars of a vector space, F , may be


– real numbers, R (corresponding to real vector spaces) or
– complex numbers, C (corresponding to complex vector spaces) or
– other fields. (ex. binary numbers)

6
• About finding a basis for Col(A)

How to find a basis of Col(A)?


A = [a1 a2 . . . an]
That is, how to find a basis for span{a1 a2 . . . an}.

We illustrate the approach by an example.

A = [−3 6 −1 1 −7; 1 −2 2 3 −1; 2 −4 5 8 −4] ∼ R = [1 −2 0 −1 3; 0 0 1 2 −2; 0 0 0 0 0]

then {a1, a3} forms a basis for Col(A).


7

In the following, we show that a1 and a3 are linearly independent,


and
a2 = −2a1
a4 = −a1 + 2a3
a5 = 3a1 − 2a3

Hence we can choose {a1, a3} as a basis for


span{a1, a2, a3, a4, a5} = Col(A).

8
A = [−3 6 −1 1 −7; 1 −2 2 3 −1; 2 −4 5 8 −4] ∼ R = [1 −2 0 −1 3; 0 0 1 2 −2; 0 0 0 0 0]

In this example, consider


Ax = x1a1 + x2a2 + x3a3 + x4a4 + x5a5 = 0 (11)
which is equivalent to Rx = 0, and we have the solution set

(x1, x2, x3, x4, x5)T = x2 (2, 1, 0, 0, 0)T + x4 (1, 0, −2, 1, 0)T + x5 (−3, 0, 2, 0, 1)T , x2, x4, x5 ∈ R (12)

where x1 and x3 are basic variables, and x2, x4, x5 are free variables.
9

1. If we choose x2 = x4 = x5 = 0, then (11)


Ax = x1a1 + x2a2 + x3a3 + x4a4 + x5a5 = 0
reduces to
x1a1 + x3a3 = 0
and by (12)

the only solution is x1 = x3 = 0; this indicates that a1 and a3 are


linearly independent.

10
2. If we choose x2 = 1, x4 = x5 = 0, then (11)
Ax = x1a1 + x2a2 + x3a3 + x4a4 + x5a5 = 0
reduces to
x1a1 + a2 + x3a3 = 0
and by (12)

the solution is x1 = 2, x3 = 0. Then we have a2 = −2a1.


11
3. If we choose x4 = 1, x2 = x5 = 0, then (11)
Ax = x1a1 + x2a2 + x3a3 + x4a4 + x5a5 = 0
reduces to
x1a1 + x3a3 + a4 = 0
and by (12)

the solution is x1 = 1, x3 = −2. Then we have


a4 = −a1 + 2a3.

12
4. If we choose x5 = 1, x2 = x4 = 0, then (11)
Ax = x1a1 + x2a2 + x3a3 + x4a4 + x5a5 = 0
reduces to
x1a1 + x3a3 + a5 = 0
and by (12)

the solution is x1 = −3, x3 = 2. Then we have


a5 = 3a1 − 2a3.
13
Hence, for Col(A) = span{a1, a2, a3, a4, a5}, we see that
{a1, a3} is an independent set, and
a2 = −2a1
a4 = −a1 + 2a3
a5 = 3a1 − 2a3

⇒ {a1, a3} is a basis for Col(A).

14

If A ∼ R, we can determine the pivot columns of R, then choose
the corresponding columns of A as a basis for Col(A).

A = [−3 6 −1 1 −7; 1 −2 2 3 −1; 2 −4 5 8 −4] ∼ R = [1 −2 0 −1 3; 0 0 1 2 −2; 0 0 0 0 0]

then {a1, a3} forms a basis for Col(A).
