Linear Algebra Summary
2 Matrix Algebra
  2.1 Matrix Operations
  2.2 The Inverse of a Matrix
  2.3 Characterizations of Invertible Matrices
  2.4 Subspaces of Rn
  2.5 Dimension and Rank
3 Determinants
  3.1 Introduction to Determinants
  3.2 Properties of Determinants
    3.2.1 Determinants and matrix products
  3.3 Uses of determinants
    3.3.1 A formula for A−1
    3.3.2 Determinants as area or volume
    3.3.3 Linear transformations
4 Vector Spaces
  4.1 Rank
6 Orthogonality
  6.1 Inner Product, Length and Orthogonality
    6.1.1 Orthogonal vectors
    6.1.2 Orthogonal complements
    6.1.3 Angles in R2 and R3
  6.2 Orthogonal Sets
    6.2.1 An orthogonal projection
    6.2.2 Orthonormal sets
  6.3 Orthogonal Projections
    6.3.1 Properties of orthogonal projections
  6.4 The Gram-Schmidt Process
  6.5 Least-Squares Problems
  6.6 Applications to Linear Models
    6.6.1 The general linear model
    6.6.2 Least-squares fitting of other curves
    6.6.3 Multiple regression
Forgot to say it before, but if you want to forward this summary (which would be fine), please use the bit.ly link (i.e. bit.ly/LA300617) rather than this file or the Dropbox link directly.
I’ve omitted a number of the proofs of theorems in the book. The reason for this is that the old exams hardly ever ask for proofs longer than two sentences, and the proofs in the book are at times rather long, cumbersome and not all that clarifying. I did include the proofs that are only a few lines long, or that follow very logically.
Furthermore, note that there’s an index in the back of this summary. If you want to quickly look something up, you’ll most likely find a reference there.
Additionally, I can really recommend doing the online exercises while studying. Do a section (or subsection) and then practise the associated exercises; at least in my experience, this was the best way to study.
Finally, do yourself a huge favour and explore the true powers of your graphical calculator, for deep inside it you will find some very helpful functions. For example, it can automatically row reduce matrices for you, which is incredibly useful, and dot products (useful for orthogonality, chapter 6) can be done much more quickly as well. If you do not own a graphical calculator, I’d really recommend buying the Texas Instruments Nspire CX (without CAS! That one is highly illegal on the linear algebra exam). Otherwise, go for the TI-84 Plus1 , which is the one I use personally. However, the Nspire is much, much better looking (I mean, it has a colour screen, a scrollpad, a dedicated keyboard, and well, it’s just simply beautiful). In any case, never go Casio.
1 Possibly go for the Silver or C Silver edition, though I cannot guarantee that the latter is allowed for the exam. They say you’re allowed to use graphical calculators that are allowed on the VWO central exam, but the C Silver edition was not allowed in 2015; it was in 2016 and 2017, but I don’t know how frequently they update the list.
A linear equation in the variables x1 , ..., xn is an equation that can be written in the form

a1 x1 + a2 x2 + · · · + an xn = b        (1.1)

where b and the coefficients a1 , ..., an are real numbers. A system of linear equations (or a linear system) is a
collection of one or more linear equations involving the same variables. A solution of the system is a list
of values for x1 , ..., xn which satisfies the equations. The set of all possible solutions is called the solution
set of the linear system. Two linear systems are called equivalent if they have the same solution set.
When two linear equations are given, the solution can be seen as the intersection of these two lines, in
whatever dimension the lines are in. Note that there are three possibilities regarding the existence of the
solution:
1. There is no solution;
2. There is exactly one solution;
3. There are infinitely many solutions.
Systems with no solutions are called inconsistent. Systems with at least one solution are called
consistent.
Now, suppose we have the system:
x1 − 2x2 + x3 = 0
4x1       − 3x3 = 5        (1.2)
2x1 + 6x2 + 3x3 = 12
We can write this compactly in a rectangular array called a matrix. When we just write the coefficients, we call it a coefficient matrix:

[ 1 −2  1 ]
[ 4  0 −3 ]        (1.3)
[ 2  6  3 ]

If we include the right part of the equations, we get the so-called augmented matrix1 :

[ 1 −2  1  0 ]
[ 4  0 −3  5 ]        (1.4)
[ 2  6  3 12 ]
The size is given as m × n, where m is the number of rows, and n the number of columns. The augmented
matrix thus was of the size 3 × 4.
Solving linear systems will be dealt with in more detail in the next section. For now, the three elementary row operations are:
1. (Replacement) Replace one row by the sum of itself and a multiple of another2 ;
2. (Interchange) Interchange two rows;
3. (Scaling) Multiply all entries in a row by a nonzero constant.
1 Note that the only difference is that we have included the constants on the right side of the equation.
2 Add a multiple of another row to one row
CHAPTER 1. LINEAR EQUATIONS IN LINEAR ALGEBRA
Two matrices are called row equivalent if there exists a sequence of elementary row operations that
transforms one matrix into the other. It is important to note that row operations are reversible. If the
augmented matrices of two linear systems are row equivalent, then the two systems have the same solution
set.
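If you like to see things in code: the three row operations (and the fact that each one is reversible) can be sketched in a few lines of Python. The helper functions below are my own, nothing standard, applied to the augmented matrix (1.4):

```python
import copy
from fractions import Fraction

def replacement(M, i, j, c):
    """(Replacement) Row i  <-  row i + c * row j."""
    M[i] = [a + c * b for a, b in zip(M[i], M[j])]

def interchange(M, i, j):
    """(Interchange) Swap rows i and j."""
    M[i], M[j] = M[j], M[i]

def scaling(M, i, c):
    """(Scaling) Multiply row i by a nonzero constant c."""
    M[i] = [c * a for a in M[i]]

# the augmented matrix (1.4) of system (1.2)
M = [[Fraction(v) for v in row] for row in
     [[1, -2, 1, 0], [4, 0, -3, 5], [2, 6, 3, 12]]]
original = copy.deepcopy(M)

# every row operation is undone by an operation of the same type:
replacement(M, 1, 0, Fraction(-4))   # R2 <- R2 - 4 R1
replacement(M, 1, 0, Fraction(4))    # undo it
interchange(M, 0, 2)
interchange(M, 0, 2)                 # undo it
scaling(M, 2, Fraction(1, 2))
scaling(M, 2, Fraction(2))           # undo it
assert M == original
```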
A matrix is in echelon form if it has the following properties:

1. Any row full of zeroes is placed at the bottom, so that all nonzero rows are above it;
2. Each leading entry of a row (the leading entry is the first number in a row that’s not 0) is in a column to the right of the leading entry of the row above it;
3. All entries in a column below a leading entry are zeroes.

(The book illustrates this with a template matrix: the leading entry in each row may have any value but zero, and the other entries may have any value, including zero.)
Example
Consider the augmented matrix

A = [ 0  3 −6  6 4 −5 ]
    [ 3 −7  8 −5 8  9 ]
    [ 3 −9 12 −9 6 15 ]
Start with the leftmost column. Make sure that there’s a nonzero number as leading entry; if there’s a zero there, interchange this row with another row, preferably with the row that has the smallest number as leading entry there (the reason will become obvious during the next step);
Use this row to make the rows below it have 0 in the first column, by adding/subtracting (multiples of) this row to/from the other rows;
Go to the second row and look for its leading entry; if there is a row below it that has its leading entry in an earlier column, interchange these two rows;
Reduce the entries below the second row, in the column of the leading entry of the second row, to 0; and so on.
So, for this example, we first interchange the third and first row (R3 and R1), though you may also
choose R2 and R1:
[ 3 −9 12 −9 6 15 ]
[ 3 −7  8 −5 8  9 ]
[ 0  3 −6  6 4 −5 ]
Then, to make sure the second row has a zero in the first column as well (because the third row already has a zero there, we don’t need to do anything with it for now), we subtract the first row from the second row:
[ 3 −9 12 −9 6 15 ]
[ 0  2 −4  4 2 −6 ]
[ 0  3 −6  6 4 −5 ]
Now we look at the second row, and we see that it has its leading entry in the second column. We must thus ensure that the entry in the second column of the third row becomes zero, so we subtract 1.5R2 from R3 . First multiplying R2 by 1.5 gives:
[ 3 −9 12 −9 6 15 ]
[ 0  3 −6  6 3 −9 ]
[ 0  3 −6  6 4 −5 ]

Subtracting the second row from the third row then gives:

[ 3 −9 12 −9 6 15 ]
[ 0  3 −6  6 3 −9 ]
[ 0  0  0  0 1  4 ]
Now, it is obvious from the third row that x5 = 4. For the second row we get (after dividing by 3): x2 − 2x3 + 2x4 + x5 = −3, so with x5 = 4: x2 − 2x3 + 2x4 = −7. Note that x3 and x4 may take any value and are therefore called free variables. There is no fundamental reason why we could not have written a solution like x3 = ... and x4 = ... and made x1 and x2 the free variables, but for simplicity’s sake, it is convention to take the pivot variables as basic variables and express these as functions of the free variables. Note that, if the system is consistent, the number of free variables equals the number of columns in the coefficient matrix (so the number of columns in the augmented matrix minus 1), minus the number of pivot columns (see below). Now, substitute this value into the first row to get an expression for x1 as a function of x3 and x4 .
Next, we create 0s above the leading entries in each column. After dividing R1 and R2 by 3, we subtract 2R3 from R1 and R3 from R2 :

[ 1 −3  4 −3 0 −3 ]
[ 0  1 −2  2 0 −7 ]
[ 0  0  0  0 1  4 ]

Next, we also create a 0 above the leading entry in the second row, by adding 3R2 to R1 :

[ 1 0 −2 3 0 −24 ]
[ 0 1 −2 2 0  −7 ]
[ 0 0  0 0 1   4 ]
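By the way, the whole reduction above can be automated. The routine below is my own quick sketch of a reduced-echelon-form algorithm (using exact fractions, so no rounding trouble); running it on the augmented matrix A of this example reproduces the final matrix we just found:

```python
from fractions import Fraction

def rref(M):
    """Reduce a matrix (list of rows of Fractions) to reduced row echelon form."""
    M = [row[:] for row in M]
    rows, cols = len(M), len(M[0])
    pivot_row = 0
    for col in range(cols):
        # find a row at or below pivot_row with a nonzero entry in this column
        pick = next((r for r in range(pivot_row, rows) if M[r][col] != 0), None)
        if pick is None:
            continue                       # no pivot in this column
        M[pivot_row], M[pick] = M[pick], M[pivot_row]
        piv = M[pivot_row][col]
        M[pivot_row] = [x / piv for x in M[pivot_row]]   # scale the pivot to 1
        for r in range(rows):              # clear the rest of the column
            if r != pivot_row and M[r][col] != 0:
                c = M[r][col]
                M[r] = [a - c * b for a, b in zip(M[r], M[pivot_row])]
        pivot_row += 1
        if pivot_row == rows:
            break
    return M

A = [[Fraction(v) for v in row] for row in
     [[0, 3, -6, 6, 4, -5],
      [3, -7, 8, -5, 8, 9],
      [3, -9, 12, -9, 6, 15]]]
R = rref(A)
assert R == [[1, 0, -2, 3, 0, -24],
             [0, 1, -2, 2, 0, -7],
             [0, 0, 0, 0, 1, 4]]
```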
Now, we call the 1s that appear as leading entry of each row the pivots. To be precise:
samvanelsloo97@icloud.com
Definition
A pivot position in a matrix A is a location in A that corresponds to a leading 1 in the reduced echelon form of A. A pivot column is a column of A that contains a pivot position.
Note the advantage of writing the matrix like this. The solutions can now be written down immediately:

x1 = −24 + 2x3 − 3x4
x2 = −7 + 2x3 − 2x4
x5 = 4

with x3 and x4 free.
This matrix form, where the leading entry in each nonzero row is 1 and each leading 1 is the only nonzero entry in its column, is said to be in reduced (row) echelon form.

Each matrix is row equivalent to one and only one reduced echelon matrix.

A system is consistent if and only if an echelon form of the augmented matrix has no row of the form [0 · · · 0 b] with b nonzero. If a linear system is consistent, it has one unique solution when there are no free variables, and infinitely many solutions if there is at least one free variable.
1.3.1 Vectors in Rn
A matrix with only one column is called a column vector, or simply a vector. A vector in R5 would for example be:

u = [  3 ]
    [  2 ]
    [ −1 ]
    [  5 ]
    [  2 ]

because there are five entries, five dimensions can be reached (the first three represent the x, y and z directions; the other two have to be imagined as the fourth and fifth dimension). Two vectors are equal if and only if their corresponding entries are equal. The sum u + v is defined if u and v are in the same dimension (have the same number of entries). Given a vector u and a real number c, the scalar multiple of u by c is the vector cu obtained by multiplying each entry in u by c. The number c is called a scalar. The vector whose entries are all zero is called the zero vector and is denoted by 0.
Algebraic Properties of Rn
(i) u + v = v + u
(ii) (u + v) + w = u + (v + w)
(iii) u + 0 = 0 + u = u
(iv) u + (−u) = −u + u = 0, where −u denotes (−1) u
(v) c (u + v) = cu + cv
(vi) (c + d) u = cu + du
(vii) c (du) = (cd) u
(viii) 1u = u
If v1 , ..., vp are vectors in Rn (in the previous example: there are three entries in each vector, so n = 3), then the set of all linear combinations of v1 , ..., vp (that is, all points that can be reached by a linear combination of these vectors) is called the subset of Rn spanned (or generated) by v1 , ..., vp .
If A is an m × n matrix, with columns a1 , ..., an , and if x is in Rn (that is, if the vector x has as many entries as A has columns), then the product of A and x, denoted by Ax, is the linear combination of the columns of A using the corresponding entries in x as weights; that is,

Ax = [ a1 a2 · · · an ] [ x1 ]
                        [ ..  ] = x1 a1 + x2 a2 + · · · + xn an
                        [ xn ]
There’s actually some logic behind the first and second step. If you recall from the very first paragraph, we used the first column for the x1 entries in the system of equations, the second column for x2 , etc. It thus makes sense to multiply the first column by x1 here, and the nth column by xn . Furthermore, as we’re then left with nothing but vectors, we’re allowed to simply sum them to get a “total” vector.
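This column-wise definition of Ax translates directly into code. A small sketch (the helper name mat_vec is my own), applied to the coefficient matrix of system (1.8) below:

```python
def mat_vec(A, x):
    """Ax as the linear combination x1*a1 + ... + xn*an of the columns of A."""
    m, n = len(A), len(A[0])
    result = [0] * m
    for j in range(n):                    # for each column a_j of A ...
        for i in range(m):
            result[i] += x[j] * A[i][j]   # ... add the weighted entries x_j * a_ij
    return result

A = [[1, 2, -1],
     [0, -5, 3]]
assert mat_vec(A, [2, 1, 1]) == [3, -2]   # 2*a1 + 1*a2 + 1*a3
```

Note how the code literally forms x1 a1 + · · · + xn an rather than using the row-times-column rule; both give the same answer.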
If we have for example the system:

x1 + 2x2 −  x3 = 4
    − 5x2 + 3x3 = 1        (1.8)

This is equivalent to the vector equation

x1 [ 1 ] + x2 [  2 ] + x3 [ −1 ] = [ 4 ]        (1.9)
   [ 0 ]      [ −5 ]      [  3 ]   [ 1 ]

and the associated matrix equation:

[ 1  2 −1 ] [ x1 ]   [ 4 ]
[ 0 −5  3 ] [ x2 ] = [ 1 ]        (1.10)
            [ x3 ]
This leads to the following theorem:
Theorem 3

If A is an m × n matrix, with columns a1 , ..., an , and if b is in Rm , the matrix equation

Ax = b        (1.11)

has the same solution set as the vector equation

x1 a1 + x2 a2 + · · · + xn an = b        (1.12)

which, in turn, has the same solution set as the system of linear equations whose augmented matrix is

[ a1 a2 · · · an b ]        (1.13)
So, b1 − (1/2)b2 + b3 = 0. This means that only a specific combination is valid. Had there been a pivot position in the third column as well, then we would have got b1 − (1/2)b2 + b3 = x3 , which is valid for any combination of b1 , b2 and b3 3 . So, if A has a pivot in every row, then Ax = b is consistent for all b4 .
The following theorem applies to the existence of solutions:
Theorem 4

Let A be an m × n matrix. Then the following statements are logically equivalent. That is, for a particular A, either they are all true statements or they are all false.

a. For each b in Rm , the equation Ax = b has a solution.
b. Each b in Rm is a linear combination of the columns of A.
c. The columns of A span Rm .
d. A has a pivot position in every row.
Proof

Statements a. and b. are equivalent by the definition of Ax. Statement c. is logical if we consider for example the matrix:

[ 2  4 3  1 ]
[ 1 −3 2 −3 ]
[ 0  0 0  0 ]

Although the matrix has 3 rows, the columns only span R2 , because none of the columns points in the third direction. It is then also apparent that the equation Ax = b does not have a solution for each b, and thus that not each b is a linear combination of the columns of A. Statement d. is a little bit more difficult to prove. Let U be an echelon form of A. Given b in Rm , we can row reduce the augmented matrix [ A b ] to an augmented matrix [ U d ] for some d in Rm . If d. is indeed correct, then each row of U contains a pivot position, so that Ax = b has a solution for any b. If d. is false, then the last row of U consists of only zeroes. Since for a. to be true each b is allowed, the last entry of d may also be nonzero; the system would then be inconsistent, and not each b would have a solution (only the bs that cause a zero to appear as the last entry). So, if d. is false, so is a., and if a. is false, b. and c. are also false. Therefore, all statements are logically equivalent.
An identity matrix is a matrix with 1’s on the diagonal and 0’s elsewhere.
3 We can adjust x3 for that.
4 Note that the coefficient matrix should have a pivot position in every row, not the augmented matrix (because the augmented matrix at the end did have a pivot in the b1 − (1/2)b2 + b3 column).
One final theorem in this section, regarding properties of the matrix-vector product Ax:
Theorem 5
If A is an m × n matrix, u and v are vectors in Rn , and c is a scalar, then:
a. A (u + v) = Au + Av;
b. A (cu) = c (Au).
The two equations of the homogeneous system

x1 − 2x2 + 3x3 = 0
4x1 + x2 − 6x3 = 0

can be seen as two planes in a 3D graph, both passing through the origin. Since the equations of the planes are not equivalent, the intersection of the two planes is a line, passing through the origin. The formula for this line can be found by solving the augmented matrix:

[ 1 −2  3 0 ]   [ 1 −2   3 0 ]   [ 1 −2  3 0 ]   [ 1 0 −1 0 ]
[ 4  1 −6 0 ] ∼ [ 0  9 −18 0 ] ∼ [ 0  1 −2 0 ] ∼ [ 0 1 −2 0 ]

so that x2 = 2x3 and x1 = x3 , or

x = [ x3  ]      [ 1 ]
    [ 2x3 ] = x3 [ 2 ]
    [ x3  ]      [ 1 ]

Now, if we compare it with the system:
x1 − 2x2 + 3x3 = 6
4x1 + x2 − 6x3 = −3        (1.15)
We realize that we have merely shifted the two planes somewhat. However, the planes are still parallel to what they were before, and therefore, it is logical that the intersection line between the two is also parallel to what it was; it is simply shifted a bit. Indeed, we can row reduce it to

[ 1 0 −1  0 ]
[ 0 1 −2 −3 ]

so that

x = [  0 ]      [ 1 ]
    [ −3 ] + x3 [ 2 ]
    [  0 ]      [ 1 ]

Note that this is basically just the same as before, but with an added ’base’ vector (a particular solution). If we use x3 = t, we can now write x = p + tv.
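You can convince yourself numerically that every vector of the form p + tv really solves the inhomogeneous system (1.15). A quick Python check (my own sketch):

```python
def residual(x1, x2, x3):
    """The left-hand sides of both equations of system (1.15)."""
    return (x1 - 2 * x2 + 3 * x3, 4 * x1 + x2 - 6 * x3)

p = (0, -3, 0)   # a particular solution of Ax = b
v = (1, 2, 1)    # a solution of the homogeneous equation Ax = 0
for t in range(-3, 4):
    x = tuple(pi + t * vi for pi, vi in zip(p, v))
    assert residual(*x) == (6, -3)   # every p + t*v solves the system
```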
In case you’re wondering what would happen if there were no free variable for the homogeneous equation: this would happen, for example, if you consider three planes in 3D. It is again apparent that the solution has merely shifted a bit (if you look closely, you can see the intersection of the blue, green and yellow planes, to be exact at (2, 5, −2)). Note that this is the only solution:

x = [  2 ]
    [  5 ]
    [ −2 ]
Indeed, we can write x = p + vh (this is more apparent for the previous example with the free variable), and we get the following theorem:

Theorem 6

Suppose the equation Ax = b is consistent for some given b, and let p be a solution. Then the solution set of Ax = b is the set of all vectors of the form w = p + vh , where vh is any solution of the homogeneous equation Ax = 0.

Please note that this only applies to an equation Ax = b that has at least one solution p.
So, in short, take the following approach to find the solution set in parametric vector form:
samvanelsloo97@icloud.com
CHAPTER 1. LINEAR EQUATIONS IN LINEAR ALGEBRA 16
Row reduce the augmented matrix to reduced echelon form (the ’reduced’ part is probably not strictly necessary, but it doesn’t take much extra effort);
Express each basic variable in terms of any free variables appearing in an equation;
Write a typical solution x as a vector whose entries depend on the free variables (if applicable);
Decompose the solution into a linear combination of vectors using the free variables as parameters.
A set of vectors {v1 , ..., vp } in Rn is said to be linearly independent if the vector equation

x1 v1 + x2 v2 + · · · + xp vp = 0

has only the trivial solution. The set {v1 , ..., vp } is said to be linearly dependent if there exist weights c1 , ..., cp , not all zero, such that

c1 v1 + c2 v2 + · · · + cp vp = 0

Similarly:

The columns of A are linearly independent if and only if the equation Ax = 0 has only the trivial solution.

An indexed set S = {v1 , ..., vp } of two or more vectors is linearly dependent if and only if at least one of the vectors in S is a linear combination of the others. In fact, if S is linearly dependent and v1 ≠ 0, then some vj (with j > 1) is a linear combination of the preceding vectors, v1 , ..., vj−1 .
Theorem 8

If a set contains more vectors than there are entries in each vector, then the set is linearly dependent. That is, any set {v1 , ..., vp } in Rn is linearly dependent if p > n.

Theorem 9

If a set S = {v1 , ..., vp } in Rn contains the zero vector, then the set is linearly dependent.
Proof

Theorem 8 can be proven as follows: let A = [ v1 · · · vp ]. Then A is n × p and the equation Ax = 0 corresponds to a system of n equations in p unknowns. If p > n, then there are more unknowns than equations, so there must be a free variable. Hence Ax = 0 has a nontrivial solution and the columns of A are linearly dependent.

Theorem 9 can be proven by renumbering the vectors such that v1 = 0. Then a solution is 1v1 + 0v2 + · · · + 0vp = 0, so S is linearly dependent, because there is a nontrivial solution.
So, to check whether a given set is linearly dependent, check whether one of the vectors is a linear combination of the others5 , or solve the equation Ax = 0. To check which values of a certain entry would yield a linearly dependent set, solve the equation Ax = 0. If you find that the entry should be a free variable, the set is dependent for any value you plug in (so it’s dependent for all h, for example).
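That check is easy to mechanize: put the vectors as the columns of a matrix and count pivots; the set is linearly dependent exactly when some column fails to get a pivot. A sketch (helper names are mine, using exact fractions):

```python
from fractions import Fraction

def pivot_count(vectors):
    """Row reduce the matrix whose columns are the given vectors; count pivots."""
    n, p = len(vectors[0]), len(vectors)
    M = [[Fraction(vectors[j][i]) for j in range(p)] for i in range(n)]
    pivots = 0
    for col in range(p):
        pick = next((r for r in range(pivots, n) if M[r][col] != 0), None)
        if pick is None:
            continue                      # no pivot in this column
        M[pivots], M[pick] = M[pick], M[pivots]
        for r in range(pivots + 1, n):    # clear the entries below the pivot
            c = M[r][col] / M[pivots][col]
            M[r] = [a - c * b for a, b in zip(M[r], M[pivots])]
        pivots += 1
    return pivots

def is_dependent(vectors):
    """Dependent iff not every column is a pivot column."""
    return pivot_count(vectors) < len(vectors)

# v3 = 2*v1 + v2, so the set is linearly dependent
assert is_dependent([(1, 0, 1), (0, 1, 1), (2, 1, 3)])
# the standard basis of R^3 is independent
assert not is_dependent([(1, 0, 0), (0, 1, 0), (0, 0, 1)])
# Theorem 8: four vectors in R^3 are always dependent
assert is_dependent([(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)])
```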
Now, what this actually does is convert a point (in this case, (1,2,3,4)) in a 4D graph (you can think of the fourth dimension as being visualized with a colour, for example) to a point in a 2D graph. It’s nice that we now have this conversion for one point, but wouldn’t you be dying to know it for every single 4D point? That’s why we introduce linear transformations: similarly to how a regular function f(x) converts an input value x to something else, a linear transformation converts a vector in Rn to a vector in Rm . To be more precise:
A transformation (or function or mapping) T from Rn to Rm is a rule that assigns to each vector x in Rn a vector T (x) in Rm . The set Rn is called the domain of T , and Rm the codomain of T . The notation T : Rn → Rm indicates that the domain is Rn and the codomain Rm . For x in Rn , the vector T (x) in Rm is called the image of x (under the action of T ). The set of all images T (x) is called the range of T .
So, what would the linear transformation of the example above be? Simple:

[ 4 −3 1 4 ] [ x1 ]   [ 4 · x1 − 3 · x2 + 1 · x3 + 4 · x4 ]
[ 2  0 5 1 ] [ x2 ] = [ 2 · x1 + 0 · x2 + 5 · x3 + 1 · x4 ]        (1.17)
             [ x3 ]
             [ x4 ]

So, say we have the vector

u = [ 3 ]
    [ 2 ]
    [ 5 ]
    [ 1 ]

then T (u) would be:

T (u) = [ 4 · 3 − 3 · 2 + 1 · 5 + 4 · 1 ]   [ 15 ]
        [ 2 · 3 + 0 · 2 + 5 · 5 + 1 · 1 ] = [ 32 ]
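The computation above in code, as a quick sanity check (the function T below hard-codes the matrix from equation (1.17)):

```python
def T(x):
    """The linear transformation x -> Ax for the 2x4 matrix of equation (1.17)."""
    A = [[4, -3, 1, 4],
         [2,  0, 5, 1]]
    return [sum(A[i][j] * x[j] for j in range(4)) for i in range(2)]

u = [3, 2, 5, 1]
assert T(u) == [15, 32]   # the point (3,2,5,1) maps to (15,32)
```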
So, we converted a point in one graph with coordinates (3,2,5,1) to (15,32). Note that two dimensions are gone. Now, to determine whether a vector b in Rm is in the range of T (or, whether there is an x whose image under T is b), all we need to do is solve the equation (suppose b = (21, −18)T ):

[ 4 · x1 − 3 · x2 + 1 · x3 + 4 · x4 ]   [  21 ]
[ 2 · x1 + 0 · x2 + 5 · x3 + 1 · x4 ] = [ −18 ]
This leads to a solution set with two free variables, thus there are infinitely many solutions. However, it can of course also happen (had the coefficient matrix had more rows) that there would be just one solution, or even none at all, if b is not the image of any x under T .
5 Note that you’d need to check every vector to be sure. So, it takes a while to prove that a set is linearly independent this way, but if you quickly see that one vector is a linear combination of the others, then it’s quicker than solving Ax = 0.
Definition

A transformation T is linear if

T (u + v) = T (u) + T (v)

and

T (cu + dv) = cT (u) + dT (v)        (1.18)

for all vectors u, v in the domain of T and all scalars c, d. That these properties hold for any matrix transformation should be easy to see. The following theorem applies:
Theorem 10

Let T : Rn → Rm be a linear transformation. Then there exists a unique matrix A such that

T (x) = Ax for all x in Rn

In fact, A is the m × n matrix whose jth column is the vector T (ej ), where ej is the jth column of the identity matrix in Rn :

A = [ T (e1 ) · · · T (en ) ]

Proof

Write x = In x = [ e1 · · · en ] x = x1 e1 + · · · + xn en , and use the linearity of T to compute

T (x) = T (x1 e1 + · · · + xn en ) = x1 T (e1 ) + · · · + xn T (en ) = [ T (e1 ) · · · T (en ) ] x = Ax

The matrix A that results is called the standard matrix for the linear transformation T .
What the above theorem actually means is the following: suppose we are given that e1 (the point (1,0,0)) is transformed into (0,4), e2 into (2,3), and e3 into (−3,1)6 . Then

A = [ 0 2 −3 ]
    [ 4 3  1 ]

Wonderfully simple, isn’t it? So, if they for example say that something is first turned π/4 radians and then reflected in the x2 -axis, it is key to draw a sketch of this. We may pick any vectors to establish our standard matrix, but realizing that we just read something useful, it’s much more convenient to start with e1 and e2 . What would happen with them if we apply said transformation? e1 first becomes the point ( √2/2, √2/2 ) and e2 becomes ( −√2/2, √2/2 ). Then, considering we reflect in the x2 -axis, the x1 coordinates switch signs (as they come to lie on the opposite side of where they were beforehand). So, e1 ends up at ( −√2/2, √2/2 ) and e2 at ( √2/2, √2/2 ). So, the standard matrix equals:

A = [ −√2/2  √2/2 ]
    [  √2/2  √2/2 ]
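You can verify this standard matrix numerically: apply the “rotate by π/4, then reflect in the x2 -axis” transformation to e1 and e2 and compare with the columns of A. A small Python sketch (the function name transform is mine):

```python
import math

def transform(x):
    """Rotate by pi/4 counterclockwise, then reflect in the x2-axis."""
    c, s = math.cos(math.pi / 4), math.sin(math.pi / 4)
    x1, x2 = c * x[0] - s * x[1], s * x[0] + c * x[1]   # rotation
    return (-x1, x2)                                    # reflection flips x1

# Theorem 10: the columns of the standard matrix are the images of e1 and e2
col1 = transform((1, 0))
col2 = transform((0, 1))
h = math.sqrt(2) / 2
A = [[-h, h],
     [ h, h]]
assert abs(col1[0] - A[0][0]) < 1e-12 and abs(col1[1] - A[1][0]) < 1e-12
assert abs(col2[0] - A[0][1]) < 1e-12 and abs(col2[1] - A[1][1]) < 1e-12
```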
Definition

A mapping T : Rn → Rm is said to be one-to-one if each b in Rm is the image of at most one x in Rn . That is, for each b, the augmented matrix may not contain a free variable (the system may have no solution, but it may never have more than one solution).

6 So T (e1 ) = (0, 4)T , T (e2 ) = (2, 3)T , T (e3 ) = (−3, 1)T .
Thus, a transformation can be both onto and one-to-one, it can be either of them, or it can be neither.
Theorem 11
Let T : Rn → Rm be a linear transformation. Then T is one-to-one if and only if the equation T (x) = 0 has only the trivial solution.
Proof

i. Remember from section 1.4: the columns of A span Rm if and only if for each b in Rm the equation Ax = b is consistent, that is, if and only if for every b the equation T (x) = b has a solution. This is true if and only if T maps Rn onto Rm .

ii. The equations T (x) = b and Ax = b are the same, except for notation. Remember that Ax = 0 only has the trivial solution if the columns of A are linearly independent. Using the previous theorem, that means that T is one-to-one if and only if the columns of A are linearly independent.
a. A + B = B + A;
b. (A + B) + C = A + (B + C);
c. A + 0 = A;
d. r (A + B) = rA + rB;
e. (r + s) A = rA + sA;
f. r (sA) = (rs) A.
CHAPTER 2. MATRIX ALGEBRA
This can be done for all matrix multiplications. The more general form is:

Definition

If A is an m × n matrix, and if B is an n × p matrix with columns b1 , ..., bp , then AB is the m × p matrix whose columns are Ab1 , ..., Abp . That is,

AB = A [ b1 b2 · · · bp ] = [ Ab1 Ab2 · · · Abp ]

If the product AB is defined, then the entry in row i and column j of AB is the sum of the products of corresponding entries from row i of A and column j of B. If (AB)ij denotes the (i, j)-entry in AB, and if A is an m × n matrix, then

(AB)ij = ai1 b1j + ai2 b2j + · · · + ain bnj
Note that if you want to calculate a certain row of the matrix AB, you only have to calculate the product of that row of A with the matrix B.
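The column-by-column definition of AB is easy to mirror in code. A sketch (the helpers are my own):

```python
def mat_vec(A, x):
    """The matrix-vector product Ax."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def mat_mul(A, B):
    """AB column by column: column j of AB is A times (column j of B)."""
    p = len(B[0])
    cols = [mat_vec(A, [B[i][j] for i in range(len(B))]) for j in range(p)]
    # reassemble the columns Ab_1, ..., Ab_p into a matrix
    return [[cols[j][i] for j in range(p)] for i in range(len(A))]

A = [[2, 3],
     [1, -5]]
B = [[4, 3, 6],
     [1, -2, 3]]
assert mat_mul(A, B) == [[11, 0, 21],
                         [-1, 13, -9]]
```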
Theorem 2
Let A be an m × n matrix, and let B and C have sizes for which the indicated sums and products are defined.
A (BC) = (AB) C: associative law of multiplication;
If a product AB is the zero matrix, you cannot conclude in general that either A = 0 or
B = 0.
In general, the transpose of a product of matrices equals the product of their transposes in the reverse
order.
An n × n matrix A is said to be invertible if there is an n × n matrix C such that

CA = I
AC = I

where I = In is the n × n identity matrix. In this case C is the inverse of A. In fact, there is only one C for each matrix A, because if B were another inverse of A, then B = BI = B (AC) = (BA) C = IC = C. This unique inverse is denoted by A−1 , so

A−1 A = I        (2.1)
AA−1 = I        (2.2)

A matrix that is not invertible is sometimes called a singular matrix, and an invertible matrix is called a nonsingular matrix.
Theorem 4

Let A = [ a b ]
        [ c d ]

If ad − bc ≠ 0, then A is invertible and

A−1 = 1/(ad − bc) · [  d −b ]
                    [ −c  a ]

The quantity ad − bc is called the determinant of A:

det A = ad − bc
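Theorem 4 in code, with a sanity check that the formula really produces an inverse (my own sketch, using exact fractions):

```python
from fractions import Fraction

def inverse_2x2(A):
    """A^-1 = (1 / (ad - bc)) * [[d, -b], [-c, a]] when ad - bc != 0."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("det A = 0, so A is singular (not invertible)")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

A = [[3, 4],
     [5, 6]]
Ainv = inverse_2x2(A)

# check that A * A^-1 = I
prod = [[sum(A[i][k] * Ainv[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)]
assert prod == [[1, 0], [0, 1]]
```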
Theorem 5

If A is an invertible n × n matrix, then for each b in Rn , the equation Ax = b has the unique solution x = A−1 b.

Proof

This proof is rather easy:

Ax = b
A−1 Ax = A−1 b
Ix = A−1 b
x = A−1 b
Three useful facts about invertible matrices are:

Theorem 6

a. If A is an invertible matrix, then A−1 is invertible and (A−1 )−1 = A.

b. If A and B are n × n invertible matrices, then so is AB, and the inverse of AB is the product of the inverses of A and B in the reverse order. That is,

(AB)−1 = B −1 A−1

c. If A is invertible, then so is AT , and (AT )−1 = (A−1 )T .
Proof

(a) can be proven by noticing that the equations A−1 C = I and CA−1 = I have the solution C = A, so that (A−1 )−1 = A.

(b) can be proven by writing out

(AB) (B −1 A−1 ) = A (BB −1 ) A−1 = AIA−1 = AA−1 = I

Note that if we had tried (AB)−1 = A−1 B −1 , we could not write BB −1 = I, because they wouldn’t be next to each other, and you can’t simply interchange matrices when multiplying.

(c) can be proven by using (A−1 )T AT = (AA−1 )T = I T = I. (b) can be generalized to:

The product of n × n invertible matrices is invertible, and the inverse is the product of their inverses in the reverse order.
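A quick numerical check of Theorem 6(b), including the warning that the order matters (the helpers are my own, for 2 × 2 matrices only):

```python
from fractions import Fraction

def mul(A, B):
    """The product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv(A):
    """The 2x2 inverse formula of Theorem 4."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

A = [[1, 2], [3, 5]]
B = [[2, 0], [1, 1]]
# Theorem 6(b): the inverse of AB is B^-1 A^-1, in the reverse order ...
assert inv(mul(A, B)) == mul(inv(B), inv(A))
# ... and the other order generally fails
assert inv(mul(A, B)) != mul(inv(A), inv(B))
```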
An elementary matrix is one that is obtained by performing just a single elementary row operation
on an identity matrix.
If an elementary row operation is performed on an m × n matrix A, the resulting matrix can be written as EA, where the m × m matrix E is created by performing the same row operation on Im .
Each elementary matrix E is invertible. The inverse of E is the elementary matrix of the
same type that transforms E back into I.
The second one is quite logical: since row operations are reversible, if there’s been only one row operation on I, then there must be another operation of the same type that changes E back into I, thus F E = I and EF = I. The three row operations are: interchanging two rows, multiplying a row by a nonzero constant, and adding (a multiple of) a row to another row.
Final theorem on this for now:
Theorem 7

An n × n matrix A is invertible if and only if A is row equivalent to In , and in this case, any sequence of elementary row operations that reduces A to In also transforms In into A−1 .
We can use this to make an algorithm for finding A−1 . If we place A and I side by side to form an augmented matrix [ A I ] and then row reduce it, then, if A is row equivalent to I (that is, if you can get a nice square identity matrix in the left half of your reduced matrix), [ A I ] is row equivalent to [ I A−1 ]. If you weren’t able to get the identity matrix on the left, then A does not have an inverse.
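This [ A I ] algorithm is also easy to code up. The sketch below row reduces [ A I ] with exact fractions and returns the right half (the helper name invert is mine); the example matrix is one you can also check by hand:

```python
from fractions import Fraction

def invert(A):
    """Row reduce [A | I]; if the left half becomes I, the right half is A^-1."""
    n = len(A)
    M = [[Fraction(A[i][j]) for j in range(n)] +
         [Fraction(1) if i == j else Fraction(0) for j in range(n)]
         for i in range(n)]
    for col in range(n):
        # find a usable pivot at or below the diagonal
        pick = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pick is None:
            raise ValueError("A is not row equivalent to I, so A has no inverse")
        M[col], M[pick] = M[pick], M[col]
        piv = M[col][col]
        M[col] = [x / piv for x in M[col]]      # scale the pivot to 1
        for r in range(n):                      # clear the rest of the column
            if r != col and M[r][col] != 0:
                c = M[r][col]
                M[r] = [a - c * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

A = [[0, 1, 2],
     [1, 0, 3],
     [4, -3, 8]]
Ainv = invert(A)

# sanity check: A * A^-1 = I
prod = [[sum(A[i][k] * Ainv[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)]
assert prod == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```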
Theorem 8 (The Invertible Matrix Theorem)

Let A be a square n × n matrix. Then the following statements are equivalent:

a. A is an invertible matrix.
b. A is row equivalent to the n × n identity matrix.
c. A has n pivot positions.
d. The equation Ax = 0 has only the trivial solution.
e. The columns of A form a linearly independent set.
f. The linear transformation x → Ax is one-to-one.
g. The equation Ax = b has at least one solution for each b in Rn .
h. The columns of A span Rn .
i. The linear transformation x → Ax maps Rn onto Rn .
j. There is an n × n matrix C such that CA = I.
k. There is an n × n matrix D such that AD = I.
l. AT is an invertible matrix.

Apply these statements to establish whether a given matrix is invertible. The easiest way is to row reduce it and see whether it indeed has pivot positions in each row.
Proof

If (a) is true, then so is (j), because A−1 can be plugged in for C. If (j) is true, then so is (d), because the solution to Ax = b is x = A−1 b, and if b = 0, then x = 0, so the only solution is the trivial solution. Since there is only this solution, there may not be a free variable, thus A has n pivot columns, thus (c) is true as well. Since A also has n rows, the pivot positions must lie on the main diagonal, and the matrix may be row reduced to the identity matrix, thus (b) is true as well. If (a) is true, then so is (k), because A−1 can be plugged in for D. If (k) is true, then so is (g), because we can substitute x with A−1 b to get A (A−1 b) = b, so each b has at least one solution. If for each b in Rn there is a solution, there must be a pivot position in each row, thus n pivot positions in total. Thus, if (g) is true, then so is (c) and so is (a). (g), (h) and (i) are equivalent for each matrix, so if (g) is true, then so are (h) and (i). (e) follows from (d), and from (e) follows (f). Finally, (l) follows from (a) and vice versa, using theorem 6(c) in section 2.2.
A linear transformation T : Rn → Rn is said to be invertible if there exists a function S : Rn → Rn such that

S (T (x)) = x for all x in Rn        (2.3)
T (S (x)) = x for all x in Rn        (2.4)
Theorem 9
Let T : Rn → Rn be a linear transformation and let A be the standard matrix for T . Then T is
invertible if and only if A is an invertible matrix. In that case, the linear transformation S given
by S (x) = A−1 x is the unique function satisfying equations (2.3) and (2.4).
2.4 Subspaces of Rn
Definition
A subspace of Rn is any set H in Rn that has three properties:
a. The zero vector is in H.
b. For each u and v in H, the sum u + v is in H.
c. For each u in H and each scalar c, the vector cu is in H.

It’s almost the same as a span, but with the added requirements (b) and (c). However, pretty much all of the vector sets you’ve seen thus far are subspaces.
Two subspaces are of special interest: the subspace which contains all linear combinations of the
columns of A, and the subspace which contains all solutions of the homogeneous equation Ax = 0. We
have special names for these two:
Definition
The column space of a matrix A is the set Col A of all linear combinations of the columns of A.
So, for example, to check whether b = (3, 3, −4) is in the column space of A = [ 1 −3 −4 ; −4 6 −2 ; −3 7 6 ], we row reduce the augmented matrix:

[ 1 −3 −4 | 3 ; −4 6 −2 | 3 ; −3 7 6 | −4 ] ∼ [ 1 −3 −4 | 3 ; 0 −6 −18 | 15 ; 0 −2 −6 | 5 ] ∼ [ 1 −3 −4 | 3 ; 0 2 6 | −5 ; 0 0 0 | 0 ]
Thus Ax = b is consistent and b is in Col A. Please note that if A is an m × n matrix, then Col A is a subspace of Rm (m being the number of rows). On the other hand, we have the null space:
Definition
The null space of a matrix A is the set Nul A of all solutions of the homogeneous equation
Ax = 0.
It will be clearer later, but for now, just trust me when I say:
Theorem 12
The null space of an m × n matrix A is a subspace of Rn . Equivalently, the set of all solutions of
a system Ax = 0 of m homogeneous linear equations in n unknowns is a subspace of Rn .
This makes sense, because a solution of a system that's 2 × 5 would contain 5 variables, x1 , ..., x5 . One
more definition:
Definition
A basis for a subspace H of Rn is a linearly independent set in H that spans H.
This is quite a logical term: we can have subspaces that have many ’useless’ vectors, that is, vectors
that are merely a combination of the other vectors in it. By using a basis, we are only left with the
fundamental vectors in the subspace.
Now, if we for example want to find a basis for the null space of the matrix

A = [ −3 6 −1 1 −7 ; 1 −2 2 3 −1 ; 2 −4 5 8 −4 ]

we row reduce [A | 0] to [ 1 −2 0 −1 3 | 0 ; 0 0 1 2 −2 | 0 ; 0 0 0 0 0 | 0 ], so x1 = 2x2 + x4 − 3x5 and x3 = −2x4 + 2x5 , with x2 , x4 and x5 free. The general solution is therefore

x = x2 u + x4 v + x5 w, with u = (2, 1, 0, 0, 0), v = (1, 0, −2, 1, 0), w = (−3, 0, 2, 0, 1)

So {u, v, w} generates Nul A. Note again that since we have five columns in A, the null space is also in R5 .
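Numerically, an orthonormal basis for Nul A can be read off from the SVD: the right singular vectors belonging to (numerically) zero singular values span the null space. A hedged sketch using numpy (this gives a different basis than u, v, w above, but it spans the same subspace):

```python
import numpy as np

A = np.array([[-3.0, 6.0, -1.0, 1.0, -7.0],
              [1.0, -2.0, 2.0, 3.0, -1.0],
              [2.0, -4.0, 5.0, 8.0, -4.0]])

# Full SVD: the rows of Vt without a significant singular value span Nul A.
U, s, Vt = np.linalg.svd(A)
tol = 1e-10
rank = int(np.sum(s > tol))
null_basis = Vt[rank:].T          # columns form an orthonormal basis of Nul A
print(null_basis.shape)           # (5, 3): three basis vectors in R^5

# Sanity check: A maps every basis vector to (numerically) zero.
print(np.allclose(A @ null_basis, 0))
```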
Now, what would be the basis for the column space of A? Denote the columns of the row reduced matrix B by b1 , ..., b5 . Note that b2 = −2b1 , b4 = −b1 + 2b3 , and b5 = 3b1 − 2b3 . So b2 , b4 and b5 are all merely linear combinations of b1 and b3 . Remember that the equations Ax = 0 and Bx = 0 have the same solution set, so they must have the exact same linear dependence relationships. Ergo, a2 , a4 and a5 are also linear combinations of a1 and a3 . Therefore, the basis of the column space of A is formed by a1 and a3 , so:

Basis for Col A = { (−3, 1, 2) , (−1, 2, 5) }
Theorem 13
The pivot columns of a matrix A form a basis for the column space of A.
Remember to always use the columns of the original matrix, not the columns of the reduced matrix.
Now, please refer to chapter 4.1.
Definition
Suppose that the set B = {b1 , ..., bp } is a basis for subspace H. For each x in H, the coordinates
of x relative to basis B are the weights c1 , ..., cp such that x = c1 b1 + · · · + cp bp , and the vector
in Rp
[x]B = (c1 , ..., cp )

is called the coordinate vector of x (with respect to B). Such coordinate vectors show, for example, that a plane H 'looks' like R2 . Indeed, we call this the dimension of H. In a linearly independent set, thus a basis, the number of vectors in H is equal to the dimension. So:
Definition
The dimension of a nonzero subspace H, denoted by dim H, is the number of vectors in any
basis for H. The dimension of the zero subspace {0} is defined to be zero.
The definition of rank and theorems 14 and 16 of this chapter are treated in chapter 4.1.
Theorem 15: The Basis Theorem
Let H be a p-dimensional subspace of Rn . Any linearly independent set of exactly p elements in
H is automatically a basis for H. Also, any set of p elements of H that spans H is automatically
a basis for H.
What we do then is just pick any row or column we like, say the first row. Then the determinant of A can be written as:

det A = a11 det A11 − a12 det A12 + a13 det A13 − · · ·

Note two things: the plus and minus signs alternate, and the A11 , A12 , etc. indicate the submatrices obtained by leaving out the first row and first column, the first row and second column, and so on. So, for

A = [ 0 4 −1 ; 1 0 7 ; 4 −2 0 ]

we get A11 = [ 0 7 ; −2 0 ], A12 = [ 1 7 ; 4 0 ] and A13 = [ 1 0 ; 4 −2 ]. Now det A11 = 0 · 0 − 7 · (−2) = 14, det A12 = 1 · 0 − 7 · 4 = −28 and det A13 = 1 · (−2) − 0 · 4 = −2. This process must be repeated for each submatrix, so for large matrices it really takes some while (for a 4 × 4 matrix, it already takes 12 of these 2 × 2 determinants).
This leads to a more general definition:
Definition
For n ≥ 2, the determinant of an n × n matrix A = [aij ] is the sum of n terms of the form ±a1j det A1j , with plus and minus signs alternating, where the entries a11 , a12 , ..., a1n are from the first row of A. In symbols:

det A = a11 det A11 − a12 det A12 + · · · + (−1)^(1+n) a1n det A1n = Σ_{j=1}^{n} (−1)^(1+j) a1j det A1j
For example, let A = [ 1 5 0 ; 2 4 −1 ; 0 −2 0 ], then

det A = 1 · det [ 4 −1 ; −2 0 ] − 5 · det [ 2 −1 ; 0 0 ] + 0 · det [ 2 4 ; 0 −2 ]
      = 1 (0 − 2) − 5 (0 − 0) + 0 (−4 − 0) = −2
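The definition above translates directly into a (very inefficient, but instructive) recursive routine. This sketch, written for illustration, expands along the first row exactly as in the formula:

```python
def det(M):
    """Determinant by cofactor expansion along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        # Submatrix M_1j: drop row 1 and column j.
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

A = [[1, 5, 0],
     [2, 4, -1],
     [0, -2, 0]]
print(det(A))  # -2, as computed above
```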
The signed determinants of these submatrices are called cofactors: the (i, j)-cofactor of A is the number Cij given by:

Cij = (−1)^(i+j) det Aij     (3.1)

The previous calculation used cofactor expansion across the first row. We can, however, pick any row or column to expand along:
Theorem 1
The determinant of an n × n matrix A can be computed by a cofactor expansion across any row or down any column. The expansion across the ith row using cofactors is:

det A = ai1 Ci1 + ai2 Ci2 + · · · + ain Cin
The plus and minus sign in the (i, j)-cofactor depends on the position of aij in the matrix, regardless of the sign of aij itself. The factor (−1)^(i+j) determines the following checkerboard pattern of signs:

[ + − + · · · ]
[ − + −       ]
[ + − +       ]
[ · · ·       ]
Choosing smartly along which column and rows we use cofactor expansion, we can significantly reduce
the computation time. For example, compute det A, where
A = [ 3 −7 8 9 −6 ; 0 2 −5 7 3 ; 0 0 1 5 0 ; 0 0 2 4 −1 ; 0 0 0 −2 0 ]
Note that e.g. C31 indicates the cofactor belonging to the submatrix created by leaving out the 3rd row and 1st column. If we now again cofactor expand along the first column, we get:
det A = 3 · 2 · det [ 1 5 0 ; 2 4 −1 ; 0 −2 0 ]     (3.2)
We saw before that the determinant of this submatrix equaled −2, and hence det A = 3 · 2 · (−2) = −12.
Theorem 2
If A is a triangular matrix, then det A is the product of the entries on the main diagonal of A.
Note that if A for example is a 4 × 4 matrix, then det 3A would be 3^4 det A, as we multiply each of the four rows by three.
If we combine this with theorem 2, we can for example easily compute det A, where A = [ 2 −8 6 8 ; 3 −9 5 10 ; −3 0 1 −2 ; 1 −4 0 6 ].
First, factoring out 2 in the top row and then adding multiples of this row to the other rows:
det A = 2 · det [ 1 −4 3 4 ; 3 −9 5 10 ; −3 0 1 −2 ; 1 −4 0 6 ] = 2 · det [ 1 −4 3 4 ; 0 3 −4 −2 ; 0 −12 10 10 ; 0 0 −3 2 ]
Leading to:

det A = 2 · det [ 1 −4 3 4 ; 0 3 −4 −2 ; 0 0 −6 2 ; 0 0 −3 2 ] = 2 · det [ 1 −4 3 4 ; 0 3 −4 −2 ; 0 0 −6 2 ; 0 0 0 1 ] = 2 · 1 · 3 · (−6) · 1 = −36
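As a quick check of the hand computation (not in the original notes), numpy agrees:

```python
import numpy as np

A = np.array([[2.0, -8.0, 6.0, 8.0],
              [3.0, -9.0, 5.0, 10.0],
              [-3.0, 0.0, 1.0, -2.0],
              [1.0, -4.0, 0.0, 6.0]])
print(round(np.linalg.det(A)))  # -36
```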
Note that a square matrix A is invertible if and only if A has n pivot positions. If, after row reduction, it becomes apparent that A does not have a pivot position in its final row, then A is not invertible, and its determinant is equal to zero. Ergo:
Theorem 4
A square matrix A is invertible if and only if det A ≠ 0.
Furthermore, det A = 0 when the columns of A are linearly dependent, and also when the rows of A are linearly dependent.¹
Theorem 5: Multiplicative Property
If A and B are n × n matrices, then det AB = (det A) (det B).
¹ The rows of A are the columns of AT , and linearly dependent columns of AT make AT singular (non-invertible). When AT is singular, so is A, by the invertible matrix theorem.
Now, suppose we have Ax = b, and try out the following, just for the heck of it:
A · Ii (x) = A [ e1 · · · x · · · en ] = [ Ae1 · · · Ax · · · Aen ] = [ a1 · · · b · · · an ] = Ai (b)
If you’re a bit confused with what this actually means, hang on just a second, we’re almost there. We
can use the multiplicative property of determinants to write:

det A · det Ii (x) = det (A · Ii (x)) = det Ai (b)

The determinant of an identity matrix in which we replaced one column with a column x would for example look like:
[ 1 0 3 0 ; 0 1 −2 0 ; 0 0 5 0 ; 0 0 2 1 ]

It should be obvious that by adding multiples of row 3 to the other rows, this can be rewritten as:

[ 1 0 0 0 ; 0 1 0 0 ; 0 0 5 0 ; 0 0 0 1 ]
Meaning that the determinant is simply 5. Thus, det Ii (x) is simply equal to the value of the ith entry
in x, or xi . This leads to the following theorem:
Theorem 7: Cramer’s Rule
Let A be an invertible n × n matrix. For any b in Rn , the unique solution x of Ax = b has entries
given by
xi = det Ai (b) / det A     (3.3)
Thus, we can calculate the ith entry in the vector x by dividing the determinant of the matrix we get when substituting b into the ith column of A by det A. So, for example, if we have the system:
3sx1 − 2x2 = 4
−6x1 + sx2 = 1
Calculate the value(s) for s for which the system has a unique solution (you don’t need Cramer’s rule for
this) and describe the solution (you need Cramer’s rule for this). First, the three applicable matrices are:
A = [ 3s −2 ; −6 s ] , A1 (b) = [ 4 −2 ; 1 s ] , A2 (b) = [ 3s 4 ; −6 1 ]
det A = 3s^2 − 12 = 3 (s + 2) (s − 2)

so the system has a unique solution whenever s ≠ ±2. In that case, det A1 (b) = 4s + 2 and det A2 (b) = 3s + 24, so by Cramer's rule x1 = (4s + 2) / (3s^2 − 12) and x2 = (3s + 24) / (3s^2 − 12).
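Cramer's rule is straightforward to code for a concrete value of s; the sketch below (with s = 1 chosen arbitrarily) builds each Ai (b) by substituting b into the ith column:

```python
import numpy as np

def cramer(A, b):
    """Solve Ax = b via Cramer's rule (A must be square and invertible)."""
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b          # substitute b into the i-th column
        x[i] = np.linalg.det(Ai) / d
    return x

s = 1.0                        # any s other than +/-2 gives a unique solution
A = np.array([[3 * s, -2.0], [-6.0, s]])
b = np.array([4.0, 1.0])
x = cramer(A, b)
print(np.allclose(A @ x, b))   # True
```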
We can also use Cramer's rule to find a formula for A−1 . The jth column of A−1 is a vector x that satisfies

Ax = ej

where the ith entry of x is the (i, j)-entry of A−1 . By Cramer's rule, it is:

(i, j)-entry of A−1 = xi = det Ai (ej ) / det A
If we do a cofactor expansion down column i of Ai (ej ), we basically get all zeroes multiplied with a subdeterminant, except for the jth row (ej consists of only zeroes, except for a single one in the jth entry; thus, expanding along this column, every term is 0 except the cofactor belonging to the jth row). So, we can write:

det Ai (ej ) = (−1)^(i+j) det Aji = Cji

Thus, the (i, j) entry in A−1 equals Cji / det A, leading to the general matrix:
A−1 = (1 / det A) · [ C11 C21 · · · Cn1 ; C12 C22 · · · Cn2 ; · · · ; C1n C2n · · · Cnn ]     (3.4)
The last part is usually called the adjugate (or classical adjoint) of A. The general theorem is:
Theorem 8: An Inverse Formula
Let A be an invertible n × n matrix. Then
A−1 = (1 / det A) adj A     (3.5)
Let's do an example just for clarification. Find the inverse of the matrix A = [ 2 1 3 ; 1 −1 1 ; 1 4 −2 ]. The determinant can be found to equal 14 (use your graphical calculator, for instance). The nine cofactors are:
C11 = + det [ −1 1 ; 4 −2 ] = −2 , C12 = − det [ 1 1 ; 1 −2 ] = 3 , C13 = + det [ 1 −1 ; 1 4 ] = 5
C21 = − det [ 1 3 ; 4 −2 ] = 14 , C22 = + det [ 2 3 ; 1 −2 ] = −7 , C23 = − det [ 2 1 ; 1 4 ] = −7
C31 = + det [ 1 3 ; −1 1 ] = 4 , C32 = − det [ 2 3 ; 1 1 ] = 1 , C33 = + det [ 2 1 ; 1 −1 ] = −3
We must take the transpose of this cofactor matrix so that everything ends up in the right place in the adjugate:

adj A = [ −2 14 4 ; 3 −7 1 ; 5 −7 −3 ]
And thus

A−1 = (1/14) [ −2 14 4 ; 3 −7 1 ; 5 −7 −3 ] = [ −1/7 1 2/7 ; 3/14 −1/2 1/14 ; 5/14 −1/2 −3/14 ]
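The cofactor/adjugate recipe can be verified numerically. This sketch recomputes adj A from the cofactors and checks A−1 = adj A / det A against numpy's built-in inverse:

```python
import numpy as np

A = np.array([[2.0, 1.0, 3.0],
              [1.0, -1.0, 1.0],
              [1.0, 4.0, -2.0]])
n = A.shape[0]

# Cofactor matrix: C[i, j] = (-1)^(i+j) * det(A with row i and column j removed).
C = np.empty((n, n))
for i in range(n):
    for j in range(n):
        minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
        C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)

adj = C.T                                    # adjugate = transpose of cofactors
A_inv = adj / np.linalg.det(A)
print(np.allclose(A_inv, np.linalg.inv(A)))  # True
```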
Note that it is often necessary to translate the parallelogram/parallelepiped such that one of the points is at the origin. Translation does not influence the area, however. So, for example, suppose we have the parallelogram with points (−2, −2), (0, 3), (4, −1) and (6, 4). If we 'add' (2, 2) to each point, we end up with the coordinates (0, 0), (2, 5), (6, 1) and (8, 6), see figure 3.1.
Proof
A parallelogram at the origin in R2 determined by vectors b1 and b2 has the form
S = {s1 b1 + s2 b2 : 0 ≤ s1 ≤ 1, 0 ≤ s2 ≤ 1}

The image of S under the map T (x) = Ax consists of points of the form

T (s1 b1 + s2 b2 ) = s1 T (b1 ) + s2 T (b2 ) = s1 Ab1 + s2 Ab2

It follows that T (S) is the parallelogram determined by the columns of the matrix [ Ab1 Ab2 ]. This matrix can be written as AB, where B = [ b1 b2 ]. Thus,

{area of T (S)} = |det AB| = |det A| · |det B| = |det A| · {area of S}     (3.8)
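Equation (3.8) is easy to check numerically for arbitrary matrices (a sketch, with randomly generated A and B):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2))            # the linear map T(x) = Ax
B = rng.standard_normal((2, 2))            # columns b1, b2 span S

area_S = abs(np.linalg.det(B))
area_TS = abs(np.linalg.det(A @ B))        # T(S) is spanned by the columns of AB
print(np.isclose(area_TS, abs(np.linalg.det(A)) * area_S))  # True
```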
4.1 Rank
If we have an m × n = 4 × 5 matrix:

A = [ −2 −5 8 0 −17 ; 1 3 −5 1 5 ; 3 11 −19 7 1 ; 1 7 −13 5 −3 ]
Each row has n entries, and the set of all linear combinations of the row vectors is called the row space. So, for this matrix, r1 = (−2, −5, 8, 0, −17), r2 = (1, 3, −5, 1, 5), r3 = (3, 11, −19, 7, 1) and r4 = (1, 7, −13, 5, −3), and Row A = Span {r1 , r2 , r3 , r4 }. The vectors can be written out either horizontally or vertically. Note that if we transpose A, the column space of AT will be equal to the row space of A.
Theorem 13
If two matrices A and B are row equivalent, then their row spaces are the same. If B is in echelon form, the nonzero rows of B form a basis for the row space of A as well as for that of B.
Proof
If B is obtained from A by row operations, the rows of B are linear combinations of the rows of A. It follows that any linear combination of the rows of B is automatically a linear combination of the rows of A. Thus the row space of B is contained in the row space of A. Since row operations are reversible, the row space of A is likewise contained in the row space of B. If B is in echelon form, the nonzero rows are linearly independent, because no nonzero row is a linear combination of the nonzero rows below it (just look at:
1 3 4 2
0 2 3 −1
0 0 1 2
It is obvious that the second row cannot be produced by the third row, and neither can the first row by the second and third rows). Thus, the nonzero rows of B form a basis of the row space of B, which is equal to the row space of A. Thus, to find the row space of a matrix, you row reduce it so that it is apparent whether there are any zero rows in it; the row space is then spanned by the nonzero rows. Now, please note that you should use the rows of the B matrix here, not the original rows of matrix A.
Definition
The rank of A is the dimension of the column space of A.
The dimension will be explained in more detail in chapter 2.9. However, you can compare it with this:
suppose you have two (linearly independent) vectors in R3 . Now, with these two vectors, you are able to
reach all points on a certain plane given by these two vectors. So, in a sense, there is something that's only 2D about this set of vectors, even though they're in R3 . The dimension refers to this: if you have two vectors in a basis (because you need the set to be linearly independent), then the dimension is 2, because you can only reach points in a set that 'looks' like R2 . So, if we have three vectors in our column space basis, even if they have thousands of rows each, the dimension is only 3.
Now, note that each pivot position in an m × n matrix led to a vector in the basis of the column space. Furthermore, each column that did not contain a pivot position led to a free variable, thus a vector in the null space. Since a column either is or is not a pivot column, this means that the total number of columns n is equal to the sum of the rank of A and the dimension of the null space of A. So, if I have a 7 × 9 matrix with 7 pivot positions, then I would have a rank of 7, and since n = 9, my null space would have a dimension of 2. Note that the rank of a matrix can never be larger than m, because there can be at most one pivot position per row.
The precise theorem is:
Theorem 14
The dimensions of the column space and the row space of an m × n matrix A are equal. This
common dimension, the rank of A, also equals the number of pivot positions in A and satisfies the
equation:
rank A + dim Nul A = n
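The rank theorem can be illustrated on the 4 × 5 matrix from the start of this section; numpy's matrix_rank gives the rank, and dim Nul A follows as n − rank:

```python
import numpy as np

A = np.array([[-2.0, -5.0, 8.0, 0.0, -17.0],
              [1.0, 3.0, -5.0, 1.0, 5.0],
              [3.0, 11.0, -19.0, 7.0, 1.0],
              [1.0, 7.0, -13.0, 5.0, -3.0]])

m, n = A.shape
rank = np.linalg.matrix_rank(A)
dim_nul = n - rank                 # rank A + dim Nul A = n
print(rank, dim_nul)               # 3 2
```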
Let A be an n × n matrix. Then the following statements are each equivalent to the statement
that A is an invertible matrix.
m. The columns of A form a basis of Rn .
n. Col A = Rn .
o. dim Col A = n.
p. Rank A = n.
q. Nul A = {0}.
r. dim Nul A = 0.
Proof
Statement (m) is logically equivalent to statements (e) and (h) regarding linear independence and spanning. From (g), it follows that (n) is true, and so is (o), and so are (p), (r), (q) and finally (d), so that the loop is complete.
This only happens for certain vectors, and this section is about these vectors. We call them eigenvectors, and the scalar multiple (so if, for example, Av = 2v, then the scalar multiple is 2) is called the eigenvalue. To be precise:
Definition
An eigenvector of an n × n matrix A is a nonzero vector x such that Ax = λx for some scalar λ.
A scalar λ is called an eigenvalue of A if there is a nontrivial solution x of Ax = λx; such an x
is called an eigenvector corresponding to λ.
It is easy to determine if a given vector/scalar is an eigenvector/eigenvalue of a matrix. Let's consider A = [ 1 6 ; 5 2 ], u = (6, −5), and v = (3, −2), and let's determine whether u and v are eigenvectors of A:

Au = (1 · 6 + 6 · (−5) , 5 · 6 + 2 · (−5)) = (−24, 20) = −4 (6, −5) = −4u
Av = (1 · 3 + 6 · (−2) , 5 · 3 + 2 · (−2)) = (−9, 11) ≠ λ (3, −2) for any λ
Thus, u is an eigenvector with eigenvalue −4, while v is not an eigenvector. Now, suppose we have to show that 7 is an eigenvalue of matrix A, and to find the corresponding eigenvectors. For 7 to be an eigenvalue, the equation

Ax = 7x     (5.1)

must have a nontrivial solution, which we can rewrite as:

Ax − 7x = 0
(A − 7I) x = 0
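Both checks are one-liners in numpy; this sketch verifies that u is an eigenvector of A and that 7 is an eigenvalue (i.e. A − 7I is singular):

```python
import numpy as np

A = np.array([[1.0, 6.0], [5.0, 2.0]])
u = np.array([6.0, -5.0])

print(np.allclose(A @ u, -4 * u))            # True: u is an eigenvector
# 7 is an eigenvalue exactly when det(A - 7I) = 0:
print(abs(np.linalg.det(A - 7 * np.eye(2))) < 1e-10)  # True
```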
Example
Let A = [ 4 −1 6 ; 2 1 6 ; 2 −1 8 ]. An eigenvalue of A is 2. Find a basis for the corresponding eigenspace.

A − 2I = [ 4 −1 6 ; 2 1 6 ; 2 −1 8 ] − [ 2 0 0 ; 0 2 0 ; 0 0 2 ] = [ 2 −1 6 ; 2 −1 6 ; 2 −1 6 ]

Row reducing the augmented matrix leads to:

[ 2 −1 6 | 0 ; 2 −1 6 | 0 ; 2 −1 6 | 0 ] ∼ [ 2 −1 6 | 0 ; 0 0 0 | 0 ; 0 0 0 | 0 ]

At this point, it is clear that 2 is indeed an eigenvalue, as there are two free variables and thus a nontrivial solution. The general solution is:

(x1 , x2 , x3 ) = x2 (1/2, 1, 0) + x3 (−3, 0, 1)
Proof
For simplicity, consider the 3 × 3 case. If A is upper triangular, then A − λI has the form:
A − λI = [ a11 a12 a13 ; 0 a22 a23 ; 0 0 a33 ] − [ λ 0 0 ; 0 λ 0 ; 0 0 λ ] = [ a11 − λ a12 a13 ; 0 a22 − λ a23 ; 0 0 a33 − λ ]
so det (A − λI) = (a11 − λ) (a22 − λ) (a33 − λ), which is zero precisely when λ equals one of the diagonal entries. For example, the eigenvalues of a triangular matrix with diagonal entries 3, 0 and 2 are simply 3, 0 and 2.
Now, what happens if the eigenvalue is 0? This is only the case if:

Ax = 0x = 0     (5.3)

This only has a nontrivial solution (necessary for 0 to be an eigenvalue) if the columns of A are linearly dependent (because then there's a free variable), so by the invertible matrix theorem, 0 is an eigenvalue of A if and only if A
is not invertible. Remember this, because we will add this to the invertible matrix theorem in the next
section.
Theorem 2
If v1 , ..., vr are eigenvectors that correspond to distinct eigenvalues λ1 , ..., λr of an n × n matrix A, then the set {v1 , ..., vr } is linearly independent.
Proof
Suppose the set is linearly dependent, and let vp+1 be a linear combination of the preceding (linearly independent) vectors. Then there exist scalars such that:

c1 v1 + · · · + cp vp = vp+1     (5.4)

Multiplying both sides by A and using Avi = λi vi gives:

c1 λ1 v1 + · · · + cp λp vp = λp+1 vp+1

If we multiply (5.4) with λp+1 on both sides, and then subtract this from the above result, we get:

c1 (λ1 − λp+1 ) v1 + · · · + cp (λp − λp+1 ) vp = 0

Since each eigenvalue is distinct, λi − λp+1 ≠ 0, and since {v1 , ..., vp } is linearly independent, this is only possible if all scalars c1 , ..., cp are zero. But then (5.4) says vp+1 = 0, which is impossible for an eigenvector; hence the set cannot be linearly dependent.
An equation of the form xk+1 = Axk is called a difference equation, because it describes the changes in a system as time passes. It is a recursive description of a sequence {xk } in Rn . A solution of it is an explicit description of {xk } whose formula does not depend directly on A or on preceding terms, except for the first term, x0 . The simplest way to do this is to just write:

xk = λ^k x0     (5.9)

where x0 is an eigenvector of A with eigenvalue λ. This is true because:

Axk = A (λ^k x0 ) = λ^k Ax0 = λ^k λ x0 = λ^(k+1) x0 = xk+1
c. det AT = det A.
d. If A is triangular, then det A is the product of the entries on the main diagonal of A.
e. A row replacement operation on A does not change the determinant. A row interchange changes the sign of the determinant. A row scaling also scales the determinant by the same scale factor.
For example, if we want to find the characteristic equation of A = [ 5 −2 6 −1 ; 0 3 −8 0 ; 0 0 5 4 ; 0 0 0 1 ], then:

det (A − λI) = det [ 5 − λ −2 6 −1 ; 0 3 − λ −8 0 ; 0 0 5 − λ 4 ; 0 0 0 1 − λ ] = (5 − λ) (3 − λ) (5 − λ) (1 − λ)
The expression det (A − λI) is called the characteristic polynomial of A, and det (A − λI) = 0 the characteristic equation. In fact, if A is an n × n matrix,
then det (A − λI) is a polynomial of degree n. The eigenvalue 5 in the foregoing example is said to have
multiplicity 2 because (λ − 5) occurs two times as a factor of the characteristic polynomial. In general,
the (algebraic) multiplicity of an eigenvalue λ is its multiplicity as a root of the characteristic equation.
Example
Let the characteristic polynomial of a 6 × 6 matrix be λ6 − 4λ5 − 12λ4 . The eigenvalues and their
multiplicities are then:
λ^6 − 4λ^5 − 12λ^4 = λ^4 (λ^2 − 4λ − 12) = λ^4 (λ − 6) (λ + 2)
Thus the eigenvalues are 0 (multiplicity 4), 6 and -2 (both multiplicity of 1).
5.2.1 Similarity
We’ll now discuss something that will only become of use in the following section, so don’t freak out when
you have no idea why I mention this. A is said to be similar to B if there is an invertible matrix P such
that P −1 AP = B, or A = P BP −1 . Let Q = P −1 , then Q−1 BQ = A, thus B is also similar to A, and we
simply say that A and B are similar. Changing A into P −1 AP is called a similarity transformation.
Similar matrices have a very nice property:
Theorem 4
If n × n matrices A and B are similar, then they have the same characteristic polynomial and
hence the same eigenvalues (with the same multiplicities).
Proof
If B = P −1 AP , then:

B − λI = P −1 AP − λP −1 P = P −1 (AP − λP ) = P −1 (A − λI) P
det (B − λI) = det [P −1 (A − λI) P ] = det (P −1 ) · det (A − λI) · det (P ) = det (A − λI)

since det (P −1 ) det (P ) = det (P −1 P ) = det I = 1.
Note that the converse does not hold: if two matrices have the same eigenvalues, they are not necessarily similar.
5.3 Diagonalization
Remember that we could write A = P BP −1 . This can be a very powerful property if B is a diagonal matrix (a matrix with nonzero entries only on its main diagonal). Let's just look at the following computation. Let D = [ 5 0 ; 0 3 ]. Then D^2 = [ 5 0 ; 0 3 ] [ 5 0 ; 0 3 ] = [ 5^2 0 ; 0 3^2 ]. If we take D^3 , we get [ 5^3 0 ; 0 3^3 ], and in general D^k = [ 5^k 0 ; 0 3^k ]. So, suppose you want to find A^100 where A is a shitty matrix: it significantly reduces computation time if you can find A = P DP −1 where D is a diagonal matrix, because D^100 can very easily be found and we thus only have to do two 'full' matrix multiplications, rather than 100.¹
Now, the following theorem provides some useful facts about how to construct A = P DP −1 .
Theorem 5: The Diagonalization Theorem
An n × n matrix A is diagonalizable if and only if A has n linearly independent eigenvectors. In fact, A = P DP −1 , with D a diagonal matrix, if and only if the columns of P are n linearly independent eigenvectors of A. In that case, the diagonal entries of D are the eigenvalues of A that correspond, respectively, to the eigenvectors in P .
¹ The exact derivation follows the pattern A^2 = P DP −1 P DP −1 = P DDP −1 = P D^2 P −1 . The same happens when you do it with A^3 , etc. ad infinitum.
Writing P = [ v1 · · · vn ] and D = [ λ1 0 · · · 0 ; 0 λ2 · · · 0 ; · · · ; 0 0 · · · λn ], we have AP = [ Av1 · · · Avn ] = [ λ1 v1 · · · λn vn ] = P D. Now we see that this works out exactly (because for an eigenvector, Avk = λk vk ), so our guesses were correct. Furthermore, for P to be invertible, its columns must be linearly independent, and hence P must consist of n linearly independent eigenvectors.
Example
Diagonalize the following matrix, if possible.
A = [ 1 3 3 ; −3 −5 −3 ; 3 3 1 ]
So, let's find the eigenvalues. For 3 × 3 matrices, determinants are a much larger pain in the ass, so the book will usually provide an easy way to avoid doing a shitload of computations (the textbook at least says so), so I'll just give you the characteristic equation:

0 = det (A − λI) = −λ^3 − 3λ^2 + 4 = − (λ − 1) (λ + 2)^2
So, λ1 = 1 and λ2 = −2. Now, solving to find the eigenvectors. Remember that then the equation
(A − λI) x = 0 must hold. So, we get for λ = 1:

A − I = [ 1 3 3 ; −3 −5 −3 ; 3 3 1 ] − [ 1 0 0 ; 0 1 0 ; 0 0 1 ] = [ 0 3 3 ; −3 −6 −3 ; 3 3 0 ]

[ 0 3 3 | 0 ; −3 −6 −3 | 0 ; 3 3 0 | 0 ] ∼ [ 1 0 −1 | 0 ; 0 1 1 | 0 ; 0 0 0 | 0 ]

So x = (x3 , −x3 , x3 ) = x3 (1, −1, 1), so v1 = (1, −1, 1). Similarly, we have for λ = −2:
A + 2I = [ 1 3 3 ; −3 −5 −3 ; 3 3 1 ] − [ −2 0 0 ; 0 −2 0 ; 0 0 −2 ] = [ 3 3 3 ; −3 −3 −3 ; 3 3 3 ]

[ 3 3 3 | 0 ; −3 −3 −3 | 0 ; 3 3 3 | 0 ] ∼ [ 1 1 1 | 0 ; 0 0 0 | 0 ; 0 0 0 | 0 ]

So x = (−x2 − x3 , x2 , x3 ) = x2 (−1, 1, 0) + x3 (−1, 0, 1), so the basis for λ = −2 is v2 = (−1, 1, 0) and v3 = (−1, 0, 1).
We can now simply construct P = [ v1 v2 v3 ] = [ 1 −1 −1 ; −1 1 0 ; 1 0 1 ] and D = [ 1 0 0 ; 0 −2 0 ; 0 0 −2 ].
Now, it may occur that an eigenvalue with a multiplicity does not give us as many linearly independent eigenvectors as its multiplicity (for example, in the foregoing example, if λ = −2 had only yielded one vector v2 instead of both v2 and v3 ). In that case, we have too few eigenvectors to construct P , and hence A is not diagonalizable. However, we do know:
Theorem 6
An n × n matrix with n distinct eigenvalues is diagonalizable.
In other words, A is diagonalizable if and only if there are enough eigenvectors to form a basis of Rn . We call such a basis an eigenvector basis for Rn .²
The proof is simple. Let v1 , ..., vn be eigenvectors corresponding to the n distinct eigenvalues of
a matrix A. Then {v1 , ..., vn } is linearly independent, by Theorem 2 in Section 5.1. Hence A is
diagonalizable, by theorem 5.
Note that it is not a requisite for A to have n distinct eigenvalues to be diagonalizable; just see the foregoing example.
Theorem 7
Let A be an n × n matrix whose distinct eigenvalues are λ1 , ..., λp .
a. For 1 ≤ k ≤ p, the dimension of the eigenspace of λk is less than or equal to the multiplicity
of the eigenvalue λk .
b. The matrix A is diagonalizable if and only if the sum of the dimensions of the eigenspaces equals n, and this happens if and only if (i ) the characteristic polynomial factors completely into linear factors and (ii ) the dimension of the eigenspace for each λk equals the multiplicity of λk .
All of these are pretty logical: an eigenvalue will at most yield as many linearly independent eigenvectors as its multiplicity, so its eigenspace dimension will be less than or equal to the multiplicity of the eigenvalue. The characteristic polynomial needs to factor completely, and each eigenvalue should have an eigenspace whose dimension equals its multiplicity.³ Finally, the 'sum' of all bases of the individual eigenspaces together makes up one big ass eigenvector basis that covers all of Rn .
Example
Let’s do one final example about this. Diagonalize A, if possible:
A = [ 5 0 0 0 ; 0 5 0 0 ; 1 4 −3 0 ; −1 −2 0 −3 ]
2 Remember that this was a fancy way of saying that the eigenvectors need to be able to reach every point in Rn .
³ The dimension was determined by how many (linearly independent) vectors a set contained. As we need the maximum number of vectors per eigenvalue, the dimension thus has to equal the multiplicity.
The eigenvalues turn out to be 5 and −3, both with multiplicity 2. Solving (A − 5I) x = 0 yields the basis vectors v1 = (−8, 4, 1, 0) and v2 = (−16, 4, 0, 1), and solving (A + 3I) x = 0 yields v3 = (0, 0, 1, 0) and v4 = (0, 0, 0, 1). These four vectors are linearly independent and thus we have:

P = [ −8 −16 0 0 ; 4 4 0 0 ; 1 0 1 0 ; 0 1 0 1 ] and D = [ 5 0 0 0 ; 0 5 0 0 ; 0 0 −3 0 ; 0 0 0 −3 ]
Now, what they are actually saying here: if we have A = P DP −1 , then if we use this as a linear
transformation, we can either go via A directly (Ax) or do P DP −1 x. Remembering that you will first
multiply x with P −1 , what you actually do is the following: you convert the vector via P −1 to a different
coordinate in a different basis4 . Now we multiply it with another matrix, D, to convert it to another
vector, but in the same basis as when we converted via P −1 . Now, to finish up, we multiply with P to
convert back to the vector that was also calculated with Ax. A visualization is given in figures 5.4 and
5.5. Note that in figure 5.4 for our case, both are in Rn .
What you need to remember, however, is that if we have A = P DP −1 , then D is simply the B-matrix for T . So, for example, we found before that for A = [ 7 2 ; −4 1 ] we have P = [ 1 1 ; −1 −2 ] and D = [ 5 0 ; 0 3 ]. So B = [ 5 0 ; 0 3 ].
⁴ If you don't remember what this meant: if we have a vector a = (2, 1), then we basically say it's 2x1 and 1x2 . But we can also define w1 = (3, −1) and w2 = (−1, 2), and then a = w1 + w2 , or [a]B = (1, 1) where the basis B is formed by {w1 , w2 }. The vector is still in the same dimension (if x is in Rn , then P −1 must be n × n, so it stays in Rn ).
which does not have a real solution. We can still solve it, however:

(λ − 0.8)^2 = −0.36
λ − 0.8 = ±√(−0.36) = ±0.6i
λ = 0.8 ± 0.6i
So, we have two complex eigenvalues, λ = 0.8 + 0.6i and λ = 0.8 − 0.6i. We can find the complex
eigenvectors as well, though that’s a little bit more difficult. First, let’s do it for λ = 0.8 + 0.6i. We then
get:
A − (0.8 + 0.6i) I = [ 0.5 −0.6 ; 0.75 1.1 ] − [ 0.8 + 0.6i 0 ; 0 0.8 + 0.6i ] = [ −0.3 − 0.6i −0.6 ; 0.75 0.3 − 0.6i ]     (5.10)
Now, row reducing isn't a fun thing to do here due to the complex numbers. We can, however, write out the system:

(−0.3 − 0.6i) x1 − 0.6 x2 = 0
0.75 x1 + (0.3 − 0.6i) x2 = 0

If we think logically, we can find the solution anyway. Since we used an eigenvalue, we know this system has a nontrivial solution. So, if we write x1 as a function of x2 based on the first equation and plug this into the second equation, it should still yield 0. If this were not the case, the system would only have the trivial solution, but we know that that is bullshit, as we used an eigenvalue and thus it has a nontrivial solution. So we can just use either of the two equations to determine our solution. Let's take the second equation: 0.75 x1 + (0.3 − 0.6i) x2 = 0, so x1 = (−0.4 + 0.8i) x2 . Choosing x2 = 5 gives the eigenvector v1 = (−2 + 4i, 5).
The points are shown in figure 5.6. It's visible that the values lie on an elliptical orbit.
We call the real stuff of a vector the real part and the complex stuff the imaginary part. The conjugate of a complex vector, denoted by x̄, has the same real part, but the sign of its imaginary part is flipped. So, for v1 above, we have:

Re v1 = (−2, 5) , Im v1 = (4, 0) , and v̄1 = (−2, 5) − i (4, 0) = (−2 − 4i, 5)
Figure 5.6: Iterations of the point x0 under the action of a matrix with a complex eigenvalue.
Now, you may have thought: is it a coincidence that v1 and v̄1 happen to be a conjugate pair? And that the eigenvalues were also a conjugate pair? It's not. Let A be an n × n matrix whose entries are completely real. Since there is no complex part in A, Ā = A, so taking conjugates of both sides of Ax = λx gives A x̄ = λ̄ x̄ as well. But this isn't that important; just remember that they occur in conjugate pairs.
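numpy handles the complex case directly; this quick check recovers the conjugate pair 0.8 ± 0.6i found above, each with modulus r = √(0.8² + 0.6²) = 1:

```python
import numpy as np

A = np.array([[0.5, -0.6], [0.75, 1.1]])
eigvals, eigvecs = np.linalg.eig(A)

# The eigenvalues come out as a conjugate pair 0.8 +/- 0.6i.
print(np.allclose(sorted(eigvals, key=lambda z: z.imag),
                  [0.8 - 0.6j, 0.8 + 0.6j]))   # True
# Each has modulus 1, so the orbit neither grows nor shrinks.
print(np.allclose(np.abs(eigvals), 1.0))       # True
```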
Now, as to why we have rotation. We can most easily deduce this by using similarity, finding a matrix C that will form a building block for A = P CP −1 . Now, let us just assume C = [ a −b ; b a ]; then the eigenvalues satisfy:

(a − λ)^2 + b^2 = 0
(a − λ)^2 = −b^2
a − λ = ±bi
λ = a ± bi
We can draw λ as depicted in figure 5.7. Using the fact that cos φ = a/r and sin φ = b/r, we get that:

Figure 5.7: λ

C = r [ a/r −b/r ; b/r a/r ] = r [ cos φ −sin φ ; sin φ cos φ ] = [ r 0 ; 0 r ] [ cos φ −sin φ ; sin φ cos φ ]

Note that the scalar r may be substituted by the matrix [ r 0 ; 0 r ], because that way you end up with the same matrix as before. Furthermore, r = |λ| = √(a^2 + b^2 ).
Now, note what actually happens here: we first rotate (because we first multiply with the rightmost matrix). How much it is actually rotated isn't very straightforward to explain, and you do not need to know it anyway, but it's obvious that if we keep multiplying with sines and cosines, we do something related to a rotation. Afterwards, we multiply by the modulus of the eigenvalue of C, so there it is scaled as well. This computation is shown in figure 5.8.
on page 56) will provide a lot of clarity. Remember that a differential equation can be of the form x′ (t) = ax (t) (so a very basic differential equation). Now, we can have a system of such equations as well, for example:

x1′ (t) = a11 x1 (t) + · · · + a1n xn (t)
· · ·
xn′ (t) = an1 x1 (t) + · · · + ann xn (t)

The reason why I wrote a11 etc. is so that the conversion to matrix form will be clearer. They merely indicate a scalar constant for each function, nothing more than that. We can write this set of differential equations as:
x′ (t) = Ax (t)     (5.12)

where

x (t) = (x1 (t), ..., xn (t)) , x′ (t) = (x1′ (t), ..., xn′ (t)) , and A = [ a11 · · · a1n ; · · · ; an1 · · · ann ]
A solution is a vector-valued function that satisfies (5.12) for all t in some interval of real numbers, such as t ≥ 0. Now, equation (5.12) is linear, because remember that for a linear transformation (if x = cu + dv):
In this case, Ax should equal x′ . So, if u and v are both valid solutions for x, then we need to prove that A (cu + dv) = (cu + dv)′ . That's true:

(cu + dv)′ = cu′ + dv′ = cAu + dAv = A (cu + dv)
Now, enough theory for now, let's do some calculations. Let us for example consider:

( x1′ (t) ; x2′ (t) ) = [ 3 0 ; 0 −5 ] ( x1 (t) ; x2 (t) )     (5.13)

Each equation here involves only its own variable, so we can solve the two equations separately, giving:

( x1 (t) ; x2 (t) ) = ( c1 e^(3t) ; c2 e^(−5t) ) = c1 e^(3t) (1, 0) + c2 e^(−5t) (0, 1)
Now, note that 3 and −5 also happened to be the eigenvalues of A. So, we have the suspicion that the general solution will be a linear combination of vectors of the form:

x (t) = v e^(λt)

(in the foregoing example, c1 and c2 would indicate the weights), where λ is some scalar (which will turn out to be the eigenvalue) and v a nonzero vector (if v = 0, then x (t) = 0, and thus x′ = 0 = Ax). Now, note that we can prove that we must take the eigenvalue in the exponent:
x′ (t) = λv e^(λt)
Ax (t) = Av e^(λt)
This is only correct if and only if λv = Av, thus only if λ is the eigenvalue, and v the corresponding
eigenvector. Such solutions are called eigenfunctions. Thus, here is the general approach for finding an
appropriate x (t): find the eigenvalues and eigenvectors. The general solution, or preferably called the
fundamental set of solutions is then equal to x (t) = c1 v1 eλt + · · · + cn vn eλt . The exact solution (so
values for c1 , ..., cn ) can be found by solving the initial value problem. Let’s do an example.
Example
Let x = [x1(t); x2(t)], A = [−1.5 0.5; 1 −1] and x(0) = [5; 4]. It can rather easily be found that the corresponding eigenvalues are (use the characteristic equation) λ1 = −0.5 and λ2 = −2, with corresponding eigenvectors v1 = [1; 2] and v2 = [−1; 1]. Then the fundamental set of solutions is:

x(t) = c1v1e^{λ1t} + c2v2e^{λ2t} = c1 [1; 2] e^{−0.5t} + c2 [−1; 1] e^{−2t}
That's not that difficult, is it? Finding the eigenvalues and eigenvectors is the hardest part, basically. Now, we know that for t = 0, it reduces to:

c1 [1; 2] + c2 [−1; 1] = [5; 4]

Solving the augmented matrix [1 −1 5; 2 1 4] (with your graphical calculator) leads to c1 = 3 and c2 = −2. So, the desired solution can be written as:

x(t) = 3 [1; 2] e^{−0.5t} − 2 [−1; 1] e^{−2t}

[x1(t); x2(t)] = [3e^{−0.5t} + 2e^{−2t}; 6e^{−0.5t} − 2e^{−2t}]

(note that the general form would have been [x1(t); x2(t)] = [c1e^{−0.5t} − c2e^{−2t}; 2c1e^{−0.5t} + c2e^{−2t}]). Now, we can plot the
graphs for several values of c1 and c2 if we want. On your graphical calculator (at least, this is how I do
it on my TI-84 Plus), go to MODE, and change the fourth setting from FUNC to PAR6 . Then, go to the
screen where you can input the graphs. It’ll now give X1T and Y1T as formulas. For X1T , plug in the
first row of the matrix (so the one corresponding to x1 ) (choose some scalars you want). For Y1T , plug in
the second row. If you want a second graph in the same window, just plug in the formulas into the X2T ,
Y2T , etc. Now, in the window settings, you now also have to pick a suitable range for T and a reasonable
time step. Please note that you cannot choose any range; if there are too many steps, it’ll take literally
ages before the graphing is complete. Furthermore, you need to pick your range smartly: there’s no point
in plotting 1000000-200000 for example. Finally, the graph will connect all points with straight lines, so
if your time steps are too large, it’ll cause sudden deflections.
So, you can play a bit with this, playing with your window settings and playing with the scalars.
Specifically, it’s interesting to see what happens if you choose c1 = 0, or c2 = 0, or in one and the same
plot c1 = 3 and c2 = 2. We then get the following graph, or trajectory, as seen in figure 5.9. Now, you may
have noticed during plotting that your graphical calculator drew the points chronologically: you really saw
5 Okay, I have actually no idea why the book points this out.
6 This indicates how it will graph equations. FUNC will plot like a regular function: you plug in a function for y and it will plot the function. If you enable PAR (from parametric equations), you can plug in two formulas (one for x, one for y) as a function of time T, and it'll plot the coordinates for a certain range of T, which is what we will be wanting.
samvanelsloo97@icloud.com
CHAPTER 5. EIGENVALUES AND EIGENVECTORS 52
one pixel added after pixel. It does in fact first plot T = 0.000, then T = 0.001, T = 0.002 etc., so you see
what direction the graph tends to. These directions are indicated with arrows in figure 5.9. We also see the
specific graphs formed by the scalars c1 = 0 (the graph that goes through v2 ), c2 = 0 (the graph that goes
through v1 ), c1 = 3 and c2 = 2 (the graph that goes through the initial value, x0 ). We also see that each
graph is directed towards the origin (though please note, except if of course c1 = c2 = 0, it never ends at
the origin). The reason for this may be obvious: as e is raised to an increasingly negative power, it'll keep on decreasing forever, thus ending up almost at the origin. The origin thus attracts all graphs and is called an attractor or sink. A matrix A with only negative eigenvalues will thus have an origin that acts as attractor.
Now, if the eigenvalues had been positive (and the exponents would thus have been positive), the
values of x1 and x2 would have kept on increasing forever, thus the origin seems to repel all the graphs,
and is therefore called a repeller or source. Now, what happens if we have one positive and one negative
eigenvalue?
Example
Suppose a particle is moving in a planar force field and its position vector x satisfies x′ = Ax and x(0) = x0. Furthermore, we have A = [4 −5; −2 1] and x0 = [2.9; 2.6]. Solve this initial value problem for t ≥ 0, and sketch the trajectory of the particle.
The eigenvalues can be found to equal λ1 = 6 and λ2 = −1, with the eigenvectors being v1 = [−5; 2] and v2 = [1; 1], and thus

x(t) = c1v1e^{λ1t} + c2v2e^{λ2t} = c1 [−5; 2] e^{6t} + c2 [1; 1] e^{−t}

Solving the augmented matrix [−5 1 2.9; 2 1 2.6] yields c1 = −3/70 and c2 = 188/70, and hence:

x(t) = −(3/70) [−5; 2] e^{6t} + (188/70) [1; 1] e^{−t}

Plot a few trajectories of this (the general form is [x1(t); x2(t)] = [−5c1e^{6t} + c2e^{−t}; 2c1e^{6t} + c2e^{−t}])⁷ and end up at something like depicted
in figure 5.10. We see that the lines initially head towards the origin, then suddenly experience a mid-life crisis and decide to move away from the origin. So, the origin acts half as sink and half as source, and is called a saddle point. Now, we also see that the line of v2 is never repelled; this is because there c1 = 0, and thus you basically only have the part of x(t) that has the negative eigenvalue (and thus is solely attracted). Similarly, v1 is never attracted; this is because there c2 = 0, and thus you basically only have the part of x(t) that has the positive eigenvalue (and thus is only repelled).
7 Take a few scalars: take the negative value of c1 and c2 , 0 for one of them, etc. to get a good overview.
y(t) = (c1e^{λ1t}, ..., cne^{λnt})ᵀ (this is what you get if you simply write everything in one matrix). We can use this property if we have a shitty matrix. Remember that diagonalization meant A = PDP⁻¹. The eigenvalues of A and D are the same, but the eigenvectors are not. Furthermore, P consisted of the eigenvectors of A. Now, compare this y(t) = c1e1e^{λ1t} + · · · + cnene^{λnt} (with e1, ..., en the standard basis vectors) with x(t) = c1v1e^{λ1t} + · · · + cnvne^{λnt}.
What oh what must we do to get from y(t) to x(t)? That's right, e1 must become v1, etc. How do we do this? We multiply e1 with a matrix whose first column is v1. e2 is multiplied by the same matrix, and its second column is v2. en is multiplied by the same matrix, whose nth column is vn. So, we must multiply y(t) with a matrix that consists of the eigenvectors of A, which, coincidentally, happens to be P. So, x(t) = Py(t), or equivalently:
y (t) = P −1 x (t) (5.18)
Note that in equation 5.16, the scalars are found by solving:

(c1, ..., cn)ᵀ = y(0) = P⁻¹x(0) = P⁻¹x0
The change of variable from x to y has decoupled the system of differential equations. It’s called
decoupling because if we compare the example on page 55 and on page 56 with each other, we see that for the diagonal matrix, x′1(t) only depended on x1(t) and x′2(t) only depended on x2(t). However, on page 56, both were dependent on each other, so they were coupled.
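The decoupling statement is just diagonalization in action; a quick NumPy check (my addition, using the same A as the earlier example) that P⁻¹AP really is diagonal:

```python
import numpy as np

A = np.array([[-1.5, 0.5],
              [1.0, -1.0]])
lam, P = np.linalg.eig(A)

# The change of variable y = P^{-1} x turns x' = A x into y' = D y,
# because P^{-1} A P is the diagonal matrix of eigenvalues.
D = np.linalg.inv(P) @ A @ P
print(np.round(D, 10))  # diagonal, with -0.5 and -2 on the diagonal
```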
let me recap what the real part and imaginary part of a complex number are. If we have p = a + bi, then Re p = a and Im p = b (but not Im p = bi!).
Now, we remember that a real matrix A can have a pair of complex eigenvalues λ and λ̄ (where λ = a + bi and λ̄ = a − bi). So, two solutions of x′ = Ax are x1(t) = ve^{λt} and x̄1(t) = v̄e^{λ̄t}, and from these we get the combinations

Re ve^{λt} = (1/2)[x1(t) + x̄1(t)]
Im ve^{λt} = (1/(2i))[x1(t) − x̄1(t)]

So, our goal will be to find the real and imaginary parts of ve^{λt}. Now, we will write v as v = Re v + i Im v. Furthermore, e^{λt} = e^{(a+bi)t} = e^{at}e^{ibt}, where e^{ibt} = cos bt + i sin bt. So, we may write:

ve^{λt} = (Re v + i Im v)e^{at}(cos bt + i sin bt)

We established before that Re ve^{λt} and Im ve^{λt} are solutions, and so, two real solutions of x′ = Ax are:

y1(t) = Re ve^{λt} = [(Re v) cos bt − (Im v) sin bt]e^{at}
y2(t) = Im ve^{λt} = [(Re v) sin bt + (Im v) cos bt]e^{at}
Example
Now, let's do an example to clarify a bit. Let A = [−2 −2.5; 10 −2] and x0 = [3; 3]. We find that one eigenvalue is λ = −2 + 5i, with corresponding eigenvector v1 = [i; 2]. So:

x1(t) = [i; 2] e^{(−2+5i)t}
x̄1(t) = [−i; 2] e^{(−2−5i)t}
Now, that's not too difficult, right? We thus have as general solution

x(t) = c1 [−sin 5t; 2 cos 5t] e^{−2t} + c2 [cos 5t; 2 sin 5t] e^{−2t}

With x(0) = [3; 3], we get c1 [0; 2] + c2 [1; 0] = [3; 3], so c1 = 1.5 and c2 = 3. Thus:

x(t) = 1.5 [−sin 5t; 2 cos 5t] e^{−2t} + 3 [cos 5t; 2 sin 5t] e^{−2t}

or

[x1(t); x2(t)] = [−1.5 sin 5t + 3 cos 5t; 3 cos 5t + 6 sin 5t] e^{−2t}
This can be plotted. If you try several scalars, you’ll end up with a figure similar to figure 5.11. We
see that the lines spiral towards the origin, and hence the origin is called a spiral point. The lines spiral
towards the origin due to the negative exponent of e; had it been positive, it would have been repelled by
the origin. The spiralling makes sense due to the sines and cosines.
However, I suppose this is not brand new information for you. The following properties apply:
Theorem 1
Let u, v and w be vectors in Rn , and let c be a scalar. Then
a. u · v = v · u
b. (u + v) · w = u · w + v · w
c. (cu) · v = c (u · v) = u · (cv)
d. u · u ≥ 0, and u · u = 0 if and only if u = 0
Definition
The length (or norm) of v is the nonnegative scalar ‖v‖ defined by

‖v‖ = √(v · v) = √(v1² + v2² + · · · + vn²)

which is Pythagoras, basically. A vector whose length is 1 is called a unit vector. If we divide a nonzero vector v by its length, we obtain a unit vector u; this process is sometimes called normalizing, and we say that u is in the same direction as v.
Definition
For u and v in Rn , the distance between u and v, written as dist (u,v), is the length of the
vector u − v. That is,
dist (u,v) = ku − vk
CHAPTER 6. ORTHOGONALITY 58
Geometrically, v is orthogonal to u exactly when the distances from the end points of v and of −v to the end point of u are equal (the line through u is basically a line-segment bisector (middelloodlijn in het Nederlands) then). So, that would mean the following:

[dist(u, v)]² = [dist(u, −v)]²
‖u − v‖² = ‖u − (−v)‖²
‖u − v‖² = ‖u + v‖²
u · u − 2u · v + v · v = u · u + 2u · v + v · v
‖u‖² + ‖v‖² − 2u · v = ‖u‖² + ‖v‖² + 2u · v

which can only hold if u · v = 0. That is therefore the definition: u and v are orthogonal if and only if u · v = 0.
Theorem 3
Let A be an m × n matrix. The orthogonal complement of the row space of A is the null space of
A, and the orthogonal complement of the column space of A is the null space of AT :
(Row A)⊥ = Nul A and (Col A)⊥ = Nul Aᵀ
The reasoning behind this is actually rather logical: suppose we have the matrix

A = [1 2 −2 4; 2 3 −1 −5; 1 4 2 −2]

The row space of this is spanned by the three vectors formed by its rows. The null space is calculated such that if you plug its vectors into the matrix, you end up with 0s everywhere. So, if you multiply any vector in the row space with any vector in the null space, you end up with 0. That means that they are orthogonal. Similarly, since Col A is simply Row Aᵀ, Col A is orthogonal to the null space of Aᵀ.
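You can see (Row A)⊥ = Nul A numerically: a null-space basis can be read off from the SVD, and every row of A then dots to zero with it. A sketch with the example matrix (NumPy is my addition here):

```python
import numpy as np

A = np.array([[1.0, 2.0, -2.0, 4.0],
              [2.0, 3.0, -1.0, -5.0],
              [1.0, 4.0, 2.0, -2.0]])

# Right singular vectors with (numerically) zero singular value span Nul A.
U, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-10))
N = Vt[rank:].T                 # columns form a basis of Nul A

print(N.shape)                  # one basis vector here, since rank A = 3
print(np.allclose(A @ N, 0))    # every row of A is orthogonal to Nul A
```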
For example, the set {u1, u2, u3} with u1 = (3, 1, 1), u2 = (−1, 2, 1) and u3 = (−1/2, −2, 7/2) is an orthogonal set, since each pair of distinct vectors is orthogonal:

u1 · u2 = 3 · −1 + 1 · 2 + 1 · 1 = 0
u1 · u3 = 3 · −1/2 + 1 · −2 + 1 · 7/2 = 0
u2 · u3 = −1 · −1/2 + 2 · −2 + 1 · 7/2 = 0
Theorem 4
If S = {u1, ..., up} is an orthogonal set of nonzero vectors in Rn, then S is linearly independent and hence is a basis for the subspace spanned by S.
Proof
Remember that a set is linearly independent if and only if the homogeneous equation only has the
trivial solution. If 0 = c1 u1 + · · · + cp up , then:
0 = 0 · u1 = (c1u1 + c2u2 + · · · + cpup) · u1
= (c1u1) · u1 + (c2u2) · u1 + · · · + (cpup) · u1
= c1(u1 · u1) + c2(u2 · u1) + · · · + cp(up · u1)

Since it's an orthogonal set, uj · u1 is zero for every j ≠ 1, and thus it reduces to 0 = c1(u1 · u1). Since u1 · u1 is not zero (as u1 ≠ 0), this leads to the fact that c1 = 0 is the only solution. The same holds for the other scalars, so the equation only has the trivial solution, and S is linearly independent.
Definition
An orthogonal basis for a subspace W of Rn is a basis for W that is also an orthogonal set.
The nice thing about orthogonal bases is that the weights in a linear combination can be computed easily.
Theorem 5
Let {u1 , ..., up } be an orthogonal basis for a subspace W of Rn . For each y in W , the weights in
the linear combination
y = c1 u1 + · · · + cp up
are given by
cj = (y · uj)/(uj · uj)
Proof
Similar to last time, y · u1 = c1(u1 · u1) and thus c1 = (y · u1)/(u1 · u1). It works the same for the jth term. So, suppose we have the three vectors we had before, and let y = (6, 1, −8); express the vector y as a linear combination of these three vectors. First, we find that y · u1 = 11, y · u2 = −12, y · u3 = −33, u1 · u1 = 11, u2 · u2 = 6, u3 · u3 = 33/2. Then:

y = ((y · u1)/(u1 · u1))u1 + ((y · u2)/(u2 · u2))u2 + ((y · u3)/(u3 · u3))u3
= (11/11)u1 + (−12/6)u2 + (−33/(33/2))u3
= u1 − 2u2 − 2u3
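Theorem 5 is the whole payoff of orthogonal bases: no augmented matrix, just a few dot products. A quick NumPy check (my addition; y = (6, 1, −8) is the vector consistent with the dot products quoted above):

```python
import numpy as np

u1 = np.array([3.0, 1.0, 1.0])
u2 = np.array([-1.0, 2.0, 1.0])
u3 = np.array([-0.5, -2.0, 3.5])
y = np.array([6.0, 1.0, -8.0])

# Each weight is c_j = (y . u_j) / (u_j . u_j) -- no system to solve.
c = np.array([y @ u / (u @ u) for u in (u1, u2, u3)])
print(c)  # weights 1, -2, -2, i.e. y = u1 - 2 u2 - 2 u3
```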
Suppose now that we want to decompose a vector y into the sum of a vector in Span{u} and a vector orthogonal to u:

y = ŷ + z (6.1)

See figure 6.4. We can write ŷ = αu, so that z = y − αu. Then y − ŷ is orthogonal to u if and only if

(y − αu) · u = y · u − α(u · u) = 0

So, α = (y · u)/(u · u) and thus ŷ = ((y · u)/(u · u))u. ŷ is called the orthogonal projection of y onto u and the vector z is called the component of y orthogonal to u. Sometimes ŷ is denoted by projL y and this is called the orthogonal projection of y onto L. That is,

ŷ = projL y = ((y · u)/(u · u))u (6.2)
So, suppose we for example have y = [7; 6] and u = [4; 2]. Find the orthogonal projection of y onto u. Then write y as the sum of two orthogonal vectors, one in Span{u} and one orthogonal to u.

ŷ = ((7 · 4 + 6 · 2)/(4 · 4 + 2 · 2)) [4; 2] = 2 [4; 2] = [8; 4]

So that z = y − ŷ = [7; 6] − [8; 4] = [−1; 2]. So, y = [8; 4] + [−1; 2]. See figure 6.5.
Suppose we now need to calculate the distance from y to L. This is simply the length of vector z, i.e.:
‖y − ŷ‖ = ‖z‖ = √((−1)² + 2²) = √5
Let’s now take a look and admire the beauty of figure 6.6.
Note how we can basically decompose a vector y into a sum of orthogonal projections onto one-
dimensional subspaces, so that any y can be written in the form:
y = projL1 y + projL2 y = ((y · u1)/(u1 · u1))u1 + ((y · u2)/(u2 · u2))u2 (6.3)
linearly independent, by Theorem 4. The simplest example is of course {e1, ..., en}. A more complicated example is showing that the following set is orthonormal:

v1 = [3/√11; 1/√11; 1/√11], v2 = [−1/√6; 2/√6; 1/√6], v3 = [−1/√66; −4/√66; 7/√66]
We must show that all vectors are orthogonal to each other, and that the length of each vector is 1. Thus:
v1 · v2 = −3/√66 + 2/√66 + 1/√66 = 0
v1 · v3 = −3/√726 − 4/√726 + 7/√726 = 0
v2 · v3 = 1/√396 − 8/√396 + 7/√396 = 0
v1 · v1 = 9/11 + 1/11 + 1/11 = 1
v2 · v2 = 1/6 + 4/6 + 1/6 = 1
v3 · v3 = 1/66 + 16/66 + 49/66 = 1
Since the set is orthogonal (and thus linearly independent), its three vectors form a basis for R3. See figure 6.7. When the vectors in an orthogonal set of nonzero vectors are normalized to have unit length, the new vectors will still be orthogonal, and hence the new set will be an orthonormal set.
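Rather than checking the six dot products by hand, you can stack the three vectors as columns of U and run the UᵀU = I test in one go (a NumPy sketch, my addition):

```python
import numpy as np

U = np.column_stack([
    np.array([3.0, 1.0, 1.0]) / np.sqrt(11),
    np.array([-1.0, 2.0, 1.0]) / np.sqrt(6),
    np.array([-1.0, -4.0, 7.0]) / np.sqrt(66),
])

# Orthonormal columns <=> U^T U = I (all dot products at once).
print(np.allclose(U.T @ U, np.eye(3)))  # True
```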
Matrices whose columns form an orthonormal set are very nice.
Theorem 6
An m × n matrix U has orthonormal columns if and only if UᵀU = I.
Proof
(For the three-column case:) The columns of U are orthogonal if and only if u1ᵀu2 = u2ᵀu1 = 0, u1ᵀu3 = u3ᵀu1 = 0, and u2ᵀu3 = u3ᵀu2 = 0. Furthermore, the columns of U all have unit length if and only if u1ᵀu1 = 1, u2ᵀu2 = 1, u3ᵀu3 = 1. Plug these values into UᵀU and you end up with the identity matrix.
Theorem 7
Let U be an m × n matrix with orthonormal columns, and let x and y be in Rn . Then:
a. kU xk = kxk
b. (U x) · (U y) = x · y
c. (U x) · (U y) = 0 if and only if x · y = 0
These statements indicate that a linear transformation with a matrix with orthonormal columns preserves lengths and orthogonality. Theorems 6 and 7 are especially useful when applied to square
matrices. An orthogonal matrix is a square invertible matrix U such that U −1 = U T . By theorem
6, such a matrix has orthonormal columns. Any square matrix with orthonormal columns is also an
orthogonal matrix. Such a matrix must have orthonormal rows too.
Now, similar to last time around, we can calculate the orthogonal projection of y onto a subspace W, or projW y, using the following theorem:
Theorem 8: The Orthogonal Decomposition Theorem
Let W be a subspace of Rn. Then each y in Rn can be written uniquely in the form

y = ŷ + z (6.5)

where ŷ is in W and z is in W⊥. In fact, if {u1, ..., up} is any orthogonal basis of W, then

ŷ = ((y · u1)/(u1 · u1))u1 + · · · + ((y · up)/(up · up))up

and z = y − ŷ.
For example, let's consider u1 = [2; 5; −1], u2 = [−2; 1; 1], and y = [1; 2; 3], with W = Span{u1, u2}. Write y as the sum of a vector in W and a vector orthogonal to W.

ŷ = ((y · u1)/(u1 · u1))u1 + ((y · u2)/(u2 · u2))u2 = (9/30) [2; 5; −1] + (3/6) [−2; 1; 1] = [−2/5; 2; 1/5]
Since z = y − ŷ, we get z = [1; 2; 3] − [−2/5; 2; 1/5] = [7/5; 0; 14/5], and thus:

y = [−2/5; 2; 1/5] + [7/5; 0; 14/5]
Theorem 9: The Best Approximation Theorem
Let W be a subspace of Rn, let y be any vector in Rn, and let ŷ be the orthogonal projection of y onto W. Then ŷ is the closest point in W to y, in the sense that

‖y − ŷ‖ < ‖y − v‖

for all v in W distinct from ŷ. The vector ŷ is called the best approximation to y by elements of W. The proof follows from figure 6.9:

‖y − v‖² = ‖y − ŷ‖² + ‖ŷ − v‖²

Since ŷ ≠ v, the second term on the right is greater than 0, and hence theorem 9 follows.
Let's do an example. Find the distance from y to W = Span{u1, u2}, where:

y = [−1; −5; 10], u1 = [5; −2; 1], u2 = [1; 2; −1]
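The summary doesn't carry this computation through, so here's a sketch with NumPy (my addition; note u1 · u2 = 0, so the projection formula for orthogonal bases applies directly):

```python
import numpy as np

y = np.array([-1.0, -5.0, 10.0])
u1 = np.array([5.0, -2.0, 1.0])
u2 = np.array([1.0, 2.0, -1.0])

# u1 and u2 are orthogonal, so project onto each separately.
y_hat = (y @ u1) / (u1 @ u1) * u1 + (y @ u2) / (u2 @ u2) * u2
dist = np.linalg.norm(y - y_hat)

print(y_hat)  # [-1. -8.  4.]
print(dist)   # 3*sqrt(5), about 6.708
```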
Theorem 10
If {u1, ..., up} is an orthonormal basis for a subspace W of Rn, then

projW y = (y · u1)u1 + (y · u2)u2 + · · · + (y · up)up (6.8)

If U = [u1 u2 · · · up], then:

projW y = UUᵀy (6.9)

The first formula is Theorem 8 with each uj · uj = 1. The second follows from the fact that projW y is a linear combination of the columns of U using the weights y · u1, y · u2, ..., y · up, which can be written as u1ᵀy, u2ᵀy, ..., upᵀy, showing that they are the entries in Uᵀy and justifying (6.9).
Sometimes in life, you get a basis that is not orthogonal, let alone orthonormal. However, we like
orthogonal and orthonormal sets, so we want to learn (yeah you do) how to convert a given span to an
orthogonal basis. This is actually surprisingly easy. You start with one vector in the span; that'll be the first vector, v1. We then look at the second vector, x2, and basically decompose it into its projection onto v1 and the component orthogonal to v1. The orthogonal component, which is simply x2 − projv1 x2, will be v2. We can do this again for a third vector, using both v1 and v2 as the thing we project it on, so that we basically get the following theorem:
Theorem 11: The Gram–Schmidt Process

v1 = x1
v2 = x2 − ((x2 · v1)/(v1 · v1))v1
v3 = x3 − ((x3 · v1)/(v1 · v1))v1 − ((x3 · v2)/(v2 · v2))v2
⋮
vp = xp − ((xp · v1)/(v1 · v1))v1 − ((xp · v2)/(v2 · v2))v2 − · · · − ((xp · vp−1)/(vp−1 · vp−1))vp−1

Then {v1, ..., vp} is an orthogonal basis for Span{x1, ..., xp}.
So, suppose we have x1 = [3; 6; 0] and x2 = [1; 2; 2]. Construct an orthogonal basis for Span{x1, x2}.

v1 = x1 = [3; 6; 0]
v2 = x2 − ((x2 · v1)/(v1 · v1))v1 = [1; 2; 2] − (15/45) [3; 6; 0] = [0; 0; 2]
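The process is mechanical enough to write down once and reuse; a minimal Gram–Schmidt sketch in NumPy (my addition), run on the example's x1 and x2:

```python
import numpy as np

def gram_schmidt(X):
    """Orthogonalize the columns of X with the process above."""
    V = []
    for x in X.T:               # iterate over the columns of X
        v = x.copy()
        for w in V:             # subtract projections onto earlier v's
            v -= (x @ w) / (w @ w) * w
        V.append(v)
    return np.column_stack(V)

X = np.column_stack([[3.0, 6.0, 0.0], [1.0, 2.0, 2.0]])
V = gram_schmidt(X)
print(np.round(V.T, 12))  # rows: v1 = (3, 6, 0), v2 = (0, 0, 2)
```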
such that the distance to b is as small as possible. Logic tells you that the projection of b onto Col A coincides with Ax̂, thus:

Ax̂ = b̂ (6.11)

Now, the orthogonal vector equals z = b − b̂ = b − Ax̂. Let aj be any column of matrix A; then it follows (by orthogonality) that aj · (b − Ax̂) = 0. Using the definition of the dot product (you transpose the first vector, so that you get a matrix multiplication; just look at the first page of this chapter), this also means ajᵀ(b − Ax̂) = 0, so that:
AT (b − Ax̂) = 0 (6.12)
Thus:
AT b − AT Ax̂ = 0
AT Ax̂ = AT b
These calculations show that each least-squares solution of Ax = b satisfies the equation
AT Ax = AT b (6.13)
This matrix equation represents a system of equations called the normal equations for Ax = b. Actually,
the definition of a least-squares solution is:
Definition
If A is m × n and b is in Rm , a least-squares solution of Ax = b is an x̂ in Rn such that:
kb − Ax̂k ≤ kb − Axk
for all x in Rn .
Theorem 13
The set of least-squares solutions of Ax = b coincides with the nonempty set of solutions of the
normal equations AT Ax = AT b.
These two examples lead to the following theorem:
Theorem 14
Let A be an m × n matrix. The following statements are logically equivalent:
a. The equation Ax = b has a unique least-squares solution for each b in Rm.
b. The columns of A are linearly independent.
I can really recommend using your graphical calculator to solve these types of exercises.
The distance from b to Ax̂ is called the least-squares error. For example, for the first example, we have:

b = [2; 0; 11] and Ax̂ = [4 0; 0 2; 1 1] [1; 2] = [4; 4; 3]

And thus:

b − Ax̂ = [2; 0; 11] − [4; 4; 3] = [−2; −4; 8]

‖b − Ax̂‖ = √((−2)² + (−4)² + 8²) = √84
This is simply a least-squares problem. Suppose we need to find the regression line for the data points
(2, 1), (5, 2), (7, 3), and (8, 3). We then have:
X = [1 2; 1 5; 1 7; 1 8], y = [1; 2; 3; 3]

The normal equations:

XᵀXβ = Xᵀy

Use your graphical calculator, honestly. You end up at:

[4 22; 22 142] [β0; β1] = [9; 57]

Solving yields:

[β0; β1] = [4 22; 22 142]⁻¹ [9; 57] = [2/7; 5/14]

So y = 2/7 + (5/14)x.
y = β0 + β1u + β2v (6.17)
y = β0 + β1u + β2v + β3u² + β4uv + β5v² (6.18)
So, as an example, suppose we've made a huge mistake in our lives and somehow ended up at the faculty of geography. However, considering we're at geography, we're the only ones with some basic understanding of mathematics, and hence they've asked us to come up with a model to construct a trend surface or, in this case, a least-squares plane of the altitude y of a certain region, based on the latitude u and longitude v. We expect the data to satisfy the following equations:

y1 = β0 + β1u1 + β2v1 + ε1
y2 = β0 + β1u2 + β2v2 + ε2
⋮
yn = β0 + β1un + β2vn + εn

So that y = Xβ + ε, which is again a least-squares problem.
λ = 8: v1 = [−1; 1; 0]; λ = 6: v2 = [−1; −1; 2]; λ = 3: v3 = [1; 1; 1]
It can be easily shown that these are all orthogonal to each other. Now, chapter 6.2 gave you some valuable life advice, namely that if you have an orthogonal basis, you should always normalize it. So, let's be wise and get the normalized eigenvectors:

u1 = [−1/√2; 1/√2; 0], u2 = [−1/√6; −1/√6; 2/√6], u3 = [1/√3; 1/√3; 1/√3]

Then A = PDP⁻¹ as per usual. However, now, the wonderful thing is that since P is square and has orthonormal columns, P is an orthogonal matrix (we don't call it an orthonormal matrix as well because fuck logic), and this causes P⁻¹ = Pᵀ (see section 6.2).
Theorem 1
If A is symmetric, then any two eigenvectors from different eigenspaces are orthogonal.
CHAPTER 7. SYMMETRIC MATRICES AND QUADRATIC FORMS 72
Proof
λ1v1 · v2 = (λ1v1)ᵀv2 = (Av1)ᵀv2
= v1ᵀAᵀv2 = v1ᵀ(Av2)
= v1ᵀ(λ2v2)
= λ2v1ᵀv2 = λ2v1 · v2

Since λ1 ≠ λ2, this forces v1 · v2 = 0.
Theorem 2
An n × n matrix A is orthogonally diagonalizable if and only if A is a symmetric matrix.
Even the book admits that ”this theorem is rather amazing”. Normally, you can’t tell at a quick
glance whether a matrix is diagonalizable, but for symmetric matrices, you can.
Proof
The proof is rather simple. If A is orthogonally diagonalizable, then:
Aᵀ = (PDPᵀ)ᵀ = PᵀᵀDᵀPᵀ = PDPᵀ = A
Example
Orthogonally diagonalize the matrix A = [3 −2 4; −2 6 2; 4 2 3], whose characteristic equation is:

0 = −λ³ + 12λ² − 21λ − 98 = −(λ − 7)²(λ + 2)

For λ = 7 we can take the eigenvectors v1 = [1; 0; 1] and v2 = [−1/2; 1; 0]. Now, note that the theorem only says something about eigenvectors from different eigenspaces. v1 and v2 are clearly not orthogonal, so we must use the Gram–Schmidt process here:

z2 = v2 − ((v2 · v1)/(v1 · v1))v1 = [−1/2; 1; 0] + (1/4) [1; 0; 1] = [−1/4; 1; 1/4]
1 Used properties are: the dot product can be written as a · b = bᵀa, λ1v1 = Av1, (Av1)ᵀ = v1ᵀAᵀ (because the order is reversed when you take the transpose of a product), and Aᵀ = A.
d. A is orthogonally diagonalizable.
Writing out A = PDPᵀ column by column gives the spectral decomposition of A:

A = PDPᵀ = [λ1u1 · · · λnun] [u1ᵀ; ⋮; unᵀ] = λ1u1u1ᵀ + λ2u2u2ᵀ + · · · + λnununᵀ
Example
So what can we use the above things for? Let’s try to construct a spectral decomposition of the
matrix A that has the orthogonal diagonalization
A = [7 2; 2 4] = [2/√5 −1/√5; 1/√5 2/√5] [8 0; 0 3] [2/√5 1/√5; −1/√5 2/√5]
2 In my opinion, the easiest way to do this is via your graphical calculator. First, save a vector. You do this by using 2ND (, so that you get a {. Then, write the first entry, use a comma as separation, type the second entry, etc. Close by pressing 2ND ). Then, press STO> (left bottom, just above ON). Now, press 2ND STAT, and press L1. This will assign L1 to this list and allows you to quickly load vectors (note that L1, L2 and L3 also have shortcuts on your numpad: 2ND 1, 2ND 2 and 2ND 3, respectively). Now, to normalize a vector, we need to divide it by its length. So, start a fraction, and the numerator will be L1. The denominator can be typed in as follows: use a square root, press 2ND STAT, go to MATH, choose option 5: sum(. Then type L1 and square it. Then, close the sum( bracket by typing ). This is your length, and you'll get the normalized vector. A problem arises however, as it won't give you an exact answer. To counteract this, the easiest way is to only compute sum(L1²), and then divide each entry in the vector yourself by the square root of this value.
3 Forgot what rank meant? It was the dimension of the column space of a matrix, or more straightforwardly, how many linearly independent vectors made up a matrix. For example, every column of λ1u1u1ᵀ is a multiple of u1. In a following example, we will see that this is indeed the case. You don't need to know the proof, though.
4 The reason for this is as follows: (uuᵀ)x = u(uᵀx) = (uᵀx)u (the position switch is allowed because uᵀx is the dot product and thus a scalar). uᵀx = u · x = x · u, and thus (uuᵀ)x = (x · u)u, which is the orthogonal projection of x onto a unit vector u (because u · u = 1 for a unit vector, ((x · u)/(u · u))u = (x · u)u).
That's simply:

A = 8u1u1ᵀ + 3u2u2ᵀ

I honestly don't know what the use of this is, except that spectral decomposition sounds kinda cool (relative to linear algebra standards, that is). You can verify this by plugging in the unit eigenvectors; if you do it correctly (or type it correctly in your graphical calculator), you end up exactly at A.
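Whatever its use, the spectral decomposition is easy to verify numerically; a NumPy sketch (my addition) that rebuilds A from the two rank-one pieces:

```python
import numpy as np

A = np.array([[7.0, 2.0],
              [2.0, 4.0]])
u1 = np.array([2.0, 1.0]) / np.sqrt(5)
u2 = np.array([-1.0, 2.0]) / np.sqrt(5)

# A = 8 u1 u1^T + 3 u2 u2^T: a sum of rank-one projection matrices.
A_rebuilt = 8 * np.outer(u1, u1) + 3 * np.outer(u2, u2)
print(np.allclose(A, A_rebuilt))  # True
```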
Example
For x in R3, let Q(x) = 5x1² + 3x2² + 2x3² − x1x2 + 8x2x3. Write this quadratic form as xᵀAx. Furthermore, compute Q(x) for x = (3, −1, 2).
We see that there's a −1 in front of the x1x2, indicating that the entries in the first row and second column ((i,j) = (1,2)) and in the first column and second row ((i,j) = (2,1)) should both be −1/2. Similarly, (2,3) and (3,2) should be 4, and (1,3) and (3,1) need to be 0. Furthermore, (1,1) is simply 5, (2,2) is 3 and (3,3) is 2:

Q(x) = xᵀAx = [x1 x2 x3] [5 −1/2 0; −1/2 3 4; 0 4 2] [x1; x2; x3]

Q(3, −1, 2) = 5 · 3² + 3 · (−1)² + 2 · 2² − 3 · (−1) + 8 · (−1) · 2 = 45 + 3 + 8 + 3 − 16 = 43
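Evaluating a quadratic form is one line once A is set up; checking the hand computation above with NumPy (my addition):

```python
import numpy as np

A = np.array([[5.0, -0.5, 0.0],
              [-0.5, 3.0, 4.0],
              [0.0, 4.0, 2.0]])
x = np.array([3.0, -1.0, 2.0])

Q = x @ A @ x   # Q(x) = x^T A x
print(Q)        # 43.0
```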
Example
Make a change of variable that transforms the quadratic form Q (x) = x21 − 8x1 x2 − 5x22 into a
quadratic form with no cross-product term.
The matrix of the quadratic form can easily be deduced to be A = [1 −4; −4 −5]. The eigenvalues equal λ = 3 and λ = −7, and the associated eigenvectors are respectively:

v1 = [2/√5; −1/√5], v2 = [1/√5; 2/√5]

And since the vectors are of different eigenspaces, they are automatically orthogonal and

P = [2/√5 1/√5; −1/√5 2/√5], D = [3 0; 0 −7]
A suitable change of variable is simply (don't read too much into it):

x = Py, where x = [x1; x2] and y = [y1; y2]

or

y = P⁻¹x = Pᵀx

For example, for x = [2; −2]:

y = [2/√5 −1/√5; 1/√5 2/√5] [2; −2] = [6/√5; −2/√5]

So, basically what we do is, instead of going directly from x to Q(x) = 16 using a less-than-nice matrix, we first convert x to y, from which we go to 16 using a more-than-nice (diagonal) matrix: Q(x) = xᵀAx = yᵀDy = 3(6/√5)² − 7(−2/√5)² = 108/5 − 28/5 = 16.
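The point of the change of variable is that both coordinate systems give the same value of Q; a NumPy sketch (my addition) for the x = (2, −2) computed above:

```python
import numpy as np

A = np.array([[1.0, -4.0],
              [-4.0, -5.0]])
P = np.array([[2.0, 1.0],
              [-1.0, 2.0]]) / np.sqrt(5)
D = np.diag([3.0, -7.0])

x = np.array([2.0, -2.0])
y = P.T @ x   # change of variable (P^{-1} = P^T for an orthogonal P)

print(x @ A @ x)  # 16.0, via the matrix with cross terms
print(y @ D @ y)  # also 16, via the diagonal (cross-term-free) matrix
```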
What we’ve done so far can be put more eloquently in a nice theorem:
Theorem 4: The Principal Axes Theorem
Let A be an n × n symmetric matrix. Then there is an orthogonal change of variable, x = Py, that transforms the quadratic form xᵀAx into a quadratic form yᵀDy with no cross-product term.
Finding the principal axes (which are the eigenvectors of A) amounts to finding a new coordinate system with respect to which the graph is in standard position. So, for the left graph in figure 7.2, we have that A = [5 −2; −2 5], where the eigenvalues equal 3 and 7, with unit eigenvectors

u1 = [1/√2; 1/√2], u2 = [−1/√2; 1/√2]
Look at figure 7.3. We see basically three cases: the left two graphs are always positive (except at x = 0), the rightmost graph is always negative (except at x = 0), and the third graph is a mix. We have nice terms for this:
Definition
A quadratic form Q is:
a. positive definite if Q(x) > 0 for all x ≠ 0,
b. negative definite if Q(x) < 0 for all x ≠ 0,
c. indefinite if Q(x) assumes both positive and negative values.
Also, Q is said to be positive semidefinite if Q(x) ≥ 0 for all x, and to be negative semidefinite if Q(x) ≤ 0 for all x. The quadratic forms in (a) and (b) in figure 7.3 are both positive semidefinite, but the form in (a) is best described as positive definite.
The type is related to the eigenvalues.
Theorem 5: Quadratic Forms and Eigenvalues
Let A be an n × n symmetric matrix. Then the quadratic form xᵀAx is:
a. positive definite if and only if the eigenvalues of A are all positive,
b. negative definite if and only if the eigenvalues of A are all negative,
c. indefinite if and only if A has both positive and negative eigenvalues.
Proof
By the Principal Axes Theorem, there exists an orthogonal change of variable x = Py such that

Q(x) = xᵀAx = yᵀDy = λ1y1² + λ2y2² + · · · + λnyn² (7.4)

Since P is invertible, there is a one-to-one correspondence between all nonzero x and all nonzero y. Thus, the values of Q(x) for x ≠ 0 coincide with the values of the expression on the right side of (7.4), which is obviously controlled by the signs of the eigenvalues λ1, ..., λn in the three ways described in the theorem.
Example
Is Q(x) = 3x1² + 2x2² + x3² + 4x1x2 + 4x2x3 positive definite?
The matrix is A = [3 2 0; 2 2 2; 0 2 1], of which the eigenvalues can be found to equal 5, 2, and −1. Thus, Q is an indefinite quadratic form, not positive definite.
If a quadratic form is positive definite, then the associated matrix is said to be a positive definite matrix. Other terms, such as a positive semidefinite matrix, are defined analogously.
Time for celebration day, because you’ve reached the end of the summary and you’ve read everything
you need to know for the exam.