Linear Algebra Summary
2 Matrix Algebra
  2.1 Matrix Operations
  2.2 The Inverse of a Matrix
  2.3 Characterizations of Invertible Matrices
  2.4 Subspaces of Rn
  2.5 Dimension and Rank
3 Determinants
  3.1 Introduction to Determinants
  3.2 Properties of Determinants
    3.2.1 Determinants and matrix products
  3.3 Uses of determinants
    3.3.1 A formula for A−1
    3.3.2 Determinants as area or volume
    3.3.3 Linear transformations
4 Vector Spaces
  4.1 Rank
6 Orthogonality
  6.1 Inner Product, Length and Orthogonality
    6.1.1 Orthogonal vectors
    6.1.2 Orthogonal complements
    6.1.3 Angles in R2 and R3
  6.2 Orthogonal Sets
    6.2.1 An orthogonal projection
    6.2.2 Orthonormal sets
  6.3 Orthogonal Projections
    6.3.1 Properties of orthogonal projections
  6.4 The Gram-Schmidt Process
  6.5 Least-Squares Problems
  6.6 Applications to Linear Models
    6.6.1 The general linear model
    6.6.2 Least-squares fitting of other curves
    6.6.3 Multiple regression
Forgot to say it before, but if you want to forward this summary (which would be fine), please use the bit.ly link (i.e. bit.ly/LA300617) rather than this file or the Dropbox link directly.
I’ve omitted a number of the proofs of theorems in the book. The reason for this is that the old exams hardly ever ask for proofs longer than two sentences, and the proofs in the book are at times rather long, cumbersome and not all that clarifying. I did include the proofs that are only a few lines long, or that follow very logically.
Furthermore, note that there’s an index in the back of this summary. If you want to quickly look something up, you’ll most likely find a reference there.
Additionally, I can really recommend doing the online exercises while studying. Do a section (or subsection) and then practise the associated exercises; at least in my experience, this was the best way to study.
Finally, do yourself a huge favour and explore the true powers of your graphical calculator, for deep inside it you will find some very helpful functions. For example, it can automatically row reduce matrices for you, which is incredibly useful, and dot products (useful for orthogonality, chapter 6) can be done much more quickly as well. If you do not own a graphical calculator, I’d really recommend buying the Texas Instruments Nspire CX (without CAS! That one is highly illegal on the linear algebra exam). Otherwise, go for the TI-84 Plus1 , which is the one I use personally. However, the Nspire is much, much better looking (I mean, it has a colour screen, a scrollpad, a dedicated keyboard, and well, it’s just simply beautiful). In any case, never go Casio.
1 Possibly go for the Silver or C Silver edition, though I cannot guarantee that the latter is allowed for the exam. They say you’re allowed to use graphical calculators that are allowed on the VWO central exam, but the C Silver edition was not allowed in 2015; it was in 2016 and 2017, but I don’t know how frequently they update the list.
A linear equation in the variables x1 , ..., xn is an equation that can be written in the form

a1 x1 + a2 x2 + · · · + an xn = b        (1.1)

where b and the coefficients a1 , ..., an are real numbers. A system of linear equations (or a linear system) is a
collection of one or more linear equations involving the same variables. A solution of the system is a list
of values for x1 , ..., xn which satisfies the equations. The set of all possible solutions is called the solution
set of the linear system. Two linear systems are called equivalent if they have the same solution set.
When two linear equations are given, the solution can be seen as the intersection of these two lines, in
whatever dimension the lines are in. Note that there are three possibilities regarding the existence of the
solution:
1. There is no solution;
2. There is exactly one solution;
3. There are infinitely many solutions.
Systems with no solutions are called inconsistent. Systems with at least one solution are called
consistent.
Now, suppose we have the system:
x1 − 2x2 + x3 = 0
4x1       − 3x3 = 5        (1.2)
2x1 + 6x2 + 3x3 = 12
We can write this compactly in a rectangular array called a matrix. When we just write the coefficients, we call it a coefficient matrix:

[ 1 −2  1 ]
[ 4  0 −3 ]        (1.3)
[ 2  6  3 ]

If we include the right part of the equations, we get the so-called augmented matrix1 :

[ 1 −2  1  0 ]
[ 4  0 −3  5 ]        (1.4)
[ 2  6  3 12 ]
The size is given as m × n, where m is the number of rows, and n the number of columns. The augmented
matrix thus was of the size 3 × 4.
Solving linear systems will be dealt with in more detail in the next section. For now, the three elementary row operations are:
1. (Replacement) Replace one row by the sum of itself and a multiple of another2 ;
2. (Interchange) Interchange two rows;
3. (Scaling) Multiply all entries in a row by a nonzero constant.
1 Note that the only difference is that we have included the constants on the right side of the equation.
2 Add a multiple of another row to one row
CHAPTER 1. LINEAR EQUATIONS IN LINEAR ALGEBRA
Two matrices are called row equivalent if there exists a sequence of elementary row operations that
transforms one matrix into the other. It is important to note that row operations are reversible. If the
augmented matrices of two linear systems are row equivalent, then the two systems have the same solution
set.
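If you like to see things in code: the three row operations (and the fact that each one is reversible) can be sketched in a few lines of Python. The helper functions below are my own, nothing standard, applied to the augmented matrix (1.4):

```python
import copy
from fractions import Fraction

def replacement(M, i, j, c):
    """(Replacement) Row i  <-  row i + c * row j."""
    M[i] = [a + c * b for a, b in zip(M[i], M[j])]

def interchange(M, i, j):
    """(Interchange) Swap rows i and j."""
    M[i], M[j] = M[j], M[i]

def scaling(M, i, c):
    """(Scaling) Multiply row i by a nonzero constant c."""
    M[i] = [c * a for a in M[i]]

# the augmented matrix (1.4) of system (1.2)
M = [[Fraction(v) for v in row] for row in
     [[1, -2, 1, 0], [4, 0, -3, 5], [2, 6, 3, 12]]]
original = copy.deepcopy(M)

# every row operation is undone by an operation of the same type:
replacement(M, 1, 0, Fraction(-4))   # R2 <- R2 - 4 R1
replacement(M, 1, 0, Fraction(4))    # undo it
interchange(M, 0, 2)
interchange(M, 0, 2)                 # undo it
scaling(M, 2, Fraction(1, 2))
scaling(M, 2, Fraction(2))           # undo it
assert M == original
```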
A matrix is in echelon form if it has the following properties:

1. Any row full of zeroes is placed at the bottom, so that all nonzero rows are above it;
2. Each leading entry of a row (the leading entry is the first number in a row that’s not 0) is in a column to the right of the leading entry of the row above it;
3. All entries in a column below a leading entry are zeroes.

(The book illustrates this with a template matrix: the leading entry in each row may have any value but zero, and the other entries may have any value, including zero.)
Example
Consider the augmented matrix

A = [ 0  3 −6  6 4 −5 ]
    [ 3 −7  8 −5 8  9 ]
    [ 3 −9 12 −9 6 15 ]
Start with the leftmost column. Make sure that there’s a nonzero number as leading entry; if there’s a zero there, interchange this row with another row, preferably with the row that has the smallest number as leading entry there (the reason will become obvious during the next step);
Use this row to make the rows below it have 0 in the first column, by adding/subtracting (multiples of) this row to/from the other rows;
Go to the second row and look for its leading entry; if there is a row below it that has its leading entry in an earlier column, interchange these two rows;
Reduce the entries below the second row, in the column of the leading entry of the second row, to 0; and so on.
So, for this example, we first interchange the third and first row (R3 and R1), though you may also
choose R2 and R1:
[ 3 −9 12 −9 6 15 ]
[ 3 −7  8 −5 8  9 ]
[ 0  3 −6  6 4 −5 ]
Then, to make sure the second row has a zero in the first column as well (because the third row already has a zero there, we don’t need to do anything with it for now), we subtract the first row from the second row:
[ 3 −9 12 −9 6 15 ]
[ 0  2 −4  4 2 −6 ]
[ 0  3 −6  6 4 −5 ]
Now we look at the second row, and we see that it has its leading entry in the second column. We must thus ensure that the entry in the second column of the third row becomes zero, so we subtract 1.5R2 from R3 . First multiplying R2 by 1.5 gives:
[ 3 −9 12 −9 6 15 ]
[ 0  3 −6  6 3 −9 ]
[ 0  3 −6  6 4 −5 ]

Subtracting the second row from the third row then gives:

[ 3 −9 12 −9 6 15 ]
[ 0  3 −6  6 3 −9 ]
[ 0  0  0  0 1  4 ]
Now, it is obvious from the third row that x5 = 4. For the second row we get (after dividing by 3): x2 − 2x3 + 2x4 + x5 = −3, so with x5 = 4: x2 − 2x3 + 2x4 = −7. Note that x3 and x4 may take any value and are therefore called free variables. There is no fundamental reason why we could not have written a solution like x3 = ... and x4 = ... and made x1 and x2 the free variables, but for simplicity’s sake, it is convention to take the pivot variables as basic variables and express these as functions of the free variables. Note that, if the system is consistent, the number of free variables equals the number of columns in the coefficient matrix (so the number of columns in the augmented matrix minus 1), minus the number of pivot columns (see below). Now, substitute this value into the first row to get an expression for x1 as a function of x3 and x4 .
Next, we create 0s above the leading entries in each column. After dividing R1 and R2 by 3, we subtract 2R3 from R1 and R3 from R2 :

[ 1 −3  4 −3 0 −3 ]
[ 0  1 −2  2 0 −7 ]
[ 0  0  0  0 1  4 ]

Next, we also create a 0 above the leading entry in the second row, by adding 3R2 to R1 :

[ 1 0 −2 3 0 −24 ]
[ 0 1 −2 2 0  −7 ]
[ 0 0  0 0 1   4 ]
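By the way, the whole reduction above can be automated. The routine below is my own quick sketch of a reduced-echelon-form algorithm (using exact fractions, so no rounding trouble); running it on the augmented matrix A of this example reproduces the final matrix we just found:

```python
from fractions import Fraction

def rref(M):
    """Reduce a matrix (list of rows of Fractions) to reduced row echelon form."""
    M = [row[:] for row in M]
    rows, cols = len(M), len(M[0])
    pivot_row = 0
    for col in range(cols):
        # find a row at or below pivot_row with a nonzero entry in this column
        pick = next((r for r in range(pivot_row, rows) if M[r][col] != 0), None)
        if pick is None:
            continue                       # no pivot in this column
        M[pivot_row], M[pick] = M[pick], M[pivot_row]
        piv = M[pivot_row][col]
        M[pivot_row] = [x / piv for x in M[pivot_row]]   # scale the pivot to 1
        for r in range(rows):              # clear the rest of the column
            if r != pivot_row and M[r][col] != 0:
                c = M[r][col]
                M[r] = [a - c * b for a, b in zip(M[r], M[pivot_row])]
        pivot_row += 1
        if pivot_row == rows:
            break
    return M

A = [[Fraction(v) for v in row] for row in
     [[0, 3, -6, 6, 4, -5],
      [3, -7, 8, -5, 8, 9],
      [3, -9, 12, -9, 6, 15]]]
R = rref(A)
assert R == [[1, 0, -2, 3, 0, -24],
             [0, 1, -2, 2, 0, -7],
             [0, 0, 0, 0, 1, 4]]
```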
Now, we call the 1s that appear as leading entry of each row the pivots. To be precise:
samvanelsloo97@icloud.com
Definition
A pivot position in a matrix A is a location in A that corresponds to a leading 1 in the reduced echelon form of A. A pivot column is a column of A that contains a pivot position.
Note the advantage of writing the matrix like this. The solutions can now be written down immediately:

x1 = −24 + 2x3 − 3x4
x2 = −7 + 2x3 − 2x4
x5 = 4

with x3 and x4 free.
This matrix form, where the leading entry in each nonzero row is 1 and each leading 1 is the only nonzero entry in its column, is said to be in reduced (row) echelon form.

Each matrix is row equivalent to one and only one reduced echelon matrix.

A system is consistent if and only if an echelon form of the augmented matrix has no row of the form [0 · · · 0 b] with b nonzero. If a linear system is consistent, it has one unique solution when there are no free variables, and infinitely many solutions if there is at least one free variable.
1.3.1 Vectors in Rn
A matrix with only one column is called a column vector, or simply a vector. A vector in R5 would for example be:

u = [  3 ]
    [  2 ]
    [ −1 ]
    [  5 ]
    [  2 ]

because there are five entries, five dimensions can be reached (the first three represent the x, y and z directions; the other two have to be imagined as the fourth and fifth dimension). Two vectors are equal if and only if their corresponding entries are equal. The sum u + v is defined if u and v are in the same dimension (have the same number of entries). Given a vector u and a real number c, the scalar multiple of u by c is the vector cu obtained by multiplying each entry in u by c. The number c is called a scalar. The vector whose entries are all zero is called the zero vector and is denoted by 0.
Algebraic Properties of Rn
(i) u + v = v + u
(ii) (u + v) + w = u + (v + w)
(iii) u + 0 = 0 + u = u
(iv) u + (−u) = −u + u = 0, where −u denotes (−1) u
(v) c (u + v) = cu + cv
(vi) (c + d) u = cu + du
(vii) c (du) = (cd) u
(viii) 1u = u
If v1 , ..., vp are vectors in Rn (in the previous example: there are three entries in each vector, so n = 3), then the set of all linear combinations of v1 , ..., vp (that is, all points that can be reached by a linear combination of these vectors) is called the subset of Rn spanned (or generated) by v1 , ..., vp .
If A is an m × n matrix, with columns a1 , ..., an , and if x is in Rn (that is, if the vector x has as many entries as A has columns), then the product of A and x, denoted by Ax, is the linear combination of the columns of A using the corresponding entries in x as weights; that is,

Ax = [ a1 a2 · · · an ] [ x1 ]
                        [ ..  ] = x1 a1 + x2 a2 + · · · + xn an
                        [ xn ]
There’s actually some logic behind the first and second step. If you recall from the very first paragraph, we used the first column for the x1 entries in the system of equations, the second column for x2 , etc. It thus makes sense to multiply the first column by x1 here, and the nth column by xn . Furthermore, as we’re then left with nothing but vectors, we’re allowed to simply sum them to get a “total” vector.
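This column-wise definition of Ax translates directly into code. A small sketch (the helper name mat_vec is my own), applied to the coefficient matrix of system (1.8) below:

```python
def mat_vec(A, x):
    """Ax as the linear combination x1*a1 + ... + xn*an of the columns of A."""
    m, n = len(A), len(A[0])
    result = [0] * m
    for j in range(n):                    # for each column a_j of A ...
        for i in range(m):
            result[i] += x[j] * A[i][j]   # ... add the weighted entries x_j * a_ij
    return result

A = [[1, 2, -1],
     [0, -5, 3]]
assert mat_vec(A, [2, 1, 1]) == [3, -2]   # 2*a1 + 1*a2 + 1*a3
```

Note how the code literally forms x1 a1 + · · · + xn an rather than using the row-times-column rule; both give the same answer.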
If we have for example the system:

x1 + 2x2 −  x3 = 4
    − 5x2 + 3x3 = 1        (1.8)

This is equivalent to the vector equation

x1 [ 1 ] + x2 [  2 ] + x3 [ −1 ] = [ 4 ]        (1.9)
   [ 0 ]      [ −5 ]      [  3 ]   [ 1 ]

and the associated matrix equation:

[ 1  2 −1 ] [ x1 ]   [ 4 ]
[ 0 −5  3 ] [ x2 ] = [ 1 ]        (1.10)
            [ x3 ]
This leads to the following theorem:
Theorem 3

If A is an m × n matrix, with columns a1 , ..., an , and if b is in Rm , the matrix equation

Ax = b        (1.11)

has the same solution set as the vector equation

x1 a1 + x2 a2 + · · · + xn an = b        (1.12)

which, in turn, has the same solution set as the system of linear equations whose augmented matrix is

[ a1 a2 · · · an b ]        (1.13)
So, b1 − (1/2)b2 + b3 = 0. This means that only a specific combination is valid. Had there been a pivot position in the third column as well, then we would have got b1 − (1/2)b2 + b3 = x3 , which is valid for any combination of b1 , b2 and b3 3 . So, if A has a pivot in every row, then Ax = b is consistent for all b4 .
The following theorem applies to the existence of solutions:
Theorem 4

Let A be an m × n matrix. Then the following statements are logically equivalent. That is, for a particular A, either they are all true statements or they are all false.

a. For each b in Rm , the equation Ax = b has a solution.
b. Each b in Rm is a linear combination of the columns of A.
c. The columns of A span Rm .
d. A has a pivot position in every row.
Proof

Statements a. and b. are equivalent by the definition of Ax. Statement c. is logical if we consider for example the matrix:

[ 2  4 3  1 ]
[ 1 −3 2 −3 ]
[ 0  0 0  0 ]

Although the matrix has 3 rows, the columns only span R2 , because none of the columns points in the third direction. It is then also apparent that the equation Ax = b does not have a solution for each b, and thus that not each b is a linear combination of the columns of A. Statement d. is a little bit more difficult to prove. Let U be an echelon form of A. Given b in Rm , we can row reduce the augmented matrix [ A b ] to an augmented matrix [ U d ] for some d in Rm . If d. is indeed correct, then each row of U contains a pivot position, so that Ax = b has a solution for any b. If d. is false, then the last row of U consists of only zeroes. Since for a. to be true each b is allowed, the last entry of d may also be nonzero; the system would then be inconsistent, and not each b would have a solution (only the bs that cause a zero to appear as the last entry). So, if d. is false, so is a., and if a. is false, b. and c. are also false. Therefore, all statements are logically equivalent.
An identity matrix is a matrix with 1’s on the diagonal and 0’s elsewhere.
3 We can adjust x3 for that.
4 Note that the coefficient matrix should have a pivot position in every row, not the augmented matrix (because the augmented matrix at the end did have a pivot in the b1 − (1/2)b2 + b3 column).
One final theorem in this section, regarding properties of the matrix-vector product Ax:
Theorem 5
If A is an m × n matrix, u and v are vectors in Rn , and c is a scalar, then:
a. A (u + v) = Au + Av;
b. A (cu) = c (Au).
The two equations of the homogeneous system

x1 − 2x2 + 3x3 = 0
4x1 + x2 − 6x3 = 0

can be seen as two planes in a 3D graph, both passing through the origin. Since the equations of the planes are not equivalent, the intersection of the two planes is a line, passing through the origin. The formula for this line can be found by solving the augmented matrix:

[ 1 −2  3 0 ]   [ 1 −2   3 0 ]   [ 1 −2  3 0 ]   [ 1 0 −1 0 ]
[ 4  1 −6 0 ] ∼ [ 0  9 −18 0 ] ∼ [ 0  1 −2 0 ] ∼ [ 0 1 −2 0 ]

so that x2 = 2x3 and x1 = x3 , or

x = [ x3  ]      [ 1 ]
    [ 2x3 ] = x3 [ 2 ]
    [ x3  ]      [ 1 ]

Now, if we compare it with the system:
x1 − 2x2 + 3x3 = 6
4x1 + x2 − 6x3 = −3        (1.15)
We realize that we have merely shifted the two planes somewhat. However, the planes are still parallel to what they were before, and therefore, it is logical that the intersection line between the two is also parallel to what it was; it is simply shifted a bit. Indeed, we can row reduce it to

[ 1 0 −1  0 ]
[ 0 1 −2 −3 ]

so that

x = [  0 ]      [ 1 ]
    [ −3 ] + x3 [ 2 ]
    [  0 ]      [ 1 ]

Note that this is basically just the same as before, but with an added ’base’ vector (a particular solution). If we use x3 = t, we can now write x = p + tv.
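You can convince yourself numerically that every vector of the form p + tv really solves the inhomogeneous system (1.15). A quick Python check (my own sketch):

```python
def residual(x1, x2, x3):
    """The left-hand sides of both equations of system (1.15)."""
    return (x1 - 2 * x2 + 3 * x3, 4 * x1 + x2 - 6 * x3)

p = (0, -3, 0)   # a particular solution of Ax = b
v = (1, 2, 1)    # a solution of the homogeneous equation Ax = 0
for t in range(-3, 4):
    x = tuple(pi + t * vi for pi, vi in zip(p, v))
    assert residual(*x) == (6, -3)   # every p + t*v solves the system
```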
In case you’re wondering what would happen if there were no free variable for the homogeneous equation: this would happen, for example, if you consider three planes in 3D. It is again apparent that the solution has merely shifted a bit (if you look closely, you can see the intersection of the blue, green and yellow planes, to be exact at (2, 5, −2)). Note that this is the only solution:

x = [  2 ]
    [  5 ]
    [ −2 ]
Indeed, we can write x = p + vh (this is more apparent for the previous example with the free variable), and we get the following theorem:

Theorem 6

Suppose the equation Ax = b is consistent for some given b, and let p be a solution. Then the solution set of Ax = b is the set of all vectors of the form w = p + vh , where vh is any solution of the homogeneous equation Ax = 0.

Please note that this only applies to an equation Ax = b that has at least one solution p.
So, in short, take the following approach to find the solution set in parametric vector form:
samvanelsloo97@icloud.com
CHAPTER 1. LINEAR EQUATIONS IN LINEAR ALGEBRA 16
Row reduce the augmented matrix to reduced echelon form (the ’reduced’ part is probably not strictly necessary, but it doesn’t take much extra effort);
Express each basic variable in terms of any free variables appearing in an equation;
Write a typical solution x as a vector whose entries depend on the free variables (if applicable);
Decompose the solution into a linear combination of vectors using the free variables as parameters.
A set of vectors {v1 , ..., vp } in Rn is said to be linearly independent if the vector equation

x1 v1 + x2 v2 + · · · + xp vp = 0

has only the trivial solution. The set {v1 , ..., vp } is said to be linearly dependent if there exist weights c1 , ..., cp , not all zero, such that

c1 v1 + c2 v2 + · · · + cp vp = 0

Similarly:

The columns of A are linearly independent if and only if the equation Ax = 0 has only the trivial solution.

An indexed set S = {v1 , ..., vp } of two or more vectors is linearly dependent if and only if at least one of the vectors in S is a linear combination of the others. In fact, if S is linearly dependent and v1 ≠ 0, then some vj (with j > 1) is a linear combination of the preceding vectors, v1 , ..., vj−1 .
Theorem 8

If a set contains more vectors than there are entries in each vector, then the set is linearly dependent. That is, any set {v1 , ..., vp } in Rn is linearly dependent if p > n.

Theorem 9

If a set S = {v1 , ..., vp } in Rn contains the zero vector, then the set is linearly dependent.
Proof

Theorem 8 can be proven as follows: let A = [ v1 · · · vp ]. Then A is n × p and the equation Ax = 0 corresponds to a system of n equations in p unknowns. If p > n, then there are more unknowns than equations, so there must be a free variable. Hence Ax = 0 has a nontrivial solution and the columns of A are linearly dependent.

Theorem 9 can be proven by renumbering the vectors such that v1 = 0. Then a solution is 1v1 + 0v2 + · · · + 0vp = 0, so S is linearly dependent, because there is a nontrivial solution.
So, to check whether a given set is linearly dependent, check whether one of the vectors is a linear combination of the others5 , or solve the equation Ax = 0. To check which values of a certain entry would yield a linearly dependent set, solve the equation Ax = 0. If you find that the entry should be a free variable, the set is dependent for any value you plug in (so it’s dependent for all h, for example).
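That check is easy to mechanize: put the vectors as the columns of a matrix and count pivots; the set is linearly dependent exactly when some column fails to get a pivot. A sketch (helper names are mine, using exact fractions):

```python
from fractions import Fraction

def pivot_count(vectors):
    """Row reduce the matrix whose columns are the given vectors; count pivots."""
    n, p = len(vectors[0]), len(vectors)
    M = [[Fraction(vectors[j][i]) for j in range(p)] for i in range(n)]
    pivots = 0
    for col in range(p):
        pick = next((r for r in range(pivots, n) if M[r][col] != 0), None)
        if pick is None:
            continue                      # no pivot in this column
        M[pivots], M[pick] = M[pick], M[pivots]
        for r in range(pivots + 1, n):    # clear the entries below the pivot
            c = M[r][col] / M[pivots][col]
            M[r] = [a - c * b for a, b in zip(M[r], M[pivots])]
        pivots += 1
    return pivots

def is_dependent(vectors):
    """Dependent iff not every column is a pivot column."""
    return pivot_count(vectors) < len(vectors)

# v3 = 2*v1 + v2, so the set is linearly dependent
assert is_dependent([(1, 0, 1), (0, 1, 1), (2, 1, 3)])
# the standard basis of R^3 is independent
assert not is_dependent([(1, 0, 0), (0, 1, 0), (0, 0, 1)])
# Theorem 8: four vectors in R^3 are always dependent
assert is_dependent([(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)])
```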
Now, what this actually does is convert a point (in this case, (1,2,3,4)) in a 4D graph (you can think of the fourth dimension as being visualized with a colour, for example) to a point in a 2D graph. It’s nice that we now have this conversion for one point, but wouldn’t you be dying to know it for every single 4D point? That’s why we introduce linear transformations: similarly to how a regular function f(x) converts an input value x to something else, a linear transformation converts a vector in Rn to a vector in Rm . To be more precise:
A transformation (or function or mapping) T from Rn to Rm is a rule that assigns to each vector x in Rn a vector T (x) in Rm . The set Rn is called the domain of T , and Rm the codomain of T . The notation T : Rn → Rm indicates that the domain is Rn and the codomain Rm . For x in Rn , the vector T (x) in Rm is called the image of x (under the action of T ). The set of all images T (x) is called the range of T .
So, what would the linear transformation of the example above be? Simple:

[ 4 −3 1 4 ] [ x1 ]   [ 4 · x1 − 3 · x2 + 1 · x3 + 4 · x4 ]
[ 2  0 5 1 ] [ x2 ] = [ 2 · x1 + 0 · x2 + 5 · x3 + 1 · x4 ]        (1.17)
             [ x3 ]
             [ x4 ]

So, say we have the vector

u = [ 3 ]
    [ 2 ]
    [ 5 ]
    [ 1 ]

then T (u) would be:

T (u) = [ 4 · 3 − 3 · 2 + 1 · 5 + 4 · 1 ]   [ 15 ]
        [ 2 · 3 + 0 · 2 + 5 · 5 + 1 · 1 ] = [ 32 ]
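The computation above in code, as a quick sanity check (the function T below hard-codes the matrix from equation (1.17)):

```python
def T(x):
    """The linear transformation x -> Ax for the 2x4 matrix of equation (1.17)."""
    A = [[4, -3, 1, 4],
         [2,  0, 5, 1]]
    return [sum(A[i][j] * x[j] for j in range(4)) for i in range(2)]

u = [3, 2, 5, 1]
assert T(u) == [15, 32]   # the point (3,2,5,1) maps to (15,32)
```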
So, we converted a point in one graph with coordinates (3,2,5,1) to (15,32). Note that two dimensions are gone. Now, to determine whether a vector b in Rm is in the range of T (or, whether there is an x whose image under T is b), all we need to do is solve the equation (suppose b = (21, −18)T ):

[ 4 · x1 − 3 · x2 + 1 · x3 + 4 · x4 ]   [  21 ]
[ 2 · x1 + 0 · x2 + 5 · x3 + 1 · x4 ] = [ −18 ]
This leads to a solution set with two free variables, thus there are infinitely many solutions. However, it can of course also happen (had the coefficient matrix had more rows) that there would be just one solution, or even none at all, if b is not the image of any x under T .
5 Note that you’d need to check every vector to be sure. So, it takes a while to prove that a set is linearly independent this way, but if you quickly see that one vector is a linear combination of the others, then it’s quicker than solving Ax = 0.
Definition

A transformation T is linear if

T (u + v) = T (u) + T (v)

and

T (cu + dv) = cT (u) + dT (v)        (1.18)

for all vectors u, v in the domain of T and all scalars c, d. That these properties hold for any matrix transformation should be easy to see. The following theorem applies:
Theorem 10

Let T : Rn → Rm be a linear transformation. Then there exists a unique matrix A such that

T (x) = Ax for all x in Rn

In fact, A is the m × n matrix whose jth column is the vector T (ej ), where ej is the jth column of the identity matrix in Rn :

A = [ T (e1 ) · · · T (en ) ]

Proof

Write x = In x = [ e1 · · · en ] x = x1 e1 + · · · + xn en , and use the linearity of T to compute

T (x) = T (x1 e1 + · · · + xn en ) = x1 T (e1 ) + · · · + xn T (en ) = [ T (e1 ) · · · T (en ) ] x = Ax

The matrix A that results is called the standard matrix for the linear transformation T .
What the above theorem actually means is the following: suppose we are given that e1 (the point (1,0,0)) is transformed into (0,4), e2 into (2,3), and e3 into (−3,1)6 . Then

A = [ 0 2 −3 ]
    [ 4 3  1 ]

Wonderfully simple, isn’t it? So, if they for example say that something is first turned π/4 radians and then reflected in the x2 -axis, it is key to draw a sketch of this. We may pick any vectors to establish our standard matrix, but realizing that we just read something useful, it’s much more convenient to start with e1 and e2 . What would happen with them if we apply said transformation? e1 first becomes the point ( √2/2, √2/2 ) and e2 becomes ( −√2/2, √2/2 ). Then, considering we reflect in the x2 -axis, the x1 coordinates switch signs (as they come to lie on the opposite side of where they were beforehand). So, e1 ends up at ( −√2/2, √2/2 ) and e2 at ( √2/2, √2/2 ). So, the standard matrix equals:

A = [ −√2/2  √2/2 ]
    [  √2/2  √2/2 ]
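You can verify this standard matrix numerically: apply the “rotate by π/4, then reflect in the x2 -axis” transformation to e1 and e2 and compare with the columns of A. A small Python sketch (the function name transform is mine):

```python
import math

def transform(x):
    """Rotate by pi/4 counterclockwise, then reflect in the x2-axis."""
    c, s = math.cos(math.pi / 4), math.sin(math.pi / 4)
    x1, x2 = c * x[0] - s * x[1], s * x[0] + c * x[1]   # rotation
    return (-x1, x2)                                    # reflection flips x1

# Theorem 10: the columns of the standard matrix are the images of e1 and e2
col1 = transform((1, 0))
col2 = transform((0, 1))
h = math.sqrt(2) / 2
A = [[-h, h],
     [ h, h]]
assert abs(col1[0] - A[0][0]) < 1e-12 and abs(col1[1] - A[1][0]) < 1e-12
assert abs(col2[0] - A[0][1]) < 1e-12 and abs(col2[1] - A[1][1]) < 1e-12
```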
Definition

A mapping T : Rn → Rm is said to be one-to-one if each b in Rm is the image of at most one x in Rn . That is, for each b, the augmented matrix may not contain a free variable (the system may have no solution, but it may never have more than one solution).

6 So T (e1 ) = (0, 4)T , T (e2 ) = (2, 3)T , T (e3 ) = (−3, 1)T .
Thus, a transformation can be both onto and one-to-one, it can be either of them, or it can be neither.
Theorem 11
Let T : Rn → Rm be a linear transformation. Then T is one-to-one if and only if the equation T (x) = 0 has only the trivial solution.
Proof

i. Remember from section 1.4: the columns of A span Rm if and only if for each b in Rm the equation Ax = b is consistent, that is, if and only if for every b the equation T (x) = b has a solution. This is true if and only if T maps Rn onto Rm .

ii. The equations T (x) = b and Ax = b are the same, except for notation. Remember that Ax = 0 only has the trivial solution if the columns of A are linearly independent. Using the previous theorem, that means that T is one-to-one if and only if the columns of A are linearly independent.
a. A + B = B + A;
b. (A + B) + C = A + (B + C);
c. A + 0 = A;
d. r (A + B) = rA + rB;
e. (r + s) A = rA + sA;
f. r (sA) = (rs) A.
CHAPTER 2. MATRIX ALGEBRA
This can be done for all matrix multiplications. The more general form is:

Definition

If A is an m × n matrix, and if B is an n × p matrix with columns b1 , ..., bp , then AB is the m × p matrix whose columns are Ab1 , ..., Abp . That is,

AB = A [ b1 b2 · · · bp ] = [ Ab1 Ab2 · · · Abp ]

If the product AB is defined, then the entry in row i and column j of AB is the sum of the products of corresponding entries from row i of A and column j of B. If (AB)ij denotes the (i, j)-entry in AB, and if A is an m × n matrix, then

(AB)ij = ai1 b1j + ai2 b2j + · · · + ain bnj
Note that if you want to calculate a certain row of the matrix AB, you only have to calculate the product of that row of A with the matrix B.
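The column-by-column definition of AB is easy to mirror in code. A sketch (the helpers are my own):

```python
def mat_vec(A, x):
    """The matrix-vector product Ax."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def mat_mul(A, B):
    """AB column by column: column j of AB is A times (column j of B)."""
    p = len(B[0])
    cols = [mat_vec(A, [B[i][j] for i in range(len(B))]) for j in range(p)]
    # reassemble the columns Ab_1, ..., Ab_p into a matrix
    return [[cols[j][i] for j in range(p)] for i in range(len(A))]

A = [[2, 3],
     [1, -5]]
B = [[4, 3, 6],
     [1, -2, 3]]
assert mat_mul(A, B) == [[11, 0, 21],
                         [-1, 13, -9]]
```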
Theorem 2
Let A be an m × n matrix, and let B and C have sizes for which the indicated sums and products are defined.
A (BC) = (AB) C: associative law of multiplication;
If a product AB is the zero matrix, you cannot conclude in general that either A = 0 or
B = 0.
In general, the transpose of a product of matrices equals the product of their transposes in the reverse
order.
An n × n matrix A is said to be invertible if there is an n × n matrix C such that

CA = I
AC = I

where I = In is the n × n identity matrix. In this case C is the inverse of A. In fact, there is only one C for each matrix A, because if B were another inverse of A, then B = BI = B (AC) = (BA) C = IC = C. This unique inverse is denoted by A−1 , so

A−1 A = I        (2.1)
AA−1 = I        (2.2)

A matrix that is not invertible is sometimes called a singular matrix, and an invertible matrix is called a nonsingular matrix.
Theorem 4

Let A = [ a b ]
        [ c d ]

If ad − bc ≠ 0, then A is invertible and

A−1 = 1/(ad − bc) · [  d −b ]
                    [ −c  a ]

The quantity ad − bc is called the determinant of A:

det A = ad − bc
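Theorem 4 in code, with a sanity check that the formula really produces an inverse (my own sketch, using exact fractions):

```python
from fractions import Fraction

def inverse_2x2(A):
    """A^-1 = (1 / (ad - bc)) * [[d, -b], [-c, a]] when ad - bc != 0."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("det A = 0, so A is singular (not invertible)")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

A = [[3, 4],
     [5, 6]]
Ainv = inverse_2x2(A)

# check that A * A^-1 = I
prod = [[sum(A[i][k] * Ainv[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)]
assert prod == [[1, 0], [0, 1]]
```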
Theorem 5

If A is an invertible n × n matrix, then for each b in Rn , the equation Ax = b has the unique solution x = A−1 b.

Proof

This proof is rather easy:

Ax = b
A−1 Ax = A−1 b
Ix = A−1 b
x = A−1 b
Three useful facts about invertible matrices are:

Theorem 6

a. If A is an invertible matrix, then A−1 is invertible and (A−1 )−1 = A.

b. If A and B are n × n invertible matrices, then so is AB, and the inverse of AB is the product of the inverses of A and B in the reverse order. That is,

(AB)−1 = B −1 A−1

c. If A is invertible, then so is AT , and (AT )−1 = (A−1 )T .
Proof

(a) can be proven by noticing that the equations A−1 C = I and CA−1 = I have the solution C = A, so that (A−1 )−1 = A.

(b) can be proven by writing out

(AB) (B −1 A−1 ) = A (BB −1 ) A−1 = AIA−1 = AA−1 = I

Note that if we had tried (AB)−1 = A−1 B −1 , we could not write BB −1 = I, because they wouldn’t be next to each other, and you can’t simply interchange matrices when multiplying.

(c) can be proven by using (A−1 )T AT = (AA−1 )T = I T = I. (b) can be generalized to:

The product of n × n invertible matrices is invertible, and the inverse is the product of their inverses in the reverse order.
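A quick numerical check of Theorem 6(b), including the warning that the order matters (the helpers are my own, for 2 × 2 matrices only):

```python
from fractions import Fraction

def mul(A, B):
    """The product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv(A):
    """The 2x2 inverse formula of Theorem 4."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

A = [[1, 2], [3, 5]]
B = [[2, 0], [1, 1]]
# Theorem 6(b): the inverse of AB is B^-1 A^-1, in the reverse order ...
assert inv(mul(A, B)) == mul(inv(B), inv(A))
# ... and the other order generally fails
assert inv(mul(A, B)) != mul(inv(A), inv(B))
```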
An elementary matrix is one that is obtained by performing just a single elementary row operation
on an identity matrix.
If an elementary row operation is performed on an m × n matrix A, the resulting matrix can be written as EA, where the m × m matrix E is created by performing the same row operation on Im .
Each elementary matrix E is invertible. The inverse of E is the elementary matrix of the
same type that transforms E back into I.
The second one is quite logical: since row operations are reversible, if there’s been only one row operation on I, then there must be another operation of the same type that changes E back into I, thus F E = I and EF = I. The three row operations are: interchanging two rows, multiplying a row by a nonzero constant, and adding (a multiple of) a row to another row.
Final theorem on this for now:
Theorem 7

An n × n matrix A is invertible if and only if A is row equivalent to In , and in this case, any sequence of elementary row operations that reduces A to In also transforms In into A−1 .
We can use this to make an algorithm for finding A−1 . If we place A and I side by side to form an augmented matrix [ A I ] and then row reduce it, then, if A is row equivalent to I (that is, if you can get a nice square identity matrix in the left half of your reduced matrix), [ A I ] is row equivalent to [ I A−1 ]. If you weren’t able to get the identity matrix on the left, then A does not have an inverse.
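This [ A I ] algorithm is also easy to code up. The sketch below row reduces [ A I ] with exact fractions and returns the right half (the helper name invert is mine); the example matrix is one you can also check by hand:

```python
from fractions import Fraction

def invert(A):
    """Row reduce [A | I]; if the left half becomes I, the right half is A^-1."""
    n = len(A)
    M = [[Fraction(A[i][j]) for j in range(n)] +
         [Fraction(1) if i == j else Fraction(0) for j in range(n)]
         for i in range(n)]
    for col in range(n):
        # find a usable pivot at or below the diagonal
        pick = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pick is None:
            raise ValueError("A is not row equivalent to I, so A has no inverse")
        M[col], M[pick] = M[pick], M[col]
        piv = M[col][col]
        M[col] = [x / piv for x in M[col]]      # scale the pivot to 1
        for r in range(n):                      # clear the rest of the column
            if r != col and M[r][col] != 0:
                c = M[r][col]
                M[r] = [a - c * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

A = [[0, 1, 2],
     [1, 0, 3],
     [4, -3, 8]]
Ainv = invert(A)

# sanity check: A * A^-1 = I
prod = [[sum(A[i][k] * Ainv[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)]
assert prod == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```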
Theorem 8 (The Invertible Matrix Theorem)

Let A be a square n × n matrix. Then the following statements are equivalent:

a. A is an invertible matrix.
b. A is row equivalent to the n × n identity matrix.
c. A has n pivot positions.
d. The equation Ax = 0 has only the trivial solution.
e. The columns of A form a linearly independent set.
f. The linear transformation x → Ax is one-to-one.
g. The equation Ax = b has at least one solution for each b in Rn .
h. The columns of A span Rn .
i. The linear transformation x → Ax maps Rn onto Rn .
j. There is an n × n matrix C such that CA = I.
k. There is an n × n matrix D such that AD = I.
l. AT is an invertible matrix.

Apply these statements to establish whether a given matrix is invertible. The easiest way is to row reduce it and see whether it indeed has pivot positions in each row.
Proof

If (a) is true, then so is (j), because A−1 can be plugged in for C. If (j) is true, then so is (d), because the solution to Ax = b is x = A−1 b, and if b = 0, then x = 0, so the only solution is the trivial solution. Since there is only this solution, there may not be a free variable, thus A has n pivot columns, thus (c) is true as well. Since A also has n rows, the pivot positions must lie on the main diagonal, and the matrix may be row reduced to the identity matrix, thus (b) is true as well. If (a) is true, then so is (k), because A−1 can be plugged in for D. If (k) is true, then so is (g), because we can substitute x with A−1 b to get A (A−1 b) = b, so each b has at least one solution. If for each b in Rn there is a solution, there must be a pivot position in each row, thus n pivot positions in total. Thus, if (g) is true, then so is (c) and so is (a). (g), (h) and (i) are equivalent for each matrix, so if (g) is true, then so are (h) and (i). (e) follows from (d), and from (e) follows (f). Finally, (l) follows from (a) and vice versa, using theorem 6(c) in section 2.2.
A linear transformation T : Rn → Rn is said to be invertible if there exists a function S : Rn → Rn such that

S (T (x)) = x for all x in Rn        (2.3)
T (S (x)) = x for all x in Rn        (2.4)
Theorem 9
Let T : Rn → Rn be a linear transformation and let A be the standard matrix for T . Then T is
invertible if and only if A is an invertible matrix. In that case, the linear transformation S given
by S (x) = A−1 x is the unique function satisfying equations (2.3) and (2.4).
2.4 Subspaces of Rn
Definition
A subspace of Rn is any set H in Rn that has three properties:
a. The zero vector is in H.
b. For each u and v in H, the sum u + v is in H.
c. For each u in H and each scalar c, the vector cu is in H.

It’s almost the same as a span, but with the added requirements (b) and (c). However, pretty much all of the vector sets you’ve seen thus far are subspaces.
Two subspaces are of special interest: the subspace which contains all linear combinations of the
columns of A, and the subspace which contains all solutions of the homogeneous equation Ax = 0. We
have special names for these two:
Definition
The column space of a matrix A is the set Col A of all linear combinations of the columns of A.
So, for example, to check whether b = (3, 3, −4) is in the column space of A = [ 1 −3 −4 ; −4 6 −2 ; −3 7 6 ], we row reduce the augmented matrix:

[ 1 −3 −4 | 3 ; −4 6 −2 | 3 ; −3 7 6 | −4 ] ∼ [ 1 −3 −4 | 3 ; 0 −6 −18 | 15 ; 0 −2 −6 | 5 ] ∼ [ 1 −3 −4 | 3 ; 0 2 6 | −5 ; 0 0 0 | 0 ]
Thus Ax = b is consistent and b is in Col A. Please note that if A is an m × n matrix, then Col A is a subspace of Rm (m being the number of rows). On the other hand, we have the null space:
Definition
The null space of a matrix A is the set Nul A of all solutions of the homogeneous equation
Ax = 0.
It will be clearer later, but for now, just trust me when I say:
Theorem 12
The null space of an m × n matrix A is a subspace of Rn . Equivalently, the set of all solutions of
a system Ax = 0 of m homogeneous linear equations in n unknowns is a subspace of Rn .
This makes sense, because a solution of a system that's 2 × 5 would contain 5 variables, x1 , ..., x5 . One
more definition:
Definition
A basis for a subspace H of Rn is a linearly independent set in H that spans H.
This is quite a logical term: we can have subspaces that have many ’useless’ vectors, that is, vectors
that are merely a combination of the other vectors in it. By using a basis, we are only left with the
fundamental vectors in the subspace.
Now, if we for example want to find a basis for the null space of the matrix

A = [ −3 6 −1 1 −7 ; 1 −2 2 3 −1 ; 2 −4 5 8 −4 ]

we row reduce [A | 0] to [ 1 −2 0 −1 3 | 0 ; 0 0 1 2 −2 | 0 ; 0 0 0 0 0 | 0 ], so x1 = 2x2 + x4 − 3x5 and x3 = −2x4 + 2x5 , with x2 , x4 and x5 free. The general solution is therefore

x = x2 u + x4 v + x5 w, with u = (2, 1, 0, 0, 0), v = (1, 0, −2, 1, 0), w = (−3, 0, 2, 0, 1)

So {u, v, w} generates Nul A. Note again that since we have five columns in A, the null space is also in R5 .
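Numerically, an orthonormal basis for Nul A can be read off from the SVD: the right singular vectors belonging to (numerically) zero singular values span the null space. A hedged sketch using numpy (this gives a different basis than u, v, w above, but it spans the same subspace):

```python
import numpy as np

A = np.array([[-3.0, 6.0, -1.0, 1.0, -7.0],
              [1.0, -2.0, 2.0, 3.0, -1.0],
              [2.0, -4.0, 5.0, 8.0, -4.0]])

# Full SVD: the rows of Vt without a significant singular value span Nul A.
U, s, Vt = np.linalg.svd(A)
tol = 1e-10
rank = int(np.sum(s > tol))
null_basis = Vt[rank:].T          # columns form an orthonormal basis of Nul A
print(null_basis.shape)           # (5, 3): three basis vectors in R^5

# Sanity check: A maps every basis vector to (numerically) zero.
print(np.allclose(A @ null_basis, 0))
```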
Now, what would be the basis for the column space of A? Denote the columns of the row reduced matrix B by b1 , ..., b5 . Note that b2 = −2b1 , b4 = −b1 + 2b3 , and b5 = 3b1 − 2b3 . So b2 , b4 and b5 are all merely linear combinations of b1 and b3 . Remember that the equations Ax = 0 and Bx = 0 have the same solution set, so they must have the exact same linear dependence relationships. Ergo, a2 , a4 and a5 are also linear combinations of a1 and a3 . Therefore, the basis of the column space of A is formed by a1 and a3 , so:

Basis for Col A = { (−3, 1, 2) , (−1, 2, 5) }
Theorem 13
The pivot columns of a matrix A form a basis for the column space of A.
Remember to always use the columns of the original matrix, not the columns of the reduced matrix.
Now, please refer to chapter 4.1.
Definition
Suppose that the set B = {b1 , ..., bp } is a basis for subspace H. For each x in H, the coordinates
of x relative to basis B are the weights c1 , ..., cp such that x = c1 b1 + · · · + cp bp , and the vector
in Rp
[x]B = (c1 , ..., cp )

is called the coordinate vector of x (with respect to B). Such coordinate vectors show, for example, that a plane H 'looks' like R2 . Indeed, we call this the dimension of H. In a linearly independent set, thus a basis, the number of vectors in H is equal to the dimension. So:
Definition
The dimension of a nonzero subspace H, denoted by dim H, is the number of vectors in any
basis for H. The dimension of the zero subspace {0} is defined to be zero.
The definition of rank and theorems 14 and 16 of this chapter are treated in chapter 4.1.
Theorem 15: The Basis Theorem
Let H be a p-dimensional subspace of Rn . Any linearly independent set of exactly p elements in
H is automatically a basis for H. Also, any set of p elements of H that spans H is automatically
a basis for H.
What we do then is just pick any row or column we like, say the first row. Then the determinant of A can be written as:

det A = a11 det A11 − a12 det A12 + a13 det A13 − · · ·

Note two things: the plus and minus signs alternate, and the A11 , A12 , etc. indicate the submatrices obtained by leaving out the first row and first column, the first row and second column, and so on. So, for

A = [ 0 4 −1 ; 1 0 7 ; 4 −2 0 ]

we get A11 = [ 0 7 ; −2 0 ], A12 = [ 1 7 ; 4 0 ] and A13 = [ 1 0 ; 4 −2 ]. Now det A11 = 0 · 0 − 7 · (−2) = 14, det A12 = 1 · 0 − 7 · 4 = −28 and det A13 = 1 · (−2) − 0 · 4 = −2. This process must be repeated for each submatrix, so for large matrices it really takes some while (for a 4 × 4 matrix, it already takes 12 of these 2 × 2 determinants).
This leads to a more general definition:
Definition
For n ≥ 2, the determinant of an n × n matrix A = [aij ] is the sum of n terms of the form ±a1j det A1j , with plus and minus signs alternating, where the entries a11 , a12 , ..., a1n are from the first row of A. In symbols:

det A = a11 det A11 − a12 det A12 + · · · + (−1)^(1+n) a1n det A1n = Σ_{j=1}^{n} (−1)^(1+j) a1j det A1j
For example, let A = [ 1 5 0 ; 2 4 −1 ; 0 −2 0 ], then

det A = 1 · det [ 4 −1 ; −2 0 ] − 5 · det [ 2 −1 ; 0 0 ] + 0 · det [ 2 4 ; 0 −2 ]
      = 1 (0 − 2) − 5 (0 − 0) + 0 (−4 − 0) = −2
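The definition above translates directly into a (very inefficient, but instructive) recursive routine. This sketch, written for illustration, expands along the first row exactly as in the formula:

```python
def det(M):
    """Determinant by cofactor expansion along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        # Submatrix M_1j: drop row 1 and column j.
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

A = [[1, 5, 0],
     [2, 4, -1],
     [0, -2, 0]]
print(det(A))  # -2, as computed above
```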
The signed determinants of these submatrices are called cofactors: the (i, j)-cofactor of A is the number Cij given by:

Cij = (−1)^(i+j) det Aij     (3.1)

The previous calculation used cofactor expansion across the first row. We can, however, pick any row or column to expand along:
Theorem 1
The determinant of an n × n matrix A can be computed by a cofactor expansion across any row or down any column. The expansion across the ith row using cofactors is:

det A = ai1 Ci1 + ai2 Ci2 + · · · + ain Cin
The plus and minus sign in the (i, j)-cofactor depends on the position of aij in the matrix, regardless of the sign of aij itself. The factor (−1)^(i+j) determines the following checkerboard pattern of signs:

[ + − + · · · ]
[ − + −       ]
[ + − +       ]
[ · · ·       ]
Choosing smartly along which column and rows we use cofactor expansion, we can significantly reduce
the computation time. For example, compute det A, where
A = [ 3 −7 8 9 −6 ; 0 2 −5 7 3 ; 0 0 1 5 0 ; 0 0 2 4 −1 ; 0 0 0 −2 0 ]
Note that e.g. C31 indicates the cofactor belonging to the submatrix created by leaving out the 3rd row and 1st column. If we now again cofactor expand along the first column, we get:
det A = 3 · 2 · det [ 1 5 0 ; 2 4 −1 ; 0 −2 0 ]     (3.2)
We saw before that the determinant of this submatrix equaled −2, and hence det A = 3 · 2 · (−2) = −12.
Theorem 2
If A is a triangular matrix, then det A is the product of the entries on the main diagonal of A.
Note that if A for example is a 4 × 4 matrix, then det 3A would be 3^4 det A, as we multiply each of the four rows by three.
If we combine this with theorem 2, we can for example easily compute det A, where A = [ 2 −8 6 8 ; 3 −9 5 10 ; −3 0 1 −2 ; 1 −4 0 6 ].
First, factoring out 2 in the top row and then adding multiples of this row to the other rows:
det A = 2 · det [ 1 −4 3 4 ; 3 −9 5 10 ; −3 0 1 −2 ; 1 −4 0 6 ] = 2 · det [ 1 −4 3 4 ; 0 3 −4 −2 ; 0 −12 10 10 ; 0 0 −3 2 ]
Leading to:

det A = 2 · det [ 1 −4 3 4 ; 0 3 −4 −2 ; 0 0 −6 2 ; 0 0 −3 2 ] = 2 · det [ 1 −4 3 4 ; 0 3 −4 −2 ; 0 0 −6 2 ; 0 0 0 1 ] = 2 · 1 · 3 · (−6) · 1 = −36
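As a quick check of the hand computation (not in the original notes), numpy agrees:

```python
import numpy as np

A = np.array([[2.0, -8.0, 6.0, 8.0],
              [3.0, -9.0, 5.0, 10.0],
              [-3.0, 0.0, 1.0, -2.0],
              [1.0, -4.0, 0.0, 6.0]])
print(round(np.linalg.det(A)))  # -36
```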
Note that a square matrix A is invertible if and only if A has n pivot positions. If, after row reduction, it becomes apparent that A does not have a pivot position in its final row, then A is not invertible, and its determinant is equal to zero. Ergo:
Theorem 4
A square matrix A is invertible if and only if det A ≠ 0.
Furthermore, det A = 0 when the columns of A are linearly dependent, and also when the rows of A are linearly dependent.¹
Theorem 5: Multiplicative Property
If A and B are n × n matrices, then det AB = (det A) (det B).
¹ The rows of A are the columns of AT , and linearly dependent columns of AT make AT singular (non-invertible). When AT is singular, so is A, by the invertible matrix theorem.
Now, suppose we have Ax = b, and try out the following, just for the heck of it:
A · Ii (x) = A [ e1 · · · x · · · en ] = [ Ae1 · · · Ax · · · Aen ] = [ a1 · · · b · · · an ] = Ai (b)
If you’re a bit confused with what this actually means, hang on just a second, we’re almost there. We
can use the multiplicative property of determinants to write:

det A · det Ii (x) = det (A · Ii (x)) = det Ai (b)

The determinant of an identity matrix in which we replaced one column with a column x would for example look like:
[ 1 0 3 0 ; 0 1 −2 0 ; 0 0 5 0 ; 0 0 2 1 ]

It should be obvious that by adding multiples of row 3 to the other rows, this can be rewritten as:

[ 1 0 0 0 ; 0 1 0 0 ; 0 0 5 0 ; 0 0 0 1 ]
Meaning that the determinant is simply 5. Thus, det Ii (x) is simply equal to the value of the ith entry
in x, or xi . This leads to the following theorem:
Theorem 7: Cramer’s Rule
Let A be an invertible n × n matrix. For any b in Rn , the unique solution x of Ax = b has entries
given by
xi = det Ai (b) / det A     (3.3)
Thus, we can calculate the ith entry in the vector x by dividing the determinant of the matrix we get when substituting b into the ith column of A by det A. So, for example, if we have the system:
3sx1 − 2x2 = 4
−6x1 + sx2 = 1
Calculate the value(s) for s for which the system has a unique solution (you don’t need Cramer’s rule for
this) and describe the solution (you need Cramer’s rule for this). First, the three applicable matrices are:
A = [ 3s −2 ; −6 s ] , A1 (b) = [ 4 −2 ; 1 s ] , A2 (b) = [ 3s 4 ; −6 1 ]
det A = 3s^2 − 12 = 3 (s + 2) (s − 2)

so the system has a unique solution whenever s ≠ ±2. In that case, det A1 (b) = 4s + 2 and det A2 (b) = 3s + 24, so by Cramer's rule x1 = (4s + 2) / (3s^2 − 12) and x2 = (3s + 24) / (3s^2 − 12).
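Cramer's rule is straightforward to code for a concrete value of s; the sketch below (with s = 1 chosen arbitrarily) builds each Ai (b) by substituting b into the ith column:

```python
import numpy as np

def cramer(A, b):
    """Solve Ax = b via Cramer's rule (A must be square and invertible)."""
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b          # substitute b into the i-th column
        x[i] = np.linalg.det(Ai) / d
    return x

s = 1.0                        # any s other than +/-2 gives a unique solution
A = np.array([[3 * s, -2.0], [-6.0, s]])
b = np.array([4.0, 1.0])
x = cramer(A, b)
print(np.allclose(A @ x, b))   # True
```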
We can also use Cramer's rule to find a formula for A−1 . The jth column of A−1 is a vector x that satisfies

Ax = ej

where the ith entry of x is the (i, j)-entry of A−1 . By Cramer's rule, it is:

(i, j)-entry of A−1 = xi = det Ai (ej ) / det A
If we do a cofactor expansion down column i of Ai (ej ), we basically get all zeroes multiplied with a subdeterminant, except for the jth row (ej consists of only zeroes, except for a single one in the jth entry; thus, expanding along this column, every term is 0 except the cofactor belonging to the jth row). So, we can write:

det Ai (ej ) = (−1)^(i+j) det Aji = Cji

Thus, the (i, j) entry in A−1 equals Cji / det A, leading to the general matrix:
A−1 = (1 / det A) · [ C11 C21 · · · Cn1 ; C12 C22 · · · Cn2 ; · · · ; C1n C2n · · · Cnn ]     (3.4)
The last part is usually called the adjugate (or classical adjoint) of A. The general theorem is:
Theorem 8: An Inverse Formula
Let A be an invertible n × n matrix. Then
A−1 = (1 / det A) adj A     (3.5)
Let's do an example just for clarification. Find the inverse of the matrix A = [ 2 1 3 ; 1 −1 1 ; 1 4 −2 ]. The determinant can be found to equal 14 (use your graphical calculator, for instance). The nine cofactors are:
C11 = + det [ −1 1 ; 4 −2 ] = −2 , C12 = − det [ 1 1 ; 1 −2 ] = 3 , C13 = + det [ 1 −1 ; 1 4 ] = 5
C21 = − det [ 1 3 ; 4 −2 ] = 14 , C22 = + det [ 2 3 ; 1 −2 ] = −7 , C23 = − det [ 2 1 ; 1 4 ] = −7
C31 = + det [ 1 3 ; −1 1 ] = 4 , C32 = − det [ 2 3 ; 1 1 ] = 1 , C33 = + det [ 2 1 ; 1 −1 ] = −3
We must take the transpose of this cofactor matrix so that everything ends up in the right place in the adjugate:

adj A = [ −2 14 4 ; 3 −7 1 ; 5 −7 −3 ]
And thus

A−1 = (1/14) [ −2 14 4 ; 3 −7 1 ; 5 −7 −3 ] = [ −1/7 1 2/7 ; 3/14 −1/2 1/14 ; 5/14 −1/2 −3/14 ]
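The cofactor/adjugate recipe can be verified numerically. This sketch recomputes adj A from the cofactors and checks A−1 = adj A / det A against numpy's built-in inverse:

```python
import numpy as np

A = np.array([[2.0, 1.0, 3.0],
              [1.0, -1.0, 1.0],
              [1.0, 4.0, -2.0]])
n = A.shape[0]

# Cofactor matrix: C[i, j] = (-1)^(i+j) * det(A with row i and column j removed).
C = np.empty((n, n))
for i in range(n):
    for j in range(n):
        minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
        C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)

adj = C.T                                    # adjugate = transpose of cofactors
A_inv = adj / np.linalg.det(A)
print(np.allclose(A_inv, np.linalg.inv(A)))  # True
```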
Note that it is often necessary to translate the parallelogram/parallelepiped such that one of the points is at the origin. Translation does not influence the area, however. So, for example, suppose we have the parallelogram with points (−2, −2), (0, 3), (4, −1) and (6, 4). If we 'add' (2, 2) to each point, we end up with the coordinates (0, 0), (2, 5), (6, 1) and (8, 6), see figure 3.1.
Proof
A parallelogram at the origin in R2 determined by vectors b1 and b2 has the form
S = {s1 b1 + s2 b2 : 0 ≤ s1 ≤ 1, 0 ≤ s2 ≤ 1}

The image of S under the map T (x) = Ax consists of points of the form

T (s1 b1 + s2 b2 ) = s1 T (b1 ) + s2 T (b2 ) = s1 Ab1 + s2 Ab2

It follows that T (S) is the parallelogram determined by the columns of the matrix [ Ab1 Ab2 ]. This matrix can be written as AB, where B = [ b1 b2 ]. Thus,

{area of T (S)} = |det AB| = |det A| · |det B| = |det A| · {area of S}     (3.8)
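Equation (3.8) is easy to check numerically for arbitrary matrices (a sketch, with randomly generated A and B):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2))            # the linear map T(x) = Ax
B = rng.standard_normal((2, 2))            # columns b1, b2 span S

area_S = abs(np.linalg.det(B))
area_TS = abs(np.linalg.det(A @ B))        # T(S) is spanned by the columns of AB
print(np.isclose(area_TS, abs(np.linalg.det(A)) * area_S))  # True
```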
4.1 Rank
If we have an m × n = 4 × 5 matrix:

A = [ −2 −5 8 0 −17 ; 1 3 −5 1 5 ; 3 11 −19 7 1 ; 1 7 −13 5 −3 ]
Each row has n entries, and the set of all linear combinations of the row vectors is called the row space. So, for this matrix, r1 = (−2, −5, 8, 0, −17), r2 = (1, 3, −5, 1, 5), r3 = (3, 11, −19, 7, 1) and r4 = (1, 7, −13, 5, −3), and Row A = Span {r1 , r2 , r3 , r4 }. The vectors can be written out either horizontally or vertically. Note that if we transpose A, the column space of AT will be equal to the row space of A.
Theorem 13
If two matrices A and B are row equivalent, then their row spaces are the same. If B is in echelon form, the nonzero rows of B form a basis for the row space of A as well as for that of B.
Proof
If B is obtained from A by row operations, the rows of B are linear combinations of the rows of A. It follows that any linear combination of the rows of B is automatically a linear combination of the rows of A. Thus the row space of B is contained in the row space of A. Since row operations are reversible, the row space of A is likewise contained in the row space of B. If B is in echelon form, the nonzero rows are linearly independent, because no nonzero row is a linear combination of the nonzero rows below it (just look at:
1 3 4 2
0 2 3 −1
0 0 1 2
It is obvious that the second row cannot be produced by the third row, and neither can the first row by the second and third rows). Thus, the nonzero rows of B form a basis of the row space of B, which is equal to the row space of A. Thus, to find the row space of a matrix, you row reduce it so that it is apparent whether there are any zero rows in it; the row space is then spanned by the nonzero rows. Now, please note that you should use the rows of the B matrix here, not the original rows of matrix A.
Definition
The rank of A is the dimension of the column space of A.
The dimension will be explained in more detail in chapter 2.9. However, you can compare it with this:
suppose you have two (linearly independent) vectors in R3 . Now, with these two vectors, you are able to
reach all points on a certain plane given by these two vectors. So, in a sense, there is something that's only 2D about this set of vectors, even though they're in R3 . The dimension refers to this: if you have two vectors in a basis (because you need the set to be linearly independent), then the dimension is 2, because you can only reach points in a set that 'looks' like R2 . So, if we have three vectors in our column space basis, even if they have thousands of rows each, the dimension is only 3.
Now, note that each pivot position in an m × n matrix led to a vector in the basis of the column space. Furthermore, each column that did not contain a pivot position led to a free variable, thus a vector in the null space. Since a column either is or is not a pivot column, this means that the total number of columns n is equal to the sum of the rank of A and the dimension of the null space of A. So, if I have a 7 × 9 matrix with 7 pivot positions, then I would have a rank of 7, and since n = 9, my null space would have a dimension of 2. Note that the rank of a matrix can never be larger than m, because there can be at most one pivot position per row.
The precise theorem is:
Theorem 14
The dimensions of the column space and the row space of an m × n matrix A are equal. This
common dimension, the rank of A, also equals the number of pivot positions in A and satisfies the
equation:
rank A + dim Nul A = n
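The rank theorem can be illustrated on the 4 × 5 matrix from the start of this section; numpy's matrix_rank gives the rank, and dim Nul A follows as n − rank:

```python
import numpy as np

A = np.array([[-2.0, -5.0, 8.0, 0.0, -17.0],
              [1.0, 3.0, -5.0, 1.0, 5.0],
              [3.0, 11.0, -19.0, 7.0, 1.0],
              [1.0, 7.0, -13.0, 5.0, -3.0]])

m, n = A.shape
rank = np.linalg.matrix_rank(A)
dim_nul = n - rank                 # rank A + dim Nul A = n
print(rank, dim_nul)               # 3 2
```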
Let A be an n × n matrix. Then the following statements are each equivalent to the statement
that A is an invertible matrix.
m. The columns of A form a basis of Rn .
n. Col A = Rn .
o. dim Col A = n.
p. Rank A = n.
q. Nul A = {0}.
r. dim Nul A = 0.
Proof
Statement (m) is logically equivalent to statements (e) and (h) regarding linear independence and spanning. From (g), it follows that (n) is true, and so is (o), and so are (p), (r), (q) and finally (d), so that the loop is complete.
This only happens for certain vectors, and this section is about these vectors. We call them eigenvectors, and the scalar multiple (so if, for example, Av = 2v, then the scalar multiple is 2) is called the eigenvalue. To be precise:
Definition
An eigenvector of an n × n matrix A is a nonzero vector x such that Ax = λx for some scalar λ.
A scalar λ is called an eigenvalue of A if there is a nontrivial solution x of Ax = λx; such an x
is called an eigenvector corresponding to λ.
It is easy to determine if a given vector/scalar is an eigenvector/eigenvalue of a matrix. Let's consider A = [ 1 6 ; 5 2 ], u = (6, −5), and v = (3, −2), and let's determine whether u and v are eigenvectors of A:

Au = (1 · 6 + 6 · (−5) , 5 · 6 + 2 · (−5)) = (−24, 20) = −4 (6, −5) = −4u
Av = (1 · 3 + 6 · (−2) , 5 · 3 + 2 · (−2)) = (−9, 11) ≠ λ (3, −2) for any λ
Thus, u is an eigenvector with eigenvalue −4, while v is not an eigenvector. Now, suppose we have to show that 7 is an eigenvalue of matrix A, and to find the corresponding eigenvectors. For 7 to be an eigenvalue, the equation

Ax = 7x     (5.1)

must have a nontrivial solution, which we can rewrite as:

Ax − 7x = 0
(A − 7I) x = 0
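Both checks are one-liners in numpy; this sketch verifies that u is an eigenvector of A and that 7 is an eigenvalue (i.e. A − 7I is singular):

```python
import numpy as np

A = np.array([[1.0, 6.0], [5.0, 2.0]])
u = np.array([6.0, -5.0])

print(np.allclose(A @ u, -4 * u))            # True: u is an eigenvector
# 7 is an eigenvalue exactly when det(A - 7I) = 0:
print(abs(np.linalg.det(A - 7 * np.eye(2))) < 1e-10)  # True
```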
Example
Let A = [ 4 −1 6 ; 2 1 6 ; 2 −1 8 ]. An eigenvalue of A is 2. Find a basis for the corresponding eigenspace.

A − 2I = [ 4 −1 6 ; 2 1 6 ; 2 −1 8 ] − [ 2 0 0 ; 0 2 0 ; 0 0 2 ] = [ 2 −1 6 ; 2 −1 6 ; 2 −1 6 ]

Row reducing the augmented matrix leads to:

[ 2 −1 6 | 0 ; 2 −1 6 | 0 ; 2 −1 6 | 0 ] ∼ [ 2 −1 6 | 0 ; 0 0 0 | 0 ; 0 0 0 | 0 ]

At this point, it is clear that 2 is indeed an eigenvalue, as there are two free variables and thus a nontrivial solution. The general solution is:

(x1 , x2 , x3 ) = x2 (1/2, 1, 0) + x3 (−3, 0, 1)
Proof
For simplicity, consider the 3 × 3 case. If A is upper triangular, then A − λI has the form:
A − λI = [ a11 a12 a13 ; 0 a22 a23 ; 0 0 a33 ] − [ λ 0 0 ; 0 λ 0 ; 0 0 λ ] = [ a11 − λ a12 a13 ; 0 a22 − λ a23 ; 0 0 a33 − λ ]
so det (A − λI) = (a11 − λ) (a22 − λ) (a33 − λ), which is zero precisely when λ equals one of the diagonal entries. For example, the eigenvalues of a triangular matrix with diagonal entries 3, 0 and 2 are simply 3, 0 and 2.
Now, what happens if the eigenvalue is 0? This is only the case if:

Ax = 0x = 0     (5.3)

This only has a nontrivial solution (necessary for 0 to be an eigenvalue) if the columns of A are linearly dependent (because then there's a free variable), so by the invertible matrix theorem, 0 is an eigenvalue of A if and only if A
is not invertible. Remember this, because we will add this to the invertible matrix theorem in the next
section.
Theorem 2
If v1 , ..., vr are eigenvectors that correspond to distinct eigenvalues λ1 , ..., λr of an n × n matrix A, then the set {v1 , ..., vr } is linearly independent.
Proof
Suppose the set is linearly dependent, and let vp+1 be a linear combination of the preceding (linearly independent) vectors. Then there exist scalars such that:

c1 v1 + · · · + cp vp = vp+1     (5.4)

Multiplying both sides by A and using Avi = λi vi gives:

c1 λ1 v1 + · · · + cp λp vp = λp+1 vp+1

If we multiply (5.4) with λp+1 on both sides, and then subtract this from the above result, we get:

c1 (λ1 − λp+1 ) v1 + · · · + cp (λp − λp+1 ) vp = 0

Since each eigenvalue is distinct, λi − λp+1 ≠ 0, and since {v1 , ..., vp } is linearly independent, this is only possible if all scalars c1 , ..., cp are zero. But then (5.4) says vp+1 = 0, which is impossible for an eigenvector; hence the set cannot be linearly dependent.
An equation of the form xk+1 = Axk is called a difference equation, because it describes the changes in a system as time passes. It is a recursive description of a sequence {xk } in Rn . A solution of it is an explicit description of {xk } whose formula does not depend directly on A or on preceding terms, except for the first term, x0 . The simplest way to do this is to just write:

xk = λ^k x0     (5.9)

where x0 is an eigenvector of A with eigenvalue λ. This is true because:

Axk = A (λ^k x0 ) = λ^k Ax0 = λ^k λ x0 = λ^(k+1) x0 = xk+1
c. det AT = det A.
d. If A is triangular, then det A is the product of the entries on the main diagonal of A.
e. A row replacement operation on A does not change the determinant. A row interchange changes the sign of the determinant. A row scaling also scales the determinant by the same scale factor.
For example, if we want to find the characteristic equation of A = [ 5 −2 6 −1 ; 0 3 −8 0 ; 0 0 5 4 ; 0 0 0 1 ], then:

det (A − λI) = det [ 5 − λ −2 6 −1 ; 0 3 − λ −8 0 ; 0 0 5 − λ 4 ; 0 0 0 1 − λ ] = (5 − λ) (3 − λ) (5 − λ) (1 − λ)
The expression det (A − λI) is called the characteristic polynomial of A, and det (A − λI) = 0 the characteristic equation. In fact, if A is an n × n matrix,
then det (A − λI) is a polynomial of degree n. The eigenvalue 5 in the foregoing example is said to have
multiplicity 2 because (λ − 5) occurs two times as a factor of the characteristic polynomial. In general,
the (algebraic) multiplicity of an eigenvalue λ is its multiplicity as a root of the characteristic equation.
Example
Let the characteristic polynomial of a 6 × 6 matrix be λ6 − 4λ5 − 12λ4 . The eigenvalues and their
multiplicities are then:
λ^6 − 4λ^5 − 12λ^4 = λ^4 (λ^2 − 4λ − 12) = λ^4 (λ − 6) (λ + 2)
Thus the eigenvalues are 0 (multiplicity 4), 6 and -2 (both multiplicity of 1).
5.2.1 Similarity
We’ll now discuss something that will only become of use in the following section, so don’t freak out when
you have no idea why I mention this. A is said to be similar to B if there is an invertible matrix P such
that P −1 AP = B, or A = P BP −1 . Let Q = P −1 , then Q−1 BQ = A, thus B is also similar to A, and we
simply say that A and B are similar. Changing A into P −1 AP is called a similarity transformation.
Similar matrices have a very nice property:
Theorem 4
If n × n matrices A and B are similar, then they have the same characteristic polynomial and
hence the same eigenvalues (with the same multiplicities).
Proof
If B = P −1 AP , then:

B − λI = P −1 AP − λP −1 P = P −1 (AP − λP ) = P −1 (A − λI) P
det (B − λI) = det [P −1 (A − λI) P ] = det (P −1 ) · det (A − λI) · det (P ) = det (A − λI)

since det (P −1 ) det (P ) = det (P −1 P ) = det I = 1.
Note that the converse does not hold: if two matrices have the same eigenvalues, they are not necessarily similar.
5.3 Diagonalization
Remember that we could write A = P BP −1 . This can be a very powerful property if B is a diagonal matrix (a matrix with nonzero entries only on its main diagonal). Let's just look at the following computation. Let D = [ 5 0 ; 0 3 ]. Then D^2 = [ 5 0 ; 0 3 ] [ 5 0 ; 0 3 ] = [ 5^2 0 ; 0 3^2 ]. If we take D^3 , we get [ 5^3 0 ; 0 3^3 ], and in general D^k = [ 5^k 0 ; 0 3^k ]. So, suppose you want to find A^100 where A is a shitty matrix: it significantly reduces computation time if you can find A = P DP −1 where D is a diagonal matrix, because D^100 can very easily be found and we thus only have to do two 'full' matrix multiplications, rather than 100.¹
Now, the following theorem provides some useful facts about how to construct A = P DP −1 .
Theorem 5: The Diagonalization Theorem
An n × n matrix A is diagonalizable if and only if A has n linearly independent eigenvectors. In fact, A = P DP −1 , with D a diagonal matrix, if and only if the columns of P are n linearly independent eigenvectors of A. In that case, the diagonal entries of D are the eigenvalues of A that correspond, respectively, to the eigenvectors in P .
¹ The exact derivation follows the pattern A^2 = P DP −1 P DP −1 = P DDP −1 = P D^2 P −1 . The same happens when you do it with A^3 , etc. ad infinitum.
Writing P = [ v1 · · · vn ] and D = [ λ1 0 · · · 0 ; 0 λ2 · · · 0 ; · · · ; 0 0 · · · λn ], we have AP = [ Av1 · · · Avn ] = [ λ1 v1 · · · λn vn ] = P D. Now we see that this works out exactly (because for an eigenvector, Avk = λk vk ), so our guesses were correct. Furthermore, for P to be invertible, its columns must be linearly independent, and hence P must consist of n linearly independent eigenvectors.
Example
Diagonalize the following matrix, if possible.
A = [ 1 3 3 ; −3 −5 −3 ; 3 3 1 ]
So, let's find the eigenvalues. For 3 × 3 matrices, determinants are a much larger pain in the ass, so the book will usually provide an easy way to avoid doing a shitload of computations (the textbook at least says so), so I'll just give you the characteristic equation:

0 = det (A − λI) = −λ^3 − 3λ^2 + 4 = − (λ − 1) (λ + 2)^2
So, λ1 = 1 and λ2 = −2. Now, solving to find the eigenvectors. Remember that then the equation
(A − λI) x = 0 must hold. So, we get for λ = 1:

A − I = [ 1 3 3 ; −3 −5 −3 ; 3 3 1 ] − [ 1 0 0 ; 0 1 0 ; 0 0 1 ] = [ 0 3 3 ; −3 −6 −3 ; 3 3 0 ]

[ 0 3 3 | 0 ; −3 −6 −3 | 0 ; 3 3 0 | 0 ] ∼ [ 1 0 −1 | 0 ; 0 1 1 | 0 ; 0 0 0 | 0 ]

So x = (x3 , −x3 , x3 ) = x3 (1, −1, 1), so v1 = (1, −1, 1). Similarly, we have for λ = −2:
A + 2I = [ 1 3 3 ; −3 −5 −3 ; 3 3 1 ] − [ −2 0 0 ; 0 −2 0 ; 0 0 −2 ] = [ 3 3 3 ; −3 −3 −3 ; 3 3 3 ]

[ 3 3 3 | 0 ; −3 −3 −3 | 0 ; 3 3 3 | 0 ] ∼ [ 1 1 1 | 0 ; 0 0 0 | 0 ; 0 0 0 | 0 ]

So x = (−x2 − x3 , x2 , x3 ) = x2 (−1, 1, 0) + x3 (−1, 0, 1), so the basis for λ = −2 is v2 = (−1, 1, 0) and v3 = (−1, 0, 1).
We can now simply construct P = [ v1 v2 v3 ] = [ 1 −1 −1 ; −1 1 0 ; 1 0 1 ] and D = [ 1 0 0 ; 0 −2 0 ; 0 0 −2 ].
Now, it may occur that an eigenvalue with a multiplicity does not give us as many linearly independent eigenvectors as its multiplicity (for example, in the foregoing example, if λ = −2 had only yielded one vector v2 instead of both v2 and v3 ). In that case, we have too few eigenvectors to construct P , and hence A is not diagonalizable. However, we do know:
Theorem 6
An n × n matrix with n distinct eigenvalues is diagonalizable.
In other words, A is diagonalizable if and only if there are enough eigenvectors to form a basis of Rn . We call such a basis an eigenvector basis for Rn .²
The proof is simple. Let v1 , ..., vn be eigenvectors corresponding to the n distinct eigenvalues of
a matrix A. Then {v1 , ..., vn } is linearly independent, by Theorem 2 in Section 5.1. Hence A is
diagonalizable, by theorem 5.
Note that it is not a requisite for A to have n distinct eigenvalues to be diagonalizable; just see the foregoing example.
Theorem 7
Let A be an n × n matrix whose distinct eigenvalues are λ1 , ..., λp .
a. For 1 ≤ k ≤ p, the dimension of the eigenspace of λk is less than or equal to the multiplicity
of the eigenvalue λk .
b. The matrix A is diagonalizable if and only if the sum of the dimensions of the eigenspaces equals n, and this happens if and only if (i ) the characteristic polynomial factors completely into linear factors and (ii ) the dimension of the eigenspace for each λk equals the multiplicity of λk .
All of these are pretty logical: an eigenvalue will at most yield as many linearly independent eigenvectors as its multiplicity, so its eigenspace dimension will be less than or equal to the multiplicity of the eigenvalue. The characteristic polynomial needs to factor completely, and each eigenvalue should have an eigenspace whose dimension equals its multiplicity.³ Finally, the 'sum' of all bases of the individual eigenspaces together makes up one big ass eigenvector basis that covers all of Rn .
Example
Let’s do one final example about this. Diagonalize A, if possible:
A = [ 5 0 0 0 ; 0 5 0 0 ; 1 4 −3 0 ; −1 −2 0 −3 ]
2 Remember that this was a fancy way of saying that the eigenvectors need to be able to reach every point in Rn .
³ The dimension was determined by how many (linearly independent) vectors a set contained. As we need the maximum number of vectors per eigenvalue, the dimension thus has to equal the multiplicity.
The eigenvalues turn out to be 5 and −3, both with multiplicity 2. Solving (A − 5I) x = 0 yields the basis vectors v1 = (−8, 4, 1, 0) and v2 = (−16, 4, 0, 1), and solving (A + 3I) x = 0 yields v3 = (0, 0, 1, 0) and v4 = (0, 0, 0, 1). These four vectors are linearly independent and thus we have:

P = [ −8 −16 0 0 ; 4 4 0 0 ; 1 0 1 0 ; 0 1 0 1 ] and D = [ 5 0 0 0 ; 0 5 0 0 ; 0 0 −3 0 ; 0 0 0 −3 ]
Now, what they are actually saying here: if we have A = P DP −1 , then if we use this as a linear
transformation, we can either go via A directly (Ax) or do P DP −1 x. Remembering that you will first
multiply x with P −1 , what you actually do is the following: you convert the vector via P −1 to a different
coordinate in a different basis4 . Now we multiply it with another matrix, D, to convert it to another
vector, but in the same basis as when we converted via P −1 . Now, to finish up, we multiply with P to
convert back to the vector that was also calculated with Ax. A visualization is given in figures 5.4 and
5.5. Note that in figure 5.4 for our case, both are in Rn .
What you need to remember, however, is that if we have A = P DP −1 , then D is simply the B-matrix for T . So, for example, we found before that for A = [ 7 2 ; −4 1 ] we have P = [ 1 1 ; −1 −2 ] and D = [ 5 0 ; 0 3 ]. So B = [ 5 0 ; 0 3 ].
⁴ If you don't remember what this meant: if we have a vector a = (2, 1), then we basically say it's 2x1 and 1x2 . But we can also define w1 = (3, −1) and w2 = (−1, 2), and then a = w1 + w2 , or [a]B = (1, 1) where the basis B is formed by {w1 , w2 }. The vector is still in the same dimension (if x is in Rn , then P −1 must be n × n, so it stays in Rn ).
which does not have a real solution. We can still solve it, however:

(λ − 0.8)^2 = −0.36
λ − 0.8 = ±√(−0.36) = ±0.6i
λ = 0.8 ± 0.6i
So, we have two complex eigenvalues, λ = 0.8 + 0.6i and λ = 0.8 − 0.6i. We can find the complex
eigenvectors as well, though that’s a little bit more difficult. First, let’s do it for λ = 0.8 + 0.6i. We then
get:
A − (0.8 + 0.6i) I = [ 0.5 −0.6 ; 0.75 1.1 ] − [ 0.8 + 0.6i 0 ; 0 0.8 + 0.6i ] = [ −0.3 − 0.6i −0.6 ; 0.75 0.3 − 0.6i ]     (5.10)
Now, row reducing isn't a fun thing to do here due to the complex numbers. We can, however, write out the system:

(−0.3 − 0.6i) x1 − 0.6 x2 = 0
0.75 x1 + (0.3 − 0.6i) x2 = 0

If we think logically, we can find the solution anyway. Since we used an eigenvalue, we know this system has a nontrivial solution. So, if we write x1 as a function of x2 based on the first equation and plug this into the second equation, it should still yield 0. If this were not the case, the system would only have the trivial solution, but we know that that is bullshit, as we used an eigenvalue and thus it has a nontrivial solution. So we can just use either of the two equations to determine our solution. Let's take the second equation: 0.75 x1 + (0.3 − 0.6i) x2 = 0, so x1 = (−0.4 + 0.8i) x2 . Choosing x2 = 5 gives the eigenvector v1 = (−2 + 4i, 5).
The points are shown in figure 5.6. It's visible that the values lie on an elliptical orbit.
We call the real stuff of a vector the real part and the complex stuff the imaginary part. The conjugate of a complex vector, denoted by x̄, has the same real part, but the sign of its imaginary part is flipped. So, for v1 above, we have:

Re v1 = (−2, 5) , Im v1 = (4, 0) , and v̄1 = (−2, 5) − i (4, 0) = (−2 − 4i, 5)
Figure 5.6: Iterations of the point x0 under the action of a matrix with a complex eigenvalue.
Now, you may have thought: is it a coincidence that v1 and v̄1 happen to be a conjugate pair? And that the eigenvalues were also a conjugate pair? It's not. Let A be an n × n matrix whose entries are completely real. Since there is no complex part in A, Ā = A, so taking conjugates of both sides of Ax = λx gives A x̄ = λ̄ x̄ as well. But this isn't that important; just remember that they occur in conjugate pairs.
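numpy handles the complex case directly; this quick check recovers the conjugate pair 0.8 ± 0.6i found above, each with modulus r = √(0.8² + 0.6²) = 1:

```python
import numpy as np

A = np.array([[0.5, -0.6], [0.75, 1.1]])
eigvals, eigvecs = np.linalg.eig(A)

# The eigenvalues come out as a conjugate pair 0.8 +/- 0.6i.
print(np.allclose(sorted(eigvals, key=lambda z: z.imag),
                  [0.8 - 0.6j, 0.8 + 0.6j]))   # True
# Each has modulus 1, so the orbit neither grows nor shrinks.
print(np.allclose(np.abs(eigvals), 1.0))       # True
```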
Now, as to why we have rotation. We can most easily deduce this by using similarity, finding a matrix C that will form a building block for A = P CP −1 . Now, let us just assume C = [ a −b ; b a ]; then the eigenvalues satisfy:

(a − λ)^2 + b^2 = 0
(a − λ)^2 = −b^2
a − λ = ±bi
λ = a ± bi
We can draw λ as depicted in figure 5.7. Using the fact that cos φ = a/r and sin φ = b/r, we get that:

Figure 5.7: λ

C = r [ a/r −b/r ; b/r a/r ] = r [ cos φ −sin φ ; sin φ cos φ ] = [ r 0 ; 0 r ] [ cos φ −sin φ ; sin φ cos φ ]

Note that the scalar r may be substituted by the matrix [ r 0 ; 0 r ], because that way you end up with the same matrix as before. Furthermore, r = |λ| = √(a^2 + b^2 ).
Now, note what actually happens here: we first rotate (because we first multiply with the rightmost matrix). How much it is actually rotated isn't very straightforward to explain, and you do not need to know it anyway, but it's obvious that if we keep multiplying with sines and cosines, we do something related to a rotation. Afterwards, we multiply by the modulus of the eigenvalue of C, so there it is scaled as well. This computation is shown in figure 5.8.
on page 56) will provide a lot of clarity. Remember that a differential equation can be of the form x′ (t) = ax (t) (so a very basic differential equation). Now, we can have a system of such equations as well, for example:

x1′ (t) = a11 x1 (t) + · · · + a1n xn (t)
· · ·
xn′ (t) = an1 x1 (t) + · · · + ann xn (t)

The reason why I wrote a11 etc. is so that the conversion to matrix form will be clearer. They merely indicate a scalar constant for each function, nothing more than that. We can write this set of differential equations as:
x′ (t) = Ax (t)     (5.12)

where

x (t) = (x1 (t), ..., xn (t)) , x′ (t) = (x1′ (t), ..., xn′ (t)) , and A = [ a11 · · · a1n ; · · · ; an1 · · · ann ]
A solution is a vector-valued function that satisfies (5.12) for all t in some interval of real numbers, such as t ≥ 0. Now, equation (5.12) is linear, because remember that for a linear transformation (if x = cu + dv):
In this case, Ax should equal x′ . So, if u and v are both valid solutions for x, then we need to prove that A (cu + dv) = (cu + dv)′ . That's true:

(cu + dv)′ = cu′ + dv′ = cAu + dAv = A (cu + dv)
Now, enough theory for now, let's do some calculations. Let us for example consider:

( x1′ (t) ; x2′ (t) ) = [ 3 0 ; 0 −5 ] ( x1 (t) ; x2 (t) )     (5.13)

Each equation here involves only its own variable, so we can solve the two equations separately, giving:

( x1 (t) ; x2 (t) ) = ( c1 e^(3t) ; c2 e^(−5t) ) = c1 e^(3t) (1, 0) + c2 e^(−5t) (0, 1)
Now, note that 3 and −5 also happened to be the eigenvalues of A. So, we have the suspicion that the general solution will be a linear combination of vectors of the form:

x (t) = v e^(λt)

(in the foregoing example, c1 and c2 would indicate the weights), where λ is some scalar (which will turn out to be the eigenvalue) and v a nonzero vector (if v = 0, then x (t) = 0, and thus x′ = 0 = Ax). Now, note that we can prove that we must take the eigenvalue in the exponent:
x′ (t) = λv e^(λt)
Ax (t) = Av e^(λt)
This is only correct if and only if λv = Av, thus only if λ is the eigenvalue, and v the corresponding
eigenvector. Such solutions are called eigenfunctions. Thus, here is the general approach for finding an
appropriate x (t): find the eigenvalues and eigenvectors. The general solution, or preferably called the
fundamental set of solutions is then equal to x (t) = c1 v1 eλt + · · · + cn vn eλt . The exact solution (so
values for c1 , ..., cn ) can be found by solving the initial value problem. Let’s do an example.
Example
Let x = [x1(t); x2(t)], A = [−1.5 0.5; 1 −1] and x(0) = [5; 4]. It can rather easily be found that the corresponding eigenvalues are (use the characteristic equation) λ1 = −0.5 and λ2 = −2, with corresponding eigenvectors v1 = [1; 2] and v2 = [−1; 1]. Then the fundamental set of solutions is:

x(t) = c1v1e^{λ1t} + c2v2e^{λ2t} = c1 [1; 2] e^{−0.5t} + c2 [−1; 1] e^{−2t}
That's not that difficult, is it? Finding the eigenvalues and eigenvectors is the hardest part, basically. Now, we know that for t = 0, it reduces to:

c1 [1; 2] + c2 [−1; 1] = [5; 4]

Solving the augmented matrix [1 −1 5; 2 1 4] (with your graphical calculator) leads to c1 = 3 and c2 = −2. So, the desired solution can be written as:

x(t) = 3 [1; 2] e^{−0.5t} − 2 [−1; 1] e^{−2t}

[x1(t); x2(t)] = [3e^{−0.5t} + 2e^{−2t}; 6e^{−0.5t} − 2e^{−2t}]

(note that the general form would have been [x1(t); x2(t)] = [c1e^{−0.5t} − c2e^{−2t}; 2c1e^{−0.5t} + c2e^{−2t}]). Now, we can plot the
graphs for several values of c1 and c2 if we want. On your graphical calculator (at least, this is how I do
it on my TI-84 Plus), go to MODE, and change the fourth setting from FUNC to PAR6 . Then, go to the
screen where you can input the graphs. It’ll now give X1T and Y1T as formulas. For X1T , plug in the
first row of the matrix (so the one corresponding to x1 ) (choose some scalars you want). For Y1T , plug in
the second row. If you want a second graph in the same window, just plug in the formulas into the X2T ,
Y2T , etc. Now, in the window settings, you now also have to pick a suitable range for T and a reasonable
time step. Please note that you cannot choose any range; if there are too many steps, it’ll take literally
ages before the graphing is complete. Furthermore, you need to pick your range smartly: there’s no point
in plotting 1000000-200000 for example. Finally, the graph will connect all points with straight lines, so
if your time steps are too large, it’ll cause sudden deflections.
So, you can play a bit with this, playing with your window settings and playing with the scalars.
Specifically, it’s interesting to see what happens if you choose c1 = 0, or c2 = 0, or in one and the same
plot c1 = 3 and c2 = 2. We then get the following graph, or trajectory, as seen in figure 5.9. Now, you may
have noticed during plotting that your graphical calculator drew the points chronologically: you really saw
5 Okay, I have actually no idea why the book points this out.
6 This indicates how it will graph equations. FUNC will plot like a regular function: you plug in a function for y and it will plot the function. If you enable PAR (from parametric equations), you can plug in two formulas (one for x, one for y) as a function of time T, and it'll plot the coordinates for a certain range of T, which is what we will be wanting.
samvanelsloo97@icloud.com
CHAPTER 5. EIGENVALUES AND EIGENVECTORS 52
one pixel added after pixel. It does in fact first plot T = 0.000, then T = 0.001, T = 0.002 etc., so you see
what direction the graph tends to. These directions are indicated with arrows in figure 5.9. We also see the
specific graphs formed by the scalars c1 = 0 (the graph that goes through v2 ), c2 = 0 (the graph that goes
through v1 ), c1 = 3 and c2 = 2 (the graph that goes through the initial value, x0 ). We also see that each
graph is directed towards the origin (though please note, except if of course c1 = c2 = 0, it never ends at
the origin). The reason for this may be obvious: as e is raised to an increasingly negative power, it'll keep on decreasing forever, thus ending up almost at the origin. The origin thus attracts all graphs and is called an attractor or sink. A matrix A with only negative eigenvalues will thus have an origin that acts as attractor.
Now, if the eigenvalues had been positive (and the exponents would thus have been positive), the
values of x1 and x2 would have kept on increasing forever, thus the origin seems to repel all the graphs,
and is therefore called a repeller or source. Now, what happens if we have one positive and one negative
eigenvalue?
Example
Suppose a particle is moving in a planar force field and its position vector x satisfies x′ = Ax and x(0) = x0. Furthermore, we have A = [4 −5; −2 1] and x0 = [2.9; 2.6]. Solve this initial value problem for t ≥ 0, and sketch the trajectory of the particle.
The eigenvalues can be found to equal λ1 = 6 and λ2 = −1, with the eigenvectors being v1 = [−5; 2] and v2 = [1; 1], and thus

x(t) = c1v1e^{λ1t} + c2v2e^{λ2t} = c1 [−5; 2] e^{6t} + c2 [1; 1] e^{−t}

Solving the augmented matrix [−5 1 2.9; 2 1 2.6] yields c1 = −3/70 and c2 = 188/70, and hence:

x(t) = −(3/70) [−5; 2] e^{6t} + (188/70) [1; 1] e^{−t}

Plot a few trajectories of this (the general form is [x1(t); x2(t)] = [−5c1e^{6t} + c2e^{−t}; 2c1e^{6t} + c2e^{−t}])⁷ and end up at something like depicted
in figure 5.10. We see that the lines initially head towards the origin, then suddenly experience a mid-life crisis and decide to move away from the origin. So, the origin acts half as sink and half as source, and is called a saddle point. Now, we also see that the line of v2 is never repelled; this is because there c1 = 0, and thus you basically only have the part of x(t) that has the negative eigenvalue (and thus is solely attracted). Similarly, v1 is never attracted; this is because there c2 = 0, and thus you basically only have the part of x(t) that has the positive eigenvalue (and thus is only repelled).
7 Take a few scalars: take the negative value of c1 and c2 , 0 for one of them, etc. to get a good overview.
y(t) = (c1e^{λ1t}, ..., cne^{λnt})ᵀ (this is what you get if you simply write everything in one matrix). We can use this property if we have a shitty matrix. Remember that diagonalization meant A = PDP⁻¹. The eigenvalues of A and D are the same, but the eigenvectors are not. Furthermore, P consisted of the eigenvectors of A. Now, compare this y(t) = c1e1e^{λ1t} + · · · + cnene^{λnt} (with e1, ..., en the standard basis vectors) with x(t) = c1v1e^{λ1t} + · · · + cnvne^{λnt}.
What oh what must we do to get from y(t) to x(t)? That's right, e1 must become v1, etc. How do we do this? We multiply e1 with a matrix whose first column is v1. e2 is multiplied by the same matrix, and its second column is v2. en is multiplied by the same matrix, whose nth column is vn. So, we must multiply y(t) with a matrix that consists of the eigenvectors of A, which, coincidentally, happens to be P. So, x(t) = Py(t), or equivalently:
y (t) = P −1 x (t) (5.18)
Note that in equation 5.16, the scalars are found by solving:

(c1, ..., cn)ᵀ = y(0) = P⁻¹x(0) = P⁻¹x0
The change of variable from x to y has decoupled the system of differential equations. It’s called
decoupling because if we compare the example on page 55 and on page 56 with each other, we see that for the diagonal matrix, x′1(t) only depended on x1(t) and x′2(t) only depended on x2(t). However, on page 56, both were dependent on each other, so they were coupled.
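The decoupling statement is just diagonalization in action; a quick NumPy check (my addition, using the same A as the earlier example) that P⁻¹AP really is diagonal:

```python
import numpy as np

A = np.array([[-1.5, 0.5],
              [1.0, -1.0]])
lam, P = np.linalg.eig(A)

# The change of variable y = P^{-1} x turns x' = A x into y' = D y,
# because P^{-1} A P is the diagonal matrix of eigenvalues.
D = np.linalg.inv(P) @ A @ P
print(np.round(D, 10))  # diagonal, with -0.5 and -2 on the diagonal
```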
let me recap what the real part and imaginary part of a complex number are. If we have p = a + bi, then Re p = a and Im p = b (but not Im p = bi!).
Now, we remember that a real matrix A can have a pair of complex eigenvalues λ and λ̄ (where λ = a + bi and λ̄ = a − bi). So, two solutions of x′ = Ax are x1(t) = ve^{λt} and x̄1(t) = v̄e^{λ̄t}, and from these we get the combinations

Re ve^{λt} = (1/2)[x1(t) + x̄1(t)]
Im ve^{λt} = (1/(2i))[x1(t) − x̄1(t)]

So, our goal will be to find the real and imaginary parts of ve^{λt}. Now, we will write v as v = Re v + i Im v. Furthermore, e^{λt} = e^{(a+bi)t} = e^{at}e^{ibt}, where e^{ibt} = cos bt + i sin bt. So, we may write:

ve^{λt} = (Re v + i Im v)e^{at}(cos bt + i sin bt)

We established before that Re ve^{λt} and Im ve^{λt} are solutions, and so, two real solutions of x′ = Ax are:

y1(t) = Re ve^{λt} = [(Re v) cos bt − (Im v) sin bt]e^{at}
y2(t) = Im ve^{λt} = [(Re v) sin bt + (Im v) cos bt]e^{at}
Example
Now, let's do an example to clarify a bit. Let A = [−2 −2.5; 10 −2] and x0 = [3; 3]. We find that one eigenvalue is λ = −2 + 5i, with corresponding eigenvector v1 = [i; 2]. So:

x1(t) = [i; 2] e^{(−2+5i)t}
x̄1(t) = [−i; 2] e^{(−2−5i)t}
Now, that's not too difficult, right? We thus have as general solution

x(t) = c1 [−sin 5t; 2 cos 5t] e^{−2t} + c2 [cos 5t; 2 sin 5t] e^{−2t}

With x(0) = [3; 3], we get c1 [0; 2] + c2 [1; 0] = [3; 3], so c1 = 1.5 and c2 = 3. Thus:

x(t) = 1.5 [−sin 5t; 2 cos 5t] e^{−2t} + 3 [cos 5t; 2 sin 5t] e^{−2t}

or

[x1(t); x2(t)] = [−1.5 sin 5t + 3 cos 5t; 3 cos 5t + 6 sin 5t] e^{−2t}
This can be plotted. If you try several scalars, you’ll end up with a figure similar to figure 5.11. We
see that the lines spiral towards the origin, and hence the origin is called a spiral point. The lines spiral
towards the origin due to the negative exponent of e; had it been positive, it would have been repelled by
the origin. The spiralling makes sense due to the sines and cosines.
However, I suppose this is not brand new information for you. The following properties apply:
Theorem 1
Let u, v and w be vectors in Rn , and let c be a scalar. Then
a. u · v = v · u
b. (u + v) · w = u · w + v · w
c. (cu) · v = c (u · v) = u · (cv)
d. u · u ≥ 0, and u · u = 0 if and only if u = 0
Definition
The length (or norm) of v is the nonnegative scalar ‖v‖ defined by

‖v‖ = √(v · v) = √(v1² + v2² + · · · + vn²)

which is Pythagoras, basically. A vector whose length is 1 is called a unit vector. If we divide a nonzero vector v by its length, we obtain a unit vector u; this process is sometimes called normalizing, and we say that u is in the same direction as v.
Definition
For u and v in Rn , the distance between u and v, written as dist (u,v), is the length of the
vector u − v. That is,
dist (u,v) = ku − vk
CHAPTER 6. ORTHOGONALITY 58
Geometrically, v is orthogonal to u exactly when the distances from the end points of v and of −v to the end point of u are equal (the line through u is basically a line-segment bisector (middelloodlijn in het Nederlands) then). So, that would mean the following:

[dist(u, v)]² = [dist(u, −v)]²
‖u − v‖² = ‖u − (−v)‖²
‖u − v‖² = ‖u + v‖²
u · u − 2u · v + v · v = u · u + 2u · v + v · v
‖u‖² + ‖v‖² − 2u · v = ‖u‖² + ‖v‖² + 2u · v

which can only hold if u · v = 0. That is therefore the definition: u and v are orthogonal if and only if u · v = 0.
Theorem 3
Let A be an m × n matrix. The orthogonal complement of the row space of A is the null space of
A, and the orthogonal complement of the column space of A is the null space of AT :
(Row A)⊥ = Nul A and (Col A)⊥ = Nul Aᵀ
The reasoning behind this is actually rather logical: suppose we have the matrix

A = [1 2 −2 4; 2 3 −1 −5; 1 4 2 −2]

The row space of this is spanned by the three vectors formed by its rows. The null space is calculated such that if you plug its vectors into the matrix, you end up with 0s everywhere. So, if you multiply any vector in the row space with any vector in the null space, you end up with 0. That means that they are orthogonal. Similarly, since Col A is simply Row Aᵀ, Col A is orthogonal to the null space of Aᵀ.
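You can see (Row A)⊥ = Nul A numerically: a null-space basis can be read off from the SVD, and every row of A then dots to zero with it. A sketch with the example matrix (NumPy is my addition here):

```python
import numpy as np

A = np.array([[1.0, 2.0, -2.0, 4.0],
              [2.0, 3.0, -1.0, -5.0],
              [1.0, 4.0, 2.0, -2.0]])

# Right singular vectors with (numerically) zero singular value span Nul A.
U, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-10))
N = Vt[rank:].T                 # columns form a basis of Nul A

print(N.shape)                  # one basis vector here, since rank A = 3
print(np.allclose(A @ N, 0))    # every row of A is orthogonal to Nul A
```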
For example, the set {u1, u2, u3} with u1 = (3, 1, 1), u2 = (−1, 2, 1) and u3 = (−1/2, −2, 7/2) is an orthogonal set, since each pair of distinct vectors is orthogonal:

u1 · u2 = 3 · −1 + 1 · 2 + 1 · 1 = 0
u1 · u3 = 3 · −1/2 + 1 · −2 + 1 · 7/2 = 0
u2 · u3 = −1 · −1/2 + 2 · −2 + 1 · 7/2 = 0
Theorem 4
If S = {u1, ..., up} is an orthogonal set of nonzero vectors in Rn, then S is linearly independent and hence is a basis for the subspace spanned by S.
Proof
Remember that a set is linearly independent if and only if the homogeneous equation only has the
trivial solution. If 0 = c1 u1 + · · · + cp up , then:
0 = 0 · u1 = (c1u1 + c2u2 + · · · + cpup) · u1
= (c1u1) · u1 + (c2u2) · u1 + · · · + (cpup) · u1
= c1(u1 · u1) + c2(u2 · u1) + · · · + cp(up · u1)

Since it's an orthogonal set, uj · u1 is zero for every j ≠ 1, and thus it reduces to 0 = c1(u1 · u1). Since u1 · u1 is not zero (as u1 ≠ 0), this leads to the fact that c1 = 0 is the only solution. The same holds for the other scalars, so the equation only has the trivial solution, and S is linearly independent.
Definition
An orthogonal basis for a subspace W of Rn is a basis for W that is also an orthogonal set.
The nice thing about orthogonal bases is that the weights in a linear combination can be computed easily.
Theorem 5
Let {u1 , ..., up } be an orthogonal basis for a subspace W of Rn . For each y in W , the weights in
the linear combination
y = c1 u1 + · · · + cp up
are given by
cj = (y · uj)/(uj · uj)
Proof
Similar to last time, y · u1 = c1(u1 · u1) and thus c1 = (y · u1)/(u1 · u1). It works the same for the jth term. So, suppose we have the three vectors we had before, and let y = (6, 1, −8); express the vector y as a linear combination of these three vectors. First, we find that y · u1 = 11, y · u2 = −12, y · u3 = −33, u1 · u1 = 11, u2 · u2 = 6, u3 · u3 = 33/2. Then:

y = ((y · u1)/(u1 · u1))u1 + ((y · u2)/(u2 · u2))u2 + ((y · u3)/(u3 · u3))u3
= (11/11)u1 + (−12/6)u2 + (−33/(33/2))u3
= u1 − 2u2 − 2u3
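Theorem 5 is the whole payoff of orthogonal bases: no augmented matrix, just a few dot products. A quick NumPy check (my addition; y = (6, 1, −8) is the vector consistent with the dot products quoted above):

```python
import numpy as np

u1 = np.array([3.0, 1.0, 1.0])
u2 = np.array([-1.0, 2.0, 1.0])
u3 = np.array([-0.5, -2.0, 3.5])
y = np.array([6.0, 1.0, -8.0])

# Each weight is c_j = (y . u_j) / (u_j . u_j) -- no system to solve.
c = np.array([y @ u / (u @ u) for u in (u1, u2, u3)])
print(c)  # weights 1, -2, -2, i.e. y = u1 - 2 u2 - 2 u3
```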
Suppose now that we want to decompose a vector y into the sum of a vector in Span{u} and a vector orthogonal to u:

y = ŷ + z (6.1)

See figure 6.4. We can write ŷ = αu, so that z = y − αu. Then y − ŷ is orthogonal to u if and only if

(y − αu) · u = y · u − α(u · u) = 0

So, α = (y · u)/(u · u) and thus ŷ = ((y · u)/(u · u))u. ŷ is called the orthogonal projection of y onto u and the vector z is called the component of y orthogonal to u. Sometimes ŷ is denoted by projL y and this is called the orthogonal projection of y onto L. That is,

ŷ = projL y = ((y · u)/(u · u))u (6.2)
So, suppose we for example have y = [7; 6] and u = [4; 2]. Find the orthogonal projection of y onto u. Then write y as the sum of two orthogonal vectors, one in Span{u} and one orthogonal to u.

ŷ = ((7 · 4 + 6 · 2)/(4 · 4 + 2 · 2)) [4; 2] = 2 [4; 2] = [8; 4]

So that z = y − ŷ = [7; 6] − [8; 4] = [−1; 2]. So, y = [8; 4] + [−1; 2]. See figure 6.5.
Suppose we now need to calculate the distance from y to L. This is simply the length of vector z, i.e.:
‖y − ŷ‖ = ‖z‖ = √((−1)² + 2²) = √5
Let’s now take a look and admire the beauty of figure 6.6.
Note how we can basically decompose a vector y into a sum of orthogonal projections onto one-
dimensional subspaces, so that any y can be written in the form:
y = projL1 y + projL2 y = ((y · u1)/(u1 · u1))u1 + ((y · u2)/(u2 · u2))u2 (6.3)
linearly independent, by Theorem 4. The simplest example is of course {e1, ..., en}. A more complicated example is showing that the following set is orthonormal:

v1 = [3/√11; 1/√11; 1/√11], v2 = [−1/√6; 2/√6; 1/√6], v3 = [−1/√66; −4/√66; 7/√66]
We must show that all vectors are orthogonal to each other, and that the length of each vector is 1. Thus:
v1 · v2 = −3/√66 + 2/√66 + 1/√66 = 0
v1 · v3 = −3/√726 − 4/√726 + 7/√726 = 0
v2 · v3 = 1/√396 − 8/√396 + 7/√396 = 0
v1 · v1 = 9/11 + 1/11 + 1/11 = 1
v2 · v2 = 1/6 + 4/6 + 1/6 = 1
v3 · v3 = 1/66 + 16/66 + 49/66 = 1
Since the set is orthogonal (and thus linearly independent), its three vectors form a basis for R3. See figure 6.7. When the vectors in an orthogonal set of nonzero vectors are normalized to have unit length, the new vectors will still be orthogonal, and hence the new set will be an orthonormal set.
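Rather than checking the six dot products by hand, you can stack the three vectors as columns of U and run the UᵀU = I test in one go (a NumPy sketch, my addition):

```python
import numpy as np

U = np.column_stack([
    np.array([3.0, 1.0, 1.0]) / np.sqrt(11),
    np.array([-1.0, 2.0, 1.0]) / np.sqrt(6),
    np.array([-1.0, -4.0, 7.0]) / np.sqrt(66),
])

# Orthonormal columns <=> U^T U = I (all dot products at once).
print(np.allclose(U.T @ U, np.eye(3)))  # True
```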
Matrices whose columns form an orthonormal set are very nice.
Theorem 6
An m × n matrix U has orthonormal columns if and only if UᵀU = I.
Proof
(For the three-column case:) The columns of U are orthogonal if and only if u1ᵀu2 = u2ᵀu1 = 0, u1ᵀu3 = u3ᵀu1 = 0, and u2ᵀu3 = u3ᵀu2 = 0. Furthermore, the columns of U all have unit length if and only if u1ᵀu1 = 1, u2ᵀu2 = 1, u3ᵀu3 = 1. Plug these values into UᵀU and you end up with the identity matrix.
Theorem 7
Let U be an m × n matrix with orthonormal columns, and let x and y be in Rn . Then:
a. kU xk = kxk
b. (U x) · (U y) = x · y
c. (U x) · (U y) = 0 if and only if x · y = 0
These statements indicate that a linear transformation with a matrix with orthonormal columns preserves lengths and orthogonality. Theorems 6 and 7 are especially useful when applied to square
matrices. An orthogonal matrix is a square invertible matrix U such that U −1 = U T . By theorem
6, such a matrix has orthonormal columns. Any square matrix with orthonormal columns is also an
orthogonal matrix. Such a matrix must have orthonormal rows too.
Now, similar to last time around, we can calculate the orthogonal projection of y onto a subspace W, or projW y, using the following theorem:
Theorem 8: The Orthogonal Decomposition Theorem
Let W be a subspace of Rn. Then each y in Rn can be written uniquely in the form

y = ŷ + z (6.5)

where ŷ is in W and z is in W⊥. In fact, if {u1, ..., up} is any orthogonal basis of W, then

ŷ = ((y · u1)/(u1 · u1))u1 + · · · + ((y · up)/(up · up))up

and z = y − ŷ.
For example, let's consider u1 = [2; 5; −1], u2 = [−2; 1; 1], and y = [1; 2; 3], with W = Span{u1, u2}. Write y as the sum of a vector in W and a vector orthogonal to W.

ŷ = ((y · u1)/(u1 · u1))u1 + ((y · u2)/(u2 · u2))u2 = (9/30) [2; 5; −1] + (3/6) [−2; 1; 1] = [−2/5; 2; 1/5]
Since z = y − ŷ, we get z = [1; 2; 3] − [−2/5; 2; 1/5] = [7/5; 0; 14/5], and thus:

y = [−2/5; 2; 1/5] + [7/5; 0; 14/5]
Theorem 9: The Best Approximation Theorem
Let W be a subspace of Rn, let y be any vector in Rn, and let ŷ be the orthogonal projection of y onto W. Then ŷ is the closest point in W to y, in the sense that

‖y − ŷ‖ < ‖y − v‖

for all v in W distinct from ŷ. The vector ŷ is called the best approximation to y by elements of W. The proof follows from figure 6.9:

‖y − v‖² = ‖y − ŷ‖² + ‖ŷ − v‖²

Since ŷ ≠ v, the second term on the right is greater than 0, and hence theorem 9 follows.
Let's do an example. Find the distance from y to W = Span{u1, u2}, where:

y = [−1; −5; 10], u1 = [5; −2; 1], u2 = [1; 2; −1]
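The summary doesn't carry this computation through, so here's a sketch with NumPy (my addition; note u1 · u2 = 0, so the projection formula for orthogonal bases applies directly):

```python
import numpy as np

y = np.array([-1.0, -5.0, 10.0])
u1 = np.array([5.0, -2.0, 1.0])
u2 = np.array([1.0, 2.0, -1.0])

# u1 and u2 are orthogonal, so project onto each separately.
y_hat = (y @ u1) / (u1 @ u1) * u1 + (y @ u2) / (u2 @ u2) * u2
dist = np.linalg.norm(y - y_hat)

print(y_hat)  # [-1. -8.  4.]
print(dist)   # 3*sqrt(5), about 6.708
```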
Theorem 10
If {u1, ..., up} is an orthonormal basis for a subspace W of Rn, then

projW y = (y · u1)u1 + (y · u2)u2 + · · · + (y · up)up (6.8)

If U = [u1 u2 · · · up], then:

projW y = UUᵀy (6.9)

The first formula is Theorem 8 with each uj · uj = 1. The second follows from the fact that projW y is a linear combination of the columns of U using the weights y · u1, y · u2, ..., y · up, which can be written as u1ᵀy, u2ᵀy, ..., upᵀy, showing that they are the entries in Uᵀy and justifying (6.9).
Sometimes in life, you get a basis that is not orthogonal, let alone orthonormal. However, we like
orthogonal and orthonormal sets, so we want to learn (yeah you do) how to convert a given span to an
orthogonal basis. This is actually surprisingly easy. You start with one vector in the span; that'll be the first vector, v1. We then look at the second vector, x2, and basically decompose it into its projection onto v1 and the component orthogonal to v1. The orthogonal component, which is simply x2 − projv1 x2, will be v2. We can do this again for a third vector, using both v1 and v2 as the thing we project it on, so that we basically get the following theorem:
Theorem 11: The Gram–Schmidt Process

v1 = x1
v2 = x2 − ((x2 · v1)/(v1 · v1))v1
v3 = x3 − ((x3 · v1)/(v1 · v1))v1 − ((x3 · v2)/(v2 · v2))v2
⋮
vp = xp − ((xp · v1)/(v1 · v1))v1 − ((xp · v2)/(v2 · v2))v2 − · · · − ((xp · vp−1)/(vp−1 · vp−1))vp−1

Then {v1, ..., vp} is an orthogonal basis for Span{x1, ..., xp}.
So, suppose we have x1 = [3; 6; 0] and x2 = [1; 2; 2]. Construct an orthogonal basis for Span{x1, x2}.

v1 = x1 = [3; 6; 0]
v2 = x2 − ((x2 · v1)/(v1 · v1))v1 = [1; 2; 2] − (15/45) [3; 6; 0] = [0; 0; 2]
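The process is mechanical enough to write down once and reuse; a minimal Gram–Schmidt sketch in NumPy (my addition), run on the example's x1 and x2:

```python
import numpy as np

def gram_schmidt(X):
    """Orthogonalize the columns of X with the process above."""
    V = []
    for x in X.T:               # iterate over the columns of X
        v = x.copy()
        for w in V:             # subtract projections onto earlier v's
            v -= (x @ w) / (w @ w) * w
        V.append(v)
    return np.column_stack(V)

X = np.column_stack([[3.0, 6.0, 0.0], [1.0, 2.0, 2.0]])
V = gram_schmidt(X)
print(np.round(V.T, 12))  # rows: v1 = (3, 6, 0), v2 = (0, 0, 2)
```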
such that the distance to b is as small as possible. Logic tells you that the projection of b onto Col A coincides with Ax̂, thus:

Ax̂ = b̂ (6.11)

Now, the orthogonal vector equals z = b − b̂ = b − Ax̂. Let aj be any column of matrix A; then it follows (by orthogonality) that aj · (b − Ax̂) = 0. Using the definition of the dot product (you transpose the first vector, so that you get a matrix multiplication; just look at the first page of this chapter), this also means ajᵀ(b − Ax̂) = 0, so that:
AT (b − Ax̂) = 0 (6.12)
Thus:
AT b − AT Ax̂ = 0
AT Ax̂ = AT b
These calculations show that each least-squares solution of Ax = b satisfies the equation
AT Ax = AT b (6.13)
This matrix equation represents a system of equations called the normal equations for Ax = b. Actually,
the definition of a least-squares solution is:
Definition
If A is m × n and b is in Rm , a least-squares solution of Ax = b is an x̂ in Rn such that:
kb − Ax̂k ≤ kb − Axk
for all x in Rn .
Theorem 13
The set of least-squares solutions of Ax = b coincides with the nonempty set of solutions of the
normal equations AT Ax = AT b.
These two examples lead to the following theorem:
Theorem 14
Let A be an m × n matrix. The following statements are logically equivalent:
a. The equation Ax = b has a unique least-squares solution for each b in Rm.
b. The columns of A are linearly independent.
I can really recommend using your graphical calculator to solve these types of exercises.
The distance from b to Ax̂ is called the least-squares error. For example, for the first example, we have:

b = [2; 0; 11] and Ax̂ = [4 0; 0 2; 1 1] [1; 2] = [4; 4; 3]

And thus:

b − Ax̂ = [2; 0; 11] − [4; 4; 3] = [−2; −4; 8]

‖b − Ax̂‖ = √((−2)² + (−4)² + 8²) = √84
This is simply a least-squares problem. Suppose we need to find the regression line for the data points
(2, 1), (5, 2), (7, 3), and (8, 3). We then have:
X = [1 2; 1 5; 1 7; 1 8], y = [1; 2; 3; 3]

The normal equations:

XᵀXβ = Xᵀy

Use your graphical calculator, honestly. You end up at:

[4 22; 22 142] [β0; β1] = [9; 57]

Solving yields:

[β0; β1] = [4 22; 22 142]⁻¹ [9; 57] = [2/7; 5/14]

So y = 2/7 + (5/14)x.
y = β0 + β1u + β2v (6.17)
y = β0 + β1u + β2v + β3u² + β4uv + β5v² (6.18)
So, as an example, suppose we've made a huge mistake in our lives and somehow ended up at the faculty of geography. However, considering we're at geography, we're the only ones with some basic understanding of mathematics, and hence they've asked us to come up with a model to construct a trend surface or, in this case, a least-squares plane of the altitude y of a certain region, based on the latitude u and longitude v. We expect the data to satisfy the following equations:

y1 = β0 + β1u1 + β2v1 + ε1
y2 = β0 + β1u2 + β2v2 + ε2
⋮
yn = β0 + β1un + β2vn + εn

So that y = Xβ + ε, which is again a least-squares problem.
λ = 8: v1 = [−1; 1; 0]; λ = 6: v2 = [−1; −1; 2]; λ = 3: v3 = [1; 1; 1]
It can be easily shown that these are all orthogonal to each other. Now, chapter 6.2 gave you some valuable life advice, namely that if you have an orthogonal basis, you should always normalize it. So, let's be wise and get the normalized eigenvectors:

u1 = [−1/√2; 1/√2; 0], u2 = [−1/√6; −1/√6; 2/√6], u3 = [1/√3; 1/√3; 1/√3]

Then A = PDP⁻¹ as per usual. However, now, the wonderful thing is that since P is square and has orthonormal columns, P is an orthogonal matrix (we don't call it an orthonormal matrix as well because fuck logic), and this causes P⁻¹ = Pᵀ (see section 6.2).
Theorem 1
If A is symmetric, then any two eigenvectors from different eigenspaces are orthogonal.
CHAPTER 7. SYMMETRIC MATRICES AND QUADRATIC FORMS 72
Proof
λ1v1 · v2 = (λ1v1)ᵀv2 = (Av1)ᵀv2
= v1ᵀAᵀv2 = v1ᵀ(Av2)
= v1ᵀ(λ2v2)
= λ2v1ᵀv2 = λ2v1 · v2

Since λ1 ≠ λ2, this forces v1 · v2 = 0.
Theorem 2
An n × n matrix A is orthogonally diagonalizable if and only if A is a symmetric matrix.
Even the book admits that ”this theorem is rather amazing”. Normally, you can’t tell at a quick
glance whether a matrix is diagonalizable, but for symmetric matrices, you can.
Proof
The proof is rather simple. If A is orthogonally diagonalizable, then:
Aᵀ = (PDPᵀ)ᵀ = PᵀᵀDᵀPᵀ = PDPᵀ = A
Example
Orthogonally diagonalize the matrix A = [3 −2 4; −2 6 2; 4 2 3], whose characteristic equation is:

0 = −λ³ + 12λ² − 21λ − 98 = −(λ − 7)²(λ + 2)

For λ = 7 we can take the eigenvectors v1 = [1; 0; 1] and v2 = [−1/2; 1; 0]. Now, note that the theorem only says something about eigenvectors from different eigenspaces. v1 and v2 are clearly not orthogonal, so we must use the Gram–Schmidt process here:

z2 = v2 − ((v2 · v1)/(v1 · v1))v1 = [−1/2; 1; 0] + (1/4) [1; 0; 1] = [−1/4; 1; 1/4]
1 Used properties are: the dot product can be written as a · b = bᵀa, λ1v1 = Av1, (Av1)ᵀ = v1ᵀAᵀ (because the order is reversed when you take the transpose of a product), and Aᵀ = A.
d. A is orthogonally diagonalizable.
Writing out A = PDPᵀ column by column gives the spectral decomposition of A:

A = PDPᵀ = [λ1u1 · · · λnun] [u1ᵀ; ⋮; unᵀ] = λ1u1u1ᵀ + λ2u2u2ᵀ + · · · + λnununᵀ
Example
So what can we use the above things for? Let’s try to construct a spectral decomposition of the
matrix A that has the orthogonal diagonalization
A = [7 2; 2 4] = [2/√5 −1/√5; 1/√5 2/√5] [8 0; 0 3] [2/√5 1/√5; −1/√5 2/√5]
2 In my opinion, the easiest way to do this is via your graphical calculator. First, save a vector. You do this by using 2ND (, so that you get a {. Then, write the first entry, use a comma as separation, type the second entry, etc. Close by pressing 2ND ). Then, press STO> (left bottom, just above ON). Now, press 2ND STAT, and press L1. This will assign L1 to this list and allows you to quickly load vectors (note that L1, L2 and L3 also have shortcuts on your numpad: 2ND 1, 2ND 2 and 2ND 3, respectively). Now, to normalize a vector, we need to divide it by its length. So, start a fraction, and the numerator will be L1. The denominator can be typed in as follows: use a square root, press 2ND STAT, go to MATH, choose option 5: sum(. Then type L1 and square it. Then, close the sum( bracket by typing ). This is your length, and you'll get the normalized vector. A problem arises however, as it won't give you an exact answer. To counteract this, the easiest way is to only compute sum(L1²), and then divide each entry in the vector yourself by the square root of this value.
3 Forgot what rank meant? It was the dimension of the column space of a matrix, or more straightforwardly, how many linearly independent vectors made up a matrix. For example, every column of λ1u1u1ᵀ is a multiple of u1. In a following example, we will see that this is indeed the case. You don't need to know the proof, though.
4 The reason for this is as follows: (uuᵀ)x = u(uᵀx) = (uᵀx)u (the position switch is allowed because uᵀx is the dot product and thus a scalar). uᵀx = u · x = x · u, and thus (uuᵀ)x = (x · u)u, which is the orthogonal projection of x onto a unit vector u (because u · u = 1 for a unit vector, ((x · u)/(u · u))u = (x · u)u).
That's simply:

A = 8u1u1ᵀ + 3u2u2ᵀ

I honestly don't know what the use of this is, except that spectral decomposition sounds kinda cool (relative to linear algebra standards, that is). You can verify this by plugging in the unit eigenvectors; if you do it correctly (or type it correctly in your graphical calculator), you end up exactly at A.
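Whatever its use, the spectral decomposition is easy to verify numerically; a NumPy sketch (my addition) that rebuilds A from the two rank-one pieces:

```python
import numpy as np

A = np.array([[7.0, 2.0],
              [2.0, 4.0]])
u1 = np.array([2.0, 1.0]) / np.sqrt(5)
u2 = np.array([-1.0, 2.0]) / np.sqrt(5)

# A = 8 u1 u1^T + 3 u2 u2^T: a sum of rank-one projection matrices.
A_rebuilt = 8 * np.outer(u1, u1) + 3 * np.outer(u2, u2)
print(np.allclose(A, A_rebuilt))  # True
```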
Example
For x in R3, let Q(x) = 5x1² + 3x2² + 2x3² − x1x2 + 8x2x3. Write this quadratic form as xᵀAx. Furthermore, compute Q(x) for x = (3, −1, 2).
We see that there's a −1 in front of the x1x2, indicating that the entries in the first row and second column ((i,j) = (1,2)) and in the first column and second row ((i,j) = (2,1)) should both be −1/2. Similarly, (2,3) and (3,2) should be 4, and (1,3) and (3,1) need to be 0. Furthermore, (1,1) is simply 5, (2,2) is 3 and (3,3) is 2:

Q(x) = xᵀAx = [x1 x2 x3] [5 −1/2 0; −1/2 3 4; 0 4 2] [x1; x2; x3]

Q(3, −1, 2) = 5 · 3² + 3 · (−1)² + 2 · 2² − 3 · (−1) + 8 · (−1) · 2 = 45 + 3 + 8 + 3 − 16 = 43
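Evaluating a quadratic form is one line once A is set up; checking the hand computation above with NumPy (my addition):

```python
import numpy as np

A = np.array([[5.0, -0.5, 0.0],
              [-0.5, 3.0, 4.0],
              [0.0, 4.0, 2.0]])
x = np.array([3.0, -1.0, 2.0])

Q = x @ A @ x   # Q(x) = x^T A x
print(Q)        # 43.0
```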
Example
Make a change of variable that transforms the quadratic form Q (x) = x21 − 8x1 x2 − 5x22 into a
quadratic form with no cross-product term.
The matrix of the quadratic form can easily be deduced to be A = [1 −4; −4 −5]. The eigenvalues equal λ = 3 and λ = −7, and the associated eigenvectors are respectively:

v1 = [2/√5; −1/√5], v2 = [1/√5; 2/√5]

And since the vectors are of different eigenspaces, they are automatically orthogonal and

P = [2/√5 1/√5; −1/√5 2/√5], D = [3 0; 0 −7]
A suitable change of variable is simply (don't read too much into it):

x = Py, where x = [x1; x2] and y = [y1; y2]

or

y = P⁻¹x = Pᵀx

For example, for x = [2; −2]:

y = [2/√5 −1/√5; 1/√5 2/√5] [2; −2] = [6/√5; −2/√5]

So, basically what we do is, instead of going directly from x to Q(x) = 16 using a less-than-nice matrix, we first convert x to y, from which we go to 16 using a more-than-nice (diagonal) matrix: Q(x) = xᵀAx = yᵀDy = 3(6/√5)² − 7(−2/√5)² = 108/5 − 28/5 = 16.
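The point of the change of variable is that both coordinate systems give the same value of Q; a NumPy sketch (my addition) for the x = (2, −2) computed above:

```python
import numpy as np

A = np.array([[1.0, -4.0],
              [-4.0, -5.0]])
P = np.array([[2.0, 1.0],
              [-1.0, 2.0]]) / np.sqrt(5)
D = np.diag([3.0, -7.0])

x = np.array([2.0, -2.0])
y = P.T @ x   # change of variable (P^{-1} = P^T for an orthogonal P)

print(x @ A @ x)  # 16.0, via the matrix with cross terms
print(y @ D @ y)  # also 16, via the diagonal (cross-term-free) matrix
```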
What we’ve done so far can be put more eloquently in a nice theorem:
Theorem 4: The Principal Axes Theorem
Let A be an n × n symmetric matrix. Then there is an orthogonal change of variable, x = Py, that transforms the quadratic form xᵀAx into a quadratic form yᵀDy with no cross-product term.
Finding the principal axes (which are the eigenvectors of A) amounts to finding a new coordinate system with respect to which the graph is in standard position. So, for the left graph in figure 7.2, we have that A = [5 −2; −2 5], where the eigenvalues equal 3 and 7, with unit eigenvectors

u1 = [1/√2; 1/√2], u2 = [−1/√2; 1/√2]
Look at figure 7.3. We see basically three cases: the left two graphs are always positive (except at x = 0), the rightmost graph is always negative (except at x = 0), and the third graph is a mix. We have nice terms for this:
Definition
A quadratic form Q is:
a. positive definite if Q(x) > 0 for all x ≠ 0,
b. negative definite if Q(x) < 0 for all x ≠ 0,
c. indefinite if Q(x) assumes both positive and negative values.
Also, Q is said to be positive semidefinite if Q(x) ≥ 0 for all x, and to be negative semidefinite if Q(x) ≤ 0 for all x. The quadratic forms in (a) and (b) in figure 7.3 are both positive semidefinite, but the form in (a) is best described as positive definite.
The type is related to the eigenvalues.
Theorem 5: Quadratic Forms and Eigenvalues
Let A be an n × n symmetric matrix. Then the quadratic form xᵀAx is:
a. positive definite if and only if the eigenvalues of A are all positive,
b. negative definite if and only if the eigenvalues of A are all negative,
c. indefinite if and only if A has both positive and negative eigenvalues.
Proof
By the Principal Axes Theorem, there exists an orthogonal change of variable x = Py such that

Q(x) = xᵀAx = yᵀDy = λ1y1² + λ2y2² + · · · + λnyn² (7.4)

Since P is invertible, there is a one-to-one correspondence between all nonzero x and all nonzero y. Thus, the values of Q(x) for x ≠ 0 coincide with the values of the expression on the right side of (7.4), which is obviously controlled by the signs of the eigenvalues λ1, ..., λn in the three ways described in the theorem.
Example
Is Q(x) = 3x1² + 2x2² + x3² + 4x1x2 + 4x2x3 positive definite?
The matrix is A = [3 2 0; 2 2 2; 0 2 1], of which the eigenvalues can be found to equal 5, 2, and −1. Thus, Q is an indefinite quadratic form, not positive definite.
If a quadratic form is positive definite, then the associated matrix is said to be a positive definite matrix. Other terms, such as a positive semidefinite matrix, are defined analogously.
Time for celebration day, because you’ve reached the end of the summary and you’ve read everything
you need to know for the exam.