
Notes 1 – Geometric vectors in the plane and in space

L1.1 Applied vectors and free vectors. The typical treasure map gives information to you
using applied vectors: from the oak tree walk 30 steps east and then walk 40 steps north.
So, in the first part of our treasure hunt, we start from a point O and then we move along
the west-east direction, in the verse towards east, walking a distance of magnitude 30 steps.
This task can be encoded by an arrow (vector) starting at the point O (application point).
This is an example of an applied vector.
In Physics applied vectors are widely used. Think of a force applied to a body, of the speed of
a point, and so on.
In Mathematics we want to separate the information encoded in applied vectors into two
pieces: application point + vector.
Definition. A free vector →v is completely determined by
(i) the direction of the line containing →v,
(ii) the verse following which we move along the line according to →v,
(iii) the magnitude (or length) of →v, denoted ||→v||.

Definition. An applied vector is a free vector with an application point.

Note that for each free vector there are infinitely many applied vectors, one for each possible
application point. This means that different applied vectors can correspond to the same free
vector.
Exercise. When is it the case that different applied vectors correspond to the same free vector?

Example. A common way to produce applied vectors is to take two points, say A and B. Then
we write →AB, or (B − A), to represent the applied vector going from A to B. Notice that →BA
has the same direction and magnitude as →AB but opposite verse.
Exercise. What kind of vector is →AA?

There are different ways to visualize the set of free vectors of the plane; one is the following.
Fix a point O and apply all free vectors to this point. Then, the free vectors of magnitude R
are in bijection with the points of the circumference of center O and radius R.
Exercise. How can we visualize free vectors in three dimensional space?

L1.2 Operations with vectors. Natural numbers arose naturally to count objects, and for a
long while there was no place for the number zero (no objects). However, if we want not
only to count, but also to perform operations, zero is of crucial importance.

Definition. The zero vector →0 is the only vector of null magnitude. It does not have direction
or verse.

We first see how to multiply a vector by a real number c ∈ R which we call a scalar.


Definition. Let c ∈ R and →v be a free vector. If c = 0 then c→v = 0→v = →0. If c ≠ 0 then
c→v is the vector:
(i) having the same direction as →v,
(ii) having the same verse as →v if c > 0 and the opposite verse if c < 0,
(iii) having magnitude ||c→v|| = |c| ||→v||.

Exercise. Define the multiplication by a scalar for applied vectors.

Note that c→v is obtained by contracting or dilating the vector →v. Also, take note of the
absolute value of c appearing in the definition. What would be the problem with c||→v||?
Let's go back to our treasure hunt. To reach X, we have to start from O, walk 30 steps east
and 40 steps north. Or, we can move 50 steps north-east from O and then reach X. In other
words, we can move along the diagonal →OX instead of following the two sides.
Definition. (Parallelogram Rule) Given free vectors →u and →v, their sum →u + →v is obtained
in the following way. Apply →u and →v at the same point O and consider the parallelogram
having →u and →v as two consecutive sides. Then, →u + →v is the diagonal of the parallelogram
starting from O.

Exercise. Work out the sum of two parallel vectors.

Exercise. What can we say about the sum of applied vectors? Is it always defined?

Multiplication by a scalar and the addition of vectors have many useful properties. These
properties make computing expressions with vectors very similar to computing expressions
with real numbers.
Proposition. (Basic Properties) Let c, d ∈ R and →u, →v and →z be free vectors, then the
following hold:
(i) c(d→u) = (cd)→u,
(ii) (c + d)→u = c→u + d→u,
(iii) →u + →v = →v + →u,
(iv) →u + (→v + →z) = (→u + →v) + →z,
(v) c(→u + →v) = c→u + c→v.

Proof. Let’s see how to prove some of the properties above.


(i) If cd = 0, then both vectors are the zero vector by definition. Assume c and d are not
zero. We need to show that the vectors →w = c(d→u) and →z = (cd)→u are equal. To do this we
show that their direction, verse, and magnitude are the same. By definition of multiplication
by a scalar, →w has the direction of →u and similarly for →z. If cd > 0, then the verse of →w is
the one of →u and similarly for →z; if cd < 0 the same argument applies. Finally,
||→w|| = |c| ||d→u|| = |c| |d| ||→u|| = |cd| ||→u|| = ||→z||,
and we are done.
(v) Let →w = c(→u + →v) and →z = c→u + c→v. Again, we want to show that →w = →z, and
this is obvious if c = 0. Thus, we assume c ≠ 0. To determine →w we make →u and →v
share their tail, then we consider the parallelogram having →u and →v as sides. Then, →w
has the direction of the diagonal of the parallelogram starting at the common tail, the same or
opposite verse depending on the sign of c, and its magnitude is obtained by multiplying the
magnitude of the diagonal by |c|. The vector →z is determined similarly, but considering the
previous parallelogram scaled by |c|. Hence the desired equality follows.
QED
Exercise. Prove the remaining properties.

If we want to sum the vectors →u, →v and →z, the result does not depend on the order in
which we compute. A handy way of doing this is concatenation. Apply →u in O, then
apply →v at the head of →u, and finally apply →z at the head of →v. If the head of →z is now
in the point A, then →u + →v + →z = →OA.
Exercise. Using twice the Parallelogram Rule, prove that the previous way of computing the
sum of vectors produces the right answer.

Using the previous Proposition, we can show that (−1)→v can be reasonably called −→v. In
fact, (−1)→v differs from →v only in the verse, which is the opposite. Thus, (−1)→v + →v = →0.
Exercise. If the previous argument seems too obvious, read it again and be sure to understand
each step.

Exercise. Take a triangle and see its sides as vectors by assigning arrows of your choice. Now
take the sum of these three vectors. What do you have to do to obtain the zero vector? Repeat
for a square.

L1.3 Components. Vectors of magnitude one play a special role and they deserve a name.
Definition. A vector of magnitude one is called a versor or unit vector.


Exercise. Given a vector →u, show that →u/||→u|| is a unit vector.

To describe a point in the plane or in three space we can use coordinates. In order to do
this in the plane, we fix two orthogonal, oriented axes and we choose a way to measure
lengths on these axes. Then the position of each point is completely determined by taking
orthogonal projections.
The same can be done with vectors. Each coordinate, oriented axis provides us with a unit
vector. In three space these are usually called →i, →j, and →k.
Using the concatenation description of the sum of vectors, we can represent each free vector
as a linear combination of these three unit vectors.


Definition/Proposition. Any free vector →v in three space can be written as →v = vx →i +
vy →j + vz →k, for vx, vy, vz ∈ R. The numbers vx, vy, and vz are called the components of
→v with respect to the coordinate axes. In two space, i.e. in the plane, there is simply no vz
component.

Exercise. Use orthogonal projection to find the expression in components of any vector.

Example. If we think of the vector →v = →i + 2→j + 3→k as applied at the intersection of the
coordinate axes, then the head of →v is at the point of coordinates (1, 2, 3).

Operations on vectors can be easily performed taking advantage of components and of the
Basic Properties.

Example. Consider the vectors →v = →i + 2→j + 3→k and →w = 3→i + 2→j + →k. Then, by applying
the basic properties, we get
→v + →w = 4→i + 4→j + 4→k = 4→z,
where →z = →i + →j + →k.

From this example it is possible to derive a general rule.

Proposition. (Linear Combination of Vectors) Let a, b ∈ R. If we consider the vectors
→v = vx →i + vy →j + vz →k and →w = wx →i + wy →j + wz →k, then
a→v + b→w = (a vx + b wx) →i + (a vy + b wy) →j + (a vz + b wz) →k.

Proof. We propose two arguments, one algebraic and the other geometric. Algebraic: the result
simply follows by applying the Basic Properties (i)-(v) to the expressions in components of
→v and →w. Geometric: first we notice that the proof can be split in two steps: the case a = 0
and then the case a = b = 1. To prove the proposition, it is enough to use orthogonal
projections to determine the components. For example, in the case a = 0 and b > 0, we
have to determine the components of b→w. As b→w is a dilation, or contraction, of →w by the
factor b, the result on the components follows. QED

Exercise. Why is it enough to prove the Proposition for the case a = 0 and then the case a = b = 1?
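To see the component rule in action, here is a minimal sketch in plain Python (the helper name and the sample vectors are ours, chosen only for illustration):

    # Linear combination a*v + b*w computed component by component.
    def linear_combination(a, v, b, w):
        # v and w are 3-tuples of components (vx, vy, vz) and (wx, wy, wz)
        return tuple(a * vi + b * wi for vi, wi in zip(v, w))

    v = (1.0, 2.0, 3.0)   # v = i + 2j + 3k
    w = (3.0, 2.0, 1.0)   # w = 3i + 2j + k
    print(linear_combination(1, v, 1, w))   # (4.0, 4.0, 4.0), i.e. 4i + 4j + 4k as above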
Notes 2 – More operations with vectors: scalar product,
vector product and mixed product
It is clear that assigning, and sharing, vectors using direction, verse and magnitude is not the
best possible way. Of course using components is much more efficient and feasible. However,
the component presentation of vectors hides the geometry. For example, we can ask: what
is the length of a vector? How can we measure the angle between two given vectors? In
this section we introduce very important operations with vectors which will also provide
answers to these questions.

L2.1 Dot Product/Scalar product. We first consider angles.


Definition. Consider vectors →u and →w. The angle between the vectors →u and →w is the
smallest angle that the first vector, →u, spans while rotating onto the second vector, →w.

We now see how to use the angle between two vectors.


Definition. The scalar product, or dot product, of the vectors →u and →w is
→u · →w = ||→u|| ||→w|| cos(α),
where α is the angle between →u and →w.

Example. Consider the unit vectors →i, →j and →k. We want to compute →i · →i. In order to do
this, recall that →i is a unit vector, thus ||→i|| = 1. Moreover, →i forms the zero angle with itself.
Hence we have
→i · →i = ||→i|| ||→i|| cos(0) = 1.
Similarly we can compute
→j · →k = ||→j|| ||→k|| cos(π/2) = 0.


Exercise. Compute the dot product of all possible pairs chosen among the unit vectors →i, →j
and →k. Does the result depend upon the order in which we choose the vectors?

The dot product is strictly related with projections.


Proposition. Let →u and →w be non-zero vectors, then
(→u · →w) →w / ||→w||²
is the orthogonal projection of →u along the direction of →w.

Proof. The proof follows by standard trigonometry, noticing that the scalar
(→u · →w) / ||→w|| = ||→u|| cos(α)
is the length of the orthogonal projection. QED

Here are the fundamental properties of the dot product


Proposition. Consider vectors →u, →w, →z and a scalar a ∈ R. Then the following hold:
(i) (Linearity 1) (a→u) · →w = a(→u · →w).
(ii) (Linearity 2) (→u + →w) · →z = →u · →z + →w · →z.
(iii) (Symmetry) →u · →w = →w · →u.
(iv) (Positivity) →u · →u = ||→u||² ≥ 0.

Proof. (Hint) The first two properties are proved by using the projection interpretation of
the dot product, while the remaining properties follow from the definition. QED

The linearity properties and the symmetry allow us to use the components of vectors to
compute the dot product.

Example. (→i + 2→j) · (2→j − →k)
= →i · (2→j − →k) + (2→j) · (2→j − →k)
= →i · (2→j) + →i · (−→k) + (2→j) · (2→j) + (2→j) · (−→k)
= 2 →i · →j − →i · →k + 4 →j · →j − 2 →j · →k
= 2 · 0 − 0 + 4 · 1 − 2 · 0 = 4.

Exercise. Using the components of →u, find the expression of ||→u||.

Because of linearity, we have a useful formula to compute the dot product.



Proposition. Consider the vectors →v = vx →i + vy →j + vz →k and →w = wx →i + wy →j + wz →k,
then
→v · →w = vx wx + vy wy + vz wz.

We can now determine the angle formed by two vectors.


Proposition. Let α be the angle between the vectors →u and →w, then
α = arccos( (→u · →w) / (||→u|| ||→w||) ).
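As a quick numerical illustration of the last two propositions, here is a minimal sketch in plain Python (the function names and the sample vectors are ours):

    import math

    # Dot product, norm and angle between two vectors given by their components.
    def dot(u, v):
        return sum(ui * vi for ui, vi in zip(u, v))

    def norm(u):
        return math.sqrt(dot(u, u))

    def angle(u, v):
        return math.acos(dot(u, v) / (norm(u) * norm(v)))

    u = (1.0, 2.0, 0.0)                 # i + 2j
    v = (0.0, 2.0, -1.0)                # 2j - k
    print(dot(u, v))                    # 4.0, as in the worked example above
    print(math.degrees(angle(u, v)))    # the angle, here expressed in degrees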

Notice that we can easily check orthogonality of vectors using the dot product.


Corollary. If →u, →w ≠ →0, then: →u ⊥ →w if and only if →u · →w = 0.

Example. The vector →u = →i + →j has the direction of the bisectrix of the first quadrant of the x, y
plane. Computing
→u · →i = 1
and
||→u|| = √(→u · →u) = √2,
we see that the x-axis forms an angle of π/4 with the bisectrix, as expected.

Exercise. Try the same with the first orthant in three space. Before computing, make a guess:
what is the angle between the bisectrix and the x -axis?

L2.2 Vector product. Dot product detects orthogonality and can compute angles. We will
now introduce a new operation useful by itself, which can help us to detect parallelism and
to compute areas.
Definition. Given vectors →u and →w, the vector product
→u × →w
is the unique vector →z such that:
(i) the direction of →z is orthogonal to the direction of →u and to the direction of →w;
(ii) the magnitude of →z is
||→z|| = ||→u|| ||→w|| sin(α),
where α is the angle between →u and →w;
(iii) the verse of →z is given by the right-hand rule.

The right-hand rule to determine the verse of →u × →w works as follows. Close your right
hand with your thumb sticking out. Make the tips of your fingers follow →u, the 1st vector,
moving onto →w, the 2nd vector, along the smallest angle between the vectors. Your thumb
then gives the verse of →u × →w.

Example. Consider the unit vectors →i, →j and →k. To compute →z = →i × →j we use the
definition. The direction of →z is orthogonal to →i and →j, and hence it is the direction of →k.
Moreover, ||→z|| = 1 as →i and →j are orthogonal unit vectors. The right-hand rule gives the verse
of →z and this is the verse of →k. Thus,
→i × →j = →k.

Exercise. Check that →j × →k = →i and →i × →k = −→j. What can you say about →i × →i, →j × →j
and →k × →k?

We said that the vector product can detect parallelism between two vectors. This can be
easily done noticing the following.

Corollary. If →u, →w ≠ →0, then: →u ∥ →w if and only if →u × →w = →0.

We now collect some useful properties of the vector product.


Proposition. Consider vectors →u, →w, →z and a scalar a ∈ R. Then the following hold:
(i) (Linearity 1) (a→u) × →w = a(→u × →w).
(ii) (Linearity 2) (→u + →w) × →z = →u × →z + →w × →z.
(iii) (Skew symmetry) →u × →w = −→w × →u.
(iv) →u × →u = →0.

Proof. The proof of (ii) will require the notion of mixed product and it will be given in the
next section. We give here a proof of (i). If one among a, →u and →w is zero, then the equality
is clear. Hence we may assume that none of them is zero. Let
→z = (a→u) × →w
and
→t = a(→u × →w).
We want to show that →z = →t. It is easy to see that →z and →t have the same direction, as
this is the common perpendicular to →u and →w. Also the magnitudes coincide, in fact
||→z|| = ||a→u|| ||→w|| sin(α) = |a| ||→u|| ||→w|| sin(α) = ||→t||.
Finally, we deal with the verse. First consider the case a > 0. Notice that the angle between a→u
and →w and the angle between →u and →w coincide. Thus →z and →t have the same verse.
Now consider the case a < 0. In this situation, (a→u) × →w and →u × →w have opposite verse.
But, as a < 0, the vectors →z and →t have, again, the same verse. QED

These properties allow us to compute the vector product of any pair of vectors.

Example. (→i + 2→j) × (2→j − →k)
= →i × (2→j − →k) + (2→j) × (2→j − →k)
= →i × (2→j) + →i × (−→k) + (2→j) × (2→j) + (2→j) × (−→k)
= 2 →i × →j − →i × →k + 4 →j × →j − 2 →j × →k
= 2→k − (−→j) + →0 − 2→i = −2→i + →j + 2→k.

Extensively using the properties of the vector product we can find the following formula to
compute the vector product.

Proposition. Consider the vectors →v = vx →i + vy →j + vz →k and →w = wx →i + wy →j + wz →k,
then
→v × →w = (vy wz − vz wy) →i − (vx wz − vz wx) →j + (vx wy − vy wx) →k.

Exercise. Prove the formula!
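A minimal sketch of this formula in plain Python (the function name and the test vectors are ours) can help when checking computations by hand:

    # Cross product from components, following the formula above.
    def cross(v, w):
        vx, vy, vz = v
        wx, wy, wz = w
        return (vy * wz - vz * wy,
                -(vx * wz - vz * wx),
                vx * wy - vy * wx)

    # (i + 2j) x (2j - k) from the earlier example: expect (-2, 1, 2)
    print(cross((1, 2, 0), (0, 2, -1)))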

We conclude by describing how the vector product relates to computing areas.

Proposition. Consider vectors →u and →w, then
||→u × →w|| = 2 area(T) = area(P),
where T is the triangle, and P is the parallelogram, of sides →u and →w.

Proof. We use standard trigonometry. Recall that ||→u × →w|| = ||→u|| ||→w|| sin(α) where α is
the angle between →u and →w. Now it is enough to notice that ||→w|| sin(α) is the length of
the height of the triangle T with respect to the base →u. QED

L2.3 Mixed product. Given three vectors, there is essentially only one way to merge them
using the dot product and the vector product.
Definition. For vectors →u, →v and →w we define the mixed product
→u × →v · →w.
Note that in the previous expression no parentheses are needed! The only possible reading is
(→u × →v) · →w, since →v · →w is a scalar and →u × (→v · →w) would be meaningless.


The mixed product is used to compute volumes, as shown by the following result.
Proposition. The absolute value of the mixed product
|→u × →v · →w|
is the volume of the box of sides →u, →v and →w.
Proof. (Hint) The volume of the box is given by the base area times the height with respect
to that base. If we choose the base bounded by →u and →v, then
BaseArea = ||→u × →v||.
The height with respect to this base is given by the magnitude of the orthogonal projection
of the third side →w on →u × →v. Thus we get
BaseArea × Height = |→u × →v · →w|.
QED
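The following minimal Python sketch (the helper names and the sample box are ours) computes a mixed product from components and hence the volume of a box:

    # Mixed product u x v . w and the volume of the box spanned by u, v, w.
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    def cross(v, w):
        vx, vy, vz = v
        wx, wy, wz = w
        return (vy * wz - vz * wy, vz * wx - vx * wz, vx * wy - vy * wx)

    def mixed(u, v, w):
        return dot(cross(u, v), w)

    u, v, w = (2, 0, 0), (0, 3, 0), (0, 0, 5)   # an axis-aligned box with edges 2, 3, 5
    print(abs(mixed(u, v, w)))                  # 30, the volume of the box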

From this result we can understand what happens when we permute the vectors of the
mixed product. In particular, for i ≠ j ≠ k,
→v1 × →v2 · →v3
and
→vi × →vj · →vk
are equal up to sign, and whenever two vectors are swapped, →vi ↔ →vj, the sign changes.
We already noted that, when a product (dot or vector) is zero, we can derive some useful
information. The same holds for the mixed product.


Proposition. If →u, →v, →w ≠ →0, then
→u × →v · →w = 0
if and only if the vectors →u, →v, and →w lie on the same plane, i.e. they are coplanar.

Exercise. Prove the previous result.

We finally see how to use the mixed product to prove Linearity 2 for the vector product. We
provide two different arguments.
Proof. (Conceptual) We use the following remarks:
(i) →v = vx →i + vy →j + vz →k if and only if vx = →v · →i, vy = →v · →j and vz = →v · →k.
(ii) →a = →b if and only if →a · →i = →b · →i, →a · →j = →b · →j and →a · →k = →b · →k.
Now we set
→a = (→u + →v) × →w
and
→b = →u × →w + →v × →w
and we show that →a = →b.
Using properties of the mixed product and the linearity properties of the dot product we get
→a · →i = (→u + →v) × →w · →i = →w × →i · (→u + →v) = →w × →i · →u + →w × →i · →v = (→u × →w + →v × →w) · →i = →b · →i.
Repeating the same argument with →j and →k we complete the proof. QED

Proof. (Computational) We want to show that
→t = (→u + →v) × →w − →u × →w − →v × →w = →0,
and this is equivalent to showing that →t · →t = 0. Thus we have to compute the nine summands
of →t · →t and use the mixed product to show that they cancel out.
Now take a deep breath, and start computing.

→t · →t = A + B + C + ||→u × →w||² + 2(→u × →w) · (→v × →w) + ||→v × →w||²,
where
A = [(→u + →v) × →w] · [(→u + →v) × →w],
B = −2[(→u + →v) × →w] · (→u × →w),
C = −2[(→u + →v) × →w] · (→v × →w).

Now we use the mixed product and the fact that cyclically shifting its factors does not change it:
A = →w × [(→u + →v) × →w] · →u + →w × [(→u + →v) × →w] · →v,
−B/2 = →w × (→u × →w) · →u + →w × (→u × →w) · →v,
−C/2 = →w × (→v × →w) · →u + →w × (→v × →w) · →v.
More computing for B and C gives
−B/2 = ||→u × →w||² + (→u × →w) · (→v × →w),
−C/2 = (→u × →w) · (→v × →w) + ||→v × →w||².
The last computation with A yields
A = (→u × →w) · [(→u + →v) × →w] + (→v × →w) · [(→u + →v) × →w]
= (→u + →v) × →w · (→u × →w) + (→u + →v) × →w · (→v × →w)
= →w × (→u × →w) · →u + →w × (→u × →w) · →v + →w × (→v × →w) · →u + →w × (→v × →w) · →v
= ||→u × →w||² + 2(→v × →w) · (→u × →w) + ||→v × →w||².
Finally, we substitute the obtained expressions for A, B and C and we get →t · →t = 0.
QED
Notes 3 – Planes and lines.
A plane is the locus of points determined by a single linear equation, and it can be param-
eterized using two free variables. We begin with the second point of view. Similarly, a line
is determined by two linear equations, that is, by a linear system of equations, and it can be
parameterized using one free variable.

Warning. Since the unit vectors →i, →j and →k are fixed, the expression →v = vx →i + vy →j + vz →k
is redundant. Thus, with a slight abuse of notation, we will sometimes also write →v =
(vx, vy, vz). The latter is less formal but more convenient when computing. We have to be
very careful though and remember: never mix up points and vectors!
Warning. Vectors are usually denoted as →v, but the notation v, that is v in boldface, is also
quite common.

L3.1 The equation of a plane. What is a plane exactly? It is a flat 2-dimensional surface.
As a first example, consider the plane consisting of all points of 'height' z = 1. To describe
all points P(x, y, z) belonging to this plane we can proceed as follows. The position vector
→v = →OP satisfies the equation
→v = s→p + t→q + →v0.        (1)


where →p = →i and →q = →j. Since s, t are free to vary in R they are called free variables or free
parameters, while the vector →v0 is fixed and chosen to be →v0 = →k. If we write vectors using
components as column vectors and we set x = s, y = t, we obtain the following expression
(x, y, z) = x (1, 0, 0) + y (0, 1, 0) + (0, 0, 1).        (2)

We may call either version in (1) or in (2) a parametric equation of the plane.
Setting s = 0 = t in (1) gives us a particular point (0, 0, 1) on the plane with position vector
→v0. The whole plane is described by adding to →v0 linear combinations of two fixed vectors
→p, →q that are parallel to the plane.
A general plane will have the form (1) for arbitrary choices of →v0, →p, →q, provided the last
two are not proportional. But a more common description is provided by the following
Proposition.
Proposition. A plane is the set of points (x, y, z) satisfying a linear equation
ax + by + cz = d,        (3)
where a, b, c, d ∈ R with a, b, c not all zero.

Proof. Write (3) in vector form as
→v · →n = d,        (4)
where →n stands for the vector of components (a, b, c), and choose a solution →v = →v0. Then
→v0 · →n = d, and
(→v − →v0) · →n = 0.
If →v0, →v are the position vectors of two points P0, P (the first fixed, the second varying),
then our equation is saying that the displacement vector →P0P = →v − →v0 is orthogonal to →n.
The point P therefore lies in a plane passing through P0 and perpendicular to →n. QED

The vector
→n = (a, b, c)
is called the normal vector to the plane. Any reasonable surface in R3 has a normal at each
point, but only for a plane is the normal direction constant.

L3.2 Intersection of two planes. Consider the linear system of equations
a1 x + b1 y + c1 z = d1
a2 x + b2 y + c2 z = d2.        (5)
Each equation determines a plane πi with normal vector
→ni = (ai, bi, ci),   i = 1, 2.
We shall assume that the two planes are not parallel, equivalently →n1, →n2 are not propor-
tional. In this case, their intersection will be a line
ℓ = π1 ∩ π2.
We can describe ℓ analytically by solving the linear system of two equations. We will see
a systematic way of doing this in a future lecture. At the moment, we just see a numerical
example, which can easily be generalized to produce a general method.
Example. Consider the intersection of the planes π1 : x + y + z = 1 and π2 : x + 2y + 3z = 4,
that is, consider the system of equations
x + y + z = 1
x + 2y + 3z = 4,
defining the line ℓ = π1 ∩ π2. Since x appears in the first equation we can subtract the first
equation from the second, producing a new system of equations having the same solutions,
namely
x + y + z = 1
y + 2z = 3.
The second equation now readily gives y in terms of z, namely y = −2z + 3. Substituting in the
first equation we get x = −y − z + 1 = (2z − 3) − z + 1 = z − 2 and thus
ℓ : x = z − 2, y = −2z + 3,
that is, P(x, y, z) ∈ ℓ if and only if x = z − 2 and y = −2z + 3. Using components and column
vectors we can write
(x, y, z) = t (1, −2, 1) + (−2, 3, 0),
where z = t, and in the usual vector notation
→v = t→p + →v0,
where →p has components (1, −2, 1) and →v0 has components (−2, 3, 0).

In general, the solutions of the system (5) can be written in one of the following formats:
(x, y, z) = (x0 + tp, y0 + tq, z0 + tr),
(x, y, z) = (x0, y0, z0) + t (p, q, r),
→v = t→p + →v0,   with →p ≠ →0,
where t is free to move in R. The last equation asserts that →v − →v0 is parallel to the fixed
vector →p. The equations therefore determine a straight line ℓ that passes through the point
(x0, y0, z0) with position vector →v0 and direction →p. Any one of these equations is called
the parametric equation of the line. The vector →p is called the direction vector of the line ℓ.
Because ℓ lies in both planes π1 and π2, it is perpendicular to both →n1 and →n2. Thus, we have
the following Lemma.
Lemma. →p is a multiple of →n1 × →n2.

This Lemma gives us a way to find the equation of ℓ, namely compute →n1 × →n2 and then
find a particular solution of (5), perhaps by setting z = 0.

Here is an example.
Example. We shall find the parametric equation of the line ℓ : x + y + z = 1, x + 2y + 3z = 4. The
direction of ℓ is given by →p = (1, 1, 1) × (1, 2, 3) = (1, −2, 1). To find one point →v0 on the line,
we set z = 0 so that
x + y = 1, x + 2y = 4 ⇒ y = 3, x = −2.
Therefore the equation is
→v = →v0 + t→p = (−2, 3, 0) + t (1, −2, 1),
and the direction vector of ℓ is →p = (1, −2, 1). Note that a line does not have a unique paramet-
ric form, thus different methods will not necessarily give the same parametric form.
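A minimal computational sketch of this method (Python with NumPy; the variable names are ours, and it assumes the 2×2 system obtained by setting z = 0 is solvable):

    import numpy as np

    # Line x+y+z=1, x+2y+3z=4: direction = n1 x n2, point found by setting z = 0.
    n1, d1 = np.array([1.0, 1.0, 1.0]), 1.0
    n2, d2 = np.array([1.0, 2.0, 3.0]), 4.0

    p = np.cross(n1, n2)                          # direction vector, here (1, -2, 1)

    A = np.array([[n1[0], n1[1]], [n2[0], n2[1]]])
    x0, y0 = np.linalg.solve(A, np.array([d1, d2]))
    v0 = np.array([x0, y0, 0.0])                  # particular point, here (-2, 3, 0)
    print(p, v0)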

L3.3 Further exercises.

1. Given the planes π1 : x + y + z = 1 and π2 : x + 2y − z = 0, let ℓ = π1 ∩ π2. Say whether

(i) there exist a, b, c ∈ R such that ℓ is given by (x, y, z) = (1+at, −1+bt, 1+ct),
(ii) there exist p, q, r ∈ R such that ℓ is given by (x, y, z) = (p−3t, q+2t, r+t).

2. Find a unit vector orthogonal to


(i) the plane 7x + y + z = 5,
(ii) the plane that contains the points (1, 2, −2), (1, 0, 3), (−4, 4, 4),
(iii) both the lines (x, y, z) = (−1, t, 2t) and (x, y, z) = (t, 1 + t, 1 − t).

3. Given the planes


π1 : ax + y − 2z = 0,
π2 : y + z + 2b = 0,
π3 : 2x + y + 2z = 1,
find a and b such that the line π1 ∩ π2 is parallel to π3 .

4. Let ℓ be the line x = 2y = z and π the plane x = y + z. Explain why a line m in π that
meets ℓ necessarily has the form (x, y, z) = (at, bt, ct), and find the condition on a, b, c for
which m is orthogonal to ℓ.
Notes 4 – More on planes and lines.
In this lecture we investigate how lines and planes relate to each other. We will also see how
to deal with symmetry and projection with respect to planes and lines.

L4.1 Parallelism and orthogonality. We presented a plane as the set of solutions of one linear
equation
π : ax + by + cz = d
and we introduced the normal vector of the plane →n = a→i + b→j + c→k. In contrast with
this situation, a line r in three space is the solution set of a linear system having two linearly
independent equations,
r : a1 x + b1 y + c1 z = d1,   a2 x + b2 y + c2 z = d2;
thinking that a line in three space is presented by one equation is one of the worst things you
can do in this class! A parametric equation of this line can be derived using the direction
vector →p = →n1 × →n2, thus
→v = →v0 + t→p;
as t varies the position vector →v describes the line r.
By the geometric meaning of the normal vector and the direction vector, it is easy to see that
(i) the planes α, β are parallel if and only if →nα ∥ →nβ;
(ii) the planes α, β are orthogonal if and only if →nα ⊥ →nβ;
(iii) the plane α and the line r are parallel if and only if →nα ⊥ →vr;
(iv) the plane α and the line r are orthogonal if and only if →nα ∥ →vr;
(v) the lines r, s are parallel if and only if →vr ∥ →vs;
(vi) the lines r, s are orthogonal if and only if →vr ⊥ →vs.

L4.2 Intersection of planes and lines. As lines and planes are defined as solution sets of
linear equations, in order to intersect them it is enough to solve a (larger) linear system of
equations. For the time being, we will solve these systems of equations by hand, but a more
systematic way of studying them will be introduced soon.
The intersection of two planes α and β is computed by solving a 2 × 3 linear system of
equations, that is a system with 2 equations in 3 unknowns. If the normal vectors →nα and
→nβ are proportional, then the planes are either parallel, and we will find no solution, or they
coincide, and we will find infinitely many solutions depending on two parameters. If the
normal vectors are not parallel then α ∩ β is a line.
The intersection of a plane α and a line r is obtained by solving a 3 × 3 linear system of
equations, that is a system with 3 equations in 3 unknowns. If the normal vector of the
plane →n and the direction vector of the line →p are orthogonal, then either the plane and
the line are parallel, and we will find no solution, or the line lies inside the plane, and we
will find infinitely many solutions depending on one parameter. If the normal vector of the
plane →n and the direction vector of the line →p are not orthogonal, α ∩ r is exactly one point
and the system has exactly one solution.
The intersection of two lines r, s is determined by solving a 4 × 3 linear system of equations,
that is a system of 4 equations in 3 unknowns. If the direction vectors of the lines →v1 and
→v2 are parallel, so are the lines, and no intersection exists, thus the system has no solution;
note that r and s lie in the same plane, that is, they are coplanar. However, when the
direction vectors are not proportional, a new phenomenon occurs. If the lines are coplanar,
then they must intersect: exactly one common point exists, and the system has exactly one
solution. But, if the lines are not coplanar, then no common points exist, and the system has
no solutions; r and s are called skew lines.
Skew lines are an important new feature of three space compared with two space. In the plane,
two lines are either intersecting or they are parallel; notice that they of course lie in the same
plane! In three space, though, two lines can also be non-coplanar: in this case they are not
intersecting, they are not parallel, and we call them skew lines.
Exercise. How do you find the plane containing two parallel lines?

Exercise. Can you find a plane containing two skew lines?

Actually, when intersecting a line and a plane, this can be done more efficiently without us-
ing the 3 × 3 system of equations. We can just use the parametric equation of the line, containing
only one variable t, and substitute it into the equation of the plane.
Example. Find the intersection of the plane α : x + y + z = 0 and of the line
r : y − x = 1,   z − y = 1.
We first find a parametric equation of the line. We can do this by finding the direction vector
using the vector product and then picking a random point of the line. However, in this case it
is simpler to use the structure of the equations. If we set y = t we can readily solve the system
and get the parametric equation
(x, y, z) = (−1, 0, 1) + t (1, 1, 1).
To find r ∩ α we substitute the parametric equation of the line into the equation of the plane and
we get
(t − 1) + (t) + (t + 1) = 0
and thus t = 0. Setting t = 0 in the parametric equations of r we get the intersection point
(−1, 0, 1).

Exercise. Find the parametric equations for r using the vector product. Do you get the same
equations?
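The substitution step can be packaged in a few lines of Python with NumPy (a rough sketch with names of our own choosing; it assumes the line is not parallel to the plane, i.e. →n · →p ≠ 0):

    import numpy as np

    # Intersection of the line v0 + t*p with the plane n . x = d.
    def line_plane_intersection(v0, p, n, d):
        t = (d - np.dot(n, v0)) / np.dot(n, p)
        return v0 + t * p

    v0 = np.array([-1.0, 0.0, 1.0])          # point on r from the example above
    p  = np.array([1.0, 1.0, 1.0])           # direction of r
    n, d = np.array([1.0, 1.0, 1.0]), 0.0    # plane x + y + z = 0
    print(line_plane_intersection(v0, p, n, d))   # [-1.  0.  1.]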

L4.3 Orthogonal projection and symmetries. If a point P does not lie on a plane α, then
the orthogonal projection of the point on the plane is the point of α closest to P, that is, it is
the best approximation of P in α. This is why orthogonal projections naturally appear in
numerical applications. Symmetries (or reflections), besides their aesthetic appeal, have
similar connections with applications.
Let's see how to find orthogonal projections on planes.
Lemma. Fix O ∈ α. The orthogonal projection of P on α is the point H such that
→HP = (→OP · →nα) →nα / ||→nα||².
Proof. Here is the basic remark: for a given point P, let H ∈ α be the orthogonal projection
of P on α. Then the triangle of vertices P, O and H has a right angle in H and we have the
following vector equation: →OH + →HP = →OP.
Thus, →HP is the orthogonal projection of →OP on →nα and this is enough to conclude the
proof. QED

Example. Consider the plane α : z = 0 and the point P(x, y, z). To find the orthogonal projec-
tion of the point P, we pick O(0, 0, 0) and then we apply the lemma, noticing that →nα = →k, and
we get
→HP = ((x→i + y→j + z→k) · →k) →k = z→k,
and we get H(x, y, 0), as it should be.

Exercise. If P(xP, yP, zP) and α : ax + by + cz = d, let
T = (a xP + b yP + c zP − d) / (a² + b² + c²).
Then the orthogonal projection of P on α is the point
(xP − aT, yP − bT, zP − cT).
Prove this formula.
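The formula in the exercise translates directly into a short NumPy sketch (the function name and test data are ours):

    import numpy as np

    # Orthogonal projection of P onto the plane ax + by + cz = d, with n = (a, b, c).
    def project_onto_plane(P, n, d):
        T = (np.dot(n, P) - d) / np.dot(n, n)
        return P - T * n

    P = np.array([3.0, 2.0, 1.0])
    n, d = np.array([1.0, 1.0, 1.0]), 0.0    # plane x + y + z = 0
    print(project_onto_plane(P, n, d))       # [ 1.  0. -1.]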

Rather than using the previous formulae, it is possible to follow an alternative geometric
argument in order to find orthogonal projections. Given a point P and a plane α, let H ∈ α
be the orthogonal projection of P. In order to find H, find the unique line r passing
through P and such that r ⊥ α. Then, H = r ∩ α.
Example. Find the orthogonal projection of P(3, 2, 1) on the plane α : x + y + z = 0. The
direction vector of the line r can be chosen equal to the normal vector of the plane, that is
→p = →nα = →i + →j + →k. Since P ∈ r, the parametric equations of the line are
(x, y, z) = (3, 2, 1) + t (1, 1, 1).
To find r ∩ α we substitute the parametric equation of the line into the equation of the plane
and we get
(3 + t) + (2 + t) + (1 + t) = 0
and thus t = −2. Setting t = −2 in the parametric equations of r we get the orthogonal
projection H(1, 0, −1).
The treatment of symmetries is very similar to that of orthogonal projections. The key
remark is the following: if Q is the symmetric point of P with respect to the plane α, then the
middle point of PQ is H, where H is the orthogonal projection of P on α.
Lemma. Fix O ∈ α. The symmetric point of P with respect to the plane α is the point Q such
that
→QP = 2 (→OP · →nα) →nα / ||→nα||².
Example. Find the point Q symmetric to P(1, 2, 3) with respect to the plane α : z = 0.
Fix O(0, 0, 0) and note that →nα = (0, 0, 1). Let Q(x, y, z) and apply the Lemma:
→QP = 2 (→OP · →nα) →nα / ||→nα||² = 2 ((1, 2, 3) · (0, 0, 1)) (0, 0, 1) = (0, 0, 6),
thus →QP = (P − Q) = (1 − x, 2 − y, 3 − z) = (0, 0, 6) and hence Q(1, 2, −3).

Exercise. Find an expression for the coordinates of Q as we did for the coordinates of the
orthogonal projection.

L4.4 Further exercises.

1. Find the orthogonal projection of the points A(1, 2, 3) and B(3, 2, 1) on the plane α :
x − z = 1.

2. Describe the mutual position of all possible pairs chosen from:
α : x + y + z = 1,   r : x = y = z,   s : x + y + z = 0, x − y − 2 = 0.

3. Choose three pairwise non-parallel planes and consider the three lines obtained as the
pairwise intersections of the planes. Do the lines have a point in common? Does the answer
(YES/NO) depend on the initial choice of the planes?

4. Consider the triangle T of vertices A(1, 0, 0), B(0, 1, 0), C(0, 0, 1), and let T′ be the or-
thogonal projection of T on the plane x + y + z = 5. Find the area of T′. How does it
compare to the area of T?

5. Same exercise with the plane π : x + y − z = 5.

6. Discuss symmetries with respect to a line.

7. Find three planes intersecting exactly in the point P (1, 2, 3). Can you find three lines
intersecting exactly at the same point?
Notes 5 – Matrix addition and multiplication

L5.1 Matrices and their entries. A matrix is a rectangular array of numbers. Two examples are
1 2 3
0 6 9
and
0 0 0 0 1
1 0 0 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
of sizes 2×3 and 5×5; single columns (for instance of size 4×1) and 2×2 square arrays are
matrices as well.
The individual numbers are called the entries, elements, or components of the matrix. If the
matrix has m rows and n columns, we say that it has size 'm by n' or m×n. If m = n (as in
the 5×5 example) the matrix is obviously square.
The set of matrices of size m × n whose entries are real numbers is denoted by Rm,n; the
first superscript is always the number of rows. Sometimes we use symbols to represent
unspecified numbers, so the statement
(a b; c d) ∈ R2,2
is tantamount to saying that a, b, c, d are real numbers.


For matrices with more than about 4 entries, it is convenient to use subscripts to label the
entries. Given a matrix A, we typically denote by aij the entry in the ith row and j th
column (lower case to emphasize that the entry is a number).
Example. In this notation, the generic 3×4 matrix is
 
a11 a12 a13 a14
 a21 a22 a23 a24  .
a31 a32 a33 a34

Mathematicians like to deal in generalities and will even write a matrix as A = (aij ) without
specifying its size.

Definition. The transpose of a matrix A is the matrix, indicated ᵗA or Aᵀ, obtained by
interchanging its rows and columns.

For example
(1, 2, −7)ᵀ = (1; 2; −7),
and the transpose of the 5×5 matrix shown above is
0 1 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
1 0 0 0 0

If A ∈ Rm,n then Aᵀ ∈ Rn,m. In subscript notation, we have
(Aᵀ)ij = aji.
Notice that (Aᵀ)ᵀ = A, so the operation of taking the transpose is self-inverse.
L5.2 Vectors. Of special importance are matrices that have only one row or column; they are
called row and column vectors. In writing a row vector with digits, it is useful to use commas
to separate the entries. For example, both the matrices
 
A = (1, 2, −7) ∈ R1,3,   B = (1; 2; −7) ∈ R3,1

can be used to represent the point in space with Cartesian coordinates x = 1, y = 2, z = −7.
(Sometimes commas are used to distinguish between matrices and row vectors, but it is
simpler to regard them as the same object.)
One can switch between row and column vectors by observing that A = Bᵀ or B = Aᵀ. For
this reason, the distinction between a row vector and a column vector is often unimportant,
and the sets R1,n and Rn,1 can be written more simply Rn , and we can refer to both A ∈ R3
and B ∈ R3 as ‘vectors’ of length 3. We shall use such vectors to study analytic geometry
later in the course.
Whenever we write ‘Rn ’ the reader is free to use row or column vectors as he or she prefers;
if such a choice is not possible, we shall use the other notation to specify either rows or
columns. Actually, vectors tend to be given lower-case names, and a vector of unspecified
length n is more likely to be written
u = (u1, . . . , un)   or   v = (v1; · ; · ; vn)   or   x = (x1; · ; · ; xn).

Row and column vectors are not merely special cases of matrices. Any matrix can be re-
garded as an ordered list of both row vectors and column vectors. Given a matrix A ∈ Rm,n ,
we shall denote its rows (thought of as matrices in their own right) by

r1 , . . . , rm ∈ R1,n

and its columns


c1 , . . . , cn ∈ Rm,1
More informally (ignoring parentheses in a way that would spell disaster in a computer
program), we may write
(r1; · · ·; rm) = A = (c1 | · · | cn),
viewing A either as a stack of its rows or as a list of its columns.
Much of the study of matrices is ultimately based on one or other of these two descriptions.

L5.3 Addition of matrices. A matrix is much more than an array of data. It is an algebraic
object that is subject to operations generalizing the more familar ones applicable to numbers
and vectors.
Definition. To form the sum of two matrices A, B , they must have the same size. The
entries are then added component-wise.

For example
A = (1 2 3; 0 6 9),   B = (0 −2 4; 2 −6 1)   ⇒   A + B = (1 0 7; 2 0 10).

Of course, the result is still a matrix of size 2×3.


In particular, if we add a matrix to itself, we merely double every entry and it is reasonable
to call the result 2A:
A + A = (2 4 6; 0 12 18) = 2A.

Definition. If c ∈ R and A ∈ Rm,n then cA is the matrix formed by multiplying every entry
of A by c.

If c is zero, we get a null matrix
0A = (0 0 0; 0 0 0) = 0.
A null matrix is denoted by 0, in boldface, or even 0 like the number, provided the context makes
clear its size. Of course, these definitions apply equally to vectors, so for example

2(x, y, z) = (2x, 2y, 2z).

We denote (−1)B by −B, so that matrices can be subtracted in the obvious way:
A − B = (1 4 −1; −2 12 8) = A + (−B).

Exercise. Explain why A + B = B + A and (A + B)ᵀ = Aᵀ + Bᵀ.

L5.4 Matrix multiplication. First we define a numerical product between two vectors u, v
of the same length. For this it does not really matter whether they are row or column vectors,
but for egalitarian purposes we shall suppose that the first is a row vector and the second a
column vector. Thus, we consider
u = (u1, . . . , un) ∈ R1,n,   v = (v1; · ; · ; vn) ∈ Rn,1.

Definition. The dot or scalar product of u and v, written u · v, is the number
u1 v1 + · · · + un vn.

(We shall not use the summation symbol much in this course, but students should be famil-
iar with its use.) The dot product provides the basis for multiplying matrices:
Definition. The product of two matrices A, B is only defined if the number of columns of A
equals the number of rows of B . If A ∈ Rm,n has rows r1 , . . . rm and B ∈ Rn,p has columns
c1 , . . . , cp then AB is the matrix with entries ri · cj and has size m×p.

More explicitly,
AB = (r1; · · ·; rm)(c1 | · · | cp) = (r1 · c1  · ·  r1 · cp; · · ·; rm · c1  · ·  rm · cp).
One should imagine taking each row of A, rotating it and placing it on top of each column
of B in turn so as to perform the dot product.
Example. A very special case is the product r1 c1 = (r1 · c1) of a single row and a single column.
Strictly speaking, this is a 1×1 matrix, but (again ignoring parentheses) we shall regard it as a
number, i.e. the dot product. With this convention, if v = (x, y, z) then
v vᵀ = (x, y, z)(x; y; z) = x² + y² + z².
Later, we shall refer to the square root of this quantity as the norm of the vector v (it is the
distance from the corresponding point to the origin). By contrast, note that
vᵀ v = (x² xy xz; yx y² yz; zx zy z²)
is a 3×3 matrix.
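The row-times-column versus column-times-row distinction is easy to see with NumPy (a small illustrative sketch; the sample vector is ours):

    import numpy as np

    # A row times a column gives a 1x1 matrix (the dot product);
    # a column times a row gives an n x n matrix.
    v_row = np.array([[1.0, 2.0, 3.0]])   # shape (1, 3)
    v_col = v_row.T                       # shape (3, 1)

    print(v_row @ v_col)   # [[14.]]  since 1*1 + 2*2 + 3*3 = 14
    print(v_col @ v_row)   # a 3x3 matrix with entries v_i * v_j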

An intermediate case of the matrix product is that in which the second factor is a single
column v = (x1, . . . , xn)ᵀ ∈ Rn,1, so that
Av = (r1 · v; · ; rm · v).
In general we can say that
AB = (Ac1 | · · | Acp).
This shows clearly that each column of the product is obtained by premultiplying the corre-
sponding column of B by A.
The rule for manipulating the sizes can be remembered by the scheme
(m×n) (n×p) → m×p,
and matrix multiplication defines a mapping
Rm,n × Rn,p → Rm,p.
Even if AB is defined, it will often be the case that BA is not. The situation is much more
symmetrical if m = n, and we investigate this next.
Notes 6 – Square matrices: determinants and inverses
(Part I)
The study of square matrices is particularly rich, since ones of the same size can be mul-
tiplied together repeatedly. This realization will lead us to construct inverse matrices and
define a number called the ‘determinant’ of a square matrix.

L6.1 Identity matrices. Recall that a matrix is said to be square if it has the same number of
rows and columns. So A ∈ Rm,n is square iff m = n.
Definition. A square matrix is diagonal if the only entries aij that are nonzero are those for
which i = j. These form the diagonal running from top left to bottom right. The n × n matrix A
for which
aij = 1 if i = j,   aij = 0 if i ≠ j,
is called the identity matrix of order n, and is denoted In.

It is easy to verify the following.

Proposition. If A ∈ Rm,n then Im A = A = A In.

Here is an example with m = 2 and n = 3:
(1 0; 0 1)(a11 a12 a13; a21 a22 a23) = (a11 a12 a13; a21 a22 a23) = (a11 a12 a13; a21 a22 a23)(1 0 0; 0 1 0; 0 0 1).

If A, B ∈ Rn,n then both AB and BA are defined and have size n×n. In general they are
unequal, so matrix multiplication is not commutative.
 
Exercise. Try A = (0 1; 0 0) and B = Aᵀ.

Even more surprisingly, AB could be the zero matrix even if both A and B are not the zero
matrix.

Exercise. Compute AB and BA for A = (1 1; 2 2) and B = (1 −1; −1 1).

Thus, matrix multiplication is radically different from the multiplication of real numbers.
We must be extremely careful when dealing with matrix multiplication!

L6.2 Powers of matrices. We can raise a square matrix to any positive power. For example
A2 simply means AA, and
A3 = AAA = A2 A = AA2 .
An important property of powers of a given matrix is that they commute with one another,
i.e. the order of multiplication does not matter (unlike for general pairs of matrices):

Am An = An Am,   m, n ∈ N.        (1)
By convention, for a matrix A ∈ Rn,n , we set A0 = In . We can try to define negative powers
using the inverse of a matrix, though this does not always exist. The situation for n = 2 is
described by the
 
Lemma. Let A = (a b; c d). Then there exists B ∈ R2,2 for which AB = I2 iff ad − bc ≠ 0. In
this case, the same matrix B satisfies BA = I2.

Proof. The recipe is well known:
B = (1 / (ad − bc)) (d −b; −c a).        (2)
Provided ad − bc ≠ 0 this matrix satisfies both the required equations. If ad − bc = 0 then
(d −b; −c a) A = 0        (3)
is the null matrix, and this precludes the existence of a B for which AB = I2; multiplying
(3) on the right by B would give that a, b, c, d are all zero, which is impossible. QED

If ad − bc ≠ 0, then (2) is called the inverse of A and denoted A−1. More generally, a square
matrix A ∈ Rn,n is said to be invertible or nonsingular if there exists a matrix A−1 such that
AA−1 = In or A−1A = In. In this case, it is a remarkable fact that there is only one inverse
matrix A−1 and it satisfies both equations.
Exercise. (i) If A is invertible, then so is Aᵀ, and (Aᵀ)−1 = (A−1)ᵀ.
(ii) If A, B are invertible then (AB)−1 = B−1 A−1.
(iii) If A is invertible and n ∈ N then (An)−1 = (A−1)n.
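The 2×2 recipe from the Lemma is easy to code directly (a minimal sketch in plain Python; the function name and the test matrix are ours):

    # Inverse of a 2x2 matrix (a b; c d) via the formula of the Lemma.
    def inverse_2x2(a, b, c, d):
        det = a * d - b * c
        if det == 0:
            raise ValueError("ad - bc = 0: the matrix is not invertible")
        return [[d / det, -b / det], [-c / det, a / det]]

    print(inverse_2x2(1, 2, 3, 4))   # [[-2.0, 1.0], [1.5, -0.5]]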

The inverse can be used to help solve equations involving square matrices. For example,
suppose that
AB = C,
where A is an invertible square matrix. Then

B = In B = (A−1 A)B = A−1 (AB) = A−1 C,

and we have solved for B in terms of C .


Example. Let A ∈ R2,2 . A direct calculation shows that

A2 − (a + d)A + (ad − bc)I2 = 0.

Assuming there exists A−1 such that AA−1 = I2 we obtain

A − (a + d)I2 + (ad − bc)A−1 = 0 ⇒ (ad − bc)A−1 = (a + d)I2 − A.

We get exactly the same expression for A−1 by assuming that A−1A = I2 .

L6.3 Determinants. The quantity ad − bc is called the determinant of the 2×2 matrix A. It
turns out that it is possible to associate to any square matrix A ∈ Rn,n a number called its
determinant, written det A or |A|. This number is a function of the components of A, and
satisfies
Theorem. det A ≠ 0 iff A is invertible.

We shall explain this result further on in the course, but here we give two ways of computing
the determinant when n = 3. Let
 
a11 a12 a13
A =  a21 a22 a23  .
a31 a32 a33

Then one copies down the first two columns to form the extended array
 
a11 a12 a13 a11 a12
 a21 a22 a23  a21 a22 .
a31 a32 a33 a31 a32

The formula of Sarrus asserts that the determinant of A is the sum of the products of entries
on the three downward diagonals (↘) minus those on the three upward diagonals (↗).
Equivalently,
det A = a11 · det(a22 a23; a32 a33) − a12 · det(a21 a23; a31 a33) + a13 · det(a21 a22; a31 a32).        (4)

The three mini-determinants are constructed from the last two rows of A.
Exercise. Use either of these formulae to prove the following properties for the determinant of
a 3×3 matrix A :
(i) if one row is multiplied by c so is det A,
(ii) det(cA) = c3 det A,
(iii) if two rows are swapped then det A changes sign,
(iv) if one rows is a multiple of another then det A = 0,
(v) det A = det(Aᵀ), so the above statements apply equally to columns.
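As a computational check of formula (4), here is a small Python/NumPy sketch (the helper names are ours); the 3×3 matrix chosen here is the one inverted in the example further below:

    import numpy as np

    # det of a 3x3 matrix via the cofactor expansion (4) along the first row.
    def det2(m):
        return m[0][0] * m[1][1] - m[0][1] * m[1][0]

    def det3(A):
        m11 = [[A[1][1], A[1][2]], [A[2][1], A[2][2]]]
        m12 = [[A[1][0], A[1][2]], [A[2][0], A[2][2]]]
        m13 = [[A[1][0], A[1][1]], [A[2][0], A[2][1]]]
        return A[0][0] * det2(m11) - A[0][1] * det2(m12) + A[0][2] * det2(m13)

    A = [[1, 1, 2], [3, 5, 8], [13, 21, 35]]
    print(det3(A), np.linalg.det(np.array(A)))   # 2 and 2.0 (up to rounding)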

Determinants of 3 × 3 matrices can be used to compute mixed products.

Proposition. For vectors
→u = ux →i + uy →j + uz →k,   →v = vx →i + vy →j + vz →k,   →w = wx →i + wy →j + wz →k,
we have
→u × →v · →w = det (ux uy uz; vx vy vz; wx wy wz).

Exercise. Prove the previous proposition by direct computation.

In order to explain where (4) comes from, let Aij ∈ R2,2 denote the matrix obtained from
A by deleting its ith row and jth column. Let Ã denote the matrix with entries
ãij = (−1)^(i+j) det(Aij).
In words, Ã is formed by replacing each entry of A by the determinant of its 'complementary
matrix', changing sign chess-board style:
+ − +
− + −
+ − +

Proposition. A Ãᵀ = (det A) In, so if det A ≠ 0 then A−1 = (1 / det A) Ãᵀ.
Proof. (Sketch) We will use properties of the mixed product. We let B = A Ãᵀ = (bij) and
we denote by →u, respectively →v, respectively →w, the vectors having as components the
first, respectively the second, respectively the third, row of A. Thus, for example, we have
b11 = →u · →v × →w = det(A)
and
b12 = →u · →u × →w = 0.

Exercise. Complete the proof of this proposition using the properties of the mixed product.

Note that the right-hand side of (4) is the top-left entry of A Ãᵀ. The matrix Ãᵀ is called
the adjoint or adjugate of A.
Example. To find the inverse of
A = (1 1 2; 3 5 8; 13 21 35),
we compute A Ãᵀ, where Ã = (7 −1 −2; 7 9 −8; −2 −2 2):
A Ãᵀ = (1 1 2; 3 5 8; 13 21 35)(7 7 −2; −1 9 −2; −2 −8 2) = 2 I3.
Thus det A = 2 and
A−1 = (7/2 7/2 −1; −1/2 9/2 −1; −1 −4 1).
One can also check that A−1A = I3.
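The chess-board construction of Ã can be automated; the following NumPy sketch (our own helper, not part of the notes) rebuilds the inverse of the example matrix from its adjugate:

    import numpy as np

    # Adjugate (transpose of the cofactor matrix) and the inverse A^{-1} = adj(A)/det(A).
    def adjugate(A):
        n = A.shape[0]
        C = np.zeros_like(A, dtype=float)
        for i in range(n):
            for j in range(n):
                minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
                C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
        return C.T

    A = np.array([[1.0, 1.0, 2.0], [3.0, 5.0, 8.0], [13.0, 21.0, 35.0]])
    A_inv = adjugate(A) / np.linalg.det(A)
    print(np.allclose(A_inv, np.linalg.inv(A)))   # True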
L6.4 Further exercises.

1. Compute the following products of matrices:
(6 0 4; 1 −1 1)(1 1; 0 4; −2 2)   and   (1 1; 0 4; −2 2)(6 0 4; 1 −1 1).

   
2. Let A = (1 2 3; 1 −1 2; 3 2 0) and B = (6 0 4; 1 0 0; 3 2 5). Compute the following products:
AB, BA, (BA)ᵀ, (AB)ᵀ, Aᵀ Bᵀ, Bᵀ Aᵀ.

3. Let A = (1 2 3; 0 6 9) and B = (a b; 0 c; 0 0).
Find a, b, c so that AB = I2. Does there exist a matrix C such that CA = I3?
   
4. The matrices A = (a b; c d) and E = (0 1; 0 0) satisfy the equations AE = EA, AEᵀ = EᵀA.
Deduce that A = a I2.

5. Give examples of matrices A, B, C ∈ R2,2 for which


(i) A2 = 0 and A 6= 0,
(ii) B 2 = B and B 6= I2 ,
(iii) C 2 + 2C + I2 = 0.

6. Find the inverses of the following matrices:
(1 2; 2 −1),   (1 2 3; 0 1 2; −1 4 0),   (1 2 1 1; 2 1 0 0; 0 1 1 3; 3 2 1 1).

For the last one, you will need to generalize the chess-board method.
   
7. Let A = (5 0 −1; 0 5 1; −1 1 4) and P = (1 1 −1; −1 1 1; 2 0 1). Compute the following matrices:
AP, P−1, D = P−1AP, PD, AD.


Notes 7 – Linear systems of equations
In this lecture, we shall introduce linear systems, interpret them in terms of matrices and
vectors, and define linear combinations of vectors.

L7.1 Introduction. Here is an example of a so-called linear system:
x1 + x2 + 2x3 + 3x4 = 3
5x1 + 8x2 + 13x3 + 21x4 = 7        (1)
34x1 + 55x2 + 89x3 + 144x4 = 4
This one consists of 3 equations in 4 unknowns. The problem is to determine all possible val-
ues of these unknowns x1 , x2 , x3 , x4 that solve all 3 equations simultaneously. Each equa-
tion might be expected to impose a single constraint, and since there are fewer equations
than unknowns we might guess that it is not possible to specify the unknowns uniquely.
However, without checking the numbers on the right, it is conceivable that the 3 equations
are inconsistent and that there are no solutions.
The situation is best illustrated with pairs of equations, each in 2 unknowns. Consider the
four separate systems
 
(a) x + 2y = 0, 3x + 4y = 0        (b) x + 2y = 7, 3x + 4y = 8
(c) x + 2y = 0, 2x + 4y = 0        (d) x + 2y = 7, 2x + 4y = 8        (2)

Those on the left are called homogeneous because the numbers on the right are all zero,
whereas (b) and (d) are inhomogeneous or nonhomogeneous. Any homogeneous system al-
ways has at least one solution, namely the one in which all the unknowns are assigned the
value 0; this is called the trivial solution.
It is easy to check that in cases (a) and (b) the two equations are independent and that there
is a unique solution of the system. For (a), it is the trivial solution x = 0 = y; for (b) it is
x = −6, y = 13/2, or expressed more neatly (x, y) = (−6, 13/2).
In (c) it is obvious that the second equation is completely redundant; it is merely twice the
first. In this case, we can assign any value to (say) y and then declare that x = −2y ; we
say that y is a free variable and that the general solution depends on one free parameter. In a
sense, the system (c) is ‘underdetermined’.
In (d), the two equations are incompatible; the first would imply that 2x + 4y = 14 and
we get 14 = 8. This means that there is no solution; the system is called inconsistent. By
contrast, homogeneous equations are always consistent.
To sum up, we can have no solutions, a unique solution (one and only one value for each
unknown) or infinitely many solutions. We shall see that the same is true for a linear system
of arbitrary size. With this knowledge, and without further examination, we can be confi-
dent that (1) has either infinitely many solutions or none at all; it cannot have, say, exactly four
solutions!
L7.2 Matrix form. Let us begin with an arbitrary linear system of the form
a11 x1 + . . . . . . + a1n xn = b1
a21 x1 + . . . . . . + a2n xn = b2
· · · · · ·
am1 x1 + . . . . . . + amn xn = bm
We shall always use
m to denote the number of equations, and
n to denote the number of unknowns or variables.
We now see that the notation is tailored to that of matrices; indeed the system can be rewrit-
ten in the succinct matrix form
Ax = b        (3)
where
A = (a11 · · a1n; · ·; am1 · · amn) ∈ Rm,n,   x = (x1; · ; xn) ∈ Rn,1,   b = (b1; · ; bm) ∈ Rm,1.
The system is homogeneous iff b = 0 is the null vector. A solution of the system is now
understood as meaning a column vector x of length n that satisfies (3). The problem is to
find all such vectors.
Observe that in the matrix form (3), the left-hand side of each equation is translated into a
row of A. We shall normally solve such a system by operating on the rows of A, but first we
show how the inverse matrix can sometimes be used.
Remark. Recall that a square matrix A of size m × m is invertible if and only if there exists B of
the same size such that
AB = Im = BA.

Example. Consider the linear system (3), and suppose that m = n and that A is invertible. This
means that we can find a matrix A−1 such that A−1A = In. Then
A−1(Ax) = A−1 b ⇒ (A−1A)x = A−1 b ⇒ x = A−1 b,
and the system is solved uniquely. Thus, a linear system with the same number of equations and
variables whose associated matrix is invertible has a unique solution. Applying this method to
the generic 2×2 system
ax + by = p
cx + dy = q
gives
(x; y) = (1 / (ad − bc)) (d −b; −c a)(p; q) = (1 / (ad − bc)) (dp − bq; −cp + aq).
The solution is neatly expressed as
x = det(p b; q d) / det(a b; c d),   y = det(a p; c q) / det(a b; c d).
It is a special case of Cramer's rule, whereby each unknown is obtained by substituting a column
of A by b, taking the determinant, and then dividing by det A.
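A small NumPy sketch of Cramer's rule in the 2×2 case (the function name and the test system are ours; it assumes det A ≠ 0):

    import numpy as np

    # Cramer's rule: replace one column of A by b, take determinants, divide by det A.
    def cramer_2x2(A, b):
        det = np.linalg.det(A)
        Ax = A.copy(); Ax[:, 0] = b      # substitute the first column
        Ay = A.copy(); Ay[:, 1] = b      # substitute the second column
        return np.linalg.det(Ax) / det, np.linalg.det(Ay) / det

    A = np.array([[1.0, 2.0], [3.0, 4.0]])   # system (b) above: x + 2y = 7, 3x + 4y = 8
    b = np.array([7.0, 8.0])
    print(cramer_2x2(A, b))                  # approximately (-6.0, 6.5)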
L7.3 Linear combinations. Let A be the matrix of left-hand coefficients defined by a linear
system. We can instead emphasize the role played by the columns c1 , . . . cn of A by rewriting
the system as

    x1 (a11 , . . . , am1 )⊤ + x2 (a12 , . . . , am2 )⊤ + · · · + xn (a1n , . . . , amn )⊤ = (b1 , . . . , bm )⊤ .
Equivalently,
x1 c1 + x2 c2 + · · · + xn cn = b. (4)
This is called the column vector form of the system. In this interpretation, the simultaneous
nature of the m equations translates into a relation between the column vectors of length m
involving the coefficients xi . For example

    −6 (1, 3)⊤ + (13/2) (2, 4)⊤ = (7, 8)⊤

is the solution of example (2)(b) in these terms.

This motivates the
Definition. Fix n, and let u1 , . . . , uk be finitely many vectors of length n (either all in R1,n
or all in Rn,1 ). A linear combination (LC) of these vectors is any vector of the form

a1 u1 + · · · + ak uk

with a1 , . . . , ak ∈ R. The set of all such linear combinations is written L {u1 , . . . , uk }.

Thus L {u1 , · · · , uk } is the set of vectors ‘generated’ by the ui . Often it is called their span
and written hu1 , . . . , uk i. It is an example of a subspace, something that we shall study in a
future lecture. It does not depend on the order in which the ui are written; it is a function
of the unordered set {u1 , . . . , uk }, which mathematicians usually write with curly brackets.
Solving a linear system then amounts to trying to express the given vector b as a LC of the
columns manufactured from the left-hand coefficients. A solution exists iff

b ∈ L {c1 , . . . , cn } .

Whilst the rows of A represent the equations, it is linear combinations of the columns that
characterize the solutions. In the study of linear systems, one is constantly torn between
favouring the rows of the associated coefficient matrix, or the columns.

Remark. Let v = (1, 2)⊤ , w = (2, 4)⊤ . Show that L {v, w} = L {v} and that (x, y)⊤ ∈ L {v} iff
y = 2x. The fact that (7, 8)⊤ is not in L {v} explains why the system (d) had no solution.

Example. Consider the row vectors i = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1). Then

L {i} = {(x, 0, 0) : x ∈ R},


L {i, j} = {(x, y, 0) : x, y ∈ R} = {(x, y, z) : z = 0}.

The last line shows a linear combination of vectors characterized by an equation, something we
shall see over and over again. Note also that L {i, 0} = L {i} and L {i, i+j} = L {i, j} .
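In computational terms (a hedged sketch in Python with NumPy, our own choice of tool), membership b ∈ L {c1 , . . . , cn } can be tested by checking that adjoining b does not raise the rank:

    # Sketch: b is a LC of c1,...,cn exactly when rank(A) = rank(A with b appended).
    import numpy as np

    def in_span(cols, b):
        A = np.column_stack(cols)
        return np.linalg.matrix_rank(np.column_stack([A, b])) == np.linalg.matrix_rank(A)

    v = np.array([1, 2])
    w = np.array([2, 4])
    print(in_span([v, w], np.array([7, 8])))   # False: system (d) is inconsistent
    print(in_span([v, w], np.array([3, 6])))   # True: (3, 6) = 3 v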
L7.4 Further exercises.

1. Determine which of the following homogeneous systems admit only the trivial solution:

    (i)   3x + y − z = 0,   x + y − 3z = 0,   x + y = 0;
    (ii)  −4x + 2y + z = 0,   3x − 5y + z = 0,   3x + y − 2z = 0;
    (iii) −2x1 + x2 + x3 = 0,   x1 − 2x2 + x3 = 0,   x1 + x2 − 2x3 = 0.

2. Find all the solutions of the linear systems

    (i)   3x + y − z = 0,   x + y − 3z = 1,   x + y = −1;
    (ii)  −4x + 2y + z = 0,   3x − 5y + z = 1,   3x + y − 2z = −1;
    (iii) −2x1 + x2 + x3 = 0,   x1 − 2x2 + x3 = 1,   x1 + x2 − 2x3 = −1.
3. Given the row vectors v1 = (a, b, c), v2 = (1, 1, 0), v3 = (0, 1, −1), w = (2, 3, −1),
consider the equation
x1 v1 + x2 v2 + x3 v3 = w. (5)
Determine whether there exist a, b, c ∈ R such that
(i) equation (5) has a unique solution (x1 , x2 , x3 ),
(ii) equation (5) has no solution,
(iii) equation (5) has infinitely many solutions.
Notes 8 – Row equivalence
For simplicity, we shall first study homogeneous systems of equations. The secret is to config-
ure the rows of the coefficient matrix A so as to (more or less) read off the solutions.

L8.1 Row operations. Consider a homogeneous linear system in matrix form

Ax = 0, with A ∈ Rm,n , x ∈ Rn,1 .

In this case, each equation is completely determined by the corresponding row of A, and
we can encode the equations by the m rows

    r1 , r2 , . . . , rm

of A. In this notation, with m = 4, the scheme

    r1 ,   r2 − 2r1 ,   r3 ,   3r4

represents an equivalent system of equations; we have merely subtracted twice the first
from the second and multiplied the last by 3. These changes will not affect the values of
any solution (x1 , . . . , xn ). We are also at liberty to change the order in which we list the
equations.
Our aim is to use such changes to simplify the system.
Definition. Let A be a matrix of size m×n. An elementary row operation (ERO) is one of
the following ways in which a new matrix of the same size is formed from A:
(i) add to a given row a multiple of a different row,
(ii) multiply a given row by a nonzero constant,
(iii) swap or interchange two rows.

In symbols, we can denote the operations that we have just described by


(i) ri ↦ ri + arj , i ≠ j ,
(ii) ri ↦ cri , c ≠ 0,
(iii) ri ↔ rj .
In practice, it is often convenient to take a to be negative; in particular (i) includes the act of
subtracting one row from another: ri ↦ ri − rj (but it is essential that i ≠ j , otherwise we
would effectively have eliminated one of the equations).
ERO’s will play an essential role in the development of our course. Thus, we want to stress
the deep link between them and matrix theory, and in particular with matrix multiplication:
whenever we perform an ERO on a matrix A of size m×n we are actually computing the
matrix product EA where E is a matrix of size m×m called elementary matrix.
Exercise. Let A be a matrix of size 2 × n . Prove that each one of the elementary operations
above is equivalent to the matrix multiplication EA where E is one of the following elementary
matrices:

    (i)    1 a        1 0
           0 1   or   a 1 ,

    (ii)   a 0        1 0
           0 1   or   0 a ,

    (iii)  0 1
           1 0 .

In general, elementary matrices are constructed by applying the corresponding ERO to the
identity matrix of the proper size.
Definition. An n×n matrix E is called an elementary matrix if E is obtained from In by applying
one of the ERO's.

It is easy to see that:


Proposition. Let E be the n×n elementary matrix obtained by applying a given ERO to In , and let
A be an n×n matrix. Then the matrix EA is obtained from A by applying the same
ERO.
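A quick numerical illustration of this Proposition (a hedged sketch in Python with NumPy, not part of the original notes):

    # Sketch: applying an ERO to A equals left-multiplying A by the elementary
    # matrix E obtained by applying that same ERO to the identity.
    import numpy as np

    A = np.array([[1, 1, 2, 3],
                  [5, 8, 13, 21],
                  [34, 55, 89, 144]])

    E = np.eye(3)
    E[1] -= 5 * E[0]          # the ERO r2 -> r2 - 5 r1, applied to I_3

    print(E @ A)              # row 2 becomes (0, 3, 3, 6); rows 1 and 3 are unchanged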

Since ERO's are invertible operations, elementary matrices are invertible matrices.
Proposition. If E is an elementary matrix, then E is invertible.

Proof. Assume that E is obtained by applying an ERO of type (iii); if not, a similar argument
applies. In other words, we assume that E is obtained by applying the swap ri ↔ rj . Let F be the
elementary matrix obtained by applying the same swap. Thus, for each choice of A, EA is obtained by
swapping row i and row j . Swapping them back gives F (EA) = A for all choices of A, and hence F E = In ;
similarly we can prove that EF = In . Hence we have proved that F is the inverse of E .
QED

We will often use the following basic fact about invertible matrices:
Proposition. If A and B are invertible n × n matrices, then AB is invertible and

(AB)−1 = B −1 A−1 .

Exercise. Prove the Proposition.

L8.2 Solving a homogeneous system. Let us show how ERO’s can be used to solve the
linear system 
 x1 + x2 + 2x3 + 3x4 = 0
5x1 + 8x2 + 13x3 + 21x4 = 0 (1)
34x1 + 55x2 + 89x3 + 144x4 = 0

written before.

We shall apply ERO’s to convert A into a matrix that is roughly triangular, and then solve
the resulting system.

    A = ( 1  1  2   3  )
        ( 5  8  13  21 )
        ( 34 55 89 144 )

    r2 − 5r1    ∼  ( 1  1  2   3  )
                   ( 0  3  3   6  )
                   ( 34 55 89 144 )

    (1/3) r2    ∼  ( 1  1  2   3  )
                   ( 0  1  1   2  )
                   ( 34 55 89 144 )

    r3 − 34r1   ∼  ( 1  1  2   3  )
                   ( 0  1  1   2  )
                   ( 0 21 21  42  )

    (1/21) r3   ∼  ( 1  1  2   3  )
                   ( 0  1  1   2  )
                   ( 0  1  1   2  )

    r3 − r2     ∼  ( 1  1  2   3  )
                   ( 0  1  1   2  )
                   ( 0  0  0   0  )

On the left, we jot down (in abbreviated form) the operations used. It is not essential to do
this, provided the operations are carried out one at a time; errors occur when one tries to be
too ambitious! It follows from the last matrix that (1) has the same solutions as the system

x1 + x2 + 2x3 + 3x4 = 0
x2 + x3 + 2x4 = 0

But one can see at a glance how to solve this; we can assign any values to x3 and x4 which
will then determine x2 (from the second equation) and then x1 (from the first). Suppose
that we set x3 = s and x4 = t (it is a good idea to use different letters to indicate free
variables); then

x2 = −s − 2t ⇒ x1 = −(−s − 2t) − 2s − 3t = −s − t. (2)

The general solution is

(x1 , x2 , x3 , x4 ) = (−s − t, −s − 2t, s, t),

or in column form

    x = ( −s − t , −s − 2t , s , t )⊤ = s ( −1, −1, 1, 0 )⊤ + t ( −1, −2, 0, 1 )⊤ ,

where s, t are arbitrary. The set of solutions is therefore

    L {u, v} ,   where   u = ( −1, −1, 1, 0 )⊤ ,   v = ( −1, −2, 0, 1 )⊤ .
We shall see that the solution set of any homogeneous system is always a linear combination
of this type.
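For comparison (a hedged sketch in Python with SymPy, our choice and not part of the notes), a computer algebra system returns the same two generating solutions:

    # Sketch: SymPy's nullspace() returns a basis of {x : A x = 0}; for this A
    # it gives the vectors u and v found above.
    import sympy as sp

    A = sp.Matrix([[1, 1, 2, 3],
                   [5, 8, 13, 21],
                   [34, 55, 89, 144]])

    for w in A.nullspace():
        print(w.T, (A * w).T)    # each basis vector, and the check A w = 0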
Exercise. Can you solve the system using the matrix form and elementary matrices?

Exercise. Compute u − v , and explain why this is also a solution.

L8.3 Equivalence relations. Applying ERO’s produces a natural relation on the set of ma-
trices of any fixed size.
Definition. A relationship ∼ between elements of a set is called an equivalence relation if
(E1) A ∼ A is always true,
(E2) A ∼ B always implies B ∼ A,
(E3) A ∼ B and B ∼ C always implies A ∼ C .

Observe that these three conditions are satisfied by equality =. On the set of real numbers,
‘having the same absolute value’ is an equivalence relation, but 6 is not. But we are more
interested in sets of matrices.
Definition. From now on, we write A ∼ B to mean that B is a matrix obtained by applying
one or more ERO’s in succession to A.

Note that A ∼ B is equivalent to the equality A = EB where the matrix E is a product of
matrices E = Er . . . E1 and each matrix Ei is an elementary matrix.
Proposition. This does define an equivalence relation, and if A ∼ B we say that A and B
are row equivalent.

Proof. By definition A ∼ B and B ∼ C imply that A ∼ C . Obviously A ∼ A (take (i) with
a = 0 or (ii) with c = 1, both of which are permitted).
Exercise. Prove the proposition using elementary matrices.

The condition (E2) is less obvious. But each of the three operations is invertible; it can be
'undone' by the same type of operation. For example, r1 ↦ r1 − ar2 is undone by r1 ↦ r1 + ar2 . So if
A ∼ B then we can undo each ERO in the succession one at a time, and B ∼ A. QED

Example. If B is a matrix which has the same rows as A but in a different order, then A ∼ B .
This is because any permutation of the rows can be obtained by a succession of transpositions,
i.e. ERO's of type (iii). For example, let

    A = ( 1 2 3 )      B = ( 4 5 6 )      C = ( 4 5 6 )
        ( 4 5 6 ) ,        ( 1 2 3 ) ,        ( 7 8 9 ) .
        ( 7 8 9 )          ( 7 8 9 )          ( 1 2 3 )

Then
B is obtained from A by r1 ↔ r2 ; C is obtained from B by r2 ↔ r3 ,
so A ∼ C even though it is not possible to pass from A to C by a single ERO. Of course, it is
also true that A ∼ B , B ∼ C , and C ∼ A.

Exercise. In this example, can C be obtained from A by a succession of ERO’s of type (i)?

The theory of this lecture can be summarized by the


Proposition. Suppose that A ∼ B . Then the column vector x is a solution of Ax = 0 iff it
is a solution of Bx = 0.

Proof. Using elementary matrices we know that B = EA where E is a product of elemen-


tary matrices E = Er . . . E1 . Since elementary matrices are invertible also E is invertible
and hence Bx = 0 iff EAx = 0 iff Ax = E −1 0 = 0 and the result follows.
In other words, the homogeneous linear systems associated to row equivalent matrices have
the same solutions. This follows because each individual ERO leaves unchanged the solu-
tion space.
L8.4 Further exercises.

1. Use ERO’s to find the general solution of the linear system




 2x − 2y + z + 4t = 0
x − y − 4z + 2t = 0


 −x + y + 3z − 2t = 0
3x − 3y + z + 6t = 0.

2. Show that the matrix  


1 −2 1 2
A = 1 3 1 −3 
1 8 1 −8
is row equivalent to one with one row null.

3. Recall that a square matrix P ∈ Rn,n is called invertible if there exists a matrix P −1 of
the same size such that P −1P = In = P P −1 . Two matrices A, B are said to be similar if
there exists an invertible matrix P such that P −1AP = B . Prove that being similar is an
equivalence relation on the set Rn,n . (Hint: you will need the fact that (P −1 )−1 = P .)
Notes 9 – Reduced matrices and their rank
Solving a linear system by applying suitable ERO’s is called Gaussian elimination. In this
lecture we shall describe it more carefully, and present some variations. First we need to
know exactly what type of matrices we want to achieve by operating on the rows.

L9.1 Echelon forms. The idea is to convert a matrix into a steplike sequence of rows like an
arrangement of toy soldiers ready for battle. We shall make this precise using the
Definition. An entry of a matrix is called a marker if it is the first nonzero entry of a row
(starting from the left).

Consider the condition


(M1) there is at most (meaning not more than) one marker in each column.
This is important for two reasons. First, because it turns out that the equations defined by
the rows of a matrix satisfying (M1) are independent in a sense that we shall make precise
in a moment. Second, because of the following result that becomes self-evident in the light
of further practice:
Proposition. Given any matrix A, one may apply a sequence of ERO’s of type (i) so as to
obtain a matrix B satisfying (M1).

In solving the system, the order of the rows is immaterial. It is common practice to ‘tidy up’
by applying ERO’s of type (iii) to permute the rows so that
(M2) moving down the rows from the top, the markers move from left to right, and
(M3) all the null rows are at the bottom.
It is a consequence of (M1) and (M2) that all the entries (in the same column) underneath a
given marker are zero.
Definition. A matrix satisfying (M1)–(M3) is called step-reduced.

Warning. This is sometimes referred to as ‘row echelon form’, but is a stronger notion than
‘reduced’ in the sense of the Greco-Valabrega text. We prefer to work with step-reduced
matrices as they are easy to spot visually: the main null part has an approximately triangular
form, and the markers represent ‘corner soldiers’.
Once a matrix is step-reduced, its markers take on a greater significance and are also called
pivots; often we shall box them. We shall call a column marked if it contains a marker, and
otherwise unmarked. It is sometimes convenient to suppose in addition that
(M4) each marker equals 1.
This can be quickly achieved using ERO’s of type (ii), but may have the adverse effect of
introducing fractions elsewhere. A more stringent condition is that
(M5) each marker is the only nonzero entry in its column (above as well as below).
Definition. A matrix satisfying (M1)–(M5) is said to be super-reduced.

A step-reduced matrix can be made super-reduced by a process called backwards reduction.


Start with the last (bottom right) marker and subtract multiples of its row to remove all the
entries above it. Then do the same with the second-to-last marker and so forth. (The word
‘super’ is a pun since it means both ‘above’ and ‘extra’; some authors use the terminology
‘reduced row echelon form’ or RREF.)
Example. Consider again

    A = ( 1  1  2   3  )
        ( 5  8  13  21 ) .
        ( 34 55 89 144 )

We have already step-reduced it to

    B = ( 1 1 2 3 )
        ( 0 1 1 2 ) .
        ( 0 0 0 0 )

To super-reduce B , we merely perform the operation r1 ↦ r1 − r2 so as to obtain

    C = ( 1 0 1 1 )
        ( 0 1 1 2 ) .
        ( 0 0 0 0 )

When a matrix is super-reduced, it is possible to read off immediately the solution to the original
system. In the above example, we get the equations

    x1 + x3 + x4 = 0
    x2 + x3 + 2x4 = 0,

whence
x1 = −s − t, x2 = −s − 2t,
slightly more effectively than before.
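The same super-reduction can be reproduced mechanically (a sketch with Python/SymPy, assumed available and not part of the original notes):

    # Sketch: rref() returns the super-reduced (reduced row echelon) form of A,
    # together with the indices of the marked columns.
    import sympy as sp

    A = sp.Matrix([[1, 1, 2, 3],
                   [5, 8, 13, 21],
                   [34, 55, 89, 144]])

    C, pivots = A.rref()
    print(C)         # rows (1, 0, 1, 1), (0, 1, 1, 2), (0, 0, 0, 0), as above
    print(pivots)    # (0, 1): the markers lie in the first two columns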

L9.2 Linear independence. Converting a matrix to row echelon form has the effect of mak-
ing the transformed equations ‘independent’. We formulate this notion.
Definition. Let {u1 , . . . , uk } be a finite subset of Rn (meaning either in R1,n or Rn,1 ). The
set is called linearly independent (LI) if the equation

x1 u 1 + · · · + xk u k = 0 (1)

admits only the trivial solution x1 = · · · = xk = 0.

One often says ‘u1 , · · · , uk are linearly independent’, though strictly speaking being LI is
a property of a set or list and not its individual elements. The order of the elements is
immaterial, and any duplication prevents the list from being LI.
A singleton set {v} is LI iff v ≠ 0, and no set that contains 0 can be LI. A set {u, v} is LI iff
neither vector is a multiple (including zero times) of the other. More generally,
Lemma. A set {u1 , . . . , uk } is LI iff no one element in it can be expressed as a linear combi-
nation of the others.

Proof. Suppose that the set is LI, but that one of the elements is a LC of the others. For sake
of argument, suppose that uk ∈ L {u1 , · · · , uk−1 }, so that uk = a1 u1 + · · · + ak−1 uk−1 for
some ai ∈ R. But then
a1 u1 + · · · + ak−1 uk−1 + (−1)uk = 0,
contradicting (1). The converse is similar. QED
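In practice linear independence is easy to test by reduction (a hedged Python/SymPy sketch on hypothetical example vectors of our own choosing):

    # Sketch: u1,...,uk are LI iff the matrix with rows u1,...,uk has rank k,
    # i.e. no row is lost when the matrix is step-reduced.
    import sympy as sp

    u1 = [1, 0, 2]
    u2 = [0, 1, 1]
    u3 = [1, 1, 3]          # u3 = u1 + u2, so the set is not LI

    M = sp.Matrix([u1, u2, u3])
    print(M.rank(), M.rank() == M.rows)    # prints: 2 False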
Proposition. Let B be a step-reduced matrix. Then its nonzero rows are LI.

Proof. We refer to the example

    a1   ( 0 2 3 0 4 6 )
    a2   ( 0 0 7 3 8 1 )
    a3   ( 0 0 0 0 2 4 )   = B          (2)
    a4   ( 0 0 0 0 0 0 )

assuming a linear relation


a1 r1 + · · · + a4 r4 = 0
between the nonzero rows. Perform the addition column by column, high school fashion.
The marker of r1 is the only nonzero entry in its column, so 2a1 = 0. Passing to the next
marker, 3a1 + 7a2 = 0, so a2 = 0 and so on. QED

Exercise. The same conclusion holds if B satisfies just the first condition (M1), namely that
there is at most one marker in each column. Indeed, once B satisfies (M1) we can make it
step-reduced just by changing the order of the rows.

The columns of the matrix in (2) are not LI. Even if we forget about c1 and c4 , we have

    c6 = 2c5 − (15/7) c3 − k c2 ,

for some k ∈ R. This illustrates the


Proposition. Let B be a step-reduced matrix. A column cj of B is unmarked iff it is a LC
of the previous marked columns (or null if j = 1).

The point is that we can always express an unmarked column as a LC of the previous
marked ones by finding the coefficients one at a time starting from the bottom.
Corollary. If B is step-reduced then its marked columns are LI.

Thus, the markers of a step-reduced matrix ‘mark out’ an independent set of both rows and
columns. Whilst there may be unmarked columns in any position, row reduction ensures
that all the unmarked rows are null. If every column of a step-reduced matrix C is marked,
then the set of columns is LI and (from the column vector form of the system) Cx = 0 has
only the trivial solution. Here is an example that makes this clear:

        ( 1 2 3 1 )            x + 2y + 3z + t = 0
        ( 0 1 2 1 )                y + 2z + t = 0
    C = ( 0 0 1 1 ) ,                  z + t = 0          (3)
        ( 0 0 0 1 )                        t = 0.
        ( 0 0 0 0 )

L9.3 The rank. This is defined to measure the number of independent rows or columns of
any matrix (equivalently, the number of pivots), in a way we shall make precise. We first restrict to reduced matrices.
Definition. Let B be step-reduced. Its rank is the number of markers, or equivalently
nonzero rows. This number is denoted by rank B , rk B or r(B).
Example. The matrices in (2) and (3) have rank 3 and 4 respectively.

If B has size m×n then obviously r(B) ≤ min{m, n}. If r(B) ≠ min{m, n}, we can think
of B as 'defective'. If r(B) does achieve this value we say that B has full rank or that B has
maximal rank.
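For instance (a small Python/SymPy sketch, not part of the notes), the rank of the matrix in (2) can be confirmed directly:

    # Sketch: the matrix B of (2) has three markers, hence rank 3.
    import sympy as sp

    B = sp.Matrix([[0, 2, 3, 0, 4, 6],
                   [0, 0, 7, 3, 8, 1],
                   [0, 0, 0, 0, 2, 4],
                   [0, 0, 0, 0, 0, 0]])
    print(B.rank())    # 3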
The importance of the rank derives from the
Theorem. Suppose B and C are both step-reduced and that B ∼ C . Then the markers of
B and C occur in exactly the same positions. In particular, r(B) = r(C).

This enables us to define the rank of an arbitrary matrix A to be the rank of any reduced
matrix B row equivalent to A. For if A ∼ B and A ∼ C with B and C step-reduced then
B∼A∼C ⇒ B ∼ C,
and their ranks are equal.
Proof. First we prove that the rank must be equal and then we deal with the position of the
markers.
We give a proof by contradiction. Assume that r(B) < r(C), that is assume that B has
more zero rows than C . Since B ∼ C there exists a matrix E such that B = EC where
E is a product of elementary matrices. In particular, by the definition of the row-column
product of matrices, each row of B is a LC of rows of C . Just by leaving the zero rows of
C untouched we produce zero rows of B . However, at least one zero row is unaccounted for, since
r(B) < r(C). Thus, this zero row is a LC of the non-zero rows of C , but this is a
contradiction since the non-zero rows of C are LI, C being step-reduced. This proves that
r(B) = r(C).
To prove that the markers of B and C are in the same position we can proceed by induction
on r = r(B) = r(C). Since the argument is very straightforward, we just show that the
marker of the first row of B and the marker of the first row of C are in the same column.
Say that the marker of the first row of B is in column i and the marker of the first row of
C is in column j . Since B = EC and C is step-reduced, all the columns of C before the j -th are zero
and thus all the columns of B before the j -th are zero. Thus, i ≥ j . Similarly, the only non-zero
element of the j -th column of C is the marker, and thus B has a non-zero element in column
j . Hence, j ≥ i and we conclude j = i.
QED

It is not hard to deduce an even stronger result of theoretical importance, namely


Corollary. Any matrix is row equivalent to a unique super-reduced one.

L9.4 Further exercises.

1. Use ERO's to reduce the matrices

    ( 1 2 )     ( 8 10 12 )     ( 4 2 1 0 )     ( 1 1 1  1 )      ( 1  2  3  4 )
    ( 3 4 ) ,   ( 1  1  1 ) ,   ( 3 7 0 5 )     ( 1 2 3  4 )      ( 5  6  7  8 )
                ( 9  7  6 )     ( 0 8 4 2 ) ,   ( 1 4 9 16 ) ,    ( 9 10 11 12 ) ,
                                ( 0 1 1 0 )     ( 1 8 27 64 )     ( 13 14 15 16 )

and find their ranks.


! # ! #
0 −5 1 2 1
2. Given A = " −1 −1 0 $, B= "1 −1 $ , decide which of the following are valid:
0 −5 −3 2 4

r(A) = r(B), r(A) = r(AB), r(B) = r(AB).

3. Find matrices A, B ∈ R3,3 for which r(A) > r(A + B) > r(B).

4. Let

    A = ( 1 a 0 0 )
        ( 0 1 3 0 ) .
        ( 2 0 0 a )

Find the value of r(A) as a ∈ R varies.
Notes 10 – Solving a general system
Having introduced the rank of an arbitrary matrix, we are in a position to formulate the
celebrated results of E. Rouché (1832–1910) and A. Capelli (1855–1910) concerning solutions
of a (generally inhomogeneous) linear system of equations. From now on, we denote the rank
of a matrix M by r(M ).

L10.1 The augmented matrix. Let us return to the inhomogeneous system in matrix form

Ax = b , A ∈ Rm,n , x ∈ Rn,1 , b ∈ Rm,1 .

We associate with this the so-called augmented matrix

    (A | b) = ( a11 · · · a1n | b1 )
              (  ·  · · ·  ·  |  · )          (1)
              ( am1 · · · amn | bm ) .

The vertical bar reminds us that the last column is special, but in applying row operations
it should be ignored so that b is just treated as an extra column added to A. We solve the
system by applying ERO’s to (1) so that it becomes a step-reduced matrix

(A′ | b′ ). (2)

Warning: in doing this it is essential to apply each ERO to the whole row, including bi ; thus
the last column will usually change unless it was already null.
Note that if (2) is step-reduced, so is the left-hand matrix A′ . By definition then,

r(A) = r(A′ ), r(A | b) = r(A′ | b′ ),

where we are relying on the previous Theorem. Furthermore,

r(A′ ) ≤ r(A′ | b′ ) ≤ r(A′ ) + 1,

and we consider the two possibilities in turn. First, we record the


Lemma. The m rows of a matrix A ∈ Rm,n are LI if and only if r(A) = m. In particular,
since r(A) ≤ n, no set of LI vectors in Rn has more than n elements.

Proof (of ‘only if’). Suppose by contradiction that the rows of A are LI but that r(A) < m.
Thus A ∼ B , B step-reduced with the last row equal zero. Since each row of B is a LC of
rows of A we get a contradiction. QED

Think of LI rows as ‘incompressible’: when the matrix is reduced they are not diminished
in number.
Exercise. (i) The columns of a matrix A ∈ Rm,n are LI iff the matrix equation Ax = 0 admits only the
trivial solution x = 0 ∈ Rn,1 .
(ii) The rows of a matrix are LI iff the equation xA = 0 only has the trivial solution x = 0 ∈ R1,m .
(iii) The rows of a matrix are LI iff the equation A⊤ x = 0 only has the trivial solution x = 0 ∈
Rm,1 .
L10.2 Inconsistent systems. Given a linear system, the student-friendly situation is that in
which there are no solutions, as one does not have to waste time finding them!
Proposition. (RC1) If r(A) < r(A | b), the system has no solutions.

This case can only occur if r(A) is less than the number m of its rows, since otherwise both
matrices will have rank m.
Proof. Let r = r(A) < r(A | b). Then the first null row of A′ is the (r + 1)st and will be
followed by br+1 ∕= 0 in the step-reduced matrix (2). This row represents a contradictory
equation
0x1 + · · · + 0xn = br+1 ,
and the only way out is that the xi do not exist.

L10.3 Counting parameters. The consistent case is characterized by the


Proposition. (RC2) If r(A) = r(A | b), there exist solutions depending upon n − r parame-
ters where r is the common rank. If n = r there is a unique solution.

Of course, if the system is homogeneous we must be in this case since A and (A | 0) only
differ by a null column.
Proof. Each column of A and A′ corresponds to a variable, so we can speak of ‘marked’ and
‘unmarked’ variables. It is easier (but not essential) to assume that A′ is super-reduced, in
which case its ith row has the form
(0 · · · 0 1 ? . . . ? | ci ),
and represents an equation
marked variable + LC of unmarked variables = ci .
It follows that we can assign the unmarked variables arbitrarily and solve uniquely for each
of the marked variables in terms of them. QED

In the light of the procedure above, the unmarked variables are called free variables, and in
the solution it is good practice to give them new names such as s, t, u · · · or t1 , t2 , t3 . . .. The
conclusion is traditionally expressed by the statement
‘If r(A) = r(A | b) = r then the linear system has ∞^(n−r) solutions’.
This is a useful way of recording the result that can be understood as follows. The ac-
tual number m of equations is irrelevant; what is important is the number of LI or effective
equations, and this is the rank r . Each effective equation allows us to express one of the n
variables in terms of the others, so we end up with n − r free variables or parameters.
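The two cases (RC1) and (RC2) are easy to check by machine (a hedged sketch with Python/SymPy, using system (d) from an earlier lecture as the example):

    # Sketch: compare r(A) with r(A | b). Unequal ranks mean no solutions;
    # equal ranks r mean solutions depending on n - r parameters.
    import sympy as sp

    A = sp.Matrix([[1, 2], [2, 4]])
    b = sp.Matrix([7, 8])

    rA, rAb, n = A.rank(), A.row_join(b).rank(), A.cols
    if rA < rAb:
        print("inconsistent")                              # printed here, since 1 < 2
    else:
        print("consistent,", n - rA, "free parameter(s)")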

L10.4 Inversion by reduction.


Having introduced the augmented matrix, we can apply similar techniques to solve matrix
equations of the type AX = B where X and B are matrices rather than just column vectors.
A special case is
AX = In , A, X ∈ Rn,n ,
whose solution X (if it exists) is necessarily A−1 . As a consequence,
Proposition. If A ∈ Rn,n is invertible then the unique super-reduced form of (A | In ) is
(In | A−1 ).

Proof. We use elementary matrices to perform Gaussian elimination. Namely, to super-


reduce the matrix (A | In ) we have to compute the product

E(A | In ) = (EA | E)

where E is a product of suitable invertible matrices super-reducing A, that is EA = In and


thus E = A−1 . The result is now proved.
Here is an example:

    (A | I3 ) = ( 1  1  2 | 1 0 0 )      ( 1 1 2 |  1 0 0 )      ( 1 1 2 |   1 0 0 )
                ( 3  5  8 | 0 1 0 )  ∼   ( 0 2 2 | −3 1 0 )  ∼   ( 0 2 2 |  −3 1 0 )
                ( 13 21 35 | 0 0 1 )     ( 13 21 35 | 0 0 1 )    ( 0 8 9 | −13 0 1 )

      ( 1 1 2 |  1  0 0 )      ( 1 1 2 |  1 0  0 )      ( 1 1 0 |  3 8 −2 )
    ∼ ( 0 2 2 | −3  1 0 )  ∼   ( 0 2 0 | −1 9 −2 )  ∼   ( 0 2 0 | −1 9 −2 )
      ( 0 0 1 | −1 −4 1 )      ( 0 0 1 | −1 −4 1 )      ( 0 0 1 | −1 −4 1 )

      ( 1 1 0 |    3   8 −2 )      ( 1 0 0 |  7/2 7/2 −1 )
    ∼ ( 0 1 0 | −1/2 9/2 −1 )  ∼   ( 0 1 0 | −1/2 9/2 −1 ) ,
      ( 0 0 1 |   −1  −4  1 )      ( 0 0 1 |  −1  −4   1 )

confirming the inverse found in L2. The matrices on the right act as a ‘book-keeping’ of the
ERO’s which there is no need for us to record separately.
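The same computation can be delegated to a computer algebra system (a sketch with Python/SymPy, assumed available and not part of the notes):

    # Sketch: super-reducing (A | I_3) leaves A^{-1} in the right-hand block.
    import sympy as sp

    A = sp.Matrix([[1, 1, 2],
                   [3, 5, 8],
                   [13, 21, 35]])

    R, _ = (A.row_join(sp.eye(3))).rref()
    A_inv = R[:, 3:]                      # columns 4-6 of the reduced matrix

    print(A_inv)                          # matches the inverse computed above
    print(A * A_inv == sp.eye(3))         # True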

L10.5 Further exercises.

1. Consider the linear systems

    (i)   −x1 − 2x2 + x3 = 0,   x1 + 3x2 − x3 = a;
    (ii)  x1 − 2x2 = 1,   2x1 + x2 = 2,   x1 − x2 = b;
    (iii) −x1 − 2x2 + x3 − 2x4 = 0,   2x1 + 3x2 − 3x3 + 3x4 = 1,   x2 + x3 + x4 = c.
Is it true that for every a ∈ R, (i) has infinitely many solutions?


Is it true that (ii) never has a solution irrespective of the value of b?
Is it true that for all c ∈ R, (iii) has a solution?

2. Given the system

    x1 + x2 + x3 = k
    x1 − kx2 + x3 = −1
    −x1 + kx2 + x3 = k,

find all solutions in the case that k = −1. Then discuss the existence of solutions as k varies.
3. Find a relation between h1 , h2 , h3 in order that

    x1 − 2x2 + x3 + 2x4 = h1
    x1 + 3x2 + x3 − 3x4 = h2
    2x1 + x2 + 2x3 − x4 = h3

has a solution. Find the general solution when h1 = −1, h2 = 4, h3 = 3.


4. Use ERO's to reduce the matrix

    A = ( 2 5/2 3 )
        ( 4  5  a )
        ( b  b  b )

with a, b ∈ R. One of the following statements is false. Which?
    (i) If b = 0 then r(A) ≤ 2,
    (ii) A is invertible if a = b = 1,
    (iii) A is invertible if a ≠ 6,
    (iv) r(A) ≥ 1 for any a, b ∈ R.

5. Find values of t ∈ R for which each of the following matrices is not invertible:

    ( 1 −t )      ( 3−t −2 )      ( −t  −2 −3 )      ( −t  3 −3 −6 )
    ( t  4 ) ,    ( −5  −t ) ,    (  0 1−t  1 ) ,    (  0 −t  0  0 )
                                  (  1   2 −t )      (  1  1 −t  0 ) .
                                                     (  1  0  0 −t )

6. Find the values of λ ∈ R for which

    A = ( λ  −1  1 )
        ( 0 1/2  1 )
        ( 0   λ  1 )

is invertible. Now set λ = 1, and solve the matrix equation AX = B where X ∈ R3,2 and

    B = ( 2 1 )
        ( 0 1 ) .
        ( 2 0 )
7. Given

    A = (  1   2   3  )         ( 4 )
        (  4   5   6  ) ,   b = ( 8 ) ,
        ( 1/2  1  3/2 )         ( 2 )

verify that the equation AX = b has an infinite number of solutions and determine the number of free parameters.
8. Given

    A = (  2 −1 )
        ( −4  2 ) ,

which of the following equations admit at least one solution?

    A (x, y)⊤ = (0, 0)⊤ ,    A (x, y)⊤ = (1, 0)⊤ ,    A (x, y)⊤ = (1, −2)⊤ ,    A^2 (x, y)⊤ = (1, −2)⊤ .

9. Given the matrices

    A = ( 2 −3 −2  1 )         ( x1 )
        ( 4 −6  1 −2 ) ;   X = ( x2 ) ;
        ( 6 −9 −1 −1 )         ( x3 )
                               ( x4 )

    B1 = ( 1, 2, 3 )⊤ ,   B2 = ( 1, 2, 0 )⊤ ,   B3 = ( 0, 0, 0 )⊤ ,

determine the solutions of the matrix equations AX = B1 , AX = B2 , AX = B3 .


Notes 11 – Rank, inverse, and determinant (part II)
We will see how the rank of a matrix relates to the existence of the inverse. Then we will see
how to define, and compute, the determinant of any square matrix. Finally, we will prove
the famous result: A is invertible iff det(A) ∕= 0 (that is A is non-singular).

L11.1 Rank and inverse. Using row reduction to super-reduce the matrix (A|In ) we com-
puted the inverse of the n×n matrix A in the case the inverse exists. What happens when
we row reduce (A|In ) if A is not invertible? In this case we cannot super-reduce the matrix
and it turns out that the whole situation is better explained using the notion of rank.
Theorem. If A is a n×n matrix, then A is invertible iff r(A) = n.

Proof. If A is invertible we know that we can super-reduce (A|In ) and obtain (In |A−1 ).
In particular, we can super-reduce A and obtain In . Thus, we have that A ∼ In and
hence r(A) = r(In ) = n. Conversely, if r(A) = n we can row reduce A and obtain an n×n
row reduced matrix A′ . Note that all the diagonal elements of A′ must be markers, so that
they must be non-zero, while all the elements below the diagonal must be zero. We
can now use the non-zero diagonal elements to super-reduce A′ and obtain In . Thus we
can super-reduce (A|In ) and obtain (In |A−1 ). Hence A is invertible.
In the previous argument we met matrices of a special kind, called triangular matrices. A
square matrix T = (tij ) is called upper triangular if tij = 0 whenever i > j ; the matrix is
called lower triangular if tij = 0 whenever i < j . Note that upper/lower refers to the only
possibly non-zero elements.
A special case of triangular matrix is the one of diagonal matrices which are both upper and
lower triangular. Namely, D = (dij ) is a diagonal matrix if dij = 0 whenever i ∕= j .
Exercise. Find conditions on the entries of a diagonal matrix such that it is invertible. Can you
easily find the inverse when it exists?

Exercise. Find conditions on the entries of a triangular matrix such that it is invertible.

L11.2 Computing determinants. We know how to compute the determinant of a 2×2 matrix
and this is the key to define/compute the determinant of any matrix. Here is a recursive
definition of determinant.
Definition. Let A = (aij ) be a n×n matrix and denote by Aij the (n − 1)×(n − 1) submatrix
of A obtained by deleting the i-th row and the j -th column. If n = 2, then det(A) =
a11 a22 − a12 a21 . If n > 2, then we define the determinant of A as follows

det(A) = a11 det(A11 ) − a12 det(A12 ) + · · · + (−1)^(1+n) a1n det(A1n )

or, in a more compact way, det(A) = Σ_{i=1}^{n} (−1)^(1+i) a1i det(A1i ).

This definition is called recursive since it reduces the computation of the determinant of a
n×n matrix to the computation of several determinants of (n − 1)×(n − 1) matrices and
so on ; the process stops when we hit 2×2 matrices for which we know how to explicitly
compute the determinant.
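The recursive definition translates directly into a short program (a plain Python sketch; the extra n = 1 base case is a convenience we add):

    # Sketch: determinant by expansion along the first row, exactly as defined above.
    def det(A):
        n = len(A)
        if n == 1:                       # convenience base case (not in the notes)
            return A[0][0]
        if n == 2:
            return A[0][0] * A[1][1] - A[0][1] * A[1][0]
        total = 0
        for i in range(n):
            minor = [row[:i] + row[i + 1:] for row in A[1:]]   # delete row 1 and column i+1
            total += (-1) ** i * A[0][i] * det(minor)          # (-1)^(1+(i+1)) = (-1)^i
        return total

    print(det([[1, 2, 3], [4, 5, 6], [7, 8, 10]]))   # -3

It is correct but computationally expensive for large matrices, which is exactly why the shortcuts of the next section matter.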
Exercise. Does Sarrus’ rule computation agree with our recursive definition in the case of 3×3
matrices?

Clearly, if the first row of A is zero, then det(A) = 0, but even more is true:
Proposition. If A has a zero row, then det(A) = 0.

Proof. We give a proof by induction. Clearly the result holds for 2×2 matrices. Assume
now that the result holds for all (n − 1)×(n − 1) matrices and let A be an n×n matrix.
Let ra be a zero row of A. If a = 1 we are done. If a ≥ 2 then all the (n − 1)×(n − 1)
matrices A1j have a zero row. Thus, det(A1j ) = 0 by the inductive hypothesis. Hence,
det(A) = Σ_{i=1}^{n} (−1)^(1+i) a1i det(A1i ) = 0 and the proof is completed.

Exercise. Prove that a square matrix with a zero column has zero determinant.

Exercise. Let r be a row of the square matrix A and consider the matrix B obtained by replacing
r with λr in A . Compute det(B) using det(A) .

L11.3 Determinants and Gaussian elimination. Since computing determinants using the
definition is a computationally nasty business, we want to find shortcuts and one is pro-
vided by ERO’s. The key tool is the following:
Binet's Theorem. Let A, B be n×n matrices, then det(AB) = det(A) det(B).

Thus, to know how ERO’s affect the determinant of a matrix, we only need to know the
determinant of all elementary matrices and this is easily done
Proposition. If E is the elementary matrix corresponding to an ERO of type
(i) ri ↦ ri + arj , i ≠ j , then det(E) = 1,
(ii) ri ↦ cri , c ≠ 0, then det(E) = c,
(iii) ri ↔ rj , then det(E) = −1.

Thus, ERO’s of type (i) do not change the determinant while ERO’s of type (ii), respectively
(iii), multiply the determinant by c, respectively by −1.
When applying Gaussian elimination to a square matrix A we will always get an upper
triangular matrix T , that is EA = T for a suitable product of elementary matrices E =
E1 . . . Er . Since we know det(Ei ), we can apply Binet's Theorem to reduce the computation
of det(A) to the computation of det(T ).
Proposition. If T is an upper triangular matrix, then det(T ) is the product of the diagonal
elements of T .

Proof. (Hint) We can again provide a proof by induction on n, the size of T . It is enough to
note that A11 is again an upper triangular matrix of size n − 1, while A1j for j ≥ 2 has a
zero column.
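As an illustration (a hedged sketch in Python with NumPy, on an example of our own choosing), reducing to triangular form with ERO's of type (i) only, and then multiplying the diagonal, reproduces the determinant:

    # Sketch: only type (i) ERO's are used, so the determinant is unchanged and
    # det(A) = product of the diagonal entries of the triangular matrix T.
    import numpy as np

    A = np.array([[2.0, 1.0, 1.0],
                  [4.0, 1.0, 0.0],
                  [-2.0, 2.0, 1.0]])

    T = A.copy()
    T[1] -= (T[1, 0] / T[0, 0]) * T[0]    # r2 -> r2 - 2 r1
    T[2] -= (T[2, 0] / T[0, 0]) * T[0]    # r3 -> r3 + r1
    T[2] -= (T[2, 1] / T[1, 1]) * T[1]    # r3 -> r3 + 3 r2

    print(np.prod(np.diag(T)), np.linalg.det(A))    # both equal 8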

L11.4 Inverse.
We can now prove our main result.
Theorem. A matrix A is invertible iff det(A) ∕= 0.
Proof. Let A be an n×n matrix. If det(A) ≠ 0, then A ∼ T where T is an upper triangular ma-
trix with all diagonal elements different from zero. Hence, r(A) = r(T ) = n and A is invertible.
If A is invertible, then there exists A−1 such that AA−1 = In . Thus, using Binet's Theorem,
det(A) det(A−1 ) = det(In ) = 1 and hence det(A) ≠ 0.
Exercise. If A is invertible, compute the determinant of the inverse matrix.

Determinants also provide a way to compute the inverse of an invertible matrix. This is
based on a generalization of the properties of the mixed product.
Laplace's Theorem. Let A be an n×n matrix, then
(i) Σ_{i=1}^{n} (−1)^(j+i) aji det(Aji ) = det(A) for all j ,
(ii) Σ_{i=1}^{n} (−1)^(l+i) aji det(Ali ) = 0 for all j and l such that j ≠ l .

In words, (i) says that we can compute det(A) by expanding along any row and not only the first one,
while (ii) says that combining the entries of row j with the minors Ali coming from a different row l gives zero.
We can define the adjoint matrix as done in a previous lecture and use it to compute the
inverse thanks to Laplace's Theorem.

Exercise. Given an n×n matrix A , let Â = (âij ) be its adjoint matrix, where âij = (−1)^(i+j) det(Aij ) .
Prove that A Â⊤ = det(A) In ; thus, if det(A) ≠ 0 , A−1 = (1/det(A)) Â⊤ .
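The identity in the exercise is easy to verify on examples (a sketch with Python/SymPy; adjugate() is SymPy's name for the transposed cofactor matrix, i.e. the matrix written Â⊤ above, and the sample matrix is our own):

    # Sketch: check A * adj(A) = det(A) I_3 and the inverse formula on a sample matrix.
    import sympy as sp

    A = sp.Matrix([[2, 1, 0],
                   [1, 3, 1],
                   [0, 1, 4]])

    print(A * A.adjugate() == A.det() * sp.eye(3))   # True
    print(A.inv() == A.adjugate() / A.det())         # True, since det(A) = 18 is nonzero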

L11.5 Further exercises.


Exercise. Compute the following determinants using the recursive formula and Gaussian elimination:

    (i)  | 1 0 1 |       (ii)  | 1 2 3 |       (iii)  | 1 1 1 1 |
         | 0 1 0 | ,           | 4 5 6 | ,            | 1 1 1 1 |
         | 1 0 1 |             | 7 8 9 |              | 1 1 1 1 |
                                                      | 1 1 1 1 | .

Exercise. For which values of t are the following matrices invertible? For the values of t for which
the matrix is invertible find the inverse.

    (i)  ( 1 0 2 )       (ii)  ( t 1 1 )
         ( 0 t 0 ) ,           ( 1 t 1 ) .
         ( 3 0 4 )             ( 1 1 t )

Exercise. Compute det(AB) in the following cases:

    (i)   A = ( 1 2 )       B = ( 1 0 )
              ( 0 3 ) ,         ( 2 3 ) ,

    (ii)  A = ( 1 2 3 ) ,   B = ( 3, 4, 5 )⊤ ,

    (iii) A = ( 3, 4, 5 )⊤ ,   B = ( 1 2 3 ) .
Notes 12 – Subspaces of Rn
We can understand the theory of matrices better using the concept of subspace.

L12.1 Closure. We continue to use Rn to denote either the set R1,n of row vectors, or the set
of column vectors Rn,1 . The definitions in this lecture apply equally to both cases, though
at times it is best to specify one or the other.
Definition. A subspace of Rn is a nonempty subset V that is ‘closed’ under addition and
multiplication by a constant, meaning that these operations do not allow one to escape from
the subset V (like a room with closed doors!).

Thus, V is a subspace iff


(S1) u, v ∈ V ⇒ u + v ∈ V ,
(S2) a ∈ R, v ∈ V ⇒ av ∈ V .
The word ‘space’ conveys the fact that, with these operations, the subset V acquires a struc-
ture of its own without the need to refer to Rn . It is an immediate consequence that any
subspace must contain the null vector. For if v ∈ V then

0 = v + (−1)v ∈ V.

Moreover, the singleton set {0} consisting of only the null vector is always a subspace. It
is called the null subspace or zero subspace and any other subspace of Rn must have infinitely
many elements.
Warning: do not confuse the null subspace with the empty set ∅ that is not counted as a
subspace.
At the other extreme is Rn itself. There is no doubt that this is a subspace, as conditions (S1)
and (S2) are satisfied by default: the vectors u+v and av are certainly in Rn as they have
nowhere else to go!
Example. To test whether a subset of Rn is a subspace, check first that it contains 0 . Be careful
though; here are two subsets of the plane that both contain 0 but are not subspaces:
(a) A = {(x, y) ∈ R2 : x ≥ 0 and y ≥ 0} , geometrically the first quadrant; it satisfies (S1) but not
(S2).
(b) B = {(x, y) ∈ R2 : xy = 0} , geometrically the union of the two axes; it satisfies (S2) but not
(S1).

In practice, subspaces are constructed by taking linear combinations of vectors:


Lemma. Any subset L {u1 , . . . , uk } (with each ui ∈ Rn ) is a subspace.

Proof. For simplicity, suppose that we have only two vectors u1 = u, u2 = v . Two arbitrary
elements of L {u, v} are then au+bv , cu+dv , and their sum

(au + bv) + (cu + dv) = (a + c)u + (b + d)v

obviously stays in L {u, v}. So does any multiple of au + bv . QED

The converse of this result is valid, namely that any subspace of Rn can be expressed in
the form L {u1 , . . . , uk }. To see this, one chooses a succession u1 , u2 , . . . of vectors in V ,
preferably in such a way that each one is not a LC of the previous ones. We shall explain
this better in the next lecture.

L12.2 Solution spaces. The set of solutions of a homogeneous system considered previ-
ously had the form V = L {u, v}, where u, v were two column vectors, and is therefore a
subspace. But there is a more basic reason for this:
Proposition. Given a matrix A ∈ Rm,n , the set
{x ∈ Rn,1 : Ax = 0} (1)
of solutions of the associated homogeneous linear system is always a subspace of Rn,1 .

Proof. This follows from the corresponding properties of matrix multiplication. If x, y are
solutions then
A(x + y) = Ax + Ay = 0 + 0 = 0,
so x + y is a solution too. Similarly,
A(ax) = a(Ax) = a0 = 0,
and ax is a solution for any a ∈ R. QED

Definition. The subspace (1) is called the null space or kernel of the matrix A, and denoted
Ker A.

Example. Let W denote the set of vectors (x, y, z) that satisfy x + y + z = 0 . Since this is
effectively a linear system (with m = 1 and n = 3 ), W is a subspace of R3 . But we can easily
express it as a LC by picking a couple of elements in it. Let u = (1, −1, 0) and v = (0, 1, −1) .
Both lie in W since their entries add up to 0. But we claim that any element (x, y, z) of W is a
LC of u and v . Indeed,
(x, y, z) = (x, −x − z, z) = xu − zv,
as claimed. Thus W = L {u, v} .

Warning: The solution set is only a subspace when the system is homogeneous. For an inho-
mogeneous system Ax = b, the solution set has the form

    {x0 + v : v ∈ Ker A}.

Here, x0 is any particular solution of the inhomogeneous equation Ax = b; the difference
of any two such solutions x0 , x1 belongs to Ker A because A(x1 − x0 ) = 0.
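A tiny example makes the warning concrete (a hedged Python/SymPy sketch; the right-hand side 1 is a value we choose for illustration):

    # Sketch: for x + y + z = 1 the solutions are x0 + Ker A with x0 = (1,0,0);
    # the solution set is a translate of a subspace, not a subspace (it misses 0).
    import sympy as sp

    A = sp.Matrix([[1, 1, 1]])
    b = sp.Matrix([1])
    x0 = sp.Matrix([1, 0, 0])

    print(A * x0 == b)                    # True: x0 is a particular solution
    for v in A.nullspace():               # basis of Ker A
        print(A * (x0 + v) == b)          # True for every kernel element added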

L12.3 Subspaces defined by a matrix. Given a matrix A, two separate collections of vectors
are staring us in the face:
the rows r1 , · · · , rm ∈ R1,n of A, and
the columns c1 , · · · · · · , cn ∈ Rm,1 of A.
These give rise to two respective subspaces that complement the one already defined in (1).
Definition. With the notation above,
(i) the row space of A, denoted Row A, is L {r1 , · · · , rm } ⊂ R1,n , and
(ii) the column space of A, denoted Col A, is L {c1 , · · · · · · , cn } ⊂ Rm,1 .
More informally, Row A is a subspace of Rn , whereas Col A is a subspace of Rm .
Each row of A corresponds to an equation of the linear system with augmented matrix
(A | 0). We already know that there are many ways to transform this system into an equiv-
alent one with the same solutions. The next result formalizes the fact that it is the row space
Row A (rather than the individual rows of A) that determines the solution space Ker A.
Lemma. Ker A = {x ∈ Rn,1 : rx = 0 for all r ∈ Row A}.

Proof. Since

    Ax = ( r1 x , . . . , rm x )⊤ ,
x belongs to Ker A iff ri x = 0 for all i. This implies that rx = 0 for any r ∈ Row A since
such an r is a LC of the rows r1 , · · · , rm . Conversely, if rx = 0 for all r ∈ Row A then
certainly ri x = 0 for all i, and so Ax = 0. QED

Recall the notion of row equivalence. It is easy to see that


A∼B ⇒ Row A = Row B. (2)
For if A ∼ B , each row of B is obtained from A using ERO’s, and Row B ⊆ Row A. But the
process is reversible: B ∼ A and Row A ⊆ Row B . It follows from the Lemma that
A∼B ⇒ Ker A = Ker B, (3)
confirming something we already know: if two matrices A, B are row equivalent then the asso-
ciated homogeneous systems have the same solutions.
We can complete these observations by the next result, which is easily memorized.
Theorem. Let A, B be two matrices of the size m × n. The following are equivalent:
(i) A ∼ B , i.e. A and B are related by ERO’s.
(ii) Row A = Row B ,
(iii) Ker A = Ker B .

This is especially relevant in the case in which B is a step-reduced matrix obtained by ap-
plying ERO’s to A. Notice that the statement A ∼ B forces the matrices to have the same
size – one could relax this requirement (and retain the Theorem’s validity) by introducing a
fourth ERO, that of deleting null rows.
Warning. A ∼ B does not imply that Col A = Col B ; to see this, reduce the matrix (0, 1)⊤ .
Proof. To prove that (i) ⇒ (ii) recall that A ∼ B gives A = EB where E is the product of
elementary matrices and thus each row of A is a LC of rows of B , hence Row A ⊆ Row B .
The opposite inclusion follows since we also have B ∼ A and we can repeat the same
argument.
To prove that (ii) ⇒ (i) we note that the equality Row A = Row B is equivalent to the fact
that each row of A is a LC of rows of B , and vice versa, hence A ∼ B .
To prove that (ii) ⇒ (iii) we use (i) to show that the systems Ax = 0 and Bx = EAx = 0
have the same solution sets (remember that E is invertible!).
Finally, to prove that (iii) ⇒ (ii) we use results and ideas about bases and dimension (see
further lectures). It is possible to find matrices A′ and B ′ in such a way that
Row A′ = Ker A and Row B ′ = Ker B
and, moreover,
Ker A′ = Row A and Ker B ′ = Row B.
Thus the result follows applying (ii) to the matrices A′ and B ′ .
QED

L12.4 Further exercises.

1. Use ERO’s to show that the following subspaces of R4 coincide:

L {(1, 2, −1, 3), (2, 4, 1, −2), (3, 6, 3, −7)} , L {(1, 2, −4, 11), (2, 4, −5, 14)} .

2. Consider the following subspaces of R4 :

U = L {(1, 2, −1, 3), (2, 4, 1, −2), (3, 6, 3, −7)} , V = L {(1, 2, −4, 11), (2, 4, 0, 14)} .

Is it true that U ⊆ V or V ⊆ U ?

3. Let W = {(x, x, xy, y, y) : x, y ∈ R}. Which of the following statements is true?


(i) W is a subspace of R5 ,
(ii) W is contained in a proper subspace U (so W ⊆ U ∕= R5 ),
(iii) W contains a subspace V that is not null (so 0 ∕= V ⊆ W ).

4. Show that
Col A = {x⊤ ∈ Rm,1 : x ∈ Row(A⊤ )}.
This means that Col A is effectively the same as Row(A⊤ ). We could define a fourth sub-
space Ker(A⊤ ) = {x ∈ Rm,1 : x⊤A = 0}, but we have enough work to do studying Row A
and Ker A for the time being!
Notes 13 – Bases and dimension
The concepts of linear combination and linear independence are combined in the definition
of the basis of a subspace V of Rn . Any two bases have the same number of elements, called
the dimension. When V is represented as the row space of a matrix A, its dimension equals
the rank of A.

L13.1 Redundant elements. Describing a subspace as L {u1 , · · · , uk } is all very well, but
there are infinitely many ways to choose the representative vectors u1 , · · · , uk . Suppose that
{u, v} is a linearly independent set, and consider the subspace
V = L {u, 2u, u+v, 0, u−7v} .
The right-hand side is a bit ridiculous since most of its elements are redundant. We can
make the list of vectors more effective as follows:
Retain the non-zero vector u.
Discard 2u because it is already in L {u}.
Retain u + v because it is not in L {u}.
Discard 0 (since the null vector is already present in L {u, v}!).
Discard u−7v as it too is in L {u, v}.
We are finished with V = L {u, u+v}, which of course equals L {u, v} though it may be
that u+v is simpler than v in a numerical example.
Definition. Let V be a subspace of Rn . A basis of V is a linearly independent set {v1 , . . . , vk }
such that V = L {v1 , . . . , vk }.

A basis of V is therefore characterized by two key properties:


(B1) it generates V , meaning that any element in V is a LC of the basis, and
(B2) the basis is LI.
The second condition implies that a basis can contain no more than n elements (for we can
represent the basis elements as rows of a matrix with n columns, so at most n rows are LI).
We shall often indicate a particular basis as follows: B = {v1 , . . . , vk }. Some authors re-
quire a basis to be an ordered set, in which case one could write B = (v1 , . . . , vk ).
Warning. Since no LI set can contain 0, neither can a basis.

L13.2 Bases of Rn . The definition of basis makes perfect sense if we take the subspace
V = Rn (thought of as either row or column vectors), and there are infinitely many bases
to choose from. But the most obvious is the one consisting of the rows or columns of the
matrix In . In particular,
Definition. The canonical basis of Rn,1 consists of the columns of In , and its individual
elements are denoted e1 , . . . , en .

Thus,

    e1 = ( 1, 0, . . . , 0 )⊤ ,   e2 = ( 0, 1, 0, . . . , 0 )⊤ ,   . . .
It is obvious that any v ∈ Rn,1 can be written in one and only one way as a LC of this basis:

    v = ( a1 , . . . , an )⊤ = a1 e1 + · · · + an en .
But this property holds for any basis of any subspace. By property (B1), v can certainly be
written in some way as a LC of the basis {v1 , . . . , vk }, and by (B2),

    v = a1 v1 + · · · + ak vk = b1 v1 + · · · + bk vk
    ⇒ (a1 −b1 )v1 + · · · + (ak −bk )vk = 0
    ⇒ a1 −b1 = 0, . . . , ak −bk = 0.

Exercise. Is B = {(2, −1, −1), (1, −2, 1), (1, 1, −2)} a basis of R1,3 ?

L13.3 Finding bases by reduction. Bases can in theory be computed by the ‘discard/retain’
method described above. But often it is more effective to use row reduction. If B is step-
reduced, we know that the nonzero rows of B are LI. They certainly generate Row B be-
cause the missing rows are null! Thus we have the
Proposition. The nonzero rows of a step-reduced matrix B form a basis of Row B .

This gives us a prescription for finding a basis of any subspace


V = L {u1 , · · · , uk } ⊂ Rn .
First suppose that the ui are row vectors. We can construct a matrix A ∈ Rk,n by taking
these vectors in any order to be the rows of A. Apply ERO’s to A to obtain a step-reduced
matrix B . Then the nonzero rows of B form a basis of Row B = Row A = V .
We can apply an identical procedure if the ui are column vectors; we only need to transpose
them into rows, and at the end of our labours transpose the nonzero rows of B back into
columns. We shall illustrate the latter by constructing a basis of the subspace

    W = L { (1, 2, 3, 4, 5)⊤ , (6, 7, 8, 9, 10)⊤ , (11, 12, 13, 14, 15)⊤ , (16, 17, 18, 19, 20)⊤ } ⊂ R5,1 .
We convert the columns into rows and reduce the 4×5 matrix:

    ( 1  2  3  4  5 )      ( 1  2   3   4   5 )
    ( 6  7  8  9 10 )  ∼   ( 0 −5 −10 −15 −20 )
    ( 11 12 13 14 15 )     ( 0 −10 −20 −30 −40 )
    ( 16 17 18 19 20 )     ( 0 −15 −30 −45 −60 )

    ( 1  2   3   4   5 )      ( 1 2 3 4 5 )      ( 1 0 −1 −2 −3 )
 ∼  ( 0 −5 −10 −15 −20 )  ∼   ( 0 1 2 3 4 )  ∼   ( 0 1  2  3  4 )
    ( 0  0   0   0   0 )      ( 0 0 0 0 0 )      ( 0 0  0  0  0 )
    ( 0  0   0   0   0 )      ( 0 0 0 0 0 )      ( 0 0  0  0  0 ) .
The fact that W has a basis of two elements is already clear after two steps, and the super-
reduction was unnecessary. But we can now give three useful bases:

    W = L { (1, 2, 3, 4, 5)⊤ , (0, 1, 2, 3, 4)⊤ }
      = L { (−1, 0, 1, 2, 3)⊤ , (0, 1, 2, 3, 4)⊤ }
      = L { (1, 2, 3, 4, 5)⊤ , (6, 7, 8, 9, 10)⊤ } .
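The whole computation can be reproduced in one call (a sketch with Python/SymPy, assumed available):

    # Sketch: put the generators of W as rows and reduce; the nonzero rows of the
    # result form a basis of the row space, i.e. of W (transposed back to columns
    # if column vectors are wanted).
    import sympy as sp

    M = sp.Matrix([[1, 2, 3, 4, 5],
                   [6, 7, 8, 9, 10],
                   [11, 12, 13, 14, 15],
                   [16, 17, 18, 19, 20]])

    R, pivots = M.rref()
    print(R)              # rows (1, 0, -1, -2, -3), (0, 1, 2, 3, 4), then zero rows
    print(len(pivots))    # 2: the dimension of W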

L13.4 Dimension. First we reassure ourselves that any subspace of Rn has a basis.
Theorem. Let V be a subspace of Rn which is not null. Then V has a basis consisting of at
most n elements.

Proof. By assumption, V contains a nonzero vector u1 . Then


either V = L {u1 } or we can choose u2 ∈ V \ L {u1 }.
In the latter case, u2 cannot be a multiple of u1 , and
either V = L {u1 , u2 } or we can choose u3 ∈ V \ L {u1 , u2 }.
In the latter case {u1 , u2 , u3 } is LI since any linear relation could be used to express u3 as
a LC of u1 , u2 , and the procedure can be continued. The set of elements u1 , u2 , . . . that we
have selected at each stage is necessarily LI for the same reason. But we know that it is
impossible to have more than n elements of Rn that are LI. So the process must stop. QED

Recall that if the rows of a matrix A ∈ Rm,n are LI then r(A) = m. A deeper fact we have
seen is that if A, B are two matrices of the same size with Row A = Row B then A ∼ B .
Corollary. Let V be a subspace of Rn that is not null. Any two bases of V have the same
number of elements.

Proof. We give a proof by contradiction. Given two bases {a1 , . . . , aℓ } and {b1 , . . . , bm }
assume that ℓ < m.
Form the matrix of size (ℓ + m)×n

    A = ( ← b1 → )
        ( ← b2 → )
        (   ···   )
        ( ← bm → )
        ( ← a1 → )
        (   ···   )
        ( ← aℓ → ) .

Clearly r(A) ≥ m since {b1 , . . . , bm } is LI. Now apply ERO’s to A in order to obtain
    B = ( ← a1 → )
        (   ···   )
        ( ← aℓ → )
        ( ← b1 → )
        ( ← b2 → )
        (   ···   )
        ( ← bm → ) .
Clearly A ∼ B and since bi ∈ L{a1 , . . . , aℓ } for all i we also have B ∼ C where

    C = ( ← a1 →    )
        (   ···      )
        ( ← aℓ →    )
        ( ← 0 ... 0 → )
        ( ← 0 ... 0 → )
        (   ···      )
        ( ← 0 ... 0 → ) .

Thus, r(A) = r(C) ≤ ℓ. Hence a contradiction since ℓ < m ≤ r(A).


QED

The number of elements in a basis is called the dimension of V , and we write it as dim V .
For a subspace of Rn , we know that dim V ≤ n. Thus,

    V = L {u1 , . . . , uk } ,   with {u1 , . . . , uk } LI and k = dim V.
If V = {0} then we set dim V = 0, and (by convention) declare ∅ to be a basis.
Our final big result before turning to more geometrical applications is
Theorem. For any matrix A of size m × n,

dim(Row A) = r(A), dim(Ker A) = n − r(A).

Proof. We may suppose that A is step-reduced as none of Row A, Ker A, r(A) changes un-
der ERO’s. The nonzero rows of A then form a basis of Row A, whence the first equality.
The second equality is then a restatement of (RC2), whereby the solutions of the homoge-
neous system Ax = 0 depend on n − r(A) free parameters. More precisely, a basis of the
Ker A is given by the solutions of the form

x = (a1 , . . . , aj−1 , −1, 0 . . . , 0)T ,

of which there is one for each unmarked column. QED

We have also stated that the marked columns of B (of which there are r(B) in number)
form a basis of Col B , though the latter does in general change with ERO’s. But the columns
of two row equivalent matrices satisfy the same linear relations, so the same columns will
form a basis of Col A. Thus,
dim(Col A) = r(A).

Corollary. r(A) = r(A⊤ ) for any matrix A.

This result is important as it means all the definitions and procedures that we carried out
would have given the same results had we interchanged the roles of rows and columns. We
shall return to study the column space of a matrix in Part II.

L13.5 Further exercises.


1. Find the dimensions of the following subspaces of R5 :

U = L {(1, 3, −2, 2, 3), (1, 4, −3, 4, 2), (2, 3, −1, −2, 9)} ,
V = L {(1, 3, 0, 2, 1), (1, 5, −6, 6, 3), (2, 5, 3, 2, 1)} .

Let W denote the subspace generated by all six row vectors. Find dim W .

2. Find the dimensions of the following subspaces of R5 :

U = {(x1 , x2 , x3 , x4 , x5 ) : 2x1 − x2 − x3 = 0, x4 − 3x5 = 0},


V = {(x1 , x2 , x3 , x4 , x5 ) : 2x1 − x2 + x3 + 4x4 + 4x5 = 0}.

Let W denote the subspace consisting of vectors satisfying all three equations. Is it true that
dim W = 5?

3. Consider the following vectors of R4 :

w1 = (1, 0, 1, 0), w2 = (2, h, 2, h), w3 = (1, 1 + h, 1, 2h).

Find the dimension of L {w1 , w2 , w3 } as h varies.

4. Find a basis of R4 that contains both a basis for U and basis of V , where

U = {(x, y, z, t) ∈ R4 : x − 2z = y = 0},
V = L {(0, 2, 1, −1), (1, −2, 1, 1), (1, 2, 3, −1), (1, 2, 7, 1)} .

5. Explain carefully why the marked columns of a step-reduced matrix B form a basis of
Col B .
Notes 14 – Vector Spaces
The theory of linear combinations, linear independence, bases, and subspaces that we have
studied in relation to Rn can be generalized to the more general study of vector spaces. Any
subspace of Rn (including of course Rn itself) is an example of a vector space, but there are
many others including sets of matrices, polynomials and functions.

L14.1 Motivation. A subspace of Rn is the prime example of a vector space, but there are a
number of reasons for discussing the general definition, namely
(i) to emphasize aspects of the theory that do not depend upon the choice of a specific basis,
(ii) to allow the use of scalars that are different from real numbers,
(iii) to extend the theory to function spaces of infinite dimensions.
We shall explain each of these points in turn.
(i) The whole description of Rn is modelled on the existence of its canonical basis. To be
specific, consider Rn,1 and let ej denote the j th column of the identity matrix In . Then a
typical element of Rn,1 is given by

    v = ( x1 , x2 , . . . , xn )⊤ = x1 e1 + x2 e2 + · · · + xn en ,

and is represented by its coefficients relative to the basis {e1 , . . . , en }. But when we wish to
describe subspaces of Rn,1 there is a need to work with other bases. In fact, any subspace of
Rn is a vector space in its own right. In general it is important to be able to change basis; in
this way, the abstract concept of vector space comes into its own.
(ii) The ‘scalars’ that are used to multiply vectors in the definition of a vector space need
not be real numbers. The set of scalars is required to be what is called a field, of which R
is only one example. Other examples of fields include the set Q of rational numbers (p/q
where p, q are integers with q ≠ 0), the set C of complex numbers (x + iy with x, y ∈ R and
i = √−1), and the ‘binary set’ B = {0, 1} (also called F2 ) consisting of just two elements.
(iii) In this course, we shall only work with vector spaces of finite dimension. We shall
explain that such a vector space V is characterized by the existence of a basis of finite size
n. The choice of such a basis makes V closely resemble the set Rn or (for the other choices
of fields mentioned above) Qn , Cn or the finite set B n of size 2n . However, in analysis, the
most important examples of vector spaces do not fall into this category and once again one
needs to rely upon the abstract and basis-independent theory.

L14.2 The definition of a vector space. In order to define a vector space in general, one first
needs a field F of scalars. For the moment, we shall suppose that F is one of R, Q, C, B . The
important thing about F one needs to know is that its elements can be added, subtracted,
multiplied and divided, and that there are two special ones, 0 and 1.
A vector space is a set V in which it is possible to form
(i) the sum u + v of u, v ∈ V ,
(ii) the product av of a ∈ F with v ∈ V .
The elements of V are called ‘vectors’ even though they do not necessarily resemble vectors
in Rn . The two basic operations are subject to a number of rules that formalize the ones that
are completely obvious in the case F = R and V = Rn we are most familiar with. There is
no need to memorize these rules, as they are quickly absorbed in practice:
Definition. V is said to be a vector space over F provided
(a) addition of vectors behaves like addition of real numbers in that it satisfies
(u + v) + w = u + (v + w),
(1)
u + v = v + u, for all u, v, w ∈ V,
there is a zero or null element 0 for which
0+v =v for all v ∈ V,
and each vector v ∈ V has a ‘negative’ −v with the property that v + (−v) = 0;
(b) the ‘internal’ operations of F are compatible with (i) and (ii) in the sense that
(a + b)v = av + bv
(ab)v = a(bv), (2)
a(u + v) = au + av,
and finally,
1v = v.

On this page, we have been careful to type elements of V (but not F) in boldface, though in
handwriting one does not normally distinguish elements of V in any way. It is important
to observe that instances of both addition and multiplication in (2) occur with different
meanings. The conditions in (a), taken together, assert that the operation + makes V into
what is called a commutative (or abelian) group. Of course, we write u + (−v) as u − v , and
this process defines subtraction in a vector space.
Here is a simple consequence of the axioms above:
0v + 0v = (0 + 0)v = 0v.
The various rules in (a) allow us to subtract 0v (without knowing what this equals) to get
0v = 0,
so that 0v is always the null vector.
Example. To keep matters familiar, we first suppose that F = R , in which case V is called a real
vector space. Certainly V = Rn satisfies the definition above with the usual operations that we
have used repeatedly.
But another example is to take V to be the set Rm,n of matrices of size m × n . We explained
in the first lecture how to add such matrices together, and multiply them by scalars. From the
point of view of vector spaces (in which multiplication of matrices plays no part), there is little
difference between Rm,n and the space (of say row vectors) R1,mn . For example, we can pass
from R2,3 to R6 by the correspondence
$$ \begin{pmatrix} a & b & c \\ d & e & f \end{pmatrix} \;\leftrightarrow\; (a, b, c, d, e, f). $$
It does not matter whether we use the left-hand or the right-hand description to define the two
basic operations – the result is the same. But we could equally well have chosen to represent the
matrix by (c, f, b, e, a, d); for this reason the vector spaces Rm,n and R1,mn are not identical.
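As an optional computational aside (a minimal sketch in Python/NumPy, not part of the theory above), the row-by-row correspondence can be realised by ‘flattening’ a matrix, and the two basic operations give the same answer on either side:

```python
import numpy as np

# An element of R^{2,3} ...
M = np.array([[1., 2., 3.],
              [4., 5., 6.]])
# ... and the corresponding element (a, b, c, d, e, f) of R^{1,6}
v = M.reshape(-1)

N = np.array([[10., 20., 30.],
              [40., 50., 60.]])
w = N.reshape(-1)

# Addition and scalar multiplication agree under the correspondence
assert np.allclose((M + N).reshape(-1), v + w)
assert np.allclose((2.5 * M).reshape(-1), 2.5 * v)
```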
L14.3 Polynomials and functions. A more original example is obtained using polynomials.
Recall that a polynomial is an expression of the form

p(x) = a0 + a1 x + a2 x2 + · · · + an xn . (3)

We are most familiar with the case in which the coefficients are real numbers, but they could
belong to a field F. The polynomial has degree equal to n provided an ≠ 0. We have written
p in boldface to emphasize that it is to be treated as a ‘vector’, though it is also the function

x ↦ p(x);

the choice of symbol for the variable is irrelevant, and one often writes p(t).
The constant term
a0 = p(0)
of the polynomial is none other than the value of the function at 0. It vanishes if and only if
the polynomial p(x) is divisible by x.
Proposition. The set Fn [x] of polynomials (in a variable x) of degree no more than n with
coefficients in a field F is a vector space over F.

Proof. There is a way to define the basic operations, using a rule that works for any
functions. Namely, we set

(p1 + p2 )(x) = p1 (x) + p2 (x),     (a p)(x) = a p(x),   a ∈ F.     (4)

Using (3), it is obvious that the sum of two polynomials is a polynomial, and that the product
of a polynomial with a scalar is a polynomial. In practice, it is just a matter of applying the
operations coefficient-wise, as in the example

$(1 + x)^2 + 3\left(1 + x + \tfrac{1}{2} x^2 + \tfrac{1}{6} x^3\right) = 4 + 5x + \tfrac{5}{2} x^2 + \tfrac{1}{2} x^3,$

representing a LC of two elements in R3 [x]. QED
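The coefficient-wise recipe in the proof is easy to make concrete; here is a small sketch (assuming, for illustration only, that a polynomial in Rn [x] is stored as its coefficient vector (a0 , . . . , an )):

```python
import numpy as np

def add(p, q):
    """(p + q)(x): add polynomials coefficient-wise, padding with zeros."""
    n = max(len(p), len(q))
    p = np.pad(np.asarray(p, float), (0, n - len(p)))
    q = np.pad(np.asarray(q, float), (0, n - len(q)))
    return p + q

def scale(a, p):
    """(a p)(x): multiply every coefficient by the scalar a."""
    return a * np.asarray(p, float)

# The LC from the proof: (1+x)^2 + 3(1 + x + x^2/2 + x^3/6)
p1 = [1, 2, 1]               # (1+x)^2
p2 = [1, 1, 0.5, 1/6]        # 1 + x + x^2/2 + x^3/6
print(add(p1, scale(3, p2)))  # [4.  5.  2.5 0.5], i.e. 4 + 5x + (5/2)x^2 + (1/2)x^3
```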

The previous proposition is a special case of the


Proposition. Let V be a vector space, and let A be any non-empty set. Then the rules (4)
make the set of all mappings f : A → V into a vector space.

L14.4 More about fields.


We shall not give the formal definition of a field. But it is a set F that satisfies the rules
of a vector space, in which we are allowed to take the set of scalars to be the same set
F. Multiplication between scalars and vectors therefore becomes a multiplication between
elements of F such that
1a = a, a ∈ F,
and 1 is the multiplicative identity or unit. The multiplication is required to be commutative,
so that
ab = ba, a, b ∈ F, (5)
and every nonzero element a ∈ F must have a multiplicative inverse, written a−1 , satisfying

a a−1 = 1.

Every field must contain at least two elements: the additive identity (usually written 0) and
the multiplicative identity (written 1). If there are no other elements, we obtain B = {0, 1}
that is a field with the operations

0 + 0 = 0,   0 + 1 = 1 = 1 + 0,   1 + 1 = 0,
0 · 0 = 0,   0 · 1 = 0 = 1 · 0,   1 · 1 = 1.      (6)
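Concretely, the operations (6) are just addition and multiplication of integers reduced modulo 2; a tiny illustrative sketch in Python:

```python
# Arithmetic in B = {0, 1}: work modulo 2, so that 1 + 1 = 0 as in (6)
def add2(a, b):
    return (a + b) % 2

def mul2(a, b):
    return (a * b) % 2

for a in (0, 1):
    for b in (0, 1):
        print(f"{a} + {b} = {add2(a, b)},   {a} * {b} = {mul2(a, b)}")
```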

Example. Here is an example of a field F with 4 elements. It will be defined as a vector space
over a simpler field, namely B . The set F consists of all linear combinations

b1 f1 + b2 f2 ,   with b1 , b2 ∈ B,

in which we decree that f1 , f2 are independent. Although b1 , b2 are arbitrary, there are only two
choices for each. We can therefore list all four elements of F as row vectors
(0, 0) = 0f1 + 0f2 = 0,
(1, 0) = 1f1 + 0f2 = f1 ,
(0, 1) = 0f1 + 1f2 = f2 ,
(1, 1) = 1f1 + 1f2 = 1.

(On the right, we have avoided boldface to emphasize that the elements are to be treated like
numbers, not vectors.) Addition is carried out component-wise, using the operations of B .
Multiplication cannot be component-wise (that would give f1 f2 = 0 , and a field has no zero
divisors); instead it is determined by decreeing that f1 f1 = f2 , f2 f2 = f1 and f1 f2 = 1. The
reason for also calling the last element 1 is that it then acts as the multiplicative identity: for
instance 1 f1 = (f1 + f2 )f1 = f2 + 1 = f1 .
The full multiplication table for F is symmetric because of (5):

·    0    f1   f2   1
0    0    0    0    0
f1   0    f2   1    f1
f2   0    1    f1   f2
1    0    f1   f2   1

If p is a prime number, the set {0, 1, 2, . . . , p − 1} with addition and multiplication modulo p
(‘clockface arithmetic’) becomes a field with exactly p elements. A more elaborate version of
the construction in the Example, with f1 , . . . , fk in place of f1 , f2 (and a suitably chosen
multiplication), shows that there is a field with pk elements for any positive integer k . It turns
out that this is essentially the only field with pk elements. Moreover, any finite field has pk
elements for some prime number p ≥ 2 and integer k ≥ 1.

L14.5 Further exercises.

1. Show that the set of all differentiable functions f : (0, 1) → R is a real vector space. By
considering polynomials, or otherwise, show that it is not finite-dimensional.

2. Let V = B 2,2 be the set of 2 × 2 matrices, each of whose entries is 0 or 1.


(i) How many elements does V have?
(ii) Show that V is a vector space with field B .
(iii) How many matrices in V have determinant (calculated using (6)) equal to 1?
Notes 15 – Bases and Linear Mappings
For the rest of the course, students should have in mind the following vector spaces: Rn
(that is, either R1,n or Rn,1 ), Rm,n and Rn [x]. All are real vector spaces, that is F = R.
Many other vector spaces can then be defined by choosing subspaces, a concept that we
have already investigated in Rn .

L15.1 Linear combinations and subspaces. Let u1 , · · · , uk be elements of a vector space V


over F . We can use the same notation as before for the set of all linear combinations (LC’s) of
the vectors listed, so that

L {u1 , · · · , uk } = {a1 u1 + · · · + ak uk : ai ∈ F }. (1)

The only novelty is that the coefficients now belong to F . Whilst the ui ’s form a finite set,
the right-hand side of (1) will be infinite if F is.
Subspaces of V are defined exactly as for Rn :
Definition. Let V be a vector space over a field F . A non-empty subset U of V is a
subspace iff
(S1) u, v ∈ U ⇒ u + v ∈ U ,
(S2) a ∈ F, u ∈ U ⇒ au ∈ U .

It follows that a subspace is a vector space in its own right: the operations inherited from V
(which stay inside U by (S1) and (S2)) satisfy all the vector space axioms because V itself does. In practice, subspaces are again
defined either by linear combinations or linear equations.
Exercise. Let V = R3,3 be the space of 3 × 3 matrices. Let S = {A ∈ R3,3 : A⊤ = A} be the
subset consisting of symmetric matrices. Check that S is a subspace of V , and find matrices Ai
such that S = L {A1 , . . . , A6 }.

Definition. A vector space V (for example, a subspace V of some other vector space W )
is finite-dimensional, or finitely-generated, if there exists a finite subset {u1 , · · · , uk } such
that V = L {u1 , · · · , uk }.

Example. Consider R2,3 again. This vector space is finite dimensional because any matrix of
size 2×3 is a LC of the matrices
$$ \begin{pmatrix} 1&0&0\\0&0&0 \end{pmatrix},\; \begin{pmatrix} 0&1&0\\0&0&0 \end{pmatrix},\; \begin{pmatrix} 0&0&1\\0&0&0 \end{pmatrix},\; \begin{pmatrix} 0&0&0\\1&0&0 \end{pmatrix},\; \begin{pmatrix} 0&0&0\\0&1&0 \end{pmatrix},\; \begin{pmatrix} 0&0&0\\0&0&1 \end{pmatrix}. \qquad (2) $$

Indeed,
$$ \begin{pmatrix} a_5 & a_3 & a_1 \\ a_6 & a_4 & a_2 \end{pmatrix} = a_1 \begin{pmatrix} 0&0&1\\0&0&0 \end{pmatrix} + a_2 \begin{pmatrix} 0&0&0\\0&0&1 \end{pmatrix} + \cdots + a_6 \begin{pmatrix} 0&0&0\\1&0&0 \end{pmatrix}. \qquad (3) $$

Whilst the unordered set (2) is an obvious basis, there is no ‘right’ or ‘wrong’ way to order its
elements into a list.

Warning: there exist vector spaces which are not finitely-generated; for example, R[x] is
not finitely-generated.
Exercise. Prove that R[x] is not finitely-generated. (Hint: for any finite set of polynomials
{u1 , · · · , uk } let d = max{deg ui : i = 1, . . . , k} and note that L {u1 , · · · , uk } ⊆ Rd [x] .)

The actual dimension of V in the definition above turns out (we shall see) to be the smallest
number of vectors needed to ‘generate’ V , in which case the resulting set is LI. Elements
v1 , . . . , vk in a vector space V are linearly independent (LI) if there is no nontrivial linear
relation between them. More formally,
Definition. Let {v1 , . . . , vk } ⊂ V . We say that {v1 , . . . , vk } is LI if and only if
a1 v1 + · · · + ak vk = 0 (with ai ∈ F ) ⇒ a1 = · · · = ak = 0.

For example, the six matrices in (2) are LI because (3) can only be null if all the coefficients
ai are zero.
Definition. A finite set {v1 , . . . , vn } of elements of a vector space V is a basis of V if
(B1) it generates V in the sense that V = L {v1 , . . . , vn }, and
(B2) it is LI.

Recall that any two bases of a subspace of Rn have the same number of elements. We
shall explain shortly that the same result holds for any finite-dimensional vector space; the
dimension of V is then defined to be this number.
Exercise. (i) Guided by (2), show that Rm,n has a basis consisting of mn matrices.
(ii) Verify that {1, x, . . . , xn } is a basis of Rn [x] , by observing that if a0 + a1 x + · · · + an xn equals
the zero polynomial then it has to vanish for all x , thus ai = 0 for all i.

L15.2 Linear mappings. Let V, W be two vector spaces over F . A mapping f : V → W is


called linear if
(LM1) f (u + v) = f (u) + f (v) for all u, v ∈ V .
(LM2) f (av) = af (v) for all a ∈ F and v ∈ V .
These two conditions are equivalent to either of the single ones
f (au + bv) = af (u) + bf (v) for all a, b ∈ F , u, v ∈ V ,
f (au + v) = af (u) + f (v) for all a ∈ F , u, v ∈ V .
Here is an essential consequence:
f (0) = 0, (4)
meaning that f maps the null vector of V to the null vector of W (both are denoted here
by the symbol 0).
Example. Equation (3) effectively defines a linear mapping f : R1,6 → R2,3 , for which
$$ f(a_1, \ldots, a_6) = \begin{pmatrix} a_5 & a_3 & a_1 \\ a_6 & a_4 & a_2 \end{pmatrix}. $$
Here, we have written f (a1 , . . . , a6 ) in place of f ((a1 , . . . , a6 )) to avoid double parentheses. It is easy
to check the conditions (LM1) and (LM2); the reason they hold is that each matrix component
on the right is a linear combination of the coordinates on the left. By contrast, neither of the
following mappings is linear:
$$ g(a_1, \ldots, a_6) = \begin{pmatrix} a_5 + 1 & a_3 & a_1 \\ a_6 & a_4 & a_2 \end{pmatrix}, \qquad h(a_1, \ldots, a_6) = \begin{pmatrix} a_5 & (a_3)^2 & a_1 \\ a_6 & a_4 & a_2 \end{pmatrix}. $$
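One can also test (LM1) and (LM2) numerically on sample vectors: failing on a single sample already proves a map is not linear, although passing is of course not a proof of linearity. A minimal sketch (Python/NumPy, with the three maps of the Example hard-coded):

```python
import numpy as np

def f(a):                      # the linear map of the Example
    a1, a2, a3, a4, a5, a6 = a
    return np.array([[a5, a3, a1], [a6, a4, a2]])

def g(a):                      # adds a constant entry: not linear
    return f(a) + np.array([[1, 0, 0], [0, 0, 0]])

def h(a):                      # squares a coordinate: not linear
    a1, a2, a3, a4, a5, a6 = a
    return np.array([[a5, a3**2, a1], [a6, a4, a2]])

rng = np.random.default_rng(0)
u, v = rng.standard_normal(6), rng.standard_normal(6)
a, b = 2.0, -3.0
for name, m in [("f", f), ("g", g), ("h", h)]:
    print(name, np.allclose(m(a*u + b*v), a*m(u) + b*m(v)))   # True only for f
```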
Let f : A → B be an arbitrary mapping between two sets. Recall that the image of f ,

Im f = {f (a) : a ∈ A },

denotes the subset of B consisting of those elements ‘gotten’ from A . Also, given b ∈ B ,
its inverse image
f −1 (b) = {a ∈ A : f (a) = b}
is the subset of A consisting of all those elements that map to b. Then f is said to be
(i) surjective or onto if Im f = B ,
(ii) injective or into if f (a1 ) = f (a2 ) ⇒ a1 = a2 .
Thus f is onto iff f −1 (b) is nonempty for all b ∈ B . If f is both surjective and injective then
it is called bijective. This means that there exists a well-defined inverse mapping

f −1 : B → A ,

so that f −1 ◦ f and f ◦ f −1 are identity mappings.


Example. Let n be a positive integer. Then f (x) = xn defines a bijection R → R iff n is odd;
if n is even f is neither injective nor surjective; f is linear only when n = 1 (in which case it is
the identity mapping). If n = 2 then f −1 (64) = {8, −8}; if n = 3 then ‘f −1 (64)’ is ambiguous:
it could mean the subset {4} , or the number 4 obtained by applying the inverse mapping to 64 .

Here is an easy but important result:


Lazy Lemma. Let f : V → W be a linear mapping. Then f is injective iff

f (v) = 0 ⇒ v = 0. (5)

Proof. Equation (4) tells us that (5) is a special case of the injectivity condition. So if f is
injective and f (v) = 0, then f (v) = f (0) and thus v = 0. Conversely, suppose that (5)
holds. If f (v1 ) = f (v2 ) then because f is linear,

f (v2 − v1 ) = f (v2 ) − f (v1 ) = 0,

and by hypothesis, v2 − v1 = 0. Thus, f is injective. QED

L15.3 Bases and linear mappings. We now use linear mappings to interpret the conditions
that define a basis. Suppose that v1 , . . . , vn are any n elements of a vector space V . Define
a mapping f : R1,n → V by

(a1 , . . . , an ) ↦ a1 v1 + · · · + an vn . (6)

It is easy to check that this mapping is linear. Then (B1) asserts that it is surjective, and (B2)
implies that it is injective (with the help of the Lazy Lemma).
A bijective linear mapping is also called an isomorphism, so a basis of V defines an isomor-
phism f from Rn to V . Observe from (6) that f maps each element of the canonical basis
of Rn onto an element of the chosen basis of V . If {v1 , · · · , vn } is a basis, we may use f to
identify Rn with V , and to transfer properties of Rn to V . This enables one to prove results
such as the following:
Theorem. Let V be a vector space with a basis of size n. We have
(i) if m vectors v1 , . . . , vm of V are LI then m ≤ n,
(ii) if V = L {v1 , . . . , vp } then n ≤ p.
In particular, any basis of V has n elements, and V is said to have dimension n.

Proof. We already know that this is true for V = Rn : we represent the vectors as the rows
of a matrix A of size m × n or p × n, and use the theory of the rank r(A) of A. In case (i),
linear independence forces r(A) = m, and since r(A) ≤ n we get m ≤ n. In case (ii), the
generating property forces r(A) = n, and since r(A) ≤ p we get n ≤ p.
To deduce (i) in general, let {u1 , . . . , un } be a basis of V with n elements that we know
exists by hypothesis and consider the linear mapping f : Rn → V defined as

f (a1 , . . . , an ) = a1 u1 + · · · + an un

and note that f is an isomorphism (so f −1 is defined). Assume that v1 , . . . , vm are LI in V . Then f −1 (v1 ), · · · , f −1 (vm )
are LI in Rn because
0 = a1 f −1 (v1 ) + · · · + am f −1 (vm ) = f −1 (a1 v1 + · · · + am vm )
⇒ 0 = a1 v1 + · · · + am vm ⇒ a1 = · · · = am = 0.

Hence m ≤ n. Part (ii) is similar. QED

The statement of the Theorem is represented schematically by

1 · · · · · · · · · n · · · · · · · · · ∞
LI set (size ≤ n)      basis (size n)      generating set (size ≥ n)

Lazy Corollary. Suppose that we already know that dim V = n. Then in checking whether
a set of n elements is a basis, we need only check one of (B1), (B2).

Proof. A practical way of extending an LI set to a basis is provided by the following result:
Suppose that v1 , . . . , vk are LI in a vector space V . Then v1 , . . . , vk , vk+1 are LI iff

vk+1 ∉ L {v1 , . . . , vk } .

We have already seen this in action when V = Rn , and it is valid in general.


Now, if V = L {v1 , . . . , vn } as in (B1), then the n vectors must be LI. For if not, at least
one is redundant and V is generated by n − 1 elements, impossible. Similarly, if v1 , . . . , vn
are LI as in (B2), then they must generate V . For if not, we could add vn+1 to get n + 1 LI
vectors, contradicting the Theorem. QED

L15.4 Further exercises.

1. Let D: R3 [x] → R3 [x] denote the mapping given by differentiating: D(p(x)) = p′ (x).
Show that D is linear, and determine the images of the four polynomials pi in the next
exercise. Is D injective? Is it surjective?
2. Consider the polynomials:

p1 (x) = 3 + x2 , p2 (x) = x − x2 + 2x3 , p3 (x) = 2 − x2 , p4 (x) = x − 2x2 + 3x3 .

Verify that {p1 (x), p2 (x), p3 (x), p4 (x)} is a basis of R3 [x] and express x2 as a LC of the ele-
ments of this basis.

3. Let A ∈ R3,3 and define f : R3,1 → R3,1 by f (v) = Av . Prove that f is linear (such
examples will be the subject of the next lecture), and use the theory of linear systems to
show that f is injective iff r(A) = 3.

4. Show that the cubic polynomials
$$ p_1(x) = -\tfrac{1}{6}(x-2)(x-3)(x-4), \qquad p_2(x) = \tfrac{1}{2}(x-1)(x-3)(x-4), $$
$$ p_3(x) = -\tfrac{1}{2}(x-1)(x-2)(x-4), \qquad p_4(x) = \tfrac{1}{6}(x-1)(x-2)(x-3) $$

satisfy pi (j) = δij (meaning 1 if i = j and 0 otherwise). Deduce that they are LI. Explain
carefully why this implies that they form a basis of R3 [x].
Notes 16 – Linear Mappings and Matrices
In this lecture, we turn our attention to linear mappings that may be neither surjective nor
injective. We show that once bases have been chosen, a linear map is completely determined
by a matrix. This approach provides the first real justification for the definition of matrix
multiplication that we gave in the first lecture.

L16.1 The linear mapping associated to a matrix. First, we point out that any matrix defines
a linear mapping.
Lemma. A matrix A ∈ Rm,n defines a linear mapping f : Rn → Rm by regarding elements
of Rn as column vectors and setting

f (v) = Av, v ∈ Rn,1 .

Proof. The conditions (LM1) and (LM2) are obvious consequences of the rules of matrix
algebra. QED

Our preference for column vectors means that an m × n matrix defines a mapping from Rn
to Rm , so that m, n are ‘swapped over’. Here is an example:
$$ A = \begin{pmatrix} 0 & 2 & 4 \\ 3 & 5 & 1 \end{pmatrix} \in R^{2,3} \quad \text{defines} \quad f : R^3 \to R^2 $$
with
$$ f\!\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 3 \end{pmatrix}, \qquad f\!\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 2 \\ 5 \end{pmatrix}, \qquad f\!\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 4 \\ 1 \end{pmatrix}. $$

The j th column of A represents the image of the j th element of the canonical basis of Rn,1 .
This example shows that at times we can use A and f interchangeably. But there is a subtle
difference: in applying f , we are allowed to represent elements of Rn and Rm by row
vectors. Thus it is also legitimate to write

f (1, 0, 0) = (0, 3), f (0, 1, 0) = (2, 5), f (0, 0, 1) = (4, 1),

and more memorably,

f (x1 , x2 , x3 ) = (2x2 + 4x3 , 3x1 + 5x2 + x3 ). (1)

The last equation shows us how to pass from the rows of A to the definition of f . It turns
out that any linear mapping f : Rn → Rm has a form analogous to (1), from which we can
construct the rows of an associated matrix.
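A quick numerical illustration of this section (a sketch using NumPy): the columns of A are the images of the canonical basis vectors, and formula (1) reproduces f .

```python
import numpy as np

A = np.array([[0., 2., 4.],
              [3., 5., 1.]])
f = lambda v: A @ v                    # f(v) = Av

for j, e in enumerate(np.eye(3)):      # e1, e2, e3
    print(f"f(e{j+1}) =", f(e))        # (0,3), (2,5), (4,1): the columns of A

x = np.array([1., -2., 3.])            # an arbitrary vector
assert np.allclose(f(x), [2*x[1] + 4*x[2], 3*x[0] + 5*x[1] + x[2]])   # formula (1)
```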

L16.2 The matrix associated to a linear mapping. Let V, W be vector spaces.


Lemma. Let V be a vector space with basis {v1 , . . . , vn }. A linear mapping f : V → W
is completely determined by vectors f (v1 ), . . . , f (vn ), which (in order to define f ) can be
assigned arbitrarily.
Proof. Any element v of V can be written in a unique way as a1 v1 + · · · + an vn (because
{v1 , . . . , vn } is a basis), and then (LM1) and (LM2) yield

f (v) = f (a1 v1 + · · · + an vn ) = a1 f (v1 ) + · · · + an f (vn ). (2)

Thus, f (v) is determined by the n images f (vi ) of the basis elements. Moreover, any choice
of n such vectors allows us to define a linear mapping f by means of (2). QED

We have seen that a matrix determines a linear mapping between vector spaces, namely
from Rn to Rm . The Lemma allows us to go backwards and associate a matrix to any linear
mapping from V to W , once we have chosen bases

{v1 , · · · , vn }, {w1 , · · · , wm } (3)

of V and W (we are assuming that dim V = n and dim W = m).


Definition. Let g: V → W be a linear mapping. The matrix of g with respect to the bases
(3) is the matrix B ∈ Rm,n (also written Mg or M (g) if the choice (3) is understood) whose
j th column gives the coefficients of g(vj ) in terms of w1 , . . . , wm .

If B = (bij ), then we are asserting that


$$ g(v_j) = b_{1j} w_1 + \cdots + b_{mj} w_m = \sum_{i=1}^{m} b_{ij} w_i, \qquad \text{for each } j = 1, \ldots, n. \qquad (4) $$

By the previous lemma, g is completely determined by B .


Warning: In this definition, we are really thinking of the bases (3) as ordered sets.
To recover our previous correspondence, we need to take V = Rn and W = Rm with their
canonical bases. For let A ∈ Rm,n . If f : Rn → Rm is the linear mapping defined by setting
f (v) = Av and we choose (3) to be the canonical bases, then B = Mf = A.
Example. We are perfectly at liberty to apply the Definition to the same vector space with dif-
ferent bases. Let V = W = R2 [x] . Choose the basis {1, x + 1, (x + 1)2 } for V and the basis
{1, x, x2 } for W . Let D be the linear mapping defined by differentiation: Dp = p0 . Then the
matrix of D with respect to the chosen bases is
$$ A = \begin{pmatrix} 0 & 1 & 2 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{pmatrix}; $$

the null column tells us that D(1) = 0 and the null row tells us that Im D consists of polynomials
of degree no greater than 1.
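The columns of this matrix can be computed mechanically: expand each basis vector of V in powers of x, differentiate, and read off the coefficients with respect to the basis of W . (This works so directly only because the chosen basis of W is {1, x, x2 }; for another basis of W one would first have to solve for the coefficients.) A small sketch in Python/NumPy:

```python
import numpy as np

def deriv(c):
    """Derivative of a0 + a1 x + a2 x^2, as coefficients in the basis {1, x, x^2}."""
    a0, a1, a2 = c
    return np.array([a1, 2*a2, 0.])

# Basis of V = R_2[x]: 1, x+1, (x+1)^2, expanded in powers of x
V_basis = [np.array([1., 0., 0.]),
           np.array([1., 1., 0.]),
           np.array([1., 2., 1.])]

# Column j of the matrix of D = coefficients of D(v_j) in the basis {1, x, x^2} of W
MD = np.column_stack([deriv(v) for v in V_basis])
print(MD)        # [[0. 1. 2.], [0. 0. 2.], [0. 0. 0.]], agreeing with A above
```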

Exercise. Repeat the previous example using the same bases, but w.r.t. the map G : W → V
such that Gp = p0 . Compare the two matrices.

L16.3 Compositions and products. With the link between linear mappings and matrices
now established, we shall see that composition of linear mappings corresponds to the prod-
uct of matrices. Suppose that B ∈ Rm,n , A ∈ Rn,p , and consider the associated linear
mappings
g f
Rm,1 ←− Rn,1 ←− Rp,1
defined by f (u) = Au and g(v) = Bv . (It is easier to understand what follows by writing
the mappings from right to left.) The composition g ◦ f is obviously

B Au ← Au ← u,

and is therefore associated to the matrix BA.
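A one-line numerical confirmation (a sketch with random matrices of compatible sizes): applying f and then g to a vector gives the same answer as applying the single matrix BA.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((2, 3))        # g(v) = Bv : R^3 -> R^2
A = rng.standard_normal((3, 4))        # f(u) = Au : R^4 -> R^3
u = rng.standard_normal(4)

assert np.allclose(B @ (A @ u), (B @ A) @ u)   # g(f(u)) = (BA)u
```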

More generally, given vector spaces U, V, W and linear mappings


g f
W ←− V ←− U, (5)

choose bases for each of U, V, W , and let Mf , Mg be the associated matrices (the same basis
of V being used for both matrices). Then we state without proof the
Proposition. Let h = g ◦ f be the composition (5). Then Mh = Mg Mf .

This result is especially useful in the case of a single vector space V of dimension n, and a
linear mapping f : V → V . Such a linear mapping (between the same vector space) is called
a linear transformation or endomorphism. In these circumstances, we can fix the same basis
{v1 , . . . , vn } of V , and consider compositions of f with itself:
Example. Define f : R3 → R3 by f (e1 ) = e2 , f (e2 ) = e3 and f (e3 ) = e1 . Check that the matrix
A = Mf (taken with respect to the canonical basis {e1 , e2 , e3 } ) satisfies A3 = I3 .
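For the Example just stated, the matrix can be written down and the identity A3 = I3 checked directly (a sketch; the columns of A are f (e1 ) = e2 , f (e2 ) = e3 , f (e3 ) = e1 ):

```python
import numpy as np

A = np.array([[0., 0., 1.],
              [1., 0., 0.],
              [0., 1., 0.]])           # columns: e2, e3, e1

assert np.allclose(np.linalg.matrix_power(A, 3), np.eye(3))   # A^3 = I_3
```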

L16.4 Nullity and rank. Important examples of subspaces are provided by the following
lemma:
Lemma. Let g: V → W be a linear mapping. Then
(i) g −1 (0) is a subspace of V ,
(ii) Im g is a subspace of W .

Proof. We shall only prove (ii). If w1 , w2 ∈ Im g then we may write w1 = g(v1 ) and
w2 = g(v2 ) for some v1 , v2 ∈ V . If a ∈ F then

aw1 + w2 = ag(v1 ) + g(v2 ) = g(av1 + v2 ),

and so aw1 + w2 ∈ Im g . Part (i) is similar. QED

Example. In the case of a linear mapping f : Rn → Rm defined by f (v) = Av with A ∈ Rm,n ,

f −1 (0) = {v ∈ Rn,1 : Av = 0}

is the solution space of the homogeneous linear system Ax = 0 . We already know that this is
a subspace and we labelled it Ker A. On the other hand, the image of f is generated by the
vectors f (ei ) that are the columns of A :

Im f = L {f (e1 ), . . . , f (en )} = L {c1 , . . . , cn } = Col A.

It follows that the dimension of Im f equals the rank, r(A) , the common dimension of Row A
and Col A.

In view of this key example, we state the


Definition. (i) The kernel of an arbitrary linear mapping g: V → W is the subspace g −1 (0),
more usually written Ker g or ker g . Its dimension is called the nullity of g .
(ii) The dimension of Im g is called the rank of g .

Rank-Nullity Theorem. Given an arbitrary linear mapping g: V → W ,

dim V = dim(Ker g) + dim(Im g).

This important result can also be expressed in the form

nullity(g) + rank(g) = n,

n being the dimension of the space of ‘departure’.

Proof. By choosing bases for V and W , we may effectively replace g by a linear mapping
f : Rn → Rm . But in this case, the previous example shows that Ker f = Ker A and Im f =
Col A. We know that dim(Col A) = dim(Row A) = r(A), and (essentially by (RC2)),

dim(Ker A) = n − r(A).

The result follows. QED
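The theorem can be checked on any concrete matrix by computing bases of the kernel and the column space independently. A sketch using SymPy (any computer algebra system, or reduction by hand, would do; the matrix below is an arbitrary illustrative choice):

```python
import sympy as sp

A = sp.Matrix([[1, 2, 0, 1],
               [2, 4, 1, 3],
               [3, 6, 1, 4]])          # a 3x4 matrix with a nontrivial kernel

ker_basis = A.nullspace()              # basis of Ker A
im_basis  = A.columnspace()            # basis of Col A = Im f

print(len(ker_basis), len(im_basis))                  # 2 and 2
print(len(ker_basis) + len(im_basis) == A.cols)       # True: nullity + rank = n
```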

The following result is for class discussion:


Corollary. Given a linear mapping g: V → W with dim V = n and dim W = m,

g is injective ⇔ nullity(g) = 0 ⇔ rank(g) = n


g is surjective ⇔ rank(g) = m
g is bijective ⇔ nullity(g) = 0 and rank(g) = m ⇔ rank(g) = m = n.

L16.5 Further exercises.

1. Let f : R4 → R3 be the linear mapping with

f (x1 , x2 , x3 , x4 ) = (x1 , x1 +x2 , x1 +x2 +x3 ).

Write down the associated matrix A, and find v ∈ R4 such that f (v) = (0, 1, 1).

2. Let f : R3 → R2 and g : R2 → R3 be the two mappings:

f (x, y, z) = (x + y, x + y − z), g(s, t) = (3s − t, 3t − s, s).

(i) Complete: g(f (x, y, z)) = (2x+2y+z, ? )


(ii) Find the matrices Mf , Mg , Mg◦f with respect to the canonical bases of R2 and R3 , and
check that Mg◦f = Mg Mf .

3. Let V = W = R2 [x], and let D be the linear mapping given by D(p(x)) = p0 (x). Find the
matrix MD with respect to the bases: {1, x, x2 } for V and {1, x+1, (x+1)2 } for W .
4. Let f : R3 → R3 be a linear transformation such that

Ker f = L {(1, 0, 0), (0, 1, 0)} and Im f ⊆ L {(0, 1, 0), (0, 0, 1)} .

Find all possible matrices Mf associated to f with respect to the canonical basis of R3 .

5. Find bases for the kernel and the image of each of the following linear mappings:
$$ f : R^3 \to R^{2,2}, \qquad f(x, y, z) = \begin{pmatrix} x & -y \\ y & x \end{pmatrix}, $$
$$ g : R^{2,2} \to R^4, \qquad g\!\begin{pmatrix} x & y \\ z & t \end{pmatrix} = (x - 2y,\; x - 2z,\; y + t,\; x + 2t). $$

6. Let f : R3,1 → R3,1 be the linear transformation defined by
$$ A = \begin{pmatrix} 1 & 2 & -1 \\ 1 & 1 & 3 \\ 3 & 5 & 1 \end{pmatrix}. $$
(i) Find a vector v1 such that Ker f = L {v1 }.
(ii) Choose v2 , v3 such that {v1 , v2 , v3 } is a basis of R3,1 .
(iii) Check that {f (v2 ), f (v3 )} is a basis of the subspace Im f (it always will be!).
Notes 17 – Operations on Subspaces
Subspaces of vector spaces (including Rn ) can now be conveniently defined as the kernels or
images of linear mappings between vector spaces. This leads us to discuss their properties
in more detail, and compute their dimensions.

L17.1 Intersections and unions. Let W be any finite-dimensional vector space over a field
F (or, if the student prefers, merely Rn with F = R).
Lemma. Let U, V be two subspaces of the fixed vector space W . (i) The intersection U ∩ V
is a subspace of W .
(ii) The union U ∪ V is a subspace of W if and only if U ⊆ V or V ⊆ U (in which case of
course, it equals V or U ).

Proof. (i) If w1 , w2 ∈ U ∩ V and a ∈ F then aw1 + w2 belongs to both U and V by


assumption. This is a combined verification of (S1) and (S2).
(ii) If U ∪ V is a subspace, but neither U nor V is contained in the other, then we can choose
u ∈ U \ V and v ∈ V \ U . By assumption, w = u + v ∈ U ∪ V so w belongs to either U or
V . In the former case, v = w − u ∈ U , a contradiction; in the latter case, similarly u = w − v ∈ V . QED

Re-iterating (i), the intersection of any number of subspaces is always a subspace. All sub-
spaces contain the null vector 0, so at ‘worst’ this subspace will be {0}.
As for unions, there will always exist a smallest subspace of W containing U ∪ V . Any such
subspace must certainly contain all the vectors
u + v, for any u ∈ U, v ∈ V, (1)
by property (S1) of a subspace. Indeed the set of all these simple sums is a subspace:
Definition/Lemma. Let W be a vector space. The sum of two subspaces U, V of W is the
set, denoted U + V , consisting of all the elements in (1). It is a subspace, and is contained
inside any subspace that contains U ∪ V .

Proof. We check properties (S1) and (S2). Typical elements of U + V are u1 + v1 and u2 + v2
with ui ∈ U and vi ∈ V . Their sum is
(u1 + v1 ) + (u2 + v2 ) = (u1 + u2 ) + (v1 + v2 ) ∈ U + V,
and thus (S1) holds. Similarly,
a(u1 + v1 ) = (au1 ) + (av1 ) ∈ U + V,
and thus (S2) holds. Hence, U + V is a subspace.
If X is a subspace that contains U ∪ V then it has to contain all elements u ∈ U and v ∈ V ,
and therefore all elements u + v ∈ U + V . It therefore contains U + V . QED

One also says that U + V is the subspace generated by U and V . This actually gives a clearer
idea of its definition.
In practice, U + V contains any LC of elements drawn from U and V , because such a LC
can always be re-arranged into the form (1). We can also think of U + V as the intersection
of all (typically infinitely many) subspaces containing both U and V .
Example. Consider the subspaces U = L {e1 , e2 } and V = L {e3 , e4 } of R4 . Then U + V = R4
because any vector in R4 can be expressed as the sum

$$ \underbrace{a_1 e_1 + a_2 e_2}_{u \,\in\, U} \;+\; \underbrace{a_3 e_3 + a_4 e_4}_{v \,\in\, V} \;=\; u + v. $$

This situation is somewhat special, as it is also the case that U ∩ V = {0} . Lots of similar
examples (of sums of two subspaces with zero intersection) can be constructed in any vector
space, once one has a basis to play with.

L17.2 Visualizing subspaces in R2 and R3 . We can represent the vector space R2 by points
of the plane, in which the null vector 0 corresponds to the origin. An arbitrary vector in
R2 is represented by the tip of the arrow it defines, placed at the origin. In this way, it is
obvious that any subspace U of R2 is either
(i) the origin itself, corresponding to the zero subspace {0},
(ii) any straight line passing through the origin,
(iii) the whole plane, corresponding to R2 .
In these cases, the dimension of U is respectively 0, 1, 2.
If U1 , U2 are two distinct subspaces of R2 , each of dimension 1, they are both represented
by lines containing the origin. One easily sees that

U1 ∩ U2 = {0} and U1 + U2 = R2 .

The last equality follows because any vector in R2 can be expressed as the sum of something
in U1 with something in U2 . (The most obvious case is that in which U1 = L {e1 } and
U2 = L {e2 } correspond to the two axes and (x, y) = xe1 + ye2 ∈ U1 + U2 .)
We can carry out a similar analysis for subspaces V in R3 , representing the latter by points
in space. In this situation, dim V = 1 again gives rise to a straight line through the origin,
but dim V = 2 gives any plane passing through the origin. If V1 , V2 are two such distinct
2-dimensional subspaces (planes through the origin), one easily sees this time that

V1 ∩ V2 = V3 and V1 + V2 = R3 ,

where V3 is a line (again containing the origin). Note that in this last case,

dim(V1 + V2 ) = dim V1 + dim V2 − dim(V1 ∩ V2 ).

We shall see that this formula holds in general.

L17.3 Dimension counting. Any subspace U is a vector space in its own right and has a
dimension: recall that this equals the number of elements inside any basis of U .
Obvious lemma. If U is a subspace of a vector space (or another subspace) V then dim U ≤
dim V , with equality iff U = V .
This is true because a basis of U can always be extended until it becomes one of V . To do
this we can use the trick that if v1 , . . . , vk are LI and vk+1 is not a LC of v1 , . . . , vk , then
v1 , . . . , vk , vk+1 are LI. In the examples above, a subspace of R2 has dimension 2 only if it is
R2 . Similarly for dimension 3 in R3 .
Much of the theory of bases and dimension was discovered by Hermann Grassmann, in-
cluding the following result dating from around 1860:
Theorem. Let U, V be two subspaces of a finite-dimensional vector space W . Then

dim U + dim V = dim(U ∩ V ) + dim(U + V ). (2)

This result is illustrated by the following example (whose method is often used as a proof).
Example. Let W = R5 . Consider the two subspaces

U = L {(x1 , x2 , x3 , x4 , x5 ) : 2x1 − x2 − x3 = 0 = x4 − 3x5 } ,


V = L {(x1 , x2 , x3 , x4 , x5 ) : x3 + x4 = 0} .

We are required to find a basis of R5 that contains both a basis of U and a basis of V . The trick
is to start by finding a basis of U ∩ V . It is easy to see that dim U = 3 and dim V = 4 ; this is
because the homogeneous linear systems have rank 2 and 1. Now, a vector v ∈ R5 belongs to
U ∩ V iff it satisfies all three equations. Since the associated matrix
$$ A = \begin{pmatrix} 2 & -1 & -1 & 0 & 0 \\ 0 & 0 & 0 & 1 & -3 \\ 0 & 0 & 1 & 1 & 0 \end{pmatrix} \;\sim\; \begin{pmatrix} 1 & -\tfrac12 & 0 & 0 & \tfrac32 \\ 0 & 0 & 1 & 0 & 3 \\ 0 & 0 & 0 & 1 & -3 \end{pmatrix} $$

has rank 3, we deduce that dim(U ∩ V ) = 5 − 3 = 2 . Indeed, we may take x2 = s and x5 = t to


be free variables and obtain (as a row) the general solution
$$ v = \left( \tfrac12 s - \tfrac32 t,\; s,\; -3t,\; 3t,\; t \right). $$

A basis of U ∩ V consists of

w1 = ( 1/2 , 1, 0, 0, 0),    w2 = (−3/2 , 0, −3, 3, 1)

(take first s = 1, t = 0 and second s = 0, t = 1 ). Extend this basis in any way to


a basis {w1 , w2 , w3 } of U , and
a basis {w1 , w2 , w4 , w5 } of V .
Then {w1 , w2 , w3 , w4 , w5 } will always be LI and thus a basis of R5 . There are lots of choices in
this example, but we could take
w3 = (0, −1, 1, 0, 0) (this works since w3 ∈ U but w3 ∉ L {w1 , w2 } ),
w4 = (0, 0, 1, −1, 0) , w5 = (0, 0, 0, 0, 1) (note that w5 ∉ L {w1 , w2 , w4 } ).

In conclusion, U + V = R5 , and the required basis is {w1 , w2 , w3 , w4 , w5 }, in which
{w1 , w2 , w3 } is a basis of U and {w1 , w2 , w4 , w5 } is a basis of V .

Fancy proof of (2). First consider the Cartesian product P = U × V consisting of ordered pairs
(u, v) with u ∈ U and v ∈ V . This can be made into a vector space using the operations

(u1 , v1 ) + (u2 , v2 ) = (u1 + u2 , v1 + v2 ),


a(u, v) = (au, av).

If u1 , . . . , um is a basis of U and v1 , . . . , vn a basis of V , it is easy to verify that

(u1 , 0), . . . , (um , 0), (0, v1 ), . . . , (0, vn )


is a basis of P . Hence, dim P = m + n (P is called the external direct sum of U and V ).
Consider the mapping
f : P → W, f (u, v) = u + v.
One easily checks that (i) f is linear, (ii) the image of f equals U + V , and (iii) the kernel of
f equals {(w, −w) : w ∈ U ∩ V }. Since the last subspace has the same dimension as U ∩ V ,
the Theorem follows from the Rank-Nullity formula: dim P = dim Ker f + dim Im f . QED

Example. Consider two subspaces U = L {u1 , u2 } , V = L {v1 , v2 } of Rn . There are two


competing ways to decide mechanically whether U = V :
(i) Super-reduce the 2 × n matrix with rows u1 , u2 . Do the same for v1 , v2 . Then U = V iff
the two super-reduced matrices are identical. (This method works because the super-reduced
version of a matrix is unique.)
(ii) Step-reduce the 4 × n matrix with rows u1 , u2 , v1 , v2 to find its rank ρ . Then U = V iff
ρ = 2 . (This works because in general ρ = dim(U + V ) , whereas U = V iff U + V = U = V .)
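Method (ii) is easy to automate (a sketch using NumPy; the vectors below are chosen for illustration, with v1 = u1 + u2 and v2 = u1 − u2 so that U = V ):

```python
import numpy as np

u1, u2 = np.array([1., 0., 2., 1.]), np.array([0., 1., 1., -1.])
v1, v2 = u1 + u2, u1 - u2                  # so V = L{v1, v2} equals U

rho = np.linalg.matrix_rank(np.array([u1, u2, v1, v2]))   # rho = dim(U + V)
print(rho, "=> U = V" if rho == 2 else "=> U != V")        # here rho = 2
```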

L17.4 Further exercises.

1. Given the subspaces

U = {(0, a, b, c) : a, b, c ∈ R}, V = {(p, q, p, r) : p, q, r ∈ R}

of R4 , find a basis of each of U , V , U ∩ V , U + V .

2. In R5 , find a basis for the intersection of the two subspaces

U1 = {(x1 , x2 , x3 , x4 , x5 ) : 2x1 − x2 − x3 = 0 = x4 − 3x5 },


U2 = {(x1 , x2 , x3 , x4 , x5 ) : 2x1 − x2 + x3 + 4x4 + 4x5 = 0}.

Is it true that U1 + U2 = R5 ?

3. Consider the following space of polynomials:

V = {p(x) ∈ R[x] : p(1) = 0}.

By the Remainder Theorem, p(x) belongs to V iff it is divisible by x − 1. What is the dimen-
sion of V ∩ Rn [x] if n " 1?

4. Consider the following subspaces of R5 :

V1 = L {(1, 3, −2, 2, 3), (1, 4, −3, 4, 2), (2, 3, −1, −2, 9)} ,
V2 = L {(1, 3, 0, 2, 1), (1, 5, −6, 6, 3), (2, 5, 3, 2, 1)} .

Determine the dimensions of V1 , V2 , V1 ∩ V2 , V1 + V2 .

5. Do the same for


W1 = {(x1 , x2 , x3 , x4 , x5 ) : 2x1 − x2 − x3 = 0 = x4 − 3x5 },
W2 = {(x1 , x2 , x3 , x4 , x5 ) : 3x1 − 3x2 − x4 = 0 = 2x1 − x2 − x3 }.

Referring to the previous exercise, find also the dimensions of V1 ∩ W1 and V2 ∩ W2 .


Notes 18 – Eigenvectors and Eigenvalues

L18.1 Eigenvectors of a linear transformation. Consider a linear mapping

f : V → V,

where V is a vector space with field of scalars F .


Definition. A nonzero element v ∈ V is called an eigenvector of f if there exists λ (possibly
0) in F such that
f (v) = λv. (1)
In these circumstances, λ is called the eigenvalue associated to v .

Warning: Since f (0) = 0, it is obvious that the null vector satisfies (1). But the null vector
does NOT count as an eigenvector; for one thing its eigenvalue λ is undetermined. On the
other hand, observe that if v is an eigenvector of f and a ≠ 0 then av is also an eigenvector
(with the same eigenvalue).
Example. Here are two extreme cases:
(i) Suppose f is the identity mapping, so that f (v) = v for all v ∈ V . This is obviously linear,
and every nonnull vector v ∈ V is an eigenvector with eigenvalue 1.
(ii) Define g(v) = 0 for every v ∈ V (the ‘null’ transformation, linear by default). Once again,
every nonnull v ∈ V is an eigenvector, but this time with common eigenvalue 0 .

More interesting examples of eigenvectors can easily be written down if there is a basis of
the vector space V at one’s disposal:
Example. (i) Take V = R2 and define a linear mapping f : R2 → R2 by

f (e1 ) = e1 , f (e2 ) = −e2 .

By this very definition, e1 and e2 are eigenvectors with eigenvalues 1 and −1 . Geometrically,
f is a reflection in the x -axis, and any reflection in the plane will have two such eigenvectors.
(ii) Suppose that V has dimension 4 and a basis {v1 , v2 , v3 , v4 } . We are at liberty to define

f (v1 ) = v2 , f (v2 ) = v3 , f (v3 ) = v4 , f (v4 ) = v1 ,

and this uniquely determines the linear mapping f : V → V for which

v1 + v2 + v3 + v4 , v1 − v2 + v3 − v4

are eigenvectors with respective eigenvalues 1 and −1 .

To show that bases are not essential for the existence of eigenvectors, here is an example in
which the vector space is not finite-dimensional:
Exercise. Let V be the vector space of functions φ: R → R that admit derivatives of all orders.
Then the mapping D: V → V given by


D(φ) = φ′ , where φ′ (x) = dφ(x)/dx ,
is linear. Find all the eigenvectors (or rather, ‘eigenfunctions’) of D .
L18.2 Eigenvectors of a square matrix. Suppose that V is a vector space of finite dimension
n with F = R (we shall only consider this case from now on). Once we have chosen a basis
of V (any one will have n elements), we know that we can treat V as if it were Rn . For this
reason, we need only study linear mappings f : Rn → Rn , any one of which is given by

f (v) = Av,

where A ∈ Rn,n is a square matrix.


Accordingly, an eigenvector of the matrix A is a nonzero column vector v such that

Av = λv. (2)

In some lucky cases, one can find such vectors by inspection.


Example. Consider the matrix
$$ A = \begin{pmatrix} -2 & 1 & 1 \\ 1 & -2 & 1 \\ 1 & 1 & -2 \end{pmatrix}. $$
It is easy to spot the eigenvector
$$ \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \qquad \text{with eigenvalue } 0. $$
It is a little less obvious, but nonetheless true, that
$$ \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}, \quad \begin{pmatrix} 0 \\ 1 \\ -1 \end{pmatrix} \quad \text{are all eigenvectors with eigenvalue } -3. $$

However, these three are not linearly independent (the second is the first plus the third). Overall,
in this example, we can find three eigenvectors that are LI. We shall see later that any square
matrix of size n × n has at most n eigenvectors that are LI.
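These claims are easy to check by hand, or numerically (a minimal sketch with NumPy):

```python
import numpy as np

A = np.array([[-2.,  1.,  1.],
              [ 1., -2.,  1.],
              [ 1.,  1., -2.]])

for v, lam in [([1, 1, 1], 0.0), ([1, -1, 0], -3.0), ([0, 1, -1], -3.0)]:
    assert np.allclose(A @ np.array(v, float), lam * np.array(v, float))   # Av = lambda*v

print(np.round(np.linalg.eigvals(A), 6))    # 0 and -3 (the latter repeated)
```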

Let A ∈ Rn,n , and let I = In be the identity matrix of the same size. Here is a useful trick:
Lemma. If v is an eigenvector of A with eigenvalue λ, then it is an eigenvector of A + aI
with eigenvalue λ + a.

Proof. Immediate, since (A + aI)v = Av + av = (λ + a)v . QED

Exercise. (i) Find two linearly independent eigenvectors of


$$ \begin{pmatrix} 1 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix}. $$

Hint: refer to an example on the previous page.


(ii) Let V be a vector space with F = R , and basis {v1 , v2 } so dim V = 2 . A linear mapping
f : V → V is defined by setting f (v1 ) = v2 and f (v2 ) = −v1 . Write down the associated matrix
Mf , and show that it has no eigenvector in R2,1 .
L18.3 The characteristic polynomial. This is a great tool for detecting eigenvalues.
Lemma. If v is an eigenvector of A with eigenvalue λ then det(A − λI) = 0. Conversely, if
det(A − λI) = 0 then there exists an eigenvector of A with eigenvalue λ.

Proof. Equation (2) can be rewritten as

(A − λI)v = 0 or equivalently v ∈ Ker(A − λI). (3)

Given a solution v ≠ 0, the nullity of the matrix A − λI will be nonzero. By (RC2) or the
Rank-Nullity Theorem, r(A − λI) < n. Thus, A − λI is not invertible and its determinant
is necessarily 0.
Conversely, det(A − λI) = 0 implies that r(A − λI) < n and Ker(A − λI) contains a nonnull
vector v . The latter satisfies Av = λv . QED

Definition/Proposition. Given A ∈ Rn,n , the function

p(x) = det(A − xI)

is called the characteristic polynomial of the square matrix A. It is indeed a polynomial of


degree n in the variable x.
To appreciate this, observe that the matrix A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} has characteristic polynomial
$$ p(x) = \det\begin{pmatrix} a - x & b \\ c & d - x \end{pmatrix} = x^2 - (a + d)x + ad - bc. \qquad (4) $$

For any square matrix, the constant term of the characteristic polynomial is given by

p(0) = det(A − 0 I) = det A.

At the other extreme, the leading term is always (−x)n = (−1)n xn . It is also easy to show
that the coefficient of xn−1 in p(x) equals (−1)n−1 tr A, where tr A denotes the sum of the
diagonal entries. Warning: Some authors (including Wikipedia) define the characteristic
polynomial to be det(xI − A); this is always monic (good!) but attempts to calculate it by
hand are usually fraught with sign errors (bad!).
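For a concrete matrix the coefficients can be obtained numerically. NumPy's np.poly follows the monic convention det(xI − A) mentioned in the Warning, so for a 2 × 2 matrix it returns [1, −(a+d), ad−bc], in agreement with (4). A small sketch:

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])

coeffs = np.poly(A)                    # coefficients of det(xI - A), highest power first
print(coeffs)                          # [ 1. -5. -2.]

a, b, c, d = A.ravel()
assert np.allclose(coeffs, [1., -(a + d), a*d - b*c])   # compare with (4)
assert np.isclose(coeffs[-1], np.linalg.det(A))         # constant term = det A (here n = 2)
```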
The statement ‘λ is an eigenvalue of A’ means that there exists v ≠ 0 satisfying (2). Any
such λ must therefore be a root of p(x), and a solution of the characteristic equation

p(x) = 0. (5)

Here is an important example:


Lemma. A is itself invertible iff 0 is not an eigenvalue of A.

Proof. A is invertible iff det A ≠ 0 iff x = 0 is not a solution of det(A − xI) = 0. QED

Exercise. Give a different proof of the Lemma, avoiding any mention of the determinant and
characteristic polynomial.
If A ∈ Rn,n , then (5) can have at most n roots, and there are at most n distinct eigenvalues.
There may be fewer, as the roots of a polynomial can be repeated, and it is also possible that
pairs of roots occur as complex numbers.
Given A, suppose that λ is a root of the characteristic polynomial, so that p(λ) = 0. From
above, we know that there must exist a nonnull column vector v satisfying (3). We can
therefore find such a v by solving the homogeneous linear system associated to A − λI . This we
shall do in the next lecture (but if you are desperate, do q. 6 below).
Example. We have not spoken much about 4×4 determinants, but these can be expanded along
a row or column into a linear combination of 3 × 3 determinants. Let
$$ A = \begin{pmatrix} x & 0 & 0 & s \\ -1 & x & 0 & r \\ 0 & -1 & x & q \\ 0 & 0 & -1 & x + p \end{pmatrix}. $$
Expanding down the first column,
$$ \det A = x \det\begin{pmatrix} x & 0 & r \\ -1 & x & q \\ 0 & -1 & x+p \end{pmatrix} + 1 \cdot \det\begin{pmatrix} 0 & 0 & s \\ -1 & x & q \\ 0 & -1 & x+p \end{pmatrix} + 0 + 0 $$
$$ = x\bigl( x( x(x+p) + q ) + r \bigr) + s = x^4 + p x^3 + q x^2 + r x + s. $$
This type of example can be used to show that any polynomial whose leading term is (−x)n is
the characteristic polynomial of some n × n matrix.

L18.4 Further exercises.


1. Every eigenvector of \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix} is a LC of \begin{pmatrix} 0 \\ 1 \\ -1 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}, \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}. True or false?
2. Given A = \begin{pmatrix} 0 & 2 & a \\ 2 & 1 & 1 \\ a & 1 & 1 \end{pmatrix} with a ∈ R, find the values of a for which r(A) = 2. Find the
eigenvalues of A when a = 2.

3. Find all the eigenvalues of the matrices
$$ \begin{pmatrix} 2 & 0 & 0 \\ 0 & 1 & 8 \\ 0 & 2 & 1 \end{pmatrix}, \qquad \begin{pmatrix} 2 & 1 & 1 \\ 1 & -2 & 3 \\ 3 & 4 & -1 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. $$
4. Try to find two LI vectors u, v that are eigenvectors of both \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix} and \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}.
5. Given A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, verify that A2 − (a+d)A + (ad−bc)I2 is the null matrix.

6. Find all the eigenvectors of the first two matrices in q. 3 by solving appropriate linear
systems (one for each eigenvalue).
Notes 19 – Eigenspaces and Multiplicities
In this lecture, we shall explain how to compute methodically all the eigenvectors associated
to a given square matrix.

L19.1 Subspaces generated by eigenvectors. Given A ∈ Rn,n , consider its characteristic


polynomial
p(x) = det(A − xI).
We know that this polynomial has degree n, and its roots are precisely the eigenvalues of
A. In particular,

• if λ is real number, then λ satisfies p(λ) = 0 iff there exists a nonnull column vector
v ∈ Rn,1 such that Av = λv .

• if λ is a complex number, then λ satisfies p(λ) = 0 iff there exists a nonnull column
vector v ∈ Cn,1 such that Av = λv .

Definition. If λ is an eigenvalue, the subspace

Eλ = ker(A − λI)

is called the eigenspace associated to λ.

Warning: Not quite all the elements of Eλ are eigenvectors, since (being a subspace) Eλ
also includes the null vector 0 that is not counted as an eigenvector.
Warning: Even if A ∈ Rn,n , it can have complex eigenvalues, e.g. A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, which has
characteristic polynomial x2 + 1 and eigenvalues i and −i. Since the eigenvalues are complex
and not real, the corresponding eigenspaces contain no eigenvector with real components.
The dimension of Eλ satisfies

n − dim Eλ = r(A − λI),

by the Rank-Nullity Theorem or (RC2).


Example. The matrix A = \begin{pmatrix} -6 & 9 \\ -4 & 7 \end{pmatrix} has characteristic polynomial

p(x) = (−6 − x)(7 − x) + 36 = x2 − x − 6 = (x + 2)(x − 3).

The roots are λ1 = −2 and λ2 = 3 , and we consider one at a time. Firstly,
$$ A - \lambda_1 I = A + 2I = \begin{pmatrix} -4 & 9 \\ -4 & 9 \end{pmatrix} \sim \begin{pmatrix} 1 & -\tfrac94 \\ 0 & 0 \end{pmatrix}. $$
As predicted (by the very fact that −2 is an eigenvalue), this matrix has rank less than 2. It is
easy to find a nonnull vector in Ker(A + 2I) , namely
$$ \begin{pmatrix} 9/4 \\ 1 \end{pmatrix} \ \text{or}\ \begin{pmatrix} 9 \\ 4 \end{pmatrix}, \qquad \text{whence} \quad E_{-2} = L\left\{ \begin{pmatrix} 9 \\ 4 \end{pmatrix} \right\}. $$
Similarly,
$$ A - \lambda_2 I = A - 3I = \begin{pmatrix} -9 & 9 \\ -4 & 4 \end{pmatrix} \sim \begin{pmatrix} 1 & -1 \\ 0 & 0 \end{pmatrix}, \qquad \text{and} \quad E_3 = L\left\{ \begin{pmatrix} 1 \\ 1 \end{pmatrix} \right\}. $$

Although the previous example only has n = 2, it illustrates an important technique: that
of selecting an eigenvector for each eigenvalue.
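The same computation can be delegated to a computer (a sketch with NumPy); each column returned by eig spans the corresponding eigenspace, up to a scalar factor:

```python
import numpy as np

A = np.array([[-6., 9.],
              [-4., 7.]])

lams, vecs = np.linalg.eig(A)
print(np.round(lams, 6))               # -2. and 3.

for lam, v in zip(lams, vecs.T):
    assert np.allclose(A @ v, lam * v)  # each column is an eigenvector
```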
The next result is fairly obvious for k = 2, and we prove it in another special case.
Proposition. Suppose that v1 , . . . , vk are eigenvectors of A associated to distinct eigenval-
ues λ1 , . . . , λk . Then {v1 , . . . , vk } is LI.

Proof. To give the general idea, we take k = 3 (which, for the conclusion to be valid, means
that the vectors lie in Rn with n ≥ 3) and λ1 = 1, λ2 = 2, λ3 = 3. Suppose that

0 = a1 v 1 + a2 v 2 + a3 v 3 . (1)

Applying A (or rather its associated linear mapping) gives

0 = a1 Av1 + a2 Av2 + a3 Av3 = a1 v1 + 2a2 v2 + 3a3 v3 ,

and subtracting (1),


0 = a2 v2 + 2a3 v3 . (2)
Applying A again,
0 = a2 Av2 + 2a3 Av3 = 2a2 v2 + 6a3 v3 .
Subtracting twice (2) gives a3 = 0, since the eigenvector v3 is necessarily nonnull. Return-
ing to (2) we get a2 = 0, and finally a1 = 0 from (1). Thus, there is no non-trivial linear
relation between the three eigenvectors, and they are LI. QED

Recall that a total of n LI vectors in the vector space Rn automatically forms a basis.
Corollary. Suppose that the characteristic polynomial of a matrix A ∈ Rn,n has n distinct
real roots. Then Rn has a basis consisting of eigenvectors of A.

We shall see in the next lecture that the Corollary’s conclusion means that A is what is called
diagonalizable: in many ways A behaves as if it were a diagonal matrix.

L19.2 Repeated roots. Greater difficulties can arise when a root of p(x) is not simple.
Definition. We define the multiplicity, written mult(λ), of a root λ of the characteristic
polynomial p(x) to be the highest power of the factor x − λ that divides p(x).

Example. We spotted some eigenvectors of the matrix


$$ A = \begin{pmatrix} -2 & 1 & 1 \\ 1 & -2 & 1 \\ 1 & 1 & -2 \end{pmatrix} $$

in the previous lecture. Its characteristic polynomial is


$$ p(x) = (-2 - x)\bigl[ (-2 - x)(-2 - x) - 1 \bigr] - \bigl( -3 - x \bigr) + \bigl( 3 + x \bigr) = -x^3 - 6x^2 - 9x = -x(x + 3)^2, $$
whence mult(0) = 1 and mult(−3) = 2 . To find the space E−3 generated by the eigenvectors
with eigenvalue −3 , consider
$$ A + 3I = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix} \sim \begin{pmatrix} 1 & 1 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}. $$

This has rank 1, and there is a 2-dimensional space


$$ E_{-3} = \left\{ \begin{pmatrix} x \\ y \\ z \end{pmatrix} : x + y + z = 0 \right\} = L\left\{ \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}, \begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix} \right\} $$

of solutions. The fact that dim E−3 = mult(−3) is not entirely a coincidence.

Theorem. Let λ be an eigenvalue of a n×n matrix A. Then

1 ≤ dim Eλ ≤ mult(λ).

Proof. The first inequality is obvious: the fact that λ is an eigenvalue means that Eλ contains
a nonzero vector v .
To prove the second inequality one needs to know more about the characteristic polynomial,
but we can justify it in the special case that the remaining roots of p(x) are real and distinct.
Let A ∈ Rn,n , m = mult(λ). Suppose (for a contradiction) that dim Eλ > m, so that we can
pick LI vectors v1 , . . . , vm+1 in Eλ , as well as LI eigenvectors w1 , . . . , wn−m , one for each
remaining eigenvalue. The resulting total of n + 1 vectors in Rn cannot be LI, so there is a
non-trivial linear relation

$$ \underbrace{a_1 v_1 + \cdots + a_{m+1} v_{m+1}}_{v} \;+\; b_1 w_1 + \cdots + b_{n-m} w_{n-m} = 0. $$

But one way or another this contradicts the Proposition: the sum v is nonnull since the w ’s
are LI, and is itself an eigenvector with eigenvalue λ different from the others. QED

Exercise. (i) Write down a diagonal matrix A ∈ R4,4 such that the characteristic polynomial of A
equals (x − 1)2 (x + 2)2 .
(ii) Verify that A has dim E1 = 2 and dim E−2 = 2 .
(iii) Let B be the matrix obtained from A by changing its entry in row 1, column 2 from 0 to 1.
Compute dim E1 and dim E−2 for B .
(iv) Find a matrix C with the same characteristic polynomial (x − 1)2 (x + 2)2 but for which
dim E1 = 1 = dim E−2 .

In the light of the Theorem, some authors call


mult(λ) the algebraic multiplicity of λ, and dim(Eλ ) the geometric multiplicity of λ.
The latter can never exceed the former.

L19.3 The 2 × 2 case. Let us analyse the possible eigenspaces of the matrix
$$ A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in R^{2,2}. $$
Its characteristic polynomial p(x) = x2 − (a+d)x + ad − bc has roots
$$ \frac{a + d \pm \sqrt{(a+d)^2 - 4(ad - bc)}}{2} \;=\; \frac{a + d \pm \sqrt{\Delta}}{2}, $$
where
∆ = (a−d)2 + 4bc.

There are three cases:


(i) If ∆ > 0, then there are distinct real eigenvalues λ1 , λ2 so that

p(x) = (x − λ1 )(x − λ2 ), λ1 +λ2 = a+d, λ1 λ2 = ad−bc.

Any associated pair of eigenvectors will form a basis of R2 , by the Corollary.


(ii) If ∆ = 0, then there is one eigenvalue λ with mult(λ) = 2. In the subcase that dim Eλ =
2, Eλ = R2 contains both e1 , e2 and
$$ A = \begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix} = \lambda I, \qquad (3) $$

which means that b = c = 0 and a = d. In the subcase dim Eλ = 1 there is no basis of


eigenvectors.
(iii) If ∆ < 0, then there are no real eigenvalues, though the theory still makes sense when
one passes to the field F = C of complex numbers (see q. 5 below).
An important special case (of (i) or (ii)) is that in which b = c, and A is symmetric: this
implies that ∆ ≥ 0 with equality iff A is given by (3). Actually, one can prove that the
eigenvalues of any real symmetric n × n matrix are always themselves real.
Example. An instance of (ii) with dim Eλ = 1 is the matrix N = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, which satisfies N 2 = 0 .
The characteristic polynomial is x2 , so 0 is a repeated eigenvalue. A direct way of seeing that
any eigenvalue must be 0 is to observe that

N v = λv ⇒ 0 = N 2 v = N (λv) = λ2 v ⇒ λ2 = 0.
Any eigenvector \begin{pmatrix} x \\ y \end{pmatrix} has to satisfy y = 0 , so E0 = L\left\{ \begin{pmatrix} 1 \\ 0 \end{pmatrix} \right\} has dimension 1. Similar
considerations apply to the matrix N + aI where a ≠ 0 .

L19.4 Further exercises.

1. For each of the following matrices, find all possible eigenvalues λ ∈ R and describe the
associated eigenspaces Eλ :
$$ \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 1 & 2 \\ 0 & -1 & 1 \\ 0 & 0 & 0 \end{pmatrix}, \qquad \begin{pmatrix} 5 & 3 & -3 \\ 0 & 1 & 0 \\ 1 & 2 & 1 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 1 & 0 & 2 \\ -1 & 3 & 0 & 0 \\ -1 & 4 & -1 & 1 \\ -1 & 4 & -1 & 0 \end{pmatrix}. $$
% '
a 2 a −1
2. Given A = & −3 5 −2 ( with a ∈ R,
−4 4 −1
(i) find the value of a for which 1 is an eigenvalue (no need to work out p(x)!).
(ii) is there a value of a for which there are two LI eigenvectors sharing the same eigenvalue?

3. Let p(x) be a polynomial of degree n, and assume that p(λ) = 0. Show that mult(λ) ≥ 2
if and only if p′ (λ) = 0.

4. Find a matrix L such that L2 = L but L is neither 0 nor I . What are the eigenvalues of
L? Does the answer depend upon your choice?

5. [Uses the complex field C.] Consider the matrix


$$ M = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}. $$

(i) Verify that M satisfies M 2 + I = 0 and has characteristic polynomial x2 + 1.


(ii) Show that M −iI and M +iI both have rank 1 (in the obvious sense in which this applies
to complex matrices), and find nonnull u, v ∈ C2 such that M u = iu and M v = −iv .
Notes 20 – Diagonalizability
For an important class of square matrices (or linear transformations of a finite-dimensional
vector space), it is possible to choose a basis of eigenvectors. We have already seen that this
is possible if the characteristic polynomial of A ∈ Rn,n has distinct real roots. We shall now
explain the significance of, and give a more general criterion for, the existence of a basis of
eigenvectors. We include a well-known application to the theory of Fibonacci numbers.

L20.1 An example in detail. Consider the following matrix:


$$ A = \begin{pmatrix} 5 & 3 & -3 \\ 0 & 1 & 0 \\ 1 & 2 & 1 \end{pmatrix}. $$
One’s first observation upon setting eyes on this matrix is that A − I has a row of 0’s; it
follows that 1 is an eigenvalue of A. Given that
det A = 5 × 1 − (−3) × 1 = 8, tr A = 5+1+1 = 7,
we may choose the roots λ1 , λ2 , λ3 of p(x) = det(A − xI) so that
λ1 = 1,    1 × λ2 × λ3 = 8,    1 + λ2 + λ3 = 7,
and hence λ2 = 2 and λ3 = 4. If required to do so, one can verify directly that
p(x) = −(x − 1)(x − 2)(x − 4).

Eigenvectors can be found by picking particular solutions of the corresponding linear sys-
tem:
$$ A - I = \begin{pmatrix} 4 & 3 & -3 \\ 0 & 0 & 0 \\ 1 & 2 & 0 \end{pmatrix} \sim \begin{pmatrix} 1 & 2 & 0 \\ 0 & 5 & 3 \\ 0 & 0 & 0 \end{pmatrix} \;\Rightarrow\; v_1 = \begin{pmatrix} -6 \\ 3 \\ -5 \end{pmatrix} $$
$$ A - 2I = \begin{pmatrix} 3 & 3 & -3 \\ 0 & -1 & 0 \\ 1 & 2 & -1 \end{pmatrix} \sim \begin{pmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix} \;\Rightarrow\; v_2 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} $$
$$ A - 4I = \begin{pmatrix} 1 & 3 & -3 \\ 0 & -3 & 0 \\ 1 & 2 & -3 \end{pmatrix} \sim \begin{pmatrix} 1 & 0 & -3 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix} \;\Rightarrow\; v_3 = \begin{pmatrix} 3 \\ 0 \\ 1 \end{pmatrix}. $$

Let f : R3 → R3 denote the linear transformation defined by f (v) = Av . Instead of using


the canonical basis e1 , e2 , e3 of R3 , we are at liberty to use the basis v1 , v2 , v3 (these three
vectors are obviously LI, though this is also a theoretical consequence of the fact that the
corresponding eigenvalues are distinct). Whilst the matrix of f with respect to the canonical
basis is A, its matrix with respect to the ‘new’ basis is
$$ D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 4 \end{pmatrix}, \qquad (1) $$
reflecting the equations
Av1 = 1v1 , Av2 = 2v2 , Av3 = 4v3 . (2)
We shall now make the relationship between the two matrices A and D more explicit.
Let P denote the matrix whose columns are the chosen eigenvectors:
$$ P = \begin{pmatrix} \uparrow & \uparrow & \uparrow \\ v_1 & v_2 & v_3 \\ \downarrow & \downarrow & \downarrow \end{pmatrix} = \begin{pmatrix} -6 & 1 & 3 \\ 3 & 0 & 0 \\ -5 & 1 & 1 \end{pmatrix} $$
(the vertical arrows emphasize the column structure of this matrix). In view of (2),
$$ AP = \begin{pmatrix} \uparrow & \uparrow & \uparrow \\ Av_1 & Av_2 & Av_3 \\ \downarrow & \downarrow & \downarrow \end{pmatrix} = \begin{pmatrix} -6 & 2 & 12 \\ 3 & 0 & 0 \\ -5 & 2 & 4 \end{pmatrix}. $$
We get exactly the same result by multiplying P by the diagonal matrix D on the right:
$$ PD = \begin{pmatrix} -6 & 1 & 3 \\ 3 & 0 & 0 \\ -5 & 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 4 \end{pmatrix} = \begin{pmatrix} -6 & 2 & 12 \\ 3 & 0 & 0 \\ -5 & 2 & 4 \end{pmatrix}. $$

In conclusion,
AP = P D.
Since the columns of P are LI, P has rank 3 and is invertible. One can therefore assert
(without the need to actually compute P −1 ) that

P −1AP = D, or A = P DP −1 .

To further make sense of these equations, we record the


Definition. (i) Two matrices A, B ∈ Rn,n are said to be similar if there exists an invertible
matrix P such that A = P BP −1 .
(ii) A matrix A ∈ Rn,n is said to be diagonalizable if it is similar to a diagonal matrix.

The property of ‘being similar to’ is an equivalence relation on the set Rn,n (refer to L8.3). The
3 × 3 matrix A of our example is similar to (1), and therefore diagonalizable.
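The relations AP = P D and P −1AP = D of the worked example are easily verified numerically (a minimal sketch with NumPy):

```python
import numpy as np

A = np.array([[5., 3., -3.],
              [0., 1.,  0.],
              [1., 2.,  1.]])
P = np.array([[-6., 1., 3.],
              [ 3., 0., 0.],
              [-5., 1., 1.]])
D = np.diag([1., 2., 4.])

assert np.allclose(A @ P, P @ D)                   # AP = PD
assert np.allclose(np.linalg.inv(P) @ A @ P, D)    # P^{-1} A P = D
```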
Warning. We have chosen to apply these definitions strictly within the field R of real num-
bers. One is free to treat A, B as elements of Cn,n and ask whether A = P BP −1 with
P ∈ Cn,n . This is an easier condition to satisfy, and leads to a more general concept of
similarity and diagonalizability, but one that we shall ignore in this course.

L20.2 A criterion for diagonalizability. We explain the construction in L20.1 in a more gen-
eral context. Suppose that A ∈ Rn,n possesses two eigenvectors v1 , v2 , with corresponding
eigenvalues λ1 , λ2 that may or may not be distinct. Consider the matrix
$$ X = \begin{pmatrix} \uparrow & \uparrow \\ v_1 & v_2 \\ \downarrow & \downarrow \end{pmatrix} \in R^{n,2} $$

whose two columns are the chosen eigenvectors. Arguing as before,


$$ AX = \begin{pmatrix} \uparrow & \uparrow \\ Av_1 & Av_2 \\ \downarrow & \downarrow \end{pmatrix} = \begin{pmatrix} \uparrow & \uparrow \\ v_1 & v_2 \\ \downarrow & \downarrow \end{pmatrix} \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} = XD, $$

where this time D is a diagonal 2 × 2 matrix.


The last equation is valid even if the two eigenvectors are identical, and we can choose more
eigenvectors so as to obtain a matrix X with more columns. But we can only invert X if its
rank is n, or equivalently if we can find a total of n LI eigenvectors. Hence the following
Lemma. A matrix A ∈ Rn,n is diagonalizable over R iff there exists a basis of Rn consisting
of eigenvectors of A.

The next result tells us exactly when this is possible.


Theorem. A matrix A ∈ Rn,n is diagonalizable over R iff all the roots of p(x) are real, and
for each repeated root λ we have

mult(λ) = dim Eλ . (3)

Proof. Each eigenvector v ∈ Rn,1 is associated to a real root λ of p(x), and we already know
that the dimension of the eigenspace Eλ is at most mult(λ). So unless (3) holds for every
eigenvalue, it is numerically impossible to find a basis of eigenvectors.
Conversely, suppose that the distinct roots of p(x) are λ1 , . . . , λk ∈ R. If k = n the result
follows from the Corollary in L19.1; otherwise set mi = mult(λi ) and suppose that (3) holds.
Pick a basis of each eigenspace Eλi , and put all these elements together to get a total of n =
m1 + · · · + mk eigenvectors v1 , . . . , vn . Any linear relation between them can be regrouped
into a linear relation

a1 w1 + · · · + ak wk = 0, with ai ∈ R and Awi = λi wi ,

in which each wi is itself a LC of v ’s if mi > 1. Since the wi ’s correspond to distinct


eigenvalues all the coefficients must vanish, and it follows that the v ’s form a basis. QED

Warning: we can work in greater generality over any field F. In particular: A matrix
A ∈ Fn,n is diagonalizable iff all the roots of p(x) are in F, and for each repeated root λ we have
mult(λ) = dim Eλ .
Exercise. Verify that the ‘anti-diagonal’ matrix
$$ B = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix} $$

has characteristic polynomial −(x − 1)2 (x + 1) , and eigenvectors


$$v_1 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \qquad v_2 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \qquad v_3 = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}.$$

Deduce that mult(1) = 2 = dim E1 , and that P −1BP = D , where


$$P = \begin{pmatrix} 0 & -1 & 1 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}, \qquad D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

L20.3 Simple endomorphism. From the previous discussion, the following definition is
now very natural.
Definition. A linear mapping f : V → V is called an endomorphism.

Definition. We say that an endomorphism f : V → V is simple if there exists a basis of V


made by eigenvectors of f .

We can now consider the question: is a given endomorphism simple or not? The answer
comes from the previous discussion and we state the result without a proof.
Proposition. Let f : V → V be an endomorphism; then the following facts are equivalent:

• f is a simple endomorphism

• there exists a basis of V made of eigenvectors of f

• there exists a basis of V such that the associated matrix to f is diagonal

• there exists a basis of V such that the associated matrix to f is diagonalizable

The key to prove this useful result is the following Lemma:


Lemma. If A, B are associated matrices to the same endomorphism f , then A and B are
similar.

Example. The endomorphism f : R3 → R3 given by f (x, y, z) = (z, y, x) has associated matrix


with respect to the canonical basis of R3
$$M(f) = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}.$$

Since we know that M (f ) is diagonalizable, we conclude that f is simple.

L20.4 An application. The sequence 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, . . . of Fibonacci
numbers is defined recursively by the initial values f0 = 0, f1 = 1 and the equation

fn+1 = fn + fn−1 .

The latter can be put into matrix form


$$\begin{pmatrix} f_n \\ f_{n+1} \end{pmatrix} = F \begin{pmatrix} f_{n-1} \\ f_n \end{pmatrix} \qquad\text{by taking}\qquad F = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}.$$

The characteristic polynomial of F is x2 − x − 1 = (x − λ1 )(x − λ2 ), where


$$\lambda_1 = \frac{1 + \sqrt{5}}{2}, \qquad \lambda_2 = \frac{1 - \sqrt{5}}{2} \qquad (4)$$
(λ1 = 1.618.. is the so-called golden ratio). It follows that F = P DP −1 , where
$$P = \begin{pmatrix} 1 & 1 \\ \lambda_1 & \lambda_2 \end{pmatrix}, \qquad D = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}, \qquad P^{-1} = \frac{1}{\lambda_2 - \lambda_1}\begin{pmatrix} \lambda_2 & -1 \\ -\lambda_1 & 1 \end{pmatrix}.$$
Powers of F are now easily computed:

F n = (P DP −1 )(P DP −1 ) · · · (P DP −1 ) = P Dn P −1 ,
since all internal pairs P −1P cancel out. It follows that
$$\begin{pmatrix} f_n \\ f_{n+1} \end{pmatrix} = F^n \begin{pmatrix} 0 \\ 1 \end{pmatrix} = P D^n P^{-1} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \frac{1}{\sqrt{5}}\, P \begin{pmatrix} \lambda_1^n & 0 \\ 0 & \lambda_2^n \end{pmatrix} \begin{pmatrix} 1 \\ -1 \end{pmatrix},$$

and we obtain the celebrated formula


$$f_n = \frac{\lambda_1^n - \lambda_2^n}{\sqrt{5}}$$

for the nth Fibonacci number in terms of (4). For large n, this is very close to $\lambda_1^n/\sqrt{5}$ (for instance, $\lambda_1^{12}/\sqrt{5} = 144.001..$). Moreover, the ratio $f_{n+1}/f_n$ tends to $\lambda_1$ as n → ∞.
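A small Python sketch comparing the recursion with the closed formula; the function names are ours and the code is only illustrative.

```python
import math

def fib_recursive(n):
    """f_0 = 0, f_1 = 1, f_{n+1} = f_n + f_{n-1}."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def fib_closed(n):
    """f_n = (lambda_1^n - lambda_2^n) / sqrt(5), rounded to the nearest integer."""
    l1 = (1 + math.sqrt(5)) / 2
    l2 = (1 - math.sqrt(5)) / 2
    return round((l1 ** n - l2 ** n) / math.sqrt(5))

print([fib_recursive(n) for n in range(13)])                       # [0, 1, 1, 2, ..., 144]
print(all(fib_recursive(n) == fib_closed(n) for n in range(30)))   # True
print(((1 + math.sqrt(5)) / 2) ** 12 / math.sqrt(5))               # about 144.001
```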

L20.5 Further exercises.

1. Which of the following matrices A is diagonalizable? Find an invertible matrix P ∈ R3,3


(if it exists) such that P −1AP is diagonal.
$$\begin{pmatrix} 2 & 1 & 2 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \quad \begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 2 \end{pmatrix}, \quad \begin{pmatrix} -2 & 3 & -3 \\ 0 & 1 & 0 \\ 1 & 1 & 2 \end{pmatrix}, \quad \begin{pmatrix} 1 & 1 & 0 \\ -1 & 3 & 0 \\ -1 & 4 & -1 \end{pmatrix}.$$

2. Given the matrices


$$A = \begin{pmatrix} 1 & 0 & 0 \\ 1 & -1 & 0 \\ 2 & 3 & 2 \end{pmatrix}, \quad B_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 2 \end{pmatrix}, \quad B_2 = \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{pmatrix},$$

find invertible matrices P1 and P2 such that P1−1AP1 = B1 and P2−1AP2 = B2 .

3. Find counterexamples to show that both the following assertions are false:
A ∈ Rn,n is diagonalizable ⇒ A is invertible;
A ∈ Rn,n is invertible ⇒ A is diagonalizable.
4. Let $A = \begin{pmatrix} -5 & 3 \\ 6 & -2 \end{pmatrix}$. Find a diagonal matrix D and a matrix P such that A = PDP⁻¹. If D = E³, check that A = (PEP⁻¹)³; hence find a matrix B ∈ R²ˏ² such that A = B³.

5. Let g: R3,3 → R3,3 denote the linear mapping defined by g(A) = A + A⊤ . Use the study
of g carried out in a previous lecture to find a basis of R3,3 consisting of eigenvectors of g .
Write down the diagonal 9 × 9 matrix representing g with respect to this basis.

6. Let c ∈ R. Prove that A ∈ Rn,n is diagonalizable if and only if A + cI is.


Notes 21 – Symmetric and Orthogonal Matrices
In this lecture, we focus attention on symmetric matrices, whose eigenvectors can be used
to construct orthogonal matrices. Determinants will then help us to distinguish those or-
thogonal matrices that define rotations.

L21.1 Orthogonal eigenvectors. Recall the definition of the dot or scalar product of two col-
umn vectors v, w ∈ Rn,1 . Without writing out their components, we can nonetheless assert
that
v · w = v⊤ w. (1)
Recall too that a matrix S is symmetric if S⊤ = S (this implies of course that it is square).
Lemma. Let v1 , v2 be eigenvectors of a symmetric matrix S corresponding to distinct eigen-
values λ1 , λ2 . Then v1 · v2 = 0.

Proof. First note that

$$(Sv_1)\cdot v_2 = (Sv_1)^\top v_2 = v_1^\top S^\top v_2 = v_1^\top (S v_2) = v_1 \cdot (S v_2).$$

This is true for any vectors v1 , v2 , but the assumptions Sv1 = λ1 v1 and Sv2 = λ2 v2 tell us
that
λ1 v 1 · v 2 = λ2 v 1 · v 2 ,
and the result follows. QED

We will see that pairwise orthogonal vectors are LI, and this motivates the following:
Definition. Let v1 , . . . , vm be a basis of the subspace V ⊆ Rn . The basis is called an or-
thonormal basis (ON basis for short ) if the following holds
(i) vi · vi = 1 for all i;

(ii) vi · vj = 0 for all i ∕= j .

Example. To begin with the 2 × 2 case, consider the symmetric matrix


$$A = \begin{pmatrix} 5 & 2 \\ 2 & 8 \end{pmatrix}.$$

It is easy to check that its eigenvalues are 9 and 4 , and that respective eigenvectors are
$$v_1 = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \qquad v_2 = \begin{pmatrix} -2 \\ 1 \end{pmatrix}.$$

As predicted by the Lemma, v1 ·v2 = 0 . Given this fact, we can normalize v1 , v2 to manufacture
an orthonormal basis
$$f_1 = \tfrac{1}{\sqrt{5}}\begin{pmatrix} 1 \\ 2 \end{pmatrix}, \qquad f_2 = \tfrac{1}{\sqrt{5}}\begin{pmatrix} -2 \\ 1 \end{pmatrix}$$
of eigenvectors, and use these to define the matrix
$$P = \tfrac{1}{\sqrt{5}}\begin{pmatrix} 1 & -2 \\ 2 & 1 \end{pmatrix}.$$
With this choice,
$$P^\top P = \tfrac{1}{5}\begin{pmatrix} 1 & 2 \\ -2 & 1 \end{pmatrix}\begin{pmatrix} 1 & -2 \\ 2 & 1 \end{pmatrix} = \tfrac{1}{5}\begin{pmatrix} 5 & 0 \\ 0 & 5 \end{pmatrix} = I_2. \qquad (2)$$
Another way of expressing this relationship is
$$P^{-1} = \tfrac{1}{\sqrt{5}}\begin{pmatrix} 1 & 2 \\ -2 & 1 \end{pmatrix} = P^\top,$$

and also P P ⊤ = I . It is easy to verify that


! "! "! " ! "
1 2 5 2 1 −2 9 0
P −1AP = 15 = .
−2 1 2 8 2 1 0 4

Definition. A matrix P ∈ Rn,n is called orthogonal if it satisfies one of the equivalent


conditions: (i) P ⊤P = In , (ii) P P ⊤ = In , (iii) P is invertible and P −1 = P ⊤ .

Let us explain why the three conditions are indeed equivalent. As (1) and (2) make clear,
condition (i) asserts that the columns of P are orthonormal. Condition (ii) asserts that the
rows are orthonormal. A set {v1 , . . . , vn } of orthonormal vectors is necessarily LI since

a1 v 1 + · · · + an v n = 0

implies (by taking the dot product with each vi in turn) that ai = 0; thus either (i) or (ii)
implies that P is invertible. It follows that both (i) and (ii) are equivalent to (iii).
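As a numerical illustration, here is a hedged NumPy check of conditions (i)-(iii) on the matrix P of the example above; the code is ours and only illustrative.

```python
import numpy as np

# P and A from the 2x2 example above.
P = (1 / np.sqrt(5)) * np.array([[1., -2.],
                                 [2.,  1.]])
A = np.array([[5., 2.],
              [2., 8.]])

print(np.allclose(P.T @ P, np.eye(2)))       # (i)   columns are orthonormal
print(np.allclose(P @ P.T, np.eye(2)))       # (ii)  rows are orthonormal
print(np.allclose(np.linalg.inv(P), P.T))    # (iii) P^{-1} = P^T
print(P.T @ A @ P)                           # approximately diag(9, 4)
```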
The relationship between symmetric and orthogonal matrices is cemented by the
Theorem. Let S ∈ Rn,n be a symmetric matrix. Then
(i) the eigenvalues (or roots of the characteristic polynomial p(x)) of S are all real.
(ii) there exists an orthogonal matrix P such that P⁻¹SP = P⊤SP = D is diagonal.

Proof. (i) Suppose that λ ∈ C is a root of p(x). Working over the field C, we can assert that
there exists a complex eigenvector v ∈ Cn,1 satisfying Sv = λv . If v⊤ = (z1 , . . . , zn ) then
the complex conjugate of this vector is $\bar v^\top = (\bar z_1, \ldots, \bar z_n)$ and
$$\bar v^\top v = |z_1|^2 + \cdots + |z_n|^2 > 0,$$
since v ≠ 0. Thus
$$\lambda\, \bar v^\top v = \bar v^\top (Sv) = (S\bar v)^\top v = \bar\lambda\, \bar v^\top v,$$
and necessarily $\lambda = \bar\lambda$ and λ ∈ R.
In the light of (i), part (ii) follows immediately if all the roots of p(x) are distinct. For each
repeated root λ, one needs to know that mult(λ) = dim Eλ ; for if this is true the Lemma
permits us to build up an orthonormal basis of eigenvectors. We shall not prove the mul-
tiplicity statement (that is always true for a symmetric matrix), but a convincing exercise
follows. QED
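In practice such a P can be computed numerically. The sketch below uses NumPy's eigh routine, which is designed for symmetric matrices and returns an orthonormal set of eigenvectors; the random test matrix is our own illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
S = (M + M.T) / 2                       # an arbitrary symmetric matrix

eigvals, Q = np.linalg.eigh(S)          # columns of Q are orthonormal eigenvectors
print(np.allclose(Q.T @ Q, np.eye(4)))              # Q is orthogonal
print(np.allclose(Q.T @ S @ Q, np.diag(eigvals)))   # Q^T S Q = D
print(np.isrealobj(eigvals))                        # the eigenvalues are real
```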

Exercise. Consider again the symmetric matrix


$$A = \begin{pmatrix} -2 & 1 & 1 \\ 1 & -2 & 1 \\ 1 & 1 & -2 \end{pmatrix},$$
and its eigenvectors
$$v_1 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \qquad v_2 = \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}, \qquad v_3 = \begin{pmatrix} 1 \\ 0 \\ -1 \end{pmatrix}$$

(found in L20.2), with respective eigenvalues 0 (multiplicity 1) and −3 (multiplicity 2). As pre-
dicted by the Lemma, v1 · v2 = 0 = v1 · v3 . Observe however that v2 · v3 ∕= 0 ; show nonetheless
that there exists an eigenvector v3′ with eigenvalue −3 such that v2 · v3′ = 0 . Normalize the
vectors v1 , v2 , v3′ so as to obtain an orthogonal matrix P for which P −1AP is diagonal. Compute
the determinant of P ; can the latter be chosen so that det P = 1 ?

We conclude with a result explaining why we like symmetric real matrices so much.
Corollary. If S ∈ Rn,n is symmetric, then S is diagonalizable.
Exercise. Prove that the symmetric, non-real matrix $S = \begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix}$ is not diagonalizable over R, but is diagonalizable over C.
Exercise. Prove that the symmetric, non-real matrix $S = \begin{pmatrix} 1 & i \\ i & -1 \end{pmatrix}$ is not diagonalizable over R and is not diagonalizable over C.

L21.2 Rotations in the plane. We proceed to classify 2 × 2 orthogonal matrices. Suppose


that
$$P = \begin{pmatrix} p & q \\ r & s \end{pmatrix}$$
is orthogonal, so that both columns are mutually orthogonal unit vectors. The only two unit vectors orthogonal to $\begin{pmatrix} p \\ r \end{pmatrix}$ are $\begin{pmatrix} -r \\ p \end{pmatrix}$ and $\begin{pmatrix} r \\ -p \end{pmatrix}$, so one of these must equal $\begin{pmatrix} q \\ s \end{pmatrix}$. This gives us two possibilities:
$$P = \begin{pmatrix} p & -r \\ r & p \end{pmatrix} \qquad\text{or}\qquad P = \begin{pmatrix} p & r \\ r & -p \end{pmatrix}.$$

Since p2 + r2 = 1, the first matrix has determinant 1 and the second −1.
Let us focus our attention on the first case. There exists a unique angle θ , 0 ≤ θ < 2π such
that cos θ = p and sin θ = r . We denote the resulting matrix P by
$$R_\theta = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}, \qquad (3)$$

to emphasize that it is a function of θ . Observe that the vector


$$R_\theta \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x\cos\theta - y\sin\theta \\ x\sin\theta + y\cos\theta \end{pmatrix}$$
is the image of the vector $\begin{pmatrix} x \\ y \end{pmatrix}$ under a rotation about the origin by an angle θ. To summarize:
Proposition. Any 2 × 2 orthogonal matrix with determinant 1 has the form (3), and repre-
sents an anti-clockwise rotation in R2 by an angle θ with centre the origin.
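A brief Python sketch of the rotation matrix (3); the function name is ours and the checks are only illustrative.

```python
import numpy as np

def R(theta):
    """Rotation of the plane by an angle theta about the origin, as in (3)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

P = R(np.pi / 2)                                         # quarter turn anti-clockwise
print(np.allclose(P @ np.array([1., 0.]), [0., 1.]))     # e1 is sent to e2
print(np.isclose(np.linalg.det(P), 1.0))                 # determinant 1
print(np.allclose(R(0.3) @ R(0.5), R(0.8)))              # composing rotations adds the angles
```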
L21.3 Properties of orthogonal matrices. Recall that (AB)⊤ = B⊤A⊤ is always true.
Lemma. If A, B are orthogonal matrices of the same size, then A−1 and AB are also or-
thogonal.

Proof. Suppose that A⊤A = I and B⊤B = I . Then A⊤ = A−1 ; this implies that

(A−1 )⊤ = A, and so (A−1 )⊤A−1 = I,

so A−1 is orthogonal. Moreover,

(AB)⊤ (AB) = B⊤A⊤AB = B⊤ IB = B⊤B = I,

as required. QED

In order to compute the determinant of an orthogonal matrix, we use Binet’s Theorem.


Suppose that P ⊤P = I . Then

1 = det I = det(P ⊤ P ) = det(P ⊤ ) det P = (det P )2 ,

since det(P ⊤ ) = det P .


Corollary. Any orthogonal matrix has determinant equal to 1 or −1.

Example. Suppose that P is an orthogonal matrix with det P = 1 . Then


$$\det(P - I) = \det(P^\top)\det(P - I) = \det\big(P^\top(P - I)\big) = \det\big(P^\top P - P^\top\big) = \det(I - P^\top) = \det\big((I - P)^\top\big) = \det(I - P) = (-1)^n \det(P - I).$$

If n is odd, it follows that det(P−I) = 0 , so 1 is a root of the characteristic polynomial. Therefore


1 is an eigenvalue of P , and there exists v ∈ R3,1 such that P v = v (and v ∕= 0 ).

This example can be used to show that any 3 × 3 orthogonal matrix P with det P = 1
represents a rotation of R3 about an axis passing through the origin. For given such a
rotation, one can choose an orthonormal basis {v1 , v2 , v3 } of R3 such that v3 points in the
direction of the axis of rotation. It then follows (from an understanding of what is meant by
a rotation of a rigid body in space, and referring to (3)) that the rotation is described by a
linear mapping f : R3 → R3 whose matrix with respect to the basis is
$$M_\theta = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

For example, v ↦ Mθ v is itself a rotation about the z-axis.
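Conversely, given a 3 × 3 orthogonal matrix with determinant 1, the axis of the rotation can be read off as an eigenvector for the eigenvalue 1, as in the Example above. Here is a hedged NumPy sketch; the particular rotation used is an arbitrary choice of ours.

```python
import numpy as np

# An illustrative rotation: pi/3 about the z-axis followed by pi/4 about the x-axis.
def Rz(t):
    return np.array([[np.cos(t), -np.sin(t), 0], [np.sin(t), np.cos(t), 0], [0, 0, 1]])

def Rx(t):
    return np.array([[1, 0, 0], [0, np.cos(t), -np.sin(t)], [0, np.sin(t), np.cos(t)]])

P = Rx(np.pi / 4) @ Rz(np.pi / 3)
print(np.isclose(np.linalg.det(P), 1.0))      # orthogonal with determinant 1

# The axis is the (real) eigenvector with eigenvalue 1.
w, V = np.linalg.eig(P)
axis = np.real(V[:, np.argmin(np.abs(w - 1))])
print(np.allclose(P @ axis, axis))            # the axis is fixed by the rotation
```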

L21.4 Further exercises.

1. For which values of θ does the rotation matrix (3) have a real eigenvalue?
2. Show that if an n × n matrix S is both symmetric and orthogonal then S 2 = I . Deduce
that the eigenvalues of S are 1 or −1.

3. An isometry of Rn is any mapping f : Rn → Rn such that |f (v) − f (w)| = |v − w| for all


v, w ∈ Rn . Show that such a mapping is necessarily injective. Now suppose that v0 ∈ Rn,1
is a fixed vector and that P ∈ Rn,n is an orthogonal matrix. Set g(v) = v0 + P v . Verify that
g is an isometry; is it surjective?
4. [Uses the complex field C.] Find a matrix Y ∈ C²ˏ² for which $\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} = Y^{-1}\begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix} Y$.
Notes 22 – The Gram-Schmidt algorithm and the Cayley-
Hamilton theorem
In this lecture, we introduce two important results having both theoretical and practical relevance. The Gram-Schmidt algorithm allows us to find orthonormal bases of vector spaces (with a scalar product). The Cayley-Hamilton theorem allows us to simplify polynomial computations with matrices, in particular the computation of powers and of the inverse.

L22.1 Gram-Schmidt algorithm.


We restrict our attention to the real vector space Rn with the usual dot product defined by
matrix (vector) multiplication, but our analysis can be repeated in more general situations and even in the case of vector spaces that are not finitely generated.
We begin by noticing that orthonormal bases are particularly liked since they enjoy many of
the features of canonical bases:
Lemma. Let {v1 , . . . , vn } be an orthonormal basis of Rn and let u ∈ Rn . Then

u = u1 v1 + . . . + un vn

where ui = u · vi for i = 1, . . . , n.

Exercise. Prove the Lemma!

Example. The previous lemma, applied to canonical bases, just states the very well-known fact:
when we write the vector v = (1, 2, 3, . . . , n) ∈ Rn we really mean v = 1e1 + 2e2 + . . . + nen .

Exercise. Can you find an orthonormal basis of R3 containing two given orthogonal unit vectors
u, v ? (Hint: note that e3 = e1 × e2 is a unit vector)

Since orthonormal bases are so handy, we need a way to produce them at will, and in order
to do so we will use projections.
Definition. Let u, v ∈ Rⁿ with v ≠ 0. The orthogonal projection of u on v is
$$\mathrm{pr}_v(u) = \frac{u\cdot v}{\|v\|^2}\, v,$$
where $\|v\| = \sqrt{v\cdot v}$ is the magnitude, or norm, of v.

Example. If u, v and w in R2 are the three sides of a right triangle, that is v + w = u and
v ⊥ w , then v = prv (u) . Note that
u − prv (u) = w ⊥ v

and we will see that this property holds in general, not only in the plane.

Proposition. Let {v₁, ..., vₙ, u} be a basis of V such that vᵢ · vᵢ = 1 for all i, and vᵢ · vⱼ = 0 for i ≠ j. Then
{v₁, ..., vₙ, v}
is an orthonormal basis of V, where $v = \dfrac{u'}{\|u'\|}$ and
$$u' = u - \mathrm{pr}_{v_1}(u) - \mathrm{pr}_{v_2}(u) - \ldots - \mathrm{pr}_{v_n}(u).$$


Proof. We need to check that the set {v₁, ..., vₙ, v} has properties (B1), (B2) and that its elements are unit vectors orthogonal to each other.
Since, for all i, pr_{vᵢ}(u) is a multiple of vᵢ, and since v is a (non-zero) multiple of u′, we have that
V = L{v₁, ..., vₙ, v} = L{v₁, ..., vₙ, u},
thus property (B1) holds.
To see that property (B2) holds note that dim V = n + 1 and apply the Lazy Corollary.
By construction v is a unit vector, and thus we only have to check that v · vᵢ = 0 for i = 1, ..., n; to do this it is enough to check that
$$u' \cdot v_i = 0$$
for i = 1, ..., n. We note that
$$\mathrm{pr}_{v_j}(u) \cdot v_i = 0$$
for j ≠ i, since vⱼ · vᵢ = 0. Then we compute
$$u' \cdot v_i = u\cdot v_i - \mathrm{pr}_{v_i}(u)\cdot v_i = u\cdot v_i - \frac{u\cdot v_i}{\|v_i\|^2}\, v_i\cdot v_i = 0,$$
and the result is proved. QED

Gram-Schmidt algorithm is just the iteration of the previous proposition which we will illus-
trate with an example.
Example. Find an orthonormal basis of

V = L{u1 = (1, 1, 0, 0), u2 = (1, 0, 1, 0), u3 = (0, 0, 0, 1)}.

We begin by finding an orthonormal basis of the subspace V1 = L{u1 } and this is done by
normalizing. We define
$$v_1 = \frac{u_1}{\|u_1\|} = \frac{\sqrt{2}}{2}(1, 1, 0, 0),$$
and {v1 } is an orthonormal basis of V1 . Now we consider the subspace V2 = L{v1 , u2 } and
we apply the proposition, that is we change the second vector in order to obtain an orthonormal
basis of V2 :

$$u_2' = u_2 - \mathrm{pr}_{v_1}(u_2) = (1, 0, 1, 0) - \tfrac{1}{2}\big[(1, 1, 0, 0)\cdot(1, 0, 1, 0)\big](1, 1, 0, 0) = (1, 0, 1, 0) - \tfrac{1}{2}(1, 1, 0, 0) = \big(\tfrac12, -\tfrac12, 1, 0\big).$$
Note that $u_2' \perp v_1$ as desired, but $u_2'$ is not a unit vector, so we need to normalize again:
$$v_2 = \frac{u_2'}{\|u_2'\|} = \frac{\sqrt{6}}{3}\big(\tfrac12, -\tfrac12, 1, 0\big) = \frac{\sqrt{6}}{6}(1, -1, 2, 0).$$
In conclusion {v1 , v2 } is an orthonormal basis of V2 .
We now need to consider V itself and we know that V = L{v1 , v2 , u3 }. To apply the
proposition we define
$$u_3' = u_3 - \mathrm{pr}_{v_1}(u_3) - \mathrm{pr}_{v_2}(u_3),$$
and we compute. Since u₃ · v₁ = u₃ · v₂ = 0 we get u₃′ = u₃, and since it is already a unit vector, $v_3 = u_3'/\|u_3'\| = u_3'$. In conclusion we have that

{v1 , v2 , v3 }

is an orthonormal basis of V .
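The whole procedure is easy to code. Here is a hedged NumPy sketch of the algorithm, run on the same three vectors; the function name is ours.

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors, as in the proposition above."""
    basis = []
    for u in vectors:
        u_prime = u - sum((u @ v) * v for v in basis)    # subtract the projections pr_v(u)
        basis.append(u_prime / np.linalg.norm(u_prime))  # normalize
    return basis

u1, u2, u3 = np.array([1., 1., 0., 0.]), np.array([1., 0., 1., 0.]), np.array([0., 0., 0., 1.])
v1, v2, v3 = gram_schmidt([u1, u2, u3])
print(v1, v2, v3)
print(np.allclose([v1 @ v2, v1 @ v3, v2 @ v3], 0))   # pairwise orthogonal
print(np.allclose([v1 @ v1, v2 @ v2, v3 @ v3], 1))   # unit vectors
```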
Exercise. Reorder the vectors v1 , v2 and v3 in the previous example and repeat the process. Do
you get the same orthonormal basis?

L22.2 Cayley-Hamilton theorem. We already noticed that matrix multiplication is much eas-
ier when we consider diagonal matrices. For example, if D is a diagonal matrix, then Dn is
obtained just by taking the n-th powers of the diagonal elements of D . In general, comput-
ing powers of diagonalizable matrices is not difficult.
Lemma. If A = P DP −1 where D is diagonal, then An = P Dn P −1 .

Exercise. Prove the Lemma!

What can we do to work efficiently with the powers of any matrix? Cayley-Hamilton Theo-
rem can help us.
Cayley-Hamilton Theorem. Let A be a square matrix with characteristic polynomial p(x) =
a0 + a1 x + . . . + (−1)n xn . Then

p(A) = a0 In + a1 A + . . . + (−1)n An = 0.

In words, ’A satisfies its characteristic equation’.


Proof. We prove the theorem only in the case in which A is a diagonalizable matrix, that is
A = P DP −1 where D = (dij ) is a diagonal matrix. Note that

det(A − xI) = det(P DP −1 − xP P −1 ) = det(P (D − xI)P −1 ) = det(D − xI),

and thus A and D have the same characteristic polynomial p(x) = (d11 − x) · . . . · (dnn − x).
Now we compute p(A) using the last equality

p(A) = (d11 I − A) · . . . · (dnn I − A) = (d11 I − P DP −1 ) · . . . · (dnn I − P DP −1 )

= (d11 P P −1 − P DP −1 ) · . . . · (dnn P P −1 − P DP −1 )
= P (d11 I − D) · . . . · (dnn I − D)P −1 .
Note that the matrix dᵢᵢI − D is a diagonal matrix whose i-th diagonal entry is zero; the product of these diagonal matrices therefore has every diagonal entry equal to zero, whence p(A) = 0. QED

Cayley-Hamilton Theorem has many deep consequences, but we will only focus on two
algorithmic aspects: the computation of the inverse and the computation of powers, which
we will show in two (easily generalized) examples.
About the inverse:
 
Example. Consider the matrix $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$, having characteristic polynomial p(x) = x² − 5x − 2. Since det A ≠ 0 the matrix is invertible and, because of the Cayley-Hamilton Theorem, we have p(A) = −2I − 5A + A² = 0, thus
$$I = \tfrac{1}{2}(A^2 - 5A).$$
Multiplying the previous equality by A⁻¹ we get
$$A^{-1} = \tfrac{1}{2}(A - 5I).$$
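A quick numerical check of this computation (a sketch, assuming NumPy):

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])
I = np.eye(2)

A_inv = (A - 5 * I) / 2                        # from the Cayley-Hamilton relation above
print(np.allclose(A_inv, np.linalg.inv(A)))    # True
print(np.allclose(A @ A - 5 * A - 2 * I, 0))   # p(A) = A^2 - 5A - 2I = 0
```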

About powers:
 
Example. Consider the matrix $A = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}$, having characteristic polynomial p(x) = x² − 2x + 1. We want to compute A³. Note that A is not diagonalizable (why?). By the Cayley-Hamilton Theorem we know that A² = 2A − I and thus
$$A^3 = A(A^2) = A(2A - I) = 2A^2 - A = 2(2A - I) - A = 3A - 2I.$$
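And a corresponding hedged check for the powers example:

```python
import numpy as np

A = np.array([[1., 2.],
              [0., 1.]])
I = np.eye(2)

print(np.allclose(A @ A, 2 * A - I))                              # A^2 = 2A - I
print(np.allclose(np.linalg.matrix_power(A, 3), 3 * A - 2 * I))   # A^3 = 3A - 2I
```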

Warning: In general, the use of the Cayley-Hamilton Theorem to compute powers requires the use of polynomial division.

L22.3 Further exercises.

1. Find an orthonormal basis for V = L{(1, 1, 0), (0, 1, 1)} and for W = L{(1, 1, 0), (0, 1, 1), (1, 0, 1)}.
Can you make the second computation really quickly?
 
2. Let $A = \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 \end{pmatrix}$ and find orthonormal bases for Row A and Ker A. Can one use these to find an orthonormal basis of R⁴?

3. Let A be a 3×3 matrix with eigenvalues 1, 2, 3. Find A−1 in terms of A.

4. Let A be a 2×2 matrix with eigenvalue −1 of multiplicity 2. Find An in terms of A for


n = 2, 3, 4, 5.
Notes 24 – Conics and Quadrics
In the previous lecture, we discussed conics defined by setting a quadratic form equal to
a constant. After defining a general conic, we list the eight different types and investigate
when a conic has a centre of symmetry. We then move on to discuss quadrics from a similar
point of view.

L24.1 Definition of a conic. A conic C is the set of points (x, y) in R2 determined by an


equation of the form

Ax2 + 2Bxy + Cy 2 + 2Dx + 2Ey + F = 0, (1)

where A, . . . , F are real constants, and A, B, C are not all zero so that the left-hand side is a
polynomial of degree 2 in the two variables x, y . (The 2’s are for convenience later.)
Often we shall refer to an equation like (1) as a conic, though strictly speaking the latter is a
set of points. The conics we discussed before were those for which D = E = 0. In this case,
by diagonalizing the symmetric matrix
 
$$S = \begin{pmatrix} A & B \\ B & C \end{pmatrix}, \qquad (2)$$

we can always rotate the coordinate system so that the equation of C becomes

λ1 X 2 + λ2 Y 2 = µ, µ = −F.

It follows that C is one of


(i) an ellipse (if λ1 , λ2 , µ all have the same sign); a circle is the special case in which λ1 = λ2 ;
(ii) a hyperbola (if λ1 , λ2 have opposite signs and µ 6= 0); the corresponding special case
λ1 = −λ2 gives rise to a rectangular hyperbola whose asymptotes are perpendicular;
(iii) two straight lines interecting in one point (if λ1 , λ2 have opposite signs but µ = 0);
(iv) two parallel lines (if one of λ1 , λ2 is zero and the other has the same sign as µ);
(v) a single line (if one of λ1 , λ2 is zero and µ = 0, for then the equation actually defines
two coincident lines, though only one is visible to the naked eye);
(vi) a point (if λ1 , λ2 have the same sign but µ = 0);
(vii) in all other cases, the set of points satisfying (1) is empty (over R).
Allowing D, E to be nonzero produces only one other type, namely
(viii) a parabola (such as x2 + y = 0, or less obviously 4x2 + 12xy + 9y 2 + x = 0).
Warning: it is sometimes convenient to speak of the type of a conic: we say that a conic is of elliptic type if the eigenvalues have the same sign; of hyperbolic type if the eigenvalues have opposite signs; and of parabolic type if zero is an eigenvalue.
Warning: the conics in (iii), (iv), (v), and (vi) are called degenerate, or singular, conics. Degen-
erate conics involve lines: this is clear for the first three cases, but not for the last one. For
case (vi) it is enough to consider two complex lines intersecting in their unique real point: in
this way all singular conics are conics which split into lines.
Warning: in (vii) we mean that the solution set of (1) is empty in R2 , however it is not empty
in C2 . For example, the conic x2 + y 2 = −1 is a nice complex circle without real points. Also
note that the degenerate ellipse in (vi) corresponds to a pair of complex lines with just one real point, e.g. x² + y² = 0 can be written as (x − iy)(x + iy) = 0 and we only see the real point (0, 0).

L24.2 Study of a conic: finding the center. Given the general equation (1), we can try first
to eliminate the term 2Dx + 2Ey of degree 1 by a change of coordinates of type

x = X + u,
(3)
y = Y + v.
This corresponds to a translation in which the new system OXY has its origin O at the old
point (x, y) = (u, v). Substituting (3) into (1), we see that the new term of degree 1 is
2AuX + 2B(Xv+uY ) + 2CvY + 2DX + 2EY = (2Au+2Bv+2D)X + (2Bu+2Cv+2E)Y.
To eliminate all this, we need to solve the linear system of equations with unknowns u, v
and augmented matrix
$$\begin{pmatrix} A & B & -D \\ B & C & -E \end{pmatrix}. \qquad (4)$$
Since the left-hand side of this matrix is the matrix S of (2), a solution might not exist if det S = 0.
Definition. The conic C is central if there is a translation (3) that converts its equation into
the form A′X² + 2B′XY + C′Y² + F′ = 0. In this case, (X, Y) ∈ C ⇔ (−X, −Y) ∈ C, and
the centre of symmetry is the point (X, Y ) = (0, 0) or (x, y) = (u, v).

From the analysis above, we know that there is only one case in which (4) is incompatible
and C is not central, namely (viii).
Corollary. The conic (1) is a parabola only if B 2 = AC .

Example. Given the conic x2 + 4y 2 − 6x + 8y = 3 , we can locate a centre by completing the


squares:
(x − 3)2 − 9 + 4(y + 1)2 − 4 = 3.
Thus u = 3, v = −1 , and the equation becomes
$$X^2 + 4Y^2 = 16, \qquad\text{or}\qquad \frac{X^2}{4^2} + \frac{Y^2}{2^2} = 1,$$
which is an ellipse with width twice its height. In general, if the original equation has a term in xy, one needs to find the centre by solving (4).
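When an xy term is present, system (4) can be solved directly. The following sketch does this for the conic 3y² − 4xy − 4x = 0 that appears in the exercises below; in the notation of (1) this has A = 0, B = −2, C = 3, D = −2, E = 0, and the code is only illustrative.

```python
import numpy as np

# Solve system (4) for the conic 3y^2 - 4xy - 4x = 0.
A, B, C, D, E = 0., -2., 3., -2., 0.

S = np.array([[A, B],
              [B, C]])
u, v = np.linalg.solve(S, [-D, -E])   # valid only when det S != 0 (the conic is central)
print(u, v)                           # centre (-1.5, -1.0)
```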

L24.3 Study of a conic: using matrices. One can carry out a qualitative study of a conic by means of two symmetric matrices. Given C as in (1) we consider the 2 × 2 matrix
 
$$A_{22} = \begin{pmatrix} A & B \\ B & C \end{pmatrix}, \qquad (5)$$

and the 3 × 3 matrix
$$A_{33} = \begin{pmatrix} A & B & D \\ B & C & E \\ D & E & F \end{pmatrix}. \qquad (6)$$
In particular, we have that the conic C is
• of elliptic type if and only if A22 is positive or negative definite

• of hyperbolic type if and only if A22 is indefinite

• of parabolic type if and only if the determinant of A22 is zero.

• degenerate if and only if the determinant of A33 is zero

Example. The conic C : x2 + 2xy + y 2 + x + y = 0 is such that


 
$$A_{22} = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, \qquad (7)$$

and
$$A_{33} = \begin{pmatrix} 1 & 1 & 1/2 \\ 1 & 1 & 1/2 \\ 1/2 & 1/2 & 0 \end{pmatrix}. \qquad (8)$$
Since |A22 | = 0 then the conic is of parabolic type and since |A33 | = 0 the conic is degenerate.
Thus C is a pair of parallel lines, either distinct or coincident; note that the real point
(0, 0) ∈ C . To decide in which situation we are, we intersect with a random line: if the inter-
section is empty we pick a new line, if not we will find either two distinct points or one double
point. Let’s pick the line l : x = 0. Thus l ∩ C is the solution set of y 2 + y = y(y + 1) = 0, that is
a set of two distinct points. Hence C is the union of two parallel distinct lines.
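The same test can be carried out numerically; a short hedged sketch for this conic:

```python
import numpy as np

# C : x^2 + 2xy + y^2 + x + y = 0, so A=1, B=1, C=1, D=1/2, E=1/2, F=0.
A22 = np.array([[1., 1.],
                [1., 1.]])
A33 = np.array([[1.,  1.,  0.5],
                [1.,  1.,  0.5],
                [0.5, 0.5, 0. ]])

print(np.linalg.det(A22))   # 0.0 -> parabolic type
print(np.linalg.det(A33))   # 0.0 -> degenerate
```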

L24.4 Central quadrics. One can carry out a parallel discussion in space by adding a third
variable.
Definition. A quadric Q is the locus of points (x, y, z) in R3 satisfying an equation of the
form
Ax2 + By 2 + Cz 2 + 2Dyz + 2Ezx + 2F xy + 2Gx + 2Hy + 2Iz + J = 0. (9)
The word ‘quadric’ implies that (9) has order 2, so not all of A, B, C, D, E, F are zero.

Just as we did for conics in L24.1, one can list all the different types of quadrics; whilst there
were 8 types of conics there are 15 types of quadrics. However, we shall only consider the
more interesting cases in this course.
Let us start with an obvious example. The equation

x2 + y 2 + z 2 = r 2

fits the definition (with A = B = C = 1, J = −r2 , and all other coefficients zero). It is of
course a sphere of radius r with centre the origin. Indeed if v = (x, y, z)T , then the equation
becomes |v|2 = r2 or |v| = r , and asserts that the distance of (x, y, z) from the origin is r
(see L9.1).
In the light of the discussion of ellipses, it should now come as no surprise that the equation

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1$$
represents an ellipsoid that fits snugly into a box centred at the origin of dimension 2a×2b×2c.
Definition. A central quadric is the locus of points (x, y, z) satisfying an equation

Ax2 + By 2 + Cz 2 + 2Dyz + 2Ezx + 2F xy + J = 0, (10)

or equivalently
$$\begin{pmatrix} x & y & z \end{pmatrix} \begin{pmatrix} A & F & E \\ F & B & D \\ E & D & C \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} = -J.$$

The 3 × 3 matrix here is symmetric; calling it S, we can rewrite the equation as
$$v^\top S\, v = \mu, \qquad\text{where } \mu = -J.$$

We know from L21.1 that there exists a 3 × 3 orthogonal matrix P so that P⁻¹SP = P⊤SP is diagonal. We may also suppose that det P = 1 (for if not, det P = −1 and we merely replace P by −P and note that det(−P) = 1). It follows from the remarks in L21.3 that P represents a rotation; thus we have the
Theorem. Given a central quadric (10), it is possible to rotate the coordinate system about
the origin in space so that in the new system the equation becomes

λ1 X 2 + λ2 Y 2 + λ3 Z 2 = µ. (11)

The numbers λ1, λ2, λ3 are (in no particular order) the eigenvalues of S.

Here are some examples of central quadrics in which the eigenvalues are all nonzero:
(i) an ellipsoid (if λ1 , λ2 , λ3 , µ all have the same sign);
(ii) a hyperboloid of one sheet (if for example λ1 , λ2 , µ are positive and λ3 < 0);
(iii) a hyperboloid of two sheets (if for example λ1 , λ2 are negative and λ3 , µ are positive),
(iv) a cone (if not all λ1 , λ2 , λ3 have the same sign and if µ = 0).
In the last case, the cone is circular if two of the eigenvalues are equal, otherwise it is called
elliptic. We shall explain this case further in the next lecture.

L24.5 Paraboloids. In some ways the simplest equation in x, y, z of second order is

z = xy.

This is the equation (9) of a quadric for which all the coefficients are zero except F = −I . If
we perform a rotation of π/4 of the xy plane corresponding to the matrix
 
$$P = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix},$$

then we can replace x by $\frac{1}{\sqrt{2}}(X - Y)$ and y by $\frac{1}{\sqrt{2}}(X + Y)$, and leave z = Z alone. Our quadric Q becomes
$$z = \tfrac12 (X - Y)(X + Y), \qquad\text{or}\qquad 2Z = X^2 - Y^2.$$

This is an example of a hyperbolic paraboloid that resembles a ‘saddle’ for a horse, or (on a
bigger scale) a ‘mountain pass’. Any plane X = c or Y = c intersects Q in a parabola,
whereas a plane Z = c intersects Q in a hyperbola or (if c = 0) a pair of lines.
Paraboloids are quadrics that cannot be put into the form (10) or (11), and therefore possess
no central point of symmetry. The standard form of a paraboloid is the equation

Z = aX 2 + bY 2 .

If a, b have opposite signs, it is again a hyperbolic paraboloid. If a, b have the same sign, the
quadric is easier to draw and is called an elliptic paraboloid (circular if a = b). Its intersection
with the plane Z = a is an ellipse (circle).

L24.6 Further exercises.

1. For each of the following conics, find the centre (u, v) and the equation that results by setting x = X + u, y = Y + v: (i) x² + y² + x = 3, (ii) 3y² − 4xy − 4x = 0, (iii) 3x² − xy + 2y = 9.

2. Find the centre (u, v) of the conic C : x² + xy + y² − 2x − y = 0, and a symmetric matrix S so that the equation becomes $(X\;\, Y)\, S \begin{pmatrix} X \\ Y \end{pmatrix} = 1$ with x = X + u, y = Y + v. Diagonalize S, and sketch C relative to the original axes (x, y).

3. Let S be the sphere with centre (3, 1, 1) passing through (3, 4, 5). Find the radius of S ,
and write down its equation.

4. Let π be the plane x−2y+2z = 0 and let O denote the origin (0, 0, 0). Find
(i) the line ` orthogonal to π that passes through O ,
(ii) the point P on ` a distance 6 from O with z > 0;
(iii) a sphere S of radius 6 tangent to π at O .

5. Match up, in the correct order, the quadrics

x2 = 3y 2 + z 2 + 1, z 2 = xy, x2 + 2y 2 − z 2 = 1, −x2 − y 2 + 2x + 1 = 0

with (i) a hyperboloid of 1 sheet, (ii) a hyperboloid of 2 sheets, (iii) a cone, (iv) a cylinder.

6. Show that the line ℓ with parametric equation (x, y, z) = (1, −t, t) is contained in the quadric Q : x² + y² − z² = 1. Draw Q and ℓ in the same coordinate system. Find a second line ℓ′ that lies in Q.

7. Decide which of the following equations describes the circular cone that is obtained when
one rotates the line {(x, y, z) : x = 0, z = 2y} around the z -axis:

x2 +4y 2 = z 2 , 4x2 +4y 2 −z 2 = 0, 2(x2 +y 2 )−z 2 = 0, z = 4x2 +4y 2 .

8. The quadrics Q1 : z = x2 +y 2 and Q2 : z = x2 −y 2 are both examples of paraboloids. Write


down the equations of planes π1 , π2 , π3 , π4 parallel to the coordinate planes but such that
(i) Q1 ∩ π1 is a parabola, (ii) Q1 ∩ π2 is a circle,
(iii) Q2 ∩ π3 is a hyperbola, (iv) Q2 ∩ π4 is a pair of lines.
Notes 25 – Distances and circles
In this lecture we investigate the notion of distance between two geometrical objects. In
particular, we devise methods to deal with distances between two objects chosen among
points, lines, and planes. Then we will introduce the easiest curve defined by distances: the
circle.

L25.1 Distances in general. We already know how to compute the distance between two
points using a formula. We now investigate the problem of finding distances in general.
What is the distance between two geometrical objects? How can we define this number?
Here is the most elegant way:
Definition. Given two sets X and Y , we define the distance between X and Y as
d(X, Y ) = min{d(P, Q) : P ∈ X and Q ∈ Y }.

Notice that, if X ∩ Y ≠ ∅, then d(X, Y) = 0.

Warning: to work in full generality the notion of minimum should be replaced with that of infimum, but we will not go into this generality.
Exercise. Is the converse true? That is: if d(X, Y) = 0, is it the case that X ∩ Y ≠ ∅?

Here are some examples of distances from points:

i)Determine d(P, Q) the distance between two points.


This is the magnitude of the vector $\overrightarrow{PQ}$.

ii)Determine d(P, r) the distance between a point and a line.


This can be determined by taking the plane α such that
P ∈ α, α ⊥ r.
Then consider the point Q = α ∩ r and notice that
d(P, r) = d(P, Q).
Let’s now check that this agrees with the definition above. That is, we have to prove that
for any point R ∈ r, R ≠ Q, we have d(P, R) > d(P, Q). This is easily done: consider the triangle PQR, which has a right angle in Q by construction. Thus $\overrightarrow{PR}$ clearly has larger magnitude than $\overrightarrow{PQ}$.

iii)Determine d(P, α) the distance between a point and a plane.


We can proceed as follows. Take the line r such that
P ∈ r, r ⊥ α.
Then consider the point Q = α ∩ r and notice that
d(P, α) = d(P, Q)
Exercise. Prove that the described procedure agrees with the definition of distance. (Hint: use
orthogonal projections.)

L25.2 A formula for d(P, α). Consider the plane α : ax + by + cz = d where n = (a, b, c) is
the normal vector. It is convenient to arrange that n be a unit vector. If this is not already
the case, it suffices to divide both sides of the equation by |n|, thus modifying also d.
Lemma. If |n| = 1, so that a2 + b2 + c2 = 1, the distance between a point with position
vector p and the plane α equals |p · n − d|.

Proof. The distance is the length of the vector p − p₀, where p₀ is the point of the plane α at the foot of the perpendicular from p to α. Since p − p₀ is parallel to n, this distance is the absolute value of
$$(p - p_0)\cdot n = p\cdot n - p_0\cdot n.$$
Since p₀ ∈ α, the last term equals d. QED

Example. To find the distance of (2, 0, 0) from x + y + z = 0, we merely take
$$n = (1, 1, 1)/\sqrt{3},$$
and d = 0. The distance is $|p\cdot n| = 2/\sqrt{3}$.
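A hedged Python sketch of the Lemma, with the normalization of n done inside the function; the names are ours.

```python
import numpy as np

def distance_point_plane(p, n, d):
    """Distance from the point p to the plane n . x = d (n need not be a unit vector)."""
    n = np.asarray(n, dtype=float)
    return abs(np.dot(p, n) - d) / np.linalg.norm(n)

print(distance_point_plane([2, 0, 0], [1, 1, 1], 0))   # 2/sqrt(3), about 1.1547
```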

L25.3 Distances from lines. The case of d(l, α) is easily treated: either the line and plane
intersect, and the distance is zero, or the line and the plane are parallel and then

d(l, α) = d(P, α)

for any point P ∈ l .


To determine d(r, s), the distance between two lines, we have to work a little more. If the
lines intersect, then the distance is zero. If the lines are parallel, then it is easy to see that

d(r, s) = d(P, s) = d(r, Q)

for any choice of P ∈ r and Q ∈ s. If the lines are intersecting, or they are skew lines, the
following formula will compute the distance
$$d(r, s) = \frac{\big|\,\overrightarrow{PQ} \cdot (\vec v_r \times \vec v_s)\,\big|}{\|\vec v_r \times \vec v_s\|}$$
for any choice of P ∈ r and Q ∈ s, where $\vec v_r$ and $\vec v_s$ denote direction vectors of r and s. Notice that, if the lines are intersecting, then they are coplanar, hence the dot product $\overrightarrow{PQ} \cdot (\vec v_r \times \vec v_s)$ is zero.
Warning: the formula does not work for parallel lines.
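For non-parallel lines the formula translates directly into code; a hedged sketch (names are ours):

```python
import numpy as np

def distance_between_lines(P, vr, Q, vs):
    """Distance between the lines P + t*vr and Q + t*vs, assuming vr x vs != 0."""
    P, vr, Q, vs = map(lambda a: np.asarray(a, dtype=float), (P, vr, Q, vs))
    n = np.cross(vr, vs)
    return abs(np.dot(Q - P, n)) / np.linalg.norm(n)

# Two skew lines: the x-axis and the line z = 1 parallel to the y-axis.
print(distance_between_lines([0, 0, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]))   # 1.0
```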
Exercise. Verify that this answer agrees with the definition using the following geometric construction in the case of skew lines. There exists a (unique) plane α such that r ⊂ α and α ∥ s. For this plane, the following holds: d(r, s) = d(α, s).

L25.4 Distances from planes. The only case left to be treated is the one of two planes, that is
we have to compute d(α, β) the distance between two planes. If the planes are not parallel,
then they are intersecting and the distance is d(α, β) = 0; similarly if they coincide. If they
are parallel, then it is easy to see that

d(α, β) = d(P, β) = d(α, Q)

for any choice of P ∈ α or Q ∈ β .


Exercise. Verify that this answer agrees with the definition.

L25.5 Circles. For ancient Greek mathematicians, a curve is defined by giving a procedure to draw it. For example, fix a pole in the ground and use a taut rope to draw a curve: here is a circle. After Descartes, though, things are different: we define curves using their equations (there could be more than one equation: think of curves in three-dimensional space).
Definition. A circle in the plane is the set of points P ∈ R2 whose coordinates are solutions
of the equation
x2 + y 2 + Dx + Ey + F = 0.

Exercise. Fix a point C ∈ R2 and a real positive number R ∈ R . Show that the set of points P
such that d(P, C) = R is a circle according to the previous definition. (Hint: take the squares
and compute.)

Note that the equation approach introduces degenerate curves such as circles of zero radius
and even of imaginary radius!
Example. The following are circles:
(i) C : x2 + y 2 = 0 that corresponds to exactly one real point, that is (0, 0);

(ii) C : x2 + y 2 = −1 that is the empty set, since there are no real solutions.
These degenerate cases are actually a richness and show how restricting to the real numbers
could make our understanding less clear (one could think of (i) as a complex circle with only one
real point and of (ii) as a complex circle with no real points).

Given a circle a typical question is to find its center and its radius, and thus detecting if the
circle is degenerate or not. This is done by completing the squares, a very natural procedure
based on the equality d(P, C)2 = R2 which yields

(x − xc )2 + (y − yc )2 = R2

thus giving us the center C(x_c, y_c) and the radius R of the circle.
Example. Let’s study the circle C : x2 + y 2 + 2x − 2y − 2 = 0 . To find the center and the radius
we complete the squares, that is we want to find a, b such that

x2 + 2x = (x + a)2 − a2 and y 2 − 2y = (y + b)2 − b2 .

Thus we get a = 1 and b = −1 and hence

x2 + y 2 + 2x − 2y − 2 = (x + 1)2 − 1 + (y − 1)2 − 1 − 2 = (x + 1)2 + (y − 1)2 − 4 = 0.



We then conclude that C has center C(−1, 1) and radius $R = \sqrt{4} = 2$.

Exercise. For what values of a ∈ R does the center of the circle x² + y² + ax + y − 1 = 0 lie on the line x − y = 0?
Exercise. For what values of a ∈ R does the circle x² + y² + ax + y − 1 = 0 not touch the line x − y = 0?

L25.6 Distances and circles. Now that we added circles to our set of geometrical objects, we
can study distances from circles using the general approach. We will see that it is easier to
find the distance rather than the two points realizing it; as always orthogonality is the key.

(i) Distance of a point P from a circle $\mathcal{C}$: it is easy to check that $d(P, \mathcal{C}) = |d(P, C) - R|$, where C and R are the center and the radius of the circle.

(ii) Distance of a line l from a circle $\mathcal{C}$: it is easy to check that $d(l, \mathcal{C}) = d(l, C) - R$ if this is non-negative, or zero otherwise, where C and R are the center and the radius of the circle.

(iii) Distance of a circle $\mathcal{C}_1$ from a circle $\mathcal{C}$: we will see that
$$d(\mathcal{C}_1, \mathcal{C}) = \max\{0,\; d(C_1, C) - (R_1 + R),\; |R_1 - R| - d(C_1, C)\},$$
where C and R, resp. C₁ and R₁, are the center and the radius of the circle $\mathcal{C}$, resp. $\mathcal{C}_1$.
Exercise. In cases (i), (ii), and (iii), find a pair of points, one on the circle and one on the other
object, realizing the minimal distance. Can you see any ’orthogonality’ in the cases in which the
distance is not zero?

L25.7 Further exercises.

1. Given the plane π : x + 2y − 4z = 7, find an orthonormal set {u, v, w} of vectors such


that u and v are parallel to π .

2. Let v = xi+yj+zk, v0 = x0 i+y0 j+z0 k, n = ai+bj + ck. Find the distances between the
following points/planes:
(i) (0, 0, 0) and x+y+z+6 = 0,
(ii) (1, 2, 3) and x = 4,
(iii) (1, 2, 3) and x+y+z = 0.

3. Given the planes π1 : x + y + z = 1 and π2 : x + 2y − z = 0, let ` = π1 ∩ π2 . Say whether


(i) there exists a, b, c ∈ R such that ` is given by (x, y, z) = (1+at, −1+bt, 1+ct),
(ii) there exists p, q, r ∈ R such that ` is given by (x, y, z) = (p−3t, q+2t, r+t).

4. Find a unit vector orthogonal to


(i) the plane 7x + y + z = 5,
(ii) the plane that contains the points (1, 2, −2), (1, 0, 3), (−4, 4, 4),
(iii) both the lines (x, y, z) = (−1, t, 2t) and (x, y, z) = (t, 1 + t, 1 − t).
5. Given the planes
π1 : ax + y − 2z = 0,
π2 : y + z + 2b = 0,
π3 : 2x + y + 2z = 1,
find a and b such that the line π1 ∩ π2 is parallel to π3 .

6. Let ` be the line x = 2y = z and π the plane x = y + z . Explain why a line m in π that
meets ` necessarily has the form (x, y, z) = (at, bt, ct), and find the condition on a, b, c for
which m is orthogonal to `.

7. Let ℓ and m be two lines parallel to vectors p and q and containing points P and Q. Suppose that v = p × q ≠ 0. Show that the (minimum) distance between ℓ and m equals $|\overrightarrow{PQ}\cdot v|/|v|$.

8. Study the following circles, e.g. are there real points? how many? what is the center?
what is the radius?.
(i) x2 + y 2 + y + x = −3, 0, 3
(ii) x2 + y 2 + 2y + 3x = −5, 0, 5

9. Find the distances among the following pairs


(i) x + y = 0
(ii) x2 + y 2 = 1
(iii) x + y = 2
(iv) x2 + y 2 + 2x + 2y = 4
Notes 26 – Circles and spheres
In this lecture we will intersect two circles. We then study spheres and their intersections; this leads to the study of circles in three-dimensional space.

L26.1 Intersection of circles. We defined a circle as the solution set of a particular degree 2
equation in x and y . Using this approach, the intersection of two circles is the solution set
of a system of two quadratic equations in two variables. That is, the solutions of
$$\begin{cases} C_1:\; x^2 + y^2 + D_1 x + E_1 y + F_1 = 0 \\ C_2:\; x^2 + y^2 + D_2 x + E_2 y + F_2 = 0 \end{cases}$$

correspond to the points of C1 ∩ C2 . But how can we solve such a system of equations?
Gaussian elimination only works for linear systems of equations, however it is a good idea
to take linear combinations to produce an equivalent system of equations. Namely, we get
that the solutions of
$$\begin{cases} C_1:\; x^2 + y^2 + D_1 x + E_1 y + F_1 = 0 \\ \ell:\; (D_2 - D_1)x + (E_2 - E_1)y + F_2 - F_1 = 0 \end{cases}$$

are exactly the points of C1 ∩ C2 , but now we are intersecting a line and a circle and this looks
much more promising.
The line ℓ is called radical axis of the circles C1 and C2 and it has the property that

ℓ ∩ C1 = ℓ ∩ C2 = C1 ∩ C2 .

Example. Let’s find the intersection of the circles C1 : x2 +y 2 −1 = 0 and C2 : x2 +y 2 +2x−1 = 0 .


The radical axis is the line

ℓ : (x2 + y 2 − 1) − (x2 + y 2 + 2x − 1) = −2x = 0

and thus C1 ∩ C2 = ℓ ∩ C1 = {(x, y) : x = 0 and y 2 − 1 = 0} = {(0, −1), (0, 1)} .
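The radical-axis computation can also be done symbolically; a hedged SymPy sketch of this example:

```python
from sympy import symbols, solve

x, y = symbols('x y', real=True)
C1 = x**2 + y**2 - 1
C2 = x**2 + y**2 + 2*x - 1

radical_axis = (C1 - C2).expand()     # the quadratic terms cancel
points = solve([C1, radical_axis], [x, y])
print(radical_axis)                   # -2*x
print(points)                         # [(0, -1), (0, 1)]
```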

Exercise. Find C1 ∩ C2 for the circles C1 : x2 + y 2 − 1 = 0 and C2 : x2 + y 2 + ax − 1 = 0 where


a ∈ R is a parameter.

Exercise. Show that, if R1 = R2 , then the radical axis is the axis of the segment joining the
centers of the two circles. What happens if R1 ∕= R2 ?

The following result is an easy application of the basic inequalities among the sides of a
triangle.
Proposition. Let C1 and C2 be circles having radius R1 , resp. R2 , and center C1 , resp. C2 .
Then
(i) if d(C1 , C2 ) > R1 + R2 or d(C1 , C2 ) < |R1 − R2 |, then C1 ∩ C2 = ∅

(ii) if |R1 − R2 | ≤ d(C1 , C2 ) ≤ R1 + R2 , then C1 ∩ C2 ∕= ∅

Exercise. State the previous Proposition using the radical axis and the distance from it.
L26.2 Spheres and intersection of spheres. The canonical form of a sphere is an expression
of the form
S : (x − xc )2 + (y − yc )2 + (z − zc )2 = R2
where the non-negative real number R is the radius of the sphere and C(xc , yc , zc ) is the
center of the sphere. That is S is the set of points P (x, y, z) such that

d(P, C) = R.

However, the general form of a sphere is the following:


Definition. A sphere S is the solution set of the equation

x² + y² + z² + Dx + Ey + Fz + G = 0

where D, E, F, G ∈ R.

Warning: this definition includes some degenerate cases such as a single point (only one real solution exists) and the empty set (no real solutions exist).
Given a sphere a typical task is to find its radius and its center, and this can be done again
by completing the squares.
Example. Find the radius and the center of the sphere S : x2 + y 2 + z 2 + x − y − 1 = 0 . That is,
we look for a, b, c ∈ R such that

(x − a)2 − a2 = x2 + x, (y − b)2 − b2 = y 2 − y and (z − c)2 − c2 = z 2 .


Clearly c = 0, b = 1/2, and a = −1/2. Thus the center is C(−1/2, 1/2, 0) and the radius is $R = \sqrt{1 + \tfrac12} = \sqrt{3/2}$.

We want to study the intersection of two spheres, and thus to find the solutions of the non-
linear system of equations
$$\begin{cases} S_1:\; x^2 + y^2 + z^2 + D_1 x + E_1 y + F_1 z + G_1 = 0 \\ S_2:\; x^2 + y^2 + z^2 + D_2 x + E_2 y + F_2 z + G_2 = 0 \end{cases}$$

We can proceed as we did for circles and we get that S1 ∩ S2 is equal to the solution set of
the system
$$\begin{cases} S_1:\; x^2 + y^2 + z^2 + D_1 x + E_1 y + F_1 z + G_1 = 0 \\ \pi:\; (D_2 - D_1)x + (E_2 - E_1)y + (F_2 - F_1)z + (G_2 - G_1) = 0 \end{cases}$$

where the plane π is called radical plane of the two spheres and has the property that

S1 ∩ π = S2 ∩ π = S1 ∩ S2 .

In particular, to describe the intersection of two spheres, it is enough to compare d(C1 , π)


with R1 , where C1 and R1 are the center and the radius of one of the two spheres.
Proposition. Let Si , i = 1, 2 be two spheres of center Ci and radius Ri and let π be the
radical plane. Then,
(i) if d(Cᵢ, π) > Rᵢ, then S₁ ∩ S₂ = ∅,
(ii) if d(Ci , π) = Ri , then S1 ∩ S2 is one single point (tangent spheres),
(iii) if d(Ci , π) < Ri , then S1 ∩ S2 is a circle.

Proof. The only part that requires some explanation is the last one. First note that S1 ∩ S2 =
S1 ∩ π and thus we have only to understand the intersection of a sphere and a plane. Let C ′
be the orthogonal projection of C₁ on π and let P be any point in C = S₁ ∩ π. The triangle C₁C′P has a right angle in C′ and thus

d(P, C ′ )2 = d(P, C1 )2 − d(C1 , π)2 .

Since d(P, C₁)² = R₁², the right-hand side does not depend on P, and we conclude that C is a circle of center C′ and radius $\sqrt{d(P, C_1)^2 - d(C_1, \pi)^2}$. QED

Exercise. What is the maximal radius of a circle on the sphere x2 + y 2 + z 2 = 4 ? Can you find
all the circles of maximal radius?

Example. Consider the circle in the space R3


" 2
x + y2 + z2 = 4
C:
x+y+z =1

to find its center and its radius we proceed as follows. The center of C is

C =π∩l

where π : x + y + z = 1 and l is the line through the center of the sphere orthogonal to the plane,
that is
l : x = y = z.
Thus, we get C(1/3, 1/3, 1/3), and the radius R of the circle satisfies
$$R^2 = 4 - d(O, \pi)^2,$$
where 2 is the radius of the sphere and the origin O is its center. Thus we get
$$R^2 = 4 - \left(\frac{\sqrt{3}}{3}\right)^2 = \frac{11}{3}.$$

L26.3 Circles and spheres passing through given points. We know that infinitely many
lines go through one point, and that exactly one line exists through two distinct points.
What is the situation with circles and spheres?
Let’s start with the case of circles in R2 . Recall that the axis of a segment AB is the locus of
points P such that d(P, A) = d(P, B), then it is easy to prove the following.
Proposition. Given three distinct non-collinear points there exists a unique circle passing
through them.

Proof. Let P₁, P₂, P₃ be the points. Consider the lines r and s, respectively the axes of the segments P₁P₂ and P₂P₃. Then r ∩ s is the center of the circle we are looking for (note that if the points are collinear, then the intersection is empty), and the radius is the distance from that center to any of the points Pᵢ. QED
The same result can be obtained by solving a linear system of equations. Namely, consider
the three points Pi (xi , yi ), i = 1, 2, 3, and the circle

C : x2 + y 2 + Dx + Ey + F = 0.

Note that Pi ∈ C if and only if D, E, F are such that

x2i + yi2 + Dxi + Eyi + F = 0.

Thus, to find the circle containing the three points, we have to solve a linear system of three
equations in three unknowns.
Example. To find a circle containing the points P1 (1, 0), P2 (0, 1) and P3 (1, 1) , we have to solve
the linear system of equations:
$$\begin{cases} 1 + D + F = 0 \\ 1 + E + F = 0 \\ 2 + D + E + F = 0, \end{cases}$$
whose solution is {(D, E, F) = (−1, −1, 0)}. Thus the required circle is

C : x2 + y 2 − x − y = 0.
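Setting up and solving this linear system is easily automated; a hedged NumPy sketch (the function name is ours), valid for any three non-collinear points:

```python
import numpy as np

def circle_through(P1, P2, P3):
    """Return (D, E, F) with x^2 + y^2 + D*x + E*y + F = 0 through the three points."""
    pts = np.array([P1, P2, P3], dtype=float)
    M = np.column_stack([pts[:, 0], pts[:, 1], np.ones(3)])
    rhs = -(pts[:, 0] ** 2 + pts[:, 1] ** 2)
    return np.linalg.solve(M, rhs)

print(circle_through((1, 0), (0, 1), (1, 1)))   # [-1. -1.  0.], i.e. x^2 + y^2 - x - y = 0
```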

Exercise. Find the center and the radius of the circle of the previous example. Check your answer using the axes-of-the-segments approach.

A completely analogous approach works for spheres.


Proposition. Given four distinct non-coplanar points in R³, there exists a unique sphere passing through them.

Exercise. Prove the proposition. What happens if the points are coplanar?

Exercise. Recall that three non-collinear points in R3 uniquely determine a plane, use this to
find the circle (in the space) containing the three given points. (Hint: intersect a sphere and a
plane).

L26.4 Further exercises.

1. Find the pairwise intersection of the following circles:


(i) x2 + y 2 − 1 = 0 (ii) x2 + y 2 + x + y − 1 = 0 (iii) x2 + y 2 + x − 1 = 0
(iv) x² + y² − 4 = 0

2. Find the centers and radii of the circles obtained as pairwise intersections of the following spheres:

(i) x² + y² + z² − 1 = 0 (ii) x² + y² + z² + x + y − 1 = 0

(iii) x2 + y 2 + z 2 + x + z − 1 = 0 (iv) x2 + y 2 + z 2 − 4 = 0

3. Find the circles passing through one, two, three, and four of the following points:
(i) A(1, 1), B(0, 0), C(2, 3)
(ii) A(1, 1), B(0, 0), C(2, 2)
(iii) A(1, 0), B(0, 1), C(1/√2, 1/√2), D(1/2, √3/2)

(iv) A(1, 0), B(0, 1), C(1/√2, 1/√2), D(1/2, 3/2)

(v) A(1, 0, 0), B(0, 1, 0), C(0, 0, 1)

(vi) A(0, 0, 0), B(1, 1, 1), C(2, 2, 2)
