Carlini LAG Notes
L1.1 Applied vectors and free vectors. The typical treasure map gives information to you
using applied vectors: from the oak tree walk 30 steps east and then walk 40 steps north.
So, in the first part of our treasure hunt, we start from a point O and then we move along
the west-east direction, in the verse towards east, walking a distance of magnitude 30 steps.
This task can be encoded by an arrow (vector) starting at the point O (application point).
This is an example of an applied vector.
In Physics applied vectors are widely used. Think of a force applied to a body, of the speed
of a point, and so on.
In Mathematics we want to separate the information encoded in applied vectors in two
pieces: application point + vector.
Definition. A free vector $\vec v$ is completely determined by
(i) the direction of the line containing $\vec v$,
(ii) the verse following which we move along the line according to $\vec v$,
(iii) the magnitude (or length) of $\vec v$, denoted $\|\vec v\|$.
Note that for each free vector there are infinitely many applied vectors, one for each possible
application point. This means that different applied vectors can correspond to the same free
vector.
Exercise. When is it the case that different applied vectors correspond to the same free vector?
Example. A common way to produce applied vectors is to take two points, say A and B. Then
we write $\overrightarrow{AB}$, or $(B - A)$, to represent the applied vector going from A to B. Notice that $\overrightarrow{BA}$
has the same direction and magnitude as $\overrightarrow{AB}$ but opposite verse.
Exercise. What kind of vector is $\overrightarrow{AA}$?
There are different ways to visualize the set of free vectors of the plane; one is the following.
Fix a point O and apply all free vectors to this point. Then, the free vectors of magnitude R
are in bijection with the points of the circumference of center O and radius R.
Exercise. How can we visualize free vectors in three dimensional space?
L1.2 Operations with vectors. Natural numbers arose naturally to count objects, and for a
long while there was no place for the number zero (no objects). However, if we want not
only to count, but also to perform operations, zero is of crucial importance.
Definition. The zero vector $\vec 0$ is the only vector of null magnitude. It does not have
direction or verse.
We first see how to multiply a vector by a real number $c \in \mathbb{R}$, which we call a scalar.
Definition. Let $c \in \mathbb{R}$ and $\vec v$ be a free vector. If $c = 0$ then $c\vec v = 0\vec v = \vec 0$. If $c \neq 0$ then
$c\vec v$ is the vector:
(i) having the same direction as $\vec v$,
(ii) having the same verse as $\vec v$ if $c > 0$ and opposite verse if $c < 0$,
(iii) having magnitude $\|c\vec v\| = |c|\,\|\vec v\|$.
Note that $c\vec v$ is obtained by contracting or dilating the vector $\vec v$. Also, take note of the
absolute value of $c$ appearing in the definition. What would be the problem with $c\,\|\vec v\|$?
Let's go back to our treasure hunt. To reach X, we have to start from O, walk 30 steps east
and 40 steps north. Or, we can move 50 steps north-east from O and then reach X. In other
words, we can move along the diagonal $\overrightarrow{OX}$ instead of following the two sides.
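The two legs of the treasure hunt and their diagonal can be checked numerically. A minimal sketch (the `(east, north)` pairs are illustrative, anticipating the components introduced later in these notes):

```python
import math

# Each step of the hunt is a displacement given as an (east, north) pair.
leg_east = (30, 0)
leg_north = (0, 40)

# Component-wise sum gives the diagonal displacement from O to X.
diagonal = (leg_east[0] + leg_north[0], leg_east[1] + leg_north[1])

# Its magnitude is the straight-line distance, by Pythagoras.
distance = math.hypot(*diagonal)
print(diagonal, distance)  # (30, 40) 50.0
```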
Definition. (Parallelogram Rule) Given free vectors $\vec u$ and $\vec v$, their sum $\vec u + \vec v$ is obtained
in the following way. Apply $\vec u$ and $\vec v$ at the same point O and consider the parallelogram
having $\vec u$ and $\vec v$ as two consecutive sides. Then, $\vec u + \vec v$ is the diagonal of the parallelogram
starting from O.
Exercise. What can we say about the sum of applied vectors? Is it always defined?
Multiplication by a scalar and the addition of vectors have many useful properties. These
properties make computing expressions with vectors very similar to computing expressions
with real numbers.
Proposition. (Basic Properties) Let $c, d \in \mathbb{R}$ and $\vec u$, $\vec v$, $\vec z$ be free vectors; then the
following hold:
(i) $c(d\vec u) = (cd)\vec u$,
(ii) $(c + d)\vec u = c\vec u + d\vec u$,
(iii) $\vec u + \vec v = \vec v + \vec u$,
(iv) $\vec u + (\vec v + \vec z) = (\vec u + \vec v) + \vec z$,
(v) $c(\vec u + \vec v) = c\vec u + c\vec v$.
Using the previous Proposition, we can show that $(-1)\vec v$ can reasonably be called $-\vec v$. In
fact, $(-1)\vec v$ differs from $\vec v$ only in the verse, which is the opposite. Thus, $(-1)\vec v + \vec v = \vec 0$.
Exercise. If the previous argument seems too obvious, read it again and be sure to understand
each step.
Exercise. Take a triangle and see its sides as vectors by assigning arrows of your choice. Now
take the sum of these three vectors. What do you have to do to obtain the zero vector? Repeat
for a square.
L1.3 Components. Vectors of magnitude one play a special role and they deserve a name.
Definition. A vector of magnitude one is called a versor or unit vector.
Exercise. Given a vector $\vec u$, show that $\dfrac{\vec u}{\|\vec u\|}$ is a unit vector.
To describe a point in the plane or in three space we can use coordinates. In order to do
this in the plane, we fix two orthogonal, oriented axes and we choose a way to measure
lengths on these axes. Then the position of each point is completely determined by taking
orthogonal projections.
The same can be done with vectors. Each coordinate, oriented axis provides us with a unit
vector. In three space these are usually called $\vec i$, $\vec j$, and $\vec k$.
Using the concatenation description of the sum of vectors, we can represent each free vector
as a linear combination of the same three unit vectors.
Definition/Proposition. Any free vector $\vec v$ in three space can be written as
$\vec v = v_x \vec i + v_y \vec j + v_z \vec k$, for $v_x, v_y, v_z \in \mathbb{R}$. The numbers $v_x$, $v_y$, and $v_z$ are called the components of
$\vec v$ with respect to the coordinate axes. In two space, i.e. in the plane, there is simply no $v_z$
component.
Exercise. Use orthogonal projection to find the expression in components of any vector.
Example. If we think of the vector $\vec v = \vec i + 2\vec j + 3\vec k$ as applied at the intersection of the
coordinate axes, then the head of $\vec v$ is at the point of coordinates (1, 2, 3).
Operations on vectors can be easily performed taking advantage of components and of the
Basic Properties.
Example. Consider the vectors $\vec v = \vec i + 2\vec j + 3\vec k$ and $\vec w = 3\vec i + 2\vec j + \vec k$. Then, by applying
the basic properties, we get
$\vec v + \vec w = 4\vec i + 4\vec j + 4\vec k = 4\vec z,$
where $\vec z = \vec i + \vec j + \vec k$.
Proof. We propose two arguments, one algebraic and the other geometric. Algebraic: The result
simply follows by applying the Basic Properties (i)-(v) to the expressions in components of
$\vec v$ and $\vec w$. Geometric: First we notice that the proof can be split into two steps: the case a = 0
and then the case a = b = 1. To prove the proposition, it is enough to use orthogonal
projections to determine the components. For example, in the case a = 0 and b > 0, we
have to determine the components of $b\vec w$. As $b\vec w$ is a dilation, or contraction, of $\vec w$ by the
factor b, the result on the components follows. QED
Exercise. Why is it enough to prove the Proposition for the case a = 0 and then the case a = b = 1?
Notes 2 – More operations with vectors: scalar product,
vector product and mixed product
It is clear that assigning, and sharing, vectors using direction, verse, and magnitude is not the
best possible way. Of course using components is much more efficient and feasible. However,
the component presentation of vectors hides the geometry. For example, we can ask: what
is the length of a vector? How can we measure the angle between two given vectors? In
this section we introduce very important operations with vectors which will also provide
answers to these questions.
Exercise. Compute the dot product of all possible pairs chosen among the unit vectors $\vec i$, $\vec j$,
and $\vec k$. Does the result depend upon the order in which we choose the vectors?
Proof. The proof follows by standard trigonometry, noticing that the scalar
$\dfrac{\vec u \cdot \vec w}{\|\vec w\|} = \|\vec u\| \cos(\alpha)$
is the length of the orthogonal projection. QED
Proof. (Hint) The first two properties are proved by using the projection interpretation of
the dot product, while the remaining properties follow from the definition. QED
The linearity properties and the symmetry allow us to use the components of vectors to
compute the dot product.
Example. $(\vec i + 2\vec j) \cdot (2\vec j - \vec k)$
$= \vec i \cdot (2\vec j - \vec k) + (2\vec j) \cdot (2\vec j - \vec k)$
$= \vec i \cdot (2\vec j) + \vec i \cdot (-\vec k) + (2\vec j) \cdot (2\vec j) + (2\vec j) \cdot (-\vec k)$
$= 2\,\vec i \cdot \vec j - \vec i \cdot \vec k + 4\,\vec j \cdot \vec j - 2\,\vec j \cdot \vec k$
$= 2 \cdot 0 - 0 + 4 \cdot 1 - 2 \cdot 0 = 4.$
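The same computation can be done directly with the component formula for the dot product; a short sketch (function name illustrative):

```python
# Dot product from components: sum of the products of matching entries.
def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

# i + 2j  ->  (1, 2, 0);   2j - k  ->  (0, 2, -1)
print(dot((1, 2, 0), (0, 2, -1)))  # 4, as in the example above
```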
Notice that we can easily check orthogonality of vectors using the dot product.
Corollary. If $\vec u, \vec w \neq \vec 0$, then: $\vec u \perp \vec w$ if and only if $\vec u \cdot \vec w = 0$.
Example. The vector $\vec u = \vec i + \vec j$ has the direction of the bisectrix of the first quadrant of the x, y
plane. Computing
$\vec u \cdot \vec i = 1$
and
$\|\vec u\| = \sqrt{\vec u \cdot \vec u} = \sqrt{2},$
we see that the x-axis forms an angle of $\pi/4$ with the bisectrix, as expected.
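The angle can be recovered numerically from the relation $\cos(\alpha) = \dfrac{\vec u \cdot \vec w}{\|\vec u\|\,\|\vec w\|}$; a small sketch with illustrative helper names:

```python
import math

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def norm(v):
    return math.sqrt(dot(v, v))

# u = i + j and the unit vector i, as in the example above.
u, i = (1, 1, 0), (1, 0, 0)
alpha = math.acos(dot(u, i) / (norm(u) * norm(i)))
print(alpha, math.pi / 4)  # both approximately 0.785398...
```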
Exercise. Try the same with the first orthant in three space. Before computing, make a guess:
what is the angle between the bisectrix and the x -axis?
L2.2 Vector product. Dot product detects orthogonality and can compute angles. We will
now introduce a new operation useful by itself, which can help us to detect parallelism and
to compute areas.
Definition. Given vectors $\vec u$ and $\vec w$, the vector product $\vec u \times \vec w$ is the vector $\vec z$ whose
direction is perpendicular to both $\vec u$ and $\vec w$, whose verse is given by the right-hand rule,
and whose magnitude is
$\|\vec z\| = \|\vec u\|\,\|\vec w\| \sin(\alpha),$
where $\alpha$ is the angle between $\vec u$ and $\vec w$.
Exercise. Check that $\vec j \times \vec k = \vec i$ and $\vec i \times \vec k = -\vec j$. What can you say about $\vec i \times \vec i$, $\vec j \times \vec j$,
and $\vec k \times \vec k$?
We said that the vector product can detect parallelism between two vectors. This can be
easily done noticing the following.
Corollary. If $\vec u, \vec w \neq \vec 0$, then: $\vec u \parallel \vec w$ if and only if $\vec u \times \vec w = \vec 0$.
Proof. The proof of (ii) will require the notion of mixed product, and it will be given in the
next section. We give here a proof of (i). If one among $a$, $\vec u$, and $\vec w$ is zero, then the equality
is clear. Hence we may assume that none of them is zero. Let
$\vec z = (a\vec u) \times \vec w$
and
$\vec t = a(\vec u \times \vec w).$
We want to show that $\vec z = \vec t$. It is easy to see that $\vec z$ and $\vec t$ have the same direction, as
this is the common perpendicular to $\vec u$ and $\vec w$. Also the magnitudes coincide; in fact
$\|\vec z\| = \|a\vec u\|\,\|\vec w\| \sin(\alpha) = |a|\,\|\vec u\|\,\|\vec w\| \sin(\alpha) = \|\vec t\|.$
Finally, we deal with the verse. First consider the case a > 0. Notice that the angle between $a\vec u$
and $\vec w$ and the angle between $\vec u$ and $\vec w$ coincide. Thus $\vec z$ and $\vec t$ have the same verse.
Now consider the case a < 0. In this situation, $(a\vec u) \times \vec w$ and $\vec u \times \vec w$ have opposite verse.
But, as a < 0, the vectors $\vec z$ and $\vec t$ have, again, the same verse. QED
These properties allow us to compute the vector product of any pair of vectors.
Example. $(\vec i + 2\vec j) \times (2\vec j - \vec k)$
$= \vec i \times (2\vec j - \vec k) + (2\vec j) \times (2\vec j - \vec k)$
$= \vec i \times (2\vec j) + \vec i \times (-\vec k) + (2\vec j) \times (2\vec j) + (2\vec j) \times (-\vec k)$
$= 2\,\vec i \times \vec j - \vec i \times \vec k + 4\,\vec j \times \vec j - 2\,\vec j \times \vec k$
$= 2\vec k - (-\vec j) + \vec 0 - 2\vec i = -2\vec i + \vec j + 2\vec k.$
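The result can be double-checked with the component formula for the vector product; a minimal sketch (function name illustrative):

```python
# Cross product from components, following the usual determinant pattern.
def cross(v, w):
    vx, vy, vz = v
    wx, wy, wz = w
    return (vy * wz - vz * wy,
            -(vx * wz - vz * wx),
            vx * wy - vy * wx)

# i + 2j  ->  (1, 2, 0);   2j - k  ->  (0, 2, -1)
print(cross((1, 2, 0), (0, 2, -1)))  # (-2, 1, 2)
```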
Extensively using the properties of the vector product we can find the following formula to
compute the vector product.
Proposition. Consider the vectors $\vec v = v_x \vec i + v_y \vec j + v_z \vec k$ and $\vec w = w_x \vec i + w_y \vec j + w_z \vec k$;
then
$\vec v \times \vec w = (v_y w_z - v_z w_y)\,\vec i - (v_x w_z - v_z w_x)\,\vec j + (v_x w_y - v_y w_x)\,\vec k.$
In particular, if T is the triangle and P the parallelogram determined by $\vec u$ and $\vec w$, then
$\|\vec u \times \vec w\| = 2\,\mathrm{area}(T) = \mathrm{area}(P).$
L2.3 Mixed product. Given three vectors, there is essentially only one way to merge them
using the dot product and the vector product.
Definition. For vectors $\vec u$, $\vec v$, and $\vec w$ we define the mixed product
$\vec u \times \vec v \cdot \vec w.$
The absolute value $|\vec u \times \vec v \cdot \vec w|$ is the volume of the parallelepiped with sides $\vec u$, $\vec v$, and $\vec w$.
Proof. Take as base the parallelogram determined by $\vec u$ and $\vec v$, so that
$\mathrm{BaseArea} = \|\vec u \times \vec v\|.$
The height with respect to this base is given by the magnitude of the orthogonal projection
of the third side $\vec w$ on $\vec u \times \vec v$. Thus we get
$\mathrm{BaseArea} \times \mathrm{Height} = |\vec u \times \vec v \cdot \vec w|.$
QED
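The volume interpretation is easy to test numerically on a box whose sides lie along the axes; a sketch with illustrative helper names:

```python
# Mixed product u x v . w computed from components; its absolute value
# is the volume of the parallelepiped with sides u, v, w.
def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def cross(v, w):
    return (v[1] * w[2] - v[2] * w[1],
            v[2] * w[0] - v[0] * w[2],
            v[0] * w[1] - v[1] * w[0])

def mixed(u, v, w):
    return dot(cross(u, v), w)

u, v, w = (1, 0, 0), (0, 2, 0), (0, 0, 3)
print(abs(mixed(u, v, w)))  # 6: the volume of a 1 x 2 x 3 box

# Swapping two vectors changes the sign but not the absolute value.
print(mixed(v, u, w))  # -6
```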
From this result we can understand what happens when we permute the vectors of the
mixed product. In particular, for $i \neq j \neq k$,
$\vec v_1 \times \vec v_2 \cdot \vec v_3$
and
$\vec v_i \times \vec v_j \cdot \vec v_k$
are equal up to sign, and whenever two vectors are swapped, $v_i \leftrightarrow v_j$, the sign changes.
We already noted that, when a product (dot or vector) is zero, we can derive some useful
information. The same holds for the mixed product.
Proposition. If $\vec u, \vec v, \vec w \neq \vec 0$, then
$\vec u \times \vec v \cdot \vec w = 0$
if and only if $\vec u$, $\vec v$, and $\vec w$ are coplanar.
We finally see how to use the mixed product to prove Linearity 2 for the vector product. We
provide two different arguments.
Proof. (Conceptual) We use the following remarks:
(i) $\vec v = v_x \vec i + v_y \vec j + v_z \vec k$ if and only if $v_x = \vec v \cdot \vec i$, $v_y = \vec v \cdot \vec j$, and $v_z = \vec v \cdot \vec k$.
(ii) $\vec a = \vec b$ if and only if $\vec a \cdot \vec i = \vec b \cdot \vec i$, $\vec a \cdot \vec j = \vec b \cdot \vec j$, and $\vec a \cdot \vec k = \vec b \cdot \vec k$.
Now we set
$\vec a = (\vec u + \vec v) \times \vec w$
and
$\vec b = \vec u \times \vec w + \vec v \times \vec w,$
and we show that $\vec a = \vec b$.
Using properties of the mixed product and the linearity properties of the dot product, we get
$\vec a \cdot \vec i = (\vec u + \vec v) \times \vec w \cdot \vec i = \vec w \times \vec i \cdot (\vec u + \vec v) = \vec w \times \vec i \cdot \vec u + \vec w \times \vec i \cdot \vec v = (\vec u \times \vec w + \vec v \times \vec w) \cdot \vec i = \vec b \cdot \vec i.$
Repeating the same argument with $\vec j$ and $\vec k$ we complete the proof. QED
Proof. (Computational) Set $\vec t = (\vec u + \vec v) \times \vec w - (\vec u \times \vec w + \vec v \times \vec w)$; it suffices to show
that $\vec t \cdot \vec t = 0$. Expanding, $\vec t \cdot \vec t = A + B + C + \|\vec u \times \vec w + \vec v \times \vec w\|^2$, where
$A = [(\vec u + \vec v) \times \vec w] \cdot [(\vec u + \vec v) \times \vec w],$
$B = -2\,[(\vec u + \vec v) \times \vec w] \cdot (\vec u \times \vec w),$
$C = -2\,[(\vec u + \vec v) \times \vec w] \cdot (\vec v \times \vec w).$
Now we use the mixed product and the fact that shifting (cyclic permutation) does not change it:
$A = \vec w \times [(\vec u + \vec v) \times \vec w] \cdot \vec u + \vec w \times [(\vec u + \vec v) \times \vec w] \cdot \vec v,$
$-\tfrac{B}{2} = \vec w \times (\vec u \times \vec w) \cdot \vec u + \vec w \times (\vec u \times \vec w) \cdot \vec v,$
$-\tfrac{C}{2} = \vec w \times (\vec v \times \vec w) \cdot \vec u + \vec w \times (\vec v \times \vec w) \cdot \vec v.$
More computing for B and C gives
$-\tfrac{B}{2} = \|\vec u \times \vec w\|^2 + (\vec u \times \vec w) \cdot (\vec v \times \vec w),$
$-\tfrac{C}{2} = (\vec u \times \vec w) \cdot (\vec v \times \vec w) + \|\vec v \times \vec w\|^2.$
The last computation with A yields
$A = (\vec u \times \vec w) \cdot [(\vec u + \vec v) \times \vec w] + (\vec v \times \vec w) \cdot [(\vec u + \vec v) \times \vec w]$
$= (\vec u + \vec v) \times \vec w \cdot (\vec u \times \vec w) + (\vec u + \vec v) \times \vec w \cdot (\vec v \times \vec w)$
$= \vec w \times (\vec u \times \vec w) \cdot \vec u + \vec w \times (\vec u \times \vec w) \cdot \vec v + \vec w \times (\vec v \times \vec w) \cdot \vec u + \vec w \times (\vec v \times \vec w) \cdot \vec v$
$= \|\vec u \times \vec w\|^2 + 2\,(\vec v \times \vec w) \cdot (\vec u \times \vec w) + \|\vec v \times \vec w\|^2.$
Finally, we substitute the obtained expressions for A, B, and C and we get $\vec t \cdot \vec t = 0$.
QED
Notes 3 – Planes and lines.
A plane is the locus of points determined by a single linear equation, and it can be
parameterized using two free variables. Similarly, a line is determined by two linear
equations, that is, by a linear system of equations, and it can be parameterized using one
free variable. We begin with the second point of view.
Warning. Since the unit vectors $\vec i$, $\vec j$, and $\vec k$ are fixed, the expression $\vec v = v_x \vec i + v_y \vec j + v_z \vec k$
is redundant. Thus, with a slight abuse of notation, we will sometimes also write $\vec v =
(v_x, v_y, v_z)$. The latter is less formal but more convenient when computing. We have to be
very careful though and remember: never mix up points and vectors!
Warning. Vectors are usually denoted as $\vec v$, but the notation v, that is, v in boldface, is also
quite common.
L3.1 The equation of a plane. What is a plane exactly? It is a flat 2-dimensional surface.
As a first example, consider the plane consisting of all points of 'height' z = 1. To describe
all points P(x, y, z) belonging to this plane we can proceed as follows. The position vector
$\vec v = \overrightarrow{OP}$ satisfies the equation
$\vec v = s\vec p + t\vec q + \vec v_0, \qquad (1)$
where $\vec p = \vec i$ and $\vec q = \vec j$. Since s, t are free to vary in $\mathbb{R}$ they are called free variables or free
parameters, while the vector $\vec v_0$ is fixed and chosen to be $\vec v_0 = \vec k$. If we write vectors using
components as column vectors and we set x = s, y = t, we obtain the following expression:
$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = x \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + y \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}. \qquad (2)$
We may call either version (1) or (2) a parametric equation of the plane.
Setting s = 0 = t in (1) gives us a particular point (0, 0, 1) on the plane, with position vector
$\vec v_0$. The whole plane is described by adding to $\vec v_0$ linear combinations of two fixed vectors
$\vec p$, $\vec q$ that are parallel to the plane.
A general plane will have the form (1) for arbitrary choices of $\vec v_0$, $\vec p$, $\vec q$, provided the last
two are not proportional. But a more common description is provided by the following
Proposition.
Proposition. A plane is the set of points (x, y, z) satisfying a linear equation
$ax + by + cz = d. \qquad (3)$
The vector
$\vec n = (a, b, c)$
is called the normal vector to the plane. Any reasonable surface in $\mathbb{R}^3$ has a normal at each
point, but only for the plane is the normal direction constant.
We shall assume that the two planes are not parallel; equivalently, $\vec n_1$, $\vec n_2$ are not
proportional. In this case, their intersection will be a line
$\ell = \pi_1 \cap \pi_2.$
We can describe ` analytically by solving the linear system of two equations. We will see
a systematic way of doing this in a future lecture. At the moment, we just see a numerical
example, which can easily be generalized to produce a general method.
Example. Consider the intersection of the planes $\pi_1 : x + y + z = 1$ and $\pi_2 : x + 2y + 3z = 4$,
that is, consider the system of equations
$\begin{cases} x + y + z = 1 \\ x + 2y + 3z = 4 \end{cases}$
defining the line $\ell = \pi_1 \cap \pi_2$. Since x appears in the first equation, we can subtract the first
equation from the second, producing a new system of equations having the same solutions,
namely
$\begin{cases} x + y + z = 1 \\ y + 2z = 3 \end{cases}$
The second equation now readily gives y using z, namely $y = -2z + 3$. Substituting in the first
equation we get $x = -y - z + 1 = (2z - 3) - z + 1 = z - 2$, and thus
$\ell : \begin{cases} x = z - 2 \\ y = -2z + 3 \end{cases}$
that is, $P(x, y, z) \in \ell$ if and only if $x = z - 2$ and $y = -2z + 3$. Using components and column
vectors we can write
$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = t \begin{pmatrix} 1 \\ -2 \\ 1 \end{pmatrix} + \begin{pmatrix} -2 \\ 3 \\ 0 \end{pmatrix}$
where z = t, and in the usual vector notation
$\vec v = t\vec p + \vec v_0,$
where $\vec p$ has components (1, -2, 1) and $\vec v_0$ has components (-2, 3, 0).
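The parametric solution is easy to check: every point of the line must satisfy both plane equations. A minimal sketch of that check:

```python
# Worked example check: the points v = t(1, -2, 1) + (-2, 3, 0) should
# satisfy x + y + z = 1 and x + 2y + 3z = 4 for every t.
p, v0 = (1, -2, 1), (-2, 3, 0)

for t in (-2, 0, 1, 5):
    x, y, z = (v0[i] + t * p[i] for i in range(3))
    assert x + y + z == 1
    assert x + 2 * y + 3 * z == 4
print("the parametric line lies on both planes")
```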
In general, the solutions of the system (5) can be written in either of these formats.
Here is an example.
Example. We shall find the parametric equation of the line $\ell : \begin{cases} x + y + z = 1 \\ x + 2y + 3z = 4 \end{cases}$. The
direction of $\ell$ is given by $\vec p = (1, 1, 1) \times (1, 2, 3) = (1, -2, 1)$. To find one point $\vec v_0$ on the line,
we set z = 0 so that
$x + y = 1, \quad x + 2y = 4 \;\Rightarrow\; y = 3,\ x = -2.$
Therefore the equation is
$\vec v = \vec v_0 + t\vec p = (-2, 3, 0) + t(1, -2, 1),$
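The direction computation from the two normal vectors can be sketched as follows (function name illustrative):

```python
# Direction of the line as the cross product of the two plane normals.
def cross(v, w):
    return (v[1] * w[2] - v[2] * w[1],
            v[2] * w[0] - v[0] * w[2],
            v[0] * w[1] - v[1] * w[0])

n1, n2 = (1, 1, 1), (1, 2, 3)
print(cross(n1, n2))  # (1, -2, 1)
```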
4. Let ` be the line x = 2y = z and π the plane x = y + z . Explain why a line m in π that
meets ` necessarily has the form (x, y, z) = (at, bt, ct), and find the condition on a, b, c for
which m is orthogonal to `.
Notes 4 – More on planes and lines.
In this lecture we investigate how lines and planes relate to each other. We will also see how
to deal with symmetry and projection with respect to planes and lines.
L4.1 Parallelism and orthogonality. We presented a plane as the set of solutions of one linear
equation
$\pi : ax + by + cz = d$
and we introduced the normal vector of the plane $\vec n = a\vec i + b\vec j + c\vec k$. In contrast with
this situation, a line r in three space is the solution set of a linear system having two linearly
independent equations,
$r : \begin{cases} a_1 x + b_1 y + c_1 z = d_1 \\ a_2 x + b_2 y + c_2 z = d_2 \end{cases}$
thinking that a line in three space is presented by one equation is one of the worst things you
can do in this class! A parametric equation of this line can be derived using the direction
vector $\vec p = \vec n_1 \times \vec n_2$; thus
$\vec v = \vec v_0 + t\vec p;$
as t varies, the position vector $\vec v$ describes the line r.
By the geometric meaning of the normal vector and the direction vector, it is easy to see that
(i) The planes $\alpha, \beta$ are parallel if and only if $\vec n_\alpha \parallel \vec n_\beta$.
(ii) The planes $\alpha, \beta$ are orthogonal if and only if $\vec n_\alpha \perp \vec n_\beta$.
(iii) The plane $\alpha$ and the line r are parallel if and only if $\vec n_\alpha \perp \vec v_r$.
(iv) The plane $\alpha$ and the line r are orthogonal if and only if $\vec n_\alpha \parallel \vec v_r$.
(v) The lines r, s are parallel if and only if $\vec v_r \parallel \vec v_s$.
(vi) The lines r, s are orthogonal if and only if $\vec v_r \perp \vec v_s$.
L4.2 Intersection of planes and lines. As lines and planes are defined as solution sets of
linear equations, in order to intersect them it is enough to solve a (larger) linear system of
equations. For the time being, we will solve these systems of equations by hand, but a more
systematic way of studying them will be introduced soon.
The intersection of two planes $\alpha$ and $\beta$ is computed by solving a 2 × 3 linear system of
equations, that is, a system with 2 equations in 3 unknowns. If the normal vectors $\vec n_\alpha$ and
$\vec n_\beta$ are proportional, then the planes are either parallel, and we will find no solution, or they
coincide, and we will find infinitely many solutions depending on two parameters. If the
normal vectors are not parallel, then $\alpha \cap \beta$ is a line.
The intersection of a plane $\alpha$ and a line r is obtained by solving a 3 × 3 linear system of
equations, that is, a system with 3 equations in 3 unknowns. If the normal vector of the
plane $\vec n$ and the direction vector of the line $\vec p$ are orthogonal, then either the plane and
the line are parallel, and we will find no solution, or the line lies inside the plane, and we
will find infinitely many solutions depending on one parameter. If the normal vector of the
plane $\vec n$ and the direction vector of the line $\vec p$ are not orthogonal, then $\alpha \cap r$ is exactly one point
and the system has exactly one solution.
The intersection of two lines r, s is determined by solving a 4 × 3 linear system of equations,
that is, a system of 4 equations in 3 unknowns. If the direction vectors of the lines $\vec v_1$ and
$\vec v_2$ are parallel, so are the lines, and no intersection exists, thus the system has no
solutions; note that r and s lie in the same plane, that is, they are coplanar. However, when the
direction vectors are not proportional, a new phenomenon occurs. If the lines are coplanar,
then they must intersect: exactly one common point exists, and the system has exactly one
solution. But, if the lines are not coplanar, then no common points exist, and the system has
no solutions; r and s are called skew lines.
Skew lines are an important new feature of three space compared with two space. In the plane,
two lines either intersect or they are parallel (notice that they of course lie in the same
plane!). In three space, though, two lines can also be non-coplanar: in this case they do not
intersect and they are not parallel, and we call them skew lines.
Exercise. How do you find the plane containing two parallel lines?
Actually, intersecting a line and a plane can be done more efficiently without using the
3 × 3 system of equations. We can just use the parametric equation of the line, containing
only one variable t, and substitute it into the equation of the plane.
Example. Find the intersection of the plane $\alpha : x + y + z = 0$ and of the line
$r : \begin{cases} y - x = 1 \\ z - y = 1 \end{cases}$
We first find a parametric equation of the line. We can do this by finding the direction vector
using the vector product and then picking a random point of the line. However, in this case it
is simpler to use the structure of the equations. If we set y = t we can readily solve the system
and get the parametric equation
$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} + t \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$
To find $r \cap \alpha$ we substitute the parametric equation of the line in the equation of the plane and
we get
$(t - 1) + (t) + (t + 1) = 0$
and thus t = 0. Setting t = 0 in the parametric equations of r we get the intersection point
(-1, 0, 1).
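The substitution step is mechanical enough to sketch in code (variable names illustrative): plug the parametric point into the plane equation, which is linear in t, and solve.

```python
# Substitution method: r is v0 + t p, alpha is a x + b y + c z = d.
v0, p = (-1, 0, 1), (1, 1, 1)    # the line r of the example
a, b, c, d = 1, 1, 1, 0          # the plane alpha: x + y + z = 0

# a(x0 + t px) + b(y0 + t py) + c(z0 + t pz) = d is linear in t.
num = d - (a * v0[0] + b * v0[1] + c * v0[2])
den = a * p[0] + b * p[1] + c * p[2]
t = num / den

point = tuple(v0[i] + t * p[i] for i in range(3))
print(point)  # (-1.0, 0.0, 1.0), i.e. t = 0.0
```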
Exercise. Find the parametric equations for r using the vector product. Do you get the same
equations?
L4.3 Orthogonal projection and symmetries. If a point P does not lie on a plane $\alpha$, then
the orthogonal projection of the point on the plane is the point of $\alpha$ closest to P, that is,
it is the best approximation of P in $\alpha$. This is why orthogonal projections naturally appear in
numerical applications. Symmetries (or reflections), besides their aesthetic appeal, have
similar connections with applications.
Let’s see how to find orthogonal projections on planes.
Lemma. Fix $O \in \alpha$. The orthogonal projection of P on $\alpha$ is the point H such that
$\overrightarrow{HP} = \frac{\overrightarrow{OP} \cdot \vec n_\alpha}{\|\vec n_\alpha\|^2}\,\vec n_\alpha.$
Proof. Here is the basic remark: for a given point P, let $H \in \alpha$ be the orthogonal projection
of P on $\alpha$. Then the triangle of vertices P, O, and H has a right angle in H and we have the
following vector equation: $\overrightarrow{OH} + \overrightarrow{HP} = \overrightarrow{OP}$.
Thus, $\overrightarrow{HP}$ is the orthogonal projection of $\overrightarrow{OP}$ on $\vec n_\alpha$, and this is enough to conclude the
proof. QED
Example. Consider the plane $\alpha : z = 0$ and the point P(x, y, z). To find the orthogonal
projection of the point P, we pick O(0, 0, 0) and then we apply the lemma, noticing that $\vec n_\alpha = \vec k$;
we get
$\overrightarrow{HP} = \big((x\vec i + y\vec j + z\vec k) \cdot \vec k\big)\,\vec k = z\vec k$
and we get H(x, y, 0), as it should be.
Rather than using the previous formula, it is possible to follow an alternative geometric
argument in order to find orthogonal projections. Given a point P and a plane $\alpha$, let $H \in \alpha$
be the orthogonal projection of P. In order to find H, find the unique line r passing
through P such that $r \perp \alpha$. Then, $H = r \cap \alpha$.
Example. Find the orthogonal projection of P(3, 2, 1) on the plane $\alpha : x + y + z = 0$. The
direction vector of the line r can be chosen equal to the normal vector of the plane, that is
$\vec p = \vec n_\alpha = \vec i + \vec j + \vec k$. Since $P \in r$, the parametric equations of the line are
$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix} + t \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$
To find $r \cap \alpha$ we substitute the parametric equation of the line in the equation of the plane
and we get
$(3 + t) + (2 + t) + (1 + t) = 0$
and thus t = -2. Setting t = -2 in the parametric equations of r we get the orthogonal
projection H(1, 0, -1).
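The Lemma's formula gives the same point; a sketch of that computation (helper names illustrative):

```python
# Lemma: HP = ((OP . n) / ||n||^2) n, with O any point of the plane.
def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

P, O, n = (3, 2, 1), (0, 0, 0), (1, 1, 1)   # plane x + y + z = 0

OP = tuple(P[i] - O[i] for i in range(3))
c = dot(OP, n) / dot(n, n)                   # scalar coefficient
HP = tuple(c * n[i] for i in range(3))
H = tuple(P[i] - HP[i] for i in range(3))
print(H)  # (1.0, 0.0, -1.0), matching the example above
```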
The treatment of symmetries is very similar to that of orthogonal projections. The key
remark is the following: if Q is the symmetric of P with respect to the plane $\alpha$, then the
middle point of PQ is H, where H is the orthogonal projection of P on $\alpha$.
Lemma. Fix $O \in \alpha$. The symmetric of P with respect to the plane $\alpha$ is the point Q such
that
$\overrightarrow{QP} = 2\,\frac{\overrightarrow{OP} \cdot \vec n_\alpha}{\|\vec n_\alpha\|^2}\,\vec n_\alpha.$
Example. Find the point Q, the symmetric point of P(1, 2, 3) with respect to the plane $\alpha : z = 0$.
Fix O(0, 0, 0) and note that $\vec n_\alpha = (0, 0, 1)$. Let Q(x, y, z) and apply the Lemma:
$\overrightarrow{QP} = 2\,\frac{\overrightarrow{OP} \cdot \vec n_\alpha}{\|\vec n_\alpha\|^2}\,\vec n_\alpha = 2\big((1, 2, 3) \cdot (0, 0, 1)\big)(0, 0, 1) = (0, 0, 6),$
thus $\overrightarrow{QP} = (P - Q) = (1 - x, 2 - y, 3 - z) = (0, 0, 6)$ and hence Q(1, 2, -3).
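The reflection Lemma translates directly into a short computation (helper names illustrative):

```python
# Reflection Lemma: QP = 2 ((OP . n) / ||n||^2) n, so Q = P - QP.
def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

P, O, n = (1, 2, 3), (0, 0, 0), (0, 0, 1)   # plane z = 0

OP = tuple(P[i] - O[i] for i in range(3))
c = 2 * dot(OP, n) / dot(n, n)
Q = tuple(P[i] - c * n[i] for i in range(3))
print(Q)  # (1.0, 2.0, -3.0)
```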
Exercise. Find an expression for the coordinates of Q as we did for the coordinates of the
orthogonal projection.
1. Find the orthogonal projection of the points A(1, 2, 3) and B(3, 2, 1) on the plane α :
x − z = 1.
2. $\alpha : x + y + z = 1, \quad r : x = y = z, \quad s : \begin{cases} x + y + z = 0 \\ x - y - 2 = 0 \end{cases}$
3. Choose three pairwise non-parallel planes and consider the three lines obtained as the
pairwise intersections of the planes. Do the lines have a point in common? Does the answer
(YES/NO) depend on the initial choice of the planes?
4. Consider the triangle T of vertices A(1, 0, 0), B(0, 1, 0), C(0, 0, 1), and let T′ be the
orthogonal projection of T on the plane x + y + z = 5. Find the area of T′. How does it
compare to the area of T?
7. Find three planes intersecting exactly in the point P (1, 2, 3). Can you find three lines
intersecting exactly at the same point?
Notes 5 – Matrix addition and multiplication
L5.1 Matrices and their entries. A matrix is a rectangular array of numbers. Examples:
$\begin{pmatrix} 1 & 2 & 3 \\ 0 & 6 & 9 \end{pmatrix}, \qquad \begin{pmatrix} 1 \\ 2 \\ 1 \\ 4 \end{pmatrix}, \qquad \begin{pmatrix} 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}, \qquad \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$
The individual numbers are called the entries, elements, or components of the matrix. If the
matrix has m rows and n columns, we say that it has size ‘m by n’ or m×n. The above
examples have respective sizes 2×3, 4×1, 5×5, 2×2. If m = n (as in the last two cases) the
matrix is obviously square.
The set of matrices of size m × n whose entries are real numbers is denoted by $\mathbb{R}^{m,n}$; the
first index is always the number of rows. Sometimes we use symbols to represent
unspecified numbers, so the statement
$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \mathbb{R}^{2,2}$
asserts that a, b, c, d stand for real numbers.
Mathematicians like to deal in generalities and will even write a matrix as $A = (a_{ij})$ without
specifying its size.
For example,
$(1, 2, -7)^\top = \begin{pmatrix} 1 \\ 2 \\ -7 \end{pmatrix}, \qquad \begin{pmatrix} 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}^{\top} = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 \end{pmatrix}.$
Notice that $(A^\top)^\top = A$, so the operation of taking the transpose is self-inverse.
L5.2 Vectors. Of special importance are matrices that have only one row or column; they are
called row and column vectors. In writing a row vector with digits, it is useful to use commas
to separate the entries. For example, both the matrices
$A = (1, 2, -7) \in \mathbb{R}^{1,3}, \qquad B = \begin{pmatrix} 1 \\ 2 \\ -7 \end{pmatrix} \in \mathbb{R}^{3,1}$
can be used to represent the point in space with Cartesian coordinates x = 1, y = 2, z = −7.
(Sometimes commas are used to distinguish between matrices and row vectors, but it is
simpler to regard them as the same object.)
One can switch between row and column vectors by observing that $A = B^\top$ or $B = A^\top$. For
this reason, the distinction between a row vector and a column vector is often unimportant,
and the sets $\mathbb{R}^{1,n}$ and $\mathbb{R}^{n,1}$ can be written more simply as $\mathbb{R}^n$, and we can refer to both $A \in \mathbb{R}^3$
and $B \in \mathbb{R}^3$ as 'vectors' of length 3. We shall use such vectors to study analytic geometry
later in the course.
Whenever we write '$\mathbb{R}^n$' the reader is free to use row or column vectors as he or she prefers;
when such a choice is not possible, we shall use the other notation to specify either rows or
columns. Actually, vectors tend to be given lower-case names, and a vector of unspecified
length n is more likely to be written
$u = (u_1, \ldots, u_n) \qquad \text{or} \qquad v = \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix} \qquad \text{or} \qquad x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}.$
Row and column vectors are not merely special cases of matrices. Any matrix can be
regarded as an ordered list of both row vectors and column vectors. Given a matrix $A \in \mathbb{R}^{m,n}$,
we shall denote its rows (thought of as matrices in their own right) by
$r_1, \ldots, r_m \in \mathbb{R}^{1,n}$
and its columns by
$c_1, \ldots, c_n \in \mathbb{R}^{m,1}.$
Much of the study of matrices is ultimately based on one or other of these two descriptions.
L5.3 Addition of matrices. A matrix is much more than an array of data. It is an algebraic
object that is subject to operations generalizing the more familiar ones applicable to numbers
and vectors.
Definition. To form the sum of two matrices A, B , they must have the same size. The
entries are then added component-wise.
For example,
$A = \begin{pmatrix} 1 & 2 & 3 \\ 0 & 6 & 9 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & -2 & 4 \\ 2 & -6 & 1 \end{pmatrix} \quad \Rightarrow \quad A + B = \begin{pmatrix} 1 & 0 & 7 \\ 2 & 0 & 10 \end{pmatrix}.$
Definition. If c ∈ R and A ∈ Rm,n then cA is the matrix formed by multiplying every entry
of A by c.
A null matrix is denoted by 0 (or even 0 like the number), provided the context makes its
size clear. Of course, these definitions apply equally to vectors.
L5.4 Matrix multiplication. First we define a numerical product between two vectors u, v
of the same length. For this it does not really matter whether they are row or column vectors,
but for egalitarian purposes we shall suppose that the first is a row vector and the second a
column vector. Thus, we consider
$u = (u_1, \ldots, u_n) \in \mathbb{R}^{1,n}, \qquad v = \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix} \in \mathbb{R}^{n,1}.$
Their dot product is the number
$u \cdot v = u_1 v_1 + \cdots + u_n v_n = \sum_{i=1}^{n} u_i v_i.$
(We shall not use the summation symbol much in this course, but students should be
familiar with its use.) The dot product provides the basis for multiplying matrices:
Definition. The product of two matrices A, B is only defined if the number of columns of A
equals the number of rows of B. If $A \in \mathbb{R}^{m,n}$ has rows $r_1, \ldots, r_m$ and $B \in \mathbb{R}^{n,p}$ has columns
$c_1, \ldots, c_p$, then AB is the matrix with entries $r_i \cdot c_j$ and has size m × p.
More explicitly,
$AB = \begin{pmatrix} \leftarrow r_1 \rightarrow \\ \cdots \\ \leftarrow r_m \rightarrow \end{pmatrix} \begin{pmatrix} \uparrow & & \uparrow \\ c_1 & \cdots & c_p \\ \downarrow & & \downarrow \end{pmatrix} = \begin{pmatrix} r_1 \cdot c_1 & \cdots & r_1 \cdot c_p \\ \cdots & & \cdots \\ r_m \cdot c_1 & \cdots & r_m \cdot c_p \end{pmatrix}.$
One should imagine taking each row of A, rotating it and placing it on top of each column
of B in turn so as to perform the dot product.
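The row-by-column rule can be sketched as a short function (matrices as lists of rows; the name `matmul` is illustrative):

```python
# Entry (i, j) of AB is the dot product of row i of A with column j of B.
def matmul(A, B):
    assert len(A[0]) == len(B), "columns of A must equal rows of B"
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2, 3],
     [0, 6, 9]]          # 2 x 3
B = [[1, 0],
     [0, 1],
     [1, 1]]             # 3 x 2
print(matmul(A, B))      # [[4, 5], [9, 15]], a 2 x 2 matrix
```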
Example. A very special case is the product r1 c1 = (r1 · c1) of a single row and a column.
Strictly speaking, this is a 1×1 matrix, but (again ignoring parentheses) we shall regard it as a
number, i.e. the dot product. With this convention, if v = (x, y, z) then
v v⊤ = (x, y, z) [ x ; y ; z ] = x^2 + y^2 + z^2.
Later, we shall refer to the square root of this quantity as the norm of the vector v (it is the
distance from the corresponding point to the origin). By contrast, note that
v⊤ v = [ x^2 xy xz ; yx y^2 yz ; zx zy z^2 ]
is a 3×3 matrix.
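The contrast between the 1×1 'inner' product and the 3×3 'outer' product is easy to check numerically; a small sketch with a throwaway matrix-product helper of our own:

```python
def mat_mul(A, B):
    # Entry (i, j) of AB is the dot product of row i of A with column j of B.
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

v_row = [[1, 2, 3]]                 # v as a 1x3 matrix
v_col = [[1], [2], [3]]             # v transposed, as a 3x1 matrix

print(mat_mul(v_row, v_col))        # [[14]]   -- the 1x1 'number' x^2 + y^2 + z^2
print(mat_mul(v_col, v_row))        # [[1, 2, 3], [2, 4, 6], [3, 6, 9]]  -- 3x3
```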
An intermediate case of the matrix product is that in which the second factor is a single
column v = (x1 , . . . , xn )⊤ ∈ Rn,1, so that
Av = [ r1 · v ; · · · ; rm · v ].
In general, the sizes combine according to the rule (m×n)(n×p) = m×p.
Even if AB is defined, it will often be the case that BA is not. The situation is much more
symmetrical if m = n, and we investigate this next.
Notes 6 – Square matrices: determinants and inverses
(Part I)
The study of square matrices is particularly rich, since ones of the same size can be mul-
tiplied together repeatedly. This realization will lead us to construct inverse matrices and
define a number called the ‘determinant’ of a square matrix.
L6.1 Identity matrices. Recall that a matrix is said to be square if it has the same number of
rows and columns. So A ∈ Rm,n is square iff m = n.
Definition. A square matrix is diagonal if the only entries aij that are nonzero are those for
which i = j . These form the diagonal ↘ from top left to bottom right. The n × n matrix A
for which
aij = 1 if i = j and aij = 0 if i ≠ j
is called the identity matrix of order n, and is denoted In .
Here is an example:
[ 1 0 ; 0 1 ] [ a11 a12 a13 ; a21 a22 a23 ] = [ a11 a12 a13 ; a21 a22 a23 ]
= [ a11 a12 a13 ; a21 a22 a23 ] [ 1 0 0 ; 0 1 0 ; 0 0 1 ],
that is, I2 A = A = A I3 for any A ∈ R2,3.
If A, B ∈ Rn,n then both AB and BA are defined and have size n×n. In general they are
unequal, so matrix multiplication is not commutative.
Exercise. Try A = [ 0 1 ; 0 0 ] and B = A⊤.
Even more surprisingly, AB could be the zero matrix even if both A and B are not the zero
matrix.
Exercise. Compute AB and BA for A = [ 1 1 ; 2 2 ] and B = [ 1 −1 ; −1 1 ].
Thus, matrix multiplication is radically different from the multiplication of real numbers.
We must be extremely careful when dealing with matrix multiplication!
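These pathologies are easy to witness numerically. In the sketch below the 2×2 matrices are our own choice (a throwaway mat_mul helper is included): the product AB is zero even though neither factor is, and BA differs from AB.

```python
def mat_mul(A, B):
    # Row-times-column product of matrices stored as lists of rows.
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

A = [[1, 1], [2, 2]]
B = [[1, -1], [-1, 1]]
print(mat_mul(A, B))   # [[0, 0], [0, 0]]    -- AB = 0 with A, B both nonzero
print(mat_mul(B, A))   # [[-1, -1], [1, 1]]  -- and BA is not even zero
```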
L6.2 Powers of matrices. We can raise a square matrix to any positive power. For example
A2 simply means AA, and
A3 = AAA = A2 A = AA2 .
An important property of powers of a given matrix is that they commute with one another,
i.e. the order of multiplication does not matter (unlike for general pairs of matrices):
A^m A^n = A^n A^m,   m, n ∈ N. (1)
By convention, for a matrix A ∈ Rn,n , we set A^0 = In . We can try to define negative powers
using the inverse of a matrix, though this does not always exist. The situation for n = 2 is
described by the
Lemma. Let A = [ a b ; c d ]. Then there exists B ∈ R2,2 for which AB = I2 iff ad − bc ≠ 0. In
this case, the same matrix B satisfies BA = I2 , and
B = 1/(ad − bc) [ d −b ; −c a ]. (2)
Proof. If ad − bc ≠ 0, a direct computation shows that the matrix (2) satisfies AB = I2 = BA.
If ad − bc = 0, then [ d −b ; −c a ] A
is the null matrix, and this precludes the existence of a B for which AB = I2 ; multiplying
this equation on the right by B would give [ d −b ; −c a ] = 0, so that a, b, c, d are all zero, impossible. QED
If ad − bc ≠ 0, then (2) is called the inverse of A and denoted A−1 . More generally, a square
matrix A ∈ Rn,n is said to be invertible or nonsingular if there exists a matrix A−1 such that
AA−1 = In or A−1A = In . In this case, it is a remarkable fact that there is only one inverse
matrix A−1 and it satisfies both equations.
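Formula (2) is simple enough to code directly; a sketch (the function name and the example matrix are our own, and None signals the non-invertible case):

```python
def inverse_2x2(a, b, c, d):
    # Inverse of A = [a b; c d] via formula (2), or None when ad - bc = 0.
    det = a * d - b * c
    if det == 0:
        return None
    return [[d / det, -b / det], [-c / det, a / det]]

B = inverse_2x2(1, 2, 3, 4)    # here ad - bc = -2
print(B)                       # [[-2.0, 1.0], [1.5, -0.5]]
```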
Exercise. (i) If A is invertible, then so is A> , and (A> )−1 = (A−1 )> .
(ii) If A, B are invertible then (AB)−1 = B −1 A−1 .
(iii) If A is invertible and n ∈ N then (An )−1 = (A−1 )n .
The inverse can be used to help solve equations involving square matrices. For example,
suppose that
AB = C,
where A is an invertible square matrix. Then
B = (A−1A)B = A−1(AB) = A−1C,
so B is uniquely determined by A and C . (In the 2×2 case, we would get exactly the same
expression (2) for A−1 by starting instead from the assumption A−1A = I2 .)
L6.3 Determinants. The quantity ad − bc is called the determinant of the 2×2 matrix A. It
turns out that it is possible to associate to any square matrix A ∈ Rn,n a number called its
determinant, written det A or |A|. This number is a function of the components of A, and
satisfies
Theorem. det A 6= 0 iff A is invertible.
We shall explain this result further on in the course, but here we give two ways of computing
the determinant when n = 3. Let
A = [ a11 a12 a13 ; a21 a22 a23 ; a31 a32 a33 ].
Then one copies down the first two columns to form the extended array
[ a11 a12 a13 a11 a12 ; a21 a22 a23 a21 a22 ; a31 a32 a33 a31 a32 ].
The formula of Sarrus asserts that the determinant of A is the sum of the products of entries
on the three downward diagonals ↘ minus those on the three upward diagonals ↗.
Equivalently,
det A = a11 det[ a22 a23 ; a32 a33 ] − a12 det[ a21 a23 ; a31 a33 ] + a13 det[ a21 a22 ; a31 a32 ]. (4)
The three mini-determinants are constructed from the last two rows of A.
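Both recipes for a 3×3 determinant are easily coded and compared; a sketch in Python (the function names and the test matrix are our own):

```python
def det3_sarrus(A):
    # Sum of the three downward diagonal products minus the three upward ones.
    (a, b, c), (d, e, f), (g, h, i) = A
    return a*e*i + b*f*g + c*d*h - g*e*c - h*f*a - i*d*b

def det3_cofactor(A):
    # Expansion (4) along the first row using 2x2 mini-determinants.
    (a, b, c), (d, e, f), (g, h, i) = A
    return a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)

A = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]
print(det3_sarrus(A), det3_cofactor(A))   # 39 39 -- the two formulae agree
```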
Exercise. Use either of these formulae to prove the following properties for the determinant of
a 3×3 matrix A :
(i) if one row is multiplied by c then so is det A,
(ii) det(cA) = c^3 det A,
(iii) if two rows are swapped then det A changes sign,
(iv) if one row is a multiple of another then det A = 0,
(v) det A = det(A⊤ ), so the above statements apply equally to columns.
Writing the components of three space vectors u, v, w as rows, we have
u × v · w = det [ ux uy uz ; vx vy vz ; wx wy wz ].
In order to explain where (4) comes from, let Aij ∈ R2,2 denote the matrix obtained from
A by deleting its ith row and jth column, and let Ã denote the matrix with entries bij . For
example,
b11 = u · v × w = det(A)
and
b12 = u · u × w = 0.
Exercise. Complete the proof of this proposition using the properties of the mixed product.
2. Let A = [ 1 2 3 ; 1 −1 2 ; 3 2 0 ] and B = [ 6 0 4 ; 1 0 0 ; 3 2 5 ]. Compute the following products:
3. Let
A = [ 1 2 3 ; 0 6 9 ],   B = [ a b ; 0 c ; 0 0 ].
Find a, b, c so that AB = I2 . Does there exist a matrix C such that CA = I3 ?
4. The matrices A = [ a b ; c d ] and E = [ 0 1 ; 0 0 ] satisfy the equations AE = EA, AE⊤ = E⊤A.
Deduce that A = aI2 .
For the last one, you will need to generalize the chess-board method.
7. Let A = [ 5 0 −1 ; 0 5 1 ; −1 1 4 ] and P = [ 1 1 −1 ; −1 1 1 ; 2 0 1 ]. Compute the following matrices:
This one consists of 3 equations in 4 unknowns. The problem is to determine all possible val-
ues of these unknowns x1 , x2 , x3 , x4 that solve all 3 equations simultaneously. Each equa-
tion might be expected to impose a single constraint, and since there are fewer equations
than unknowns we might guess that it is not possible to specify the unknowns uniquely.
However, without checking the numbers on the right, it is conceivable that the 3 equations
are inconsistent and that there are no solutions.
The situation is best illustrated with pairs of equations, each in 2 unknowns. Consider the
four separate systems
(a) x + 2y = 0, 3x + 4y = 0;   (b) x + 2y = 7, 3x + 4y = 8;
(c) x + 2y = 0, 2x + 4y = 0;   (d) x + 2y = 7, 2x + 4y = 8. (2)
Those on the left are called homogeneous because the numbers on the right are all zero,
whereas (b) and (d) are inhomogeneous or nonhomogeneous. Any homogeneous system al-
ways has at least one solution, namely the one in which all the unknowns are assigned the
value 0; this is called the trivial solution.
It is easy to check that in cases (a) and (b) the two equations are independent and that there
is a unique solution of the system. For (a), it is the trivial solution x = 0 = y ; for (b) it is
x = −6, y = 13/2, or expressed more neatly (x, y) = (−6, 13/2).
In (c) it is obvious that the second equation is completely redundant; it is merely twice the
first. In this case, we can assign any value to (say) y and then declare that x = −2y ; we
say that y is a free variable and that the general solution depends on one free parameter. In a
sense, the system (c) is ‘underdetermined’.
In (d), the two equations are incompatible; the first would imply that 2x + 4y = 14 and
we get 14 = 8. This means that there is no solution; the system is called inconsistent. By
contrast, homogeneous equations are always consistent.
To sum up, we can have no solutions, a unique solution (one and only one value for each
unknown) or infinitely many solutions. We shall see that the same is true for a linear system
of arbitrary size. With this knowledge, and without further examination, we can be confi-
dent that (1) has either infinitely many solutions or none at all; it cannot have, say, exactly
four solutions!
L7.2 Matrix form. Let us begin with an arbitrary linear system of the form
a11 x1 + · · · + a1n xn = b1
a21 x1 + · · · + a2n xn = b2
· · ·
am1 x1 + · · · + amn xn = bm .
In matrix form this reads
Ax = b, (3)
where A = (aij) ∈ Rm,n is the matrix of left-hand coefficients, and x = (x1 , . . . , xn )⊤ and
b = (b1 , . . . , bm )⊤ are column vectors.
Example. Consider the linear system (3), and suppose that m = n and that A is invertible. This
means that we can find a matrix A−1 such that A−1 A = In . Then
A−1 (Ax) = A−1 b ⇒ (A−1 A)x = A−1 b ⇒ x = A−1 b,
and the system is solved uniquely. Thus, a linear system with the same number of equations and
variables whose associated matrix is invertible has a unique solution. Applying this method to
the generic 2×2 system
ax + by = p
cx + dy = q
gives
[ x ; y ] = 1/(ad − bc) [ d −b ; −c a ] [ p ; q ] = 1/(ad − bc) [ dp − bq ; −cp + aq ].
The solution is neatly expressed as
x = det[ p b ; q d ] / det[ a b ; c d ],   y = det[ a p ; c q ] / det[ a b ; c d ].
It is a special case of Cramer’s rule, whereby each unknown is obtained by substituting a column
of A by b, taking the determinant, and then dividing by det A.
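The 2×2 case of Cramer's rule can be sketched as follows (the function name is our own); applied to system (b) above it reproduces (x, y) = (−6, 13/2):

```python
def cramer_2x2(a, b, c, d, p, q):
    # Solves ax + by = p, cx + dy = q when ad - bc != 0, by Cramer's rule.
    det = a * d - b * c
    if det == 0:
        raise ValueError("coefficient matrix not invertible; Cramer does not apply")
    x = (p * d - b * q) / det    # determinant with first column replaced by (p, q)
    y = (a * q - p * c) / det    # determinant with second column replaced by (p, q)
    return x, y

print(cramer_2x2(1, 2, 3, 4, 7, 8))   # (-6.0, 6.5) -- the solution of system (b)
```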
L7.3 Linear combinations. Let A be the matrix of left-hand coefficients defined by a linear
system. We can instead emphasize the role played by the columns c1 , . . . , cn of A by rewriting
the system as
x1 [ a11 ; · · · ; am1 ] + x2 [ a12 ; · · · ; am2 ] + · · · + xn [ a1n ; · · · ; amn ] = [ b1 ; · · · ; bm ].
Equivalently,
x1 c1 + x2 c2 + · · · + xn cn = b. (4)
This is called the column vector form of the system. In this interpretation, the simultaneous
nature of the m equations translates into a relation between the column vectors of length m
involving the coefficients xi . For example,
−6 [ 1 ; 3 ] + (13/2) [ 2 ; 4 ] = [ 7 ; 8 ]
restates the solution of system (b). A linear combination (LC) of vectors u1 , . . . , uk is any
expression of the form
a1 u1 + · · · + ak uk ,   ai ∈ R,
and L {u1 , . . . , uk } denotes the set of all such combinations.
Thus L {u1 , · · · , uk } is the set of vectors ‘generated’ by the ui . Often it is called their span
and written hu1 , . . . , uk i. It is an example of a subspace, something that we shall study in a
future lecture. It does not depend on the order in which the ui are written; it is a function
of the unordered set {u1 , . . . , uk }, which mathematicians usually write with curly brackets.
Solving a linear system then amounts to trying to express the given vector b as a LC of the
columns manufactured from the left-hand coefficients. A solution exists iff
b ∈ L {c1 , . . . , cn } .
Whilst the rows of A represent the equations, it is linear combinations of the columns that
characterize the solutions. In the study of linear systems, one is constantly torn between
favouring the rows of the associated coefficient matrix, or the columns.
Remark. Let v = [ 1 ; 2 ], w = [ 2 ; 4 ]. Show that L {v, w} = L {v} and that [ x ; y ] ∈ L {v} iff
y = 2x . The fact that [ 7 ; 8 ] ∉ L {v} explains why the system (d) had no solution.
Example. Consider the row vectors i = (1, 0, 0), j = (0, 1, 0), k = (0, 0, 1). Then
The last line shows a linear combination of vectors characterized by an equation, something we
shall see over and over again. Note also that L {i, 0} = L {i} and L {i, i+j} = L {i, j} .
L7.4 Further exercises.
1. Determine which of the following homogeneous systems admit only the trivial solution:
(a) 3x + y − z = 0, x + y − 3z = 0, x + y = 0;
(b) −4x + 2y + z = 0, 3x − 5y + z = 0, 3x + y − 2z = 0;
(c) −2x1 + x2 + x3 = 0, x1 − 2x2 + x3 = 0, x1 + x2 − 2x3 = 0.
3. Given the row vectors v1 = (a, b, c), v2 = (1, 1, 0), v3 = (0, 1, −1), w = (2, 3, −1),
consider the equation
x1 v1 + x2 v2 + x3 v3 = w. (5)
Determine whether there exist a, b, c ∈ R such that
(i) equation (5) has a unique solution (x1 , x2 , x3 ),
(ii) equation (5) has no solution,
(iii) equation (5) has infinitely many solutions.
Notes 8 – Row equivalence
For simplicity, we shall first study homogeneous systems of equations. The secret is to config-
ure the rows of the coefficient matrix A so as to (more or less) read off the solutions.
In this case, each equation is completely determined by the corresponding row of A, and
we can encode the equations by the m rows r1 , r2 , . . . , rm . For example, the new list of rows
r1
r2 − 2r1
·
3rm
represents an equivalent system of equations; we have merely subtracted twice the first row
from the second and multiplied the last by 3. These changes will not affect the values of
any solution (x1 , . . . , xn ). We are also at liberty to change the order in which we list the
equations.
Our aim is to use such changes to simplify the system.
Definition. Let A be a matrix of size m×n. An elementary row operation (ERO) is one of
the following ways in which a new matrix of the same size is formed from A:
(i) add to a given row a multiple of a different row,
(ii) multiply a given row by a nonzero constant,
(iii) swap or interchange two rows.
In general, elementary matrices are constructed by applying the corresponding ERO to the
identity matrix of the proper size.
Definition. An n×n matrix E is called an elementary matrix if E is obtained from In by
applying one of the ERO’s.
Since ERO’s are invertible operations, elementary matrices are invertible matrices.
Proposition. If E is an elementary matrix, the E is invertible.
Proof. Assume that E is obtained by applying an ERO of type (iii); if not, a similar argument
applies. In other words, we assume that E is obtained by applying the swap ri ↔ rj . Let F
be the elementary matrix obtained by applying the same swap. Thus, for each choice of A,
EA is obtained by swapping row i and row j . Thus, F (EA) = A for all choices of A, and
hence F E = In ; similarly we can prove that EF = In . Hence we have proved that F is the
inverse of E .
QED
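As a quick sanity check of this proof, one can build the swap matrix E and verify that applying the same swap again produces its inverse; a Python sketch (helper names are our own):

```python
def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def mat_mul(A, B):
    cols = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in A]

def swap_matrix(n, i, j):
    # Elementary matrix of type (iii): swap rows i and j of I_n.
    E = identity(n)
    E[i], E[j] = E[j], E[i]
    return E

E = swap_matrix(3, 0, 1)
F = swap_matrix(3, 0, 1)              # the same swap undoes itself
print(mat_mul(F, E) == identity(3))   # True: FE = I_3
```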
We will often use the following basic fact about invertible matrices:
Proposition. If A and B are invertible n × n matrices, then AB is invertible and
(AB)−1 = B −1 A−1 .
L8.2 Solving a homogeneous system. Let us show how ERO’s can be used to solve the
linear system
x1 + x2 + 2x3 + 3x4 = 0
5x1 + 8x2 + 13x3 + 21x4 = 0 (1)
34x1 + 55x2 + 89x3 + 144x4 = 0
written before.
We shall apply ERO’s to convert A into a matrix that is roughly triangular, and then solve
the resulting system.
A = [ 1 1 2 3 ; 5 8 13 21 ; 34 55 89 144 ]
r2 − 5r1 ∼ [ 1 1 2 3 ; 0 3 3 6 ; 34 55 89 144 ]
(1/3) r2 ∼ [ 1 1 2 3 ; 0 1 1 2 ; 34 55 89 144 ]
r3 − 34r1 ∼ [ 1 1 2 3 ; 0 1 1 2 ; 0 21 21 42 ]
(1/21) r3 ∼ [ 1 1 2 3 ; 0 1 1 2 ; 0 1 1 2 ]
r3 − r2 ∼ [ 1 1 2 3 ; 0 1 1 2 ; 0 0 0 0 ]
On the left, we jot down (in abbreviated form) the operations used. It is not essential to do
this, provided the operations are carried out one at a time; errors occur when one tries to be
too ambitious! It follows from the last matrix that (1) has the same solutions as the system
x1 + x2 + 2x3 + 3x4 = 0
x2 + x3 + 2x4 = 0
But one can see at a glance how to solve this; we can assign any values to x3 and x4 which
will then determine x2 (from the second equation) and then x1 (from the first). Suppose
that we set x3 = s and x4 = t (it is a good idea to use different letters to indicate free
variables); then
x1 = −s − t,   x2 = −s − 2t,
or in column form
x = [ −s − t ; −s − 2t ; s ; t ] = s [ −1 ; −1 ; 1 ; 0 ] + t [ −1 ; −2 ; 0 ; 1 ],
where s, t are arbitrary. The set of solutions is therefore
L {u, v},   where   u = [ −1 ; −1 ; 1 ; 0 ],   v = [ −1 ; −2 ; 0 ; 1 ].
We shall see that the solution set of any homogeneous system is always a linear combination
of this type.
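The elimination above can be automated. The following sketch (our own code, using exact fractions to avoid rounding) eliminates above each marker as well as below, so it prints the system in an even simpler form with markers equal to 1:

```python
from fractions import Fraction

def row_reduce(A):
    # Gaussian elimination, clearing each marked column above and below the marker.
    A = [[Fraction(x) for x in row] for row in A]
    pivot_row = 0
    for col in range(len(A[0])):
        # find a row at or below pivot_row with a nonzero entry in this column
        r = next((r for r in range(pivot_row, len(A)) if A[r][col] != 0), None)
        if r is None:
            continue
        A[pivot_row], A[r] = A[r], A[pivot_row]                      # type (iii)
        A[pivot_row] = [x / A[pivot_row][col] for x in A[pivot_row]]  # type (ii): marker -> 1
        for r2 in range(len(A)):
            if r2 != pivot_row and A[r2][col] != 0:                   # type (i)
                A[r2] = [x - A[r2][col] * y for x, y in zip(A[r2], A[pivot_row])]
        pivot_row += 1
    return A

A = [[1, 1, 2, 3], [5, 8, 13, 21], [34, 55, 89, 144]]
for row in row_reduce(A):
    print([int(x) for x in row])
# [1, 0, 1, 1]
# [0, 1, 1, 2]
# [0, 0, 0, 0]
```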
Exercise. Can you solve the system using the matrix form and elementary matrices?
L8.3 Equivalence relations. Applying ERO’s produces a natural relation on the set of ma-
trices of any fixed size.
Definition. A relationship ∼ between elements of a set is called an equivalence relation if
(E1) A ∼ A is always true,
(E2) A ∼ B always implies B ∼ A,
(E3) A ∼ B and B ∼ C always implies A ∼ C .
Observe that these three conditions are satisfied by equality =. On the set of real numbers,
‘having the same absolute value’ is an equivalence relation, but ≤ is not. But we are more
interested in sets of matrices.
Definition. From now on, we write A ∼ B to mean that B is a matrix obtained by applying
one or more ERO’s in succession to A.
The condition (E2) is less obvious. But each of the three operations is invertible; it can be
‘undone’ by the same type of operation. For example, r1 ↦ r1 − ar2 is undone by r1 ↦ r1 + ar2 .
So if A ∼ B then we can undo each ERO in the succession one at a time, and B ∼ A. QED
Example. If B is a matrix which has the same rows as A but in a different order, then A ∼ B .
This is because any permutation of the rows can be obtained by a succession of transpositions,
i.e. ERO’s of type (iii). For example, let
A = [ 1 2 3 ; 4 5 6 ; 7 8 9 ],   B = [ 4 5 6 ; 1 2 3 ; 7 8 9 ],   C = [ 4 5 6 ; 7 8 9 ; 1 2 3 ].
Then
B is obtained from A by r1 ↔ r2 ; C is obtained from B by r2 ↔ r3 ,
so A ∼ C even though it is not possible to pass from A to C by a single ERO. Of course, it is
also true that A ∼ B , B ∼ C , and C ∼ A.
Exercise. In this example, can C be obtained from A by a succession of ERO’s of type (i)?
3. Recall that a square matrix P ∈ Rn,n is called invertible if there exists a matrix P −1 of
the same size such that P −1P = In = P P −1 . Two matrices A, B are said to be similar if
there exists an invertible matrix P such that P −1AP = B . Prove that being similar is an
equivalence relation on the set Rn,n . (Hint: you will need the fact that (P −1 )−1 = P .)
Notes 9 – Reduced matrices and their rank
Solving a linear system by applying suitable ERO’s is called Gaussian elimination. In this
lecture we shall describe it more carefully, and present some variations. First we need to
know exactly what type of matrices we want to achieve by operating on the rows.
L9.1 Echelon forms. The idea is to convert a matrix into a steplike sequence of rows like an
arrangement of toy soldiers ready for battle. We shall make this precise using the
Definition. An entry of a matrix is called a marker if it is the first nonzero entry of a row
(starting from the left).
The first requirement is that
(M1) each column of the matrix contains at most one marker.
In solving the system, the order of the rows is immaterial. It is common practice to ‘tidy up’
by applying ERO’s of type (iii) to permute the rows so that
(M2) moving down the rows from the top, the markers move from left to right, and
(M3) all the null rows are at the bottom.
It is a consequence of (M1) and (M2) that all the entries (in the same column) underneath a
given marker are zero.
Definition. A matrix satisfying (M1)–(M3) is called step-reduced.
Warning. This is sometimes referred to as ‘row echelon form’, but is a stronger notion than
‘reduced’ in the sense of the Greco-Valabrega text. We prefer to work with step-reduced
matrices as they are easy to spot visually: the main null part has an approximately triangular
form, and the markers represent ‘corner soldiers’.
Once a matrix is step-reduced, its markers take on a greater significance and are also called
pivots; often we shall box them. We shall call a column marked if it contains a marker, and
otherwise unmarked. It is sometimes convenient to suppose in addition that
(M4) each marker equals 1.
This can be quickly achieved using ERO’s of type (ii), but may have the adverse effect of
introducing fractions elsewhere. A more stringent condition is that
(M5) each marker is the only nonzero entry in its column (above as well as below).
Definition. A matrix satisfying (M1)–(M5) is said to be super-reduced.
When a matrix is super-reduced, it is possible to read off immediately the solution to the original
system. In the above example, we get the equations
x1 + x3 + x4 = 0
x2 + x3 + 2x4 = 0,
whence
x1 = −s − t,   x2 = −s − 2t,
slightly more effectively than before.
L9.2 Linear independence. Converting a matrix to row echelon form has the effect of mak-
ing the transformed equations ‘independent’. We formulate this notion.
Definition. Let {u1 , . . . , uk } be a finite subset of Rn (meaning either in R1,n or Rn,1 ). The
set is called linearly independent (LI) if the equation
x1 u1 + · · · + xk uk = 0 (1)
admits only the trivial solution x1 = · · · = xk = 0.
One often says ‘u1 , · · · , uk are linearly independent’, though strictly speaking being LI is
a property of a set or list and not its individual elements. The order of the elements is
immaterial, and any duplication prevents the list from being LI.
A singleton set {v} is LI iff v ≠ 0, and no set that contains 0 can be LI. A set {u, v} is LI iff
neither vector is a multiple (including zero times) of the other. More generally,
Lemma. A set {u1 , . . . , uk } is LI iff no one element in it can be expressed as a linear combi-
nation of the others.
Proof. Suppose that the set is LI, but that one of the elements is a LC of the others. For the
sake of argument, suppose that uk ∈ L {u1 , · · · , uk−1 }, so that uk = a1 u1 + · · · + ak−1 uk−1 for
some ai ∈ R. But then
a1 u1 + · · · + ak−1 uk−1 + (−1)uk = 0
is a nontrivial solution of (1), contradicting linear independence. The converse is similar. QED
Proposition. Let B be a step-reduced matrix. Then its nonzero rows are LI.
Exercise. The same conclusion holds if B satisfies just the first condition (M1), namely that
there is at most one marker in each column. Indeed, once B satisfies (M1) we can make it
step-reduced just by changing the order of the rows.
The columns of the matrix in (2) are not LI. Even if we forget about c1 and c4 , we have
c6 = 2c5 − (15/7)c3 − kc2 .
The point is that we can always express an unmarked column as a LC of the previous
marked ones by finding the coefficients one at a time starting from the bottom.
Corollary. If B is step-reduced then its marked columns are LI.
Thus, the markers of a step-reduced matrix ‘mark out’ an independent set of both rows and
columns. Whilst there may be unmarked columns in any position, row reduction ensures
that all the unmarked rows are null. If every column of a step-reduced matrix C is marked,
then the set of columns is LI and (from the column vector form of the system) Cx = 0 has
only the trivial solution. Here is an example that makes this clear:
C = [ 1 2 3 1 ; 0 1 2 1 ; 0 0 1 1 ; 0 0 0 1 ; 0 0 0 0 ], (3)
which corresponds to the system
x + 2y + 3z + t = 0
y + 2z + t = 0
z+t = 0
t = 0.
L9.3 The rank. This is defined to measure the number of independent rows or columns of
any matrix (equivalently, the number of pivots), in a way we shall make precise. We first
restrict to reduced matrices.
Definition. Let B be step-reduced. Its rank is the number of markers, or equivalently
nonzero rows. This number is denoted by rank B , rk B or r(B).
Example. The matrices in (2) and (3) have rank 3 and 4 respectively.
If B has size m×n then obviously r(B) ≤ min{m, n}. If r(B) < min{m, n}, we can think
of B as ‘defective’. If r(B) does achieve this maximum value, we say that B has full rank or
that B has maximal rank.
The importance of the rank derives from the
Theorem. Suppose B and C are both step-reduced and that B ∼ C . Then the markers of
B and C occur in exactly the same positions. In particular, r(B) = r(C).
This enables us to define the rank of an arbitrary matrix A to be the rank of any reduced
matrix B row equivalent to A. For if A ∼ B and A ∼ C with B and C step-reduced then
B∼A∼C ⇒ B ∼ C,
and their ranks are equal.
Proof. First we prove that the rank must be equal and then we deal with the position of the
markers.
We give a proof by contradiction. Assume that r(B) < r(C), that is, assume that B has
more zero rows than C . Since B ∼ C there exists a matrix E such that B = EC where
E is a product of elementary matrices. In particular, by the definition of the row-column
product of matrices, each row of B is a LC of the rows of C . Since B has more zero rows
than C , at least one zero row of B arises as a nontrivial LC of the non-zero rows of C (the
zero rows of C contribute nothing); but this is a contradiction since the non-zero rows of C
are LI, C being step-reduced. This proves that r(B) = r(C).
To prove that the markers of B and C are in the same position we can proceed by induction
on r = r(B) = r(C). Since the argument is very straightforward, we just show that the
marker of the first row of B and the marker of the first row of C are in the same column.
Say that the marker of the first row of B is in column i and the marker of the first row of
C is in column j . Since B = EC and C is step-reduced, all columns of C before the jth are
zero and thus all the columns of B before the jth are zero. Thus, i ≥ j . Similarly, the only
non-zero element of the jth column of C is the marker, and thus B has a non-zero element
in column j . Hence, i ≤ j and we conclude i = j .
QED
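Counting pivots during elimination gives a rank function for an arbitrary matrix; a sketch in our own code (exact fractions avoid rounding issues):

```python
from fractions import Fraction

def rank(A):
    # Rank = number of markers (pivots) found while row reducing A.
    A = [[Fraction(x) for x in row] for row in A]
    pivots = 0
    for col in range(len(A[0])):
        r = next((r for r in range(pivots, len(A)) if A[r][col] != 0), None)
        if r is None:
            continue                     # unmarked column
        A[pivots], A[r] = A[r], A[pivots]
        for r2 in range(pivots + 1, len(A)):
            factor = A[r2][col] / A[pivots][col]
            A[r2] = [x - factor * y for x, y in zip(A[r2], A[pivots])]
        pivots += 1
    return pivots

print(rank([[1, 1, 2, 3], [5, 8, 13, 21], [34, 55, 89, 144]]))   # 2
print(rank([[1, 2], [2, 4]]))                                    # 1
```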
3. Find matrices A, B ∈ R3,3 for which r(A) > r(A + B) > r(B).
4. Let A = [ 1 a 0 0 ; 0 1 3 0 ; 2 0 0 a ]. Find the value of r(A) as a ∈ R varies.
Notes 10 – Solving a general system
Having introduced the rank of an arbitrary matrix, we are in a position to formulate the
celebrated results of E. Rouché (1832–1910) and A. Capelli (1855–1910) concerning solutions
of a (generally inhomogeneous) linear system of equations. From now on, we denote the rank
of a matrix M by r(M ).
L10.1 The augmented matrix. Let us return to the inhomogeneous system Ax = b, and
form the augmented matrix
(A | b). (1)
The vertical bar reminds us that the last column is special, but in applying row operations
it should be ignored so that b is just treated as an extra column added to A. We solve the
system by applying ERO’s to (1) so that it becomes a step-reduced matrix
(A′ | b′ ). (2)
Warning: in doing this it is essential to apply each ERO to the whole row, including bi ; thus
the last column will usually change unless it was already null.
Note that if (2) is step-reduced, so is the left-hand matrix A′ . By definition then, r(A) = r(A′)
and r(A | b) = r(A′ | b′).
Lemma. The m rows of A are LI iff r(A) = m.
Proof (of ‘only if’). Suppose by contradiction that the rows of A are LI but that r(A) < m.
Thus A ∼ B , with B step-reduced and its last row equal to zero. Each row of B is a LC of the
rows of A, and since the elementary matrices involved are invertible, the zero last row of B
is a nontrivial LC of the rows of A; this contradicts their independence. QED
Think of LI rows as ‘incompressible’: when the matrix is reduced they are not diminished
in number.
Exercise. (i) The columns of a matrix A ∈ Rm,n are LI iff the matrix equation Ax = 0 admits
only the trivial solution x = 0 ∈ Rn,1 .
(ii) The rows of a matrix are LI iff the equation xA = 0 only has the trivial solution x = 0 ∈ R1,m .
(iii) The rows of a matrix are LI iff the equation A⊤ x = 0 only has the trivial solution x = 0 ∈
Rm,1 .
L10.2 Inconsistent systems. Given a linear system, the student-friendly situation is that in
which there are no solutions, as one does not have to waste time finding them!
Proposition. (RC1) If r(A) < r(A | b), the system has no solutions.
This case can only occur if r(A) is less than the number m of its rows, since otherwise both
matrices will have rank m.
Proof. Let r = r(A) < r(A | b). Then the first null row of A′ is the (r + 1)st and will be
followed by b′r+1 ≠ 0 in the step-reduced matrix (2). This row represents the contradictory
equation
0x1 + · · · + 0xn = b′r+1 ,
and the only way out is that the xi do not exist. QED
Of course, if the system is homogeneous we can never be in this case, since A and (A | 0) only
differ by a null column and so have the same rank.
Proposition. (RC2) If r(A) = r(A | b) = r , then the system is consistent and its general
solution involves n − r free variables.
Proof. Each column of A and A′ corresponds to a variable, so we can speak of ‘marked’ and
‘unmarked’ variables. It is easier (but not essential) to assume that A′ is super-reduced, in
which case its ith row has the form
(0 · · · 0 1 ? . . . ? | ci ),
and represents an equation
marked variable + LC of unmarked variables = ci .
It follows that we can assign the unmarked variables arbitrarily and solve uniquely for each
of the marked variables in terms of them. QED
In the light of the procedure above, the unmarked variables are called free variables, and in
the solution it is good practice to give them new names such as s, t, u · · · or t1 , t2 , t3 . . .. The
conclusion is traditionally expressed by the statement
‘If r(A) = r(A | b) = r then the linear system has ∞^(n−r) solutions’.
This is a useful way of recording the result that can be understood as follows. The ac-
tual number m of equations is irrelevant; what is important is the number of LI or effective
equations, and this is the rank r . Each effective equation allows us to express one of the n
variables in terms of the others, so we end up with n − r free variables or parameters.
E(A | In ) = (EA | E)
confirming the inverse found in L2. The matrices on the right act as a ‘book-keeping’ of the
ERO’s which there is no need for us to record separately.
5. Find values of t ∈ R for which each of the following matrices is not invertible:
[ 1 −t ; t 4 ],   [ 3−t −2 ; −5 −t ],   [ −t −2 −3 ; 0 1−t 1 ; 1 2 −t ],
[ −t 3 −3 −6 ; 0 −t 0 0 ; 1 1 −t 0 ; 1 0 0 −t ].
6. Find the values of λ ∈ R for which A = [ λ −1 1 ; 0 2 1 ; 0 λ 1 ] is invertible. Now set λ = 1,
and solve the matrix equation AX = B where X ∈ R3,2 and B = [ 2 1 ; 0 1 ; 2 0 ].
7. Given A = [ 1 2 3 ; 4 5 6 ; 1/2 1 3/2 ] and b = [ 4 ; 8 ; 2 ], verify that the equation AX = b
has an infinite number of solutions and determine the number of free parameters.
8. Given A = [ 2 −1 ; −4 2 ], which of the following equations admit at least one solution?
A [ x ; y ] = [ 0 ; 0 ],   A [ x ; y ] = [ 1 ; 0 ],   A [ x ; y ] = [ 1 ; −2 ],   A^2 [ x ; y ] = [ 1 ; −2 ].
L11.1 Rank and inverse. Using row reduction to super-reduce the matrix (A | In ) we com-
puted the inverse of the n×n matrix A in the case the inverse exists. What happens when
we row reduce (A | In ) if A is not invertible? In this case we cannot super-reduce the matrix,
and it turns out that the whole situation is better explained using the notion of rank.
Theorem. If A is a n×n matrix, then A is invertible iff r(A) = n.
Proof. If A is invertible we know that we can super-reduce (A | In ) and obtain (In | A−1 ).
In particular, we can super-reduce A and obtain In . Thus, we have that A ∼ In and
hence r(A) = r(In ) = n. Conversely, if r(A) = n we can row reduce A and obtain an n×n
step-reduced matrix A′ . Note that all the diagonal elements of A′ must be markers, so that
they must be non-zero, while all the elements below the diagonal must be zero. We can
now use the non-zero diagonal elements to super-reduce A′ and obtain In . Thus we can
super-reduce (A | In ) and obtain (In | A−1 ). Hence A is invertible. QED
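The procedure of super-reducing (A | In) can be sketched in code (our own implementation, with exact fractions); it returns None exactly when r(A) < n, i.e. when A is not invertible:

```python
from fractions import Fraction

def inverse(A):
    # Super-reduce (A | I_n); if the left block becomes I_n, the right block is A^{-1}.
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        r = next((r for r in range(col, n) if M[r][col] != 0), None)
        if r is None:
            return None                               # rank < n: not invertible
        M[col], M[r] = M[r], M[col]
        M[col] = [x / M[col][col] for x in M[col]]    # make the marker equal to 1
        for r2 in range(n):
            if r2 != col and M[r2][col] != 0:         # clear the column above and below
                M[r2] = [x - M[r2][col] * y for x, y in zip(M[r2], M[col])]
    return [row[n:] for row in M]                     # the right-hand block

print(inverse([[1, 2], [3, 4]]))   # entries -2, 1, 3/2, -1/2, agreeing with formula (2)
print(inverse([[1, 2], [2, 4]]))   # None
```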
In the previous argument we met matrices of a special kind, called triangular matrices. A
square matrix T = (tij ) is called upper triangular if tij = 0 whenever i > j ; the matrix is
called lower triangular if tij = 0 whenever i < j . Note that upper/lower refers to the only
possibly non-zero elements.
A special case of triangular matrices is that of diagonal matrices, which are both upper and
lower triangular. Namely, D = (dij ) is a diagonal matrix if dij = 0 whenever i ≠ j .
Exercise. Find conditions on the entries of a diagonal matrix such that it is invertible. Can you
easily find the inverse when it exists?
Exercise. Find conditions on the entries of a triangular matrix such that it is invertible.
L11.2 Computing determinants. We know how to compute the determinant of a 2×2 matrix
and this is the key to define/compute the determinant of any matrix. Here is a recursive
definition of determinant.
Definition. Let A = (aij ) be an n×n matrix and denote by Aij the (n − 1)×(n − 1) submatrix
of A obtained by deleting the i-th row and the j -th column. If n = 2, then det(A) =
a11 a22 − a12 a21 . If n > 2, then we define the determinant of A by
det(A) = Σ_{i=1}^{n} (−1)^(1+i) a1i det(A1i).
This definition is called recursive since it reduces the computation of the determinant of an
n×n matrix to the computation of several determinants of (n − 1)×(n − 1) matrices and
so on; the process stops when we hit 2×2 matrices, for which we know how to explicitly
compute the determinant.
Exercise. Does Sarrus’ rule computation agree with our recursive definition in the case of 3×3
matrices?
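The recursive definition translates almost word for word into code; a sketch (zero-based indices, so the sign (−1)^(1+i) becomes (-1)**i; the test matrix is our own):

```python
def det(A):
    # Recursive expansion along the first row, as in the definition.
    n = len(A)
    if n == 1:
        return A[0][0]
    if n == 2:
        return A[0][0] * A[1][1] - A[0][1] * A[1][0]
    total = 0
    for i in range(n):
        # A_{1,i+1}: delete the first row and the (i+1)-st column.
        minor = [row[:i] + row[i+1:] for row in A[1:]]
        total += (-1) ** i * A[0][i] * det(minor)
    return total

A = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]
print(det(A))   # 39, agreeing with Sarrus' rule on this matrix
```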
Clearly, if the first row of A is zero, then det(A) = 0, but even more is true:
Proposition. If A has a zero row, then det(A) = 0.
Proof. We give a proof by induction. Clearly the result holds for 2×2 matrices. Assume
now that the result holds for all (n − 1)×(n − 1) matrices and let A be an n×n matrix.
Let ra be a zero row of A. If a = 1 we are done, since every a1i = 0. If a ≥ 2 then all the
(n − 1)×(n − 1) matrices A1i have a zero row. Thus, det(A1i) = 0 by the inductive
hypothesis. Hence, det(A) = Σ_{i=1}^{n} (−1)^(1+i) a1i det(A1i) = 0 and the proof is
completed. QED
Exercise. Prove that a square matrix with a zero column has zero determinant.
Exercise. Let r be a row of the square matrix A and consider the matrix B obtained by replacing
r with λr in A. Compute det(B) using det(A).
L11.3 Determinants and Gaussian elimination. Since computing determinants using the
definition is a computationally nasty business, we want to find shortcuts and one is pro-
vided by ERO’s. The key tool is the following:
Binet’s Theorem. Let A, B be n×n matrices, then det(A)B = det(A)det(B).
Thus, to know how ERO’s affect the determinant of a matrix, we only need to know the
determinant of all elementary matrices, and this is easily done:
Proposition. If E is the elementary matrix corresponding to an ERO of type
(i) ri ↦ ri + arj , i ≠ j , then det(E) = 1;
(ii) ri ↦ cri , c ≠ 0, then det(E) = c;
(iii) ri ↔ rj , then det(E) = −1.
Thus, ERO’s of type (i) do not change the determinant while ERO’s of type (ii), respectively
(iii), multiply the determinant by c, respectively by −1.
When applying Gaussian elimination to a square matrix A we will always get an upper
triangular matrix T , that is EA = T for a suitable product of elementary matrices E =
E1 . . . Er . Since we know det(Ei ), we can apply Binet's Theorem to reduce the computation
of det(A) to the computation of det(T ).
Proposition. If T is an upper triangular matrix, then det(T ) is the product of the diagonal
elements of T .
Proof. (Hint) We can again provide a proof by induction on n, the size of T . It is enough to
note that T11 is again an upper triangular matrix of size n − 1, while T1j for j ≥ 2 has a
zero column.
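The shortcut just described can be sketched as follows; this is our own Python illustration, not part of the notes, and it uses exact rational arithmetic (fractions) to avoid rounding issues:

```python
from fractions import Fraction

def det_by_elimination(A):
    """Determinant via Gaussian elimination: reduce to upper triangular
    form while tracking how each ERO scales the determinant (type (i)
    leaves it unchanged, a row swap flips the sign), then multiply the
    diagonal entries."""
    M = [[Fraction(x) for x in row] for row in A]   # exact copy
    n = len(M)
    sign = 1
    for c in range(n):
        # find a nonzero pivot in column c, swapping rows if needed
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            return Fraction(0)          # no pivot: determinant is zero
        if p != c:
            M[c], M[p] = M[p], M[c]
            sign = -sign                # ERO of type (iii)
        for r in range(c + 1, n):       # EROs of type (i): det unchanged
            f = M[r][c] / M[c][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    prod = Fraction(1)
    for i in range(n):
        prod *= M[i][i]
    return sign * prod
```

For a single row swap, e.g. the matrix with rows (0, 1) and (1, 0), the tracked sign gives determinant −1, as the Proposition predicts.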
L11.4 Inverse.
We can now prove our main result.
Theorem. A matrix A is invertible iff det(A) ≠ 0.
Proof. Let A be an n×n matrix. If det(A) ≠ 0, then A ∼ T where T is an upper triangular ma-
trix with all diagonal elements different from zero. Hence, r(A) = r(T ) = n and A is invertible.
If A is invertible, then there exists A−1 such that AA−1 = In . Thus, using Binet's Theorem,
det(A)det(A−1 ) = det(In ) = 1 and hence det(A) ≠ 0.
Exercise. If A is invertible, compute the determinant of the inverse matrix.
Determinants also provide a way to compute the inverse of an invertible matrix. This is
based on a generalization of the properties of the mixed product.
Laplace’s Theorem. Let A be a n×n matrix, then
!
(i) !ni=1 (−1)1+i aji det(Aji ) = det(A) for all j
(ii) ni=1 (−1)1+i aji det(Ali = 0) for all j and l such thatj ∕= l .
In words, (i) says that we can compute det(A) using any row and not only the first one,
while (ii) says that mixing the Ali with a row different from rl gives zero.
We can define the adjoint matrix as done in a previous lecture and use it to compute the
inverse thanks to Laplace's Theorem.
Exercise. Given an n×n matrix A , let Ã = (ãij ) be its adjoint matrix, where ãij = (−1)^{i+j} det(Aij ) .
Prove that AÃ⊤ = det(A)In ; thus, if det(A) ≠ 0 , A−1 = (1/det(A)) Ã⊤ .
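The adjoint formula lends itself to a short sketch; the helper names minor, det and inverse below are ours, and this is an illustration of the formula in the exercise, not an efficient algorithm:

```python
from fractions import Fraction

def minor(A, i, j):
    """Submatrix A_ij: delete row i and column j (0-indexed)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Laplace expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j))
               for j in range(len(A)))

def inverse(A):
    """Inverse via the adjoint: A^{-1} = (1/det A) * (cofactor matrix)^T."""
    n = len(A)
    d = Fraction(det(A))
    assert d != 0, "matrix is not invertible"
    # cofactor matrix (the entries (-1)^{i+j} det(A_ij))
    cof = [[(-1) ** (i + j) * det(minor(A, i, j)) for j in range(n)]
           for i in range(n)]
    # transpose the cofactor matrix and divide by det(A)
    return [[cof[j][i] / d for j in range(n)] for i in range(n)]
```

Multiplying inverse(A) by A should return the identity matrix, which is exactly the statement AÃ⊤ = det(A)In of the exercise.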
Exercise. For which values of t are the following matrices invertible? For the values of t for
which the matrix is invertible, find the inverse.
(i) the matrix with rows (1, 0, 2), (0, t, 0), (3, 0, 4);
(ii) the matrix with rows (t, 1, 1), (1, t, 1), (1, 1, t).
L12.1 Closure. We continue to use Rn to denote either the set R1,n of row vectors, or the set
of column vectors Rn,1 . The definitions in this lecture apply equally to both cases, though
at times it is best to specify one or the other.
Definition. A subspace of Rn is a nonempty subset V that is ‘closed’ under addition and
multiplication by a constant, meaning that these operations do not allow one to escape from
the subset V (like a room with closed doors!):
(S1) u, v ∈ V ⇒ u + v ∈ V ,
(S2) a ∈ R, v ∈ V ⇒ av ∈ V .
Note that every subspace contains the null vector: given any v ∈ V , conditions (S1) and (S2) give
0 = v + (−1)v ∈ V.
Moreover, the singleton set {0} consisting of only the null vector is always a subspace. It
is called the null subspace or zero subspace and any other subspace of Rn must have infinitely
many elements.
Warning: do not confuse the null subspace with the empty set ∅ , which is not counted as a
subspace.
At the other extreme is Rn itself. There is no doubt that this is a subspace, as conditions (S1)
and (S2) are satisfied by default: the vectors u+v and av are certainly in Rn as they have
nowhere else to go!
Example. To test whether a subset of Rn is a subspace, check first that it contains 0 . Be careful
though; here are two subsets of the plane that both contain 0 but are not subspaces:
(a) A = {(x, y) ∈ R2 : x ≥ 0 and y ≥ 0} , geometrically the first quadrant; it satisfies (S1) but not
(S2).
(b) B = {(x, y) ∈ R2 : xy = 0} , geometrically the union of the two axes; it satisfies (S2) but not
(S1).
Proposition. For any u1 , . . . , uk ∈ Rn , the set L {u1 , . . . , uk } of their linear combinations is a
subspace of Rn .
Proof. For simplicity, suppose that we have only two vectors u1 = u, u2 = v . Two arbitrary
elements of L {u, v} are then au+bv , cu+dv , and their sum
(a + c)u + (b + d)v
again belongs to L {u, v} . Similarly, λ(au + bv) = (λa)u + (λb)v ∈ L {u, v} . QED
The converse of this result is valid, namely that any subspace of Rn can be expressed in
the form L {u1 , . . . , uk }. To see this, one chooses a succession u1 , u2 , . . . of vectors in V ,
preferably in such a way that each one is not a LC of the previous ones. We shall explain
this better in the next lecture.
L12.2 Solution spaces. The set of solutions of a homogeneous system considered previ-
ously had the form V = L {u, v}, where u, v were two column vectors, and is therefore a
subspace. But there is a more basic reason for this:
Proposition. Given a matrix A ∈ Rm,n , the set
{x ∈ Rn,1 : Ax = 0} (1)
of solutions of the associated homogeneous linear system is always a subspace of Rn,1 .
Proof. This follows from the corresponding properties of matrix multiplication. If x, y are
solutions then
A(x + y) = Ax + Ay = 0 + 0 = 0,
so x + y is a solution too. Similarly,
A(ax) = a(Ax) = a0 = 0,
and ax is a solution for any a ∈ R. QED
Definition. The subspace (1) is called the null space or kernel of the matrix A, and denoted
Ker A.
Example. Let W denote the set of vectors (x, y, z) that satisfy x + y + z = 0 . Since this is
effectively a linear system (with m = 1 and n = 3 ), W is a subspace of R3 . But we can easily
express it as a LC by picking a couple of elements in it. Let u = (1, −1, 0) and v = (0, 1, −1) .
Both lie in W since their entries add up to 0. But we claim that any element (x, y, z) of W is a
LC of u and v . Indeed,
(x, y, z) = (x, −x − z, z) = xu − zv,
as claimed. Thus W = L {u, v} .
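The claim of the example can be checked mechanically. The small Python sketch below (the function name is ours) tests whether a triple (x, y, z) equals xu − zv:

```python
def in_span(x, y, z):
    """Check the claim from the example: a triple (x, y, z) with
    x + y + z == 0 should equal x*u - z*v for u = (1,-1,0),
    v = (0,1,-1)."""
    u, v = (1, -1, 0), (0, 1, -1)
    combo = tuple(x * a - z * b for a, b in zip(u, v))
    return combo == (x, y, z)
```

For example, (2, −5, 3) lies in W (its entries sum to 0) and indeed equals 2u − 3v, while (1, 1, 1) does not.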
Warning: The solution set is only a subspace when the system is homogeneous. For an inho-
mogeneous system Ax = b, the solution set has the form
{x0 + v : v ∈ Ker A}.
Here, x0 is any particular solution of the inhomogeneous equation Ax = b; the difference
of any two such solutions x0 , x1 belongs to Ker A because A(x1 − x0 ) = b − b = 0.
L12.3 Subspaces defined by a matrix. Given a matrix A, two separate collections of vectors
are staring us in the face:
the rows r1 , · · · , rm ∈ R1,n of A, and
the columns c1 , · · · , cn ∈ Rm,1 of A.
These give rise to two respective subspaces that complement the one already defined in (1).
Definition. With the notation above,
(i) the row space of A, denoted Row A, is L {r1 , · · · , rm } ⊂ R1,n , and
(ii) the column space of A, denoted Col A, is L {c1 , · · · , cn } ⊂ Rm,1 .
More informally, Row A is a subspace of Rn , whereas Col A is a subspace of Rm .
Each row of A corresponds to an equation of the linear system with augmented matrix
(A | 0). We already know that there are many ways to transform this system into an equiv-
alent one with the same solutions. The next result formalizes the fact that it is the row space
Row A (rather than the individual rows of A) that determines the solution space Ker A.
Lemma. Ker A = {x ∈ Rn,1 : rx = 0 for all r ∈ Row A}.
Proof. Since Ax is the column vector with entries r1 x, · · · , rm x,
x belongs to Ker A iff ri x = 0 for all i. This implies that rx = 0 for any r ∈ Row A since
such an r is a LC of the rows r1 , · · · , rm . Conversely, if rx = 0 for all r ∈ Row A then
certainly ri x = 0 for all i, and so Ax = 0. QED
This is especially relevant in the case in which B is a step-reduced matrix obtained by ap-
plying ERO’s to A. Notice that the statement A ∼ B forces the matrices to have the same
size – one could relax this requirement (and retain the Theorem’s validity) by introducing a
fourth ERO, that of deleting null rows.
Warning. A ∼ B does not imply that Col A = Col B ; to see this reduce the matrix with rows 0
and 1 (a single column).
Theorem. Let A and B be matrices of the same size. Then the following are equivalent:
(i) A ∼ B ; (ii) Row A = Row B ; (iii) Ker A = Ker B .
Proof. To prove that (i) ⇒ (ii) recall that A ∼ B gives A = EB where E is the product of
elementary matrices and thus each row of A is a LC of rows of B , hence Row A ⊆ Row B .
The opposite inclusion follows since we also have B ∼ A and we can repeat the same
argument.
To prove that (ii) ⇒ (i) we note that the equality Row A = Row B is equivalent to the fact
that each row of A is a LC of rows of B , and vice versa, hence A ∼ B .
To prove that (ii) ⇒ (iii), we use (i) to write B = EA with E invertible; then the systems
Ax = 0 and Bx = (EA)x = 0 have the same solution sets (remember that E is invertible!).
Finally, to prove that (iii) ⇒ (ii) we use results and ideas about bases and dimension (see
further lectures). It is possible to find matrices A′ and B ′ in such a way that
Row A′ = Ker A and Row B ′ = Ker B
and, moreover,
Ker A′ = Row A and Ker B ′ = Row B.
Thus the result follows applying (ii) to the matrices A′ and B ′ .
QED
U = L {(1, 2, −1, 3), (2, 4, 1, −2), (3, 6, 3, −7)} , V = L {(1, 2, −4, 11), (2, 4, 0, 14)} .
Is it true that U ⊆ V or V ⊆ U ?
4. Show that
Col A = {x⊤ ∈ Rm,1 : x ∈ Row(A⊤ )}.
This means that Col A is effectively the same as Row(A⊤ ). We could define a fourth
subspace Ker(A⊤ ) = {x ∈ Rm,1 : x⊤A = 0}, but we have enough work to do studying Row A
and Ker A for the time being!
Notes 13 – Bases and dimension
The concepts of linear combination and linear independence are combined in the definition
of the basis of a subspace V of Rn . Any two bases have the same number of elements, called
the dimension. When V is represented as the row space of a matrix A, its dimension equals
the rank of A.
L13.1 Redundant elements. Describing a subspace as L {u1 , · · · , uk } is all very well, but
there are infinitely many ways to choose the representative vectors u1 , · · · , uk . Suppose that
{u, v} is a linearly independent set, and consider the subspace
V = L {u, 2u, u+v, 0, u−7v} .
The right-hand side is a bit ridiculous since most of its elements are redundant. We can
make the list of vectors more effective as follows:
Retain the non-zero vector u.
Discard 2u because it is already in L {u}.
Retain u + v because it is not in L {u}.
Discard 0 (since the null vector is already present in L {u, v}!).
Discard u−7v as it too is in L {u, v}.
We are finished with V = L {u, u+v}, which of course equals L {u, v} though it may be
that u+v is simpler than v in a numerical example.
Definition. Let V be a subspace of Rn . A basis of V is a linearly independent set {v1 , · · · , vk }
such that V = L {v1 , · · · , vk }.
L13.2 Bases of Rn . The definition of basis makes perfect sense if we take the subspace
V = Rn (thought of as either row or column vectors), and there are infinitely many bases
to choose from. But the most obvious is the one consisting of the rows or columns of the
matrix In . In particular,
Definition. The canonical basis of Rn,1 consists of the columns of In , and its individual
elements are denoted e1 , . . . , en .
Thus,
e1 = (1, 0, · · · , 0)⊤ , e2 = (0, 1, · · · , 0)⊤ , . . .
It is obvious that any v ∈ Rn,1 can be written in one and only one way as a LC of this basis:
v = (a1 , · · · , an )⊤ = a1 e1 + · · · + an en .
But this property holds for any basis {v1 , . . . , vn } of any subspace. By property (B1), v can
certainly be written in some way as a LC of the basis, and by (B2),
v = a1 v1 + · · · + an vn = b1 v1 + · · · + bn vn
⇒ (a1 −b1 )v1 + · · · + (an −bn )vn = 0
⇒ a1 −b1 = 0, · · · , an −bn = 0.
Exercise. Is B = {(2, −1, −1), (1, −2, 1), (1, 1, −2)} a basis of R1,3 ?
L13.3 Finding bases by reduction. Bases can in theory be computed by the ‘discard/retain’
method described above. But often it is more effective to use row reduction. If B is step-
reduced, we know that the nonzero rows of B are LI. They certainly generate Row B be-
cause the missing rows are null! Thus we have the
Proposition. The nonzero rows of a step-reduced matrix B form a basis of Row B .
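The Proposition suggests a practical recipe: row-reduce, then keep the nonzero rows. Here is a Python sketch of this recipe (the helper names are ours; exact arithmetic with fractions keeps the pivots clean):

```python
from fractions import Fraction

def row_reduce(A):
    """Gaussian elimination to reduced step form.  The nonzero rows of
    the result form a basis of Row A, as in the Proposition."""
    M = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(M), len(M[0])
    r = 0                                       # next pivot row
    for c in range(cols):
        p = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if p is None:
            continue                            # no pivot in this column
        M[r], M[p] = M[p], M[r]
        M[r] = [x / M[r][c] for x in M[r]]      # make the pivot 1
        for i in range(rows):                   # clear the pivot column
            if i != r and M[i][c] != 0:
                M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[r])]
        r += 1
    return M

def row_space_basis(A):
    """Basis of Row A: the nonzero rows of the step-reduced matrix."""
    return [row for row in row_reduce(A) if any(x != 0 for x in row)]
```

For the matrix with rows (1, 2), (2, 4), (1, 3), the reduction leaves two nonzero rows, so its row space has a basis of two vectors.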
L13.4 Dimension. First we reassure ourselves that any subspace of Rn has a basis.
Theorem. Let V be a subspace of Rn which is not null. Then V has a basis consisting of at
most n elements.
Recall that if the rows of a matrix A ∈ Rm,n are LI then r(A) = m. A deeper fact we have
seen is that if A, B are two matrices of the same size with Row A = Row B then A ∼ B .
Corollary. Let V be a subspace of Rn that is not null. Any two bases of V have the same
number of elements.
Proof. We give a proof by contradiction. Given two bases {a1 , . . . , aℓ } and {b1 , . . . , bm }
assume that ℓ < m.
Form the matrix A of size (ℓ + m)×n whose rows are b1 , . . . , bm followed by a1 , . . . , aℓ .
Clearly r(A) ≥ m since {b1 , . . . , bm } is LI. Now apply ERO's (row swaps) to A in order to
obtain the matrix B whose rows are a1 , . . . , aℓ followed by b1 , . . . , bm .
Clearly A ∼ B and since bi ∈ L {a1 , . . . , aℓ } for all i we also have B ∼ C , where C has rows
a1 , . . . , aℓ followed by m zero rows. But then r(A) = r(C) ≤ ℓ < m, contradicting r(A) ≥ m.
QED
The number of elements in a basis is called the dimension of V , and we write it as dim V .
For a subspace of Rn , we know that dim V ≤ n. Thus,
V = L {u1 , . . . , uk } with {u1 , . . . , uk } LI and k = dim V.
If V = {0} then we set dim V = 0, and (by convention) declare ∅ to be a basis.
Our final big result before turning to more geometrical applications is
Theorem. For any matrix A of size m × n,
dim(Row A) = r(A) and dim(Ker A) = n − r(A).
Proof. We may suppose that A is step-reduced as none of Row A, Ker A, r(A) changes un-
der ERO’s. The nonzero rows of A then form a basis of Row A, whence the first equality.
The second equality is then a restatement of (RC2), whereby the solutions of the homoge-
neous system Ax = 0 depend on n − r(A) free parameters. More precisely, a basis of
Ker A is given by the solutions obtained by setting one free parameter equal to 1 and all the
others equal to 0.
We have also stated that the marked columns of the step-reduced matrix B (of which there are
r(B) in number) form a basis of Col B , though the latter does in general change with ERO's.
of two row equivalent matrices satisfy the same linear relations, so the same columns will
form a basis of Col A. Thus,
dim(Col A) = r(A).
This result is important as it means all the definitions and procedures that we carried out
would have given the same results had we interchanged the roles of rows and columns. We
shall return to study the column space of a matrix in Part II.
U = L {(1, 3, −2, 2, 3), (1, 4, −3, 4, 2), (2, 3, −1, −2, 9)} ,
V = L {(1, 3, 0, 2, 1), (1, 5, −6, 6, 3), (2, 5, 3, 2, 1)} .
Let W denote the subspace generated by all six row vectors. Find dim W .
Let W denote the subspace consisting of vectors satisfying all three equations. Is it true that
dim W = 5?
4. Find a basis of R4 that contains both a basis for U and a basis of V , where
U = {(x, y, z, t) ∈ R4 : x − 2z = y = 0},
V = L {(0, 2, 1, −1), (1, −2, 1, 1), (1, 2, 3, −1), (1, 2, 7, 1)} .
5. Explain carefully why the marked columns of a step-reduced matrix B form a basis of
Col B .
Notes 14 – Vector Spaces
The theory of linear combinations, linear independence, bases, and subspaces that we have
studied in relation to Rn can be generalized to the more general study of vector spaces. Any
subspace of Rn (including of course Rn itself) is an example of a vector space, but there are
many others including sets of matrices, polynomials and functions.
L14.1 Motivation. A subspace of Rn is the prime example of a vector space, but there are a
number of reasons for discussing the general definition, namely
(i) to emphasize aspects of the theory that do not depend upon the choice of a specific basis,
(ii) to allow the use of scalars that are different from real numbers,
(iii) to extend the theory to function spaces of infinite dimensions.
We shall explain each of these points in turn.
(i) The whole description of Rn is modelled on the existence of its canonical basis. To be
specific, consider Rn,1 and let ej denote the j th column of the identity matrix In . Then a
typical element of Rn,1 is given by
! $
x1
" x2 %
v=" %
# · & = x 1 e1 + x 2 e2 + · · · + x n en ,
xn
and is represented by its coefficients relative to the basis {e1 , . . . , en }. But when we wish to
describe subspaces of Rn,1 there is a need to work with other bases. In fact, any subspace of
Rn is a vector space in its own right. In general it is important to be able to change basis; in
this way, the abstract concept of vector space comes into its own.
(ii) The ‘scalars’ that are used to multiply vectors in the definition of a vector space need
not be real numbers. The set of scalars is required to be what is called a field, of which R
is only one example. Other examples of fields include the set Q of rational numbers (p/q
where p, q are integers with q ≠ 0), the set C of complex numbers (x + iy with x, y ∈ R and
i = √−1), and the ‘binary set’ B = {0, 1} (also called F2 ) consisting of just two elements.
(iii) In this course, we shall only work with vector spaces of finite dimension. We shall
explain that such a vector space V is characterized by the existence of a basis of finite size
n. The choice of such a basis makes V closely resemble the set Rn or (for the other choices
of fields mentioned above) Qn , Cn or the finite set B n of size 2n . However, in analysis, the
most important examples of vector spaces do not fall into this category and once again one
needs to rely upon the abstract and basis-independent theory.
L14.2 The definition of a vector space. In order to define a vector space in general, one first
needs a field F of scalars. For the moment, we shall suppose that F is one of R, Q, C, B . The
important thing about F one needs to know is that its elements can be added, subtracted,
multiplied and divided, and that there are two special ones, 0 and 1.
A vector space is a set V in which it is possible to form
(i) the sum u + v of u, v ∈ V ,
(ii) the product av of a ∈ F with v ∈ V .
The elements of V are called ‘vectors’ even though they do not necessarily resemble vectors
in Rn . The two basic operations are subject to a number of rules that formalize the ones that
are completely obvious in the case F = R and V = Rn we are most familiar with. There is
no need to memorize these rules, as they are quickly absorbed in practice:
Definition. V is said to be a vector space over F provided
(a) addition of vectors behaves like addition of real numbers in that it satisfies
(u + v) + w = u + (v + w), u + v = v + u, for all u, v, w ∈ V, (1)
there is a zero or null element 0 for which
0+v =v for all v ∈ V,
and each vector v ∈ V has a ‘negative’ −v with the property that v + (−v) = 0;
(b) the ‘internal’ operations of F are compatible with (i) and (ii) in the sense that
(a + b)v = av + bv
(ab)v = a(bv), (2)
a(u + v) = au + av,
and finally,
1v = v.
On this page, we have been careful to type elements of V (but not F) in boldface, though in
handwriting one does not normally distinguish elements of V in any way. It is important
to observe that instances of both addition and multiplication in (2) occur with different
meanings. The conditions in (a), taken together, assert that the operation + makes V into
what is called a commutative (or abelian) group. Of course, we write u + (−v) as u − v , and
this process defines subtraction in a vector space.
Here is a simple consequence of the axioms above:
0v + 0v = (0 + 0)v = 0v.
The various rules in (a) allow us to subtract 0v (without knowing what this equals) to get
0v = 0,
so that 0v is always the null vector.
Example. To keep matters familiar, we first suppose that F = R , in which case V is called a real
vector space. Certainly V = Rn satisfies the definition above with the usual operations that we
have used repeatedly.
But another example is to take V to be the set Rm,n of matrices of size m × n . We explained
in the first lecture how to add such matrices together, and multiply them by scalars. From the
point of view of vector spaces (in which multiplication of matrices plays no part), there is little
difference between Rm,n and the space (of say row vectors) R1,mn . For example, we can pass
from R2,3 to R6 by the correspondence sending the matrix with rows (a, b, c), (d, e, f ) to
(a, b, c, d, e, f ).
It does not matter whether we use the left-hand or right-hand description to define the two
basic operations – the result is the same. But we could equally well have chosen to represent the
matrix with (c, f, b, e, a, d) ; for this reason the vector spaces Rm,n , R1,mn are not identical.
L14.3 Polynomials and functions. A more original example is obtained using polynomials.
Recall that a polynomial is an expression of the form
p(x) = a0 + a1 x + a2 x2 + · · · + an xn . (3)
We are most familiar with the case in which the coefficients are real numbers, but they could
belong to a field F. The polynomial has degree equal to n provided an ≠ 0. We have written
p in boldface to emphasize that it is to be treated as a ‘vector’, though it is also the function
x %→ p(x);
the choice of symbol for the variable is irrelevant, and one often writes p(t).
The constant term
a0 = p(0)
of the polynomial is none other than the value of the function at 0. It vanishes if and only if
the polynomial p(x) is divisible by x.
Proposition. The set Fn [x] of polynomials (in a variable x) of degree no more than n with
coefficients in a field F is a vector space over F.
Proof. There is a natural way to define the basic operations, using a rule that works for any
functions. Namely, we set
(p + q)(x) = p(x) + q(x), (ap)(x) = a p(x).
Using (3), it is obvious that the sum of two polynomials is a polynomial, and that the product
of a polynomial with a scalar is a polynomial. In practice, it is just a matter of applying the
operations coefficient-wise, as in the example
(1 + x)2 + 3(1 + x + 1/2 x2 + 1/6 x3 ) = 4 + 5x + 5/2 x2 + 1/2 x3 ,
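The coefficient-wise operations can be sketched in Python, representing a polynomial by its list of coefficients [a0, a1, . . . ] (a convention of ours, not of the notes):

```python
from fractions import Fraction

def poly_add(p, q):
    """Coefficient-wise sum of polynomials given as coefficient lists
    [a0, a1, ...]: a sketch of the vector-space operations on Fn[x]."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))          # pad to a common length
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def poly_scale(c, p):
    """Scalar multiple of a polynomial, again coefficient-wise."""
    return [c * a for a in p]

# the worked example: (1+x)^2 + 3(1 + x + x^2/2 + x^3/6)
p = [1, 2, 1]                            # coefficients of (1+x)^2
q = [1, 1, Fraction(1, 2), Fraction(1, 6)]
result = poly_add(p, poly_scale(3, q))   # should be 4 + 5x + 5/2 x^2 + 1/2 x^3
```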
Recall that in a field every nonzero element a has a multiplicative inverse a−1 , with a a−1 = 1.
Every field must contain at least two elements: the additive identity (usually written 0) and
the multiplicative identity (written 1). If there are no other elements, we obtain B = {0, 1}
that is a field with the operations
0 + 0 = 0, 0 + 1 = 1 = 1 + 0, 1 + 1 = 0,
(6)
0 0 = 0, 0 1 = 0 = 1 0, 1 1 = 1.
Example. Here is an example of a field F with 4 elements. It will be defined as a vector space
over a simpler field, namely B . The set F consists of all linear combinations
b1 f1 + b2 f2 , b1 , b2 ∈ B,
in which we decree that f1 , f2 are independent. Although b1 , b2 are arbitrary, there are only two
choices for each. We can therefore list all four elements of F as row vectors
(0, 0) = 0f1 + 0f2 = 0,
(1, 0) = 1f1 + 0f2 = f1 ,
(0, 1) = 0f1 + 1f2 = f2 ,
(1, 1) = 1f1 + 1f2 = 1.
(On the right, we have avoided boldface to emphasize that the elements are to be treated like
numbers, not vectors.) Multiplication is carried out component-wise, using the operations of B .
The reason for also calling the last element 1 is that (1, 1)(a, b) = (1a, 1b) = (a, b) for a, b ∈ B .
The full multiplication table for F is symmetric because of (5):
· 0 f1 f2 1
0 0 0 0 0
f1 0 f1 0 f1
f2 0 0 f2 f2
1 0 f1 f2 1
If p is a prime number, the set {0, 1, 2, . . . , p − 1} with addition and multiplication modulo p
(‘clockface arithmetic’) becomes a field with exactly p elements. Applying the construction
of the Example with f1 , . . . , fk in place of f1 , f2 shows that there is a field with pk elements
for any positive integer k . It turns out that this is essentially the only field with pk elements.
Moreover, any finite field has pk elements for some prime number p ≥ 2 and integer k ≥ 1.
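For the ‘clockface’ fields just mentioned, inverses can be computed using Fermat's little theorem: a^(p−1) = 1 (mod p) whenever a is not divisible by the prime p, so a^(p−2) is the inverse of a. A one-line Python sketch (the function name is ours):

```python
def inv_mod(a, p):
    """Multiplicative inverse of a in the field {0, 1, ..., p-1} with
    arithmetic modulo a prime p.  By Fermat's little theorem,
    a^(p-2) * a = a^(p-1) = 1 (mod p), so a^(p-2) is the inverse.
    Assumes p is prime and a is not divisible by p."""
    return pow(a, p - 2, p)
```

For instance, in the field with 7 elements the inverse of 3 is 5, since 3 · 5 = 15 = 1 (mod 7).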
1. Show that the set of all differentiable functions f : (0, 1) → R is a real vector space. By
considering polynomials, or otherwise, show that it is not finite-dimensional.
The only novelty is that the coefficients now belong to F . Whilst the ui ’s form a finite set,
the right-hand side of (1) will be infinite if F is.
Subspaces of V are defined exactly as for Rn :
Definition. Let V be a vector space over a field F . A non-empty subset U of V is a
subspace iff
(S1) u, v ∈ U ⇒ u + v ∈ U ,
(S2) a ∈ F, u ∈ U ⇒ au ∈ U .
It follows that a subspace is a vector space in its own right: the operations (S1) and (S2) will
satisfy all the vector space axioms because V itself does. In practice, subspaces are again
defined either by linear combinations or linear equations.
Exercise. Let V = R3,3 be the space of 3 × 3 matrices. Let S = {A ∈ R3,3 : A⊤ = A} be the
subset consisting of symmetric matrices. Check that S is a subspace of V , and find matrices Ai
such that S = L {A1 , . . . , A6 }.
Definition. A vector space V (for example, a subspace V of some other vector space W )
is finite-dimensional, or finitely-generated, if there exists a finite subset {u1 , · · · , uk } such
that V = L {u1 , · · · , uk }.
Example. Consider R2,3 again. This vector space is finite dimensional because any matrix of
size 2×3 is a LC of the six matrices
E11 , E12 , E13 , E21 , E22 , E23 , (2)
where Eij is the matrix with entry 1 in position (i, j) and 0 elsewhere. Indeed, the matrix with
rows (a1 , a2 , a3 ), (a4 , a5 , a6 ) equals
a1 E11 + a2 E12 + a3 E13 + a4 E21 + a5 E22 + a6 E23 . (3)
Whilst the unordered set (2) is an obvious basis, there is no ‘right’ or ‘wrong’ way to order its
elements into a list.
Warning: there exist vector spaces which are not finitely-generated; for example, R[x] is
not finitely-generated.
Exercise. Prove that R[x] is not finitely-generated. (Hint: for any finite set of polynomials
{u1 , · · · , uk } let d = max{deg ui , i = 1, . . . , k} and note that L {u1 , · · · , uk } ⊆ Rd [x] .)
The actual dimension of V in the definition above turns out (we shall see) to be the smallest
number of vectors needed to ‘generate’ V , in which case the resulting set is LI. Elements
v1 , . . . , vk in a vector space V are linearly independent (LI) if there is no nontrivial linear
relation between them. More formally,
Definition. Let {v1 , . . . , vk } ⊂ V . We say that {v1 , . . . , vk } is LI if and only if
a1 v1 + · · · + ak vk = 0 (with ai ∈ F ) ⇒ a1 = · · · = ak = 0.
For example, the six matrices in (2) are LI because (3) can only be null if all the coefficients
ai are zero.
Definition. A finite set {v1 , . . . , vn } of elements of a vector space V is a basis of V if
(B1) it generates V in the sense that V = L {v1 , . . . , vn }, and
(B2) it is LI.
Recall that any two bases of a subspace of Rn have the same number of elements. We
shall explain shortly that the same result holds for any finite-dimensional vector space; the
dimension of V is then defined to be this number.
Exercise. (i) Guided by (2), show that Rm,n has a basis consisting of mn matrices.
(ii) Verify that {1, x, . . . , xn } is a basis of Rn [x] , by observing that if a0 + a1 x + · · · + an xn equals
the zero polynomial then it has to vanish for all x , thus ai = 0 for all i.
Given a mapping f : A → B , its image
Im f = {f (a) : a ∈ A },
denotes the subset of B consisting of those elements ‘gotten’ from A . Also, given b ∈ B ,
its inverse image
f −1 (b) = {a ∈ A : f (a) = b}
is the subset of A consisting of all those elements that map to b. Then f is said to be
(i) surjective or onto if Im f = B ,
(ii) injective or one-to-one if f (a1 ) = f (a2 ) ⇒ a1 = a2 .
Thus f is onto iff f −1 (b) is nonempty for all b ∈ B . If f is both surjective and injective then
it is called bijective. This means that there exists a well-defined inverse mapping
f −1 : B → A ,
Lazy Lemma. A linear mapping f : V → W is injective if and only if
f (v) = 0 ⇒ v = 0. (5)
Proof. Equation (4) tells us that (5) is a special case of the injectivity condition. So if f is
injective and f (v) = 0, then f (v) = f (0) and thus v = 0. Conversely, suppose that (5)
holds. If f (v1 ) = f (v2 ) then, because f is linear, f (v1 − v2 ) = 0, so v1 − v2 = 0 and f is
injective. QED
L15.3 Bases and linear mappings. We now use linear mappings to interpret the conditions
that define a basis. Suppose that v1 , . . . , vn are any n elements of a vector space V . Define
a mapping f : R1,n → V by
(a1 , . . . , an ) ↦ a1 v1 + · · · + an vn . (6)
It is easy to check that this mapping is linear. Then (B1) asserts that it is surjective, and (B2)
implies that it is injective (with the help of the Lazy Lemma).
A bijective linear mapping is also called an isomorphism, so a basis of V defines an isomor-
phism f from Rn to V . Observe from (6) that f maps each element of the canonical basis
of Rn onto an element of the chosen basis of V . If {v1 , · · · , vn } is a basis, we may use f to
identify Rn with V , and to transfer properties of Rn to V . This enables one to prove results
such as the following:
Theorem. Let V be a vector space with a basis of size n. We have
(i) if m vectors v1 , . . . , vm of V are LI then m ≤ n,
(ii) if V = L {v1 , . . . , vp } then n ≤ p.
In particular, any basis of V has n elements, and V is said to have dimension n.
Proof. We already know that this is true for V = Rn . For, we represent the vectors as rows
of a matrix A with size m × n or p × n, and use the theory of the rank r(A) of A. Part (i)
implies that r(A) = m, and so m ≤ n. Part (ii) implies that r(A) = n and so n ≤ p.
To deduce (i) in general, let {u1 , . . . , un } be a basis of V with n elements that we know
exists by hypothesis and consider the linear mapping f : Rn → V defined as
f (a1 , . . . , an ) = a1 u1 + · · · + an un
and note that f is injective. Assume that v1 , . . . , vm are LI in V . Then f −1 (v1 ), · · · , f −1 (vm )
are LI in Rn because
0 = a1 f −1 (v1 ) + · · · + am f −1 (vm ) = f −1 (a1 v1 + · · · + am vm )
⇒ 0 = a1 v1 + · · · + am vm ⇒ a1 = · · · = am = 0.
Lazy Corollary. Suppose that we already know that dim V = n. Then in checking whether
a set of n elements is a basis we only need bother to check one of (B1), (B2).
A practical way of extending an LI set to a basis is provided by the following result:
Suppose that v1 , . . . , vk are LI in a vector space V . Then v1 , . . . , vk , vk+1 are LI iff
vk+1 ∉ L {v1 , . . . , vk } .
1. Let D: R3 [x] → R3 [x] denote the mapping given by differentiating: D(p(x)) = p′ (x).
Show that D is linear, and determine the images of the four polynomials pi in the next
exercise. Is D injective? Is it surjective?
2. Consider the polynomials:
Verify that {p1 (x), p2 (x), p3 (x), p4 (x)} is a basis of R3 [x] and express x2 as a LC of the ele-
ments of this basis.
3. Let A ∈ R3,3 and define f : R3,1 → R3,1 by f (v) = Av . Prove that f is linear (such
examples will be the subject of the next lecture), and use the theory of linear systems to
show that f is injective iff r(A) = 3.
satisfy pi (j) = δij (meaning 1 if i = j and 0 otherwise). Deduce that they are LI. Explain
carefully why this implies that they form a basis of R3 [x].
Notes 16 – Linear Mappings and Matrices
In this lecture, we turn our attention to linear mappings that may be neither surjective nor
injective. We show that once bases have been chosen, a linear map is completely determined
by a matrix. This approach provides the first real justification for the definition of matrix
multiplication that we gave in the first lecture.
L16.1 The linear mapping associated to a matrix. First, we point out that any matrix defines
a linear mapping.
Lemma. A matrix A ∈ Rm,n defines a linear mapping f : Rn → Rm by regarding elements
of Rn as column vectors and setting f (v) = Av .
Proof. The conditions (LM1) and (LM2) are obvious consequences of the rules of matrix
algebra. QED
Our preference for column vectors means that an m × n matrix defines a mapping from Rn
to Rm , so that m, n are ‘swapped over’. Here is an example:
A = the matrix with rows (0, 2, 4), (3, 5, 1) ∈ R2,3 defines f : R3 → R2 with
f (e1 ) = (0, 3)⊤ , f (e2 ) = (2, 5)⊤ , f (e3 ) = (4, 1)⊤ .
The j th column of A represents the image of the j th element of the canonical basis of Rn,1 .
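This observation is easy to check mechanically; the sketch below (the helper name apply is ours) recomputes the images of the canonical basis vectors for the example matrix A:

```python
def apply(A, v):
    """f(v) = Av for a matrix A (list of rows) and a column vector v
    (list of entries)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[0, 2, 4],
     [3, 5, 1]]
e = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # canonical basis of R^{3,1}

# the j-th column of A is the image of the j-th canonical basis vector
images = [apply(A, ej) for ej in e]
```

Here images is [[0, 3], [2, 5], [4, 1]], i.e. exactly the three columns of A.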
This example shows that at times we can use A and f interchangeably. But there is a subtle
difference: in applying f , we are allowed to represent elements of Rn and Rm by row
vectors. Thus it is also legitimate to write
f (x, y, z) = (2y + 4z, 3x + 5y + z). (1)
The last equation shows us how to pass from the rows of A to the definition of f . It turns
out that any linear mapping f : Rn → Rm has a form analogous to (1), from which we can
construct the rows of an associated matrix.
Thus, f (v) is determined by the n images f (vi ) of the basis elements. Moreover, any choice
of n such vectors allows us to define a linear mapping f by means of (2). QED
We have seen that a matrix determines a linear mapping between vector spaces, namely
from Rn to Rm . The Lemma allows us to go backwards and associate a matrix to any linear
mapping from V to W , once we have chosen bases
the null column tells us that D(1) = 0 and the null row tells us that Im D consists of polynomials
of degree no greater than 1.
Exercise. Repeat the previous example using the same bases, but w.r.t. the map G : W → V
such that Gp = p′ . Compare the two matrices.
L16.3 Compositions and products. With the link between linear mappings and matrices
now established, we shall see that composition of linear mappings corresponds to the prod-
uct of matrices. Suppose that B ∈ Rm,n , A ∈ Rn,p , and consider the associated linear
mappings
         g           f
Rm,1 ←−−−− Rn,1 ←−−−− Rp,1
defined by f (u) = Au and g(v) = Bv . (It is easier to understand what follows by writing
the mappings from right to left.) The composition g ◦ f is obviously
B(Au) ←− Au ←− u,
choose bases for each of U, V, W , and let Mf , Mg be the associated matrices (the same basis
of V being used for both matrices). Then we state without proof the
Proposition. Let h = g ◦ f be the composition (5). Then Mh = Mg Mf .
This result is especially useful in the case of a single vector space V of dimension n, and a
linear mapping f : V → V . Such a linear mapping (between the same vector space) is called
a linear transformation or endomorphism. In these circumstances, we can fix the same basis
{v1 , . . . , vn } of V , and consider compositions of f with itself:
Example. Define f : R3 → R3 by f (e1 ) = e2 , f (e2 ) = e3 and f (e3 ) = e1 . Check that the matrix
A = Mf (taken with respect to the canonical basis {e1 , e2 , e3 } ) satisfies A3 = I3 .
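This check is immediate in NumPy (the matrix below encodes f by the rule that column j of Mf is f (ej )):

```python
import numpy as np

# f(e1) = e2, f(e2) = e3, f(e3) = e1, so column j of A = Mf is f(e_j).
A = np.array([[0, 0, 1],
              [1, 0, 0],
              [0, 1, 0]])

A3 = np.linalg.matrix_power(A, 3)
# Applying the cyclic permutation three times gives the identity.
assert np.array_equal(A3, np.eye(3))
```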
L16.4 Nullity and rank. Important examples of subspaces are provided by the following
lemma:
Lemma. Let g: V → W be a linear mapping. Then
(i) g −1 (0) is a subspace of V ,
(ii) Im g is a subspace of W .
Proof. We shall only prove (ii). If w1 , w2 ∈ Im g then we may write w1 = g(v1 ) and
w2 = g(v2 ) for some v1 , v2 ∈ V . If a ∈ F then
f −1 (0) = {v ∈ Rn,1 : Av = 0}
is the solution space of the homogeneous linear system Ax = 0 . We already know that this is
a subspace and we labelled it Ker A. On the other hand, the image of f is generated by the
vectors f (ei ) that are the columns of A :
It follows that the dimension of Im f equals the rank, r(A) , the common dimension of Row A
and Col A.
nullity(g) + rank(g) = n,
Proof. By choosing bases for V and W , we may effectively replace g by a linear mapping
f : Rn → Rm . But in this case, the previous example shows that Ker f = Ker A and Im f =
Col A. We know that dim(Col A) = dim(Row A) = r(A), and (essentially by (RC2)),
dim(Ker A) = n − r(A).
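Numerically, rank and nullity can be read off with `numpy.linalg.matrix_rank`; the matrix below is a hypothetical example, chosen only to illustrate the count:

```python
import numpy as np
from numpy.linalg import matrix_rank

# Hypothetical 3x4 matrix: f maps R^4 to R^3, so n = 4.
A = np.array([[1, 2, 0, 1],
              [0, 1, 1, 1],
              [1, 3, 1, 2]])

n = A.shape[1]
r = matrix_rank(A)     # dim Im f = dim Col A
nullity = n - r        # dim Ker f, by rank-nullity

assert r == 2 and nullity == 2 and r + nullity == n
```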
Write down the associated matrix A, and find v ∈ R4 such that f (v) = (0, 1, 1).
3. Let V = W = R2 [x], and let D be the linear mapping given by D(p(x)) = p′ (x). Find the
matrix MD with respect to the bases: {1, x, x2 } for V and {1, x+1, (x+1)2 } for W .
4. Let f : R3 → R3 be a linear transformation such that
Ker f = L {(1, 0, 0), (0, 1, 0)} and Im f ⊆ L {(0, 1, 0), (0, 0, 1)} .
Find all possible matrices Mf associated to f with respect to the canonical basis of R3 .
5. Find bases for the kernel and the image of each of the following linear mappings:
f : R3 → R2,2 ,   f (x, y, z) = [ x  −y ]
                               [ y   x ]

g : R2,2 → R4 ,   g [ x  y ] = (x − 2y, x − 2z, y + t, x + 2t).
                    [ z  t ]
6. Let f : R3,1 → R3,1 be the linear transformation defined by

A = [ 1 2 −1 ]
    [ 1 1  3 ]
    [ 3 5  1 ] .
(i) Find a vector v1 such that Ker f = L {v1 }.
(ii) Choose v2 , v3 such that {v1 , v2 , v3 } is a basis of R3,1 .
(iii) Check that {f (v2 ), f (v3 )} is a basis of the subspace Im f (it always will be!).
Notes 17 – Operations on Subspaces
Subspaces of vector spaces (including Rn ) can now be conveniently defined as the kernels or
images of linear mappings between vector spaces. This leads us to discuss their properties
in more detail, and compute their dimensions.
L17.1 Intersections and unions. Let W be any finite-dimensional vector space over a field
F (or, if the student prefers, merely Rn with F = R).
Lemma. Let U, V be two subspaces of the fixed vector space W . (i) The intersection U ∩ V
is a subspace of W .
(ii) The union U ∪ V is a subspace of W if and only if U ⊆ V or V ⊆ U (in which case of
course, it equals V or U ).
Re-iterating (i), the intersection of any number of subspaces is always a subspace. All sub-
spaces contain the null vector 0, so at ‘worst’ this subspace will be {0}.
As for unions, there will always exist a smallest subspace of W containing U ∪ V . Any such
subspace must certainly contain all the vectors
u + v, for any u ∈ U, v ∈ V, (1)
by property (S1) of a subspace. Indeed the set of all these simple sums is a subspace:
Definition/Lemma. Let W be a vector space. The sum of two subspaces U, V of W is the
set, denoted U + V , consisting of all the elements in (1). It is a subspace, and is contained
inside any subspace that contains U ∪ V .
Proof. We check properties (S1) and (S2). Typical elements of U + V are u1 + v1 and u2 + v2
with ui ∈ U and vi ∈ V . Their sum is
(u1 + v1 ) + (u2 + v2 ) = (u1 + u2 ) + (v1 + v2 ) ∈ U + V,
and thus (S1) holds. Similarly,
a(u1 + v1 ) = (au1 ) + (av1 ) ∈ U + V,
and thus (S2) holds. Hence, U + V is a subspace.
If X is a subspace that contains U ∪ V then it has to contain all elements u ∈ U and v ∈ V ,
and therefore all elements u + v ∈ U + V . It therefore contains U + V . QED
One also says that U + V is the subspace generated by U and V . This actually gives a clearer
idea of its definition.
In practice, U + V contains any LC of elements drawn from U and V , because such a LC
can always be re-arranged into the form (1). We can also think of U + V as the intersection
of all (typically infinitely many) subspaces containing both U and V .
Example. Consider the subspaces U = L {e1 , e2 } and V = L {e3 , e4 } of R4 . Then U + V = R4
because any vector in R4 can be expressed as the sum
a1 e1 + a2 e2 + a3 e3 + a4 e4 = u + v,   where u = a1 e1 + a2 e2 ∈ U and v = a3 e3 + a4 e4 ∈ V.
This situation is somewhat special, as it is also the case that U ∩ V = {0} . Lots of similar
examples (of sums of two subspaces with zero intersection) can be constructed in any vector
space, once one has a basis to play with.
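Dimensions of sums can be checked numerically: the dimension of a span is the rank of the matrix whose rows are the spanning vectors. A small NumPy sketch of this example (an illustration aid, not part of the notes):

```python
import numpy as np
from numpy.linalg import matrix_rank

e1, e2, e3, e4 = np.eye(4)
U = np.array([e1, e2])   # basis of U, as rows
V = np.array([e3, e4])   # basis of V, as rows

dim_U, dim_V = matrix_rank(U), matrix_rank(V)
dim_sum = matrix_rank(np.vstack([U, V]))   # dim(U + V)

assert dim_sum == 4                        # U + V = R^4
# By Grassmann's dimension formula, dim(U ∩ V) = dim U + dim V - dim(U + V) = 0.
assert dim_U + dim_V - dim_sum == 0
```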
L17.2 Visualizing subspaces in R2 and R3 . We can represent the vector space R2 by points
of the plane, in which the null vector 0 corresponds to the origin. An arbitrary vector in
R2 is represented by the tip of the arrow it defines, placed at the origin. In this way, it is
obvious that any subspace U of R2 is either
(i) the origin itself, corresponding to the zero subspace {0},
(ii) any straight line passing through the origin,
(iii) the whole plane, corresponding to R2 .
In these cases, the dimension of U is respectively 0, 1, 2.
If U1 , U2 are two distinct subspaces of R2 , each of dimension 1, they are both represented
by lines containing the origin. One easily sees that
U1 ∩ U2 = {0} and U 1 + U 2 = R2 .
The last equality follows because any vector in R2 can be expressed as the sum of something
in U1 with something in U2 . (The most obvious case is that in which U1 = L {e1 } and
U2 = L {e2 } correspond to the two axes and (x, y) = xe1 + ye2 ∈ U1 + U2 .)
We can carry out a similar analysis for subspaces V in R3 , representing the latter by points
in space. In this situation, dim V = 1 again gives rise to a straight line through the origin,
but dim V = 2 gives any plane passing through the origin. If V1 , V2 are two such distinct
2-dimensional subspaces (planes through the origin), one easily sees this time that
V1 ∩ V2 = V3 and V 1 + V 2 = R3 ,
where V3 is a line (again containing the origin). Note that in this last case,
L17.3 Dimension counting. Any subspace U is a vector space in its own right and has a
dimension: recall that this equals the number of elements inside any basis of U .
Obvious lemma. If U is a subspace of a vector space (or another subspace) V then dim U ≤
dim V , with equality iff U = V .
This is true because a basis of U can always be extended until it becomes one of V . To do
this we can use the trick that if v1 , . . . , vk are LI and vk+1 is not a LC of v1 , . . . , vk , then
v1 , . . . , vk , vk+1 are LI. In the examples above, a subspace of R2 has dimension 2 only if it is
R2 . Similarly for dimension 3 in R3 .
Much of the theory of bases and dimension was discovered by Hermann Grassmann, in-
cluding the following result dating from around 1860:
Theorem. Let U, V be two subspaces of a finite-dimensional vector space W . Then
dim(U + V ) + dim(U ∩ V ) = dim U + dim V. (2)
This result is illustrated by the following example (whose method is often used as a proof).
Example. Let W = R5 . Consider the two subspaces
We are required to find a basis of R5 that contains both a basis of U and a basis of V . The trick
is to start by finding a basis of U ∩ V . It is easy to see that dim U = 3 and dim V = 4 ; this is
because the homogeneous linear systems have rank 2 and 1. Now, a vector v ∈ R5 belongs to
U ∩ V iff it satisfies all three equations. Since the associated matrix
A = [ 2 −1 −1 0  0 ]     [ 1 −1/2 0 0 3/2 ]
    [ 0  0  0 1 −3 ]  ∼  [ 0   0  1 0  3  ]
    [ 0  0  1 1  0 ]     [ 0   0  0 1 −3  ]
A basis of U ∩ V consists of
w1 = (1/2, 1, 0, 0, 0),   w2 = (−3/2, 0, −3, 3, 1).
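A hedged numerical check of this computation (the matrix A and the basis vectors w1 , w2 are taken from the example above):

```python
import numpy as np
from numpy.linalg import matrix_rank

# Combined coefficient matrix of the three equations cutting out U ∩ V.
A = np.array([[2, -1, -1, 0,  0],
              [0,  0,  0, 1, -3],
              [0,  0,  1, 1,  0]], dtype=float)

w1 = np.array([0.5, 1, 0, 0, 0])
w2 = np.array([-1.5, 0, -3, 3, 1])

assert np.allclose(A @ w1, 0) and np.allclose(A @ w2, 0)   # both solve Ax = 0
assert 5 - matrix_rank(A) == 2                             # dim(U ∩ V) = 2
# Grassmann: dim(U + V) = dim U + dim V - dim(U ∩ V) = 3 + 4 - 2 = 5.
```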
Fancy proof of (2). First consider the Cartesian product P = U × V consisting of ordered pairs
(u, v) with u ∈ U and v ∈ V . This can be made into a vector space using the operations
Is it true that U1 + U2 = R5 ?
By the Remainder Theorem, p(x) belongs to V iff it is divisible by x − 1. What is the dimen-
sion of V ∩ Rn [x] if n ≥ 1?
V1 = L {(1, 3, −2, 2, 3), (1, 4, −3, 4, 2), (2, 3, −1, −2, 9)} ,
V2 = L {(1, 3, 0, 2, 1), (1, 5, −6, 6, 3), (2, 5, 3, 2, 1)} .
f : V → V,
Warning: Since f (0) = 0, it is obvious that the null vector satisfies (1). But the null vector
does NOT count as an eigenvector; for one thing its eigenvalue λ is undetermined. On the
other hand, observe that if v is an eigenvector of f and a ≠ 0 then av is also an eigenvector
(with the same eigenvalue).
Example. Here are two extreme cases:
(i) Suppose f is the identity mapping, so that f (v) = v for all v ∈ V . This is obviously linear,
and every nonnull vector v ∈ V is an eigenvector with eigenvalue 1.
(ii) Define g(v) = 0 for every v ∈ V (the ‘null’ transformation, linear by default). Once again,
every nonnull v ∈ V is an eigenvector, but this time with common eigenvalue 0 .
More interesting examples of eigenvectors can easily be written down if there is a basis of
the vector space V at one’s disposal:
Example. (i) Take V = R2 and define a linear mapping f : R2 → R2 by
By this very definition, e1 and e2 are eigenvectors with eigenvalues 1 and −1 . Geometrically,
f is a reflection in the x -axis, and any reflection in the plane will have two such eigenvectors.
(ii) Suppose that V has dimension 4 and a basis {v1 , v2 , v3 , v4 } . We are at liberty to define
v1 + v2 + v3 + v4 , v1 − v2 + v3 − v4
To show that bases are not essential for the existence of eigenvectors, here is an example in
which the vector space is not finite-dimensional:
Exercise. Let V be the vector space of functions φ: R → R that admit derivatives of all orders.
Then the mapping D: V → V given by
D(φ) = φ′ ,   where φ′ (x) = dφ/dx ,
is linear. Find all the eigenvectors (or rather, ‘eigenfunctions’) of D .
L18.2 Eigenvectors of a square matrix. Suppose that V is a vector space of finite dimension
n with F = R (we shall only consider this case from now on). Once we have chosen a basis
of V (any one will have n elements), we know that we can treat V as if it were Rn . For this
reason, we need only study linear mappings f : Rn → Rn , any one of which is given by
f (v) = Av,
Av = λv. (2)
However, these three are not linearly independent (the second is the first plus the third). Overall,
in this example, we can find three eigenvectors that are LI. We shall see later that any square
matrix of size n × n has at most n eigenvectors that are LI.
Let A ∈ Rn,n , and let I = In be the identity matrix of the same size. Here is a useful trick:
Lemma. If v is an eigenvector of A with eigenvalue λ, then it is an eigenvector of A + aI
with eigenvalue λ + a.
Given a solution v ∕= 0, the nullity of the matrix A − λI will be nonzero. By (RC2) or the
Rank-Nullity Theorem, r(A − λI) < n. Thus, A − λI is not invertible and its determinant
is necessarily 0.
Conversely, det(A − λI) = 0 implies that r(A − λI) < n and Ker(A − λI) contains a nonnull
vector v . The latter satisfies Av = λv . QED
For any square matrix, the constant term of the characteristic polynomial is given by
At the other extreme, the leading term is always (−x)n = (−1)n xn . It is also easy to show
that the coefficient of xn−1 in p(x) equals (−1)n−1 tr A, where tr A denotes the sum of the
diagonal entries. Warning: Some authors (including Wikipedia) define the characteristic
polynomial to be det(xI − A); this is always monic (good!) but attempts to calculate it by
hand are usually fraught with sign errors (bad!).
The statement ‘λ is an eigenvalue of A’ means that there exists v ≠ 0 satisfying (2). Any
such λ must therefore be a root of p(x), and a solution of the characteristic equation
p(x) = 0. (5)
Proof. A is invertible iff det A ≠ 0 iff x = 0 is not a solution of det(A − xI) = 0. QED
Exercise. Give a different proof of the Lemma, avoiding any mention of the determinant and
characteristic polynomial.
If A ∈ Rn,n , then (5) can have at most n roots, and there are at most n distinct eigenvalues.
There may be fewer, as the roots of a polynomial can be repeated, and it is also possible that
pairs of roots occur as complex numbers.
Given A, suppose that λ is a root of the characteristic polynomial, so that p(λ) = 0. From
above, we know that there must exist a nonnull column vector v satisfying (3). We can
therefore find such a v by solving the homogeneous linear system associated to A − λI . This we
shall do in the next lecture (but if you are desperate, do q. 6 below).
Example. We have not spoken much about 4×4 determinants, but these can be expanded along
a row or column into a linear combination of 3 × 3 determinants. Let
A = [  x   0   0    s  ]
    [ −1   x   0    r  ]
    [  0  −1   x    q  ]
    [  0   0  −1  x+p  ] .
Expanding down the first column,
det A = x det [  x   0    r  ]  + 1 · det [  0   0    s  ]  + 0 + 0
              [ −1   x    q  ]            [ −1   x    q  ]
              [  0  −1  x+p  ]            [  0  −1  x+p  ]

      = x ( x( x(x+p) + q ) + r ) + s

      = x^4 + p x^3 + q x^2 + r x + s.
This type of example can be used to show that any polynomial whose leading term is (−x)n is
the characteristic polynomial of some n × n matrix.
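One can test this claim numerically: for sample coefficients, det A agrees with the target polynomial at every sampled x (a sketch; the coefficient values below are arbitrary):

```python
import numpy as np

p, q, r, s = 2.0, -1.0, 3.0, 5.0   # arbitrary test coefficients

def det_A(x):
    # The 4x4 matrix of the example, evaluated at a numerical x.
    A = np.array([[ x,  0,  0, s],
                  [-1,  x,  0, r],
                  [ 0, -1,  x, q],
                  [ 0,  0, -1, x + p]])
    return np.linalg.det(A)

for x in [-2.0, 0.0, 1.0, 3.5]:
    assert np.isclose(det_A(x), x**4 + p*x**3 + q*x**2 + r*x + s)
```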
6. Find all the eigenvectors of the first two matrices in q. 3 by solving appropriate linear
systems (one for each eigenvalue).
Notes 19 – Eigenspaces and Multiplicities
In this lecture, we shall explain how to compute methodically all the eigenvectors associated
to a given square matrix.
• if λ is a real number, then λ satisfies p(λ) = 0 iff there exists a nonnull column vector
v ∈ Rn,1 such that Av = λv .
• if λ is a complex number, then λ satisfies p(λ) = 0 iff there exists a nonnull column
vector v ∈ Cn,1 such that Av = λv .
Eλ = ker(A − λI)
Warning: Not quite all the elements of Eλ are eigenvectors, since (being a subspace) Eλ
also includes the null vector 0 that is not counted as an eigenvector.
! "
0 1
Warning: Even if A ∈ Rn,n , it can have complex eigenvalues, e.g. A = having
−1 0
charcteristic polynomial x2 +1 and eigenvalues i and −i. Since the eigenvalues are complex
and not real, the corresponding eigenspaces contain no eigenvector with real components.
The dimension of Eλ satisfies
As predicted (by the very fact that −2 is an eigenvalue), this matrix has rank less than 2. It is
easy to find a nonnull vector in Ker(A + 2I) , namely

(9/4, 1)   or   (9, 4),   whence E−2 = L{(9, 4)}.
Similarly,
! " ! " #! "$
−9 9 1 −1 1
A − λ2 I = A − 3I = ∼ , and E3 = L .
−4 4 0 0 1
Although the previous example only has n = 2, it illustrates an important technique: that
of selecting an eigenvector for each eigenvalue.
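The full matrix of the example is not reproduced here, but it can be recovered from A − 3I above as A = [−6 9; −4 7]; under that assumption the eigenpairs check out numerically:

```python
import numpy as np

# A recovered from A - 3I = [[-9, 9], [-4, 4]] in the example above.
A = np.array([[-6.0, 9.0],
              [-4.0, 7.0]])

v_m2 = np.array([9.0, 4.0])   # spans E_{-2}
v_3  = np.array([1.0, 1.0])   # spans E_3

assert np.allclose(A @ v_m2, -2 * v_m2)
assert np.allclose(A @ v_3, 3 * v_3)
assert np.allclose(np.sort(np.linalg.eigvals(A)), [-2.0, 3.0])
```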
The next result is fairly obvious for k = 2, and we prove it in another special case.
Proposition. Suppose that v1 , . . . , vk are eigenvectors of A associated to distinct eigenval-
ues λ1 , . . . , λk . Then {v1 , . . . , vk } is LI.
Proof. To give the general idea, we take k = 3 (which, for the conclusion to be valid, means
that the vectors lie in Rn with n ≥ 3) and λ1 = 1, λ2 = 2, λ3 = 3. Suppose that
0 = a1 v 1 + a2 v 2 + a3 v 3 . (1)
Recall that a total of n LI vectors in the vector space Rn automatically forms a basis.
Corollary. Suppose that the characteristic polynomial of a matrix A ∈ Rn,n has n distinct
real roots. Then Rn has a basis consisting of eigenvectors of A.
We shall see in the next lecture that the Corollary’s conclusion means that A is what is called
diagonalizable: in many ways A behaves as if it were a diagonal matrix.
L19.2 Repeated roots. Greater difficulties can arise when a root of p(x) is not simple.
Definition. We define the multiplicity, written mult(λ), of a root λ of the characteristic
polynomial p(x) to be the highest power of the factor x − λ that divides p(x).
of solutions. The fact that dim E−3 = mult(−3) is not entirely a coincidence.
Proof. The first inequality is obvious: the fact that λ is an eigenvalue means that Eλ contains
a nonzero vector v .
To prove the second inequality one needs to know more about the characteristic polynomial,
but we can justify it in the special case that the remaining roots of p(x) are real and distinct.
Let A ∈ Rn,n , m = mult(λ). Suppose (for a contradiction) that dim Eλ > m, so that we can
pick LI vectors v1 , . . . , vm+1 in Eλ , as well as LI eigenvectors w1 , . . . , wn−m , one for each
remaining eigenvalue. The resulting total of n + 1 vectors in Rn cannot be LI, so there is a
non-trivial linear relation
But one way or another this contradicts the Proposition: the sum v is nonnull since the w ’s
are LI, and is itself an eigenvector with eigenvalue λ different from the others. QED
Exercise. (i) Write down a diagonal matrix A ∈ R4,4 such that the characteristic polynomial of A
equals (x − 1)^2 (x + 2)^2 .
(ii) Verify that A has dim E1 = 2 and dim E−2 = 2 .
(iii) Let B be the matrix obtained from A by changing its entry in row 1, column 2 from 0 to 1.
Compute dim E1 and dim E−2 for B .
(iv) Find a matrix C with the same characteristic polynomial (x − 1)2 (x + 2)2 but for which
dim E1 = 1 = dim E−2 .
L19.3 The 2 × 2 case. Let us analyse the possible eigenspaces of the matrix
! "
a b
A= ∈ R2,2 .
c d
Its characteristic polynomial p(x) = x2 − (a+d)x + ad − bc has roots
( a+d ± √( (a+d)^2 − 4(ad−bc) ) ) / 2  =  ( a+d ± √∆ ) / 2 ,

where

∆ = (a−d)^2 + 4bc.
N v = λv ⇒ 0 = N 2 v = N (λv) = λ2 v ⇒ λ2 = 0.
! " #! "$
x 1
Any eigenvector has to satisfy y = 0 , so E0 = L has dimension 1. Similar
y 0
considerations apply to the matrix N + aI where a ∕= 0 .
1. For each of the following matrices, find all possible eigenvalues λ ∈ R and describe the
associated eigenspaces Eλ :
[ 2 1 ] ,   [ 1  1 2 ] ,   [ 5 3 −3 ] ,   [  1 1  0 2 ]
[ 1 2 ]     [ 0 −1 1 ]     [ 0 1  0 ]     [ −1 3  0 0 ]
            [ 0  0 0 ]     [ 1 2  1 ]     [ −1 4 −1 1 ]
                                          [ −1 4 −1 0 ] .
2. Given A = [  a 2a −1 ]   with a ∈ R,
             [ −3  5 −2 ]
             [ −4  4 −1 ]
(i) find the value of a for which 1 is an eigenvalue (no need to work out p(x)!).
(ii) is there a value of a for which there are two LI eigenvectors sharing the same eigenvalue?
3. Let p(x) be a polynomial of degree n, and assume that p(λ) = 0. Show that mult(λ) ≥ 2
if and only if p′ (λ) = 0.
4. Find a matrix L such that L2 = L but L is neither 0 nor I . What are the eigenvalues of
L? Does the answer depend upon your choice?
Eigenvectors can be found by picking particular solutions of the corresponding linear systems:

A − I  = [ 4 3 −3 ]     [ 1 2 0 ]            [ −6 ]
         [ 0 0  0 ]  ∼  [ 0 5 3 ]  ⇒  v1 =  [  3 ]
         [ 1 2  0 ]     [ 0 0 0 ]            [ −5 ]

A − 2I = [ 3  3 −3 ]     [ 1 0 −1 ]          [ 1 ]
         [ 0 −1  0 ]  ∼  [ 0 1  0 ]  ⇒  v2 = [ 0 ]
         [ 1  2 −1 ]     [ 0 0  0 ]          [ 1 ]

A − 4I = [ 1  3 −3 ]     [ 1 0 −3 ]          [ 3 ]
         [ 0 −3  0 ]  ∼  [ 0 1  0 ]  ⇒  v3 = [ 0 ]
         [ 1  2 −3 ]     [ 0 0  0 ]          [ 1 ] .
(vertical lines emphasize the column structure of this matrix). In view of (2),
AP = [ Av1 Av2 Av3 ] = [ −6  2  12 ]
                       [  3  0   0 ]
                       [ −5  2   4 ] .

We get exactly the same result by multiplying P by the diagonal matrix D on the right:

PD = [ −6  1  3 ] [ 1 0 0 ]   [ −6  2  12 ]
     [  3  0  0 ] [ 0 2 0 ] = [  3  0   0 ]
     [ −5  1  1 ] [ 0 0 4 ]   [ −5  2   4 ] .
In conclusion,
AP = P D.
Since the columns of P are LI, P has rank 3 and is invertible. One can therefore assert
(without the need to actually compute P −1 ) that
P −1AP = D, or A = P DP −1 .
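The whole computation can be replayed numerically. Here A is recovered from the systems A − I , A − 2I , A − 4I above, namely A = [5 3 −3; 0 1 0; 1 2 1] (an assumption consistent with those computations):

```python
import numpy as np

A = np.array([[5, 3, -3],
              [0, 1,  0],
              [1, 2,  1]], dtype=float)
P = np.array([[-6, 1, 3],
              [ 3, 0, 0],
              [-5, 1, 1]], dtype=float)   # columns v1, v2, v3
D = np.diag([1.0, 2.0, 4.0])

assert np.allclose(A @ P, P @ D)                   # AP = PD
assert np.allclose(np.linalg.inv(P) @ A @ P, D)    # P^{-1} A P = D
```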
The property of ‘being similar to’ is an equivalence relation on the set Rn,n (refer to L8.3). The
3 × 3 matrix A of our example is similar to (1), and therefore diagonalizable.
Warning. We have chosen to apply these definitions strictly within the field R of real num-
bers. One is free to treat A, B as elements of Cn,n and ask whether A = P BP −1 with
P ∈ Cn,n . This is an easier condition to satisfy, and leads to a more general concept of
similarity and diagonalizability, but one that we shall ignore in this course.
L20.2 A criterion for diagonalizability. We explain the construction in L19.1 in a more gen-
eral context. Suppose that A ∈ Rn,n possesses two eigenvectors v1 , v2 , with corresponding
eigenvalues λ1 , λ2 that may or may not be distinct. Consider the matrix
X = [ v1 v2 ] ∈ Rn,2 ,

whose columns are the two eigenvectors.
Proof. Each eigenvector v ∈ Rn,1 is associated to a real root λ of p(x), and we already know
that the dimension of the eigenspace Eλ is at most mult(λ). So unless (3) holds for every
eigenvalue, it is numerically impossible to find a basis of eigenvectors.
Conversely, suppose that the distinct roots of p(x) are λ1 , . . . , λk ∈ R. If k = n the result
follows from the Corollary in L20.1; otherwise set mi = mult(λi ) and suppose that (3) holds.
Pick a basis of each eigenspace Eλi , and put all these elements together to get a total of n =
m1 + · · · + mk eigenvectors v1 , . . . , vn . Any linear relation between them can be regrouped
into a linear relation
Warning: we can work in greater generality over any field F. In particular: A matrix
A ∈ Fn,n is diagonalizable iff all the roots of p(x) are in F, and for each repeated root λ we have
mult(λ) = dim Eλ .
Exercise. Verify that the ‘anti-diagonal’ matrix
B = [ 0 0 1 ]
    [ 0 1 0 ]
    [ 1 0 0 ]
L20.3 Simple endomorphism. From the previous discussion, the following definition is
now very natural.
Definition. A linear mapping f : V → V is called an endomorphism; it is said to be simple if V admits a basis consisting of eigenvectors of f .
We can now consider the question: is a given endomorphism simple or not? The answer
comes from the previous discussion and we state the result without a proof.
Proposition. Let f : V → V be an endomorphism. Then the following are equivalent:
• f is a simple endomorphism
L20.4 An application. The sequence 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, . . . of Fibonacci
numbers is defined recursively by the initial values f0 = 0, f1 = 1 and the equation
fn+1 = fn + fn−1 .
F n = (P DP −1 )(P DP −1 ) · · · (P DP −1 ) = P Dn P −1 ,
since all internal pairs P −1P cancel out. It follows that
( fn , fn+1 )⊤ = F^n (0, 1)⊤ = P D^n P −1 (0, 1)⊤ = (1/√5) P [ λ1^n   0  ] ( −1 ) ,
                                                            [  0   λ2^n ] (  1 )
3. Find counterexamples to show that both the following assertions are false:
A ∈ Rn,n is diagonalizable ⇒ A is invertible;
A ∈ Rn,n is invertible ⇒ A is diagonalizable.
4. Let A = [ −5  3 ] . Find a diagonal matrix D and a matrix P such that A = P DP −1 . If
           [  6 −2 ]
D = E^3 , check that A = (P EP −1 )^3 ; hence find a matrix B ∈ R2,2 such that A = B 3 .
5. Let g: R3,3 → R3,3 denote the linear mapping defined by g(A) = A + A⊤ . Use the study
of g carried out in a previous lecture to find a basis of R3,3 consisting of eigenvectors of g .
Write down the diagonal 9 × 9 matrix representing g with respect to this basis.
L21.1 Orthogonal eigenvectors. Recall the definition of the dot or scalar product of two col-
umn vectors v, w ∈ Rn,1 . Without writing out their components, we can nonetheless assert
that
v · w = v⊤ w. (1)
Recall too that a matrix S is symmetric if S⊤ = S (this implies of course that it is square).
Lemma. Let v1 , v2 be eigenvectors of a symmetric matrix S corresponding to distinct eigen-
values λ1 , λ2 . Then v1 · v2 = 0.
(Sv1 ) · v2 = (Sv1 )⊤ v2 = v1⊤ S⊤ v2 = v1⊤ (Sv2 ) = v1 · (Sv2 ).
This is true for any vectors v1 , v2 , but the assumptions Sv1 = λ1 v1 and Sv2 = λ2 v2 tell us
that
λ1 v 1 · v 2 = λ2 v 1 · v 2 ,
and the result follows. QED
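NumPy's `eigh` routine, designed for symmetric matrices, illustrates the Lemma: the eigenvectors it returns are pairwise orthogonal (indeed orthonormal). A sketch on a random symmetric matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
S = M + M.T                      # a random symmetric matrix

w, V = np.linalg.eigh(S)         # columns of V are eigenvectors of S

assert np.allclose(V.T @ V, np.eye(4))       # pairwise orthogonal, unit length
assert np.allclose(S @ V, V @ np.diag(w))    # S v_i = w_i v_i for each column
```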
We will see that pairwise orthogonal vectors are LI, and this motivates the following:
Definition. Let v1 , . . . , vm be a basis of the subspace V ⊆ Rn . The basis is called an or-
thonormal basis (ON basis for short) if the following hold:
(i) vi · vi = 1 for all i;
It is easy to check that its eigenvalues are 9 and 4 , and that respective eigenvectors are
! " ! "
1 −2
v1 = , v2 = .
2 1
As predicted by the Lemma, v1 ·v2 = 0 . Given this fact, we can normalize v1 , v2 to manufacture
an orthonormal basis

f1 = (1/√5) [ 1 ] ,   f2 = (1/√5) [ −2 ]
            [ 2 ]                 [  1 ]
of eigenvectors, and use these to define the matrix
! "
1 −2
P = √1 .
5 2 1
With this choice,

P⊤ P = (1/5) [  1 2 ] [ 1 −2 ] = (1/5) [ 5 0 ] = I2 .   (2)
             [ −2 1 ] [ 2  1 ]         [ 0 5 ]
Another way of expressing this relationship is
! "
−1 1 1 2
P = √ = P⊤ ,
5 −2 1
Let us explain why the three conditions are indeed equivalent. As (1) and (2) make clear,
condition (i) asserts that the columns of P are orthonormal. Condition (ii) asserts that the
rows are orthonormal. A set {v1 , . . . , vn } of orthonormal vectors is necessarily LI since
a1 v 1 + · · · + an v n = 0
implies (by taking the dot product with each vi in turn) that ai = 0; thus either (i) or (ii)
implies that P is invertible. It follows that both (i) and (ii) are equivalent to (iii).
The relationship between symmetric and orthogonal matrices is cemented by the
Theorem. Let S ∈ Rn,n be a symmetric matrix. Then
(i) the eigenvalues (or roots of the characteristic polynomial p(x)) of S are all real.
(ii) there exists an orthogonal matrix P such that P −1SP = P ⊤SP = D .
Proof. (i) Suppose that λ ∈ C is a root of p(x). Working over the field C, we can assert that
there exists a complex eigenvector v ∈ Cn,1 satisfying Sv = λv . If v⊤ = (z1 , . . . , zn ) then
the complex conjugate of this vector is v̄⊤ = (z̄1 , . . . , z̄n ) and

v̄⊤ v = |z1|^2 + . . . + |zn|^2 > 0,

since v ≠ 0. Thus

λ̄ v̄⊤ v = (S v̄)⊤ v = v̄⊤ (Sv) = λ v̄⊤ v,

and necessarily λ̄ = λ and λ ∈ R.
In the light of (i), part (ii) follows immediately if all the roots of p(x) are distinct. For each
repeated root λ, one needs to know that mult(λ) = dim Eλ ; for if this is true the Lemma
permits us to build up an orthonormal basis of eigenvectors. We shall not prove the mul-
tiplicity statement (that is always true for a symmetric matrix), but a convincing exercise
follows. QED
(found in L20.2), with respective eigenvalues 0 (multiplicity 1) and −3 (multiplicity 2). As pre-
dicted by the Lemma, v1 · v2 = 0 = v1 · v3 . Observe however that v2 · v3 ∕= 0 ; show nonetheless
that there exists an eigenvector v3′ with eigenvalue −3 such that v2 · v3′ = 0 . Normalize the
vectors v1 , v2 , v3′ so as to obtain an orthogonal matrix P for which P −1AP is diagonal. Compute
the determinant of P ; can the latter be chosen so that det P = 1 ?
We conclude with a result explaining why we like symmetric real matrices so much.
Corollary. If S ∈ Rn,n is symmetric, then S is diagonalizable.
! "
0 i
Exercise. Prove that the symmetric not real matrix S = is clearly not diagonalizable
i 0
over R , but it is diagonalizable over C .
! "
1 i
Exercise. Prove that the symmetric not real matrix S = is not diagonalizable over
i −1
R and it is not diagonalizable over over C .
Since p2 + r2 = 1, the first matrix has determinant 1 and the second −1.
Let us focus our attention on the first case. There exists a unique angle θ , 0 ≤ θ < 2π such
that cos θ = p and sin θ = r . We denote the resulting matrix P by
! "
cos θ − sin θ
Rθ = , (3)
sin θ cos θ
Proof. Suppose that A⊤A = I and B⊤B = I . Then A⊤ = A−1 ; this implies that
as required. QED
This example can be used to show that any 3 × 3 orthogonal matrix P with det P = 1
represents a rotation of R3 about an axis passing through the origin. For given such a
rotation, one can choose an orthonormal basis {v1 , v2 , v3 } of R3 such that v3 points in the
direction of the axis of rotation. It then follows (from an understanding of what is meant by
a rotation of a rigid body in space, and referring to (3)) that the rotation is described by a
linear mapping f : R3 → R3 whose matrix with respect to the basis is
Mθ = [ cos θ  − sin θ  0 ]
     [ sin θ    cos θ  0 ] .
     [   0       0     1 ]
1. For which values of θ does the rotation matrix (3) have a real eigenvalue?
2. Show that if an n × n matrix S is both symmetric and orthogonal then S 2 = I . Deduce
that the eigenvalues of S are 1 or −1.
u = u1 v1 + . . . + un vn
where ui = u · vi for i = 1, . . . , n.
Example. The previous lemma, applied to canonical bases, just states the very well-known fact:
when we write the vector v = (1, 2, 3, . . . , n) ∈ Rn we really mean v = 1e1 + 2e2 + . . . + nen .
Exercise. Can you find an orthonormal basis of R3 containing two given orthogonal unit vectors
u, v ? (Hint: note that e3 = e1 × e2 is a unit vector)
Since orthonormal bases are so handy, we need a way to produce them at will, and in order
to do so we will use projections.
Definition. Let u, v ∈ Rn . The orthogonal projection of u on v is
prv (u) = ( (u · v) / ||v||^2 ) v,

where ||v|| = √(v · v) is the magnitude, or norm, of v .
Example. If u, v and w in R2 are the three sides of a right triangle, that is v + w = u and
v ⊥ w , then v = prv (u) . Note that
u − prv (u) = w ⊥ v
and we will see that this property holds in general, not only in the plane.
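The projection formula translates directly into code; the check below verifies the right-triangle property u − prv (u) ⊥ v on a small hypothetical example:

```python
import numpy as np

def proj(u, v):
    # Orthogonal projection of u on v: ((u · v) / ||v||^2) v.
    return (u @ v) / (v @ v) * v

u = np.array([3.0, 4.0])
v = np.array([1.0, 0.0])

w = u - proj(u, v)
assert np.isclose(w @ v, 0.0)             # the 'remainder' is orthogonal to v
assert np.allclose(proj(u, v) + w, u)     # v-part plus remainder recovers u
```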
u′ · vi = 0
The Gram-Schmidt algorithm is just the iteration of the previous proposition, which we
illustrate with an example.
Example. Find an orthonormal basis of
We begin by finding an orthonormal basis of the subspace V1 = L{u1 } and this is done by
normalizing. We define

v1 = u1 / ||u1 || = (√2/2) (1, 1, 0, 0),
and {v1 } is an orthonormal basis of V1 . Now we consider the subspace V2 = L{v1 , u2 } and
we apply the proposition, that is we change the second vector in order to obtain an orthonormal
basis of V2 :
u2′ = u2 − prv1 (u2 ) = (1, 0, 1, 0) − (1/2) ((1, 1, 0, 0) · (1, 0, 1, 0)) (1, 1, 0, 0)
    = (1, 0, 1, 0) − (1/2) (1, 1, 0, 0) = (1/2, −1/2, 1, 0).
Note that u2′ ⊥ v1 as desired, but u2′ is not a unit vector, so we need to normalize
again:

v2 = u2′ / ||u2′ || = (√6/3) (1/2, −1/2, 1, 0) = (√6/6) (1, −1, 2, 0).
In conclusion {v1 , v2 } is an orthonormal basis of V2 .
We now need to consider V itself and we know that V = L{v1 , v2 , u3 }. To apply the
proposition we define
u3′ = u3 − prv1 (u3 ) − prv2 (u3 ),

and we compute. Since u3 · v1 = u3 · v2 = 0 we get u3′ = u3 and, since it is a unit vector,
v3 = u3′ / ||u3′ || = u3′ . In conclusion we have that
3
{v1 , v2 , v3 }
is an orthonormal basis of V .
Exercise. Reorder the vectors v1 , v2 and v3 in the previous example and repeat the process. Do
you get the same orthonormal basis?
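The iteration just carried out by hand can be written as a short routine (a sketch assuming the input vectors are LI); on u1 , u2 of the example it reproduces v1 , v2 :

```python
import numpy as np

def gram_schmidt(vectors):
    # Returns an orthonormal basis of L{vectors}; assumes the input is LI.
    basis = []
    for u in vectors:
        for v in basis:
            u = u - (u @ v) * v          # subtract pr_v(u); v is already unit
        basis.append(u / np.linalg.norm(u))
    return basis

u1 = np.array([1.0, 1.0, 0.0, 0.0])
u2 = np.array([1.0, 0.0, 1.0, 0.0])
v1, v2 = gram_schmidt([u1, u2])

assert np.isclose(v1 @ v2, 0.0)
assert np.allclose(v2, np.sqrt(6) / 6 * np.array([1, -1, 2, 0]))
```

Reordering the input vectors generally changes the output basis, which is the point of the exercise above.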
L22.2 Cayley-Hamilton theorem. We already noticed that matrix multiplication is much
easier when we consider diagonal matrices. For example, if D is a diagonal matrix, then Dn is
obtained just by taking the n-th powers of the diagonal elements of D . In general, comput-
ing powers of diagonalizable matrices is not difficult.
Lemma. If A = P DP −1 where D is diagonal, then An = P Dn P −1 .
What can we do to work efficiently with the powers of any matrix? The Cayley-Hamilton
Theorem can help us.
Cayley-Hamilton Theorem. Let A be a square matrix with characteristic polynomial p(x) =
a0 + a1 x + . . . + (−1)n xn . Then
p(A) = a0 In + a1 A + . . . + (−1)n An = 0.
and thus A and D have the same characteristic polynomial p(x) = (d11 − x) · . . . · (dnn − x).
Now we compute p(A) using the last equality
= (d11 P P −1 − P DP −1 ) · . . . · (dnn P P −1 − P DP −1 )
= P (d11 I − D) · . . . · (dnn I − D)P −1 .
Note that the matrix dii I − D is a diagonal matrix with the i-th row equal to the zero row;
the product of all n factors is therefore the zero matrix, whence p(A) = 0. QED
Cayley-Hamilton Theorem has many deep consequences, but we will only focus on two
algorithmic aspects: the computation of the inverse and the computation of powers, which
we will show in two (easily generalized) examples.
About the inverse:
Example. Consider the matrix A = [ 1 2 ; 3 4 ] having characteristic polynomial p(x) = x^2 −
5x − 2. Since det A ≠ 0 the matrix is invertible and, because of the Cayley-Hamilton Theorem, we
have p(A) = −2I − 5A + A^2 = 0, thus
I = (1/2)(A² − 5A).
Multiplying the previous equality by A−1 we get
A−1 = (1/2)(A − 5I).
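The computation in the example can be checked numerically; a minimal sketch:

```python
import numpy as np

# A = [[1, 2], [3, 4]] has characteristic polynomial p(x) = x^2 - 5x - 2,
# so Cayley-Hamilton gives A^2 - 5A - 2I = 0, i.e. A^{-1} = (A - 5I)/2.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
I = np.eye(2)
A_inv = (A - 5 * I) / 2
```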
About powers:
Example. Consider the matrix A = [1 2; 0 1] having characteristic polynomial p(x) = x² − 2x + 1. We want to compute A³. Note that A is not diagonalizable (why?). By the Cayley-Hamilton Theorem we know that A² = 2A − I and thus
A³ = A · A² = 2A² − A = 2(2A − I) − A = 3A − 2I.
Warning: In general, the use of the Cayley-Hamilton Theorem to compute powers requires the division of polynomials.
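The reduction hinted at in the warning can be sketched with a polynomial division, using the example above (the numpy routine polydiv returns quotient and remainder):

```python
import numpy as np

# A = [[1, 2], [0, 1]] satisfies A^2 = 2A - I by Cayley-Hamilton, so any
# power A^n equals r(A), where r is the remainder of the division of x^n
# by the characteristic polynomial x^2 - 2x + 1.
A = np.array([[1.0, 2.0],
              [0.0, 1.0]])
char_poly = [1.0, -2.0, 1.0]                 # x^2 - 2x + 1
x_cubed = [1.0, 0.0, 0.0, 0.0]               # x^3
_, rem = np.polydiv(x_cubed, char_poly)      # remainder 3x - 2
A3 = rem[0] * A + rem[1] * np.eye(2)         # A^3 = 3A - 2I
```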
1. Find an orthonormal basis for V = L{(1, 1, 0), (0, 1, 1)} and for W = L{(1, 1, 0), (0, 1, 1), (1, 0, 1)}.
Can you make the second computation really quickly?
2. Let A = [1 0 1 0; 0 1 1 0] and find orthonormal bases for Row A and Ker A. Can one use these to find an orthonormal basis of R4 ?
L24.1 Conics. A conic C is the set of points (x, y) of the plane satisfying an equation
Ax² + 2Bxy + Cy² + 2Dx + 2Ey + F = 0, (1)
where A, . . . , F are real constants, and A, B, C are not all zero so that the left-hand side is a polynomial of degree 2 in the two variables x, y. (The 2's are for convenience later.)
Often we shall refer to an equation like (1) as a conic, though strictly speaking the latter is a
set of points. The conics we discussed before were those for which D = E = 0. In this case,
by diagonalizing the symmetric matrix
S = [A B; B C], (2)
we can always rotate the coordinate system so that the equation of C becomes
λ1 X 2 + λ2 Y 2 = µ, µ = −F.
L24.2 Study of a conic: finding the center. Given the general equation (1), we can try first
to eliminate the term 2Dx + 2Ey of degree 1 by a change of coordinates of type
x = X + u,  y = Y + v. (3)
This corresponds to a translation in which the new system OXY has its origin O at the old
point (x, y) = (u, v). Substituting (3) into (1), we see that the new term of degree 1 is
2AuX + 2B(Xv+uY ) + 2CvY + 2DX + 2EY = (2Au+2Bv+2D)X + (2Bu+2Cv+2E)Y.
To eliminate all this, we need to solve the linear system of equations with unknowns u, v
and augmented matrix
[A B | −D; B C | −E]. (4)
Since the left-hand side of this matrix is the matrix S of (2), a solution might not exist if det S = 0.
Definition. The conic C is central if there is a translation (3) that converts its equation into the form A′X² + 2B′XY + C′Y² + F′ = 0. In this case, (X, Y ) ∈ C ⇔ (−X, −Y ) ∈ C, and the centre of symmetry is the point (X, Y ) = (0, 0) or (x, y) = (u, v).
From the analysis above, we know that there is only one case in which (4) is incompatible
and C is not central, namely (viii).
Corollary. The conic (1) is a parabola only if B² = AC.
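Finding the centre amounts to solving the augmented system (4). A sketch follows; the sample coefficients, A = 1, B = 1/2, C = 1, D = −1, E = −1/2, come from the conic x² + xy + y² − 2x − y = 0 of the exercises and are used here only as an illustration:

```python
import numpy as np

# System (4): solve S (u, v)^T = (-D, -E)^T with S = [[A, B], [B, C]].
A_, B_, C_, D_, E_ = 1.0, 0.5, 1.0, -1.0, -0.5
S = np.array([[A_, B_],
              [B_, C_]])
u, v = np.linalg.solve(S, [-D_, -E_])   # fails when det S = 0 (case (viii))
```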
L24.3 Study of a conic: using matrices. One can carry out a qualitative study of a conic by means
of the study of two symmetric matrices. Given C as in (1) we consider the 2 × 2 matrix
A22 = [A B; B C], (5)
and the 3 × 3 matrix
A33 = [A B D; B C E; D E F].
Example. Consider the conic C : x² + 2xy + y² + x + y = 0, for which
A22 = [1 1; 1 1] and A33 = [1 1 1/2; 1 1 1/2; 1/2 1/2 0]. (8)
Since |A22 | = 0 the conic is of parabolic type, and since |A33 | = 0 the conic is degenerate.
Thus C is a pair of parallel lines, either distinct or coincident; note that the real point
(0, 0) ∈ C . To decide in which situation we are, we intersect with a random line: if the inter-
section is empty we pick a new line, if not we will find either two distinct points or one double
point. Let’s pick the line l : x = 0. Thus l ∩ C is the solution set of y 2 + y = y(y + 1) = 0, that is
a set of two distinct points. Hence C is the union of two parallel distinct lines.
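The two determinant tests used above can be reproduced numerically. A sketch, assuming the example conic is x² + 2xy + y² + x + y = 0 (consistent with the matrices displayed):

```python
import numpy as np

# |A22| = 0 -> parabolic type; |A33| = 0 -> degenerate conic.
A22 = np.array([[1.0, 1.0],
                [1.0, 1.0]])
A33 = np.array([[1.0, 1.0, 0.5],
                [1.0, 1.0, 0.5],
                [0.5, 0.5, 0.0]])
parabolic = abs(np.linalg.det(A22)) < 1e-12
degenerate = abs(np.linalg.det(A33)) < 1e-12
```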
L24.4 Central quadrics. One can carry out a parallel discussion in space by adding a third
variable.
Definition. A quadric Q is the locus of points (x, y, z) in R3 satisfying an equation of the
form
Ax2 + By 2 + Cz 2 + 2Dyz + 2Ezx + 2F xy + 2Gx + 2Hy + 2Iz + J = 0. (9)
The word ‘quadric’ implies that (9) has order 2, so not all of A, B, C, D, E, F are zero.
Just as we did for conics in L24.1, one can list all the different types of quadrics; whilst there
were 8 types of conics there are 15 types of quadrics. However, we shall only consider the
more interesting cases in this course.
Let us start with an obvious example. The equation
x2 + y 2 + z 2 = r 2
fits the definition (with A = B = C = 1, J = −r2 , and all other coefficients zero). It is of
course a sphere of radius r with centre the origin. Indeed if v = (x, y, z)T , then the equation
becomes |v|2 = r2 or |v| = r , and asserts that the distance of (x, y, z) from the origin is r
(see L9.1).
In the light of the discussion of ellipses, it should now come as no surprise that the equation
x²/a² + y²/b² + z²/c² = 1
represents an ellipsoid that fits snugly into a box centred at the origin of dimensions 2a × 2b × 2c.
Definition. A central quadric is the locus of points (x, y, z) satisfying an equation
Ax² + By² + Cz² + 2Dyz + 2Ezx + 2F xy + J = 0, (10)
or equivalently
(x y z) [A F E; F B D; E D C] [x; y; z] = −J.
We know from L21.1 that there exists a 3 × 3 orthogonal matrix P so that P −1 SP = P⊤ SP
is diagonal. We may also suppose that det P = 1 (for if not, det P = −1 and we merely
replace P by −P and note that det(−P ) = 1). It follows from remarks in L22.3 that P
represents a rotation; thus we have the
Theorem. Given a central quadric (10), it is possible to rotate the coordinate system about
the origin in space so that in the new system the equation becomes
λ1 X 2 + λ2 Y 2 + λ3 Z 2 = µ. (11)
Here are some examples of central quadrics in which the eigenvalues are all nonzero:
(i) an ellipsoid (if λ1 , λ2 , λ3 , µ all have the same sign);
(ii) a hyperboloid of one sheet (if for example λ1 , λ2 , µ are positive and λ3 < 0);
(iii) a hyperboloid of two sheets (if for example λ1 , λ2 are negative and λ3 , µ are positive),
(iv) a cone (if not all λ1 , λ2 , λ3 have the same sign and if µ = 0).
In the last case, the cone is circular if two of the eigenvalues are equal, otherwise it is called
elliptic. We shall explain this case further in the next lecture.
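The case analysis (i)-(iv) can be packaged as a small routine. A sketch covering only the cases listed, under the stated assumption that all eigenvalues are nonzero:

```python
def classify_central_quadric(lam, mu):
    """Classify lam[0] X^2 + lam[1] Y^2 + lam[2] Z^2 = mu, assuming all
    three eigenvalues are nonzero; only cases (i)-(iv) are covered."""
    pos = sum(1 for l in lam if l > 0)
    if mu == 0:
        # mixed signs give a cone; same signs give only the origin
        return "cone" if 0 < pos < 3 else "single point"
    if mu < 0:                       # flip all signs so that mu > 0
        lam = [-l for l in lam]
        mu, pos = -mu, 3 - pos
    if pos == 3:
        return "ellipsoid"
    if pos == 2:
        return "hyperboloid of one sheet"
    if pos == 1:
        return "hyperboloid of two sheets"
    return "empty (no real points)"
```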
Example. Consider the quadric defined by the equation z = xy.
This is the equation (9) of a quadric for which all the coefficients are zero except F = −I . If
we perform a rotation of π/4 of the xy plane corresponding to the matrix
P = (1/√2) [1 −1; 1 1],
the equation becomes z = (1/2)(X − Y )(X + Y ), or 2Z = X² − Y².
This is an example of a hyperbolic paraboloid that resembles a ‘saddle’ for a horse, or (on a
bigger scale) a ‘mountain pass’. Any plane X = c or Y = c intersects Q in a parabola,
whereas a plane Z = c intersects Q in a hyperbola or (if c = 0) a pair of lines.
Paraboloids are quadrics that cannot be put into the form (10) or (11), and therefore possess
no central point of symmetry. The standard form of a paraboloid is the equation
Z = aX 2 + bY 2 .
If a, b have opposite signs, it is again a hyperbolic paraboloid. If a, b have the same sign, the
quadric is easier to draw and is called an elliptic paraboloid (circular if a = b). Its intersection with a plane Z = c (with c of the same sign as a and b) is an ellipse (a circle).
1. For each of the following conics, find the centre (u, v) and the equation that results by setting x = X + u, y = Y + v: (i) x² + y² + x = 3, (ii) 3y² − 4xy − 4x = 0, (iii) 3x² − xy + 2y = 9.
2. Find the centre (u, v) of the conic C : x² + xy + y² − 2x − y = 0, and a symmetric matrix S so that the equation becomes (X Y ) S [X; Y ] = 1 with x = X + u, y = Y + v. Diagonalize S, and sketch C relative to the original axes (x, y).
3. Let S be the sphere with centre (3, 1, 1) passing through (3, 4, 5). Find the radius of S ,
and write down its equation.
4. Let π be the plane x − 2y + 2z = 0 and let O denote the origin (0, 0, 0). Find
(i) the line ℓ orthogonal to π that passes through O,
(ii) the point P on ℓ a distance 6 from O with z > 0;
(iii) a sphere S of radius 6 tangent to π at O.
5. Match each of the quadrics
x² = 3y² + z² + 1,  z² = xy,  x² + 2y² − z² = 1,  −x² − y² + 2x + 1 = 0
with (i) a hyperboloid of 1 sheet, (ii) a hyperboloid of 2 sheets, (iii) a cone, (iv) a cylinder.
6. Show that the line ℓ with parametric equation (x, y, z) = (1, −t, t) is contained in the quadric Q : x² + y² − z² = 1. Draw Q and ℓ in the same coordinate system. Find a second line ℓ′ that lies in Q.
7. Decide which of the following equations describes the circular cone that is obtained when
one rotates the line {(x, y, z) : x = 0, z = 2y} around the z -axis:
we can always rotate the coordinate system so that the equation of C becomes
λ1 X 2 + λ2 Y 2 = µ, µ = −F.
It follows that C is one of
(i) an ellipse (if λ1 , λ2 , µ all have the same sign); a circle is the special case in which λ1 = λ2 ;
(ii) a hyperbola (if λ1 , λ2 have opposite signs and µ ≠ 0); the corresponding special case
λ1 = −λ2 gives rise to a rectangular hyperbola whose asymptotes are perpendicular;
(iii) two straight lines intersecting in one point (if λ1 , λ2 have opposite signs but µ = 0);
(iv) two parallel lines (if one of λ1 , λ2 is zero and the other has the same sign as µ);
(v) a single line (if one of λ1 , λ2 is zero and µ = 0, for then the equation actually defines
two coincident lines, though only one is visible to the naked eye);
(vi) a point (if λ1 , λ2 have the same sign but µ = 0);
(vii) in all other cases, the set of points satisfying (1) is empty (over R).
Allowing D, E to be nonzero produces only one other type, namely
(viii) a parabola (such as x2 + y = 0, or less obviously 4x2 + 12xy + 9y 2 + x = 0).
Warning: it is sometimes convenient to speak of the type of a conic: we say that a conic is of
elliptic type if the eigenvalues have the same sign; we say that a conic is of hyperbolic type
if the eigenvalues have opposite signs; we say that a conic is of parabolic type if zero is an
eigenvalue.
Warning: the conics in (iii), (iv), (v), and (vi) are called degenerate, or singular, conics. Degen-
erate conics involve lines: this is clear for the first three cases, but not for the last one. For
case (vi) it is enough to consider two complex lines intersecting in their unique real point: in
this way all singular conics are conics which split into lines.
Warning: in (vii) we mean that the solution set of (1) is empty in R², however it is not empty in C². For example, the conic x² + y² = −1 is a nice complex circle without real points. Also note that the degenerate ellipse in (vi) corresponds to a pair of complex lines with just one real point, e.g. x² + y² = 0 can be written as (x − iy)(x + iy) = 0 and we only see the real point (0, 0).
L25.1 Distances in general. We already know how to compute the distance between two
points using a formula. We now investigate the problem of finding distances in general.
What is the distance between two geometrical objects? How can we define this number?
Here is the most elegant way:
Definition. Given two sets X and Y , we define the distance between X and Y as
d(X, Y ) = min{d(P, Q) : P ∈ X and Q ∈ Y }.
L25.2 A formula for d(P, α). Consider the plane α : ax + by + cz = d where n = (a, b, c) is
the normal vector. It is convenient to arrange that n be a unit vector. If this is not already
the case, it suffices to divide both sides of the equation by |n|, thus modifying also d.
Lemma. If |n| = 1, so that a2 + b2 + c2 = 1, the distance between a point with position
vector p and the plane α equals |p · n − d|.
Proof. The distance is the length of the vector p − p0 , where p0 is the point of the plane α at the foot of the perpendicular from p to α. Since p − p0 is parallel to n, this distance is the absolute value of
(p − p0 ) · n = p · n − p0 · n.
Since p0 ∈ α, the last term equals d. QED
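The lemma translates directly into code. A minimal sketch, which also performs the normalization step for a non-unit normal vector n:

```python
import numpy as np

def dist_point_plane(p, n, d):
    """Distance from the point p to the plane n . x = d: normalize the
    normal vector first, then apply |p . n - d| as in the lemma."""
    p, n = np.asarray(p, float), np.asarray(n, float)
    length = np.linalg.norm(n)
    return abs(np.dot(p, n) - d) / length

# e.g. the origin and the plane x + y + z = -6 (from the exercises below)
d0 = dist_point_plane((0, 0, 0), (1, 1, 1), -6)
```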
L25.3 Distances from lines. The case of d(l, α) is easily treated: either the line and plane intersect, and the distance is zero, or the line and the plane are parallel and then
d(l, α) = d(P, α)
for any choice of P ∈ l. Consider now two lines r and s. If the lines are intersecting, or they are skew, the
following formula will compute the distance
d(r, s) = |PQ · (vr × vs )| / ||vr × vs ||
for any choice of P ∈ r and Q ∈ s. Notice that, if the lines are intersecting, then they are coplanar, hence the dot product PQ · (vr × vs ) is zero.
Warning: the formula does not work for parallel lines.
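The formula above, as a sketch; the two sample skew lines are illustrative, not from the notes:

```python
import numpy as np

def dist_lines(P, vr, Q, vs):
    """|PQ . (vr x vs)| / ||vr x vs|| for non-parallel lines r, s through
    points P, Q with direction vectors vr, vs."""
    PQ = np.asarray(Q, float) - np.asarray(P, float)
    cross = np.cross(vr, vs)
    return abs(np.dot(PQ, cross)) / np.linalg.norm(cross)

# skew lines: r through (0,0,0) along (1,0,0); s through (0,1,1) along (0,1,0)
d_rs = dist_lines((0, 0, 0), (1, 0, 0), (0, 1, 1), (0, 1, 0))
```

As the warning says, the formula breaks for parallel lines: there the cross product is zero and the quotient is undefined.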
Exercise. Verify that this answer agrees with the definition using the following geometric construction in the case of skew lines. There exists a (unique) plane α such that r ⊂ α and α ∥ s. For this plane, the following holds: d(r, s) = d(α, s).
L25.4 Distances from planes. The only case left to be treated is the one of two planes, that is
we have to compute d(α, β) the distance between two planes. If the planes are not parallel,
then they are intersecting and the distance is d(α, β) = 0; similarly if they coincide. If they
are parallel, then it is easy to see that d(α, β) = d(P, β) for any choice of P ∈ α.
L25.5 Circles. For ancient Greek Mathematicians, a curve is defined by giving a procedure to draw it. For example, fix a pole in the ground and use a taut rope to draw a curve: here is a circle. After Descartes, though, things are different: we define curves using their equations (possibly more than one equation: think of curves in three-dimensional space).
Definition. A circle in the plane is the set of points P ∈ R2 whose coordinates are solutions
of the equation
x2 + y 2 + Dx + Ey + F = 0.
Exercise. Fix a point C ∈ R2 and a real positive number R ∈ R . Show that the set of points P
such that d(P, C) = R is a circle according to the previous definition. (Hint: take the squares
and compute.)
Note that the equation approach introduces degenerate curves such as circles of zero radius
and even of imaginary radius!
Example. The following are circles:
(i) C : x2 + y 2 = 0 that corresponds to exactly one real point, that is (0, 0);
(ii) C : x2 + y 2 = −1 that is the empty set, since there are no real solutions.
These degenerate cases are actually an asset and show how restricting to the real numbers can make our understanding less clear (one could think of (i) as a complex circle with only one real point and of (ii) as a complex circle with no real points).
Given a circle, a typical question is to find its center and its radius, and thus to detect whether the circle is degenerate or not. This is done by completing the squares, a very natural procedure
based on the equality d(P, C)2 = R2 which yields
(x − xc )2 + (y − yc )2 = R2
and thus giving to us the center C(xc , yc ) and the radius R of the circle.
Example. Let’s study the circle C : x2 + y 2 + 2x − 2y − 2 = 0 . To find the center and the radius
we complete the squares, that is we want to find a, b such that
Exercise. For what values of a ∈ R does the center of the circle x² + y² + ax + y − 1 = 0 lie on the line x − y = 0?
Exercise. For what values of a ∈ R does the circle x² + y² + ax + y − 1 = 0 not touch the line x − y = 0?
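Completing the squares on x² + y² + Dx + Ey + F = 0 always yields centre (−D/2, −E/2) and R² = D²/4 + E²/4 − F. A sketch, checked on the example circle above (the helper name is illustrative):

```python
def circle_center_radius2(D, E, F):
    """Centre and squared radius of x^2 + y^2 + D x + E y + F = 0;
    the circle is degenerate when R^2 <= 0."""
    center = (-D / 2, -E / 2)
    R2 = D * D / 4 + E * E / 4 - F
    return center, R2

# the example x^2 + y^2 + 2x - 2y - 2 = 0
center, R2 = circle_center_radius2(2, -2, -2)
```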
L25.6 Distances and circles. Now that we added circles to our set of geometrical objects, we
can study distances from circles using the general approach. We will see that it is easier to
find the distance rather than the two points realizing it; as always orthogonality is the key.
(i) Distance of a point P from a circle C: it is easy to check that d(P, C) = |d(P, C) − R| where C and R are the center and the radius of the circle.
(ii) Distance of a line l from a circle C: it is easy to check that d(l, C) = d(l, C) − R, if this is non-negative, or zero otherwise, where C and R are the center and the radius of the circle.
(iii) Distance between two circles C and C1 : a similar formula holds using d(C, C1 ), R and R1 , where C and R, resp. C1 and R1 , are the center and the radius of the circle C, resp. C1 .
Exercise. In cases (i), (ii), and (iii), find a pair of points, one on the circle and one on the other
object, realizing the minimal distance. Can you see any ’orthogonality’ in the cases in which the
distance is not zero?
2. Let v = xi+yj+zk, v0 = x0 i+y0 j+z0 k, n = ai+bj + ck. Find the distances between the
following points/planes:
(i) (0, 0, 0) and x+y+z+6 = 0,
(ii) (1, 2, 3) and x = 4,
(iii) (1, 2, 3) and x+y+z = 0.
6. Let ℓ be the line x = 2y = z and π the plane x = y + z. Explain why a line m in π that meets ℓ necessarily has the form (x, y, z) = (at, bt, ct), and find the condition on a, b, c for which m is orthogonal to ℓ.
7. Let ℓ and m be two lines parallel to vectors p and q and containing points P and Q. Suppose that v = p × q ≠ 0. Show that the (minimum) distance between ℓ and m equals |PQ · v|/|v|.
8. Study the following circles: are there real points? How many? What is the center? What is the radius?
(i) x² + y² + y + x = −3, 0, 3
(ii) x² + y² + 2y + 3x = −5, 0, 5
L26.1 Intersection of circles. We defined a circle as the solution set of a particular degree 2
equation in x and y . Using this approach, the intersection of two circles is the solution set
of a system of two quadratic equations in two variables. That is, the solutions of
C1 : x² + y² + D1 x + E1 y + F1 = 0
C2 : x² + y² + D2 x + E2 y + F2 = 0
correspond to the points of C1 ∩ C2 . But how can we solve such a system of equations?
Gaussian elimination only works for linear systems of equations, however it is a good idea
to take linear combinations to produce an equivalent system of equations. Namely, we get
that the solutions of
C1 : x² + y² + D1 x + E1 y + F1 = 0
ℓ : (D2 − D1 )x + (E2 − E1 )y + F2 − F1 = 0
are exactly the points of C1 ∩ C2 , but now we are intersecting a line and a circle and this looks
much more promising.
The line ℓ is called the radical axis of the circles C1 and C2 and it has the property that
ℓ ∩ C1 = ℓ ∩ C2 = C1 ∩ C2 .
Exercise. Show that, if R1 = R2 , then the radical axis is the axis of the segment joining the centers of the two circles. What happens if R1 ≠ R2 ?
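Subtracting the two equations is a one-liner. A sketch with two illustrative circles (the helper name is hypothetical):

```python
def radical_axis(c1, c2):
    """Coefficients (a, b, c) of the radical axis a x + b y + c = 0 of
    two circles given as (D, E, F) in x^2 + y^2 + D x + E y + F = 0."""
    (D1, E1, F1), (D2, E2, F2) = c1, c2
    return (D2 - D1, E2 - E1, F2 - F1)

# unit circle and the circle of radius 1 centred at (1, 0)
a, b, c = radical_axis((0, 0, -1), (-2, 0, 0))   # axis: -2x + 1 = 0
```

Intersecting the line −2x + 1 = 0 with either circle then gives the two common points (1/2, ±√3/2).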
The following result is an easy application of the basic inequalities among the sides of a
triangle.
Proposition. Let C1 and C2 be distinct circles having radius R1 , resp. R2 , and center C1 , resp. C2 . Then
(i) if d(C1 , C2 ) > R1 + R2 or d(C1 , C2 ) < |R1 − R2 |, then C1 ∩ C2 = ∅;
(ii) if d(C1 , C2 ) = R1 + R2 or d(C1 , C2 ) = |R1 − R2 |, then C1 ∩ C2 is a single point;
(iii) if |R1 − R2 | < d(C1 , C2 ) < R1 + R2 , then C1 ∩ C2 consists of two distinct points.
Exercise. State the previous Proposition using the radical axis and the distance from it.
L26.2 Spheres and intersection of spheres. The canonical form of a sphere is an expression
of the form
S : (x − xc )2 + (y − yc )2 + (z − zc )2 = R2
where the non-negative real number R is the radius of the sphere and C(xc , yc , zc ) is the
center of the sphere. That is S is the set of points P (x, y, z) such that
d(P, C) = R.
Expanding the squares, every sphere is the solution set of an equation of the form
x² + y² + z² + Dx + Ey + F z + G = 0
where D, E, F, G ∈ R.
Warning: this definition includes some degenerate cases such as a single point (only one real solution exists) and the empty set (no real solutions exist).
Given a sphere a typical task is to find its radius and its center, and this can be done again
by completing the squares.
Example. Find the radius and the center of the sphere S : x² + y² + z² + x − y − 1 = 0. That is, we look for a, b, c ∈ R such that S : (x + a)² + (y + b)² + (z + c)² = R². We get (x + 1/2)² + (y − 1/2)² + z² = 3/2, so the center is (−1/2, 1/2, 0) and the radius is √(3/2).
We want to study the intersection of two spheres, and thus to find the solutions of the non-
linear system of equations
S1 : x² + y² + z² + D1 x + E1 y + F1 z + G1 = 0
S2 : x² + y² + z² + D2 x + E2 y + F2 z + G2 = 0
We can proceed as we did for circles and we get that S1 ∩ S2 is equal to the solution set of
the system
S1 : x² + y² + z² + D1 x + E1 y + F1 z + G1 = 0
π : (D2 − D1 )x + (E2 − E1 )y + (F2 − F1 )z + (G2 − G1 ) = 0
where the plane π is called the radical plane of the two spheres and has the property that
S1 ∩ π = S2 ∩ π = S1 ∩ S2 .
Proof. The only part that requires some explanation is the last one. First note that S1 ∩ S2 =
S1 ∩ π and thus we have only to understand the intersection of a sphere and a plane. Let C ′
be the orthogonal projection of C1 on π and let P be any point in C = S1 ∩ π. The triangle C1 C′P has a right angle in C′ and thus
d(C′, P )² = R1² − d(C1 , π)², so C is a circle with center C′ and radius √(R1² − d(C1 , π)²). QED
Exercise. What is the maximal radius of a circle on the sphere x2 + y 2 + z 2 = 4 ? Can you find
all the circles of maximal radius?
Example. Consider the circle C = S ∩ π, where S : x² + y² + z² = 4; to find its center and its radius we proceed as follows. The center of C is
C =π∩l
where π : x + y + z = 1 and l is the line through the center of the sphere orthogonal to the plane,
that is
l : x = y = z.
Thus, we get C(1/3, 1/3, 1/3) and the radius R of the circle is
R² = 4 − d(O, π)²
where 2 is the radius of the sphere and the origin O is its center. Thus we get
R² = 4 − (√3/3)² = 11/3.
L26.3 Circles and spheres passing through given points. We know that infinitely many lines go through one point, and that exactly one line exists through two distinct points.
What is the situation with circles and spheres?
Let’s start with the case of circles in R2 . Recall that the axis of a segment AB is the locus of
points P such that d(P, A) = d(P, B), then it is easy to prove the following.
Proposition. Given three distinct non-collinear points there exists a unique circle passing
through them.
Proof. Let P1 , P2 , P3 be the points. Consider the lines r and s, respectively the axes of the segments P1 P2 and P2 P3 . Then r ∩ s is the center of the circle we are looking for (note that if the points are collinear, then the intersection is empty) and the radius is the distance of any of the points Pi from that center. QED
The same result can be obtained by solving a linear system of equations. Namely, consider
the three points Pi (xi , yi ), i = 1, 2, 3, and the circle
C : x2 + y 2 + Dx + Ey + F = 0.
Requiring Pi ∈ C gives the linear equation xi² + yi² + D xi + E yi + F = 0 in the unknowns D, E, F. Thus, to find the circle containing the three points, we have to solve a linear system of three equations in three unknowns.
Example. To find a circle containing the points P1 (1, 0), P2 (0, 1) and P3 (1, 1), we have to solve the linear system of equations:
1 + D + F = 0
1 + E + F = 0
2 + D + E + F = 0
whose solution is {(D, E, F ) = (−1, −1, 0)}. Thus the required circle is
C : x² + y² − x − y = 0.
Exercise. Find the center and the radius of the circle of the previous example. Check your answer using the axes of the segments approach.
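The linear system of the example can be set up for arbitrary points. A sketch; the helper name circle_through is illustrative:

```python
import numpy as np

def circle_through(points):
    """Solve for (D, E, F) in x^2 + y^2 + D x + E y + F = 0 so that the
    circle passes through the three given points; each point contributes
    one linear equation x_i D + y_i E + F = -(x_i^2 + y_i^2)."""
    M = np.array([[x, y, 1.0] for x, y in points])
    rhs = np.array([-(x * x + y * y) for x, y in points])
    return np.linalg.solve(M, rhs)

# the example points P1(1, 0), P2(0, 1), P3(1, 1)
D, E, F = circle_through([(1, 0), (0, 1), (1, 1)])
```

The solve call fails exactly when the system is singular, which happens for collinear points, matching the proposition above.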
Exercise. Prove the proposition. What happens if the points are coplanar?
Exercise. Recall that three non-collinear points in R3 uniquely determine a plane, use this to
find the circle (in the space) containing the three given points. (Hint: intersect a sphere and a
plane).
(iii) x² + y² + z² + x + z − 1 = 0 (iv) x² + y² + z² − 4 = 0
3. Find the circles passing through one, two, three, and four of the following points:
(i) A(1, 1), B(0, 0), C(2, 3)
(ii) A(1, 1), B(0, 0), C(2, 2)
(iii) A(1, 0), B(0, 1), C(1/√2, 1/√2), D(1/2, √3/2)