Download as pdf or txt
Download as pdf or txt
You are on page 1of 99

Mathematical Structures

Dr Elizabeth Scott
September 2009
Department of Computer Science
Egham, Surrey TW20 0EX, England
Abstract
These notes accompany the rst year course CS1860. All computer scientists should
know some mathematics because it is the fundamental tool for describing and reasoning
about computing issues. These notes will cover aspects of dealing with large and innite
structures in a computer science environment.
This document is c Elizabeth Scott 2008.
Permission is given to freely copy and distribute this document in an
unchanged form. You may not modify the text and redistribute without
written permission from the author.
Contents
1 Introduction 1
1.1 Aims 5
1.2 Course outline 5
1.3 Books 5
2 Set theory 7
2.1 Some Examples And Notation 7
2.2 Equality of sets 9
2.3 Why isnt everything a set? Russells paradox 10
2.4 Operations On Sets 11
2.5 Properties Of Sets 12
2.6 Proofs 13
2.6.1 Direct Proof 14
2.6.2 Proof By Contradiction 14
2.6.3 Existence Proofs 14
2.6.4 Counter-examples 15
2.7 Cardinality And Power Sets 15
2.7.1 Power sets 15
2.8 Product Sets 15
2.8.1 Bit strings 16
2.8.2 Representing subsets as bit strings 17
2.9 Partitions Of Sets 17
2.10 Look up on the WEB 18
2.11 Exercises 18
3 Relations 21
3.1 General relations 21
3.2 Counting relations 22
3.3 Modular arithmetic 22
3.4 Inx relation symbols 22
3.5 Graphical representation of relations 23
3.6 Matrix representation of relations 24
3.7 Inverse and complement relations 25
3.8 Properties of relations 26
3.8.1 Equivalence relations 28
3.9 Orderings 29
3.10 Lexicographic orderings 30
3.11 Exercises 30
4 Functions And Cardinality 33
4.1 Functions 33
4.2 Inverse functions 35
4.3 Error detection in bit strings 35
4.4 Injective and surjective functions 36
4.5 Finite cardinality and bijections 37
4.6 Cardinality of innite sets 37
4.7 The Inclusion-Exclusion principle 38
4.8 Countably innite sets 38
4.9 Integer functions and orderings 40
4.10 Data compression 40
4.11 Exercises 41
5 Recursion And Induction 43
5.1 Recursive denitions 43
5.2 Recurrence relations 44
5.3 Recursion and iteration 44
5.4 The towers of Hanoi 45
5.5 The General Principle Of Induction 46
5.6 Induction and recursion 47
5.7 Exercises 48
6 Graphs And Trees 49
6.1 Introduction to graph theory 49
6.1.1 Walks, trails and paths 49
6.1.2 The Exmoor roads problem revisited 50
6.2 Trees 51
6.3 Directed graphs and rooted trees 53
6.4 Return to the K onigsberg bridges 55
6.5 Exercises 56
7 Vector Spaces 59
7.1 Vectors 59
7.1.1 Scalar multiplication 60
7.1.2 Vector addition 61
7.1.3 Linear combinations 62
7.1.4 Standard bases 62
7.1.5 Vector spaces 62
7.2 Linear independence, spanning and bases 63
7.3 Transformations 64
7.3.1 Scaling 64
7.3.2 Translation 64
7.3.3 Rotation 65
7.4 Linear Transformations 65
7.5 Matrices 66
7.6 Composing Linear Transformations 68
7.7 Inverting Linear Transformations 70
7.7.1 Kernels 70
7.8 Invertible Matrices 71
7.9 Determinants And Row Reductions 72
7.9.1 Elementary row operations 74
7.9.2 Finding inverses 75
7.10 Exercises 75
8 Probability 77
8.1 Elementary Probability 77
8.1.1 Compound events 78
8.2 Conditional Probability 80
8.2.1 Probabilities and partitions 81
8.2.2 Independent events 83
8.3 Exercises 83
9 Statistical Distributions 85
9.1 The binomial distribution 85
9.1.1 Binomial coecients 86
9.1.2 Binomial coecients in probability 87
9.2 Random variables 89
9.3 Expected values 91
9.4 Exercises 92
10 Some Properties Of Numbers 93
Chapter 1
Introduction
What sorts of problems can we expect a computer to be able to solve? How do we represent
and reason about things in a way that allows us to be condent thet they are correct?
Some students may ask why a computer science degree should include a mathematics
component. For some students, mathematics is interesting and/or enjoyable and this is
reason enough. But why is mathematics compulsory for all students?
Mathematics is intimately involved with computer science in two basic ways.
Ultimately, computers perform mathematical calculations. This is historically why
computing was developed, (by mathematicians such as Alan Turing).
Many of the major uses for computers are essentially solving mathematical problems.
Physicists have used computers for a long time to predict results of experiments using
mathematical models, and to analyse data taken during experiments.
However, the most important reason that all computer scientists should know some
mathematics is that it is the fundamental tool for describing and reasoning about com-
puting issues.
In order to be a competent historian it is necessary to read, write and comprehend
English (or the native language of the country whose history is being studied). It is
necessary to be able to write critical essays, and to use the power of the language to
convey the points being made.
Mathematics plays a similar role in computer science to that played by English in
history. Just as historians dont need to be able to write novels or poetry, computer
scientists dont need to be able to prove new mathematical theorems. However, one of the
best ways to acquire skill in expression is to study novels and poetry, and one of the best
ways to acquire mathematical literacy is to study mathematical results and applications.
A historian does not need to be a poet but does need to be literate; a computer scientist
does not need to be a mathematician, but does need to be mathematically literate.
It is possible to view any task we wish to perform in terms of tools, techniques, and
methodologies. The tools are the equipment available, the techniques are the (standard)
ways in which the tools are operated, and the methodologies are the ways in which the
techniques are combined to successfully perform the task.
2 INTRODUCTION
Tools Techniques Methodologies
tennis shoes forehand singles strategy
tennis racket backhand doubles strategy
tennis balls serve specic opponents
tennis court volley
lob
terminal typing skills produce an algorithm
keyboard using the program constructs
computer using the compiler
compiler using the operating system
pen+paper use of notation mathematical modelling
calculator proof by contradiction axiomatic deduction
symbols generalisation
.
.
.
proof by induction
.
.
.
In general, the tools are provided (created) by someone else. The techniques for using
the tools are taught by an instructor. The skill of nding a methodology for performing a
given task can only be developed with practice and experience.
The process of deciding how to (correctly) perform a task is referred to as problem
solving. The task may be anything from beating a particular person at tennis to writing
a program to calculate a specic result.
Many mathematical methodologies for problem solving are types of mathematical mod-
eling. Mathematical modeling consists of taking a real situation and reformulating it in
mathematical terms, so that we can use mathematical techniques to nd a solution. The
reformulation involves selecting some properties, ignoring others, and using more concise
notation. This is called generalisation, and models which involve generalisation are called
abstract models because we have abstracted away some unnecessary details to make the
model simpler.
For example, suppose we have to solve the following problem:
Mary has 35 pencils, which she bought for 12p each. How many pencils must
she sell at 14p before she makes a prot?
We break the problem down into small steps. The expenses are the amount Mary had to
spend, the income is what she receives from her sales, and her prot is the income minus
the expenses.
prot = income - expenses
We can calculate the values of income and expenses:
expenses = (cost)(number bought) = 1235 = 420
income = (price)(number sold) = 14(number sold)
Mary makes a prot when (income - expenses) > 0, i.e. when
14(number sold) > 420
3
Using simple algebra we have that
(number sold) > 30
So Mary must sell more than 30 pencils before she makes a prot.
We have done more than solve the pencils problem. We have produced an algorithm
that can be used to write a general purpose computer program.
Suppose that we are given the following task:
Write a computer program which accepts the cost, the selling price, and the
number of items bought, and outputs the number that must be sold before a
prot is made.
All we have to do is substitute variable names for the prices and quantities. Then the
mathematical model above will provide us with an algorithm for solving the problem.
write(enter number of items bought)
read(number bought)
write(enter cost per item)
read(cost)
write(enter selling price)
read(price)
need to sell = (cost number bought) / price
write(for a profit sell more than , need to sell)
This can be translated into the chosen programming language.
Next we look at another example of mathematical modelling, a road system. Suppose
that there are six villages A, B, C, D, E, and F scattered over a region of Exmoor. Suppose
also that there are direct roads between some of the villages, as given in the following table.
Distance apart in miles
B 5
C 5
Towns D 10 10 20
E 20 20
F 30 10
A B C D E
Towns
Now suppose that, in the interests of conservation, it is decided to close some of the roads,
leaving only the sortest amount of road needed to connect all the villages. Thus we have
the following task.
Find the smallest length of (existing) road needed to ensure that all the villages
are connected.
We begin by formulating the problem in mathematical terms, using graphs. A graph
consists of a set of vertices, and a set of lines joining selected vertices. We can use a
graph to represent the Exmoor village road system, the villages are vertices and the roads
between them are edges.
4 INTRODUCTION
x x
x
x
x
x

`
`
`
`

/
/
/
/
/
/
/
/
F
E
D
A
B
C
30
10
20
20
10
5
10
5
20
It can be shown, using mathematical techniques (in this case induction), that if all the
villages are to be connected there must be at least 5 roads. Another technique (proof by
contradiction) shows that if there are 6 or more roads between the villages then some of
the roads will be redundant, and can be removed and still leave the villages connected.
Thus we deduce, by case analysis, that the shortest road system will have exactly 5 roads.
A methodology that we can use is to check combinations of 5 roads, starting with the
shortest, until we obtain a solution.
The shortest possible length of road is 40 miles, obtained by using the ve shortest
roads, AB, BC, AD, BD, EF.
x x
x
x
x
x

`
`
`
`

F
E
D
A
B
C
10
10
5
10
5
This is not a solution as E and F are not connected to the other villages.
The next shortest length is 50 miles, using one of the roads of length 20 instead of one
of length 10. Using BE instead of BD we get a solution, which our reasoning has shown
is of the shortest possible length.
x x
x
x
x
x

`
`
`
`

F
E
D
A
B
C
10
20
10
5
5
One of the main skills that a computer scientist needs is to be able determine how
to solve a problem in a way that allows a program to do it to be written. Ultimately, a
computer has no ability to think, it only knows what it has been told and it can only
execute actions that it has been given.
Aims 5
1.1 Aims
To provide insights and skills in rigor and formal reasoning in a way that allows reasoning
about behaviour, correctness and performance in a programming environment.
By the end of this course the student should be able to:
dene and reason about sets, relations, functions and cardinality
write and reason about recursive denitions and prove results by induction and
contradiction
represent and reason about problems using graphs
use vectors and transformations to dene and manipulate graphical objects
have an understanding of basic probability and statistics suitable for use in studying
articial intelligence
1.2 Course outline
Sets: dening sets, logic notation, proofs by construction, counterexample and contradic-
tion
Relations: relations, orderings, functions, bijections
Recursion: recursive denitions and induction, cardinality of innite sets
Graph theory: graphs, trees and spanning trees, directed graphs
Vector spaces: vectors, transformations, bases, matrices, determinants
Probability: elementary and conditional probability, binomial distribution, random vari-
ables and Bayes theorem
1.3 Books
There is no set book for this course, and there is no book which alone covers the material
presented in the course. However, most of the topics in this course a covered in the book
by Rosen, which is also one of the recommended books for CS1870.
1. Kenneth Rosen, Discrete Mathematics And Its Applications, 3rd Edition, McGraw-Hill,
2006. ISBN 0-07-124474-3. (Earlier editions are also acceptable.)
2. Kenneth A. Ross and Charles R.B. Wright, Discrete Mathematics, Prentice Hall, 2003.
ISBN 0-13-065247-4.
In this course we shall assume a basic familiarity with prime numbers, and division prop-
erties of integers. If you have not had much practice with these things you should read
Section 2.3 of Rosen, or the beginning of any elementary book on number theory. A
summary of some of the basic properties is given at the end of these notes.
Chapter 2
Set theory
Set theory forms the basis of all mathematics. It provides us with a powerful notation
for expressing properties. In this chapter we will review the basic aspects of set theory.
(Most of the material covered in this section is in Sections 1.4 and 1.5 of Rosen.)
2.1 Some Examples And Notation
Informal denition A set is a collection of objects. The objects are called elements of
the set. For example, all the people in this class form a set, and each person here is an
element of the set.
Some sets of numbers are used so often that they have special symbols to denote them.
and {} both denote the empty set, the set that has no elements.
N denotes the set of all natural integers, the numbers 0,1,2,3,4, etc.
P (or N
1
) denotes the set of all positive integers, the numbers 1,2,3,4, etc.
Z denotes the set of all integers, positive and negative.
Q denotes the set of all rational numbers, numbers of the form n/m where n is an
integer, and m is a positive integer.
R denotes the set of all real numbers.
S
n
denotes the set of the rst n integers, 1,2,...,n.
The phrase a is an element of the set S is written more concisely as
a S.
It is false if S is not a set, or if S is a set but a is not an element of the set. So if 1 and 2
are the usual integers then 2 1 is false.
The phrase a is not an element of the set S is written
a S.
Sets can be dened in several ways.
Descriptively: the set of all the people in this classroom.
This is perhaps the simplest method, but it is also the least precise. So we use other
methods.
8 SET THEORY
Enumeration: (listing all the members between braces)
{1, 2, 4, 6, 8}
{red, blue, the tower of London}
The order in which the members are listed is unimportant.
{1, 2, 3}, {3, 1, 2}, {3, 2, 1}
are all the same set.
This method is ne, but there is a problem if there are lots of elements. If the elements
follow a (very) simple pattern, we can use dots to indicate missing elements, thus making
the representation shorter.
{1, 2, ..., 201}
But this is not precise enough for a formal specication, or for a computer!
Predicates: We will often need to consider sets of objects, all of which share a common
property. Our third way of dening sets essentially says the set of all objects x with
property P(x)
For example,
the set of all odd integers
the set of all roots of the function f
the set of all Kings and Queens of England since 1066
We write the set of all objects for which P(x) is true as
{x | P(x)}.
For example, the set of odd integers is written
{x | x = 2n + 1, for some n Z}.
Other examples are:
S
n
= {x | x N and 1 x n}
Q = {p/q | p Z, q P}
Z = {x | x N or x N}
P = {x | x N, x = 0}
Programs operate on data which are grouped together in types. These types are sets.
For example, the Java type int is the set of integers, Z. A C++ enumerated type is set of
objects whose elements are listed explicitly.
enum days of week {Mon, Tue, Wed, Thur, Fri, Sat, Sun}
declares the set days of week = {Mon, Tue, Wed, Thur, Fri, Sat, Sun}.
Equality of sets 9
2.2 Equality of sets
Throughout this course we shall have to consider the question of what we mean by saying
two things are equal. In programming we are concerned about whether two objects have
the same value, and in this case whether changing the value of one also changes the
value of the other (aliasing). In theoretical computer science we have to consider the case
in which two objects which look dierent are actually the same, and thus can be used
interchangeably.
Denition Two sets are equal if they each contain the same elements. We write A = B
if A and B contain the same elements.
{2, 4, {}, } = {{}, 2, , 4}
{1, 2, 3, 4} = S
4
{x | for some n N, x = 2n + 1, x 11} = {1, 3, 5, 7, 9, 11}
We prove that two sets S
1
, S
2
are equal by showing that (a S
1
implies a S
2
) and
that (a S
2
implies a S
1
).
Denition A set A is a subset of a set B if all the elements of A are also elements of B.
For example, {1, 2, 4, 77} is a subset of N, and N is a subset of Z. We will write A B or
B A.
Every set is a subset of itself, A A is always true.
A is a subset of B if a A implies a B. Thus we see that if A B and B C then
A C. We also have that A = B if and only if A B and B A.
We dene the empty set to be a subset of every set, i.e. A, where A is any set, is
true.
For small sets we can write out the elements and check by examining each element
that one is a subset of the other. However, we cant do this when reasoning about large
sets, and a computer cant do this at all.
If the sets A and B are dened by a predicate (property) then we can show that an
element of A is also an element of B by showing that it has the property required for B.
Example Show that for the following two sets A B.
A = {5, 8, 23}
B = {x | x = 3n + 2 for some n Z}
If n = 1 we have 5 = 3n + 2, so 5 B.
If n = 2 we have 8 = 3n + 2, so 8 B.
If n = 7 we have 23 = 3n + 2, so 23 B.
Thus every element of A belongs to B and so A B.
Example Show that the following two sets are equal.
A
1
= {x | x = 2n + 1 for some n Z}
A
2
= {x | x = 2n 1 for some n Z}
If x A
1
then by denition of A
1
then there must be an integer n Z such that x = 2n+1.
Since n is an integer then so is m = n + 1. We have m Z and x = 2m 1, so x A
2
.
10 SET THEORY
Thus we have proved that every element of A
1
also belongs to A
2
(even though there are
innitely many of these elements).
If y A
2
then by denition of A
2
then there must be an integer n Z such that y = 2n1.
Since n is an integer then so is m = n 1. We have m Z and y = 2m + 1, so y A
1
.
So A
1
= A
2
as required.
This method of proof can also allow us to prove two sets are equal even if we dont
know what the sets are! More precisely, it can allow us to show that sets constructed from
other sets in dierent ways are equal. We shall return to this later.
2.3 Why isnt everything a set? Russells paradox
The elements of a set dont have to have anything in common, and their elements can
themselves be sets.
{4, the tower of London, {31, my car, {}}, apollo 11, {}, 9012}
So why do we have sets, isnt every collection of objects a set?
It turns out that if we just say that a set is any collection of objects then we end up
with a paradox, a statement which is both true and false.
Mathematical paradoxes play a very important role in Science. Until the beginning
of the 20th century people did not realise that such paradoxes exist. Then in 1931 a
Mathematician called Kurt G odel proved that in any formal mathematical system specied
by a set of axioms and rules either there is a true proposition which cannot be proved true
using the rules of the system, or the system contains a proposition which can be proved
both true and false using the rules of the system!
This has implications for what we try to do as computer scientists. There is no point
in trying to build a computer that can prove everything thats true about the complete
system that it can represent. For example, we cannot expect a computer to be able to
derive all the true statements in arithmetic.
In 1937 Alan Turing put the problem at the heart of computing when he showed that
the halting problem is undecidable. When we write a program we would like to be able
to tell if it will terminate, even if we have to wait a very long time for it to do so. Turing
showed that there cannot exist an algorithm which, given any program, will be able to
decide whether or not that program will terminate. Of course, there are lots of programs
that we can decide will terminate, or will not terminate. But there is no hope of writing
a computer program which will decide whether or not any given program will terminate.
Proving, or even understanding the basic arguments for, the halting problem and
G odels incompleteness theorem requires some mathematical skill. But one of the ear-
liest mathematical paradoxes, discovered by the mathematician Russell in 1901, can be
formulated using only the concepts of set membership.
Some sets are clearly not elements of themselves. For example, the set S
4
of integers
from 1 to 4 is not itself an integer between 1 and 4, but it is an element of the set
{3, desert orchid, S
4
, Emma}, and of the set of subsets of S
6
.
There are a lot of sets which do not have S
4
as an element. In fact we can form the set
S of all sets which do not have S
4
as an element, and S
4
is an element of this set. Since
S
4
is an element of S, we have that S is not an element of itself.
OK, so what about the collection of all those sets which are not elements of them-
selves. Is a set?
If is a set then we can ask is an element of itself? If the answer is no then it is
also yes, and if its yes then it is also no!
Operations On Sets 11
Thus we have that the collection
= {S | S is a set, S / S}
cannot be a set. This is known as Russells paradox and it shows that not every collection
of objects which can be dened by a given property is a set.
If you want to know more about the formal rules which determine when a collection
of objects is a set, you should look up the work of Georg Cantor and then the work of
Bertrand Russell. For this course you do not need to know the formal denition of a set.
2.4 Operations On Sets
We can use set operations to form new sets from existing ones. These operations are called
union, intersection, complement or dierence, and symmetric dierence.
The mathematical notation for these operations is , , \ and .
Union,
The union of two sets A and B is obtained by taking all the elements of A and all the
elements of B.
A B = {x | (x A) or (x B)}.

A B
So {red, blue} {1, 4, 5} = {red, blue, 1, 4, 5} and N = P {0}.
A Venn diagram is a pictorial representation of the result of applying set operations.
Sets are drawn as circles, and sets which have elements in common are drawn to overlap.
A = {1, 2, 3, 4, 5, 6, 7, 8}, B = {1, 2, 3, 4}, C = {1, 2, 7, 8, 39, 10}
&%
'$ 5
6
3
4
1
2
7
8
39
10
A
C
B

Note: Venn diagrams are only used for illustration. They are merely pictures and cannot
be used to prove properties.
For example, two sets are only equal if they have the same elements. Having the same
picture is not a proof.
Intersection,
The intersection of two sets A and B is obtained by taking the elements which are in both
A and B.
A B = {x | (x A) and (x B)}.
A B

So {1, 3, 4, 17} {1, 4, 5} = {1, 4} and {} = = P {1, 0, 4.567}.


Complement, \
The complement of B in A is obtained by removing the elements of B from the set A.
12 SET THEORY
A\B = {x | (x A) and (x B)}.
A B

>
>
>>
So {1, 3, 4, 17}\{1, 4, 5} = {3, 17} and P = N\{0}.
Symmetric dierence,
The symmetric dierence of two sets A and B is obtained by taking the elements which
are in A or in B, but not in both.
A B = {x | (x A or x B) and x A B}.
A B

>
>
>>

.
.
So {1, 3, 4, 17} {1, 4, 5} = {3, 17, 5}
2.5 Properties Of Sets
Sets constructed in dierent ways can be equal. A statement that two sets are equal is
called an identity. We use element wise arguments to establish set-theoretic identities.
This can allow us to show that two sets are equal even if they have innitely many elements,
and even if we dont know what the elements are.
A B = B A
If x A B then, by denition of , either x A or x B.
If x A then x B A, by denition of .
If x B then x B A, by denition of .
So (A B) (B A).
If x B A then, by denition of , either x B or x A.
If x B then x A B, by denition of .
If x A then x A B, by denition of .
So (B A) (A B).
So we have shown that A B = B A.
(This is the only time we shall spell a proof out like this. We usually say, the argument
that shows (B A) (A B) is similar.)
A B = B A
If x AB then, by denition of , x A and x B. So x BA and AB BA.
The reverse inclusion follows similarly, so A B = B A.
The next two results are direct consequences of the denitions, but they are very useful
for proving other results.
A A B
If x A then x A B, by denition of .
A B A
If x A B then, x A by denition of .
Proofs 13
Notice A B A and B A A.
A = A
If x A then, by denition of , x A or x . By denition of we have x , so
we must have x A.
(A B) C = A (B C)
If x (A B) C then x A B or x C. So x A, or x B, or x C. Thus x A
or x B C, so x A (B C).
The proof that x A (B C) implies x (A B) C is similar.
(A B) C = A (B C)
The proof of this is left as an exercise.
We can write A B C (and A B C) because both possible interpretations of
A B C give the same set. (A B) C = A (B C). Furthermore, the order of the
sets is not important either, AB C = B C A. This follows the similar situation in
arithmetic: a +b +c = b +c +a etc.
Notice, A B C = B C A, and we need brackets A (B C) = (A B) C.
Just as a b +c = b c +a and (a b) +c = a (b +c).
A\B = A\(A B) and A\B = (A B)\B
If x A\B then x A and x B, so x A B. Thus x A\(A B).
If x A\(A B) then x A and x A B. If x B then x A B, a contradiction.
Thus x A and x B, so x A\B.
2.6 Proofs
A proposition is a statement which is either true or false. We may not know if is true or
false but it must be one or the other. So
6 < 10
King Henry VIII of England had six wives
4 1 = 7
On 8th December 2009 it will rain in Egham
On 10th May 1004 there was a Viking named Swen in Winchester
are all propositions, but the statement
This statement is false
is not a proposition.
A proof is a logical argument which ends with the conclusion that some proposition is
true. We state the assumptions, or hypotheses, and these form the axioms on which the
proof is based. We then use rules to deduce consequences of these axioms, until we get
the required proposition.
It is important that computer scientists understand what constitutes a proof. Any
program is based on an algorithm; for the program to work correctly the algorithm must
be correct. Most program implementations are too complex to be able to prove them
correct (although producing provably correct software is now a major area of research and
development). However, an implementation is more likely to be correct if its writer has a
good understanding of how proofs are constructed.
14 SET THEORY
Proving something means showing that an implication is true,
(the assumptions) imply (the required proposition).
2.6.1 Direct Proof
Assume that P is true and show that Q must be true.
For example, we can give a direct proof that if an integer n is divisible by 54, then it is
divisible by 3:
P = n is divisible by 54
Q = n is divisible by 3
Prove: P Q.
Assume that P is true, so n = 54m (this is the denition of division).
Letting r = 18m, we have (n = 3r), so 3 divides n.
P implies m(n = 54m) implies r(n = 3r) implies Q.
2.6.2 Proof By Contradiction
Direct proofs have limited uses, they tend to prove only obvious things. A more powerful
method of proof relies on the fact that something cannot be both true and false (remember
we have to be careful with set theory to ensure that this is true). We assume both that
P is true and that Q is false, and derive a contradiction (a proposition which is always
false). Then we can deduce that if P is true then Q cannot be false, so it must be true.
For example, we can use proof by contradiction to prove the result that, for integers n, if
n is prime and not equal to 2, then n is odd.
P and (not Q) = ((n is prime) and (n = 2)) and (n is even).
We assume that n is prime, that n = 2, and n is even. So, n = 2m and 2 divides n. But
n = 2, so n is not prime which contradicts the assumption that n is prime.
2.6.3 Existence Proofs
Most mathematical propositions are depedent on some quantity. For example,
for some integer n, n
4
= 81
for all integers n, n
2
+ 2 2
Statements of this form are called predicates, and are written P(n) if the quantity that P
depends on is n.
To prove that a predicate is true for some value we just have to exhibit one value of n
for which P(n) is true. This is called an existence proof. (We sometimes write nP(n) is
true if there exists some value of n for which P(n) is true.)
For example, we prove that for some integer n, n
2
= 43264 by calculating that (208)
2
=
43264.
Existence proofs are often constructive they involve producing an algorithm that con-
structs an element with the property.
How did I nd the example above? I used prime factorisation:
Cardinality And Power Sets 15
43264 = 2 21632 = 2 2 10816 = 2
2
2 5408 = 2
2
2
2
2704
= 2
2
2
2
2 1352 = 2
2
2
2
2
2
676 = 2
2
2
2
2
2
2 338
= 2
2
2
2
2
2
2
2
169 = 2
2
2
2
2
2
2
2
13
2
= (2 2 2 2 13)
2
2.6.4 Counter-examples
To show that a proposition of the form for all integers n, P(n) is false we just have to
nd one value of n for which P(n) is false. This value is called a counter-example. (We
sometimes write nP(n) is true if P(n) is true for all values of n, and nP(n) if P(n)
has a counter-example.)
For example, for all integers n, if n is prime then 2n + 1 prime.
If n = 7 then n is prime, but 2n + 1 = 15 is not prime. So when n = 7, (n prime)
(2n + 1 prime), and 7 is a counter-example.
2.7 Cardinality And Power Sets
One very powerful technique for proving results is to use the size (number of elements) of
a set. The number of elements in a set S is called its cardinality, and is written |S|.
|{1, 2}| = 2 = |{a, b}| || = 0 |S
n
| = n
Some sets are innite. We write
|N| = |Z| = |{2n + 1 | n N}| = .
We use the cardinality of sets to prove results by induction. (We shall discuss proofs
by induction in Chapter 5.)
2.7.1 Power sets
Denition The power set of A is the set of all subsets of A. It is written P(A).
P({a, b}) = {{a, b}, {a}, {b}, }
P({1, 2, 3}) = {{1, 2, 3}, {1, 2}, {1, 3}, {2, 3}, {1}, {2}, {3}, }
If |A| = n then what is |P(A)|?
In fact, |P(A)| = 2
|A|
, so |P(A)| = 2
n
if |A| = n. (When |A| is nite, this can be proved
by induction, as we shall see later.)
2.8 Product Sets
Most programming languages have arrays as one of their basic data types. Arrays allow
several pieces of information to be stored together.
For example, we may hold the roots of quadratic equations in arrays of size 2. So [1,-3]
holds the roots of x
2
+ 2x 3.
In mathematics the equivalent of an array [a,b] of size 2, is an ordered pair, or vector,
(a, b). As you might expect, in an ordered pair the order of the elements is important. So
(a, b) = (c, d) if and only if a = c and b = d. So (1, 3) = (3, 1).
16 SET THEORY
We can form ordered pairs (a, b) with elements from any two sets A and B.
For example,
(1,a), (1,b), (1,c), ... , (1,z),
(2,a), (2,b), (2,c), ... , (2,z),
(3,a), (3,b), (3,c), ... , (3,z)
are ordered pairs formed by taking the rst element from S
3
= {1, 2, 3} and the second
element from [a..z] = {a, b, c, ..., z}.
Denition We call the set of ordered pairs whose rst element comes from A and whose
second element comes from B, the (Cartesian) product, AB, of A and B. So
A B = {(a, b) | a A, b B}.
The above set is
S
3
[a..z] = {(, ) | 1 3, [a..z]}.
S
2
S
4
= {(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 2), (2, 3), (2, 4)}
S
4
S
2
= {(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (4, 1), (4, 2)}
So AB = B A.
Again as you might expect, we can form the Cartesian product of more than two sets.
(a
1
, a
2
, a
3
) is called an ordered triple, or a 3-dimensional vector, and corresponds to an
array [a
1
, a
2
, a
3
] of size 3.
We call (a
1
, a
2
, . . . , a
n
) an ordered n-tuple, or an n-dimensional vector. We still have that
(a
1
, a
2
, . . . , a
n
) = (b
1
, b
2
, . . . , b
n
)
if and only if a
1
= b
1
, a
2
= b
2
, . . . , a
n
= b
n
.
For example, AB C = {(a, b, c) | a A, b B, c C}.
S
2
{a, b} S
1
= {(1, a, 1), (2, a, 1), (1, b, 1), (2, b, 1)}.
In general,
A
1
A
2
. . . A
n
= {(a
1
, a
2
, . . . , a
n
) | a
1
A
1
, a
2
A
2
, . . . , a
n
A
n
}.
We have seen that
|S
2
S
4
| = 8 = 2 4
|S
2
{a, b} S
1
| = 4 = 2 2 1
|S
3
[a..z]| = 78 = 3 26
It is possible to prove that
|A B| = |A| |B|
and generally that
|A
1
A
2
. . . A
n
| = |A
1
| |A
2
| . . . |A
n
|.
2.8.1 Bit strings
One important class of product sets in computer science are the binary arrays which come
from the products of the set {0, 1}.
Partitions Of Sets 17
Notation: When all the sets in a Cartesian product are the same, we write A
n
rather than
AA. . . A.
So A
2
= A A, A
3
= AAA, etc.
If B = {0, 1} then
B
4
= (0, 0, 0, 0) (1, 0, 0, 0) (0, 1, 0, 0) (0, 0, 1, 0)
(0, 0, 0, 1) (1, 1, 0, 0) (1, 0, 1, 0) (1, 0, 0, 1)
(0, 1, 1, 0) (0, 1, 0, 1) (0, 0, 1, 1) (1, 1, 1, 0)
(1, 1, 0, 1) (1, 0, 1, 1) (0, 1, 1, 1) (1, 1, 1, 1)
|B
4
| = 16, and in general |B
n
| = 2
n
.
In computer science, the elements of B
n
are called bit strings.
2.8.2 Representing subsets as bit strings
Let S = {s
1
, s
2
, . . . , s
n
} be a set with nitely many elements. We pick an order, s
1
, s
2
, . . . , s
n
say, for these elements. Then we can represent subsets of S using elements of B
n
, where
B = {0, 1}. We dene b
A
= (b
1
, b
2
, . . . , b
n
) by putting b
i
= 1 if s
i
A and b
i
= 0 if s
i
A.
For example, let S = {a, b, c, d, e, f, g} and A = {b, e, g, a} then
b
A
= (1, 1, 0, 0, 1, 0, 1).
The bit string b
A
is called the characteristic vector of the subset A.
If S = {1, 3, 5, 7, 9, 11} then
A = {3, 7} b
A
= (0, 1, 0, 1, 0, 0)
B = {1, 7, 9} b
B
= (1, 0, 0, 1, 1, 0)
AB = {3, 1, 9} b
AB
= (1, 1, 0, 0, 1, 0) = b
A
+b
B
(using binary addition).
Notice that for all subsets A and B we have b
A
+b
B
= b
AB
. We can use this to give a
simple algorithm for nding the symmetric dierence of two sets.
We can also give an algorithm using characteristic vectors to nd the union of two subsets.
We nd b
C
where C = A B.
FOR i=1 to n DO
IF b
A
[i]=1 or b
B
[i]=1 THEN b
C
[i]=1
ELSE b
C
[i]=0
(Here b[i] is the value held at index i in b.)
2.9 Partitions Of Sets
We often have to split things up into dierent cases. For example, when testing programs
we divide the input into subsets of normal, boundary, and illegal cases. Every test
case is either normal, boundary or illegal, and no case is both normal and boundary or
normal and illegal, etc. This is an example of a partition.
Two subsets A, B S partition S if
A B = S and A B = .
18 SET THEORY
`
`
`
`

`
`
`
`
`
-
-
-
-
-
-
-
`
`
`
`

S

>
>
.
.

.
.
.
.
.
.
.
.
.
.
.
.
.
.

>
>
>
>
.
.

A
B

.
.
A binary partition splits the elements of S into two separate cases. If x S then either
x A or x B but not both.
The example of program testing required a set to be split into three parts. In fact we can
form a partition with any number of subsets.
Denition The subsets A
1
, A
2
, . . . , A
n
S partition S if
S = A
1
A
2
. . . A
n
, the A
i
cover S
If i = j then A
i
A
j
= , the A
i
are pairwise disjoint.
Example Let S = {1, 2, a, X, 3, } and let A = {1, 2}, B = {a, 3}, C = {}, D = {X}.
Then A, B, C, D partitions S.
2.10 Look up on the WEB
Georg Cantor Born in Russia in 1845, while Cantor was still a child his family moved
to Germany. As well as his work on set theory, Cantor produced the formal denition of
the real numbers in terms of Dedekind cuts, and proved that the rational numbers are
countable and that the real numbers are not. Cantor suered periodic bouts of mental
ill health, which grew worse as he got older. He never met Russell, because his health
forced him to cancel a planned visit. But Cantors work took the study of orderings and
set theory from their naive setting in the mid 1800s to the modern approach, and forms
the foundations of the work continued by Russell, G odel and Turing.
Bertrand Russell Born in England in 1872, Russell worked at Trinity College Cam-
bridge. He was a pacist who was imprisoned in 1918 and again in 1961 (when he was
89!) for his pacist beliefs. He won the Nobel Prize for literature in 1950, but his most
important work is the Principia Mathematica which he wrote with Alfred Whitehead.
Whiteheads nephew Henry Whitehead was Professor of Pure Mathematics at Oxford,
where he was the D.Phil supervisor of Graham Higman. Higman was also Professor of
Pure Mathematics at Oxford, where he was my D.Phil supervisor!
Kurt G odel Born in 1906 in what was then Austria but is now the Czech Republic,
G odel did his work on his incompleteness theorem at the University of Vienna in the
1920s. All his life G odel was worried about his health and at the start of the second
World War he moved to America, probably to avoid conscription which he felt his health
would not survive. In America G odel he became a close friend of Albert Einstein.
2.11 Exercises
1. Prove that A (B C) = (A B) C.
Exercises 19
2. List the elements of the set {x | x N, 2 x < 4}.
3. Write the set {1, 2, ..., n} in predicate form
(i). using S
n
(ii). without using S
n
.
4. For A = {1, 2, 3, 4, 5}, B = {2, 5, a, b}, and C = {a, 2} write down
A B, B C, A B, B C, A\C, AC.
5. Draw Venn diagrams for A (B C) and (A B) C.
6. Write out the power set of {1, 2, 3, 4}.
7. Let S = {a, b, c, d, e, f, g} and suppose that the elements of S are ordered alphabeti-
cally. If A = {c, g, a, c}, write down b
A
, the characteristic vector of A.
Which subset has characteristic vector (0, 1, 1, 1, 0, 1, 0)?
8. Calculate the number of subsets of cardinality 4 in the set {1, 2, 3, ..., 23}.
9. Let A = {a, b, c}, B = {, }. Write out the elements of AB and B A.
10. How many elements are there in S
14
{x, y, z}?
11. Let S = {1, 2, 3, a, b, c}. Write down three sets A, B, C such that A, B, C partition
S.
12. Prove that A B = (A\B) (B\A).
13. Prove the following identities, which are called the distributive laws.
(i). A (B C) = (A B) (A C).
(ii). A (B C) = (A B) (A C).
14. Write the set {2, 5, 8, 11, 14, 17, ...} in predicate form.
15. Prove that A (B C) = (A B) C.
16. Prove that (A B)\B = A\B.
17. Let |S| = n. Write out an algorithm for calculating the intersection of two subsets
which are represented as bit strings.
18. Calculate the number of subsets of cardinality 3 in the set {x |x N, 1 x 17}.
19. Find all possible partitions of the set {1, 2, 3}.
20. If |A| = n and |B| = m, what is |P(A B)|?
Chapter 3
Relations
It is natural to compare things your pudding is larger than mine, she is taller than her
brother.
The concept of comparison is important in mathematics and computer science. Searches
through data are much more ecient if the data is ordered somehow. (Imagine trying to
nd a word in a dictionary if the words were not ordered alphabetically, or at all!)
Comparisons, or orderings as they are called in mathematics, are a form of relation.
We often want to divide a collection of objects into groups, each group having some
particular property. For example, we could divide this class up into groups according to
the degree course for which each person is registered. Then two people are related if they
are registered for the same course.
A large amount of the worlds computing power is being used to classify data structures
into classes, and to sort data into some given order. Relations between properties form
the basis one of the mostly widely used types of data base relational data bases.
In this chapter we study relations.
3.1 General relations
(Most of the material in this section can be found in Section 1.6 and Chapter 3 of Rosen.)
Consider the set of all undergraduate students in the college, and the set of all courses
oered by departments in the college. The students and the courses are related in an
obvious way: a student is related to a course if he/she is currently registered for the
course. This is an example of a relation. Most students are related to several courses, and
most courses are related to several students.
Relations are just sets of ordered pairs. If S is the set of all students in the college, and
if C is the set of all courses then we form the ordered pair (s, c) if student s is registered
for course c. So, for example, we may have (John Brown, CS1860).
If R is the relation is registered for then we have
R = {(s, c) | s S, c C, s is registered for c}.
So R S C. We could also write
R = {(s, c) S C | s is registered for c}.
Denition Let S and T be sets. A (binary) relation between S and T is a set of ordered
pairs, a subset of S T.
Example Let S = { int, oat, char, double} and let T = {term, amount, total cost,
nished, answer} be variable names from a program. Then we can dene a relation R
22 RELATIONS
between S and T by saying that s is related to t if t T is of type s. So we might have
R = {(int, term), (int, amount), (int, nished), (oat, total cost), (char, answer) }.
Example Let S = T = S
4
and say that s is related to t if s is bigger than or equal to t.
Then the relation is
{(1, 1), (2, 1), (2, 2), (3, 1), (3, 2), (3, 3), (4, 1), (4, 2), (4, 3), (4, 4)}
Example Write down all the relations between {a, b} and {1, 2}.
R
1
= {(a, 1), (a, 2), (b, 1), (b, 2)} R
2
= {(a, 2), (b, 1), (b, 2)}
R
3
= {(a, 1), (b, 1), (b, 2)} R
4
= {(a, 1), (a, 2), (b, 2)}
R
5
= {(a, 1), (a, 2), (b, 1)} R
6
= {(b, 1), (b, 2)}
R
7
= {(a, 2), (b, 2)} R
8
= {(a, 2), (b, 1)}
R
9
= {(a, 1), (b, 2)} R
10
= {(a, 1), (b, 1)}
R
11
= {(a, 1), (a, 2)} R
12
= {(b, 2)}
R
13
= {(b, 1)} R
14
= {(a, 2)}
R
15
= {(a, 1)} R
16
=
We call a relation between A and itself a relation on A.
It is important to notice a relation may not be reversible. We can dene a relation R on
S
4
by declaring that (s, t) R if s t. Then we have (2, 1) R, but (1, 2) R.
3.2 Counting relations
Any subset of S T is a relation between S and T. So if |S| = n and |T| = m then there
are 2
nm
dierent relations that can be dened between S and T.
3.3 Modular arithmetic
The most general way of describing a relation is to use the predicate notation for sets,
making the predicate the condition for two objects to be related.
For example, let p be a prime number, and let n N. Then let n%p be the remainder
when n is divided by p. So there is some integer k such that
n = kp +n%p.
5%2 = 1, 4%2 = 0, 8%3 = 2, p%p = 0 (p + 1)%p = 1, etc.
We can dene a relation R on N by
R = {(n, m) | n%p = m%p}.
This is the same as the relation you may have met before:
R = {(n, m) | n m(modulo p)}.
We have (n, m) R if and only if n m(modulo p).
(If you have not met the relation modulo p before, you can read about it in Section 2.3
of Rosen.)
3.4 Inx relation symbols
Some common relations correspond to operations that are well known. For example, we
can dene R = {(s, t) | s, t S
4
, s < t}. Then we have (s, t) R if and only if s < t. The
statement s < t is shorter and easier to read than (s, t) R.
Thus, when a relation R is generated by a well known mathematical operation we may
write sRt rather than (s, t) R.
Graphical representation of relations 23
The so called trivial reexive relation is dened by R
0
= {(s, t) | s = t}. We write s = t
instead of (s, t) R
0
.
Using the predicate method for describing sets may be the only way to describe a relation.
But if there are only a few elements in each of the sets involved in the relation then we
can use a graphical representation.
3.5 Graphical representation of relations
We have already mentioned that a graph is a set of vertices (nodes) and a set of edges
between them. A digraph (a di(rected) graph) is a graph whose edges are arrows. We can
use digraphs to represent relations. The nodes of the digraph are the elements of the two
sets, and there is an arrow from one node to another if the two nodes are related.
Let S = { int, oat, char, double} and T = {term, amount, total cost, nished,
answer}, and let R be the relation is of type described above. Then a graphical repre-
sentation of R is
t
t
t
t
t
t
t
t
t

`
`
`
`
`
`
`
`

int
oat
char
double
term
amount
total cost
nished
answer
Example The following are graphical representations of the relations R
1
, R
3
, R
10
and
R
14
dened above between {a, b} and S
2
.
t
t
t
t
a
b
2
.
.
.
.
.
.
..
>
>
>
>
>
>
>
>
1
R10
t
t
t
t
a
b
2
.
.
.
.
.
.
..

>
>
>
>
>
>
>
>
-
-
-
-
-
-
--
1
R1
t
t
t
t
a
b
2

1
R14
t
t
t
t
a
b
2
.
.
.
.
.
.
..

>
>
>
>
>
>
>
>
1
R3
If the relation is dened on A, so it is a relation between A and A, then we only list the
nodes of A once. We end up with graphs that look dierent because there can be edges
in both directions between two nodes and there can be an edge from a node to itself.
Example We can dene the relation R = {(s, t) | s, t S
3
, ((s%2) < (t%2)) or (s < t)}.
The graphical representation is
24 RELATIONS
t t
t
`
`
`
`
`
`
`
`
1 2
3


Example Let R be the relation on S
5
. Then the graphical representation of R is
t
t
1
5

-
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
``
t
3
t
2
`
`
`
``

.
t
4

-
'

`
3.6 Matrix representation of relations
A matrix is a two dimensional array. Matrices have rows and columns. We refer to an
entry in the matrix by stating its row number and then its column number.
M =
_
_
1 0 1 0
1 1 1 1
0 0 1 1
_
_
is a matrix with three rows and four columns. The (3,2)th entry of M is 0.
If M is a matrix with n rows and m columns, we say that M is an n m matrix (said n
by m). We use M(i, j) to denote the entry in the ith row and jth column of M.
For the above matrix M(2, 4) = 1.
We can dene a matrix by saying what M(i, j) is for all possible values of i and j. For
example, if M is a 5 3 matrix and
M(i, j) =
_
_
_
i +j, if i < j
i j, if i > j
0, if i = j,
then
M =
_
_
_
_
_
_
0 3 4
1 0 5
2 1 0
3 2 1
4 3 2
_
_
_
_
_
_
We can use matrices whose entries are either 1 (true) or 0 (false) to describe relations.
We use the elements of the rst set to label the rows and the elements of the second
set to label the columns. Then we put 1 in the (s, t) entry if (s, t) R, and 0 in the entry
otherwise.
Inverse and complement relations 25
The following matrix represents the relation on S
3
.
_
_
1 1 1
0 1 1
0 0 1
_
_
If R is a relation between two nite sets A and B then we pick an order for the elements
of the sets, say a
1
, a
2
, . . . , a
n
and b
1
, b
2
, . . . , b
m
. We then build an n m matrix, whose
(i,j)th entry is 1 if (a
i
, b
j
) R, and is 0 otherwise.
For example, suppose that A = {u, w, y} and B = {1, 3, 5, 7} and
R = {(u, 1), (w, 1), (u, 3), (u, 5), (w, 5), (y, 5), (y, 7)}.
Order A and B in the natural way, so that A = u, w, y and B = 1, 3, 5, 7. Then the matrix
representation of R is
_
_
1 3 5 7
u 1 1 1 0
w 1 0 1 0
y 0 0 1 1
_
_
.
If A = B = S
5
, if R is the relation , and if S
5
is ordered in the usual way, S
5
= 1,2,3,4,5,
then the matrix representation of R is
_
_
_
_
_
_
1 1 1 1 1
0 1 1 1 1
0 0 1 1 1
0 0 0 1 1
0 0 0 0 1
_
_
_
_
_
_
.
If A = B = S
3
, if R is the relation R = {(s, t) | s, t S
3
, ((s%2) < (t%2)) or (s < t)},
and if S
3
is ordered in the usual way, then the matrix representation of R is
_
_
0 1 1
1 0 1
0 0 0
_
_
.
Of course, we can reconstruct the relation from its representation matrix. If A = {1, 2, 3, 4},
B = {a, b, c}, both in the usual order, and if R has representation matrix
_
_
_
_
0 1 1
1 0 1
1 0 0
1 0 1
_
_
_
_
,
then
R = {(1, b), (1, c), (2, a), (2, c), (3, a), (4, a), (4, c)}.
3.7 Inverse and complement relations
We have looked at two relations < and on Z. There are two corresponding relations >
and . These are examples of inverse relations.
If n < m then m > n, and if n m then m n.
If we have a relation R on a set A we can form another relation R
1
on A using the rule
26 RELATIONS
aR
1
b if and only if bRa.
The relation R
1
is called the inverse of R.
> is the inverse of < and is the inverse of .
We have a(R
1
)
1
b if and only if bR
1
a, which is if and only if aRb. So the inverse of
R
1
is R, as we might hope.
Also, looking at the relations < and on Z we see that m n if and only if m < n.
We can dene the complement R of R by the rule
aRb if and only if (a, b) R.
So is the complement of <.
Exercise What is the complement of ?
3.8 Properties of relations
Many of the relations we meet have particular properties. For example, for the relation
on S
5
we have 1 1, 2 2, etc. We also have that if n m and m r then n r.
The relation R = {(n, m) | n = m modulo3} on S
10
has both these properties; (n, n)
R, and if (n, m) R and (m, s) R then (n, s) R. This relation also has the property
that if (n, m) R then (m, n) R.
Denitions Suppose that R is a relation on a set A.
1. R is reexive if (a, a) R for all a A.
2. R is symmetric if, for all a, b A, we have (a, b) R implies that (b, a) R.
3. R is transitive if, for all a, b, c A, we have (a, b), (b, c) R implies that (a, c) R.
The relation on S
n
is reexive and transitive, but it is not symmetric if n 2. For
example, 1 2 but 2 1.
The relation
3
dened on N by the rule
n
3
m if and only if n = m modulo3,
is reexive, symmetric and transitive.
If you draw a graphical representation of a reexive relation you will nd that every node
has an edge going to itself.
The graphical representation of
3
on S
4
is
t t
t
1 2
4
t
3

If you write out the matrix representation of a reexive relation you will nd that there
are 1s down the leading diagonal.
The matrix representation of
3
on S
4
is
Properties of relations 27
_
_
_
_
1 0 0 1
0 1 0 0
0 0 1 0
1 0 0 1
_
_
_
_
If you draw a graphical representation of a symmetric relation you will nd that if there is
an edge in one direction between two nodes, then there is an edge in the other direction.
The graphical representation of R = {(a, b), (b, a), (a, c), (c, a), (d, d)} on A = {a, b, c, d} is
t t
t
a
b
d
t
c

-
`
If you write out the matrix representation of a symmetric relation you will nd that the
matrix is symmetric about the leading diagonal.
The matrix representation of R = {(a, b), (b, a), (a, c), (c, a), (d, d)} on {a, b, c, d} is
_
_
_
_
0 1 1 0
1 0 0 0
1 0 0 0
0 0 0 1
_
_
_
_
If you draw a graphical representation of a transitive relation you will nd that if there
is an edge from node A to node B, and if there is an edge from node B to node C, then
there is an edge directly from node A to node C.
The graphical representation of R = {(a, b), (b, b), (b, c), (a, c), (c, b), (c, c), (b, d), (c, d), (a, d)}
on {a, b, c, d} is
t t
t
a
b
d
t
c

If you are using a graph or a matrix to represent a relation in a computer program, then
you can exploit the special properties that these representations have when the relation is
known to be reexive or symmetric. For example, if the relation is symmetric then it is
only necessary to store the upper triangular half of the matrix representation.
It is easy to tell whether a relation is reexive or symmetric from either the graphical or
the matrix representations. In general it is hard to tell whether a relation is transitive.
28 RELATIONS
3.8.1 Equivalence relations
Denition A relation on a set A which is reexive, symmetric and transitive is called an
equivalence relation on A.
An equivalence is an abstraction (generalisation) of equality. We think of objects as
being equivalent in some way, and from this point of view two equivalent objects are
indistinguishable.
For example, the relation R on Z dened by R = {(n, m) | (nm > 0) or n = m = 0} is
an equivalence relation. Two integers are equivalent if they have the same sign. (Using this
relation corresponds to saying that we only care whether an integer is positive, negative
or 0, we dont care about its actual value.)
This equivalence could arise when an algorithm branches:
if n < 0 then do .....
if n > 0 then do .....
if n = 0 then do .....
The relation R divides Z up into three subsets:
K = {x | x P}, {0}, P.
These three sets are pairwise disjoint, and their union is the whole of Z, thus they form a
partition of Z.
Denition Let A be any non-empty set, let a be an element of A, and let R be an
equivalence relation on A. The equivalence class of a is a subset of A which contains all
the elements which are related to a under R.
In the above example, P is the equivalence class of 1; it is the set of all elements which
have the same sign as 1.
We use the notation [a] for the equivalence class of a. So
[a] = {b A | (a, b) R}.
Example Dene a relation R on Z by saying that nRm if n m is divisible by 4. Show
that R is an equivalence relation, and nd all the equivalence classes relative to R.
Solution
n n = 0 which is divisible by 4, so nRn for all n Z and R is reexive.
mn = (nm), so if nm is divisible by 4 then so is mn. Thus nRm implies mRn,
and R is symmetric.
If n m and mr are divisible by 4 then n m = 4k and mr = 4h, for some integers
k, h. Then n r = n m+(mr) = 4(k +h). So n r is divisible by 4, and nRr. Thus
R is transitive.
So R is an equivalence relation on Z.
Every integer can be written in one of the following four forms:
4h, 4h + 1, 4h + 2, 4h + 3.
If n and m are both of the same form then n m is divisible by 4. For example, if
n = 4h + 1 and m = 4k + 1 then n m = 4(h k).
If n and m are of dierent forms then nm is not divisible by 4. For example, if n = 4h+1
and m = 4k + 2 then n m = 4(h k) 1.
So the equivalence classes are the subsets which contain elements of the same form:
{. . . , 8, 4, 0, 4, 8, 12, . . .} = {4h | h Z} = [0]
{. . . , 7, 3, 1, 5, 9, 13, . . .} = {4h + 1 | h Z} = [1]
Orderings 29
{. . . , 6, 2, 2, 6, 10, 14, . . .} = {4h + 2 | h Z} = [2]
{. . . , 5, 1, 3, 7, 11, 15, . . .} = {4h + 3 | h Z} = [3]
Every element of Z is in one of the equivalence classes, and these classes are disjoint.
Again, we have a partition of the set.
The following theorem shows that there is a direct correspondence between equivalence
relations and partitions of a set.
Theorem 1
(i) If R is an equivalence relation on the set A then the equivalence classes relative to R
form a partition of A.
(ii) If A
1
, A
2
, ... is a partition of the set A, and if
R = {(a, b) | for some i, (a A
i
and b A
i
)}
then R is an equivalence relation on A.
3.9 Orderings
If we order a set of data then it usually becomes easier to use. For example, sorting a
set of words into alphabetical order makes searching for a particular word much easier.
Ordering student marks into ascending order makes it easier to see how many have more
than 70%. Having an order relation on a set of data usually makes programs which use
that data more ecient. Orderings are a special kind of relation on a set.
Some relations are the opposite of reexive or symmetric. For example, for the relation <
on N we have n < n, for all n, and if n < m then m < n.
Denition A relation R on A is irreexive if there are no elements a A such that
(a, a) R.
The relation R = {(n, m) | n%3 < m%3} on S
4
is irreexive. The graphical represen-
tation is
t t
t
1 2
4
t
3

and there are no edges from a node to itself.


It is possible for a relation to be neither reexive or irreexive. For example, the
relation on {a, b, c} whose graph is
t t
a
b
t
c

-
Denition A relation R on A is antisymmetric if whenever both (a, b) and (b, a) are in R
then a = b.
30 RELATIONS
For example, the relation on N is antisymmetric. For any elements n, m of N, if n m
and m n then n = m. (This relation is also reexive.)
Denition A total ordering on a set A is a relation R which is antisymmetric and tran-
sitive, and which has the property that for every a, b A either a = b, or aRb or bRa.
The relations < and are both total orderings on Z.
The usual alphabetic ordering that you see in a dictionary is a total ordering on the set
of words in the dictionary.
Most of the total orderings that are used are either reexive or irreexive. The ordering
is reexive, and < and > are irreexive.
3.10 Lexicographic orderings
We can order pairs of integers by comparing rst on the left element and then on the right
element. So for example, we have (1, 7) < (2, 4) and (4, 5) < (4, 7) etc. The general rule
which denes this ordering is that
(n, m) < (s, r) if (n < s) or if (n = s and m < r).
Using the predicate notation for dening a relation we have
R = {((n, m), (s, r)) | n < r or ((n = r) (m < s))}.
This is an example of a lexicographic ordering. If we have any set A and a total ordering
R on A, then we can use R to dene an ordering on the cartesian product A
2
. We dene
the new ordering R

using the rule


(a, b) R

(c, d) if aRc or a = c and bRd.


We call R

the lexicographic ordering generated by R, or the lexicographic extension of R.


We can dene a lexicographic extension of R on A
n
for any n, so we have either xR

y or
yR

x or x = y for all x, y A
n
. The rule is always the same: compare the rst (left most)
elements of x and y. If these are the same then compare their second elements. If these
are also the same then compare their third elements, and so on.
For example, suppose that B = {0, 1} and < is dened on B in the usual way so that
0 < 1. The lexicographic extension of < to B
4
gives the following relations:
(0,0,0,0) < (1,0,0,0) (0,0,0,0) < (0,1,0,1)
(0,0,0,0) < (0,0,1,0) (0,1,0,0) < (0,1,1,0)
(1,0,0,0) < (1,1,0,0) (1,0,0,1) < (1,1,0,0)
(These are not the only relations, for example
(b
1
, b
2
, 0, b
4
) < (1, 1, 1, 1)
for every possible value of b
1
, b
2
and b
4
.)
3.11 Exercises
1. Let A = S
4
and let R = {(1, 2), (1, 4), (2, 1), (2, 2), (2, 4), (3, 3), (4, 2)}. Give the graph-
ical and matrix representation of R. Write down all the elements which are related to 2
and all the elements to which 4 is related.
Exercises 31
2. Write out all the possible relations between {a} and {A, B, C}. How many relations
are there between S
8
and {a, b, c, d}?
3. The following is a graphical representation of a relation R. Write out all the ordered
pairs in R.
t
t
t
t
t
t

t
a
b
c
d
1
2
3

4. Let R = {(s, t) | s S
7
, t S
4
, s%3 = t%3}. Give the graphical representation of R.
5. Draw the graphical representation of on S
6
.
6. Let R be the relation dened on S
4
by the rule that nRm if n = m. Give the ordered
pair, graphical and matrix representations of R and R.
7. Let B
4
be the set of bit strings of length 4. Dene a relation R on B
4
by setting aRb
if a and b both have the same number of 1s. Show that R is an equivalence relation, and
write out the equivalence classes.
8. Let S = {a, b, c, d} and suppose that subsets of S are represented as bit strings from
B
4
, so b
A
represents the subset A. Let R be the relation on B
4
dened in Question 1, and
dene a relation R

on P(S) as follows:
R

= {(A, B) | (b
A
, b
B
) R}.
Show that R

is an equivalence relation on S, and describe in words the condition for


two sets to be equivalent relative to R

.
9. Let R = {(s, t) | s, t S
5
, s%3 = t%3}. Write out the elements of the inverse relation
R
1
and the complement relation R.
Give the graphical and matrix representations of R, R
1
and R. What is the relation-
ship between the three graphs, and what is the relationship between the three matrices?
Chapter 4
Functions And Cardinality
4.1 Functions
A function is a relation between two sets, the domain and the codomain, which has
the property that every element in the domain is related to exactly one element in the
codomain.
For example, let M be the set of mothers of children at a pre-school, and let C be the
set of children. We can dene the relation mother by setting
R = {(c, m) | c C, m M, c has mother m}.
Each child has only one mother, so there are not two relations cRm
1
and cRm
2
with
m
1
= m
2
. Graphically,
t
t
t
t
t
t
t
t

t
t
t
t
t

>
>
>
>
>
>
>>
>
>
>
>
>
>
>>

C M
There is only one arrow from each element in C.
Consider the relation R S
7
S
3
dened by nRm if and only if n m modulo 3.
R = {(n, m) | n S
7
, m S
3
, n%3 = m%3}
= {(1, 1), (2, 2), (3, 3), (4, 1), (5, 2), (6, 3), (7, 1)}
Each element of S
7
appears in only one relation (n, m) R.
t
t
t
t
t
t
t
1
2
3
4
1
2
3
t
t
t
5
6
7
-
-
-
-
-
-
--

.
.
.
.
.
.
.
.

34 FUNCTIONS AND CARDINALITY


Denition A function, f A B is a relation between A and B such that
1. for each a A there is some b B such that (a, b) f,
2. for each a A, if (a, b
1
) f and (a, b
2
) f then b
1
= b
2
.
So for each a A there is only one b B such that (a, b) f. Graphically, there is
exactly one arrow from each element of A.
t
t
t
t
t
t
t
t

t
A B

When a relation is a function we use functional notation rather than ordered pairs.
So we write f(a) = b rather than (a, b) f. The following is the denition of a function
written in functional notation.
Denition A function, f : A B is a relation between A and B such that
1. for each a A there is some b B such that f(a) = b,
2. for each a A, if f(a) = b
1
and f(a) = b
2
then b
1
= b
2
.
Many of the familiar functions from calculus are functions from R R. For example,
f(x) = x
2
, f(x) = 5x +, f(x) = cos(x) are all functions from R to R.
A bit of care is needed here though. The map f(x) = x
1
is not a function from R to R
because it is not dened on 0.
Exercise Find a set A such that f(x) = x
1
is a function from A to Z.
Denitions We call f(a) the image of a under f.
If f is a function from A to B, we write f : A B, and we call A the domain of f.
We write f(A) = {f(a) | a A}. So f(A) is the set of all images of elements of A
under f. We have f(A) B, and we call f(A) the image of A in B.
We call B the codomain of f.
Example Let A = S
4
and let B = S
5
. Decide which of the following relations are
functions from A to B. For those which are functions, write down their domain and
image.
R
1
= {(n, m) | n S
4
, m S
5
, m = n + 1}
R
2
= {(n, m) | n S
4
, m S
5
, m%3 = n%3}
R
3
= {(1, 1), (2, 1), (3, 1), (4, 1)}
R
4
= {(1, 1), (2, 1)}
Solution
R
1
: For each n S
4
, we have (n + 1) S
5
and (n, n + 1) R
1
. So there is at least one
m S
5
such that (n, m) R
1
.
If (n, m), (n, r) R
1
then m = n + 1 = r, so m = r. Thus there is exactly one m S
5
such that (n, m) R
1
. Thus R
1
is a function.
Call this function f
1
, so f
1
(n) = n + 1 and (n, f
1
(n)) R
1
.
Inverse functions 35
The domain of f
1
is S
4
.
The image of f
1
is {f
1
(n) | n S
4
} = {2, 3, 4, 5}.
R
2
: 1%3 = 1%3 and 1%3 = 4%3 so (1, 1), (1, 4) R
2
. So R
2
is not a function.
R
3
: If n S
4
then (n, 1) R
3
. So there is at least one m S
5
such that (n, m) R
3
.
Since all the elements of R
3
are of the form (n, 1), there is only one relation for each n.
Thus R
3
is a function.
Call this function f
3
, so f
3
(n) = 1 for all n S
4
.
The domain of f
3
is S
4
.
The image of f
1
is {1}.
R
4
: There is no relation containing 4 in R
4
, i.e. (4, m) R
4
. So R
4
is not a function.
Denition Two functions f, g are equal if and only if they have the same domain, the
same codomain, and for all x in the domain, f(x) = g(x).
The functions f : Z Z and g : Z Z given by f(x) = 2x and g(x) = x +x are equal.
Is the function k : Z Z given by k(x) = x(x + 2) x
2
equal to f?
Careful: the function h : N Z given by h(x) = 2x is not equal to f and g, because it
has a dierent domain.
4.2 Inverse functions
Every function has a relation which is its inverse. However, it is not always the case that
the inverse of a function is a function.
The function f : Z N given by f(n) = |n| is a function. Its inverse relation is R where
R = {(m, n) | m N, n Z, |n| = m}.
Exercise Why isnt R a function?
Denition Functions whose inverse relation is also a function are called invertible. We
write f
1
for the inverse of f.
Invertible functions are important because they can be undone. We have f
1
(f(x)) = x.
4.3 Error detection in bit strings
When data is transmitted it is possible that errors can be introduced. It its simplest form
we can imagine transmitting sequences of bit strings. Occasionally, one or more of the
entries will become corrupted.
For example, (1, 0, 0, 1, 1, 0, 0, 1) is sent and (1, 0, 1, 1, 1, 0, 0, 1) is received. How can
the receiver know that there has been an error, and ask for the data to be resent? The
theory of error correcting codes is well advanced, and is based on vector spaces. We just
discuss a simple method which will tell the receiver if there has been one error in the
transmission.
Instead of sending a string of length 8 we send a string of length 9, where the last digit
on the string is the sum of the previous 8 digits modulo 2.
If we wish to send (b
1
, b
2
, . . . , b
n
) we actually send (b
1
, b
2
, . . . , b
n
, b
n+1
) where
b
n+1
=
_
0, if b
1
+b
2
+. . . +b
n
is even,
1, if b
1
+b
2
+. . . +b
n
is odd.
36 FUNCTIONS AND CARDINALITY
Thus we send (1, 0, 0, 1, 1, 0, 0, 1, 0) instead of (1, 0, 0, 1, 1, 0, 0, 1).
The receiver checks that the last digit is correct. If not then an error has been intro-
duced and we request that the string is sent again. (This method will detect that there has
been one error, but not if there have been two errors, which requires more sophisticated
methods.)
If we receive (1, 0, 1, 1, 1, 0, 0, 1, 0) then we know there has been an error, if we receive
(1, 0, 1, 0, 1, 0, 0, 1, 0) then we cant tell that there are errors.
What has this to do with invertible functions? We need a function to convert the message
into a coded one, and an inverse function to decode the message at the other end.
For the error detecting method above we have a function f : B
n
A, where A is a
subset of B
n+1
, and B = {0, 1}. We have
A = {(b
1
, b
2
, . . . , b
n
, b
n+1
) | b
i
B, b
n+1
= (b
1
+. . . +b
n
)%2}
and f(b
1
, b
2
, . . . , b
n
) = (b
1
, b
2
, . . . , b
n
, b
n+1
). To decode the message, if no errors have been
detected, we apply f
1
,
f
1
(b
1
, b
2
, . . . , b
n
, b
n+1
) = (b
1
, b
2
, . . . , b
n
).
This is a simple example, but to get better error detection, and even error correction, we
use more complicated invertible functions.
4.4 Injective and surjective functions
In the relation R
3
of the example above, we have that
f
3
(1) = 1 = f
3
(2).
So knowing f
3
(n) is not enough for us to be able to tell what n is.
The relation R
1
above has the property that if f
1
(n) = f
1
(m) then n = m. In this case,
if we know f
1
(n) then we can work out n.
Denition A function f : A B is injective if f(a
1
) = f(a
2
) implies that a
1
= a
2
. We
also say that f is one-to-one, or 1-1.
Graphically: no element in B has two arrows going to it.
The function f : {a, b, c} S
6
given by f(a) = 2, f(b) = 4, f(c) = 6 is injective.
All the linear functions f : Z Z are injective:
f(x) = 2x
f(x) = 7x + 4
f(x) = x etc.
Question: is f(x) = x
2
+ 2 an injective function?
In the relation R
3
above there is no element n S
4
such that f
3
(n) = 2, but 2 S
4
= B.
Denition A function f : A B is surjective if for every b B there exists some a A
such that f(a) = b. We also say that f is onto.
Graphically: a function is surjective if every node in B has some arrow going to it.
The function f : S
6
{a, b, c} given by f(1) = f(2) = a, f(3) = f(4) = b,
f(5) = f(6) = c is surjective.
Neither of the functions R
1
, R
3
from S
4
to S
5
are surjective. In fact there are no surjective
functions f : S
4
S
5
. Similarly, there are no injective functions f : S
5
S
4
. Both of
these facts are consequences of Theorem 1, which is given below.
Finite cardinality and bijections 37
Denition A function f : A B which is both injective and surjective is called a
bijection.
Examples
1. f : Z Z, f(n) = n + 1, is a bijection.
2. g : P N, g(n) = n 1, is a bijection.
3. h : S
4
{a, b, c, d}, h(1) = a, h(2) = b, h(3) = c, h(4) = d, is a bijection.
Theorem 1 A function f : A B is invertible if and only if it is a bijection.
4.5 Finite cardinality and bijections
The following theorem shows, for nite sets, the relationships between injective and sur-
jective functions and cardinality
Theorem 2 Let A, B be nite sets, and let f : A B be a function.
1. |f(A)| |A|.
2. If |f(A)| = |A| then f is injective.
3. If f is injective then |f(A)| = |A| and |A| |B|.
4. If f is surjective then f(A) = B and |A| |B|.
5. If f is a bijection then |A| = |B|.
Theorem 3 (The Pigeonhole Principle) Let A and B be nite sets.
(1) If |A| = |B| and f is injective then f is surjective.
(2) If |A| = |B| and f is surjective then f is injective.
4.6 Cardinality of innite sets
If A is a nite set then |A| is the number of elements in A. What if A is innite? Some
innite sets are larger than others. For example, the size of the natural numbers is
less than the size of the reals. However, the size of N is the same as the size of Z
and the size of Q. Furthermore, the size of R is the same as the size of the interval
(0, 1) = {r R | 0 < r < 1}.
Let A and B be nite sets,
A = {a
1
, a
2
, . . . , a
m
} and B = {b
1
, b
2
, . . . , b
n
}
If m n then the map f : A B given by f(a
i
) = b
i
, for 1 i m, is injective. If
n m then any map f : A B such that f(a
i
) = b
i
, for 1 i n, is surjective.
Recall from Theorem 2, that for nite sets A, B, if there is a bijection f : A B then
|A| = |B|.
We use these ideas to compare |A| and |B| when these sets are innite.
Denition Let A, B be any sets.
38 FUNCTIONS AND CARDINALITY
1. We say |A| |B| if there exists an injective function f : A B.
2. We say |A| |B| if there exists a surjective function f : A B.
Note: if A is nite and B is innite then there is an injective map f : A B, but there
is no injective map g : B A. So |A| < |B|.
Thus we have that |A| = |B| if there exists a bijection f : A B, and we say that A and
B have the same cardinality.
Example The map f : Z N given by the rules
f(0) = 0, f(n) = 2n, f(n) = 2n 1, for n P,
is a bijection. Thus |Z| = |N|.
4.7 The Inclusion-Exclusion principle
The following result is true, but we wont prove it:
|A B| = |A| +|B| |A B|.
picture
A B

>
>
>
>
>
>
>
>
.
.

.
.
.
.
`
` `
`
`
`
`
`
`
`
`
`
`
`
`
`

A B

>
>
>
>
>
>
>
>
.
.

.
.
.
.
`
` `
`
`
`
`
`
`
`
`
`
`
`
`
`

AB

Exercise Is it true that |A\B| = |A| |B|?


Is it true that |A\(A B)| = |A| |(A B)|?
4.8 Countably innite sets
Denition A set A is said to be countably innite if |A| = |N|.
The map f : N R given by f(n) = n is injective, so |N| |R|. However, the converse is
not true.
Theorem 4 R is not countably innite, so |N| < |R|.
Proof We suppose for contradiction that |N| |R|. Then there exists a surjective function
f : N R. We write each element f(n) as a decimal:
f(0) = k
0
.a
00
a
01
a
02
a
03
a
04
. . .
f(1) = k
1
.a
10
a
11
a
12
a
13
a
14
. . .
f(2) = k
2
.a
20
a
21
a
22
a
23
a
24
. . .
f(3) = k
3
.a
30
a
31
a
32
a
33
a
34
. . .
f(4) = k
4
.a
40
a
41
a
42
a
43
a
44
. . .
.
.
.
Now dene digits b
0
, b
1
, b
2
, b
3
, . . . by the rule
Countably innite sets 39
b
i
=
_
1, if a
ii
= 1
2, if a
ii
= 1
So if f(0) = 34.345124 . . . then b
0
= 1, if f(1) = 4.115198 . . . then b
1
= 2, etc.
So we have b
i
= a
ii
, for all i N.
Now let r = 0.b
0
b
1
b
2
b
3
b
4
. . ..
We have that r R but r = f(0), since b
0
= a
00
, r = f(1), since b
1
= a
11
, etc. In general,
we have
r = f(i), for any i N since r
i
= a
ii
.
Thus we have that f is not surjective. This is a contradiction, so we have that |R| |N|.
Hence, |R| = |N|, and R is not countably innite.
The method we have used to prove the last result is called a diagonalization argument.
We listed all the numbers, then we took the digits from the diagonal and used them to
construct a new number not on the list.
We can list all the rational numbers by writing them out as follows: on the rst line
we write the natural numbers, on the second line we write the negative integers, on the
third line we write the positive fractions whose denominator is 2, on the fourth line we
write the negative fractions whose denominator is 2, on the fth line we write the positive
fractions whose denominator is 3, on the sixth line we write the negative fractions whose
denominator is 3, and so on.
0 1 2 3 4 5 6 7
. . .
1 2 3 4 5 6 7
. . .
1
2
2
2
3
2
4
2
5
2
6
2
7
2
. . .

1
2

2
2

3
2

4
2

5
2

6
2

7
2
. . .

1
3
2
3
3
3
4
3
5
3
6
3
7
3
. . .

1
3

2
3

3
3

4
3

5
3

6
3

7
3
. . .

1
4
2
4
3
4
4
4
5
4
6
4
7
4
. . .

1
4

2
4

3
4

4
4

5
4

6
4

7
4
. . .
.
.
.
We can use this array to write the rationals in a list, moving along the rows and down the
columns following the arrows.
We dene a function f : N Q using the arrows. So we have
f(0) = 0, f(1) = 1, f(2) = 2, f(3) = 1, f(4) =
1
2
, f(5) = 2, f(6) = 3,
f(7) = 4, f(8) = 3, f(9) =
2
2
, f(10) =
1
2
, f(11) =
1
3
, . . . .
40 FUNCTIONS AND CARDINALITY
It can be proved that f is surjective, so we have that |N| |Q|.
The function f is not injective; for example, f(1) = 1 = f(9). However, the function
g : N Q dened by the rule g(n) = n, for all n N is injective.
So we have that |N| |Q|.
This two calculations together prove the following theorem:
Theorem 5 The rationals are countably innite, i.e. |Q| = |N|.
4.9 Integer functions and orderings
Any function from a set A to the set of natural numbers corresponds to a total reexive
ordering on the set.
For f : A N we dene R
f,<
by setting aR
f,<
b if f(a) < f(b).
We can dene a reexive version of this ordering R
f,
by setting aR
f,
b if f(a) f(b).
The process of using a function onto the integers to dene an ordering on a set is often
referred to as indexing assigning integers to objects.
If f is surjective then we can use f as the basis for proofs by induction on the set A.
We prove the result for a A such that f(a) = 0, then we assume that the result is true
for a A such that f(a) n to prove the result for b A such that f(b) = n + 1.
4.10 Data compression
We can also use orderings, and in particular integer functions, to compress data so that
it can be electronically transmitted more quickly and stored in less space.
Words consist of strings of letters, and when a document is electronically transmitted
each of the letters in each of the words must be transmitted. If we can code the words so
that the coded word contains fewer characters than the letters which make up the word
then transmitting and storing the text can be more ecient.
A simple coding is to assign an integer to each word, and then, if there are not too
many dierent words, the integer version of the document will be smaller than the original
version. For example, if there are only 999 words all of length 4 then we will save between
3 and 1 characters per word when we represent the words as integers.
It is not just the length of the words which is important but also their frequency. For
example, if a word of length three occurs twenty times then replacing it with the integer
1 saves 40 characters, and replacing it with the integer 12 saves 20 characters. Whereas
if a word of length ten occurs only once, then replacing it with 1 saves 9 characters while
replacing it with 12 saves 8 characters. So it is better to replace the word of length three
with 1 and the word of length ten with 12, rather than the other way round.
Thus we do not want to pre-assign integers to words, we dont want to use a xed
coding. The assignment of integers to words is done separately for each le which is
compressed.
The basis of many common compression algorithms is to order the words of the docu-
ment according to the number of times the word appears in the document. The front of
the compressed document then contains a list of all the words in the document, and the
order of this list denes the integers which are used to represent the words.
In this set-up all the words in the document are stored in their full form once, so the
compressed le is only smaller than the original le if some of the words are repeated.
Exercises 41
Although that this is usually the case, it is possible for a compressed le to be larger
than the uncompressed version.
Better compression can be achieved by indexing the strings of characters which appear
in a le, rather than using the words (that is the spaces) to determine which strings will
be represented as an integer. For more on the topic of compression you should look up
Human coding and LZ compression on the web.
4.11 Exercises
1. Prove that the relation R = {(x, 2
x
) | x N} is a function f : N N. Decide whether
it is injective, surjective, and/or a bijection.
2. Let f : A B be an invertible function. Prove that f is a bijection.
3. Find the total number of functions from S
2
to {a, b, c}.
4. Let R = {(1, 2), (2, 3), (3, 1)} be a relation on S
3
. Show that R is a function f, that f
is invertible, and give the graphical representation of f
1
.
5. Prove that |NN| = |N|. (Hint: write the elements of NN in a 2-dimensional array,
as we did with the rationals to show that they are countably innite.)
6. Let K = {n | n N}, prove that K is countably innite.
7. Let A = {a
1
, a
2
, . . . , a
n
} be a nite set, and suppose that a
i
= a
j
if and only if i = j.
Prove that |A| |N|.
8. Prove that the map f : Z N given by the rules
f(0) = 0, f(n) = 2n, f(n) = 2n + 1, for n P,
is a bijection.
Chapter 5
Recursion And Induction
Quantities have to be dened before they can be used, particularly before a computer can
use them. For example, before proving that n! > 2
n
we need to know what n! and 2
n
are.
If there are only nitely many elements that can appear then we can, at least in
principle, write them out. This is the core of a data base, all the elements are stored and
when information is required the corresponding entry is looked up in the data base.
However, data bases can be very large and for innite sets they cannot be constructed.
An alternative approach is to give a basic set of data and a set of rules for calculating the
other data elements. If the data set is innite and the set of rules is to be nite then the
rules must in some sense be recursive. In this chapter we shall look at simple recursive
functions, and the related method of proof known as induction.
5.1 Recursive denitions
We often say
n! = n (n 1) (n 2) . . . 2 1, and
2
n
= 2 2 2 . . . 2, n times.
This works well for humans, but how do we dene n! and 2
n
in a computer language?
We state the value of the rst case, and we give an algorithm for calculating the (n +1)st
case in terms of the previous n cases.
For example, we dene n! for n P by declaring
1! = 1
(n + 1)! = (n + 1) n!
So, 1! = 1, 2! = 2 1! = 2 1 = 2, 3! = 3 2! = 3 2 = 6, 4! = 4 3! = 4 6 = 24
By convention we also dene 0! = 1.
We think of factorial as a function from P to P, and write an algorithm to implement it:
if n==1 then factorial(n) = 1
else
factorial(n) = n * factorial(n-1)
This algorithm is recursive because it calls itself. It does not go on calling itself for ever
though, because eventually it calls factorial(1) which is equal to 1.
Denition A denition of a function which uses its own denition on a smaller value is
called a recursive or inductive denition.
44 RECURSION AND INDUCTION
For example, a recursive denition of 2
n
is
2
0
= 1
for n 1, 2
n
= 2
n1
2.
The corresponding recursive algorithm is
if n==0 then exponent 2(n) = 1
else
exponent 2(n) = exponent 2(n-1) * 2
5.2 Recurrence relations
Sometimes the denition of an element may depend on several smaller elements. For
example, we may wish to describe a function by saying how to calculate it on n in terms
of its value on n 1 and n 2.
f(n) = f(n 1) +f(n 2)
Then if we know f(0) and f(1) we can calculate f(n) for all n N.
Denition A recurrence relation is an equation which denes the nth element of a se-
quence in terms of the preceding (n 1) values.
The sequence obtained by adding together the previous two values turns up in many
dierent areas, and has been given a name: the Fibonacci sequence. It is the sequence
fib(0), fib(1), fib(2), fib(3), ... where
fib(0) = 1, fib(1) = 1, and for n 2, fib(n) = fib(n 1) +fib(n 2).
So the sequence begins
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, ...
A recursive algorithm for calculating fib(n), the nth Fibonacci number is
if (n==0) or (n==1) then fib(n) = 1
else
fib(n) = fib(n-1) + fib(n-2)
5.3 Recursion and iteration
It makes sense to dene quantities such as n! and the Fibonacci numbers recursively, but
that is not always the best way to calculate them.
If we calculate factorial(5) we need to nd factorial(4), and before that factorial(3)
and factorial(2). This way we would have to do quite a bit of back substitution.
If we were to calculate 5! without the recursive denition we would start at 2 and
multiply up until we reached 5, so 2 3 = 6, 6 4 = 24, 5 24 = 120.
The same is true for recurrence denitions. If we have G(2) = 2, G(3) = 1, G(n) =
2G(n 2) + 3G(n 1) and we want to nd G(6) we dont start with G(6) and back
substitute. We have
G(4) = 2 2 + 3 1 = 1, G(5) = 2 1 + 3 1 = 1, G(6) = 2 1 + 3 1 = 5
Recursion is natural for dening quantities, but it also naturally leads us to calculate
values from the top down, using back substitution.
Calculation from the bottom up is familiar to Computer Scientists, because program-
ming languages usually support iteration in the form of for and while loops.
The towers of Hanoi 45
int fac_n = 1
for i == 2 to 5 do
fac_n = fac_n * i
int G_a = 2, G_b = -1
int i = 4
while i <= 6 do
G_n = 2*G_a +3*G_b
G_a = G_b
G_b = G_n
i = i+1
We could use iteration to dene a quantity, so
int pow2_n = 1
for i == 1 to n do
pow2_n = pow2_n * 2
but it is hard to prove properties of the quantity. For example, with the iterative denition
of 2
n
it is not clear how we could prove 2
r
2
m
= 2
r+m
, we need to reason about the
behaviour of a program. With a recursive denition we can often use a proof by induction.
5.4 The towers of Hanoi
There is a well known puzzle (the towers of Hanoi Problem) which consists of three poles
and a set of discs of dierent sizes. The discs are put on one pole in decreasing size order.
A
B
C
The challenge is to move all the discs from one pole to another pole, without ever putting
a larger disc on top of a smaller one, so that all the discs end up in the correct order on
the other pole.
Can we prove that the challenge can be completed for any number of discs?
Well, if there is only one disc then we may move it where we want. So the challenge is
easily met.
A
B
C
46 RECURSION AND INDUCTION
Suppose that there are n + 1 discs, where n is a natural number and n 1, and suppose
that the challenge can be completed when there are n discs. So we assume that we can
move the top n discs to either of the other two poles in the permitted manner.
If we wish to move all the discs to pole C, then we rst move the top n discs to pole B.
A
B
C
We then move the largest disc to the remaining pole, C.
We have assumed that we can move the n smaller discs to any pole from any other pole,
so we can move them onto pole C. This completes the challenge.
A
B
C
What have we done? We have shown that the challenge can be met for 1 disc, and we
have shown that if the challenge can be met for n discs, then it can be met for n+1 discs.
If we are given any positive integer k then this argument shows that the challenge can be
completed for k discs. Thus we have shown that it can be done for all integers n 1.
This method of proving a result for all integers in a given range is called induction.
5.5 The General Principle Of Induction
To prove a result, P(n), for all integers n k (where k Z), it is sucient to
1. prove P(n) when n = k (so prove P(k)), and
2. prove that if P(n) is true then P(n + 1) is true, for all n Z such that n k
(1) is called the base case, (2) is called the inductive step, and P(n) is called the induction
hypothesis (often written I.H.).
Example Prove that for n N and n 5, 2
n
> n
2
.
Induction and recursion 47
Solution
Base case: When n = 5 we have 2
n
= 2
5
= 32 > 25 = 5
2
= n
2
. Thus the base case is
proved.
Inductive step: Suppose that P(n) is true, and that n 5. Then we have
2
n+1
= 2
n
2
> n
2
2, ( by induction hypothesis)
= n
2
+n
2
.
(n + 1)
2
= n
2
+ 2n + 1.
Since n 5, we have
n
2
5n = 2n + 3n 2n + 15 > 2n + 1.
So, 2
n+1
> n
2
+n
2
> n
2
+ 2n + 1 = (n + 1)
2
.
This proves the inductive step. So the result follows by induction.
Example Prove, by induction, that for n P
S(n) =
n

i=0
i = 1 + 2 +. . . +n =
n(n + 1)
2
.
Solution
We dene S(n) inductively, S(1) = 1, S(n) = S(n 1) +n.
Base case: When n = 1,
n(n + 1)
2
=
2
2
= 1 = S(1).
Inductive step: Suppose
S(n) =
n(n + 1)
2
, and n 1.
Then we have
S(n + 1) = S(n) + (n + 1) =
n(n + 1)
2
+ (n + 1), (by induction hypothesis)
=
n
2
+n + 2n + 2
2
=
(n + 1)(n + 2)
2
Thus the result follows by induction.
Exercise Prove by induction that for n N, if n 4 then n! > 2
n
.
5.6 Induction and recursion
It is natural to use induction arguments to prove results about things dened using recur-
sion or recurrence relations.
Example Prove that for n, m N, a
n
a
m
= a
n+m
.
Solution We have a
0
= 1, and for n 1, a
n
= a
n1
a. We need to use the fact that,
because multiplication is commutative, for all n N, we have a
n
a = a a
n
.
48 RECURSION AND INDUCTION
We prove the result by induction on n.
Base case: If n = 0 then a
0
a
m
= 1 a
m
= a
m
= a
0+m
.
Inductive step: Suppose that the result is true for n, and n 0. Then
a
n+1
a
m
= a
n
a a
m
(by denition of a
n+1
)
= a
n
a
m
a
= a
n
a
m+1
(by denition of a
m+1
)
= a
n+m+1
(by the I.H. )
= a
(n+1)+m
Example Suppose that G(0) = 1, G(1) = 3 and that for n 2, G(n) = 2G(n 1) +
3G(n 2) + 4. Prove that, for all integers n, G(n) is an odd number.
Solution
Base case: If n = 0 then G(n) = 1, which is an odd number, and if n = 1, G(1) = 3 which
is an odd number.
Inductive step: Suppose that the result is true for all k such that k n, and that n 1.
Then
G(n + 1) = 2G(n) + 3G(n 1) + 4 by denition of G(n + 1)
By the induction hypothesis we have that G(n 1) is an odd number, so 3G(n 1) is an
odd number. Since
G(n + 1) = 2G(n) + 3G(n 1) + 4 = 2(G(n) + 2) + 3G(n 1)
we have that G(n + 1) is an odd number, as required.
5.7 Exercises
1. Give a proof by induction that n
2
> 3n, for all n N such that n 4.
2. Give a recursive denition of S(n), the sum of all the integers from 1 to n, where
n P. Write a recursive algorithm to calculate S(n).
3. Write down T(6) if T(1) = 1 and for n 2, T(n) is given by the recurrence relation
T(n) = 2T(n 1) + 1.
4. Let n be a natural number. Give a proof by induction of the proposition
(for all n N)(n
2
+ 7 > 3n).
5. Let n be a natural number. Give a proof by induction of the proposition
(for all n N)(n
2
+ 7 > 5n).
6. Give the recurrence relation which denes the (innite) sequence
0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5....
7. Write down T(7) if T(1) = 1, T(2) = 1 and for n 3, T(n) is given by the recurrence
relation
T(n) = 3T(n 1) 2T(n 2).
8. For n N and a R we use the following recurrence relation to dene a
n
: a
0
= 1, and
for n 1, a
n
= a
n1
a. Prove by induction on n, that for all n, m N, a
n
a
m
= a
m
a
n
.
(Note: you cannot use the fact that a
n
a
m
= a
n+m
because we used the property
a
n
a
m
= a
m
a
n
to prove this fact.)
Chapter 6
Graphs And Trees
In the Introduction we saw an example of the use of graphs to help solve a problem.
The use of graphs for problem solving began in the eighteenth century, when Euler was
trying to solve the K onigsberg bridge problem. This problem concerns seven bridges which
connect two islands in a river to the mainland. The bridges and islands are arranged as
in the following picture.
The problem is to nd a path (beginning and ending anywhere you like) which crosses
each bridge exactly once.
Challenge Find such a path.
The method that Euler used to tackle this problem was to represent it as a graph. Then
he developed graph theoretical results which he could apply to the problem.
6.1 Introduction to graph theory
A graph is a set of vertices (or nodes) and a set of edges between the vertices. It is possible
for there to be more than one edge between two vertices. In this case we say that the
graph is a multigraph.
The following is a multigraph which represents the K onigsberg bridge situation. The
vertices N, S correspond to the north and south banks, and the vertices I, J correspond
to the two islands.
v
v
v v
N
J I
S
6.1.1 Walks, trails and paths
Denitions Two vertices in a graph are adjacent if there is an edge between them.
50 GRAPHS AND TREES
A walk from vertex N to vertex M is a sequence of vertices N = N
0
, N
1
, N
2
, . . . , N
k
= M,
with k 0, such that N
i
is adjacent to N
i+1
, for 0 i k.
Let G be the graph
v
v
v
v
v
v v
3
5 4
1
2
6
7
`
`
`
``
.
.
.
.
.
. .
`
`
`
``

Then 1,2,3,5 is a walk from 1 to 5. So is 1,3,4,4,3,5.


There is no walk from 1 to 6.
It is possible to get very long walks by moving backwards and forwards along the same
edge. For example, the walk 1,2,1,2,1,2,3 in the above graph
We often want to exclude such walks because they involve unnecessary redundancy.
Denitions A trail in a graph is a walk which does not use any edge more than once
(i.e. a walk in which no edge is repeated). A trail is also sometimes called a simple walk.
A path in a graph is a walk which does not use any vertex more than once. So any path
is also a trail.
Lemma 1 If there is a walk from N to M, we can nd a trail from N to M and if N = M
this trail can be taken to be a path.
Proof If a vertex is repeated in the walk then the walk is of the form
N, . . . , K, N
i
, N
i+1
, . . . , N
i
, H, . . . , M
In this case we remove everything after the rst N
i
and before H
N, . . . , K, N
i
, H, . . . , M
We still have a walk from N to M. We repeat these reductions until the walk contains
no repeated vertices apart from the possibility that N = M. Thus we have a trail, and a
path if N = M.
Denition We say that N, M are connected if there is a walk from N to M. So N is
connected to itself by the empty walk, where k = 0.
(A vertex is not necessarily adjacent to itself.)
Denition If every vertex is connected to every other vertex we say that the graph is
connected.
6.1.2 The Exmoor roads problem revisited
The Exmoor roads problem discussed in the Introduction could be stated as nd the
shortest connected graph between the vertices. Actually, we only wanted to use existing
roads, so we wanted the solutions to be made of existing edges.
Denition If a graph G is obtained by removing some of the edges and some of the
vertices from another graph, G

say, then G is a subgraph of G

.
Trees 51
v v
v v
v v
v
v v
v v v
3 3
5 4 4
1 1
2 2
6
7 7
`
`
`
``
`
`
`
``
.
.
.
.
.
. .
`
`
`
``

Both of these graphs are subgraphs of the above graph.


In the Exmoor roads problem we are not allowed to remove villages from the graph, just
roads. (We cant solve the problem just by closing the villages.)
Denition A subgraph of G is a spanning subgraph if it contains all the vertices of G.
The subgraph on the right above is a spanning subgraph, the subgraph on the left is not.
The Exmoor roads problem can now be stated precisely as follows:
Find the shortest connected spanning subgraph of the graph of existing roads.
6.2 Trees
A shortest connected spanning subgraph will always be a tree. Trees are an important
subclass of the class of graphs. In this section we dene trees and discuss some of their
properties.
In many graphs it is possible to start at some vertex and follow a walk which eventually
ends up back at the same vertex again.
For example, in the following graph there is a walk N
4
, N
1
, N
3
, N
4
.
v
v
v
v v
N1
N2 N3
N0
N4
`
`
`
``
.
.
.
.
.
. .
`
`
`
``

/
/
/
/
/
Denitions A loop is an edge from a vertex to itself.
A cycle in a graph is a non-empty trail N = N
0
, N
1
, . . . , N
k1
, N
k
= N from a vertex to
itself where N = N
0
, N
1
, . . . , N
k1
is a path. (Non-empty just means that k 1.)
Example In the above graph, N
0
, N
4
, N
1
, N
3
, N
4
, N
0
and N
1
, N
2
, N
3
, N
4
, N
4
, N
3
, N
1
and N
0
, N
1
, N
2
, N
3
, N
4
, N
0
are not cycles,
v v v
v v
v v v
v v v v v
N1 N1 N1
N2 N2 N3 N3 N3
N0 N0
N4 N4 N4
`
`
`
``
.
.
.
.
.
. .
`
`
`
``

.
.
.
.
.
. .
,
`
`
`
`

`
.
.
.
.
.
. .
,

`
`
`
` `
'
.
.
.
.
.
.

`
`
`
`
'
-
but N
0
, N
1
, N
2
, N
3
, N
4
, N
0
and N
0
, N
1
, N
4
, N
0
and N
4
, N
4
are cycles.
52 GRAPHS AND TREES
v v
v v
v v v
v v
N1 N1
N2 N3
N0 N0
N4 N4 N4

`
.
.
.
.
.
.

`
`
`
`
'
`
`
`
`
'
- -

A cycle which contains only one edge is a loop.


Denition A tree is a connected graph which has no cycles.
The graph on the left below is a tree but the graph on the right is not!
u
u
u
u
u
u
u
u
u
u u
u
u
u
u u
u
u u
u
u
u
u
u
u
`
`
`
`
.
.
.
.
. .
.
.
.
.
. .
`
`
`
`
`
`
`
`
`
`
`
`

'
'
'

\
\
\
\
\
\

>
>
>

`
`
`

'
'
'
'

`
`

Trees often occur as data structures in programs, as you will see in the programming
courses. In this course we just look at the basic properties of trees.
Although the rst three examples above were not cycles, they contained cycles. It is
always true that if there are two dierent trails between a pair of vertices then there is
a cycle in the graph. This is a fundamental property of trees, formalised in the following
lemma.
Lemma 2 If there are vertices N, M in G which have two dierent trails connecting them
then G has a cycle.
The next lemma, which can be proved by induction on the number of edges in the
graph, shows that in a connected graph the number of edges will dominate the number of
vertices. So when considering how large a graph might be for implementation purposes,
if it is connected we need to consider the number of edges not the number of vertices.
Lemma 3 Every connected graph G with n vertices has at least n 1 edges.
We now give several equivalent conditions for a tree to be a graph. To prove that
a graph is a tree it is sucient to show that any one of the last four conditions in the
following theorem holds.
Theorem 6 Let G be a graph with n vertices. The following properties are equivalent.
(1) G is a tree.
(2) There exists exactly one trail between any pair of vertices.
(3) G is connected and has exactly n 1 edges.
(4) G is connected and either n = 1 and G has no edges, or n 2 and removing any edge
in G makes the graph unconnected.
(5) G has no cycles, and adding an edge between any pair of non-adjacent edges N, M
creates a cycle containing N and M.
Directed graphs and rooted trees 53
The proofs that the other properties are equivalent to (1) are not given in these notes.
Some of the techniques required for these proofs form the basis for algorithms in programs
which manipulate trees, so it is worth looking up these proofs in one of the text books.
We rst met graphs when we were looking for shortest road systems. We can now
prove that the shortest connected spanning subgraph of any connected graph will always
be a tree. This allows us to make the algorithm for nding the shortest subsystem more
ecient.
Theorem 7 Let G be a connected graph. The shortest connected spanning subgraphs of
G (the ones with the least number of edges) are trees. Furthermore, if G has n vertices
then any shortest connected spanning subgraph has exactly (n 1) edges.
Proof: Let G

be a shortest spanning subgraph of G.


If n = 1, the graph with one vertex and no edges spans G. This is a tree.
Suppose that n 2. Since G

is a shortest connected subgraph, removing an edge


from G

makes the graph unconnected. Thus G

has property (4) of Theorem 6, and so G


is a tree. Then by property (3) of Theorem 6, G

has exactly (n 1) edges.


Finally, we return again to the Exmoor roads problem. There are 6 villages, so the
graph of existing roads has 6 vertices. Theorem 7 shows that the minimum road system
connecting the villages will have 5 roads. This justies the assumption that we made when
we discussed the algorithm for nding a minimum system, in the Introduction.
If the solution has less than 5 roads then it is not connected. If the solution has more
than 5 roads then one can be removed leaving the system still connected.
Example Find the shortest connected subgraph of the following graph.
v
v
v
v
v
v
v
v
D
F
C
H
G
E
A
B
`
`
`
``
.
.
.
.
.
. .
`
`
`
``/
/
/
/
/
`
`
`
`
``

7
1
1
2 4
5
3
3
2
5
4
1
It has 8 vertices, so we need 7 edges. The 7 shortest are 1,1,1,2,2,3,3 giving 13. This
gives the following graph.
u
u
u
u
u
u
u
u
D
F
C
H
G
E
A
B
`
`
`
`
.
.
.
.
. .
`
`
`
`
`
`
`
`
`

1
1
2
3
3
2 1
The next shortest length is 14. We can replace BF with FG to get a connected system of
length 14. This must be a shortest system.
6.3 Directed graphs and rooted trees
As we have already mentioned, searching data is an important part of computing, and it
can be very time consuming. Thus it is necessary to take steps to make searches ecient.
54 GRAPHS AND TREES
One method of organizing data so that it can be searched more eciently is to use a rooted
tree.
Many family medical books have a section which helps to diagnose illnesses from
given symptoms. There is an initial question, then subsequent questions which depend on
the answer to the rst question.
did the pain
is the ankle
are you
can you move
are the other
temperature temperature
is your is your
follow an injury
red and swollen
over 50
the ankle
joints aected
over 38C over 38C
>
>
>
>
> >.

No
Yes

`
`
`
`
`
`
`
`
No
No
No
No No
Yes
Yes
Yes
Yes Yes
cant make a
probably
probably probably
diagnosis
a strained
gout
rheumetoid
probably
probably
probably probably
osteoarthitis
a broken
a joint
rheumatic fever
see doctor
arthritis see doctor infection
ligament ankle
>
>
> >.
`
`
`
`

No
Yes
An alternative is, for each combination of symptoms, to list the diagnosis. Then search
the list until the matching symptoms are found. The list will have 8 entries, thus the
longest search will involve 8 items. With the tree method the maximum search has length
4 (the longest path in the tree).
The tree method of storage makes searching more ecient, and this eciency is greater
when the tree is bigger. The increased eciency is illustrated by the following party game.
One person asks another to think of a number n between 1 and 31. The rst person then
proceeds to try to guess n. The second person is required to respond to each guess with
either Yes, smaller, or larger, according to whether n is equal to, smaller than or
larger than the number guessed.
The rst person guarantees to guess n in at most 5 guesses.
is it 16
is it 8 is it 24
is it 4 is it 20 is it 12 is it 28
is it 2 is it 18 is it 10 is it 26 is it 6 is it 22 is it 14 is it 30
<
< <
< < < <
< < < < < < < <
>
> >
> > > >
> > > > > > > >
l l l l l l l l l l l l l l l l
3 11 19 27 1 9 17 25 5 13 21 29 7 15 23 31

>
>
>
>
>
>.

.
`
`
`
`
`
`
`
`

-
`
`
`
`
`
`
`
`
`
`
`
`

-

-

-

-

-

-

-

-
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
`
This uses a maximum of 5 tests instead of a possible 31.
Denition A tree which has a distinguished start vertex (called the root), such that
there is a path from the root to any other vertex, is called a rooted tree.
Return to the Konigsberg bridges 55
Rooted trees really belong to the family of directed graphs. We have already met
directed graphs as a method of representing relations on nite sets.
Denitions A directed graph or digraph is a set of vertices and a set of arrows (directed
edges) between pairs of vertices.
u
u
u
u
u
N3
N4
N2
N5
N1

.
`
`
`
`

A vertex N is adjacent to a vertex M if there is an arrow from N to M. We call N an


antecedent of M, and we call M a successor of N.
A walk in a digraph is a sequence
N = N
0
, N
1
, N
2
, . . . , N
k1
, N
k
= M,
where k 0 and for 0 i (k 1), N
i
is adjacent to N
i+1
.
A trail in a digraph is a walk that has no repeated edges and a path in a digraph is a walk
with no repeated vertices.
A cycle is a walk N
0
, N
1
, N
2
, . . . , N
k1
, N
k
, N
0
such that N
0
, N
1
, N
2
, . . . , N
k1
, N
k
is a
path.
In the above digraph, N
2
, N
1
, N
1
, N
3
, N
5
, N
2
and N
3
, N
5
, N
3
are cycles.
Denition The underlying graph of a digraph is the graph obtained by replacing arrows
with undirected edges.
A digraph is connected if its underlying graph is connected.
Denition A rooted tree is a digraph in which there is one vertex which has a path to
any other vertex and in which each vertex has at most one antecedent (called its parent).
u
u
u
u
u
u
u
u
u
root node

.
`
`
`
`

\
\
`
`
\
\
\
`
`
-
-
-
-
-
-
-
In a tree there is only one vertex which has a path to every other vertex, it is called the
root vertex.
A digraph without cycles is not necessarily a rooted tree.
u
u
u
u
2
4
1

.
`
`
`
`

3
6.4 Return to the K onigsberg bridges
Recall the challenge that was set at the beginning of this section: nd a path in the
K onigsberg bridges graph which uses each edge exactly once.
Denition A trail in a graph which uses all the edges exactly once is called an Eulerian
path.
56 GRAPHS AND TREES
It is not possible to meet the above challenge; no Eulerian path exists. How do we
know this?
There are only 7 edges in the graph. Thus, in theory it is possible to write all the
possible trails down, and demonstrate that none of them includes all the edges. But there
are at least 60 such trails, and how do we know that we havent missed any?
Notice, if any bridge is removed or if a new one is added then it is possible to complete
the challenge. The graphs
v v v
v v v
v v v v v v
N N N
J J J I I I
S S S
have Eulerian paths.
It turns out that the existence of an Eulerian path depends only on the number of
vertices which have an odd number of attached edges.
Denition The number of edges attached to a vertex is called the degree of the vertex.
In the following graph vertex A has degree 2, vertices B and C have degree 3, and vertex
D has degree 0.
v
v
v v
B
C A
D

The K onigsberg bridges graph has 3 vertices of degree 3 and one of degree 5, thus it
has 4 vertices whose degree is odd. The following theorem, whose proof is not covered in
this course, shows that the K onigsberg graph cannot contain an Eulerian path.
Theorem 8 A connected graph G has an Eulerian path if and only if either all its vertices
have even degree, or exactly two of its vertices have odd degree.
6.5 Exercises
1. Write down an Eulerian path in the following (multi)graph.
v
v
v v
v
v
6
5
3
4
1
2
2. For the graph in Question 1, write down
(a). The vertices adjacent to 6.
(b). The vertices adjacent to 1.
(c). A walk from 5 to 3 which is not a trail.
3. Consider the following tree, T:
Exercises 57
v
v
v
v
v
2 1
3
4
5 `
`
`/
/
/
`
`
`

Write down all the graphs which can be constructed by adding one new edge between any
two non-adjacent vertices in T. Thus verify that (5) of Theorem 6 holds for this graph.
4. Consider the following relations on S
4
.
R
1
= {(1, 1), (1, 2), (1, 3), (2, 4)}, R
2
= {(1, 2), (2, 3), (2, 4), (3, 4)}, R
3
= {(2, 1), (2, 3), (1, 4)}.
Decide whether the graphical representation of each relation is a rooted tree, and if it is
give the root vertex.
5. A graph in which every vertex is adjacent to all the vertices in the graph is called a
complete graph. The following is the complete graph with four vertices.
v
v
v
v
2
1
3
4
>
>
>
>
>

>
>
>
>
>

Write down all the spanning subtrees of the complete graph with four vertices.
6. Write down all the possible graphs that have four vertices and 0, 1, or 2 edges and no
loops. Hence verify that Lemma 3 holds for graphs with four vertices.
58 GRAPHS AND TREES
7. Consider the following road network.
v
v v
v v v
v v
A
B
H
E
F G
D C
>
>
>
>
>

>
>
>
>
>

`
`
`
5 5
1
2
7 6
3
1
2 3
1
4
3
2 2
6
Find the shortest road network which connects all the towns, and which uses only existing
roads. (Justify your answer.)
8. The following is a chart which allows a person to select clothing.
temperature raining windy wear
hot no no shirt
warm no no jumper
warm yes yes shirt, kagool
cold yes no overcoat, umbrella
hot yes no shirt, umbrella
hot no yes shirt
cold no no overcoat
warm no yes jumper, umbrella
cold no yes overcoat
hot yes yes shirt, kagool
cold yes yes jumper, kagool
warm no yes jumper
Draw a decision tree (a rooted tree) which corresponds to the above chart. What is the
root of your tree?
Construct a dierent rooted tree which also corresponds to the above chart. What is
the root of this tree?
Chapter 7
Vector Spaces
Computer graphics routines use 2D and 3D geometrical transformations. There are three
basic types of transformation: translation, scaling, and rotation. These transformations
are used to change the position, orientation and size of objects drawn on the screen. The
ideas on which graphics routines are based come from a powerful branch of mathematics
called vector space theory. Vector spaces are collections of vectors.
Sets which have associated operations addition and multiplication, (where these operations
have certain natural properties), are called elds. The multiplication must be commuta-
tive, so ab = ba for all elements in the set; and every element except 0 must have a
multiplicative inverse, so for each element a there is an element a
1
with aa
1
= 1.
The real numbers R, the rationals Q, and the booleans {0, 1} with binary addition and
multiplication are all elds. But the integers Z and the natural numbers N are not elds.
In general, a vector can have elements from any eld but in this course we shall only
consider vectors whose elements are real numbers.
7.1 Vectors
A vector is an n-tuple whose entries are elements of R. We write vectors as columns:
_
_
0
0
0
_
_
,
_
_
1
3.5
7
_
_
,
_
0.5

_
.
You should think of a vector as a line from the origin to the point whose co-ordinates are
the entries of the vector.
(8,3,6)
v=
8
3
6
60 VECTOR SPACES
We can specify graphical objects by giving the vectors of the corners, and lling in the
straight line connections. Usually we dont draw in the whole vector, just the end point.
The corners of the following drawing are given by the vectors
_
2
2
_
,
_
2
4
_
,
_
4
2
_
,
_
4
4
_
,
_
3
5
_
.
7.1.1 Scalar multiplication
The vector
_
_
2
2
2
_
_
is just a line which is twice as long as the vector
_
_
1
1
1
_
_
.
In general, we can multiply any vector by any real number to get a new vector which is
just a dierent length to the original. So
_
_
3.5
3.5
3.5
_
_
= 3.5
_
_
1
1
1
_
_
and
_
0

_
=
_
0
1
_
.
The eect of multiplying vectors by real numbers is to scale the drawing. If the real
number is greater than 1 then the drawing is enlarged, if the number is less than 1 then
the drawing is reduced.
The two drawings below are obtained by multiplying the above vectors by 2 and by 1/2.
Vectors 61
What happens if we scalar multiply by -2?
Multiplying a vector by a real number in this way is called scalar multiplication. For any
vector v and any real number r R we can nd the scalar multiple, rv, of v by r:
If v =
_
_
_
_
a
1
a
2
.
.
.
a
n
_
_
_
_
then rv =
_
_
_
_
ra
1
ra
2
.
.
.
ra
n
_
_
_
_
.
7.1.2 Vector addition
It is always possible to add together two vectors with the same number of entries.
_
4
2
_
+
_
1
3
_
=
_
3
5
_
The general rule for adding vectors is to add together the corresponding entries in each
vector.
_
_
_
_
a
1
a
2
.
.
.
a
n
_
_
_
_
+
_
_
_
_
b
1
b
2
.
.
.
b
n
_
_
_
_
=
_
_
_
_
a
1
+b
1
a
2
+b
2
.
.
.
a
n
+b
n
_
_
_
_
For 2-dimensional vectors (vectors with two entries) this corresponds to the parallelogram
rule: to add two vectors, translate one vector so that it starts at the end of the other
vector, then draw a line from the origin to the end point of the result.
u
v
u+v
62 VECTOR SPACES
Notation A vector with n entries is called an n-dimensional vector.
7.1.3 Linear combinations
We can combine scalar multiplication and vector addition to get new vectors from old.
If v =
_
_
1
2
0
_
_
and w =
_
_
1
1
1
_
_
then
3v + (2)w =
_
_
3
6
0
_
_
+
_
_
2
2
2
_
_
=
_
_
5
4
2
_
_
.
Denition A linear combination of the vectors v
1
, v
2
, . . . , v
k
is a vector of the form
a
1
v
1
+a
2
v
2
+. . . +a
k
v
k
,
where a
1
, a
2
, . . . , a
k
are real numbers.
7.1.4 Standard bases
For any 3-dimensional vector v we have
v =
_
_
r
s
t
_
_
= r
_
_
1
0
0
_
_
+s
_
_
0
1
0
_
_
+t
_
_
0
0
1
_
_
.
We let e
1
=
_
_
1
0
0
_
_
, e
2
=
_
_
0
1
0
_
_
, and e
3
=
_
_
0
0
1
_
_
. Then v = re
1
+se
2
+te
3
.
So every 3-dimensional vector is a linear combination of the vectors e
1
, e
2
, e
3
.
The set {e
1
, e
2
, e
3
} is called the standard basis for 3-space.
The same is true for 2-dimensional vectors.
_
r
s
_
= r
_
1
0
_
+s
_
0
1
_
.
e
1
=
_
1
0
_
, e
2
=
_
0
1
_
is the standard basis for 2-space.
7.1.5 Vector spaces
A vector space is a set V of vectors which has the following properties:
1. If u, v V then u +v V .
2. For u, v, w V we have u + (v +w) = (u +v) +w.
3. For u, v V we have u +v = v +u
4. There is a vector 0 V such that v + 0 = v, for all v V .
5. For every a R and every v V we have av V .
6. For all v V we have v + (1)v = 0. (Write v u for v + (1)u.)
7. For all a, b R and for all v V , we have (a +b)v = av +bv.
8. For all a, b R and for all v V , we have (ab)v = a(bv).
9. For all a R and for all v, u V , we have a(v +u) = av +au.
10. For v V , we have 1v = v.
Linear independence, spanning and bases 63
The set of all vectors of the form
_
_
r
s
t
_
_
(where r, s, t R) has these properties, so it is a
vector space. We call this vector space 3-space or V
3
.
The set of all vectors of the form
_
r
s
_
(where r, s R) has these properties, so it is a
vector space. We call this vector space 2-space or V
2
.
In general, the set of all vectors of the form
_
_
_
_
r
1
r
2
.
.
.
r
n
_
_
_
_
(where r
i
R for 1 i n) has these
properties, so it is a vector space. We call this vector space n-space or V
n
.
7.2 Linear independence, spanning and bases
Denition A spanning set is a set of vectors {v
1
, v
2
, . . . , v
k
} with the property that, for
any vector v, there are real numbers a
1
, a
2
, . . . , a
k
such that
v = a
1
v
1
+a
2
v
2
+. . . +a
k
v
k
.
The standard basis {e
1
, e
2
, e
3
} and the set of vectors
_
_
2
0
1
_
_
,
_
_
0
1
1
_
_
,
_
_
0
0
1
_
_
,
_
_
1
1
1
_
_
are both spanning sets for 3-space.
Denition A set {v
1
, v
2
, . . . , v
k
} of vectors is linearly independent if the only linear com-
bination which gives the zero vector is the one obtained multiplying each vector by 0.
i.e. If a
1
v
1
+a
2
v
2
+. . . +a
k
kv
k
= 0 implies that a
1
= a
2
= . . . = a
k
= 0.
Denition A set of vectors {v
1
, v
2
, . . . , v
k
}, which is both a spanning set and linearly
independent, is called a basis.
Bases are important because if {v
1
, v
2
, . . . , v
k
} is a basis for a vector space V then every
vector in V can be written uniquely as a linear combination of these vectors.
In other words, if v V then there exist unique real numbers a
1
, . . . , a
k
such that
v = a
1
v
1
+a
2
v
2
+. . . a
k
v
k
.
The sets {e
1
, e
2
} and {e
1
, e
2
, e
3
} are bases for V
2
and V
3
respectively.
There are lots of bases in a vector space.
Example Show that v =
_
2
1
_
, u =
_
1
3
_
, is a basis for V
2
.
Solution: If a
_
2
1
_
+b
_
1
3
_
=
_
0
0
_
, then
2a b = 0 = a + 3b, so 7a = 0, and hence a = b = 0.
So we have shown that the vectors v, u are linearly independent.
_
r
s
_
=
(3r +s)
7
_
2
1
_
+
(r + 2s)
7
_
1
3
_
.
64 VECTOR SPACES
This shows that the set {v, u} spans V
2
. Thus it is a basis.
Notice, both the bases that we have found for V
2
have contained 2 vectors. This is not a
coincidence, as the next theorem shows.
Theorem 9 All bases in a vector space V contain the same number of vectors. This
number is called the dimension of V .
7.3 Transformations
We have seen how scalar multiplication can scale the size of graphical images, stretching
or shrinking them.
7.3.1 Scaling
Scalar multiplication scales a drawing uniformly (the same amount in each direction). We
can scale a drawing by dierent amounts in each direction. For example, we can multiply
the rst entry by 1/2 and the second entry by 2.
_
2
2
_

_
1
4
_
,
_
4
2
_

_
2
4
_
,
_
2
4
_

_
1
8
_
,
_
4
4
_

_
2
8
_
,
_
3
5
_

_
1.5
10
_
.
This general type of scaling is a form of transformation. Given any non-zero real numbers
a, b we can transform every vector of 2-space using the following rule:
_
r
s
_

_
ar
bs
_
.
7.3.2 Translation
Another type of transformation is a translation. In a translation a xed number is added
to each of the entries in the vectors. The aect of a translation is to slide a drawing across
to a dierent place on the screen.
For example, we can add 1 to the rst entry and 2 to the second entry of each vector in
2-space.
Linear Transformations 65
7.3.3 Rotation
The third type of transformation is a rotation. To rotate a vector through an angle of 60

we replace the vector


_
r
s
_
with the vector
_
r.cos60 s.sin60
r.sin60 +s.cos60
_
.
before rotation after rotation
In general, to rotate a vector through

we replace
_
r
s
_
with the vector
_
r.cos s.sin
r.sin +s.cos
_
.
7.4 Linear Transformations
There is a special class of transformations, called the linear transformations, which preserve
the addition and scalar multiplication of a vector space.
For example, if we scale two vectors and then add them, we get the same result as we
would by adding rst and then scaling.
_
ax
by
_
+
_
ar
bs
_
=
_
a(x +r)
b(y +s)
_
.
Similarly, we can rotate then add, or add then rotate, and get the same result in either
case.
_
x.cos y.sin
x.sin +y.cos
_
+
_
r.cos s.sin
r.sin +s.cos
_
=
_
(x +r).cos (y +s).sin
(x +r).sin + (y +s).cos
_
.
66 VECTOR SPACES
Denition A linear transformation is a function T : V W (from a vector space V to
a vector space W) which has the properties:
T(v +u) = T(v) +T(u)
T(av) = aT(v)
for all u, v V and for all a R.
The scaling transformation T : V
3
V
3
which is dened by the rule
if v =
_
_
r
s
t
_
_
then T(v) =
_
_
2r
s
5t
_
_
is a linear transformation.
Lemma 4 If T : V W is a linear transformation then T(0) = 0.
Proof 0 = v + (1)v so
T(0) = T(v + (1)v) = T(v) +T((1)v) = T(v) T(v) = 0.
Remark Translations are not linear transformations.
For example, the translation
_
r
s
_

_
r + 3
s 2
_
has the property that
_
0
0
_

_
3
2
_
=
_
0
0
_
.
So Lemma 4 does not hold for translations, and hence they cannot be linear transforma-
tions.
Linear transformations are very important functions. They are so important that there is
a special technique in graphics which modies the representations of 2-space and 3-space
so that translations become linear transformations.
The trick is to represent a 2-dimensional vector as a 3-dimensional vector whose third
entry is 1. So
_
r
s
_
is represented as
_
_
r
s
1
_
_
. The new vectors are called homogeneous
vectors, and the process of representing 2-space in this way is called homogenization. We
shall not go in to the details of this in this course, but it is discussed in the second year
graphics course.
One of the reasons why linear transformations are so important is that they can be rep-
resented as matrices.
7.5 Matrices
In this section we consider matrices which are n n arrays of real numbers, where n P.
We can multiply a 3-dimensional vector by a 3 3 matrix:
_
_
a
1
a
2
a
3
b
1
b
2
b
3
c
1
c
2
c
3
_
_
_
_
2
1
3
_
_
=
_
_
2a
1
+a
2
3a
3
2b
1
+b
2
3b
3
2c
1
+c
2
3c
3
_
_
.
We can multiply a 2-dimensional vector by a 2 2 matrix:
Matrices 67
_
cos sin
sin cos
__
r
s
_
=
_
r.cos s.sin
r.sin +s.cos
_
.
In general, we can multiply a n-dimensional vector by a n n matrix as follows:
_
_
_
_
a
11
a
12
. . . a
1n
a
21
a
22
. . . a
2n
.
.
.
.
.
.
.
.
.
a
n1
a
n2
. . . a
nn
_
_
_
_
_
_
_
_
r
1
r
2
.
.
.
r
n
_
_
_
_
=
_
_
_
_
r
1
a
11
+r
2
a
12
+. . . +r
n
a
1n
r
1
a
21
+r
2
a
22
+. . . +r
n
a
2n
.
.
.
r
1
a
n1
+r
2
a
n2
+. . . +r
n
a
nn
_
_
_
_
.
A rotation T : V
2
V
2
which rotates vectors through an angle has the same eect as
multiplying each vector by the matrix in the second example above, i.e. if
M =
_
cos sin
sin cos
_
then Mv = T(v),
for all vectors v in V
2
.
We say that a matrix M is equivalent to a linear transformation T if for all vectors v, of
the correct dimension, Mv = T(v).
Example The matrix M below is equivalent to the linear transformation T : V
3
V
3
which scales the rst entry by 3, the second entry by -2, and the third entry by 7.
M =
_
_
3 0 0
0 2 0
0 0 7
_
_
, M
_
_
r
s
t
_
_
=
_
_
3r
2s
7t
_
_
= T(v).
The following theorem shows that multiplying by any matrix is a linear transformation,
and that any linear transformation can be represented as a matrix.
Theorem 10 (a). If M is an n n matrix then the map T
M
: V
n
V
n
dened by
T
M
(v) = Mv, is a linear transformation.
(b). If T : V
n
V
n
is a linear transformation then there is an nn matrix M
T
such that
for all v V
n
, T(v) = M
T
v.
Problem Given a linear transformation T : V
n
V
n
, nd the matrix M such that
T(v) = Mv.
Algorithm: For each of the standard basis vectors e
i
, 1 i n, calculate the vectors
u
i
= T(e
i
).
Construct the matrix M by making u
1
the rst column, u
2
the second column, and so
on nishing with u
n
as the last column.
Example Let T : V
3
V
3
be the linear transformation which projects objects on to
the x,y plane, so if v =
_
_
r
s
t
_
_
then T(v) =
_
_
r
s
0
_
_
. Find the matrix which is equivalent
to T.
68 VECTOR SPACES
Solution We have
T(e
1
) =
_
_
1
0
0
_
_
, T(e
2
) =
_
_
0
1
0
_
_
, T(e
3
) =
_
_
0
0
0
_
_
, so M =
_
_
1 0 0
0 1 0
0 0 0
_
_
.
Check the result:
Mv =
_
_
1 0 0
0 1 0
0 0 0
_
_
_
_
r
s
t
_
_
=
_
_
r
s
0
_
_
= T(v).
Example Let T : V
3
V
3
be the linear transformation dened by the rule if v =
_
_
r
s
t
_
_
then T(v) =
_
_
s
t 2r
t + 7s
_
_
. Find the matrix which is equivalent to T.
Solution We have
T(e
1
) =
_
_
0
2
0
_
_
, T(e
2
) =
_
_
1
0
7
_
_
, T(e
3
) =
_
_
0
1
1
_
_
, so M =
_
_
0 1 0
2 0 1
0 7 1
_
_
.
Check the result:
Mv =
_
_
0 1 0
2 0 1
0 7 1
_
_
_
_
r
s
t
_
_
=
_
_
s
t 2r
t + 7s
_
_
= T(v).
Example Let T : V
2
V
2
be the linear transformation which scales vectors by 2 in the
rst entry and by -3.5 in the second entry. So, if v =
_
r
s
_
then T(v) =
_
2r
3.5s
_
. Find
the matrix which is equivalent to T.
Solution We have
T(e
1
) =
_
2
0
_
, T(e
2
) =
_
0
3.5
_
, so M =
_
2 0
0 3.5
_
.
Check the result:
Mv =
_
2 0
0 3.5
__
r
s
_
=
_
2r
3.5s
_
= T(v).
7.6 Composing Linear Transformations
We often want to transform drawings in a combination of ways, for example we may want
to scale an object and rotate it through 90

.
Composing Linear Transformations 69
We can combine two or more transformations into a single transformation by composing
them. If T : V W and S : W U are both linear transformations, then we can form
a new transformation R : V U by dening R(v) = S(T(v)).
To get a transformation R : V
2
V
2
which both scales the rst entry by 2 and the second
entry by 1/2, and which rotates by 90

, we compose the two transformations


T
_
r
s
_
=
_
2r
0.5s
_
and S
_
r
s
_
=
_
r.cos90

s.sin90

r.sin90

+s.cos90

_
and we get
R
_
r
s
_
=
_
2r.cos90

0.5s.sin90

2r.sin90

+ 0.5s.cos90

_
Since cos90 = 0 and sin90 = 1 we actually have R
_
r
s
_
=
_
0.5s
2r
_
.
We know that every linear transformation corresponds to a matrix, and we might expect
that the matrix of a composition can be constructed from the matrices of the transforma-
tions being composed.
Each column of a matrix is a vector, and we can multiply vectors by matrices. We can
extend this so that we can multiply two matrices together by multiplying each column of
the second matrix by the rst matrix. So
_
_
a
1
a
2
a
3
b
1
b
2
b
3
c
1
c
2
c
3
_
_
_
_
x
1
x
2
x
3
y
1
y
2
y
3
z
1
z
2
z
3
_
_
=
_
_
a
1
x
1
+a
2
y
1
+a
3
z
1
a
1
x
2
+a
2
y
2
+a
3
z
2
a
1
x
3
+a
2
y
3
+a
3
z
3
b
1
x
1
+b
2
y
1
+b
3
z
1
b
1
x
2
+b
2
y
2
+b
3
z
2
b
1
x
3
+b
2
y
3
+b
3
z
3
c
1
x
1
+c
2
y
1
+c
3
z
1
c
1
x
2
+c
2
y
2
+c
3
z
2
c
1
x
3
+c
2
y
3
+c
3
z
3
_
_
_
a d
b e
__
x p
y q
_
=
_
ax +dy ap +dq
bx +ey bp +eq
_
Lemma 5 If T, S are linear transformations with matrices M
T
, M
S
, respectively, then
their composition R, where R(v) = S(T(v)), has matrix M
S
M
T
.
Example Let T, S : V
3
V
3
be the linear transformations dened by the rules if
v =
_
_
r
s
t
_
_
then T(v) =
_
_
s
t 2r
t + 7s
_
_
and S(v) =
_
_
r
s
0
_
_
. Find the matrix of the composition
ST.
70 VECTOR SPACES
Solution From the examples in the previous section we have that S, T have matrices
M
S
=
_
_
1 0 0
0 1 0
0 0 0
_
_
and M
T
=
_
_
0 1 0
2 0 1
0 7 1
_
_
.
Then, by Lemma 5, we have that their composition has the matrix M, where
M = M
S
M
T
=
_
_
0 1 0
2 0 1
0 0 0
_
_
.
The fact that linear transformations can be built up from simple transformations, such as
rotation and scaling, by multiplying matrices is one of the reasons why linear transforma-
tions are so important and widely used.
7.7 Inverting Linear Transformations
Many of the transformations that we have considered can be reversed. If we rotate
a drawing through

then we can reverse this action by rotating the drawing through

= 360

360
Similarly, if we scale a drawing by stretching everything by 2, we can reverse this by
shrinking everything by 1/2.
Formally, for some linear transformations T there is a corresponding linear transfor-
mation S such that S(T(v)) = v, for all appropriate vectors v.
Linear transformations are functions, and saying that a linear transformation is re-
versible is the same thing as saying that it is an invertible function, and a bijection. We
are now going to consider the conditions under which a linear transformation is a bijection.
7.7.1 Kernels
If T : V W is a linear transformation then it is a function, but it may not be surjective
and it may not be injective.
From Lemma 4 we know that for any linear transformation, T, T(0) = 0. For T : V
3

V
3
as above we also have
T
_
_
0
0
2
_
_
=
_
_
1 0 0
0 1 0
0 0 0
_
_
_
_
0
0
2
_
_
=
_
_
0
0
0
_
_
.
Invertible Matrices 71
Thus T(v) = T(0), but v = 0. Thus this T is not injective.
If there is a vector v = 0 such that T(v) = 0 then T(v) = T(0) and T is not injective.
It turns out that the converse is also true: if there is no vector v, except 0, such that
T(v) = 0 then T is injective.
For this reason we dene the kernel of a linear transformation T to be the set of all
vectors which it maps to 0. Thus
kernel of T = ker(T) = {v V | T(v) = 0}.
Example Let T be the linear transformation whose matrix is M, below. Find the kernel,
ker(T), of T.
M =
_
_
1 0 0
0 1 0
0 0 0
_
_
.
Solution: We have that T(v) = 0 if and only if Mv = 0. Then
M
_
_
r
s
t
_
_
=
_
_
r
s
0
_
_
=
_
_
0
0
0
_
_
if and only if r = 0 and s = 0. Thus we have that ker(T) is the set of vectors whose rst
two entries are 0.
ker(T) =
_
_
_
_
_
0
0
t
_
_
| t R
_
_
_
.
Theorem 11 A linear transformation T : V W is a bijection, and hence invertible, if
and only if T(V ) = W and ker(T) = {0}.
Proof: T is surjective if and only if T(V ) = W.
We have seen that if ker(T) = {0} then there is some v = 0 such that T(v) = T(0), and
so T is not injective. We prove that if ker(T) = {0} then T is injective by contradiction.
Suppose that T(v) = T(u) with v = u. Then w = v u = 0 and T(w) = T(v u) =
T(v) T(u) = 0, so w ker(T). This is a contradiction. So we have that T is injective if
and only if ker(T) = {0}.
Remark If T : V W is a linear transformation then the image T(V ) and the kernel
ker(T) are both vector spaces. You can prove this by checking that each set satises the
10 laws for a vector space given at the end of Section 7.1.
7.8 Invertible Matrices
We have already remarked that one of the important properties of linear transformations
is that they correspond to matrices. We now look at the matrices of invertible linear
transformations.
If a linear transformation T : V W is invertible then there exists S : W V such that
S(T(v)) = v. What can we say about a matrix M which has the property that Mv = v,
for all v?
72 VECTOR SPACES
The matrix with 1s down the leading diagonal, and 0s everywhere else, is called the identity
matrix and is denoted by I
n
. So
I
4
=
_
_
_
_
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
_
_
_
_
, I
3
=
_
_
1 0 0
0 1 0
0 0 1
_
_
, I
2
=
_
1 0
0 1
_
.
(Notice, for all n n matrices M, MI
n
= M = I
n
M.)
It is easy to check that if v is an n-dimensional vector then I
n
v = v. The converse is also
true, if Mv = v for all n-dimensional vectors v then M = I
n
.
Denition We say that a n n matrix M is invertible if there is a matrix N such that
NM = I
n
. We write N = M
1
and M = N
1
Example If M and N are the two matrices below, then NM = I
2
, so N = M
1
.
N =
_
2 0
0 1/2
_
, M =
_
1/2 0
0 2
_
.
In general it is not obvious whether a matrix has an inverse, or, if it exists, what the inverse
is. We nish this section by discussing a way of testing to see if a matrix is invertible, and
giving an algorithm for nding the inverse if it exists.
7.9 Determinants And Row Reductions
The determinant of an nn matrix is a real number which is constructed from the entries
of the matrix. For a 2 2 matrix M the determinant is the dierence of the products of
the two diagonals.
M =
_
a b
c d
_
, detM = ad bc.
Example For
M
1
=
_
1 0
0 1
_
, M
2
=
_
2 0
0 0.5
_
, M
3
=
_
4 3
2 1
_
, M
4
=
_
0 1
2 5
_
, M
5
=
_
1 1
1 1
_
,
we have
detM
1
= 1, detM
2
= 1, detM
3
= 2, detM
4
= 2, detM
5
= 0.
For an n n matrix M the determinant of M is given using a recursive denition.
Given an (n+1)(n+1) matrix M we dene nn matrices M
1
, M
2
, . . . , M
n
by removing
the top row and the ith column from M.
So, for n = 2, if
M =
_
_
a
1
a
2
a
3
b
1
b
2
b
3
c
1
c
2
c
3
_
_
then M
1
=
_
b
2
b
3
c
2
c
3
_
, M
2
=
_
b
1
b
3
c
1
c
3
_
, M
3
=
_
b
1
b
2
c
1
c
2
_
.
Denition For a n n matrix M the determinant is given by:
Determinants And Row Reductions 73
if n = 1, and M = (a) then detM = a,
if n = 2, and M =
_
a b
c d
_
then detM = ad bc,
if n 3, then detM = a
1
detM
1
a
2
detM
2
+. . . + (1)
n+1
a
n
detM
n
.
For n = 3 we have
M =
_
_
a
1
a
2
a
3
b
1
b
2
b
3
c
1
c
2
c
3
_
_
, detM = a
1
(b
2
c
3
b
3
c
2
) a
2
(b
1
c
3
b
3
c
1
) +a
3
(b
1
c
2
b
2
c
1
).
We use determinants to test whether matrices are invertible. You are not expected to be
able to prove the following theorem, but you must understand it and be able to use it.
Theorem 12 An n n matrix has an inverse if and only if detM = 0.
Example Let T and S be the following linear transformations
T
_
r
s
_
=
_
r +s
r +s
_
, S
_
r
s
_
=
_
4r + 3s
2r +s
_
.
Then T, S have corresponding matrices
M
T
=
_
1 1
1 1
_
, M
S
=
_
4 3
2 1
_
.
Since detM
T
= 0 and detM
S
= 2, by Theorem 12, S is invertible, but T is not.
Exercise Show that the inverse of the linear transformation S, from the last example, is
R
_
r
s
_
=
_
0.5r + 1.5s
r 2s
_
.
Using the recursive denition to calculate the determinants of large matrices takes a lot
of time. We now show how to use row operations to reduce the matrix to one whose
determinant can be calculated easily.
First we give a lemma, which can be proved by induction on n, the dimension of the
matrix, and which says that certain types of matrices have determinants which are easy
to calculate.
Lemma 6 If M is a matrix which has all 0s above the leading diagonal, or which has
all 0s below the leading diagonal, then detM is the product of the elements on the leading
diagonal. If M is a matrix which has a row of 0s then detM = 0.
Thus det
_
_
2 3 7
0 4 1
0 0 1
_
_
= 8, det
_
_
2 3 7
0 0 0
5 7 1
_
_
= 0.
74 VECTOR SPACES
7.9.1 Elementary row operations
There are three kinds of elementary row operations which can be performed on a matrix
without aecting its determinant much.
(a). Interchange two rows.
(b). Divide a row by a non-zero real number.
(c). Add a scalar multiple of one row to a dierent row.
Lemma 7 (a) If M

is obtained from M by interchanging two rows then detM = detM

.
(b). If M

is obtained from M by dividing a row of M by a non-zero number a, then


detM = a detM

.
(c). If M

is obtained from M by adding a scalar multiple of one row to a dierent row,


then detM = detM

.
Problem Given an n n matrix M nd detM.
Algorithm Use the elementary row operations to transform M into a matrix with 0s
below the leading diagonal, keeping track of the modications to the determinant.
Then calculate the determinant of the new matrix by nding the product of the diag-
onal entries.
Example Find the determinant of the matrix
M =
_
_
_
_
1 1 0 3
1 2 4 2
1 1 3 2
0 1 0 2
_
_
_
_
.
Solution
detM =det
_
_
_
_
1 1 0 3
1 2 4 2
0 1 7 0
0 1 0 2
_
_
_
_
= det
_
_
_
_
1 1 0 3
0 3 4 1
0 1 7 0
0 1 0 2
_
_
_
_
= det
_
_
_
_
1 1 0 3
0 1 0 2
0 1 7 0
0 3 4 1
_
_
_
_
= det
_
_
_
_
1 1 0 3
0 1 0 2
0 0 7 2
0 3 4 1
_
_
_
_
= det
_
_
_
_
1 1 0 3
0 1 0 2
0 0 7 2
0 0 4 5
_
_
_
_
= 7det
_
_
_
_
1 1 0 3
0 1 0 2
0 0 1
2
7
0 0 4 5
_
_
_
_
= 7det
_
_
_
_
1 1 0 3
0 1 0 2
0 0 1
2
7
0 0 0
27
7
_
_
_
_
= 27.
Exercises 75
7.9.2 Finding inverses
It is possible to use the elementary row operations (a), (b) and (c), described above, to
nd the inverse of a matrix.
Problem Given an n n matrix M such that detM = 0, nd the inverse matrix M
1
.
Algorithm Use elementary row operations on M to reduce M to the identity matrix I
n
.
At the same time, perform the same elementary row operations on the identity matrix
I
n
. The matrix obtained will be M
1
.
Example Find the inverse of the matrix
_
4 3
2 1
_
.
Solution
_
4 3
2 1
_

_
4 3
4 2
_

_
4 3
0 1
_

_
4 3
0 3
_

_
4 0
0 3
_

_
4 0
0 1
_

_
1 0
0 1
_
_
1 0
0 1
_

_
1 0
0 2
_

_
1 0
1 2
_

_
1 0
3 6
_

_
2 6
3 6
_

_
2 6
1 2
_

_

1
2
3
2
1 2
_
.
The fact that the above algorithm for calculating the inverse is correct is stated in the
following theorem, which we shall not prove.
Theorem 13 If a sequence of elementary row operations reduces a matrix M to the iden-
tity matrix, then the same sequence of row operations will reduce the identity matrix to
M
1
.
(This also implies that M must be invertible if it can be reduced to the identity matrix in
this way.)
7.10 Exercises
1. For each of the vectors below, draw the vector as a line, and write each vector as a
linear combination of the standard basis vectors e
1
, e
2
, e
3
.
v
1
=
_
_
2
1
3
_
_
, v
2
=
_
_
2.4
0
1/2
_
_
, v
3
=
_
_
0
0
0
_
_
, v
4
=
_
_
0
0
1
_
_
.
2. Let T, S : V
2
V
2
be dened by
T
_
r
s
_
=
_
3s
s +r
_
and S
_
r
s
_
=
_
r +s
2r
_
.
Find matrices M
T
and M
S
which correspond to T and S.
Let R be the composition of T and S, so R(v) = S(T(v)). Write down R(v) when
v =
_
r
s
_
, and nd the matrix M
R
which corresponds to R.
3. Let T, S : V
2
V
2
be linear transformations, and suppose that T, S have matrices
M
S
=
_
a d
b e
_
, M
T
=
_
x p
y q
_
.
76 VECTOR SPACES
Let R be the composition of T and S, so R(v) = S(T(v)). Calculate M = M
S
M
T
and
show that for all v V
2
, R(v) = Mv.
4. Write down the determinants of the following matrices.
M
1
=
_
4 2
2 1.5
_
, M
2
=
_
2 2
1 1
_
, M
3
=
_
_
0 1 1
0 0 0
0 0 0
_
_
, M
4
=
_
_
0 1 2
2 5 1
0 1 3
_
_
.
5. Let T : V
3
V
3
be dened by
T
_
_
r
s
t
_
_
=
_
_
r +s +t
2s 4t
2r +s
_
_
.
Find the matrix corresponding to T, and calculate its determinant.
6. Show that the following vectors are linearly independent and that they are a spanning
set for 2-space, V
2
. Hence deduce that they form a basis for 2-space.
v =
_
1
2
_
, u =
_
5
0
_
7. Decide whether the each of the following maps is a linear transformation.
f : V
3
V
3
, f
_
_
r
s
t
_
_
=
_
_
3r 2
2s r
2
2t
_
_
g : V
3
V
3
, g
_
_
r
s
t
_
_
=
_
_
3r 2t
2s r
0
_
_
.
8. Let T, S : V
3
V
3
be dened by
T
_
_
r
s
t
_
_
=
_
_
3s
s +r
s +r
_
_
and S
_
r
s
_
=
_
_
r +s
2r
3t +r
_
_
.
Find matrices M
T
and M
S
which correspond to T and S.
Let R be the composition of T and S, so R(v) = S(T(v)). Write down R(v) when
v =
_
_
r
s
t
_
_
, and nd the matrix M
R
which corresponds to R.
9. Let N, M be matrices. Show that the equation MN = NM is not always true by
nding two 2 2 matrices P, Q such that PQ = QP.
10. Let T : V
4
V
4
be dened by
T
_
_
_
_
r
s
t
q
_
_
_
_
=
_
_
_
_
r +s t
4r 4t
2q +s +r t
q +s
_
_
_
_
.
Use the matrix corresponding to T to show that T is not invertible.
Chapter 8
Probability
Probability theory is used in many areas of Computer Science. Whenever several ap-
plications have to share resources then it is necessary to calculate the probable usage
requirements of each application, to decide how to assign the resources. For example, tele-
phone and computer networks have more handsets/terminals than there are connections
to the switch board/le server. Probability calculations are used in network design.
In safety critical areas, where software failure may cause injury or loss of life, it is
necessary to calculate the probability that an error will occur. This is then compared with
the level of damage which would be caused by the error, to give a measure of the safety
of the system.
Many expert systems, which are used for example in medical diagnosis and weather
forecasting, take a large set of actual case histories and use this to construct a decision algo-
rithm. The algorithm is then used to decide the most likely diagnosis/weather prediction,
based on the given symptoms/climatic conditions. The decision algorithm is constructed
using probability theory.
Articial Intelligence systems based on neural networks learn using probabilistic rea-
soning, and the cost of insurance is determined by probability calculations which predict
the likelihood of a claim.
8.1 Elementary Probability
(Similar material is in Section 4.5 of Rosen.)
Probability is concerned with events which are assumed to be random. If a coin is tossed
and required to spin several times before landing, experience shows that we cannot predict
which side will face upwards when it lands. We say a coin is fair if each face has the same
properties, so the coin is not weighted or bent. We assume that it is not possible for a
person to toss a coin, which has to spin several times, in a way which guarantees that it
will always lie head side up. Experience supports this assumption.
Denitions
A process is random if it is not possible to predict the outcome of the process in advance.
An event is a possible outcome of a process (experiment). For example, heads is an event
which is a possible outcome of the process of tossing a coin.
The set of all possible outcomes of an experiment (all possible events) is called the sample
space of the experiment. The sample space for tossing a coin (once) is {heads, tails}.
Another familiar example is the process of rolling a fair die. We assume that the die is
not weighted and that it must roll several times before stopping. Experience shows that
78 PROBABILITY
the number which will appear on the top face of the die is unpredictable. We assume that
it is a random process. The sample space associated with the rolling a die experiment is
the set {1, 2, 3, 4, 5, 6}, the values on each of the faces of the die.
In both of the above examples each of the events in the sample space is equally likely.
The chances of rolling a 1 are the same as the chances of rolling a 2, etc.
It is possible to construct random experiments whose outcomes have dierent proba-
bilities. For example, suppose we toss a fair coin twice and record the number of heads
and the number of tails. There are 4 possibilities:
HH HT TH TT.
The chances of two heads are 1 in 4, the chances of two tails are 1 in 4, but the chances
of one of each are 2 in 4.
Suppose we roll two identical fair dice and add together the total on each then the
sample space is {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}. It is not true that each of these events is
equally likely. We record the values on each die as an ordered pair, so (3,1) means 3 on die
A and 1 on die B. There is only one way to get 2, by throwing two 1s i.e. (1,1). However,
there are six ways of getting 7,
(1,6) (2,5) (3,4) (4,3) (5,2) (6,1).
There are 36 possible results, 6 possible values on die A, and 6 on die B. The probability
of scoring 2 with the dice is 1 in 36, and the probability of scoring 7 is 6 in 36, or 1 in 6.
Denition We let [0, 1] = {r R | 0 r 1}. Then if S = {s
1
, s
2
, . . . , s
n
} is a
sample space, a probability function is a function p : S [0, 1] with the property
p(s
1
) +p(s
2
) +. . . +p(s
n
) = 1.
When tossing a coin we have S = {H, T} and p : S [0, 1] is given by
p(H) = 1/2, p(T) = 1/2.
(Notice: p(H) +p(T) = 1.)
For rolling a die we have S = {1, 2, 3, 4, 5, 6} and p : S [0, 1] is given by
p(1) = p(2) = p(3) = p(4) = p(5) = p(6) = 1/6.
(Notice: p(1) +. . . +p(6) = 1.)
When tossing a coin twice we have S = {2H, 2T, 1H1T} and p : S [0, 1] is given by
p(2H) = 1/4, p(2T) = 1/4, p(1H1T) = 1/2.
(Notice: p(2H) +p(2T) +p(1H1T) = 1.)
For rolling two dice we have S = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12} and p : S [0, 1] is given
by
p(2) = 1/36, p(3) = 2/36, p(4) = 3/36, p(5) = 4/36, p(6) = 5/36, p(7) = 6/36
p(8) =5/36, p(9) = 4/36, p(10) = 3/36, p(11) = 2/36, p(12) = 1/36.
(Notice: p(1) +. . . +p(12) = 1.)
8.1.1 Compound events
Sometimes we want to know the probability that an event has a particular property. For
example, if we roll a die what is the probability that the number will be odd? This is the
same as asking for the probability that the event is in the set {1, 3, 5}.
Denition A set of events, a subset of the sample space S, is called a compound event.
The probability of rolling an odd number is the probability of the compound event {1, 3, 5}.
Elementary Probability 79
The probability of a compound event is the sum of the probabilities of the individual
events. For example,
p(odd) = p({1, 3, 5}) = p(1) +p(3) +p(5) = 1/6 + 1/6 + 1/6 = 1/2.
Generally, if {a
1
, a
2
, . . . , a
k
} = A S then
p(A) = p(a
1
) +p(a
2
) +. . . +p(a
k
).
The whole sample space S is a compound event, and p(S) = 1. The empty set is also a
compound event, and p() = 0.
S is the certain event; it is certain that any event which actually occurs will be in S. The
empty set is the impossible event; no event which may occur will be in .
The probability that an event (single or compound) will not occur, is the probability
of the complement event in S.
For example, the probability of not rolling a 3 in one die roll is
p(S\{3}) = p({1, 2, 4, 5, 6}) = 5/6.
The probability of not rolling an odd number is
p(S\{1, 3, 5}) = p({2, 4, 6}) = 1/2.
Theorem 14 (1) If A is a compound event in the probability space S then
p(S\A) = 1 p(A).
(2) If B is also a compound event then
p(A B) = p(A) +p(B) p(A B).
Example The (compound) event that 2 dice give a total score which is multiple of 6 is
A = {6, 12}. The event that 2 dice give a total score which is multiple of 4 is B = {4, 8, 12}.
Then we have
p(A) = 5/36 + 1/36 = 1/6
p(B) = 3/36 + 5/36 + 1/36 = 1/4
p(A B) = p({12}) = 1/36
p(A B) = p({4, 6, 8, 12}) = 7/18.
A B is the event that the score is a multiple of 4 or 6, and A B is the event that
the score is a multiple of 4 and a multiple of 6. We also have
p(A) +p(B) p(A B) = 1/6 + 1/4 1/36 = 7/18.
Thus we have veried Theorem 14 in this case.
Example A pack of 52 playing cards is shued in to random order, and then one card
is selected at random. What is the probability that this card is either a spade, an ace or
a two?
Solution Let A be the set of 13 spades, and let B be the set of 4 aces and 4 twos. The
event that the selected card is either a spade, an ace or a two is the set A B. Thus we
want p(A B). Since A B = {AS, 2S} we can use Theorem 14 to get
p(A B) = p(A) +p(B) p(A B) = 19/52.
p(A) = 13/52, p(B) = 8/52, p(A B) = 2/52.
80 PROBABILITY
8.2 Conditional Probability
(Similar material is in Section 9.1 of Ross & Wright.)
So far we have assumed that we cannot predict anything about the events which may
occur. However, sometimes we have some information which changes the probabilities of
the events.
For example, suppose that someone rolls a die, looks at the result, and tells everyone that
the number is odd. This is the same as saying that the event which has occurred belongs
to B = {1, 3, 5}. Knowing that the number is odd, we know that the probabilities of the
event being 2,4, or 6 are 0. The remaining numbers, 1,3,5 are still equally likely, so we
have
p(1) = p(3) = p(5) = 1/3.
We also have that
p(B) = 1 and p({2, 4, 6}) = p(S\B) = 0.
Denition When we have additional information that an event must be in a set B,
the probability of an event is called the conditional probability given B. The conditional
probability is written p(s|B).
Example Suppose that a coin is tossed twice and you are told that there is at least one
head. What is the probability that there is one head and one tail?
We have S = {2H, 2T, 1H1T}, B = {2H, 1H1T} which corresponds to HH, HT, TH.
We get 1H1T if either HT or TH occurs. So:
Answer p(1H1T | B) = 2/3.
We can also form the conditional probability of compound events. For example, if we
throw two dice, and if we are told that the total score is odd, what is the probability that
the score is prime?
We have B = {3, 5, 7, 9, 11} and we want to know the probability that the event is in
A = {2, 3, 5, 7, 11}. How can we nd p(A|B), the probability of A given B?
If we are told that the event is in B, then B has occurred and we know that p(B) = 1.
We re-normalize the probabilities of the events in S so that p(B) = 1. If s B then map
p(s) p(s)/p(B), if s B then map p(s) 0.
In general, if S = {s
1
, s
2
, . . . , s
n
} and B = {s
1
, s
2
, . . . , s
k
} then
p(B)
p(s
1
)
p(B)
+. . . +
p(s
k
)
p(B)
= 1.
We achieve this by dividing all the probabilities by p(B). So we get
p(s|B) =
p(s)
p(B)
, if s B
p(s|B) = 0, if s B.
For compound events A, an event in A can only occur if it is also in B. So we only consider
events in A B.
Formula for the probability of A given B:
p(A|B) =
p(A B)
p(B)
.
Conditional Probability 81
We now return to the example above. If A = {2, 3, 5, 7, 11} and we are given B =
{3, 5, 7, 9, 11} then A can only happen if the score is 3,5,7, or 11, i.e. if A B occurs. We
have
p(B) = p(3) +p(5) +p(7) +p(9) +p(11) = 18/36, p(A B) = 14/36.
p(A|B) =
p(A B)
p(B)
=
14/36
18/36
=
14
18
=
7
9
.
For certain applications we need to consider several levels of conditional probability. First
we re-arrange the formula for p(A|B) to get
p(A B) = p(B)p(A|B).
Suppose that we have three compound events, A, B, and C, and that we want p(ABC).
We get
p(A B C) = p(A)p(B|A)p(C|A B)
Example Three cards are drawn, without replacement, from a standard pack of playing
cards. What is the probability that all three cards are aces?
Solution Let A be the event that the rst card is an ace, let B be the event that the
second card is an ace, and let C be the event that the third card is an ace. We need
p(A B C).
We have p(A) = 4/52 = 1/13.
If we know that A is true then there are 3 aces, and 51 cards from which to choose the
second card. Thus we have p(B|A) = 3/51. Finally, p(C|A B) = 2/50.
Thus p(A B C) = p(A)p(B|A)p(C|A B) = 3/(13.51.25) = 3/16575.
In general, we have
p(A
1
A
2
. . . A
k
) = p(A
1
)p(A
2
|A
1
)p(A
3
|A
1
A
2
) . . . p(A
k
|A
1
A
2
. . . A
k1
).
8.2.1 Probabilities and partitions
If two sets S
1
, S
2
partition the sample space S then we have S
1
S
2
= S and S
1
S
2
= .
For any compound event A we have
A = (S
1
A) (S
2
A) and (S
1
A) (S
2
A) =
A
S
S
1
S
2
S
1
A S
2
A
Then, using the rule for calculating the probability of unions, we get
p(A) = p(S
1
A) +p(S
2
A).
In general, if S
1
, S
2
, . . . , S
k
is a partition of S, and if A is a compound event, then
p(A) = p(S
1
A) +p(S
2
A) +. . . +p(S
k
A).
For example, let S = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, A = {2, 3, 5, 7, 11}, S
1
{2, 3}, S
2
= {4},
S
3
= {5, 8, 12}, S
4
= {6, 7, 9, 10, 11}. Then S
1
, S
2
, S
3
, S
4
is a partition of S.
82 PROBABILITY
p(A) = 1/36 + 2/36 + 4/36 + 6/36 + 2/36 = 5/12
p(S
1
A) = p({2, 3}) = 1/12
p(S
2
A) = p() = 0
p(S
3
A) = p({5}) = 1/9
p(S
4
A) = p({7, 11}) = 2/9
Then p(S
1
A) +p(S
2
A) +p(S
3
A) +p(S
4
A) = 1/12 + 1/9 + 2/9 = 5/12 = p(A),
verifying the result in this case.
Re-arranging the formula for conditional probability often helps us to nd required prob-
abilities. We now consider another example.
Example Imagine that there is a rare illness which eects 1 in 10000 people. Suppose
that we have a medical scanner which detects the illness in 99.9% of people who have
it, but which also incorrectly reports that 0.1% of non-suers have the illness. If you
have been diagnosed by the scanner as having the illness what is the probability that you
actually do have the illness?
Solution First we establish some notation for the possible events.
I = the event that a person has the illness = the set of people with the illness
H = the event that a person does not have the illness = the set of healthy people
D = the event that the scanner diagnoses the illness in a person.
From the information that we are given we have
p(I) = 1/10000, p(D|I) = 999/1000, p(D|H) = 1/1000.
Since I, H form a partition of the sample space (set of all people) we have that
p(H) = 1 p(I) = 9999/10000.
We want p(I|D).
For any compound events A, B, we can use the formulae for p(A|B) and p(B|A) to deduce
that
p(A|B) =
p(B|A)p(A)
p(B)
.
Furthermore, if A, B partition the sample space we have that for any other compound
event X,
p(X) = p(X A) +p(X B) = p(X|A)p(A) +p(X|B)p(B).
Using these two results, substituting I for A, H for B and D for X, and substituting I
for A and D for B in the previous formula, we get
p(I|D) =
p(D|I)p(I)
p(D|I)p(I) +p(D|H)p(H)
=
(999/1000)(1/10000)
(999/1000)(1/10000) + (1/1000)(9999/10000)
=
999
10998
.
Thus we have that the probability of having the illness given a positive diagnosis by the
scanner is 999 in 10998, which is approximately 9%!
Exercises 83
The method that we used to rewrite the probabilities that we were given is often useful for
calculating conditional probabilities. Writing a conditional probability using a partition,
as we did in the above example, has a special name; Bayes Theorem.
Bayes Theorem If S is a sample space, if S
1
, S
2
, . . . , S
k
is a partition of S and if B
is a compound event, so B S, then
p(S
i
|B) =
p(B|S
i
)p(S
i
)
p(B|S
1
)p(S
1
) +p(B|S
2
)p(S
2
) +. . . +p(B|S
k
)p(S
k
)
,
for all i = 1, 2, . . . , k.
Example Assume that 25% of babies born in this country have mothers that smoked
during pregnancy. Suppose it is known that 60% of the babies born whose mothers smoke
weigh less than 6lbs. Suppose that it is also known that 20% of babies whose mothers
do not smoke weigh less than 6lbs at birth. If a baby is picked at random and found to
have birth weight less than 6lbs, use Bayes Theorem to calculate the probability that the
babys mother smokes.
Solution Let B be the event that a baby weighs less than 6lbs, and let M be the event that
a babys mother smokes. We are given that p(M) = 25/100 and that p(B|M) = 60/100.
We want to nd p(M|B).
Let N be the event that the babys mother does not smoke. We are told that p(B|N) =
20/100.
Then M, N partition the sample space. So we have p(N) = 1 p(M) = 75/100.
Using Bayes Theorem we have
p(M|B) =
p(B|M)p(M)
p(B|M)p(M) +p(B|N)p(N)
=
(60/100)(25/100)
(60/100)(25/100) + (20/100)(75/100)
=
75
150
.
Thus the probability of a low weight baby having a mother who smokes is 50%.
8.2.2 Independent events
When we select two cards from a pack, the selection we make rst aects the probability
of the card which will be selected second. If we select a heart, then the probability of
selecting heart the second time is slightly reduced. This dependence does not always
exist. If we select a card, replace it, shue the cards and select a second card then the
probabilities remain the same. If we selected a heart the rst time, there is still a 1 in 13
probability of selecting a heart the second time.
If we roll a die and get 6, the probability that we will get 6 again the next time we roll
is still 1/6. Furthermore, if we toss a coin 20 times and get all heads, the probability of
getting heads on the 21st toss is still 1/2 (provided that the coin is fair).
We say that two events A, B are independent if p(A B) = p(A)p(B). From the formula
for conditional probabilities we see that this is the same as saying
p(A|B) = p(A).
8.3 Exercises
1. The 13 hearts from a standard pack of cards are shued in to a random order, then
one card is selected. What is the probability that this card is the ace of hearts?
84 PROBABILITY
2. Three dice are rolled and their total score is noted. Write down the sample space S
of this experiment. What is p(3)? What is p({3, 18})? What is p(S\{3, 18})?
3. A die is thrown and you are told that the number is a multiple of 3. What is the
probability that the number is 3?
4. 4 cards are drawn without replacement from a standard pack. What is the probability
that they are a mix of Kings and Queens?
5. Let A, B, C be three independent compound events, with A B C = S. If E S,
write down the probability of A given E in terms of p(E|A), p(E|B), p(E|C), p(A), p(B),
and p(C).
6. A coin is tossed three times. Find the probability that the number of heads is 1 and
the number of tails is 2.
7. Let S = {s
1
, s
2
, s
3
, s
4
, s
5
} be a sample space, and suppose that p : S [0, 1] is given
by
p(s
1
) = 1/9, p(s
3
) = 1/9, p(s
5
) = 1/9
p(s
2
) = 4/9, p(s
4
) = 2/9.
If A = {s
1
, s
2
, s
4
}, B = {s
3
, s
5
}, nd p(A), p(S\A), p(A B).
8. It is known that 50% of the students doing Maths A-level are also doing Physics A-
level. It is also known that 25% of all students doing A-level are taking Maths as one of
their A-levels, and 10% of all students doing A-levels are taking Physics. If an A-level
student is picked at random and found to be taking Physics A-level, use Bayes Theorem
to calculate the probability that student is also taking Maths A-level.
Chapter 9
Statistical Distributions
In this nal section we look at what happens when the same experiment is repeated many
times. The sort of questions we consider are: if an experiment is repeated several times
how often is a certain event likely to occur?
9.1 The binomial distribution
(Similar material is in Section 9.4 of Ross & Wright.)
Suppose that we roll two dice four times and count the number of times that we throw a
double, i.e. that both dice have the same value. In four throws we can get 0, 1, 2, 3, or
4 doubles. What is the likelihood of each possibility?
Let D be the event that we throw a double, and let N be the event that the result is
not a double. There are 36 possible outcomes for each throw, and 6 of these are doubles.
Thus we have that p(D) = 6/36 = 1/6 and p(N) = 1 p(D) = 5/6.
Let S be the sample space which consists of all the possible outcomes of four throws. So S
contains 16 events; the event that all four throws are doubles, D, D, D, D, the event that
the rst three throws are doubles but the last is not D, D, D, N, etc.
The possible events are:
D,D,D,D D,D,D,N D,D,N,D D,N,D,D N,D,D,D D,D,N,N
D,N,D,N N,D,D,N D,N,N,D N,D,N,D N,N,D,D D,N,N,N
N,D,N,N N,N,D,N N,N,N,D N,N,N,N
86 STATISTICAL DISTRIBUTIONS
We can dene a function X : S R by setting
X(s) = the number of doubles in s.
So X(D, D, D, D) = 4, X(N, D, N, D) = 2, and for all s S we have X(s) S
4
{0}.
It is possible that each of the four throws has no doubles, so the result is N, N, N, N. Each
throw is independent, so the probability of this is just
p(N)p(N)p(N)p(N) = (5/6)
4
= 625/1296.
This is the only event that contains no doubles, thus we have that the probability that
X(s) = 0 is 625/1296.
Notation It is usual to write X = 0 as shorthand for {s S | X(s) = 0}, the set events
s for which X(s) = 0. We also write X = k for {s S | X(s) = k} and h < X k for
{s S | h < X(s) k}, etc.
Thus X = 0 is a compound event in S (a set of single events). Thus it makes sense to ask
what is p(X = 0)? This is the question that we have just answered, p(X = 0) = 625/1296.
Of course, we can ask what is p(X = r)? for any r R. We know that p(s) = 0, 1, 2, 3
or 4 so if r S
4
{0}, we have {s S | X(s) = r} = , so X = r is the empty set and
p(X = r) = 0.
To calculate p(X = 1) we need to nd the probability of each event which contains
exactly one double. There are four such events, D, N, N, N; N, D, N, N; N, N, D, N and
N, N, N, D. Since each dice roll is independent of the previous rolls, for each of these
events the probability is the product of three instances of p(N) and one instance of p(D),
i.e p(N)
3
p(D). Thus we get that
p(X = 1) =(5/6)
3
(1/6) + (5/6)
3
(1/6) + (5/6)
3
(1/6) + (5/6)
3
(1/6)
= 4 75/1296 = 75/324.
There are six events which contain two Ds, four events which contain three Ds and one
event which contains four Ds. Using the same method we calculate that
p(X = 2) = 6 (5/6)
2
(1/6)
2
= 25/216
p(X = 3) = 4 (5/6)(1/6)
3
= 5/324
p(X = 4) = (1/6)
4
= 1/1296.
We can generalize this method to calculate probabilities when an experiment is repeated
n times, for any n P. To do this we need to use binomial coecients.
9.1.1 Binomial coecients
Sometimes algorithms require us to look at all subsets of a set. How many of these are
there that contain a specied number of elements?
In a set with n elements there are n subsets which contain one element.
How many sets are there with two elements? Well, there are n choices for the rst
element and then n1 choices for the second element. But {1, 2} and {2, 1} are the same
set, so we have to divide by 2. So the are
n(n 1)/2
subsets with two elements.
The binomial distribution 87
For three elements we have n choices for the rst element, n1 choices for the second
element and n 2 choices for the third element. This time we have six repeats, {1, 2, 3},
{1, 3, 2}, {2, 1, 3}, {2, 3, 1}, {3, 1, 2} and {3, 2, 1} are all the same set. So we have
n(n 1)(n 2)/6
subsets with three elements.
We can begin to see a pattern here, because
n(n 1)(n 2)
6
=
n(n 1)(n 2) . . . 3 2 1
(3 2 1) ((n 3) . . . 3 2 1)
=
n!
3!(n 3)!
and
n(n 1)
2
=
n(n 1) . . . 2 1
(2 1) ((n 2) . . . 3 2 1)
=
n!
2!(n 2)!
If we list the subsets with three elements in a set {a, b, c, d, e} of size ve we get
{a, b, c}, {a, b, d}, {a, b, e}, {a, c, d}, {a, c, d},
{a, c, e}, {b, c, d}, {b, c, e}, {b, d, e}, {c, d, e}
and there are
10 =
5!
3!(5 3)!
of these. The following result can be proved by induction.
Theorem 15 A set S which has n elements (where n P) has
n!
r!(n r)!
subsets con-
taining r elements, for 1 r n.
The value
n!
r!(n r)!
is used in many aspects of mathematics and computer science. It is
called the binomial coecient and it is often written
_
n
r
_
9.1.2 Binomial coecients in probability
In the rolling dice example above, the sample space S is the set of all 4-tuples (events)
whose entries are either D or N. To calculate p(X = 2) we needed to know how many
events had exactly two Ds. We found this out by writing out all the possibilities. In fact
we can use the binomial coecients to nd the number of events which contain k Ds.
If T = {a, b} is a set with two elements then T
n
is the set of all n-tuples whose entries
are either a or b. Suppose that u = (a
1
, a
2
, . . . , a
n
) T
n
. Let B = {i | a
i
= b}, so B is
the set of indexes in u which have the value b. So u has k bs if and only if |B| = k, i.e. if
and only if B has k elements. Of course, B is a subset of the set {1, 2, . . . , n} which has
k elements. Every subset which has k elements corresponds to a n-tuple with exactly k
bs. Thus the number of n-tuples with exactly k bs is the same as the number of subsets
of {1, 2, . . . , n} which have size k. This is the binomial coecient. So we have:
Lemma 8 Let S be the set of all n-tuples whose entries are either a or b. Then the
number of n-tuples which have exactly k bs, 0 k n, is the binomial coecient
_
n
k
_
=
n!
k!(n k)!
.
88 STATISTICAL DISTRIBUTIONS
Example Two dice are rolled n times, what is the probability that exactly k doubles
are rolled in this process?
Solution Let S be the sample space, and let X : S R be the function which gives the
number of doubles in each possible event. We are required to nd p(X = k).
Let D be the event that a double is rolled, and let N be the event that a double is not
rolled. So p(D) = 1/6 and p(N) = 5/6.
An event in the trial is a sequence of Ds and Ns of length n, i.e. an n-tuple of Ds and
Ns. If k doubles are rolled the sequence must contain k Ds and (n k) Ns. Since each
dice roll is independent from the others, the probability of such a sequence is
(1/6)
k
(5/6)
nk
.
From Lemma 8 we see that there
_
n
k
_
such sequences, thus we get that
p(X = k) =
_
n
k
_
5
nk
6
n
.
The above reasoning doesnt really have anything to do with rolling dice.
Denitions Any experiment which is repeated several times is called a trial. If the
experiment has only two possible outcomes s and f say, then it is a binomial trail. One
of the outcomes s, is called a success, (in the dice rolling example, D is a success) and the
other outcome f is called a failure, (N is the failure).
We must be told (or be able to calculate) the probability function for a single exper-
iment. So we know p(s). Since there are only two possible outcomes (the experiment is
binomial) we have that p(f) = 1 p(s).
We let S be the sample space of the trial, so S contains all the n-tuples whose entries
are s or f. Let X : S R be the function which counts the number of successes in an
event. Then the probability that an event has k successes (k entries s) is the number of
events with k successes multiplied by the probability of such an event. The probability of
k successes and n k failures is p(s)
k
p(f)
nk
= p(s)
k
(1 p(s))
nk
, so
p(X = k) =
_
n
k
_
p(s)
k
(1 p(s))
nk
.
Denition Suppose that a process which has two possible outcomes is repeated n times.
If a is one of the possible outcomes then the number, k, of as which occur in n tests is
said to have the binomial distribution. If a has probability p [0, 1] then the probability
of k occurrences of a is given by
_
n
k
_
p
k
(1 p)
nk
.
The number of doubles obtained when rolling two dice a xed number of times has the
binomial distribution.
If two dice are rolled 10 times then the possible numbers of doubles are 0,1,2,3,4,5,6,7,8,9,10.
Let X(s) be the number of doubles in the event s. Using the formula for the binomial
distribution we get that
Random variables 89
p(X = 0) =
_
10
0
_
5
6
10
0.162 p(X = 1) =
_
10
1
_
5
6
9
1
6
0.323
p(X = 2) =
_
10
1
_
5
6
8
1
6
2
0.291 p(X = 3) =
_
10
3
_
5
6
7
1
6
3
0.155
p(X = 4) =
_
10
4
_
5
6
6
1
6
4
0.054 p(X = 5) =
_
10
2
_
5
6
5
1
6
5
0.013
We can give a graphical representation of this distribution
0 4 5 6 7 8 9 10 3 2 1
Example A fair coin is tossed 8 times. Find the probability that the outcome contains
at least 6 heads.
Solution Let S be the sample space of the trial, so S is the set of all 8-tuples containing
Hs and Ts. We count an H as a success, and p(H) = 1/2.
Let X : S R be the function which counts the number of heads. We want to nd
p(X 6). Thus we need to nd
p(X = 6) +p(X = 7) +p(X = 8).
There are only two possible outcomes of each experiment in the trial, the number of
Hs has the binomial distribution. Hence
p(X = k) =
_
n
k
_
p
k
(1 p)
nk
.
In this case we have p = 1/2 and n = 8. Thus we have
p(X 6)) =
_
8
6
_
1
2
6
(1
1
2
)
2
+
_
8
7
_
1
2
7
(1
1
2
) +
_
8
8
_
1
2
8
= 28
1
2
8
+ 8
1
2
8
+
1
2
8
=
37
256
.
9.2 Random variables
(Similar material is in Section 4.5 of Rosen.)
When we discussed the binomial distribution, we considered a function X : S R. We
wrote X = k for {s S | X(s) = k}. We think of X as a variable which can take values
in R. If X is associated with a binomial distribution of outcomes with probability p in n
repeated experiments, then the probability that X = k is
p(X = k) =
_
_
n
k
_
p
k
(1 p)
nk
for 0 k n,
0 for all other values of k R.
90 STATISTICAL DISTRIBUTIONS
Denition Let S be a sample space. Any function X : S R is a random variable.
The probability that the random variable has a value r is
p(X = r) = p({s S | X(s) = r}).
If s
1
, s
2
, . . . , s
k
are exactly the events for which X(s) = r, then p(X = r) = p(s
1
) +
p(s
2
) +. . . +p(s
k
).
Example Let S be the sample space of rolling a fair die once, so S = {1, 2, 3, 4, 5, 6}.
The function X : S R given by X(s) = s is a random variable on S. It takes each of
the values 1,2,3,4,5 and 6 with probability 1/6, and any other number with probability 0.
So X is a random variable which is equally likely to have any value between 1 and 6.
Example Let S be the sample space of rolling two fair dice once, so S = {2, 3, 4, . . . , 12}.
The function X : S R given by X(s) =total score is a random variable on S. It takes
the value 2 with probability 1/36, the value 3 with probability 2/36, etc. So X is a random
variable which takes values between 2 and 12, and which is more likely to take values in
the middle of this range.
Example Let S be the sample space of tossing a fair coin twice. So S = {HH, HT, TH, TT}.
The function X : S R given by X(s) =the number of Hs is a random variable on S.
It takes the value 2 with probability 1/4, the value 0 with probability 1/4 and the value
1 with probability 1/2. So X is a random variable which takes values 0,1,2, and which is
equally likely to take the value 0 or 2, but twice as likely to take the value 1.
The probability that a random variable takes a given value is always a number between 0
and 1, and the total sum of the probabilities of all values is 1 (the probability that X will
have some value). Thus we have a function f
X
: R [0, 1] given by
f
X
(r) = p(X = r).
Denition The sum of all of the f
X
(r) is 1, so f
X
is a probability function on R (see
Section 8.1). It is called the probability distribution function for X.
Example If X counts the number of heads in n coin tosses then f
X
(r) = p(X = r), so
f
X
(r) =
_
_
n
k
_
p
k
(1 p)
nk
if r = k for some for 0 k n,
0 for all other values of r R.
In this case f
X
is called the binomial distribution function.
If we roll a die once and record the value thrown then the sample space is {1, 2, 3, 4, 5, 6}.
We have X : S R and for each s S, p(X = s) = 1/6. Thus
f
X
(r) =
_
1/6, if r = 1, 2, 3, 4, 5, or 6,
0, otherwise.
A random variable has uniform probability distribution if the probability of all its (non-
zero) values are the same.
If X is the random variable given by the die roll experiment, then X has uniform proba-
bility distribution. We can give a graphical representation of f
X
:
Expected values 91
0 4 5 6 7 8 9 10 3 2 1
Often we want to know the probability that a random number has a value below a given
bound. In other words, we want to know p(X k). The function F
X
: R R given by
F
X
(r) = p(X r)
is also a probability function on R, it is called the cumulative distribution function for X.
9.3 Expected values
(Similar material is in Section 9.3 of Ross & Wright.)
Consider the following game. Two people take it in turns to throw a die, and each person
moves forward the number of meters equal to the value that they have thrown. The winner
is the rst person to get to the hundred meter line.
How long (how many die throws) do we expect the game to last?
Of course it is possible (but unlikely) that both players will always throw 1s, thus the
game will take 199 throws. It is also possible that the rst player will always throw a 6,
in which case the game will last for 33 throws. But how long do we expect the game to
last on average?
We know that the probability of throwing each of the possible numbers is equally likely.
Then the average value of a throw is the sum of each possible score divided by its proba-
bility. So the average score is
1/6 + 2/6 + 3/6 + 4/6 + 5/6 + 6/6 = 21/6 = 3.5.
Thus on average we expect to move 3.5 meters per throw (of course we wont actually
move 3.5 meters on any one throw!). Since 3.5 28 = 98 and 3.5 29 = 101.5 we expect
the game to last 58 throws, on average.
We often need to estimate the average outcome of an experiment. This average may not
be an actual outcome, as in the game described above, but it gives a way of calculating
the expected result of repeating an experiment a large number of times.
Denition Suppose that S is a sample space and that X : S R is a random variable
which takes values x
1
, x
2
, . . . , x
k
. Then the mean value, E(X), of X is given by
E(X) = x
1
p(X = x
1
) +x
2
p(X = x
2
) +. . . +x
n
p(X = x
n
)
= x
1
f
X
(x
1
) +x
2
f
X
(x
2
) +. . . +x
n
f
X
(x
n
).
Example
If X is the random variable given by rolling a die once then X has a uniform probability
distribution, with f
X
(k) = 1/6, for 1 k 6. Thus we get that
E(X) = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 3.5.
92 STATISTICAL DISTRIBUTIONS
Example
If X is any random which takes values x
1
, . . . , x
n
and has a uniform probability distribu-
tion, then f
X
(x
k
) = 1/n, for 1 k n. Thus we get that
E(X) =
x
1
+x
2
+. . . +x
n
n
.
Example
If X is the random variable given by tossing a coin 4 times and counting the numbers
of heads then X has a binomial probability distribution, with f
X
(k) =
_
4
k
_
1
2
4
, for
0 k 4. Thus we get that
E(X) = 0
_
4
0
_
1
2
4
+ 1
_
4
1
_
1
2
4
+ 2
_
4
2
_
1
2
4
+ 3
_
4
3
_
1
2
4
+ 4
_
4
4
_
1
2
4
=
4!
1!3!
1
2
4
+
2.4!
2!2!
1
2
4
+
3.4!
3!1!
1
2
4
+
4.4!
0!4!
1
2
4
=
4 + 12 + 12 + 4
16
= 2 = 4 (1/2).
Theorem 16 If X is a random variable which has a binomial distribution which arises
from n repeats of an experiment whose outcome has probability p then
E(X) = np.
(This theorem can be proved by induction.)
9.4 Exercises
1. Two dice are thrown 6 times and X : S R counts the number of doubles, so
X(s) = 0, 1, 2, 3, 4, 5, 6. Write down the formula for p(X = k), the probability that the
random variable X has value k. Hence nd p(X = 3).
2. A pyramid die has 4 equal faces each of which is a triangle. The faces have the numbers
2,4,6 and 8 on them. Suppose that X is the random variable given by rolling this die once
and recording the score. Calculate E(X).
3. An experiment has outcomes a and b, and p(a) =
1
3
. The experiment is repeated 8
times and X(s) = number of occurrences of a in s. Find f
X
(r) for all r R, and plot a
graph of f
X
.
4. Calculate E(X) for the random variable X in Qu.3.
5. Let X : S R have values 0,1,2,3,4,5,7. Suppose that X has binomial distribution
with associated probability p = 1/4. Use the formula for f
X
to nd E(X). Hence verify
that E(X) = 7p, as required by Theorem 16.
Chapter 10
Some Properties Of Numbers
Summarized below are some properties of numbers which you need to know. (You are not
expected to be able to prove these properties, but you must know them and be able to
use them.)
1. A number n is prime if
(a) n N and n 2, and
(b) If m P and m divides n then m = n or m = 1.
The set of primes is innite. The rst few primes are
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, . . .
2. Every integer n Z can be written as a product of primes
n = p
1
p
2
p
3
. . . p
k
or n = p
1
p
2
p
3
. . . p
k
Where each p
i
is a prime.
3. If p is a prime and p divides nm, where n and m are integers, then either p divides
n or p divides m.
4. For any natural number r N, every element of Z can be written in exactly one of
the following forms:
kr, kr + 1, kr + 2, . . . , kr + (r 1),
where k Z.
5. If p is a prime and p divides n
2
then p
2
divides n
2
. (This follows from 3.)
6. If k, h, r Z and if r divides h and (h +k), then r divides k.
7. The logarithm to base a of x R, when x > 0, is the (real) number r such that
a
r
= x (such a number always exists). We write log
a
(x) = r.
In particular, log
a
(1) = 0, log
a
(a) = 1, and a
loga(x)
= x.
If b > a > 1 then the number r such that a
r
= s will need to be bigger than the
number l such that b
l
= s. So we see that if b > a then log
b
(x) < log
a
(x), for all
x > 1.
For all a, x, y > 0 we have log
a
(xy) = log
a
(x) +log
a
(y) and log
a
(x
y
) = ylog
a
(x).
For all a, b, x > 0 we have log
b
(x) = log
b
(a)log
a
(x).

You might also like