
MATH1179

Mathematics for
Computer Science
Part 2

Notes 2

Dr Yvonne Fryer
September 2021

This handbook contains the basic material you will need for the first half of the module. Worked
examples and further explanations will be given in the lectures and tutorials, so it is very important
that you attend lectures and tutorials and bring these notes with you. Only the most basic information
about each topic is contained in this handbook. Accompanying these notes will also be a workbook
with corresponding questions on each topic, to be completed during the tutorials.

What is this course about?


We will look at some of the fundamental ideas of mathematics that are useful and applied in a
computing setting. You will gain far more from this course if you engage fully, work hard, contribute,
and have fun!

Course topics
A variety of topics will be introduced. Some may be familiar to you, others possibly less so. You will
be encouraged to read around and find out more about the topics.
Topics will include sets, number theory, linear algebra, calculus, graph theory… and more!

How will the course run?


There is a two-hour lecture and a two-hour tutorial session each week for the module.

Tutorials
The tutorials give you the opportunity to really make use of the tutors’ help and guidance. The tutors
will be monitoring your completion of the tutorial exercises in the workbook.
There will be tutorial exercises accompanying every week of lectures. You must try to complete these
exercises; the most important reason is to develop your understanding. If you cannot do an exercise
then ask for help!
Working together on tutorial exercises is encouraged and strongly recommended – you can learn from
each other!
You should work through a selection of exercises and ask for help on ones you don’t understand how
to do.
Solutions will appear on Moodle a little while after the tutorial, after you have had time to try all the
exercises for yourself.

Attendance
It is expected that you attend all lectures and tutorials. Records of attendance are kept. There is always
a very strong correlation between attendance and marks.
If you cannot make a class, make sure you report your absence online, and let your personal tutor know
if there is a long-term problem.

Assessment
This course is assessed by two tests each worth 50% of the marks available for the module. There
will be formative assessment prior to the summative tests. This formative assessment will be used as
a practice for you. More information will be given on this in the lectures.

Books
We do not “lecture from a book” and are therefore not expecting you to go out and buy a book.
There are some books and sites listed that may be useful for you, but it is not compulsory to buy
anything.
If you find any really good books or sites, let me know, so that I can recommend them to other
students!
www.mathcentre.ac.uk has useful resources and practice material, often referred to on the course.

Finally …
Finally, we want you to enjoy this course! We want to get you thinking in different ways.
Put the work in and contribute to lectures, and you will learn more and hopefully enjoy it while you
are doing it!
And ask me anything you want to ask!


Contents

1: Vectors and Matrices
    1.1 Vectors
    1.2 Matrices
    1.3 Matrices and vectors for 2D transformations
2: Graph Theory and Networks
3: Algorithms
    Performance Characteristics of Algorithms
    Orders of magnitude
    Computability and the Turing Machine
4: Differential and Integral Calculus
    4.1 Differentiation
    4.2 Integration
5: Numerical Methods – not for 2021/22
    5.1 Root finding
        Bisection method
        Newton-Raphson Method
    5.2 Numerical differentiation
        Finite Difference Approximations of the First Derivative
    5.3 Numerical integration
6: Data and Statistics
    Data and Information
    Parameters and Statistics
    Graphical Representation of data
        Types of Data
        Frequency Distributions
        Using a Spreadsheet
        Probability Distributions
7: Probability
        Introduction to Probability
        Definitions
        Probability of an Event


1: Vectors and Matrices

1.1 Vectors
The word vector is used for a type of quantity that has size as well as direction. A vector quantity can
be depicted by an ‘arrow’ with the length of the shaft representing the size and the arrowhead its
direction.

Example
A ship is travelling west for 6 km.

The example shows a free vector as it describes the journey of the ship but does not give a starting
point. If we want to tie the vector to a particular place this is called a position vector. This also
enables us to combine vectors to build up a resultant ‘journey’.

Example
A ship is travelling west for 6 km from Liverpool and then 3 km north.

In our examples so far the direction has been described using the compass points and they have been
in 2 dimensions (assuming the ship doesn’t sink). For more complex vectors we will not be able to
describe them so easily.

If we use our 2D coordinate system we can describe each vector in terms of its distance and direction
from the origin O at (0,0), and we can also use other formats to describe the vector.

Example
Here we can see two position vectors p and q. They could also be described as the vectors going from
the origin, O, to the points P(3,4) and Q(5,3) respectively or as vectors OP and OQ.

[Figure: the position vectors p = OP and q = OQ drawn from the origin O to P(3,4) and Q(5,3).]

Another way of describing them is by giving their direction and length but in these cases those
numbers are not going to be straightforward.
OP is length 5 and the angle is 53° with the horizontal.
OQ is length 5.83 and the angle is 31° with the horizontal.

A third way is to describe in terms of x and y the ‘journey’ from the origin that each arrow makes. In
the case of OP this is 3 in the x direction and then 4 in the y direction. We use a column vector to
describe this so that
\[ OP = \begin{pmatrix} 3 \\ 4 \end{pmatrix} \quad \text{and} \quad OQ = \begin{pmatrix} 5 \\ 3 \end{pmatrix}. \]
We write the single letters for vectors here in lower case and bold. In books you will also see them
written $\overrightarrow{OP}$ or $p$.
Sometimes it is helpful to describe vectors in terms of the single unit vectors in the directions of x
and y. Traditionally these are called i and j, so
\[ \mathbf{i} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad \text{and} \quad \mathbf{j} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}. \]
[Figure: p and q on the grid, with the unit vectors i and j drawn along the x and y axes.]

Now we can also express OP as 3i + 4j and OQ as 5i + 3j.

In 3D we use k to be the unit vector in the z direction, coming out of the page.

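Negative numbers are used to show the reverse direction so the vectors r and s are given as

r = -2i - 1j and s = 2i - 3j

or
\[ \mathbf{r} = \begin{pmatrix} -2 \\ -1 \end{pmatrix} \quad \text{and} \quad \mathbf{s} = \begin{pmatrix} 2 \\ -3 \end{pmatrix}. \]

[Figure: the vectors r and s drawn on the grid.]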

Addition and subtraction of vectors
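Below are illustrated the results of the addition and subtraction of two vectors, t and u.
As
\[ \mathbf{t} = \begin{pmatrix} 3 \\ 3 \end{pmatrix} \text{ and } \mathbf{u} = \begin{pmatrix} 2 \\ 1 \end{pmatrix} \text{ then } \mathbf{t} + \mathbf{u} = \begin{pmatrix} 5 \\ 4 \end{pmatrix}. \]
Similarly, as
\[ \mathbf{t} = \begin{pmatrix} 3 \\ 3 \end{pmatrix} \text{ and } -\mathbf{u} = \begin{pmatrix} -2 \\ -1 \end{pmatrix} \text{ then } \mathbf{t} - \mathbf{u} = \begin{pmatrix} 1 \\ 2 \end{pmatrix}. \]

[Figure: two grids, one showing t, u and t+u, the other showing t, -u and t-u.]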
We can see that to add and subtract vectors you simply deal with the i and j components separately.

Scaling a vector
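We can scale a vector by multiplying it by a scalar (an ordinary number). Here we can see
\[ \mathbf{p} = \begin{pmatrix} 1 \\ 3 \end{pmatrix} \text{ and } \mathbf{q} = \begin{pmatrix} 2 \\ 1 \end{pmatrix} \]
\[ \tfrac{1}{2}\mathbf{p} = \begin{pmatrix} 0.5 \\ 1.5 \end{pmatrix} \text{ and } -\mathbf{p} = \begin{pmatrix} -1 \\ -3 \end{pmatrix} \]
\[ 2\mathbf{p} = \begin{pmatrix} 2 \\ 6 \end{pmatrix} \text{ and } 3\mathbf{q} = \begin{pmatrix} 6 \\ 3 \end{pmatrix} \]
so we multiply each component of the vector by the scalar.

[Figure: the vectors p, ½p, 2p, -p, q and 3q drawn on the grid.]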
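
The component-by-component rules for adding, subtracting and scaling vectors can be sketched in Python. This is only an illustration: vectors are stored as tuples, and the function names add, subtract and scale are chosen for this sketch rather than taken from the notes.

```python
def add(a, b):
    """Add two vectors component by component."""
    return tuple(x + y for x, y in zip(a, b))


def subtract(a, b):
    """Subtract vector b from vector a component by component."""
    return tuple(x - y for x, y in zip(a, b))


def scale(k, a):
    """Multiply every component of vector a by the scalar k."""
    return tuple(k * x for x in a)


t, u = (3, 3), (2, 1)
p, q = (1, 3), (2, 1)

print(add(t, u))       # (5, 4)
print(subtract(t, u))  # (1, 2)
print(scale(0.5, p))   # (0.5, 1.5)
print(scale(3, q))     # (6, 3)
```

The same three functions work unchanged for 3D vectors, because they never assume how many components a vector has.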

Length and Direction of a vector


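Because we are describing our vectors within a rectangular grid of i’s and j’s it is not difficult to
calculate the size of the vector and its angle.
The size of a vector, r, is known as its modulus and written |r|. Using Pythagoras we find here
\[ |\mathbf{r}| = \sqrt{3^2 + 2^2} = \sqrt{13} \]

[Figure: the vectors r = 3i + 2j and s = 2i - 3j on the grid, with r drawn as the hypotenuse of a right-angled triangle of base 3 and height 2, and θ the angle between r and the horizontal.]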

For the direction we need the angle θ that r makes with the horizontal and so must use trigonometry.
$\tan\theta = 2/3$ so $\theta = \tan^{-1}(2/3) = 33.7^\circ$
Similarly with s, $|\mathbf{s}| = \sqrt{2^2 + (-3)^2} = \sqrt{13}$ and $\tan\theta = -3/2$ so $\theta = \tan^{-1}(-1.5) = -56.3^\circ$
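
As a rough check of these calculations, the modulus and direction can be computed with Python's standard math module. The function names are chosen for this sketch. Note one deliberate difference from the hand method: math.atan2 is used instead of the inverse tangent of y/x, because it also gives the correct angle for vectors pointing to the left (negative x component).

```python
import math


def modulus(v):
    """Length of a 2D vector (x, y) by Pythagoras."""
    return math.hypot(v[0], v[1])


def direction_degrees(v):
    """Angle with the positive x axis, in degrees, anticlockwise positive."""
    return math.degrees(math.atan2(v[1], v[0]))


r = (3, 2)
s = (2, -3)

print(round(modulus(r), 3), round(direction_degrees(r), 1))  # 3.606 33.7
print(round(modulus(s), 3), round(direction_degrees(s), 1))  # 3.606 -56.3
```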

1.2 Matrices
Matrices are a very important concept related to arrays in programming languages. We can use them
for such diverse purposes as describing adjacency in graph theory, implementing transformations in
geometry, coding and ciphers and solving simultaneous equations. In their simplest form they are
rectangular arrays of numbers arranged in rows and columns.
\[ A = \begin{pmatrix} 24 & 13 & 12 \\ -3 & 45 & 26 \\ 2 & -17 & 67 \end{pmatrix} \quad B = \begin{pmatrix} 2.76 & 8.34 \\ 5.89 & 0.38 \end{pmatrix} \]
\[ C = \begin{pmatrix} 2 & 4 \\ 5 & -5 \\ 6 & 8 \\ 0 & 4 \\ -2 & 7 \end{pmatrix} \quad D = \begin{pmatrix} 1 & 3 & 2 & 3 \\ 2 & 4 & 9 & 8 \\ 5 & 3 & 1 & 1 \\ 9 & 8 & 2 & 9 \end{pmatrix} \]
A matrix is said to have dimension m × n (m by n) where there are m rows and n columns so above
the dimensions of the matrices are
A is a 3×3 matrix, B is a 2×2 matrix, C is a 5×2 matrix, D is a 4×4 matrix

Two matrices are equal if they have the same size and corresponding elements are equal.

An n × n matrix is called a square matrix, so A, B and D are square matrices but C is not.

An n × 1 matrix is called a column vector; a 1 × n matrix is called a row vector.

\[ \text{a column vector } \begin{pmatrix} 2 \\ 5 \\ 4 \end{pmatrix} \qquad \text{a row vector } \begin{pmatrix} -2 & 6 & 10 & -8 \end{pmatrix} \]

Addition and Subtraction of Matrices


To add or subtract matrices they must have the same size and results are obtained by working through
the matrices element by element adding or subtracting corresponding elements.

Scalar Multiplication
This means you multiply each entry by a number.

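Example
\[ A = \begin{pmatrix} 45 & 33 & 24 \\ 31 & 27 & 42 \end{pmatrix} \quad B = \begin{pmatrix} 30 & 30 & 25 \\ 28 & 43 & 20 \end{pmatrix} \]

\[ S = A + B = \begin{pmatrix} 45 & 33 & 24 \\ 31 & 27 & 42 \end{pmatrix} + \begin{pmatrix} 30 & 30 & 25 \\ 28 & 43 & 20 \end{pmatrix} = \begin{pmatrix} 75 & 63 & 49 \\ 59 & 70 & 62 \end{pmatrix} \]

\[ 3A = A + A + A = 3 \times A = 3 \begin{pmatrix} 45 & 33 & 24 \\ 31 & 27 & 42 \end{pmatrix} = \begin{pmatrix} 135 & 99 & 72 \\ 93 & 81 & 126 \end{pmatrix} \]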

Matrix multiplication
This is the most complex procedure and is best illustrated by an example.

We calculate the total 19 by multiplying the first row with the first column and adding all the
results together:
\[ \begin{pmatrix} 6 & 1 & 4 \\ 5 & 4 & 2 \end{pmatrix} \begin{pmatrix} 2 \\ 3 \\ 1 \end{pmatrix} = \begin{pmatrix} 19 \\ 24 \end{pmatrix}, \qquad (6 \times 2) + (1 \times 3) + (4 \times 1) = 19 \]
We calculate the total 24 by multiplying the second row with the first column and adding all the
results together:
\[ (5 \times 2) + (4 \times 3) + (2 \times 1) = 24 \]
If there is a second column then the answers go to the side of the first so that
\[ \begin{pmatrix} 6 & 1 & 4 \\ 5 & 4 & 2 \end{pmatrix} \begin{pmatrix} 2 & 4 \\ 3 & 1 \\ 1 & 5 \end{pmatrix} = \begin{pmatrix} 19 & 45 \\ 24 & 34 \end{pmatrix} \]
Remember: Matrix multiplication is one row × one column = one answer
Note: Matrix multiplication is not generally commutative.
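
The "one row × one column = one answer" rule can be sketched in Python. This is a minimal illustration, assuming matrices are stored as lists of rows; the function name matmul is chosen for this sketch. The last two calls use the pair A, B from the commutativity example in this section.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    if len(X[0]) != len(Y):
        raise ValueError("columns of X must equal rows of Y")
    # Entry (i, j) is row i of X multiplied with column j of Y, then summed.
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


M = [[6, 1, 4], [5, 4, 2]]
N = [[2, 4], [3, 1], [1, 5]]
print(matmul(M, N))  # [[19, 45], [24, 34]]

A = [[1, 2], [-3, 4]]
B = [[2, -1], [0, 5]]
print(matmul(A, B))  # [[2, 9], [-6, 23]]
print(matmul(B, A))  # [[5, 0], [-15, 20]]
```

The two different results for AB and BA show the lack of commutativity directly.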

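Example
If
\[ A = \begin{pmatrix} 1 & 2 \\ -3 & 4 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 2 & -1 \\ 0 & 5 \end{pmatrix} \]
then
\[ AB = \begin{pmatrix} 2 & 9 \\ -6 & 23 \end{pmatrix} \quad \text{but} \quad BA = \begin{pmatrix} 5 & 0 \\ -15 & 20 \end{pmatrix} \]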

Applications of Matrices and Vectors


There are many applications for matrices as they are a very powerful way to express information and
store it in a computer. For the first application of matrices we need to understand the concept of the
inverse.

Matrix ‘division’
You cannot divide by a matrix; you can only invert it and multiply (like we did with fractions).
Reminder: $\frac{5}{8} \div \frac{2}{3} = \frac{5}{8} \times \frac{3}{2} = \frac{15}{16}$. Here we invert the second fraction.
We invert the fraction because the inverted fraction is the multiplicative inverse of the original,
i.e. $\frac{2}{3} \times \frac{3}{2} = \frac{6}{6} = \frac{1}{1} = 1$; together they make 1, the identity for multiplication.
We can do the same with some matrices.

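Example
If $A = \begin{pmatrix} 2 & 1 \\ 5 & 3 \end{pmatrix}$ then its multiplicative inverse is $A^{-1} = \begin{pmatrix} 3 & -1 \\ -5 & 2 \end{pmatrix}$. This is because
\[ A \times A^{-1} = A^{-1} \times A: \quad \begin{pmatrix} 2 & 1 \\ 5 & 3 \end{pmatrix} \begin{pmatrix} 3 & -1 \\ -5 & 2 \end{pmatrix} = \begin{pmatrix} 3 & -1 \\ -5 & 2 \end{pmatrix} \begin{pmatrix} 2 & 1 \\ 5 & 3 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \]
($\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ is the equivalent of 1 for 2x2 matrices and the inverse must work from left and right)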
We can use this to solve simultaneous equations

Simultaneous equations
We have learnt in semester one to express information with equations. Sometimes this results in 2
equations with 2 variables.

Example (traditional method)


Solve the following simultaneous equations
2x + y = 11            (EQU1)
5x + 3y = 26           (EQU2)

y = 11 - 2x            (subtract 2x from both sides of EQU1)
5x + 3(11 - 2x) = 26   (substitute for y in EQU2)
5x + 33 - 6x = 26      (distributive law)
-x = -7                (subtract 33 from both sides)
x = 7                  (multiply both sides by -1)
14 + y = 11            (substitute for x in EQU1)
y = -3                 (subtract 14 from both sides)
so x = 7 and y = -3

Example (matrix method)


2x + y = 11
5x + 3y = 26
These equations can be rewritten using matrices as
\[ \begin{pmatrix} 2 & 1 \\ 5 & 3 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 11 \\ 26 \end{pmatrix} \]
This would lead to a solution if we could find some way of removing the matrix on the left hand side.
By using the inverse we have just found we can.
\[ \begin{pmatrix} 3 & -1 \\ -5 & 2 \end{pmatrix} \begin{pmatrix} 2 & 1 \\ 5 & 3 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 3 & -1 \\ -5 & 2 \end{pmatrix} \begin{pmatrix} 11 \\ 26 \end{pmatrix} \]
(pre-multiply both sides by the inverse)

NOTE: because matrix multiplication is NOT commutative you must keep the multiplications both
on the left of the expressions.
\[ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 33 - 26 \\ -55 + 52 \end{pmatrix} \]
(multiplying out both sides using matrix multiplication)
\[ \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 7 \\ -3 \end{pmatrix} \]
so we can read off the answers x = 7, y = -3

Inverse of a matrix
We solved our equation because we were using a matrix where we had already been given its inverse.

Only square matrices can have inverses and not all of them do. In general to find the inverse of a
large matrix by hand would be out of the scope of a course like this and a package such as MATLAB
would be used.

However for certain 2x2 matrices it is possible to calculate the inverse by hand using this formula:
Let $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, then
\[ A^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} \quad (\text{only if } ad - bc \neq 0) \]
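
The 2x2 inverse formula, and its use for solving a pair of simultaneous equations, can be sketched as follows. This is an illustration only: the function names are chosen for this sketch, the matrix entries are assumed to be whole numbers, and Python's Fraction type is used so that the factor 1/(ad - bc) is kept exact. The test values are the 2x2 systems used as examples in this section.

```python
from fractions import Fraction


def inverse_2x2(m):
    """Inverse of [[a, b], [c, d]] using the 1/(ad - bc) formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("ad - bc = 0, so the matrix has no inverse")
    k = Fraction(1, det)
    return [[k * d, -k * b], [-k * c, k * a]]


def solve_2x2(m, rhs):
    """Solve m (x, y) = rhs by pre-multiplying rhs by the inverse of m."""
    inv = inverse_2x2(m)
    return [inv[0][0] * rhs[0] + inv[0][1] * rhs[1],
            inv[1][0] * rhs[0] + inv[1][1] * rhs[1]]


x, y = solve_2x2([[2, 1], [5, 3]], [11, 26])
print(x, y)  # 7 -3

p, q = solve_2x2([[4, 5], [2, 3]], [22, 8])
print(p, q)  # 13 -6
```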

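Example
If $C = \begin{pmatrix} 4 & 5 \\ 2 & 3 \end{pmatrix}$ then a = 4, b = 5, c = 2, d = 3, so substituting them in the formula we get
\[ C^{-1} = \frac{1}{(4 \times 3) - (2 \times 5)} \begin{pmatrix} 3 & -5 \\ -2 & 4 \end{pmatrix} = \frac{1}{2} \begin{pmatrix} 3 & -5 \\ -2 & 4 \end{pmatrix} \]

We can check this by calculating $C^{-1}C$:
\[ \frac{1}{2} \begin{pmatrix} 3 & -5 \\ -2 & 4 \end{pmatrix} \begin{pmatrix} 4 & 5 \\ 2 & 3 \end{pmatrix} = \frac{1}{2} \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \]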

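Example
We can use our previous calculation in the simultaneous equations 4p + 5q = 22, 2p + 3q = 8.
This becomes
\[ \begin{pmatrix} 4 & 5 \\ 2 & 3 \end{pmatrix} \begin{pmatrix} p \\ q \end{pmatrix} = \begin{pmatrix} 22 \\ 8 \end{pmatrix} \]
so
\[ \begin{pmatrix} p \\ q \end{pmatrix} = \frac{1}{2} \begin{pmatrix} 3 & -5 \\ -2 & 4 \end{pmatrix} \begin{pmatrix} 22 \\ 8 \end{pmatrix} \]
(multiplying both sides by the inverse)
\[ \begin{pmatrix} p \\ q \end{pmatrix} = \frac{1}{2} \begin{pmatrix} 66 - 40 \\ -44 + 32 \end{pmatrix} = \frac{1}{2} \begin{pmatrix} 26 \\ -12 \end{pmatrix} = \begin{pmatrix} 13 \\ -6 \end{pmatrix} \]
so that p = 13 and q = -6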


1.3 Matrices and vectors for 2D transformations


This is another powerful way in which we can use matrices and vectors. It underlies the graphical
work that takes place in drawing packages, animation and virtual reality. In modern packages it is not
usually necessary to program at the matrix level but an understanding of the way they manipulate the
geometry, even in the 2-dimensional case, is useful.

Basics
We will illustrate all our examples using a triangle T and we will describe this by using a matrix
which gives the coordinates of each corner given as (x,y); these are (1,1), (4,1) and (4,3). Rewriting
T as a matrix,
\[ T = \begin{pmatrix} 1 & 4 & 4 \\ 1 & 1 & 3 \end{pmatrix} \]
as all the coordinates are written in the format $\begin{pmatrix} x \\ y \end{pmatrix}$.
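[Figure: the triangle T and its images FT, HT, RT, ST and T+v plotted on axes running from -6 to 6.]

Applying the matrices
\[ F = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}, \quad H = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}, \quad R = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \quad \text{and} \quad S = \begin{pmatrix} 1.5 & 0 \\ 0 & 2 \end{pmatrix} \]
to T gives
\[ FT = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 4 & 4 \\ 1 & 1 & 3 \end{pmatrix} = \begin{pmatrix} -1 & -4 & -4 \\ 1 & 1 & 3 \end{pmatrix} \]
\[ HT = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 1 & 4 & 4 \\ 1 & 1 & 3 \end{pmatrix} = \begin{pmatrix} -1 & -4 & -4 \\ -1 & -1 & -3 \end{pmatrix} \]
\[ RT = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 4 & 4 \\ 1 & 1 & 3 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 3 \\ -1 & -4 & -4 \end{pmatrix} \]
\[ ST = \begin{pmatrix} 1.5 & 0 \\ 0 & 2 \end{pmatrix} \begin{pmatrix} 1 & 4 & 4 \\ 1 & 1 & 3 \end{pmatrix} = \begin{pmatrix} 1.5 & 6 & 6 \\ 2 & 2 & 6 \end{pmatrix} \]
We can also use the vector $\mathbf{v} = \begin{pmatrix} -7 \\ 2 \end{pmatrix}$ to move T to the position labelled T+v.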

Transformations
We have seen 4 types of transformation of our triangle. Only the vector movement cannot be achieved
by using a simple 2x2 matrix multiplication.
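
The four transformations of the triangle can be reproduced with a short Python sketch. It assumes T is stored as a list of two rows (the x coordinates, then the y coordinates); the helper names matmul and translate are chosen for this illustration.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def translate(points, v):
    """Add the vector v to every point (column) of a 2-row matrix."""
    return [[x + v[0] for x in points[0]],
            [y + v[1] for y in points[1]]]


T = [[1, 4, 4], [1, 1, 3]]   # corners (1,1), (4,1), (4,3)
F = [[-1, 0], [0, 1]]        # reflection in the y axis
H = [[-1, 0], [0, -1]]       # rotation through 180 degrees
R = [[0, 1], [-1, 0]]        # rotation through 90 degrees clockwise
S = [[1.5, 0], [0, 2]]       # scaling by 1.5 in x and 2 in y

print(matmul(F, T))           # [[-1, -4, -4], [1, 1, 3]]
print(matmul(R, T))           # [[1, 1, 3], [-1, -4, -4]]
print(translate(T, (-7, 2)))  # [[-6, -3, -3], [3, 3, 5]]
```

The translation needs its own function here because it adds to each point rather than multiplying, which is the difference the text points out.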

Reflection
$F = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}$ has the effect on T of reflecting it in the y axis.
The vertices of T are now in reverse order. To describe a reflection you must state the ‘mirror line’.

Rotation
$H = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}$ has the effect of rotating T about (0,0) through an angle of 180°.
$R = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ has the effect of rotating T about (0,0) through an angle of 90° clockwise.
The vertices of T remain in the same order.

In general, a rotation counter clockwise by an angle θ is achieved using the matrix
\[ R = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \]
and a rotation clockwise by an angle θ is achieved using the matrix
\[ R = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \]

To describe a rotation you must state the centre, the angle and whether clockwise or anticlockwise.
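
A small Python sketch of the two general rotation matrices is given below. The function names are chosen for this illustration, angles are taken in degrees, and the helper rounded exists only to hide the tiny floating-point errors that appear in place of exact zeros.

```python
import math


def rotation_ccw(theta_degrees):
    """Matrix for a counter clockwise rotation about (0,0)."""
    t = math.radians(theta_degrees)
    return [[math.cos(t), -math.sin(t)],
            [math.sin(t), math.cos(t)]]


def rotation_cw(theta_degrees):
    """Clockwise rotation: the counter clockwise matrix for the angle -theta."""
    return rotation_ccw(-theta_degrees)


def rounded(m, places=10):
    """Round every entry; adding 0.0 turns -0.0 into 0.0 for tidy output."""
    return [[round(x, places) + 0.0 for x in row] for row in m]


print(rounded(rotation_cw(90)))  # [[0.0, 1.0], [-1.0, 0.0]]
```

The printed matrix agrees with the matrix R used for the 90° clockwise rotation of T.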

Scaling
$S = \begin{pmatrix} 1.5 & 0 \\ 0 & 2 \end{pmatrix}$ has the effect of scaling T by 1.5 in the x direction and 2 in the y direction.
To describe scaling you must give the scale factor in each direction.

Translation by vector
$\mathbf{v} = \begin{pmatrix} -7 \\ 2 \end{pmatrix}$ describes moving the triangle T 7 units to the left and 2 units up. This is a bit different from
the matrix multiplication as the vector is applied to each point in turn.

The use of matrix multiplication to apply this translation to T is beyond the scope of this section but
would involve enhancing the vectors representing the vertices of T and then using the 3x3 matrix
\[ \begin{pmatrix} 1 & 0 & -7 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{pmatrix} \]
to apply the translation.
To describe a translation you must give the size and sign of the movement in each direction.
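
For the curious, here is a hedged sketch of how the 3x3 matrix could be used. It assumes that "enhancing" the vertex vectors means appending a third coordinate equal to 1 to each vertex (usually called homogeneous coordinates); that reading is an assumption of this sketch, not something stated in the notes.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


T = [[1, 4, 4], [1, 1, 3]]

# Assumed enhancement: add a row of 1s, one for each vertex.
T_enhanced = T + [[1, 1, 1]]

M = [[1, 0, -7],
     [0, 1, 2],
     [0, 0, 1]]

moved = matmul(M, T_enhanced)
print(moved[:2])  # [[-6, -3, -3], [3, 3, 5]]
```

The first two rows of the result are the coordinates of T moved 7 units left and 2 units up; the third row stays as a row of 1s.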

Combinations of transformations
More than one transformation can be applied to a figure. This has the same effect as applying the
product of the individual transformations.
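
This can be checked with a short sketch using the matrices F and R from this section: rotating T with R and then reflecting the result with F gives the same triangle as applying the single matrix FR. Note the order: the transformation applied first is written on the right of the product. The helper name matmul is chosen for this illustration.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


T = [[1, 4, 4], [1, 1, 3]]
F = [[-1, 0], [0, 1]]   # reflection in the y axis
R = [[0, 1], [-1, 0]]   # rotation through 90 degrees clockwise

step_by_step = matmul(F, matmul(R, T))  # rotate first, then reflect
combined = matmul(matmul(F, R), T)      # one combined matrix FR

print(matmul(F, R))  # [[0, -1], [-1, 0]]
print(combined)      # [[-1, -1, -3], [-1, -4, -4]]
```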


2: Graph Theory and Networks


Consider the following problems:

A leading TV company is setting up a new cable TV company based at Leicester which has connections as
shown to the towns Derby, Nottingham, Grantham, Ashby-de-la-Zouch, Loughborough and
Melton Mowbray. Find the minimum amount of cable which will connect all the towns.

[Figure: weighted graph joining Derby, Nottingham, Grantham, Ashby, Loughborough, Melton Mowbray and Leicester; the eleven edge weights are 11, 12, 15, 15, 16, 16, 16, 17, 18, 20 and 24.]

Suppose you own a chain of shops at Sheffield, Aston, Chapeltown, Worksop and Chesterfield. The
distances between these towns are known. You live at Sheffield and each day have to visit each shop,
finally returning to Sheffield. What is the shortest route?

Can an electrical circuit be constructed as a single layer with no crossing wires?

The city of Königsberg in the 18th century was divided into four parts by the river Pregel, the parts
being joined by seven bridges. Every Sunday the citizens would walk around their city. Was it
possible to devise a walk which passed over every bridge just once and returned to the starting place?

[Figure: the river dividing the city into four land areas, labelled LAND A, LAND B, LAND C and LAND D.]

These are examples of problems which can be solved by graph theory.

Definition
A graph G consists of a set of vertices V and a set of edges E. Each edge e is associated with a pair
of distinct vertices u and v called the endpoints of e. We can write e = ( u, v ).

We shall only consider simple graphs - graphs which do not have edges which start and finish on the
same vertex and which can only have at most one edge between any two vertices.

Examples
1 Graph with 5 vertices and 8 edges

[Figure: a graph on the vertices a, b, c, d, e with 8 edges.]

2 A graph which has vertices a,b,c,d,e and edges (a,b), (a,c), (b,d), (c,e), (a,e)

[Figure: a drawing of this graph.]

Given a graph G, a walk in G is a sequence of edges of the form
$(v_1, v_2), (v_2, v_3), (v_3, v_4), \ldots, (v_{k-1}, v_k)$
where each $v_i$ is a vertex of G. We can write this as $v_1, v_2, v_3, \ldots, v_k$.

The number of edges in the walk is called the length of the walk.

If all the edges of a walk are distinct we have a trail.

If all the vertices of a walk are distinct we have a path.

[Figure: walks, trails (edges distinct, e≠) and paths (vertices distinct, v≠).]

In the first example above, the walks from a to b of length 4 are

acdcb   acbeb   abecb   adceb
acecb   acacb   abceb   adecb
acbab   adacb   acdab   acdeb
abcab   acbcb   adcab
abacb

The first 2 columns are walks, the next trails and the last column is paths from a to b.

[Figure: the graph of the first example, with vertices a, b, c, d, e.]
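
The listing can be reproduced by a short Python sketch. The edge set below (ab, ac, ad, bc, be, cd, ce, de) is an assumption: it is not written out in the notes but is inferred from the walks listed, all of which use only these eight edges. The function names are chosen for this illustration.

```python
# Assumed edges of the first example graph, inferred from the listed walks.
EDGES = ["ab", "ac", "ad", "bc", "be", "cd", "ce", "de"]

neighbours = {}
for u, v in EDGES:
    neighbours.setdefault(u, set()).add(v)
    neighbours.setdefault(v, set()).add(u)


def walks(start, end, length):
    """All walks with the given number of edges, written as vertex strings."""
    found = [start]
    for _ in range(length):
        found = [w + n for w in found for n in sorted(neighbours[w[-1]])]
    return [w for w in found if w[-1] == end]


def classify(walk):
    """Return 'path', 'trail' or 'walk' (the most specific description)."""
    edges = [frozenset(pair) for pair in zip(walk, walk[1:])]
    if len(set(walk)) == len(walk):
        return "path"    # all vertices distinct
    if len(set(edges)) == len(edges):
        return "trail"   # all edges distinct, some vertex repeated
    return "walk"        # some edge repeated


all_walks = walks("a", "b", 4)
for kind in ("walk", "trail", "path"):
    print(kind, [w for w in all_walks if classify(w) == kind])
```

Under this assumption the sketch finds 16 walks in total, of which 4 are trails that are not paths and 3 are paths, matching the columns above.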
A walk from a vertex v to itself with no repeated edges is called a cycle.

A graph is connected if given any two vertices there is a walk between them.

Example
A graph which is not connected:
G has vertices a, b, c, d, e and edges (a,b), (a,c), (d,e)

[Figure: a drawing of G.]

A connected graph with no cycles is called a tree

Example
A tree

[Figure: a drawing of a tree.]

Example
The directory system on the PC is a tree

[Figure: a directory tree with root C:.]

A subgraph of a graph G is a graph all of whose vertices are in G and all of whose edges are in G.

Example

[Figure: a graph G on the vertices a, b, c, d, and beside it a subgraph of G on the vertices a, b, d.]

A spanning tree of a graph G is a subgraph which includes all the vertices of G and is also a tree. If G
has n vertices, a spanning tree will have n - 1 edges.

Example
The graph G above has spanning trees

[Figure: two spanning trees of G, each on the vertices a, b, c, d.]

An alternative way to represent a graph is by an adjacency matrix. Matrices are simply ways of
storing information in a grid formation.

Let G be a graph with vertices $v_1, v_2, \ldots, v_n$.

[Figure: a graph on the vertices v1, v2, v3 and v4.]

For the graph shown,
\[ A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{pmatrix} \]
If A is the adjacency matrix of G then the entry A(i,j) = 1 if there is an edge between $v_i$ and $v_j$.
Otherwise the entry is 0.

We can show that the number of walks from vertex $v_i$ to $v_j$ of length n is the (i,j)th entry in the matrix
$A^n$.
In the above example
\[ A^2 = A \times A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{pmatrix} \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 1 & 1 \\ 0 & 3 & 1 & 1 \\ 1 & 1 & 2 & 1 \\ 1 & 1 & 1 & 2 \end{pmatrix} \]
The number of walks of length 2 between v4 and v2 is 1: v2 v3 v4
The number of walks of length 2 between v2 and v2 is 3: v2 v1 v2, v2 v3 v2, v2 v4 v2

If we want walks of length 3 we must find $A^3$
\[ A^3 = A^2 \times A = \begin{pmatrix} 1 & 0 & 1 & 1 \\ 0 & 3 & 1 & 1 \\ 1 & 1 & 2 & 1 \\ 1 & 1 & 1 & 2 \end{pmatrix} \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 3 & 1 & 1 \\ 3 & 2 & 4 & 4 \\ 1 & 4 & 2 & 3 \\ 1 & 4 & 3 & 2 \end{pmatrix} \]
The number of walks of length 3 between v4 and v3 is 3:
v3 v2 v3 v4, v3 v4 v3 v4, v3 v4 v2 v4
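
The walk counts can be checked by raising the adjacency matrix to a power in Python. This is a minimal sketch; the function names are chosen for this illustration, and because Python lists start at index 0, the entry for v4 and v3 is found at position [3][2].

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def power(A, n):
    """A multiplied by itself so that A appears n times (n >= 1)."""
    result = A
    for _ in range(n - 1):
        result = matmul(result, A)
    return result


A = [[0, 1, 0, 0],
     [1, 0, 1, 1],
     [0, 1, 0, 1],
     [0, 1, 1, 0]]

A2 = power(A, 2)
A3 = power(A, 3)
print(A2[1][1])  # 3 walks of length 2 from v2 back to v2
print(A3[3][2])  # 3 walks of length 3 between v4 and v3
```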

A weighted graph is a graph which has a real number attached to each of its edges.

Example
Five towns A, B, C, D, E are connected by roads of the following lengths. Put this in the form of a
weighted graph.
A B C D E
A 31 28 13
B 43 10
C 40 12

[Figure: the weighted graph on the towns A, B, C, D and E, with edge weights 28, 31, 13, 40, 43, 10 and 12.]

Minimal Spanning Trees


Suppose that G is a weighted graph. A minimal spanning tree is a spanning tree of G which minimises
the total of the weights on the edges in the tree.

In the first example of this section, the minimal spanning tree will give the least amount of cable
needed to connect the towns.

We use Kruskal's algorithm to find the minimal spanning tree for a weighted graph G with n
vertices.

This algorithm 'grows' a tree.


1 The graph T initially consists of the vertices of G and no edges.
2 Add an edge to T having minimum weight but which does not give a cycle in T.
3 Repeat 2 until T has n - 1 edges.
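
The three steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the only way to implement the algorithm: the small 4-vertex graph is made up for the example, and the cycle test uses a simple "which component is this vertex in" lookup (the parent dictionary), which is one common way of checking that a new edge does not close a cycle.

```python
def kruskal(vertices, edges):
    """Return the edges of a minimal spanning tree as (weight, u, v) tuples."""
    parent = {v: v for v in vertices}   # each vertex starts in its own component

    def find(v):
        # Follow the links up to the representative of v's component.
        while parent[v] != v:
            v = parent[v]
        return v

    tree = []                                 # step 1: T has no edges
    for weight, u, v in sorted(edges):        # step 2: cheapest edge first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:                  # different components: no cycle
            parent[root_u] = root_v
            tree.append((weight, u, v))
        if len(tree) == len(vertices) - 1:    # step 3: stop at n - 1 edges
            break
    return tree


# Hypothetical weighted graph, used only to exercise the function.
vertices = ["a", "b", "c", "d"]
edges = [(3, "a", "c"), (1, "a", "b"), (5, "b", "d"), (2, "b", "c"), (4, "c", "d")]

tree = kruskal(vertices, edges)
print(tree)                      # [(1, 'a', 'b'), (2, 'b', 'c'), (4, 'c', 'd')]
print(sum(w for w, _, _ in tree))  # 7
```

The edge of weight 3 is rejected because a, b and c are already connected, so adding it would give a cycle.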

Example
Solution to minimum cable problem on first page (NB edges in ORDER)

edge    weight    accept/reject
e1      11        A
e2      12        A
e3      15        R (cycle e1 e2 e3)
e4      15        A
e5      16        A
e6      16        A
e7      16        R
e8      17        R
e9      18        R
e10     20        A
e11     24        R

[Figure: the graph of the towns with its edges labelled e1 to e11.]

Spanning Tree:

[Figure: the spanning tree made up of the edges e1, e2, e4, e5, e6 and e10.]

The minimum length of cable is
16 + 16 + 12 + 11 + 15 + 20 = 90

Multigraphs
These are graphs which have loops or multiple edges.

[Figure: a multigraph on the vertices A and B.]

Directed graphs or digraphs

A directed graph is a multigraph with a direction assigned to each edge. A digraph could be used as
a model for a one-way street plan. Walks etc. must follow the arrows.

[Figure: a digraph on the vertices a, b, c, d.]

e.g. there is only one walk from d to b: d a b

Directed or Rooted Trees
A rooted tree is a directed graph which is also a tree. In addition it has a special vertex r
called a root and all vertices are accessible from r. A vertex w is a successor of a vertex
v if there is a directed edge from v to w.

[Figure: a rooted tree with root r, in which w is a successor of v.]

A binary tree T is a directed tree such that each vertex v has at most two successors. Each successor
is called either a left successor, left(v), or a right successor, right(v). v cannot have two left or two
right successors.

[Figure: a binary tree with root r and a vertex v with successors left(v) and right(v).]

The left subtree of a vertex v in T consists of left(v) together with all vertices and connecting edges
in T which can be reached from left(v). Similarly we have a right subtree.

Evaluating logic expression using binary trees


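Find the truth value of the expression (p∧(q∨¬r)) ∨ (q∨(¬p∧r)) when p is true and q and r are false.

First construct the binary tree for the expression, then work out the truth values from the bottom of
the tree up to the root. Level by level, from the root down, the vertices and their truth values are:

Level 0: (p∧(q∨¬r)) ∨ (q∨(¬p∧r)) = T
Level 1: p∧(q∨¬r) = T, q∨(¬p∧r) = F
Level 2: p = T, q∨¬r = T, q = F, ¬p∧r = F
Level 3: q = F, ¬r = T, ¬p = F, r = F
Level 4: r = F (below ¬r), p = T (below ¬p)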
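
The bottom-up evaluation can be sketched in Python by storing the tree as nested tuples. The representation is an assumption made for this illustration: a leaf is a variable name, and an inner vertex is a tuple whose first item is "and", "or" or "not".

```python
def evaluate(node, values):
    """Evaluate an expression tree; leaves are variable names."""
    if isinstance(node, str):
        return values[node]
    operator = node[0]
    if operator == "not":
        return not evaluate(node[1], values)
    left = evaluate(node[1], values)
    right = evaluate(node[2], values)
    if operator == "and":
        return left and right
    if operator == "or":
        return left or right
    raise ValueError("unknown operator: " + operator)


# (p AND (q OR NOT r)) OR (q OR (NOT p AND r))
expression = ("or",
              ("and", "p", ("or", "q", ("not", "r"))),
              ("or", "q", ("and", ("not", "p"), "r")))

values = {"p": True, "q": False, "r": False}
print(evaluate(expression, values))  # True
```

Each recursive call evaluates one subtree, so the function visits the vertices in the same order as working up the tree by hand.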

A binary search tree is a binary tree in which data is associated with each vertex. The data is
arranged so that for any vertex v in T, each data item on the left subtree of v is less than the data item
in v and the data in the right subtree of v is greater than the data item in v.

Example
Find a binary search tree for the words in the sentence 'Once upon a time there was an old man'. Use
alphabetical ordering and Once as the root.


Once
    left: a
        right: an
            right: old
                left: man
    right: upon
        left: time
            left: there
        right: was
Example
Put the numbers 34, 56, 23, 89, 12, 24, 54, 22, 67 in a binary search tree with 34 as the root.

34
    left: 23
        left: 12
            right: 22
        right: 24
    right: 56
        left: 54
        right: 89
            left: 67
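
Building a binary search tree by inserting the items one at a time can be sketched in Python. The tree is stored here as nested dictionaries, which is just one convenient choice for this illustration; the sketch also assumes that an item equal to one already present would be placed in the right subtree, a case the examples above do not cover.

```python
def insert(tree, value):
    """Insert value into the tree and return the (possibly new) tree."""
    if tree is None:
        return {"value": value, "left": None, "right": None}
    # Smaller items go left, larger (or equal) items go right.
    side = "left" if value < tree["value"] else "right"
    tree[side] = insert(tree[side], value)
    return tree


def in_order(tree):
    """List the items: left subtree, then the vertex, then right subtree."""
    if tree is None:
        return []
    return in_order(tree["left"]) + [tree["value"]] + in_order(tree["right"])


root = None
for number in [34, 56, 23, 89, 12, 24, 54, 22, 67]:
    root = insert(root, number)

print(root["left"]["value"], root["right"]["value"])  # 23 56
print(in_order(root))  # [12, 22, 23, 24, 34, 54, 56, 67, 89]
```

Reading the tree in order gives the items sorted, which is the main reason the structure is useful for searching.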


3: Algorithms
Algorithms are often described as ‘Mathematical Recipes’, the solving of a problem with a finite
sequence of operations. You have already considered many such ‘recipes’ in Systems Building and
Programming. You have written them both in structured English and a programming language like
Java. In looking at networks earlier we used Kruskal’s Algorithm to find the minimum spanning
tree.

Example
Step   Instruction                     Explanation
1.     Multiply                        Naming the algorithm
2.     READ x,y                        Taking in inputs x and y
3.     n ← 0                           Initialising the variable n (loop counter)
4.     Answer ← 0                      Initialising the variable Answer
5.     DO                              Start of the loop
6.         Answer ← Answer + y         Adding y on to the current value of Answer
7.         n ← n+1                     Adding 1 on to the current value of n (updating counter)
8.     LOOP UNTIL n = x                Test to see if we have finished the loop
9.     DISPLAY Answer                  Outputting the result
10.    End Multiply                    Ending the algorithm

Because computers do not think, we use algorithms as instruction sets to the machine to implement
our thoughts and designs. Here the algorithm (written in pseudo code or structured English) uses
successive addition to accomplish a multiplication in the same way that you once learnt that 4+4+4
= 3×4.
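
A direct Python translation of Multiply might look like the sketch below. One point is an addition of this sketch rather than part of the pseudocode: because the test comes at the end of the loop, the body always runs at least once, so the algorithm as written only gives the right answer when x is a whole number of at least 1. The check at the top makes that assumption explicit.

```python
def multiply(x, y):
    """Multiply x by y using successive addition (x a whole number >= 1)."""
    if x < 1 or x != int(x):
        raise ValueError("this sketch assumes x is a whole number of at least 1")
    n = 0            # loop counter
    answer = 0
    while True:      # DO
        answer = answer + y
        n = n + 1
        if n == x:   # LOOP UNTIL n = x
            break
    return answer


print(multiply(3, 4))  # 12
```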

Definition
An algorithm may be defined as a sequence of instructions to solve a problem which has the following
properties:
The algorithm receives INPUT
The algorithm produces OUTPUT
The algorithm stops after a FINITE number of instructions have been executed
Each instruction in the algorithm is followed by a UNIQUE SUCCESSOR instruction.
This can be seen in the algorithm design for the above example, which is shown in the structure chart.
INPUT is x and y
OUTPUT is Answer
The algorithm will stop after the LOOP has executed exactly x times
It is always clear what the next instruction is in the algorithm.


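[Figure: structure chart for Multiply. The steps READ x,y (INPUT); n ← 0 and Answer ← 0 (initialising); DO, repeated until n = x, containing Answer ← Answer + y and n ← n + 1 (FINITE as the loop stops after x executions); and DISPLAY Answer (OUTPUT) follow one another in order, each with a UNIQUE SUCCESSOR.]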

Example
Input Age
    READ n
    IF n<0 OR n>130
        DISPLAY “Error- bad input”
    ELSE
        DISPLAY “Age is” n
    END IF
End Input Age

Here we have a selection operation as part of the algorithm but that does not invalidate our property
that there must be a UNIQUE SUCCESSOR as, given the data INPUT, there is only one choice
possible.
[Figure: structure chart for Input Age. READ n (INPUT) is followed by the selection n<0 OR n>130; if True, DISPLAY “Error- bad input”, and if False, DISPLAY “Age is” n (OUTPUT). The UNIQUE SUCCESSOR instruction is dependent on n.]


Performance Characteristics of Algorithms


When a project is specified there might be constraints given on the use of computer resources. As
algorithms are at the heart of our computer programs, this leads us to look at the efficiency of the
algorithms underpinning the software.

Complexity and Efficiency


Complexity of algorithms can have many different meanings.
Cognitive complexity – concerned with the difficulty of understanding an algorithm
Structural complexity – concerned with the flow of data through an algorithm
Computing complexity – usually split into 2 aspects:
    Space complexity – concerned with the storage, i.e. use of memory
    Time complexity – concerned with the speed at which an algorithm will execute

It is this last type of complexity that is of interest to us. The limitations of early computers, where
saving a few bytes of memory or reducing the number of instructions could mean the difference
between a program working or not, meant that complexity became synonymous with efficiency.
Computing complexity describes the two types of efficiency we can consider related to the
algorithm’s use of machine resources. Space complexity is dependent on the type of data structures
used in the algorithm and its effect on the memory capacity of the computer has become much less
important over recent years. Time complexity is still a vital issue especially with the emergence of
newer technologies and software which rely heavily on the internet.

Time Complexity
The time complexity of an algorithm can be expressed in terms of the number of primitive operations
used by an algorithm when the input has a certain size.

Definition
For any algorithmic solution A, we define the time complexity function $T_A(n)$ as the maximum
number of relevant primitive operations that have to be performed by A on a problem of size n.

Example [Fenton]
Suppose we have an array X of marks for n students and wish to change them from being out of 70
to a percentage then 2 possible algorithms are given here.
ScaleMarks A
    READ X(array)
    total ← 70
    FOR i = 1 TO n
        X(i) ← X(i) * 100/total
    NEXT i
    DISPLAY X
End ScaleMarks

ScaleMarks B
    READ X(array)
    factor ← 100/70
    FOR i = 1 TO n
        X(i) ← X(i) * factor
    NEXT i
    DISPLAY X
End ScaleMarks


The two important primitive operations are multiplication and division.


In algorithm A the multiplication and division are carried out each time the loop executes so
$T_A(n) = 2n$
In algorithm B one division is performed before the start of the loop and one multiplication for each
execution of the loop so $T_B(n) = n + 1$
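
These counts can be confirmed by a Python sketch in which each version keeps a tally of the multiplications and divisions it performs. The operations counter and the sample marks are additions made for this illustration; they are not part of the algorithms themselves.

```python
def scale_marks_a(marks):
    """ScaleMarks A: multiply and divide inside the loop."""
    operations = 0
    total = 70
    scaled = []
    for mark in marks:
        scaled.append(mark * 100 / total)  # one multiplication, one division
        operations += 2
    return scaled, operations


def scale_marks_b(marks):
    """ScaleMarks B: divide once before the loop, multiply inside it."""
    factor = 100 / 70                      # one division
    operations = 1
    scaled = []
    for mark in marks:
        scaled.append(mark * factor)       # one multiplication
        operations += 1
    return scaled, operations


marks = [35, 56, 70, 63]                   # hypothetical marks out of 70
print(scale_marks_a(marks)[1])  # 8, which is 2n for n = 4
print(scale_marks_b(marks)[1])  # 5, which is n + 1 for n = 4
```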

Example
Consider 2 algorithms for finding the sum of the rows of a matrix and then the sum of all elements in
the matrix. How it works:
First each row is summed and the totals for each row, Sum(i), are displayed.
Then the final total, Grandsum, of all the elements is calculated and displayed.

  1  5  2    = 8
  4  0  3    = 7
  2  8  1    = 11
  5  3  2    = 10
               36

The general element of the matrix A is written here as a(i,j) so, for example, the element on the
second row in the third column is called a(2,3).
In this example the inner loop will add the rows so j will go from 1 to 3.
The outer loop will total these row sums and so i will go from 1 to 4.

Both algorithms work by using a double loop but 2 students have decided to try and get the total
(Grandsum) by putting similar statements in different places in the algorithm.

MatrixSum A                             MatrixSum B
  READ a(array)                           READ a(array)
  Grandsum ← 0                            Grandsum ← 0
  FOR i = 1 TO 4                          FOR i = 1 TO 4
    Sum(i) ← 0                              Sum(i) ← 0
    FOR j = 1 TO 3                          FOR j = 1 TO 3
      Sum(i) ← Sum(i) + a(i,j)                Sum(i) ← Sum(i) + a(i,j)
      Grandsum ← Grandsum + a(i,j)            DISPLAY Sum(i)
      DISPLAY Sum(i)                        NEXT j
    NEXT j                                  Grandsum ← Grandsum + Sum(i)
  NEXT i                                  NEXT i
  DISPLAY Grandsum                        DISPLAY Grandsum
End MatrixSum                           End MatrixSum

When we analyse the time complexity of these algorithms we look to see how often a critical
operation (addition in this case) takes place.
In A we have both Sum and Grandsum calculated 12 (4×3) times so have 2(12) = 24 operations
In B we have Sum calculated 12 times but Grandsum only 4 times so have a total of 16 operations.


MatrixSum C                             MatrixSum D
  READ a(array)                           READ a(array)
  Grandsum ← 0                            Grandsum ← 0
  FOR i = 1 TO n                          FOR i = 1 TO n
    Sum(i) ← 0                              Sum(i) ← 0
    FOR j = 1 TO n                          FOR j = 1 TO n
      Sum(i) ← Sum(i) + a(i,j)                Sum(i) ← Sum(i) + a(i,j)
      Grandsum ← Grandsum + a(i,j)            DISPLAY Sum(i)
      DISPLAY Sum(i)                        NEXT j
    NEXT j                                  Grandsum ← Grandsum + Sum(i)
  NEXT i                                  NEXT i
  DISPLAY Grandsum                        DISPLAY Grandsum
End MatrixSum                           End MatrixSum

The difference in computer time between MatrixSum A and B will be negligible, but consider the
general case, MatrixSum C and D, where the upper limits of both i and j have been replaced by n,
allowing us to use the algorithm with any n × n matrix.

Now we have $T_C(n) = 2n^2$ and $T_D(n) = n^2 + n$.

For all n > 1, $2n^2 > n^2 + n$, so we can say D is apparently guaranteed to execute faster than C.

Does this matter from a user’s perspective? If we take the case where n is 1000 and a computer only
capable of processing one instruction each microsecond, the comparison between C and D will be
about 2 seconds to 1 second, so not noticeably different. On a 100 000 by 100 000 matrix the difference
is significant, with about 6 hours as opposed to 3 hours.
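The addition counts for the general n × n case can be confirmed by counting inside the loops. This is a sketch only, written in Python rather than the pseudocode of the notes; the function names are illustrative and the DISPLAY steps are left out because they are not additions.

```python
def matrix_sum_c(a):
    """MatrixSum C: Grandsum is updated inside the inner loop."""
    n = len(a)
    additions = 0
    grandsum = 0
    for i in range(n):
        row_sum = 0
        for j in range(n):
            row_sum += a[i][j]
            grandsum += a[i][j]
            additions += 2            # one for Sum(i), one for Grandsum
    return grandsum, additions


def matrix_sum_d(a):
    """MatrixSum D: Grandsum is updated once per row."""
    n = len(a)
    additions = 0
    grandsum = 0
    for i in range(n):
        row_sum = 0
        for j in range(n):
            row_sum += a[i][j]
            additions += 1            # Sum(i) only
        grandsum += row_sum
        additions += 1                # Grandsum, once per row
    return grandsum, additions


n = 4
ones = [[1] * n for _ in range(n)]    # hypothetical test matrix
print(matrix_sum_c(ones), matrix_sum_d(ones))
```

Both versions return the same grand total; only the number of additions differs, 2n² against n² + n.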

Example
Travelling salesman problem - this comes from Graph theory in the case where we have n cities, each
connected to all others, where all distances are known (this type of network is known as a complete
graph). The problem is to find the shortest route from a city back to itself passing through all other
cities just once.

e.g. For 4 cities the graph would be as shown (although the distances would not be so regular) and
taking city A as start and finish there would be 3! (3×2×1) routes to compare.
(Starting at A there are 3 choices of next town × 2 of next × 1 of last.)

[Figure: complete graph on the four cities A, B, C and D]

length ABCDA = DIST(1)
length ABDCA = DIST(2)
length ACBDA = DIST(3)
length ACDBA = DIST(4)
length ADCBA = DIST(5)
length ADBCA = DIST(6)

This algorithm is designed to compute the shortest route in this case starting and ending at city A.

Shortest Route
  Calc DIST(1)
  MinDist ← DIST(1)
  FOR i = 2 TO 6
    Calc DIST(i)
    IF DIST(i) < MinDist THEN
      MinDist ← DIST(i)
  NEXT i
End Shortest Route

For each DIST(i) 4 distances are calculated, e.g. DIST(1) = AB + BC + CD + DA.
There are 3! of these to calculate, so the total number of additions is 4×3! = 4!.
For this case $T_A(4) = 4!$.

In general if there are n towns then $T_A(n) = n!$.
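The exhaustive search above can be sketched in Python using the standard `itertools.permutations`. The distance matrix below is invented purely for illustration (cities A, B, C, D are numbered 0 to 3), and `shortest_route` is a hypothetical helper name, not part of the notes.

```python
from itertools import permutations


def shortest_route(dist, start=0):
    """Try every route from `start` through all other cities and back."""
    n = len(dist)
    others = [city for city in range(n) if city != start]
    best_length, best_route = None, None
    for order in permutations(others):          # (n-1)! orderings
        route = (start, *order, start)
        # n legs per route, e.g. AB + BC + CD + DA for four cities
        length = sum(dist[route[k]][route[k + 1]] for k in range(n))
        if best_length is None or length < best_length:
            best_length, best_route = length, route
    return best_length, best_route


# Hypothetical symmetric distances between A, B, C, D
dist = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]
print(shortest_route(dist))
```

The loop body runs (n-1)! times, which is why this approach stops being practical for even moderately large n.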

In general we can compute three versions of the function T: best case time, worst case time and
average case time, but we will concentrate on the worst case, when the algorithm carries out its
maximum set of instructions. The travelling salesman problem is a well known problem that runs out
of computer time if the number of nodes (towns) gets too big. From the table below you can see that
with 10 nodes it is manageable as
10! = 3628800 ($3.6 \times 10^6$) but
16! = 20922789888000 ($2.1 \times 10^{13}$)
and you can see how quickly the number of operations gets too big to compute in reasonable time.

The table given here shows that although computers are incredibly fast at calculating, some
algorithms have a time complexity that grows in such a way to be impossible for large values.

When constructing algorithms for prototypes the scalability of the solution should be noted. The table
calculated by Norman Fenton shows execution times in relation to the number of operations and those
entries left blank exceed the expected life span of the universe!

Execution times for different algorithms assuming 1 execution per nanosecond [Fenton 1993 updated]
Time complexity T(n) of algorithm in ascending order (s = seconds, min = minutes)

Size of input   log2 n          n              n^2           2^n               n!                 n^n
10              0.000000003 s   0.00000001 s   0.0000001 s   0.000001 s        0.0036 s           0.168 min
100             0.000000007 s   0.0000001 s    0.00001 s     10^11 centuries   10^143 centuries   10^182 centuries
1000            0.00000001 s    0.000001 s     0.001 s       -                 -                  -
10 000          0.000000013 s   0.00001 s      0.1 s         -                 -                  -
100 000         0.000000017 s   0.0001 s       0.168 min     -                 -                  -


Orders of magnitude
In our example on matrices although the difference in the two algorithms was large when the matrix
was large, for the user they are regarded as the same order of magnitude. This is because at the low
end of the comparison there is no practical difference between waiting 1 and 2 seconds for the result
and at the upper end if you are going to wait 3hrs you might as well leave it to execute overnight
anyway and 6 hrs will make no difference.

Order of magnitude is an expression used by scientists to compare 2 values. When dealing with
ordinary numbers usually 2 values have the same order of magnitude if the ratio of the larger to the
smaller is less than 10:1. e.g. 57.3 is the same order of magnitude as 81.75 but NOT the same as
4.786. If they are not the same order of magnitude then the difference is based on the power of 10
involved.

Example
6245 is 2 orders of magnitude bigger than 48 as
6245 : 48 is bigger than $10^2 : 1$ but less than $10^3 : 1$

Sometimes units are taken into account rather than powers of 10 so that an hour might be considered
as 2 orders of magnitude bigger than a second.

In looking at the time complexity of an algorithm A we often compare $T_A(n)$ with a similar simple
function which behaves the same for large n. This is the basis for the big O sets used in complexity
where the O represents this dominating similar function. (We are just scratching the surface here and
more can be found about this in most good books on the theory of computing.)

Example
If $T_A(n) = 2n + 1$ then it is said to be of the order of n, or O(n). This is because as n increases 2n+1
grows at the same rate as n. Here n is the dominating similar function.

Example
If $T_A(n) = 6n^2 + 3n + 17$ then it is said to be $O(n^2)$.

Using this big O notation means that we don’t directly compare T values, which ignore some
operations like initialising and formatting, but have a much simpler grading of complexity.

Example
Searching and Sorting algorithms have already been mentioned and used in the programming course.
• The Binary search algorithm has $O(\log n)$
• The Bubblesort algorithm has $O(n^2)$ in the worst case because for a list of n elements the
algorithm will need at worst $O(n^2)$ comparisons to sort out the list.
• Quicksort also has $O(n^2)$ in the worst case but it can be proved that it has average case
complexity of $O(n \log n)$ comparisons to sort out the list.
It can be seen that in investigating the big O value of an algorithm the only ones that will execute in
a reasonable time for non trivial n are $O(n^k)$. These are said to have polynomial complexity.


Computability and the Turing Machine


Our original definition of an algorithm was in terms of a finite instruction set. An equivalent definition
makes use of one of the landmarks of Computer Science and the work of Alan Turing who laid the
foundations of the theory of computability in 1936.

Definition
A Turing Machine $M = (S, I, f, s_0)$ consists of a finite set of states S, an alphabet I containing the
blank symbol B, a partial function f from S×I to S×I×{R,L}, and a starting state $s_0$.

This rather frightening definition can be interpreted in terms of a control head that can read and write
symbols on an infinite tape and move left or right and halt if the partial function is not defined. An
example of a Turing machine can be seen in the diagram below.

Here $S = \{s_0, s_1, s_2, s_3, s_4\}$, $I = \{B, 0, 1\}$ and at each position the circular control head can do 3
things simultaneously:
• Read a symbol and write another over it
• Change its internal state
• Move one space left or right
Its moves are defined by the function f and would consist of quintuples (sets of 5) such as
$(s_1, 1, s_2, 0, L)$, i.e. if the machine is in state $s_1$ and positioned by a 1 it changes to state $s_2$, writes over
the 1 with a 0, and moves one space to the left.

Any time there is not a defined outcome for a pair $(s_i, x)$, where x is the symbol being read, the machine HALTS.
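A small simulator makes the quintuple idea concrete. The sketch below is in Python and is not taken from the notes: the function name, the starting head position and the bit-flipping rule set are all assumptions chosen for illustration. Each rule maps a (state, symbol) pair to (new state, symbol to write, move), and the machine halts when no rule applies.

```python
def run_turing_machine(rules, tape, state="s0", blank="B", max_steps=1000):
    """Run a one-tape machine; return (final_state, tape_contents)."""
    cells = dict(enumerate(tape))     # sparse tape: position -> symbol
    head = 0                          # assumption: start on the leftmost symbol
    for _ in range(max_steps):        # safety cap, since a machine need not halt
        symbol = cells.get(head, blank)
        if (state, symbol) not in rules:
            break                     # no defined outcome for this pair: HALT
        state, write, move = rules[(state, symbol)]
        cells[head] = write
        head += 1 if move == "R" else -1
    contents = "".join(cells[k] for k in sorted(cells))
    return state, contents


# Hypothetical machine: flip each bit while moving right, halt on the blank.
flip_rules = {
    ("s0", "0"): ("s0", "1", "R"),
    ("s0", "1"): ("s0", "0", "R"),
}
final_state, final_tape = run_turing_machine(flip_rules, "0110")
print(final_state, final_tape)
```

Because no rule is defined for the pair (s0, B), the machine stops as soon as it moves past the last written symbol.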

[Figure: an infinite tape reading B 0 1 B 1 B 0 0 1 B B, with a circular control head whose internal states are s0, s1, s2, s3 and s4]

Turing invented his machine to mirror the functions that can be computed with an algorithm. For a
particular algorithm the Turing machine may be difficult to construct but the Church-Turing thesis
states that for any problem that can be solved by an effective algorithm there is an equivalent Turing
Machine that can solve this problem.

So IF an algorithm is said to be computable it is equivalent to saying we can construct a Turing
Machine to carry out the same procedures.

This is important as there are some problems which are non-computable.

Example
The Halting Problem is a well known example of a non-computable problem. Suppose there was a
standard program called Halt that takes as input any Program, P, and tells us whether P terminates.
So Halt(P) is either Yes or No.

Program P → Halt → Yes/No

Unfortunately no such program exists as can be shown by the algorithm

Stupid
INPUT Stupid
WHILE Halt(Stupid) = Yes DO
DISPLAY “This is weird”
LOOP
End Stupid

Thinking about the only two possible cases - Stupid terminates or Stupid doesn’t - shows us this is a
non-computable algorithm.

Case 1
Stupid terminates
Halt(Stupid) = Yes
WHILE DO LOOP never stops so Stupid doesn’t terminate

Case 2
Stupid doesn’t terminate
Halt(Stupid) = No
WHILE DO LOOP stops so Stupid terminates


4: Differential and Integral Calculus


Calculus is a general term for a particular method of calculation or reasoning. When we studied Logic
in the previous course we called that Propositional Calculus – the methods and reasoning about logical
propositions. However when calculus is mentioned generally then it usually refers to the methods of
differentiation and integration. This course will only give the briefest of overviews of both and there
are many available books to give you further techniques and examples.

4.1 Differentiation
By definition a straight line y = mx + c has the same slope or gradient at all points on the line.
We can measure this in a practical way or calculate it using a triangle placed against it, i.e.
looking at the differences in the y’s divided by the differences in the x’s. We have already
looked at this when studying functions. Wherever we form the triangle on the line the
answer will be the same.

[Figure: the line y = 2x + 1 through (0,1) and (3,7). A triangle with base 1 and height 2 gives
gradient = 2/1 = 2, or difference in y / difference in x = (7 - 1)/(3 - 0) = 2.]

[Figure: two lines, one with m = -2 and one with m = 0.3]

If m is positive the line slopes upwards from left to right. If m is negative the line slopes
downwards.

Gradient of a curve
There is not a single value for the gradient
when the function is represented by a curve
as the slope or gradient is different at each
point along the curve

Differentiation is a way of looking at the changes in y and x which measure the slope at any point.
Because this is not now a fixed number unlike the m of a straight line, it varies according to the
location on the curve, it is a function of x.

Definition
Let y = f(x) be the equation that describes the curve, then the expression for the gradient of the curve
is known as $\frac{dy}{dx}$. To find it at a particular point on the curve you must put in the x value for that point.

Example
If $y = x^2$ it can be shown that the gradient at any point is $\frac{dy}{dx} = 2x$.

   x     y = f(x)    grad = 2x
  -4        16          -8
  -3         9          -6
  -2         4          -4
  -1         1          -2
   0         0           0
   1         1           2
   2         4           4
   3         9           6
   4        16           8

[Figure: graph of y = x^2 for x from -4 to 4]

So the final column shows the gradients evaluated at each point.


From the graph we can see the gradient is -6 when x = -3 and the gradient is 2 when x = 1.
More rigorous mathematics than we need in this course enables us to find the expression for $\frac{dy}{dx}$ for
a number of useful functions that you will meet and they are given in the table.

  y = f(x)      dy/dx
  c             0
  x             1
  x^2           2x
  x^3           3x^2
  x^n           n x^(n-1)
  sin x         cos x
  cos x         -sin x
  e^x           e^x
  ln x          1/x

Combinations of powers of x can be differentiated one at a time.
Example
$y = x^2 + x - 7$
$\frac{dy}{dx} = 2x + 1$
Scalar multiples of the functions are just carried through.
Example
$y = 7\sin x - 3x^2 = 7(\sin x) - 3(x^2)$
$\frac{dy}{dx} = 7\cos x - 3(2x) = 7\cos x - 6x$

NOTE All angles must be measured in radians when using calculus for these relationships to hold.

Chain Rule
For brackets with powers use the chain rule. You can either do it directly by saying “multiply by the
power, decrease the power by one and then multiply by the inside of the brackets differentiated”,
$$\frac{d}{dx}\{f(x)\}^n = n\{f(x)\}^{n-1} \times f'(x),$$
or let u = (the function in brackets) and use
$$\frac{dy}{dx} = \frac{dy}{du} \times \frac{du}{dx}$$

Example
a. $y = \sin^3 x = (\sin x)^3$, then $\frac{dy}{dx} = 3(\sin x)^2 \times \cos x = 3\sin^2 x \cos x$,
or let $u = \sin x \Rightarrow \frac{du}{dx} = \cos x$; $y = u^3 \Rightarrow \frac{dy}{du} = 3u^2 \Rightarrow \frac{dy}{dx} = \frac{dy}{du} \times \frac{du}{dx} = 3\sin^2 x \cos x$
b. $y = \cos^2 5x = (\cos 5x)^2$, then $\frac{dy}{dx} = 2(\cos 5x)^1 \times (-\sin 5x) \times 5 = -10\cos 5x \sin 5x$
c. $y = (5x^2 - 10x)^4$, then $\frac{dy}{dx} = 4(5x^2 - 10x)^3 \times (10x - 10) = 40(5x^2 - 10x)^3 (x - 1)$

Product Rule
Where both u and v are functions of x, and the function is a product of the two, use the product rule:
$$\frac{d(uv)}{dx} = u\frac{dv}{dx} + v\frac{du}{dx}$$
It applies for more than two functions of x too, e.g. for the product of u, v, w,
$$\frac{d(uvw)}{dx} = uv\frac{dw}{dx} + uw\frac{dv}{dx} + vw\frac{du}{dx}$$

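Example
$y = e^{3x}(2 - \sqrt{x})^2$, find $\frac{dy}{dx}$.
Let $u = e^{3x} \Rightarrow \frac{du}{dx} = e^{3x} \times 3 = 3e^{3x}$
$v = (2 - \sqrt{x})^2 \Rightarrow \frac{dv}{dx} = 2(2 - \sqrt{x})^1 \times \left(-\tfrac{1}{2}x^{-1/2}\right) = -\frac{(2 - \sqrt{x})}{\sqrt{x}}$
$$\frac{dy}{dx} = e^{3x} \times \left(-\frac{(2 - \sqrt{x})}{\sqrt{x}}\right) + (2 - \sqrt{x})^2 \times 3e^{3x} = e^{3x}(2 - \sqrt{x})\left(-\frac{1}{\sqrt{x}} + 3(2 - \sqrt{x})\right)$$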

Example
$y = x e^x \sin 3x$, find $\frac{dy}{dx}$.
$$\frac{dy}{dx} = x e^x \cdot 3\cos 3x + x \sin 3x \cdot e^x + e^x \sin 3x \cdot 1$$
$$\frac{dy}{dx} = e^x\left(3x\cos 3x + (x + 1)\sin 3x\right)$$

Quotient Rule
For two different functions of x which are divided use the quotient rule
$$\frac{d}{dx}\left(\frac{u}{v}\right) = \frac{v\frac{du}{dx} - u\frac{dv}{dx}}{v^2}$$

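Example
$y = \frac{1 + 3x}{\sqrt{2 - 5x}}$, find $\frac{dy}{dx}$.
Let $u = 1 + 3x \Rightarrow \frac{du}{dx} = 3$; let $v = \sqrt{2 - 5x} \Rightarrow \frac{dv}{dx} = \tfrac{1}{2}(2 - 5x)^{-1/2} \times (-5)$, $v^2 = 2 - 5x$
$$\frac{dy}{dx} = \frac{\sqrt{2 - 5x} \times 3 - (1 + 3x)\left(\frac{-5}{2\sqrt{2 - 5x}}\right)}{2 - 5x}$$
This can be tidied up by multiplying top and bottom by $2\sqrt{2 - 5x}$:
$$\frac{dy}{dx} = \frac{6(2 - 5x) + 5(1 + 3x)}{2(2 - 5x)^{3/2}} = \frac{17 - 15x}{2(2 - 5x)^{3/2}}$$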

Parametric differentiation
When x, y are defined in terms of a third variable, for example t, parametric differentiation can be
used to obtain $\frac{dy}{dx}$:
$$\frac{dy}{dx} = \frac{dy}{dt} \Big/ \frac{dx}{dt}$$
Example
Plot the graph of the parametric curve defined by $x = t^3 - t$, $y = 4 - t^2$. Find $\frac{dy}{dx}$.
It is possible to plot the parametric curve using values of t to find the corresponding values of x and y.

  t    -2   -1   0   1   2   and so on
  x    …
  y    …


$$\frac{dx}{dt} = 3t^2 - 1, \qquad \frac{dy}{dt} = -2t$$
$$\frac{dy}{dx} = -\frac{2t}{3t^2 - 1}$$

Implicit differentiation
Sometimes the equation we want to differentiate is not in the standard form y = f(x), but it is still
possible to differentiate and determine the gradient $\frac{dy}{dx}$ by remembering that y is still a function of
x, y = f(x), so $\frac{dy}{dx} = f'(x)$.

Example
Consider the equation of a circle centred at the origin with radius 5,
$$x^2 + y^2 = 25$$
Differentiating, $2x + 2y\frac{dy}{dx} = 0$
(using the chain rule for $y^2$ and remembering y is a function of x).
Rearranging
$$\frac{dy}{dx} = -\frac{2x}{2y} = -\frac{x}{y}$$
Replacing y by $y = \pm\sqrt{25 - x^2}$,
$$\frac{dy}{dx} = \mp\frac{x}{\sqrt{25 - x^2}}$$
At x = 3, $\frac{dy}{dx} = \mp\frac{3}{\sqrt{25 - 3^2}} = -\frac{3}{4}$ or $\frac{3}{4}$.
When x = 3 there are 2 corresponding y values on the circle (y = 4 and y = -4), hence the two options for the gradient.

4.2 Integration
This is sometimes thought of as ‘reverse differentiation’ and there is a connection between the two.
(For those interested it is described in the Fundamental Theorem of Calculus) Like differentiation it
arises from a practical problem, this time of finding the area of irregular shapes. In later courses you
will also learn numerical methods for dealing with differentiation and integration and there are a
number of packages such as Derive to perform the calculations.

In the first shape a combination of rectangles and triangles could be used to calculate the area under
the shape, but the second shape needs the more sophisticated technique of integration.

The theory of integration is derived by summing smaller and smaller vertical slices of the shape. The
sign used for integration of a function y = f(x) is $\int y\,dx$ and a similar table to the one for
differentiation of well known results can be obtained.

  y = f(x)           ∫ y dx
  a                  ax
  x                  ½x^2
  x^2                ⅓x^3
  x^3                ¼x^4
  x^n (n ≠ -1)       x^(n+1)/(n+1)
  sin x              -cos x
  cos x              sin x
  e^x                e^x
  1/x                ln x

We also need to use a constant of integration for indefinite integration,
c, to allow for a possible shift in the function in the y direction.

Indefinite integration follows the same combining rules as differentiation.

Combinations of powers of x can be integrated one at a time.
Example
$y = x^2 + x - 7$
$\int y\,dx = \tfrac{1}{3}x^3 + \tfrac{1}{2}x^2 - 7x + c$
Multiples of the functions are just carried through.
Example
$y = 7\sin x - 3x^2 = 7(\sin x) - 3(x^2)$
$\int y\,dx = -7\cos x - 3(\tfrac{1}{3}x^3) + c = -7\cos x - x^3 + c$

There are two types of integration: indefinite, where we just find the expression for the integration as
in the table and the examples, and definite, where we go on to use this answer to find the actual area
under a curve between two boundaries. This area can represent many different quantities depending
on the original function, so that it may be distance travelled (in a velocity/time graph) or the consumer
surplus (in a price/quantity demand graph).

Definite integration
This has an extra step of evaluating the integration at both boundaries so finding the area between
them by subtraction of two evaluations of the integral, one at each boundary.

Example
Calculate the area under the curve $y = -x^2 + 10x$ between x = 2 and x = 8.
This is written as
$$\text{Area} = \int_2^8 y\,dx = \int_2^8 (-x^2 + 10x)\,dx = \left[-\tfrac{1}{3}x^3 + 10 \cdot \tfrac{1}{2}x^2\right]_2^8$$
Using our rules from the table we show we have solved but NOT evaluated the integral by using
square brackets. We then evaluate by substituting x = 8 first and then x = 2.
$$= \left[-\tfrac{1}{3}x^3 + 5x^2\right]_2^8 = \left(-\tfrac{1}{3}8^3 + 5(8)^2\right) - \left(-\tfrac{1}{3}2^3 + 5(2)^2\right) = (-512/3 + 320) - (-8/3 + 20)$$
= 300 – 504/3 = 300 – 168 = 132 sq units
(Note: checking this from the graph, the scale on the y axis is in 4’s so each square represents 4 sq
units.)
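The "summing vertical slices" idea can be used as a numerical check on this answer. The Python sketch below adds up thin rectangles whose heights are taken at the middle of each slice; the function name and the choice of 1000 slices are assumptions for illustration only.

```python
def slice_area(f, a, b, slices):
    """Approximate the area under f between a and b with thin rectangles."""
    width = (b - a) / slices
    # height of each rectangle is taken at the midpoint of its slice
    heights = (f(a + (k + 0.5) * width) for k in range(slices))
    return sum(heights) * width


area = slice_area(lambda x: -x**2 + 10 * x, 2, 8, 1000)
print(area)   # close to the exact value of 132 sq units
```

Using more slices brings the estimate closer to the exact value found by integration.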
Areas below the axis are negative, which leads to some interesting results in calculating total areas.

Example
$$\int_0^{2\pi} \sin x\,dx = \left[-\cos x\right]_0^{2\pi} = (-\cos 2\pi) - (-\cos 0) = -1 + 1 = 0$$
The negative area below the axis has cancelled out the
positive one above. To find the actual area covered here
you need to split the integration into 2 parts and recognise
the second one is negative:
$$\int_0^{2\pi} \sin x\,dx = \int_0^{\pi} \sin x\,dx + \int_{\pi}^{2\pi} \sin x\,dx$$
The first part evaluates to 2 and the second to -2, so the actual area covered is 2 + 2 = 4 sq units.

This emphasizes the importance of sketching the curve to be integrated and the need to understand
what is represented by the integration in any particular problem domain.

Integration by parts
Differentiating uv, where u and v are functions of x, using the product rule gives
$$\frac{d(uv)}{dx} = u\frac{dv}{dx} + v\frac{du}{dx}$$
Integrating both sides with respect to x gives $uv = \int u\frac{dv}{dx}\,dx + \int v\frac{du}{dx}\,dx$, which can be rearranged to give
$$\int u\frac{dv}{dx}\,dx = uv - \int v\frac{du}{dx}\,dx$$
Example
Find $\int x\sin x\,dx$. Let u = x and let dv/dx = sin x.
Then du/dx = 1 and v = -cos x.
Therefore
$$\int x\sin x\,dx = x(-\cos x) - \int(-\cos x) \cdot 1\,dx = -x\cos x + \int\cos x\,dx = -x\cos x + \sin x + c$$

Example
Find $\int x e^{-2x}\,dx$. Let u = x and let $dv/dx = e^{-2x}$.
Then du/dx = 1 and $v = -\frac{e^{-2x}}{2}$.
Therefore
$$\int x e^{-2x}\,dx = x \cdot \left(-\frac{e^{-2x}}{2}\right) - \int\left(-\frac{e^{-2x}}{2}\right) \cdot 1\,dx = -\frac{x e^{-2x}}{2} + \int\frac{e^{-2x}}{2}\,dx$$
$$= -\frac{x e^{-2x}}{2} - \frac{e^{-2x}}{4} + c$$
$$= -\frac{e^{-2x}}{4}(2x + 1) + c$$


5: Numerical Methods – not for 2021/22


Numerical methods are useful when analytical methods are not possible. This section looks at three
instances of when numerical methods can be used. The numerical methods highlighted involve
approximation techniques and as such there will be an error associated with them.

5.1 Root finding


Aim: To solve f(x) = 0 for x, when an explicit analytical solution is impossible.

[Figure: a curve f(x) crossing the x axis at an unknown point]

Bisection method
This method applies to any function that is continuous on an interval [a, b] and has the property that
f(a) and f(b) have different signs. This implies that f has a zero in [a, b]. Let $x_0$ be the mid-point
in [a, b]. If $f(x_0) = 0$ we have the solution, otherwise:
• If $f(x_0)$ is -ve then choose the a or b which makes f(a) or f(b) positive and continue to find
another mid-point.
• If $f(x_0)$ is +ve then choose the a or b which makes f(a) or f(b) negative and continue to find
another mid-point.
Continue until the correct degree of accuracy is reached.
This method never fails - so is useful for that reason. However you do need two values a and b to
start with, not just one like Newton-Raphson. a and b should be close to the root, else it can take a
long time.

Example
Suppose g(x) represents the profit from the sale of bananas, and is measured in thousands of dollars,
and x is measured in thousands of kg, and $g(x) = x^5 + x^3 + x^2 - 1$ for x in [0, 1].
Since g(0) = -1 and g(1) = 2 there is a number d in [0, 1] where g(d) = 0. This is the break-even
point where there is neither a profit nor a loss. We need to try to find d. The equation does not
factorise so we will try the interval bisection method. Perform 5 iterations after the initial
bisection.

  x_n                                g(x_n)
  0                                  -1
  1                                  2
  x_0 = (0 + 1)/2 = 0.5              -0.59375
  x_1 = (0.5 + 1)/2 = 0.75           0.22168
  x_2 = (0.75 + 0.5)/2 = 0.625
  …

[Figure: graph of g(x) marking x0 = 0.5 where g(x0) = -0.59375 (-ve) and x1 = 0.75 where g(x1) = 0.22168 (+ve)]

Thus the break-even point is $0.703125 \times 10^3$ kg, 703 kg of bananas approximately. [NB. real root
closer to 0.699 – more iterations needed].
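The iterations for the banana example can be reproduced with a short Python sketch of interval bisection. The function name `bisect_midpoints` and its interface are assumptions made for this illustration; it records every mid-point so the sequence can be compared with the working above.

```python
def bisect_midpoints(f, a, b, iterations):
    """Return the mid-points x0, x1, ..., x_iterations of interval bisection."""
    if f(a) * f(b) > 0:
        raise ValueError("f(a) and f(b) must have different signs")
    midpoints = []
    for _ in range(iterations + 1):   # initial bisection plus the iterations
        mid = (a + b) / 2
        midpoints.append(mid)
        if f(mid) == 0:
            break                     # landed exactly on the root
        if f(a) * f(mid) < 0:
            b = mid                   # sign change lies in [a, mid]
        else:
            a = mid                   # sign change lies in [mid, b]
    return midpoints


def g(x):
    return x**5 + x**3 + x**2 - 1


mids = bisect_midpoints(g, 0, 1, 5)
print(mids)
```

The last value in the list is 0.703125, the break-even estimate quoted in the example.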

The Error Term and when to stop


There are many ways of defining when to stop.
• $|d - x_n|$ is the error $e_n$, and we can stop when this is less than a certain number, e.g. 0.005, or,
• $|x_n - x_{n-1}|$ is the difference between two iterations, and we could stop when this is less than
a certain number, say 0.005, or,
• we could stop when $|f(x)| < 0.005$, i.e. the height of the curve is less than 0.005.

[Figure: number line showing a0, d and b0 with the approximations x0, x_{n-1} and x_n]

Criterion 1:

Stop when the error $e_n = |d - x_n|$ is less than the required tolerance, where d is the actual root.

Suppose the first interval [a0, b0] has length M. The first approximation $x_0$ is the midpoint of this
interval and the error is $|d - x_0|$:
$$|d - x_0| \le \frac{M}{2} \quad\text{or}\quad e_0 \le \frac{M}{2}$$
The next approximation $x_1$ is half way between $x_0$ and $b_0$ (or $a_0$) so
$$|d - x_1| \le \frac{M}{2^2} \quad\text{or}\quad e_1 \le \frac{M}{2^2}$$
Similarly
$$|d - x_n| \le \frac{M}{2^{n+1}} \quad\text{or}\quad e_n \le \frac{M}{2^{n+1}}$$
and this gives us an upper bound on the error.

Criterion 2:

Stop when the difference between two iterations, $|x_n - x_{n-1}|$, is less than a certain number, say 0.005,
such that $|x_n - x_{n-1}| < 0.005$.

Suppose [a0, b0] has length M, then $|x_1 - x_0| = \frac{M}{2^2}$. Therefore, $|x_n - x_{n-1}| = \frac{M}{2^{n+1}}$.


Solve for n to find how many iterations are needed, or just keep iterating until $|x_n - x_{n-1}| < 0.005$.
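As a worked illustration of solving for n, take the banana example, where the starting interval [0, 1] has length M = 1 (the tolerance 0.005 is the one used above):

```latex
\frac{M}{2^{n+1}} < 0.005
\;\Longrightarrow\; 2^{n+1} > \frac{1}{0.005} = 200
\;\Longrightarrow\; n + 1 > \log_2 200 \approx 7.64
\;\Longrightarrow\; n \ge 7
```

As a check, $1/2^{8} \approx 0.0039 < 0.005$ while $1/2^{7} \approx 0.0078 > 0.005$, so the first mid-point satisfying the criterion is $x_7$.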

Criterion 3:
Stop when the absolute function value is less than a specified value
close to 0, i.e. $|f(x_n)| < 0.005$.

Therefore, just keep iterating until $|f(x_n)| < 0.005$, which meets the
required accuracy, and the root is found.

Example
Use the interval bisection method to solve $3x - \ln x = 5$, stopping where $|f(x)| < 0.02$.
Need to get it in the form f(x) = 0 first,
$$f(x) = 3x - \ln x - 5 = 0$$
Try the interval [1, 2] initially; look for an interval with a change in signs.

  x_n                              f(x_n) = 3x_n - ln x_n - 5
  a = 1                            -2
  b = 2                            0.3068528
  x_0 = 1.5                        -0.9054651
  x_1 = (1.5 + 2)/2 = 1.75         -0.3096157
  x_2 = (1.75 + 2)/2 = 1.875       -0.0036086

As $|f(x)| < 0.02$, stop. ∴ x = 1.875

Newton-Raphson Method
Numerical methods for solving equations basically involve finding an approximate value for the root
and improving this approximation to a required accuracy.

This method, named after Sir Isaac Newton and Joseph Raphson, utilises the line tangent to the graph
of a function, to approximate values for a zero of that function.

[Figure: two diagrams of a curve crossing the x-axis at d. The first shows the tangent at x1 meeting the axis at x2; the second shows a further step giving x3.]

Let d be a zero of the function and assume f is differentiable on an open interval containing d. Choose
a number $x_1$ which is close to d. Draw the tangent at $x_1$ and, provided it is not horizontal, hopefully it
will cut the x-axis at $x_2$, which is closer to d than $x_1$ was. Continue in this way to get as close as you
wish.

In the first diagram, $f'(x_1) = \frac{f(x_1)}{x_1 - x_2}$. Cross multiply: $f'(x_1) \times (x_1 - x_2) = f(x_1)$. Divide by $f'(x_1)$:
$x_1 - x_2 = \frac{f(x_1)}{f'(x_1)}$, thus
$$x_2 = x_1 - \frac{f(x_1)}{f'(x_1)}$$
and this is clearly a better approximation to d than $x_1$ was.

In general, if $x_n$ is an approximation to the root or zero of an equation, then a better approximation is
$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$$
(find it in the Formulae book)

The second diagram shows the iteration in a second stage.

When do we stop? We specify that we want the difference between two successive values $x_n$ and
$x_{n+1}$ to be less than ε; this is the same as saying $\left|\frac{f(x_n)}{f'(x_n)}\right| < \varepsilon$.

Example
Find an approximate solution, to an accuracy of 0.0001, to the equation cos x = x.
Firstly, this is not of the form f(x) = 0 so we must re-write it as cos x - x = 0.

We don’t have a starting point so have to find one by trial and error. We could sketch y = x and
y = cos x and see where they cross OR try x = 0 to give f(x) +ve, try x = π/2 giving f(x) –ve, so there is a
root somewhere in between.

Start at x = 1 in this case.

By and large, keep trying x = 0, ±1, ±2, ±3, etc. until you get a sign change.

  x             f(x)                 f'(x)          f(x)/f'(x)            x - f(x)/f'(x)
  1             -0.459697694         -1.84147099    0.2496361314          0.750363868
  0.750363868   -0.0189230738        -1.68190495    0.0112509769          0.739112891
  0.739112891   -4.64559998×10^-5    -1.67363254    2.775758638×10^-5     0.739085133

In the last step $\left|\frac{f(x)}{f'(x)}\right| < 0.0001$ (or $10^{-4}$), so the approximate root is 0.739085133 (0.739 to 3 dp).
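The table above can be reproduced with a few lines of Python. This is a sketch under the same assumptions as the example (start at x = 1, stop when the step f(x)/f'(x) is smaller than 0.0001); the function name and the `max_iter` safety limit are additions for illustration.

```python
import math


def newton_raphson(f, df, x, eps=1e-4, max_iter=50):
    """Repeat x <- x - f(x)/f'(x) until the step is smaller than eps."""
    for _ in range(max_iter):
        step = f(x) / df(x)           # fails if the tangent is horizontal
        x -= step
        if abs(step) < eps:
            return x
    raise RuntimeError("no convergence within max_iter iterations")


root = newton_raphson(
    lambda x: math.cos(x) - x,        # f(x) = cos x - x
    lambda x: -math.sin(x) - 1,       # f'(x) = -sin x - 1
    1.0,
)
print(root)
```

With these settings the loop stops after three steps, matching the three rows of the table.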

An alternative way is to ask for the root to be accurate to say 4 dps. Keep going until two successive
roots have the same 4dps. In this case 0.7391 is a root accurate to 4 dps.

Example
Use the Newton-Raphson method to find an approximate positive root of the equation $\frac{3}{8}x^3 + 4x = 1$
to an accuracy of 4 dps.
Write as $f(x) = \frac{3}{8}x^3 + 4x - 1 = 0 \Rightarrow f'(x) = \frac{9}{8}x^2 + 4$
Try x = 0: f(0) = -1; x = 1: f(1) = 27/8 = 3.375, so there is a root between 0 and 1.

  x      f(x)        f'(x)      f(x)/f'(x)     x - f(x)/f'(x)
  0.5    1.046875    4.28125    0.244525547    0.255474453

Continuing the iteration => Root 0.2486 (4dp)

When Newton-Raphson doesn’t work


1. If the initial value of $x_1$ is too far away from the root the tangent may take you further away
from the root, and thus successive values of $x_n$ will diverge. You might even end up close to
another root.

2. Or, if the tangent is too shallow, i.e. $f'(x)$ is close to zero, then $\frac{f(x)}{f'(x)}$ will be large and so the
approximation won’t converge.


5.2 Numerical differentiation


Recall: Mathematically, the derivative represents the rate of change of a dependent variable with
respect to an independent variable.
If y = f(x) the definition of the derivative of f(x) at $x_i$ is
$$\frac{dy}{dx} = \lim_{\Delta x \to 0} \frac{f(x_i + \Delta x) - f(x_i)}{\Delta x}$$
The derivative $\frac{dy}{dx}$ is also designated as $y'(x_i)$ or $f'(x_i)$.

Geometrically, the derivative is the slope of the tangent to the curve at $x_i$ (figure later in text).

Finite Difference Approximations of the First Derivative

Forward Difference Approximation of the First Derivative
$$f'(x_i) = \frac{f(x_{i+1}) - f(x_i)}{h} + O(h)$$
where $h = x_{i+1} - x_i$ is the step size and O(h) is the error of order of magnitude h.
It is termed a “forward” difference because it utilizes data at i and i+1 to estimate the derivative.

Backward Difference Approximation of the First Derivative
$$f'(x_i) = \frac{f(x_i) - f(x_{i-1})}{h} + O(h)$$

Centered Difference Approximation of the First Derivative
$$f'(x_i) = \frac{f(x_{i+1}) - f(x_{i-1})}{2h} + O(h^2)$$
Notice that the truncation error is of the order $h^2$ in contrast to the forward and backward
approximations that were of the order h. Consequently the centered difference approximation is
more accurate.

Example - Finite-Divided-Difference Approximations of Derivatives
Problem Statement: Use forward and backward difference approximations of O(h) and a centred
difference approximation of $O(h^2)$ to estimate the 1st derivative of
$$f(x) = -0.1x^4 - 0.15x^3 - 0.5x^2 - 0.25x + 1.2$$
at x = 0.5 with step size h = 0.5; repeat with h = 0.25.

[The 1st derivative calculated directly is $f'(x) = -0.4x^3 - 0.45x^2 - 1.0x - 0.25$ and can be used to
calculate the true value $f'(0.5) = -0.9125$.]

Solution:
For h = 0.5, the function can be employed to determine
  $x_{i-1} = 0$       $f(x_{i-1}) = 1.2$
  $x_i = 0.5$         $f(x_i) = 0.925$
  $x_{i+1} = 1.0$     $f(x_{i+1}) = 0.2$
Forward divided difference: $f'(0.5) = \frac{f(1.0) - f(0.5)}{0.5} = -1.45$, error e = 58.9%
Backward divided difference: $f'(0.5) = \frac{f(0.5) - f(0)}{0.5} = -0.55$, e = 39.7%
Central divided difference: $f'(0.5) = \frac{f(1.0) - f(0)}{2 \times 0.5} = -1.0$, e = 9.6%
For h = 0.25, the function can be employed to determine
  $x_{i-1} = 0.25$    $f(x_{i-1}) = 1.10351563$
  $x_i = 0.5$         $f(x_i) = 0.925$
  $x_{i+1} = 0.75$    $f(x_{i+1}) = 0.63632813$
Forward divided difference: $f'(0.5) = \frac{f(0.75) - f(0.5)}{0.25} = -1.155$, error e = 26.5%
Backward divided difference: $f'(0.5) = \frac{f(0.5) - f(0.25)}{0.25} = -0.714$, e = 21.7%

Central divided difference: $f'(0.5) = \frac{f(0.75) - f(0.25)}{2 \times 0.25} = -0.934$, e = 2.4%

For both step sizes, the centered difference approximation is more accurate than forward or backward
differences. Also, as predicted by the Taylor series analysis, halving the step size approximately
halves the error of the backward and forward differences and quarters the error of the centered
difference.
Formula for relative error, $e = 100 \times \left|\frac{\text{actual value} - \text{approximation}}{\text{actual derivative}}\right|$
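The figures in this example can be recomputed with a short Python sketch. The three helper names below are chosen for illustration; the polynomial, the point x = 0.5, the step sizes and the true value -0.9125 are those of the example.

```python
def f(x):
    return -0.1 * x**4 - 0.15 * x**3 - 0.5 * x**2 - 0.25 * x + 1.2


def forward(f, x, h):
    return (f(x + h) - f(x)) / h


def backward(f, x, h):
    return (f(x) - f(x - h)) / h


def centred(f, x, h):
    return (f(x + h) - f(x - h)) / (2 * h)


def relative_error(actual, approximation):
    """Percentage relative error, as in the formula above."""
    return 100 * abs((actual - approximation) / actual)


true_value = -0.9125                  # f'(0.5) from the exact derivative
for h in (0.5, 0.25):
    for name, rule in (("forward", forward), ("backward", backward), ("centred", centred)):
        estimate = rule(f, 0.5, h)
        print(h, name, round(estimate, 3), round(relative_error(true_value, estimate), 1))
```

Printing both step sizes side by side shows the pattern described above: halving h roughly halves the forward and backward errors and quarters the centred one.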

5.3 Numerical integration


The problem of numerical integration is that of estimating the number
$$I = \int f(x)\,dx$$
when the integration cannot be carried out exactly or when $f(x)$ is known only at a finite number of points.
Many functions, for example $\sin x^2$, $\frac{\cos x}{x}$ and $\frac{e^x}{x}$, cannot be integrated analytically. However, integration of such functions can be performed numerically.
To get an approximate value of $I$,
$$I = \int_a^b f(x)\,dx \cong \int_a^b f_n(x)\,dx$$
where
$$f_n(x) = c_0 + c_1 x + c_2 x^2 + \cdots + c_{n-1} x^{n-1} + c_n x^n$$
is a polynomial of degree $n$. The resulting formulas are called Newton-Cotes formulas.
Let us consider the simplest integration rule of this type.

Trapezoidal Rule
This rule is a particular case when $f(x)$ is approximated by a polynomial of first degree, namely $f_1(x) = c_0 + c_1 x$. Then we get the formula
$$I = \int_a^b f(x)\,dx \cong \int_a^b f_1(x)\,dx$$

Geometrically, this means that the area beneath the curve $y = f(x)$ is approximated by the shaded area $A$ beneath the straight line $y = f_1(x) = c_0 + c_1 x$.
Recap: the equation of a straight line passing through the points $(a, f(a))$ and $(b, f(b))$ can be written in the form
$$\frac{y - f(a)}{x - a} = \frac{f(b) - f(a)}{b - a}$$
or
$$y = f(a) + \frac{f(b) - f(a)}{b - a}(x - a)$$
Hence
$$y = f_1(x) = f(a) + \frac{f(b) - f(a)}{b - a}(x - a)$$
The area $A$, shown in the figure, under the straight line is an estimate of the definite integral of $f(x)$ between the limits $a$ and $b$, i.e.
$$I \cong \int_a^b \left[ f(a) + \frac{f(b) - f(a)}{b - a}(x - a) \right] dx$$
Rearranging the integrand and then integrating we obtain the following:
$$\begin{aligned}
I &\cong \int_a^b \left[ \frac{f(b) - f(a)}{b - a}\,x + f(a) - \frac{a f(b) - a f(a)}{b - a} \right] dx \\
&= \int_a^b \left[ \frac{f(b) - f(a)}{b - a}\,x + \frac{b f(a) - a f(b)}{b - a} \right] dx \\
&= \left[ \frac{f(b) - f(a)}{b - a}\,\frac{x^2}{2} + \frac{b f(a) - a f(b)}{b - a}\,x \right]_a^b \\
&= \frac{f(b) - f(a)}{b - a}\,\frac{b^2 - a^2}{2} + \frac{b f(a) - a f(b)}{b - a}\,(b - a) \\
&= [f(b) - f(a)]\,\frac{b + a}{2} + b f(a) - a f(b) \\
&= (b - a)\,\frac{f(a) + f(b)}{2}
\end{aligned}$$
The last expression represents the Trapezoidal Rule:
$$I = \int_a^b f(x)\,dx \cong (b - a)\,\frac{f(a) + f(b)}{2}$$

Introduce the following notation:
$$b - a = h, \quad a = x_0, \quad b = x_1, \quad f(x_0) = f_0, \quad f(x_1) = f_1,$$
then this rule can be written as
$$I \cong \frac{h}{2}(f_0 + f_1)$$
$$I = \frac{h}{2}(f_0 + f_1) + \text{Error}$$
Lagrange interpolation polynomials are used to determine the error, which can be evaluated as
$$E = -\frac{h^3}{12}\,f^{(2)}(\xi)$$
where $f^{(2)}(\xi)$ is the second derivative of the function, evaluated at some point $x = \xi$ in the interval $[a, b]$.

Example
Use the trapezoidal rule to integrate the function
$$f(x) = 0.2 + 25x - 200x^2 + 675x^3 - 900x^4 + 400x^5$$
from $a = 0$ to $b = 0.8$. Compare the answer to the exact value of the integral.

$f(0) = 0.2$, $f(0.8) = 0.232$

Trapezoidal rule:
$$I \cong \frac{h}{2}(f_0 + f_1) = \frac{0.8}{2}[0.2 + 0.232] = 0.1728$$
Exact solution
$$I = \int_0^{0.8} f(x)\,dx = 1.64053$$
The error is:
$$E = 1.6405 - 0.1728 = 1.4677$$
$$\%E = \frac{1.6405 - 0.1728}{1.6405} \times 100 = 89.47\%$$

Why is the error so large? Have a look at the graph of the function and the Trapezoidal approximation of the area below the function.

How can the accuracy be improved?

Composite Trapezoidal Rule


We can divide the interval into smaller intervals/segments with the aid of equidistant points
$$x_0 = a < x_1 < x_2 < \cdots < x_{n-1} < x_n = b,$$
where the step size is
$$h = x_{i+1} - x_i = \frac{b - a}{n}, \quad i = 0, 1, 2, \ldots, n - 1$$
Then
$$I = \int_a^b f(x)\,dx = \int_{x_0}^{x_1} f(x)\,dx + \int_{x_1}^{x_2} f(x)\,dx + \cdots + \int_{x_{n-1}}^{x_n} f(x)\,dx$$
$$\cong h\,\frac{f(x_0) + f(x_1)}{2} + h\,\frac{f(x_1) + f(x_2)}{2} + \cdots + h\,\frac{f(x_{n-1}) + f(x_n)}{2}$$
If we rearrange we get
$$I \cong (b - a)\,\frac{f(x_0) + 2\sum_{i=1}^{n-1} f(x_i) + f(x_n)}{2n}$$
that is, (Width) × (Average Height), or
$$I \cong \frac{b - a}{2n}\,[f(x_0) + 2(f(x_1) + f(x_2) + \cdots + f(x_{n-1})) + f(x_n)]$$
$$I \cong \frac{h}{2}\,[f_0 + 2(f_1 + f_2 + \cdots + f_{n-1}) + f_n] \qquad \text{Composite Trapezoidal Rule}$$
The error of the Composite Trapezoidal Rule is given by
$$E = -\frac{(b - a)^3}{12n^3} \sum_{i=1}^{n} f^{(2)}(\xi_i) \cong -\frac{(b - a)^3}{12n^2}\,f^{(2)}(\xi)$$
where $f^{(2)}(\xi)$ is the second derivative of the function, evaluated at some point $x = \xi$ in the interval $[a, b]$.

Example
Integrate the function
$$f(x) = 0.2 + 25x - 200x^2 + 675x^3 - 900x^4 + 400x^5$$
using the composite trapezoidal rule with $n = 2$, from $a = 0$ to $b = 0.8$, and estimate the error.

$$h = \frac{b - a}{n} = \frac{0.8 - 0}{2} = 0.4$$
$x_0 = 0$, $x_1 = 0.4$, $x_2 = 0.8$
so $f_0 = f(0) = 0.2$, $f_1 = f(0.4) = 2.456$, $f_2 = f(0.8) = 0.232$
Application of the formula gives
$$I \cong \frac{0.8}{4}\,[0.2 + 2(2.456) + 0.232] = 1.0688$$
The actual and relative percentage errors are
$E_a = 1.6405 - 1.0688 = 0.5717$ and $E_\% = 34.9\%$ respectively.
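
Both trapezoidal examples can be reproduced in a few lines. This is a minimal Python sketch rather than code from the notes; `composite_trapezoid` is a name chosen here, and taking n = 1 recovers the single-segment rule.

```python
def f(x):
    """Polynomial used in the trapezoidal rule examples."""
    return 0.2 + 25 * x - 200 * x**2 + 675 * x**3 - 900 * x**4 + 400 * x**5


def composite_trapezoid(func, a, b, n):
    """Composite trapezoidal rule with n equal segments of width h."""
    h = (b - a) / n
    interior = sum(func(a + i * h) for i in range(1, n))
    return h / 2 * (func(a) + 2 * interior + func(b))


EXACT = 1.640533  # exact value of the integral quoted in the notes

for n in (1, 2, 4, 8, 16):
    estimate = composite_trapezoid(f, 0.0, 0.8, n)
    percent_error = 100 * abs((EXACT - estimate) / EXACT)
    print(f"n = {n:2d}  I = {estimate:.4f}  error = {percent_error:.1f}%")
```

Increasing n shows how using more segments improves the accuracy.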

There are other approaches to numerical integration besides the Trapezoidal rule. There are also
many other numerical methods that further studies will present.


6: Data and Statistics

Data and Information


A datum (the singular of data) is a sequence of alphanumeric characters that can exist without
reference to context or meaning. When data are given meaning or placed within a context they contain
information.

Example
02083318431 contains information once you recognise it as a phone number 0208 331 8431

Example
Once CV116PG is put on an envelope it conveys the information that the post code CV11 6PG
contains.
As we have already noted many of the roles in computing, including the systems analyst and software
developer, are involved in extracting information and interpreting input, output and processes.
Information will often be observed, measured, presented or collected in the form of data i.e.
collections of alphanumeric characters.

Parameters and Statistics


Parameters are measures or numerical values that can describe a ‘population’. Here population does
not necessarily mean people but just the collection of objects of interest.

Example
The maximum age for a man currently in the UK is 111 years. Here the maximum is a parameter: an interesting value describing a characteristic of the population of men in the UK. If the measurement is made on only a sample of the population it is called a statistic, e.g. the maximum age for a sample group of 500 men is 102 years.
In practice we are usually concerned with the calculation of statistics on the samples we have and
sometimes use our results to estimate the population parameters.
A statistic is a number or fact that summarizes a collection of data and is derived from it by various
arithmetic measures. The study of statistics is concerned with scientific methods for collecting,
organising, summarising, presenting and analysing data and with drawing valid conclusions and
making reasonable decisions on the basis of such analysis.
We can divide these methods into descriptive statistics and statistical inference. In the first we are just trying to describe or characterize the information we have; in the second we try to use our statistics to give us extra insight, usually for prediction or for applying our results to a wider context.

Example
In a recent Mathematics lecture a survey was taken. From 80 students:
10 were asleep
50 said it was a fascinating lecture
15 said it was alright
5 said “Oh is this Mathematics?”
We could conclude from this

a) one eighth of the students in Mathematics sleep (this would be in the area of descriptive statistics
as we are simply describing the existing situation)

b) At the University of Greenwich 1 out of every 16 students does not know what they are doing (this would be statistical inference, trying to extend our result from the class to a wider group to infer something for the wider student population)

Statistical Inference has to be carefully made and often its application is governed by rules relating
the data you have, the sample set, to the population, the wider set of all possible data.
Whenever you apply statistical methods either for description or inference you should be aware of
and state the limitations of your results. When you input data into a computer, calculator or program
and obtain a result the restrictions of its applications should be clearly expressed to the user.

Graphical Representation of data


Example
In 1987 the UK General Election produced

Party       No. of MPs
Cons        375
Labour      229
Alliance    22
Other       24
Total       650

Pie Chart
[Figure: pie chart of MPs by party: Cons 58%, Lab 35%, Alliance 3%, Other 4%]
Cons take 375/650 = 0.58 = 58% of the pie
Lab take 229/650 = 0.35 = 35% of the pie, etc.
We can easily see here the relative share of the seats each party obtained.

Bar Charts
These are useful for comparing the performance of one party with the rest. The length of the bar (or height if it is drawn the other way round) represents the number of MPs each party has.

[Figure: horizontal bar chart of the number of MPs (0 to 400) for Cons, Lab, Alliance and Other]

Both the examples above only had one number for each category – all 229 Labour MPs were counted as one sector or bar. More usually we have several results that can be grouped under the same category.

Organization of Data
We might have obtained the previous list from a spreadsheet of all 650 MPs in that election giving the MP’s name and party

MP                 Party
Tony Blair         Labour
William Hague      Conservative
Edward Heath       Conservative
Gordon Brown       Labour
Charles Kennedy    Alliance
Winifred Ewing     Other
Jeffrey Archer     Conservative
(all other data)

Using the “COUNTIF” function in Excel as a formula at the bottom of the 650 names as shown here we can group the MPs by their party.

375 = COUNTIF(B2:B651,"Cons")
229 = COUNTIF(B2:B651,"Lab")
22 = COUNTIF(B2:B651,"Alliance")
24 = COUNTIF(B2:B651,"Other")

We often have large numbers of observations or other data that we have to group together before we can analyze it efficiently.
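
The same grouping can be done outside a spreadsheet. The sketch below is an illustration in Python, not part of the notes; it uses only the seven sample rows listed above (so the counts are for that extract, not for all 650 MPs) and the standard library `collections.Counter`, which plays the role of COUNTIF here.

```python
from collections import Counter

# Extract of the (MP, party) rows shown in the notes; the full sheet has 650 rows.
rows = [
    ("Tony Blair", "Labour"),
    ("William Hague", "Conservative"),
    ("Edward Heath", "Conservative"),
    ("Gordon Brown", "Labour"),
    ("Charles Kennedy", "Alliance"),
    ("Winifred Ewing", "Other"),
    ("Jeffrey Archer", "Conservative"),
]

# Count how many rows fall into each party category.
party_counts = Counter(party for _, party in rows)

for party, count in party_counts.most_common():
    print(f"{party:<12} {count}")
```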

Types of Data

Discrete
So far all the examples we have seen used values that we can refer to as being discrete; this means that there are gaps between the data values or categories. For discrete data, some values in the range are not possible.

Example
A survey to determine the most popular shirt collar size was undertaken with male office workers. The range of values reported was between size 15 and size 17½. So the sizes formed a set of discrete data {15, 15½, 16, 16½, 17, 17½}.

There were no values in between and on a number line they would be represented as isolated points

15 15½ 16 16½ 17 17½

There are no values in between 16 and 16½, so if an office worker has a neck size of 16¼ they have to choose to wear either a 16 or a 16½.

Similarly the MPs were only in one party, they couldn’t be halfway between a Conservative MP and
a Labour one.

Continuous
Continuous data as its name suggests could be shown as a continuous line with all values possible.
Some examples are temperature and height.

Example
In an experiment to take repeated measurements of the temperature of a component in a laptop the minimum value was 21 °C and the maximum 24.7 °C.

The temperatures formed a set of continuous data. Represented on a number line any value could have been obtained in between the minimum and maximum.

21 24.7

In practice we are usually constrained by the precision of the measurement method, which makes the data discrete, for example by measuring to one decimal place so that there are no values recorded between 23.3 °C and 23.4 °C.

The range of data is the difference between the maximum and minimum values.
In the last example the range = 24.7 °C – 21 °C = 3.7 °C

Categorical
Sometimes the results we are given to analyze are not numerical; this is particularly true in the field of market research. Answers are given from a number of discrete non-numerical categories. This was true of the MPs and their party.

Example
In a questionnaire respondents were asked to ring the categories appropriate to them
Sex: Male / Female
Status: Married / Single / Widowed / Divorced

Ranked
In the previous example no ranking is implied within the categories, but in other questionnaires there
is an ordering of the possible answers which could be given a numerical rank.

Example
In a questionnaire respondents were asked to rate the following items with one of three categories
which have a clear ranking.
Lecturer: Boring / OK / Interesting
Material: Easy / OK / Hard

Frequency Distributions
We shall now consider the process of organising data. We first sort the data into the different values which can be attained and calculate the frequency of each value (the number of times it has occurred). If there is a large range of values or the data is continuous the different values may be grouped into ranges (classes).

Example
A random sample of 20 individuals responded to the question 'How many hours of television do you watch each week?' as follows:

29 35 28 31
28 29 30 32
29 30 28 32
32 28 35 35
35 28 33 35

We convert these into frequencies using tally marks. This is one of the most primitive ways of counting as we make a mark each time we count a datum in that category.

Hours of TV    Tally    Frequency
28             IIIII    5
29             III      3
30             II       2
31             I        1
32             III      3
33             I        1
34                      0
35             IIIII    5
Total                   20
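
The tallying step can also be sketched in code. This is an illustrative Python example, not taken from the notes; the variable names are chosen here, and `collections.Counter` does the counting that the tally marks do by hand.

```python
from collections import Counter

# The 20 survey responses (hours of TV per week) from the example.
hours = [29, 35, 28, 31,
         28, 29, 30, 32,
         29, 30, 28, 32,
         32, 28, 35, 35,
         35, 28, 33, 35]

frequency = Counter(hours)

print("Hours  Tally   Frequency")
for value in range(min(hours), max(hours) + 1):
    count = frequency[value]  # a Counter gives 0 for values that never occur
    print(f"{value:<6} {'I' * count:<7} {count}")
print(f"Total          {sum(frequency.values())}")
```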


Using a Spreadsheet
Forming a frequency distribution from a set of numbers
in a spreadsheet package such as Excel can be achieved
in a number of ways. The data has to be copied in and
then you can either use the COUNTIF function or the
Histogram facility in the Data analysis option. Both
results are shown here.

Grouping data
If there are a large number of different values e.g. the results of an exam for 50 students, we do not
want a table which has a low frequency against each of a large number of values. We use classes to
join some of the data together.

Example
Here the quantity of data is large (50 values) and has a wide range so it is convenient to group the data into class intervals. Suppose we have repeated the television survey with 50 respondents and the results are as follows:

27 26 35 34 29 28 40 40 24 33
22 25 21 20 31 30 37 38 22 23
15 12 29 16 10 36 29 17 39 28
25 41 29 40 18 37 25 18 27 27
25 40 33 31 34 26 40 20 12 15

Maximum data value = 41


Minimum data value = 10
Range of values = 31

We shall aim for between 8 and 10 classes - so we shall have a class width of 4 hours

The class boundaries are really 9.5 - 13.5, 13.5 - 17.5, and so on (to ensure no datum can be in two adjacent classes).
The class mark is half way between the class boundaries. When using the histogram function in Excel it asks for bin ranges; for this you give the top value in each group, so it is useful to have a column containing these values to reference.
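
Grouping the 50 survey values into classes of width 4 can be sketched as follows. This Python example is an illustration only; the constant names are assumptions made here, and the class boundaries 9.5, 13.5, ..., 41.5 are the ones described above.

```python
# The 50 survey responses (hours of TV per week) from the example.
hours = [27, 26, 35, 34, 29, 28, 40, 40, 24, 33,
         22, 25, 21, 20, 31, 30, 37, 38, 22, 23,
         15, 12, 29, 16, 10, 36, 29, 17, 39, 28,
         25, 41, 29, 40, 18, 37, 25, 18, 27, 27,
         25, 40, 33, 31, 34, 26, 40, 20, 12, 15]

LOWER_BOUNDARY = 9.5  # lower boundary of the first class
CLASS_WIDTH = 4       # class width in hours
NUM_CLASSES = 8       # classes 9.5-13.5 up to 37.5-41.5

frequencies = [0] * NUM_CLASSES
for value in hours:
    # Index of the class whose boundaries contain this value.
    index = int((value - LOWER_BOUNDARY) // CLASS_WIDTH)
    frequencies[index] += 1

print("Class        Class mark  Frequency")
for i, count in enumerate(frequencies):
    low = LOWER_BOUNDARY + i * CLASS_WIDTH
    high = low + CLASS_WIDTH
    class_mark = (low + high) / 2
    print(f"{low:4.1f} - {high:4.1f}  {class_mark:10.1f}  {count:9d}")
```

Because the boundaries end in .5 and the data are whole numbers, no value can fall exactly on a boundary.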

This data can be represented pictorially by a histogram.


This is a set of rectangles with bases on the horizontal axes, with centres at the class mark and lengths
equal to the class interval size. The area of each rectangle is proportional to the class frequency.

Remember class boundaries must be chosen so that we have one more degree of accuracy than in the
data, this ensures that every datum can be put into exactly ONE class. The class mark at the midpoint
is the representative value for that class and can be used in calculations.

[Figure: histogram; horizontal axis: Hours, with class marks 11.5, 15.5, 19.5, 23.5, 27.5, 31.5, 35.5 and 39.5; vertical axis: Frequency, 0 to 12. Annotation: end of class interval 33.5; no datum will have this value.]

A frequency polygon is obtained by joining the mid-points of the top of each rectangle.


Cumulative frequency Tables


Here we calculate a running total of the frequency values.

The number of observations below a specified value is often required, e.g. the number of people who watch less than 30 hours of TV a week.

Hours of TV    Frequency    Cumulative Frequency
9.5-13.5       3            3
13.5-17.5      4            7
17.5-21.5      5            12
21.5-25.5      8            20
25.5-29.5      11           31
29.5-33.5      5            36
33.5-37.5      6            42
37.5-41.5      8            50

The cumulative frequency of 31 shows that 31 people watched up to 29.5 hours.

Statistics for summarising Data
We can often describe the way that data is distributed by stating:
(i) an average value (commonly Arithmetic Mean, Mode and Median)
(ii) the shape of the distribution of the data
(iii) the spread of the data.

MEAN

Usually written as $\bar{x}$ for a sample or $\mu$ for the population

Example
Suppose that 6 students take a test and get scores 53, 72, 45, 80, 85 and 52. The mean mark is the sum of these numbers divided by 6 = (53 + 72 + 45 + 80 + 85 + 52) ÷ 6 = 64.5

With n values given as $x_1, x_2, \ldots, x_n$ the mean is
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$
using the Greek capital sigma $\sum$ to show we have summed all terms like $x_i$ where i takes the values from 1 to n

Example
Now consider the data in the first example on the number of hours of TV viewing.

Mean = ( 29 + 35 + 28 +............ + 35 + 35 ) ÷ 20 using the original ‘raw’ data and summing all
20 numbers separately then dividing by the number of data items, this gives the answer 31.1

We could also get this number from the frequency table using more efficient grouping

Mean = ( 5 × 28 + 3 × 29 + 2 × 30 + 1 × 31 + 3 × 32 + 1 × 33 + 5 × 35 ) ÷ 20
= ( 140 + 87 + 60 + 31 + 96 + 33 + 175 ) ÷ 20
= 31.1

So if data values are repeated so that value $x_i$ occurs $f_i$ times (for $i = 1, \ldots, k$), then $n = f_1 + f_2 + \cdots + f_k$ is the total frequency and
$$\text{mean} = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \cdots + f_k x_k}{n} = \frac{\sum_{i=1}^{k} f_i x_i}{n} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}$$
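
As a check that the two routes give the same answer, here is a minimal Python sketch (an illustration, not part of the notes) that computes the mean of the 20 TV values first from the raw data and then from the frequency table.

```python
from collections import Counter

# The 20 survey responses (hours of TV per week) from the first example.
hours = [29, 35, 28, 31,
         28, 29, 30, 32,
         29, 30, 28, 32,
         32, 28, 35, 35,
         35, 28, 33, 35]

# Route 1: sum every raw value and divide by the number of values.
mean_from_raw = sum(hours) / len(hours)

# Route 2: build the frequency table, then use sum(f * x) / sum(f).
frequency = Counter(hours)
total_fx = sum(f * x for x, f in frequency.items())
total_f = sum(frequency.values())
mean_from_table = total_fx / total_f

print(f"Mean from raw data:        {mean_from_raw}")
print(f"Mean from frequency table: {mean_from_table}")
```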

Now consider data which has been grouped into classes. In this case we take the class mark to represent the class in the calculations. This is true of the second example of TV viewing. Here x = 11.5 represents the values in the class 9.5 – 13.5. This means we will lose some precision if we calculate the mean this way. The alternatives are to use the AVERAGE function in Excel or the Summary statistics provided, which will use all the raw data. All are shown here for comparison

=AVERAGE(A1:J5) 27.58

Example
Consider the frequency distribution of salaries in care work. They have been grouped into classes for
ease of calculation. In the raw data there would probably be 20 different values for the salaries.

Annual Salary (£)        Class Mark, x    Frequency, f    f×x
≤ 10000                  9500*            1               9500
> 10000 but ≤ 11000      10500            2               21000
> 11000 but ≤ 12000      11500            5               57500
> 12000 but ≤ 13000      12500            4               50000
> 13000 but ≤ 14000      13500            3               40500
> 14000 but ≤ 15000      14500            2               29000
> 15000 but ≤ 16000      15500            2               31000
> 16000                  16500*           1               16500
Total                                     20              255000

*In the absence of a lower or upper boundary we try to make a reasonable assumption.
So the mean salary = £ 255000 /20 = £ 12750 This will now only be an approximation of the answer
taken straight from the raw data.
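
The grouped-mean calculation in the salary table can be sketched in Python as below. This is an illustration only; the class marks for the two open-ended classes (9500 and 16500) are the assumed values marked with * in the table.

```python
# (class mark x, frequency f) pairs from the salary table.
# 9500 and 16500 are assumed class marks for the open-ended classes.
salary_classes = [
    (9500, 1),
    (10500, 2),
    (11500, 5),
    (12500, 4),
    (13500, 3),
    (14500, 2),
    (15500, 2),
    (16500, 1),
]

total_f = sum(f for _, f in salary_classes)
total_fx = sum(f * x for x, f in salary_classes)
mean_salary = total_fx / total_f

print(f"Total frequency = {total_f}")
print(f"Sum of f*x      = {total_fx}")
print(f"Mean salary     = £{mean_salary:.2f}")
```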

Advantages of the Mean


(i) Widely understood and calculation is easy
(ii) Uses all available data
(iii) Easy to use in further calculations,
e.g. in calculating an overall average for data from two or more sources.

Disadvantages of the Mean


(i) Can appear unrealistic
(ii) Uses all data - one extreme value can distort mean value.
(iii) Boundary value assumptions may not be accurate;
e.g. for a range such as "25 or more"

MODE
This is defined as the most frequently occurring value and so in a bar chart showing the frequency it will
be the values giving rise to the highest bar.


Advantages of Mode
(i) Easy to find
(ii) Best represents a 'typical' item so is practical
(iii) Easy to understand

Disadvantages of Mode
(i) Not well defined
(ii) Does not use all values
(iii) Not useful if observations are spread out
(iv) Unsuitable for further calculations

MEDIAN
The value of the middle term of the sorted data

Example
6, 7, 9, 15, 23: the median is 9, the middle value.

50% of the observations are above/below the median

Calculation for ungrouped data:


If there is an odd number of observations choose middle value

If there is an even number of observations then average the two middle values

Example
6, 7, 9, 15, 20, 23: the two middle values are 9 and 15, so the median is (9 + 15) ÷ 2 = 12
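
The two rules for ungrouped data can be written as a short function. This Python sketch is an illustration, not code from the notes; `median_of` is a name chosen here, and the result is compared with `statistics.median` from the standard library.

```python
import statistics


def median_of(values):
    """Median of ungrouped data: middle value, or mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]  # odd number of observations
    return (ordered[middle - 1] + ordered[middle]) / 2  # even number


odd_example = [6, 7, 9, 15, 23]
even_example = [6, 7, 9, 15, 20, 23]

print(median_of(odd_example), statistics.median(odd_example))
print(median_of(even_example), statistics.median(even_example))
```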
For grouped data we need to construct a cumulative frequency polygon (ogive). The median is the
observation which represents the 50% or n/2 value.

Median: For grouped data calculate the % cumulative totals of each group

MILEAGE         MILEAGE upper limits    NUMBER OF CARS    Cumulative
('000 miles)    ('000 miles)            (frequency)       frequency     %
up to 3         3.5                     16                16            4
4 - 6           6.5                     40                56            14
7 - 9           9.5                     94                150           38
10 - 12         12.5                    96                246           62
13 - 15         15.5                    62                308           77
16 - 18         18.5                    44                352           88
19 - 24         24.5                    34                386           97
25 and over     30.5                    14                400           100

Now plot the cumulative frequency percentage against the upper limits of each group to give

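[Figure: cumulative frequency polygon (ogive); horizontal axis: '000 miles, 0 to 30; vertical axis: % cumulative frequency, 0 to 100; the lower quartile (L.Q.), median (Med.) and upper quartile (U.Q.) are marked on the horizontal axis]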

Advantages of the Median


(i) A few extreme values do not distort the median
e.g. for wages there are usually a few large earners that have a disproportionate effect on the mean
value, to make it not really a central value.
(ii) Can calculate when extreme values are unknown.

Disadvantages of the Median
(i) Cannot use for further calculations
(ii) Does not use all the data

Measures of dispersion
Consider this simple example where both sets of data have a mean of 100:

A    40    50    100    150    160    $\bar{x}$ = 100
B    98    99    100    101    102    $\bar{x}$ = 100

To distinguish them we need to discuss the spread of the data.

Range
This is probably the easiest measurement of spread.
range = MAXIMUM VALUE - MIN.VALUE
for sample A 160 - 40 = 120
for sample B 102 - 98 = 4

OK to use but very loosely defined and easily distorted by one extreme value.

Inter-Quartile Range
If using median use the Inter-Quartile range for the spread

Find Upper Quartile (75% value) and Lower Quartile (25% value)
IQ range = UQ - LQ
We need to find out how the values are spread about the mean.

The standard deviation


This looks at the difference between the data values and the mean.

Using sample A to illustrate the process we can see how the formula originates.

x                    40      50      100    150     160     $\bar{x}$ = 100
x − $\bar{x}$        −60     −50     0      50      60      (DEVIATION)
(x − $\bar{x}$)²     3600    2500    0      2500    3600    (DEVIATION)², Σ(x − $\bar{x}$)² = 12200

$$(\text{Average Deviation})^2 = \frac{\sum (x - \bar{x})^2}{n} = \frac{12200}{5} = 2440$$
This is also known as the variance, written σ² for a population or s² for a sample.
$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \text{Average Deviation} = \sqrt{2440} = 49.4$$
and the average deviation is given the technical name of the standard deviation.

In practice Σ(x − $\bar{x}$)² is awkward to calculate so an equivalent formula is used to calculate the standard deviation
$$\text{Standard deviation} = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2}$$
So for sample A: Σx² = 62200, n = 5, $\bar{x}$ = 100
$$\sqrt{\frac{62200}{5} - 100^2} = \sqrt{2440} = 49.4$$
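
The two formulas can be compared directly. The following Python sketch is an illustration, not part of the notes; it computes the standard deviation of sample A from the definition and from the short-cut formula, and cross-checks against `statistics.pstdev`, the standard library function that also divides by n.

```python
import math
import statistics

sample_a = [40, 50, 100, 150, 160]
n = len(sample_a)
mean = sum(sample_a) / n

# Definition: average of the squared deviations from the mean, then square root.
variance = sum((x - mean) ** 2 for x in sample_a) / n
sd_definition = math.sqrt(variance)

# Short-cut formula: sqrt(sum(x^2) / n - mean^2).
sum_of_squares = sum(x * x for x in sample_a)
sd_shortcut = math.sqrt(sum_of_squares / n - mean ** 2)

# Library cross-check (population standard deviation, dividing by n).
sd_library = statistics.pstdev(sample_a)

print(f"variance = {variance}")
print(f"standard deviation: {sd_definition:.1f}, {sd_shortcut:.1f}, {sd_library:.1f}")
```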

For grouped data the standard deviation is given by the formula
$$\text{Standard deviation } \sigma = \sqrt{\frac{\sum f x^2}{\sum f} - \bar{x}^2}$$

A lot of calculators have these functions built in and provided you understand how to input your data
in pairs they will calculate the mean and standard deviation easily.
Spreadsheets such as Excel or statistical packages such as MiniTab will do this and a lot more.

Example
Consider the frequency distribution of the salaries of office juniors. Mean = £10 205.88

Annual Salary (£)        Class Mark x    f    x − $\bar{x}$    (x − $\bar{x}$)²    f(x − $\bar{x}$)²
≤ 8000                   7500            1    −2705.88         7321786.6           7321786.6
> 8000 but ≤ 9000        8500            3    −1705.88         2910026.6           8730079.7
> 9000 but ≤ 10000       9500            5    −705.88          498266.6            2491332.9
> 10000 but ≤ 11000      10500           3    294.12           86506.6             259519.7
> 11000 but ≤ 12000      11500           2    1294.12          1674746.6           3349493.1
> 12000 but ≤ 13000      12500           2    2294.12          5262986.6           10525973.1
> 13000 but ≤ 14000      13500           1    3294.12          10851226.6          10851226.6
> 14000                  14500           0

Σf = 17, Σf(x − $\bar{x}$)² = 43529411.7
Standard deviation = √(43529411.7 ÷ 17) = £1600.17

Example
Using the short cut version of the formula, calculate the mean and standard deviation for this distribution of test marks for 200 students

Class      Class mark    f      fx      fx²
10 –       15            18     270     4050
20 –       25            34     850     21250
30 –       35            58     2030    71050
40 –       45            42     1890    85050
50 –       55            24     1320    72600
60 –       65            10     650     42250
70 –       75            6      450     33750
80 – 90    85            8      680     57800
Totals                   200    8140    387800

Mean = 8140 ÷ 200 = 40.7 marks
$$\text{Standard deviation} = \sqrt{\frac{387800}{200} - 40.7^2} = \sqrt{282.51} = 16.81 \text{ marks}$$
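
The grouped calculation for the test marks can be reproduced with the short-cut formula. This Python sketch is an illustration only; the class marks and frequencies are the ones in the table and the variable names are chosen here.

```python
import math

# Class marks and frequencies from the test-marks table.
class_marks = [15, 25, 35, 45, 55, 65, 75, 85]
frequencies = [18, 34, 58, 42, 24, 10, 6, 8]

total_f = sum(frequencies)
total_fx = sum(f * x for x, f in zip(class_marks, frequencies))
total_fx2 = sum(f * x * x for x, f in zip(class_marks, frequencies))

mean = total_fx / total_f
# Short-cut formula for grouped data: sqrt(sum(f x^2) / sum(f) - mean^2).
standard_deviation = math.sqrt(total_fx2 / total_f - mean ** 2)

print(f"Totals: f = {total_f}, fx = {total_fx}, fx^2 = {total_fx2}")
print(f"Mean = {mean:.1f} marks")
print(f"Standard deviation = {standard_deviation:.2f} marks")
```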

Probability Distributions
We have already looked at frequency distributions and generated histograms in Excel. If we revisit
the data obtained by asking 20 people how many hours they watched television we can display it in
a number of ways

29 35 28 31
28 29 30 32
29 30 28 32
32 28 35 35
35 28 33 35

Hours of TV Frequency Percentage Probability


28 5 25% 0.25
29 3 15% 0.15
30 2 10% 0.10
31 1 5% 0.05
32 3 15% 0.15
33 1 5% 0.05
34 0 0% 0.00
35 5 25% 0.25
Total 20 100% 1.00

Here we see that the column headed Frequency has counted how many people there were for each
value of the hours. We can express this as a percentage of the total number so that 5/20 becomes 25%
and this can also be used to show that the probability that someone from this group of 20 was watching
television for 28 hours was 0.25. Once the individual frequencies are converted to probabilities we
note that the total of these is 1.
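
Converting frequencies to probabilities is a one-line division. The Python sketch below is an illustration, not part of the notes; it uses `fractions.Fraction` so that the probabilities add up to exactly 1 without rounding error.

```python
from collections import Counter
from fractions import Fraction

# The 20 survey responses (hours of TV per week).
hours = [29, 35, 28, 31,
         28, 29, 30, 32,
         29, 30, 28, 32,
         32, 28, 35, 35,
         35, 28, 33, 35]

frequency = Counter(hours)
total = len(hours)

# Probability of each value = its frequency divided by the total frequency.
probability = {value: Fraction(frequency[value], total)
               for value in range(min(hours), max(hours) + 1)}

print("Hours  Frequency  Percentage  Probability")
for value, p in probability.items():
    print(f"{value:<6} {frequency[value]:<10} {float(p):<11.0%} {float(p):.2f}")
print(f"Sum of probabilities = {sum(probability.values())}")
```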

Showing these graphically starting with the frequency polygon the only difference is in the scale of
the y axis. The third graph is known as a probability distribution. We note that in this case the curve
is made up from discrete points as there are no values in between the whole numbers
[Figure: three plots of the TV data against hours (25 to 37): number of people (0 to 6), percentage of people (0% to 30%) and probability (0.00 to 0.30)]

Continuous Probability Density Functions


We often deal with variables whose values are continuous, particularly as results of measurements or experiments. If we plot these in a probability curve we have a smooth line with all values covered. Here the area under the curve is 1 and such a function is known as a probability density function.
The most common of these is the Normal distribution which has a characteristic Bell shape.
The curve can be used to determine the probability of values or ranges of values occurring.

Example
In the normal distribution of temperatures shown the probability of a temperature > 19 °C is equal to the area under the curve to the right of the line as shown. This can be calculated by formula or from a set of statistical tables; in this case the answer is 0.27.

[Figure: normal distribution curve; horizontal axis: temperature (°C), 0 to 30; vertical axis: probability, 0.00 to 0.06]


7: Probability

Introduction to Probability
Examples of probability in everyday life
• 60% chance of precipitation
• chance of winning the UK lottery: the odds of picking all 6 correct numbers are 1 in 13,983,816 (there are 49!/(6!*(49-6)!) combinations of numbers)
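
The lottery figure can be verified directly from the factorial formula. This is a small Python sketch for illustration; `math.comb` (available from Python 3.8) computes the same number of combinations.

```python
import math

# Number of ways to choose 6 numbers from 49, using the factorial formula.
combinations = math.factorial(49) // (math.factorial(6) * math.factorial(49 - 6))

# The same count using the built-in combinations function.
assert combinations == math.comb(49, 6)

print(f"Combinations of 6 numbers from 49: {combinations:,}")
print(f"Probability of picking all 6: {1 / combinations:.10f}")
```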

Definitions

Probability:
Probability is a measure of the expectation that an event will occur or a statement is true. Probabilities are given a value between 0 (will not occur) and 1 (will occur).
e.g. the probability that it will rain = 60% = 0.6
The higher the probability of an event, the more certain we are that the event will occur.

Experiment:
An Experiment is a situation involving chance or probability that leads to results called outcomes.
For example
• Riding a bicycle – chance of me falling off!
• MATH1111 exam – what’s the chance that probability will come up in the exam?

Outcome:
An Outcome is the result of a single trial of an experiment.
For example
• Tossing a coin – the outcome could be either head or tails
• When throwing a standard die, with six faces numbered 1 to 6, a single time there are six possible outcomes

Event:
An Event is one or more outcomes of an experiment.
For example
• A coin tossed 10 times may have had the following frequency of the outcomes heads and tails:

A coin tossed 10 times
Number of Heads    Number of Tails
6                  4

Sample Space:
The Sample Space is the set of all possible outcomes of that experiment
For example
• if the experiment is tossing a coin, the sample space is the set {head, tail}.
• for tossing two coins, the sample space is {(head,head), (head,tail), (tail,head), (tail,tail)}.
• for tossing a single six-sided die, the sample space is {1, 2, 3, 4, 5, 6}.

Probability of an Event
An event A has the probability:
P(A) = (the number of ways event A can occur) / (the total number of possible outcomes)
P(A’) = 1 - P(A)

Example: Coins
Experiment: Tossing a coin with two sides – head, tail.
Outcome: The result of tossing the coin once.
Event: tossing a head or tossing a tail
Sample Space: {head, tail}

Both events have an equal chance of occurring,

• Probability of a head is 1 out of 2 = 0.5
  P(head) = 0.5
• Probability of a tail is 1 out of 2 = 0.5
  P(tail) = 0.5
Check:
P(head) + P(tail) = ½ + ½ = 1.0
• There are no other possible outcomes and the sum of the two probabilities is 1 which agrees with this.

Example: Dice
• A single 6-sided die is rolled.
• What is the probability of each outcome?
• What is the probability
  • of rolling an even number?
  • of rolling an odd number?
• Outcomes: The possible outcomes of this experiment are 1, 2, 3, 4, 5 and 6.

• Probabilities:
P(1) = (# of ways to roll a 1) / (total # of sides) = 1/6
P(2) = (# of ways to roll a 2) / (total # of sides) = 1/6
P(3) = (# of ways to roll a 3) / (total # of sides) = 1/6
P(4) = 1/6, P(5) = 1/6, P(6) = 1/6

P(even) = (# of ways to roll an even number) / (total # of sides) = 3/6 = 1/2
P(odd) = (# of ways to roll an odd number) / (total # of sides) = 3/6 = 1/2
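
The counting definition of probability translates directly into code. This Python sketch is an illustration, not part of the notes; `probability` is a helper name chosen here, and it assumes every outcome in the sample space is equally likely.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # outcomes of rolling one six-sided die


def probability(event):
    """P(event) = number of ways the event can occur / total number of outcomes."""
    favourable = event & sample_space
    return Fraction(len(favourable), len(sample_space))


even = {2, 4, 6}
odd = {1, 3, 5}

for face in sorted(sample_space):
    print(f"P({face}) = {probability({face})}")
print(f"P(even) = {probability(even)}")
print(f"P(odd)  = {probability(odd)}")
print(f"P(not even) = 1 - P(even) = {1 - probability(even)}")
```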

Example Summary
The probability of an event is the measure of the chance that the event will occur as a result of an
experiment.
The probability of an event A is the number of ways event A can occur divided by the total number
of possible outcomes.
The probability of an event A, symbolized by P(A), is a number between 0 and 1, inclusive, that measures the likelihood of an event in the following way:
• If P(A) > P(B) then event A is more likely to occur than event B.
• If P(A) = P(B) then events A and B are equally likely to occur.
• Impossible Event: P(A) = 0
• Certain Event: P(A) = 1

Addition rule 1 for probability


When two events, A and B, are mutually exclusive, the probability that A or B will occur is the sum of the probability of each event.
• P(A or B) = P(A) + P(B)
E.g. a single 6-sided die is rolled. What is the probability of rolling a 2 or a 5?
Probabilities: P(2) = 1/6, P(5) = 1/6
P(2 or 5) = P(2) + P(5) = 1/6 + 1/6 = 2/6 = 1/3

Addition rule 2 for probability


When two events, A and B, are non-mutually exclusive, the probability that A or B will occur is:
• P(A or B) = P(A) + P(B) - P(A and B)
In the rule, P(A and B) refers to the overlap of the two events.
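
The notes give no worked example for this rule, so the following Python sketch supplies a hypothetical one for illustration: for one roll of a die, let A be "an even number" and B be "a number greater than 3". The two events overlap (4 and 6), so the overlap must be subtracted once.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}


def probability(event):
    """Equally likely outcomes: favourable outcomes / all outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))


# Hypothetical events chosen for illustration (not from the notes).
a = {2, 4, 6}  # rolling an even number
b = {4, 5, 6}  # rolling a number greater than 3

p_a_or_b_rule = probability(a) + probability(b) - probability(a & b)
p_a_or_b_direct = probability(a | b)  # count the union directly

print(f"P(A) = {probability(a)}, P(B) = {probability(b)}, P(A and B) = {probability(a & b)}")
print(f"P(A or B) by the rule = {p_a_or_b_rule}")
print(f"P(A or B) by counting = {p_a_or_b_direct}")
```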

Multiplication rule 1 for probability


Two events, A and B, are independent if the fact that A occurs does not affect the probability of B
occurring.
When two events, A and B, are independent, the probability of both occurring is:
P(A and B) = P(A) · P(B)

Example: coin & die
A coin is tossed and a single 6-sided die is rolled.
Find the probability of landing on the head side of the coin and rolling a 3 on the die.
P(head) = 1 /2 P(3) = 1 /6
P(head and 3) = P(head) · P(3) = ½ * 1/6 = 1/12

Example: Pizza (with replacement)


A survey found that 9 out of 10 students like pizza. If three students are chosen at random with replacement, what is the probability that all three students like pizza?
• P(student 1 likes pizza) = 9/10
• P(student 2 likes pizza) = 9/10
• P(student 3 likes pizza) = 9/10
• P(student 1 and student 2 and student 3 like pizza) = 9/10 * 9/10 * 9/10 = 729/1000
Note – the same student could be picked more than once!

Multiplication rule 2 for probability


Two events are dependent if the outcome or occurrence of the first affects the outcome or occurrence
of the second so that the probability is changed.
The conditional probability of an event B in relationship to an event A is the probability that event
B occurs given that event A has already occurred. The notation for conditional probability is P(B|A)
[pronounced as The probability of event B given A].
When two events, A and B, are dependent, the probability of both occurring is:
P(A and B) = P(A) · P(B|A)

Example: defective computers


In a shipment of 20 computers, 3 are defective. Three computers are randomly selected and tested.
What is the probability that all three are defective if the first and second ones are not replaced after
being tested?
P(3 defectives) = P(1st defective and 2nd defective and 3rd defective)
= 3/20 * 2/19 * 1/18 = 6/6840 = 1/1140
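
The chain of conditional probabilities can be written as a loop. This Python sketch is an illustration, not part of the notes; it multiplies the probabilities for sampling without replacement and cross-checks the result by counting combinations.

```python
import math
from fractions import Fraction

total_computers = 20
defective = 3
selected = 3

# Multiply P(1st defective) * P(2nd defective | 1st) * P(3rd defective | 1st, 2nd).
p_all_defective = Fraction(1)
for drawn in range(selected):
    remaining_defective = defective - drawn
    remaining_total = total_computers - drawn
    p_all_defective *= Fraction(remaining_defective, remaining_total)

# Cross-check: one way to pick all 3 defectives out of C(20, 3) possible selections.
p_by_counting = Fraction(math.comb(defective, selected),
                         math.comb(total_computers, selected))

print(f"P(3 defectives) = {p_all_defective}")
print(f"By counting     = {p_by_counting}")
```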

Summary of probabilities
Event        Probability
A            P(A) ∈ [0,1]
not A        P(A’) = 1 – P(A)
A or B       P(A∪B) = P(A) + P(B) – P(A∩B)
             P(A∪B) = P(A) + P(B) if A and B are mutually exclusive
A and B      P(A∩B) = P(A|B)P(B) = P(B|A)P(A)
             P(A∩B) = P(A)P(B) if A and B are independent
A given B    P(A|B) = P(A∩B)/P(B) = P(B|A)P(A)/P(B)

Example: double dice
You decide to see how many times a "double" would come up when throwing 2 dice.
Each time the two dice are thrown is an Experiment.
It is an Experiment because the result is uncertain.

The Event you are looking for is a "double", where both dice have the same number. It is made up of
these 6 Sample Points:
{1,1} {2,2} {3,3} {4,4} {5,5} and {6,6}

The Sample Space is all possible outcomes (36 Sample Points):


{1,1} {1,2} {1,3} {1,4} ... {6,3} {6,4} {6,5} {6,6}

These are your Results:

Experiment    Is it a Double?
{3,4}         No
{5,1}         No
{2,2}         Yes
{6,3}         No
...           ...

After 100 Experiments, you had 19 "double" Events ... is that close to what you would expect?

Actual probability of a double is:


P(double) = 6/36 = 1/6 = 0.16667

Observed probability/chance of a double is:


P(doubles observed) = 19/100 = 0.19

Looks reasonable, would consider larger sample size.
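
The effect of a larger sample size can be explored with a simulation. The Python sketch below is an illustration, not part of the notes; `observed_double_rate` and the fixed seed are choices made here so that the run is repeatable, and the simulated results will not match the 19 out of 100 recorded above.

```python
import random
from fractions import Fraction
from itertools import product

# Exact probability: count the doubles among the 36 equally likely sample points.
sample_space = list(product(range(1, 7), repeat=2))
doubles = [(first, second) for first, second in sample_space if first == second]
p_double = Fraction(len(doubles), len(sample_space))


def observed_double_rate(trials, seed=0):
    """Throw two dice `trials` times and return the observed fraction of doubles."""
    rng = random.Random(seed)  # fixed seed so the experiment can be repeated
    hits = sum(rng.randint(1, 6) == rng.randint(1, 6) for _ in range(trials))
    return hits / trials


print(f"Actual P(double) = {p_double} = {float(p_double):.5f}")
for trials in (100, 1_000, 10_000, 100_000):
    print(f"{trials:>7} experiments: observed {observed_double_rate(trials):.4f}")
```

With more experiments the observed proportion usually settles closer to 1/6.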

Probability line
