MATH1179 Text Book Part 2 2021-2022
Mathematics for
Computer Science
Part 2
Notes 2
Dr Yvonne Fryer
September 2021
MATH1179
Course topics
A variety of topics will be introduced. Some may be familiar to you, others possibly less so. You will
be encouraged to read around and find out more about the topics.
Topics will include sets, number theory, linear algebra, calculus, graph theory… and more!
Tutorials
The tutorials allow you the opportunity to really make use of the tutors' help and guidance. The tutors
will be monitoring your completion of the tutorial exercises in the workbook.
There will be tutorial exercises accompanying every week of lectures - you must try to complete these
exercises - the most important reason is to develop your understanding. If you cannot do an exercise
then ask for help!
Working together on tutorial exercises is encouraged and strongly recommended – you can learn from
each other!
You should work through a selection of exercises and ask for help on ones you don’t understand how
to do.
Solutions will appear on Moodle a little while after the tutorial, after you have had time to try all the
exercises for yourself.
Attendance
It is expected that you attend all lectures and tutorials. Records of attendance are kept. There is always
a very strong correlation between attendance and marks.
If you cannot make a class, make sure you notify your absence online, and let your personal tutor know
if there is a long term problem.
Books
We do not “lecture from a book” and are therefore not expecting you to go out and buy a book.
There are some books and sites listed that may be useful for you, but it is not compulsory to buy
anything.
If you find any really good books or sites, let me know, so that I can recommend them to other
students!
www.mathcentre.ac.uk has useful resources and practice material, often referred to on the course.
Finally …
Finally, we want you to enjoy this course! We want to get you thinking in different ways.
Put the work in and contribute to lectures, and you will learn more and hopefully enjoy it while you
are doing it!
And ask me anything you want to ask!
1.1 Vectors
The word vector is used for a type of quantity that has size as well as direction. A vector quantity can
be depicted by an ‘arrow’ with the length of the shaft representing the size and the arrowhead its
direction.
Example
A ship is travelling west for 6 km.
The example shows a free vector as it describes the journey of the ship but does not give a starting
point. If we want to tie the vector to a particular place this is called a position vector. This also
enables us to combine vectors to build up a resultant ‘journey’.
Example
A ship is travelling west for 6 km
from Liverpool and then 3km north.
If we use our 2D coordinate system we can describe each vector in terms of its distance and direction
from the origin O at (0,0) and we can also use other formats to describe the vector
Example
Here we can see two position vectors p and q. They could also be described as the vectors going from the origin, O, to the points P(3,4) and Q(5,3) respectively or as vectors OP and OQ.
Another way of describing them is by giving their direction and length but in these cases those numbers are not going to be straightforward.
OP is length 5 and the angle is 53° with the horizontal
OQ is length 5.83 and the angle is 31° with the horizontal
[Figure: position vectors p and q drawn from the origin O to P(3,4) and Q(5,3)]
so $i = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $j = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$
Now we can also express OP as 3i + 4j and OQ as 5i + 3j
[Figure: vectors p and q with the unit vector j marked on the axes]
Negative numbers are used to show the reverse direction so the vectors r and s are given as
r = -2i - 1j and s = 2i - 3j
or $r = \begin{pmatrix} -2 \\ -1 \end{pmatrix}$ and $s = \begin{pmatrix} 2 \\ -3 \end{pmatrix}$
[Figure: vectors r and s]
[Figure: two panels showing vectors t and u, with t+u in the first panel and t-u (formed using -u) in the second]
We can see that to add and subtract vectors you simply deal with the i and j components separately.
Scaling a vector
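We can scale a vector by multiplying it by a scalar (an ordinary number). Here we can see
$p = \begin{pmatrix} 1 \\ 3 \end{pmatrix}$ and $q = \begin{pmatrix} 2 \\ 1 \end{pmatrix}$
$\tfrac{1}{2}p = \begin{pmatrix} 0.5 \\ 1.5 \end{pmatrix}$ and $-p = \begin{pmatrix} -1 \\ -3 \end{pmatrix}$
$2p = \begin{pmatrix} 2 \\ 6 \end{pmatrix}$ and $3q = \begin{pmatrix} 6 \\ 3 \end{pmatrix}$
so we multiply each component of the vector by the scalar
[Figure: vectors p, q, ½p, -p, 2p and 3q]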
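The vector operations described in this section can be sketched in Python. This is only an illustration: vectors are held as (x, y) tuples, and the function names `add`, `scale`, `length` and `angle_degrees` are our own choices, not part of the course notes.

```python
import math

def add(a, b):
    """Add two vectors by adding the i and j components separately."""
    return (a[0] + b[0], a[1] + b[1])

def scale(k, a):
    """Multiply each component of vector a by the scalar k."""
    return (k * a[0], k * a[1])

def length(a):
    """Length of the vector, from Pythagoras."""
    return math.hypot(a[0], a[1])

def angle_degrees(a):
    """Angle between the vector and the horizontal, in degrees."""
    return math.degrees(math.atan2(a[1], a[0]))

p = (1, 3)
q = (2, 1)
print(add(p, q))                     # (3, 4)
print(scale(2, p))                   # (2, 6)
print(scale(-1, p))                  # (-1, -3)
print(length((3, 4)))                # 5.0
print(round(angle_degrees((3, 4))))  # 53
```

Subtraction can be treated as adding the reversed vector, for example `add(p, scale(-1, q))`.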
1.2 Matrices
Matrices are a very important concept related to arrays in programming languages. We can use them
for such diverse purposes as describing adjacency in graph theory, implementing transformations in
geometry, coding and ciphers and solving simultaneous equations. In their simplest form they are
rectangular arrays of numbers arranged in rows and columns.
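$$A = \begin{pmatrix} 24 & 13 & 12 \\ -3 & 45 & 26 \\ 2 & -17 & 67 \end{pmatrix} \qquad B = \begin{pmatrix} 2.76 & 8.34 \\ 5.89 & 0.38 \end{pmatrix}$$
C and D are two further matrices, C with 5 rows and 2 columns and D with 2 rows and 4 columns.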
A matrix is said to have dimension m × n (m by n) where there are m rows and n columns so above
the dimensions of the matrices are
A is a 3×3 matrix, B is a 2×2 matrix, C is a 5×2 matrix, D is a 2×4 matrix.
Two matrices are equal if they have the same size and corresponding elements are equal.
An n × n matrix is called a square matrix, so A and B are square matrices but C and D are not.
Scalar Multiplication
This means you multiply each entry by a number.
Example
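$$A = \begin{pmatrix} 45 & 33 & 24 \\ 31 & 27 & 42 \end{pmatrix} \qquad B = \begin{pmatrix} 30 & 30 & 25 \\ 28 & 43 & 20 \end{pmatrix}$$
$$S = A + B = \begin{pmatrix} 45 & 33 & 24 \\ 31 & 27 & 42 \end{pmatrix} + \begin{pmatrix} 30 & 30 & 25 \\ 28 & 43 & 20 \end{pmatrix} = \begin{pmatrix} 75 & 63 & 49 \\ 59 & 70 & 62 \end{pmatrix}$$
$$3A = A + A + A = 3 \times A = 3\begin{pmatrix} 45 & 33 & 24 \\ 31 & 27 & 42 \end{pmatrix} = \begin{pmatrix} 135 & 99 & 72 \\ 93 & 81 & 126 \end{pmatrix}$$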
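Matrix addition and scalar multiplication both work entry by entry, which a short Python sketch can make concrete. Matrices are stored here as lists of rows; the helper names `mat_add` and `scalar_mul` are illustrative assumptions.

```python
def mat_add(a, b):
    """Add two matrices of the same size entry by entry."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def scalar_mul(k, a):
    """Multiply every entry of matrix a by the number k."""
    return [[k * x for x in row] for row in a]

A = [[45, 33, 24], [31, 27, 42]]
B = [[30, 30, 25], [28, 43, 20]]

print(mat_add(A, B))     # [[75, 63, 49], [59, 70, 62]]
print(scalar_mul(3, A))  # [[135, 99, 72], [93, 81, 126]]
```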
Matrix multiplication
This is the most complex procedure and is best illustrated by an example.
Example
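$A = \begin{pmatrix} 1 & 2 \\ -3 & 4 \end{pmatrix}$, $B = \begin{pmatrix} 2 & -1 \\ 0 & 5 \end{pmatrix}$ then $AB = \begin{pmatrix} 2 & 9 \\ -6 & 23 \end{pmatrix}$ but $BA = \begin{pmatrix} 5 & 0 \\ -15 & 20 \end{pmatrix}$
Each entry of AB comes from multiplying a row of A term by term with a column of B and adding, e.g. the top right entry of AB is 1×(-1) + 2×5 = 9.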
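The row-by-column procedure can be written out as a small Python function. This is a minimal sketch for matrices stored as lists of rows; the name `mat_mul` is our own.

```python
def mat_mul(a, b):
    """Multiply matrix a by matrix b (columns of a must equal rows of b)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]

A = [[1, 2], [-3, 4]]
B = [[2, -1], [0, 5]]

print(mat_mul(A, B))  # [[2, 9], [-6, 23]]
print(mat_mul(B, A))  # [[5, 0], [-15, 20]]
```

The two products differ, which shows that matrix multiplication is not commutative.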
Example
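If $A = \begin{pmatrix} 2 & 1 \\ 5 & 3 \end{pmatrix}$ then its multiplicative inverse $A^{-1} = \begin{pmatrix} 3 & -1 \\ -5 & 2 \end{pmatrix}$; this is because
$$A \times A^{-1} = \begin{pmatrix} 2 & 1 \\ 5 & 3 \end{pmatrix}\begin{pmatrix} 3 & -1 \\ -5 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 3 & -1 \\ -5 & 2 \end{pmatrix}\begin{pmatrix} 2 & 1 \\ 5 & 3 \end{pmatrix} = A^{-1} \times A$$
($\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ is the equivalent of 1 for 2x2 matrices and the inverse must work from left and right)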
We can use this to solve simultaneous equations
Simultaneous equations
We have learnt in semester one to express information with equations. Sometimes this results in 2
equations with 2 variables.
NOTE because matrix multiplication is NOT commutative you must keep the multiplications both
on the left of the expressions
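$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 33 - 26 \\ -55 + 52 \end{pmatrix}$$ (multiplying out both sides using matrix multiplication)
$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 7 \\ -3 \end{pmatrix}$$ so we can read off the answers x = 7, y = -3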
Inverse of a matrix
We solved our equation because we were using a matrix where we had already been given its inverse.
Only square matrices can have inverses and not all of them do. In general to find the inverse of a
large matrix by hand would be out of the scope of a course like this and a package such as MATLAB
would be used.
However for certain 2x2 matrices it is possible to calculate the inverse by hand using this formula
: Let $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ then $A^{-1} = \dfrac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$ (only if $ad - bc \neq 0$)
Example
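$C = \begin{pmatrix} 4 & 5 \\ 2 & 3 \end{pmatrix}$ then a = 4, b = 5, c = 2, d = 3, so substituting them in the formula we get
$$C^{-1} = \frac{1}{(4 \times 3) - (2 \times 5)}\begin{pmatrix} 3 & -5 \\ -2 & 4 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 3 & -5 \\ -2 & 4 \end{pmatrix}$$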
Example
We can use our previous calculation in the simultaneous equations: 4p + 5q = 22, 2p+3q = 8
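this becomes $\begin{pmatrix} 4 & 5 \\ 2 & 3 \end{pmatrix}\begin{pmatrix} p \\ q \end{pmatrix} = \begin{pmatrix} 22 \\ 8 \end{pmatrix}$ so
$$\begin{pmatrix} p \\ q \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 3 & -5 \\ -2 & 4 \end{pmatrix}\begin{pmatrix} 22 \\ 8 \end{pmatrix}$$ (multiplying both sides by the inverse)
$$\begin{pmatrix} p \\ q \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 66 - 40 \\ -44 + 32 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 26 \\ -12 \end{pmatrix} = \begin{pmatrix} 13 \\ -6 \end{pmatrix}$$ so that p = 13 and q = -6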
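The 2x2 inverse formula and its use for a pair of simultaneous equations can be sketched in Python. This sketch assumes integer matrix entries and uses `fractions.Fraction` so the answers stay exact; `inverse_2x2` and `solve_2x2` are illustrative names.

```python
from fractions import Fraction

def inverse_2x2(m):
    """Inverse of [[a, b], [c, d]] using 1/(ad - bc) * [[d, -b], [-c, a]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("no inverse exists because ad - bc = 0")
    factor = Fraction(1, det)
    return [[d * factor, -b * factor], [-c * factor, a * factor]]

def solve_2x2(m, rhs):
    """Solve m * (p, q) = rhs by multiplying rhs on the left by the inverse of m."""
    inv = inverse_2x2(m)
    return [
        inv[0][0] * rhs[0] + inv[0][1] * rhs[1],
        inv[1][0] * rhs[0] + inv[1][1] * rhs[1],
    ]

C = [[4, 5], [2, 3]]
p, q = solve_2x2(C, [22, 8])  # 4p + 5q = 22, 2p + 3q = 8
print(p, q)  # 13 -6
```

Substituting back gives 4(13) + 5(-6) = 22 and 2(13) + 3(-6) = 8, as required.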
Basics
We will illustrate all our examples using a triangle T and we will describe this by using a matrix
which gives the coordinates of each corner given as (x,y); these are (1,1), (4,1) and (4,3). Rewriting
T as a matrix, $T = \begin{pmatrix} 1 & 4 & 4 \\ 1 & 1 & 3 \end{pmatrix}$, as all the coordinates are written in the format $\begin{pmatrix} x \\ y \end{pmatrix}$.
[Figure: triangle T and its images FT, HT, RT, ST and T+v plotted on axes running from -6 to 6]
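Applying the matrices $F = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}$, $H = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}$, $R = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ and $S = \begin{pmatrix} 1.5 & 0 \\ 0 & 2 \end{pmatrix}$ to T gives
$$FT = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 4 & 4 \\ 1 & 1 & 3 \end{pmatrix} = \begin{pmatrix} -1 & -4 & -4 \\ 1 & 1 & 3 \end{pmatrix}$$
$$HT = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} 1 & 4 & 4 \\ 1 & 1 & 3 \end{pmatrix} = \begin{pmatrix} -1 & -4 & -4 \\ -1 & -1 & -3 \end{pmatrix}$$
$$RT = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}\begin{pmatrix} 1 & 4 & 4 \\ 1 & 1 & 3 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 3 \\ -1 & -4 & -4 \end{pmatrix}$$
$$ST = \begin{pmatrix} 1.5 & 0 \\ 0 & 2 \end{pmatrix}\begin{pmatrix} 1 & 4 & 4 \\ 1 & 1 & 3 \end{pmatrix} = \begin{pmatrix} 1.5 & 6 & 6 \\ 2 & 2 & 6 \end{pmatrix}$$
We can also use the vector $v = \begin{pmatrix} -7 \\ 2 \end{pmatrix}$ to move T to the position labelled T+v.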
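These products can be checked with a short Python sketch. The matrices are the ones given above; `mat_mul` and `translate` are helper names introduced here for illustration.

```python
def mat_mul(a, b):
    """Multiply matrix a by matrix b, both stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def translate(points, v):
    """Add the vector v to every column (vertex) of the 2-row matrix points."""
    return [[x + v[0] for x in points[0]], [y + v[1] for y in points[1]]]

T = [[1, 4, 4], [1, 1, 3]]   # columns are the vertices (1,1), (4,1), (4,3)
F = [[-1, 0], [0, 1]]        # reflection in the y axis
H = [[-1, 0], [0, -1]]       # rotation through 180 degrees
R = [[0, 1], [-1, 0]]        # rotation through 90 degrees clockwise
S = [[1.5, 0], [0, 2]]       # scaling by 1.5 in x and 2 in y

print(mat_mul(F, T))         # [[-1, -4, -4], [1, 1, 3]]
print(mat_mul(H, T))         # [[-1, -4, -4], [-1, -1, -3]]
print(mat_mul(R, T))         # [[1, 1, 3], [-1, -4, -4]]
print(mat_mul(S, T))         # [[1.5, 6.0, 6.0], [2, 2, 6]]
print(translate(T, (-7, 2))) # [[-6, -3, -3], [3, 3, 5]]
```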
Reflection
$F = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix}$ has the effect on T of reflecting it in the y axis.
The vertices of T are now in reverse order. To describe a reflection you must state the ‘mirror line’.
Rotation
$H = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}$ has the effect of rotating T about (0,0) through an angle of 180°.
$R = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ has the effect of rotating T about (0,0) through an angle of 90° clockwise.
The vertices of T remain in the same order.
To describe a rotation you must state the centre, the angle and whether clockwise or anticlockwise.
Scaling
$S = \begin{pmatrix} 1.5 & 0 \\ 0 & 2 \end{pmatrix}$ has the effect of scaling T by 1.5 in the x direction and 2 in the y direction.
To describe scaling you must give the scale factor in each direction
Translation by vector
$v = \begin{pmatrix} -7 \\ 2 \end{pmatrix}$ describes moving the triangle T 7 units to the left and 2 units up. This is a bit different from the matrix multiplication as the vector is applied to each point in turn.
The use of matrix multiplication to apply this translation to T is beyond the scope of this section but would involve enhancing the vectors representing the vertices of T and then using the 3x3 matrix
$$\begin{pmatrix} 1 & 0 & -7 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{pmatrix}$$
to apply the translation.
To describe a translation you must give the size and sign of the movement in each direction
Combinations of transformations
More than one transformation can be applied to a figure. This has the same effect as applying the
product of the individual transformations.
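As an illustration, the sketch below reflects T in the y axis (F) and then rotates the result 90° clockwise (R), and compares this with applying the single product matrix RF. The order matters: the transformation applied first is written on the right. The helper name `mat_mul` is our own.

```python
def mat_mul(a, b):
    """Multiply matrix a by matrix b, both stored as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

T = [[1, 4, 4], [1, 1, 3]]
F = [[-1, 0], [0, 1]]   # reflection in the y axis
R = [[0, 1], [-1, 0]]   # rotation through 90 degrees clockwise

step_by_step = mat_mul(R, mat_mul(F, T))  # apply F, then R
combined = mat_mul(R, F)                  # single matrix for "F then R"

print(combined)              # [[0, 1], [1, 0]]
print(step_by_step)          # [[1, 1, 3], [1, 4, 4]]
print(mat_mul(combined, T))  # [[1, 1, 3], [1, 4, 4]]
```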
A leading TV company is setting up a new cable TV company based at …
[Figure: network of towns, including Derby and Nottingham, with the links labelled by numbers such as 16 and 24]
Definition
A graph G consists of a set of vertices V and a set of edges E. Each edge e is associated with a pair
of distinct vertices u and v called the endpoints of e. We can write e = ( u, v ).
We shall only consider simple graphs - graphs which do not have edges which start and finish on the
same vertex and which can only have at most one edge between any two vertices.
A graph which has vertices a,b,c,d,e and edges (a,b), (a,c), (b,d), (c,e), (a,e)
[Figure: drawing of this graph]
The number of edges in the walk is called the length of the walk.
The first 2 columns are walks, the next trails and the
last column is paths from a to b.
A walk from a vertex v to itself with no repeated edges is called a cycle.
A graph is connected if given any two vertices there is a walk between them.
Example
A graph which is not connected:
G has vertices a, b, c, d, e and edges (a,b), (a,c), (d,e)
[Figure: drawing of the graph G]
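Whether a graph is connected can be tested by starting at any vertex and seeing which vertices can be reached. The Python sketch below uses a breadth-first search, which is one common way to do this (the notes do not prescribe a method); `is_connected` is an illustrative name and the graph is assumed to have at least one vertex.

```python
from collections import deque

def is_connected(vertices, edges):
    """Return True if every vertex can be reached from the first one."""
    adjacency = {v: set() for v in vertices}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    start = vertices[0]
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return len(seen) == len(vertices)

vertices = ["a", "b", "c", "d", "e"]
print(is_connected(vertices, [("a", "b"), ("a", "c"), ("d", "e")]))  # False
print(is_connected(vertices, [("a", "b"), ("a", "c"), ("b", "d"), ("c", "e"), ("a", "e")]))  # True
```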
Example
A tree
C:
Example
The directory system on the PC is a tree
A subgraph of a graph G is a graph all of whose vertices are in G and all of whose edges are in G.
Example
[Figure: a graph G on the vertices a, b, c, d, shown beside a subgraph of G]
Example
The graph G above has spanning trees
[Figure: spanning trees of G, each drawn on the vertices a, b, c, d]
An alternative way to represent a graph is by an adjacency matrix. Matrices are simply ways of
storing information in a grid formation.
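We can show that the number of walks from vertex $v_i$ to $v_j$ of length n is the (i, j)th entry in the matrix $A^n$.
In the above example
$$A^2 = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{pmatrix} \times \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 1 & 1 \\ 0 & 3 & 1 & 1 \\ 1 & 1 & 2 & 1 \\ 1 & 1 & 1 & 2 \end{pmatrix}$$
The number of walks of length 2 between $v_4$ and $v_2$ is 1: $v_2 v_3 v_4$
The number of walks of length 2 between $v_2$ and $v_2$ is 3: $v_2 v_1 v_2$, $v_2 v_3 v_2$, $v_2 v_4 v_2$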
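The walk counts can be reproduced by raising the adjacency matrix to a power. A minimal Python sketch follows; `mat_mul` and `mat_power` are our own helper names, and note that Python indexes from 0, so $v_4$ and $v_2$ are row 3 and column 1.

```python
def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    size = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]

def mat_power(a, n):
    """Return a multiplied by itself n times (n >= 1)."""
    result = a
    for _ in range(n - 1):
        result = mat_mul(result, a)
    return result

A = [
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
]

A2 = mat_power(A, 2)
print(A2)        # [[1, 0, 1, 1], [0, 3, 1, 1], [1, 1, 2, 1], [1, 1, 1, 2]]
print(A2[3][1])  # 1 walk of length 2 between v4 and v2
print(A2[1][1])  # 3 walks of length 2 from v2 back to v2
```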
A weighted graph is a graph which has a real number attached to each of its edges.
[Figure: a weighted graph with vertices A, B, C and D and a number attached to each edge]
In the first example of this section, the minimal spanning tree will give the least amount of cable
needed to connect the towns.
We use Kruskal's algorithm to find the minimal spanning tree for a weighted graph G with n
vertices.
Spanning Tree:
[Figure: the minimal spanning tree, made up of the edges e1, e2, e4, e5, e6 and e10]
The minimum length of cable is 16+16+12+11+15+20=90
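Kruskal's algorithm is usually stated as: take the edges in order of increasing weight and keep each one unless it would close a cycle, stopping once the tree spans every vertex. The Python sketch below follows that usual statement. The graph in it is a hypothetical example with made-up weights, not the cable network above, and the cycle test uses a simple "which component is this vertex in" lookup.

```python
def kruskal(vertices, edges):
    """Return the edges of a minimal spanning tree.

    edges is a list of (weight, u, v) tuples for a connected weighted graph.
    """
    parent = {v: v for v in vertices}  # each vertex starts in its own component

    def find(v):
        # Follow parent links up to the representative of v's component.
        while parent[v] != v:
            v = parent[v]
        return v

    tree = []
    for weight, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:           # different components, so no cycle is formed
            parent[root_u] = root_v
            tree.append((u, v, weight))
    return tree

# Hypothetical weighted graph, used only to illustrate the algorithm.
vertices = ["A", "B", "C", "D", "E"]
edges = [
    (7, "A", "B"), (5, "A", "D"), (8, "B", "C"), (9, "B", "D"),
    (7, "B", "E"), (5, "C", "E"), (15, "D", "E"),
]

tree = kruskal(vertices, edges)
print(tree)                        # [('A', 'D', 5), ('C', 'E', 5), ('A', 'B', 7), ('B', 'E', 7)]
print(sum(w for _, _, w in tree))  # 24
```

A spanning tree on n vertices always has n - 1 edges, which is why the result here has 4 edges for 5 vertices.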
Multigraphs
These are graphs which have loops or multiple edges.
[Figure: a multigraph on the vertices A and B]
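[Figure: expression tree for (p∧(q∨¬r)) ∨ (q∨(¬p∧r)), listed level by level, beside the same tree evaluated for p = T, q = F, r = F]
Level 1: (p∧(q∨¬r)) ∨ (q∨(¬p∧r)) = T
Level 2: p∧(q∨¬r) = T, q∨(¬p∧r) = F
Level 3: p = T, q∨¬r = T, q = F, ¬p∧r = F
Level 4: q = F, ¬r = T, ¬p = F, r = F
Level 5: r = F, p = T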
A binary search tree is a binary tree in which data is associated with each vertex. The data is
arranged so that for any vertex v in T, each data item on the left subtree of v is less than the data item
in v and the data in the right subtree of v is greater than the data item in v.
Example
Find a binary search tree for the words in the sentence 'Once upon a time there was an old man'. Use
alphabetical ordering and Once as the node.
Once
a upon
an time was
old there
man
Example
Put the numbers 34, 56, 23, 89, 12, 24, 54, 22, 67 in a binary search tree with 34 as the node.
34
23 56
12 24 54 89
22 67
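Both examples can be reproduced by inserting the items one at a time, going left when the new item is smaller than the current vertex and right otherwise. The Python sketch below is one simple way to do this; the names `Node`, `insert`, `build` and `levels` are illustrative, and the words are lower-cased so that plain alphabetical comparison works.

```python
class Node:
    """One vertex of a binary search tree."""
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def insert(root, value):
    """Insert value into the tree rooted at root and return the root."""
    if root is None:
        return Node(value)
    if value < root.value:
        root.left = insert(root.left, value)
    else:
        root.right = insert(root.right, value)
    return root

def build(values):
    """Build a tree by inserting the values in the order given."""
    root = None
    for value in values:
        root = insert(root, value)
    return root

def levels(root):
    """List the values level by level, left to right."""
    result = []
    current = [root] if root is not None else []
    while current:
        result.append([node.value for node in current])
        current = [
            child
            for node in current
            for child in (node.left, node.right)
            if child is not None
        ]
    return result

numbers = [34, 56, 23, 89, 12, 24, 54, 22, 67]
print(levels(build(numbers)))
# [[34], [23, 56], [12, 24, 54, 89], [22, 67]]

words = "once upon a time there was an old man".split()
print(levels(build(words)))
# [['once'], ['a', 'upon'], ['an', 'time', 'was'], ['old', 'there'], ['man']]
```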
3: Algorithms
Algorithms are often described as ‘Mathematical Recipes’, the solving of a problem with a finite
sequence of operations. You have already considered many such ‘recipes’ in Systems Building and
Programming. You have written them both in structured English and a programming language like
Java. In looking at networks earlier we used Kruskal’s Algorithm to find the minimum spanning
tree.
Example
1. Multiply 1. Naming the algorithm
2. READ x,y 2. Taking in inputs x and y
3. n ← 0 3. Initialising the variable, n (loop counter)
4. Answer ← 0 4. Initialising the variable Answer
5. DO 5. Start of the loop
6. Answer ← Answer + y 6. Adding y on to the current value of Answer
7. n ← n+1 7. Adding 1 on to the current value of n (updating counter)
8. LOOP UNTIL n =x 8. Test to see if we have finished the loop
9. DISPLAY Answer 9. Outputting the result
10. End Multiply 10. Ending the algorithm
Because computers do not think, we use algorithms as instruction sets to the machine to implement
our thoughts and designs. Here the algorithm (written in pseudo code or structured English) uses
successive addition to accomplish a multiplication in the same way that you once learnt that 4+4+4
= 3×4.
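The Multiply pseudocode translates almost line for line into Python. This is a sketch rather than a definitive version: Python has no DO ... LOOP UNTIL, so a `while True` loop with a test at the end is used instead, and a check that x is a whole number of at least 1 has been added as an assumption, because the loop body always runs once before the test.

```python
def multiply(x, y):
    """Multiply y by the positive whole number x using successive addition."""
    if not isinstance(x, int) or x < 1:
        raise ValueError("x must be a whole number of at least 1")
    n = 0                    # loop counter
    answer = 0
    while True:              # DO
        answer = answer + y  # add y on to the current value of answer
        n = n + 1            # update the counter
        if n == x:           # LOOP UNTIL n = x
            break
    return answer

print(multiply(3, 4))  # 12, the same as 4 + 4 + 4
```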
Definition
An algorithm may be defined as a sequence of instructions to solve a problem which has the following
properties:
The algorithm receives INPUT
The algorithm produces OUTPUT
The algorithm stops after a FINITE set of instructions have been executed
Each instruction in the algorithm is followed by a UNIQUE SUCCESSOR instruction.
This can be seen in the algorithm design for the above example, which is shown in the structure chart.
INPUT is x and y
OUTPUT is Answer
The algorithm will stop after the LOOP has executed exactly x times
It is always clear what the next instruction is in the algorithm.
[Figure: structure chart for Multiply, showing the loop (Answer ← Answer + y, n ← n + 1, until n = x) with the annotations INPUT, OUTPUT, UNIQUE SUCCESSOR and FINITE (as the loop stops after x executions)]
Example
Input Age
READ n
IF n<0 OR n>130
DISPLAY “Error- bad input”
ELSE
DISPLAY “ Age is” n
END IF
End Input Age
Here we have a selection operation as part of the algorithm but that does not invalidate our property
that there must be a UNIQUE SUCCESSOR as, given the data INPUT, there is only one choice
possible.
[Figure: structure chart for Input Age: READ n, then the test n<0 OR n>130, leading to DISPLAY “Error- bad input” when True and DISPLAY “ Age is” n when False, with the annotations INPUT, OUTPUT (dependent on n) and UNIQUE SUCCESSOR (instruction dependent on n)]
It is this last type of complexity that is of interest to us. The limitations of early computers, where
saving a few bytes of memory or reducing the number of instructions could mean the difference
between a program working or not, meant that complexity became synonymous with efficiency.
Computing complexity describes the two types of efficiency we can consider related to the
algorithm’s use of machine resources. Space complexity is dependent on the type of data structures
used in the algorithm and its effect on the memory capacity of the computer has become much less
important over recent years. Time complexity is still a vital issue especially with the emergence of
newer technologies and software which rely heavily on the internet.
Time Complexity
The time complexity of an algorithm can be expressed in terms of the number of primitive operations
used by an algorithm when the input has a certain size.
Definition
For any algorithmic solution A, we define the time complexity function $T_A(n)$ as the maximum
number of relevant primitive operations that have to be performed by A on a problem of size n.
Example [Fenton]
Suppose we have an array X of marks for n students and wish to change them from being out of 70
to a percentage then 2 possible algorithms are given here.
ScaleMarks A ScaleMarks B
READ X(array) READ X(array)
total←70 factor←100/70
FOR I = 1 To n FOR I = 1 To n
X(i) ←X(i) * 100/total X(i) ←X(i) * factor
NEXT i NEXT i
DISPLAY X DISPLAY X
End ScaleMarks End ScaleMarks
Example
Consider 2 algorithms for finding the sum of the rows of a matrix and then the sum of all elements in the matrix. How it works:
First each row is summed and the totals for each row, sum(i), are displayed.
Then the final total, Grandsum, of all the elements is calculated and displayed.
1 5 2 = 8
4 0 3 = 7
2 8 1 = 11
5 3 2 = 10
        36
The general element of the matrix A is written here as a(i,j) so for example the element on the second row in the third column is called a(2,3).
In this example the inner loop will add the rows so j will go from 1 to 3. The outer loop will total these row sums and so i will go from 1 to 4.
Both algorithms work by using a double loop but 2 students have decided to try and get the total
(Grandsum) by putting similar statements in different places in the algorithm
MatrixSum A MatrixSum B
When we analyse the time complexity of these algorithms we look to see how often a critical
operation (addition in this case) takes place.
In A we have both Sum and Grandsum calculated 12 (4×3) times so have 2(12) = 24 operations
In B we have Sum calculated 12 times but Grandsum only 4 times so have a total of 16 operations.
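The two counts can be reproduced with a Python sketch. It assumes, from the description above, that MatrixSum A adds each element to both Sum and Grandsum inside the inner loop, while MatrixSum B adds each finished row total to Grandsum in the outer loop. The function names and the explicit addition counter are our own.

```python
def matrix_sum_a(matrix):
    """Update the row sum and the grand sum for every element."""
    additions = 0
    grand_sum = 0
    row_sums = []
    for row in matrix:
        row_sum = 0
        for value in row:
            row_sum += value
            grand_sum += value
            additions += 2  # one addition for Sum, one for Grandsum
        row_sums.append(row_sum)
    return row_sums, grand_sum, additions

def matrix_sum_b(matrix):
    """Update the grand sum once per row, using the finished row sum."""
    additions = 0
    grand_sum = 0
    row_sums = []
    for row in matrix:
        row_sum = 0
        for value in row:
            row_sum += value
            additions += 1  # addition for Sum
        grand_sum += row_sum
        additions += 1      # addition for Grandsum
        row_sums.append(row_sum)
    return row_sums, grand_sum, additions

matrix = [[1, 5, 2], [4, 0, 3], [2, 8, 1], [5, 3, 2]]
print(matrix_sum_a(matrix))  # ([8, 7, 11, 10], 36, 24)
print(matrix_sum_b(matrix))  # ([8, 7, 11, 10], 36, 16)
```

For an n by n matrix the same reasoning gives 2n² additions for the first approach and n² + n for the second.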
MatrixSum C MatrixSum D
The difference in computer time in these cases will be negligible but consider the general case where
both i and j have been replaced by n allowing us to use this algorithm with any size matrix.
For all values of n greater than 1, $2n^2 > n^2 + n$, so we can say D is apparently guaranteed to execute faster than C.
Does this matter from a user’s perspective? If we take the case where n is 1000 and a computer only
capable of processing one instruction each microsecond, the comparison between C and D will be
about 2 seconds to 1 second, so not noticeably different. On a 100 000 by 100 000 matrix the difference
is significant, with about 6 hours as opposed to 3 hours.
Example
Travelling salesman problem- this comes from Graph theory in the case where we have n cities each
connected to all others where all distances are known. (this type of network is known as a complete
graph). The problem is to find the shortest route from a city back to itself passing through all other
cities just once.
e.g. For 4 cities the graph would be as shown ( although the distances would not be so regular) and
taking city A as start and finish there would be 3! (3×2×1) routes to compare.
(starting at A there are 3 choices of next town × 2 of next ×1of last)
[Figure: complete graph on the four cities A, B, C and D]
length ABCDA = DIST(1)
length ABDCA = DIST(2)
length ACBDA = DIST(3)
length ACDBA = DIST(4)
length ADCBA = DIST(5)
length ADBCA = DIST(6)
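A brute-force search over all routes can be sketched in Python using `itertools.permutations`. The distances below are hypothetical values chosen only for illustration, and `shortest_tour` is our own name; this is a sketch of the exhaustive approach described above, not a practical method for large n.

```python
from itertools import permutations

def shortest_tour(cities, dist, start):
    """Try every route from start through all other cities and back."""
    others = [c for c in cities if c != start]
    best_route, best_length = None, float("inf")
    for order in permutations(others):  # (n - 1)! orderings to compare
        route = (start, *order, start)
        total = sum(dist[frozenset(leg)] for leg in zip(route, route[1:]))
        if total < best_length:
            best_route, best_length = route, total
    return best_route, best_length

# Hypothetical distances between each pair of cities.
dist = {
    frozenset("AB"): 10, frozenset("AC"): 15, frozenset("AD"): 20,
    frozenset("BC"): 35, frozenset("BD"): 25, frozenset("CD"): 30,
}

route, length = shortest_tour(["A", "B", "C", "D"], dist, "A")
print(route, length)  # ('A', 'B', 'D', 'C', 'A') 80
```

Each route appears twice in the search (once in each direction), which is why ABDCA and ACDBA have the same length.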
This algorithm is designed to compute the shortest route in this case starting and ending at city A.
In general we can compute three versions of the function T, best case time, worst case time and
average case time but we will concentrate on worst case when the algorithm carries out its maximum
set of instructions. This is a well known problem that runs out of computer time if the number of
nodes (towns) gets too big. From the table you can see that with 10 nodes it is manageable as
10! = 3628800 ($3.6 \times 10^6$) but
16! = 20922789888000 ($2.1 \times 10^{13}$)
and you can see how quickly the number of operations gets too big to compute in reasonable time.
The table given here shows that although computers are incredibly fast at calculating, some
algorithms have a time complexity that grows in such a way to be impossible for large values.
When constructing algorithms for prototypes the scalability of the solution should be noted. The table
calculated by Norman Fenton shows execution times in relation to the number of operations and those
entries left blank exceed the expected life span of the universe!
Execution times for different algorithms assuming 1 execution per nanosecond [Fenton 1993 updated]
Time Complexity T(n) of algorithm in ascending order

| Size of input | $\log_2 n$ | $n$ | $n^2$ | $2^n$ | $n!$ | $n^n$ |
|---|---|---|---|---|---|---|
| 10 | 0.000 000 003 seconds | 0.000 000 01 seconds | 0.000 000 1 seconds | 0.000 001 seconds | 0.0036 seconds | 0.168 minutes |
| 100 | 0.000 000 007 seconds | 0.000 000 1 seconds | 0.000 01 seconds | $10^{11}$ centuries | $10^{143}$ centuries | $10^{182}$ centuries |
| 1000 | 0.000 000 01 seconds | 0.000 001 seconds | 0.001 seconds | - | - | - |
| 10 000 | 0.000 000 013 seconds | 0.000 01 seconds | 0.1 seconds | - | - | - |
| 100 000 | 0.000 000 017 seconds | 0.000 1 seconds | 0.168 minutes | - | - | - |
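Entries of this kind can be estimated with a few lines of Python, assuming (as the table does) one operation per nanosecond. The constant and function names are illustrative, and a century is taken as 100 years of 365.25 days.

```python
import math

SECONDS_PER_OPERATION = 1e-9                     # 1 execution per nanosecond
SECONDS_PER_CENTURY = 100 * 365.25 * 24 * 3600

def execution_seconds(operations):
    """Time in seconds needed for the given number of operations."""
    return operations * SECONDS_PER_OPERATION

print(execution_seconds(math.factorial(10)))              # about 0.0036 seconds
print(execution_seconds(10 ** 10) / 60)                   # about 0.17 minutes
print(execution_seconds(math.factorial(16)) / 3600)       # about 5.8 hours
print(execution_seconds(2 ** 100) / SECONDS_PER_CENTURY)  # about 4e11 centuries
```

Going from 10 to 16 cities in the travelling salesman example already moves the running time from milliseconds to hours.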
Orders of magnitude
In our example on matrices although the difference in the two algorithms was large when the matrix
was large, for the user they are regarded as the same order of magnitude. This is because at the low
end of the comparison there is no practical difference between waiting 1 and 2 seconds for the result
and at the upper end if you are going to wait 3hrs you might as well leave it to execute overnight
anyway and 6 hrs will make no difference.
Order of magnitude is an expression used by scientists to compare 2 values. When dealing with
ordinary numbers usually 2 values have the same order of magnitude if the ratio of the larger to the
smaller is less than 10:1. e.g. 57.3 is the same order of magnitude as 81.75 but NOT the same as
4.786. If they are not the same order of magnitude then the difference is based on the power of 10
involved.
Example
6245 is 2 orders of magnitude bigger than 48 as
6245 : 48 is bigger than $10^2$ : 1 but less than $10^3$ : 1
Sometimes units are taken into account rather than powers of 10 so that an hour might be considered
as 2 orders of magnitude bigger than a second.
In looking at the time complexity of an algorithm A we often compare $T_A(n)$ with a similar simple function which behaves the same for large n. This is the basis for the big O sets used in complexity where the O represents this dominating similar function. (We are just scratching the surface here and more can be found about this in most good books on the theory of computing.)
Example
If $T_A(n) = 2n + 1$ then it is said to be of the order of n or O(n); this is because as n increases 2n+1 grows at the same rate as n. Here n is the dominating similar function.
Example
If $T_A(n) = 6n^2 + 3n + 17$ then it is said to be $O(n^2)$
Using this big O notation means that we don’t directly compare T values, which ignore some
operations like initialising and formatting, but have a much simpler grading of complexity.
Example
Searching and Sorting algorithms have already been mentioned and used in the programming course.
• The Binary search algorithm has $O(\log n)$
• The Bubblesort algorithm has $O(n^2)$ in the worst case because for a list of n elements the algorithm will need at worst $O(n^2)$ comparisons to sort out the list.
• Quicksort also has $O(n^2)$ in the worst case but it can be proved that it has average case complexity of $O(n \log n)$ comparisons to sort out the list.
It can be seen that in investigating the big O value of an algorithm the only ones that will execute in a reasonable time for non trivial n are $O(n^k)$. These are said to have polynomial complexity.
Definition
A Turing Machine $M = \{S, I, f, s_0\}$ consists of S, a finite set of states, an alphabet I containing the
blank symbol B, a partial function, f, from S×I to S×I×{R,L}, and a starting state $s_0$.
This rather frightening definition can be interpreted in terms of a control head that can read and write
symbols on an infinite tape and move left or right and halt if the partial function is not defined. An
example of a Turing machine can be seen in the diagram below.
Here $S = \{s_0, s_1, s_2, s_3, s_4\}$, $I = \{B, 1, 0\}$ and at each position the circular control head can do 3 things simultaneously:
Read a symbol and write another over it
Change its internal state
Move one space left or right
Its moves are defined by the function f and would consist of quintuples (sets of 5) such as $(s_1, 1, s_2, 0, L)$, i.e. if the machine is in state $s_1$ and positioned by a 1 it changes to state $s_2$, writes over the 1 with a 0, and moves one space to the left.
Any time there is not a defined outcome for a pair (state, symbol) the machine HALTS.
[Figure: a tape reading B 0 1 B 1 B 0 0 1 B B, with a circular control head marked with the states s0, s1, s2, s3 and s4]
Turing invented his machine to mirror the functions that can be computed with an algorithm. For a
particular algorithm the Turing machine may be difficult to construct but the Church-Turing thesis
states that for any problem that can be solved by an effective algorithm there is an equivalent Turing
Machine that can solve this problem.
Stupid
INPUT Stupid
WHILE Halt(Stupid) = Yes DO
DISPLAY “This is weird”
LOOP
End Stupid
Thinking about the only two possible cases - Stupid terminates or Stupid doesn’t - shows us this is a
non-computable algorithm.
Case 1
Stupid terminates
Halt(Stupid) = Yes
WHILE DO LOOP never stops so Stupid doesn’t terminate
Case 2
Stupid doesn’t terminate
Halt(Stupid) = No
WHILE DO LOOP stops so Stupid terminates
4.1 Differentiation
By definition a straight line y = mx + c has the same slope or gradient at all points on the line. We can measure this in a practical way or calculate it using a triangle placed against it, i.e. looking at the differences in the y’s divided by the differences in the x’s. We have already looked at this when studying functions.
Wherever we form the triangle on the line the answer will be the same.
[Figure: the line y = 2x + 1 through (0,1) and (3,7). A triangle on the line gives gradient = 2/1 = 2, or (difference in y)/(difference in x) = (7 - 1)/(3 - 0) = 2. Two further lines are shown with m = -2 and m = 0.3.]
Gradient of a curve
There is not a single value for the gradient
when the function is represented by a curve
as the slope or gradient is different at each
point along the curve
Differentiation is a way of looking at the changes in y and x which measure the slope at any point.
Unlike the m of a straight line, this is not a fixed number: it varies according to the
location on the curve, so it is a function of x.
Definition
Let y = f(x) be the equation that describes the curve, then the expression for the gradient of the curve
is known as $\frac{dy}{dx}$. To find it at a particular point on the curve you must put in the x value for that point.
| x | y = f(x) | grad = 2x |
|---|---|---|
| -4 | 16 | -8 |
| -3 | 9 | -6 |
| -2 | 4 | -4 |
| -1 | 1 | -2 |
| 0 | 0 | 0 |
| 1 | 1 | 2 |
| 2 | 4 | 4 |
| 3 | 9 | 6 |
| 4 | 16 | 8 |

[Figure: graph of y = f(x) for x from -4 to 4]
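The gradient column can be checked numerically by measuring the slope of a very small triangle placed on the curve. The sketch below assumes f(x) = x², which matches the y values in the table, and uses a central difference with a small step h; the names `f` and `gradient` are our own.

```python
def f(x):
    """The curve from the table, y = x squared."""
    return x ** 2

def gradient(func, x, h=1e-6):
    """Estimate the slope at x from a small triangle of width 2h."""
    return (func(x + h) - func(x - h)) / (2 * h)

for x in range(-4, 5):
    # Print x, y and the estimated gradient, which should be close to 2x.
    print(x, f(x), round(gradient(f, x), 3))
```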
Example
a. $y = \sin^3 x = (\sin x)^3$ then $\frac{dy}{dx} = 3(\sin x)^2 \times \cos x = 3\sin^2 x \cos x$
or let $u = \sin x \Rightarrow \frac{du}{dx} = \cos x$. $y = u^3 \Rightarrow \frac{dy}{du} = 3u^2 \Rightarrow \frac{dy}{dx} = \frac{dy}{du} \times \frac{du}{dx} = 3\sin^2 x \cos x$
b. $y = \cos^2 5x = (\cos 5x)^2$ then $\frac{dy}{dx} = 2(\cos 5x)^1 \times (-\sin 5x) \times 5 = -10\cos 5x \sin 5x$
c. $y = (5x^2 - 10x)^4$ then $\frac{dy}{dx} = 4(5x^2 - 10x)^3 \times (10x - 10) = 40(5x^2 - 10x)^3 (x - 1)$
Product Rule
Where both $u$ and $v$ are functions of $x$, and the function to be differentiated is the product of the two, use the product rule
$$\frac{d(uv)}{dx} = u\frac{dv}{dx} + v\frac{du}{dx}$$
It applies for more than two functions of $x$ too, i.e. for the product of $u$, $v$, $w$,
$$\frac{d(uvw)}{dx} = uv\frac{dw}{dx} + uw\frac{dv}{dx} + vw\frac{du}{dx}$$
Example
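$y = e^{3x}(2 - \sqrt{x})^2$, find $\frac{dy}{dx}$
Let $u = e^{3x} \Rightarrow \frac{du}{dx} = e^{3x} \times 3 = 3e^{3x}$
$v = (2 - \sqrt{x})^2 \Rightarrow \frac{dv}{dx} = 2(2 - \sqrt{x})^1 \times \left(-\tfrac{1}{2}x^{-1/2}\right) = -\frac{(2 - \sqrt{x})}{\sqrt{x}}$
$$\frac{dy}{dx} = e^{3x} \times \left(-\frac{(2 - \sqrt{x})}{\sqrt{x}}\right) + (2 - \sqrt{x})^2 \times 3e^{3x} = e^{3x}(2 - \sqrt{x})\left(-\frac{1}{\sqrt{x}} + 3(2 - \sqrt{x})\right)$$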
Quotient Rule
For two different functions of x which are divided use the quotient rule
$$\frac{d}{dx}\left(\frac{u}{v}\right) = \frac{v\frac{du}{dx} - u\frac{dv}{dx}}{v^2}$$
Example
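$y = \dfrac{1 + 3x}{\sqrt{2 - 5x}}$, find $\frac{dy}{dx}$
let $u = 1 + 3x \Rightarrow \frac{du}{dx} = 3$, let $v = \sqrt{2 - 5x} \Rightarrow \frac{dv}{dx} = \tfrac{1}{2}(2 - 5x)^{-1/2} \times (-5)$, $v^2 = 2 - 5x$
$$\frac{dy}{dx} = \frac{\sqrt{2 - 5x} \times 3 - (1 + 3x)\dfrac{-5}{2\sqrt{2 - 5x}}}{2 - 5x}$$
This can be tidied up by multiplying top and bottom by $2\sqrt{2 - 5x}$
$$\frac{dy}{dx} = \frac{6(2 - 5x) + 5(1 + 3x)}{2(2 - 5x)^{3/2}} = \frac{17 - 15x}{2(2 - 5x)^{3/2}}$$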
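A result like this can be sanity-checked numerically by comparing the formula with the slope of a very small triangle on the curve. The Python sketch below does this at x = 0.2, a point chosen only because 2 - 5x is then 1; the function names are our own.

```python
import math

def y(x):
    """The function being differentiated, valid for x < 0.4."""
    return (1 + 3 * x) / math.sqrt(2 - 5 * x)

def dy_dx(x):
    """The tidied-up result of the quotient rule."""
    return (17 - 15 * x) / (2 * (2 - 5 * x) ** 1.5)

def numerical_gradient(func, x, h=1e-6):
    """Slope of a small triangle of width 2h centred on x."""
    return (func(x + h) - func(x - h)) / (2 * h)

print(dy_dx(0.2))                  # close to 7, since (17 - 3) / 2 = 7
print(numerical_gradient(y, 0.2))  # should agree to several decimal places
```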
Parametric differentiation
When $x$ and $y$ are defined in terms of a third variable, for example $t$, parametric differentiation can be used to obtain $\frac{dy}{dx}$.
$$\frac{dy}{dx} = \frac{dy}{dt} \Big/ \frac{dx}{dt}$$
Example
Plot the graph of the parametric curve defined by $x = t^3 - t$, $y = 4 - t^2$. Find $\frac{dy}{dx}$.
It is possible to plot the parametric curve using values of $t$ to find the corresponding values of $x$ and $y$.
t: -2 -1 0 1 2 and so on
x: …
y: …
$$\frac{dx}{dt} = 3t^2 - 1, \qquad \frac{dy}{dt} = -2t$$
$$\frac{dy}{dx} = -\frac{2t}{3t^2 - 1}$$
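The points needed for the plot, and the gradient at each one, can be generated with a short Python sketch. The function names `x_of`, `y_of` and `dy_dx` are illustrative; the gradient is undefined where 3t² - 1 = 0, which does not happen at the whole-number values of t used here.

```python
def x_of(t):
    return t ** 3 - t

def y_of(t):
    return 4 - t ** 2

def dy_dx(t):
    """Gradient of the curve at parameter t, from (dy/dt) / (dx/dt)."""
    dx_dt = 3 * t ** 2 - 1
    dy_dt = -2 * t
    if dx_dt == 0:
        raise ZeroDivisionError("gradient undefined where dx/dt = 0")
    return dy_dt / dx_dt

for t in range(-2, 3):
    # Print t, the point (x, y) on the curve, and the gradient there.
    print(t, x_of(t), y_of(t), dy_dx(t))
```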
Implicit differentiation
Sometimes the equation we want to differentiate is not in the standard form $y = f(x)$, but it is still possible to differentiate and determine the gradient $\frac{dy}{dx}$ by remembering that $y$ is still a function of $x$, $y = f(x)$, so $\frac{dy}{dx} = f'(x)$
Example
Consider the equation of a circle centred at the origin with radius 5,
$$x^2 + y^2 = 25$$
Differentiating, $2x + 2y\frac{dy}{dx} = 0$
(using the chain rule for $y^2$ and remembering $y$ is a function of $x$).
Rearranging
$$\frac{dy}{dx} = -\frac{2x}{2y} = -\frac{x}{y}$$
Replacing $y$ by $y = \pm\sqrt{25 - x^2}$,
$$\frac{dy}{dx} = \mp\frac{x}{\sqrt{25 - x^2}}$$
At $x = 3$, $\frac{dy}{dx} = \mp\frac{3}{\sqrt{25 - 3^2}} = -\frac{3}{4}$ or $\frac{3}{4}$
When $x = 3$ there are 2 corresponding y values on the circle (y = 4 and y = -4), hence the two options for the gradient.
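The two gradients can be confirmed with a very small Python sketch that uses dy/dx = -x/y directly at each of the two points on the circle. The function name `circle_gradient` is our own.

```python
def circle_gradient(x, y):
    """Gradient of x^2 + y^2 = 25 at the point (x, y), from dy/dx = -x / y."""
    if x ** 2 + y ** 2 != 25:
        raise ValueError("point is not on the circle")
    if y == 0:
        raise ZeroDivisionError("tangent is vertical where y = 0")
    return -x / y

print(circle_gradient(3, 4))   # -0.75
print(circle_gradient(3, -4))  # 0.75
```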
4.2 Integration
This is sometimes thought of as ‘reverse differentiation’ and there is a connection between the two.
(For those interested it is described in the Fundamental Theorem of Calculus) Like differentiation it
arises from a practical problem, this time of finding the area of irregular shapes. In later courses you
will also learn numerical methods for dealing with differentiation and integration and there are a
number of packages such as Derive to perform the calculations.
The theory of integration is derived by summing smaller and smaller vertical slices of the shape. The
sign used for integration of a function y = f(x) is ∫ y dx and a similar table to the one for
differentiation of well known results can be obtained.
We also need to use a constant of integration, c, for indefinite integration, to allow for a possible shift in the function in the y direction.

| y = f(x) | ∫ y dx |
|---|---|
| a | ax |
There are two types of integration: indefinite, where we just find the expression for the integration as
in the table and the examples, and definite where we go on to use this answer to find the actual area
under a curve between two boundaries. This area can represent many different quantities depending
on the original function so that it may be distance travelled (in a velocity/time graph) or the consumer
surplus (in a price/quantity demand graph).
Example
Calculate the area under the curve y = -x2 +10x between x = 2 and x = 8
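This is written as $\text{Area} = \int_2^8 y\,dx = \int_2^8 (-x^2 + 10x)\,dx$
[Figure: graph of y = -x² + 10x with the area between x = 2 and x = 8 marked AREA]
$$\int_2^8 y\,dx = \int_2^8 (-x^2 + 10x)\,dx = \left[-\tfrac{1}{3}x^3 + 10 \cdot \tfrac{1}{2}x^2\right]_2^8$$
Using our rules from the table we show we have solved but NOT evaluated the integral by using
square brackets. We then evaluate by substituting x = 8 first and then x = 2.
$$= \left[-\tfrac{1}{3}x^3 + 5x^2\right]_2^8 = \left(-\tfrac{1}{3}8^3 + 5(8)^2\right) - \left(-\tfrac{1}{3}2^3 + 5(2)^2\right) = (-512/3 + 320) - (-8/3 + 20)$$
= 300 – 504/3 = 300 – 168 = 132 sq units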
(Note checking this from the graph the scale on the y axis is in 4’s so each square represents 4 sq
units)
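The answer can be cross-checked in two ways: by evaluating the integrated expression at the two limits, and by adding up thin vertical slices numerically. The Python sketch below uses Simpson's rule for the second check; that is a standard numerical method chosen here for illustration, not one introduced in these notes, and the function names are our own.

```python
def y(x):
    return -x ** 2 + 10 * x

def integrated(x):
    """The result of integrating y: -x^3/3 + 5x^2 (constant omitted)."""
    return -x ** 3 / 3 + 5 * x ** 2

def simpson(func, a, b, slices=100):
    """Approximate the area under func between a and b (slices must be even)."""
    if slices % 2:
        raise ValueError("slices must be even")
    h = (b - a) / slices
    total = func(a) + func(b)
    for i in range(1, slices):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

print(integrated(8) - integrated(2))  # close to 132
print(simpson(y, 2, 8))               # also close to 132
```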
Areas below the axis are negative which leads to some interesting results in calculating total areas
Example
$$\int_0^{2\pi} \sin x\,dx = [-\cos x]_0^{2\pi} = (-\cos 2\pi) - (-\cos 0) = -1 + 1 = 0$$
The negative area below the axis has cancelled out the positive one above. To find the actual area covered here you need to split the integration into 2 parts and recognise the second one is negative:
$$\text{Area} = \int_0^{\pi} \sin x\,dx - \int_{\pi}^{2\pi} \sin x\,dx = 2 - (-2) = 4$$
This emphasizes the importance of sketching the curve to be integrated and the need to understand
what is represented by the integration in any particular problem domain.
Example
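Find $\int xe^{-2x}\,dx$
Let $u = x$ and let $\frac{dv}{dx} = e^{-2x}$
Then $\frac{du}{dx} = 1$ and $v = \frac{-e^{-2x}}{2}$
Therefore
$$\int xe^{-2x}\,dx = x \cdot \frac{-e^{-2x}}{2} - \int \frac{-e^{-2x}}{2} \cdot 1\,dx = \frac{-xe^{-2x}}{2} + \int \frac{e^{-2x}}{2}\,dx$$
$$= \frac{-xe^{-2x}}{2} - \frac{e^{-2x}}{4} + c$$
$$= \frac{-e^{-2x}}{4}(2x + 1) + c$$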
Bisection method
This method applies to any function that is continuous on an interval $[a, b]$ and has the property that $f(a)$ and $f(b)$ have different signs. This implies that $f$ has a zero in $[a, b]$. Let $x_0$ be the mid-point in $[a, b]$. If $f(x_0) = 0$ we have the solution, otherwise,…
• If $f(x_0)$ is -ve then choose the $a$ or $b$ which makes $f(a)$ or $f(b)$ positive and continue to find another mid-point.
• If $f(x_0)$ is +ve then choose the $a$ or $b$ which makes $f(a)$ or $f(b)$ negative and continue to find another mid-point.
Continue until the correct degree of accuracy is reached.
This method never fails - so is useful for that reason. However you do need two values $a$ and $b$ to start with, not just one like Newton-Raphson. $a$ and $b$ should be close to the root, else it can take a long time.
Example
Suppose g(x) represents the profit from the sale of bananas, measured in thousands of dollars,
where x is measured in thousands of kg, and $g(x) = x^5 + x^3 + x^2 - 1$ for x in [0, 1].
Since g(0) = −1 and g(1) = 2 there is a number d in [0, 1] where g(d) = 0. This is the break-even
point where there is neither a profit nor a loss. We need to try to find d. The equation does not
factorise so we will try the interval bisection method. Perform 5 iterations after the initial
bisection.
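The iterations for the banana example can be sketched in Python as follows. The function name `bisect` and its `iterations` parameter are choices made for this illustration, not part of the course material.

```python
def g(x):
    """Profit function from the example: g(x) = x^5 + x^3 + x^2 - 1."""
    return x**5 + x**3 + x**2 - 1

def bisect(f, a, b, iterations):
    """Return the mid-points from the initial bisection plus `iterations` more."""
    if f(a) * f(b) >= 0:
        raise ValueError("f(a) and f(b) must have different signs")
    midpoints = []
    for _ in range(iterations + 1):
        x = (a + b) / 2
        midpoints.append(x)
        if f(x) == 0:
            break
        # Keep the half-interval on which the sign change survives.
        if f(a) * f(x) < 0:
            b = x
        else:
            a = x
    return midpoints

midpoints = bisect(g, 0.0, 1.0, iterations=5)
print(midpoints)
```

After five further bisections the estimate of the break-even point is the last entry in the list, roughly 0.70 thousand kg.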
Criterion 1:
Suppose the first interval [a0, b0] has length M. The first approximation x0 is the midpoint of this
interval and the error is $|d - x_0|$, so
$|d - x_0| \le \dfrac{M}{2}$ or $|e_0| \le \dfrac{M}{2}$
The next approximation x1 is halfway between x0 and b0 (or a0), so
$|d - x_1| \le \dfrac{M}{2^2}$ or $|e_1| \le \dfrac{M}{2^2}$
Similarly $|d - x_n| \le \dfrac{M}{2^{n+1}}$ or $|e_n| \le \dfrac{M}{2^{n+1}}$; this gives us an upper bound on the error.
Criterion 2:
Stop when the difference between two successive iterations, $|x_n - x_{n-1}|$, is less than a certain number, say 0.005.
Solve for n to find how many iterations are needed or just keep iterating until $|x_n - x_{n-1}| < 0.005$.
Criterion 3:
Stop when the absolute function value is less than a specified value
close to 0, i.e. $|f(x_n)| < 0.005$.
Therefore, just keep iterating until $|f(x_n)| < 0.005$, at which point the root has been found to the
required accuracy.
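To see how the error bound M/2^(n+1) turns into a number of iterations, here is a small sketch. The helper name `iterations_needed` is hypothetical, and the starting interval length M = 1 is an assumed example value.

```python
def iterations_needed(M, tolerance):
    """Smallest n with M / 2**(n + 1) < tolerance (the bisection error bound)."""
    n = 0
    while M / 2 ** (n + 1) >= tolerance:
        n += 1
    return n

# Interval of length 1 and a tolerance of 0.005: 1/2**8 is the first bound below 0.005.
print(iterations_needed(1.0, 0.005))  # 7
```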
Example
Use the interval bisection method to solve 3x − ln x = 5, stopping when $|f(x)| < 0.02$.
First write the equation in the form f(x) = 0:
$f(x) = 3x - \ln x - 5 = 0$
$x_1 = \dfrac{1.5 + 2}{2} = 1.75$, $f(x_1) = -0.3096157$
$x_2 = \dfrac{1.75 + 2}{2} = 1.875$, $f(x_2) = -0.0036086$
As $|f(x_2)| < 0.02$, stop. ∴ x = 1.875
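The same example can be reproduced with a loop that stops on the size of the function value. This is a sketch only: the starting interval [1.5, 2] is read off from the first mid-point calculation, and the name `bisect_until_small` is made up for this illustration.

```python
import math

def f(x):
    return 3 * x - math.log(x) - 5

def bisect_until_small(f, a, b, tolerance):
    """Bisect [a, b] until |f(mid-point)| < tolerance; return that mid-point."""
    if f(a) * f(b) >= 0:
        raise ValueError("f(a) and f(b) must have different signs")
    while True:
        x = (a + b) / 2
        if abs(f(x)) < tolerance:
            return x
        # Keep the half-interval that still contains the sign change.
        if f(a) * f(x) < 0:
            b = x
        else:
            a = x

root = bisect_until_small(f, 1.5, 2.0, tolerance=0.02)
print(root, f(root))
```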
Newton-Raphson Method
Numerical methods for solving equations basically involve finding an approximate value for the root
and improving this approximation to a required accuracy.
This method, named after Sir Isaac Newton and Joseph Raphson, utilises the line tangent to the graph
of a function, to approximate values for a zero of that function.
Let d be a zero of the function and assume f is differentiable on an open interval containing d. Choose
a number x1 which is close to d. Draw the tangent at x1 and provided it is not horizontal, hopefully it
Example
Find an approximate solution, to an accuracy of 0.0001, to the equation cos x = x.
Firstly, this is not of the form f(x) = 0 so we must re-write it as cos x − x = 0.
In the last step $\left|\dfrac{f(x)}{f'(x)}\right| < 0.0001$ (or 10^-4), so the approximate root is 0.739085133 (0.739 to 3 dp).
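A minimal sketch of the iteration for this example is given here. It uses the standard Newton-Raphson update x − f(x)/f′(x) and stops when the size of the correction f(x)/f′(x) falls below the tolerance. The starting value x = 1 and the name `newton_raphson` are assumptions made for the illustration.

```python
import math

def newton_raphson(f, f_prime, x, tolerance, max_steps=50):
    """Repeat x -> x - f(x)/f'(x) until the correction is smaller than `tolerance`."""
    for _ in range(max_steps):
        slope = f_prime(x)
        if slope == 0:
            raise ZeroDivisionError("horizontal tangent: the method breaks down")
        step = f(x) / slope
        x = x - step
        if abs(step) < tolerance:
            return x
    raise RuntimeError("did not converge")

root = newton_raphson(
    f=lambda x: math.cos(x) - x,
    f_prime=lambda x: -math.sin(x) - 1,
    x=1.0,           # assumed starting guess
    tolerance=1e-4,
)
print(round(root, 6))
```

Only a handful of steps are needed here, which illustrates why Newton-Raphson is usually much faster than bisection when a good starting value is available.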
Example
Use the Newton-Raphson method to find an approximate positive root of the equation $\tfrac{3}{8}x^3 + 4x = 1$
to an accuracy of 4 dps.
Write as $f(x) = \tfrac{3}{8}x^3 + 4x - 1 = 0 \Rightarrow f'(x) = \tfrac{9}{8}x^2 + 4$
Try x = 0: f(0) = −1. Try x = 1: f(1) = 27/8 = 3.375
x | f(x) | f′(x) | f(x)/f′(x) | x − f(x)/f′(x)
Geometrically, the derivative is the slope of the tangent to the curve at $x_i$ (figure later in text).
Example - Finite-Divided-Difference Approximations of Derivatives
Problem Statement: Use forward and backward difference approximations of O(h) and a centred
difference approximation of O(h^2) to estimate the 1st derivative of
$f(x) = -0.1x^4 - 0.15x^3 - 0.5x^2 - 0.25x + 1.2$
at x = 0.5 with step size h = 0.5, then repeat with h = 0.25.
[The 1st derivative calculated directly is $f'(x) = -0.4x^3 - 0.45x^2 - 1.0x - 0.25$ and can be used to
calculate the true value f′(0.5) = −0.9125]
Solution:
For h = 0.5, the function can be employed to determine
$x_{i-1} = 0$, $f(x_{i-1}) = 1.2$
$x_i = 0.5$, $f(x_i) = 0.925$
$x_{i+1} = 1.0$, $f(x_{i+1}) = 0.2$
Forward divided difference: $f'(0.5) \approx \dfrac{f(1.0) - f(0.5)}{0.5} = -1.45$, error e = 58.9%
Backward divided difference: $f'(0.5) \approx \dfrac{f(0.5) - f(0)}{0.5} = -0.55$, e = 39.7%
Central divided difference: $f'(0.5) \approx \dfrac{f(1.0) - f(0)}{2 \times 0.5} = -1.0$, e = 9.6%
For h = 0.25, the function can be employed to determine
$x_{i-1} = 0.25$, $f(x_{i-1}) = 1.10351563$
$x_i = 0.5$, $f(x_i) = 0.925$
$x_{i+1} = 0.75$, $f(x_{i+1}) = 0.63632813$
Forward divided difference: $f'(0.5) \approx \dfrac{f(0.75) - f(0.5)}{0.25} = -1.155$, error e = 26.5%
Backward divided difference: $f'(0.5) \approx \dfrac{f(0.5) - f(0.25)}{0.25} = -0.714$, e = 21.7%
Central divided difference: $f'(0.5) \approx \dfrac{f(0.75) - f(0.25)}{2 \times 0.25} = -0.934$, e = 2.4%
For both step sizes, the centered difference approximation is more accurate than forward or backward
differences. Also, as predicted by the Taylor series analysis, halving the step size approximately
halves the error of the backward and forward differences and quarters the error of the centered
difference.
Formula for relative error: $e = 100 \times \left|\dfrac{\text{actual value} - \text{approximation}}{\text{actual value}}\right|$, where the actual value here is the true derivative.
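The divided differences and their relative errors can be recomputed with a short script. This is a sketch: the helper names `differences` and `relative_error` are chosen here for illustration, and the true value −0.9125 is taken from the problem statement.

```python
def f(x):
    return -0.1 * x**4 - 0.15 * x**3 - 0.5 * x**2 - 0.25 * x + 1.2

def differences(f, x, h):
    """Forward, backward and central difference estimates of f'(x)."""
    forward = (f(x + h) - f(x)) / h
    backward = (f(x) - f(x - h)) / h
    central = (f(x + h) - f(x - h)) / (2 * h)
    return forward, backward, central

def relative_error(actual, approximation):
    """Percentage relative error: 100 * |(actual - approximation) / actual|."""
    return 100 * abs((actual - approximation) / actual)

TRUE_VALUE = -0.9125  # f'(0.5) from the directly calculated derivative
for h in (0.5, 0.25):
    names = ("forward", "backward", "central")
    for name, estimate in zip(names, differences(f, 0.5, h)):
        error = relative_error(TRUE_VALUE, estimate)
        print(f"h={h}: {name:8s} {estimate:8.4f}  e = {error:.1f}%")
```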
I = ∫ f ( x)dx
when the integration cannot be carried out exactly or when f ( x ) is known only at a finite number
of points.
Many functions, for example $\sin x^2$, $\dfrac{\cos x}{x}$ and $\dfrac{e^x}{x}$, cannot be integrated analytically. However,
integration of such functions can be performed numerically.
To get an approximate value of I, we use
$I = \int_a^b f(x)\,dx \cong \int_a^b f_n(x)\,dx$
where
$f_n(x) = c_0 + c_1 x + c_2 x^2 + \dots + c_{n-1} x^{n-1} + c_n x^n$
is a polynomial of degree n. The resulting formulas are called Newton-Cotes formulas.
Let us consider the simplest integration rule of this type.
Trapezoidal Rule
This rule is the particular case when f(x) is approximated by a polynomial of first degree, namely
$f_1(x) = c_0 + c_1 x$. Then we get the formula
$I = \int_a^b f(x)\,dx \cong \int_a^b f_1(x)\,dx$
$= \dfrac{f(b) - f(a)}{b - a}\cdot\dfrac{b^2 - a^2}{2} + \dfrac{b\,f(a) - a\,f(b)}{b - a}\,(b - a)$
$= [f(b) - f(a)]\,\dfrac{b + a}{2} + b\,f(a) - a\,f(b)$
$= (b - a)\,\dfrac{f(a) + f(b)}{2}$
The last expression represents the Trapezoidal Rule:
$I = \int_a^b f(x)\,dx \cong (b - a)\,\dfrac{f(a) + f(b)}{2}$
The error of a single application of the rule is $E = -\dfrac{(b - a)^3}{12}\,f^{(2)}(\xi)$, where $f^{(2)}(\xi)$ is the function differentiated twice and evaluated at some point x = ξ
in the interval [a, b].
Example
Use the trapezoidal rule to integrate the function
$f(x) = 0.2 + 25x - 200x^2 + 675x^3 - 900x^4 + 400x^5$
from a = 0 to b = 0.8. Compare the answer to the exact value of the integral.
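One way to carry out this comparison is sketched here. The exact value is obtained by integrating the polynomial term by term, and the helper name `F` for the antiderivative is a choice made for this illustration.

```python
def f(x):
    return 0.2 + 25*x - 200*x**2 + 675*x**3 - 900*x**4 + 400*x**5

def F(x):
    """Antiderivative of f, found term by term."""
    return 0.2*x + 25*x**2/2 - 200*x**3/3 + 675*x**4/4 - 900*x**5/5 + 400*x**6/6

a, b = 0.0, 0.8
trapezoid = (b - a) * (f(a) + f(b)) / 2   # width * average height
exact = F(b) - F(a)
error_percent = 100 * abs((exact - trapezoid) / exact)
print(round(trapezoid, 4), round(exact, 4), round(error_percent, 1))
```

A single trapezoid is a poor fit for this curve, which is why the estimate is so far from the exact value.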
Divide [a, b] into n sub-intervals of equal width h = (b − a)/n, with end points x0 = a, x1, ..., xn = b.
Then
$I = \int_a^b f(x)\,dx = \int_{x_0}^{x_1} f(x)\,dx + \int_{x_1}^{x_2} f(x)\,dx + \dots + \int_{x_{n-1}}^{x_n} f(x)\,dx$
$\cong h\,\dfrac{f(x_0) + f(x_1)}{2} + h\,\dfrac{f(x_1) + f(x_2)}{2} + \dots + h\,\dfrac{f(x_{n-1}) + f(x_n)}{2}$
If we rearrange we get
$I \cong (b - a)\,\dfrac{f(x_0) + 2\sum_{i=1}^{n-1} f(x_i) + f(x_n)}{2n}$   (Width) * (Average Height)
or
$I \cong \dfrac{(b - a)}{2n}\,[f(x_0) + 2(f(x_1) + f(x_2) + \dots + f(x_{n-1})) + f(x_n)]$
$I \cong \dfrac{h}{2}\,[f_0 + 2(f_1 + f_2 + \dots + f_{n-1}) + f_n]$   Composite Trapezoidal Rule
The error of the Composite Trapezoidal Rule is given by
$E = -\dfrac{(b - a)^3}{12n^3}\sum_{i=1}^{n} f^{(2)}(\xi_i) \cong -\dfrac{(b - a)^3}{12n^2}\,f^{(2)}(\xi)$
where $f^{(2)}(\xi)$ is the function differentiated twice and evaluated at some point x = ξ
in the interval [a, b].
Example
Integrate the function
$f(x) = 0.2 + 25x - 200x^2 + 675x^3 - 900x^4 + 400x^5$
using the composite trapezoidal rule with n = 2, from a = 0 to b = 0.8, and estimate the error.
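A sketch of the composite rule for this function follows. It assumes the upper limit is b = 0.8, as in the earlier single-application example, and the name `composite_trapezoid` is chosen for this illustration.

```python
def f(x):
    return 0.2 + 25*x - 200*x**2 + 675*x**3 - 900*x**4 + 400*x**5

def composite_trapezoid(f, a, b, n):
    """Apply the trapezoidal rule on n equal sub-intervals of [a, b]."""
    h = (b - a) / n
    interior = sum(f(a + i * h) for i in range(1, n))
    return h / 2 * (f(a) + 2 * interior + f(b))

# Assumed limits: a = 0 and b = 0.8.
for n in (1, 2, 4):
    print(n, round(composite_trapezoid(f, 0.0, 0.8, n), 4))
```

Increasing n moves the estimate towards the exact value of the integral, in line with the 1/n² factor in the error formula.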
There are other approaches to numerical integration besides the Trapezoidal rule. There are also
many other numerical methods that further studies will present.
Example
02083318431 contains information once you recognise it as a phone number 0208 331 8431
Example
Once CV116PG is put on an envelope it conveys the information that the post code CV11 6PG
contains.
As we have already noted many of the roles in computing, including the systems analyst and software
developer, are involved in extracting information and interpreting input, output and processes.
Information will often be observed, measured, presented or collected in the form of data i.e.
collections of alphanumeric characters.
Example
The maximum age for a man currently in the UK is 111 years. Here the maximum is a parameter, an
interesting value describing a characteristic of the population of men in the UK. If the measurement
is made on only a sample of the population it is called a statistic, e.g. the maximum age for a sample
group of 500 men is 102 years.
In practice we are usually concerned with the calculation of statistics on the samples we have and
sometimes use our results to estimate the population parameters.
A statistic is a number or fact that summarizes a collection of data and is derived from it by various
arithmetic measures. The study of statistics is concerned with scientific methods for collecting,
organising, summarising, presenting and analysing data and with drawing valid conclusions and
making reasonable decisions on the basis of such analysis.
We can divide these methods into descriptive statistics and statistical inference. In the first we are
just trying to describe or characterize the information we have, in the second we will try to use our
statistics to give us extra insight usually for prediction or applying into a wider context.
Example
In a recent Mathematics lecture a survey was taken. From 80 students:
10 were asleep
50 said it was a fascinating lecture
15 said it was alright
5 said “Oh is this Mathematics?”
We could conclude from this
a) one eighth of the students in Mathematics sleep (this would be in the area of descriptive statistics
as we are simply describing the existing situation)
b) At the University of Greenwich 1 out of every 16 students do not know what they are doing (this
would be statistical inference, trying to extend our result from the class to a wider group to infer
something for the wider students population)
Statistical Inference has to be carefully made and often its application is governed by rules relating
the data you have, the sample set, to the population, the wider set of all possible data.
Whenever you apply statistical methods either for description or inference you should be aware of
and state the limitations of your results. When you input data into a computer, calculator or program
and obtain a result the restrictions of its applications should be clearly expressed to the user.
Organization of Data
We might have obtained the previous list from a spreadsheet of all 650 MPs in that election giving
the MP’s name and party
375 = COUNTIF(B2:B651,"Cons")
229 = COUNTIF(B2:B651,"Lab")
22 = COUNTIF(B2:B651,"Alliance")
24 = COUNTIF(B2:B651,"Other")
Types of Data
Discrete
So far all the examples we have seen used values that we can refer to as being discrete. This means
that there are gaps between the data values or categories: some values in the range are not
possible in the case of discrete data.
There were no values in between and on a number line they would represent isolated points.
There are no values in between 16 and 16½ so if the office worker has a neck size of 16¼ they have
to choose to wear a 16 or a 16½.
Similarly the MPs were only in one party, they couldn’t be halfway between a Conservative MP and
a Labour one.
Continuous
Continuous data as its name suggests could be shown as a continuous line with all values possible.
Some examples are temperature and height.
Example
In an experiment to take repeated measurements of the temperature of a component in a laptop the
minimum value was 21 °C and the maximum 24.7 °C.
The temperatures formed a set of continuous data. Represented on a number line, any value
between the minimum of 21 and the maximum of 24.7 could have been obtained.
In practice we are usually constrained by the precision of the measurement method to make the data
discrete, for example by measuring to one decimal place so that there are no values recorded between
23.3 °C and 23.4 °C.
The range of data is the difference between the maximum and minimum values.
In the last example the range = 24.7 °C – 21 °C = 3.7 °C
Categorical
Sometimes the results we are given to analyze are not numerical, this is particularly true in the field
of market research. Answers are given from a number of discrete non numerical categories. This was
true of the MP’s and their party.
Ranked
In the previous example no ranking is implied within the categories, but in other questionnaires there
is an ordering of the possible answers which could be given a numerical rank.
Example
In a questionnaire respondents were asked to rate the following items with one of three categories
which have a clear ranking.
Lecturer: Boring / OK / Interesting
Material: Easy / OK / Hard
Frequency Distributions
We shall now consider the process of organising data. We first sort the data into the different values
which can be attained and calculate the frequency of each value (the number of times it has occurred).
If there is a large range of values or the data is continuous the different values may be grouped into
ranges of values (classes).
Example
A random sample of 20 individuals responded to the question 'How many hours of television do you watch
each week?' as follows:
29 35 28 31
28 29 30 32
29 30 28 32
32 28 35 35
35 28 33 35

Hours of TV   Tally   Frequency
28            IIIII   5
29            III     3
30            II      2
31            I       1
32            III     3
33            I       1
34                    0
35            IIIII   5
Total                 20
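The tallying step can also be done programmatically. The sketch below uses Python's `collections.Counter` on the 20 responses. The variable names are chosen here for illustration.

```python
from collections import Counter

hours = [29, 35, 28, 31, 28, 29, 30, 32, 29, 30,
         28, 32, 32, 28, 35, 35, 35, 28, 33, 35]

counts = Counter(hours)
for value in range(min(hours), max(hours) + 1):
    # A Counter returns 0 for values that never occur (here, 34 hours).
    print(value, counts[value])
print("Total", sum(counts.values()))
```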
Using a Spreadsheet
Forming a frequency distribution from a set of numbers
in a spreadsheet package such as Excel can be achieved
in a number of ways. The data has to be copied in and
then you can either use the COUNTIF function or the
Histogram facility in the Data analysis option. Both
results are shown here.
Grouping data
If there are a large number of different values e.g. the results of an exam for 50 students, we do not
want a table which has a low frequency against each of a large number of values. We use classes to
join some of the data together.
Example
Here the quantity of data is large ( 50) and has a wide range so it is convenient to group data into
class intervals. Suppose we have repeated the television survey with 50 respondents and the results
are as follows:
27 26 35 34 29 28 40 40 24 33
22 25 21 20 31 30 37 38 22 23
15 12 29 16 10 36 29 17 39 28
25 41 29 40 18 37 25 18 27 27
25 40 33 31 34 26 40 20 12 15
We shall aim for between 8 and 10 classes - so we shall have a class width of 4 hours
Remember class boundaries must be chosen so that we have one more degree of accuracy than in the
data, this ensures that every datum can be put into exactly ONE class. The class mark at the midpoint
is the representative value for that class and can be used in calculations.
[Figure: histogram of Frequency against Hours, with class marks 11.5, 15.5, 19.5, 23.5, 27.5, 31.5, 35.5 and 39.5 on the horizontal axis]
A frequency polygon is obtained by joining the mid-points of the top of each rectangle.
MEAN
Example
Suppose that 6 students take a test and get scores 53, 72, 45, 80, 85 and 52. The mean mark is the
sum of these numbers divided by 6 = ( 53 + 72 + 45 + 80 + 85 + 52 ) ÷ 6 = 64.5
With n values given as x1, x2, ... , xn the mean is $\bar{x} = \dfrac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \dfrac{\sum_{i=1}^{n} x_i}{n}$, using the
Greek capital sigma ∑ to show we have summed all terms like $x_i$ where i takes the values from 1
to n
Mean = ( 29 + 35 + 28 +............ + 35 + 35 ) ÷ 20 using the original ‘raw’ data and summing all
20 numbers separately then dividing by the number of data items, this gives the answer 31.1
We could also get this number from the frequency table using more efficient grouping
Mean = ( 5 × 28 + 3 × 29 + 2 × 30 + 1 × 31 + 3 × 32 + 1 × 33 + 5 × 35 ) ÷ 20
= ( 140 + 87 + 60 + 31 + 96 + 33 + 175 ) ÷ 20
= 622 ÷ 20
= 31.1
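The frequency-table calculation of the mean translates directly into code. This sketch stores the table as a dictionary from value to frequency. The variable names are illustrative only.

```python
frequency = {28: 5, 29: 3, 30: 2, 31: 1, 32: 3, 33: 1, 34: 0, 35: 5}

total = sum(frequency.values())                          # total frequency
weighted_sum = sum(x * f for x, f in frequency.items())  # sum of f * x
mean = weighted_sum / total
print(total, weighted_sum, mean)  # 20 622 31.1
```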
So if data values are repeated so that value $x_i$ occurs $f_i$ times, then n = f1 + f2 + ... + fn is the total
frequency and
$\text{mean} = \dfrac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \dots + f_n x_n}{n} = \dfrac{\sum_{i=1}^{n} f_i x_i}{n} = \dfrac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$
Now consider data which has been grouped into classes. In this case we take the class mark to
represent the class in the calculations. This is true of the second example of TV viewing. Here x =
11.5 represents the values in the class 9.5 – 13.5. This means we will lose some precision if we
calculate the mean this way. The alternatives are to use the AVERAGE function in Excel or the
Summary statistics provided, which will use all the raw data. All are shown here for comparison.
=AVERAGE(A1:J5) 27.58
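The loss of precision from grouping can be seen by computing both means for the 50 television responses. This is a sketch under stated assumptions: classes of width 4 with the first lower boundary at 9.5 (so the first class mark is 11.5), and variable names invented for the illustration.

```python
hours = [27, 26, 35, 34, 29, 28, 40, 40, 24, 33,
         22, 25, 21, 20, 31, 30, 37, 38, 22, 23,
         15, 12, 29, 16, 10, 36, 29, 17, 39, 28,
         25, 41, 29, 40, 18, 37, 25, 18, 27, 27,
         25, 40, 33, 31, 34, 26, 40, 20, 12, 15]

WIDTH = 4
FIRST_LOWER = 9.5  # assumed lower boundary of the first class

# Count how many values fall in each class, keyed by the class mark.
frequency = {}
for h in hours:
    index = int((h - FIRST_LOWER) // WIDTH)
    mark = FIRST_LOWER + WIDTH * index + WIDTH / 2
    frequency[mark] = frequency.get(mark, 0) + 1

grouped_mean = sum(mark * f for mark, f in frequency.items()) / len(hours)
raw_mean = sum(hours) / len(hours)
print(sorted(frequency.items()))
print(round(grouped_mean, 2), round(raw_mean, 2))  # 27.42 27.58
```

The grouped mean differs slightly from the raw mean because every value in a class is replaced by its class mark.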
*In the absence of a lower or upper boundary we try to make a reasonable assumption.
So the mean salary = £ 255000 /20 = £ 12750 This will now only be an approximation of the answer
taken straight from the raw data.
MODE
This is defined as the most frequently occurring value and so in a bar chart showing the frequency it will
be the values giving rise to the highest bar.
Advantages of Mode
(i) Easy to find
(ii) Best represents a 'typical' item so is practical
(iii) Easy to understand
Disadvantages of Mode
(i) Not well defined
(ii) Does not use all values
(iii) Not useful if observations are spread out
(iv) Unsuitable for further calculations
MEDIAN
The value of the middle term of the sorted data
Example
6, 7, 9, 15, 23
↓
median
If there is an even number of observations then average the two middle values
Example
6, 7, 9, 15, 20, 23
└─┬┘
12
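Both median examples can be checked with Python's standard `statistics` module, which sorts the data and averages the two middle values when the count is even.

```python
from statistics import median

odd_count = [6, 7, 9, 15, 23]
even_count = [6, 7, 9, 15, 20, 23]

print(median(odd_count))   # 9
print(median(even_count))  # 12.0, the average of 9 and 15
```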
For grouped data we need to construct a cumulative frequency polygon (ogive). The median is the
observation which represents the 50% or n/2 value.
MILEAGE        upper limits    NUMBER OF CARS   cumulative   %
('000 miles)   ('000 miles)    (frequency)      frequency
up to 3        3.5             16               16           4
4 - 6          6.5             40               56           14
7 - 9          9.5             94               150          38
10 - 12        12.5            96               246          62
13 - 15        15.5            62               308          77
16 - 18        18.5            44               352          88
19 - 24        24.5            34               386          97
25 and over    30.5            14               400          100
Now plot the cumulative frequency percentage against the upper limits of each group to give
[Figure: cumulative frequency polygon (ogive) of % cumulative frequency against mileage in '000 miles, with the lower quartile (L.Q.), median (Med.) and upper quartile (U.Q.) marked]
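Reading the median and quartiles off the ogive amounts to linear interpolation between the plotted points. The sketch below does this with the cumulative frequencies from the mileage table. The function name `value_at` is hypothetical, and straight-line interpolation between points is an assumption that matches a polygon drawn with straight segments.

```python
upper_limits = [3.5, 6.5, 9.5, 12.5, 15.5, 18.5, 24.5, 30.5]
cumulative = [16, 56, 150, 246, 308, 352, 386, 400]

def value_at(target):
    """Read the ogive at a given cumulative frequency by linear interpolation."""
    points = list(zip(upper_limits, cumulative))
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1:
            return x0 + (x1 - x0) * (target - c0) / (c1 - c0)
    raise ValueError("target lies outside the tabulated range")

n = cumulative[-1]
lq, med, uq = value_at(n / 4), value_at(n / 2), value_at(3 * n / 4)
print(round(lq, 2), round(med, 2), round(uq, 2), round(uq - lq, 2))
```

The values are in thousands of miles, and the last number printed is the difference between the upper and lower quartiles.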
Measures of dispersion
Consider this simple example where both sets of data have a mean of 100:
A   40   50   100   150   160   x̄ = 100
B   98   99   100   101   102   x̄ = 100
To distinguish them we need to discuss the spread of the data.
Range
This is probably the easiest measurement of spread.
range = MAXIMUM VALUE - MIN.VALUE
for sample A 160 - 40 = 120
for sample B 102 - 98 = 4
OK to use but very loosely defined and easily distorted by one extreme value.
Inter-Quartile Range
If using median use the Inter-Quartile range for the spread
Find Upper Quartile (75% value) and Lower Quartile (25% value)
IQ range = UQ - LQ
We need to find out how the values are spread about the mean.
Using sample A to illustrate the process we can see how the formula originates.
$(\text{Average Deviation})^2 = \dfrac{\sum (x - \bar{x})^2}{n} = \dfrac{12200}{5}$
This is also known as the variance written σ² for a population or s² for a sample.
$\sigma = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}} = \text{Average Deviation} = \sqrt{2440} = 49.4$
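In practice $\sum (x - \bar{x})^2$ is awkward to calculate so an equivalent formula is used to calculate the standard
deviation:
$\text{Standard deviation} = \sqrt{\dfrac{\sum x^2}{n} - \bar{x}^2}$
So for sample A, $\sum x^2 = 62200$, n = 5 and $\bar{x} = 100$, giving
$\sqrt{\dfrac{62200}{5} - 100^2} = \sqrt{2440} = 49.4$
For a frequency distribution:
$\text{Standard deviation } \sigma = \sqrt{\dfrac{\sum f x^2}{\sum f} - \bar{x}^2}$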
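The shortcut formula can be compared with a library routine for samples A and B. In this sketch `shortcut_std` is a name chosen for illustration, and `statistics.pstdev` is the standard-library population standard deviation, which divides by n.

```python
from math import sqrt
from statistics import pstdev

sample_a = [40, 50, 100, 150, 160]
sample_b = [98, 99, 100, 101, 102]

def shortcut_std(values):
    """Standard deviation via sqrt(sum(x^2)/n - mean^2)."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum(x * x for x in values) / n - mean**2)

print(round(shortcut_std(sample_a), 1), round(pstdev(sample_a), 1))  # 49.4 49.4
print(round(shortcut_std(sample_b), 3))                              # 1.414
```

Both samples have a mean of 100, but the standard deviations show how much more widely sample A is spread.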
A lot of calculators have these functions built in and provided you understand how to input your data
in pairs they will calculate the mean and standard deviation easily.
Spreadsheets such as Excel or statistical packages such as MiniTab will do this and a lot more.
Example
Consider the frequency distribution of office juniors. Mean = £10 205.88
Σ f = 17 Σf(x- x )2 = 43529411.7
Standard deviation = √(43529411.7 ÷ 17) = £1600.17
Example
Using the short cut version of the formula
Calculate the mean and standard deviation for this distribution of test marks for 200 students
Probability Distributions
We have already looked at frequency distributions and generated histograms in Excel. If we revisit
the data obtained by asking 20 people how many hours they watched television we can display it in
a number of ways
29 35 28 31
28 29 30 32
29 30 28 32
32 28 35 35
35 28 33 35
Showing these graphically starting with the frequency polygon the only difference is in the scale of
the y axis. The third graph is known as a probability distribution. We note that in this case the curve
is made up from discrete points as there are no values in between the whole numbers
[Figure: three graphs of the television data against hours (25 to 37), showing number of people (0 to 6), percentage of people (0% to 30%) and probability (0.00 to 0.30)]
Example
In the normal distribution of temperatures shown the probability of a temperature
>19 °C is equal to the area under the curve
[Figure: normal distribution curve of probability (vertical axis 0.01 to 0.04) against temperature]
7: Probability
Introduction to Probability
Examples of probability in everyday life
• 60% chance of precipitation
• chance of winning the UK lottery, the odds of picking all 6 correct numbers is 1 in 13,983,816
(49!/(6!*(49-6)!) combinations of numbers)
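The lottery figure can be reproduced from the factorial formula quoted above. The sketch also checks it against `math.comb`, the standard-library function for counting combinations.

```python
from math import comb, factorial

# Number of ways to choose 6 numbers from 49 when order does not matter.
combinations = factorial(49) // (factorial(6) * factorial(49 - 6))
print(combinations)                 # 13983816
print(combinations == comb(49, 6))  # True
print(1 / combinations)             # chance of matching all 6 numbers with one ticket
```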
Definitions
Probability:
Probability is a measure of the expectation that an event will occur or a statement is true. Probabilities
are given a value between 0 (will not occur) and 1 (will occur).
eg., probability that it will rain =60%, 0.6
The higher the probability of an event, the more certain we are that the event will occur.
Experiment:
An Experiment is a situation involving chance or probability that leads to results called outcomes.
For example
Riding a bicycle – chance of me falling off!
MATH1111 exam – what’s the chance that probability will come up in the exam?
Outcome:
An Outcome is the result of a single trial of an experiment.
For example
Tossing a coin – the outcome could be either head or tails
When throwing a standard dice, with six faces and numbers 1 to 6, a single time there are six
possible outcomes
Event:
An Event is one or more outcomes of an experiment.
For example
A coin tossed 10 times may have had the following frequency of outcomes: heads 6, tails 4
Probability of an Event
An event A has the probability:
P(A) = (the number of ways event A can occur) / (the total number of possible outcomes)
P(A’) = 1 - P(A)
Example: Coins
Experiment: Tossing a coin with two sides – head, tail.
Outcome: The result of tossing the coin once.
Event: tossing a head or tossing a tail
Sample Space: {head, tail}
Example: Dice
A single 6-sided die is rolled.
What is the probability of each outcome?
What is the probability
of rolling an even number?
of rolling an odd number?
Outcomes: The possible outcomes of this experiment are 1, 2, 3, 4, 5 and 6.
Probabilities:
P(1) = (# of ways to roll a 1) / (total # of sides) = 1/6
P(2) = (# of ways to roll a 2) / (total # of sides) = 1/6
P(3) = (# of ways to roll a 3) / (total # of sides) = 1/6
P(4) = 1/6   P(5) = 1/6   P(6) = 1/6
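The questions about even and odd numbers can be answered by counting favourable outcomes in the sample space. The sketch below assumes a fair die, so that all six outcomes are equally likely, and the helper name `probability` is made up for this illustration.

```python
from fractions import Fraction

sample_space = [1, 2, 3, 4, 5, 6]

def probability(event):
    """P(event) = favourable outcomes / total outcomes, for equally likely outcomes."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))

p_even = probability(lambda n: n % 2 == 0)
p_odd = probability(lambda n: n % 2 == 1)
print(p_even, p_odd, p_even + p_odd)  # 1/2 1/2 1
```

Rolling an odd number is the complement of rolling an even number, so the two probabilities add up to 1.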
Example Summary
The probability of an event is the measure of the chance that the event will occur as a result of an
experiment.
The probability of an event A is the number of ways event A can occur divided by the total number
of possible outcomes.
The probability of an event A, symbolized by P(A), is a number between 0 and 1, inclusive, that
measures the likelihood of an event in the following way:
If P(A) > P(B) then event A is more likely to occur than event B.
If P(A) = P(B) then events A and B are equally likely to occur.
Impossible Event: P(A) = 0
Certain Event: P(A) = 1
Summary of probabilities
Event       Probability
A           P(A) ∈ [0,1]
not A       P(A’) = 1 – P(A)
A or B      P(A∪B) = P(A) + P(B) – P(A∩B)
            P(A∪B) = P(A) + P(B) if A and B are mutually exclusive
A and B     P(A∩B) = P(A|B)P(B) = P(B|A)P(A)
            P(A∩B) = P(A)P(B) if A and B are independent
A given B   P(A|B) = P(A∩B)/P(B) = P(B|A)P(A)/P(B)
Example: double dice
You decide to see how many times a "double" would come up when throwing 2 dice.
Each time the two dice are thrown is an Experiment.
It is an Experiment because the result is uncertain.
The Event you are looking for is a "double", where both dice have the same number. It is made up of
these 6 Sample Points:
{1,1} {2,2} {3,3} {4,4} {5,5} and {6,6}
After 100 Experiments, you had 19 "double" Events ... is that close to what you would expect?
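One way to judge whether 19 doubles is reasonable is to enumerate the sample space of two dice. This sketch assumes fair dice, so that all 36 ordered pairs are equally likely. The variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely ordered pairs
doubles = [pair for pair in outcomes if pair[0] == pair[1]]

p_double = Fraction(len(doubles), len(outcomes))
expected_in_100 = 100 * p_double
print(p_double, round(float(expected_in_100), 1))  # 1/6 16.7
```

On average about 17 doubles would be expected in 100 throws, so 19 is close to expectation.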
Probability line