Core Concepts - Probability Booklet 2020-21

Probability and Statistics 55-403850
Core concepts:
Probability 2020/21
© Ellen Marshall and Lindsay Lee 2020

Contents and teaching plan
Staff teaching on this module
Content of the course
Module overview
Module learning outcomes
Topics
Sessions
Differences between this course and A level
WELCOME TUTORIAL
Opening exercise: Attendance and performance
1 Introduction to Probability
*Start video lecture 1a*
1.1 Introduction/ Motivation
1.2 Foundations of probability
Notation
*End video lecture 1a*
*Start video lecture 1b*
Probability
Venn Diagrams
*End video lecture 1b*
*Start video lecture 1c*
1.3 Conditional probability
Independence
*End video lecture 1c*
1.4 What have you learnt this week?
Chapter 1 homework questions: Basic Probability
Robot Questions
Mathematician Questions – we will go through this in the tutorial
Chapter 1 tutorial questions: Basic Probability
Extended Applied Task
2 Discrete Distributions
2.1 Random Variables
2.2 Discrete probability distributions
2.3 Frequently used discrete probability distributions
2.3.1. Discrete Uniform Distribution
2.3.2. Bernoulli Distribution
2.3.3. Geometric Distribution
2.3.4. Binomial Distribution
*End video lecture 2c*
2.4 Summary so far
2.5 Online tests this week
Chapter 2 homework questions: Discrete Distributions
Robot Questions
Mathematician Questions
3 Continuous Distributions and Expectation
3.1 Continuous Distributions
3.2 Special Continuous Distributions
3.2.1. Continuous Uniform Distribution
3.2.2. Exponential distribution
3.3 Expectation of Discrete Distributions
Manipulating Expected Values
3.4 Expectation of Continuous Distributions
3.5 Online quiz
Chapter 3 homework: Expectation and Continuous Distributions
Robot Questions
Chapter 3 Tutorial: Expectation and Continuous Distributions
Mathematician Questions
4 Summary statistics and the Normal distribution
4.1 Summarising continuous data
Histograms
Inferential statistics
4.2 Normal probability distribution
Normal probability density function
Cumulative density function (cdf)
4.3 Standard normal (Z) distribution
Standard normal (Z) distribution table
4.4 Standardising
Critical values summary
4.5 CAST material
4.6 Summary of Excel commands for the normal distribution
Generating random numbers in Excel
4.7 Online tests this week
Tutorial 4: Normal distribution
Probability and statistics statistical tables
Created by: Ellen Marshall and Lindsay Lee

Acknowledgements Roy Ward and Roger Jackson, Jess Hargreaves and Claire Cornock
Staff teaching on this module
Ellen Marshall (Module leader) ellen.marshall@shu.ac.uk Room: N601
Lindsay Lee lindsay.lee@shu.ac.uk Room: N601
Keith Harris k.harris@shu.ac.uk Room: N602
Content of the course

Module overview
This is an introductory probability and statistics module which will develop the statistical skills
needed in the workplace as well as cover the theoretical aspects of the subject. The course is in
several parts with the first section covering the core mathematical concepts of probability and
statistics before moving on to using SAS and undertaking a statistical project. The final section will
return to probability for a deeper understanding and to prepare you for your level 5 Statistical
theory and methods module.
Module learning outcomes
LO Learning Outcome
1 Use and interpret the concepts of uncertainty, variability and probability and demonstrate a
basic understanding of how these work in the real world.
2 Perform simple statistical modelling and conduct appropriate data analysis.
3 Communicate the elements of the statistical process in a clear and concise manner.
Topics
Section Contents
Core concepts: Probability Basic probability
Discrete distributions
Introduction to continuous distributions
Expectation
Core concepts: Inferential Statistics Data collection
Summary Statistics
Confidence intervals
Hypothesis testing concepts
Z and T-tests
Correlation and regression Correlation
Simple and multiple linear regression
Analysing categorical data Summarising categorical data
Risk and relative risk
Chi-squared tests
Non-parametric tests
Surveys
Further probability Recapping and expanding on core concepts
Conditional probability
Bayes' Theorem
Sessions
Session Content
Video Lectures Short videos explaining the core concepts and how to apply them.
These will be interspersed with short quizzes and exercises. You can
watch these videos at any time but it is recommended that you watch
them in the dedicated lecture slot for this module.
Online Tutorials An opportunity to work together through examples and to introduce
SAS and Excel for statistical analysis. This will include drop-in sessions
for general help, advice and clarification.
Online self-check quizzes A number of practice quizzes for the core concepts of probability and
statistics are available on Blackboard. Each week you will be asked to
try at least one specific test. YOU MUST COMPLETE the suggested
test at least once which will be the equivalent to attendance each
week. You can retake the test as many times as you like as the
numbers change.
Brad Allison (current final year student) created the tests and has
recorded a short video introducing the tests.
Additional resources SAS programming videos
Bilal Mahmood (current final year student) has created a set of SAS
videos covering key topics from your Maths Tech SAS which will help
with coursework for this module. He has also created some short
videos on how to write a statistical report.
SAS statistical techniques summary sheets
We also have a set of summary sheets for each of the key statistical
topics covered in this module which you may find useful when
completing coursework.
Peer support Some form of peer support from final year students will take place
(details to be confirmed). It is likely that this will centre around the
work-based group projects.
Main stats support The maths and stats support service offers 1:1 support to any student
of the University and is primarily staffed by lecturers from this
department. Ellen coordinates the statistics support element.
Normally there are drop in sessions but it is likely that this year only
bookable online appointments will be available with statistics staff
teaching on this course.
https://maths.shu.ac.uk/mathshelp/
Differences between this course and A level
Most of you will have studied probability and statistics before but prior knowledge will vary so
some of you may find the initial material easy. For sections where most of you will have some
knowledge, the material will form more of a recap and for those who need it, extra practice
questions will be made available. However, we will very quickly progress from A level type
material to project based analysis more similar to the type of statistics you will need to carry out in
the workplace and using SAS.
Important note: Don’t miss class because you think you have studied statistics before. Analysis
from previous years shows that there is no significant difference in final grade for those who had
and had not studied statistics in detail before and attendance is the strongest predictor of grade!
This is the maths robot. Ask an exact question and you will get an exact
answer. Whilst it is important to be able to practise and repeat using
formulae or techniques when learning basic maths or statistics, being a
real mathematician or statistician requires much more than that.
Understanding where formulae come from and being able to apply the
techniques you have mastered in a practical and more open context is more
important at this level.
On this course, the selection, application and interpretation of techniques are important and the
focus of your assessment will be on these aspects rather than the memorisation of formulae.
Assessment of the Module
Assessment: There will be several assessments which test a mix of mathematical understanding,
the use of Excel/SAS and general skills such as reporting and presentation.
Assessment and content | Due in | % weighting overall | % subtask weighting
Task 1. Phase tests. There will be 3 online phase tests testing sections of the course rather than one exam at the end. | | 35 |
1a) Probability | 4th November | | 28
1b) Inferential statistics | 7th January (provisional) | | 28
1c) Mostly semester 2 material. This will take place in the exam weeks so the date may need to change depending on other scheduled exams. | 10th May (provisional) | | 44
Task 2: Applied coursework
These assessments test application of statistics to real world 65
projects using SAS and practicing vital communication skills.
Interim 50
2a. Group report and poster: You will be working with companies report: 10th
to analyse data and address questions of interest in a report December 30
suitable for a non-statistical audience. You will receive guidance
from staff and meet with the companies. Final report:
15th Feb
3. Individual report: Coursework using SAS and reporting 20th April 30 50
WELCOME TUTORIAL
In the first tutorial, we will be concentrating on getting to know you and demonstrating how
online tutorials will work. There is an opening exercise to try which doesn’t require any prior
knowledge. You will be placed in smaller breakout groups (in a similar way to tables within a
classroom) to work together on the opening exercise. Test out the best ways of doing this whilst
working on the exercises but also take the opportunity to introduce yourselves to the rest of the
group perhaps discussing any prior study of probability and statistics. Staff will move between the
rooms to chat to you in your smaller groups.
Opening exercise: Attendance and performance
The percentage of weekly statistics lectures attended and overall performance (%) in a module were used to
create statistical models to predict the probabilities of failing or getting a 2.1 or higher based on
attendance. Use the theoretical predicted models in the graph below to answer the following
questions.
[Figure: Model estimating performance probabilities based on attendance. Curves: p(2.1 or more) and p(fail). Vertical axis: Probability (0 to 1.1). Horizontal axis: % attendance for weekly Statistics lecture (0 to 110).]
1. Use the graph to estimate the following probabilities for a student attending 30% of their lectures:
a) Fail: the probability of a student with 30% attendance failing is estimated to be around 0.65.
b) Pass: the probability of a student with 30% attendance passing is estimated to be around
0.35 (that is, 1 − 0.65).
c) Get a 2.1 or higher?
2. Use the graph to estimate the risk of failing if a student attends 70% of lectures. How much more likely
to fail is a student who attends 30% of lectures compared to one who attends 70%?
3. At what level of attendance is a student equally likely to fail and get a 2.1 or above?
4. If there are 90 students in the class and everyone attends 40% of lectures, how many do you expect to
get a 2.1 or more?
5. The statistical model for failing uses the following equation, where p = probability of failing the course
and x = % attendance at the weekly statistics lecture.
$\ln\left(\frac{p}{1-p}\right) = 3.4 - 0.093x$
a) Rearrange the equation to make p the subject
b) Estimate the probability of failing for a student attending 20% of classes using the formula
from a)
c) Describe in words what the ratio $\frac{p}{1-p}$ (known as the odds) means
d) Calculate the odds of failing for someone attending 20% of classes
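Once you have attempted question 5 by hand, a few lines of Python can be used to check your working. This is only a sketch: it assumes the rearranged form p = e^(3.4 − 0.093x) / (1 + e^(3.4 − 0.093x)), with x taken to be the % attendance, and the function names `prob_fail` and `odds` are illustrative rather than part of the module materials.

```python
import math


def prob_fail(attendance_pct: float) -> float:
    """Probability of failing, from ln(p / (1 - p)) = 3.4 - 0.093 x solved for p."""
    log_odds = 3.4 - 0.093 * attendance_pct
    return math.exp(log_odds) / (1 + math.exp(log_odds))


def odds(p: float) -> float:
    """Odds corresponding to a probability p (requires p < 1)."""
    return p / (1 - p)


p20 = prob_fail(20)          # student attending 20% of classes
print(round(p20, 3))         # estimated probability of failing
print(round(odds(p20), 2))   # odds of failing
```

Because the odds are p / (1 − p), they should equal e^(3.4 − 0.093x) exactly, which gives a quick consistency check on the rearrangement.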
6. The statistical model was created from real data on attendance at the weekly statistics lecture
(attendance not collected consistently for other lectures and tutorials for the same module) and overall
performance in the whole module.
a) Draw some conclusions on the relationship between attendance and performance based
on the answers to the previous questions.
b) How reliable do you think predictions of final grade are based solely on lecture attendance
and is it just attendance that increases grade or another underlying factor?
c) What else should be considered when predicting success or failure for individuals?
1 Introduction to Probability
*Start video lecture 1a*
1.1 Introduction/ Motivation

We are often faced with making decisions in the presence of uncertainty.
Example 1.1.1
• A doctor is treating a patient, and is considering whether to prescribe a particular drug.
How likely is it that the drug will cure the patient?
• A probation board is considering whether to release a prisoner early on parole. How likely is
the prisoner to re-offend?
• A bank is considering whether to approve a loan. How likely is it that the loan will be
repaid?
• A government is considering a CO2 emissions target. What would the effect of a 20% cut in
emissions be on global mean temperatures in 20 years’ time?
• Your own example:
We often use verbal expressions of uncertainty (“possible”, “quite unlikely”, “very likely”), but
these are often inadequate if we want to communicate with each other about uncertainty.
Consider the following example.
Example 1.1.2
You are being screened for a particular disease. You are told that the disease is “quite rare”, and
that the screening test is accurate but not perfect; if you have the disease, the test will “almost
certainly” detect it, but if you don’t have the disease, there is “a small chance” the test will
mistakenly report that you have it anyway. The test result is positive. How certain are you that you
really have the disease?
Clearly, in this example and in many others, it would be useful if we could quantify our
uncertainty. In other words, we would like to measure how likely it is that something will happen,
or how likely it is that some statement about the world turns out to be true. In this course we
introduce a theory for measuring uncertainty: probability theory.
There are (at least) two ways to think about the study of probability theory:
1. Pure mathematical approach
2. Modelling/data-driven approach
1.2 Foundations of probability
An experiment or trial is a process that results in one outcome out of all the possible outcomes.
The resultant outcome is unknown prior to the experiment. That is, the result of the experiment is
uncertain.
The 'list' of all possible outcomes or sample points is called the sample space, denoted S or Ω.
An event is one or more of the possible outcomes from an experiment. A simple event
corresponds to a single outcome or sample point.
Example 1.2.1
If you throw a fair die some examples of events are:
• A is the event that you roll a 5
• B is the event that an even number appears
• C is the event that an odd number occurs
• D is
The sample space is
Ω={ 1, 2 , 3 , 4 , 5 , 6 } .
Quick exercise: The sample space of a fair coin toss is Ω={}

Notation
Maths | Maths Word | Translation | Example (using Example 1.2.1)
P(A) | Probability | Probability of A occurring | P(A) = P(5) = 1/6
A ∪ B | Union | Event of A or B occurring | {2, 4, 5, 6}, with probability 4/6 = 2/3
A ∩ C | Intersection | Event of A and C occurring | {5}, with probability 1/6
A' or A^c | Complement | Event A does not occur | {1, 2, 3, 4, 6}, with probability 5/6
∅ | Empty Set | The impossible event | Probability 0, e.g. roll a 7
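The notation maps directly onto set operations, so one way to experiment with it is with Python sets. The sketch below uses the events from Example 1.2.1 and assumes all six outcomes are equally likely; the helper name `prob` is just for illustration.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}   # sample space for one roll of a fair die
A = {5}                      # roll a 5
B = {2, 4, 6}                # an even number appears
C = {1, 3, 5}                # an odd number occurs


def prob(event):
    """Probability of an event when every outcome in omega is equally likely."""
    return Fraction(len(event & omega), len(omega))


print(prob(A))                            # P(A)
print(sorted(A | B), prob(A | B))         # union: A or B
print(sorted(A & C), prob(A & C))         # intersection: A and C
print(sorted(omega - A), prob(omega - A)) # complement of A
print(prob(set()))                        # the impossible event
```

Using `Fraction` keeps the answers as exact fractions rather than rounded decimals, which matches how the probabilities are written in the table.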
Axioms of Probability
A probability function P assigns to each event E ⊆ Ω a real number P(E) such
that:
(A1) P(E)∈[0 ,1]
(A2) P(Ω)=1
(A3) If $\{E_1, \dots, E_n\}$ is a countable disjoint collection of events then
$P\left(\bigcup_{i=1}^{n} E_i\right) = \sum_{i=1}^{n} P(E_i).$
*End video lecture 1a*

*Start video lecture 1b*
Probability
Now that we are happy with the idea of the sample space for an experiment, and with the concept
of events, we’d like to be able to say something about how likely various events are: this is what
probability is all about!
We want to assign a probability to each event.
The probabilities may be defined by the data-generating process. For example, the probability of
getting a 6 when you roll a fair die is already defined when you consider how you would obtain the
data. You roll a die, there are six equally likely events, therefore P(6) = 1/6.
Thinking about a few examples one realises that one cannot assign probabilities arbitrarily.
Example 1.2.2
Say I am rolling a biased die.
The sample space of this experiment is {1, 2, 3, 4, 5, 6}.
Let E be the event that I roll a 6 on this die.
On the packaging, it displays the following table:
Event 1 2 3 4 5 6
P(Event) 0.1 0.1 0.1 0.1 0.1 0.6
What is P(E ∪ E^c)?
Is the packaging correct? Why/ why not?
So the probability of the event of “rolling or not rolling a 6” is 1.1 and that is not allowed because
probabilities lie between 0 and 1.
Why do probabilities lie between 0 and 1? Can you prove that? No you cannot. That is just
something we all agreed on. It’s an AXIOM.
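The check on the packaging in Example 1.2.2 can be sketched in Python. This is an illustration only: the dictionary `packaging` copies the table from the example, and `is_valid_distribution` is a hypothetical helper that tests (A1) for each outcome and (A2) for the whole sample space.

```python
from fractions import Fraction

# Probabilities printed on the packaging in Example 1.2.2
packaging = {face: Fraction(1, 10) for face in range(1, 6)}
packaging[6] = Fraction(6, 10)


def is_valid_distribution(probs):
    """True if every probability lies in [0, 1] and they sum to exactly 1."""
    each_in_range = all(0 <= p <= 1 for p in probs.values())
    return each_in_range and sum(probs.values()) == 1


total = sum(packaging.values())
print(total)                             # 11/10
print(is_valid_distribution(packaging))  # False
```

Each individual value passes (A1), so the problem only shows up when the probabilities of the disjoint outcomes are added together.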
Example 1.2.3
Let’s think of a more standard probability experiment where we all know what the probabilities
are: throwing a die.
The sample space is {1, 2, 3, 4, 5, 6}. The probability of the event {1} that the die comes up with a 1
is equal to 1/6; in symbols we write P({1}) = 1/6.
Similarly P({2}) = 1/6.
If I told you that P({1, 2}) = 0.4 you would object, because you would tell me that
P({1, 2}) = P({1} ∪ {2}) = P({1}) + P({2}) = 1/6 + 1/6 = 1/3 ≠ 0.4.
So there are rules to assigning probabilities.
In this example the idea of what probability should be assigned to each event is quite intuitive.
However, in order to deal with (potentially much) harder examples, we need to write down a set
of rules explicitly that probabilities should obey. These are the Axioms of probability.
Axiom Translation
(A1) A probability is always between
(A2) The certain event has probability
(A3) The probability of the union of disjoint events is the sum of the probabilities of the
individual events.
i.e. (A3) is just the “or-rule” from GCSE/A level.
Venn Diagrams
The Venn diagram can be used to illustrate the combination of events.
We use the Greek letter Ω to denote the universal set i.e. the sample space. Regions within this
space represent events.
Mutually exclusive events cannot occur together.

Quick exercise: Write down two mutually exclusive events when a fair die is rolled.
Example 1.2.4
Example 1.2.5: Disjoint events

• Draw a Venn Diagram to illustrate two disjoint events, E1 and E2.
• Shade in E1 ∪ E2.
• Use this to convince yourself that (A3) is true only for mutually exclusive events and that (A3)
can also be written as
P(A ∪ B) = P(A) + P(B) if A ∩ B = ∅
Quick exercise: Using the Venn diagrams, what would happen if A ∩ B ≠ ∅?
Example 1.2.6
Consider for example the events A = {1, 2} and B = {2, 3} when throwing a die.
We know that P(A) = P(B) = 1/3.
But P(A ∪ B) = P({1, 2, 3}) = 1/2 ≠ 1/3 + 1/3.
This is because the events A and B are not disjoint but both contain the outcome 2.
The next theorem gives some rules that may be inferred from the axioms of probability.
For any events A, B ⊆ Ω

(A4) P(A^c) = 1 − P(A);
(A5) P(∅) = 0;
(A6) P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
Theorem Translation
(A4) The probability of an event not occurring is 1 minus the probability
of it occurring.
(A5) The empty set has probability zero.
(A6) The probability of the union of any two events is the sum of the
probabilities of the individual events minus the probability of both
events occurring together.
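As a numerical check of (A4) and (A6), the events from Example 1.2.6 can be written as Python sets. This is a sketch that assumes a fair six-sided die; the helper `prob` is illustrative.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}
A = {1, 2}
B = {2, 3}


def prob(event):
    """Probability of an event for a fair die (equally likely outcomes)."""
    return Fraction(len(event), len(omega))


union = prob(A | B)                  # P(A ∪ B), counted directly
naive_sum = prob(A) + prob(B)        # counts the outcome 2 twice
corrected = naive_sum - prob(A & B)  # (A6): subtract the overlap once

print(union, naive_sum, corrected)   # 1/2 2/3 1/2
print(prob(omega - A) == 1 - prob(A))  # (A4) holds for A: True
```

The naive sum overshoots by exactly P(A ∩ B) = 1/6, which is the region a Venn diagram would show shaded twice.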
Example 1.2.7
Draw a Venn diagram to convince yourself that (A6) is true.
*End video lecture 1b*
*Start video lecture 1c*
1.3 Conditional probability

Example 1.3.1
Consider a game: you get to roll a fair die twice and if the total score is higher than 9 you win £6,
otherwise you lose £1.
Consider the sample space of outcomes:
Quick exercise: complete the table
(1, 1) (1, 2) (1, 3) (1, 4) (1, 5) (1, 6)
(2, 1) (2, 2) (2, 3) (2, 4) (2, 5)
(3, 1) (3, 2) (3, 3) (3, 4) (3, 5)
(4, 1) (4, 2) (4, 3) (4, 4) (4, 5)
(5, 1) (5, 2) (5, 3) (5, 4) (5, 5)
(6, 1)
The probability of you winning is the probability of getting:
So P(win) =
(there are a total of 36 possible outcomes).

Now suppose that the first roll of the die gives you a 6. What is your probability of winning now?
The number of possible scores has been drastically reduced from 36 to 6, and the winning
outcomes are now
Thus, P(win) =
What happened is that additional information has shrunk the sample space.
Before the first die stopped at six, the sample space contained 36 elements whereas after the first
die stops at six the sample space consists of only 6 elements.
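After completing the table by hand, the shrinking of the sample space in Example 1.3.1 can be checked by listing the outcomes in Python. The sketch below assumes two fair dice and the winning rule "total higher than 9"; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes (first roll, second roll)
sample_space = list(product(range(1, 7), repeat=2))
wins = [o for o in sample_space if sum(o) > 9]
p_win = Fraction(len(wins), len(sample_space))

# Extra information: the first roll is a 6, so only 6 outcomes remain
reduced_space = [o for o in sample_space if o[0] == 6]
wins_in_reduced = [o for o in reduced_space if sum(o) > 9]
p_win_given_six = Fraction(len(wins_in_reduced), len(reduced_space))

print(len(sample_space), len(reduced_space))  # 36 6
print(p_win, p_win_given_six)                 # 1/6 1/2
```

The second probability is computed entirely inside the reduced list of six outcomes, which is the idea that conditional probability formalises.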
We can consider this example in terms of conditional probability.
The conditional probability of A given B is denoted by P(A|B). And we say…
So, in this example,
P(A|B) =
Or, more formally,

Let B be any event such that P(B) > 0. For any event A the conditional
probability of A given B is
$P(A|B) = \frac{P(A \cap B)}{P(B)}.$
Exercise 1.3.2: Verify the above relation for the probabilities in Example 1.3.1.
• P(B) =
• P(A ∩ B) =
A simple rearrangement of the formula for conditional probability gives the multiplication rule.
Multiplication Rule
Let A, B be events with P(B) > 0. Then
P(A ∩ B) = P(A|B) P(B).
Note: This is formalising how you would calculate probabilities using a Tree Diagram at GCSE/ A-
level.
Example 1.3.3
Consider drawing two balls out of a bag containing 8 white and 4 red balls. What is the probability
that both balls are red?
Let R1 be the event that the first ball is red and R2 the event that the second ball is red.
The question asks for P(R1 ∩ R2).
It is easy to determine that P(R1) =
because all balls are equally likely to be picked and one third of the balls are red.
It is less easy to determine P(R2) because we do not know whether by the time the second ball is
picked there are still 4 or only 3 red balls in the bag, because that depends on the outcome of
the first draw.
This is where at GCSE/ A-level we might draw a tree diagram to help us calculate this probability.
So, draw a tree diagram to answer this question in the space below.
Which branch corresponds to P(R2|R1)?

So we can determine the conditional probability P(R2|R1) that the second ball is red given that
the first one was already red, because given that information we now know that there are 3 red
balls left and there are 11 balls left altogether, so P(R2|R1) = 3/11.
Now, can you use the definition of conditional probability/ the multiplication rule to determine
P(R1 ∩ R2) =
Does this agree with your tree diagram solution?
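Once you have your own answer to Example 1.3.3, one way to check it is to compare the multiplication rule with a brute-force count. The sketch below assumes the two balls are drawn without replacement and treats the 12 balls as distinguishable; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

balls = ["R"] * 4 + ["W"] * 8   # 4 red and 8 white balls

# Every ordered pair of distinct balls (first draw, second draw)
draws = list(permutations(range(len(balls)), 2))
both_red = [d for d in draws if balls[d[0]] == "R" and balls[d[1]] == "R"]
p_counted = Fraction(len(both_red), len(draws))

# Multiplication rule: P(R1 ∩ R2) = P(R2|R1) P(R1)
p_r1 = Fraction(4, 12)
p_r2_given_r1 = Fraction(3, 11)
p_rule = p_r2_given_r1 * p_r1

print(p_counted, p_rule)  # 1/11 1/11
```

Both routes give the same value, which is also what multiplying along the red-red branch of the tree diagram produces.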
Independence
Unlike in Example 1.3.3, there are situations where knowledge that an event B occurs does not
influence the probability of occurrence of an event A. This gives rise to the notion of independent
events.
Events A and B are independent if P(A ∩ B) = P(A) P(B).
Note: This is simply a rephrasing of the “and-rule” from GCSE/A-level.
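The definition can be tested directly by counting outcomes. The sketch below reuses the two-dice sample space from Example 1.3.1; the events "second roll is even" and the helper `independent` are introduced here purely for illustration.

```python
from fractions import Fraction
from itertools import product

omega = set(product(range(1, 7), repeat=2))  # two rolls of a fair die


def prob(event):
    """Probability of an event when all 36 outcomes are equally likely."""
    return Fraction(len(event), len(omega))


def independent(a, b):
    """Check the definition P(A ∩ B) = P(A) P(B) exactly."""
    return prob(a & b) == prob(a) * prob(b)


first_is_six = {o for o in omega if o[0] == 6}
second_is_even = {o for o in omega if o[1] % 2 == 0}
win = {o for o in omega if sum(o) > 9}

print(independent(first_is_six, second_is_even))  # True
print(independent(first_is_six, win))             # False
```

The first roll tells you nothing about whether the second roll is even, but it does change the chance of winning, so only the first pair of events is independent.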
*End video lecture 1c*
1.4 What have you learnt this week?

It’s good practice to summarise the key things you have learnt each week and identify bits you are
struggling with before the tutorial. For example, you could write down any formulae and key definitions
you may need for the tutorial. Try to re-write definitions in your own words as you are more likely
to remember and understand when preparing for a test. You can take brief notes into exams so
summaries here could be used in these notes.
Chapter 1 homework questions: Basic Probability
HOMEWORK: You must attempt ALL the robot questions BEFORE the tutorial
and ideally the mathematician questions from section 1 as well. If you are
struggling with any questions, go to the face to face session for help and/or
ask us in the tutorial.
ONLINE TEST which MUST be completed before the tutorial: Basic Probability
Robot Questions
1) A ten-sided die with numbers 1 to 10 is thrown.
Describe the sample space and find the probability of obtaining the event {1, 4, 5}.
2) A card is drawn at random from a pack of 52 cards.

Describe the sample space and find the probability of obtaining:
a) an ace b) a club c) an honours card (i.e. a picture card or an ace).
3) Two cards are drawn successively at random from a pack of 52 cards.

What is the probability of getting:
a) two hearts
b) an honours card and then a 10.
4) Two four-sided dice are thrown.
Let us write the outcome of an experiment as [i, j] where
i ∈ {1, 2, 3, 4} is the score on the first die and similarly j ∈ {1, 2, 3, 4} is the score on the second
die.
• Let A be the event that the sum of the dice is even;
• let B be the event that the first die shows a higher number than the second;
• and let C be the event that the sum of the two dice is 4.
We can write the event A by listing all the outcomes contained in the event:
A = {[1, 1], [1, 3], [2, 2], [2, 4], [3, 1], [3, 3], [4, 2], [4, 4]}
Similarly specify each of the following events as sets listing all their outcomes.
a) B =
b) C =
c) A ∩ B =
d) A ∪ C =
e) A ∩ C^c =
f) A^c ∩ C =
g) A ∩ B ∩ C =
5) Using a Theorem
If P( A)=2/3 , P(B)=1/ 4, and P( A ∪ B)=3/ 4, calculate P( A ∩ B).
Mathematician Questions – we will go through this in the tutorial

1) Newspaper readership
The probability that a man reads the Sun is 0.7. The probability that he reads both the Sun and the
Guardian is 0.1. The probability that he reads neither is 0.2. Find the probability that he reads the
Guardian.
2) Independence
Let $P(A) = \frac{1}{3}$, $P(B) = p$ and $P(A \cap B) = \frac{2}{15}$. Find the value of $p$ such that the
events $A$ and $B$ are independent.
3) Driving licence
To get a driving licence, you have to pass a theory test and a practical. The probability that an
individual passes the theory test is 0.9. The chances of passing the practical test are greatly
improved by passing the theory test. In particular, if an individual passes the theory test, they have
a 0.7 chance of passing the practical. Otherwise, the chance of passing the practical is 0.4. Without
using a Tree Diagram, find the probability that an individual gets a driving licence.
4) Random draws
A ball is drawn at random from an urn containing 8 red and 6 white balls. If a white ball is drawn, it
is put back into the urn. If a red ball is drawn, it is returned to the urn together with 6 more red
balls. Then a second draw is made. What is the probability that a red ball was drawn on both the
first and second draws?
5) Maths and gender

It is claimed that boys are more likely than girls to take maths A level so 486 students are asked
whether or not they have maths A level and the gender they identified as.
252 students had maths A level, 43% of students asked identified as male and 138 of those
identifying as male had maths A level.
a) Fill in the table of frequencies below and then use it to calculate the following probabilities.
State the answer numerically but also use probability notation, e.g. P(M) = probability of
identifying as male and P(A) = probability of having maths A level.

Gender   | GCSE and below | A level and above (A) | Total
Female   |                |                       |
Male (M) |                |                       |
Total    |                |                       | 486

b) Probability of a randomly selected respondent identifying as female
c) The probability that a randomly selected participant has maths A level.
d) Probability that a randomly selected participant identifies as female and has maths A level.
e) Probability that a student has maths A level given that they identify as female
f) The probability that a student identifies as female given they have maths A level.
Chapter 1 tutorial questions: Basic Probability
Extended Applied Task

Weather predictions by hour for exact locations are commonly used by many weather sites with
indications of the likelihood of rain.
https://blog.metoffice.gov.uk/2016/04/21/whats-the-chance-of-an-april-shower/
1) You are going away for the weekend and the Met Office predicts that the likelihood of rain on
Saturday is 5% and the likelihood of rain on the Sunday is 10%.
a) Assuming that the likelihoods are independent, what is the probability of it NOT raining whilst you are
away P(No rain on either day)?
b) What is the probability of it raining on one day only?
c) What is the probability of it raining on at least one day?
2) The Met Office claims to predict the next day's temperature to within 2 degrees 90% of the time.
a) What is p = P(they are out by more than 2 degrees on any given next day)?
b) What is the probability of not making a mistake on any given next day in terms of p and
numerically?
c) How many days in a month (30 days) do you expect them to be wrong about the next day
temperature?
3) Let X = r.v. ‘number of days up to and including their first mistake’ and p as in Question 2.
Write the following answers in terms of p and numerically. What is the probability that the first day they
are wrong is:
a) first day
b) second day
c) third day
a) $P(X = 1) =$
Create a formula in terms of x and p which will allow the calculation of the probability of the first
'success' for any x and p.
$P(X = x) =$
4) You are going away for a week. The chance of rain is 5% on each day.
a) If p = P(rains on one day), how would you express P(it does not rain on one day) in terms of p?
b) What is the probability of it not raining whilst you are away (assuming events are independent)?
Write this in terms of p and calculate the probability.
P(it does not rain on one day) =
c) If X = r.v. ‘no. of days it rains in 7 days’, what is the probability that it rains on one day only? Write this
in terms of p and calculate the probability. Hint: Consider how many ways this can happen.
d) What is the probability of it raining on the first two days only? Write this in terms of p and calculate
the actual probability.
2 Discrete Distributions
2.1 Random Variables

Before we think about distributions it is important to understand the concept of a random
variable. We want to know the outcome of some trial or event but until we complete the trial we
don’t know the outcome.
The outcome of interest, the variable, is a random variable
Note that data may come in two different forms:
i.) Data consisting of whole numbers.
We can consider this kind of data as the values taken by discrete variables.
(e.g. the number of counts registered by a Geiger counter, the number of faulty items
produced by a company etc).
Quick exercise: write down a discrete random variable
ii.) Data consisting of numbers which can take any value within certain limits.
We can regard this kind of data as the values taken by continuous variables.
(e.g. the air temperature over a certain period, birth weights of babies etc).
Quick exercise: write down a continuous random variable
In statistics we are constantly considering variables and the values they take.
Frequently we wish to determine the probability with which a particular variable can take certain
values.
Example 2.1.1
Consider the simple experiment of tossing three coins simultaneously and noting the result.
We can define a random variable X to be the number of heads that turn up.
The possible values that X can take are…
We may be interested in the probability that X takes a value of two or more.
X can take only whole number values and until we toss the coins we do not know how many heads
will turn up (i.e. random). Putting this together, it is thus a discrete random variable.
More formally, a random variable is a quantity that depends on the outcome of a random event
(i.e. a probability experiment).
Example 2.1.2
When you throw two dice:
• X = the sum of the two scores showing
• Y = the product of the scores
• Z = the larger of the two scores
• Your own example:
are all random variables.
Notation
Random variables are denoted by a capital letter, as in Example 2.1.2. Realisations of the random
variable, i.e. the outcome of the experiment, are denoted by a small letter.
e.g., the random variable $X$ can take values $x_1, x_2, x_3, x_4, x_5$. We may be interested in $P(X = x_1)$ or
some other value or collection of values.
Formally, a random variable, $X$, represents a function which associates a real number with every
outcome in a sample space.
Maths: $X : \Omega \to \mathbb{R}$
We denote the range of this function (i.e. the values it can take) as $X(\Omega)$.
Example 2.1.3
Random variables can also arise from real-life observation. For example, X, the number of
telephone calls arriving at a switchboard between 10:00am and 10:30am is also a random
variable.
Example 2.1.4
Consider the following bet: you roll a fair die and
a) win £2 if the outcome is a 5 or a 6,
b) lose £1 if the outcome is 1, 2 or 3,
c) win or lose nothing if the outcome is 4.
The sample space of this random experiment is $\Omega =$
Denoting losses as negative gains, we can represent the gains from this experiment as a function
$X : \Omega \to \mathbb{R}$ defined as
$$
X(\omega) = \begin{cases}
-1 & \text{if } \omega = 1, 2, 3 \\
0 & \text{if } \omega = 4 \\
2 & \text{if } \omega = 5, 6
\end{cases}
$$
The amount you gain is the random variable.
The range of $X$ (i.e. the values the random variable can take) is $X(\Omega) = \{-1, 0, 2\}$.
Since $X$ can only take distinct values, it is a discrete random variable.
2.2 Discrete probability distributions
When values of a variable have a probability attached, they form a probability distribution.
Probability distributions are theoretical distributions based on some mathematical model which is
set up to describe some particular situation. A discrete probability distribution is simply a list of
the probabilities associated with each possible outcome of the experiment.
Remember the notation: We use capital letters (often W , X , Y , Z ) to mean the random variable
and small letters (e.g. w , x , y , z ) to mean the particular values of the random variables.
Example 2.2.1
Flip a coin three times. Let X =number of heads . What is the probability distribution of X ?
• List the possible outcomes in the space below. What is the probability associated with each?
• Complete the following table (called a “possibility space”).

x        | 0   | 1 |   |
P(X = x) | 1/8 |   |   |

Given a sample space $\Omega$ and a discrete random variable $X$, the function
$p_X : \mathbb{R} \to [0, 1]$ defined by
$$p_X(x) = P(X = x)$$
is called the probability mass function of $X$.
Example 2.2.2
In Example 2.2.1, the probability mass function is given by:
$p_X(x) =$
Exercise 2.2.3: Write down the probability mass function for the discrete random variable
defined in Example 2.1.4
Theorem
Let $X$ be a discrete random variable that takes values in the set $\{x_0, x_1, x_2, \ldots\}$.
Then its probability mass function $p_X$ satisfies:
$$p_X(x_k) \ge 0, \quad k = 0, 1, 2, \ldots \tag{m1}$$
$$\sum_{\text{all } x_k} p_X(x_k) = 1. \tag{m2}$$
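Conditions (m1) and (m2) can also be checked mechanically. The Python sketch below is only an illustration: the function name `is_valid_pmf` and both example tables (a fair four-sided die and a deliberately broken table) are assumptions made for this sketch and are not taken from the course material. Exact fractions are used so that the sum in (m2) can be compared with 1 without rounding error.

```python
from fractions import Fraction


def is_valid_pmf(pmf):
    """Check (m1) every probability is >= 0 and (m2) the probabilities sum to 1.

    `pmf` maps each possible value x_k to p_X(x_k).
    """
    non_negative = all(p >= 0 for p in pmf.values())  # condition (m1)
    sums_to_one = sum(pmf.values()) == 1              # condition (m2)
    return non_negative and sums_to_one


# Hypothetical example: the score on a fair four-sided die.
fair_d4 = {x: Fraction(1, 4) for x in range(1, 5)}

# Hypothetical example that fails (m2): the probabilities sum to 5/4.
broken = {1: Fraction(1, 2), 2: Fraction(1, 2), 3: Fraction(1, 4)}

print(is_valid_pmf(fair_d4))
print(is_valid_pmf(broken))
```

If the probabilities are stored as decimals rather than fractions, comparing the sum with 1 using `math.isclose` is the safer choice, because decimal values such as 0.1 are not represented exactly.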
Exercise 2.2.4: Which of the following are a valid probability mass function/ possibility space?
Give a reason for any which are not.
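x        | 1   | 2   | 3   | 4   | 5
P(X = x) | 0.2 | 0.3 | 0.4 | 0.3 | 0.2

x        | -2  | -1  | 0   | 1   | 2
p_X(x)   | 0.2 | 0.3 | 0.1 | 0.3 | 0.1

$$
p_X(x) = \begin{cases}
\frac{1}{2} & \text{if } x = 1 \\
\frac{1}{4} & \text{if } x = 2 \\
\frac{1}{4} & \text{if } x = 3
\end{cases}
$$
$$
p_Y(y) = \begin{cases}
0.3 & \text{if } y = 1 \\
-0.2 & \text{if } y = 2 \\
0.9 & \text{if } y = 3
\end{cases}
$$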
Let $X$ be a random variable. The (cumulative) distribution function of $X$ is
the function $F_X : \mathbb{R} \to [0, 1]$ defined by
$$F_X(x) = P(X \le x).$$
For discrete random variables this is the sum of all probabilities for outcomes
up to and including $x$.
In general, for a discrete random variable, the distribution function is obtained simply by summing
the mass function for all values up to $x$.
Formally, for any discrete random variable $X$ taking values in $\{x_0, x_1, x_2, \ldots\}$ with $x_0 < x_1 < \ldots$,
$$F_X(x) = P(X \le x) = \sum_{x_k \le x} p_X(x_k).$$
Example 2.2.5
Recall Example 2.2.1: flip a coin three times. Let X =number of heads .
a) What is the probability that you get 1 head or less?
b) $F_X(1) = P(X \le 1) =$
c) $F_X(2) =$
Example 2.2.6
In Example 2.2.1, the cumulative distribution function (cdf) in tabular form is given by:
x                 | 0   | 1               | 2               | 3 | 4
F_X(x) = P(X ≤ x) | 1/8 | 1/8 + 3/8 = 4/8 | 4/8 + 3/8 = 7/8 |   |
Here are some more general properties satisfied by the distribution function of any random
variable:
1. $F_X(x)$ is increasing in $x$.
(i.e. if $a < b$ then $F_X(a) \le F_X(b)$.)
2. $\lim_{x \to -\infty} F_X(x) = 0$
3. $\lim_{x \to \infty} F_X(x) =$
Example 2.2.7
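a) In Example 2.1.4, what is $F_X(-1)$?
b) In Example 2.2.6, the cumulative distribution function (cdf) is given by:
$$
F_X(x) = \begin{cases}
0 & \text{if } x < 0 \\
\frac{1}{8} & \text{if } 0 \le x < 1 \\
\frac{4}{8} & \text{if } 1 \le x < 2 \\
\frac{7}{8} & \text{if } 2 \le x < 3 \\
1 & \text{if } x \ge 3
\end{cases}
$$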
c) We can also plot the cdf (see below). Comment on the features of this plot.
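The link between the mass function and the distribution function for the three-coin experiment can be explored with a short Python sketch. It is an illustration only: it assumes a fair coin (so that all eight sequences are equally likely), and the names `outcomes`, `pmf` and `cdf` are chosen for this sketch. Evaluating the cdf at non-integer values of x shows that it stays flat between the possible values of X and jumps at each of them.

```python
from fractions import Fraction
from itertools import product

# Sample space for three flips of a fair coin: all 8 sequences of H and T,
# each assumed to be equally likely.
outcomes = list(product("HT", repeat=3))

# Mass function: p_X(x) = P(X = x), where X counts the heads in an outcome.
pmf = {}
for outcome in outcomes:
    heads = outcome.count("H")
    pmf[heads] = pmf.get(heads, Fraction(0)) + Fraction(1, len(outcomes))


def cdf(x):
    """F_X(x) = P(X <= x): sum the mass function over all values up to x."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))


# Evaluate the cdf at integer and non-integer points.
for x in [-1, 0, 0.5, 1, 2, 2.9, 3, 10]:
    print(x, cdf(x))
```

Note that `Fraction` prints values in lowest terms, so 4/8 is displayed as 1/2.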
Recap
So far in this section we have covered:
• Random variables – discrete and continuous
• Probability distributions
• The probability mass function for discrete random variables
• The cumulative distribution function.

2.3 Frequently used discrete probability distributions

There are certain types of discrete probability distributions that are used very frequently - in fact,
we have already met some of these distributions in lectures and tutorials. We will now formally
introduce these distributions and a few others in the following subsections.
We take n ∈ N and p ∈[0 ,1] throughout.
2.3.1. Discrete Uniform Distribution

The scores on a fair die provide six possible outcomes…
Each outcome has the same probability…
Example 2.3.1.1: Rolling a fair die
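• Complete the following table

x      |
p_X(x) |

• Complete the probability mass function for this example.
$$
p_X(x) = \begin{cases}
\frac{1}{6} & \text{if } x = 1, 2, \underline{\hspace{2cm}} \\
0 & \text{if } x \ne 1, 2, \underline{\hspace{2cm}}
\end{cases}
$$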
This is an example of the discrete uniform distribution also known as the “equally likely
outcomes” distribution. Discuss why you think this distribution might have these names.
We say that a random variable $X$ has a discrete uniform distribution and
write $X \sim \text{Uniform}(n)$, if $X(\Omega) = \{x_1, x_2, \ldots, x_n\}$ and it has mass function
$$
p_X(x) = \begin{cases}
\frac{1}{n} & \text{if } x \in X(\Omega) = \{x_1, x_2, \ldots, x_n\} \\
0 & \text{otherwise.}
\end{cases}
$$
Note that you may see alternative notation in the literature such as:
• $X \sim \text{Uniform}(k)$
• $X \sim \text{Unif}(n)$
• $X \sim U(k)$
They all mean that “$X$ has a uniform distribution” (i.e. $n$ or $k$ outcomes, each with equal
probability of occurrence).
Example 2.3.1.2
A special case of the discrete uniform distribution is when $X(\Omega) = \{x_1, x_2, \ldots, x_n\} = \{1, 2, \ldots, n\}$.
• Write down the mass function for this special case of the discrete uniform distribution.
• Sketch the pmf of this special case of the discrete uniform distribution.
• Can you think of any examples of the discrete uniform distribution?
2.3.2. Bernoulli Distribution
Many common discrete distributions are associated with a Bernoulli trial in which only one of two
possible outcomes can occur. These outcomes are commonly referred to as "success" (1) and
"failure" (0).
Example 2.3.2.1
• When you press a light switch the bulb either comes on or it does not.
• After manufacture an article may be classed as either "defective" or "non-defective".
• A patient referred to a consultant may attend or not attend.
• Any situation where we have an event and a complementary event…
• Your own example:
Hence, if the probability that a "success" will occur at a Bernoulli trial is $p$, where $0 \le p \le 1$, then the
probability that a "failure" will occur at the trial must be $q = 1 - p$.
But the Bernoulli trial also forms a distribution in its own right.
We say that the random variable $X$ has the Bernoulli distribution with
parameter $p$, and write $X \sim \text{Bernoulli}(p)$, if it only takes values 0 and 1, i.e.
$X(\Omega) = \{0, 1\}$ with
$P(X = 1) = p$
and
$P(X = 0) = 1 - p$.
The mass function of $X$ is
$$
p_X(x) = \begin{cases}
1 - p & \text{if } x = 0 \\
p & \text{if } x = 1 \\
0 & \text{if } x \ne 0, 1.
\end{cases}
$$
Example 2.3.2.2
Consider a Bernoulli trial in which a biased coin is flipped. Suppose the probability of a tail is 0.2.
Assuming trials are independent what is the chance that a tail will occur for the first time on the:
a) First flip? $P(T) =$
b) Second flip?
c) Third flip?
d) Fourth flip?
e) If $X$ is the number of trials to the first tail then
$P(X = x) =$
f) Now suppose the probability of a tail is $p$. If $X$ is the number of trials to the first tail then
$P(X = x) =$
2.3.3. Geometric Distribution

Example 2.3.2.2 is an introduction to our next famous distribution- the geometric distribution.
This example illustrates that a geometric distribution describes the waiting time until a success in a
series of (independent) Bernoulli trials (trials that lead to either success or failure).
We say that the random variable $X$ has the geometric distribution with
parameter $p \in (0, 1]$, and write $X \sim \text{Geo}(p)$, if $X(\Omega) = \{1, 2, \ldots\} = \mathbb{N}$ and it has mass function
$$
p_X(n) = \begin{cases}
(1 - p)^{n-1} p & \text{if } n = 1, 2, \ldots \\
0 & \text{otherwise.}
\end{cases}
$$
In other words, $X$ represents the number of the trial at which the first "success" occurs.
Why do you think this distribution is called the “Geometric” distribution?
Note: you may also see the notation $X \sim \text{Geometric}(p)$.
Example 2.5.1
Consider an American roulette wheel, with numbers 00, 0, 1, 2, 3, ..., 36 (38 numbers in total). A ball
is thrown onto the wheel as it is spinning, and comes to rest by one of the numbers. If you always
bet that the ball will stop on one of the numbers 1, 2, ..., 12, what is the probability that you will:
a) Win on your first bet?
b) Win on your second bet?
c) Win on or before your third bet?
d) Can you write what you were asked to calculate in parts a-c using the notation p X (n) and
F X (n)?
In part (c) you were calculating $P(X \le 3)$; this is the cumulative distribution function of a Geometric
random variable. To save you some time, we will now derive a general formula for the distribution
function of a geometric random variable.
$F_X(x) = P(X \le x) =$
In the last equality we introduced the notation $\lfloor x \rfloor$ to denote the largest integer less than or equal
to $x$. Using the probability mass function given above we then find for $x \ge 1$ that
$F_X(x) =$
Which formula from A level could we use now?
This gives, for $x \ge 1$,
$F_X(x) =$
The distribution function for $X \sim \text{Geo}(p)$ is thus given by
$$
F_X(x) = P(X \le x) = \begin{cases}
0 & x < 1 \\
1 - (1 - p)^{\lfloor x \rfloor} & x \ge 1
\end{cases}
$$
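As a numerical sanity check, the closed-form distribution function can be compared with a direct sum of the geometric mass function. The Python sketch below is an illustration only; the function names and the value p = 0.3 are assumptions chosen for this sketch and do not come from the examples in the text.

```python
import math


def geometric_pmf(n, p):
    """p_X(n) = (1 - p)^(n - 1) * p for n = 1, 2, ...; 0 otherwise."""
    if isinstance(n, int) and n >= 1:
        return (1 - p) ** (n - 1) * p
    return 0.0


def geometric_cdf(x, p):
    """Closed form: F_X(x) = 1 - (1 - p)^floor(x) for x >= 1; 0 for x < 1."""
    if x < 1:
        return 0.0
    return 1 - (1 - p) ** math.floor(x)


p = 0.3  # hypothetical success probability, chosen only for illustration
for x in [0.5, 1, 2.7, 5, 10]:
    # Sum the mass function over n = 1, ..., floor(x) and compare with the formula.
    partial_sum = sum(geometric_pmf(n, p) for n in range(1, math.floor(x) + 1))
    print(x, partial_sum, geometric_cdf(x, p))
```

The two printed columns should agree up to floating-point rounding, including at non-integer x, where the floor function keeps the cdf constant between whole numbers of trials.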
2.3.4. Binomial Distribution

The Binomial distribution, like the Geometric distribution, makes use of the Bernoulli trial. But,
whereas the geometric distribution investigates the time to a success, the binomial distribution
investigates the number of successes in a finite number of trials.
Example 2.3.4.1
Consider an experiment in which a biased coin is tossed 4 times.
Suppose the probability of a head is 0.8.
We want to find the probability distribution of the random variable X =number of heads.
• Complete the following table.

Outcomes | Combinations                  | Number of combinations | Probability of One Combination
0 heads  | TTTT                          | 1                      |
1 head   | HTTT THTT TTHT TTTH           | 4                      |
2 heads  | HHTT HTHT HTTH TTHH THTH THHT | 6                      |
3 heads  | THHH HTHH HHTH HHHT           | 4                      |
4 heads  | HHHH                          | 1                      |

• Hence, can you think of a formula in terms of p for the “probability of one combination”?
• Complete the following table that represents the possibility space for this distribution.

Number of Heads = x | Number of combinations | Probability of One Combination | P(X = x)
0                   | 1                      | 0.0016                         | 0.0016
1                   |                        | 0.0064                         | 0.0256
2                   | 6                      |                                |
3                   | 4                      | 0.1024                         | 0.4096
4                   | 1                      | 0.4096                         | 0.4096
Total =

• Can you write a formula in words for P(X = x)?
To find the number of combinations in this example, we do not have to list them all!
Exercise: What is the formula for the number of combinations of r items out of n items when the
order does not matter?
1. How many ways can you arrange n items?
The notation for this is

2. So how many ways can you arrange the r items (if order does not matter)?
3. How many repeats will there be?
4. Hence, can you derive a formula for the number of combinations?
Note: 0 !=1.
• The Binomial distribution occurs when we have a fixed number ($n$) of independent trials.
• Each trial has only two outcomes, usually called 'success' and 'failure'.
• The probability of success ($p$) is the same for each trial.
• The random variable $X$ is the number of successes in the $n$ independent (Bernoulli) trials.
Can you put all this together and complete the definition of the Binomial distribution (below)?
Note: This derivation (above) is non-examinable.
We say that the random variable $X$ has the binomial distribution with
parameters $n$ = number of trials and $p$ = probability of success, and write $X \sim \text{Binomial}(n, p)$,
if $X(\Omega) = \{0, 1, 2, \ldots, n\}$ and it has mass function
$$
p_X(k) = P(X = k) = \begin{cases}
\binom{n}{k} p^k (1 - p)^{n-k} & \text{if } k = 0, 1, 2, \ldots, n \\
0 & \text{otherwise.}
\end{cases}
$$
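The binomial mass function is straightforward to evaluate by computer, which gives one way to check hand calculations. The sketch below is an illustration only: the function name `binomial_pmf` is an assumption made for this sketch, and the values n = 4 and p = 0.8 are those of the biased coin in Example 2.3.4.1. It relies on `math.comb` from the Python standard library (Python 3.8 or later) for the number of combinations.

```python
from math import comb


def binomial_pmf(k, n, p):
    """p_X(k) = C(n, k) * p^k * (1 - p)^(n - k) for k = 0, 1, ..., n; 0 otherwise."""
    if isinstance(k, int) and 0 <= k <= n:
        return comb(n, k) * p**k * (1 - p) ** (n - k)
    return 0.0


n, p = 4, 0.8  # the biased coin of Example 2.3.4.1: 4 tosses, P(head) = 0.8

# One row per value of k: k, number of combinations, P(X = k) to 4 decimal places.
for k in range(n + 1):
    print(k, comb(n, k), round(binomial_pmf(k, n, p), 4))

# The probabilities over k = 0, ..., n should sum to 1 (up to rounding).
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))
print("Total =", round(total, 4))
```

The printed rows can be compared with the entries already given in the tables for this example.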
Example 2.3.4.2
• Complete the following table to verify that our pmf gives the same values as our
calculations in Example 2.3.4.1.

Number of Heads = k | Number of combinations $\binom{n}{k}$ | Probability of One Combination | P(X = k) = p_X(k) (using pmf)
0                   | 1                                     | 0.0016                         | 0.0016
1                   |                                       | 0.0064                         | 0.0256
2                   | 6                                     |                                |
3                   | 4                                     | 0.1024                         | 0.4096
4                   | 1                                     | 0.4096                         | 0.4096
Total =
Recap
In this chapter we covered:
• Discrete/continuous random variables
• The probability mass function and cumulative distribution functions for discrete random
variables
• The Discrete Uniform, Bernoulli, Geometric and Binomial distributions.
*End video lecture 2c*
2.4 Summary so far
It’s good practice to summarise key points from each chapter in your own words to ensure you
understand and to refer back to when revising. For example, summarise the distributions covered
so far or the axioms of probability as below.
1) Identify the distribution given below. State the axioms of probability in your own words and
demonstrate that it is a valid probability mass function (pmf).
$$
p_X(x) = \begin{cases}
1 - p & \text{if } x = 0 \\
p & \text{if } x = 1 \\
0 & \text{if } x \ne 0, 1.
\end{cases}
$$
Fill in the table below to help decide which distribution to use.

Distribution     | When to use and parameters?                          | Probability mass function
Discrete Uniform | Each of k outcomes is equally likely; X ∼ Uniform(k) | p_X(x) = P(X = x) = 1/k
Bernoulli        |                                                      |
Binomial         |                                                      |
Geometric        |                                                      |
2.5 Online tests this week

This week there are three relevant online quizzes:
• Discrete Uniform
• Binomial
• Geometric
Complete the Binomial one to count as attendance but do try the others as well.
Chapter 2 homework questions: Discrete Distributions
Remember to work through the homework questions BEFORE the tutorial.
Robot Questions
Calculate the value of c that makes the following valid probability mass functions.
Question 1
x        | 1 | 2  | 3  | 4
P(X = x) | c | 2c | 3c | 4c
Question 2
x        | 1 | 2   | 3   | 4
p_X(x)   | c | c/2 | c/3 | c/4
Question 3
$$
p_X(x) = \begin{cases}
cx, & x = 3, 4, 5 \\
c(11 - x), & x = 6, 7, 8 \\
0, & \text{otherwise}
\end{cases}
$$
Complete the following tables.

Question 4
x        | 15  | 30  | 40
P(X = x) | 0.5 | 0.3 | 0.2
P(X ≤ x) |     |     |
Question 5
x        | 1   | 2   | 3   | 4
p_X(x)   | 0.3 | 0.2 | 0.3 | 0.2
F_X(x)   |     |     |     |
Question 6
x        | -2  | -1  | 0   | 1   | 2
p_X(x)   |     |     |     |     |
F_X(x)   | 0.1 | 0.2 | 0.4 | 0.7 | 1
Question 7: I have an unbiased die. For each of the following state, with a reason, the distribution of the
random variable. (Discrete uniform, geometric, binomial, other)
a) X = the number of times I roll until I get a 6.
b) Y = the score showing on the top face when I roll it.
c) Z = the score showing on the top face when I roll it
Question 8) In the production of a Micro SD card, it is found that 10% are defective. The cards are
produced in batches of 10.
a) Write down a suitable model for the distribution of defective components in a batch.
b) Find the probability that a batch contains:
i) no defective Micro SD cards ii) 2 defective Micro SD cards iii) at least 3 defective Micro SD cards.
Question 9: A shop receives a batch of 1000 cheap lamps. The probability that a lamp is defective
is 0.1%. Let X be the number of defective lamps in the batch.
a) What kind of distribution does X have? What is/are the value(s) of the parameter(s) of this
distribution?
b) What is the probability that the batch contains no defective lamps? One defective lamp?
More than two defective lamps?
Mathematician Questions
Question 1) NBA All Star Weekend! (Google this if you have no idea what we’re talking about. Definitely
youtube the Slam Dunk Contest!) https://www.youtube.com/watch?v=u7VgkfcSYz0
a) Slam Dunk Contest: In the Slam Dunk contest, players compete to perform the “best” dunk.
They get three attempts at a dunk. If/when they make the basket they stop and get a score. If the
probability of Aaron Gordon making a particular dunk is 80%, what is the probability he does not
make the dunk in the contest?
b) Three-point Contest
In the Three-point Contest, players compete to make the most three-point shots out of 25 taken. If
Steph Curry makes 44% of the three-pointers he takes, how many would you expect him to make
in the contest? Klay Thompson went before Steph Curry and made 23 shots. What is the
probability Steph beats Klay?
c) Challenge discussion question
The NBA Skills Challenge is a competition to test ball-handling, passing and shooting ability. In the
current version of the contest, two participants race against each other on identical courses by first
dribbling between five obstacles while running down the court. Next, the player must throw a pass
into a net that does not touch the ground. Then, the players must dribble back the full length of the
court for a lay-up. Shortly after, the players must dribble back down the court and hit a three-
pointer from the top of the basketball key. The match ends when the first player hits the three
pointer. Kristaps Porzingis is competing in the skills challenge. He has reached the final challenge: to
make the three-pointer. In the regular season, Kristaps’s three-point percentage is 35.7%.
a) How many shots do you think Kristaps will have to take before he makes the basket?
(Despite the rules of the game, Kristaps is competitive and will shoot until he makes the basket!)
b)What is the probability he makes the basket on his third attempt?
c) What assumptions have you made to answer this question?
3 Continuous Distributions and Expectation
This chapter is an introduction to continuous distributions and expectation of discrete and
continuous variables which will be covered again in more detail in the second probability section
of the course.
3.1 Continuous Distributions

The probability distributions considered so far have enabled us to answer questions such as:
• What is the probability of getting 2 defective articles in a batch of 10?
• What is the probability I win the jackpot in the lottery on my fifth time of trying?
The common factor associated with these questions is that the random variable being described
can only take discrete integer values. Thus, discrete probability distributions were used.
However, if we want to ask questions such as:
• What is the probability that a can of Diet Coke contains 330ml of the drink?
Then, the variable is NOT restricted to a discrete value. It may be measured to any degree of
accuracy on a continuous scale, dependent only upon the measuring equipment.
To describe these situations we use continuous probability distributions.
The first problem we encounter is that P ( X=x ) =0 for all values of the random variable.
Discuss why you think this is the case. Does this mean all values of the random variable are
impossible?
Therefore, continuous random variables are characterised by the properties of their (cumulative)
distribution functions. This is because we can define the cumulative probability distribution (cdf)
of a continuous distribution in the same way as we did in Chapter 2 for discrete distributions:
F ( x )=P ( X ≤ x ) .
In fact, if you look back to Chapter 2, you will note that we didn’t state that X was a discrete
random variable in our definition of the cdf. The cdf exists for all random variables and all values
on the continuous scale regardless of whether the variable is discrete or continuous.
Note: the properties of the cdf stated in Chapter 2 apply to the cdf of any random variable.
However, there are some special properties of the cdf for discrete and continuous random
variables.
Find the property/ properties of the cdf in Chapter 2 that only applies/ apply to discrete random
variables.
Now we state some properties of the cdf for continuous random variables.
• Suppose that a random variable $X$ can take any value in the range $(a, b)$ (i.e. $a$ to $b$).
Then the cumulative probability distribution is of the form:
$$
F_X(x) = \begin{cases}
0 & \text{for } x < a \\
\text{a monotonically increasing function} & \text{for } a \le x \le b \\
1 & \text{for } x > b
\end{cases}
$$
• Note: since $P(X = x) = 0$ it follows that:
$P(X \le x) = P(X < x)$
and
$P(X \ge x) =$
Recall from Chapter 2 that: In general, for a discrete random variable, the distribution function is
obtained simply by summing the mass function for all values up to x .
Example 3.1.1
Look at and discuss the following diagram:
• Do you think it represents a discrete or continuous distribution?
• If the class intervals are (0, 0.1], (0.1, 0.2], …, (0.9, 1.0], what probability does the shaded
area represent?
• Put a tick in the bars you would need to calculate
$F_X(0.2) = P(X \le 0.2)$
[Figure: Histogram with class interval of 0.1; probability = height × 0.1. Vertical axis from 0 to 0.2; bars centred at 0.05, 0.15, …, 0.95.]
• Now suppose we narrow our class width to 0.01. The resultant histogram is below.
[Figure: Histogram with class interval 0.01; probability = height × 0.01. Vertical axis from 0 to 0.018.]
• Now suppose we let the class widths get really small.

Maths: Let the class width be denoted δx and let δx → 0.
If we plot a curve that represents the top of these tiny bars, we get:
[Figure: Histogram with class width δx and the curve f(x) drawn over the tops of the bars; horizontal axis from 0 to 1.]
• Looking at the diagram above, how could you calculate the following probabilities (in terms
of the function $f(x)$)?
$P(0.2 \le X \le 0.4) =$
$P(X \le 0.3) = F_X(0.3) =$
• Can you think how the cdf will be linked to the function denoted $f(x)$ in the figure above?
The function f (x) is known as the (probability) density function (pdf) of the random variable X .
We can use this function to help us mathematically define a continuous random variable.
We call a random variable $X$ continuous if its distribution function $F_X$ can be
written as
$$F_X(x) = \int_{-\infty}^{x} f_X(s)\,ds, \quad x \in \mathbb{R},$$
for some non-negative function $f_X : \mathbb{R} \to [0, \infty)$.
In this case, we say that $f_X$ is the density function of $X$.
The fundamental theorem of calculus implies (under some conditions) that for each $x \in \mathbb{R}$,
$$\frac{d}{dx} F_X(x) = \frac{d}{dx} \int_{-\infty}^{x} f_X(s)\,ds = f_X(x).$$
Exercise: Translations
• We can get $F_X$ from $f_X$ by…
• We can get $f_X$ from $F_X$ by…
• We can think of $f_X$ as…
Theorem
Let $X$ be a continuous random variable, then its density function $f_X$ satisfies
$$f_X(x) \ge 0, \quad \forall x \in \mathbb{R} \tag{d1}$$
and
$$\int_{-\infty}^{\infty} f_X(x)\,dx = 1. \tag{d2}$$
Conversely, any real function $f_X$ satisfying (d1) and (d2) is the density function
of some continuous random variable.
Example 3.1.2
Consider the function
$$
g(x) = \begin{cases}
x^{c-1}, & 0 \le x \le 1 \\
0 & \text{otherwise.}
\end{cases}
$$
Is this a probability density function?
1. d1: Is $g(x) \ge 0 \ \forall x \in \mathbb{R}$?
Firstly, outside of the range [0, 1] it is zero, so that is OK. But is it positive in the range
[0, 1]? Yes: since $x$ is positive over the range, $g(x)$ is also positive over the range.
2. d2: Is total probability 1?
$$\int_{-\infty}^{\infty} g(x)\,dx = \int_0^1 x^{c-1}\,dx = \left[\frac{x^c}{c}\right]_0^1 = \frac{1}{c} \quad \text{(for } c > 0\text{)}$$
Thus, this function is not a probability density function unless $c = 1$.
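Condition (d2) for this example can also be checked numerically by adding up the areas of many narrow bars under the curve, in the spirit of the histogram argument earlier in this section. The Python sketch below is an illustration only: the function names, the number of bars and the values of c tried are all assumptions made for this sketch.

```python
def g(x, c):
    """g(x) = x^(c - 1) on [0, 1] and 0 otherwise (the function of Example 3.1.2)."""
    return x ** (c - 1) if 0 <= x <= 1 else 0.0


def total_area(c, bars=100_000):
    """Approximate the integral of g over [0, 1] by summing height * width
    over many narrow bars, each evaluated at its midpoint."""
    width = 1 / bars
    return sum(g((i + 0.5) * width, c) * width for i in range(bars))


# Hypothetical values of c, chosen only for illustration.
for c in [1, 2, 3]:
    print(c, total_area(c), 1 / c)
```

For each c the approximate area should be close to 1/c, so only c = 1 gives a total area of 1.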
For calculating probabilities of events involving random variables, density functions have for
continuous random variables the same role that mass functions have for discrete random
variables. The analogy, however, is not direct.
Example 3.2.3
• Can you think of anything that is true for mass functions but is not true for density
functions?
• Find the equivalent theorem for discrete random variables. Compare and contrast.
(i.e. the (m1)/(m2) theorem.)
Lemma
If $X$ is a continuous random variable with density function $f_X$, then for all
$a, b \in \mathbb{R}$ with $a \le b$,
$$P(a \le X \le b) = \int_a^b f_X(x)\,dx.$$
Note that this means we can calculate P(a≤ X ≤ b) simply by calculating the area under the
density function between the points a and b .
Exercise: Add three equivalent expressions below.
Also, because P( X=a)=0=P( X=b) the weak inequalities (≤) can be replaced by strict
inequalities (<) anywhere without changing the probabilities.
i.e. $P(a < X < b) =$
As with discrete random variables, there are a number of continuous random variables which have
special places in probability theory.
3.2 Special Continuous Distributions
3.2.1. Continuous Uniform Distribution

We begin with the continuous version of the uniform distribution- we met the discrete version in
Section 2.3.
As the name suggests, all outcomes are equally likely over a finite range from $a$ to $b$, denoted $[a, b]$.
Special cases of the uniform distribution:
• When the range is $[0, 1]$, it is used to generate random numbers.
• When the range is $[-\frac{1}{2}, \frac{1}{2}]$, it is used for the distribution of rounding errors.

We say that the continuous random variable $X$ has the uniform distribution
on $[a, b]$, and write $X \sim U[a, b]$, if the density of $X$ is
$$
f_X(x) = \begin{cases}
\frac{1}{b - a}, & \text{if } x \in [a, b] \\
0, & \text{if } x \notin [a, b]
\end{cases}
$$
Exercise 3.2.1.1: Verify the above is indeed a density function.

Sketch the density function of the continuous uniform distribution.
Example 3.2.1.2: Finding the cdf of the uniform distribution

For x ∈[a , b ], the distribution function of the uniform distribution is given by
$F_X(x) =$
Thus the full specification of the distribution function is
3.2.2. Exponential distribution
The exponential distribution is perhaps the second most important of all the continuous
probability distributions because of its extensive use in probability modelling.
The exponential distribution is often used to model waiting times between certain events, such as
natural disasters, machine breakdowns, or customers joining a queue. If these waiting times are
independent and Exp(λ) distributed, then it can be shown that the number of arrivals in a time
interval of length t follows a famous discrete distribution called the Poisson distribution (with
parameter λt). We will cover this distribution later in the course.
The exponential distribution is also useful as it is the only continuous distribution that has the
memoryless property; that is, for all s, t ≥ 0, the conditional probability satisfies
P(T > s + t | T > s) = P(T > t).
Translation: If the random variable T is a lifetime, then the conditional probability that the
lifetime exceeds s + t, given that it has already exceeded s, is the same as the probability that
the lifetime exceeds t.
Exercise 3.2.2.1: Can you think of an example of where this property would be appropriate?
Example 3.2.2.2
So if the length of time between hurricanes is exponentially distributed, the probability that the
next hurricane doesn’t occur in the next t+ s units of time, given that we’ve waited s units already,
is simply the same as the probability that the next hurricane doesn’t occur in the next t time units.
(This seems quite reasonable: nature doesn’t decide that there should be a new hurricane soon
because there hasn’t been one for a while...!)
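The memoryless property can be checked numerically. The sketch below is an illustration only: it assumes the survival probability P(T > t) = e^(−λt), which follows from the exponential cdf given in this section, and the function names and the values of λ, s and t are made up for the example.

```python
import math

def survival(t, lam):
    """P(T > t) for T ~ Exp(lam), i.e. 1 - F(t) = exp(-lam * t)."""
    return math.exp(-lam * t)

def conditional_survival(s, t, lam):
    """P(T > s + t | T > s) = P(T > s + t) / P(T > s)."""
    return survival(s + t, lam) / survival(s, lam)

# Illustrative values only: rate 0.5, already waited s = 3, looking t = 2 ahead.
lam, s, t = 0.5, 3.0, 2.0
print(conditional_survival(s, t, lam))  # conditional probability
print(survival(t, lam))                 # unconditional probability
```

The two printed numbers agree (up to floating-point rounding) whatever values of s and t are chosen, which is exactly what memorylessness says.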
We say that the continuous random variable X has the exponential
distribution with parameter λ, and write X ∼ Exp(λ), if the density of X is
$$f_X(x) = \begin{cases} \lambda e^{-\lambda x}, & \text{if } x \ge 0 \\ 0, & \text{if } x < 0. \end{cases}$$
For x ≥ 0, the distribution function of the exponential distribution is given by
$$F_X(x) = 1 - e^{-\lambda x}.$$
Example 3.2.2.3
The following figures show the probability density function and cumulative distribution function for
the exponential distribution for various values of λ.
Discuss how the value of λ affects the shape of the graphs.
On the figures below, sketch the pdf and cdf of an Exp(0.25) distribution. Sketch the Exp(2)
distribution.
Example 3.2.2.4
Suppose that the operating lifetime of a battery is an exponential random variable with λ=0.5 .
What is the probability that the operating lifetime is over 4 years?
What is the probability that the operating lifetime is between 1 and 3 years?
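Once you have attempted the example by hand, a short script can be used to check your answers. This is a sketch that assumes the lifetime is measured in years with rate λ = 0.5 per year; the helper name exp_cdf is not from the notes.

```python
import math

def exp_cdf(x, lam):
    """Distribution function F(x) = 1 - exp(-lam * x) for x >= 0, and 0 otherwise."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

lam = 0.5  # rate parameter from Example 3.2.2.4

# P(T > 4) = 1 - F(4)
p_over_4 = 1.0 - exp_cdf(4, lam)

# P(1 < T < 3) = F(3) - F(1)
p_1_to_3 = exp_cdf(3, lam) - exp_cdf(1, lam)

print(round(p_over_4, 4), round(p_1_to_3, 4))
```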
3.3 Expectation of Discrete Distributions

In this section, we introduce some quantities that describe features of the shape of a probability
distribution, like where it is centred and how broad it is. These will play a big role in the later
chapters on statistics.
Intuitively, the expectation of a random variable is the value you “expect” your random variable to
take.
Exercise 3.3.1
 Roll a fair die 600 times. How many sixes would you expect to get?
 Flip a fair coin 50 times. How many heads would you expect to get?
 Flip a fair coin 100 times. How many heads would you expect to get?
 Flip a biased coin, where P ( H )=0.7 , 100 times. How many heads would you expect to get?
 Flip a coin, where P ( H )= p , 100 times. How many heads would you expect to get?
 Can you write a general formula for the expected number of heads in these examples?
If X is a discrete random variable taking the values x_1, x_2, x_3, …, the
expectation of X, denoted by E[X], is defined by
$$E[X] = \sum_k x_k \, p_X(x_k) = \sum_k x_k \, P(X = x_k).$$
Note:
 E [ X ] is just a number (not a random variable).
 E [ X ] is also called the “expected value” or “mean” of X .
 It can be thought of as the “center of mass” of the probability distribution.
Example 3.3.2
Students at the university library may borrow up to five books at any one time. The number of
books borrowed by a student on each visit is a random variable, X , with the following probability
distribution:
x        0     1     2     3     4     5
p_X(x)   0.24  0.12  0.14  0.30  0.05  0.15
A student arrives at the library. How many books would you expect them to borrow?
$$E[X] = \sum_{k=0}^{5} x_k \, p_X(x_k) = 0 \times 0.24 + 1 \times 0.12 + \ldots$$
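The same sum can be carried out in a few lines of code, which is a convenient way to check a hand calculation. In this sketch the pmf from the library example is stored as a dictionary; the function name expectation is an assumption of the example, not notation from the course.

```python
# pmf of X = number of books borrowed, copied from the table in Example 3.3.2
pmf = {0: 0.24, 1: 0.12, 2: 0.14, 3: 0.30, 4: 0.05, 5: 0.15}

def expectation(pmf):
    """E[X] = sum over k of x_k * p_X(x_k) for a discrete random variable."""
    return sum(x * p for x, p in pmf.items())

# A valid pmf must sum to 1 (allowing for floating-point rounding).
total_probability = sum(pmf.values())

print(total_probability)
print(expectation(pmf))
```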
Example 3.3.3: Expectation of the Bernoulli Distribution

If X ∼ Ber(p) then X takes on the values x_1 = 1 and x_2 = 0 and
$$E[X] = \sum_k x_k \, p_X(x_k) = \sum_k x_k \, P(X = x_k) = \ldots$$
Manipulating Expected Values

Later in this course (and throughout your degree), it will be useful to be able to calculate the
expectation of some function of X . The following result helps us do this.
If X is a discrete random variable with X(Ω) = {x_1, x_2, x_3, …} and h : ℝ → ℝ is a
function, then
$$E[h(X)] = \sum_k h(x_k) \, p_X(x_k).$$
A useful example of this theorem is:

Linearity of Expectation
If X is a random variable and a , b ∈ R (i.e. a and b are real numbers), then
E [ aX +b ] =aE [ X ] +b .
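For a discrete random variable, linearity can be seen by applying the result for E[h(X)] with h(x) = ax + b. The derivation below is a sketch for the discrete case only and uses the fact that the probabilities p_X(x_k) sum to 1.

```latex
\begin{aligned}
E[aX + b] &= \sum_k (a x_k + b)\, p_X(x_k) \\
          &= a \sum_k x_k \, p_X(x_k) + b \sum_k p_X(x_k) \\
          &= a\,E[X] + b .
\end{aligned}
```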
3.4 Expectation of Continuous Distributions
The expectation of a continuous distribution is defined in the same way as the expectation of a
discrete distribution, except that the summation becomes an integration, just as probabilities are
calculated by integrating rather than summing. All the rules of manipulation and linearity hold for
continuous variables (again, remembering that sums become integrals).
If X is a continuous random variable with density function f_X, then the expectation of
X, denoted once again by E[X], is defined as
$$E[X] = \int_{-\infty}^{\infty} x \, f_X(x) \, dx$$
(whenever the integral converges absolutely).
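To see the integral definition in action without doing any calculus, the expectation can be approximated numerically. The sketch below uses a simple midpoint rule on the Exp(0.5) density and compares the result with 1/λ, the standard closed-form mean of the exponential distribution. The upper limit of 60 and the number of steps are arbitrary choices made for this illustration.

```python
import math

def exp_pdf(x, lam):
    """Density of the Exp(lam) distribution."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def expectation_midpoint(pdf, lower, upper, steps=200_000):
    """Approximate the integral of x * pdf(x) over [lower, upper] by the midpoint rule."""
    width = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * width  # midpoint of the i-th strip
        total += x * pdf(x) * width
    return total

lam = 0.5
# The density is negligible beyond x = 60, so the infinite range is truncated there.
approx = expectation_midpoint(lambda x: exp_pdf(x, lam), 0.0, 60.0)
print(approx, 1 / lam)
```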
3.5 Online quiz

Try the ‘Exponential’ quiz on Blackboard.
Chapter 3 homework: Expectation and Continuous Distributions

Robot Questions
Calculate the expectation of the following random variables.
Question 1. Calculate the expected value of X. Write the pmf in the form specified in lectures
and discuss what makes it a valid pmf.
x            5    6    7    8    9    ∑
P(X = x)     0.1  0.2  0.3  0.3  0.1  1
x P(X = x)
E(X) = …
P(X = x) = …
Question 2 Calculate the expected value of X. Write the pmf in the form specified in lectures.
x            -2   -1   0    1    2    ∑
p_X(x)       0.1  0.2  0.3  0.3  0.1
x p_X(x)
Question 3 Calculate the expected value.
$$p_Y(y) = \begin{cases} \dfrac{6 - y}{15}, & y = 1, 2, 3, 4, 5 \\ 0, & \text{otherwise} \end{cases}$$
For the following questions, assume X ∼ Exp(3).
Question 4 State the pdf for the exponential distribution and calculate f X ( 2 ) .
Question 5 State the cdf for the exponential distribution and calculate F X ( 2 ) .
Question 6 Calculate P( X ≤ 3)
Question 7 Calculate P( X> 3)
Question 8 Find a formula (e.g. on the internet) for E [X ]. Use this to calculate E [X ]
Chapter 3 Tutorial: Expectation and Continuous Distributions
Mathematician Questions
Calculate the expectation of the following random variables.
Question 1: Find the value of a. Then calculate E [ X ] .
x        1    2    3    4
p_X(x)   a    0.2  3a   0.2
Question 2
$$p_W(w) = \begin{cases} k(w - 1), & w = 2, 3, 4, 5, 6, 7 \\ k(13 - w), & w = 8, 9, 10, 11, 12 \\ 0, & \text{otherwise} \end{cases}$$
For the following questions you will need the formulae for the expectation of the Binomial and
Geometric distributions.
$$E[X] = \frac{1}{p} \qquad\qquad E[X] = np$$
Discuss which expectation you think goes with each distribution. What makes you think this?
Check if you are correct on the internet.
Question 3: X ∼ Geometric(0.1). Calculate E[X].
Question 4: You flip a fair coin until you get heads. How many times do you expect to flip the
coin?
Question 5: You are in a casino in Las Vegas and decide to play roulette. You always bet on 00.
You keep playing until you win. How many spins of the roulette wheel do you expect?
Question 6: You only have enough money for 10 spins of the roulette wheel. How many times do
you expect to win?
Note: You will not be tested on anything requiring integration at the moment but you will later in
the course.
Let X be a continuous random variable that takes values in [0, 6] and whose distribution function
F_X satisfies
$$F_X(x) = \frac{-2x^3 + 6x^2 + 144x}{648} \quad \text{for } 0 \le x \le 6.$$
Question 8: Compute P(1/2 ≤ X ≤ 1).
Question 9: Give the probability density function of X in the interval [0, 6].
The probability density function f_X of a continuous random variable X is given by
$$f_X(x) = \begin{cases} \dfrac{-6\,(x^2 + 7x - c)}{25}, & \text{if } 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}$$
Question 10: Compute c.
Question 11: Give the distribution function of X in the interval [0, 1].
In the following questions, consider the Exponential distribution (Section 3.2.2).

Question 12: Verify that the density function of the exponential distribution given in lectures is valid.
Question 13: Derive the cdf of the exponential distribution (for x ≥ 0) by integrating the pdf.
The service time at a supermarket checkout is exponentially distributed with a mean service time of 2
minutes.
Question 14: What is the probability that the service time will be longer than 3 minutes?
Question 15: The service time has already taken 5 minutes. How much longer is it expected to
take?
4 Summary statistics and the Normal distribution
With Ellen’s notes you will need to fill in the gaps as we go, answering questions rather than
copying from the board!
In this chapter, you will need the Excel file ‘Student normal distribution Excel sheets’, which
contains calculations used within the notes.
4.1 Summarising continuous data

VIDEO: Summary Statistics (continuous data)
You should have learnt about summary statistics in Maths Tech but here is a quick reminder.
CONTINUOUS DATA: This is data which has a meaningful measurement such as height, exam
score, age or time taken to complete a task.
Exercise: I have collected and summarised data on 81 babies born at Jessops in 2016 which are contained
in the Excel sheet.
Two summary statistics which are commonly used are the mean and the standard deviation.
The mean is the most commonly used average:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
The standard deviation is a measure of how spread out the values are from the mean. A small
value means the values are consistent about the mean and a large value means they are spread out.
The calculations use the differences between each value and the mean.
$$\text{Variance} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \qquad\qquad \text{Standard deviation} = \sqrt{\text{variance}}$$
The table here shows the first 5 babies. The mean uses the sum of the first column and the
variance the sum of the second.
Birthweight (x)   (x − x̄)²
6.9               0.16
6.6               0.47
5.3               3.91
8.5               1.33
7.9               0.41
Calculate the sample mean and standard deviation for the 81 babies using the sums below.
$$\sum_{i=1}^{81} x_i = 594.89 \qquad\qquad \sum_{i=1}^{81} (x_i - \bar{x})^2 = 158.97$$
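The formulas for the mean and standard deviation translate directly into code. The sketch below plugs the two sums quoted for the 81 babies into those formulas so that a hand calculation can be checked; the variable names are chosen for this example only.

```python
import math

n = 81               # number of babies in the sample
sum_x = 594.89       # sum of the birthweights
sum_sq_dev = 158.97  # sum of squared differences from the mean

mean = sum_x / n                  # sample mean
variance = sum_sq_dev / (n - 1)   # sample variance uses n - 1 in the denominator
sd = math.sqrt(variance)          # standard deviation is the square root of the variance

print(round(mean, 2), round(variance, 2), round(sd, 2))
```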
Median and quartiles
The median and quartiles divide ordered data into 4 equal parts which are labelled in the boxplot
below. 25% of values are below the lower quartile and 25% are above the upper quartile.
The interquartile range is the middle 50% (the box) and can be written as (lower quartile, upper
quartile) or as the absolute difference upper − lower.
Write down the median and interquartile range from the data below
Histograms
Histograms are frequency distributions with frequencies of grouped data in bars. They are usually
used to check the spread of the data and check the type of distribution.
[Figure: histogram of approximately normally distributed data. Average: mean. Measure of spread: standard deviation.]
[Figure: histogram of skewed data. Average: median. Measure of spread: interquartile range.]
Which summary statistics should be used to describe the birthweight data?
Can we use the sample data we have to generalise about the wider population of babies?
Inferential statistics
In statistics we usually use sample data to estimate parameters of a wider population.
We call this inferential statistics as we are inferring something about the population.
Different notation is used to represent sample statistics and population parameters.
4.2 Normal probability distribution
VIDEO: Introduction to the normal distribution
The babies can be summarised in histograms using frequencies, proportions or percentages, and
these can be used to estimate values in the population.
[Figures: frequency distribution; distribution of percentages]
How many babies weigh between 9 and 10 lbs?
Are the data normally distributed?
What percentage of babies weigh 9 - 10 lbs?
What percentage of babies weigh more than 9 lbs?
The population distribution can be represented by a smooth normal curve, estimated using the
sample mean and standard deviation, from which probabilities can be calculated.
[Figures: probability density curve (estimated population probability curve); probability density curve (shaded area is required probability)]
Does the curve fit the data well?
If X is the random variable birthweight, what probability is represented in the blue shaded area?
What’s the total area under the curve?
Normal probability density function
Note: In statistics you will notice the use of lowercase and capital letters. Capitals are for the
random variable in general and lowercase letters indicate specific values of the random variable.
The most widely used continuous probability distribution is the normal distribution. It was
originally used by the German mathematician Karl Friedrich Gauss (1777-1855), who called it the
Gaussian error distribution, but it was given the name normal by the American logician Charles
Peirce in 1873. The exact shape is controlled by the mean μ and the standard deviation σ.
Formally, if a random variable X has a normal distribution, in shorthand
X ∼ N(μ, σ²), then the normal probability density function f(x) is a function of
the form
$$f(x) = f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}, \qquad -\infty < x < \infty, \; -\infty < \mu < \infty, \; \sigma > 0$$
where μ is the mean, σ is the standard deviation of X and π is the constant 3.142 (to 3 d.p.).
Let X = random variable ‘birthweight in pounds (lbs)’, which has a mean of 7.3 and standard
deviation of 1.42, so X ∼ N(7.3, 1.42²).
To sketch the probability density curve below, values of f(x) were calculated for specific x values.
Excel has a built-in formula for calculating the p.d.f.:
=NORM.DIST(x, μ, σ, FALSE)

Estimate the value of f(9) from the curve.
At what value of x does the curve peak?
Note: The curve never actually touches the x axis as the density is never 0, just really small!
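For readers working outside Excel, the density can also be evaluated in Python. The sketch below writes the normal pdf out term by term from the formula and compares it with NormalDist.pdf from Python's standard statistics module, which plays the same role as NORM.DIST with FALSE. The function name normal_pdf and the x values are choices made for this illustration.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density evaluated directly from the formula."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)

# Birthweight model from the notes: mean 7.3 lbs, standard deviation 1.42 lbs.
birthweight = NormalDist(mu=7.3, sigma=1.42)

for x in (5.0, 7.3, 9.0):
    # The hand-written formula and the library call should agree.
    print(x, normal_pdf(x, 7.3, 1.42), birthweight.pdf(x))
```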
The curve is controlled by the mean and the standard deviation. The following curves show how
the curves change as the parameters (mean and SD) change. To see the impact changes have use
the Excel sheet:
Write the means and standard deviations for the following density curves (given in lecture).
[Figures: two pairs of normal density curves. In each pair one curve has μ = 7.3, σ = 1.4; fill in μ = … and σ = … for the other.]
Which parameter controls a) the location of the curve and b) the shape of the curve?
Cumulative distribution function (cdf)
For continuous random variables, probabilities are calculated by integrating the probability
density function. The cumulative distribution function (cdf) gives ‘less than’ probabilities
and is defined by
$$F(x) = P(X < x) = \int_{-\infty}^{x} f(t) \, dt$$
To calculate the cumulative probabilities for a given distribution in Excel, the same formula is used
as for the p.d.f. but FALSE is replaced with TRUE:
=NORM.DIST(x, μ, σ, TRUE)

The following plot shows the c.d.f. for the birthweight data, with y values being calculated using:
P(X < x) = NORM.DIST(x, 7.3, 1.42, TRUE)
From the graph, estimate the following:
a) P(Baby weighs less than 6 lbs) = P(X < 6)
b) P(X > 6)
c) the median birthweight P(X < ?) = 0.5
d) the number below which the bottom 5% of values lie.
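Estimates read from the graph can be compared with computed values. The sketch below uses NormalDist from Python's standard statistics module: its cdf method corresponds to NORM.DIST with TRUE, and inv_cdf goes the other way, from a probability back to an x value. The variable names are chosen for this example.

```python
from statistics import NormalDist

# Birthweight model from the notes: X ~ N(7.3, 1.42^2)
birthweight = NormalDist(mu=7.3, sigma=1.42)

p_less_6 = birthweight.cdf(6)          # a) P(X < 6)
p_more_6 = 1 - p_less_6                # b) P(X > 6) is the complement
median = birthweight.inv_cdf(0.5)      # c) value with half the area below it
bottom_5 = birthweight.inv_cdf(0.05)   # d) value with 5% of the area below it

print(round(p_less_6, 4), round(p_more_6, 4), round(median, 2), round(bottom_5, 2))
```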
Probability tables
Excel can calculate probabilities for any normally distributed variable given the mean and
standard deviation, but in exams you will need to use probability tables.
X is the random variable ‘birthweight’, which is normally distributed,
X ∼ N(7.3, 1.42²). The table shows cumulative ‘less than’ probabilities
$$F(x) = P(X < x) = \int_{-\infty}^{x} f(t) \, dt$$
Rows give the whole-number part of x and columns give the first decimal place.
x 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

4 0.0101 0.0121 0.0145 0.0173 0.0206 0.0243 0.0286 0.0336 0.0392 0.0455
5 0.0526 0.0607 0.0696 0.0795 0.0904 0.1025 0.1156 0.1299 0.1454 0.1621
6 0.1800 0.1990 0.2193 0.2406 0.2631 0.2866 0.3110 0.3363 0.3624 0.3891
7 0.4163 0.4440 0.4719 0.5000 0.5281 0.5560 0.5837 0.6109 0.6376 0.6637
8 0.6890 0.7134 0.7369 0.7594 0.7807 0.8010 0.8200 0.8379 0.8546 0.8701
9 0.8844 0.8975 0.9096 0.9205 0.9304 0.9393 0.9474 0.9545 0.9608 0.9664
10 0.9714 0.9757 0.9794 0.9827 0.9855 0.9879 0.9899 0.9917 0.9931 0.9944
11 0.9954 0.9963 0.9970 0.9976 0.9981 0.9985 0.9988 0.9990 0.9992 0.9994
Rather than the cumulative curve, it is common to shade the parts of the p.d.f. required by the
question of interest instead.
Examples
P( X< 9)
To calculate the probability of a random variable X lying between a and b (where b > a):
$$P(a < X < b) = \int_a^b f(x) \, dx = F(b) - F(a)$$
Example: What is the probability that a baby weighs between 4.7 and 9 lbs?
$$P(4.7 < X < 9) = \int_{4.7}^{9} f(x) \, dx = F(9) - F(4.7)$$
P(X < 9) − P(X < 4.7) = P(4.7 < X < 9)
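The identity between the area under the density and the difference of two cdf values can be verified numerically. In the sketch below the area between 4.7 and 9 is approximated with a midpoint rule and compared with F(9) − F(4.7); the function name area_under_pdf and the number of strips are assumptions of this illustration.

```python
from statistics import NormalDist

birthweight = NormalDist(mu=7.3, sigma=1.42)

def area_under_pdf(dist, a, b, steps=100_000):
    """Midpoint-rule approximation to the integral of the density from a to b."""
    width = (b - a) / steps
    return sum(dist.pdf(a + (i + 0.5) * width) for i in range(steps)) * width

area = area_under_pdf(birthweight, 4.7, 9.0)              # integral of f(x) from 4.7 to 9
difference = birthweight.cdf(9.0) - birthweight.cdf(4.7)  # F(9) - F(4.7)

print(round(area, 4), round(difference, 4))
```

The two numbers should match to several decimal places, and the second can also be read from the probability table above as the entry for 9.0 minus the entry for 4.7.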
For the following examples, shade the appropriate section of the pdf, write the question
mathematically and use the table above to calculate the required probabilities. Also write the
command needed to calculate the probability in Excel.
Example: P(baby weighs less than 5.2 lbs)
P(X < 5.2) = …
=NORM.DIST( , 7.3, 1.42, TRUE)
P(baby weighs less than 4.5 lbs)
P(baby weighs less than 10.1 lbs)
P(baby weighs more than 10.1 lbs)
What is the probability of a baby weighing between 4.5 and 10.1 lbs to 2 d.p.?
Find the birthweight a below which approximately 80% of babies lie, i.e. P(X < a) = 0.8
a = ?
4.3 Standard normal (Z) distribution
VIDEO: The Z distribution
Whilst it is now very straightforward to calculate the probabilities for any normal distribution using
a computer package, probability tables were used in the past. As having tables readily available
for every combination of mean and standard deviation was not possible, one special distribution
with a mean of 0 and standard deviation of 1, called the Z distribution, was tabulated. Other
distributions were standardised to fit this distribution: Z ∼ N(0, 1).
Probability density function:
$$f(z) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{z^2}{2}}$$
Cumulative distribution function:
$$\Phi(z) = \int_{-\infty}^{z} f(t) \, dt = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} \, e^{-\frac{t^2}{2}} \, dt$$
Standard normal (Z) distribution table
Z 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
3.1 0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993
3.2 0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995
3.3 0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997
3.4 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998
4.4 Standardising
In situations where only the standard normal table is available (e.g. in an exam), values from other
distributions need to be standardised.
Standardising other distributions to create Z scores
$$X \sim N(\mu, \sigma^2) \;\rightarrow\; Z \sim N(0, 1), \qquad Z = \frac{X - \mu}{\sigma}$$
How would the formula differ if you had sample statistics instead of population parameters?
Example: For the birthweight data, which had a mean of 7.3 and standard deviation of 1.42, we
need to convert the original X distribution to a Z distribution:
X ∼ N(7.3, 1.42²) → Z ∼ N(0, 1)
The general equation for standardising values of interest, X, is:
$$Z = \frac{X - 7.3}{1.42}$$
A baby is born weighing 9.9 lbs so the standardised score would be calculated as:
$$Z = \frac{9.9 - 7.3}{1.42} = 1.83$$
If you wish to calculate the probability of having a baby of 9.9 lbs or less, use the Z table:
$$P(X < 9.9) = P\left(Z < \frac{9.9 - 7.3}{1.42}\right) = P(Z < 1.83) = \Phi(1.83) = \ldots$$
To calculate the probability of having a baby of at least 9.9 lbs:
$$P(Z > 1.83) = 1 - P(Z < 1.83) = 1 - \Phi(1.83) = \ldots$$
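Standardising can be checked in code by computing the probability two ways: through the Z score and the standard normal distribution, and directly from the original distribution. This is a sketch using Python's standard statistics module; the helper name standardise is an assumption of the example.

```python
from statistics import NormalDist

def standardise(x, mu, sigma):
    """Z score: the number of standard deviations x lies from the mean."""
    return (x - mu) / sigma

mu, sigma = 7.3, 1.42            # birthweight mean and standard deviation
z = standardise(9.9, mu, sigma)  # Z score for a 9.9 lb baby

standard_normal = NormalDist()              # defaults to mean 0, standard deviation 1
p_via_z = standard_normal.cdf(z)            # P(Z < z), what the Z table gives
p_direct = NormalDist(mu, sigma).cdf(9.9)   # P(X < 9.9) without standardising

print(round(z, 2), round(p_via_z, 4), round(p_direct, 4))
```

Both routes give the same probability, which is why a single Z table is enough for every normal distribution. Small differences from table values arise because the table rounds z to 2 decimal places.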
Special z scores exist which are used a lot in statistics. These z scores are often used to
describe middle (or normal) ranges such as the middle 95% or 90%.
Example: Find the Z scores between which the middle 90% lie.
[Figure: standard normal curve with the middle 90% and the top 5% marked]
Exercise: Calculate the following probabilities using the standard Z distribution.

Hint: You may find sketching the curve and shading the required probability useful.
a) P(baby weighs less than 8 lbs) = P(X < 8)
= P(Z < …)
b) P(X > 8)
c) P(8 < X < 9.9)
Use the Z table to find the value a below which 97.5% of values lie and the value below which
2.5% lie. Hence state the range within which 95% of values lie.
[Figure: standard normal curve with the top 2.5% marked, P(Z < a) = 0.975]
Find the values between which the middle 68% of values lie.
How would you calculate the middle 95% of birthweights, i.e. the X values given the Z scores?
Hint: Reverse the standardisation.
[Figure: normal curve with two horizontal axes, ‘No. of SD’s from mean’ and ‘Actual values’]
Can you think of a general formula for calculating the middle 95% for any mean μ and SD σ?
Can you think of a general formula for any Z score, mean μ and SD σ?
Critical values
Critical values are set values from the Z distribution between which a set percentage of values lie.
We have calculated a few of these in previous examples so fill in the missing values.
Critical values summary
Middle %   Z scores
68%
90%        ± 1.645
95%
99%        ± 2.58
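Critical values can also be computed rather than looked up. If the middle proportion is m, then a proportion (1 + m)/2 of values lies below the upper critical value, so the inverse cdf at that point gives z. The sketch below uses this idea with Python's standard statistics module; the function name critical_value is made up for the example.

```python
from statistics import NormalDist

def critical_value(middle):
    """Return z such that P(-z < Z < z) = middle for Z ~ N(0, 1)."""
    # The area below +z is the middle area plus the lower tail: (1 + middle) / 2.
    return NormalDist().inv_cdf((1 + middle) / 2)

for middle in (0.68, 0.90, 0.95, 0.99):
    print(f"{middle:.0%}: ±{critical_value(middle):.3f}")
```

Use it to check the values you fill into the summary table; small differences from the table entries are due to rounding.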
Ranges for individuals
You will come across many types of ranges in statistics so make sure you check what exactly is
being reported and whether the range refers to ranges for individuals or parameters.
Interquartile ranges contain the middle 50% of values for sample data but many other ranges are
used to make inferences about the population.
Measurements for people are often divided into percentiles e.g. ‘child’s height is in the bottom
20% for their age’. These can be used to create 'Normal ranges' which are based on sample data
but used to represent individuals in the population. They are also known as 'Reference ranges'.
Next week we will look at 'Confidence intervals' which are used to give a range of values for a
population parameter e.g. mean rather than a range of values within which an individual is
expected to lie.
There are properties of normal distributions which enable us to calculate ranges within which set
proportions of subjects lie. Values can be discussed as a number of standard deviations from the
mean and there are particular 'critical values' between which set percentages lie.
For example, for any normal distribution, approximately 68% of values lie within one standard
deviation of the mean:
μ ± 1 × σ
In the birthweight example we can estimate these limits using the sample data:
7.3 ± 1.42
Lower limit =
Upper limit =
[Figure: normal curve with axes ‘No. of SD’s from mean’ and ‘Actual values’]
In statistics, 'most' subjects is considered to be the middle 95%. For normally distributed data,
95% of subjects lie within 1.96 standard deviations of the mean.
Limits are estimated as:
mean ± (1.96 × SD)
7.3 ± (1.96 × 1.42) = (4.5, 10.1)
95% of individual babies are expected to weigh between 4.5 lbs and 10.1 lbs.
[Figure: normal curve with axes ‘No. of SD’s from mean’ and ‘Actual values’, limits marked at 4.5 and 10.1]
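The limits for any middle percentage can be obtained by reversing the standardisation, which is what the inverse cdf of the original distribution does in one step. The sketch below is an illustration using Python's standard statistics module; the function name normal_range and its default of 95% are assumptions of the example.

```python
from statistics import NormalDist

def normal_range(mu, sigma, middle=0.95):
    """Limits containing the middle proportion of a N(mu, sigma^2) population."""
    tail = (1 - middle) / 2  # area left in each tail
    dist = NormalDist(mu, sigma)
    return dist.inv_cdf(tail), dist.inv_cdf(1 - tail)

# Birthweight example: mean 7.3 lbs, standard deviation 1.42 lbs.
lower, upper = normal_range(7.3, 1.42)
print(round(lower, 1), round(upper, 1))
```

Changing the middle argument (for example to 0.68, 0.90 or 0.99) gives the other ranges for individuals.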
Fill in the general normal range equations and actual numbers for the birthweight example in
the table below.
Middle %                    Normal range for individuals   Birthweight example
68%                         μ ± σ
90%
95%                         μ ± 1.96σ                      (4.5, 10.1)
99%
50% (Interquartile range)
Exercise: 30 women were asked how many hours of housework they did each week and the results
summarised. Given the mean was 14.2 hours and the population standard deviation is known to be
8.8, estimate:
a) the interquartile range
b) the 95% limits for individuals. Is there anything strange about the limits?
[Figure: normal curve with axes ‘No. of SD’s from mean’ and ‘Actual values’]
4.5 CAST material
The images in this chapter were created using the interactive online statistics ebook CAST. It uses
Java script to run interactive demonstrations of different aspects of statistics and probability which
can aid your learning. The University computers may not allow you to run Java script so go to the
link at home and use Internet Explorer rather than Firefox:
http://cast.massey.ac.nz/core/index.html?book=generalx
Go to the Sampling and Variability section and choose ‘Probability and probability density’ and
investigate some of the options to help you understand the normal distribution and probability.
Select the ‘6. Normal distributions’ section and adjust the mean and standard deviation (change the
sliders) to see the shape of the curve change.
In the ‘Probability density functions’ section, change the sample size to see the impact of sample
size on the smoothness of a histogram.
4.6 Summary of Excel commands for the normal distribution
For cumulative ‘less than’ calculations using the standard normal distribution, Z ∼ N(0, 1), P(Z < z):
=NORM.S.DIST(z, TRUE)
You can also calculate z scores for a given probability α, P(Z < z) = α:
=NORM.S.INV(α)
Example: If P(Z < z) = 0.05, use the following to calculate z = −1.645:
=NORM.S.INV(0.05)
To calculate ‘less than’ cumulative probabilities P(X < x) for distributions with any mean and
standard deviation, X ∼ N(μ, σ²), use the command:
=NORM.DIST(x, μ, σ, TRUE)
Example: P(X < 5.2) = NORM.DIST(5.2, 7.3, 1.42, TRUE) = 0.0696

To calculate x scores given the probability, P(X < x) = α, state the probability, the mean and the
standard deviation:
=NORM.INV(α, μ, σ)
Example: To find the x value beneath which 2.5% of values lie for the birthweight data,
P(X < x) = 0.025:
=NORM.INV(0.025, 7.3, 1.42) = 4.5
Generating random numbers in Excel

To generate random values between 0 and 1, the RAND() function can be used, and for generating
random whole numbers between set limits a and b, the RANDBETWEEN(a, b) function can be used.
For example, if we wished to generate whole numbers between 0 and 15:
=RANDBETWEEN(0,15)
To generate random numbers from a normally distributed variable, nest the RAND command within
the inverse normal command:
=NORM.INV(RAND(), μ, σ)
Example: To generate a random number from a population with a mean birthweight of 7.3 and
standard deviation of 1.42:
=NORM.INV(RAND(), 7.3, 1.42) = ?
You can then pull the command down to create a set of random numbers.
Important note: Pressing F9 will change the numbers, and so will any calculation carried out on the
sheet, so if you wish to keep a sample, use Copy → Paste values somewhere else on the sheet.
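The NORM.INV(RAND(), μ, σ) trick works in any language that has a uniform random number generator and an inverse normal cdf. The sketch below mirrors it in Python using only the standard library. The function name random_normal, the seed and the sample size are choices made for this illustration, and fixing the seed plays the role of ‘Paste values’ by making the sample repeatable.

```python
import random
from statistics import NormalDist, mean, stdev

def random_normal(mu, sigma, rng):
    """Draw one value from N(mu, sigma^2) by feeding a uniform number into the inverse cdf."""
    u = rng.random()      # uniform on [0, 1), like RAND()
    while u == 0.0:       # the inverse cdf needs a probability strictly between 0 and 1
        u = rng.random()
    return NormalDist(mu, sigma).inv_cdf(u)  # like NORM.INV(u, mu, sigma)

rng = random.Random(2020)  # fixed seed so the same sample is produced on every run
sample = [random_normal(7.3, 1.42, rng) for _ in range(10_000)]

# The sample mean and standard deviation should be close to 7.3 and 1.42.
print(round(mean(sample), 2), round(stdev(sample), 2))
```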
4.7 Online tests this week

This week you must complete the online quiz:
 Normal distribution
Tutorial 4: Normal distribution
The online quiz ‘Normal distribution’ should be completed BEFORE the tutorial.
Q1) Find the following probabilities:
a) P(Z < 1.26)
b) P(Z > 1.26)
c) P(0 < Z < 1.26)
d) P(Z < −1.26)
e) P(−1.96 < Z < 1.96)
Q2) The IQs of 200 people were collected, giving a mean of 100 and a standard deviation of 15.3.
The following questions relate to this distribution. To calculate probabilities, standardise and
use the Z distribution. Sketch normal curves and the required probabilities for each question.
X = random variable IQ, X ∼ N(100, 15.3²)
a) Does the normal curve fit the distribution well?
b) What is the probability that a randomly selected person has
an IQ of less than 105?
c) What is the probability that a randomly selected person has
an IQ of more than 105?
d) What is the probability that a randomly selected person has
an IQ of less than 95?
e) To qualify for Mensa, an individual must have an IQ of at least 131. What percentage of the
population is eligible to join?
a) Estimate the 95% limits between which 95% of people are expected to lie
b) Estimate the 99% limits between which 99% of people are expected to lie.
c) Estimate the median, quartiles and interquartile range for IQ .
Q3) Chocolate bars are automatically filled with on average 100g of chocolate by machines in a factory
with a standard deviation of 5g. Let X be the random variable ‘Weight of one chocolate bar'.
a) What is the probability that a randomly selected chocolate bar weighs less than 105g?
b) What is the probability that a randomly selected chocolate bar weighs more than 105g?
c) Between which two weights (a and b) should 95% of chocolate bars fall if the machine is
functioning correctly?
d) If more than 110g of chocolate is used the wrapper is not big enough and the bar is
rejected. What is the probability that a randomly selected chocolate bar weighs more
than 110g?
e) If a batch contains 200 chocolate bars, how many do you expect to be rejected?
A random sample of 5 bars of chocolate is taken from the machine with the following weights.
i     x
1     101
2     104
3     103
4     108
5     99
sum
Calculate the mean and standard deviation of the bars. How likely do you think it is that the
machine is under or over filling based on this sample?
Change the 4th observation from 108 to 98 and write down the new mean and standard deviation.
What can you conclude about the impact of outliers on small samples?
The Excel sheet ‘Chocolate’ randomly generates bars of chocolate from a population with mean
100 and SD = 5. Press F9 or fn lock F9 to randomly generate more numbers. Record 8 more
randomly generated means in column K and calculate the mean and SD of the 10 sample means.
Q4) Complete the following summary table.
You may wish to calculate the cdf/ expectation/ variance of the random variables for which this was not
demonstrated in lectures. Alternatively, you can find them in a book/ on the internet etc.
Distribution         Discrete or Continuous?   Parameters   Range   pmf/ pdf   cdf
Bernoulli                                                                      NA
Geometric
Binomial                                                                       NA
Continuous Uniform
Exponential
Normal               Continuous                μ, σ²        ℝ
Probability and statistics statistical tables
Standardised Z distribution cdf

Z ∼ N(0, 1)    P(Z < z) = Φ(z)
Z 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
3.1 0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993
3.2 0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995
3.3 0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997
3.4 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998
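
The table values can be checked in software. The sketch below uses the standard identity Φ(z) = ½(1 + erf(z/√2)) with Python's built-in `math.erf`; the function name `phi` is chosen here for illustration only.

```python
import math

def phi(z):
    """Standard Normal cdf, P(Z < z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Compare with the rows for z = 0, 1 and 1.9 (column 0.06) in the table.
for z in (0.0, 1.0, 1.96):
    print(f"z = {z:.2f}  Phi(z) = {phi(z):.4f}")
```

The table only lists z ≥ 0. For negative values, the symmetry of the Normal distribution gives Φ(−z) = 1 − Φ(z).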