Core Concepts - Probability Booklet 2020-21
Core concepts:
Probability 2020/21
1 Introduction to Probability
*Start video lecture 1a*
1.1 Introduction/Motivation
1.2 Foundations of probability
Notation
*End video lecture 1a*
*Start video lecture 1b*
Probability
Venn Diagrams
*End video lecture 1b*
*Start video lecture 1c*
1.3 Conditional probability
Independence
*End video lecture 1c*
1.4 What have you learnt this week?
Chapter 1 homework questions: Basic Probability
Robot Questions
Mathematician Questions – we will go through this in the tutorial
Chapter 1 tutorial questions: Basic Probability
Extended Applied Task
2 Discrete Distributions
*Start video lecture 2a*
2.1 Random Variables
*End video lecture 2a*
*Start video lecture 2b*
2.2 Discrete probability distributions
*End video lecture 2b*
*Start video lecture 2c*
2.3 Frequently used discrete probability distributions
2.3.1. Discrete Uniform Distribution
2.3.2. Bernoulli Distribution
2.3.3. Geometric Distribution
2.3.4. Binomial Distribution
*End video lecture 2c*
2.4 Summary so far
2.5 Online tests this week
Chapter 2 homework questions: Discrete Distributions
Robot Questions
Mathematician Questions
Staff teaching on this module
Ellen Marshall (Module leader) ellen.marshall@shu.ac.uk Room: N601
Lindsay Lee lindsay.lee@shu.ac.uk Room: N601
Keith Harris k.harris@shu.ac.uk Room: N602
Topics
Section, with contents listed beneath
Core concepts: Probability
    Basic probability
    Discrete distributions
    Introduction to continuous distributions
    Expectation
Core concepts: Inferential Statistics
    Data collection
    Summary Statistics
    Confidence intervals
    Hypothesis testing concepts
    Z and T-tests
Correlation and regression
    Correlation
    Simple and multiple linear regression
Analysing categorical data
    Summarising categorical data
    Risk and relative risk
    Chi-squared tests
    Non-parametric tests
    Surveys
Further probability
    Recapping and expanding on core concepts
    Conditional probability
    Bayes' Theorem
Sessions
Session, with content listed beneath
Video Lectures
    Short videos explaining the core concepts and how to apply them. These will be interspersed with short quizzes and exercises. You can watch these videos at any time, but it is recommended that you watch them in the dedicated lecture slot for this module.
Online Tutorials
    An opportunity to work together through examples and to introduce SAS and Excel for statistical analysis. This will include drop-in sessions for general help, advice and clarification.
Online self-check quizzes
    A number of practice quizzes for the core concepts of probability and statistics are available on Blackboard. Each week you will be asked to try at least one specific test. YOU MUST COMPLETE the suggested test at least once, which will be the equivalent of attendance each week. You can retake the test as many times as you like, as the numbers change.
    Brad Allison (current final year student) created the tests and has recorded a short video introducing the tests.
Additional resources
    SAS programming videos: Bilal Mahmood (current final year student) has created a set of SAS videos covering key topics from your Maths Tech SAS which will help with coursework for this module. He has also created some short videos on how to write a statistical report.
    SAS statistical techniques summary sheets: We also have a set of summary sheets for each of the key statistical topics covered in this module which you may find useful when completing coursework.
Peer support
    Some form of peer support from final year students will take place (details to be confirmed). It is likely that this will centre around the work-based group projects.
Main stats support
    The maths and stats support service offers 1:1 support to any student of the University and is primarily staffed by lecturers from this department. Ellen coordinates the statistics support element. Normally there are drop-in sessions, but it is likely that this year only bookable online appointments will be available with statistics staff teaching on this course.
    https://maths.shu.ac.uk/mathshelp/
Differences between this course and A level
Most of you will have studied probability and statistics before but prior knowledge will vary so
some of you may find the initial material easy. For sections where most of you will have some
knowledge, the material will form more of a recap and for those who need it, extra practice
questions will be made available. However, we will very quickly progress from A level type
material to project based analysis more similar to the type of statistics you will need to carry out in
the workplace and using SAS.
Important note: Don’t miss class because you think you have studied statistics before. Analysis
from previous years shows that there is no significant difference in final grade for those who had
and had not studied statistics in detail before and attendance is the strongest predictor of grade!
This is the maths robot. Ask an exact question and you will get an exact
answer. Whilst it is important to be able to practise and repeat using
formulae or techniques when learning basic maths or statistics, being a
real mathematician or statistician requires much more than that.
Understanding where formulae come from and being able to apply the
techniques you have mastered in a practical and more open context is more
important at this level.
On this course, the selection, application and interpretation of techniques is important and the
focus of your assessment will be on these aspects rather than the memorisation of formulae.
Assessment of the Module
Assessment: There will be several assessments which test a mix of mathematical understanding,
the use of Excel/SAS and general skills such as reporting and presentation.
Assessment and content    Due in    % weighting overall    % subtask weighting
WELCOME TUTORIAL
In the first tutorial, we will be concentrating on getting to know you and demonstrating how
online tutorials will work. There is an opening exercise to try which doesn’t require any prior
knowledge. You will be placed in smaller breakout groups (in a similar way to tables within a
classroom) to work together on the opening exercise. Test out the best ways of doing this whilst
working on the exercises but also take the opportunity to introduce yourselves to the rest of the
group perhaps discussing any prior study of probability and statistics. Staff will move between the
rooms to chat to you in your smaller groups.
Opening exercise: Attendance and performance
The percentage of weekly statistics lectures attended and the overall performance (%) in a module were used to
create statistical models to predict the probabilities of failing or getting a 2.1 or higher based on
attendance. Use the theoretical predicted models in the graph below to answer the following
questions.
[Graph: predicted probability of failing and of getting a 2.1 or higher, plotted against % attendance for weekly Statistics lecture]
1. Use the graph to estimate the following probabilities for a student attending 30% of their lectures
a) Fail- The probability of a student with 30% attendance failing is estimated to be around 0.65
b) Pass- The probability of a student with 30% attendance passing is estimated to be around
0.07
2. Use the graph to estimate the risk of failing if a student attends 70% of lectures. How much more likely
to fail is a student who attends 30% of lectures compared to one who attends 70%?
3. At what level of attendance is a student equally likely to fail and get a 2.1 or above?
4. If there are 90 students in the class and everyone attends 40% of lectures, how many do you expect to
get a 2.1 or more?
5. The statistical model for failing uses the following equation where p = Probability of failing the course.
$$\ln\left(\frac{p}{1-p}\right) = 3.4 - 0.093x$$
a) Rearrange the equation to make p the subject
b) Estimate the probability of failing for a student attending 20% of classes using the formula
from a)
c) Describe what the ratio $\frac{p}{1-p}$ (known as odds) means in words
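One way to check your working for question 5 is to evaluate the model numerically. The sketch below assumes that x is the % attendance at the weekly lecture (the booklet does not say so explicitly) and uses the standard rearrangement of a log-odds equation; the function name `prob_fail` is purely illustrative.

```python
import math


def prob_fail(attendance_pct: float) -> float:
    """Probability of failing implied by ln(p / (1 - p)) = 3.4 - 0.093 x.

    Assumption: x is the % attendance at the weekly statistics lecture.
    """
    log_odds = 3.4 - 0.093 * attendance_pct
    # Solving ln(p / (1 - p)) = L for p gives p = 1 / (1 + exp(-L)).
    return 1.0 / (1.0 + math.exp(-log_odds))


for x in (20, 30, 70):
    p = prob_fail(x)
    odds = p / (1 - p)  # the ratio described in part c)
    print(f"attendance {x}%: p = {p:.3f}, odds = {odds:.2f}")
```

For 30% attendance this gives a value close to the 0.65 read from the graph in question 1, which suggests the assumption about x is reasonable.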
6. The statistical model was created from real data on attendance at the weekly statistics lecture
(attendance not collected consistently for other lectures and tutorials for the same module) and overall
performance in the whole module.
a) Draw some conclusions on the relationship between attendance and performance based
on the answers to the previous questions.
b) How reliable do you think predictions of final grade are based solely on lecture attendance
and is it just attendance that increases grade or another underlying factor?
c) What else should be considered when predicting success or failure for individuals?
1 Introduction to Probability
*Start video lecture 1a*
We often use verbal expressions of uncertainty (“possible”, “quite unlikely”, “very likely”), but
these are often inadequate if we want to communicate with each other about uncertainty.
Consider the following example.
Example 1.1.2
You are being screened for a particular disease. You are told that the disease is “quite rare”, and
that the screening test is accurate but not perfect; if you have the disease, the test will “almost
certainly” detect it, but if you don’t have the disease, there is “a small chance” the test will
mistakenly report that you have it anyway. The test result is positive. How certain are you that you
really have the disease?
Clearly, in this example and in many others, it would be useful if we could quantify our
uncertainty. In other words, we would like to measure how likely it is that something will happen,
or how likely it is that some statement about the world turns out to be true. In this course we
introduce a theory for measuring uncertainty: probability theory.
There are (at least) two ways to think about the study of probability theory:
1. Pure mathematical approach
2. Modelling/data-driven approach
1.2 Foundations of probability
An experiment or trial is a process that results in one outcome out of all the possible outcomes.
The resultant outcome is unknown prior to the experiment. That is, the result of the experiment is
uncertain.
The 'list' of all possible outcomes or sample points is called the sample space “S” or Ω.
An event is one or more of the possible outcomes from an experiment. A simple event
corresponds to a single outcome or sample point.
Example 1.2.1
If you throw a fair die some examples of events are:
A is the event that you roll a 5
B is the event that an even number appears
C is the event that an odd number occurs
D is
The sample space is
Ω = {1, 2, 3, 4, 5, 6}.
Axioms of Probability
A probability function P assigns to each event E ⊆ Ω a real number P(E) such
that:
(A1) P(E)∈[0 ,1]
(A2) P(Ω)=1
(A3) If $\{E_1, \dots, E_n\}$ is a countable disjoint collection of events then
$$P\left(\bigcup_{i=1}^{n} E_i\right) = \sum_{i=1}^{n} P(E_i).$$
So the probability of the event of “rolling or not rolling a 6” is 1.1 and that is not allowed because
probabilities lie between 0 and 1.
Why do probabilities lie between 0 and 1? Can you prove that? No you cannot. That is just
something we all agreed on. It’s an AXIOM.
Example 1.2.3
Let’s think of a more standard probability experiment where we all know what the probabilities
are: throwing a die.
The sample space is {1, 2, 3, 4, 5, 6}. The probability of the event {1} that the die comes up with a 1
is equal to 1/6; in symbols we write P({1}) = 1/6.
Similarly P({2}) = 1/6.
If I told you that P({1, 2}) = 0.4 you would object, because you would tell me that
P({1, 2}) = P({1} ∪ {2}) = P({1}) + P({2}) = 1/6 + 1/6 = 1/3 ≠ 0.4.
So there are rules to assigning probabilities.
In this example the idea of what probability should be assigned to each event is quite intuitive.
However, in order to deal with (potentially much) harder examples, we need to write down a set
of rules explicitly that probabilities should obey. These are the Axioms of probability.
Axiom    Translation
(A1)     A probability is always between 0 and 1.
(A3)     The probability of the union of disjoint events is the sum of the probabilities of the
         individual events.
i.e. (A3) is just the “or-rule” from GCSE/A level.
Venn Diagrams
The Venn diagram can be used to illustrate the combination of events.
We use the Greek letter Ω to denote the universal set i.e. the sample space. Regions within this
space represent events.
Example 1.2.4
Shade in E1 ∪ E2.
Use this to convince yourself that (A3) is true only for mutually exclusive events and that (A3)
can also be written as
P(A ∪ B) = P(A) + P(B) if A ∩ B = ∅
Example 1.2.6
Consider for example the events A = {1, 2} and B = {2, 3} when throwing a die.
We know that P(A) = P(B) = 1/3.
But P(A ∪ B) = P({1, 2, 3}) = 1/2 ≠ 1/3 + 1/3.
This is because the events A and B are not disjoint but both contain the outcome 2.
The next theorem gives some rules that may be inferred from the axioms of probability.
Theorem Translation
(A6) The probability of the union of any two events is the sum of the
probabilities of the individual events minus the probability of both
events occurring together.
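As a quick numerical check of (A6), the events from Example 1.2.6 can be evaluated with exact fractions. This is only a sketch for a fair six-sided die; the helper `prob` is an illustrative name, not part of the booklet.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # sample space of a fair six-sided die


def prob(event):
    """P(event) when every outcome in OMEGA is equally likely."""
    return Fraction(len(frozenset(event) & OMEGA), len(OMEGA))


A = {1, 2}
B = {2, 3}

lhs = prob(A | B)                       # P(A ∪ B)
rhs = prob(A) + prob(B) - prob(A & B)   # P(A) + P(B) - P(A ∩ B)
print(lhs, rhs)
```

The two sides agree because the shared outcome 2 is counted twice in P(A) + P(B) and subtracted once. For disjoint events the intersection term is zero and the rule reduces to (A3).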
Example 1.2.7
Draw a Venn diagram to convince yourself that (A6) is true.
*Start video lecture 1c*
So P(win) =
Thus, P(win) =
What happened is that additional information has shrunk the sample space.
Before the first die stopped at six, the sample space contained 36 elements whereas after the first
die stops at six the sample space consists of only 6 elements.
We can consider this example in terms of conditional probability.
The conditional probability of A given B is denoted by P(A|B), and we say that, for P(B) > 0,
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$
Exercise 1.3.2: Verify the above relation for the probabilities in Example 1.3.1.
P(B) =
P(A ∩ B) =
A simple rearrangement of the formula for conditional probability gives the multiplication rule.
Multiplication Rule
Let A, B be events with P(B) > 0. Then
P(A ∩ B) = P(A|B) P(B).
Note: This formalises how you would calculate probabilities using a Tree Diagram at GCSE/A-level.
Example 1.3.3
Consider drawing two balls out of a bag containing 8 white and 4 red balls. What is the probability
that both balls are red?
Let R1 be the event that the first ball is red and R2 the event that the second ball is red.
The question asks for P(R1 ∩ R2).
It is easy to determine that P(R1) = 4/12 = 1/3,
because all balls are equally likely to be picked and one third of the balls are red.
It is less easy to determine P(R2), because we do not know whether there are still 4 or only 3 red
balls in the bag by the time the second ball is picked; that depends on the outcome of
the first draw.
This is where at GCSE/ A-level we might draw a tree diagram to help us calculate this probability.
So, draw a tree diagram to answer this question in the space below.
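Once you have drawn your tree diagram, the result can be checked by brute force. The sketch below assumes the two balls are drawn without replacement (as the discussion of "4 or only 3 red balls" implies) and compares a count over all ordered pairs of balls with the multiplication rule.

```python
from fractions import Fraction
from itertools import permutations

balls = ["W"] * 8 + ["R"] * 4  # 8 white and 4 red balls

# Every ordered pair of distinct balls is an equally likely (first, second) draw.
draws = list(permutations(range(len(balls)), 2))
both_red = sum(1 for i, j in draws if balls[i] == "R" and balls[j] == "R")
p_enumerated = Fraction(both_red, len(draws))

# Multiplication rule: P(R1 ∩ R2) = P(R1) * P(R2 | R1)
p_rule = Fraction(4, 12) * Fraction(3, 11)

print(p_enumerated, p_rule)
```

Both routes should give the same fraction, which is the value at the end of the "red, red" branch of the tree.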
Independence
Unlike in Example 1.3.3, there are situations where knowledge that an event B occurs does not
influence the probability of occurrence of an event A. This gives rise to the notion of independent
events.
Events A and B are independent if P(A ∩ B) = P(A) P(B).
Note: This is simply a rephrasing of the “and-rule” from GCSE/A-level.
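The definition can be tested directly on small examples. The sketch below uses a single fair die and three events chosen purely for illustration (they do not come from the booklet): "even", {1, 2} and {1, 2, 3}.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # fair six-sided die


def prob(event):
    return Fraction(len(frozenset(event) & OMEGA), len(OMEGA))


def independent(a, b):
    """True when P(A ∩ B) = P(A) P(B)."""
    a, b = frozenset(a), frozenset(b)
    return prob(a & b) == prob(a) * prob(b)


EVEN = {2, 4, 6}      # illustrative events, not taken from the booklet
LOW_TWO = {1, 2}
LOW_THREE = {1, 2, 3}

print(independent(EVEN, LOW_TWO))    # P({2}) = 1/6 and 1/2 * 1/3 = 1/6
print(independent(EVEN, LOW_THREE))  # P({2}) = 1/6 but 1/2 * 1/2 = 1/4
```

Note that independence is a property of the probabilities, not of how the events look: two events that share outcomes can still be independent.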
Chapter 1 homework questions: Basic Probability
HOMEWORK: You must attempt ALL the robot questions BEFORE the tutorial
and ideally the mathematician questions from section 1 as well. If you are
struggling with any questions, go to the face to face session for help and/or
ask us in the tutorial.
ONLINE TEST which MUST be completed before the tutorial: Basic Probability
Robot Questions
1) A ten-sided die with numbers 1 to 10 is thrown.
Describe the sample space and find the probability of obtaining the event {1, 4, 5}.
4) Two four-sided dice are thrown.
Let us write the outcome of an experiment as [i, j], where
i ∈ {1, 2, 3, 4} is the score on the first die and similarly j ∈ {1, 2, 3, 4} is the score on the second
die.
Let A be the event that the sum of the dice is even;
let B be the event that the first die shows a higher number than the second;
and let C be the event that the sum of the two dice is 4.
We can write the event A by listing all the outcomes contained in the event:
A = {[1, 1], [1, 3], [2, 2], [2, 4], [3, 1], [3, 3], [4, 2], [4, 4]}
Similarly specify each of the following events as sets listing all their outcomes.
a) B =
b) C =
c) A ∩ B =
d) A ∪ C =
e) A ∩ C^c =
f) A^c ∩ C =
g) A ∩ B ∩ C =
5) Using a Theorem
If P(A) = 2/3, P(B) = 1/4, and P(A ∪ B) = 3/4, calculate P(A ∩ B).
2) Independence
Let P(A) = 1/3, P(B) = p and P(A ∩ B) = 2/15. Find the value of p such that the events A and B are
independent.
3) Driving licence
To get a driving licence, you have to pass a theory test and a practical. The probability that an
individual passes the theory test is 0.9. The chances of passing the practical test are greatly
improved by passing the theory test. In particular, if an individual passes the theory test, they have
a 0.7 chance of passing the practical. Otherwise, the chance of passing the practical is 0.4. Without
using a Tree Diagram, find the probability that an individual gets a driving licence.
4) Random draws
A ball is drawn at random from an urn containing 8 red and 6 white balls. If a white ball is drawn, it
is put back into the urn. If a red ball is drawn, it is returned to the urn together with 6 more red
balls. Then a second draw is made. What is the probability that a red ball was drawn on both the
first and second draws?
252 students had maths A level, 43% of students asked identified as male and 138 of those
identifying as male had maths A level.
a) Fill in the table of frequencies and then use it to calculate the following probabilities. State
the answer numerically but also use probability notation, e.g. P(M) = probability of identifying as
male and P(A) = probability of having maths A level.

Gender      GCSE and below    A level and above (A)    Total
Female
Male (M)
Total                                                  486
b) Probability of a randomly selected respondent identifying as female
d) Probability that a randomly selected participant identifies as female and has maths A level.
e) Probability that a student has maths A level given that they identify as female
f) The probability that a student identifies as female given they have maths A level.
https://blog.metoffice.gov.uk/2016/04/21/whats-the-chance-of-an-april-shower/
1) You are going away for the weekend and the Met Office predicts that the likelihood of rain on
Saturday is 5% and the likelihood of rain on the Sunday is 10%.
a) Assuming that the likelihoods are independent, what is the probability of it NOT raining whilst you are
away P(No rain on either day)?
b) What is the probability of it raining on one day only?
2) The Met Office claims to predict next day temperature within 2 degrees 90% of the time.
a) What is p = P(they are out by more than 2 degrees on any given next day)?
b) What is the probability of not making a mistake on any given next day in terms of p and
numerically?
c) How many days in a month (30 days) do you expect them to be wrong about the next day
temperature?
3) Let X = r.v. ‘number of days up to and including their first mistake’ and p as in Q5.
Write the following answers in terms of p and numerically. What is the probability that the first day they
are wrong is:
a) first day
b) second day
c) third day
a) P(X = 1) =
Create a formula in terms of x and p which will allow the calculation of the probability of the first
'success' for any x and p.
P(X = x) =
4) You are going away for a week. The chance of rain is 5% on each day.
a) If p = P(rains on one day), how would you express P(it does not rain on one day) in terms of p?
b) What is the probability of it not raining whilst you are away (assuming events are independent)?
Write this in terms of p and calculate the probability.
c) If X = r.v. ‘no. of days it rains in 7 days’, what is the probability that it rains on one day only? Write this
in terms of p and calculate the probability. Hint: Consider how many ways this can happen.
d) What is the probability of it raining on the first two days only? Write this in terms of p and calculate
the actual probability.
a)
2 Discrete Distributions
*Start video lecture 2a*
ii.) Data consisting of numbers which can take any value within certain limits.
We can regard this kind of data as the values taken by continuous variables.
(e.g. the air temperature over a certain period, birth weights of babies etc).
Quick exercise: write down a discrete random variable
In statistics we are constantly considering variables and the values they take.
Frequently we wish to determine the probability with which a particular variable can take certain
values.
Example 2.1.1
Consider the simple experiment of tossing three coins simultaneously and noting the result.
We can define a random variable X to be the number of heads that turn up.
The possible values that X can take are…
We may be interested in the probability that X takes a value of two or more.
X can take only whole number values and until we toss the coins we do not know how many heads
will turn up (i.e. random). Putting this together, it is thus a discrete random variable.
More formally, a random variable is a quantity that depends on the outcome of a random event
(i.e. a probability experiment).
Example 2.1.2
When you throw two dice:
X = the sum of the two scores showing,
Y = the product of the scores,
Z = the larger of the two scores
are all random variables.
Your own example:
Notation
Random variables are denoted by a capital letter, as in Example 2.1.2. Realisations of the random
variable, i.e. the outcome of the experiment, are denoted by a small letter.
e.g., the random variable X can take values x_1, x_2, x_3, x_4, x_5. We may be interested in P(X = x_1) or
some other value or collection of values.
Formally, a random variable, X, represents a function which associates a real number with every
outcome in a sample space.
Maths: X : Ω → ℝ
We denote the range of this function (i.e. the values it can take) as X(Ω).
Example 2.1.3
Random variables can also arise from real-life observation. For example, X, the number of
telephone calls arriving at a switchboard between 10:00am and 10:30am is also a random
variable.
Example 2.1.4
Consider the following bet: you roll a fair die and
a) win £2 if the outcome is a 5 or a 6,
b) lose £1 if the outcome is 1, 2 or 3,
c) win or lose nothing if the outcome is 4.
The sample space of this random experiment is Ω = {1, 2, 3, 4, 5, 6}.
Denoting losses as negative gains, we can represent the gains from this experiment as a function
X : Ω → ℝ defined as
$$X(\omega) = \begin{cases} -1 & \text{if } \omega = 1, 2, 3 \\ 0 & \text{if } \omega = 4 \\ 2 & \text{if } \omega = 5, 6. \end{cases}$$
The amount you gain is the random variable.
The range of X (i.e. the values the random variable can take) is X(Ω) = {−1, 0, 2}.
Since X can only take distinct values, it is a discrete random variable.
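The idea that a random variable is simply a function on the sample space can be made concrete in a few lines. The sketch below encodes the bet from Example 2.1.4; the function name `gain` is illustrative.

```python
OMEGA = range(1, 7)  # outcomes of one roll of a fair die


def gain(omega):
    """The random variable X: maps each outcome to the gain in pounds."""
    if omega in (1, 2, 3):
        return -1   # lose £1
    if omega == 4:
        return 0    # win or lose nothing
    return 2        # outcome is 5 or 6: win £2


values_by_outcome = {w: gain(w) for w in OMEGA}
value_range = sorted(set(values_by_outcome.values()))  # X(Ω)

print(values_by_outcome)
print(value_range)
```

The randomness lives entirely in which outcome ω occurs; the function X itself is fixed.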
*End video lecture 2a*
*Start video lecture 2b*
2.2 Discrete probability distributions
When values of a variable have a probability attached, they form a probability distribution.
Probability distributions are theoretical distributions based on some mathematical model which is
set up to describe some particular situation. A discrete probability distribution is simply a list of
the probabilities associated with each possible outcome of the experiment.
Remember the notation: We use capital letters (often W , X , Y , Z ) to mean the random variable
and small letters (e.g. w , x , y , z ) to mean the particular values of the random variables.
Example 2.2.1
Flip a coin three times. Let X = number of heads. What is the probability distribution of X?
List the possible outcomes in the space below- what is the probability associated with each?
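After listing the outcomes by hand, you can compare your list with an enumeration. This sketch assumes the coin is fair, so that all eight sequences of heads and tails are equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All sequences of three flips, e.g. ('H', 'T', 'H'); assumed equally likely.
outcomes = list(product("HT", repeat=3))

# Count how many sequences give each number of heads.
counts = Counter(outcome.count("H") for outcome in outcomes)
pmf = {x: Fraction(n, len(outcomes)) for x, n in sorted(counts.items())}

for x, p in pmf.items():
    print(f"P(X = {x}) = {p}")
```

The same pattern (enumerate, count, divide by the number of outcomes) works for any experiment with finitely many equally likely outcomes.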
Example 2.2.2
In Example 2.2.1, the probability mass function is given by:
p_X(x) =
Exercise 2.2.3: Write down the probability mass function for the discrete random variable
defined in Example 2.1.4
Theorem
Let X be a discrete random variable that takes values in the set {x_0, x_1, x_2, …}.
Then its probability mass function p_X satisfies:
(m1) $p_X(x_k) \ge 0$, $k = 0, 1, 2, \dots$
(m2) $\sum_{\text{all } x_k} p_X(x_k) = 1$.
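Conditions (m1) and (m2) translate directly into a small check. The sketch below represents a mass function as a dictionary from values to probabilities and uses exact fractions so that the sum can be compared with 1 without rounding issues; the example dictionaries are made up for illustration.

```python
from fractions import Fraction


def is_valid_pmf(pmf):
    """Check (m1) every probability is >= 0 and (m2) they sum to 1."""
    values = list(pmf.values())
    return all(p >= 0 for p in values) and sum(values) == 1


three_coins = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
negative_entry = {1: Fraction(1, 2), 2: Fraction(3, 4), 3: Fraction(-1, 4)}  # sums to 1, fails (m1)
short_total = {1: Fraction(1, 2), 2: Fraction(1, 4)}                         # fails (m2)

print(is_valid_pmf(three_coins), is_valid_pmf(negative_entry), is_valid_pmf(short_total))
```

If the probabilities are given as decimals, compare the sum with 1 using a small tolerance (for example `math.isclose`) rather than `==`.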
Exercise 2.2.4: Which of the following are a valid probability mass function/ possibility space?
Give a reason for any which are not.
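x           1     2     3     4     5
P(X = x)    0.2   0.3   0.4   0.3   0.2

x           -2    -1    0     1     2
p_X(x)      0.2   0.3   0.1   0.3   0.1

$$p_X(x) = \begin{cases} \frac{1}{2} & \text{if } x = 1 \\ \frac{1}{4} & \text{if } x = 2 \\ \frac{1}{4} & \text{if } x = 3 \end{cases}$$

$$p_Y(y) = \begin{cases} 0.3 & \text{if } y = 1 \\ -0.2 & \text{if } y = 2 \\ 0.9 & \text{if } y = 3 \end{cases}$$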
Example 2.2.5
Recall Example 2.2.1: flip a coin three times. Let X = number of heads.
a) What is the probability that you get 1 head or less?
b) F_X(1) = P(X ≤ 1) =
c) F_X(2) =
Example 2.2.6
In Example 2.2.1, the cumulative distribution function (cdf) in tabular form is given by:
x                      0      1                  2                  3      4
F_X(x) = P(X ≤ x)      1/8    1/8 + 3/8 = 4/8    4/8 + 3/8 = 7/8
Here are some more general properties satisfied by the distribution function of any random
variable:
1. F_X(x) is increasing in x (i.e. if a < b then F_X(a) ≤ F_X(b)).
2. $\lim_{x \to -\infty} F_X(x) = 0$
3. $\lim_{x \to \infty} F_X(x) = 1$
Example 2.2.7
a) In Example 2.1.4, what is F X (−1 )?
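$$F_X(x) = \begin{cases} 0 & \text{if } x < 0 \\ \frac{1}{8} & \text{if } 0 \le x < 1 \\ \frac{4}{8} & \text{if } 1 \le x < 2 \\ \frac{7}{8} & \text{if } 2 \le x < 3 \\ 1 & \text{if } x \ge 3 \end{cases}$$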
c) We can also plot the cdf (see below). Comment on the features of this plot.
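To see where the steps in such a plot come from, the cdf can be built from the mass function by adding up the probabilities of all values not exceeding x. The sketch below does this for the three-coin pmf (1/8, 3/8, 3/8, 1/8), assuming a fair coin.

```python
from fractions import Fraction

pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}


def cdf(x):
    """F_X(x) = P(X <= x): add up the mass at every value not exceeding x."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))


for x in (-1, 0, 0.5, 1, 1.5, 2, 3, 10):
    print(f"F_X({x}) = {cdf(x)}")
```

Evaluating at non-integer points such as 0.5 and 1.5 shows that the function stays flat between the values X can take and only jumps at those values.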
Recap
So far in this section we have covered:
Random variables – discrete and continuous
Probability distributions
The probability mass function for discrete random variables
The cumulative distribution function.
$$p_X(x) = \begin{cases} \frac{1}{6} & \text{if } x = 1, 2, \dots \\ 0 & \text{if } x \neq 1, 2, \dots \end{cases}$$
This is an example of the discrete uniform distribution also known as the “equally likely
outcomes” distribution. Discuss why you think this distribution might have these names.
We say that a random variable X has a discrete uniform distribution and
write X ∼ Uniform(n), if X(Ω) = {x_1, x_2, …, x_n} and it has mass function
$$p_X(x) = \begin{cases} \frac{1}{n} & \text{if } x \in X(\Omega) = \{x_1, x_2, \dots, x_n\} \\ 0 & \text{otherwise.} \end{cases}$$
Note that you may see alternative notation in the literature such as:
X ∼ Uniform(k)
X ∼ Unif(n)
X ∼ U(k)
They all mean that “ X has a uniform distribution” (i.e. n or k outcomes, each with equal
probability of occurrence).
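A minimal sketch of the discrete uniform mass function follows. The helper `uniform_pmf` is an illustrative name; it takes the set of possible values and returns a function giving 1/n on those values and 0 elsewhere.

```python
from fractions import Fraction


def uniform_pmf(support):
    """Return the mass function of a discrete uniform distribution on `support`."""
    support = list(support)
    n = len(support)

    def pmf(x):
        return Fraction(1, n) if x in support else Fraction(0)

    return pmf


die = uniform_pmf(range(1, 7))  # a fair six-sided die as an example
print(die(3), die(7), sum(die(x) for x in range(1, 7)))
```

Because every value has the same mass, the distribution is fully described by the list of possible values alone.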
Example 2.3.1.2
A special case of the discrete uniform distribution is when X(Ω) = {x_1, x_2, …, x_n} = {1, 2, …, n}.
Write down the mass function for this special case of the discrete uniform distribution.
Sketch the pmf of this special case of the discrete uniform distribution.
2.3.2. Bernoulli Distribution
Many common discrete distributions are associated with a Bernoulli trial in which only one of two
possible outcomes can occur. These outcomes are commonly referred to as "success" (1) and
"failure" (0).
Example 2.3.2.1
When you press a light switch the bulb either comes on or it does not.
After manufacture an article may be classed as either "defective" or "non-defective".
A patient referred to a consultant may attend or not attend.
Any situation where we have an event and a complementary event…
Your own example:
Hence, if the probability that a "success" will occur at a Bernoulli trial is p, where 0 ≤ p ≤ 1, then the
probability that a "failure" will occur at the trial must be q = 1 − p.
But the Bernoulli trial also forms a distribution in its own right.
We say that the random variable X has the Bernoulli distribution with
parameter p, and write X ∼ Bernoulli(p), if it only takes values 0 and 1, i.e.
X(Ω) = {0, 1} with
P(X = 1) = p
and
P(X = 0) = 1 − p.
The mass function of X is
$$p_X(x) = \begin{cases} 1 - p & \text{if } x = 0 \\ p & \text{if } x = 1 \\ 0 & \text{if } x \neq 0, 1. \end{cases}$$
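The Bernoulli mass function is short enough to write out directly. In this sketch the function name and the example value p = 0.2 are illustrative choices only.

```python
def bernoulli_pmf(x, p):
    """Mass function of X ~ Bernoulli(p): p at 1, 1 - p at 0, and 0 elsewhere."""
    if not 0 <= p <= 1:
        raise ValueError("p must lie between 0 and 1")
    if x == 1:
        return p
    if x == 0:
        return 1 - p
    return 0.0


p = 0.2  # illustrative probability of "success"
print(bernoulli_pmf(1, p), bernoulli_pmf(0, p), bernoulli_pmf(2, p))
```

The two non-zero values always add up to 1, which is condition (m2) for this distribution.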
Example 2.3.2.2
Consider a Bernoulli trial in which a biased coin is flipped. Suppose the probability of a tail is 0.2.
Assuming trials are independent what is the chance that a tail will occur for the first time on the:
b) Second flip?
c) Third flip?
d) Fourth flip?
f) Now suppose the probability of a tail is p. If X is the number of trials to the first tail then
P(X = x) =
$$p_X(n) = \begin{cases} (1-p)^{n-1}\,p & \text{if } n = 1, 2, 3, \dots \\ 0 & \text{otherwise} \end{cases}$$
In other words, X represents the number of the trial at which the first "success" occurs.
Why do you think this distribution is called the “Geometric” distribution?
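The mass function above can be evaluated directly. The following is a minimal sketch in Python, assuming an illustrative success probability p = 0.5 (this value and the function name `geometric_pmf` are not from the booklet); it also checks numerically that the probabilities add up to 1.

```python
def geometric_pmf(n: int, p: float) -> float:
    """P(X = n): the first success occurs on trial n, for n = 1, 2, 3, ..."""
    if n < 1:
        return 0.0
    return (1 - p) ** (n - 1) * p


p = 0.5  # illustrative value only, not taken from the booklet

# Probabilities that the first success happens on trials 1 to 5.
first_five = [geometric_pmf(n, p) for n in range(1, 6)]
print(first_five)  # [0.5, 0.25, 0.125, 0.0625, 0.03125]

# The probabilities over all n should sum to 1 (truncated here at n = 199).
total = sum(geometric_pmf(n, p) for n in range(1, 200))
print(total)
```

Changing `p` to the value in an exercise lets you check a hand calculation after you have attempted it.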
Example 2.5.1
Consider an American roulette wheel, with numbers 00 ,0, 1, 2, 3,..., 36 (38 numbers in total). A ball
is thrown onto the wheel as it is spinning, and comes to rest by one of the numbers. If you always
bet that the ball will stop on one of the numbers 1, 2, ..., 12, what is the probability that you will:
a) Win on your first bet?
d) Can you write what you were asked to calculate in parts a-c using the notation p X (n) and
F X (n)?
In part (c) you were calculating a probability of the form P(X ≤ n); this is the cumulative distribution function of a Geometric
random variable. To save you some time, we will now derive a general formula for the distribution
function of a geometric random variable.
$$F_X(x) = P(X \le x) = \sum_{n=1}^{\lfloor x \rfloor} p_X(n)$$
In the last equality we introduced the notation ⌊x⌋ to denote the largest integer less than or equal
to x. Using the probability mass function given above we then find for x ≥ 1 that
$$F_X(x) = \sum_{n=1}^{\lfloor x \rfloor} (1-p)^{n-1}\,p = p \cdot \frac{1-(1-p)^{\lfloor x \rfloor}}{1-(1-p)}$$
$$F_X(x) = 1-(1-p)^{\lfloor x \rfloor}$$
The distribution function for X ∼ Geo(p) is thus given by
$$F_X(x) = P(X \le x) = \begin{cases} 0 & x < 1 \\ 1-(1-p)^{\lfloor x \rfloor} & x \ge 1 \end{cases}$$
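As a sanity check on the closed form, the sketch below compares it with a direct sum of the mass function. The function names and the test values (p = 0.3 and the chosen x values) are illustrative assumptions, not part of the booklet.

```python
import math


def geometric_cdf(x: float, p: float) -> float:
    """Closed form: F_X(x) = 1 - (1 - p)^floor(x) for x >= 1, and 0 for x < 1."""
    if x < 1:
        return 0.0
    return 1 - (1 - p) ** math.floor(x)


def geometric_cdf_by_sum(x: float, p: float) -> float:
    """Same quantity, obtained by summing the pmf over n = 1, ..., floor(x)."""
    return sum((1 - p) ** (n - 1) * p for n in range(1, math.floor(x) + 1))


p = 0.3  # illustrative value only
for x in (0.5, 1, 2.7, 10):
    print(x, geometric_cdf(x, p), geometric_cdf_by_sum(x, p))
```

Note that non-integer inputs such as 2.7 give the same value as 2, which is exactly what the floor notation expresses.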
Complete the following table that represents the possibility space for this distribution.
Number of Heads = x    Number of combinations    Probability of one combination    P(X = x)
0                      1                         0.0016                            0.0016
1                      ____                      0.0064                            0.0256
2                      6                         ____                              ____
3                      4                         0.1024                            0.4096
4                      1                         0.4096                            0.4096
Total =
Can you write a formula in words for P( X=x) ?
To find the number of combinations in this example, we do not have to list them all!
Exercise: What is the formula for the number of combinations of r items out of n items when the
order does not matter?
Note: 0 !=1.
The Binomial distribution occurs when we have a fixed number (n ) of independent trials.
Each trial has only two outcomes, usually called 'success' and 'failure'.
The probability of success ( p) is the same for each trial.
The random variable X is the number of successes in the n independent (Bernoulli) trials.
Can you put all this together and complete the definition of the Binomial distribution (below)?
Note: This derivation (above) is non-examinable.
We say that the random variable X has the binomial distribution with
parameters n = number of trials and p = probability of success, and write X ∼ Binomial(n, p),
if X(Ω) = {0, 1, 2, …, n} and it has mass function
$$p_X(k) = P(X = k) = \begin{cases} \binom{n}{k}\, p^k (1-p)^{n-k} & \text{if } k = 0, 1, 2, \dots, n \\ 0 & \text{otherwise} \end{cases}$$
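The mass function can be turned into a short function using `math.comb` for the number of combinations. This is a sketch only: the values n = 4 and p = 0.8 are an assumption inferred from the table entries in this chapter (for example 0.0016 = 0.2⁴), since the statement of the coin example is not reproduced here.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p); zero outside k = 0, 1, ..., n."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 4, 0.8  # assumed values, inferred from the table rather than stated in the text

table = {k: binomial_pmf(k, n, p) for k in range(n + 1)}
for k, prob in table.items():
    # Columns: k, number of combinations, P(X = k)
    print(k, comb(n, k), round(prob, 4))

total = sum(table.values())  # a valid pmf sums to 1
```

Comparing the printed rows with your completed table is a quick way to verify the hand calculations.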
Example 2.3.4.2
Complete the following table to verify that our pmf gives the same values as our
calculations in Example 2.3.4.1.
Number of Heads = k    Number of combinations C(n, k)    Probability of one combination    P(X = k)    p_X(k) (using pmf)
0                      1                                 0.0016                            0.0016      ____
1                      ____                              0.0064                            0.0256      ____
2                      6                                 ____                              ____        ____
3                      4                                 0.1024                            0.4096      ____
4                      1                                 0.4096                            0.4096      ____
Total =
Recap
In this chapter we covered:
Discrete/ continuous random variables
The probability mass function and cumulative distribution functions for discrete random
variables
The Discrete Uniform, Bernoulli, Geometric and Binomial distributions.
2.4 Summary so far
It’s good practice to summarise key points from each chapter in your own words to ensure you
understand and to refer back to when revising. For example, summarise the distributions covered
so far or the axioms of probability as below.
1) Identify the distribution given below. State the axioms of probability in your own words and
demonstrate that it is a valid probability mass function (pmf).
$$p_X(x) = \begin{cases} 1-p & \text{if } x = 0 \\ p & \text{if } x = 1 \\ 0 & \text{if } x \neq 0, 1. \end{cases}$$
Bernoulli
Binomial
Geometric
Chapter 2 homework questions: Discrete Distributions
Remember to work through the homework questions BEFORE the tutorial.
Robot Questions
Calculate the value of c that makes the following valid probability mass functions.
Question 1
x 1 2 3 4
P( X=x) c 2c 3c 4c
Question 2
x          1    2      3      4
p_X(x)     c    c/2    c/3    c/4
Question 3
$$p_X(x) = \begin{cases} cx, & x = 3, 4, 5 \\ c(11-x), & x = 6, 7, 8 \\ 0, & \text{otherwise} \end{cases}$$
x 15 30 40
P( X=x) 0.5 0.3 0.2
P( X ≤ x)
Question 5
x 1 2 3 4
p X (x) 0.3 0.2 0.3 0.2
F X (x )
Question 6
x -2 -1 0 1 2
p X (x)
F X (x ) 0.1 0.2 0.4 0.7 1
Question 7: I have an unbiased die. For each of the following state, with a reason, the distribution of the
random variable. (Discrete uniform, geometric, binomial, other)
a) X = the number of times I roll until I get a 6.
Question 8) In the production of a Micro SD card, it is found that 10% are defective. The cards are
produced in batches of 10.
a) Write down a suitable model for the distribution of defective components in a batch.
i) No defective Micro SD cards ii) 2 defective Micro SD cards iii) at least 3 defective Micro SD cards.
Question 9: A shop receives a batch of 1000 cheap lamps. The probability that a lamp is defective
is 0.1%. Let X be the number of defective lamps in the batch.
a) What kind of distribution does X have? What is/are the value(s) of the parameter(s) of this
distribution?
b) What is the probability that the batch contains no defective lamps? One defective lamp?
More than two defective lamps?
Mathematician Questions
Question 1) NBA All Star Weekend! (Google this if you have no idea what we’re talking about. Definitely
youtube the Slam Dunk Contest!) https://www.youtube.com/watch?v=u7VgkfcSYz0
a) Slam Dunk Contest: In the Slam Dunk contest, players compete to perform the “best” dunk.
They get three attempts at a dunk. If/when they make the basket they stop and get a score. If the
probability of Aaron Gordon making a particular dunk is 80%, what is the probability he does not
make the dunk in the contest?
b) Three-point Contest
In the Three-point Contest, players compete to make the most three-point shots out of 25 taken. If
Steph Curry makes 44% of the three-pointers he takes, how many would you expect him to make
in the contest? Klay Thompson went before Steph Curry and made 23 shots. What is the
probability Steph beats Klay?
The NBA Skills Challenge is a competition to test ball-handling, passing and shooting ability. In the
current version of the contest, two participants race against each other on identical courses by first
dribbling between five obstacles while running down the court. Next, the player must throw a pass
into a net that does not touch the ground. Then, the players must dribble back the full length of the
court for a lay-up. Shortly after, the players must dribble back down the court and hit a three-
pointer from the top of the basketball key. The match ends when the first player hits the three
pointer. Kristaps Porzingis is competing in the skills challenge. He has reached the final challenge: to
make the three-pointer. In the regular season, Kristaps’s three-point percentage is 35.7%.
a) How many shots do you think Kristaps will have to take before he makes the basket?
(Despite the rules of the game, Kristaps is competitive and will shoot until he makes the basket!)
b)What is the probability he makes the basket on his third attempt?
c) What assumptions have you made to answer this question?
3 Continuous Distributions and Expectation
This chapter is an introduction to continuous distributions and expectation of discrete and
continuous variables which will be covered again in more detail in the second probability section
of the course.
*Start video lecture 3a*
The common factor associated with these questions is that the random variable being described
can only take discrete integer values. Thus, discrete probability distributions were used.
However, if we want to ask questions such as:
What is the probability that a can of Diet Coke contains 330ml of the drink?
Then, the variable is NOT restricted to a discrete value. It may be measured to any degree of
accuracy on a continuous scale, dependent only upon the measuring equipment.
To describe these situations we use continuous probability distributions.
The first problem we encounter is that P ( X=x ) =0 for all values of the random variable.
Discuss why you think this is the case. Does this mean all values of the random variable are
impossible?
Therefore, continuous random variables are characterised by the properties of their (cumulative)
distribution functions. This is because we can define the cumulative probability distribution (cdf)
of a continuous distribution in the same way as we did in Chapter 2 for discrete distributions:
F ( x )=P ( X ≤ x ) .
In fact, if you look back to Chapter 2, you will note that we didn’t state that X was a discrete
random variable in our definition of the cdf. The cdf exists for all random variables and all values
on the continuous scale regardless of whether the variable is discrete or continuous.
Note: the properties of the cdf stated in Chapter 2 apply to the cdf of any random variable.
However, there are some special properties of the cdf for discrete and continuous random
variables.
Find the property/ properties of the cdf in Chapter 2 that only applies/ apply to discrete random
variables.
Now we state some properties of the cdf for continuous random variables.
Suppose that a random variable X can take any value in the range (a ,b) (i.e. a to b).
Then the cumulative probability distribution is of the form:
$$F_X(x) = \begin{cases} 0 & \text{for } x < a \\ \text{a monotonically increasing function} & \text{for } a \le x \le b \\ 1 & \text{for } x > b \end{cases}$$
Note: since P(X = x) = 0 it follows that:
P(X ≤ x) = P(X < x)
and
P(X ≥ x) =
Recall from Chapter 2 that: In general, for a discrete random variable, the distribution function is
obtained simply by summing the mass function for all values up to x .
Example 3.1.1
Look at and discuss the following diagram:
Do you think it represents a discrete or continuous distribution?
If the class intervals are (0, 0.1], (0.1, 0.2], …, (0.9, 1.0], what probability does the shaded
area represent?
Put a tick in the bars you would need to calculate
F_X(0.2) = P(X ≤ 0.2)
probability = height × 0.1
Now suppose we narrow our class width to 0.01. The resultant histogram is below.
probability = height × 0.01
[Figure: histogram with class interval 0.01]
Looking at the diagram above, how could you calculate the following probabilities (in terms
of the function f (x))?
Can you think how the cdf will be linked to the function denoted f (x) in the figure above?
The function f (x) is known as the (probability) density function (pdf) of the random variable X .
We can use this function to help us mathematically define a continuous random variable.
We call a random variable X continuous if its distribution function F_X can be
written as
$$F_X(x) = \int_{-\infty}^{x} f_X(s)\,ds, \qquad x \in \mathbb{R},$$
for some non-negative function f_X : ℝ → [0, ∞).
In this case, we say that f_X is the density function of X.
The fundamental theorem of calculus implies (under some conditions) that for each x ∈ ℝ,
$$\frac{d}{dx} F_X(x) = \frac{d}{dx} \int_{-\infty}^{x} f_X(s)\,ds = f_X(x).$$
Exercise: Translations
Theorem
Let X be a continuous random variable, then its density function f_X satisfies
$$f_X(x) \ge 0, \quad \forall x \in \mathbb{R} \qquad \text{(d1)}$$
and
$$\int_{-\infty}^{\infty} f_X(x)\,dx = 1 \qquad \text{(d2)}.$$
Conversely, any real function f_X satisfying (d1) and (d2) is the density function
of some continuous random variable.
Example 3.1.2
Consider the function
$$g(x) = \begin{cases} x^{c-1}, & 0 \le x \le 1 \\ 0 & \text{otherwise.} \end{cases}$$
Is this a probability density function?
1. d1: Is g(x) ≥ 0 ∀ x ∈ ℝ?
Firstly, outside of the range [0, 1] it is zero, so that is OK. But is it non-negative on [0, 1]?
Yes: since x is non-negative over the range, g(x) is also non-negative over the range.
2. d2: Is total probability 1?
$$\int_{-\infty}^{\infty} g(x)\,dx = \int_{0}^{1} x^{c-1}\,dx = \left[\frac{x^{c}}{c}\right]_{0}^{1} = \frac{1}{c}$$
When calculating probabilities of events involving random variables, density functions play the same
role for continuous random variables that mass functions play for discrete random
variables. The analogy, however, is not direct.
Example 3.2.3
Can you think of anything that is true for mass functions but is not true for density
functions?
Find the equivalent theorem for discrete random variables. Compare and contrast.
(i.e. the (m1)/ (m2) theorem.)
Lemma
If X is a continuous random variable with density function f_X, then for all
a, b ∈ ℝ with a ≤ b
$$P(a \le X \le b) = \int_{a}^{b} f_X(x)\,dx.$$
Note that this means we can calculate P(a ≤ X ≤ b) simply by calculating the area under the
density function between the points a and b.
Also, because P(X = a) = 0 = P(X = b) the weak inequalities (≤) can be replaced by strict
inequalities (<) anywhere without changing the probabilities.
i.e. P(a < X < b) =
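The idea that a probability is an area under the density can be tried out numerically. The sketch below uses a simple midpoint rule and a hypothetical density f(x) = 2x on [0, 1], chosen only for illustration; neither the density nor the helper `integrate` comes from the booklet.

```python
def integrate(f, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


def density(x):
    """Hypothetical density used only for illustration: f(x) = 2x on [0, 1]."""
    return 2 * x if 0 <= x <= 1 else 0.0


total_area = integrate(density, 0, 1)  # property (d2): should be 1
prob = integrate(density, 0.2, 0.5)    # P(0.2 <= X <= 0.5), exactly 0.5**2 - 0.2**2
print(round(total_area, 6), round(prob, 6))
```

Because single points carry zero probability, the same number is obtained whether the endpoints 0.2 and 0.5 are included or not.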
As with discrete random variables, there are a number of continuous random variables which have
special places in probability theory.
*End video lecture 3a*
*Start video lecture 3b*
$$f_X(x) = \begin{cases} \dfrac{1}{b-a}, & \text{if } x \in [a, b] \\ 0, & \text{if } x \notin [a, b] \end{cases}$$
3.2.2. Exponential distribution
The exponential distribution is perhaps the second most important of all the continuous
probability distributions because of its extensive use in probability modelling.
The exponential distribution is often used to model waiting times between certain events, such as
natural disasters, machine break-downs, or customers joining a queue. If these waiting times are
independent and Exp(λ) distributed, then it can be shown that the number of arrivals in a time
interval of length t follows a famous discrete distribution called the Poisson distribution (with
parameter λt). We will cover this distribution later in the course.
The exponential distribution is also useful as it is the only continuous distribution that has the
memoryless property, that is, for all s, t ≥ 0: P(T > s + t | T > s) = P(T > t).
Translation: If the random variable T is a lifetime, then this statement says that the
conditional probability that the lifetime exceeds s + t, given that it has already exceeded s, is
the same as the probability that the lifetime exceeds t.
Exercise 3.2.2.1: Can you think of an example of where this property would be appropriate?
Example 3.2.2.2
So if the length of time between hurricanes is exponentially distributed, the probability that
we have to wait more than t + s units of time in total for the next hurricane, given that we’ve waited s units already,
is simply the same as the probability that the next hurricane doesn’t occur in the next t time units.
(This seems quite reasonable: nature doesn’t decide that there should be a new hurricane soon
because there hasn’t been one for a while...!)
We say that the continuous random variable X has the exponential
distribution with parameter λ and write X ∼ Exp(λ), if the density of X is
$$f_X(x) = \begin{cases} \lambda e^{-\lambda x}, & \text{if } x \ge 0 \\ 0, & \text{if } x < 0. \end{cases}$$
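The memoryless property can be checked numerically. The sketch below relies on the standard result P(T > t) = e^(−λt) for t ≥ 0, which follows from integrating the density but is not derived in this section; the rate and the times s and t are illustrative values, and the function names are assumptions.

```python
import math


def exp_pdf(x: float, rate: float) -> float:
    """Density of Exp(rate): rate * exp(-rate * x) for x >= 0, else 0."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0


def exp_survival(t: float, rate: float) -> float:
    """P(T > t) = exp(-rate * t) for t >= 0 (standard result, assumed here)."""
    return math.exp(-rate * t) if t >= 0 else 1.0


rate, s, t = 1.5, 2.0, 3.0  # illustrative values only

# P(T > s + t | T > s) = P(T > s + t) / P(T > s), since {T > s + t} implies {T > s}.
conditional = exp_survival(s + t, rate) / exp_survival(s, rate)
unconditional = exp_survival(t, rate)
print(conditional, unconditional)  # the two values agree up to rounding error
```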
Example 3.2.2.4
Suppose that the operating lifetime of a battery is an exponential random variable with λ=0.5 .
What is the probability that the operating lifetime is over 4 years?
What is the probability that the operating lifetime is between 1 and 3 years?
*End video lecture 3b*
Roll a fair die 600 times. How many sixes would you expect to get?
Flip a fair coin 50 times. How many heads would you expect to get?
Flip a fair coin 100 times. How many heads would you expect to get?
Flip a biased coin, where P ( H )=0.7 , 100 times. How many heads would you expect to get?
Flip a coin, where P ( H )= p , 100 times. How many heads would you expect to get?
Can you write a general formula for the expected number of heads in these examples?
Note:
E [ X ] is just a number (not a random variable).
E [ X ] is also called the “expected value” or “mean” of X .
It can be thought of as the “center of mass” of the probability distribution.
Example 3.3.2
Students at the university library may borrow up to five books at any one time. The number of
books borrowed by a student on each visit is a random variable, X , with the following probability
distribution:
x          0       1       2       3       4       5
p_X(x)     0.24    0.12    0.14    0.30    0.05    0.15
A student arrives at the library. How many books would you expect them to borrow?
$$E[X] = \sum_{k=0}^{5} x_k\, p_X(x_k) = 0 \times 0.24 + 1 \times 0.12 + \dots$$
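The expectation of a discrete random variable is a probability-weighted sum, which is easy to compute once the mass function is stored as a table. This is a minimal sketch using the library distribution above; the dictionary layout and the function name `expectation` are illustrative choices.

```python
# Mass function of X = number of books borrowed, taken from the table above.
pmf = {0: 0.24, 1: 0.12, 2: 0.14, 3: 0.30, 4: 0.05, 5: 0.15}


def expectation(pmf: dict) -> float:
    """E[X] = sum over all values x of x * p_X(x)."""
    return sum(x * p for x, p in pmf.items())


# A valid pmf must sum to 1 before the expectation is meaningful.
assert abs(sum(pmf.values()) - 1) < 1e-9

expected_books = expectation(pmf)
print(round(expected_books, 2))  # 2.25
```

Note that 2.25 is not a value X can take: the expectation is a long-run average, not a prediction for a single visit.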
$$E[h(X)] = \sum_{k} h(x_k)\, p_X(x_k).$$
$$E[X] = \int_{-\infty}^{\infty} x\, f_X(x)\,dx$$
x             5      6      7      8      9      Σ
P(X = x)      0.1    0.2    0.3    0.3    0.1    1
xP(X = x)
E(X) =
P(X = x) =
Question 2 Calculate the expected value of X. Write the pmf in the form specified in lectures.
x             -2     -1     0      1      2      Σ
p_X(x)        0.1    0.2    0.3    0.3    0.1
x · p_X(x)
$$p_Y(y) = \begin{cases} \dfrac{6-y}{15}, & y = 1, 2, 3, 4, 5 \\ 0, & \text{otherwise} \end{cases}$$
Question 4 State the pdf for the exponential distribution and calculate f X ( 2 ) .
Question 5 State the cdf for the exponential distribution and calculate F X ( 2 ) .
Question 6 Calculate P( X ≤ 3)
Question 7 Calculate P( X> 3)
Question 8 Find a formula (e.g. on the internet) for E [X ]. Use this to calculate E [X ]
Chapter 3 Tutorial: Expectation and Continuous Distributions
Mathematician Questions
Calculate the expectation of the following random variables.
Question 1: Find the value of a. Then calculate E [ X ] .
x 1 2 3 4
p X (x) a 0.2 3a 0.2
Question 2
$$p_W(w) = \begin{cases} k(w-1), & w = 2, 3, 4, 5, 6, 7 \\ k(13-w), & w = 8, 9, 10, 11, 12 \\ 0, & \text{otherwise} \end{cases}$$
For the following questions you will need the formulae for the expectation of the Binomial and Geometric
distributions.
E[X] = 1/p
E[X] = np
Discuss which expectation you think goes with each distribution. What makes you think this?
Check if you are correct on the internet.
Question 3: X ∼ Geometric(0.1). Calculate E[X].
Question 4: You flip a fair coin until you get heads. How many times do you expect to flip the
coin?
Question 5: You are in a casino in Las Vegas and decide to play roulette. You always bet on 00.
You keep playing until you win. How many spins of the roulette wheel do you expect?
Question 6: You only have enough money for 10 spins of the roulette wheel. How many times do
you expect to win?
Note: You will not be tested on anything requiring integration at the moment but you will later in
the course.
Let a continuous random variable X be given that takes values in [0, 6], and whose distribution function
F_X satisfies
$$F_X(x) = \frac{-2x^{3} + 6x^{2} + 144x}{648} \quad \text{for } 0 \le x \le 6.$$
Question 8: Compute P(1/2 ≤ X ≤ 1)
Question 9: Give the probability density function of X in the interval [0, 6].
The probability density function f_X of a continuous random variable X is given by
$$f_X(x) = \begin{cases} \dfrac{-6\,(x^{2} + 7x - c)}{25} & \text{if } 0 \le x \le 1 \\ 0 & \text{otherwise} \end{cases}$$
Question 13: Derive the cdf of the exponential distribution (for x ≥ 0) by integrating the pdf.
The service time at a supermarket checkout is exponentially distributed with a mean service time of 2
minutes.
Question 14: What is the probability that the service time will be longer than 3 minutes?
Question 15: The service time has already taken 5 minutes. How much longer is it expected to
take?
4 Summary statistics and the Normal distribution
With Ellen’s notes you will need to fill in the gaps as we go, answering questions rather than
copying from the board!
In this chapter, you will need the Excel file ‘Student normal distribution Excel sheets’ which
contains calculations used within the notes.
Exercise: I have collected and summarised data on 81 babies born at Jessops in 2016 which are contained
in the Excel sheet.
Two summary statistics which are commonly used are the mean and the standard deviation.
Variance (uses the sum of squared deviations from the mean):
$$\sum_{i=1}^{n} (x_i - \bar{x})^{2}$$
The table here shows the first 5 babies. The mean uses the sum
of the first column and the variance the sum of the second.

Birthweight (x)    (x_i − x̄)²
6.9                0.16
6.6                0.47
5.3                3.91
8.5                1.33
7.9                0.41

Calculate the sample mean and standard deviation for the 81
babies using the sums below.
$$\sum_{i=1}^{81} x_i = 594.89 \qquad \sum_{i=1}^{81} (x_i - \bar{x})^{2} = 158.97$$
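The same summary statistics can be computed with Python's `statistics` module. The sketch below uses only the five birthweights shown in the table, so its results differ from those for all 81 babies. It also assumes the sample variance divides the sum of squared deviations by n − 1, which is what `statistics.variance` does; the divisor is not legible in this part of the notes.

```python
import statistics

# The first five birthweights (lbs) from the table; the full data set has 81 babies.
first_five = [6.9, 6.6, 5.3, 8.5, 7.9]

mean = statistics.mean(first_five)

# Sum of squared deviations from the mean, as in the second column of the table
# (the table's deviations are measured from the mean of all 81 babies instead).
sum_sq_dev = sum((x - mean) ** 2 for x in first_five)

variance = statistics.variance(first_five)  # assumes division by n - 1
sd = statistics.stdev(first_five)           # square root of that variance

print(round(mean, 2), round(sum_sq_dev, 3), round(variance, 3), round(sd, 3))
```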
Median and quartiles
The median and quartiles divide ordered data into 4 equal parts which are labelled in the boxplot
below. 25% of values are below the lower quartile and 25% are above the upper quartile.
The interquartile range is the middle 50% (the box) and can be written as (lower quartile, upper
quartile) or as the absolute difference upper – lower.
Write down the median and interquartile range from the data below
Histograms
Histograms are frequency distributions with frequencies of grouped data in bars. They are usually
used to check the spread of the data and check the type of distribution.
[Figures: histogram of approximately normally distributed data; histogram of skewed data]
Can we use the sample data we have to generalise about the wider population of babies?
Inferential statistics
In statistics we usually use sample data to estimate parameters of a wider population.
We call this inferential statistics as we are inferring something about the population.
Different notation is used to represent sample statistics and population parameters.
4.2 Normal probability distribution
VIDEO: Introduction to the normal distribution
The babies can be summarised in histograms using frequencies, proportions or percentages and
used to estimate values in the population.
Frequency distribution
How many babies weigh between 9 and 10 lbs?
Are the data normally distributed?
Distribution of percentages
What percentage of babies weigh 9 - 10 lbs?
What percentage of babies weigh more than 9 lbs?
The population distribution can be represented by a smooth normal curve estimated using the
sample mean and standard deviation from which probabilities can be calculated.
Probability density curve
Does the curve fit the data well?
Probability density curve
If X is the random variable birthweight, what probability is represented in the blue shaded area?
Normal probability density function
Note: In statistics you will notice the use of lowercase and capital letters. Capitals are for the
random variable in general and lowercase letters indicate specific values of the random variable.
The most widely used continuous probability distribution is the normal distribution. It was
used by the German mathematician Carl Friedrich Gauss (1777-1855) to describe measurement errors,
which is why it is also known as the Gaussian distribution, and it was given the name normal by the
American logician Charles Peirce in 1873. The exact shape is controlled by the mean μ and the standard deviation σ.
Formally, if a random variable X has a normal distribution, in shorthand
X ∼ N(μ, σ²), then the normal probability density function f(x) is a function of
the form
$$f(x) = f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}, \qquad -\infty < x < \infty,\ -\infty < \mu < \infty,\ \sigma > 0$$
To sketch the probability density curve below, values of f(x) were calculated for specific x values.
Excel has a built in formula for calculating the p.d.f:
=NORM.DIST(x, μ, σ, FALSE)
Note: The curve never actually touches the x axis as the density is never 0, just really small!
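For readers working outside Excel, the density formula can be coded directly and compared with Python's built-in `statistics.NormalDist`. This is a sketch under the assumption of the birthweight parameters used later in the chapter (mean 7.3, standard deviation 1.42); the function name `normal_pdf` is illustrative.

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    coefficient = 1 / (sigma * math.sqrt(2 * math.pi))
    exponent = -((x - mu) ** 2) / (2 * sigma**2)
    return coefficient * math.exp(exponent)


mu, sigma = 7.3, 1.42  # birthweight parameters used later in the chapter

# Heights of the curve at the mean and at two points equally far either side of it.
values = {x: normal_pdf(x, mu, sigma) for x in (4.5, 7.3, 10.1)}
for x, height in values.items():
    # The hand-coded formula and the library value should agree.
    print(x, round(height, 4), round(NormalDist(mu, sigma).pdf(x), 4))
```

The curve is highest at the mean and symmetric about it, so the heights at 4.5 and 10.1 match.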
The curve is controlled by the mean and the standard deviation. The following curves show how
the curves change as the parameters (mean and SD) change. To see the impact changes have use
the Excel sheet:
Write the means and standard deviations for the following density curves (given in lecture)
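[Figure: two plots, each comparing a density curve with μ = 7.3, σ = 1.4 against a second curve whose parameters are to be filled in]
Plot 1: μ = 7.3, σ = 1.4 and μ = ____, σ = ____
Plot 2: μ = 7.3, σ = 1.4 and μ = ____, σ = ____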
Cumulative distribution function (cdf)
For continuous random variables, probabilities are calculated by integrating
the probability density function. The cumulative distribution function
(cdf) gives less than probabilities and is represented by
$$F(x) = P(X < x) = \int_{-\infty}^{x} f(t)\,dt$$
To calculate the cumulative probabilities for a given distribution in Excel, the same formula is used
as the p.d.f but FALSE is replaced with TRUE
b) P(X > 6)
Probability tables
Excel can calculate the probability of any normally distributed variable given the mean and
standard deviation, but in exams you will need to use probability tables.
X is the random variable ‘birthweight’, which is normally distributed:
X ∼ N(7.3, 1.42²). The table shows cumulative less than probabilities
$$F(x) = P(X < x) = \int_{-\infty}^{x} f(t)\,dt$$
Rather than the cumulative curve, it is common to shade the parts of the p.d.f. required by the
question of interest instead.
Examples
P( X< 9)
To calculate the probability of a random variable X lying between a and b (where b > a):
$$P(a < X < b) = \int_{a}^{b} f(x)\,dx = F(b) - F(a)$$
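The rule F(b) − F(a) translates directly into code. The sketch below uses `statistics.NormalDist` from the Python standard library with the birthweight parameters (mean 7.3, standard deviation 1.42); the helper name `prob_between` is an illustrative assumption.

```python
from statistics import NormalDist

# Birthweight model from the notes: X ~ N(7.3, 1.42^2).
birthweight = NormalDist(mu=7.3, sigma=1.42)


def prob_between(dist: NormalDist, a: float, b: float) -> float:
    """P(a < X < b) = F(b) - F(a) for a continuous random variable."""
    return dist.cdf(b) - dist.cdf(a)


p = prob_between(birthweight, 4.5, 10.1)
print(round(p, 2))  # 0.95
```

Greater-than probabilities follow the same pattern, for example `1 - birthweight.cdf(6)` for P(X > 6).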
For the following examples, shade the appropriate section of the pdf, write the question
mathematically and use the table above to calculate the required probabilities. Also write the
command needed to calculate the probability in Excel.
P(X < 5.2) =
What is the probability of a baby weighing between 4.5 and 10.1 lbs to 2 d.p.?
a = ?????
4.3 Standard normal (Z) distribution
VIDEO: The Z distribution
Whilst it is now very straightforward to calculate the probabilities for any normal distribution using
a computer package, probability tables were used in the past. As having tables readily available
for every combination of mean and standard deviation was not possible, a single special distribution
with a mean of 0 and standard deviation of 1, called the Z distribution, was tabulated. Other distributions
were standardised to fit this distribution. Z ∼ N(0, 1)
Probability density function, Z ∼ N(0, 1):
$$f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^{2}}{2}}$$
Cumulative distribution function:
$$\Phi(z) = \int_{-\infty}^{z} f(t)\,dt = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{t^{2}}{2}}\,dt$$
Standard normal (Z) distribution table
Z 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389
1 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767
2 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986
3 0.9987 0.9987 0.9987 0.9988 0.9988 0.9989 0.9989 0.9989 0.9990 0.9990
3.1 0.9990 0.9991 0.9991 0.9991 0.9992 0.9992 0.9992 0.9992 0.9993 0.9993
3.2 0.9993 0.9993 0.9994 0.9994 0.9994 0.9994 0.9994 0.9995 0.9995 0.9995
3.3 0.9995 0.9995 0.9995 0.9996 0.9996 0.9996 0.9996 0.9996 0.9996 0.9997
3.4 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9997 0.9998
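Entries in the table above can be reproduced with the error function, using the standard identity Φ(z) = ½(1 + erf(z/√2)). The sketch below checks a few entries; the function name `phi` is an illustrative choice.

```python
import math


def phi(z: float) -> float:
    """Standard normal cdf via the error function: 0.5 * (1 + erf(z / sqrt(2)))."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# A few (z, table value) pairs read from the Z table above.
checks = {0.0: 0.5000, 1.0: 0.8413, 1.96: 0.9750, 2.58: 0.9951}
for z, table_value in checks.items():
    print(z, round(phi(z), 4), table_value)

# Symmetry gives probabilities for negative z, which the table does not list.
print(round(phi(-1.0), 4), round(1 - phi(1.0), 4))
```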
4.4 Standardising
In situations where only the standard normal table is available (e.g. exam), values from other
distributions need to be standardised.
Standardising other distributions to create Z scores
X ∼ N(μ, σ²) → Z ∼ N(0, 1)
$$Z = \frac{X - \mu}{\sigma}$$
How would the formula differ if you had sample statistics instead of population parameters?
Example: For the birthweight data which had a mean of 7.3 and standard deviation of 1.42, we
need to convert the original X distribution to a Z distribution:
X ∼ N(7.3, 1.42²) → Z ∼ N(0, 1),
$$Z = \frac{X - 7.3}{1.42}$$
A baby is born weighing 9.9 lbs so the standardised score would be calculated as:
$$Z = \frac{9.9 - 7.3}{1.42} = 1.83$$
If you wish to calculate the probability of having a baby of 9.9 lbs or less, use the Z table
$$P(X < 9.9) = P\left(Z < \frac{9.9 - 7.3}{1.42}\right) = P(Z < 1.83) = \Phi(1.83) =$$
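The standardising step and the table lookup can be mirrored in a few lines of Python. This sketch rounds the z score to 2 decimal places, as you would before using the printed table, and then evaluates Φ with `statistics.NormalDist`; the function name `z_score` is illustrative.

```python
from statistics import NormalDist


def z_score(x: float, mu: float, sigma: float) -> float:
    """Standardise x: number of standard deviations x lies from the mean."""
    return (x - mu) / sigma


# Round to 2 d.p. because the Z table is indexed to two decimal places.
z = round(z_score(9.9, 7.3, 1.42), 2)
p = NormalDist().cdf(z)  # standard normal cdf, i.e. the quantity tabulated in the Z table

print(z, round(p, 4))  # compare with the row 1.8, column 0.03 entry of the Z table
```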
Special z scores exist which are used a lot in statistics. These z
scores are often used to describe middle (or normal) ranges
such as the middle 95% or 90%.
Example: Find the Z scores between which the middle 90% lie.
[Figure: standard normal curve with the middle 90% and the top 5% marked]
= P(Z < ____ )
b) P( X> 8)
c) P(8< X <9.9)
Use the Z table to find the value a below which 97.5% of values lie
and the value below which 2.5% lie. Hence state the range
within which 95% of values lie.
P(Z < a) = 0.975 (the top 2.5% of values lie above a)
Find the values between which the middle 68% of values lie.
How would you calculate the middle 95% of birthweights i.e. the
X values given the Z scores? Hint: Reverse the standardisation
Actual values
Can you think of a general formula for calculating the middle 95% for any mean μ and SD σ ?
Can you think of a general formula for any Z score, mean μ and SD σ ?
Critical values
Critical values are set values from the Z distribution between which a set percentage of values lie.
We have calculated a few of these in previous examples so fill in the missing values.
Critical values summary
Middle %    Z scores
68%
90%         ± 1.645
95%
99%         ± 2.58
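Critical values can also be obtained from the inverse of the standard normal cdf. In this sketch, the middle proportion m leaves (1 − m)/2 in each tail, so the upper critical value is the z with P(Z < z) = 0.5 + m/2; the function name `critical_z` is an assumption made for illustration.

```python
from statistics import NormalDist


def critical_z(middle: float) -> float:
    """Positive z such that the given middle proportion lies between -z and +z."""
    # The upper tail holds (1 - middle) / 2, so the cdf at z is 0.5 + middle / 2.
    return NormalDist().inv_cdf(0.5 + middle / 2)


print(round(critical_z(0.90), 3))  # 1.645
print(round(critical_z(0.99), 2))  # 2.58
```

The same function can be used to check the values you fill in for the remaining rows of the table.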
Ranges for individuals
You will come across many types of ranges in statistics so make sure you check what exactly is
being reported and whether the range refers to ranges for individuals or parameters.
Interquartile ranges contain the middle 50% of values for sample data but many other ranges are
used to make inferences about the population.
Measurements for people are often divided into percentiles e.g. ‘child’s height is in the bottom
20% for their age’. These can be used to create 'Normal ranges' which are based on sample data
but used to represent individuals in the population. They are also known as 'Reference ranges'.
Next week we will look at 'Confidence intervals' which are used to give a range of values for a
population parameter e.g. mean rather than a range of values within which an individual is
expected to lie.
There are properties of normal distributions which enable us to calculate ranges within which set
proportions of subjects lie. Values can be discussed as a number of standard deviations from the
mean and there are particular 'critical values' between which set percentages lie.
Lower limit =
Actual values
Upper limit =
In statistics, 'most' subjects is considered to be the middle 95%. For normally distributed data,
95% of subjects lie within 1.96 standard deviations of the mean.
Limits are estimated as:
mean ± (1.96 × SD)
7.3 ± (1.96 × 1.42) = (4.5, 10.1)
95% of individual babies are expected to be between 4.5 lbs and 10.1 lbs.
[Figure: normal curve with axes "No. of SDs from mean" and "Actual values", marking the limits 4.5 and 10.1]
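The limits above follow the pattern mean ± z × SD, which reverses the standardisation. A minimal sketch, with the function name `normal_range` chosen for illustration and the default z = 1.96 for the middle 95%:

```python
def normal_range(mean: float, sd: float, z: float = 1.96) -> tuple:
    """Limits mean - z*sd and mean + z*sd; z = 1.96 gives the middle 95%."""
    half_width = z * sd
    return mean - half_width, mean + half_width


lower, upper = normal_range(7.3, 1.42)
print(round(lower, 1), round(upper, 1))  # 4.5 10.1
```

Passing a different critical value, such as `z=1.645`, gives the limits for another middle percentage.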
Fill in the general normal range equations and actual numbers for the birthweight example in
the table below.
Range                        General equation        Birthweight values
90%
99%
50% (interquartile range)
4.5 CAST material
The images in this chapter were created using the interactive online statistics ebook CAST. It uses
Java script to run interactive demonstrations of different aspects of statistics and probability which
can aid your learning. The University computers may not allow you to run Java script so go to the
link at home and use Internet Explorer rather than Firefox:
http://cast.massey.ac.nz/core/index.html?book=generalx
Go to the Sampling and Variability section and choose ‘Probability and probability density’ and
investigate some of the options to help you understand the normal distribution and probability.
4.6 Summary of Excel commands for the normal distribution
For cumulative less than calculations using the standard normal distribution, Z ∼ N(0, 1), P(Z < z):
=NORM.S.DIST(z, TRUE)
You can also calculate z scores for a given probability α, P(Z < z) = α:
=NORM.S.INV(α)
Example: If P(Z < z) = 0.05, use the following to calculate z = -1.645
=NORM.S.INV(0.05)
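For readers who want to check these values outside Excel, the sketch below mirrors the two standard normal commands using Python's standard library class `statistics.NormalDist`. Python is an assumption here, as the booklet itself only uses Excel, and the value z = 1.96 is chosen purely for illustration.

```python
from statistics import NormalDist

# Standard normal distribution, Z ~ N(0, 1).
Z = NormalDist(mu=0, sigma=1)

p = Z.cdf(1.96)       # P(Z < 1.96), like =NORM.S.DIST(1.96, TRUE)
z = Z.inv_cdf(0.05)   # z with P(Z < z) = 0.05, like =NORM.S.INV(0.05)

print(round(p, 4))    # 0.975
print(round(z, 3))    # -1.645
```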
To calculate less than cumulative probabilities P(X < x) for distributions with any mean and
standard deviation, X ~ N(μ, σ²), use the command:
=NORM.DIST(x, μ, σ, TRUE)
To find the x value for a given probability α, P(X < x) = α, use the command:
=NORM.INV(α, μ, σ)
Example: To find the x value beneath which 2.5% of values lie for the birthweight data,
P(X < x) = 0.025:
=NORM.INV(0.025, 7.3, 1.42)
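The birthweight example can be sketched in Python as a cross-check, again assuming `statistics.NormalDist` from the standard library rather than Excel. Like Excel's NORM.DIST and NORM.INV, `NormalDist` takes the standard deviation σ, not the variance σ².

```python
from statistics import NormalDist

# Birthweight in lbs, X ~ N(7.3, 1.42^2). The second argument is the SD.
birthweight = NormalDist(mu=7.3, sigma=1.42)

# x value beneath which 2.5% of values lie, like =NORM.INV(0.025, 7.3, 1.42).
x = birthweight.inv_cdf(0.025)

# Going back the other way, like =NORM.DIST(x, 7.3, 1.42, TRUE).
p = birthweight.cdf(x)

print(round(x, 2), round(p, 3))  # 4.52 0.025
```

The x value found here is the lower limit of the 95% normal range for birthweight.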
To generate a random whole number between two values, for example 0 and 15:
=RANDBETWEEN(0,15)
To generate random numbers from a normally distributed variable, nest the RAND command with
the inverse normal command
=NORM.INV( RAND(), μ , σ )
Example: To generate a random number from a population with a mean birthweight of 7.3 and
standard deviation of 1.42:
=NORM.INV(RAND(), 7.3, 1.42)
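The same idea, feeding a uniform random number into the inverse normal function, can be sketched in Python. The helper name `random_birthweight` and the seed value are arbitrary choices for this illustration. The fixed seed makes the sample reproducible, which Excel's RAND() does not offer.

```python
import random
from statistics import NormalDist

birthweight = NormalDist(mu=7.3, sigma=1.42)
rng = random.Random(2021)  # fixed seed (arbitrary) so the same sample is drawn each run

def random_birthweight():
    """Draw one birthweight by pushing a uniform number through the inverse normal."""
    u = rng.random()        # uniform on [0, 1), like RAND()
    while u == 0.0:         # inv_cdf is undefined at 0, so redraw in that rare case
        u = rng.random()
    return birthweight.inv_cdf(u)   # like NORM.INV(RAND(), 7.3, 1.42)

# Ten random birthweights, like dragging the Excel formula down ten rows.
sample = [random_birthweight() for _ in range(10)]
print([round(w, 2) for w in sample])
```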
You can then drag the formula down to create a set of random numbers.
Important note: Pressing F9 regenerates the numbers, and so does any other calculation carried out
on the sheet. If you wish to keep a sample, use Copy then Paste Values somewhere else on the
sheet.
Tutorial 4: Normal distribution
The online quiz ‘Normal distribution’ should be completed BEFORE the tutorial.
Q1) Find the following probabilities:
a) P(Z <1.26)
b) P(Z >1.26)
c) P(0< Z <1.26)
d) P(Z < −1.26)
e) P(−1.96< Z <1.96)
Q2) The IQs of 200 people were collected, giving a mean of 100 and a standard deviation of 15.3.
The following questions relate to this distribution. To calculate probabilities, standardise and
use the Z distribution. Sketch normal curves and the required probabilities for each question.
X = random variable IQ, X ~ N(100, 15.3²)
a) Does the normal curve fit the distribution well?
b) What is the probability that a randomly selected person has
an IQ of less than 105?
c) What is the probability that a randomly selected person has
an IQ of more than 105?
d) What is the probability that a randomly selected person has
an IQ of less than 95?
e) To qualify for Mensa, an individual must have an IQ of at least 131. What percentage of the
population is eligible to join?
a) Estimate the limits between which 95% of people are expected to lie.
b) Estimate the limits between which 99% of people are expected to lie.
c) Estimate the median, quartiles and interquartile range for IQ.
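The 'standardise and use the Z distribution' step can be sketched in Python as follows. To avoid giving away the tutorial answers, the sketch uses the birthweight distribution from the notes (mean 7.3, SD 1.42) with a hypothetical value of 9 lbs. The function name `standardise` is made up for this illustration.

```python
from statistics import NormalDist

def standardise(x, mean, sd):
    """Convert a raw value x into a z score: the number of SDs x lies from the mean."""
    return (x - mean) / sd

# Hypothetical question: what is the probability a baby weighs less than 9 lbs?
z = standardise(9.0, 7.3, 1.42)

p_less = NormalDist().cdf(z)   # P(X < 9) = P(Z < z), read from the standard normal
p_more = 1 - p_less            # P(X > 9), by the complement rule

print(round(z, 2))             # 1.2
```

Standardising first and then using Z ~ N(0, 1) gives the same probability as working with X ~ N(7.3, 1.42²) directly, which is why a single table of standard normal probabilities is enough.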
Q3) Machines in a factory automatically fill chocolate bars with an average of 100g of chocolate,
with a standard deviation of 5g. Let X be the random variable ‘Weight of one chocolate bar’.
a) What is the probability that a randomly selected chocolate bar weighs less than 105g?
b) What is the probability that a randomly selected chocolate bar weighs more than 105g?
c) Between which two weights (a and b) should 95% of chocolate bars fall if the machine is
functioning correctly?
d) If more than 110g of chocolate is used the wrapper is not big enough and the bar is
rejected. What is the probability that a randomly selected chocolate bar weighs more
than 110g?
e) If a batch contains 200 chocolate bars, how many do you expect to be rejected?
A random sample of 5 bars of chocolate is taken from the machine with the following weights.
Calculate the mean and standard deviation of the bars. How likely do you think it is that the
machine is under or over filling based on this sample?
Change the 4th observation from 108 to 98 and write down the new mean and standard deviation.
What can you conclude about the impact of outliers on small samples?

i      x
1      101
2      104
3      103
4      108
5      99
sum
The Excel sheet ‘Chocolate’ randomly generates bars of chocolate from a population with mean
100 and SD = 5. Press F9 (or Fn + F9 on some laptops) to randomly generate more numbers. Record
8 more randomly generated means in column K and calculate the mean and SD of the 10 sample means.
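If the Excel sheet is not to hand, the same experiment can be sketched in Python. This is an illustration under stated assumptions, not the 'Chocolate' sheet itself: it assumes samples of 5 bars (as in the sample above), a normal population with mean 100 and SD 5, and an arbitrary fixed seed. The function name `sample_mean` is made up for this example.

```python
import random
from statistics import mean, stdev

rng = random.Random(1)  # fixed seed (arbitrary) so the run can be repeated

def sample_mean(n=5, mu=100, sd=5):
    """Generate n random bar weights from N(mu, sd^2) and return their mean."""
    weights = [rng.gauss(mu, sd) for _ in range(n)]
    return mean(weights)

# Ten sample means, like recording ten presses of F9 in column K.
sample_means = [sample_mean() for _ in range(10)]

print("Mean of the sample means:", round(mean(sample_means), 2))
print("SD of the sample means:  ", round(stdev(sample_means), 2))
```

Comparing the SD of the sample means with the population SD of 5 shows how much less sample means vary than individual bars.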
Q4) Complete the following summary table.
You may wish to calculate the cdf/ expectation/ variance of the random variables for which this was not
demonstrated in lectures. Alternatively, you can find them in a book/ on the internet etc.
Distribution         Discrete or Continuous?   Parameters   Range   pmf/pdf   cdf
Bernoulli                                                                     NA
Geometric
Binomial                                                                      NA
Continuous Uniform
Exponential
Normal               Continuous                μ, σ²        ℝ
Probability and statistics statistical tables