Data Study Guide 2

Unless otherwise stated, copyright © 2022 The Open University, all rights reserved.
Printable page generated Thursday, 7 Apr 2022, 08:42
Objectives
Having studied this section you should be able to:
calculate probabilities based on simple assumptions
identify dependent and independent probabilities
calculate conditional probabilities
identify binomial distributions
use the binomial distribution to predict probabilities
understand where and how to apply the Poisson distribution
show how the binomial distribution relates to the normal distribution
use the normal distribution to measure behaviour of populations.
Introduction
Statistics are used widely in engineering: everything from measuring component tolerances in a production
process to designing automobile interiors requires a grasp of statistical methods. In this section we will look at
probability and then we will see how this leads to methods of interpreting and classifying data. Understanding
probability and statistics is vital in many branches of engineering, particularly in production and design. We
look to automobile design and component production for our examples, but the underlying principles could
apply to any activity where large numbers of measurements are involved.
1 People, automobiles and statistics

Automobile manufacturers typically want to sell as many vehicles as they can make. This means that they
need to appeal to the widest possible market. In this section we are going to examine the way manufacturers
might use statistics and probabilities to influence some aspects of vehicle design and to ensure they meet
market requirements.
There are many considerations which feed into automotive design but one of the key drivers (sic) has to be
the customer’s physique.
Figure 1: Both a, b) small and c) large vehicles present access and drivability problems.
Obviously, the driver has to be able to access and fit into the vehicle (see Figure 1). This can be a problem for very
large or very small people. In fact, even people who are only a little taller than average may struggle in the
case of some sports cars (see Figure 2) where vehicle height might be restricted by styling or performance
requirements.
Figure 2: Styling and performance requirements may be more important than space.
At the other end of the scale, access is a major consideration in the larger 4x4 vehicles where stepping up into
the cab can be an issue for some people due to height, disability or infirmity. In fact many 4x4 vehicles are
designed to lower themselves to facilitate access. Once inside, the driver must be able to reach and operate
the controls: steering wheel, pedals and switches. Also, the vehicle safety systems must be appropriate to the
occupants. This influences things like seatbelt locations and airbag deployment, which could themselves cause
injury if there is a poor match between design and occupants’ build.
It’s clear that people’s physique will be a key influence in selecting a car. So manufacturers need to
understand how factors such as height, leg length, arm length, weight and even physical strength influence
sales, and this requires understanding how such metrics are distributed. If I’m designing a vehicle’s interior I
should like to know how much of the market I’m likely to exclude when I take decisions which might exclude
large or small drivers. I could choose to build the interior around some sort of mean value but how much
adjustment do I need to incorporate to accommodate, say, 95% of the market? To understand this I need to
engage with the branch of applied science called anthropometrics (or anthropometry), which is the study of
the measurement of size and proportions of the human body. As you are aware from Block 1, measuring
people is far from simple.
Activity 1
Measure the length of your forearm. This is the distance between your elbow and your wrist.
Discussion
This is far from simple. You’d maybe like to measure the length between the axis around which your
elbow hinges and a similar axis for your wrist, or simply from the back of your elbow with a bent arm to
a bone in your wrist. But, either way, these are not well defined and the proportion of error is quite
significant. You met a similar problem when measuring a towel in Block 1. All we are doing here is
identifying the need for consistency in what we do.
Here is a link to Adultdata, the handbook of adult anthropometric and strength measurement.
On pages 21 to 28 you will see the many different dimensions which are used to characterise human beings. If
you look at page 30 titled ‘Stature’, you will see a table identifying:
the country from which the data was obtained
the sex of the participants from whom the data were obtained
four columns of figures describing the mean, standard deviation and the 5th and 95th percentiles – you
will see later in this section what a percentile is and how the 5th and 95th percentiles are calculated
the source of the survey – this turns out to be vital since some of the data is taken from quite narrow
sections of the population.
The main part of the handbook, which is section 7, uses this format to list data for all the dimensions shown in
section 6.
If you look at section 5 you will see the data do not necessarily describe the same populations. For example,
the age ranges vary and in some cases were never recorded. Clearly a person’s racial origin is an issue when
evaluating anthropometric data; you can see this by comparing say Dutch statures to Sri Lankan. Gender is
another obvious factor, but there are many other less apparent factors. For example, in many countries the
population is getting taller over time. This means that much of the data in this handbook is likely to be out of
date. Also, this only covers adults; things get a lot more complicated when dealing with children, where the
rates of growth and growth proportions vary widely.
Activity 2
Use the Adultdata handbook to find a definition of how to carry out the forearm measurement you
attempted in Activity 1.
Discussion
This is given on page 123 as ‘back of elbow to wrist crease’. Although it’s not perfect it does provide a
means to be consistent.
Now think of yourself sitting in the driver’s seat of a car and identify three factors which might influence
your ability to drive the vehicle.
Discussion
The three I got were:
1. distance to pedals
2. weight of steering (force required to turn the wheel) at low speed
3. mechanical handbrake lever location.
Your three will probably be different because there are so many possibilities but this exercise is purely to
help you think about the possibilities. Item 1 relates to height, item 2 to strength and item 3 possibly to a combination
of the two.
Given the three points I have suggested, how might a designer alter the vehicle in order to
accommodate a wider range of physiques?
Answer
1. Distance to pedals. This one seems quite straightforward. I simply have to ensure the driver’s seat
will adjust over an appropriate range, but what does that do to the driver’s position in relation to the
steering wheel?
2. Weight of steering at low speed. Again seat position can be a factor but this is usually overcome by
either altering the gearing or adding some form of power assistance.
3. Handbrake lever location. This one might not have been a problem, except that I solved the first
problem by making the driver’s seat adjustable. There are all sorts of solutions possible here: fix
the seat so that it doesn’t move, put a long lever on the handbrake, find some compromise, or maybe
rethink the handbrake design and location entirely.
Here is a recording of an interview with Dr Paul Herriotts. Dr Herriotts is an ergonomist who designs vehicle
interiors for Jaguar Land Rover. His job is to ensure that the vehicle is designed around the user. He is being
interviewed by me, Dr Tony Nixon, senior lecturer in information systems and author of this block. I hope this
will give you one perspective on how statistics play a role in industry.
Vehicle design requires an understanding, not only of how to measure physique, but also how to handle data
which is distributed – one size will not fit all. The content of the Adultdata handbook provides information
which is useful but needs some interpretation. In particular, most of the measurements are expressed in terms
of things called ‘percentiles’ and their use is based on some knowledge of statistics.
The sort of question we would like to be able to address is:
‘What range of physiques must we accommodate in order to be sure that 95% of the population can
drive the vehicle comfortably?’
In doing this you will see that we can also address other questions which are of interest to engineers, such as:
‘Given a batch of components, some of which are faulty, how many do I need to sample to ensure
the number of faulty components is below a given threshold?’
For example, if I am manufacturing rivets and I can tolerate 0.3% being over size, how many from a batch of
1,000 do I need to measure to be 95% sure that I know when I fail to meet this condition? This sort of question
arises frequently in manufacturing and the mathematics required to answer it is the same as that required by
the car designer.
To answer these questions, you need to understand a little about how to deal with probabilities and stochastic
(or random) processes.
Activity 3
Before proceeding, measure your sitting height as shown in the Adultdata handbook and in Figure 3
below.
Figure 3: a) Measuring sitting height, and b) using a caliper to take a wrist-to-elbow measurement.
Make an entry in your learning log, to summarise how you measured your sitting height and to make a
record of your measurement. You will use this information along with the data entered by other students
to perform calculations for TMA 02.
Before going on to the next section, here’s another interview with myself, Dr Herriotts, and his colleague
Louise Malcolm, who is an ergonomist. I hope this will give you a little insight into how anthropometrics are
interpreted in automotive design.
2 What is probability?
What do I mean when I talk about the probability of an event? I might say something like, ‘It’s probably going
to rain,’ and you’d rightly take this to mean that it’s likely to rain, but it might not. If we’re going to work with
probability as engineers, we need a definition which is more precise. We need to turn it into a number; then we
can do useful things like make comparisons and carry out calculations.
Returning to the probability of rain you might have seen weather forecasts with expressions like ‘20% chance
of rain’. Percentages fall between 0 and 100 and we’d understand a 100% chance of rain to mean it’s
definitely going to rain and a 0% chance of rain to mean it won’t rain. Percentages are an alternative way of
representing a value between 0 and 1 and I’ll use this value as a definition of probability. A probability of 0
means it will never occur and a probability of 1 means it will definitely happen. So, a 20% chance of rain would
mean there’s a probability of 1/5, (20/100), or 0.2 that it will rain. Returning to our car design problem, we
might be interested to know the proportion of the adult population that are less than 1.3m in height.
Activity 4
You have bought a raffle ticket. In total the organisers sold 800 tickets. What’s the probability that your
ticket is the first out of the hat?
Answer
1/800. Each of the 800 tickets is equally likely to be drawn first.
What is the probability that the first ticket out of the hat is not your ticket?
Answer
1 – 1/800 = 799/800.
Given a 20% chance that it will rain, what is the probability that it will not?
Answer
1 – 0.2 = 0.8, that is, an 80% chance that it will not rain.
2.1 Probability using dice

Let’s take a six-sided die and look at how our definition of probability might work where there are six possible
outcomes. If the die is fair, then there are six equally probable results for any throw: 1, 2, 3, 4, 5 or 6. So, I can
say that the probability of throwing any given number is 1/6. If I ask what the probability is of throwing a 2 it is
1/6. Be careful: this doesn’t mean that if I throw the die six times I’m bound to throw a 2. It’s a method of
guessing what the throw might produce. If I throw the die enough times I’d expect about 1/6 of the throws to
produce a 2.
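The idea that about 1/6 of a large number of throws should produce a 2 can be explored with a short simulation. This is only a sketch: the function name fraction_of_twos and the fixed seed are illustrative choices, not part of the course material, and a pseudo-random generator stands in for a real die.

```python
import random


def fraction_of_twos(n_throws, seed=0):
    """Throw a simulated fair six-sided die n_throws times.

    Returns the fraction of throws that showed a 2.
    The seed is fixed (an arbitrary choice) so that runs are repeatable.
    """
    rng = random.Random(seed)
    twos = sum(1 for _ in range(n_throws) if rng.randint(1, 6) == 2)
    return twos / n_throws


# With only six throws the fraction can be far from 1/6;
# with many throws it should settle close to 1/6.
for n in (6, 600, 60000):
    print(n, round(fraction_of_twos(n), 4))
print("expected:", round(1 / 6, 4))
```

For a small number of throws the observed fraction can differ a lot from 1/6, which is the point made above: the probability describes what to expect over many throws, not what must happen in six.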
Activity 5
What is the probability of throwing a number that is not 3?
Answer
1 – ‘the probability of throwing a 3’ = 1 – 1/6 = 5/6
What is the probability of throwing a number greater than 3?
Answer
The possibilities are a 4, 5 or 6. Each of these has a probability of 1/6, so adding the three probabilities
1/6 + 1/6 + 1/6 = 3/6 = 1/2
I hope that it’s becoming clear that we can add individual probabilities in order to obtain a net probability for a
given event. This is one way in which probabilities combine, but as you’ll see shortly you have to be very
careful when combining probabilities.
It might seem that we haven’t achieved very much by attributing probabilities to a die in this way. We knew
that the die was equally likely to give each of the numbers 1 to 6 and attributing a probability to it has not really
given us much insight.
Let’s go another step and throw two dice. Suppose that I’m interested only in the sum and not the combination
that produced it.
Activity 6
Given two dice, how many ways are there of throwing a total of 7?
Answer
Six
1+6, 2+5, 3+4, 4+3, 5+2, 6+1
So what is the probability of throwing a 7, given there are six ways of doing it? I can show that there are
36 possible combinations of two dice. You could work it out this way: I throw a 1 with the first die then
there are six possibilities for the second. Similarly, when I throw a 2 with the first die there are again six
possibilities for the second and so on (see Figure 4).
Figure 4: For each value of the first die there are six possibilities for the second.
So there are 36 possibilities in total. From these there are six combinations which total 7 (1-6, 2-5, 3-4,
4-3, 5-2 and 6-1) so the probability of throwing a 7 is 6/36 = 1/6.
Using two dice how many ways are there of throwing a total of 12 and what’s the probability of doing
this?
Answer
Only one. A 6 and a 6. So the probability is 1/36.
Figure 5: Rolling dice.
Now we can begin to make some predictions. Suppose I ask the question, if I roll two dice what are the
possible outcomes and what are their probabilities? Table 1 below shows the probability of all possible
sums of two dice. See if you can fill in the blanks. For a downloadable version of the table to fill in and
print, click here.
Now complete the table:
Table 1: The probability of all possible sums of two dice.
Total                   2     3     4     5     6     7     8     9     10    11    12
Number of ways          1     2     3     ....  ....  6     5     ....  3     ....  1
Probability             1/36  2/36  3/36  ....  ....  6/36  5/36  ....  3/36  ....  1/36
Probability simplified  1/36  1/18  1/12  ....  ....  1/6   5/36  ....  1/12  ....  1/36
Answer
Table 2: The completed table showing the probability of all possible sums of two dice.
Total                   2     3     4     5     6     7     8     9     10    11    12
Number of ways          1     2     3     4     5     6     5     4     3     2     1
Probability             1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
Probability simplified  1/36  1/18  1/12  1/9   5/36  1/6   5/36  1/9   1/12  1/18  1/36
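The counts in the completed table can be checked by listing all 36 combinations of two dice and tallying the totals. The following is a minimal sketch; the variable names ways and probability are illustrative only.

```python
from collections import Counter
from fractions import Fraction

# Count how many of the 36 equally likely (first die, second die) pairs
# give each total from 2 to 12.
ways = Counter(a + b for a in range(1, 7) for b in range(1, 7))

# Each pair has probability 1/36, so P(total) = number of ways / 36.
# Fraction simplifies automatically, e.g. 6/36 becomes 1/6.
probability = {total: Fraction(n, 36) for total, n in ways.items()}

for total in sorted(ways):
    print(total, ways[total], probability[total])

print("sum of probabilities:", sum(probability.values()))
```

Using exact fractions rather than decimals keeps the simplified values identical to those in the table and avoids rounding error when the probabilities are summed.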
You can see from the answer to Activity 6 that I’m more likely to throw a total of 7 than any other total.
The skill in playing games like Backgammon, where two dice are used, relies on this knowledge and we will
make use of this idea in the next section. The important point here is that in summing the two dice I’m asking
a question about particular results of a random process. The two dice are producing random results and there
are 36 possible combinations each with equal probabilities, 1/36. When we consider the sum of any throw of
the two dice, there are only eleven possible outcomes, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 and their probabilities
are far from equal.
Activity 7
What is the total of all of the probabilities in Table 2? You will find it easiest to sum these using the
unsimplified results where the denominators are identical.
Answer
36/36 = 1.
The previous answer is exactly what we would expect. Why?
Answer
Because there are no other possible outcomes.
2.2 Probabilities that are not mutually exclusive

It is not always possible to add probabilities for a particular event or observation. This is only correct when the
probabilities are mutually exclusive. What I mean by mutually exclusive is that each outcome excludes all
others. For example a component might be faulty or not faulty, or it could be above or below a certain
diameter, but it cannot be both. For a single die throw, only one outcome can appear. So, if I ask what the
probability is of throwing a two or a five, I can simply add the respective probabilities: 1/6 + 1/6 = 1/3.
Activity 8
Consider the question: ‘What is the probability of throwing a number greater than two or a number less
than five using a single die?’ I can’t simply add the two – but why not?
Answer
Because they are not mutually exclusive; 3 and 4 are both greater than 2 but also less than 5.
What is the probability of throwing a number greater than 2?
Answer
3, 4, 5 and 6 are all greater than 2 and each has a probability of 1/6. They are mutually exclusive so we
can add them giving 4/6 or 2/3.
What is the probability of throwing a number less than five?
Answer
That’s also 2/3.
But if I add these numbers to try to combine the probabilities I get, 2/3+2/3 = 4/3. This is greater than one and
cannot be a probability. If you look at the question it doesn’t make a lot of sense. It’s two sensible questions
combined. I could ask, if I throw a number less than 5 what’s the probability of it being greater than 2? This
question requires an understanding of something called conditional probability and we will discuss this in the
next section.
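One way to see why the simple sum fails is to treat each outcome as a set of die faces and count faces rather than add probabilities. This is a sketch under the assumption of a fair die; the helper prob and the set names are illustrative.

```python
from fractions import Fraction

faces = {1, 2, 3, 4, 5, 6}
greater_than_2 = {f for f in faces if f > 2}  # {3, 4, 5, 6}
less_than_5 = {f for f in faces if f < 5}     # {1, 2, 3, 4}


def prob(outcomes):
    """Probability of a set of faces on a fair die (uniform sample space)."""
    return Fraction(len(outcomes), len(faces))


# Adding the two probabilities counts the faces 3 and 4 twice.
naive_sum = prob(greater_than_2) + prob(less_than_5)
overlap = greater_than_2 & less_than_5   # faces that satisfy both conditions
either = greater_than_2 | less_than_5    # each face counted once only

print("naive sum:", naive_sum)               # 4/3, not a valid probability
print("overlap:", sorted(overlap))           # [3, 4]
print("greater than 2 or less than 5:", prob(either))  # 1
```

Counting each face once shows that every face of the die is either greater than 2 or less than 5, so the combined probability is 1 rather than 4/3.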
If we’re going to work with probabilities we will need an efficient way of expressing a probability algebraically. I
will use the expression P(A) to mean the probability of an outcome or event, A, occurring. Sometimes the
outcome or event will be a number, so take care not to confuse the outcome A with its probability P. For
example the probability of throwing a 2 using a die, would be written as P(2). So we could say: P(2) = 1/6.
Activity 9
How would I express the probability that the outcome, A, does not occur?
Answer
It will be useful to have a single expression for this too, so I’ll use P(Ā). You should read this as ‘not
P(A)’. So, P(Ā) = 1 – P(A). You will find that this is a very common notation for ‘not’, used in
statistics and logic – particularly in the digital electronics industry. Similarly,
P(A) = 1 – P(Ā)
(Sometimes you will encounter other notation, such as P(A′), to mean P(Ā).)
For our die I might write P(3) = 1/6 as the probability of throwing a 3.
What is the value of P(3̄)?
Answer
P(3̄) = 1 – P(3) = 1 – 1/6 = 5/6
2.3 Sample space

In working with probabilities it is often convenient to think in terms of a space where a complete set of
probabilities exist; this is called a sample space. By ‘complete’ I mean that they sum to equal 1. This could be
a set of all possible outcomes for throwing a die (as in Figure 6b) or choosing a card from a pack of 52 playing
cards. A particular outcome such as throwing a number greater than 3 or choosing a king would be a subset
of all the possible outcomes.
Figure 6 (a) The possible outcomes for a single die. (b) The sample space for a single die.
The outcomes don’t all have to have the same probability. They might be any set of values less than one, but
they must sum to one, as in Figure 6. If the outcomes all have the same probability (as in Figure 6) the sample
space is called a uniform sample space. In Figure 7 we have a non-uniform sample space. You will see the
need to distinguish between uniform and non-uniform sample spaces shortly. In Figure 7 I have outlined a
subset of the probabilities which are consistent with some outcome, P(A).
Figure 7: A non-uniform sample space.
The probability of A is the sum of probabilities. So P(A) = 1/30+1/30+1/9+1/15 = 11/45.
Activity 10
How would I represent not A, (or Ā), in Figure 7?
Answer
Figure 8: Ā is everything outside A.
I could sum these to find P(Ā); this will, of course, be 1 – P(A), which is 1 – 11/45 = 34/45.
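The sum for P(A) and the value of its complement can be checked with exact fractions. This sketch uses only the four values quoted for the outlined subset A in Figure 7; the names inside_A, p_A and p_not_A are illustrative.

```python
from fractions import Fraction

# The four probabilities inside the outline for A, as quoted from Figure 7.
inside_A = [Fraction(1, 30), Fraction(1, 30), Fraction(1, 9), Fraction(1, 15)]

p_A = sum(inside_A)   # probability that A occurs
p_not_A = 1 - p_A     # everything in the sample space outside A

print("P(A) =", p_A)          # 11/45
print("P(not A) =", p_not_A)  # 34/45
```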
We could choose another set of probabilities P(B) from the same space as we selected P(A), as shown in
Figure 9.
Figure 9: Two mutually exclusive probabilities P(A) and P(B).
I can add all the probabilities in A to give P(A) and all in B to give P(B) then the sum P(A)+ P(B) will give the
probability that either A or B occurs. This is possible because P(A) and P(B) are mutually exclusive. However,
referring to Figure 10, I cannot add P(A) and P(C) to give the probability that A or C occurs, because they are
not mutually exclusive.
Figure 10: Probabilities P(A) and P(C) are not mutually exclusive so adding them would include the value 1/15
twice.
Activity 11
Which of the following are mutually exclusive?
a. A coin toss resulting in a head or a tail.
b. Choosing the king of spades or a heart from a pack of cards.
c. Choosing a king or a heart from a pack of cards.
d. Choosing a component which is above or below a 5 mm diameter.
Answer
a. It can’t be both a head and a tail, so, yes, they’re mutually exclusive.
b. It can’t be a spade and a heart, so yes, they’re mutually exclusive.
c. It could be a king of hearts, so no they’re not mutually exclusive.
d. It can’t be both above and below a 5 mm diameter so the probabilities must be mutually
exclusive.
Activity 12
What is the probability of:
a) choosing a king of spades or a four of spades from a pack of cards?
Answer
These are mutually exclusive so it’s simply 1/52+1/52 = 1/26.
b) throwing a number greater than two or a one on a single die?
Answer
The probability of throwing a one is 1/6 and a number greater than two is 2/3 (3,4,5 or 6 each with a
probability of 1/6) 1/6+2/3 = 5/6.
c) throwing a number less than three or a two with a single die?
Answer
It’s 1/3. I hope you spotted that these are not mutually exclusive probabilities. Two is a number less than
three so we can simply discard the probability that it’s a two since it’s already accounted for.
The word ‘or’ is generally used to mean that A or B happens, but this can include the possibility that both A and B
happen. In the case of Activity 11 it is possible to choose a king and a heart. So, you do have to take care
where you see the word ‘or’: you can add the probabilities for an outcome or event but you should first ensure
that there is no possibility of both A and B occurring.
In brief, you should now know:
1. that probability is an estimate of what might happen. If I carry out a process such as throwing a die, the
probability will give me an estimate of what happens for a large number of throws.
2. the probability that A will happen is P(A) = 1 – P(Ā).
3. the sum of all mutually exclusive probabilities of all possible outcomes is always 1.
4. we can sum the mutually exclusive probabilities for a particular outcome to give a net probability.
5. how to construct a simple sample space.
6. we should never attempt to sum the probabilities of outcomes that are not mutually exclusive.
You might like to summarise these in your Learning log.
3 Independent, dependent and conditional probabilities

3.1 Conditional probabilities
Activity 13
Refer back to Figure 9 and consider: what is the probability of A and B occurring?
Answer
Zero. There is no overlap.
Here are two questions with very different answers:
what is the probability of both A and B occurring?
what is the probability of B occurring if A has already occurred?
Figure 11: The probability of A and B occurring. It is the region where A and B overlap.
The probability of A and B occurring is given by the overlap of the probability of A with the probability of B. For
Figure 11 that would be P(A and B) = 1/15.
Our second question was, ‘What is the probability of B occurring if A has already occurred?’ Again we can turn
to our diagram for a solution. This case is a bit more subtle: first we eliminate all the results that are not in A
then we identify, from those that are left, the ones which overlap with B. However there’s a complication here.
When we eliminated everything not within A we are left with a set of probabilities which do not sum to one, as
shown in Figure 12.
Figure 12: If we know A has occurred then we are left with just the subset for A but now the probabilities do
not sum to one.
We need to correct the probabilities in order to make them sum to one. This process is called normalisation.
The sum of all the probabilities is 1/30 + 1/30 + 1/9 + 1/15 = 11/45. (You might prefer to work in decimals; obviously
this won’t alter the result, although occasionally you will get small rounding errors: 0.033 + 0.033 + 0.111 +
0.067 = 0.244)
I need to make this total one, so I will multiply by a number that does this.
Activity 14
What number do I need to multiply 11/45 by to give a result of one?
Answer
45/11. Because 11/45 × 45/11 = 1.
Or in decimals, 45/11 ≈ 4.09 and 0.244 × 4.09 ≈ 1.
We can use this to rescale each of the probabilities in Figure 12, multiplying each one by 45/11,
as shown in Figure 13.
Figure 13: the probabilities have now been normalised to sum to one and the probability of B occurring given
the condition that A has occurred is 3/11 = 0.27.
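The normalisation step can be sketched with exact fractions. The four values are those quoted for A; which of them lies in the overlap with B is an assumption here (the 1/15 value), chosen because it reproduces the 3/11 stated in the caption of Figure 13. All variable names are illustrative.

```python
from fractions import Fraction

# The four probabilities that make up A (quoted in the text).
inside_A = [Fraction(1, 30), Fraction(1, 30), Fraction(1, 9), Fraction(1, 15)]

p_A = sum(inside_A)   # 11/45, which does not sum to one
scale = 1 / p_A       # 45/11, the normalising factor

# Rescale every probability in A so that the set sums to one.
normalised = [p * scale for p in inside_A]
print([str(p) for p in normalised])   # ['3/22', '3/22', '5/11', '3/11']
print("sum:", sum(normalised))        # 1

# Assumption: the 1/15 value is the part of A that also lies in B.
p_A_and_B = Fraction(1, 15)
p_B_given_A = p_A_and_B / p_A
print("P(B given A) =", p_B_given_A, "or about", round(float(p_B_given_A), 2))
```

Dividing the overlap by P(A) gives the same number as rescaling the whole of A and then reading off the overlap, which is why the remaining normalised values are not needed for this particular calculation.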
Of course, if I only need to calculate B given A the remaining values in Figure 13 are not essential to this
calculation and are shown for consistency and to clarify the effect of normalisation on the whole of A. This
leads us to a simplification. In order to calculate B given A, I divide the probability of both A and B by the probability of A.
This is known as Bayes’ formula (after Thomas Bayes, an eighteenth-century statistician and Presbyterian
minister). Bayes’ formula is given by the expression:
P(B|A) = P(A and B) / P(A)
Where P(B|A) is the probability of B given A occurs and P(A and B) is the probability of both A and B. You
might feel this is rather a complicated and very general calculation. If so, take another look at Figures 11, 12
and 13 and see if you can follow the argument through. You might like to argue it through in your own words
and enter it in your learning log.
Activity 15
What is the probability of A, given B has already occurred? You can calculate this by looking at Figure 10,
then following the method above for the probability of B given A has occurred. After doing this, carry out
the same calculation using Bayes’ formula.
Answer
Figure 14: First, select the whole of B. This includes the part which is A and B.
Next I need to normalise by making sure that the sum of all the probabilities in Figure 14 is 1. Following
the same procedure as above, I add all the probabilities:
Given the lack of simplicity in these fractions, it’s probably best to work in decimals.
So the probability of A given B is or (note we work to 3

decimal places but state the final answer to 2 d.p.)
Using Bayes’ formula, P(A|B) = P(A and B) / P(B).
You might note that
Example 1
Two factories, X and Y produce large numbers of components in the ratio 7:2 respectively. The
components are pooled together. The proportions of defective components produced are 0.003 by X, and
0.005 by Y. A component is selected at random and found to be defective. We wish to calculate the
probability that it was produced by Y.
There are simpler ways to solve this problem but, here I want to use it to illustrate Bayes’ formula
discussed above.
There are two sources of components X and Y. 7 out of 9 components are produced by X and 2 out of 9
by Y.
Therefore, the probability that a component is manufactured by X is
P(X) = 7/9 ≈ 0.778
and by Y is
P(Y) = 2/9 ≈ 0.222
(Note the sum of these two must be 1 since the component comes from either X or Y. In more
complicated problems this can be a useful check.)
The probability that I select a faulty component, call this P(F), is given by
P(F) = P(X and F) + P(Y and F)
because these are mutually exclusive. So
P(F) = 7/9 × 0.003 + 2/9 × 0.005 = 0.031/9 ≈ 0.00344
Using Bayes’ formula,
P(Y|F) = P(Y and F) / P(F)
where I’ve substituted P(Y and F) = 2/9 × 0.005 ≈ 0.00111, which is the probability that the
component is manufactured by Y and is faulty. Then Bayes’ formula gives
P(Y|F) = 0.00111/0.00344 ≈ 0.32
which is the probability that the faulty component selected was produced by Y.
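The arithmetic of Example 1 can be followed step by step with exact fractions. This is a sketch using only the figures given in the example (the 7:2 ratio and the defect proportions 0.003 and 0.005); the variable names are illustrative.

```python
from fractions import Fraction

# Share of all components made by each factory (ratio 7:2).
p_X = Fraction(7, 9)
p_Y = Fraction(2, 9)

# Proportion of each factory's output that is defective.
p_F_given_X = Fraction(3, 1000)   # 0.003
p_F_given_Y = Fraction(5, 1000)   # 0.005

# Probability that a component comes from a given factory AND is faulty.
p_X_and_F = p_X * p_F_given_X
p_Y_and_F = p_Y * p_F_given_Y

# The two cases are mutually exclusive, so they can be added.
p_F = p_X_and_F + p_Y_and_F

# Bayes' formula: P(Y given F) = P(Y and F) / P(F).
p_Y_given_F = p_Y_and_F / p_F

print("P(F) =", float(p_F))
print("P(Y given F) =", p_Y_given_F, "or about", round(float(p_Y_given_F), 2))
```

Although Y makes only two components in nine, its higher defect rate means that roughly a third of the faulty components come from Y.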
3.2 Independent and dependent probabilities

You have learned to add probabilities of individual mutually exclusive outcomes to obtain the probability of a
particular outcome, but what about combining probabilities of sequences of events? For example, the net
outcome for a series of coin tosses. To carry out these calculations we will need to know whether or not the
events are independent. Suppose I toss a fair coin twice, the outcome of the second coin toss has nothing to
do with the first. I can say they are independent. By contrast, looking at our earlier example of a particular
raffle ticket being chosen from a hat containing 800, the probability of your ticket being selected the first time is
1/800. But if a second ticket is chosen it’s either zero or 1/799.
Activity 16
Why zero?
Answer
Because your ticket was chosen on the first draw so there is no chance of it being drawn on the second
draw.
Why 1/799?
Answer
The ticket drawn was not yours but wasn’t returned to the hat, leaving yours as one out of 799.
Activity 17
Calculate all of your values to 5 decimal places in these exercises. You’ll need to use decimals because
we’re going to do some additional calculations with them.
Two transistors are faulty in a batch of 20.
a. What is the probability of choosing a faulty component from the batch?
Answer
There are 20 in the batch and 2 are faulty so 2/20 = 1/10 = 0.10000.
b. If the first component is returned to the batch and a second chosen at random, what is the probability
this second component is faulty?
Answer
This is the same as above since nothing is altered so 1/10 = 0.10000.
c. If the first component is found not to be faulty and is not returned to the batch, what is the probability a
second component chosen at random is faulty?
Answer
Now there are only 19 components for the second sample so 2/19 = 0.10526.
d. If the first component is found to be faulty and is not returned to the batch, what is the probability a
second component chosen at random is faulty?
Answer
Now there are 19 components and only 1 with a fault so 1/19 = 0.05263.
Activity 18
Answer the questions a, b, c and d from Activity 17 again, but this time for a batch of 2,000 components
in which two are faulty.
a. What is the probability of choosing a faulty component from the batch?
Answer
Now there are 2,000 in the batch and 2 are faulty so 2/2,000 = 1/1,000 = 0.00100.
b. If the first component is returned to the batch and a second chosen at random, what is the probability
this second component is faulty?
Answer
Again this is the same as above since nothing is altered so 1/1,000 = 0.00100.
c. If the first component is found not to be faulty and is not returned to the batch, what is the probability a
second component chosen at random is faulty?
Answer
Now there are only 1999 components for the second sample so 2/1999 = 0.00100
d. If the first component is found to be faulty and is not returned to the batch, what is the probability a
second component chosen at random is faulty?
Answer
Now there are 1999 components and only 1 with a fault so 1/1999 = 0.00050
Activity 19
Comparing the differences in your answers to questions c and b from Activity 17 with the differences
between questions c and b from Activity 18 (I suggest you write them as decimals), what can you say
about independent results for large batches with low probability of failure?
Answer
I hope you can see now why I suggested you carry so many decimal places in your calculations. The
difference between 17c and 17b is 0.00526 or roughly 5 × 10⁻³. The difference between 18c and 18b
appears to be zero because it’s too small for the number of decimal places (despite carrying 5). In fact
it’s about 5 × 10⁻⁷, so we’d have had to carry another two decimal places in order to see this. Our conclusion
is that, for large batches with a small number of events (in this case an event is the identification of a
faulty component) we really don’t have to worry so much about the fact that we removed, and didn’t
replace, an item.
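The comparison in Activity 19 can be checked with exact fractions. This is only a sketch: the function name `second_draw_shift` is my own, and the batch of 20 with two faulty components for Activity 17 is inferred from the 2/19 and 1/19 in its answers.

```python
from fractions import Fraction

def second_draw_shift(batch_size, faulty):
    """Change in P(second component is faulty) when a good first
    component is NOT returned to the batch."""
    with_replacement = Fraction(faulty, batch_size)
    without_replacement = Fraction(faulty, batch_size - 1)
    return without_replacement - with_replacement

small = second_draw_shift(20, 2)     # Activity 17 (assumed batch): 2/19 - 2/20
large = second_draw_shift(2000, 2)   # Activity 18: 2/1999 - 2/2000

print(f"batch of 20:   {float(small):.7f}")
print(f"batch of 2000: {float(large):.7f}")
```

Working with `Fraction` avoids the rounding that hid the second difference when only five decimal places were carried.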
3.3 Calculating combinations of probabilities

In many cases we need to know what happens on successive trials. That is, we want to know the outcome if
we carry out a series of experiments. Suppose I ask the probability of throwing two sixes from a single throw
of two dice (I’ll call this $P(6,6)$). You know that this will give the same outcome as throwing a single die
twice. So $P(6,6) = 1/6 \times 1/6 = 1/36$. Because the two events (in this case the number thrown on each die) are
independent, I can find the probability for a combination of the two by multiplying the individual probabilities. If
you look back at Figure 4 you can see why we multiply – for any outcome with the first die there are six
possible outcomes for the second so the probability is 1/6 of 1/6.
Activity 20
What’s the probability of throwing five sixes in five throws?
Answer
$(1/6)^5 = 1/7776 \approx 0.00013$
As you might expect it’s not very likely.
What is the probability that over five throws at least one is not a six?
Answer
This is the opposite of the result above so I can use $P(\text{not } A) = 1 - P(A)$, giving
$1 - 1/7776 = 7775/7776 \approx 0.99987$,
a very likely outcome.
Discussion
You might have thought that $1 - P(\text{five sixes})$ would be the probability of throwing no sixes. This is an easy
mistake to make. The probability of throwing no sixes in five throws plus the probability of
throwing five sixes does not equal 1. Not throwing all sixes means throwing at least one number
that is not six.
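If you would like to convince yourself of the distinction made in this discussion, the sample space for five throws is small enough to list in full. The following is a minimal sketch that counts outcomes by brute force; the variable names are my own.

```python
from fractions import Fraction
from itertools import product

# All 6**5 = 7776 equally likely results of five throws of a fair die.
throws = list(product(range(1, 7), repeat=5))
total = len(throws)

p_all_sixes = Fraction(sum(all(t == 6 for t in throw) for throw in throws), total)
p_no_sixes = Fraction(sum(all(t != 6 for t in throw) for throw in throws), total)
p_at_least_one_non_six = Fraction(sum(any(t != 6 for t in throw) for throw in throws), total)

print("all sixes:            ", p_all_sixes)
print("no sixes:             ", p_no_sixes)
print("at least one non-six: ", p_at_least_one_non_six)
```

Only the first and third results add to 1; 'no sixes' is a different, much smaller set of outcomes than 'not all sixes'.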
Providing the events are independent (remember, as shown in Figure 8, this is where the outcome of A does
not influence B) we can multiply the probabilities for each outcome. A complication arises if the outcomes are
not independent. In Activity 18 parts b and c, the selection of a second component was complicated when the first wasn’t
returned to the batch.
Activity 21
A batch of ten components is made up of eight good components and two which are faulty. If I sample
the batch twice and each time I do not replace the component, what are the possible outcomes?
Answer
1. Neither is faulty.
2. Just the first is faulty.
3. Just the second is faulty.
4. Both are faulty.
Example 2
What is the probability of each of the outcomes 1-4 listed above in the answer to Activity 21?
Taking each in turn:
1. The probability neither is faulty is $8/10 \times 7/9 = 56/90 = 28/45$. Why? The probability of selecting a non-faulty
component on the first selection if there are 8 non-faulty in a batch of 10 is $8/10$. The second
time I sample the batch there are only 7 non-faulty out of 9, hence $7/9$.
2. $2/10 \times 8/9 = 16/90 = 8/45$
$2/10$ is the probability of selecting a faulty component from the 10; $8/9$ is the probability of selecting
one of the eight non-faulty components from the remaining 9.
3. $8/10 \times 2/9 = 16/90 = 8/45$
This is similar to 2 above – note the answer is the same even though the fractions are different.
4. $2/10 \times 1/9 = 2/90 = 1/45$
This is the product of the probabilities of selecting 1 faulty component followed by another.
Activity 22
How can you check my answers in Example 2?
Answer
You can add the four results. The sum of all the possibilities has to be 1.
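The check suggested in Activity 22 can also be done by counting. As a sketch (the names `batch`, `pairs` and `probabilities` are mine), I list every ordered pair of distinct components from the batch of ten, which are all equally likely when sampling without replacement, and tally the four outcomes.

```python
from fractions import Fraction
from itertools import permutations

batch = ["good"] * 8 + ["faulty"] * 2

# Every ordered choice of two different components: 10 x 9 = 90 pairs.
pairs = list(permutations(range(len(batch)), 2))

counts = {}
for first, second in pairs:
    outcome = (batch[first], batch[second])
    counts[outcome] = counts.get(outcome, 0) + 1

probabilities = {outcome: Fraction(n, len(pairs)) for outcome, n in counts.items()}

for outcome, p in probabilities.items():
    print(outcome, p)
print("sum:", sum(probabilities.values()))
```

The four probabilities must add to exactly 1 because the four outcomes cover every possibility and no two can happen together.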
Activity 23
Which of the following have independent outcomes?
a. The values thrown on each of two successive throws of a die.
Answer
a. These are independent.
b. The sum of two successive throws of a die.
Answer
b. Each throw will be a part of the sum, so these are dependent.
c. The gender of two people selected at random from a group of 50 men and women.
Answer
c. The number of people remaining in the group will be 49 after the first has been selected, so this does
influence the second selection. They are dependent.
d. Finding a faulty component from a package containing 500, some of which are faulty, where each of
six sample components is drawn successively without replacement.
Answer
d. Similarly the selection of each reduces the number remaining so they are dependent.
Activity 24
What is the probability of selecting the winning horse in two successive races if 10 horses run in each
and I know nothing about the relative merits of each horse?
Answer
The probability of selecting the winner in the first race is 1/10. The same is true of the second race. The
two probabilities are independent so $1/10 \times 1/10 = 1/100$.
Activity 25
A batch contains 40 capacitors of which 3 are faulty.
a. What is the probability of selecting 1 faulty component if I sample once?
Answer
a. 3/40
b. What is the probability of selecting 2 faulty components if I take the second sample after returning the
first to the batch?
Answer
b. These are independent so $3/40 \times 3/40 = 9/1600 \approx 0.0056$.
c. What is the probability of selecting 2 faulty components if I do not return the first to the batch?
Answer
c. Now the size of the batch is influenced by the fact I have removed a faulty capacitor the first time I
sampled, hence $3/40 \times 2/39 = 6/1560 = 1/260 \approx 0.0038$.
d. What is the probability of selecting no faulty components from 2 samples if I do not return the first to
the batch?
Answer
d. Again the size of the batch is altered so $37/40 \times 36/39 = 1332/1560 = 111/130 \approx 0.854$.
e. Why do the answers to c and d not add to one?
Answer
e. Selecting 2 faulty components and selecting no faulty components are not the only
possibilities. You could select exactly 1 faulty component.
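To see that the missing case accounts for the shortfall, here is a worked sum for the batch of 40 capacitors with 3 faulty, sampling twice without replacement. The 'exactly one faulty' line is my addition: it covers the two orders in which one faulty and one good capacitor can be drawn.

```latex
\begin{align*}
P(\text{two faulty}) &= \frac{3}{40}\times\frac{2}{39} = \frac{6}{1560} = \frac{1}{260} \\
P(\text{none faulty}) &= \frac{37}{40}\times\frac{36}{39} = \frac{1332}{1560} = \frac{222}{260} \\
P(\text{exactly one faulty}) &= \frac{3}{40}\times\frac{37}{39} + \frac{37}{40}\times\frac{3}{39} = \frac{222}{1560} = \frac{37}{260} \\
\text{Total} &= \frac{1 + 222 + 37}{260} = 1
\end{align*}
```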
In brief you should now know how to:
1. identify and calculate conditional probabilities
2. distinguish between dependent and independent probabilities using a sample space
3. carry out simple calculations for dependent and independent probabilities.
Make an entry in your learning log relating to the list above. Do you feel comfortable that you know how to do
these things? If not, what are your concerns and how will you address them?
4 A simple probability experiment

Here I am going to describe a simple experiment using what is called a Bernoulli trial. From this simple
experiment we can produce a quite complicated data set which we will work with to understand how to
represent and eventually calculate and estimate statistical results.
4.1 Bernoulli trials

An experiment where there are only two possible outcomes in a particular trial (such as coin tossing) is called
a Bernoulli trial. (The Swiss mathematician Jacques Bernoulli [1655-1705] is generally accepted as being
responsible for beginning the mathematical theory of probability.) Before I go on I should make it clear that I
have chosen coin tossing as our Bernoulli trial but this could be any set of data with only two possible
outcomes.
Our coin tosses could equally well be replaced by the probability that:
two components will fit together
a resistor meets or does not meet its specification
a batch of components will or will not arrive by a given date.
Bernoulli trials don’t have to have equally likely outcomes. So a Bernoulli trial could be the probability that I
throw a number greater than 4 on a single die, which would give $P(>4) = 2/6 = 1/3$
and $P(\le 4) = 4/6 = 2/3$. Having said this, here I want to start with equally likely outcomes and then extend what
we learn to other Bernoulli trials where the outcomes may have different probabilities.
Activity 26
If I toss a coin, what is the probability that it will land heads up? Call this P(h).
Answer
P(h) = 1/2
Clearly, the probability that it will land tails up, P(t), is also 1/2. Incidentally, I could write P(h) as P(not t):
they are the same thing, since ‘not a tail’ must be a head. I can see this because there are only two possible
outcomes: heads, h, or tails, t. As with the dice, it doesn’t mean half of my tosses will give heads but, if I toss
the coin enough, then I should expect the number of heads to roughly equal the number of tails.
4.2 An experiment
Let’s do an experiment. I’ll toss the coin 50 times and see how many heads I get.
Sometime later…
I got 20. I’d expect 25 if the number of heads equalled the number of tails, but given that it’s random 20’s not
bad. What happens if I do it again? This time I got 26. What happens if I repeat this experiment many times?
I’ll do the 50 tosses 500 times.
This could take a little time; you might like to make yourself a cup of tea…
Here’s what I got.
Figure 15: Number of heads for 50 coin tosses over 500 trials.
It looks like I have a problem here. There’s just too much data. I’m interested in how frequently each value
occurs but there’s no chance of being able to make sense of the data in this form; so I’ll count the number of
occurrences of each number (this is called the frequency) and present that as a table.
Table 3: Frequency of heads for 50 coin tosses over 500 trials.
No. of heads  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26
Frequency      0   0   0   0   0   1   0   2   6  18  20  30  35  34  61  58  67
This is much better, now I can see how the results are distributed and there seems to be a peak around 24, 25
and 26. I’m trying to visualise what is going on, and Table 3 is helpful, but a graph would be even better.
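If you don’t fancy tossing a coin 25,000 times, the experiment can be simulated. This is a sketch using Python’s standard pseudo-random generator, not a record of my actual tosses; the seed value and the function name `heads_in_tosses` are arbitrary choices of mine, and your frequencies will differ from those in Table 3.

```python
import random
from collections import Counter

def heads_in_tosses(rng, tosses=50):
    """Count the heads in one set of fair coin tosses."""
    return sum(rng.random() < 0.5 for _ in range(tosses))

rng = random.Random(2022)   # fixed seed so the run can be repeated
results = [heads_in_tosses(rng) for _ in range(500)]

# Frequency: how many of the 500 sets gave each number of heads.
frequency = Counter(results)

print("No. of heads  Frequency")
for heads in range(10, 41):
    print(f"{heads:12d}  {frequency.get(heads, 0):9d}")
```

Changing the seed gives a different set of 500 results, but the counts should still cluster around 25.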
4.3 Frequency diagrams

Figure 16 shows the results from Table 3 plotted as a graph.
Figure 16. Frequency diagram showing the results of 500 sets of 50 coin tosses.
This type of graph, where we represent the distribution of data in terms of the frequency, is called a frequency
diagram. I’ve plotted the frequency of occurrence vertically and the number of heads in each of the 50 tosses
horizontally. The frequency diagram is much more helpful as a representation of the data because now we can
see clearly that the peak is clustered around the expected value of 25.
You can construct a frequency diagram from any set of data: it doesn’t have to be from a Bernoulli trial. Try
Activity 27.
Activity 27
Table 4 shows a data set which is the result of measuring the resistance of a batch of forty 200 Ω
resistors.
1. Create a frequency table similar to Table 3, grouping the frequencies into the following bins (the
term ‘bins’ is used to describe the groups into which we sort frequency data): 197 Ω, 198 Ω, 199 Ω,
200 Ω, 201 Ω, 202 Ω, 203 Ω, 204 Ω.
2. Use your table to plot a frequency diagram similar to that in Figure 16.
You might choose to plot the data using a spreadsheet or manually. You will have to round the data to
the nearest integer. There is an Excel version of Table 4 on the Resources page.
Table 4: Values in ohms for 40 resistors.
199.4 202.4 200.4 199.2 200.8 201.8 199.4 200.2
201.6 199.2 198.2 200.8 199.8 199.8 198.8 198.4
199.6 200.0 198.4 201.8 199.2 200.6 200.6 200.0
200.8 198.4 201.4 197.4 198.4 198.8 199.8 200.8
203.2 201.4 201.6 198.8 198.6 200.4 197.6 202.0
Answer
Rounding the values should give the following frequency table.
Table 5: Frequency table with the values rounded.
Frequency 1 6 9 9 8 6 1 0
Bin 197 198 199 200 201 202 203 204
Figure 17: Frequency diagram for the data shown in Table 4.
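If you would rather not use a spreadsheet, the rounding and counting can be sketched in a few lines of Python. The helper `round_half_up` is my own; I use it instead of the built-in `round` so that any value ending in .5 would round upwards, although none of the values in Table 4 does.

```python
import math
from collections import Counter

# Table 4: values in ohms for 40 resistors.
resistances = [
    199.4, 202.4, 200.4, 199.2, 200.8, 201.8, 199.4, 200.2,
    201.6, 199.2, 198.2, 200.8, 199.8, 199.8, 198.8, 198.4,
    199.6, 200.0, 198.4, 201.8, 199.2, 200.6, 200.6, 200.0,
    200.8, 198.4, 201.4, 197.4, 198.4, 198.8, 199.8, 200.8,
    203.2, 201.4, 201.6, 198.8, 198.6, 200.4, 197.6, 202.0,
]

def round_half_up(value):
    """Round to the nearest integer, with .5 rounding up."""
    return math.floor(value + 0.5)

frequency = Counter(round_half_up(r) for r in resistances)

# One (bin, frequency) pair for each bin from 197 to 204 ohms.
table = [(b, frequency.get(b, 0)) for b in range(197, 205)]

for b, f in table:
    print(f"{b} ohms: {f}")
```

The list `table` holds the same pairs of bin and frequency as Table 5, ready to plot.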
In brief you should now know:
1. what a Bernoulli trial is
2. how to construct a frequency diagram from a discrete data set
3. how to form a discrete data set from a data set which is not discrete.
5 Binomial distributions
Let’s return to our Bernoulli trials and the coin tossing experiment and look at the probabilities to see if we
could have predicted the distribution of heads we observe from experiment. If I toss two coins the possible
outcomes are:
tt, th, ht, hh
You can consider this to be a uniform sample space; all outcomes are equally likely. I hope you’ll agree this is
the case. This means that, since there are 4 possible outcomes, each must have a probability of 1/4. You
might also notice that there are $2^2 = 4$ possible outcomes.
Activity 28
Figure 18: Coin toss
What are the possible outcomes (the sample space) for three coin tosses?
Answer
ttt, tth, tht, thh, htt, hth, hht, hhh
What is the probability of each outcome for three tosses?
Answer
There are 8 equally likely outcomes so the probability of any outcome is 1/8.
Again, you might notice there are $2^3 = 8$ possible outcomes.
Activity 29
Figure 19 shows a way to visualise the possible outcomes for two coin tosses if I am not interested in
which toss produced which result in the same way as when we summed the dice in the previous section.
The numbers in the circles give the number of ways of achieving a particular outcome.
Figure 19: Possible outcomes for two coin tosses.
In a moment you’ll watch a screencast of ways to predict the outcomes for a series of coin tosses.
Before you do, please consider the following questions and look for the answers in the screencast:
How many outcomes are there if I toss a second coin?
How many ways are there of arriving at each outcome?
How many outcomes are there if I toss a third coin?
How many ways are there of arriving at each outcome?
Now watch the screencast, which will show you how this diagram develops and should enable you to
predict the outcomes for greater numbers of coin tosses.
Answer
With two coin tosses, there is one way of arriving at two heads (toss two heads), and one way of
arriving at two tails (toss two tails).
There are two ways of arriving at one head and one tail (toss head then tail, or tail then head).
With three coin tosses, there are three ways of tossing two heads.
You should now be able to extend the diagram for four, five or more coin tosses.
Activity 30
Referring to Figure 20, what is the value in the circle that represents these possible outcomes for four
coin tosses:
1. hhht?
2. thhh?
Figure 20: The result of 4 coin tosses.
Answer
Figure 21: There are 4 ways of arriving at one tail and three heads. It doesn’t matter in which order you
count them.
How many outcomes are possible for 4 coin tosses?
Answer
$2^4 = 16$
What is the sum of the binomial coefficients for 6 coin tosses?
Answer
$2^6 = 64$
What would be the probabilities for each possible outcome from 4 coin tosses?
Answer
$2^4 = 16$; then, referring to Figure 21, we have: 1 4 6 4 1. Combining these results gives:
four heads 1/16
three heads 4/16 = 1/4
two heads 6/16 = 3/8
one head 4/16 = 1/4
no heads 1/16
5.1 Pascal’s triangle

The pattern we are using to produce the binomial coefficients is quite famous: it is called Pascal’s triangle.
(Blaise Pascal was a 17th century French mathematician and physicist whose name is today given to the SI
unit of pressure.)
Figure 22: Pascal’s triangle.
Each term in the triangle is generated by the sum of the two terms either side in the line above, exactly as you
have been doing with our paths and circles earlier.
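The rule for generating each line is simple enough to write as a short program. This is a sketch with function names of my own choosing (`next_row`, `pascal_triangle`); each inner term is the sum of the two neighbouring terms in the line above, and the ends are always 1.

```python
def next_row(row):
    """Build the next line of Pascal's triangle from the current one."""
    inner = [left + right for left, right in zip(row, row[1:])]
    return [1] + inner + [1]

def pascal_triangle(rows):
    """Return the first `rows` lines of the triangle, starting from [1]."""
    triangle = [[1]]
    while len(triangle) < rows:
        triangle.append(next_row(triangle[-1]))
    return triangle

for row in pascal_triangle(7):
    print(row, "sum =", sum(row))
```

The printed sums double from one line to the next, which is the total number of outcomes for that number of coin tosses.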
Activity 31
What would the next line be?
Answer
Figure 23: The next line of Pascal’s triangle.
5.2 Using Pascal’s triangle to calculate probabilities

This enables us to calculate the probability of outcomes for any combination of coin tosses. Look at the
following example.
Activity 32
What is the probability of tossing exactly 3 heads and 2 tails from 5 coins (in any order)?
Answer
The binomial coefficients from Pascal’s triangle are 1, 5, 10, 10, 5, 1.
The sum of all these is $2^5$ or 32. So, the probabilities are 1/32, 5/32, 5/16, 5/16, 5/32, 1/32. These are,
respectively: 5 heads, 4 heads, 3 heads, 2 heads, 1 head and 0 heads. So the answer is 5/16. (It doesn’t
matter which way around I do it; I could look for two tails, I’d get the same answer).
What is the probability that I’ll toss at least 1 head from 6 coins?
Answer
I could do this by summing all the coefficients for terms that contain a head, but this is hard work.
Remember, P(at least 1 head) + P(all tails) = 1 since this covers all possibilities and they’re mutually
exclusive. Transposing this gives: 1 – P(all tails) = P(at least one head) So the probability of all tails is
1/64 and therefore P(at least one head) = 1 – 1/64 = 63/64. (A good bet!)
Activity 33
Two equally sized, very large batches of components, A and B, have been mixed. Unfortunately, all the
components in batch B are defective. You have selected at random six components from the mixed
batch returning each before selecting the next.
From your selection, what is the probability that:
1. none of the six components are faulty?
2. exactly two are faulty?
3. at least one is faulty?
4. more than three are faulty?
Discussion
The batches are of equal size, so assuming they are thoroughly mixed this has the same expectation as
the coin tossing in that there is a probability of 1/2 that I’ll select a faulty component at each attempt. In
this case I indicated that the components were returned to the batch, so the results are independent of
the number of samples. Also, the batch was very large and this would mean, providing you were only
removing a small sample, you could ignore the dependence on the removed components. The relevant
terms from Pascal’s triangle are: 1 6 15 20 15 6 1 and $2^6 = 64$. So the probabilities of selecting n faulty
components are:
n     0     1     2      3      4      5     6
P(n)  1/64  6/64  15/64  20/64  15/64  6/64  1/64
So from above
1. this is when n = 0, as n is the number of faulty components, so P(0) = 1/64
2. similarly this is n = 2, so P(2) = 15/64
3. this is a bit more subtle; you can get to it by adding all of the other terms but it’s much quicker to
use 1 − P(0), since P(0) is the only other possibility.
So the answer is 1 − 1/64 = 63/64
4. There’s no shortcut here; this is simply the sum for n = 4, 5 and 6: 15/64 + 6/64 + 1/64 = 22/64 = 11/32
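The four answers to Activity 33 can be reproduced from the n = 6 line of Pascal’s triangle. This is a minimal sketch under the same assumption as the discussion (probability 1/2 of a faulty component at each selection); the variable names are mine.

```python
from fractions import Fraction

# Build the n = 6 line of Pascal's triangle by repeated neighbour sums.
row = [1]
for _ in range(6):
    row = [1] + [a + b for a, b in zip(row, row[1:])] + [1]

total_paths = sum(row)                             # 2**6 = 64
p = [Fraction(term, total_paths) for term in row]  # p[n] = P(n faulty)

none_faulty = p[0]
exactly_two = p[2]
at_least_one = 1 - p[0]
more_than_three = p[4] + p[5] + p[6]

print(none_faulty, exactly_two, at_least_one, more_than_three)
```

Note that `Fraction` prints results in lowest terms, so 22/64 appears as 11/32.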
Figure 24 shows 3 plots of rows of Pascal’s triangle for n = 9, 10 and 11. You can see that although the
number of terms increases the profiles of the three plots are very similar. They’re also remarkably similar to
the shape we got in Figure 16.
Figure 24: Pascal’s triangle plotted for (a) n = 9, (b) n = 10 and (c) n = 11.
5.3 Binomial expansion

This pattern for Pascal’s triangle might remind you of what happens when you multiply out an expression of
the form $(a + b)^n$.
For example, for n = 2, we have:
$$(a + b)^2 = a^2 + 2ab + b^2$$
Equation 1
and for n = 3:
$$(a + b)^3 = a^3 + 3a^2b + 3ab^2 + b^3$$
Equation 2
Looking at the coefficients in Equation 1 we have 1, 2, 1. For Equation 2 we have 1, 3, 3, 1 (remember the
coefficients of the first and last terms, i.e. $a^3$ and $b^3$ in Equation 2, are one). So the coefficients match the
numbers generated in each successive line of Pascal’s triangle. Note how the indices for a count down
whilst the indices for b count up. Incidentally anything raised to the power 0 is equal to 1, so $a^0 = b^0 = 1$, making
this statement true for the first and last terms in the series too. (Well, not quite ‘anything’; $0^0$ depends on the
context – fortunately this won’t concern us.)
This process is called the binomial expansion of $(a + b)^n$ and the coefficients are referred to as binomial
coefficients.
Activity 34
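Write the binomial coefficients for $(a + b)^4$ and so expand the brackets to give its binomial expansion.
Answer
Using Pascal’s triangle the coefficients are 1, 4, 6, 4, 1 so
$$(a + b)^4 = a^4 + 4a^3b + 6a^2b^2 + 4ab^3 + b^4$$
Equation 3
So, Pascal’s triangle is handy for quickly multiplying out expressions of the form $(a + b)^n$, but these
expressions also relate to probabilities. Let’s say we are tossing a coin three times. We know that the
probability of tossing a head $P(h) = 1/2$ and the probability of tossing a tail $P(t) = 1/2$. So $a = 1/2$ and $b = 1/2$
in the right hand side of Equation 2. I get the probabilities as follows:
$$\left(\tfrac12 + \tfrac12\right)^3 = \left(\tfrac12\right)^3 + 3\left(\tfrac12\right)^2\left(\tfrac12\right) + 3\left(\tfrac12\right)\left(\tfrac12\right)^2 + \left(\tfrac12\right)^3 = \tfrac18 + \tfrac38 + \tfrac38 + \tfrac18 = 1$$
Equation 4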
You can see now that I have an expression in Equation 4 which is the sum of each and all probabilities for the
outcome from tossing three coins or one coin three times. I could have arrived at this conclusion by taking the
appropriate line from Pascal’s triangle and dividing each term by the sum of the binomial coefficients –
remember that $2^n$ is the total number of paths leading to a set of outcomes as in Figure 20.
In Equation 4 you can see that the process of dividing by $2^n$ (here $2^3 = 8$) to convert the terms in Pascal’s triangle to
probabilities emerges naturally from the probabilities, which are each being multiplied to give a factor of
$(1/2)^3 = 1/8$ for each term.
If I were to plot the shape of Pascal’s triangle for 50 throws I could compare the output to the outcome of our
coin tossing experiment in Figure 16. It’s going to be hard work to write out all the terms in the triangle up to
50 but fortunately there is a way to calculate each term. You might have encountered this already; they are
written in many forms: ${}^nC_r$ or ${}_nC_r$ or $C(n,r)$ or $\binom{n}{r}$, where n is the number of coin tosses and r the number of the
term for which we are looking to find the coefficient.
So I could write Equation 3 in the form
$$(a + b)^4 = {}^4C_0\,a^4 + {}^4C_1\,a^3b + {}^4C_2\,a^2b^2 + {}^4C_3\,ab^3 + {}^4C_4\,b^4$$
So how do we calculate nCr in general?
Activity 35
See if you can find this function on your calculator (make sure it’s a scientific calculator), on mine it’s
marked nCr, then find the terms in the row for n = 5.
Answer
On my calculator I first enter n then press nCr then enter r. For all terms n = 5 then for each term
r = 0, 1, 2, 3, 4, 5. Your result should be 1 5 10 10 5 1.
Fortunately you don’t have to do the calculation yourself, but here is the definition of nCr if you fancy trying it:
$${}^nC_r = \frac{n!}{r!\,(n - r)!}$$
The exclamation marks, in terms like ‘n!’, mean the factorial value. This is the product of all the whole numbers from
1 to n. For example, $4! = 4 \times 3 \times 2 \times 1 = 24$.
These numbers get large very quickly indeed: $20! \approx 2.4 \times 10^{18}$. (Incidentally 0! is defined as equal to 1; this
may seem odd but as an engineer you’ll find you occasionally have to simply trust the mathematicians and not
worry about the proof!)
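If you have Python to hand rather than a scientific calculator, the factorial definition of nCr can be tried directly. This is a sketch: `n_c_r` is my own name for the function, and I compare it with `math.comb`, the standard library’s version of the same calculation (available from Python 3.8).

```python
from math import comb, factorial

def n_c_r(n, r):
    """Binomial coefficient from the factorial definition n! / (r! (n - r)!)."""
    return factorial(n) // (factorial(r) * factorial(n - r))

row_5 = [n_c_r(5, r) for r in range(6)]
print("n = 5:", row_5)

# Python integers do not overflow, so large factorials are not a problem.
print("20! =", factorial(20))
print("50C25 =", n_c_r(50, 25), "matches math.comb:", n_c_r(50, 25) == comb(50, 25))
```

The integer division `//` is exact here because a binomial coefficient is always a whole number.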
It’s still quite a lot of work but I can now calculate all 51 data points (remember, we count r from zero, hence
51 rather than 50) and compare the curve with the data I got from the coin tossing experiment. Figure 25 shows it
overlaid on my original data set.
Figure 25: Match between the coin tossing data and the recalibrated binomial coefficients.
I had to do a little work recalibrating the binomial coefficients in order to make the two data sets comparable.
Firstly there is the factor of $2^{50}$ which I need to divide by in order to turn the binomial coefficients into
probabilities. (Why? Because I have to divide by the sum of all the coefficients in order to turn the values into
probabilities.) Then there is the fact that I have 500 data points in my original data set. This amounts to
multiplying each of the resulting probabilities by 500.
$$F = 500 \times \frac{{}^{50}C_r}{2^{50}}$$
Equation 5
where F is the Frequency.
After doing this I have the data points in Figure 25. Finally I need to make a judgement about the quality of my
data. Have I got enough data to be sure of a representative sample? Looking at Figure 25, I can see that I
have a fairly good fit to the binomial distribution, but it’s not perfect. It may be that the coin I’m using has a bias
but it’s not clear. What I really need now is a number that tells me the quality of fit to the theory. We will return
to this shortly when we look at Measuring uncertainty and random walk.
Activity 36
See if you can reproduce the data point corresponding to bin 23 using Equation 5.
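Answer
Using the right hand side of Equation 5 with r = 23:
$$F = 500 \times \frac{{}^{50}C_{23}}{2^{50}} = 500 \times \frac{1.0804 \times 10^{14}}{1.1259 \times 10^{15}} \approx 48$$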
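The recalibration described for Figure 25 (divide each binomial coefficient by 2 to the power 50, then multiply by the 500 trials) can be sketched as follows. The function name `expected_frequency` and its default arguments are my own; `math.comb` needs Python 3.8 or later.

```python
from math import comb

TRIALS = 500   # sets of tosses in the experiment
TOSSES = 50    # coin tosses in each set

def expected_frequency(r, trials=TRIALS, tosses=TOSSES):
    """Expected number of sets giving exactly r heads, for a fair coin."""
    return trials * comb(tosses, r) / 2**tosses

print("No. of heads  Expected frequency")
for r in range(15, 36):
    print(f"{r:12d}  {expected_frequency(r):18.2f}")
```

Summing the expected frequency over every r from 0 to 50 gives back the 500 trials, which is a useful check on the scaling.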
Before moving on, it is worth briefly reviewing the relationship between Pascal’s triangle, the binomial
expansion and our attempt to replicate the data as shown in the chart in Figure 25.
Activity 37
a. Use your calculator to find the binomial coefficients for n = 8.
Answer
a. The binomial coefficients are 1, 8, 28, 56, 70, 56, 28, 8, 1
b. What is the value of $2^8$?
Answer
b. 256. This is the same as the sum of the binomial coefficients for n = 8
c. From 8 tosses of a fair coin what are the probabilities of tossing exactly
i. zero heads
ii. 1 head
iii. 2 heads
Answer
i. $1/256 \approx 0.0039$
ii. $8/256 = 1/32 \approx 0.031$
iii. $28/256 = 7/64 \approx 0.109$
d. From eight tosses what is the probability that I toss two heads or fewer?
Answer
d. Don’t forget that you have to include the probability of throwing no heads:
$1/256 + 8/256 + 28/256 = 37/256 \approx 0.145$
e. What is the probability that I toss fewer than six tails?
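Answer
e. You could do this by summing the probabilities for zero tails, one tail, two tails… up to five tails. But
fewer than six tails means three or more heads, which is simply the opposite of the previous problem, so you
don’t need to calculate all these terms. You can use the relationship:
$P(\text{fewer than six tails}) = 1 - P(\text{two heads or fewer}) = 1 - 37/256 = 219/256 \approx 0.855$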
f. For n tosses of a fair coin what is the probability that I toss:
1. a single head?
2. all tails?
3. at least 1 head?
Answer
1. The probability of tossing a single head is $n/2^n$
2. The probability of tossing all tails is $1/2^n$
3. Again, this is the probability of not tossing all tails: $1 - 1/2^n$
5.4 Brief Summary

We found, given a series of events each with two possible outcomes, we can plot the number of possible
routes to a particular outcome (such as five heads out of seven coin tosses) using Pascal’s triangle. Dividing
this by the total number of routes gives us the probability of a particular outcome.
So in short, we can quickly calculate the probability for an outcome using Pascal’s triangle or nCr then dividing
by $2^n$.
There are two ways of thinking about the divisor. Either it’s the number of possible paths, $2^n$, or it’s the
probability $(1/2)^n$ used as follows:
$$P(r) = {}^nC_r \left(\tfrac12\right)^n$$
Either way, since $(1/2)^n = 1/2^n$, we get the same result.
5.5 Bernoulli trials which are not 50-50
In using the coin tossing example, we have ignored the broader class of Bernoulli trials where the probabilities
are not equally distributed. Obviously this is a much more common case than you have read about thus far.
Our analysis using $(a + b)^n$ is still appropriate: we can substitute the outcome of any Bernoulli trial into a and
b and this method will still work. Suppose for a moment that $a = P(A)$ and $b = P(\text{not } A)$, where
$0 \le P(A) \le 1$ and $0 \le P(\text{not } A) \le 1$. As before, the sum $a + b$ must always equal 1; remember, this is the result of a
Bernoulli trial. Or if you prefer, $P(\text{not } A) = 1 - P(A)$.
Example 3
Let’s look again at the problem of a mixed batch of good and faulty components. We will assume that the
batches are so large that the removal of samples does not affect the distribution of probabilities. This time
we will say that 1/4 of the components in a batch are faulty and we can calculate the probability, having
chosen four components at random, that:
1. none of the four components are faulty
2. exactly two are faulty
3. at least one is faulty
4. more than two are faulty.
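We have 4 trials so our binomial expression has an exponent of 4.
$$(a + b)^4 = a^4 + 4a^3b + 6a^2b^2 + 4ab^3 + b^4$$
If we insert the probability $a = P(\text{faulty}) = 1/4$ then, because 1/4 of the entire batch is faulty, it follows that
$b = P(\text{not faulty}) = 3/4$.
So, we have
$$\left(\tfrac14 + \tfrac34\right)^4 = \left(\tfrac14\right)^4 + 4\left(\tfrac14\right)^3\left(\tfrac34\right) + 6\left(\tfrac14\right)^2\left(\tfrac34\right)^2 + 4\left(\tfrac14\right)\left(\tfrac34\right)^3 + \left(\tfrac34\right)^4 = \tfrac{1}{256} + \tfrac{12}{256} + \tfrac{54}{256} + \tfrac{108}{256} + \tfrac{81}{256}$$
Again, each term represents a particular combination of faulty and good components. (Note the sum of these
fractions is 1.) You can read them off as the probability that all four, three, two, one or none are faulty. In
answer to our questions:
1. $81/256 \approx 0.316$
2. $54/256 = 27/128 \approx 0.211$
3. The opposite of none being faulty: $1 - 81/256 = 175/256 \approx 0.684$
4. The sum of three and four being faulty: $12/256 + 1/256 = 13/256 \approx 0.051$
You should check that you agree with these results. They follow the same procedure as before, in that
the binomial coefficients are just as you’ve calculated previously, but the probabilities are no longer all
equal to 1/2.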
Activity 38
For the problem described in Example 3 above find the probability that at least two are faulty.
Answer
The working is very similar to (3) above but this time we have to exclude the possibility that only one is
faulty as well as none:
$1 - 81/256 - 108/256 = 67/256 \approx 0.262$
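Example 3 and Activity 38 can be checked with a general function for Bernoulli trials that are not 50-50. This is a sketch: `binomial_probability` is my own name, and it simply evaluates one term of the binomial expansion, nCk multiplied by the appropriate powers of the two probabilities.

```python
from fractions import Fraction
from math import comb

def binomial_probability(k, n, p):
    """P(exactly k successes in n independent trials), each with probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_faulty = Fraction(1, 4)
# dist[k] = P(exactly k faulty components among the four chosen)
dist = [binomial_probability(k, 4, p_faulty) for k in range(5)]

print("none faulty:   ", dist[0])
print("exactly two:   ", dist[2])
print("at least one:  ", 1 - dist[0])
print("more than two: ", dist[3] + dist[4])
print("at least two:  ", 1 - dist[0] - dist[1])
```

Passing `Fraction(1, 2)` for `p` recovers the coin tossing results, so the 50-50 case is just a special instance of the same function.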
Figure 26 shows the outcomes graphically. You can see that the outcomes are not symmetrically distributed
about 2 as they would have been if the outcomes had equal probability.
Figure 26: Probability of selecting n faulty components from 4 where the probability of selecting a single faulty
component is 1/4.
There are considerable differences between this distribution and the previous cases where the frequency
diagrams were symmetrical. Now we have a definite bias in favour of finding good components as we’d
expect.
In the next section you will see a useful approximation for these calculations which is easy to apply for large
sample populations where the probability of an event is small.
5.6 The birthday problem

This is also sometimes referred to as the birthday paradox because the outcome seems absurd.
Activity 39
Just using your intuition, guess how many people I need to gather together in order to have a chance
greater than 50% of finding two who have the same birthday?
Answer
The answer is 23. This is a smaller number than is commonly thought or that perhaps even seems
logical!
We can use a Bernoulli trial to verify this number by considering each person in turn and calculating the
probability. This is another case where it is easiest to ask: what is the probability that nobody shares the same
birthday from the 23? Then to find the probability that at least two people do share a birthday, take:
1 − P(no one shares a birthday) = P(at least 2 people share a birthday)
I will have to exclude leap years because I need to assume all the birthdates are equally likely but if you were
to calculate allowing for Feb 29th you’d find it makes no difference to the result at this level of accuracy.
I need to set this up as a series of independent probabilities and then take the product of these to obtain the
net probability. Let’s start with a simpler problem. What is the probability of three people not sharing the same
birthday?
I first select one person and ask, what are the possible birthdays? There are 365 of them and 365 days in a
year so the answer is 365/365 = 1. Next, I select the second person and again ask, what are the remaining
possible birthdays? Now, if this person is not to share a birthday with the first person I chose, there are only
364 possibilities; so I have 364/365. Then I take the third person and ask the same question. The result must
be 363/365. Finally I can calculate the probability that none of the three have the same birthday:
$$\frac{365}{365} \times \frac{364}{365} \times \frac{363}{365} \approx 0.9918$$
Activity 40
What is the probability that four people do not share the same birthday?
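Answer
$$\frac{365}{365} \times \frac{364}{365} \times \frac{363}{365} \times \frac{362}{365} \approx 0.9836$$
Can you see a pattern emerging in the calculations?
How would you calculate the probability that 7 people do not share the same birthday?
Answer
$$\frac{365 \times 364 \times 363 \times 362 \times 361 \times 360 \times 359}{365^7} \approx 0.9438$$
What is the probability that two or more, of the seven, share the same birthday?
Answer
$1 - 0.9438 = 0.0562$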
I can use this pattern to proceed for the 23 birthdays, but before I do this I should like to simplify the
calculation; it’s rather lengthy if I have to multiply 23 numbers in the numerator. I can see that the denominator
is going to give $365^r$ where r is the number of people, but what about the numerator? I note that this is the first
r terms of 365!, where r is the number of people I am considering.
To make this clear consider 10!/7!. If I write this out longhand I get
$$\frac{10!}{7!} = \frac{10 \times 9 \times 8 \times 7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1}{7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1}$$
By cancelling terms top and bottom I can reduce this to the first three terms in the numerator. I could have
written 10!/7! above as 10!/(10 − 3)!, so
$$\frac{10!}{(10 - 3)!} = 10 \times 9 \times 8$$
I can use this method to multiply the first r terms of the calculation above in the form
$$\frac{365!}{(365 - r)!}$$
Applying this to the example where we considered 7 people gives (don’t bother trying to calculate this; the
numbers are too big for your calculator):
$$P(\text{no shared birthday}) = \frac{365!}{(365 - 7)! \times 365^7} = \frac{365!}{358! \times 365^7}$$
Equation 6
We have an expression above which is accurate but numbers like 365! are simply too large to put into a
calculator. I have included the next part for completeness but if you are happy to accept the result or have a
calculator capable of coping with very large numbers then you won’t lose much by skipping to the final result.
Activity 41
In practice we can use an approximation for n! which, providing n is large, is very accurate. It’s called
Stirling’s approximation or Stirling’s formula. If you’d like to argue through the calculation here it is.
$$n! \approx \sqrt{2\pi n}\left(\frac{n}{e}\right)^n$$
for large n, where $\pi$ and e have their usual values. Even for modest values such as n = 10 this
approximation is accurate to within 1%.
Now try replacing the factorial terms in Equation 6 above using Stirling’s formula and see how you get
on.
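Answer
Replacing both factorials in Equation 6 with Stirling’s formula and cancelling gives
$$\frac{365!}{358! \times 365^7} \approx \sqrt{\frac{365}{358}} \left(\frac{365}{358}\right)^{358} e^{-7} \approx 0.9438$$
So for 23 people I can write
$$\frac{365!}{342! \times 365^{23}} \approx \sqrt{\frac{365}{342}} \left(\frac{365}{342}\right)^{342} e^{-23} \approx 0.4927$$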
So the probability of two or more of our 23 people sharing the same birthday is 1 – 0.4927 = 0.5073 or just
over 50%.
Note that we didn’t calculate that it would be 23 people; that was given. We verified that it was 23. The
calculation to find the value 23, given that the probability is a half, goes beyond the scope of this module.
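A computer has no difficulty with the long product, so the verification can be repeated without Stirling’s formula. The sketch below multiplies the fractions 365/365, 364/365, 363/365 and so on, one person at a time; the function names are my own. The second function finds 23 by simply trying group sizes in turn, which is a brute-force search and not the analytical calculation that is beyond the scope of this module.

```python
def p_no_shared_birthday(people, days=365):
    """Probability that no two of `people` share a birthday (leap years ignored)."""
    probability = 1.0
    for k in range(people):
        probability *= (days - k) / days
    return probability

def smallest_group(threshold=0.5, days=365):
    """Smallest group size whose chance of a shared birthday exceeds `threshold`."""
    people = 1
    while 1 - p_no_shared_birthday(people, days) <= threshold:
        people += 1
    return people

for people in (3, 4, 7, 23):
    none_shared = p_no_shared_birthday(people)
    print(f"{people:2d} people: P(no match) = {none_shared:.4f}, "
          f"P(at least one match) = {1 - none_shared:.4f}")

print("smallest group with better than even chance:", smallest_group())
```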
In brief you should now know:
1. how to construct Pascal’s triangle
2. how to relate Pascal’s triangle to probabilities
3. how to work with Bernoulli trials which are not 50-50
4. how to analyse the probabilities in a binomial distribution
5. how to perform simple calculations of probability based on the binomial series.
6 Poisson distribution
Although the methods we have used so far give insight they are quite hard work, especially when the number
of trials gets large or the probabilities become small. For observations where this is the case, there is a very
useful technique which uses the Poisson distribution. (Simeon Denis Poisson, 1781–1840 was a French
mathematician and physicist.) The Poisson distribution expresses the probability of a given number of
outcomes occurring over some fixed interval such as time, on condition that these outcomes occur at a known
average rate and do not depend on the time since the last outcome.
The Poisson distribution is useful when we select a sample from a large batch and ask how many faulty
components we would expect to find in that sample. Here we assume that we know the probability of finding a
faulty component P(A). We don’t need to know the batch size in order to use this method.
The Poisson distribution forms a good approximation to the binomial distribution providing the probability of a
single occurrence is small in relation to the sample. In practice, the product of the sample size, N, and the
probability, p, should be less than 5.
In this case we can use the following equation in much the same way we used the binomial theorem to find
the coefficients for the binomial distribution.
The probability, $P(n)$, that an event A will happen n times in N trials is given by:
$$P(n) = \frac{(Np)^{n}\,e^{-Np}}{n!}$$
where p is the probability of a single occurrence of the event and e is the mathematical constant 2.71828 to 5
decimal places.
Example 4
Suppose the probability of a defective transistor in a large batch is p(defective) = 0.020, and that I want to
know the probability of finding 2 defective transistors in a sample of 90.
First I test that the use of the Poisson distribution is valid:
$$Np = 90 \times 0.020 = 1.8$$
This is less than 5, so I am safe in using the Poisson distribution.
The probability of two defective transistors is:
$$P(2) = \frac{1.8^{2}\,e^{-1.8}}{2!} \approx 0.268$$
Activity 42
What is the probability of 4 transistors being found to be defective?
Answer
Following the method above:
$$P(4) = \frac{1.8^{4}\,e^{-1.8}}{4!} \approx 0.072$$
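The transistor calculations can be checked with a short script. This is a sketch only, taking the Poisson probability to be $(Np)^n e^{-Np}/n!$ and comparing it with the exact binomial value; the function names are illustrative.

```python
import math

def poisson(n, N, p):
    """Poisson estimate of the probability of n occurrences in N trials."""
    mean = N * p
    return mean ** n * math.exp(-mean) / math.factorial(n)

def binomial(n, N, p):
    """Exact binomial probability of n occurrences in N trials."""
    return math.comb(N, n) * p ** n * (1 - p) ** (N - n)

N, p = 90, 0.020
print(f"Np = {N * p:.1f} (should be less than 5)")
for n in (2, 4):
    print(f"n = {n}: Poisson {poisson(n, N, p):.4f}, binomial {binomial(n, N, p):.4f}")
```

The two columns should agree to about two decimal places, which is the sense in which the Poisson distribution approximates the binomial here.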
In brief you should now be able:
1. to test if the Poisson distribution is a valid approximation.

2. to apply the approximation in simple cases.
7 Measuring uncertainty (standard deviation revisited)

You met standard deviation in Study Guide 1, Section 5.2. I want to revisit this in the context of sampling data
for large populations. In this section we are going to be thinking in terms of large batches where we can’t
possibly sample all of the components, so our sample will always be considerably less than the size of the
total population. We will need to distinguish between the population and our sample when we talk about the
mean and the standard deviation so I will adopt the following convention.
For the population mean I will use µ (mu) and for its standard deviation σ (sigma).
For the measured sample I will use $\bar{x}$ for the mean and s for the standard deviation.
You have already met the standard deviation expressed in terms of the symbols above:
$$s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^{2}}{n-1}}$$
Equation 7
Activity 43
Does Equation 7 refer to the population or the measured sample?
Answer
I have used $\bar{x}$ for the mean and s for the standard deviation; therefore it refers to the measured sample.
In Equation 7, n is the sample size and $x_i$ the observed/measured value. Because we are dealing with large
populations (n is very large) we can say
$$n - 1 \approx n$$
so we can rewrite Equation 7 in the form
$$s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^{2}}{n}}$$
Equation 8
It’s worth bearing in mind that this is an approximation generally accepted for samples where n is greater than
25.
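To see how little the two forms differ as n grows, here is a minimal sketch. It assumes Equation 7 divides by n − 1 and Equation 8 by n, as the approximation above implies. The readings are invented purely for illustration.

```python
import math
import statistics

# Hypothetical readings, made up for this illustration only.
sample = [199.2, 200.4, 201.1, 198.7, 200.0, 199.6, 200.9, 200.3]

def std_dev(values, divisor_offset):
    """Standard deviation dividing by (n - divisor_offset)."""
    mean = sum(values) / len(values)
    sum_squares = sum((x - mean) ** 2 for x in values)
    return math.sqrt(sum_squares / (len(values) - divisor_offset))

s_eq7 = std_dev(sample, 1)  # divide by n - 1
s_eq8 = std_dev(sample, 0)  # divide by n
print(f"n - 1 form: {s_eq7:.4f}")
print(f"n form:     {s_eq8:.4f}")
print(f"ratio:      {s_eq8 / s_eq7:.4f}")
```

The ratio is $\sqrt{(n-1)/n}$, so it approaches 1 as the sample size increases. Python's `statistics.stdev` and `statistics.pstdev` implement the same two forms.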
Because the standard deviation is used so widely and in so many contexts it is worth a little effort to see how it
relates to probability and random processes. For this I need to digress a little and return to our coin tossing.
7.1 Random walk

This time I want you to imagine you are standing under a lamppost on a long straight pavement running from
left to right. You toss a coin: if it comes up heads you take a step to the right (we’ll call that the positive
direction); if it comes up tails you take a step to the left (the negative direction).
Figure 27: Stepping out.
If you toss the coin for a long time, how far from the lamppost would you expect to find yourself? You might
suggest that on average you’ll find yourself back where you started, but it does seem reasonable that if you do
this for long enough you would drift in some sort of proportion to the number of steps. What we are going to do
is calculate this drift in terms of the number of steps.
At this point I should point out that this is directly related to what you studied in Study Guide 1, Section 3.3 on
random errors, where you saw the role of random noise in data: if you keep taking data the noise reduces but
never goes away. As you have seen if I repeatedly take readings of a particular data point, I’ll get a series of
values which spread around the actual value I’m trying to measure. You rightly expect that if you keep taking
readings then measure the average you’ll find the value you’re looking for with increased accuracy, providing
there are only random errors affecting your measurements. This is of course true and you saw the role of the
standard deviation in estimating the accuracy of a value when several readings are available. Incidentally, it
also helps in understanding a vast range of things which are driven by random processes, anything from how
atoms or molecules move in a gas (Brownian motion – which governs how gases mix) to how components
vary in size during manufacturing to some of the behaviour underpinning share prices.
In order to keep things simple let’s assume that all the steps you take are of exactly one unit in length. (It
doesn’t matter what the unit is; this just enables us to measure distance in step lengths.) If I take N steps, all
in the same direction, then the distance I will move from the lamppost, let’s call it D, will be equal to N. (If we
had measured the step length in, say, metres we should have to multiply N by the length of a step in metres
but as you will see it’s not necessary.)
Generally, $D \le N$; D would equal N if all the steps were in the same direction but that’s very unlikely given
we are choosing the direction with coin tosses.
Activity 44
Why will $D \le N$?
Answer
D is the sum of N steps including their directions. Some steps will be in the negative direction,
cancelling the steps in the positive direction. Therefore N will be the maximum distance. (Of
course, the steps could all be in the negative direction so strictly I should say $|D| \le N$ where $|D|$ is the
magnitude of D).
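What would the probability of all steps being in the same direction be for 20 coin tosses? (Hint: you
might want to go back and look at Bernoulli trials in Section 4.1.)
Answer
This is based on your work in Section 4.1. For all 20 steps to go in one given direction:
$$P = \left(\frac{1}{2}\right)^{20} = \frac{1}{1\,048\,576} \approx 9.5 \times 10^{-7}$$
or about 1 in a million. (Allowing for either direction doubles this, to about 2 in a million.)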
So the value of D is the difference between the sums of all the positive and all the negative steps.
Figure 28: Plot showing results for 300 steps.
We are only interested in the magnitude of D not its direction. This is called the ‘displacement’ and is usually
written as $|D|$. The bars either side of D indicate that we are only interested in the magnitude (this is called the
absolute value or modulus); in other words we will drop the minus sign of D if it turns out to be negative.
7.2 Relating displacement to the normal distribution

Activity 45
What will be the displacement if you take two steps in the positive direction and five in the negative?
Answer
$D = 2 - 5 = -3$, so the displacement is $|D| = 3$ steps.
It will simplify the argument a little if we work with $D^2$ as this is always positive and if we want to know the
displacement we can simply take the square root, $\sqrt{\langle D^2 \rangle}$. This value is referred to as the root mean square of
D or $D_{\mathrm{RMS}}$. (You may have encountered RMS values. They are very common when dealing with alternating
electrical currents, for example mains electricity or the power rating of hi-fi speakers.)
Our question is: ‘What is your best estimate of the range of displacements you will find yourself within from the
lamppost after N steps?’ After 1 step $D^2 = +1$, regardless of direction, so we know to expect a value of 1.
What we would like to estimate is the value of $D^2$ after N steps; let’s call it $D_N^2$. After $N-1$ steps we have a
displacement $D_{N-1}$; then taking one more step changes it by $+1$ if you go to the right or
$-1$ if you go to the left.
After stepping left or right we have either
$$D_N = D_{N-1} + 1 \quad\text{or}\quad D_N = D_{N-1} - 1$$
Looking at the squares of these two possible outcomes we have
$$D_N^2 = D_{N-1}^2 + 2D_{N-1} + 1 \quad\text{or}\quad D_N^2 = D_{N-1}^2 - 2D_{N-1} + 1$$
Since the probability of moving in the positive direction is the same as that for moving in the negative
direction, we can take an average of the two as half of the sum of these expressions. (You will notice
that I have expressed the outcome as $\langle D_N^2 \rangle$ rather than $D_N^2$. This is because it is an expected value (or
expectation value) arrived at by finding the mean of the two outcomes, rather than a measured value which
would vary each time I do the experiment.
You can think of $\langle D_N^2 \rangle$ as the mean value over very many measurements.)
The use of the notation $\langle\ \rangle$ to indicate an expectation value is very common in engineering and physics.
You can think of it as the mean outcome over a very large number of experiments or trials of the form
shown in Figure 25. The important point is that it’s just a number. The angle brackets just indicate there is
a process behind arriving at this number.
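So, on average
$$\langle D_N^2 \rangle = \frac{1}{2}\left[\left(D_{N-1}^2 + 2D_{N-1} + 1\right) + \left(D_{N-1}^2 - 2D_{N-1} + 1\right)\right]$$
which, since the $2D_{N-1}$ terms cancel, simplifies to
$$\langle D_N^2 \rangle = \langle D_{N-1}^2 \rangle + 1$$
Equation 9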
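Now we know that $\langle D_1^2 \rangle = 1$; this simply says that the displacement after your first step is $\pm 1$ depending on
which direction you step but taking the square gives 1 regardless of direction. Then taking a second step, so
N = 2 in Equation 9, we have
$$\langle D_2^2 \rangle = \langle D_1^2 \rangle + 1 = 2$$
Taking a 3rd step we have
$$\langle D_3^2 \rangle = \langle D_2^2 \rangle + 1 = 3$$
and so on. Note how the subscript N is the same as the value of $\langle D_N^2 \rangle$; therefore, $\langle D_N^2 \rangle = N$. So the expected
displacement after N steps is given by
$$D_{\mathrm{RMS}} = \sqrt{\langle D_N^2 \rangle} = \sqrt{N}$$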
This is a very important result; it tells us how far we might expect the sum of a series of unbiased
measurements to wander from their average value over time and it’s almost what we set out to find.
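You can test this result by simulation. The sketch below repeats the lamppost walk many times and compares the root mean square displacement with $\sqrt{N}$. The number of walks, the seed and the function names are arbitrary choices for illustration.

```python
import math
import random

def walk(steps, rng):
    """Final displacement D after `steps` unit steps chosen by a fair coin."""
    return sum(1 if rng.random() < 0.5 else -1 for _ in range(steps))

def rms_displacement(steps, walks, seed=1):
    """Root mean square of D over many independent walks."""
    rng = random.Random(seed)
    mean_square = sum(walk(steps, rng) ** 2 for _ in range(walks)) / walks
    return math.sqrt(mean_square)

for n in (10, 50, 300):
    print(f"N = {n:3d}: simulated RMS {rms_displacement(n, 5000):.2f}, "
          f"sqrt(N) {math.sqrt(n):.2f}")
```

The simulated values scatter a little around $\sqrt{N}$ because only a finite number of walks is averaged.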
We can go just a little further and reason through how the displacement will look in relation to the mean. For
our coin tossing we know the number of heads, $N_h$, plus the number of tails, $N_t$, has to equal the number of
coin tosses which is fixed as N:
$$N_h + N_t = N$$
Activity 46
Before I continue, there are a number of terms that have emerged in this section and you might feel you
are losing the plot a little. Here is a list of the terms I have used so far.
Fill in the description column with the missing terms.
Table 6: Terms used in this section
Term Description
N Number of steps (which equals the number of coin tosses)
Nh
Nt
DN
〈DN〉
RMS
Answer
Table 7: Completed list of terms used in this section
Term Description
N Number of steps (which equals the number of coin tosses)
Nh Number of heads after N coin tosses
Nt Number of tails after N coin tosses
D Displacement (distance from start measured in steps)
DN Displacement after N steps
〈DN〉 Expected displacement after N steps (the mean of a large number of trials with N steps)
〈DN²〉 Mean of the square of the displacement after N steps
|D| Modulus of D, drop the sign
RMS Root mean square (In this case it means drop the sign)
This is often a useful technique if you feel you’re losing the plot.
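What we are interested in is the difference in the numbers of heads and tails:
$$D_N = N_h - N_t$$
Using the expression above to substitute for $N_t$:
$$N_t = N - N_h$$
we can write this in terms of just N and $N_h$.
$$D_N = N_h - (N - N_h) = 2N_h - N$$
This difference is what we estimated as $\sqrt{N}$ so, dividing by 2, I can write:
$$N_h - \frac{N}{2} = \frac{\sqrt{N}}{2}$$
I have to use the root mean square in order to ensure I have a positive value for $N_h - N/2$ so strictly, I should
write this as
$$\left(N_h - \frac{N}{2}\right)_{\mathrm{RMS}} = \frac{\sqrt{N}}{2}$$
This tells me that I am most likely to find my result within $\sqrt{N}/2$ of its mean value.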
7.3 Relating random walk to the experiment in 4.2
Figure 29: Distance from lamppost after 50 steps for four separate walks.
The outcome of each of these walks is a distance from the lamppost. This is the same as the difference
between the number of heads above or below half the number of coin tosses, $N/2$. Each of these could
be an entry in a bin on our original coin tossing experiment in Section 4.2 if we add $N/2$.
Activity 47
What is the value of $\sqrt{N}/2$ for the experiment in Section 4.2?
Answer
There are 50 steps in each trial so
$$\frac{\sqrt{N}}{2} = \frac{\sqrt{50}}{2} = 3.536$$
At the end of Section 7.2 we found that for a random process with N steps we are most likely to find our result
within $\sqrt{N}/2$ of its mean value.
Figure 30: Standard deviation.
Looking again at our prediction for coin tosses we see that
$$N_h = \frac{N}{2} \pm \frac{\sqrt{N}}{2} = 25 \pm 3.5$$
This gives us an estimate of how much our number of heads might vary around N/2 which, in this case, equals
25. Looking at our measurements we can see that a large proportion are within the range 25 +/- 3.5, but not
all. You can see the prediction in Figure 30; it’s the shaded area in the centre of the graph.
In fact this measure covers just over 68% of the range of outcomes if they were distributed as our analysis for
the binomial distribution showed. You may recognise this figure of 68% from Section 6.2 in Study Guide 1.
This is the standard deviation; it is a very useful measure of the width of the curve for the binomial distribution
and it is widely used because the curve has no definite boundary.
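The figure of roughly 68% can be checked directly from the binomial distribution for 50 fair coin tosses. This is a sketch that simply adds up the exact probabilities for whole numbers of heads lying within $\sqrt{N}/2$ of 25; the variable names are illustrative.

```python
import math

N = 50
mean = N / 2
half_width = math.sqrt(N) / 2  # about 3.536

def p_heads(k, n=N):
    """Exact probability of k heads in n fair coin tosses."""
    return math.comb(n, k) / 2 ** n

# Whole numbers of heads that lie within one half-width of the mean.
inside = [k for k in range(N + 1) if abs(k - mean) <= half_width]
coverage = sum(p_heads(k) for k in inside)
print(f"heads from {inside[0]} to {inside[-1]}: coverage {coverage:.3f}")
```

Because the number of heads is discrete, the exact binomial figure is close to, rather than exactly equal to, the value quoted for the continuous normal curve.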
7.4 Full width at half maximum

If you look carefully at Figure 30 above, you’ll see that 4.2 either side of 25 is half of the amplitude (vertical
height) of the binomial distribution curve. I can draw a horizontal line at half the maximum amplitude (around
29) and it will cross the curve at two points. If I then read the horizontal axis at these two points, the difference
between them is the full width at half maximum (FWHM), ∼8.4 in this case.
Figure 31: Full width at half maximum for a normally distributed data set.
Figure 31 shows how we measure the full width at half maximum for a general normally distributed function.
We find the mean, measure the amplitude, then draw a horizontal line across the curve at half that amplitude.
The width between the points of intersection of that line with the curve is the full width at half maximum
(FWHM) as shown in Figure 31. This is another commonly used measure of the width of the normal
distribution.
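For a normal curve the full width at half maximum and the standard deviation are linked by a fixed factor, $2\sqrt{2\ln 2} \approx 2.355$. This is a standard result that is not derived in this text. The sketch below applies it to the coin tossing value $\sigma = \sqrt{50}/2$ as a cross-check on the value read from the graph; the function name is illustrative.

```python
import math

def fwhm(sigma):
    """Full width at half maximum of a normal curve with standard deviation sigma."""
    return 2 * math.sqrt(2 * math.log(2)) * sigma

sigma = math.sqrt(50) / 2
print(f"sigma = {sigma:.3f}, FWHM = {fwhm(sigma):.2f}")
```

A value read from a plotted curve, such as the ∼8.4 above, will only agree with this approximately.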
In brief you should now know how:
1. to describe random walk

2. random walk relates to standard deviation
3. to justify the use of standard deviation for measuring the spread of a distribution
4. to estimate the full width at half maximum for a normally distributed data set.
8 Evaluating results
8.1 Continuous distributions
Most of the data sets we have dealt with thus far have been discrete. That is to say the outcomes are discrete
numbers; 3 heads for the coin tosses, 2 sixes for the dice throws etc. Now I want to turn our attention to
continuous distributions. By continuous I mean they can be subdivided indefinitely; so the points I plotted in
Figure 30 would become a continuous line. You might argue that the resistors in Activity 27 were part of a
continuous distribution (since resistance can take on any value), but we made the data discrete by rounding
into bins. When we measure physical properties such as height, mass, current or time, we are measuring
continuous properties. When we deal with continuous data the most common approach is to make it discrete
by sorting it into bins. As in Activity 27 each bin contains data over some interval, so for example if we
measure height the 177 cm bin contains the number of people with heights between 176.500 cm and 177.499 cm.
8.2 Histograms
Our coin tossing data resembles that for the binomial distribution, but before I continue I need to represent the
data in the form of a histogram, see Figure 32.
Figure 32: Histogram for outcomes of 50 coin tosses measured 500 times. Because the width of each bar is
one, the heights are the proportion of the 500 experiments that have the number of heads indicated
on the horizontal axis.
This is similar to the frequency chart, except now the bars fill the available space and the heights have been
recalibrated. For a frequency chart the heights of the bars equal the frequencies; in a histogram it is the areas
of the bars that represent the frequencies. The areas of the bars in Figure 32 sum to one, so the area
of each bar represents the relative frequency (proportion) of that particular outcome. The process of making
the total area equal to one is called normalisation and you will see the value of doing this shortly.
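The normalisation step can be sketched in a few lines. The example below uses simulated coin tosses rather than the data behind Figure 32, with a bin width of one. The seed and the names are arbitrary choices for illustration.

```python
import random
from collections import Counter

def heads_in_tosses(tosses, rng):
    """Number of heads in a run of fair coin tosses."""
    return sum(rng.random() < 0.5 for _ in range(tosses))

rng = random.Random(0)
trials = 500
counts = Counter(heads_in_tosses(50, rng) for _ in range(trials))

bin_width = 1
# Bar height = frequency / (number of trials x bin width), so that areas sum to one.
heights = {k: c / (trials * bin_width) for k, c in sorted(counts.items())}
total_area = sum(h * bin_width for h in heights.values())

for k, h in heights.items():
    print(f"{k:2d} heads: height {h:.3f}")
print(f"total area = {total_area:.3f}")
```

With a different bin width the heights change but the total area stays at one, which is what allows a histogram to be compared with a probability curve.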
I will now draw a curve on top of my histogram which is calculated from a continuous function. You can think of
this as a continuous version of the data points we obtained when we produced the binomial distribution.
Figure 33: The normal curve overlaid on the histogram of Figure 32.
This curve is called the normal curve and we can use it to analyse our data. I will come to how we calculate
this curve for a given data set in a moment. But let’s look at how our coin tossing data fits this curve
qualitatively. You can see two things:
the mean of the distribution is roughly where the maximum of the curve falls.
the FWHM for both data and the curve are roughly the same.
The normal distribution is determined entirely by two values: the mean and the standard deviation. The coin
tossing data we have been using is very nearly normally distributed but often we won’t know what the distribution of our
data is – in many cases we assume it is normally distributed and if the underlying uncertainty is random this
assumption will turn out to be sound.
8.3 The normal distribution

The normal distribution is a continuous function, described by the equation
$$f(z) = \frac{1}{\sqrt{2\pi}}\,e^{-z^{2}/2}$$
Equation 10
where the variable, z, is called the normal standard variate. You don’t need to be concerned with the detail of
the derivation of this equation which is beyond this text; essentially it comes from allowing our walker’s step
length to vary a little. The important thing is to understand its behaviour and to be able to apply it.
Remember f(z) says that z is the independent variable and f the dependent variable. Or f varies as a
consequence of z varying.
Here the constants e and $\pi$ have their usual values of 2.718… and 3.142… respectively.
Activity 48
Looking at Equation 10:
what is the value of $f(z)$ when $z = 0$?
Answer
When $z = 0$ we have
$$-\frac{z^{2}}{2} = 0$$
and
$$e^{0} = 1$$
so
$$f(0) = \frac{1}{\sqrt{2\pi}} \approx 0.399$$
What is the value of $f(z)$ when z is a large positive value?
Answer
Remember we can get rid of the minus sign using $e^{-x} = 1/e^{x}$; as z becomes large the denominator
on the right hand side will become very large, because the exponent goes as $z^2$, so the term $e^{-z^{2}/2}$
will tend to zero.
What is the value of $f(z)$ when z is a large negative value?
Answer
Since we are squaring z it really doesn’t matter if the exponent is positive or negative – the squares will
be the same so the plot for negative z is exactly the same as for positive z.
This curve is shown as a function of z in Figure 34.
Figure 34: The normal distribution f(z)
You can clearly see the behaviour you described in your answer to the question above for large $|z|$ and for
$z = 0$. f(z) is called the probability density function for the normal distribution.
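The behaviour can also be tabulated. This sketch takes Equation 10 to be the standard normal density, $f(z) = e^{-z^2/2}/\sqrt{2\pi}$, and evaluates it at a few values of z.

```python
import math

def f(z):
    """Standard normal probability density."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

for z in (0, 1, -1, 3, -3, 6):
    print(f"f({z:2d}) = {f(z):.6f}")
```

The values for +z and −z are identical, and the function falls away rapidly as z moves from zero in either direction.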
Now we need to relate this curve to an observed distribution such as was shown in Figure 29. The only
variable in Equation 10 is z and we will now use this expression to relate the two.
$$z = \frac{x - \mu}{\sigma}$$
Equation 11
Here $\mu$ and $\sigma$ are the mean and standard deviation respectively. We are now using $\mu$ and $\sigma$ because these
are not measured values but values for an assumed normal population distribution.
The plot for this curve as a function of x is shown in Figure 35.
Figure 35: The normal curve as a function of x centred at µ, standard deviation σ and an interval f(x) × δx.
You can think of this curve as telling you the probability of finding our coin tossing walker a distance $x - \mu$ from
the mean, $\mu$. This distribution is what we would observe if the random walk we took earlier had a large
number of steps and we allow for slight variations in the step length. The slight variations turn the outcomes of
the random walk into a continuous distribution.
In practice you would look for our walker over some small interval, say, from $x$ to $x + \delta x$ rather than at a
precise position ($\delta x$ is a very small change in $x$). This is a small area under the curve ($\delta x$ wide and $f(x)$ in
height) as shown in Figure 35.
We are now in a position to match our distribution to some data. Let’s take the data from the first coin tossing
experiment.
Activity 49
The answer to this question assumes you are able to use a spreadsheet to carry out the large number of
repetitive parts of the calculations.
Calculate the mean and standard deviation for the data shown in Figure 15, which you first saw in
Section 4.2. Look again at Equation 8, which you first saw in Section 7, following Activity 43. I’ve
repeated it below.
Figure 15: Number of heads for 50 coin tosses over 500 trials.
$$s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^{2}}{n}}$$
Equation 8
Answer
The mean is the sum of all the data divided by the number of data points:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
The standard deviation is given by Equation 8 where $x_i$ is the measured value of each of the data
points.
How do these compare to the mean and standard deviation previously calculated based on theory?
Answer
Looking back to section 7.3, we expect a mean of 25 and a standard deviation of 3.536. So we’re within
0.8% and 0.3% respectively.
In the case above we knew the mean and standard deviation for the entire population but it is often the case
that we only have the measured sample to work with. We assume that the data will be normally distributed,
but this is often just an assumption since there may be non-random factors influencing the data.
8.4 Using the standard normal curve

We now return to the curve as a function of z.
Activity 50
Using Equation 11, what would be the value of $z$ if $x - \mu = \sigma$?
Answer
$z = 1$, so this means that the horizontal, or z, axis is calibrated in units of $\sigma$ or one standard deviation.
Figure 36: The normal curve where z is measured in standard deviations.
The total area under the curve for $f(z)$ is given by this integral
$$\int_{-\infty}^{\infty} f(z)\,dz$$
(Remember integrating a function finds the area between the limits of integration, the horizontal axis and the
curve. There’s a slight complication if the curve falls below the horizontal axis, because then the areas are
negative, but that is not the case here.)
This integral equals one. This is the total area under the curve and remember, the area under the curve
represents the probability of finding our coin tossing walker somewhere. If we look everywhere this has to be
one, because the walker must be somewhere.
If we want to know the probability over a large interval, say, between $z = A$ and $z = B$, we would be looking to
find the area under the curve shown shaded in Figure 37.
Figure 37: Probability over a large interval.

This amounts to evaluating the integral from A to B
$$P(A \le z \le B) = \int_{A}^{B} \frac{1}{\sqrt{2\pi}}\,e^{-z^{2}/2}\,dz$$
Equation 12
Remembering Equation 11 which I have repeated below
$$z = \frac{x - \mu}{\sigma}$$
Equation 11
we need to input values for the mean $\mu$ and the standard deviation $\sigma$. Then what we have here is a standard
way of measuring probability over some interval. For example: if we can measure enough data to get an
estimate of $\mu$ (referring back to Section 7, that is $\bar{x}$), and of $\sigma$ (which is s) we can
read information from a standard source in order to make statistical inferences about the population
distribution. This is exactly what we need to help us assess how height might limit our market in the case of
the vehicle I talked about at the start of this section.
Before you rush to solve Equation 12, I should point out that there is no analytical method for doing
this in general. Fortunately, once you have cast your distribution in the standard form of Equation 12 you can
simply look up the values in a table, as given in Figure 38.
Figure 38: The area under a standard normal curve as a function of z.
What we are interested in is the area under the curve up to some cut off value. Looking at Figure 38, what we
have within the body of the table are areas under the normal distribution curve measured from the centre,
$z = 0$, to some point given by the figure in the left-hand column which is added to the figure in the top row.
So, for example, the corresponding value for z = 1.72 is found by looking up 1.7 in the first vertical column
then 0.02 in the first row, the intersection of the two giving 0.4573.
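If a table is not to hand, the same areas can be computed from the error function, which is available in most programming languages. The sketch below uses the standard identity that the area from 0 to z under the standard normal curve is $\tfrac{1}{2}\operatorname{erf}(z/\sqrt{2})$; the function name `area_from_centre` is illustrative.

```python
import math

def area_from_centre(z):
    """Area under the standard normal curve between 0 and z (for z >= 0)."""
    return 0.5 * math.erf(z / math.sqrt(2))

for z in (1.00, 1.72, 2.00):
    print(f"z = {z:.2f}: area = {area_from_centre(z):.4f}")
```

Rounded to four decimal places, the value for z = 1.72 should match the table entry quoted above.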
Activity 51
Look up the following areas under the normal distribution curve from $z = 0$ to:
a. 1.30
b. 2.71
c. 3.90
Answer
a. 1.30 : 0.4032
b. 2.71 : 0.4966
c. 3.90 : 0.5
You can see that all of the values in the last row 3.90 to 3.99 are the same at 0.5.
Why is this?
Answer
First, we are looking at half of the distribution. Given that it is symmetrical about $z = 0$, the area will be
half of the total which is 1/2, since the total is 1. Second, looking at the shape of the curve it tends to
zero as $z \to \infty$. In fact by the time $z = 3.9$ the four decimal places we are using mean that the value
is rounded to 0.5.
So, now it is possible for you to find the proportion of a population lying between z = 0 and some positive
value of z using Figure 38. Similarly, using the other side of the distribution, it is possible to find the proportion
from some negative value of z up to 0. Summing these two we can find the proportion over any range expressed
as standard deviations.
Example 5
We can find the proportion of a normally distributed population lying between −1.21 and 2.11 standard
deviations of the mean.
First find the proportion between z = −1.21 and 0. This is the same as the proportion between 0 and 1.21
which from Figure 38 is 0.3869. Then do the same from 0 to 2.11 which gives 0.4826. Summing these
two we have 0.8695, so 86.95% of the population lie in the range −1.21 to 2.11 standard deviations.
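The same proportion can be found without splitting the range at zero by using the cumulative distribution. This sketch uses `NormalDist` from Python's standard `statistics` module (Python 3.8 or later); the function name `proportion_between` is illustrative.

```python
from statistics import NormalDist

def proportion_between(z_low, z_high):
    """Proportion of a normal population between two z values."""
    standard = NormalDist()  # mean 0, standard deviation 1
    return standard.cdf(z_high) - standard.cdf(z_low)

print(f"{proportion_between(-1.21, 2.11):.4f}")
```

The result may differ from the table-based figure in the fourth decimal place, because each table entry has already been rounded.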
Activity 52
Find the proportion of a normally distributed population between the following standard deviations.
a. −1.55 to 1.22
b. −2.2 to 2.2
c. 1.0 to 2.0 (note: these are both positive)
d. 2.0 to 3.0
Answer
a. 0.4394 + 0.3888 = 0.8282

b. 0.4861 × 2 = 0.9722
c. 0.4772 − 0.3413 = 0.1359 (we want the proportion to be positive so we put the larger number first)
d. 0.4986 − 0.4772 = 0.0214
From this point it is possible to define a process for answering questions relating to the probability of finding
outcomes in a certain range if we know the standard deviation and the mean of our population. Now we are
ready to do some engineering examples.
Example 6
A bearing is considered defective if it has a diameter greater than 290.00 mm. In a batch of 1500
bearings the mean diameter is 281.00 mm and the standard deviation is 3.00 mm. We can calculate how
many are likely to be classed as defective.
$\mu = 281.00$ mm and $\sigma = 3.00$ mm; for a defective component we have $x > 290.00$ mm. This is
because our range is defined by the upper limit on a component’s diameter.
Substituting into Equation 11 we have
$$z = \frac{x - \mu}{\sigma} = \frac{290.00 - 281.00}{3.00} = 3.00$$
Looking at the table in Figure 38 we have a corresponding value of 0.4987. We have to add on the 0.5
from the other half of the distribution giving 0.9987 as the proportion that will be below this upper limit or
99.87%. So, $1 - 0.9987 = 0.0013$ or 0.13% are defective.
Finally, we need to know the number of defective components. This is the product of the proportion which
are defective with the number in the batch, $0.0013 \times 1500 = 1.95$, so we expect 2 defective
components.
Activity 53
A designer of aircraft seats is interested in how people’s masses are distributed. Given the data that the
masses of 600 people are normally distributed with a mean of 68.5 kg and a standard deviation of 5.80
kg answer the following questions.
a. How many people are likely to have a mass of less than 55.0 kg?
b. How many are likely to have a mass greater than 70.0 kg?
Answer
a. We’re looking for masses, m, less than 55.0 kg.
Substituting into Equation 11 below:
$$z = \frac{x - \mu}{\sigma}$$
Equation 11
We have: $z = \frac{55.0 - 68.5}{5.80} = -2.33$. We’re looking for a small proportion that fall well below the
mean so we expected a negative value for z.
Looking at Figure 38 we have a corresponding value of 0.490 (looking at the values either side only 3 sig
figs are justified). We, again, have to add on the 0.5 from the other half of the distribution giving 0.990 as
the proportion that will be above this lower limit. So, $1 - 0.990 = 0.010$ or 1.0% have a mass of less
than 55 kg.
Of 600 people, $600 \times 0.010 = 6.0$, or roughly 6, will have a mass of less than 55 kg.
b. This time m > 70 kg, so $z = \frac{70.0 - 68.5}{5.80} = 0.26$, giving a value of 0.1026. So adding on the 0.5
and subtracting the result from 1 gives the proportion as 0.3974 (in fact only 2 sig figs are justified) so
the proportion is 0.40. This amounts to 240 people in our sample of 600.
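The bearing and seat-design calculations follow the same pattern, so they can be wrapped in one small helper. The sketch below uses `NormalDist` from Python's `statistics` module (Python 3.8 or later); `expected_count` is a hypothetical name. Because it does not round z or the table values, the results differ slightly from the hand calculations.

```python
from statistics import NormalDist

def expected_count(total, mean, sd, lower=None, upper=None):
    """Expected number of items, out of `total`, lying between lower and upper."""
    dist = NormalDist(mean, sd)
    p_low = dist.cdf(lower) if lower is not None else 0.0
    p_high = dist.cdf(upper) if upper is not None else 1.0
    return total * (p_high - p_low)

# Example 6: bearings with a diameter greater than 290.00 mm
bearings = expected_count(1500, 281.00, 3.00, lower=290.00)
# Activity 53: masses below 55.0 kg and above 70.0 kg
light = expected_count(600, 68.5, 5.80, upper=55.0)
heavy = expected_count(600, 68.5, 5.80, lower=70.0)

print(f"defective bearings: {bearings:.1f}")
print(f"people under 55 kg: {light:.1f}")
print(f"people over 70 kg:  {heavy:.1f}")
```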
8.5 Six sigma

It is quite common for engineers to talk in terms of standard deviations when referring to a process. One
particularly common phrase is ‘six sigma’. This originally referred to a value which is six standard deviations
from the mean. This is often considered a standard of excellence and the term, six sigma, is used to describe
a set of tools for process improvement which goes well beyond its origins. The implication is that the
process is error-free up to six standard deviations from the mean.
Activity 54
To what value of z does this equate?
Answer
z is measured in units of standard deviations, so that would be z = 6.
Looking at Figure 38, values of z beyond 3.8 correspond to an outcome probability of 1. Clearly the 4 decimal
places of the table are insufficient for z = 6. In fact going out so far as $z = 6$ gives 99.99999988% coverage. So,
if this were used as a manufacturing standard, only one component in $10^9$ would be faulty. Impressive? For
many industries this could mean never observing a faulty component. However, there are industries where
safety is critical, the airline industry for example, where six sigma or roughly 1 in a billion might not be enough.
Aircraft manufacturers meet this standard for the probability of mechanical failure per landing, however, across
an entire fleet (say all the Boeing 777s) it is conceivable that the total number of landings might equal one
billion over the life of the fleet.
Activity 55
What would be the probability of at least one failure for $10^9$ landings with a failure probability of $10^{-9}$ per landing?
Answer
Remember, this follows a binomial distribution so we first ask what is the probability of no failures?
Then subtract the result from 1.
$$P(\text{no failures}) = \left(1 - 10^{-9}\right)^{10^{9}} \approx 0.37$$
This gives a probability of at least one failure of 0.63, so over 60%. This is not terrible given the large
number of landings across the fleet, but it is significant.
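Raising a number very close to 1 to a very large power is awkward on a calculator and can lose precision in floating-point arithmetic. A minimal sketch of one way round this is shown below, taking one billion landings and a failure probability of one in a billion per landing, as in the discussion above. It works with logarithms; the function name is illustrative.

```python
import math

def p_at_least_one(n, p):
    """Probability of at least one failure in n independent trials.

    Computes 1 - (1 - p)**n via logarithms to avoid rounding loss when p is tiny.
    """
    return -math.expm1(n * math.log1p(-p))

print(f"{p_at_least_one(10**9, 1e-9):.4f}")
```

The answer is very close to $1 - 1/e$, which is what the binomial expression tends to when the number of trials is the reciprocal of the failure probability.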
In brief you should now know:
1. how to construct a histogram from a set of data
2. how to distinguish between a frequency diagram and a histogram
3. how to describe the normal distribution as a function of z
4. how to perform estimates given areas under a standard normal curve.
9 Percentiles
You may remember that I referred to percentiles at the beginning of this Study Guide. It is often the case that
we are interested in some proportion of the normal distribution and so it is common practice to divide the
range into commonly used proportions. For example, we divide the distribution into four equal portions which
are called quartiles. Named appropriately: the first, second, third and fourth quartile. See Figure 39.
Figure 39: 1st, 2nd, 3rd and 4th quartiles.
In a similar way, it is useful to split the normal distribution into 100 equal parts called percentiles. There is a
little ambiguity about how the term ‘percentile’ is used. Some texts refer to it as the range up to a given point.
For example, an outcome in the 77th percentile would fall within the first 77% of the distribution. Others use it
as a measure beyond that point. In this text we will try to avoid ambiguity and refer to the 77th percentile as the
outcomes that fall between 76.5% and 77.499…% (see Figure 40).
Figure 40: The 77th percentile. A is the region below the 77th percentile and B the region above the 77th
percentile.
Now I want to reverse the techniques we used to produce the data in Figure 38, that is, to state the range
within which a certain proportion of observations lie; assuming, of course, it is normally distributed.
Example 7
50 components are drawn from a large batch; they are found to have a mean diameter of 50.00 mm and a
standard deviation of 0.30 mm. We can find the diameter below which 95% of the components fall.
$\bar{x} = 50.00$ mm and $s = 0.30$ mm. This time we are given the proportion as 0.95 and we need to find
z. Referring to the body of Figure 38 we are looking for the value closest to 0.45 (remember we need to
subtract 0.5, if the proportion is greater than 0.5, in order to use the table). The nearest values I can find
are 0.4495 and 0.4505, which give 1.64 and 1.65 respectively. We’ll take the mean of these two values,
which is $z = 1.645$. Now we need to convert this to find the diameter. In order to do this we have to
assume that our measured values are equal to the true values for the mean and standard deviation. That is
$\mu = \bar{x}$ and $\sigma = s$. This is of course an approximation since we don’t know these values for the whole
distribution, only for the sample. In some cases we will know the mean and standard deviation; in most cases we will
be working from a sample.
Example 8
We can convert from z to x.
Looking at Equation 11, which defines z, we can transpose to give x:
$$x = \mu + z\sigma$$
Using this result we have $x = 50.00 + 1.645 \times 0.30 = 50.49$ mm. (The result is expressed to the
same number of significant figures we’re given in the question.)
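We can also calculate the result we would have obtained using z values of:
a. 1.64
b. 1.65
We get:
a. 50.00 + 1.64 × 0.30 = 50.492 mm, or 50.49 mm
b. 50.00 + 1.65 × 0.30 = 50.495 mm, or 50.50 mm when rounded
We can see that the choice makes very little difference: the results agree to within 0.01 mm.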
We can now see how to obtain a number when given a percentile for a normally distributed population. This
number could be a number of people, components or of any items in a given population. We look up the
percentile (once we have converted it to a decimal value) in the body of the chart and identify its normal
standard variate, z. Having done this, we can then convert from z to x.
Our procedure will be as follows: take a set of data then:
1. if we don’t know the mean we must calculate it from the data
2. similarly if we don’t know its standard deviation we calculate that
3. select a desired proportion of the distribution and look up its normal standard variate z.
4. multiply z by the standard deviation and add the mean.
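The four steps above can be sketched in a few lines of Python. This is only an illustration, not part of the original guide: it uses the standard library's `statistics.NormalDist` in place of the table lookup in Figure 38, and the function name `percentile_value` is a hypothetical helper introduced here. The numbers are those of Example 7.

```python
from statistics import NormalDist


def percentile_value(mean, std_dev, proportion):
    """Return x such that `proportion` of a normal population lies below x."""
    # Step 3: find z for the desired proportion (replaces the table lookup).
    z = NormalDist().inv_cdf(proportion)
    # Step 4: multiply z by the standard deviation and add the mean.
    return mean + z * std_dev


# Example 7: mean 50.00 mm, standard deviation 0.30 mm, 95% of components.
z_95 = NormalDist().inv_cdf(0.95)
x_95 = percentile_value(50.00, 0.30, 0.95)
print(f"z = {z_95:.3f}, x = {x_95:.2f} mm")
```

The computed z agrees with the value of 1.645 obtained by averaging the two nearest table entries.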
Activity 56
Assuming we don’t know anything other than the measured values, find the 95th percentile for the set of
resistor values given in Activity 27, Table 4.
Answer
The measured mean is 199.995 Ω.
The standard deviation is 1.381 Ω.
(If the sample size is assumed to be very large, and the approximation $n \approx n - 1$ is used, then the
calculated standard deviation is 1.363 Ω, which will have some effect on the rest of the calculation below.)
I am looking for 95%, which corresponds to a value of z = 1.65. Remember z is measured in standard
deviations, so I can multiply by the sample standard deviation to give $1.65 \times 1.381 = 2.279$ Ω. So,
adding the mean, the 95th percentile of my sample is 202.27 Ω.
In brief, you should now know:
1. what percentiles refer to
2. how to translate percentiles into a numerical value which represents a proportion of a population
10 A brief introduction to sampling
Here we are interested in the results obtained from extracting several samples from a population. Suppose
each sample contains n members from a total population of $n_p$. For example, it might be that we are
manufacturing some components and are interested in their mass. We know the mean mass and standard
deviation to which we are manufacturing. Periodically we extract n components and measure their mean
mass. This is easy to do, as we can just weigh all n components.
Activity 57
If we weigh 30 components and the total mass is 6.5 kg, what is the mean mass, $\bar{x}$, of the sample?
Answer
$$\bar{x} = \frac{6.5\ \text{kg}}{30} \approx 0.22\ \text{kg}$$
So in this case it’s very easy to find the mean of the sample.
If we select m samples, each containing n members, from a batch containing $n_p$ members, how do the mean
and standard deviation of the m samples compare to the mean and standard deviation of the whole
population of $n_p$ members?
Activity 58
How many ways can I select a sample of 2 letters from the list below?
A, B, C, D, E
Answer
AB, AC, AD, AE,
BC, BD, BE
CD, CE
DE
So there are 10.
Note: this assumes I don’t return the first letter to the selection pool before selecting the second. So I
don’t allow repetition in combinations like AA and I don’t allow repetition of combinations, for example
AB, AB. We say the sample is selected without replacement.
It can be shown that the number of different ways I can select n items from $n_p$ is
$$\frac{n_p!}{n!\,(n_p - n)!}$$
For Activity 58 this gives $5!/(2!\,3!) = 10$. So, if n is small and $n_p$ is large, there will be a very large
number of ways of selecting n for any sample. As a result, if I extract small samples from a large population
I will find a wide range of mean values across the samples.
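The count in Activity 58 can be checked by listing the pairs directly. The short sketch below uses Python's standard `itertools.combinations` and `math.comb`; the batch of 1,000 and sample of 30 at the end are illustrative numbers only, chosen to show how quickly the count grows.

```python
from itertools import combinations
from math import comb

letters = ["A", "B", "C", "D", "E"]

# Enumerate every unordered pair, selected without replacement.
pairs = ["".join(pair) for pair in combinations(letters, 2)]
print(pairs)

# The enumeration and the factorial formula should agree.
print(len(pairs), comb(len(letters), 2))

# Illustrative only: ways of drawing 30 items from a batch of 1,000.
large_count = comb(1000, 30)
print(f"{large_count:.3e}")
```

Even for this modest batch the number of possible samples is astronomically large, which is why different samples can give noticeably different means.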
Over several samples (each of n components) we could find the standard deviation of the sample means.
This is called the standard error of the means and is given the symbol $\sigma_{\bar{x}}$. You can think of
these as being the statistics of statistical results, if this helps: the mean of means and the standard error of
means, which is very like the standard deviation of a sample.
It is possible to give an expression for the standard error of the mean for a sample of size n taken from a
larger population of size $n_p$. This plays a similar role to the standard deviation of a measurement. The
expression for the standard error of the mean, $\sigma_{\bar{x}}$, is given by
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{n_p - n}{n_p - 1}}$$
Equation 13
where $\sigma$ is the standard deviation for the population, $n_p$.
There are cases where we might not even know, or be able to find out, the size of the population, $n_p$. For
example, this would be the case if the m samples were taken over time from a continuous process such as a
production run.
This expression assumes that items are not replaced after sampling, i.e. there’s no chance of choosing the
same item twice. In a large population this is improbable anyway, but in a small population it is significant.
In fact, if the population is large, then
$$\sqrt{\frac{n_p - n}{n_p - 1}} \approx 1$$
Equation 14
This is because the number we sample, n, is tiny compared to the population size, $n_p$. Inserting this result
into Equation 13 gives
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
Equation 15
So, this equation is used for large $n_p$ or where each item is replaced before the next is selected. You will
see that Equation 15 is independent of $n_p$. This is good news, because it means we only need to know that
the batch is large compared to the sample, but we don’t need to know exactly how large.
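As a rough sketch of how the two expressions compare, the Python below assumes Equation 13 has the usual finite-population form $\sigma_{\bar{x}} = (\sigma/\sqrt{n})\sqrt{(n_p - n)/(n_p - 1)}$ and Equation 15 the form $\sigma_{\bar{x}} = \sigma/\sqrt{n}$. The function names and the numerical values are hypothetical, chosen only for illustration.

```python
from math import sqrt


def correction_factor(n, population):
    """Finite-population factor sqrt((n_p - n) / (n_p - 1)), assumed form."""
    return sqrt((population - n) / (population - 1))


def standard_error(sigma, n, population=None):
    """Standard error of the mean.

    With population=None the simple form sigma / sqrt(n) is used
    (large batch, or sampling with replacement). Otherwise the result
    is multiplied by the finite-population factor.
    """
    se = sigma / sqrt(n)
    if population is not None:
        se *= correction_factor(n, population)
    return se


# Hypothetical values: sigma = 2.0, samples of 25 items.
simple = standard_error(2.0, 25)
small_batch = standard_error(2.0, 25, population=100)
large_batch = standard_error(2.0, 25, population=10_000)
print(simple, small_batch, large_batch)
```

For the small batch the factor is well below 1, so the correction matters; for the large batch the two results are almost identical, which is the point of the approximation.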
So what of n, the sample size? As a rule n is taken to be greater than 30; this ensures a reasonable statistical
significance.
Activity 59
Samples of 30 components are extracted and measured from a batch of over 1,000. The standard error
of the mean is required to have an accuracy of 90%. Should I use Equation 13 or Equation 15
to determine $\sigma_{\bar{x}}$?
Answer
This amounts to testing the approximation used in Equation 14. We would like to know whether the factor in
Equation 14 is at least 0.90, that is, within 10% of 1. If it is, the approximation is valid.
If we look at the approximation in Equation 14 we need values for $n_p$ and n. We know the size of each
sample, $n = 30$.
The batch contains at least 1,000 components, so the approximation will be poorest when $n_p = 1000$.
Putting these values into Equation 14 gives
$$\sqrt{\frac{1000 - 30}{1000 - 1}} = \sqrt{\frac{970}{999}} \approx 0.985$$
so I will get about 99% accuracy using the simpler equation (Equation 15).
If we took all possible samples of size n from a population $n_p$, the mean value of the sampling distribution of
means, $\mu_{\bar{x}}$, would equal μ, the population mean. In reality we don’t take all possible samples but,
providing the sample size is over 30, as previously mentioned, we can say that
$$\mu_{\bar{x}} = \mu$$
where μ is the population mean.
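A quick simulation can make this plausible. The sketch below is an illustration under stated assumptions rather than a proof: it draws repeated samples from a normal population with hypothetical values μ = 100 and σ = 5, treats the population as effectively unlimited (so the simple form σ/√n is the relevant prediction), and compares the mean and spread of the sample means with those predictions.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(1)  # fixed seed so the run is repeatable

MU, SIGMA = 100.0, 5.0   # hypothetical population mean and standard deviation
N, M = 36, 5000          # sample size and number of samples (illustrative)

# Draw M samples of N values each and record the mean of every sample.
sample_means = [
    mean([random.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(M)
]

mean_of_means = mean(sample_means)   # should be close to MU
sd_of_means = pstdev(sample_means)   # should be close to SIGMA / sqrt(N)
predicted = SIGMA / sqrt(N)

print(f"mean of means = {mean_of_means:.3f} (population mean {MU})")
print(f"sd of means   = {sd_of_means:.3f} (predicted {predicted:.3f})")
```

With only a finite number of samples the agreement is approximate, which mirrors the statement above that in practice we do not take all possible samples.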
Activity 60
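The lengths of 3,000 bolts are normally distributed about a mean of 25.40 mm with a standard deviation
of 0.20 mm. If random samples are taken of 40 bolts, we can predict the standard error of the means if
samples are taken:
a. with replacement
b. without replacement
Answer
a. For replacement we can use Equation 15:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.20}{\sqrt{40}} \approx 0.0316\ \text{mm}$$
b. Without replacement we will use Equation 13:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{n_p - n}{n_p - 1}} = \frac{0.20}{\sqrt{40}} \sqrt{\frac{3000 - 40}{3000 - 1}} \approx 0.0314\ \text{mm}$$
To two decimal places both results are 0.03 mm, so this is effectively the same as part a, because the
population is large compared to the sample size.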
Activity 61
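A normally distributed population of 5,000 people has a mean of 170 cm and a standard deviation of
8.2 cm. Random samples of 30 people are measured. Find the standard error of the means if samples
are taken:
a. with replacement
b. without replacement
Answer
a. For replacement we can use Equation 15:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{8.2}{\sqrt{30}} \approx 1.50\ \text{cm}$$
b. Without replacement we will use Equation 13:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{n_p - n}{n_p - 1}} = \frac{8.2}{\sqrt{30}} \sqrt{\frac{5000 - 30}{5000 - 1}} \approx 1.49\ \text{cm}$$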
In closing, it is important to emphasise that the samples must be selected at random; any bias in the sampling
process will bias the result. Obviously, the larger the sample, n, the closer the measured mean and standard
deviation will be to the actual values for the population, providing the sample is selected at random.
11 Working with uncertainty

We have come a long way from asking how we estimate whether we’re meeting the needs of 95% of the
population using anthropometry, or how many components to sample in a batch. Some of the analysis may
have seemed quite abstract and challenging to follow, but the results we obtained are real and applicable to
most branches of engineering. It is unlikely that you got to grips with the subject based on a single reading.
As with most mathematically based subjects, practice and understanding go hand in hand and, as time goes
by, you will find your familiarity with the subject will open doors.
In studying this section you have seen how uncertainty can be related to statistical distributions. You have
encountered many conceptual models which enable predictions to be made, even when the measured
outcomes are random. We started by looking at ways of combining probabilities. We looked at Bernoulli trials
and how we can use the binomial expansion to combine outcomes of simple combinations of random events.
From this we extracted the binomial distribution and, by extending the binomial distribution to continuous data,
we found the normal distribution.
Many unbiased random processes are, to a good approximation, normally distributed. We use this fact to
enable us to sample outcomes from large populations and, by assuming they are random and therefore
approximately normally distributed, we can make estimates of where various segments of our population can
be found.
Bibliography
Bird, J., (2014) Higher Engineering Mathematics (7th Ed.). London, Routledge.
Boas, M.L., (2006) Mathematical Methods in the Physical Sciences (3rd Ed.). Hoboken, New Jersey, Wiley.
Feynman, R.P., Leighton, R.B., Sands, M., (1963) The Feynman Lectures on Physics, Volume 1. Reading,
Massachusetts, Addison-Wesley.
