Y13 IAL Statistics Workbook - Student v1
STATISTICS
FOR BIOLOGY
Year 13 IAL Biology – Statistics for Biology
Introduction
This booklet aims to prepare you for your Unit 6B exam paper, Practical Biology and Investigative
Skills. This paper contains three questions:
1) Core practical – you will be expected to evaluate a core practical, usually from the A2 course, but
it can include AS core practicals.
2) Data analysis – you will be expected to analyse data from an experiment, including drawing a
results table, graph and statistical analysis of the results. This booklet will help most with this
question. You should aim to complete this section of the paper very quickly to allow time for the
(more demanding) Q3.
3) Experimental design – you will be expected to design an experiment to test a hypothesis. This
booklet will also be useful for answering this question.
This is not (thankfully!) an exhaustive guide to the finer details of statistical analysis. It covers only the
main aspects that you are likely to encounter in the Unit 6B paper. This guide is designed to be used
as a self-study unit, before you start the A2 Biology course, but your teacher will probably make some
time available at the end of the course to go over the material. If you have any questions, you should
know by now to find your teacher and ask them to go over the work with you.
Ernest Rutherford (New Zealand-born British physicist; investigated radioactivity; Nobel Prize for Chemistry, 1908):
“If your experiment needs statistics, you ought to have done a better experiment.”
Benjamin Disraeli (British Politician and Prime Minister) / Mark Twain (American writer):
“There are three kinds of lies: lies, damned lies, and statistics.”
Contents
1) Summarising data
(a) Graphs
(b) Summarising normally-distributed data
(c) Standard deviation (SD)
(d) Summarising skewed data
It is often convenient to summarise large sets of experimental data to make analysis easier. This can
be done by calculating the mean, median or mode of the data and these can be represented visually
in a graph.
• mean: sum of all the values divided by the number of values
• median: middle value in the list when arranged in rank order (for an even number of values, the
mean of the two middle values)
• mode: measurement that occurs the most frequently
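As a quick check on these definitions, the short Python sketch below calculates all three averages with the standard library's `statistics` module. The data values are made up for illustration and are not from any experiment in this booklet.

```python
from statistics import mean, median, multimode

# Hypothetical example data (not from this booklet)
values = [4, 7, 7, 2, 9, 7, 3, 5]

data_mean = mean(values)        # sum of the values / number of values = 44 / 8
data_median = median(values)    # rank order 2,3,4,5,7,7,7,9 -> mean of 5 and 7
data_modes = multimode(values)  # most frequent value(s)

print(data_mean, data_median, data_modes)  # 5.5 6.0 [7]
```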
a) Graphs
• line graphs – when the independent variable is continuous
• bar charts – when the independent variable is discontinuous / categoric
• scatter graphs – similar to a line graph, but used where there may be no clear correlation, or where
there is no clear independent variable (i.e. you may just be looking to see if two variables
correlate, without looking for any causative relationship)
• histograms – continuous independent variable, grouped into equal-sized categories, for
convenience
Generally, results show either a normal distribution, symmetrical about the mean, or a skewed
distribution, in which the peak (the mode) lies to one side of the mean.
To see which type of distribution a set of results shows, we can use frequency tables:
i. find the range of the data (from the lowest to the highest value)
ii. split the range into an appropriate number of equal-sized classes (make sure there are no
overlaps)
iii. tally (count up) the number of results falling into each class
iv. produce a histogram of the results (a proper histogram, not like the silly Maths ones!)
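The tallying steps above can be sketched in Python. This is a minimal illustration, assuming whole-number data and classes that begin at a chosen lower bound; the function name `frequency_table` is hypothetical. Each class covers lower ≤ value < upper, so no value can fall into two classes. The example uses the sheltered-shore dog whelk heights from Q3 of this booklet.

```python
def frequency_table(values, start, width):
    """Tally values into equal-width, non-overlapping classes.

    Class i covers start + i*width <= value < start + (i+1)*width.
    Assumes every value is >= start. Returns (lower, upper, count) tuples.
    """
    n_classes = int((max(values) - start) // width) + 1
    counts = [0] * n_classes
    for v in values:
        counts[int((v - start) // width)] += 1  # find the class v falls into
    return [(start + i * width, start + (i + 1) * width, c)
            for i, c in enumerate(counts)]

# Dog whelk heights on sheltered shores (mm), from Q3
heights = [22, 23, 26, 29, 30, 31, 32, 32, 33, 33, 35, 39]

for lower, upper, count in frequency_table(heights, start=20, width=5):
    # upper - 1 is the top of the class for whole-number data
    print(f"{lower}-{upper - 1} mm: {'|' * count} ({count})")
```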
Both of the data sets above have the same mean and range, but the distribution of the data is different.
We can get a better idea of the spread of the results around the mean by calculating the standard
deviation of the results.
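Standard deviation measures the typical spread of values around the mean. The sketch below assumes the sample form, dividing by n - 1 (this is what Python's `statistics.stdev` uses); some textbooks divide by n instead. The two data sets are hypothetical, chosen so that they share the same mean (5) and range (6) but differ in spread.

```python
from math import sqrt
from statistics import stdev

def standard_deviation(values):
    """Sample standard deviation: sqrt(sum of (x - mean)^2 / (n - 1))."""
    n = len(values)
    m = sum(values) / n
    return sqrt(sum((x - m) ** 2 for x in values) / (n - 1))

# Hypothetical data: same mean and range, different spread
set_a = [2, 5, 5, 5, 8]
set_b = [2, 2, 5, 8, 8]

sd_a = standard_deviation(set_a)  # sqrt(18 / 4), about 2.12
sd_b = standard_deviation(set_b)  # sqrt(36 / 4) = 3.0
print(round(sd_a, 2), round(sd_b, 2))   # 2.12 3.0
print(abs(sd_a - stdev(set_a)) < 1e-9)  # True: agrees with the library function
```

Set B has the larger standard deviation because more of its values lie far from the mean.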
The standard deviation of a set of results can also be plotted on a graph, as error bars either side of
the mean, to show the reliability / variability of the results set.
Very often, with experiments in Biology, we are either trying to see if there is a difference between the
results gained from two different experimental conditions, or if there is a relationship between the
independent and the dependent variable in an experiment. At the end of these experiments, we will
come to a conclusion. At A2 level, we also need to think about how confident we are that our
conclusion is correct, i.e. would we come to the same conclusion 100% of the time, or 90%, or 50%,
etc.?
Q2) Formulate null hypotheses for the following investigations. For each investigation, make sure you
can identify the independent and dependent variables:
a) A student decided to investigate whether eating breakfast every day had an effect on the body
mass of students in her class. She selected twelve students, measured their body mass and
asked if they ate breakfast regularly.
b) A student investigated the effect of caffeine concentration on the heart rate of Daphnia.
c) A group of nine athletes wanted to see if training for two weeks at a mountain camp, 2000 m
above sea level, had an effect on the number of red blood cells in their blood. Samples of
blood were taken from each of the athletes at their normal training camp at sea level. Blood
samples were taken again after two weeks of training at the mountain camp.
1) (Student’s) t-test
A t-test will tell you if the means of two sets of normally distributed, unmatched (see Spearman’s rank
correlation below for what this means), continuous data are significantly different from one another.
Examples of where you might use a t-test:
• comparing mean heights of limpets on two different seashores
• comparing mean masses of plants grown with and without fertilizer
• comparing mean tree heights on North- and South-facing slopes
• comparing mean vegetation heights on trampled and untrampled areas
• comparing mean vitamin C contents of pasteurised and unpasteurised orange juice.
For any Spearman’s rank correlation you do, the null hypothesis will be:
There is no correlation between the two variables.
A χ2 test does a lot of things but it can be used in a simple way to see if an observed set of data
(categorical data, counts of things in categories, i.e. frequencies) differs significantly from what we
might expect, given our null hypothesis. For example, it can be used in a genetics experiment to
compare the observed data with what might be expected from a cross between two heterozygotes.
Other examples of where you might use a χ2 test:
• Do seashore snails actively select specific microhabitats?
• Does lichen frequency differ between air-polluted and clean sites?
Summary:
• calculate the t value for your data (in your exam, this will be given to you)
• calculate the degrees of freedom for the data sets:
degrees of freedom = n1 + n2 - 2 (where n1 and n2 are the numbers of measurements in each data set)
• compare the observed t value with the critical value of t (at p=0.05) for these degrees of freedom
• if the observed t value is greater than or equal to the critical value (at p=0.05), the null hypothesis
should be rejected, i.e. there is less than a 5% (p=0.05) probability that a difference this large would
arise by chance if the null hypothesis were true
• what you are doing is working out the probability that the experimental values you have could
have been collected if there was no difference between the two populations – if this probability is
low, then you can conclude that the two samples came from two different populations
• degrees of freedom are simply the way in which some tests take into account the sample size.
Step 1: Calculate the t value for the data (this value will be given to you in the exam: t = 3.42)
Looking at the table of critical values (p=0.05) for these degrees of freedom (28), we see that the
critical value is 2.05. Ian's value of t (i.e. 3.42) is greater than this critical value. Therefore there is a
significant difference between Ian's two sets of data.
Conclusion: The null hypothesis is rejected and there is a significant difference between the
light intensities in the two woods, at the 5% significance level.
(In fact, Ian's value of t was also greater than the critical value of 2.75 at the p=0.01 level (using the
nearest listed degrees of freedom to 28, i.e. 30), so there was less than a one in one hundred
probability of obtaining a difference this large by chance alone.)
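The decision in Ian's example can be written as a couple of small functions. This is a sketch only: the function names are hypothetical, and the critical values are copied from the t-table in the appendix rather than calculated.

```python
def degrees_of_freedom(n1, n2):
    """Degrees of freedom for two unmatched samples: n1 + n2 - 2."""
    return n1 + n2 - 2

def reject_null(t_observed, t_critical):
    """Reject the null hypothesis if |t| is greater than or equal to the critical value."""
    return abs(t_observed) >= t_critical  # the sign of t is ignored

# Critical values copied from the appendix t-table: {df: {p: value}}
T_CRITICAL = {
    28: {0.05: 2.05},              # no p=0.01 value is listed for df = 28
    30: {0.05: 2.04, 0.01: 2.75},
}

t_ian = 3.42
df_ian = 28  # degrees of freedom stated in Ian's example

print(reject_null(t_ian, T_CRITICAL[df_ian][0.05]))  # True: 3.42 >= 2.05
print(reject_null(t_ian, T_CRITICAL[30][0.01]))      # True: 3.42 >= 2.75, using df = 30 as the text does
print(degrees_of_freedom(10, 10))                    # 18, e.g. two sets of 10 plants
```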
Q3) Dog whelks on sheltered and exposed shores appear to be different sizes
('heights'). Here are some data from two different shore types. Are they
significantly different?
Height on sheltered shores (mm): 22, 23, 26, 29, 30, 31, 32, 32, 33, 33,
35, 39
Height on exposed shores (mm): 15, 17, 19, 19, 20, 20, 21, 21, 21, 22,
24, 27
Calculated t value = 3.1
Fruits per plant (sprayed only): 33, 28, 56, 43, 45, 62, 74, 45, 32, 48
Fruits per plant (hand pollinated): 46, 42, 63, 40, 52, 60, 82, 74, 62, 55
Calculated t value = -1.77 (ignore the minus sign)
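The calculated t value quoted for the fruit data can be reproduced with the formula t = (mean1 - mean2) / sqrt(s1²/n1 + s2²/n2), where s² is the sample variance of each set. We are assuming this is the form used to produce the quoted value; for two samples of equal size it gives the same t as the pooled-variance version. In the exam the t value is given to you, so this is only for checking your understanding. The function name `t_value` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance  # variance() divides by n - 1 (sample variance)

def t_value(sample1, sample2):
    """Unmatched t value: (mean1 - mean2) / sqrt(s1^2/n1 + s2^2/n2)."""
    se = sqrt(variance(sample1) / len(sample1) + variance(sample2) / len(sample2))
    return (mean(sample1) - mean(sample2)) / se

sprayed = [33, 28, 56, 43, 45, 62, 74, 45, 32, 48]
hand_pollinated = [46, 42, 63, 40, 52, 60, 82, 74, 62, 55]

t = t_value(sprayed, hand_pollinated)
print(round(t, 2))  # -1.77, matching the value quoted above
```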
Q6) A student decided to examine whether leaves growing on geranium plants were larger in the
shade than those growing in the sun. He measured the maximum length of leaves of approximately
the same age and recorded his results:
Sun leaves (cm): 3.5, 3.6, 3.2, 3.8, 2.9, 4.6, 3.4, 3.6, 3.4, 4.1, 3.2, 2.6,
2.0, 3.3, 4.2, 3.7, 3.9, 3.7, 3.1, 2.4, 3.0, 3.7, 4.0, 3.8, 3.7, 3.2, 3.7, 3.7,
3.8, 3.3
Shade leaves (cm): 4.5, 3.9, 4.1, 4.3, 4.1, 4.2, 4.4, 2.6, 4.0, 3.9, 5.3, 3.6,
3.7, 3.9, 3.9, 3.6, 3.4, 3.5, 3.9, 4.6, 3.9, 3.6, 4.0, 3.9, 4.1, 3.9, 4.2, 3.6,
4.4, 3.7
Calculated t value = 3.77
Plotting a scattergram gives a good indication of any possible relationship: either a positive correlation
(as the independent variable increases, the dependent variable increases); a negative correlation
(as the independent variable increases, the dependent variable decreases); or no correlation (a
random scattering of points).
[Figure: example scattergrams illustrating the types of correlation; panel c) shows no correlation]
Summary:
• formulate a null hypothesis
• calculate the Spearman Rank Correlation Coefficient (rs)
• compare the observed value of rs against the critical value (5% significance level) for the
appropriate number of pairs of measurements (n)
• if rs (ignoring the sign) is greater than or equal to the critical value then there is a significant
correlation
• if rs is positive there is a positive correlation
• if rs is negative there is a negative correlation
N.B. a significant correlation does not necessarily mean that changes in one factor cause
changes in the other: there could be some third factor responsible for changes in both. Ideally
you should be able to design an experiment to help you decide if a causative relationship exists
between the two variables.
An investigation was undertaken into the effect of the flow rate of a stream on the density of stream
invertebrates. A glance at the data shown below seemed to suggest that there was a correlation
between the flow rate and the invertebrate density. Is this a true correlation or merely due to chance?
Site    Flow rate (m s-1)    Invertebrate density (number per 0.5 m x 0.5 m quadrat)
4       0.1                  15
5       0.1                  41
9       0.2                  60
1       0.3                  86
6       0.4                  52
10      0.5                  30
3       0.5                  39
2       0.6                  46
11      0.7                  72
8       0.8                  63
7       0.9                  100
12      1.1                  71
Note that the results have been ranked (placed in order) according to the independent variable (flow
rate) – the site number is not relevant here at all, and could actually be left out from the results table.
You might be expected to rank results in an exam question.
Step 1: Calculate the rs value for the data (rs = 0.55 for the data above)
Step 2: Compare the value of rs against the critical value for the appropriate number of pairs of
measurements (n) – see appendix. For n = 12 the critical value = 0.59
The observed rs value is less than the critical value.
Conclusion: There is no significant correlation between the flow rate and the number of animals,
at the 5% significance level.
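Steps 1 and 2 can be checked in Python. This sketch assumes the usual formula rs = 1 - 6Σd²/(n(n² - 1)), with tied values given the mean of the ranks they share. Strictly, this formula is an approximation when there are ties (as in the flow rates here), but it is the one normally used at this level. The function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upwards; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of positions i..j, counted from 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman_rs(x, y):
    """rs = 1 - 6 * sum(d^2) / (n(n^2 - 1)), where d is the difference in ranks for each pair."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(average_ranks(x), average_ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

flow_rate = [0.1, 0.1, 0.2, 0.3, 0.4, 0.5, 0.5, 0.6, 0.7, 0.8, 0.9, 1.1]
density = [15, 41, 60, 86, 52, 30, 39, 46, 72, 63, 100, 71]

rs = spearman_rs(flow_rate, density)
critical = 0.59  # critical value for n = 12 at p = 0.05, as quoted in Step 2

print(round(rs, 2))         # 0.55
print(abs(rs) >= critical)  # False: no significant correlation
```

A positive rs that passed the test would indicate a positive correlation, and a negative one a negative correlation; the sign is ignored only when comparing with the critical value.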
Q7) After thinking about the results, it was decided to split the invertebrates into different groups and
analyse each group separately. Animals such as blackfly larvae are filter feeders that depend on
flowing water to bring them food. Others, such as freshwater shrimps, are deposit feeders and prefer
still water.
Therefore, it was decided to analyse the data a second time using the numbers of different groups
present within each quadrat. The results are shown in the table below:
Examining the correlation between flow rate and the number of blackfly larvae per quadrat gives
rs = 0.98, and for the correlation between flow rate and the number of shrimp per quadrat, rs = -0.81.
This test is used when we have a model or hypothesis, and we want to see if our experimental values
fit in with the values that would be expected from the model or hypothesis. We call these two sets of
data the observed values (O) and the expected values (E), respectively. The χ2 test is useful when
we are analysing the results of genetic crosses for the predicted ratios of offspring.
Summary:
• formulate a null hypothesis
• calculate the χ2 value
• calculate the degrees of freedom = number of categories (or the number of pairs of
observed:expected values) minus one.
• a large value of χ2 occurs when there is a big difference between the observed and expected
values, so the larger the χ2 value, the more likely it is that the difference is significant
• if the calculated χ2 value is greater than the critical value (5% significance level; see tables in
appendix) then there is a significant difference between the observed and expected values and
the null hypothesis is rejected
• if the calculated value is less than the critical value, then the null hypothesis is accepted
Let us suppose that you have been catching up on your Edward Lear and the bit about owls, larks,
wrens and hens has made a deep impression on you. You decide to investigate the phenomenon of
‘beards as nesting sites’ more fully. Do nesting birds and other vertebrates exhibit a preference
for certain types of facial fungus? Or will any old beard do?
You conduct a survey of 100 randomly chosen individuals from each of seven professions where
facial hair is known to be advantageous. Then you simply determine the number of individuals with
one or more small mammals and/or birds nesting beneath their chins. You obtain the results in the
table below.
Category/Profession of beard grower    Frequency of individuals with birds and/or small mammals
                                       in their beards (out of 100 surveyed)
Ecologists                             22
Sea captains                           31
Psychologists                          27
Druids                                 15
Conservationists                       37
TV natural history presenters          22
Members of the NSPB*                   42
(*National Society for the Protection of Beards)
The calculated value for χ2 is greater than the critical value (see the appendix), so there is a
significant difference between the observed and the expected values, i.e. the null hypothesis should
be rejected.
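The χ2 calculation for the beard survey can be sketched as follows. The expected values are not shown in the table above, so this assumes that the null hypothesis (no preference for any type of beard) predicts the same number of nesting individuals in every profession, i.e. the total divided by seven. The critical value for 6 degrees of freedom is taken from the appendix.

```python
observed = {
    "Ecologists": 22, "Sea captains": 31, "Psychologists": 27, "Druids": 15,
    "Conservationists": 37, "TV natural history presenters": 22,
    "Members of the NSPB": 42,
}

# Assumed null hypothesis: no preference, so each profession expects the same count
total = sum(observed.values())    # 196
expected = total / len(observed)  # 196 / 7 = 28.0

# chi-squared = sum of (O - E)^2 / E over all categories
chi_squared = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1            # 7 categories - 1 = 6
critical = 12.59                  # appendix table, df = 6, p = 0.05

print(round(chi_squared, 2), df)  # 18.86 6
print(chi_squared > critical)     # True: reject the null hypothesis
```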
Q8) An investigation was carried out to study what happens when woodlice are given a choice
between dry and humid atmospheres. The investigation consisted of five trials with ten woodlice
used in each trial. The results are shown below:
Formulate a null hypothesis and determine whether it should be accepted or not (use a p=0.05
significance level). χ2 = 2.88
Q9) As part of a sea-shore study, students had been looking at the size (maximum length) of cockles
(Cerastoderma edule). They were using size as a measure of age so they could examine the age
structure of a population of cockles on the shore. The results are shown in the table below (Shore
1). One student wanted to look at the age structure of a cockle population on a second shore that
had a large wintering population of oystercatchers (Haematopus ostralegus) and compare it with
that of the first shore. His hypothesis was that the oystercatchers would take the larger, older
cockles in preference to smaller, younger cockles, leading to a distortion of the size structure
towards the smaller size classes. Effectively he was using the age structure of cockles on the first
shore as a model distribution (his expected values) and wanted to see how well the data from the
second shore (his observed values) fitted that model.
Results:
Size class / cm Shore 1 Shore 2
0.5-0.9 96 111
1.0-1.4 36 40
1.5-1.9 12 12
2.0-2.4 10 3
2.5-2.9 8 8
3.0-3.4 6 4
3.5-3.9 10 8
4.0-4.4 9 6
4.5-4.9 6 2
5.0-5.4 7 6
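Because Shore 1 is being used as the model, each expected value can be found by scaling the Shore 1 count so that the expected total matches the total number of cockles sampled on Shore 2. The sketch below shows this scaling step only, assuming that is how the expected values are derived; the function name `expected_from_model` is hypothetical. Here both shores happen to total 200 cockles, so the expected values equal the Shore 1 counts.

```python
def expected_from_model(model_counts, observed_total):
    """Scale a model distribution so that its total matches the observed total."""
    model_total = sum(model_counts)
    return [count * observed_total / model_total for count in model_counts]

# Counts per size class, 0.5-0.9 cm up to 5.0-5.4 cm
shore1 = [96, 36, 12, 10, 8, 6, 10, 9, 6, 7]   # model distribution
shore2 = [111, 40, 12, 3, 8, 4, 8, 6, 2, 6]    # observed values

expected = expected_from_model(shore1, sum(shore2))
print(expected)
```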
Student’s t-test
Degrees of freedom    Critical value: p=0.05    p=0.01
15                    2.13                      2.94
16                    2.12                      2.92
17                    2.11                      2.90
18                    2.10                      2.88
19                    2.09                      2.86
20                    2.09                      2.85
21                    2.08                      2.83
22                    2.07                      2.82
23                    2.07                      2.81
24                    2.06                      2.80
25                    2.06                      2.80
26                    2.06                      -
27                    2.05                      -
28                    2.05                      -
29                    2.04                      -
30                    2.04                      2.75
40                    2.02                      2.70
60                    2.00                      2.66
∞                     1.96                      2.58
χ2 test
Critical values at the p=0.05 level
Degrees of freedom Critical value Degrees of freedom Critical value
1 3.84 16 26.30
2 5.99 17 27.59
3 7.81 18 28.87
4 9.49 19 30.14
5 11.07 20 31.41
6 12.59 21 32.67
7 14.07 22 33.92
8 15.51 23 35.17
9 16.92 24 36.42
10 18.31 25 37.65
11 19.68 26 38.89
12 21.02 27 40.11
13 22.36 28 41.34
14 23.69 29 42.56
15 24.99 30 43.77