Y13 IAL Statistics Workbook - Student v1

Year 13 IAL Biology – Statistics for Biology
Introduction
This booklet aims to prepare you for your Unit 6B exam paper, Practical Biology and Investigative
Skills. This paper contains three questions:
1) Core practical – you will be expected to evaluate a core practical, usually from the A2 course, but
it can include AS core practicals
2) Data analysis – you will be expected to analyse data from an experiment, including drawing a
results table, graph and statistical analysis of the results. This booklet will help most with this
question. You should aim to complete this section of the paper very quickly to allow time for the
(more demanding) Q3.
3) Experimental design – you will be expected to design an experiment to test a hypothesis. This
booklet will also be useful for answering this question.
This is not (thankfully!) an exhaustive guide to the finer details of statistical analysis. It covers only the
main aspects that you are likely to encounter in the Unit 6B paper. This guide is designed to be used
as a self-study unit, before you start the A2 Biology course, but your teacher will probably make some
time available at the end of the course to go over the material. If you have any questions, find your
teacher and ask them to go over the work with you.
Ernest Rutherford (New Zealand-born British physicist; investigated radioactivity; Nobel Prize for Chemistry, 1908):
“If your experiment needs statistics, you ought to have done a better experiment.”
Benjamin Disraeli (British Politician and Prime Minister) / Mark Twain (American writer):
“There are three kinds of lies: lies, damned lies, and statistics.”
Contents
1) Summarising data
(a) Graphs
(b) Summarising normally-distributed data
(c) Standard deviation (SD)
(d) Summarising skewed data
2) Statistical analysis of data
(a) The null hypothesis
(b) Which test should I use?
(c) Critical values and significance levels
3) Comparing two sets of data – Student’s t-test
4) Looking for relationships between data sets – Spearman rank correlation
5) Dealing with Categorical Data: the Chi-squared (χ2) test
Appendix: Tables of Critical Values
(1) Summarising Data
It is often convenient to summarise large sets of experimental data to make analysis easier. This can
be done by calculating the mean, median or mode of the data and these can be represented visually
in a graph.
• mean: sum of all the samples divided by the number of samples
• median: middle number in the list when arranged in rank order
• mode: measurement that occurs the most frequently
a) Graphs
• line graphs – when the independent variable is continuous
• bar charts - when the independent variable is discontinuous / categoric
• scatter graphs – similar to a line graph, but where there is either no clear correlation or where
there is no clear independent variable (i.e. may just be looking to see if two variables
correlate without looking at any causative relationships)
• histograms – continuous independent variable, grouped into equal-sized categories, for
convenience
Generally, results can either show a normal distribution, around the mean, or a skewed distribution,
around the mode.
To see which type of distribution is shown by a set of results, we can use frequency tables:
i. find the range of the data (lowest to highest value)
ii. split the range into an appropriate number of equal-sized classes (make sure there are no
overlaps)
iii. tally (count up) the number of results falling into each class
iv. produce a histogram of the results (a proper histogram, not like the silly Maths ones!)
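Steps i to iii can be sketched in a few lines of Python. This is only a rough illustration: the data values are made up, and the function name `frequency_table` and the choice of four classes are assumptions, not anything you need for the exam.

```python
def frequency_table(values, n_classes):
    """Tally whole-number results into n_classes equal-sized, non-overlapping classes."""
    lowest, highest = min(values), max(values)
    # Class width, rounded up so that the classes cover the whole range.
    width = -(-(highest - lowest + 1) // n_classes)
    table = []
    for k in range(n_classes):
        start = lowest + k * width
        end = start + width - 1
        count = sum(1 for v in values if start <= v <= end)
        table.append((start, end, count))
    return table


# Hypothetical data: number of flowers counted in each of 15 quadrats.
counts = [2, 4, 4, 5, 6, 6, 7, 7, 7, 8, 8, 9, 10, 11, 13]
table = frequency_table(counts, 4)

# Print each class with a row of '#' marks, like a histogram on its side.
for start, end, count in table:
    print(f"{start:>2}-{end:<2} | {'#' * count} ({count})")
```

Each class has the same width and no value can fall into two classes, which is what step ii asks for.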
Q1) Try the following example:
By means of random sampling using a 1 m x 1 m quadrat, the number of daisy flower heads (capitula) per
square metre of lawn was obtained. The results are shown in the table below. Express these
results in a frequency table, to show the distribution in the density of the capitula. Group the data into
five conveniently-sized groups.

Square  No. of         Square  No. of         Square  No. of         Square  No. of
        capitula/m²            capitula/m²            capitula/m²            capitula/m²
1       11             11      16             21      19             31      6
2       3              12      9              22      15             32      13
3       13             13      14             23      23             33      15
4       19             14      12             24      12             34      4
5       20             15      6              25      10             35      10
6       21             16      7              26      9              36      9
7       15             17      20             27      5              37      10
8       10             18      15             28      12             38      12
9       17             19      24             29      14             39      2
10      13             20      13             30      3              40      18
b) Summarising normally-distributed data

In addition to calculating the mean, it is useful to have some idea of the spread / dispersion of the
data around this value. We can use the range for this, i.e. record the lowest and highest values.
However, two sets of data can have the same mean and range even though the distribution of the data is different.
We can get a better idea of the spread of the results around the mean by calculating the standard
deviation of the results.
c) Standard deviation (SD)

• can only be used with normally distributed data
• summarises how far the individual results deviate, on average, from the mean
What does the SD tell us?

• about 68% of all the data lie within ±1 SD of the mean
• about 95% of all the data lie within ±2 SD of the mean
• there is only a 5% probability (p=0.05) that a result will be outside the range of the mean ±
2 SD
• we can be confident that 95% of the data will be within ±2 SD of the mean
• high SD – the data show a lot of variation from the mean i.e. low reliability
• low SD – data are more closely clustered around the mean i.e. higher reliability
The standard deviation of a set of results can also be plotted on a graph to show the reliability /
variability of the results set.
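To see what these figures look like with real numbers, here is a small sketch using the light intensity readings from the deciduous wood in the t-test worked example later in this booklet. The booklet does not give a formula for the SD, so the use of the sample SD (dividing by n - 1, which is what Python's `statistics.stdev` does) is an assumption.

```python
import statistics

# Light intensity readings (arbitrary units) from the deciduous wood
# in the t-test worked example later in this booklet.
deciduous = [10.5, 9.6, 10.1, 11.6, 11.6, 11.3, 10.6, 10.4,
             12.4, 11.3, 10.7, 10.5, 11.5, 11.1, 11.0]

mean = statistics.mean(deciduous)
sd = statistics.stdev(deciduous)  # sample SD (divides by n - 1): an assumption

# Count how many readings fall within 1 SD and within 2 SD of the mean.
within_1sd = sum(1 for x in deciduous if abs(x - mean) <= sd)
within_2sd = sum(1 for x in deciduous if abs(x - mean) <= 2 * sd)

print(f"mean = {mean:.2f}, SD = {sd:.2f}")
print(f"within 1 SD: {within_1sd} of {len(deciduous)}")
print(f"within 2 SD: {within_2sd} of {len(deciduous)}")
```

With only 15 readings the proportions will not match 68% and 95% exactly; those figures describe what to expect from a large, normally distributed data set.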
d) Summarising skewed data

• if the data shows a skewed distribution, the extreme values tend to have a large effect if we use
the mean to summarise the data
• the median gives a more acceptable measure of the average
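A quick sketch shows why. The numbers below are made up purely for illustration: most values are small, but one extreme value drags the mean upwards while the median barely notices it.

```python
import statistics

# Hypothetical, positively skewed data: most values are small, one is extreme.
values = [2, 3, 3, 4, 4, 5, 40]

mean = statistics.mean(values)      # pulled upwards by the extreme value
median = statistics.median(values)  # middle value in rank order

print(f"mean = {mean:.1f}, median = {median}")
```

Here six of the seven values are below the mean, so the mean is a poor description of a "typical" result, whereas the median sits in the middle of the data.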
(2) Statistical Analysis of Data
Very often, with experiments in Biology, we are either trying to see if there is a difference between the
results gained from two different experimental conditions, or if there is a relationship between the
independent and the dependent variable in an experiment. At the end of these experiments, we will
come to a conclusion. At A2 level, we need to think about also measuring how confident we are that
our conclusion is correct i.e. would we come to the same conclusion 100% of the time, or 90% or 50%
etc.
The way that we actually do this seems backwards:

1) we formulate a null hypothesis about our experiment
2) we then get the experimental results
3) we perform the appropriate statistical test on our results
4) the result from this test allows us to state whether we will accept or reject our null hypothesis
i.e. we come to a conclusion about the experiment
5) because of the way the statistical test is done, we can also state how confident we are that
our conclusion is correct
a) The null hypothesis

• all statistical tests compare one set of data with another
• need to form a null hypothesis to start our analysis i.e. we assume that there is no significant
difference or no significant relationship between the two data sets
• the type of statistical test we use next depends on the size of our data sets and
whether the data sets show a normal distribution or a skewed distribution
• we can then accept the null hypothesis if there is no significant difference or relationship
between the data sets, or reject it if there is a significant difference or relationship between them
Q2) Formulate null hypotheses for the following investigations. For each investigation, make sure you
can identify the independent and dependent variables:
a) A student decided to investigate whether eating breakfast every day had an effect on the body
mass of students in her class. She selected twelve students, measured their body mass and
asked if they ate breakfast regularly.
b) A student investigated the effect of caffeine concentration on the heart rate of Daphnia
c) A group of nine athletes wanted to see if training for two weeks at a mountain camp, 2000 m
above sea level, had an effect on the number of red blood cells in their blood. Samples of
blood were taken from each of the athletes at their normal training camp at sea level. Blood
samples were taken again after two weeks of training at the mountain camp.
b) Which test should I use?

The good news is that in your A2 paper 6B, you will always be told which test has been used, but it
will boost your confidence if you know this answer already! In addition, this information will help you in
answering Q3 of the Paper 6B exam.
The following descriptions are intended to familiarise you with the three tests that you will need to
understand, when they would be used and the corresponding null hypotheses.
1) (Student’s) t-test
A t-test will tell you whether the means of two sets of normally distributed, continuous, unmatched data
(see Spearman’s rank correlation below for what "matched" means) are significantly different from one another.
Examples of where you might use a t-test:
• comparing mean heights of limpets on two different seashores
• comparing mean masses of plants grown with and without fertilizer
• comparing mean tree heights on North- and South-facing slopes
• comparing mean vegetation heights on trampled and untrampled areas
• comparing mean vitamin C contents of pasteurised and unpasteurised orange juice.
For any t-test you do the null hypothesis will be:

There is no significant difference between the means of the two sets of data.
2) Spearman’s rank correlation

A Spearman’s rank correlation test will tell you whether two variables are correlated i.e. whether a
change in one variable is accompanied by a change in the other variable. It will also tell you whether
the relationship is a positive correlation (both go up together) or a negative correlation (one goes up
as the other goes down) and the strength of any correlation.
The data will always be in matched pairs. This means that one piece of data is associated with one
other piece of data only. For example, if you were measuring temperature and water depth, each
temperature measurement would belong with only one specific depth measurement (both taken at the
same place). If you mixed the matched pairs up the data would be meaningless.
Examples of where you might use Spearman’s rank correlation:
• Is there a correlation between temperature and height up a mountain?
• Is there a correlation between mouse density and proximity to a cheese factory?
• Is there a correlation between current speed and mayfly nymph abundance?
• Is there a correlation between cigarette smoking and low intelligence?
• Is there a correlation between species diversity and height on the seashore?
For any Spearman’s rank correlation you do, the null hypothesis will be:
There is no correlation between the two variables.
3) χ2 test (Chi-squared test)
A χ2 test does a lot of things but it can be used in a simple way to see if an observed set of data
(categorical data, counts of things in categories, i.e. frequencies) differs significantly from what we
might expect, given our null hypothesis. For example, it can be used in a genetics experiment to
compare the observed data with what might be expected from a cross between two heterozygotes.
Other examples of where you might use a χ2 test:
• Do seashore snails actively select specific microhabitats?
• Does lichen frequency differ between air-polluted and clean sites?
For any χ2 test you do the null hypothesis will be:

There is no significant difference between the observed and the expected frequencies.
c) Critical values and significance levels

Each statistical test will give you a calculated / observed value. This should be compared to a data
table of critical values for the test you are using. You can choose how confident you want to be in
accepting or rejecting your null hypothesis by using critical values at a given significance (or
confidence) level, and this value should always be quoted in your conclusion.
e.g. usually in Biology, we use a 5% significance level. This means that, if the null hypothesis were true,
there would be only a 5% probability of getting a result as extreme as ours by chance alone, so we can be
95% confident in our conclusion.
In data tables, a 5% significance level (95% confidence) is written as a probability: p=0.05.
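The comparison of a calculated value with a critical value can be sketched as follows. This is only an illustration: the function name `decision` is made up, and the numbers are the calculated and critical values (p=0.05) quoted in the three worked examples later in this booklet.

```python
def decision(observed, critical):
    """Compare a calculated test value (sign ignored) with the critical value."""
    if abs(observed) >= critical:
        return "reject null hypothesis"
    return "accept null hypothesis"


# Calculated and critical values (p=0.05) quoted in this booklet's worked examples.
examples = {
    "t-test": (3.42, 2.05),
    "Spearman rank": (0.55, 0.59),
    "chi-squared": (18.87, 12.59),
}
results = {name: decision(obs, crit) for name, (obs, crit) in examples.items()}

for name, outcome in results.items():
    print(f"{name}: {outcome}")
```

The same rule is used for all three tests; only the table of critical values changes.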
(3) Comparing Two Sets of Data – Student’s t-test

• data are normally distributed
• looking to see if the difference between the means for each set of data is significant
• large sample size (>25-30 per data set) – although smaller samples can be used with variations
of the formula and calculation
Summary:
• calculate the t-value for your data (in your exam, this will be given to you)
• calculate degrees of freedom for the data sets:
degrees of freedom = n1 + n2 – 2 (n1 and n2 are the numbers of measurements in the two data sets)
• compare observed t value to critical values for t (at p=0.05) for these degrees of freedom
• if observed t value is greater than or equal to the critical value (at p=0.05) the null hypothesis
should be rejected i.e. there is less than a 5% (p<0.05) probability that a difference this large
arose by chance alone
• what you are doing is working out the probability that the experimental values you have could
have been collected if there was no difference between the two populations – if it is a low
probability, then it is likely that there were two separate populations
• degrees of freedom are simply the way in which some tests take into account the sample size.
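The degrees of freedom calculation and the table look-up can be sketched like this. The critical values are copied from the appendix of this booklet; the function names are made up, and picking the nearest tabulated degrees of freedom when the exact value is missing is an assumption (it follows what the worked example does at the p=0.01 level).

```python
# Critical values of t at p=0.05, copied from the appendix of this booklet.
T_CRITICAL_005 = {
    15: 2.13, 16: 2.12, 17: 2.11, 18: 2.10, 19: 2.09, 20: 2.09,
    21: 2.08, 22: 2.07, 23: 2.07, 24: 2.06, 25: 2.06, 26: 2.06,
    27: 2.05, 28: 2.05, 29: 2.04, 30: 2.04, 40: 2.02, 60: 2.00,
}


def degrees_of_freedom(n1, n2):
    """Degrees of freedom for a t-test on two unmatched data sets."""
    return n1 + n2 - 2


def critical_t(df):
    """Look up the critical value, using the nearest tabulated df if needed."""
    nearest = min(T_CRITICAL_005, key=lambda tabulated: abs(tabulated - df))
    return T_CRITICAL_005[nearest]


df = degrees_of_freedom(15, 15)  # two data sets of 15 measurements each
critical = critical_t(df)
print(f"df = {df}, critical value = {critical}")
```

Notice that the critical value gets smaller as the degrees of freedom increase: with bigger samples, a smaller difference between the means is enough to be significant.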
t-test: A Worked Example

Ian was comparing the ground flora of two woods, one deciduous and one coniferous. He noticed that
in the deciduous wood there seemed to be more light reaching the ground than there was in the
coniferous wood.
Realising the ecological importance of light, Ian decided to measure the light intensity at fifteen
randomly selected spots within each wood. His light meter showed the values as arbitrary units (see
the table below).
He wanted to use the t-test on the data, so first he checked that the data were normally distributed.
He split the range of light intensities into equal size classes and tallied the number of readings in each
class. His data appeared to have a normal distribution. (Try to prove this yourself!)
Ian then formulated the null hypothesis that there is no significant difference between the light
intensities in the two woods.
        Light intensity (arbitrary units)
Site    Deciduous    Coniferous
1       10.5         9.1
2       9.6          10.3
3       10.1         10.8
4       11.6         10.3
5       11.6         9.6
6       11.3         11.1
7       10.6         9.3
8       10.4         10.5
9       12.4         10.4
10      11.3         9.7
11      10.7         10.2
12      10.5         10.2
13      11.5         9.7
14      11.1         10.9
15      11.0         9.9
Mean    10.9         10.1
Step 1: Calculate the t value for the data (this value will be given to you in the exam): t = 3.42
Step 2: Calculate the degrees of freedom using the formula n1 + n2 – 2 (i.e. 15 + 15 – 2) = 28
Looking at the table of critical values (p=0.05) for these degrees of freedom, we see it is 2.05. Ian's
value of t (i.e. 3.42) is greater than this critical value. Therefore there is a significant difference
between Ian's two sets of data.
Conclusion: The null hypothesis is rejected and there is a significant difference between the
light intensities in the two woods, at the 5% significance level.
(In fact, Ian's value of t was also greater than the critical value of 2.75 at the p = 0.01 level (using the
nearest tabulated degrees of freedom to 28, i.e. 30), so there was less than a one in one hundred
probability of the difference being due to chance.)
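You will not have to calculate t in the exam, but if you are curious where the value comes from, the sketch below uses one common form of the unmatched t-test: the difference between the two means divided by the standard error of that difference. The booklet does not say which formula was used to get 3.42, so this choice is an assumption; it gives a value close to, though not exactly, the one quoted (small differences can come from rounding).

```python
import math
import statistics

# Light intensity readings (arbitrary units) from the worked example.
deciduous = [10.5, 9.6, 10.1, 11.6, 11.6, 11.3, 10.6, 10.4,
             12.4, 11.3, 10.7, 10.5, 11.5, 11.1, 11.0]
coniferous = [9.1, 10.3, 10.8, 10.3, 9.6, 11.1, 9.3, 10.5,
              10.4, 9.7, 10.2, 10.2, 9.7, 10.9, 9.9]


def unpaired_t(a, b):
    """t = difference between the means / standard error of that difference."""
    # statistics.variance is the sample variance (divides by n - 1).
    standard_error = math.sqrt(statistics.variance(a) / len(a)
                               + statistics.variance(b) / len(b))
    return abs(statistics.mean(a) - statistics.mean(b)) / standard_error


t = unpaired_t(deciduous, coniferous)
df = len(deciduous) + len(coniferous) - 2
print(f"t = {t:.2f}, degrees of freedom = {df}")
```

Either way, the calculated value is well above the critical value of 2.05, so the conclusion of the worked example is unchanged.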
For the following experiments:

a) formulate a null hypothesis
b) test the hypothesis by doing a t-test
i) the observed t value for the data is given to you
ii) calculate the degrees of freedom for the experiment
iii) compare the observed t value with the critical values for these degrees of freedom
(use a 5% significance level) – see the appendix section for tables of critical values
c) give an appropriate conclusion for each experiment
Assume that all data show a normal distribution
Q3) Dog whelks on sheltered and exposed shores appear to be different sizes
('heights'). Here are some data from two different shore types. Are they
significantly different?
Height on sheltered shores (mm): 22, 23, 26, 29, 30, 31, 32, 32, 33, 33,
35, 39
Height on exposed shores (mm): 15, 17, 19, 19, 20, 20, 21, 21, 21, 22,
24, 27
Calculated t value = 3.1
Q4) Some gardeners pollinate tomato plants by hand to ensure a good
fruit-set. Other more economically-minded gardeners just spray them with
water to encourage pollen transfer. Which gives a better fruit yield, if
either?
Fruits per plant (sprayed only): 33, 28, 56, 43, 45, 62, 74, 45, 32, 48
Fruits per plant (hand pollinated): 46, 42, 63, 40, 52, 60, 82, 74, 62, 55
Calculated t value = -1.77 (ignore the minus sign)
Q5) The Asellus water louse lives in polluted, oxygen-poor water. It is
possible to count the number of gill movements the louse makes per
minute to determine how much effort it is putting into breathing. The
number of gill movements per minute in Asellus from stagnant and
oxygen-rich habitats was counted. Do the lice breathe harder in
stagnant water?
Gill movements per minute (stagnant): 44, 53, 54, 43, 48, 49, 53.
Gill movements per minute (oxygen-rich): 42, 48, 46, 43, 49, 42, 41, 40, 44, 48.
Q6) A student decided to examine whether the size of leaves growing on geranium plants was larger
in the shade, compared to those growing in the sun. He measured the maximum length of leaves
of approximately the same age and recorded his results:
Sun leaves (cm): 3.5, 3.6, 3.2, 3.8, 2.9, 4.6, 3.4, 3.6, 3.4, 4.1, 3.2, 2.6,
2.0, 3.3, 4.2, 3.7, 3.9, 3.7, 3.1, 2.4, 3.0, 3.7, 4.0, 3.8, 3.7, 3.2, 3.7, 3.7,
3.8, 3.3
Shade leaves (cm): 4.5, 3.9, 4.1, 4.3, 4.1, 4.2, 4.4, 2.6, 4.0, 3.9, 5.3, 3.6,
3.7, 3.9, 3.9, 3.6, 3.4, 3.5, 3.9, 4.6, 3.9, 3.6, 4.0, 3.9, 4.1, 3.9, 4.2, 3.6,
4.4, 3.7
(4) Looking for Relationships Between Data Sets – Spearman Rank Correlation Test
• used to look for a correlation between two variables
• a minimum of eight pairs of measurements are needed. Ten to fifteen pairs give a better
chance of finding a significant correlation
Plotting a scattergram gives a good indication of any possible relationship: either a positive correlation
(for every increase in the independent variable, there is an increase in the dependent variable);
negative correlation (for every increase in the independent variable, there is a decrease in the
dependent variable); or no correlation (a random scattering of points)
[Figure: example scattergrams showing a) positive correlation, b) negative correlation and c) no correlation]
Summary:
• null hypothesis formulated
• calculate the Spearman Rank Correlation Coefficient (rs)
• compare the observed value of rs against the critical value (5% significance level) for the
appropriate number of pairs of measurements (n)
• if rs (ignoring the sign) is greater than or equal to the critical value then there is a significant
correlation
• if rs is positive there is a positive correlation
• if rs is negative there is a negative correlation
N.B. a significant correlation does not necessarily mean that changes in one factor cause
changes in the other: there could be some third factor responsible for changes in both. Ideally
you should be able to design an experiment to help you decide if a causative relationship exists
between the two variables.
Spearman Rank Correlation Test: A Worked Example
An investigation was undertaken into the effect of the flow rate of a stream on the density of stream
invertebrates. A glance at the data shown below seemed to suggest that there was a correlation
between the flow rate and the invertebrate density. Is this a true correlation or merely due to chance?
Site    Flow rate (m s⁻¹)    Invertebrate density (number per 0.5 x 0.5 m quadrat)
4       0.1                  15
5       0.1                  41
9       0.2                  60
1       0.3                  86
6       0.4                  52
10      0.5                  30
3       0.5                  39
2       0.6                  46
11      0.7                  72
8       0.8                  63
7       0.9                  100
12      1.1                  71
Note that the results have been ranked (placed in order) according to the independent variable (flow
rate) – the site number is not relevant here at all, and could actually be left out from the results table.
You might be expected to rank results in an exam question.
Step 1: Calculate the rs value for the data (rs = 0.55 for the data above)
Step 2: Compare the value of rs against the critical value for the appropriate number of pairs of
measurements (n) – see appendix. For n = 12 the critical value = 0.59
The observed rs value is less than the critical value.
Conclusion: There is no significant correlation between the flow rate and the number of animals,
at the 5% significance level
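If you want to see where rs = 0.55 comes from, the sketch below ranks both variables and applies the usual formula rs = 1 – 6Σd² / (n(n² – 1)), where d is the difference between the two ranks in each pair. The booklet does not state the formula it used, so this is an assumption, as is giving tied values the average of the ranks they share; the function names are made up.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upwards; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every value tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are counted from 0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman rank correlation coefficient from the differences in rank."""
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    n = len(x)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))


# Data from the worked example (matched pairs, in the same order).
flow_rate = [0.1, 0.1, 0.2, 0.3, 0.4, 0.5, 0.5, 0.6, 0.7, 0.8, 0.9, 1.1]
density = [15, 41, 60, 86, 52, 30, 39, 46, 72, 63, 100, 71]

rs = spearman_rs(flow_rate, density)
critical = 0.59  # appendix value for n = 12 pairs at p=0.05
print(f"rs = {rs:.2f}; significant: {abs(rs) >= critical}")
```

The two lists must stay in the same order, because each flow rate belongs with one particular density (matched pairs).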
Q7) After thinking about the results, it was decided to split the invertebrates into different groups and
analyse each group separately. Animals such as blackfly larva are filter feeders that depend on
flowing water to bring them food. Others, such as freshwater shrimps, are deposit feeders and prefer
still water.
Therefore, it was decided to analyse the data a second time using the numbers of different groups
present within each quadrat. The results are shown in the table below:
Site    Flow (m s⁻¹)    Blackflies per quadrat    Shrimp per quadrat
3       0.3             10                        23
8       0.6             31                        2
1       0.5             26                        0
4       0.1             1                         10
5       0.1             0                         35
11      0.4             8                         11
7       0.9             73                        0
2       0.8             54                        1
9       0.2             5                         34
12      0.5             21                        2
6       0.7             42                        9
10      1.1             65                        0
Examining the correlation between flow rate and the number of blackfly larvae/quadrat gives rs = 0.98,
and for the correlation between flow rate and the number of shrimp/quadrat, rs = -0.81.
Are there any correlations from these data?
(5) Dealing with Categorical Data: the Chi-squared (χ2) test

All the previous situations have dealt with independent variables that are continuous. Other data are
discontinuous / categorical i.e. they belong to one of a number of different categories e.g. eye colour,
blood group, ear lobe type, hair colour, presence or absence data
This test is used when we have a model, or hypothesis and we want to see if our experimental values
fit in with the values that would be expected from the model or hypothesis. We call these two sets of
data the observed values (O) and the expected values (E), respectively. The χ2 test is useful when
we are analysing the results of genetic crosses for the predicted ratios of offspring.
Summary:
• formulate a null hypothesis
• calculate the χ2 value
• calculate the degrees of freedom = number of categories (or the number of pairs of
observed:expected values) minus one.
• a large value of χ2 occurs when there is a big difference between the observed and expected
values. So the larger the χ2 value, the more certain it is that the difference is significant:
• if the calculated χ2 value is greater than or equal to the critical value (5% significance level; see tables in
appendix) then there is a significant difference between the observed and expected values and
the null hypothesis is rejected
• if the calculated value is less than the critical value, then there is no significant difference and the
null hypothesis is accepted
The χ2 test: a worked example

There was an old man with a beard,
Who said, “It is just as I feared!”
Two owls and a hen,
Four larks and a wren,
Have all built their nests in my beard.
Edward Lear (1812–1888)
Let us suppose that you have been catching up on your Edward Lear and the bit about owls, larks,
wrens and hens has made a deep impression on you. You decide to investigate the phenomenon of
‘beards as nesting sites’ more fully. Do nesting birds and other vertebrates exhibit a preference
for certain types of facial fungus? Or will any old beard do?
You conduct a survey of 100 randomly chosen individuals from each of seven professions where
facial hair is known to be advantageous. Then you simply determine the number of individuals with
one or more small mammals and birds nesting beneath their chins. You obtain the results in the table
below.
Category/Profession of beard grower    Frequency of individuals with birds and/or small
                                       mammals in their beards (per 100 individuals surveyed)
Ecologists                             22
Sea captains                           31
Psychologists                          27
Druids                                 15
Conservationists                       37
TV natural history presenters          22
Members of the NSPB*                   42
(*National Society for the Protection of Beards)
Formulate a null hypothesis and find a conclusion for the experiment.

Null hypothesis: There is no significant difference between the observed and expected frequencies
(The expected frequency for each category is 28 – can you work out how we get this value?)
Using the formula for the calculation of the χ2 value gives:
χ2 = 18.87
Degrees of freedom = no. of categories - 1 = 7 - 1 = 6
The calculated value for χ2 (18.87) is greater than the critical value of 12.59 for 6 degrees of freedom
at p=0.05 (see the appendix), so there is a significant difference between the observed and the
expected values i.e. the null hypothesis should be rejected.
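The calculation can be sketched as below, using the standard formula χ2 = Σ (O – E)² / E, i.e. for each category square the difference between observed and expected, divide by the expected value, then add the results together. The booklet does not print the formula, so treat this as an assumption about the method used. The expected frequency of 28 is simply taken from the text above.

```python
# Observed frequencies from the worked example.
observed = {
    "Ecologists": 22,
    "Sea captains": 31,
    "Psychologists": 27,
    "Druids": 15,
    "Conservationists": 37,
    "TV natural history presenters": 22,
    "Members of the NSPB": 42,
}
expected = 28  # expected frequency for every category, as stated in the text

# One (O - E)^2 / E term per category.
terms = [(o - expected) ** 2 / expected for o in observed.values()]

chi_squared = sum(terms)
# Rounding each term to 2 decimal places before adding, as you might by hand.
chi_squared_by_hand = sum(round(term, 2) for term in terms)

df = len(observed) - 1
critical = 12.59  # appendix value for 6 degrees of freedom at p=0.05

print(f"chi-squared = {chi_squared:.2f} (by hand: {chi_squared_by_hand:.2f})")
print(f"df = {df}; reject null hypothesis: {chi_squared >= critical}")
```

The tiny difference between the two totals comes only from rounding each term before adding; it makes no difference to the conclusion.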
Q8) An investigation was carried out to study what happens when woodlice are given a choice
between dry and humid atmospheres. The investigation consisted of five trials with ten woodlice
used in each trial. The results are shown below:
         Distribution of woodlice after three minutes
Trial    Dry atmosphere    Humid atmosphere
1        3                 7
2        4                 6
3        3                 7
4        5                 5
5        4                 6
Total    O = 19            O = 31
Formulate a null hypothesis and determine whether it should be accepted or not (use a p=0.05
significance level). χ2 = 2.88
Q9) As part of a sea-shore study, students had been looking at the size (maximum length) of cockles
(Cerastoderma edule). They were using size as a measure of age so they could examine the age
structure of a population of cockles on the shore. The results are shown in the table below (Shore
1). One student wanted to look at the age structure of a cockle population on a second shore that
had a large wintering population of oystercatchers (Haematopus ostralegus) and compare it with
that of the first shore. His hypothesis was that the oystercatchers would take the larger, older
cockles in preference to smaller, younger cockles, leading to a distortion of the size structure
towards the smaller size classes. Effectively he was using the age structure of cockles on the first
shore as a model distribution (his expected values) and wanted to see how well the data from
the second shore (his observed values) fitted that model.
Results:
Size class / cm Shore 1 Shore 2
0.5-0.9 96 111
1.0-1.4 36 40
1.5-1.9 12 12
2.0-2.4 10 3
2.5-2.9 8 8
3.0-3.4 6 4
3.5-3.9 10 8
4.0-4.4 9 6
4.5-4.9 6 2
5.0-5.4 7 6
Formulate a null hypothesis and state a conclusion for the experiment.

χ2 = 12.56
Appendix: Tables of Critical Values
Student’s t-test
Degrees of    Critical value:       Degrees of    Critical value:
freedom       p=0.05    p=0.01      freedom       p=0.05    p=0.01
15            2.13      2.94        25            2.06      2.80
16            2.12      2.92        26            2.06      -
17            2.11      2.90        27            2.05      -
18            2.10      2.88        28            2.05      -
19            2.09      2.86        29            2.04      -
20            2.09      2.85        30            2.04      2.75
21            2.08      2.83        40            2.02      2.70
22            2.07      2.82        60            2.00      2.66
23            2.07      2.81        ∞             1.96      2.58
24            2.06      2.80
Spearman Rank correlation coefficient
Critical values are shown for the p=0.05 level
Number of pairs of                    Number of pairs of
measurements (n)    Critical value    measurements (n)    Critical value
5                   1.00              16                  0.51
6                   0.89              18                  0.48
7                   0.79              20                  0.45
8                   0.74              22                  0.43
9                   0.68              24                  0.41
10                  0.65              26                  0.39
12                  0.59              28                  0.38
14                  0.54              30                  0.36
χ2 test
Critical values at the p=0.05 level
Degrees of freedom Critical value Degrees of freedom Critical value
1 3.84 16 26.30
2 5.99 17 27.59
3 7.81 18 28.87
4 9.49 19 30.14
5 11.07 20 31.41
6 12.59 21 32.67
7 14.07 22 33.92
8 15.51 23 35.17
9 16.92 24 36.42
10 18.31 25 37.65
11 19.68 26 38.89
12 21.02 27 40.11
13 22.36 28 41.34
14 23.69 29 42.56
15 24.99 30 43.77