
Statistics and ecology

Why?
In any study, including field studies, you collect a
lot of data.
It is hard to tell whether the patterns that appear are the result
of a biological relationship or merely due to chance.
This is why you need a statistical test to help you
decide.
The tests covered here are: Student's t-test, Spearman's rank
correlation and the chi-squared test (and the Mann-Whitney U test).
Probability
Probability is normally measured on a scale from 0 to 1, where 0 represents
impossibility and 1 represents certainty. The probability that our
results are due to chance could take any value between 0 and 1.
We need to agree on what level of probability we are going to
accept before we decide that the difference is real. For most
purposes, and certainly for A Level Biology, the level conventionally
chosen is that the probability that our result is due to chance
should be no more than 0.05. This means that there is only a 1 in 20,
or 5%, probability that the difference we have seen is due to chance.
We can be 95% confident that our results are real: there is a high
chance that something really is going on!
At this point, we say that the difference is statistically significant
(significant = not due to chance).
What are we testing for?
Before you collect data, you need to decide what you're trying to prove.
You need a null hypothesis, which assumes that any patterns/differences
between data sets are the result of chance.
Choose a suitable test.
Run the test.
You'll get a number (the observed/calculated value).
Obtain a value from published tables (the critical value) at p (probability/
significance level) = 0.05, i.e. a confidence level of 95%.
Compare the values: if observed > critical, reject the null hypothesis and
accept the working hypothesis (you are 95% confident that the pattern in
your data is due to something other than chance).
Calculated value > Critical value = Significant result
An exception to this rule, however, is the Mann-Whitney U
test, where your calculated value needs to be less than the critical value
before your result is significant:
Calculated value < Critical value = Significant result
Null Hypothesis
(in a statistical test) the hypothesis that
there is no significant difference between
specified populations, any observed
difference being due to sampling or
experimental error.
By rejecting the null
hypothesis, we accept
the experimental (alternative)
hypothesis.
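The comparison rule from the "What are we testing for?" slide can be sketched in Python. This is a minimal illustration, assuming the hypothetical test labels shown in the code; the only exception it encodes is the Mann-Whitney U test. For Spearman's rank it compares the size of r and ignores the sign, because a strong negative correlation is as significant as a strong positive one.

```python
def is_significant(calculated: float, critical: float, test: str) -> bool:
    """Return True if a result is significant at the chosen p level.

    `test` is a hypothetical label: "t-test", "spearman",
    "chi-squared" or "mann-whitney".
    """
    if test == "mann-whitney":
        # The exception: U must be LESS than the critical value.
        return calculated < critical
    if test == "spearman":
        # A negative correlation is judged by its size, so compare |r|.
        calculated = abs(calculated)
    # All other tests here: calculated must be GREATER than critical.
    return calculated > critical


print(is_significant(0.70, 0.648, "spearman"))     # True
print(is_significant(0.451, 3.84, "chi-squared"))  # False
print(is_significant(3, 5, "mann-whitney"))        # True
```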
Formula
There is no significant correlation/difference between
…………………(indep) and ………………….
(dep).

Be as specific as possible


There is no significant correlation between the heart
rate of Daphnia and the concentration of
caffeine.
Which test to use?
1. First, you need to decide whether you are looking for
differences or correlations between sets of data.
For example, if you want to know whether the mean height of a
crop plant is different in two samples/fields, you are looking for a
difference.
If you had measured several samples at intervals, for
example from the top of a slope to the bottom, you
would want to know whether there is a correlation
between the height of the plants and the distance down
the slope.
A) Testing for Differences:

For Edexcel and OCR there are two possibilities, so you have another
decision to make:
You need to decide whether your data fit a normal distribution. This
is a symmetrical distribution, with the greatest number of readings
in the central range and progressively fewer readings as you
move away from the mean in either direction. You can check whether
your data are normally distributed by plotting them as a histogram:
normally distributed data form a bell-shaped curve.
If your data roughly fit a bell-shaped curve and are normally distributed,
use the t-test.
If not, use the Mann-Whitney U test.

In practice, if you have measured any aspect of an organism you are
likely to find a normal distribution. If you have counted numbers of
organisms, it is unlikely that you will have a normal distribution.
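A histogram check like the one described above can be roughed out in text form. This is a minimal sketch, assuming equal-width bins (five by default) and made-up example measurements; it is only a visual guide to whether the data look bell-shaped, not a formal test of normality. The function names are hypothetical.

```python
def histogram_counts(values, n_bins=5):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width if all values are equal
    counts = [0] * n_bins
    for v in values:
        # The maximum value would land in bin n_bins, so clamp it into the last bin.
        i = min(int((v - lo) / width), n_bins - 1)
        counts[i] += 1
    return counts


def print_histogram(values, n_bins=5):
    """Print one row of '#' per bin, so the shape of the data is visible."""
    for count in histogram_counts(values, n_bins):
        print("#" * count)


# Hypothetical leaf widths (mm), made up for illustration only
widths = [1, 2, 2, 3, 3, 3, 4, 4, 5]
print(histogram_counts(widths))  # [1, 2, 3, 2, 1]
print_histogram(widths)
```

A symmetrical pattern with most readings in the middle bins, as here, suggests the t-test; a lopsided pattern suggests the Mann-Whitney U test.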
New syllabus
To compare the means of two sets of data that are normally
distributed, use the t-test.

For example, we could use the t-test to investigate whether the
mean width of shade leaves is significantly different from the mean
width of sun leaves.
B) Testing for Correlation:
You need to decide whether your data consist of
either:
– Continuous variables (measurable on a scale) e.g.
distance, height
OR
– Discrete categories e.g. colour, gender.
Spearman's rank correlation coefficient should be
used when you are looking for associations between
two continuous variables.
The chi-squared association test is the one to
choose if you are looking for associations between categorical
data. It compares observed frequencies with expected
frequencies and is often used in genetics.
Spearman's rank correlation coefficient test
• Spearman's test is used to determine whether there is a correlation between the
values on the x and y axes.
• It gives a single number, called the correlation coefficient.
• The symbol for this is r (often written rs for Spearman's test).
• r is in the range +1.0 to -1.0.
• When r = +1.0, there is a perfect positive correlation, with the line of
best fit going up from bottom left to top right.
• When r = -1.0, there is a perfect negative correlation, with the line of best fit
going down from top left to bottom right.
• When r = 0, there is no correlation.
• When r is between 0 and either -1 or +1, the correlation is weaker; the closer r is to 0, the weaker the correlation.

[Figure: example scatter graphs for r = -1.0, r = 0.0, r = -0.6, r = +1.0 and r = +0.8]
Remember
Ranking: rank from largest to smallest.

$$ r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} $$

Where:
rs = the correlation coefficient
∑ = sum of
d = the difference between each pair of ranks (explained in the worked example)
n = the size of the sample (number of pairs of values)
Note: n(n² – 1) is sometimes written as n³ – n.
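Ranking from largest to smallest, with tied values sharing the mean of the ranks they would occupy, can be sketched as below. The function name is hypothetical; the example values are the % phototaxic (y) readings from the worked example, where the two readings of 37 share rank 2.5.

```python
def rank_largest_first(values):
    """Rank values so the largest gets rank 1.

    Tied values share the mean of the positions they occupy,
    e.g. two values tied for 2nd and 3rd place both get 2.5.
    """
    ordered = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1            # first (1-based) position of v
        ties = ordered.count(v)                 # how many values equal v
        ranks.append(first + (ties - 1) / 2)    # mean of positions first..first+ties-1
    return ranks


# % phototaxic values from the worked example
print(rank_largest_first([3, 5, 12, 7, 17, 22, 35, 37, 46, 37]))
# [10.0, 9.0, 7.0, 8.0, 6.0, 5.0, 4.0, 2.5, 1.0, 2.5]
```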
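Spearman's rank correlation test
Tabulate the data, rank x, rank y, calculate the difference between the ranks (d), then square this (d²).

| Hrs after hatching (x) | % phototaxic (y) | Rank for x (rx) | Rank for y (ry) | d = rx - ry | d² |
|---|---|---|---|---|---|
| 5 | 3 | 10 | 10 | 0 | 0 |
| 18 | 5 | 9 | 9 | 0 | 0 |
| 25 | 12 | 8 | 7 | 1 | 1 |
| 31 | 7 | 7 | 8 | -1 | 1 |
| 42 | 17 | 6 | 6 | 0 | 0 |
| 50 | 22 | 5 | 5 | 0 | 0 |
| 55 | 35 | 4 | 4 | 0 | 0 |
| 64 | 37 | 3 | 2.5 | 0.5 | 0.25 |
| 72 | 46 | 2 | 1 | 1 | 1 |
| 80 | 37 | 1 | 2.5 | -1.5 | 2.25 |

n = 10 (number of pairs)
Σd² = 5.5 (sum of squared deviations)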
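Given the two rank columns, Σd² and rs follow directly from the formula. This is a minimal sketch with a hypothetical function name, using the ranks from the table; note that when there are tied ranks, as here, the 6Σd² formula is a close approximation rather than an exact value.

```python
def spearman_rs(rx, ry):
    """Spearman's rank correlation coefficient from two lists of ranks.

    Uses rs = 1 - 6*sum(d^2) / (n^3 - n), where d = rx - ry for each pair.
    Returns (sum of d^2, rs).
    """
    if len(rx) != len(ry):
        raise ValueError("need one x rank and one y rank per pair")
    n = len(rx)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    rs = 1 - (6 * sum_d2) / (n ** 3 - n)
    return sum_d2, rs


# Ranks (rx, ry) copied from the worked-example table
rx = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
ry = [10, 9, 7, 8, 6, 5, 4, 2.5, 1, 2.5]
sum_d2, rs = spearman_rs(rx, ry)
print(sum_d2, round(rs, 3))  # 5.5 0.967
```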
Spearman's rank correlation test
rs is calculated according to this equation:

$$ r_s = 1 - \frac{6\sum d^2}{n^3 - n} = 1 - \frac{6\sum d^2}{n(n^2 - 1)} $$

$$ r_s = 1 - \frac{6(5.5)}{1000 - 10} = 1 - \frac{33}{990} = 1 - 0.033 = +0.967 $$

Significance of the correlation test

We look up the value for rs (+0.967) in a significance table.
This gives us the probability that the apparent correlation on the graph could have been obtained by pure chance (the null hypothesis).
First we find the correct line in the table. n is the number of data points/means: in this case 10.
We then see where our calculated value for rs fits on that line. In this case it is to the right of the biggest number (0.794).

Critical values of rs: significance level (two-tailed), with the one-tailed level in brackets

| n | 0.1 (0.05) | 0.05 (0.025) | 0.02 (0.01) | 0.01 (0.005) |
|---|---|---|---|---|
| 5 | 0.900 | 1.000 | 1.000 | - |
| 6 | 0.829 | 0.886 | 0.943 | 1.000 |
| 7 | 0.714 | 0.786 | 0.893 | 0.929 |
| 8 | 0.654 | 0.738 | 0.833 | 0.881 |
| 9 | 0.600 | 0.683 | 0.783 | 0.833 |
| 10 | 0.564 | 0.648 | 0.745 | 0.794 |
| 11 | 0.523 | 0.623 | 0.736 | 0.818 |
| 12 | 0.497 | 0.591 | 0.703 | 0.780 |

The value of rs is greater than the critical value at p = 0.05 (0.648), and even greater than the critical value at p = 0.01 (0.794). So the null hypothesis is very unlikely and we have excellent support for the alternative hypothesis.
We therefore reject the null hypothesis and accept the working hypothesis.
We can say that there is a significant positive correlation between the age
of the bird (hours after hatching) and % phototaxis, at the 5% significance level / 95% confidence
level.
Student's t-test
Used when comparing the means of the same variable in two
groups.
It tells you about the degree of overlap between the
two sets of data, which allows you to judge whether the
difference between the mean values is significant.
It is used for small sample sizes (fewer than 25), while the z-test is used for
large sample sizes.
What can this test tell you? Whether there is a statistically
significant difference between two means.
Page 44: go through Example 2.
To find the critical value for the t-test, you need two values: the
significance level (5%, 0.05) and the degrees of freedom, which
are used instead of the sample size.
Since we have two samples, the degrees of freedom are:

$$ \mathrm{df} = n_1 + n_2 - 2 $$

or

$$ \mathrm{df} = (n_1 - 1) + (n_2 - 1) $$
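The slides give the degrees of freedom but not the formula for t itself. The sketch below assumes the form commonly used in A Level Biology, t = |x̄1 − x̄2| / √(s1²/n1 + s2²/n2), with sample standard deviations; check it against the formula in your own specification. The function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def t_test(sample1, sample2):
    """Return (t, df) for two independent samples.

    ASSUMPTION: t = |mean1 - mean2| / sqrt(s1^2/n1 + s2^2/n2).
    df = n1 + n2 - 2, as given on the slide.
    """
    n1, n2 = len(sample1), len(sample2)
    s1, s2 = stdev(sample1), stdev(sample2)  # sample SDs (divide by n - 1)
    t = abs(mean(sample1) - mean(sample2)) / sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    df = n1 + n2 - 2
    return t, df


# Hypothetical measurements, for illustration only
t, df = t_test([1, 2, 3], [4, 5, 6])
print(round(t, 3), df)  # 3.674 4
```

The calculated t is then compared with the critical value from a t table for that df at p = 0.05.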
Understanding Degrees of Freedom
Calculations of sample estimates, such as the standard deviation and variance, use degrees
of freedom instead of sample size. The way you calculate degrees of freedom depends on
the statistical method you are using, but for calculating the standard deviation, it is defined
as 1 less than the sample size (n − 1).
To illustrate what this number means, consider the following example. Biologists are
interested in the variation in leg sizes among grasshoppers. They catch five grasshoppers
(n = 5) in a net and prepare to measure the left legs.
As the scientists pull grasshoppers one at a time from the net, they have no way of
knowing the leg lengths until they measure them all. In other words, all five leg lengths are
"free" to vary within some general range for this particular species. The scientists measure
all five leg lengths and then calculate the mean to be x̄ = 10 millimetres.
They then place the grasshoppers back in the net and decide to pull them out one at a time
to measure them again. This time, since the biologists already know the mean to be 10,
only the first four measurements are free to vary within a given range.
If the first four measurements are 8, 9, 10 and 12 millimetres, then there is no freedom for
the fifth measurement to vary; it has to be 11. Thus, once they know the sample mean, the
number of degrees of freedom is 1 less than the sample size: df = 4.
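The grasshopper argument is simple arithmetic: once the mean of n values is fixed, their total is fixed at n × mean, so the last value is forced. A minimal sketch with a hypothetical function name:

```python
def forced_last_value(sample_mean, first_values, n):
    """Once the mean of n values is known, the last value is fixed:
    it must bring the total up to n * mean."""
    return sample_mean * n - sum(first_values)


first_four = [8, 9, 10, 12]
print(forced_last_value(10, first_four, 5))  # 11
print(len(first_four))                       # 4 values were free to vary: df = n - 1
```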
Go through practice Questions 1 and 2.
Chi-square test (goodness of fit)
Genetic Ratios: are deviations
significant?
• Mendel crossed two pea plants with green pods. Both were heterozygous, carrying
the recessive allele for yellow pods.
• In the offspring, 428 plants had green pods and 152 had yellow pods (total 580).
• The expected ratio is 3:1. The actual ratio is 2.82:1.
• Could this deviation from the ideal ratio be produced by pure chance or is
there a significant deviation?


This is a job for the χ² (chi-squared) test.

• This test compares actual numerical patterns with expected patterns and gives
the probability that chance could have caused the deviations.
Checking Genetic Ratios with χ²
Enter the observed values into the first column of a table (O = observed).
Calculate the values expected for a "perfect ratio": total offspring = 580; ¾ of
this is 435 and ¼ is 145.
Enter these values in the second column (E = expected).
In the third column, calculate the deviations from the expected values (O – E).
In the fourth column, square this value, (O – E)².
In the final column, divide by the expected value, (O – E)²/E.
χ² is the sum of the final column.

| O | E | O – E | (O – E)² | (O – E)²/E |
|---|---|---|---|---|
| 428 | 435 | -7 | 49 | 0.113 |
| 152 | 145 | 7 | 49 | 0.338 |

χ² = 0.451
THE SIGNIFICANCE TEST
The value of 0.451 for χ² does not mean anything yet.
First, we must look up the value in significance tables.
As with the Spearman's rank table, there are lines in the χ² table for different values
of ν, the number of degrees of freedom.
For χ², this is the number of categories minus one, so in this case (green and yellow pods) ν = 1.
We see where our calculated value fits on this line.

| ν | p = 0.05 | p = 0.01 | p = 0.005 | p = 0.001 |
|---|---|---|---|---|
| 1 | 3.84 | 6.64 | 7.88 | 10.83 |
| 2 | 5.99 | 9.21 | 10.60 | 13.82 |
| 3 | 7.82 | 11.34 | 12.84 | 16.27 |
| 4 | 9.49 | 13.28 | 14.86 | 18.46 |
| 5 | 11.07 | 15.09 | 16.75 | 20.52 |
| 6 | 12.59 | 16.81 | 18.55 | 22.46 |

It is well below the critical value for p = 0.05 (3.84), so we give the probability that
the deviation is due to chance as:

p > 0.05
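The whole χ² calculation and table lookup for the Mendel example can be sketched as follows. This is an illustration, assuming expected values are found by sharing the total in the stated ratio and using only the p = 0.05 column of the table above; the function names are hypothetical.

```python
# Critical values of chi-squared at p = 0.05, copied from the table above,
# keyed by degrees of freedom (number of categories - 1).
CRITICAL_CHI2_P05 = {1: 3.84, 2: 5.99, 3: 7.82, 4: 9.49, 5: 11.07, 6: 12.59}


def chi_squared(observed, ratio):
    """Goodness of fit: sum of (O - E)^2 / E over all categories.

    Expected values share the total in the given ratio,
    e.g. ratio [3, 1] for a 3:1 Mendelian ratio.
    """
    total = sum(observed)
    parts = sum(ratio)
    expected = [total * r / parts for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_squared_significant(chi2, n_categories):
    """True if chi2 exceeds the p = 0.05 critical value for its degrees of freedom."""
    df = n_categories - 1
    return chi2 > CRITICAL_CHI2_P05[df]


chi2 = chi_squared([428, 152], [3, 1])
print(round(chi2, 3))                     # 0.451
print(chi_squared_significant(chi2, 2))   # False, so p > 0.05
```

Because the result is not significant, the deviation from 3:1 can be put down to chance and the null hypothesis is accepted.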
VERY IMPORTANT

GO OVER
EXAMPLES 1,2,3
FROM THE BOOK
PLUS Q3 PAGE 47
Statistics: which test?

| PURPOSE | WHICH TEST? | REQUIRES |
|---|---|---|
| To compare two groups, e.g. heights of trees from different woods, or speed of breakdown of protein by two different enzymes | The Mann-Whitney U test, or the t-test | Mann-Whitney U test: use the median. t-test: use a mean |
| To check for correlation between two variables, e.g. effect of temperature on metabolic rate | Spearman's rank correlation test | Different sources give a minimum of between 8 and 15 data points |
| To check for goodness of fit to a numerical pattern, e.g. are woodlice randomly distributed in a choice chamber? | The χ² test | Numbers (counts) |

In summary, use:
• the chi-squared test when the data are categorical
• Student's t-test when comparing the mean values of two sets of data/samples
• a correlation coefficient when examining an association between two sets of data.
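The decision rules in these slides can be summarised as a small function. This is a simplified sketch with hypothetical parameter names; it covers only the four tests discussed here and is no substitute for checking the requirements in the table.

```python
def choose_test(looking_for: str, categorical: bool = False,
                normally_distributed: bool = False) -> str:
    """Suggest a test following the decision rules in these slides.

    looking_for: "difference" or "correlation" (hypothetical labels).
    """
    if categorical:
        # Counts in categories, e.g. genetic crosses or choice chambers
        return "chi-squared test"
    if looking_for == "difference":
        return "Student's t-test" if normally_distributed else "Mann-Whitney U test"
    if looking_for == "correlation":
        return "Spearman's rank correlation"
    raise ValueError('looking_for must be "difference" or "correlation"')


print(choose_test("difference", normally_distributed=True))  # Student's t-test
print(choose_test("difference"))                             # Mann-Whitney U test
print(choose_test("correlation"))                            # Spearman's rank correlation
```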
