Statistics and Ecology
Why use statistics?
In any study, including field studies, you collect a lot of data.
It is hard to tell whether patterns that appear are the result of a biological relationship or merely due to chance!
This is why you need a statistical test to help you decide.
The tests are: Student's t-test, Spearman's rank correlation and the chi-squared test (and the Mann-Whitney U test).
Probability
Probability is normally measured on a scale from 0 to 1, where 0 represents impossibility and 1 represents certainty. The probability that our results are due to chance could take any value between 0 and 1.
We need to agree on what level of probability we are going to accept before we decide that the difference is real. For most purposes, and certainly for A Level Biology, the level conventionally chosen is that the probability that our result is due to chance should be no more than 0.05 (p ≤ 0.05). This means that there is only a 1 in 20, or 5%, probability that the difference we have seen is due to chance.
We can then be 95% confident that our results are real: there is a high chance that something really is going on!
At this point, we say that the difference is statistically significant (significant = not due to chance).
What are we testing for?
• Before you collect data, decide what you are trying to prove.
• Write a null hypothesis, which assumes that any pattern or difference between data sets is the result of chance.
• Choose a suitable test.
• Run the test. You will get a number (the observed or calculated value).
• Obtain a value from published tables (the critical value) at p (probability or significance level) = 0.05, i.e. a confidence level of 95%.
• Compare the values. If observed > critical, reject the null hypothesis and accept the working hypothesis (you are 95% confident that the pattern in your data is due to something other than chance).
Calculated value > Critical value = Significant result
An exception to this rule, however, is the Mann-Whitney U test, where your calculated value needs to be less than the critical value for your result to be significant:
Calculated value < Critical value = Significant result
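The decision rule above, including the Mann-Whitney exception, can be sketched as a small function. This is only an illustration: the `test` labels are hypothetical names chosen for this sketch, the critical value must still be looked up in the published table at p = 0.05, and the strict inequalities simply mirror the wording of these notes.

```python
def is_significant(calculated, critical, test):
    """Return True if the result counts as significant at the chosen level.

    `test` is an illustrative label (e.g. "t-test", "spearman",
    "chi-squared", "mann-whitney"); only Mann-Whitney is treated differently.
    """
    if test == "mann-whitney":
        # Mann-Whitney U: the calculated value must be LESS than the critical value
        return calculated < critical
    # Other tests in these notes: calculated must be GREATER than critical
    return calculated > critical


print(is_significant(0.451, 3.84, "chi-squared"))  # False: not significant
print(is_significant(4, 8, "mann-whitney"))        # True: significant
```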
Null Hypothesis
(In a statistical test) the hypothesis that there is no significant difference between specified populations, any observed difference being due to sampling or experimental error.
By rejecting the null hypothesis, we accept the experimental (working) hypothesis.
Formula (template for a null hypothesis)
There is no significant correlation / difference between ………………… (independent variable) and ………………… (dependent variable).
For Edexcel and OCR there are two possible tests for comparing two sets of data, so you have another decision to make:
You need to decide whether your data fit a normal distribution. This is a symmetrical distribution, with the greatest number of readings in the central range and progressively fewer readings as you move away from the mean in either direction. You can check whether your data are normally distributed by plotting them as a histogram: a normal distribution gives a bell-shaped curve.
If your data roughly fit a bell-shaped curve and are normally distributed, use the t-test.
If not, use the Mann-Whitney U test.
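As a rough illustration of the histogram check described above, the sketch below groups values into bins and prints a sideways text histogram so the shape can be judged by eye. The leaf-length data and the bin width are invented purely for this example; deciding whether the shape is "roughly bell-shaped" is still a judgement you make yourself.

```python
from collections import Counter


def text_histogram(values, bin_width):
    """Print a sideways histogram and return the count in each bin."""
    # Assign each value to the lower edge of its bin (e.g. 47 -> 45 for width 5)
    bins = Counter((v // bin_width) * bin_width for v in values)
    for lower in sorted(bins):
        print(f"{lower:>4}-{lower + bin_width:<4} {'#' * bins[lower]}")
    return bins


# Hypothetical leaf lengths (mm), made up only to show the method
leaf_lengths = [41, 44, 45, 47, 48, 48, 49, 50, 50, 51, 52, 52, 53, 55, 58]
counts = text_histogram(leaf_lengths, 5)
```

Most values fall in the middle bins with fewer at either end, which is the rough bell shape that would point towards the t-test.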
[Figure: example correlations, r = +1.0 and r = +0.8]
Spearman's rank correlation test

$$ r_s = 1 - \frac{6\sum d^2}{n(n^2 - 1)} $$

Where:
r_s = the correlation coefficient
∑ = sum of
d = the difference between each pair of ranks (explained in the worked example)
n = the size of the sample (number of pairs of values)
Note: n(n² − 1) is sometimes written as n³ − n.
Remember: when ranking, rank from largest to smallest.
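The formula above can be sketched in Python as follows. Ranking runs from largest to smallest, as the notes say; giving tied values the mean of the positions they occupy is the usual convention and is an assumption of this sketch. With tied ranks the simple formula is only an approximation. The temperature and metabolic-rate values are invented for illustration.

```python
def rank_descending(values):
    """Rank from largest (rank 1) to smallest; tied values share the mean rank."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1           # position of the first occurrence
        count = ordered.count(v)               # how many values are tied
        ranks.append(first + (count - 1) / 2)  # mean of the tied positions
    return ranks


def spearman_rs(x, y):
    """r_s = 1 - 6 * sum(d^2) / (n(n^2 - 1))"""
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of paired values")
    n = len(x)
    rx, ry = rank_descending(x), rank_descending(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))  # d = difference in ranks
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


# Hypothetical data: temperature (°C) and metabolic rate (arbitrary units)
temperature = [10, 15, 20, 25, 30]
rate = [2.1, 2.9, 3.5, 3.4, 5.0]
print(round(spearman_rs(temperature, rate), 3))
```

Here the ranks differ only for the middle two readings (d = 1 and −1), so ∑d² = 2 and r_s = 1 − 12/120 = 0.9.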
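Worked example: tabulate the data, rank the values for x, rank the values for y, then calculate the difference between the ranks (d) and square it (d²). Here n = 10 and ∑d² = 30.25:

$$ r_s = 1 - \frac{6 \times 30.25}{1000 - 10} = 1 - \frac{181.5}{990} = +0.817 $$

Comment on this value: what does it tell you?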
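The arithmetic in the worked example can be checked in a couple of lines. The value ∑d² = 30.25 is worked back from the 181.5 shown (181.5 / 6), and n = 10 comes from n³ − n = 1000 − 10.

```python
n = 10          # number of pairs of values
sum_d2 = 30.25  # sum of squared rank differences (181.5 / 6)

rs = 1 - (6 * sum_d2) / (n ** 3 - n)  # n^3 - n is the same as n(n^2 - 1)
print(round(rs, 3))  # 0.817
```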
Significance of the correlation test
Degrees of freedom when comparing two samples of sizes n₁ and n₂:
$$ df = (n_1 - 1) + (n_2 - 1) $$
Understanding Degrees of Freedom
Calculations of sample estimates, such as the standard deviation and variance, use degrees
of freedom instead of sample size. The way you calculate degrees of freedom depends on
the statistical method you are using, but for calculating the standard deviation, it is defined
as 1 less than the sample size (n − 1).
To illustrate what this number means, consider the following example. Biologists are
interested in the variation in leg sizes among grasshoppers. They catch five grasshoppers
(𝑛 = 5) in a net and prepare to measure the left legs.
As the scientists pull grasshoppers one at a time from the net, they have no way of
knowing the leg lengths until they measure them all. In other words, all five leg lengths are
“free” to vary within some general range for this particular species. The scientists measure
all five leg lengths and then calculate the mean to be x̄ = 10 millimeters.
They then place the grasshoppers back in the net and decide to pull them out one at a time
to measure them again. This time, since the biologists already know the mean to be 10,
only the first four measurements are free to vary within a given range.
If the first four measurements are 8, 9, 10, and 12 millimeters, then there is no freedom for
the fifth measurement to vary; it has to be 11. Thus, once they know the sample mean, the
number of degrees of freedom is 1 less than the sample size, df = 4.
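The grasshopper argument can be sketched numerically: once the mean of n values is fixed, their total is fixed too, so the last value is forced. The function name `forced_last_value` is just an illustrative choice.

```python
def forced_last_value(known_values, mean, n):
    """Return the only value the final measurement can take.

    If the mean of n values is fixed, their total must be mean * n,
    so the last value is whatever is left after the known ones.
    """
    return mean * n - sum(known_values)


n = 5                        # five grasshoppers
first_four = [8, 9, 10, 12]  # leg lengths in mm, from the example above
print(forced_last_value(first_four, mean=10, n=n))  # 11
df = n - 1                   # only n - 1 values were free to vary
print(df)                    # 4
```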
Go through practice Questions 1 and 2.
Chi-square test (goodness of fit)
Genetic Ratios: are deviations significant?
• Mendel crossed two pea plants with green pods. Both were heterozygous, carrying the recessive allele for yellow pods.
• In the offspring, 428 plants had green pods and 152 had yellow pods (total 580).
• The expected ratio is 3:1. The actual ratio is 2.82:1.
• Could this deviation from the ideal ratio be produced by pure chance or is
there a significant deviation?
• This is a job for the χ² (chi-squared) test.
• This test compares actual numerical patterns with expected patterns and gives the probability that chance could have caused the deviations.
Checking Genetic Ratios with χ²
• Enter the observed values into the first column of a table (O = observed).
• Calculate the values expected for a “perfect ratio”: total offspring = 580; ¾ of this is 435 and ¼ is 145.
• Enter these values in the second column (E = expected).
• In the third column, calculate the deviations from the expected values (O − E).
• In the fourth column, square these deviations, (O − E)².
• In the final column, divide each squared deviation by its expected value, (O − E)²/E.
χ² is the sum of the final column:

$$ \chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(428 - 435)^2}{435} + \frac{(152 - 145)^2}{145} = 0.113 + 0.338 = 0.451 $$
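The table steps above can be sketched as a short goodness-of-fit calculation. This is an illustration only: `chi_squared` is a hypothetical helper name, and it prints one row per category with the same columns as the table (O, E, O − E, (O − E)², (O − E)²/E).

```python
def chi_squared(observed, ratio):
    """Goodness-of-fit chi-squared for observed counts against an expected ratio."""
    total = sum(observed)
    # Expected counts for a "perfect" ratio, e.g. 3:1 of 580 -> 435 and 145
    expected = [total * r / sum(ratio) for r in ratio]
    chi2 = 0.0
    print("O", "E", "O-E", "(O-E)^2", "(O-E)^2/E", sep="\t")
    for o, e in zip(observed, expected):
        dev = o - e
        term = dev ** 2 / e
        print(o, e, dev, dev ** 2, round(term, 3), sep="\t")
        chi2 += term  # chi-squared is the sum of the final column
    return chi2


chi2 = chi_squared([428, 152], [3, 1])
print(round(chi2, 3))  # 0.451
```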
THE SIGNIFICANCE TEST
The value of 0.451 for χ² does not mean anything yet.
• First, we must look up the value in significance tables.
• As with the Spearman's rank table, there are lines in the χ² table for different numbers of degrees of freedom.
• For χ², this is the number of data items (categories) minus one, so in this case df = 2 − 1 = 1.
• We see where our calculated value fits on this line.
• It is well below the critical value for p = 0.05 (3.84), so the probability that the deviation is due to chance is p > 0.05. The deviation from the 3:1 ratio is not significant, and we accept the null hypothesis.

Critical values of χ²
Degrees of   Significance level
freedom      0.05     0.01     0.005    0.001
1            3.84     6.64     7.88     10.83
2            5.99     9.21     10.60    13.82
3            7.82     11.34    12.84    16.27
4            9.49     13.28    14.86    18.46
5            11.07    15.09    16.75    20.52
6            12.59    16.81    18.55    22.46
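The table lookup can be sketched as below. Only the p = 0.05 column of the table is copied in, and `chi2_significance` is a hypothetical helper; the degrees of freedom are the number of categories minus one, as described above.

```python
# Critical values of chi-squared at p = 0.05, copied from the table in these notes
CRITICAL_CHI2_P05 = {1: 3.84, 2: 5.99, 3: 7.82, 4: 9.49, 5: 11.07, 6: 12.59}


def chi2_significance(chi2, n_categories):
    """Return (degrees of freedom, critical value, significant?) at p = 0.05."""
    df = n_categories - 1  # number of data items (categories) minus one
    critical = CRITICAL_CHI2_P05[df]
    return df, critical, chi2 > critical


print(chi2_significance(0.451, 2))  # (1, 3.84, False)
```

For the pea-pod data the result is not significant, matching the conclusion p > 0.05.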
VERY IMPORTANT: go over Examples 1, 2 and 3 from the book, plus Q3 on page 47.
Statistics: which test?
PURPOSE | WHICH TEST? | REQUIRES
To check for correlation between two variables, e.g. the effect of temperature on metabolic rate | Spearman's rank correlation test | A minimum number of data points; different sources give between 8 and 15