
Chi squared tests

These are used in biology and in particular in genetics and ecology to test whether the observed results are different to
what you would have expected and whether this difference is statistically significant or just due to chance.

We first need to make a hypothesis. This is a prediction of the outcome. One form of the prediction is that there will be
no difference between the observed and expected results. This is called the null hypothesis, H0. The alternative
hypothesis states that there is a difference between the observed and expected results. It is written as H1.

In Biology the cut-off point for significance is taken as 5%. We use a formula to determine the likelihood that the
observed and expected numbers differ because of random chance alone. If there is a greater than 5% chance (5 in 100,
or 1 in 20) that the numbers differ due to chance, then we treat them as not significantly different. It is just
chance that is causing the observed and expected numbers to look different.

If the formula tells us that the probability that chance is causing the observed difference is very small, i.e. less than
5% (e.g. 3%, 1%, 0.01%), then they are significantly different. Something real is causing them to be different. It is
not chance. There is a causal factor. In this case we would reject the null hypothesis and accept the alternative hypothesis.

Ecology Example.

We are collecting data about the presence or absence of two different species in a series of randomly positioned
quadrats in a sample area. We pose the question: are the species associated with each other, i.e. when we find one do
we also find the other (is something causing it), or is this just due to chance? The null hypothesis would say that there
is no association and that it is simply due to chance. The alternative hypothesis would say that they are associated:
something is causing one to be present with the other.

The data is recorded as a 2 x 2 contingency table as follows:

                      Number of quadrats
                      Species A present    Species A absent    Row totals
Species B present     present/present      present/absent
Species B absent      absent/present       absent/absent
Column totals                                                  Grand total

To calculate expected frequencies, we use the formula: expected frequency = (row total x column total) / grand total

To calculate chi squared we use the formula χ² = ∑ (O - E)² / E, where O is observed and E is expected

To calculate the degrees of freedom we use (number of rows - 1) x (number of columns - 1)
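
The expected-frequency and degrees-of-freedom formulas above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original handout: the function names are made up here, and the counts are the willow survey results from the worked example in this handout.

```python
def expected_frequencies(observed):
    """Return expected counts for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    grand_total = sum(row_totals)
    # expected = row total x column total / grand total, cell by cell
    return [[r * c / grand_total for c in column_totals] for r in row_totals]


def degrees_of_freedom(observed):
    """(number of rows - 1) x (number of columns - 1)."""
    return (len(observed) - 1) * (len(observed[0]) - 1)


# Observed counts from the willow survey in the worked example:
# rows = arctic willow present / absent,
# columns = diamond leaf willow present / absent.
observed = [[52, 38],
            [45, 20]]

expected = expected_frequencies(observed)
for row in expected:
    print([round(value, 1) for value in row])
print("degrees of freedom:", degrees_of_freedom(observed))
```

The expected values always add up to the same row, column and grand totals as the observed values, which is a quick check that the table has been filled in correctly.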

Let us look at an example:

The diamond leaf willow (Salix pulchra) and the arctic willow (Salix arctica) are both found on rocky outcrops in marshy
ground on the arctic tundra. A survey was carried out to see if there was an association between these two species.

Here are the results:

                      Number of quadrats
                      S. pulchra present    S. pulchra absent    Row totals
S. arctica present    52                    38                   90
S. arctica absent     45                    20                   65
Column totals         97                    58                   Grand total 155
In the table below, calculate the expected values:

                      Number of quadrats
                      S. pulchra present    S. pulchra absent    Row totals
S. arctica present    90 x 97/155 = 56.3                         90
S. arctica absent     65 x                                       65
Column totals         97                    58                   Grand total 155

This should be the completed table:

                      Number of quadrats
                      S. pulchra present    S. pulchra absent    Row totals
S. arctica present    56.3                  33.7                 90
S. arctica absent     40.7                  24.3                 65
Column totals         97                    58                   Grand total 155

The data can now be entered into a chi squared table:

                    O      E       O-E    (O-E)²    (O-E)²/E
Both present        52     56.3
S. pulchra only     45
S. arctica only     38
Both absent         20
Total               155                             ∑ =

Did you get ∑ = 2.09? (This uses the expected values rounded to one decimal place. Unrounded expected values give 2.11.)
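
One way to check the chi squared table is to let Python fill in each column. This is a sketch rather than part of the original handout: the helper name chi_squared is made up here, and the expected values are the rounded ones from the completed table above, with an unrounded comparison added at the end.

```python
# (label, observed O, expected E rounded to 1 decimal place as in the handout)
rows = [
    ("Both present",             52, 56.3),
    ("Diamond leaf willow only", 45, 40.7),
    ("Arctic willow only",       38, 33.7),
    ("Both absent",              20, 24.3),
]


def chi_squared(pairs):
    """Sum (O - E)^2 / E over (O, E) pairs."""
    return sum((o - e) ** 2 / e for o, e in pairs)


print(f"{'':26}{'O':>5}{'E':>7}{'O-E':>7}{'(O-E)^2':>9}{'(O-E)^2/E':>11}")
for label, o, e in rows:
    diff = o - e
    print(f"{label:26}{o:>5}{e:>7.1f}{diff:>7.1f}{diff**2:>9.2f}{diff**2 / e:>11.3f}")

total_rounded = chi_squared((o, e) for _, o, e in rows)
print(f"Sum with rounded E:   {total_rounded:.2f}")

# Same sum with unrounded expected values (row total x column total / 155).
exact_expected = [90 * 97 / 155, 65 * 97 / 155, 90 * 58 / 155, 65 * 58 / 155]
total_exact = chi_squared(zip([52, 45, 38, 20], exact_expected))
print(f"Sum with unrounded E: {total_exact:.2f}")
```

The rounded expected values give 2.09 and the unrounded ones give 2.11. The small gap comes only from rounding and does not change the conclusion.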

Now we go to the most important part of the exercise, and the one that is most likely to be asked about, as it checks
your understanding of the process.

Here is a chi squared table. The p value stands for probability value. The chi squared values are in the body of the
table. The degrees of freedom are in the left-hand column.

The important cut-off point is the 5% level. Chi squared values to the right of it indicate a decreasing probability that
the results are different due to chance. Values to the left of it indicate an increasing probability that it is just
chance that is causing the difference in numbers between observed and expected.

To the right: reject the null hypothesis ("right, reject"). There is only a very small probability that observed and
expected are different due to chance. The species do occur together. In other words, reject the idea that there is no
association, and so there is an association.

To the left: accept the null hypothesis. It is just due to chance. There is no association.

So now let us analyse our result. We got a chi squared value of 2.09.

We have 1 degree of freedom. Look back at the formula for this to work it out.

2.09 is to the left of the cut-off value at the 5% level (3.84 for 1 degree of freedom). It lies between a probability of
10% and 15% that the results are different just due to chance. This is considered a high probability as it is greater
than the 5% probability. The observed and expected values are not statistically different. We accept the null hypothesis
that says that there is no association.
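
The table look-up can also be checked numerically. The sketch below is not from the original handout. It uses a shortcut that only holds for 1 degree of freedom: chi squared is then the square of a standard normal variable, so the probability can be found with Python's math.erfc. The function name and the constant are made up for illustration, and 3.84 is the standard table value at the 5% level for 1 degree of freedom.

```python
import math


def p_value_1_dof(chi_squared):
    """Probability of a chi squared value at least this large, 1 degree of freedom only.

    With 1 degree of freedom chi squared is the square of a standard normal
    variable, so the tail probability is erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi_squared / 2))


CRITICAL_5_PERCENT = 3.84  # standard table value, 1 degree of freedom, p = 0.05

chi_sq = 2.09  # value from the worked example
p = p_value_1_dof(chi_sq)
print(f"p = {p:.3f}")

if chi_sq > CRITICAL_5_PERCENT:
    print("Reject the null hypothesis: there is an association.")
else:
    print("Accept the null hypothesis: no significant association.")
```

For 2.09 the probability comes out between 10% and 15%, matching the position read from the table, so the null hypothesis is accepted.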

If you are going to be asked to do the calculations it is likely that they will give you the formula. In this case it will appear
in paper 2 and you are allowed to use a calculator in paper 2.

You can do these calculations on an online calculator as follows. Check that you can do this with the numbers above.

1. Go to http://www.physics.csbsju.edu/stats/contingency_NROW_NCOLUMN_form.html
2. Enter 2 and 2 for the rows and columns
3. Click submit
4. Enter your 4 observed values
5. Click calculate now
6. Make sure you match the number to the box as putting the values in different positions may give a wrong
calculation
7. This gives the expected numbers, the chi squared value and the probability.
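
Step 6 matters because the result depends on which box each number goes in. The sketch below is an illustration only and is not part of the original handout: the function name is made up, it applies no continuity correction (the handout does not say whether the online calculator does), and the "misplaced" entry is a deliberate mistake using the willow numbers.

```python
def chi_squared_2x2(top_left, top_right, bottom_left, bottom_right):
    """Chi squared for a 2 x 2 table, expected = row total x column total / grand total."""
    table = [[top_left, top_right], [bottom_left, bottom_right]]
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * column_totals[j] / grand_total
            total += (observed - expected) ** 2 / expected
    return total


correct = chi_squared_2x2(52, 38, 45, 20)     # positions as in the results table
transposed = chi_squared_2x2(52, 45, 38, 20)  # rows and columns exchanged
misplaced = chi_squared_2x2(38, 52, 45, 20)   # top row typed in the wrong order

print(f"correct:    {correct:.2f}")
print(f"transposed: {transposed:.2f}")
print(f"misplaced:  {misplaced:.2f}")
```

Exchanging rows and columns leaves chi squared unchanged, but swapping two numbers within a row gives a value well above the 5% cut-off and would lead to the wrong conclusion.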

Maths Studies students can also do the calculation on their scientific calculators.
