Professional Documents
Culture Documents
M3 L03 Contingency Analysis D2L
M3 L03 Contingency Analysis D2L
M3 L03 Contingency Analysis D2L
Here is a second type of analysis for frequency data: Contingency analysis. The
underlying arithmetic of the test is the same as the Goodness-of-fit test, but
now we are asking whether two categorical variables are associated with each
other. In contingency analysis, we are trying to determine if the frequencies of
different values for one variable depend on the value of another variable. We
want to determine if one variable is “contingent” or “dependent on” the other.
This is ultimately telling us whether two variables are independent or not.
ex. hockey player lab, the canadian population birth data were extrinctic
ex. tim -rol up thingy , bird example 1
2) Intrinsic expectations/hypotheses -
- When expected frequencies are derived from the data you are analyzing, No information is assumed prior
to the study
Spotted Purplemouth
Grass 127 116
Sand 99 67
Border 264 161
2
It helps to calculate the totals in each category; these are called
“column totals” and “row totals”.
→These are the observed frequencies.
Spotted Purplemouth Total
Grass 127 116 243
Sand 99 67 166
Border 264 161 425
Total 490 344 834
HA:
The proportion of eel species( spotted or purplemouth) will not be equal across the three habitat
types( brass, sand, border) for the coral reef in Belize.
-
→We will do a G-test for Contingency Analysis using a contingency
table
• The hypothesis doesn’t actually tell you what the expected
number of eels sightings in each habitat would be.
• Instead, you need to estimate the expected numbers, using the
column and row totals.
3
Step 1: Calculate expected frequencies for each category
E= expected frequencies
344/834= 41.2%
What does it look like is happening in terms of habitat choice for the
two species?
These expected numbers are intrinsic to the data. We didn’t have any
scientific reason beforehand to say, “I expect 41.2% of eels found in
sandy habitats to be G.vicinus”.
The frequencies are features of our data, so our expected numbers are
said to constitute an intrinsic hypothesis.
4
Step 2: Now that we have our observed and expected frequencies, we
can calculate our test statistic. Again, we will conduct a G-test.
𝑘
𝑂𝑖
G = 2 ∑ 𝑂𝑖 ln ( )
𝐸𝑖
𝑖=1
5
Degrees of freedom for the intrinsic hypothesis:
If you have an intrinsic hypothesis, you have to estimate (c-1)+(r-1)
parameters, where c is the number of columns in your table and r is
the number of rows.
• The reason you only have to estimate c-1 column totals and r-1
row totals is because you can take the total number of
observations N as given.
• So if you know N, and you’ve estimated c-1 column totals, you
also have an estimate of the remaining column total. It’s the
same for rows. If you know N, and you’ve estimated r-1 row
totals, you also have an estimate of the remaining row total.
• For our example of an intrinsic hypothesis that eels and habitat
type are independent, k=6 categories and p= (r-1)+(c-1) = (2-
1)+(3-1) = 3 est. parameters, so df=6-3-1=2; same as (r-1)(c-1) =
(2-1)(3-1) = 2
So…
𝒅𝒇 = 𝒓𝒄 − [(𝒓 − 𝟏) + (𝒄 − 𝟏)] − 𝟏
𝒅𝒇 = 𝟐
6- {(3-1)+(2-1)}-1= 2
6
• So now we know that habitat in which an eel is found is NOT
independent of the eel species, but what classes in the table are
similar to others, and which are different?
We won’t show how to do this, but know that there is a way to look at
exactly HOW the frequencies differ from what you expect.
Partitioning G to see which terms in the calculation of G are
larger/smaller than you would expect due to random chance along.
Summary:
What is the difference between using the G-test as a contingency
analysis or a test for goodness of fit?
- if you have 2 categorical variable looking at independence (contingency) , nor goodness if fit
7
7. All of the following correctly describes contingency analysis except?
a. Contingency analysis is used to determine whether two or
more categorical variables are associated with one another.
b. The degrees of freedom are calculated as the number of
categories minus one.
1. Random sample
c. In contingency analysis, the null hypothesis is testing the
2. No more than 20% of
cells have an expected independence of two or more categorical variables.
frequency of less thand. Contingency analysis utilizes expected frequencies for each
5, power will be too low category combination compared to observed frequencies for
3. No cells have expected each category combination.
frequency less than 1
e. The 𝒳 2 contingency test statistic makes the same
assumptions as the 𝒳 2 goodness-of-fit test.