PSM 201 The Chi Square Test

The Chi square test

BY
BABATUNDE ADEDOKUN
MBBS; MSc Epid & Med Stat (Ib.)
The Chi square test
• Significance test for testing the association
between two categorical variables
• Equivalent to a comparison of proportions
• Data usually presented in contingency tables
where the variables are cross classified
• It compares observed frequencies with the
frequencies expected assuming there was no
association between the variables i.e. under
the null hypothesis assumption
Cross-tabulations
• Categorical variables are cross classified in tables known
as contingency tables
• Horizontal panels are known as rows while the vertical
panels are called columns
• The space formed by the intersection of a row and a
column is called a cell
• A contingency table is named according to the number
of categories of the row variable by that of the column
variable
• A table with an independent variable with 5 categories
and the dependent variable with 3 categories is a 5 by 3
table
Cross-tabulations 2
• Often, there is a dependency in the relationship
between two variables
– Smoking (independent variable) could influence the
risk of lung cancer (dependent variable)
– Maternal education (independent variable) affects
breastfeeding practice (dependent variable) or
utilization of antenatal care (dependent variable)
• Conventionally, the dependent variable is inserted
on top (column variable) while the independent
variable is placed at the side (row variable)
Some rules for tables
• Proper title (study population, time and place)
• Avoid overcrowding with too many variables
• Percentages should be reported along with frequencies
• Conventionally independent variables are placed in rows
and the dependent in the column section
• Some tables may not be necessary (simply describe)
• Do not present the same information in both a table and
a chart
• With the independent variable in the row, percentages
should add up to 100% in the direction of the row
The chi-square test

• This is the test of choice for investigating the significance of an
association between 2 qualitative variables
• It involves the use of contingency tables, which help to assess
the association
• The test is based on the difference between the observed
data and what is expected if the null hypothesis (of no
association) were true.

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

where:
$O_i$ = observed frequencies
$E_i$ = expected frequencies if the null hypothesis were true
d.f. = (r − 1) × (c − 1), where r is the number of rows and c the number of columns
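
The formula and the degrees of freedom above can be sketched in code. This is a minimal illustration, not a reference implementation; the function names `chi_square_statistic` and `degrees_of_freedom` are made up for this example, and both tables are assumed to be lists of rows with the same shape.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over every cell of the table."""
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            total += (o - e) ** 2 / e
    return total


def degrees_of_freedom(n_rows, n_cols):
    """d.f. = (r - 1) x (c - 1) for an r-by-c contingency table."""
    return (n_rows - 1) * (n_cols - 1)
```

For instance, a 5 by 3 table gives `degrees_of_freedom(5, 3)`, which is 8, while a 2 by 2 table has 1 degree of freedom.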
The chi-square test – an example

• 14 out of 60 patients seen in a PHC in one community had IBS, and 4 out of
50 patients seen in another community had IBS. Do the data suggest any
association between IBS and location?
• Solution
– Identify the independent and dependent variables
– Present data in a contingency table
                 IBS
Region      Yes    No    Total
Comm 1       14    46       60
Comm 2        4    46       50
Total        18    92      110

– State the null hypothesis
– State the alternative hypothesis
– Choose the significance level
– Compute the expected frequencies
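
Because the test is equivalent to a comparison of proportions, it can help to look at the row percentages of the observed table before testing. The sketch below uses the counts from this example; the dictionary layout and the helper name `row_percentages` are assumptions made for illustration.

```python
# Observed counts from the example: rows are communities, columns are IBS status.
observed = {
    "Comm 1": {"Yes": 14, "No": 46},
    "Comm 2": {"Yes": 4, "No": 46},
}


def row_percentages(table):
    """Express each cell as a percentage of its row total."""
    result = {}
    for row_name, cells in table.items():
        row_total = sum(cells.values())
        result[row_name] = {k: 100 * v / row_total for k, v in cells.items()}
    return result


percentages = row_percentages(observed)
for row_name, cells in percentages.items():
    print(row_name, {k: round(v, 1) for k, v in cells.items()})
```

With the independent variable (community) in the rows, each row adds up to 100%: about 23.3% of patients in community 1 had IBS compared with 8.0% in community 2.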
The null and alternative hypotheses
• Null hypothesis:
– There is no association between location and
IBS OR there is no difference in the
occurrence of IBS between communities I and
II
• Alternative hypothesis
– There is an association between location and
IBS OR there is a difference in the occurrence
of IBS between communities I and II
Table of expected frequencies
                            IBS
Region      Yes                    No                      Total
Comm 1      (60×18)/110 = 9.8      (60×92)/110 = 50.2      60
Comm 2      (50×18)/110 = 8.2      (50×92)/110 = 41.8      50
Total       18                     92                      110

• Hint: if H0 were true, the same proportion of people (18/110) should
have IBS in the 2 communities, so the expected number with IBS in each
community is in proportion to the total number of people seen there.
• E1 = (18/110) × 60; E2 = (18/110) × 50; E3 = (92/110) × 60; E4 =
(92/110) × 50
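
The rule in the hint (row total × column total ÷ grand total) can be sketched as a small function. The name `expected_frequencies` is hypothetical, and the table is assumed to be a list of rows of observed counts.

```python
def expected_frequencies(observed):
    """Expected count per cell under H0: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Observed counts from the example: rows are Comm 1 and Comm 2, columns are Yes and No.
observed = [[14, 46], [4, 46]]
expected = expected_frequencies(observed)
for row in expected:
    print([round(e, 1) for e in row])
```

Rounded to one decimal place, this reproduces 9.8 and 50.2 for community 1, and 8.2 and 41.8 for community 2.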
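• Using the formula for the Chi square, with the unrounded expected
frequencies, the four cells contribute
$\chi^2 = 1.78 + 0.35 + 2.14 + 0.42 \approx 4.69$
• From the table, at the 5% significance level and 1 d.f., the critical
value of $\chi^2$ is 3.84. Since 4.69 > 3.84, we reject H0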

• We conclude that there is a statistically significant
association between location and IBS OR there is a
statistically significant difference in the proportions with
IBS between communities I and II
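
The whole worked example can be checked with a short script. This is a sketch under stated assumptions: the function name `chi_square_2x2` is made up, no continuity correction is applied, and the p-value shortcut through `math.erfc` holds only for 1 degree of freedom. The critical value 3.84 is taken from the text rather than computed.

```python
import math


def chi_square_2x2(observed):
    """Return (chi-square statistic, p-value) for a 2-by-2 table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under H0
            chi2 += (o - e) ** 2 / e
    # Valid for 1 d.f. only: P(chi-square > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


CRITICAL_VALUE = 3.84  # 5% significance level, 1 d.f. (from the text)
chi2, p_value = chi_square_2x2([[14, 46], [4, 46]])
reject_h0 = chi2 > CRITICAL_VALUE
print(round(chi2, 2), reject_h0)
```

Computed from unrounded expected frequencies, the statistic comes out at about 4.69, which is above 3.84, so H0 is rejected. A hand calculation that rounds the expected frequencies first can differ slightly in the second decimal place.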
