
CP2403 - Module 6

Inferential Tool – Chi-Square Test of Independence


Summary of Module 5
• Inferential Tools
• Probability
• Sampling Variability
• Central Limit Theorem
• Hypothesis Testing
• Steps in Hypothesis Testing
• p-value
• ANOVA
• Post hoc test
Learning outcome – Module 6
By the end of Module 6, you should be able to
• perform hypothesis testing using the Chi-Square Test of Independence when two categorical variables are involved
• interpret the results of a Chi-Square Test of Independence
Topics covered in Week 6
• Chi-Square Test of Independence
• Post hoc test
What tools to use?
                                      Response Variable
                                      Categorical            Quantitative
Explanatory Variable   Categorical    C -> C (Chi Square)    C -> Q
                       Quantitative   Q -> C                 Q -> Q
Example
• In the 1970s, the sale of beer was prohibited to males under 21, but sale to females under 21 was allowed
• The law was challenged in court, and data from a random breath test survey were presented
Hypothesis Testing
Step 1:
Null hypothesis (Ho):
• There is no difference in the drunk driving rate between males and females
under 21
• The proportion of male drunk drivers = The proportion of female drunk drivers

Alternative hypothesis (Ha):
• There is a difference in the drunk driving rate between males and females
under 21
• The proportion of male drunk drivers ≠ The proportion of female drunk drivers
Step 2: Data source
• data on random breath test survey

Raw data
Driver       Gender   Drunk
Driver 1     M        Y
Driver 2     F        N
Driver …     …        …
Driver 619   F        Y

Cross tab
Gender    Yes   No    Total
Male      77    404   481
Female    16    122   138
Total     93    526   619
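Turning the raw records into the cross tab is just counting each (gender, drunk) combination. With the real data file you would typically call pandas.crosstab(df["gender"], df["drunk"], margins=True). The stdlib sketch below assumes we only have the cell counts, so it rebuilds 619 illustrative records from the table; the names cell_counts and records are hypothetical, not from the original dataset.

```python
from collections import Counter

# Hypothetical reconstruction of the 619 raw records from the cross-tab cells
# (the slide only shows drivers 1, 2 and 619, not the full file)
cell_counts = {("Male", "Yes"): 77, ("Male", "No"): 404,
               ("Female", "Yes"): 16, ("Female", "No"): 122}
records = [pair for pair, n in cell_counts.items() for _ in range(n)]

# Count each (gender, drunk) combination to form the cross tab
crosstab = Counter(records)

# Margins: row totals by gender, column totals by drunk status
row_totals = Counter(gender for gender, _ in records)
col_totals = Counter(drunk for _, drunk in records)

for gender in ("Male", "Female"):
    print(gender, crosstab[(gender, "Yes")], crosstab[(gender, "No")], row_totals[gender])
print("Total", col_totals["Yes"], col_totals["No"], len(records))
```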
Step 3: Assess the evidence
Calculate percentages
Explanatory variable: Gender (x)
Response variable: Drunk

Gender (x)   Yes                No                  Total
Male         77/481 = 16.0%     404/481 = 84.0%     481 (100%)
Female       16/138 = 11.6%     122/138 = 88.4%     138 (100%)
Total        93                 526                 619

16.0% > 11.6%
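The row percentages above divide each cell by its row total. A short sketch, assuming the counts are typed in by hand (pandas.crosstab with normalize="index" would give the same row proportions from raw data):

```python
# Observed counts from the cross tab: gender -> (yes, no)
observed = {"Male": (77, 404), "Female": (16, 122)}

# Row percentages: each cell divided by its own row total
row_pct = {}
for gender, (yes, no) in observed.items():
    total = yes + no
    row_pct[gender] = (100 * yes / total, 100 * no / total)
    print(f"{gender}: Yes {row_pct[gender][0]:.1f}%  No {row_pct[gender][1]:.1f}%  (n={total})")
```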


So can we conclude that male drivers are more likely to drink and drive than female drivers?

Not yet: the gap between 16.0% and 11.6% could be due to sampling variability alone.
So we use a Chi Square Test
Break for probability
Example - Dice
One Die
What is the probability of getting 1?
What is the probability of getting 2?
What is the probability of getting 3?
What is the probability of getting 4?
What is the probability of getting 5?
What is the probability of getting 6?
One Die Example
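• A single 6-sided die is rolled. What is the probability of rolling a 2 or a 5?

The number rolled can be a 2: P(A) = 1/6

The number rolled can be a 5: P(B) = 1/6

A and B cannot happen on the same roll (they are mutually exclusive), so
P(A or B) = P(A) + P(B) = 1/6 + 1/6 = 1/3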
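The addition rule can be checked by brute-force counting of equally likely outcomes. A minimal sketch using exact fractions; the helper prob is illustrative only:

```python
from fractions import Fraction

faces = range(1, 7)  # outcomes of one fair 6-sided die, all equally likely

def prob(event):
    """Probability of an event (a predicate on a face) by counting outcomes."""
    return Fraction(sum(1 for f in faces if event(f)), len(faces))

p_2 = prob(lambda f: f == 2)
p_5 = prob(lambda f: f == 5)
p_2_or_5 = prob(lambda f: f in (2, 5))

print(p_2, p_5, p_2_or_5)  # 1/6 1/6 1/3
```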


Two Dice Example
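• Two 6-sided dice are rolled. What is the probability of rolling a 2 on the first die and a 5 on the second?

The first die can be a 2: P(A) = 1/6

The second die can be a 5: P(B) = 1/6

The two dice do not influence each other (they are independent), so
P(A and B) = P(A) * P(B) = 1/6 * 1/6 = 1/36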
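The multiplication rule can be checked the same way by listing all 36 equally likely (first die, second die) pairs. A minimal sketch:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes
outcomes = list(product(range(1, 7), repeat=2))

p_first_2 = Fraction(sum(1 for a, _ in outcomes if a == 2), len(outcomes))
p_second_5 = Fraction(sum(1 for _, b in outcomes if b == 5), len(outcomes))
p_both = Fraction(sum(1 for a, b in outcomes if a == 2 and b == 5), len(outcomes))

print(p_first_2, p_second_5, p_both)  # 1/6 1/6 1/36
```

If the order does not matter (a 2 and a 5 in either order), two of the 36 outcomes qualify, so the probability is 2/36 = 1/18.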


Chi Square Test
Step 3: Assess the evidence (Chi Square Test)
• In a chi square test, we measure the difference between the counts that were
• Observed
• Expected if the null hypothesis were true

Observed
Gender   Yes   No    Total
Male     77    404   481
Female   16    122   138
Total    93    526   619

Expected if the hypothesis was true
Gender   Yes   No    Total
Male     ???   ???   481
Female   ???   ???   138
Total    93    526   619
Calculating the expected value if the hypothesis was true

Observed
Gender   Yes   No    Total
Male     77    404   481
Female   16    122   138
Total    93    526   619

If Ho is true, gender and drunk driving are independent, so
P(A and B) = P(A) * P(B)

P(Male & Drunk) = P(Male) * P(Drunk)
= (481/619) * (93/619) ≈ 0.777 * 0.150 ≈ 0.1168

So the expected number of male drunk drivers is
0.1168 * 619 ≈ 72.3

Expected if the hypothesis was true
Gender   Yes    No    Total
Male     72.3   ???   481
Female   ???    ???   138
Total    93     526   619
Formula for expected count

$$\text{Expected Count} = \frac{\text{Column Total} \times \text{Row Total}}{\text{Table Total}}$$

Observed
Gender   Yes   No    Total
Male     77    404   481
Female   16    122   138
Total    93    526   619

Expected if the hypothesis was true
Gender   Yes                    No                        Total
Male     72.3                   (526*481)/619 = 408.7     481
Female   (93*138)/619 = 20.7    (526*138)/619 = 117.3     138
Total    93                     526                       619
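The expected-count formula applies to every cell at once. A stdlib sketch, assuming rows are Male/Female and columns are Yes/No (scipy.stats.contingency.expected_freq computes the same table if SciPy is available); it also confirms that the probability route from the previous slide gives the same Male/Yes value:

```python
# Observed 2 x 2 table: rows = Male, Female; columns = Yes, No
observed = [[77, 404],
            [16, 122]]

row_totals = [sum(row) for row in observed]        # [481, 138]
col_totals = [sum(col) for col in zip(*observed)]  # [93, 526]
table_total = sum(row_totals)                      # 619

# Expected count under Ho: (row total * column total) / table total
expected = [[r * c / table_total for c in col_totals] for r in row_totals]

for row in expected:
    print([round(x, 1) for x in row])
# [72.3, 408.7]
# [20.7, 117.3]

# Probability route for the Male/Yes cell: P(Male) * P(Drunk) * N
p_route = (row_totals[0] / table_total) * (col_totals[0] / table_total) * table_total
```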
Chi Square Stats

$$\chi^2 = \sum_{\text{all cells}} \frac{(\text{Observed Count} - \text{Expected Count})^2}{\text{Expected Count}}$$

Observed
Gender   Yes   No    Total
Male     77    404   481
Female   16    122   138
Total    93    526   619

Expected if the hypothesis was true
Gender   Yes    No      Total
Male     72.3   408.7   481
Female   20.7   117.3   138
Total    93     526     619
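The chi square statistic sums (O - E)^2 / E over the four cells. A minimal stdlib sketch of that sum; chi_square_stat is an illustrative helper name, not a library function:

```python
observed = [[77, 404],
            [16, 122]]

def chi_square_stat(observed):
    """Pearson chi-square statistic: sum over cells of (O - E)^2 / E."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count for this cell
            stat += (o - e) ** 2 / e
    return stat

chi2 = chi_square_stat(observed)
print(round(chi2, 2))  # 1.64
```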

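Observed count, with expected count in parentheses
Gender   Yes         No            Total
Male     77 (72.3)   404 (408.7)   481
Female   16 (20.7)   122 (117.3)   138
Total    93          526           619

χ² ≈ 1.64, p = 0.201 (p > 0.05)
• For a 2 x 2 table (1 degree of freedom, α = 0.05), χ² is large if it is > 3.84
• Therefore, our χ² is not large

Step 4: Draw conclusions
• The data do not show a difference in the drunk driving rate between males and females under 21
• Fail to reject the null hypothesis (the data are consistent with Ho; this does not prove Ho is true)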
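The p-value and the 3.84 cut-off can be reproduced without a statistics package. For 1 degree of freedom, χ² is the square of a standard normal variable, so its upper-tail probability is erfc(sqrt(χ²/2)). A sketch under that df = 1 assumption; the helper name is hypothetical:

```python
import math

def chi2_p_value_df1(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    For df = 1, chi-square is the square of a standard normal, so
    P(X > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    """
    return math.erfc(math.sqrt(stat / 2))

observed = [[77, 404], [16, 122]]
(a, b), (c, d) = observed
n = a + b + c + d
# Shortcut for a 2 x 2 table; equals the sum of (O - E)^2 / E over the cells
stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

p = chi2_p_value_df1(stat)
critical_value = 3.84  # chi-square critical value for df = 1, alpha = 0.05
print(f"chi2 = {stat:.2f}, p = {p:.3f}")  # chi2 = 1.64, p = 0.201
print("reject Ho" if p < 0.05 else "fail to reject Ho")
```

With SciPy, scipy.stats.chi2_contingency(observed, correction=False) should give the same statistic and p-value. Its default, correction=True, applies Yates' continuity correction to 2 x 2 tables and returns a somewhat larger p-value, so check which setting a demo uses before comparing numbers.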
Example 2
• We want to find out whether how often a young person smokes is related to their nicotine dependence

Step 1:
Null hypothesis (Ho):
Rate of smoking in nicotine-dependent young people
= Rate of smoking in young people who are not nicotine dependent

Alternative hypothesis (Ha):
Rate of smoking in nicotine-dependent young people
≠ Rate of smoking in young people who are not nicotine dependent
Smoking frequency (X): explanatory variable, categorical, 6 levels
Nicotine dependence (Y): response variable, categorical (0/1)
Hypothesis Testing
Step 2: Data source
• We select data from the NESARC dataset
• Young adults 18-25 years old
• Daily smokers

Hypothesis Testing
Step 3: Assess the evidence (Cross tab)
• 64 people who smoke one day a month are not nicotine dependent
• 7 people who smoke one day a month are nicotine dependent
• 41 daily smokers are not nicotine dependent
• 27 daily smokers are nicotine dependent
Hypothesis Testing
Step 3: Assess the evidence (Cross tab: % and χ²)
• The percentage of young people who are nicotine dependent increases with how often they smoke
• p < 0.05
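The percentages in this cross tab are column percentages: the dependent count divided by the total for each smoking category. The slides quote counts for only two of the six categories, so this sketch covers just those two. The other four columns are not reproduced here:

```python
# Cell counts quoted on the cross-tab slide: category -> (not dependent, dependent)
counts = {
    "1 day a month": (64, 7),
    "daily": (41, 27),
}

# Percentage nicotine dependent within each smoking category
pct_dependent = {}
for category, (not_dep, dep) in counts.items():
    pct_dependent[category] = 100 * dep / (not_dep + dep)
    print(f"{category}: {pct_dependent[category]:.1f}% nicotine dependent (n={not_dep + dep})")
```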
Step 4: Draw conclusion
Reject Ho (Rate of smoking in nicotine-dependent young people
= Rate of smoking in young people who are not nicotine dependent)

Accept Ha (Rate of smoking in nicotine-dependent young people
≠ Rate of smoking in young people who are not nicotine dependent)
Further investigation

In addition, we want to know which smoking frequencies differ from one another in their rate of nicotine dependence
Post hoc tests
• As we have rejected the null hypothesis, we need to compare the rate of nicotine dependence for each pair of the six smoking frequency categories

Comparison                                            P value
Smoking 1 day a month vs smoking 2.5 days a month     ???
Smoking 1 day a month vs smoking 5 days a month       ???
…                                                     ???
…                                                     ???
Smoking 22 days a month vs smoking 30 days a month    ???

• With 6 categories there are 6 × 5 / 2 = 15 pairs, so we will be doing 15 comparisons
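Listing the pairs makes the count of 15 concrete. A sketch, assuming the six categories are the day counts named in the post hoc results (1, 2.5, 5, 14, 22 and 30 days a month):

```python
from itertools import combinations
from math import comb

# Smoking-frequency categories (days smoked per month) named in the post hoc results
categories = [1, 2.5, 5, 14, 22, 30]

# Every unordered pair of categories is one post hoc comparison
pairs = list(combinations(categories, 2))
for a, b in pairs:
    print(f"{a} vs {b} days a month")

print(len(pairs), comb(len(categories), 2))  # 15 15
```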


Post hoc tests
• Bonferroni Adjustment
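• Do not use p < 0.05 for each comparison
• Instead, the new threshold is 0.05/15 (number of comparisons) ≈ 0.0033, which keeps the chance of at least one false positive across all 15 tests at or below 0.05
• A pair is significantly different only if p < 0.0033

• Python demo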
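One post hoc comparison works like the earlier 2 x 2 test, judged against the Bonferroni threshold instead of 0.05. A minimal sketch, assuming an uncorrected chi-square test on each pair (the course demo may use SciPy with Yates' correction, which changes the exact p-values). The only pair with counts on the slides is "1 day a month" vs "daily", so that pair is used here. Helper names are hypothetical.

```python
import math

ALPHA = 0.05
N_COMPARISONS = math.comb(6, 2)         # 15 pairs of smoking categories
adjusted_alpha = ALPHA / N_COMPARISONS  # Bonferroni-adjusted threshold

def pairwise_chi2_p(table):
    """p-value of an uncorrected chi-square test on a 2 x 2 table (df = 1)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return math.erfc(math.sqrt(stat / 2))

# Counts from the cross-tab slide:
# rows = 1 day a month, daily; columns = not dependent, dependent
table = [[64, 7], [41, 27]]
p = pairwise_chi2_p(table)

print(f"adjusted alpha = {adjusted_alpha:.4f}")  # adjusted alpha = 0.0033
print(f"p = {p:.2e}, significant: {p < adjusted_alpha}")
```

In a full analysis, the same function would be applied to each of the 15 sub-tables, and only pairs with p below the adjusted threshold would be reported as different.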
Post hoc tests

• The rate of nicotine dependence is significantly higher in young people who
• smoke 14 days a month than those who smoke once a month
• smoke 22 days a month than those who smoke once a month
• smoke 30 days a month than those who smoke once a month
• smoke 30 days a month than those who smoke 2.5 days a month
• smoke 30 days a month than those who smoke 5 days a month
• smoke 30 days a month than those who smoke 14 days a month
• smoke 30 days a month than those who smoke 22 days a month
Summary of Module 6
• Chi-Square Test of Independence
• Post hoc test
Prac 6 Overview
