Hypothesis Testing: Categorical Data: Outline
Outline:
1) Two-sample test for proportions for Independent samples:
Chi-Square Test, Fisher’s Exact Test (Categorical Data).
• Examples:
Gender: male/female (2 categories).
Breast cancer: yes/no.
Race: White, African, Asian, Hispanic (4 categories).
Two-sample test of proportions:
Independent samples
• Assume we want to study whether age of the woman when she had
her first baby is associated with her risk of having breast cancer.
• You want to assess whether having first childbirth at older age (≥ 30)
increases the risk of breast cancer in women.
EPHD310 Basic Biost: lecture 7 Dr Jaffa
Two-sample test of proportions:
Independent samples
H0: Age of having first child and breast cancer are not associated
H1: Age of having first child and breast cancer are associated
Two-sample test of proportions: Independent samples
H0: There is no association between the two categorical variables
H1: There is an association between the two categorical variables
Two-sample test of proportions: Independent samples
Observed table:
                              Age at birth of first child
Status                        ≥ 30          ≤ 29          Total
Breast cancer (cases)         O11 = 683     O12 = 2537    3220
No breast cancer (controls)   O21 = 1498    O22 = 8747    10,245
Total                         2181          11,284        13,465
Expected table:
                              Age at birth of first child
Status                        ≥ 30            ≤ 29            Total
Breast cancer (cases)         E11 = 521.6     E12 = 2698.4    3220
No breast cancer (controls)   E21 = 1659.4    E22 = 8585.6    10,245
Total                         2181            11,284          13,465
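Each expected count is (row total × column total) / grand total:
$E_{11} = \frac{3220 \times 2181}{13{,}465} = 521.6$
$E_{12} = \frac{3220 \times 11{,}284}{13{,}465} = 2698.4$
$E_{21} = \frac{10{,}245 \times 2181}{13{,}465} = 1659.4$
$E_{22} = \frac{10{,}245 \times 11{,}284}{13{,}465} = 8585.6$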
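The expected counts can be reproduced with a few lines of Python. This is a minimal sketch that uses only the observed counts from the table; the function name `expected_counts` is an illustrative choice, not something from the lecture.

```python
# Observed 2x2 table from the breast cancer example.
# Rows: cases, controls. Columns: age at first birth >= 30, <= 29.
observed = [[683, 2537],
            [1498, 8747]]


def expected_counts(table):
    """Return E_ij = (row i total * column j total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
for row in expected:
    print([round(e, 1) for e in row])
# prints [521.6, 2698.4] and then [1659.4, 8585.6]
```

The row and column totals of the expected table equal those of the observed table, which is a quick way to check the arithmetic.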
Two-sample test of proportions: Independent samples
Yates-corrected Chi-square Test:
• Note: This test should be used only if none of the four expected counts Eij is less than 5;
otherwise, report the P-value from Fisher’s exact test.
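The formula on this slide did not survive extraction. The sketch below assumes the standard Yates-corrected form, the sum over the four cells of (|O − E| − 0.5)² / E, and applies it to the observed and expected counts from the breast cancer tables. The function name and the choice to raise an error when an expected count is below 5 are illustrative assumptions.

```python
# Observed and expected counts copied from the breast cancer tables.
observed = [[683, 2537], [1498, 8747]]
expected = [[521.6, 2698.4], [1659.4, 8585.6]]


def yates_chi_square(obs, exp):
    """Yates-corrected chi-square: sum of (|O - E| - 0.5)^2 / E over all cells."""
    # Rule from the slide: every expected count must be at least 5.
    if any(e < 5 for row in exp for e in row):
        raise ValueError("an expected count is below 5; report Fisher's exact test instead")
    return sum((abs(o - e) - 0.5) ** 2 / e
               for o_row, e_row in zip(obs, exp)
               for o, e in zip(o_row, e_row))


x2 = yates_chi_square(observed, expected)
print(round(x2, 1))  # roughly 78, far above the 1-df critical value of 3.84
```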
Chi-Square Test for the Proportions of Two Independent Samples
Yates-corrected Chi-square test : SPSS Output
[SPSS output: Chi-Square Tests table, with the chi-square statistic X2 highlighted]
Chi-Square Test for the Proportions of Two Independent Samples
Yates-corrected Chi-square test : SPSS Output
• The associated P-value is < 0.001 (SPSS displays it as 0.000), indicating
that the Chi-square test is significant and that we can reject the null
hypothesis of equal proportions.
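For a 2x2 table the statistic has 1 degree of freedom, and the P-value can be computed without statistical software. The sketch below relies on the fact that a chi-square variable with 1 df is the square of a standard normal, so the upper-tail probability is erfc(sqrt(X2 / 2)). The value 77.9 used at the end is an assumption, roughly what the Yates-corrected formula gives for the breast cancer table; it is not read from the SPSS output.

```python
import math


def chi_square_p_value_1df(x2):
    """Upper-tail P-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x2 / 2.0))


# Sanity check: the 1-df critical value 3.84 corresponds to P = 0.05.
p_at_critical = chi_square_p_value_1df(3.84)
print(round(p_at_critical, 3))  # 0.05

# Assumed statistic of about 77.9 for the breast cancer example.
p_example = chi_square_p_value_1df(77.9)
print(p_example < 0.001)  # True
```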
[Figure: Chi-Square Distribution]
Two-sample test for proportions for Matched-Pair
data: McNemar’s Test
Example (continued)
• Breast cancer patients are assigned to matched pairs
based on age and clinical condition.
Two-sample test for proportions for Matched-Pair data: McNemar’s Test
• The matched pairs in cell “a” are called concordant pairs since the
outcome “survive for 5 years” is common between each pair.
• The matched pairs in cell “d” are also called concordant pairs since
the outcome “die within 5 years” is common between each pair.
• For instance, cell “a” contains 510 matched pairs, i.e. 1020
individuals. The grand total is 621 matched pairs,
i.e. 1242 individuals.
Two-sample test for proportions for Matched-pair
data: McNemar’s Test
• The total number of discordant pairs in the breast cancer
example is nD = b + c = 16 + 5 = 21 matched pairs.
$X^2 = \frac{(|b - c| - 1)^2}{b + c}$
• Note: this test can be used only if the total number of
discordant pairs is greater than or equal to 20.
Two-sample test for proportions for Matched-pair data:
McNemar’s Test
Back to the breast cancer example:
A 2x2 contingency table with matched pair as the sampling unit
based on 621 matched pairs
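                                 Outcome of treatment B patient
Outcome of treatment A patient   Survive for 5 years   Die within 5 years   Total
Survive for 5 years              a = 510               b = 16               526
Die within 5 years               c = 5                 d = 90               95
Total                            515                   106                  621
(d and the marginal totals follow from a = 510, b = 16, c = 5 and the 621 pairs in total.)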
Hypotheses:
H0: Treatments A and B are equally effective in terms of 5-year survival.
H1: Treatments A and B are not equally effective in terms of 5-year survival.
$X^2 = \frac{(|b - c| - 1)^2}{b + c} = \frac{(|16 - 5| - 1)^2}{16 + 5} = 4.76 > \chi^2_{1,0.95} = 3.84$
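The McNemar calculation above can be checked numerically. This is a minimal sketch using the discordant counts b = 16 and c = 5 from the example; the function name and the error raised when there are fewer than 20 discordant pairs are illustrative choices. The P-value uses the 1-df identity P = erfc(sqrt(X2 / 2)).

```python
import math


def mcnemar_statistic(b, c):
    """Continuity-corrected McNemar statistic (|b - c| - 1)^2 / (b + c)."""
    n_discordant = b + c
    # Rule from the lecture: the test needs at least 20 discordant pairs.
    if n_discordant < 20:
        raise ValueError("fewer than 20 discordant pairs; this form of the test does not apply")
    return (abs(b - c) - 1) ** 2 / n_discordant


x2 = mcnemar_statistic(16, 5)
p_value = math.erfc(math.sqrt(x2 / 2.0))  # upper tail of chi-square with 1 df

print(round(x2, 2))    # 4.76
print(x2 > 3.84)       # True: exceeds the 0.95 quantile of chi-square with 1 df
print(p_value < 0.05)  # True
```

Only the discordant pairs enter the statistic; the 510 and 90 concordant pairs carry no information about a difference between treatments.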
Two-sample test for proportions for Matched-pair data:
McNemar’s Test
P-value < 0.05, so we reject H0 and conclude that there is a difference
between the 2 chemotherapy treatments.
Kappa Statistic
Kappa Statistic
• Example: A diet questionnaire was administered by mail to
537 female American nurses on two separate occasions
several months apart.
Kappa Statistic
Amount of beef consumption reported by 537 female American
nurses at 2 different surveys
Survey 2
Survey 1 ≤ 1 serving/week > 1 serving/week Total
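The cell counts and the formula for this table did not survive extraction. As a rough illustration only, the sketch below assumes the usual definition of Cohen's kappa, (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of agreement and p_e the agreement expected by chance. The counts in `hypothetical_table` are invented for the illustration and are not the nurses' survey data.

```python
def kappa(table):
    """Cohen's kappa for a square agreement table (rows: survey 1, columns: survey 2)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Observed agreement: proportion of subjects on the diagonal.
    p_observed = sum(table[i][i] for i in range(len(table))) / n
    # Chance agreement: sum over categories of (row share * column share).
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


# HYPOTHETICAL counts for 100 subjects, not the data from the slide.
hypothetical_table = [[40, 10],
                      [20, 30]]
k = kappa(hypothetical_table)
print(round(k, 2))  # 0.4
```

Kappa is 1 under perfect agreement and 0 when the observed agreement equals what chance alone would produce.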
Kappa Statistic
Summary
1) What type of test was used? We are comparing whether females and males
(categorical) are equally distributed across the 2 treatment arms (categorical) in an
independent-samples study design, so the Chi-Squared test is used.
Examples From Published Articles
Gender: Conclusion?
Reported P-value is 0.98 (above 0.05), hence there is no significant difference
in the allocation of male and female patients to the treatment
groups (Intensive vs. Standard).
Hence one can conclude that males and females are distributed
evenly on the 2 treatment arms.
Examples From Published Articles
Race or Ethnic Group:
1) Types of the outcome: Race is not continuous, it is categorical (Non-
Hispanic white, Hispanic white, Black, and other).
2) Number of groups you are comparing: 2 groups: standard therapy and
intensive therapy.
3) Study design: Independent (no mention of matching in the Methods
section)
4) Specify the hypothesis to be tested: proportion of Non-Hispanic white,
Hispanic white, Black, and other is the same for patients enrolled in
standard therapy and patients in the intensive therapy.
5) What type of test was used? We are comparing whether the different
races (categorical) are equally distributed across the 2 treatment arms
(categorical) in an independent-samples study design, so the
Chi-Squared test is used.
Examples From Published Articles
Race or Ethnic Group: Conclusion?
Reported P-value is 0.51 (above 0.05) hence there is no
significant difference in the allocation of the race or ethnic groups
on the treatment groups (Intensive vs. Standard).
Summary
3) Study design: Dependent study design since same patients were asked
about their use of assistive devices at Baseline and at follow-up
5) What type of test was used? Since we are comparing if the use of
assistive devices has changed from baseline to follow-up, and we have
categorical data with 2 dependent groups only then we can use McNemar’s
test.
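The test-selection reasoning used in these published-article examples can be summarised as a small decision function. This is a sketch of the rules stated in the lecture only (independent samples: Chi-square, or Fisher's exact test when an expected count is below 5; two dependent groups: McNemar's test, which needs at least 20 discordant pairs). The function name, argument names and returned strings are hypothetical.

```python
def choose_test(design, min_expected=None, n_discordant=None):
    """Suggest a test for categorical data compared between two groups.

    design       -- "independent" or "dependent" (hypothetical labels)
    min_expected -- smallest expected cell count, if known
    n_discordant -- number of discordant pairs, if known
    """
    if design == "independent":
        if min_expected is not None and min_expected < 5:
            return "Fisher's exact test"
        return "Chi-square test"
    if design == "dependent":
        if n_discordant is not None and n_discordant < 20:
            return "McNemar's test not applicable: fewer than 20 discordant pairs"
        return "McNemar's test"
    raise ValueError("design must be 'independent' or 'dependent'")


print(choose_test("independent", min_expected=521.6))  # Chi-square test
print(choose_test("independent", min_expected=3.2))    # Fisher's exact test
print(choose_test("dependent", n_discordant=21))       # McNemar's test
```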
Examples From Published Articles
EPHD310 Basic Biostatistics Course Learning Outcomes Per FHS Catalogue