
Hypothesis Testing: Categorical Data

Outline:
1) Two-sample test for proportions for independent samples:
Chi-Square Test, Fisher’s Exact Test (categorical data).

2) Two-sample test for proportions for matched-pair data:
McNemar’s Test (binomial data).

3) The Kappa statistic.

4) Learning outcomes for EPHD310 Covered in Lecture 7.

EPHD310 Basic Biost: lecture 7 1


Dr Jaffa

Two-sample test of proportions: Independent samples


• A categorical variable (sometimes called a nominal
variable) is one that has two or more categories, but there
is no intrinsic ordering to the categories.

• A categorical variable with 2 categories is referred to as binomial.

• Examples:
Gender: male/female (2 categories).
Breast cancer: yes/no.
Race: White, African, Asian, Hispanic (4 categories).

• We will focus on the binomial variables (yes/no; 0/1)


Two-sample test of proportions:
Independent samples
• Assume we want to study whether age of the woman when she had
her first baby is associated with her risk of having breast cancer.

• You want to assess whether having first childbirth at older age (≥ 30)
increases the risk of breast cancer in women.

• Specifically, we want to compare the population proportion of women who had their first child at age ≥ 30 among breast cancer cases (P1) with the population proportion of women who had their first child at age ≥ 30 among controls, who do not have breast cancer (P2).


Two-sample test of proportions: Independent samples

                                  Age at birth of first child
Status                            ≥ 30       ≤ 29       Total
Breast cancer (cases)             683        2537       3220
No breast cancer (controls)       1498       8747       10,245
Total                             2181       11,284     13,465


$\hat{p}_1 = 683 / 3220 = 0.212, \qquad \hat{p}_2 = 1498 / 10{,}245 = 0.146$
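As a quick arithmetic check, the two sample proportions can be recomputed from the table counts. This is a minimal Python sketch; the variable names are illustrative and not part of the lecture.

```python
# Counts taken from the lecture's 2x2 table.
cases_exposed, cases_total = 683, 3220          # first birth at age >= 30, among cases
controls_exposed, controls_total = 1498, 10245  # first birth at age >= 30, among controls

p1_hat = cases_exposed / cases_total
p2_hat = controls_exposed / controls_total

print(f"p1_hat = {p1_hat:.3f}")
print(f"p2_hat = {p2_hat:.3f}")
```

The two values round to 0.212 and 0.146, matching the slide.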


Two-sample test of proportions: Independent samples

H0: Age at first childbirth and breast cancer are not associated
H1: Age at first childbirth and breast cancer are associated

To test the hypothesis, use the “Yates-corrected Chi-square test for a 2x2 contingency table”.

Two-sample test of proportions: Independent samples
H0: There is no association between the two categorical variables
H1: There is an association between the two categorical variables

Yates-corrected Chi-square test: critical value method


1) Compute the X2 test statistic

$X^2 = \frac{(|O_{11} - E_{11}| - 0.5)^2}{E_{11}} + \frac{(|O_{12} - E_{12}| - 0.5)^2}{E_{12}} + \frac{(|O_{21} - E_{21}| - 0.5)^2}{E_{21}} + \frac{(|O_{22} - E_{22}| - 0.5)^2}{E_{22}}$

which under H0 approximately follows a Chi-square distribution, $\chi^2_{(I-1)(J-1)}$, with degrees of freedom = (I-1)x(J-1), where I = number of rows and J = number of columns.

2) Reject H0 if $X^2 > \chi^2_{(I-1)(J-1),\,1-\alpha}$ and fail to reject H0 otherwise.

In a 2x2 table I = 2 and J = 2, so $\chi^2_{(I-1)(J-1),\,1-\alpha} = \chi^2_{1,\,1-\alpha}$.


Two-sample test of proportions: Independent samples
Yates-corrected Chi-Square Test:
• Oij and Eij are respectively the observed and expected number of units in the (i,j) cell.

              Column 1    Column 2               Total
Row 1         x1          n1 - x1                n1
Row 2         x2          n2 - x2                n2
Total         x1 + x2     n1 + n2 - (x1 + x2)    n1 + n2

• E11 = n1(x1+x2) / (n1+n2)
• E12 = n1(n1+n2-(x1+x2)) / (n1+n2)
• E21 = n2(x1+x2) / (n1+n2)
• E22 = n2(n1+n2-(x1+x2)) / (n1+n2)
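The four expected-count formulas above translate directly into code. The sketch below is illustrative only: the function name `expected_counts` is an assumption, and the numbers passed in are the breast cancer counts from the lecture's 2x2 table.

```python
def expected_counts(x1, n1, x2, n2):
    """Expected 2x2 cell counts under H0, in the slide's x1, n1, x2, n2 notation."""
    total = n1 + n2
    col1 = x1 + x2        # first column total
    col2 = total - col1   # second column total
    return [[n1 * col1 / total, n1 * col2 / total],
            [n2 * col1 / total, n2 * col2 / total]]

# Breast cancer example: x1 = 683 of n1 = 3220 cases, x2 = 1498 of n2 = 10245 controls.
expected = expected_counts(x1=683, n1=3220, x2=1498, n2=10245)
for row in expected:
    print([round(e, 1) for e in row])
```

Each expected count is simply (row total x column total) / grand total, so the expected table keeps the same margins as the observed table.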
Two-sample test of proportions: Independent samples

Observed table:
                                  Age at birth of first child
Status                            ≥ 30           ≤ 29           Total
Breast cancer (cases)             O11 = 683      O12 = 2537     3220
No breast cancer (controls)       O21 = 1498     O22 = 8747     10,245
Total                             2181           11,284         13,465


Expected table:
                                  Age at birth of first child
Status                            ≥ 30             ≤ 29             Total
Breast cancer (cases)             E11 = 521.6      E12 = 2698.4     3220
No breast cancer (controls)       E21 = 1659.4     E22 = 8585.6     10,245
Total                             2181             11,284           13,465

E11 = 3220 × 2181 / 13,465 = 521.6
E12 = 3220 × 11,284 / 13,465 = 2698.4
E21 = 10,245 × 2181 / 13,465 = 1659.4
E22 = 10,245 × 11,284 / 13,465 = 8585.6
Two-sample test of proportions: Independent samples
Yates-corrected Chi-square Test:

$X^2 = \frac{(|683 - 521.6| - 0.5)^2}{521.6} + \frac{(|2537 - 2698.4| - 0.5)^2}{2698.4} + \frac{(|1498 - 1659.4| - 0.5)^2}{1659.4} + \frac{(|8747 - 8585.6| - 0.5)^2}{8585.6} = 77.89 > \chi^2_{1,\,0.95} = 3.84$
• Thus the result is significant, so we reject the null hypothesis and conclude that breast cancer is significantly associated with having a first child at age ≥ 30. Specifically, the population proportion with age ≥ 30 at first birth is greater among cases than among controls, so having the first baby at age ≥ 30 is associated with an increased risk of breast cancer.

• Note: This test should be used only if none of the 4 Eij’s is less than 5; otherwise, report the P-value from Fisher’s exact test.
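The Yates-corrected statistic for this example can be reproduced with a short Python sketch. It uses only the standard library and relies on the fact that, for 1 degree of freedom, the upper-tail Chi-square probability equals erfc(sqrt(x / 2)). The variable names are illustrative, and the expected counts are computed unrounded from the margins.

```python
import math

# Observed 2x2 table: rows = cases, controls; columns = age >= 30, age <= 29.
observed = [[683, 2537],
            [1498, 8747]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

x2 = 0.0
for i in range(2):
    for j in range(2):
        e_ij = row_totals[i] * col_totals[j] / n                    # expected count
        x2 += (abs(observed[i][j] - e_ij) - 0.5) ** 2 / e_ij      # Yates correction

# For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(x2 / 2))
critical_value = 3.84  # 0.95 quantile of chi-square with 1 df, as quoted in the lecture

print(f"X^2 = {x2:.2f}, reject H0: {x2 > critical_value}")
```

The statistic is far above the critical value, so the P-value is extremely small, in line with the conclusion on the slide.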


Chi-Square Test for the Proportions of Two Independent Samples

Yates-corrected Chi-square test : SPSS Output

Count: the observed table.

Expected Count: the expected table that would have been obtained had the null hypothesis been true.
Chi-Square Test for the Proportions of Two Independent Samples
Yates-corrected Chi-square test : SPSS Output

[SPSS output: Chi-Square Tests table, with the Chi-square statistic X2 highlighted]

Chi-Square Test for the Proportions of Two Independent Samples
Yates-corrected Chi-square test : SPSS Output
• The Chi-square test is used only when all the expected cell counts are at least 5 for a 2x2 table, and when at least 75% of the expected cell counts are at least 5 for an IxJ table (larger than 2x2).
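When the expected-count condition is not met, the lecture points to Fisher's exact test. A minimal sketch of a two-sided version follows, assuming one common convention for "two-sided": with the table margins held fixed, sum the probabilities of all tables that are no more likely than the observed one. The function name and the small example table are hypothetical and are not taken from the lecture.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability of k in the top-left cell with margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_observed = prob(a)
    # Add up every table that is no more likely than the observed one
    # (the small factor guards against floating-point ties).
    return sum(p for p in (prob(k) for k in range(k_min, k_max + 1))
               if p <= p_observed * (1 + 1e-9))

# Hypothetical small table, invented for illustration only.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(f"P-value = {p_value:.4f}")
```

For this hypothetical table the possible top-left counts 0 to 4 have probabilities 1/70, 16/70, 36/70, 16/70 and 1/70, so the two-sided P-value is 34/70.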

Chi-Square Test for the Proportions of Two Independent Samples
Yates-corrected Chi-square test : SPSS Output
• The associated P-value is < 0.001, indicating that the Chi-square test is significant and that we can reject the null hypothesis of equal proportions.

• Thus we can deduce that having a first baby at an age greater than or equal to 30 is associated with breast cancer.


Chi-Square Distribution

• The Chi-square distribution is a family of distributions indexed by its degrees of freedom (as was the case for the t distribution).

• Unlike the t distribution, which is always symmetric about 0 for any degrees of freedom, the chi-square distribution only takes on positive values and is always skewed to the right.
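To make the role of the degrees of freedom concrete, the sketch below evaluates upper-tail probabilities for 1 and 2 degrees of freedom, the two cases with simple closed forms. The function names are illustrative; the value 5.99 is the standard 0.95 quantile for 2 degrees of freedom and does not appear in the lecture.

```python
import math

def chi2_upper_tail_1df(x):
    """P(chi-square with 1 df > x), using chi-square(1) = Z^2 for a standard normal Z."""
    return math.erfc(math.sqrt(x / 2))

def chi2_upper_tail_2df(x):
    """P(chi-square with 2 df > x); for 2 df the tail is exactly exp(-x / 2)."""
    return math.exp(-x / 2)

# Both cutoffs leave roughly 5% in the right tail, but the cutoff grows with the df.
print(round(chi2_upper_tail_1df(3.84), 3))
print(round(chi2_upper_tail_2df(5.99), 3))
```

The same 5% tail area sits at a larger cutoff when the degrees of freedom increase, which is why the critical value must always be read for the correct degrees of freedom.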


Two-sample test for proportions for Matched-Pair data: McNemar’s Test
• McNemar’s test: used when the proportions come from correlated samples.

• Correlated samples: matched pairs, or the same person used as both case and control (for example, before and after treatment).

• Example: two chemotherapy treatments (A and B) after mastectomy.

Two-sample test for proportions for Matched-Pair
data: McNemar’s Test
Example (continued)
• Breast cancer patients are assigned to matched pairs based on age and clinical condition.

• The outcome of interest is survival (yes/no) for 5 years after either treatment.

• The data in a matched-pair context are always tabulated as follows:


A 2x2 contingency table with matched pair as the sampling unit, based on 621 matched pairs

                                   Outcome of treatment B patient
Outcome of treatment A patient     Survive for 5 years    Die within 5 years    Total
Survive for 5 years                a = 510                b = 16                526
Die within 5 years                 c = 5                  d = 90                95
Total                              515                    106                   621

Two-sample test for proportions for Matched-Pair data: McNemar’s Test

• The matched pairs in cell “a” are called concordant pairs since the
outcome “survive for 5 years” is common between each pair.

• The matched pairs in cell “d” are also called concordant pairs since
the outcome “die within 5 years” is common between each pair.

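• The matched pairs in cell “b” are called discordant pairs: the treatment A member survives for 5 years and the treatment B member dies within 5 years.

• The matched pairs in cell “c” are also called discordant pairs: the treatment A member dies within 5 years and the treatment B member survives for 5 years.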


Two-sample test for proportions for Matched-pair data: McNemar’s Test
• Note that the number shown in each cell, as well as in the totals, is the number of matched pairs and not individuals. In a matched-pair context we always count pairs, not individuals.

• For instance, in cell “a” there are 510 matched pairs, i.e. 1020 individuals. In the grand total there are 621 matched pairs, i.e. 1242 individuals.

Two-sample test for proportions for Matched-pair
data: McNemar’s Test
• The total number of discordant pairs in the breast cancer example is nD = b + c = 16 + 5 = 21 matched pairs.

• The total number of concordant pairs in the breast cancer example is nC = a + d = 510 + 90 = 600 matched pairs.

• McNemar’s test is based on the discordant pairs (b and c cells) and can be used only when the total number of discordant pairs is ≥ 20.


Two-sample test for proportions for Matched-pair data: McNemar’s Test
McNemar’s test for correlated proportions: Normal theory test:

(1) Compute the test statistic: $X^2 = \frac{(|b - c| - 1)^2}{b + c}$

(2) Reject H0 if $X^2 > \chi^2_{1,\,1-\alpha}$
    Fail to reject H0 if $X^2 \le \chi^2_{1,\,1-\alpha}$

• Note: this test can be used only if the total number of discordant pairs is greater than or equal to 20.

Two-sample test for proportions for Matched-pair data:
McNemar’s Test
Back to the breast cancer example:


Two-sample test for proportions for Matched-pair data: McNemar’s Test

Aim: assess whether treatments A and B are equally effective. These are proportions from matched-pair data, and the total number of discordant pairs is nD = b + c = 16 + 5 = 21 matched pairs ≥ 20, so we use McNemar’s test.

Hypotheses:
H0: Treatments A and B are equally effective in survival for 5 years.
H1: Treatments A and B are not equally effective in survival for 5 years.

$X^2 = \frac{(|b - c| - 1)^2}{b + c} = \frac{(|16 - 5| - 1)^2}{16 + 5} = 4.76 > \chi^2_{1,\,0.95} = 3.84$

The result is significant, so we reject the null hypothesis at the α = 0.05 level of significance and deduce that the 2 treatments are not equally effective.
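The McNemar calculation above can be sketched in a few lines of Python. The function name `mcnemar_test` and its return format are assumptions made for illustration; the check on the number of discordant pairs follows the lecture's rule of at least 20, and the P-value uses the 1 degree of freedom identity P(chi-square > x) = erfc(sqrt(x / 2)).

```python
import math

def mcnemar_test(b, c, critical_value=3.84):
    """Continuity-corrected McNemar statistic from the discordant-pair counts b and c."""
    n_discordant = b + c
    if n_discordant < 20:
        raise ValueError("normal-theory McNemar test needs at least 20 discordant pairs")
    x2 = (abs(b - c) - 1) ** 2 / n_discordant
    p_value = math.erfc(math.sqrt(x2 / 2))  # upper tail of chi-square with 1 df
    return x2, p_value, x2 > critical_value

# Chemotherapy example from the lecture: b = 16, c = 5.
x2, p_value, reject = mcnemar_test(b=16, c=5)
print(f"X^2 = {x2:.2f}, P-value = {p_value:.3f}, reject H0: {reject}")
```

Note that the concordant cells a and d never enter the calculation; only the 21 discordant pairs carry information about a difference between the treatments.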

Two-sample test for proportions for Matched-pair data:
McNemar’s Test

The treatment A member of the pair is significantly more likely to survive for 5 years than the treatment B member.

The result is in favor of treatment A, since the proportion of pairs in which the A member survives and the B member dies is 16/621, while the proportion of pairs in which the B member survives and the A member dies is 5/621.
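The direction of the result can also be seen from the marginal survival proportions of the two treatments. The sketch below is illustrative (the variable names are assumptions) and uses the cell counts from the lecture's matched-pair table.

```python
a, b, c, d = 510, 16, 5, 90   # cells of the matched-pair table
n_pairs = a + b + c + d       # 621 matched pairs

survive_a = (a + b) / n_pairs   # pairs in which the treatment A member survives
survive_b = (a + c) / n_pairs   # pairs in which the treatment B member survives

# The concordant cell a cancels, so the difference depends only on b and c.
difference = survive_a - survive_b
print(f"A: {survive_a:.3f}, B: {survive_b:.3f}, difference: {difference:.4f}")
```

The difference equals (b - c) / n = 11/621, which shows again why only the discordant pairs matter when comparing the two treatments.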


Two-sample test for proportions for Matched-pair data: McNemar’s Test
SPSS output for McNemar’s test applied to the chemotherapy and breast cancer example:

P-value < 0.05, so we reject H0 and deduce that there is a difference between the 2 chemotherapy treatments.

Kappa Statistic

• Thus far we have focused on assessing whether or not there is an association between two categorical variables.

• If we need to quantify the “degree of association” or “degree of agreement”, we use the kappa statistic κ.

• If a categorical variable is reported at two surveys by each of n subjects, then the kappa statistic (κ) is used to measure “reproducibility” between surveys.


Kappa Statistic

• Example: beef consumption measured on two occasions, a few months apart, for the same group of people.

• Guidelines for the evaluation of the kappa statistic κ:
κ > 0.75 denotes excellent reproducibility.
0.4 ≤ κ ≤ 0.75 denotes good reproducibility.
0 ≤ κ < 0.4 denotes marginal reproducibility.

Kappa Statistic
• Example: A diet questionnaire was administered by mail to 537 female American nurses on two separate occasions several months apart.

• The data obtained from the two surveys represent the amount of beef consumption.

• How can the reproducibility of response for the beef consumption data be quantified?


Kappa Statistic
Amount of beef consumption reported by 537 female American nurses at 2 different surveys

                       Survey 2
Survey 1               ≤ 1 serving/week    > 1 serving/week    Total
≤ 1 serving/week       136                 92                  228
> 1 serving/week       69                  240                 309
Total                  205                 332                 537

Kappa Statistic

Kappa κ = 0.378 < 0.4, thus the reproducibility between the first and second surveys is marginal.
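The lecture reports κ without showing the calculation. Assuming κ here is Cohen's kappa, κ = (po - pe) / (1 - pe), where po is the observed proportion of agreement and pe is the agreement expected by chance from the table margins, the reported value can be reproduced from the beef-consumption table. The variable names are illustrative.

```python
# Counts from the beef-consumption table (rows: survey 1, columns: survey 2).
table = [[136, 92],
         [69, 240]]

n = sum(sum(row) for row in table)
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

# Observed agreement: both surveys give the same category (diagonal cells).
p_observed = sum(table[i][i] for i in range(2)) / n
# Chance agreement: product of the matching row and column proportions.
p_expected = sum(row_totals[i] * col_totals[i] for i in range(2)) / n ** 2

kappa = (p_observed - p_expected) / (1 - p_expected)
print(f"kappa = {kappa:.3f}")
```

About 70% of the nurses gave the same answer twice, but roughly 52% agreement would be expected by chance alone, which is why κ ends up below 0.4.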

Summary

The Chi-Square test is a statistical test used to examine the association between categorical variables in a study design with independent (unmatched) samples.

Example: In an election survey, voters might be classified by gender (male or female) and voting preference (Democrat, Republican, or Independent). We could use a chi-square test for independence to determine whether gender is related to voting preference.
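For a table larger than 2x2, such as the gender by voting preference example, the same observed-versus-expected logic applies with (I-1)x(J-1) degrees of freedom. The sketch below computes the uncorrected Pearson chi-square statistic (no Yates correction) for a general IxJ table. The function name is an assumption, and the vote counts are hypothetical, invented purely for illustration.

```python
def chi_square_independence(observed):
    """Pearson chi-square statistic and degrees of freedom for an I x J table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(observed):
        for j, o_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n   # expected count under H0
            x2 += (o_ij - e_ij) ** 2 / e_ij
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return x2, df

# Hypothetical counts, not real survey data:
# rows = gender (male, female); columns = Democrat, Republican, Independent.
votes = [[20, 30, 10],
         [30, 20, 10]]
x2, df = chi_square_independence(votes)
print(f"X^2 = {x2:.2f} with {df} degrees of freedom")
```

With these made-up counts the statistic is 4.0 on 2 degrees of freedom, below the standard 0.95 critical value of 5.99, so H0 would not be rejected.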


Examples From Published Articles


Gender:

1) Type of the outcome: Gender is not continuous; it is categorical (binary: males/females).

2) Number of groups you are comparing: 2 groups: standard therapy and intensive therapy.

3) Study design: Independent (no mention of matching in the methods section).

4) Specify the hypothesis to be tested: the proportions of females and males are the same for patients enrolled in standard therapy and patients in intensive therapy.

5) What type of test was used? Since we are comparing whether females and males (categorical) are equally distributed across the 2 treatment arms (categorical) in an independent-sample study design, use the Chi-Squared test.

Examples From Published Articles

Gender: Conclusion?
Reported P-value is 0.98 (above 0.05), hence there is no significant difference in the allocation of male and female patients to the treatment groups (Intensive vs. Standard).

Hence there is no evidence that males and females are distributed unevenly across the 2 treatment arms.

Examples From Published Articles
Race or Ethnic Group:
1) Type of the outcome: Race is not continuous; it is categorical (Non-Hispanic white, Hispanic white, Black, and other).
2) Number of groups you are comparing: 2 groups: standard therapy and intensive therapy.
3) Study design: Independent (no mention of matching in the methods section).
4) Specify the hypothesis to be tested: the proportions of Non-Hispanic white, Hispanic white, Black, and other are the same for patients enrolled in standard therapy and patients in intensive therapy.
5) What type of test was used? Since we are comparing whether the different races (categorical) are equally distributed across the 2 treatment arms (categorical) in an independent-sample study design, use the Chi-Squared test.

Examples From Published Articles
Race or Ethnic Group: Conclusion?
Reported P-value is 0.51 (above 0.05), hence there is no significant difference in the allocation of the race or ethnic groups to the treatment groups (Intensive vs. Standard).

Hence there is no evidence that the different races (Non-Hispanic white, Hispanic white, Black, and other) are distributed unevenly across the 2 treatment arms.


Summary

McNemar’s test is used when:

1) The outcome being compared is binary. Example: assessing, in a matched study design, whether participants’ religious status (religious yes/no, binary) affects their opinion about civil marriage (for or against).
2) The groups being compared are dependent (that is, matched or repeated-measures, for example a before/after study design).
3) The comparison is limited to 2 groups only.


Examples From Published Articles


Use of Assistive Devices:

1) Type of the outcome: Use of assistive devices is not continuous; it is categorical (binary: yes/no).

2) Number of groups you are comparing: 2 groups, since use of assistive devices is divided into 2 categories (yes/no).

3) Study design: Dependent study design, since the same patients were asked about their use of assistive devices at baseline and at follow-up.

4) Specify the hypothesis to be tested: Did the use of assistive devices change from baseline to follow-up?

5) What type of test was used? Since we are comparing whether the use of assistive devices changed from baseline to follow-up, and we have categorical data with only 2 dependent groups, we can use McNemar’s test.
Examples From Published Articles

Use of Assistive Devices: Conclusion?

Reported P-value is 0.009 (below 0.05), hence there is a significant difference in the use of assistive devices from baseline to follow-up.

Direction of the conclusion: At baseline, 57 out of 195 (29%) used assistive devices; at follow-up this went up to 74 out of 195 (38%). Hence the percentage using assistive devices went up from baseline to follow-up.


Examples From Published Articles

What type of data is “Physical Function” reported in the Table?

Since a standard deviation (SD) was reported for physical function in the Table, this indicates that “physical function” is a continuous outcome.

What test was used to report the P-value for physical function?
Since we have dependent samples (baseline and follow-up assessment) and “physical function” is continuous, a paired t test should have been used here to assess whether physical function changed from baseline to follow-up.

EPHD310 Basic Biostatistics Course Learning Outcomes Per FHS Catalogue

LO4. Analyze quantitative data using common statistical methods for inference through computer based statistical software and manual computation.
LO5. Apply alternative statistical methodologies to commonly used
statistical methods when assumptions are not met.
LO6. Interpret results of statistical analyses found in public health studies
and biomedical sciences.
LO7. Apply ethical principles to data management and analysis.
