Hypothesis Testing: Categorical Data: Outline

Hypothesis Testing: Categorical Data
Outline:
1) Two-sample test for proportions for independent samples:
Chi-square test, Fisher’s exact test (categorical data).
2) Two-sample test for proportions for matched-pair data:
McNemar’s test (binomial data).
3) The Kappa statistic.
4) Learning outcomes for EPHD310 Covered in Lecture 7.
EPHD310 Basic Biost: lecture 7 1

Dr Jaffa
Two-sample test of proportions: Independent samples

• A categorical variable (sometimes called a nominal variable) is one that has two or more categories, with no intrinsic ordering to the categories.
• A categorical variable with 2 categories is referred to as binomial (also called binary or dichotomous).
• Examples:
Gender: male/female (2 categories).
Breast cancer: yes/no (2 categories).
Race: White, African, Asian, Hispanic (4 categories).
• We will focus on binomial variables (yes/no; 0/1).

• Assume we want to study whether a woman’s age when she had her first baby is associated with her risk of having breast cancer.
• We want to assess whether having a first childbirth at an older age (≥ 30) increases the risk of breast cancer in women.
• Specifically, interest is in how the population proportion of women who had their first baby at age 30 or older among breast cancer cases (P1) compares with the population proportion of women who had their first baby at age 30 or older among controls, who do not have breast cancer (P2).
Age at birth of first child

Status                        ≥ 30     ≤ 29      Total
Breast cancer (cases)          683     2537       3220
No breast cancer (controls)   1498     8747     10,245
Total                         2181   11,284     13,465
Sample proportions of women with first childbirth at age ≥ 30:
$\hat{p}_1 = 683/3220 = 0.212$ (cases), $\hat{p}_2 = 1498/10{,}245 = 0.146$ (controls)
H0: Age at first childbirth and breast cancer are not associated.
H1: Age at first childbirth and breast cancer are associated.
To test the hypothesis, use the Yates-corrected Chi-square test for a 2x2 contingency table.

In general:
H0: There is no association between the two categorical variables.
H1: There is an association between the two categorical variables.
Yates-corrected Chi-square test: critical value method

1) Compute the X^2 test statistic
$$X^2 = \frac{(|O_{11} - E_{11}| - 0.5)^2}{E_{11}} + \frac{(|O_{12} - E_{12}| - 0.5)^2}{E_{12}} + \frac{(|O_{21} - E_{21}| - 0.5)^2}{E_{21}} + \frac{(|O_{22} - E_{22}| - 0.5)^2}{E_{22}}$$
which under H0 approximately follows a Chi-square distribution, $\chi^2_{(I-1)(J-1)}$, with degrees of freedom = (I-1) x (J-1), where I = number of rows and J = number of columns.
2) Reject H0 if $X^2 > \chi^2_{(I-1)(J-1),\,1-\alpha}$ and fail to reject H0 otherwise.
In a 2x2 table I = 2 and J = 2, so $\chi^2_{(I-1)(J-1),\,1-\alpha} = \chi^2_{1,\,1-\alpha}$.

Yates-corrected Chi-Square Test:
• Oij and Eij are, respectively, the observed and expected number of units in the (i,j) cell.

         Column 1   Column 2              Total
Row 1    x1         n1 - x1               n1
Row 2    x2         n2 - x2               n2
Total    x1 + x2    n1 + n2 - (x1 + x2)   n1 + n2

• E11 = n1(x1 + x2) / (n1 + n2)
• E12 = n1(n1 + n2 - (x1 + x2)) / (n1 + n2)
• E21 = n2(x1 + x2) / (n1 + n2)
• E22 = n2(n1 + n2 - (x1 + x2)) / (n1 + n2)
Observed table:

Status                        ≥ 30          ≤ 29           Total
Breast cancer (cases)         O11 = 683     O12 = 2537     3220
No breast cancer (controls)   O21 = 1498    O22 = 8747     10,245
Total                         2181          11,284         13,465

Expected table:

Status                        ≥ 30           ≤ 29            Total
Breast cancer (cases)         E11 = 521.6    E12 = 2698.4    3220
No breast cancer (controls)   E21 = 1659.4   E22 = 8585.6    10,245
Total                         2181           11,284          13,465

$E_{11} = \frac{3220 \times 2181}{13{,}465} = 521.6$
$E_{12} = \frac{3220 \times 11{,}284}{13{,}465} = 2698.4$
$E_{21} = \frac{10{,}245 \times 2181}{13{,}465} = 1659.4$
$E_{22} = \frac{10{,}245 \times 11{,}284}{13{,}465} = 8585.6$
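The expected counts can be reproduced with a few lines of code. The following is a minimal Python sketch, not part of the lecture: the function name `expected_counts` is hypothetical, and it assumes the observed table is passed as a list of rows.

```python
def expected_counts(observed):
    """Expected cell counts under H0: E_ij = (row i total) * (column j total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Observed table from the breast cancer example
# (rows: cases, controls; columns: first birth at >= 30, <= 29).
observed = [[683, 2537], [1498, 8747]]
expected = expected_counts(observed)
for row in expected:
    print([round(e, 1) for e in row])
```

Rounded to one decimal place, the printed values agree with the expected table (521.6, 2698.4, 1659.4, 8585.6).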
Yates-corrected Chi-square Test:
$$X^2 = \frac{(|683 - 521.6| - 0.5)^2}{521.6} + \frac{(|2537 - 2698.4| - 0.5)^2}{2698.4} + \frac{(|1498 - 1659.4| - 0.5)^2}{1659.4} + \frac{(|8747 - 8585.6| - 0.5)^2}{8585.6} = 77.89 > \chi^2_{1,\,0.95} = 3.84$$
• Thus the result is significant, so we reject the null hypothesis and conclude that breast cancer is significantly associated with having a first child at age ≥ 30. Specifically, the proportion with age ≥ 30 is greater among cases (0.212) than among controls (0.146), so we can deduce that having the first baby at age ≥ 30 is associated with an increased risk of breast cancer.
• Note: this test should be used only if none of the 4 Eij’s is less than 5; otherwise, report the P-value from Fisher’s exact test.
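The critical value method for the breast cancer table can be sketched in Python as follows. This is an illustrative sketch rather than the lecture's own code: the function name `yates_chi_square` is hypothetical, and the critical value 3.84 is taken from the lecture instead of being computed.

```python
def yates_chi_square(observed):
    """Yates-corrected X^2 for a 2x2 table: sum over cells of (|O - E| - 0.5)^2 / E."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    x2 = 0.0
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            e = r * c / n  # expected count under H0
            x2 += (abs(observed[i][j] - e) - 0.5) ** 2 / e
    return x2


# Chi-square critical value for 1 df at alpha = 0.05, as quoted in the lecture.
CRITICAL_1DF_95 = 3.84

observed = [[683, 2537], [1498, 8747]]
x2 = yates_chi_square(observed)
reject_h0 = x2 > CRITICAL_1DF_95
print(f"X^2 = {x2:.2f}, reject H0: {reject_h0}")
```

Because the script uses unrounded expected counts, the statistic comes out at about 77.88, which matches the slide's 77.89 up to rounding.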

Chi-Square Test for the Proportions of Two Independent Samples
Yates-corrected Chi-square test: SPSS output
[SPSS crosstabulation and Chi-Square Tests output not reproduced]
Count: the observed table.
Expected Count: the expected table that would have been obtained had the null hypothesis been true.
Chi-Square Tests: reports the Chi-square statistic X2.

• The Chi-square test is used only when all the expected cell counts are at least 5 for a 2x2 table, and when at least 75% of the expected cell counts are at least 5 for an IxJ table (larger than 2x2).
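This rule of thumb is easy to check in code once the expected table is available. The sketch below is only an illustration: the function name `chi_square_applicable` and the second and third tables are hypothetical, and it treats "at least 5" as the cutoff, following the earlier note that none of the Eij's should be less than 5.

```python
def chi_square_applicable(expected, threshold=5.0, min_fraction=0.75):
    """Apply the lecture's rule of thumb to a table of expected counts.

    2x2 table: every expected count must reach the threshold.
    Larger table: at least `min_fraction` of the expected counts must reach it.
    """
    cells = [e for row in expected for e in row]
    n_ok = sum(e >= threshold for e in cells)
    is_2x2 = len(expected) == 2 and all(len(row) == 2 for row in expected)
    if is_2x2:
        return n_ok == len(cells)
    return n_ok / len(cells) >= min_fraction


# Expected table from the breast cancer example: all four counts are large.
print(chi_square_applicable([[521.6, 2698.4], [1659.4, 8585.6]]))  # True

# Hypothetical small 2x2 table: one expected count is below 5.
print(chi_square_applicable([[3.2, 10.8], [6.8, 23.2]]))  # False

# Hypothetical 2x3 table: 5 of 6 expected counts (83%) are at least 5.
print(chi_square_applicable([[4.0, 6.0, 10.0], [8.0, 12.0, 20.0]]))  # True
```

When the check fails for a 2x2 table, the lecture's advice is to report the P-value from Fisher’s exact test instead.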

• The associated P-value is < 0.001 (shown in the SPSS output as 0.000), indicating that the Chi-square test is significant and that we can reject the null hypothesis of equal proportions.
• Thus we can deduce that having a first baby at an age greater than or equal to 30 is associated with breast cancer.

Chi-Square Distribution
• The Chi-square distribution is a family of distributions indexed by its degrees of freedom (as was the case for the t distribution).
• Unlike the t distribution, which is always symmetric about 0 for any degrees of freedom, the Chi-square distribution only takes on positive values and is always skewed to the right.
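For 1 degree of freedom, the case used for 2x2 tables, a Chi-square variable is the square of a standard normal variable, which is why it cannot be negative. The sketch below uses that fact to compute the upper-tail probability with the Python standard library only; the function name `chi2_sf_1df` is hypothetical and the approach covers the 1 df case only.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability P(X > x) for a Chi-square variable with 1 degree of freedom.

    With X = Z^2 and Z standard normal, P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if x <= 0:
        return 1.0  # the distribution puts all of its mass on positive values
    return math.erfc(math.sqrt(x / 2.0))


# The 1 df critical value 3.84 used in the lecture leaves about 5% in the upper tail.
print(round(chi2_sf_1df(3.84), 3))
```

The same function gives an approximate P-value for any 1 df test statistic, for example the Yates-corrected statistic of a 2x2 table.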

Two-sample test for proportions for Matched-Pair data: McNemar’s Test
• McNemar’s test is used when proportions come from correlated samples.
• Correlated samples: matched pairs, or the same person used as both case and control (for example, before and after treatment).
• Example: two chemotherapy treatments (A and B) after mastectomy.

• Breast cancer patients are assigned to matched pairs based on age and clinical condition.
• The outcome of interest is survival (yes/no) for 5 years after either treatment.
• Data in a matched-pair context are always tabulated as follows:

A 2x2 contingency table with the matched pair as the sampling unit, based on 621 matched pairs

                                 Outcome of treatment B patient
Outcome of treatment A patient   Survive for 5 years   Die within 5 years   Total
Survive for 5 years              a = 510               b = 16               526
Die within 5 years               c = 5                 d = 90               95
Total                            515                   106                  621

• The matched pairs in cell “a” are called concordant pairs since the
outcome “survive for 5 years” is common between each pair.
• The matched pairs in cell “d” are also called concordant pairs since
the outcome “die within 5 years” is common between each pair.
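• The matched pairs in cell “b” are called discordant pairs since the outcomes differ within each pair: the treatment A member survives for 5 years and the treatment B member dies within 5 years.
• The matched pairs in cell “c” are also called discordant pairs: the treatment A member dies within 5 years and the treatment B member survives for 5 years.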

• Note that the number shown in each cell, as well as in the totals, is the number of matched pairs and not individuals. In a matched-pair context we always count pairs, not individuals.
• For instance, cell “a” contains 510 matched pairs, i.e. 1020 individuals. The grand total is 621 matched pairs, i.e. 1242 individuals.

• The total number of discordant pairs in the breast cancer example is nD = b + c = 16 + 5 = 21 matched pairs.
• The total number of concordant pairs in the breast cancer example is nC = a + d = 510 + 90 = 600 matched pairs.
• McNemar’s test is based on the discordant pairs (b and c cells) and can be used only when the total number of discordant pairs is ≥ 20.

McNemar’s test for correlated proportions: normal-theory test
(1) Compute the test statistic: $X^2 = \frac{(|b - c| - 1)^2}{b + c}$
(2) Reject H0 if $X^2 > \chi^2_{1,\,1-\alpha}$; fail to reject H0 if $X^2 \le \chi^2_{1,\,1-\alpha}$.
• Note: this test can be used only if the total number of discordant pairs is greater than or equal to 20.

Back to the breast cancer example:
Aim: assess whether treatments A and B are equally effective. The data are proportions from matched pairs, and the total number of discordant pairs is nD = b + c = 16 + 5 = 21 ≥ 20, so use McNemar’s test.
Hypotheses:
H0: Treatments A and B are equally effective in survival for 5 years.
H1: Treatments A and B are not equally effective in survival for 5 years.
$$X^2 = \frac{(|b - c| - 1)^2}{b + c} = \frac{(|16 - 5| - 1)^2}{16 + 5} = 4.76 > \chi^2_{1,\,0.95} = 3.84$$
The result is significant, so we reject the null hypothesis at the α = 0.05 level of significance and deduce that the 2 treatments are not equally effective.
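The same calculation can be written as a short Python sketch. The function name `mcnemar_statistic` is hypothetical, the critical value 3.84 is taken from the lecture, and the check on the number of discordant pairs mirrors the rule that the normal-theory test needs at least 20 of them.

```python
def mcnemar_statistic(b, c):
    """Continuity-corrected McNemar statistic: (|b - c| - 1)^2 / (b + c)."""
    n_discordant = b + c
    if n_discordant < 20:
        raise ValueError("normal-theory McNemar test needs at least 20 discordant pairs")
    return (abs(b - c) - 1) ** 2 / n_discordant


# Chi-square critical value for 1 df at alpha = 0.05, as quoted in the lecture.
CRITICAL_1DF_95 = 3.84

# Discordant cells from the chemotherapy example: b = 16, c = 5.
x2 = mcnemar_statistic(b=16, c=5)
print(f"X^2 = {x2:.2f}, reject H0: {x2 > CRITICAL_1DF_95}")  # X^2 = 4.76, reject H0: True
```

Only the discordant cells b and c enter the statistic; the concordant cells a and d carry no information about a difference between the treatments.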

The treatment A member of the pair is significantly more likely to survive for 5 years than the treatment B member.
The result is in favor of treatment A since the proportion of pairs in which the A member survives and the B member dies is 16/621, while the proportion of pairs in which the B member survives and the A member dies is 5/621.

SPSS output for McNemar’s test applied to the chemotherapy and breast cancer example (output not reproduced):
P-value < 0.05, so we reject H0 and deduce that there is a difference between the 2 chemotherapy treatments.
Kappa Statistic
• Thus far we have focused on assessing whether or not there is an association between two categorical variables.
• If we need to quantify the “degree of association” or “degree of agreement”, we use the kappa statistic κ.
• If a categorical variable is reported at two surveys by each of n subjects, then the kappa statistic (κ) is used to measure “reproducibility” between surveys.
• Example: beef consumption measured on two occasions, a few months apart, for the same group of people.
• Guidelines for the evaluation of the kappa statistic κ:
κ > 0.75 denotes excellent reproducibility.
0.4 ≤ κ ≤ 0.75 denotes good reproducibility.
0 ≤ κ < 0.4 denotes marginal reproducibility.
• Example: A diet questionnaire was administered by mail to 537 female American nurses on two separate occasions several months apart.
• The data obtained from the two surveys represent the amount of beef consumption.
• How can the reproducibility of response for the beef consumption data be quantified?
Amount of beef consumption reported by 537 female American nurses at 2 different surveys

                     Survey 2
Survey 1             ≤ 1 serving/week   > 1 serving/week   Total
≤ 1 serving/week     136                92                 228
> 1 serving/week     69                 240                309
Total                205                332                537

Kappa κ = 0.378 < 0.4, thus the reproducibility between the first and second surveys is marginal.
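The slides report κ without showing the calculation. The Python sketch below assumes the standard Cohen's kappa definition, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from the marginal totals. That formula is not stated in the lecture, and the function name `cohen_kappa` is hypothetical, but it reproduces the reported value for the beef consumption table.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square agreement table (rows: survey 1, columns: survey 2)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Observed agreement: proportion of subjects on the diagonal.
    p_observed = sum(table[i][i] for i in range(len(table))) / n
    # Chance agreement: product of matching marginal proportions, summed over categories.
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n**2
    return (p_observed - p_expected) / (1 - p_expected)


# Beef consumption table: rows are survey 1, columns are survey 2.
beef = [[136, 92], [69, 240]]
kappa = cohen_kappa(beef)
print(round(kappa, 3))  # 0.378
```

Here p_o = 376/537 ≈ 0.700 and p_e = 149,328/288,369 ≈ 0.518, so κ ≈ 0.378.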
Summary
The Chi-square test is a statistical test used to examine associations between categorical variables in a study design with independent (unmatched) samples.
Example: In an election survey, voters might be classified by gender (male or female) and voting preference (Democrat, Republican, or Independent). We could use a Chi-square test for independence to determine whether gender is related to voting preference.

Examples From Published Articles

Gender:
1) Type of outcome: Gender is not continuous; it is categorical (binary: male/female).
2) Number of groups being compared: 2 groups, standard therapy and intensive therapy.
3) Study design: independent (no mention of matching in the Methods section).
4) Hypothesis to be tested: the proportions of females and males are the same for patients enrolled in standard therapy and patients enrolled in intensive therapy.
5) What type of test was used? Since we are comparing whether females and males (categorical) are equally distributed across the 2 treatment arms (categorical) in an independent-samples study design, the Chi-square test is used.
Gender: Conclusion?
The reported P-value is 0.98 (above 0.05), hence there is no significant difference in the allocation of male and female patients to the treatment groups (intensive vs. standard).
Hence there is no evidence that males and females are distributed unevenly across the 2 treatment arms.
Race or Ethnic Group:
1) Type of outcome: Race is not continuous; it is categorical (Non-Hispanic white, Hispanic white, Black, and other).
2) Number of groups being compared: 2 groups, standard therapy and intensive therapy.
3) Study design: independent (no mention of matching in the Methods section).
4) Hypothesis to be tested: the proportions of Non-Hispanic white, Hispanic white, Black, and other are the same for patients enrolled in standard therapy and patients enrolled in intensive therapy.
5) What type of test was used? Since we are comparing whether the different races (categorical) are equally distributed across the 2 treatment arms (categorical) in an independent-samples study design, the Chi-square test is used.
Race or Ethnic Group: Conclusion?
The reported P-value is 0.51 (above 0.05), hence there is no significant difference in the allocation of the race or ethnic groups to the treatment groups (intensive vs. standard).
Hence there is no evidence that the different races (Non-Hispanic white, Hispanic white, Black, and other) are distributed unevenly across the 2 treatment arms.
Summary
McNemar’s test is used when:
1) The outcome being compared is binary. Example: assessing, in a matched study design, whether participants’ religious status (religious yes/no) affects their opinion about civil marriage (for or against).
2) The groups being compared are dependent (that is, a matched or repeated-measures design, for example before/after).
3) The comparison is limited to 2 groups only.

Use of Assistive Devices:
1) Type of outcome: Use of assistive devices is not continuous; it is categorical (binary: yes/no).
2) Number of groups being compared: 2 groups (baseline and follow-up); the outcome itself has 2 categories (yes/no).
3) Study design: dependent, since the same patients were asked about their use of assistive devices at baseline and at follow-up.
4) Hypothesis to be tested: Did the use of assistive devices change from baseline to follow-up?
5) What type of test was used? Since we are comparing whether the use of assistive devices changed from baseline to follow-up, and we have binary categorical data with only 2 dependent groups, McNemar’s test can be used.
Use of Assistive Devices: Conclusion?

The reported P-value is 0.009 (below 0.05), hence there is a significant difference in the use of assistive devices from baseline to follow-up.
Direction of the conclusion: At baseline, 57 out of 195 patients (29%) used assistive devices; at follow-up this went up to 74 out of 195 (38%). Hence the percentage using assistive devices increased from baseline to follow-up.

What type of data is “Physical Function” as reported in the Table?
Since a standard deviation (SD) was reported for physical function in the Table, this indicates that “physical function” is a continuous outcome.
What test was used to report the P-value for physical function?
Since we have dependent samples (baseline and follow-up assessment) and “physical function” is continuous, a paired t test should have been used here to assess whether physical function changed from baseline to follow-up.

EPHD310 Basic Biostatistics Course Learning Outcomes Per FHS Catalogue
LO4. Analyze quantitative data using common statistical methods for inference through computer-based statistical software and manual computation.
LO5. Apply alternative statistical methodologies to commonly used statistical methods when assumptions are not met.
LO6. Interpret results of statistical analyses found in public health studies and biomedical sciences.
LO7. Apply ethical principles to data management and analysis.
