Chapter 10

Hypothesis Testing: Categorical Data Analysis

EPI809/Spring 2008

Learning Objectives
1. Compare binomial proportions using the Z and χ² tests
2. Explain the χ² test for independence of 2 variables
3. Explain Fisher's exact test for independence
4. McNemar's test for correlated data
5. Kappa statistic
6. Use of SAS PROC FREQ

Data Types
Data
  Quantitative
    Discrete
    Continuous
  Qualitative

Qualitative Data
1. Qualitative random variables yield responses that can be put in categories. Example: gender (male, female)
2. Measurement or count reflects the number in each category
3. Nominal (no order) or ordinal (ordered) scale
4. Data can be collected as continuous but recoded to categorical data. Example: systolic blood pressure (hypotension, normal tension, hypertension)

Hypothesis Tests for Qualitative Data

Qualitative data
  Proportion
    1 pop.: Z test
    2 pop.: Z test
    2 or more pop.: χ² test
  Independence: χ² test

Z Test for Differences in Two Proportions

Hypotheses for Two Proportions

                       Research Questions
Hypothesis   No Difference /    Pop 1 ≥ Pop 2 /    Pop 1 ≤ Pop 2 /
             Any Difference     Pop 1 < Pop 2      Pop 1 > Pop 2
H0           p1 - p2 = 0        p1 - p2 ≥ 0        p1 - p2 ≤ 0
Ha           p1 - p2 ≠ 0        p1 - p2 < 0        p1 - p2 > 0

Z Test for Difference in Two Proportions

1. Assumptions
   - Populations are independent
   - Populations follow binomial distribution
   - Normal approximation can be used for large samples (all expected counts ≥ 5)
2. Z-test statistic for two proportions

$$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad \text{where } \hat{p} = \frac{X_1 + X_2}{n_1 + n_2}$$
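
The pooled Z statistic for two proportions can be sketched in a few lines of Python. This is a minimal illustration, not part of the course materials: the helper names `two_proportion_z` and `large_sample_ok` and the counts in the usage lines are hypothetical, and the pooled standard error assumes H0: p1 - p2 = 0.

```python
from math import sqrt


def two_proportion_z(x1, n1, x2, n2):
    """Pooled Z statistic for H0: p1 - p2 = 0 (hypothetical helper)."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Pooled estimate of the common proportion under H0
    p_pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se


def large_sample_ok(x1, n1, x2, n2, minimum=5):
    """Check that all four expected counts under H0 are at least `minimum`."""
    p_pooled = (x1 + x2) / (n1 + n2)
    expected = [n1 * p_pooled, n1 * (1 - p_pooled),
                n2 * p_pooled, n2 * (1 - p_pooled)]
    return min(expected) >= minimum


# Made-up counts, for illustration only
z = two_proportion_z(30, 200, 45, 200)
print(round(z, 3), large_sample_ok(30, 200, 45, 200))
```

A negative value means the first sample proportion is below the second; swapping the two samples flips the sign.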

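Sampling Distribution for Difference Between Proportions

$$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2;\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$$

$$\hat{p}_1 - \hat{p}_2 \approx N\left(p_1 - p_2;\ \frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}\right)$$

$$\hat{p}_1 - \hat{p}_2 \approx N\left(0;\ \hat{p}\hat{q}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right) \quad \text{under } H_0: p_1 = p_2$$

$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}, \qquad \hat{q} = 1 - \hat{p}$$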

Z Test for Two Proportions
Thinking Challenge

You're an epidemiologist for the US Department of Health and Human Services. You're studying the prevalence of disease X in two states (MA and CA). In MA, 74 of 1500 people surveyed were diseased and in CA, 129 of 1500 were diseased. At the .05 level, does MA have a lower prevalence rate?

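Z Test for Two Proportions
Solution*

$$\hat{p}_{MA} = \frac{X_{MA}}{n_{MA}} = \frac{74}{1500} = .0493 \qquad \hat{p}_{CA} = \frac{X_{CA}}{n_{CA}} = \frac{129}{1500} = .0860$$

$$\hat{p} = \frac{X_{MA} + X_{CA}}{n_{MA} + n_{CA}} = \frac{74 + 129}{1500 + 1500} = .0677$$

$$Z = \frac{(.0493 - .0860) - 0}{\sqrt{.0677\,(1 - .0677)\left(\frac{1}{1500} + \frac{1}{1500}\right)}} = -4.00$$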

Z Test for Two Proportions
Solution*

H0: pMA - pCA = 0
Ha: pMA - pCA < 0
α = .05
nMA = 1500, nCA = 1500
Critical Value(s): -1.645 (reject H0 if Z < -1.645; lower-tail area = .05)
Test Statistic: Z = -4.00
Decision: Reject H0 at α = .05
Conclusion: There is evidence that the prevalence in MA is lower than in CA
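
The MA versus CA decision can be checked numerically. The sketch below assumes only Python's standard library (`statistics.NormalDist`); it recomputes the test statistic from the survey counts and compares it with the lower-tail critical value. The variable names are illustrative, not from the slides.

```python
from math import sqrt
from statistics import NormalDist

# Survey counts from the thinking challenge
x_ma, n_ma = 74, 1500
x_ca, n_ca = 129, 1500
alpha = 0.05

p_ma = x_ma / n_ma
p_ca = x_ca / n_ca
p_pooled = (x_ma + x_ca) / (n_ma + n_ca)

# Pooled standard error under H0: pMA - pCA = 0
se = sqrt(p_pooled * (1 - p_pooled) * (1 / n_ma + 1 / n_ca))
z = (p_ma - p_ca) / se

# Lower-tail test: reject H0 when Z falls below the alpha quantile
std_normal = NormalDist()
critical_value = std_normal.inv_cdf(alpha)
p_value = std_normal.cdf(z)
reject_h0 = z < critical_value

print(f"Z = {z:.2f}, critical value = {critical_value:.3f}")
print(f"one-sided p-value = {p_value:.6f}, reject H0: {reject_h0}")
```

Reporting the one-sided p-value alongside the critical-value comparison is a design choice of this sketch; both lead to the same decision.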

χ² Test of Independence Between 2 Categorical Variables


χ² Test of Independence

1. Shows if a relationship exists between 2 qualitative variables, but does not show causality
2. Assumptions
   - Multinomial experiment
   - All expected counts ≥ 5
3. Uses two-way contingency table

χ² Test of Independence: Contingency Table

1. Shows the number of observations from 1 sample jointly in 2 qualitative variables

[Table: levels of variable 1 crossed with levels of variable 2]

χ² Test of Independence: Hypotheses & Statistic

1. Hypotheses
   H0: Variables are independent
   Ha: Variables are related (dependent)
2. Test statistic

$$\chi^2 = \sum_{\text{all cells}} \frac{\left[n_{ij} - E(n_{ij})\right]^2}{E(n_{ij})}$$

   where n_ij is the observed count and E(n_ij) is the expected count.

Degrees of freedom: (r - 1)(c - 1), where r is the number of rows and c is the number of columns

χ² Test of Independence: Expected Counts

1. Statistical independence means joint probability equals product of marginal probabilities
2. Compute marginal probabilities and multiply for joint probability
3. Expected count is sample size times joint probability

Expected Count Example

Marginal probability (row) = 112/160
Marginal probability (column) = 78/160
Joint probability = (112/160)(78/160)
Expected count = 160 · (112/160)(78/160) = 54.6

Expected Count Calculation

Expected count = (Row total)(Column total) / Sample size

  112×78/160    112×82/160
  48×78/160     48×82/160
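
The expected-count rule can be sketched as a short Python function. This is a minimal illustration that assumes only the row and column totals are known (112 and 48; 78 and 82, from the example above); the function name `expected_counts` is hypothetical.

```python
def expected_counts(row_totals, col_totals):
    """Expected cell counts under independence: row total * column total / n."""
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row and column totals must sum to the same sample size")
    return [[r * c / n for c in col_totals] for r in row_totals]


# Margins from the expected count example (sample size 160)
expected = expected_counts([112, 48], [78, 82])
for row in expected:
    print([round(cell, 1) for cell in row])
```

The expected counts in each row and column add back up to the observed margins, which is a quick sanity check on any hand calculation.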

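χ² Test of Independence: Example on HIV

You randomly sample 286 sexually active individuals and collect information on their HIV status and history of STDs. At the .05 level, is there evidence of a relationship?

Observed counts (levels coded 1 and 2, as in the SAS data step):

              HIV = 1   HIV = 2   Total
STDs = 1         84        32      116
STDs = 2         48       122      170
Total           132       154      286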

χ² Test of Independence: Solution

Expected counts (E(nij) ≥ 5 in all cells):

  116×132/286    116×154/286
  170×132/286    170×154/286

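χ² Test of Independence: Solution

$$\chi^2 = \sum_{\text{all cells}} \frac{\left[n_{ij} - E(n_{ij})\right]^2}{E(n_{ij})} = \frac{\left[n_{11} - E(n_{11})\right]^2}{E(n_{11})} + \frac{\left[n_{12} - E(n_{12})\right]^2}{E(n_{12})} + \cdots + \frac{\left[n_{22} - E(n_{22})\right]^2}{E(n_{22})}$$

$$= \frac{(84 - 53.5)^2}{53.5} + \frac{(32 - 62.5)^2}{62.5} + \cdots + \frac{(122 - 91.5)^2}{91.5} = 54.29$$

Note: 54.29 uses expected counts rounded to one decimal place. With unrounded expected counts the statistic is 54.15, the value SAS reports.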
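
The χ² computation can be reproduced with a short Python sketch. It assumes the observed counts given in the SAS data step (84, 32, 48, 122) and uses only the standard library. It keeps the expected counts unrounded, which is why it gives about 54.15 rather than the 54.29 obtained from expected counts rounded to one decimal. The critical value and p-value lines rely on the fact that, for df = 1 only, a χ² variable is the square of a standard normal; the variable names are illustrative.

```python
from statistics import NormalDist

# Observed 2x2 table: rows are STDs levels, columns are HIV levels
observed = [[84, 32],
            [48, 122]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count = row total * column total / sample size
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_square = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(row_totals))
    for j in range(len(col_totals))
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Valid for df = 1 only: chi-square(1) is the square of a standard normal
std_normal = NormalDist()
critical_value = std_normal.inv_cdf(1 - 0.05 / 2) ** 2
p_value = 2 * (1 - std_normal.cdf(chi_square ** 0.5))

print(f"chi-square = {chi_square:.4f}, df = {df}")
print(f"critical value = {critical_value:.3f}, reject H0: {chi_square > critical_value}")
```

For tables larger than 2×2 the normal-based shortcut no longer applies, and a χ² distribution routine from a statistics package would be needed instead.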

χ² Test of Independence: Solution

H0: No relationship
Ha: Relationship
α = .05
df = (2 - 1)(2 - 1) = 1
Critical Value(s): 3.841 (reject H0 if χ² > 3.841; upper-tail area = .05)
Test Statistic: χ² = 54.29
Decision: Reject H0 at α = .05
Conclusion: There is evidence of a relationship

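χ² Test of Independence: SAS Code

/* One record per cell of the 2x2 table: STDs level, HIV level, cell count */
data dis;
  input STDs HIV count;
  cards;
1 1 84
1 2 32
2 1 48
2 2 122
;
run;

/* WEIGHT treats count as the cell frequency; CHISQ requests the chi-square tests */
proc freq data=dis order=data;
  weight count;
  tables STDs*HIV / chisq;
run;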

χ² Test of Independence: SAS Output

Statistics for Table of STDs by HIV

Statistic                      DF      Value     Prob
------------------------------------------------------
Chi-Square                      1    54.1502   <.0001
Likelihood Ratio Chi-Square     1    55.7826   <.0001
Continuity Adj. Chi-Square      1    52.3871   <.0001
Mantel-Haenszel Chi-Square      1    53.9609   <.0001
Phi Coefficient                       0.4351
Contingency Coefficient               0.3990
Cramer's V                            0.4351
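
The three association measures at the bottom of the output can be recovered from the Pearson chi-square and the sample size. The Python sketch below uses the usual textbook formulas, which give magnitudes only; it is an illustration of where the numbers come from, not a description of how PROC FREQ computes them internally.

```python
from math import sqrt

chi_square = 54.1502         # Pearson chi-square from the output above
n = 84 + 32 + 48 + 122       # total sample size (286)
rows, cols = 2, 2

# Phi: square root of chi-square per observation (2x2 tables)
phi = sqrt(chi_square / n)

# Contingency coefficient: bounded below 1 regardless of table size
contingency_coefficient = sqrt(chi_square / (chi_square + n))

# Cramer's V: rescales by the smaller table dimension; equals phi for 2x2
cramers_v = sqrt(chi_square / (n * min(rows - 1, cols - 1)))

print(f"phi = {phi:.4f}")
print(f"contingency coefficient = {contingency_coefficient:.4f}")
print(f"Cramer's V = {cramers_v:.4f}")
```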
