
N203 – Biostatistics for Nurses

Souha Fares, PhD


Hariri School of Nursing
Inference for Categorical Data
(Inference for proportions, z tests)
Outline
• Inference for one population proportion
– Confidence interval
– Test of hypotheses
• Inference for two population proportions
– Confidence interval
– Test of hypotheses
• Inference for proportions in paired/matched data (McNemar test)

NURS 203 Biostatistics for Nurses
Lecture 7 – Inference for Proportions
Binomial Distribution
• Variable of interest is binary (Yes/No)
• Examples: respond / do not respond to treatment
have normal / high blood pressure
pregnant / not pregnant
have the disease / do not have the disease
• The proportion of individuals in the population who have the characteristic is p
Binomial Distribution

• X = number of subjects in the sample with the characteristic
• X ~ Binomial(n, p)
• The proportion of individuals in the sample with the characteristic is $\hat p = X/n$

Binomial Distribution

• When n is large (np ≥ 5 and n(1 − p) ≥ 5), the binomial distribution can be approximated by the normal distribution
• We will make inference for p (confidence intervals and tests of significance) based on the normal distribution
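The rule of thumb above can be written as a one-line check. This is a minimal Python sketch; `normal_approx_ok` is a hypothetical helper name, not part of the course materials. Because p is unknown in practice, the later slides apply the same check with the sample proportion $\hat p$ (confidence intervals) or with the hypothesized value $p_0$ (tests).

```python
def normal_approx_ok(n: int, p: float) -> bool:
    """Hypothetical helper: rule of thumb np >= 5 and n(1 - p) >= 5."""
    return n * p >= 5 and n * (1 - p) >= 5


# Checks using numbers that appear in this lecture's examples
print(normal_approx_ok(50, 0.34))   # 17 and 33 successes/failures expected -> True
print(normal_approx_ok(300, 0.8))   # 240 and 60 -> True
print(normal_approx_ok(20, 0.1))    # np = 2 < 5 -> False
```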

Single Sample: Confidence interval for p

• Parameter of interest: p = true (population) proportion
• Point estimate: $\hat p = X/n$, the sample proportion
• Standard error of the point estimate: $SE(\hat p) = \sqrt{\hat p(1-\hat p)/n}$

Single Sample: Confidence interval for p

If $n\hat p \ge 5$ and $n(1-\hat p) \ge 5$, a 100(1−α)% CI for p is:

$$\hat p \pm z_{crit}\, SE(\hat p)$$

where $z_{crit}$ is the critical value from the standard normal distribution (from the Z table); for example, $z_{crit} = 1.96$ for a 95% CI.
Single Sample: Confidence interval for p
Ex. n = 50 smokers enrolled in a smoking cessation study. After 6 months, 17 (34%) had successfully quit. Find a 95% CI for p (Zcrit = 1.96, from the Z table).

• Parameter of interest: p = true proportion of smokers who will quit
• Point estimate: $\hat p = 17/50 = 0.34$
• Check the conditions: $50(0.34) = 17 \ge 5$ and $50(1-0.34) = 33 \ge 5$
Single Sample: Confidence interval for p
Ex. n = 50 smokers enrolled in a smoking cessation study. After 6 months, 17 (34%) had successfully quit.

95% CI for p is:

$$\hat p \pm z_{crit}\sqrt{\frac{\hat p(1-\hat p)}{n}} = 0.34 \pm 1.96\sqrt{\frac{0.34(1-0.34)}{50}} = 0.34 \pm 0.13 = (0.21,\ 0.47)$$

We are 95% confident that between 21% and 47% of smokers enrolled in this type of study would successfully quit smoking within 6 months.
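The interval can be reproduced with Python's standard library. This is a sketch: `wald_ci` is a hypothetical helper name, and it takes $z_{crit}$ from `statistics.NormalDist().inv_cdf` (about 1.95996) instead of the rounded table value 1.96, which does not change the result at two decimals.

```python
from math import sqrt
from statistics import NormalDist


def wald_ci(x: int, n: int, conf: float = 0.95) -> tuple[float, float]:
    """Normal-approximation CI for one proportion: p_hat +/- z_crit * SE(p_hat)."""
    p_hat = x / n
    # Same conditions as the slides: n*p_hat >= 5 and n*(1 - p_hat) >= 5
    if n * p_hat < 5 or n * (1 - p_hat) < 5:
        raise ValueError("normal approximation not appropriate")
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for 95%
    se = sqrt(p_hat * (1 - p_hat) / n)                 # SE uses p_hat
    return p_hat - z_crit * se, p_hat + z_crit * se


lo, hi = wald_ci(17, 50)          # smoking cessation example
print(f"({lo:.2f}, {hi:.2f})")    # (0.21, 0.47)
```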
Single Sample: Hypothesis Testing for p
• Null hypothesis H0: p = p0
• Test statistic:

$$Z = \frac{\hat p - p_0}{\sqrt{p_0(1-p_0)/n}} \sim N(0,1)$$

under H0, provided that $np_0 \ge 5$ and $n(1-p_0) \ge 5$

Single Sample: Hypothesis Testing for p
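Rejection region and p-value

H1          Rejection region       P-value (from the Z table)
p > p0      Zobs ≥ Zcrit           P(Z ≥ Zobs)
p < p0      Zobs ≤ −Zcrit          P(Z ≤ Zobs)
p ≠ p0      |Zobs| ≥ Zcrit         2·P(Z ≥ |Zobs|)

For a one-sided H1, Zcrit cuts off an area α in one tail; for the two-sided H1, it cuts off α/2 in each tail (e.g., Zcrit = 1.96 for α = 0.05).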
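The rejection rules and p-values for the three alternatives can be written as code. This sketch uses Python's `statistics.NormalDist` in place of the Z table; the helper names and the `'>'`, `'<'`, `'!='` codes for H1 are assumptions made for this illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()        # standard normal: mean 0, sd 1
ALTERNATIVES = (">", "<", "!=")  # hypothetical codes for H1: p > p0, p < p0, p != p0


def z_crit(alpha: float, alternative: str) -> float:
    """Critical value: alpha in one tail (one-sided H1) or alpha/2 in each tail (two-sided H1)."""
    if alternative not in ALTERNATIVES:
        raise ValueError("alternative must be '>', '<' or '!='")
    tail = alpha / 2 if alternative == "!=" else alpha
    return STD_NORMAL.inv_cdf(1 - tail)


def p_value(z_obs: float, alternative: str) -> float:
    """Area beyond z_obs in the direction(s) stated by H1."""
    if alternative == ">":
        return 1 - STD_NORMAL.cdf(z_obs)             # P(Z >= z_obs)
    if alternative == "<":
        return STD_NORMAL.cdf(z_obs)                 # P(Z <= z_obs)
    if alternative == "!=":
        return 2 * (1 - STD_NORMAL.cdf(abs(z_obs)))  # 2 P(Z >= |z_obs|)
    raise ValueError("alternative must be '>', '<' or '!='")


def reject_h0(z_obs: float, alpha: float, alternative: str) -> bool:
    """Rejection-region rule for each form of H1."""
    c = z_crit(alpha, alternative)
    if alternative == ">":
        return z_obs >= c
    if alternative == "<":
        return z_obs <= -c
    return abs(z_obs) >= c


print(round(z_crit(0.05, "!="), 2))  # 1.96
print(round(z_crit(0.05, ">"), 3))   # 1.645
```

Both routes lead to the same decision: H0 is rejected when the p-value is at most α.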

Single Sample: Hypothesis Testing for p
Ex
• Suppose that a diagnostic test has been shown to be 80% effective in detecting a genetic abnormality in human cells
• An investigator modifies the diagnostic testing protocol and wishes to test whether the new protocol has a detection rate that is significantly different from 80% in specimens known to possess the abnormality
• The new protocol is applied to 300 independent specimens of human cells known to possess the abnormality
• The abnormality is detected in 222 specimens
• Run the appropriate test at the 5% level of significance. Use Zcrit = 1.96

Single Sample: Hypothesis Testing for p
Ex (cont).
1. Hypotheses: H0: p = 0.8
H1: p ≠ 0.8, α = 0.05

2. Test statistic:
$np_0 = 300(0.8) = 240 \ge 5$
$n(1-p_0) = 300(1-0.8) = 60 \ge 5$

$$Z = \frac{\hat p - p_0}{\sqrt{p_0(1-p_0)/n}} \sim N(0,1) \text{ under } H_0$$
Single Sample: Hypothesis Testing for p
Ex (cont).
3. Observed statistic: $\hat p = 222/300 = 0.74$

$$Z_{obs} = \frac{0.74 - 0.8}{\sqrt{0.8(1-0.8)/300}} = \frac{-0.06}{0.0231} \approx -2.60$$

4. Rejection region
Reject H0 if |Zobs| ≥ Zcrit = 1.96

Single Sample: Hypothesis Testing for p
Ex (cont).

4. Or p-value = P(Z < −2.60) + P(Z > 2.60)
= 2·P(Z < −2.60)
= 2(0.0047)
≈ 0.009

Single Sample: Hypothesis Testing for p
Ex (cont).

5. Conclusion
Reject H0 since |−2.60| = 2.60 ≥ 1.96 (or equivalently, p-value ≈ 0.009 < 0.05). We have significant evidence at α = 0.05 that the detection rate of the modified protocol differs from 80% (the observed rate, 74%, is lower).
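A minimal sketch of the whole one-sample test in Python; the helper name `one_sample_prop_z` is hypothetical, and only a two-sided H1 is handled. Working without intermediate rounding gives Z ≈ −2.598 and a two-sided p-value of about 0.0094, which leads to the same decision at α = 0.05.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_prop_z(x: int, n: int, p0: float) -> tuple[float, float]:
    """Two-sided z test of H0: p = p0. Returns (z_obs, p_value)."""
    if n * p0 < 5 or n * (1 - p0) < 5:
        raise ValueError("need n*p0 >= 5 and n*(1 - p0) >= 5")
    p_hat = x / n
    se0 = sqrt(p0 * (1 - p0) / n)   # SE under H0 uses p0, not p_hat
    z = (p_hat - p0) / se0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p = one_sample_prop_z(222, 300, 0.8)     # diagnostic protocol example
print(f"Z = {z:.2f}, p-value = {p:.4f}")    # Z = -2.60, p-value = 0.0094
```

Note that the standard error in the test is computed from p0 (its value under H0), whereas the confidence interval uses $\hat p$.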

Statistical Inference for p1 – p2
• We often compare two populations with respect to the proportion of successes in each
• We have two independent random samples from binomial populations

Statistical Inference for p1 – p2
Quantity                          Group 1                Group 2
Population proportions            p1                     p2
Sample sizes                      n1                     n2
Numbers with the characteristic   X1                     X2
Sample proportions                $\hat p_1 = X_1/n_1$   $\hat p_2 = X_2/n_2$

Goal: To test whether p1 = p2 using the sample data

Statistical Inference for p1 – p2
Null hypothesis:
H0: p1 = p2 (or equivalently p1 − p2 = 0)

Test statistic:
If $n_1\hat p_1 \ge 5$, $n_1(1-\hat p_1) \ge 5$, $n_2\hat p_2 \ge 5$ and $n_2(1-\hat p_2) \ge 5$,

Then,
Statistical Inference for p1 – p2
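Then,

$$Z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1-\hat p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \sim N(0,1) \text{ under } H_0$$

where $\hat p_1 = X_1/n_1$, $\hat p_2 = X_2/n_2$, and $\hat p = \dfrac{X_1 + X_2}{n_1 + n_2}$ is the pooled sample proportion.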

Statistical Inference for p1 – p2
Rejection region and p-value

H1          Rejection region       P-value (from the Z table)
p1 > p2     Zobs ≥ Zcrit           P(Z ≥ Zobs)
p1 < p2     Zobs ≤ −Zcrit          P(Z ≤ Zobs)
p1 ≠ p2     |Zobs| ≥ Zcrit         2·P(Z ≥ |Zobs|)

Statistical Inference for p1 – p2
Ex. New vs. existing drug

• A new drug is being compared to an existing drug for its effectiveness in relieving headache pain
• 50 subjects were administered the existing drug. Of these, 28 reported relief from headache pain within 60 minutes
• Of the 50 subjects administered the new drug, 34 reported relief from headache pain within 60 minutes
• Test whether the proportion of subjects reporting relief from headache pain within 60 minutes under the existing drug is significantly different from that under the new drug. Use a 5% significance level (Zcrit = 1.96)
Statistical Inference for p1 – p2
Ex. New vs. existing drug

• Group 1 (existing drug): n1 = 50, X1 = 28
• Group 2 (new drug): n2 = 50, X2 = 34

1. Hypotheses
H0: p1 = p2
H1: p1 ≠ p2

2. Test statistic
$\hat p_1 = 28/50 = 0.56$
$\hat p_2 = 34/50 = 0.68$
Statistical Inference for p1 – p2
Ex. New vs. existing drug

2. Test statistic
$n_1\hat p_1 = 50(0.56) = 28 \ge 5$
$n_1(1-\hat p_1) = 50(1-0.56) = 22 \ge 5$
$n_2\hat p_2 = 50(0.68) = 34 \ge 5$
$n_2(1-\hat p_2) = 50(1-0.68) = 16 \ge 5$

Statistical Inference for p1 – p2
Ex. New vs. existing drug

2. Test statistic
So the appropriate test statistic is

$$Z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1-\hat p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

Statistical Inference for p1 – p2
Ex. New vs. existing drug

3. Observed test statistic

$$\hat p = \frac{X_1 + X_2}{n_1 + n_2} = \frac{28 + 34}{50 + 50} = 0.62$$

$$Z_{obs} = \frac{0.56 - 0.68}{\sqrt{0.62(1-0.62)\left(\frac{1}{50} + \frac{1}{50}\right)}} \approx -1.24$$

Statistical Inference for p1 – p2
Ex. New vs. existing drug

4. Rejection region

Reject H0 if |Zobs| ≥ Zcrit = 1.96

Or reject H0 if p-value < 0.05, where

p-value = P(Z < −1.24) + P(Z > 1.24)
= 2·P(Z < −1.24)
= 2(0.1075)
= 0.215

Statistical Inference for p1 – p2
Ex. New vs. existing drug

Since |Zobs| = 1.24 < 1.96 (or equivalently, p-value = 0.215 > 0.05), we fail to reject H0. There is no significant evidence at α = 0.05 of a difference between the two drugs in the proportion of subjects experiencing relief from headache pain within 60 minutes.
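The same calculation as a Python sketch; `two_prop_z` is a hypothetical helper name, only a two-sided H1 is handled, and the normal CDF from `statistics.NormalDist` replaces the Z table. Without rounding Z to −1.24 first, the two-sided p-value comes out at about 0.216 rather than 0.215; the conclusion is unchanged.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_z(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Two-sided z test of H0: p1 = p2 with the pooled proportion. Returns (z_obs, p_value)."""
    for x, n in ((x1, n1), (x2, n2)):
        # n * p_hat = x and n * (1 - p_hat) = n - x
        if x < 5 or n - x < 5:
            raise ValueError("normal approximation not appropriate")
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)           # pooled estimate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p = two_prop_z(28, 50, 34, 50)           # existing (group 1) vs new (group 2) drug
print(f"Z = {z:.2f}, p-value = {p:.3f}")    # Z = -1.24, p-value = 0.216
```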

Confidence Interval for p1 – p2
If $n_1\hat p_1 \ge 5$, $n_1(1-\hat p_1) \ge 5$, $n_2\hat p_2 \ge 5$ and $n_2(1-\hat p_2) \ge 5$,

then a 100(1−α)% confidence interval for p1 − p2 is:

$$(\hat p_1 - \hat p_2) \pm z_{crit}\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}}$$

Confidence Interval for p1 – p2
Ex. New vs. existing drug

A 95% CI for p1 − p2 is:

$$(0.56 - 0.68) \pm 1.96\sqrt{\frac{0.56(1-0.56)}{50} + \frac{0.68(1-0.68)}{50}}$$

$$= -0.12 \pm 0.189 = (-0.309,\ 0.069)$$

Confidence Interval for p1 – p2
Ex. New vs. existing drug

A 95% CI for p1 − p2 is (−0.309, 0.069).

Interpretation:
• We are 95% confident that the true difference (p1 − p2) in the proportions of subjects reporting relief from headache pain within 60 minutes between the existing and the new drug is between −0.309 and 0.069
• Because zero lies in the 95% confidence interval, there is not enough evidence in the data that the two population proportions differ at α = 0.05 (i.e., we fail to reject H0: p1 = p2 at α = 0.05)
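A sketch of the interval in Python (`two_prop_ci` is a hypothetical helper name). Unlike the test, which pools the two samples under H0: p1 = p2, the confidence interval uses each group's own $\hat p$ in the standard error, because no common value of p is assumed.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_ci(x1: int, n1: int, x2: int, n2: int, conf: float = 0.95) -> tuple[float, float]:
    """Normal-approximation CI for p1 - p2 with the unpooled standard error."""
    for x, n in ((x1, n1), (x2, n2)):
        if x < 5 or n - x < 5:               # n*p_hat >= 5 and n*(1 - p_hat) >= 5
            raise ValueError("normal approximation not appropriate")
    p1_hat, p2_hat = x1 / n1, x2 / n2
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.96 for 95%
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    diff = p1_hat - p2_hat
    return diff - z_crit * se, diff + z_crit * se


lo, hi = two_prop_ci(28, 50, 34, 50)     # existing minus new drug
print(f"({lo:.3f}, {hi:.3f})")           # (-0.309, 0.069)
```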
Inference for p1 – p2 – Another Example
Ex. Medication adherence
We want to evaluate the effectiveness of a program targeting medication adherence. 130 patients visiting a particular clinic were included in the study. Of the 70 participants who were enrolled in the program, 30 were adherent. Of the 60 participants who were not enrolled in the program, 21 were adherent. Is there enough evidence to show that the program improved adherence?

Inference for p1 – p2 – Another Example
Ex. Medication adherence
1. Hypotheses
p1= proportion of adherence among patients enrolled in the
program
p2= proportion of adherence among patients not enrolled in
the program

H0: p1 – p2 = 0
H1: p1 – p2 ≠ 0

Inference for p1 – p2 – Another Example
Ex. Medication adherence
2. Test statistic

$n_1\hat p_1 = 30 \ge 5$
$n_1(1-\hat p_1) = 70 - 30 = 40 \ge 5$
$n_2\hat p_2 = 21 \ge 5$
$n_2(1-\hat p_2) = 60 - 21 = 39 \ge 5$

Therefore we can use the Z test:

$$Z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1-\hat p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

Inference for p1 – p2 – Another Example
Ex. Medication adherence
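3. Observed test statistic

$$\hat p = \frac{X_1 + X_2}{n_1 + n_2} = \frac{30 + 21}{70 + 60} = \frac{51}{130} = 0.39$$

With $\hat p_1 = 30/70 = 0.43$ and $\hat p_2 = 21/60 = 0.35$:

$$Z = \frac{0.43 - 0.35}{\sqrt{0.39(1-0.39)\left(\frac{1}{70} + \frac{1}{60}\right)}} = 0.93$$

(Z ≈ 0.91 if $\hat p_1$, $\hat p_2$ and $\hat p$ are not rounded.)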

Inference for p1 – p2 – Another Example
Ex. Medication adherence
4. Use α = 0.01. From the Z table:

Rejection region: Zcrit = ?

Or p-value = ?

5. Conclusion?

Inference for Proportions for Paired/Matched Data
McNemar Test
• The McNemar test is used to test differences in proportions for dependent groups in a 2×2 contingency table (dependent groups are also called paired or matched groups)
• The same (or paired) subjects are observed under 2 conditions, e.g.:
– 2 treatments
– before/after
– 2 diagnostic tests
– twins
Inference for Proportions for Paired/Matched Data
McNemar Test
Ex: Hypertension
• An automated blood pressure machine is developed in which a person can sit in a booth and have her blood pressure measured by a computer device
• A study is conducted to compare the computer device with standard methods of measuring blood pressure
• Twenty patients are recruited, and their hypertensive status is assessed by both the computer device and a trained observer
• Hypertensive status is defined as either hypertensive or normotensive

Inference for Proportions for Paired/Matched Data
McNemar Test
Ex: Hypertension
• The data are shown below:

Computer device \ Trained observer   Hypertensive   Normotensive
Hypertensive                         3              7
Normotensive                         1              9

• Note that:
– 3 people are measured as hypertensive by both the computer device and the trained observer
– 9 people are normotensive by both methods
– 7 people are hypertensive by the computer device and normotensive by the trained observer
– 1 person is normotensive by the computer device and hypertensive by the trained observer
Inference for Proportions for Paired/Matched Data
McNemar Test
• In general, the data look like this:

Condition 1 \ Condition 2   Yes   No
Yes                         a     b
No                          c     d

– Cells a and d (both conditions agree) are called concordant pairs
– Cells b and c (the conditions disagree) are called discordant pairs
– The test statistic is calculated using only the discordant pairs
– Test statistic = $(b - c)^2/(b + c)$
– The p-value will be given
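The slides say only that the p-value is based on the binomial distribution. A common exact version, assumed here, uses the fact that under H0 each discordant pair is equally likely to fall in cell b or cell c, so b ~ Binomial(b + c, 0.5), and doubles the smaller tail probability. `mcnemar` is a hypothetical helper name.

```python
from math import comb


def mcnemar(b: int, c: int) -> tuple[float, float]:
    """Statistic (b - c)^2 / (b + c) and an exact two-sided binomial p-value.

    Only the discordant counts b and c are used; under H0, b ~ Binomial(b + c, 0.5).
    """
    n_d = b + c
    if n_d == 0:
        raise ValueError("no discordant pairs")
    stat = (b - c) ** 2 / n_d
    k = min(b, c)
    tail = sum(comb(n_d, i) for i in range(k + 1)) / 2 ** n_d   # P(X <= k)
    p_value = min(1.0, 2 * tail)                                # two-sided
    return stat, p_value


stat, p = mcnemar(b=7, c=1)    # hypertension example: b = 7, c = 1
print(stat, round(p, 4))       # 4.5 0.0703
```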

Inference for Proportions for Paired/Matched Data
McNemar Test
Ex: Hypertension
Computer device \ Trained observer   Hypertensive   Normotensive
Hypertensive                         3              7
Normotensive                         1              9

• The hypotheses are:
H0: The proportion classified as hypertensive is the same using both methods
H1: The proportion classified as hypertensive differs between the methods

• The test statistic = $(7 - 1)^2/(7 + 1) = 36/8 = 4.5$

Inference for Proportions for Paired/Matched Data
McNemar Test
Ex: Hypertension

• Based on the binomial distribution (under H0, X ~ Binomial(b + c = 8, 0.5)), p-value = 2·P(X ≤ 1) = 2(9/256) ≈ 0.070

• Conclusion?
