Power & Sample Size - Lecture17a


Power & Sample Size

Sadik A. Khuder, Ph.D.


College of Medicine
University of Toledo
Questions
• What is a statistical hypothesis?
• What does it mean to say that a finding (difference or association) is statistically significant?
• What is the probability of detecting a real effect?
• What is statistical power? Why is it important?
• What are the main factors that influence power?
Decision Making Under Uncertainty
You have to make decisions even when you are unsure:
• Choose drug A or drug B
• Discontinue a useful pesticide
• Introduce a new biomarker
• Statistics provides an approach to decision making under uncertainty: decide the same way you would bet, by maximizing expected utility (subjective value).
Null and Alternative Hypotheses

H0 = Null Hypothesis
Ha = Alternative Hypothesis
Hypotheses always pertain to population parameters or characteristics rather than to sample characteristics.
It is the population, not the sample, that we want to make an inference about from limited data.
Hypothesis Testing
• Making statement(s) regarding unknown population parameter values based on sample data.

Null Hypothesis (H0):
The accepted explanation, the status quo. This is what we are trying to disprove.

Alternative Hypothesis (H1):
What the investigator thinks might really be going on: a (possibly) better explanation than the null.
Statistical Test
• The goal of the test is to reject H0 in favour of H1.
• We do this by calculating a test statistic and comparing its value with a table value (critical value).
• If our test statistic is equal to or more extreme than the critical value, we reject H0. The rejection region must be set up before computing the test statistic.

Decision:
Reject H0, or
Fail to reject H0

Errors:
Type I: Reject H0 when H0 is really true
Type II: Fail to reject H0 when H0 is really false
Error Rates and Power

                    Test Result
True State     Conclude H0 True     Conclude H0 False
H0 True        Correct decision     Type I error
H0 False       Type II error        Correct decision

α = P(Type I error)    β = P(Type II error)

• Goal: keep α and β reasonably small
• Power = 1 - β
The 5% Level
• It is conventional in biomedical research to reject H0 at the 5% level: we reject H0 when there is a 1-in-20 chance, or less, of the observed result occurring under H0.
• When the p-value is less than 0.05, the result is said to be statistically significant at the 0.05 level.
• The 0.05 level is a convenient cut-off adopted by convention. Values close to 0.05 provide moderate evidence against H0, while values less than 0.01 provide considerable evidence against H0.
• One approach is to decide before you do the study what p-value you will use to reject H0. This is called the significance level of the test (e.g. significance level = 0.05), denoted by α. H0 is rejected if the p-value < α, and the results are said to be "statistically significant" at level α.
One-sided or Two-sided
• H0 is contrasted with the alternative hypothesis (H1).
• H1 may be one-sided or two-sided.
• If H1 allows parameter values on either side of the value specified by H0 (e.g. H0: µ = 100 vs H1: µ ≠ 100), the test is called two-tailed (or two-sided).
• Occasionally H1 allows only parameter values on one side of the value specified by H0 (e.g. H0: µ = 100 vs H1: µ > 100); the test is then called one-tailed (one-sided).
• It is rarely correct to use a one-sided test in practice.
One-sided or Two-sided
Two-sided: the 0.05 is divided equally between the two tails (0.025 in each tail).
One-sided: the entire 0.05 is placed in a single tail.
Steps of Hypothesis Testing
• The following steps are used in hypothesis testing:
1. Identify the problem: objectives.
2. Translate these into H0 and H1.
3. Set up the rejection region.
4. Calculate the test statistic.
5. Draw the conclusion: reject or fail to reject H0.
6. Interpret the results: say in words what the conclusion means in terms of the objectives.
Interpretation of P-value
In the research literature, results of statistical tests are usually reported using the p-value.
• The p-value provides an objective measure of the strength of the evidence that the data supply against the null hypothesis.
• It is the probability of getting a result as extreme as, or more extreme than, the one observed if the null hypothesis is correct.
• A small p-value provides evidence against the null hypothesis, because data have been observed that would be unlikely if the null hypothesis were correct. Thus we reject the null hypothesis when the p-value is sufficiently small.
Statistical Tests for Continuous Data

Normal/large-sample data?
• No → non-parametric tests
• Yes → inference on means?
    • No → inference on variances → F test for variances
    • Yes → independent samples?
        • No → paired t test
        • Yes → variance known?
            • Yes → Z test
            • No → variances equal?
                • Yes → t test (pooled variance)
                • No → t test (unequal variance)
Z Test Statistic
• Want to test a continuous outcome
• Known variance σ²

Under H0:
Z = (X̄ - μ0) / (σ/√n) ~ N(0, 1)

• Therefore, for a two-sided α = 0.05 test:
Reject H0 if |X̄ - μ0| / (σ/√n) ≥ 1.96

i.e. reject H0 if X̄ ≥ μ0 + 1.96 σ/√n or X̄ ≤ μ0 - 1.96 σ/√n
Testing Statistical Hypotheses – example
• Suppose H0: μ = 75; Ha: μ ≠ 75. Assume σ = 10 and the population is normal, so the sampling distribution of the mean is known (to be normal). We get data:

n = 100, X̄ = 80

Is this a likely outcome if H0 is true?

Z = (80 - 75) / (10/√100) = 5

• Conclusion: reject the null (Z = 5 far exceeds the critical value 1.96).
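The arithmetic above can be checked with a few lines of Python (standard library only; the function name and the p-value step are my additions, not part of the lecture):

```python
import math

def one_sample_z(xbar, mu0, sigma, n):
    """One-sample z test with known sigma: returns (z, two-sided p-value)."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    # two-sided p = 2 * P(Z > |z|), normal CDF computed via the error function
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

z, p = one_sample_z(xbar=80, mu0=75, sigma=10, n=100)
print(z)         # 5.0
print(p < 0.05)  # True: reject H0 at the 0.05 level
```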


Type I and II Errors

[Figure: the sampling distribution under H0 (the null distribution) and the sampling distribution under H1 (the alternate distribution), separated by the effect size. The tail of the null distribution beyond the critical value belongs to H0 but falls in the rejection region (Type I error, α); the part of the alternate distribution below the critical value belongs to H1 but is not rejected (Type II error, β).]
Power of a statistical test

Power = 1 - P(Type II error | H0 is false)
      = probability of rejecting H0 when it is false

The probability of accepting H0 when it is false (i.e. the conditional probability of making a Type II error) is conventionally called β ("beta"), so
Power = 1 - β.
Ideally, studies should be designed so that power is at least 80%. This requires using an efficient design and a sufficiently large sample.
Power of a statistical test

Power = 1 - β

[Figure: with α = 0.05 and β = 0.2, power = 0.8.]



Impact of decreasing α

If α is lowered from 0.05 to 0.02, β rises above 0.2, so power falls below 0.8.



Impact of increasing α

If α is raised above 0.05, β falls below 0.2, so power rises above 0.8.



Impact of increasing sample size

Increasing n narrows the sampling distributions (look at the shape of the distribution): the null and alternate distributions overlap less, β decreases, and power increases.
Impact of increasing Effect Size

• Starting point: α = 0.05, β = 0.2.
• Increasing the effect size moves the alternate distribution further from the null, shrinking the overlap: with α held fixed, β decreases, so power increases.
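These relationships can be illustrated numerically. The sketch below is my own (not from the lecture): it computes the approximate power of a two-sided one-sample z test and shows power rising with effect size while α and n stay fixed.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def power_two_sided(delta, sigma, n, z_alpha2=1.96):
    """Approximate power of a two-sided one-sample z test (alpha = 0.05)."""
    shift = abs(delta) * math.sqrt(n) / sigma
    return norm_cdf(shift - z_alpha2) + norm_cdf(-shift - z_alpha2)

# Growing effect size with sigma = 1 and n = 25 held fixed: power increases
for delta in (0.2, 0.5, 0.8):
    print(round(power_two_sided(delta, sigma=1, n=25), 3))
```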


Why calculate sample size and power?

• To show that, under certain conditions, the hypothesis test has a good chance of detecting a desired difference (if it exists)
• To show the funding agency that the study has a reasonable chance of obtaining a conclusive result
• To show that the necessary resources (human, monetary, time) will be minimized and well utilized
The importance of sufficient power
• If the sample is too small, our test may not be able to detect a difference that is really there.
• The smaller our sample, or the smaller the true difference if it exists, the greater the probability of accepting the null hypothesis in error; this probability also increases with increasing variance in the samples.
• The importance of this to our experiments or surveys should be obvious. If, for example, there is a small change in an important outcome (like a rise in the contaminant level), our statistical test may lead us to accept H0 and say that there is no significant difference or no deterioration when in reality there is.
• On this basis, no action in terms of contaminant clean-up is likely to result, especially if that action would be expensive.
The importance of sufficient power
• How do you know that the absence of a statistically significant effect is not just due to small sample size or sampling variation, which tend to reduce the chances of detecting an effect that is actually present?
• The commonest factors that make a test unable to detect a change are:
    • too few samples,
    • too small a difference, and
    • large variation.
• It is when a test does not show a difference that power analysis is especially important: it shows whether or not the test could have detected a difference where one existed in reality.
Statistical Power

[Figure: power curve. The horizontal axis is the true difference, from -δ through 0 to +δ; the vertical axis is the probability of rejecting H0, from 0 to 1. Power = 1 - β, where β is the probability of a Type II error; the curve bottoms out near α when the true difference is 0 and rises toward 1 as the difference grows in either direction.]
Example
• A drug company wants to know if a new drug B will be more effective than the current drug A. Running clinical trials comparing A to B is expensive, and the budget allows at most 10 patients in each group. Is running the trial worthwhile?

Power under nA = nB = 10 for a variety of values of |μB - μA| and σ:

|μB - μA| = 1, σ = 1: power ≈ 0.60
|μB - μA| = 1, σ = 3: power ≈ 0.23
|μB - μA| = 2, σ = 1: power ≈ 0.95

• A larger true difference, or a smaller σ, gives greater power.
Factors Affecting Power
• Increasing overall sample size increases power
• Having unequal group sizes usually reduces power
• A larger size of effect being tested increases power
• Setting a lower significance level decreases power
• Violations of assumptions underlying the test often decrease power substantially
Effect Size
• The difference between the null and alternative hypotheses; it can be measured using either raw or standardized values.
• Decide the smallest clinically important effect.
• A power analysis tells us how many subjects are needed to reliably detect this effect or a larger one.
• Smaller effects are more difficult to detect.

Effect Size = |μ1 - μ2| / σ
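As a quick sketch of the formula above (function name and example values are mine):

```python
def effect_size(mu1, mu2, sigma):
    """Standardized effect size: |mu1 - mu2| / sigma."""
    return abs(mu1 - mu2) / sigma

print(effect_size(100, 110, 20))  # 0.5: a 10-unit difference is half an SD
```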

Sample Size Calculations for Fixed Power
• Step 1 - Define an important difference in means:
– Case 1: σ approximated from prior experience or a pilot study - the difference can be stated in the units of the data
– Case 2: σ unknown - the difference must be stated in units of standard deviations of the data
• Step 2 - Choose the desired power to detect the clinically meaningful difference (1 - β, typically at least 0.80).
One Sample

n = [ (Zα + Zβ) σ / δ ]²   for the one-sided test, and

n = [ (Zα/2 + Zβ) σ / δ ]²   for the two-sided test,

where δ is the difference to be detected.
Example
• We need to estimate the average blood glucose among diabetic patients. The normal fasting glucose level is about 100 mg/dl, and the estimated standard deviation is 20 mg/dl. Calculate the sample size needed to detect a difference of 20 mg/dl. Desired power = 0.80, α = 0.05.

n = [ (Zα/2 + Zβ) σ / δ ]²

Zα/2 = 1.96,  Zβ = Z0.20 = 0.84

n = [ 20(1.96 + 0.84) / 20 ]² = 7.84, so 8 patients are needed.
Paired Observations Example
• Below are data acquired by Mazze et al. (1971) on the pre-operative and post-operative creatinine clearance (ml/min) of six patients anesthetized by halothane.

                        Patients
Time             1    2    3    4    5    6
Pre-operative  110  101   61   73  143  118
Post-operative 149  105  162   93  143  100
Difference      39    4  101   20    0  -18

• Assume that this is pilot data from which the investigator wishes to design a larger study.
Paired Observations Example
The hypotheses to be tested are
H0: μD = 0
H1: μD > 0.
The investigator further states that he believes that, on average, detecting a difference larger than 15 ml/min in creatinine clearance is clinically important.
The estimated standard deviation, calculated from the pilot data, is 42 ml/min.
Assuming an α level of 0.05 will be used to determine statistical significance, how many patients should be enrolled to achieve a power of 80%?
Paired Observations Example

The four values that must be pre-specified before the sample size can be calculated are:
1. the standard deviation: 42 ml/min
2. the α level: 0.05, Zα = 1.645
3. a specified alternative: δ = μa - μ0 = 15 ml/min
4. the power: 1 - β = 0.80, Zβ = 0.842.
Paired Observations Example

n = [ (Zα + Zβ) σ / δ ]²

n = [ 42(1.645 + 0.842) / 15 ]² = 48.49

The investigator should enroll 49 patients.


Paired Observations Example

Should the investigator choose to perform a two-sided test, he would need to enroll 62 patients:

n = [ (Zα/2 + Zβ) σ / δ ]²

n = [ 42(1.96 + 0.842) / 15 ]² = 61.55
Two-Sample Tests
• Assume that each sample has a normal distribution with mean μ and variance σ². The true levels of μI and μC for the intervention and control groups are not known, but it is assumed that σ² is known. The sample size per group can be calculated as

n = 2 [ (Zα + Zβ) σ / δ ]²   for the one-sided test, and

n = 2 [ (Zα/2 + Zβ) σ / δ ]²   for the two-sided test,

where δ = μI - μC. Here n is the sample size per group, so the total sample size is 2n.
Example 1
• Suppose an investigator wishes to detect a 10 mg/dl difference in cholesterol level in a diet intervention group compared to the control group. The standard deviation, estimated from pilot data, is 50 mg/dl. How large a sample is required to test

H0: μI - μC = 0
HA: μI - μC ≠ 0

with a 5% significance level and 90% power?
Example 1
For a two-sided test at the 0.05 significance level, Zα/2 = 1.96, and for 90% power, Zβ = 1.282.
• Substituting these values into the formula,

n = 2 [ (Zα/2 + Zβ) σ / δ ]² = 2 [ 50(1.96 + 1.282) / 10 ]² ≈ 525.5

• Rounding up, approximately 526 subjects are needed in each group.
A total of 1052 subjects is required for this study.
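A quick check of the two-sample calculation in Python (a sketch; names are mine):

```python
import math

def n_per_group(sigma, delta, z_alpha=1.96, z_beta=1.282):
    """Per-group n for a two-sample comparison of means, rounded up."""
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

n = n_per_group(sigma=50, delta=10)
print(n, 2 * n)  # per-group n and total sample size
```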
Example 2
• A clinical trial was designed to compare placebo vs rosiglitazone for HIV-1 lipoatrophy.
• The response variable was the change in limb fat mass.
• Clinically meaningful difference: 0.5 (in standard-deviation units)
• Desired power: 1 - β = 0.80
• Significance level: α = 0.05

Zα/2 = 1.96,  Zβ = Z0.20 = 0.84

n1 = n2 = 2 (1.96 + 0.84)² / (0.5)² ≈ 63 per group

Source: Carr, et al (2004)
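When the difference is stated in standard-deviation units (d), the same two-sample formula simplifies, since σ/δ becomes 1/d. A sketch (names mine):

```python
import math

def n_per_group_std(d, z_alpha=1.96, z_beta=0.84):
    """Per-group n for a two-sample test with standardized difference d."""
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

print(n_per_group_std(d=0.5))  # 63 per group, matching the trial calculation
```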


Statistical Tests for Categorical Data

Independent samples?
• No → McNemar's test
• Yes → large sample size?
    • Yes → chi-square test; Z test for proportions
    • No → chi-square test with Yates correction; Fisher's Exact test
Sample Size for a Proportion
For the case of a single proportion, let p0 denote the value under the null hypothesis and p denote the true value.
The sample size formula, based on the assumed adequacy of the normal approximation to the binomial distribution, is

n = [ Zα √(p0(1 - p0)) + Zβ √(p(1 - p)) ]² / (p - p0)²
Example

Null hypothesis: p = 0.50;
Significance level: α = 0.05;
Expected proportion under the alternative: p = 0.60;
Power: 1 - β = 0.80.
The required sample size is

n = [ 1.96 √(0.5(1 - 0.5)) + 0.8416 √(0.6(1 - 0.6)) ]² / (0.6 - 0.5)² ≈ 194
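The same calculation in Python (a sketch; names are mine):

```python
import math

def n_one_proportion(p0, p1, z_alpha=1.96, z_beta=0.8416):
    """n = (z_a*sqrt(p0*(1-p0)) + z_b*sqrt(p1*(1-p1)))^2 / (p1-p0)^2, rounded up."""
    num = (z_alpha * math.sqrt(p0 * (1 - p0)) + z_beta * math.sqrt(p1 * (1 - p1))) ** 2
    return math.ceil(num / (p1 - p0) ** 2)

print(n_one_proportion(p0=0.50, p1=0.60))  # 194
```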
Sample Size for Two Proportions

n = [ Zα/2 √(2 p̄(1 - p̄)) + Zβ √(pT(1 - pT) + pC(1 - pC)) ]² / (pT - pC)²

where p̄ = (pT + pC)/2 and δ = pT - pC (or pC - pT); e.g. Zα/2 = 1.96 for a two-sided α = 0.05 and Zβ = 0.84 for 80% power.

Example
A trial comparing lifestyle intervention to advice only (control) for prehypertensives. The aim is to achieve a 20% reduction relative to the control. The rate in the control group is pC = 0.30.
• A 20% reduction means the treatment rate is pT = (1 - 0.20)(0.30) = 0.24.
• The overall rate is p̄ = (pT + pC)/2 = (0.30 + 0.24)/2 = 0.27.
The sample size per group for 90% power (Zβ = 1.28) is

n = [ 1.96 √(2(0.27)(1 - 0.27)) + 1.28 √(0.24(1 - 0.24) + 0.30(1 - 0.30)) ]² / (0.30 - 0.24)² ≈ 1149

The required sample size is 1149 × 2 = 2298.
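And the two-proportion version in Python (a sketch; names are mine; Zβ = 1.2816 is the exact value for 90% power):

```python
import math

def n_two_proportions(pT, pC, z_alpha=1.96, z_beta=1.2816):
    """Per-group n for comparing two proportions (normal approximation)."""
    pbar = (pT + pC) / 2
    num = (z_alpha * math.sqrt(2 * pbar * (1 - pbar))
           + z_beta * math.sqrt(pT * (1 - pT) + pC * (1 - pC))) ** 2
    return math.ceil(num / (pT - pC) ** 2)

n = n_two_proportions(pT=0.24, pC=0.30)
print(n, 2 * n)  # per-group n and total sample size
```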
