
Hypothesis Testing: Independent Two-Sample Inference with Equal Variances in the Normally Distributed Outcome


Outline:
1) Introduction to independent samples
2) Introduction to the two-sample t test for independent samples with equal variances
3) Introduction to the two-sample t test for independent samples with unequal variances: "Satterthwaite's Method"
4) Testing the equality of variances between two independent samples: the F test
5) F distribution with numerator and denominator degrees of freedom
6) Two-sample t test for equality of means in the independent groups with equal variances of the normally distributed outcome, using the 3 methods of hypothesis testing:
   • Critical value
   • P-value
   • Confidence interval
EPHD310 Basic Biostat lect 4 Dr. Jaffa 1

Introduction to Two Samples

• Most clinical studies involve two samples (or groups) and two (or more) treatments that need to be compared.

• Each group is assigned one of the two treatments, and the means of the two groups are compared.

• There are 2 different types of groups:

Introduction to Two Samples

1) Paired groups or samples:

• Each point in the first sample is matched to a unique point in the second sample.

• Matching is based on factors such as age, gender, ethnic background, smoking, etc. This is referred to as a matched study design.

• A subject can also be used as his/her own control (100% matching, as in a follow-up study design).

• The outcome being compared is normally distributed in the two groups.

• Use the paired t test to compare the means of the normally distributed outcome in the matched paired samples (discussed in Lecture 3).


Introduction to Two Samples

2) Independent samples:

• Data points in one sample are unrelated to data points in the second sample.

• The outcome being compared is normally distributed in the two groups.

• There are 2 types of independent samples:

   • Independent samples with equal variances: the two-sample t test for independent samples with equal variances (pooled variance method) should be used to compare the means of the normally distributed outcome in the two independent groups.

   • Independent samples with unequal variances: the two-sample t test for independent samples with unequal variances (Satterthwaite's method) should be used to compare the means of the normally distributed outcome in the two independent groups.

Introduction to Two Samples
2) Independent samples: Testing for equality of variances

• Before you conduct any t test, you first need to test for normality of the outcome being compared in the two groups using a test of normality (the Shapiro-Wilk test is an option).

• After you confirm normality of the outcome in the two groups, and before you proceed with the t test for independent groups, you need to test for equality of variances of the outcome between the two groups.

• The result of the test of equality of variances determines whether you should conduct the independent t test with equal or unequal variances.

• The test for equality of population variances of the normally distributed outcome in the two independent groups is done using the F test, also referred to as the test of homogeneity. This test is based on a new distribution referred to as the F distribution. In SPSS, the test for equality of population variances of the outcome in two independent groups is conducted using Levene's test.


Testing the Equality of Variances Between Two Independent Samples: The F Test

The hypotheses:

$$H_0: \sigma_1^2 = \sigma_2^2 \quad \text{vs} \quad H_1: \sigma_1^2 \neq \sigma_2^2$$

where $\sigma_1^2$ and $\sigma_2^2$ are the population variances of the outcome being compared in the two independent groups: group 1 and group 2.

Compute the F statistic:

$$F = \frac{s_1^2}{s_2^2} \sim F_{n_1-1,\,n_2-1}$$

Decision rules: at the α level of significance, compare the F statistic to percentiles of an F distribution with n1-1 and n2-1 df as follows:

If $F > F_{n_1-1,\,n_2-1,\,1-\alpha/2}$ or $F < F_{n_1-1,\,n_2-1,\,\alpha/2}$, then reject $H_0$ at the α level of significance.

If $F_{n_1-1,\,n_2-1,\,\alpha/2} \le F \le F_{n_1-1,\,n_2-1,\,1-\alpha/2}$, then fail to reject $H_0$ at the α level of significance.
Testing the Equality of Variances Between Two Independent
Samples: The F Test

Computation of the lower percentile of the F distribution:

• The lower pth percentile of an F distribution with d1 and d2 df is the reciprocal of the upper pth percentile of an F distribution with d2 and d1 df. That is:

$$F_{d_1,\,d_2,\,p} = \frac{1}{F_{d_2,\,d_1,\,1-p}}$$

• Example: compute $F_{6,8,0.05}$

$$F_{6,8,0.05} = \frac{1}{F_{8,6,1-0.05}} = \frac{1}{F_{8,6,0.95}} = \frac{1}{4.15} = 0.241$$
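The reciprocal rule can be checked with a few lines of Python. This is only a sketch: the function name `lower_f_percentile` is hypothetical, and the upper percentile 4.15 is taken from the example above rather than computed.

```python
def lower_f_percentile(upper_swapped: float) -> float:
    """Return F_{d1,d2,p} given the table value F_{d2,d1,1-p}.

    The caller looks up the upper percentile with the two degrees of
    freedom swapped; this helper only applies the reciprocal rule.
    """
    if upper_swapped <= 0:
        raise ValueError("F percentiles are strictly positive")
    return 1.0 / upper_swapped


# F_{8,6,0.95} = 4.15 is the table value quoted in the example.
f_6_8_005 = lower_f_percentile(4.15)
print(round(f_6_8_005, 3))
```

Note the swap of the degrees of freedom: the lookup is done in the F distribution with 8 and 6 df, not 6 and 8 df.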


Testing the Equality of Variances Between Two Independent Samples: The F Test

The F statistic follows an F distribution with n1-1 and n2-1 df.

The F distribution is a family of distributions.

Each F distribution is determined by 2 degrees of freedom, the numerator df d1 and the denominator df d2, and is generally positively skewed.

An F distribution is denoted by $F_{d_1,d_2}$.

In the F test for equality of variances, the two critical values are

$$F_{n_1-1,\,n_2-1,\,1-\alpha/2} \quad \text{and} \quad F_{n_1-1,\,n_2-1,\,\alpha/2} = \frac{1}{F_{n_2-1,\,n_1-1,\,1-\alpha/2}}$$

with degrees of freedom d1 = n1-1 and d2 = n2-1.


Testing the Equality of Variances Between Two Independent Samples:
The F Test

Example to illustrate the application of the F test: Assume we have 2 independent samples:

$X_1$: $n_1 = 25$, $\bar{x}_1 = 207.3$, $s_1 = 35.6$
$X_2$: $n_2 = 41$, $\bar{x}_2 = 193.4$, $s_2 = 17.3$

We need to test whether the variances are equal: $H_0: \sigma_1^2 = \sigma_2^2$ vs $H_1: \sigma_1^2 \neq \sigma_2^2$

$$F = \frac{s_1^2}{s_2^2} = \frac{35.6^2}{17.3^2} = 4.23$$

Since $F = 4.23 > F_{n_1-1,\,n_2-1,\,1-\alpha/2} = F_{24,40,0.975} = 2.01$, at the α = 0.05 level of significance we reject the null hypothesis of no difference in the variances of the outcome in the two independent groups and conclude that the variances of the outcome in the two groups are statistically significantly different. Accordingly, you follow this up by conducting the independent t-test with unequal variances.
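The arithmetic in this example can be reproduced with the short sketch below. The helper name `variance_ratio` is hypothetical, and the critical value 2.01 is the table value quoted on the slide, not computed here. Only the upper critical value is checked, on the assumption that a ratio above 1 cannot fall below the lower 2.5th percentile of an F distribution.

```python
def variance_ratio(s1: float, s2: float) -> float:
    """F statistic for equality of variances: s1^2 / s2^2."""
    return (s1 * s1) / (s2 * s2)


# Sample standard deviations from the example (n1 = 25, n2 = 41).
f_stat = variance_ratio(35.6, 17.3)

# F_{24,40,0.975} = 2.01, read from an F table as on the slide.
upper_crit = 2.01
reject_equal_variances = f_stat > upper_crit

print(round(f_stat, 2), reject_equal_variances)
```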

Testing the Equality of Variances Between Two Independent Samples: The F Test

Summary: If the variances of the normally distributed outcome being compared between the 2 independent groups are not significantly different, then use the pooled variance estimate method to conduct the independent t-test with equal variances to test the main hypothesis of the difference in the means of the normally distributed outcome between the two independent groups.

Otherwise, if the variances of the normally distributed outcome being compared between the 2 independent groups are significantly different, then use Satterthwaite's method to conduct the independent t-test with unequal variances to test the main hypothesis of the difference in the means of the normally distributed outcome between the two independent groups.

In SPSS, Levene's test is used to test for the equality of variances of the outcome in the two independent groups. If this test is not significant, then we fail to reject the null hypothesis of equal variances of the outcome in the independent groups, we assume homogeneity, and we proceed with the independent t test with equal variances. Otherwise, we proceed with the independent t test with unequal variances.
Test of Equality of Variances: Levene’s Test

$H_0: \sigma_1^2 = \sigma_2^2$ vs $H_1: \sigma_1^2 \neq \sigma_2^2$

• The P-value for the test of equality of variances, i.e. the P-value for Levene's test, is 0.077 > 0.05, so at the α = 0.05 level of significance we fail to reject the null hypothesis of equal variances and we deduce that the variances of the normally distributed outcome in the two independent groups are equal.

Hence in this example we use the pooled variance method for testing the equality of means of the outcome in the two independent groups. That is, we report the results listed under the "Equal Variances Assumed" option (row 1).


Test of Equality of Variances: Levene’s Test


$H_0: \sigma_1^2 = \sigma_2^2$ vs $H_1: \sigma_1^2 \neq \sigma_2^2$

• The P-value for the test of equality of variances, i.e. the P-value for Levene's test, is 0.01 < 0.05, so at the α = 0.05 level of significance we reject the null hypothesis of equal variances of the normally distributed outcome between the two independent groups and we deduce that the variances of the outcome in the two independent groups are not equal.

In this example we use "Satterthwaite's method" for testing equality of means of the outcome in the two independent groups. That is, we report the results listed under the "Equal Variances Not Assumed" option (row 2).
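The rule used in the two Levene's test examples (P-values 0.077 and 0.01) can be written as a small helper. This is a sketch only: the function name `spss_t_test_row` is hypothetical and the returned labels simply echo the two rows of the SPSS output described on the slides.

```python
def spss_t_test_row(levene_p: float, alpha: float = 0.05) -> str:
    """Pick which row of the SPSS t test output to report.

    Follows the slide rule: a Levene's test P-value above alpha means we
    fail to reject equal variances and report row 1; otherwise row 2.
    """
    if levene_p > alpha:
        return "Equal variances assumed (row 1, pooled variance method)"
    return "Equal variances not assumed (row 2, Satterthwaite's method)"


print(spss_t_test_row(0.077))  # first example on the slides
print(spss_t_test_row(0.01))   # second example on the slides
```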

Two-sample t Test for Independent Samples with Equal Variances
• Assumptions: $X_1 \sim N(\mu_1, \sigma^2)$ and $X_2 \sim N(\mu_2, \sigma^2)$, so that

$$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\right)$$

• where X1 and X2 represent the outcome in each group, which is assumed to be normally distributed.

• The population variance $\sigma^2$ is estimated by what is called the "pooled estimate of the variance" from the two independent samples, denoted by $s^2$:

$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

The df for the t distribution is n1+n2-2.

• where s1 and s2 are the sample standard deviations of the outcome in group 1 and group 2 respectively.

Two-sample t Test for Independent Samples with Equal Variances: Critical Value Method

The question of interest is whether or not the underlying population means μ1 and μ2 of the normally distributed outcome in the two independent groups are equal.

Hypotheses: $H_0: \mu_1 = \mu_2$ vs $H_1: \mu_1 \neq \mu_2$, equivalently $H_0: \mu_1 - \mu_2 = 0$ vs $H_1: \mu_1 - \mu_2 \neq 0$

3 methods can be used to conduct the independent t test of the null hypothesis of equality of the means of the outcome in the two groups: the critical value, P-value, and CI methods.

Critical value method: Compute the t test statistic:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

with s being the square root of the pooled variance $s^2$.

Two-sample t Test for Independent Samples with Equal Variances:
Critical Value Method

Decision rules for critical value method: $H_0: \mu_1 - \mu_2 = 0$ vs $H_1: \mu_1 - \mu_2 \neq 0$

If $|t| > t_{n_1+n_2-2,\,1-\alpha/2}$, then at the α level of significance we reject H0.

If $|t| \le t_{n_1+n_2-2,\,1-\alpha/2}$, then at the α level of significance we fail to reject H0.


Two-sample t Test for Independent Samples with Equal Variances: P-Value Method

P-value method: $H_0: \mu_1 - \mu_2 = 0$ vs $H_1: \mu_1 - \mu_2 \neq 0$

To compute the P-value, look at the t distribution with df = n1+n2-2 and locate the two tabulated percentiles that bracket the t statistic:

$$t_{n_1+n_2-2,\,p_1} \le |t| \le t_{n_1+n_2-2,\,p_2}$$

$$2(1-p_2) \le \text{P-value} \le 2(1-p_1)$$

Decision rules:
If the P-value ≤ α, then at the α level of significance we reject H0 and deduce that the means of the outcome in the two independent groups are different.

Otherwise we deduce that the means of the outcome in the two independent groups are not significantly different.

Two-sample t Test for Independent Samples with Equal Variances:
Confidence Interval Method

Confidence Interval Method: $H_0: \mu_1 - \mu_2 = 0$ vs $H_1: \mu_1 - \mu_2 \neq 0$

The 100(1-α)% CI for the underlying true mean difference (μ1-μ2) between two independent groups is given by:

$$100\%(1-\alpha)\ \text{CI for}\ (\mu_1 - \mu_2) = (\bar{x}_1 - \bar{x}_2) \pm t_{n_1+n_2-2,\,1-\alpha/2}\; s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

Decision rules:
If the CI contains zero, then we fail to reject H0 of equal means of the outcome in the two independent groups.

If the CI does not contain zero, then we reject H0 and deduce that the means of the outcome in the two independent groups are different.


Two-sample t Test for Independent Samples with Equal Variances of the outcome in the two groups: illustration using a clinical example

• To study the effect of oral contraceptive (OC) use on systolic blood pressure (SBP), two independent samples of women aged 35 to 39 were collected.

• Participants were not matched, resulting in two independent samples and an independent study design.

• The SBP in each group was normally distributed based on the test of normality.

• Sample 1 (OC users), denoted as X1: $\bar{x}_1 = 132.86$, s1 = 15.34, n1 = 8

• Sample 2 (non-OC users), denoted as X2: $\bar{x}_2 = 127.44$, s2 = 18.23, n2 = 21

• What can be said about the underlying mean difference in SBP between the two groups of OC users and non-OC users?

Two-sample t Test for Independent Samples with Equal Variances:
Illustration
Solution: This is an independent two-sample study design, and we are comparing the means of SBP, a normally distributed outcome, in the two independent groups of OC and non-OC users among females.

The hypotheses are: $H_0: \mu_1 - \mu_2 = 0$ vs $H_1: \mu_1 - \mu_2 \neq 0$

where µ1 and µ2 are the population means of SBP in the two groups of OC and non-OC users respectively.

To test the H0 of equality of population means of the normally distributed outcome in the two groups of OC and non-OC users, we need to conduct the independent t test.

However, step 1 before the t-test is to use the F test to check whether the variances of SBP in the two independent groups of OC and non-OC users are equal.


Testing the Equality of Variances Between Two Independent Samples: The F Test

The hypotheses: $H_0: \sigma_1^2 = \sigma_2^2$ vs $H_1: \sigma_1^2 \neq \sigma_2^2$, where $\sigma_1^2$ and $\sigma_2^2$ are the population variances of SBP in the two independent groups of OC and non-OC users.

Compute the F statistic:

$$F = \frac{s_1^2}{s_2^2} = \frac{15.34^2}{18.23^2} = \frac{235.316}{332.333} = 0.708$$

Decision rules at α level of significance:

If $F > F_{n_1-1,\,n_2-1,\,1-\alpha/2}$ or $F < F_{n_1-1,\,n_2-1,\,\alpha/2}$, then reject the null.

If $F_{n_1-1,\,n_2-1,\,\alpha/2} \le F \le F_{n_1-1,\,n_2-1,\,1-\alpha/2}$, then fail to reject the null.

Testing the Equality of Variances Between Two Independent Samples:
The F Test

The hypotheses: $H_0: \sigma_1^2 = \sigma_2^2$ vs $H_1: \sigma_1^2 \neq \sigma_2^2$

F statistic = 0.708

Critical values are:

$$F_{n_1-1,\,n_2-1,\,1-\alpha/2} = F_{8-1,\,21-1,\,1-0.05/2} = F_{7,20,0.975} = 3.01$$

$$F_{n_1-1,\,n_2-1,\,\alpha/2} = F_{7,20,0.025} = \frac{1}{F_{20,7,0.975}} \approx \frac{1}{F_{24,7,0.975}} = \frac{1}{4.42} = 0.226$$

Since $F_{7,20,0.025} = 0.226 < F = 0.708 < F_{7,20,0.975} = 3.01$, we fail to reject H0 of homogeneity and deduce at the α = 0.05 level of significance that the variances of SBP in the OC and non-OC groups are not significantly different, so we treat them as equal. We therefore conduct the independent t-test with equal variances using the pooled variance method.
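The two-sided decision rule for this example can be sketched as follows. The function name `f_test_decision` is hypothetical, and both critical values (0.226 and 3.01) are the table-based values from the slide, passed in rather than computed.

```python
def f_test_decision(s1: float, s2: float,
                    lower_crit: float, upper_crit: float):
    """Two-sided F test for equal variances.

    Returns the F statistic s1^2 / s2^2 and True when H0 (equal
    variances) is rejected, i.e. when F falls outside the interval
    [lower_crit, upper_crit].
    """
    f_stat = s1 ** 2 / s2 ** 2
    reject = f_stat < lower_crit or f_stat > upper_crit
    return f_stat, reject


# OC users: s1 = 15.34 (n1 = 8); non-OC users: s2 = 18.23 (n2 = 21).
f_stat, reject = f_test_decision(15.34, 18.23,
                                 lower_crit=0.226, upper_crit=3.01)
print(round(f_stat, 2), reject)
```

The statistic comes out at roughly 0.71, inside the two critical values, so the helper reports that H0 is not rejected.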


Two-sample t Test for Independent Samples with Equal Variances: Illustration

Solution: OC and non-OC example continued:

$H_0: \mu_1 - \mu_2 = 0$ vs $H_1: \mu_1 - \mu_2 \neq 0$

Estimate first the pooled variance $s^2$:

$$s^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} = \frac{7(15.34)^2 + 20(18.23)^2}{27} = 307.18$$

So $s = \sqrt{307.18} = 17.527$
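As a check on the pooled variance arithmetic, here is a minimal sketch; the function name `pooled_variance` is an illustrative choice, not part of the lecture.

```python
import math


def pooled_variance(n1: int, s1: float, n2: int, s2: float) -> float:
    """Pooled estimate of the common variance from two independent samples.

    Each sample variance is weighted by its degrees of freedom (n - 1).
    """
    return ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)


# OC users: n1 = 8, s1 = 15.34; non-OC users: n2 = 21, s2 = 18.23.
s2_pooled = pooled_variance(8, 15.34, 21, 18.23)
s_pooled = math.sqrt(s2_pooled)
print(round(s2_pooled, 2), round(s_pooled, 3))
```

Because the non-OC group contributes 20 of the 27 degrees of freedom, the pooled variance sits closer to that group's variance.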

Two-sample t Test for Independent Samples with Equal Variances:
Illustration
$H_0: \mu_1 - \mu_2 = 0$ vs $H_1: \mu_1 - \mu_2 \neq 0$

Critical Value Method: Compute the t test statistic:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{132.86 - 127.44}{17.527\sqrt{\dfrac{1}{8} + \dfrac{1}{21}}} = 0.74$$

Critical value: $t_{n_1+n_2-2,\,1-\alpha/2} = t_{8+21-2,\,1-0.05/2} = t_{27,0.975} = 2.052$

Since |t| = 0.74 < 2.052, we fail to reject the null hypothesis at the α = 0.05 level of significance.
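The critical value method for this example can be reproduced with the sketch below. The function name `pooled_t_statistic` is hypothetical; the pooled standard deviation 17.527 and the critical value 2.052 are taken from the slides rather than recomputed.

```python
import math


def pooled_t_statistic(x1: float, x2: float, s: float,
                       n1: int, n2: int) -> float:
    """t statistic for two independent samples with a pooled SD s."""
    standard_error = s * math.sqrt(1 / n1 + 1 / n2)
    return (x1 - x2) / standard_error


t_stat = pooled_t_statistic(132.86, 127.44, 17.527, 8, 21)

# t_{27,0.975} = 2.052, read from a t table as on the slide.
t_crit = 2.052
reject_h0 = abs(t_stat) > t_crit

print(round(t_stat, 2), reject_h0)
```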


Two-sample t Test for Independent Samples with Equal Variances: Illustration

$H_0: \mu_1 - \mu_2 = 0$ vs $H_1: \mu_1 - \mu_2 \neq 0$
Conclusion: Critical value method

• We conclude that we do not have statistically significant evidence that the means of SBP in the two groups of OC and non-OC users are different at the α = 0.05 level of significance.

• Accordingly, at the α = 0.05 level of significance we deduce that the true population means of SBP in OC and non-OC users are not statistically significantly different, and we find no evidence that OC use is associated with SBP among women in the general population.

Two-sample t Test for Independent Samples with Equal Variances:
Illustration
$H_0: \mu_1 - \mu_2 = 0$ vs $H_1: \mu_1 - \mu_2 \neq 0$
P-value method:
To compute the P-value, look at the t distribution with df = 27 and locate where the t statistic falls:

Note that $t_{27,0.75} = 0.684 < |t| = 0.74 < t_{27,0.80} = 0.855$, so

2(1-0.80) < P-value < 2(1-0.75)

0.4 < P-value < 0.5

The P-value > 0.05, so at the α = 0.05 level of significance we fail to reject H0 and deduce that the true population means of SBP in OC and non-OC users are not significantly different, and we find no evidence that OC use is associated with SBP in the female population.
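The table lookup used to bracket the P-value can be sketched as follows. The function name `bracket_two_sided_p` and the list-of-pairs table format are assumptions made for illustration; the two percentiles are the t table values for 27 df quoted above.

```python
def bracket_two_sided_p(t_abs: float, table):
    """Bracket a two-sided P-value from tabulated t percentiles.

    `table` is a list of (p, percentile) pairs for the relevant df.
    Returns (lower bound, upper bound) = (2(1 - p2), 2(1 - p1)), where
    p1 and p2 are the tabulated probabilities whose percentiles sit
    just below and just above |t|.
    """
    below = [p for p, q in table if q <= t_abs]
    above = [p for p, q in table if q >= t_abs]
    if not below or not above:
        raise ValueError("|t| falls outside the tabulated range")
    p1, p2 = max(below), min(above)
    return 2 * (1 - p2), 2 * (1 - p1)


# t_{27,0.75} = 0.684 and t_{27,0.80} = 0.855, as quoted on the slide.
low, high = bracket_two_sided_p(0.74, [(0.75, 0.684), (0.80, 0.855)])
print(round(low, 2), round(high, 2))
```

With more rows in the table the same function returns the tightest available bracket; statistical software would instead report the exact P-value.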


Two-sample t Test for Independent Samples with Equal Variances: Illustration

Confidence interval method: $H_0: \mu_1 - \mu_2 = 0$ vs $H_1: \mu_1 - \mu_2 \neq 0$

The 95% CI for the true underlying mean difference in SBP between the population of 35- to 39-year-old OC users and non-OC users is given by:

$$100\%(1-\alpha)\ \text{CI for}\ (\mu_1 - \mu_2) = (\bar{x}_1 - \bar{x}_2) \pm t_{n_1+n_2-2,\,1-\alpha/2}\; s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

$$95\%\ \text{CI for}\ (\mu_1 - \mu_2) = (132.86 - 127.44) \pm 2.052 \times 17.527\sqrt{\frac{1}{8} + \frac{1}{21}}$$

$$95\%\ \text{CI for}\ (\mu_1 - \mu_2) = 5.42 \pm 14.94$$

$$95\%\ \text{CI for}\ (\mu_1 - \mu_2) = (-9.52,\ 20.36)$$
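The interval above can be recomputed with a short sketch. The function name `pooled_mean_diff_ci` is hypothetical; the pooled standard deviation 17.527 and the critical value 2.052 are taken from the slides.

```python
import math


def pooled_mean_diff_ci(x1: float, x2: float, s: float,
                        n1: int, n2: int, t_crit: float):
    """CI for mu1 - mu2 using the pooled SD s and a tabulated t value."""
    diff = x1 - x2
    margin = t_crit * s * math.sqrt(1 / n1 + 1 / n2)
    return diff - margin, diff + margin


ci_low, ci_high = pooled_mean_diff_ci(132.86, 127.44, 17.527, 8, 21, 2.052)

# The CI method fails to reject H0 when the interval contains zero.
contains_zero = ci_low <= 0 <= ci_high

print(round(ci_low, 2), round(ci_high, 2), contains_zero)
```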

Two-sample t Test for Independent Samples with Equal Variances:
Illustration
Confidence interval method: $H_0: \mu_1 - \mu_2 = 0$ vs $H_1: \mu_1 - \mu_2 \neq 0$

Interpretation:
We are 95% confident that the true underlying mean difference in SBP between the population of 35- to 39-year-old OC users and non-OC users lies between -9.52 and 20.36.

Note that by just looking at the CI we can tell whether the result is significant or not.

Since this CI contains zero, the result is not significant: at the α = 0.05 level of significance we fail to reject H0 and deduce that the true population means of SBP in OC and non-OC users are not significantly different, and we find no evidence that SBP is associated with OC use in the female population.


Two-sample t Test for Independent Samples with Equal Variances: Illustration

• Note that this CI is wide, so it has low precision.

• So a larger sample size might be needed to accurately assess the true mean difference.

Two-sample t Test for Independent Samples with Equal Variances
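SPSS Output Example: This is NOT the exact OC example

[SPSS independent-samples t test output table not reproduced. Annotations on the slide label the group standard deviations $s_1$ and $s_2$, the mean difference $\bar{x}_1 - \bar{x}_2 = 133.43 - 128.10$, the standard error $s\sqrt{1/n_1 + 1/n_2}$, and the test statistic $t = (\bar{x}_1 - \bar{x}_2)\,/\,\big(s\sqrt{1/n_1 + 1/n_2}\big)$.]

Since the P-value = 0.522 > 0.05, we fail to achieve statistical significance and deduce that there is no significant difference in mean SBP between OC and non-OC users. Thus, at the α = 0.05 level of significance, we have no evidence that OC has an effect on SBP.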

Types of Study Design, outcome, and Appropriate Tests


Summary of hypothesis testing to compare means of two groups:
If the two groups are independent, the outcome is continuous, and you want to compare the means of the outcome in the two independent groups, then you can potentially use the independent t-test.

First check normality of the outcome in the two independent groups. If normality is confirmed, proceed with the independent t-test.

Check for homogeneity of the population variances of the outcome in the two independent groups using the F test.

If the F test is not significant, the variances are treated as equal: carry out the independent t-test for equality of means of the two independent groups assuming equal variances, using the pooled variance method (lect 4).

If the F test is significant, the variances are not equal: carry out the independent t-test for equality of means of the two independent groups assuming unequal variances, using Satterthwaite's method (lect 5).

Types of Study Design, outcome, and Appropriate Tests

Summary of hypothesis testing to compare means of two groups:


If the two groups are matched (dependent), the outcome is continuous, and you want to compare the means of the outcome in the two dependent groups, then you can potentially use the dependent paired t-test.

First check normality of the outcome in the two dependent groups. If normality is confirmed, proceed with the dependent paired t-test as discussed in lect 3.
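The two summaries can be condensed into a small decision helper. This is a sketch under assumptions: the function name `choose_two_group_test` and its return strings are illustrative, and the message returned when normality is not confirmed is an assumption, since the summaries only cover the case where normality holds.

```python
from typing import Optional


def choose_two_group_test(paired: bool, outcome_normal: bool,
                          variances_equal: Optional[bool] = None) -> str:
    """Pick the test for comparing the means of a continuous outcome.

    `variances_equal` is the conclusion of the F (or Levene's) test and
    is only needed for independent groups.
    """
    if not outcome_normal:
        # Assumption: the summaries do not say what to do in this case.
        return "normality not confirmed: the t tests in this summary do not apply"
    if paired:
        return "paired t test (lect 3)"
    if variances_equal is None:
        raise ValueError("run the test of equality of variances first")
    if variances_equal:
        return "independent t test, pooled variance method (lect 4)"
    return "independent t test, Satterthwaite's method (lect 5)"


# OC example: independent groups, normal outcome, F test not significant.
print(choose_two_group_test(paired=False, outcome_normal=True,
                            variances_equal=True))
```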


Course learning Objectives (LOs) covered in this lecture

By the end of this course, students are expected to be able to achieve the
following LOs:

LO1. Explain the role of quantitative methods and sciences of biostatistics in describing and assessing a population's health.
LO2. Apply the appropriate descriptive techniques commonly used to
summarize public health data.
LO3. Describe commonly used statistical probability distributions.
LO4. Analyze quantitative data using common statistical methods for inference
through computer based statistical software and manual computation.
LO5. Apply alternative statistical methodologies to commonly used statistical
methods when assumptions are not met.
LO6. Interpret results of statistical analyses found in public health studies and
biomedical sciences.
LO7. Apply ethical principles to data management and analysis.
