Hypothesis Testing - 2 Populations
Estimation and Hypothesis Testing: Two Populations
LEARNING OBJECTIVES
Outline and explain the procedure for a test of
significance between two sample means
Determine when to use an independent t test and
when to use a paired t test
Calculate and interpret the results of an independent
and a paired t test
Compute a confidence interval from a set of data for
the difference between two population means
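The mean and standard deviation of the sampling distribution of \(\bar{x}_1 - \bar{x}_2\) (Fig 1) are as follows:
\[ \mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2 \quad \text{and} \quad \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \]
When \(\sigma_1\) and \(\sigma_2\) are unknown:
\[ s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]
\(\mu_1\) = the mean of population 1; \(\mu_2\) = the mean of population 2;
\(\sigma_1\) = the standard deviation of population 1; \(\sigma_2\) = the standard deviation of population 2;
\(n_1\) = the size of the sample drawn from population 1; \(n_2\) = the size of the sample drawn from population 2;
\(\bar{x}_1\) = the mean of the sample drawn from population 1; \(\bar{x}_2\) = the mean of the sample drawn from population 2; \(s_1\): …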
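Interval Estimation of \(\mu_1 - \mu_2\)
Confidence interval for \(\mu_1 - \mu_2\)
The \((1-\alpha)\,100\%\) confidence interval for \(\mu_1 - \mu_2\) is:
\[ (\bar{x}_1 - \bar{x}_2) \pm z\,\sigma_{\bar{x}_1 - \bar{x}_2} \quad \text{if } \sigma_1 \text{ and } \sigma_2 \text{ are known} \]
\[ (\bar{x}_1 - \bar{x}_2) \pm z\,s_{\bar{x}_1 - \bar{x}_2} \quad \text{if } \sigma_1 \text{ and } \sigma_2 \text{ are not known} \]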
Assume that executive males earned an average of $538 per week and
executive females earned an average of $470 per week. Assume that these
means have been calculated for samples of 500 and 700 workers taken from
the two populations, respectively. Further assume that the standard
deviations of weekly earnings of the two populations are $66 and $60,
respectively.
a) What is the point estimate of \(\mu_1 - \mu_2\)?
   \(\bar{x}_1 - \bar{x}_2\) = $538 - $470 = $68
b) Construct a 95% confidence interval for the difference between the
   mean weekly earnings of the two populations.
   $60.70 to $75.30
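The interval in part b) can be checked numerically. The following is a minimal sketch, assuming the population standard deviations are known (so the z interval applies); the function name `z_interval` is illustrative and not part of the slides.

```python
import math
from statistics import NormalDist


def z_interval(mean1, mean2, sigma1, sigma2, n1, n2, confidence=0.95):
    """Confidence interval for mu1 - mu2 when sigma1 and sigma2 are known."""
    point = mean1 - mean2                                # point estimate
    se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)      # sd of x1bar - x2bar
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    margin = z * se
    return point, se, (point - margin, point + margin)


# Weekly earnings example: males (sample 1) and females (sample 2)
point, se, (low, high) = z_interval(538, 470, 66, 60, 500, 700)
print(f"point estimate = {point}, standard error = {se:.4f}")
print(f"95% CI: {low:.2f} to {high:.2f}")
```

The standard error comes out at about 3.7222, which is the value used in the hypothesis test for the same data.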
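Test statistic:
\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}} \]
or, when \(\sigma_1\) and \(\sigma_2\) are not known,
\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}} \]
The value of \(\mu_1 - \mu_2\) is substituted from H0.
\(\sigma_{\bar{x}_1 - \bar{x}_2}\) is computed as
\[ \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{(66)^2}{500} + \frac{(60)^2}{700}} = 3.7222 \]
\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma_{\bar{x}_1 - \bar{x}_2}} = \frac{(538 - 470) - 0}{3.7222} = 18.27 \]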
Step 5: Make a decision. Because z = 18.27 falls in the
rejection region, we reject H0. Conclusion: the
mean weekly earnings of the two groups of executives
are different.
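The test statistic for the earnings example can be reproduced as follows. This is a sketch only: the significance level of 0.05 and the two-tailed alternative are assumptions made for illustration, since they are not stated in the surviving slide text, and `two_sample_z` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def two_sample_z(mean1, mean2, sigma1, sigma2, n1, n2, hypothesized_diff=0.0):
    """z statistic for H0: mu1 - mu2 = hypothesized_diff (sigmas known)."""
    se = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    return ((mean1 - mean2) - hypothesized_diff) / se


z = two_sample_z(538, 470, 66, 60, 500, 700)

alpha = 0.05                                      # assumed, not from the slides
critical = NormalDist().inv_cdf(1 - alpha / 2)    # two-tailed critical value
reject_h0 = abs(z) > critical

print(f"z = {z:.2f}, critical value = {critical:.2f}, reject H0: {reject_h0}")
```

With z far beyond the critical value, the decision to reject H0 does not depend on the exact significance level chosen.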
When to use the t distribution to make inferences about \(\mu_1 - \mu_2\)
The t distribution is used to make inferences about \(\mu_1 - \mu_2\)
when the following assumptions hold true.
1. The two populations from which the two samples are drawn
   are approximately normally distributed.
2. The samples are small (\(n_1 < 30\) and \(n_2 < 30\)) and independent.
3. The standard deviations \(\sigma_1\) and \(\sigma_2\) of the two populations are
   unknown and they are equal, that is, \(\sigma_1 = \sigma_2\).
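Since \(\sigma\) is unknown, we replace it by its point estimator \(s_p\) (pooled sample standard deviation).
\[ s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \]
\(n_1\) and \(n_2\): sizes of the two samples
\[ s_{\bar{x}_1 - \bar{x}_2} = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \]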
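Interval Estimation of \(\mu_1 - \mu_2\)
Confidence interval for \(\mu_1 - \mu_2\)
The \((1-\alpha)\,100\%\) confidence interval for \(\mu_1 - \mu_2\) is:
\[ (\bar{x}_1 - \bar{x}_2) \pm t\,s_{\bar{x}_1 - \bar{x}_2} \]
where the value of t is obtained from the t distribution table for
the given confidence level and \(n_1 + n_2 - 2\) degrees of freedom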
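\(s_1 = 5\) mg, \(s_2 = 6\) mg; \(n_1 = 15\), \(n_2 = 12\)
\[ s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(15 - 1)(5)^2 + (12 - 1)(6)^2}{15 + 12 - 2}} = 5.4626 \]
\[ s_{\bar{x}_1 - \bar{x}_2} = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = (5.4626)\sqrt{\frac{1}{15} + \frac{1}{12}} = 2.1157 \]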
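The pooled standard deviation and the standard error for this example can be verified with a short sketch. The function names `pooled_sd` and `se_diff_pooled` are illustrative; the calculation assumes equal population standard deviations, as required for the pooled estimator.

```python
import math


def pooled_sd(s1, s2, n1, n2):
    """Pooled sample standard deviation s_p (assumes sigma1 = sigma2)."""
    numerator = (n1 - 1) * s1**2 + (n2 - 1) * s2**2
    return math.sqrt(numerator / (n1 + n2 - 2))


def se_diff_pooled(sp, n1, n2):
    """Estimated standard deviation of x1bar - x2bar using s_p."""
    return sp * math.sqrt(1 / n1 + 1 / n2)


# s1 = 5 mg, s2 = 6 mg, n1 = 15, n2 = 12
sp = pooled_sd(5, 6, 15, 12)
se = se_diff_pooled(sp, 15, 12)
print(f"s_p = {sp:.4f}, standard error = {se:.4f}")
```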
95% CI for \(\mu_1 - \mu_2\)
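Test statistic:
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}} \]
The value of \(\mu_1 - \mu_2\) is substituted from H0.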
Example : The management at a supermarket wanted to
investigate whether or not a promotional campaign
increases the sales of a product. A sample of 28 days
during the promotional campaign showed that an
average of 316 units of this product are sold per day with
a SD of 18 units. A sample of 24 days before the
promotional campaign showed that an average of 282
units of this product are sold per day with a SD of 13
units. Assume that the number of units sold per day has a
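\(n_1 = 28\); \(\bar{x}_1 = 316\); \(s_1 = 18\)
\(n_2 = 24\); \(\bar{x}_2 = 282\); \(s_2 = 13\)
\[ s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{(28 - 1)(18)^2 + (24 - 1)(13)^2}{28 + 24 - 2}} = 15.8965 \]
\[ s_{\bar{x}_1 - \bar{x}_2} = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = (15.8965)\sqrt{\frac{1}{28} + \frac{1}{24}} = 4.4220 \]
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}} = \frac{(316 - 282) - 0}{4.4220} = 7.689 \]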
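The supermarket calculation can be reproduced with the sketch below. It assumes equal population standard deviations and a hypothesized difference of 0 under H0; `pooled_t_statistic` is a hypothetical helper name, not something defined in the slides.

```python
import math


def pooled_t_statistic(mean1, s1, n1, mean2, s2, n2, hypothesized_diff=0.0):
    """Pooled-variance t statistic and degrees of freedom for two independent samples."""
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)  # pooled sd
    se = sp * math.sqrt(1 / n1 + 1 / n2)                        # sd of x1bar - x2bar
    t = ((mean1 - mean2) - hypothesized_diff) / se
    return t, df, sp, se


# Sample 1: days during the campaign; sample 2: days before the campaign
t, df, sp, se = pooled_t_statistic(316, 18, 28, 282, 13, 24)
print(f"s_p = {sp:.4f}, standard error = {se:.4f}, t = {t:.3f}, df = {df}")
```

The observed t is then compared with the critical value from the t table for 50 degrees of freedom at the chosen significance level.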
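Mean and standard deviation of the paired differences for the two samples:
\[ \bar{d} = \frac{\sum d}{n} \]
\[ s_d = \sqrt{\frac{\sum d^2 - \frac{(\sum d)^2}{n}}{n - 1}} \]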
Sampling distribution, mean and standard deviation of \(\bar{d}\)
If the number of paired samples is large (\(n \ge 30\)), because of the central limit
theorem the sampling distribution of \(\bar{d}\) is approximately
normal with its mean and standard deviation as
\[ \mu_{\bar{d}} = \mu_d \quad \text{and} \quad \sigma_{\bar{d}} = \frac{\sigma_d}{\sqrt{n}} \]
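If
1. \(n < 30\)
2. \(\sigma_d\) is not known
3. The population of paired differences is approximately normally distributed
then the t distribution is used to make inferences about \(\mu_d\).
The standard deviation \(\sigma_{\bar{d}}\) of \(\bar{d}\) is estimated by \(s_{\bar{d}}\),
which is calculated as
\[ s_{\bar{d}} = \frac{s_d}{\sqrt{n}} \]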
Interval Estimation of \(\mu_d\)
Confidence Interval for \(\mu_d\)
The \((1-\alpha)\,100\%\) confidence interval for \(\mu_d\) is:
\[ \bar{d} \pm t\,s_{\bar{d}} \]
where the value of t is obtained from the t distribution table for
the given confidence level and n-1 degrees of freedom
Example:
A hospital is considering adopting a new procedure to decrease
the waiting time incurred by patients admitted through the ER. The
hospital randomly selected seven admission staff and gathered
information on the times taken by them to admit patients using
the old procedure. Then the same employees were asked to
admit patients using the new procedure. The following table
gives the admission times (in minutes) for these seven staff.
Let \(\mu_d\) be the mean of the differences between the admission
times for the two populations. Construct a 95% CI for \(\mu_d\). Assume
that the population of paired differences is approximately normally
distributed.
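Interval Estimation of \(\mu_d\)
Confidence Interval for \(\mu_d\)

| Employee | Old Procedure (time in min) | New Procedure (time in min) | Difference (d) | d² |
|---|---|---|---|---|
| 1 | 64 | 60 | 4 | 16 |
| 2 | 71 | 66 | 5 | 25 |
| 3 | 68 | 66 | 2 | 4 |
| 4 | 66 | 69 | -3 | 9 |
| 5 | 73 | 63 | 10 | 100 |
| 6 | 62 | 57 | 5 | 25 |
| 7 | 70 | 62 | 8 | 64 |
| | | | Σd = 31 | Σd² = 243 |

\[ \bar{d} = \frac{\sum d}{n} = \frac{31}{7} = 4.43 \]
\[ s_d = \sqrt{\frac{\sum d^2 - \frac{(\sum d)^2}{n}}{n - 1}} = \sqrt{\frac{243 - \frac{(31)^2}{7}}{7 - 1}} = 4.1975 \]
\[ s_{\bar{d}} = \frac{s_d}{\sqrt{n}} = \frac{4.1975}{\sqrt{7}} = 1.5865 \]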
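The 95% interval for the hospital example can be completed with a short sketch. The critical value t = 2.447 (6 degrees of freedom, 0.025 in each tail) is taken from a standard t table and hard-coded here as an assumption; the resulting interval is computed by this sketch and does not appear in the surviving slide text.

```python
import math

# Admission times (min) for the seven staff, old and new procedure
old = [64, 71, 68, 66, 73, 62, 70]
new = [60, 66, 66, 69, 63, 57, 62]

d = [o - n_ for o, n_ in zip(old, new)]   # paired differences (old - new)
n = len(d)
sum_d = sum(d)
sum_d2 = sum(x**2 for x in d)

d_bar = sum_d / n                                   # mean of the differences
s_d = math.sqrt((sum_d2 - sum_d**2 / n) / (n - 1))  # sd of the differences
se = s_d / math.sqrt(n)                             # estimated sd of d_bar

t_crit = 2.447   # assumed: t table value for 95% confidence, df = n - 1 = 6
low, high = d_bar - t_crit * se, d_bar + t_crit * se

print(f"sum d = {sum_d}, sum d^2 = {sum_d2}")
print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.4f}, standard error = {se:.4f}")
print(f"95% CI for mu_d: {low:.2f} to {high:.2f}")
```

Because the differences are taken as old minus new, a positive interval would indicate shorter admission times under the new procedure.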
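| # of procedures (before course) | # of procedures (after course) | Difference (d) | d² |
|---|---|---|---|
| 12 | 18 | -6 | 36 |
| 18 | 24 | -6 | 36 |
| 25 | 24 | 1 | 1 |
| 9 | 14 | -5 | 25 |
| 14 | 19 | -5 | 25 |
| 16 | 20 | -4 | 16 |
| | | Σd = -25 | Σd² = 139 |

\[ \bar{d} = \frac{\sum d}{n} = \frac{-25}{6} = -4.17 \]
\[ s_d = \sqrt{\frac{\sum d^2 - \frac{(\sum d)^2}{n}}{n - 1}} = \sqrt{\frac{139 - \frac{(-25)^2}{6}}{6 - 1}} = 2.6394 \]
\[ s_{\bar{d}} = \frac{s_d}{\sqrt{n}} = \frac{2.6394}{\sqrt{6}} = 1.0775 \]
\[ t = \frac{\bar{d} - \mu_d}{s_{\bar{d}}} = \frac{-4.17 - 0}{1.0775} = -3.870 \]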
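The paired t statistic can be recomputed from the differences alone. This sketch assumes the null hypothesis is \(\mu_d = 0\), which is what the calculation on the slide implies; `paired_t_statistic` is an illustrative name.

```python
import math


def paired_t_statistic(differences, hypothesized_mean=0.0):
    """t statistic for paired samples, using the shortcut formula for s_d."""
    n = len(differences)
    sum_d = sum(differences)
    sum_d2 = sum(x**2 for x in differences)
    d_bar = sum_d / n
    s_d = math.sqrt((sum_d2 - sum_d**2 / n) / (n - 1))
    se = s_d / math.sqrt(n)
    t = (d_bar - hypothesized_mean) / se
    return t, n - 1, d_bar, s_d, se


# Differences (before - after) from the table
d = [-6, -6, 1, -5, -5, -4]
t, df, d_bar, s_d, se = paired_t_statistic(d)
print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.4f}, standard error = {se:.4f}")
print(f"t = {t:.3f} with {df} degrees of freedom")
```

Without intermediate rounding the statistic is about -3.867; the slide's -3.870 comes from rounding the mean difference to -4.17 before dividing.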
CONCLUSION
To solve a two-sample hypothesis problem, it is first
necessary to determine whether the samples are
independent or paired and whether the test is one- or
two-tailed. Next choose a significance level and
calculate the t statistic. Then determine whether
your results are significant. Finally, calculate and
interpret the confidence intervals. Remember that
two-sample confidence intervals represent the
difference between the means.