Chapter 6

Paired Samples (dependent samples)
Two Independent samples
P ROBABILITY AND S TATISTICS

C HAPTER 6: I NFERENCES B ASED ON T WO S AMPLES
Dr. Phan Thi Huong
HoChiMinh City University of Technology

Faculty of Applied Science, Department of Applied Mathematics
Email: huongphan@hcmut.edu.vn
HCM city — 2021.
Dr. Phan Thi Huong Probability and Statistics

O UTLINE
1 PAIRED S AMPLES ( DEPENDENT SAMPLES )

O UTLINE
1 PAIRED S AMPLES ( DEPENDENT SAMPLES )
2 T WO I NDEPENDENT SAMPLES

Normal distribution
E XAMPLE
E XAMPLE 1
Trace metals in drinking water affect the flavor, and unusually high
concentrations can pose a health hazard. An article reports on a
study in which six river locations were selected (six experimental
objects) and the zinc concentration (mg/L) determined for both
surface water and bottom water at each location. The six pairs of
observations are displayed in the accompanying table. Does the
data suggest that true average concentration in bottom water
exceeds that of surface water? (α = 0.05)
Zinc concentration 1 2 3 4 5 6
in bottom water (x) 0.430 0.266 0.567 0.531 0.707 0.716
in surface water (y) 0.415 0.238 0.390 0.410 0.605 0.609

Normal distribution
PAIRED DATA
Paired data usually given as this form:

Observation 1 2 3 4 ... n
Sample/Property 1 (X) x 1 x 2 x 3 x4 ... xn
Sample/Property 2 (Y) y 1 y 2 y 3 y4 ... yn

Normal distribution
PAIRED DATA
Paired data usually given as this form:

Sample/Property 1 (X) x 1 x 2 x 3 x4 ... xn
Sample/Property 2 (Y) y 1 y 2 y 3 y4 ... yn
⇒ Problem: Compare the true mean difference between property 1

and property 2, e.i E (X ) − E (Y ) = E (X − Y ).

Normal distribution
PAIRED DATA AND HYPOTHESIS TESTING
Let D i = X i − Yi , we have:
Sample/Property 1 (X) x 1 x2 x3 x4 ... xn
Sample/Property 2 (Y) y 1 y2 y3 y4 ... yn
D = X −Y d1 d2 d3 d4 ... dn

Normal distribution
PAIRED DATA AND HYPOTHESIS TESTING
Let D i = X i − Yi , we have:
Sample/Property 1 (X) x 1 x2 x3 x4 ... xn
Sample/Property 2 (Y) y 1 y2 y3 y4 ... yn
D = X −Y d1 d2 d3 d4 ... dn
⇒ Let µD = E(X − Y ), consider the test of hypothesis for one sample

di :
H0: µD = ∆0 H0: µD = ∆0 H0: µD = ∆0

H1: µD 6= ∆0 H1: µD > ∆0 H1: µD < ∆0

Normal distribution
D ISTRIBUTION OF THE S AMPLE D IFFERENCES
A SSUMPTIONS
The data consists of n independently selected pairs (X 1 , Y1 ),
(X 2 , Y2 ), . . . , (X n , Yn ), with E (X i ) = µ1 and E (Yi ) = µ2 . Let
D 1 = X 1 − Y1 , D 2 = X 2 − Y2 , . . . , D n = X n − Yn
So the D i ’s are the differences within pairs. Then the D i ’s are

assumed to be normally distributed with mean µD and variance σ2D
(unknown).
Note:
D − µD
T= p ∼ t n−1
sD / n

Normal distribution
H YPOTHESIS TESTING ON THE DIFFERENCE MEANS
Consider the null hypothesis: H 0 : µD = ∆0

The test statistic:
D̄ − ∆0
T= p
SD / n

Normal distribution
H YPOTHESIS TESTING ON THE DIFFERENCE MEANS
Consider the null hypothesis: H 0 : µD = ∆0

The test statistic:
H1 Rejection region
D̄ − ∆0 µD 6= ∆0 |T | > t n−1,α/2
T= p
SD / n µD > ∆0 T > t n−1,α
µD < ∆0 T < −t n−1,α
⇒ use t-test.
C ONFIDENCE I NTERVALS
The paired t CI for µD is
SD
D ± t n−1,α/2 p
n
A one-sided confidence bound results from retaining the relevant

sign and replacing t α/2 by t α .

Paired Samples (dependent samples) Inferences for Two Population Means
Two Independent samples Inferences for Population Proportions (Large-Sample)
N ORMAL P OPULATION + K NOWN VARIANCES (H YPOTHESIS

T ESTS ON THE D IFFERENCE IN M EANS )
E XAMPLE 2
A consumer-research organization routinely selects several car
models each year and evaluates their fuel efficiency. In this year’s
study of two similar subcompact models from two different
automakers, the average gas mileage for twelve cars of brand A was
27.2 miles per gallon. The nine brand B cars that were tested
averaged 32.1 mpg. At α = 0.01 should it conclude that brand B cars
have higher average gas mileage than brand A cars do? Suppose
that two populations have normal distribution with standard
deviations 3.8 mpg and 4.3 mpg respectively.

N ORMAL P OPULATION + K NOWN VARIANCES (H YPOTHESIS

T ESTS ON THE D IFFERENCE IN M EANS )
A SSUMPTIONS
1 X 1 , X 2 , . . . , X m is a random sample from the normal population
with mean µ1 and variance σ21 .
2 Y1 , Y2 , . . . , Yn is a random sample from the the normal
population with mean µ2 and variance σ22 .
3 The X and Y samples are independent of one another.
4 The variances σ21 and σ22 are given.

N ORMAL P OPULATION + K NOWN VARIANCES
Note: If both X and Y are normal then
(X − Y ) − (µ1 − µ2 )
Z= q 2 ∼ N (0, 1)
σ1 σ22
m + n
Consider the null hypothesis H0 : µ1 − µ2 = ∆0 .

The test statistic:
(x − y) − ∆0
z= q 2
σ1 σ22
m + n
Then apply the following decision rule (z-test)

N ORMAL P OPULATION + K NOWN VARIANCES
Note: If both X and Y are normal then
(X − Y ) − (µ1 − µ2 )
Z= q 2 ∼ N (0, 1)
σ1 σ22
m + n

The test statistic:
(x − y) − ∆0
z= q 2
σ1 σ22
m + n
H1 Rejection Region
µ1 − µ2 6= ∆0 |z| > z α/2
µ1 − µ2 < ∆0 z < −z α
µ1 − µ2 > ∆0 t > z α

N ORMAL P OPULATION + UNKNOWN VARIANCES
A SSUMPTIONS
1 X 1 , X 2 , . . . , X m is a random sample from the normal population
2 Y1 , Y2 , . . . , Yn is a random sample from the normal population
4 The variances σ21 and σ22 are unknown.
T HE RULE OF THUMB
s1
If ∈ [0.5, 2], we assume that σ21 = σ22 .
s2
s1
If ∉ [0.5, 2], we assume that σ21 6= σ22 .
s2

N ORMAL P OPULATION + U NKNOWN σ + E QUAL VARIANCES
In case σ21 = σ22 = σ2 , we use the same sample variance to

estimate for σ21 and σ22 that is S 2p called the pooled sample
variance.
(n − 1)S 12 + (m − 1)S 22
S 2p = (1)
n +m −2
Then, the statistic
X̄ − Ȳ − (µ1 − µ2 )
T0 = r (2)
1 1
Sp +
n m
follows Student distribution with n + m − 2 degrees of freedom.


The test statistic:
(x − y) − ∆0
t= r
1 1
Sp +
n m
Then apply the following decision rule (t-test)


The test statistic:
(x − y) − ∆0
t= r
1 1
Sp +
n m
Then apply the following decision rule (t-test)
H1 Rejection Region
µ1 − µ2 6= ∆0 |t | > t α/2,m+n−2
µ1 − µ2 < ∆0 t < −t α,m+n−2
µ1 − µ2 > ∆0 t > t α,m+n−2

E XAMPLE 3
The course coordinator wants to determine if two ways of taking
the course resulted in a significant difference in achievement as
measured by the final exam for the course. The following table gives
the scores on an examination with 45 possible points for two
groups.
Online 32 37 35 28 41 44 35 31 34
Classroom 35 31 29 25 34 40 27 32 31
Do these data present sufficient evidence to indicate that the
average grade for students who take the course online is
significantly higher than for those who attend a conventional class?
Assume that the population are both normal and the significance
level α = 0.01.

σ (C ONFIDENCE I NTERVAL ON A D IFFERENCE IN M EANS )
If both X and Y are normal then
(X − Y ) − (µ1 − µ2 )
∼ t m+n−2
se
where
s
(m − 1)s 12 + (n − 1)s 22
µ ¶
1 1
se = s p2 + and s p2 = .
m n m +n −2
A 100(1 − α)% confidence interval for µ1 − µ2 is
(x − y) − t m+n−2,α/2 × se ≤ µ1 − µ2 ≤ (x − y) + t m+n−2,α/2 × se


(H YPOTHESIS T ESTS ON THE D IFFERENCE IN M EANS )
E XAMPLE 4
Ten samples of standard cement had an average weight percent
calcium of x = 90.0 with a sample standard deviation of s 1 = 5.0,
and 15 samples of the lead-doped cement had an average weight
percent calcium of y = 87.0 with a sample standard deviation of
s 2 = 4.0. Assume that weight percent calcium is normally
distributed with same standard deviation. Find a 95% confidence
interval on the difference in means, µ1 − µ2 , for the two types of
cement.

N ORMAL P OPULATION + U NKNOWN σ + U NEQUAL

VARIANCES
when σ21 6= σ22 , we use the test statistic
X̄ − Ȳ − ∆0
T0 = s (3)
S 12 S 22
+
n m
T0 follows the Student distribution with d f degrees of freedom

which is the the closest integer of
£ 2 ¤2
(s 1 /n) + (s 22 /m)
ν= (4)
(s 12 /n)2 (s 22 /m)2
+
n −1 m −1


VARIANCES
T HE T WO -S AMPLE T TEST FOR TESTING H0 : µ1 − µ2 = ∆0

We can test hypotheses about this difference based on the statistic
H1 Rejection Region
(x − y) − ∆0 t −t est µ1 − µ2 6= ∆0 |t | > t d f ,α/2
T= q 2 −−−−→
s1 2
s2 µ 1 − µ2 > ∆ 0 t > t d f ,α
m+ n µ1 − µ2 < ∆0 t < −t d f ,α
where d f is the closest integer of
s 12 s 22 2
³ ´
m + n
ν= ³ 2 ´2 ³ 2 ´2 .
1 s1 1 s2
m−1 m + n−1 n


VARIANCES
E XAMPLE 5
Arsenic concentration in public drinking water supplies is a
potential health risk. The below table reported drinking water
arsenic concentrations (in ppb) for 10 metropolitan Phoenix
communities and 10 communities in rural Arizona. Determine if
there is any difference in mean arsenic concentrations between
metropolitan Phoenix communities and communities in rural
Arizona. Assumed the arsenic concentration is following normal
distributions and the significant level α = 0.05.

Metro Phoenix Rural Arizona

Phoenix 3 Rimrock 48
Chandler 7 Goodyear 44
Gilbert 25 New River 40
Glendale 10 Apache Junction 38
Mesa 15 Buckeye 33
Paradise Valley 6 Nogales 21
Peoria 12 Black Canyon City 20
Scottsdale 25 Sedona 12
Tempe 15 Payson 1
Sun City 7 Casa Grande 18


VARIANCES σ (C ONFIDENCE I NTERVAL ON A D IFFERENCE
IN M EANS )
T HE T WO -S AMPLE T C ONFIDENCE I NTERVAL FOR µ1 − µ2

s
s 12 s 22
X − Y ± t d f ,α/2 +
m n
here d f is the closest integer of
s 12 s 22 2
³ ´
m + n
ν= ³ 2 ´2 ³ 2 ´2
1 s1 1 s2
m−1 m + n−1 n
A one-sided CI can be calculated as described earlier.

E XAMPLE 6
The void volume within a textile fabric affects comfort,
flammability, and insulation properties. Permeability of a fabric
refers to the accessibility of void space to the flow of a gas or liquid.
An article gave summary information on air permeability
(cm3/cm2/sec) for a number of different fabric types. Consider the
following data on two different types of plain-weave fabric:
Fabric Type Sample Size Sample Mean Sample Std

Cotton 10 51.71 0.79
Triacetate 10 136.14 3.59
Assuming that the porosity distributions for both types of fabric are
normal, let’s calculate a confidence interval for the difference
between true average porosity for the cotton fabric and that for the
acetate fabric, using γ = 95%.

L ARGE S AMPLE S IZE
A SSUMPTIONS
1 X 1 , X 2 , . . . , X m is a random sample from the any distribution
2 Y1 , Y2 , . . . , Yn is a random sample from the any distribution
4 The sample sizes are enough large.


If m and n are large then
(X − Y ) − (µ1 − µ2 )
z= q 2 ' N (0, 1)
S1 S 22
m + n
Consider the null hypothesis: H 0 : µ1 − µ2 = ∆0 .

The test statistic (if the variances are known, then use σ1 and σ2
instead of s 1 and s 2 ):
(x − y) − ∆0
z= q 2
s1 s 22
m+ n


If m and n are large then
(X − Y ) − (µ1 − µ2 )
z= q 2 ' N (0, 1)
S1 S 22
m + n
Consider the null hypothesis: H 0 : µ1 − µ2 = ∆0 .

The test statistic (if the variances are known, then use σ1 and σ2
instead of s 1 and s 2 ):
(x − y) − ∆0
z= q 2
s1 s 22
m+ n
H1 Rejection Region
µ1 − µ2 6= ∆0 |z| > z α/2
µ1 − µ2 < ∆0 z < −z α
µ1 − µ2 > ∆0 z > z α
E XAMPLE 7
To compare the average life of two brands of 9-volt batteries, a
sample of 100 batteries from each brand is tested. The sample
selected from the first brand shows an average life of 47 hours and a
standard deviation of 4 hours. A mean life of 48 hours and a
standard deviation of 3 hours are recorded for the sample from the
second brand. Is the observed difference between the means of the
two samples significant at the 0.01 level?

D ISTRIBUTION OF THE D IFFERENCE IN P ROPORTIONS
P ROPOSITION 2.1
Let p̂ 1 = X /m and p̂ 2 = Y /n, where X ∼ B (m, p 1 ) and Y ∼ B (n, p 2 )
with X ⊥Y . Then
E(p̂ 1 − p̂ 2 ) = p 1 − p 2
So (p̂ 1 − p̂ 2 ) is an unbiased estimator of (p 1 − p 2 ), and
p 1 q1 p 2 q2
Var (p̂ 1 − p̂ 2 ) = +
m n
The following test statistic is distributed approximately as standard
normal and is the basis of the test:
(p̂ 1 − p̂) − (p 1 − p 2 )
Z= q
p 1 q1 p 2 q2
m + n

L ARGE -S AMPLE (H YPOTHESIS T ESTS ON THE D IFFERENCE

IN P ROPORTION )
A L ARGE -S AMPLE z T EST H0 : p 1 − p 2 = 0

Test statistic H1 Rejection
(p̂ −p̂ )−∆
Z = q 1 ¡ 21 10¢ , p̂ 1 − p̂ 2 6= 0 |Z | > z α/2
p q m+n
m p̂ 1 + n p̂ 2 X + Y p̂ 1 − p̂ 2 > 0 Z > z α
p= = p̂ 1 − p̂ 2 < 0 Z < −z α
m +n m +n
The test can safely be used as long as m p̂ 1 , m q̂ 1 , n p̂ 2 , and n q̂ 2 are all
at least 10.

A L ARGE -S AMPLE z T EST H0 : p̂ 1 − p̂ 2 = 0
E XAMPLE 8
Extracts of St. John’s Wort are widely used to treat depression. An
article in the April 18, 2001, issue of the Journal of the American
Medical Association compared the efficacy of a standard extract of
St. John’s Wort with a placebo in 200 outpatients diagnosed with
major depression. Patients were randomly assigned to two groups;
one group received the St. John’s Wort, and the other received the
placebo. After eight weeks, 19 of the placebo- treated patients
showed improvement, and 27 of those treated with St. John’s Wort
improved. Is there any reason to believe that St. John’s Wort is
effective in treating major depression? Use α = 0.05.

L ARGE -S AMPLE (C ONFIDENCE I NTERVAL ON THE

D IFFERENCE IN P ROPORTION )
A CI for p 1 − p 2 is
s s
p̂ 1 q̂ 1 p̂ 2 q̂ 2 p̂ 1 q̂ 1 p̂ 2 q̂ 2
(p̂ 1 −p̂ 2 )−z α/2 + ≤ p 1 −p 2 ≤ (p̂ 1 −p̂ 2 )+z α/2 +
m n m n


A CI for p 1 − p 2 is
s s
p̂ 1 q̂ 1 p̂ 2 q̂ 2 p̂ 1 q̂ 1 p̂ 2 q̂ 2
(p̂ 1 −p̂ 2 )−z α/2 + ≤ p 1 −p 2 ≤ (p̂ 1 −p̂ 2 )+z α/2 +
m n m n
This interval can safely be used as long as m p̂ 1 , m q̂ 1 , n q̂ 2 , and

n q̂ 2 are all at least 10.
A one-sided confidence bound results from retaining the
relevant sign and replacing z α/2 by z α .
The estimated standard deviation of (p̂ 1 − p̂ 2 ) is different here
from what it was for hypothesis testing when ∆0 = 0.


E XAMPLE 9
Consider the process of manufacturing crankshaft bearings.
Suppose that a modification is made in the surface finishing
process and that, subsequently, a second random sample of 85
bearings is obtained. The number of defective bearings in this
second sample is 8. Suppose that
m = 85, p̂ 1 = 10/85 = 0.1176, n = 85, p̂ 2 = 8/85 = 0.0941
Obtain an approximate 95% confidence interval on the difference

in the proportion of defective bearings produced under the two
processes.

Chapter 6

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Chapter 6

Uploaded by

Copyright:

Available Formats

Paired Samples (dependent samples)

Two Independent samples

P ROBABILITY AND S TATISTICS

Dr. Phan Thi Huong

HoChiMinh City University of Technology

HCM city — 2021.

Dr. Phan Thi Huong Probability and Statistics

1 PAIRED S AMPLES ( DEPENDENT SAMPLES )

Dr. Phan Thi Huong Probability and Statistics

1 PAIRED S AMPLES ( DEPENDENT SAMPLES )

Dr. Phan Thi Huong Probability and Statistics

Dr. Phan Thi Huong Probability and Statistics

Paired data usually given as this form:

Dr. Phan Thi Huong Probability and Statistics

Paired data usually given as this form:

⇒ Problem: Compare the true mean difference between property 1

Dr. Phan Thi Huong Probability and Statistics

PAIRED DATA AND HYPOTHESIS TESTING

Dr. Phan Thi Huong Probability and Statistics

PAIRED DATA AND HYPOTHESIS TESTING

⇒ Let µD = E(X − Y ), consider the test of hypothesis for one sample

H0: µD = ∆0 H0: µD = ∆0 H0: µD = ∆0

Dr. Phan Thi Huong Probability and Statistics

D ISTRIBUTION OF THE S AMPLE D IFFERENCES

So the D i ’s are the differences within pairs. Then the D i ’s are

Dr. Phan Thi Huong Probability and Statistics

H YPOTHESIS TESTING ON THE DIFFERENCE MEANS

Consider the null hypothesis: H 0 : µD = ∆0

Dr. Phan Thi Huong Probability and Statistics

H YPOTHESIS TESTING ON THE DIFFERENCE MEANS

Consider the null hypothesis: H 0 : µD = ∆0

A one-sided confidence bound results from retaining the relevant

Dr. Phan Thi Huong Probability and Statistics

N ORMAL P OPULATION + K NOWN VARIANCES (H YPOTHESIS

Dr. Phan Thi Huong Probability and Statistics

N ORMAL P OPULATION + K NOWN VARIANCES (H YPOTHESIS

Dr. Phan Thi Huong Probability and Statistics

N ORMAL P OPULATION + K NOWN VARIANCES

Note: If both X and Y are normal then

Consider the null hypothesis H0 : µ1 − µ2 = ∆0 .

Dr. Phan Thi Huong Probability and Statistics

N ORMAL P OPULATION + K NOWN VARIANCES

Note: If both X and Y are normal then

Consider the null hypothesis H0 : µ1 − µ2 = ∆0 .

Dr. Phan Thi Huong Probability and Statistics

N ORMAL P OPULATION + UNKNOWN VARIANCES

Dr. Phan Thi Huong Probability and Statistics

N ORMAL P OPULATION + U NKNOWN σ + E QUAL VARIANCES

In case σ21 = σ22 = σ2 , we use the same sample variance to

follows Student distribution with n + m − 2 degrees of freedom.

Dr. Phan Thi Huong Probability and Statistics

N ORMAL P OPULATION + U NKNOWN σ + E QUAL VARIANCES

Consider the null hypothesis H0 : µ1 − µ2 = ∆0 .

Dr. Phan Thi Huong Probability and Statistics

N ORMAL P OPULATION + U NKNOWN σ + E QUAL VARIANCES

Consider the null hypothesis H0 : µ1 − µ2 = ∆0 .

Dr. Phan Thi Huong Probability and Statistics

N ORMAL P OPULATION + U NKNOWN σ + E QUAL VARIANCES

N ORMAL P OPULATION + U NKNOWN σ + E QUAL VARIANCES

If both X and Y are normal then

A 100(1 − α)% confidence interval for µ1 − µ2 is

Dr. Phan Thi Huong Probability and Statistics

N ORMAL P OPULATION + U NKNOWN σ + E QUAL VARIANCES

Dr. Phan Thi Huong Probability and Statistics