Inferences Based On Two Samples

Chapter 9: Inferences Based on Two Samples
9.1 z Tests and Confidence Intervals for a Difference Between Two Population Means
The Difference Between Two Population Means
New Notation
Assumptions:
1. $X_1, \ldots, X_m$ is a random sample from a population with mean $\mu_1$ and variance $\sigma_1^2$ ($m$: size of sample 1).
2. $Y_1, \ldots, Y_n$ is a random sample from a population with mean $\mu_2$ and variance $\sigma_2^2$ ($n$: size of sample 2).
3. The X and Y samples are independent of one another.
Expected Value and Standard Deviation of $\bar{X} - \bar{Y}$
The expected value is $\mu_1 - \mu_2$ (think of this as the parameter).
So $\bar{X} - \bar{Y}$ is an estimator of $\mu_1 - \mu_2$.
The standard deviation is
$$\sigma_{\bar{X}-\bar{Y}} = \sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}$$
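Test Procedures for Normal Populations With Known Variances
Null hypothesis: $H_0: \mu_1 - \mu_2 = \Delta_0$
Test statistic value (same $\Delta_0$ as in $H_0$):
$$z = \frac{\bar{x} - \bar{y} - \Delta_0}{\sqrt{\frac{\sigma_1^2}{m} + \frac{\sigma_2^2}{n}}}$$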
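A minimal sketch of this test in Python, using only the standard library. The function names `two_sample_z` and `reject_h0`, the keyword `alternative`, and the numeric inputs are illustrative assumptions, not part of the slides; the rejection regions are the usual z critical-value regions.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(xbar, ybar, var1, var2, m, n, delta0=0.0):
    """z statistic for H0: mu1 - mu2 = delta0 when both variances are known."""
    se = sqrt(var1 / m + var2 / n)  # standard deviation of Xbar - Ybar
    return (xbar - ybar - delta0) / se


def reject_h0(z, alpha=0.05, alternative="two-sided"):
    """Apply the z rejection region for the chosen alternative hypothesis."""
    std_normal = NormalDist()
    if alternative == "greater":  # Ha: mu1 - mu2 > delta0
        return z >= std_normal.inv_cdf(1 - alpha)
    if alternative == "less":  # Ha: mu1 - mu2 < delta0
        return z <= -std_normal.inv_cdf(1 - alpha)
    # Ha: mu1 - mu2 != delta0
    return abs(z) >= std_normal.inv_cdf(1 - alpha / 2)


# Made-up summary values, for illustration only.
z = two_sample_z(xbar=10.0, ybar=8.0, var1=16.0, var2=9.0, m=16, n=9)
decision = reject_h0(z, alpha=0.05, alternative="two-sided")
```

With these inputs the standard deviation of the difference is $\sqrt{1 + 1}$, so z is about 1.41 and the two-sided test at level 0.05 does not reject.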
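$\beta(\Delta') = P(\text{Type II error when } \mu_1 - \mu_2 = \Delta')$, where $\sigma = \sigma_{\bar{X}-\bar{Y}}$:

Alt. Hypothesis                       $\beta(\Delta')$
$H_a: \mu_1 - \mu_2 > \Delta_0$       $\Phi\left(z_\alpha - \frac{\Delta' - \Delta_0}{\sigma}\right)$
$H_a: \mu_1 - \mu_2 < \Delta_0$       $1 - \Phi\left(-z_\alpha - \frac{\Delta' - \Delta_0}{\sigma}\right)$
$H_a: \mu_1 - \mu_2 \ne \Delta_0$     $\Phi\left(z_{\alpha/2} - \frac{\Delta' - \Delta_0}{\sigma}\right) - \Phi\left(-z_{\alpha/2} - \frac{\Delta' - \Delta_0}{\sigma}\right)$

Similar to the p. 330 formulas.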
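The three type II error formulas can be evaluated directly with the standard normal cdf. The sketch below assumes known variances; the function name `type2_error_prob` and its argument names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def type2_error_prob(delta_prime, delta0, var1, var2, m, n,
                     alpha=0.05, alternative="two-sided"):
    """beta(delta') for the known-variance two-sample z test."""
    nd = NormalDist()
    sigma = sqrt(var1 / m + var2 / n)  # sd of Xbar - Ybar
    shift = (delta_prime - delta0) / sigma
    if alternative == "greater":
        return nd.cdf(nd.inv_cdf(1 - alpha) - shift)
    if alternative == "less":
        return 1 - nd.cdf(-nd.inv_cdf(1 - alpha) - shift)
    z_half = nd.inv_cdf(1 - alpha / 2)
    return nd.cdf(z_half - shift) - nd.cdf(-z_half - shift)


# Sanity check: at delta' = delta0 the formulas reduce to 1 - alpha.
beta_at_null = type2_error_prob(0.0, 0.0, 16.0, 9.0, 16, 9, alpha=0.05)
```

As a check on the formulas, setting $\Delta' = \Delta_0$ gives $1 - \alpha$ for all three alternatives, and for the upper-tailed test $\beta$ shrinks as $\Delta'$ moves further above $\Delta_0$.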
Large-Sample Tests
The assumptions of normal population distributions and known values of $\sigma_1$, $\sigma_2$ are unnecessary. The Central Limit Theorem guarantees that $\bar{X} - \bar{Y}$ has approximately a normal distribution.
Rule of thumb: both $m, n > 40$

Large-Sample Tests
Use of the test statistic value
$$z = \frac{\bar{x} - \bar{y} - \Delta_0}{\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}} \qquad (m, n > 40;\ \Delta_0 \text{ is usually zero})$$
along with the previously stated rejection regions based on z critical values gives large-sample tests whose significance levels are approximately $\alpha$.
Confidence Interval for $\mu_1 - \mu_2$
Provided m and n are large, a CI for $\mu_1 - \mu_2$ with a confidence level of $100(1 - \alpha)\%$ is
$$\bar{x} - \bar{y} \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}$$
Confidence bounds can be found by replacing $z_{\alpha/2}$ by $z_\alpha$.
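As a rough illustration of the large-sample interval, the sketch below computes it from summary statistics. The function name `large_sample_ci` and the sample values are made up for this example.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_ci(xbar, ybar, s1, s2, m, n, conf=0.95):
    """Large-sample CI for mu1 - mu2 (rule of thumb: m, n > 40)."""
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    half_width = z_crit * sqrt(s1**2 / m + s2**2 / n)
    diff = xbar - ybar
    return diff - half_width, diff + half_width


# Illustrative summary statistics (not from the slides).
lo, hi = large_sample_ci(xbar=10.0, ybar=8.0, s1=4.0, s2=5.0, m=64, n=100)
```

Here the estimated standard error is $\sqrt{0.25 + 0.25} \approx 0.707$, so the 95% interval is roughly 2 ± 1.39.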
9.2 The Two-Sample t Test and Confidence Interval
Assumptions
Both populations are normal, so that $X_1, \ldots, X_m$ is a random sample from a normal distribution and so is $Y_1, \ldots, Y_n$. The plausibility of these assumptions can be judged by constructing a normal probability plot of the $x_i$'s and another of the $y_i$'s.
The normality assumption is important for (small-sample) t tests!
t Distribution
When the population distributions are both normal, the standardized variable
$$T = \frac{\bar{X} - \bar{Y} - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{m} + \frac{S_2^2}{n}}}$$
has approximately a t distribution.

t Distribution
The df $\nu$ can be estimated from the data by
$$\nu = \frac{\left(\frac{s_1^2}{m} + \frac{s_2^2}{n}\right)^2}{\frac{(s_1^2/m)^2}{m-1} + \frac{(s_2^2/n)^2}{n-1}}$$
Yuck! Don't do this by hand if you can help it.
(round down to the nearest integer)
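Since the df formula is tedious by hand, here is a small sketch that evaluates it and rounds down. The function name `welch_df` and the example numbers are assumptions made for illustration.

```python
from math import floor


def welch_df(s1, s2, m, n):
    """Estimated df for the two-sample t procedures, rounded down."""
    a = s1**2 / m  # squared standard error contributed by sample 1
    b = s2**2 / n  # squared standard error contributed by sample 2
    nu = (a + b) ** 2 / (a**2 / (m - 1) + b**2 / (n - 1))
    return floor(nu)


# Illustrative values: s1 = 2 with m = 10, s2 = 4 with n = 8.
nu = welch_df(s1=2.0, s2=4.0, m=10, n=8)
```

For these inputs the unrounded value is about 9.78, so the procedure uses 9 df.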

Two-Sample CI for $\mu_1 - \mu_2$
The two-sample CI for $\mu_1 - \mu_2$ with a confidence level of $100(1 - \alpha)\%$ is
$$\bar{x} - \bar{y} \pm t_{\alpha/2,\nu}\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}$$
Two-Sample t Test
Null hypothesis: $H_0: \mu_1 - \mu_2 = \Delta_0$ (usually zero)
Test statistic value:
$$t = \frac{\bar{x} - \bar{y} - \Delta_0}{\sqrt{\frac{s_1^2}{m} + \frac{s_2^2}{n}}}$$
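The Two-Sample t Test
Alternative Hypothesis               Rejection Region for Approx. Level $\alpha$ Test
$H_a: \mu_1 - \mu_2 > \Delta_0$      $t \ge t_{\alpha,\nu}$
$H_a: \mu_1 - \mu_2 < \Delta_0$      $t \le -t_{\alpha,\nu}$
$H_a: \mu_1 - \mu_2 \ne \Delta_0$    $t \ge t_{\alpha/2,\nu}$ or $t \le -t_{\alpha/2,\nu}$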
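The sketch below computes the two-sample t statistic and its estimated df from raw data. The function name `two_sample_t` and the data are hypothetical. Python's standard library has no t distribution, so the critical value $t_{\alpha,\nu}$ is assumed to come from a t table or a statistics package.

```python
from math import floor, sqrt
from statistics import mean, variance


def two_sample_t(x, y, delta0=0.0):
    """Return (t, nu) for H0: mu1 - mu2 = delta0 without assuming equal variances."""
    m, n = len(x), len(y)
    a = variance(x) / m  # s1^2 / m
    b = variance(y) / n  # s2^2 / n
    t = (mean(x) - mean(y) - delta0) / sqrt(a + b)
    nu = floor((a + b) ** 2 / (a**2 / (m - 1) + b**2 / (n - 1)))
    return t, nu


# Tiny made-up samples, for illustration only.
t_stat, nu = two_sample_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

Here the sample means are 3 and 6 with variances 2.5 and 10, giving t ≈ -1.90 on 5 df; the result is then compared with the tabled critical value for the chosen alternative.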

Important: pooled t assumes equal variances
Pooled t Procedures
Assume the two populations are normal and have equal variances. If $\sigma^2$ denotes the common variance, it can be estimated by combining information from the two samples. Standardizing $\bar{X} - \bar{Y}$ using the pooled estimator gives a t variable based on $m + n - 2$ df.
Pooled sample variance
$$S_p^2 = \frac{m-1}{m+n-2}\,S_1^2 + \frac{n-1}{m+n-2}\,S_2^2$$
Usage in formulas:
$\frac{S_1^2}{m} + \frac{S_2^2}{n}$ becomes $\frac{S_p^2}{m} + \frac{S_p^2}{n}$ or $S_p^2\left(\frac{1}{m} + \frac{1}{n}\right)$
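A short sketch of the pooled calculation, assuming equal population variances. The function names `pooled_variance` and `pooled_t` and the inputs are illustrative, not from the slides.

```python
from math import sqrt


def pooled_variance(s1_sq, s2_sq, m, n):
    """Weighted average of the two sample variances, weights (m-1) and (n-1)."""
    return ((m - 1) * s1_sq + (n - 1) * s2_sq) / (m + n - 2)


def pooled_t(xbar, ybar, s1_sq, s2_sq, m, n, delta0=0.0):
    """Return (t, df) for the pooled t test of H0: mu1 - mu2 = delta0."""
    sp_sq = pooled_variance(s1_sq, s2_sq, m, n)
    se = sqrt(sp_sq * (1 / m + 1 / n))  # S_p^2 replaces both S_1^2 and S_2^2
    return (xbar - ybar - delta0) / se, m + n - 2


# Illustrative summary statistics.
sp_sq = pooled_variance(2.5, 10.0, 5, 5)
t_stat, df = pooled_t(3.0, 6.0, 2.5, 10.0, 5, 5)
```

With equal sample sizes the pooled variance is just the average of the two sample variances (6.25 here), and the test uses m + n - 2 = 8 df.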
9.3 Analysis of Paired Data
Paired Data (Assumptions)
Important: A natural pairing must exist!
The data consist of n independently selected pairs $(X_1, Y_1), \ldots, (X_n, Y_n)$, with $E(X_i) = \mu_1$ and $E(Y_i) = \mu_2$.
Let $D_1 = X_1 - Y_1, \ldots, D_n = X_n - Y_n$.
The $D_i$'s are assumed to be normally distributed with mean value $\mu_D$ and variance $\sigma_D^2$.
Bottom line: The two-sample problem becomes a one-sample problem!
The Paired t Test
Null hypothesis: $H_0: \mu_D = \Delta_0$ (usually zero)
Test statistic value:
$$t = \frac{\bar{d} - \Delta_0}{s_D / \sqrt{n}}$$
$\bar{d}$ and $s_D$ are the sample mean and standard deviation of the $d_i$'s.
The Paired t Test
Nothing new here!
Alternative Hypothesis        Rejection Region for Level $\alpha$ Test
$H_a: \mu_D > \Delta_0$       $t \ge t_{\alpha,n-1}$
$H_a: \mu_D < \Delta_0$       $t \le -t_{\alpha,n-1}$
$H_a: \mu_D \ne \Delta_0$     $t \ge t_{\alpha/2,n-1}$ or $t \le -t_{\alpha/2,n-1}$
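Because the paired test is a one-sample t test on the differences, a sketch needs only a few lines. The function name `paired_t` and the data below are made up; the critical value $t_{\alpha,n-1}$ is assumed to come from a t table.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y, delta0=0.0):
    """Return (t, df) for H0: mu_D = delta0, where D_i = X_i - Y_i."""
    d = [xi - yi for xi, yi in zip(x, y)]  # reduce to one sample of differences
    n = len(d)
    t = (mean(d) - delta0) / (stdev(d) / sqrt(n))
    return t, n - 1


# Made-up paired measurements: differences are 1, 2, 3.
t_stat, df = paired_t([4, 6, 8], [3, 4, 5])
```

The differences have mean 2 and standard deviation 1, so t = 2√3 ≈ 3.46 on 2 df.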
Confidence Interval for $\mu_D$
Nothing new here!
The paired t CI for $\mu_D$ is
$$\bar{d} \pm t_{\alpha/2,n-1}\, s_D / \sqrt{n}$$
Confidence bounds can be found by replacing $t_{\alpha/2}$ by $t_\alpha$.
For large samples, you could use a z test and CI.
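Paired Data and Two-Sample t
$$V(\bar{X} - \bar{Y}) = V(\bar{D}) = V\left(\frac{1}{n}\sum D_i\right) = \frac{V(D_i)}{n} = \frac{\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2}{n}$$
Remember: Smaller variance means better estimates
Independence between X and Y $\Rightarrow \rho = 0$
Positive dependence $\Rightarrow \rho > 0$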
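To see how positive correlation within pairs shrinks the variance of the estimator, the formula can be evaluated for a few values of $\rho$. The function name `var_mean_difference` and the numbers are illustrative assumptions.

```python
def var_mean_difference(sigma1, sigma2, rho, n):
    """V(Dbar) = (sigma1^2 + sigma2^2 - 2*rho*sigma1*sigma2) / n."""
    return (sigma1**2 + sigma2**2 - 2 * rho * sigma1 * sigma2) / n


# Same standard deviations and n; only the within-pair correlation changes.
v_independent = var_mean_difference(3.0, 4.0, rho=0.0, n=25)
v_paired = var_mean_difference(3.0, 4.0, rho=0.5, n=25)
```

With $\sigma_1 = 3$, $\sigma_2 = 4$, and n = 25, the variance drops from 1.0 under independence to 0.52 when $\rho = 0.5$.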
Pros and Cons of Pairing
1. For great heterogeneity and large correlation
within experimental units, the loss in degrees
of freedom will be compensated for by an
increased precision associated with pairing
(use pairing). Usually, we're in case 1;
use pairing if possible.
2. If the units are relatively homogeneous and
the correlation within pairs is not large, the
gain in precision due to pairing will be
outweighed by the decrease in degrees of
freedom (use independent samples).
9.4 Inferences Concerning a Difference Between Population Proportions
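Difference Between Population Proportions
Let $X \sim \text{Bin}(m, p_1)$ and $Y \sim \text{Bin}(n, p_2)$ with X and Y independent variables. Then $\hat{p}_1 - \hat{p}_2$ is an estimator of $p_1 - p_2$.
Note: $\hat{p}_1 = X/m$ and $\hat{p}_2 = Y/n$
$$E(\hat{p}_1 - \hat{p}_2) = p_1 - p_2$$
$$V(\hat{p}_1 - \hat{p}_2) = \frac{p_1 q_1}{m} + \frac{p_2 q_2}{n} \qquad (q_i = 1 - p_i)$$
Large-sample condition: $m\hat{p}_1 \ge 10$ and $m\hat{q}_1 \ge 10$ and $n\hat{p}_2 \ge 10$ and $n\hat{q}_2 \ge 10$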
Large-Samples
Null hypothesis: $H_0: p_1 - p_2 = 0$
Test statistic value:
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{m} + \frac{1}{n}\right)}}$$
Only for the test of $H_0: p_1 - p_2 = 0$, the standard error involves $\hat{p}$, a weighted average of $\hat{p}_1$ and $\hat{p}_2$ (with $\hat{q} = 1 - \hat{p}$):
$$\hat{p} = \frac{m}{m+n}\,\hat{p}_1 + \frac{n}{m+n}\,\hat{p}_2 = \frac{X + Y}{m + n} = \frac{\text{total number of successes}}{\text{total number of trials}}$$
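A minimal sketch of the large-sample test of $H_0: p_1 - p_2 = 0$ with the combined proportion in the standard error. The function name `two_proportion_z` and the counts are hypothetical.

```python
from math import sqrt


def two_proportion_z(x, m, y, n):
    """z statistic for H0: p1 - p2 = 0, using the combined proportion p_hat."""
    p1_hat, p2_hat = x / m, y / n
    p_hat = (x + y) / (m + n)  # total successes / total trials
    se = sqrt(p_hat * (1 - p_hat) * (1 / m + 1 / n))
    return (p1_hat - p2_hat) / se


# Made-up counts: 60 successes in 100 trials versus 40 in 100.
z = two_proportion_z(x=60, m=100, y=40, n=100)
```

Here the combined proportion is 0.5, the standard error is $\sqrt{0.005}$, and z = 2√2 ≈ 2.83.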
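Confidence Interval for $p_1 - p_2$
$$\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{m} + \frac{\hat{p}_2\hat{q}_2}{n}}$$
Note: The standard error here is slightly different from the one used for the test!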
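The interval uses the unpooled standard error, which the following sketch makes explicit. The function name `two_proportion_ci` and the counts are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x, m, y, n, conf=0.95):
    """Large-sample CI for p1 - p2 using separate estimates for each sample."""
    p1_hat, p2_hat = x / m, y / n
    # Unpooled standard error: each sample keeps its own p_hat * q_hat term.
    se = sqrt(p1_hat * (1 - p1_hat) / m + p2_hat * (1 - p2_hat) / n)
    z_crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    diff = p1_hat - p2_hat
    return diff - z_crit * se, diff + z_crit * se


# Made-up counts: 60 successes in 100 trials versus 40 in 100.
lo, hi = two_proportion_ci(x=60, m=100, y=40, n=100)
```

For these counts the unpooled standard error is $\sqrt{0.0048} \approx 0.0693$, slightly smaller than the pooled value $\sqrt{0.005}$ used in the test, and the 95% interval is about (0.064, 0.336).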
