AB1202 Statistics and Analysis: Statistical Inferences Based On Two Samples

AB1202
Statistics and Analysis

Lecture 7
Statistical Inferences Based on Two Samples
Chin Chee Kai

cheekai@ntu.edu.sg
Nanyang Business School
Nanyang Technological University
NBS 2016S1 AB1202 CCK-STAT-018
Statistical Inferences Based on Two Samples
• Comparing Two Population Means
• Paired Difference Experiments
• Comparing Population Proportions
• Special Case with Equality of Proportions
• When To Assume Equal Variance?
Comparing Two Population Means

• We would like to compare two population means
whose values we do not actually know.
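[Figure: the sampling distributions of $\bar{X}_1$ (mean $\mu_1$, variance $\sigma_1^2/n_1$) and $\bar{X}_2$ (mean $\mu_2$, variance $\sigma_2^2/n_2$). Sample once each from these sampling distributions, then take the difference. Keep doing that indefinitely to form a population of differences $\bar{X}_1 - \bar{X}_2$ with mean $\mu_1 - \mu_2$ and variance $\sigma_1^2/n_1 + \sigma_2^2/n_2$. On the standardized $Z$ axis, $z = 0$ marks the centre and the rejection region lies beyond $z_c$.]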
Points To Note
• Two parent populations: $X_1$ with mean $\mu_1$ and variance
$\sigma_1^2$, and $X_2$ with mean $\mu_2$ and variance $\sigma_2^2$.
• Two samples with means $\bar{x}_1, \bar{x}_2$, variances $s_1^2, s_2^2$ and
sample sizes $n_1, n_2$ (where both sample sizes are
roughly similar)
• Must check:
▫ Unpaired or paired?
▫ Both large, or both small sample sizes?
▫ Parent population variances known or unknown?
▫ Both variances assumed equal, or unequal?
▫ Both parent distributions are normal, or not?
• Unequal variance assumption is more general (less
restrictive) than equal variance assumption.
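As a rough aid, the checklist above can be read as a decision procedure. The sketch below is only one possible reading of how the checks map onto the test statistics used in this lecture; the function name `choose_test` and its returned labels are hypothetical, not part of the course material.

```python
def choose_test(paired, large_n, var_known, var_equal, parents_normal):
    """Suggest a test statistic from the "Must check" questions.

    All arguments are booleans answering the checklist; the returned
    strings are informal labels (an assumption of this sketch).
    """
    # Small samples from non-normal parents are outside this lecture.
    if not large_n and not parents_normal:
        return "other methods (not covered here)"
    if paired:
        return "paired z" if var_known else "paired t, v = n - 1"
    if var_known:
        # Known variances: normal distribution, equal or unequal variances.
        return "two-sample z"
    if var_equal:
        return "pooled t, v = n1 + n2 - 2"
    return "Welch t, Welch-Satterthwaite d.f."


suggestion = choose_test(paired=False, large_n=True, var_known=False,
                         var_equal=False, parents_normal=False)
```

The equal-variance branch is checked only after the known-variance branch because, with known variances, the same z statistic applies whether or not the variances are equal.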
Unpaired Tests
• We begin by comparing two population means under
unpaired test scenarios. There are 5 populations at work!
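[Figure: the five populations. Two parent populations $X_1$, $X_2$ (means $\mu_1$, $\mu_2$; variances $\sigma_1^2$, $\sigma_2^2$); two populations of sample means $\bar{X}_1$, $\bar{X}_2$ (means $\mu_1$, $\mu_2$; variances $\sigma_1^2/n_1$, $\sigma_2^2/n_2$); and the population of the difference $\bar{X}_1 - \bar{X}_2$ (mean $\mu_1 - \mu_2$, variance $\sigma_1^2/n_1 + \sigma_2^2/n_2$), shown on the $Z$ axis with $z = 0$ at the centre and the rejection region beyond $z_c$.]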
Large n, Var Known, Var Unequal

• If parent populations are Normal, then sampling
distribution will be Normal anyway.
• If parent populations are not normal, then by
CLT (𝑛1 , 𝑛2 ≥ 30), both sampling distributions
will still approximate Normal.
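• Test random variable: $\bar{X} = \bar{X}_1 - \bar{X}_2$
• $H_0: \mu_1 - \mu_2 \ge d_0$, $H_1: \mu_1 - \mu_2 < d_0$
• $\mathrm{Var}(\bar{X}) = \mathrm{Var}(\bar{X}_1) + \mathrm{Var}(\bar{X}_2) = \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$
• Test statistic: $z = \dfrac{\bar{x}_1 - \bar{x}_2 - d_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$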
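To make the z statistic on this slide concrete, here is a minimal Python sketch of the lower-tailed test. The function name `two_sample_z`, the significance level and all numbers are made up for illustration; only the formula itself comes from the slide.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(x1_bar, x2_bar, var1, var2, n1, n2, d0=0.0):
    """z statistic for H0: mu1 - mu2 >= d0 with known population variances."""
    std_error = sqrt(var1 / n1 + var2 / n2)  # sqrt of Var(X1_bar - X2_bar)
    return (x1_bar - x2_bar - d0) / std_error


# Illustrative inputs only (not from the lecture).
z = two_sample_z(x1_bar=50.0, x2_bar=52.0, var1=16.0, var2=25.0, n1=40, n2=50)

alpha = 0.05                          # assumed significance level
z_c = NormalDist().inv_cdf(alpha)     # lower-tail critical value
p_value = NormalDist().cdf(z)         # lower-tail p-value
reject_h0 = z < z_c                   # H1: mu1 - mu2 < d0
```

With these inputs the statistic falls below the critical value, so H0 would be rejected at the assumed 5% level.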
Large n, Var Known, Var Equal

• If both variances are known to be equal, OR may
be assumed to be equal, then we still proceed as
before, setting $\sigma_1^2 = \sigma_2^2 = \sigma^2$.
• Note that as before, parent populations can be
normally distributed, or not. We still use the normal
distribution to test the sample statistic due to
CLT.
Large n, Var Unknown, Var Unequal

• If both variances are unknown, we need to use
Student-t Distribution to test the sample statistic.
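• Test random variable: $\bar{X} = \bar{X}_1 - \bar{X}_2$
• $H_0: \mu_1 - \mu_2 \ge d_0$, $H_1: \mu_1 - \mu_2 < d_0$
• $\mathrm{Var}(\bar{X}) = \mathrm{Var}(\bar{X}_1) + \mathrm{Var}(\bar{X}_2)$, estimated by $\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}$
• Test statistic: $t = \dfrac{\bar{x}_1 - \bar{x}_2 - d_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
• d.f. (Welch-Satterthwaite formula, round DOWN): $v = \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$
• Note that we don’t need to pool variance, nor fall back on
assuming equal variance, since our sample sizes are large.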
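The t statistic and the Welch-Satterthwaite degrees of freedom on this slide can be sketched in a few lines of Python. The function name `welch_t` and the inputs are illustrative assumptions; the critical value would still be looked up from a Student-t table with the returned d.f.

```python
from math import floor, sqrt


def welch_t(x1_bar, x2_bar, s1_sq, s2_sq, n1, n2, d0=0.0):
    """Return (t, v) for unknown, unequal variances.

    v is the Welch-Satterthwaite d.f., rounded DOWN as on the slide.
    """
    a = s1_sq / n1  # estimated variance of the first sample mean
    b = s2_sq / n2  # estimated variance of the second sample mean
    t = (x1_bar - x2_bar - d0) / sqrt(a + b)
    v = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, floor(v)


# Illustrative inputs only (not from the lecture).
t, v = welch_t(x1_bar=50.0, x2_bar=52.0, s1_sq=14.0, s2_sq=27.0, n1=35, n2=45)
```

Rounding the d.f. down is the conservative choice: fewer degrees of freedom give a wider t distribution and therefore a harder-to-reach rejection region.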
Large n, Var Unknown, Var Equal

• If both variances are known to be equal (but we
don’t know the value), OR may be assumed to be
equal (due to contextual understanding), then we
still proceed as before, but we would pool the
sample variances:
$s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
• Test statistic: $t = \dfrac{\bar{x}_1 - \bar{x}_2 - d_0}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
with $v = n_1 + n_2 - 2$ d.f.
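A minimal sketch of the pooled-variance calculation follows, assuming summary statistics are already available. The name `pooled_t` and the numbers are illustrative, not from the lecture.

```python
from math import sqrt


def pooled_t(x1_bar, x2_bar, s1_sq, s2_sq, n1, n2, d0=0.0):
    """Return (t, sp_sq, v) for unknown but (assumed) equal variances."""
    v = n1 + n2 - 2                                       # degrees of freedom
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / v     # pooled variance
    t = (x1_bar - x2_bar - d0) / sqrt(sp_sq * (1 / n1 + 1 / n2))
    return t, sp_sq, v


# Illustrative inputs only (not from the lecture).
t, sp_sq, v = pooled_t(x1_bar=50.0, x2_bar=52.0, s1_sq=14.0, s2_sq=27.0,
                       n1=35, n2=45)
```

The pooled variance is a weighted average of the two sample variances, with weights equal to each sample's own degrees of freedom, so it always lies between $s_1^2$ and $s_2^2$.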
Small n Cases
• With smallish sample sizes, we require
assumption of parent populations being normally
distributed to proceed. If so, then sampling
distribution will be normal.
• If parent populations are not normal, or cannot
be assumed to be approximately normal, then we
have to use other methods not discussed here.
• This will be the case (with small sample sizes)
whether variances are known or unknown,
and whether they are equal or unequal.
• In what follows, parent populations are all
assumed to be normally distributed.
Small n, Var Known, Var Unequal or Equal

• Since parent populations are normally
distributed, sampling distribution will be normal
(despite smallish sample sizes).
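• Test random variable: $\bar{X} = \bar{X}_1 - \bar{X}_2$
• $H_0: \mu_1 - \mu_2 \ge d_0$, $H_1: \mu_1 - \mu_2 < d_0$
• $\mathrm{Var}(\bar{X}) = \mathrm{Var}(\bar{X}_1) + \mathrm{Var}(\bar{X}_2) = \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$
• Test statistic: $z = \dfrac{\bar{x}_1 - \bar{x}_2 - d_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
• If both variances are known to be equal, OR may be
assumed to be equal, then we still proceed as before,
setting $\sigma_1^2 = \sigma_2^2 = \sigma^2$.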
Small n, Var Unknown, Var Unequal

• If both variances are unknown, then just like large
sample size case, we use Student-t Distribution to
test the sample statistic.
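• Test random variable: $\bar{X} = \bar{X}_1 - \bar{X}_2$
• $H_0: \mu_1 - \mu_2 \ge d_0$, $H_1: \mu_1 - \mu_2 < d_0$
• $\mathrm{Var}(\bar{X}) = \mathrm{Var}(\bar{X}_1) + \mathrm{Var}(\bar{X}_2)$, estimated by $\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}$
• Test statistic: $t = \dfrac{\bar{x}_1 - \bar{x}_2 - d_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
• d.f. (Welch-Satterthwaite formula, round DOWN): $v = \dfrac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}$
• BUT, unlike the large sample size case, sample variances may be
very poor point-estimates of population variances, since
sample sizes are small. So we may unknowingly suffer from inflated
Type I or Type II error probabilities.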
Small n, Var Unknown, Var Unequal

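• Small sample sizes $n_1, n_2$ give poor variance point-estimates,
so $\mathrm{Var}(\bar{X}) = \mathrm{Var}(\bar{X}_1) + \mathrm{Var}(\bar{X}_2)$ is not reliably
approximated by $\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}$.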
• We attempt to trade-off modeling error with precision by
making a (bold) assumption that parent population
variances are equal (without proof).
• This assumption allows us to pool-estimate the variance
value.
• Pooling sample variances tends to improve precision of the
variance estimate, since the two samples are combined as
if they were a single “larger” sample.
• We proceed by assuming “Small n, Var Unknown, Var
Equal” (next slide).
• But as we knowingly contradict our knowledge that
population variances are unequal, we might still suffer from
modeling inaccuracies. (There’s no free lunch)
Small n, Var Unknown, Var Equal

• For unknown variances, we use Student-t Distribution.
• If both variances are known to be equal (but we don’t
know the value), OR may be assumed to be equal (due
to contextual understanding), then we would pool the
sample variances:
$s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
• Pooled sample variance is a more precise
point-estimate of population variance.
• Test random variable: $\bar{X} = \bar{X}_1 - \bar{X}_2$
• $H_0: \mu_1 - \mu_2 \ge d_0$, $H_1: \mu_1 - \mu_2 < d_0$
• Test statistic: $t = \dfrac{\bar{x}_1 - \bar{x}_2 - d_0}{\sqrt{s_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
with $v = n_1 + n_2 - 2$ d.f.
Paired Difference Experiments

• Why pair?
▫ When experimental unit measurements are related (in
whichever way deemed proper), paired-test is more
appropriate than unpaired-test.
• What is it?
▫ Individual experimental unit measurements are paired
and their paired-difference calculated to create a NEW
population (of delta values).
▫ Test the mean of this NEW population.
• Pairing situations arise commonly from:
▫ Before-after measurements
▫ Data collected somewhat simultaneously on
experimental units which are deemed related.
E.g., intelligence of children paired with that of their parents
• Note that whenever paired-test can be calculated,
unpaired-test can also be done, but might not be
proper or meaningful.
Paired, Var Known

• As in one-variable hypothesis testing, the parent
population should be normally distributed, or the
sample size is large.
• If variance is known, then sampling distribution
used will be the normal distribution.
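• Test random variable: $\bar{D}$,
where data $d_i = x_{1,i} - x_{2,i}$ have mean $\mu_D$ and variance $\sigma_D^2$
• $H_0: \mu_D \ge d_0$, $H_1: \mu_D < d_0$
• $\mathrm{Var}(\bar{D}) = \dfrac{\sigma_D^2}{n}$
• Test statistic: $z = \dfrac{\bar{d} - d_0}{\sigma_D / \sqrt{n}}$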
Paired, Var Unknown

• Again, we need to assume parent population is normal
or else the sample size used is large.
• If variance is unknown, then we need to estimate
population variance using sample variance. Sampling
distribution used will be Student-t distribution.
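• Test random variable: $\bar{D}$,
where data $d_i = x_{1,i} - x_{2,i}$ have mean $\mu_D$
• $H_0: \mu_D \ge d_0$, $H_1: \mu_D < d_0$
• $\mathrm{Var}(\bar{D})$ is estimated by $\dfrac{s_D^2}{n}$
• Test statistic: $t = \dfrac{\bar{d} - d_0}{s_D / \sqrt{n}}$ with $v = n - 1$ d.f.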
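The paired test reduces to a one-sample test on the differences, which the following sketch tries to show. The function name `paired_t` and the before/after scores are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x1, x2, d0=0.0):
    """Return (t, v) for the paired-difference test with unknown variance."""
    diffs = [a - b for a, b in zip(x1, x2)]  # d_i = x_{1,i} - x_{2,i}
    n = len(diffs)
    d_bar = mean(diffs)                      # sample mean of the differences
    s_d = stdev(diffs)                       # sample std. dev. (n - 1 divisor)
    t = (d_bar - d0) / (s_d / sqrt(n))
    return t, n - 1


# Made-up before/after measurements on the same six units.
before = [72, 68, 75, 80, 66, 71]
after = [75, 70, 74, 85, 69, 73]
t, v = paired_t(before, after)
```

Only the differences enter the calculation, so variation between experimental units that is shared by both measurements cancels out.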
Comparing Population Proportions

• Proportion distribution’s variance depends on
the very mean proportion value which we test.
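• We will always use large samples
($n_1 p_1,\ n_1(1 - p_1),\ n_2 p_2,\ n_2(1 - p_2) \ge 5$).
• Test random variable: $\hat{P} = \hat{P}_1 - \hat{P}_2$
• $H_0: p_1 - p_2 \ge p_0$, $H_1: p_1 - p_2 < p_0$
• $\mathrm{Var}(\hat{P}) = \dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}$, estimated by $\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}$
• Test statistic: $z = \dfrac{\hat{p}_1 - \hat{p}_2 - p_0}{\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}}$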
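As an illustration of the proportion test on this slide, the sketch below computes the z statistic from raw success counts. The name `two_proportion_z` and the counts are made up, and checking the large-sample condition with the sample proportions (the population values being unknown) is an assumption of this sketch.

```python
from math import sqrt


def two_proportion_z(successes1, n1, successes2, n2, p0=0.0):
    """z statistic for H0: p1 - p2 >= p0 using unpooled sample proportions."""
    p1_hat = successes1 / n1
    p2_hat = successes2 / n2
    # Large-sample check, applied to sample proportions (an assumption here).
    for n, p in ((n1, p1_hat), (n2, p2_hat)):
        if min(n * p, n * (1 - p)) < 5:
            raise ValueError("large-sample condition not met")
    std_error = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    return (p1_hat - p2_hat - p0) / std_error


# Illustrative counts only: 30 of 100 versus 90 of 200, testing p0 = -0.05.
z = two_proportion_z(30, 100, 90, 200, p0=-0.05)
```

The result is compared with the lower-tail critical value of the standard normal distribution, exactly as for the two-sample z test on means.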
Population Proportions – When 𝑑0 = 0

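• When the hypothesized difference is zero ($d_0 = 0$, i.e. $p_0 = 0$ in the
notation of the general proportions test), our null hypothesis claims that both
population proportions are equal.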
• But this implies both population variances are equal!
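• We pool-estimate the sample proportion:
$\bar{p} = \dfrac{\text{count of all "successes" in both samples}}{\text{total count in both samples}}$
• We pool-estimate the sample proportion variance:
$s_p^2 = \dfrac{\bar{p}(1 - \bar{p})}{n_1} + \dfrac{\bar{p}(1 - \bar{p})}{n_2} = \bar{p}(1 - \bar{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)$
• $H_0: p_1 - p_2 \ge 0$, $H_1: p_1 - p_2 < 0$
• Test statistic: $z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\bar{p}(1 - \bar{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$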
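The pooled version of the proportion test can be sketched the same way. The name `pooled_proportion_z` and the counts are illustrative assumptions; the formula follows this slide.

```python
from math import sqrt
from statistics import NormalDist


def pooled_proportion_z(successes1, n1, successes2, n2):
    """Return (z, p_bar) for H0: p1 - p2 >= 0 using the pooled proportion."""
    p1_hat = successes1 / n1
    p2_hat = successes2 / n2
    # Pooled proportion: all successes over all observations.
    p_bar = (successes1 + successes2) / (n1 + n2)
    std_error = sqrt(p_bar * (1 - p_bar) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / std_error, p_bar


# Illustrative counts only: 30 of 100 versus 90 of 200.
z, p_bar = pooled_proportion_z(30, 100, 90, 200)
p_value = NormalDist().cdf(z)  # lower-tail p-value for H1: p1 - p2 < 0
```

Here the pooled proportion is 120/300 = 0.4, the standard error is 0.06, and the statistic is z = -0.15/0.06 = -2.5.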
When To Assume Equal Variance?

• Small sample sizes:
▫ Instead of letting each (small) sample estimate its own variance
poorly, we make a trade-off: we assume equal variance,
which lets us pool both samples together to
point-estimate the assumed-equal variance more precisely.
▫ As long as actual population variances are not too far apart,
the trade-off would be safe to make.
• Both population variances are known to be the same.
▫ Usually if the same population is measured at different
times, we could gather contextual knowledge that the
variances are presumably the same.
• Null hypothesis postulates or implies population
variances are the same.
▫ Typically when the variances are dependent only on means,
and means are hypothesized to be equal.
