Stat 115 - Chapter 4

CHAPTER 4

Estimation: Two Populations
University of the Philippines School of Statistics | 2nd Semester AY 2022-2023

01 Basic Concepts
02 Point Estimation
03 Interval Estimation

01 Basic Concepts
Parameters of Interest
Objective: compare the means of two populations
Parameter of interest: µX − µY

Population 1: mean µX
Population 2: mean µY

µX − µY = 0: the means of the two populations are equal
µX − µY > 0: the mean of the measurements in Population 1 is larger than the mean of the measurements in Population 2
µX − µY < 0: the mean of the measurements in Population 1 is smaller than the mean of the measurements in Population 2
Parameters of Interest
Objective: compare the proportions of two populations
Parameter of interest: p1 – p2

Population 1: proportion p1
Population 2: proportion p2

p1 − p2 = 0: the proportions of the two populations are equal
p1 − p2 > 0: the proportion of elements possessing the characteristic of interest is larger in Population 1 than in Population 2
p1 − p2 < 0: the proportion of elements possessing the characteristic of interest is smaller in Population 1 than in Population 2
Parameters of Interest
Objective: compare the variances of two populations
Parameter of interest: σX²/σY²

Population 1: variance σX²
Population 2: variance σY²

σX²/σY² = 1: the variances of the two populations are equal
σX²/σY² > 1: measures in Population 1 are more varied than measures in Population 2
σX²/σY² < 1: measures in Population 1 are less varied than measures in Population 2

NOTE: Variances are comparable when the means of the two populations are not too different from each other.
Approaches to Sampling

Independent
Samples

Related
Samples
Independent Sampling
If the selection of the sample from the first population is independent of the
selection of the sample from the second population, then we are taking independent
samples from the two populations.

Population 1 (mean µX, variance σX²) → Sample 1 of size n1: (X1, X2, ..., Xn1), summarized by X̄ and SX²
Population 2 (mean µY, variance σY²) → Sample 2 of size n2: (Y1, Y2, ..., Yn2), summarized by Ȳ and SY²

Use these samples to infer on µX – µY.
Example
The principal of a school wishes to determine if the Grade 6
boys are better in mathematics than the Grade 6 girls. A
random sample of boys was selected. Then a random sample
of girls was selected. All of the students in the two random
samples were asked to take a standardized test in mathematics
and their scores were recorded.
Matched Sampling/Paired Sampling
If the selection of the sample from the first population is related (in any of its forms)
to the selection of the sample from the second population, then we are taking related
or paired samples from the two populations.

Population 1 (mean µX, variance σX²): X1, X2, …, Xn
Population 2 (mean µY, variance σY²): Y1, Y2, …, Yn

Sample of paired differences:
D1 = X1 – Y1
D2 = X2 – Y2
...
Dn = Xn – Yn
Matched Sampling/Paired Sampling
Recall
o An experiment is a data collection method where the researcher intervenes by
controlling the conditions that may affect the response variable by:
(i) using a randomization mechanism in assigning the treatments, and
(ii) controlling the identified extraneous variables.
o By doing so, the researcher can isolate the effects of the explanatory variable on
the response variable and clarify the direction and strength of their relationship.
o In many experiments, the available experimental units may considerably differ
with respect to extraneous variables.
Methods of Generating Paired Data
Paired data: {(X1,Y1), (X2,Y2), …, (Xn,Yn)}
Forming paired data will be beneficial when the two measures in the ith pair, Xi and Yi,
exhibit a strong direct relationship, so that when Xi is high then so is Yi, as a result of
sharing the same values on the extraneous variable/s.
Method 1: all experimental units in the sample receive both treatments

Example: Two formulations of a new whitening soap are to be compared as to their whitening effect. A random sample of 40 potential users of the soap is selected. Each person uses a randomization mechanism to determine which formulation is applied on the left arm, so that the other formulation is applied on the right arm. After two weeks, the effect of each formulation is measured.

Experimental unit: person
Response variable: degree of fairness of a person
2 Treatments: Formulation A and Formulation B of whitening soap
Xi = degree of fairness of arm of ith person where Formulation A was applied
Yi = degree of fairness of arm of ith person where Formulation B was applied
Parameter of interest: µX - µY
Extraneous variables: original degree of fairness of person, biological characteristics that affect person’s reaction to any treatment
Method 2: taking measurements before and after the treatment is applied to the experimental units (can be viewed as a special case of Method 1)

Example: A police department wants to assess the effects of an obvious radar trap on the speeds of cars. Ten cars are randomly selected on a highway, and their speeds are measured just before a radar trap comes into view and right after they pass the obvious radar trap.

Experimental unit: car
Response variable: speed of car
Treatment: visible radar trap
Xi = speed of ith car before seeing radar trap
Yi = speed of ith car after seeing radar trap
Parameter of interest: µX - µY
Extraneous variables: driver, type of car, age of car, etc.
Method 3: use naturally occurring pairs such as twins, or husbands and wives, or siblings, etc.

Example: A gym instructor measures the effect of an exercise in lowering blood pressure among twins by making one twin exercise four times a week while the other twin has no exercise.
Method 4: form pairs of experimental units that have the same values or levels of the extraneous variable

Example: A science teacher has developed new teaching materials and wants to evaluate the effectiveness of these materials in improving the students’ comprehension. Prior to sampling, the teacher formed pairs of students so that students belonging to the same pair received about the same final grade in science the previous term. The teacher then selected a sample of pairs of students. The teacher randomly selects which student in each pair will be taught using the new materials, so that the other one will be taught using the old materials. At the end of the term, all the students in the sample were given a standardized test.

Experimental unit: student
Response variable: score in standardized test
2 Treatments: old teaching materials, new teaching materials
Xi = score of ith student taught using the old teaching materials
Yi = score of ith student taught using the new teaching materials
Parameter of interest: µX - µY
Extraneous variables: aptitude in science.
Recall: Expectation of a Random Variable
The following properties apply to the mean and variance of discrete and continuous random
variables.
Let X and Y be random variables, and let a and b be real numbers.
o E(aX + b) = aE(X) + b
o E(X + Y) = E(X) + E(Y)
o E(X − Y) = E(X) − E(Y)
o If X and Y are independent, then E(XY) = E(X)E(Y)
o Var(aX + b) = a²Var(X)
o If X and Y are independent, then
   - Var(X + Y) = Var(X) + Var(Y)
   - Var(X − Y) = Var(X) + Var(Y)
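The plus sign in Var(X − Y) is easy to misremember. As a short worked step that uses only the properties listed above, write X − Y as X + (−1)Y, and note that −Y is independent of X whenever Y is:

```latex
\begin{aligned}
\operatorname{Var}(X - Y)
  &= \operatorname{Var}\bigl(X + (-1)Y\bigr) \\
  &= \operatorname{Var}(X) + \operatorname{Var}\bigl((-1)Y\bigr)
     && \text{(independence)} \\
  &= \operatorname{Var}(X) + (-1)^2 \operatorname{Var}(Y)
     && \text{(using } \operatorname{Var}(aX+b) = a^2 \operatorname{Var}(X)\text{)} \\
  &= \operatorname{Var}(X) + \operatorname{Var}(Y).
\end{aligned}
```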
02 Point Estimation
Point Estimation
Parameter      Point Estimator
µX – µY        X̄ − Ȳ
p1 – p2        P̂1 − P̂2
σX²/σY²        SX²/SY²


Example
Suppose that a random sample of 12 plots planted with pechay was selected from each
of two types of plots in a farm in Nueva Ecija. The yield of pechay, in kilograms, was
measured from each of the plots. The measurements are:
Type I:  10.1  7.85  4.9  5.705  5.625  7.45
         8     5.4   7.55 7.25   5.75   4.575
Type II: 8.7   9.5   9.2  6.45   10.35  8.1
         7.7   3.3   8.3  7.8    6.15   5
Estimate the difference in the mean yields of pechay between the two types of plots.

Recall:        Type I    Type II
Sample Mean    6.6796    7.5458

The estimated difference in the mean yields of pechay between the two types of
plots is
X̄ − Ȳ = 6.6796 − 7.5458 = −0.8662
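The point estimate can be checked with a few lines of Python. This is only a sketch using the standard library; the variable names (`type_1`, `type_2`, `x_bar`, `y_bar`) are illustrative and not part of the course material.

```python
from statistics import mean

# Pechay yields (kg) copied from the example.
type_1 = [10.1, 7.85, 4.9, 5.705, 5.625, 7.45, 8, 5.4, 7.55, 7.25, 5.75, 4.575]
type_2 = [8.7, 9.5, 9.2, 6.45, 10.35, 8.1, 7.7, 3.3, 8.3, 7.8, 6.15, 5]

x_bar = mean(type_1)   # sample mean of Type I plots
y_bar = mean(type_2)   # sample mean of Type II plots
diff = x_bar - y_bar   # point estimate of mu_X - mu_Y

print(f"x_bar = {x_bar:.4f}, y_bar = {y_bar:.4f}, difference = {diff:.4f}")
```

The unrounded difference is −10.395/12 = −0.86625; the value −0.8662 in the example comes from subtracting the two rounded sample means.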
Example
Consider again the Pechay Plot example.
Type I:  10.1  7.85  4.9  5.705  5.625  7.45
         8     5.4   7.55 7.25   5.75   4.575
Type II: 8.7   9.5   9.2  6.45   10.35  8.1
         7.7   3.3   8.3  7.8    6.15   5

Estimate the difference between the proportions of Type I plots and Type II plots whose
yield is greater than 6 kilograms.

Recall:              Type I           Type II
Sample Proportion    6/12 or 0.5      10/12 or 0.8333

The estimated difference between the proportions of Type I plots and Type II plots
whose yield is greater than 6 kilograms is
p̂1 − p̂2 = 6/12 − 10/12 = −0.3333
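The same data give the sample proportions by counting the plots above the 6 kg cutoff. The helper `sample_proportion` below is a hypothetical name introduced for this sketch.

```python
# Pechay yields (kg) copied from the example.
type_1 = [10.1, 7.85, 4.9, 5.705, 5.625, 7.45, 8, 5.4, 7.55, 7.25, 5.75, 4.575]
type_2 = [8.7, 9.5, 9.2, 6.45, 10.35, 8.1, 7.7, 3.3, 8.3, 7.8, 6.15, 5]


def sample_proportion(values, threshold):
    """Proportion of observations strictly greater than the threshold."""
    return sum(v > threshold for v in values) / len(values)


p1_hat = sample_proportion(type_1, 6)   # 6 of the 12 Type I plots
p2_hat = sample_proportion(type_2, 6)   # 10 of the 12 Type II plots
diff = p1_hat - p2_hat                  # point estimate of p1 - p2

print(f"p1_hat = {p1_hat:.4f}, p2_hat = {p2_hat:.4f}, difference = {diff:.4f}")
```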
Example
Dieticians developed a new diet which they claimed can reduce a person’s
weight. They selected a random sample of 7 women to follow the diet. The
weights of the women in the sample who followed this diet were recorded
before taking the diet and after two weeks of taking the diet.

Woman 1 2 3 4 5 6 7
Weight Before 58.5 60.3 61.7 69.0 64.0 62.6 56.7
Weight After 60.0 54.9 58.1 62.1 58.5 59.9 54.4

Estimate the mean difference of the weight before and after taking the diet
for two weeks.
Example
An estimator for µD = µBefore − µAfter is D̄, the sample mean of the differences.

i    Xi      Yi      di = Xi − Yi
1    58.5    60.0    -1.5
2    60.3    54.9     5.4
3    61.7    58.1     3.6
4    69.0    62.1     6.9
5    64.0    58.5     5.5
6    62.6    59.9     2.7
7    56.7    54.4     2.3

d̄ = (−1.5 + 5.4 + 3.6 + 6.9 + 5.5 + 2.7 + 2.3)/7 = 24.9/7 = 3.5571 kilograms

Thus, the estimated mean difference of the weights before and after taking the diet for two
weeks is 3.5571 kilograms.
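For paired data, the estimate is just the mean of the within-pair differences. A minimal sketch, with illustrative variable names:

```python
from statistics import mean

# Weights (kg) of the 7 women, copied from the example.
before = [58.5, 60.3, 61.7, 69.0, 64.0, 62.6, 56.7]
after = [60.0, 54.9, 58.1, 62.1, 58.5, 59.9, 54.4]

# d_i = X_i - Y_i for each woman (before minus after).
d = [x - y for x, y in zip(before, after)]
d_bar = mean(d)  # point estimate of mu_D = mu_Before - mu_After

print([round(di, 1) for di in d])
print(f"d_bar = {d_bar:.4f} kg")
```

Because the data are paired, the same value results from subtracting the two sample means, since D̄ = X̄ − Ȳ.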
03 Interval Estimation
Confidence Interval for Two Means
Confidence Interval Estimators for μX – μY
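Let $X_1, X_2, \ldots, X_{n_1}$ be a random sample with mean $\mu_X$ and variance $\sigma_X^2$.
Let $Y_1, Y_2, \ldots, Y_{n_2}$ be a random sample with mean $\mu_Y$ and variance $\sigma_Y^2$.
Let $\bar X$ and $\bar Y$ denote the sample means, and $S_X^2$ and $S_Y^2$ the sample variances, of the two random samples, respectively.

Case 1: $\sigma_X^2$ and $\sigma_Y^2$ are known

$\left( (\bar X - \bar Y) - z_{\alpha/2}\sqrt{\frac{\sigma_X^2}{n_1} + \frac{\sigma_Y^2}{n_2}},\ \ (\bar X - \bar Y) + z_{\alpha/2}\sqrt{\frac{\sigma_X^2}{n_1} + \frac{\sigma_Y^2}{n_2}} \right)$

Case 2: $\sigma_X^2$ and $\sigma_Y^2$ are unknown but $\sigma_X^2 = \sigma_Y^2$

$\left( (\bar X - \bar Y) - t_{\alpha/2,\,n_1+n_2-2}\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)},\ \ (\bar X - \bar Y) + t_{\alpha/2,\,n_1+n_2-2}\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} \right)$

where $S_p^2 = \dfrac{(n_1 - 1)S_X^2 + (n_2 - 1)S_Y^2}{n_1 + n_2 - 2}$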
Confidence Interval Estimators for μX – μY (continued)
The notation is the same as in Cases 1 and 2: samples of sizes $n_1$ and $n_2$, sample means $\bar X$ and $\bar Y$, and sample variances $S_X^2$ and $S_Y^2$.

Case 3: $\sigma_X^2$ and $\sigma_Y^2$ are unknown but $\sigma_X^2 \neq \sigma_Y^2$

$\left( (\bar X - \bar Y) - t_{\alpha/2,\,v}\sqrt{\frac{S_X^2}{n_1} + \frac{S_Y^2}{n_2}},\ \ (\bar X - \bar Y) + t_{\alpha/2,\,v}\sqrt{\frac{S_X^2}{n_1} + \frac{S_Y^2}{n_2}} \right)$

where $v = \dfrac{\left(\frac{S_X^2}{n_1} + \frac{S_Y^2}{n_2}\right)^2}{\dfrac{\left(S_X^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(S_Y^2/n_2\right)^2}{n_2 - 1}}$

Case 4: $\sigma_X^2$ and $\sigma_Y^2$ are unknown but $n_1 > 30$ and $n_2 > 30$

$\left( (\bar X - \bar Y) - z_{\alpha/2}\sqrt{\frac{S_X^2}{n_1} + \frac{S_Y^2}{n_2}},\ \ (\bar X - \bar Y) + z_{\alpha/2}\sqrt{\frac{S_X^2}{n_1} + \frac{S_Y^2}{n_2}} \right)$
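Derivation: Sampling Distribution of $\bar X - \bar Y$

Suppose $(X_1, X_2, \ldots, X_{n_X})$ and $(Y_1, Y_2, \ldots, Y_{n_Y})$ are independent random samples from Normal($\mu_X$, $\sigma_X^2$) and Normal($\mu_Y$, $\sigma_Y^2$), respectively.
Clearly, $\bar X$ and $\bar Y$ are also independent.
Thus,
$E(\bar X - \bar Y) = E(\bar X) - E(\bar Y) = \mu_X - \mu_Y$
and
$\operatorname{Var}(\bar X - \bar Y) = \operatorname{Var}(\bar X) + \operatorname{Var}(\bar Y) = \dfrac{\sigma_X^2}{n_X} + \dfrac{\sigma_Y^2}{n_Y}$

Hence, $\bar X - \bar Y \sim \text{Normal}\left(\mu_X - \mu_Y,\ \dfrac{\sigma_X^2}{n_X} + \dfrac{\sigma_Y^2}{n_Y}\right)$.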
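Derivation: Sampling Distribution of $\bar X - \bar Y$ (continued)

Therefore,
$\dfrac{(\bar X - \bar Y) - (\mu_X - \mu_Y)}{\sqrt{\dfrac{\sigma_X^2}{n_X} + \dfrac{\sigma_Y^2}{n_Y}}} \sim \text{Normal}(0, 1)$

Similarly, if $\sigma_X^2$ and $\sigma_Y^2$ are unknown but assumed equal ($\sigma_X^2 = \sigma_Y^2$),
$\dfrac{(\bar X - \bar Y) - (\mu_X - \mu_Y)}{\sqrt{S_p^2\left(\dfrac{1}{n_X} + \dfrac{1}{n_Y}\right)}} \sim t_{\,n_X + n_Y - 2}$

where $S_p^2 = \dfrac{(n_X - 1)S_X^2 + (n_Y - 1)S_Y^2}{n_X + n_Y - 2}$.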
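To connect the derivation to the Case 1 interval, the standardized quantity can be used as a pivot. The following is the standard argument, sketched with $z_{\alpha/2}$ denoting the value that leaves an area of $\alpha/2$ in the upper tail of the standard normal distribution:

```latex
\begin{aligned}
1 - \alpha
  &= P\!\left( -z_{\alpha/2} \le
       \frac{(\bar X - \bar Y) - (\mu_X - \mu_Y)}
            {\sqrt{\sigma_X^2/n_X + \sigma_Y^2/n_Y}}
       \le z_{\alpha/2} \right) \\
  &= P\!\left( (\bar X - \bar Y) - z_{\alpha/2}\sqrt{\tfrac{\sigma_X^2}{n_X} + \tfrac{\sigma_Y^2}{n_Y}}
       \ \le\ \mu_X - \mu_Y \ \le\
       (\bar X - \bar Y) + z_{\alpha/2}\sqrt{\tfrac{\sigma_X^2}{n_X} + \tfrac{\sigma_Y^2}{n_Y}} \right).
\end{aligned}
```

The same rearrangement applied to the t-distributed quantity, with $t_{\alpha/2,\,n_X+n_Y-2}$ in place of $z_{\alpha/2}$ and $S_p^2(1/n_X + 1/n_Y)$ under the square root, gives the Case 2 interval.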
Assumptions
✓ All formulas were derived under the assumption that the two independent random
samples come from normal distributions.
✓ These procedures are robust in the sense that they will still provide good
approximate (1-α)100% confidence interval estimates even if there are slight
deviations from the assumption of normality.
✓ Because of the Central Limit Theorem, in most cases, the assumption of normality
can be dropped as long as both sample sizes are greater than 30.
Assumptions
✓ Formula 2 was derived under the additional assumption that the two unknown
variances are equal to each other.
✓ However, the procedure is also robust in the sense that it will still provide good
approximate (1-α)100% confidence interval estimates even if the 2 population
variances are not equal to each other, so long as the sample sizes are equal to each
other.
✓ This is one of the reasons why we consider using equal sample sizes when we
design our experiment.
Assumptions
✓ Formula 3 adjusts the degrees of freedom downwards.
✓ The result of this is a wider interval estimate.
✓ Formula 3 also does not pool the information from the two samples to estimate a
common variance, since the variances of the two populations are not assumed to be equal.
✓ However, these two adjustments (on df and variance) become negligible when
both sample sizes are large.
Assumptions
✓ The degrees of freedom in Formula 3 is a computed value based on the
sample sizes and sample variances, so the resulting value will not always
be an integer.
   ✓ Since our table presents the values for integer degrees of freedom only, we
   have to round off the computed value. We will take the more conservative approach of
   always rounding down instead of using the standard rules of rounding.

✓ Formula 4 is relevant only when we cannot get the t-value from the t-table
because the degrees of freedom is very large.
   ✓ Again, we just replace t by z because as the degrees of freedom approaches infinity, the
   t-distribution approaches the standard normal distribution.
Interpretation
(−, +)
If the computed interval estimate contains 0, then we do not
have sufficient evidence to conclude that the two means are
different from each other.
Sometimes yes, sometimes no... Sometimes positive, sometimes negative…

(+, +)
If the computed interval estimate contains positive values only,
then we can conclude with (1-α)100% confidence that μX is
greater than μY.
μX − μY > 0 → μX > μY

(−, −)
If the computed interval estimate contains negative values only,
then we can conclude with (1-α)100% confidence that μX is less
than μY.
μX − μY < 0 → μX < μY
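The three cases translate directly into a small decision rule. The function name `interpret_interval` and the wording of the returned messages are illustrative choices for this sketch, not part of the course material.

```python
def interpret_interval(lower, upper):
    """Classify a confidence interval (lower, upper) for mu_X - mu_Y."""
    if lower > upper:
        raise ValueError("lower limit must not exceed upper limit")
    if lower > 0:
        # (+, +): every plausible value of mu_X - mu_Y is positive
        return "mu_X is greater than mu_Y"
    if upper < 0:
        # (-, -): every plausible value of mu_X - mu_Y is negative
        return "mu_X is less than mu_Y"
    # (-, +): the interval contains 0
    return "insufficient evidence that the means differ"


print(interpret_interval(0.37, 0.71))
print(interpret_interval(-0.10, -0.02))
print(interpret_interval(-1.2, 0.8))
```

An interval with an endpoint exactly at 0 is treated here as containing 0, which is the cautious reading.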
Example
Suppose that company officials were concerned about the length of time a
particular drug retained its potency. A random sample of nX = 20 bottles of
the drug was drawn from the production line and analyzed for potency. A
second sample of nY = 25 bottles was drawn and stored in regulated
environment for a period of one year. The readings obtained are shown
below.
Sample 1: X̄ = 10.37, SX = 0.3234
Sample 2: Ȳ = 9.83, SY = 0.2406
Estimate the difference in mean potency for all bottles coming off the
production line and the mean potency for all bottles retained for a period of
one year using a 95% confidence interval assuming (i) the population
variances are equal and (ii) the population variances are unequal.
Example
Assuming normality and equal variances:

$(\bar X - \bar Y) \mp t_{\alpha/2}(v = n_1 + n_2 - 2)\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$, where $S_p^2 = \dfrac{(n_1 - 1)S_X^2 + (n_2 - 1)S_Y^2}{n_1 + n_2 - 2}$

$\bar X - \bar Y = 10.37 - 9.83 = 0.54$

$S_p^2 = \dfrac{(20 - 1)(0.3234)^2 + (25 - 1)(0.2406)^2}{20 + 25 - 2} = 0.078523$

$t_{\alpha/2}(v = n_1 + n_2 - 2) = t_{0.05/2}(v = 20 + 25 - 2) = t_{0.025}(v = 43) = 2.017$

$\sqrt{S_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = \sqrt{0.078523\left(\frac{1}{20} + \frac{1}{25}\right)} = 0.084066$

$0.54 \mp (2.017)(0.084066) = (0.3704,\ 0.7096)$

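The pooled-variance computation can be reproduced from the summary statistics. In this sketch the function `pooled_ci` is a hypothetical helper, and the critical value 2.017 is taken from the t-table as in the example rather than computed.

```python
import math


def pooled_ci(x_bar, y_bar, s_x, s_y, n_x, n_y, t_crit):
    """CI for mu_X - mu_Y assuming equal variances (Case 2).

    t_crit is t_{alpha/2} with n_x + n_y - 2 degrees of freedom,
    looked up from a t-table by the caller.
    """
    # Pooled estimate of the common variance
    sp2 = ((n_x - 1) * s_x**2 + (n_y - 1) * s_y**2) / (n_x + n_y - 2)
    # Estimated standard error of X-bar minus Y-bar
    se = math.sqrt(sp2 * (1 / n_x + 1 / n_y))
    diff = x_bar - y_bar
    return diff - t_crit * se, diff + t_crit * se


# Drug potency example: n_X = 20, n_Y = 25, t_0.025(43) = 2.017
lower, upper = pooled_ci(10.37, 9.83, 0.3234, 0.2406, 20, 25, t_crit=2.017)
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```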


Example
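Assuming normality but unequal variances:

$\left( (\bar X - \bar Y) - t_{\alpha/2}(v)\sqrt{\frac{S_X^2}{n_1} + \frac{S_Y^2}{n_2}},\ \ (\bar X - \bar Y) + t_{\alpha/2}(v)\sqrt{\frac{S_X^2}{n_1} + \frac{S_Y^2}{n_2}} \right)$

$\bar X - \bar Y = 10.37 - 9.83 = 0.54$

$v = \dfrac{\left(\frac{S_X^2}{n_1} + \frac{S_Y^2}{n_2}\right)^2}{\dfrac{\left(S_X^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(S_Y^2/n_2\right)^2}{n_2 - 1}} = \dfrac{\left(\frac{(0.3234)^2}{20} + \frac{(0.2406)^2}{25}\right)^2}{\dfrac{\left((0.3234)^2/20\right)^2}{20 - 1} + \dfrac{\left((0.2406)^2/25\right)^2}{25 - 1}} = 34.237$, rounded down to 34

$t_{0.025}(v = 34) = 2.032$

$\sqrt{\frac{S_X^2}{n_1} + \frac{S_Y^2}{n_2}} = \sqrt{\frac{0.3234^2}{20} + \frac{0.2406^2}{25}} = 0.086861455$

$0.54 \mp (2.032)(0.086861455) = (0.3635,\ 0.7165)$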
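The degrees-of-freedom formula is the error-prone part of Case 3, so a short check is useful. The helper names `unequal_var_df` and `unequal_var_ci` are hypothetical; the critical value 2.032 is again read from the t-table, as in the example.

```python
import math


def unequal_var_df(s_x, s_y, n_x, n_y):
    """Computed (generally non-integer) degrees of freedom for Case 3."""
    a = s_x**2 / n_x
    b = s_y**2 / n_y
    return (a + b) ** 2 / (a**2 / (n_x - 1) + b**2 / (n_y - 1))


def unequal_var_ci(x_bar, y_bar, s_x, s_y, n_x, n_y, t_crit):
    """CI for mu_X - mu_Y without pooling the variances (Case 3)."""
    se = math.sqrt(s_x**2 / n_x + s_y**2 / n_y)
    diff = x_bar - y_bar
    return diff - t_crit * se, diff + t_crit * se


v = unequal_var_df(0.3234, 0.2406, 20, 25)
df = math.floor(v)  # conservative rule from the slides: always round down

# t_0.025(34) = 2.032 from the t-table
lower, upper = unequal_var_ci(10.37, 9.83, 0.3234, 0.2406, 20, 25, t_crit=2.032)
print(f"v = {v:.3f}, df used = {df}")
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```

Compared with the equal-variance interval, this one is slightly wider, which matches the remark that Formula 3 lowers the degrees of freedom.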
Preliminaries on Inference on µX - µY
Based on 2 Related Samples
Sample Data = {(X1,Y1), (X2,Y2), …, (Xn,Yn)}

Define: Di = Xi – Yi, i = 1, 2, …, n
(Note: The Di's are all random variables. Why?)

Assumptions: (D1, D2, …, Dn) is a random sample
Di ~ Normal(µD, σD²)
Preliminaries on Inference on µX - µY
Based on 2 Related Samples
Following the same procedure to estimate the population mean based on a random
sample from a normal distribution:
1. the point estimator for the mean µD is $\bar D = \dfrac{\sum_{i=1}^{n} D_i}{n}$ = sample mean of the Di's
2. the standard error of $\bar D$ is $\sigma_D / \sqrt{n}$
3. the estimator for the standard error is $S_D / \sqrt{n}$,
where $S_D = \sqrt{\dfrac{\sum_{i=1}^{n} (D_i - \bar D)^2}{n - 1}}$ = sample standard deviation of the Di's
Remarks on µD and sD2
We defined Di = Xi – Yi, i = 1, 2, …, n earlier.
We assumed (D1, D2, …, Dn) is a random sample from a normal distribution with
parameters µD and σD².

Since Di = Xi – Yi, then µD = µX − µY and σD² = σX² + σY² – 2Cov(X,Y)

where µX = common mean of the Xi's
µY = common mean of the Yi's
σX² = common variance of the Xi's
σY² = common variance of the Yi's
Cov(X,Y) = common covariance of the (Xi, Yi)'s
Remarks on µD and sD2
o Cov(X,Y) is a measure of the linear relationship between X and Y. If X and Y are
independent, then Cov(X,Y) = 0.
o The converse, though, is not always true.
o If the value of Y tends to increase as X increases, then Cov(X,Y) > 0; but if the value of Y
tends to decrease as X increases, then Cov(X,Y) < 0.
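The relationship σD² = σX² + σY² − 2Cov(X,Y) has an exact sample analogue, which shows why pairing helps when the covariance is positive. The sketch below checks it on the diet data from the Point Estimation section; the helper `sample_cov` is a hypothetical name, written out by hand so the example depends only on the standard library.

```python
from statistics import mean, variance

# Weights (kg) before and after the diet, from the earlier example.
x = [58.5, 60.3, 61.7, 69.0, 64.0, 62.6, 56.7]
y = [60.0, 54.9, 58.1, 62.1, 58.5, 59.9, 54.4]
d = [xi - yi for xi, yi in zip(x, y)]


def sample_cov(a, b):
    """Sample covariance with divisor n - 1."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (len(a) - 1)


var_x, var_y, var_d = variance(x), variance(y), variance(d)
cov_xy = sample_cov(x, y)

# Sample version of sigma_D^2 = sigma_X^2 + sigma_Y^2 - 2 Cov(X, Y)
print(f"var_d                     = {var_d:.4f}")
print(f"var_x + var_y - 2*cov_xy  = {var_x + var_y - 2 * cov_xy:.4f}")
print(f"var_x + var_y             = {var_x + var_y:.4f}")
```

Here the sample covariance is positive, so the variance of the differences is smaller than the sum of the two variances that would apply under independent sampling.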
Confidence Interval Estimator for µD=µX-µY
Based on 2 Related Samples
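Let $\{(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)\}$ be your sample data.
Denote $D_i = X_i - Y_i$ for i = 1, 2, …, n.

$\bar D = \dfrac{1}{n}\sum_{i=1}^{n} D_i = \bar X - \bar Y$

$S_D = \sqrt{\dfrac{\sum_{i=1}^{n} (D_i - \bar D)^2}{n - 1}}$

A 100(1-α)% Confidence Interval Estimator for the Mean of the Differences

$\left( \bar D - t_{\alpha/2,\,n-1}\dfrac{S_D}{\sqrt{n}},\ \ \bar D + t_{\alpha/2,\,n-1}\dfrac{S_D}{\sqrt{n}} \right)$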
Confidence Interval Estimator for µD=µX-µY
Based on 2 Related Samples
Procedure:
Step 1: Compute Di = Xi – Yi, i = 1, 2, …, n.
Step 2: Compute the mean and standard deviation of the Di's.
Step 3: Use the t-table to determine $t_{\alpha/2}(v = n - 1)$, where n = number of pairs.
Step 4: Plug the values computed in Steps 2 and 3 into the formula.
Example
To test two promising new lines of hybrid corn under normal farming conditions, a seed
company selected eight farms at random in Iowa and planted both lines in experimental
plots on each farm. The yields (converted to bushels per acre) for the eight locations were:
Line A: 86 87 56 93 84 93 75 79
Line B: 80 79 58 91 77 82 74 66
Assuming that the two yields are jointly normally distributed, estimate the difference
between the mean yields by a 95% confidence interval.

The parameter of interest is µD= µX - µY where


µX=mean yield using line A of hybrid corn
µY=mean yield using line B of hybrid corn.
Example
Step 1: Compute Di = Xi – Yi, i = 1, 2, …, 8.
D1 = 86 – 80 = 6    D2 = 87 – 79 = 8     D3 = 56 – 58 = -2    D4 = 93 – 91 = 2
D5 = 84 – 77 = 7    D6 = 93 – 82 = 11    D7 = 75 – 74 = 1     D8 = 79 – 66 = 13

Step 2: Compute D̄ and SD. (Use the standard deviation / statistics mode of your calculator
by entering the values of Di, i = 1, 2, …, 8.)
D̄ = 5.75 and SD = 5.1199888
Step 3: Use the t-table to determine the value of t0.05/2(v = 8 − 1).
t0.025(v = 7) = 2.365
Step 4: Plug the computed values into the formula:
5.75 ∓ (2.365)(5.1199888/√8)
A 95% CI estimate for the mean difference is (1.4689, 10.0311).
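The four steps can be sketched in Python as follows. The function name `paired_ci` is illustrative, and the critical value 2.365 is supplied from the t-table exactly as in Step 3.

```python
import math
from statistics import mean, stdev


def paired_ci(x, y, t_crit):
    """CI for mu_D = mu_X - mu_Y from paired data.

    t_crit is t_{alpha/2} with n - 1 degrees of freedom (n = number of pairs),
    looked up from a t-table by the caller.
    """
    d = [xi - yi for xi, yi in zip(x, y)]      # Step 1: differences
    d_bar, s_d = mean(d), stdev(d)             # Step 2: mean and sample SD
    margin = t_crit * s_d / math.sqrt(len(d))  # Steps 3-4: margin of error
    return d_bar - margin, d_bar + margin


# Hybrid corn yields (bushels per acre) on the eight farms
line_a = [86, 87, 56, 93, 84, 93, 75, 79]
line_b = [80, 79, 58, 91, 77, 82, 74, 66]

lower, upper = paired_ci(line_a, line_b, t_crit=2.365)  # t_0.025(7) = 2.365
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```

Because the whole interval lies above 0, the interpretation rule from earlier in the chapter suggests that line A has the higher mean yield.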
03 Interval Estimation
Confidence Interval for Two Proportions
Confidence Interval Estimators
Given two independent random samples of sizes n1 and n2, a point estimator for
p1 – p2 is $\hat p_1 - \hat p_2$.

An Approximate 100(1-α)% Confidence Interval Estimator for the Difference of Proportions

$\left( (\hat p_1 - \hat p_2) - z_{\alpha/2}\sqrt{\frac{\hat p_1(1 - \hat p_1)}{n_1} + \frac{\hat p_2(1 - \hat p_2)}{n_2}},\ \ (\hat p_1 - \hat p_2) + z_{\alpha/2}\sqrt{\frac{\hat p_1(1 - \hat p_1)}{n_1} + \frac{\hat p_2(1 - \hat p_2)}{n_2}} \right)$

This approximation will only hold when the sample sizes are large.
Thus, we require the sample sizes n1 ≥ 30 and n2 ≥ 30.
Furthermore, we have the condition that both
p1 and p2 are not expected to be too close to 0 or 1.
NOTES
o This CI estimator will provide a good approximate (1-α)100% CI estimate for p1-
p2 when both sample sizes are large. Thus, we require that both sample sizes are
at least 30. Furthermore, we have the condition that both p1 and p2 are not
expected to be too close to 0 or 1.
o In Stat 132, you will learn about CI estimation for p1-p2 based on paired or related
samples.
INTERPRETATION
Suppose that a 95% confidence interval estimate for the difference is
constructed.
a) For what range of values is it not possible to conclude that the population
proportions are different from one another?
b) For what range of values can you conclude, with 95% confidence, that the
proportion in population 1 is statistically higher than the proportion in
population 2?
c) For what range of values can you conclude, with 95% confidence, that the
proportion in population 1 is statistically lower than the proportion in
population 2?
ANSWER: CI contains a) 0; b) positive values only; c) negative values only.
Example
A company is considering the introduction of a new formulation of its Zippi Cola soft drink.
It first conducts a series of taste tests comparing Zippi to the leading brand of cola. In the
first test based on the original formula of Zippi, 120 of 500 people who tried it preferred
Zippi. The test was repeated to a new group of 1000 tasters to compare the new
formulation of Zippi Cola to the leading brand. This time, 300 of the 1000 tasters preferred
the new Zippi to the leading brand. Compute for an approximate 90% confidence interval
estimate for the difference of population proportions who prefer Zippi over the leading
brand of cola.

Parameter of interest: p1 – p2
p1=proportion who prefer the original formula of Zippi over the leading brand
p2=proportion who prefer the new formulation of Zippi over the leading brand
Example
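Point Estimates for p1: $\hat p_1 = 120/500 = 0.24$, and for p2: $\hat p_2 = 300/1000 = 0.3$
Point Estimate for p1 − p2: $\hat p_1 - \hat p_2 = 0.24 - 0.3 = -0.06$

Interval Estimator

$\left( (\hat p_1 - \hat p_2) - z_{\alpha/2}\sqrt{\frac{\hat p_1(1 - \hat p_1)}{n_1} + \frac{\hat p_2(1 - \hat p_2)}{n_2}},\ \ (\hat p_1 - \hat p_2) + z_{\alpha/2}\sqrt{\frac{\hat p_1(1 - \hat p_1)}{n_1} + \frac{\hat p_2(1 - \hat p_2)}{n_2}} \right)$

$z_{0.1/2} = z_{0.05} = 1.645$

Interval Estimate

$\left( -0.06 - (1.645)\sqrt{\frac{(0.24)(0.76)}{500} + \frac{(0.3)(0.7)}{1000}},\ \ -0.06 + (1.645)\sqrt{\frac{(0.24)(0.76)}{500} + \frac{(0.3)(0.7)}{1000}} \right)$

$= (-0.0994,\ -0.0206)$
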
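A short sketch of the same computation from the raw counts. The function name `two_proportion_ci` is hypothetical, and the value 1.645 is the tabulated z0.05 used in the example.

```python
import math


def two_proportion_ci(successes_1, n_1, successes_2, n_2, z_crit):
    """Approximate large-sample CI for p1 - p2 from two independent samples."""
    p1_hat = successes_1 / n_1
    p2_hat = successes_2 / n_2
    # Estimated standard error of p1_hat - p2_hat
    se = math.sqrt(p1_hat * (1 - p1_hat) / n_1 + p2_hat * (1 - p2_hat) / n_2)
    diff = p1_hat - p2_hat
    return diff - z_crit * se, diff + z_crit * se


# Zippi Cola: 120 of 500 preferred the original, 300 of 1000 the new formulation
lower, upper = two_proportion_ci(120, 500, 300, 1000, z_crit=1.645)
print(f"90% CI: ({lower:.4f}, {upper:.4f})")
```

Both limits are negative, so at 90% confidence the proportion preferring the original formula is lower than the proportion preferring the new one.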


03 Interval Estimation
Confidence Interval for Two Variances
Confidence Interval Estimators
Suppose $(X_1, X_2, \ldots, X_{n_1})$ and $(Y_1, Y_2, \ldots, Y_{n_2})$ are independent random samples from Normal($\mu_X$, $\sigma_X^2$) and
Normal($\mu_Y$, $\sigma_Y^2$), respectively.
A (1-α)100% confidence interval estimator for the ratio of two variances, $\sigma_X^2 / \sigma_Y^2$, is given by:

$\left( \dfrac{S_X^2}{S_Y^2\, F_{\alpha/2}(v_1 = n_1 - 1,\ v_2 = n_2 - 1)},\ \ \dfrac{S_X^2}{S_Y^2\, F_{1-\alpha/2}(v_1 = n_1 - 1,\ v_2 = n_2 - 1)} \right)$

where $F_{1-\alpha/2}(v_1 = n_1 - 1,\ v_2 = n_2 - 1)$ and $F_{\alpha/2}(v_1 = n_1 - 1,\ v_2 = n_2 - 1)$ are the $100(\alpha/2)$th and
$100(1 - \alpha/2)$th percentiles, respectively, of the F-distribution with $v_1 = n_1 - 1$ and $v_2 = n_2 - 1$
degrees of freedom.
Exercise 2 p. 506
Example
Consider the data on the number of births per 1,000 population in African and Asian countries in table 14.1. Provide
a 95% CI estimate of the ratio of the variances of number of births per 1,000 population between African and Asian
countries.
Births (per 1,000 population)

African Countries                              Asian Countries
Algeria         20    Libya         28         Armenia      10    Mongolia      18
Benin           41    Madagascar    43         Brunei       22    Myanmar       25
Botswana        27    Malawi        51         China        12    Nepal         34
Burkina Faso    45    Mali          50         Georgia      11    North Korea   17
Cameroon        37    Mauritius     16         India        25    Oman          26
Cape Verde      29    Mayotte       41         Indonesia    22    Pakistan      34
Chad            49    Senegal       37         Iran         18    Philippines   26
Comoros         47    Seychelles    18         Japan         9    Qatar         20
Eritrea         39    Sudan         38         Kuwait       18    Syria         28
Ethiopia        41    Togo          38         Kyrgyzstan   21    Turkey        21
Gambia          41    Tunisia       17         Lebanon      23    UAE           16
Guinea-Bissau   50    Zambia        42         Malaysia     26    Uzbekistan    24
Lesotho         33                             Maldives     18
Example
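With n1 = n2 = 25 countries in each sample (X = African, Y = Asian):

$S_X^2 = 113.46$
$S_Y^2 = 43.04$
$F_{\alpha/2}(v_1 = n_1 - 1,\ v_2 = n_2 - 1) = F_{0.025}(v_1 = 24,\ v_2 = 24) = 2.26927728$
$F_{1-\alpha/2}(v_1 = n_1 - 1,\ v_2 = n_2 - 1) = F_{0.975}(v_1 = 24,\ v_2 = 24) = 0.44066893$

Substituting the values into the formula,

$\left( \dfrac{113.46}{(43.04)(2.26927728)},\ \ \dfrac{113.46}{(43.04)(0.44066893)} \right) = (1.1617,\ 5.9822)$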
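The sample variances and the interval can be verified from the raw data. In this sketch the two F critical values are taken as given from the worked solution rather than computed, and the variable names are illustrative.

```python
from statistics import variance

# Births per 1,000 population, from the table in the example
african = [20, 41, 27, 45, 37, 29, 49, 47, 39, 41, 41, 50, 33,
           28, 43, 51, 50, 16, 41, 37, 18, 38, 38, 17, 42]
asian = [10, 22, 12, 11, 25, 22, 18, 9, 18, 21, 23, 26, 18,
         18, 25, 34, 17, 26, 34, 26, 20, 28, 21, 16, 24]

s2_x = variance(african)  # sample variance, divisor n - 1
s2_y = variance(asian)

# Critical values for v1 = v2 = 24, as quoted in the solution
f_upper = 2.26927728   # F_{0.025}(24, 24)
f_lower = 0.44066893   # F_{0.975}(24, 24)

ratio = s2_x / s2_y
lower = ratio / f_upper
upper = ratio / f_lower
print(f"S_X^2 = {s2_x:.2f}, S_Y^2 = {s2_y:.2f}")
print(f"95% CI for the variance ratio: ({lower:.4f}, {upper:.4f})")
```

Since the whole interval lies above 1, the data suggest that birth rates are more varied among the African countries than among the Asian countries.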
EXERCISES
Exercise 01
A statistics test was given to a random sample of 50 girls and another random sample of 75 boys.
The mean score of the girls is 80 with a standard deviation of 4 and the mean score of the boys is
86 with a standard deviation of 6.
Find a 95% confidence interval for the difference of means μB - μG.

Exercise 02
Consider again the example on the dieticians developing a new diet. Assume the distribution of
weights to be approximately normal. Compute a 95% confidence interval for the mean difference
of weight before and after the diet. Do you think the diet really reduces a person’s weight on
the average?

Woman           1      2      3      4      5      6      7
Weight Before   58.5   60.3   61.7   69.0   64.0   62.6   56.7
Weight After    60.0   54.9   58.1   62.1   58.5   59.9   54.4

Exercise 03
Ten engineering schools in the United States were surveyed. The sample contained 250
electrical engineers, 80 being women, and 175 chemical engineers, 40 being women. Compute a
90% confidence interval for the difference between the proportions of women in these two
fields of engineering.

Exercise 04
Two different brands of latex paint are being considered for use. Fifteen specimens of each
type of paint were selected, and the drying times, in hours, were as follows:

PAINT A                      PAINT B
3.5  2.7  3.9  4.2  3.6      4.7  3.9  4.5  5.5  4.0
2.7  3.3  5.2  4.2  2.9      5.3  4.3  6.0  5.2  3.7
4.4  5.2  4.0  4.1  3.4      5.5  6.2  5.1  5.4  4.8

Construct a 95% confidence interval for σ1²/σ2².
Reading Assignment
Read:
• Section 14.7: Determining the Sample Sizes (pp. 506-509)

Next Chapter
Hypothesis Testing (Single Population)