Chapter3 Statistics 2021 22


CHAPTER III

CONFIDENCE INTERVAL

PHOK Ponna

Institute of Technology of Cambodia


Department of Applied Mathematics and Statistics (AMS)

2021–2022

Statistics ITC 1 / 34
Contents

1 Interval Estimators and Confidence Intervals for Parameters

2 Large-sample confidence intervals: one sample case

3 Small-sample confidence intervals for 𝜇

4 Confidence Interval on the Variance and Standard Deviation of a Normal Distribution

5 Confidence interval for the difference of two means

6 Large-sample confidence interval for $p_1 - p_2$

7 Confidence interval for $\sigma_1^2/\sigma_2^2$

Confidence Intervals for Parameters

Definition 1
Let 𝑋1 , 𝑋2 , ..., 𝑋𝑛 be a random sample of size 𝑛 from a population 𝑋
with density 𝑓 (𝑥; 𝜃), where 𝜃 is an unknown parameter. An interval
estimator of 𝜃 is a pair of statistics 𝐿 = 𝐿(𝑋1 , 𝑋2 , ..., 𝑋𝑛 ) and
𝑈 = 𝑈(𝑋1 , 𝑋2 , ..., 𝑋𝑛 ) with 𝐿 ≤ 𝑈. If 𝑥1 , 𝑥2 , ..., 𝑥𝑛 is a set of
sample data, then 𝜃 is estimated to lie in the interval [𝑙, 𝑢], where
𝑙 = 𝐿(𝑥1 , 𝑥2 , ..., 𝑥𝑛 ) and 𝑢 = 𝑈(𝑥1 , 𝑥2 , ..., 𝑥𝑛 ). The observed
interval [𝑙, 𝑢] is called an interval estimate of 𝜃, whereas the random
interval [𝐿, 𝑈] is called the interval estimator of 𝜃.
The interval estimator of 𝜃 is called a 100(1 − 𝛼)% confidence interval
for 𝜃 if
𝑃(𝐿 ≤ 𝜃 ≤ 𝑈) = 1 − 𝛼.
The random variable 𝐿 is called the lower confidence limit and 𝑈 is
called the upper confidence limit. The number (1 − 𝛼) is called the
confidence level or degree of confidence.


Definition 2
Let 𝑋1 , 𝑋2 , ..., 𝑋𝑛 be a random sample of size 𝑛 from a population 𝑋
with probability density function 𝑓 (𝑥; 𝜃), where 𝜃 is an unknown
parameter. A pivotal quantity 𝑄 is a function of 𝑋1 , 𝑋2 , ..., 𝑋𝑛 and
𝜃 whose probability distribution is independent of the parameter 𝜃.

Procedure to find a confidence interval for 𝜃 using the pivot method


If 𝑄 = 𝑄(𝑋1 , 𝑋2 , ..., 𝑋𝑛 , 𝜃) is a pivot, then a 100(1 − 𝛼)% confidence
interval for 𝜃 may be constructed as follows:
1 Find two values 𝑎 and 𝑏 such that
𝑃 (𝑎 ≤ 𝑄 ≤ 𝑏) = 1 − 𝛼
Choose 𝑎 and 𝑏 such that 𝑃(𝑄 ≤ 𝑎) = 𝛼/2 and 𝑃(𝑄 ≥ 𝑏) = 𝛼/2.
2 Convert the inequality 𝑎 ≤ 𝑄 ≤ 𝑏 into the form 𝐿 ≤ 𝜃 ≤ 𝑈.


Example 1
Suppose we have a random sample 𝑋1 , ..., 𝑋𝑛 from 𝑁(𝜇, 1). Construct
a 95% confidence interval for 𝜇.
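
One possible way to work Example 1 with the pivot method is sketched here. Because the variance is known to be 1, the quantity $Q = \sqrt{n}(\bar{X} - \mu)$ is standard normal whatever the value of $\mu$, so it can serve as a pivot; the value 1.96 is the usual table value $z_{0.025}$.

```latex
\begin{align*}
Q &= \sqrt{n}\,(\bar{X} - \mu) \sim N(0, 1), \\
0.95 &= P(-1.96 \le Q \le 1.96)
      = P\!\left(\bar{X} - \frac{1.96}{\sqrt{n}} \le \mu \le \bar{X} + \frac{1.96}{\sqrt{n}}\right).
\end{align*}
```

Under these assumptions the confidence limits are $L = \bar{X} - 1.96/\sqrt{n}$ and $U = \bar{X} + 1.96/\sqrt{n}$.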

Example 2
Suppose the random sample 𝑋1 , ..., 𝑋𝑛 has 𝑈(0, 𝜃) distribution.
Construct a 90% confidence interval for 𝜃 and interpret. Identify the
upper and lower confidence limits.
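
For Example 2, one common choice of pivot is $Q = X_{(n)}/\theta$, where $X_{(n)}$ is the sample maximum, since $P(Q \le q) = q^n$ on $[0, 1]$ does not involve $\theta$. The Python sketch below builds the equal-tailed interval from that pivot and checks its coverage by simulation. The function names, the value $\theta = 3$, the sample size 10, and the seed are illustrative assumptions, not part of the example.

```python
import random

def uniform_theta_ci(sample, conf=0.90):
    """Equal-tailed CI for theta from a U(0, theta) sample, pivot Q = max(X)/theta."""
    n = len(sample)
    alpha = 1.0 - conf
    x_max = max(sample)
    # P(Q <= q) = q**n, so the alpha/2 and 1 - alpha/2 quantiles of Q are:
    a = (alpha / 2) ** (1.0 / n)
    b = (1 - alpha / 2) ** (1.0 / n)
    # a <= x_max/theta <= b  is equivalent to  x_max/b <= theta <= x_max/a
    return x_max / b, x_max / a

def coverage(theta=3.0, n=10, conf=0.90, trials=20000, seed=0):
    """Fraction of simulated intervals that contain the true theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.uniform(0.0, theta) for _ in range(n)]
        lo, hi = uniform_theta_ci(sample, conf)
        hits += lo <= theta <= hi
    return hits / trials

print(coverage())  # should be close to 0.90
```

Here the lower confidence limit is $X_{(n)}/b$ and the upper limit is $X_{(n)}/a$; the simulated coverage illustrates the interpretation that about 90% of such intervals contain $\theta$ in repeated sampling.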

Large-sample confidence intervals: one sample case

Large-sample confidence intervals


If the sample size 𝑛 ≥ 30 , then by the CLT, certain sampling
distributions can be assumed to be approximately normal.
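1 Find an estimator (such as the MLE) of 𝜃, say $\hat{\theta}$.
2 Obtain the standard error $\sigma_{\hat{\theta}}$ of $\hat{\theta}$.
3 Find the 𝑧-transform $z = \dfrac{\hat{\theta} - \theta}{\sigma_{\hat{\theta}}}$. Then 𝑧 has an approximately
standard normal distribution.
4 Using the normal table, find two tail values $-z_{\alpha/2}$ and $z_{\alpha/2}$.
5 An approximate (1 − 𝛼)100% CI for 𝜃 is
\[ \hat{\theta} - z_{\alpha/2}\,\sigma_{\hat{\theta}} \le \theta \le \hat{\theta} + z_{\alpha/2}\,\sigma_{\hat{\theta}}. \]
6 If $\sigma_{\hat{\theta}}$ involves unknown parameters, replace $\sigma_{\hat{\theta}}$ by an estimate $\hat{\sigma}_{\hat{\theta}}$.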
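
The interval in step 5 follows from steps 3 and 4 by rearranging the inequality for the 𝑧-transform; a short sketch of that algebra, assuming $\sigma_{\hat{\theta}} > 0$:

```latex
\begin{align*}
1 - \alpha
  &\approx P\!\left(-z_{\alpha/2} \le \frac{\hat{\theta} - \theta}{\sigma_{\hat{\theta}}} \le z_{\alpha/2}\right) \\
  &= P\!\left(-z_{\alpha/2}\,\sigma_{\hat{\theta}} \le \hat{\theta} - \theta \le z_{\alpha/2}\,\sigma_{\hat{\theta}}\right) \\
  &= P\!\left(\hat{\theta} - z_{\alpha/2}\,\sigma_{\hat{\theta}} \le \theta \le \hat{\theta} + z_{\alpha/2}\,\sigma_{\hat{\theta}}\right).
\end{align*}
```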


Theorem 1
The large-sample (1 − 𝛼)100% CI for the population mean 𝜇 is

\[ \left( \bar{X} - z_{\alpha/2}\frac{S}{\sqrt{n}},\; \bar{X} + z_{\alpha/2}\frac{S}{\sqrt{n}} \right). \]

Example 3
Two statistics professors want to estimate average scores for an
elementary statistics course that has two sections. Each professor
teaches one section and each section has a large number of students. A
random sample of 50 scores from each section produced the following
results:
(a) Section I: 𝑥¯ 1 = 77.01, 𝑠1 = 10.32.
(b) Section II: 𝑥¯ 2 = 72.22, 𝑠2 = 11.02
Calculate 95% CIs for each of these two samples.
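
As an illustration, Theorem 1 can be applied to the data of Example 3 in a few lines of Python. This is a minimal sketch: the function name `large_sample_mean_ci` is an assumption made here, and the critical value is taken from the standard library's `statistics.NormalDist` instead of a printed normal table.

```python
from math import sqrt
from statistics import NormalDist

def large_sample_mean_ci(xbar, s, n, conf=0.95):
    """Large-sample CI for a mean: xbar +/- z_{alpha/2} * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}, about 1.96 for 95%
    margin = z * s / sqrt(n)
    return xbar - margin, xbar + margin

section_1 = large_sample_mean_ci(77.01, 10.32, 50)
section_2 = large_sample_mean_ci(72.22, 11.02, 50)
print([round(v, 2) for v in section_1])  # about 74.15 and 79.87
print([round(v, 2) for v in section_2])  # about 69.17 and 75.27
```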

Theorem 2
Let $X \sim \mathrm{Bin}(n, p)$, where $p \in (0, 1)$ is the population proportion, and
let $\hat{p} = X/n$ be the MLE of $p$. Then an approximate large-sample
(1 − 𝛼)100% CI for 𝑝 is
\[ \left( \hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\; \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \right). \]

There are various rules of thumb for judging whether the sample size is
adequate for the normal approximation. One popular rule is that both
𝑛𝑝 and 𝑛(1 − 𝑝) should be greater than 10 (or, less strictly, greater
than 5).


Example 4
An auto manufacturer gives a bumper-to-bumper warranty for 3 years
or 36,000 miles for its new vehicles. In a random sample of 60 of its
vehicles, 20 of them needed five or more major warranty repairs within
the warranty period. Estimate the true proportion of vehicles from this
manufacturer that need five or more major repairs during the warranty
period, with confidence coefficient 0.95. Interpret.
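
A minimal Python sketch of the computation for Example 4, using the interval of Theorem 2. The function name `proportion_ci` is an assumption, and the rule-of-thumb check uses the observed counts in place of the unknown $np$ and $n(1-p)$.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(x, n, conf=0.95):
    """Large-sample CI for a proportion from x successes in n trials."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

x, n = 20, 60
# Rule of thumb: observed successes and failures should both exceed 10.
print(x > 10, n - x > 10)
lo, hi = proportion_ci(x, n)
print(round(lo, 3), round(hi, 3))  # about 0.214 and 0.453
```

Read this way, the sample is consistent, at the 95% confidence level, with roughly 21% to 45% of vehicles needing five or more major repairs.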


Theorem 3
The sample size needed for an interval estimate of a population
proportion is given by
\[ n = \tilde{p}(1 - \tilde{p})\left(\frac{z_{\alpha/2}}{E}\right)^2 \]
where 𝐸 is the maximum error of estimate and $\tilde{p}$ is the initial
estimate of 𝑝.

Example 5
A researcher wishes to estimate, with 95% confidence, the proportion of
people who own a home computer. A previous study shows that 40%
of those interviewed had a computer at home. The researcher wishes to
be accurate within 2% of the true proportion. Find the minimum
sample size necessary.
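
The formula of Theorem 3 can be evaluated for Example 5 as follows. This is a sketch: the function name is an assumption, and rounding up to the next whole number is the usual convention for a minimum sample size.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(p_tilde, error, conf=0.95):
    """n = p~(1 - p~) (z_{alpha/2} / E)^2, rounded up to a whole number."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    n = p_tilde * (1 - p_tilde) * (z / error) ** 2
    return ceil(n)

print(sample_size_for_proportion(0.40, 0.02))  # 2305
```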

Student’s 𝑡-distribution

Definition 1
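Let 𝑋 be a continuous random variable on ℝ with pdf $f_X$. We say that 𝑋
has Student’s 𝑡-distribution (or 𝑡-distribution) with 𝜈 > 0 degrees of
freedom, written $X \sim t_\nu$ or $X \sim t(\nu)$, if
\[ f_X(x) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\pi\nu}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}, \quad x \in \mathbb{R}. \]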

Theorem 4
Let 𝑋 ∼ 𝑡 𝜈 , 𝜈 > 0. Then
1 𝐸(𝑋) = 0 for 𝜈 > 1, otherwise undefined.
2 $V(X) = \dfrac{\nu}{\nu - 2}$ for 𝜈 > 2, 𝑉(𝑋) = ∞ for 1 < 𝜈 ≤ 2, otherwise
undefined.


Theorem 5
Let $Z \sim \mathcal{N}(0, 1)$ and $V \sim \chi^2(\nu)$, $\nu > 0$, and define $T = \dfrac{Z}{\sqrt{V/\nu}}$. Suppose
that 𝑍 and 𝑉 are independent. Then $T \sim t_\nu$.

Theorem 6
Let 𝑋1 , . . . , 𝑋𝑛 be a random sample drawn from a normal distribution
with mean 𝜇 and variance 𝜎2 . Then the random variable
\[ T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \]

has a 𝑡 distribution with 𝑛 − 1 degrees of freedom.

Theorem 7
Let $T_n \sim t_n$, $n \in \mathbb{N}$, and $Z \sim \mathcal{N}(0, 1)$. Then $T_n \xrightarrow{d} Z$ as $n \to \infty$, that is, $T_n$ converges to 𝑍 in distribution.

Small-sample confidence intervals for 𝜇

Theorem 8
If 𝑋¯ and 𝑆 are the sample mean and the sample standard deviation of
a random sample of size 𝑛 from a normal population, then:
\[ \bar{X} - t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}} \]

is a (1 − 𝛼)100% CI for the population mean 𝜇.

Theorem 9
If 𝑥¯ is used as an estimate of 𝜇, we can be 100(1 − 𝛼)% confident that
the error | 𝑥¯ − 𝜇| will not exceed a specified amount 𝐸 when the sample
size is
\[ n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2. \]
𝐸 is called the maximum error of estimate.

Example 6
Assume that the helium porosity (in percentage) of coal samples taken
from any particular seam is normally distributed with true standard
deviation 0.75.
a. Compute a 95% CI for the true average porosity of a certain seam
if the average porosity for 20 specimens from the seam was 4.85.
b. Compute a 98% CI for true average porosity of another seam
based on 16 specimens with a sample average porosity of 4.56.
c. How large a sample size is necessary if the width of the 95%
interval is to be 0.40?
d. What sample size is necessary to estimate true average porosity to
within 0.2 with 99% confidence?
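
Since the standard deviation in Example 6 is known, the normal critical value applies even though the samples are small. The sketch below works through parts (a) to (d); the helper names are assumptions, the sample sizes use the formula of Theorem 9 rounded up, and part (c) reads "width 0.40" as a half-width of $E = 0.20$.

```python
from math import ceil, sqrt
from statistics import NormalDist

SIGMA = 0.75  # known population standard deviation

def z_crit(conf):
    """Two-sided normal critical value z_{alpha/2}."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)

def known_sigma_ci(xbar, n, conf):
    margin = z_crit(conf) * SIGMA / sqrt(n)
    return xbar - margin, xbar + margin

def sample_size(error, conf):
    """n = (z_{alpha/2} * sigma / E)^2, rounded up."""
    return ceil((z_crit(conf) * SIGMA / error) ** 2)

part_a = known_sigma_ci(4.85, 20, 0.95)  # about (4.52, 5.18)
part_b = known_sigma_ci(4.56, 16, 0.98)  # about (4.12, 5.00)
part_c = sample_size(0.40 / 2, 0.95)     # 55
part_d = sample_size(0.2, 0.99)          # 94
print(part_a, part_b, part_c, part_d)
```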


Example 7
The following is a random sample from a normal population.

7.2 5.7 4.9 6.2 8.5 2.8

Construct a 95% confidence interval for the population mean 𝜇. Interpret.

Example 8
The scores of a random sample of 16 people who took the TOEFL (Test
of English as a Foreign Language) had a mean of 540 and a standard
deviation of 50. Construct a 95% CI for the population mean 𝜇 of the
TOEFL score, assuming that the scores are normally distributed.
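
Examples 7 and 8 can be checked numerically with the interval of Theorem 8. The Python standard library has no 𝑡 quantile function, so this sketch hard-codes the critical values $t_{0.025,5} = 2.571$ and $t_{0.025,15} = 2.131$ as read from a standard 𝑡 table; the dictionary and function names are assumptions.

```python
from math import sqrt
from statistics import mean, stdev

# Upper 2.5% points t_{0.025, df}, copied from a standard t table.
T_025 = {5: 2.571, 15: 2.131}

def t_interval(xbar, s, n, t_crit):
    """Small-sample CI for a normal mean: xbar +/- t * s / sqrt(n)."""
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin

# Example 7: raw data, so compute the sample mean and standard deviation first.
data = [7.2, 5.7, 4.9, 6.2, 8.5, 2.8]
ex7 = t_interval(mean(data), stdev(data), len(data), T_025[len(data) - 1])

# Example 8: summary statistics are given directly.
ex8 = t_interval(540, 50, 16, T_025[15])

print(ex7)  # about (3.83, 7.94)
print(ex8)  # about (513.36, 566.64)
```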

Confidence Interval on the Variance and Standard Deviation of a Normal
Distribution

Theorem 10
Let 𝑋1 , 𝑋2 , . . . , 𝑋𝑛 be a random sample from a normal distribution
with mean 𝜇 and variance 𝜎2 , and let 𝑆 2 be the sample variance. Then
the random variable
\[ \frac{(n-1)S^2}{\sigma^2} \]
has a chi-square (𝜒 2 ) distribution with 𝑛 − 1 degrees of freedom.


Theorem 11
Let 𝑋1 , . . . , 𝑋𝑛 be a random sample drawn from a normal distribution
𝒩(𝜇, 𝜎2 ). Then, a 100(1 − 𝛼)% CI for the population variance is

\[ \frac{(n-1)S^2}{\chi^2_{\alpha/2,\,n-1}} < \sigma^2 < \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,\,n-1}} \]

A confidence interval for 𝜎 has lower and upper limits that are the
square roots of the corresponding limits in the interval for 𝜎2 . An
upper or a lower confidence bound results from replacing 𝛼/2 with 𝛼 in
the corresponding limit of the CI.


Example 9
Find a 95% confidence interval for the variance and standard deviation
of the nicotine content of cigarettes manufactured if a sample of 20
cigarettes has a standard deviation of 1.6 milligrams. Assume that the
variable is approximately normally distributed.

Example 10
Find a 90% confidence interval for the variance and standard deviation
for the price in dollars of an adult single-day ski lift ticket. The data
represent a selected sample of nationwide ski resorts. Assume the
variable is normally distributed.
59 54 53 52 51
39 49 46 49 48
Source: USA TODAY.
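
Examples 9 and 10 can be worked with the interval of Theorem 11. The standard library has no chi-square quantile function, so the sketch below hard-codes the upper-tail critical values from a standard $\chi^2$ table ($\chi^2_{0.025,19} = 32.852$, $\chi^2_{0.975,19} = 8.907$, $\chi^2_{0.05,9} = 16.919$, $\chi^2_{0.95,9} = 3.325$); the function name is an assumption.

```python
from math import sqrt
from statistics import variance

def variance_ci(s2, n, chi2_upper, chi2_lower):
    """CI for sigma^2: ((n-1)s^2 / chi2_{alpha/2}, (n-1)s^2 / chi2_{1-alpha/2})."""
    df = n - 1
    return df * s2 / chi2_upper, df * s2 / chi2_lower

# Example 9: n = 20, s = 1.6, 95% confidence (table values for 19 df).
ex9_var = variance_ci(1.6 ** 2, 20, 32.852, 8.907)
ex9_sd = tuple(sqrt(v) for v in ex9_var)

# Example 10: raw prices, 90% confidence (table values for 9 df).
prices = [59, 54, 53, 52, 51, 39, 49, 46, 49, 48]
ex10_var = variance_ci(variance(prices), len(prices), 16.919, 3.325)
ex10_sd = tuple(sqrt(v) for v in ex10_var)

print(ex9_var, ex9_sd)    # about (1.48, 5.46) and (1.22, 2.34)
print(ex10_var, ex10_sd)  # about (15.01, 76.39) and (3.87, 8.74)
```

The limits for 𝜎 are simply the square roots of the limits for $\sigma^2$, as stated after Theorem 11.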
Confidence interval for the difference of two means

Theorem 12
Let $X_{11}, \dots, X_{1n_1}$ be a random sample from a normal distribution with
mean $\mu_1$ and variance $\sigma_1^2$, and let $X_{21}, \dots, X_{2n_2}$ be a random sample
from a normal distribution with mean $\mu_2$ and variance $\sigma_2^2$. Let
\[ \bar{X}_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} X_{1i} \quad \text{and} \quad \bar{X}_2 = \frac{1}{n_2}\sum_{i=1}^{n_2} X_{2i}. \]
We assume that the two samples are independent. Then $\bar{X}_1$ and $\bar{X}_2$ are
independent and the distribution of $\bar{X}_1 - \bar{X}_2$ is
$N(\mu_1 - \mu_2,\ \sigma_1^2/n_1 + \sigma_2^2/n_2)$.

Large-sample confidence interval for the difference of two means
(i) If $\sigma_1$ and $\sigma_2$ are known, then a 100(1 − 𝛼)% large-sample CI for
$\mu_1 - \mu_2$ is
\[ (\bar{X}_1 - \bar{X}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}. \]
(ii) If $\sigma_1$ and $\sigma_2$ are not known, they can be replaced by the sample
standard deviations $S_1$ and $S_2$, respectively, when $n_i \ge 30$, $i = 1, 2$.
A 100(1 − 𝛼)% large-sample CI for $\mu_1 - \mu_2$ is then
\[ (\bar{X}_1 - \bar{X}_2) \pm z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}. \]
Assumptions: The populations are normal, and the samples are
independent.


Example 11
A study of two kinds of machine failures shows that 58 failures of the
first kind took an average of 79.7 minutes to repair with a standard
deviation of 18.4 minutes, whereas 71 failures of the second kind took
on average 87.3 minutes to repair with a standard deviation of 19.5
minutes. Find a 99% CI for the difference between the true average
amounts of time it takes to repair failures of the two kinds of machines.
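
A short Python sketch of the large-sample interval for Example 11, with the sample standard deviations standing in for the unknown $\sigma_1$ and $\sigma_2$ (both samples exceed 30). The function name `two_mean_z_ci` is an assumption.

```python
from math import sqrt
from statistics import NormalDist

def two_mean_z_ci(xbar1, s1, n1, xbar2, s2, n2, conf):
    """Large-sample CI for mu1 - mu2 using sample standard deviations."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z * sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    diff = xbar1 - xbar2
    return diff - margin, diff + margin

lo, hi = two_mean_z_ci(79.7, 18.4, 58, 87.3, 19.5, 71, conf=0.99)
print(round(lo, 2), round(hi, 2))  # about -16.22 and 1.02
```

The resulting interval contains 0, so at the 99% level these data do not rule out equal mean repair times.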

Small-sample confidence interval for the difference of two means
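(i) If $\sigma_1$ and $\sigma_2$ are unknown but $\sigma_1^2 = \sigma_2^2$, then the small-sample
100(1 − 𝛼)% CI for $\mu_1 - \mu_2$ is
\[ (\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2,\,n_1+n_2-2}\; S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \]
where
\[ S_p = \sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}}. \]
(ii) If $\sigma_1$ and $\sigma_2$ are unknown and $\sigma_1^2 \neq \sigma_2^2$, then the small-sample
100(1 − 𝛼)% CI for $\mu_1 - \mu_2$ is
\[ (\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2,\,\nu} \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} \]
where
\[ \nu = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}. \]

Assumption: The samples are independent and drawn from two normal
populations.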

Example 12
Independent random samples from two normal populations with equal
variances produced the following data:
Sample 1: 1.2 3.1 1.7 2.8 3
Sample 2: 4.2 2.7 3.6 3.9

Obtain a 90% CI for 𝜇1 − 𝜇2 .

Example 13
Assume that two populations are normally distributed with unknown
and unequal variances. Two independent samples are taken with the
following summary statistics:
$n_1 = 16$, $\bar{x}_1 = 20.17$, $s_1 = 4.3$
$n_2 = 11$, $\bar{x}_2 = 19.23$, $s_2 = 3.8$

Construct a 95% CI for 𝜇1 − 𝜇2 .
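
Both small-sample cases can be sketched in Python for Examples 12 and 13. The 𝑡 critical values are hard-coded from a standard 𝑡 table ($t_{0.05,7} = 1.895$ and $t_{0.025,23} = 2.069$) because the standard library has no 𝑡 quantile function. The function names are assumptions, and rounding the degrees of freedom 𝜈 down to a whole number is a common convention assumed here, not something stated in the slides.

```python
from math import floor, sqrt
from statistics import mean, variance

# Upper-tail critical values copied from a standard t table.
T_05_DF7 = 1.895    # t_{0.05, 7}: 90% CI, pooled case with 5 + 4 - 2 = 7 df
T_025_DF23 = 2.069  # t_{0.025, 23}: 95% CI, unequal-variance case

def pooled_ci(x1, x2, t_crit):
    """Equal-variance CI for mu1 - mu2 using the pooled standard deviation."""
    n1, n2 = len(x1), len(x2)
    sp = sqrt(((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2))
    margin = t_crit * sp * sqrt(1 / n1 + 1 / n2)
    diff = mean(x1) - mean(x2)
    return diff - margin, diff + margin

def approx_df(s1, n1, s2, n2):
    """Degrees of freedom nu for the unequal-variance interval."""
    a, b = s1 ** 2 / n1, s2 ** 2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

def unequal_var_ci(xbar1, s1, n1, xbar2, s2, n2, t_crit):
    margin = t_crit * sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    diff = xbar1 - xbar2
    return diff - margin, diff + margin

ex12 = pooled_ci([1.2, 3.1, 1.7, 2.8, 3.0], [4.2, 2.7, 3.6, 3.9], T_05_DF7)
nu = floor(approx_df(4.3, 16, 3.8, 11))  # about 23.3, rounded down to 23
ex13 = unequal_var_ci(20.17, 4.3, 16, 19.23, 3.8, 11, T_025_DF23)

print(ex12)      # about (-2.22, -0.26)
print(nu, ex13)  # 23 and about (-2.31, 4.19)
```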


Large-sample confidence interval for 𝑝 1 − 𝑝2

Large-sample confidence interval for 𝑝 1 − 𝑝 2


The (1 − 𝛼)100% large-sample CI for $p_1 - p_2$ is given by
\[ (\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} \]
where $\hat{p}_1$ and $\hat{p}_2$ are the point estimators of $p_1$ and $p_2$. This
approximation is applicable if $n_i\hat{p}_i \ge 5$ and $n_i(1 - \hat{p}_i) \ge 5$ for
$i = 1, 2$, and the two samples are independent.


Example 14
Iron deficiency, the most common nutritional deficiency worldwide, has
negative effects on work capacity and on motor and mental
development. In a 1999–2000 survey by the National Health and
Nutrition Examination Survey, iron deficiency was detected in 58 of
573 white, non-Hispanic females (10% rounded to whole number) and
95 of 498 (19% rounded to whole number) black, non-Hispanic females
(source:
http://www.cdc.gov/mmwr/preview/mmwrhtml/mm5140a1.htm). Let
𝑝 1 be the proportion of black, non-Hispanic females with iron
deficiency and let 𝑝 2 be the proportion of white, non-Hispanic females
with iron deficiency. Obtain a 95% CI for 𝑝 1 − 𝑝 2 .
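
For Example 14, the large-sample interval for $p_1 - p_2$ can be computed as in the sketch below, using the normal critical value $z_{\alpha/2}$ from `statistics.NormalDist`. The function name `two_proportion_ci` is an assumption; group 1 is the sample of 498 and group 2 the sample of 573, following the definitions of $p_1$ and $p_2$ in the example.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ci(x1, n1, x2, n2, conf=0.95):
    """Large-sample CI for p1 - p2 from two independent samples."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) - margin, (p1 - p2) + margin

# All four counts (95, 403, 58, 515) exceed 5, so the approximation applies.
lo, hi = two_proportion_ci(95, 498, 58, 573)
print(round(lo, 3), round(hi, 3))  # about 0.047 and 0.132
```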

Confidence interval for $\sigma_1^2/\sigma_2^2$

Theorem 13
Let 𝑋1 , . . . , 𝑋𝑛1 and 𝑌1 , . . . , 𝑌𝑛2 be independent samples of size 𝑛1 and
𝑛2 from two normal distributions 𝑁(𝜇1 , 𝜎12 ) and 𝑁(𝜇2 , 𝜎22 ) respectively.
Let 𝑆12 and 𝑆22 be the variances of the two random samples. Then the
random variable
\[ F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \]
has an 𝐹 distribution with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees of freedom.


Theorem 14
Let 𝑋1 , . . . , 𝑋𝑛1 and 𝑌1 , . . . , 𝑌𝑛2 be independent samples of size 𝑛1 and
𝑛2 from two normal distributions 𝑁(𝜇1 , 𝜎12 ) and 𝑁(𝜇2 , 𝜎22 ) respectively.
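Let $S_1^2$ and $S_2^2$ be the variances of the two random samples. Then a
100(1 − 𝛼)% CI for $\sigma_1^2/\sigma_2^2$ is
\[ \left( \frac{S_1^2}{S_2^2}\,\frac{1}{F_{n_1-1,\,n_2-1,\,1-\alpha/2}},\; \frac{S_1^2}{S_2^2}\,\frac{1}{F_{n_1-1,\,n_2-1,\,\alpha/2}} \right) \]
where $F_{\nu_1,\nu_2,q}$ denotes the 𝑞-th quantile (lower-tail probability 𝑞) of
the 𝐹 distribution with $\nu_1$ and $\nu_2$ degrees of freedom.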


Example 15
Assuming that two populations are normally distributed, two
independent random samples are taken with the following summary
statistics:
$n_1 = 21$, $\bar{x}_1 = 20.17$, $s_1 = 4.3$
$n_2 = 16$, $\bar{x}_2 = 19.23$, $s_2 = 3.8$

Construct a 95% CI for $\sigma_1^2/\sigma_2^2$.
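
A sketch of the computation for Example 15. The standard library has no 𝐹 quantile function, so the upper 2.5% points are hard-coded from a standard 𝐹 table: 2.76 for (20, 15) degrees of freedom and 2.57 for (15, 20). The 0.025 quantile of 𝐹(20, 15) is obtained from the reciprocal property, as 1 over the upper 2.5% point of 𝐹(15, 20). All names in the code are assumptions.

```python
# Upper 2.5% points of the F distribution, copied from a standard F table.
F_UPPER_20_15 = 2.76  # P(F(20, 15) > 2.76) = 0.025
F_UPPER_15_20 = 2.57  # P(F(15, 20) > 2.57) = 0.025

def variance_ratio_ci(s1, s2, f_high_quantile, f_low_quantile):
    """CI for sigma1^2 / sigma2^2: (ratio / F_{1-alpha/2}, ratio / F_{alpha/2})."""
    ratio = s1 ** 2 / s2 ** 2
    return ratio / f_high_quantile, ratio / f_low_quantile

f_high = F_UPPER_20_15     # 0.975 quantile of F(20, 15)
f_low = 1 / F_UPPER_15_20  # 0.025 quantile of F(20, 15), by the reciprocal property

lo, hi = variance_ratio_ci(4.3, 3.8, f_high, f_low)
print(round(lo, 3), round(hi, 3))  # about 0.464 and 3.291
```

The interval contains 1, so these data are compatible with equal population variances at the 95% level.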
