
STA408: Statistics for Science and Engineering

Topic 2: Estimation

2.1 Sampling Distributions

Populations and Samples
 A population consists of all subjects that are being studied.
 A sample is a subset of a population.
Population Parameters and Sample Statistics
 A numerical measure calculated for a population data set is called a population parameter.
 A summary measure calculated for a sample data set is called a sample statistic.

Population Distribution and Sampling Distribution


 The population distribution is the probability distribution of the population data.
 The probability distribution of a sample statistic is called its sampling distribution.

Note: A sample statistic that is used to estimate a population parameter is called an estimator.

Example 1
Suppose there are only five students in the STA408 class and their Test 1 scores are as
follows.
70 78 80 80 95
(a) Find the population distribution for the scores of the students.
(b) Find the mean and standard deviation of the data.

(a) Let 𝑥 be the score of a student. The population probability distribution is:

𝑥 𝑃(𝑥)
70
78
80
95

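(b) Mean,
$\mu = \frac{\sum x}{N} =$

Standard deviation,
$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$ or $\sigma = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}}$

$\sigma = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}} =$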
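The calculation in Example 1 can be checked with a short script. This is only a sketch using the Python standard library; the variable names are illustrative and not part of the notes.

```python
from collections import Counter
from math import sqrt

scores = [70, 78, 80, 80, 95]  # Test 1 scores from Example 1
N = len(scores)

# (a) Population distribution: relative frequency of each distinct score
distribution = {x: count / N for x, count in sorted(Counter(scores).items())}

# (b) Population mean and standard deviation (divide by N, not N - 1)
mu = sum(scores) / N
sigma = sqrt((sum(x**2 for x in scores) - sum(scores) ** 2 / N) / N)

print(distribution)
print(mu, round(sigma, 4))
```

Because the five scores are the whole population, the divisor is N rather than N − 1.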

Sampling Distribution of 𝑥̅

Example 2
Refer to the data in Example 1.
(a) Find all possible samples of three scores each that can be selected, without replacement.
(b) Find the mean of each sample.
(c) Find the sampling distribution of 𝑥̅ .

(a) Let $M_1 = 70$, $M_2 = 78$, $M_3 = 80$, $M_4 = 80$ and $M_5 = 95$.

Possible samples          Scores          (b) Mean, 𝑥̅
$\{M_1, M_2, M_3\}$       {70, 78, 80}    228/3 = 76
$\{M_1, M_2, M_4\}$       {70, 78, 80}    228/3 = 76
$\{M_1, M_2, M_5\}$       {70, 78, 95}    243/3 = 81
$\{M_1, M_3, M_4\}$       {70, 80, 80}
$\{M_1, M_3, M_5\}$       {70, 80, 95}
$\{M_1, M_4, M_5\}$       {70, 80, 95}
$\{M_2, M_3, M_4\}$       {78, 80, 80}
$\{M_2, M_3, M_5\}$
$\{M_2, M_4, M_5\}$
$\{M_3, M_4, M_5\}$

From Example 1:
Population mean, $\mu =$
Mean of sample means, $\mu_{\bar{x}} =$
If we consider ALL the possible samples in the sampling distribution of the sample mean, 𝑥̅, then
∴ $\mu_{\bar{x}} = \mu$

(c) Sampling distribution of 𝑥̅ :

𝑥̅               76     76.67   79.33
$P(\bar{X} = \bar{x})$   0.2    0.1     0.1
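To check the enumeration in Example 2, the samples can be generated programmatically. The sketch below assumes each of the 10 possible samples is equally likely; the variable names are illustrative.

```python
from collections import Counter
from itertools import combinations

scores = [70, 78, 80, 80, 95]  # population from Example 1
n = 3                          # sample size

# (a) All samples of size 3 drawn without replacement: C(5, 3) = 10 samples
samples = list(combinations(scores, n))

# (b) Mean of each sample, rounded to 2 decimals as in the notes
means = [round(sum(s) / n, 2) for s in samples]

# (c) Sampling distribution: each sample has probability 1/10
counts = Counter(means)
sampling_dist = {m: counts[m] / len(samples) for m in sorted(counts)}

# Mean of the sample means, computed from the unrounded sample means
mu_xbar = sum(sum(s) / n for s in samples) / len(samples)

for m, p in sampling_dist.items():
    print(m, p)
print(mu_xbar)
```

The printed mean of the sample means can be compared with the population mean from Example 1.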

Mean and Standard Deviation of 𝑥̅

 The mean of the sampling distribution of 𝑥̅, denoted by $\mu_{\bar{x}}$, is equal to the population mean 𝜇, i.e.,
$\mu_{\bar{x}} = \mu$

 The standard deviation of the sampling distribution of 𝑥̅, denoted by $\sigma_{\bar{x}}$, is
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
where 𝜎 is the standard deviation of the population and 𝑛 is the sample size.
Note: This formula holds when the sample observations are independent, that is, when sampling is done with
replacement or when the sample is small relative to the population. It does not require 𝑋 to be normally distributed.

 Sampling from a normally distributed population


If the population from which the samples are drawn is normally distributed with mean 𝜇 and
standard deviation 𝜎, then the sampling distribution of the sample mean, 𝑥̅, will be normally
distributed with mean $\mu_{\bar{x}}$ and standard deviation $\sigma_{\bar{x}}$, irrespective of the sample size.
The distribution of $\bar{X}$ is
$\bar{X} \sim N\left(\mu_{\bar{x}} = \mu,\ \sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}\right)$

 Sampling from a population that is not normally distributed


Most of the time the population from which the samples are selected is NOT normally distributed. In
such cases, the shape of the sampling distribution of 𝑥̅ is inferred from a theorem called Central Limit
Theorem.
 Central Limit Theorem
For a large sample size, the sampling distribution of 𝑥̅ is approximately normal, irrespective
of the shape of the population distribution. The mean and standard deviation of the sampling
distribution of 𝑥̅ are, respectively,
$\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.
The sample size is usually considered to be large if 𝑛 ≥ 30.
Hence, if 𝑛 ≥ 30, then whatever the distribution of 𝑋, the sampling distribution of the sample mean is
approximately
$\bar{X} \sim N\left(\mu_{\bar{x}} = \mu,\ \sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}\right)$.

Population distributions and sampling distributions of 𝑥̅


Table 1: The normal and not normal population distributions together with their respective
sampling distributions of 𝑥̅ for different sample sizes, 𝑛.


Estimates and their notations


The population mean and variance are usually not known, therefore we estimate them as follows.
 To estimate the population mean, 𝜇, we use the sample mean, 𝑥̅.
 To estimate the population variance, $\sigma^2$, we use the sample variance, $s^2$.
Note: The standard error of the mean is the standard deviation of the sample mean, denoted as
$\frac{\sigma}{\sqrt{n}}$ or $\frac{s}{\sqrt{n}}$

Example 3
The mean wage per hour for all 5000 employees who work at a large hotel is RM 27.50, and the standard
deviation is RM 3.70. Let 𝑥̅ be the mean wage per hour for a random sample of employees selected
from this hotel. Find the mean and standard deviation of 𝑥̅ for a sample size of
(a) 30, (b) 75, (c) 200.
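A minimal sketch of Example 3, assuming the sample is small relative to the population of 5000 (the largest case gives 200/5000 = 0.04), so that the standard deviation of 𝑥̅ can be taken as σ/√n. The function name `sd_of_mean` is illustrative.

```python
from math import sqrt

def sd_of_mean(sigma, n):
    """Standard deviation (standard error) of the sample mean: sigma / sqrt(n)."""
    return sigma / sqrt(n)

mu, sigma = 27.50, 3.70  # population mean and standard deviation (RM per hour)

for n in (30, 75, 200):
    mu_xbar = mu  # the mean of x-bar equals the population mean for every n
    print(n, mu_xbar, round(sd_of_mean(sigma, n), 4))
```

The mean of 𝑥̅ stays at RM 27.50 in all three cases, while its standard deviation shrinks as 𝑛 grows.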

Applications of the sampling distribution of the sample mean, 𝑥̅

Example 4
Assume that the weights of all packages of a certain brand of cookies are normally distributed with a
mean of 32 grams and a standard deviation of 0.3 grams. Find the probability that the mean weight of a
random sample of 20 packages of this brand of cookies will be between 31.8 and 31.9 grams.
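One way to evaluate the probability in Example 4 without a z table is sketched below, using `statistics.NormalDist` from the Python standard library (Python 3.8 or later). It relies on the result that 𝑥̅ is normal with mean 𝜇 and standard deviation σ/√n when the population is normal.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 32, 0.3, 20  # grams; values from Example 4

# Sampling distribution of the sample mean: normal with sd sigma / sqrt(n)
xbar_dist = NormalDist(mu=mu, sigma=sigma / sqrt(n))

# P(31.8 < x-bar < 31.9) as a difference of two cumulative probabilities
prob = xbar_dist.cdf(31.9) - xbar_dist.cdf(31.8)
print(round(prob, 4))
```

A table-based answer may differ slightly in the last decimal place because z values are rounded to two decimals before lookup.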


Example 5
The actual weights, 𝑊 kilograms, of fertilizer in a 5 kg bag may be modelled by a normal random variable
with mean 5.25 kg and variance 0.25 kg². A random sample of four 5 kg bags is selected. Calculate the
probability that the mean weight of fertilizer in the four bags is less than 5.30 kg.

2.2 Estimation

Definitions
 The assignment of value(s) to a population parameter based on a value of the corresponding
statistic is called estimation.
 The value(s) assigned to a population parameter based on a value of a sample statistic is called an
estimate.
 The sample statistic that is used to estimate a population parameter is called an estimator.

Properties of a good estimator


 Unbiased
The expected value of the estimator is equal to the parameter being estimated.

 Consistent
As the sample size increases, the value of the estimator approaches the value of the parameter
being estimated.

 Relatively efficient
Of all the statistics that can be used to estimate a parameter, the relatively efficient estimator has
the smallest variance.

Estimation procedure
Step 1: Select a sample.
Step 2: Collect required information from the members of the sample.
Step 3: Calculate the value(s) of the sample statistic(s).
Step 4: Assign value(s) to the corresponding population parameter(s).


 The value of a sample statistic that is used to estimate a population parameter is called a point
estimate.
 In interval estimation, an interval is constructed around the point estimate and it is stated that
this interval is likely to contain the corresponding population parameter.
 Each interval is constructed with regard to a given confidence level and is called a confidence
interval. The confidence interval is given as
Point estimate ± margin of error.
 The confidence level associated with a confidence interval states how much confidence we have
that this interval contains the true population parameter. The confidence level is denoted by
(1 − 𝛼)100%
where 𝛼 is the significance level.
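Examples of point estimates

Mean
  Population (parameter): $\mu = \frac{\sum X}{N}$
  Sample (statistic / point estimate): $\bar{x} = \frac{\sum x}{n}$
Variance
  Population (parameter): $\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}$
  Sample (statistic / point estimate): $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$
Standard deviation
  Population (parameter): $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}}$
  Sample (statistic / point estimate): $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$

where $X_1, X_2, X_3, \ldots, X_N$ are the members of a population and $x_1, x_2, x_3, \ldots, x_n$ are the elements in a
sample.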

Example 6
Following are the 2009 earnings (in thousands of dollars) before taxes for all six employees of a small
company.
88.50 108.40 65.50 52.50 79.80 54.60
Calculate the mean and standard deviation of these data.


Example 7
Assume that the data given in Example 6 are the earnings for six employees of a large company. Calculate
the mean and standard deviation of those data.
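Examples 6 and 7 differ only in whether the six values are treated as a whole population or as a sample. A short sketch of that contrast using the Python `statistics` module (`pstdev` divides by N, `stdev` divides by n − 1):

```python
from statistics import mean, pstdev, stdev

earnings = [88.50, 108.40, 65.50, 52.50, 79.80, 54.60]  # thousands of dollars

# Example 6: the six values are the whole population, so divide by N
mu = mean(earnings)
sigma = pstdev(earnings)

# Example 7: the six values are a sample from a larger population, so divide by n - 1
x_bar = mean(earnings)
s = stdev(earnings)

print(round(mu, 4), round(sigma, 4))
print(round(x_bar, 4), round(s, 4))
```

The two means are identical, but the sample standard deviation is larger than the population one because of the smaller divisor.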

Estimation of a population mean: $\sigma^2$ known

Confidence interval of mean for a specific 𝛼 when 𝜎 is known:
$\bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$

For a 90% confidence interval, 𝛼 = 10% = 0.10 and $z_{\alpha/2} = z_{0.05}$ = _________________;

For a 95% confidence interval, 𝛼 = 5% = 0.05 and $z_{\alpha/2} = z_{0.025}$ = __________________;

For a 99% confidence interval, 𝛼 = 1% = 0.01 and $z_{\alpha/2} = z_{0.005}$ = __________________;

and the margin of error is $z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$.
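The critical values for the three common confidence levels can be read from a z table or computed. The sketch below uses `statistics.NormalDist().inv_cdf` from the Python standard library; the helper name `z_critical` is illustrative.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return z_{alpha/2}: the standard normal value with upper-tail area alpha/2."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for confidence in (0.90, 0.95, 0.99):
    print(confidence, round(z_critical(confidence), 4))
```

Printed tables usually quote these values to two or three decimals, so small rounding differences are expected.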

Confidence intervals

Figure 1: The 90% confidence intervals for 𝜇 constructed by $\bar{x}_1$, $\bar{x}_2$ and $\bar{x}_3$.


2.3 Interval Estimation of one Population Mean

(I) A sample (be it large or small) drawn from a normal distribution with a known 𝝈

Example 8
A publishing company has just published a new college textbook. Before the company decides the price
at which to sell this textbook, it wants to know the average price of all such textbooks in the market. The
research department at the company took a sample of 25 comparable textbooks and collected
information on their prices. This information produced a mean price of RM 145 for this sample. It is
known that the standard deviation of the prices of all such textbooks is RM 35 and the population of such
prices is normal.
(a) What is the point estimate of the mean price of all such college textbooks?
(b) Construct a 90% confidence interval for the mean price of all such textbooks and interpret the
interval.

Example 9
The following data represent a sample of assets (in millions of RM) of 10 companies in Selangor. Find the
90% confidence interval of the mean. Assume that the assets (in millions of RM) of all companies in
Selangor are approximately normally distributed and the standard deviation of the population is 21.154.
12.23 2.89 13.19 73.25 11.59 8.74 7.92 40.22 5.01 2.27


Below are some examples of the outputs for the analysis done using the data given in Example 9.
(i) Minitab software (version 17)
One-Sample Z: Assets_Value

The assumed standard deviation = 21.154

Variable N Mean StDev SE Mean 95% CI


Assets_Value 10 17.73 22.30 6.69 (4.62, 30.84)

(ii) Minitab software (version 21)


One-Sample Z: Asset_Values

Descriptive Statistics
N Mean StDev SE Mean 95% CI for μ
10 17.73 22.30 6.69 (4.62, 30.84)
μ: population mean of Asset_Values
Known standard deviation = 21.154

(iii) Statistics Kingdom

https://www.statskingdom.com/confidence-interval-calculator.html

Note: There are no restrictions on the type of software used. What is important is that you know what
the terms in the output mean.

(II) A large sample drawn from an unknown distribution with a known 𝝈

Example 10
A machine is regulated to dispense liquid into cartons in such a way that the amount of liquid dispensed
on all occasions is known to have a standard deviation of 20 ml.
(a) Find the 95% confidence limits for the mean amount of liquid dispensed if a random sample of
40 cartons had an average content of 266ml.
(b) Find the 99% confidence limits for the mean amount of liquid dispensed if a random sample of
40 cartons had an average content of 266ml.
(c) Find the 95% confidence limits for the mean amount of liquid dispensed if a random sample of
120 cartons had an average content of 266ml.

Note: The width of the confidence interval depends on the size of the margin of error, which depends
on the values of 𝑧, 𝜎 and 𝑛. However, the value of 𝜎 is beyond our control. Therefore, the width of
the confidence interval can be controlled either through the value of 𝑧 (which depends on 𝛼) or through
the size of the sample, 𝑛.
Confidence level and the width of confidence interval
- The larger the confidence level, the wider the confidence interval, and vice versa.
Sample size and the width of confidence interval
- The bigger the size of the sample, the narrower the confidence interval, and vice versa.
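The two width statements can be illustrated with the three cases of Example 10. This is a sketch only; the function name `z_interval` is illustrative and the critical value is computed rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence):
    """Confidence interval for a mean when the population sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

# The three cases of Example 10: sigma = 20 ml, sample mean = 266 ml
for n, confidence in ((40, 0.95), (40, 0.99), (120, 0.95)):
    low, high = z_interval(266, 20, n, confidence)
    print(n, confidence, round(low, 2), round(high, 2), "width:", round(high - low, 2))
```

Comparing the first two lines of output shows the effect of the confidence level; comparing the first and third shows the effect of the sample size.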

t Distribution
Characteristics of a t distribution:
 It is bell-shaped.
 It is symmetric about the mean.
 The mean, median and mode are equal to 0 and are located in the centre of the distribution.
 The curve never touches the 𝑥-axis.
 The variance is greater than 1.
 The t distribution is a family of curves based on the concept of degrees of freedom, 𝜈 which is
related to the sample size.
 As the sample size increases, the t distribution approaches the normal distribution.

Estimation of a population mean: $\sigma^2$ unknown

Confidence interval of mean for a specific 𝛼 when 𝜎 is unknown:
$\bar{x} - t_{\alpha/2,\nu} \frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2,\nu} \frac{s}{\sqrt{n}}$
where the degrees of freedom, 𝜈 = 𝑛 − 1.

Below are some examples of the outputs for the analysis done using the data given in Example 9.
(i) Minitab software (version 17)
One-Sample T: Assets_Value

Variable N Mean StDev SE Mean 95% CI


Assets_Value 10 17.73 22.30 7.05 (1.78, 33.68)

(ii) Minitab software (version 21)


One-Sample T: Asset_Values

Descriptive Statistics
N Mean StDev SE Mean 95% CI for μ
10 17.73 22.30 7.05 (1.78, 33.68)
μ: population mean of Asset_Values


(iii) Microsoft Excel

Asset_Values

Mean 17.731
Standard Error 7.051338321
Median 10.165
Mode #N/A
Standard Deviation 22.29828965
Sample Variance 497.2137211
Kurtosis 4.407003912
Skewness 2.14612104
Range 70.98
Minimum 2.27
Maximum 73.25
Sum 177.31
Count 10
Confidence Level(95.0%) 15.95123549

(III) A small sample drawn from a normal distribution with an unknown $\sigma^2$

Example 11
Dr. K wants to estimate the mean cholesterol level for all adult men living in Shah Alam. He took a sample
of 25 adult men from Shah Alam and found that the mean cholesterol level for this sample is 186mg/dL
with a standard deviation of 12 mg/dL. Assume that the cholesterol levels for all adult men in Shah Alam
are (approximately) normally distributed. Construct a 95% confidence interval for the population mean.
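A sketch of the calculation for Example 11. The Python standard library has no t quantile function, so the critical value $t_{0.025,24} = 2.064$ is taken from a t table and passed in by hand; the function name `t_interval` is illustrative.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_crit):
    """Confidence interval for a mean when sigma is unknown (t distribution)."""
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin

# Example 11: n = 25, so nu = 24; t_{0.025, 24} = 2.064 is read from a t table
low, high = t_interval(x_bar=186, s=12, n=25, t_crit=2.064)
print(round(low, 2), round(high, 2))
```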


(IV) A large sample drawn from an unknown distribution with an unknown $\sigma^2$ (use 𝑡 table)

Example 12
Forty-one randomly selected adults who buy books for general reading were asked how much they
actually spend on books per year. The sample produced a mean of RM 145 and a standard deviation of
RM 30 for such annual expenses. Determine a 99% confidence interval for the corresponding population
mean.

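$n = 41$, $\bar{x} = 145$, $s = 30$, $1 - \alpha = 0.99$, so $\alpha = 0.01$
From the t table, the critical value is $t_{\alpha/2,\nu} = t_{0.005,40} = 2.704$.
99% CI for 𝜇: $145 \pm 2.704 \left(\frac{30}{\sqrt{41}}\right) = (132.3312,\ 157.6688)$
We are 99% confident that the population mean annual spending on books is between RM 132.33 and RM 157.67.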

Example 13
An experienced poultry farmer knows that the mean weight 𝜇 kg for a large population of chickens will
vary from season to season but the standard deviation of the weights should remain at 0.70 kg. A random
sample of 100 chickens is taken from the population and the weight 𝑥 kg of each chicken in the sample is
recorded, giving ∑ 𝑥 = 190.2. Find a 95% confidence interval for 𝜇.

Let 𝑥 be the weight of a chicken. Here 𝜎 = 0.70 is the assumed (known) population standard deviation, based on
past records.
Given 𝜎 = 0.70, 𝑛 = 100 and ∑ 𝑥 = 190.2, therefore $\bar{x} = \frac{190.2}{100} = 1.902$.
95% confidence level: 𝛼 = 0.05, so $z_{\alpha/2} = z_{0.025} = 1.9600$.
95% confidence interval for 𝜇:
$1.902 \pm 1.9600 \left(\frac{0.70}{\sqrt{100}}\right) = (1.7648,\ 2.0392)$

2.4 Interval Estimation of Two Population Means (Independent variables)

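(I) Difference in means of two normal populations, $\mu_1 - \mu_2$ (variances $\sigma_1^2$ and $\sigma_2^2$ are known)
The sampling distribution of $\bar{x}_1 - \bar{x}_2$, which is (approximately) normal, has mean and standard
deviation as follows:
Mean: $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$
Standard deviation: $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$

Interval Estimation of $\mu_1 - \mu_2$
When using the normal distribution, the (1 − 𝛼)100% confidence interval for $\mu_1 - \mu_2$ is
$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\, \sigma_{\bar{x}_1 - \bar{x}_2}$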
Example 14
A survey of low- and middle-income households shows that consumers aged 65 years and older had an
average credit card debt of RM 10,235 and consumers in the 50- to 64-year age group had an average credit
card debt of RM 9,342 at the time of the survey. Suppose that these averages were based on random
samples of 1200 and 1400 people for the two groups, respectively. Further, assume that the population
standard deviations for the two groups were RM 2,800 and RM 2,500, respectively. Let $\mu_1$ and $\mu_2$ be the
respective population means for the two groups, people aged 65 years and older and people in the 50- to
64-year age group. Construct a 95% confidence interval for $\mu_1 - \mu_2$. Based on the interval, are $\mu_1$ and $\mu_2$
equal? Explain.

$\bar{x}_1 = 10235$, $\bar{x}_2 = 9342$, $n_1 = 1200$, $n_2 = 1400$
$\sigma_1 = 2800$, $\sigma_2 = 2500$
Confidence level = 95%, 𝛼 = 0.05, so $z_{\alpha/2} = z_{0.025} = 1.9600$

95% CI for $\mu_1 - \mu_2$:
$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = (10235 - 9342) \pm 1.9600 \sqrt{\frac{2800^2}{1200} + \frac{2500^2}{1400}} = (687.4557,\ 1098.5443)$

We are 95% confident that the difference between the population mean credit card debts for the two groups
is between RM 687.46 and RM 1098.54.

If $\mu_1 = \mu_2$, then $\mu_1 - \mu_2 = 0$. Since 0 is not in the interval, $\mu_1 \neq \mu_2$.
Moreover, $\mu_1 > \mu_2$ because all the values in the interval are positive, i.e. $\mu_1 - \mu_2 > 0$.
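The interval for Example 14 can be reproduced with a few lines of Python. This is a sketch; the function name is illustrative, and the computed critical value (about 1.95996) differs slightly from the table value 1.9600, so the limits agree only to about two decimals.

```python
from math import sqrt
from statistics import NormalDist

def two_mean_z_interval(x1, x2, sigma1, sigma2, n1, n2, confidence):
    """CI for mu1 - mu2 when both population standard deviations are known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)  # sd of (x1-bar minus x2-bar)
    diff = x1 - x2
    return diff - z * se, diff + z * se

low, high = two_mean_z_interval(10235, 9342, 2800, 2500, 1200, 1400, 0.95)
print(round(low, 2), round(high, 2))
print("0 inside interval:", low <= 0 <= high)
```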

(II) Difference in means of two normal populations, $\mu_1 - \mu_2$ (variances $\sigma_1^2 = \sigma_2^2$ and unknown)
When the standard deviations of two populations are equal, we can use 𝜎 for both $\sigma_1$ and $\sigma_2$. However,
since 𝜎 is unknown, we replace it by its point estimator, $s_p$, called the pooled standard deviation.

Pooled standard deviation for two samples
$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$

Estimator of the standard deviation of $\bar{x}_1 - \bar{x}_2$
$s_{\bar{x}_1 - \bar{x}_2} = s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$

Interval Estimation of $\mu_1 - \mu_2$
The (1 − 𝛼)100% confidence interval for $\mu_1 - \mu_2$ is
$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\nu}\, s_{\bar{x}_1 - \bar{x}_2}$
where the degrees of freedom, $\nu = n_1 + n_2 - 2$.

Example 15
A consumer agency wanted to estimate the difference in mean amounts of caffeine in two different brands
of coffee. The agency took a sample of 15 one-pound jars of Brand I coffee that showed the mean amount
of caffeine in these jars to be 80 milligrams per jar with a standard deviation of 5 milligrams. Another sample
of 12 one-pound jars of Brand II coffee gave a mean amount of caffeine equal to 77 milligrams per jar
with a standard deviation of 6 milligrams. Construct a 98% confidence interval for the difference between
the mean amounts of caffeine in one-pound jars of these two brands of coffee. Assume that the two
populations are normally distributed and that the standard deviations of two populations are equal.


Example 16
The following Minitab output (version 17) was obtained from two independent samples selected from
two normally distributed populations with unknown but equal standard deviations.
Two-Sample T-Test and CI: S1, S2

Two-sample T for S1 vs S2

      N   Mean  StDev  SE Mean
S1   13  48.94   8.31      2.3
S2   10  45.95   9.97      3.2
Difference = μ (S1) - μ (S2)
Estimate for difference: 2.99
95% CI for difference: (-4.94, 10.91)
T-Test of difference = 0 (vs ≠): T-Value = 0.78 P-Value = 0.442 DF = 21
Both use Pooled StDev = 9.0587

(a) Verify that the pooled standard deviation of the data is 9.0587
(b) Show that the 95% confidence interval of the difference in mean of the two populations is
between −4.94 and 10.91.
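A sketch of the verification asked for in Example 16, using only the rounded summary statistics printed in the output. The critical value $t_{0.025,21} = 2.080$ is taken from a t table; because the inputs are rounded, the results can differ from Minitab's in the last decimal place.

```python
from math import sqrt

# Summary statistics from the Minitab output in Example 16
n1, s1, x1 = 13, 8.31, 48.94
n2, s2, x2 = 10, 9.97, 45.95

# (a) Pooled standard deviation with nu = n1 + n2 - 2 degrees of freedom
nu = n1 + n2 - 2
s_p = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / nu)

# (b) 95% CI for mu1 - mu2; t_{0.025, 21} = 2.080 is read from a t table
t_crit = 2.080
se = s_p * sqrt(1 / n1 + 1 / n2)
low = (x1 - x2) - t_crit * se
high = (x1 - x2) + t_crit * se

print(nu, round(s_p, 4), round(low, 2), round(high, 2))
```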

Two-Sample T-Test and CI

Method
μ₁: population mean of Sample 1
µ₂: population mean of Sample 2
Difference: μ₁ - µ₂
Equal variances are assumed for this analysis.

Descriptive Statistics
(Minitab version 21)
Sample     N   Mean  StDev  SE Mean
Sample 1  13  48.94   8.31      2.3
Sample 2  10  45.95   9.97      3.2

Estimation for Difference


Difference Pooled StDev 95% CI for Difference
2.99 9.06 (-4.93, 10.91)

Test
Null hypothesis H₀: μ₁ - µ₂ = 0
Alternative hypothesis H₁: μ₁ - µ₂ ≠ 0

T-Value DF P-Value
0.78 21 0.441


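(III) Difference in means of two normal populations, $\mu_1 - \mu_2$ (variances $\sigma_1^2 \neq \sigma_2^2$ and unknown)

Degrees of freedom
$\nu = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(\frac{s_1^2}{n_1}\right)^2}{n_1 - 1} + \frac{\left(\frac{s_2^2}{n_2}\right)^2}{n_2 - 1}}$

Estimator of the standard deviation of $\bar{x}_1 - \bar{x}_2$
$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

Interval Estimation of $\mu_1 - \mu_2$
The (1 − 𝛼)100% confidence interval for $\mu_1 - \mu_2$ is
$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\nu}\, s_{\bar{x}_1 - \bar{x}_2}$
where the degrees of freedom, 𝜈, is given by the formula above, rounded down to the nearest integer.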

Example 17
Refer to Example 15. Construct a 98% confidence interval for the difference between the mean amounts
of caffeine in one-pound jars of these two brands. Assume that two populations are normally distributed
and that the standard deviations of the two populations are not equal.


Example 18
The following Minitab output was obtained from two independent samples selected from two normally
distributed populations with unknown and unequal standard deviations.
Two-Sample T-Test and CI: S1, S2

Two-sample T for S1 vs S2

(Minitab version 17)
      N   Mean  StDev  SE Mean
S1   13  48.94   8.31      2.3
S2   10  45.95   9.97      3.2
Difference = μ (S1) - μ (S2)
Estimate for difference: 2.99
95% CI for difference: (-5.25, 11.23)
T-Test of difference = 0 (vs ≠): T-Value = 0.77 P-Value = 0.455 DF = 17

(a) Verify that the degrees of freedom is 17.


(b) Show that the 95% confidence interval of the difference in mean of the two populations is
between −5.25 and 11.23.
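A sketch of the check for Example 18, working from the rounded summary statistics in the output. The degrees of freedom are rounded down, which matches the DF = 17 reported by Minitab, and $t_{0.025,17} = 2.110$ is taken from a t table.

```python
from math import floor, sqrt

# Summary statistics from the Minitab output in Example 18
n1, s1, x1 = 13, 8.31, 48.94
n2, s2, x2 = 10, 9.97, 45.95

v1, v2 = s1**2 / n1, s2**2 / n2  # s^2 / n for each sample

# (a) Degrees of freedom for unequal variances, rounded down
nu_exact = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
nu = floor(nu_exact)

# (b) 95% CI for mu1 - mu2; t_{0.025, 17} = 2.110 is read from a t table
t_crit = 2.110
se = sqrt(v1 + v2)
low = (x1 - x2) - t_crit * se
high = (x1 - x2) + t_crit * se

print(round(nu_exact, 2), nu, round(low, 2), round(high, 2))
```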

Two-Sample T-Test and CI


Method
μ₁: population mean of Sample 1
µ₂: population mean of Sample 2
Difference: μ₁ - µ₂
Equal variances are not assumed for this analysis.

Descriptive Statistics
(Minitab version 21)
Sample     N   Mean  StDev  SE Mean
Sample 1  13  48.94   8.31      2.3
Sample 2  10  45.95   9.97      3.2

Estimation for Difference


Difference 95% CI for Difference
2.99 (-5.25, 11.23)

Test
Null hypothesis H₀: μ₁ - µ₂ = 0
Alternative hypothesis H₁: μ₁ - µ₂ ≠ 0
T-Value DF P-Value
0.77 17 0.454


2.5 Interval Estimation of Two Population Means (Dependent variables)

Mean difference of two normal distributions for paired samples, $\mu_d$


Two samples are said to be paired samples when for each data value collected from one sample there is
a corresponding data value collected from the second sample, and both these data values are collected
from the same source.

Notation for paired samples


In paired samples, the difference between the two data values for each element of the two samples is
denoted by 𝑑, called the paired difference. The degrees of freedom for the paired samples, 𝜈 = 𝑛 − 1.
 $\mu_d$ = the mean of the paired differences of the population.
 $\sigma_d$ = the standard deviation of the paired differences of the population (usually unknown).
 𝑑̅ = the mean of the paired differences of the sample.
 $s_d$ = the standard deviation of the paired differences of the sample.
 𝑛 = the number of paired difference values.

Mean and standard deviation of the paired differences of two samples

Mean, 𝑑̅: $\bar{d} = \frac{\sum d}{n}$
Standard deviation, $s_d$: $s_d = \sqrt{\frac{\sum d^2 - \frac{(\sum d)^2}{n}}{n - 1}}$

Mean and standard deviation of 𝑑̅

If $\sigma_d$ is known and either the sample size is large (𝑛 ≥ 30) or the population is normally distributed, then
the sampling distribution of 𝑑̅ is approximately normal with its mean and standard deviation given as

Mean, $\mu_{\bar{d}}$: $\mu_{\bar{d}} = \mu_d$
Standard deviation, $\sigma_{\bar{d}}$: $\sigma_{\bar{d}} = \frac{\sigma_d}{\sqrt{n}}$

If $\sigma_d$ is unknown, then $\sigma_{\bar{d}}$ is estimated by $s_{\bar{d}}$, i.e.,
$s_{\bar{d}} = \frac{s_d}{\sqrt{n}}$

Confidence interval for $\mu_d$

The (1 − 𝛼)100% confidence interval for $\mu_d$ is
$\bar{d} \pm t_{\alpha/2,\nu}\, s_{\bar{d}}$
where the degrees of freedom, 𝜈 = 𝑛 − 1.


Example 19
A researcher wanted to find the effect of a special diet on systolic blood pressure. She selected a sample of
seven adults and put them on this dietary plan for 3 months. The table below gives the systolic blood
pressure (in mm Hg) of these seven adults before and after the completion of this plan.

Before 210 180 195 220 231 199 224


After 193 186 186 223 220 183 233

Let $\mu_d$ be the mean reduction in the systolic blood pressures due to this special dietary plan for the
population of all adults. Construct a 95% confidence interval for $\mu_d$. Assume that the population of paired
differences is (approximately) normally distributed.

Let 𝑑 =
𝑑
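A sketch of the paired calculation for Example 19, taking 𝑑 = Before − After so that a positive 𝑑 is a reduction. The critical value $t_{0.025,6} = 2.447$ is taken from a t table, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

before = [210, 180, 195, 220, 231, 199, 224]
after = [193, 186, 186, 223, 220, 183, 233]

# Paired differences, d = Before - After
d = [b - a for b, a in zip(before, after)]
n = len(d)

d_bar = mean(d)       # mean of the paired differences
s_d = stdev(d)        # sample standard deviation of the differences
se = s_d / sqrt(n)    # estimated standard deviation of d-bar

t_crit = 2.447        # t_{0.025, 6} read from a t table (nu = n - 1 = 6)
low, high = d_bar - t_crit * se, d_bar + t_crit * se

print(d)
print(d_bar, round(s_d, 2), round(se, 2), round(low, 2), round(high, 2))
```

Reversing the subtraction to After − Before flips the sign of 𝑑̅ and of both interval limits but leaves the width unchanged.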

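The following outputs are obtained from the data in Example 19. Take note of the sign of the mean difference, 𝑑̅,
in each output: it depends on the order of subtraction (Before − After or After − Before).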

Software Output
Paired T-Test and CI: Before, After

Paired T for Before - After

(Minitab version 17)
             N    Mean  StDev  SE Mean
Before       7  208.43  18.10     6.84
After        7  203.43  21.08     7.97
Difference   7    5.00  10.79     4.08

95% CI for mean difference: (-4.98, 14.98)


T-Test of mean difference = 0 (vs ≠ 0): T-Value = 1.23 P-Value = 0.266


Paired T-Test and CI: After, Before

Paired T for After - Before

(Minitab version 17)
             N    Mean  StDev  SE Mean
After        7  203.43  21.08     7.97
Before       7  208.43  18.10     6.84
Difference   7   -5.00  10.79     4.08

95% CI for mean difference: (-14.98, 4.98)


T-Test of mean difference = 0 (vs ≠ 0): T-Value = -1.23 P-Value = 0.266

Paired T-Test and CI: Before, After


Descriptive Statistics
Sample N Mean StDev SE Mean
Before 7 208.43 18.10 6.84
After 7 203.43 21.08 7.97

Estimation for Paired Difference


(Minitab version 21)
Mean  StDev  SE Mean  95% CI for μ_difference
5.00  10.79     4.08  (-4.98, 14.98)

µ_difference: population mean of (Before - After)

Test
Null hypothesis H₀: μ_difference = 0
Alternative hypothesis H₁: μ_difference ≠ 0
T-Value P-Value
1.23 0.266

(Microsoft Excel)
t-Test: Paired Two Sample for Means

                               Before        After
Mean                           208.4285714   203.4285714
Variance                       327.6190476   444.2857143
Observations                   7             7
Pearson Correlation            0.859160448
Hypothesized Mean Difference   0
df                             6
t Stat                         1.226498265
P(T<=t) one-tail               0.13297784
t Critical one-tail            1.943180281
P(T<=t) two-tail               0.26595568
t Critical two-tail            2.446911851


Chi-square ($\chi^2$) Distribution
 A distribution based on degrees of freedom, 𝜈.
 The symbol is $\chi^2$.
 The chi-square distribution is obtained from the values of $\frac{(n - 1)s^2}{\sigma^2}$ when random samples are
selected from a normally distributed population whose variance is $\sigma^2$.
 A chi-square variable cannot be negative.
 The distribution is skewed to the right.
 At about 100 degrees of freedom, the chi-square distribution becomes approximately normal.
 The area under each chi-square distribution is equal to 1.00 or 100%.

Figure 1: The Chi-Square Family of Curves

Figure 2: Chi-Square Distribution for d.f. = 𝑛 − 1.

Example 20
Find the values of $\chi^2_{\alpha/2}$ and $\chi^2_{1-\alpha/2}$ for a 90% confidence interval when 𝑛 = 25.


2.6 Interval Estimation of One Population Variance


The assumptions for finding a confidence interval for a variance:
 The sample is a random sample.
 The population must be normally distributed.

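The (1 − 𝛼)100% confidence interval for $\sigma^2$ is
$\frac{(n - 1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n - 1)s^2}{\chi^2_{1-\alpha/2}}$
where degrees of freedom, 𝜈 = 𝑛 − 1, and $\chi^2_{\alpha/2}$ and $\chi^2_{1-\alpha/2}$ are the chi-square values with areas 𝛼/2 and
1 − 𝛼/2, respectively, to their right.

The (1 − 𝛼)100% confidence interval for 𝜎 is
$\sqrt{\frac{(n - 1)s^2}{\chi^2_{\alpha/2}}} < \sigma < \sqrt{\frac{(n - 1)s^2}{\chi^2_{1-\alpha/2}}}$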

Example 21
Find the 95% confidence interval for the variance and standard deviation of the nicotine content of
cigarettes manufactured if a sample of 20 cigarettes has a standard deviation of 1.6 milligrams.

Example 22
Find the 90% confidence interval for the variance and standard deviation for the price in dollars of an
adult single-day ski lift ticket. The data represent a selected sample of nationwide ski resorts. Assume the
variable is normally distributed.
59 54 53 52 51 39 49 46 49 48
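A sketch of the calculation for Example 22. The Python standard library has no chi-square quantile function, so the two critical values for 𝜈 = 9 are taken from a chi-square table; the variable names are illustrative.

```python
from math import sqrt
from statistics import variance

prices = [59, 54, 53, 52, 51, 39, 49, 46, 49, 48]  # ski lift ticket prices (dollars)
n = len(prices)
s2 = variance(prices)  # sample variance (divides by n - 1)

# Chi-square table values for nu = n - 1 = 9 and a 90% interval
chi2_right = 16.919  # area 0.05 to its right
chi2_left = 3.325    # area 0.95 to its right

var_low = (n - 1) * s2 / chi2_right
var_high = (n - 1) * s2 / chi2_left
sd_low, sd_high = sqrt(var_low), sqrt(var_high)

print(round(s2, 1), round(var_low, 1), round(var_high, 1))
print(round(sd_low, 2), round(sd_high, 2))
```

The larger chi-square value gives the lower limit, because it appears in the denominator.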


Some examples of statistical outputs for the confidence interval for one variance using the data given in
Example 22.
Software Output
Test and CI for One Variance: ski_lift_ticket

Method

The chi-square method is only for the normal distribution.


The Bonett method is for any continuous distribution.

(Minitab version 17)
Statistics
Variable          N  StDev  Variance
ski_lift_ticket  10   5.31      28.2

90% Confidence Intervals


CI for CI for
Variable Method StDev Variance
ski_lift_ticket Chi-Square (3.87, 8.74) (15.0, 76.4)
Bonett (3.35, 10.09) (11.2, 101.8)

Test and CI for One Variance: Ski-lift Ticket

Method
σ: standard deviation of Ski-lift Ticket
The Bonett method is valid for any continuous distribution.
The chi-square method is valid only for the normal distribution.
(Minitab version 21)
Descriptive Statistics
95% CI for σ 95% CI for σ
N StDev Variance using Bonett using Chi-Square
10 5.31 28.2 (2.99, 11.73) (3.65, 9.70)

F distribution
 The values of F cannot be negative because variances are always positive or zero.
 The distribution is positively skewed.
 The mean value of F is approximately equal to 1.
 The F distribution is a family of curves based on degrees of freedom of variance of the numerator
and the degrees of freedom of the variance of the denominator.

Figure 3: The F family of curves.


2.7 Interval Estimation of Two Population Variances: Estimating the Ratio of Two Variances

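 The point estimate of the ratio of two population variances, $\frac{\sigma_1^2}{\sigma_2^2}$, is given by the ratio of the
sample variances, $\frac{s_1^2}{s_2^2}$.
 If $\sigma_1^2$ and $\sigma_2^2$ are the variances of normal populations, we can establish an interval estimate of
$\frac{\sigma_1^2}{\sigma_2^2}$ by using the statistic
$F = \frac{\sigma_2^2 s_1^2}{\sigma_1^2 s_2^2}$
 The random variable 𝐹 has an 𝐹-distribution with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees of
freedom.

The (1 − 𝛼)100% confidence interval for the ratio of two variances is
$\frac{s_1^2}{s_2^2} \cdot \frac{1}{F_{\alpha/2,\nu_1,\nu_2}} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{s_1^2}{s_2^2} \cdot F_{\alpha/2,\nu_2,\nu_1}$

The (1 − 𝛼)100% confidence interval for the ratio of two standard deviations is
$\sqrt{\frac{s_1^2}{s_2^2} \cdot \frac{1}{F_{\alpha/2,\nu_1,\nu_2}}} < \frac{\sigma_1}{\sigma_2} < \sqrt{\frac{s_1^2}{s_2^2} \cdot F_{\alpha/2,\nu_2,\nu_1}}$

Note: $s_1^2 > s_2^2$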

Example 23
The following Minitab output was obtained from two independent samples selected from two normally
distributed populations with unknown and unequal variances. Show that the lower limit for the 95%
confidence interval of the ratio of variances and standard deviations for the two populations are as given
in the output.

Test and CI for Two Variances: S1, S2


Statistics

(Minitab version 17)
Variable   N  StDev  Variance  95% CI for StDevs
S1        13  8.309    69.038  (5.958, 13.716)
S2         9  6.564    43.092  (4.434, 12.576)
Ratio of standard deviations = 1.266
Ratio of variances = 1.602

95% Confidence Intervals


         CI for StDev     CI for Variance
Method   Ratio            Ratio
F        (0.618, 2.372)   (0.381, 5.626)
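A sketch of the check for Example 23, using the sample variances from the output. The two F critical values are taken from an F table at 𝛼/2 = 0.025 and are rounded to two decimals, so the limits agree with Minitab's only to about two or three significant figures.

```python
from math import sqrt

# Sample variances and sizes from the Minitab output in Example 23
s1_sq, n1 = 69.038, 13
s2_sq, n2 = 43.092, 9
ratio = s1_sq / s2_sq  # point estimate of the ratio of variances

# F table values for alpha/2 = 0.025 with nu1 = 12 and nu2 = 8
f_12_8 = 4.20  # F_{0.025, 12, 8}
f_8_12 = 3.51  # F_{0.025, 8, 12}

var_low = ratio / f_12_8
var_high = ratio * f_8_12
sd_low, sd_high = sqrt(var_low), sqrt(var_high)

print(round(ratio, 3), round(var_low, 3), round(var_high, 3))
print(round(sd_low, 3), round(sd_high, 3))
```

Note that the degrees of freedom are swapped between the lower and the upper limit.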


Test and CI for Two Variances

Method
σ₁²: variance of Sample 1
σ₂²: variance of Sample 2
Ratio: σ₁²/σ₂²
F method was used. This method is accurate for normal data only.

Descriptive Statistics
Sample N StDev Variance 95% CI for σ²
Sample 1 13 8.309 69.038 (35.500, 188.123)
Sample 2 9 6.564 43.092 (19.660, 158.155)

Ratio of Variances
Estimated Ratio 95% CI for Ratio using F
1.60211 (0.381, 5.626)

Test
Null hypothesis H₀: σ₁² / σ₂² = 1
Alternative hypothesis H₁: σ₁² / σ₂² ≠ 1
Significance level α = 0.05

(Minitab version 21)
Method  Test Statistic  DF1  DF2  P-Value
F       1.60             12    8    0.513


Example 24
A study was conducted by the Department of Zoology at Virginia Tech to estimate the difference in the
amounts of the chemical orthophosphorus measured at two different stations on the James River.
Orthophosphorus was measured in milligrams per litre. Thirteen samples were collected from station 1,
and 11 samples were obtained from station 2. The 13 samples from station 1 had an average
orthophosphorus content of 3.84 milligrams per litre and a standard deviation of 3.07 milligrams per
litre, while the 11 samples from station 2 had an average content of 1.49 milligrams per litre and a
standard deviation of 0.80 milligrams per litre. Assume that the observations came from normal
populations.
(a) Construct a 98% confidence interval for the ratio of two variances and standard deviations. Based
on the confidence interval, what can you conclude about the two population variances?
(b) From the result in (a), construct the 98% confidence interval for the difference in the population
mean amounts of the chemical orthophosphorus measured at two different stations. Based on the
interval, is there a significant difference in the two population means?


Additional Notes

Interval Estimation for Two Populations


For the confidence interval for the difference in two population means, $\mu_1 - \mu_2$,
 If the value of 0 is in the interval, we can conclude that $\mu_1 = \mu_2$ because $\mu_1 - \mu_2 = 0$.
 However, if the value of 0 is not in the interval, then we can conclude that $\mu_1 \neq \mu_2$ because
$\mu_1 - \mu_2 \neq 0$.

In a similar manner, when we consider the confidence interval for the ratio of two population variances,
$\frac{\sigma_1^2}{\sigma_2^2}$,

 If the value of 1 is in the interval, we can conclude that $\sigma_1^2 = \sigma_2^2$ because $\frac{\sigma_1^2}{\sigma_2^2} = 1$.

 However, if the value of 1 is not in the interval, then we can conclude that $\sigma_1^2 \neq \sigma_2^2$ because $\frac{\sigma_1^2}{\sigma_2^2} \neq 1$.

For example, if we consider the confidence interval for Example 24, the 98% confidence interval for $\frac{\sigma_1^2}{\sigma_2^2}$ is
(3.127, 63.326). Since the value of 1 is not in the interval, we can conclude that $\sigma_1^2 \neq \sigma_2^2$ because $\frac{\sigma_1^2}{\sigma_2^2} \neq 1$.

Remember:
To draw a conclusion from the confidence interval for the difference in two population means, check whether the
value of 0 is in the interval.

To draw a conclusion from the confidence interval for the ratio of two population variances, check whether the
value of 1 is in the interval.
