Topic 2 - Estimation (Students' Notes)
Topic 2: Estimation
2.1 Sampling Distributions
Populations and Samples
A population consists of all subjects that are being studied.
A sample is a subset of a population.
Population Parameters and Sample Statistics
A numerical measure calculated for a population data set is called a population parameter.
A summary measure calculated for a sample data set is called a sample statistic.
Note: A sample statistic that is used to estimate a population parameter is called an estimator.
Example 1
Suppose there are only five students in the STA 408 class and the Test 1 scores of these five students are as
below.
70 78 80 80 95
(a) Find the population distribution for the scores of the students.
(b) Find the mean and standard deviation of the data.
(a) Let 𝑥 be the score of a student. The population probability distribution is:
𝑥 𝑃(𝑥)
70
78
80
95
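(b) Mean,
$\mu = \frac{\sum x}{N} =$
Standard deviation,
$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$ or $\sigma = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}}$
$\sigma = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}} =$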
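As a check on part (b), the two forms of the population standard deviation formula can be evaluated side by side. This is a minimal sketch using only the Python standard library; the variable names are illustrative and not part of the notes.

```python
from math import sqrt

scores = [70, 78, 80, 80, 95]  # Test 1 scores from Example 1 (the whole population)
N = len(scores)

mu = sum(scores) / N  # population mean

# Definitional form: average squared deviation from the mean.
sigma_def = sqrt(sum((x - mu) ** 2 for x in scores) / N)

# Shortcut form: uses only the sum of x and the sum of x squared.
sum_x = sum(scores)
sum_x2 = sum(x * x for x in scores)
sigma_short = sqrt((sum_x2 - sum_x ** 2 / N) / N)

print(f"mu = {mu:.2f}")
print(f"sigma (definition) = {sigma_def:.4f}")
print(f"sigma (shortcut)   = {sigma_short:.4f}")
```

Both forms divide by N because the five students are treated as the entire population.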
STA408 Topic 2: Estimation
Sampling Distribution of 𝑥̅
Example 2
Refer to the data in Example 1.
(a) Find all possible samples of three scores each that can be selected, without replacement.
(b) Find the mean of each sample.
(c) Find the sampling distribution of 𝑥̅ .
Samples and sample means:
{M₁, M₂, M₃} = {70, 78, 80}, 𝑥̅ = 76
{M₁, M₂, M₄} = {70, 78, 80}, 𝑥̅ = 76
{M₁, M₂, M₅} = {70, 78, 95}, 𝑥̅ = 81
{M₁, M₃, M₄} = {70, 80, 80}
{M₁, M₃, M₅} = {70, 80, 95}
{M₁, M₄, M₅}
{M₂, M₃, M₄}
From Example 1:
Population mean, 𝜇 =
Mean of sample means, 𝜇_𝑥̅ =
∴ 𝜇_𝑥̅ = 𝜇
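The full sampling distribution in Example 2 can be enumerated by machine. The sketch below assumes the five scores from Example 1 and uses `itertools.combinations` to list every sample of three drawn without replacement; it then confirms that the mean of the sample means equals the population mean.

```python
from itertools import combinations
from statistics import fmean

scores = [70, 78, 80, 80, 95]  # population from Example 1

# combinations() works on positions, so the two students who scored 80
# are treated as different students and produce separate samples.
samples = list(combinations(scores, 3))
sample_means = [fmean(s) for s in samples]

for s, m in zip(samples, sample_means):
    print(s, round(m, 2))

mu = fmean(scores)
mu_xbar = fmean(sample_means)
print(f"mu = {mu:.2f}, mean of sample means = {mu_xbar:.2f}")
```

There are 5C3 = 10 possible samples, and each score appears in six of them, which is why the two means agree.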
Example 3
The mean wage per hour for all 5000 employees who work at a large hotel is RM 27.50, and the standard
deviation is RM 3.70. Let 𝑥̅ be the mean wage per hour for a random sample of certain employees selected
from this company. Find the mean and standard deviation of 𝑥̅ for a sample size of
(a) 30, (b) 75, (c) 200.
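A minimal sketch for Example 3, assuming the usual results that the mean of 𝑥̅ equals 𝜇 and that its standard deviation is 𝜎/√𝑛 when the sample is small relative to the population (commonly taken as 𝑛/𝑁 ≤ 0.05). The function name is illustrative only.

```python
from math import sqrt

def mean_and_sd_of_xbar(mu, sigma, n):
    """Return (mean, standard deviation) of the sample mean for sample size n."""
    return mu, sigma / sqrt(n)

N, mu, sigma = 5000, 27.50, 3.70  # values from Example 3
for n in (30, 75, 200):
    m, sd = mean_and_sd_of_xbar(mu, sigma, n)
    print(f"n = {n:3d}: n/N = {n / N:.3f}, mean = {m:.2f}, sd = {sd:.4f}")
```

The printed ratio 𝑛/𝑁 lets you check the small-sample condition for each part before using 𝜎/√𝑛.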
Example 4
Assume that the weights of all packages of a certain brand of cookies are normally distributed with a
mean of 32 grams and a standard deviation of 0.3 grams. Find the probability that the mean weight of a
random sample of 20 packages of this brand of cookies will be between 31.8 and 31.9 grams.
Example 5
The actual weights, 𝑊 kilograms, of fertilizer in a 5 kg bag may be modelled by a normal random variable
with mean 5.25 kg and variance 0.25 kg². A random sample of four 5 kg bags is selected. Calculate the
probability that the mean weight of fertilizer of the four bags is less than 5.30 kg.
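Examples 4 and 5 both ask for a probability about 𝑥̅ when the population is normal, so 𝑥̅ is normal with mean 𝜇 and standard deviation 𝜎/√𝑛. The sketch below uses `statistics.NormalDist` from the Python standard library in place of a z table; the helper name `xbar_dist` is an assumption made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def xbar_dist(mu, sigma, n):
    """Sampling distribution of the mean of n draws from Normal(mu, sigma)."""
    return NormalDist(mu, sigma / sqrt(n))

# Example 4: P(31.8 < xbar < 31.9) with mu = 32, sigma = 0.3, n = 20
d4 = xbar_dist(32, 0.3, 20)
p4 = d4.cdf(31.9) - d4.cdf(31.8)

# Example 5: the variance is 0.25, so sigma = sqrt(0.25) = 0.5; n = 4
d5 = xbar_dist(5.25, sqrt(0.25), 4)
p5 = d5.cdf(5.30)

print(f"Example 4: {p4:.4f}")
print(f"Example 5: {p5:.4f}")
```

Answers read from a printed z table may differ slightly in the last decimal place because the table rounds z to two decimals.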
2.2 Estimation
Definitions
The assignment of value(s) to a population parameter based on a value of the corresponding
statistic is called estimation.
The value(s) assigned to a population parameter based on a value of a sample statistic is called an
estimate.
The sample statistic that is used to estimate a population parameter is called an estimator.
Consistent
As sample size increases, the value of the estimator approaches the value of the parameter
estimated.
Relatively efficient
Of all the statistics that can be used to estimate a parameter, the relatively efficient estimator has
the smallest variance.
Estimation procedure
Step 1: Select a sample.
Step 2: Collect required information from the members of the sample.
Step 3: Calculate the value(s) of the sample statistic(s).
Step 4: Assign value(s) to the corresponding population parameter(s).
The value of a sample statistic that is used to estimate a population parameter is called a point
estimate.
In interval estimation, an interval is constructed around the point estimate, and it is stated that
this interval is likely to contain the corresponding population parameter.
Each interval is constructed with regard to a given confidence level and is called a confidence
interval. The confidence interval is given as
Point estimate ± margin of error.
The confidence level associated with a confidence interval states how much confidence we have
that this interval contains the true population parameter. The confidence level is denoted by
(1 − 𝛼)100%
where 𝛼 is the significance level.
Examples of point estimates
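Measure | Population (Parameter) | Sample (Statistic / point estimate)
Mean | $\mu = \frac{\sum X}{N}$ | $\bar{x} = \frac{\sum x}{n}$
Variance | $\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}$ | $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$
Standard deviation | $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}}$ | $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}}$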
Example 6
Following are the 2009 earnings (in thousands of dollars) before taxes for all six employees of a small
company.
88.50 108.40 65.50 52.50 79.80 54.60
Calculate the mean and standard deviation of these data.
Example 7
Assume that the data given in Example 6 are the earnings for six employees of a large company. Calculate
the mean and standard deviation of those data.
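Examples 6 and 7 use the same six numbers but differ in whether they form a population or a sample. A minimal sketch with the Python standard library, where `pstdev` divides by 𝑁 and `stdev` divides by 𝑛 − 1:

```python
from statistics import fmean, pstdev, stdev

earnings = [88.50, 108.40, 65.50, 52.50, 79.80, 54.60]  # thousands of dollars

mean = fmean(earnings)

# Example 6: the six employees are the whole population, so divide by N.
sigma = pstdev(earnings)

# Example 7: the six employees are a sample from a large company,
# so divide by n - 1.
s = stdev(earnings)

print(f"mean = {mean:.4f}")
print(f"population sd (Example 6) = {sigma:.4f}")
print(f"sample sd (Example 7)     = {s:.4f}")
```

The mean is identical in both cases; only the standard deviation changes, and the sample version is always the larger of the two.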
Confidence intervals
(I) A sample (be it large or small) drawn from a normal distribution with a known 𝝈
Example 8
A publishing company has just published a new college textbook. Before the company decides the price
at which to sell this textbook, it wants to know the average price of all such textbooks in the market. The
research department at the company took a sample of 25 comparable textbooks and collected
information on their prices. This information produced a mean price of RM 145 for this sample. It is
known that the standard deviation of the prices of all such textbooks is RM 35 and the population of such
prices is normal.
(a) What is the point estimate of the mean price of all such college textbooks?
(b) Construct a 90% confidence interval for the mean price of all such textbooks and interpret the
interval.
Example 9
The following data represent a sample of assets (in millions of RM) of 10 companies in Selangor. Find the
90% confidence interval of the mean. Assume that the assets (in millions of RM) of all companies in
Selangor are approximately normally distributed and the standard deviation of the population is 21.154.
12.23 2.89 13.19 73.25 11.59 8.74 7.92 40.22 5.01 2.27
Below are some examples of the outputs for the analysis done using the data given in Example 9.
(i) Minitab software (version 17)
One-Sample Z: Assets_Value
Descriptive Statistics
N Mean StDev SE Mean 95% CI for μ
10 17.73 22.30 6.69 (4.62, 30.84)
μ: population mean of Asset_Values
Known standard deviation = 21.154
https://www.statskingdom.com/confidence-interval-calculator.html
Note: There are no restrictions on the type of software used. What is important is that you know what
the terms in the output mean.
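The terms in the One-Sample Z output can be traced back to the formula point estimate ± margin of error. The sketch below assumes the Example 9 data and the known 𝜎 = 21.154, and uses `NormalDist().inv_cdf` instead of a z table. Note that the Minitab output uses a 95% level while Example 9 asks for 90%, so both are printed.

```python
from math import sqrt
from statistics import NormalDist, fmean

def z_interval(xbar, sigma, n, level):
    """Confidence interval for a mean when the population sd sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # z_{alpha/2}
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

assets = [12.23, 2.89, 13.19, 73.25, 11.59, 8.74, 7.92, 40.22, 5.01, 2.27]
xbar, sigma, n = fmean(assets), 21.154, len(assets)

print("SE mean:", round(sigma / sqrt(n), 2))
print("95% CI:", tuple(round(v, 2) for v in z_interval(xbar, sigma, n, 0.95)))
print("90% CI:", tuple(round(v, 2) for v in z_interval(xbar, sigma, n, 0.90)))
```

"SE Mean" in the output is 𝜎/√𝑛, computed from the known 𝜎 rather than from the sample StDev.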
Example 10
A machine is regulated to dispense liquid into cartons in such a way that the amount of liquid dispensed
on all occasions is known to have a standard deviation of 20 ml.
(a) Find the 95% confidence limits for the mean amount of liquid dispensed if a random sample of
40 cartons had an average content of 266ml.
(b) Find the 99% confidence limits for the mean amount of liquid dispensed if a random sample of
40 cartons had an average content of 266ml.
(c) Find the 95% confidence limits for the mean amount of liquid dispensed if a random sample of
120 cartons had an average content of 266ml.
Note: The width of the confidence interval depends on the size of the margin of error, which depends
on the values of 𝑧, 𝜎 and 𝑛. However, the value of 𝜎 is beyond our control. Therefore, the width of
the confidence interval can be controlled either through the value of 𝑧 (which depends on 𝛼) or through
the size of the sample, 𝑛.
Confidence level and the width of confidence interval
- The larger the confidence level, the wider the confidence interval is and vice versa.
Sample size and the width of confidence interval
- The bigger the size of the sample, the narrower the confidence interval is and vice versa.
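Both effects can be seen numerically with the three parts of Example 10. This is a sketch under the assumption that 𝜎 = 20 ml is known, so the margin of error is 𝑧·𝜎/√𝑛; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sigma, n, level):
    """z_{alpha/2} * sigma / sqrt(n) for a known population sd."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return z * sigma / sqrt(n)

xbar, sigma = 266, 20  # ml, from Example 10
for label, n, level in (("a", 40, 0.95), ("b", 40, 0.99), ("c", 120, 0.95)):
    e = margin_of_error(sigma, n, level)
    print(f"({label}) n = {n}, level = {level:.0%}: "
          f"{xbar - e:.2f} to {xbar + e:.2f} (width {2 * e:.2f})")
```

Comparing (a) with (b) shows the effect of the confidence level at a fixed sample size, and comparing (a) with (c) shows the effect of the sample size at a fixed confidence level.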
t Distribution
Characteristics of a t distribution:
It is bell-shaped
It is symmetric about the mean.
The mean, median and mode are equal to 0 and are located in the centre of the distribution.
The curve never touches the 𝑥-axis.
The variance is greater than 1.
The t distribution is a family of curves based on the concept of degrees of freedom, 𝜈 which is
related to the sample size.
As the sample size increases, the t distribution approaches the normal distribution.
Below are some examples of the outputs for the analysis done using the data given in Example 9.
(i) Minitab software (version 17)
One-Sample T: Assets_Value
Descriptive Statistics
N Mean StDev SE Mean 95% CI for μ
10 17.73 22.30 7.05 (1.78, 33.68)
μ: population mean of Asset_Values
Asset_Values
Mean 17.731
Standard Error 7.051338321
Median 10.165
Mode #N/A
Standard Deviation 22.29828965
Sample Variance 497.2137211
Kurtosis 4.407003912
Skewness 2.14612104
Range 70.98
Minimum 2.27
Maximum 73.25
Sum 177.31
Count 10
Confidence Level(95.0%) 15.95123549
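The One-Sample T and descriptive statistics outputs can be reproduced from the raw Example 9 data. The sketch below uses only the Python standard library, which has no t distribution, so the critical value t with 9 degrees of freedom at 𝛼/2 = 0.025 is entered as a table value (2.262); that constant is an assumption taken from a standard t table, not computed.

```python
from math import sqrt
from statistics import fmean, stdev

assets = [12.23, 2.89, 13.19, 73.25, 11.59, 8.74, 7.92, 40.22, 5.01, 2.27]
n = len(assets)

xbar = fmean(assets)
s = stdev(assets)   # sample standard deviation (divides by n - 1)
se = s / sqrt(n)    # "SE Mean" in Minitab, "Standard Error" in the second output

T_CRIT = 2.262      # t_{0.025, 9}, assumed from a t table
margin = T_CRIT * se  # compare with "Confidence Level(95.0%)"

print(f"mean = {xbar:.3f}, s = {s:.4f}, SE = {se:.4f}")
print(f"margin = {margin:.2f}")
print(f"95% CI = ({xbar - margin:.2f}, {xbar + margin:.2f})")
```

Unlike the Z output, the standard error here uses the sample standard deviation 𝑠, which is why "SE Mean" changes from 6.69 to 7.05.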
Example 11
Dr. K wants to estimate the mean cholesterol level for all adult men living in Shah Alam. He took a sample
of 25 adult men from Shah Alam and found that the mean cholesterol level for this sample is 186mg/dL
with a standard deviation of 12 mg/dL. Assume that the cholesterol levels for all adult men in Shah Alam
are (approximately) normally distributed. Construct a 95% confidence interval for the population mean.
(IV) A large sample drawn from an unknown distribution with an unknown 𝝈² (use 𝒕 table)
Example 12
Forty-one randomly selected adults who buy books for general reading were asked how much they
actually spend on books per year. The sample produced a mean of RM 145 and a standard deviation of
RM 30 for such annual expenses. Determine a 99% confidence interval for the corresponding population
mean.
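𝑛 = 41, 𝑥̅ = 145, 𝑠 = 30, 1 − 𝛼 = 0.99, 𝛼 = 0.01
From the t table, the critical value is $t_{\alpha/2,\,\nu} = t_{0.005,\,40} = 2.704$.
99% CI for 𝜇: $145 \pm 2.704\left(\frac{30}{\sqrt{41}}\right)$ = (132.3312, 157.6688)
We are 99% confident that the population mean annual expense on books is between RM 132.33 and RM 157.67.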
Example 13
An experienced poultry farmer knows that the mean weight 𝜇 kg for a large population of chickens will
vary from season to season but the standard deviation of the weights should remain at 0.70 kg. A random
sample of 100 chickens is taken from the population and the weight 𝑥 kg of each chicken in the sample is
recorded giving ∑ 𝑥 = 190.2. Find a 95% confidence interval for 𝜇.
Let 𝑥 be the weight of a chicken.
𝜎 = 0.70 (assumed standard deviation, from past records)
$\bar{x} = \frac{190.2}{100} = 1.902$
$z_{0.025} = 1.9600$
95% confidence interval for 𝜇: $1.902 \pm 1.9600\left(\frac{0.70}{\sqrt{100}}\right)$ = (1.7648, 2.0392)
(I) Difference in means of two normal populations, 𝝁₁ − 𝝁₂ (variances 𝝈₁² and 𝝈₂² are known)
The sampling distribution of 𝑥̅₁ − 𝑥̅₂, which is (approximately) normal, has mean and standard
deviation as follows:
Mean: $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$
Standard deviation: $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
Interval Estimation of 𝝁₁ − 𝝁₂
When using the normal distribution, the (1 − 𝛼)100% confidence interval for 𝜇₁ − 𝜇₂ is
$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\,\sigma_{\bar{x}_1 - \bar{x}_2}$
Example 14
A survey of low- and middle-income households shows that consumers aged 65 years and older had an
average credit card debt of RM 10,235 and consumers in the 50- to 64-year age group had an average credit
card debt of RM 9,342 at the time of the survey. Suppose that these averages were based on random
samples of 1200 and 1400 people for the two groups, respectively. Further, assume that the population
standard deviations for the two groups were RM 2,800 and RM 2,500, respectively. Let 𝜇₁ and 𝜇₂ be the
respective population means for the two groups, people aged 65 years and older and people in the 50- to
64-year age group. Construct a 95% confidence interval for 𝜇₁ − 𝜇₂. Based on the interval, are 𝜇₁ and 𝜇₂
equal? Explain.
𝜎₁ = 2800, 𝜎₂ = 2500, confidence level = 95%, $z_{0.025} = 1.9600$
95% confidence interval for 𝜇₁ − 𝜇₂:
$(10235 - 9342) \pm 1.9600\sqrt{\frac{2800^2}{1200} + \frac{2500^2}{1400}}$ = (687.4557, 1098.5443)
We are 95% confident that the difference between the population mean credit card debts of the two groups
is between RM 687.46 and RM 1098.54.
If 𝜇₁ − 𝜇₂ = 0, then 𝜇₁ = 𝜇₂. If 𝜇₁ − 𝜇₂ > 0, then 𝜇₁ > 𝜇₂. If 𝜇₁ − 𝜇₂ < 0, then 𝜇₁ < 𝜇₂.
𝜇₁ ≠ 𝜇₂ (in fact 𝜇₁ > 𝜇₂) because all the values in the interval are positive, so 𝜇₁ − 𝜇₂ > 0.
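The interval for Example 14 can be checked with a short function. This sketch assumes known population standard deviations and uses `NormalDist().inv_cdf` for the z value, so the limits may differ from a table-based answer in the third decimal place; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def two_mean_z_interval(x1, x2, sigma1, sigma2, n1, n2, level):
    """CI for mu1 - mu2 when both population sds are known."""
    se = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)  # sd of xbar1 - xbar2
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    diff = x1 - x2
    return diff - z * se, diff + z * se

lo, hi = two_mean_z_interval(10235, 9342, 2800, 2500, 1200, 1400, 0.95)
print(f"95% CI for mu1 - mu2: ({lo:.2f}, {hi:.2f})")
print("0 inside interval:", lo <= 0 <= hi)
```

Checking whether 0 lies inside the interval is the programmatic version of asking whether 𝜇₁ and 𝜇₂ could be equal.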
(II) Difference in means of two normal populations, 𝝁₁ − 𝝁₂ (variances 𝝈₁² = 𝝈₂² and unknown)
When the standard deviations of two populations are equal, we can use 𝜎 for both 𝜎₁ and 𝜎₂. However,
since 𝜎 is unknown, we replace it by its point estimator, 𝑠ₚ, called the pooled standard deviation.
Example 15
A consumer agency wanted to estimate the difference in mean amounts of caffeine in two different brands
of coffee. The agency took a sample of 15 one-pound jars of Brand I coffee that showed the mean amount
of caffeine in these jars to be 80 milligrams per jar with a standard deviation of 5 milligrams. Another sample
of 12 one-pound jars of Brand II coffee gave a mean amount of caffeine equal to 77 milligrams per jar
with a standard deviation of 6 milligrams. Construct a 98% confidence interval for the difference between
the mean amounts of caffeine in one-pound jars of these two brands of coffee. Assume that the two
populations are normally distributed and that the standard deviations of two populations are equal.
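A minimal sketch for Example 15, assuming the usual pooled formula 𝑠ₚ = √[((𝑛₁ − 1)𝑠₁² + (𝑛₂ − 1)𝑠₂²)/(𝑛₁ + 𝑛₂ − 2)], which the extracted notes do not show explicitly. The Python standard library has no t distribution, so the critical value is passed in as a table value; 2.485 for 25 degrees of freedom at 𝛼/2 = 0.01 is an assumed table lookup.

```python
from math import sqrt

def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation s_p for two samples with equal population sd."""
    return sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))

def pooled_t_interval(x1, s1, n1, x2, s2, n2, t_crit):
    """CI for mu1 - mu2; t_crit is t_{alpha/2} with n1 + n2 - 2 df, from a table."""
    sp = pooled_sd(s1, n1, s2, n2)
    margin = t_crit * sp * sqrt(1 / n1 + 1 / n2)
    return (x1 - x2) - margin, (x1 - x2) + margin

# Example 15: 98% interval, df = 15 + 12 - 2 = 25, t_{0.01, 25} = 2.485 (table value)
lo, hi = pooled_t_interval(80, 5, 15, 77, 6, 12, t_crit=2.485)
print(f"s_p = {pooled_sd(5, 15, 6, 12):.4f}")
print(f"98% CI: ({lo:.2f}, {hi:.2f})")
```

The same `pooled_sd` helper can be applied to any two summary standard deviations and sample sizes reported in a software output.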
Example 16
The following Minitab output (version 17) was obtained from two independent samples selected from
two normally distributed populations with unknown but equal standard deviations.
Two-Sample T-Test and CI: S1, S2
Two-sample T for S1 vs S2
(a) Verify that the pooled standard deviation of the data is 9.0587.
(b) Show that the 95% confidence interval of the difference in means of the two populations is
between −4.94 and 10.91.
Method
μ₁: population mean of Sample 1
µ₂: population mean of Sample 2
Difference: μ₁ - µ₂
Equal variances are assumed for this analysis.
Minitab version 21
Descriptive Statistics
Sample     N   Mean   StDev  SE Mean
Sample 1   13  48.94  8.31   2.3
Sample 2   10  45.95  9.97   3.2
Test
Null hypothesis H₀: μ₁ - µ₂ = 0
Alternative hypothesis H₁: μ₁ - µ₂ ≠ 0
T-Value DF P-Value
0.78 21 0.441
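(III) Difference in means of two normal populations, 𝝁₁ − 𝝁₂ (variances 𝝈₁² ≠ 𝝈₂² and unknown)
Degrees of freedom
$\nu = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(\frac{s_1^2}{n_1}\right)^2}{n_1 - 1} + \frac{\left(\frac{s_2^2}{n_2}\right)^2}{n_2 - 1}}$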
Example 17
Refer to Example 15. Construct a 98% confidence interval for the difference between the mean amounts
of caffeine in one-pound jars of these two brands. Assume that two populations are normally distributed
and that the standard deviations of the two populations are not equal.
Example 18
The following Minitab output was obtained from two independent samples selected from two normally
distributed populations with unknown and unequal standard deviations.
Two-Sample T-Test and CI: S1, S2
Two-sample T for S1 vs S2
Minitab version 21
Descriptive Statistics
Sample     N   Mean   StDev  SE Mean
Sample 1   13  48.94  8.31   2.3
Sample 2   10  45.95  9.97   3.2
Test
Null hypothesis H₀: μ₁ - µ₂ = 0
Alternative hypothesis H₁: μ₁ - µ₂ ≠ 0
T-Value DF P-Value
0.77 17 0.454
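The DF and T-Value in the output follow from the unequal-variance formulas. The sketch below evaluates the degrees of freedom formula for the Example 18 summary statistics and for the caffeine data used in Example 17. Rounding the result down to a whole number is the usual convention and is assumed here; the function name is illustrative.

```python
from math import floor, sqrt

def welch_df(s1, n1, s2, n2):
    """Degrees of freedom when the two population variances are not assumed equal."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

# Example 18 (Minitab reports DF = 17 and T-Value = 0.77)
df18 = welch_df(8.31, 13, 9.97, 10)
t18 = (48.94 - 45.95) / sqrt(8.31 ** 2 / 13 + 9.97 ** 2 / 10)
print(f"Example 18: df = {df18:.2f} -> {floor(df18)}, t = {t18:.2f}")

# Example 17 (caffeine data from Example 15)
df17 = welch_df(5, 15, 6, 12)
print(f"Example 17: df = {df17:.2f} -> {floor(df17)}")
```

Compare DF = 17 here with DF = 21 in the equal-variance output for the same data: dropping the equal-variance assumption costs degrees of freedom and widens the interval.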
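Mean, 𝑑̅: $\bar{d} = \frac{\sum d}{n}$
Standard deviation, $s_d$: $s_d = \sqrt{\frac{\sum d^2 - \frac{(\sum d)^2}{n}}{n - 1}}$
Mean, $\mu_{\bar{d}}$: $\mu_{\bar{d}} = \mu_d$
Standard deviation, $\sigma_{\bar{d}}$: $\sigma_{\bar{d}} = \frac{\sigma_d}{\sqrt{n}}$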
Example 19
A researcher wanted to find the effect of a special diet on systolic blood pressure. She selected a sample of
seven adults and put them on this dietary plan for 3 months. The table below gives the systolic blood
pressure (in mm Hg) of these seven adults before and after the completion of this plan.
Let $\mu_d$ be the mean reduction in the systolic blood pressures due to this special dietary plan for the
population of all adults. Construct a 95% confidence interval for $\mu_d$. Assume that the population of paired
differences is (approximately) normally distributed.
Let 𝑑 =
The following outputs are obtained from the data in Example 19. Take note of the value of the mean difference, 𝑑̅.
Software Output
Paired T-Test and CI: Before, After
Test
Null hypothesis H₀: μ_difference = 0
Alternative hypothesis H₁: μ_difference ≠ 0
T-Value P-Value
1.23 0.266
Chi-square 𝝌𝟐 Distribution
A distribution based on degrees of freedom, 𝜈.
The symbol is 𝜒².
The chi-square distribution is obtained from the values of $\frac{(n - 1)s^2}{\sigma^2}$ when random samples are
selected from a normally distributed population whose variance is 𝜎².
A chi-square variable cannot be negative.
The distribution is skewed to the right.
At about 100 degrees of freedom, chi-square distribution becomes approximately normal.
The area under each chi-square distribution is equal to 1.00 or 100%.
Example 20
Find the values of $\chi^2_{\alpha/2}$ and $\chi^2_{1-\alpha/2}$ for a 90% confidence interval when 𝑛 = 25.
$\frac{(n - 1)s^2}{\chi^2_{\alpha/2}} < \sigma^2 < \frac{(n - 1)s^2}{\chi^2_{1-\alpha/2}}$
Example 21
Find the 95% confidence interval for the variance and standard deviation of the nicotine content of
cigarettes manufactured if a sample of 20 cigarettes has a standard deviation of 1.6 milligrams.
Example 22
Find the 90% confidence interval for the variance and standard deviation for the price in dollars of an
adult single-day ski lift ticket. The data represent a selected sample of nationwide ski resorts. Assume the
variable is normally distributed.
59 54 53 52 51 39 49 46 49 48
Some Examples of statistical outputs for the confidence interval for one variance using the data given in
Example 22.
Software Output
Minitab Version 17
Test and CI for One Variance: ski_lift_ticket
Method
Statistics
Variable          N   StDev  Variance
ski_lift_ticket   10  5.31   28.2
Minitab Version 21
Method
σ: standard deviation of Ski-lift Ticket
The Bonett method is valid for any continuous distribution.
The chi-square method is valid only for the normal distribution.
Descriptive Statistics
N   StDev  Variance  95% CI for σ using Bonett  95% CI for σ using Chi-Square
10  5.31   28.2      (2.99, 11.73)              (3.65, 9.70)
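The chi-square interval in the output can be reproduced from the Example 22 data. The Python standard library has no chi-square distribution, so the two critical values for 9 degrees of freedom are entered as table values (19.023 and 2.700); both are assumptions read from a standard chi-square table. The output uses a 95% level, whereas Example 22 itself asks for 90%.

```python
from math import sqrt
from statistics import variance

def variance_interval(s2, n, chi2_upper, chi2_lower):
    """CI for sigma^2.

    chi2_upper = chi^2_{alpha/2} and chi2_lower = chi^2_{1 - alpha/2},
    both with n - 1 degrees of freedom, read from a table.
    """
    return (n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower

prices = [59, 54, 53, 52, 51, 39, 49, 46, 49, 48]  # Example 22
s2 = variance(prices)  # sample variance (divides by n - 1)

# Assumed table values for 9 df at the 95% level
var_lo, var_hi = variance_interval(s2, len(prices), 19.023, 2.700)
print(f"s^2 = {s2:.1f}, s = {sqrt(s2):.2f}")
print(f"95% CI for variance: ({var_lo:.2f}, {var_hi:.2f})")
print(f"95% CI for sd:       ({sqrt(var_lo):.2f}, {sqrt(var_hi):.2f})")
```

The interval for the standard deviation is obtained by taking square roots of the limits for the variance. For the 90% interval requested in Example 22, only the two table values change.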
F distribution
The values of F cannot be negative because variances are always positive or zero.
The distribution is positively skewed.
The mean value of F is approximately equal to 1.
The F distribution is a family of curves based on degrees of freedom of variance of the numerator
and the degrees of freedom of the variance of the denominator.
2.7 Interval Estimation of Two Population Variances: Estimating the Ratio of Two Variances
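The point estimate of the ratio of two population variances, 𝜎₁²/𝜎₂², is given by the ratio of the
sample variances, 𝑠₁²/𝑠₂².
If 𝜎₁² and 𝜎₂² are the variances of normal populations, we can establish an interval estimate of
𝜎₁²/𝜎₂² by using the statistic
$F = \frac{\sigma_2^2 s_1^2}{\sigma_1^2 s_2^2}$
The random variable 𝐹 has an 𝐹-distribution with 𝜈₁ = 𝑛₁ − 1 and 𝜈₂ = 𝑛₂ − 1 degrees of
freedom.
Confidence interval for the ratio of variances:
$\frac{s_1^2}{s_2^2} \cdot \frac{1}{F_{\alpha/2,\,\nu_1,\,\nu_2}} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{s_1^2}{s_2^2}\, F_{\alpha/2,\,\nu_2,\,\nu_1}$
Taking square roots of the limits gives the confidence interval for the ratio of standard deviations:
$\sqrt{\frac{s_1^2}{s_2^2} \cdot \frac{1}{F_{\alpha/2,\,\nu_1,\,\nu_2}}} < \frac{\sigma_1}{\sigma_2} < \sqrt{\frac{s_1^2}{s_2^2}\, F_{\alpha/2,\,\nu_2,\,\nu_1}}$
Note: 𝑠₁² > 𝑠₂²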
Example 23
The following Minitab output was obtained from two independent samples selected from two normally
distributed populations with unknown and unequal variances. Show that the lower limit for the 95%
confidence interval of the ratio of variances and standard deviations for the two populations are as given
in the output.
Minitab Version 17
Variable  N   StDev  Variance  95% CI for StDevs
S1        13  8.309  69.038    (5.958, 13.716)
S2        9   6.564  43.092    (4.434, 12.576)
Ratio of standard deviations = 1.266
Ratio of variances = 1.602
Method
σ₁²: variance of Sample 1
σ₂²: variance of Sample 2
Ratio: σ₁²/σ₂²
F method was used. This method is accurate for normal data only.
Descriptive Statistics
Sample N StDev Variance 95% CI for σ²
Sample 1 13 8.309 69.038 (35.500, 188.123)
Sample 2 9 6.564 43.092 (19.660, 158.155)
Ratio of Variances
Estimated Ratio 95% CI for Ratio using F
1.60211 (0.381, 5.626)
Test
Null hypothesis H₀: σ₁² / σ₂² = 1
Alternative hypothesis H₁: σ₁² / σ₂² ≠ 1
Significance level α = 0.05
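The "95% CI for Ratio using F" line can be checked against the interval formula for 𝜎₁²/𝜎₂². The Python standard library has no F distribution, so the two critical values are passed in as table values; 4.1997 for (12, 8) degrees of freedom and 3.5118 for (8, 12) degrees of freedom at 𝛼/2 = 0.025 are assumptions read from a standard F table.

```python
from math import sqrt

def variance_ratio_interval(s1_sq, s2_sq, f_12, f_21):
    """CI for sigma1^2 / sigma2^2.

    f_12 = F_{alpha/2, nu1, nu2} and f_21 = F_{alpha/2, nu2, nu1},
    both read from an F table.
    """
    ratio = s1_sq / s2_sq
    return ratio / f_12, ratio * f_21

# Example 23: n1 = 13, n2 = 9, so nu1 = 12 and nu2 = 8
lo, hi = variance_ratio_interval(69.038, 43.092, f_12=4.1997, f_21=3.5118)
print(f"95% CI for ratio of variances: ({lo:.3f}, {hi:.3f})")
print(f"95% CI for ratio of sds:       ({sqrt(lo):.3f}, {sqrt(hi):.3f})")
print("1 inside interval:", lo <= 1 <= hi)
```

Note that the two critical values swap the order of the degrees of freedom, which is easy to get wrong when reading the table.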
Example 24
A study was conducted by the Department of Zoology at Virginia Tech to estimate the difference in the
amounts of the chemical orthophosphorus measured at two different stations on the James River.
Orthophosphorus was measured in milligrams per litre. Thirteen samples were collected from station 1,
and 11 samples were obtained from station 2. The 13 samples from station 1 had an average
orthophosphorus content of 3.84 milligrams per litre and a standard deviation of 3.07 milligrams per
litre, while the 11 samples from station 2 had an average content of 1.49 milligrams per litre and a
standard deviation of 0.80 milligrams per litre. Assume that the observations came from normal
populations.
(a) Construct a 98% confidence interval for the ratio of two variances and standard deviations. Based
on the confidence interval, what can you conclude about the two population variances?
(b) From the result in (a), construct the 98% confidence interval for the difference in the population
mean amounts of the chemical orthophosphorus measured at two different stations. Based on the
interval, is there a significant difference in the two population means?
Additional Notes
In a similar manner, when we consider the confidence interval for the ratio of two population variances,
𝜎₁²/𝜎₂², if the value of 1 is in the interval, then we cannot rule out 𝜎₁² = 𝜎₂², because 𝜎₁²/𝜎₂² = 1 is
a plausible value.
However, if the value of 1 is not in the interval, then we can conclude that 𝜎₁² ≠ 𝜎₂² because 𝜎₁²/𝜎₂² ≠ 1.
For example, if we consider the confidence interval for Example 24, the 98% confidence interval for 𝜎₁²/𝜎₂² is
(3.127, 63.326). Since the value of 1 is not in the interval, we can conclude that 𝜎₁² ≠ 𝜎₂² because 𝜎₁²/𝜎₂² ≠ 1.
Remember:
To draw conclusion on the confidence interval for the difference in two population means, check if the
value of 0 is in the interval; however,
If we want to conclude on the confidence interval for the ratio of two population variances, check if the
value of 1 is in the interval.