
Topic: Sampling and sampling distribution of sample mean: Central limit theorem

Content:

Definition
The Central Limit Theorem states that the sampling distribution of the sample
means approaches a normal distribution as the sample size gets larger — no matter
what the shape of the population distribution. This fact holds especially true for sample
sizes over 30.

Graphical illustration of Central Limit Theorem


Roll a fair die n times, record the mean of the n rolls, and repeat this many times; then graph the
distribution of those means using a histogram. As n (the number of rolls averaged) increases, the shape
of the distribution of the means looks more and more like a normal distribution.

[Figure: histograms of the distribution of the mean for rolling a die once (n = 1), twice (n = 2), thrice (n = 3), four times (n = 4), and five times (n = 5)]
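The die illustration can be reproduced with a short simulation. This is only a sketch: the function name `sample_means`, the number of repetitions, and the seed are assumptions chosen for illustration, not part of the module.

```python
import random
import statistics

def sample_means(n_rolls, n_samples=5000, seed=0):
    """Return n_samples sample means, each the mean of n_rolls fair-die rolls."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(n_samples):
        rolls = [rng.randint(1, 6) for _ in range(n_rolls)]
        means.append(statistics.fmean(rolls))
    return means

# For each n, print the mean and standard deviation of the simulated sample means.
for n in (1, 2, 5, 30):
    means = sample_means(n)
    print(n, round(statistics.fmean(means), 2), round(statistics.pstdev(means), 2))
```

In each case the mean of the sample means should stay close to 3.5 (the mean of one roll), while their spread should shrink roughly like 1.71/√n. A histogram of `means` would show the bell shape emerging as n grows.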


Definition
SAMPLING DISTRIBUTION OF THE SAMPLE MEAN
USING CENTRAL LIMIT THEOREM
Let x be a random variable with a normal distribution whose population mean is 𝜇
and population standard deviation is 𝜎. Let 𝑥̅ be the sample mean corresponding to
the random samples of size n taken from the x distribution. The following are true:
a. The 𝑥̅ distribution is a normal distribution.
b. The mean of the sampling distribution of means is equal to the mean of the
population, that is, $\mu_{\bar{x}} = \mu$.
c. The variance of the sampling distribution of means is equal to the variance of
the population divided by the sample size n, that is, $\sigma_{\bar{x}}^{2} = \dfrac{\sigma^{2}}{n}$.
d. The standard deviation of the sampling distribution of means is equal to the
standard deviation of the population divided by the square root of the sample
size n, that is, $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$.
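Properties (b) to (d) follow in a few lines. The sketch below assumes the observations $x_1, \dots, x_n$ are independent, each with mean $\mu$ and variance $\sigma^2$; independence is an assumption stated here for the derivation and is what random sampling is meant to provide.

```latex
\begin{aligned}
\mu_{\bar{x}} &= E\!\left[\frac{1}{n}\sum_{i=1}^{n} x_i\right]
              = \frac{1}{n}\sum_{i=1}^{n} E[x_i]
              = \frac{n\mu}{n} = \mu \\
\sigma_{\bar{x}}^{2} &= \operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} x_i\right)
              = \frac{1}{n^{2}}\sum_{i=1}^{n} \operatorname{Var}(x_i)
              = \frac{n\sigma^{2}}{n^{2}} = \frac{\sigma^{2}}{n} \\
\sigma_{\bar{x}} &= \sqrt{\sigma_{\bar{x}}^{2}} = \frac{\sigma}{\sqrt{n}}
\end{aligned}
```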

PROBLEMS INVOLVING SAMPLING DISTRIBUTION OF SAMPLE MEAN

STEPS IN SOLVING THE PROBABILITIES OF SAMPLE MEAN (USING CENTRAL LIMIT THEOREM)

Step 1: Draw the normal curve. Find the population mean ($\mu$), the sample mean ($\bar{x}$), and the
standard deviation of the sample mean ($\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$).
Step 2: Convert the necessary values of $\bar{x}$ to z-values using $z = \dfrac{\bar{x} - \mu}{\sigma_{\bar{x}}}$.
Step 3: Draw the standard normal curve.
Step 4: Solve for the area/percentile/probability (use the z distribution table).
Step 5: State the conclusion.

Examples:
1. The average cost per household of owning a brand new car is ₱5,000. Suppose that 40 households
are randomly selected. Determine the probability that the sample mean for these
households is more than ₱5,350. Assume the variable is normally distributed and the
population standard deviation is ₱1,230.

Solution:

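Step 1: $\mu = 5{,}000$, $\bar{x} = 5{,}350$, and
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{1{,}230}{\sqrt{40}} = 194.48$

[Figure: normal curve of $\bar{x}$ with axis marks 4416.56, 4611.04, 4805.52, 5000, 5194.48, 5388.96, 5583.44; the value 5350 is marked]

Step 2: $z = \dfrac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \dfrac{5350 - 5000}{194.48} = 1.80$

Step 3: [Figure: standard normal curve with axis marks -3 to 3; z = 1.80 is marked]

Step 4: (Refer to PROCEDURE TABLE #4 and the STANDARD NORMAL TABLE in Module 4.)
Here $z_{1.80}$ denotes the table area between 0 and $z = 1.80$.
$A = 0.50 - z_{1.80}$
$A = 0.50 - 0.4641$
$A = 0.0359$ or 3.59%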

Step 5: Therefore, the probability that the sample mean for these households is more than
₱5,350 is 0.0359.
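The five steps for this example can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library in place of the printed z-table (a substitution made here for illustration); the variable names are assumptions.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 5000, 1230, 40    # population mean, population SD, sample size
x_bar = 5350                     # cutoff for the sample mean

se = sigma / sqrt(n)             # Step 1: standard deviation of the sample mean
z = round((x_bar - mu) / se, 2)  # Step 2: z-value, rounded to 2 places as for a z-table
tail = 1 - NormalDist().cdf(z)   # Step 4: area to the right of z

print(round(se, 2), z, round(tail, 4))  # prints: 194.48 1.8 0.0359
```

Rounding z to two decimal places mirrors the table lookup; using the unrounded z would change the result only in the fourth decimal place.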

2. The average age of cars registered with the Land Transportation Office (LTO) is 10 years, or
120 months. Assume the standard deviation is 20 months. If a random sample of 35 cars is
selected, find the probability that the mean of their ages is between 115 and 129 months.

Solution:

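Step 1: $\mu = 120$; $\bar{x}_1 = 115$, $\bar{x}_2 = 129$; and
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{20}{\sqrt{35}} = 3.38$

[Figure: normal curve of $\bar{x}$ with axis marks 109.86, 113.24, 116.62, 120, 123.38, 126.76, 130.14; the values 115 and 129 are marked]

Step 2: $z_1 = \dfrac{\bar{x}_1 - \mu}{\sigma_{\bar{x}}} = \dfrac{115 - 120}{3.38} = -1.48$ and $z_2 = \dfrac{\bar{x}_2 - \mu}{\sigma_{\bar{x}}} = \dfrac{129 - 120}{3.38} = 2.66$

Step 3: [Figure: standard normal curve with axis marks -3 to 3; z = -1.48 and z = 2.66 are marked]

Step 4: (Refer to PROCEDURE TABLE #2 and the STANDARD NORMAL TABLE in Module 4.)
Here $z_{-1.48}$ and $z_{2.66}$ denote the table areas between 0 and each z-value.
$A = z_{-1.48} + z_{2.66}$
$A = 0.4306 + 0.4961$
$A = 0.9267$ or 92.67%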

Step 5: Therefore, the probability that the mean of their ages is between 115 and 129
months is 0.9267
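A "between" probability can also be computed directly on the scale of the sample mean, without converting to z first. This is a sketch under the same assumptions as the example (sampling distribution treated as normal); `sampling_dist` and `p_between` are illustrative names.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 120, 20, 35
se = sigma / sqrt(n)                 # standard deviation of the sample mean

# Normal model for the sample mean itself: mean mu, standard deviation se.
sampling_dist = NormalDist(mu, se)

# P(115 < x_bar < 129) = area to the left of 129 minus area to the left of 115.
p_between = sampling_dist.cdf(129) - sampling_dist.cdf(115)
print(round(se, 2), round(p_between, 3))
```

The result agrees with the table-based answer to about three decimal places; the small difference comes from rounding the z-values to two places before the table lookup.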

3. The average amount of fresh milk that a person consumes in a month is 18 liters.
Assume that the standard deviation is 4.5 liters and the distribution is approximately normal.
If a sample of 32 individuals is selected, find the probability that the mean of the sample will be
more than 16 liters.

Solution:

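Step 1: $\mu = 18$, $\bar{x} = 16$, and
$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{4.5}{\sqrt{32}} = 0.80$

[Figure: normal curve of $\bar{x}$ with axis marks 15.6, 16.4, 17.2, 18, 18.8, 19.6, 20.4; the value 16 is marked]

Step 2: $z = \dfrac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \dfrac{16 - 18}{0.80} = -2.50$

Step 3: [Figure: standard normal curve with axis marks -3 to 3; z = -2.50 is marked]

Step 4: (Refer to PROCEDURE TABLE #5 and the STANDARD NORMAL TABLE in Module 4.)
Here $z_{2.50}$ denotes the table area between 0 and $z = 2.50$, which by symmetry equals the area between $z = -2.50$ and 0.
$A = 0.50 + z_{2.50}$
$A = 0.50 + 0.4938$
$A = 0.9938$ or 99.38%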

Step 5: Therefore, the probability that mean of the sample will be more than 16 liters is
0.9938.

Integration

Central Limit Theorem states that the sampling distribution of the sample
means approaches a normal distribution as the sample size gets larger — no matter what
the shape of the population distribution.
In life, as we continue our journey, we face many problems, and facing them becomes normal as
we grow used to it. Notice that the problems do not become easier; it is we who become
stronger. With every problem we overcome, a new and stronger version of ourselves
emerges.
What are the problems that you think made you stronger?
I. Topic: Key concepts of estimation of population mean and population proportion

A. t- DISTRIBUTION

The t-distribution is a bell-shaped distribution, similar to the normal distribution, that is used for
smaller sample sizes. Like normally distributed data, it forms a bell shape when plotted on a graph,
with more observations near the mean and fewer observations in the tails.

The t-distribution is used when the data are approximately normally distributed (that is, they follow
a bell shape) but the population variance is unknown. The variance is instead estimated from the
sample, with degrees of freedom equal to the total number of observations minus 1.

It is a more conservative form of the standard normal distribution (also known as the z-distribution):
it gives a lower probability to the center and a higher probability to the tails than
the standard normal distribution.
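The "lower center, heavier tails" claim can be seen by evaluating both density functions. The sketch below writes out the standard density formula for Student's t using only the standard library; the function names `t_pdf` and `z_pdf` and the choice of 5 degrees of freedom are assumptions for illustration.

```python
from math import exp, gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def z_pdf(x):
    """Density of the standard normal distribution."""
    return exp(-x * x / 2) / sqrt(2 * pi)

# Compare the two curves at the center (x = 0) and in the tail (x = 3).
for x in (0.0, 3.0):
    print(x, round(t_pdf(x, 5), 4), round(z_pdf(x), 4))
```

At x = 0 the t curve is lower than the z curve, and at x = 3 it is higher, which is why t-based intervals are wider than z-based ones for the same confidence level.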

A1. CHARACTERISTICS OF THE T DISTRIBUTION

The t distribution shares some characteristics of the normal distribution and differs from it in others.
The t distribution is similar to the standard normal distribution in these ways:
1. It is bell-shaped.
2. It is symmetrical about the mean.
3. The mean, median, and mode are equal to 0 and are located at the center of the distribution.
4. The curve never touches the x-axis.

The t distribution differs from the standard normal distribution in the following ways:
1. The variance is greater than 1.
2. The t distribution is actually a family of curves based on the concept of degrees of freedom, which is related to sample size.
3. As the sample size increases, the t distribution approaches the standard normal distribution.
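The first and third differences can be stated in one formula. This is the standard result for the variance of a t-distributed variable $T$ with $df$ degrees of freedom, quoted here as a supplement rather than taken from the module:

```latex
\operatorname{Var}(T) = \frac{df}{df - 2} > 1 \quad (df > 2),
\qquad
\lim_{df \to \infty} \frac{df}{df - 2} = 1
```

For example, with $df = 10$ the variance is $10/8 = 1.25$, and it shrinks toward 1 (the variance of the standard normal distribution) as the sample size grows.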
t values for various values of df

Confidence interval 80% 90% 95% 98% 99% 99.8% 99.9%
𝜶 level, two-tailed test 0.2 0.1 0.05 0.02 0.01 0.002 0.001
𝜶 level, one-tailed test 0.1 0.05 0.025 0.01 0.005 0.001 0.0005

df
1 3.078 6.314 12.706 31.821 63.657 318.313 636.589
2 1.886 2.920 4.303 6.965 9.925 22.327 31.598
3 1.638 2.353 3.182 4.541 5.841 10.215 12.924
4 1.533 2.132 2.776 3.747 4.604 7.173 8.610
5 1.476 2.015 2.571 3.365 4.032 5.893 6.869
6 1.440 1.943 2.447 3.143 3.707 5.208 5.959
7 1.415 1.895 2.365 2.998 3.499 4.785 5.408
8 1.397 1.860 2.306 2.896 3.355 4.501 5.041
9 1.383 1.833 2.262 2.821 3.250 4.297 4.781
10 1.372 1.812 2.228 2.764 3.169 4.144 4.587
11 1.363 1.796 2.201 2.718 3.106 4.025 4.437
12 1.356 1.782 2.179 2.681 3.055 3.930 4.318
13 1.350 1.771 2.160 2.650 3.012 3.852 4.221
14 1.345 1.761 2.145 2.624 2.977 3.787 4.140
15 1.341 1.753 2.131 2.602 2.947 3.733 4.073
16 1.337 1.746 2.120 2.583 2.921 3.686 4.015
17 1.333 1.740 2.110 2.567 2.898 3.646 3.965
18 1.330 1.734 2.101 2.552 2.878 3.610 3.922
19 1.328 1.729 2.093 2.539 2.861 3.579 3.883
20 1.325 1.725 2.086 2.528 2.845 3.552 3.849
21 1.323 1.721 2.080 2.518 2.831 3.527 3.819
22 1.321 1.717 2.074 2.508 2.819 3.505 3.792
23 1.319 1.714 2.069 2.500 2.807 3.485 3.768
24 1.318 1.711 2.064 2.492 2.797 3.467 3.745
25 1.316 1.708 2.060 2.485 2.787 3.450 3.725
26 1.315 1.706 2.056 2.479 2.779 3.435 3.707
27 1.314 1.703 2.052 2.473 2.771 3.421 3.690
28 1.313 1.701 2.048 2.467 2.763 3.408 3.674
29 1.311 1.699 2.045 2.462 2.756 3.396 3.659
30 1.310 1.697 2.042 2.457 2.750 3.385 3.646
40 1.303 1.684 2.021 2.423 2.704 3.307 3.551
60 1.296 1.671 2.000 2.390 2.660 3.232 3.460
120 1.289 1.658 1.980 2.358 2.617 3.160 3.373
∞ (σknown) 1.282 1.645 1.960 2.327 2.576 3.091 3.291
A2. IDENTIFYING PERCENTILES (t-VALUE) USING t-TABLE

Examples:
1. Find the t-value for 95% confidence interval when the sample size is 22.

Solution:
The $df = n - 1 = 22 - 1 = 21$. Find 21 in the left column and 95% in the row labeled
confidence interval. The intersection where the two meet gives the value for $t$, which is 2.080.

2. Find the t-value for 90% confidence interval when the sample size is 15.

Solution:
The $df = n - 1 = 15 - 1 = 14$. Find 14 in the left column and 90% in the row labeled
confidence interval. The intersection where the two meet gives the value for $t$, which is 1.761.
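The table lookup amounts to indexing by degrees of freedom and confidence level. The sketch below copies three rows from the t table above into a dictionary; the names `T_TABLE` and `t_value` are hypothetical, and only the listed rows and columns are available.

```python
# A few rows copied from the t table: df -> {confidence level: t value}.
T_TABLE = {
    9:  {0.90: 1.833, 0.95: 2.262, 0.99: 3.250},
    14: {0.90: 1.761, 0.95: 2.145, 0.99: 2.977},
    21: {0.90: 1.721, 0.95: 2.080, 0.99: 2.831},
}

def t_value(sample_size, confidence):
    """Look up t for a confidence interval, using df = n - 1."""
    df = sample_size - 1
    return T_TABLE[df][confidence]

print(t_value(22, 0.95))  # df = 21
print(t_value(15, 0.90))  # df = 14
```

A lookup for a df or confidence level that was not copied into the dictionary raises `KeyError`, which is the code analogue of the value not being in the printed table.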

B. PARAMETER ESTIMATION

When a researcher wants information about a particular population but it is infeasible to
obtain it from every member, they usually take a random sample from the population. Using that sample,
they calculate the corresponding sample characteristic, which is used to summarize information about
the unknown population characteristic. The population characteristic of interest is called a parameter,
and the corresponding sample characteristic is the sample statistic or parameter estimate.
The best point estimate of the population mean $\mu$ is the sample mean $\bar{X}$. However, because
the sample mean varies from sample to sample, the point estimate is unlikely to equal $\mu$ exactly.
Therefore, establishing a margin of error is necessary. Adding the margin of error to the point estimate,
and subtracting it from the point estimate, gives the interval estimate (also called the confidence interval).

Definitions
➢ A point estimate is a specific numerical value used as an estimate of a parameter.
➢ The margin of error, also called the maximum error of the estimate, is the maximum likely
difference between the point estimate of a parameter and the actual value of the
parameter.
➢ An interval estimate of a parameter is an interval or a range of values used to estimate
the parameter. This estimate may or may not contain the value of the parameter being
estimated.
➢ The confidence level of an interval estimate of a parameter is the probability that the
interval estimate will contain the parameter, assuming that a large number of samples
are selected and that the estimation process on the same parameter is repeated.
➢ The confidence interval is a specific interval estimate of a parameter determined by
using data obtained from a sample and by using the specific confidence level of the
estimate.

B1. ESTIMATION OF THE POPULATION MEAN


Formula for Margin of Error (E)

When $\sigma$ is known:
$E = Z_{\alpha/2}\left(\dfrac{\sigma}{\sqrt{n}}\right)$

When $\sigma$ is unknown:
$E = t_{\alpha/2}\left(\dfrac{s}{\sqrt{n}}\right)$, with $df = n - 1$

Where
$Z_{\alpha/2} = 1.65$ (for a 90% confidence interval)
$Z_{\alpha/2} = 1.96$ (for a 95% confidence interval)
$Z_{\alpha/2} = 2.58$ (for a 99% confidence interval)
$\sigma$ = population standard deviation
$s$ = sample standard deviation
$n$ = sample size

Note: use the t-table to determine $t_{\alpha/2}$.


Formula for the Confidence Interval of the Mean for a Specific $\alpha$
$\bar{X} - E < \mu < \bar{X} + E$

Where
$E$ = margin of error
$\bar{X}$ = point estimate (sample mean)
$\mu$ = population mean

Length of the Confidence Interval of the Mean for a Specific $\alpha$
$\text{Length} = [\bar{X} + E] - [\bar{X} - E]$
or
$\text{Length} = 2E$

Examples:
1. A researcher wishes to estimate the number of days it takes an automobile dealer to sell a
Chevrolet Aveo. A sample of 50 cars had a mean time on the dealer’s lot of 54 days. Assume
the population standard deviation to be 6.0 days. Find the best point estimate of the population
mean and the 95% confidence interval of the population mean. Compute the margin of error,
confidence interval and the length of the confidence interval.
Solution: (𝜎 is known)
• The best point estimate (𝑋̅) of the population mean is 54 days.
• For the 95% confidence level, use 𝑍𝛼/2 = 1.96
• Margin of Error:
$E = Z_{\alpha/2}\left(\dfrac{\sigma}{\sqrt{n}}\right) = 1.96\left(\dfrac{6}{\sqrt{50}}\right) = 1.7$

• Confidence interval:
𝑋̅ − 𝐸 < 𝝁 < 𝑋̅ + 𝐸

[54 − 1.7] < 𝜇 < [54 + 1.7]

52.3 < 𝜇 < 55.7

52 < 𝜇 < 56

Interpretation: Hence, one can say with 95% confidence that the interval between 52.3 and
55.7 days contains the population mean, based on a sample of 50
automobiles.

• Length of Confidence interval:


𝑳𝒆𝒏𝒈𝒕𝒉 = [𝑋̅ + 𝐸] − [𝑋̅ − 𝐸]

= [54 + 1.7] − [54 − 1.7]

= 55.7 − 52.3

= 3.4
-------------or--------------
𝑳𝒆𝒏𝒈𝒕𝒉 = 2𝐸

= 2(1.7) = 3.4
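The margin of error, interval, and length can be bundled into one small helper. This is a minimal sketch; the function name `mean_interval` and its argument order are assumptions, and the critical value (z or t) is supplied by the caller.

```python
from math import sqrt

def mean_interval(x_bar, critical, sd, n):
    """Return (E, lower, upper, length) for x_bar ± critical * sd / sqrt(n)."""
    e = critical * sd / sqrt(n)
    return e, x_bar - e, x_bar + e, 2 * e

# Example 1: x_bar = 54 days, z = 1.96, sigma = 6.0, n = 50.
e, lower, upper, length = mean_interval(54, 1.96, 6.0, 50)
print(round(e, 1), round(lower, 1), round(upper, 1))  # prints: 1.7 52.3 55.7
print(round(length, 2))
```

Note that the unrounded length 2E is about 3.33; the worked solution obtains 3.4 because it rounds E to 1.7 before doubling.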

2. A survey of 30 emergency room patients found that the average waiting time for treatment
was 174.3 minutes. Assuming that the population standard deviation is 46.5 minutes, find the
best point estimate of the population mean and the 99% confidence interval of the population
mean. Compute the length of the confidence interval.

Solution:
• The best point estimate is 174.3 minutes.
• For the 99% confidence level, use 𝑍𝛼/2 = 2.58
• Length of confidence interval:

$\text{Length} = [\bar{X} + E] - [\bar{X} - E]$

$= \left[174.3 + 2.58\left(\dfrac{46.5}{\sqrt{30}}\right)\right] - \left[174.3 - 2.58\left(\dfrac{46.5}{\sqrt{30}}\right)\right]$

$= [174.3 + 21.9] - [174.3 - 21.9]$

$= 196.2 - 152.4$

$= 43.8$

3. Ten randomly selected people were asked how long they slept at night. The mean time was
7.1 hours, and the sample standard deviation was 0.78 hour. Assume the variable is normally
distributed. Find the best point estimate of the population mean and the 95% confidence
interval of the population mean. Compute the margin of error, confidence interval and the
length of the confidence interval

Solution: (𝜎 is unknown)
• The best point estimate (𝑋̅) of the population mean is 7.1
• For 95% as confidence interval, 𝑑𝑓 = 𝑛 − 1 = 10 − 1 = 9, thus 𝑡𝛼/2 = 2.262
• Margin of error, with $s = 0.78$ (for $t_{\alpha/2}$, please refer to IDENTIFYING PERCENTILES (t-VALUE) USING t-TABLE):

$E = t_{\alpha/2}\left(\dfrac{s}{\sqrt{n}}\right) = 2.262\left(\dfrac{0.78}{\sqrt{10}}\right) = 0.56$

• Confidence interval:

[7.1 − 0.56] < 𝜇 < [7.1 + 0.56]

6.54 < 𝜇 < 7.66

6.5 < 𝜇 < 7.7

Therefore, one can be 95% confident that the population mean is between 6.5 and 7.7 hours.

• Length of Confidence interval:

𝑳𝒆𝒏𝒈𝒕𝒉 = [7.1 + 0.56] − [7.1 − 0.56]

= 7.66 − 6.54

= 1.12

4. The data represent a sample of the number of home fires started by candles for the past
several years. (Data from the National Fire Protection Association.) Find the 99% confidence
interval for the mean number of home fires started by candles each year, and its length.

5460 5900 6090 6310 7160 8440 9930

Solution: (𝜎 is unknown)
• The best point estimate ($\bar{X}$) of the population mean is

$\bar{X} = \dfrac{\sum x}{n} = \dfrac{5460 + 5900 + 6090 + 6310 + 7160 + 8440 + 9930}{7} = 7041.4$

• For a 99% confidence interval, $df = n - 1 = 7 - 1 = 6$, thus $t_{\alpha/2} = 3.707$

• Sample standard deviation:

$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{(5460 - 7041.4)^2 + (5900 - 7041.4)^2 + \cdots + (9930 - 7041.4)^2}{6}} = 1{,}610.3$

• Confidence interval:

$\left[7041.4 - 3.707\left(\dfrac{1610.3}{\sqrt{7}}\right)\right] < \mu < \left[7041.4 + 3.707\left(\dfrac{1610.3}{\sqrt{7}}\right)\right]$

$[7041.4 - 2256.2] < \mu < [7041.4 + 2256.2]$

$4785.2 < \mu < 9297.6$

Interpretation: One can be 99% confident that the population mean number of home fires
started by candles each year is between 4785.2 and 9297.6, based on a sample
of home fires occurring over a period of 7 years.

• Length of Confidence interval:

$\text{Length} = \left[7041.4 + 3.707\left(\dfrac{1610.3}{\sqrt{7}}\right)\right] - \left[7041.4 - 3.707\left(\dfrac{1610.3}{\sqrt{7}}\right)\right]$

$= [7041.4 + 2256.2] - [7041.4 - 2256.2]$

$= 9297.6 - 4785.2$

$= 4512.4$
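When the raw data are given, the sample mean and sample standard deviation can be computed directly rather than by hand. The sketch below uses `statistics.mean` and `statistics.stdev` (which divides by n − 1) and takes the critical value 3.707 from the t table; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

fires = [5460, 5900, 6090, 6310, 7160, 8440, 9930]
t_crit = 3.707                     # df = 6, 99% confidence, read from the t table

x_bar = mean(fires)                # point estimate of the population mean
s = stdev(fires)                   # sample SD, n - 1 in the denominator
e = t_crit * s / sqrt(len(fires))  # margin of error

lower, upper = x_bar - e, x_bar + e
print(round(x_bar, 1), round(s, 1), round(e, 1))
print(lower, upper, upper - lower)
```

Because the code carries full precision instead of rounding the mean and standard deviation first, its endpoints can differ from the hand-computed ones by a few hundredths.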

B2. ESTIMATION OF THE POPULATION PROPORTION

Formula for Margin of Error for Proportion

$E = Z_{\alpha/2}\sqrt{\dfrac{\hat{p}\hat{q}}{n}}$

when $n\hat{p}$ and $n\hat{q}$ are each greater than or equal to 5, where

$\hat{p} = \dfrac{X}{n}$ and $\hat{q} = \dfrac{n - X}{n}$ or $\hat{q} = 1 - \hat{p}$

$Z_{\alpha/2} = 1.65$ (for a 90% confidence interval)
$Z_{\alpha/2} = 1.96$ (for a 95% confidence interval)
$Z_{\alpha/2} = 2.58$ (for a 99% confidence interval)
$n$ = sample size
$\hat{p}$ (read as "p hat") = sample proportion
$X$ = number of sample units that possess the characteristic of interest

Formula for the Confidence Interval of the Proportion for a Specific $\alpha$
$\hat{p} - E < p < \hat{p} + E$

Where
$E$ = margin of error
$\hat{p}$ (read as "p hat") = sample proportion
$p$ = population proportion

Length of the Confidence Interval of the Proportion for a Specific $\alpha$
$\text{Length} = [\hat{p} + E] - [\hat{p} - E]$
or
$\text{Length} = 2E$

Examples:

1. A survey of 1404 student respondents conducted by Sally and Fernando found that 323 students paid
for their education by student loans. Find the 90% confidence interval of the true proportion of students
who paid for their education by student loans, and its length.

Solution:
• Confidence interval:

✓ $Z_{\alpha/2} = 1.65$
✓ $\hat{p} = \dfrac{323}{1404} = 0.23$
✓ $\hat{q} = 1 - \hat{p} = 1 - 0.23 = 0.77$

$\left[0.23 - 1.65\sqrt{\dfrac{(0.23)(0.77)}{1404}}\right] < p < \left[0.23 + 1.65\sqrt{\dfrac{(0.23)(0.77)}{1404}}\right]$

$[0.23 - 0.019] < p < [0.23 + 0.019]$

$0.211 < p < 0.249$

$21.1\% < p < 24.9\%$

Interpretation: Hence, you can be 90% confident that the percentage of students who pay for
their college education by student loans is between 21.1 and 24.9%

• Length of Confidence interval:

$\text{Length} = \left[0.23 + 1.65\sqrt{\dfrac{(0.23)(0.77)}{1404}}\right] - \left[0.23 - 1.65\sqrt{\dfrac{(0.23)(0.77)}{1404}}\right]$

$= [0.23 + 0.019] - [0.23 - 0.019]$

$= 0.249 - 0.211$

$= 0.038$
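The proportion interval can be sketched as a function that also enforces the condition that $n\hat{p}$ and $n\hat{q}$ are each at least 5. The function name `proportion_interval` and the choice to raise `ValueError` are assumptions made for this illustration.

```python
from math import sqrt

def proportion_interval(x, n, z):
    """Return (p_hat, E, lower, upper) for a proportion confidence interval."""
    p_hat = x / n
    q_hat = 1 - p_hat
    # The normal approximation is only used when n*p_hat and n*q_hat are >= 5.
    if n * p_hat < 5 or n * q_hat < 5:
        raise ValueError("n * p_hat and n * q_hat must each be at least 5")
    e = z * sqrt(p_hat * q_hat / n)
    return p_hat, e, p_hat - e, p_hat + e

# Student-loan survey: X = 323, n = 1404, 90% confidence (z = 1.65).
p_hat, e, lower, upper = proportion_interval(323, 1404, 1.65)
print(round(p_hat, 2), round(e, 3))  # prints: 0.23 0.019
print(lower, upper, 2 * e)
```

The code keeps $\hat{p}$ and E unrounded, so its endpoints can differ from the hand computation in the third decimal place.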

2. A survey of 1721 people found that 15.9% of individuals purchase religious books at Christian
bookstores. Find the length of 95% confidence interval of the true proportion of people who
purchase their religious books at Christian bookstores.
Solution:
• Confidence interval:

✓ $Z_{\alpha/2} = 1.96$
✓ $\hat{p} = 15.9\% = 0.159$
✓ $\hat{q} = 1 - 0.159 = 0.841$

$\left[0.159 - 1.96\sqrt{\dfrac{(0.159)(0.841)}{1721}}\right] < p < \left[0.159 + 1.96\sqrt{\dfrac{(0.159)(0.841)}{1721}}\right]$

$[0.159 - 0.017] < p < [0.159 + 0.017]$

$0.142 < p < 0.176$

Interpretation: Hence, you can say with 95% confidence that the true percentage is
between 14.2% and 17.6%.

• Length of Confidence interval:

$\text{Length} = 2E = 2(0.017) = 0.034$

C. SAMPLE SIZE DETERMINATION USING CONFIDENCE INTERVAL

Formula for the Minimum Sample Size Needed for an Interval Estimate of the Population Mean

$n = \left(\dfrac{Z_{\alpha/2} \cdot \sigma}{E}\right)^{2}$

Where
$n$ = sample size
$E$ = margin of error
$\sigma$ = population standard deviation
$Z_{\alpha/2} = 1.65$ (for a 90% confidence interval)
$Z_{\alpha/2} = 1.96$ (for a 95% confidence interval)
$Z_{\alpha/2} = 2.58$ (for a 99% confidence interval)

NOTE: If necessary, round up to obtain a whole number.

Formula for Minimum Sample Size Needed for Interval Estimate of a Population Proportion

$n = \hat{p}\hat{q}\left(\dfrac{Z_{\alpha/2}}{E}\right)^{2}$

when $n\hat{p}$ and $n\hat{q}$ are each greater than or equal to 5, where

$\hat{p} = \dfrac{X}{n}$ and $\hat{q} = \dfrac{n - X}{n}$ or $\hat{q} = 1 - \hat{p}$

$n$ = sample size
$E$ = margin of error
$Z_{\alpha/2} = 1.65$ (for a 90% confidence interval)
$Z_{\alpha/2} = 1.96$ (for a 95% confidence interval)
$Z_{\alpha/2} = 2.58$ (for a 99% confidence interval)
$\hat{p}$ (read as "p hat") = sample proportion
$X$ = number of sample units that possess the characteristic of interest

NOTE: If necessary, round up to obtain a whole number.

Examples:

1. A scientist wishes to estimate the average depth of a river. He wants to be 99% confident that
the estimate is accurate within 2 feet. From the previous study, the standard deviation of the
depths measured was 4.33 feet. How large a sample is needed?

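Solution: $Z_{\alpha/2} = 2.58$ and $E = 2$

$n = \left(\dfrac{Z_{\alpha/2} \cdot \sigma}{E}\right)^{2} = \left[\dfrac{(2.58)(4.33)}{2}\right]^{2} = \left[\dfrac{11.1714}{2}\right]^{2} = (5.5857)^{2} = 31.2 \approx 32$
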

Round the value 31.2 up to 32. Therefore, to be 99% confident that the estimate is within 2
feet of the true mean depth, the scientist needs a sample of at least 32 measurements.
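The sample-size formula for a mean, including the round-up rule, can be sketched in a few lines. The function name `sample_size_for_mean` is hypothetical.

```python
from math import ceil

def sample_size_for_mean(z, sigma, e):
    """Minimum n so that z * sigma / sqrt(n) is at most the margin of error e."""
    # ceil implements "round up to a whole number": rounding down would give
    # a margin of error slightly larger than requested.
    return ceil((z * sigma / e) ** 2)

# River-depth example: 99% confidence (z = 2.58), sigma = 4.33 ft, E = 2 ft.
print(sample_size_for_mean(2.58, 4.33, 2))  # prints: 32
```

Rounding is always upward, even for a value such as 31.2, because a sample of 31 would not quite achieve the requested accuracy.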

2. A pizza shop owner wishes to find the 95% confidence interval of the true mean cost of a large
plain pizza. How large should the sample be if she wishes to be accurate to within ₱7.35? A
previous study showed that the standard deviation of the price was ₱12.74.

Solution: $Z_{\alpha/2} = 1.96$ and $E = 7.35$

$n = \left(\dfrac{Z_{\alpha/2} \cdot \sigma}{E}\right)^{2} = \left[\dfrac{(1.96)(12.74)}{7.35}\right]^{2} = \left[\dfrac{24.9704}{7.35}\right]^{2} = (3.3973)^{2} = 11.5 \approx 12$

Therefore, to be 95% confident that the estimated cost is within ₱7.35 of the true mean
cost, the pizza shop owner needs a sample of at least 12 pizzas.

3. A researcher wishes to estimate, with 95% confidence, the proportion of people who own a
home computer. A previous study shows that 40% of those interviewed had a computer at
home. The researcher wishes to be accurate within 2 % of the true proportion. Find the
minimum sample size necessary.

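Solution:
$Z_{\alpha/2} = 1.96$, $E = 0.02$, $\hat{p} = 0.40$, and $\hat{q} = 0.60$

$n = \hat{p}\hat{q}\left(\dfrac{Z_{\alpha/2}}{E}\right)^{2} = (0.40)(0.60)\left(\dfrac{1.96}{0.02}\right)^{2} = (0.24)(9604) = 2304.96$

Which, when rounded up, is 2305 people to interview.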

4. A researcher wishes to estimate the percentage of M & M that are brown. He wants to be 99%
confident and be accurate within 3% of the true proportion. How large a sample size would be
necessary?
Solution:
Since no prior estimate of $\hat{p}$ is available, assign a value of 0.5, so that $\hat{q} = 1 - \hat{p} = 1 - 0.5 = 0.5$.
Substitute in the formula, using $Z_{\alpha/2} = 2.58$ and $E = 0.03$.

$n = \hat{p}\hat{q}\left(\dfrac{Z_{\alpha/2}}{E}\right)^{2} = (0.5)(0.5)\left(\dfrac{2.58}{0.03}\right)^{2} = 1{,}849$

Here, a sample size of 1,849 would be needed.
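The proportion sample-size formula can be sketched the same way. Because a result such as 1,849 is an exact whole number, ordinary floating-point arithmetic could push it a hair above the integer and cause a spurious round-up, so this sketch uses exact fractions; the function name `sample_size_for_proportion`, the string arguments, and the default of 0.5 when no prior estimate exists are assumptions for illustration.

```python
from fractions import Fraction
from math import ceil

def sample_size_for_proportion(z, e, p_hat="0.5"):
    """Minimum n for a proportion estimate; arguments are decimal strings."""
    z, e, p_hat = Fraction(z), Fraction(e), Fraction(p_hat)
    q_hat = 1 - p_hat
    # Exact rational arithmetic, then round up to a whole number.
    return ceil(p_hat * q_hat * (z / e) ** 2)

print(sample_size_for_proportion("1.96", "0.02", "0.40"))  # prints: 2305
print(sample_size_for_proportion("2.58", "0.03"))          # prints: 1849
```

Using 0.5 when nothing is known about the proportion is the cautious choice, since the product of $\hat{p}$ and $\hat{q}$ is largest there and so yields the largest required sample.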
