1-Introduction To Statistics PDF
Ralph Lteif
6 – Estimates and Sample Sizes with One Sample
6-1 Overview
6-2 Estimating a Population Proportion
6-3 Estimating a Population Mean: σ Known
6-4 Estimating a Population Mean: σ Not Known
6-1 Overview
Recall from Section 1-3 that a simple random sample of n values is obtained if every possible
sample of size n has the same chance of being selected.
STA / BIF 205 – Biostatistics
6-2 Estimating a Population Proportion
This requirement of random selection means that the methods of this section cannot be used with
some other types of sampling, such as stratified, cluster, and convenience sampling.
Assuming that we have a simple random sample and the other requirements listed previously are
also satisfied, we can now proceed with our major objective:
using the sample as a basis for estimating the value of the population proportion p.
We introduce the new notation p̂ (called “p hat”) for the sample proportion.
We use p̂ as the point estimate of p because it is unbiased and is the most consistent of the
estimators that could be used.
Given the way that the margin of error E is defined, there is a probability of 1 − α that a sample
proportion will be in error (different from the population proportion p) by no more than E, and there is
a probability of α that the sample proportion will be in error by more than E.
With 95% confidence, we expect that 19 out of 20 samples should result in confidence intervals that
do contain the true value of p, and Figure 6-3 illustrates this with 19 of the confidence intervals
containing p, while one confidence interval does not contain p.
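The 19-out-of-20 interpretation can be checked with a small simulation. The Python sketch below is illustrative only. It assumes the usual large-sample interval p̂ ± E with E = z(α/2)·√(p̂(1 − p̂)/n), which this excerpt does not state explicitly. The population proportion, sample size, number of trials, and seed are made-up values, and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def proportion_interval(successes, n, confidence=0.95):
    """Return (p_hat, E, lower, upper) for a population proportion.

    Assumes the standard normal-approximation formula (not quoted in the text).
    """
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z_{alpha/2}
    p_hat = successes / n                    # sample proportion, the point estimate of p
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, margin, p_hat - margin, p_hat + margin


def coverage_rate(true_p, n, trials, confidence=0.95, seed=1):
    """Fraction of simulated simple random samples whose interval contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Each trial draws one sample of size n and counts the successes.
        successes = sum(rng.random() < true_p for _ in range(n))
        _, _, lower, upper = proportion_interval(successes, n, confidence)
        if lower <= true_p <= upper:
            hits += 1
    return hits / trials


# Hypothetical population: p = 0.40, samples of size 400.
rate = coverage_rate(true_p=0.40, n=400, trials=2000)
print(f"Share of 95% intervals containing p: {rate:.3f}")
```

The printed share should be close to 0.95, that is, roughly 19 intervals out of every 20 contain p.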
● What is the mean amount of milk obtained from cows in New York State?
When using the procedures of this section to estimate an unknown population mean µ, the above requirements
indicate that we must know the value of the population standard deviation σ.
It would be an unusual set of circumstances that would allow us to know σ without knowing µ.
After all, the only way to find the value of σ is to compute it from all of the known population values, so the
computation of µ would also be possible and, if we can find the true value of µ, there is no need to estimate it.
Although the confidence interval methods of this section are not very realistic, they do reveal the basic concepts
of important statistical reasoning, and they form the foundation for sample size determination discussed later.
This means that if we were to select many different samples of size 106 and construct the
confidence intervals as we did here, 95% of them would actually contain the value of the population
mean µ.
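As a rough illustration of how such an interval is built when σ is known, the Python sketch below assumes the standard form x̄ ± E with E = z(α/2)·σ/√n, which this excerpt does not spell out. Only the sample size 106 comes from the text. The sample mean and σ are hypothetical values chosen for the example, and the function name is invented.

```python
import math
from statistics import NormalDist


def mean_interval_sigma_known(x_bar, sigma, n, confidence=0.95):
    """Return (E, lower, upper) for a population mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z_{alpha/2}
    margin = z * sigma / math.sqrt(n)        # margin of error E
    return margin, x_bar - margin, x_bar + margin


# n = 106 matches the sample size mentioned above; x_bar and sigma are hypothetical.
margin, lower, upper = mean_interval_sigma_known(x_bar=98.20, sigma=0.62, n=106)
print(f"E = {margin:.3f}; {lower:.2f} < mu < {upper:.2f}")
```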
Note that the sample size does not depend on the size (N) of the population!
6-3 Estimating a Population Mean: σ Known
The sample size must be a whole number, because it represents the number of sample values that
must be found.
However, when we use Formula 6-5 to calculate the sample size n, we usually get a result that is not
a whole number.
When this happens, we use the following round-off rule: always round the result up to the next larger whole number.
Formula 6-5 requires that we substitute some value for the population standard deviation σ, but in
reality, it is usually unknown.
Here are some ways that we can work around this problem:
1. Use the range rule of thumb (see Section 2-5) to estimate the standard deviation as follows:
σ ≈ range/4
2. Conduct a pilot study by starting the sampling process. Based on the first collection of at least
31 randomly selected sample values, calculate the sample standard deviation s and use it in
place of σ.
3. Estimate the value of σ by using the results of some other study that was done earlier.
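The first two workarounds can be sketched in Python as follows. This is a minimal illustration with hypothetical function names and made-up input values. The 31-value minimum is taken from item 2 above.

```python
import statistics


def sigma_from_range(lowest, highest):
    """Range rule of thumb: sigma is roughly range / 4."""
    return (highest - lowest) / 4


def sigma_from_pilot(values, minimum_size=31):
    """Use the sample standard deviation s of a pilot sample in place of sigma."""
    if len(values) < minimum_size:
        raise ValueError(f"pilot sample needs at least {minimum_size} values")
    return statistics.stdev(values)  # sample standard deviation (n - 1 divisor)


# Hypothetical measurements expected to fall between 40 and 100.
print(sigma_from_range(40, 100))
```

Either estimate can then be substituted for σ in the sample size formula.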
If we are willing to settle for less accurate results by using a larger margin of error, such as 4, the
sample size drops to 54.0225, which is rounded up to 55.
Doubling the margin of error causes the required sample size to decrease to one fourth its original
value.
Conversely, halving the margin of error quadruples the sample size.
Consequently, if you want more accurate results, the sample size must be substantially increased.
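The effect of the margin of error on the sample size can be reproduced with a few lines of Python. The sketch assumes that Formula 6-5 is the standard n = (z(α/2)·σ/E)², and it assumes 95% confidence (z = 1.96) and σ = 15. Those values are not stated in this excerpt but are consistent with the 54.0225 quoted above. The E = 2 case is included only to show the scaling, and the function name is hypothetical.

```python
import math


def sample_size_for_mean(z, sigma, margin_of_error):
    """Return (unrounded n, n rounded up) from n = (z * sigma / E) ** 2."""
    raw = (z * sigma / margin_of_error) ** 2
    return raw, math.ceil(raw)  # round-off rule: always round up


Z_95 = 1.96  # table value of z_{alpha/2} for 95% confidence (assumed)
SIGMA = 15   # assumed; it reproduces the 54.0225 quoted in the text

for e in (2, 4):
    raw, n = sample_size_for_mean(Z_95, SIGMA, e)
    print(f"E = {e}: n = {raw:.4f}, rounded up to {n}")
```

Because E appears squared in the denominator, doubling E divides the unrounded n by exactly 4.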
Because large samples generally require more time and money, there is often a need for a trade-off
between the sample size and the margin of error E.
6-4 Estimating a Population Mean: σ Not Known
The sampling distribution of sample means x̄ is exactly a normal distribution with mean μ and
standard deviation σ/√n whenever the population has a normal distribution with mean μ and
standard deviation σ.
If the population is not normally distributed, large samples yield sample means with a distribution
that is approximately normal with mean μ and standard deviation σ/√n.
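A quick simulation can make the second statement concrete. The Python sketch below draws repeated samples from a clearly non-normal population and compares the spread of the sample means with σ/√n. The choice of an exponential population, the sample size, the number of samples, and the seed are all assumptions made for illustration.

```python
import math
import random
import statistics

rng = random.Random(42)

n = 50                # size of each sample
num_samples = 4000    # number of samples drawn
mu, sigma = 1.0, 1.0  # an exponential population with rate 1 has mean 1 and SD 1

# One sample mean per simulated sample.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

observed_mean = statistics.fmean(sample_means)
observed_sd = statistics.stdev(sample_means)
predicted_sd = sigma / math.sqrt(n)

print(f"mean of sample means: {observed_mean:.3f} (population mean {mu})")
print(f"SD of sample means:   {observed_sd:.3f} (sigma / sqrt(n) = {predicted_sd:.3f})")
```

The two printed pairs should agree closely, even though the individual population values are strongly skewed.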
As in Section 6-3, the distribution of sample means x̄ tends to be more consistent (with less
variation) than the distributions of other sample statistics, and the sample mean x̄ is an unbiased
estimator that targets the population mean μ.
In Sections 6-2 and 6-3 we noted that there is a serious limitation to the usefulness of a point
estimate: The single value of a point estimate does not reveal how good that estimate is.
Confidence intervals give us much more meaningful information by providing a range of values
associated with a degree of likelihood that the range actually does contain the true value of μ.
Here is the key point of this section: If σ is not known, but the requirements are satisfied, instead of
using the normal distribution, we use the Student t distribution.
For example, if 10 students have quiz scores with a mean of 80, we can freely assign values to the first 9
scores, but the 10th score is then determined. The sum of the 10 scores must be 800, so the 10th score
must equal 800 minus the sum of the first 9 scores. Because those first 9 scores can be freely selected to
be any values, we say that there are 9 degrees of freedom available.
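The quiz-score example can be written out in Python. The nine freely chosen scores below are arbitrary and the function name is hypothetical. Only n = 10 and the mean of 80 come from the text.

```python
def last_score(free_scores, n, mean):
    """Return the one remaining score once n - 1 scores and the mean are fixed."""
    if len(free_scores) != n - 1:
        raise ValueError("exactly n - 1 scores may be chosen freely")
    return n * mean - sum(free_scores)  # the total must equal n * mean


first_nine = [75, 82, 90, 68, 77, 85, 79, 88, 81]  # arbitrary choices
tenth = last_score(first_nine, n=10, mean=80)
print(tenth)  # 800 minus the sum of the first nine scores
```

Changing any of the nine values changes the tenth, but the tenth itself can never be chosen independently.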
For the applications of this section, the number of degrees of freedom is simply the sample size
minus 1.
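To tie the key point of this section to the degrees-of-freedom rule, here is a hedged Python sketch of an interval for μ when σ is not known. It assumes the standard form x̄ ± t(α/2)·s/√n, which this excerpt does not state, and it takes the critical t value as an input looked up in a t table for n − 1 degrees of freedom. The scores are made-up data and the function name is hypothetical.

```python
import math
import statistics


def mean_interval_sigma_unknown(values, t_critical):
    """Return (df, x_bar, E, lower, upper) using the Student t distribution.

    t_critical must be looked up for df = n - 1 and the chosen confidence level.
    """
    n = len(values)
    df = n - 1                       # degrees of freedom: sample size minus 1
    x_bar = statistics.fmean(values)
    s = statistics.stdev(values)     # sample standard deviation replaces sigma
    margin = t_critical * s / math.sqrt(n)
    return df, x_bar, margin, x_bar - margin, x_bar + margin


scores = [75, 82, 90, 68, 77, 85, 79, 88, 81, 75]  # hypothetical quiz scores
T_95_DF9 = 2.262  # standard t-table value for 95% confidence and 9 degrees of freedom

df, x_bar, margin, lower, upper = mean_interval_sigma_unknown(scores, T_95_DF9)
print(f"df = {df}, x_bar = {x_bar:.1f}, E = {margin:.2f}")
print(f"{lower:.2f} < mu < {upper:.2f}")
```

The critical value 2.262 is larger than the corresponding z value of 1.96, so the t-based interval is wider, reflecting the extra uncertainty from estimating σ with s.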