1-Introduction To Statistics PDF

STA / BIF 205 – Biostatistics

Ralph Lteif
6 – Estimates and Sample Sizes with One Sample

6-1 Overview
6-2 Estimating a Population Proportion
6-3 Estimating a Population Mean: σ Known
6-4 Estimating a Population Mean: σ Not Known

6-1 Overview
In this chapter we begin working with the true core of inferential statistics as we use sample data to
make inferences about population parameters.
The two major applications of inferential statistics involve the use of sample data to
(1) estimate the value of a population parameter, and
(2) test some claim (or hypothesis) about a population.
In this chapter we introduce methods for estimating values of population proportions and means.
We also present methods for determining the sample sizes necessary to estimate those parameters.
We begin with proportions.

6-2 Estimating a Population Proportion
Here is the main objective of this section: Given a sample proportion, estimate the value of the
population proportion p.
This section will consider only cases in which the normal distribution can be used to approximate the
sampling distribution of sample proportions.

Recall from Section 1-3 that a simple random sample of n values is obtained if every possible
sample of size n has the same chance of being selected.
This requirement of random selection means that the methods of this section cannot be used with
some other types of sampling, such as stratified, cluster, and convenience sampling.

Assuming that we have a simple random sample and the other requirements listed previously are
also satisfied, we can now proceed with our major objective:
using the sample as a basis for estimating the value of the population proportion p.
We introduce the new notation p̂ (called “p hat”) for the sample proportion.

If we want to estimate a population proportion with a single value, the best estimate is p̂. Because p̂
consists of a single value, it is called a point estimate.

We use p̂ as the point estimate of p because it is unbiased and is the most consistent of the
estimators that could be used.

Why Do We Need Confidence Intervals?
In the preceding example we saw that 0.262 was our best point estimate of the population proportion
p, but we have no indication of just how good our best estimate is.
Because the point estimate has the serious flaw of not revealing anything about how good it is,
statisticians have cleverly developed another type of estimate.
This estimate, called a confidence interval or interval estimate, consists of a range (or an interval)
of values instead of just a single value.

A confidence interval is associated with a confidence level, such as 0.95 (or 95%).
The confidence level gives us the success rate of the procedure used to construct the confidence
interval.

Critical Values
A standard z score that can be used to distinguish between sample statistics that are likely to occur and those
that are unlikely is called a critical value. Critical values are based on the following observations:

These observations can be formalized with the following notation and definition.
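
As a rough illustration of the notation, a critical value can be computed instead of being read from a table. The sketch below assumes the usual definition (z_{α/2} is the z score with an area of α/2 to its right under the standard normal curve, where α = 1 − confidence level). The function name `critical_z` is hypothetical and only Python's standard library is used.

```python
from statistics import NormalDist


def critical_z(confidence_level: float) -> float:
    """Return z_{alpha/2}: the z score with area alpha/2 in the right tail."""
    alpha = 1.0 - confidence_level
    # inv_cdf expects the cumulative area to the LEFT of the score,
    # so the right-tail area alpha/2 becomes 1 - alpha/2.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {critical_z(level):.3f}")
# 90% confidence: z = 1.645
# 95% confidence: z = 1.960
# 99% confidence: z = 2.576
```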

Margin of Error

Given the way that the margin of error E is defined, there is a probability of 1 − α that a sample
proportion will be in error (different from the population proportion p) by no more than E, and there is
a probability of α that the sample proportion will be in error by more than E.


- The sample is a simple random sample.


- The conditions for a binomial experiment are satisfied, because there is a fixed number of trials
(580), the trials are independent (because the color of a pea pod doesn’t affect the probability of the
color of another pea pod), there are two categories of outcome (yellow, not yellow), and the
probability of yellow remains constant.
- Finally, we use n = 580 and p̂ = 0.262 to find that n·p̂ = 151.96 ≥ 5 and n·q̂ = 428.04 ≥ 5, so the normal
distribution can be used to approximate the binomial distribution.
The check of requirements has been successfully completed.
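
The requirement check and the resulting interval can be sketched in a few lines. This is a minimal sketch, not the course's own procedure: it assumes the margin of error has the usual form E = z_{α/2}·√(p̂·q̂/n) and that the normal approximation is considered acceptable when both n·p̂ and n·q̂ are at least 5. The function name `proportion_interval` is hypothetical; n = 580 and p̂ = 0.262 come from the pea pod example.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(n, p_hat, confidence_level=0.95):
    """Return (lower, upper, margin) for a population proportion."""
    q_hat = 1.0 - p_hat
    # Assumed rule of thumb: at least 5 expected successes and 5 failures.
    if n * p_hat < 5 or n * q_hat < 5:
        raise ValueError("normal approximation is not justified")
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence_level) / 2.0)
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin, margin


lower, upper, margin = proportion_interval(580, 0.262)
print(f"E = {margin:.3f}; {lower:.3f} < p < {upper:.3f}")
# E = 0.036; 0.226 < p < 0.298
```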

Interpreting a Confidence Interval
We must be careful to interpret confidence intervals correctly. There is a correct interpretation and
many other creative but wrong interpretations of the confidence interval 0.226 < p < 0.298.


With 95% confidence, we expect that 19 out of 20 samples should result in confidence intervals that
do contain the true value of p, and Figure 6-3 illustrates this with 19 of the confidence intervals
containing p, while one confidence interval does not contain p.
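
The "19 out of 20" idea can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the true proportion `true_p = 0.25` is a made-up value (in practice p is unknown), samples are simulated with a seeded random generator, and the interval uses the normal-approximation margin of error. The function name `coverage_rate` is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage_rate(true_p=0.25, n=580, trials=2000, confidence_level=0.95, seed=1):
    """Fraction of simulated confidence intervals that contain true_p."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence_level) / 2.0)
    hits = 0
    for _ in range(trials):
        # Simulate one simple random sample of n yes/no outcomes.
        successes = sum(rng.random() < true_p for _ in range(n))
        p_hat = successes / n
        margin = z * sqrt(p_hat * (1.0 - p_hat) / n)
        if p_hat - margin < true_p < p_hat + margin:
            hits += 1
    return hits / trials


rate = coverage_rate()
print(f"fraction of intervals containing p: {rate:.3f}")
```

The printed fraction should land close to 0.95. It describes the long-run success rate of the procedure, not the chance that any one computed interval contains p.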

Determining Sample Size
Suppose we want to collect sample data with the objective of estimating some population proportion.
How do we know how many sample items must be obtained?
If we begin with the expression for the margin of error E (Formula 6-1), then solve for n, we get
Formula 6-2.
Formula 6-2 requires p̂ as an estimate of the population proportion p, but if no such estimate is known (as is
often the case), we replace p̂ by 0.5 and replace q̂ by 0.5 (the product p̂·q̂ has 0.25 as its largest possible
value), with the result given in Formula 6-3.
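
A short sketch of both cases follows. It assumes that solving the margin of error for n gives n = z_{α/2}²·p̂·q̂ / E² (with p̂·q̂ replaced by 0.25 when no estimate is available) and that the result is rounded up to the next whole number. The margin of 0.03 and the function name `sample_size_proportion` are illustrative choices, not values from the slides.

```python
from math import ceil
from statistics import NormalDist


def sample_size_proportion(margin, confidence_level=0.95, p_hat=None):
    """Smallest whole n that achieves the desired margin of error."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence_level) / 2.0)
    # With no prior estimate, use the largest possible product p_hat * q_hat.
    pq = 0.25 if p_hat is None else p_hat * (1.0 - p_hat)
    return ceil(z ** 2 * pq / margin ** 2)


print(sample_size_proportion(0.03))               # no prior estimate: 1068
print(sample_size_proportion(0.03, p_hat=0.262))  # prior estimate 0.262: 826
```

Using 0.25 never understates the required sample size, which is why it is the safe choice when nothing is known about p.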

Finding the Point Estimate and E from a Confidence Interval
If we already know the confidence interval limits, the sample proportion p̂ and the margin of error E
can be found as follows:
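
Because the interval is centered on the point estimate, one plausible reading is that the point estimate is the midpoint of the limits and E is half their difference. The sketch below assumes exactly that; the function name `estimate_and_margin` is hypothetical, and the limits are those of the interval 0.226 < p < 0.298 quoted earlier.

```python
def estimate_and_margin(lower, upper):
    """Recover the point estimate and margin of error from interval limits."""
    point_estimate = (upper + lower) / 2.0  # midpoint of the interval
    margin = (upper - lower) / 2.0          # half the interval width
    return point_estimate, margin


p_hat, margin = estimate_and_margin(0.226, 0.298)
print(f"p_hat = {p_hat:.3f}, E = {margin:.3f}")
# p_hat = 0.262, E = 0.036
```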

6-3 Estimating a Population Mean: σ Known
In Section 6-2 we introduced the point estimate and confidence interval as tools for using a sample
proportion to estimate a population proportion.
We also showed how to determine the minimum sample size required to estimate a population
proportion.
In this section we again discuss point estimate, confidence interval, and sample size determination,
but we now consider the objective of estimating a population mean µ.
For example, important questions such as these can be addressed using the methods of this section
and the following section:
● What is the mean life span of bald eagles in the United States?

● What is the mean weight of elephants in Kenya?

● What is the mean amount of milk obtained from cows in New York State?

Requirements for Estimating µ when σ is known

When using the procedures of this section to estimate an unknown population mean µ, the above requirements
indicate that we must know the value of the population standard deviation σ.
It would be an unusual set of circumstances that would allow us to know σ without knowing µ.
After all, the only way to find the value of σ is to compute it from all of the known population values, so the
computation of µ would also be possible and, if we can find the true value of µ, there is no need to estimate it.
Although the confidence interval methods of this section are not very realistic, they do reveal the basic concepts
of important statistical reasoning, and they form the foundation for sample size determination discussed later.

In Section 6-2 we saw that the sample proportion p̂ is the best point estimate of the population
proportion p. For similar reasons, the sample mean x̄ is the best point estimate of the population
mean µ.
We use x̄ as the point estimate of µ because it is unbiased (the distribution of sample means tends
to center about the value of the population mean) and is the most consistent (smaller standard
deviation) of the estimators that could be used.

Confidence Intervals
Margin of Error
The difference between the sample mean and the population mean is an error.
In Section 5-5 we saw that σ/√n is the standard deviation of sample means.
Using σ/√n and the notation z_{α/2} introduced in Section 6-2, we now use the margin of error E
expressed as follows:

Using the margin of error E, we can now identify the confidence interval for the population mean µ (if
the requirements for this section are satisfied).
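
A minimal sketch of the computation, assuming the margin of error is E = z_{α/2}·σ/√n and the interval is x̄ − E < µ < x̄ + E. The inputs (x̄ = 98.20, σ = 0.62, n = 106) are illustrative assumptions chosen for the example, not values stated on these slides, and the function name `mean_interval_sigma_known` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mean_interval_sigma_known(x_bar, sigma, n, confidence_level=0.95):
    """Return (lower, upper, margin) for a population mean with sigma known."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence_level) / 2.0)
    margin = z * sigma / sqrt(n)  # critical value times sigma / sqrt(n)
    return x_bar - margin, x_bar + margin, margin


# Illustrative inputs (assumptions, not taken from the slides).
lower, upper, margin = mean_interval_sigma_known(x_bar=98.20, sigma=0.62, n=106)
print(f"E = {margin:.2f}; {lower:.2f} < mu < {upper:.2f}")
# E = 0.12; 98.08 < mu < 98.32
```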

Interpreting a Confidence Interval
After obtaining a confidence interval estimate of the population mean µ, such as a 95% confidence
interval of 98.08 < µ < 98.32, there is a correct interpretation and many wrong interpretations.


This means that if we were to select many different samples of size 106 and construct the
confidence intervals as we did here, 95% of them would actually contain the value of the population
mean µ.

Determining Sample Size Required to Estimate µ
When we plan to collect a simple random sample of data that will be used to estimate a population
mean µ, how many sample values must be obtained?
Determining the size of a simple random sample is a very important issue, because samples that are
needlessly large waste time and money, and samples that are too small may lead to poor results.
If we begin with the expression for the margin of error E (Formula 6-4) and solve for the sample size
n, we get the following:

Note that the sample size does not depend on the size (N) of the population.
The sample size must be a whole number, because it represents the number of sample values that
must be found.
However, when we use Formula 6-5 to calculate the sample size n, we usually get a result that is not
a whole number.
When this happens, we use the following round-off rule:

Dealing with Unknown σ When Finding Sample Size

Formula 6-5 requires that we substitute some value for the population standard deviation σ, but in
reality, it is usually unknown.
Here are some ways that we can work around this problem:
1. Use the range rule of thumb (see Section 2-5) to estimate the standard deviation as follows:
σ ≈ range/4

2. Conduct a pilot study by starting the sampling process. Based on the first collection of at least
31 randomly selected sample values, calculate the sample standard deviation s and use it in
place of σ.

3. Estimate the value of σ by using the results of some other study that was done earlier.
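
The first two workarounds are simple enough to sketch. The helper names `sigma_from_range` and `sigma_from_pilot` are hypothetical, and the numbers passed to them are made-up values used only to show the calls.

```python
from statistics import stdev


def sigma_from_range(low, high):
    """Range rule of thumb: sigma is roughly range / 4."""
    return (high - low) / 4.0


def sigma_from_pilot(values):
    """Use the sample standard deviation s of a pilot sample in place of sigma."""
    if len(values) < 31:
        raise ValueError("pilot sample should contain at least 31 values")
    return stdev(values)  # divides by n - 1


# Made-up inputs for illustration only.
print(sigma_from_range(55, 145))                  # 22.5
print(round(sigma_from_pilot(list(range(31))), 3))
```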

IQ tests are typically designed so that the mean is 100 and the standard deviation is 15.
Statistics professors have IQ scores with a mean greater than 100 and a standard deviation less
than 15 (because they are a more homogeneous group than people randomly selected from the
general population).
We do not know the specific value of σ for statistics professors, but we can play it safe by using σ =
15.
Using a value for σ that is larger than the true value will make the sample size larger than necessary,
but using a value for σ that is too small would result in a sample size that is inadequate.
When calculating the sample size n, any errors should always be conservative in the sense that they
make n too large instead of too small.


If we are willing to settle for less accurate results by using a larger margin of error, such as 4, the
sample size drops to 54.0225, which is rounded up to 55.
Doubling the margin of error causes the required sample size to decrease to one fourth its original
value.
Conversely, halving the margin of error quadruples the sample size.
Consequently, if you want more accurate results, the sample size must be substantially increased.
Because large samples generally require more time and money, there is often a need for a trade-off
between the sample size and the margin of error E.
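
The trade-off can be reproduced numerically. This sketch assumes the sample size formula has the form n = (z_{α/2}·σ/E)² with the result rounded up, and it uses the rounded table value z = 1.96 and σ = 15 so that the figure 54.0225 quoted above is matched. The margin of 2 is an assumption (half of 4) used to show the quadrupling, and the function name `sample_size_mean` is hypothetical.

```python
from math import ceil


def sample_size_mean(sigma, margin, z=1.96):
    """Smallest whole n for the desired margin of error (always round up)."""
    return ceil((z * sigma / margin) ** 2)


for margin in (2, 4):
    raw = (1.96 * 15 / margin) ** 2
    print(f"E = {margin}: n = {raw:.4f}, rounded up to {sample_size_mean(15, margin)}")
# E = 2: n = 216.0900, rounded up to 217
# E = 4: n = 54.0225, rounded up to 55
```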

6-4 Estimating a Population Mean: σ Not Known

In Section 6-3 we presented methods for constructing a confidence interval estimate of an unknown
population mean μ, but we considered only cases in which the population standard deviation σ is
known.
We noted that the requirement of a known σ is not very realistic, because the calculation of σ
requires that we know all of the population values, but if we know all of the population values we can
easily find the value of the population mean μ, so there is no need to estimate μ.
In this section we present a method for constructing confidence interval estimates of μ without the
requirement that σ is known.
The usual procedure is to collect sample data and find the value of the statistics n, x̄, and s.
Because the methods of this section are based on those statistics and σ is not required, the methods
of this section are realistic, practical, and often used.
Note that the following requirements for the methods of this section do not include a requirement that
σ is known.


The sampling distribution of sample means x̄ is exactly a normal distribution with mean μ and
standard deviation σ/√n whenever the population has a normal distribution with mean μ and
standard deviation σ.

If the population is not normally distributed, large samples yield sample means with a distribution
that is approximately normal with mean μ and standard deviation σ/√n.


As in Section 6-3, the distribution of sample means x̄ tends to be more consistent (with less
variation) than the distributions of other sample statistics, and the sample mean x̄ is an unbiased
estimator that targets the population mean μ.
In Sections 6-2 and 6-3 we noted that there is a serious limitation to the usefulness of a point
estimate: The single value of a point estimate does not reveal how good that estimate is.
Confidence intervals give us much more meaningful information by providing a range of values
associated with a degree of likelihood that the range actually does contain the true value of μ.
Here is the key point of this section: If σ is not known, but the requirements are satisfied, instead of
using the normal distribution, we use the Student t distribution.

A value of t_{α/2} can be found in Table A-3. To find a critical value t_{α/2} in Table A-3, locate the appropriate
number of degrees of freedom in the left column and proceed across the corresponding row until
reaching the number directly below the appropriate area at the top.

For example, if 10 students have quiz scores with a mean of 80, we can freely assign values to the first 9
scores, but the 10th score is then determined. The sum of the 10 scores must be 800, so the 10th score
must equal 800 minus the sum of the first 9 scores. Because those first 9 scores can be freely selected to
be any values, we say that there are 9 degrees of freedom available.
For the applications of this section, the number of degrees of freedom is simply the sample size
minus 1.
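
When Table A-3 is not at hand, a critical value can be approximated numerically. The sketch below is one possible approach using only the standard library, not the method used in the course: it integrates the Student t density with Simpson's rule and then searches by bisection for the value with an area of α/2 to its right. All function names are hypothetical.

```python
from math import exp, lgamma, log, pi


def t_pdf(t, df):
    """Density of the Student t distribution with df degrees of freedom."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + t * t / df))


def t_cdf(t, df, steps=2000):
    """P(T <= t) for t >= 0, using Simpson's rule on [0, t] (steps is even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def critical_t(confidence_level, n):
    """Approximate t_{alpha/2} for a sample of size n (df = n - 1)."""
    df = n - 1
    target = 1 - (1 - confidence_level) / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # widen the bracket until it holds the answer
        hi *= 2.0
    for _ in range(60):            # bisection
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(f"{critical_t(0.95, 10):.3f}")  # n = 10, so 9 degrees of freedom: 2.262
```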

We use critical values denoted by t_{α/2} for the margin of error E and the confidence interval.

The following procedure uses the above margin of error in the construction of confidence interval
estimates of μ.
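
The procedure can be sketched as follows, assuming the margin of error takes the form E = t_{α/2}·s/√n with n − 1 degrees of freedom. The sample values are hypothetical, the critical value 2.262 (95% confidence, 9 degrees of freedom) is supplied by the caller as if read from a t table, and the function name `mean_interval_sigma_unknown` is made up for this example.

```python
from math import sqrt
from statistics import mean, stdev


def mean_interval_sigma_unknown(values, t_crit):
    """Return (lower, upper, margin) using the sample standard deviation s."""
    n = len(values)
    x_bar = mean(values)
    s = stdev(values)                # sample standard deviation (n - 1 divisor)
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin, margin


# Hypothetical sample of 10 body temperatures in degrees Fahrenheit.
sample = [98.6, 98.2, 98.0, 98.4, 98.3, 97.9, 98.1, 98.5, 98.2, 98.3]
lower, upper, margin = mean_interval_sigma_unknown(sample, t_crit=2.262)
print(f"{lower:.2f} < mu < {upper:.2f} (E = {margin:.2f})")
```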

We now list the important properties of the t distribution that we are using in this section.

The following is a summary of the conditions indicating use of a t distribution instead of the standard
normal distribution.

Choosing the Appropriate Distribution

In Figure 6-6, note that if we have a small (n ≤ 30) sample drawn from a distribution that differs
dramatically from a normal distribution, we can’t use the methods described in this chapter.
One alternative is to use nonparametric methods (see Chapter 12), and another alternative is to use
the computer bootstrap method. In both of those approaches, no assumptions are made about the
original population.
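
To give a feel for the bootstrap idea, here is a minimal sketch of one common variant, the percentile bootstrap. The slides do not specify which variant is meant, so this is an assumption. The data are hypothetical skewed values, and the function name `bootstrap_mean_interval` is made up.

```python
import random
from statistics import mean


def bootstrap_mean_interval(values, confidence_level=0.95, resamples=5000, seed=0):
    """Percentile bootstrap interval for the mean (one of several variants)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample WITH replacement from the original sample and record each mean.
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(resamples))
    tail = (1.0 - confidence_level) / 2.0
    lower = means[int(tail * resamples)]
    upper = means[int((1.0 - tail) * resamples) - 1]
    return lower, upper


# Hypothetical small, strongly skewed sample.
data = [1, 1, 2, 2, 3, 3, 4, 5, 8, 13, 21, 40]
lower, upper = bootstrap_mean_interval(data)
print(f"{lower:.2f} < mu < {upper:.2f}")
```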

Finding Point Estimate and E from a Confidence Interval

Using Confidence Intervals to Describe, Explore, or Compare Data
• In some cases, we might use a confidence interval to achieve an ultimate goal of estimating the
value of a population parameter.
• For the body temperature data used in this section, an important goal might be to estimate the
mean body temperature of healthy adults, and our results strongly suggest that the commonly
used value of 98.6°F is incorrect (because we have 95% confidence that the limits of 98.08°F and
98.32°F contain the true value of the population mean).
• In other cases, a confidence interval might be one of several different tools used to describe,
explore, or compare data sets.
