1-Introduction To Statistics PDF

STA / BIF 205 – Biostatistics
Ralph Lteif
6 – Estimates and Sample Sizes with One Sample
6-1 Overview
6-2 Estimating a Population Proportion
6-3 Estimating a Population Mean: σ Known
6-4 Estimating a Population Mean: σ Not Known

6-1 Overview
In this chapter we begin working with the true core of inferential statistics as we use sample data to
make inferences about population parameters.
The two major applications of inferential statistics involve the use of sample data to
(1) estimate the value of a population parameter, and
(2) test some claim (or hypothesis) about a population.
In this chapter we introduce methods for estimating values of population proportions and means.
We also present methods for determining the sample sizes necessary to estimate those parameters.
We begin with proportions.


6-2 Estimating a Population Proportion

Here is the main objective of this section: given a sample proportion, estimate the value of the
population proportion p.
This section will consider only cases in which the normal distribution can be used to approximate the
sampling distribution of sample proportions.
Recall from Section 1-3 that a simple random sample of n values is obtained if every possible
sample of size n has the same chance of being selected.
This requirement of random selection means that the methods of this section cannot be used with
some other types of sampling, such as stratified, cluster, and convenience sampling.
Assuming that we have a simple random sample and the other requirements listed previously are
also satisfied, we can now proceed with our major objective:
using the sample as a basis for estimating the value of the population proportion p.
We introduce the new notation p̂ (called “p hat”) for the sample proportion: p̂ = x/n, where x is the number of
successes among the n sample values, and we write q̂ = 1 − p̂.

If we want to estimate a population proportion with a single value, the best estimate is p̂. Because p̂
consists of a single value, it is called a point estimate.
We use p̂ as the point estimate of p because it is unbiased and is the most consistent of the
estimators that could be used.


Why Do We Need Confidence Intervals?
In the preceding example we saw that 0.262 was our best point estimate of the population proportion
p, but we have no indication of just how good our best estimate is.
Because the point estimate has the serious flaw of not revealing anything about how good it is,
statisticians have cleverly developed another type of estimate.
This estimate, called a confidence interval or interval estimate, consists of a range (or an interval)
of values instead of just a single value.

A confidence interval is associated with a confidence level, such as 0.95 (or 95%).
The confidence level gives us the success rate of the procedure used to construct the confidence
interval.

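Critical Values
A standard z score that can be used to distinguish between sample statistics that are likely to occur and those
that are unlikely is called a critical value.
The critical value $z_{\alpha/2}$ is the positive z score that separates an area of α/2 in the right tail of the
standard normal distribution. By symmetry, $-z_{\alpha/2}$ separates an area of α/2 in the left tail, so the area
between $-z_{\alpha/2}$ and $z_{\alpha/2}$ is 1 − α, the confidence level. For example, a 95% confidence level has
α = 0.05 and $z_{\alpha/2} = 1.96$.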
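The critical value can also be computed instead of read from a table. The sketch below assumes Python 3.8 or later and its standard `statistics.NormalDist` class; the function name `z_critical` is our own. It returns the z score whose area to the left is 1 − α/2, which is the z score leaving an area of α/2 in the right tail.

```python
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Return z_{alpha/2}: the z score leaving area alpha/2 in the right tail
    of the standard normal distribution, where alpha = 1 - confidence."""
    alpha = 1 - confidence
    # The area to the LEFT of z_{alpha/2} is 1 - alpha/2.
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```

The printed values should be close to the familiar table values 1.645, 1.96 and 2.575 (the last is 2.576 when rounded from the exact value).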

Margin of Error
Given the way that the margin of error E is defined, there is a probability of 1 − α that a sample
proportion will be in error (different from the population proportion p) by no more than E, and there is
a probability of α that the sample proportion will be in error by more than E.
The margin of error and the resulting confidence interval for p are:
$$E = z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \qquad \text{(Formula 6-1)}$$
$$\hat{p} - E < p < \hat{p} + E$$

Checking the requirements for the pea-pod example (n = 580, p̂ = 0.262):
- The sample is a simple random sample.

- The conditions for a binomial experiment are satisfied, because there is a fixed number of trials
(580), the trials are independent (because the color of a pea pod doesn’t affect the probability of the
color of another pea pod), there are two categories of outcome (yellow, not yellow), and the
probability of yellow remains constant.
- Finally, we use n = 580 and p̂ = 0.262 to find that np̂ = (580)(0.262) ≈ 152 ≥ 5 and
nq̂ = (580)(0.738) ≈ 428 ≥ 5, so the normal distribution can be used to approximate the binomial distribution.
The check of requirements has been successfully completed.
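The requirement check and Formula 6-1 can be combined into one small function. This is a sketch, assuming Python's standard `statistics.NormalDist` for $z_{\alpha/2}$ and the usual criterion np̂ ≥ 5 and nq̂ ≥ 5 for the normal approximation; `proportion_ci` is a name chosen here. With the pea-pod values n = 580 and p̂ = 0.262 it reproduces the interval discussed in these notes.

```python
import math
from statistics import NormalDist


def proportion_ci(p_hat: float, n: int, confidence: float = 0.95):
    """Return (lower, upper) for p_hat - E < p < p_hat + E,
    where E = z_{alpha/2} * sqrt(p_hat * q_hat / n)."""
    q_hat = 1 - p_hat
    # Assumed requirement check: n*p_hat >= 5 and n*q_hat >= 5, so the
    # normal distribution can approximate the binomial distribution.
    if n * p_hat < 5 or n * q_hat < 5:
        raise ValueError("normal approximation to the binomial is not justified")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_hat * q_hat / n)  # margin of error E
    return p_hat - margin, p_hat + margin


low, high = proportion_ci(p_hat=0.262, n=580)
print(f"{low:.3f} < p < {high:.3f}")  # 0.226 < p < 0.298
```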



Interpreting a Confidence Interval
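We must be careful to interpret confidence intervals correctly. There is a correct interpretation and
many other creative but wrong interpretations of the confidence interval 0.226 < p < 0.298.
The correct interpretation is: we are 95% confident that the interval from 0.226 to 0.298 actually does
contain the true value of the population proportion p.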

With a 95% confidence level, we expect about 19 out of 20 samples to result in confidence intervals that
contain the true value of p. Figure 6-3 illustrates this: 19 of the confidence intervals contain p, while one
confidence interval does not.
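The “19 out of 20” statement is about repeating the whole procedure many times. The simulation below is a sketch of that idea: it assumes a hypothetical true proportion p = 0.262 with n = 580 (values borrowed from the pea-pod example purely for illustration), draws many simple random samples, and counts how often p̂ − E < p < p̂ + E captures p. It uses only Python's standard `random` and `statistics` modules; `coverage_rate` is a name invented here.

```python
import math
import random
from statistics import NormalDist


def coverage_rate(p: float, n: int, trials: int, confidence: float = 0.95,
                  seed: int = 1) -> float:
    """Fraction of simulated samples whose confidence interval contains p."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    hits = 0
    for _ in range(trials):
        # One sample of n independent trials, each a success with probability p.
        x = sum(rng.random() < p for _ in range(n))
        p_hat = x / n
        margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin < p < p_hat + margin:
            hits += 1
    return hits / trials


rate = coverage_rate(p=0.262, n=580, trials=2000)
print(rate)
```

The printed rate should be close to 0.95, that is, roughly 19 of every 20 intervals contain p.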

Determining Sample Size
Suppose we want to collect sample data with the objective of estimating some population proportion.
How do we know how many sample items must be obtained?
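If we begin with the expression for the margin of error E (Formula 6-1) and solve for n, we get
Formula 6-2:
$$n = \frac{[z_{\alpha/2}]^2\,\hat{p}\hat{q}}{E^2} \qquad \text{(Formula 6-2)}$$
Formula 6-2 requires p̂ as an estimate of the population proportion p, but if no such estimate is known (as is
often the case), we replace p̂ by 0.5 and replace q̂ by 0.5 (the product p̂q̂ has 0.25 as its largest possible
value), with the result given in Formula 6-3:
$$n = \frac{[z_{\alpha/2}]^2 \cdot 0.25}{E^2} \qquad \text{(Formula 6-3)}$$
In either case, if the computed n is not a whole number, round it up to the next larger whole number.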
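Formulas 6-2 and 6-3, with the round-up step, can be sketched as follows. The margin of error E = 0.03 (three percentage points) is a hypothetical choice, and reusing p̂ = 0.262 as a prior estimate is only for illustration; `sample_size_proportion` is a name invented here, and $z_{\alpha/2}$ comes from Python's standard `statistics.NormalDist`.

```python
import math
from statistics import NormalDist
from typing import Optional


def sample_size_proportion(margin: float, confidence: float = 0.95,
                           p_hat: Optional[float] = None) -> int:
    """Minimum n to estimate p within `margin` at the given confidence.
    Uses Formula 6-2 when p_hat is supplied, otherwise Formula 6-3."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # With no prior estimate, p_hat * q_hat is replaced by its maximum, 0.25.
    pq = 0.25 if p_hat is None else p_hat * (1 - p_hat)
    n = z ** 2 * pq / margin ** 2
    return math.ceil(n)  # round up to the next whole number


print(sample_size_proportion(0.03))               # 1068 (no prior estimate)
print(sample_size_proportion(0.03, p_hat=0.262))  # 826 (prior estimate used)
```

The prior estimate lowers the required sample size because p̂q̂ = (0.262)(0.738) ≈ 0.193 is smaller than the worst case 0.25.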





Finding the Point Estimate and E from a Confidence Interval
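If we already know the confidence interval limits, the sample proportion p̂ and the margin of error E
can be found as follows:
$$\hat{p} = \frac{(\text{upper confidence limit}) + (\text{lower confidence limit})}{2}$$
$$E = \frac{(\text{upper confidence limit}) - (\text{lower confidence limit})}{2}$$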
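The two formulas above amount to taking the midpoint and the half-width of the interval. A minimal sketch (the function name `estimate_from_interval` is ours), applied to the interval 0.226 < p < 0.298 from the pea-pod example:

```python
from typing import Tuple


def estimate_from_interval(lower: float, upper: float) -> Tuple[float, float]:
    """Recover the point estimate and margin of error E from the
    confidence interval limits (lower, upper)."""
    point = (upper + lower) / 2   # midpoint of the interval
    margin = (upper - lower) / 2  # half the width of the interval
    return point, margin


p_hat, E = estimate_from_interval(0.226, 0.298)
print(f"p_hat = {p_hat:.3f}, E = {E:.3f}")  # p_hat = 0.262, E = 0.036
```

Nothing in the function is specific to proportions, so the same midpoint and half-width calculation also recovers x̄ and E from an interval for a mean.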


6-3 Estimating a Population Mean: σ Known

In Section 6-2 we introduced the point estimate and confidence interval as tools for using a sample
proportion to estimate a population proportion.
We also showed how to determine the minimum sample size required to estimate a population
proportion.
In this section we again discuss point estimate, confidence interval, and sample size determination,
but we now consider the objective of estimating a population mean µ.
For example, important questions such as these can be addressed using the methods of this section
and the following section:
● What is the mean life span of bald eagles in the United States?
● What is the mean weight of elephants in Kenya?
● What is the mean amount of milk obtained from cows in New York State?

Requirements for Estimating µ when σ is known
When using the procedures of this section to estimate an unknown population mean µ, the requirements are that
the sample is a simple random sample, that the value of the population standard deviation σ is known, and that
the population is normally distributed or n > 30 (or both). In particular, we must know the value of σ.
It would be an unusual set of circumstances that would allow us to know σ without knowing µ.
After all, the only way to find the value of σ is to compute it from all of the known population values, so the
computation of µ would also be possible, and if we can find the true value of µ, there is no need to estimate it.
Although the confidence interval methods of this section are not very realistic, they do reveal the basic concepts
of important statistical reasoning, and they form the foundation for sample size determination discussed later.

In Section 6-2 we saw that the sample proportion p̂ is the best point estimate of the population
proportion p. For similar reasons, the sample mean x̄ is the best point estimate of the population
mean µ.
We use x̄ as the point estimate of µ because it is unbiased (the distribution of sample means tends
to center about the value of the population mean) and is the most consistent (smaller standard
deviation) of the estimators that could be used.

Confidence Intervals
Margin of Error
The difference between the sample mean and the population mean is an error.
In Section 5-5 we saw that σ/√n is the standard deviation of sample means.
Using σ/√n and the notation $z_{\alpha/2}$ introduced in Section 6-2, we now express the margin of error E
as follows:
$$E = z_{\alpha/2}\cdot\frac{\sigma}{\sqrt{n}} \qquad \text{(Formula 6-4)}$$

Using the margin of error E, we can now identify the confidence interval for the population mean µ (if
the requirements for this section are satisfied):
$$\bar{x} - E < \mu < \bar{x} + E$$
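Formula 6-4 and the interval x̄ − E < µ < x̄ + E translate directly into code. This sketch uses Python's standard `statistics.NormalDist`; `mean_ci_sigma_known` is a name chosen here. The notes do not state σ or x̄ for the body-temperature sample of n = 106, so x̄ = 98.20 is taken as the midpoint of the interval quoted below, and σ = 0.62 is an assumed value used only for illustration. With these inputs the limits round to 98.08 and 98.32.

```python
import math
from statistics import NormalDist


def mean_ci_sigma_known(x_bar: float, sigma: float, n: int,
                        confidence: float = 0.95):
    """Return (lower, upper) for x_bar - E < mu < x_bar + E,
    where E = z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)  # Formula 6-4
    return x_bar - margin, x_bar + margin


# sigma = 0.62 is an ASSUMED value; x_bar = 98.20 is the interval midpoint.
low, high = mean_ci_sigma_known(x_bar=98.20, sigma=0.62, n=106)
print(f"{low:.2f} < mu < {high:.2f}")  # 98.08 < mu < 98.32
```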



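After obtaining a confidence interval estimate of the population mean µ, such as a 95% confidence
interval of 98.08 < µ < 98.32, there is a correct interpretation and many wrong interpretations.
The correct interpretation is: we are 95% confident that the interval from 98.08 to 98.32 actually does
contain the true value of µ.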



This means that if we were to select many different samples of size 106 and construct the
confidence intervals as we did here, 95% of them would actually contain the value of the population
mean µ.


Determining Sample Size Required to Estimate µ
When we plan to collect a simple random sample of data that will be used to estimate a population
mean µ, how many sample values must be obtained?
Determining the size of a simple random sample is a very important issue, because samples that are
needlessly large waste time and money, and samples that are too small may lead to poor results.
If we begin with the expression for the margin of error E (Formula 6-4) and solve for the sample size
n, we get the following:
$$n = \left[\frac{z_{\alpha/2}\,\sigma}{E}\right]^2 \qquad \text{(Formula 6-5)}$$
Note that the sample size does not depend on the size (N) of the population.
The sample size must be a whole number, because it represents the number of sample values that
must be found.
However, when we use Formula 6-5 to calculate the sample size n, we usually get a result that is not
a whole number.
When this happens, we use the following round-off rule: if the computed sample size n is not a whole
number, round it up to the next larger whole number.

Dealing with Unknown σ When Finding Sample Size
Formula 6-5 requires that we substitute some value for the population standard deviation σ, but in
reality, it is usually unknown.
Here are some ways that we can work around this problem:
1. Use the range rule of thumb (see Section 2-5) to estimate the standard deviation as follows:
σ ≈ range/4
2. Conduct a pilot study by starting the sampling process. Based on the first collection of at least
31 randomly selected sample values, calculate the sample standard deviation s and use it in
place of σ.
3. Estimate the value of σ by using the results of some other study that was done earlier.
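The first two workarounds are easy to compute. The sketch below is illustrative only: the 31 "pilot" values are simulated with Python's standard `random` module (they are not real data), and `sigma_range_rule` is a name chosen here. It compares the range rule of thumb with the sample standard deviation s from `statistics.stdev`.

```python
import random
import statistics


def sigma_range_rule(values):
    """Range rule of thumb: sigma is approximately range / 4."""
    return (max(values) - min(values)) / 4


# Hypothetical pilot study: 31 simulated values (not real data).
rng = random.Random(7)
pilot = [rng.gauss(100, 15) for _ in range(31)]

print(f"range rule estimate: {sigma_range_rule(pilot):.1f}")
print(f"pilot-study s:       {statistics.stdev(pilot):.1f}")
```

The range rule is only a rough guide; the pilot-study value s uses all of the data and is usually the better substitute for σ when it is available.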

IQ tests are typically designed so that the mean is 100 and the standard deviation is 15.
Statistics professors have IQ scores with a mean greater than 100 and a standard deviation less
than 15 (because they are a more homogeneous group than people randomly selected from the
general population).
We do not know the specific value of σ for statistics professors, but we can play it safe by using σ =
15.
Using a value for σ that is larger than the true value will make the sample size larger than necessary,
but using a value for σ that is too small would result in a sample size that is inadequate.
When calculating the sample size n, any errors should always be conservative in the sense that they
make n too large instead of too small.



If we are willing to settle for less accurate results by using a larger margin of error, such as 4, the
sample size drops to 54.0225, which is rounded up to 55.
Doubling the margin of error causes the required sample size to decrease to one fourth its original
value.
Conversely, halving the margin of error quadruples the sample size.
Consequently, if you want more accurate results, the sample size must be substantially increased.
Because large samples generally require more time and money, there is often a need for a trade-off
between the sample size and the margin of error E.
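Formula 6-5 with the round-up rule reproduces the numbers in the IQ example. This sketch uses z = 1.96 (95% confidence) and σ = 15 as in the text; the margin E = 2 is inferred from the statement that E = 4 doubles the margin of error. `sample_size_mean` is a name chosen here.

```python
import math


def sample_size_mean(z: float, sigma: float, margin: float):
    """Formula 6-5: return the raw value (z * sigma / E)**2 and the
    sample size after rounding up to the next whole number."""
    raw = (z * sigma / margin) ** 2
    return raw, math.ceil(raw)


for E in (2, 4):
    raw, n = sample_size_mean(z=1.96, sigma=15, margin=E)
    print(f"E = {E}: n = {raw:.4f} -> {n}")
# E = 2: n = 216.0900 -> 217
# E = 4: n = 54.0225 -> 55
```

Because E appears squared in the denominator, the raw value for E = 2 is exactly four times the raw value for E = 4.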

6-4 Estimating a Population Mean: σ Not Known

In Section 6-3 we presented methods for constructing a confidence interval estimate of an unknown
population mean μ, but we considered only cases in which the population standard deviation σ is
known.
We noted that the requirement of a known σ is not very realistic, because the calculation of σ
requires that we know all of the population values, but if we know all of the population values we can
easily find the value of the population mean μ, so there is no need to estimate μ.
In this section we present a method for constructing confidence interval estimates of μ without the
requirement that σ is known.
The usual procedure is to collect sample data and find the values of the statistics n, x̄, and s.
Because the methods of this section are based on those statistics and σ is not required, the methods
of this section are realistic and practical, and they are used often.
The requirements for the methods of this section are that the sample is a simple random sample and that the
population is normally distributed or n > 30 (or both). Note that these requirements do not include a requirement
that σ is known.

The sampling distribution of sample means x̄ is exactly a normal distribution with mean μ and
standard deviation σ/√n whenever the population has a normal distribution with mean μ and
standard deviation σ.
If the population is not normally distributed, large samples yield sample means with a distribution
that is approximately normal with mean μ and standard deviation σ/√n.
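A quick simulation makes the σ/√n statement concrete. The population here is hypothetical: normal with μ = 100 and σ = 15 (borrowed from the IQ design mentioned earlier), with samples of size n = 25, so σ/√n = 3. The sketch uses Python's standard `random` and `statistics` modules; `simulate_sample_means` is a name chosen here.

```python
import random
import statistics


def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` samples of size n from a normal population and
    return the mean of each sample."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
            for _ in range(reps)]


means = simulate_sample_means(mu=100, sigma=15, n=25, reps=5000)
center = statistics.fmean(means)
spread = statistics.stdev(means)
print(f"mean of sample means: {center:.2f}")
print(f"SD of sample means:   {spread:.2f}  (sigma/sqrt(n) = {15 / 25 ** 0.5:.2f})")
```

The mean of the simulated sample means should land close to μ = 100 and their standard deviation close to 3.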

As in Section 6-3, the distribution of sample means x̄ tends to be more consistent (with less
variation) than the distributions of other sample statistics, and the sample mean x̄ is an unbiased
estimator that targets the population mean μ.
In Sections 6-2 and 6-3 we noted that there is a serious limitation to the usefulness of a point
estimate: The single value of a point estimate does not reveal how good that estimate is.
Confidence intervals give us much more meaningful information by providing a range of values
associated with a degree of likelihood that the range actually does contain the true value of μ.
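Here is the key point of this section: if σ is not known but the requirements are satisfied, we use the
Student t distribution instead of the normal distribution.
If a population has a normal distribution, then the distribution of
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
is a Student t distribution for all samples of size n.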


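A value of $t_{\alpha/2}$ can be found in Table A-3. To find a critical value $t_{\alpha/2}$ in Table A-3, locate the
appropriate number of degrees of freedom in the left column and proceed across the corresponding row until
reaching the number directly below the appropriate area at the top.
The number of degrees of freedom for a collection of sample data is the number of sample values that can vary
after certain restrictions have been imposed on all data values.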
For example, if 10 students have quiz scores with a mean of 80, we can freely assign values to the first 9
scores, but the 10th score is then determined. The sum of the 10 scores must be 800, so the 10th score
must equal 800 minus the sum of the first 9 scores. Because those first 9 scores can be freely selected to
be any values, we say that there are 9 degrees of freedom available.
For the applications of this section, the number of degrees of freedom is simply the sample size
minus 1.
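The quiz-score argument can be checked directly. In this sketch the first nine scores are hypothetical values chosen freely; once they are fixed, the requirement that the mean of all 10 scores is 80 (so the sum is 800) determines the tenth. `last_score` is a name chosen here.

```python
def last_score(first_scores, n, mean):
    """With n scores and a fixed mean, the sum must be n * mean, so the
    last score is determined once the other n - 1 are chosen."""
    return n * mean - sum(first_scores)


first_nine = [72, 85, 90, 78, 66, 88, 95, 70, 81]  # hypothetical, freely chosen
tenth = last_score(first_nine, n=10, mean=80)
print(tenth)                                       # 75
print("degrees of freedom:", len(first_nine))      # 9
```

Changing any of the first nine scores changes the tenth, which is why only n − 1 = 9 values are free to vary.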


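We use critical values denoted by $t_{\alpha/2}$ for the margin of error E and the confidence interval:
$$E = t_{\alpha/2}\cdot\frac{s}{\sqrt{n}}, \qquad \text{where } t_{\alpha/2} \text{ has } n - 1 \text{ degrees of freedom}$$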

The above margin of error is used to construct the confidence interval estimate of μ:
$$\bar{x} - E < \mu < \bar{x} + E$$
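A sketch of the t interval follows. Python's standard library has no t distribution, so the critical value is passed in as a number read from a t table such as Table A-3; for 9 degrees of freedom and 95% confidence, $t_{\alpha/2} = 2.262$. The ten sample values are hypothetical, and `mean_ci_sigma_unknown` is a name chosen here. Note that `statistics.stdev` computes the sample standard deviation s (dividing by n − 1).

```python
import math
import statistics


def mean_ci_sigma_unknown(sample, t_crit):
    """Return (lower, upper) for x_bar - E < mu < x_bar + E, where
    E = t_{alpha/2} * s / sqrt(n) and t_crit is looked up for df = n - 1."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)       # sample standard deviation (n - 1)
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Hypothetical sample of n = 10 values, so df = 9 and t_{alpha/2} = 2.262.
sample = [98.0, 98.6, 97.9, 98.4, 98.2, 98.8, 97.7, 98.1, 98.5, 98.3]
low, high = mean_ci_sigma_unknown(sample, t_crit=2.262)
print(f"{low:.2f} < mu < {high:.2f}")  # 98.01 < mu < 98.49
```

If SciPy is available, the critical value could instead be computed rather than looked up, but the table value keeps this sketch dependency-free.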





We now list the important properties of the t distribution that we are using in this section.

The following is a summary of the conditions indicating use of a t distribution instead of the standard
normal distribution.

Choosing the Appropriate Distribution
In Figure 6-6, note that if we have a small (n ≤ 30) sample drawn from a distribution that differs
dramatically from a normal distribution, we can’t use the methods described in this chapter.
One alternative is to use nonparametric methods (see Chapter 12), and another alternative is to use
the computer bootstrap method. In both of those approaches, no assumptions are made about the
original population.
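The choice can be summarized as a small decision rule. This is a sketch of the logic stated in these notes (z when σ is known, t when σ is not known, and neither for a small sample from a population far from normal); Figure 6-6 itself may contain more detail. `choose_distribution` and its parameters are hypothetical names, and `roughly_normal` stands for a judgment that the population does not differ dramatically from a normal distribution.

```python
def choose_distribution(sigma_known: bool, roughly_normal: bool, n: int) -> str:
    """Hypothetical helper summarizing when to use z, t, or neither
    for a confidence interval estimate of a population mean."""
    if n <= 30 and not roughly_normal:
        # Small sample from a population far from normal: the methods
        # of this chapter do not apply.
        return "nonparametric or bootstrap"
    # Otherwise the sampling distribution of x-bar is (approximately) normal.
    return "normal (z)" if sigma_known else "Student t"


print(choose_distribution(sigma_known=True, roughly_normal=False, n=50))   # normal (z)
print(choose_distribution(sigma_known=False, roughly_normal=True, n=12))   # Student t
print(choose_distribution(sigma_known=False, roughly_normal=False, n=20))  # nonparametric or bootstrap
```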






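Finding the Point Estimate and E from a Confidence Interval
If we already know the confidence interval limits, the sample mean x̄ and the margin of error E can be found
as follows:
$$\bar{x} = \frac{(\text{upper confidence limit}) + (\text{lower confidence limit})}{2}$$
$$E = \frac{(\text{upper confidence limit}) - (\text{lower confidence limit})}{2}$$
For example, the interval 98.08 < μ < 98.32 gives x̄ = (98.32 + 98.08)/2 = 98.20 and E = (98.32 − 98.08)/2 = 0.12.
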
Using Confidence Intervals to Describe, Explore, or Compare Data
• In some cases, we might use a confidence interval to achieve an ultimate goal of estimating the
value of a population parameter.
• For the body temperature data used in this section, an important goal might be to estimate the
mean body temperature of healthy adults, and our results strongly suggest that the commonly
used value of 98.6°F is incorrect (because we have 95% confidence that the limits of 98.08°F and
98.32°F contain the true value of the population mean, and 98.6°F lies outside those limits).
• In other cases, a confidence interval might be one of several different tools used to describe,
explore, or compare data sets.
