Stat609 Sp23 LCN Unit2


Unit 2

STATISTICAL INFERENCE
 CHAPTER 7: Sampling and Sampling
Distributions

 CHAPTER 8: Confidence Interval Estimation

© 2017 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Chapter 7
Sampling and Sampling Distributions

7-1 Introduction

 In a typical statistical inference problem, you want to
discover one or more characteristics of a given
population.
 Generally difficult or even impossible to contact each
member of the population
 Solution: identify a sample of the population and then obtain
information from members of the sample
 Chapter objectives
 Discuss the sampling schemes generally used in real
sampling applications
 See how the information from a sample of the population
can be used to infer the properties of the entire population
7-2 Sampling Terminology

 A population is the set of all members about which a
study intends to make inferences.
 An inference is a statement about a numerical characteristic
of the population.
 A frame is a list of all members of the population. The
potential sample members are called sampling units.
 A probability sample is a sample in which the sampling
units are chosen from the population according to a
random mechanism.
 A judgmental sample is a sample in which the sampling
units are chosen according to the sampler’s judgment.
7-2 Sampling Terminology

 The members of a probability sample are chosen according
to a random mechanism, whereas the members of a
judgmental sample are chosen according to the sampler’s
judgment.
Why Random Sampling?
 One reason for sampling randomly from a population is to
avoid biases (such as choosing mainly stay-at-home
mothers because they are easier to contact).
 The random sampling allows you to use probability to make
inferences about unknown population parameters. If
sampling were not random, there would be no basis for
using probability to make such inferences.
7-3 Methods for Selecting Random
Samples
 Different types of sampling schemes have different
properties.
 There is typically a trade-off between cost and
accuracy.
 Some sampling schemes are cheaper and easier to
administer, whereas others are more costly but provide
more accurate information.

7-3a Simple Random Sampling
(slide 1 of 2)

 The simplest type of sampling scheme is called
simple random sampling.
 A simple random sample of size n is one where
each possible sample of size n has the same chance
of being chosen.
 Simple random samples are the easiest to understand,
and their statistical properties are the most
straightforward.
 More complex random samples are often used in
real applications
Simple Random Sampling
(slide 2 of 2)

 Simple random samples are used infrequently in real
applications. There are several reasons for this:
 Because each sampling unit has the same chance of being
sampled, simple random sampling can result in samples
that are spread over a large geographical region.
 This can make sampling extremely expensive, especially if
personal interviews are used.
 Simple random sampling requires that all sampling units be
identified prior to sampling. Sometimes this is infeasible.
 Simple random sampling can result in underrepresentation
or overrepresentation of certain segments of the population.

Example 7.1: Sampling Families to Analyze
Annual Incomes (slide 1 of 2)
 Objective: Using SPSS to illustrate how the random number
function, RAND, can be used to generate simple random samples.
The file name: Annual Incomes.
 Solution: Consider the frame of 40 families with annual incomes.
 Choose a simple random sample of size 10 from this frame.
SPSS Steps:
From the menus choose:
 Data > Select Cases > Random Sample of Cases.
 The ‘Random Sample of Cases’ option selects a random subset of cases
based on pseudo-random numbers generated by SPSS.
 Enter Exactly 10 cases from the first 40 cases.
 To obtain more random samples of size 10 (for comparison), you
would need to go through this process repeatedly.
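Outside SPSS, the same selection can be sketched in a few lines of Python. This is only a minimal illustration: it assumes the frame is the list of family IDs 1 to 40 (the incomes in the Annual Incomes file are not reproduced here), and the helper name and seed value are hypothetical.

```python
import random

# Hypothetical frame: family IDs 1..40 stand in for the rows of the data file.
frame = list(range(1, 41))

def simple_random_sample(units, n, seed=None):
    """Draw n units without replacement; every subset of size n is equally likely."""
    rng = random.Random(seed)  # a seed only makes the draw repeatable
    return sorted(rng.sample(units, n))

sample = simple_random_sample(frame, 10, seed=609)
print(sample)
```

Calling the function again with a different seed gives another sample of size 10 for comparison.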
Example 7.1: Sampling Families to Analyze
Annual Incomes (slide 2 of 2)

7-3b Systematic Sampling (slide 1 of 2)

 A systematic sample provides a convenient way to
choose the sample.
 First, divide the population size by the sample size,
creating “blocks.”
 Next, use a random mechanism to choose a number
between 1 and the number in each “block.”
 In general, one of the first k members is selected
randomly, and then every kth member after this one is
selected.
 The value k is called the sampling interval and equals the
ratio N/n, where N is the population size and n is the desired
sample size.
7-3b Systematic Sampling (slide 2 of 2)

Example 7.2:
 Suppose you are asked to select a random sample of 250 names
from a large company’s directory of employees.
 There are 55,000 names listed in alphabetical order in the
directory.
 First, you divide the population size by the sample size:
55,000/250 = 220.
 Next, you use a random mechanism to choose a number
between 1 and 220. Suppose this number is 131.
 Then you choose the 131st name and every 220th name
thereafter.
 So, you would choose name 131, name 351, name 571, and so
on. The result is a systematic sample of size n = 250.
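The selection in Example 7.2 can be reproduced with a short sketch. The function name is hypothetical, and the code assumes the sample size divides the population size evenly, as it does here (55,000/250 = 220).

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Return 1-based positions: a random start in 1..k, then every k-th member."""
    k = N // n  # sampling interval (assumes n divides N evenly)
    if start is None:
        start = random.Random(seed).randint(1, k)
    return [start + i * k for i in range(n)]

# Fix the random start at 131 to match the example.
positions = systematic_sample(N=55_000, n=250, start=131)
print(positions[:3], positions[-1], len(positions))
```

With the start fixed at 131, the first positions are 131, 351, and 571, and the last of the 250 positions is 54,911.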
7-3c Stratified Sampling (slide 1 of 2)

 Suppose various subpopulations within the total population can be
identified. These subpopulations are called strata.
 Instead of taking a simple random sample from the entire
population, it might make more sense to select a simple random
sample from each stratum separately. This sampling method is
called stratified sampling.
 In stratified sampling, the population is divided into relatively
homogeneous subsets called strata.
 Advantages of stratified sampling:
 Separate estimates can be obtained within each stratum, which would not
be obtained with a simple random sample from the entire population.
 The accuracy of the resulting population estimates can be increased by
using appropriately defined strata.
 Define the strata such that there is less variability within the individual strata
than in the population as a whole.
Stratified Sampling (slide 2 of 2)

 The key to using stratified sampling effectively is selecting the
appropriate strata.
 What is appropriate depends on the company’s objectives and its
product.
 There are many ways to choose sample sizes from each stratum,
but the most popular method is to use proportional sample
sizes.
 With proportional sample sizes, the proportion of a stratum in the
sample is the same as the proportion of that stratum in the population.
 The advantage of proportional sample sizes is they are very easy to
determine.
 The disadvantage is they ignore differences in variability among the
strata.
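Proportional sample sizes are simple enough to compute directly. The sketch below uses made-up strata (customer counts by region) purely for illustration; the function name is also hypothetical.

```python
def proportional_allocation(stratum_sizes, n):
    """Allocate a total sample of n across strata in proportion to stratum size."""
    N = sum(stratum_sizes.values())
    return {name: round(n * size / N) for name, size in stratum_sizes.items()}

# Hypothetical strata, not taken from the source.
strata = {"North": 5000, "South": 3000, "West": 2000}
allocation = proportional_allocation(strata, n=200)
print(allocation)
```

When the proportions do not work out to whole numbers, rounding can make the allocated sizes sum to slightly more or less than n, so a small manual adjustment may be needed.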
7-3d Cluster Sampling

 In cluster sampling, the population is separated into
clusters, such as cities or city blocks, and then a random
sample of the clusters is selected.
 The primary advantage of cluster sampling is sampling
convenience (and possibly lower cost).
 The downside is that the inferences drawn from a cluster sample
can be less accurate for a given sample size than other sampling
plans.
 The key to selecting a cluster sample is to define the
sampling units as the clusters—the city blocks, for example.
 Then a simple random sample of clusters can be chosen.
 Once the clusters are selected, it is typical to sample all of the
population members in each selected cluster.
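A rough sketch of the cluster sampling steps above, assuming a small invented population of six city blocks with four households each. All names here are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Pick clusters at random, then keep every member of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # clusters are the sampling units
    members = [m for c in chosen for m in clusters[c]]
    return chosen, members

# Hypothetical population: 6 city blocks with 4 households each.
blocks = {f"block{b}": [f"b{b}-h{h}" for h in range(1, 5)] for b in range(1, 7)}
chosen, households = cluster_sample(blocks, n_clusters=2, seed=1)
print(chosen, len(households))
```

Because every household in a chosen block is included, the final sample size is determined by the sizes of the selected clusters.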
7-3e Multistage Sampling Schemes

 The cluster sampling scheme is an example of a
single-stage sampling scheme.
 Real applications are often more complex than this,
resulting in multistage sampling schemes.
 For example, in Gallup’s nationwide surveys, a random
sample of approximately 300 locations is chosen in the
first stage of the sampling process.
 City blocks or other geographical areas are then randomly
sampled from the first-stage locations in the second stage
of the process.
 This is followed by a systematic sampling of households
from each second-stage area.
7-4 An Introduction to Estimation

 The purpose of any random sample, simple or
otherwise, is to estimate properties of a population
from the data observed in the sample.
 The mathematical procedures appropriate for
performing this estimation depend on which
properties of the population are of interest and
which type of random sampling scheme is used.
 For both simple random samples and more
complex sampling schemes, the concepts are the
same.
7-4a Sources of Estimation Error
(slide 1 of 2)

 There are two basic sources of errors that can occur
when you sample randomly from a population:
 Sampling error
 Nonsampling error
 Sampling error is the inevitable result of basing an
inference on a random sample rather than on the
entire population.

Sources of Estimation Error
(slide 2 of 2)

 Nonsampling error is quite different and can occur for a variety of
reasons:
 Nonresponse bias occurs when a portion of the sample fails to respond
to the survey.
 Nontruthful responses are particularly a problem when there are
sensitive questions in a questionnaire.
 Measurement error occurs when the responses to the questions do not
reflect what the investigator had in mind (e.g., when questions are poorly
worded).
 Voluntary response bias occurs when the subset of people who respond to
a survey differs in some important respect from all potential respondents.
 The potential for nonsampling error is enormous.
 However, unlike sampling error, it cannot be measured with probability theory.
 It can be controlled only by using appropriate sampling procedures and
designing good survey instruments.
7-4b Key Terms in Sampling
(slide 1 of 2)

 A point estimate is a single numeric value, a “best
guess” of a population parameter, based on the data
in a random sample.
 The sampling error (or estimation error) is the
difference between the point estimate and the true
value of the population parameter being estimated.
 The sampling distribution of any point estimate is
the distribution of the point estimates from all
possible samples (of a given sample size) from the
population.
Key Terms in Sampling
(slide 2 of 2)

 A confidence interval is an interval around the point
estimate, calculated from the sample data, that is very
likely to contain the true value of the population
parameter.
 An unbiased estimate is a point estimate such that the
mean of its sampling distribution is equal to the true
value of the population parameter being estimated.
 The standard error of an estimate is the standard
deviation of the sampling distribution of the estimate.
 It measures how much estimates vary from sample to
sample.
7-4c Sampling Distribution of the Sample
Mean (slide 1 of 2)
 The sampling distribution of the sample mean has the
following properties:
 The sample mean is an unbiased estimate of the population mean, as
indicated in this equation: $E(\bar{X}) = \mu$
 The standard error of the sample mean is given in the
equation $SE(\bar{X}) = \sigma/\sqrt{n}$, where σ is the standard deviation of
the population, and n is the sample size.
 It is customary to approximate the standard error by substituting
the sample standard deviation, s, for σ, which leads to this
equation: $SE(\bar{X}) \approx s/\sqrt{n}$
 If you go out two standard errors on either side of the
sample mean, you are approximately 95% confident of
capturing the population mean, as shown below:
$\bar{X} \pm 2s/\sqrt{n}$
7-4c Sampling Distribution of the Sample Mean
(slide 2 of 2)

 Example 7.3: The size of a pizza is normally distributed with a mean of
16 inches and a standard deviation of 0.8 inch. The pizza chefs strive
to make each pizza 16 inches but are not able to make them all 16
inches.
 What are the expected value and standard error of the sample mean
derived from a random sample of 2 pizzas?
$E(\bar{X}) = 16$ and $SE(\bar{X}) = 0.8/\sqrt{2} \approx 0.57$
 What are the expected value and standard error of the sample mean
derived from a random sample of 4 pizzas?
$E(\bar{X}) = 16$ and $SE(\bar{X}) = 0.8/\sqrt{4} = 0.40$
 Compare the expected value and the standard error of the sample
mean with those of an individual pizza.
 The expected values are the same
 The standard errors are lower; averaging reduces variability
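The comparison in Example 7.3 can be checked numerically. This is a minimal sketch using the stated mean of 16 and standard deviation of 0.8; the function name is hypothetical, and n = 1 represents an individual pizza.

```python
import math

def mean_and_se(mu, sigma, n):
    """Expected value and standard error of the sample mean for sample size n."""
    return mu, sigma / math.sqrt(n)

for n in (1, 2, 4):
    ev, se = mean_and_se(mu=16, sigma=0.8, n=n)
    print(f"n={n}: E = {ev}, SE = {se:.4f}")
```

The expected value stays at 16 for every n, while the standard error falls from 0.8 to about 0.57 and then to 0.4.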
The Finite Population Correction (slide 1 of 3)

 Generally, sample size is small relative to the
population size.
 There are situations, however, when the sample size
is greater than 5% of the population.
 Finite population correction factor
 Accounts for the added precision gained by sampling a
larger percentage of the population
 Used to reduce the sampling variation of the
mean/proportion
 Always less than one (for n > 1)
 When N is large relative to n, the factor is close to 1

The Finite Population Correction (slide 2 of 3)

 In this case, the formula for the standard error of
the mean should be modified with a finite
population correction, or fpc, factor:
$fpc = \sqrt{\frac{N - n}{N - 1}}$
 The standard error of the mean is multiplied by fpc
in order to make the correction:
$SE(\bar{X}) = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$

The Finite Population Correction (slide 3 of 3)

 Example 7.4: A large class with 340 students has been
divided up into 10 groups. Connie is in a group of 34
students that averaged 72 on the midterm. The class
average was 73 with a standard deviation of 10.
a. Calculate the expected value and the standard error of
the sample mean based on a random sample of 34
students.
 $E(\bar{X}) = 73$ and $SE(\bar{X}) = \frac{10}{\sqrt{34}} \sqrt{\frac{340 - 34}{340 - 1}} \approx 1.63$
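As a quick check of Example 7.4, the corrected standard error can be computed directly. This sketch assumes the usual finite population correction factor, the square root of (N − n)/(N − 1); the function name is hypothetical.

```python
import math

def se_mean_fpc(sigma, n, N):
    """Standard error of the sample mean with the finite population correction."""
    fpc = math.sqrt((N - n) / (N - 1))
    return sigma / math.sqrt(n) * fpc

uncorrected = 10 / math.sqrt(34)
corrected = se_mean_fpc(sigma=10, n=34, N=340)
print(round(uncorrected, 2), round(corrected, 2))
```

Since the sample is 10% of the class, the correction lowers the standard error from about 1.71 to about 1.63.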

7-4d The Central Limit Theorem (slide 1 of 2)

 For any population distribution with mean μ and standard
deviation σ, the sampling distribution of the sample mean is
approximately normal with mean μ and standard deviation $\sigma/\sqrt{n}$,
and the approximation improves as n increases. This is
called the central limit theorem.
 The important part of this result is the normality of the
sampling distribution.
 When you sum or average n randomly selected values from any
distribution, normal or otherwise, the distribution of the sum or
average is approximately normal, provided that n is sufficiently
large.
 This is the primary reason why the normal distribution is
relevant in so many real applications.
7-4d The Central Limit Theorem (slide 2 of 2)

 The Averaging Effect


 As you average more and more observations from a given
distribution, the variance of the average decreases.
 For example, suppose you average only two observations. Then
it is easy to get an abnormally large (or small) average.
 But if you average a much larger number of observations, you
aren’t likely to get an abnormally large (or small) average.
 The reason is that a few abnormally large observations will
typically be cancelled by a few abnormally small observations.
 This cancellation produces the averaging effect. It also explains
why a larger sample size tends to produce a more accurate
estimate of a population mean.

Example 7.5: Average Winnings From A
Wheel of Fortune (slide 1 of 2)
 Objective: To illustrate the central limit theorem by a simulation of
winnings in a game of chance.
 Solution: The population is the set of all outcomes you could obtain
from a single spin of the wheel—that is, all dollar values from $0 to
$1000.
 Each spin results in one randomly sampled dollar value from this
population.
 Each replication of the experiment simulates n spins of the wheel and
calculates the average—that is, the winnings—from these n spins.
 A histogram of winnings is formed, for any value of n, where n is the
number of spins.
 As the number of spins increases, the histogram starts to take on more
and more of a bell shape.
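The simulation can be sketched in Python. The source does not state how the dollar values are distributed on the wheel, so this sketch assumes every whole-dollar value from $0 to $1000 is equally likely. The function names, the 2000 replications, and the seed are also assumptions.

```python
import random
import statistics

def average_winnings(n_spins, rng):
    """Average of n_spins draws; each spin is assumed uniform on $0..$1000."""
    return statistics.fmean(rng.randint(0, 1000) for _ in range(n_spins))

def simulate(n_spins, replications=2000, seed=0):
    """One average per replication of the experiment."""
    rng = random.Random(seed)
    return [average_winnings(n_spins, rng) for _ in range(replications)]

spread = {n: statistics.stdev(simulate(n)) for n in (1, 4, 30)}
for n, sd in spread.items():
    print(f"n={n:>2}: std dev of average winnings = {sd:.1f}")
```

The printed spread shrinks as n grows, which is the averaging effect. Plotting a histogram of each list returned by `simulate` would show the shape moving from flat toward a bell.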

Example 7.5: Average Winnings From A
Wheel of Fortune (slide 2 of 2)

7-4e Sample Size Selection (slide 1 of 2)

 The problem of selecting the appropriate sample size in
any sampling context is not an easy one, but it must be
faced in the planning stages, before any sampling is
done.
 The sampling error tends to decrease as the sample size
increases, so the desire to minimize sampling error
encourages us to select larger sample sizes.
 However, several other factors encourage us to select
smaller sample sizes, including:
 Cost
 Timely collection of data
 Increased chance of nonsampling error, such as nonresponse bias

7-4e Sample Size Selection (slide 2 of 2)

 Effect of Larger Sample Sizes


 Accurate estimates of population parameters require small
standard errors, and small standard errors require large
sample sizes.
 However, standard errors are typically inversely proportional
to the square root of the sample size (or sample sizes).
 The implication is that if you want to decrease the standard
error by a given factor, you must increase the sample size by
a much larger factor.
 For example, to decrease the standard error by a factor of 2,
you must increase the sample size by a factor of 4.
 Accurate estimates are not cheap.
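The factor-of-4 rule follows from the standard error formula for the sample mean, assuming the standard error has the form σ/√n as for a simple random sample:

```latex
\mathrm{SE}(\bar{X}) = \frac{\sigma}{\sqrt{n}}
\quad\Longrightarrow\quad
\frac{\sigma}{\sqrt{4n}} = \frac{1}{2}\cdot\frac{\sigma}{\sqrt{n}}
```

More generally, cutting the standard error by a factor of c requires multiplying the sample size by c².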
7-4f Summary of Key Ideas for Simple
Random Sampling
 To estimate a population mean with a simple random sample, the
sample mean is typically used as a “best guess.” This estimate is
called a point estimate.
 The accuracy of the point estimate is measured by its standard error. It
is the standard deviation of the sampling distribution of the point
estimate.
 A confidence interval (with 95% confidence) for the population mean
extends to approximately two standard errors on either side of the
sample mean.
 From the central limit theorem, the sampling distribution
of $\bar{X}$ is approximately normal when n is reasonably large.
 There is approximately a 95% chance that any particular $\bar{X}$ will be
within two standard errors of the population mean μ.
 The sampling error can be reduced by increasing the sample size n.
Chapter 8
Confidence Interval Estimation

8-1 Introduction

 Statistical inferences are always based on an underlying
probability model, which means that some type of
random mechanism must generate the data.
 Two random mechanisms are generally used:
 Random sampling from a larger population
 Randomized experiments

 Generally, statistical inferences are of two types:


 Confidence interval estimation uses the data to obtain a
point estimate and a confidence interval around this point
estimate.
 Hypothesis testing determines whether the observed data
provide support for a particular hypothesis.
8-2 Sampling Distributions

 Most confidence intervals are of the form:
point estimate ± multiple × standard error
 In general, whenever you make inferences about one or more population
parameters, you always base this inference on the sampling distribution of
a point estimate, such as the sample mean.
 An equivalent statement to the central limit theorem is that the standardized
quantity Z, as defined below, is approximately normal with mean 0 and
standard deviation 1:
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
 However, the population standard deviation σ is rarely known, so it is replaced
by its sample estimate s in the formula for Z.
 When the replacement is made, a new source of variability is introduced, and the
sampling distribution is no longer normal. Instead, it is called the t distribution.
8-2a The t Distribution
(slide 1 of 2)

 If we are interested in estimating a population mean μ with a sample
of size n, we assume the population distribution is normal with
unknown standard deviation σ.
 σ is replaced by the sample standard deviation s, as shown in this
equation:
$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$
 Then the standardized value in the equation has a t distribution with n – 1
degrees of freedom.
 The degrees of freedom is a numerical parameter of the t distribution that defines
the precise shape of the distribution.
 The t-value in this equation is very much like a typical Z-value.
 That is, the t-value indicates the number of standard errors by which the sample
mean differs from the population mean.
The t Distribution
(slide 2 of 2)

 The t distribution looks very much like the standard normal
distribution.
 It is bell-shaped and centered at 0.
 The only difference is that it is slightly more spread out, and this
increase in spread is greater for small degrees of freedom.
 When n is large, so that the degrees of freedom is large, the t
distribution and the standard normal distribution are practically
indistinguishable.

8-2b Other Sampling Distributions

 The t distribution, a close relative of the normal
distribution, is used to make inferences about a
population mean when the population standard
deviation is unknown.
 Two other close relatives of the normal distribution
are the chi-square and F distributions.
 These are used primarily to make inferences about
variances (or standard deviations), as opposed to
means.

8-3 Confidence Interval for a Mean
(slide 1 of 2)

 To obtain a confidence interval for μ, first specify a
confidence level, usually 90%, 95%, or 99%.
 Then use the sampling distribution of the point
estimate to determine the multiple of the standard
error (SE) to go out on either side of the point
estimate to achieve the given confidence level.
 If the confidence level is 95%, the value used most
frequently in applications, the multiple is
approximately 2. More precisely, it is a t-value.
 A typical confidence interval for μ is of the form:
$\bar{X} \pm t\text{-multiple} \times SE(\bar{X})$
where $SE(\bar{X}) = s/\sqrt{n}$
Confidence Interval for a Mean
(slide 2 of 2)

 To obtain the correct t-multiple, let α be 1 minus the
confidence level (expressed as a decimal).
 For example, if the confidence level is 90%, then α = 0.10.
 Then the appropriate t-multiple is the value that cuts
off probability α/2 in each tail of the t distribution with
n−1 degrees of freedom.
 As the confidence level increases, the length of the
confidence interval also increases.
 As n increases, the standard error decreases, so the
length of the confidence interval tends to decrease for
any confidence level.
Example 8.1: IQ Tests (slide 1 of 2)

IQ scores are approximately normally distributed. A sample
of 22 employees gives a sample mean of 106 and a
standard deviation of 15.
Compute 90% and 99% confidence intervals for the
average IQ at this firm.
Use the results to infer if the mean IQ in this firm is
significantly different from the national average of 100.

Example 8.1: IQ Tests (slide 2 of 2)

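90% confidence interval:
$106 \pm 1.721 \times 15/\sqrt{22} = 106 \pm 5.50$
[100.5, 111.5]
Does not contain 100
99% confidence interval:
$106 \pm 2.831 \times 15/\sqrt{22} = 106 \pm 9.05$
[96.95, 115.05]
Contains 100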
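The two intervals in Example 8.1 can be reproduced without statistical tables. The sketch below uses only the Python standard library: it integrates the t density numerically and finds the t-multiple by bisection. This is an illustrative choice rather than the method used in the course, since a statistics package (for example `scipy.stats.t.ppf`) would normally supply the t-value. All function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_multiple(conf_level, df):
    """Value that cuts off alpha/2 in each tail, found by bisection."""
    target = 1 - (1 - conf_level) / 2
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def mean_ci(xbar, s, n, conf_level):
    """Confidence interval for a mean: xbar +/- t-multiple * s / sqrt(n)."""
    margin = t_multiple(conf_level, n - 1) * s / math.sqrt(n)
    return xbar - margin, xbar + margin

for level in (0.90, 0.99):
    low, high = mean_ci(xbar=106, s=15, n=22, conf_level=level)
    print(f"{level:.0%}: [{low:.2f}, {high:.2f}]")
```

The higher confidence level gives the wider interval, which is why 100 falls outside the 90% interval but inside the 99% interval.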

Example 8.2: Fuel Usage of “Ultra-Green” Cars
(slide 1 of 4)

 A car manufacturer advertises that its new “ultra-green” car obtains
an average of 100 MPG. For a
sample of 25 cars, each car is driven the same distance in identical
conditions in order to obtain the car’s MPG. The file name: MPG

 Use the sample data to estimate the mean MPG of all ultra-green cars
with 90% confidence.

Example 8.2: Fuel Usage of “Ultra-Green” Cars
(slide 2 of 4)

 Summary Statistics:
 Sample mean $\bar{x}$ and sample standard deviation $s$, computed from the n = 25 observations
 Assume that MPG follows a normal distribution
 Construct the 90% confidence interval for the population mean
 For a 90% confidence interval:
$\bar{x} \pm t_{0.05,24} \times s/\sqrt{n} = 96.52 \pm 3.66$, with $t_{0.05,24} = 1.711$
 With 90% confidence, the average MPG of all ultra-green cars is between
92.86 MPG and 100.18 MPG
 The manufacturer’s claim that the ultra-green car will average 100 MPG
cannot be rejected since 100 falls within the interval.

Example 8.2: Fuel Usage of “Ultra-Green”
Cars (slide 3 of 4)
SPSS Steps:
From the menus choose:
 Analyze > Compare Means > One-Sample T Test...
 Click MPG and move it onto the Test Variable(s) field.
 Click Options and enter 90% for confidence interval
percentage.

Example 8.2: Fuel Usage of “Ultra-Green”
Cars (slide 4 of 4)
SPSS Output:

8-4 Confidence Interval for a Total
(slide 1 of 2)

 Let T be a population total we want to estimate, such as the
total of all receivables, and let $\hat{T}$ be a point estimate of T based
on a simple random sample of size n from a population of size
N.
 First, we need a point estimate of T. For the population total T,
it is reasonable to sum all of the values in the sample, denoted
$T_s$, and then “project” this total to the population with this
equation:
$\hat{T} = \frac{N}{n} T_s = N\bar{X}$
 The mean and standard deviation of the sampling distribution
of $\hat{T}$ are given in the equations below:
$E(\hat{T}) = T$ and $SD(\hat{T}) = N\sigma/\sqrt{n}$
Confidence Interval for a Total
(slide 2 of 2)

 Because σ is usually unknown, s is used instead of
σ to obtain the approximate standard error of $\hat{T}$ given
in the equation below:
$SE(\hat{T}) = Ns/\sqrt{n}$
 The point estimate of T is the point estimate of the
mean multiplied by N, and the standard error of this
point estimate is the standard error of the sample
mean multiplied by N.

Example 8.3: Estimating Total Tax
Refunds
(slide 1 of 2)

 Objective: To find a 95% confidence interval for the
total (net) amount the IRS must pay out to a set of
1,000,000 taxpayers.
 Solution: Data set is the refunds from a random sample
of 500 taxpayers.
 The file name: IRS Refunds
 Note how the sample mean is multiplied by the
population size (1,000,000) and how the standard error
of the mean is also multiplied by the population size.
 The effect is to scale the usual confidence interval for
the mean by the population size
Example 8.3: Estimating Total Tax
Refunds
(slide 2 of 2)

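Point estimate of total, $\hat{T} = N\bar{X}$ (N = 1,000,000, n = 500)
Std Error of total, $SE(\hat{T}) = N s/\sqrt{n}$
For 95% confidence level, t-multiple with n - 1 = 499 degrees of freedom = 1.965
Confidence interval: $\hat{T} \pm 1.965 \times SE(\hat{T})$

Upper Limit = $346,057,164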

Based on these calculations, the IRS can be 95% confident that it will need
to pay out somewhere between about 244 and 346 million dollars to these
1,000,000 taxpayers.

8-5 Confidence Interval for a Proportion
(slide 1 of 2)

 Surveys are often used to estimate proportions, so it is important to know
how to form a confidence interval for any population proportion p.
 The basic procedure requires a point estimate, the standard error of this
point estimate, and a multiple that depends on the confidence level:

point estimate ± multiple × standard error

 It can be shown that for sufficiently large n, the sampling distribution of $\hat{p}$ is
approximately normal with mean p and standard error $\sqrt{p(1-p)/n}$.
 Standard error of sample proportion:

$$SE(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

 Confidence interval for a proportion:

$$\hat{p} \pm z\text{-multiple} \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
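
The large-sample interval for a proportion (sample proportion plus or minus a z-multiple times the standard error sqrt(p̂(1 - p̂)/n)) can be sketched in Python as follows. The function name is hypothetical. The count of 25 successes out of 40 is an assumption, not a figure stated in these slides; it is used because it reproduces the 0.775 upper limit reported in Example 8.4.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Large-sample (Wald) confidence interval for a population proportion."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    # Two-sided z-multiple, e.g. about 1.960 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return p_hat, se, p_hat - z * se, p_hat + z * se


# Assumed count: 25 of 40 customers rate the sandwich at least 6.
p_hat, se, lower, upper = proportion_confidence_interval(25, 40)
print(f"p_hat = {p_hat:.3f}, SE = {se:.4f}, 95% CI = ({lower:.3f}, {upper:.3f})")

# Half-length for n = 1000 in the worst case p_hat = 0.5 (about 3 percentage points).
_, _, low_1000, up_1000 = proportion_confidence_interval(500, 1000)
half_length_1000 = (up_1000 - low_1000) / 2
print(f"Half-length with n = 1000: {half_length_1000:.3f}")
```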

Confidence Interval for a Proportion
(slide 2 of 2)

 Confidence intervals for proportions are fairly wide unless n
is quite large.
 To obtain a 95% confidence interval with a half-length of about 3
percentage points for a population proportion, where the population
consists of millions of people, only about 1000 people need to be
sampled.
 When auditors are interested in how large the proportion of
errors might be, they usually calculate one-sided confidence
intervals for proportions.
 They automatically use lower limit pL = 0 and determine an upper
limit pU such that the 95% confidence interval is from 0 to pU.

Example 8.4: Estimating the Response to a
New Sandwich (slide 1 of 4)
 Objective: To illustrate the procedure for finding a
confidence interval for the proportion of customers
who rate the new sandwich at least 6 on a 10-point
scale.
 The file name: Satisfaction Ratings
 Solution: A sample of 40 customers who ordered a
new sandwich were surveyed. Each was asked to
rate the sandwich on a scale of 1 to 10.

Example 8.4: Estimating the Response to a
New Sandwich (slide 2 of 4)
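Sample size, n = 40
Sample proportion, $\hat{p}$ = (number of ratings of at least 6) / n
Std Error of proportion, $SE(\hat{p}) = \sqrt{\hat{p}(1-\hat{p})/n}$
For 95% confidence level, z-multiple = 1.960

Upper Limit = 0.775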

Example 8.4: Estimating the Response to a
New Sandwich (slide 3 of 4)
SPSS Steps:
From the menus choose:
 Analyze → Compare Means → One-Sample Proportions.
 Select Satisfaction and move it onto the Test Variable(s) field.
 Select Values under Define Success criteria and enter 6 7 8 9 10
(Multiple values must be separated by spaces)
 Click on Confidence Intervals and select Wald.

Example 8.4: Estimating the Response to a
New Sandwich (slide 4 of 4)
SPSS Steps & Output:

Example 8.5: Auditing for Price Errors (slide 1 of 2)

 Objective: To find the upper limit of a one-sided
95% confidence interval for the proportion of
errors in the context of attribute sampling in
auditing.
 Solution: An auditor checks 93 randomly sampled
invoices and finds that two of them include price
errors.
 If pU is the appropriate upper confidence limit, then
pU satisfies the equation:

Example 8.5: Auditing for Price Errors (slide 2 of 2)

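Sample size, n = 93
Sample proportion, $\hat{p} = 2/93 \approx 0.0215$
Std Error of proportion, $SE(\hat{p}) = \sqrt{\hat{p}(1-\hat{p})/n} \approx 0.0150$
For a one-sided 95% confidence level, z-multiple = 1.645

Upper Limit = 0.046248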
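
The reported upper limit is consistent with a normal-approximation calculation: the sample proportion 2/93 plus a one-sided z-multiple (about 1.645) times its standard error. The sketch below, with a hypothetical function name, works through that arithmetic. It is one way to arrive at the figure on this slide, not necessarily the exact procedure the auditor's software uses.

```python
import math
from statistics import NormalDist


def one_sided_upper_limit(errors, n, confidence=0.95):
    """Upper limit of a one-sided interval (0, p_U) using the normal approximation."""
    p_hat = errors / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    # One-sided multiple: all of the 5% error probability sits in the upper tail.
    z = NormalDist().inv_cdf(confidence)
    return p_hat + z * se


# Figures from the example: 2 price errors found in 93 sampled invoices.
upper_limit = one_sided_upper_limit(2, 93)
print(f"Upper limit = {upper_limit:.6f}")
```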

8-7 Confidence Interval for the
Difference Between Means
 One of the most important applications of statistical
inference is the comparison of two population
means.
 There are many applications to business.
 For statistical reasons, independent samples must
be distinguished from paired samples.

8-7a Independent Samples

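 The appropriate sampling distribution of the
difference between sample means is the t
distribution with $n_1 + n_2 - 2$ degrees of freedom.
 Confidence interval for difference between means:

$$\bar{X}_1 - \bar{X}_2 \pm t\text{-multiple} \times SE(\bar{X}_1 - \bar{X}_2)$$

 Pooled estimate of common standard deviation:

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

 Standard error of difference between sample means:

$$SE(\bar{X}_1 - \bar{X}_2) = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$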
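
A minimal Python sketch of the pooled (equal-variance) procedure follows: pool the two sample variances with weights n1 - 1 and n2 - 1, take the square root, multiply by sqrt(1/n1 + 1/n2) to get the standard error, and add and subtract a t-multiple with n1 + n2 - 2 degrees of freedom. The function name and the summary statistics are hypothetical, and the t-multiple (2.002 for 58 degrees of freedom at the 95% level) is supplied by the caller.

```python
import math


def pooled_difference_interval(mean1, s1, n1, mean2, s2, n2, t_multiple):
    """Equal-variance confidence interval for the difference between two means."""
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    se_diff = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    diff = mean1 - mean2
    half_length = t_multiple * se_diff
    return diff, pooled_sd, se_diff, diff - half_length, diff + half_length


# Hypothetical summary statistics for two independent samples of 30 each.
diff, pooled_sd, se_diff, lower, upper = pooled_difference_interval(
    mean1=750.0, s1=250.0, n1=30,
    mean2=650.0, s2=300.0, n2=30,
    t_multiple=2.002,  # 95% level, 58 degrees of freedom
)
print(f"Difference = {diff:.1f}, pooled SD = {pooled_sd:.1f}, "
      f"95% CI = ({lower:.1f}, {upper:.1f})")
```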

Example 8.6: Reliability of Treadmill
Motors at SureStep (slide 1 of 3)
 Objective: To find a confidence interval for the
difference between mean lifetimes of motors, and
to see how this confidence interval can help
SureStep choose the better supplier.
 The file name: Treadmill Motors
 Solution: SureStep Company installs motors from
supplier A on 30 of its treadmills and motors from
supplier B on another 30 of its treadmills.
 It then runs these treadmills and records the
number of hours until the motor fails.
Example 8.6: Reliability of Treadmill
Motors at SureStep (slide 2 of 3)
SPSS Steps:
From the menus choose:
 Analyze → Compare Means → Independent-Samples T Test.
 Select lifetimes and move it onto the Test Variable(s) field.
 Select Suppliers and transfer it to the box labeled Grouping Variable.
 Click on the <Define Groups> button. In the window displayed, enter
values 1 and 2 for Suppliers 1 and 2 respectively.

Example 8.6: Reliability of Treadmill
Motors at SureStep (slide 3 of 3)
SPSS Output:
The 95% confidence interval for the difference between means
extends from -47.549 to 233.815.

Example 8.7: Consumer Advocate (slide 1 of 3)
 A consumer advocate analyzes the nicotine content in two brands
of cigarettes.

 Construct the 95% confidence interval for the difference between
the two population means.
 Nicotine content is assumed to be normally distributed.
 In addition, the population variances are unknown but assumed equal.

Example 8.7: Consumer Advocate (slide 2 of 3)

 The 95% confidence interval for the difference between the two
means ranges from -0.41 to -0.13.

Example 8.7: Consumer Advocate (slide 3 of 3)

SPSS Steps:
From the menus choose:
 Analyze → Compare Means → Summary Independent-Samples T Test.
 Complete the dialog box as shown.
SPSS Output:

Equal-Variance Assumption

 This two-sample analysis makes the assumption that the
standard deviations of the two populations are equal.
 How can you tell if they are equal, and what do you do if
they are clearly not equal?
 A statistical test for equality of two population variances is
automatically shown at the bottom of the SPSS Two-Sample
output.
 If there is reason to believe that the population variances are
unequal, a slightly different procedure can be used to calculate a
confidence interval for the difference between the means.
 The appropriate standard error of $\bar{X}_1 - \bar{X}_2$ is now:

$$SE(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
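
When the equal-variance assumption is dropped, the standard error uses each sample's own variance: sqrt(s1²/n1 + s2²/n2). The sketch below computes it from summary statistics. It also computes the Welch-Satterthwaite approximate degrees of freedom, which is the usual companion to this standard error but is not given in these slides, so treat that part as an added assumption. The function name and input values are hypothetical.

```python
import math


def unequal_variance_se(s1, n1, s2, n2):
    """Standard error and approximate degrees of freedom without pooling variances."""
    v1 = s1**2 / n1
    v2 = s2**2 / n2
    se_diff = math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation (not part of the slides).
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return se_diff, df


# Hypothetical summary statistics for two independent samples of 30 each.
se_diff, approx_df = unequal_variance_se(s1=250.0, n1=30, s2=300.0, n2=30)
print(f"SE = {se_diff:.3f}, approximate df = {approx_df:.1f}")
```

With equal sample sizes this standard error coincides with the pooled one; the two procedures then differ only in the degrees of freedom used for the t-multiple.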

8-7b Paired Samples

 When the samples to be compared are paired in some
natural way, such as a pretest and posttest for each
person, or husband-wife pairs, there is a more
appropriate form of analysis than the two-sample
procedure.
 The paired procedure itself is very straightforward:
 It does not directly analyze two separate variables (pretest
scores and posttest scores, for example); it analyzes their
differences.
 For each pair in the sample, calculate the difference
between the two scores for the pair.
 Then perform a one-sample analysis on these differences.
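
The paired procedure can be sketched in a few lines of Python: compute the difference within each pair, then build an ordinary one-sample interval from those differences. The ratings below are hypothetical (they are not the Sales Presentation Ratings data), the function name is illustrative, and the t-multiple is supplied by the caller (2.776 for 4 degrees of freedom at the 95% level).

```python
import math
import statistics


def paired_difference_interval(first, second, t_multiple):
    """One-sample confidence interval applied to within-pair differences."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)
    mean_diff = statistics.mean(differences)
    se = statistics.stdev(differences) / math.sqrt(n)
    half_length = t_multiple * se
    return mean_diff, mean_diff - half_length, mean_diff + half_length


# Hypothetical ratings for five couples on a 1-10 scale.
husband = [8, 7, 9, 6, 8]
wife = [6, 6, 7, 5, 7]
mean_diff, lower, upper = paired_difference_interval(husband, wife, t_multiple=2.776)
print(f"Mean difference = {mean_diff:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```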
Example 8.8: Husband and Wife Reactions
to Presentations (slide 1 of 3)
 Objective: To use a paired-sample procedure to find
a confidence interval for the mean difference
between husbands’ and wives’ ratings of sales
presentations.
 The file name: Sales Presentation Ratings
 Solution: A random sample of husbands and wives
are asked (separately) to rate the sales presentation at
Stevens Honda-Buick automobile dealership on a
scale of 1 to 10.
 The analysis using SPSS is shown on the following
slide.
Example 8.8: Husband and Wife Reactions
to Presentations (slide 2 of 3)
 SPSS Steps:
From the menus choose:
 Analyze → Compare Means → Paired-Samples T Test.
 Select Husband as Variable 1 and Wife as Variable 2 and
then click on the arrow button to move the selection pair into
the Paired Variables box.

Example 8.8: Husband and Wife Reactions
to Presentations (slide 3 of 3)
 The sample mean of the Husband minus Wife differences is 1.629, and a
95% confidence interval for this mean difference extends from 1.057 to
2.200.
SPSS Output:

8-9 Sample Size Selection
(slide 1 of 2)

 Confidence intervals are a function of three things:


 Data in the sample directly affect the length of a confidence
interval through their sample standard deviation(s).
 There are random sampling plans that can reduce the amount of
variability in the sample and hence reduce confidence interval length.
 Variance reduction is also possible in randomized experiments.
 As confidence level increases, the length of the confidence
interval increases as well.
 However, the confidence level is rarely used to control the length of the
confidence interval.
 Instead, confidence level choice is usually based on convention, and
95% is by far the most commonly used value.
 Sample size(s): the most obvious way to control confidence
interval length is to choose the sample size(s) appropriately.
Sample Size Selection
(slide 2 of 2)

 The goal is to make the length of a confidence interval
sufficiently narrow.
 Each confidence interval discussed so far (with the
exception of the confidence interval for a standard
deviation) is a point estimate plus or minus some quantity.
 The “plus or minus” part is called the half-length of the interval.
 The usual approach is to specify the half-length B you would
like to obtain. Then you find the sample size(s) necessary to
achieve this half-length.

8-9a Sample Size Selection for Estimation
of the Mean
 The appropriate sample size for estimation of the mean can be
calculated from the formula for the confidence interval for the
mean, by setting the half-length equal to B and solving for n:

$$t\text{-multiple} \times \frac{s}{\sqrt{n}} = B \quad\Longrightarrow\quad n = \left(\frac{t\text{-multiple} \times s}{B}\right)^2$$

 Unfortunately, sample size selection must be done before a
sample is observed, and the value s is not yet available.
 The usual solution is to replace s by some reasonable estimate $\sigma_{est}$ of
the population standard deviation, and to replace the t-multiple with
the corresponding z-multiple from the standard normal distribution.
 The resulting sample size formula is:

$$n = \left(\frac{z\text{-multiple} \times \sigma_{est}}{B}\right)^2$$
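
The resulting rule (square the z-multiple times the estimated standard deviation divided by the desired half-length B, then round up) is easy to check numerically. The sketch below uses a hypothetical function name; the inputs 1.597 and 0.3 are the values quoted in Example 8.9.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma_est, half_length, confidence=0.95):
    """Smallest n whose z-based half-length is at most the requested value."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma_est / half_length) ** 2)


# Values from Example 8.9: estimated standard deviation 1.597, half-length 0.3.
n_required = sample_size_for_mean(sigma_est=1.597, half_length=0.3)
print(f"Required sample size: {n_required}")
```

Rounding up rather than to the nearest integer keeps the achieved half-length at or below the target.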

Example 8.9: Sample Size Selection for
Estimating Reaction
 Objective: To find the sample size of customers required to
achieve a sufficiently narrow confidence interval for the mean
rating of the new sandwich.
 Solution: The fast-food manager in Example 8.1 surveyed 40
customers, each of whom rated a new sandwich on a scale of 1 to
10.
 Based on the data, a 95% confidence interval for the mean rating
of all potential customers extended from 5.739 to 6.761, with a
half-length (Margin of error) of 0.511.
 Find how large a sample would be needed to reduce this
half-length to approximately 0.3, using σest = s = 1.597.
 Using the sample size formula yields the following, rounded up
to 109:

$$n = \left(\frac{1.96 \times 1.597}{0.3}\right)^2 \approx 108.9$$
8-9b Sample Size Selection for
Estimation of Other Parameters
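 The sample-size analysis for the mean carries over
with very few changes to other parameters.
 Sample size formula for estimating a proportion:

$$n = \left(\frac{z\text{-multiple}}{B}\right)^2 p_{est}(1 - p_{est})$$

 If $p_{est}$ is not given, use $p_{est} = 0.5$.
 Sample size formula for estimating the difference
between means (n is the common size of each sample):

$$n = 2\left(\frac{z\text{-multiple} \times \sigma_{est}}{B}\right)^2$$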
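
Both sample size rules can be sketched in Python under the usual large-sample assumptions: for a proportion, n = (z/B)² × pest(1 - pest), with pest = 0.5 as the conservative default; for a difference between means with equal sample sizes and a common standard deviation, n = 2(z × σest/B)² per sample. Function names are hypothetical; the inputs reproduce the settings of Examples 8.10 and 8.11.

```python
import math
from statistics import NormalDist


def z_multiple(confidence):
    """Two-sided z-multiple for the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_size_for_proportion(half_length, confidence=0.95, p_est=0.5):
    """Sample size for a proportion; p_est = 0.5 gives the largest (safest) n."""
    z = z_multiple(confidence)
    return math.ceil((z / half_length) ** 2 * p_est * (1 - p_est))


def sample_size_for_difference(sigma_est, half_length, confidence=0.95):
    """Size of EACH of two equal samples, assuming a common standard deviation."""
    z = z_multiple(confidence)
    return math.ceil(2 * (z * sigma_est / half_length) ** 2)


# Example 8.10 settings: 90% confidence, half-length 0.05, p_est = 0.3.
n_proportion = sample_size_for_proportion(0.05, confidence=0.90, p_est=0.3)
# Example 8.11 settings: 95% confidence, half-length 2, sigma_est = 5.
n_each_group = sample_size_for_difference(5, 2, confidence=0.95)
print(n_proportion, n_each_group)
```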

Example 8.10: Sample Size Selection for
Estimating the Proportion
 Objective: To find the sample size of customers required to
achieve a sufficiently narrow confidence interval for the
proportion of customers who have tried the new sandwich.
 Solution: The data set is the same as in Examples 8.1 and 8.9.
 Now the fast-food manager wants to estimate the proportion of
customers who have tried its new sandwich.
 She wants a 90% confidence interval for this proportion to have
half-length 0.05.
 If she is fairly sure that the proportion who have tried the new
sandwich is around 0.3, she can use $p_{est}$ = 0.3:

$$n = \left(\frac{1.645}{0.05}\right)^2 (0.3)(0.7) \approx 227.3$$

 Round up to 228.
Example 8.11: Sample Size Selection for
Analyzing Customer Complaints
 Objective: To see how many employees in each experimental group must be
sampled to achieve a sufficiently narrow confidence interval for the difference
between the mean numbers of complaints.
 Solution: A customer service center has two types of employees: those who
have had a recent course in dealing with customers (but little actual experience)
and those with a lot of experience dealing with customers (but no formal
course).
 The company wants to estimate the difference between the two types of
employees in terms of the average number of customer complaints regarding
poor service in the last six months.
 The company plans to obtain information on a randomly selected sample of
each type of employee, using equal sample sizes.
 How many employees should be in each sample to achieve a 95% confidence
interval with approximate half-length 2? Assume $\sigma_{est} = 5$:

$$n = 2\left(\frac{1.96 \times 5}{2}\right)^2 \approx 48.02$$

 Round up to 49.