Stat609 Sp23 LCN Unit2

Unit 2
STATISTICAL INFERENCE
 CHAPTER 7: Sampling and Sampling Distributions
 CHAPTER 8: Confidence Interval Estimation
© 2017 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Chapter 7
Sampling and Sampling Distributions
7-1 Introduction
 In a typical statistical inference problem, you want to discover one or more characteristics of a given population.
 It is generally difficult or even impossible to contact each member of the population.
 Solution: identify a sample of the population and then obtain
information from members of the sample
 Chapter objectives
 Discuss the sampling schemes generally used in real
sampling applications
 See how the information from a sample of the population
can be used to infer the properties of the entire population
7-2 Sampling Terminology
 A population is the set of all members about which a study intends to make inferences.
 An inference is a statement about a numerical characteristic
of the population.
 A frame is a list of all members of the population. The
potential sample members are called sampling units.
 A probability sample is a sample in which the sampling
units are chosen from the population according to a
random mechanism.
 A judgmental sample is a sample in which the sampling
units are chosen according to the sampler’s judgment.
7-2 Sampling Terminology
 The members of a probability sample are chosen according to a random mechanism, whereas the members of a judgmental sample are chosen according to the sampler’s judgment.
Why Random Sampling?
 One reason for sampling randomly from a population is to avoid biases (such as choosing mainly stay-at-home mothers because they are easier to contact).
 Random sampling allows you to use probability to make inferences about unknown population parameters. If sampling were not random, there would be no basis for using probability to make such inferences.
7-3 Methods for Selecting Random Samples
 Different types of sampling schemes have different
properties.
 There is typically a trade-off between cost and
accuracy.
 Some sampling schemes are cheaper and easier to
administer, whereas others are more costly but provide
more accurate information.
7-3a Simple Random Sampling
(slide 1 of 2)
 The simplest type of sampling scheme is called simple random sampling.
 A simple random sample of size n is one where
each possible sample of size n has the same chance
of being chosen.
 Simple random samples are the easiest to understand,
and their statistical properties are the most
straightforward.
 More complex random samples are often used in
real applications
Simple Random Sampling
(slide 2 of 2)
 Simple random samples are used infrequently in real applications. There are several reasons for this:
 Because each sampling unit has the same chance of being
sampled, simple random sampling can result in samples
that are spread over a large geographical region.
 This can make sampling extremely expensive, especially if
personal interviews are used.
 Simple random sampling requires that all sampling units be
identified prior to sampling. Sometimes this is infeasible.
 Simple random sampling can result in underrepresentation
or overrepresentation of certain segments of the population.
Example 7.1: Sampling Families to Analyze
Annual Incomes (slide 1 of 2)
 Objective: To illustrate how SPSS’s random-sampling feature can be used to generate simple random samples. The file name: Annual Incomes.
 Solution: Consider the frame of 40 families with annual incomes.
 Choose a simple random sample of size 10 from this frame.
SPSS Steps:
From the menus choose:
 Data > Select Cases > Random Sample of Cases.
 The ‘Random Sample of Cases’ option selects a random subset of cases based on pseudo-random numbers generated by SPSS.
 Enter ‘Exactly 10 cases from the first 40 cases.’
 To obtain more random samples of size 10 (for comparison), you
would need to go through this process repeatedly.
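Outside SPSS, the same selection can be sketched in a few lines. This is a minimal Python sketch, not the SPSS procedure: it assumes the frame is just the family ID numbers 1 to 40 (the Annual Incomes file itself is not reproduced here), and `simple_random_sample` is a hypothetical helper name. Python’s `random.sample` draws without replacement, so every subset of size 10 has the same chance of being chosen.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct members of `frame`; every size-n subset is equally likely."""
    rng = random.Random(seed)            # seeded generator for reproducible samples
    return sorted(rng.sample(frame, n))  # sampling without replacement

# Hypothetical frame: family IDs 1..40 standing in for the Annual Incomes file
families = list(range(1, 41))

# Repeat the draw a few times to compare different samples of size 10
for rep in range(3):
    print(simple_random_sample(families, 10, seed=rep))
```

Changing the seed plays the role of repeating the SPSS procedure to obtain another sample.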
Example 7.1: Sampling Families to Analyze
Annual Incomes (slide 2 of 2)
7-3b Systematic Sampling (slide 1 of 2)
 A systematic sample provides a convenient way to choose the sample.
 First, divide the population size by the sample size,
creating “blocks.”
 Next, use a random mechanism to choose a number
between 1 and the number in each “block.”
 In general, one of the first k members is selected
randomly, and then every kth member after this one is
selected.
 The value k is called the sampling interval and equals the
ratio N/n, where N is the population size and n is the desired
sample size.
7-3b Systematic Sampling (slide 2 of 2)
Example 7.2:
 Suppose you are asked to select a random sample of 250 names
from a large company’s directory of employees.
 There are 55,000 names listed in alphabetical order in the
directory.
 First, you divide the population size by the sample size:
55,000/250 = 220.
 Next, you use a random mechanism to choose a number
between 1 and 220. Suppose this number is 131.
 Then you choose the 131st name and every 220th name
thereafter.
 So, you would choose name 131, name 351, name 571, and so
on. The result is a systematic sample of size n = 250.
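The procedure in Example 7.2 can be sketched as follows. This minimal sketch returns 1-based positions in the directory; `systematic_sample` is a hypothetical helper, and it assumes N/n is a whole number (as 55,000/250 = 220 is).

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """1-based positions of a systematic sample of size n from a list of N units."""
    k = N // n                                     # sampling interval k = N/n (assumed whole)
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start among the first k units
    return [start + i * k for i in range(n)]       # start, start + k, start + 2k, ...

sample = systematic_sample(55_000, 250, start=131)
print(sample[:3], sample[-1], len(sample))   # [131, 351, 571] 54911 250
```

With the slide’s random start of 131, the positions are 131, 351, 571, and so on, ending at name 54,911.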
7-3c Stratified Sampling (slide 1 of 2)
 Suppose various subpopulations within the total population can be identified. These subpopulations are called strata.
 Instead of taking a simple random sample from the entire
population, it might make more sense to select a simple random
sample from each stratum separately. This sampling method is
called stratified sampling.
 In stratified sampling, the population is divided into relatively
homogeneous subsets called strata.
 Advantages of stratified sampling:
 Separate estimates can be obtained within each stratum, which would not
be obtained with a simple random sample from the entire population.
 The accuracy of the resulting population estimates can be increased by
using appropriately defined strata.
 Define the strata such that there is less variability within the individual strata
than in the population as a whole.
Stratified Sampling (slide 2 of 2)
 The key to using stratified sampling effectively is selecting the appropriate strata.
 What is appropriate depends on the objectives of the study (for a company, its objectives and its product).
 There are many ways to choose sample sizes from each stratum,
but the most popular method is to use proportional sample
sizes.
 With proportional sample sizes, the proportion of a stratum in the
sample is the same as the proportion of that stratum in the population.
 The advantage of proportional sample sizes is they are very easy to
determine.
 The disadvantage is they ignore differences in variability among the
strata.
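Proportional allocation is simple arithmetic: stratum h receives about n × N_h / N sample members. The sketch below uses hypothetical stratum sizes; the largest-remainder rounding rule is an assumption added here so the stratum sample sizes always add up to n, not something the slides prescribe.

```python
import math

def proportional_allocation(strata_sizes, n):
    """Split a total sample size n across strata in proportion to stratum size."""
    N = sum(strata_sizes.values())
    exact = {h: n * size / N for h, size in strata_sizes.items()}  # n * N_h / N
    alloc = {h: math.floor(x) for h, x in exact.items()}
    # Rounding down can leave a few units unassigned; give them to the strata
    # with the largest fractional remainders so the sizes add up to n.
    leftover = n - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc

# Hypothetical population of 10,000 split into three strata
print(proportional_allocation({"A": 5000, "B": 3000, "C": 2000}, 200))
```

Each stratum’s share of the sample (50%, 30%, 20%) matches its share of the population, which is exactly what makes the sizes easy to determine; it also shows why the method ignores differences in variability among strata.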
7-3d Cluster Sampling
 In cluster sampling, the population is separated into clusters, such as cities or city blocks, and then a random sample of the clusters is selected.
 The primary advantage of cluster sampling is sampling
convenience (and possibly lower cost).
 The downside is that the inferences drawn from a cluster sample
can be less accurate for a given sample size than other sampling
plans.
 The key to selecting a cluster sample is to define the
sampling units as the clusters—the city blocks, for example.
 Then a simple random sample of clusters can be chosen.
 Once the clusters are selected, it is typical to sample all of the
population members in each selected cluster.
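A single-stage cluster sample can be sketched as a simple random sample of cluster labels followed by taking every member of each chosen cluster. The city blocks and household labels below are hypothetical, and `cluster_sample` is an illustrative helper name.

```python
import random

def cluster_sample(clusters, m, seed=None):
    """Choose m clusters by simple random sampling and keep every member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)   # SRS of cluster labels (the sampling units)
    return {c: clusters[c] for c in chosen}    # all population members in each chosen cluster

# Hypothetical frame: six city blocks and the households on each
blocks = {
    "B1": ["h1", "h2", "h3"], "B2": ["h4", "h5"], "B3": ["h6", "h7", "h8", "h9"],
    "B4": ["h10"], "B5": ["h11", "h12"], "B6": ["h13", "h14", "h15"],
}
picked = cluster_sample(blocks, 2, seed=3)
households = [h for members in picked.values() for h in members]
print(sorted(picked), households)
```

Note that the random mechanism operates only on the blocks; households are never sampled individually.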
7-3e Multistage Sampling Schemes
 The cluster sampling scheme is an example of a single-stage sampling scheme.
 Real applications are often more complex than this,
resulting in multistage sampling schemes.
 For example, in Gallup’s nationwide surveys, a random
sample of approximately 300 locations is chosen in the
first stage of the sampling process.
 City blocks or other geographical areas are then randomly
sampled from the first-stage locations in the second stage
of the process.
 This is followed by a systematic sampling of households
from each second-stage area.
7-4 An Introduction to Estimation
 The purpose of any random sample, simple or otherwise, is to estimate properties of a population from the data observed in the sample.
 The mathematical procedures appropriate for
performing this estimation depend on which
properties of the population are of interest and
which type of random sampling scheme is used.
 For both simple random samples and more
complex sampling schemes, the concepts are the
same.
7-4a Sources of Estimation Error
(slide 1 of 2)
 There are two basic sources of errors that can occur when you sample randomly from a population:
 Sampling error
 Nonsampling error
 Sampling error is the inevitable result of basing an inference on a random sample rather than on the entire population.
Sources of Estimation Error
(slide 2 of 2)
 Nonsampling error is quite different and can occur for a variety of reasons:
 Nonresponse bias occurs when a portion of the sample fails to respond
to the survey.
 Nontruthful responses are particularly a problem when there are
sensitive questions in a questionnaire.
 Measurement error occurs when the responses to the questions do not
reflect what the investigator had in mind (e.g., when questions are poorly
worded).
 Voluntary response bias occurs when the subset of people who respond to
a survey differs in some important respect from all potential respondents.
 The potential for nonsampling error is enormous.
 However, unlike sampling error, it cannot be measured with probability theory.
 It can be controlled only by using appropriate sampling procedures and
designing good survey instruments.
7-4b Key Terms in Sampling
(slide 1 of 2)
 A point estimate is a single numeric value, a “best guess” of a population parameter, based on the data in a random sample.
 The sampling error (or estimation error) is the
difference between the point estimate and the true
value of the population parameter being estimated.
 The sampling distribution of any point estimate is
the distribution of the point estimates from all
possible samples (of a given sample size) from the
population.
Key Terms in Sampling
(slide 2 of 2)
 A confidence interval is an interval around the point estimate, calculated from the sample data, that is very likely to contain the true value of the population parameter.
 An unbiased estimate is a point estimate such that the
mean of its sampling distribution is equal to the true
value of the population parameter being estimated.
 The standard error of an estimate is the standard
deviation of the sampling distribution of the estimate.
 It measures how much estimates vary from sample to
sample.
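Because “all possible samples” is a concrete idea, it can be enumerated for a toy population. This minimal sketch uses a hypothetical population {2, 4, 6, 8} and samples of size 2 drawn without replacement: the mean of the six sample means equals the population mean (unbiasedness), and their standard deviation is the standard error.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev

population = [2, 4, 6, 8]   # hypothetical tiny population, N = 4
n = 2

# Sample mean of every possible sample of size n: the sampling distribution
sample_means = [mean(s) for s in combinations(population, n)]
print(sample_means)

mu = mean(population)
center = mean(sample_means)        # mean of the sampling distribution
std_error = pstdev(sample_means)   # standard error = SD of the sampling distribution
print(mu, center, round(std_error, 4))

# Cross-check: for sampling without replacement this equals sigma/sqrt(n) times the
# finite population correction sqrt((N - n)/(N - 1))
N = len(population)
formula_se = pstdev(population) / sqrt(n) * sqrt((N - n) / (N - 1))
```

Here both the population mean and the center of the sampling distribution are 5, and the standard error is about 1.291.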
7-4c Sampling Distribution of the Sample
Mean (slide 1 of 2)
 The sampling distribution of the sample mean $\bar{X}$ has the following properties:
 It is an unbiased estimate of the population mean: $E(\bar{X}) = \mu$
 The standard error of the sample mean is $SE(\bar{X}) = \sigma/\sqrt{n}$, where σ is the standard deviation of the population, and n is the sample size.
 It is customary to approximate the standard error by substituting the sample standard deviation, s, for σ: $SE(\bar{X}) \approx s/\sqrt{n}$
 If you go out two standard errors on either side of the sample mean, you are approximately 95% confident of capturing the population mean: $\bar{x} \pm 2\,s/\sqrt{n}$
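A quick simulation makes the “two standard errors, about 95%” rule concrete. This sketch assumes a normal population with μ = 16 and σ = 0.8 (values borrowed from the pizza setting of Example 7.3 purely for illustration) and samples of size n = 50; it counts how often x̄ ± 2s/√n captures μ.

```python
import random
from math import sqrt

def coverage_of_two_se_rule(mu, sigma, n, reps=5000, seed=1):
    """Fraction of simulated samples for which xbar ± 2*s/sqrt(n) contains mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        s = sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))  # sample SD
        se = s / sqrt(n)                                         # estimated standard error
        hits += xbar - 2 * se <= mu <= xbar + 2 * se
    return hits / reps

# Assumed normal population with mu = 16 and sigma = 0.8, samples of size 50
print(coverage_of_two_se_rule(mu=16, sigma=0.8, n=50))
```

The printed fraction should land close to 0.95, which is what “approximately 95% confident” means in repeated sampling.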
7-4c Sampling Distribution of the Sample Mean
(slide 2 of 2)
 Example 7.3: The size of a pizza is normally distributed with a mean of 16 inches and a standard deviation of 0.8 inch. The pizza chefs strive to make each pizza 16 inches but are not able to make them all exactly 16 inches.
 What are the expected value and standard error of the sample mean derived from a random sample of 2 pizzas?
$E(\bar{X}) = 16$ and $SE(\bar{X}) = 0.8/\sqrt{2} \approx 0.57$
 What are the expected value and standard error of the sample mean derived from a random sample of 4 pizzas?
$E(\bar{X}) = 16$ and $SE(\bar{X}) = 0.8/\sqrt{4} = 0.40$
 Compare the expected value and the standard error of the sample mean with those of an individual pizza.
 The expected values are the same (16 inches).
 The standard errors (0.57 and 0.40) are lower than the standard deviation of an individual pizza (0.8); averaging reduces variability.
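The pizza calculations reduce to SE = σ/√n. A minimal sketch, with `standard_error_of_mean` as a hypothetical helper name; n = 1 corresponds to an individual pizza.

```python
from math import sqrt

def standard_error_of_mean(sigma, n):
    """SE of the sample mean, ignoring any finite population correction."""
    return sigma / sqrt(n)

mu, sigma = 16, 0.8
for n in (1, 2, 4):
    # The expected value stays at mu; only the spread shrinks as n grows
    print(n, mu, round(standard_error_of_mean(sigma, n), 4))
```

Going from n = 1 to n = 4 cuts the standard error from 0.8 to 0.4, a preview of the rule that halving a standard error requires four times the sample size.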
The Finite Population Correction (slide 1 of 3)
 Generally, the sample size is small relative to the population size.
 There are situations, however, when the sample size is greater than 5% of the population.
 Finite population correction factor
 Accounts for the added precision gained by sampling a larger percentage of the population
 Used to reduce the sampling variation of the mean/proportion
 Always less than one
 When N is large relative to n, the factor is close to 1
 In this case, the formula for the standard error of the mean should be modified with a finite population correction, or fpc, factor: $fpc = \sqrt{\frac{N - n}{N - 1}}$
 The standard error of the mean is multiplied by fpc in order to make the correction: $SE(\bar{X}) = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}}$
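 Example 7.4: A large class with 340 students has been divided up into 10 groups. Connie is in a group of 34 students that averaged 72 on the midterm. The class average was 73 with a standard deviation of 10.
a. Calculate the expected value and the standard error of the sample mean based on a random sample of 34 students.
 The sample is 10% of the class, so the fpc applies: $E(\bar{X}) = 73$ and $SE(\bar{X}) = \frac{10}{\sqrt{34}}\sqrt{\frac{340 - 34}{340 - 1}} \approx 1.715 \times 0.950 \approx 1.63$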
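The Example 7.4 arithmetic can be checked with a short sketch; `standard_error_with_fpc` is a hypothetical helper that returns both the corrected standard error and the fpc factor.

```python
from math import sqrt

def standard_error_with_fpc(sigma, n, N):
    """SE of the sample mean with the finite population correction applied."""
    fpc = sqrt((N - n) / (N - 1))
    return sigma / sqrt(n) * fpc, fpc

se, fpc = standard_error_with_fpc(sigma=10, n=34, N=340)
print(round(fpc, 4), round(10 / sqrt(34), 4), round(se, 4))
```

Without the correction the standard error would be about 1.715; the fpc of about 0.950 brings it down to about 1.63. For a population in the millions the factor is essentially 1, so the correction can be ignored.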
7-4d The Central Limit Theorem (slide 1 of 2)
 For any population distribution with mean μ and standard deviation σ, the sampling distribution of the sample mean is approximately normal with mean μ and standard deviation $\sigma/\sqrt{n}$, and the approximation improves as n increases. This is called the central limit theorem.
 The important part of this result is the normality of the
sampling distribution.
 When you sum or average n randomly selected values from any
distribution, normal or otherwise, the distribution of the sum or
average is approximately normal, provided that n is sufficiently
large.
 This is the primary reason why the normal distribution is relevant in so many real applications.
7-4d The Central Limit Theorem (slide 2 of 2)
 The Averaging Effect
 As you average more and more observations from a given
distribution, the variance of the average decreases.
 For example, suppose you average only two observations. Then
it is easy to get an abnormally large (or small) average.
 But if you average a much larger number of observations, you
aren’t likely to get an abnormally large (or small) average.
 The reason is that a few abnormally large observations will
typically be cancelled by a few abnormally small observations.
 This cancellation produces the averaging effect. It also explains
why a larger sample size tends to produce a more accurate
estimate of a population mean.
Example 7.5: Average Winnings From A
Wheel of Fortune (slide 1 of 2)
 Objective: To illustrate the central limit theorem by a simulation of
winnings in a game of chance.
 Solution: The population is the set of all outcomes you could obtain
from a single spin of the wheel—that is, all dollar values from $0 to
$1000.
 Each spin results in one randomly sampled dollar value from this
population.
 Each replication of the experiment simulates n spins of the wheel and
calculates the average—that is, the winnings—from these n spins.
 A histogram of winnings is formed, for any value of n, where n is the
number of spins.
 As the number of spins increases, the histogram starts to take on more
and more of a bell shape.
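The wheel simulation can be sketched without a spreadsheet. This sketch assumes each spin pays a continuous Uniform(0, 1000) amount (an assumption about the wheel, since the slides only say values range from $0 to $1000); `simulate_average_winnings` is a hypothetical helper. Instead of drawing histograms, it reports the spread of the averages and the share falling within two standard errors of 500.

```python
import random
from math import sqrt

def simulate_average_winnings(n, reps=10_000, seed=0):
    """Return `reps` simulated averages of n spins; each spin assumed Uniform(0, 1000)."""
    rng = random.Random(seed)
    return [sum(rng.uniform(0, 1000) for _ in range(n)) / n for _ in range(reps)]

sigma = 1000 / sqrt(12)          # SD of one Uniform(0, 1000) spin, about 288.7
for n in (1, 2, 10, 30):
    avgs = simulate_average_winnings(n)
    m = sum(avgs) / len(avgs)
    sd = sqrt(sum((a - m) ** 2 for a in avgs) / (len(avgs) - 1))
    # Share of averages within two theoretical standard errors of 500;
    # for a normal shape this is about 0.954
    within = sum(abs(a - 500) <= 2 * sigma / sqrt(n) for a in avgs) / len(avgs)
    print(f"n={n:2d}  mean={m:6.1f}  sd={sd:6.1f}  sigma/sqrt(n)={sigma / sqrt(n):6.1f}  within 2 SE={within:.3f}")
```

With one spin every outcome lies within two standard deviations of 500 (the flat distribution has no tails), so the share is 1.000; as n grows, the share settles near the normal value of about 0.95 and the SD of the averages tracks σ/√n.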
Example 7.5: Average Winnings From A
Wheel of Fortune (slide 2 of 2)
7-4e Sample Size Selection (slide 1 of 2)
 The problem of selecting the appropriate sample size in any sampling context is not an easy one, but it must be faced in the planning stages, before any sampling is done.
 The sampling error tends to decrease as the sample size
increases, so the desire to minimize sampling error
encourages us to select larger sample sizes.
 However, several other factors encourage us to select
smaller sample sizes, including:
 Cost
 Timely collection of data
 Increased chance of nonsampling error, such as nonresponse bias
7-4e Sample Size Selection (slide 2 of 2)
 Effect of Larger Sample Sizes
 Accurate estimates of population parameters require small
standard errors, and small standard errors require large
sample sizes.
 However, standard errors are typically inversely proportional
to the square root of the sample size (or sample sizes).
 The implication is that if you want to decrease the standard
error by a given factor, you must increase the sample size by
a much larger factor.
 For example, to decrease the standard error by a factor of 2,
you must increase the sample size by a factor of 4.
 Accurate estimates are not cheap.
7-4f Summary of Key Ideas for Simple
Random Sampling
 To estimate a population mean with a simple random sample, the sample mean is typically used as a “best guess.” This estimate is called a point estimate.
 The accuracy of the point estimate is measured by its standard error. It
is the standard deviation of the sampling distribution of the point
estimate.
 A confidence interval (with 95% confidence) for the population mean
extends to approximately two standard errors on either side of the
sample mean.
 From the central limit theorem, the sampling distribution of $\bar{X}$ is approximately normal when n is reasonably large.
 There is approximately a 95% chance that any particular $\bar{x}$ will be within two standard errors of the population mean μ.
 The sampling error can be reduced by increasing the sample size n.
Chapter 8
Confidence Interval Estimation
8-1 Introduction
 Statistical inferences are always based on an underlying probability model, which means that some type of random mechanism must generate the data.
 Two random mechanisms are generally used:
 Random sampling from a larger population
 Randomized experiments
 Generally, statistical inferences are of two types:
 Confidence interval estimation uses the data to obtain a
point estimate and a confidence interval around this point
estimate.
 Hypothesis testing determines whether the observed data
provide support for a particular hypothesis.
8-2 Sampling Distributions
 Most confidence intervals are of the form: point estimate ± multiple × standard error
 In general, whenever you make inferences about one or more population parameters, you always base this inference on the sampling distribution of a point estimate, such as the sample mean.
 An equivalent statement to the central limit theorem is that the standardized quantity Z, as defined below, is approximately normal with mean 0 and standard deviation 1:
$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$
 However, the population standard deviation σ is rarely known, so it is replaced by its sample estimate s in the formula for Z.
 When the replacement is made, a new source of variability is introduced, and the
sampling distribution is no longer normal. Instead, it is called the t distribution.
8.2a The t Distribution
(slide 1 of 2)
 If we are interested in estimating a population mean μ with a sample of size n, we assume the population distribution is normal with unknown standard deviation σ.
 σ is replaced by the sample standard deviation s, as shown in this equation:
$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$
 Then the standardized value in the equation has a t distribution with n – 1 degrees of freedom.
 The degrees of freedom is a numerical parameter of the t distribution that defines
the precise shape of the distribution.
 The t-value in this equation is very much like a typical Z-value.
 That is, the t-value indicates the number of standard errors by which the sample
mean differs from the population mean.
The t Distribution
(slide 2 of 2)
 The t distribution looks very much like the standard normal distribution.
 It is bell-shaped and centered at 0.
 The only difference is that it is slightly more spread out, and this increase in spread is greater for small degrees of freedom.
 When n is large, so that the degrees of freedom is large, the t distribution and the standard normal distribution are practically indistinguishable.
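The extra spread of the t distribution shows up directly in the multiples used for confidence intervals. This sketch assumes SciPy is installed and uses its `scipy.stats.t` and `scipy.stats.norm` quantile functions (`ppf`) to compare the 95% two-sided t-multiple with the z-multiple as the degrees of freedom grow.

```python
from scipy.stats import norm, t

z_multiple = norm.ppf(0.975)          # 95% two-sided multiple from N(0, 1), about 1.960
for df in (2, 5, 10, 30, 100, 1000):
    t_multiple = t.ppf(0.975, df)     # same tail area under the t distribution with df degrees of freedom
    print(f"df={df:4d}  t-multiple={t_multiple:.3f}  z-multiple={z_multiple:.3f}")
```

The t-multiple is noticeably larger than 1.96 for small df and is practically the same once df is in the hundreds.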
8-2b Other Sampling Distributions
 The t distribution, a close relative of the normal distribution, is used to make inferences about a population mean when the population standard deviation is unknown.
 Two other close relatives of the normal distribution
are the chi-square and F distributions.
 These are used primarily to make inferences about
variances (or standard deviations), as opposed to
means.
8-3 Confidence Interval for a Mean
(slide 1 of 2)
 To obtain a confidence interval for μ, first specify a confidence level, usually 90%, 95%, or 99%.
 Then use the sampling distribution of the point
estimate to determine the multiple of the standard
error (SE) to go out on either side of the point
estimate to achieve the given confidence level.
 If the confidence level is 95%, the value used most
frequently in applications, the multiple is
approximately 2. More precisely, it is a t-value.
 A typical confidence interval for μ is of the form:
$\bar{x} \pm t\text{-multiple} \times SE(\bar{X})$
where $SE(\bar{X}) = s/\sqrt{n}$
Confidence Interval for a Mean
(slide 2 of 2)
 To obtain the correct t-multiple, let α be 1 minus the confidence level (expressed as a decimal).
 For example, if the confidence level is 90%, then α = 0.10.
 Then the appropriate t-multiple is the value that cuts
off probability α/2 in each tail of the t distribution with
n−1 degrees of freedom.
 As the confidence level increases, the length of the
confidence interval also increases.
 As n increases, the standard error decreases, so the
length of the confidence interval tends to decrease for
any confidence level.
Example 8.1: IQ Tests (slide 1 of 2)
IQ scores are approximately normally distributed. A sample of 22 employees gives a sample mean of 106 and a standard deviation of 15.
Compute 90% and 99% confidence intervals for the average IQ at this firm.
Use the results to infer whether the mean IQ in this firm is significantly different from the national average of 100.
Example 8.1: IQ Tests (slide 2 of 2)
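90% confidence interval:
[100.5, 111.5]
Does not contain 100, so at the 10% significance level the mean IQ at this firm differs from 100
99% confidence interval:
[96.95, 115.05]
Contains 100, so at the 1% significance level there is not enough evidence that the mean IQ differs from 100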
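The IQ intervals follow directly from x̄ ± t-multiple × s/√n with n − 1 = 21 degrees of freedom. This sketch assumes SciPy is installed; `t_interval` is a hypothetical helper that works from summary statistics.

```python
from math import sqrt
from scipy.stats import t

def t_interval(xbar, s, n, conf):
    """Two-sided t confidence interval for a population mean from summary statistics."""
    alpha = 1 - conf
    multiple = t.ppf(1 - alpha / 2, df=n - 1)   # cuts off alpha/2 in each tail
    half = multiple * s / sqrt(n)               # t-multiple times SE(xbar)
    return xbar - half, xbar + half

for conf in (0.90, 0.99):
    lo, hi = t_interval(xbar=106, s=15, n=22, conf=conf)
    print(f"{conf:.0%}: [{lo:.2f}, {hi:.2f}]  contains 100: {lo <= 100 <= hi}")
```

The output reproduces [100.50, 111.50] and [96.95, 115.05]; the wider 99% interval is the price of the higher confidence level.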
Example 8.2: Fuel Usage of “Ultra-Green” Cars
(slide 1 of 4)
 A car manufacturer advertises that its new “ultra-green” car obtains an average of 100 MPG. For a sample of 25 cars, each car is driven the same distance in identical conditions in order to obtain the car’s MPG. The file name: MPG
 Use the sample data to estimate the mean MPG of all ultra-green cars
with 90% confidence.
Example 8.2: Fuel Usage of “Ultra-Green” Cars
(slide 2 of 4)
 Summary Statistics: $\bar{x} = 96.52$ and $s = 10.70$ (n = 25)
 Assume that MPG follows a normal distribution.
 Construct the 90% confidence interval for the population mean.
 For a 90% confidence interval, $t_{0.05,24} = 1.711$, so the interval is $96.52 \pm 1.711 \times 10.70/\sqrt{25} = 96.52 \pm 3.66$
 With 90% confidence, the average MPG of all ultra-green cars is between 92.86 MPG and 100.18 MPG.
 The manufacturer’s claim that the ultra-green car will average 100 MPG cannot be rejected, since 100 falls within the interval.
Example 8.2: Fuel Usage of “Ultra-Green”
Cars (slide 3 of 4)
SPSS Steps:
 Analyze > Compare Means > One-Sample T Test.
 Click MPG and move it onto the Test Variable(s) field.
 Click Options and enter 90% for confidence interval percentage.
Example 8.2: Fuel Usage of “Ultra-Green”
Cars (slide 4 of 4)
SPSS Output:
8-4 Confidence Interval for a Total
(slide 1 of 2)
 Let T be a population total we want to estimate, such as the total of all receivables, and let $\hat{T}$ be a point estimate of T based on a simple random sample of size n from a population of size N.
 First, we need a point estimate of T. For the population total T, it is reasonable to sum all of the values in the sample, denoted Ts, and then “project” this total to the population with this equation:
$\hat{T} = \frac{N}{n} T_s = N\bar{X}$
 The mean and standard deviation of the sampling distribution of $\hat{T}$ are given in the equations below:
$E(\hat{T}) = T$ and $SE(\hat{T}) = N\sigma/\sqrt{n}$
Confidence Interval for a Total
(slide 2 of 2)
 Because σ is usually unknown, s is used instead of σ to obtain the approximate standard error of $\hat{T}$ given in the equation below:
$SE(\hat{T}) \approx N s/\sqrt{n}$
 The point estimate of T is the point estimate of the mean multiplied by N, and the standard error of this point estimate is the standard error of the sample mean multiplied by N.
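Estimating a total is the mean interval scaled by N. In this sketch `total_confidence_interval` is a hypothetical helper, the three refund values are invented for illustration, and the caller supplies the t-multiple for the chosen confidence level (about 1.96 for a sample of 500, as in the IRS example).

```python
from math import sqrt
from statistics import mean, stdev

def total_confidence_interval(values, N, multiple):
    """Estimate a population total from a simple random sample of `values`.

    Point estimate: N * xbar.  Standard error: N * s / sqrt(n).
    `multiple` is the t- (or z-) multiple for the chosen confidence level.
    """
    n = len(values)
    t_hat = N * mean(values)                  # project the sample mean to the population
    se_total = N * stdev(values) / sqrt(n)    # SE of the mean, scaled by N
    return t_hat, se_total, (t_hat - multiple * se_total, t_hat + multiple * se_total)

# Hypothetical net refunds (dollars) from a tiny sample, projected to 1,000 taxpayers;
# 4.303 is the 95% t-multiple for n - 1 = 2 degrees of freedom
print(total_confidence_interval([10, 20, 30], N=1000, multiple=4.303))
```

Because both the point estimate and the standard error are multiplied by N, the interval for the total is just N times the interval for the mean.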
Example 8.3: Estimating Total Tax
Refunds
(slide 1 of 2)
 Objective: To find a 95% confidence interval for the total (net) amount the IRS must pay out to a set of 1,000,000 taxpayers.
 Solution: Data set is the refunds from a random sample
of 500 taxpayers.
 The file name: IRS Refunds
 Note that the sample mean is multiplied by the population size (1,000,000), and the standard error of the mean is also multiplied by the population size.
 The effect is to scale the usual confidence interval for
the mean by the population size
Example 8.3: Estimating Total Tax
Refunds
(slide 2 of 2)
Std Error of total: $SE(\hat{T}) = N s/\sqrt{n}$
Degrees of freedom = 499, t-multiple ≈ 1.965
Upper Limit = $346,057,164
Based on these calculations, the IRS can be 95% confident that it will need
to pay out somewhere between about 244 and 346 million dollars to these
1,000,000 taxpayers.
8-5 Confidence Interval for a Proportion
(slide 1 of 2)
 Surveys are often used to estimate proportions, so it is important to know how to form a confidence interval for any population proportion p.
 The basic procedure requires a point estimate, the standard error of this point estimate, and a multiple that depends on the confidence level: point estimate ± multiple × standard error
 It can be shown that for sufficiently large n, the sampling distribution of the sample proportion $\hat{p}$ is approximately normal with mean p and standard error $\sqrt{p(1 - p)/n}$.
 Standard error of sample proportion: $SE(\hat{p}) = \sqrt{\hat{p}(1 - \hat{p})/n}$
 Confidence interval for a proportion: $\hat{p} \pm z_{\alpha/2} \times SE(\hat{p})$
Confidence Interval for a Proportion
(slide 2 of 2)
 Confidence intervals for proportions are fairly wide unless n is quite large.
 To obtain a 95% confidence interval with a half-length of about 3 percentage points for a population proportion, even when the population consists of millions of people, only about 1000 people need to be sampled.
 When auditors are interested in how large the proportion of
errors might be, they usually calculate one-sided confidence
intervals for proportions.
 They automatically use lower limit pL = 0 and determine an upper
limit pU such that the 95% confidence interval is from 0 to pU.
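The “about 1000 people” figure comes from solving z × √(p(1 − p)/n) = B for n. This sketch is an assumption-laden illustration: it uses the conservative guess p = 0.5 (which maximizes p(1 − p)), Python’s `statistics.NormalDist` for the z-multiple, and a hypothetical helper name.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(half_length, conf=0.95, p_guess=0.5):
    """Smallest n with z * sqrt(p(1-p)/n) <= half_length; p = 0.5 is the worst case."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # 1.960 for 95%
    return ceil(z**2 * p_guess * (1 - p_guess) / half_length**2)

print(sample_size_for_proportion(0.03))   # 1068, regardless of population size
```

The population size never enters the formula, which is why a national poll and a city poll need roughly the same n.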
Example 8.4: Estimating the Response to a
New Sandwich (slide 1 of 4)
 Objective: To illustrate the procedure for finding a
confidence interval for the proportion of customers
who rate the new sandwich at least 6 on a 10-point
scale.
 The file name: Satisfaction Ratings
 Solution: A sample of 40 customers who ordered a
new sandwich were surveyed. Each was asked to
rate the sandwich on a scale of 1 to 10.
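n = 40, number of customers rating the sandwich at least 6 = 25
Sample proportion, $\hat{p} = 25/40 = 0.625$
Std Error of proportion, $SE(\hat{p}) = \sqrt{0.625 \times 0.375/40} \approx 0.0765$
For 95% confidence level, $z_{0.025} = 1.960$
Lower Limit = 0.475, Upper Limit = 0.775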
SPSS Steps:
 Analyze > Compare Means > One-Sample Proportions.
 Select Satisfaction and move it onto the Test Variable(s) field.
 Select Values under Define Success criteria and enter 6 7 8 9 10 (multiple values must be separated by spaces).
 Click on Confidence Intervals and select Wald.
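The Wald interval requested in SPSS is the normal-approximation formula p̂ ± z × √(p̂(1 − p̂)/n). This sketch assumes 25 of the 40 customers rated the sandwich at least 6, a count back-calculated from the reported upper limit of 0.775 rather than read from the data file; `wald_interval` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(successes, n, conf=0.95):
    """Normal-approximation (Wald) confidence interval for a population proportion."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # 1.960 for 95%
    return p_hat, se, (p_hat - z * se, p_hat + z * se)

# Assumed count: 25 of 40 customers rated the sandwich 6 or higher
p_hat, se, (lo, hi) = wald_interval(successes=25, n=40)
print(f"p-hat={p_hat:.3f}  SE={se:.4f}  CI=[{lo:.3f}, {hi:.3f}]")
```

The interval runs from about 0.475 to 0.775, wide because n is only 40.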
SPSS Steps & Output:
Example 8.5: Auditing for Price Errors (slide 1 of 2)
 Objective: To find the upper limit of a one-sided 95% confidence interval for the proportion of errors in the context of attribute sampling in auditing.
 Solution: An auditor checks 93 randomly sampled
invoices and finds that two of them include price
errors.
 If pU is the appropriate upper confidence limit, then pU satisfies the equation: $p_U = \hat{p} + z_{\alpha} \times SE(\hat{p})$, with α = 0.05
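Example 8.5: Auditing for Price Errors (slide 2 of 2)
n = 93, number of invoices with price errors = 2
Sample proportion, $\hat{p} = 2/93 \approx 0.0215$
Std Error of proportion, $SE(\hat{p}) = \sqrt{0.0215 \times 0.9785/93} \approx 0.0150$
For a one-sided 95% confidence level, $z_{0.05} = 1.645$
Upper Limit = 0.046248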
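The one-sided limit puts all of α in the upper tail, so the multiple is z_{0.05} ≈ 1.645 rather than 1.96. A minimal sketch with a hypothetical helper name, using the normal approximation as in the slide:

```python
from math import sqrt
from statistics import NormalDist

def upper_confidence_limit(errors, n, conf=0.95):
    """One-sided upper limit p_U = p_hat + z_alpha * SE(p_hat); lower limit taken as 0."""
    p_hat = errors / n
    se = sqrt(p_hat * (1 - p_hat) / n)
    z = NormalDist().inv_cdf(conf)      # all of alpha in one tail: about 1.645 for 95%
    return p_hat + z * se

print(round(upper_confidence_limit(errors=2, n=93), 6))   # 0.046248
```

The auditor can therefore be about 95% confident that no more than roughly 4.6% of all invoices contain price errors. With only two errors the normal approximation is rough, so treat the limit as approximate.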
8-7 Confidence Interval for the
Difference Between Means
 One of the most important applications of statistical
inference is the comparison of two population
means.
 There are many applications to business.
 For statistical reasons, independent samples must
be distinguished from paired samples.
8-7a Independent Samples
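 The appropriate sampling distribution of the difference between sample means is the t distribution with n1 + n2 – 2 degrees of freedom.
 Confidence interval for difference between means: $(\bar{x}_1 - \bar{x}_2) \pm t\text{-multiple} \times SE(\bar{X}_1 - \bar{X}_2)$
 Pooled estimate of common standard deviation: $s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$
 Standard error of difference between sample means: $SE(\bar{X}_1 - \bar{X}_2) = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$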
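The pooled procedure can be sketched in a few lines. This assumes SciPy is installed for the t quantile; `pooled_t_interval` is a hypothetical helper and the two small data lists are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance
from scipy.stats import t

def pooled_t_interval(x1, x2, conf=0.95):
    """Confidence interval for mu1 - mu2, assuming equal population standard deviations."""
    n1, n2 = len(x1), len(x2)
    # Pooled estimate of the common standard deviation
    sp = sqrt(((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2))
    se = sp * sqrt(1 / n1 + 1 / n2)                     # SE of xbar1 - xbar2
    multiple = t.ppf(1 - (1 - conf) / 2, n1 + n2 - 2)   # t with n1 + n2 - 2 df
    diff = mean(x1) - mean(x2)
    return diff - multiple * se, diff + multiple * se

# Hypothetical measurements for two small independent samples
print(pooled_t_interval([1, 2, 3], [4, 5, 6]))
```

Here the interval lies entirely below 0, so the first population mean would be judged smaller; an interval that straddles 0 would leave the sign of the difference undetermined.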
Example 8.6: Reliability of Treadmill
Motors at SureStep (slide 1 of 3)
 Objective: To find a confidence interval for the
difference between mean lifetimes of motors, and
to see how this confidence interval can help
SureStep choose the better supplier. The file name:
Treadmill Motors
 Solution: SureStep Company installs motors from
supplier A on 30 of its treadmills and motors from
supplier B on another 30 of its treadmills.
 It then runs these treadmills and records the
number of hours until the motor fails.
SPSS Steps:
 Analyze > Compare Means > Independent-Samples T Test.
 Select lifetimes and move it onto the Test Variable(s) field.
 Select Suppliers and transfer it to the box labeled Grouping Variable.
 Click on the <Define Groups> button. In the window displayed, enter values 1 and 2 for Suppliers 1 and 2, respectively.
SPSS Output:
With 95% confidence, the confidence interval for the difference between means extends from -47.549 to 233.815. Because this interval contains 0, the data do not clearly favor either supplier’s motors.
Example 8.7: Consumer Advocate (slide 1 of 3)
 A consumer advocate analyzes the nicotine content in two brands
of cigarettes.
 Construct the 95% confidence interval for the difference between the two population means.
 Nicotine content is assumed to be normally distributed.
 In addition, the population variances are unknown but assumed equal.
 The 95% confidence interval for the difference between the two
means ranges from -0.41 to -0.13.
SPSS Steps:
 Analyze > Compare Means > Summary Independent-Samples T Test.
 Complete the dialog box as shown. SPSS Output:
Equal-Variance Assumption
 This two-sample analysis makes the assumption that the standard deviations of the two populations are equal.
 How can you tell if they are equal, and what do you do if they are clearly not equal?
 A statistical test for equality of two population variances (Levene’s Test for Equality of Variances) is automatically included in the SPSS Independent-Samples T Test output.
 If there is reason to believe that the population variances are unequal, a slightly different procedure can be used to calculate a confidence interval for the difference between the means.
 The appropriate standard error of $\bar{X}_1 - \bar{X}_2$ is now: $SE(\bar{X}_1 - \bar{X}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
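The unequal-variance standard error needs a matching degrees-of-freedom value. The slides do not give one, so this sketch swaps in the Welch–Satterthwaite approximation, the usual choice for this case; it assumes SciPy is installed, and `welch_t_interval` and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from scipy.stats import t

def welch_t_interval(x1, x2, conf=0.95):
    """CI for mu1 - mu2 without assuming equal SDs (Welch-Satterthwaite df)."""
    v1, v2 = variance(x1) / len(x1), variance(x2) / len(x2)
    se = sqrt(v1 + v2)                                   # unpooled standard error
    df = (v1 + v2) ** 2 / (v1**2 / (len(x1) - 1) + v2**2 / (len(x2) - 1))
    multiple = t.ppf(1 - (1 - conf) / 2, df)             # df need not be an integer
    diff = mean(x1) - mean(x2)
    return diff - multiple * se, diff + multiple * se, df

# Hypothetical samples with very different spreads
print(welch_t_interval([10, 12, 11, 13, 9], [20, 5, 30, 15, 2]))
```

When the sample spreads are very different, the approximate df falls well below n1 + n2 − 2, widening the interval; when the spreads and sizes match, it reproduces the pooled result.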
8-7b Paired Samples
 When the samples to be compared are paired in some natural way, such as a pretest and posttest for each person, or husband-wife pairs, there is a more appropriate form of analysis than the two-sample procedure.
 The paired procedure itself is very straightforward:
 It does not directly analyze two separate variables (pretest
scores and posttest scores, for example); it analyzes their
differences.
 For each pair in the sample, calculate the difference
between the two scores for the pair.
 Then perform a one-sample analysis on these differences.
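The paired procedure is literally a one-sample t interval on the differences. This sketch assumes SciPy is installed; `paired_t_interval` is a hypothetical helper and the four rating pairs are invented, not the Stevens Honda-Buick data.

```python
from math import sqrt
from statistics import mean, stdev
from scipy.stats import t

def paired_t_interval(first, second, conf=0.95):
    """CI for the mean paired difference (first - second) via a one-sample t analysis."""
    diffs = [x - y for x, y in zip(first, second)]   # one difference per pair
    n = len(diffs)
    se = stdev(diffs) / sqrt(n)                      # SE of the mean difference
    multiple = t.ppf(1 - (1 - conf) / 2, n - 1)      # t with n - 1 df
    d_bar = mean(diffs)
    return d_bar, (d_bar - multiple * se, d_bar + multiple * se)

# Hypothetical husband and wife ratings for four couples
print(paired_t_interval([8, 7, 9, 6], [6, 6, 7, 5]))
```

Only the differences enter the calculation, so variation between couples (some rate everything high, some low) drops out.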
Example 8.8: Husband and Wife Reactions
to Presentations (slide 1 of 3)
 Objective: To use a paired-sample procedure to find
a confidence interval for the mean difference
between husbands’ and wives’ ratings of sales
presentations.
 The file name: Sales Presentation Ratings
 Solution: Husbands and wives in a random sample of couples are asked (separately) to rate the sales presentation at Stevens Honda-Buick automobile dealership on a scale of 1 to 10.
 The analysis using SPSS is shown on the following
slide.
 SPSS Steps:
 Analyze > Compare Means > Paired-Samples T Test.
 Select Husband as Variable 1 and Wife as Variable 2 and then click on the arrow button to move the selection pair into the Paired Variables box.
 The sample mean Husband minus Wife difference is 1.629, and a 95% confidence interval for this difference extends from 1.057 to 2.200.
SPSS Output:
8-9 Sample Size Selection
(slide 1 of 2)
 Confidence intervals are a function of three things:
 Data in the sample directly affect the length of a confidence
interval through their sample standard deviation(s).
 There are random sampling plans that can reduce the amount of
variability in the sample and hence reduce confidence interval length.
 Variance reduction is also possible in randomized experiments.
 As confidence level increases, the length of the confidence
interval increases as well.
 However, the confidence level is rarely used to control the length of the
confidence interval.
 Instead, confidence level choice is usually based on convention, and
95% is by far the most commonly used value.
 The most obvious way to control confidence interval length is to choose the sample size(s) appropriately.
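To see the effects of confidence level and sample size concretely, here is a small Python sketch. The function name `half_length` and the value sigma = 1.6 are illustrative assumptions. It uses the normal z-multiple from `statistics.NormalDist` (Python 3.8+) in place of a t-multiple, which is only an approximation for small samples, and it holds the standard deviation fixed so that only the confidence level and n vary.

```python
import math
from statistics import NormalDist


def half_length(sigma, n, conf):
    """Approximate CI half-length z * sigma / sqrt(n), using the normal
    z-multiple instead of the t-multiple (a large-sample simplification)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return z * sigma / math.sqrt(n)


sigma = 1.6  # illustrative standard deviation, chosen arbitrarily
# Higher confidence level -> longer interval (n fixed at 40)
for conf in (0.90, 0.95, 0.99):
    print(f"conf={conf:.2f}, n=40: {half_length(sigma, 40, conf):.3f}")
# Larger sample -> shorter interval (confidence fixed at 95%)
for n in (40, 160, 640):
    print(f"conf=0.95, n={n}: {half_length(sigma, n, 0.95):.3f}")
```

Because n enters only through the square root, quadrupling the sample size halves the half-length, which is why precision becomes expensive to buy with sample size alone.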
Sample Size Selection (slide 2 of 2)
 The goal is to make the length of a confidence interval sufficiently narrow.
 Each confidence interval discussed so far (with the
exception of the confidence interval for a standard
deviation) is a point estimate plus or minus some quantity.
 The “plus or minus” part is called the half-length of the interval.
 The usual approach is to specify the half-length B you would
like to obtain. Then you find the sample size(s) necessary to
achieve this half-length.
8-9a Sample Size Selection for Estimation of the Mean
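 The appropriate sample size for estimation of the mean can be calculated from the formula for the confidence interval for the mean, by setting its half-length equal to B and solving for n:
$$t_{\text{multiple}} \cdot \frac{s}{\sqrt{n}} = B \quad\Longrightarrow\quad n = \left(\frac{t_{\text{multiple}}\, s}{B}\right)^2$$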
 Unfortunately, sample size selection must be done before the sample is observed, so the value of s is not yet available.
 The usual solution is to replace s by some reasonable estimate σ_est of the population standard deviation, and to replace the t-multiple with the corresponding z-multiple from the standard normal distribution.
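 The resulting sample size formula, with n rounded up to the next whole number, is:
$$n = \left(\frac{z_{\text{multiple}}\, \sigma_{\text{est}}}{B}\right)^2$$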
Example 8.9: Sample Size Selection for Estimating Reaction
 Objective: To find the sample size of customers required to
achieve a sufficiently narrow confidence interval for the mean
rating of the new sandwich.
 Solution: The fast-food manager in Example 8.1 surveyed 40
customers, each of whom rated a new sandwich on a scale of 1 to
10.
 Based on the data, a 95% confidence interval for the mean rating of all potential customers extended from 5.739 to 6.761, with a half-length (margin of error) of 0.511.
 How large a sample would be needed to reduce this half-length to approximately 0.3? Use the sample standard deviation s = 1.597 as σ_est.
 With z = 1.96, the sample size formula gives n = (1.96 × 1.597 / 0.3)² ≈ 108.9, which is rounded up to 109.
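The calculation in Example 8.9 can be scripted as a minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`; `sample_size_mean` is a hypothetical helper name. The z-multiple is computed exactly rather than rounded to 1.96, which does not change the result here.

```python
import math
from statistics import NormalDist


def sample_size_mean(sigma_est, half_length, conf=0.95):
    """n = (z_multiple * sigma_est / B)^2, rounded up to a whole number."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for 95%
    return math.ceil((z * sigma_est / half_length) ** 2)


# Example 8.9: sigma_est = 1.597 (from the 40-customer sample), B = 0.3
print(sample_size_mean(1.597, 0.3))  # 109
```

Rounding up rather than to the nearest integer guarantees that the planned half-length is at most B, rather than slightly above it.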
8-9b Sample Size Selection for Estimation of Other Parameters
 The sample-size analysis for the mean carries over
with very few changes to other parameters.
 Sample size formula for estimating a proportion:
$$n = \frac{z_{\text{multiple}}^2\, p_{\text{est}}\,(1 - p_{\text{est}})}{B^2}$$
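 If no estimate p_est is available, use p_est = 0.5; because p(1 - p) is largest at p = 0.5, this gives the most conservative (largest) sample size.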
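 Sample size formula for estimating the difference between means, with equal sample sizes n in the two groups and a common standard deviation estimate σ_est:
$$n = \frac{2\, z_{\text{multiple}}^2\, \sigma_{\text{est}}^2}{B^2}$$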
Estimating the Proportion
 Objective: To find the sample size of customers required to
achieve a sufficiently narrow confidence interval for the
proportion of customers who have tried the new sandwich.
 Solution: The data set is the same as in Examples 8.1 and 8.9.
 Now the fast-food manager wants to estimate the proportion of customers who have tried the restaurant's new sandwich.
 She wants a 90% confidence interval for this proportion to have
half-length 0.05.
 If she is fairly sure that the proportion who have tried the new sandwich is around 0.3, she can use p_est = 0.3.
 With z = 1.645, the formula gives n = 1.645² (0.3)(0.7) / 0.05² ≈ 227.3, which is rounded up to 228.
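A minimal Python sketch of the proportion formula, assuming Python 3.8+ for `statistics.NormalDist`; `sample_size_proportion` is a hypothetical helper name, and its default p_est = 0.5 follows the conservative rule stated with the formula.

```python
import math
from statistics import NormalDist


def sample_size_proportion(half_length, conf=0.95, p_est=0.5):
    """n = z_multiple^2 * p_est * (1 - p_est) / B^2, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.645 for 90%
    return math.ceil(z ** 2 * p_est * (1 - p_est) / half_length ** 2)


# The manager's case: 90% confidence, B = 0.05, p_est = 0.3
print(sample_size_proportion(0.05, conf=0.90, p_est=0.3))  # 228
# Same target with the conservative default p_est = 0.5
print(sample_size_proportion(0.05, conf=0.90))  # 271
```

The comparison shows what the prior guess of 0.3 is worth: without it, the conservative choice p_est = 0.5 would require 271 customers instead of 228.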
Analyzing Customer Complaints
 Objective: To see how many employees in each experimental group must be
sampled to achieve a sufficiently narrow confidence interval for the difference
between the mean numbers of complaints.
 Solution: A customer service center has two types of employees: those who
have had a recent course in dealing with customers (but little actual experience)
and those with a lot of experience dealing with customers (but no formal
course).
 The company wants to estimate the difference between the two types of
employees in terms of the average number of customer complaints regarding
poor service in the last six months.
 The company plans to obtain information on a randomly selected sample of
each type of employee, using equal sample sizes.
 How many employees should be in each sample to achieve a 95% confidence interval with approximate half-length 2? Assume σ_est = 5.
 With z = 1.96, the formula gives n = 2 (1.96²)(5²) / 2² ≈ 48.0, which is rounded up to 49.
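A minimal Python sketch of the difference-of-means formula, assuming Python 3.8+ for `statistics.NormalDist`. The helper name `sample_size_diff_means` is hypothetical, and, like the formula, it assumes equal group sizes and one common σ_est for both employee types.

```python
import math
from statistics import NormalDist


def sample_size_diff_means(sigma_est, half_length, conf=0.95):
    """Per-group n = 2 * z_multiple^2 * sigma_est^2 / B^2, rounded up
    (equal sample sizes and a common sigma_est assumed)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for 95%
    return math.ceil(2 * z ** 2 * sigma_est ** 2 / half_length ** 2)


# Customer complaints: sigma_est = 5, B = 2, 95% confidence
print(sample_size_diff_means(5, 2))  # 49 employees of each type
```

The factor 2 reflects that the difference of two independent sample means has variance σ²/n + σ²/n, so each group needs twice the sample a single mean would need for the same half-length.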
