
Chapter 11

Confidence Intervals for Proportions


Ch. 11: Confidence Intervals for Proportions

Learning Objectives

1) Calculate a confidence interval for a proportion

2) Trade off certainty and precision

3) Choose the appropriate size for a sample

4) Calculate a confidence interval for the difference between two proportions
Ch. 11: Confidence Intervals for Proportions

• We have learned to use the sample we have at hand to say something about the world at large.
• This process is called inference.
• Inference is based on our understanding of sampling models.
• Inference is a key contribution of statistics to management decision making.
• There are important assumptions and conditions we must
check before using any statistical inference procedure.
Tracking Changes in Consumer Confidence

• Consumer confidence, tracked over time, can be an indicator of economic conditions.
• In order to plan their inventory and production needs, businesses use a variety of forecasts about the economy.
• Businesses rely on several sets of indicators to gauge demand for their products; one of them is consumer confidence over time.

2021-11-03
11.1 A Confidence Interval
Example:
• The Gallup Poll periodically asks a random sample of adults whether they think economic conditions are getting better.
• They found that 153 of 1023 respondents thought economic conditions were getting better, a sample proportion of p̂ = 153/1023 ≈ 15.0%.
We’d like to use this sample proportion to say something about the proportion, p, of the entire population that thinks economic conditions are getting better.
• From Chapter 10, we know that it is not surprising that two random samples give slightly different results.
• We’d like to say something, not about different random samples, but about the proportion of all adults who think economic conditions in the country are getting better.
• That is, we want to use sample data to draw conclusions about the population proportion.
• We focus here on survey questions with only two possible answers, such as Yes/No.
Example (continued): We know that our sampling distribution
model is centered at the true proportion, p, and we know the
standard deviation of the sampling distribution is given by the
formula below.

$$SD(\hat{p}) = \sqrt{\frac{pq}{n}}, \quad \text{where } q = 1 - p$$
We do not know p, so we cannot find the true standard
deviation of the sampling distribution model.
Standard Error (SE)
• Often we know only the observed proportion, p̂, or the observed sample standard deviation, s.
• So we use what we know: we estimate.
• That may not seem like a big deal, but it gets a special name.
• Whenever we estimate the standard deviation of a sampling distribution, we call the estimate the standard error (SE).
• For a sample proportion p̂, the standard error is
$$SE(\hat{p}) = \sqrt{\frac{\hat{p}\hat{q}}{n}}, \quad \text{where } \hat{q} = 1 - \hat{p}$$

We also know from the Central Limit Theorem that the shape of the sampling distribution is approximately Normal, and we can use p̂ to find the standard error.
$$SE(\hat{p}) = \sqrt{\frac{\hat{p}\hat{q}}{n}} = \sqrt{\frac{(0.15)(1 - 0.15)}{1023}} \approx 0.011$$
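The same arithmetic can be reproduced in Python as a quick check. This is a minimal sketch using only the standard library; the counts 153 and 1023 come from the Gallup example, and the helper name `standard_error` is our own, chosen for illustration.

```python
import math


def standard_error(p_hat: float, n: int) -> float:
    """Estimated SD of the sampling distribution of p-hat: sqrt(p_hat * q_hat / n)."""
    q_hat = 1 - p_hat
    return math.sqrt(p_hat * q_hat / n)


successes, n = 153, 1023          # Gallup sample: 153 of 1023 said "getting better"
p_hat = successes / n             # sample proportion, about 0.1496
se = standard_error(p_hat, n)     # about 0.0112

print(f"p_hat = {p_hat:.3f}, SE = {se:.3f}")   # p_hat = 0.150, SE = 0.011
```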
Example (continued): The sampling distribution model for p̂ is Normal with mean p and standard deviation estimated to be 0.011.

Because the distribution is Normal, we expect that about 95% of all samples of 1023 U.S. adults would have had sample proportions within two SEs of p. That is, we are 95% sure that p̂ is within 2 × (0.011) = 0.022 of p.
What Can We Say about a Proportion?

Here’s what we would like to be able to say:

1) “15.0% of all adults thought the economy was improving.” There is no way to be sure that the population proportion is the same as the sample proportion.

2) “It is probably true that 15.0% of all adults thought the economy was improving.” We can be pretty certain that whatever the true proportion is, it’s probably not exactly 15.0%.

3) “We don’t know the exact proportion of adults who thought the economy was improving but we know it is between 12.8% and 17.2%.” We can’t know for sure that the true proportion is in this interval.

4) “We don’t know the exact proportion of adults who thought the economy was improving but the interval from 12.8% to 17.2% probably contains the true proportion.” This is close to correct, but what is meant by probably?

An appropriate interpretation of our confidence interval would be, “We are 95% confident that between 12.8% and 17.2% of adults thought the economy was improving.”

The confidence interval calculated and interpreted here is an example of a one-proportion z-interval.
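A one-proportion z-interval can be sketched as a small function. This is an illustration, not a library routine: the name `one_proportion_z_interval` is hypothetical, and the default `z = 2.0` follows the "two SEs" rule used in this section. Note that the slides round p̂ to 0.15 and SE to 0.011 before multiplying, which gives 12.8% to 17.2%; the unrounded values give roughly 12.7% to 17.2%.

```python
import math


def one_proportion_z_interval(successes: int, n: int, z: float = 2.0) -> tuple:
    """Return (lower, upper) for p_hat +/- z * SE(p_hat)."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


# Gallup example: 153 of 1023 thought the economy was improving.
lower, upper = one_proportion_z_interval(153, 1023)
print(f"({lower:.3f}, {upper:.3f})")   # (0.127, 0.172)
```

Passing a different `z` (for example 1.96) changes only the width of the interval, not its center.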

What Does “95% Confidence” Really Mean?

What does it mean when we say we have 95% confidence that our interval contains the true proportion?

Our uncertainty is about whether the particular sample we have at hand is one of the successful ones or one of the 5% that fail to produce an interval that captures the true value.

Had other pollsters collected their own samples, their confidence intervals would have been centered at the proportions they observed.
[Figure: confidence intervals produced by simulating 20 samples.]
In the figure, the purple dots are the simulated proportions of adults who thought the economy was improving, the orange segments show each sample’s confidence interval, and the green line represents the true proportion of the entire population. Note: Not all confidence intervals capture the true proportion.
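The simulation behind a figure like this can be sketched in a few lines. It rests on assumptions we chose for illustration: the "true" proportion is set to 0.15 (in practice it is unknown), each sample has 1023 respondents as in the Gallup example, and each interval is p̂ ± 2 SE as in this section. The function name `simulate_intervals` is hypothetical.

```python
import math
import random


def simulate_intervals(p_true: float, n: int, reps: int, z: float = 2.0, seed: int = 1) -> list:
    """Draw `reps` random samples of size n and build p_hat +/- z * SE for each."""
    rng = random.Random(seed)
    intervals = []
    for _ in range(reps):
        # Each respondent independently answers "yes" with probability p_true.
        successes = sum(rng.random() < p_true for _ in range(n))
        p_hat = successes / n
        se = math.sqrt(p_hat * (1 - p_hat) / n)
        intervals.append((p_hat - z * se, p_hat + z * se))
    return intervals


P_TRUE = 0.15  # assumed "true" population proportion, for illustration only
intervals = simulate_intervals(P_TRUE, n=1023, reps=20)
captured = sum(lo <= P_TRUE <= hi for lo, hi in intervals)
print(f"{captured} of {len(intervals)} intervals capture p = {P_TRUE}")
```

Changing `seed` gives a different set of 20 intervals; over many repetitions, roughly 95% of them should capture `P_TRUE`, and any particular run may miss a few.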
11.2 Margin of Error: Certainty vs. Precision
Our confidence interval can be expressed as
$$\hat{p} \pm 2\,SE(\hat{p})$$
The extent of that interval on either side of p̂ is called the margin of error (ME). The general confidence interval can now be expressed in terms of the ME:
$$\text{estimate} \pm ME$$
The more confident we want to be, the larger the margin of
error must be.

We can be 100% confident that any proportion is between 0% and 100%, but we can’t be very confident that the proportion is between 14.98% and 15.02%.

Every confidence interval is a balance between certainty and precision.

Fortunately, we can usually be both sufficiently certain and sufficiently precise to make useful statements.
11.3 Critical Values
To change the confidence level, we’ll need to change the number of SEs to correspond to the new level.

For any confidence level, the number of SEs we must stretch out on either side of p̂ is called the critical value.

Because a critical value is based on the Normal model, we denote it z*.
A 90% confidence interval has a critical value of 1.645. That is, in a Normal model, 90% of the values lie within 1.645 standard deviations of the mean.
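Critical values can be computed rather than looked up in a table. A sketch assuming Python 3.8+ for `statistics.NormalDist`; the helper `critical_value` is our own name, and the margins use the Gallup numbers (153 of 1023) from earlier in the chapter. It also shows the certainty/precision trade-off: a higher confidence level gives a larger z* and therefore a wider margin of error.

```python
import math
from statistics import NormalDist


def critical_value(confidence: float) -> float:
    """z* such that the central `confidence` share of a standard Normal lies within +/- z*."""
    return NormalDist().inv_cdf((1 + confidence) / 2)


p_hat, n = 153 / 1023, 1023
se = math.sqrt(p_hat * (1 - p_hat) / n)
for level in (0.90, 0.95, 0.99):
    z_star = critical_value(level)
    # Margin of error = z* times the standard error.
    print(f"{level:.0%}: z* = {z_star:.3f}, ME = {z_star * se:.4f}")
```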
Class example
11.4 Assumptions and Conditions
Is using a Normal model for the sampling distribution
appropriate?

Are the assumptions reasonable?

We must check our assumptions and the corresponding conditions before creating a confidence interval about a proportion.
Independence Assumption

Is there any reason to believe that the data values somehow affect each other?

• Randomization Condition: Proper randomization can help ensure independence.

• 10% Condition: If the sample exceeds 10% of the population, the probability of a success changes so much during the sampling that a Normal model may no longer be appropriate.
Sample Size Assumption

The sample size must be large enough for the Normal sampling model to be appropriate.

• Success/Failure Condition: We must expect our sample to contain at least 10 “successes” and at least 10 “failures”. So we check that both $n\hat{p} \ge 10$ and $n\hat{q} \ge 10$.
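Only the count-based conditions can be checked mechanically; randomization has to be judged from how the data were collected. A minimal sketch with a hypothetical helper `check_conditions`. Since n·p̂ is just the number of successes and n·q̂ the number of failures, it compares counts directly. The campus size of 30,000 is the figure from the Super Bowl example in Section 11.6.

```python
from typing import Optional


def check_conditions(successes: int, n: int, population: Optional[int] = None) -> dict:
    """Check the Success/Failure and (optionally) 10% conditions for a one-proportion interval."""
    failures = n - successes
    results = {
        # n * p_hat equals the success count and n * q_hat the failure count.
        "success_failure": successes >= 10 and failures >= 10,
    }
    if population is not None:
        # The sample should be no more than 10% of the population.
        results["ten_percent"] = n <= 0.10 * population
    return results


print(check_conditions(153, 1023))                  # Gallup poll
print(check_conditions(25, 25, population=30_000))  # Super Bowl survey
```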
11.5 Choosing the Sample Size
To get a narrower confidence interval without giving up
confidence, we must choose a larger sample.

Example: Suppose a company wants to offer a new service and wants to estimate, to within 3%, the proportion of customers who are likely to purchase this new service with 95% confidence. How large a sample do they need?
To answer this question, we look at the margin of error.
$$ME = z^* \sqrt{\frac{\hat{p}\hat{q}}{n}} \quad\Rightarrow\quad 0.03 = 1.96\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
We see that this question can’t be answered because there are two unknown values, p̂ and n.
Example (continued): We proceed by guessing the worst-case scenario for p̂. We guess p̂ is 0.50 because this makes the SD (and therefore n) the largest.
We may now compute n.
$$0.03 = 1.96\sqrt{\frac{(0.5)(0.5)}{n}} \quad\Rightarrow\quad n = \left(\frac{1.96}{0.03}\right)^2 (0.5)(0.5) \approx 1067.1$$
We can conclude that the company will need at least 1068 respondents to keep the margin of error as small as 3% with confidence level 95%.
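Solving the margin-of-error equation for n and rounding up can be sketched as follows. The function name `required_sample_size` is hypothetical; `p_guess = 0.5` is the worst-case guess used above, and `math.ceil` rounds up because a fractional respondent is not possible and rounding down would leave the ME slightly above the target.

```python
import math


def required_sample_size(me: float, z_star: float = 1.96, p_guess: float = 0.5) -> int:
    """Smallest n with z* * sqrt(p(1-p)/n) <= me, i.e. n >= z*^2 * p(1-p) / me^2, rounded up."""
    n_exact = (z_star ** 2) * p_guess * (1 - p_guess) / me ** 2
    return math.ceil(n_exact)


print(required_sample_size(0.03))   # 1068
```

A better prior guess for p̂ (anything other than 0.5) can only lower the required n, which is why 0.5 is the safe default.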
Class Example
Usually a margin of error of 5% or less is acceptable.

However, to cut the margin of error in half, you will have to quadruple the sample size.
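Because ME = z*·√(p̂q̂/n), the margin of error is proportional to 1/√n, so multiplying n by 4 divides the ME by √4 = 2. A short check, with a hypothetical helper `margin_of_error` and the worst-case p̂ = 0.5:

```python
import math


def margin_of_error(n: int, p_hat: float = 0.5, z_star: float = 1.96) -> float:
    """ME = z* * sqrt(p_hat * q_hat / n)."""
    return z_star * math.sqrt(p_hat * (1 - p_hat) / n)


for n in (1068, 4 * 1068):
    print(n, round(margin_of_error(n), 4))

# Quadrupling n halves the margin of error.
ratio = margin_of_error(1068) / margin_of_error(4 * 1068)
print(round(ratio, 6))
```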

The sample size in a survey is the number of respondents, not the number of questionnaires sent or phone numbers dialed, so increasing the sample size can dramatically increase the cost and time needed to collect the data.
*11.6 A Confidence Interval for Small Samples
• Suppose a student in an advertising class is studying the impact of ads placed during the Super Bowl.
• She wants to know what proportion of students on campus watched it.
• She takes a random sample of 25 students and finds that all 25 watched the Super Bowl.
• So the sample proportion is 100%.
• A 95% confidence interval is (1.0, 1.0).
• Does she really believe that all 30,000 students on her campus watched the Super Bowl?
• Probably not.
• She realizes that the Success/Failure condition is severely violated because there are no failures.
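To see why the ordinary interval collapses, here is the one-proportion z-interval computed directly for 25 successes out of 25. A minimal sketch; the variable names are ours. Because q̂ = 0, the standard error is 0 and the interval has zero width, even though the sample is small.

```python
import math

y, n = 25, 25                             # all 25 sampled students watched
p_hat = y / n                             # 1.0
se = math.sqrt(p_hat * (1 - p_hat) / n)   # 0.0, because q_hat = 0
interval = (p_hat - 1.96 * se, p_hat + 1.96 * se)
print(interval)                           # (1.0, 1.0)
```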
When the Success/Failure condition fails, we make a simple adjustment to the calculation that lets us make a confidence interval anyway.

We add four synthetic observations, two to the successes and two to the failures, and use the adjusted proportion, where y is the number of successes:
$$\tilde{p} = \frac{y+2}{\tilde{n}}, \quad \text{where } \tilde{n} = n + 4$$
Including the synthetic observations leads to a new adjusted interval.
$$\tilde{p} \pm z^* \sqrt{\frac{\tilde{p}(1-\tilde{p})}{\tilde{n}}}$$
This form gives better performance for proportions near zero or one. It also has the advantage that we do not need to check the Success/Failure condition.
Example: A student studying the impact of Super Bowl ads wants to know what proportion of students on campus watched the Super Bowl.

A random sample of 25 students reveals that all 25 watched the Super Bowl.

This gives a p̂ of 100% and a 95% confidence interval of (1.0, 1.0).

Can she conclude that every student on her campus watched the Super Bowl?
Example (continued): Obviously the Success/Failure condition is violated, but she can use synthetic observations. Adding two successes and two failures, she can calculate p̃ and the standard error.
$$\tilde{p} = \frac{27}{29} = 0.931, \qquad SE(\tilde{p}) = \sqrt{\frac{(0.931)(0.069)}{29}} = 0.047$$
She can find the 95% confidence interval:
$$0.931 \pm 1.96(0.047) = (0.839,\ 1.023)$$
She can conclude with 95% confidence that between 83.9% and 102.3% (in practice 100%, since a proportion cannot exceed 100%) of all students watched the Super Bowl.
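The adjusted ("plus-four") interval can be sketched as a small function. The name `plus_four_interval` is hypothetical. Clipping the bounds to [0, 1] is our presentation choice, mirroring the "(in practice 100%)" remark; without clipping, the upper bound is the 1.023 computed in the example.

```python
import math


def plus_four_interval(y: int, n: int, z_star: float = 1.96) -> tuple:
    """Add 2 successes and 2 failures, then p~ +/- z* * sqrt(p~(1 - p~) / n~), clipped to [0, 1]."""
    n_adj = n + 4
    p_adj = (y + 2) / n_adj
    se_adj = math.sqrt(p_adj * (1 - p_adj) / n_adj)
    lower = p_adj - z_star * se_adj
    upper = p_adj + z_star * se_adj
    # A proportion cannot leave [0, 1], so clip the reported bounds.
    return max(0.0, lower), min(1.0, upper)


lo, hi = plus_four_interval(25, 25)   # Super Bowl sample: 25 of 25 watched
print(f"({lo:.3f}, {hi:.3f})")
```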
11.7 Confidence Interval for the Difference Between Two Proportions
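Use the formula below to calculate the confidence interval for the difference between two proportions:
$$(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$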
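A sketch of the two-proportion z-interval in code. The function name `two_proportion_interval` is hypothetical, and the counts 150 of 500 and 100 of 400 are made up for illustration; this section gives no data of its own.

```python
import math


def two_proportion_interval(y1: int, n1: int, y2: int, n2: int, z_star: float = 1.96) -> tuple:
    """(p1_hat - p2_hat) +/- z* * sqrt(p1 q1 / n1 + p2 q2 / n2)."""
    p1, p2 = y1 / n1, y2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se


# Hypothetical counts, for illustration only (not from the chapter).
lower, upper = two_proportion_interval(150, 500, 100, 400)
print(f"({lower:.3f}, {upper:.3f})")   # (-0.008, 0.108)
```

Here the interval includes 0, so these hypothetical data would be consistent with no difference between the two population proportions.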
What Can Go Wrong?
• Don’t suggest that the parameter varies. The population parameter is fixed; it is the interval that varies from sample to sample.

• Don’t claim that other samples will agree with yours. There is nothing special about your sample; it doesn’t set the standard for other samples.

• Don’t be certain about the parameter. Do not assert that the population parameter cannot be outside an interval.
• Don’t forget: It’s about the parameter. We are interested in
p, not p̂.

• Don’t claim to know too much.

• Do take responsibility. You must accept the responsibility and consequences of the fact that not all the intervals you compute will capture the true population value.
Violations of Assumptions

• Watch out for biased sampling. Don’t forget the sources of bias in surveys.

• Think about independence. It is tough to check the assumption that values in a sample are mutually independent, but it pays to think about it.

• Be careful of sample size. The validity of the confidence interval for proportions may be affected by sample size.

• Don’t think you need a larger sample just because you have a larger population.
What Have We Learned?
• There are important assumptions and conditions we must
check before using any statistical inference procedure.

• Interpret a confidence interval as a statement about the entire population from which we took our random sample.

• Our best estimate of the true population proportion is the proportion we observed in the sample, so we center our confidence interval there.

• Samples don’t represent the population perfectly, so we create our interval with a margin of error.

• Adjust the calculation of the confidence interval if the sample is very small.

• For a given sample size, the higher the level of confidence we want, the wider our confidence interval becomes.

• For a given level of confidence, the larger the sample size we have, the narrower our confidence interval can be.

• Compare two proportions by calculating the confidence interval for the difference between them.
