Topic 1 Section 2 CI Basic Concepts
Section 2. Basic concepts used in confidence interval estimation
8.1 Explain why confidence interval (CI) estimates for a
population parameter are used.
Provide appropriate interpretations of CI estimates from
standard statistical output.
Jaggia, Chapter 8, Sections 8.1, 8.2 and 8.3.
Questions this section will answer.
1. What is a confidence interval?
If feasible: why should we always report a confidence interval estimate as
well as a point estimate?
2. What’s confidence? Are there any limits on what number is ethical for
confidence?
3. What’s the basic structure of many confidence interval estimates?
4. How can we interpret a confidence interval estimate?
When a hurricane hits, people don’t just die on the day it hits: people also die afterwards because the electricity is out, there is no clean water, and so on. We can never know the true number of these excess deaths. This Boston Globe headline is misleading: it does not convey that 4,600 is not the true number of excess deaths but rather an estimate of the number of “excess deaths” caused by Hurricane Maria.
Precision and confidence interval width
We report a point estimate for a population parameter so that people have one number they can
use as an estimate for a population parameter.
We report a confidence interval estimate for the population parameter to show a range of plausible values, which conveys the precision of our estimate.
• An imprecise confidence interval estimator produces a wide range of plausible values:
[Figure: a wide interval stretching from the CI lower limit to the CI upper limit]
A wide confidence interval shows that the confidence interval (CI) estimate is not precise.
• A precise confidence interval estimator produces a narrow range of plausible values:
[Figure: a narrow interval stretching from the CI lower limit to the CI upper limit]
A narrow confidence interval shows that the confidence interval (CI) estimate is precise.
What is confidence?
Suppose that the true value of the population parameter, the population proportion, is 0.5. We can mimic the process of gathering a sample of size 100 and computing a confidence interval estimate (CI estimate) for a proportion using a simulator: http://www.rossmanchance.com/applets/ConfSim.html
Concept Check
A confidence interval estimate for the population proportion p is 0.25 through 0.75, with 95% confidence. This estimate is obviously not precise: the interval covers half of the possible range of a proportion, which runs from 0 to 1.
Optional: run your own Rossman Chance Confidence Interval Simulation
Go to the simulator http://www.rossmanchance.com/applets/ConfSim.html
The next slide shows the default settings, which work well for this exercise.
Hit “Sample” with “Intervals” set to 1 a few times. You’ll notice that the sample proportion produced varies randomly, and so does the corresponding CI estimate displayed.
Select the CI estimate you can currently see on the display to make its numbers show up. Does this interval contain the population parameter (does it cross the line marked 0.5)?
• If the CI estimation procedure has worked, the CI estimate contains the population parameter and is color coded green.
• If the CI estimation procedure has not worked, the CI estimate does not contain the population parameter and is color coded red.
Now change “Intervals” to 1000 and see what happens. Hit “Sort”. Are all the confidence intervals color coded green?
Look at the number beside “Running Total”, which gives the percentage of all the CI estimates produced so far that worked, that is, the CI estimates that contained the population proportion. Is it close to 95%?
This is because the confidence level is a setting. When we set a confidence level of 95%, we are stating that we want our procedure to work 95% of the time. We accept that 5% of the time our procedure is not going to work, due to random chance.
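The applet's “Running Total” can be mimicked in a few lines. This is a minimal sketch, assuming the applet uses the large-sample (Wald) interval 𝑝̅ ± z·√(𝑝̅(1 − 𝑝̅)/n); the function names `wald_ci` and `simulate_coverage` and the random seed are hypothetical choices for illustration, not the applet's own code. The sketch returns the share of intervals that worked as a fraction rather than a percentage.

```python
import random
from statistics import NormalDist


def wald_ci(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Large-sample (Wald) CI for a population proportion: p_bar +/- z * SE.
    Assumption: this is the interval method the applet uses."""
    p_bar = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin_of_error = z * (p_bar * (1 - p_bar) / n) ** 0.5
    return p_bar - margin_of_error, p_bar + margin_of_error


def simulate_coverage(p: float = 0.5, n: int = 100, intervals: int = 1000,
                      confidence: float = 0.95, seed: int = 1) -> float:
    """Draw `intervals` random samples of size n from a population with
    proportion p and return the fraction of CI estimates that contain p."""
    rng = random.Random(seed)
    worked = 0
    for _ in range(intervals):
        # One random sample: count how many of the n objects are "successes".
        successes = sum(rng.random() < p for _ in range(n))
        lower, upper = wald_ci(successes, n, confidence)
        if lower <= p <= upper:  # a "green" interval: the procedure worked
            worked += 1
    return worked / intervals


coverage = simulate_coverage()
print(f"Fraction of CI estimates that contain p = 0.5: {coverage:.3f}")
```

Rerunning with a different seed changes which intervals are red, but the fraction that worked stays close to the 0.95 confidence setting.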
Optional: Use the Rossman Chance app to get a feel for how CI estimation works
Simulating Confidence Intervals for Population Parameter:
http://www.rossmanchance.com/applets/ConfSim.html
[Figure: annotated screenshot of the simulator with its default settings]
• π is the setting for the population parameter, the population proportion.
• p̂ (“phat”) is the sample proportion; in our notation it is 𝑝̅.
• n is the number of objects in the sample: the sample size.
• Intervals is the number of different random samples to draw at once. Equivalently, it is the number of different CI estimates to produce.
• The vertical line marks the population parameter.
• The horizontal lines are CI estimates; one area of the display shows the confidence intervals produced.
• Another area turns into a sampling distribution for the sample proportion.
• In the screenshot, one sample of 100 objects has been drawn. This sample produced 𝑝̅ = 0.440 and a CI estimate of 0.343 through 0.537.
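The screenshot's numbers can be checked by hand. A minimal sketch, assuming the simulator uses the large-sample (Wald) interval for a proportion, 𝑝̅ ± z·√(𝑝̅(1 − 𝑝̅)/n) with z ≈ 1.96 for 95% confidence; `proportion_ci` is a hypothetical helper name. Under that assumption it reproduces the 0.343 through 0.537 shown above.

```python
from statistics import NormalDist


def proportion_ci(p_bar: float, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Large-sample (Wald) CI estimate for a population proportion (assumed method)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical value, about 1.96
    standard_error = (p_bar * (1 - p_bar) / n) ** 0.5
    margin_of_error = z * standard_error
    return p_bar - margin_of_error, p_bar + margin_of_error


lower, upper = proportion_ci(0.440, 100)
print(f"CI estimate: {lower:.3f} through {upper:.3f}")  # CI estimate: 0.343 through 0.537
```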
Confidence and alpha
Confidence (confidence level) measures the probability that the (appropriate)
confidence interval estimation procedure will generate a confidence interval that
contains the fixed population parameter of interest.
Interpreting a confidence of 95%:
“We designed a procedure to work with 95% confidence. If you take numerous random samples from a population and follow the procedure we have used, approximately 95% of the confidence intervals will contain the population parameter under investigation. Approximately 5% of the confidence intervals will not contain the population parameter.”
Margins of error and confidence interval estimates should always state the
level of confidence associated with them.
Concept Check
Informally, confidence is the probability that the inferential statistics procedure works.
𝛼 = 1 − confidence; 𝛼 is the probability that the inferential statistics procedure does not work.
• Confidence levels should be 90% or more; equivalently, 𝛼 should be at most 10%.
• Common confidence levels are 90%, 95%, and 99%.
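The relationship 𝛼 = 1 − confidence and the “at least 90%” guideline can be expressed directly. A minimal sketch; `alpha_from_confidence` is a hypothetical helper, and the 10% threshold is the course guideline stated above, not a library rule.

```python
def alpha_from_confidence(confidence: float) -> float:
    """alpha = 1 - confidence: the probability the CI procedure does not work."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    return 1 - confidence


for confidence in (0.90, 0.95, 0.99):
    alpha = alpha_from_confidence(confidence)
    # Guideline: confidence of at least 90%, i.e. alpha at most 10%.
    # Rounding removes floating-point noise such as 1 - 0.90 = 0.09999999999999998.
    meets_guideline = round(alpha, 10) <= 0.10
    print(f"confidence = {confidence:.0%}, alpha = {alpha:.0%}, meets guideline: {meets_guideline}")
```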
The basic structure of a confidence interval estimate
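Point estimate ± margin of error
[Figure: the point estimate sits at the center of the interval, between the CI lower limit and the CI upper limit]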
Be careful not to confuse the confidence interval width with the margin of error.
A common error is to confuse the margin of error with the width of the interval itself. The width of the interval is the distance from the lower to the upper limit. The margin of error is half the width of the confidence interval estimate: watch out for this in homework questions.
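The difference between width and margin of error is easy to check numerically. A minimal sketch, assuming a symmetric interval of the form point estimate ± margin of error; `ci_from_estimate` and `describe_ci` are hypothetical helper names. The usage line applies it to the Concept Check interval 0.25 through 0.75.

```python
def ci_from_estimate(point_estimate: float, margin_of_error: float) -> tuple[float, float]:
    """CI estimate = point estimate +/- margin of error."""
    return point_estimate - margin_of_error, point_estimate + margin_of_error


def describe_ci(lower: float, upper: float) -> dict[str, float]:
    """Recover the pieces of a symmetric CI estimate from its two limits."""
    width = upper - lower                 # distance from lower to upper limit
    margin_of_error = width / 2           # half the width, NOT the width
    point_estimate = (lower + upper) / 2  # the midpoint of the interval
    return {"width": width, "margin_of_error": margin_of_error,
            "point_estimate": point_estimate}


print(describe_ci(0.25, 0.75))
# {'width': 0.5, 'margin_of_error': 0.25, 'point_estimate': 0.5}
```

A width of 0.5 corresponds to a margin of error of only 0.25, so quoting the width as the margin of error would double it.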
Guess the answer
What proportion of US adults do you think did not use the internet in 2019?
Problem
Based on poll results, the Pew Research Center estimates that 10% of US adults did not use the internet in 2019. With 95% confidence, the margin of error on this poll is 2.9 percentage points.
Source: Pew Research Center, “10% of Americans do not use the internet”, April 22, 2019, Monica Anderson, Andrew Perrin, Jingjing Jiang and Madhumitha Kumar, https://www.pewresearch.org/fact-tank/2019/04/22/some-americans-dont-use-the-internet-who-are-they/; unweighted sample size 1,502; margin of error computed taking into account weighting.
a) The parameter of interest is the population proportion of US adults who did not
use the internet in 2019, p.
Problem: solving b)
Problem: analyze text
Problem: complete solution to b)
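The CI estimate implied by the poll's figures follows from the structure point estimate ± margin of error. A minimal sketch, assuming the interval is symmetric around the 10% point estimate; `ci_from_estimate` is a hypothetical helper, and the numbers are kept in percent so the margin of error of 2.9 percentage points can be added and subtracted directly.

```python
def ci_from_estimate(point_estimate: float, margin_of_error: float) -> tuple[float, float]:
    """CI estimate = point estimate +/- margin of error (both in the same units)."""
    return point_estimate - margin_of_error, point_estimate + margin_of_error


# Pew figures, in percent: point estimate 10%, margin of error 2.9 percentage points.
lower, upper = ci_from_estimate(10.0, 2.9)
print(f"95% CI estimate: {lower:.1f}% to {upper:.1f}%")  # 95% CI estimate: 7.1% to 12.9%
```

Because the margin of error is stated in percentage points, it is added to and subtracted from 10%; it is not 2.9% of 10%.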
But put yourself in the shoes of someone who has not taken a statistics class: you would not understand these statements. They are not written in business-friendly English. We need to be able to interpret our results for a business audience.
The problem is that it can sometimes be tricky to interpret statistical inference in standard business English. A business-friendly interpretation also has to be correct statistics English, and statistics English is an incredibly pedantic foreign language.
The solution is to use boilerplate language to interpret results whilst learning inference.
Boilerplate for interpreting a confidence interval estimate
Fill in the blanks:
With confidence of _____________ we estimate that the ______________________________ lies between ______________ and ______________.
• First blank: always include the confidence level.
• Second blank: describe the population parameter using business-friendly language if possible; sometimes it is not possible. Example: for a business-friendly report, “average chocolate consumption by all Americans”; for a DS302 standard homework problem, “population mean chocolate consumption”.
• Last blanks: the CI estimate numbers, with the relevant unit and rounding.
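Because the boilerplate is a fixed sentence with blanks, it can be filled in mechanically. A minimal sketch; `interpret_ci` is a hypothetical helper, and formatting the limits as percentages is an assumption that suits proportions. For a mean, such as chocolate consumption, you would format the limits with their own unit instead.

```python
def interpret_ci(confidence: float, parameter_description: str,
                 lower: float, upper: float, decimals: int = 1) -> str:
    """Fill in the boilerplate sentence for a CI estimate of a proportion,
    reporting the confidence level and the CI limits as percentages."""
    return (f"With confidence of {confidence:.0%} we estimate that the "
            f"{parameter_description} lies between {lower:.{decimals}%} "
            f"and {upper:.{decimals}%}.")


print(interpret_ci(0.95,
                   "proportion of all US adults who did not use the internet in 2019",
                   0.071, 0.129))
```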
Problem c): interpret the confidence interval
Include the confidence level.
Filled-in blanks are underlined.
Interpreting confidence: the common error everyone makes first time round
• Interpreting confidence (the confidence level) is not important for business practice, though understanding what confidence means is. You will be asked to interpret confidence in DS302 homework and exams.
• Reminder: confidence is how likely it is that the inference procedure works.
o For confidence interval estimation, the procedure works if the confidence interval contains the value of the population parameter.
o So confidence in a CI estimate is how likely it is that the procedure “compute a confidence interval estimate” will produce an interval that contains the fixed population parameter.
• Reminder: in inferential statistics, population parameters are usually unknown numbers. However, a population parameter is not a random number: the parameter is “fixed”. When interpreting confidence for confidence interval estimates, the common error that everyone initially makes is to imply that population parameters are “random”.
• Solution to the common error: use the phrase “fixed, unknown population parameter”. This is not business-friendly language, but it will be your homework safety net. Using this phrase makes it clear that you know that population parameters are not random numbers.
Problem d): interpret confidence
A confidence of 95% means that if we took numerous random samples of data and computed a confidence interval estimate for the population proportion from each sample, then approximately 95% of the CI estimates produced would contain the fixed, unknown population parameter 𝑝, the proportion of American adults who did not use the internet in 2019, and approximately 5% of the CI estimates produced would not contain 𝑝.
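The interpretation in d) can be acted out by simulation, which makes the roles clear: 𝑝 stays fixed, and the intervals are what vary. A minimal sketch under loudly stated assumptions: the value `TRUE_P = 0.10` is HYPOTHETICAL (in reality 𝑝 is unknown), samples are drawn by simple random sampling with n = 1,502 (Pew's weighting is ignored, so the margins of error here will not match the reported 2.9 points), and each interval is the large-sample (Wald) interval.

```python
import random
from statistics import NormalDist

TRUE_P = 0.10  # HYPOTHETICAL value: in reality p is fixed but unknown
N = 1502       # sample size; simple random sampling assumed, weighting ignored
Z = NormalDist().inv_cdf(0.975)  # critical value for 95% confidence


def one_ci_estimate(rng: random.Random) -> tuple[float, float]:
    """Draw one random sample and return its 95% (Wald) CI estimate."""
    successes = sum(rng.random() < TRUE_P for _ in range(N))
    p_bar = successes / N
    moe = Z * (p_bar * (1 - p_bar) / N) ** 0.5
    return p_bar - moe, p_bar + moe


rng = random.Random(2019)
estimates = [one_ci_estimate(rng) for _ in range(1000)]

# The parameter never changes; only the intervals vary from sample to sample.
for lower, upper in estimates[:3]:
    print(f"{lower:.3f} to {upper:.3f}")

share_containing_p = sum(lower <= TRUE_P <= upper for lower, upper in estimates) / len(estimates)
print(f"Share of CI estimates containing p: {share_containing_p:.3f}")  # expected near 0.95
```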