
Topic: Confidence Interval Estimation

Section 2.
Basic concepts used in
confidence interval
estimation

8.1 Explain why confidence interval (CI) estimates for a
population parameter are used.
Provide appropriate interpretations of CI estimates from
standard statistical output.
Jaggia, Chapter 8, Sections 8.1, 8.2, and 8.3.

Questions this section will answer
1. What is a confidence interval?
If feasible, why should we always report a confidence interval estimate as well as a point estimate?
2. What is confidence? Are there limits on what confidence level is ethical?
3. What is the basic structure of many confidence interval estimates?
4. How can we interpret a confidence interval estimate?

What is a confidence interval?

• Confidence interval: provides a range of values that, with a certain level of confidence, may contain the population parameter of interest.
o Also referred to as an interval estimate or CI estimate.

Unofficially, a confidence interval estimate is a plausible range of values for a population parameter, associated with a certain level of confidence.

BUSINESS STATISTICS | Jaggia, Kelly


When a hurricane hits, people don’t just die on the day it strikes: people die afterwards because the electricity is out, there’s no clean water, etc.
We can never know the true number of these excess deaths. This Boston Globe headline is misleading: it does not convey that 4,600 is not the true value for excess deaths, but rather an estimate of the number of “excess deaths” caused by Hurricane Maria.

Formally, 4,600 is a point estimate for the parameter “number of excess deaths caused by Hurricane Maria” in Puerto Rico.

Headline and picture source: Boston Globe, https://www.bostonglobe.com/news/nation/2018/05/29/harvard-study-estimates-thousands-died-puerto-rico-due-hurricane-maria/3Gdc34Fh5iSEPTmlXzCaVP/story.html

Look at the confidence interval (CI) estimates in the quotes below: “plausible ranges” for an estimate of excess deaths.
Which estimate sounds more precise?
1) “This rate yielded a total of 4645 excess deaths during this period (95% CI, 793 to 8,498)”
Quote from “Mortality in Puerto Rico after Hurricane Maria”, New England Journal of Medicine 2018; 379:162-170

After more reliable data became available:
2) “2,975 excess deaths from its landfall in September 2017 to February 2018......
The 95 percent confidence interval........... is from 2,658 to 3,290 dead.”
Source: “A Year After Hurricane Maria, Puerto Rico Finally Knows How Many People Died”, Vann R. Newkirk II, Aug 28, 2018, The Atlantic,
https://www.theatlantic.com/politics/archive/2018/08/puerto-rico-death-toll-hurricane-maria/568822/

We report confidence interval estimates (CI estimates) because they convey how “precise” our estimate of a parameter is.

Precision and confidence interval width

We report a point estimate for a population parameter so that people have one number they can use as an estimate for that parameter.
We report a confidence interval estimate for the population parameter to show a range of plausible values, which conveys the precision of our estimate.
• An imprecise confidence interval estimator produces a wide range of plausible values:
[Figure: a wide interval running from the CI Lower Limit to the CI Upper Limit]
o A wide confidence interval shows the confidence interval (CI) estimate is not precise.
• A precise confidence interval estimator produces a narrow range of plausible values:
[Figure: a narrow interval running from the CI Lower Limit to the CI Upper Limit]
o A narrow confidence interval shows the confidence interval (CI) estimate is precise.
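One way to make "wide" and "narrow" concrete is to compute the width of each Hurricane Maria CI estimate quoted earlier in this section. The sketch below is illustrative only; the helper name `ci_width` is made up for this example and does not come from the slides or the textbook.

```python
# Hypothetical helper (not from the slides): width of a CI estimate.
def ci_width(lower, upper):
    """Distance from the CI lower limit to the CI upper limit."""
    return upper - lower

# 95% CI estimates of excess deaths quoted earlier in this section.
nejm_width = ci_width(793, 8498)        # New England Journal of Medicine estimate
atlantic_width = ci_width(2658, 3290)   # later estimate reported in The Atlantic

print(nejm_width)                   # 7705
print(atlantic_width)               # 632
print(atlantic_width < nejm_width)  # True
```

The second interval is much narrower, which is why it reads as the more precise estimate.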

What is confidence?
Suppose that the true value of the population proportion is 0.5. We can mimic the process of gathering a sample of size 100 and computing a confidence interval estimate (CI estimate) for a proportion using a simulator: http://www.rossmanchance.com/applets/ConfSim.html

[Figure: simulator output showing 10 CI estimates around a vertical line at the population proportion p = 0.5]
10 different random samples are drawn from the same population, which has a true population proportion of 0.5.
• One of the samples of data produces a CI estimate, color coded red, which does not contain the population proportion 0.5.
• The other nine samples of data produce CI estimates which include the population proportion 0.5 in their plausible range of values.

Confidence measures the probability that the (appropriate) confidence interval estimation procedure will generate a confidence interval that contains the fixed population parameter of interest.

Concept Check
A confidence interval estimate for the population proportion p is 0.25 through 0.75, with 95% confidence. This estimate is obviously not precise.

[Figure: interval from CI Lower Limit, 0.25, to CI Upper Limit, 0.75. Plausible values for p are 0.25 through 0.75 (95% confidence)]

Do confidence interval estimates always contain the population parameter of interest (in this example, the population proportion p)?

No. Different random samples produce different CI estimates for a proportion, and sometimes these CI estimates may not even contain the true value of the parameter of interest, in this case the population proportion p.

Optional: informally, what is confidence?


A confidence level is the probability that a statistical inference procedure works.
But what does the “confidence” in confidence interval estimates mean?
Personally running a simulation really helps with understanding confidence.
We produce confidence interval estimates (CI estimates) because we cannot find out the true value of a population parameter. This means that we can never know whether a specific confidence interval estimate actually contains the population parameter or not.
A confidence interval estimate produces a plausible range of numbers for a population parameter, but this plausible range is an estimate, an educated guess, based on a sample of data. Different samples of data will produce different confidence interval (CI) estimates for the same parameter (you saw this in the Hurricane Maria example).
And sometimes our CI estimates are going to be wrong due to random chance: random samples sometimes produce weird results. Educated guesses are still guesses and can be far from the truth.
Intuitively, a confidence level is the probability that a statistical inference procedure works, that it “does what it says on the tin”. A confidence interval procedure works if the interval produced contains the population parameter of interest. So in confidence interval estimation, confidence means the probability of the CI estimation procedure (CI estimator) producing a confidence interval that contains the population parameter of interest. One way to think about a confidence level of 95% in a confidence interval estimate is as follows:
“This CI estimate has been produced using a procedure that works 95% of the time, but will not work 5% of the time due to random factors; sorry, but I cannot tell you whether this particular interval has worked or not.”

Optional: run your own Rossman Chance Confidence Interval Simulation
Go to the simulator: http://www.rossmanchance.com/applets/ConfSim.html
The next slide shows the default settings, which work great for this exercise.
Hit “Sample” with “Intervals” set to 1 a few times. You’ll notice that the sample proportion produced varies randomly, and so does the corresponding CI estimate displayed.
Select the CI estimate you can currently see on the display to make the numbers show up. Does this interval contain the population parameter (cross the line marked 0.5)?
• If the CI estimation procedure has worked, the CI estimate contains the population parameter and is color coded green.
• If the CI estimation procedure has not worked, the CI estimate does not contain the population parameter and is color coded red.
Now change “Intervals” to 1000 and see what happens. Hit “Sort”. Are all the confidence intervals color coded green?
Look at the number beside “Running Total”, which gives the percentage of all the CI estimates produced so far that worked: the CI estimates that contained the population proportion. Is it close to 95%?
It should be, because “confidence level” is a setting. When we set a confidence level of 95% we’re stating that we want our procedure to work 95% of the time. We accept the fact that 5% of the time our procedure is not going to work, due to random chance.
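If you cannot open the applet, the same experiment can be roughly imitated in a few lines of Python. This is only a sketch under stated assumptions: it uses the standard large-sample interval for a proportion (sample proportion ± 1.96 standard errors), which may not be exactly what the applet computes, and the function names `wald_ci` and `coverage` are made up for this example.

```python
import math
import random

def wald_ci(p_hat, n, z=1.96):
    """Large-sample CI for a proportion: point estimate +/- margin of error.

    z = 1.96 is the usual multiplier for 95% confidence (an assumption here).
    """
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def coverage(p=0.5, n=100, intervals=1000, seed=1):
    """Fraction of simulated CI estimates that contain the true proportion p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    worked = 0
    for _ in range(intervals):
        # Draw one random sample of n objects and compute its sample proportion.
        successes = sum(rng.random() < p for _ in range(n))
        lower, upper = wald_ci(successes / n, n)
        if lower <= p <= upper:
            worked += 1  # this interval would be color coded green
    return worked / intervals

rate = coverage()
print(f"{rate:.1%} of the CI estimates contained p = 0.5")
```

The printed percentage plays the role of the applet's "Running Total": it should land near 95%, not exactly on it, because each run is itself random and the interval formula is an approximation.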


Optional: Use the Rossman Chance app to get a feel for how CI estimation works
Simulating Confidence Intervals for Population Parameter:
http://www.rossmanchance.com/applets/ConfSim.html
[Figure: annotated screenshot of the simulator. The annotations read:]
• π is the setting for the population parameter, the population proportion 𝑝.
• n is the number of objects in the sample: the sample size.
• Intervals is the number of different random samples to draw at once. Equivalently, it’s the number of different CI estimates to produce.
• The vertical line marks the population parameter.
• Horizontal lines are CI estimates.
• The “phat” number is the sample proportion, 𝑝̅ in our notation.
• One area of the display turns into a sampling distribution for the sample proportion.
• Another area displays the confidence intervals produced.
• One sample of 100 objects has been drawn. This sample produced 𝑝̅ = 0.440 and a CI estimate of 0.343 through 0.537.
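The numbers in the screenshot can be checked by hand. The sketch below assumes the applet uses the standard large-sample formula for a 95% CI for a proportion, sample proportion ± 1.96 × standard error; that formula is an assumption here (it is covered in later sections), but it does reproduce the interval shown.

```python
import math

p_bar = 0.440   # sample proportion shown in the screenshot
n = 100         # sample size
z = 1.96        # assumed multiplier for 95% confidence

standard_error = math.sqrt(p_bar * (1 - p_bar) / n)
margin_of_error = z * standard_error

lower = p_bar - margin_of_error
upper = p_bar + margin_of_error

print(round(lower, 3), round(upper, 3))  # 0.343 0.537
```

Since 0.343 < 0.5 < 0.537, this particular interval contains the population proportion and would be color coded green.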

Confidence and alpha
Confidence (confidence level) measures the probability that the (appropriate) confidence interval estimation procedure will generate a confidence interval that contains the fixed population parameter of interest.
Interpreting a confidence of 95%:
“We designed a procedure to work with 95% confidence. If you take numerous random samples from a population and follow the procedure we have used, approximately 95% of the confidence intervals will contain the population parameter under investigation. Approximately 5% of the confidence intervals will not contain the population parameter.”

Alpha, α, measures the probability that the (appropriate) confidence interval estimation procedure will generate an interval that does not contain the fixed population parameter.
α = 1 – confidence
In the previous example alpha was 5%.

Ethical levels of confidence

• Confidence levels should be 90% or more.
o 𝛼 should be at most 10%.
• Common confidence levels are 90%, 95%, and 99%.

Margins of error and confidence interval estimates should always state the level of confidence associated with them.

95% is the most common confidence level you’ll meet.

If you encounter an opinion poll result and margin of error reported without a confidence level, presume the confidence is 95%.

Concept Check
Informally, confidence is the probability the inferential statistics procedure works.
𝛼 = 1 − confidence; 𝛼 is the probability the inferential statistics procedure does not work.
• Confidence levels should be 90% or more; equivalently, 𝛼 should be at most 10%.
• Common confidence levels are 90%, 95%, and 99%.

A person is producing a confidence interval (CI) estimate using a procedure that operates with 𝛼 = 5%, that is, 1/20.
1) If the person drew 20 different random samples and produced 20 corresponding CI estimates, how many CI estimates would typically not contain the population parameter of interest?
A) 0 B) 1 C) 5 D) 20

2) Is the person operating with an ethical confidence level?
A) Yes, 5% confidence B) Yes, 95% confidence
C) No, 5% confidence D) No, 95% confidence

Concept Check Answers


A person is producing a confidence interval (CI) estimate using a procedure that operates with 𝛼 = 5%, that is, 1/20.
1) If the person drew 20 different random samples and produced 20 corresponding CI estimates, how many CI estimates would typically not contain the population parameter of interest?
A) 0 B) 1 C) 5 D) 20
Answer: B) 1
2) Is the person operating with an ethical confidence level?
A) Yes, 5% confidence B) Yes, 95% confidence C) No, 5% confidence D) No, 95% confidence
Answer: B) Yes, 95% confidence
• The procedure “compute a CI estimate” does not work if the interval produced does not contain the population parameter of interest.
• 𝛼 measures how likely it is that a correctly applied inference procedure will not work.
• If 𝛼 = 5% then 𝛼 is 1/20. If we drew 20 different random samples and produced 20 CI estimates, we would typically expect one interval that does not contain the population parameter of interest.
• 𝛼 = 5%. Confidence = 1 − 𝛼 = 1 − 0.05 = 0.95. A confidence of 95% is ethical.
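The arithmetic behind both answers can be laid out in a few lines. This is just an illustrative sketch; the variable names are made up for this example, and the 90% threshold is the guideline stated in these slides.

```python
confidence = 0.95       # confidence level used by the person in the question
alpha = 1 - confidence  # probability the procedure does not work
n_intervals = 20        # number of CI estimates produced

# Typical (expected) number of intervals that miss the population parameter.
expected_misses = alpha * n_intervals

# Guideline from these slides: confidence levels should be 90% or more.
is_ethical = confidence >= 0.90

print(round(alpha, 2))            # 0.05
print(round(expected_misses, 2))  # 1.0
print(is_ethical)                 # True
```

The rounding only hides floating point noise; the underlying values are 5% and 1 interval.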

The basic structure of a confidence interval estimate

Confidence intervals are constructed from:
Point estimate ± Margin of error

[Figure: the Point Estimate sits at the center of the interval, one Margin of Error above the CI Lower Limit and one Margin of Error below the CI Upper Limit]

• The margin of error is half of the width of the confidence interval.
• Future sections will explain the factors that determine the width of confidence intervals: the variability of the estimator (standard error) and the desired confidence level.

Be careful not to confuse the confidence interval width with the margin of error

Construct a confidence interval from:
Point estimate ± Margin of error

[Figure: the Point Estimate at the center, a Margin of Error on each side reaching the CI Lower Limit and the CI Upper Limit, and the Confidence Interval Width spanning the whole interval]

A common error is to confuse the margin of error with the width of the interval itself. The width of the interval is the distance from the lower limit to the upper limit. The margin of error is half the width of the confidence interval estimate: watch out for this in homework questions.
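To see the difference in numbers, the sketch below takes the 95% CI of 2,658 to 3,290 excess deaths quoted earlier from The Atlantic and separates its width from its margin of error. It assumes the interval is symmetric about the point estimate, and the function name `width_and_margin` is made up for this example.

```python
def width_and_margin(lower, upper):
    """Return (CI width, margin of error) for a symmetric CI estimate."""
    width = upper - lower   # distance from lower limit to upper limit
    margin = width / 2      # the margin of error is half the width
    return width, margin

lower, upper = 2658, 3290
width, margin = width_and_margin(lower, upper)
midpoint = (lower + upper) / 2

print(width)     # 632
print(margin)    # 316.0
print(midpoint)  # 2974.0
```

The midpoint, 2,974, is within one death of the reported point estimate of 2,975 (a rounding difference), and the margin of error (316) is half the width (632), not the width itself.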

Guess the answer
What proportion of US adults do you think did not use the internet in 2019?

Problem
Based on poll results, the Pew Research Center estimates that 10% of US adults did not use the internet in 2019. With 95% confidence, the margin of error on this poll is 2.9 percentage points.
Source: Pew Research Center, “10% of Americans do not use the internet”, April 22 2019, Monica Anderson, Andrew Perrin, Jingjing Jiang and Madhumitha Kumar, https://www.pewresearch.org/fact-tank/2019/04/22/some-americans-dont-use-the-internet-who-are-they/, unweighted sample size 1,502 computed taking into account weighting

a) What is the population parameter of interest?
b) What is a CI estimate for this parameter?
c) Interpret the confidence interval.
d) Interpret the confidence level.

Problem: solving a)
a) What is the population parameter of interest?
The parameter of interest is p, the population proportion of US adults who did not use the internet in 2019.

Problem: solving b)
b) What is a CI estimate for this parameter?
The confidence interval estimate for the population proportion is:
Point Estimate ± Margin of Error

Problem: analyze text
• “10% of US adults” is a point estimate: 10%, or 0.1, is the (weighted) sample proportion of the 1,502 adults in the Pew poll who did not use the internet.
• A percentage point just means the unit 1 percent, i.e. 0.01.
• A 95% confidence margin of error of 2.9 percentage points means the margin of error to use for a CI estimate is 0.029.

Problem: what you need to write down


b) What is a CI estimate for this parameter?
Always state the confidence level in your answers.
CI Estimate: Point Estimate ± Margin of Error
CI Estimate for p is 0.1 ± 0.029, 95% confidence
CI Lower Limit = Point Estimate – Margin of Error = 0.1 – 0.029 = 0.071
CI Upper Limit = Point Estimate + Margin of Error = _________________

Problem: complete solution to b)
b) What is a CI estimate for this parameter?
CI Estimate: Point Estimate ± Margin of Error
CI Estimate for p is 0.1 ± 0.029, 95% confidence
CI Lower Limit = Point Estimate – Margin of Error = 0.1 – 0.029 = 0.071
CI Upper Limit = Point Estimate + Margin of Error = 0.1 + 0.029 = 0.129
With 95% confidence, a CI estimate for the proportion of all US adults who did not use the internet in 2019 is 7.1% through 12.9%.
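The same calculation can be scripted, which is handy for checking homework answers. This is a minimal sketch; the function name `ci_from_margin` is made up for this example, and the only inputs are the point estimate and margin of error stated in the problem.

```python
def ci_from_margin(point_estimate, margin_of_error):
    """Return (lower limit, upper limit): point estimate -/+ margin of error."""
    return point_estimate - margin_of_error, point_estimate + margin_of_error

point_estimate = 0.10        # 10% of US adults, written as a proportion
margin_of_error = 2.9 / 100  # 2.9 percentage points, written as a proportion

lower, upper = ci_from_margin(point_estimate, margin_of_error)
print(f"{lower:.3f} through {upper:.3f}, 95% confidence")
# 0.071 through 0.129, 95% confidence
```

Note that the confidence level is a label carried along from the problem statement: it cannot be recovered from the point estimate and margin of error alone, which is why it must always be reported with them.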


Why we need to interpret results

CI Estimate for p is 0.1 ± 0.029, 95% confidence

With 95% confidence a CI estimate for the proportion of all US adults who did not use the internet in 2019 is 7.1% through 12.9%

These items appropriately report results: they state confidence and appropriately present numbers.

But put yourself in the shoes of someone who has not taken a statistics class: you would not understand these statements. They are not written in “business friendly” English. We need to be able to interpret our results for a business audience.

The problem is that it can sometimes be tricky to interpret statistical inference in standard business English. A business friendly interpretation also has to be correct statistics English, and statistics English is an incredibly pedantic foreign language.

The solution is to use boilerplate language to interpret results whilst learning inference.

Boilerplate for interpreting a confidence interval estimate

Fill in the blanks:
With confidence of _____________ we estimate that the ______________________________ lies between ______________.
• First blank: the confidence level. Always include the confidence level.
• Second blank: describe the population parameter, using business friendly language if possible (sometimes it is not possible). Example: for a business friendly report, “average chocolate consumption by all Americans”; for a DS302 standard homework problem, “population mean chocolate consumption”.
• Third blank: the CI estimate numbers, with relevant unit and rounding.

Problem: solve c) using boilerplate language
c) Interpret the confidence interval.
With confidence of 95% we estimate that the percentage of all American adults who did not use the internet in 2019 lies between 7% and 13%.
• Confidence level included: 95%.
• Population parameter described in business friendly language: the percentage of all American adults who did not use the internet in 2019.
• CI estimate numbers, with relevant unit and rounding: 7% and 13%.
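Because the boilerplate is a fill-in-the-blanks sentence, it can be written as a small template. The sketch below is only an illustration of that idea; the function name `interpret_ci` is made up for this example, and it assumes the limits are proportions that should be reported as whole percentages.

```python
def interpret_ci(confidence, parameter, lower, upper):
    """Fill in the CI boilerplate; limits are proportions, shown as whole percents."""
    return (
        f"With confidence of {confidence:.0%} we estimate that {parameter} "
        f"lies between {lower:.0%} and {upper:.0%}."
    )

sentence = interpret_ci(
    0.95,
    "the percentage of all American adults who did not use the internet in 2019",
    0.071,
    0.129,
)
print(sentence)
```

The `:.0%` format rounds 0.071 to 7% and 0.129 to 13%, matching the rounding used in the written answer.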

Interpreting confidence: the common error everyone makes first time round

• Interpreting confidence (confidence level) is not important for business practice, though understanding what confidence means is. You will be asked to interpret confidence in DS302 homework and exams.
• Reminder: confidence is how likely it is that the inference procedure works.
o For confidence interval estimation, the inference procedure works if the confidence interval contains the value of the population parameter.
o So confidence in a CI estimate is how likely it is that the procedure “compute a confidence interval estimate” will produce an interval that contains the fixed population parameter.
• Reminder: in inferential statistics, population parameters are usually unknown numbers. However, a population parameter is not a random number: the parameter is “fixed”. When interpreting confidence for confidence interval estimates, the common error that everyone initially makes is to imply that population parameters are “random”.
• Solution to the common error: use the phrase “fixed, unknown, population parameter”. There’s no way this is business friendly language, but it’ll be your homework problem safety net. Using this phrase makes it clear that you know that population parameters are not random numbers.

Boilerplate for interpreting confidence in a CI estimate.

Fill in the blanks, replacing #% with the correct confidence level:

A confidence of #% means that if we took numerous random samples of data and computed ______________________ then #% of the CI estimates produced would contain the fixed, unknown population parameter __________________, and 100% − #% of the CI estimates would not contain the fixed parameter.
• First blank: the name of the statistical inference procedure.
• “Would contain the fixed, unknown population parameter” states how the procedure is successful.
• Second blank: the name of the population parameter from the problem, plus a description.

Problem d): interpret confidence
A confidence of 95% means that if we took numerous random samples of data and computed confidence interval estimates for the population proportion from each sample, then 95% of the CI estimates produced would contain the fixed, unknown population parameter 𝑝, the proportion of American adults who did not use the internet in 2019, and 5% of the CI estimates produced would not contain 𝑝.

Section Two Summary


1. Explained what a confidence interval estimate does and why CI estimates are used: CI estimates convey how precise our estimate of a parameter is.
2. Explained why your choice of confidence matters, and stated ethical confidence levels.
3. Explained the basic structure of many confidence interval estimates.
4. Explained how we interpret an appropriate confidence interval estimate for any population parameter.
5. Explained how we interpret a confidence level for a CI estimate.

And pointed out that CI stands for confidence interval.