Table of Contents
1. Preface
2. Module 4 Exploring and Producing Data for Business Decision Making
   1. Lesson 4-1 Confidence Interval Basics
      1. Lesson 4-1.1 Confidence Interval Basics
   2. Lesson 4-2 Confidence Interval for Means
      1. Lesson 4-2.1 Confidence Interval for Means
      2. Lesson 4-2.2 Confidence Interval for Mean in Excel
      3. Lesson 4-2.3 Impact of Confidence Level Illustrated in Excel
   3. Lesson 4-3 Confidence Interval for Population Proportion
      1. Lesson 4-3.1 Confidence Interval for Population Proportion
      2. Lesson 4-3.2 Confidence Interval for Population Proportion in Excel
      3. Lesson 4-3.3 Confidence Interval Animation in Excel
      4. Lesson 4-3.4 Starting Salary Example in Excel
   4. Lesson 4-4 Sample Size
      1. Lesson 4-4.1 Sample Size
      2. Lesson 4-4.2 Sample Size Proportion in Excel
      3. Lesson 4-4.3 Sample Size Mean in Excel
      4. Lesson 4-4.4 Sample Size Effect in Excel

Preface
Thank you for choosing a Gies eBook.

This Gies eBook is based on an extended video lecture transcript made from Module 4 of
Professor Fataneh Taghaboni-Dutta’s Exploring and Producing Data for Business Decision
Making on Coursera. The Gies eBook provides a reading experience that covers all of the
information in the MOOC videos in a fully accessible format. The Gies eBook can be used with
any standards-based e-reading software supporting the ePUB 3.0 format.

Each Gies eBook is broken down by lessons that are navigable using our e-reader’s table of
contents feature. Within each lesson the following sequence of content will always occur:

Lesson title
A link to the web-based videos for each lesson (You must be online to view.)

Within the lesson, every time there is a slide change or a switch to the next informative video
scene, you will be presented with:

Thumbnail image of the current slide or video scene


Any text present on the slide in the video is recreated below the thumbnail in a searchable,
screen reader-ready format.
Extended text description of the important visuals such as graphs and charts presented in
the slides.
Any tabular data from the video is recreated and properly labeled for screen reader
navigation and reading.
All math equations are presented in MathML that provides both content and presentation if
on screen.
Transcript that captures all of the original speech in the video labeled by the person
speaking.

All Gies eBooks are designed with accessibility and usability as a priority. This design is intended
to serve all readers in a flexible manner regardless of their choice of digital reading tools.

If you have any questions or suggestions for improvement for this Gies eBook, please contact
Giesbooks@illinois.edu

Copyright © 2019 by Fataneh Taghaboni-Dutta

All rights reserved.

Published by the Gies College of Business at the University of Illinois at Urbana-Champaign, and
the Board of Trustees of the University of Illinois

Module 4 Exploring and Producing Data for
Business Decision Making

Lesson 4-1 Confidence Interval Basics

Lesson 4-1.1 Confidence Interval Basics



Major Purpose of Statistics (1 of 2) - Slide 1

Why is learning statistics important?


Why are so many organizations looking for people who understand how to use statistical
methods?

The Best Jobs of 2016*

1. Data scientist
2. Statistician
3. Information security analyst

*(Career Cast Job Search, 2016)

Transcript

I'm sure that some of you are still wondering why you need to learn statistics, or why so
many organizations are looking for people who understand how to use statistical methods. Just
recently, CareerCast, a web-based employment service, listed the best jobs of 2016 and once
again, we see statistics at the top of the list. This speaks to the enormous need for scientists who
will be slicing and dicing the data companies have so that they can improve their decision
making. So, for you as someone who's interested in leadership roles, this is also important. If you
don't ask the right questions, then the analysis done by the most talented statisticians will be of
very little use. You need to be able to understand statistical analysis, ask the right questions, and
shape the future of the inquiries. In this module, we are now ready to begin the process of
making inferences. So, let's get started.

Major Purpose of Statistics (2 of 2) - Slide 2

Provide information so that informed decisions can be made

Transcript

Remember that the major purpose of statistics is to provide information so that informed decisions
can be made. In everything we do, we face uncertainty. Statistics will allow us to anticipate the
possible outcomes of these uncertainties and, in turn, improve our decisions. We also know that
surveying the entire population is too costly and takes too much time. So, we rely on a sample. In
the last module, we learned how to take samples and what it means to have a
representative sample, one which is not biased.

Making an Inference - Slide 3

Inferential statistics uses sample statistics to provide estimates for population parameters.

A point estimate is a single-number estimate of a population parameter.

An example:

What is the average age of Coursera students?


My sample results show 28 years old to be the mean.

Transcript

Now we will learn how to use the sample information and make inferences about the population.
This is what is known as inferential statistics. Sample information will provide us a point estimate
which we will use as an estimation for the population parameter. For example, I may want to
know the average age of students who take a course in Coursera. Using a sample of Coursera
participants, I can find the average age for the sample and then use that as the estimate for the
population of Coursera participants. Let's say I did such sampling and found the average age of
the participants to be 28. I don't need to know much about statistics to know that my estimate
is more than likely a bit off. What statistics can tell me is how confident I can be about my
estimate and its degree of error. Have you ever heard of margin of error? This is a value we
can calculate and this is one of the concepts that you will be learning in this module. Let's start
with the larger concept of confidence interval.

Confidence Interval - Slide 4

Use a range to estimate the population parameter.

Interval Estimate

This is still an estimate and we can’t be certain that it contains the true population
parameter.
The probability that the interval actually contains the true population parameter is called
the level of confidence.
An interval estimate associated with a certain level of confidence is called a confidence
interval.

Transcript

Using only one value for estimating the population mean or any other population parameter
leaves much room for error. It is much better to provide the range; this range is called an interval
estimate. This is still an estimate and we can't be certain that it actually contains the true
population parameter. The probability that that interval actually contains the true population
parameter is called the level of confidence. Finally, an interval estimate associated with a certain
level of confidence is called a confidence interval.

Confidence Interval - Example - Slide 5

Two drugs are being tested for headache relief – we want to know the time it takes to experience
relief – the group size is 100.

First result:
Drug A: Average time was 38 minutes
Drug B: Average time was 43 minutes

Alternative result:
Drug A: Average time was 38 minutes
Drug B: Average time was 58 minutes

Transcript

I think we all have suffered with headaches and when we take a pill, we want to get relief as fast
as possible. So, consider this. Two drugs are being tested for headache relief. We want to know
the time it takes to experience relief and the testing is done on a group of size 100. One group
takes drug A and the other one takes drug B. For drug A, the average time was 38 minutes before they
felt relief. For drug B, the average time elapsed was 43 minutes before they felt relief. Based on this
study, the average time for drug A is five minutes less, but in the big picture, can you conclude
that drug A really acts faster? Could it be the makeup of the people who were in group A that made
them feel pain relief faster? What if the conclusion of the study showed the following: now drug A
resulted in 20 minutes faster relief. Are you more likely to think that drug A is more effective than
drug B? Of course, there can still be other explanations for why the drug A group reported faster
relief, but given this larger difference, the other possible explanations may be less likely to be the
reason for the difference we are seeing. Now would be a good time to explain the concept of
margin of error.

Margin of Error - Slide 6

When estimating the population mean, begin with the best point estimate from the sample mean
and then add and subtract the margin of error.

Flow chart. Lower Bound is reached by subtracting margin of error from x bar. Upper bound is
found by adding margin of error to x bar.

Transcript

When estimating the population mean, begin with the best point estimate from the sample mean
and then add and subtract the margin of error.

Gallup Example (1 of 2) - Slide 7

Gallup Daily: U.S. Economic Confidence Index. A green line with multiple peaks and drops runs
across the chart in a slight upward slope to the right.

Transcript

I mentioned that you might have seen the phrase margin of error in news programs, business
reports, etc. Here's an example of margin of error. This image is from the Gallup Daily Economic
Confidence Index from April 15th, 2016.

Gallup Example (2 of 2) - Slide 8

Daily results are based on telephone interviews with approximately 1500 national adults: Margin
of error is ±3 percentage points.

Transcript

Look closely and you will see the sample size used and the margin of error, but what exactly is
the margin of error?

How to Calculate Margin of Error - Slide 9

Size of the sample (n)

Sample standard deviation (s)

Confidence level (1 − α): Your confidence in having studied a sample which contains the true
population parameter

Transcript

Well, margin of error is mathematically made up of several things. One is the size of your sample.
Remember, you should know this now at the gut level: the larger the sample size, the more
reliable your estimation will be. That means a larger sample size will reduce the margin of error.
Then there is the natural variability that you have in your sample, which is represented by the
sample's standard deviation. Finally, there is the confidence level, which describes the uncertainty
of the sampling method. In other words, how confident are you that you have studied a sample which
contains the true population parameter? The notation for this is 1 minus α, where α is known as the
significance level. So, if you want a 95% confidence interval, the most commonly used confidence
level, then you're essentially saying that there is a 95% chance that the sample you took will allow
you to make a correct inference about the true population parameter and a 5% chance of missing
the mark. Let's explore the concept of confidence interval a little further.

Confidence Level - Slide 10

Distribution of Sample Means


Standard Error: $\sigma_{\bar{x}} = \frac{s}{\sqrt{n}}$

68.2% are ± 1 Standard Error


95.4% are ± 2 Standard Error
99.7% are ± 3 Standard Error

A bell curve with one peak and two tails, with the three standard deviations marked by vertical
lines.

Transcript

This is a distribution of sample means. The central limit theorem shows that if you take many
samples and then plot each sample mean, you will end up with approximately a normal
distribution. So, if you take just one sample, its mean is one draw from this distribution: we expect
about 68% of all sample means to lie within 1 standard error of the population mean, 95.4% of the
sample means to be within two standard errors of the population mean, and 99.7% of the sample
means to lie within three standard errors of the population mean.
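
The percentages on this slide can be checked numerically. The following is a minimal Python sketch, not part of the lecture: the helper name `share_within` is an assumption made for illustration, and it relies only on the standard normal distribution from Python's standard library.

```python
from statistics import NormalDist


def share_within(k):
    """Share of a normal distribution lying within k standard errors of its mean."""
    std_normal = NormalDist(mu=0, sigma=1)
    # Area between -k and +k under the standard normal curve.
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within ±{k} standard errors: {share_within(k):.1%}")
```

The printed shares should land close to the 68%, 95% and 99.7% figures quoted on the slide; small differences in the last digit come from rounding.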

95% Confidence Level - Slide 11

Distribution of Sample Means


Standard Error: $\sigma_{\bar{x}} = \frac{s}{\sqrt{n}}$

Transcript

So, if you are considering a 95% confidence level, then you are implying that the sample that you
took will have a mean which will be within roughly plus or minus 2 standard errors of the mean of
this distribution; 1.96, to be exact. This is the z-score for the desired confidence level. So, the
confidence interval is expressed in the number of standard errors that you are away from
the mean, and that leaves you with a 5% chance that you selected a sample which fell outside of
your boundary. That 5% chance of selecting such a sample is split equally between the two
tails.

Confidence Interval for Mean - Slide 12

Sample Mean ± Margin of Error

Margin of Error is: Z-score for the desired confidence level × Standard Error
Standard Error: $\sigma_{\bar{x}} = \frac{s}{\sqrt{n}}$

Transcript

So, then the equation for a confidence interval for estimating the mean of a population is the sample
mean plus or minus the margin of error. And the margin of error is calculated by finding the
z-score of the confidence level desired times the standard error, where the standard error is
calculated by taking the standard deviation of the population and dividing it by the square root of
the sample size. Please note that in this equation for the standard error, we use the sample standard
deviation as an estimate of σ, which is the population standard deviation. You may wonder why.
So, let me explain.

Unknown Population Standard Deviation and Small
Sample Sizes - Slide 13

Equation for confidence interval (σ is known): $\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$

Equation for confidence interval (σ is unknown): $\bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}$

Transcript

It's understood that we do sample studies because we don't know something of interest
about our population. Thus, we don't know what the population mean or standard deviation is, but
the statistical equations use different notations and distributions depending on whether or not we
know the population standard deviation. To be completely precise, if we know the
population standard deviation, that is σ, then we use the actual σ and the normal distribution to
find the z-score corresponding to our desired confidence level in order
to calculate the confidence interval. However, if we don't know the population standard deviation,
then we use the sample standard deviation, s, as an estimate of σ, the actual population standard
deviation, and another distribution known as the t-distribution in order to calculate the confidence
interval. Let me show you the relationship between the t-distribution and the normal distribution.

The t Distribution - Slide 14

The t distribution. A bell curve with 4 peaks within each other on μ, fanning out to 4 tails on the
right and 4 on the left. The highest peak is labeled normal distribution.

Transcript

The t-distribution is similar to the standard normal distribution; both are symmetrical and
bell shaped. However, the t-distribution is more spread out than the standard normal distribution.
That is, it has more area in its tails and less in its center. The amount of spread of the
t-distribution is given by the degrees of freedom, which is n − 1, that is, sample size minus 1. Now
let me analyze this graph. The most peaked and narrowest curve in this graph is the standard normal
curve. The most spread-out curve shows a t-distribution where the sample size used was 3, a very small
sample. The other curves are t-distributions plotted as the sample size is increased to 4 and
then 10. One thing you notice is that as we increase the sample size, the t-distribution becomes
more and more peaked and narrower and approaches a true normal distribution. Thus, if you
have a fairly large sample size, then the two curves become more or less identical. Actually, past
a sample size of 30, these two distributions become very similar. Since in this class we only plan to
work with large sample sizes, I will always use the normal distribution, and thus a z-score, for
calculating the margin of error when illustrating a problem in my lectures. When solving problems
using software like Excel, it is just as easy to use the t-score and the t-distribution. So, in a sense,
t-score and z-score become almost synonymous and interchangeable when we have large data
sets.

Comparing z-score and t-score - Slide 15

Comparing z-score and t-score

Sample size (n)   Sample standard deviation (s)   t-score   z-score
30                5.95                            2.04      1.96
100               8.14                            1.98      1.96
500               7.75                            1.964     1.96
1000              7.85                            1.962     1.96

Transcript

As I said to be perfectly precise, we should be using a t-distribution, but for large sample sizes
where the population standard deviation is not known, using just the standard normal distribution
is an extremely good approximation. Just to show you that numerically consider the following
example. There is a population which you don't know anything about and this is why we are
taking samples from it. We take random samples of varying sizes from this population. I started
with 30 because at 30 the t-distribution and the standard normal distribution become very close and
get closer as the sample size increases. For each of the samples taken, we calculate the sample's
standard deviation, which will be used as a point estimate of the population standard deviation.

So now let's see how the t-score and z-score compare if we want a 95% confidence interval. This
table shows the t-score for the different sample sizes I have here. As we increase the sample
size, the t-score gets closer to 1.96, the actual z-score. So why do I say that when I'm
demonstrating concepts to you I will only use the z-score? Because the z-score stays the same
regardless of the sample size. So, for the most commonly used confidence level of 95%, you
know that value is 1.96, approximately 2 standard errors from the mean. So, you can focus on
calculating the standard error. The t-score, which is based on the t-distribution, on the other hand,
depends on the sample size, which makes it impractical to memorize a t-score. It is not a single
value, and that would prevent you from doing any quick calculations or mental math. One of my
objectives in this class is to give you the ability to read business reports and look at data and be
able to decide for yourself if the conclusion makes sense or not, which means that you do some
basic reasoning and mental math. And relying on a z-score for a quick approximation is a perfectly
good substitute when you don't have access to a computer.
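
To see how the t-scores in the table approach 1.96, the sketch below recomputes them in Python. It is an illustration under stated assumptions, not the method used in the lecture (which relies on Excel): the function names `t_pdf`, `t_cdf` and `t_critical` are invented here, and the critical value is found by numerically integrating the t density and then bisecting, using only the standard library.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] (steps must be even)."""
    if x == 0:
        return 0.5
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(confidence, n):
    """Two-sided t critical value for a sample of size n (df = n - 1), by bisection."""
    df = n - 1
    target = 1 - (1 - confidence) / 2   # e.g. 0.975 for a 95% confidence level
    lo, hi = 0.0, 50.0                  # assumed search range, ample for these cases
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


z = NormalDist().inv_cdf(0.975)         # z-score for 95%, independent of sample size
for n in (30, 100, 500, 1000):
    print(f"n = {n:>4}: t-score = {t_critical(0.95, n):.3f}, z-score = {z:.2f}")
```

The t-score column shrinks toward the fixed z-score as n grows, which is the pattern the table illustrates. The sample standard deviations in the table are not needed here, because the critical value depends only on the confidence level and the degrees of freedom.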

Commonly Used Confidence Intervals - Slide 16

There are three confidence intervals that are used most often:

95% z-score = 1.96 (use as an approximation for t-score)


90% z-score = 1.645
99% z-score = 2.576

Transcript

There are three confidence intervals that are used most often; 95% is the most often used.
Whenever the confidence level is not mentioned, you can assume that it is 95%. This
value is the default value in most statistical software as well. A good approximation of the
t-score using the normal distribution is 1.96. The other two are the 90% confidence interval, in
which case the z-score is 1.645, and the 99% confidence interval, for which the z-score is 2.576. You
can pretty much memorize these for quick reference as you look at business reports and analysis
presented to you. In this lesson, you learned about confidence level, margin of error, and how all
these come together to create a confidence interval. In the next lesson, we will put all of these
together, develop the confidence interval, and start looking at its meaning.

Lesson 4-2 Confidence Interval for Means

Lesson 4-2.1 Confidence Interval for Means



Confidence Interval for Mean - Slide 17

Sample Mean ± Margin of Error

Margin of Error is: Z-score for the desired confidence level × Standard Error
Standard Error: $\sigma_{\bar{x}} = \frac{s}{\sqrt{n}}$

$\bar{x} \pm t_{\alpha/2} \times \frac{s}{\sqrt{n}}$

In this last equation, $t_{\alpha/2}$ is circled in red.

Transcript

We have learned that the confidence interval for estimating the mean of a population is the sample
mean plus or minus the margin of error. The margin of error is calculated by finding the z-score of
the confidence level desired times the standard error, where the standard error is calculated by
taking the sample standard deviation and dividing it by the square root of the sample size. Here's
the notation that represents the various components of the margin of error. Let me focus on the t of
α over 2 and explain why it's written this way.

95% Confidence Level - Slide 18

$\bar{x} \pm t_{\alpha/2} \times \frac{s}{\sqrt{n}}$

α = 5%

A drawing of a histogram with a bell curve. The central part of the curve is labeled 95 percent.
Alpha equals 5 percent. The two tails are labeled alpha divided by 2 equals 2.5 percent. Z of
alpha over 2 equals −1.96 on the left tail and Z of alpha over 2 equals +1.96 on the right tail.

Transcript

Consider the case of a 95% confidence interval. This confidence level means that there is a 5%
chance, known as α, that we select a sample whose mean is farther from the population mean than
the interval allows. Based on the symmetrical nature of the normal distribution, this 5% is split
equally into two tails, that is, 2.5% in each tail. And a close approximation of t of α over 2 is 1.96.
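
The step from α to the critical value can be sketched in a few lines of Python. This is a hedged illustration rather than anything shown in the lecture: the function name `z_critical` is an assumption, and the calculation uses the normal distribution (the z-score) as the approximation for the t-score, as described above.

```python
from statistics import NormalDist


def z_critical(confidence):
    """z-score that leaves alpha/2 in each tail of the standard normal distribution."""
    alpha = 1 - confidence            # e.g. 0.05 for a 95% confidence level
    upper_tail_start = 1 - alpha / 2  # area to the left of the upper cut-off, e.g. 0.975
    return NormalDist().inv_cdf(upper_tail_start)


for confidence in (0.90, 0.95, 0.99):
    alpha = 1 - confidence
    print(f"{confidence:.0%} level: alpha = {alpha:.2f}, "
          f"alpha/2 = {alpha / 2:.3f}, z = ±{z_critical(confidence):.3f}")
```

Because the normal distribution is symmetrical, only the upper cut-off needs to be computed; the lower one is the same number with a minus sign.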

Example - Mean Weight (1 of 3) - Slide 19

Cereal boxes are filled to be 18 oz.


Workers take samples of 50 boxes and weigh them.
What is the 95% confidence interval for the mean weight if a sample gives an average
weight of 17.6 oz. and a standard deviation of .21 oz.?

Transcript

Consider this example where we are manufacturing cereal. Cereal boxes are filled to be 18
ounces. To make sure that the production system is working properly, workers take a sample of
50 boxes and weigh them. The sample is used to decide whether or not the filling system is
working properly. What is the 95% confidence interval for the mean weight, if a sample gives an
average weight of 17.6 ounces and a standard deviation of 0.21 ounces?

Example - Mean Weight (2 of 3) - Slide 20

n = 50

What is the 95% confidence interval for the mean weight if a sample gives an average weight of
17.6 oz. and a standard deviation of .21 oz.?

x̄ = 17.6

s = 0.21

$z_{\alpha/2} = 1.96$

Transcript

Here's the relevant information. The size of the sample is 50 boxes. Their average weight, that is,
the sample mean, which has the notation x̄, is 17.6 ounces. And the standard deviation of
this sample, denoted here by s, is 0.21 ounces. For a 95% confidence interval, we will use the
approximate z-score of 1.96. Putting these values into our equation, it seems, based on the
sample, that the population of cereal boxes will have a mean somewhere between
17.54 ounces and 17.66 ounces.

Example - Mean Weight (3 of 3) - Slide 21

n = 50

x̄ = 17.6

s = 0.21

$z_{\alpha/2} = 1.96$

$\bar{x} \pm t_{\alpha/2} \times \frac{s}{\sqrt{n}}$

$17.6 \pm 1.96 \times \frac{0.21}{\sqrt{50}} = 17.6 \pm 0.058 = [17.54, 17.66]$

Transcript

So even the best-case scenario, 17.66 ounces, seems to be fairly below the listed weight of 18
ounces. Management better look into fixing this problem, don't you think?

Let's Practice - Slide 22

n = 50

x̄ = 17.6

s = 0.21

$z_{\alpha/2} = 1.96$

Transcript

So now let's practice. Can you tell me what is the margin of error here?

Let's Practice - Solved (1 of 2) - Slide 23

n = 50

x̄ = 17.6

s = 0.21

$z_{\alpha/2} = 1.96$

$17.6 \pm 1.96 \times \frac{0.21}{\sqrt{50}} = 17.6 \pm 0.058 = [17.54, 17.66]$

The number 0.058 in the formula is marked with a red circle. It stands for the margin of error.

Transcript

Margin of error here is 0.058 ounces. Now let's practice again. How will the confidence interval
change, for this example, if we wanted a 99% confidence level?

Let's Practice - Solved (2 of 2) - Slide 24

n = 50

x̄ = 17.6

s = 0.21

99% Confidence interval: $z_{\alpha/2} = 2.575$

$17.6 \pm 2.575 \times \frac{0.21}{\sqrt{50}} = 17.6 \pm 0.0765 = [17.52, 17.68]$

95% Confidence Interval: [17.54, 17.66]

Transcript

For a 99% confidence interval, the only thing that will change is the z-score, which would
be approximately 2.575. This results in an interval which is slightly wider than
the 95% confidence interval, which makes sense because the standard error is being
multiplied by 2.575 rather than 1.96.
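
The cereal example can be reproduced with a short Python sketch. It is an illustration, not the lecture's own tool: the function name `mean_confidence_interval` is hypothetical, and it looks up the z-score from the normal distribution instead of using a memorized value, so the 99% result uses about 2.576 where the slide uses 2.575.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sample_sd, n, confidence=0.95):
    """Return (lower bound, upper bound, margin of error) using the z approximation."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z-score for the chosen level
    standard_error = sample_sd / sqrt(n)                # s divided by the square root of n
    margin = z * standard_error                         # margin of error
    return sample_mean - margin, sample_mean + margin, margin


# Cereal boxes: n = 50, sample mean 17.6 oz, sample standard deviation 0.21 oz.
for confidence in (0.95, 0.99):
    lower, upper, margin = mean_confidence_interval(17.6, 0.21, 50, confidence)
    print(f"{confidence:.0%}: 17.6 ± {margin:.4f} -> [{lower:.2f}, {upper:.2f}]")
```

Only the z-score changes between the two runs; the standard error of 0.21 divided by the square root of 50 stays the same, which is why the 99% interval is wider.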

Example - Waiting Time (1 of 2) - Slide 25

Speedy Lube wants to advertise a 15-minute oil change to its customers. The manager has spent
a lot of time improving the oil change process and now wants to see if they will be able to
advertise for a 15-minute oil change. He takes 1000 random observations and analyzes the data
to find that the average time was 14.7 minutes and the standard deviation was 3.5 minutes. What
is the 95% confidence interval for the population mean?

Transcript

Now here's another example. Speedy Lube wants to advertise a 15-minute oil change to its
customers. The manager has spent a lot of time improving the process of oil change and now
wants to see if he will be able to advertise for a 15-minute oil change. He takes 1,000 random
observations and analyzes the data to find the average time was 14.7 minutes and the standard
deviation was 3.5 minutes. What is the 95% confidence interval for the population mean? Can he
advertise for expected time of 15 minutes?

Example - Waiting Time (2 of 2) - Slide 26

n = 1000

x̄ = 14.7

s = 3.5

$z_{\alpha/2} = 1.96$

$14.7 \pm 1.96 \times \frac{3.5}{\sqrt{1000}} = 14.7 \pm 0.217 = [14.483, 14.917]$

The confidence interval from 14.483 to 14.917 is marked with a red circle.

Transcript

Once again, here is the relevant information. The sample size is 1,000 observations, which resulted
in 14.7 minutes as the average for the sample and a standard deviation of 3.5 minutes. At the 95%
confidence level, we use 1.96, and using the equation we get the interval from 14.483 minutes
to 14.917 minutes. So, we are 95% confident that the true average oil change time is between these
two values, which are both below 15 minutes. So yes, he can advertise the expected time of 15
minutes to his customers.

Let's Practice - Slide 27

Speedy Lube wants to advertise a 15-minute oil change to its customers. The manager has spent
a lot of time improving the oil change process and now wants to see if they will be able to
advertise for a 15-minute oil change. He takes 1000 random observations and analyzes the data
to find that the average time was 14.7 minutes and the standard deviation was 8.2 minutes.
What is the 95% confidence interval for the population mean?

Transcript

Now consider this and let's practice. Consider the same shop, but now let's assume that although
we had the same average for the 1,000 observations that we had made, the standard deviation is
this time 8.2 minutes, not the 3.5 that we had in the previous example. So, what would the 95%
confidence interval be and can he still advertise for 15 minutes?

Let's Practice - Solved - Slide 28

n = 1000

x̄ = 14.7

s = 8.2

$z_{\alpha/2} = 1.96$

$14.7 \pm 1.96 \times \frac{8.2}{\sqrt{1000}} = 14.7 \pm 0.508 = [14.192, 15.208]$

Confidence interval when s = 3.5 min: [14.483, 14.917]

Transcript

The new standard deviation results in a wider confidence interval
than before, when the standard deviation was 3.5 minutes. More importantly, the
confidence interval extends up to 15.208. So, in this case, the expected time for an oil change
could be greater than 15 minutes, and the manager should not advertise a 15-minute expected
time. What this example illustrates is that as the variability of the values observed within a sample
increases, the width of the confidence interval increases. The manager may want to know why
there is so much variation in the service time and try to improve the process and make it less
variable. Let's revisit this situation again.

Example - Waiting Time (1 of 2) - Slide 29

Speedy Lube wants to advertise a 15-minute oil change to its customers. The manager has spent
a lot of time improving the oil change process and now wants to see if they will be able to
advertise for a 15-minute oil change. He takes 100 random observations and analyzes the data
to find that the average time was 14.7 minutes and the standard deviation was 3.5 minutes. What
is the 95% confidence interval for the population mean?

Transcript

In the original version we took a sample of 1,000 observations. What if we took 100 observations
and got the same sample statistics: a mean of 14.7 minutes and a standard deviation of 3.5
minutes? What would the confidence interval be?

Example - Waiting Time (2 of 2) - Slide 30

n = 100

x̄ = 14.7

s = 3.5

$z_{\alpha/2} = 1.96$

$14.7 \pm 1.96 \times \frac{3.5}{\sqrt{100}} = 14.7 \pm 0.686 = [14.014, 15.386]$

(15.386 is underlined.)

Confidence Interval for n = 1000: [14.483, 14.917].

Transcript

Once again, here is the relevant information. The sample size is 100, which resulted in 14.7 minutes
as the average for the sample and a standard deviation of 3.5 minutes. At the 95% confidence
level we use 1.96, and using the equation we get the interval from 14.014 minutes to
15.386 minutes. So, we are 95% confident that the true average oil change time is between
these two values, an interval which is again wider than the one we had when our sample had 1,000
observations. The smaller sample of 100 observations has also produced a possible value of
15.386, and this could actually be the value of the expected time. Remember, any value in this
interval is a possibility for the population parameter. You can't just choose to focus on what is a
more desirable outcome for you. It is not up to you. So, in this case, the manager can't advertise
15 minutes as the expected time. So, what do we observe here? We observe that smaller
sample sizes make our confidence interval have a larger margin of error; thus it is wider
than a confidence interval based on a larger sample size.
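
The three versions of the Speedy Lube example differ only in the standard deviation and the sample size, so they can be compared side by side. The Python sketch below is an illustration under stated assumptions: the names `margin_of_error`, `scenarios` and the constants are invented here, and it uses the lecture's rounded z-score of 1.96.

```python
from math import sqrt

Z_95 = 1.96          # z-score the lecture uses for a 95% confidence level
SAMPLE_MEAN = 14.7   # average oil change time in minutes (same in every version)
ADVERTISED = 15.0    # the time the manager would like to advertise


def margin_of_error(sample_sd, n, z=Z_95):
    """Margin of error: z-score times the standard error s / sqrt(n)."""
    return z * sample_sd / sqrt(n)


# (label, sample standard deviation, sample size) for the three versions of the example
scenarios = [
    ("original", 3.5, 1000),
    ("more variable", 8.2, 1000),
    ("smaller sample", 3.5, 100),
]

for label, sample_sd, n in scenarios:
    margin = margin_of_error(sample_sd, n)
    upper = SAMPLE_MEAN + margin
    verdict = "below" if upper < ADVERTISED else "not below"
    print(f"{label:>14}: margin = {margin:.3f}, "
          f"upper bound = {upper:.3f} ({verdict} 15 minutes)")
```

Only the first version keeps the whole interval under 15 minutes. A larger standard deviation or a smaller sample each widens the margin of error enough to push the upper bound past 15.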

Precision Vs. Accuracy - Slide 31

The more confident we want to be, the larger the margin of error must be (less precise).
Every confidence interval is a balance between certainty and precision.
Fortunately, we can usually be both sufficiently certain and sufficiently precise to make
useful statements.

Transcript

From the examples we have gone through, we see a few things unfolding, and they all
have to do with the margin of error. First, the more confident we want to be, the larger the margin
of error must be. Remember, in an earlier example, when we switched the confidence level from
95% to 99%, the interval estimate became wider. Just imagine I give you a quiz which has ten
points. You're nervous and want to know how hard my quizzes are. So, I tell you I have given
this quiz to students like you and I am 100% confident that your class average will be between 0
and 10. This statement is absolutely true, but it also doesn't give you any useful information.
Why? Because it's too wide. Confidence without precision is not very useful. Thus, every
confidence interval is a balance between certainty and precision. Fortunately, we can usually be
both sufficiently certain and sufficiently precise to make useful statements. From a statistical
study design perspective, we can do this by paying attention to sample size. We will learn how to
select the sample size in a later lesson.

Lesson 4-2.2 Confidence Interval for Mean in Excel



Daily Temperature NY Sample Excel Sheet (1 of 22) -
Slide 32

The slide shows the Average Daily Temperature data set of New York LaGuardia airport for the
last 25 years. The data consists of a table with three columns: Day number, Date, and New York
(La Guardia). This last column is highlighted in yellow. To the right of this table, are two cells
called "Average" and "Standard deviation", which are vertically listed with their respective values.
Here, the Average is 55.3 and the Standard deviation is 17.379.

Download the Daily Temperature excel file (Refer to Complete Data - First worksheet)

Transcript

In this video, I'm going to show you the concept of a confidence interval. We have over 26,000
data points of average daily temperatures for New York, covering more than 25 years.
I'm going to use this data to illustrate what it means to take a sample and then
use that sample to come up with a confidence interval. I have already gone ahead and
calculated the average based on the entire data set. If you look at this cell, you
will see that it is the average of the New York data (column C), which
gives me an average of 55.2 and a standard deviation of roughly 17.38.

Daily Temperature NY Sample Excel Sheet (2 of 22) -
Slide 33

The slide shows a new worksheet that contains a sample of 200 observations from the Average Daily
Temperature data set. To the right is a list of statistics: sample size (n), mean, standard
error, confidence level, critical value under the t-distribution ($t_{\alpha/2}$), critical value
under the normal distribution ($z_{\alpha/2}$), and the lower and upper values of the confidence
interval. These statistics are listed vertically, but they are not computed yet. The only item that
has a value is the sample size, which is 200.

Download the Daily Temperature excel file (Refer to Sample Data - Second worksheet)

Transcript

So, I have taken one sample, and the sample has 200 points. I used the same
principle that I showed you in my earlier video: I went to Data Analysis, then to
Sampling, and then I selected a sample size of 200. So, here's my sample of 200.

Daily Temperature NY Sample Excel Sheet (3 of 22) -
Slide 34

Transcript

So first I need to know the mean of this sample. The way I find that is by taking the
average of the values that sit right here. So, I'm going to click on the first value
(column C), hold Control+Shift to pick all 200 points, close the parentheses, and press
Return.

Daily Temperature NY Sample Excel Sheet (4 of 22) -
Slide 35

Transcript

And you need to scroll up just a tad to see it again. So, this sample gives me a mean of 56.36.

Daily Temperature NY Sample Excel Sheet (5 of 22) -
Slide 36

Transcript

No instruction provided during this slide

Daily Temperature NY Sample Excel Sheet (6 of 22) -
Slide 37

Transcript

Next, I need to calculate the standard deviation of this sample. I will do that with stdev.s,
where the .s stands for sample. Pick the first value again, Control+Shift+Down, close the
parentheses, Return. It gives me a standard deviation of 17.91 for this sample.

Daily Temperature NY Sample Excel Sheet (7 of 22) -
Slide 38

$\sigma_{\bar{x}} = \frac{s}{\sqrt{n}}$

Transcript

Now, based on this, I need to calculate the standard deviation of the sample means, that is,
the spread I would see if I were taking samples over and over again. The standard deviation of
the sample means is known as the standard error. To compute it, we take the sample standard
deviation and divide it by the square root of n. So, I am going to write that here: my standard
deviation divided by the square root of my sample size, which is exactly what that equation says.
I press Return, and this is the standard error (1.266271), the standard deviation of the sample
means. Next comes the confidence level.
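
The standard error step can be cross-checked outside Excel. The following is a minimal Python sketch, not part of the lecture; the function name `standard_error` is hypothetical, and the inputs are the sample statistics reported in the worksheet for this sample (standard deviation 17.90778, n = 200).

```python
import math


def standard_error(sample_sd: float, n: int) -> float:
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return sample_sd / math.sqrt(n)


# Sample statistics as reported in the lesson's worksheet (assumed inputs).
se = standard_error(17.90778, 200)
print(round(se, 4))
```

The result should agree with the worksheet's standard error to within rounding.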

Daily Temperature NY Sample Excel Sheet (8 of 22) -
Slide 39

Transcript

Let's say here my confidence level is .95.

Daily Temperature NY Sample Excel Sheet (9 of 22) -
Slide 40

Animated curves on a graph. Normal in black, t in red with df from 1 to 50. The red curve starts
lower than the black curve with wider tails.

Transcript

No instruction provided during this slide

Daily Temperature NY Sample Excel Sheet (10 of 22) -
Slide 41

The red and black curves nearly coincide on the graph.

Transcript

So, then what are these two values? In the PowerPoints, I have said that when we don't have
access to the t-distribution, we can go ahead and use a z-value, and for 95% I pretty much
know that it is 1.96. So, remember what a confidence interval of 95% would be. To be exactly
right, we should be using a t-distribution. But in the PowerPoints, I've been telling you that if
your sample size is large enough, we can use a z-distribution, because as the sample size gets
larger and larger, the t-distribution and the z-distribution become very similar. Let me show
you, in this video, a simulation of the difference between a t-distribution and a normal
distribution.

If you look at this animation, the black curve is the normal distribution and the red curve
represents a t-distribution. As its degrees of freedom go up (degrees of freedom is sample size
minus 1), the red curve gets closer and closer to the black one, and at 50 they're almost
identical. So, what I have said in my PowerPoints is that it's easier for you to just use an
approximation when the sample size is large enough. One of the things that we know is that
1.96 corresponds to a 95% confidence interval under the normal distribution. And how do I know
this?

Daily Temperature NY Sample Excel Sheet (11 of 22) -
Slide 42

Draws a histogram with a bell curve. The central portion is shaded and labeled 95 percent. The
left tail is negative z of alpha over 2 and labeled 2.5 percent. The right tail is positive z of alpha
over 2 and labeled 2.5 percent. Everything to the left of the right tail is .975.

Transcript

Remember what a normal distribution looks like: it is a symmetrical curve that looks like this.
If I say I'm looking for a confidence interval of 95%, I am saying that this central area is
95%. So, then I want to know what this z-value is. These are what we call z of α over 2 and
negative z of α over 2; one is positive, and one is negative. Of the remaining 5%, 2.5% is going to
be on this side of the curve and 2.5% is going to be on the other side. The area to the
left of the positive z is therefore .975. So that's what I'm going to enter in order to see what
that value is going to be.

Daily Temperature NY Sample Excel Sheet (12 of 22) -
Slide 43

The slide shows how to compute the critical value using a normal distribution. Once
=NORM.S.INV is typed in the cell next to the Critical value using a normal distribution label,
Excel automatically shows the function syntax, “=NORM.S.INV(probability).” The formula
entered is =NORM.S.INV(0.975).

The histogram presented on Slide 42 - Daily Temperature NY Sample Excel Sheet (11 of 22) is
repeated.

Transcript

No instruction provided during this slide

Daily Temperature NY Sample Excel Sheet (13 of 22) -
Slide 44

The slide shows the outcome of the critical value using a normal distribution, which is 1.9599.

The histogram presented on Slide 42 - Daily Temperature NY Sample Excel Sheet (11 of 22) is
repeated.

Transcript

So first I'm going to show you z-value then I'm going to show you the t-value. So to do that, I'm
going to say norm.s.inverse and I'm going to put everything to the left of that value, so it's .975.
And this is going to be close to 1.96. And that's one of the things that I have said to you that 95%
confidence interval is very common, and you want to remember that is 1.96.
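
For readers who want to check this value outside Excel, here is a minimal Python sketch. It assumes Python 3.8 or later, where the standard library's `statistics.NormalDist` provides an inverse cumulative distribution function that plays the same role as NORM.S.INV; the variable names are illustrative only.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# For a 95% interval, the area to the left of the upper critical value is 0.975.
area_left = 0.975
z_critical = standard_normal.inv_cdf(area_left)
print(round(z_critical, 4))

# Round trip: the area to the left of z_critical should recover 0.975.
print(round(standard_normal.cdf(z_critical), 3))
```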

Daily Temperature NY Sample Excel Sheet (14 of 22) -
Slide 45

The slide shows two bell-shaped curves in a plot with the y-axis ranging from 0 to 0.4 and the
x-axis ranging from −4 to 4. One of the curves is black and has a normal distribution with a peak at
0.4. The other curve is red and has a t-distribution with 50 degrees of freedom and a peak at
0.38. Here, the black and red curves almost overlap.

Transcript

The t-distribution looks much the same. So, let me get rid of this drawing. The t-distribution has
the same shape except its tails are a little longer. So, again, let me go back to my simulation
so you can see that visually. Look at the red line versus the black line. The red line is the
t-distribution. It becomes more and more like a normal distribution as the sample size
increases, but look at its tails. They are slightly longer.

Daily Temperature NY Sample Excel Sheet (15 of 22) -
Slide 46

The slide shows how to compute the critical value using a t-distribution. Once =T.INV is typed in
the cell next to the Critical value using a t-distribution label, Excel automatically shows the
function syntax, “=T.INV(probability,deg_freedom).” The degrees of freedom (DOF) are
calculated as the sample size minus 1. The formula entered is =T.INV(0.975,F1-1),
where F1 is the sample size of 200.

Degrees of Freedom (DOF) = Sample Size (n) − 1.

Transcript

No instruction provided during this slide

Daily Temperature NY Sample Excel Sheet (16 of 22) -
Slide 47

The slide shows the outcome of the critical value using a t-distribution, which is 1.9719.

Transcript

Going back, the t-distribution has a function similar to the one for our normal distribution, and
it's called t.inverse. It is looking for a probability, again .975, and the degrees of freedom,
which is always n − 1, so it is 200 minus 1. I press Return, and you can see that these numbers
are pretty close: z would have given me 1.96, and using a t-distribution I get 1.97. That's why,
in my slides, I have told you that when the sample size is large enough, you can go ahead and
just use 1.96; the difference is minor. But to be accurate, and since we are in Excel, I am going
to use the correct one, which is the t-distribution. So, I'm going to highlight this for you to
remember that you will use this value.

Daily Temperature NY Sample Excel Sheet (17 of 22) -
Slide 48

t of alpha over 2: Critical value (using t-distribution) equals 1.971957.

z of alpha over 2: Critical value (using normal-distribution) equals 1.959964.

Transcript

No instruction provided during this slide

Daily Temperature NY Sample Excel Sheet (18 of 22) -
Slide 49

The slide shows the outcome of the margin of error, which is 2.4970. In addition, there is the
margin of error formula that says: Critical value using t-distribution × standard error = 1.97 ×
1.266.

Margin of Error equals 2.497032.

Transcript

For me to calculate the lower and upper values of my confidence interval, I need to know my
margin of error. And margin of error is simply your critical value, how far you are from the mean in
that distribution times your standard error, which is right here. 1.97 multiplied by 1.266. So, this is
my t-value, and this is my standard error. Okay, and if I multiply that, this is the value I get.

Daily Temperature NY Sample Excel Sheet (19 of 22) -
Slide 50

The slide shows how to calculate the lower value of the confidence interval. To
calculate this value, take the mean of the sample and subtract the margin of error. The formula
for the lower value is =F3-F13, where F3 is 56.36 (mean of the
sample) and F13 is 2.49 (margin of error).

Transcript

No instruction provided during this slide

Daily Temperature NY Sample Excel Sheet (20 of 22) -
Slide 51

The slide shows how to calculate the confidence interval with the upper value. To calculate this
value, use the mean of the sample and add the margin of error. The function for the confidence
interval with the upper value is =F3+F13, where F3 is 56.36 (mean of the sample) and F13 is
2.49 (margin of error).

Confidence Interval Lower Value equals 53.86642. Upper Value equals 58.86048.

Transcript

No instruction provided during this slide

Daily Temperature NY Sample Excel Sheet (21 of 22) -
Slide 52

The slide shows the outcome of the confidence interval with the upper value, which is 58.8604. In
addition, the confidence interval [53.86, 58.86] is hand written on the screen.

Download the Daily Temperature excel file (Refer to Sample Calculation - Third worksheet)

Transcript

So now that I have my margin of error, the lower bound of my confidence interval is going to be
my sample mean minus the margin of error. The equation for my confidence interval is x̄ ± margin
of error. So, in this case, the lower value is going to be 56, the mean of my sample, minus the
margin of error, and the upper value is going to be 56 plus the margin of error. You're 95%
confident that the population parameter, the average temperature for New York, falls somewhere
between these two values. And what was our temperature?
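
The margin of error and the two bounds can be reproduced with a few lines of Python. This is a minimal sketch under stated assumptions: the function name `confidence_interval` is hypothetical, and the inputs (sample mean 56.36345, t critical value 1.971957, standard error 1.266271) are copied from the worksheet values shown in the lesson rather than recomputed.

```python
def confidence_interval(mean: float, critical_value: float, std_error: float):
    """Return (lower, upper, margin) for mean ± critical_value * std_error."""
    margin = critical_value * std_error
    return mean - margin, mean + margin, margin


# Worksheet values from the lesson (assumed inputs, not recomputed here).
lower, upper, margin = confidence_interval(56.36345, 1.971957, 1.266271)
print(round(margin, 4), round(lower, 4), round(upper, 4))

# The full-data average reported in the lesson is 55.2.
print(lower <= 55.2 <= upper)
```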

Daily Temperature NY Sample Excel Sheet (22 of 22) -
Slide 53

The slide shows the Average Daily Temperature data set and the "Average" and "Standard
deviation" cells shown in Slide 32 - Sample Excel Sheet (1 of 22). The Average is 55.2 and the
Standard deviation is 17.379.

Transcript

Our actual average temperature was 55.2, which falls inside the interval. So, in this case, we got
a sample that gave us the right answer. There was a 5% chance that we would have drawn a sample
whose interval did not contain this value. Now, as far as the interval is concerned, every value
in it is as likely as any other.

Lesson 4-2.3 Impact of Confidence Level Illustrated in Excel

Media Player for Video

Daily Temperature NY Sample Excel Sheet (1 of 13) -
Slide 54

This slide contains a sample of 200 observations from the temperature population, listed on the
left of the slide. To the right is a vertical list of statistics with their respective values:
sample size (n) = 200. Mean of This Sample = 56.36345. Standard deviation of the sample = 17.90778.
Standard Error (standard deviation of Sampling Means) = 1.266271. t of alpha over 2: Critical
value (using t-distribution) = 1.971957. z of alpha over 2: Critical value (using normal-distribution)
= 1.959964. Margin of Error = 2.497032. Confidence Interval: Lower Value = 53.86642. Upper
Value = 58.86048.

Download the Daily Temperature excel file (Refer to Sample Calculation - Third worksheet)

Transcript

So let's go back to our example that we were using. I showed you how to calculate the
confidence interval when we had a confidence level of 95%. Let's say I have the exact example
and I now want to calculate the 90% confidence interval and 99% confidence interval. So how
would the values change?

Daily Temperature NY Sample Excel Sheet (2 of 13) -
Slide 55

This slide contains all of the information listed on the right side of Slide 54 - Daily Temperature
NY Sample Excel Sheet (1 of 13).

Draws a histogram with a bell curve. Central part of the curve is shaded in as 90 percent. Two
tails are 5 percent each. 95 percent is everything to the left of the right tail.

Transcript

No instruction provided during this slide

Daily Temperature NY Sample Excel Sheet (3 of 13) -
Slide 56

Draws a histogram with a bell curve. Central part of the curve is shaded in as 99 percent. Two
tails are 0.5 percent (.005) each. .995 is everything to the left of the right tail.

Transcript

Your confidence level is going to be right here, and that's 90%. If you're going to go to 90%,
this is what it means: of the remaining 10%, 5% is here and 5% is here, so everything to the left
of this value would be .95. In the case of 99%, there is .005 here and .005 here. So, in this
case, everything to the left of this is .995. So that's what would change.
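
The rule used here, that the probability passed to the inverse function is 1 − α/2, can be written as a small helper. This is an illustrative Python sketch; the function name `area_left_of_upper_critical_value` is hypothetical and does not come from the lecture.

```python
def area_left_of_upper_critical_value(confidence_level: float) -> float:
    """Cumulative probability to pass to an inverse function such as T.INV.

    alpha is the total tail area; half of it sits in each tail, so the area
    to the left of the upper critical value is 1 - alpha / 2.
    """
    alpha = 1.0 - confidence_level
    return 1.0 - alpha / 2.0


for level in (0.90, 0.95, 0.99):
    print(level, round(area_left_of_upper_critical_value(level), 3))
```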

Daily Temperature NY Sample Excel Sheet (4 of 13) -
Slide 57

The slide shows how to use the Critical value using a t-distribution when the confidence level is
0.9. Once written =T.INV the program automatically shows the syntax to get the t-distribution,
which is “=T.INV(probability,deg_freedom).” The degree of freedom (DOF) is calculated as the
sample size minus 1. The function for the t-distribution is =T.INV(0.95,199).

Transcript

So in our case, it would be t.inverse, and we're going to put .95 when we are talking about the
90% level. The degrees of freedom is again n − 1, so 200 minus 1 is 199. For 99%, it is going to be
t.inverse with everything to the left, which is .995, and again 200 minus 1 is 199.

Daily Temperature NY Sample Excel Sheet (5 of 13) -
Slide 58

The slide shows the outcome of the critical value using t-distribution when the confidence level is
0.9, which is 1.6525. The same process is repeated with the confidence level 0.99. Once written
=T.INV in the corresponding cell, the program automatically shows the syntax to get the t-
distribution, which is “=T.INV(probability,deg_freedom).” The degree of freedom (DOF) is
calculated as the sample size minus 1. The function for the t-distribution is =T.INV(0.995,199).

t of alpha over 2: Critical value (using t-distribution) = 1.652547 for a Confidence level of 0.9 and
2.60076 for a Confidence level of 0.99.

Transcript

No instruction provided during this slide

Daily Temperature NY Sample Excel Sheet (6 of 13) -
Slide 59

The slide shows the outcome of the critical value using t-distribution when the confidence level is
0.99, which is 2.6007. In addition, the slide shows how to use the margin of error function when
the confidence level is 0.9. To calculate the margin of error, use the critical value using a t-
distribution and multiply it by the standard error. The function for the margin of error is =G11*F7,
where G11 is 1.6525 (critical value using a t-distribution for 0.9) and F7 is 1.266 (standard error).

Transcript

So now what is the margin of error? The margin of error is simply your critical value multiplied by
the standard error, which is this.

Daily Temperature NY Sample Excel Sheet (7 of 13) -
Slide 60

The slide shows the outcome of the margin of error when the confidence level is 0.9, which is
2.0925. In addition, the slide repeats the same process for a confidence level of 0.99. To
calculate the margin of error, use the critical value using a t-distribution and multiply it by the
standard error. The function for the margin of error is =H11*F7, where H11 is 2.6007 (critical
value using a t-distribution for 0.99) and F7 is 1.266 (standard error).

Margin of Error = 2.092572 for a Confidence level of 0.9 and 3.293268 for a Confidence level of
0.99.

Transcript

No instruction provided during this slide

Daily Temperature NY Sample Excel Sheet (8 of 13) -
Slide 61

The slide shows the outcome of the margin of error when the confidence level is 0.99, which is
3.2932.

Transcript

So, this is the margin of error for 90%, and this is the margin of error for 99%. So now we have
the margins of error. One of the things that you notice is that the margin of error goes up. If I
go from 90% to 95%, the margin of error goes up, and if I go from 90% to 99%, it goes up even
more. Because the margin of error gets higher and higher as we increase our confidence
level, the width of our confidence intervals will increase also.

Daily Temperature NY Sample Excel Sheet (9 of 13) -
Slide 62

The slide shows how to calculate the confidence interval for the lower value when the confidence
level is 0.9. To calculate this value, use the mean of the sample and subtract the margin of error.
The function is =F3-G13, where F3 is 56.36 (mean of the sample) and G13 is 2.0925 (margin of
error for confidence level of 0.9).

Transcript

No instruction provided during this slide

Daily Temperature NY Sample Excel Sheet (10 of 13) -
Slide 63

The slide shows the outcome of the confidence interval for the lower value when the confidence
level is 0.9, which is 54.2708. In addition, the screenshot shows how to calculate the
confidence interval for the upper value when the confidence level is 0.9. To calculate this value,
use the mean of the sample and add the margin of error. The function is =F3+G13, where F3 is
56.36 (mean of the sample) and G13 is 2.0925 (margin of error for confidence level of 0.9).

Transcript

So, let's calculate the confidence intervals here. For the 90%, it's the mean minus its margin of
error, and the mean plus its margin of error.

Daily Temperature NY Sample Excel Sheet (11 of 13) -
Slide 64

The slide shows the outcome of the confidence interval for the lower and upper values when the
confidence level is 0.9, which are 54.2708 and 58.456, respectively. In addition, the screenshot
shows how to calculate the confidence interval for the lower value when the confidence
level is 0.99. To calculate this value, use the mean of the sample and subtract the margin of
error. The function is =F3-H13, where F3 is 56.36 (mean of the sample) and H13 is 3.293 (margin
of error for confidence level of 0.99).

Transcript

No instruction provided during this slide

Daily Temperature NY Sample Excel Sheet (12 of 13) -
Slide 65

The slide shows the outcome of the confidence interval for the lower value when the confidence
level is 0.99, which is 53.07018. In addition, the screenshot shows how to calculate the
confidence interval for the upper value when the confidence level is 0.99. To calculate this value,
use the mean of the sample and add the margin of error. The function is =F3+H13, where F3 is
56.36 (mean of the sample) and H13 is 3.293 (margin of error for confidence level of 0.99).

Transcript

For the 99%, the same thing we repeat. The mean minus its margin of error and mean plus its
margin of error.

Daily Temperature NY Sample Excel Sheet (13 of 13) -
Slide 66

The slide shows the outcome of the confidence interval for the upper value when the confidence
level is 0.99, which is 59.6567. In addition, there is a handwritten summary of the results of the
three confidence levels. These results are:

90% = [54.27, 58.45]

95% = [53.86, 58.86]

99% = [53.07, 59.65]

Download the Daily Temperature excel file (Refer to Sample 3 Confidence Levels - Fourth
worksheet)

Transcript

I'm going to just write it here side by side so we can compare. So, this is my 90% confidence,
this is my 95% confidence, and this is my 99% confidence. The 90% interval runs from
54.27 all the way to 58.45. The 95% interval is 53.86 to 58.86. And for 99%,
it's 53.07 to 59.65. So, the width starts increasing. We become more and more confident that we
have captured the true population mean in this interval. However, we are losing precision; the
interval gets wider. For example, at 99% the margin of error is roughly 3.3 degrees on either side
of the sample mean, and that might be a big deal.
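
The side-by-side comparison can be reproduced in Python. The sketch below is illustrative only: the constant names are hypothetical, and the sample mean, standard error, and t critical values are taken from the worksheet values shown in this lesson (1.6525, 1.971957, and 2.60076) rather than recomputed, because the Python standard library has no inverse t-distribution function.

```python
SAMPLE_MEAN = 56.36345
STANDARD_ERROR = 1.266271

# t critical values for 199 degrees of freedom, as shown in the lesson's slides.
T_CRITICAL = {0.90: 1.6525, 0.95: 1.971957, 0.99: 2.60076}

intervals = {}
for level, t_value in T_CRITICAL.items():
    margin = t_value * STANDARD_ERROR
    intervals[level] = (SAMPLE_MEAN - margin, SAMPLE_MEAN + margin)

for level, (lower, upper) in intervals.items():
    width = upper - lower
    print(f"{level:.0%}: [{lower:.2f}, {upper:.2f}]  width = {width:.2f}")
```

The printed bounds are rounded, so the last digit may differ slightly from the handwritten summary on the slide, which appears to truncate. The widths should increase with the confidence level.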

Lesson 4-3 Confidence Interval for Population Proportion

Lesson 4-3.1 Confidence Interval for Population Proportion

Media Player for Video

Population Proportion - Slide 67

What proportion of your customers would recommend your business to their friends?
What proportion of voters will vote for candidate A?
What proportion of American Airlines flights leave on time?

Transcript

There are many times that you're interested in inferring about a population proportion. For
example, what proportion of your customers would recommend your business to their friends?
What proportion of voters would vote for candidate A? What proportion of American Airlines flights
leave on time? When you ask questions like this, there are only two possible outcomes. For
instance, someone either recommends the business or not, they vote for the candidate or not, or
the flight leaves on time or not. We can use sample studies to make inferences about the
population proportion in just the same way we did for the mean. The idea is the same; only the
math is different. So, let's explore this now.

Sample Proportion - Slide 68

A drawing of a jar with red and blue marbles inside. A drawing of a smaller jar with fewer marbles
inside.

Transcript

This jar of marbles can represent the two possibilities when we look at a proportion. For
instance, the red marbles could be all the voters who would vote for candidate A, and the blue
marbles those who won't. If this jar were the population of all voters, how would I know before
Election Day what percentage will vote for candidate A? This is exactly what political polling is all
about. We take a sample and then use that information to make an inference about the
population at large. A randomly selected sample is used to draw conclusions about the
population, and that is part of the statistical inference process. But because samples are drawn
randomly, all forms of statistical inference carry a degree of uncertainty, and this is
where our confidence interval becomes important.

Confidence Interval for a Population Proportion - Slide
69

Desired level of confidence (1 − α)

Sample proportion p̂

$\left[\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right]$

Margin of Error is this portion of the equation: $z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

Transcript

To develop the confidence interval, we need to state the level of confidence, which is 1 − α. Then
find the sample proportion, denoted by p̂. Once you have these two, the confidence
interval for the population proportion is calculated using the following mathematical operation. As
was the case for inferring about the mean, we are adding and subtracting the margin of error
from our sample proportion, where z of α over 2 shows the number of standard errors away from
the center of the sampling distribution.

Are Local Job Market Conditions Good? (1 of 2) -
Slide 70

From March 31 to April 2, 2016, Gallup interviewed a random sample of 1,526 adults living in all
50 US states and the District of Columbia. Of the 1,526 adults, 779 thought the market conditions
were good.

What is the 95% confidence interval for the proportion of all adults who thought the market
conditions were good?

Transcript

Each day Gallup asks a randomly selected sample of Americans many different questions and
tracks their answers. One question is their opinion on whether the local job market condition has
improved. From March 31 through April 2, 2016, Gallup interviewed a random sample of 1,526
adults living in all 50 states, as well as the District of Columbia. There were 779 adults who
thought market conditions were good. What is a 95% confidence interval for the proportion of all
adults who think the market condition is good?

Are Local Job Market Conditions Good? (2 of 2) -
Slide 71

n = 1,526

p̂ = 779 ÷ 1,526 = 0.51

Confidence level = 95% ($z_{\alpha/2} = 1.96$)

Margin of Error: $z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 1.96 \times \sqrt{\frac{0.51 \times (1-0.51)}{1526}} = 0.025$

0.51 ± 0.025 = [0.485, 0.535]

(Abouelala & Ander, 2015)

(Abouelala & Ander, 2015)

Transcript

First, there are 1,526 people in our sample. Then 51% is the sample proportion who believe the job
market is good. The confidence level is 95%, which means there is a 5% chance that our sample
is not a good sample, but there is a 95% chance that the interval contains the true population
proportion. The margin of error is the value of z of α over 2 times the standard error;
in this case it is .025, or 2.5%. So, based on these calculations, we get a
confidence interval for the population proportion, which is somewhere between 48.5% and 53.5%.
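
The Gallup calculation can be checked with a short Python sketch. The function name `proportion_confidence_interval` is hypothetical. Note one assumption that differs from the slide: the code keeps the unrounded sample proportion 779/1,526, whereas the slide rounds it to 0.51 first, so the bounds may differ from the slide in the third decimal place.

```python
import math


def proportion_confidence_interval(successes: int, n: int, z: float):
    """Return (p_hat, std_error, margin, lower, upper) for a proportion."""
    p_hat = successes / n
    std_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * std_error
    return p_hat, std_error, margin, p_hat - margin, p_hat + margin


# Gallup example from the lesson: 779 of 1,526 adults, 95% level (z = 1.96).
p_hat, std_error, margin, lower, upper = proportion_confidence_interval(779, 1526, 1.96)
print(round(p_hat, 3), round(std_error, 4), round(margin, 3))
print(round(lower, 3), round(upper, 3))
```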

What Does 95% "Confidence" Really Mean? - Slide 72

Standard Error = $\sqrt{\frac{0.51 \times (1-0.51)}{1526}} = 0.0128$

95% confidence interval: [0.485, 0.535]

A histogram is drawn with a bell curve representing 95 percent. Three standard errors on the left
tail is calculated as 0.51 minus 3 times 0.0128, which equals 0.4716. Markers at the
two-standard-error points show the 95 percent confidence interval: [0.485, 0.535].

Transcript

It is really important to understand what 95% confidence means. It goes back to the central limit
theorem, which shows that if you keep taking samples from a population, each sample will have
an average or proportion slightly different from another sample. This is one reason why you get
different results from various political polls. But if you plot these sample means or sample
proportions, we get a normal distribution centered, in the Gallup case, around .51, and the spread
is based on the standard error. We go out in each direction up to plus or minus three standard
errors to get the spread. At a 95% confidence level we are roughly at two standard errors, which
means the confidence interval for the population proportion lies within these two boundaries. And
based on our calculation, those were .485 and .535.

Now if you take another sample, you will have a different result. A 95% confidence level
implies that if you take a total of 100 samples and build an interval from each one, about 95 of
those intervals will contain the true population proportion. However, five samples or so will
result in proportions in the tails, and that is why we are sometimes wrong. In this case, there is
roughly a 5% chance of getting a non-representative sample, which will mislead us. Here you see
results for 95% confidence intervals when we took a total of 100 samples.

Animated Example - 95% C.I. - Slide 73

A bar graph with a line at 0.2 and several green bars and a few red bars along that line.

Transcript

For this simulation, I assume that the actual population proportion was .2 and that it was known;
it is shown by the solid black line. Then I took 100 samples of 100 observations each. In this
particular simulation we ended up with six samples, shown in red, for which the 95% confidence
interval developed from the sample information would have given us the wrong impression. The
green bars represent the 95% confidence intervals for the other 94 samples, which would have
resulted in confidence intervals that included the actual population proportion. You can watch my
illustration video later on, where you can see me develop this graph.
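
A simulation in the same spirit can be sketched in Python. This is not the lecturer's Excel animation: the function name `simulate_coverage`, the random seed, and the use of Python's `random` module are assumptions made for illustration. The setup follows the description above: a known population proportion of .2, 100 samples of 100 observations each, and a 95% interval (z = 1.96) built from each sample.

```python
import math
import random


def simulate_coverage(true_p=0.2, n=100, num_samples=100, z=1.96, seed=0):
    """Count how many sample-based intervals contain the true proportion."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    covered = 0
    for _ in range(num_samples):
        # Draw one sample of n yes/no observations.
        successes = sum(rng.random() < true_p for _ in range(n))
        p_hat = successes / n
        margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= true_p <= p_hat + margin:
            covered += 1  # a "green bar": the interval contains true_p
    return covered


covered = simulate_coverage()
print(f"{covered} of 100 intervals contain the true proportion")
```

The count varies with the seed, just as the number of red bars varies from one run of the animation to the next, but it should usually land in the neighborhood of 95.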

Multiplier and Confidence Level - Slide 74

$z_{\alpha/2}$ is the multiplier that represents the level of confidence desired (also known as the
critical value).

$\left[\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right]$

Multiplier and confidence level example

Confidence Level (%)    Multiplier ($z_{\alpha/2}$)
90                      1.645
95                      1.96 (often rounded to 2)
98                      2.33
99                      2.58

Transcript

Z of α over 2 is the multiplier which represents the level of confidence desired, also known as the
critical value. It is how many standard errors we are away from the mean of the sampling
distribution. This multiplier gets larger as we increase the confidence level, thus resulting in a
wider confidence interval.
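
The multipliers in the table can be regenerated from the standard normal distribution. The following is a minimal Python sketch, assuming Python 3.8 or later for `statistics.NormalDist`; the function name `multiplier` is hypothetical.

```python
from statistics import NormalDist


def multiplier(confidence_level_percent: float) -> float:
    """z multiplier for a two-sided interval at the given confidence level."""
    alpha = 1 - confidence_level_percent / 100
    # Area to the left of the upper critical value is 1 - alpha / 2.
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (90, 95, 98, 99):
    print(level, round(multiplier(level), 3))
```

The values grow with the confidence level, which is why a higher confidence level produces a wider interval for the same sample.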

Let's Practice - Slide 75

From March 31 to April 2, 2016, Gallup interviewed a random sample of 1,526 adults living in all
50 US states and the District of Columbia. Of the 1,526 adults, 779 thought the market conditions
were good.

What is the 99% confidence interval for the proportion of all adults who thought the market
conditions were good?

Transcript

Going back to the Gallup example, we found the 95% confidence interval and now I want you to
find the 99% confidence interval.

Let's Practice - Solved - Slide 76

n = 1,526

p̂ = 779 ÷ 1,526 = 0.51

Confidence level = 99% ($z_{\alpha/2} = 2.58$)

Margin of Error: $z_{\alpha/2} \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 2.58 \times \sqrt{\frac{0.51 \times (1-0.51)}{1526}} = 0.033$

99% Conf. Interval: 0.51 ± 0.033 = [0.477, 0.543]

95% Conf. Interval: 0.51 ± 0.025 = [0.485, 0.535]

Transcript

As before, the sample proportion, p̂, is 51%. Now the confidence level is 99%, which means there is
a 1% chance that our sample is not a good representative sample. But there is a 99% chance
that the interval contains the true population proportion. The margin of error, which is the value
of z of α over 2 times the standard error, will be .033, or 3.3%. So based on these calculations
we get a confidence interval for the population proportion, which is somewhere between 47.7% and
54.3%, and this is a wider interval than what we got when we used the 95% confidence level, which
gave us 48.5% to 53.5%. Let's look at what changing the confidence level does to the width of
the confidence interval.

90% C.I. Vs. 99% C.I. - Slide 77

A bar graph representing 90 percent Confidence Interval with a line at 0.2 and several green bars
and a few red bars along that line. A second bar graph representing 99 percent Confidence
Interval with a line at 0.2 and even more green bars and one red bar along that line.

Transcript

First, we do the experiment using 100 samples of 100 observations each, using a confidence
level of 90% and then 99%. As you can see here, when we switch to the 99% confidence level,
the interval gets wider. Look at the green bars for 90% versus 99%. So, this wider
interval is also more likely to contain the true population parameter. Again, there is only one red
bar for the 99% confidence interval, which is the only sample that would have missed the true
population parameter. However, while we are becoming more accurate, we are doing so by becoming
less precise. In other words, the margin of error is larger for the 99% confidence interval than
for the 90%. So, there is a tradeoff here; to get more precise we can increase the sample size.
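
A small simulation can mimic this experiment. The sketch below is not the tool used in the video. It assumes the true proportion is 0.2 (the line drawn on the graphs), and the function name simulate and the fixed random seed are choices made here so that both confidence levels are compared on the same samples.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate(confidence, p_true=0.2, n=100, n_samples=100, seed=1):
    """Draw n_samples samples and build one interval from each.

    Returns (misses, average_width): how many intervals fail to
    contain p_true, and the mean interval width.
    """
    rng = random.Random(seed)  # fixed seed so every level sees the same samples
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    misses, total_width = 0, 0.0
    for _ in range(n_samples):
        successes = sum(rng.random() < p_true for _ in range(n))
        p_hat = successes / n
        margin = z * sqrt(p_hat * (1 - p_hat) / n)
        if not (p_hat - margin <= p_true <= p_hat + margin):
            misses += 1  # this sample would be a red bar
        total_width += 2 * margin
    return misses, total_width / n_samples


for level in (0.90, 0.99):
    misses, width = simulate(level)
    print(f"{level:.0%}: {misses} misses, average width {width:.3f}")
```

Because the same samples are reused, every 99% interval contains the matching 90% interval, so the 99% run can never have more misses, only wider bars.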

Impact of Sample Size - Slide 78

A bar graph representing 99 percent Confidence Interval, n equal 100. It has a line at 0.2, several
green bars and one red bar along that line. A second bar graph representing 99 percent
Confidence Interval, n equals 1000. It has a line at 0.2, several smaller green bars and one red
bar along that line.

Transcript

Looking at the 99% confidence level, using samples of 100 observations each and comparing them to
samples of 1,000 observations each illustrates this quite well. As you see in the bottom
graph, with a sample size of 1,000 the width of the interval is much narrower than it is in the top
graph, where the sample size is 100. So, increased sample size has given us more precision. Once
again, we can determine the right sample size for the level of accuracy and precision we desire.
Now that you understand how to develop a confidence interval for a mean and a population
proportion, we can turn our attention to how to improve our interval estimate. As you see, that is
closely related to sample size. In the next lesson we will explore this topic in detail.
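
To put rough numbers on the effect of sample size, here is a short Python sketch. It assumes a sample proportion of 0.2 (the line drawn on the graphs); the helper name margin_of_error is made up for this illustration.

```python
from math import sqrt
from statistics import NormalDist

z_99 = NormalDist().inv_cdf(0.995)  # two-sided 99% critical value
p_hat = 0.2  # assumed sample proportion, matching the line drawn at 0.2


def margin_of_error(p, n, z):
    """Margin of error for a proportion: z times the standard error."""
    return z * sqrt(p * (1 - p) / n)


margin_100 = margin_of_error(p_hat, 100, z_99)
margin_1000 = margin_of_error(p_hat, 1000, z_99)
print(f"n = 100:  ±{margin_100:.3f}")
print(f"n = 1000: ±{margin_1000:.3f}")
print(f"ratio: {margin_100 / margin_1000:.2f}")
```

Since n sits under a square root, ten times the sample size shrinks the margin of error by a factor of about 3.16, not 10.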

Lesson 4-3.2 Confidence Interval for Population Proportion in Excel
Media Player for Video

Excel Sheet (1 of 19) - Slide 79

The slide shows a new worksheet with a data set of the New York Stock Exchange closing data.
The first column is named "New Symbols", which lists the symbols of different stocks. The second
column is called “Close” and contains the closing price of the stock. The third column is called
“Net Chg” and the fourth column is called “% Chg”.

Download the New York Stock Exchange excel file (Refer to Data - First worksheet)

Transcript

In this video, I am going to show you how to calculate a confidence interval for a population
proportion. The data I'm using is the data we downloaded from the New York Stock Exchange
closing data at the end of a day in September of 2015. Symbols have been changed to comply
with the copyright laws. Let's say that in this situation I'm interested in knowing what proportion of
the stocks had a positive change at the end of the day. So, what we have recorded from the New
York Stock Exchange is the value of the stock at the end of the day, what the net change was,
and whether it was a negative change or a positive change. So, it has gone up if this
percentage is positive and it has gone down if this value is negative. If I'm interested in knowing
the proportion of what has gone up or not (Columns: Close, Net Change, Percent Change), first I
have to classify each stock as a positive value or a negative value. So how am I going to do that?
I'm going to answer a question like, has it gone up?

Excel Sheet (2 of 19) - Slide 80

Creates column label Up?

The slide shows a new column called "Up?" placed to the right of the "% Chg" column. Here, the
slide shows how to complete this new column. "=IF" is typed in the first "Up?" value cell; the
program automatically shows the syntax of this function: “=IF(logical_test, [value_if_true],
[value_if_false]).” The function for this first value is =IF(D4>0,1,0), where D4 is the first value of
the "% Chg" column (−2.38).

Transcript

And to do this, I'm going to use a logical function from Excel. So, I'm going to say if this value
(Value in Percent Change column for that stock) that I see here is a positive value, so it's greater
than 0, then return a value of 1. Otherwise, return a value of 0.
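
For readers who think in code, the same classification can be sketched in Python. The list below is hypothetical: only the first value (−2.38) comes from the slide, and the rest are made up for illustration.

```python
# Hypothetical "% Chg" values; only the first (-2.38) appears on the slide.
pct_change = [-2.38, -1.10, -0.45, 0.00, 0.72]

# Same rule as =IF(D4>0,1,0): 1 if the change is positive, otherwise 0.
up = [1 if change > 0 else 0 for change in pct_change]
print(up)
```

Note that a stock with no change (0.00) is coded 0, just as the Excel formula would code it, because the test is strictly greater than 0.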

Excel Sheet (3 of 19) - Slide 81

The slide shows the first eight cells of the "Up?" column with the values: 0, 0, 0, 0, 0, 0, 0, 1.
These values were filled by selecting the first created cell and dragging its fill handle (shown as
a ‘+’ sign) down until reaching the desired cell.

Transcript

So, if you return this and if I've done it correctly, the first stock that I have classifies as a 0
because it did not go up. So, 0 means false. No, it didn't go up. And let me just copy this down
until we get to a positive value. Here we have a stock that went up on that day. So, if I've done
this correctly, this should return a value of 1. And indeed, it does. For everything else in between
because everything is negative, it returned a value of 0. Here because the value that sits in the
cell that I'm checking, D11, is greater than 0 it returns a 1 and I get to know that that's a stock that
has gone up. So, you can essentially copy this down all the way.

Excel Sheet (4 of 19) - Slide 82

Download the New York Stock Exchange excel file (Refer to Up - Second worksheet)

The slide shows the process of completing the "Up?" column by selecting the eight created cells
and dragging the fill handle (shown as a ‘+’ sign) down until reaching the end of the column.

Transcript

So, I put my cursor here, double click, and it populates the entire thing. So again, we have all
of the stocks on that day, which is over 2,100 stocks recorded here. So first, I'm going to find out
how many went up. I'm going to say number that went up. And that number is simply the sum of
everything here. Remember, they get a 0 if they didn't go up. So, if I add all the ones, I will find
out the total number that went up at the end of that day. So, control shift down, close the
parenthesis, return, scroll up to see the number.

Excel Sheet (5 of 19) - Slide 83

Selects all data in Up? column.

The slide shows the whole "Up?" column completed.

To the right, a new cell called "No. Up" is created. Here, the slide shows how to use the Excel
sum function to calculate the number of Ups in the data. "=SUM" is typed to the right of the No.
Up cell; the program automatically shows the syntax of this function: “=SUM(number1, number2,
…).” After typing the sum function, select the first cell of the "Up?" column and then hold
Ctrl + Shift and press the down arrow button. This selects the whole column.

Transcript

So, 383 stocks went up on that day. So, if I want to know what the proportion is, I have to take
this number and divide it by the total number of stocks that were reported at the end of the
day. So, in order for me to know how many stocks are in this column, I need to find the total
number, which is count.

Excel Sheet (6 of 19) - Slide 84

The slide shows the outcome of the No. Up: 383.

In addition, a new cell called "count" is created and placed below the "No. Up" cell. To the right of
this new cell, the slide shows how to use the Excel count function. "=COUNT" is typed into the
"count" value cell; the program automatically shows the syntax of this function: “=COUNT(value1,
value2, …).” After typing the count function, select the first cell of the "Up?" column and then
hold Ctrl + Shift and press the down arrow button. This selects the whole column.

Transcript

No instruction provided during this slide

Excel Sheet (7 of 19) - Slide 85

The slide shows the outcome of the count cell: 2192.

Transcript

Count is simply – I can ask Excel to count the number of values that it finds here. So, if I put my
cursor again here, control shift down, close the parenthesis, return, I will see that on that day I
had 2,192 stocks that were reported in this data set.

Excel Sheet (8 of 19) - Slide 86

The slide shows a new cell called "Prop 'up'" placed a few cells below the count cell. Here, the
slide shows how to calculate the proportion of the stock that has increased over the day. To
calculate this value, use the number of Ups and divide it by the count. The function is =H4/H5,
where H4 is 383 (No. Up) and H5 is 2192 (count).

Transcript

No instruction provided during this slide

Excel Sheet (9 of 19) - Slide 87

The slide shows the outcome of the Prop "up" cell: 0.174726.

Download the New York Stock Exchange excel file (Refer to Count Prop - Third worksheet)

Transcript

So, what was the proportion of ups? It is simply 383 divided by 2,192, so those two cells divided.
So roughly about 17.5% of the stocks went up, the rest did not. So, this is definitely going to be a
rough day for the stock market. So now let's look at what happens if we do sampling. So, the idea
behind sampling is that you have a sample. So, let's say you have 127 stocks that you have in
your 401k or you have invested in or whatever. So, you basically hold a sample of the entire
stock market. How did you fare? And can you estimate, based on your data, what the number
of ups and downs was in the stock market as a whole?
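
The SUM, COUNT, and division steps translate directly to Python. In this sketch the short list holds the first eight "Up?" values described on the slide, while the totals 383 and 2,192 are taken from the slides rather than recomputed, since the full data set is not reproduced here.

```python
# First eight "Up?" values described on the slide.
first_eight = [0, 0, 0, 0, 0, 0, 0, 1]
print(sum(first_eight), len(first_eight))  # SUM counts the ones, COUNT counts the cells

# Totals reported on the slides for the full data set.
number_up = 383   # =SUM over the "Up?" column
count = 2192      # =COUNT over the "Up?" column

proportion_up = number_up / count  # same as =H4/H5
print(f"{proportion_up:.6f}")
```

Summing a column of zeros and ones works as a count of the ones, which is why the proportion is just the sum divided by the count.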

Excel Sheet (10 of 19) - Slide 88

The slide shows a new worksheet named "one sample portfolio of 127 stocks using 09/15 data"
with a new data set of stocks. The worksheet is set up in the same fashion as the one presented
in Slide 79 - Excel Sheet (1 of 19). In addition, the "Up?" column has been placed to the right of
the "% Chg" column.

The slide shows how to fill out this new column. "=IF" is typed in the first "Up?" value cell; the
program automatically shows the syntax of this function: “=IF(logical_test, [value_if_true],
[value_if_false]).” The function for this first value is =IF(D5>0,1,0), where D5 is the first value of
the "% Chg" column (−0.89).

Selects value in Percent Change column for that stock.

Download the New York Stock Exchange excel file (Refer to Sample Data - Fourth worksheet)

Transcript

So, we will repeat the same thing here. I want to know how many went up here in my stock, in my
portfolio, and I'm going to repeat it the same way. If this value is a positive value which is greater
than 0, it will return a value of 1; otherwise, return a value of 0 and I'm going to say return. Put my
cursor here and repeat it. So, each of the 127 stocks that I own also gets coded as a 0 or a 1.

Excel Sheet (11 of 19) - Slide 89

The slide shows the completed "Up?" column. To the right, there are three cells, vertically listed,
with their respective values as follows:

"Sum up": 20.

"count": 127.

"Prop up": =I4/I5, where I4 is 20 (sum up) and I5 is 127 (count.)

Transcript

So, what is the proportion for my portfolio? So again, I'm going to find the sum of the ups and that
will be the sum of the values I have here. Then I'm going to look at the count. I should have 127
because that's what I have told you, but it never hurts to redo it. Count. And as I mentioned, I
have 127. So now my proportion of up for my portfolio is simply 20 divided by 127. So, in terms of
ups and downs, only 15.75% went up in mine.

Excel Sheet (12 of 19) - Slide 90

The slide shows the outcome of the "Prop up" cell: 0.15748.

Transcript

No instruction provided during this slide

Excel Sheet (13 of 19) - Slide 91

The slide shows two new cells called "95% conf" and "z" placed a few cells below the "Prop up"
cell. Here, the slide shows how to calculate the z value. "=NORM.S.INV" is typed to the right of
the z cell; the program automatically shows the syntax of this function:
“=NORM.S.INV(probability).” The function is =NORM.S.INV(0.975). In addition, the value for the
95% conf cell is empty.

Transcript

So, can I come out with a confidence interval here? So, if I assume a 95% confidence interval,
then what I know is that z value is going to be – I know it's 1.96 but I'm just going to show it to
you. Remember? I have .975 here and I get a 1.96. This is my z value.

Excel Sheet (14 of 19) - Slide 92

The slide shows the outcome of the z score: 1.9599.

There is also a hand drawn histogram with a bell curve with a 95% confidence interval marked
with two vertical lines in the plot. The outside area has 2.5% to the right and 2.5% to the left and
a z score of 1.96.

Transcript

If you need a reminder of why I had to put that, it is because if you look at the normal
distribution, I'm saying that I'm going to capture the middle 95%. So, 2.5% is here, 2.5% is here,
so if you look at this z value, everything to the left of it is the .95 that's in the middle plus this
.025. So, you're saying .975 would have a z of 1.96. I'm trying to come up with my confidence
interval but remember, what is the confidence interval?
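
The same lookup can be done with Python's standard library. This is a sketch for comparison with =NORM.S.INV(0.975), not something shown in the video; the variable names are chosen here.

```python
from statistics import NormalDist

confidence = 0.95
tail = (1 - confidence) / 2       # 0.025 in each tail
cumulative = confidence + tail    # 0.975 lies to the left of the critical value

z = NormalDist().inv_cdf(cumulative)  # counterpart of =NORM.S.INV(0.975)
print(round(z, 4))
```

The function works with the area to the left of z, which is why 0.975 is entered rather than 0.95.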

Excel Sheet (15 of 19) - Slide 93

The slide shows a new cell called "SE" placed below the z cell. Here, the slide shows how to
calculate the standard error of the data. "=SQRT" is typed to the right of the SE cell; the program
automatically shows the syntax of this function: “=SQRT(number).” The function used is
=SQRT((I7*(1-I7))/I5), where I7 is 0.1574 (Prop up) and I5 is 127 (count).

To the right of the statistics list there is a handwritten formula that says: $\hat{p} \pm z_{\alpha/2} \times \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$.

The portion of the formula starting from the square root until the end is called the standard error
(SE)

Standard Error is 0.032322.

Transcript

The confidence interval is your p̂ ± z of α over 2 times the standard error, which is the square
root of p̂(1 − p̂) divided by n. This is my standard error. So, I'm going to calculate my standard
error here, and my standard error is the square root of, and I'm going to use as many parentheses
as I need to make sure that my order of operations is correct. So, it's p̂ times 1 minus p̂, and this
is going to give me the numerator under that square root, divided by my sample size. And in this
case, it is my count, 127. Now I'm ready to close my parenthesis and return.
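
As a cross-check on the Excel formula, the standard error can be sketched in Python using the portfolio numbers from the slides (20 ups out of 127 stocks). The variable names are chosen for this illustration.

```python
from math import sqrt

sum_up = 20    # stocks in the portfolio that went up
count = 127    # stocks in the portfolio

p_hat = sum_up / count
# Counterpart of =SQRT((I7*(1-I7))/I5); the extra parentheses mirror the Excel formula.
standard_error = sqrt((p_hat * (1 - p_hat)) / count)
print(f"p_hat = {p_hat:.5f}, SE = {standard_error:.6f}")
```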

Excel Sheet (16 of 19) - Slide 94

The slide shows the outcome of the SE, which is 0.03232.

A new cell called "Margin of error" is created and placed a few cells below the SE cell. Here, the
slide shows how to calculate the margin of error. To calculate this value, use the z score and
multiply it by the SE number. The function is =I12*I13, where I12 is 1.9599 (z) and I13 is 0.032
(SE).

Margin of Error is 0.06335.

Transcript

So now I can calculate my margin of error. And my margin of error is simply your z value times
your standard error.

Excel Sheet (17 of 19) - Slide 95

The slide shows the outcome of the Margin of error: 0.06335

A new cell called "Lower" is created and placed a few cells below the Margin of error cell. Here,
the slide shows how to calculate the Lower value. To calculate this value, subtract the Margin of
error from the Prop up value. The function is =I7-I15, where I7 is 0.1574 (Prop up) and I15 is
0.063 (Margin of error).

Lower value is 0.09413.

Transcript

So now I can highlight the values that I'm going to use. One would be the margin of error and the
other one would be my sample proportion. So, focus on what I have highlighted. So now, for my
95% confidence interval, the lower part of it is going to be the sample proportion, which is right
here, minus the margin of error, and the upper would be the sample proportion plus the margin of
error.

Excel Sheet (18 of 19) - Slide 96

The slide shows the outcome of the Lower value: 0.09413.

A new cell called "Upper" is created and placed below the Lower cell. Here, the slide shows how
to calculate the Upper value. To calculate this value, use the Prop up and add the Margin of error
value. The function is =I7+I15, where I7 is 0.1574 (Prop up) and I15 is 0.063 (Margin of error).

Upper value is 0.2208

Transcript

No instruction provided during this slide

Excel Sheet (19 of 19) - Slide 97

The slide shows the outcome of the Upper value: 0.2208.

A handwritten summary of the confidence interval for the data combines the lower and upper
values previously calculated. The information says: [9.4%, 22%].

Download the New York Stock Exchange excel file (Refer to Sample Count Prop - Fifth
worksheet)

Transcript

I'm going to use my portfolio as a sample. Then, based on what I have experienced in my portfolio,
I suspect that about 9.4% to 22% is what the stock market has experienced in terms of the
proportion of stocks that went up on that day, and of course, in my case, my sample would have
given me the right answer. Why? Because looking back on that day, I see the true proportion was
17.5%, and clearly this interval captures 17.5%. This is how we can use a sample proportion to
create a confidence interval, an estimate for what the population proportion might have been.
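
Putting the steps together, the whole portfolio calculation can be sketched in Python. The inputs (20 of 127 in the sample, 383 of 2,192 in the full data set) come from the slides; the variable names are chosen here, and the Excel cell references in the comments are the ones shown in the video.

```python
from math import sqrt
from statistics import NormalDist

p_hat = 20 / 127                      # sample proportion ("Prop up")
n = 127
z = NormalDist().inv_cdf(0.975)       # 95% critical value
standard_error = sqrt(p_hat * (1 - p_hat) / n)

margin = z * standard_error           # =I12*I13
lower = p_hat - margin                # =I7-I15
upper = p_hat + margin                # =I7+I15

population_proportion = 383 / 2192    # known here only because we hold the full data
captured = lower <= population_proportion <= upper
print(f"[{lower:.4f}, {upper:.4f}] captures {population_proportion:.4f}: {captured}")
```

In practice the population proportion is unknown; the last check is possible here only because this example happens to have the full day's data.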

Lesson 4-3.3 Confidence Interval Animation in Excel


Media Player for Video

Animation (1 of 14) - Slide 98

A program that plots Confidence Intervals on a bar graph. A box shows an image of several blue
and orange marbles. A box to the right shows a smaller sample size. Buttons include Sample,
Sample 100, Clear Plot, and New Population. Multiple Choice check boxes include Sample Size
50, 100, or 250 and Confidence Level 90, 95, or 99.

Math Graphing Tool

Transcript

Using this animation, in this video I'm going to show you the impact of the confidence level and
the sample size on the width of the confidence interval itself. So, let's assume that we have a jar
of blue and orange marbles, and again blue can represent happy customers and orange can
represent your unhappy customers. So, let's assume that we have a population like this, and I'm
going to ask it to show me its p.

Animation (2 of 14) - Slide 99

In this slide, the checkbox under the squared jar of marbles says: "Show p: 0.5" and it is selected.

Transcript

So, it says it's a 50/50 population. And I am going to take samples of various sizes; so, I'm going
to start with size 50 and a 95% confidence level and start sampling.

Animation (3 of 14) - Slide 100

In this slide, the sample size 50 radio option is selected. The empty square to the right of the
squared jar of marbles now has a few marbles inside. As a consequence, the information under
this square is now "Sample p: 0.46".

Additionally, in the figure section there is a vertical black bar at x = 1. The lower bound of the
vertical bar is around 0.3 and the upper bound is around 0.6. The sample proportion, 0.46, is
marked with a straight blue line at y = 0.46.

Transcript

And it's telling me to sample 100 different samples. So, if I take only one sample, I click this
and only take one sample, and this is the sample I get, the first sample. So, you would see this
black line represents the confidence interval I got using the first sample. Our first sample that was
drawn had .46 in it. Remember my real population has 50%. So, this is the width; so, if I do it
again you'll get a different line, and again a different line, and I can just ask it to do 100
altogether.

Animation (4 of 14) - Slide 101

On the graph a blue straight horizontal line goes across at 0.5. Selects Sample button and a bar
populates vertically on the graph between the 0.2 and 0.8 points. 100 samples shows several
black bars and ten red bars on the graph.

Transcript

So, here we are. So, if you're at a 95% confidence level and you are using a sample size of 50, you
actually are going to get several samples, the red ones, that would have given you a confidence
interval that did not include the real population proportion, 50%. So, if I had only drawn this
sample, this one that is red, I would have reached a wrong conclusion.
So, what happens if I change my sample size?

Animation (5 of 14) - Slide 102

In this slide, the sample size 100 and the confidence level 95 radio options are selected. The
cursor is selecting the "Sample" option below the squared jar of marbles.

After taking 4 samples, four vertical black bars are drawn on the plot. The first bar has an upper
bound around 0.68 and a lower bound around 0.48. The second bar has an upper bound around
0.55 and a lower bound around 0.35. The third bar has an upper bound around 0.63 and a lower
bound about 0.43. Finally, the fourth bar has an upper bound about 0.62 and a lower bound of
0.42.

Transcript

So instead of taking 50 from exactly the same population, I'll take a sample of 100. Just watch
what happens to the width of my confidence intervals. So again, I am going to tell it to sample
and the first sample comes out, sample again, sample again, sample again, and you see that the
width, the black line, has gotten smaller. So, now I'm going to say take 100 samples from this. And
you would see that by creating a larger sample size, first of all, the number of samples that I got
that would have missed the true population proportion has been reduced; furthermore, the
confidence interval is getting smaller.

Animation (6 of 14) - Slide 103

Six red bars on the graph.

Transcript

And remember, when the confidence interval is getting smaller because my sample size has
increased, not only am I more likely to capture the true population proportion with each sample,
but I'm also becoming more precise. I'm getting closer, my range is not so large, my estimate is
not so wide.

Animation (7 of 14) - Slide 104

In this slide, the sample size 250 and the confidence level 95 radio options are selected. The
cursor is selecting the "Sample" option below the squared jar of marbles.

After taking 3 samples, there are three vertical bars drawn on the plot. The upper bound of the
first bar is around 0.51 and the lower bound is around 0.41. The second bar has an upper bound
around 0.59 and a lower bound of 0.49. The upper bound of third vertical bar is about 0.58 and a
lower bound around 0.48.

Transcript

What happens if I make it 250? So, again I will sample one, sample two, sample three and now
I'm going to do the 100 samples.

Animation (8 of 14) - Slide 105

After taking 100 samples, 100 vertical bars are drawn on the plot, where 95 of them are between
the range of 0.4 and 0.5. From the remaining 5 bars (marked red), 4 have the lower bound higher
than the true proportion 0.5 and the other one has the higher bound lower than 0.5.

Transcript

Once again, the width of the confidence intervals has decreased. And now I have one, two, three,
four, five samples that would have missed the real proportion, and that is the idea: a 95%
confidence level tells you that out of 100 samples you might take, about 95 of them will give
you a confidence interval that actually contains the true population
value and five of them won't. So, what happens if I keep my sample size at 250 and make my
confidence level 99%? What do you expect to happen? It should be that my confidence
interval gets bigger because at 99% the multiplier z of α over 2 is going to be about 2.58 rather
than 1.96. So, let's see what happens here. And you see that that happened.
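
To see how much the multiplier matters at a fixed sample size, here is a hedged Python sketch. It assumes a sample of 250 whose proportion is exactly 0.5; real samples in the animation vary around that value, so the bars on screen will not match these widths exactly.

```python
from math import sqrt
from statistics import NormalDist

n = 250
p_hat = 0.5  # assumed: a sample that matches the 50/50 population exactly

margins = {}
for level in (0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margins[level] = z * sqrt(p_hat * (1 - p_hat) / n)
    print(f"{level:.0%}: z = {z:.3f}, interval width = {2 * margins[level]:.3f}")
```

The standard error is the same in all three rows; only the multiplier changes, which is what widens the interval as the confidence level goes up.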

Animation (9 of 14) - Slide 106

In this slide, the sample size 250 and the confidence level 99 radio options are selected. Another
100 sample vertical bars are drawn on the plot. From the 100 bars, only one of them is marked
red with a higher bound less than 0.5. The range of the remaining 99 bars is between 0.35 and
0.55.

Transcript

No instruction provided during this slide.

Animation (10 of 14) - Slide 107

In this slide, the sample size 250 and the confidence level 90 radio options are selected. In
addition, another 100 sample vertical bars are drawn on the plot, where 94 are between the
range of 0.4 and 0.5. From the remaining 6 bars (marked red), 4 have a lower bound higher than
0.5 and 2 have an upper bound less than 0.5.

Transcript

And if I make it 90% it becomes really small.

Animation (11 of 14) - Slide 108

In this slide, the sample size 250 and the confidence level 95 radio options are selected. In
addition, another 100 sample vertical bars are drawn on the plot, where 95 are between the
range of 0.4 and 0.5. From the remaining 5 bars (marked red), 4 have a lower bound higher than
0.5 and 1 has an upper bound less than 0.5.

Transcript

So, 95 confidence interval is a little wider, 99 is a lot wider. If I make it 100 and at 99% or at 50
and 99% let's see what happens.

Animation (12 of 14) - Slide 109

In this slide, the sample size 50 and the confidence level 99 radio options are selected. In
addition, another 100 sample vertical bars are drawn on the plot, where 99 are between the
range of 0.2 and 0.8. The remaining bar (marked red), has a lower bound higher than the true
proportion of 0.5.

Transcript

I will do 100 samples. I have the largest interval; so sure enough, my intervals are so wide that
they will almost always contain the population proportion, but again, as I said, the width is so big
that my margin of error is going to be big. So, here's what we have learned: as I increase my
sample size, the width of my confidence interval becomes smaller.

Animation (13 of 14) - Slide 110

In this slide, the sample size 250 and the confidence level 99 radio options are selected. In
addition, another 100 sample vertical bars are drawn on the plot, where 98 are between the
range of 0.35 and 0.65. From the remaining 2 bars (marked red), one has a lower bound higher
than 0.5 and the other one has an upper bound less than 0.5.

Transcript

It's going to become more precise. And as I increase my confidence level again, the width would
increase.

Animation (14 of 14) - Slide 111

In this slide, the sample size 250 and the confidence level 90 radio options are selected. In
addition, another 100 sample vertical bars are drawn on the plot, where 91 are between the
range of 0.3 and 0.6. From the remaining 9 bars (marked red), 3 have a lower bound higher than
0.5 and 6 have an upper bound less than 0.5.

Transcript

So, if I do 250 at the confidence level of 90, you should see these lines become much smaller,
and they do. Now again, 90% tells me that there is a greater chance that I would have taken a
sample that missed the true population proportion. So, these are the impacts of
confidence level and sample size. The best way to increase our accuracy and precision is to
increase our sample size.

Lesson 4-3.4 Starting Salary Example in Excel


Media Player for Video

Starting Salary Excel Sheet (1 of 13) - Slide 112

A community college provides the starting salaries of its graduates in a particular program as a
way of attracting new students. The student enrollment office has conducted a survey of its
graduates and 100 of them have volunteered their starting salaries for the jobs they have
accepted. Based on this data, what is the 95% confidence interval for the average starting salary
of the graduates of this program?

The director of admissions believes that students wouldn't choose to enroll in the program if,
upon graduation, they will not make at least $26,500. What is the 95% confidence interval for
the proportion of graduates who make at least $26,500?

Transcript

The community college provides the starting salaries of its graduates in a particular program as a
way of attracting new students. The student enrollment office has conducted a survey of its
graduates and 100 of them have volunteered their starting salaries for the jobs they have
accepted. Based on this data what is the 95% confidence interval for the average starting salary
of the graduates of this program?

Starting Salary Excel Sheet (2 of 13) - Slide 113

An Excel spreadsheet with one column labeled Starting Salary.

Download the Starting Salary excel file (Refer to Data - First worksheet)

Transcript

So, clearly here we don't want to just say this is what you will make. We want to give the range in
our brochures, saying that our graduates make something between x and y, and that would be our
interval estimate of the starting salary. So here are 100 people who have decided to share with us
what they are making. So first we will start by calculating the average salary for these graduates
who shared their numbers with us, so this would be equal to average, and I will pick the first
salary, control shift down, close the parenthesis, return. And it turns out that the average salary is
$27,092.

Starting Salary Excel Sheet (3 of 13) - Slide 114

The slide shows the outcome of the average salary: 27092.

A new cell called "std dev" is created and placed below the "Avg Salary" cell. Here, the slide
shows how to use the Excel standard deviation function. "=STDEV.S" is typed to the right of the
"std dev" cell; the program automatically shows the syntax to get the standard deviation of the
sample: “=STDEV.S(number1, number2, …).” To compute the standard deviation you have to
select the whole Starting Salary column.

Transcript

I can also calculate the standard deviation, and that would be STDEV.S, and I would
pick the same salary column, and that would be $2,534 roughly.

Starting Salary Excel Sheet (4 of 13) - Slide 115

The slide shows the outcome of the standard deviation: 2533.997.

A new cell called "Sample Size" is created and placed a few cells below the "std dev" cell. Here,
the slide shows how to use the Excel "count" function. "=COUNT" is typed to the right of the
"Sample Size" cell; the program automatically shows the syntax to get the sample size of the
sample: “=COUNT(value1, value2, …).” To compute the sample size you have to select the whole
"Starting Salary" column.

Transcript

Sample size, how many people got back to us, is the number of people who responded, and
that's the count of these values, and it says that it is 100.

Starting Salary Excel Sheet (5 of 13) - Slide 116

The slide shows the outcome of the Sample size: 100.

A new cell called "Std Error" is created and placed a few cells below the "Sample Size" cell. Here,
the slide shows how to calculate the standard error in Excel. To calculate the standard error,
use the standard deviation value and divide it by SQRT(Number). Here, the number is D6, which
is the sample size (100). The function for the standard error is =D4/SQRT(D6).

Below the list of statistics is a formula to calculate the standard error: $\sigma_{\bar{x}} = \dfrac{s}{\sqrt{n}}$

Standard Error is 253.399794.

Transcript

So now I can calculate the standard error. And standard error is the standard deviation of the
sample divided by the square root of the sample size.

Starting Salary Excel Sheet (6 of 13) - Slide 117

The slide shows the outcome of the Standard error: 253.3997.

A new cell called "95%, t value" is created and placed a few cells below the "Std Error" cell. Here,
the slide shows how to calculate the t-value. "=T.INV" is typed to the right of the "95%, t value"
cell; the program automatically shows the syntax to get the t-value of the sample:
“=T.INV(probability, deg_freedom).” The function shown for the t-value is =T.INV(0.972,…).

The following note is included below the "95%, t value" cell: "I mistyped here - it should have
been 0.975, but I typed 0.972. The t-values are almost the same." 95 percent, t value is
1.93374829.

Transcript

So now that I have this and I want 95% confidence, the t-value is t.inverse. Again, we're
doing 95% so it's going to be .975; the .025 in the left tail is also added. And the degrees of
freedom is 100 − 1, which is 99. So, this is my critical value of t, and based on this I can find the
margin of error, and the margin of error is the t-value multiplied by the standard
error.
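
Python's standard library has no direct counterpart to T.INV, so the sketch below builds one by numerically integrating the t density and searching for the critical value by bisection. The helper names t_pdf, t_cdf, and t_inv are made up for this illustration, and the approach is a simple approximation rather than the method Excel uses. It compares the value typed in the video (0.972) with the intended one (0.975).

```python
from math import exp, lgamma, log, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_norm - (df + 1) / 2 * log(1 + x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x]."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_inv(probability, df):
    """Left-tailed inverse for probability >= 0.5, found by bisection."""
    low, high = 0.0, 50.0
    for _ in range(60):
        mid = (low + high) / 2
        if t_cdf(mid, df) < probability:
            low = mid
        else:
            high = mid
    return (low + high) / 2


std_dev = 2533.997                      # sample standard deviation from the slide
n = 100
standard_error = std_dev / sqrt(n)      # =D4/SQRT(D6)

t_typed = t_inv(0.972, n - 1)           # probability entered in the video
t_intended = t_inv(0.975, n - 1)        # probability the note says was intended

print(f"SE = {standard_error:.4f}")
print(f"t(0.972) = {t_typed:.4f}, margin = {t_typed * standard_error:.2f}")
print(f"t(0.975) = {t_intended:.4f}, margin = {t_intended * standard_error:.2f}")
```

The two critical values differ only in the second decimal place, which is consistent with the note on the slide that the t-values are almost the same.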

Starting Salary Excel Sheet (7 of 13) - Slide 118

The slide shows the outcome of the t-value: 1.9337.

A new cell called "Margin of error" is created and placed a few cells below the "95%, t-value" cell.
Here, the slide shows how to calculate the margin of error. To calculate this value, use the t-value
and multiply it by the standard error number. The function is =D10*D8, where D10 is 1.9337
(t-value) and D8 is 253.399 (standard error).

To the right of the list of statistics there is the following formula: $t_{\alpha/2} \times \dfrac{s}{\sqrt{n}}$

Margin of Error is 490.01142.

Transcript

No instruction provided during this slide.

Starting Salary Excel Sheet (8 of 13) - Slide 119

The slide shows the outcome of the Margin of error: 490.01142.

Two new cells called "Lower" and "Upper" are created and placed a few cells below the "Margin
of error" cell. The values of these new cells are 26602 and 27582, respectively.

Download the Starting Salary excel file (Refer to Calculation - Second worksheet)

Transcript

So, now I'm ready to calculate the lower and upper bounds. The lower bound on my 95%
confidence interval for the mean salary is going to be the mean of this sample minus the margin of
error, and the upper bound is the mean of this sample plus the margin of error. So, the 95%
confidence interval tells us that the average or expected salary upon graduation from this
department for the population of the graduates is somewhere between $26,602 and $27,582. It
could be any value in this interval. So, now let's look back at our problem statement and see the
second part. The director of admissions believes that students wouldn't choose to enroll in the
program if upon graduation they would not make at least $26,500. What is a 95% confidence
interval for the proportion of graduates who make at least $26,500? So, if I look at my data I
should be able to tell roughly where the average is; $26,500 is below the lower bound for the
average, so we are probably in good shape here. But we can actually find out what proportion is
getting more than $26,500.

Starting Salary Excel Sheet (9 of 13) - Slide 120

The slide shows the Starting Salary data set.

A new column called "Makes more than 26500" is created and placed to the right of the Starting
Salary column. Here, the slide shows how to complete this new column. "=IF" is typed in the first
"Makes more than 26500" value cell; the program automatically shows the syntax of this function:
“=IF(logical_test, [value_if_true], [value_if_false]).” The function for this first value is
=IF(A2>26500,1,0), where A2 is the first value of the "Starting Salary" column (29148).

Transcript

So, I have recopied the salary data right here, and I'm going to ask the "makes more than $26,500"
question in this column (Column B). As before, it's going to be an if statement, and the if
would say: if this value is greater than $26,500 then return a value of one, because these people
are getting more; otherwise return a value of zero. And I can copy this down, and you can check
whether we have done it correctly. So, if you come to data number seven, this person made less,
or got an offer for less money, and if I've done this correctly I should get a value of zero there.
And sure enough, I do. I get a value of zero for this one and I get a value of zero for this one.
Everything else gets a value of one because all the other values are over $26,500. So, I can just
put my crosshair here, double click, and it will fill it up for me.

Starting Salary Excel Sheet (10 of 13) - Slide 121

The slide shows the whole "Makes more than 26500" column completed. To the right, a new cell
called "Making > 26500" is created and has a value of 61. This value corresponds to the number
of people whose salary is more than 26500.

Transcript

Now I can find out from my data – so, I'm going to keep here number of people who are making
more than $26,500. And that would be simply the sum of values of ones that I see here, because
they represent people who are making more than $26,500. And this would be now 61, so 61 out
of 100. So again, I knew that sample size here is 100. I already calculated that before.
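
The indicator column and its sum can be sketched in Python as follows. The salary list is hypothetical: only the first value (29148) appears on the slide, and the rest are made up for illustration.

```python
# Hypothetical starting salaries; only the first value (29148) appears on the slide.
salaries = [29148, 27500, 26100, 28300, 26500, 25900, 30250]

# Mirrors =IF(A2>26500,1,0) filled down the column.
makes_more = [1 if salary > 26500 else 0 for salary in salaries]

# Mirrors summing the column of ones, then dividing by the sample size.
count_above = sum(makes_more)
proportion_above = count_above / len(salaries)

print(makes_more, count_above, round(proportion_above, 3))
```

Note that a salary of exactly 26500 gets a zero here, because the test is strictly "greater than", just as in the Excel formula.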

Starting Salary Excel Sheet (11 of 13) - Slide 122

The slide shows a list of statistics vertically placed with their respective values, as follows:

"Making > 26500": 61

"Sample size": 100

"Prop > 26500": 0.61

"95%, z value": 1.95996398

"Std error": 0.04877499

Transcript

So, then the proportion making more than $26,500 would be simply 61 out of 100 or 61%. So, for a
95% confidence interval, remember that for a proportion I would always use the z-value. The
z-value would be based on norm.s.inv(.975) and that's 1.96. And my standard error is simply the
square root of p times one minus p divided by the sample size of 100. So that's my standard error;
now that I have my standard error I can find out what my margin of error is.

Starting Salary Excel Sheet (12 of 13) - Slide 123

The slide shows a new cell called "Margin of error" placed a few cells below the "Std error" cell.
Here, the slide shows how to calculate the margin of error. To calculate this value, multiply the
z-value by the standard error. The function is =E8*E10, where E8 is 1.9599 (z-value) and E10 is
0.048 (standard error).

Below the list of statistics is the following formula: $z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

Margin of Error is 0.09559723.

Transcript

And margin of error is the z-value times standard error.

Starting Salary Excel Sheet (13 of 13) - Slide 124

The slide shows the outcome of the Margin of error: 0.09559.

Two new cells called "Lower" and "Upper" are created and placed a few cells below the "Margin
of error" cell. The values of these new cells are 0.5144 and 0.7055, respectively.

Download the Starting Salary excel file (Refer to More than 26500 - Third worksheet)

Transcript

And now the confidence interval: the lower and upper percentages of the estimate for my
population proportion are simply the proportion of this sample minus the margin of error, and the
proportion of this sample plus the margin of error. And I find that about 51% to 70% of
graduates will make more than $26,500.
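
The whole proportion calculation can be reproduced as a small Python sketch. The inputs (61 out of 100, 95% confidence) come from the slides; `NormalDist().inv_cdf` from the standard library is used here as a stand-in for Excel's NORM.S.INV.

```python
import math
from statistics import NormalDist

count_above = 61   # graduates in the sample making more than $26,500
n = 100            # sample size
confidence = 0.95

p_hat = count_above / n
# Two-sided interval: (1 - confidence)/2 goes in each tail, like NORM.S.INV(0.975).
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
std_error = math.sqrt(p_hat * (1 - p_hat) / n)
margin_of_error = z * std_error
lower, upper = p_hat - margin_of_error, p_hat + margin_of_error

print(round(z, 4), round(std_error, 4), round(margin_of_error, 4))
print(round(lower, 4), round(upper, 4))
```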

Now, is this something that I can advertise? I can't just focus on 70% because that's what is
really appealing to me for my advertising; the true proportion could be 51%. And if you say to
someone that half of you will make $26,500, that may not be as appealing as it would have been if
you had said 70%. But in reality, any value between these two is a plausible value for the true
proportion. So, you have to think hard in terms of what would be the
threshold. What would I have liked to see here if I had seen 80% of the people make over
$26,500? Would I make that advertising? Would you be happy if somebody made that advertising
to you and they really meant only 51% of the people make that? So, these are the type of things
that you have to think about when you read a report, when somebody is telling you something or
you are going to give a report. You can really misuse the statistics or misunderstand statistics if
you don't pay close attention to what is being communicated.

Lesson 4-4 Sample Size

Lesson 4-4.1 Sample Size



Confidence Interval - Slide 125

Confidence interval for Mean: Sample Mean ± Margin of Error

Confidence Interval for proportion: Sample Proportion ± Margin of Error

Transcript

Going through the last few lessons on confidence interval, we have seen a few things that have
unfolded. And they all have to do with the margin of error. To be precise and accurate, we like to
have a small margin of error. So, let's review the components of margin of error.

Margin of Error - Slide 126

Margin of Error for Mean: $t_{\alpha/2} \frac{s}{\sqrt{n}}$

Margin of Error for Proportion: $z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

Transcript

A close examination of the margin of error, whether you're estimating a mean or a population
proportion, shows that it is partly controlled by the sample size. So far, we have looked at examples where you
were told that a study was done on a sample of a given size then based on the given information
we calculated the margin of error and the confidence interval. We can do this in another way. We
can say what we desire to have as the maximum margin of error and based on that solve for the
sample size. Let me show you how to do this.

Sample Size Determination - Mean - Slide 127

Sample size is an economic decision.

How to determine the right sample size?

State the acceptable margin of error. (E)


$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$

Transcript

Starting with the margin of error and sample size calculations for the mean estimation followed by
the population proportion. And first of all, by now you intuitively know that the larger the sample
size, the better. But the sample size selection is an economic decision. It takes longer and more
effort to get the larger sample size. It really depends on the cost of getting it wrong with the
smaller sample size versus the cost of getting a bigger sample. If you were in charge of quality
control on a production line filling boxes of cereal, then when the production line is a little off,
missing this problem will increase your cost of production a little if you're putting too much in
the box, or may result in getting fined if you're not putting in enough. But imagine that you're
producing drugs. If your mix of ingredients is incorrect, the mistake can result in catastrophic
consequences such as death resulting from wrong dosages. So how do you select the right
sample size? You can make this decision by stating the margin of error which is acceptable and
then work backwards to find the sample size, which means sample size is square of z times the
standard deviation divided by the acceptable margin of error.
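
The sample-size formula follows from setting the margin of error equal to the acceptable value E and solving for n. A short derivation, assuming the z-based margin of error with standard deviation σ:

```latex
E = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}
\;\Longrightarrow\;
\sqrt{n} = \frac{z_{\alpha/2}\,\sigma}{E}
\;\Longrightarrow\;
n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^{2}
```

Because n sits under a square root, halving the acceptable error E multiplies the required sample size by four.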

Example (1 of 2) - Slide 128

Speedy Lube wants to advertise a 15-minute oil change to its customers. The manager has spent
a lot of time improving the oil change process and now wants to see if he will be able to advertise
for a 15-minute oil change. What sample size does he need to find the 95% confidence interval if
he wishes his margin of error to be 30 seconds (0.5 minutes)?

Transcript

Let's go back to an example we used in an earlier lesson which was about the manager wanting
to advertise 15-minute oil change to his customers. Now the manager wants to know how big a
sample he needs to collect for a 95% confidence interval if he wishes his margin of error to be 30
seconds or half a minute. To answer this, we will use this formula. Z of α over 2 for 95%
confidence interval is 1.96. E is half a minute. But we also need to know the standard deviation,
which we also don't have. So, in this case we first will do a small study to find this value.

Example (2 of 2) - Slide 129

He takes 100 random observations and analyzes the data to find that the average time was 14.7
minutes and the standard deviation was 3.5 minutes. What is the 95% confidence interval for the
population mean?
$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$

$n = \left(\frac{1.96 \times 3.5}{0.5}\right)^2 = 188.23 \cong 189$

Transcript

We did a 100-observation study and found a sample standard deviation of 3.5 minutes. Now we
can calculate the sample size required for the level of accuracy desired. Based on this, the
sample size needed for our half-minute margin of error would be 188.23. We need an integer
value, so always round up. So, in this case the minimum sample size needed to provide the level
of precision and accuracy the manager wants is 189 observations.
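
The Speedy Lube calculation can be sketched in Python as below. The function name `sample_size_for_mean` is a hypothetical helper introduced for illustration; the inputs (z = 1.96, standard deviation 3.5 minutes, margin of error 0.5 minutes) are the ones used in the example.

```python
import math

def sample_size_for_mean(z, sigma, margin_of_error):
    """Sample size from n = (z * sigma / E)^2, rounded up to an integer."""
    raw = (z * sigma / margin_of_error) ** 2
    return math.ceil(raw)  # always round up, never to the nearest integer

# Speedy Lube: 95% confidence, pilot standard deviation 3.5 minutes, E = 0.5 minutes.
n_oil_change = sample_size_for_mean(z=1.96, sigma=3.5, margin_of_error=0.5)
print(n_oil_change)
```

Rounding up rather than to the nearest integer is what guarantees the margin of error does not exceed the target.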

Let's Practice - Slide 130

The manager of a cereal company wants to create a confidence interval for the average weight of
its cereal boxes within 0.15 ounces. The process is known to have a standard deviation of 0.5
ounces. What would be the required sample size for a 99% confidence interval?

Transcript

So now let's practice. The manager of the cereal company wants to create a confidence interval
for the average weight of its cereal boxes within .15 ounces. The process is known to have a
standard deviation of .5 ounces. What would be the required sample size for a 99% confidence
interval?

Let's Practice - Solved - Slide 131

99% Confidence level → z = 2.575
Standard deviation = 0.5 ounces
Acceptable error = 0.15 ounces
$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$

$n = \left(\frac{2.575 \times 0.5}{0.15}\right)^2 = 73.67 \cong 74$

Transcript

Z-score for 99% confidence level is 2.575. Standard deviation is .5 ounces. Acceptable margin of
error is .15 ounces so the sample size is therefore 74.

Sample Size Determination - Proportion - Slide 132

$n = p(1-p)\left(\frac{z_{\alpha/2}}{E}\right)^2$

Can always use p = 0.5, which will result in the largest sample size for any value of p.

$z_{\alpha/2}$: multiplier for the desired level of confidence

E: desired margin of error

Transcript

Just like we did for mean confidence interval, we can compute the sample size required to have a
proportion confidence interval to be of a specified width using the formula showing on this slide. P
is a preliminary estimate of the population proportion. We either get this by doing a small study
first to get a sense of this value, or we can just use .5. P of .5 will result in the largest sample size
of any value of p, which means using P of .5 will give us the most conservative estimate of the
required sample size. Z of α over 2 is the multiplier for the desired level of confidence and E is
the desired margin of error.

Sample Size Determination - Example - Slide 133

We want to know what percentage of the population think global warming is real. We would like to
be within 1%. If you were developing a 95% confidence interval, what sample size would you
need?

$n = p(1-p)\left(\frac{z_{\alpha/2}}{E}\right)^2$

$p = 0.50$

$z_{\alpha/2} = 1.96$

$E = 0.01$

$n = 0.5 \times (1-0.5) \times \left(\frac{1.96}{0.01}\right)^2 = 9{,}604$

Transcript

We want to know what percent of the population think global warming is real. We would like to be
within plus or minus 1%. If you were developing a 95% confidence interval what sample size
would you need? Use the equation to find the required sample size. Since we have no idea what
it may be, I will use P of 50%. Z of α over 2 for 95% confidence level is 1.96 and the desired
margin of error is 1% or .01. Putting these values into formula will yield 9,604 as the required
sample size.

Let's Practice - Slide 134

We want to know what percentage of the population think global warming is real. We would like to
be within 1%. If you were developing a 95% confidence interval, what sample size would you
need?

$n = p(1-p)\left(\frac{z_{\alpha/2}}{E}\right)^2$

$p = 0.50$

$z_{\alpha/2} = 1.96$

$E = 0.01$

$n = 0.5 \times (1-0.5) \times \left(\frac{1.96}{0.01}\right)^2 = 9{,}604$

Change E (margin of error) to 4%

Transcript

Wow, this is really big. Let's be real, I may not want to survey this many people. Then what
options do I have? First, I can reduce my precision. Let's say I will accept margin of error of 4%,
what will happen? What would be the new sample size? Please calculate it for me.

Let's Practice - Solved - Slide 135

We want to know what percentage of the population think global warming is real. We would like to
be within 4%. If you were developing a 95% confidence interval, what sample size would you
need?

$n = p(1-p)\left(\frac{z_{\alpha/2}}{E}\right)^2$

$p = 0.50$

$z_{\alpha/2} = 1.96$

$E = 0.04$

$n = 0.5 \times (1-0.5) \times \left(\frac{1.96}{0.04}\right)^2 = 600.25 \cong 601$

Transcript

The new sample size is 601 observations instead of 9,604, quite a drop. Once again, if you
want precision then you need to have a larger sample size. And it's worth it if the cost of being
wrong is high enough to justify the cost of more sampling. Another option is to take a small
sample first and find the sample information and then based on that calculate the needed sample
size.
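
The two global warming calculations can be compared side by side in a short Python sketch. The helper name `sample_size_for_proportion` is hypothetical; the inputs are the ones from the slides. The small rounding step is an implementation detail added here so that floating-point noise does not push an exact answer up by one.

```python
import math

def sample_size_for_proportion(p, z, margin_of_error):
    """Sample size from n = p(1 - p)(z / E)^2, rounded up."""
    raw = p * (1 - p) * (z / margin_of_error) ** 2
    # Round off floating-point noise first so an exact 9604 is not pushed to 9605.
    return math.ceil(round(raw, 6))

n_within_1_percent = sample_size_for_proportion(p=0.5, z=1.96, margin_of_error=0.01)
n_within_4_percent = sample_size_for_proportion(p=0.5, z=1.96, margin_of_error=0.04)
print(n_within_1_percent, n_within_4_percent)
```

Making the margin of error four times wider cuts the required sample size by roughly a factor of sixteen, because E is squared in the denominator.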

Sample Size Determination - Example - Slide 136

We would like to be within 1%. If you were developing a 95% confidence interval, what sample
size would you need?

An initial survey of 100 adults showed that 15% didn't believe in global warming.

$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.15 \pm 1.96 \times \sqrt{\frac{0.15(1-0.15)}{100}} = [0.15 \pm 0.07] = [0.08, 0.22]$

$n = 0.22 \times (1-0.22) \times \left(\frac{1.96}{0.01}\right)^2 = 6{,}591.88 \cong 6{,}592$

Sample size based on p = 0.50 was 9,604

Transcript

Going back to our original statement where we wanted to be within 1% of the true population
proportion, but this time we survey 100 adults and found that 15% of them didn't believe in global
warming. You may think your P is .15, not so fast. .15 is based on a sample. First, we need to see
what the 95% confidence interval will be based on this sample proportion. Then this gives us the
interval estimation within 8% to 22%. To be right, you need to use the value in this interval which
is closest to .5. This will ensure that your sample size will be large enough. So, in this case we
will use P of .22, then using the equation for sample size we find out that we need to sample
6,592 which is still pretty large, but much smaller than our initial of 9,604, which was calculated
by just assuming P of 50%. So, this is also an acceptable way of evaluating our sample size. As
you sample along and get some information, you can update the .5 you use as a starting point.
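
The rule "use the value in the pilot interval closest to .5" can be sketched as follows. The helper name `planning_proportion` is hypothetical, and returning 0.5 when the interval contains 0.5 is an assumption that follows from the same rule. The inputs (15% from 100 adults, z = 1.96, E = 0.01) come from the example.

```python
import math

def planning_proportion(p_hat, n, z):
    """Value in the pilot confidence interval that is closest to 0.5."""
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    lower, upper = p_hat - margin, p_hat + margin
    if lower <= 0.5 <= upper:
        return 0.5  # the interval contains 0.5, so 0.5 itself is the closest value
    return upper if upper < 0.5 else lower

p_plan = planning_proportion(p_hat=0.15, n=100, z=1.96)
required_n = math.ceil(p_plan * (1 - p_plan) * (1.96 / 0.01) ** 2)
print(round(p_plan, 2), required_n)
```

The sketch keeps the unrounded upper limit rather than 0.22, which is how the slide's 6,591.88 arises before rounding up.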

Sample Proportion Factor - Slide 137

If the proportion is close to either 0 or 1, most individuals have the same trait or opinion; then
there is very little natural variability, and the margin of error is smaller than if the proportion
was near 0.5.

Transcript

So, remember that if the proportion is close to either zero or one, that means most individuals
have the same trait or opinion, so there is very little natural variability, and the margin of error
is smaller than if the proportion is near 50%, where they're split. At 50%, when they're split
evenly, we need to have a larger sample size in order to be able to make a fair estimate for the
population.

Factors that Determine Margin of Error - Slide 138

1. Sample size: Sample size increases, margin of error decreases
2. The level of confidence: Confidence level increases, margin of error increases
3. The standard deviation of sample: Standard deviation increases, margin of error
increases

Transcript

Whether you're doing a sample study so you can make an inference about population mean or
proportion there are three things that impact the margin of error. First is the sample size. When
the sample size increases, margin of error decreases. Second is the level of confidence
desired. The higher the level, the larger the multiplier z of α over 2, increasing the margin of
error. Third is the natural variability, the standard deviation that you have observed within your
sample. In this lesson, we learned how to calculate the sample size that will give us both the
precision and the accuracy we desire.

Lesson 4-4.2 Sample Size Proportion in Excel



Excel Sheet (1 of 2) - Slide 139

The slide shows the same data set presented in Slide 88 - Excel Sheet (10 of 19) with all of the
additional information provided in Slide 97 - Excel Sheet (19 of 19).

To the left is the data set of one sample portfolio of 127 stocks. To the right is a list of statistics. In
addition, the following information is handwritten to the right:

$E = 4\%$, $n = ?$, and $n = p(1-p)\left(\frac{z_{\alpha/2}}{E}\right)^2$

Transcript

Now, let's see how we can calculate the sample size needed for a desired level of accuracy,
which is represented by the margin of error. In this problem that we already solved earlier, we
decided our margin of error was 6%, and this was based on a portfolio that had 127 stocks. So,
what if I told you that my desired error is 4% and in order to do this, I would ask you how many
stock I need to sample. And to answer that question, I have to use the equation that we have,
which is p (1 − p) × the z of α over 2, which represents our confidence level, divided by the
margin of error that we are willing to accept. And the second term has to be squared. So, now let
me calculate this for you. But, first let me clean up some of the information in this spreadsheet
that we don't need.

So, in order for me to calculate what is that value, I had told you that you should look at this
sample as possibly returning a value for the true proportion that is anywhere between 9% to 22%.
So, in order for me to ensure that I will have the actual sample size that is large enough, I always
have to pick the value in this interval that is closest to 50%. So, in this case, it turns out to be this
one that I need to focus on (Upper 0.220831). So, I'm going to highlight this to bring attention to
it. And un-highlight what we had highlighted before.

Excel Sheet (2 of 2) - Slide 140

The slide shows a new cell called "Sample size" placed a few cells below the Upper cell. Here,
the slide shows how to calculate the sample size by using the z score and the confidence upper
level value.

Download the New York Stock Exchange Excel file (Refer to Sample Size - Sixth worksheet)

Transcript

So, now that I have focused on the value that is closest to 50%, I can say what sample size I
need. It's going to be this value (Upper 0.220831) multiplied by 1 minus this value multiplied by z
of α over 2, which we had already calculated before. It was 1.96. Divided by the desired margin of
error and in this case, I had told you that I want it to be four percent. So, I'm just going to enter it
as four percent. And this last term has to be squared. So, I'm going to raise it to the power of 2.
And I will say return. And it will give me the sample size of 413.117. And in this case, to be sure
that we have large enough, we will round it up to 414. So, this is sample size of 414. Whether it's
9%, 10%, 15% or 22%, this is ensuring sample sizes that are large enough to cover all the
possibilities that falls in that interval. So, remember to always look at your confidence interval and
use the value that is closest to 50% to calculate your needed sample size to ensure the level of
accuracy that you're looking for.

Lesson 4-4.3 Sample Size Mean in Excel



Excel Sheet (1 of 7) - Slide 141

The slide shows the New York Stock Exchange closing data set shown in Slide 79 - Excel Sheet
(1 of 19).

Two new cells called "Mean %Chg" and "Std Dev" are created and placed to the right of the data
set. Here, the slide shows how to use the Excel average function to calculate the mean %Chg of
the data. "=AVERAGE" is typed to the right of the "Mean %Chg" cell; the program automatically
shows the syntax of this function: “=AVERAGE(number1, number2, …).” After writing the average
function, select the first cell of the "%Chg" column and then hold Ctrl + Shift and press the down
arrow button. This selects the whole column.

Download the New York Stock Exchange with Mean Excel file (Refer to Data - First worksheet)

Transcript

Looking at our New York Stock Exchange closing data that we have I am going to see what was
the mean percentage change. At the end of the day each stock that was listed with the New York
Stock Exchange had recorded its closing value, the net change for that day and what percentage
that represented. So, I'm going to look at the mean percent changes, and to find that I would say
average value of everything that I see here. This will give me a negative 1.27. So, it happened to
be a losing day for the stock market on that day. And as you know there are times that we have
as a whole a losing day and sometimes a day that gains so many points. So, in that day it lost as
a whole 1.27% of its value.

Excel Sheet (2 of 7) - Slide 142

The slide shows the outcome of the Mean %Chg cell: −1.278.

A new cell called "Std Dev" placed below the "Mean %Chg" cell. To the right of this new cell is the
corresponding value 2.2658.

Download the New York Stock Exchange with Mean Excel file (Refer to Mean SD - Second
worksheet)

Transcript

And I can say the standard deviation of this is 2.2.

Excel Sheet (3 of 7) - Slide 143

The slide shows Mean %Chg and Std Dev cells with other values, −1.338 and 2.043,
respectively. In addition, two new cells called "n (sample size)" and "Confidence level" are
created and placed a few cells below the Std Dev cell. The values for these new cell are 125 and
95%, respectively.

Selects all data in Column D Percent Change. Standard Deviation is 2.043179.

Download the New York Stock Exchange with Mean Excel file (Refer to Sample Data - Third
worksheet)

Transcript

So, let's go back to our portfolio. We have 127 stocks in our data, and based on this we can find
the mean change that we saw in our stocks; so, this would be the average of our stocks. So,
the 127 stocks that we have in our portfolio lost as a whole 1.33% of their value. For their
standard deviation I use stdev.s: I pick the 127 stocks that I have, control shift down, close the
parentheses and go back up. For the sample size I use count: how many stocks I have in my
portfolio, and close the parentheses. Go back up, and you can see that my count is 125 (Selects all data in Column
D Percent Change). And the reason for that is if you scroll down you would see that I have a
stock, here's one and here's the other one. That was not traded that day; so what Excel is trying
to tell me is that you have 127 stocks in your portfolio, but on this day, this mean percentage
change is only based on 125 of the stocks that you own. Because two of them were not traded at
all, so in this case our sample size turns out to be actually 125. So, it's always a good thing for
you to use the count because you could have missing values. So, I am going to calculate our
confidence interval and then move on to calculating the sample size.
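
The point about counting only the cells that hold data can be sketched in Python. The percent changes below are hypothetical, with `None` standing in for a stock that was not traded; the filtering step is meant to mirror how Excel's COUNT, AVERAGE, and STDEV.S skip empty cells.

```python
from statistics import mean, stdev

# Hypothetical percent changes; None marks a stock that was not traded that day.
percent_changes = [-1.2, 0.4, None, -2.7, -0.9, None, -3.1]

# Keep only the stocks that actually have a value.
traded = [value for value in percent_changes if value is not None]

portfolio_size = len(percent_changes)  # stocks held
sample_size = len(traded)              # stocks that actually contribute data
sample_mean = mean(traded)
sample_std = stdev(traded)             # sample (n - 1) standard deviation

print(portfolio_size, sample_size, round(sample_mean, 3), round(sample_std, 3))
```

Using the portfolio size instead of the count of traded stocks would overstate n and make the standard error look smaller than it is.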

Excel Sheet (4 of 7) - Slide 144

The slide shows a new cell called "t" placed a few cells below the Confidence level cell.

t is 1.97928.

Transcript

So, let's say the confidence level is 95%. Then for t of α over 2, remember I'm going to use the
t-value, so it is going to be t.inv, and the probability is going to be .975 and the degrees of
freedom are n − 1; 125 is my n, so I'm going to put 124. And again, I'm going to remind you that
you could use 1.96 as a good estimate if you were doing this without access to any computer
program. But since we have access to Excel, I am going to use the precise value. I'm not going to
use the z-value. I'm going to use the t-value when it comes to mean calculations. And why did I put .975?

Excel Sheet (5 of 7) - Slide 145

The slide shows a new cell called "SE" placed a few cells below the t cell. The slide shows how to
use the Excel standard error function. To calculate the standard error, use the standard deviation
and divide it by SQRT(Number). Here, the number is I8, which is the sample size (125). The
function for the standard error is =I6/SQRT(I8), where I6 is 2.043 (Std Dev).

To the right of the list of statistics is a bell-shaped normal distribution curve with a confidence
level of 95%, which is marked by two vertical lines. Outside the range, each side has 2.5%. The
right vertical bar of the range has an arrow pointing to the left to indicate that it covers 97.5%
of the area. On top of the list of statistics is a formula that reads: $\bar{x} \pm t_{\alpha/2} \times \frac{s}{\sqrt{n}}$

Transcript

Because once again what we are saying is that the confidence interval is 95%; so, this is 95%,
.025 is in this tail and .025 is in this tail. So, everything to the left of this value is .975. So, for my
t.inv the first argument is .975 the second argument is n − 1, and in this case, it would be 124.
And I get the value of 1.98. And again, if I didn't have access to this I would just substitute 1.96.
Okay, now going back the confidence interval for mean is your sample mean plus or minus t of α
over 2 times the standard error. And standard error is going to be calculated here based on the
standard deviation of the sample that we have. So, I'm going to calculate my standard error here
and that would be s which is right here, standard deviation on my sample and divided by the
square root of my sample size, which is 125. So, this is what I have done.

Excel Sheet (6 of 7) - Slide 146

ME is 0.361708. Lower is −1.69979. Upper is −0.97637.

Transcript

And my standard error is going to be .18. So now my margin of error is this entire second term is
my margin of error, so that's t of α over 2 multiplied by the standard error; so that's my margin of
error. And based on this I can now come up with my confidence interval, lower value and its
upper value. And the lower value is going to be my mean minus my margin of error and my upper
value is going to be my mean plus my margin of error. This is what they would have calculated
based on our sample information in terms of what could be the true mean percent change for the
stock exchange on that day? Somewhere between negative 1.7 and negative .98. So again, going
back to our population data, we can see negative 1.27. So, certainly negative 1.27 would have
been in this interval and our population mean would have been contained.

Excel Sheet (7 of 7) - Slide 147

n equals square of (t of alpha over 2 times s divided by E).

Download the New York Stock Exchange with Mean Excel file (Refer to Sample Size - Fourth
worksheet)

Transcript

Now let's say that I don't consider this acceptable. I don't want this much margin of error. I don't
want to be this far off, then what would I do? So that means that I need to calculate my required
sample size. So, let me get rid of some of these writings and then we will continue. So now let's
say that the desired margin of error is only .3. I want to be off by only .3. So then what sample
size do I need? This is the equation we have for sample size. If I didn't have my t-value this would
have been z of α over 2, so again remember that this is not different from what you see in the
PowerPoint slides, but since I have access to actual numbers I'm going to use the t-value. So,
my sample size is going to be equal to taking my t-value, multiplying it by the standard deviation,
then taking that and dividing it by my desired margin of error, which is .3, and squaring that
whole thing. So, raising it to the power of two. And this will give me 181.7. I will round it up to 182.
So, in order for me to be more precise I need a portfolio larger than 127 that I have here; I need
to have 182 stock sampled in order to be able to come up with a confidence interval for the actual
stock market that is only off by .3 when it comes to mean percent changes.
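
Written out with the values from the earlier slides (t of α over 2 = 1.97928, s = 2.043179, E = 0.3), the arithmetic looks like this; the intermediate value 13.48 is rounded for display:

```latex
n = \left(\frac{t_{\alpha/2}\, s}{E}\right)^{2}
  = \left(\frac{1.97928 \times 2.043179}{0.3}\right)^{2}
  \approx (13.48)^{2}
  \approx 181.7
  \;\Rightarrow\; n = 182
```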

Lesson 4-4.4 Sample Size Effect in Excel



Excel Sheet (1 of 12) - Slide 148

The mean of the sample for sample sizes 50, 200, and 500 is 56.3645 in every case. The standard
deviation of the sample for sample sizes 50, 200, and 500 is 17.90778 in every case. The
confidence interval formula shown is $\bar{x} \pm \text{M.E.}$

Transcript

In this video I'm going to show you the impact of sample size. Let's assume that we took a
sample of 50 and a sample of 200, and a sample of 500. And for the sake of demonstration I'm
going to assume that all the samples gave us basically the same value for their mean as well as
their standard deviation. So, what would be changing our equation, and again remember that the
equation for confidence interval is sample mean plus and minus the margin of error, and margin
of error is your critical value times the standard error. So, now we need to calculate the standard
error; so standard errors are basically your sample standard deviation which we're using as a
point estimator for the population standard deviation divided by the square root of sample size.

Excel Sheet (2 of 12) - Slide 149

The slide shows how to calculate the Standard error for the Sample size 50. To calculate the
standard error, use the standard deviation and divide it by SQRT(Number). Here, the number
refers to F1, which is the sample size (50). The function for the standard error is =F5/SQRT(F1),
where F5 is 17.907 (Std Dev).

Transcript

So, in this case it's 50, so this is the standard error for sample size of 50.

Excel Sheet (3 of 12) - Slide 150

The slide shows the outcome of the Standard error for Sample size 50, which is 2.532543. This
same number is also placed in the Standard error values for the sample sizes 200 and 500.

Transcript

And I can grab and pull this and it will calculate the rest for me automatically.

Excel Sheet (4 of 12) - Slide 151

Standard Error (std deviation of Sampling Means) for sample size 50 is 2.532543. For 200 is
1.266271. For 500 is 0.80086.

The slide shows a bell-shaped curve with a confidence level of 95%, which is marked with two
vertical lines. Each tail outside the lines (left and right) holds 2.5% of the area, and the area to
the left of the right vertical line is 97.5% of the whole area.

Transcript

Let's look at what happens to our t distribution when we have this. So once again, we are
saying that we're looking for the 95% confidence interval. So, here's my 95%, so this is 95%. So,
everything to the left of this is .975, and I'm looking for this value, t of α over 2, because .025 is
sitting here, and that's alpha, which is .05, divided by two.

Excel Sheet (5 of 12) - Slide 152

The slide shows how to fill in the critical value when using the t-distribution for sample size 50.
Once =T.INV is typed, the program automatically shows the syntax to get the t-distribution, which is
“=T.INV(probability,deg_freedom).” The function for the t-distribution is =T.INV(0.975,49).

Transcript

No instruction provided during this slide

Excel Sheet (6 of 12) - Slide 153

The slide shows another way to fill in the degrees of freedom, which is subtracting 1 from the
sample size. The function for the t-distribution looks like =T.INV(0.975,F1-1), where F1 is the
sample size of 50.

Transcript

So therefore, my t distribution is t.inv and I'm going to give it a value of .975, and the degrees of
freedom is n − 1. And in this case, in this column, my n is 50, right here. So, it's going to be 50
minus one, so that's 49. And I'm going to actually write it as an equation so I can just copy it. So,
it's going to be this cell minus one (Cell F1 sample size (n) = 50).
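
For readers without a spreadsheet, the T.INV(0.975, n − 1) lookup can be approximated numerically. The sketch below is an assumption-laden illustration, not how Excel computes T.INV: it integrates the t density with Simpson's rule and then searches for the critical value by bisection, using only Python's standard library. The function names t_pdf, t_cdf, and t_critical are hypothetical helpers written for this example.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    # Work in logs so the gamma functions do not overflow for large df.
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 1000) -> float:
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] (steps is even)."""
    if x == 0:
        return 0.5
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(prob: float, df: int) -> float:
    """Find t with P(T <= t) = prob (for prob > 0.5) by bisection."""
    low, high = 0.0, 50.0
    for _ in range(60):
        mid = (low + high) / 2
        if t_cdf(mid, df) < prob:
            low = mid
        else:
            high = mid
    return (low + high) / 2


# Degrees of freedom are n - 1, exactly as in =T.INV(0.975, F1-1).
t_values = {n: t_critical(0.975, n - 1) for n in (50, 200, 500)}

for n, t in t_values.items():
    print(f"n = {n:>3}, df = {n - 1:>3}: t of alpha over 2 = {t:.6f}")
```

The probability passed in is .975 rather than .95 because the function wants everything to the left of the critical value, which is the 95% in the middle plus the 2.5% in the left tail.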

Excel Sheet (7 of 12) - Slide 154

t of alpha over 2: Critical value (using t-distribution) for sample size 50 is 2.009575. For 200 it is
1.971957. For 500 it is 1.964729.

Transcript

And now if I just grab this and pull it I would get appropriate values.

Excel Sheet (8 of 12) - Slide 155

The slide shows how to calculate the critical value when using the normal distribution for sample
size 200. "=NORM.S.INV" is typed to the right of the "Critical value" value cell; the program
automatically shows the syntax to get the normal distribution: “=NORM.S.INV(probability).” The
function for the normal distribution when the sample size is 200 is =NORM.S.INV(0.975). In
addition, the critical value when using the normal distribution for sample size 50 is 1.959.

Transcript

I'm just going to show you for the sake of illustration how z will change accordingly. So, z does
not require degrees of freedom. So, I'm going to use norm.s.inv and I'm going to say everything
to the left of it is .975 and clearly that's not going to change because it's independent of the
sample size.
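
The z lookup has a direct equivalent in Python's standard library. As a small sketch (not part of the course materials), NormalDist().inv_cdf plays the role of NORM.S.INV here; the variable names are chosen for this example only.

```python
from statistics import NormalDist

confidence_level = 0.95
alpha = 1 - confidence_level  # total area left in the two tails

# Everything to the left of the critical value: 1 - alpha / 2 = 0.975.
z_critical = NormalDist().inv_cdf(1 - alpha / 2)

# No degrees of freedom appear anywhere, so this value is the same
# for sample sizes 50, 200, and 500.
print(f"z of alpha over 2 = {z_critical:.6f}")
```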

Excel Sheet (9 of 12) - Slide 156

z of alpha over 2: Critical value (using normal-distribution) is 1.959964 for all three sample sizes
(50, 200, and 500).

Transcript

So, as you can see, as my sample size has increased these two values are becoming closer and
closer. For 50 it's not as close as it is for 200. And for 200 it's not as close as it is for 500. So once
again, if you don't have access to a spreadsheet you can always use 1.96 as a quick way of
estimating what t of α over 2 would have been.

Excel Sheet (10 of 12) - Slide 157

The slide shows how to calculate the margin of error for sample size 50. To calculate this value,
use the standard error and multiply it by the critical value from the t-distribution. The function is
=F7*F11, where F7 is 2.5325 (standard error) and F11 is 2.009 (critical value using t-distribution).

Transcript

So, what is the margin of error? So, margin of error is always your standard error multiplied by
the t value. Since I'm calculating it here I'm going to use the accurate values.

Excel Sheet (11 of 12) - Slide 158

Margin of Error for sample size 50 is 5.089335. For 200 is 2.497032. For 500 is 1.573474.

Transcript

So again, I can just grab this and it will calculate it for all of them. If I click on this now, it's
appropriately multiplying the standard error by the correct value of t of α over 2. So, what you see
is that the margin of error has decreased as the sample size has gone up, so our accuracy and our
precision are going to go up. Why? It's twofold. One, as you can see, your critical value is
decreasing from 50 to 200 to 500. It's getting smaller and smaller. But also, so is your standard
error. So, both the standard error and t of α over 2 are decreasing, and as a result the margin of
error is decreasing quite a bit. At 50 your margin of error is plus or minus five, but at 500
you're at ± 1.5.
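
The multiplication behind =F7*F11 can be sketched in Python as well. This is an illustration under stated assumptions: the standard deviation is the 17.90778 from the slides, and the t critical values are typed in as constants (2.009575, 1.971957, and 1.964729, which is what T.INV(0.975, n − 1) returns for n = 50, 200, and 500). The dictionary names are made up for the example.

```python
import math

sample_std_dev = 17.90778

# Assumed inputs: t of alpha over 2 for 95% confidence, keyed by sample size.
t_critical_values = {50: 2.009575, 200: 1.971957, 500: 1.964729}

margins_of_error = {}
for n, t_value in t_critical_values.items():
    standard_error = sample_std_dev / math.sqrt(n)
    # Margin of error = critical value times standard error.
    margins_of_error[n] = t_value * standard_error
    print(f"n = {n:>3}: SE = {standard_error:.6f}, "
          f"t = {t_value:.6f}, margin of error = {margins_of_error[n]:.6f}")
```

Both factors shrink as n grows, but the standard error does most of the work: from 50 to 500 the critical value drops by only about 2%, while the standard error drops by roughly two thirds.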

Excel Sheet (12 of 12) - Slide 159

Confidence Interval Lower Value for sample size 50 is 51.27412 and Upper Value is 61.45278.
For sample size 200 the Lower Value is 53.86642 and Upper Value is 58.86048. For sample size
500 the Lower Value is 54.78998 and Upper Value is 57.93692.

Download the Daily Temperature Excel file (Refer to Sample 3 Sample Sizes - Fifth worksheet)

Transcript

So, let's see what happens to our confidence intervals. Our confidence interval is going to be the
mean of this sample minus its margin of error, that's the lower value, and this one is equal to the
mean of the sample plus its margin of error. So again, if I grab these and just copy them it will
repeat it for me. So, this is again the mean for the sample of 200 minus the margin of error. And
this is the same mean, but plus the margin of error. So, for 500, the sample mean minus the margin
of error; sample mean plus the margin of error. So now, what has happened to our width? At a
sample size of 50 the width of our interval is 10 degrees; that's quite a bit. At 200 the width of the
interval is five, and at 500 the width of the interval is three. So, increasing our sample size
decreases the width of the confidence interval. So, when we want to have high confidence levels
but we also want to be precise, then the best way to achieve that is by increasing the sample size,
because a larger sample size reduces the margin of error.
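
The last step, mean minus and plus the margin of error, can be sketched the same way. The margins of error below are the values reported on the slide. The sample mean of 56.36345 is an assumption made for this sketch: it is the midpoint of the interval endpoints shown on the slide. All names are hypothetical.

```python
# Assumed sample mean: the midpoint of the slide's lower and upper values.
sample_mean = 56.36345

# Margins of error from the slide, keyed by sample size.
margins_of_error = {50: 5.089335, 200: 2.497032, 500: 1.573474}

intervals = {}
for n, margin in margins_of_error.items():
    lower = sample_mean - margin
    upper = sample_mean + margin
    width = upper - lower  # always twice the margin of error
    intervals[n] = (lower, upper, width)
    print(f"n = {n:>3}: lower = {lower:.5f}, upper = {upper:.5f}, "
          f"width = {width:.2f}")
```

The width is simply two margins of error, which is why the interval narrows from roughly 10 degrees to 5 to 3 as the sample size goes from 50 to 200 to 500 while the confidence level stays at 95%.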
