
Unit 5 Tutorials: Hypothesis Testing with Z-Tests, T-Tests, and ANOVA


INSIDE UNIT 5

Sampling Distributions

Sample Statistics and Population Parameters


Sampling With or Without Replacement
Sampling Error and Sample Size
Distribution of Sample Means
Distribution of Sample Proportions

Hypothesis Testing

Hypothesis Testing
Statistical Significance
Type I/II Errors
Significance Level and Power of a Hypothesis Test
One-Tailed and Two-Tailed Tests
Test Statistic
Pick Your Inference Test

Z-Tests for Population Means and Proportions

Standard Normal Table Review


Z-Test for Population Means
Z-Test for Population Proportions
How to Find a Critical Z Value
How to Find a P-Value from a Z-Test Statistic
Confidence Intervals
Confidence Interval for Population Proportion
Calculating Standard Error of a Sample Proportion

T-Tests for Sample Means

T-Tests
How to Find a Critical T Value
How to Find a P-Value from a T-Test Statistic
Confidence Intervals Using the T-Distribution

Calculating Standard Error of a Sample Mean

© 2023 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC.

Analysis of Variance and Chi-Square Tests

Analysis of Variance/ANOVA
One-Way ANOVA/Two-Way ANOVA
Chi-Square Statistic
Chi-Square Test for Goodness-of-Fit
Chi-Square Test for Homogeneity
Chi-Square Test for Association and Independence

Sample Statistics and Population Parameters


by Sophia

 WHAT'S COVERED

This tutorial will explain the distinction between sample statistics and population parameters, with a
review of sampling distributions. Our discussion breaks down as follows:

1. Sample Statistics
2. Population Parameters

1. Sample Statistics
When you take a sample, it is important to try to obtain values that are accurate and represent the true values
for the population. A measure of an attribute of a sample is called a sample statistic.

 EXAMPLE In election season, suppose we took a simple random sample of 500 people from a
town of 10,000 and found that in this particular poll, 285 of those 500 plan to vote for Candidate Y.
That would mean that our best guess for the proportion of the town that will vote for Candidate Y,
when the election actually does happen, is 285 out of 500, or 57%. This 57% is a sample statistic.

We don't know what the real proportion is of people who will vote for Candidate Y. We only know that
after election day. For now, though, this is our best guess as to the proportion that will vote for
Candidate Y. We are using the results of our sample to estimate the value for the population.
In general, the following notations are for the sample statistics that we generate most often. The sample
proportion is shown as p-hat. The sample mean is shown as x-bar. Lastly, a sample standard deviation is
shown as s.

Sample Statistic Examples (Statistics)

Sample proportion = p̂

Sample Mean = x̅

Sample Std Dev = s

 HINT

Basically, the sample mean is the sum of a certain attribute of a sample divided by the sample size.
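These notations can be made concrete with a short Python sketch. The poll numbers come from the example above; the four-value data list is hypothetical, included only to illustrate x-bar and s:

```python
import math

# Poll from the example above: 285 of 500 sampled voters favor Candidate Y.
p_hat = 285 / 500                    # sample proportion, p-hat

# Hypothetical measurements, just to illustrate x-bar and s.
data = [2, 4, 3, 1]
n = len(data)
x_bar = sum(data) / n                # sample mean, x-bar

# The sample standard deviation divides by n - 1, not n.
s = math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))

print(p_hat)          # 0.57
print(x_bar)          # 2.5
print(round(s, 3))    # 1.291
```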
 TERMS TO KNOW

Sample Statistic
A measure of an attribute of a sample.

Sample Mean
A mean obtained from a sample of a given size. Denoted as x̅.

2. Population Parameters
A sample statistic is a measurement from a sample, and a population parameter is the corresponding
measurement for the population. Unlike a statistic, a parameter usually cannot be computed directly; the only
way to determine a parameter exactly is to take a census.

 EXAMPLE In our previous example, the sample proportion was 57%. The population proportion,
however, is unknown; we won't know it until election day. In applied statistics, it is often a goal to use
sample statistics to better understand unknowable population parameters.

Statistic: Sample proportion = 57%
Parameter: Population proportion = ?

A population proportion is denoted as p (without the hat). A population mean is denoted as the Greek letter
mu, and a population standard deviation is shown with the Greek letter sigma.

Population Parameter Examples


(Parameters)

Population proportion = p

Population Mean = μ

Population Std Dev = σ

 HINT

Population mean is basically the sum of a certain attribute of a population divided by the population size.

⭐ BIG IDEA

A parameter is a numerical value that characterizes some aspect of a population.

IN CONTEXT
The mean GPA of all 2,000 students at a school is 2.9, while the GPA of a sample of 50 students is
3.1

What is the population parameter and the sample statistic?

Based on the information provided, we can identify the following population parameters:

Population size = 2,000


Population mean = μ = 2.9

We can also find the following sample statistics:

Sample size = 50
Sample mean = x̅ = 3.1

 TERMS TO KNOW

Population Parameters
Summary values for the population. These are often unknown.

Population Mean
A mean for all values in the population. Denoted as μ.

 SUMMARY

Statistics are sample measures that we can use to estimate parameters, which are the corresponding
population measures. It's important to remember that this only works when the sampling is carried out
well. For instance, if there's bias, the statistics won't accurately reflect the population measures.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Population Mean
A mean for all values in the population. Denoted as μ.

Population Parameters
Summary values for the population. These are often unknown.

Sample Mean

A mean obtained from a sample of a given size. Denoted as x̅.

Sample Statistics
Summary values obtained from a sample.

Sampling With or Without Replacement
by Sophia

 WHAT'S COVERED

This tutorial will cover sampling, both with and without replacement. Our discussion breaks down as
follows:

1. Sampling With Replacement


2. Sampling Without Replacement

1. Sampling With Replacement


Sampling with replacement means that you put everything back once you've selected it.

Typically, one big requirement for statistical inference is that the individuals, the values from the sample, are
independent. One doesn't affect any of the others. When sampling with replacement, each trial is
independent.

 EXAMPLE Consider a standard deck of 52 cards:

What is the probability that you draw a spade?


The probability of a spade on the first draw is 13 out of 52, or one-fourth.

Suppose you pull the 10 of spades, but then you put it back into the deck. Now, what's the probability of a
spade on the second draw?
It's one fourth again. It's the same 52 cards. Therefore, you have the same likelihood of selecting a spade.

⭐ BIG IDEA

When sampling with replacement, the trials are independent.
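The independence of with-replacement draws can be checked with a quick simulation. This is only a sketch; the deck encoding and function name are my own:

```python
import random

random.seed(1)

# A 52-card deck: 13 spades plus 39 other cards.
deck = ["spade"] * 13 + ["other"] * 39

def estimate_p_spade(trials):
    """Draw with replacement many times; estimate P(spade).

    Because every drawn card goes back, each draw sees the full deck."""
    hits = sum(random.choice(deck) == "spade" for _ in range(trials))
    return hits / trials

estimate = estimate_p_spade(100_000)
print(round(estimate, 2))   # stays close to 13/52 = 0.25 on every draw
```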


 TERM TO KNOW

Sampling With Replacement
A sampling plan where each observation that is sampled is replaced after each time it is sampled,
resulting in an observation being able to be selected more than once.

2. Sampling Without Replacement


Typically sampling with replacement will lead to independence, which is a requirement for a lot of statistical
analysis. However, it's not often that you sample with replacement. It simply doesn't make sense to do this in
real life.

 EXAMPLE You wouldn't call a person twice for their opinion in a poll, so you don't put someone
back into the population where they could be sampled again.
Most situations are considered sampling without replacement, which means that each observation is not put
back once it's selected--once it's selected, it's out and cannot be selected again.

 EXAMPLE Let's go back to the example with the standard deck of 52 cards. What is the probability
that you select a spade on the first draw?

On the first draw, you have all 52 cards available, so the probability of drawing a spade is 13 out of 52, or
one-fourth, as we had found before.
Suppose you drew the Ten of Spades and did not place it back in the deck of cards. Now, what's the
probability of a spade on the second draw?

Now there are only 12 spades left out of 51 cards, so the probability of a spade on the second draw is
12/51, which is not equal to one-fourth.

This means that the first draw and the second draw are dependent. The probability of a spade on the
second draw changed after knowing that you got a spade on the first draw and did not replace it before
drawing again.

⭐ BIG IDEA

Even though the sampling that happens in real life doesn't technically fit the definition for independent
observations, there's going to be a workaround.
Suppose that your population was very large. Suppose you had four decks of cards, totaling 208 different
cards.
What is the probability of drawing a diamond?

There are 52 diamonds out of 208 cards, so the probability of a diamond on the first draw is one-fourth,
the same as if there were one deck.

Suppose the worst-case scenario happened in terms of independence, and every card you picked was
the same suit. Take four diamonds from the group and do not replace them.

Now, what is the probability of drawing a diamond on the fifth draw?


There are now only 48 diamonds out of 204 cards remaining, so the probability of a diamond on the fifth
draw is 48/204.

The larger population has an effect now. The probability is about 0.24, which is different from 0.25, but
not dramatically--even after five draws. The probability of a diamond didn't change much from the first
to the last draw.

When you sample without replacement, if the population is large enough, then the probabilities don't shift
very much as you sample. Sampling without replacement becomes almost independent because the
probabilities don't change very much.

The question is, when is the population large enough? How large is considered a large population? You're
going to institute a rule.

 CONCEPT TO KNOW

For independence, a large population is going to be at least 10 times larger than the sample.

If that's the case, then you're going to say that the probabilities don't shift very much when you sample "n"
items from the population. Therefore, you can treat the sampling as being almost independent.
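The numbers above are easy to verify directly, and the rule of thumb reduces to a one-line check. A sketch (the helper name is my own):

```python
# Four decks: 208 cards, 52 of them diamonds.
p_first = 52 / 208          # probability of a diamond on the first draw

# Worst case: the first four draws were all diamonds, none replaced.
p_fifth = 48 / 204          # probability of a diamond on the fifth draw

print(round(p_first, 2))    # 0.25
print(round(p_fifth, 2))    # 0.24 -- barely different

def nearly_independent(population_size, sample_size):
    """The rule of thumb: population at least 10 times the sample size."""
    return population_size >= 10 * sample_size

print(nearly_independent(208, 5))   # True
print(nearly_independent(52, 10))   # False: one deck is too small for 10 draws
```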

 TERM TO KNOW

Sampling without Replacement


A sampling plan where each observation that is sampled is kept out of subsequent selections,
resulting in a sample where each observation can be selected no more than one time.

 SUMMARY

Sampling with replacement is the gold standard, in a sense. It always creates independent trials. The
probability of particular events doesn't change at all from trial to trial. However, in real life, when you
sample without replacement, the probabilities necessarily change. Your workaround is that if the
population from which you're sampling is at least 10 times larger than the sample that you're drawing,
the trials can be considered nearly independent.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Sampling With Replacement


A sampling plan where each observation that is sampled is replaced after each time it is sampled,
resulting in an observation being able to be selected more than once.

Sampling Without Replacement


A sampling plan where each observation that is sampled is kept out of subsequent selections, resulting
in a sample where each observation can be selected no more than one time.

Sampling Error and Sample Size
by Sophia

 WHAT'S COVERED

This tutorial will focus on sampling error and how sample size relates to sampling error. Our
discussion breaks down as follows:

1. Sampling Error
2. The Effect of Sample Size on Sampling Error

1. Sampling Error
Sampling error simply relates to the variability within the sampling distribution. It is the amount by which the
sample statistic differs from the population parameter.

 EXAMPLE Suppose that you have built sampling distributions for samples of several different sizes from this spinner.

This would be a distribution where the number 1 occurs about three-eighths of the time, 2 occurs about
one-eighth of the time, 3 occurs about two-eighths of the time, and 4 also occurs about two-eighths of the
time.
These are the different sampling distributions.

You can see that their means are all the same. However, you can also notice that their standard
deviations, which are the lengths of the arrows, decrease as the sample size increases. That means that
the larger the sample, the closer on average the sample statistic will be to the right answer. This also
means that you will be closer to the population mean, represented by the blue line down the middle of
the graphs above.

What you'll notice is that some of the sample means from samples of size 4 are way up near four or down
near one, when the true population mean is two and three-eighths. Meanwhile, when you look at samples
of size 20, the vast majority of these samples are between two and three--very close to the population
mean of two and three-eighths. So, the distribution of sample means has a smaller standard deviation
with a larger sample size.
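A small simulation shows the same shrinking spread. This sketch assumes the spinner composition from the description above (1 on three of eight equal sectors, 2 on one, 3 and 4 on two each):

```python
import random
import statistics

random.seed(0)

# Spinner assumed from the description: outcomes weighted 3/8, 1/8, 2/8, 2/8.
spinner = [1, 1, 1, 2, 3, 3, 4, 4]

def sd_of_sample_means(n, reps=20_000):
    """Simulate many samples of size n; return the spread of their means."""
    means = [statistics.fmean(random.choices(spinner, k=n)) for _ in range(reps)]
    return statistics.stdev(means)

results = {n: sd_of_sample_means(n) for n in (1, 4, 9, 20)}
for n, sd in results.items():
    print(n, round(sd, 3))   # the spread shrinks as the sample size grows
```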

⭐ BIG IDEA

The sampling error is an amount by which the sample statistic, like a sample mean, is off from the
population parameter (a fixed value that we're trying to estimate). With larger samples, this sampling error
decreases.
When we calculate a margin of error, we're approximating the sampling error. What’s being said is that our
sample statistic is probably within a certain margin of error of the right answer, which we don't know.

 EXAMPLE Think of a poll that states that 60% of people are going to vote for a particular
candidate for office. It's reported with a margin of error of 7%. This is saying that the sample gave us
60%, and the real population proportion of people who are going to vote for that candidate is probably
within plus or minus 7 percentage points of that--somewhere between 53% and 67%.
 TERMS TO KNOW

Sampling Error
The amount by which the sample statistic differs from the population parameter.

Sample Size
The size of a sample of a population of interest.

2. The Effect of Sample Size on Sampling Error
Through no fault of your own, sampling error occurs whenever you use a statistic, like a sample mean or
sample proportion, to estimate a parameter, like a population mean or population proportion. It's important
to note, though, as the previous sampling distributions showed, that since the sampling error decreases as
the sample size increases, you would like as large a sample as possible.

 HINT

Sometimes getting a large sample is precluded by practical concerns like money or time. Perhaps you
simply don't have the money or time to take a large sample, so you are confined to a small sample. That is
fine--just keep in mind that you would like a larger sample if you can get one.
An increased sample size has to be coupled with well-collected data. A large sample size does not rescue
poorly-collected data. If your data are biased, you can't simply double the sample size and assume that
everything will be okay just because there will be less sampling error. It doesn't work that way.

If the questions are poorly worded or there's non-response or other biases like response bias, then the data
aren’t going to become any more accurate. They're not going to accurately approach the population
parameters that you're trying to estimate by taking a larger sample.

⭐ BIG IDEA

Once you've collected your data poorly, you might as well throw it out. An increased sample size does not
rescue it.

 SUMMARY

Sample statistics estimate population parameters. They do it more accurately when the sample size is
large. Often, we don't know what the population parameter is, which is why we've taken the sample in
the first place--to try and estimate the parameter. If the data were properly collected and the sample
size is large--which is ideal--you can be fairly sure that the statistic you get is close to the right
answer, the parameter for the population. When you calculate a margin of error, you're approximating
the sampling error.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Sample Size
The size of a sample of a population of interest.

Sampling Error
The amount by which the sample statistic differs from the population parameter.

Distribution of Sample Means
by Sophia

 WHAT'S COVERED

This tutorial will cover the distribution of sample means. Our discussion breaks down as follows:

1. Distribution of Sample Means


a. Mean
b. Standard Deviation
c. Shape

1. Distribution of Sample Means


A distribution of sample means is a distribution that shows the means from all possible samples of a given
size.

 EXAMPLE Consider the spinner shown here:

Suppose you spun it four times to obtain a sample mean. The first spin was a 2, the second spin was a 4,
the third spin was a 3, and the fourth spin was a 1. The sample mean, then, would be 2.50. There are
many possible samples of size 4 that could be taken from this spinner, and many possible means that
could arise from those samples, as shown below:

Sample Mean

= {2, 4, 3, 1} = 2.50

= {1, 4, 3, 1} = 2.25

= {4, 2, 4, 4} = 3.50

= {2, 2, 3, 1} = 2.00

= {3, 1, 1, 1} = 1.50

= {1, 1, 1, 2} = 1.25

So how can we represent all these distributions?


 STEP BY STEP

Step 1: First, take these sample means and graph them. Draw out an axis. For this one, it should go from 1
to 4 because the spins can't average anything higher than four or lower than one.
Step 2: Take the average value, for example, the mean of 2.5, and put a dot at 2.5 on the x-axis, much like
a dot plot. Do this for all the sample means that you have found.

Step 3: You can keep doing this over and over again. Ideally, you would do this hundreds or thousands of
times, to show the distribution of all possible samples that could be taken of size four. Once you’ve
enumerated every possible sample of size four from this spinner, then the sampling distribution looks like
this:

On the graph, the lowest number you can get is one, and the highest number you can get is four. On the
far right of the graph is the point that represents a spin of 4 fours, {4, 4, 4, 4}. On the far left is the point
that represents a spin of 4 ones, {1, 1, 1, 1}. Notice that 4 ones happens more than 4 fours. Why is that? If
you take a look at the spinner, you'll see that there are more ones on the spinner than there are fours.
You can also notice that, since there are more ones, this actually pulls the average down a bit. The most
frequent average is 2.25, not 2.5, which would be the exact middle between 1 and 4. Therefore, this
distribution is skewed slightly to the right because the numbers on the spinner are not evenly distributed.
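Because the spinner has only eight equally likely sectors, the full sampling distribution for samples of size four can be enumerated exactly rather than simulated. A sketch, assuming the spinner composition described in this unit:

```python
from collections import Counter
from itertools import product

# Eight equally likely sectors: three 1s, one 2, two 3s, two 4s.
spinner = [1, 1, 1, 2, 3, 3, 4, 4]

# Every possible ordered sample of four spins: 8**4 = 4096 outcomes.
means = [sum(sample) / 4 for sample in product(spinner, repeat=4)]

counts = Counter(means)
print(counts.most_common(1))       # [(2.25, 656)] -- the most frequent mean
print(sum(means) / len(means))     # 2.375 -- the population mean
```

The mode of 2.25 sits below the overall mean of 2.375, which is the slight right skew described above.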
 TERM TO KNOW

Distribution of Sample Means


A distribution where each data point consists of a mean of a collected sample. For a given sample
size, every possible sample mean will be plotted in the distribution.

1a. Mean
For the spinner above, the following are the histograms for a sample size of 1 spin, 4 spins, 9 spins, and
20 spins.

With the sampling distribution when the sample size was 1, you'll see that 1 occurs about 3/8 of the time, 2
occurs about 1/8 of the time, 3 occurs about 2/8 of the time, and 4 occurs about 2/8 of the time.
This produces a mean of 1(3/8) + 2(1/8) + 3(2/8) + 4(2/8) = 2.375.

You'll notice that when the sample size is 4, the shape of the distribution of sample means is significantly
different from when the sample size was 1. However, there are some similarities and differences that you can
recognize here about all four of these sampling distributions. The similarities are their centers--all of them
are centered at 2.375. You'll notice that some of these are more tightly packed around that number--for
instance, the samples of size 20 are more tightly packed around 2.375 than the samples of size 1--but
they all are centered at that very same number.

What we can see here is that the mean of the sampling distribution of sample means is the same as the
mean for the population. In this case, it was 2.375.
 FORMULA

Mean of a Sampling Distribution of Sample Means

μ_x̅ = μ

1b. Standard Deviation


How about the spread? The arrows on each of the histograms below indicate the standard deviation of
each distribution.

Notice the arrows on the first distribution are very wide, and they seem to diminish in size as each
distribution is graphed. When we get to the lowest distribution where the sample size was 20, its spread
is much, much less.

The rule that's being followed is that the standard deviation of a distribution of sample means is the
standard deviation of the population divided by the square root of sample size.
 FORMULA

Standard Deviation of a Sampling Distribution of Sample Means

σ_x̅ = σ / √n

What that indicates is that when the sample size is 4, the standard deviation of that sampling distribution
of sample means is going to be half as large as it was when the sample size was one. When the sample
size is 9, it's going to be a third the size of the original standard deviation. And when n is 20, it's going to
be the original standard deviation divided by the square root of 20.
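A quick numerical check of the σ/√n rule, using the spinner's population values (composition assumed from the description in this unit):

```python
import math

# Spinner population: three 1s, one 2, two 3s, two 4s.
spinner = [1, 1, 1, 2, 3, 3, 4, 4]

mu = sum(spinner) / len(spinner)     # population mean: 2.375
# Population SD divides by N (a census of all sectors), not N - 1.
sigma = math.sqrt(sum((x - mu) ** 2 for x in spinner) / len(spinner))

for n in (1, 4, 9, 20):
    print(n, round(sigma / math.sqrt(n), 3))
```

This prints roughly 1.218, 0.609, 0.406, and 0.272: half the original SD at n = 4, a third at n = 9, and the original divided by √20 at n = 20.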

 HINT

The standard deviation of the sampling distribution is also called the standard error.
 TERMS TO KNOW

Standard Deviation of a Distribution of Sample Means


The standard deviation of the population, divided by the square root of sample size.

Standard Error
The standard deviation of the sampling distribution of sample means.

1c. Shape
Lastly, having measured the center and spread, let's describe the shape of these distributions. You'll
notice that the shape becomes more and more like the normal distribution as the sample size increases.
There's a theorem that describes this, called the central limit theorem.
The Central Limit Theorem states that when the sample size is large (at least 30 for most distributions
with a finite standard deviation), the sampling distribution of the sample means is approximately normal.
This means we can use the normal distribution to calculate probabilities on them, which is nice because
normal calculations are easy to do.
Therefore, it's going to be normal, or approximately normal, with a mean of the same as that of the
population, and a standard deviation equal to the standard deviation of the population divided by the
square root of sample size.
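Once the sampling distribution is treated as normal, probabilities about a sample mean follow directly. A sketch using Python's statistics.NormalDist; the population numbers are the spinner's, and the sample of 36 spins is a hypothetical choice that satisfies n ≥ 30:

```python
import math
from statistics import NormalDist

# Spinner population values; n = 36 is hypothetical but large enough for the CLT.
mu, sigma, n = 2.375, 1.218, 36

# By the Central Limit Theorem, the sample mean is approximately normal
# with mean mu and standard deviation sigma / sqrt(n).
sampling_dist = NormalDist(mu=mu, sigma=sigma / math.sqrt(n))

# Probability that the mean of 36 spins exceeds 2.5.
p = 1 - sampling_dist.cdf(2.5)
print(round(p, 2))   # roughly 0.27
```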
 TERM TO KNOW

Central Limit Theorem


A theorem that explains the shape of a sampling distribution of sample means. It states that if the
sample size is large (generally n ≥ 30), and the standard deviation of the population is finite, then the
distribution of sample means will be approximately normal.

 SUMMARY

The distribution of sample means is called a sampling distribution of sample means. The sampling
distribution of sample means has an approximately normal sampling distribution when the sample
size is large. This is the Central Limit Theorem. The mean of the sampling distribution is the mean of
the population. The standard deviation of the sampling distribution, which is also called the standard
error, is the standard deviation of the population divided by the square root of the sample size.

Good luck!

 TERMS TO KNOW

Central Limit Theorem


A theorem that explains the shape of a sampling distribution of sample means. It states that if the
sample size is large (generally n ≥ 30), and the standard deviation of the population is finite, then the
distribution of sample means will be approximately normal.

Distribution of Sample Means


A distribution where each data point consists of a mean of a collected sample. For a given sample size,
every possible sample mean will be plotted in the distribution.

Standard Deviation of a Distribution of Sample Means


The standard deviation of the population, divided by the square root of sample size.

Standard Error
The standard deviation of the sampling distribution of sample means.

 FORMULAS TO KNOW

Mean of a Sampling Distribution of Sample Means

μ_x̅ = μ

Standard Deviation of a Sampling Distribution of Sample Means

σ_x̅ = σ / √n

Distribution of Sample Proportions
by Sophia

 WHAT'S COVERED

This tutorial will cover the distribution of sample proportions, which is called a sampling distribution.
Our discussion breaks down as follows:

1. Sample Proportions
a. Mean
b. Standard Deviation
c. Shape

1. Sample Proportions
Many different situations can provide you with proportions.

 EXAMPLE Suppose that you were taking a poll during a political season, and you calculated the
proportion of people that were going to vote for a particular candidate.
However, proportions like this are typically sample proportions. The only way to obtain the true population
proportion, which is the parameter we're trying to estimate, is by taking a census. If you had some
binomial-type question--meaning, are you going to vote for one candidate or the other--and you took a census, you
would be able to know the parameter.

In most cases, you only deal with samples. You will want to figure out what the distribution of sample
proportions actually looks like, which is the distribution of all possible sample proportions for a certain size, n.

 EXAMPLE Consider flipping a coin ten times. Obviously, you would expect 50% heads and 50%
tails, however, it doesn't always work out exactly that way.

Suppose the first time you flipped ten coins, you got 6 heads, or 60% heads.

The next time you flipped ten coins, you got 70% heads. So the proportion of heads changes from trial to
trial, or rather from sample to sample. The first time, you got 60% heads in your sample; the second time,
70%. Suppose you do this many times, obtaining a sample proportion of heads every time.

Sample Sample Prop Heads

=HHHTHTTHHT = 0.6

=HTHHHTHTHH = 0.7

=HHTHHHTHTT = 0.6

=TTHTHTTHTH = 0.4

=TTTTHHTHHH = 0.5

=HHHHHTTTTH = 0.6

Next, you can start to graph those sample proportions on a dot plot. Take the 0.6 and graph it, and then
the 0.7, then the 0.6 again, stacking up the second dot on top of the first dot.

Repeat this process for every possible sample of size ten. Eventually, you would obtain a distribution that
looks like this:

This is the distribution of the sample proportions of heads. This is what is called a sampling distribution of
proportions.
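The coin-flip process above is easy to simulate. This sketch tallies the sample proportions from many samples of ten flips; the function name is my own:

```python
import random
from collections import Counter
from statistics import fmean

random.seed(42)

def sample_proportion(n=10):
    """Flip n fair coins; return the proportion that land heads."""
    return sum(random.random() < 0.5 for _ in range(n)) / n

# Many samples of size ten, each contributing one dot to the distribution.
props = [sample_proportion() for _ in range(10_000)]
dist = Counter(props)

print(max(dist, key=dist.get))   # the most common sample proportion: 0.5
print(round(fmean(props), 2))    # the distribution centers near p = 0.5
```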
 TERM TO KNOW

Distribution of Sample Proportions


The distribution of all possible sample proportions for a certain size, n.

1a. Mean
For the scenario above, notice that it peaks at 0.5, exactly where you would expect. Also, notice that it
sort of falls in almost a normal-looking shape off to each side. Very rarely did you get all of them being
heads (a sample proportion of one) and very rarely did you get none of them being heads (a sample
proportion of zero).
Notice that the mean of the distribution of sample proportions is the value of p, the actual probability of
getting heads, which was 0.5. The distribution centers around the proportion, or probability, of heads for
a single trial.
 FORMULA

Mean of a Distribution of Sample Proportions

μ_p̂ = p

1b. Standard Deviation
The number of successes is actually a binomial variable: each trial is either a success or a failure, the trials
are independent, and all of the other requirements for a binomial variable are met. Since this is the case, when
we graph the proportion of successes--which is the number of successes divided by the sample size, n--the
standard deviation will be the standard deviation of the binomial distribution, divided by n.

Therefore, the standard deviation of a distribution of sample proportions is the square root of n times p times
q, divided by n. After some algebra, this simplifies to the square root of p times q over n. This is also known as
the standard error.
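The algebra can be confirmed numerically: both forms below compute the same standard error (the function names are my own):

```python
import math

def se_binomial_count_over_n(n, p):
    """sqrt(n*p*q) / n -- the binomial count's SD divided by the sample size."""
    q = 1 - p
    return math.sqrt(n * p * q) / n

def se_simplified(n, p):
    """The simplified, equivalent form: sqrt(p*q / n)."""
    q = 1 - p
    return math.sqrt(p * q / n)

# Ten coin flips with p = 0.5: both forms agree.
print(round(se_binomial_count_over_n(10, 0.5), 4))   # 0.1581
print(round(se_simplified(10, 0.5), 4))              # 0.1581
```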

 FORMULA

Standard Deviation of a Distribution of Sample Proportions

σ_p̂ = √(pq / n)

 HINT

If p is the probability of success, q is the probability of failure, which is equal to 1-p.


 TERMS TO KNOW

Standard Deviation of a Distribution of Sample Proportions


A measure calculated by taking the square root of the quotient of p(1-p) and n.

Standard Error
The standard deviation of the sampling distribution of sample proportions.

1c. Shape
For a distribution of sample proportions, we have discussed that the mean is equal to the probability of
success and the standard deviation is equal to the square root of p times q over n.

You're going to use the binomial count in the numerator again to determine the shape. Since the sampling
distribution of sample proportions is a binomial variable divided by a constant--that is, some number of
successes divided by n--the rules for its shape follow those of the binomial distribution.

That is, it's going to be skewed to the left when the value of p is high and the sample size is low. It's going to
be skewed to the right when the probability of success is low and the sample size is low. Then, when the
sample size is large, it will be approximately normal.

Again, how large is large? When n times p is at least ten and when n times q is at least ten, the distribution of
sample proportions will be approximately normal, with the mean of p and the standard deviation of the square
root of p times q over n.

This is going to be one of our conditions for inference if you're going to use normal calculations, which you'll
want to do because they're easy to deal with. You're going to require that n times p is at least ten, and n times
q is also at least ten.
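As a sketch, this condition is a one-line check (the function name is my own):

```python
def normal_approx_ok(n, p):
    """Large-sample condition for proportions: n*p >= 10 AND n*q >= 10."""
    q = 1 - p
    return n * p >= 10 and n * q >= 10

print(normal_approx_ok(100, 0.5))    # True: np = nq = 50
print(normal_approx_ok(20, 0.1))     # False: np = 2 is too small
print(normal_approx_ok(200, 0.97))   # False: nq = 6 is too small
```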

⭐ BIG IDEA

A condition for inference with a distribution of sample proportions states that n times p is at least ten
AND n times q is at least ten.

 SUMMARY

You've learned about the distribution of sample proportions, the standard deviation of a distribution of
sample proportions, and standard error, which is the same thing as the standard deviation of the
sampling distribution. The sampling distribution of sample proportions has an approximately normal
sampling distribution when the number of trials is large, referring to the shape. Its mean is the
proportion of successes in the population--that's the center. In addition, the standard deviation of the
sampling distribution, which is also called standard error, is the square root of the product of the
probabilities of success and failure, divided by the number of trials. That's the spread.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Distribution of Sample Proportions


A distribution where each data point consists of a proportion of successes of a collected sample. For a
given sample size, every possible sample proportion will be plotted in the distribution.

Standard Deviation of a Distribution of Sample Proportions


The square root of the product of the probabilities of success and failure (p and q, respectively) divided
by the sample size.

Standard Error
The standard deviation of the sampling distribution of sample proportions.

 FORMULAS TO KNOW

Mean of a Distribution of Sample Proportions
mean = p, the proportion of successes in the population

Standard Deviation of a Distribution of Sample Proportions
standard deviation = square root of (p times q, divided by n)

Hypothesis Testing
by Sophia

 WHAT'S COVERED

This tutorial will cover the basics of hypothesis testing. Our discussion breaks down as follows:

1. Hypothesis Testing
2. Null and Alternative Hypotheses
3. Reject or Fail to Reject the Null Hypothesis

1. Hypothesis Testing
Hypothesis testing is the standard procedure in statistics for testing a hypothesis, or claim, about population
parameters.

IN CONTEXT
Suppose a Liter O'Cola company has a new Diet Liter O'Cola, which they claim is indistinguishable
from Classic Liter O'Cola. They obtain 120 individuals to do a taste test.

If their claim is true, some people will identify the diet soda just by guessing correctly. How many
people will do that? Probably around 60 people, or 50%: 50% would guess correctly and 50% would
guess incorrectly, simply based on guessing, even if the Diet Cola were indistinguishable from the
Classic Cola.

Now, suppose that you didn't get an exact 50/50 split. Suppose 61 people correctly identified the
diet Cola. Would that be evidence against the company's claim? Well, it's more than half, but it's not
that much more than half. We would say no. Sixty-one isn't that different from 60. Therefore, it's not
really evidence that more than half of people can correctly identify the diet soda.

Suppose that 102 people of the group were able to identify the diet cola correctly. Is that evidence
against the company's claim? In this case, 102 is significantly more than half. We would say that this
would be evidence that at least some of the people could taste the difference. Even if some of those
102 were guessing, it's evidence that at least some of those 102 can taste the difference.

Now, the question posed to us with the 102 is if the people were guessing randomly just by chance,
what would be the probability that we would get 102 correct answers or more? Isn't it possible that
102 out of 120 could correctly pick the diet cola just by chance? Anything is possible.

However, if this was a low probability, then the evidence doesn't really support the hypothesis of

guessing. In fact, it would appear that some people can taste the difference.
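The "what would be the probability" question above is a binomial tail probability, which can be computed directly. This sketch is our own addition; `binom_tail` is a hypothetical helper name:

```python
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# If all 120 tasters were guessing (p = 0.5):
print(binom_tail(102, 120, 0.5))  # vanishingly small, far below 0.001
print(binom_tail(61, 120, 0.5))   # close to 0.5 -- entirely plausible by chance
```

Getting 102 or more correct essentially never happens by guessing alone, which is why it counts as evidence against the claim, while 61 or more happens nearly half the time.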

 TERMS TO KNOW

Hypothesis Testing
The standard procedure in statistics for testing claims about population parameters.

Hypothesis
A claim about a population parameter.

2. Null and Alternative Hypotheses


With hypothesis testing, there are two hypotheses that are pitted against each other.

Null Hypothesis: A claim about a particular value of a population parameter that serves as the starting
assumption for a hypothesis test.
Alternative Hypothesis: A claim that a population parameter differs from the value claimed in the null
hypothesis.

The null hypothesis is a default hypothesis that is temporarily accepted as true.

 EXAMPLE Refer back to the competing hypotheses from above. The null hypothesis will be that
Liter O'Cola claims 50% of people will correctly select the diet cola. We will state the null
hypothesis as: the true proportion of people who can correctly identify the diet soda, p, is equal to 1/2.

The suspicion is that perhaps over 50% of people will select the diet cola--some of those by chance, and
some of those because they can actually taste the difference. This is called the alternative hypothesis,
which in essence is a "something is going on here" type of assumption.

 HINT

The notation is H subscript 0 for the null hypothesis (H0), and H subscript a for the alternative hypothesis
(Ha).

The null hypothesis is always an equality, and the alternative hypothesis can be expressed in several ways,
depending on the problem: either a "less than" symbol, a "greater than" symbol, or a strictly "not equal
to" symbol.
 TERMS TO KNOW

Null Hypothesis
A claim about a particular value of a population parameter that serves as the starting assumption for a
hypothesis test.

Alternative Hypothesis
A claim that a population parameter differs from the value claimed in the null hypothesis.

3. Reject or Fail to Reject the Null Hypothesis
In this example, if significantly more than half of the cola drinkers in our sample of 120 can correctly select the
diet soda, we would reject the null hypothesis where Liter O'Cola claims that 50% of people will correctly
select diet cola by chance.

If we reject the null hypothesis, then we are saying that we are in favor of the alternative hypothesis, which
states that there is convincing evidence that more than half of people will correctly identify the diet cola.

Now, significantly more than half is a loose term. How many is that? It was decided that 102 was probably
significant, while 61 probably wasn't that significant. We'll leave that definition for another time. On the other
hand, if not significantly more than half of the participants select the diet soda, then you would fail to reject the
null hypothesis. For instance, the 61 is not significantly more than half of the participants, and so you'd fail to
reject the null hypothesis.

 HINT

Notice you don't say the word "accept" the null hypothesis. Why not? Why do you fail to reject the null
hypothesis and not accept it? There's a very good reason for that.

When you do an experiment like this, you already believe the null hypothesis and try to provide evidence
against it. If there isn't enough legitimate evidence against it or strong enough evidence to reject it, then
all you can do is not reject it. You haven't proven that the null hypothesis is true, you just haven't
presented strong enough evidence to prove it false.

 SUMMARY

You learned about the hypotheses in the hypothesis test: the null and alternative hypotheses. You pit
those against each other and calculate probabilities in order to make a decision about the population.
Hypothesis testing involves a lot of things. You start by stating your assumption about the population,
which is the null hypothesis, denoted H subscript 0. You determine if the evidence gathered
contradicts the assumption, leading you to reject the null hypothesis in favor of the alternative
hypothesis, H sub a. You can calculate conditional probabilities by questioning the probability that
you would obtain statistics at least as extreme as these from a sample if the null hypothesis were, in
fact, true.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Alternative Hypothesis
A claim that a population parameter differs from the value claimed in the null hypothesis.

Hypothesis
A claim about a population parameter.

Hypothesis Testing
The standard procedure in statistics for testing claims about population parameters.

Null Hypothesis
A claim about a particular value of a population parameter that serves as the starting assumption for a
hypothesis test.

Statistical Significance
by Sophia

 WHAT'S COVERED

This tutorial will cover statistical significance, which is an important concept in hypothesis testing. Our
discussion breaks down as follows:

1. Statistical Significance
2. Practical Significance

1. Statistical Significance
When you run a significance test, you need to determine what level of departure is considered a significant
departure from what you would have expected to have happened.

IN CONTEXT
Suppose you work in research at Liter O'Cola company. They've developed a new diet cola that
they believe is indistinguishable from the classic cola. Therefore, you obtain 120 individuals to do a
taste test. If the claim is true, what percent of people should select the correct cola just by random
chance, by guessing?

Well, if Liter O'Cola's claim is correct, about 50% of people would just guess correctly and 50%
would guess incorrectly if presented with the two options. So now the question is, at what point are
we going to stop believing Liter O'Cola's claim?

Suppose 61 people were able to pick the diet cola. Is this evidence against the claim? Well, 61 is not
that different from 60, so you're going to say no. This is not significantly different from what you
would expect.

Conversely, suppose 102 people were able to pick the diet cola correctly. Would that be evidence
against the company's claim?

In this case, you would probably say yes--102 is significantly over 60, and 60 is what you would
expect had they been randomly guessing. It's fairly unusual that you would see 102 people get it
right by randomly guessing out of 120. Therefore, this is evidence that some people can taste the
difference.

This is the whole idea of statistical significance. The result of 61 out of 120 is not a significant result, meaning
that it is not evidence against the claim or null hypothesis. Conversely, the 102 would be evidence against

the null hypothesis, because it's so much higher than what we would have expected. Statistical significance
means that you doubt that the results that you obtain are due to chance.

Instead, you believe that it's part of some larger trend. For instance, in the cola example, you don't believe the
null hypothesis that people can't distinguish. You believe that the trend is that people, in fact, can distinguish.

So, if 61 people correctly identify it, you're not convinced that over half can identify the diet cola. The
difference might be due only to chance. In fact, it probably is. On the other hand, a difference of 42 from
what you expect is probably not due to chance. That would be called statistically significant.

 TERM TO KNOW

Statistical Significance
The statistic obtained is so different from the hypothesized value that we are unable to attribute the
difference to chance variation.

2. Practical Significance
Practical significance is whether or not something is meaningful in the real world. With practical significance,
we can ask ourselves, in practice, does this affect our lives?

It's important to make the distinction between practical significance and statistical significance. They're not
necessarily the same thing.

Suppose you had a large enough sample. It's possible if the sample size was large enough that even
something as close to 50% as 50.1% correct guessing could be considered statistically significant, even
though 50.1% is not that different from 50%.

The statistical significance argument is based largely on sample size and how far off from this 50% percent
claim you are. If the sample size is big, you don't need to be very far off. If the sample size is small, you need
to be further off in order to claim significance. However, if the sample size is big, you might get something like
50.1%, which is not considered practically significant.
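This relationship between sample size and significance can be seen in the one-proportion z statistic, which divides the observed departure from the claimed proportion by the standard error. The numbers below are our own illustration of the 50.1% scenario, not from the tutorial:

```python
import math

def one_prop_z(p_hat, p0, n):
    """One-proportion z statistic: (p_hat - p0) / sqrt(p0 * (1 - p0) / n)."""
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# 50.1% correct out of 1,000 tasters: nowhere near significant
print(round(one_prop_z(0.501, 0.5, 1_000), 3))       # 0.063
# 50.1% correct out of 4,000,000 tasters: z = 4, highly significant
print(round(one_prop_z(0.501, 0.5, 4_000_000), 3))   # 4.0
```

The same tiny 0.1% departure is statistically insignificant at one sample size and wildly significant at another, even though it is practically meaningless in both cases.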

IN CONTEXT
A state survey of all high school students finds that 15% of 10th graders drink regularly. A town
randomly selects 100 students and finds that 18% of their 10th graders drink regularly.

By running a statistical test at a chosen significance level, we can determine whether this result is
statistically significant or not.

Now, to decide whether this is practically significant, we need to consider whether it affects our lives
in the real world. For this town, even if the test came back showing no statistical significance and the
18% result could be chance, you may still want to act on this report, because it concerns something
serious and may still have real-world meaning.

So without doing a test, we cannot say that this is statistically significant, but it may be practically
significant.

 TERM TO KNOW

Practical Significance
An arbitrary assessment of whether observations reflect a practical real-world use.

 SUMMARY

You learned about statistical significance and how to measure it versus practical significance. You
also learned how those two are not necessarily the same. Statistical significance is the extent to which
a sample measurement is evidence of a trend, like being able to taste the difference between regular
cola and diet cola, and whether the difference can be attributed to chance. Sometimes very small
differences can be statistically significant, though not have a lot of real-life meaning, which is practical
significance.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Practical Significance
An arbitrary assessment of whether observations reflect a practical real-world use.

Statistical Significance
The statistic obtained is so different from the hypothesized value that we are unable to attribute the
difference to chance variation.

Type I/II Errors
by Sophia

 WHAT'S COVERED

This tutorial will cover the difference between a Type I error and a Type II error in a hypothesis test.
Our discussion breaks down as follows:

1. Type I and Type II Errors


2. Consequences of Type I and Type II Errors

1. Type I and Type II Errors


When you think about a hypothesis test as a decision-making tool, it's possible that you could be making
some errors. Suppose, for example, you're in a clinical trial for a new drug. There are two possibilities: the
drug is effective, or it is not.

 HINT

Recall that H0 is the null hypothesis, and Ha is the alternative hypothesis.


When you use a hypothesis test as a decision-making tool, you might make a different decision. There are two
possibilities for the decision you arrive at:

1. You could fail to reject the null hypothesis that the drug is not effective.
2. You could reject the null hypothesis in favor of the alternative hypothesis that the drug is effective.

One of those two will be your conclusion.

However, there's only one thing that's actually true and fact. Suppose these are the four different possibilities.
Two of them are the correct decisions.

                                       Reality
                                Drug is effective   Drug is not effective

Decision
  Reject H0;
  decide drug is effective      Correct Decision
  Fail to reject H0;
  decide drug isn't effective                       Correct Decision

With the two correct decisions, if the drug was effective, you should reject the null hypothesis and decide that
the drug is effective. Also, if the drug isn't effective, you should fail to reject the null hypothesis and decide
that the drug isn't as effective as it would have needed to be to reject it.

The other two possibilities are considered a Type I error or a Type II error.

                                       Reality
                                Drug is effective   Drug is not effective

Decision
  Reject H0;
  decide drug is effective      Correct Decision    Type I Error
  Fail to reject H0;
  decide drug isn't effective   Type II Error       Correct Decision

A Type I error is an error that occurs when a true null hypothesis is rejected. In the example above, a Type I
error would happen when the drug is not effective, but you decide that it is effective. The drug is not effective,
but you rejected the null hypothesis anyway. Based on your data, you thought that you had enough evidence
to reject the null hypothesis, but, in fact, the drug is not effective.

A Type II error is an error that occurs when a false null hypothesis is not rejected. As you can see in the chart
above, the drug was effective, but the data didn't make it clear enough, and so you failed to reject the null
hypothesis. This incorrect decision would be considered a Type II error.

 TERMS TO KNOW

Type I Error
An error that occurs when a true null hypothesis is rejected.

Type II Error
An error that occurs when a false null hypothesis is not rejected.

2. Consequences of Type I and Type II Errors


What are the consequences of each of those? Think back to a Type I error versus a Type II error.

A Type I error would have a consequence of you approving the drug and allowing the public to have it, even
though it's not effective. You're also unleashing all the potential negative side effects that this drug might
have. There's really no upside here and some negative consequences.

In a Type II error, you would not allow the drug to go to market because you think it's not effective when, in
fact, it is. You would deny an effective drug to the public who might need it, because you didn't know it was

effective, based on your data. This is another negative consequence. These errors always have negative
consequences.

⚙ THINK ABOUT IT

Which one are you more easily able to reconcile with yourself? In this case, probably a Type II error. It
would be difficult to deal with the idea of unleashing something that might hurt people just because you
think it might be effective. Typically, you need some hard evidence--if there's not hard evidence, you
would deny the drug.

IN CONTEXT
In the criminal justice system, juries are told to presume that someone is innocent until proven guilty,
meaning the null hypothesis is that the suspect is innocent, and the prosecution has to prove its
case. What would a Type I and Type II error look like in this context?

A Type I error would be that the person is innocent, but they're convicted anyway.

A Type II error would be that the person is guilty, but the result of the trial is that they're acquitted.

Obviously, both of these are problematic, but the criminal justice system in America puts a lot of
safeguards in place to make sure that a Type I error doesn't happen very often. In fact, the criminal
justice system allows a Type II error to happen fairly frequently in order to reduce a Type I error.

You may think a Type I error is absolutely the worst thing you can do in this particular case, but it's
not always this way. Sometimes a Type II error is worse. It depends on the situation, and so you have
to analyze each situation to determine which one is a worse mistake to make.

 SUMMARY

When you talk about a hypothesis test as a decision-making tool, you might be making an error in
your judgment. It's not that you made a mistake, but the result that you choose might not match what
is really the case. A Type I error is when the null hypothesis is rejected when, in fact, it's true. A Type II
error is when the null hypothesis is not rejected. In reality, it's false, but you didn't reject it. The severity
of these errors depends on the context. In both the examples covered in the tutorial, a Type I error
was worse. However, there are conceivably some scenarios where a Type II error might be worse.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Type I Error
In a hypothesis test, when the null hypothesis is rejected when it is, in fact, true.

Type II Error
In a hypothesis test, when the null hypothesis is not rejected when it is, in fact, false.

Significance Level and Power of a Hypothesis
Test
by Sophia

 WHAT'S COVERED

This tutorial will cover how to identify factors that influence the significance level and power of a
hypothesis test. Our discussion breaks down as follows:

1. Significance Level
a. Selecting an Appropriate Significance Level
b. Cautions about Significance Level
2. Power of a Hypothesis Test

1. Significance Level
Before we begin, you should first understand what is meant by statistical significance. When you calculate a
test statistic in a hypothesis test, you can calculate the p-value. The p-value is the probability that you would
have obtained a statistic as large (or small, or extreme) as the one you got if the null hypothesis is true. It's a
conditional probability.

Sometimes you're willing to attribute whatever difference you found between your statistic and your
hypothesized parameter to chance. If so, you fail to reject the null hypothesis.

If you’re not, meaning it's just too far away from the mean to attribute to chance, then you’re going to reject
the null hypothesis in favor of the alternative.

This is what it might look like for a two-tailed test.

The hypothesized mean is right in the center of the normal distribution. Anything that is considered to be too

far away--something like two standard deviations or more away--you would reject the null hypothesis.
Anything you might attribute to chance, within the two standard deviations, you would fail to reject the null
hypothesis. Again, this is assuming that the null hypothesis is true.
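The "two standard deviations or more" line in the sand corresponds to a critical z value, which can be computed from the chosen alpha. A quick sketch using Python's standard library:

```python
from statistics import NormalDist

alpha = 0.05
# Two-tailed test: split alpha between the two tails
z_critical = NormalDist().inv_cdf(1 - alpha / 2)
print(round(z_critical, 2))  # 1.96
```

So "about two standard deviations" is, more precisely, 1.96 standard deviations when alpha is 0.05.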

However, think about this. This whole curve assumes that the null hypothesis is true, but you make a decision
to reject the null hypothesis anyway if the statistic you got is far away, because such a statistic would rarely
happen by chance. If the null hypothesis really is true, though, rejecting it is technically the wrong decision.
This idea that we're comfortable making some error sometimes is captured by the significance level.

The probability of rejecting the null hypothesis in error, in other words, rejecting the null hypothesis when it is,
in fact, true, is called a Type I Error.

Fortunately, you get to choose how big you want this error to be. You could have stated that three standard
deviations from the mean on either side as "too far away". Or, for instance, you could say you only want to be
wrong 1% of the time, or 5% of the time, meaning that you are rejecting the null hypothesis in error that often.

This value is known as the significance level. It is the probability of making a Type I error. We denote it with
the Greek letter alpha (α).

 TERM TO KNOW

Significance Level
The probability of making a Type I error. Abbreviated with the symbol alpha, α.

1a. Selecting an Appropriate Significance Level


When you choose how big you want alpha to be, you do it before you start the test. You do it this way to
reduce bias: if you had already run the test, you could choose an alpha level that would automatically
make your result seem more significant than it is. You don't want to bias your results that way.

Take a look back at this visual here.

The alpha, in this case, is 0.05. If you recall, the 68-95-99.7 rule says that 95% of the values will fall within two
standard deviations of the mean, meaning that 5% of the values will fall outside of those two standard
deviations. You will reject the null hypothesis for the most extreme 5% of cases--those you are not willing
to attribute to chance variation from the hypothesized mean.

The level of significance will also depend on the type of experiment that you're doing.

 EXAMPLE Suppose you are trying to bring a drug to market. You want to be extremely cautious
about how often you reject the null hypothesis. You will reject the null hypothesis if you’re fairly certain
that the drug will work. You don't want to reject the null hypothesis of the drug not working in error,
thereby giving the public a drug that doesn't work.
If you want to be really cautious and not reject the null hypothesis in error very much, you'll choose a low
significance level, like 0.01. This means that only the most extreme 1% of cases will have the null hypothesis
rejected.

If you don't believe a Type I Error is going to be that bad, you might allow the significance level to be
something higher, like 0.05 or 0.10. Those still seem like low numbers. However, think about what that means.
This means that one out of every 20, or one out of every ten samples of that particular size will have the null
hypothesis rejected even when it's true. Are you willing to make that mistake one out of every 20 times or
once every ten times? Or are you only willing to make that mistake one out of every 100 times? Setting this
value to something really low reduces the probability that you make that error.
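You can watch the significance level play out by simulation: generate many samples from a world where the null hypothesis is actually true, and count how often a two-tailed z test rejects anyway. This simulation is our own illustration (known sigma of 1, sample size 30), not part of the tutorial:

```python
import random
from statistics import NormalDist

random.seed(1)
z_crit = NormalDist().inv_cdf(0.975)   # alpha = 0.05, two-tailed
n, trials, rejections = 30, 10_000, 0

for _ in range(trials):
    # Draw a sample from the null distribution: the true mean really is 0
    sample = [random.gauss(0, 1) for _ in range(n)]
    z = (sum(sample) / n) * n ** 0.5   # z = sample mean / (sigma / sqrt(n)), sigma = 1
    if abs(z) > z_crit:
        rejections += 1                # every rejection here is a Type I error

print(rejections / trials)             # hovers near 0.05
```

Roughly 5% of these samples get the null hypothesis rejected even though it is true, which is exactly what choosing alpha = 0.05 means.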

1b. Cautions about Significance Level
It is important to note that you don't want the significance level to be too low. The problem with setting it really
low is that as you lower the probability of a Type I error, you actually increase the probability of a Type II error.

A Type II Error is failing to reject the null hypothesis when a difference does exist. This reduces the power or
the sensitivity of your significance test, meaning that you will not be able to detect very real differences from
the null hypothesis when they actually exist if your alpha level is set too low.

2. Power of a Hypothesis Test


You might wonder, what is power? Power is the ability of a hypothesis test to detect a difference that is
present.

Consider the curves below. Note that μ0 is the hypothesized mean and μA is the actual mean. The actual
mean is different than the null hypothesis; therefore, you should reject the null hypothesis. What you end up
with is an identical curve to the original normal curve.

If you take a look at the curve below, it illustrates the way the data is actually behaving, versus the way you
thought it should behave based on the null hypothesis. This line in the sand still exists, which means that
because we should reject the null hypothesis, this area in orange is a mistake.

Failing to reject the null hypothesis is wrong, if this is actually the mean, which is different from the null
hypothesis' mean. This is a type II error.

Now, the area in yellow on the other side, where you are correctly rejecting the null hypothesis when a
difference is present, is called power of a hypothesis test. Power is the probability of rejecting the null
hypothesis correctly, rejecting when the null hypothesis is false, which is a correct decision.

 TERM TO KNOW

Power of a Hypothesis Test


The probability that we reject the null hypothesis (correctly) when a difference truly does exist.

 SUMMARY

The probability of a type I error is a value that you get to choose in a hypothesis test. It is called the
significance level and is denoted with the Greek letter alpha. Choosing a big significance level allows
you to reject the null hypothesis more often, though the problem is that sometimes we reject the null
hypothesis in error. When the difference really doesn't exist, you say that a difference does exist.
However, if you choose a really small one, you reject the null hypothesis less often. Sometimes you
fail to reject the null hypothesis in error as well. There's no foolproof method here. Usually, you want
to keep your significance levels low, such as 0.05 or 0.01. Note that 0.05 is the default choice for
most significance tests for most hypothesis testing.

Good luck!

 TERMS TO KNOW

Power of a Hypothesis Test


The probability that we reject the null hypothesis (correctly) when a difference truly does exist.

Significance Level

© 2023 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 41
The probability of making a Type I error. Abbreviated with the symbol alpha, α.

One-Tailed and Two-Tailed Tests
by Sophia

 WHAT'S COVERED

This tutorial will cover the difference between a one-tailed and a two-tailed test in a hypothesis test.
Our discussion breaks down as follows:

1. One-Tailed Test
a. Right-Tailed Test
b. Left-Tailed Test
2. Two-Tailed Test
a. One-Tailed vs. Two-Tailed Tests

1. One-Tailed Test
A one-tailed test is a test for when you have reason to believe the population parameter is higher or lower
than the assumed parameter value of the null hypothesis.

One-tailed tests have two versions:

Right-Tailed Test
Left-Tailed Test

 TERM TO KNOW

One-Tailed Test
A test for when you have reason to believe the population parameter is higher or lower than the
assumed parameter value of the null hypothesis.

1a. Right-Tailed Test


A right-tailed test is a type of one-tailed test that means that the alternative hypothesis is larger than the
claimed parameter.

IN CONTEXT
Suppose you have your favorite soda, Liter O'Cola, and it's come out with new Diet Liter O'Cola.
They think that it's indistinguishable from their regular cola, so they obtain 120 individuals to do the
taste test. If the claim is true, you would expect about 50%, or 60 people, to guess correctly simply
by chance, if the taste were indistinguishable.

However, what if some people can taste the difference? What would you expect the proportion of
people correctly selecting the diet cola to be? You would likely say that it's some number over 50%.
At least half of the people will be able to correctly identify which cup is the diet cola.

You could state the following null and alternative hypotheses:

H0: p = 1/2
Ha: p > 1/2

Your null hypothesis says that p, the true proportion of people who can correctly identify the diet
cola, is 1/2--half the people. Your alternative hypothesis suspects that maybe more than half of
people will be able to select the diet cola correctly.

Since you're only interested in testing whether the true proportion of people who can correctly
identify the diet cola is over half, this will be considered a right-tailed test, a specific type of
one-tailed test. You don't care if it's under half. If it's under half, that actually works
in Liter O'Cola's favor.

The distribution of a right-tailed test would look similar to the following curve:

We are looking at the values higher than the assumed value, which is the section to the right of this value.

 TERM TO KNOW

Right-tailed Test
A hypothesis test where the alternative hypothesis only states that the parameter is higher than the
stated value from the null hypothesis.

1b. Left-Tailed Test


A left-tailed test is a type of one-tailed test in which the alternative hypothesis states that the parameter is
less than the claimed value.

IN CONTEXT
Suppose you suspect that Liter O'Cola is under-filling their bottles. Unsurprisingly, the bottles are
supposed to contain one liter of cola.

State the null and alternative hypotheses for this.

This is another example of a one-tailed test, more specifically a left-tailed test. The null hypothesis
says that the average amount of cola in the bottles is one liter (H₀: μ = 1) for all the bottles that
Liter O'Cola makes. The alternative is that perhaps it's less than one liter (Hₐ: μ < 1)--they're
under-filling the bottles, and the average amount is less than one liter.

If the average amount, μ, was greater than one liter, you wouldn't really have a claim against Liter
O'Cola because you're actually getting more soda than they claim they're providing. You're only
going to give them trouble if they're under-filling their bottles.

The distribution of a left-tailed test would look similar to the following curve:

We are looking at the values lower than the assumed value, which is the section to the left of this value.

 TERM TO KNOW

Left-tailed Test
A hypothesis test where the alternative hypothesis only states that the parameter is lower than the
stated value from the null hypothesis.

2. Two-Tailed Test
A two-tailed test is used when we have reason to believe the population parameter is different from the
assumed parameter value of the null hypothesis.

IN CONTEXT
Liter O'Cola also claims 35 grams of sugar in its bottles of cola. Anything over that and the soda will
taste too sweet. Anything under that and the soda won't taste quite sweet enough. Consumers won't
get the refreshing Liter O'Cola taste that they have come to expect. We suspect that Liter O'Cola
might have altered their formula recently because it tastes different.

What do you think the null and alternative hypotheses will be here with respect to sugar?

Here, the null hypothesis is that the mean grams of sugar will be the same as it was before, 35 (H₀: μ = 35).

What about the alternative hypothesis? Well, if they've changed their formula, you don't know if they
added more sugar or put in less sugar. However, they're only going to be in trouble if they put in a
different amount of sugar than before. The alternative hypothesis will state that the mean grams of
sugar in the bottle is different from 35 (Hₐ: μ ≠ 35). So this is considered a two-tailed test: they're
going to be in trouble if they put in significantly more than 35 grams or significantly less than 35 grams.

The distribution of a two-tailed test would look similar to the following curve:

We are looking at the values that are extremely lower or higher than the assumed value.

 TERM TO KNOW

Two-tailed Test
A test for when you have reason to believe the population parameter is different from the assumed
parameter value of the null hypothesis

3. One-Tailed vs. Two-Tailed Tests


One-tailed tests are preferred to two-tailed tests because they're more powerful. Statistical power means that
they have a higher likelihood of actually detecting a difference if one is present.

Let's take a look visually at what a one-tailed test and a two-tailed test look like. This is what a one-tailed test
with a significance level of 5% would look like.

Comparing a One-Tailed Test and a Two-Tailed Test, Both With a Significance Level of 5%

With the one-tailed test, this would be under an alternative hypothesis that the parameter is less than a
particular number--for example, that a mean is less than 1. You end up with one tail area of about 5%.
You're only going to get them in trouble if the statistic is extremely low compared to what you would
have expected.

With the two-tailed test, you are interested in the probability of getting a value at least as extreme, on
either side, as the one you got from your sample. It could be either extremely low or extremely
high--something that is extremely different from what you would have expected.
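The power difference can be seen by handing the same test statistic to both tests. This is a minimal sketch assuming a hypothetical z of 1.8: at a 5% significance level, the one-tailed test rejects the null hypothesis while the two-tailed test does not.

```python
from statistics import NormalDist

z = 1.8                      # hypothetical test statistic
alpha = 0.05                 # significance level

# One-tailed (right-tailed) p-value: area in one tail only
p_one = 1 - NormalDist().cdf(z)

# Two-tailed p-value: area in both tails, double the one-tailed area
p_two = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(p_one, 4), round(p_two, 4))

# The one-tailed test rejects at the 5% level; the two-tailed test does not
assert p_one < alpha < p_two
```

Doubling the tail area is what makes the two-tailed test more conservative for the same data.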

 SUMMARY

One-tailed tests only test whether or not there is evidence of a statistic being significantly higher or
lower than a claimed parameter, like mu or p. Two-tailed tests will test whether or not the statistic
obtained, x-bar or p-hat, is significantly different from the claimed parameter. You learned about one-
tailed tests, which have two versions, a left-tailed test, where you say in the alternative hypothesis
that it's less than a claimed parameter; and a right-tailed test, which means that it's larger than the
claimed parameter. There can also be a two-sided test, where we simply claim that the true value is
different from the claimed parameter--not equal to it.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Left-tailed test
A hypothesis test where the alternative hypothesis only states that the parameter is lower than the
stated value from the null hypothesis.

One-tailed test
A hypothesis test where the alternative hypothesis only states that the parameter is higher (or lower)
than the stated value from the null hypothesis.

Right-tailed test
A hypothesis test where the alternative hypothesis only states that the parameter is higher than the
stated value from the null hypothesis.

Two-tailed test
A hypothesis test where the alternative hypothesis states that the parameter is different from the stated
value from the null hypothesis; that is, the parameter's value is either higher or lower than the value
from the null hypothesis.

Test Statistic
by Sophia

 WHAT'S COVERED

This tutorial will cover the topic of test statistics--the value we calculate from the sample statistics
we already have when running a hypothesis test. The tutorial will cover how to determine whether to
reject a null hypothesis from a given p-value and significance level. Our discussion breaks down as
follows:

1. Test Statistics
a. Z-Statistic for Means
b. Z-Statistic for Proportions
2. p-Value
3. Critical Values

1. Test Statistics
A test statistic is the relative distance of the statistic obtained from the sample from the hypothesized
value of the parameter in the null hypothesis. It is measured in terms of the number of standard deviations
from the mean of the sampling distribution.

When we have a hypothesized value for the parameter from the null hypothesis, we might get a statistic that's
different from that number. The test statistic measures how far the statistic is from that parameter.

⭐ BIG IDEA

Essentially, a test statistic is a z-statistic or a z-score.


The basic test statistic formula is equal to the statistic minus the parameter, divided by the standard deviation
of the statistic.

 FORMULA

Test Statistic

test statistic = (statistic - parameter) / (standard deviation of the statistic)

 TERM TO KNOW

Test Statistic
A measurement, in standardized units, of how far a sample statistic is from the assumed parameter if
the null hypothesis is true

1a. Z-Statistic for Means
When dealing with means, we can use the following values:

Z-Statistic for Means

Statistic (Sample Mean): x̄

Parameter (Hypothesized Population Mean): μ

Standard Deviation of x̄: σ/√n

Therefore, the z-statistic for sample means that you can calculate is your test statistic: x-bar minus mu,
divided by the standard deviation of x-bar.

 FORMULA

Z-Statistic of Means

z = (x̄ - μ) / (σ/√n)
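As a quick check of the formula, here is a minimal sketch with made-up numbers: a hypothesized mean of 68, a population standard deviation of 3, and a sample of 36 with mean 69.

```python
from math import sqrt

x_bar = 69     # sample mean (hypothetical)
mu = 68        # hypothesized population mean
sigma = 3      # population standard deviation
n = 36         # sample size

# Standard deviation of x-bar: sigma / sqrt(n)
sd_x_bar = sigma / sqrt(n)

# z-statistic for means: (x-bar - mu) / (sigma / sqrt(n))
z = (x_bar - mu) / sd_x_bar
print(z)  # 2.0: the sample mean is 2 standard errors above the claim
```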

1b. Z-Statistic for Proportions


Meanwhile, for proportions, we can use the following values:

Z-Statistic for Proportions

Statistic (Sample Proportion): p̂

Parameter (Hypothesized Population Proportion): p

Standard Deviation of p̂: √(pq/n), where q = 1 - p

 HINT

The standard deviation of the p-hat statistic is going to be the square root of p times q (which is 1 minus p)
over n.
Therefore, the z-statistic for sample proportions that you can calculate is your test statistic: p-hat minus p
from the null hypothesis, divided by the standard deviation of p-hat.

 FORMULA

Z-Statistic of Proportions

z = (p̂ - p) / √(pq/n)
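The same check works for proportions. A minimal sketch with hypothetical numbers: a claimed proportion of 0.5, a sample of 100, and a sample proportion of 0.6.

```python
from math import sqrt

p_hat = 0.6    # sample proportion (hypothetical)
p = 0.5        # hypothesized population proportion
n = 100        # sample size
q = 1 - p

# Standard deviation of p-hat: sqrt(p*q/n)
sd_p_hat = sqrt(p * q / n)

# z-statistic for proportions: (p-hat - p) / sqrt(p*q/n)
z = (p_hat - p) / sd_p_hat
print(round(z, 2))  # 2.0
```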

2. p-Value
Both these situations have conditions under which they're normally distributed. You can use the normal
distribution to analyze and make a decision about the null hypothesis.

The normal curve below operates under the assumption that the null hypothesis is, in fact, true.

Suppose you are dealing with means. In the following graph, the parameter mean is indicated by mu, the
standard deviation of the sampling distribution is sigma over the square root of n, and perhaps your statistic x-
bar falls somewhere to the right of mu. The test statistic will become a z-score of means.

You are going to find what is called a p-value: the probability that you would get an x-bar at least as high as
the one you got, if the mean really were mu. In this particular case, it's a one-sided test.

We could do that, or if it were a two-sided test, the p-value would be the area in both tails:

 TERM TO KNOW

P-Value
The probability that the test statistic is that value or more extreme in the direction of the alternative
hypothesis
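Given a test statistic, the p-value follows directly from the standard normal distribution. A minimal sketch, assuming a hypothetical z of 2.0:

```python
from statistics import NormalDist

z = 2.0                      # hypothetical test statistic
std_normal = NormalDist()    # mean 0, standard deviation 1

# Right-tailed: probability of a statistic at least this high
p_right = 1 - std_normal.cdf(z)

# Left-tailed: probability of a statistic at least this low
p_left = std_normal.cdf(-z)

# Two-tailed: probability of a statistic at least this extreme in either direction
p_two = 2 * (1 - std_normal.cdf(abs(z)))

print(round(p_right, 4), round(p_two, 4))
```

By symmetry the right-tailed and left-tailed p-values match for z and -z, and the two-tailed p-value is double the one-tailed area.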

3. Critical Value
Another way to determine statistical significance, without using a p-value, is with what's called a critical
value. This corresponds to the number of standard deviations away from the mean that you're willing to
attribute to chance.

 EXAMPLE You might say that anything within this green area here is a typical value for x-bar.

You are willing to attribute any deviations from mu to chance if it's in this green region. This is the most
typical 95 percent of values. If it's outside that region, it would be within the most unusual 5%. You would
be more willing to reject the null hypothesis in that case.
A test statistic, meaning a z-statistic, that's far from 0 provides evidence against the null hypothesis. One
rule would be to say that if it's farther than two standard deviations, which means it's in the outermost 5%,
then you're going to reject the null hypothesis. If it's in the innermost 95%, you will fail to reject
the null hypothesis.

With two-tailed tests like the image above, the critical values are actually symmetric around the mean. That
means that if you use positive 2 on the right side, you would be using negative 2 on the left side.

There are some very common critical values that we use. The most common cutoff points are at 5%, 1%, and
10%, and you can see their corresponding critical values, which is the number of standard deviations away
from the mean that you're willing to attribute to chance.

Tail Area (Two-Tailed)    Tail Area (One-Tailed)    Critical Value (z*)

0.05 0.025 1.960

0.10 0.05 1.645

0.20 0.10 1.282

0.01 0.005 2.576

0.02 0.01 2.326

So, for a two-tailed test with 0.05 as your significance level, the critical value is actually 1.96 standard deviations away from the mean.

If you were doing a one-tailed test with 0.05 as your significance level or a two-tailed test with rejecting the
null hypothesis if it's among the most 10% extreme values, you'd use a z-statistic critical value of 1.645.

If you were doing a one-tailed test and you wanted to reject the most extreme 10% of values on one side,
you'd use 1.282 for your critical value.
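These critical values come from inverting the standard normal CDF. A sketch reproducing the table with Python's standard library:

```python
from statistics import NormalDist

std_normal = NormalDist()

def critical_value(one_tail_area):
    """Critical value z* that cuts off the given one-tailed area."""
    # inv_cdf takes the area to the LEFT, so use 1 minus the tail area
    return std_normal.inv_cdf(1 - one_tail_area)

# One-tailed areas from the table and their critical values
for area, z_star in [(0.025, 1.960), (0.05, 1.645), (0.10, 1.282),
                     (0.005, 2.576), (0.01, 2.326)]:
    print(area, round(critical_value(area), 3))
```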

When you run a hypothesis test with the critical value, you should state it as a decision rule. For instance, you
would say something like, "I will reject the null hypothesis if the test statistic, z, is greater than 2.33". That's the
same as saying that on a right-tailed test, reject the null hypothesis if the sample mean is among the highest
1% of all sample means that would occur by chance. Note this is one-tailed because you're saying that the
rejection region is on the high side of the normal curve.

Consider the curve below:

The area within the blue box is what you're not willing to attribute to chance.
The area within the red box is what you are willing to attribute to chance.

The decision rule, the point where the red and blue boxes meet, is your line in the sand. For any test statistic
less than that, you will fail to reject the null hypothesis and attribute whatever difference exists from mu to
chance. For any test statistic higher than 2.33, you will reject the null hypothesis and not attribute the
difference from mu to chance.

 TERM TO KNOW

Critical Value
A value that can be compared to the test statistic to decide the outcome of a hypothesis test

 SUMMARY

We learned about test statistics for means and for proportions, both of which were z's. We also learned about p-values, which were
the probabilities that you would get a statistic as extreme as what you got by chance, and the critical
values, which are our lines in the sand whereby if we exceed that number with our test statistic, we'll
reject the null hypothesis. When we are running a hypothesis test, we convert our sample statistic
obtained (either x-bar or p-hat) into a test statistic, both of which are z's. If the sampling distribution is
approximately normal, we can use the normal distribution to determine if our sample statistic is
unusual or not--unusually high or unusually low or just unusually different--given that the null
hypothesis is true. We can decide on different critical values for different levels of "unusual", where if
our test statistic exceeds the critical value, we reject the null hypothesis--and that's our decision rule.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Critical Value
A value that can be compared to the test statistic to decide the outcome of a hypothesis test

P-value
The probability that the test statistic is that value or more extreme in the direction of the alternative
hypothesis

Test Statistic
A measurement, in standardized units, of how far a sample statistic is from the assumed parameter if the
null hypothesis is true

 FORMULAS TO KNOW

Test Statistic
test statistic = (statistic - parameter) / (standard deviation of the statistic)

z-statistic of Means
z = (x̄ - μ) / (σ/√n)

z-statistic of Proportions
z = (p̂ - p) / √(pq/n)

Pick Your Inference Test
by Sophia

 WHAT'S COVERED

This tutorial will help explain which inference test should be used based upon the data set. Our
discussion breaks down as follows:

1. Overview
2. Qualitative or Categorical Data
a. One-Proportion Z-Test
b. Chi-Squared Test for Goodness-of-Fit
c. Chi-Squared Test for Homogeneity
d. Chi-Squared Test for Association and Independence
3. Quantitative Data
a. One-Way ANOVA
b. Two-Way ANOVA
c. One-Sample T-Test Vs. One-Sample Z-Test

1. Overview
Let's take a look at how to determine what type of hypothesis testing or inference test we should perform on a
given data set. First, we need to ask ourselves if we're dealing with qualitative or quantitative data.

Qualitative or categorical data, one population proportion: One-Proportion Z-Test; model the data using a
normal distribution.

Qualitative or categorical data, two or more population proportions: Chi-squared test; determine if we are
testing for goodness-of-fit, homogeneity, or association and independence.

Quantitative data, one population mean: One-Sample Z-Test or a One-Sample T-Test; this will depend on
whether or not we know the population standard deviation. If we do, we use the z-test. If we don't, we use
the t-test.

Quantitative data, two population means: special type of student t-test, which will not be addressed in this
tutorial.

Quantitative data, three or more population means: ANOVA f-test; if our data has one characteristic, use a
one-way ANOVA test. If it has two or more characteristics, use a two-way ANOVA test.

Another way to determine the type of test is through this inference test decision tree, which is available to
view or download as a PDF at the end of this tutorial.
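The decision tree can also be sketched as a small function. The function name and category strings below are illustrative, not a standard API; they simply restate the table above.

```python
def pick_test(data_type, num_groups, sigma_known=False, characteristics=1):
    """Return the inference test suggested by the decision tree.

    data_type: "categorical" or "quantitative"
    num_groups: how many population proportions or means are compared
    sigma_known: for one quantitative mean, is the population sd known?
    characteristics: for 3+ means, how many characteristics the data has
    """
    if data_type == "categorical":
        if num_groups == 1:
            return "one-proportion z-test"
        # goodness-of-fit, homogeneity, or association/independence
        return "chi-squared test"
    # quantitative data
    if num_groups == 1:
        return "one-sample z-test" if sigma_known else "one-sample t-test"
    if num_groups == 2:
        return "two-sample t-test"   # not covered in this tutorial
    return ("one-way" if characteristics == 1 else "two-way") + " ANOVA f-test"

print(pick_test("categorical", 1))                        # one-proportion z-test
print(pick_test("quantitative", 50, characteristics=2))   # two-way ANOVA f-test
```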

2. Qualitative or Categorical Data


2a. One-Proportion Z-Test
Suppose you hear that four out of five dentists recommend a certain type of toothpaste. After taking a sample
of 100 dentists, you found that 75 dentists would recommend the toothpaste.

"Takes Away Cavities" Brand Toothpaste -- Sample Results: ✔ ✔ ✔ ✔ ✘

Was the claim accurate? What kind of tests are you going to use to try and figure this out?

We need to note that we're dealing with categorical data here. We're looking at dentists and whether they
recommend something or don't recommend something; we're not dealing with calculating means.
We also need to think about how many proportions we have. Here we only have one proportion: 75 out
of 100 dentists. Therefore, we're going to perform a one-proportion z-test.

2b. Chi-Squared Test for Goodness-of-Fit


Suppose you flip a coin 100 times and recorded the number of heads and tails. In this case, we would expect
that there would be 50 heads and 50 tails. However, our data showed 30 heads and 70 tails.
Heads Tails

Expected 50 50

Observed 30 70

So, how can you tell if the coin that you're flipping is fair? And what tests should we use?

We need to consider the type of data that we're dealing with. Notice here, we have heads and tails to
record, which are categorical data because each flip falls into one of two categories: heads or tails.
We're also dealing with population proportions for heads and tails--two population proportions in all.
Therefore, we're going to use a chi-squared test.
But what kind of chi-squared test should we be using? We're comparing observed data to expected data.
Because we are looking to see if the sample distribution matches the population distribution, we're going
to be using a chi-squared test for goodness-of-fit.
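The goodness-of-fit statistic for the coin data can be sketched with the standard library alone. The chi-squared statistic sums (observed − expected)²/expected over the categories; 3.841 is the usual 5% critical value for one degree of freedom, taken from a chi-squared table.

```python
observed = [30, 70]   # heads, tails
expected = [50, 50]   # what a fair coin predicts for 100 flips

# Chi-squared statistic: sum of (O - E)^2 / E over all categories
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi_sq)  # 16.0

# With 2 categories there is 1 degree of freedom; the 5% critical value is 3.841
critical = 3.841
print(chi_sq > critical)  # True: reject the hypothesis that the coin is fair
```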

2c. Chi-Squared Test for Homogeneity
Suppose you want to determine the effectiveness of the flu vaccine in preventing the chance of someone
getting the flu. You gather data on 500 people where 250 had the flu vaccine, and 250 didn't get the flu
vaccine. You also record who got the flu and who did not get it.

                              Caught Flu    Did Not Catch Flu    Total

Received Flu Vaccine             115              135             250

Did Not Receive Flu Vaccine      120              130             250

Total                            235              265             500

What type of tests would you use to determine if the flu vaccine was effective or not?

We need to ask ourselves again what kind of data we are dealing with. We're looking at those who got the
flu vaccine and those who did not, as well as who caught the flu and who didn't. Both are
categorical data.
Notice here that we're dealing with two population proportions: those who got the flu vaccine and those
who didn't. Therefore, we're going to use a chi-squared test again.
We are also trying to determine whether the flu vaccine was effective or not across the two populations
we're considering. Because we're seeing if there is a difference in this variable across two populations,
we're going to use a chi-squared test for homogeneity.
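The homogeneity statistic for the flu table can be sketched the same way: each expected count is row total × column total / grand total, and the statistic is summed over all cells. The 5% critical value of 3.841 for one degree of freedom is taken from a chi-squared table.

```python
# Observed counts: rows = vaccinated / not vaccinated, cols = caught flu / did not
observed = [[115, 135],
            [120, 130]]

row_totals = [sum(row) for row in observed]            # [250, 250]
col_totals = [sum(col) for col in zip(*observed)]      # [235, 265]
grand_total = sum(row_totals)                          # 500

# Chi-squared statistic: sum of (O - E)^2 / E over all four cells
chi_sq = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total
        chi_sq += (o - e) ** 2 / e

print(round(chi_sq, 2))  # about 0.20, far below the 5% critical value of 3.841
```

A statistic this small means we would fail to reject the null hypothesis: this (hypothetical) sample gives no evidence that the vaccinated and unvaccinated groups differ.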

2d. Chi-Squared Test for Association and Independence
Suppose we want to determine if gender affects whether or not someone likes an apple, orange, or banana.

How are we going to test this?

We need to ask ourselves what kind of data we are dealing with. In this case, we're dealing with data that
can be categorized by names--apples, oranges, and bananas--which are categorical data.
We also notice that we're dealing with multiple population proportions across men and women. Therefore,
we're going to use a chi-squared test.
We're trying to determine how preferences for apples, oranges, and bananas are related to gender. Because
we are looking for an association between two or more variables in a single population, we're going to use a
chi-squared test for association or independence.

3. Quantitative Data
3a. One-Way ANOVA
Suppose you're trying to determine if the overall standardized test scores on a given test across different
states are equal for high school students trying to enter college.

What kind of test should we use?

Notice that we are dealing with mean test scores here, which are quantitative data. Remember, that's the
first thing you should always ask yourself: what kind of data am I dealing with?
We're also dealing with several population means--in this case, 50 population means, one for each state.
That means we're going to be using an ANOVA f-test.
In this case, we're looking at one characteristic of the data, which is the overall test scores. Because we're
just looking at one characteristic--overall test scores--we're going to use a one-way ANOVA f-test.
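A one-way ANOVA F-statistic can be computed by hand for a tiny made-up data set: three states with three scores each (purely illustrative numbers). The F-statistic is the ratio of the between-group mean square to the within-group mean square.

```python
from statistics import mean

# Hypothetical test scores for three states
groups = [[82, 85, 88], [75, 78, 81], [90, 92, 94]]

k = len(groups)                          # number of groups
n = sum(len(g) for g in groups)          # total observations
grand_mean = mean(x for g in groups for x in g)

# Between-group sum of squares: group sizes times squared mean differences
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

# Within-group sum of squares: squared deviations from each group's own mean
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

# Mean squares and the F-statistic
ms_between = ss_between / (k - 1)
ms_within = ss_within / (n - k)
f_stat = ms_between / ms_within
print(round(f_stat, 2))  # a large F favors the alternative hypothesis
```

With these numbers F is around 20, far above the 5% critical value of about 5.14 for (2, 6) degrees of freedom, so the group means would be judged unequal.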

3b. Two-Way ANOVA
Suppose you want to determine how students in different states are performing on the Math and English
sections of the exam.

How are we going to test this?

We need to think about what kind of data we are dealing with. Here we're dealing with mean test scores,
which are quantitative data.
We're also dealing with multiple populations--in this case, up to 50 population means, because we're
having one for each state. Again, because we have so many population means, we're going to use an
ANOVA f-test.
In this case, we're looking at two characteristics of the data: test scores on the Math section and test scores
on the English section. So, we're going to use a two-way ANOVA f-test.

3c. One-Sample T-Test Vs. One-Sample Z-Test
Suppose we're concerned with the test scores of students in Minnesota taking a given standardized test.

How are we going to test this?

Again, what kind of data are we dealing with? Here we're dealing with mean test scores, which are
quantitative data.
We're also dealing with one population mean--in this case, Minnesota's population mean. So, we're going
to use a one-sample test.
In this case, we're looking at one characteristic of the data, the overall test score. Therefore, if we don't
know the standard deviation of the entire population that took the test, we would use a one-sample t-test.
If, however, we did know the standard deviation of the population that took the test, then we would use a
one-sample z-test.
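Since the population standard deviation is usually unknown in this situation, the t route would apply. A sketch with hypothetical Minnesota scores; the critical value 2.365 (two-tailed, 5% level, 7 degrees of freedom) is taken from a t-table.

```python
from math import sqrt
from statistics import mean, stdev

scores = [510, 492, 508, 531, 487, 503, 499, 516]  # hypothetical sample
mu_0 = 500                                          # hypothesized mean

n = len(scores)
x_bar = mean(scores)
s = stdev(scores)            # SAMPLE standard deviation (n - 1 denominator)

# t-statistic: like z, but with s standing in for the unknown sigma
t = (x_bar - mu_0) / (s / sqrt(n))
print(round(t, 2))

# Two-tailed critical value for df = 7 at the 5% level (from a t-table)
t_crit = 2.365
print(abs(t) > t_crit)  # False: fail to reject the null hypothesis
```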

 SUMMARY

This lesson explored how to perform different types of hypothesis or inference tests that you're likely
to encounter when you're in a statistics course, and when to apply one over the other.

Source: Adapted from Sophia tutorial by Parmanand Jagnandan.

Standard Normal Table Review
by Sophia

 WHAT'S COVERED

This tutorial will review the standard normal table. Our discussion breaks down as follows:

1. Standard Normal Table


a. Percent Below a Z-Score
b. Percent Above a Z-Score
c. Percent Between Two Z-Scores
d. Percent Outside Two Z-Scores

1. Standard Normal Table


The standard normal table is used when you have a normal distribution and you want to find
probabilities or percents. The table can actually be used to do four things.

1. The table value itself gives you the percent of observations below a particular z-score.
2. You can find the percent above a particular z-score by subtracting the table value from 100% because the
table value always gives the area to the left.
3. You can find the percent of observations between two z-scores by subtracting the table values.
4. You can find the percent of values outside of two z-scores by finding both the percent above the higher
number and the percent below the lower number, which is sort of a combination of these other options.

 TERM TO KNOW

Standard Normal Table


A table showing the values of the cumulative distribution function of the standard normal distribution.
1a. Percent Below a Z-Score
Suppose you want to know the percent of men who are shorter than 63.5 inches. Men's heights are normally
distributed with a mean of 68 inches and a standard deviation of 3 inches. Using this distribution, 63.5 falls
right between 62 and 65.

First, we need to find the z-score by using the following formula:

z = (value - mean) / standard deviation = (63.5 - 68) / 3 = -1.5

The z-score ends up being negative 1.5; the height is 1.5 standard deviations below the mean of 68.

You can use the negative z-score table, and go to the negative 1.5 row and the zero hundredths column, and
find that the probability is 0.0668.

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

-3.4 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0002

-3.3 0.0005 0.0005 0.0005 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0003

-3.2 0.0007 0.0007 0.0006 0.0006 0.0006 0.0006 0.0006 0.0005 0.0005 0.0005

-3.1 0.0010 0.0009 0.0009 0.0009 0.0008 0.0008 0.0008 0.0008 0.0007 0.0007

-3.0 0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010

-2.9 0.0019 0.0018 0.0017 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014

-2.8 0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019

-2.7 0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026

-2.6 0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036

-2.5 0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048

-2.4 0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064

-2.3 0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084

-2.2 0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110

-2.1 0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143

-2.0 0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183

-1.9 0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233

-1.8 0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294

-1.7 0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367

-1.6 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455

-1.5 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559

-1.4 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681

-1.3 0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823

-1.2 0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985

-1.1 0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170

-1.0 0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379

This means that about 7% of men are shorter than 63.5 inches.
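The table lookup can be checked with Python's standard library, which exposes the same cumulative distribution function:

```python
from statistics import NormalDist

mu, sigma = 68, 3            # men's heights: mean and standard deviation
height = 63.5

# Standardize to a z-score, then look up the area to the left
z = (height - mu) / sigma
area_below = NormalDist().cdf(z)
print(z, round(area_below, 4))  # -1.5 0.0668
```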

1b. Percent Above a Z-Score


This example focuses on men taller than 72 inches. Again, we will use the same information that says men's
heights are normally distributed with a mean of 68 inches and a standard deviation of 3 inches. What percent
of men are over six feet tall?

Here's the normal distribution. 72 inches is the cutoff value, and you want the percent of men that are taller
than that.

To find this z-score, use the following formula:

z = (72 - 68) / 3 ≈ 1.33

The 72 inches standardizes to a z-score of positive 1.33.

We can also take the normal distribution, centered at 68 with a standard deviation of 3, and convert it into
one with a mean of 0 and a standard deviation of 1. The image below is called the standard normal curve.

Our z-score was positive 1.33, so you will look in the positive z-score table. Positive z-scores deal with the
tenths place and the hundredths place. Because your z-score was positive 1.33, you will go to the 1.3 row
(tenths) and the 0.03 column (hundredths).

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359

0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753

0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141

0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517

0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879

0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224

0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549

0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852

0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133

0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389

1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621

1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830

1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015

1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177

1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319

At that intersection, you will find 0.9082, which is the area to the left of 1.33. But the question was asking for
the area above, so now you simply subtract from 100%:

100% - 90.82% = 9.18%

This tells us that 9.18% of adult men have heights over 72 inches.
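This subtraction can be verified the same way; the z-score is rounded to 1.33 to match the table lookup.

```python
from statistics import NormalDist

z = round((72 - 68) / 3, 2)      # 1.33, rounded as in the table lookup

area_below = NormalDist().cdf(z) # table value: area to the left
area_above = 1 - area_below      # the question asks for the area above

print(round(area_below, 4), round(area_above, 4))  # 0.9082 0.0918
```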

1c. Percent Between Two Z-Scores


You can do another type of problem, which is finding the area between two values, such as 5'6" and 5'9", or
66 inches and 69 inches.

Something like this is a little trickier. When you standardize the values of 66 and 69, you end up with two
z-scores: (66 - 68)/3 ≈ -0.67 and (69 - 68)/3 ≈ 0.33.

To find the probability of the area between these two numbers, you actually need to find the probabilities of
both z-scores.

First, for the area corresponding to the z-score of positive 0.33, look in the positive z-score table at 0.3 row
and 0.03 column to find that the orange area shown below is 0.6293.

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359

0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753

0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141

0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517

0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879

Now, we need to consider our second z-score, negative 0.67. When you look at the negative z-score table for
the negative 0.67 z-score, you find that its probability in the negative 0.6 row and the 0.07 column is 0.2514,
shown in the green area below.

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

-3.4 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0002

-3.3 0.0005 0.0005 0.0005 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0003

-3.2 0.0007 0.0007 0.0006 0.0006 0.0006 0.0006 0.0006 0.0005 0.0005 0.0005

-3.1 0.0010 0.0009 0.0009 0.0009 0.0008 0.0008 0.0008 0.0008 0.0007 0.0007

-3.0 0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010

-2.9 0.0019 0.0018 0.0017 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014

-2.8 0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019

-2.7 0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026

-2.6 0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036

-2.5 0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048

-2.4 0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064

-2.3 0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084

-2.2 0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110

-2.1 0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143

-2.0 0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183

-1.9 0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233

-1.8 0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294

-1.7 0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367

-1.6 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455

-1.5 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559

-1.4 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681

-1.3 0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823

-1.2 0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985

-1.1 0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170

-1.0 0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379

-0.9 0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611

-0.8 0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867

-0.7 0.2420 0.2389 0.2358 0.2327 0.2296 0.2266 0.2236 0.2206 0.2177 0.2148

-0.6 0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451

-0.5 0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776

The area between 66 inches and 69 inches is the area below the 0.33 z-score but not below the -0.67 z-score.
Therefore, we subtract the green area from the orange area to obtain the area between the two values,
shown in blue below.

The orange area is equal to 0.6293, the green area is equal to 0.2514, so 0.6293 minus 0.2514 is 0.3779,
which tells us that about 38% of men are between those two heights.
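The two-table lookup above can be sketched in a few lines of Python. `NormalDist` from the standard library plays the role of the z-table, and the mean of 68 inches and standard deviation of 3 inches are the figures assumed in this example.

```python
from statistics import NormalDist

# Assumed figures from the example: men's heights with
# mean 68 inches and standard deviation 3 inches.
mu, sigma = 68, 3

# Standardize both heights, rounding to two decimals as a z-table would.
z_low = round((66 - mu) / sigma, 2)    # -0.67
z_high = round((69 - mu) / sigma, 2)   # 0.33

# Area between the two z-scores: orange area minus green area.
std_normal = NormalDist()              # standard normal: mean 0, sd 1
area = std_normal.cdf(z_high) - std_normal.cdf(z_low)
print(round(area, 4))                  # about 0.3779, i.e., roughly 38%
```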

1d. Percent Outside Two Z-Scores


Lastly, you can find the area outside of a particular region. What percent of men are not within 2.5 inches of
the mean of 68; in other words, what percent of men are below 65.5 inches or above 70.5 inches? On the
normal curve, this is the shaded area in the two tails outside those values.

All you do is add the two probabilities of the area below 65.5 and the area above 70.5. First, convert both of
these to z-scores.

We get z-scores of negative 0.83 and positive 0.83. Now, since these two values are the same distance away
from the mean, and because of the symmetry of the normal curve, you can actually just find one of these two
areas and double it. In general, you wouldn't be able to do that if they were different distances from the
mean.
Let's find the probability of negative 0.83 in the table.

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

-3.4 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0002

-3.3 0.0005 0.0005 0.0005 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0003

-3.2 0.0007 0.0007 0.0006 0.0006 0.0006 0.0006 0.0006 0.0005 0.0005 0.0005

-3.1 0.0010 0.0009 0.0009 0.0009 0.0008 0.0008 0.0008 0.0008 0.0007 0.0007

-3.0 0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010

-2.9 0.0019 0.0018 0.0017 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014

-2.8 0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019

-2.7 0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026

-2.6 0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036

-2.5 0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048

-2.4 0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064

-2.3 0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084

-2.2 0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110

-2.1 0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143

-2.0 0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183

-1.9 0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233

-1.8 0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294

-1.7 0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367

-1.6 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455

-1.5 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559

-1.4 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681

-1.3 0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823

-1.2 0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985

-1.1 0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170

-1.0 0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379

-0.9 0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611

-0.8 0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867

-0.7 0.2420 0.2389 0.2358 0.2327 0.2296 0.2266 0.2236 0.2206 0.2177 0.2148

-0.6 0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451

-0.5 0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776

You would find the area below the negative 0.83 z-score, which is 0.2033. Normally you would find the area
above the positive 0.83 z-score, but you don't have to do that, because it's the same as the area below the
negative 0.83 z-score. Just use the symmetry and double it to obtain about 41% of men being outside that
range.
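The symmetry shortcut above can be sketched in Python, again using the example's assumed mean of 68 inches and standard deviation of 3 inches:

```python
from statistics import NormalDist

# Assumed figures from the example: mean 68 inches, SD 3 inches.
mu, sigma = 68, 3
z = round((65.5 - mu) / sigma, 2)   # -0.83, matching the table lookup

# By symmetry, the area outside -0.83 and +0.83 is twice the lower tail.
outside = 2 * NormalDist().cdf(z)
print(round(outside, 4))            # about 0.4066, i.e., roughly 41%
```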

 SUMMARY

It's possible to use the standard normal table to find the percent of values above or below a particular
value, between two values, or even outside two values, using z-scores on the normal distribution.
The normal probability table, also called the z-table or the standard normal table, gives the percent of
values below a certain z-score; you then subtract as necessary.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Standard Normal Table


The table that allows us to find the percent of observations below a particular z-score in the normal
distribution.

Z-Test for Population Means
by Sophia

 WHAT'S COVERED

This tutorial will cover how to perform a z-test for population means. Our discussion breaks down as
follows:

1. Calculating a Z-Test for Population Means


2. Conducting a Z-Test for Population Means

1. Calculating a Z-Test for Population Means


A z-test for population means is a type of hypothesis test that compares a hypothesized mean from the null
hypothesis to a sample mean. This can be used with quantitative data and when the population standard
deviation is known.

This type of z-test is not done often because it is unlikely that we would know the population standard
deviation without knowing the population mean.

When calculating a z-test for population means, you need the following information:

Population mean (μ)


Population standard deviation (σ)
Sample mean (x̅)
Sample size (n)

This information will be plugged into the formula for a z-statistic of population means:

 FORMULA

Z-Statistic for Population Means

z = (x̅ - μ) / (σ / √n)

IN CONTEXT
The average weight of newborn babies is 7.2 pounds, with a standard deviation of 1.1 pounds. A
local hospital has recorded the weights of all 285 babies born in a month, and the average weight
was 6.9 pounds.

Find the z-test statistic for this data set.

We know the average weight is 7.2 pounds, with a standard deviation of 1.1 pounds. Because we
know the population standard deviation and it is quantitative data, we can use the normal
distribution and find a z-score. We also know the average weight was 6.9 pounds. We can plug this
information into the following formula to calculate our z-test statistic:

We have 6.9, which is our sample mean, minus the population mean of 7.2, divided by the
population standard deviation, 1.1, divided by the square root of our sample size, 285. This gives us a
z-test statistic of negative 4.604. We should expect to get a negative z-score because our sample
mean was less than the population mean.

If we were to put this on a normal distribution, it's centered at the population mean, which is 7.2
pounds. The average weight of the babies at the hospital was 6.9, which is less than 7.2 pounds, so
it should fall in the lower part of our distribution.

The corresponding z-score is all the way down at the negative 4.604. At this hospital, the average
weight of the babies was definitely far below the average weights of the babies of the population.

 HINT

Technology is often used when conducting a z-test for population means.
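As the hint suggests, technology handles the arithmetic. A minimal Python sketch of the newborn-weight example, using only the figures given above, looks like this:

```python
from math import sqrt

# Figures from the example: population mean 7.2 lb, population
# SD 1.1 lb, and a sample of n = 285 babies averaging 6.9 lb.
x_bar, mu, sigma, n = 6.9, 7.2, 1.1, 285

# z-statistic for a population mean: (x-bar - mu) / (sigma / sqrt(n))
z = (x_bar - mu) / (sigma / sqrt(n))
print(round(z, 3))   # -4.604
```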


 TERM TO KNOW

Z-Test for Population Means
A hypothesis test that compares a hypothesized mean from the null hypothesis to a sample mean,
when the population standard deviation is known.

2. Conducting a Z-Test for Population Means


There are four parts to running any hypothesis test, regardless of the type of test that you use.

 STEP BY STEP

Step 1: State the null and alternative hypotheses.

Step 2: Check the conditions necessary in order to actually perform the inference that you're trying to do.

Step 3: Calculate the test statistic--in this case, a z-statistic--and calculate the p-value based on the normal
sampling distribution.

Step 4: Compare your test statistic to your chosen critical value, or your p-value to your chosen significance
level; both approaches are acceptable. Based on how they compare, state a decision regarding
the null hypothesis: either reject it or fail to reject it based on your evidence. Then state your
conclusion in the context of the problem.

 EXAMPLE Consider the following problem:


According to their packaging, the standard bag of M&M's candies contains 47.9 grams. Suppose you take 14
bags at random and weigh them. The sample mean is 48.07 grams.

48.2 48.4 47.0 47.3 47.9 48.5 49.0

48.3 48.0 47.9 48.7 48.8 47.4 47.6

Assuming the distribution of bag weights is approximately normal, and the standard deviation of all M&M's
bags is 0.22 grams, is this evidence that the bags do not contain the claimed amount of 47.9 grams in each
bag?

This could mean that it's either higher than 47.9 grams or lower than 47.9 grams. If you take a look, some of
the weights in the sample are fairly off, some by almost a full gram.

You are also assuming that you know the standard deviation of all M&M's bags, which is not always a
reasonable assumption, but is for this example.

Let's walk through each of the steps of running a hypothesis test with our M&M's example.

 STEP BY STEP

Step 1: State the null and alternative hypotheses.


For this problem, the null hypothesis is that the M&M's bags are doing exactly what you thought they would
do. The mean of all M&M's bags is the 47.9 grams that was claimed.

The alternative hypothesis is that they're not that number. This is going to be a two-sided test based on this
"not equal to" symbol.

You should also state what your alpha level, or significance level, is going to be. By stating that alpha equals
0.05, which is the most common significance level, you are saying that if the p-value is less than 0.05, you will
reject the null hypothesis; if it is above 0.05, you will fail to reject it.

Step 2: Check the conditions necessary for inference.


Look at the conditions necessary for inference on a population mean.

Criteria Description

How were the data collected?

Randomness
The randomness should be stated somewhere in the problem. Think about the way the
data was collected.

Population ≥ 10n

Independence You want to make sure that the population is at least 10 times as large as the sample
size. This was your workaround for independence. If the population is sufficiently large,
then taking out the number of bags that you took doesn't make a huge difference.

n ≥ 30 or normal parent distribution

There are two ways to verify normality. Either the parent distribution has to be normal or
Normality
the central limit theorem is going to have to apply. The central limit theorem says that
for most distributions, when the sample size is greater than 30, the sampling distribution
will be approximately normal.

Going back to the M&M's example:

Randomness: In the problem, it does say the bags were randomly selected. So, thinking about the way
that the data was collected in the problem is important.
Independence: We can also assume there are at least 140 bags of M&M's, which is a reasonable
assumption. Why 140? Because there were 14 bags in our sample. So you're going to assume that the
population of all bags of M&M's is at least 10 times that size.
Normality: Finally, the distribution of bag weights is in fact approximately normal as stated in the problem.

Step 3: Calculate the test statistic and calculate the p-value based on the normal sampling distribution.
In this problem, your test statistic is going to be a z-statistic. How is this done? Take the sample mean minus
the hypothesized population mean of 47.9 from the null hypothesis, and divide by the standard error, which is
the standard deviation of the population divided by the square root of sample size.

When you do all of that and input the numbers, you get a z-statistic of positive 2.89. Look at where that lies on
the normal distribution that you're using. A z-statistic of 2.89 on the standard normal distribution centered at 0
is between two and three standard deviations above the mean. Because this is a two-sided test, find the
probability that your z-statistic is above positive 2.89 and the probability that it's below negative 2.89.

Use a z-table to find the probability of 0.0019 for a z-score of -2.89.

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

-3.4 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0002

-3.3 0.0005 0.0005 0.0005 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0003

-3.2 0.0007 0.0007 0.0006 0.0006 0.0006 0.0006 0.0006 0.0005 0.0005 0.0005

-3.1 0.0010 0.0009 0.0009 0.0009 0.0008 0.0008 0.0008 0.0008 0.0007 0.0007

-3.0 0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010

-2.9 0.0019 0.0018 0.0017 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014

-2.8 0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019

-2.7 0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026

-2.6 0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036

-2.5 0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048

That probability, when doubled, gives us 0.0038.

Step 4: Compare your test statistic to your chosen critical value, or your p-value to your chosen significance
level.
This actually contains three parts: the comparison, the decision, and the conclusion. Since your p-value of
0.0038 is less than the significance level of 0.05, your decision is to reject the null hypothesis. There is
evidence to conclude that the M&M's bags are not filled to a mean of 47.9 grams.

Comparison: 0.0038 < 0.05


Decision: Reject the null hypothesis
Conclusion: There is evidence to conclude that the M&M's bags are not filled to a mean of 47.9 grams.
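Steps 3 and 4 of the M&M's test can be sketched in Python; `NormalDist` stands in for the z-table, and all figures come from the example above.

```python
from math import sqrt
from statistics import NormalDist

# Figures from the example: hypothesized mean 47.9 g, known
# population SD 0.22 g, n = 14 bags, sample mean 48.07 g.
x_bar, mu, sigma, n = 48.07, 47.9, 0.22, 14
alpha = 0.05

# Step 3: z-statistic and two-sided p-value (double the tail area).
z = (x_bar - mu) / (sigma / sqrt(n))        # about 2.89
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 4: compare the p-value to the significance level.
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(round(z, 2), round(p_value, 4), decision)
```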

 SUMMARY

The steps in any hypothesis test--not just a z-test for population means--are the same. First, state the
null and alternative hypotheses, both in symbols and in words. Second, state and verify the conditions
necessary for inference. Third, calculate the test statistic from the sample statistics you have and
calculate its p-value. Finally, compare your p-value to the alpha level that you've chosen, or your test
statistic to the critical value, and make a decision about the null hypothesis, stating your conclusion in
the context of the problem. In a z-test for population means, the population standard deviation must
be known, which is not very common; you'll learn other ways to proceed when you don't know the
population standard deviation. Because the test statistic is a z-statistic, this test is called a z-test for
population means.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Z-Test for Population Means


A hypothesis test that compares a hypothesized mean from the null hypothesis to a sample mean, when
the population standard deviation is known.

 FORMULAS TO KNOW

Z-Statistic for Population Means

z = (x̅ - μ) / (σ / √n)

Z-Test for Population Proportions
by Sophia

 WHAT'S COVERED

This tutorial will cover how to calculate a hypothesis test for population proportions. Our discussion
breaks down as follows:
1. Calculating a Z-Test for Population Proportions
2. Conducting a Z-Test for Population Proportions

1. Calculating a Z-Test for Population Proportions


A hypothesis test for population proportions is a hypothesis test where we compare to see if the sample
proportion of "successes" differs significantly from a hypothesized value that we believe is the population
proportion of "successes." The type of data that is collected for a population proportion is qualitative data.

⭐ BIG IDEA

This is also known as a z-test for population proportions.


When calculating a z-test for population proportions, you need the following information:

Population proportion of successes (p)


Population proportion of failures (1-p = q)
Sample proportion of successes (p̂)
Sample size (n)

This information will be plugged into the formula for a z-statistic of population proportions:

 FORMULA

Z-Statistic for Population Proportions

z = (p̂ - p) / √(pq / n)

 HINT

You can also find this probability using technology.

IN CONTEXT
Approximately 10% of the population is left-handed, with a standard deviation of 3.13%. Of 100
randomly selected people, 14 claimed to be left-handed.

Find the z-test statistic for this data set.

This type of data is qualitative data; people are answering either yes or no. They're either left-
handed or not left-handed. We're placing the answers into categories, which is why it's also called
categorical data.

Since we know the population standard deviation, we can use the formula for population proportions
to find the z-score. We have p-hat, which is the proportion of successes from our sample. In this
case, a success is being left-handed, which is 14 out of 100, or 14%, or 0.14. Next, p is the population
proportion of successes, which is 10%, or 0.10. Then we have q, the complement of p; since the
remaining people are right-handed, q is 90%, or 0.90. Our sample size was 100.

Let's go ahead and calculate the z-score.

We've got our 14% minus the population proportion of 10%, divided by the standard error: the
square root of 0.10 times 0.90, all over our sample size of 100. We end up
with a z-test statistic of 1.33.

We can use a normal distribution because we know the population standard deviation. This
distribution is centered at 10%. Our sample showed 14% of people being left-handed, which is
1.33 standard deviations above the mean.
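This calculation can be sketched in a few lines of Python, using the figures from the example:

```python
from math import sqrt

# Figures from the example: hypothesized proportion p = 0.10,
# and 14 left-handers in a sample of n = 100 (p-hat = 0.14).
p_hat, p, n = 0.14, 0.10, 100
q = 1 - p   # proportion of "failures" (right-handers)

# z-statistic for a population proportion: (p-hat - p) / sqrt(pq/n)
z = (p_hat - p) / sqrt(p * q / n)
print(round(z, 2))   # 1.33
```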

 TERM TO KNOW

Hypothesis Test for Population Proportions


A hypothesis test where we compare to see if the sample proportion of "successes" differs
significantly from a hypothesized value that we believe is the population proportion of "successes."

Z-test for Population Proportions


A type of hypothesis test used to test an assumed population proportion.

2. Conducting a Z-Test for Population Proportions

When running a hypothesis test for population proportions, the same four parts apply every time:

Step 1: State the null and alternative hypotheses


Step 2: Check the conditions necessary for inference
Step 3: Calculate the test statistic and the p-value
Step 4: Compare your test statistic to your chosen critical value, or your p-value to your chosen
significance level. Based on how they compare, state a decision about the null hypothesis and conclusion
in the context of the problem.

 EXAMPLE Let's look at a situation that would require proportions.


A popular consumer report reported that 80% of all supermarket prices end in the digits 9 or 5. Suppose you
review a random sample of 115 items to check it against the consumer report, and you find that only 88 items
end in 9 or 5. That's less than 80%.

The question that we need to consider: is this significantly less than 80%? Is this evidence that, in fact, less
than 80% of all items at the supermarket have a price ending in 9 or 5?

 STEP BY STEP

Step 1: State the null and alternative hypothesis.


Your null hypothesis is the "nothing's going on" hypothesis. In this case, it's the "there is no reason to
disbelieve the consumer report" hypothesis. Rewrite it as p equals 0.8. The true proportion of all prices ending
in 9 or 5 is 80% at the supermarket.

Conversely, the alternative hypothesis suspects that something is amiss, that it is actually less than 80%. We
are going to say that p, the true proportion of prices ending in 9 or 5, is below 80%.

In this problem, choose a significance level of 0.10. With the decision rule, if the p-value is less than 0.10, you’ll
reject the null hypothesis in favor of the alternative.

Step 2: Check the conditions necessary for inference.


You should be familiar with the conditions: randomness, independence, and normality. Look at them one at a
time.

Criteria Description

How were the data collected?

Randomness
The randomness should be stated somewhere in the problem. Think about the way the data
was collected.

Population ≥ 10n

Independence
Make sure that the population is at least 10 times the size of the sample because you're
sampling without replacement.

np ≥ 10 and nq ≥ 10

Because you're using the sampling distribution of p-hat instead of x-bar, there are different
Normality
conditions for normality. Use the conditions np is at least 10 and nq is at least 10. We can't
use the central limit theorem here because this is not the sampling distribution of x-bar. It's
the sampling distribution of p-hat, sample proportions.

Going back to the example:

Randomness: In the problem, it does say that the items were randomly selected, so the simple random
sample condition is acceptable.
Independence: Assume the independence piece--that the population of all items at the grocery store is at
least 1,150. That seems reasonable.
Normality: You know what n is, and you know what p is. Here, p is the value from the null hypothesis--the
80% that you believe is the center of the distribution--and n is the sample size, 115. Multiply 0.80 times
115 to get 92 for n times p; that's greater than 10. In addition, n times q is 23, which is also
greater than 10.

Therefore, the sampling distribution of sample proportions is approximately normal.

All three conditions have been checked, and we're good to go.

Step 3: Calculate the test statistic and calculate the p-value based on the normal sampling distribution.
Now you can perform the z-test for population proportions. It's going to be the statistic (88 over 115) minus the
hypothesized parameter (0.80) over the standard error. The standard error in this case is the square root of p
times q divided by n.

When you evaluate the fraction, you get a z-score of negative 0.93. Then, you can find negative 0.93 on the
normal distribution that is the sampling distribution for p-hat, and find the tail probability by using a normal z-
table.

The probability that your sample proportion would be less than the one that you got, the 88 out of 115, is
equal to the probability that the z-statistic would be less than negative 0.93.

You can find that area using the normal table, and you get 0.1762, or about 18% of the time. This means that if
the null hypothesis was true and this distribution was really centered at 0.8, meaning the true proportion of
prices ending in 9 or 5 was 0.8, you would find something at least as low as we got about 18% of the time.

Step 4: Compare your test statistic to your chosen critical value, or your p-value to your chosen significance
level.
Based on how your p-value compares to your chosen significance level, which you may recall was 0.10, you're
going to make a decision about the null hypothesis and state the conclusion. In your case, 0.1762 is greater
than 0.10. Your decision, then, is that you fail to reject the null hypothesis. The conclusion is that there's not
sufficient evidence to conclude that less than 80% of supermarket prices end in 9 or 5. You don't have strong
enough evidence to reject the claim of the consumer report.

Comparison: 0.1762 > 0.10


Decision: Fail to reject the null hypothesis
Conclusion: There is insufficient evidence to conclude that less than 80% of supermarket prices end in 9
or 5.
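The supermarket-prices test can be sketched in Python end to end; the figures all come from the example, and `NormalDist` supplies the left-tail area.

```python
from math import sqrt
from statistics import NormalDist

# Figures from the example: hypothesized p = 0.80, a random
# sample of n = 115 items with 88 prices ending in 9 or 5.
successes, n, p0, alpha = 88, 115, 0.80, 0.10
p_hat = successes / n          # about 0.765
q0 = 1 - p0

# Left-tailed test: z-statistic and the lower-tail p-value.
z = (p_hat - p0) / sqrt(p0 * q0 / n)   # about -0.93
p_value = NormalDist().cdf(z)

decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(round(z, 2), round(p_value, 4), decision)
```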

 SUMMARY

The steps in any hypothesis test are always the same. You start by stating your null and alternative
hypotheses, which is where you would also state your alpha level. Next, state and verify the
conditions. Calculate the test statistic and the p-value. Finally, based on your p-value, compare it to
your alpha level and make a decision about the null hypothesis and state it in the context of the
problem. In this case, we did a z-test for population proportions, and it's analogous to any other
hypothesis tests that you do. The only thing that you changed was how you verified the normality
condition, because you needed np to be at least 10 and nq to be at least 10.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Hypothesis Test for Population Proportions


A hypothesis test where we compare to see if the sample proportion of "successes" differs significantly
from a hypothesized value that we believe is the population proportion of "successes."

Z-Test for Population Proportions


A type of hypothesis test used to test an assumed population proportion.

 FORMULAS TO KNOW

Z-Statistic for Population Proportions

z = (p̂ - p) / √(pq / n)

How to Find a Critical Z Value
by Sophia

 WHAT'S COVERED

This tutorial will cover how to find the critical z-value for the following tests:

1. Left-Tailed Tests
a. Graphing Calculator
b. Z-Table
c. Excel
2. Right-Tailed Tests
a. Graphing Calculator
b. Z-Table
c. Excel
3. Two-Sided Tests
a. Graphing Calculator
b. Z-Table
c. Excel

1. Left-Tailed Tests
For a left-tailed test, suppose we need to find the critical z-value for a hypothesis test that would reject the
null hypothesis (H0) at a 2.5% significance level. To do this, we want to find, on our normal distribution, the
cutoff on the left tail that corresponds to the lower 2.5% of our distribution.

1a. Graphing Calculator


The first way is by using a graphing calculator. First, hit "2nd", then "DISTR" (This is above the button "VARS").

Then scroll down to the third function, which is inverse norm (invNorm). This is the inverse of the normal
distribution. We're going to hit "Enter".

Next, we're going to input 0.025, because this is a left-tailed test, so we're looking at the lower 2.5% of our
distribution. For this specific calculator (TI-84 Plus), we need to type 0.025 for the area, and 0 for mu and 1 for
sigma, because these are the values that correspond with the standard normal distribution. Hit "Enter", and
we get a critical z-value of negative 1.96.

At about -1.96, this is the cutoff for the lower 2.5% of our data.

Any z-score below negative 1.96 means we're going to reject the null hypothesis. Any z-score above
negative 1.96 falls in the unshaded region of our distribution. This means that we're willing to attribute the
variation in our sample from the center of our distribution to chance, and we're going to fail to reject the
null hypothesis.

1b. Z-Table
The second method is using a z-table. When using the z-table, we look for our significance level in the table. In
this case, remember we were looking at a left-tailed test. This means we need to use the negative z-table with
negative z-scores, not positive, because we're looking at the lower half of the distribution. Remember, the
significance level is 0.025, or 2.5%, so we are going to look for that value or the closest thing to it. Here it is
on a z-table:
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

-3.4 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0002

-3.3 0.0005 0.0005 0.0005 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0003

-3.2 0.0007 0.0007 0.0006 0.0006 0.0006 0.0006 0.0006 0.0005 0.0005 0.0005

-3.1 0.0010 0.0009 0.0009 0.0009 0.0008 0.0008 0.0008 0.0008 0.0007 0.0007

-3.0 0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010

-2.9 0.0019 0.0018 0.0017 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014

-2.8 0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019

-2.7 0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026

-2.6 0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036

-2.5 0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048

-2.4 0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064

-2.3 0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084

-2.2 0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110

-2.1 0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143

-2.0 0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183

-1.9 0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233

-1.8 0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294

-1.7 0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367

-1.6 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455

-1.5 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559

A significance level of 0.025 corresponds to a z-score of negative 1.96. Therefore, our critical z-value is
negative 1.96.

1c. Excel
A third way to find the critical z-value that corresponds to a 2.5% significance level for a left-tailed test is in
Excel. Go to the "Formulas" tab and insert a function from the "Statistical" category. We're
looking for "NORM.S.INV", which is right here:

This is for the inverse of the normal distribution, and because it's a left-tailed test, we're looking at the lower
half of our distribution. We're going to put in the 0.025 for the lower 2.5%. Hit "Enter", and notice how we get
the same critical z-value that we did using the calculator and table:
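The same inverse-normal lookup can also be sketched outside of a calculator or Excel. As an illustration (not part of the original tutorial), Python's standard library exposes the inverse CDF of the standard normal through `statistics.NormalDist`:

```python
from statistics import NormalDist

# Left-tailed test at a 2.5% significance level: the critical value is
# the inverse CDF at 0.025 (the same operation as invNorm / NORM.S.INV).
z_critical = NormalDist().inv_cdf(0.025)
print(round(z_critical, 2))  # -1.96
```

Any z-test statistic below this critical value falls in the lower 2.5% of the distribution.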

 TERM TO KNOW

Critical Value
A value that can be compared to the test statistic to decide the outcome of a hypothesis test

2. Right-Tailed Tests
For a right-tailed test, suppose we need to find the critical z-value for a hypothesis test that would reject the
null (H0) at a 5% significance level. To do this, we want to find, on our normal distribution, the cutoff on the
upper part of the distribution where we are not going to attribute the difference in proportion due to chance.

2a. Graphing Calculator


The first way is by using a graphing calculator. First, hit "2nd", then "DISTR" (This is above the button "VARS").
Then scroll down to the third function, which is inverse norm (invNorm). This is the inverse of the normal
distribution. We're going to hit "Enter".
The significance level is 5%, but we're not going to put in 0.05 like we did with the left-tailed test, where the
significance level was 2.5% and we entered 0.025. In the normal distribution, we always read left to right, and
it always goes from 0 percent to 100 percent. We're looking at a right-tailed test, which is the upper portion of
our distribution. That cutoff is the top 5% of our distribution. So, 100% minus 5% is going to be 95%. We are
actually going to put in the inverse norm of 0.95, and that's going to get us a corresponding critical z-value of

about 1.645.

Any z-test statistic that is greater than 1.645 falls in the upper 5% of our distribution, and therefore we would
reject the null hypothesis.

2b. Z-Table
The second method uses the z-table. Because we're looking at a right-tailed test, we're going to have positive
z-scores since we're looking at the upper half of the distribution. We'll use the positive z-table that
corresponds with positive z-scores.
The significance level was 5%, but it was the upper 5%. Remember, this corresponds to the 95th percentile on
our distribution. In the table, we need to look for the closest thing to 95%, or 0.95.

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359

0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753

0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141

0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517

0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879

0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224

0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549

0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852

0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133

0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389

1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621

1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830

1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015

1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177

1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319

1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441

1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545

1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633

1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706

1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767

The value 0.95 falls between two table entries, 0.9495 and 0.9505. These correspond to a z-score of 1.6 in the left column, falling between the 0.04 and the 0.05 in the top row. Taking the average of 1.64 and 1.65, we get a critical z-value of 1.645.

2c. Excel
A third way to find the critical z-value that corresponds to a 5% significance level for an upper tail test, or a
right-tailed test, is by using Excel. Again, go to "Formulas" tab. We're going to insert under the "Statistical"
column our "NORM.S.INV", but we're not going to put in 0.05 for the 5%. Because we're looking at the upper
part of our distribution, this is going to correspond to the 95th percentile. We're going to enter 0.95, and
notice how we get the same critical value we did from our table and our calculator, which is a positive 1.645.
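The right-tailed cutoff can be sketched the same way in Python (an illustration, not part of the original tutorial): invert the standard normal CDF at the 95th percentile.

```python
from statistics import NormalDist

# Right-tailed test at 5%: the cutoff sits at the 95th percentile,
# so we invert the standard normal CDF at 1 - 0.05 = 0.95.
z_critical = NormalDist().inv_cdf(1 - 0.05)
print(round(z_critical, 3))  # 1.645
```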

3. Two-Sided Tests
For a two-sided test, suppose we want to find the critical z-score for a hypothesis test that would reject the
null at a 1% significance level. Because it's a two-sided test, we have to divide that 1% into each tail. Therefore,
1% divided by 2 means we're going to be looking for the cutoff at the lower 0.5% of the distribution, and the
upper 0.5% of our distribution.

3a. Graphing Calculator


The first way to find this value is with a graphing calculator. Let's go ahead and first find the corresponding
critical z-score for the lower part of our distribution. Hit "2nd", "DISTR", "invNorm". This tail is 0.5%, so we're
going to put 0.005.

This gives us a corresponding z-score of negative 2.576. In a distribution, this falls right about here, negative
2.576.

The shaded region corresponds to the lower 0.5% of the distribution. If we do this correctly, we should get the
same z-score, but a positive value for the upper portion of our distribution for that 0.5% cut off.

Let's go ahead and do inverse norm again on our calculator. But we can't put in 0.005, because remember, our distribution reads from 0% to 100%. We actually have to do 100% minus 0.5%, or 99.5%. We are going to put in 0.995 and get a positive 2.576.

This positive 2.576 corresponds to the upper 0.5% of our distribution.

Any z-score that we would calculate that would be greater than a positive 2.576 or less than a negative 2.576
means we would reject the null hypothesis.

3b. Z-Table
Using our z-table, we first look for the corresponding critical value for the lower half of our distribution, since
it's a two-sided test. Remember, we're not going to look for the closest thing to 1%, but we're going to look for
the closest value to 0.5%, or 0.005.
Let's use the table to find the lower critical value for our two-sided test in a 1% significance level, we're going
to find the closest thing to 0.5%.

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

-3.4 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0002

-3.3 0.0005 0.0005 0.0005 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0003

-3.2 0.0007 0.0007 0.0006 0.0006 0.0006 0.0006 0.0006 0.0005 0.0005 0.0005

-3.1 0.0010 0.0009 0.0009 0.0009 0.0008 0.0008 0.0008 0.0008 0.0007 0.0007

-3.0 0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010

-2.9 0.0019 0.0018 0.0017 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014

-2.8 0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019

-2.7 0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026

-2.6 0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036

-2.5 0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048

-2.4 0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064

-2.3 0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084

-2.2 0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110

-2.1 0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143

-2.0 0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183

The closest value to 0.005 is between these two values, 0.0051 and 0.0049. This corresponds to a negative
2.5 in the left column, and in between the 0.07 and the 0.08 in the top row. If we're using the table, we're
going to get an average critical z-value of negative 2.575, which is quite close to what the calculator gave us.
Remember, sometimes the table can just give us an estimate.

Let's use the table to find the upper critical value for our two-sided test in a 1% significance level, we're going
to try to find the closest to 100% minus 0.5%, or 99.5%.

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359

0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753

0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141

0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517

0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879

0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224

0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549

0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852

0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133

0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389

1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621

1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830

1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015

1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177

1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319

1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441

1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545

1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633

1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706

1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767

2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817

2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857

2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890

2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916

2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936

2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952

2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964

2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974

2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981

2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986

The closest value to 99.5%, or 0.995, is in between these two values, 0.9949 and 0.9951. This corresponds to
a positive 2.5 in the left column, and falling between the 0.07 and the 0.08 in the top row. If we're using the
table, we would get a critical z-value of a positive 2.575, taking the average between those two values.

3c. Excel
In Excel, we're going to find the two critical z-values that correspond to the 1% significance level for our two-
sided test. Again, go under your "Formulas" tab. We're going to insert the "NORM.S.INV" under the "Statistical"
column. We'll first find the lower critical value that corresponds to the lower 0.5%, so enter 0.005.

You can see that we get our first critical z-value of negative 2.576. Now, if we do this correctly, we should get
a positive 2.576. Again, we're going to insert to get the second critical value for the upper part of our
distribution. The upper percentage that corresponds to the top 0.5% is going to be our 99.5%, so 0.995.

We get the positive critical z-value of 2.576.
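As a hedged sketch in Python (not part of the original tutorial), both two-sided cutoffs fall out of splitting the significance level across the tails:

```python
from statistics import NormalDist

alpha = 0.01                                 # two-sided significance level
lower = NormalDist().inv_cdf(alpha / 2)      # cutoff for the lower 0.5%
upper = NormalDist().inv_cdf(1 - alpha / 2)  # cutoff for the upper 0.5%
print(round(lower, 3), round(upper, 3))  # -2.576 2.576
```

Note that the two cutoffs are symmetric, which is why the positive value can also be obtained by negating the lower one.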

 SUMMARY

We calculated a critical z-score for a left-tailed, right-tailed, and two-tailed test, utilizing three methods
for each test: graphing calculator, z-table, and Excel.

Good luck!

Source: Adapted from Sophia tutorial by RACHEL ORR-DEPNER.

 TERMS TO KNOW

Critical Value
A value that can be compared to the test statistic to decide the outcome of a hypothesis test

How to Find a P-Value from a Z-Test Statistic
by Sophia

 WHAT'S COVERED

This tutorial will explain how to find a p-value when given the z-test statistic, by using either graphing
calculator, z-table, or technology. Our discussion breaks down as follows:
1. Two-Sided Tests
a. Z-Table
b. Graphing Calculator
c. Excel
2. Left-Tailed Tests
a. Z-Table
b. Graphing Calculator
c. Excel
3. Right-Tailed Tests
a. Z-Table
b. Graphing Calculator
c. Excel

1. Two-Sided Tests
Suppose a pharmaceutical company manufactures ibuprofen pills. They need to perform some quality
assurance to ensure they have the correct dosage, which is supposed to be 500 milligrams. This is a two-
sided test because if the company's pills are deviating significantly in either direction, meaning there are more
than 500 milligrams or less than 500 milligrams, this will indicate a problem.

In a random sample of 125 pills, there is an average dose of 499.3 milligrams with a standard deviation of 6
milligrams. Because this is quantitative data, 500 mg is the population mean. We can use the following
formula to calculate the z-score:

z = (x̄ − μ) / (σ/√n) = (499.3 − 500) / (6/√125) ≈ −1.304
We get a z-score of negative 1.304. Because this is a two-sided test, it is not enough to just look at the left tail.
We also have to look at the equivalent of the right tail, or a positive 1.304.

Now that we have the z-score, we can use a variety of methods to find the probability, or p-value.

1a. Z-Table
The first way to find the p-value is to use the z-table. In the z-table, the left column will show values to the
tenths place, while the top row will show values to the hundredths place. If we have a z-score of -1.304, we
need to round this to the hundredths place, or -1.30. In the left column, we will first find the tenths place, or -1.3.
In the top row, we will find the hundredths place, or 0.
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

-3.4 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0002

-3.3 0.0005 0.0005 0.0005 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0003

-3.2 0.0007 0.0007 0.0006 0.0006 0.0006 0.0006 0.0006 0.0005 0.0005 0.0005

-3.1 0.0010 0.0009 0.0009 0.0009 0.0008 0.0008 0.0008 0.0008 0.0007 0.0007

-3.0 0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010

-2.9 0.0019 0.0018 0.0017 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014

-2.8 0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019

-2.7 0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026

-2.6 0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036

-2.5 0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048

-2.4 0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064

-2.3 0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084

-2.2 0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110

-2.1 0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143

-2.0 0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183

-1.9 0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233

-1.8 0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294

-1.7 0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367

-1.6 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455

-1.5 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559

-1.4 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681

-1.3 0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823

-1.2 0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985

-1.1 0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170

-1.0 0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379

This gives a one-tail probability of 0.0968, or 9.68%, for a z-score of negative 1.304. We also need to take the positive 1.304 into account, which is the upper right tail.

To calculate the true p-value, we just need to multiply 0.0968 by two, or 0.1936. This would be a p-value of
19.36%.

1b. Graphing Calculator


The second method is using a graphing calculator. This can give us a more exact number because we will not
have to cut off the z-score at the hundredths place. On the calculator, click "2nd", then "DISTR" for distribution.
We will use "normalcdf", which stands for normal cumulative density function. When inserting the values into
the calculator, we always go lower boundary to upper boundary.
In this case, the lower boundary was shaded all the way to the left of the curve, which would be negative
infinity. We cannot enter negative infinity in our calculator, so instead, we can just enter negative 99. The
shading stops at -1.304, so this is the upper boundary.

We get a value of 0.0961, which is about the same value as we got in the table. Again, we need to take both
tails into account, so we can simply multiply this value by two to get a p-value of 0.1922, or 19.22%.

1c. Excel
The third method to find the p-value is to use Excel. First, select "Formulas", choose the "Statistical" option,

and pick "NORM.DIST". The first value we are going to input is the mean of the sample, which was 499.3, then
the population mean which we are testing against, or 500, and finally the standard deviation, which was 6,
divided by the square root of the sample of n. We can find the square root under the "Math and Trigonometry"
option in "Formulas". The last value that we need to enter is "TRUE".

We get about the same value as we did with the table and the calculator. Since this is a two-sided test, we need to multiply the value by two: 0.096 × 2 ≈ 0.192.
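The whole two-sided calculation can be sketched in Python's standard library (an illustration, not part of the original tutorial; the numbers come from the ibuprofen example above):

```python
from math import sqrt
from statistics import NormalDist

# Ibuprofen example: sample mean 499.3 mg, hypothesized mean 500 mg,
# standard deviation 6, sample size 125.
x_bar, mu, sigma, n = 499.3, 500, 6, 125
z = (x_bar - mu) / (sigma / sqrt(n))
p_value = 2 * NormalDist().cdf(z)  # double the left tail: two-sided test
print(round(z, 3), round(p_value, 3))  # -1.304 0.192
```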

2. Left-Tailed Test
In this next example, we'll look at the proportion of students who suffer from test anxiety. We want to test the
claim that fewer than half of students suffer from test anxiety.

In this case, we will have a left-tailed test. Because this is qualitative data, meaning the students answer yes or
no to suffering from test anxiety, this is a population proportion and we can use the following formula to
calculate the z-test statistic:

z = (p̂ − p) / √(pq/n)

In a random sample of 1000 students, 450 students claimed to have test anxiety. This will be p-hat, or the
sample proportion. We can calculate this by dividing 450 by 1000, or 0.45. The population proportion, p, is
50%, or 0.50. The complement of p, or q, can be found by calculating 1 minus 0.50, or 0.50. The sample size
is 1000.

The corresponding z-score is negative 3.162. Testing against that half, or 50%, of students suffer from test
anxiety, we get the following shaded region all the way to the left of our curve:

2a. Z-Table
The first way to find the p-value is with the z-table. Remember, we can only go up to the hundredths place, so
we will need to round -3.162 to -3.16. In the left column, we will first find the tenths place, or -3.1. In the top row,
we will find the hundredths place, or 0.06.
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

-3.4 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0002

-3.3 0.0005 0.0005 0.0005 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0003

-3.2 0.0007 0.0007 0.0006 0.0006 0.0006 0.0006 0.0006 0.0005 0.0005 0.0005

-3.1 0.0010 0.0009 0.0009 0.0009 0.0008 0.0008 0.0008 0.0008 0.0007 0.0007

-3.0 0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010

-2.9 0.0019 0.0018 0.0017 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014

-2.8 0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019

This gives us a p-value of 0.0008, or 0.08%.

2b. Graphing Calculator


To find the p-value on the graphing calculator, click "2nd", then "DISTR" for distribution. Again, we will use
"normalcdf". When inserting the values into the calculator, remember we always go lower boundary to upper
boundary. In this case, the lower boundary was shaded all the way to the left of the curve, which would be
negative infinity. We cannot enter negative infinity in our calculator, so instead, we can just enter negative 99.
The shading stops at -3.162, so this is the upper boundary.

This answer shows a p-value of 0.00078, or 0.078%.

2c. Excel
In Excel, select "Formulas", choose the "Statistical" option, and pick "NORM.DIST". The first value we are going
to input is the sample proportion, "0.45", then the population proportion, "0.50", and finally the standard
deviation, which was the square root of pq divided by n, or 0.50 times 0.50 divided by 1000. We can find the
square root under the "Math and Trigonometry" option in "Formulas". The standard deviation should be input
as "SQRT((0.50*0.50)/1000)". The last value that we need to enter is "TRUE".

We get about the same p-value as we did with the z-table and the calculator.
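The left-tailed proportion test can be reproduced as a Python sketch (illustrative, not part of the original tutorial; the numbers come from the test-anxiety example above):

```python
from math import sqrt
from statistics import NormalDist

# Test-anxiety example: 450 of 1000 students, testing against
# p = 0.50 with a left-tailed alternative.
p_hat, p, n = 450 / 1000, 0.50, 1000
z = (p_hat - p) / sqrt(p * (1 - p) / n)
p_value = NormalDist().cdf(z)  # the entire left tail is the p-value
print(round(z, 3), round(p_value, 5))  # -3.162 0.00078
```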

3. Right-Tailed Test
In this final example, we will be testing the claim that women in a certain town are taller than the average U.S.
height, which is 63.8 inches.

From a random sample of 50 women, we get an average height of 64.7 inches with a standard deviation of
2.5 inches. Inches is a quantitative variable, therefore the 63.8 inches is a population mean. We will then use
the following formula to calculate the z-score:

z = (x̄ − μ) / (σ/√n) = (64.7 − 63.8) / (2.5/√50) ≈ 2.546

We get a z-score of 2.546, which is labeled on the following distribution:

3a. Z-Table

The first way to find the p-value is to use the z-table. In the z-table, the left column will show values to the
tenths place, while the top row will show values to the hundredths place. If we have a z-score of 2.546, we
need to round this to the hundredths place, or 2.55. In the left column, we will first find the tenths place, or 2.5.
In the top row, we will find the hundredths place, or 0.05.
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359

0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753

0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141

0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517

0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879

0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224

0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549

0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852

0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133

0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389

1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621

1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830

1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015

1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177

1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319

1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441

1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545

1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633

1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706

1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767

2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817

2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857

2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890

2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916

2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936

2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952

2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964

2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974

2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981

2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986

This gives us a p-value of 0.9946, or 99.46%.

However, when we are performing an upper-tail, or right-tailed, test, the table always gives the area to the left of the z-score. The value of 99.46% is the portion of the distribution that is unshaded.

To get the percent that is shaded under the curve, we just need to calculate 100% minus 99.46%. This gives
us the p-value of 0.54%, or 0.0054.

3b. Graphing Calculator


On the graphing calculator, again we are going to click "2nd", then "DISTR", and use "normalcdf". When
inserting the values into the calculator, remember we always go lower boundary to upper boundary of the
shaded region. In this case, the lower boundary of the shaded region is our z-score, 2.546. The upper
boundary goes all the way up to positive, but we cannot type positive infinity in our calculator. Instead, we can
just enter positive 99.

This answer shows a p-value of 0.0054, or 0.54%.

3c. Excel
In Excel, first, select "Formulas", choose the "Statistical" option, and again pick "NORM.DIST". The first value
we are going to input is the mean of the sample, which was 64.7, then the population mean which we are
testing against, or 63.8, and finally the standard deviation, which was 2.5, divided by the square root of the
sample size of 50. We can find the square root under the "Math and Trigonometry" option in "Formulas". The
last value that we need to enter is "TRUE".

Notice that we do not get the same p-value as the graphing calculator. In this case, since it is a right-tailed
test, Excel always goes from the first part of the distribution and reads left to right. We know that the
distribution is 100%, so to get that upper portion of the distribution, we have to do 100%, or 1, minus this value.

We get the same p-value, which is about 0.0054, or 0.54%.
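Here is the same right-tailed calculation as a Python sketch (illustrative, not part of the original tutorial; the numbers come from the height example above). Because of rounding, the table-based 0.0054 and the exact value differ slightly in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

# Height example: sample mean 64.7 in., hypothesized mean 63.8 in.,
# standard deviation 2.5, sample size 50, right-tailed test.
x_bar, mu, sigma, n = 64.7, 63.8, 2.5, 50
z = (x_bar - mu) / (sigma / sqrt(n))
p_value = 1 - NormalDist().cdf(z)  # upper tail: 1 minus the left-hand area
print(round(z, 3), round(p_value, 4))
```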

 SUMMARY

Today we calculated the p-value from a given z-test statistic, for a two-sided test, a left-tailed test, and
a right-tailed test. For each test, we performed the calculation using three different methods: z-table,
graphing calculator, and Excel.

Good luck!

Source: Adapted from Sophia tutorial by RACHEL ORR-DEPNER.

 TERMS TO KNOW

P-value
The probability that the test statistic is that value or more extreme in the direction of the alternative
hypothesis

Test Statistic
A measurement, in standardized units, of how far a sample statistic is from the assumed parameter if the
null hypothesis is true

Confidence Intervals
by Sophia

 WHAT'S COVERED

This tutorial will cover the basics of confidence intervals, focusing on how to identify the z-critical
value needed for a given confidence interval. Our discussion breaks down as follows:

1. Confidence Intervals
2. Margin of Error and the Effect of Confidence Level and Sample Size
3. Confidence Interval Formulas
a. For Sample Means
b. For Sample Proportions
4. Finding Z*

1. Confidence Intervals
Before we begin, it's important to note that sampling error is the inherent variability in the process of sampling.
In a random sample, it occurs when you use a statistic, like a sample mean, to estimate the parameter, like a
population mean. You won't always get exact accuracy with the sample mean, but you can use it to estimate
the population mean. The idea is that you can be close.

When you take a larger sample, you're going to be, on average, closer. The sampling error, which is the
amount by which the sample statistic is off from the population parameter, decreases. You get more
consistently close values to the parameter when you take samples. When you calculate a margin of error in a
study, you are approximating the sampling error.

When you take a sample, you try to obtain values that accurately represent what's going on in the population.

 EXAMPLE For example, suppose you took a simple random sample of 500 people getting ready
for an upcoming election in a town of 10,000, and found that 285 of those 500 plan to vote for a
particular candidate. Your best guess for the true proportion of the town's population that will vote for that candidate is the proportion you got in your sample: 285 out of 500, or 57%. That's your best guess, but you might be off by a little bit.

You don't know if the true proportion of people who will vote for that candidate is 57%, and that's why
you report a margin of error in your poll.
From the margin of error, you can create what is called a confidence interval. A confidence interval is an
interval that contains the likely values for a parameter. We base the confidence interval on our point estimate,
and the width of the interval is affected by the confidence level and sample size.

The confidence interval can be found using the following formula:

 FORMULA

Confidence Interval

Confidence Interval = Point Estimate ± Margin of Error
The confidence interval is your point estimate, which is your best guess from your simple random sample, plus
or minus the margin of error. You believe you are within a certain amount of the right answer with your point
estimate.

 TERM TO KNOW

Confidence Interval
An interval that contains likely values for a parameter. We base our confidence interval on our point
estimate, and the width of the interval is affected by confidence level and sample size.

2. Margin of Error and the Effect of Confidence Level and Sample Size
The margin of error depends on two things:

Sample size: You knew this from before when you said that a larger sample size results in less sampling
error, and therefore a lower margin of error.
Confidence level: You're going to learn more about this, but a higher confidence level results in a larger
margin of error.

 EXAMPLE If you want to be very confident that you're going to accurately describe what percent
of people are going to vote for that particular candidate, you have to go out a little bit further on each
side. Maybe you have to go out plus or minus 5%, as opposed to plus or minus 3%.

IN CONTEXT
95% Confidence

If the sampling distribution of p-hat is approximately normal, it will be centered at p, the population
parameter. 95% of all sample proportions will be within two standard deviations of p.

So p plus or minus two standard deviations will contain 95% of all p-hats. This is called 95% confidence. Approximately 19 out of every 20 samples that you take, in the long term, will be within two standard deviations of the right answer: 95% of all p-hats are within two standard deviations of p.

If you want to be more confident, you can go out even further.

99% Confidence

For instance, 99% of all p-hats will be within 2.58 standard deviations of p. This means that when
you take a sample proportion, 99% of sample proportions will be within 2.58 standard deviations of
the right answer, the value of p.

Take your p-hat value, and plus or minus 2.58 standard deviations, and you're 99% likely to capture
the value of p.

3. Confidence Interval Formulas


When stating the confidence interval, we will use the following phrase:

In C% of samples, the parameter will be within z* standard errors of the sample statistic.
This is the interpretation; in a typical interpretation, C and z* are replaced with actual numbers.

3a. For Sample Means


What does this look like if you're using means? For means, the parameter Mu (μ) will be contained in the
interval built from the statistic x-bar, plus or minus z* times the standard error of the statistic. In other
words,

For C% of the time, the parameter μ will be contained in the interval x̄ ± z*(σ/√n).

 FORMULA

Confidence Interval of Means

x̄ ± z*(σ/√n)

3b. For Sample Proportions


If you're using proportions, that means that the sample proportion, p-hat, plus or minus z* times the standard
error, will contain the value of the parameter, p, some percent of the time, such as 95% or 99% of the time.

For C% of the time, the parameter p will be contained in the interval p̂ ± z*√(p̂q̂/n).

 FORMULA

Confidence Interval of Proportions

p̂ ± z*√(p̂q̂/n)
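Both interval formulas can be sketched in Python with the standard library. The sample values below (a mean
of 100, a sigma of 15, and a 40% sample proportion) are invented purely to show the mechanics:

```python
from math import sqrt

def ci_mean(x_bar: float, sigma: float, n: int, z_star: float) -> tuple:
    """x-bar +/- z* * sigma / sqrt(n), when the population sigma is known."""
    me = z_star * sigma / sqrt(n)
    return (x_bar - me, x_bar + me)

def ci_proportion(p_hat: float, n: int, z_star: float) -> tuple:
    """p-hat +/- z* * sqrt(p-hat * q-hat / n)."""
    me = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return (p_hat - me, p_hat + me)

# 95% confidence intervals (z* = 1.96) for the made-up sample results:
lo, hi = ci_mean(x_bar=100, sigma=15, n=36, z_star=1.96)
print(round(lo, 1), round(hi, 1))    # 95.1 104.9
lo, hi = ci_proportion(p_hat=0.40, n=500, z_star=1.96)
print(round(lo, 3), round(hi, 3))    # 0.357 0.443
```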

4. Finding Z*
The confidence level determines the value of z*: choosing a different confidence level changes z*. To find the
z* critical value, we can use a z-table. For a confidence interval, we can follow the same steps as a two-sided
test.

 EXAMPLE If we have a 95% confidence interval, this is actually the same as a 5% significance
level. However, this is split between two tails, the lower and upper part of the distribution. Each tail will
have 2.5%, or 0.025.

We can use the upper limit to find the critical z-score. Remember, a distribution is 100%, so to find the upper
limit, we can subtract 0.025 from 1, which gives us 0.975. Now, we can use a z-table.

Standard Normal Distribution


Z-Table
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359

0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753

0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141

0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517

0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879

0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224

0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549

0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852

0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133

0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389

1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621

1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830

1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015

1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177

1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319

1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441

1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545

1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633

1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706

1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767

2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817

2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857

2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890

In a z-table, the value 0.975 corresponds with a 1.9 in the left column and 0.06 in the top row. This tells us that
the z-score is 1.96.
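Instead of scanning the printed z-table, the same critical value can be read off the inverse CDF of the
standard normal distribution, which Python's standard library provides:

```python
from statistics import NormalDist

# For 95% confidence, each tail holds 0.025, so we look up 0.975.
z_star = NormalDist().inv_cdf(0.975)
print(round(z_star, 2))  # 1.96

# For 99% confidence, each tail holds 0.005, so we look up 0.995.
print(round(NormalDist().inv_cdf(0.995), 2))  # 2.58
```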

Another way is to use a t-table, which you will learn more about in a later tutorial. We don't use the t-
distribution for proportions; however, we can use the last row of this table to find the critical values for
common confidence levels.

t-Distribution Critical Values

Tail Probability, p

One-tail 0.25 0.20 0.15 0.10 0.05 0.025 0.02 0.01 0.005 0.0025 0.001 0.0005

Two-tail 0.50 0.40 0.30 0.20 0.10 0.05 0.04 0.02 0.01 0.005 0.002 0.001

df

1 1.000 1.376 1.963 3.078 6.314 12.71 15.89 31.82 63.66 127.3 318.3 636.6

2 0.816 1.080 1.386 1.886 2.920 4.303 4.849 6.965 9.925 14.09 22.33 31.60

3 0.765 0.978 1.250 1.638 2.353 3.182 3.482 4.541 5.841 7.453 10.21 12.92

4 0.741 0.941 1.190 1.533 2.132 2.776 2.999 3.747 4.604 5.598 7.173 8.610

5 0.727 0.920 1.156 1.476 2.015 2.571 2.757 3.365 4.032 4.773 5.893 6.869

6 0.718 0.906 1.134 1.440 1.943 2.447 2.612 3.143 3.707 4.317 5.208 5.959

7 0.711 0.896 1.119 1.415 1.895 2.365 2.517 2.998 3.499 4.029 4.785 5.408

8 0.706 0.889 1.108 1.397 1.860 2.306 2.449 2.896 3.355 3.833 4.501 5.041

9 0.703 0.883 1.100 1.383 1.833 2.262 2.398 2.821 3.250 3.690 4.297 4.781

10 0.700 0.879 1.093 1.372 1.812 2.228 2.359 2.764 3.169 3.581 4.144 4.587

11 0.697 0.876 1.088 1.363 1.796 2.201 2.328 2.718 3.106 3.497 4.025 4.437

12 0.695 0.873 1.083 1.356 1.782 2.179 2.303 2.681 3.055 3.428 3.930 4.318

13 0.694 0.870 1.079 1.350 1.771 2.160 2.282 2.650 3.012 3.372 3.852 4.221

14 0.692 0.868 1.076 1.345 1.761 2.145 2.264 2.624 2.977 3.326 3.787 4.140

15 0.691 0.866 1.074 1.341 1.753 2.131 2.249 2.602 2.947 3.286 3.733 4.073

16 0.690 0.865 1.071 1.337 1.746 2.120 2.235 2.583 2.921 3.252 3.686 4.015

17 0.689 0.863 1.069 1.333 1.740 2.110 2.224 2.567 2.898 3.222 3.646 3.965

18 0.688 0.862 1.067 1.330 1.734 2.101 2.214 2.552 2.878 3.197 3.610 3.922

19 0.688 0.861 1.066 1.328 1.729 2.093 2.205 2.539 2.861 3.174 3.579 3.883

20 0.687 0.860 1.064 1.325 1.725 2.086 2.197 2.528 2.845 3.153 3.552 3.850

21 0.686 0.859 1.063 1.323 1.721 2.080 2.189 2.518 2.831 3.135 3.527 3.819

22 0.686 0.858 1.061 1.321 1.717 2.074 2.183 2.508 2.819 3.119 3.505 3.792

23 0.685 0.858 1.060 1.319 1.714 2.069 2.177 2.500 2.807 3.104 3.485 3.767

24 0.685 0.857 1.059 1.318 1.711 2.064 2.172 2.492 2.797 3.091 3.467 3.745

25 0.684 0.856 1.058 1.316 1.708 2.060 2.167 2.485 2.787 3.078 3.450 3.725

26 0.684 0.856 1.058 1.315 1.706 2.056 2.162 2.479 2.779 3.067 3.435 3.707

27 0.684 0.855 1.057 1.314 1.703 2.052 2.158 2.473 2.771 3.057 3.421 3.690

28 0.683 0.855 1.056 1.313 1.701 2.048 2.154 2.467 2.763 3.047 3.408 3.674

29 0.683 0.854 1.055 1.311 1.699 2.045 2.150 2.462 2.756 3.038 3.396 3.659

30 0.683 0.854 1.055 1.310 1.697 2.042 2.147 2.457 2.750 3.030 3.385 3.646

40 0.681 0.851 1.050 1.303 1.684 2.021 2.123 2.423 2.704 2.971 3.307 3.551

50 0.679 0.849 1.047 1.299 1.676 2.009 2.109 2.403 2.678 2.937 3.261 3.496

60 0.679 0.848 1.045 1.296 1.671 2.000 2.099 2.390 2.660 2.915 3.232 3.460

80 0.678 0.846 1.043 1.292 1.664 1.990 2.088 2.374 2.639 2.887 3.195 3.416

100 0.677 0.845 1.042 1.290 1.660 1.984 2.081 2.364 2.626 2.871 3.174 3.390

1000 0.675 0.842 1.037 1.282 1.646 1.962 2.056 2.330 2.581 2.813 3.098 3.300

>1000 0.674 0.841 1.036 1.282 1.645 1.960 2.054 2.326 2.576 2.807 3.091 3.291

Confidence Interval between -t and t

50% 60% 70% 80% 90% 95% 96% 98% 99% 99.5% 99.8% 99.9%

Critical z values for common confidence levels are found in the last row of this t-table, under the infinity value,
or ">1000". Essentially, the normal distribution is the t-distribution with infinite degrees of freedom. Looking in
this row for 95% confidence gives the z critical value we should use, which is the same 1.96 we previously got.

 SUMMARY

When you take a sample, you obtain a sample statistic that is a point estimate of the population
parameter. You can create a confidence interval for which you can be a certain percent confident
that the parameter lies within the interval. This means that that percent of sample statistics in the
sampling distribution are within the margin of error of the parameter. For example, 95% of all the
x-bars in the sampling distribution of x-bar will be within the margin of error of the true parameter
Mu. Likewise, that percent of confidence intervals will contain the parameter: if you sampled over
and over again, and built a confidence interval each time, 90% or 95% of those intervals would
contain Mu or p, or whatever parameter you're trying to estimate.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Confidence Interval
An interval that contains likely values for a parameter. We base our confidence interval on our point
estimate, and the width of the interval is affected by confidence level and sample size.

 FORMULAS TO KNOW

Confidence Interval

CI = point estimate ± margin of error

Confidence Interval of Means

x̄ ± z*(σ/√n)

Confidence Interval of Proportions

p̂ ± z*√(p̂q̂/n)

Confidence Interval for Population Proportion
by Sophia

 WHAT'S COVERED

This tutorial will cover how to calculate confidence intervals for a population proportion. Our
discussion breaks down as follows:

1. Calculating a Confidence Interval for Population Proportion


2. Constructing a Confidence Interval for Population Proportion

1. Calculating a Confidence Interval for Population Proportion
A confidence interval for population proportions is very similar to a confidence interval for population means.
In general, a confidence interval is an estimate found by using a sample statistic and adding and subtracting
an amount corresponding to how confident we are that the interval created captures the population
parameter.

For a confidence interval for population proportions, the statistic is the sample proportion and the
parameter is the population proportion. The following formula can be used to calculate the confidence interval:

 FORMULA

Confidence Interval of Population Proportion

p̂ ± z*√(p̂q̂/n)

 HINT

We will use p-hat and q-hat because we do not have an assumed population proportion.
 TERM TO KNOW

Confidence Interval for a Population Proportion


A confidence interval that gives a likely range for the value of a population proportion. It is the sample
proportion, plus and minus the margin of error from the normal distribution.

2. Constructing a Confidence Interval for Population Proportion

To construct a confidence interval for population proportions, the following steps must be followed:

 STEP BY STEP

Step 1: Verify the conditions necessary for inference.


Step 2: Calculate the confidence interval.
Step 3: Interpret the confidence interval.

 EXAMPLE Obecalp is a popular prescription drug but is thought to cause headaches as a side
effect. In a random sample of 206 patients taking Obecalp, 23 experienced headaches.

Construct a 95% confidence interval for the proportion of all Obecalp users that would experience
headaches.
Step 1: Verify the conditions necessary for inference.

Stating the conditions isn't enough, and it's not just a formality--you must verify them. Recall the conditions
needed:

Condition Description

Randomness How was the sample obtained?

Independence Population ≥ 10n

Normality np ≥ 10 and nq ≥ 10

Let's go back to our example to check the requirements of randomness, independence, and normality.
Randomness: The sample of Obecalp users was a random sample, so that is verified.
Independence: The sample of Obecalp users taken was a small fraction of the population of Obecalp
users. There's no way to verify that empirically unless you had the whole list of people taking the drug.
You're going to have to assume there are at least ten times the sample size, or 2,060 people taking this
drug.
Normality: The "np is greater than or equal to 10" condition is a little harder to check. You don't know
p, the true proportion of people who will get headaches, and you don't have a best guess for it from a
null hypothesis--there is no null hypothesis in this problem. What you do have, as a point estimate for
p, is p-hat, so verify normality by using p-hat instead of p. Here, n times p-hat, which is 206 times
23/206, is 23, which is bigger than 10; n times q-hat is 183, which is also bigger than 10.

 HINT

Recall that you need to use sample statistic, p-hat, to verify the normality condition because you don't
know population parameter, p.
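The arithmetic behind these checks can be sketched as a minimal Python example for this Obecalp sample:

```python
n = 206          # patients in the random sample
successes = 23   # patients who experienced headaches

p_hat = successes / n   # point estimate for p
q_hat = 1 - p_hat       # its complement

# Since p is unknown, check normality with the sample statistic p-hat:
ok_normal = n * p_hat >= 10 and n * q_hat >= 10
print(round(n * p_hat), round(n * q_hat), ok_normal)  # 23 183 True
```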
Step 2: Calculate the confidence interval.

To do this, we will take the point estimate, p-hat, plus or minus the z* critical value times the standard error of
p-hat, which is the square root of p-hat times q hat, over n. The population proportion is not known, so you’ll
use p-hat for the standard error.

First, let's find the corresponding z* critical value for a 95% confidence interval by using a z-table. For a
confidence interval, we can follow the same steps as a two-sided test. If we have a 95% confidence interval,
this is actually the same as a 5% significance level. However, this is split between two tails, the lower and
upper part of the distribution. Each tail will have 2.5%.

We can use the upper limit to find the critical z-score. Remember, a distribution is 100%, so to find the upper
limit, we can subtract 0.025 from 1, which gives us 0.975. Now, we can use a z-table.

Standard Normal Distribution


Z-Table
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359

0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753

0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141

0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517

0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879

0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224

0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549

0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852

0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133

0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389

1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621

1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830

1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015

1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177

1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319

1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441

1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545

1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633

1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706

1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767

2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817

2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857

2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890

In a z-table, the value 0.975 corresponds with a 1.9 in the left column and 0.06 in the top row. This tells us that
the z-score is 1.96.

Another way is to use a t-table, which you will learn more about in a later lesson but is available to view at the
end of this tutorial. We don't use the t-distribution for proportions; however, we can use the last row of this
table to find the critical values for common confidence levels. These are found in the last row of the t-table,
under the infinity value, or ">1000". Essentially, the normal distribution is the t-distribution with infinite
degrees of freedom. Looking in this row gives the z critical value that we should use, which is the same as
the 1.96 we got from before.

Now that we have the corresponding z* critical value, we need p-hat, which is 23 out of 206; q-hat, which is
the complement of p-hat; and the sample size, n, which is 206. We put all this information into the formula:

p̂ ± z*√(p̂q̂/n) = 0.112 ± 1.96 × √((0.112 × 0.888)/206) = 0.112 ± 0.043

From this formula, we obtain 0.112, which was our p-hat, plus or minus 0.043, which is the margin of error.
When we evaluate the interval, it goes from 0.069 all the way up to 0.155.

Step 3: Interpret the confidence interval.

The confidence interval of 0.069 to 0.155 means we're 95% certain that if everyone who was taking Obecalp
was in the study, the true proportion of all Obecalp users who would experience headaches is somewhere
between 6.9% and 15.5%. We don't know exactly where in that range, but the true proportion is probably
somewhere in this range.
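The whole interval calculation can be sketched in Python using only the standard library, with NormalDist
supplying the critical value rather than a printed table:

```python
from math import sqrt
from statistics import NormalDist

n = 206
p_hat = 23 / n          # sample proportion of headache sufferers
q_hat = 1 - p_hat

z_star = NormalDist().inv_cdf(0.975)       # z* for 95% confidence, ~1.96
margin = z_star * sqrt(p_hat * q_hat / n)  # z* times the standard error

lower, upper = p_hat - margin, p_hat + margin
print(round(p_hat, 3), round(margin, 3))   # 0.112 0.043
print(round(lower, 3), round(upper, 3))    # 0.069 0.155
```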

 SUMMARY

You can create point estimates for population proportions, which is your sample proportion, and then
use that sample proportion to determine the margin of error for a confidence interval. First, verify the
conditions for inference are met, then construct and interpret a confidence interval based on the data
that you've gathered and the statistics that you've calculated.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Confidence Interval for a Population Proportion


A confidence interval that gives a likely range for the value of a population proportion. It is the sample
proportion, plus and minus the margin of error from the normal distribution.

 FORMULAS TO KNOW

Confidence Interval of Population Proportion

p̂ ± z*√(p̂q̂/n)

Calculating Standard Error of a Sample
Proportion
by Sophia

 WHAT'S COVERED

This tutorial will explain how to calculate standard error for a sample proportion, for cases when the
population standard deviation is known, and when it is unknown. Our discussion breaks down as
follows:

1. Standard Error for Sample Proportions (Population Standard Deviation Is Unknown)


2. Standard Error for Sample Proportions (Population Standard Deviation Is Known)

1. Standard Error for Sample Proportions (Population Standard Deviation Is Unknown)
A survey is conducted at the local high schools to find out about underage drinking. Of the 523 students who
replied to the survey, 188 replied that they have drunk some amount of alcohol.

What is the standard error of the sample proportion?

These students are either answering yes or no on the survey: "Yes, I've drunk some amount of alcohol" or "No,
I have not drunk any alcohol." That is qualitative data, also known as categorical data. Therefore,
we're dealing with a sample proportion.

Whenever we're dealing with a sample proportion, the next question we need to ask ourselves is, "Do I know
the population standard deviation?" In this case, we do not have any of that information. Therefore, the
formula to calculate the standard error is p-hat times q-hat, divided by n, all under the square root.

 FORMULA

Standard Error for Sample Proportions (Population Standard Deviation Unknown)

√(p̂q̂/n)

We're actually going to use the data that was given to us, which are estimates--that's what the hat indicates--in
order to calculate the standard error.

The first thing we need to do is to figure out what p-hat is, based off of the information given to us. In this case,
the p-hat is what we're interested in, and that is how many have answered yes to participating in underage
drinking. That would be 188 out of 523 students, or 188/523, which is about 36% of the students.

Now, we also need the complement, which would be q-hat. This is also written as 1 minus p-hat. One minus
the 188 out of 523, or 1 - 0.36, tells us that 0.64, or 64%, of the students have not participated in underage
drinking. To always make sure our math is correct, remember that our p-hat and q-hat should add up to 1,
because they're complements of each other.

Now, we can plug in those values into the formula.

We have 0.36 for p-hat, 0.64 for q-hat, and the total sample, n, was 523 students. This calculates to a
standard error that is 0.021.
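A quick sketch of this calculation in Python:

```python
from math import sqrt

n = 523            # students who replied to the survey
p_hat = 188 / n    # about 0.36 answered "yes"
q_hat = 1 - p_hat  # about 0.64, the complement

se = sqrt(p_hat * q_hat / n)
print(round(se, 3))  # 0.021
```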

2. Standard Error for Sample Proportions (Population Standard Deviation Is Known)
Revisiting our prior example, a survey is conducted at the local high schools to find out about underage
drinking. Of the 523 students who replied to the survey, 188 replied that they have drunk some amount of
alcohol. The proportion of underage drinkers nationally is 39%.

What is the standard error of the sample proportion?

We are still looking at the students who were surveyed about underage drinking, but notice that this scenario
adds that the proportion of underage drinkers nationally is 39%. We're still calculating the standard error of
the sample proportion, but in this case we know the population proportion, 39%, so the population standard
deviation is known. We're going to use the formula of the square root of pq over n.

 FORMULA

Standard Error for Sample Proportions (Population Standard Deviation Is Known)

√(pq/n)

We do not need to use p-hat, which is the 188 out of 523, to make the estimate for the standard error. We
actually know p, which is 39%, or 0.39.

In this case, we're going to use 0.39 for p. This is another way of indicating population proportion. We can
then use this to find q, which is the complement of p. The complement of 0.39 is calculated by 1 minus 0.39,
which equals 0.61, or 61%. Sometimes we'll see this written as p subscript 0 and q subscript 0.

The sample size, n, is still 523 students who were surveyed.

The standard error is 0.021.
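A sketch of the same calculation, this time substituting the known national proportion for p-hat:

```python
from math import sqrt

n = 523    # students surveyed
p = 0.39   # national proportion, treated as the known population value
q = 1 - p  # 0.61

se = sqrt(p * q / n)
print(round(se, 3))  # 0.021
```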

 SUMMARY

Today we learned how to calculate the standard error of a sample proportion, and practiced identifying
which formula to use based on whether the population standard deviation is unknown or known.

Source: Adapted from Sophia tutorial by RACHEL ORR-DEPNER.

 TERMS TO KNOW

Standard Error
The standard deviation of a sampling distribution, such as the distribution of sample means or the
distribution of sample proportions.

 FORMULAS TO KNOW

Standard Error
Sample Means: σ/√n

Sample Proportion (population standard deviation is unknown): √(p̂q̂/n)

Sample Proportion (population standard deviation is known): √(pq/n)

T-Tests
by Sophia

 WHAT'S COVERED

In this tutorial, you will learn about t-tests, and how to determine key characteristics of a t-distribution.
Our discussion breaks down as follows:

1. Difference between Z-Tests and T-Tests


2. Conducting a T-Test

1. Difference between Z-Tests and T-Tests


In a z-test for means, the z-test statistic is equal to the sample mean minus the hypothesized population mean,
over the standard deviation of the population divided by the square root of sample size.

 FORMULA

Z-Statistic For Population Means

z = (x̄ − μ) / (σ/√n)

However, the z-statistic was based on the fact that the population standard deviation was known. If the
population standard deviation is not known, we need a new statistic. We're going to use our sample standard
deviation, s, instead.

 FORMULA

T-Statistic For Population Means

t = (x̄ − μ) / (s/√n)

 HINT

This "s" over the square root of n value, replacing the sigma over square root of n value, is called the
standard error.
The only problem with using the sample standard deviation (s) as opposed to the population standard
deviation (σ) is that the value of s can vary greatly from sample to sample. Sigma (σ) is fixed, so we can base
our normal distribution on it.

The sample standard deviation is more variable than the population standard deviation and much more
variable for small samples than for large samples. For large samples, s and sigma are very close, but with

small samples particularly, the value of s can vary wildly.

Because s is so variable, it creates a new distribution of test statistics much like the normal distribution, but is
known as the student's t-distribution, or sometimes just the t-distribution.

The only difference is that the t-distribution is a heavier-tailed distribution. If we used the normal distribution
instead, it would underestimate the proportion of extreme values in the sampling distribution.

The t-distribution is actually a family of distributions. They all are a little bit shorter than the standard normal
distribution and a little heavier on the tails. As the sample size gets larger, the t-distribution does get close to
the normal distribution. It doesn't diminish as quickly in the tails when the sample size is small, but gets very
close to the normal distribution when n is large.

 TERM TO KNOW

T-Distribution
A family of distributions that are centered at zero and symmetric like the standard normal distribution,
but heavier in the tails. Depending on the sample size, it does not diminish towards the tails as fast. If
the sample size is large, the t-distribution approximates the normal distribution.

2. Conducting a T-Test
We're going to conduct a t-test for population means much like we conducted a z-test for population means.
Recall, that when running a hypothesis test, there are four parts:

 STEP BY STEP

1. State the null and alternative hypotheses.


2. Check the conditions necessary for inference.
3. Calculate the test statistic and its p-value.
4. Compare our test statistic to our chosen critical value, or our p-value, to the significance level, and then
based on how those compare, make a decision about the null hypothesis and a conclusion in the context
of the problem.

The only difference between these two tests is that the test statistic is going to be a t-statistic instead of a
z-statistic. Because we're using the t-distribution instead of the z-distribution, we're going to obtain a different
p-value.

Therefore, we will need a new table rather than the standard normal table. Below is the t-distribution table.
The tail probabilities (the possible p-values) appear in the top rows, and the t-values are the entries inside
the table.

t-Distribution Critical Values

Tail Probability, p

One-tail 0.25 0.20 0.15 0.10 0.05 0.025 0.02 0.01 0.005 0.0025 0.001 0.0005

Two-tail 0.50 0.40 0.30 0.20 0.10 0.05 0.04 0.02 0.01 0.005 0.002 0.001

df

1 1.000 1.376 1.963 3.078 6.314 12.71 15.89 31.82 63.66 127.3 318.3 636.6

2 0.816 1.080 1.386 1.886 2.920 4.303 4.849 6.965 9.925 14.09 22.33 31.60

3 0.765 0.978 1.250 1.638 2.353 3.182 3.482 4.541 5.841 7.453 10.21 12.92

4 0.741 0.941 1.190 1.533 2.132 2.776 2.999 3.747 4.604 5.598 7.173 8.610

5 0.727 0.920 1.156 1.476 2.015 2.571 2.757 3.365 4.032 4.773 5.893 6.869

6 0.718 0.906 1.134 1.440 1.943 2.447 2.612 3.143 3.707 4.317 5.208 5.959

7 0.711 0.896 1.119 1.415 1.895 2.365 2.517 2.998 3.499 4.029 4.785 5.408

8 0.706 0.889 1.108 1.397 1.860 2.306 2.449 2.896 3.355 3.833 4.501 5.041

9 0.703 0.883 1.100 1.383 1.833 2.262 2.398 2.821 3.250 3.690 4.297 4.781

10 0.700 0.879 1.093 1.372 1.812 2.228 2.359 2.764 3.169 3.581 4.144 4.587

11 0.697 0.876 1.088 1.363 1.796 2.201 2.328 2.718 3.106 3.497 4.025 4.437

12 0.695 0.873 1.083 1.356 1.782 2.179 2.303 2.681 3.055 3.428 3.930 4.318

13 0.694 0.870 1.079 1.350 1.771 2.160 2.282 2.650 3.012 3.372 3.852 4.221

14 0.692 0.868 1.076 1.345 1.761 2.145 2.264 2.624 2.977 3.326 3.787 4.140

15 0.691 0.866 1.074 1.341 1.753 2.131 2.249 2.602 2.947 3.286 3.733 4.073

16 0.690 0.865 1.071 1.337 1.746 2.120 2.235 2.583 2.921 3.252 3.686 4.015

17 0.689 0.863 1.069 1.333 1.740 2.110 2.224 2.567 2.898 3.222 3.646 3.965

18 0.688 0.862 1.067 1.330 1.734 2.101 2.214 2.552 2.878 3.197 3.610 3.922

19 0.688 0.861 1.066 1.328 1.729 2.093 2.205 2.539 2.861 3.174 3.579 3.883

20 0.687 0.860 1.064 1.325 1.725 2.086 2.197 2.528 2.845 3.153 3.552 3.850

21 0.686 0.859 1.063 1.323 1.721 2.080 2.189 2.518 2.831 3.135 3.527 3.819

22 0.686 0.858 1.061 1.321 1.717 2.074 2.183 2.508 2.819 3.119 3.505 3.792

23 0.685 0.858 1.060 1.319 1.714 2.069 2.177 2.500 2.807 3.104 3.485 3.767

24 0.685 0.857 1.059 1.318 1.711 2.064 2.172 2.492 2.797 3.091 3.467 3.745

25 0.684 0.856 1.058 1.316 1.708 2.060 2.167 2.485 2.787 3.078 3.450 3.725

26 0.684 0.856 1.058 1.315 1.706 2.056 2.162 2.479 2.779 3.067 3.435 3.707

27 0.684 0.855 1.057 1.314 1.703 2.052 2.158 2.473 2.771 3.057 3.421 3.690

28 0.683 0.855 1.056 1.313 1.701 2.048 2.154 2.467 2.763 3.047 3.408 3.674

29 0.683 0.854 1.055 1.311 1.699 2.045 2.150 2.462 2.756 3.038 3.396 3.659

30 0.683 0.854 1.055 1.310 1.697 2.042 2.147 2.457 2.750 3.030 3.385 3.646

40 0.681 0.851 1.050 1.303 1.684 2.021 2.123 2.423 2.704 2.971 3.307 3.551

50 0.679 0.849 1.047 1.299 1.676 2.009 2.109 2.403 2.678 2.937 3.261 3.496

60 0.679 0.848 1.045 1.296 1.671 2.000 2.099 2.390 2.660 2.915 3.232 3.460

80 0.678 0.846 1.043 1.292 1.664 1.990 2.088 2.374 2.639 2.887 3.195 3.416

100 0.677 0.845 1.042 1.290 1.660 1.984 2.081 2.364 2.626 2.871 3.174 3.390

1000 0.675 0.842 1.037 1.282 1.646 1.962 2.056 2.330 2.581 2.813 3.098 3.300

>1000 0.674 0.841 1.036 1.282 1.645 1.960 2.054 2.326 2.576 2.807 3.091 3.291

Confidence Interval between -t and t

50% 60% 70% 80% 90% 95% 96% 98% 99% 99.5% 99.8% 99.9%

This table is one-sided, and it's the upper tail that gives us these tail probabilities. The tail probability at the
top of a column is the probability of a value lying above the t-value shown in the table.

The one new wrinkle that we're adding for a t-distribution is the value df (the far left column), called the
degrees of freedom. For our purposes, it's going to be the sample size minus 1. We look for our t-statistic in
the row matching our degrees of freedom. If it falls between two table values, our p-value falls between the
corresponding two tail probabilities.
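That row-lookup procedure can be sketched in code. The lists below copy the two-tail probabilities and the
df = 13 row of the table above; the bracketing helper is an illustrative name of ours, not part of any library:

```python
# Two-tail probabilities and the df = 13 critical values from the table above.
two_tail = [0.50, 0.40, 0.30, 0.20, 0.10, 0.05, 0.04, 0.02, 0.01, 0.005, 0.002, 0.001]
t_crit   = [0.694, 0.870, 1.079, 1.350, 1.771, 2.160, 2.282, 2.650, 3.012, 3.372, 3.852, 4.221]

def bracket_p_value(t_stat: float) -> tuple:
    """Bounds (low, high) on the two-tailed p-value, read off the df = 13 row."""
    t = abs(t_stat)
    prev = 1.0  # a p-value can never exceed 1
    for prob, crit in zip(two_tail, t_crit):
        if t < crit:
            return (prob, prev)
        prev = prob
    return (0.0, prev)  # t beyond the last column: p < 0.001

print(bracket_p_value(1.06))  # (0.3, 0.4)
```

For instance, a t-statistic of 1.06 with 13 degrees of freedom falls between the 0.870 and 1.079 entries, so
its two-tailed p-value is somewhere between 0.30 and 0.40.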

 EXAMPLE According to their bags, a standard bag of M&M's candies is supposed to weigh 47.9
grams. Suppose we randomly select 14 bags and got this distribution.

48.2 48.4 47.0 47.3 47.9 48.5 49.0

48.3 48.0 47.9 48.7 48.8 47.4 47.6

Assuming the distribution of bag weights is approximately normal, is this evidence that the bags do not
contain the amount of candy that they say that they do?
Step 1: State the null and alternative hypotheses.
The null is that the mean is 47.9 grams. The alternative is the mean is not 47.9 grams. We can select a
significance level of 0.05, which means that if the p-value is less than 0.05, reject the null hypothesis.

H0: μ = 47.9; The mean of all M&M's bags is 47.9g.


Ha: μ ≠ 47.9; The mean of all M&M's bags is not 47.9g.
α = 0.05

Step 2: Check the necessary conditions.


Consider the following criteria for a hypothesis test:

Criteria Description

How were the data collected?

Randomness
The randomness should be stated somewhere in the problem. Think about the way the
data was collected.

Population ≥ 10n

Independence
You want to make sure that the population is at least 10 times as large as the sample
size.

n ≥ 30 or normal parent distribution

There are two ways to verify normality. Either the parent distribution has to be normal or
Normality
the central limit theorem is going to have to apply. The central limit theorem says that
for most distributions, when the sample size is greater than 30, the sampling distribution
will be approximately normal.

Let's verify each of those in the M&M's problem:

Randomness: It says in the problem that the bags were randomly selected.
Independence: We're going to go ahead and assume that there are at least 140 bags of M&M's. That's 10
times as large as the 14 bags in our sample.
Normality: Finally, it does say in the problem that the distribution of bag weights is approximately normal,
and so normality will be verified for our sampling distribution.

Step 3: Calculate the test statistic and the p-value.


Since we do not know the population standard deviation, we will perform a t-test. We first need to find the sample mean and standard deviation. We can easily find both values by using Excel. First, list all the values given. To find the average, type "=AVERAGE(" and highlight all 14 values. We can also find this function under the Formulas tab by selecting the Statistical option.

When we press "Enter", we get an average of 48.07. We can also find the standard deviation easily by typing
"=STDEV.S(" and highlighting the 14 values. We can also find this function by going under the Formulas tab,
and then selecting the Statistical option.

This gives a standard deviation of 0.60.

Now, we can plug the known values into the t-statistic formula:

t = (x̄ − μ) / (s / √n) = (48.07 − 47.9) / (0.60 / √14) ≈ 1.06

By plugging in all the numbers we have, we obtain a t-statistic of positive 1.06. What exactly does this tell us? We need to calculate the probability that we get a t-statistic of 1.06 or larger.

In the table, we also need to identify the degrees of freedom (df), which is the sample size minus one. The
sample was 14 bags, so our degrees of freedom is 14 minus 1, or 13.

Let's look at the t-table in row 13 to find the closest value to 1.06:

t-Distribution Critical Values

Tail Probability, p

One-tail 0.25 0.20 0.15 0.10 0.05 0.025 0.02 0.01 0.005 0.0025 0.001 0.0005

Two-tail 0.50 0.40 0.30 0.20 0.10 0.05 0.04 0.02 0.01 0.005 0.002 0.001

df

1 1.000 1.376 1.963 3.078 6.314 12.71 15.89 31.82 63.66 127.3 318.3 636.6

2 0.816 1.080 1.386 1.886 2.920 4.303 4.849 6.965 9.925 14.09 22.33 31.60

3 0.765 0.978 1.250 1.638 2.353 3.182 3.482 4.541 5.841 7.453 10.21 12.92

4 0.741 0.941 1.190 1.533 2.132 2.776 2.999 3.747 4.604 5.598 7.173 8.610

5 0.727 0.920 1.156 1.476 2.015 2.571 2.757 3.365 4.032 4.773 5.893 6.869

6 0.718 0.906 1.134 1.440 1.943 2.447 2.612 3.143 3.707 4.317 5.208 5.959

7 0.711 0.896 1.119 1.415 1.895 2.365 2.517 2.998 3.499 4.029 4.785 5.408

8 0.706 0.889 1.108 1.397 1.860 2.306 2.449 2.896 3.355 3.833 4.501 5.041

9 0.703 0.883 1.100 1.383 1.833 2.262 2.398 2.821 3.250 3.690 4.297 4.781

10 0.700 0.879 1.093 1.372 1.812 2.228 2.359 2.764 3.169 3.581 4.144 4.587

11 0.697 0.876 1.088 1.363 1.796 2.201 2.328 2.718 3.106 3.497 4.025 4.437

12 0.695 0.873 1.083 1.356 1.782 2.179 2.303 2.681 3.055 3.428 3.930 4.318

13 0.694 0.870 1.079 1.350 1.771 2.160 2.282 2.650 3.012 3.372 3.852 4.221

14 0.692 0.868 1.076 1.345 1.761 2.145 2.264 2.624 2.977 3.326 3.787 4.140

15 0.691 0.866 1.074 1.341 1.753 2.131 2.249 2.602 2.947 3.286 3.733 4.073

In all likelihood, the statistic is not one of the values listed in the row, but between two values. We see that 1.06 falls between 0.870 and 1.079, which means that the p-value is between 0.30 and 0.40 in the two-tail row. Recall that we are testing to see if the mean weight of the M&M's bags is anything other than the hypothesized 47.9 grams, so this is a two-tailed test.

 HINT

You will need to consider if the problem is looking at a one-tailed or two-tailed test. If we need a value
that is either less than or greater than the hypothesized mean, we will use a one-tailed test. If we are
looking for any value that is different than the hypothesized mean, then we will use a two-tailed test.
Now, we can, in fact, use technology to nail down the p-value more exactly. We don't have to use this table,
although we can still use the table to answer the question about the null hypothesis.

Step 4: Compare our p-value to our significance level.


We don't know exactly what our p-value is, but we know that it's within the range of 0.30 to 0.40. Since both of those numbers are greater than the significance level of 0.05, we fail to reject the null hypothesis. There's our decision based on how they compare.

Finally, the conclusion is that there's not sufficient evidence to conclude that M&M's bags are being filled to a
mean other than 47.9 grams.
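This whole test can be reproduced in a few lines of Python (a sketch assuming the SciPy library is available; the slight difference from the hand calculation comes from using the unrounded mean and standard deviation):

```python
from statistics import mean, stdev

from scipy import stats  # assumes SciPy is installed

weights = [48.2, 48.4, 47.0, 47.3, 47.9, 48.5, 49.0,
           48.3, 48.0, 47.9, 48.7, 48.8, 47.4, 47.6]
mu0 = 47.9                # hypothesized mean from the packaging
n = len(weights)
x_bar = mean(weights)     # sample mean, about 48.07
s = stdev(weights)        # sample standard deviation, about 0.60

t_stat = (x_bar - mu0) / (s / n ** 0.5)
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)  # two-tailed p-value

# p is well above alpha = 0.05, so we fail to reject the null hypothesis
print(round(t_stat, 2), round(p_value, 2))
```

The p-value lands in the 0.30-to-0.40 range we estimated from the table, confirming the decision to fail to reject.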

 TERM TO KNOW

T-test For Population Means


The type of hypothesis test used to test an assumed population mean when the population standard
deviation is unknown. Due to the increased variability in using the sample standard deviation instead
of the population standard deviation, the t-distribution is used in place of the z-distribution.

 SUMMARY

In cases where the population standard deviation is not known--which is almost always--we should
use the t-distribution to account for the additional variability introduced by using the sample standard
deviation in the test statistic. A t-test means that the value will be a "t" statistic instead of a "z" statistic.
The steps in the hypothesis test are the same as they are in a z-test: first stating the null and alternative hypotheses, stating and verifying the conditions of the test, calculating the test statistic and the p-value, and then finally, comparing the p-value to alpha and making a decision about the null hypothesis.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

T-Distribution/Student's T-Distribution
A family of distributions that are centered at zero and symmetric like the standard normal distribution,
but heavier in the tails. Depending on the sample size, it does not diminish towards the tails as fast. If
the sample size is large, the t-distribution approximates the normal distribution.

T-test For Population Means


The type of hypothesis test used to test an assumed population mean when the population standard
deviation is unknown. Due to the increased variability in using the sample standard deviation instead of
the population standard deviation, the t-distribution is used in place of the z-distribution.

 FORMULAS TO KNOW

T-Statistic For Population Means

t = (x̄ − μ) / (s / √n)
How to Find a Critical T Value
by Sophia

 WHAT'S COVERED

This tutorial will explain how to find a critical t-value by using either a t-table or technology. Our
discussion breaks down as follows:

1. Left-Tailed Test
a. Graphing Calculator
b. T-Table
c. Excel
2. Right-Tailed Test
a. Graphing Calculator
b. T-Table
c. Excel
3. Two-Sided Test
a. Graphing Calculator
b. T-Table
c. Excel

1. Left-Tailed Test
Remember, a critical value corresponds to the number of standard deviations away from the mean that we're willing to attribute to chance. It marks how far from the center of our distribution a t-test statistic can fall before we decide to reject, rather than fail to reject, the null hypothesis.

For a left-tailed test, let's find the critical t* for a hypothesis test, with eight degrees of freedom, that would
reject the H0 at a 2.5% significance level.

1a. Graphing Calculator


One way to do this is with a graphing calculator. However, depending on the type of calculator you have, you
may or may not have this function. If you have the TI-84, the TI-89, the Nspire, or the CAS, you should have
the function for doing the inverse T. If you do not have one of these calculators, you can also use Excel, which
will be explained later.
First, click "2nd", "DIST", and we want the inverse T, or "InvT". Because we are looking at a left-tailed test,
we're looking at the bottom 2.5% of the distribution. We will want to enter 0.025, and then 8 for the degrees of
freedom. Remember, our distribution changes shape based on the sample size.

We get a corresponding critical t-value of negative 2.306. This falls about here on the distribution, where the
lower shaded region corresponds to the lower 2.5% of our distribution, or negative 2.306.

Any t-test statistic that we calculate for this corresponding hypothesis test that is less than negative 2.306
means we would reject the null hypothesis. Anything greater than that critical value is in this safe region that is
unshaded. We would just attribute it to chance and we would fail to reject the null hypothesis.

1b. T-Table
Using the t-table to find our critical t-value--remember, this is a lower-tail test--we're going to locate the closest
thing to 2.5%, or 0.025, for the one-tail probability. We also know that we have eight degrees of freedom.
t-Distribution Critical Values

Tail Probability, p

One-tail 0.25 0.20 0.15 0.10 0.05 0.025 0.02 0.01 0.005 0.0025 0.001 0.0005

Two-tail 0.50 0.40 0.30 0.20 0.10 0.05 0.04 0.02 0.01 0.005 0.002 0.001

df

1 1.000 1.376 1.963 3.078 6.314 12.71 15.89 31.82 63.66 127.3 318.3 636.6

2 0.816 1.080 1.386 1.886 2.920 4.303 4.849 6.965 9.925 14.09 22.33 31.60

3 0.765 0.978 1.250 1.638 2.353 3.182 3.482 4.541 5.841 7.453 10.21 12.92

4 0.741 0.941 1.190 1.533 2.132 2.776 2.999 3.747 4.604 5.598 7.173 8.610

5 0.727 0.920 1.156 1.476 2.015 2.571 2.757 3.365 4.032 4.773 5.893 6.869

6 0.718 0.906 1.134 1.440 1.943 2.447 2.612 3.143 3.707 4.317 5.208 5.959

7 0.711 0.896 1.119 1.415 1.895 2.365 2.517 2.998 3.499 4.029 4.785 5.408

8 0.706 0.889 1.108 1.397 1.860 2.306 2.449 2.896 3.355 3.833 4.501 5.041

9 0.703 0.883 1.100 1.383 1.833 2.262 2.398 2.821 3.250 3.690 4.297 4.781

10 0.700 0.879 1.093 1.372 1.812 2.228 2.359 2.764 3.169 3.581 4.144 4.587

A tail probability of 0.025 and eight degrees of freedom is going to correspond to a critical t-value of 2.306.

⭐ BIG IDEA

Now, one thing with the t-table--unlike the z-table where we have a positive table and a negative table--is
that it's all positive. However, we can use it for both positive and negative values.
Since this is a left-tailed test, we're below the mean of the distribution, so the critical value should actually be negative 2.306. Always be careful of that when using the t-table.

1c. Excel
To find a critical t-value in Excel, go under the Formulas tab. We're going to insert a function. Under the
Statistical column, we're going to look for T.INV, or inverse of T. We're going to put in the corresponding
significance level, 0.025, comma, degrees of freedom, which was 8. Hit Enter.

Notice how we get the same value we did on the calculator and in the table--a negative 2.306.
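If neither a supported calculator nor Excel is handy, the same inverse-t lookup can be done in Python (a sketch assuming SciPy is available):

```python
from scipy import stats  # assumes SciPy is installed

# Left-tailed test: the critical t* cuts off the bottom 2.5% with 8 degrees of freedom
t_star = stats.t.ppf(0.025, df=8)
print(round(t_star, 3))  # about -2.306, matching invT(0.025, 8)
```

Like invT and Excel's T.INV, `ppf` takes the area to the left, so the result comes out negative for a lower-tail cutoff.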

2. Right-Tailed Test
For this second example, we're going to look at a right-tailed test and find the critical t* for a hypothesis test with only four degrees of freedom that would reject the null hypothesis at a 5% significance level.

Because it's a right-tailed test, we need to find the cutoff for our t-scores. This will be on the upper part of this
distribution that corresponds to the top 5% of our distribution.

2a. Graphing Calculator


On the graphing calculator, again click "2nd", "DIST", and we want the inverse T, or "InvT". However, we
wouldn't enter 0.05 this time. Remember we always read a distribution from left to right, or from 0% to 100%.
For our right-tailed test, we're looking at the top 5%, which actually corresponds to the 100% - 5%, or the 95th
percentile. We'd actually put in 0.95, comma, our degrees of freedom, which was 4.

Our calculator gives us a corresponding critical t-value of 2.132. This falls right about here on the distribution and corresponds to the top 5%.

It makes sense that this is a positive value, because we're above the mean. So, any t-test statistic that is above
2.132 for this particular hypothesis test means we would reject the null. Anything below that value, we'd
attribute to chance and we'd fail to reject.

2b. T-Table
Now we're going to use our t-table to find our critical t-value with a significance level of 5% for an upper-tail test. Again, looking at our top row, which has the one-tail probabilities, we have 5%, or 0.05. We are also looking at four degrees of freedom.
t-Distribution Critical Values

Tail Probability, p

One-tail 0.25 0.20 0.15 0.10 0.05 0.025 0.02 0.01 0.005 0.0025 0.001 0.0005

Two-tail 0.50 0.40 0.30 0.20 0.10 0.05 0.04 0.02 0.01 0.005 0.002 0.001

df

1 1.000 1.376 1.963 3.078 6.314 12.71 15.89 31.82 63.66 127.3 318.3 636.6

2 0.816 1.080 1.386 1.886 2.920 4.303 4.849 6.965 9.925 14.09 22.33 31.60

3 0.765 0.978 1.250 1.638 2.353 3.182 3.482 4.541 5.841 7.453 10.21 12.92

4 0.741 0.941 1.190 1.533 2.132 2.776 2.999 3.747 4.604 5.598 7.173 8.610

5 0.727 0.920 1.156 1.476 2.015 2.571 2.757 3.365 4.032 4.773 5.893 6.869

6 0.718 0.906 1.134 1.440 1.943 2.447 2.612 3.143 3.707 4.317 5.208 5.959

7 0.711 0.896 1.119 1.415 1.895 2.365 2.517 2.998 3.499 4.029 4.785 5.408

That will correspond to a critical t* of 2.132. Because it is an upper-tail test, we're above the center of the distribution, so it should be positive. We leave the critical value as a positive 2.132.

2c. Excel
Now let's use Excel to find our critical t-value. Again, go under our Formulas tab and look under the Statistical
column for T.INV, or inverse T. In this case, remember that because it's an upper-tail test, we have to put in
95% or 0.95 since that corresponds to the upper 5% from a cutoff value. Next, enter a comma, and then our
degrees of freedom, which was 4.

We get the same critical t-value we did in our calculator and on our table--a positive 2.132.
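In Python, the same idea applies (a sketch assuming SciPy is available): because the distribution is read from left to right, the top 5% cutoff is the 95th percentile.

```python
from scipy import stats  # assumes SciPy is installed

# Right-tailed test: top 5% cutoff = 95th percentile, with 4 degrees of freedom
t_star = stats.t.ppf(0.95, df=4)
print(round(t_star, 3))  # about 2.132
```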

3. Two-Sided Test
For this last example, we're going to look at a two-sided test and find the critical t-value for a hypothesis test with 13 degrees of freedom that would reject the null hypothesis at a 1% significance level.

Because it's a two-sided test, we have to divide the significance level of 1%, or 0.01, onto both sides of our
distribution. Half of 0.01 is equal to 0.005. This means we're going to be finding that critical value, that cutoff,
for the lower 0.5%, or 0.005, of the distribution and the upper 0.5% of our distribution.

3a. Graphing Calculator


On the graphing calculator, for the lower 0.5%, we would do "invT(0.005,13)".

We would get a corresponding critical t-value of negative 3.012, which is about here on our distribution.

This would correspond to the lower 0.5% of our distribution, and it should be negative since it's below the
mean. We're going to do the same thing, but for the upper 0.5% of our distribution. Remember, in our
distribution, we always read left to right from 0% to 100%. So, upper 0.5% actually corresponds to 99.5%, or
0.995, of our distribution. On the graphing calculator, we would enter "invT(0.995,13)".

We get a positive 3.012. Our distribution now looks like this:

For this particular hypothesis test, if we got a t-test statistic that was above a positive 3.012 or below a
negative 3.012, we would reject the null. Anything in between, we'd attribute to chance and we would fail to
reject the null.

3b. T-Table
Now we're going to use our t-table to find our critical t-value. We had 13 degrees of freedom and our
significance level was 1%. Keep in mind that this is the tail probability for two tails, meaning each tail will have
0.5%.
If you take a look at our table, we actually have both one-tailed and two-tailed probabilities listed. You can use either row and get the same critical value.

Looking at the two-tailed, we need to find 1%, or 0.01.


Looking at the one-tailed, we need to find 0.5%, or 0.005.

Go down to the row that shows 13 degrees of freedom and find the corresponding critical value.
t-Distribution Critical Values

Tail Probability, p

One-tail 0.25 0.20 0.15 0.10 0.05 0.025 0.02 0.01 0.005 0.0025 0.001 0.0005

Two-tail 0.50 0.40 0.30 0.20 0.10 0.05 0.04 0.02 0.01 0.005 0.002 0.001

df

1 1.000 1.376 1.963 3.078 6.314 12.71 15.89 31.82 63.66 127.3 318.3 636.6

2 0.816 1.080 1.386 1.886 2.920 4.303 4.849 6.965 9.925 14.09 22.33 31.60

3 0.765 0.978 1.250 1.638 2.353 3.182 3.482 4.541 5.841 7.453 10.21 12.92

4 0.741 0.941 1.190 1.533 2.132 2.776 2.999 3.747 4.604 5.598 7.173 8.610

5 0.727 0.920 1.156 1.476 2.015 2.571 2.757 3.365 4.032 4.773 5.893 6.869

6 0.718 0.906 1.134 1.440 1.943 2.447 2.612 3.143 3.707 4.317 5.208 5.959

7 0.711 0.896 1.119 1.415 1.895 2.365 2.517 2.998 3.499 4.029 4.785 5.408

8 0.706 0.889 1.108 1.397 1.860 2.306 2.449 2.896 3.355 3.833 4.501 5.041

9 0.703 0.883 1.100 1.383 1.833 2.262 2.398 2.821 3.250 3.690 4.297 4.781

10 0.700 0.879 1.093 1.372 1.812 2.228 2.359 2.764 3.169 3.581 4.144 4.587

11 0.697 0.876 1.088 1.363 1.796 2.201 2.328 2.718 3.106 3.497 4.025 4.437

12 0.695 0.873 1.083 1.356 1.782 2.179 2.303 2.681 3.055 3.428 3.930 4.318

13 0.694 0.870 1.079 1.350 1.771 2.160 2.282 2.650 3.012 3.372 3.852 4.221

14 0.692 0.868 1.076 1.345 1.761 2.145 2.264 2.624 2.977 3.326 3.787 4.140

15 0.691 0.866 1.074 1.341 1.753 2.131 2.249 2.602 2.947 3.286 3.733 4.073

We get a corresponding critical t-value of 3.012. But because it's a two-sided test, it's both the positive and the
negative 3.012.

3c. Excel
Now we're going to use Excel to find our critical t-value for our two-sided test. This one's going to be a little different from the previous two, which were one-tailed tests. Again, go under the Statistical column. In this case, we're going to use T.INV.2T, the inverse of T for a two-tailed test.

In this case, we do not have to divide our significance level into the two halves; we can just put 0.01. Excel
knows--because we indicated it's a two-tailed test--to automatically divide that 1%. We were also at 13 degrees
of freedom. Therefore, it gives us the positive corresponding critical t-value of 3.012. However, we know that
it's not only a positive 3.012, but also a negative 3.012.
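A Python version of the two-sided lookup (a sketch assuming SciPy is available) splits the significance level by hand, since SciPy has no direct equivalent of Excel's two-tailed option:

```python
from scipy import stats  # assumes SciPy is installed

alpha = 0.01   # two-sided significance level
df = 13
t_star = stats.t.ppf(1 - alpha / 2, df)  # 99.5th percentile for the upper cutoff
print(round(t_star, 3))  # about 3.012; the critical values are -3.012 and +3.012
```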

 SUMMARY

We learned how to calculate the critical t-value using either a t-table or Excel, for a left-tailed test,
right-tailed test, and two-sided test. At the end of this lesson, we've attached a PDF where you can try
some examples for yourself.

Good luck!

Source: Adapted from Sophia tutorial by RACHEL ORR-DEPNER.

 TERMS TO KNOW

Critical Value
A value that can be compared to the test statistic to decide the outcome of a hypothesis test

How to Find a P-Value from a T-Test Statistic
by Sophia

 WHAT'S COVERED

In this tutorial, you will learn how to find a p-value from a t-test statistic using the calculator, the t-table, and Excel. Our discussion breaks down as follows:

1. Right-Tailed Test
a. Graphing Calculator
b. T-Table
c. Excel
2. Two-Tailed Test
a. Graphing Calculator
b. T-Table
c. Excel

1. Right-Tailed Test
The average ACT score in Illinois is 20.9. One high school in particular believes its students scored significantly better than the state average. Because the school believes that its students performed better, this is an upper-tail test.

In order to test this hypothesis, the school took a random sample of 15 students' scores and got an average of
22.5, with a standard deviation of 2.3.

Let's first ask ourselves which type of data we are dealing with. In this case, we're dealing with quantitative
data, and we do not know the population standard deviation. We just know the sample standard deviation, so
we're going to do a t-test.

The first step is to calculate the t-statistic:

t = (x̄ − μ) / (s / √n) = (22.5 − 20.9) / (2.3 / √15) ≈ 2.694

This gives us a t-score of 2.694. This score is plotted on the t-distribution below.

1a. Graphing Calculator


In order to convert that into a p-value, the first method is on a graphing calculator. On your calculator, go
ahead and hit 2nd DIST. We're interested in this sixth function, tcdf, which stands for "t cumulative density
function."

When we're entering in the values for a "tcdf," it is always going to be the lower boundary of the shaded
region, the upper boundary of the shaded region, then the degrees of freedom. Remember, the shape of this
distribution changes with the sample size. In this case, the lower boundary of our shaded region is 2.694. The
upper boundary is the top portion of our distribution, which is positive infinity. In order to indicate positive
infinity to our calculator, we just enter a positive 99. Our degrees of freedom will be 14 because the degrees of
freedom is sample size minus one, or 15 minus 1.

For this particular problem, we have a p-value of 0.0087, or 0.87%.

1b. T-Table
We can also use the t-table. A table can often only give us an estimated p-value rather than an exact one, like the calculator or Excel can, but sometimes it's all we have, and it's definitely sufficient. In this case, remember we had a t-test statistic of 2.694. The values inside the t-table are t-scores; the left-hand column lists the degrees of freedom, and the top rows contain the corresponding tail probabilities.

What we're going to do is look for the row corresponding to the degrees of freedom for our hypothesis test. In this case, we had 14 degrees of freedom. In that same row, we're going to look for the closest value possible to the t-score of 2.694.

t-Distribution Critical Values

Tail Probability, p

One-tail 0.25 0.20 0.15 0.10 0.05 0.025 0.02 0.01 0.005 0.0025 0.001 0.0005

Two-tail 0.50 0.40 0.30 0.20 0.10 0.05 0.04 0.02 0.01 0.005 0.002 0.001

df

1 1.000 1.376 1.963 3.078 6.314 12.71 15.89 31.82 63.66 127.3 318.3 636.6

2 0.816 1.080 1.386 1.886 2.920 4.303 4.849 6.965 9.925 14.09 22.33 31.60

3 0.765 0.978 1.250 1.638 2.353 3.182 3.482 4.541 5.841 7.453 10.21 12.92

4 0.741 0.941 1.190 1.533 2.132 2.776 2.999 3.747 4.604 5.598 7.173 8.610

5 0.727 0.920 1.156 1.476 2.015 2.571 2.757 3.365 4.032 4.773 5.893 6.869

6 0.718 0.906 1.134 1.440 1.943 2.447 2.612 3.143 3.707 4.317 5.208 5.959

7 0.711 0.896 1.119 1.415 1.895 2.365 2.517 2.998 3.499 4.029 4.785 5.408

8 0.706 0.889 1.108 1.397 1.860 2.306 2.449 2.896 3.355 3.833 4.501 5.041

9 0.703 0.883 1.100 1.383 1.833 2.262 2.398 2.821 3.250 3.690 4.297 4.781

10 0.700 0.879 1.093 1.372 1.812 2.228 2.359 2.764 3.169 3.581 4.144 4.587

11 0.697 0.876 1.088 1.363 1.796 2.201 2.328 2.718 3.106 3.497 4.025 4.437

12 0.695 0.873 1.083 1.356 1.782 2.179 2.303 2.681 3.055 3.428 3.930 4.318

13 0.694 0.870 1.079 1.350 1.771 2.160 2.282 2.650 3.012 3.372 3.852 4.221

14 0.692 0.868 1.076 1.345 1.761 2.145 2.264 2.624 2.977 3.326 3.787 4.140

15 0.691 0.866 1.074 1.341 1.753 2.131 2.249 2.602 2.947 3.286 3.733 4.073

Now, 2.694 falls somewhere in between these two values, 2.624 and 2.977. These t-scores correspond to the
two p-values, 0.01 and 0.005, or 1% and 0.5%. Since it falls somewhere in the middle, we are actually just
going to take the average of these two p-values.

To do this, take 1% plus 0.5%; divide that by 2, and we get an estimated p-value of 0.0075, or 0.75% when
using the t-table. It is not exactly the same as the value we found using the calculator; however, it is very close.

⭐ BIG IDEA

The t-table only shows positive values; however, you can still use this same method for a left-tailed test. You would just look up the absolute value of your t-statistic.

1c. Excel
To convert our t-test statistic into a p-value using Excel, go under the Formulas tab. We are going to insert a
formula that falls under the Statistical column. We are looking for t-distribution dot rt, or T.DIST.RT, because
we're performing a right-tail test.
Notice how there is no T.DIST.LT. If we're performing a left-tail test, we would just use the T.DIST. But since
we're performing an upper-tail test, we are going to use T.DIST.RT. The first thing we are going to put in is the
t-score, which was a positive 2.694, then 14 degrees of freedom. Hit Enter.

Notice how we get the same p-value of 0.0087, or 0.87%, as when we used our calculator.
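The same right-tailed p-value can be computed in Python (a sketch assuming SciPy is available), where the survival function `t.sf` plays the role of tcdf from the test statistic up to infinity:

```python
from scipy import stats  # assumes SciPy is installed

x_bar, mu0, s, n = 22.5, 20.9, 2.3, 15   # sample mean, state average, sample sd, sample size
t_stat = (x_bar - mu0) / (s / n ** 0.5)  # about 2.694
p_value = stats.t.sf(t_stat, df=n - 1)   # upper-tail area beyond the t-statistic
print(round(p_value, 4))  # about 0.0087
```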

2. Two-Tailed Test
The quality assurance team of a soda company wants to make sure that its machines are working properly in filling its 12-ounce cans of soda. Because it would be a problem if the machines were significantly over-filling or under-filling the cans of soda, the company needs to perform a two-sided test.

The company took a random sample of 10 cans of its soda off the production line and got an average of 11.8
ounces with a standard deviation of 1.2 ounces.

We're looking at quantitative data, which is ounces of soda, and we also do not know the population standard
deviation. Therefore, we're going to perform a t-test.

The first step is to calculate the t-statistic:

t = (x̄ − μ) / (s / √n) = (11.8 − 12) / (1.2 / √10) ≈ −0.527

The t-test statistic is negative 0.527. This score is plotted on the t-distribution below.

2a. Graphing Calculator


Using a graphing calculator to go from the t-score to a p-value, we're going to look again at "tcdf." So go to
2nd, DIST, and we are going to scroll down to "tcdf," which stands for "t cumulative density function." We're
going to put in the lower boundary of the shaded region, the upper boundary of the shaded region, then the
degrees of freedom.

The lower boundary of the shaded region for this problem is negative infinity. In order for the calculator to
recognize negative infinity, we put in a negative 99. The upper boundary of the shaded region is the t-score,
which is a negative 0.527. We have nine degrees of freedom, because our sample size was 10. We get a
corresponding p-value of 0.3055, or 30.55%.

However, this is not our final answer. This was a two-sided test, and this is only the p-value associated with the left tail, or the left-shaded region. We also have to include the corresponding upper tail, which falls at positive 0.527.

Luckily, because these two areas are equal, all we have to do is take this p-value that we got on the
calculator, and multiply it by two. The p-value for this problem is 0.3055 times 2, or 61.1%.

2b. T-Table
Now, let's find the p-value using the t-table. To get the p-value from the t-test statistic of negative 0.527, remember, we're going to look at the row for the corresponding degrees of freedom, which in this case was nine, and find the closest t-score we can to the one we calculated. Our 0.527 falls below the first value in the row, 0.703.

t-Distribution Critical Values

Tail Probability, p

One-tail 0.25 0.20 0.15 0.10 0.05 0.025 0.02 0.01 0.005 0.0025 0.001 0.0005

Two-tail 0.50 0.40 0.30 0.20 0.10 0.05 0.04 0.02 0.01 0.005 0.002 0.001

df

1 1.000 1.376 1.963 3.078 6.314 12.71 15.89 31.82 63.66 127.3 318.3 636.6

2 0.816 1.080 1.386 1.886 2.920 4.303 4.849 6.965 9.925 14.09 22.33 31.60

3 0.765 0.978 1.250 1.638 2.353 3.182 3.482 4.541 5.841 7.453 10.21 12.92

4 0.741 0.941 1.190 1.533 2.132 2.776 2.999 3.747 4.604 5.598 7.173 8.610

5 0.727 0.920 1.156 1.476 2.015 2.571 2.757 3.365 4.032 4.773 5.893 6.869

6 0.718 0.906 1.134 1.440 1.943 2.447 2.612 3.143 3.707 4.317 5.208 5.959

7 0.711 0.896 1.119 1.415 1.895 2.365 2.517 2.998 3.499 4.029 4.785 5.408

8 0.706 0.889 1.108 1.397 1.860 2.306 2.449 2.896 3.355 3.833 4.501 5.041

9 0.703 0.883 1.100 1.383 1.833 2.262 2.398 2.821 3.250 3.690 4.297 4.781

10 0.700 0.879 1.093 1.372 1.812 2.228 2.359 2.764 3.169 3.581 4.144 4.587

There's nothing to average here: we can't estimate between two values, because our t-score falls below the very first value in the row. Because we're doing a two-sided test, we can look at the two-tail row and conclude that the p-value is greater than 0.50. We could have also looked at the one-tail row and doubled the 0.25 to reach the same conclusion, which agrees with the exact p-value of 61.1% from the calculator.

2c. Excel
To convert our t-score into a p-value using Excel, we're going to go under the Formulas tab and insert a formula again under the Statistical column. Because this is a two-sided test, we want t-distribution dot 2T, or T.DIST.2T, for two tails. Even though the t-score that we calculated was negative 0.527, in Excel you always put in the positive value. We're going to go ahead and put in the positive 0.527 with our degrees of freedom of 9.

Notice how we get the same p-value of 61.1% that we did when using our calculator.
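In Python (a sketch assuming SciPy is available), the doubling step is done explicitly: compute one tail and multiply by two.

```python
from scipy import stats  # assumes SciPy is installed

x_bar, mu0, s, n = 11.8, 12.0, 1.2, 10   # sample mean, target fill, sample sd, sample size
t_stat = (x_bar - mu0) / (s / n ** 0.5)  # about -0.527
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)  # double one tail for a two-sided test
print(round(p_value, 3))  # about 0.611
```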

 SUMMARY

Today we learned how to find a p-value from a t-test statistic using the calculator, the t-table, and Excel. We did this for two types of tests: a right-tailed test and a two-tailed test.

Good luck!

Source: Adapted from Sophia tutorial by RACHEL ORR-DEPNER.

 TERMS TO KNOW

P-value
The probability that the test statistic is that value or more extreme in the direction of the alternative
hypothesis

Test Statistic
A measurement, in standardized units, of how far a sample statistic is from the assumed parameter if the
null hypothesis is true

Confidence Intervals Using the T-Distribution
by Sophia

 WHAT'S COVERED

This tutorial will discuss confidence intervals, specifically those that use the t-distribution. Our
discussion breaks down as follows:

1. Confidence Interval Using the T-Distribution
2. Constructing a Confidence Interval Using the T-Distribution

1. Confidence Interval Using the T-Distribution


A confidence interval using the t-distribution is very similar to a hypothesis test. In fact, it is often preferred to a hypothesis test because it provides an estimate of the population mean, and the conclusion it supports is equivalent to that of a two-tailed test.

When calculating the confidence interval for a sampling distribution, you would normally take the sample mean plus or minus some number of standard deviations times the standard error, or x̄ ± z* (σ/√n).

However, the only problem is this formula has a sigma in it, which is the population standard deviation. For
situations where we don't know what the population standard deviation is, you have to replace this formula
with one that uses "s", or the sample standard deviation.

Since you're using s as a stand-in for sigma, you need to use the t-distribution instead, which gives the
following formula:

 FORMULA

Confidence Interval of Population Mean

x̄ ± t* · (s / √n)

 TERMS TO KNOW

Confidence Interval
An interval we are some percent certain (e.g., 90%, 95%, or 99%) will contain the population parameter,
given the value of our sample statistic.

T-Distribution
A distribution similar to the normal distribution but with fatter tails. Depending on the sample size, it
does not diminish toward the tails as fast.

2. Constructing a Confidence Interval Using the T-Distribution
To construct a confidence interval for population means using the t-distribution, the following steps must be
followed:

 STEP BY STEP

Step 1: Verify the conditions necessary for inference.


Step 2: Calculate the confidence interval.
Step 3: Interpret the confidence interval.

 EXAMPLE Many times, consumers pay attention to the nutritional content on packaged food, so
it's important that the label accurately reflects what the food product actually contains. Suppose, for
example, that the stated calorie content for a particular frozen dinner was 240.

A random sample of 12 frozen dinners was selected, and the calorie contents of each one was
determined.

255 244 239 242 265 245

259 248 225 226 251 233

One of the boxes actually contained 255 calories' worth of food, whereas another contained only 225
calories' worth of food. We can quickly calculate the mean and standard deviation by using Excel.

First, enter all 12 values. Go to the Formulas tab, and we will use the formula AVERAGE under the Statistical
option to find the sample mean and highlight all the values.

The sample mean is 244.33 calories. To find the sample standard deviation, again go under the Formulas tab,
and select STDEV.S under the Statistical option. Highlight all 12 values.

The sample standard deviation is 12.38.

Suppose you want to construct a 90% confidence interval for the true mean number of calories. This means
that you want to construct a confidence interval such that you’re 90% confident that the true mean of all the
packaged frozen dinners lies within the interval.

Step 1: Verify the conditions for inference.

Stating the conditions isn't enough, and it's not just a formality; you must verify them. Recall the conditions
needed:

Condition Description

Randomness How was the sample obtained?

Independence Population ≥ 10n

Normality n ≥ 30 or normal parent distribution

Let's go back to our example to check the requirements of randomness, independence, and normality.
Randomness: It was a random sample, as was said in the problem.
Independence: Is the population of all frozen dinners at least 10 times the size of your sample? That's
reasonable to believe. Assume there are at least 120 frozen dinners in all of this company's frozen dinner
line.
Normality: This one's a little tricky. Your sample size isn't 30 or larger, so the Central Limit Theorem
doesn't apply to this problem. Is the parent distribution normal? You don't know that either. You need to
determine if this is plausible. You can do that by graphing the actual data that you have.

You can see that the parent distribution might be normal since the data that you got from the population
are single peaked and approximately symmetric. It's possible that the population parent distribution is
normal. You can proceed under the assumption of normality. You can’t verify it 100%, but assume it is for
the purposes of this problem.
Step 2: Calculate the confidence interval.

Reviewing the formula, we need the sample mean, the sample standard deviation, the sample size, and the t-
critical value. We have already figured out the information about the sample with the help from Excel:

We know that 244.33 is the sample mean and the standard deviation is 12.38 from the data of the 12
dinners. This information also tells us that the sample size is 12. What we still need to do is figure out what that
t* value is going to be.

To find this value, we need a t-distribution table. We need a t* that will give us 90% of the t-distribution. A 90%
confidence interval means that 10% remains outside the interval, split between the two tails, or 5% in each tail.

Looking at the table, we can match this information with the values 0.05 in the row of one-tailed or 0.10 in the
row of two-tailed. We can also look all the way down at the bottom and see that there is a row that says
"Confidence Interval." There is 50%, 60%, 70%, 80%, 90%, etc. Either one of those justifications is reason
enough to use this column.

We also need to know the degrees of freedom to determine which number from this column we're going to
use. In this problem, we have 11 degrees of freedom because we had 12 dinners in our sample, and the
degrees of freedom is n minus 1.

So we need to look for the corresponding value inside that table that matches with 11 degrees of freedom and
90% confidence interval.

t-Distribution Critical Values

Tail Probability, p

One-tail 0.25 0.20 0.15 0.10 0.05 0.025 0.02 0.01 0.005 0.0025 0.001 0.0005

Two-tail 0.50 0.40 0.30 0.20 0.10 0.05 0.04 0.02 0.01 0.005 0.002 0.001

df

1 1.000 1.376 1.963 3.078 6.314 12.71 15.89 31.82 63.66 127.3 318.3 636.6

2 0.816 1.080 1.386 1.886 2.920 4.303 4.849 6.965 9.925 14.09 22.33 31.60

3 0.765 0.978 1.250 1.638 2.353 3.182 3.482 4.541 5.841 7.453 10.21 12.92

4 0.741 0.941 1.190 1.533 2.132 2.776 2.999 3.747 4.604 5.598 7.173 8.610

5 0.727 0.920 1.156 1.476 2.015 2.571 2.757 3.365 4.032 4.773 5.893 6.869

6 0.718 0.906 1.134 1.440 1.943 2.447 2.612 3.143 3.707 4.317 5.208 5.959

7 0.711 0.896 1.119 1.415 1.895 2.365 2.517 2.998 3.499 4.029 4.785 5.408

8 0.706 0.889 1.108 1.397 1.860 2.306 2.449 2.896 3.355 3.833 4.501 5.041

9 0.703 0.883 1.100 1.383 1.833 2.262 2.398 2.821 3.250 3.690 4.297 4.781

10 0.700 0.879 1.093 1.372 1.812 2.228 2.359 2.764 3.169 3.581 4.144 4.587

11 0.697 0.876 1.088 1.363 1.796 2.201 2.328 2.718 3.106 3.497 4.025 4.437

12 0.695 0.873 1.083 1.356 1.782 2.179 2.303 2.681 3.055 3.428 3.930 4.318

13 0.694 0.870 1.079 1.350 1.771 2.160 2.282 2.650 3.012 3.372 3.852 4.221

14 0.692 0.868 1.076 1.345 1.761 2.145 2.264 2.624 2.977 3.326 3.787 4.140

15 0.691 0.866 1.074 1.341 1.753 2.131 2.249 2.602 2.947 3.286 3.733 4.073

16 0.690 0.865 1.071 1.337 1.746 2.120 2.235 2.583 2.921 3.252 3.686 4.015

17 0.689 0.863 1.069 1.333 1.740 2.110 2.224 2.567 2.898 3.222 3.646 3.965

18 0.688 0.862 1.067 1.330 1.734 2.101 2.214 2.552 2.878 3.197 3.610 3.922

19 0.688 0.861 1.066 1.328 1.729 2.093 2.205 2.539 2.861 3.174 3.579 3.883

20 0.687 0.860 1.064 1.325 1.725 2.086 2.197 2.528 2.845 3.153 3.552 3.850

21 0.686 0.859 1.063 1.323 1.721 2.080 2.189 2.518 2.831 3.135 3.527 3.819

22 0.686 0.858 1.061 1.321 1.717 2.074 2.183 2.508 2.819 3.119 3.505 3.792

23 0.685 0.858 1.060 1.319 1.714 2.069 2.177 2.500 2.807 3.104 3.485 3.767

24 0.685 0.857 1.059 1.318 1.711 2.064 2.172 2.492 2.797 3.091 3.467 3.745

25 0.684 0.856 1.058 1.316 1.708 2.060 2.167 2.485 2.787 3.078 3.450 3.725

26 0.684 0.856 1.058 1.315 1.706 2.056 2.162 2.479 2.779 3.067 3.435 3.707

27 0.684 0.855 1.057 1.314 1.703 2.052 2.158 2.473 2.771 3.057 3.421 3.690

28 0.683 0.855 1.056 1.313 1.701 2.048 2.154 2.467 2.763 3.047 3.408 3.674

29 0.683 0.854 1.055 1.311 1.699 2.045 2.150 2.462 2.756 3.038 3.396 3.659

30 0.683 0.854 1.055 1.310 1.697 2.042 2.147 2.457 2.750 3.030 3.385 3.646

40 0.681 0.851 1.050 1.303 1.684 2.021 2.123 2.423 2.704 2.971 3.307 3.551

50 0.679 0.849 1.047 1.299 1.676 2.009 2.109 2.403 2.678 2.937 3.261 3.496

60 0.679 0.848 1.045 1.296 1.671 2.000 2.099 2.390 2.660 2.915 3.232 3.460

80 0.678 0.846 1.043 1.292 1.664 1.990 2.088 2.374 2.639 2.887 3.195 3.416

100 0.677 0.845 1.042 1.290 1.660 1.984 2.081 2.364 2.626 2.871 3.174 3.390

1000 0.675 0.842 1.037 1.282 1.646 1.962 2.056 2.330 2.581 2.813 3.098 3.300

>1000 0.674 0.841 1.036 1.282 1.645 1.960 2.054 2.326 2.576 2.807 3.091 3.291

Confidence Interval between -t and t

50% 60% 70% 80% 90% 95% 96% 98% 99% 99.5% 99.8% 99.9%

Look in the 11 degrees of freedom row and the 90% confidence column until we obtain a t* of 1.796.

Now we have all the information needed in order to create our confidence interval. Construct it as x-bar plus
or minus the t critical value times the sample standard deviation divided by the square root of sample size.

When we do that, we obtain 244.33 plus or minus 6.42. When we evaluate the interval, it goes from 237.91
all the way up to 250.75.

Step 3: Interpret the confidence interval.

What does this confidence interval actually mean? How can you interpret the interval? The interpretation is
that we're 90% confident that the true mean calorie content of all frozen dinners is between about 238 and
251 calories. We're 90% confident that the real value is somewhere in that interval, and that the 240 value
claimed at the beginning of the problem is, in fact, plausible.
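The arithmetic in Steps 1 and 2 can be double-checked with a short Python sketch using only the standard library; the t* of 1.796 is read from the t-table rather than computed:

```python
import math
import statistics

calories = [255, 244, 239, 242, 265, 245, 259, 248, 225, 226, 251, 233]

x_bar = statistics.mean(calories)      # sample mean, about 244.33
s = statistics.stdev(calories)         # sample standard deviation, about 12.38
n = len(calories)                      # 12 dinners, so df = 11
t_star = 1.796                         # from the t-table: df = 11, 90% confidence

margin = t_star * s / math.sqrt(n)     # margin of error, about 6.42
lower, upper = x_bar - margin, x_bar + margin
```

This reproduces the sample mean of 244.33, the standard deviation of 12.38, and the margin of error of about 6.42 from the worked example.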

 SUMMARY

Today we learned about confidence intervals, specifically for means, using the t-distribution. We can
create point estimates for the population mean using x-bar and determine the margin of error. That
margin of error is the "t* times s over the square root of n" piece of the confidence interval. First, we
verify that the conditions are met. Then, we construct and interpret the confidence interval.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Confidence Interval
An interval we are some percent certain (e.g., 90%, 95%, or 99%) will contain the population parameter,
given the value of our sample statistic.

t-distribution
A family of distributions similar to the standard normal distribution, except that they are fatter in the tails,
due to the increased variability associated with using the sample standard deviation instead of the
population standard deviation in the formula for the test statistic.

 FORMULAS TO KNOW

Confidence Interval of Population Mean

x̄ ± t* · (s / √n)

Calculating Standard Error of a Sample Mean
by Sophia

 WHAT'S COVERED

This tutorial will explain how to calculate standard error for both sample means and sample
proportions. It will cover both when the population standard deviation is unknown and known. Our
discussion breaks down as follows:

1. Standard Error for Sample Means


2. Standard Error for Sample Proportions (Population Standard Deviation Is Unknown)
3. Standard Error for Sample Proportions (Population Standard Deviation Is Known)

1. Standard Error for Sample Means


The standard error for sample means is used when we have quantitative data and can be calculated using the
following formula:

 FORMULA

Standard Error of a Sample Mean

SE = s / √n

It is the sample standard deviation, s, over the square root of the sample size, n.

 EXAMPLE The amount of fallen snow, in inches, is recorded for one week in Minneapolis.

Day Sunday Monday Tuesday Wednesday Thursday Friday Saturday

Snow 1.5 3 4.75 8 0.3 2 2.95


What is the standard error of the sample mean?
The very first question we always want to ask ourselves is, "What type of data are we dealing with? Is this
quantitative or qualitative data (which is also known as categorical data)?"
Inches of snow is a quantitative variable, so we're looking for the standard error of a sample mean.
Recall that "s" stands for the standard deviation of the sample, and "n" is the sample size. In this case, we
have seven pieces of data for the week, so our sample size is seven.
One way to find the standard deviation is on a graphing calculator. We'll want to enter the data into a list.
To get there, go ahead and hit the "Stat" button, and hit "Enter" for edit, and we're going to insert the data
into List One.

At first, we have 1.5 inches of snow, so type 1.5, then hit Enter. Next it snowed 3 inches, then it snowed a
lot, 4.75 inches up to 8 inches. Then it tapered off to 0.3 inches. On Friday, there were 2 inches of snow.
Finally, on Saturday, there were 2.95 inches of snow.
Once we've entered all of that data, we can exit out of this screen by hitting 2nd Mode. Now, to get s, the
standard deviation of this sample, we need to get the sample statistics from that list of data. To do so, hit
the Stat button, scroll over to Calc, and select the first function, 1-Var Stats (one-variable statistics).

Go ahead and hit Enter. We want it for List One, so we're going to hit 2nd 1, and we can see the L1 in the
upper left hand corner above the one button. Hit Enter, and we get all sorts of useful data for this set of
data.

We have the x-bar, which is the mean. We also have s of x and sigma of x, as well as the sample size,
which is seven.
We also get the five number summary, which is quite useful for other types of problems. However, for this
problem, we're interested in the standard deviation of the sample. Remember, this is a sample, not a
population, so we want to use the value for s of x, not sigma of x. Remember, population parameters are
denoted with Greek letters.
In this case, since it is just a sample, we're going to use the 2.526 (which is rounded). We'll divide 2.526 by
the square root of 7 to calculate the standard error.

The standard error equals 0.955.


Another method to calculate the standard error of the sample mean is in Excel. The first thing we need to
do is calculate s, which is the standard deviation of your sample. We'll go to the "Formulas" tab, select
the "Statistical" option, and look for the standard deviation of a sample, indicated with "STDEV.S".
The "STDEV.P" formula is the standard deviation of the population, but remember, we are just dealing with a
sample.
Next, insert the data, separating each data value with a comma, and hit Enter.

Notice how we get the same value that we got in our calculator under s of x. Now, we're going to finish
calculating the standard error, which was s divided by the square root of n. In a new cell, we'll type the
equal sign and then click on A1 with our mouse since that is where the s-value is. This automatically inserts
that value. Next, divide by the square root of n. To get the square root, we have to insert a formula. The
square root is just under "Math and Trigonometry", and is indicated with SQRT. Enter the sample size of 7.

Notice how we get the same value as we did on our calculator for standard error, 0.955.
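The same computation can also be done with a few lines of Python. This is a sketch using only the standard library; statistics.stdev uses the n − 1 denominator, just like STDEV.S:

```python
import math
import statistics

snow = [1.5, 3, 4.75, 8, 0.3, 2, 2.95]   # inches of snow, one value per day

s = statistics.stdev(snow)               # sample standard deviation (n - 1 denominator)
se = s / math.sqrt(len(snow))            # standard error of the sample mean
```

This gives s ≈ 2.526 and a standard error of about 0.955, matching the calculator and Excel.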

2. Standard Error for Sample Proportions (Population Standard Deviation Is Unknown)
Sample proportions use data that is qualitative, or categorical. When calculating the standard error for sample
proportions, you need to consider whether or not you know the population standard deviation. If the
population standard deviation is NOT known, you will use the following formula:

 FORMULA

Standard Error for Sample Proportions (Population Standard Deviation Unknown)

SE = √(p-hat × q-hat / n)

The formula to calculate the standard error is p-hat times q-hat, divided by n, all under the square root. We're
actually going to use the data that was given to us, which are estimates -- that's what the hat indicates.

 EXAMPLE A survey is conducted at the local high schools to find out about underage drinking. Of
the 523 students who replied to the survey, 188 replied that they have consumed some amount of
alcohol.

What is the standard error of the sample proportion?


These students are either answering yes or no on the survey: "Yes, I've drunk some amount of alcohol" or
"No, I have not drunk any alcohol." That is qualitative data, also known as categorical data.
Therefore, we're dealing with a sample proportion.
Whenever we're dealing with a sample proportion, the next question we need to ask ourselves is, "Do I
know the population standard deviation?" In this case, we do not have any of that information. So we will
use the standard error formula with the sample data, p-hat and q-hat.
The first thing we need to do is to figure out what p-hat is, based off of the information given to us. In this
case, the p-hat is what we're interested in, and that is how many have answered yes to participating in
underage drinking. That would be 188 out of 523 students, or 188/523, which is about 36% of the
students.
Now, we also need the complement, which would be q-hat. This is also written as 1 minus p-hat. One
minus the 188 out of 523, or 1 - 0.36, tells us that 0.64, or 64%, of the students have not participated in
underage drinking. To always make sure our math is correct, remember that our p-hat and q-hat should
add up to 1, because they're complements of each other.

Now, we can plug in those values into the formula.

We have 0.36 for p-hat, 0.64 for q-hat, and the total sample, n, was 523 students. This calculates to a
standard error that is 0.021.

3. Standard Error for Sample Proportions (Population Standard Deviation Is Known)
When we have a sample proportion where we do know the population standard deviation, we can use the
following formula:

 FORMULA

Standard Error for Sample Proportions (Population Standard Deviation Is Known)

SE = √(p × q / n)

We do not need to use the sample data, p-hat, to estimate the standard error. We actually know
the population parameter, so we can use this information.

 EXAMPLE Revisiting our prior example, a survey is conducted at the local high schools to find out
about underage drinking. Of the 523 students who replied to the survey, 188 replied that they have
drunk some amount of alcohol. The proportion of underage drinkers nationally is 39%.

What is the standard error of the sample proportion?


We are still looking at the students who were surveyed about underage drinking, but notice how this
scenario added that the proportion of underage drinkers nationally is 39%. We're still calculating the
standard error of the sample proportion, but in this case, we know the population proportion, p = 0.39,
which determines the population standard deviation. We're going to use the formula of the square root of pq over n.
We're going to use 39%, or 0.39, for p. This is another way of indicating population proportion. We can
then use this to find q, which is the complement of p. The complement of 0.39 is calculated by 1 minus
0.39, which equals 0.61, or 61%. Sometimes we'll see this written as p0 and q0.
The sample size, n, is still 523 students who were surveyed.

The standard error is 0.021.
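The known-proportion version differs only in where p comes from; a quick sketch:

```python
import math

p = 0.39                      # national proportion of underage drinkers (given)
q = 1 - p                     # complement, 0.61
n = 523                       # students surveyed

se = math.sqrt(p * q / n)     # standard error, about 0.021
```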

 SUMMARY

Today we learned how to calculate standard error, and practiced identifying which formula to use,
based on the information given for these three formulas: Sample Means, Sample Proportion
(population standard deviation is unknown), and Sample Proportion (population standard deviation is
known).

Source: Adapted from Sophia tutorial by RACHEL ORR-DEPNER.

 TERMS TO KNOW

Standard Error
The standard deviation of the sampling distribution of sample means.

 FORMULAS TO KNOW

Standard Error

Sample Means: SE = s / √n

Sample Proportion (population standard deviation is unknown): SE = √(p-hat × q-hat / n)

Sample Proportion (population standard deviation is known): SE = √(p × q / n)
Analysis of Variance/ANOVA
by Sophia

 WHAT'S COVERED

This tutorial will cover tests for three or more population means and the process for analysis of
variance (ANOVA). Our discussion breaks down as follows:

1. ANOVA
a. Conditions
b. Null and Alternative Hypothesis
c. F-Statistic
d. Concluding the ANOVA Test

1. ANOVA
Comparing three or more means requires a new hypothesis test called analysis of variance (ANOVA). The AN
is for "analysis", the O is for "of", and the VA is for "variance". For ANOVA, we compare the means by
analyzing the sample variances from the independently selected samples.

 EXAMPLE Suppose a factory supervisor wants to know whether it takes his workers different
amounts of time to complete a task based on their proficiency level. The factory employs apprentices,
novices, and masters. The supervisor randomly selects ten workers from each group and has them
perform the task.

The summary of the data, which is the time in minutes to complete the task, is shown in this table here:

Proficiency n x̄ s

Apprentice 10 22.5 4.2

Novice 10 20.7 5.1

Master 10 19.0 4.6


Are these sample means significantly different from each other? In order to answer this question, you will
need to perform the analysis of variance (ANOVA) because we are comparing three population means.
 TERM TO KNOW

Analysis of Variance (ANOVA)


A hypothesis test that allows us to compare three or more population means.

1a. Conditions
There are a few conditions necessary for an ANOVA test:
1. Independent samples from the populations.
2. Each population has to be normally distributed.
3. The variances, and therefore the standard deviations of all those normal distributions, are the same.

For the above factory scenario, let's assume that the above three conditions are met.

1b. Null and Alternative Hypothesis


Once the three conditions are met, we can move on to identifying the null and alternative hypotheses and
choosing an alpha level.
For our factory scenario:

Null Hypothesis H0: μA = μN = μM ; The mean time required to complete the task is the same for the masters, the novices, and the apprentices.

Alternative Hypothesis Ha: At least one of the mean times is different from another.

Alpha Level α = 0.05

1c. F-Statistic
When you do an ANOVA test, the statistic that you use is not going to be a z or t, as you have been using in
the past. Instead, you will use what is called an "F". An F statistic is calculated by taking the quotient of the
variability between the samples and the variability within each sample.

 FORMULA

F-Statistic

F = (variability between the samples) / (variability within the samples)

The size of F can provide information about the null hypothesis:

Small F Statistic: Consistent with the null hypothesis; we would not reject H0.

Large F Statistic: Evidence against the null hypothesis, meaning there's more variability between the
samples than within the samples. This would be rare if the null hypothesis were true.

⭐ BIG IDEA

A small F is consistent with the null hypothesis, versus a large F statistic, which is evidence against the
null hypothesis. You wouldn't reject it if F was small.
Almost always, you will calculate the ANOVA F statistic and the p-value with technology; all but the
simplest, most straightforward problems are handled that way.

In our factory scenario, the F statistic, calculated with technology, is 1.418. That is not a very large value of F.
The corresponding p-value is 0.26, which is a very large p-value.
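Because every group has the same sample size, the F statistic can be reproduced from the summary table alone. This sketch assumes the equal-n shortcuts for the between- and within-groups mean squares (with unequal sample sizes, the weighted versions would be needed):

```python
# Summary statistics from the table: 10 workers per group.
n = 10
means = [22.5, 20.7, 19.0]    # apprentice, novice, master
sds = [4.2, 5.1, 4.6]
k = len(means)                # 3 groups

grand_mean = sum(means) / k
# Between-groups mean square: variability of the group means around the grand mean.
msb = n * sum((m - grand_mean) ** 2 for m in means) / (k - 1)
# Within-groups mean square: average of the sample variances (valid since every n is equal).
msw = sum(s ** 2 for s in sds) / k
F = msb / msw                 # about 1.418
```

This returns F ≈ 1.418, matching the value quoted above; the p-value would still come from software using an F distribution with (2, 27) degrees of freedom.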

 TERM TO KNOW

F Statistic
The test statistic in an ANOVA test. It is the ratio of the variability between the samples to the
variability within each sample. If the null hypothesis is true, the F statistic will probably be small.

1d. Concluding the ANOVA Test


Finally, we need to decide whether to reject or fail to reject the null hypothesis.

If the p-value is less than the significance level, you would reject the null hypothesis.
If the p-value is greater than the significance level, you would fail to reject the null hypothesis.

In the factory scenario, since the p-value of 0.26 is very large, greater than the 0.05 significance level, you fail
to reject the null hypothesis. There's no evidence that suggests that the time required to complete the task
differs significantly with proficiency level.

 SUMMARY

ANOVA, or analysis of variance, allows you to compare three or more means by comparing the
variability within each sample to the variability between the samples. The null hypothesis is that all the
means are the same, and the alternative hypothesis is that at least one of them is different. A small F
is consistent with the null hypothesis, versus a large F statistic, which is evidence against the null
hypothesis. The F and the p-value are almost always calculated with technology.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Analysis of Variance (ANOVA)


A hypothesis test that allows us to compare three or more population means.

F statistic
The test statistic in an ANOVA test. It is the ratio of the variability between the samples to the variability
within each sample. If the null hypothesis is true, the F statistic will probably be small.

One-Way ANOVA/Two-Way ANOVA
by Sophia

 WHAT'S COVERED

This tutorial will cover the difference between a one-way ANOVA test versus a test with a two-way
ANOVA. Our discussion breaks down as follows:
1. Types of ANOVA Tests
a. One-Way ANOVA
b. Two-Way ANOVA

1. Types of ANOVA Tests


Recall that with ANOVA tests, you want to compare three or more population means and see if there's a
significant difference between the means.

There are two types of ANOVA tests:

One-Way ANOVA: Consider the population means based on one characteristic


Two-Way ANOVA: Consider the population means based on multiple characteristics

1a. One-Way ANOVA


With one-way ANOVA, we are comparing three or more sample means for only one characteristic.

 EXAMPLE Suppose that you had a 10-point cleanliness scale that you were ranking detergents on.

Detergent Average Cleanliness (out of 10)

Tide 8.3

All 6.4

Era 5.5

Arm & Hammer 6.8

Based on this one factor, detergent, you are trying to see how clean the clothes get on average. You
would need more information, such as sample size and standard deviation, but this is a situation which
would lead you to an ANOVA test.
Because we're only looking at the one factor of detergent affecting cleanliness, this case would be
considered a one-way ANOVA.
 TERM TO KNOW

One-Way ANOVA

A hypothesis test that compares three or more population means with respect to a single
characteristic or factor.

1b. Two-Way ANOVA


With two-way ANOVA, we are comparing three or more sample means for multiple characteristics.

 EXAMPLE Consider the scenario from above that compared cleanliness of different types of
detergents. Now, suppose that you included another factor: water temperature.

Water Temperature

Detergent Hot Warm Cold

Tide 8.3 8.4 8.6

All 6.4 6.9 7.3

Era 5.5 5.9 6.5

Arm & Hammer 6.8 7.8 7.8

It's possible that some of these detergents do a better job of cleaning in different temperatures of water.
Now that you have all of this additional information, you're actually looking at 12 treatments: four
detergents times three water temperatures.
There are two factors contributing to the cleanliness score:
Type of Detergent
Water Temperature

Because there are two factors that are affecting the cleanliness score, we can still do an ANOVA test, but
this time, it's called a two-way ANOVA.
 TERM TO KNOW

Two-Way ANOVA
A hypothesis test that compares three or more population means with respect to multiple
characteristics or factors.

 SUMMARY

In one-way ANOVA, you can consider population means that are based on just one characteristic. In
two-way ANOVA, you consider the comparisons based on multiple characteristics or factors.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

One-Way ANOVA
A hypothesis test that compares three or more population means with respect to a single characteristic
or factor.

Two-Way ANOVA
A hypothesis test that compares three or more population means with respect to multiple characteristics
or factors.

Chi-Square Statistic
by Sophia

 WHAT'S COVERED

This tutorial will cover the chi-square statistic, discussing how to calculate the observed frequency
and expected frequency of a data set. Our discussion breaks down as follows:

1. The Chi-Square Statistic


2. Finding Observed and Expected Frequencies

1. The Chi-Square Statistic


A chi-square statistic is the test statistic used for tests on categorical data. It measures how observed
frequency differs from expected frequency.

The observed frequency is the number of observations we actually see for a value, or what actually
happened. The expected frequency is what we would expect to happen. It is the number of observations we
would see for a value if the null hypothesis was true.

 HINT

In this tutorial, you will not run any significance tests because the chi-square tests have many different
versions, and each of them will have their own tutorial. This tutorial is going to focus on how the statistic is
calculated, as it's calculated the same regardless of the test you're running.
To measure the discrepancy between what you observed and what you expected, we need to calculate the
chi-square statistic, which is calculated this way:

 STEP BY STEP

1. Take the observed values.


2. Subtract the expected values.
3. Square that difference.
4. Divide by the expected values.
5. Add up all of those fractions.

 FORMULA

Chi-Square Statistic

χ² = Σ (Observed − Expected)² / Expected

 EXAMPLE Suppose you have a tin of colored beads, and you claim that the tin contains the
colored beads in these proportions: 35% blue, 35% green, 15% yellow, and 15% red. These will be used
to find the expected frequencies.

You draw 10 beads from the tin: 4 red, 3 blue, 1 green, and 2 yellow. These will be your observed
frequencies.

Is what you drew consistent with the percentages you claimed or not? Why or why not?

If the claim were true, we would have expected that out of 10 beads, 3 1/2 of them would be blue, 3 1/2
green, 1 1/2 yellow, and 1 1/2 red. These are called the expected frequencies and can be calculated by
multiplying the sample size by the hypothesized proportion.

Color Percentage Expected out of 10

Blue 35% 3.5

Green 35% 3.5

Yellow 15% 1.5

Red 15% 1.5

You can't actually pull 3 1/2 blue beads, because you can't have half of a bead. Therefore, this is an
idealized scenario, representative of what you might expect in the long-term in samples of 10.
In your one sample of 10 beads, what you actually got was 3 blue, 1 green, 2 yellow, and 4 red. The two
yellow beads drawn seem fairly consistent with the 15% claim. However, the four red beads drawn
do not seem consistent with the 15% claim for red.
How can you measure that discrepancy? We can calculate the chi-square statistic using the above
formula. First, subtract each expected frequency from the observed frequency, square that value, and
divide by the expected frequency. Finally, add up all those calculations.

Color Expected Observed (Observed − Expected)² / Expected

Blue 3.5 3 0.0714

Green 3.5 1 1.7857

Yellow 1.5 2 0.1667

Red 1.5 4 4.1667

Sum 6.1905

The 3 1/2, 3 1/2, 1 1/2, and 1 1/2 were the expected frequencies and the observed frequencies were the 3, 1,
2, and 4. Using the formula, we get a chi-square statistic value of 6.1905.
So what do we do with this chi-squared statistic? We can find this value, along with the degrees of
freedom, in a chi-squared distribution table to determine if we reject or fail to reject the null hypothesis by
comparing it to the pre-determined significance level.
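If you'd like to verify this with technology, the same calculation can be sketched in a few lines of Python (the variable names are our own; the counts and claimed proportions come from the example above):

```python
# Observed counts from the sample of 10 beads, and the claimed proportions.
observed = {"blue": 3, "green": 1, "yellow": 2, "red": 4}
claimed = {"blue": 0.35, "green": 0.35, "yellow": 0.15, "red": 0.15}

n = sum(observed.values())  # sample size: 10

# Expected frequency = sample size x hypothesized proportion.
expected = {color: n * p for color, p in claimed.items()}

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)
print(round(chi_square, 4))  # 6.1905
```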
 HINT

You can use a table to calculate the chi-square statistic or you can use technology.
Now, it's worth noting that in this case, the conditions for inference with a chi-square test are not met. This
is only meant to illustrate how a chi-square statistic would be calculated, although you can't do any real
chi-square inference on this because the sample size isn't large enough.
 TERMS TO KNOW

Chi-Square Statistic
The sum of the ratios of the squared differences between the expected and observed counts to the
expected counts.

Observed Frequencies
The number of occurrences that were observed within each of the categories in a qualitative
distribution.

Expected Frequencies
The number of occurrences we would have expected within each of the categories in a qualitative
distribution if the null hypothesis were true.

2. Finding Observed and Expected Frequencies

IN CONTEXT
Suppose there are four flavors of candy in a bag: cherry, lemon, orange, and strawberry. The
company claims the flavors are equally distributed in each bag.

After opening a bag of candy and sorting the flavors, the following counts were produced:

Flavor Observed

Cherry 11

Lemon 15

Orange 12

Strawberry 12

Total 50

For an equal distribution, it is helpful to think of the proportions of each flavor and then make a
hypothesis based on those proportions. For the null hypothesis, we can assume that the proportions
for the four flavors are the same. The alternative hypothesis would state that this is not true; that the
proportions are not the same.

H0: pC = pL = pO = pS
Ha: The proportions of the flavors are not the same.

Next, we need to compare the observed frequency with the expected frequency. The observed
frequencies are the same as the above counts.

To find the expected frequency, we need to find the number of occurrences if the null hypothesis is
true, which in this case, was that the flavor proportions are equal, or if the four flavor categories
were all evenly distributed. Counting up all the flavors in that bag of candy gives us a total of 50
candies. If the flavor categories were evenly distributed among the 50 candies, we would need to
divide the total candies evenly between the four flavors, so 50 divided by 4, or 12.5 candies. This
means we would expect 12.5 candies in each flavor.

Flavor Observed Expected

Cherry 11 12.5

Lemon 15 12.5

Orange 12 12.5

Strawberry 12 12.5

We can then use the chi-square formula to calculate the chi-square statistic to compare the
discrepancy between the expected and observed frequencies.
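The tutorial stops short of the arithmetic here; continuing the example in Python (a sketch of our own), the statistic works out to 0.72:

```python
# Observed flavor counts from the opened bag of 50 candies.
observed = {"cherry": 11, "lemon": 15, "orange": 12, "strawberry": 12}

n = sum(observed.values())    # 50 candies in total
expected = n / len(observed)  # 12.5 per flavor if H0 (equal proportions) is true

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum((count - expected) ** 2 / expected for count in observed.values())
print(round(chi_square, 2))  # 0.72
```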

A middle school is gathering information on its after-school clubs because it was assumed that the
distribution of students in each grade was evenly distributed across the clubs, meaning there were
the same amount of 6th graders in each club, the same amount of 7th graders in each club, and the
same amount of 8th graders in each club.

This table lists the number of students from each grade participating in each club.

6th graders 7th graders 8th graders

Coding Club 12 14 8

Photography Club 7 11 15

Debate Club 9 5 13

Suppose we want to find the observed frequency for 7th graders participating in the photography
club. Using the chart, we can directly see the observed frequency for 7th graders participating in the
photography club is 11.

To find the expected frequency for 7th graders participating in the photography club, we need to
find the number of occurrences if the null hypothesis is true, which in this case, was that the three
options are equally likely, or if the students in each grade were all evenly distributed across the
clubs.

First, add up all the students in the 7th-grade column: 14 + 11 + 5 = 30.

If each of these three clubs were evenly distributed among the 30 7th graders, we would need to
divide the total evenly between the three options: 30 ÷ 3 = 10.

This means we would expect 10 7th graders to participate in the coding club, 10 7th graders to
participate in the photography club, and 10 7th graders to participate in the debate club.

In summary, the observed and expected frequencies for 7th graders participating in photography
club is:
Observed: 11
Expected: 10
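The same bookkeeping can be sketched in Python (our own code, using the club counts from the table above):

```python
# 7th-grade counts for each club, read off the table above.
seventh_grade = {"coding": 14, "photography": 11, "debate": 5}

total = sum(seventh_grade.values())    # 30 7th graders in all
expected = total / len(seventh_grade)  # 10.0 per club if evenly distributed

observed_photography = seventh_grade["photography"]
print(observed_photography, expected)  # 11 10.0
```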

 SUMMARY

The chi-square statistic is a measure of discrepancy across categories from what you would have
expected in categorical data. You can only use it for data that appear in categories or qualitative data.
The expected values may not be whole numbers since the expected values are long-term average
values.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Chi-Square Statistic
The sum of the ratios of the squared differences between the expected and observed counts to the
expected counts.

Expected Frequencies
The number of occurrences we would have expected within each of the categories in a qualitative
distribution if the null hypothesis were true.

Observed Frequencies
The number of occurrences that were observed within each of the categories in a qualitative
distribution.

 FORMULAS TO KNOW

Chi-Square Statistic

χ² = Σ (O − E)² / E

Chi-Square Test for Goodness-of-Fit
by Sophia

 WHAT'S COVERED

This tutorial will cover how to calculate a chi-square test statistic for a chi-square test of goodness-of-
fit. Our discussion breaks down as follows:

1. The Chi-Square Distribution


2. The Chi-Square Test for Goodness-of-Fit

1. The Chi-Square Distribution


Recall that the chi-square statistic measures how the observed frequencies differ from the expected
frequencies.

Below is a visual representation of a chi-square distribution.

The chi-square distribution is a right-skewed distribution. It measures the discrepancy between what a
sample of categorical data looks like and what you would expect the population to look like in those
categories.

A smaller chi-square value would indicate a small discrepancy.


A larger chi-square value would indicate a large discrepancy.

The p-value is the area in the chi-square distribution to the right of your particular chi-square statistic.
The values on the left (low values of chi-square) are likely to happen by chance,
and high values of chi-square are unlikely to happen by chance.

Just like the t distribution, the chi-square distribution is actually a family of curves. The shape changes a little
bit, based on the degrees of freedom, but it's always skewed to the right.

 HINT

The degrees of freedom for the chi-square distribution is the number of categories minus 1.
The conditions for using the chi-square distribution are:

The data represent a simple random sample from the population.

The observations should be sampled independently from the population, and the population should be at
least 10 times the sample size. This is called the "10% of the population" condition.

The expected counts all have to be at least 5. This ensures that the sample size is large, similar to the
normality conditions in other hypothesis tests.

2. The Chi-Square Test for Goodness-of-Fit


A chi-square test for goodness-of-fit is a method of testing the fit of three or more category proportions to a
specified distribution. The null hypothesis is that the population distribution matches a specified distribution,
while the alternative hypothesis is that it does not.

H0: The population distribution matches a specified distribution.

Ha: The population distribution does not match a specified distribution.
As with any hypothesis test, you will need to follow these steps:

 STEP BY STEP

Step 1: State the null and alternative hypotheses.


Step 2: Check the conditions.
Step 3: Calculate the test-statistic and p-value.
Step 4: Compare your test statistic to your chosen critical value, or your p-value to your chosen
significance level. Based on how they compare, state a decision about the null hypothesis and conclusion
in the context of the problem.

 EXAMPLE In the book Outliers, Malcolm Gladwell outlines a trend that he finds in professional
hockey, related to birth month. Suppose a random sample of 512 professional hockey players was
taken and their birth month was recorded.

Given the following information about birth month for the population, what would you expect for the
number of hockey players born in each month?

Month  % of Population  Expected # of Hockey Players  Observed # of Hockey Players

January 8% 51

February 7% 46

March 8% 61

April 8% 49

May 8% 46

June 8% 49

July 9% 36

August 9% 41

September 9% 36

October 9% 34

November 8% 33

December 9% 30

Total 512

Are the recorded values what you would have expected, given the general population? It certainly appears that
the earlier months of the year have larger numbers of NHL players born in them, which is not very consistent
with the nearly uniform distribution of the population. The observed distribution looks like this:

What you would have expected is that, of those 512 professional hockey players, 8% of them would have
been born in January, 7% of them would have been born in February, etc. We can find the expected value for
each month based on the given percentages of the population to get the following values:

Month  % of Population  Expected # of Hockey Players  Observed # of Hockey Players

January 8% 40.96 51

February 7% 35.84 46

March 8% 40.96 61

April 8% 40.96 49

May 8% 40.96 46

June 8% 40.96 49

July 9% 46.08 36

August 9% 46.08 41

September 9% 46.08 36

October 9% 46.08 34

November 8% 40.96 33

December 9% 46.08 30

Total 512 512
We would have expected 9% of the players to have been born in each of July, August, September, October,
and December. So we would have expected 46.08. However, apparently just 30 were born in December.
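Each expected count is just the sample size times the population percentage for that month, which can be sketched in Python (our own code, with the percentages from the table):

```python
# Population birth-month percentages from the table, and the sample size.
population_pct = {
    "Jan": 0.08, "Feb": 0.07, "Mar": 0.08, "Apr": 0.08,
    "May": 0.08, "Jun": 0.08, "Jul": 0.09, "Aug": 0.09,
    "Sep": 0.09, "Oct": 0.09, "Nov": 0.08, "Dec": 0.09,
}
n = 512

# Expected count = sample size x population proportion for that month.
expected = {month: n * pct for month, pct in population_pct.items()}
print(round(expected["Jan"], 2), round(expected["Dec"], 2))  # 40.96 46.08
```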

Let's perform a chi-square goodness-of-fit test for this set of data to determine the discrepancy:

Step 1: State the null and alternative hypotheses.


The null hypothesis, H0, is that the distribution of birth month for the population of all hockey players is
the same as the distribution for the entire population.
The alternative hypothesis, Ha, is that the distribution of birth months for hockey players differs from that
of the population.
Significance level, α, can be set at 0.05, meaning if you get a p-value below 0.05, you'll reject the null
hypothesis.

Step 2: Check the conditions.


Take a look at the conditions:

Condition  Description

Simple Random Sample: You can treat it as such. This was a sample of hockey players born between 1980
and 1990. There's no reason to imagine that it's going to be particularly different or unrepresentative.
Therefore, you can treat this as a random sample of players who have played or will play professional hockey.

Independence: You have to assume that there are at least 10 times as many players who have ever played pro
hockey as there were in our sample, so that we can assume independence. That would mean that you have to
assume that there are at least 5,120 players who have ever played pro hockey.

Expected Counts At Least 5: The smallest expected count occurred in February, with 35.84. So, yes, when you
look at the entire row of expected values, all of them are over 5.
Step 3: Calculate the test-statistic and p-value.
Now, let's calculate your chi-square statistic using this formula:

 FORMULA

Chi-Square Test

χ² = Σ (O − E)² / E

The chi-square statistic is going to be the observed minus the expected for each month, squared, divided by
the expected for each month, and then summed over all 12 months.

Month  Expected # of Hockey Players  Observed # of Hockey Players  (O − E)²/E

January 40.96 51 2.46

February 35.84 46 2.88

March 40.96 61 9.80

April 40.96 49 1.58

May 40.96 46 0.62

June 40.96 49 1.58

July 46.08 36 2.20

August 46.08 41 0.56

September 46.08 36 2.20

October 46.08 34 3.17

November 40.96 33 1.55

December 46.08 30 5.61

Sum 34.21
When you add all of those components together, you get the chi-square value of 34.21.

In this case, it's also a good idea to state the degrees of freedom, which is the number of categories
minus 1. There were 512 hockey players, but there were 12 categories (months). So the degrees of freedom is
12 minus 1, or 11.
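The whole table of components can be reproduced in Python (a sketch of our own; summing the unrounded components gives about 34.22 rather than 34.21, because the table rounds each component to two decimal places first):

```python
# Observed birth-month counts for the 512 sampled players, and population percentages.
observed = {
    "Jan": 51, "Feb": 46, "Mar": 61, "Apr": 49, "May": 46, "Jun": 49,
    "Jul": 36, "Aug": 41, "Sep": 36, "Oct": 34, "Nov": 33, "Dec": 30,
}
population_pct = {
    "Jan": 0.08, "Feb": 0.07, "Mar": 0.08, "Apr": 0.08, "May": 0.08, "Jun": 0.08,
    "Jul": 0.09, "Aug": 0.09, "Sep": 0.09, "Oct": 0.09, "Nov": 0.08, "Dec": 0.09,
}

n = sum(observed.values())  # 512
expected = {m: n * p for m, p in population_pct.items()}

# Chi-square statistic: sum over months of (observed - expected)^2 / expected.
chi_square = sum((observed[m] - expected[m]) ** 2 / expected[m] for m in observed)
df = len(observed) - 1  # 12 categories - 1 = 11

print(round(chi_square, 2), df)  # about 34.22, with 11 degrees of freedom
```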

The p-value can be obtained from technology or with a table. When using a table, you go down to the line that
corresponds with the degrees of freedom and look for the chi-square value.

χ2 Critical Values
Degrees Tail Probability Values, p
of freedom
(df) 0.250 0.200 0.150 0.100 0.050 0.025 0.020 0.010 0.005 0.0025 0.001 0.0005

1 1.32 1.64 2.07 2.71 3.84 5.02 5.41 6.63 7.88 9.14 10.83 12.12

2 2.77 3.22 3.79 4.61 5.99 7.38 7.82 9.21 10.60 11.98 13.82 15.20

3 4.11 4.64 5.32 6.25 7.81 9.35 9.84 11.34 12.84 14.32 16.27 17.73

4 5.39 5.99 6.74 7.78 9.49 11.14 11.67 13.23 14.86 16.42 18.47 20.00

5 6.63 7.29 8.12 9.24 11.07 12.83 13.33 15.09 16.75 18.39 20.51 22.11

6 7.84 8.56 9.45 10.64 12.53 14.45 15.03 16.81 18.55 20.25 22.46 24.10

7 9.04 9.80 10.75 12.02 14.07 16.01 16.62 18.48 20.28 22.04 24.32 26.02

8 10.22 11.03 12.03 13.36 15.51 17.53 18.17 20.09 21.95 23.77 26.12 27.87

9 11.39 12.24 13.29 14.68 16.92 19.02 19.63 21.67 23.59 25.46 27.88 29.67

10 12.55 13.44 14.53 15.99 18.31 20.48 21.16 23.21 25.19 27.11 29.59 31.42

11 13.70 14.63 15.77 17.29 19.68 21.92 22.62 24.72 26.76 28.73 31.26 33.14

12 14.85 15.81 16.99 18.55 21.03 23.34 24.05 26.22 28.30 30.32 32.91 34.82

13 15.93 16.99 18.90 19.81 22.36 24.74 25.47 27.69 29.82 31.88 34.53 36.48

14 17.12 18.15 19.40 21.06 23.68 26.12 26.87 29.14 31.32 33.43 36.12 38.11

Going down to the line for 11 degrees of freedom, our chi-square statistic of 34.21 is larger than 33.14, the
critical value for a tail probability of 0.0005. This means the p-value is less than 0.0005. That's a very low
p-value, much less than the significance level of 0.05.

Step 4: Compare your test statistic to your chosen critical value, or your p-value to your chosen
significance level. Based on how they compare, state a decision about the null hypothesis and conclusion
in the context of the problem.
Since your p-value of 0.0005 is low, you can't attribute the difference from the "norm" to chance alone.

This means that you must reject the null hypothesis in favor of the alternative and conclude that the
distribution of birth months for professional hockey players differs significantly from the birth month
distribution for the general populace.

IN CONTEXT
A manufacturing company claims that they expect five defects per day. This means that they believe
the defects are evenly distributed across Monday through Friday.

A manager collects data on the days of the week and records the following information:

Day of the Week  Expected # of Defects  Observed # of Defects

Monday 5 6

Tuesday 5 8

Wednesday 5 4

Thursday 5 2

Friday 5 5

Let's perform a chi-square test for goodness-of-fit to determine if the variation that we see in the
observations is from random chance, or if the distribution truly differs from an even one.

Step 1: State the null and alternative hypotheses.

We can state these hypotheses with a significance level of 5%:

H0: The defects are evenly distributed across the five weekdays.
Ha: The defects are not evenly distributed across the five weekdays.
α: 0.05

Step 2: Check conditions. Let's check the three conditions for this hypothesis test.
Simple Random Sample: We can assume that the manager collected data randomly
throughout the days of the week.
Independence: You have to assume that there have been at least 10 times as many defects
at this manufacturing company as there were in our sample, such that we can assume that
independence piece. That would mean that you have to assume that there have been at
least 250 defects in this company's history.
Expected Counts At least 5: When you look at the entire row of expected values, all of them

are 5 so this condition is satisfied.

Step 3: Calculate the test-statistic and p-value.

We can use the chi-square formula to calculate the chi-square test statistic: take the observed minus the
expected, square those values, divide by the expected, and finally sum everything that we find.

Day of the Week  Expected # of Defects  Observed # of Defects

Monday 5 6

Tuesday 5 8

Wednesday 5 4

Thursday 5 2

Friday 5 5

Sum 4

The chi-square test statistic for this data set is equal to 4. We can use a chi-square table or
technology to find the p-value that corresponds to this chi-square statistic of 4. We also need the
degrees of freedom, which is the number of categories minus 1, or 5 minus 1. So, in this case, the
chi-square statistic and the degrees of freedom are both 4. Applying this information and using
technology, we find a p-value of 0.40601.

Step 4: Compare your test statistic to your chosen critical value, or your p-value to your
chosen significance level. Based on how they compare, state a decision about the null
hypothesis and conclusion in the context of the problem.

Remember, our significance level was 0.05. In this case, our p-value is greater than our
significance level so we cannot reject our null hypothesis.
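Here too the arithmetic is quick to check in Python (our own sketch of the calculation above):

```python
# Observed defect counts by weekday; the claim is five expected defects per day.
observed = {"Mon": 6, "Tue": 8, "Wed": 4, "Thu": 2, "Fri": 5}
expected = 5

# Chi-square statistic: (1 + 9 + 1 + 9 + 0) / 5 = 4.0
chi_square = sum((count - expected) ** 2 / expected for count in observed.values())
df = len(observed) - 1  # 5 categories - 1 = 4

print(chi_square, df)  # 4.0 4
```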

 TERM TO KNOW

Chi-Square Test for Goodness-of-Fit


A hypothesis test where we test whether or not our sample distribution of frequencies across
categories fits with hypothesized probabilities for each category.

 SUMMARY

The chi-square statistic is a measure of discrepancy across categories from what we would have
expected in our categorical data. The expected values might not be whole numbers, since each
expected value is a long term average. The chi-square distribution is a skewed right distribution, and
chi-square statistics near zero are more common if the null hypothesis is true. The goodness-of-fit test
is used to see if the distribution across categories for data fit a hypothesized distribution across
categories.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Chi-Square Test for Goodness-of-Fit


A hypothesis test where we test whether or not our sample distribution of frequencies across categories
fits with hypothesized probabilities for each category.

 FORMULAS TO KNOW

Chi-Square Statistic

χ² = Σ (O − E)² / E

Chi-Square Test for Homogeneity
by Sophia

 WHAT'S COVERED

This tutorial is going to run through a chi-square test of homogeneity. Our discussion breaks down as
follows:

1. Chi-Square Test of Homogeneity

1. Chi-Square Test of Homogeneity


A chi-square test of homogeneity is a test that uses multiple populations and tests to see if these populations
are the same across categorical, or qualitative, variables. In other words, you are trying to determine if the
distributions of categorical data differ across different populations.

Instead of comparing the distributions to some hypothesized distribution, you compare whether or not two
sample distributions are significantly different from each other.

As with any chi-square test, you must follow these steps:

 STEP BY STEP

Step 1: State the null and alternative hypotheses.


Step 2: Check the conditions.
Step 3: Calculate the test-statistic and p-value.
Step 4: Compare your test statistic to your chosen critical value, or your p-value to your chosen
significance level. Based on how they compare, state a decision about the null hypothesis and conclusion
in the context of the problem.

 EXAMPLE Suppose that two colleges, The U and State, are worried about student drinking
behaviors, so they both independently choose random samples of their students. The results of the
drinking behaviors are given in the table here:
Drinking Level  The U  State  Total

None 140 186 326

Low 478 661 1,139

Moderate 300 173 473

High 63 16 79

Total  981  1036  2017

The question is, does there appear to be a difference in drinking behaviors between the two colleges?
Those who drink a lot represent the lowest category at both schools, and those who drink a little
represent the highest at both schools. Perhaps the schools are not that different. You can run a test, though,
to confirm or dispute whether that is the case.

Step 1: State the null and alternative hypotheses.


In the test for homogeneity, the null hypothesis is that they are the same distribution, or that the two sample
distributions are not significantly different; the distribution of drinking levels is the same at the U as it is for
State. The alternative hypothesis is that the two distributions are not the same.

H0: The distribution of drinking levels is the same for The U as it is for State.
Ha: The distribution of drinking levels is not the same for The U as it is for State.
α: 0.05

Choose a significance level of 0.05.

Step 2: Check the conditions.


One of the conditions is going to be that the expected values are all greater than five. But the question is, how
do you calculate expected values? You can't do the same thing you did in a goodness-of-fit test. Instead, you
have to think about it a different way. Of the 2,017 students, 326 of them don't drink at all, which is equal to
16.2%.

Drinking Level  The U  State  Total

None 140 186 326

Low 478 661 1,139

Moderate 300 173 473

High 63 16 79

Total  981  1036  2017

The idea here is that if the two distributions were homogeneous, then 16.2% at The U would not drink at all
and 16.2% at State would not drink at all.

(0.162)(981) = 158.56 The U students expected to not drink at all


(0.162)(1036) = 167.44 State students expected to not drink at all
So we would expect 158.56 students from The U and 167.44 students from State that participated in this
survey to be in the "None" row.

Take a look at how this was calculated:

When you calculated the expected value for "None" and The U, you divided 326 by 2017 to get the 16.2%, and
then multiplied by 981. In other words, we multiplied the total of "None" by the total of "The U", and divided all
of that by the grand total.

In general, what we can say is that the expected values for each cell are going to be the row total times the
column total over the grand total.

 FORMULA

Expected Value for Cell in Chi-Square Test for Homogeneity

Expected Value = (Row Total × Column Total) / Grand Total

From that, it's not too hard to create an entire table of expected values.

Observed Table Expected Table

Drinking Level  The U  State      Drinking Level  The U  State

None 140 186 326 None 158.56 167.44 326

Low 478 661 1139 Low 553.97 585.03 1139

Moderate 300 173 473 Moderate 230.05 242.95 473

High 63 16 79 High 38.42 40.58 79

981 1036 2017 981 1036 2017

The table on the left is what you observed; the table on the right is what you expected. Again, these values
don't have to be integers.

The conditions for this hypothesis test are met: you have two independent random samples and all cell
counts in the expected table are at least five, the smallest one being 38.42.
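Computing every cell of the expected table from the row total times column total over grand total rule can be sketched in Python (our own code, using the observed counts above):

```python
# Observed counts: rows are drinking levels, columns are schools.
observed = {
    "None":     {"The U": 140, "State": 186},
    "Low":      {"The U": 478, "State": 661},
    "Moderate": {"The U": 300, "State": 173},
    "High":     {"The U": 63,  "State": 16},
}
schools = ("The U", "State")

row_totals = {level: sum(cols.values()) for level, cols in observed.items()}
col_totals = {s: sum(cols[s] for cols in observed.values()) for s in schools}
grand_total = sum(row_totals.values())  # 2017

# Expected count for a cell = (row total x column total) / grand total.
expected = {
    level: {s: row_totals[level] * col_totals[s] / grand_total for s in schools}
    for level in observed
}
print(round(expected["None"]["The U"], 2))  # 158.56
```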

Step 3: Calculate the test-statistic and p-value.


At this point, you can calculate the chi-square statistic using the observed and expected. Recall that the
formula for a chi-square statistic the observed minus expected, squared, over expected. Add all of them up.

 FORMULA

Chi-Square Test

χ² = Σ (O − E)² / E

 HINT

You can also use technology to calculate the chi-square test statistic and the p-value.
The chi-square test statistic that you would obtain is approximately 96.5.

The degrees of freedom, in this case, can be found by multiplying the number of rows minus one by the
number of columns minus one. This is the general rule, and it can also be applied to the previous
chi-square tests.

 FORMULA

Chi-Square Test Degrees of Freedom

df = (number of rows − 1) × (number of columns − 1)

Let's take another look at our data:

Drinking Level The U State

None 140 186

Low 478 661

Moderate 300 173

High 63 16
In this case, there were four rows (none, low, moderate, and high) and two columns (The U and State), so the
degrees of freedom is (4 − 1)(2 − 1) = 3. The chi-square statistic and p-value can be obtained using
technology, and we get a corresponding p-value of less than 0.001. This is a very low value, less than 0.05.

Step 4: Compare your test statistic to your chosen critical value, or your p-value to your chosen
significance level. Based on how they compare, state a decision about the null hypothesis and conclusion
in the context of the problem.
Since the p-value is lower than the significance level, you reject the null hypothesis and conclude that there is
a difference in drinking behavior between the students at the U and the students at State.
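Putting the pieces together in Python (our own sketch), computing every expected cell on the fly and summing the components gives a statistic of roughly 96.5 (small differences can come from rounding in intermediate steps):

```python
# Observed counts for the two independent samples.
observed = {
    "None":     {"The U": 140, "State": 186},
    "Low":      {"The U": 478, "State": 661},
    "Moderate": {"The U": 300, "State": 173},
    "High":     {"The U": 63,  "State": 16},
}
schools = ("The U", "State")

row_totals = {level: sum(cols.values()) for level, cols in observed.items()}
col_totals = {s: sum(cols[s] for cols in observed.values()) for s in schools}
grand_total = sum(row_totals.values())

# Sum (observed - expected)^2 / expected over every cell of the table.
chi_square = 0.0
for level, cols in observed.items():
    for s in schools:
        expected = row_totals[level] * col_totals[s] / grand_total
        chi_square += (cols[s] - expected) ** 2 / expected

df = (len(observed) - 1) * (len(schools) - 1)  # (4 - 1)(2 - 1) = 3
print(round(chi_square, 1), df)  # roughly 96.5, with 3 degrees of freedom
```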

 TERM TO KNOW

Chi-Square Test of Homogeneity


A test used to determine if there is no difference in a categorical variable across several populations
or treatments.

 SUMMARY

The chi-square test of homogeneity allows you to test whether two populations have significantly
different distributions across the categories. The expected counts for each cell is the product of the
row total and the column total divided by the grand total. The conditions are the same as they are for
a goodness-of-fit test, in that all the expected values have to be greater than five.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Chi-Square Test of Homogeneity

A test used to determine if there is no difference in a categorical variable across several populations or
treatments.

 FORMULAS TO KNOW

Chi-Square Degrees of Freedom

df = (number of rows − 1) × (number of columns − 1)

Expected Value for Cell in Chi-Square Test for Homogeneity

Expected Value = (Row Total × Column Total) / Grand Total

Chi-Square Test for Association and
Independence
by Sophia

 WHAT'S COVERED

This tutorial will cover the chi-square test of independence. Our discussion breaks down as follows:

1. The Chi-Square Test for Association/Independence

1. The Chi-Square Test for


Association/Independence
The chi-square test for association is sometimes called a chi-square test of independence. This is a type of
hypothesis test used to determine whether there is an association between two categorical variables in a
single population.

As with any chi-square test, you must follow these steps:

 STEP BY STEP

Step 1: State the null and alternative hypotheses.


Step 2: Check the conditions.
Step 3: Calculate the test-statistic and p-value
Step 4: Compare your test statistic to your chosen critical value, or your p-value to your chosen
significance level. Based on how they compare, state a decision about the null hypothesis and conclusion
in the context of the problem.
Recall that the conditions for a chi-square test are:

The data represent a simple random sample from the population.

The observations should be sampled independently from the population, and the population should be at
least 10 times the sample size. This is called the "10% of the population" condition.

The expected counts all have to be at least 5. This ensures that the sample size is large, similar to the
normality conditions in other hypothesis tests.

 EXAMPLE Suppose 335 students of different backgrounds (rural, suburban, and urban schools)
were asked to pick one thing about school that was most important to them: getting good grades,
being popular, or being good at sports. Here is the distribution of responses:

School Locations

Goal Rural Suburban Urban

Grades 57 87 24

Popular 50 42 6

Sports 42 22 5

The question is, does there appear to be an association between the geographic location of the school and
the answer choice to the question (the goal)? This is an ideal time to run a chi-square test for association or
independence. This can tell you if the distribution of goals (grades, popular, and sports) differs significantly by
school location. Are they associated, or are they independent?

Step 1: State the null and alternative hypotheses.


In the null hypothesis, you're going to say that school location and goal are independent. That is, they do not
have an association with each other. The alternative hypothesis is that they do have an association with each
other. At least one of these distributions--grades, popularity, and sports--is different for suburban, urban, or
rural versus the others. Also, you can choose a significance level of 0.05.

H0: The school locations and goals are independent.


Ha: The school locations and goals are associated.
α: 0.05

Step 2: Check the conditions.


For the test of independence, the conditions and the way that chi-square and p-value are calculated are the
same as in a test of homogeneity. We first need to find the expected value for each cell to ensure that the
condition is met.

Remember, the expected value is equal to that particular cell's row total, times its column total, divided by the
grand total for all the cells.

 FORMULA

Expected Value for Cell in Chi-Square Test for Association/Independence

Expected Value = (Row Total × Column Total) / Grand Total

For example, if we wanted the expected value for "Grades" and "Rural", we would multiply the row total for
"Grades" with the column total for "Rural", and divide by the total values in the table.

For the row with "Grades", there was a total of 57 plus 87 plus 24, or 168 students. For "Rural", there was a
total of 149 students. We were told at the beginning there were a total of 335 students; however, we could
also add up all the values in the table to get this same value. The expected value is therefore
(168 × 149) / 335 ≈ 74.72.

We can continue using this formula for each cell and get the expected table of results:

Observed

School Locations

Goal Rural Suburban Urban

Grades 57 87 24

Popular 50 42 6

Sports 42 22 5

Expected

School Locations

Goal Rural Suburban Urban

Grades 74.72 75.73 17.55

Popular 43.59 44.17 10.24

Sports 30.69 31.10 7.21

What you are interested in is whether or not all the expected counts are at least 5. The smallest one is 7.21, so
the conditions are met.

Step 3: Calculate the test-statistic and p-value.


Using technology, we can find that the chi-square statistic is equal to 18.564, which is large.

To find the corresponding p-value, we first need to find the degrees of freedom. The degrees of freedom can
be found by multiplying the number of rows minus one times the value of the number of columns minus one.

 FORMULA

Chi-Square Test Degrees of Freedom

df = (number of rows − 1) × (number of columns − 1)

In this case, there were three rows (grades, popular, and sports) and three columns (rural, suburban, and
urban):

df = (3 − 1) × (3 − 1) = 2 × 2 = 4

So, the degrees of freedom is equal to four. Using technology and plugging in the chi-square statistic of
18.564 and 4 degrees of freedom, we get a very small p-value of 0.001.
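As a sketch of what the technology is doing, the statistic, degrees of freedom, and p-value can all be computed directly. For an even number of degrees of freedom, the chi-square right-tail probability has a simple closed form, used here so the example needs only the standard library; in practice, a function like scipy.stats.chi2_contingency would do all of this in one call.

```python
import math

observed = [
    [57, 87, 24],
    [50, 42, 6],
    [42, 22, 5],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Chi-square statistic: sum over all cells of (observed - expected)^2 / expected.
chi_square = sum(
    (observed[i][j] - r * c / grand_total) ** 2 / (r * c / grand_total)
    for i, r in enumerate(row_totals)
    for j, c in enumerate(col_totals)
)

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

def chi2_right_tail(x, df):
    """P(X > x) for a chi-square variable with an EVEN number of df:
    exp(-x/2) * sum of (x/2)^k / k! for k = 0 .. df/2 - 1."""
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

p_value = chi2_right_tail(chi_square, df)
print(round(chi_square, 3), df, round(p_value, 3))  # 18.564 4 0.001
```

This reproduces the tutorial's values: a chi-square statistic of about 18.564 with 4 degrees of freedom and a p-value of roughly 0.001.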

Step 4: Compare your test statistic to your chosen critical value, or your p-value to your chosen
significance level. Based on how they compare, state a decision about the null hypothesis and conclusion
in the context of the problem.
You need to link your p-value to a decision about the null hypothesis. Since the p-value is smaller than 0.05,
you reject the null hypothesis in favor of the alternative and conclude that there is an association between the
two categorical variables of school location and goal.

 TERM TO KNOW

Chi-Square Test for Association/Independence


A hypothesis test that tests whether two qualitative variables have an association or not.

 SUMMARY

The chi-square test of independence tests whether two qualitative variables have an association or
not, so it's sometimes called the chi-square test of association. The expected value for each cell is
equal to that particular cell's row total, times its column total, divided by the grand total for all the cells.

Good luck!

Source: Adapted from Sophia tutorial by Jonathan Osters.

 TERMS TO KNOW

Chi-Square Test of Independence/Association


A hypothesis test that tests whether two qualitative variables have an association or not.

 FORMULAS TO KNOW

Chi-square Degrees of Freedom

df = (number of rows − 1) × (number of columns − 1)
Terms to Know
Alternative Hypothesis
A claim that a population parameter differs from the value claimed in the null hypothesis.

Analysis of Variance (ANOVA)


A hypothesis test that allows us to compare three or more population means.

Central Limit Theorem


A theorem that explains the shape of a sampling distribution of sample means. It states that
if the sample size is large (generally n ≥ 30), and the standard deviation of the population is
finite, then the distribution of sample means will be approximately normal.

Chi-Square Statistic
The sum of the ratios of the squared differences between the expected and observed
counts to the expected counts.

Chi-Square Test for Goodness-of-Fit


A hypothesis test where we test whether or not our sample distribution of frequencies
across categories fits with hypothesized probabilities for each category.

Chi-Square Test of Independence/Association


A hypothesis test that tests whether two qualitative variables have an association or not.

Chi-square test for homogeneity


A test used to determine if there is no difference in a categorical variable across several
populations or treatments.

Confidence Interval
An interval we are some percent certain (e.g., 90%, 95%, or 99%) will contain the population
parameter, given the value of our sample statistic.

Confidence Interval for a Population Proportion


A confidence interval that gives a likely range for the value of a population proportion. It is
the sample proportion, plus and minus the margin of error from the normal distribution.

Critical Value
A value that can be compared to the test statistic to decide the outcome of a hypothesis
test.

Distribution of Sample Means
A distribution where each data point consists of a mean of a collected sample. For a given
sample size, every possible sample mean will be plotted in the distribution.

Distribution of Sample Proportions


A distribution where each data point consists of a proportion of successes of a collected
sample. For a given sample size, every possible sample proportion will be plotted in the
distribution.

Expected Frequencies
The number of occurrences we would have expected within each of the categories in a
qualitative distribution if the null hypothesis were true.

F statistic
The test statistic in an ANOVA test. It is the ratio of the variability between the samples to
the variability within each sample. If the null hypothesis is true, the F statistic will probably
be small.

Hypothesis
A claim about a population parameter.

Hypothesis Test for Population Proportions


A hypothesis test where we compare to see if the sample proportion of "successes" differs
significantly from a hypothesized value that we believe is the population proportion of
"successes."

Hypothesis Testing
The standard procedure in statistics for testing claims about population parameters.

Left-tailed test
A hypothesis test where the alternative hypothesis only states that the parameter is lower
than the stated value from the null hypothesis.

Null Hypothesis
A claim about a particular value of a population parameter that serves as the starting
assumption for a hypothesis test.

Observed Frequencies
The number of occurrences that were observed within each of the categories in a
qualitative distribution.

One-Way ANOVA
A hypothesis test that compares three or more population means with respect to a single
characteristic or factor.

One-tailed test
A hypothesis test where the alternative hypothesis only states that the parameter is higher
(or lower) than the stated value from the null hypothesis.

P-value
The probability that the test statistic is that value or more extreme in the direction of the
alternative hypothesis.

Population Mean
A mean for all values in the population. Denoted as μ.

Population Parameters
Summary values for the population. These are often unknown.

Power of a Hypothesis Test


The probability that we reject the null hypothesis (correctly) when a difference truly does
exist.

Practical Significance
An arbitrary assessment of whether observations reflect a practical real-world use.

Right-tailed test
A hypothesis test where the alternative hypothesis only states that the parameter is higher
than the stated value from the null hypothesis.

Sample Mean
A mean obtained from a sample of a given size. Denoted as x̄.

Sample Size
The size of a sample of a population of interest.

Sample Statistics
Summary values obtained from a sample.

Sampling Error
The amount by which the sample statistic differs from the population parameter.

Sampling With Replacement
A sampling plan where each observation that is sampled is replaced after each time it is
sampled, resulting in an observation being able to be selected more than once.

Sampling Without Replacement


A sampling plan where each observation that is sampled is kept out of subsequent
selections, resulting in a sample where each observation can be selected no more than
one time.

Significance Level
The probability of making a type I error. Abbreviated with the symbol alpha (α).

Standard Deviation of a Distribution of Sample Means


The standard deviation of the population, divided by the square root of sample size.

Standard Deviation of a Distribution of Sample Proportions


The square root of the product of the probabilities of success and failure (p and q,
respectively) divided by the sample size.

Standard Error
The standard deviation of the sampling distribution of sample means.

Standard Normal Table


The table that allows us to find the percent of observations below a particular z-score in the
normal distribution.

Statistical Significance
The statistic obtained is so different from the hypothesized value that we are unable to
attribute the difference to chance variation.

T-Distribution/Student's T-Distribution
A family of distributions that are centered at zero and symmetric like the standard normal
distribution, but heavier in the tails. Depending on the sample size, it does not diminish
towards the tails as fast. If the sample size is large, the t-distribution approximates the
normal distribution.

T-test For Population Means


The type of hypothesis test used to test an assumed population mean when the population
standard deviation is unknown. Due to the increased variability in using the sample
standard deviation instead of the population standard deviation, the t-distribution is used in
place of the z-distribution.

Test Statistic
A measurement, in standardized units, of how far a sample statistic is from the assumed
parameter if the null hypothesis is true.

Two-Way ANOVA
A hypothesis test that compares three or more population means with respect to multiple
characteristics or factors.

Two-tailed test
A hypothesis test where the alternative hypothesis states that the parameter is different
from the stated value from the null hypothesis; that is, the parameter's value is either higher
or lower than the value from the null hypothesis.

Type I Error
In a hypothesis test, when the null hypothesis is rejected when it is in fact, true.

Type II Error
In a hypothesis test, when the null hypothesis is not rejected when it is, in fact, false.

Z-Test for Population Means


A hypothesis test that compares a hypothesized mean from the null hypothesis to a sample
mean, when the population standard deviation is known.

Z-Test for Population Proportions


A type of hypothesis test used to test an assumed population proportion.

t-distribution
A family of distributions similar to the standard normal distribution, except that they are
fatter in the tails, due to the increased variability associated with using the sample standard
deviation instead of the population standard deviation in the formula for the test statistic.

Formulas to Know
Chi-Square Statistic
χ² = Σ (O − E)² / E, where O is an observed count and E is the corresponding expected count
Chi-square Degrees of Freedom
df = (number of rows − 1) × (number of columns − 1)
© 2023 SOPHIA Learning, LLC. SOPHIA is a registered trademark of SOPHIA Learning, LLC. Page 188
Confidence Interval
CI = sample statistic ± margin of error
Confidence Interval of Means
x̄ ± z* · (σ/√n)
Confidence Interval of Population Mean
x̄ ± t* · (s/√n)
Confidence Interval of Population Proportion
p̂ ± z* · √(p̂q̂/n)
Confidence Interval of Proportions
p̂ ± z* · √(p̂q̂/n)
Expected Value for Cell in Chi-Square Test for Homogeneity
E = (row total × column total) / grand total
Mean of a Distribution of Sample Proportions
μ(p̂) = p
Mean of a Sampling Distribution of Sample Means
μ(x̄) = μ
Standard Deviation of a Distribution of Sample Proportions
σ(p̂) = √(pq/n)
Standard Deviation of a Sampling Distribution of Sample Means
σ(x̄) = σ/√n
Standard Error
Sample Means: s/√n

Sample Proportion (population standard deviation is unknown): √(p̂q̂/n)

Sample Proportion (population standard deviation is known): √(pq/n)

T-Statistic For Population Means
t = (x̄ − μ) / (s/√n)
Test Statistic
test statistic = (sample statistic − hypothesized parameter) / standard error
Z-Statistic for Population Means
z = (x̄ − μ) / (σ/√n)
z-statistic of Means
z = (x̄ − μ) / (σ/√n)
z-statistic of Proportions
z = (p̂ − p) / √(pq/n)
