Graduate Statistics in Excel Manual 2 S
By Mark Harmon
Copyright © 2014 Mark Harmon
No part of this publication may be reproduced
or distributed without the express permission
of the author.
mark@ExcelMasterSeries.com
ISBN: 978-1-937159-21-4
Table of Contents
t-Tests in Excel
t-Test: t-Distribution-Based Hypothesis Test ....................................... 16
t-Test Overview .......................................................................................................... 16
Null Hypothesis .......................................................................................................... 16
Null Hypothesis - Rejected or Not But Never Accepted.............................................. 17
Alternative Hypothesis................................................................................................ 17
One-Tailed Test vs. Two-Tailed Test ......................................................................... 18
Level of Certainty ....................................................................................................... 18
Level of Significance (Alpha) ...................................................................................... 18
Region of Acceptance ................................................................................................ 18
Region of Rejection .................................................................................................... 19
Critical Value(s) .......................................................................................................... 19
Test Statistic ............................................................................................................... 19
p Value ....................................................................................................................... 20
Critical t Value or Critical z Value ............................................................................... 20
Critical t Value For 1-Tailed Test in Right Tail: ............................................................................ 20
Critical t Value For 1-Tailed Test in Left Tail:............................................................................... 21
Critical t Values For a 2-Tailed Test: .............................................................................................. 21
3 Equivalent Reasons To Reject Null Hypothesis ...................................................... 22
1) Sample Statistic Beyond Critical Value ....................................................................................... 22
2) Test Statistic Beyond Critical t or z Value .................................................................................. 22
3) p Value Smaller Than α (1-Tailed) or α/2 (2-Tailed) ................................................................. 22
Independent vs. Dependent Samples ........................................................................ 22
Pooled vs. Unpooled Tests ........................................................................................ 22
Type I and Type II Errors ............................................................................................ 22
Power of a Test .......................................................................................................... 23
Effect Size .................................................................................................................. 23
Nonparametric Alternatives for t-Tests in Excel ......................................................... 23
Hypothesis Test of Mean vs. Proportion..................................................................... 23
Hypothesis Tests of Mean – Overview ............................................................................................. 23
Hypothesis Tests of Proportion – Overview .................................................................................... 24
t-Test vs. z-Test.......................................................................................................... 24
Means of Large Samples Are Normally Distributed .................................................... 24
Requirements of a z-Test ........................................................................................... 24
Requirements of a t-Test ............................................................................................ 25
Basic Steps of a Hypothesis Test of Mean ................................................................. 25
Uses of Hypothesis Tests of Mean ............................................................................. 27
Types of Hypothesis Tests of Mean ........................................................................... 27
Correctable Reasons That Normal Data Can Appear Non-Normal ........................................... 153
Step 1 – Create the Null and Alternate Hypotheses ..................................................................... 153
Step 2 – Map the Distributed Variable on a t-Distribution Curve .............................................. 155
Step 3 – Map the Regions of Acceptance and Rejection .............................................................. 156
Calculate the Critical Values .......................................................................................................................................... 157
Two-Tailed Critical Values ......................................................................................................................................... 157
One-Tailed Critical Value ........................................................................................................................................... 158
Correctable Reasons That Normal Data Can Appear Non-Normal ........................................... 185
When Data Are Not Normally Distributed.................................................................................... 185
Step 1 – Create the Null and Alternate Hypotheses ..................................................................... 186
Step 2 – Map the Distributed Variable to t-Distribution ............................................................. 187
Step 3 – Map the Regions of Acceptance and Rejection .............................................................. 188
Calculate the Critical Value ............................................................................................................................................ 189
Correctable Reasons That Normal Data Can Appear Non-Normal ........................................... 375
Nonparametric Alternatives to the F Test ..................................................................................... 376
Levene’s Test For Sample Variance Comparison in Excel .......................................................................................... 376
Brown-Forsythe Test For Sample Variance Comparison in Excel .............................................................................. 378
Check Out the Latest Book in the Excel Master Series! .................... 380
t-Test Overview
The t-Test is the most commonly used hypothesis test that analyzes sample data to determine whether two
populations have significantly different means. A t-Test can be applied if the test statistic follows the t
Distribution under the Null Hypothesis. The test statistic will follow the t Distribution if any of the following
conditions holds:
1) The population is normally distributed.
2) The sample is normally distributed.
3) The sample size is large.
The t-Test is the appropriate tool for a hypothesis test of a population mean when the sample size is small
and/or the population standard deviation is not known. A t-Test can always be substituted for a z-Test.
Null Hypothesis
A hypothesis test compares a sample statistic, such as a sample mean, to a population parameter, such
as the population's mean. The test is based upon a Null Hypothesis, which states that the sample did
come from that population. The amount of difference between the sample statistic and the population
parameter determines whether the Null Hypothesis can be rejected or not.
The Null Hypothesis states that the population from which the sample came has the same mean or
proportion as a hypothesized population. The Null Hypothesis is always an equality stating that the
means or proportions of two populations are the same.
An example of a basic Null Hypothesis for a Hypothesis Test of Mean would be the following:
H0: x_bar = Constant = 5
This Null Hypothesis would be used to state that the population from which the sample was taken has a
mean equal to 5. The Constant (5) is the mean of the hypothesized population that the sample’s
population is being compared to. The Null Hypothesis states that the sample’s population and the
hypothesized population have the same means. The Alternative Hypothesis states that they are different.
An example of a basic Null Hypothesis for a Hypothesis Test of Proportion would be the following:
H0: p_bar = Constant = 0.3
This Null Hypothesis would be used to state that the population from which the sample was taken has a
proportion equal to 0.3. The Constant (0.3) is the proportion of the hypothesized population that the
sample’s population is being compared to. The Null Hypothesis states that the sample’s population and
the hypothesized population have the same proportions. The Alternative Hypothesis states that they are
different.
Null Hypothesis - Rejected or Not But Never Accepted
A hypothesis test has only two possible outcomes: the Null Hypothesis is either rejected or is not rejected.
It is never correct to state that the Null Hypothesis was accepted. A hypothesis test only determines
whether there is or is not enough evidence to reject the Null Hypothesis. The Null Hypothesis is rejected
only when the hypothesis test indicates, with at least the specified Level of Certainty, that the Null
Hypothesis is not valid.
If the required Level of Certainty for a hypothesis test is specified to be 95 percent, the Null Hypothesis
will be rejected only if the test result indicates that there is at least a 95 percent probability that the Null
Hypothesis is invalid. In all other cases, the Null Hypothesis would not be rejected. This is not equivalent
to stating that the Null Hypothesis was accepted. The Null Hypothesis is never accepted; it can only be
rejected or not rejected.
Alternative Hypothesis
The Alternative Hypothesis is always an inequality stating that the means or proportions of two populations
are not the same. The Alternative Hypothesis is non-directional if it states that the means or
proportions of two populations are merely not equal to each other. The Alternative Hypothesis is
directional if it states that the mean or proportion of one of the populations is less than or greater than the
mean or proportion of the other population.
An example of a non-directional Alternative Hypothesis for a Hypothesis test of Mean would be the
following:
H1: x_bar ≠ 5
This Alternative Hypothesis would be used to state that the population from which the sample was taken
has a mean that is not equal to 5.
An example of a directional Alternative Hypothesis would be the following:
H1: x_bar > 5
or
H1: x_bar < 5
These Alternative Hypotheses would be used to state that the population from which the sample was
taken has a mean that is either greater than or less than 5.
An example of a non-directional Alternative Hypothesis for a Hypothesis test of Proportion would be the
following:
H1: p_bar ≠ 0.3
This Alternative Hypothesis would be used to state that the population from which the sample was taken
has a proportion that is not equal to 0.3.
An example of a directional Alternative Hypothesis would be the following:
H1: p_bar > 0.3
or
H1: p_bar < 0.3
These Alternative Hypotheses would be used to state that the population from which the sample was
taken has a proportion that is either greater than or less than 0.3.
One-Tailed Test vs. Two-Tailed Test
The number of tails in a hypothesis test depends on whether the test is directional or not. The operator of
the Alternative Hypothesis indicates whether or not the hypothesis test is directional. A non-directional
operator (a “not equal” sign) in the Alternative Hypothesis indicates that the hypothesis test is a two-
tailed test. A directional operator (a “greater than” or “less than” sign) in the Alternative Hypothesis
indicates that the hypothesis test is a one-tailed test.
The Region of Rejection (the alpha region) for a one-tailed test is entirely contained in one of the
outer tails. A “greater than” operator in the Alternative Hypothesis indicates that the test is a one-tailed
test in the right tail. A “less than” operator in the Alternative Hypothesis indicates that the test is a one-
tailed test in the left tail. If α = 0.05, then one of the outer tails will contain the entire 5-percent Region of
Rejection.
The Region of Rejection (the alpha region) for a two-tailed test is split between both outer tails. Each
outer tail will contain half of the total Region of Rejection (alpha/2). If α = 0.05, then each outer tail will
contain a 2.5-percent Region of Rejection if the test is a two-tailed test.
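Although the manual works in Excel, the tail arithmetic can be sketched in a few lines of Python, using the standard normal curve for illustration (the alpha value, and the use of z rather than t, are assumptions of this sketch, not part of the text's example):

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist()  # standard normal curve, used here for illustration

# One-tailed test: the entire alpha region sits in a single outer tail
right_tail_critical = z.inv_cdf(1 - alpha)   # test in the right tail
left_tail_critical = z.inv_cdf(alpha)        # test in the left tail

# Two-tailed test: alpha is split, with alpha/2 in each outer tail
two_tail_critical = z.inv_cdf(1 - alpha / 2)
```

For α = 0.05 this places the whole 5 percent in one tail (critical z about ±1.645) for a one-tailed test, or 2.5 percent in each tail (critical z about ±1.96) for a two-tailed test.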
Level of Certainty
Each hypothesis test has a Level of Certainty that is specified. The Null Hypothesis is rejected only when
that Level of Certainty has been reached that the sample did not come from the population. A commonly
specified Level of Certainty is 95 percent. The Null Hypothesis would only be rejected in this case if the
sample statistic was different enough from the population parameter that at least 95 percent certainty was
achieved that the sample did not come from that population.
Region of Acceptance
A Hypothesis Test of Mean or Proportion can be performed if the Test Statistic is distributed according to
the normal distribution or the t distribution. The Test Statistic is derived directly from the sample statistic
such as the sample mean. If the Test Statistic is distributed according to the normal or t distribution, then
the sample statistic is also distributed according to the normal or t distribution. This will be discussed in
greater detail shortly.
A Hypothesis Test of Mean or Proportion can be understood much more intuitively by mapping the
sample statistic (the sample mean or proportion) to its own unique normal or t distribution. The sample
statistic is the distributed variable whose distribution is mapped according to its own unique normal or t
distribution.
The Region of Acceptance is the percentage of area under this normal or t distribution curve that equals
the test’s specified Level of Certainty. If the hypothesis test requires 95 percent in order to reject the Null
Hypothesis, the Region of Acceptance will include 95 percent of the total area under the distributed
variable’s mapped normal or t distribution curve.
If the observed value of the sample statistic (the observed mean or proportion of the single sample taken)
falls inside of the Region of Acceptance, the Null Hypothesis is not rejected. If the observed value of the
sample statistic falls outside of the Region of Acceptance (into the Region of Rejection), the Null
Hypothesis is rejected.
Region of Rejection
The Region of Rejection is the percentage of area under this normal or t distribution curve that equals the
test’s specified Level of Significance (alpha). It is important to remember the following relationship:
Level of Significance (alpha) = 1 – Level of Certainty.
If the required Level of Certainty to reject the Null Hypothesis is 95 percent, then the following are true:
Level of Certainty = 0.95
Level of Significance (alpha) = 0.05
The Region of Acceptance includes 95 percent of the total area under the normal or t distribution curve
that maps the distributed variable, which is the sample statistic (the sample mean or proportion).
The Region of Rejection includes 5 percent of the total area under the normal or t distribution curve that
maps the distributed variable, which is the sample statistic (the sample mean or proportion). The 5-
percent alpha region is entirely contained in one of the tails if the test is a one-tailed test. The 5-percent
alpha region is split between both of the outer tails if the test is a two-tailed test.
If the observed value of the sample statistic (the observed mean or proportion of the single sample taken)
falls inside of the Region of Rejection (outside the Region of Acceptance), the Null Hypothesis is rejected.
If the observed value of the sample statistic falls inside of the Region of Acceptance, the Null Hypothesis
is not rejected.
Critical Value(s)
Each hypothesis test has one or two Critical Values. A Critical Value is the location of the boundary
between the Region of Acceptance and the Region of Rejection. A one-tailed test has one Critical Value
because the Region of Rejection is entirely contained in one of the outer tails. A two-tailed test has two Critical
Values because the Region of Rejection is split between the two outer tails.
The Null Hypothesis is rejected if the sample statistic (the observed sample mean or proportion) is farther
from the curve’s mean than the Critical Value on that side. If the sample statistic is farther from the
curve’s mean than the Critical value on that side, the sample statistic lies in the Region of Rejection. If the
sample statistic is closer to the curve’s mean than the Critical value on that side, the sample statistic lies
in the Region of Acceptance.
Test Statistic
Each hypothesis test calculates a Test Statistic. The Test Statistic is the amount of difference between
the observed sample statistic (the observed sample mean or proportion) and the hypothesized population
parameter (the Constant on the right side of the Null Hypothesis) which will be located at the curve’s
mean.
This difference is expressed in units of Standard Errors. The Test Statistic is the number of Standard
Errors that are between the observed sample statistic and the hypothesized population parameter. The
Null Hypothesis is rejected if that number of Standard Errors (the number specified by the Test Statistic) is
larger than a critical number of Standard Errors. The critical number of Standard Errors is determined by
the required Level of Certainty.
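The Test Statistic calculation can be sketched in Python. The sample values below and the hypothesized mean of 5 (the Constant from the earlier Null Hypothesis example) are invented for illustration; the book's own workflow uses Excel:

```python
import math
from statistics import mean, stdev

# Hypothetical sample (values invented for illustration)
sample = [5.3, 4.9, 5.6, 5.1, 4.7, 5.4, 5.0, 5.2]
hypothesized_mean = 5  # the Constant in H0: x_bar = 5

n = len(sample)
x_bar = mean(sample)
s = stdev(sample)                    # sample standard deviation
standard_error = s / math.sqrt(n)

# The Test Statistic: the number of Standard Errors between the
# observed sample mean and the hypothesized population mean
t_value = (x_bar - hypothesized_mean) / standard_error
```

For these invented numbers the sample mean is 5.15 and the Test Statistic works out to roughly 1.47 Standard Errors above the hypothesized mean.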
The Test Statistic is either the z Score or the t Value depending on whether a z-Test or t-Test is being
performed. This will be discussed in greater detail shortly.
p Value
Each hypothesis test calculates a p Value. The p Value is the area under the curve that is beyond the
sample statistic (the observed sample mean or proportion). The p Value is the probability that a sample of
size n with the observed sample mean or proportion could have occurred if the Null Hypothesis were true.
If, for example, the p Value of a Hypothesis Test of Mean or Proportion were calculated to be 0.0212, that
would indicate that there is only a 2.12 percent chance that a sample of size n would have the observed
sample mean or proportion if the Null Hypothesis were true. The Null Hypothesis states that the
population from which the sample came has the same mean as the hypothesized population. This mean
is the Constant on the right side of the Null Hypothesis.
The p Value is compared to alpha for a one-tailed test and to alpha/2 for a two-tailed test. The Null
Hypothesis is rejected if p is smaller than α for a one-tailed test or if p is smaller than α/2 for a two-tailed
test. If the p Value is smaller than α for a one-tailed test or smaller than α/2 for a two-tailed test, the
sample statistic is in the Region of Rejection.
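The 2.12 percent example above can be illustrated numerically using the standard normal curve. The z Score of 2.03 below is a hypothetical Test Statistic chosen so that the tail area comes out near the p Value quoted in the text:

```python
from statistics import NormalDist

alpha = 0.05
z_score = 2.03   # hypothetical Test Statistic from a z-Test

# p Value: the area under the curve beyond the Test Statistic
p_value = 1 - NormalDist().cdf(z_score)

# One-tailed test: reject H0 when p < alpha
# Two-tailed test: reject H0 when p < alpha / 2
reject_one_tailed = p_value < alpha
reject_two_tailed = p_value < alpha / 2
```

Here the p Value is about 0.021, which is smaller than both α (0.05) and α/2 (0.025), so the Null Hypothesis would be rejected in either kind of test.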
Calculations of the Critical t Value(s) and the p Value are as follows:
Critical t Value For 1-Tailed Test in Left Tail:
Excel 2010 and beyond
Critical t Value = T.INV(α,df)
Excel versions before 2010
Critical t Value = -TINV(2*α,df)
Note that TINV returns a positive, two-tailed value, so the negative sign has to be manually inserted into
this pre-2010 formula to calculate the Critical t Value in the left tail for a one-tailed test.
The p Value is calculated using the same formulas whether the test is a one-tailed test or a two-tailed
test.
3 Equivalent Reasons To Reject Null Hypothesis
The Null Hypothesis of a Hypothesis Test of Mean or Proportion is rejected if any of the following
equivalent conditions are shown to exist:
1) The sample statistic is beyond the Critical Value.
2) The Test Statistic is farther from zero than the Critical t or z Value.
3) The p Value is smaller than α for a one-tailed test or α/2 for a two-tailed test.
Power of a Test
The Power of a test indicates the test’s sensitivity. The Power of a test is the probability that the test will
detect a significant difference if one exists. The Power of a test is the probability of not making a Type II
Error, which is failing to detect a difference when one exists. A test’s Power is therefore expressed by the
following formula:
Power = 1 – β
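The Power = 1 – β relationship can be sketched for a one-tailed z-Test. Everything below (σ, n, the true shift, and the use of the normal curve) is an invented illustration, not an example from the text:

```python
from statistics import NormalDist

z = NormalDist()
alpha = 0.05
sigma = 10.0       # assumed known population standard deviation
n = 25             # sample size (invented for illustration)
true_shift = 5.0   # actual difference between the population means

standard_error = sigma / n ** 0.5         # = 2.0 here
z_critical = z.inv_cdf(1 - alpha)         # one-tailed critical z

# beta: probability of a Type II Error (failing to detect the shift)
beta = z.cdf(z_critical - true_shift / standard_error)
power = 1 - beta
```

With these assumed numbers β is about 0.20, so the test's Power is about 0.80: an 80 percent chance of detecting a real shift of 5 units.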
Effect Size
Effect size for Hypothesis Tests of Mean is usually expressed in measures of Cohen's d. Cohen's d is a
standardized way of quantifying the size of the difference between the two groups. This standardization of
the size of the difference (the effect size) enables classification of that difference in relative terms of
“large,” “medium,” and “small.” A large effect would be a difference between two groups that is easily
noticeable with the measuring equipment available. A small effect would be a difference between two
groups that is not easily noticed.
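Cohen's d can be sketched in a few lines of Python. The two groups below are invented for illustration, and the pooled standard deviation shown is one common (assumed) way to standardize the difference:

```python
import math
from statistics import mean, stdev

# Hypothetical samples from two groups (values invented for illustration)
group_a = [23.0, 25.0, 28.0, 22.0, 26.0]
group_b = [30.0, 29.0, 33.0, 31.0, 27.0]

n_a, n_b = len(group_a), len(group_b)
s_a, s_b = stdev(group_a), stdev(group_b)

# Pooled standard deviation of the two groups
s_pooled = math.sqrt(((n_a - 1) * s_a**2 + (n_b - 1) * s_b**2)
                     / (n_a + n_b - 2))

# Cohen's d: the standardized difference between the group means
d = (mean(group_b) - mean(group_a)) / s_pooled
# rough convention: d near 0.2 is "small", 0.5 "medium", 0.8 "large"
```

For these invented samples d is roughly 2.25, which would count as a very large effect under the usual convention.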
Hypothesis Tests of Mean require that the Test Statistic is distributed either according to the normal
distribution or to the t distribution. The Test Statistic in a Hypothesis Test of Mean is derived directly from
the sample mean and therefore has the same distribution as the sample mean.
Requirements of a z-Test
A z-Test can be performed only if the sample mean (and therefore the Test Statistic, which is derived
from the sample mean) is normally distributed. The sample mean and therefore the Test Statistic are
normally distributed only when the following two conditions are both met:
1) The size of the single sample taken is large (n > 30). The Central Limit Theorem states that means of
large samples will be normally distributed. When the size of the single sample is small (n < 30), only a t-
Test can be performed.
2) The population standard deviation, σ (sigma), is known.
Requirements of a t-Test
A t-Test can be performed only if the sample mean (and therefore the Test Statistic, which is derived from
the sample mean) is distributed according to the t distribution. The sample mean and therefore the Test
Statistic are distributed according to the t distribution when both of these conditions are met:
1) The sample standard deviation, s, is known.
2) Either the sample or the population has been verified for normality.
A t-Test can be performed when the single sample is large (n > 30) and is the only option when the size of
the single sample is small (n < 30). A z-Test can only be performed when the size of the single sample is
large (n > 30) and the population standard deviation is known.
As mentioned, a Hypothesis Test of Mean requires that the sample mean and therefore the Test Statistic
is distributed either according to the normal distribution or to the t distribution. The sample mean and the
Test Statistic are distributed variables that can be graphed according to the normal or t distribution.
The Test Statistic, which represents the number of Standard Errors that the sample mean is from the
hypothesized population mean, could be graphed on a standard normal distribution curve or a
standardized t distribution curve. Both of these distribution curves have their means at zero, and the
length of one Standard Error is set to equal 1.
This t-Test was a two-tailed test, as evidenced by the yellow Region of Rejection split between the two
outer tails. In this t-Test the alpha was set to 0.05. This 5-percent Region of Rejection is split between the
two tails so that each tail contains a 2.5 percent Region of Rejection.
The mean of this non-standardized t-distribution curve is 186,000. This indicates that the Null Hypothesis
is as follows:
H0: x_bar = 186,000
Since this is a two-tailed t-Test, the Alternative Hypothesis is as follows:
H1: x_bar ≠ 186,000
This one-sample t-Test is evaluating whether the population from which the sample was taken has a
population mean that is not equal to 186,000. This is a non-directional t-Test and is therefore two-tailed.
The sample statistic is the observed sample mean of this single sample taken for this test. This observed
sample mean is calculated to be 200,000.
The boundaries of the Region of Rejection occur at 172,083 and 199,916. Everything beyond these two
points is in the Region of Rejection. These two Critical Values are 2.093 Standard Errors from the
standardized mean of 0. This indicates that the Critical t Values are ±2.093.
The graph shows that the sample statistic (the sample mean of 200,000) falls beyond the right Critical
value of 199,916 and is therefore in the Region of Rejection.
The sample statistic is 2.105 Standard Errors from the standardized mean of 0. This is further from the
standardized mean of 0 than the right Critical t Value, which is 2.093.
The curve area beyond the sample statistic consists of 2.4 percent of the area under the curve. This is
smaller than α/2 which is 2.5 percent of the total curve area because alpha was set to 0.05.
As the graph shows, all three equivalent conditions have been met to reject the Null Hypothesis. It can be
stated with at least 95 percent certainty that the mean of the population from which the sample was taken
does not equal the hypothesized population mean of 186,000.
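The rejection checks in this example can be reproduced numerically. Python's standard library has no t-distribution quantiles, so the sketch below takes the Critical t Value implied by the example's Region-of-Rejection boundaries (about 2.093 for df = 19 and α = 0.05) as a given constant; the other figures come from the text:

```python
# Figures from the worked example in the text
hypothesized_mean = 186_000
sample_mean = 200_000

# Two-tailed critical t for df = 19, alpha = 0.05, taken as a given
# constant (the standard library has no t-distribution quantiles)
critical_t = 2.093

# Standard Error implied by the example's upper Critical Value of 199,916
standard_error = (199_916 - hypothesized_mean) / critical_t

# Condition 1: the sample statistic lies beyond the Critical Value
upper_critical_value = hypothesized_mean + critical_t * standard_error
beyond_critical_value = sample_mean > upper_critical_value

# Condition 2: the Test Statistic is farther from zero than critical t
t_value = (sample_mean - hypothesized_mean) / standard_error
beyond_critical_t = abs(t_value) > critical_t
```

Both conditions hold: the sample mean of 200,000 exceeds the upper Critical Value of 199,916, and the Test Statistic of about 2.11 exceeds the critical t.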
Uses of Hypothesis Tests of Mean
1) Comparing the mean of a sample taken from one population with another population's
mean to determine if the two populations have different means. An example of this would be to
compare the mean monthly sales of a sample of retail stores from one region to the national mean
monthly store sales to determine if the mean monthly sales of all stores in the one region are different
than the national mean.
2) Comparing the mean of a sample taken from one population to a fixed number to determine if
that population’s mean is different than the fixed number. An example of this might be to compare
the mean product measurement taken from a sample of a number of units of a product to the company's
claims about that product specification to determine if the actual mean measurement of all units of that
company’s product is different than what the company claims it is.
3) Comparing the mean of a sample from one population with the mean of a sample from another
population to determine if the two populations have different means. An example of this would be to
compare the mean of a sample of daily production totals from one crew with the mean of a sample of
daily production totals from another crew to determine if the two crews have different mean daily
production totals.
4) Comparing successive measurement pairs taken on the same group of objects to determine if
anything has changed between measurements. An example of this would be to evaluate whether there
is a mean difference in before-and-after test scores of a small sample of the same people to determine if a
training program made a difference to all of the people who underwent it.
5) Comparing the same measurements taken on pairs of related objects. An example of this would
be to evaluate whether there is a mean difference in the incomes of husbands and wives in a sample of
married couples to determine if there is a mean difference in the incomes of husbands and wives in all
married couples.
It is important to note that a hypothesis test is used to determine if two populations are different. The
outcome of a hypothesis test is to either reject or fail to reject the Null Hypothesis. It would be incorrect to
state that a hypothesis test is used to determine if two populations are the same.
1) One-Sample t-Test in Excel
Overview
This hypothesis test determines whether the mean of the population from which the sample was taken is
equal to (two-tailed test), or greater than or less than (one-tailed test), a constant. This constant
is often the known mean of a population from which the sample may have come. The constant appears
on the right side of the Null Hypothesis.
df = n - 1
Null Hypothesis H0: x_bar = Constant
The Null Hypothesis is rejected if any of the following equivalent conditions are shown to exist:
1) The observed x_bar is beyond the Critical Value.
2) The t Value (the Test Statistic) is farther from zero than the Critical t Value.
3) The p value is smaller than α for a one-tailed test or α/2 for a two-tailed test.
As with all Hypothesis Tests of Mean, we must satisfactorily answer these two questions and then
proceed to the four-step method of solving the hypothesis test that follows.
The Initial Two Questions To Be Answered Before Performing the Four-Step Hypothesis Test of Mean are
as follows:
e) t-Test or z-Test?
Assuming that the population or sample can pass a normality test, a hypothesis test of mean must be
performed as a t-Test when the sample size is small (n < 30) or if the population variance is unknown.
In this case the sample size is small as n = 20. This Hypothesis Test of Mean must therefore be
performed as a t-Test and not as a z-Test.
The t Distribution with degrees of freedom = df = n – 1 describes the distribution of the Test Statistic
calculated from a random sample of size n taken from a normal population.
The means of samples taken from a normal population are also distributed according to the t
Distribution with degrees of freedom = df = n – 1.
The Test Statistic (the t Value), which is based upon the sample mean (x_bar) because it equals (x_bar –
Constant)/(s/SQRT(n)), will therefore also be distributed according to the t Distribution. A t-Test will be
performed if the Test Statistic is distributed according to the t Distribution.
The distribution of the Test Statistic for samples taken from a normal population is always described by the
t Distribution. The shape of the t Distribution converges to (very closely resembles) the shape of the
standard normal distribution when sample size becomes large (n > 30).
The Test Statistic’s distribution can be approximated by the normal distribution only if the sample size is
large (n > 30) and the population standard deviation, σ, is known. A z-Test can be used if the Test
Statistic’s distribution can be approximated by the normal distribution. A t-Test must be used in all other
cases.
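The decision rule described above can be condensed into a small sketch. This is a simplified illustration of the text's rules, not an exhaustive decision procedure:

```python
def choose_test(n, sigma_known, normality_ok):
    """Pick z-Test vs t-Test per the rules described above (a sketch)."""
    if not normality_ok and n <= 30:
        # Small sample that fails normality: neither test's
        # assumptions hold
        return "neither (consider a nonparametric test)"
    if n > 30 and sigma_known:
        # Large sample with known population sigma: a z-Test is
        # allowed, though a t-Test can always be substituted
        return "z-Test (t-Test also valid)"
    return "t-Test"
```

For the example in this chapter, `choose_test(20, False, True)` returns "t-Test": the sample is small and the population standard deviation is unknown.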
It should be noted that a one-sample t-Test can always be used in place of a one-sample z-Test. All
z-Tests can be replaced by their equivalent t-Tests. As a result, some major commercial statistical software
packages including the well-known SPSS provide only t-Tests and no direct z-Tests.
This hypothesis test is a one-sample, two-tailed t-Test of mean as long as all required assumptions
have been met.
The population is considered to be normally distributed if any of the following are true:
1) The population from which the sample was taken is shown to be normally distributed.
2) The sample is shown to be normally distributed. If the sample passes a test of normality then the
population from which the sample was taken can be assumed to be normally distributed.
The population or the sample must pass a normality test before a t-Test can be performed. If the only
data available are the data of the single sample taken, then the sample must pass a normality test before
a t-Test can be performed.
Histogram in Excel
The quickest way to check the sample data for normality is to create an Excel histogram of the data as
shown below, or to create a normal probability plot of the data if you have access to an automated
method of generating that kind of a graph.
To create this histogram in Excel, fill in the Excel Histogram dialogue box as follows:
The sample group appears to be distributed reasonably closely to the bell-shaped normal distribution. It
should be noted that bin size in an Excel histogram is manually set by the user. This arbitrary setting of
the bin sizes can have a significant influence on the shape of the histogram’s output. Different bin sizes
could result in an output that would not appear bell-shaped at all. What is actually set by the user in an
Excel histogram is the upper boundary of each bin.
The normal probability plot for the sample group shows that the data appear to be very close to being
normally distributed. The actual sample data (red) closely match the data values that the sample would
have if it were perfectly normally distributed (blue) and never go beyond the 95 percent confidence
interval boundaries (green).
F(Xk) = CDF(Xk) for normal distribution
F(Xk) = NORM.DIST(Xk, Sample Mean, Sample Stan. Dev., TRUE)
The Null Hypothesis Stating That the Data Are Normally Distributed Cannot Be Rejected
The Max Difference Between the Actual and Expected CDF (0.1500) is less than the Kolmogorov-
Smirnov Critical Value for n = 20 and α = 0.05 so do not reject the Null Hypothesis.
The Null Hypothesis for the Kolmogorov-Smirnov Test for Normality, which states that the sample data
are normally distributed, is rejected if the maximum difference between the expected and actual CDF of
any of the data points exceeds the Critical Value for the given n and α.
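The comparison just described can be sketched in Python using only the standard library: fit a normal distribution to the sample, then take the maximum gap between the empirical CDF and the fitted normal CDF. The sample values below are hypothetical stand-ins, and the critical value of roughly 0.294 for n = 20 and α = 0.05 comes from a standard Kolmogorov-Smirnov table.

```python
import math

def normal_cdf(x, mu, sigma):
    # Equivalent of Excel's NORM.DIST(x, mu, sigma, TRUE)
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(sample):
    # Max difference between the empirical CDF and the fitted normal CDF
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))  # sample st. dev.
    d_max = 0.0
    for k, x in enumerate(sorted(sample), start=1):
        f_expected = normal_cdf(x, mean, sd)
        # Compare against the empirical CDF just below and at each step
        d_max = max(d_max, abs(k / n - f_expected), abs((k - 1) / n - f_expected))
    return d_max

# Hypothetical sample of 20 values; reject normality only if d exceeds the
# K-S critical value for the given n and alpha (about 0.294 for n=20, alpha=0.05)
sample = [150, 160, 170, 175, 180, 185, 190, 195, 200, 210,
          155, 165, 172, 178, 182, 188, 192, 198, 205, 215]
d = ks_statistic(sample)
```

This mirrors the Excel worksheet logic: `normal_cdf` plays the role of NORM.DIST with the sample mean and sample standard deviation, and `d` is the Max Difference compared against the tabled Critical Value.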
Shapiro-Wilk Test For Normality in Excel
The Shapiro-Wilk Test is a hypothesis test that is widely used to determine whether a data sample is
normally distributed. A Test Statistic W is calculated. If this Test Statistic is less than a critical value of W
for a given level of significance (alpha) and sample size, the Null Hypothesis which states that the sample
is normally distributed is rejected.
The Shapiro-Wilk Test is a robust normality test and is widely used because of its slightly superior
performance compared with other normality tests, especially with small sample sizes. Superior
performance means that it correctly rejects the Null Hypothesis of normality when the data are in fact not
normally distributed a slightly higher percentage of the time than most other normality tests, particularly at
small sample sizes.
The Shapiro-Wilk normality test is generally regarded as being slightly more powerful than the Anderson-
Darling normality test, which in turn is regarded as being slightly more powerful than the Kolmogorov-
Smirnov normality test.
Sample Data
The Null Hypothesis Stating That the Data Are Normally Distributed Cannot Be Rejected
The Shapiro-Wilk Test Statistic W (0.967452) is larger than W Critical 0.905. The Null Hypothesis
therefore cannot be rejected. There is not enough evidence to state that the data are not normally
distributed with a confidence level of 95 percent.
We now proceed to complete the four-step method for solving all Hypothesis Tests of Mean. These four
steps are as follows:
Step 1 – Create the Null Hypothesis and the Alternate Hypothesis
Step 2 – Map the Normal or t Distribution Curve Based on the Null Hypothesis
Step 3 – Map the Regions of Acceptance and Rejection
Step 4 – Determine Whether to Accept or Reject the Null Hypothesis By Performing the Critical
Value Test, the p Value Test, or the Critical t Value Test
Step 2 – Map the Distributed Variable to t-Distribution
A t-Test can be performed if the sample mean and the Test Statistic (the t Value) are distributed
according to the t Distribution. If the sample has passed a normality test, the sample mean and the
closely related Test Statistic are distributed according to the t Distribution.
The standardized t Distribution is always centered at a mean of zero with its horizontal axis measured in
units of Standard Errors. The t Distribution varies only in its shape. The shape of a specific t Distribution
curve is determined by only one parameter: its degrees of freedom, which equals n – 1, where n = sample size.
The means of similar, random samples taken from a normal population are distributed according to the t
Distribution. This means that the distribution of a large number of means of samples of size n taken from
a normal population will have the same shape as a t Distribution with its degrees of freedom equal to n – 1.
The sample mean and the Test Statistic are both distributed according to the t Distribution with degrees of
freedom equal to n – 1 if the sample or population is shown to be normally distributed. This step will map
the sample mean to a t Distribution curve with degrees of freedom equal to n – 1.
The t Distribution is usually presented in its finalized form with standardized values of a mean that equals
zero and a standard error that equals one. The horizontal axis is given in units of Standard Errors and the
distributed variable is the t Value (the Test Statistic) as follows:
A non-standardized t Distribution curve would simply have its horizontal axis given in units of the measure
used to take the samples. The distributed variable would be the sample mean, x_bar.
The variable x_bar is distributed according to the t Distribution. Mapping this distributed variable to a t
Distribution curve is shown as follows:
This non-standardized t Distribution curve has its mean set to equal the Constant taken from the Null
Hypothesis, which is:
H0: x_bar = Constant = 186,000
This non-standardized t Distribution curve is constructed from the following parameters:
Mean = 186,000
Standard Error = 6,649.10
Degrees of Freedom = 19
Distributed Variable = x_bar
If the sample mean’s value of x_bar = 200,000 falls into a Region of Rejection, the Null Hypothesis is
rejected. If the sample mean’s value of x_bar = 200,000 falls into a Region of Acceptance, the Null
Hypothesis is not rejected.
The total size of the Region of Rejection is equal to Alpha. In this case Alpha, α, is equal to 0.05. This
means that the Region of Rejection will take up 5 percent of the total area under this t distribution curve.
This 5 percent is divided up between the two outer tails. Each outer tail contains 2.5 percent of the curve
that is the Region of Rejection.
The boundaries between the Region of Acceptance and the Regions of Rejection are called Critical
Values. The locations of these Critical Values need to be calculated.
The non-standardized t Distribution curve with the blue Region of Acceptance and the yellow Regions of
Rejection divided by the Critical Values is shown in the following Excel-generated graph of this non-
standardized t Distribution curve:
Equivalently, reject the Null Hypothesis if the t Value is farther from the standardized mean of zero than
the Critical t Value.
The p Value (0.0244) is smaller than the Alpha/2 (0.025) Region of Rejection in the right tail and we
therefore reject the Null Hypothesis. The following Excel-generated graph of this non-standardized t
Distribution curve shows that the red p Value (the curve area beyond x_bar) is smaller than the yellow
Alpha, which is the 5 percent Region of Rejection split between both outer tails:
It should be noted that if this t-Test were a one-tailed test, which is less stringent than a two-tailed test,
the Null Hypothesis would still have been rejected because:
1) The p Value (0.0244) would still be smaller than the Alpha (0.05) Region of Rejection, which is now
entirely contained in the right tail
2) x_bar (200,000) would still be outside the Region of Acceptance, which would now have its outer right
boundary at 197,497.2 (mean + T.INV(1 - Alpha,df)*SE)
3) The t Value (2.105) would still be larger than the critical t Value which would now be 1.73 (Critical t
Value = T.INV(1 - Alpha,df))
Excel Shortcut to Performing a One-Sample t-Test
All three of the other types of t-Tests (the two-independent-sample Pooled and unpooled t-Tests along
with the paired t-Test) can be solved in one step with a built-in Excel formula and also with a built-in Data
Analysis tool for each t-Test.
Excel unfortunately does not provide a formula or tool that can perform or solve a one-sample t-Test in
one step. Interestingly enough, a one-sample z-Test can be solved in Excel in one step with the following
formula:
p Value = MIN(Z.TEST(array,Constant,σ),1- Z.TEST(array,Constant,σ))
array = Set of sample data
Constant = the Constant in the Null Hypothesis
σ = Population standard deviation
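This one-step z-Test formula can be sketched in Python using the standard library’s error function to stand in for the normal CDF. The data values and the population standard deviation below are hypothetical; σ must be known for a z-Test to apply.

```python
import math

def z_test(array, constant, sigma):
    # Python equivalent of Excel's Z.TEST(array, Constant, sigma):
    # the upper-tail probability of the observed sample mean under H0.
    n = len(array)
    mean = sum(array) / n
    z = (mean - constant) / (sigma / math.sqrt(n))
    return 1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def one_sample_z_p_value(array, constant, sigma):
    # Mirrors p Value = MIN(Z.TEST(...), 1 - Z.TEST(...))
    zt = z_test(array, constant, sigma)
    return min(zt, 1.0 - zt)

# Hypothetical sample data with an assumed known sigma = 4.0
data = [186, 190, 195, 188, 192, 187, 191, 194, 189, 193]
p = one_sample_z_p_value(data, 186, 4.0)
```

Taking the minimum of the two tails, as the Excel formula does, returns the smaller one-tailed probability regardless of which direction the sample mean differs from the Constant.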
There is no such method in Excel to perform a one-sample t-Test similarly in a single step. The other
three types of t-Tests each have a one-step tool and a one-step formula. One of the main reasons that
these tools and formulas are one-step is that the t Value is calculated automatically. There is no one-
sample t-Test tool or formula that automatically calculates the t Value while performing the t-Test or
calculating the p Value. The t Value must be calculated in its own step when performing a one-sample t-
Test in Excel.
The formula needed to perform a one-sample t-Test is the following as previously shown:
p Value = T.DIST.RT(ABS(t Value), df)
This formula requires that the t Value be calculated first. This must be done manually using the following
steps:
t Value = (x-bar – Constant)/SE
SE = s/SQRT(n)
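These two manual steps can be checked with a few lines of Python using the summary statistics from this example (x_bar = 200,000, Constant = 186,000, s = 29,736.68, n = 20):

```python
import math

# Summary statistics from this example's one-sample t-Test
x_bar = 200_000.0     # sample mean
constant = 186_000.0  # Constant from the Null Hypothesis
s = 29_736.68         # sample standard deviation
n = 20                # sample size

se = s / math.sqrt(n)              # SE = s/SQRT(n), about 6,649.10
t_value = (x_bar - constant) / se  # t Value, about 2.105
```

Excel’s T.DIST.RT(ABS(t Value), df) then converts this t Value to the p Value; the t Distribution’s CDF is not in the Python standard library, so only the t Value is computed here.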
The one-sample t-Test is a very common statistical test, so it is surprising that Excel does not have a one-
step formula or a Data Analysis tool to directly calculate either the p Value or the t Value given the array
and the Constant from the Null Hypothesis. Each of the other three types of t-Tests has its own specific
formulas and its own Data Analysis tools to perform either the entire t-Test or the p Value calculation in a
single step.
Effect Size in Excel
Effect size for a one-sample t-Test is a method of expressing the difference between the sample mean,
x_bar, and the Constant in a standardized form that does not depend on the sample size.
Remember that the Test Statistic (the t Value) for a one-sample t-Test is calculated by the following formula:
t Value = (x_bar – Constant) / SE, where SE = s/SQRT(n)
The t Value specifies the number of Standard Errors that the sample mean, x_bar, is from the Constant.
The t Value is dependent upon the sample size, n. The t Value determines whether the test has achieved
statistical significance and is dependent upon sample size. Achieving statistical significance means that
the Null Hypothesis (H0: x_bar = Constant) has been rejected.
The Effect Size, d, for a one-sample t-Test is a very similar measure that does not depend on sample size
and has the following formula:
d = |x_bar – Constant| / s
A test’s Effect Size can be quite large even though the test does not achieve statistical significance due to
small sample size.
If the t Value has already been calculated, the Effect Size can be quickly calculated by the following
formula:
d = |t Value| / SQRT(n)
The d measured here is Cohen’s d for a one-sample t-Test. The Effect Size is a standardized measure of
size of the difference that the t-Test is attempting to detect. The Effect Size for a one-sample t-Test is a
measure of that difference in terms of the number of sample standard deviations. Note that sample size
has no effect on Effect Size. Effect size values for the one-sample t-Test are generalized into the
following size categories:
d = 0.2 up to 0.5 = small Effect Size
d = 0.5 up to 0.8 = medium Effect Size
d = 0.8 and above = large Effect Size
In this example, the Effect Size is calculated as follows:
d = |x_bar – Constant| / s = |200,000 – 186,000| / 29,736.68 = 0.471
An effect size of d = 0.471 is considered to be a small effect.
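Using the numbers from this example, both the direct Effect Size formula and its shortcut via the t Value can be verified in Python:

```python
import math

# Values from this example's one-sample t-Test of monthly sales
x_bar, constant, s, n = 200_000.0, 186_000.0, 29_736.68, 20

d = abs(x_bar - constant) / s          # Cohen's d, about 0.471

# If the t Value is already known, d can also be computed as |t|/SQRT(n)
t_value = (x_bar - constant) / (s / math.sqrt(n))
d_from_t = abs(t_value) / math.sqrt(n)
```

The two routes agree exactly, because t = (x_bar – Constant)/(s/SQRT(n)) divided by SQRT(n) reduces algebraically to (x_bar – Constant)/s.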
Bring up G*Power’s initial screen and input the following information:
Test family: t-Tests
Statistical test: Difference from constant (one-sample case)
Type of power analysis: Post hoc – Compute achieved power –given α, sample size, and effect size
Number of Tails = 2
Effect Size (d) = 0.471
Alpha (α) = 0.05
Sample Size (n) = 20
The completed dialogue screen appears as follows:
Clicking Calculate would produce the following output:
The Power achieved for this test is 0.5645. This means that the current two-tailed test has a 56.45
percent chance of detecting a difference that has an effect size of 0.471 if α = 0.05 and n = 20.
It is often desirable to plot a graph of sample size versus achieved Power for the given Effect Size and
alpha. This can be done by clicking the button X-Y plot for a range of values and then clicking Draw
Plot on the next screen that comes up. This will produce the following output:
This would indicate that a Power of 80 percent would be achieved for this test if the sample size were
approximately n = 34.
Both the Wilcoxon One-Sample Signed-Rank Test and the Sign Test are nonparametric alternatives
to the one-sample t-Test. The Sign Test is non-directional and can be substituted only for a two-tailed test
but not for a one-tailed test.
The Wilcoxon test is based upon the sum of rankings of values while the Sign Test is based upon the
sum of positive versus negative values.
The Wilcoxon One-Sample Signed-Rank Test is much more powerful (better able to detect a difference)
than the Sign Test but has a required assumption that the sample data are distributed about a median in a
relatively symmetric fashion. The Sign Test does not have this assumption.
Step 1) Calculate the Difference Between Each Sample Data Point and the Constant to Which the
Sample Is Being Compared.
The original Null Hypothesis from the one-sample t-Test stated that the mean monthly retail sales for the
stores in a single region is equal to the national average, which is 186,000. The Null Hypothesis for this t-
Test was as follows:
H0: x_bar = Constant =186,000
A difference sample consisting of the differences between each sample data point and the Constant
(186,000) is created as follows:
The Alternative Hypothesis is non-directional because the test’s overall purpose is to determine only
whether or not the regional mean monthly retail sales equals the national average of 186,000. The
Alternative Hypothesis for this Wilcoxon One-Sample, Signed-Rank Test will therefore be stated as
follows:
H1: Median_Difference ≠ Constant = 0
H1: Median_Difference ≠ 0
Step 3) Evaluate Whether the Test’s Required Conditions Have Been Met
The Wilcoxon One-Sample, Signed-Rank Test has the following requirements:
a) Data are ratio or interval but not categorical (nominal or ordinal). This is the case here.
b) Sample size is at least 10.
c) Data of the Difference sample are distributed about a median with reasonable symmetry. Test Statistic
W will not be normally distributed unless this assumption is met.
The following Excel-generated histogram shows that the difference data are distributed symmetrically
about their median of 14,000:
This histogram and the sample’s median were generated in Excel as follows:
Step 4 – Record the Sign of Each Difference
Place a “+1” or a “-1” next to each non-zero difference. This can be automatically generated with an If-
Then-Else statement as follows:
Displaying a plus sign (+) next to a number requires a custom number format, available from the Format
Cell dialogue box. One custom format that will work is the following: “+”#;“-”# . This is demonstrated in the
following Excel screen shot:
Step 5 – Sort the Absolute Values of the Differences While Retaining the Sign Associated With
Each Difference
Sort both columns based upon the column of difference absolute values.
Step 6 – Rank the Absolute Values, Attach the Signs, and Sum up the Signed Ranks to Create Test
Statistic W.
The absolute values are ranked in ascending order starting with a rank of 1. Absolute values that are tied
are assigned the average rank of the tied values. For example, the first four absolute values are 6000.
Each of these four absolute values would be assigned a rank of 2.5, which is equal to the average rank of
all four, i.e., (1 + 2 + 3 + 4) / 4 = 2.5.
Test Statistic W is equal to the sum of all signed ranks.
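The ranking-with-ties procedure can be sketched as a small Python function. The difference sample below is hypothetical (not the book’s data), chosen so that four tied absolute values of 6000 each receive the average rank of 2.5, exactly as described above.

```python
def signed_rank_sum(differences):
    # Rank the absolute values of the non-zero differences in ascending
    # order, averaging the ranks of ties, then sum the signed ranks (W).
    nonzero = [d for d in differences if d != 0]
    abs_sorted = sorted(abs(d) for d in nonzero)
    avg_rank = {}
    i = 0
    while i < len(abs_sorted):
        j = i
        while j < len(abs_sorted) and abs_sorted[j] == abs_sorted[i]:
            j += 1
        avg_rank[abs_sorted[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    return sum((1 if d > 0 else -1) * avg_rank[abs(d)] for d in nonzero)

# Hypothetical difference sample with a four-way tie at |6000|
diffs = [6000, -6000, 6000, 6000, 8000, -10000]
W = signed_rank_sum(diffs)
```

Here the four values with absolute value 6000 all get rank 2.5, 8000 gets rank 5, and 10000 gets rank 6, so W is the sum of the signed ranks.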
Step 7 – Calculate the z Score of W
The distribution of Test Statistic W can be approximated by the normal distribution if all of the required
assumptions for this test are met. The difference data consists of more than 10 points of ratio data that
are reasonably symmetrically distributed about their median. The assumptions are therefore met for this
Wilcoxon One-Sample, Signed-Rank Test.
The standard deviation of W, σW, is calculated as follows:
σW = SQRT[ n(n + 1)(2n + 1)/6 ] = 53.57
z Score = ( W – Constant – 0.5) / σW
z Score = ( 110 – 0 – 0.5) / 53.57 = 2.04
The constant is the Constant from the Null Hypothesis for this test, which is the following:
H0: Median_Difference = Constant = 0
The z Score must include a 0.5 correction for continuity because W assumes whole integer values
(except in the event of a tie of ranks).
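These calculations can be verified in a couple of lines of Python using the values from this example (W = 110, n = 20):

```python
import math

# Values from this example's Wilcoxon One-Sample, Signed-Rank Test
W = 110
n = 20

sigma_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 6)  # about 53.57
z_score = (W - 0 - 0.5) / sigma_w                   # 0.5 = continuity correction
```

The subtracted 0 is the Constant from the Null Hypothesis, and the 0.5 is the continuity correction described above.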
Step 8 – Reject or Fail to Reject the Null Hypothesis Based Upon a Comparison Between the z
Score and the Critical z Value
Given that α = 0.05 and this is a two-tailed test, the Critical z Value is calculated as follows:
Z Criticalα=0.05,Two-Tailed = ±NORM.S.INV(1 – α/2) = ±NORM.S.INV(0.975)
Z Criticalα=0.05,Two-Tailed = ±1.9599
The Null Hypothesis is rejected if the z Score is further from the standardized mean of zero than the
Critical z Values. This is the case here since the z Score (2.04) is further from the standardized mean of
zero than the Critical z Values (±1.9599). These results from the Wilcoxon Signed-Rank Test are shown
in the following Excel-generated graph:
Rejection of the Null Hypothesis for this test can be interpreted to state that there is at least 95 percent
certainty that the median of the difference sample does not equal zero. This would mean that there is 95
percent certainty that the median monthly sales of the retail stores in the region does not equal the
national average of 186,000.
The results of this Wilcoxon One-Sample, Signed-Rank Test were very similar to the results of the original
one-sample t-Test in which the Null Hypothesis was rejected because the t value (2.105) was further from
the standardized mean of zero than the Critical t Value (2.093). The results of this t-Test indicate 95
percent certainty that the mean monthly sales of the retail stores in the region does not equal the national
average of 186,000.
The results of the t-Test are shown in the following Excel-generated graph of this non-standardized t
Distribution:
The Wilcoxon One-Sample Signed-Rank Test detects that the median difference between the region’s
retail store monthly sales and the national average is significant at an alpha level of 0.05.
The one-sample t- Test detects that the mean difference between the region’s retail store monthly sales
and the national average is significant at an alpha level of 0.05.
Sign Test in Excel
The Sign Test along with the Wilcoxon One-Sample Signed-Rank Test are nonparametric alternatives to
the one-sample t-Test when the normality of the sample or population cannot be verified and the sample
size is small.
The Wilcoxon One-Sample Signed-Rank Test is significantly more powerful than the Sign Test but has a
requirement of symmetrical distribution about a median for the difference sample data (the data set of the
sample points minus the Constant of the Null Hypothesis). The Wilcoxon One-Sample Signed-Rank Test
is based upon a normal approximation of its Test Statistic’s distribution. This requires that the difference
sample be reasonably symmetrically distributed about a median.
The Sign Test has no requirements regarding the distribution of data but, as mentioned, is significantly
less powerful than the Wilcoxon One-Sample Signed-Rank Test.
The Sign Test counts the number of positive and negative non-zero differences between the sample data
and the Constant from the Null Hypothesis in the one-sample t-Test. In this case that Constant = 186,000
because the Null Hypothesis of the two-tailed t-Test is as follows:
H0: x_bar = Constant = 186,000
This difference sample is calculated as follows:
A count of positive and negative differences in this sample is taken as follows:
The minimum count of positive or negative non-zero differences is designated as the Test Statistic W for
this One-Sample Sign Test. Test Statistic W is named after Frank Wilcoxon who developed the test.
The objective of the two-tailed, one-sample t-Test was to determine whether to reject or fail to reject the
Null Hypothesis that states that the mean monthly sales of retail stores in the one region is equal to the
national average, which is 186,000.
If the region’s mean store sales is equal to 186,000, then the probability of the monthly sales of any store
in the region minus 186,000 being positive (greater than zero) is the same as the probability of being
negative (less than zero). This probability is 50 percent.
Without knowing whether positive outcomes or negative outcomes are being counted, the probability of
the mean monthly sales of the region’s stores being 186,000 is equal to the probability of a positive
outcome (p) being 50 percent OR the probability of a negative outcome (q) being 50 percent.
The Null Hypothesis for this two-tailed, one-sample Sign Test states that the probability of a difference
being positive (p) OR the probability of a difference being negative (q) is 50 percent. This can be
expressed as follows:
H0: p=0.5 OR q=0.5
which would be expressed as follows:
H0: p=0.5 ∪ q=0.5
The Alternative Hypothesis would state the following:
H1: p≠0.5 ∩ q≠0.5
Each non-zero difference is classified as either positive or negative. This is a binary event because the
classification of each difference has only two possible outcomes: the non-zero difference is either positive
or negative.
The distribution of the outcomes of this binary event can be described by the binomial distribution as long
as the following conditions exist:
1) Each binary trial is independent.
2) The data from which the differences are derived are at least ordinal. The data can be ratio, interval,
ordinal, but not nominal. The differences of “less than” and “greater than” must be meaningful even if the
amount of difference is not, as would be the case with ordinal data but not with nominal data.
3) Each binary trial has the same probability of a positive outcome.
All of these conditions are met because of the following:
1) Each sample taken is independent of any other sample.
2) The differences are derived from continuous (either ratio or interval) data.
3) The proportion of positive differences versus negative differences is assumed to be constant in the
population from which the sample of differences was derived.
The counts of the positive and negative differences both follow the binomial distribution. The binary event
to be analyzed will be one of the two, i.e., either the count of positive differences OR the count of the
negative differences. The conservative choice will be made by selecting the count that has the lowest
number.
This count, whether it is the count of positive differences or the count of negative differences, is
designated as W, the Test Statistic. This Test Statistic follows the binomial distribution because W
represents the count of positive or negative outcomes of independent binary events that all have the
same probability of a positive outcome.
As stated, the Null Hypothesis of this two-tailed, one-sample Sign Test is the following:
H0: p=0.5 ∩ q=0.5
The Null Hypothesis would be rejected if the p Value calculated from this test is less than alpha, which is
customarily set at 0.05.
The logical operator OR represents the union of sets. The probability of Event A OR Event B occurring,
when the two events are mutually exclusive, equals the sum of the probabilities of each occurring
individually.
Pr(A ∪ B) = Pr(A) + Pr(B)
The p Value of this test represents the probability that p = 0.5 given that the count of positive differences
is less than or equal to W OR q = 0.5 given that the count of negative differences is less than or equal
to W. Test Statistic W can represent either the count of positive OR negative differences and is set to the
difference type that has the lower count.
The p value equals the probability that p = 0.5 if W equals UP TO the count of positive differences OR the
probability that q = 0.5 if W equals UP TO the count of negative differences.
This p Value is expressed as follows:
p Value =
Pr (p = No. of Positive Differences ≤ W |p=0.5,n = No. of Non-Zero Differences)
∪
Pr (q = No. of Negative Differences ≤ W |p=0.5,n = No. of Non-Zero Differences)
Since Pr(A ∪ B) = Pr(A) + Pr(B) for mutually exclusive events,
p Value =
Pr (p = No. of Positive Differences ≤ W |p=0.5,n = 20 = No. of Non-Zero Differences)
+
Pr (q = No. of Negative Differences ≤ W |p=0.5,n =20 = No. of Non-Zero Differences)
Given that variable x is binomially distributed, the CDF (Cumulative Distribution Function) of the x ≤ X is
calculated in Excel as follows:
F(X;n,p) = BINOM.DIST(X, n, p, 1)
This calculates the probability that up to X number of positive outcomes will occur in n total binary trials if
the probability of a positive outcome is p for every trial. “1” specifies that the Excel formula will calculate
the CDF and not the PDF.
Therefore the following can be calculated:
Pr (p = No. of Positive Differences ≤ W |p=0.5, n = Total No. of Non-Zero Differences) =
= BINOM.DIST(W, n, p,1)
= BINOM.DIST(7,20,0.5,1) = 0.1316
Pr (q = No. of Negative Differences ≤ W |q=0.5,n =20 = No. of Non-Zero Differences)
= 1 - BINOM.DIST(n - W - 1, n, q,1)
= 1 - BINOM.DIST(12,20,0.5,1) = 0.1316
Due to the symmetry of the binomial distribution, the following is true:
BINOM.DIST(W, n, p,1) = 1 - BINOM.DIST(n - W - 1, n, q,1)
p Value = BINOM.DIST(W, n, p,1) + [1 - BINOM.DIST(n - W - 1, n, q,1)]
p Value = 2 * BINOM.DIST(W, n, p,1)
p Value = 2 * BINOM.DIST(7,20,0.5,1) = 2 * 0.1316 = 0.2632
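Excel’s BINOM.DIST cumulative form can be reproduced exactly with the standard library’s math.comb, which makes the Sign Test p Value easy to verify:

```python
from math import comb

def binom_cdf(x, n, p):
    # Equivalent of Excel's BINOM.DIST(x, n, p, 1): the probability of
    # x or fewer positive outcomes in n independent binary trials.
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

# Values from this example: W = 7 minority-sign differences out of n = 20
W, n = 7, 20
one_tail = binom_cdf(W, n, 0.5)  # about 0.1316
p_value = 2 * one_tail           # two-tailed Sign Test p Value, about 0.2632
```

Because p = q = 0.5 makes the binomial distribution symmetric, doubling the single-tail probability gives the two-tailed p Value directly.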
This is shown in the following Excel-generated graph of the PDF of the binomial distribution for this sign
test. The parameters of this binomial distribution are Total Trials = N = 20 and the Probability of a Positive
Outcome of Each Trial, p, equal 0.5. The Probability of a Negative Outcome, q, also equals 0.5.
This total p Value (0.2632 = 0.1316 + 0.1316) is larger than alpha (set at 0.05). The Null Hypothesis is
therefore not rejected at this alpha level. The Null Hypothesis for this test can be interpreted to state that
the median difference is equal to zero. This would be equivalent to stating that the median monthly retail
sales for the region is equal to the national average, which is 186,000.
This example demonstrates how much less powerful the one-sample Sign Test is than the one-sample t-
Test or the one-sample Wilcoxon Signed-Rank Test. The Sign Test did not come close to detecting a
difference at the same alpha level that the other two tests did.
2) Two-Independent-Sample, Pooled t-Test in Excel
Overview
This hypothesis test evaluates two independent samples to determine whether the difference between the
two sample means (x_bar1 and x_bar2) is equal to (two-tailed test) or else greater than or less than (one-
tailed test) a constant. This is a Pooled test because a single pooled standard deviation replaces both
sample standard deviations, which are similar enough to justify pooling.
x_bar1 - x_bar2 = Observed difference between the sample means
Pooled t-Tests are performed if the variances of both sample groups are similar. A rule of thumb is as
follows: a Pooled t-Test should be performed if the standard deviation of one sample, s1, is no more than
twice as large as the standard deviation of the other sample, s2. That is the case here for the following
example.
dfpooled = degrees of freedom = n1 + n2 – 2
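The pooled calculation can be sketched in Python. The pooled standard deviation formula below is the standard degrees-of-freedom-weighted combination of the two sample variances; in the usage line, x_bar1 = 43.56, x_bar2 = 33.53, s2 = 15.28, n1 = 16, and n2 = 17 come from this example, while s1 = 14.0 is a hypothetical placeholder (Brand A’s standard deviation is not repeated here).

```python
import math

def pooled_t(x_bar1, s1, n1, x_bar2, s2, n2, constant=0.0):
    # Pooled standard deviation combines both sample variances,
    # weighted by their degrees of freedom (n - 1 for each sample).
    df = n1 + n2 - 2                                  # df_pooled = n1 + n2 - 2
    s_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    se = s_pooled * math.sqrt(1.0 / n1 + 1.0 / n2)    # SE of the difference
    t_value = (x_bar1 - x_bar2 - constant) / se
    return t_value, df

# Summary statistics from this example, with s1 = 14.0 assumed for illustration
t, df = pooled_t(43.56, 14.0, 16, 33.53, 15.28, 17)
```

Note that s1 = 14.0 and s2 = 15.28 satisfy the rule of thumb above, since neither standard deviation is more than twice the other.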
Example of 2-Sample, 1-Tailed, Pooled t-Test in Excel
In this example two different brands of the same type of battery are being tested to determine whether
there is a real difference in the average length of time that batteries from each of the two brands last.
The length of each battery’s lifetime of operation in minutes was recorded. Determine with 95 percent
certainty whether Brand A batteries have a longer average lifetime than Brand B batteries.
Here are the data samples from batteries of the two brands:
Running the Excel data analysis tool Descriptive Statistics separately on each sample group produces the
following output:
Note that when performing two-sample t-Tests in Excel, always designate Sample 1 (Variable 1) to be the
sample with the larger mean.
The results of the Pooled t-Test will be more intuitive if the sample group with the larger mean is
designated as the first sample and the sample group with the smaller mean is designated as the second
sample.
Another reason for designating the sample group with the larger mean as the first sample is to obtain the
correct result from the Excel data analysis tool t-Test: Two-Sample Assuming Equal Variances. The
test statistic (T Stat in the Excel output) and the Critical t value (t Critical two-tail in the Excel output) will
have the same sign (as they always should) only if the sample group with the larger mean is designated
the first sample.
Sample Group 2 – Brand B (Variable 2)
x_bar2 = sample2 mean = AVERAGE() = 33.53
µ2 (Greek letter “mu”) = population mean from which Sample 2 was drawn = Not Known or needed to
solve this problem
s2 = sample2 standard deviation =STDEV.S() = 15.28
Var2 = sample2 variance =VAR() = 233.39
σ2 (Greek letter “sigma”) = population standard deviation from which Sample 2 was drawn = Not Known
or needed to solve this problem
n2 = sample2 size = COUNT() = 17
x_bar1 - x_bar2 = 43.56 – 33.53 = 10.03
Level of Certainty = 0.95
Alpha = 1 - Level of Certainty = 1 – 0.95 = 0.05
As with all Hypothesis Tests of Mean, we must satisfactorily answer these two questions and then
proceed to the four-step method of solving the hypothesis test that follows.
The Initial Two Questions That Need To Be Answered Before Performing the Four-Step Hypothesis Test
of Mean are as follows:
c) Independent or Dependent (Paired) Test?
It is an unpaired test because data observations in each sample group are completely unrelated to data
observations in the other sample group. The designation of “paired” or “unpaired” applies only for two-
sample hypothesis tests.
e) t-Test or as a z-Test?
A two-independent-sample hypothesis test of mean must be performed as a t-Test if sample size is small
(n1 + n2 < 40). In this case the sample size is small as n1 + n2 = 33. This Hypothesis Test of Mean must
be performed as a t-Test. A t-Test uses the t distribution and not the normal distribution as does a z-Test.
The Null Hypothesis of an F Test states that the variances of the two groups are the same. The p Value
shown in the Excel F Test output equals 0.345. This is much larger than the Alpha (0.05) that is typically
used for an F Test so the Null Hypothesis cannot be rejected.
We therefore conclude as a result of the F Test that the variances are the same. The F Test is sensitive to
non-normality of data. The sample variances can also be compared using the nonparametric Levene’s
Test and also the nonparametric Brown-Forsythe Test.
Levene’s Test involves performing Single-Factor ANOVA on the groups of distances to the mean. This
can be easily implemented in Excel by applying the Excel data analysis tool ANOVA: Single Factor.
Applying this tool on the above data produces the following output:
The Null Hypothesis of Levene’s Test states that the average distance to the mean is the same for the two
groups. Failure to reject this Null Hypothesis would imply that the sample groups have the same
variances. The p Value shown in the Excel ANOVA output equals 0.6472. This is much larger than the
Alpha (0.05) that is typically used for an ANOVA test, so the Null Hypothesis cannot be rejected.
We therefore conclude as a result of Levene’s Test that the variances are the same or, at least, that we
don’t have enough evidence to state that the variances are different. Levene’s Test is sensitive to outliers
because it relies on the sample mean, which can be unduly affected by outliers. A very similar
nonparametric test called the Brown-Forsythe Test relies on sample medians and is therefore much less
affected by outliers than Levene’s Test is and much less affected by non-normality than the F Test is.
Brown-Forsythe Test For Sample Variance Comparison in Excel
The Brown-Forsythe Test is a hypothesis test commonly used to test for the equality of variances of two
or more sample groups. The Null Hypothesis of the Brown-Forsythe Test states that the average distance to the
sample median is the same for each sample group. Acceptance of this Null Hypothesis implies that the
variances of the sampled groups are the same. The distance to the median for each data point of both
samples is shown as follows:
The Brown-Forsythe Test involves performing Single-Factor ANOVA on the groups of distances to the
median. This can be easily implemented in Excel by applying the Excel data analysis tool ANOVA:
Single Factor. Applying this tool on the above data produces the following output:
The Null Hypothesis of the Brown-Forsythe Test states that the average distance to the median is the
same for both groups. Acceptance of this Null Hypothesis would imply that the sample groups have
the same variances. The p Value shown in the Excel ANOVA output equals 0.6627. This is much larger
than the Alpha (0.05) that is typically used for an ANOVA Test so the Null Hypothesis cannot be rejected.
We therefore conclude as a result of the Brown-Forsythe Test that the variances are the same or, at
least, that we don’t have enough evidence to state that the variances are different.
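The Brown-Forsythe procedure can be sketched the same way; this is a minimal sketch using scipy, with illustrative placeholder data rather than the book’s samples. The only change from Levene’s Test is that distances are measured to the group median, and scipy’s levene() with center='median' performs exactly this test:

```python
# Brown-Forsythe Test: the same procedure as Levene's Test except distances
# are measured to the group MEDIAN, which blunts the influence of outliers.
# Sample values are illustrative placeholders, not the book's data.
import statistics
from scipy import stats

brand_a = [52, 60, 55, 47, 65, 49, 58, 62, 51, 57, 44, 66, 53, 59, 48, 61]
brand_b = [40, 35, 42, 30, 45, 38, 33, 44, 29, 41, 36, 47, 32, 39, 43, 28, 37]

def distances_to_median(sample):
    """Absolute distance of each data point to its group median."""
    med = statistics.median(sample)
    return [abs(x - med) for x in sample]

# ANOVA: Single Factor on the distances-to-median is the Brown-Forsythe Test.
f_stat, p_anova = stats.f_oneway(distances_to_median(brand_a),
                                 distances_to_median(brand_b))

# scipy's levene() with center='median' is Brown-Forsythe and matches exactly.
w_stat, p_bf = stats.levene(brand_a, brand_b, center='median')

print(p_anova, p_bf)
```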
Each of the above tests can be considered relatively equivalent to the others. The variances of both
sample groups are verified to be similar enough to permit using a Pooled test for this two-independent
sample hypothesis test.
This hypothesis test is a t-Test that is two-independent-sample, one-tailed, Pooled hypothesis test
of mean.
To perform a hypothesis test that is based on the normal distribution or t distribution, both sample means
must be normally distributed. In other words, if we took multiple samples just like either one of the two
mentioned here, the means of those samples would have to be normally distributed in order to be able to
perform a hypothesis test that is based upon the normal or t distributions.
For example, 30 independent, random samples of the battery lifetimes from each of the two battery
brands could be evaluated just like the single sample of the lifetimes of 15+ batteries from each of the two
battery brands as mentioned here. If the means of all of the 30 samples from one battery brand and,
separately, the means of the other 30 samples from the other battery brand are normally distributed, a
hypothesis test based on the normal or t distribution can be performed on the two independent samples
taken.
The means of the samples would be normally distributed if any of the following are true:
c) Independence of Samples
This type of a hypothesis test requires both samples be totally independent of each other. In this case
they are completely independent. There is no relationship between the observations that make up each of
the two sample groups.
Histogram in Excel
To create this histogram in Excel, fill in the Excel Histogram dialogue box as follows:
To create this histogram in Excel, fill in the Excel Histogram dialogue box as follows:
Both sample groups appear to be distributed reasonably closely to the bell-shaped normal distribution. It
should be noted that bin size in an Excel histogram is manually set by the user. This arbitrary setting of
the bin sizes can have a significant influence on the shape of the histogram’s output. Different bin sizes
could result in an output that would not appear bell-shaped at all. What is actually set by the user in an
Excel histogram is the upper boundary of each bin.
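The caveat above can be demonstrated outside Excel as well. The sketch below, using Python’s numpy library with illustrative generated data, bins the same sample two different ways; every observation is counted exactly once in each case, but the charted shapes can differ substantially:

```python
# Demonstrates the caveat above: the same data can look bell-shaped or not
# depending on the bin boundaries chosen. Like Excel's Histogram tool,
# numpy.histogram lets the user set the bin boundaries explicitly.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=10, size=200)  # illustrative normal data

# Five wide bins vs. twenty-five narrow bins over the same data.
coarse_counts, coarse_edges = np.histogram(data, bins=5)
fine_counts, fine_edges = np.histogram(data, bins=25)

# Every observation falls into exactly one bin regardless of the binning,
# but the two shapes can look quite different when charted.
print(coarse_counts.sum(), fine_counts.sum())
```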
Normal Probability Plot in Excel
Another way to graphically evaluate normality of each data sample is to create a normal probability plot
for each sample group. This can be implemented in Excel and appears as follows:
Normal probability plots for both sample groups show that the data appear to be very close to
normally distributed. The actual sample data (red) closely match the values that would be expected if the
sample were perfectly normally distributed (blue) and never cross the 95 percent confidence interval
boundaries (green).
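A normal probability plot can also be produced with Python’s scipy library; this is a minimal sketch with illustrative generated data, not the book’s samples. The plot pairs each ordered sample value with the value expected at that position under perfect normality, and the correlation r of the straight-line fit gauges how close to normal the data are:

```python
# A normal probability plot pairs each ordered sample value with the value
# expected at that position under perfect normality; near-normal data falls
# on a straight line. scipy.stats.probplot returns those coordinates plus a
# straight-line fit whose correlation r gauges normality.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=40, scale=8, size=30)  # illustrative sample

(theoretical_q, ordered_values), (slope, intercept, r) = stats.probplot(sample)

# r close to 1 means the points hug the line, i.e. approximate normality.
print(round(r, 4))
```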
Kolmogorov-Smirnov Test For Normality in Excel
The Kolmogorov-Smirnov Test is a hypothesis test that is widely used to determine whether a data
sample is normally distributed. The Kolmogorov-Smirnov Test calculates the distance between the
Cumulative Distribution Function (CDF) of each data point and what the CDF of that data point would be if
the sample were perfectly normally distributed. The Null Hypothesis of the Kolmogorov-Smirnov Test
states that the distribution of actual data points matches the distribution that is being tested. In this case
the data sample is being compared to the normal distribution.
The largest distance between the CDF of any data point and its expected CDF is compared to
Kolmogorov-Smirnov Critical Value for a specific sample size and Alpha. If this largest distance exceeds
the Critical Value, the Null Hypothesis is rejected and the data sample is determined to have a different
distribution than the tested distribution. If the largest distance does not exceed the Critical Value, we
cannot reject the Null Hypothesis, which states that the sample has the same distribution as the tested
distribution.
F(Xk) = CDF(Xk) for normal distribution
F(Xk) = NORM.DIST(Xk, Sample Mean, Sample Stan. Dev., TRUE)
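The same calculation can be sketched in Python with scipy; the lifetime values below are illustrative placeholders, not the book’s data. Note that because the normal parameters are estimated from the sample itself (as the NORM.DIST call above does), the standard Kolmogorov-Smirnov critical values are conservative; the Lilliefors variant is the stricter correction:

```python
# Kolmogorov-Smirnov distance as computed above: the largest gap between the
# sample's empirical CDF and the normal CDF built from the sample's own mean
# and standard deviation (the Excel NORM.DIST call). Data are illustrative
# placeholders, not the book's battery lifetimes.
import statistics
from scipy import stats

lifetimes = [41, 38, 45, 50, 36, 42, 47, 39, 44, 48, 35, 43, 46, 40, 49]
mean = statistics.mean(lifetimes)
stdev = statistics.stdev(lifetimes)  # sample standard deviation, like STDEV.S

# d_stat is the largest CDF gap; compare it to the K-S Critical Value.
d_stat, p_value = stats.kstest(lifetimes, 'norm', args=(mean, stdev))
print(d_stat, p_value)
```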
Variable 1 - Brand A Battery Lifetimes
Variable 2 - Brand B Battery Lifetimes
The Null Hypothesis Stating That the Data Are Normally Distributed Cannot Be Rejected
The Null Hypothesis for the Kolmogorov-Smirnov Test for Normality, which states that the sample data
are normally distributed, is rejected only if the maximum difference between the expected and actual CDF
of any of the data points exceeds the Critical Value for the given n and α. That is not the case here.
The Max Difference Between the Actual and Expected CDF for Variable 1 (0.0885) and for Variable 2
(0.1007) are significantly less than the Kolmogorov-Smirnov Critical Values for n = 20 (0.29) and for n = 15
(0.34) at α = 0.05, so the Null Hypothesis of the Kolmogorov-Smirnov Test cannot be rejected for either of
the two sample groups.
Anderson-Darling Test For Normality in Excel
The Anderson-Darling Test is a hypothesis test that is widely used to determine whether a data sample is
normally distributed. The Anderson-Darling Test calculates a test statistic based upon the actual value of
each data point and the Cumulative Distribution Function (CDF) of each data point if the sample were
perfectly normally distributed.
The Anderson-Darling Test is considered to be slightly more powerful than the Kolmogorov-Smirnov test
for the following two reasons:
The Kolmogorov-Smirnov test is distribution-free, i.e., its critical values are the same for all distributions
tested. The Anderson-Darling test requires critical values calculated for each tested distribution and is
therefore more sensitive to the specific distribution.
The Anderson-Darling test gives more weight to values in the outer tails than the Kolmogorov-Smirnov
test. The K-S test is less sensitive to aberrations in outer values than the A-D test.
If the test statistic exceeds the Anderson-Darling Critical Value for a given Alpha, the Null Hypothesis is
rejected and the data sample is determined to have a different distribution than the tested distribution. If
the test statistic does not exceed the Critical Value, we cannot reject the Null Hypothesis, which states
that the sample has the same distribution as the tested distribution.
F(Xk) = CDF(Xk) for normal distribution
F(Xk) = NORM.DIST(Xk, Sample Mean, Sample Stan. Dev., TRUE)
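scipy also provides the Anderson-Darling test directly; this is a minimal sketch with illustrative placeholder data rather than the book’s samples. The function estimates the normal parameters from the sample and returns the test statistic alongside the distribution-specific critical values the text describes:

```python
# Anderson-Darling test for normality via scipy: the mean and standard
# deviation are estimated from the sample, and the A-squared statistic is
# returned with critical values at the 15%, 10%, 5%, 2.5% and 1% levels.
# Data are illustrative placeholders, not the book's battery lifetimes.
from scipy import stats

lifetimes = [41, 38, 45, 50, 36, 42, 47, 39, 44, 48, 35, 43, 46, 40, 49]

result = stats.anderson(lifetimes, dist='norm')
print(result.statistic)
for crit, sig in zip(result.critical_values, result.significance_level):
    # Reject normality at the sig% level if the statistic exceeds crit.
    print(sig, crit)
```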
Variable 1 – Brand A Battery Lifetimes
Variable 2 - Brand B Battery Lifetimes
Shapiro-Wilk Test For Normality in Excel
The Shapiro-Wilk Test is a hypothesis test that is widely used to determine whether a data sample is
normally distributed. A test statistic W is calculated. If this test statistic is less than a critical value of W for
a given level of significance (alpha) and sample size, the Null Hypothesis which states that the sample is
normally distributed is rejected.
The Shapiro-Wilk Test is a robust normality test and is widely used because of its slightly superior
performance against other normality tests, especially with small sample sizes. Superior performance
means that it correctly rejects the Null Hypothesis (that the data are normally distributed) when the data
are in fact not normal a slightly higher percentage of the time than most other normality tests, particularly
at small sample sizes.
The Shapiro-Wilk normality test is generally regarded as being slightly more powerful than the Anderson-
Darling normality test, which in turn is regarded as being slightly more powerful than the Kolmogorov-
Smirnov normality test.
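The Shapiro-Wilk W statistic and its p value can be obtained from scipy as well; this is a minimal sketch with illustrative placeholder data, not the book’s samples:

```python
# Shapiro-Wilk test via scipy: W near 1 supports normality; the Null
# Hypothesis of normality is rejected when the p value falls below alpha.
# Data are illustrative placeholders, not the book's battery lifetimes.
from scipy import stats

lifetimes = [41, 38, 45, 50, 36, 42, 47, 39, 44, 48, 35, 43, 46, 40, 49]

w_stat, p_value = stats.shapiro(lifetimes)
print(w_stat, p_value)
```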
Variable 1 – Brand A Battery Life
Test Statistic W (0.972027) is larger than W Critical 0.887. The Null Hypothesis therefore cannot be
rejected. There is not enough evidence to state that the data are not normally distributed with a
confidence level of 95 percent.
Correctable Reasons That Normal Data Can Appear Non-Normal
If a normality test indicates that data are not normally distributed, it is a good idea to do a quick evaluation
of whether any of the following factors have caused normally-distributed data to appear to be non-
normally-distributed:
1) Outliers – Too many outliers can easily skew normally-distributed data. An outlier can often be
removed if a specific cause of its extreme value can be identified. Some outliers are expected in normally-
distributed data.
2) Data Has Been Affected by More Than One Process – Variations to a process such as shift changes
or operator changes can change the distribution of data. Multiple modal values in the data are common
indicators that this might be occurring. The effects of different inputs must be identified and eliminated
from the data.
3) Not Enough Data – Normally-distributed data will often not assume the appearance of normality until
at least 25 data points have been sampled.
4) Measuring Devices Have Poor Resolution – Sometimes (but not always) this problem can be solved
by using a larger sample size.
5) Data Approaching Zero or a Natural Limit – If a large number of data values approach a limit such
as zero, calculations using very small values might skew computations of important values such as the
mean. A simple solution might be to raise all the values by a certain amount.
6) Only a Subset of a Process’ Output Is Being Analyzed – If only a subset of data from an entire
process is being used, a representative sample is not being collected. Normally-distributed results would
not appear normally distributed if a representative sample of the entire process is not collected.
Step 1 – Create Null and Alternate Hypotheses
The Null Hypothesis is always an equality and states that the items being compared are the same. In this
case, the Null Hypothesis would state that the average optimism scores for both sample groups are the
same. We will use the variable x_bar1-x_bar2 to represent the difference between the means of the two
groups. If the mean scores for both groups are the same, then the difference between the two means,
x_bar1-x_bar2, would equal zero. The Null Hypothesis is as follows:
H0: x_bar1-x_bar2 = Constant = 0
The Alternate Hypothesis is always an inequality and states that the two items being compared are
different. This hypothesis test is trying to determine whether the mean of the population from which the
first sample (x_bar1) was taken is greater than the mean of the population from which the second sample
was taken (x_bar2). The Alternate Hypothesis is as follows:
H1: x_bar1-x_bar2 > Constant, which is 0
H1: x_bar1-x_bar2 > 0
The Alternative Hypothesis is directional (“greater than” or “less than” instead of “not equal”) and the
hypothesis test is therefore a one-tailed test. The “greater than” operator in the Alternative hypothesis
indicates that this one-tailed test occurs in the right tail. It should be noted that a two-tailed test is more
rigorous (it requires a greater difference between the two entities being compared before the test shows
that there is a difference) than a one-tailed test.
The following formulas are used by the Two-Independent Sample, Pooled t-Test:
sPooled = SQRT[ { (n1-1)*s1² + (n2-1)*s2² } / df ]
sPooled = SQRT[ { (16-1)*(16.915)² + (17-1)*(15.277)² } / 31 ]
sPooled = 16.09
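The same calculation can be reproduced in a few lines of Python using the book’s figures (s1 = 16.915, s2 = 15.277, n1 = 16, n2 = 17):

```python
# Pooled standard deviation exactly as in the formula above, using the
# book's figures: s1 = 16.915, s2 = 15.277, n1 = 16, n2 = 17, df = 31.
import math

n1, s1 = 16, 16.915
n2, s2 = 17, 15.277
df = n1 + n2 - 2  # 31

s_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
print(round(s_pooled, 2))  # 16.09
```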
Pooled Sample Standard Error
The t Distribution is usually presented in its finalized form with standardized values of a mean that equals
zero and a standard error that equals one. The horizontal axis is given in units of Standard Errors and the
distributed variable is the t Value (the Test Statistic) as follows:
A non-standardized t Distribution curve would simply have its horizontal axis given in units of the measure
used to take the samples. The distributed variable would be the difference between the sample means, x_bar1-x_bar2.
The variable x_bar1-x_bar2 is distributed according to the t Distribution. Mapping this distributed variable
to a t Distribution curve is shown as follows:
This 5 percent Alpha (Region of Rejection) is entirely contained in the outer right tail. The operator in the
Alternative Hypothesis determines whether the hypothesis test is two-tailed or one-tailed and, if one-tailed,
which outer tail. The Alternative Hypothesis is as follows:
H1: x_bar1-x_bar2 > 0
A “greater than” or “less than” operator indicates that this will be a one-tailed test. The “greater than” sign
indicates that the Region of Rejection will be in the right tail.
The boundaries between Regions of Acceptance and Regions of Rejection are called Critical Values. The
locations of these Critical Values need to be calculated.
If this were a two-tailed test, the Critical values would be determined as follows:
One-Tailed (Right Tail) Critical t Value = T.INV(1-α,df)
One-Tailed (Right Tail) Critical t Value = T.INV(1-0.05, 31) = 1.696
This Critical t Value places the boundary of the Region of Rejection 1.696 standard errors to the right of the Constant, which is 0.
The t Value (1.790) is farther from the standardized mean of zero than the Critical t Value (1.696) so the
Null Hypothesis is rejected.
The value of x_bar1-x_bar2, 10, has a t Value of 1.79 and is therefore 1.79 standard errors from the mean.
This is farther from the mean than the Critical Value of 9.6, which lies at the Critical t distance of 1.69
standard errors from the mean.
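Excel’s T.INV(1-α, df) call above can be reproduced with scipy’s t-distribution quantile function; a minimal sketch using the book’s α = 0.05 and df = 31:

```python
# The one-tailed Critical t Value from Excel's T.INV(1 - alpha, df),
# reproduced as the t-distribution quantile in scipy.
from scipy import stats

alpha, df = 0.05, 31
critical_t = stats.t.ppf(1 - alpha, df)
print(round(critical_t, 3))  # 1.696
```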
It should be noted that if this t-Test were a two-tailed test, which is more stringent than a one-tailed test,
the Null Hypothesis would be accepted because:
1) The p Value (0.042) would now be larger than Alpha/2 (0.025)
2) x_bar1-x_bar2 (10.033) would now be in the Region of Acceptance, which would now have its outer
right boundary at 11.43 (mean + T.INV(1-α/2,df)*SE)
Following are screen shots of how the data should be entered:
Clicking OK will produce the following result. This result agrees with the calculations that were performed
in this section.
The calculations to create the preceding output were performed as follows. The individual outputs are
color-coded so it is straight-forward to match the calculations with the outputs of the tool.
Excel Statistical Function Shortcut
Another very quick way to perform this t-Test is to calculate the p value and compare it to Alpha (for a
one-tailed test) or Alpha/2 for a two-tailed test.
The p Value of this two-independent-sample, Pooled t-Test can be calculated very quickly using the
following Excel statistical function:
=T.TEST(array1,array2,1,2)
Before this test is employed, all required assumptions, such as normality of data, must be verified as was
done previously.
The stand-alone Excel formula to perform a two-independent sample, pooled t-Test is shown as follows. If
the resulting p Value is smaller than α for a one-tailed test or α/2 for a two-tailed test, the difference
between the means of the samples is deemed to be statistically significant. This indicates that the two
samples were likely drawn from different populations.
The Null Hypothesis of the t-Test would not be rejected if the test were two-tailed because the p Value
(0.042) is greater than Alpha/2 (0.025). The Null Hypothesis of the t-Test would be rejected if the test were
one-tailed because the p Value (0.042) is less than Alpha (0.05). A one-tailed test is less rigorous than a
two-tailed test.
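A Python equivalent of the =T.TEST(array1,array2,1,2) shortcut is scipy’s ttest_ind with equal variances assumed; this is a minimal sketch, and the data below are illustrative placeholders rather than the book’s samples. The alternative='greater' argument (available in scipy 1.6 and later) returns the one-tailed p value in the right tail; omit it for the two-tailed p value:

```python
# Python equivalent of Excel's =T.TEST(array1, array2, 1, 2): a pooled
# (equal-variance) two-independent-sample t-Test. Data are illustrative
# placeholders, not the book's samples.
from scipy import stats

brand_a = [52, 60, 55, 47, 65, 49, 58, 62, 51, 57, 44, 66, 53, 59, 48, 61]
brand_b = [40, 35, 42, 30, 45, 38, 33, 44, 29, 41, 36, 47, 32, 39, 43, 28, 37]

t_stat, p_one_tailed = stats.ttest_ind(brand_a, brand_b,
                                       equal_var=True,        # pooled test
                                       alternative='greater') # right tail
print(t_stat, p_one_tailed)
```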
Effect Size in Excel
Effect size in a t-Test is a convention of expressing how large the difference between two groups is
without taking into account the sample size and whether that difference is significant.
Effect size of Hypotheses Tests of Mean is usually expressed in measures of Cohen’s d. Cohen’s d is a
standardized way of quantifying the size of the difference between the two groups. This standardization of
the size of the difference (the effect size) enables classification of that difference in relative terms of
“large,” “medium,” and “small.”
A large effect would be a difference between two groups that is easily noticeable with the measuring
equipment available. A small effect would be a difference between two groups that is not easily noticed.
Effect size for a two-independent-sample, pooled t-Test is a method of expressing the distance between
the difference between sample mean, x_bar1-x_bar2, and the Constant in a standardized form that does
not depend on the sample size.
Remember that the Test Statistic (the t Value) for a two-independent-sample t-Test is calculated by the
following formula:
t Value = (x_bar1 - x_bar2 - Constant) / SE
which equals
t Value = (x_bar1 - x_bar2 - Constant) / [ sPooled * SQRT(1/n1 + 1/n2) ]
since
SE = sPooled * SQRT(1/n1 + 1/n2)
Since degrees of freedom for a two-independent-sample, pooled t-Test equals the following:
df = n1 + n2 – 2
The t Value specifies the number of Standard Errors that the differences between sample means, x_bar 1-
x_bar2, is from the Constant. The t Value is dependent upon the sample size, n. The t Value determines
whether the test has achieved statistical significance and is dependent upon sample size. Achieving
statistical significance means that the Null Hypothesis (H 0: x_bar1-x_bar2 = Constant = 0) has been
rejected.
The t Value for a two-independent-sample, pooled t-Test is calculated as follows:
The Effect Size, d, for a two-independent-sample, pooled t-Test is a very similar measure that does not
directly depend on sample size and has the following formula:
d = |x_bar1 - x_bar2 - Constant| / sPooled
spooled pools the sample standard deviations based upon the proportion of combined samples that each of
the sample sizes n1 and n2 represent and not the absolute values of n1 and n2. spooled is therefore not
directly dependent on sample sizes n1 and n2.
A test’s Effect Size can be quite large even though the test does not achieve statistical significance due to
small sample size.
If the t Value has already been calculated, the Effect Size can be quickly calculated by the following
formula:
The d measured here is Cohen’s d for a two-independent-sample, pooled t-Test. The Effect Size is a
standardized measure of size of the difference that the t-Test is attempting to detect. The Effect Size for a
two-independent-sample, pooled t-Test is a measure of that difference in terms of the number of sample
standard deviations. Note that sample size has no effect on Effect Size. Effect size values for the two-
independent-sample, pooled t-Test are generalized into the following size categories:
d = 0.2 up to 0.5 = small Effect Size
d = 0.5 up to 0.8 = medium Effect Size
d = 0.8 and above = large Effect Size
In this example, the Effect Size is calculated as follows:
d = |x_bar1 - x_bar2 – Constant| / spooled = |43.56 – 33.53 – 0| / 16.09 = 0.623
An Effect Size of d = 0.623 is considered to be a medium effect.
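The Cohen’s d calculation above can be reproduced directly from the book’s figures:

```python
# Cohen's d from the book's figures, matching the calculation above:
# x_bar1 = 43.56, x_bar2 = 33.53, Constant = 0, sPooled = 16.09.
x_bar1, x_bar2, constant = 43.56, 33.53, 0
s_pooled = 16.09

d = abs(x_bar1 - x_bar2 - constant) / s_pooled
print(round(d, 3))  # 0.623
```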
http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/
Screen shots will show how to use this utility to calculate the Power for this example and also to provide a
graph of Sample Size vs. Achieved Power for this example as follows:
As mentioned, the four variables that are required in order to determine Power for a two-independent-sample
t-Test are Alpha (α), Effect Size (d), Sample Size (n), and the Number of Tails.
Bring up G*Power’s initial screen and input the following information:
Test family: t-Tests
Statistical test: Means: Difference between two independent means (two groups)
Type of power analysis: Post hoc – Compute achieved power –given α, sample size, and effect size
Number of Tails = 1
Effect Size (d) = 0.623
Alpha (α) = 0.05
Sample Sizes (n1 = 16 and n2 = 17)
The completed dialogue screen appears as follows:
Clicking Calculate would produce the following output:
The Power achieved for this test is 0.5416. This means that the current one-tailed test has a 54.16
percent chance of detecting a difference that has an effect size of 0.623 if α = 0.05, n1 = 16, and n2 = 17.
It is often desirable to plot a graph of sample size versus achieved Power for the given Effect Size and
alpha. This can be done by clicking the button X-Y plot for a range of values and then clicking Draw
Plot on the next screen that comes up. This will produce the following output:
This would indicate that a Power of 80 percent would be achieved for this test if the total sample size
were equal to approximately n1 + n2 = 65.
Nonparametric Alternatives in Excel
The Mann-Whitney U Test is a nonparametric test that can be substituted for the two-sample t-Test (either
pooled or unpooled) when the following circumstances occur:
1) Normality of at least one sample or one population cannot be verified and sample size is small.
2) The data are ordinal. A t-Test requires that the data be either ratio or interval but not ordinal. The Mann-
Whitney U Test requires only that the data be at least ordinal so that all of the data can be ranked. The
specific difference between data values does not have to be measurable.
3) Either one of the sample groups has significant outliers. The Mann-Whitney U Test is based upon the
rankings of data values and is therefore much less affected by outliers than a t-Test, which is based on
sample means.
The two-independent-sample t-Test compares the means of the two samples to determine if the means of
the two populations are significantly different. The two populations are those from which the two samples
were taken.
1) The data are at least ordinal so that the data can be ranked. Differences between sample data points
do not have to be measurable.
2) All data observations are independent of each other.
3) The sum of sample sizes, n1 + n2, equals at least 20.
4) Both samples have similar distribution shapes. A histogram of each sample will display the shape of
the data’s distribution.
If these assumptions are met, the Test Statistic U will have an approximately normal distribution. The
Mann-Whitney U test is based upon the Test Statistic U being approximately normally distributed. Test
Statistic U is the sum of the ranks of the data in one of the two samples.
Following are the data from the two data samples that will be compared in this Mann-Whitney U Test:
Step 1 – Evaluate Whether the Required Assumptions Are Met
The required assumptions for the Mann-Whitney U Test are as follows:
1) The data are at least ordinal so that the data can be ranked. Differences between sample data points
do not have to be measurable.
2) All data observations are independent of each other.
3) The sum of sample sizes, n1 + n2, equals at least 20.
4) Both samples have similar distribution shapes. A histogram of each sample will display the shape of
the data’s distribution.
The first three assumptions have clearly been met. Histograms of each sample group have to be created
to determine if both samples have similar distribution shapes. The following Excel histograms show that
the sample groups have reasonably similar distribution shapes:
Step 2 – Create the Null and Alternative Hypotheses
The purpose of the original t-Test was to determine with 95 percent certainty whether Brand A batteries
have a longer average lifetime than Brand B batteries. The Null and Alternative Hypotheses for this t-Test
are the following:
H0: x_bar1-x_bar2 = 0
H1: x_bar1-x_bar2 > 0
The “greater than” operator in the Alternative Hypothesis indicates that this is a one-tailed test in the right
tail.
The two-independent-sample t-Test compares the means of the two samples to determine if the means of
the two populations are significantly different. The two populations are those from which the two samples
were taken.
The Mann-Whitney U Test performs a similar evaluation by comparing the ranks of one sample group to
the average ranks of both sample groups to determine if the ranks of each of the two populations are
significantly different. The two populations are those from which the two samples were taken.
Just as with a two-independent-sample t-Test, the Mann-Whitney U Test can be performed as a two-
tailed test or as a one-tailed test. The Alternative Hypothesis specifies which tail(s) the test will be
focused on.
The Null Hypothesis for this Mann-Whitney U Test is as follows:
H0: U = Uaverage
There is one notable difference between a one-tailed t-Test and a one-tailed Mann-Whitney U Test. A
one-tailed t-Test can be performed in either the left or the right tail. A one-tailed Mann-Whitney U Test will
always be performed in the left tail regardless of which sample is expected to have the larger rank sum.
The Alternative Hypothesis for this one-tailed test in the left tail is the following:
H1: U < Uaverage
The reason that a one-tailed Mann-Whitney U Test is always performed in the left tail is that the Test
Statistic U is always less than Uaverage (which is the average of U1 and U2) because Test Statistic U is set
to equal the smaller of the two adjusted sums of ranks, U1 and U2, for the two groups.
Uaverage = (U1 + U2)/2
It should be noted that the one-tailed t-Test was performed in the right tail. This one-tailed Mann-Whitney
U Test is performed in the left tail.
Test Statistic U is calculated in the following steps.
Step 3 – Combine All of the Data Into a Single Column
Make sure that each data point has its group name in an adjacent cell. This will be necessary to return
the data back to the original groups.
Step 4 – Sort All of the Data
Step 5 – Rank All of the Data
Ties (data that have the same values) are assigned the rank that is the average rank for all of the tied
values. For example, the two tied data values of 26 would have been assigned the ranks of 9 and 10 if
they were not tied. Since they are tied, they are both assigned the average rank of 9.5.
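The tie-averaging rule above is exactly what scipy’s rankdata function does by default; a minimal sketch with a small illustrative list:

```python
# Tied values share the average of the rank positions they occupy, exactly
# as in the example above (two tied values at positions 9 and 10 each get
# rank 9.5). scipy.stats.rankdata averages tied ranks by default.
from scipy import stats

values = [10, 30, 30, 20]
ranks = list(stats.rankdata(values))
print(ranks)  # [1.0, 3.5, 3.5, 2.0]
```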
Step 6 – Return the Data to the Original Two Groups
Sort all three columns simultaneously according to the column that contains the name of the original
group to which each data value belongs.
Step 7 – Calculate R and n For Each Sample Group
R equals the sum of the ranks for each group and n is the sample size of each group.
Step 8 – Calculate U1 and U2
U1 and U2 are adjusted rank sums for the two groups.
U1 = R1 – n1(n1 + 1)/2
U1 = 314 – 16(16 + 1)/2 = 178
U2 = R2 – n2(n2 + 1)/2
U2 = 247 – 17(17 + 1)/2 = 94
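The adjusted rank sums can be checked with a short helper using the book’s totals (R1 = 314, n1 = 16; R2 = 247, n2 = 17):

```python
# Adjusted rank sums from the book's totals: R1 = 314 with n1 = 16 and
# R2 = 247 with n2 = 17.
def adjusted_rank_sum(r, n):
    """U = R - n(n + 1)/2."""
    return r - n * (n + 1) // 2

u1 = adjusted_rank_sum(314, 16)
u2 = adjusted_rank_sum(247, 17)
print(u1, u2)  # 178 94
```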
Step 10 – Calculate the Mean and Standard Deviation of U
U_bar = (U1 + U2) / 2 = n1*n2 / 2 = 136
sU = SQRT( n1 * n2 * (n1 + n2+ 1) / 12) = 27.76
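These formulas, and the z Score compared in Step 12, can be reproduced with the book’s values:

```python
# Mean and standard deviation of U and the z Score compared in Step 12,
# using the book's values U1 = 178, U2 = 94, n1 = 16, n2 = 17.
import math

n1, n2 = 16, 17
u1, u2 = 178, 94

u_bar = (u1 + u2) / 2                          # 136.0, equals n1*n2/2
s_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # approx. 27.76
u = min(u1, u2)                                # U is the smaller of U1, U2
z_score = (u - u_bar) / s_u                    # approx. -1.513
print(round(s_u, 2), round(z_score, 3))
```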
Step 12 – Determine Whether or Not to Reject the Null Hypothesis by Comparing the z Score to the
Critical z Value
The Null Hypothesis is rejected if the z Score is farther from the standardized normal distribution’s mean
of zero than the Critical z Value. This is not the case here because the z Score (-1.5130) is closer to the
standardized mean of zero than the Critical z Value (-1.6448). There is not enough evidence to reject the
Null Hypothesis at an alpha level of 0.05.
This one-tailed, left tail, Mann-Whitney U Test was not sensitive enough to detect a difference at α = 0.05.
The Null Hypothesis, which states that the adjusted rank sum of one of the groups is not different than the
average adjusted rank sum of both groups, is not rejected. The rankings of the data in each group are not
found to be significantly different at an alpha level of 0.05. The two populations from which the samples
were taken are not assumed to have different rankings. This one-tailed Mann-Whitney U Test did not
detect a difference in the two populations based on the two samples taken from the populations. This
information is shown in the following Excel-generated graph:
The equivalent one-tailed, right tail, two-independent-sample t-Test was sensitive enough to detect a
difference at α = 0.05. The t value (1.79) was further from the standardized mean of zero than the Critical
t Value (1.69). The Null Hypothesis of this t-Test, which states that the means of both populations are not
different, is rejected. This one-tailed t-Test did detect a difference in the two populations based on the two
samples taken from the populations. This information is shown in the following Excel-generated graph:
You may have noticed that the p Value (red region in the chart) appears in the left tail in the Mann-
Whitney U Test but appears in the right tail in the t-Test graph directly above.
It should be noted that the Mann-Whitney U Test always has its p Value (red region in the graph) in its
left tail. This is due to the negative z Score. The z Score calculated in the Mann-Whitney U Test will
always be negative because Test Statistic U is always set to equal the smaller of U1 and U2.
The formula for this z Score is the following:
z Score = (U – U_bar)/ sU
This z Score is negative because U is always less than U_bar.
The p Value for a t-Test can appear in the right or left tail because the t Value of a t-Test can be positive or
negative. The t Value in this t-Test is positive because the formula for the t Value is the following:
t Value = (x_bar1 - x_bar2 - Constant) / SE
x_bar1 = 43.56
x_bar2 = 33.53
Constant = 0
This t Value is positive because x_bar1 - x_bar2 - Constant is positive.
Nonparametric tests generally have less power (ability to detect a difference) than their parametric
equivalents. One way to increase the likelihood that a nonparametric test will detect a difference is to
increase alpha. Increasing alpha decreases the required level of certainty because of the following
relationship:
Alpha = 1 – Level of Required Certainty
If alpha were doubled from a value of α = 0.05 to α = 0.10, the Critical z Value is changed from a value of
-1.6448 to -1.2816. The Mann-Whitney U Test would have detected a difference in this case.
How Sample Standard Deviation Affects t-Test Results
When the standard deviation of the sample groups is increased, the sample groups become harder to tell apart. This
might be more intuitive to understand if presented visually.
Below are box plots of three sample groups each having a small sample standard deviation:
Each of the sample groups is visually easy to differentiate from the others. The measures of spread -
standard deviation and variance - are shown for each sample group. Remember that variance equals
standard deviation squared.
If each sample group’s spread is increased (widened), the sample groups become much harder to
differentiate from each other. The graph shown below is of three sample groups having the same means
as above but much wider spread.
It is easy to differentiate the sample groups in the top graph but much less easy to differentiate the
sample groups in the bottom graph simply because the sample groups in the bottom graph have much
wider spread.
In statistical terms, one could say that it is easy to tell that the samples in the top graph were drawn from
different populations. It is much more difficult to say whether the sample groups in the bottom graph were
drawn from different populations.
Relationship Between the Two-Independent-Sample, Pooled t-Test and Single-Factor ANOVA
The preceding illustrates the underlying principle behind both t-tests and ANOVA tests. One of the main
purposes of both t-tests and ANOVA tests is to determine whether samples are from the same
populations or from different populations. The variance (or equivalently, the standard deviation) of the
sample groups is what determines how difficult it is to tell the sample groups apart.
The two-independent-sample, pooled t-test is essentially the same test as single-factor ANOVA. The two-
independent-sample, pooled t-test can only be applied to two sample groups at one time. Single-Factor
ANOVA can be applied to three or more groups at one time. Both the two-independent-sample, pooled t-test and single-factor ANOVA require that the variances of the sample groups be similar.
We will apply both the two-independent sample t-test and single-factor ANOVA to the first two samples in
each of the above graphs to verify that the results are equivalent.
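The equivalence can also be checked outside Excel. The following Python sketch (with hypothetical data, not the book's samples) computes both the pooled two-sample t statistic and the single-factor ANOVA F statistic for two groups and confirms that F equals t squared:

```python
from statistics import mean, variance

def pooled_t(a, b):
    # Pooled two-independent-sample t statistic (assumes similar variances)
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    se = (sp2 * (1 / n1 + 1 / n2)) ** 0.5
    return (mean(a) - mean(b)) / se

def anova_f(groups):
    # Single-factor ANOVA F: between-group variation / within-group variation
    all_data = [x for g in groups for x in g]
    grand = mean(all_data)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_data) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

a = [1, 2, 3, 4]   # hypothetical sample group 1
b = [3, 4, 5, 6]   # hypothetical sample group 2

t = pooled_t(a, b)
F = anova_f([a, b])
print(t ** 2, F)   # for exactly two groups, F = t squared
```

For two groups the two tests are mathematically identical, which is why their p Values match in the Excel outputs discussed in this section.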
Sample Groups With Small Variances (the first graph)
Applying a two-independent-sample, pooled t-test to the first two of the three sample groups of this graph
would produce the following result:
This result would have been obtained by filling in the Excel dialogue box as follows:
Running Single-Factor ANOVA on those same two sample groups would produce this result:
This result would have been obtained by filling in the Excel dialogue box as follows:
Both the Two-Independent-Sample, Pooled t-test and the Single-Factor ANOVA test produce the same
result when applied to these two sample groups. They both produce the same p Value (1.51E-10) which
is extremely small. This indicates that the result is statistically significant and that the difference in the
means of the two groups is real. More precisely, there is only a very small probability (1.51E-10) of observing a difference in sample means this large if both samples had actually been drawn from the same population.
Sample Groups With Large Variances (the second graph)
Applying a two-independent sample t-test to the first two of the three sample groups in this graph would
produce the following result:
This result would have been obtained by filling in the Excel dialogue box as follows:
Running Single-Factor ANOVA on those same two sample groups would produce this result:
This result would have been obtained by filling in the Excel dialogue box as follows:
Both the t-test and the ANOVA test produce the same result when applied to these two sample groups.
They both produce the same p Value (0.230876). This is relatively large. 95 percent is the standard level
of confidence usually required in statistical hypothesis tests to conclude that the results are statistically
significant (real). The p value needs to be less than 0.05 to achieve a 95 percent confidence level that a
difference really exists. The sample groups with the large spread produced a p Value greater than 0.05
and we can therefore not reject the Null Hypothesis which states that the sample groups are the same.
The results are not statistically significant and we cannot conclude that the two samples were not drawn
from the same population.
Showing How the Formulas For Both the t-Test and for ANOVA Produce the Same Result
t-Test Formula
The Two-Independent-Sample, Pooled t-Test is used to determine with a specific degree of certainty
whether there really is a difference between the mean values of two sample groups given a similar
amount of variance in each of the two sample groups.
If the sample standard deviation in each of the two sample groups, s1 and s2, is large, then the Pooled Standard Deviation will also be large, as can be seen from the following equation:
ANOVA
The ANOVA outputs of the previous two comparisons demonstrate the following:
The smaller the p Value is, the more certainty exists that sample groups are really different, i.e., that the
sample groups came from different populations.
The p Value is derived from the F value. The larger the F Value, the smaller is the p Value.
The F value can be roughly described as being the variation between groups divided by the variation
within groups (the spread of the groups).
As the spread (standard deviation) of the sample groups increases, the F Value becomes smaller. As the F Value becomes smaller, the p Value becomes larger. The larger the p Value becomes, the less certainty exists that the ANOVA results are statistically significant (real). If the results are not statistically significant, we cannot reject the Null Hypothesis, which states that the sample groups are the same (drawn from the same population).
Bottom line: the larger the standard deviation of sample groups being compared with a two-independent-sample, pooled t-Test or single-factor ANOVA, the harder it is to state that the sample groups are truly different, i.e., that the sample groups come from different populations.
3) Two-Independent-Sample, Unpooled t-Test in Excel
Overview
This hypothesis test evaluates two independent samples to determine whether the difference between the two sample means (x_bar1 and x_bar2) is equal to (two-tailed test) or else greater than or less than (one-tailed test) a constant. This is an unpooled test because a single pooled standard deviation CANNOT replace both sample standard deviations, which are too different from each other.
x_bar1 - x_bar2 = Observed difference between the sample means
Unpooled t-Tests are performed if the variances of the two sample groups are not similar. A rule of thumb is as follows: a Pooled t-Test should only be performed if the standard deviation of one sample, s1, is no more than twice as large as the standard deviation of the other sample, s2. An unpooled t-Test should be performed if that condition is not met.
Null Hypothesis H0: x_bar1 - x_bar2 = Constant
The Null Hypothesis is rejected if any of the following equivalent conditions are shown to exist:
1) The observed x_bar1 - x_bar2 is beyond the Critical Value.
2) The t Value (the Test Statistic) is farther from zero than the Critical t Value.
3) The p value is smaller than α for a one-tailed test or α/2 for a two-tailed test.
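As a sketch of what the unpooled calculation involves, the following Python function computes the unpooled (Welch) t Value and the Welch-Satterthwaite approximation of the degrees of freedom. The data below are hypothetical, not the book's samples:

```python
from statistics import mean, variance

def unpooled_t(a, b):
    # Unpooled (Welch) t-Test: each sample keeps its own variance
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a), variance(b)
    # Per the rule of thumb above, this form is used when one sample
    # standard deviation is more than twice the other.
    se = (v1 / n1 + v2 / n2) ** 0.5
    t = (mean(a) - mean(b)) / se
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (v1 / n1 + v2 / n2) ** 2 / (
        (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
    )
    return t, df

a = [10, 12, 14, 16, 18]   # hypothetical sample 1 (larger spread)
b = [11, 11, 12, 12]       # hypothetical sample 2 (smaller spread)
t, df = unpooled_t(a, b)
print(round(t, 3), round(df, 1))
```

Note that the unpooled degrees of freedom are usually not a whole number; Excel's t-Test: Two-Sample Assuming Unequal Variances rounds them for its output.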
Example of 2-Sample, 2-Tailed, Unpooled t-Test in Excel
This problem is very similar to the problem solved in the z-test section for a two-independent-sample, two-
tailed z-test. Similar problems were used in each of these sections to show the similarities and also
contrast the differences between the two-independent-sample z-Test and t-test as easily as possible.
Two shifts on a production line are being compared to determine if there is a difference in the average daily
number of units produced by each shift. The two shifts operate eight hours per day under nearly identical
conditions that remain fairly constant from day to day. A sample of the total number of units produced by
each shift on a random selection of days is taken. Determine with a 95 percent Level of Confidence if
there is a difference between the average daily number of units produced by the two shifts.
Here is the sampled data as follows:
Running the Excel data analysis tool Descriptive Statistics separately on each sample group produces the
following output:
Note that when performing two-sample t-Tests in Excel, always designate Sample 1 (Variable 1) to be the
sample with the larger mean.
The results of the Unpooled t-Test will be more intuitive if the sample group with the larger mean is
designated as the first sample and the sample group with the smaller mean is designated as the second
sample.
Another reason for designating the sample group with the larger mean as the first sample is to obtain the
correct result from the Excel data analysis tool t-Test: Two-Sample Assuming Unequal Variances. The
test statistic (T Stat in the Excel output) and the Critical t value (t Critical two-tail in the Excel output) will
have the same sign (as they always should) only if the sample group with the larger mean is designated
the first sample.
Sample Group 2 – Shift B (Variable 2)
x_bar2 = sample2 mean = AVERAGE() = 42.24
µ2 (Greek letter “mu”) = population mean from which Sample 2 was drawn = Not Known
s2 = sample2 standard deviation =STDEV.S() = 11.80
Var2 = sample2 variance =VAR() = 139.32
σ2 (Greek letter “sigma”) = population standard deviation from which Sample 2 was drawn = Not Known
n2 = sample2 size = COUNT() = 17
x_bar1 - x_bar2 = 46.55 – 42.24 = 4.31
Level of Certainty = 0.95
Alpha = 1 - Level of Certainty = 1 – 0.95 = 0.05
As with all Hypothesis Tests of Mean, we must satisfactorily answer these two questions and then
proceed to the four-step method of solving the hypothesis test that follows.
The initial two questions that need to be answered before performing the Four-Step Hypothesis Test of
Mean are as follows:
d) One-Tailed or Two-Tailed Test?
The problem asks to determine whether there is simply a difference in the average number of daily units produced by Shift A and by Shift B. This is a non-directional inequality, making this hypothesis test a two-tailed test. If the problem asked to determine whether Shift A's production is greater than or less than Shift B's, the inequality would be directional and the resulting hypothesis test would be a one-tailed test. A two-tailed test is more stringent than a one-tailed test.
e) t-Test or z-Test?
A two-independent-sample hypothesis test of mean must be performed as a t-Test if the sample size is small (n1 + n2 < 40). In this case the sample size is small because n1 + n2 = 37. This Hypothesis Test of Mean must therefore be performed as a t-Test. A t-Test uses the t distribution rather than the normal distribution, which is used by a z-Test.
The Null Hypothesis of an F Test states that the variances of the two groups are the same. The p Value
shown in the Excel F Test output equals 0.002. This is much smaller than the Alpha (0.05) that is typically
used for an F Test, so the Null Hypothesis can be rejected. The p Value indicates that there is only a 0.2 percent chance of obtaining this result if the Null Hypothesis is true.
We therefore conclude as a result of the F Test that the variances are not the same. The F Test is sensitive to non-normality of the data. The sample variances can also be compared using the more robust Levene's Test and Brown-Forsythe Test.
Levene’s Test involves performing Single-Factor ANOVA on the groups of distances to the mean. This
can be easily implemented in Excel by applying the Excel data analysis tool ANOVA: Single Factor.
Applying this tool on the above data produces the following output:
The Null Hypothesis of Levene's Test states that the average distance to the mean is the same for both groups. Rejection of this Null Hypothesis would imply that the sample groups have different variances. The p Value shown in the Excel ANOVA output equals 0.0025. This is much smaller than the Alpha (0.05) that is typically used for an ANOVA test, so the Null Hypothesis should be rejected.
We therefore conclude as a result of Levene's Test that the variances are different. Levene's Test is sensitive to outliers because it relies on the sample mean, which can be unduly affected by outliers. A very similar test called the Brown-Forsythe Test relies on sample medians instead and is therefore much less affected by outliers than Levene's Test, and much less affected by non-normality than the F Test.
Brown-Forsythe Test For Sample Variance Comparison in Excel
The Brown-Forsythe Test is a hypothesis test commonly used to test for the equality of variances of two or more sample groups. The Null Hypothesis of the Brown-Forsythe Test states that the average distance to the sample median is the same for each sample group. Failure to reject this Null Hypothesis implies that the variances of the sampled groups are the same. The distance to the median for each data point of both samples is shown as follows:
The Brown-Forsythe Test involves performing Single-Factor ANOVA on the groups of distances to the
median. This can be easily implemented in Excel by applying the Excel data analysis tool ANOVA:
Single Factor. Applying this tool on the above data produces the following output:
The Null Hypothesis of the Brown-Forsythe Test states that the average distance to the median is the same for both groups. Failure to reject this Null Hypothesis would imply that the sample groups have the same variances. The p Value shown in the Excel ANOVA output equals 0.0033. This is much smaller than the Alpha (0.05) that is typically used for an ANOVA test, so the Null Hypothesis should be rejected.
We therefore conclude as a result of the Brown-Forsythe Test that the variances are not the same.
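The Brown-Forsythe procedure described above (one-way ANOVA on the absolute distances to each group's median) can be sketched in Python as follows. The two groups here are hypothetical stand-ins for the shift data:

```python
from statistics import mean, median

def brown_forsythe_f(groups):
    # ANOVA F computed on |x - group median|, per the Brown-Forsythe Test.
    # (Using |x - group mean| instead would give Levene's Test.)
    dists = [[abs(x - median(g)) for x in g] for g in groups]
    all_d = [d for grp in dists for d in grp]
    grand = mean(all_d)
    ss_between = sum(len(d) * (mean(d) - grand) ** 2 for d in dists)
    ss_within = sum((x - mean(d)) ** 2 for d in dists for x in d)
    df_between = len(groups) - 1
    df_within = len(all_d) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

g1 = [1, 2, 3]      # hypothetical low-spread group
g2 = [10, 20, 30]   # hypothetical high-spread group
F = brown_forsythe_f([g1, g2])
print(round(F, 3))
```

The resulting F statistic would be converted to a p Value exactly as in any single-factor ANOVA, which is why the Excel ANOVA: Single Factor tool can perform this test directly on the distance columns.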
Each of the above tests can be considered relatively equivalent to the others. We conclude that the variances of the two sample groups are dissimilar enough to require an unpooled test for this two-independent-sample hypothesis test.
This hypothesis test is therefore a two-independent-sample, two-tailed, unpooled t-Test of mean.
To perform a hypothesis test that is based on the normal distribution or t distribution, both sample means
must be normally distributed. In other words, if we took multiple samples just like either one of the two
mentioned here, the means of those samples would have to be normally distributed in order to be able to
perform a hypothesis test that is based upon the normal or t distributions.
For example, 30 independent, random samples of the daily production from each of the two shifts could
be evaluated just like the single sample of units produced from 15+ production days from each of the two
shifts as mentioned here. If the means of all of the 30 samples from one shift and, separately, the means
of the other 30 samples from the other shift are normally distributed, a hypothesis test based on the
normal or t distribution can be performed on the two independent samples taken.
The means of the samples would be normally distributed if any of the following are true:
3) Both Samples Are Normally Distributed
If the sample is normally distributed, the means of other similar-sized, independent, random samples will
also be normally distributed. Normality testing must be performed on the sample to determine whether the
sample is normally distributed.
In this case the sample size of both samples is small: n1 and n2 are both less than 30. The normal distribution of both sample means must therefore be confirmed by performing normality testing on each of the samples.
c) Independence of Samples
This type of hypothesis test requires that both samples be totally independent of each other. In this case they are completely independent.
Histogram in Excel
Excel histograms of both sample groups are as follows:
To create this histogram in Excel, fill in the Excel Histogram dialogue box as follows:
To create this histogram in Excel, fill in the Excel Histogram dialogue box as follows:
Both sample groups appear to be distributed reasonably close to the bell-shaped normal distribution. It should be noted that bin size in an Excel histogram is manually set by the user. This arbitrary setting of the bin sizes can have a significant influence on the shape of the histogram's output. Different bin sizes could result in an output that would not appear bell-shaped at all. What is actually set by the user in an Excel histogram is the upper boundary of each bin.
Normal Probability Plot in Excel
Another way to graphically evaluate normality of each data sample is to create a normal probability plot
for each sample group. This can be implemented in Excel and appears as follows:
Normal probability plots for both sample groups show that the data appear to be very close to normally distributed. The actual sample data (red) matches very closely the values that would be expected if the sample were perfectly normally distributed (blue) and never goes beyond the 95 percent confidence interval boundaries (green).
Kolmogorov-Smirnov Test For Normality in Excel
The Kolmogorov-Smirnov Test is a hypothesis test that is widely used to determine whether a data
sample is normally distributed. The Kolmogorov-Smirnov Test calculates the distance between the
Cumulative Distribution Function (CDF) of each data point and what the CDF of that data point would be if
the sample were perfectly normally distributed. The Null Hypothesis of the Kolmogorov-Smirnov Test
states that the distribution of actual data points matches the distribution that is being tested. In this case
the data sample is being compared to the normal distribution.
The largest distance between the CDF of any data point and its expected CDF is compared to the Kolmogorov-Smirnov Critical Value for a specific sample size and Alpha. If this largest distance exceeds
the Critical Value, the Null Hypothesis is rejected and the data sample is determined to have a different
distribution than the tested distribution. If the largest distance does not exceed the Critical Value, we
cannot reject the Null Hypothesis, which states that the sample has the same distribution as the tested
distribution.
F(Xk) = CDF(Xk) for normal distribution
F(Xk) = NORM.DIST(Xk, Sample Mean, Sample Stan. Dev., TRUE)
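The distance calculation described above can be sketched in Python. The normal CDF below, implemented with the error function, corresponds to Excel's NORM.DIST(x, mean, sd, TRUE); the data sample is hypothetical:

```python
from math import erf, sqrt
from statistics import mean, stdev

def normal_cdf(x, mu, sigma):
    # Equivalent of Excel's NORM.DIST(x, mu, sigma, TRUE)
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def ks_distance(sample):
    # Largest distance between the empirical CDF and the fitted normal CDF
    xs = sorted(sample)
    n = len(xs)
    mu, sigma = mean(xs), stdev(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = normal_cdf(x, mu, sigma)
        d = max(d, abs(i / n - f), abs((i - 1) / n - f))
    return d

data = [1, 2, 3, 4, 5]   # hypothetical sample
D = ks_distance(data)
print(round(D, 4))        # compare against the critical value for this n and alpha
```

One caveat worth noting: because the mean and standard deviation are estimated from the sample itself, the standard Kolmogorov-Smirnov critical values are conservative in this situation.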
Variable 2 - Shift B Units Produced
The Null Hypothesis Stating That the Data Are Normally Distributed Cannot Be Rejected
The Null Hypothesis of the Kolmogorov-Smirnov Test for Normality, which states that the sample data are normally distributed, is rejected if the maximum difference between the expected and actual CDF of any of the data points exceeds the Critical Value for the given n and α.
The Max Difference Between the Actual and Expected CDF for Variable 1 (0.0938) and for Variable 2 (0.1212) are significantly less than the Kolmogorov-Smirnov Critical Values for n = 20 (0.29) and for n = 15 (0.34) at α = 0.05, so the Null Hypothesis of the Kolmogorov-Smirnov Test for each of the two sample groups cannot be rejected.
F(Xk) = CDF(Xk) for normal distribution
F(Xk) = NORM.DIST(Xk, Sample Mean, Sample Stan. Dev., TRUE)
Variable 2 - Shift B Units Produced
Reject the Null Hypothesis of the Anderson-Darling Test, which states that the data are normally distributed, if any of the following are true:
A* > 0.576 When Level of Significance (α) = 0.15
A* > 0.656 When Level of Significance (α) = 0.10
A* > 0.787 When Level of Significance (α) = 0.05
A* > 1.092 When Level of Significance (α) = 0.01
The Null Hypothesis Stating That the Data Are Normally Distributed Cannot Be Rejected
The Null Hypothesis for the Anderson-Darling Test for Normality, which states that the sample data are
normally distributed, is rejected if the Adjusted Test Statistic (A*) exceeds the Critical Value for the given
n and α.
The Adjusted Test Statistic (A*) for Variable 1 (0.253) and for Variable 2 (0.219) are significantly less than the Anderson-Darling Critical Value for α = 0.05, so the Null Hypothesis of the Anderson-Darling Test for each of the two sample groups cannot be rejected.
Shapiro-Wilk Test For Normality in Excel
The Shapiro-Wilk Test is a hypothesis test that is widely used to determine whether a data sample is
normally distributed. A test statistic W is calculated. If this test statistic is less than a critical value of W for
a given level of significance (alpha) and sample size, the Null Hypothesis which states that the sample is
normally distributed is rejected.
The Shapiro-Wilk Test is a robust normality test and is widely used because of its slightly superior performance compared to other normality tests, especially with small sample sizes. Superior performance means that it correctly rejects the Null Hypothesis of normality when the data are actually non-normal a slightly higher percentage of the time than most other normality tests, particularly at small sample sizes.
The Shapiro-Wilk normality test is generally regarded as being slightly more powerful than the Anderson-
Darling normality test, which in turn is regarded as being slightly more powerful than the Kolmogorov-
Smirnov normality test.
Correctable Reasons That Normal Data Can Appear Non-Normal
If a normality test indicates that data are not normally distributed, it is a good idea to do a quick evaluation
of whether any of the following factors have caused normally-distributed data to appear to be non-
normally-distributed:
1) Outliers – Too many outliers can easily skew normally-distributed data. An outlier can often be removed if a specific cause of its extreme value can be identified. Some outliers are expected in normally-distributed data.
2) Data Has Been Affected by More Than One Process – Variations to a process such as shift changes
or operator changes can change the distribution of data. Multiple modal values in the data are common
indicators that this might be occurring. The effects of different inputs must be identified and eliminated
from the data.
3) Not Enough Data – Normally-distributed data will often not assume the appearance of normality until
at least 25 data points have been sampled.
4) Measuring Devices Have Poor Resolution – Sometimes (but not always) this problem can be solved
by using a larger sample size.
5) Data Approaching Zero or a Natural Limit – If a large number of data values approach a limit such
as zero, calculations using very small values might skew computations of important values such as the
mean. A simple solution might be to raise all the values by a certain amount.
6) Only a Subset of a Process’ Output Is Being Analyzed – If only a subset of data from an entire process is being used, a representative sample is not being collected. Normally-distributed output would not appear normally distributed if a representative sample of the entire process is not collected.
The above questions have been satisfactorily answered. We will however perform the t-Test to
demonstrate how a two-independent-sample, unpooled t-Test is done. We now proceed to complete the
four-step method for solving all Hypothesis Tests of Mean. These four steps are as follows:
The Alternative Hypothesis is always an inequality and states that the two items being compared are different. This hypothesis test is trying to determine whether the first mean (x_bar1) is different from the second mean (x_bar2). The Alternative Hypothesis is as follows:
H1: x_bar1-x_bar2 ≠ Constant, which is 0
H1: x_bar1-x_bar2 ≠ 0
The Alternative Hypothesis is non-directional (“not equal” instead of “greater than” or “less than”) and the hypothesis test is therefore a two-tailed test. It should be noted that a two-tailed test is more rigorous (it requires a greater difference between the two entities being compared before the test shows that there is a difference) than a one-tailed test.
The following formulas are used by the Two-Independent Sample, Unpooled t-Test:
Step 2 – Map the Distributed Variable on a t-Distribution Curve
A t-Test can be performed if the sample mean and the Test Statistic (the t Value) are distributed according to the t Distribution. If the sample has passed a normality test, the sample mean and the closely-related Test Statistic are distributed according to the t Distribution.
The t Distribution always has a mean of zero and a standard error equal to one. The t Distribution varies
only in its shape. The shape of a specific t Distribution curve is determined by only one parameter: its
degrees of freedom, which equals n – 1 if n = sample size.
The means of similar, random samples taken from a normal population are distributed according to the t Distribution. This means that the distribution of a large number of means of samples of size n taken from a normal population will have the same shape as a t Distribution with its degrees of freedom equal to n – 1.
The sample mean and the Test Statistic are both distributed according to the t Distribution with degrees of
freedom equal to n – 1 if the sample or population is shown to be normally distributed. This step will map
the sample mean to a t Distribution curve with a degrees of freedom equal to n – 1.
The t Distribution is usually presented in its finalized form with standardized values of a mean that equals
zero and a standard error that equals one. The horizontal axis is given in units of Standard Errors and the
distributed variable is the t Value (the Test Statistic) as follows:
A non-standardized t Distribution curve would simply have its horizontal axis given in units of the measure used to take the samples. The distributed variable would be the sample mean difference, x_bar1 - x_bar2.
The variable x_bar1-x_bar2 is distributed according to the t Distribution. Mapping this distributed variable
to a t Distribution curve is shown as follows:
If this were a one-tailed test, the entire 5 percent Alpha (Region of Rejection) would be contained in one outer tail. The operator in the Alternative Hypothesis determines whether the hypothesis test is two-tailed or one-tailed and, if one-tailed, in which outer tail the Region of Rejection lies. The Alternative Hypothesis is as follows:
H1: x_bar1-x_bar2 ≠ 0
A “not equal” operator indicates that this will be a two-tailed test. This means that the Region of Rejection
is split between both outer tails.
The boundaries between Regions of Acceptance and Regions of Rejection are called Critical Values. The
locations of these Critical Values need to be calculated.
If this were a one-tailed test, the Critical Values would be determined as follows:
The t Value, the Test Statistic in a t-Test, is the number of Standard Errors that x_bar1 - x_bar2 is from the mean. The Critical t Value is the number of Standard Errors that the Critical Value is from the mean. If the t Value is farther from zero than the Critical t Value, the Null Hypothesis can be rejected.
t Value (test statistic) = (x_bar1 - x_bar2 - 0) / SE
t Value (test statistic) = (4.31)/6.239 = 0.69
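The arithmetic above can be checked directly. A minimal sketch using only the values already computed in this example (difference of sample means 4.31 and Standard Error 6.239):

```python
# t Value = (observed difference between sample means - hypothesized constant) / SE
x_bar_diff = 4.31   # x_bar1 - x_bar2 from this example
constant = 0        # value stated by the Null Hypothesis
se = 6.239          # unpooled Standard Error from this example

t_value = (x_bar_diff - constant) / se
print(round(t_value, 2))   # 0.69, matching the text
```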
It should be noted that if this t-Test were a one-tailed test, which is less stringent than a two-tailed test, the Null Hypothesis still could not be rejected because:
1) The p Value (0.247) is still much larger than Alpha (0.05)
2) x_bar1 - x_bar2 (4.31) is still in the Region of Acceptance, which would now have its outer right boundary at 10.61 (mean + T.INV(1-α,df)*SE)
3) the t Value (0.69) would still be smaller than the Critical t Value, which would now be 1.70 (T.INV(1-α,df))
The completed dialogue box for this tool is shown as follows:
Clicking OK will produce the following result. This result agrees with the calculations that were performed
in this section.
The calculations used to create the preceding output were performed as follows. The individual outputs are color-coded so it is straightforward to match the calculations with the outputs of the tool.
Excel Statistical Function Shortcut
The stand-alone Excel formula to perform a two-independent sample, unpooled t-Test is shown as
follows. If the resulting p Value is smaller than α for a one-tailed test or α/2 for a two-tailed test, the
difference between the means of the samples is deemed to be statistically significant. This indicates that
the two samples were likely drawn from different populations.
Pooled t-Test formulas are used when the variances of both independent sample groups are similar. The rule of thumb is that pooled t-Test formulas are used if the sample standard deviation of one of the groups is no more than twice as large as the sample standard deviation of the other group. Unpooled t-Test formulas are used if the difference between the sample standard deviations is larger.
The t Value in a pooled t-Test is calculated as follows:
The Standard Error for a pooled t-Test is calculated as follows:
In this case spooled can be derived from SEpooled with the following calculation:
sunpooled, for purposes of calculating Effect Size, can be derived from SE in the same way that spooled can, as follows:
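The pooled relationships referenced above can be sketched in Python. This is the standard textbook pooled form, shown for comparison; the sample values below are hypothetical:

```python
from math import sqrt

def pooled_se(s1, n1, s2, n2):
    # s_pooled: weighted combination of the two sample variances
    sp = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    # SE_pooled = s_pooled * sqrt(1/n1 + 1/n2)
    return sp, sp * sqrt(1 / n1 + 1 / n2)

sp, se = pooled_se(s1=2.0, n1=4, s2=2.0, n2=4)   # hypothetical values
print(sp, round(se, 5))
```

Rearranging the second formula is what allows s_pooled to be recovered from SE_pooled, as the text describes.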
The Standard Error for an unpooled t-Test is calculated as follows:
With algebraic manipulation, the formula for sunpooled can be shortened to the following formula:
sunpooled = 18.905
The t Value specifies the number of Standard Errors that the difference between the sample means, x_bar1 - x_bar2, is from the Constant. The t Value determines whether the test has achieved statistical significance and is dependent upon the sample sizes. Achieving statistical significance means that the Null Hypothesis (H0: x_bar1 - x_bar2 = Constant = 0) has been rejected.
The Effect Size, d, for a two-independent-sample, unpooled t-Test is a very similar measure that does not
directly depend on sample size and has the following formula:
sunpooled pools the sample standard deviations based upon the proportion of the combined samples that each of the sample sizes n1 and n2 represents, not the absolute values of n1 and n2. sunpooled is therefore not directly dependent on the sample sizes n1 and n2.
A test’s Effect Size can be quite large even though the test does not achieve statistical significance due to
small sample size.
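The exact weighting used by the book's worksheet is not shown in this excerpt; the sketch below assumes the proportional weighting the paragraph above describes, with each sample variance weighted by its share of the combined sample size. The inputs are hypothetical:

```python
from math import sqrt

def cohens_d_unpooled(mean_diff, s1, n1, s2, n2):
    # Assumed weighting: each variance weighted by its sample's
    # proportion of the combined sample size
    w1, w2 = n1 / (n1 + n2), n2 / (n1 + n2)
    s_unpooled = sqrt(w1 * s1 ** 2 + w2 * s2 ** 2)
    return mean_diff / s_unpooled

d = cohens_d_unpooled(mean_diff=2.0, s1=3.0, n1=10, s2=4.0, n2=10)
print(round(d, 4))
```

Because the weights are proportions, doubling both n1 and n2 leaves d unchanged, which is the sense in which Effect Size does not depend on sample size.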
The d measured here is Cohen’s d for a two-independent-sample, unpooled t-Test. The Effect Size is a standardized measure of the size of the difference that the t-Test is attempting to detect. The Effect Size for a two-independent-sample, unpooled t-Test measures that difference in terms of the number of sample standard deviations. Note that sample size has no direct effect on Effect Size. Effect Size values for the two-independent-sample, unpooled t-Test are generalized into the following size categories:
d = 0.2 up to 0.5 = small Effect Size
Power of the Test With Free Utility G*Power
The Power of a two-independent-sample t-Test is a measure of the test’s ability to detect a difference given the following parameters:
Alpha (α)
Effect Size (d)
Sample Sizes (n1 and n2)
Number of Tails
Power is defined by the following formula:
Power = 1 – β
β equals the probability of a Type 2 Error. A Type 2 Error can be described as a False Negative. A False Negative represents a test not detecting a difference when a difference does exist.
1 – β = Power = the probability of a test detecting a difference when one exists.
Power is therefore a measure of the sensitivity of a statistical test. It is common to target a Power of 0.8
for statistical tests. A Power of 0.8 indicates that a test has an 80 percent probability of detecting a
difference.
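Power can also be estimated by simulation. The following sketch repeatedly draws two hypothetical normal samples separated by a given effect size, runs an unpooled t-Test on each pair, and counts how often the Null Hypothesis is rejected. The critical t of 2.03 (two-tailed, α = 0.05, df of roughly 35) is an approximation, not an exact G*Power reproduction:

```python
import random
from statistics import mean, variance

def welch_t(a, b):
    # Unpooled (Welch) t statistic
    n1, n2 = len(a), len(b)
    se = (variance(a) / n1 + variance(b) / n2) ** 0.5
    return (mean(a) - mean(b)) / se

def simulated_power(d, n1, n2, sims=2000, t_crit=2.03, seed=1):
    # Fraction of simulated experiments in which |t| exceeds the critical value
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        a = [rng.gauss(d, 1.0) for _ in range(n1)]   # shifted by effect size d
        b = [rng.gauss(0.0, 1.0) for _ in range(n2)]
        if abs(welch_t(a, b)) > t_crit:
            hits += 1
    return hits / sims

print(simulated_power(d=0.228, n1=20, n2=17))  # roughly 0.10, near G*Power's 0.1031
```

A simulation like this is a useful sanity check on a utility such as G*Power, and it makes the definition Power = 1 – β concrete: each undetected difference in the loop is a Type 2 Error.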
The four variables that are required in order to determine the Power for a two-independent-sample t-Test are Alpha (α), Effect Size (d), Sample Sizes (n1 and n2), and the Number of Tails. Typically Alpha, Effect Size, and the Number of Tails are held constant while the sample sizes are varied (usually increased) to achieve the desired Power for the statistical test.
Manual calculation of a test’s Power given Alpha, Effect Size, Sample Size, and the Number of Tails is quite tedious. Fortunately there are a number of free utilities online that will readily calculate a test’s
statistical Power. A widely-used online Power calculation utility called G*Power is available for download
from the Institute of Experimental Psychology at the University of Dusseldorf at this link:
http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/
Screen shots will show how to use this utility to calculate the Power for this example and also to provide a graph of Sample Size vs. Achieved Power for this example as follows:
As mentioned, the four variables that are required in order to determine Power are Alpha (α), Effect Size (d), the Sample Sizes (n1 and n2), and the Number of Tails.
Bring up G*Power’s initial screen and input the following information:
Test family: t-Tests
Statistical test: Means: Difference between two independent means (two groups)
Type of power analysis: Post hoc – Compute achieved power – given α, sample size, and effect size
Number of Tails = 2
Effect Size (d) = 0.228
Alpha (α) = 0.05
Sample Sizes (n1 = 20 and n2 = 17)
The completed dialogue screen appears as follows:
Clicking Calculate would produce the following output:
The Power achieved for this test is 0.1031. This means that the current two-tailed test has a 10.31 percent chance of detecting a difference that has an Effect Size of 0.228 if α = 0.05, n1 = 20, and n2 = 17.
It is often desirable to plot a graph of sample size versus achieved Power for the given Effect Size and Alpha. This can be done by clicking the button X-Y plot for a range of values and then clicking Draw Plot on the next screen that comes up. This will produce the following output:
This would indicate that a Power of 80 percent would be achieved for this test if the total sample size
were equal to approximately n1 + n2 = 600.
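For readers who want to sanity-check G*Power's output, the Power of this two-tailed, two-independent-sample t-Test can be approximated in a few lines of code. This sketch uses the normal approximation to the noncentral t distribution, not G*Power's exact algorithm, so its result only comes close to the 0.1031 reported above.

```python
import math

def normal_cdf(x):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def approx_power_two_sample(d, n1, n2):
    # Noncentrality parameter for two independent samples
    delta = d * math.sqrt(n1 * n2 / (n1 + n2))
    z_crit = 1.959964  # two-tailed critical z for alpha = 0.05
    # Dominant rejection-tail probability (normal approximation)
    return normal_cdf(delta - z_crit)

power = approx_power_two_sample(0.228, 20, 17)
print(round(power, 3))  # close to G*Power's exact 0.1031
```

The approximation lands within about a thousandth of G*Power's exact noncentral-t calculation for these sample sizes.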
Paired (Two-Sample Dependent) t-Test in Excel
Overview
This hypothesis test determines whether the mean of a sample of differences between pairs of data
(x_bardiff) is equal to (two-tailed test) or else greater than or less than (one-tailed test) a constant.
Before-and-after fitness levels of individuals undergoing a training program would be an example of
paired data. The sample evaluated would be the group of differences between the before-and-after
scores of the individuals. This is called the difference sample.
x_bardiff = Observed Difference Sample Mean
df = n – 1
Null Hypothesis H0: x_bardiff = Constant
The Null Hypothesis is rejected if any of the following equivalent conditions are shown to exist:
1) The observed x_bardiff is beyond the Critical Value.
2) The t Value (the Test Statistic) is farther from zero than the Critical t Value.
3) The p value is smaller than α for a one-tailed test or α/2 for a two-tailed test.
This problem illustrates why the t-Test is nearly always used instead of a z-Test to perform a two-
dependent-sample (paired) hypothesis test of mean. The z-Test requires that the population standard
deviation of the differences between the pairs be known. This is often not the case, but it is required for a
paired z-Test. The t-Test requires only that the sample standard deviation of the sample of paired
differences be known.
Before and After Results and Their Differences Are As Follows:
Running the Excel data analysis tool Descriptive Statistics on the column of Difference data will provide
the Sample Mean, the Sample Standard Deviation, the Standard Error, and the Sample Size. It will even
provide half the width of a confidence interval about the mean based on this sample for any specified
level of certainty if that option is specified. The output of this tool appears as follows:
It is the difference that we are concerned with. A hypothesis test will be performed on the sample of
differences. The distributed variable will be designated as x_bardiff and will represent the average
difference between the After and Before samples.
x_bardiff was calculated by subtracting the Before measurement from the After measurement. This is the
intuitive way to determine if a reduction in error occurred.
Note that this calculation of the Standard Error using the sample standard deviation, sdiff, is an estimate of
the true Standard Error, which would be calculated using the population standard deviation, σdiff.
Level of Certainty = 0.95
Alpha = 1 - Level of Certainty = 1 – 0.95 = 0.05
As with all Hypothesis Tests of Mean, we must satisfactorily answer these two questions and then
proceed to the four-step method of solving the hypothesis test that follows.
The Initial Two Questions That Need To Be Answered Before Performing the Four-Step Hypothesis Test
of Mean are as follows:
e) t-Test or z-Test?
Assuming that the difference population or difference sample can pass a normality test, a hypothesis test
of mean must be performed as a t-Test when the difference sample size (n = number of difference pairs)
is small (n < 30) or if the variance of differences is unknown.
In this case the difference sample size (the number of data pairs) is small as n = 17 data sample pairs.
This Hypothesis Test of Mean must therefore be performed as a t-Test and not as a z-Test.
The t Distribution with degrees of freedom = df = n – 1 is defined as the distribution of random data
sample of sample size n taken from a normal population.
The means of samples taken from a normal population are also distributed according to the t
Distribution with degrees of freedom = df = n – 1.
The Test Statistic (the t Value), which is based upon the difference sample mean (x_bardiff) because it
equals (x_bardiff – Constant)/(SEdiff), will therefore also be distributed according to the t Distribution. A t-
Test will be performed if the Test Statistic is distributed according to the t Distribution.
The distribution of the Test Statistic for the difference sample taken from a normal population of
differences is always described by the t Distribution. The shape of the t Distribution converges to (very
closely resembles) the shape of the standard normal distribution when the difference sample size
becomes large (n > 30).
The Test Statistic’s distribution can be approximated by the normal distribution only if the difference
sample size is large (n > 30) and the population standard deviation, σ, is known. A z-Test can be used if
the Test Statistic’s distribution can be approximated by the normal distribution. A t-Test must be used in
all other cases.
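The t-Test-versus-z-Test decision rule described above can be summarized in a short sketch (the function name and parameters here are illustrative, not from the text):

```python
def choose_test(n, sigma_known):
    # A t-Test is required when the sample is small (n < 30)
    # or when the population standard deviation is unknown.
    # A z-Test is acceptable only when neither condition holds.
    if n < 30 or not sigma_known:
        return "t-Test"
    return "z-Test"

# This section's paired example: 17 difference pairs, sigma unknown
print(choose_test(17, sigma_known=False))  # t-Test
```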
It should be noted that a paired t-Test can always be used in place of a paired z-Test. All z-Tests can be
replaced by their equivalent t-Tests. As a result, some major commercial statistical software packages,
including the well-known SPSS, provide only t-Tests and no direct z-Tests.
This hypothesis test is a two-sample, paired (dependent), one-tailed t-Test of mean.
When the Difference Sample Size is Small
The data in a difference sample taken from a normally-distributed population of paired differences will be
distributed according to the t Distribution regardless of the difference sample size.
The means of similar difference samples randomly taken from a normally-distributed population of paired
differences are also distributed according to the t Distribution regardless of the difference sample size.
The difference sample mean, and therefore the Test Statistic, are distributed according to the t
Distribution if the population of paired differences is normally distributed.
The population of paired differences is considered to be normally distributed if any of the following are
true:
1) Population of Paired Differences Is Normally Distributed
2) Difference Sample Is Normally Distributed
If the difference sample passes a test of normality, then the population of paired differences from which the
difference sample was taken can be assumed to be normally distributed.
The population of paired differences or the difference sample must pass a normality test before a t-Test
can be performed. If the only data available are the data of the single difference sample taken, then the
difference sample must pass a normality test before a t-Test can be performed.
Histogram in Excel
The quickest way to check the difference sample data for normality is to create an Excel histogram of the
data as shown below, or to create a normal probability plot of the data if you have access to an
automated method of generating that kind of a graph.
It is very important to verify the normality of the data if the difference sample size is small. If a hypothesis
test’s required underlying assumptions cannot be met, the test is invalid.
Here is a histogram of the sample of differences.
To create this histogram in Excel, fill in the Excel Histogram dialogue box as follows:
The sample of differences appears to be distributed reasonably closely to the bell-shaped normal
distribution. It should be noted that bin size in an Excel histogram is manually set by the user. This
arbitrary setting of the bin sizes can have a significant influence on the shape of the histogram's output.
Different bin sizes could result in an output that would not appear bell-shaped at all. What is actually set
by the user in an Excel histogram is the upper boundary of each bin.
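That binning convention (each value falls into the first bin whose user-set upper boundary it does not exceed, with an extra "More" bin above the last boundary) can be sketched in code. The data and boundaries below are made up purely to illustrate the convention:

```python
def excel_histogram(data, upper_boundaries):
    # Mirrors Excel's Histogram tool: value x lands in bin i when
    # previous_boundary < x <= upper_boundaries[i]; anything above
    # the last boundary lands in the extra "More" bin.
    counts = [0] * (len(upper_boundaries) + 1)
    for x in data:
        for i, ub in enumerate(upper_boundaries):
            if x <= ub:
                counts[i] += 1
                break
        else:
            counts[-1] += 1  # the "More" bin
    return counts

# Hypothetical difference data and bin upper boundaries
print(excel_histogram([-9, -4, -4, -1, 0, 3, 8], [-6, -2, 2, 6]))
# -> [1, 2, 2, 1, 1]
```

Changing the boundaries re-shapes these counts, which is exactly why different bin choices can make the same data look more or less bell-shaped.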
Normal Probability Plot in Excel
Another way to graphically evaluate normality of each data sample is to create a normal probability plot
for the sampled differences. This can be implemented in Excel and appears as follows:
The normal probability plot for the sampled differences shows that the data appear to be very close to
being normally distributed. The actual sample data (red) match very closely the data values that would
occur if the sample were perfectly normally distributed (blue) and never go beyond the 95 percent
confidence interval boundaries (green).
Difference Sample Group
The Null Hypothesis Stating That the Data Are Normally Distributed Cannot Be Rejected
The Null Hypothesis for the Kolmogorov-Smirnov Test for Normality, which states that the sample data
are normally distributed, is rejected if the maximum difference between the expected and actual CDF of
any of the data points exceed the Critical Value for the given n and α.
The Max Difference Between the Actual and Expected CDF for the difference sample group (0.0926) is
significantly less than the Kolmogorov-Smirnov Critical Value for n = 20 (0.29) and for n = 15 (0.34) at α =
0.05, so the Null Hypothesis of the Kolmogorov-Smirnov Test for the difference sample group cannot be
rejected.
F(Xk) = CDF(Xk) for normal distribution
F(Xk) = NORM.DIST(Xk, Sample Mean, Sample Stan. Dev., TRUE)
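The Kolmogorov-Smirnov statistic described above (the maximum gap between the sample's empirical CDF and the fitted normal CDF) can be sketched in a few lines, with NORM.DIST's role played by an erf-based normal CDF. The three-point sample below is purely illustrative:

```python
import math

def normal_cdf(x, mean, sd):
    # Equivalent of Excel's NORM.DIST(x, mean, sd, TRUE)
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def ks_statistic(sample):
    n = len(sample)
    xs = sorted(sample)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    d_max = 0.0
    for i, x in enumerate(xs, start=1):
        f = normal_cdf(x, mean, sd)
        # Compare the fitted CDF with the empirical CDF on both sides
        d_max = max(d_max, abs(i / n - f), abs(f - (i - 1) / n))
    return d_max

print(round(ks_statistic([-1.0, 0.0, 1.0]), 4))  # 0.1747
```

The resulting D would then be compared against the tabled Kolmogorov-Smirnov critical value for the given n and α, just as in the text.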
Difference Sample Group
Shapiro-Wilk Test For Normality in Excel
The Shapiro-Wilk Test is a hypothesis test that is widely used to determine whether a data sample is
normally distributed. A test statistic W is calculated. If this test statistic is less than a critical value of W for
a given level of significance (alpha) and sample size, the Null Hypothesis which states that the sample is
normally distributed is rejected.
The Shapiro-Wilk Test is a robust normality test and is widely used because of its slightly superior
performance against other normality tests, especially with small sample sizes. Superior performance
means that it correctly rejects the Null Hypothesis (that the data are normally distributed) when the data
are in fact not normally distributed a slightly higher percentage of the time than most other normality
tests, particularly at small sample sizes.
The Shapiro-Wilk normality test is generally regarded as being slightly more powerful than the Anderson-
Darling normality test, which in turn is regarded as being slightly more powerful than the Kolmogorov-
Smirnov normality test.
Difference Data
Test Statistic W (0.968860) is larger than W Critical (0.892). The Null Hypothesis therefore cannot be
rejected. There is not enough evidence to state that the data are not normally distributed with a
confidence level of 95 percent.
SEdiff = Standard Error = sdiff / SQRT(n) = 6.4 / SQRT(17) = 1.55
These parameters are used to map the distributed variable, x_bardiff, to the t Distribution curve as follows:
The t Distribution is usually presented in its finalized form with standardized values of a mean that equals
zero and a standard error that equals one. The horizontal axis is given in units of Standard Errors and the
distributed variable is the t Value (the Test Statistic) as follows:
A non-standardized t Distribution curve would simply have its horizontal axis given in units of the measure
used to take the samples. The distributed variable would be the sample mean, x_bardiff.
The variable x_bardiff is distributed according to the t Distribution. Mapping this distributed variable to a t
Distribution curve is shown as follows:
This non-standardized t Distribution curve has its mean set to equal the Constant taken from the Null
Hypothesis, which is:
H0: x_bardiff = Constant = 0
This non-standardized t Distribution curve is constructed from the following parameters:
Mean = Constant = 0
Standard Errordiff = 1.55
Degrees of Freedom = 16
Distributed Variable = x_bardiff
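Using this example's summary statistics (n = 17 difference pairs, sdiff = 6.4, x_bardiff = -3.35), the Standard Error and the Test Statistic above can be reproduced with a short sketch:

```python
import math

n = 17             # number of difference pairs
s_diff = 6.40      # sample standard deviation of the differences
xbar_diff = -3.35  # observed difference sample mean
constant = 0       # Constant from the Null Hypothesis

se_diff = s_diff / math.sqrt(n)             # Standard Error
t_value = (xbar_diff - constant) / se_diff  # Test Statistic (t Value)

print(round(se_diff, 2), round(t_value, 3))  # 1.55 -2.158
```

The small difference from the text's -2.159 comes from rounding the summary statistics to two decimal places.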
The above distribution curve that maps the distribution of variable x_bardiff can be divided up into two
types of regions: the Region of Acceptance and the Region of Rejection.
If x_bardiff’s value of -3.35 falls in the Region of Acceptance, we must accept the Null Hypothesis. If
x_bardiff’s value of -3.35 falls in the Region of Rejection, we can reject the Null Hypothesis.
The total size of the Region of Rejection is equal to Alpha. In this case Alpha, α, is equal to 0.05. This
means that the Region of Rejection will take up 5 percent of the total area under this t distribution curve.
This 5 percent is entirely contained in the outer left tail. The outer left tail contains the 5 percent of the
curve that is the Region of Rejection.
Step 4 – Determine Whether to Reject Null Hypothesis
The object of a hypothesis test is to determine whether to accept or reject the Null Hypothesis. There are
three equivalent tests that determine whether to accept or reject the Null Hypothesis. Only one of these
tests needs to be performed because all three provide equivalent information. The three tests are as
follows:
The t Value corresponds to the standardized value of the sample mean, x_bardiff = -3.35. The t Value is
the number of Standard Errors that x_bardiff is from the curve's mean of 0.
The Critical t Value is the number of Standard Errors that the Critical Value is from the curve’s mean.
Reject the Null Hypothesis if the t Value is farther from the standardized mean of zero than the Critical t
Value.
Equivalently, fail to reject the Null Hypothesis if the t Value is closer to the standardized mean of zero
than the Critical t Value.
This means that the sample mean, x_bardiff, is 2.159 standard errors to the left of the curve mean of 0.
This means that the boundary of the Region of Rejection is 1.76 standard errors to the left of the curve
mean of 0 since this is a one-tailed test in the left tail.
The Null Hypothesis is rejected because the t Value is farther from the curve mean than the Critical t
Value, indicating that x_bardiff is in the Region of Rejection.
3) Compare p Value With Alpha
The p Value is the percent of the curve that is beyond x_bardiff (-3.35). If the p Value is smaller than
Alpha, the Null Hypothesis is rejected.
p Value = 0.023
The p Value (0.023) is smaller than Alpha (0.05) and we therefore reject the Null Hypothesis. A graph
below shows that the red p Value (the curve area beyond x_bardiff) is smaller than the yellow Alpha,
which is the 5 percent Region of Rejection in the left tail. This is shown in the following Excel-generated
graph of this non-standardized t Distribution curve:
Excel Data Analysis Tool Shortcut
This paired (two-dependent-sample) t-Test can be solved much quicker using the following Excel data
analysis tool:
t-Test: Paired Two Sample for Means
The Excel tool can be found by clicking Data Analysis under the Data tab. The tool is titled t-Test:Paired
Two Sample For Means. The entire Data Analysis Toolpak is an add-in that ships with Excel but must
first be activated by the user before it is available.
This hypothesis test creates the sample of differences by subtracting the Before results from the After
results. If the training program has successfully reduced the average number of monthly clerical errors per
employee, the resulting average difference (x_bardiff) will be negative.
Since the Before data are subtracted from the After data, the After data sample (in column B) should be
designated as Variable 1, as is done here. This ensures that the t Value (T Stat in the Excel output) has
the correct sign, which would be negative in this case.
This tool will be applied to the following data set using the same data as the preceding example in this
section.
Clicking OK will produce the following result. This result agrees with the calculations that were performed
in this section.
The calculations to create the preceding output were performed as follows. The individual outputs are
color-coded so it is straightforward to match the calculations with the outputs of the tool.
Excel Statistical Function Shortcut
The stand-alone Excel formula to perform a paired (two-dependent-sample) t-Test is T.TEST(array1,
array2, tails, type), with type set to 1 to specify a paired test. If the resulting p Value is smaller than α for a
one-tailed test or α/2 for a two-tailed test, the mean difference between the Before and After sample
pairs is deemed to be statistically significant.
The p Value is calculated to be 0.023. This is less than Alpha (0.05) or Alpha/2 (0.025) so the Null
Hypothesis for this t-Test would be rejected for both a one-tailed test and a two-tailed test if Alpha is set
to 0.05 (95 percent certainty required for the test).
Effect Size in Excel
Effect size in a t-Test is a convention for expressing how large the difference between two groups is
without taking into account the sample size or whether that difference is statistically significant.
Effect size of Hypotheses Tests of Mean is usually expressed in measures of Cohen’s d. Cohen’s d is a
standardized way of quantifying the size of the difference between the two groups. This standardization of
the size of the difference (the effect size) enables classification of that difference in relative terms of
“large,” “medium,” and “small.”
A large effect would be a difference between two groups that is easily noticeable with the measuring
equipment available. A small effect would be a difference between two groups that is not easily noticed.
Effect size for a paired (two-dependent-sample) t-Test is a method of expressing the difference between
the sample mean, x_bardiff, and the Constant in a standardized form that does not depend on the sample
size.
Remember that the Test Statistic (the t Value) for a paired t-Test is calculated by the following formula:
t Value = (x_bardiff – Constant) / SEdiff
since
SEdiff = sdiff / SQRT(n)
then
t Value = [(x_bardiff – Constant) / sdiff] * SQRT(n)
The t Value specifies the number of Standard Errors that the sample mean, x_bardiff, is from the Constant.
The t Value is dependent upon the sample size, n, and determines whether the test has achieved
statistical significance. Achieving statistical significance means that the Null Hypothesis (H0: x_bardiff =
Constant) has been rejected.
The Effect Size, d, for a paired-sample t-Test is a very similar measure that does not depend on sample
size and has the following formula:
d = |x_bardiff – Constant| / sdiff
A test’s Effect Size can be quite large even though the test does not achieve statistical significance due to
small sample size.
If the t Value has already been calculated, the Effect Size can be quickly calculated by the following
formula:
d = |t Value| / SQRT(n)
The d measured here is Cohen’s d for a paired t-Test. The Effect Size is a standardized measure of size
of the difference that the t-Test is attempting to detect. The Effect Size for a paired t-Test is a measure of
that difference in terms of the number of sample standard deviations. Note that sample size has no effect
on Effect Size. Effect size values for the paired t-Test are generalized into the following size categories:
d = 0.2 up to 0.5 = small Effect Size
d = 0.5 up to 0.8 = medium Effect Size
d = 0.8 and above = large Effect Size
In this example, the Effect Size is calculated as follows:
d = |x_bardiff – Constant| / sdiff = |–3.35 – 0| / 6.40 = 0.523
An effect size of d = 0.523 is considered to be a medium effect.
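Both routes to Cohen's d described above (directly from the summary statistics, or from an already-computed t Value) can be checked in a few lines:

```python
import math

xbar_diff, constant, s_diff, n = -3.35, 0, 6.40, 17

# Route 1: directly from the summary statistics
d_direct = abs(xbar_diff - constant) / s_diff

# Route 2: from a t Value that has already been calculated
t_value = (xbar_diff - constant) / (s_diff / math.sqrt(n))
d_from_t = abs(t_value) / math.sqrt(n)

print(round(d_direct, 3), round(d_from_t, 3))  # 0.523 0.523
```

The two routes agree exactly because d = |t| / SQRT(n) is just the first formula with the sample-size factor cancelled out.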
Power of the Test With Free Utility G*Power
The Power of a paired t-Test is a measure of the test's ability to detect a difference given the
following parameters:
Alpha (α)
Effect Size (d)
Sample Size (n)
Number of Tails
Power is defined by the following formula:
Power = 1 – β
β equals the probability of a Type 2 Error. A Type 2 Error can be described as a False Negative. A False
Negative represents a test not detecting a difference when a difference does exist.
1 – β = Power = the probability of a test detecting a difference when one exists.
Power is therefore a measure of the sensitivity of a statistical test. It is common to target a Power of 0.8
for statistical tests. A Power of 0.8 indicates that a test has an 80 percent probability of detecting a
difference.
The four variables that are required in order to determine the Power for a one-sample t-Test are Alpha
(α), Effect Size (d), Sample Size (n), and the Number of Tails. Typically alpha, Effect Size, and the
Number of Tails are held constant while sample size is varied (usually increased) to achieve the desired
Power for the statistical test.
Manual calculation of a test's Power given Alpha, Effect Size, Sample Size, and the Number of Tails is
quite tedious. Fortunately, there are a number of free utilities online that will readily calculate a test's
statistical Power. A widely-used Power calculation utility called G*Power is available for download
from the Institute of Experimental Psychology at the University of Dusseldorf at this link:
http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/
Screen shots will show how to use this utility to calculate the Power for this example and also to provide a
graph of Sample Size vs. Achieved Power for this example as follows:
As mentioned, the four variables that are required in order to determine Power for a paired t-Test
are Alpha (α), Effect Size (d), Sample Size (n), and the Number of Tails.
Bring up G*Power’s initial screen and input the following information:
Test family: t-Tests
Statistical test: Difference between two dependent means (matched pairs)
Type of power analysis: Post hoc – Compute achieved power – given α, sample size, and effect size
Number of Tails = 1
Effect Size (d) = 0.523
Alpha (α) = 0.05
Sample Size (n) = 17
The completed dialogue screen appears as follows:
Clicking Calculate would produce the following output:
The Power achieved for this test is 0.6624. This means that the current one-tailed paired t-Test has a
66.24 percent chance of detecting a difference that has an effect size of 0.523 if α = 0.05 and n = 17.
It is often desirable to plot a graph of sample size versus achieved Power for the given Effect Size and
alpha. This can be done by clicking the button X-Y plot for a range of values and then clicking Draw
Plot on the next screen that comes up. This will produce the following output:
This would indicate that a Power of 80 percent would be achieved for this test if the sample size were
approximately n = 24.
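The one-tailed paired Power can also be roughly checked with the same normal approximation used for the two-sample case. Because the sample is small, this sketch overstates the Power somewhat relative to G*Power's exact noncentral-t calculation of 0.6624, but it lands in the right neighborhood:

```python
import math

def normal_cdf(x):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def approx_power_paired(d, n):
    # One-tailed normal approximation: noncentrality d * sqrt(n)
    # compared against the one-tailed critical z for alpha = 0.05.
    z_crit = 1.644854
    return normal_cdf(d * math.sqrt(n) - z_crit)

power = approx_power_paired(0.523, 17)
print(round(power, 3))  # near G*Power's exact 0.6624
```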
Nonparametric Alternatives in Excel
Wilcoxon Signed-Rank Test in Excel
The Wilcoxon Signed-Rank Test is an alternative to the paired t-Test when sample size is small (number
of pairs = n < 30) and normality cannot be verified for the difference sample data or the population from
which the difference sample was taken.
The Wilcoxon Signed-Rank Test calculates the difference between each data point in the difference
sample and the Constant from the t-Test's Null Hypothesis (0 in this case). The absolute values of each
difference are ranked and then assigned the sign (positive or negative) that the difference originally had.
These signed ranks are summed up to create the Test Statistic W.
Test Statistic W will be approximately normally distributed if the required assumptions are met for this
test. The Test Statistic’s z Score can then be calculated and compared with the Critical z value. The
decision whether or not to reject the test’s Null Hypothesis is made based on the results of this
comparison.
The Null Hypothesis for this test states that the median difference equals the Constant, i.e., H0: Population
Median Difference = Constant. This is very similar to the Null Hypothesis of the paired t-Test, which
states that the population mean difference is equal to the Constant. The population is the set of
differences from all possible before-and-after data pairs.
The Wilcoxon Signed-Rank Test is performed by implementing the following steps:
1) Calculate the difference between each difference data point and the Constant that the sample is being
compared to, which is the Constant of 0 from the Null Hypothesis.
2) Record the sign (positive or negative) of each difference.
3) Sort the absolute values of difference data in ascending order.
4) Assign ranks to this data.
5) Attach the sign of each difference to its rank.
6) Sum up all of these signed ranks. This sum is the Test Statistic W.
7) Calculate the standard deviation, σw, for these signed ranks.
8) The Test Statistic W will be approximately normally distributed if the required assumptions for this test
are met. Calculate the z Score for this variable W.
9) Compare the z Score of W with the Critical z Value for the given alpha and number of tails in the test. If
the z Score is further from the standardized mean of zero than the Critical z Value, the Null Hypothesis
can be rejected. The Null Hypothesis states that the population’s median is equal to the Constant from
the Null Hypothesis.
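Steps 1 through 6 above (take the signs, rank the absolute values with average ranks for ties, reattach the signs, and sum) can be sketched as follows. The five differences used here are made up solely to illustrate tie handling:

```python
def signed_rank_w(differences, constant=0):
    # Step 1: subtract the Constant; drop zero differences
    diffs = [d - constant for d in differences if d - constant != 0]
    # Step 3: sort the absolute values in ascending order
    abs_sorted = sorted(abs(d) for d in diffs)
    # Step 4: assign ranks, using the average rank for ties
    ranks = {}
    for v in set(abs_sorted):
        positions = [i + 1 for i, a in enumerate(abs_sorted) if a == v]
        ranks[v] = sum(positions) / len(positions)
    # Steps 5-6: attach each difference's sign to its rank and sum
    return sum(ranks[abs(d)] * (1 if d > 0 else -1) for d in diffs)

print(signed_rank_w([2, -2, 3, -5, 4]))  # 2.0
```

Here the tied absolute values 2 and 2 each receive the average rank (1 + 2) / 2 = 1.5, exactly as described in Step 6 of the worked example.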
Performing the Wilcoxon Signed-Rank Test on the data in this section's example is done as follows:
Step 1) Calculate the Difference Between Each Sample Data Point and the Constant to Which the
Sample Is Being Compared.
The original Null Hypothesis from the paired t-Test stated that the mean difference between all before-
and-after data pairs in the entire population is equal to 0. The Null Hypothesis for this test was as follows:
H0: x_bardiff = Constant = 0
The x_bardiff sample is created as follows:
Step 2) Create the Final Difference Sample
The Wilcoxon Signed-Rank Test calculates the difference between each data point in the difference
sample and the Constant from the t-Test's Null Hypothesis (0 in this case). That final difference sample is
created as follows:
Step 3) Evaluate Whether the Test’s Required Conditions Have Been Met
The Wilcoxon Signed-Rank Test has the following requirements:
a) Data are ratio or interval but not categorical (nominal or ordinal). This is the case here.
b) Sample size (the number of data pairs) is at least 10. This is the case here.
c) Data of the Difference sample are distributed about a median with reasonable symmetry. Test Statistic
W will not be normally distributed unless this assumption is met.
The following Excel-generated histogram shows that the difference data are distributed symmetrically
about their median of -4. The symmetry about the median of -4 is not perfect but, given the small sample
size, is reasonable enough to proceed with this test:
This histogram and the sample’s median were generated in Excel as follows:
Step 4 – Record the Sign of Each Difference
Place a "+1" or "-1" next to each non-zero difference. This can be automatically generated with an If-
Then-Else statement as follows:
Displaying a plus sign (+) next to a number requires a custom number format available from
the Format Cells dialogue box. One custom format that will work is the following: "+"#;"-"# . This is
demonstrated in the following Excel screen shot:
Step 5 – Sort the Absolute Values of the Differences While Retaining the Sign Associated to Each
Difference
Sort both columns based upon column of difference absolute values.
Step 6 –Rank the Absolute Values, Attach the Signs, and Sum up the Signed Ranks to Create Test
Statistic W.
The absolute values are ranked in ascending order starting with a rank of 1. Absolute values that are tied
are assigned the average rank of the tied values. For example, the first three absolute values are 2.
Each of these three absolute values would be assigned a rank of 2, which is equal to the average rank of
all three, i.e., (1 + 2 + 3) / 3 = 2.
Test Statistic W is equal to the sum of all signed ranks.
Step 7 – Calculate the z Score of W
The distribution of Test Statistic W can be approximated by the normal distribution if all of the required
assumptions for this test are met. The difference data consist of more than 10 points of ratio data that
are reasonably symmetrical about their median. The assumptions are therefore met for this Wilcoxon
Signed-Rank Test.
The standard deviation of W, σW, is calculated as follows:
σW = SQRT[ n(n + 1)(2n + 1)/6 ] = 42.25
z Score = ( W – Constant – 0.5) / σW
z Score = ( -83 – 0 – 0.5) / 42.25 = -1.98
The constant is the Constant from the Null Hypothesis for this test, which is the following:
H0: Median_Difference = Constant = 0
The z Score must include a 0.5 correction for continuity because W assumes whole integer values
(except in the event of a tie of ranks).
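The calculation above (σW for n = 17 and the continuity-corrected z Score for W = -83) checks out in a few lines:

```python
import math

n = 17   # number of difference pairs
w = -83  # Test Statistic W (sum of the signed ranks)

# Standard deviation of W under the Null Hypothesis
sigma_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 6)

# 0.5 continuity correction because W takes whole-integer values
z_score = (w - 0 - 0.5) / sigma_w

print(round(sigma_w, 2), round(z_score, 2))  # 42.25 -1.98
```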
Step 8 – Reject or Fail to Reject the Null Hypothesis Based Upon a Comparison Between the z
Score and the Critical z Value
Given that α = 0.05 and this is a one-tailed test in the left tail, the Critical z Value is calculated as follows:
Z Criticalα=0.05,One-Tailed, Left_Tail = NORM.S.INV(α) = NORM.S.INV(0.05)
Z Criticalα=0.05, One-Tailed, Left_Tail = -1.64485
The Null Hypothesis is rejected if the z Score is further from the standardized mean of zero than the
Critical z Value. This is the case here since the z Score (-1.98) is further from the standardized mean of
zero than the Critical z Value (-1.64485). This information is shown in the Excel-generated graph as
follows:
The Wilcoxon Signed-Rank Test detects that the median difference between the before-and-after data is
significant at an alpha level of 0.05.
The results of this Wilcoxon Signed-Rank Test were very similar to the results of the original paired t-Test
in which the Null Hypothesis was rejected because the t Value (-2.159) was further from the standardized
mean of zero than the Critical t Value (-1.76).
The results of the original t-Test are shown as follows:
The Paired t- Test detects that the mean difference between the before-and-after data is significant at an
alpha level of 0.05.
The Sign Test in Excel
The Sign Test along with the Wilcoxon Signed-Rank Test are nonparametric alternatives to the paired t-
Test when the normality of the difference sample cannot be verified and the sample size is small.
The Wilcoxon Signed-Rank Test is significantly more powerful than the Sign Test but has a requirement
of symmetrical distribution about a median for the difference sample data (the data set of the sample
points minus the Constant of the Null Hypothesis). The Wilcoxon Signed-Rank Test is based upon a
normal approximation of its Test Statistic’s distribution. This requires that the difference sample be
reasonably symmetrically distributed about a median.
The Sign Test has no requirements regarding the distribution of data but, as mentioned, is significantly
less powerful than the Wilcoxon Signed-Rank Test.
The Sign Test counts the number of positive and negative non-zero differences between difference
sample data and the Constant from the Null Hypothesis in the paired t-Test. In this case that Constant = 0
because the Null Hypothesis of the one-tailed t-Test is as follows:
H0: x_bardiff = Constant = 0
The after-minus-before difference sample for this problem is calculated as follows:
The final difference sample is created by subtracting the Constant from the Null Hypothesis, 0, from the
after-minus-before difference as follows:
A count of positive and negative differences in this sample is taken as follows:
The minimum count of positive or negative non-zero differences is designated as the Test Statistic W for
the Sign Test. Test Statistic W is named after Frank Wilcoxon who developed the test.
The objective of the one-tailed, paired t-Test was to determine whether to reject or fail to reject the Null
Hypothesis that states that the mean difference between the number of clerical errors before and after the
training for all employees is equal to 0.
If the mean difference is equal to 0, then the probability of a difference being positive (greater than zero)
is the same as the probability of being negative (less than zero). This probability is 50 percent.
The Null Hypothesis for this one-tailed Sign Test states that the probability of a difference being positive
(p) is 50 percent. This can be expressed as follows:
H0: p = 0.5
The Alternative Hypothesis for this one-tailed test would state the following:
H1: p < 0.5
Each non-zero difference is classified as either positive or negative. This is a binary event because the
classification of each difference has only two possible outcomes: the non-zero difference is either positive
or negative.
The distribution of the outcomes of this binary event can be described by the binomial distribution as long
as the following three conditions exist:
1) Each binary trial is independent.
2) The data from which the differences are derived are at least ordinal. The data can be ratio, interval, or
ordinal, but not nominal. The differences of "less than" and "greater than" must be meaningful even if the
amount of difference is not, as would be the case with ordinal data but not with nominal data.
3) Each binary trial has the same probability of a positive outcome.
All of these conditions are met because of the following:
1) Each sample taken is independent of any other sample.
2) The differences are derived from continuous (either ratio or interval) data.
3) The proportion of positive differences versus negative differences is assumed to be constant in the
population from which the sample of differences was derived.
The counts of the positive and negative differences both follow the binomial distribution. The lowest
count, whether it is the count of positive differences or the count of negative differences, is designated as
W, the Test Statistic. This Test Statistic follows the binomial distribution because W represents the count
of positive or negative outcomes of independent binary events that all have the same probability of a
positive outcome.
As stated, the Null Hypothesis of this one-tailed, paired Sign Test is the following:
H0: p=0.5
If the training program was successful, there would be a reduction in the number of clerical errors. If the
number of clerical errors were reduced, there would be a larger number of negative after-minus-before
differences than positive differences.
The difference count indicates that there are 11 negative differences and 6 positive differences. These
counts are distributed according to the binomial distribution that has a probability of positive outcome, p,
equal to 0.5 and the total number of trials, N, equal to 17. An Excel-generated graph of this binomial
distribution is shown as follows:
This test evaluates whether a count of 11 negative differences is significant at an alpha level of 0.05. The
area under the PDF curve at and beyond 11 negative differences equals the probability that a result at
least this extreme would occur by chance if the Null Hypothesis were true. This is the p Value for this test.
The Null Hypothesis would be rejected if the p Value calculated from this test is less than alpha, which is
customarily set at 0.05.
The binomial distribution with p = 0.5 is symmetric, so the curve area in the right tail at and beyond 11
negative differences is the same as the curve area in the left tail at and below 6 positive differences.
Test Statistic W in the Sign Test is always set to the lower count. The area outside the lower count is
equal to the area outside the upper count. That area is the p Value for the one-tailed Sign test.
That p Value is equal to the probability that the number of positive outcomes is no greater than W = 6 if
the total number of non-zero differences = N = 17 and every binary trial has the same probability of a
positive outcome = p = 0.5.
Given that variable x is binomially distributed, the CDF (Cumulative Distribution Function), which gives the
probability that x ≤ X, is calculated in Excel as follows:
p Value = F(X;n,p) = BINOM.DIST(X, n, p, 1)
This calculates the probability that up to X number of positive outcomes will occur in n total binary trials if
the probability of a positive outcome is p for every trial. “1” specifies that the Excel formula will calculate
the CDF and not the PDF.
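As a sketch of what BINOM.DIST computes in CDF mode, the same probability can be reproduced from the binomial formula using only the Python standard library (the function name binom_cdf is an illustrative choice):

```python
from math import comb

def binom_cdf(X, n, p):
    """Probability of at most X successes in n independent trials, each with
    success probability p. Mirrors Excel's BINOM.DIST(X, n, p, 1)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(X + 1))

# Sign Test p Value for this example: W = 6 positive differences out of
# N = 17 non-zero differences, with p = 0.5 under the Null Hypothesis.
p_value = binom_cdf(6, 17, 0.5)
print(p_value)  # approximately 0.166
```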
Therefore the following can be calculated:
p Value = Pr(No. of Positive Differences ≤ W | p = 0.5, N = Total No. of Non-Zero Differences)
p Value = BINOM.DIST(W, N, p, 1)
p Value = BINOM.DIST(6, 17, 0.5, 1) = 0.1661
This is shown in the following Excel-generated graph of the PDF of the binomial distribution for this Sign
Test. The parameters of this binomial distribution are Total Trials = N = 17 and the Probability of a
Positive Outcome of Each Trial = p = 0.5.
The p Value (0.1661) is larger than alpha (set at 0.05). The Null Hypothesis is therefore not rejected at
this alpha level. This would be equivalent to stating that there is not enough evidence to reject the Null
Hypothesis which ultimately states that there has been no reduction in clerical errors as a result of the
training program.
This example demonstrates how much less powerful the one-sample Sign Test is than the paired t-Test
or the Wilcoxon Signed-Rank Test. The Sign Test did not come close to detecting a difference at the
same alpha level that the other two tests did.
z-Tests: Hypothesis Tests Using the Normal
Distribution in Excel
z-Test Overview
The z-Test is a hypothesis test that analyzes sample data to determine if two populations have
significantly different means. A z-Test can be applied if the test statistic follows the normal distribution
under the Null Hypothesis. The test statistic will follow the normal distribution if both of the following
conditions exist at the same time:
1) The sample size is large.
2) The population standard deviation is known.
The population and sample do not have to be evaluated for normality when the sample size is large. As
per the Central Limit Theorem, a large sample size ensures that the sample mean will be normally
distributed. The test statistic is derived from the sample mean. Normal distribution of the test statistic
further requires that the population standard deviation be known.
The t-Test is the appropriate population mean hypothesis testing tool when sample size is small and/or
the population standard deviation is not known. A t-Test can always be substituted for a z-Test.
Null Hypothesis
A hypothesis test is based upon a Null Hypothesis, which states that the sample came from a
hypothesized population. A hypothesis test compares a sample statistic such as a sample mean to a
population parameter such as the population’s mean. The amount of difference between the sample
statistic and the population parameter determines whether the Null Hypothesis can be rejected or not.
The Null Hypothesis states that the population from which the sample came has the same mean or
proportion as a hypothesized population. The Null Hypothesis is always an equality stating that the
means or proportions of two populations are the same.
An example of a basic Null Hypothesis for a Hypothesis Test of Mean would be the following:
H0: x_bar = Constant = 5
This Null Hypothesis would be used to state that the population from which the sample was taken has a
mean equal to 5. The Constant (5) is the mean of the hypothesized population that the sample’s
population is being compared to. The Null Hypothesis states that the sample’s population and the
hypothesized population have the same means. The Alternative Hypothesis states that they are different.
An example of a basic Null Hypothesis for a Hypothesis Test of Proportion would be the following:
H0: p_bar = Constant = 0.3
This Null Hypothesis would be used to state that the population from which the sample was taken has a
proportion equal to 0.3. The Constant (0.3) is the proportion of the hypothesized population that the
sample’s population is being compared to. The Null Hypothesis states that the sample’s population and
the hypothesized population have the same proportions. The Alternative Hypothesis states that they are
different.
Alternative Hypothesis
The Alternative Hypothesis is always an inequality stating that the means or proportions of two populations
are not the same. The Alternative Hypothesis can be non-directional if it states that the means or
proportions of two populations are merely not equal to each other. The Alternative Hypothesis is
directional if it states that the mean or proportion of one of the populations is less than or greater than the
mean or proportion of the other population.
An example of a non-directional Alternative Hypothesis for a Hypothesis test of Mean would be the
following:
H1: x_bar ≠ 5
This Alternative Hypothesis would be used to state that the population from which the sample was taken
has a mean that is not equal to 5.
An example of a directional Alternative Hypothesis would be the following:
H1: x_bar > 5 or H1: x_bar < 5
These Alternative Hypotheses would be used to state that the population from which the sample was
taken has a mean that is either greater than or less than 5.
An example of a non-directional Alternative Hypothesis for a Hypothesis test of Proportion would be the
following:
H1: p_bar ≠ 0.3
This Alternative Hypothesis would be used to state that the population from which the sample was taken
has a proportion that is not equal to 0.3.
An example of a directional Alternative Hypothesis would be the following:
H1: p_bar > 0.3 or H1: p_bar < 0.3
These Alternative Hypotheses would be used to state that the population from which the sample was
taken has a proportion that is either greater than or less than 0.3.
One-Tailed Test vs. Two-Tailed Test
The number of tails in a hypothesis test depends on whether the test is directional or not. The operator of
the Alternative Hypothesis indicates whether or not the hypothesis test is directional. A non-directional
operator (a “not equal” sign) in the Alternative Hypothesis indicates that the hypothesis test is a two-
tailed test. A directional operator (a “greater than” or “less than” sign) in the Alternative Hypothesis
indicates that the hypothesis test is a one-tailed test.
The Region of Rejection (the alpha region) for a one-tailed test is entirely contained in one of the
outer tails. A “greater than” operator in the Alternative Hypothesis indicates that the test is a one-tailed
test in the right tail. A “less than” operator in the Alternative Hypothesis indicates that the test is a one-
tailed test in the left tail. If α = 0.05, then one of the outer tails will contain the entire 5-percent Region of
Rejection.
The Region of Rejection (the alpha region) for a two-tailed test is split between both outer tails. Each
outer tail will contain half of the total Region of Rejection (alpha/2). If α = 0.05, then each outer tail will
contain a 2.5-percent Region of Rejection if the test is a two-tailed test.
Level of Certainty
Each hypothesis test has a specified Level of Certainty. The Null Hypothesis is rejected only when
that Level of Certainty has been reached that the sample did not come from the population. A commonly
specified Level of Certainty is 95 percent. The Null Hypothesis would only be rejected in this case if the
sample statistic was different enough from the population parameter that at least 95 percent certainty was
achieved that the sample did not come from that population.
Region of Acceptance
A Hypothesis Test of Mean or Proportion can be performed if the Test Statistic is distributed according to
the normal distribution or the t-Distribution. The Test Statistic is derived directly from the sample statistic
such as the sample mean. If the Test Statistic is distributed according to the normal or t-Distribution, then
the sample statistic is also distributed according to the normal or t-Distribution. This will be discussed in
greater detail shortly.
A Hypothesis Test of Mean or Proportion can be understood much more intuitively by mapping the
sample statistic (the sample mean or proportion) to its own unique normal or t-Distribution. The sample
statistic is the distributed variable whose distribution is mapped according to its own unique normal or t-
Distribution.
The Region of Acceptance is the percentage of area under this normal or t-Distribution curve that equals
the test’s specified Level of Certainty. If the hypothesis test requires 95 percent certainty in order to reject the Null
Hypothesis, the Region of Acceptance will include 95 percent of the total area under the distributed
variable’s mapped normal or t-Distribution curve.
If the observed value of the sample statistic (the observed mean or proportion of the single sample taken)
falls inside of the Region of Acceptance, the Null Hypothesis is not rejected. If the observed value of the
sample statistic falls outside of the Region of Acceptance (into the Region of Rejection), the Null
Hypothesis is rejected.
Region of Rejection
The Region of Rejection is the percentage of area under this normal or t-Distribution curve that equals the
test’s specified Level of Significance (alpha). It is important to remember the following relationship:
Level of Significance (alpha) = 1 – Level of Certainty.
If the required Level of Certainty to reject the Null Hypothesis is 95 percent, then the following are true:
Level of Certainty = 0.95
Level of Significance (alpha) = 0.05
The Region of Acceptance includes 95 percent of the total area under the normal or t-Distribution curve
that maps the distributed variable, which is the sample statistic (the sample mean or proportion).
The Region of Rejection includes 5 percent of the total area under the normal or t-Distribution curve that
maps the distributed variable, which is the sample statistic (the sample mean or proportion). The 5-
percent alpha region is entirely contained in one of the tails if the test is a one-tailed test. The 5-percent
alpha region is split between both of the outer tails if the test is a two-tailed test.
If the observed value of the sample statistic (the observed mean or proportion of the single sample taken)
falls inside of the Region of Rejection (outside the Region of Acceptance), the Null Hypothesis is rejected.
If the observed value of the sample statistic falls inside of the Region of Acceptance, the Null Hypothesis
is not rejected.
Critical Value(s)
Each hypothesis test has one or two Critical Values. A Critical Value is the location of the boundary between
the Region of Acceptance and the Region of Rejection. A one-tailed test has one Critical Value because
the Region of Rejection is entirely contained in one of the outer tails. A two-tailed test has two Critical
Values because the Region of Rejection is split between the two outer tails.
The Null Hypothesis is rejected if the sample statistic (the observed sample mean or proportion) is farther
from the curve’s mean than the Critical Value on that side. If the sample statistic is farther from the
curve’s mean than the Critical value on that side, the sample statistic lies in the Region of Rejection. If the
sample statistic is closer to the curve’s mean than the Critical value on that side, the sample statistic lies
in the Region of Acceptance.
Test Statistic
Each hypothesis test calculates a Test Statistic. The Test Statistic is the amount of difference between
the observed sample statistic (the observed sample mean or proportion) and the hypothesized population
parameter (the Constant on the right side of the Null Hypothesis) which will be located at the curve’s
mean.
This difference is expressed in units of Standard Errors. The Test Statistic is the number of Standard
Errors that are between the observed sample statistic and the hypothesized population parameter. The
Null Hypothesis is rejected if that number of Standard Errors (specified by the Test Statistic) is larger than
a critical number of Standard Errors. The critical number of Standard Errors is determined by the required
Level of Certainty.
The Test Statistic is either the z Score or the t Value depending on whether a z-Test or t-Test is being
performed. This will be discussed in greater detail shortly.
Critical z Values
p Value
Excel 2010 and beyond
p Value =MIN(NORM.DIST(x_bar,µ,SE,TRUE),1-NORM.DIST(x_bar,µ,SE,TRUE))
x_bar represents one of the following:
- the sample mean if this is a one-independent sample z-Test
- the difference between the sample means of a two-independent sample z-Test
- the mean difference between the paired values if this is a paired z-Test.
If the z Score (Test Statistic) is known, the p Value can be calculated more simply as follows:
p Value =MIN(NORM.S.DIST(z Score,TRUE),1-NORM.S.DIST(z Score,TRUE))
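The same one-tailed p Value logic can be sketched in Python, with statistics.NormalDist standing in for NORM.S.DIST (the function name one_tailed_p_value is an illustrative choice):

```python
from statistics import NormalDist

def one_tailed_p_value(z_score):
    """Smaller tail area beyond the z Score, mirroring
    MIN(NORM.S.DIST(z, TRUE), 1 - NORM.S.DIST(z, TRUE))."""
    cdf = NormalDist().cdf(z_score)   # standard normal CDF at the z Score
    return min(cdf, 1 - cdf)          # take the smaller of the two tail areas

# With the z Score of 2.951 from the worked example in this section,
# the one-tailed p Value is about 0.0016.
print(round(one_tailed_p_value(2.951), 4))
```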
The 3 Equivalent Reasons To Reject the Null Hypothesis
The Null Hypothesis of a Hypothesis Test of Mean or Proportion is rejected if any of the following
equivalent conditions are shown to exist:
1) The sample statistic (the observed sample mean or proportion) is beyond the Critical Value. The
sample statistic would therefore lie in the Region of Rejection because the Critical Value is the boundary
of the Region of Rejection.
2) The Test Statistic (the t value or z Score) is farther from zero than the Critical t or z Value. The
Test Statistic is the number of Standard Errors that the sample statistic is from the curve’s mean. The
Critical t or z Value is the number of Standard Errors that the boundary of the Region of Rejection is from
the curve’s mean. If the Test Statistic is farther from the standardized mean of 0 than the
Critical t or z Value, the sample statistic lies in the Region of Rejection.
3) The p value is smaller than α for a one-tailed test or α/2 for a two-tailed test. The p Value is the
curve area beyond the sample statistic. α and α/2 equal the curve areas contained by the Region of
Rejection on that side for a one-tailed test and a two-tailed test respectively. If the p value is smaller than
α for a one-tailed test or α/2 for a two-tailed test, the sample statistic lies in the Region of Rejection.
Power of a Test
The Power of a test indicates the test’s sensitivity. The Power of a test is the probability that the test will
detect a significant difference if one exists. The Power of a test is the probability of not making a Type II
Error, which is failing to detect a difference when one exists. A test’s Power is therefore expressed by the
following formula:
Power = 1 – β
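As an illustration of this relationship, the power of a one-tailed z-Test can be sketched in Python. The alpha of 0.05 and the assumed true effect of 3 Standard Errors are hypothetical illustration values, not figures from this manual:

```python
from statistics import NormalDist

def one_tailed_z_power(effect_in_se, alpha=0.05):
    """Power = probability that the Test Statistic lands beyond the Critical
    z Value when the true mean sits effect_in_se Standard Errors past the
    Null Hypothesis mean. Hypothetical sketch, not the manual's notation."""
    z_critical = NormalDist().inv_cdf(1 - alpha)        # about 1.645 for alpha = 0.05
    return 1 - NormalDist().cdf(z_critical - effect_in_se)

power = one_tailed_z_power(3.0)    # hypothetical true effect of 3 Standard Errors
beta = 1 - power                   # Type II Error probability, since Power = 1 - beta
print(round(power, 3))
```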
Effect Size
Effect size in a t-Test or z-Test is a convention of expressing how large the difference between two
groups is without taking into account the sample size and whether that difference is significant.
Effect size of Hypotheses Tests of Mean is usually expressed in measures of Cohen’s d. Cohen’s d is a
standardized way of quantifying the size of the difference between the two groups. This standardization of
the size of the difference (the effect size) enables classification of that difference in relative terms of
“large,” “medium,” and “small.” A large effect would be a difference between two groups that is easily
noticeable with the measuring equipment available. A small effect would be a difference between two
groups that is not easily noticed.
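A common formulation of Cohen's d divides the difference between the two sample means by a pooled standard deviation; the sketch below uses hypothetical illustration data:

```python
from statistics import mean, stdev
from math import sqrt

def cohens_d(group1, group2):
    """Cohen's d using the pooled sample standard deviation
    (one common formulation; other variants exist)."""
    n1, n2 = len(group1), len(group2)
    s1, s2 = stdev(group1), stdev(group2)
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(group1) - mean(group2)) / pooled_sd

# Hypothetical illustration data for two groups
group_a = [12.1, 11.8, 12.5, 12.0, 12.3]
group_b = [11.2, 11.5, 11.0, 11.4, 11.1]
d = cohens_d(group_a, group_b)
print(round(d, 2))
```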
Nonparametric Alternatives
Nonparametric tests are not substituted for z-Tests because a z-Test (a Hypothesis Test of Mean that is
performed using the normal distribution) can only be performed on large samples (n > 30). With a large
sample, the sample mean and therefore the Test Statistic will always be normally distributed as per the
Central Limit Theorem, so the z-Test’s normality requirement is always met.
Nonparametric tests are sometimes substituted for t-Tests because normality requirements cannot be
met. A t-Test is a Hypothesis Test of Mean that can be performed if the sample statistic (and therefore the
Test Statistic) is distributed according to the t-Distribution under the Null Hypothesis. The sample statistic
(the sample mean) is distributed according to the t-Distribution if any of the following three conditions
exist:
1) Sample size is large (n > 30). The sample taken for the hypothesis test must have more than 30 data
observations.
2) The population from which the sample was taken is verified to be normal-distributed.
3) The sample is verified to be normal-distributed.
If none of these conditions can be met or confirmed, a nonparametric test can often be substituted for a t-
Test. A nonparametric test does not have normality requirements that a parametric test such as a t-Test
does.
Hypothesis Tests of Proportion – Basic Definition
A Hypothesis Test of Proportion compares an observed sample proportion with a hypothesized
population proportion to determine if the sample was taken from the same population. An example would
be to compare the proportion of defective units from a sample taken from one production line to the
proportion of defective units from all production lines to determine if the proportion defective from the one
production line (the population from which the sample was taken) is different from the proportion
defective of all production lines (the hypothesized population parameter). Each data observation taken for
a Hypothesis Test of Proportion can have only one of two values. In this case, a sampled unit from a
production line is either defective or it is not.
Hypothesis Tests of Proportion are covered in detail in a separate section in this manual. They are also
summarized at the end of the binomial distribution section.
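A minimal sketch of this kind of test in Python, using a standard one-sample proportion z-Test formulation with hypothetical defect counts (the manual's detailed treatment appears in its Hypothesis Tests of Proportion section):

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical illustration: 18 defective units in a sample of 200 from one
# production line, versus a hypothesized overall defect proportion of 0.06.
p0 = 0.06                              # hypothesized population proportion
n = 200                                # sample size
p_hat = 18 / 200                       # observed sample proportion = 0.09

SE = sqrt(p0 * (1 - p0) / n)           # Standard Error under the Null Hypothesis
z_score = (p_hat - p0) / SE            # Test Statistic
cdf = NormalDist().cdf(z_score)
p_value = min(cdf, 1 - cdf)            # one-tailed p Value

print(round(z_score, 3), round(p_value, 4))
```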
Requirements for a z-Test
A z-Test can be performed only if the sample mean (and therefore the Test Statistic, which is derived
from the sample mean) is normal-distributed. The sample mean and therefore the Test Statistic are
normal-distributed only when the following two conditions are both met:
1) The size of the single sample taken is large (n > 30). The Central Limit Theorem states that means of
large samples will be normal-distributed. When the size of the single sample is small (n < 30), only a t-
Test can be performed.
2) The population standard deviation, σ (sigma), is known.
This z-Test was a two-tailed test as evidenced by the yellow Region of Rejection split between both
outer tails. In this z-Test, alpha was set to 0.05. This 5-percent Region of Rejection is split between the
two tails so that each tail contains a 2.5-percent Region of Rejection.
The mean of this non-standardized normal distribution curve is 186,000. This indicates that the Null
Hypothesis is as follows:
H0: x_bar = 186,000
Since this is a two-tailed z-Test, the Alternative Hypothesis is as follows:
H1: x_bar ≠ 186,000
This one-sample z-Test is evaluating whether the population from which the sample was taken has a
population mean that is not equal to 186,000. This is a non-directional z-Test and is therefore two-tailed.
The sample statistic is the observed sample mean of this single sample taken for this test. This observed
sample mean is calculated to be 200,000.
The boundaries of the Region of Rejection occur at 176,703 and 195,297. Everything outside of these two
points is in the Region of Rejection. These two Critical Values are 1.959 Standard Errors from the
standardized mean of 0. This indicates that the Critical z Values are ±1.959.
The graph shows that the sample statistic (the sample mean of 200,000) falls beyond the right Critical
Value of 195,297 and is therefore in the Region of Rejection.
The sample statistic is 2.951 Standard Errors from the standardized mean of 0. This is farther from the
standardized mean of 0 than the right Critical z Value, which is 1.959.
The curve area beyond the sample statistic consists of 0.16 percent of the area under the curve. This is
smaller than α/2, which is 2.5 percent of the total curve area because alpha was set to 0.05.
As the graph shows, all three equivalent conditions have been met to reject the Null Hypothesis. It can be
stated with at least 95 percent certainty that the mean of the population from which the sample was taken
does not equal the hypothesized population mean of 186,000.
Detailed discussions of each of the three types of z-Tests along with examples in Excel are as follows:
1) One-Sample z-Test in Excel
Overview
This hypothesis test determines whether the mean of the population from which the sample was taken is
equal to (two-tailed test) or else greater than or less than (one-tailed test) a constant. This constant
is often the known mean of a population from which the sample may have come. The constant is the
Constant on the right side of the Null Hypothesis.
x_bar = Observed Sample Mean
The data sample of sales for the month for a random sample of 40 retail stores in a region is as follows:
SE = Standard Error = σ / SQRT(n) = 30,000 / SQRT(40) = 4,743
Note that this calculation of the Standard Error using the population standard deviation, σ, is the true
Standard Error. If the sample standard deviation, s, were used in place of σ, the Standard Error calculated
would be an estimate of the true Standard Error. The z-Test requires the population standard deviation
but the t-Test uses the sample standard deviation as an estimate of the population standard deviation.
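The Standard Error above, together with the Critical Values and z Score derived later in this example, can be verified with a short Python sketch (statistics.NormalDist stands in for Excel's NORM.S.INV):

```python
from math import sqrt
from statistics import NormalDist

# Values stated in this worked example
sigma = 30_000       # known population standard deviation
n = 40               # sample size
hyp_mean = 186_000   # Constant from the Null Hypothesis
x_bar = 200_000      # observed sample mean
alpha = 0.05         # two-tailed test

SE = sigma / sqrt(n)                            # Standard Error, about 4,743
z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # Critical z Value, about 1.960
lower = hyp_mean - z_crit * SE                  # left Critical Value, about 176,703
upper = hyp_mean + z_crit * SE                  # right Critical Value, about 195,297
z_score = (x_bar - hyp_mean) / SE               # Test Statistic, about 2.951

print(round(SE), round(lower), round(upper), round(z_score, 3))
```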
As with all Hypothesis Tests of Mean, we must satisfactorily answer these two questions and then
proceed to the four-step method of solving the hypothesis test that follows.
The Initial Two Questions That Need To Be Answered Before Performing the Four-Step Hypothesis Test
of Mean are as follows:
e) t-Test or z-Test?
A hypothesis test of means can be performed if the distribution of the Test Statistic under the Null
Hypothesis can be approximated by either the normal distribution or the t-Distribution.
A z-Test is a statistical test in which the distribution of the Test Statistic under the Null Hypothesis can be
approximated by the normal distribution. A t-test is a statistical test in which the distribution of the Test
Statistic under the Null Hypothesis can be approximated by the t-Distribution.
This hypothesis test of mean can be performed as a z-Test because the sample size is large (n = 40) and the
population standard deviation (σ = 30,000) is known. The large sample size and known population
standard deviation ensure that the distribution of the sample mean (and therefore Test Statistic, which is
derived from the sample mean) can be approximated by the normal distribution under the Null
Hypothesis.
It should be noted that a one-sample t-Test can always be used in place of a one-sample z-Test. All z-
Tests can be replaced by their equivalent t-Tests. As a result, some major commercial statistical software
packages, including the well-known SPSS, provide only t-Tests and no direct z-Tests.
This hypothesis test is a z-Test that is a one-sample, two-tailed hypothesis test of mean as long as
all required assumptions have been met.
We now proceed to complete the four-step method for solving all Hypothesis Tests of Mean. These four
steps are as follows:
Step 1 – Create the Null Hypothesis and the Alternative Hypothesis
Step 2 – Map the Normal or t-Distribution Curve Based on the Null Hypothesis
Step 3 – Map the Regions of Acceptance and Rejection
Step 4 – Determine Whether to Accept or Reject the Null Hypothesis By Performing the Critical
Value Test, the p Value Test, or the Critical z Value Test
Proceeding through the four steps is done as follows:
This non-standardized normal Distribution curve has its mean set to equal the Constant taken from the
Null Hypothesis, which is:
H0: x_bar = Constant = 186,000
This non-standardized normal Distribution curve is constructed from the following parameters:
Mean = 186,000
Standard Error = 4,743
Distributed Variable = x_bar
Step 3 – Map the Regions of Acceptance and Rejection
The goal of a hypothesis test is to determine whether to reject or fail to reject the Null Hypothesis at a
given level of certainty. If the two things being compared are far enough apart from each other, the Null
Hypothesis (which states that the two things are not different) can be rejected. In this case we are trying
to show graphically how different the sample mean, x_bar = $200,000, is from the national average of
$186,000.
The non-standardized normal distribution curve can be divided up into two types of regions: the Region of
Acceptance and the Regions of Rejection. A two-tailed test has the Region of Rejection split between the
two outer tails. A boundary between a Region of Acceptance and a Region of Rejection is called a Critical
Value.
If the sample mean’s value of x_bar = 200,000 falls into a Region of Rejection, the Null Hypothesis is
rejected. If the sample mean’s value of x_bar = 200,000 falls into a Region of Acceptance, the Null
Hypothesis is not rejected.
The total size of the Region of Rejection is equal to Alpha. In this case Alpha, α, is equal to 0.05. This
means that the Region of Rejection will take up 5 percent of the total area under this normal distribution curve.
This 5 percent is divided up between the two outer tails. Each outer tail contains 2.5 percent of the curve
that is the Region of Rejection.
The boundaries between the Regions of Acceptance and the Regions of Rejection are called Critical
Values. The locations of these Critical Values need to be calculated.
An Excel-generated distribution curve with the blue Region of Acceptance and the yellow Regions of
Rejection is shown as follows:
Equivalently, reject the Null Hypothesis if the z Score is farther from the standardized mean of zero than
the Critical z Value.
The Constant is the Constant from the Null Hypothesis (H0: x_bar = Constant = 186,000)
Z Score (Test Statistic) = (200,000 – 186,000)/4,743
Z Score (Test Statistic) = 2.951
This means that the sample mean, x_bar, is 2.951 standard errors from the curve mean (186,000).
The p Value (0.0016) is smaller than Alpha/2 (0.025), the size of the Region of Rejection in the right tail,
and we therefore reject the Null Hypothesis. The following Excel-generated graph shows that the red p Value (the curve
area beyond x_bar) is smaller than the yellow Alpha, which is the 5 percent Region of Rejection split
between both outer tails.
The Excel z-Test formula produces the p Value as follows:
Note that the array can be spread across two columns as is done here. The array does not have to be
entirely contained in a single column in this case.
2) Two-Independent-Sample, Unpooled z-Test in Excel
Overview
This hypothesis test evaluates two independent samples to determine whether the difference between the
two sample means (x_bar1 and x_bar2) is equal to (two-tailed test) or else greater than or less than (one-
tailed test) a constant.
This is an unpooled test. An unpooled test can always be used in place of a pooled test. An unpooled test
must be used when population variances are not similar. An unpooled test calculates the Standard Error
using separate standard deviations instead of combining them into a single, pooled standard deviation as
a pooled test does.
In the real world, only the sample variances are known; the population variances are usually not known.
Therefore the t-Test is nearly always used to compare two independent samples. For this reason, only the
unpooled, two-independent-sample z-Test will be explained. This z-Test can always be used in place of
the pooled z-Test that could be used if the population variances were known to be similar enough.
x_bar1 - x_bar2 = Observed difference between the sample means
Note that this is the same formula for SE as for the two-independent-sample, unpooled t-Test except that
the variance for the z-Test is the population variance as follows:
var1 = σ1²
var2 = σ2²
and not the sample variance used for the t-Test as follows:
var1 = s1²
var2 = s2²
Example of 2-Sample, 2-Tailed, Unpooled z-Test in Excel
This problem is very similar to the problem solved in the t-test section for a two-independent-sample, two-
tailed t-test. Similar problems were used in each of these sections to show the similarities and also
contrast the differences between the two-independent-sample z-Test and t-test as easily as possible.
Two shifts on a production line are being compared to determine if there is a difference in the average daily
number of units produced by each shift. The two shifts operate eight hours per day under nearly identical
conditions that remain fairly constant from day to day. A sample of the total number of units produced by
each shift on a random selection of days is taken. Determine with a 95 percent Level of Confidence if
there is a difference between the average daily number of units produced by the two shifts.
Note that when performing two-sample z-tests in Excel, always designate Sample 1 (Variable 1) to be the
sample with the larger mean.
The results of the two-independent-sample z-Test will be more intuitive if the sample group with the larger
mean is designated as the first sample and the sample group with the smaller mean is designated as the
second sample.
Details about both data samples are shown as follows:
e) t-Test or z-Test?
A z-Test is a statistical test in which the distribution of the Test Statistic under the Null Hypothesis can be
approximated by the normal distribution.
The Test Statistic is distributed by the normal distribution if both samples are large and both population standard deviations are known. Both samples are considered large because both sample sizes (n1 = 40 and n2 = 36) exceed 30. Both population standard deviations (σ1 = 25.5 and σ2 = 11.2) are known.
Because both sample sizes (n1 = 40 and n2 = 36) exceed 30, both sample means are normally distributed as per the Central Limit Theorem. The difference between two normally-distributed sample means is also normally distributed. The Test Statistic is derived from the difference between the two means and is therefore normally distributed. A z-Test can be performed if the Test Statistic is normally distributed.
It should be noted that a two-independent-sample, unpooled t-Test can always be used in place of a two-independent-sample, unpooled z-Test. All z-Tests can be replaced by their equivalent t-Tests. As a result, some major commercial statistical software packages, including the well-known SPSS, provide only t-Tests and no direct z-Tests.
We now proceed to complete the four-step method for solving all Hypothesis Tests of Mean. These four steps are as follows:
Step 1 – Create the Null Hypothesis and the Alternative Hypothesis
Step 2 – Map the Normal or t-Distribution Curve Based on the Null Hypothesis
Step 3 – Map the Regions of Acceptance and Rejection
Step 4 – Determine Whether to Reject or Fail to Reject the Null Hypothesis By Performing the Critical Value Test, the p Value Test, or the Critical z Value Test
Proceeding through the four steps is done as follows:
Step 2 – Map the Distributed Variable on a Normal Distribution Curve
H0: x_bar1-x_bar2 = Constant = 0
n1 = 40
n2 = 36
Var1 = σ1² = (25.5)² = 650.25
Var2 = σ2² = (11.2)² = 125.44
This non-standardized normal distribution curve has its mean set to equal the Constant taken from the
Null Hypothesis, which is:
H0: x_bar1-x_bar2 = Constant = 0
This non-standardized normal distribution curve is constructed from the following parameters:
Mean = 0
Standard Error = 4.443
Distributed Variable = x_bar1-x_bar2
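These curve parameters can be checked outside of Excel. The following Python sketch recomputes the unpooled Standard Error from the population variances and sample sizes given above (the individual sample means are not restated in this section, so only their difference of 4.31 is referenced):

```python
from math import sqrt

# Population standard deviations and sample sizes from this problem
sigma1, n1 = 25.5, 40
sigma2, n2 = 11.2, 36

# Unpooled Standard Error of the difference between two sample means:
# SE = sqrt(sigma1^2/n1 + sigma2^2/n2)
se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)
print(round(se, 3))  # 4.443, matching the Standard Error above
```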
Step 3 – Map the Regions of Acceptance and Rejection
The goal of a hypothesis test is to determine whether to reject or fail to reject the Null Hypothesis at a
given level of certainty. If the two things being compared are far enough apart from each other, the Null
Hypothesis (which states that the two things are not different) can be rejected. In this case we are trying
to show graphically how different x_bar1 is from x_bar2 by showing how different x_bar1-x_bar2 (4.31) is
from zero.
The non-standardized normal distribution curve can be divided up into two types of regions: the Region of Acceptance and the Region of Rejection. A boundary between a Region of Acceptance and a Region of Rejection is called a Critical Value.
If the difference between the sample means, x_bar1-x_bar2 (4.31), falls into a Region of Rejection, the
Null Hypothesis is rejected. If the difference between the sample means, x_bar1-x_bar2 (4.31), falls into a
Region of Acceptance, the Null Hypothesis is not rejected.
The total size of the Region of Rejection is equal to Alpha. In this case Alpha, α, is equal to 0.05. This means that the Region of Rejection will take up 5 percent of the total area under this normal distribution curve. The operator in the Alternative Hypothesis determines whether the hypothesis test is two-tailed or one-tailed and, if one-tailed, which outer tail contains the Region of Rejection. The Alternative Hypothesis is as follows:
H1: x_bar1-x_bar2 ≠ 0
A “not equal” operator indicates that this will be a two-tailed test. This means that the Region of Rejection
is split between both outer tails.
The boundaries between Regions of Acceptance and Regions of Rejection are called Critical Values. The
locations of these Critical Values need to be calculated.
The following Excel-generated distribution curve with the blue Region of Acceptance and the yellow Regions of Rejection is shown as follows:
2) Compare the z Score with the Critical z Value
The z Score is the number of Standard Errors that x_bar1-x_bar2 (4.31) is from the curve’s mean of 0.
The Critical z Value is the number of Standard Errors that the Critical Value is from the curve’s mean.
Reject the Null Hypothesis if the z Score is farther from the standardized mean of zero than the Critical z
Value. Fail to reject the Null Hypothesis if the z Score is closer to the standardized mean of zero than the
Critical z Value.
The Constant is the Constant from the Null Hypothesis (H0: x_bar1-x_bar2 = Constant = 0)
Z Score (Test Statistic) = (4.31 – 0)/4.443
Z Score (Test Statistic) = 0.97
This means that the sample mean, x_bar1-x_bar2 (4.31), is 0.97 standard errors from the curve mean (0).
Two-tailed Critical z Values = ±NORM.S.INV(1-α/2)
Two-tailed Critical z Values = ±NORM.S.INV(1-0.05/2)
Two-tailed Critical z Values = ±NORM.S.INV(0.975) = ±1.9599
This means that the boundaries between the Region of Acceptance and the Region of Rejection are
1.9599 standard errors from the curve mean on each side since this is a two-tailed test.
The Null Hypothesis is not rejected because the z Score (+0.97) is closer to the standardized mean of
zero than the Critical z Value on the right side (+1.9599).
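As a check on these numbers, the z Score and Critical z Values can be recomputed with Python's standard library, where NormalDist().inv_cdf plays the role of NORM.S.INV:

```python
from statistics import NormalDist

alpha = 0.05
z_score = 4.31 / 4.443  # observed difference divided by the Standard Error

# Two-tailed Critical z Values = ±NORM.S.INV(1 - alpha/2)
critical_z = NormalDist().inv_cdf(1 - alpha / 2)

# Reject H0 only if the z Score is farther from 0 than the Critical z Value
reject = abs(z_score) > critical_z
print(round(z_score, 2), round(critical_z, 4), reject)  # 0.97 1.96 False
```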
The following Excel-generated graph shows that the red p Value (the curve area beyond x_bar1-x_bar2) is
larger than the yellow Alpha, which is the 5 percent Region of Rejection split between both outer tails.
Note that this tool requires that all data in each sample group be placed in a single column. In the following image, only the first 19 data points of each sample are shown.
The completed dialogue box for this tool which produced the preceding output is as follows:
3) Paired (Two-Sample Dependent) z-Test in Excel
Overview
This hypothesis test determines whether the mean of a sample of differences between pairs of data (x_bardiff) is equal to (two-tailed test) or else greater than or less than (one-tailed test) a constant.
Before-and-after fitness levels of individuals undergoing a training program would be an example of
paired data. The sample evaluated would be the group of differences between the before-and-after
scores of the individuals. This is called the difference sample.
The t-test is nearly always used instead of a z-Test to perform a two-dependent-sample (paired)
hypothesis test of mean. The z-Test requires the population standard deviation of the differences
between the pairs be known. The sample standard deviation of the difference sample is readily available
but the population standard deviation of the differences is usually not known. The t-test requires only the
sample standard deviation of the sample of paired differences be known.
x_bardiff = difference sample mean
Null Hypothesis H0: x_bardiff = Constant
x_bardiff was calculated by subtracting the Before measurement from the After measurement. This is the
intuitive way to determine if a reduction in error occurred.
This problem illustrates why the t-test is nearly always used instead of a z-Test to perform a two-
dependent-sample (paired) hypothesis test of mean. The z-Test requires the population standard
deviation of the differences between the pairs be known. This is rarely ever the case, but will be given for
this problem so that a paired z-Test can be used. The t-test requires only the sample standard deviation
of the sample of paired differences be known.
It is the difference that we are concerned with. A hypothesis test will be performed on the sample of differences. The distributed variable will be designated as x_bardiff and will represent the average difference between the After and Before samples.
Summary of Problem Information
x_bardiff = sample mean =AVERAGE() = -2.14
σdiff = population standard deviation = 6.4
n = sample size = number of pairs = COUNT() = 40
SEdiff = Standard Error = σdiff / SQRT(n) = 6.4 / SQRT(40) = 1.01
Note that this calculation of the Standard Error, SEdiff, using the population standard deviation, σdiff, is the true Standard Error. If the sample standard deviation, sdiff, were used in place of σdiff, the Standard Error calculated would be an estimate of the true Standard Error. The z-Test requires the population standard deviation of the paired differences but the t-test uses the sample standard deviation as an estimate of the population standard deviation of the paired differences.
Level of Certainty = 0.95
Alpha = 1 - Level of Certainty = 1 – 0.95 = 0.05
The Excel data analysis tool Descriptive Statistics is not employed when the z-Test is used. Descriptive Statistics should be used only if a t-Test will be performed. The Standard Deviation and Standard Error calculated by Descriptive Statistics are based upon the sample standard deviation. The z-Test uses the population standard deviation instead of the sample standard deviation used by the t-Test.
As with all Hypothesis Tests of Mean, we must satisfactorily answer these two questions and then
proceed to the four-step method of solving the hypothesis test that follows.
The Initial Two Questions That Need To Be Answered Before Performing the Four-Step Hypothesis Test
of Mean Are As Follows:
e) t-Test or z-Test?
A z-Test can be performed if the Test Statistic’s distribution can be approximated by the normal
distribution under the Null Hypothesis. The Test Statistic’s distribution can be approximated by the normal
distribution only if the difference sample size is large (n > 30) and the population standard deviation, σ, is
known. A t-Test must be used in all other cases.
Sample size, n, equals 40 and population standard deviation, σ, equals 6.4 so both conditions are met for
the z-Test.
It should be noted that a paired t-Test can always be used in place of a paired z-Test. All z-Tests can be replaced by their equivalent t-Tests. As a result, some major commercial statistical software packages, including the well-known SPSS, provide only t-Tests and no direct z-Tests.
This hypothesis test is a two-sample, paired (dependent), one-tailed z-Test of mean.
The difference sample size indicates how to determine the distribution of the difference sample mean and
therefore the distribution of the Test Statistic. As per the Central Limit Theorem, as the difference sample
size increases, the distribution of the difference sample means converges to the normal distribution.
In actuality, the sample mean converges toward the t-Distribution as sample size increases. The t-
Distribution converges to the standard normal distribution as sample size increases. The t-Distribution
nearly exactly resembles the standard normal distribution when sample size exceeds 30. The sample
mean’s distribution can therefore be approximated by the normal distribution. The Test Statistic’s
distribution can therefore be approximated by the normal distribution because the Test Statistic is derived
from the sample mean.
As per the Central Limit Theorem, the Test Statistic’s distribution can be approximated by the normal distribution when the difference sample size is large regardless of the distribution of the population from which the sample was drawn. There is also no need to verify the normality of the difference sample, as would be the case with a t-Test when the population distribution is not known.
We can now proceed to complete the four-step method for solving all Hypothesis Tests of Mean, following the same four steps listed earlier.
Parameters necessary to map the distributed variable, x_bardiff , to the normal distribution are the
following:
x_bardiff = sample mean =AVERAGE() = -2.14
σdiff = population standard deviation = 6.4
n = sample size = number of pairs = COUNT() = 40
SEdiff = Standard Error = σdiff / SQRT(n) = 6.4 / SQRT(40) = 1.01
These parameters are used to map the distributed variable, x_bardiff, to the Excel-generated normal
distribution curve as follows:
This non-standardized normal distribution curve has its mean set to equal the Constant taken from the Null Hypothesis, which is:
H0: x_bardiff = Constant = 0
This non-standardized normal distribution curve is constructed from the following parameters:
Curve Mean = Constant = 0
Standard Errordiff = 1.01
Distributed Variable = x_bardiff
Step 3 – Map the Regions of Acceptance and Rejection
The goal of a hypothesis test is to determine whether to reject or fail to reject the Null Hypothesis at a given
level of certainty. If the two things being compared are far enough apart from each other, the Null
Hypothesis (which states that the two things are not different) can be rejected. In this case we are trying
to show graphically how different x_bardiff (-2.14) is from the hypothesized mean of 0.
The non-standardized normal distribution curve can be divided up into two types of regions: the Region of
Acceptance and the Region of Rejection. A boundary between a Region of Acceptance and a Region of
Rejection is called a Critical Value.
If x_bardiff’s value of -2.14 falls in the Region of Acceptance, we fail to reject the Null Hypothesis. If x_bardiff’s value of -2.14 falls in the Region of Rejection, we can reject the Null Hypothesis.
The total size of the Region of Rejection is equal to Alpha. In this case Alpha, α, is equal to 0.05. This means that the Region of Rejection will take up 5 percent of the total area under this normal distribution curve. This 5 percent Region of Rejection is entirely contained in the outer left tail.
The boundaries between Regions of Acceptance and Regions of Rejection are called Critical Values. The locations of these Critical Values need to be calculated as follows.
The distribution curve with the blue 95-percent Region of Acceptance and the yellow 5-percent Region of Rejection entirely contained in the left tail is shown as follows:
Z Score (also called the Test Statistic) = (x_bardiff – 0) / SEdiff
Z Score (Test Statistic) = (-2.14 – 0) / 1.01
Z Score (Test Statistic) = -2.11
This indicates that x_bardiff is 2.11 standard errors to the left of the mean (mean = Constant = 0).
3) Compare p Value to Alpha.
The p Value is the percent of the curve that is beyond x_bardiff (-2.14). If the p Value is smaller than
Alpha, the Null Hypothesis is rejected. The p Value in this case is calculated by the following Excel
formula:
p Value =MIN(NORM.S.DIST(z Score,TRUE),1-NORM.S.DIST(z Score,TRUE))
p Value =MIN(NORM.S.DIST(-2.11,TRUE),1-NORM.S.DIST(-2.11,TRUE))
p Value = 0.0174
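The same p Value can be reproduced in Python, where NormalDist().cdf corresponds to NORM.S.DIST(z, TRUE); the result differs from 0.0174 only in the last digit because the text rounds the z Score to -2.11 before looking up the tail area:

```python
from math import sqrt
from statistics import NormalDist

x_bar_diff = -2.14   # difference sample mean
sigma_diff = 6.4     # known population standard deviation of the differences
n = 40

se_diff = sigma_diff / sqrt(n)   # Standard Error, about 1.01
z_score = x_bar_diff / se_diff   # Test Statistic, about -2.11

# p Value = MIN(NORM.S.DIST(z,TRUE), 1 - NORM.S.DIST(z,TRUE))
cdf = NormalDist().cdf(z_score)
p_value = min(cdf, 1 - cdf)
print(round(p_value, 4))  # about 0.017
```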
The p Value (0.0174) is smaller than Alpha (0.05) and we therefore reject the Null Hypothesis. The following Excel-generated graph of this non-standardized normal distribution curve shows that the red p Value (the curve area beyond x_bardiff) is smaller than the yellow Alpha, which is the 5 percent Region of Rejection in the left tail:
Excel Shortcut to Performing a Paired z-Test
Excel does not provide any formulas or tools in the Data Analysis ToolPak add-in that directly perform the
paired z-Test. The easy work-around is to perform a one-sample z-Test on the difference data sample.
This formula is as follows:
p Value =MIN(Z.TEST(array,Constant,σdiff),1-Z.TEST(array,Constant,σdiff))
It should be noted that when the sample mean is greater than the Constant, the p Value Excel formula is
p Value = Z.TEST(array,Constant,σdiff)
When the sample mean is less than the Constant, the p Value Excel formula is
p Value = 1 - Z.TEST(array,Constant,σdiff)
The Constant is taken from the Null Hypothesis and is equal to 0.
The Null Hypothesis is as follows:
H0: x_bardiff = Constant = 0
Applying the Excel one-sample z-Test formula to the sample of difference data would give the following p
Value for this paired z-Test:
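Because the difference data itself is not reproduced in this excerpt, the mechanics of the work-around can only be sketched with a hypothetical difference sample. Excel's Z.TEST(array, x, sigma) returns the right-tail area 1 − Φ((AVERAGE(array) − x)/(σ/√n)), which this Python sketch mimics:

```python
from math import sqrt
from statistics import NormalDist, mean

def z_test(array, constant, sigma):
    """Mimic Excel's Z.TEST: the right-tail area beyond the z score."""
    z = (mean(array) - constant) / (sigma / sqrt(len(array)))
    return 1 - NormalDist().cdf(z)

def paired_z_p_value(array, constant, sigma):
    """One-tailed p Value = MIN(Z.TEST(...), 1 - Z.TEST(...))."""
    zt = z_test(array, constant, sigma)
    return min(zt, 1 - zt)

# Hypothetical difference sample (for illustration only)
diffs = [-3, -1, -4, 0, -2, -5, 1, -3]
print(round(paired_z_p_value(diffs, 0, 2.5), 4))
```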
Hypothesis Testing on Binomial Data
Overview
A hypothesis test evaluates whether a sample is different enough from a population to establish that the
sample probably did not come from that population. If a sample is different enough from a hypothesized
population, then the population from which the sample came is different than the hypothesized
population.
Null Hypothesis
A hypothesis test is based upon a Null Hypothesis which states that the sample did come from a
hypothesized population. A hypothesis test compares a sample statistic such as a sample mean or
sample proportion to a population parameter such as the population’s mean or proportion. The amount of
difference between the sample statistic and the population parameter determines whether the Null
Hypothesis can be rejected or not.
The Null Hypothesis states that the population from which the sample came has the same mean or
proportion as a hypothesized population. The Null Hypothesis is always an equality stating that the
means or proportions of two populations are the same.
An example of a basic Null Hypothesis for a Hypothesis Test of Mean would be the following:
H0: x_bar = Constant = 5
This Null Hypothesis would be used to state that the population from which the sample was taken has a
mean equal to 5. The Constant (5) is the mean of the hypothesized population that the sample’s
population is being compared to. The Null Hypothesis states that the sample’s population and the
hypothesized population have the same means. The Alternative Hypothesis states that they are different.
An example of a basic Null Hypothesis for a Hypothesis Test of Proportion would be the following:
H0: p_bar = Constant = 0.3
This Null Hypothesis would be used to state that the population from which the sample was taken has a
proportion equal to 0.3. The Constant (0.3) is the proportion of the hypothesized population that the
sample’s population is being compared to. The Null Hypothesis states that the sample’s population and
the hypothesized population have the same proportions. The Alternative Hypothesis states that they are
different.
Alternative Hypothesis
The Alternative Hypothesis is always an inequality stating that the means or proportions of two populations are not the same. The Alternative Hypothesis is non-directional if it states that the means or proportions of two populations are merely not equal to each other. The Alternative Hypothesis is directional if it states that the mean or proportion of one of the populations is less than or greater than the mean or proportion of the other population.
An example of a non-directional Alternative Hypothesis for a Hypothesis test of Mean would be the
following:
H1: x_bar ≠ 5
This Alternative Hypothesis would be used to state that the population from which the sample was taken
has a mean that is not equal to 5.
An example of a directional Alternative Hypothesis would be the following:
H1: x_bar > 5
or
H1: x_bar < 5
These Alternative Hypotheses would be used to state that the population from which the sample was
taken has a mean that is either greater than or less than 5.
An example of a non-directional Alternative Hypothesis for a Hypothesis test of Proportion would be the
following:
H1: p_bar ≠ 0.3
This Alternative Hypothesis would be used to state that the population from which the sample was taken
has a proportion that is not equal to 0.3.
An example of a directional Alternative Hypothesis would be the following:
H1: p_bar > 0.3
or
H1: p_bar < 0.3
These Alternative Hypotheses would be used to state that the population from which the sample was
taken has a proportion that is either greater than or less than 0.3.
Region of Acceptance
A Hypothesis Test of Mean or Proportion can be performed if the Test Statistic is distributed according to the normal distribution or the t distribution. The Test Statistic is derived directly from the sample statistic such as the sample mean. If the Test Statistic is distributed according to the normal or t distribution, then the sample statistic is also distributed according to the normal or t distribution. This will be discussed in greater detail shortly.
A Hypothesis Test of Mean or Proportion can be understood much more intuitively by mapping the sample statistic (the sample mean or proportion) to its own unique normal or t distribution. The sample statistic is the distributed variable whose distribution is mapped according to its own unique normal or t distribution.
The Region of Acceptance is the percentage of area under this normal or t distribution curve that equals the test’s specified Level of Certainty. If the hypothesis test requires a 95 percent Level of Certainty in order to reject the Null Hypothesis, the Region of Acceptance will include 95 percent of the total area under the distributed variable’s mapped normal or t distribution curve.
If the observed value of the sample statistic (the observed mean or proportion of the single sample taken)
falls inside of the Region of Acceptance, the Null Hypothesis is not rejected. If the observed value of the
sample statistic falls outside of the Region of Acceptance (into the Region of Rejection), the Null
Hypothesis is rejected.
Region of Rejection
The Region of Rejection is the percentage of area under this normal or t distribution curve that equals the
test’s specified Level of Significance (alpha). It is important to remember the following relationship:
Level of Significance (alpha) = 1 – Level of Certainty.
If the required Level of Certainty to reject the Null Hypothesis is 95 percent, then the following are true:
Level of Certainty = 0.95
Level of Significance (alpha) = 0.05
The Region of Acceptance includes 95 percent of the total area under the normal or t distribution curve
that maps the distributed variable, which is the sample statistic (the sample mean or proportion).
The Region of Rejection includes 5 percent of the total area under the normal or t distribution curve that maps the distributed variable, which is the sample statistic (the sample mean or proportion). The 5-percent alpha region is entirely contained in one of the tails if the test is a one-tailed test. The 5-percent alpha region is split between both of the outer tails if the test is a two-tailed test.
If the observed value of the sample statistic (the observed mean or proportion of the single sample taken)
falls inside of the Region of Rejection (outside the Region of Acceptance), the Null Hypothesis is rejected.
If the observed value of the sample statistic falls inside of the Region of Acceptance, the Null Hypothesis
is not rejected.
Critical Value(s)
Each hypothesis test has one or two Critical Values. A Critical Value is the location of a boundary between the Region of Acceptance and the Region of Rejection. A one-tailed test has one Critical Value because the Region of Rejection is entirely contained in one of the outer tails. A two-tailed test has two Critical Values because the Region of Rejection is split between the two outer tails.
The Null Hypothesis is rejected if the sample statistic (the observed sample mean or proportion) is farther from the curve’s mean than the Critical Value on that side. If the sample statistic is farther from the curve’s mean than the Critical Value on that side, the sample statistic lies in the Region of Rejection. If the sample statistic is closer to the curve’s mean than the Critical Value on that side, the sample statistic lies in the Region of Acceptance.
Test Statistic
Each hypothesis test calculates a Test Statistic. The Test Statistic is the amount of difference between
the observed sample statistic (the observed sample mean or proportion) and the hypothesized population
parameter (the Constant on the right side of the Null Hypothesis) which will be located at the curve’s
mean.
This difference is expressed in units of Standard Errors. The Test Statistic is the number of Standard Errors that are between the observed sample statistic and the hypothesized population parameter. The Null Hypothesis is rejected if that number of Standard Errors (specified by the Test Statistic) is larger than a critical number of Standard Errors. The critical number of Standard Errors is determined by the required Level of Certainty.
The Test Statistic is either the z Score or the t Value depending on whether a z Test or t Test is being
performed. This will be discussed in greater detail shortly.
Power of a Test
The Power of a test indicates the test’s sensitivity. The Power of a test is the probability that the test will
detect a significant difference if one exists. The Power of a test is the probability of not making a Type II
Error, which is failing to detect a difference when one exists. A test’s Power is therefore expressed by the
following formula:
Power = 1 – β
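As an illustration of this relationship (not a calculation from the text), the Power of a two-tailed z-Test can be computed from the true difference between means, the Standard Error, and alpha:

```python
from statistics import NormalDist

def power_two_tailed_z(true_diff, se, alpha=0.05):
    """Power = 1 - beta for a two-tailed z-Test: the probability that
    the z Score lands in a Region of Rejection when the true mean
    really differs from the hypothesized mean by true_diff."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = abs(true_diff) / se
    return nd.cdf(-z_crit + shift) + nd.cdf(-z_crit - shift)

# A larger true difference (relative to the Standard Error) raises Power
print(round(power_two_tailed_z(1.0, 0.5), 3))  # about 0.516
print(round(power_two_tailed_z(3.0, 0.5), 3))  # about 1.0
```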
Effect Size
Effect size in a t-Test or z Test is a convention of expressing how large the difference between two
groups is without taking into account the sample size and whether that difference is significant.
Effect size of Hypotheses Tests of Mean is usually expressed in measures of Cohen’s d. Cohen’s d is a
standardized way of quantifying the size of the difference between the two groups. This standardization of
the size of the difference (the effect size) enables classification of that difference in relative terms of
“large,” “medium,” and “small.” A large effect would be a difference between two groups that is easily
noticeable with the measuring equipment available. A small effect would be a difference between two
groups that is not easily noticed.
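A minimal sketch of Cohen's d for two independent groups follows, using the conventional cutoffs of 0.2, 0.5, and 0.8 for small, medium, and large effects (the group figures below are hypothetical, not taken from this book):

```python
from math import sqrt

def cohens_d(mean1, mean2, sd1, sd2, n1, n2):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    pooled_sd = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd

def classify(d):
    """Conventional effect-size labels: 0.2 small, 0.5 medium, 0.8 large."""
    d = abs(d)
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "negligible"

# Hypothetical group statistics for illustration only
d = cohens_d(105.2, 100.9, 8.1, 7.4, 40, 36)
print(round(d, 2), classify(d))  # 0.55 medium
```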
Hypothesis Tests covered in this section will be either Hypothesis Tests of Mean or Hypothesis Tests of Proportion. A data point of a sample taken for a Hypothesis Test of Mean can have a range of values. A data point of a sample taken for a Hypothesis Test of Proportion is binary; it can take only one of two values.
Hypothesis Tests of Mean – Basic Definition
A Hypothesis Test of Mean compares an observed sample mean with a hypothesized population mean to
determine if the sample was taken from the same population. An example would be to compare a sample
of monthly sales of stores in one region to the national average to determine if mean sales from the
region (the population from which the sample was taken) is different than the national average (the
hypothesized population parameter). As stated, a sample taken for a Hypothesis Test of Mean can have
a range of values. In this case, the sales of a sampled store can fall within a wide range of values.
Hypothesis Tests of Mean are covered in detail in separate sections on t Tests and z Tests.
t Tests are also summarized at the end of the section on the t distribution.
z Tests are also summarized at the end of the section on the normal distribution.
Hypothesis Tests of Proportion – Basic Definition
A Hypothesis Test of Proportion compares an observed sample proportion with a hypothesized
population proportion to determine if the sample was taken from the same population. An example would
be to compare the proportion of defective units from a sample taken from one production line to the
proportion of defective units from all production lines to determine if the proportion defective from the one
production line (the population from which the sample was taken) is different than from the proportion
defective of all production lines (the hypothesized population parameter). As stated, a sample taken for a
Hypothesis Test of Proportion can only have one of two values. In this case, a sampled unit from a
production line is either defective or it is not.
Data observations in the sample taken for a Hypothesis Test of Proportion are required to be distributed
according to the binomial distribution. Data that are binomially distributed are independent of each other,
binary (can assume only one of two states), and all have the same probability of assuming the positive
state.
The binomial distribution can be approximated by the normal distribution under the following two
conditions:
1) p (the probability of a positive outcome on each trial) and q (q = 1 – p) are not too close to 0 or 1.
2) np > 5 and nq > 5
A z Test can be performed on binomially-distributed data if the above conditions are met. Hypothesis Tests of Proportion use only z Tests and not t Tests because the binomial distribution is approximated by the normal distribution, not the t distribution.
The Test Statistic for a z Test is a z Score.
A Hypothesis Test of Proportion is performed in a very similar manner to a Hypothesis Test of Mean. A
general description of the major steps is as follows:
1) A sample of binary data is taken. The sample proportion is calculated. Examples of a sample proportion are the proportion of sampled people who are of one gender or the proportion of sampled production units that are defective.
2) A Null Hypothesis is created stating that the population from which the sample was taken has the same proportion as a hypothesized population proportion. An Alternative Hypothesis is constructed stating that the sample population’s proportion is not equal to, greater than, or less than the hypothesized population proportion, depending on the wording of the problem.
3) The sample proportion is mapped to a normal curve that has a mean equal to the hypothesized
population proportion and a Standard Error calculated based upon a formula specific to the type of
Hypothesis Test of Proportion.
4) The Critical Values are calculated and the Regions of Acceptance and Rejection are mapped on the
normal graph that maps the distributed variable.
5) Critical z Values, the Test Statistic (z Score) and p Value are calculated.
6) The Null Hypothesis is rejected if any of the following equivalent conditions are shown to exist:
a) The observed sample proportion, p_bar, is beyond the Critical Value.
b) The z Value (the Test Statistic) is farther from zero than the Critical z Value.
c) The p Value is smaller than α for a one-tailed test or α/2 for a two-tailed test.
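The six steps above can be sketched end to end in Python for the example graphed below. The sample size behind that graph is not stated in the text, so n = 50 is assumed here because it reproduces the quoted z Score of about 1.85 (an assumption for illustration, not a figure from the book):

```python
from math import sqrt
from statistics import NormalDist

p0 = 0.30      # hypothesized population proportion (H0: p_bar = 0.30)
p_hat = 0.42   # observed sample proportion
n = 50         # ASSUMED sample size, chosen to match the quoted z of ~1.85
alpha = 0.05

# Normal approximation to the binomial requires np > 5 and nq > 5
assert n * p0 > 5 and n * (1 - p0) > 5

se = sqrt(p0 * (1 - p0) / n)                      # Standard Error under H0
z_score = (p_hat - p0) / se                       # Test Statistic
critical_z = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed Critical z

# Reject H0 only if the z Score is farther from 0 than the Critical z Value
reject = abs(z_score) > critical_z
print(round(z_score, 2), round(critical_z, 2), reject)  # 1.85 1.96 False
```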
The Null Hypothesis is not rejected in the output of the following Hypothesis Test of Proportion because
none of the above equivalent conditions exist. This is evidenced in the following graph:
This z-Test was a two-tailed test as evidenced by the yellow Region of Rejection split between both outer tails. In this z-Test the alpha was set to 0.05. This 5-percent Region of Rejection is split between the two tails so that each tail contains a 2.5-percent Region of Rejection.
The mean of this non-standardized normal distribution curve is 0.30. This indicates that the Null
Hypothesis is as follows:
H0: p_bar = 0.30
Since this is a two-tailed z-Test, the Alternative Hypothesis is as follows:
H1: p_bar ≠ 0.30
This one-sample z-Test is evaluating whether the population from which the sample was taken has a
population proportion that is not equal to 0.30. This is a non-directional z-Test and is therefore two-tailed.
The sample statistic is the observed sample proportion of this single sample taken for this test. This
observed sample proportion is calculated to be 0.42.
The boundaries of the Region of Rejection occur at 0.17 and 0.43. Everything beyond these two points is
in the Region of Rejection. Everything inside of these two points is in the Region of Acceptance. These
two Critical Values are 1.96 Standard Errors from the standardized mean of 0. This indicates that the Critical z Values are ±1.96.
The graph shows that the sample statistic (the sample proportion of 0.42) falls inside the right Critical
Value of 0.43 and is therefore in the Region of Acceptance.
The sample statistic is 1.85 Standard Errors from the standardized mean of 0. This is closer to the
standardized mean of 0 than the right Critical z Value, which is 1.96.
The curve area beyond the sample statistic consists of 3.2 percent of the area under the curve. This is
larger than α/2 which is 2.5 percent of the total curve area because alpha was set to 0.05.
The Null Hypothesis is not rejected. As the graph shows, none of the three equivalent conditions have
been met to reject the Null Hypothesis. It cannot be stated with at least 95 percent certainty that the
proportion of the population from which the sample was taken does not equal the hypothesized
population proportion of 0.30.
It should be noted that failure to reject the Null Hypothesis is not equivalent to accepting the Null
Hypothesis. A hypothesis test can only reject or fail to reject the Null Hypothesis.
Uses of Hypothesis Tests of Proportion
1) Comparing the proportion of a sample taken from one population with another population's known proportion to determine if the two populations have different proportions. An example of this would be to
compare the proportion of monthly purchases returned in a sample of retail stores from one region to the
national mean monthly return rate to determine if the monthly proportion of sales returned in all stores in
the one region is different than the national monthly return rate.
2) Comparing the proportion of a sample taken from one population to a fixed proportion to determine if
that population’s proportion is different than the fixed proportion. An example of this might be to compare
the proportion of a specified chemical measured in a sample of a number of units of a product to the
company’s claims about that product specification to determine if the actual proportion of the chemical in
all units of that company’s product is different than what the company claims it is.
3) Comparing the proportion of a sample from one population with the proportion of a sample from
another population to determine if the two populations have different proportions. An example of this
would be to compare the proportion of defective units of a sample of production runs by one crew with the
proportion of defective units of a sample of production runs by another crew to determine if the two crews
have consistently different proportions of defective units in all of their runs.
4) Comparing successive measurement pairs taken on the same group of objects to determine if anything
has changed between measurements. An example of this would be to evaluate whether there is a difference in the proportion of the same people passing a standardized test before and after a training program to determine if the training program makes a difference in the proportion of all people who take the standardized test before and after undergoing the training.
5) Comparing the same measurements taken on pairs of related objects. An example of this would be to
evaluate whether the proportion of total household income brought in by the husband and the wife is
different in a sample of married couples to determine if there is a difference in the proportions of total
household income brought in by husbands and wives in all married couples.
It is important to note that a hypothesis test is used to determine if two populations are different. The outcome of a hypothesis test is to either reject or fail to reject the Null Hypothesis. It would be incorrect to state that a hypothesis test is used to determine if two populations are the same.
1) One-Sample Hypothesis Test of Proportion
Overview
This hypothesis test analyzes a single sample to determine if the population from which the sample was
taken has a proportion that is equal to a constant, p. In many cases, a one-sample Hypothesis Test of
Proportion is used to determine if one population has the same proportion as the known proportion of
another population, p.
p_bar = observed sample proportion
p_bar = X/n = (Number of successes in the sample)/(Number of trials in the sample)
q_bar = 1 – p_bar
p = Constant (the hypothesized population proportion, which is often a known population proportion)
q = 1 – p
The z Value for this test, which is the Test Statistic, is calculated as follows:
z Value = (p_bar – p)/SE
SE = SQRT[ (p*q)/n ]
SE is calculated using population parameters p and q, not sample statistics p_bar and q_bar.
The Null Hypothesis is rejected if any of the following equivalent conditions are shown to exist:
1) The observed p_bar is beyond the Critical Value.
2) The z Value (the Test Statistic) is farther from zero than the Critical z Value.
3) The p value is smaller than α for a one-tailed test or α/2 for a two-tailed test.
Example of a One-Sample, Two-Tailed Hypothesis Test of
Proportion in Excel
Following is the result of a one-sample hypothesis test of proportion that is two-tailed:
Over the course of one entire year, 30 percent of all units produced by one production line had at least
one defect. During the next year the first 21 out of 50 units produced by the production line had a defect.
Determine with 95 percent certainty whether the production line's performance has changed.
Note that the question asks only whether there has been any change, not whether there has been a
specific change such as whether there has been a worsening of performance. This means that the
hypothesis test will be a two-tailed test and not a one-tailed test.
The Initial Two Questions That Must Be Answered Before Performing the Four-Step Hypothesis Test of
Proportion are as follows:
d) t-Test or z-Test?
A Hypothesis Test of Proportion will always be a z Test and not a t Test. Samples taken for a Hypothesis Test of Proportion are binary: they can only assume one of two values. Binary objects are distributed according to the binomial distribution, and the binomial distribution can be approximated by the normal distribution. A Hypothesis Test of Proportion uses the normal distribution to approximate the underlying binomial distribution that the sampled objects follow, whereas a t Test uses the t distribution to model the distributed variable.
This hypothesis test is a z Test that is a one-sample, two-tailed hypothesis test of proportion.
The binomial distribution can be approximated by the normal distribution if np > 5 and nq > 5. In this case, the calculation of np and nq is the following:
n = 50
p = 0.30
q = 0.70
np = 15 and nq = 35
np > 5 and nq > 5, so it is valid to approximate the binomial distribution with the normal distribution.
Because the binomial distribution can be modeled by the normal distribution, a z Test can be used to
perform a Hypothesis Test of Proportion.
The binomial distribution has the following parameters:
Mean = np
Variance = npq
Each unique normal distribution can be completely described by two parameters: its mean and its standard deviation. As long as np > 5 and nq > 5, the following substitution can be made:
Normal (mean, standard deviation) approximates Binomial (n,p)
If np is substituted for the normal distribution's mean and SQRT(npq) (the square root of the binomial variance) is substituted for the normal distribution's standard deviation as follows:
Normal (mean, standard deviation)
becomes
Normal (np, SQRT(npq))
which approximates Binomial (n,p)
This can be demonstrated with Excel using data from this problem.
n = 50 = the number of trials in one sample
p = 0.3 = expected probability of a positive result in all trials
q = 1 – p = 0.7 = expected probability of a negative result in all trials
If the number of positive outcomes is randomly picked to be X = 21, the normal approximation of the
binomial distribution’s PDF at the point X = 21 is computed as follows:
BINOM.DIST(X, n, p, FALSE)
= BINOM.DIST(21, 50, 0.3, FALSE)
= 0.023
The normal distribution's PDF will equal approximately the same value as the binomial distribution's PDF if the following substitutions are made:
NORM.DIST(X, Mean, Stan. Dev, FALSE)
= NORM.DIST(X, np, SQRT(npq), FALSE)
This is the basis for the normal approximation of the binomial distribution as follows:
BINOM.DIST(X, n, p, FALSE) ≈ NORM.DIST(X, np, SQRT(npq), FALSE)
NORM.DIST(X, np, SQRT(npq), FALSE)
= NORM.DIST(21, 15, 3.24, FALSE) = 0.022
The difference between BINOM.DIST(21, 50, 0.3, FALSE) and NORM.DIST(21, 15, 3.24, FALSE) is less than 0.001. That is reasonably close.
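This comparison can also be checked outside Excel. Below is a short Python sketch (Python's statistics.NormalDist plays the role of NORM.DIST here) comparing the exact binomial PMF with a normal density whose standard deviation is the square root of npq:

```python
# Check of the normal approximation to the binomial PDF using the
# example's numbers: n = 50 trials, p = 0.3, evaluated at X = 21.
from math import comb, sqrt
from statistics import NormalDist

n, p, X = 50, 0.3, 21
q = 1 - p

# Exact binomial PMF: C(n, X) * p^X * q^(n - X)
binom_pdf = comb(n, X) * p**X * q**(n - X)

# Normal density with mean np and standard deviation sqrt(npq)
normal_pdf = NormalDist(mu=n * p, sigma=sqrt(n * p * q)).pdf(X)

print(round(binom_pdf, 3), round(normal_pdf, 3))
# binomial ≈ 0.023, normal ≈ 0.022
```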
Note that the normal approximation of the binomial distribution applies to the CDF (Cumulative Distribution Function) as well as to the PDF (Probability Density Function). Replacing FALSE with TRUE in the above BINOM.DIST() and NORM.DIST() formulas would calculate their CDFs instead of their PDFs; adding a continuity correction of 0.5 to X improves the CDF approximation.
We now proceed to complete the four-step method for solving all Hypothesis Tests of Proportion. These
four steps are as follows:
Step 1 – Create the Null Hypothesis and the Alternate Hypothesis
Step 2 – Map the Normal or t Distribution Curve Based on the Null Hypothesis
Step 3 – Map the Regions of Acceptance and Rejection
Step 4 – Determine Whether to Accept or Reject the Null Hypothesis By Performing the Critical Value Test, the p Value Test, or the Critical t Value Test
Step 2 – Map the Distributed Variable to Normal Distribution
A z Test can be performed if the sample proportion is distributed according to the normal distribution. The sample proportion, p_bar, is distributed according to the binomial distribution. The normal distribution can be used to approximate this binomial distribution because the requirements that np and nq are greater than 5 are met. The distribution of the sample proportion, p_bar, can therefore be approximated by the normal distribution.
The sample proportion, p_bar, will be mapped to a normal distribution. Each unique normal distribution
can be fully described by two parameters: its mean and standard deviation.
The mean of the normal distribution curve that maps the distributed variable p_bar is equal to the
Constant in the Null Hypothesis. The Null Hypothesis is as follows:
H0: p_bar = Constant = p = 0.3
The distributed variable p_bar will be mapped to a normal distribution curve with a mean = 0.3.
Population parameters such as the population standard deviation have to be estimated if only sample data is available. In this case the standard deviation of the distribution of the sample proportion is given by the Standard Error, which is based on the sample size.
Standard Error (SE) for a one-sample Hypothesis Test of Proportion is calculated as follows:
SE = SQRT[ (p*q)/n ]
SE = SQRT[ (0.3*0.7)/50 ]
SE = 0.0648
Note that SE is calculated using p (0.3) from the Null Hypothesis and not p_bar (0.42). q is derived from p
and q_bar is derived from p_bar.
The normal distribution curve that maps the distribution of variable p_bar now has the following
parameters:
Mean = 0.3
Standard Error = 0.0648
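The one-sample calculation above can be sketched end-to-end in Python; this is a minimal translation of the Excel steps, with statistics.NormalDist standing in for NORM.S.DIST and NORM.S.INV:

```python
# One-sample z-test of proportion with the values from this example:
# hypothesized p = 0.30, observed p_bar = 0.42, n = 50, alpha = 0.05.
from math import sqrt
from statistics import NormalDist

p, p_bar, n, alpha = 0.30, 0.42, 50, 0.05

se = sqrt(p * (1 - p) / n)      # Standard Error uses hypothesized p, not p_bar
z = (p_bar - p) / se            # Test Statistic (z Value)
# One-tail area beyond the Test Statistic, as in the MIN(...) Excel formula
p_value = min(NormalDist().cdf(z), 1 - NormalDist().cdf(z))
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed Critical z Value

print(round(se, 4), round(z, 2), round(p_value, 3), round(z_crit, 2))
# se ≈ 0.0648, z ≈ 1.85, p value ≈ 0.032, Critical z ≈ 1.96
```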
This Excel-generated normal distribution curve is shown as follows:
The Alternate Hypothesis, H1: p_bar ≠ 0.3, indicates that the hypothesis test has the Region of Rejection split between both outer tails and is therefore two-tailed.
The total size of the Region of Rejection is equal to Alpha. In this case Alpha, α, is equal to 0.05. This
means that the Region of Rejection will take up 5 percent of the total area under this normal distribution
curve.
Because this test is a two-tailed test, the 5-percent Region of Rejection is divided between the two outer tails. Each outer tail contains 2.5 percent of the total area under the curve.
Step 4 – Determine Whether to Reject the Null Hypothesis
The object of a hypothesis test is to determine whether to reject or fail to reject the Null Hypothesis. There are three equivalent tests that make this determination. Only one of these tests needs to be performed because all three provide equivalent information. The three tests are as follows:
1) Compare p_bar With Critical Value
If the observed value of p_bar (0.42) falls into the Region of Acceptance (the blue region under the curve), the Null Hypothesis is not rejected. If the observed value of p_bar falls into the Regions of Rejection (either of the two yellow outer regions), the Null Hypothesis is rejected.
The observed p_bar (0.42) is closer to the curve’s mean (0.3) than the Critical Value (0.43) and falls in
the blue Region of Acceptance. We therefore do not reject the Null Hypothesis. We cannot state with 95
percent certainty that there is a real difference between the overall defect rates of this year and last year
based upon the defect rate of the sample taken from this year’s production.
3) Compare p Value With Alpha
The p Value is the percent of the curve that is beyond the observed p_bar (0.42). If the p Value is smaller
than Alpha/2 (if the test is two-tailed), the Null Hypothesis is rejected. If the p Value is larger than Alpha/2,
the Null Hypothesis is not rejected.
The p Value is calculated by the following Excel formula:
p Value = MIN(NORM.S.DIST(z Value,TRUE),1-NORM.S.DIST(z Value,TRUE))
p Value = MIN(NORM.S.DIST(1.85,TRUE),1-NORM.S.DIST(1.85,TRUE))
p Value = 0.032
The p Value (0.032) is larger than Alpha/2 (0.025) and we therefore do not reject the Null Hypothesis.
The following Excel-generated graph shows that the red p Value (the curve area beyond p_bar) is larger
than the yellow Alpha/2 Region of Rejection in the outer right tail.
It should be noted that if this z Test were a one-tailed test, which is less stringent than a two-tailed test, the Null Hypothesis would have been rejected because of the following three equivalent conditions:
1) The p Value (0.032) is smaller than Alpha (0.05). A one-tailed test would contain the entire 5-percent Region of Rejection in one outer tail.
2) p_bar (0.42) would now be in the Region of Rejection, which would now begin at 0.41, the Critical Value for a one-tailed test.
Critical Value (one-tailed, right tail) = Mean + NORM.S.INV(1-α) * SE = 0.41
3) The z Value (1.85) would now be farther from the standardized mean of zero than the Critical z Value, which would now be 1.6449.
Critical z Value (one-tailed, right tail) = NORM.S.INV(1-α) = 1.6449
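The one-tailed and two-tailed Critical Values quoted above can be reproduced in Python, with NormalDist().inv_cdf standing in for NORM.S.INV:

```python
# Critical Values at alpha = 0.05 for the curve in this example
# (mean = 0.3, Standard Error = 0.0648).
from statistics import NormalDist

mean, se, alpha = 0.3, 0.0648, 0.05

z_two = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed: alpha/2 in each tail
z_one = NormalDist().inv_cdf(1 - alpha)      # one-tailed: all of alpha in one tail

cv_two_right = mean + z_two * se             # right Critical Value, two-tailed
cv_one_right = mean + z_one * se             # right Critical Value, one-tailed

print(round(z_two, 2), round(z_one, 4), round(cv_two_right, 2), round(cv_one_right, 2))
# ≈ 1.96, 1.6449, 0.43, 0.41
```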
Two-Sample, Pooled Hypothesis Test of Proportion in
Excel
Overview
This hypothesis test analyzes two independent samples to determine if the populations from which the
samples were taken have equal proportions. This test is often used to determine whether two population proportions are likely the same. The test is called a pooled test because the Null Hypothesis states that the two population proportions are the same. The formula for Standard Error uses a pooled proportion that combines the proportions of both samples. This formula is shown in the following set of formulas.
p_bar1 = observed sample 1 proportion
p_bar1 = X1/n1
= (Number of successes in sample 1)/(Number of trials in sample 1)
p_bar2 = observed sample 2 proportion
p_bar2 = X2/n2
= (Number of successes in sample 2)/(Number of trials in sample 2)
The z Value for this test is the Test Statistic and is calculated as follows:
z Value = (p_bar2 – p_bar1)/SEDiff
p_pooled = (X1 + X2)/(n1 + n2)
q_pooled = 1 – p_pooled
SEDiff = SQRT[ p_pooled*q_pooled*(1/n1 + 1/n2) ]
The Null Hypothesis is rejected if any of the following equivalent conditions are shown to exist:
1) The observed p_bar2 - p_bar1 is beyond the Critical Value.
2) The z Value (the Test Statistic) is farther from zero than the Critical z Value.
3) The p value is smaller than α for a one-tailed test or α/2 for a two-tailed test.
As with all Hypothesis Tests of Proportion, we must satisfactorily answer these two questions and then
proceed to the four-step method of solving the hypothesis test that follows.
The Initial Two Questions That Must Be Answered Before Performing the Four-Step Hypothesis Test of
Proportion are as follows:
d) t-Test or z-Test?
A Hypothesis Test of Proportion will always be a z Test and not a t Test. Samples taken for a Hypothesis Test of Proportion are binary: they can only assume one of two values. Binary objects are distributed according to the binomial distribution, and the binomial distribution can be approximated by the normal distribution. A Hypothesis Test of Proportion uses the normal distribution to approximate the underlying binomial distribution that the sampled objects follow, whereas a t Test uses the t distribution to model the distributed variable.
This hypothesis test is a z Test that is a two-independent-sample, pooled, two-tailed hypothesis test of proportion.
Sample 2
X2 = 59
n2 = 100
p2 = 0.59
q2 = 0.41
n2p2 = 59 and n2q2 = 41
np > 5 and nq > 5 for both samples, so it is valid to approximate the binomial distribution with the normal
distribution. Because the binomial distribution can be modeled by the normal distribution, a z Test can be
used to perform a Hypothesis Test of Proportion.
We now proceed to complete the four-step method for solving all Hypothesis Tests of Proportion. These
four steps are as follows:
Step 1 – Create the Null Hypothesis and the Alternate Hypothesis
Step 2 – Map the Normal or t Distribution Curve Based on the Null Hypothesis
Step 3 – Map the Regions of Acceptance and Rejection
Step 4 – Determine Whether to Accept or Reject the Null Hypothesis By Performing the Critical Value Test, the p Value Test, or the Critical t Value Test
Step 2 – Map the Distributed Variable to Normal Distribution
A z Test can be performed if the sample statistic p_bar2–p_bar1 is distributed according to the normal distribution. Both p_bar1 and p_bar2 are binomially distributed, and the normal distribution can be used to approximate their distributions because the requirements that np and nq are greater than 5 are met for both samples. The distribution of the difference p_bar2–p_bar1 can therefore be approximated by the normal distribution.
The sample proportion, p_bar2–p_bar1, will be mapped to a normal distribution. Each unique normal
distribution can be fully described by two parameters: its mean and standard deviation.
The mean of the normal distribution curve that maps the distributed variable p_bar2–p_bar1 is equal to the Constant in the Null Hypothesis. The Null Hypothesis is as follows:
H0: p_bar2–p_bar1 = Constant = 0
The distributed variable p_bar2–p_bar1 will be mapped to a normal distribution curve with a mean = 0,
which is the Constant.
Population parameters such as the population standard deviation have to be estimated if only sample data is available. In this case the standard deviation of the distribution of p_bar2–p_bar1 is given by the Standard Error, which is based on the sample sizes.
Standard Error (SEDiff) for a pooled, two-independent-sample Hypothesis Test of Proportion is calculated as follows:
p_pooled = (X1 + X2)/(n1 + n2)
q_pooled = 1 – p_pooled
SEDiff = SQRT[ p_pooled*q_pooled*(1/n1 + 1/n2) ]
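The pooled calculation can be sketched in Python. Note that the Sample 1 counts do not appear in this excerpt; the values X1 = 90 and n1 = 200 below are assumptions chosen to be consistent with the z Value of 2.29 reported in Step 4:

```python
# Pooled two-sample z-test of proportion. Sample 2 comes from the text
# (X2 = 59, n2 = 100); Sample 1 (X1 = 90, n1 = 200) is an ASSUMED value,
# since those figures are not shown in this excerpt.
from math import sqrt
from statistics import NormalDist

X1, n1 = 90, 200
X2, n2 = 59, 100

p1, p2 = X1 / n1, X2 / n2
p_pool = (X1 + X2) / (n1 + n2)        # pooled proportion combines both samples
q_pool = 1 - p_pool

se_diff = sqrt(p_pool * q_pool * (1 / n1 + 1 / n2))
z = (p2 - p1) / se_diff               # H0 Constant is 0
p_value = min(NormalDist().cdf(z), 1 - NormalDist().cdf(z))

print(round(se_diff, 3), round(z, 2), round(p_value, 4))
# ≈ 0.061, 2.29, 0.0111
```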
This Excel-generated normal distribution curve is shown as follows:
The total size of the Region of Rejection is equal to Alpha. In this case Alpha, α, is equal to 0.05. This
means that the Region of Rejection will take up 5 percent of the total area under this normal distribution
curve.
Because this test is a two-tailed test, the 5-percent Region of Rejection is divided between the two outer tails. Each outer tail contains 2.5 percent of the total area under the curve.
Step 4 – Determine Whether to Reject the Null Hypothesis
The object of a hypothesis test is to determine whether to reject or fail to reject the Null Hypothesis. There are three equivalent tests that make this determination. Only one of these tests needs to be performed because all three provide equivalent information. The three tests are as follows:
3) Compare p Value With Alpha
The p Value is the percent of the curve that is beyond the observed p_bar2–p_bar1 (0.14). If the p Value is
smaller than Alpha/2 (since this is a two-tailed test), the Null Hypothesis is rejected. If the p Value is larger
than Alpha/2, the Null Hypothesis is not rejected.
The p Value is calculated by the following Excel formula:
p Value = MIN(NORM.S.DIST(z Value,TRUE),1-NORM.S.DIST(z Value,TRUE))
p Value = MIN(NORM.S.DIST(2.29,TRUE),1-NORM.S.DIST(2.29,TRUE))
p Value = 0.0111
The p Value (0.0111) is smaller than Alpha/2 (0.025) and we therefore reject the Null Hypothesis.
The following Excel-generated graph shows that the red p Value (the curve area beyond p_bar2–p_bar1)
is smaller than the yellow Alpha/2 Region of Rejection in the outer right tail.
It should be noted that if this z Test were a one-tailed test, the Null Hypothesis would also be rejected
because a two-tailed Hypothesis Test is more stringent than a one-tailed test.
The one-tailed and two-tailed tests both calculate the same p Value (0.011), z Value (2.29), and observed value of p_bar2–p_bar1 (0.14). The critical values that these are compared to differ between the one-tailed and two-tailed tests. These critical values are the Critical Value, the Critical z Value, and the size of the Region of Rejection in one outer tail.
Critical values for a one-tailed test would be the following:
Critical Value = Mean + NORM.S.INV(1-α) * SEDiff
Critical Value = 0 + NORM.S.INV(1 – 0.05) * 0.06
Critical Value = 0 + NORM.S.INV(0.95) * 0.06
Critical Value = 0.0987
Critical z Value (α=0.05, one-tailed, right tail) = NORM.S.INV(1-α)
Critical z Value (α=0.05, one-tailed, right tail) = NORM.S.INV(0.95) = 1.64
Note that one of the main differences between critical values for a one-tailed and a two-tailed test is that the one-tailed critical values are calculated using α while the two-tailed critical values are calculated using α/2.
Two-Sample, Unpooled Hypothesis Test of Proportion
in Excel
Overview
This hypothesis test analyzes two independent samples to determine if the populations from which the
samples were taken have equal proportions. This test is often used to determine whether two sample
proportions are likely different by some specific proportion.
The test is called an unpooled test because the Null Hypothesis states that the two population proportions are not the same. The formula for Standard Error uses an unpooled calculation that does not combine the proportions of both samples into a single, pooled proportion as a pooled test does. This formula is shown in the following set of formulas.
p_bar1 = observed sample 1 proportion
p_bar1 = X1/n1
= (Number of successes in sample 1)/(Number of trials in sample 1)
p_bar2 = observed sample 2 proportion
p_bar2 = X2/n2
= (Number of successes in sample 2)/(Number of trials in sample 2)
The z Value for this test is the Test Statistic and is calculated as follows:
z Value = ((p_bar2 – p_bar1) – Constant)/SEDiff
where Constant is the hypothesized difference between the two population proportions.
The Null Hypothesis is rejected if any of the following equivalent conditions are shown to exist:
1) The observed p_bar2 - p_bar1 is beyond the Critical Value.
2) The z Value (the Test Statistic) is farther from zero than the Critical z Value.
3) The p value is smaller than α for a one-tailed test or α/2 for a two-tailed test.
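A minimal Python sketch of the unpooled calculation, assuming the standard unpooled Standard Error SQRT(p_bar1*q_bar1/n1 + p_bar2*q_bar2/n2) and hypothetical counts (not the figures from the example that follows):

```python
# Unpooled two-sample z-test of proportion with HYPOTHETICAL counts,
# testing whether the difference exceeds a nonzero Constant (0.05).
from math import sqrt
from statistics import NormalDist

X1, n1 = 30, 400       # hypothetical Sample 1
X2, n2 = 60, 400       # hypothetical Sample 2
constant = 0.05        # hypothesized difference under H0

p1, p2 = X1 / n1, X2 / n2
# Unpooled: each sample keeps its own proportion in the Standard Error
se_diff = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z = ((p2 - p1) - constant) / se_diff
p_value = min(NormalDist().cdf(z), 1 - NormalDist().cdf(z))

print(round(se_diff, 4), round(z, 2), round(p_value, 2))
# se ≈ 0.0222, z ≈ 1.13, p ≈ 0.13
```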
Example of a Two-Sample, Unpooled, One-Tailed Hypothesis
Test of Proportion in Excel
It is believed that Production Line B produces 5 percent more defects than Production Line A.
Both production lines manufacture the same products and have the same types of machines that are all
in similar condition. Both production lines operate approximately the same number of hours. The only
difference between the production lines is the experience of the crews. The crews that operate Production
Line A have more experience than the crews on Production Line B.
Completed units from both production lines were sampled and evaluated over the same period of time as
follows.
12 out of 200 randomly sampled units produced on Production Line A were nonconforming. The
proportion of sample units from Production Line A that were nonconforming was 0.06 (6 percent).
39 out of 300 randomly sampled units produced on Production Line B were nonconforming. The
proportion of sample units from Production Line B that were nonconforming was 0.13 (13 percent).
Determine with 95 percent certainty whether Production Line B's overall proportion nonconforming exceeds that of Production Line A by more than 5 percent. In other words, determine whether the difference between Production Line B's overall percent defective and Production Line A's overall percent defective is greater than 5 percent.
Note that this will be an unpooled z Test because the proportions of the two populations are assumed to be different. The Null Hypothesis will state that the difference between the proportions of defectives of the two populations from which the samples are taken is equal to 5 percent. The Alternative Hypothesis states that this difference is greater than 5 percent.
Sample results for the two samples cannot be pooled if the population proportions are assumed to be different, as stated by the Null Hypothesis. Sample results can only be pooled if the Null Hypothesis states that the proportions of the two populations are the same.
Two-independent-sample Hypothesis Tests of Proportion remain slightly more intuitive and allow for consistent use of the variable name p_bar2–p_bar1 if the larger sample proportion is always designated as p_bar2 and the smaller sample proportion is designated as p_bar1.
As with all Hypothesis Tests of Proportion, we must satisfactorily answer these two questions and then
proceed to the four-step method of solving the hypothesis test that follows.
The Initial Two Questions That Must Be Answered Before Performing the Four-Step Hypothesis Test of
Proportion are as follows:
d) t-Test or z-Test?
A Hypothesis Test of Proportion will always be a z Test and not a t Test. Samples taken for a Hypothesis Test of Proportion are binary: they can only assume one of two values. Binary objects are distributed according to the binomial distribution, and the binomial distribution can be approximated by the normal distribution. A Hypothesis Test of Proportion uses the normal distribution to approximate the underlying binomial distribution that the sampled objects follow, whereas a t Test uses the t distribution to model the distributed variable.
This hypothesis test is a z Test that is a two-independent-sample, unpooled, one-tailed hypothesis test of proportion.
Sample 2
X2 = 39
n2 = 300
p2 = 0.13
q2 = 0.87
n2p2 = 39 and n2q2 = 261
np > 5 and nq > 5 for both samples, so it is valid to approximate the binomial distribution with the normal
distribution. Because the binomial distribution can be modeled by the normal distribution, a z Test can be
used to perform a Hypothesis Test of Proportion.
The binomial distribution has the following parameters:
Mean = np
Variance = npq
Each unique normal distribution can be completely described by two parameters: its mean and its standard deviation. As long as np > 5 and nq > 5, the following substitution can be made:
Normal (mean, standard deviation) approximates Binomial (n,p)
When np is substituted for the normal distribution's mean and SQRT(npq) (the square root of the binomial variance) is substituted for the normal distribution's standard deviation, then the following is true:
Normal (mean, standard deviation)
becomes
Normal (np, SQRT(npq))
This approximates Binomial (n,p).
The approximation can be demonstrated with Excel using data from the second sample of this problem.
X = 39 = the number of positive outcomes in n trials
n = 300 = the number of trials in one sample
p = 0.13 = expected probability of a positive result in all trials
q = 1 – p = 0.87 = expected probability of a negative result in all trials
The normal approximation of the binomial distribution is as follows:
BINOM.DIST(X, n, p, FALSE) ≈ NORM.DIST(X, np, SQRT(npq), FALSE)
Analyzing the data from Sample 2 produces the following comparison:
BINOM.DIST(X, n, p, FALSE)
= BINOM.DIST(39, 300, 0.13, FALSE) = 0.067
NORM.DIST(X, np, SQRT(npq), FALSE)
= NORM.DIST(39, 39, 5.82, FALSE) ≈ 0.068
The difference between BINOM.DIST(39, 300, 0.13, FALSE) and NORM.DIST(39, 39, 5.82, FALSE) is less than 0.002. That is reasonably close.
Note that the normal approximation of the binomial distribution applies to the CDF (Cumulative Distribution Function) as well as to the PDF (Probability Density Function). Replacing FALSE with TRUE in the above BINOM.DIST() and NORM.DIST() formulas would calculate their CDFs instead of their PDFs; adding a continuity correction of 0.5 to X improves the CDF approximation.
We now proceed to complete the four-step method for solving all Hypothesis Tests of Proportion. These
four steps are as follows:
Step 1 – Create the Null Hypothesis and the Alternate Hypothesis
Step 2 – Map the Normal or t Distribution Curve Based on the Null Hypothesis
Step 3 – Map the Regions of Acceptance and Rejection
Step 4 – Determine Whether to Accept or Reject the Null Hypothesis By Performing the Critical Value Test, the p Value Test, or the Critical t Value Test
Step 2 – Map the Distributed Variable to Normal Distribution
A z Test can be performed if the sample statistic p_bar2–p_bar1 is distributed according to the normal distribution. Both p_bar1 and p_bar2 are binomially distributed, and the normal distribution can be used to approximate their distributions because the requirements that np and nq are greater than 5 are met for both samples. The distribution of the difference p_bar2–p_bar1 can therefore be approximated by the normal distribution.
The sample proportion, p_bar2–p_bar1, will be mapped to a normal distribution. Each unique normal
distribution can be fully described by two parameters: its mean and standard deviation.
The mean of the normal distribution curve that maps the distributed variable p_bar2–p_bar1 is equal to the Constant in the Null Hypothesis. The Null Hypothesis is as follows:
H0: p_bar2–p_bar1 = Constant = 0.05
The distributed variable p_bar2–p_bar1 will be mapped to a normal distribution curve with a mean = 0.05,
which is the Constant.
Population parameters such as the population standard deviation have to be estimated if only sample data is available. In this case the standard deviation of the distribution of p_bar2–p_bar1 is given by the Standard Error, which is based on the sample sizes.
Standard Error (SEDiff) for an unpooled, two-independent-sample Hypothesis Test of Proportion is
calculated as follows:
This Excel-generated normal distribution curve is shown as follows:
Step 4 – Determine Whether to Reject the Null Hypothesis
The object of a hypothesis test is to determine whether to reject or fail to reject the Null Hypothesis. There are three equivalent tests that make this determination. Only one of these tests needs to be performed because all three provide equivalent information. The three tests are as follows:
3) Compare p Value With Alpha
The p Value is the percent of the curve that is beyond the observed p_bar2–p_bar1 (0.07). If the p Value is smaller than Alpha (since the test is one-tailed), the Null Hypothesis is rejected. If the p Value is larger than Alpha, the Null Hypothesis is not rejected.
The p Value is calculated by the following Excel formula:
p Value = MIN(NORM.S.DIST(z Value,TRUE),1-NORM.S.DIST(z Value,TRUE))
p Value = MIN(NORM.S.DIST(0.67,TRUE),1-NORM.S.DIST(0.67,TRUE))
p Value = 0.2523
The p Value (0.2523) is larger than Alpha (0.05) and we therefore cannot reject the Null Hypothesis.
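As a cross-check, the same p Value can be computed outside Excel with Python's standard library; NormalDist().cdf plays the role of NORM.S.DIST(…,TRUE). With the rounded z = 0.67 this gives about 0.2514; the manual's 0.2523 evidently reflects the unrounded z value:

```python
from statistics import NormalDist

z_value = 0.67                        # z value as rounded in the text
cdf = NormalDist().cdf(z_value)       # NORM.S.DIST(0.67, TRUE)
p_value = min(cdf, 1 - cdf)           # MIN(...) as in the Excel formula

# p_value is about 0.2514, larger than Alpha = 0.05, so the
# Null Hypothesis is not rejected
```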
The following Excel-generated graph shows that the red p Value (the curve area beyond p_bar2–p_bar1)
is larger than the yellow Alpha Region of Rejection in the outer right tail.
The Null Hypothesis would also not be rejected if the test were two-tailed because a two-tailed test is more stringent than a one-tailed test. A hypothesis test is more stringent if the Null Hypothesis is harder to reject.
Chi-Square Independence Test in Excel
Overview
The Chi-Square Independence Test is used to determine whether two categorical variables associated
with the same item act independently on that item. The example presented in this section analyzes
whether the gender of the purchaser of a car is independent of the color of the car. This Chi-Square
Independence Test answers the question of whether gender plays a role in the color selection of a
purchased car.
Each item (each purchased car) has two attributes associated with it. These two attributes are the
categorical variables of purchaser’s gender and color. The counts of the number of cars purchased for
each unique combination of gender and color are placed in a matrix called a contingency table.
Contingency Table
A contingency table is a two-way cross-tabulation. Each row in the contingency table is associated with
one of the levels of one of the categorical attributes (such as gender) and each column is associated with
one of the levels of the other categorical attribute (such as color).
The number of rows in the contingency table, r, is equal to the number of levels of the row attribute. The
number of columns in the contingency table, c, is equal to the number of levels of the column attribute.
The contingency table is therefore an r x c table and has r x c cells representing r x c unique
combinations of levels of row and column attributes.
Null Hypothesis
A Null Hypothesis is created which states there is no significant difference between the actual and
expected counts of data for the unique combinations of levels of the two factors.
Test Statistic
The Chi-Square Independence Test calculates a Test Statistic called a Chi-Square Statistic, Χ². The distribution of this Test Statistic can be approximated by the Chi-Square distribution if several conditions are met.
Required Assumptions
The distribution of this Test Statistic, Χ², can be approximated by the Chi-Square distribution with degrees of freedom equal to df = (r – 1)(c – 1) if the following three conditions are met:
1) The number of cells in the contingency table (r x c) is at least 5. A 2 x 2 contingency table is not large
enough. One of the two attributes must have at least 3 levels.
2) The average value of all of the expected counts is at least 5.
3) All of the expected counts equal at least 1.
Example of Chi-Square Independence Test in Excel
We will examine whether gender and product color selection are independent of each other. A car company in the United States sold 12,000 new cars of one brand in one month. The car company recorded the gender of each customer and also the color of the car. The car was available in only three colors: red, blue, and green.
The actual counts of cars purchased in that month for each unique combination of gender/color are
shown as follows:
Determine with 95-percent certainty whether the car purchaser’s gender and the selected color of the car are independent of each other.
Creating the Contingency Table From an Excel Pivot Table
The contingency table can be created with Excel’s Pivot Table tool if the data are initially presented in the following fashion, as they often are:
Hitting OK brings up the following final Pivot Table dialogue box:
Dragging the label Color down to the Column Labels box and to the Σ Values box and then dragging the
label Gender down to the Row Labels box produces the completed Pivot Table as follows. This Pivot
Table is an exact match of the contingency table containing the actual values for this data set.
Note that the Excel Pivot Table would be an exact match for the contingency table with the actual counts
that is shown again here.
Step 2 – Place Expected Counts In Contingency Table
The expected counts for each unique combination of levels of row/column attributes are placed into the
correct cells of an identical contingency table as follows:
The expected counts are based upon the assumption that the row and column attributes act independently of each other. The method of calculating the expected counts based upon this assumption is shown below:
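Under the independence assumption, each expected count equals (row total × column total) ÷ grand total. A short Python sketch using a hypothetical 2 × 3 contingency table (the counts below are illustrative, not the book's actual data):

```python
# Hypothetical observed counts:
# rows = gender (male, female), columns = color (red, blue, green)
observed = [[2500, 1800, 1700],
            [2350, 1950, 1700]]

row_totals = [sum(row) for row in observed]        # [6000, 6000]
col_totals = [sum(col) for col in zip(*observed)]  # [4850, 3750, 3400]
grand_total = sum(row_totals)                      # 12000

# Expected count for cell (i, j) = row_total[i] * col_total[j] / grand_total
expected = [[row_totals[i] * col_totals[j] / grand_total
             for j in range(len(col_totals))]
            for i in range(len(row_totals))]
```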
Step 4 – Verify Required Assumptions
The distribution of this Test Statistic, Χ2, can be approximated by the Chi-Square distribution with
degrees of freedom equal to df = (r – 1)(c – 1) if the following three conditions are met:
1) The number of cells in the contingency table (r x c) is at least 5. The contingency table is a 2 x 3 table
so this condition is met.
2) The average value of all of the expected counts is at least 5. This condition is met.
3) All of the expected counts equal at least 1. This condition is met.
Step 6 – Calculate Critical Chi-Square Value and p Value
The degrees of freedom for the Chi-Square Independence Test is calculated as follows:
r = number of rows = 2
c = number of columns = 3
df = (r – 1)(c – 1) = (2 – 1)(3 – 1) = 2
Step 7 – Determine Whether To Reject Null Hypothesis
The Null Hypothesis is rejected if either of the two equivalent conditions are shown to exist:
1) Chi-Square Statistic > Critical Chi-Square Value
2) p Value < α
Both of these conditions exist as follows.
p Value = 0.0457
α = 0.05
In this case we reject the Null Hypothesis because the Chi-Square Statistic (6.17) is larger than the
Critical Value (5.99) or, equivalently, the p Value (0.0457) is smaller than Alpha (0.05).
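These figures can be reproduced with scipy (a sketch, assuming scipy is available); chi2.ppf(1 − α, df) mirrors Excel's CHISQ.INV.RT(α, df) and chi2.sf mirrors CHISQ.DIST.RT:

```python
from scipy.stats import chi2

alpha = 0.05
df = 2
chi_square_statistic = 6.17

critical_value = chi2.ppf(1 - alpha, df)     # about 5.99, as in the text
p_value = chi2.sf(chi_square_statistic, df)  # about 0.0457, as in the text

# Reject the Null Hypothesis by either equivalent condition
reject_null = chi_square_statistic > critical_value or p_value < alpha
```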
A graphical representation of this problem is shown as follows:
Chi-Square Goodness-of-Fit Tests in Excel
Overview
Chi-Square Goodness-Of-Fit (GOF) tests are hypothesis tests that determine how closely a sample of
data fits a hypothesized distribution. The actual data observations are divided up into groups called bins.
The same number of data points is divided up into identical bins according to the grouping that would be expected if these data points exactly matched the hypothesized distribution.
The counts of actual data observations in each bin are compared with the expected number of data points
that would be in identical bins if the data exactly matched the hypothesized distribution.
Test Statistic
A Test Statistic called the Chi-Square Statistic, Χ², is calculated based upon the comparison of the counts of actual data points in each bin and the counts of expected data points in each of the bins. The formula for the Chi-Square Statistic is as follows:
n = the total number of bins containing expected groupings of data points
Actuali = the number of actual observed data points that fall into the ith bin
Expectedi = the number of expected data points in the ith bin if the data exactly matched the hypothesized distribution
Required Assumptions
The distribution of the Chi-Square Statistic, Χ², can be approximated by the Chi-Square distribution if the following 3 conditions are met:
1) n ≥ 5
2) The minimum expected number of data points in any of the bins is at least 1
3) The average number of expected data points in a bin is at least 5
Null Hypothesis
The Null Hypothesis of this hypothesis test states that Χ² = 0. This would mean that actual and expected
counts of data points in each bin are the same. This Null Hypothesis is rejected if either of the following
two equivalent conditions exist:
1) The Chi-Square Statistic is larger than the Critical Chi-Square Value
2) The p Value is smaller than the specified alpha.
Basic Excel Formulas
The formulas for the Critical Chi-Square Value and p Value in Excel are the following:
Critical Chi-Square Value = CHISQ.INV.RT(α, df)
p Value = CHISQ.DIST.RT(Chi-Square Statistic, df)
df = degrees of freedom and is calculated using one of two different formulas depending on which of the
two types of GOF tests is being performed.
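For readers working outside Excel, these two formulas map directly onto scipy's chi-square functions (a sketch, assuming scipy is available):

```python
from scipy.stats import chi2

def critical_chi_square(alpha, df):
    """Equivalent of Excel's CHISQ.INV.RT(alpha, df): the value with
    probability alpha in the right tail of the Chi-Square distribution."""
    return chi2.isf(alpha, df)

def p_value(chi_square_statistic, df):
    """Equivalent of Excel's CHISQ.DIST.RT(statistic, df): the right-tail
    area beyond the Chi-Square Statistic."""
    return chi2.sf(chi_square_statistic, df)
```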
GOF Example – Type 1
Problem Information
Required Level of Certainty = 90 percent
α = 0.10
Actual data observations divided up into 7 bins.
The 7 Actual bins contain the average count of sales that occurred on each of the seven weekdays.
The average number of total sales each week was 105. This is the total number of actual data
observations.
Step 3 – Verify Required Assumptions
The distribution of the Chi-Square Statistic, Χ², can be approximated by the Chi-Square distribution if the following 3 conditions are met:
1) n ≥ 5
2) The minimum expected number of data points in any of the bins is at least 1
3) The average number of expected data points in a bin is at least 5
All of these conditions have been met.
Step 5 – Calculate Chi-Square Statistic, Χ²
The Test Statistic, which is the Chi-Square Statistic, Χ², is calculated by the formula shown below:
Step 6 – Calculate Critical Chi-Square Value and p Value
The degrees of freedom for this Chi-Square GOF Test is calculated as follows:
df = n – 1 = 7 – 1 = 6
n = k = number of expected bins
Step 7 – Determine Whether To Reject Null Hypothesis
The Null Hypothesis is rejected if either of the two equivalent conditions are shown to exist:
1) Chi-Square Statistic > Critical Chi-Square Value
2) p Value < α
Both of these equivalent conditions exist as follows:
p Value = 0.0863
α = 0.10
In this case we reject the Null Hypothesis because the Chi-Square Statistic (11.07) is larger than the
Critical Value (10.64) or, equivalently, the p Value (0.0863) is smaller than Alpha (0.10).
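These values can be verified with scipy (a sketch, assuming scipy is available); with a uniform hypothesized distribution, each of the 7 weekday bins expects 105 / 7 = 15 sales:

```python
from scipy.stats import chi2

alpha = 0.10
n_bins = 7
df = n_bins - 1                              # 6
expected_per_bin = 105 / n_bins              # 15 sales expected each weekday

chi_square_statistic = 11.07                 # value computed in Step 5
critical_value = chi2.ppf(1 - alpha, df)     # about 10.64
p_value = chi2.sf(chi_square_statistic, df)  # about 0.0863

# Reject the Null Hypothesis by either equivalent condition
reject_null = chi_square_statistic > critical_value or p_value < alpha
```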
A graphical representation of this problem is shown as follows:
GOF Example – Type 2
Overview
The first example in this section demonstrated the Chi-Square GOF test being performed for the uniform
distribution. The Chi-Square GOF test can be used to test how well any data sample fits just about any
distribution. Quite often the Chi-Square GOF test is used to test whether a sample of data is normally
distributed. The Chi-Square GOF test for normality is an alternative to other well-known normality tests
such as the Anderson-Darling and Kolmogorov-Smirnov tests.
The Chi-Square GOF test can be used to test whether a data sample can be fitted with any distribution for which the CDF (Cumulative Distribution Function) can be calculated. The Anderson-Darling and Kolmogorov-Smirnov tests can only be used to test whether a data sample can be fitted with a continuous distribution such as the normal distribution. The Chi-Square GOF test can be used with continuous distributions as well as discrete distributions such as the binomial and Poisson distributions.
Chi-Square GOF Test for Normality Example in Excel
Determine with 95 percent certainty whether the following sample of data is normally distributed.
More importantly, converting data values to their z Scores makes it possible to use the normal
distribution’s CDF (Cumulative Distribution Function) to calculate the percentage of total data points that
would be expected to fall into each of the bins. This will be discussed in more detail shortly.
b) Standardizing the Data
Standardizing the data simply involves subtracting the mean from the data value and then dividing by the
standard deviation. This calculation converts each data value to its z Score. For population data, the z
Score is the number of population standard deviations that the data value is from the population mean.
For sample data, the z Score is the number of sample standard deviations that a data value is from the
sample mean.
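Standardization can be sketched in Python using the standard library (the data values below are hypothetical, not the book's sample):

```python
from statistics import mean, stdev

# Hypothetical sample data (not the book's data set)
data = [12.1, 14.3, 15.0, 15.2, 16.8, 17.4, 18.9, 21.5]

x_bar = mean(data)   # sample mean
s = stdev(data)      # sample standard deviation (n - 1 denominator)

# z Score = number of sample standard deviations a value lies from the mean
z_scores = [(x - x_bar) / s for x in data]
```

By construction, the z Scores of a sample have a mean of 0 and a sample standard deviation of 1.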
The z Scores in this example are calculated from sample data as follows:
Step 2 – Create Bins
Bin creation involves specifying the upper and lower boundaries of each bin into which the actual and
expected values will fall. Sorting and standardizing the data simplify bin creation.
The z Scores of the data range from -1.787 to 2.490. The bins should cover that entire range because
there are no significant outliers among the 26 total z Scores.
The bins need to be large enough so the three required conditions for the Test Statistic to follow the Chi-Square Distribution are met. The distribution of the Chi-Square Statistic, Χ², can be approximated by the Chi-Square distribution if the following 3 conditions are met:
1) The number of bins (n) is at least 5
2) The minimum expected number of data points in any of the bins is at least 1
3) The average number of expected data points in a bin is at least 5
Establishing dimensions for the bins is an arbitrary process. Four important criteria need to be considered when establishing the upper and lower boundaries of the bins. These are the following:
1) The bins need to be large enough so the Chi-Square GOF Test’s three conditions will be met.
2) The overall range of all of the bins should be large enough to capture all data points that have not been removed for being outliers.
3) The bins need to be small enough so that the Chi-Square GOF Test will have sufficient power. The power of a statistical test is equivalent to its sensitivity and is measured as follows:
Power = 1 – β
β is the test’s probability of making a Type II error. A Type II error is a false negative, i.e., failing to detect a significant difference.
Power is therefore a statistical test’s probability of not failing to detect a significant difference. This would be the sensitivity of a test.
4) The distance between upper and lower boundaries for all bins should be as similar as possible.
Establishing optimal dimensions for the bins is dependent on judgment and statistical skill of the person
performing the test. One possible configuration for the bins would be to construct five bins that catch all
data points with z Scores ranging from -2.5 up to 2.5. Each bin would have a range equaling the length of
one z Score. The boundaries of the five bins configured with those dimensions are shown as follows:
It is not yet known whether at least one data point is expected to fall into each of these bins and whether
the average number of data points expected to fall into a bin is at least five. The number of data points
expected to fall into each of the bins if the data were normally distributed will be calculated in Step 4 of
this process.
An expanded view of the completed dialogue box is shown as follows:
The histogram bar chart is an Excel bar chart that is based upon the actual bin counts and the bins’ upper
boundary z Scores.
The bin counts and the bar chart output are automatically updated if any of the raw data are changed.
This bar chart is created in Excel as follows:
Insert tab / Column Chart / 2-D Clustered Column Chart/
A blank chart will appear on the worksheet. Right-click on the blank chart.
Select Data / This brings up the Select Data Source dialogue box
On the left side under Legend Entries (Series) select the blank data series / Edit /
In the Series Values box, select Bin Actual Count cells J4 to J8 / OK
On the right side under Horizontal (Category) Axis Labels, select Bin Upper Boundary z Score cells I4 to
I8 / OK
Note that the values in cells I4 to I8 need to start with lower values on top in order to have lower values
on the right side of the x axis.
Step 4 – Determine Expected Count For Each Bin
Standardizing a data value converts that value to its z Score. Converting data values to their z Scores
makes it possible to use the normal distribution’s CDF (Cumulative Distribution Function) to calculate the
percentage of total data points that would be expected to fall into each of the bins.
The normal distribution’s CDF (Cumulative Distribution Function) equals the probability that a sampled point from a normally distributed population has a value UP TO X given the population’s mean, µ, and standard deviation, σ.
The normal distribution’s CDF is expressed as F(X,µ,σ).
The normal distribution’s CDF at point X is calculated in Excel as follows:
F(X,µ,σ) = NORM.S.DIST(z Score(X),TRUE)
If data are normally distributed, the percentage of total data points that is expected to lie between XUpper and XLower is equal to the difference in the CDF values at those two X values. This is equal to the following:
Percentage of Data between XUpper and XLower = F(XUpper,µ,σ) – F(XLower,µ,σ)
Percentage of Data between XUpper and XLower =
= NORM.S.DIST(z Score(XUpper),TRUE) – NORM.S.DIST(z Score(XLower),TRUE)
This is demonstrated in the following diagram which shows that 38.12 percent of the area under the
normal distribution PDF (Probability Density Function) curve lies between x = 25 and x = 30 if µ = 27 and
σ = 5.
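The 38.12 percent figure can be verified with Python's standard library NormalDist, which exposes the normal CDF directly:

```python
from statistics import NormalDist

dist = NormalDist(mu=27, sigma=5)

# Percentage of the population expected between x = 25 and x = 30
area = dist.cdf(30) - dist.cdf(25)   # about 0.3812, i.e. 38.12 percent
```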
The CDF values of the z Scores of the upper and lower bin boundaries are created as follows:
The percentage of the total number of data points in each bin is equal to the percentage of the normal
curve area assigned to each bin if the data are normally distributed.
The percentage of normal curve area assigned to each bin is equal to the CDF of the bin’s upper z Score
minus the CDF of the bin’s lower z Score. This subtraction is performed in the following image.
The expected count of data points in a bin if the data is normally distributed is equal to the total number of
actual data points (26) times the percentage of the total normal curve area assigned to the bin. This
calculation is also performed in the following image:
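The same arithmetic for the five unit-width z-Score bins can be sketched as follows; it also confirms the assumptions that every expected count is at least 1 and that the average expected count is at least 5:

```python
from statistics import NormalDist

n_points = 26                                   # total number of data points
boundaries = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]  # bin boundaries in z Scores

std_normal = NormalDist()

# Normal-curve area assigned to each bin = CDF(upper) - CDF(lower)
areas = [std_normal.cdf(hi) - std_normal.cdf(lo)
         for lo, hi in zip(boundaries, boundaries[1:])]

# Expected count per bin = total data points * bin area
expected = [n_points * a for a in areas]

min_expected = min(expected)                  # about 1.58, at least 1
avg_expected = sum(expected) / len(expected)  # about 5.14, at least 5
```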
Step 6 – Create Null and Alternative Hypotheses
The Null Hypothesis states that the actual distribution of the data matches the hypothesized distribution. The Null Hypothesis for the Chi-Square GOF Test is always specified as the following:
H0: Χ² = 0
The Chi-Square Statistic, Χ², is distributed according to the Chi-Square distribution if certain conditions are met. The Chi-Square distribution has only one parameter: its degrees of freedom, df. The probability density function of the Chi-Square distribution calculated at x is defined as f(x,df) and can only be defined for positive values of x.
Since the Chi-Square’s PDF value f(x,df) only exists for positive values of x, the Alternative Hypothesis specifies that the Chi-Square GOF Test is a one-tailed test in the right tail and is specified as follows:
H1: Χ² > 0
Step 7 – Calculate Chi-Square Statistic, Χ²
The Test Statistic, which is the Chi-Square Statistic, Χ², is calculated across the n = 5 bins as follows:
Step 8 – Calculate Critical Chi-Square Value and p Value
The degrees of freedom for this Chi-Square GOF Test is calculated as follows:
df = n – 1 – m = n – 1 – 2 = 2
n = k = number of expected bins
m = the number of population parameters estimated from the sample (here 2: the sample mean and the sample standard deviation used to calculate the expected counts)
Step 9 – Determine Whether To Reject Null Hypothesis
The Null Hypothesis is rejected if either of the two equivalent conditions are shown to exist:
1) Chi-Square Statistic > Critical Chi-Square Value
2) p Value < α
Both of these equivalent conditions exist as follows:
p Value = 0.0369
α = 0.05
In this case we reject the Null Hypothesis because the Chi-Square Statistic (6.60) is larger than the Critical Value (5.99) or, equivalently, the p Value (0.0369) is smaller than Alpha (0.05).
A graphical representation of this problem is shown as follows:
Chi-Square Population Variance Test in Excel
Overview
The Chi-Square Population Variance Test is a hypothesis test used to determine whether the variance of a normally-distributed population has changed. One common use of this test is to determine whether an adjustment made to a production line causes a change in variance at some measurement point on the production line.
The Chi-Square Variance Test can be performed as a one-sample or a two-sample test.
A one-sample test usually involves using a single sample taken from a normally-distributed population to determine whether the variance of that population has changed from a known variance measured in the past. The production line example just mentioned is the most common use of the one-sample Chi-Square Variance Test. In this case the test is most accurate when the benchmark population standard deviation has been calculated from a stable process over a long period of time.
A two-sample test is used to determine whether two normally-distributed populations have the same
variance. This is known as the F Test.
As with most hypothesis tests, the Chi-Square Population Variance Test can be conducted as a one-tailed
test or a two-tailed test. When this hypothesis test is conducted as a one-tailed test, it is used to
determine whether the population variance has moved in one direction, i.e., the test is being used to
determine only whether the population variance has increased or the test is being used to determine only
whether the population variance has decreased.
When this hypothesis test is being conducted as a two-tailed test, it is being used to determine whether
the population variance has changed in any direction (a one-sample test) or whether two populations
have the same variance (a two-sample test). The two-tailed test is more stringent than the one-tailed test;
the two-tailed test requires more change to reject the Null Hypothesis than a one-tailed test of the same
alpha level. The Null Hypothesis states either that a single population variance has not changed (a one-
sample test) or that two populations have the same variance (an F Test, which is a two-sample test).
Two-tailed test
Left Critical Value = CHISQ.INV(α/2,df)
Right Critical Value = CHISQ.INV(1 – α/2,df)
One-tailed test – Right tail
Critical Value = CHISQ.INV(1 – α,df)
Problem Information
Sample size = n = 150
Degrees of Freedom = n – 1 = 149
Sample Standard Deviation = s = 0.32
Sample Variance = s² = 0.1024
Long-term, Benchmark Population Variance = 0.09
Alpha = 1 – Required Level of Certainty = 1 – 0.95 = 0.05
The Anderson-Darling test for normality of the sample data in Excel
The Shapiro-Wilk test for normality of the sample data in Excel
The above tests are all performed on the sample data in the following section for the F Test, which is a
two-sample, one-tailed Chi-Square Population Variance test.
Chi-Square Statistic and Chi-Square Critical Values
The Chi-Square Statistic and Critical Values for this two-tailed test are calculated as follows:
These left and right Critical Values are shown in this Chi-Square PDF distribution curve for 149 degrees
of freedom as follows:
The Chi-Square Statistic (169.5) falls inside of the Critical Values (117, 184) and into the red Region of
Acceptance. The Null Hypothesis is therefore not rejected. It cannot be stated with 95 percent certainty
that the variance of the measurement taken from the completed production unit has changed as a result
of the adjustment made to production line.
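This two-tailed test can be sketched with scipy (assuming scipy is available); the Chi-Square Statistic for a variance test is (n − 1)s²/σ²:

```python
from scipy.stats import chi2

n = 150
df = n - 1                  # 149
sample_variance = 0.1024    # s squared
population_variance = 0.09  # long-term benchmark sigma squared
alpha = 0.05

chi_square_statistic = df * sample_variance / population_variance  # about 169.5

left_critical = chi2.ppf(alpha / 2, df)       # about 117, CHISQ.INV(0.025, 149)
right_critical = chi2.ppf(1 - alpha / 2, df)  # about 184, CHISQ.INV(0.975, 149)

# The statistic falls between the Critical Values, so the
# Null Hypothesis is not rejected
reject_null = not (left_critical < chi_square_statistic < right_critical)
```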
Example of 1-Sample, 1-Tailed, Right Tail, Chi-Square
Population Variance Test in Excel
A specific measurement is taken on each unit that is completed from a production line over a long period
of time. These measurements have been determined to be normally distributed with a population variance = σ² = 0.09.
An adjustment was made to the production line that may have affected the variance of the measurement
taken on each completed unit. A random sample of 150 units from the newly-adjusted production line had
the measurement taken. The sample standard deviation of this 150-unit sample is s = 0.33. Determine with 95 percent certainty whether the population variance has increased as a result of the adjustment.
Problem Information
Sample size = n = 150
Degrees of Freedom = n – 1 = 149
Sample Standard Deviation = s = 0.33
Sample Variance = s² = 0.1089
Long-term Population Variance = 0.09
Alpha = 1 – Required Level of Certainty = 1 – 0.95 = 0.05
Non-Parametric Alternatives to the One-Sample Chi-Square
Population Variance Test
When population normality cannot be confirmed, nonparametric alternatives for the one-sample Chi-
Square Population Variance Test include Levene’s Test and the Brown-Forsythe Test.
Levene’s Test and the Brown-Forsythe Test are nonparametric tests that are used to compare variances
of two samples when the F Test’s normality requirement cannot be met. These two nonparametric tests
can also be used in place of the one-sample, Chi-Square Population Variance Test.
Both of these nonparametric tests are used to compare variances between two samples and therefore require that two data samples be taken for comparison.
The one-sample Chi-Square Population Test must be changed slightly in order to meet the requirement of two samples to compare. A “Before” sample must now be taken in place of the known population standard deviation data. The one-sample Chi-Square Population Variance Test compares an “After” sample with known population variance data. The nonparametric tests require that a “Before” sample be taken to compare with the “After” sample. The “Before” sample is taken before the adjustment is made to, for example, a production line. The “After” sample is taken after the adjustment is made.
Levene’s Test and the Brown-Forsythe Test are performed in the next section on the two samples taken
for comparison by the F Test.
Chi-Square Statistic and Chi-Square Critical Values
The Chi-Square Statistic and Critical Value for this one-tailed test are calculated as follows:
This Critical Value is shown in this Chi-Square PDF distribution curve for 149 degrees of freedom as follows:
The Chi-Square Statistic (180.3) falls outside of the Critical Value (178) and into the blue Region of
Rejection. The Null Hypothesis is therefore rejected. It can be stated with 95 percent certainty that the
variance of the measurement taken from the completed production unit has increased as a result of the
adjustment made to production line.
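This right-tailed test can be sketched with scipy as follows (assuming scipy is available):

```python
from scipy.stats import chi2

n = 150
df = n - 1                  # 149
sample_variance = 0.1089    # s squared, with s = 0.33
population_variance = 0.09
alpha = 0.05

chi_square_statistic = df * sample_variance / population_variance  # about 180.3

# One-tailed, right-tail Critical Value: CHISQ.INV(1 - alpha, df)
critical_value = chi2.ppf(1 - alpha, df)                           # about 178

reject_null = chi_square_statistic > critical_value                # True
```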
It should be noted that the Null Hypothesis would not be rejected if this were a two-tailed test. The Chi-
Square Statistic, 180.3, falls inside of the Chi-Square Critical Values of the two-tailed test (117, 184). The
two-tailed test is more stringent than the one-tailed test. This is the case with nearly every type of
hypothesis test.
Example of 1-Sample, 1-Tailed, Left Tail, Chi-Square
Population Variance Test in Excel
A specific measurement is taken on each unit that is completed from a production line over a long period
of time. These measurements have been determined to be normally distributed with a population variance = σ² = 0.09.
An adjustment was made to the production line that may have affected the variance of the measurement
taken on each completed unit. A random sample of 150 units from the newly-adjusted production line had
the measurement taken. The sample standard deviation of this 150-unit sample is s = 0.27. Determine with 95 percent certainty whether the population variance has decreased as a result of the adjustment.
Problem Information
Sample size = n = 150
Degrees of Freedom = n – 1 = 149
Sample Standard Deviation = s = 0.27
Sample Variance = s² = 0.0729
Long-term Population Variance = 0.09
Alpha = 1 – Required Level of Certainty = 1 – 0.95 = 0.05
Non-Parametric Alternatives to 1-Sample Chi-Square Population
Variance Test
When population normality cannot be confirmed, nonparametric alternatives for the one-sample Chi-
Square Population Variance Test include Levene’s Test and the Brown-Forsythe Test.
Levene’s Test and the Brown-Forsythe Test are nonparametric tests that are used to compare variances
of two samples when the F Test’s normality requirement cannot be met. These two nonparametric tests
can also be used in place of the one-sample, Chi-Square Population Variance Test.
Both of these nonparametric tests are used to compare variances between two samples and therefore require that two data samples be taken for comparison.
The one-sample Chi-Square Population Test must be changed slightly in order to meet the requirement of two samples to compare. A “Before” sample must now be taken in place of the known population standard deviation data. The one-sample Chi-Square Population Variance Test compares an “After” sample with known population variance data. The nonparametric tests require that a “Before” sample be taken to compare with the “After” sample. The “Before” sample is taken before the adjustment is made to, for example, a production line. The “After” sample is taken after the adjustment is made.
Levene’s Test and the Brown-Forsythe Test are performed in the next section on the two samples taken
for comparison by the F Test.
Chi-Square Statistic and Chi-Square Critical Values
The Chi-Square Statistic and Critical Value for this one-tailed test are calculated as follows:
This Critical Value is shown in this Chi-Square PDF distribution curve for 149 degrees of freedom as follows:
The Chi-Square Statistic (120.7) falls below the Critical Value (121.8) and into the blue Region of Rejection. The Null Hypothesis is therefore rejected. It can be stated with 95 percent certainty that the variance of the measurement taken from the completed production unit has decreased as a result of the adjustment made to the production line.
It should be noted that the Null Hypothesis would not be rejected if this were a two-tailed test. The Chi-Square Statistic, 120.7, falls inside of the Chi-Square Critical Values of the two-tailed test (117, 184). The two-tailed test is more stringent than the one-tailed test. This is the case with nearly every type of hypothesis test.
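The left-tailed test can be sketched with scipy as follows (assuming scipy is available). With s = 0.27 the statistic works out to 149 × 0.0729 / 0.09 ≈ 120.7, which falls below the left-tail Critical Value CHISQ.INV(0.05, 149) ≈ 121.8:

```python
from scipy.stats import chi2

n = 150
df = n - 1                  # 149
sample_variance = 0.0729    # s squared, with s = 0.27
population_variance = 0.09
alpha = 0.05

chi_square_statistic = df * sample_variance / population_variance  # about 120.7

# One-tailed, left-tail Critical Value: CHISQ.INV(alpha, df)
critical_value = chi2.ppf(alpha, df)                               # about 121.8

# Left-tail test: reject when the statistic falls below the Critical Value
reject_null = chi_square_statistic < critical_value                # True
```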
F-Test – 2-Sample, 2-Tailed Chi-Square Population
Variance Test
The variances of two normally-distributed populations can be compared for equality using the F-Test. The
F-Test is a two-sample, two-tailed population variance test. This is a hypothesis test with a Null
Hypothesis stating that the variances of both populations are the same. The Null Hypothesis is shown as
follows:
H0: σ1 = σ2 = σ
Note that population variance = σ²
The F Test is always performed as a one-tailed test in the right tail with the Alternative Hypothesis constructed as follows:
H1: σ1 > σ2
The F Test is performed as a one-tailed test in the right tail because the sample with the larger standard
deviation of the two samples is designated as sample 1. The population from which that sample was
taken is designated as population 1. The two parameters associated with sample 1 and population 1 are
s1 (sample 1 standard deviation) and σ1 (population 1 standard deviation).
The F distribution describes the distribution of the F statistic, also called the f value. An F statistic can be calculated if two independent random samples are taken from two normally-distributed populations. The following parameters associated with the two samples and populations must be determined:
n1 = size of sample 1
n2 = size of sample 2
The F statistic can then be calculated in any of the following four equivalent ways:
f = [ s1²/σ1² ] / [ s2²/σ2² ]
f = [ s1² * σ2² ] / [ s2² * σ1² ]
f = [ Χ1² / df1 ] / [ Χ2² / df2 ]
f = [ Χ1² * df2 ] / [ Χ2² * df1 ]
The numerator of the F statistic should be the parameters associated with the larger s.
The distribution of all possible values of the f statistic is called the F distribution, with v1 and v2 degrees of
freedom.
Since the F distribution has the chi-square distribution as a component, many of the chi-square
distribution properties are also properties of the F distribution such as the following:
1) The distribution is non-symmetric.
2) The mean is approximately 1.
3) The F-values are all non-negative.
4) There are two independent degrees of freedom, one for the numerator, and one for the denominator.
5) Each different F distribution has a unique pair of degrees of freedom.
The F Test is a hypothesis test that determines whether the variances of two normally-distributed populations are
significantly different based upon the standard deviations of samples taken from each population.
The F Test is performed by comparing the calculated F statistic to an F Critical Value, F α(df1,df2). Alpha,
α, is the specified level of significance for the hypothesis test. The Null Hypothesis that the two variances
are the same is rejected if the F statistic is greater than F Critical. Equivalently, the Null Hypothesis is also
rejected if the p Value (the area in the right tail of the F distribution curve that is beyond the F statistic) is
smaller than alpha.
It should be noted that the F Test is extremely sensitive to non-normality. It is very important to verify
normality of both samples or both populations prior to performing an F Test.
F Test Problem in Excel
Determine with 95 percent certainty whether the variances of battery lifetime of Brand A and Brand B are
significantly different from each other.
Running descriptive statistics on the above data samples produces the following result:
Example Data
s1² = 286.13 (this sample is designated sample 1 because its variance is larger)
s2² = 232.39
n1 = 16
n2 = 17
df1 = n1 – 1 = 15
df2 = n2 – 1 = 16
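Plugging the example data into the simplified F statistic (the population variances cancel under the Null Hypothesis) gives the F value and degrees of freedom that the Excel output should reproduce. This arithmetic sketch is only a cross-check of the numbers above:

```python
s1_sq = 286.13   # sample 1 variance (the larger of the two)
s2_sq = 232.39   # sample 2 variance
n1, n2 = 16, 17

f = s1_sq / s2_sq          # F statistic = ratio of sample variances
df1, df2 = n1 - 1, n2 - 1  # numerator and denominator degrees of freedom

print(round(f, 4), df1, df2)   # 1.2312 15 16
```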
To create the histogram for each sample group in Excel, fill in the Excel Histogram dialogue box as follows:
Both sample groups appear to be distributed reasonably closely to the bell-shaped normal distribution. It should be noted that bin size in an Excel histogram is manually set by the user. This arbitrary setting of the bin sizes can have a significant influence on the shape of the histogram's output. Different bin sizes could result in an output that would not appear bell-shaped at all. What is actually set by the user in an Excel histogram is the upper boundary of each bin.
Having confirmed the F Test’s requirement of normality of both populations, the F Test can be conducted
as follows:
The Null Hypothesis of an F Test states that the variances of the two groups are the same. The p Value shown in the Excel F Test output equals 0.345. This is much larger than the Alpha (0.05) that is typically used for an F Test, so the Null Hypothesis cannot be rejected. A p Value of 0.345 indicates that, if the two population variances were actually equal, there would be a 34.5 percent probability of observing a difference in sample variances at least this large merely as the chance result of random sampling from each population.
The p Value needs to be no larger than 0.05 to be at least 95 percent certain that the test's indication of a difference between the population variances is a true result. A p Value of 0.345 falls far short of this requirement, so there is no convincing evidence that a difference between the population variances really exists.
Performing the F Test With the Data Analysis F Test Tool
The F Test can be performed in one step by using the Excel Data Analysis F Test tool. This tool can be
accessed under the Data tab as follows:
Data tab / Data Analysis / F Test Two Sample for Variances
The F Test dialogue box then appears and should be completed as follows:
Hitting the OK button will produce the following output. Directly below the output are the calculations that
duplicate the output created by this tool.
In-Depth Analysis of Sample Normality
The F Test is extremely sensitive to non-normality of either population from which the samples were
taken. A population’s normality is confirmed when a sample taken from that population is shown to be
normally distributed. The preceding F test was performed on the basis of bell-shaped histograms of each
of the two samples’ data. Other methods of confirming sample normality are shown as follows:
Evaluating the Normality of the Sample Data
The following five normality tests will be performed on the sample data here:
An Excel histogram of the sample data will be created.
A normal probability plot of the sample data will be created in Excel.
The Kolmogorov-Smirnov test for normality of the sample data will be performed in Excel.
The Anderson-Darling test for normality of the sample data will be performed in Excel.
The Shapiro-Wilk test for normality of the sample data will be performed in Excel.
Excel Histogram
The quickest way to evaluate normality of a sample is to construct an Excel histogram from the sample
data. This is shown as follows:
Excel histograms of both sample groups are as follows:
To create the histogram for each sample group in Excel, fill in the Excel Histogram dialogue box as follows:
Both sample groups appear to be distributed reasonably closely to the bell-shaped normal distribution. It should be noted that bin size in an Excel histogram is manually set by the user. This arbitrary setting of the bin sizes can have a significant influence on the shape of the histogram's output. Different bin sizes could result in an output that would not appear bell-shaped at all. What is actually set by the user in an Excel histogram is the upper boundary of each bin.
Another way to graphically evaluate normality of each data sample is to create a normal probability plot
for each sample group. This can be implemented in Excel and appears as follows:
Normal probability plots for both sample groups show that the data appear to be very close to normally distributed. The actual sample data (red) match very closely the values that the data would have if the sample were perfectly normally distributed (blue) and never go beyond the 95 percent confidence interval boundaries (green).
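The coordinates behind such a normal probability plot can be sketched as follows. This is a minimal illustration using the common plotting position (i − 0.5)/n; Excel implementations may use a slightly different plotting position, and the helper name is hypothetical:

```python
from statistics import NormalDist, fmean, stdev

def probability_plot_points(sample):
    """Pair each sorted data value with the value it would have if the
    sample were perfectly normally distributed (same mean and stdev)."""
    data = sorted(sample)
    n = len(data)
    dist = NormalDist(fmean(data), stdev(data))
    # Plotting position (i - 0.5)/n for the i-th ordered value.
    expected = [dist.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(data, expected))

points = probability_plot_points([98, 102, 100, 97, 103])
# A sample that hugs the diagonal (actual close to expected) looks normal.
for actual, expected in points:
    print(actual, round(expected, 2))
```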
Kolmogorov-Smirnov Test For Normality in Excel
The Kolmogorov-Smirnov Test is a hypothesis test that is widely used to determine whether a data
sample is normally distributed. The Kolmogorov-Smirnov Test calculates the distance between the
Cumulative Distribution Function (CDF) of each data point and what the CDF of that data point would be if
the sample were perfectly normally distributed. The Null Hypothesis of the Kolmogorov-Smirnov Test
states that the distribution of actual data points matches the distribution that is being tested. In this case
the data sample is being compared to the normal distribution.
The largest distance between the CDF of any data point and its expected CDF is compared to the
Kolmogorov-Smirnov Critical Value for a specific sample size and Alpha. If this largest distance exceeds
the Critical Value, the Null Hypothesis is rejected and the data sample is determined to have a different
distribution than the tested distribution. If the largest distance does not exceed the Critical Value, we
cannot reject the Null Hypothesis, which states that the sample has the same distribution as the tested
distribution.
F(Xk) = CDF(Xk) for normal distribution
F(Xk) = NORM.DIST(Xk, Sample Mean, Sample Stan. Dev., TRUE)
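The distance calculation that the worksheet formulas above implement can be sketched in Python as a cross-check. This is a minimal illustration using the standard two-sided empirical-CDF comparison (the worksheet's simpler one-point CDF differences may give slightly smaller distances), and the helper name is hypothetical:

```python
from statistics import NormalDist, fmean, stdev

def ks_statistic(sample):
    """Largest distance between the empirical CDF of the sample and the
    normal CDF fitted with the sample mean and standard deviation."""
    data = sorted(sample)
    n = len(data)
    dist = NormalDist(fmean(data), stdev(data))
    d = 0.0
    for i, x in enumerate(data, start=1):
        cdf = dist.cdf(x)          # same role as NORM.DIST(..., TRUE)
        # The empirical CDF steps from (i-1)/n to i/n at each data point.
        d = max(d, abs(i / n - cdf), abs((i - 1) / n - cdf))
    return d

d = ks_statistic([1, 2, 3, 4, 5])
print(round(d, 4))
```

The resulting D is then compared to the Kolmogorov-Smirnov Critical Value for the sample size and Alpha, exactly as described above.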
Variable 2 - Brand B Battery Lifetimes
The Null Hypothesis Stating That the Data Are Normally Distributed Cannot Be Rejected
The Null Hypothesis for the Kolmogorov-Smirnov Test for Normality, which states that the sample data
are normally distributed, is rejected only if the maximum difference between the expected and actual CDF
of any of the data points exceeds the Critical Value for the given n and α. That is not the case here.
The Max Differences Between the Actual and Expected CDF for Variable 1 (0.0885) and for Variable 2
(0.1007) are significantly less than the Kolmogorov-Smirnov Critical Values at α = 0.05 for n = 16
(approximately 0.33) and for n = 17 (approximately 0.32), so the Null Hypothesis of the Kolmogorov-Smirnov
Test cannot be rejected for either of the two sample groups.
Anderson-Darling Test For Normality in Excel
The Anderson-Darling Test is a hypothesis test that is widely used to determine whether a data sample is
normally distributed. The Anderson-Darling Test calculates a test statistic based upon the actual value of
each data point and the Cumulative Distribution Function (CDF) of each data point if the sample were
perfectly normally distributed.
The Anderson-Darling Test is considered to be slightly more powerful than the Kolmogorov-Smirnov test
for the following two reasons.
The Kolmogorov-Smirnov test is distribution-free, i.e., its critical values are the same for all distributions tested. The Anderson-Darling test requires critical values calculated for each tested distribution and is therefore more sensitive to the specific distribution.
The Anderson-Darling test gives more weight to values in the outer tails than the Kolmogorov-Smirnov test. The K-S test is less sensitive to aberrations in the outer values than the A-D test.
If the test statistic exceeds the Anderson-Darling Critical Value for a given Alpha, the Null Hypothesis is
rejected and the data sample is determined to have a different distribution than the tested distribution. If
the test statistic does not exceed the Critical Value, we cannot reject the Null Hypothesis, which states
that the sample has the same distribution as the tested distribution.
F(Xk) = CDF(Xk) for normal distribution
F(Xk) = NORM.DIST(Xk, Sample Mean, Sample Stan. Dev., TRUE)
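The test statistic itself can be sketched in Python as follows. This is a minimal illustration of the usual A² formula with the small-sample adjustment A* = A²(1 + 0.75/n + 2.25/n²), assuming the mean and standard deviation are estimated from the sample (the case the critical values below apply to); the helper name is hypothetical:

```python
from math import log
from statistics import NormalDist, fmean, stdev

def anderson_darling(sample):
    """Adjusted Anderson-Darling statistic A* for a normality test with
    the mean and standard deviation estimated from the sample."""
    data = sorted(sample)
    n = len(data)
    dist = NormalDist(fmean(data), stdev(data))
    f = [dist.cdf(x) for x in data]    # fitted normal CDF at each point
    a_sq = -n - sum(
        (2 * i - 1) * (log(f[i - 1]) + log(1.0 - f[n - i]))
        for i in range(1, n + 1)
    ) / n
    # Small-sample adjustment so A* can be compared to the critical values.
    return a_sq * (1 + 0.75 / n + 2.25 / n ** 2)

print(anderson_darling([1, 2, 3, 4, 5]))
```

A* is then compared to the critical values listed below; note how the log terms in the sum weight the outer tails heavily, which is the source of the A-D test's extra tail sensitivity.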
Variable 2 - Brand B Battery Lifetimes
Reject the Null Hypothesis of the Anderson-Darling Test, which states that the data are normally
distributed, if any of the following are true:
A* > 0.576 When Level of Significance (α) = 0.15
A* > 0.656 When Level of Significance (α) = 0.10
A* > 0.787 When Level of Significance (α) = 0.05
A* > 1.092 When Level of Significance (α) = 0.01
The Null Hypothesis Stating That the Data Are Normally Distributed Cannot Be Rejected
The Null Hypothesis for the Anderson-Darling Test for Normality, which states that the sample data are
normally distributed, is rejected if the Adjusted Test Statistic (A*) exceeds the Critical Value for the given
n and α.
The Adjusted Test Statistics (A*) for Variable 1 (0.174) and for Variable 2 (0.227) are significantly less than the Anderson-Darling Critical Value for α = 0.05 (0.787), so the Null Hypothesis of the Anderson-Darling Test cannot be rejected for either of the two sample groups.
Shapiro-Wilk Test For Normality in Excel
The Shapiro-Wilk Test is a hypothesis test that is widely used to determine whether a data sample is
normally distributed. A test statistic W is calculated. If this test statistic is less than a critical value of W for
a given level of significance (alpha) and sample size, the Null Hypothesis which states that the sample is
normally distributed is rejected.
The Shapiro-Wilk Test is a robust normality test and is widely used because of its slightly superior performance compared to other normality tests, especially with small sample sizes. Superior performance means that it correctly rejects the Null Hypothesis of normality when the data are actually not normally distributed a slightly higher percentage of the time than most other normality tests, particularly at small sample sizes.
The Shapiro-Wilk normality test is generally regarded as being slightly more powerful than the Anderson-
Darling normality test, which in turn is regarded as being slightly more powerful than the Kolmogorov-
Smirnov normality test.
The Null Hypothesis Stating That the Data Are Normally Distributed Cannot Be Rejected
Test Statistic W (0.972027) is larger than W Critical (0.887). The Null Hypothesis therefore cannot be
rejected. There is not enough evidence to state that the data are not normally distributed with a
confidence level of 95 percent.
The Null Hypothesis Stating That the Data Are Normally Distributed Cannot Be Rejected
Test Statistic W (0.971481) is larger than W Critical (0.892). The Null Hypothesis therefore cannot be
rejected. There is not enough evidence to state that the data are not normally distributed with a
confidence level of 95 percent.
Correctable Reasons That Normal Data Can Appear Non-Normal
If a normality test indicates that data are not normally distributed, it is a good idea to do a quick evaluation
of whether any of the following factors have caused normally-distributed data to appear to be non-
normally-distributed:
1) Outliers – Too many outliers can easily skew normally-distributed data. An outlier can often be
removed if a specific cause of its extreme value can be identified. Some outliers are expected in normally-
distributed data.
2) Data Has Been Affected by More Than One Process – Variations to a process such as shift changes
or operator changes can change the distribution of data. Multiple modal values in the data are common
indicators that this might be occurring. The effects of different inputs must be identified and eliminated
from the data.
3) Not Enough Data – Normally-distributed data will often not assume the appearance of normality until
at least 25 data points have been sampled.
4) Measuring Devices Have Poor Resolution – Sometimes (but not always) this problem can be solved
by using a larger sample size.
5) Data Approaching Zero or a Natural Limit – If a large number of data values approach a limit such
as zero, calculations using very small values might skew computations of important values such as the
mean. A simple solution might be to raise all the values by a certain amount.
6) Only a Subset of a Process’ Output Is Being Analyzed – If only a subset of data from an entire process is being used, a representative sample is not being collected. Normally-distributed results would not appear normally distributed if a representative sample of the entire process is not collected.
Nonparametric Alternatives to the F Test
The F Test is extremely sensitive to non-normality of either population. When the normality of the populations or of the sample data cannot be confirmed, sample variances can be compared using the nonparametric Levene’s Test and also the nonparametric Brown-Forsythe Test.
It is often a good idea to perform at least one of these tests along with the F Test even when sample or
population normality has been confirmed.
Levene’s Test and the Brown-Forsythe Test will be performed on the sample data as follows:
Levene’s Test involves performing Single-Factor ANOVA on the groups of distances to the mean. This
can be easily implemented in Excel by applying the Excel data analysis tool ANOVA: Single Factor.
Applying this tool on the above data produces the following output:
The Null Hypothesis of Levene’s Test states that the average distance to the mean for the two groups is the same. Failure to reject this Null Hypothesis implies that the sample groups have the same variances. The p Value shown in the Excel ANOVA output equals 0.6472. This is much larger than the Alpha (0.05) that is typically used for an ANOVA Test, so the Null Hypothesis cannot be rejected.
We therefore conclude as a result of Levene’s Test that the variances are the same or, at least, that we don’t have enough evidence to state with 95 percent certainty that the variances are different. Levene’s Test is sensitive to outliers because it relies on the sample mean, which can be unduly affected by outliers.
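The mechanics of Levene's Test (one-way ANOVA on the absolute distances of each observation to its own group mean) can be sketched as follows. This hypothetical helper mirrors the ANOVA: Single Factor calculation on the distance data:

```python
from statistics import fmean

def levene_w(*groups):
    """One-way ANOVA F statistic computed on the absolute distances of
    each observation to its own group mean (Levene's Test)."""
    # Transform each observation into its distance to the group mean.
    z = [[abs(x - fmean(g)) for x in g] for g in groups]
    k = len(z)
    n = sum(len(g) for g in z)
    grand = fmean([d for g in z for d in g])
    # Between-groups and within-groups sums of squares on the distances.
    ssb = sum(len(g) * (fmean(g) - grand) ** 2 for g in z)
    ssw = sum((d - fmean(g)) ** 2 for g in z for d in g)
    return (ssb / (k - 1)) / (ssw / (n - k))

# Groups with different spreads give a positive W; groups whose
# distance-to-mean patterns are identical give W of zero.
print(levene_w([1, 5, 9], [4, 5, 6]))
```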
A very similar nonparametric test called the Brown-Forsythe Test relies on sample medians and is therefore much less affected by outliers than Levene’s Test is, and much less affected by non-normality than the F Test is.
Brown-Forsythe Test For Sample Variance Comparison in Excel
The Brown-Forsythe Test is a hypothesis test commonly used to test for the equality of variances of two
or more sample groups. The Null Hypothesis of the Brown-Forsythe Test states that the average distance to the sample median is the same for each sample group. Failure to reject this Null Hypothesis implies that the
variances of the sampled groups are the same. The distance to the median for each data point of both
samples is shown as follows:
The Brown-Forsythe Test involves performing Single-Factor ANOVA on the groups of distances to the
median. This can be easily implemented in Excel by applying the Excel data analysis tool ANOVA:
Single Factor. Applying this tool on the above data produces the following output:
The Null Hypothesis of the Brown-Forsythe Test states that the average distance to the median for the two groups is the same. Failure to reject this Null Hypothesis implies that the sample groups have the same variances. The p Value shown in the Excel ANOVA output equals 0.6627. This is much larger than the Alpha (0.05) that is typically used for an ANOVA Test, so the Null Hypothesis cannot be rejected.
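The only mechanical change from Levene's Test is the centering statistic, so the Brown-Forsythe statistic can be sketched the same way with the median substituted for the mean. This hypothetical helper mirrors the ANOVA: Single Factor calculation on the distance-to-median data:

```python
from statistics import fmean, median

def brown_forsythe(*groups):
    """One-way ANOVA F statistic computed on the absolute distances of
    each observation to its own group MEDIAN (Brown-Forsythe Test)."""
    # Center each group on its median instead of its mean.
    z = [[abs(x - median(g)) for x in g] for g in groups]
    k = len(z)
    n = sum(len(g) for g in z)
    grand = fmean([d for g in z for d in g])
    ssb = sum(len(g) * (fmean(g) - grand) ** 2 for g in z)
    ssw = sum((d - fmean(g)) ** 2 for g in z for d in g)
    return (ssb / (k - 1)) / (ssw / (n - k))

# The first group contains an outlier (10); centering on the median
# keeps that single extreme value from shifting the group's center.
print(brown_forsythe([0, 1, 10], [3, 4, 5]))
```

For symmetric groups, where each group's median equals its mean, the Brown-Forsythe statistic coincides with Levene's statistic.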
Check Out the Latest Book in the Excel Master Series!
Click Here To Download This 200+ Page Excel Solver Optimization Manual Right Now for $19.95
http://37.solvermark.pay.clickbank.net/
For anyone who wants to be performing optimization at a high level with the Excel Solver quickly, Step-
By-Step Optimization With Excel Solver is the e-manual for you. This is a hands-on, step-by-step,
complete guidebook for both beginner and advanced Excel Solver users. This book is perfect for the
many students who are now required to be proficient in optimization in so many majors as well as industry
professionals who have an immediate need to become up-to-speed with advanced optimization in a short
time frame.
Step-By-Step Optimization With Excel Solver is a 200+ page .pdf e-manual of simple yet thorough
explanations on how to use the Excel Solver to solve today’s most widely known optimization problems.
Loaded with screen shots that are coupled with easy-to-follow instructions, this .pdf e-manual will simplify
many difficult optimization problems and make you a master of the Excel Solver almost immediately.
The author of Step-By-Step Optimization With Excel Solver, Mark Harmon, was the Internet marketing
manager for several years for the company that created the Excel Solver and currently develops it for
Microsoft Excel today. He shares his deep knowledge of and experience with optimization using the Excel
Solver in this book.
Here are just some of the Solver optimization problems that are solved completely with simple-to-understand instructions and screen shots in this book:
● The famous “Traveling Salesman” problem using Solver’s Alldifferent constraint and the Solver’s
Evolutionary method to find the shortest path to reach all customers. This also provides an advanced use
of the Excel INDEX function.
● The well-known “Knapsack Problem,” which shows how to optimize the use of limited space while
satisfying numerous other criteria.
● How to perform nonlinear regression and curve-fitting on the Solver using the Solver’s GRG Nonlinear
solving method
● How to solve the “Cutting Stock Problem” faced by many manufacturing companies who are trying to
determine the optimal way to cut sheets of material to minimize waste while satisfying customer orders.
● Portfolio optimization to maximize return or minimize risk.
● Venture capital investment selection using the Solver’s Binary constraint to maximize Net Present Value
of selected cash flows at year 0. Clever use of the If-Then-Else statements makes this a simple problem.
● How to use Solver to minimize the total cost of purchasing and shipping goods from multiple suppliers to
multiple locations.
● How to optimize the selection of different production machines to minimize cost while fulfilling an order.
● How to optimally allocate a marketing budget to generate the greatest reach and frequency or number
of inbound leads at the lowest cost.
Step-By-Step Optimization With Excel Solver has complete instructions and numerous tips on every aspect of operating the Excel Solver. You’ll fully understand the reports and know exactly how to tweak all of the Solver’s settings for total custom use. The book also provides lots of inside advice and guidance on setting up the model in Excel so that it will be as simple and intuitive as possible to work with. All of the optimization problems in this book are solved step-by-step using a 6-step process that works every time.
In addition to detailed screen shots and easy-to-follow explanations on how to solve every optimization
problem in this e-manual, a link is provided to download an Excel workbook that has all problems
completed exactly as they are in this e-manual.
Step-By-Step Optimization With Excel Solver is exactly the e-manual you need if you want to be
optimizing at an advanced level with the Excel Solver quickly.
Reader Testimonials
"Step-By-Step Optimization With Excel Solver is the "Missing Manual" for the Excel Solver. It is pretty
difficult to find good documentation anywhere on solving optimization problems with the Excel Solver.
This book came through like a champ!
Optimization with the Solver is definitely not intuitive, but this book is. I found it very easy to work through every single one of the examples. The screen shots are clear and the steps are presented logically. The downloadable Excel spreadsheet with all examples completed was quite helpful as well.
Once again, it's really amazing how little understandable documentation there is on doing real-life optimization problems with Solver.
For example, just try to find anything anywhere about the well-known Traveling Salesman Problem (a
salesman needs to find the shortest route to visit all customers once).
It is a tricky problem for sure, but this book showed a quick and easy way to get it done. I'm not sure I
would have ever figured that problem out, or some of the other problems in the book, without this manual.
I can say that this is the book for anyone who wants or needs to get up to speed on an advanced level quickly with the Excel Solver. Every single aspect of using the Solver seems to be covered thoroughly and yet simply. The author presents a lot of tricks in how to set the correct Solver settings to get it to do exactly what you want.
The book flows logically. It's an easy read. Step-By-Step Optimization With Excel Solver got me up to speed on the Solver quickly and without too much mental strain at all. I can definitely recommend this book."
Pam Copus
Sonic Media Inc
“As a graduate student of the Graduate Program in International Studies (GPIS) at Old Dominion University, I'm required to have a thorough knowledge of Excel in order to use it as a tool for interpreting data, conducting research and analysis.
I've always found the Excel Solver to be one of the more difficult Excel tools to totally master. Not any more. This book was so clearly written that I was able to do almost every one of the advanced optimization examples in the book as soon as I read through it once.
I can tell that the author really made an effort to make this manual as intuitive as possible. The screen
shots were totally clear and logically presented.
Some of the examples that were very advanced, such as the venture capital investment example, had
screen shot after screen shot to ensure clarity of the difficult Excel spreadsheet and Solver dialogue
boxes.
It definitely was "Step-By-Step" just like the title says. I must say that I did have to cheat a little bit and look at the Excel spreadsheet with all of the book's examples that is downloadable from the book. The spreadsheet was also a great help.
Step-By-Step Optimization With Excel Solver is not only totally easy to understand and follow, but it is
also very complete. I feel like I'm a master of the Solver. I have purchased a couple of other books in the
Excel Master Series (the Excel Statistical Master and the Advanced Regression in Excel book) and they
have all been excellent.
I am lucky to have come across this book because the graduate program that I am in has a number of
optimization assignments using the Solver. Thanks Mark for such an easy-to-follow and complete book
on using the Solver. It really saved me a lot of time in figuring this stuff out."
Federico Catapano
Graduate Student
International Studies Major
Old Dominion University
Norfolk, Virginia
"I'm finished with school (Financial Economics major) and currently work for a Fortune 400 company as a
business analyst. I find that the statistics and optimization manuals are indispensable reference tools
throughout the day.
I keep both eManuals loaded on my iPad at all times just in case I have to recall a concept I don't use all the time. It's easier to recall the concepts from the eManuals rather than trying to sift through the convoluted banter in a textbook, and for that I applaud the author!
In a business world where I need on-demand answers now, this optimization eManual is the perfect tool.
I just recently used the bond investment optimization problem to build a model in Excel and help my VP understand that a certain process we're doing wasn't maximizing our resources.
That's the great thing about this manual: you can use any practice problem (with a little outside thinking) to mold it into your own real-life problem and come up with answers that matter in the workplace!"
Sean Ralston
Sr. Financial Analyst
Enogex LLC
Oklahoma City, Oklahoma
"Excel Solver is a tool that most folks never use. I was one of those people. I was working on a project,
and was told that solver might be helpful. I did some research online, and was more confused than ever. I
started looking for a book that might help me. I got this book, and was not sure what to expect.
It surpassed my expectations! The book explains the concepts behind the solver, the best way to set up the "problem", and how to use the tool effectively. It also gives many examples, including the files. The files are stored online, and you can download them so you can see everything in Excel.
The author does a fantastic job on this book. While I'm not a solver "expert", I am definitely much smarter
about it than I was before. Trust me, if you need to understand the solver tool, this book will get you
there."
Scott Kinsey
Missouri
“The author, Mark, has a writing style that is simple and understandable, with clear examples that are easy to follow. This book is no exception.
Mark explains how Solver works, the different types of solutions that can be obtained and when to use one or another, and explains the content and meaning of the reports available. Then he presents several examples, goes about defining each problem, setting it up in Excel and in Solver, and interpreting the solution.
It is a really good book that teaches you how to apply solver (linear programming) to a problem.”
Luis R. Heimpel
El Paso, Texas
Click Here To Download This 200+ Page Excel Solver Optimization Manual Right Now for $19.95
http://37.solvermark.pay.clickbank.net/
Meet the Author
Mark Harmon is a university statistics instructor and statistical/optimization consultant. He was the
Internet marketing manager for several years for the company that created the Excel Solver and currently
develops that add-in for Excel. He has made contributions to the development of Excel over a long period
of time dating all the way back to 1992 when he was one of the beta users of Excel 4 creating the sales
force deployment plan for the introduction of the anti-depressant drug Paxil into the North American
market.
Mark Harmon is a natural teacher. As an adjunct professor, he spent five years teaching more than thirty
semester-long courses in marketing and finance at the Anglo-American College in Prague, Czech
Republic and the International University in Vienna, Austria. During that five-year time period, he also
worked as an independent marketing consultant in the Czech Republic and performed long-term
assignments for more than one hundred clients. His years of teaching and consulting have honed his
ability to present difficult subject matter in an easy-to-understand way.
This manual got its start when Mark Harmon began conducting statistical analysis to increase the
effectiveness of various types of Internet marketing that he was performing during the first decade of the
2000s. Mark initially formulated the practical, statistical guidelines for his own use but eventually realized
that others would also greatly benefit from this step-by-step collection of statistical instructions that really did
not seem to be available elsewhere. Over the course of a number of years and several editions, this
instruction manual blossomed into the Excel Master Series: the graduate-level, step-by-step, complete, practical, and clear set of guidebooks that it is today.
Mark Harmon received a degree in electrical engineering from Villanova University and an MBA in marketing from the Wharton School.
Mark is an avid fan of the beach life and can nearly always be found by a warm and sunny beach.