
A PowerPoint Presentation Package to Accompany

Applied Statistics in Business & Economics, 6th edition

David P. Doane and Lori E. Seward

Prepared by Lloyd R. Jaisingh

15-1 Copyright ©2019 McGraw-Hill Education. All rights reserved. No reproduction or distribution without the prior written consent of McGraw-Hill Education.
Chapter 15
Chi-Square Tests
Chapter Contents
15.1 Chi-Square Test for Independence
15.2 Chi-Square Tests for Goodness-of-Fit
15.3 Uniform Goodness-of-Fit Test
15.4 Poisson Goodness-of-Fit Test
15.5 Normal Chi-Square Goodness-of-Fit Test
15.6 ECDF Tests (Optional)

Chapter Learning Objectives (LOs)

LO15-1: Recognize a contingency table and understand how it is created.


LO15-2: Find degrees of freedom and use the chi-square table of critical
values.
LO15-3: Perform a chi-square test for independence on a contingency
table.
LO15-4: Perform a goodness-of-fit (GOF) test for a multinomial
distribution.
LO15-5: Perform a GOF test for a uniform distribution.

Chapter Learning Objectives (LOs), continued

LO15-6: Explain the GOF test for a Poisson distribution.


LO15-7: Explain the chi-square GOF test for normality.
LO15-8: Interpret ECDF tests and know their advantages compared to
chi-square GOF tests.

15.1 Chi-Square Test for
Independence
LO15-1: Recognize a contingency table and understand
how it is created.
Contingency Tables
• A contingency table is a cross-tabulation of n paired observations into
categories.
• Each cell shows the count of observations that fall into the
category defined by its row and column heading as shown in Table 15.2.

LO15-1: Recognize a contingency table and understand
how it is created (continued, 2).

Contingency Tables

For example: Marketing researchers surveyed 291 websites in three nations (France, U.K., U.S.) and obtained the contingency table shown here as Table 15.1. Is the location of the privacy disclaimer independent of the website’s nationality? This question can be answered by using a test based on the frequencies in this contingency table.

LO15-1: Recognize a contingency table and understand
how it is created (continued, 3).

Chi-Square Test
• In a test of independence for an r × c contingency table, the hypotheses are
H0: Variable A is independent of variable B
H1: Variable A is not independent of variable B
• Use the chi-square test for independence to test these hypotheses.
• This non-parametric test is based on frequencies.
• The n data pairs are classified into c columns and r rows, and then the
observed frequency fjk is compared with the expected frequency ejk
under the assumption of independence.

LO15-1: Recognize a contingency table and understand
how it is created (continued, 4).

Chi-Square Test, continued


• The chi-square test statistic measures the relative difference between expected and observed frequencies:
$$\chi^2_{calc} = \sum_{j=1}^{r}\sum_{k=1}^{c}\frac{(f_{jk}-e_{jk})^2}{e_{jk}}$$
If the two variables are independent, then fjk should be close to ejk, leading to a chi-square test statistic near zero. Conversely, large differences between fjk and ejk will lead to a large chi-square test statistic. Because each term is a squared difference divided by a positive ejk, the test statistic cannot be negative, so the test is always right-tailed. If the test statistic is far enough in the right tail, we reject the hypothesis of independence.
LO15-2: Find degrees of freedom and use the chi-square
table of critical values.

Chi-Square Distribution
• The critical value comes from the chi-square probability distribution
with (r – 1)(c – 1) degrees of freedom.
df = degrees of freedom = (r – 1)(c – 1)

where r = number of rows in the table
c = number of columns in the table
• Appendix E contains critical values for right-tail areas of the chi-
square distribution.
• The Excel function =CHISQ.INV.RT(α, df) also gives the critical
value in the right-tail.
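For readers working outside Excel, the same right-tail critical value can be sketched in Python. This assumes SciPy is installed; `chi2.isf` (the inverse survival function) plays the role of =CHISQ.INV.RT(α, df), and the helper name `critical_value` is our own.

```python
from scipy.stats import chi2


def critical_value(alpha: float, rows: int, cols: int) -> float:
    """Right-tail chi-square critical value for an r x c table (Excel: =CHISQ.INV.RT)."""
    df = (rows - 1) * (cols - 1)        # df = (r - 1)(c - 1)
    return float(chi2.isf(alpha, df))   # point with right-tail area alpha


# A 3 x 4 table has df = (3 - 1)(4 - 1) = 6
print(round(critical_value(0.05, 3, 4), 2))  # 12.59
```

The printed value matches the Appendix E entry for df = 6 and α = .05.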

LO15-2: Find degrees of freedom and use the chi-square
table of critical values (continued, 2).
Chi-Square Distribution
• Consider the shape of the chi-square distribution. As the degrees of freedom increase, the shape begins to resemble a normal, bell-shaped curve.
• However, for any contingency table you are likely to encounter,
degrees of freedom will not be large enough to assume normality.

LO15-2: Find degrees of freedom and use the chi-square
table of critical values (continued, 3).
Expected Frequencies
• Assuming that H0 is true, the expected frequency of row j and column k is:
$$e_{jk} = \frac{R_j C_k}{n}$$
where
Rj = total for row j (j = 1, 2, …, r)
Ck = total for column k (k = 1, 2, …, c)
n = sample size
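The expected-frequency formula can be applied to a whole table at once: the outer product of the row totals and column totals, divided by n, gives every ejk. A minimal NumPy sketch, assuming NumPy is available; the function name and the table counts are hypothetical.

```python
import numpy as np


def expected_frequencies(observed):
    """Expected counts e_jk = R_j * C_k / n under the independence hypothesis."""
    f = np.asarray(observed, dtype=float)
    row_totals = f.sum(axis=1)   # R_j
    col_totals = f.sum(axis=0)   # C_k
    n = f.sum()                  # sample size
    return np.outer(row_totals, col_totals) / n


# Hypothetical 2 x 3 table (illustrative counts only, not from the text)
table = [[20, 30, 50],
         [30, 20, 50]]
print(expected_frequencies(table))  # each row: [25. 25. 50.]
```

Note that the expected frequencies preserve the observed row and column totals.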

LO15-3: Perform a chi-square test for independence
on a contingency table.

Steps in Testing the Hypotheses


Step 1: State the Hypotheses
• H0: Variable A is independent of variable B
• H1: Variable A is not independent of variable B

Step 2: Specify the Decision Rule
• Calculate df = (r – 1)(c – 1).
• For a given α, look up the right-tail critical value (χ²R) from Appendix E.
Reject H0 if the test statistic > χ²R.
• Instead of using Appendix E, you can use the Excel function =CHISQ.INV.RT(α, df) to get the right-tail critical value.
LO15-3: Perform a chi-square test for independence
on a contingency table (continued, 2).

Steps in Testing the Hypotheses (continued)


• For example, for df = 6 and α = .05, χ².05 = 12.59.

LO15-3: Perform a chi-square test for independence
on a contingency table (continued, 3).

Steps in Testing the Hypotheses (continued)


• Here is the rejection region.

LO15-3: Perform a chi-square test for independence
on a contingency table (continued, 4).
Steps in Testing the Hypotheses (continued)
Step 3: Calculate the Test Statistic

The expected frequencies are computed from
$$e_{jk} = \frac{R_j C_k}{n}$$
• For example:

LO15-3: Perform a chi-square test for independence
on a contingency table (continued, 5).
Steps in Testing the Hypotheses (continued)
• The chi-square test statistic is
$$\chi^2_{calc} = \sum_{j=1}^{r}\sum_{k=1}^{c}\frac{(f_{jk}-e_{jk})^2}{e_{jk}}$$
Step 4: Make the Decision
• Reject H0 if χ²calc > χ²R (the critical value) or if the p-value ≤ α.

Step 5: Take Action
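Steps 2 through 4 can be sketched end to end in Python. This is a minimal sketch assuming NumPy and SciPy are available; the table is hypothetical (the website survey counts from Table 15.1 are not reproduced here), and `chi2.sf` stands in for Excel's =CHISQ.DIST.RT.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency


def chi_square_independence(observed, alpha=0.05):
    """Steps 2-4 of the test of independence for an r x c table of counts."""
    f = np.asarray(observed, dtype=float)
    r, c = f.shape
    n = f.sum()
    e = np.outer(f.sum(axis=1), f.sum(axis=0)) / n   # e_jk = R_j C_k / n
    stat = ((f - e) ** 2 / e).sum()                  # chi-square test statistic
    df = (r - 1) * (c - 1)                           # df = (r - 1)(c - 1)
    crit = chi2.isf(alpha, df)                       # right-tail critical value
    p_value = chi2.sf(stat, df)                      # like =CHISQ.DIST.RT(stat, df)
    return stat, df, crit, p_value, bool(stat > crit)


# Hypothetical 2 x 3 table (illustrative counts, not the website survey data)
table = [[20, 30, 50],
         [30, 20, 50]]
stat, df, crit, p_value, reject = chi_square_independence(table)
print(f"chi2 = {stat:.3f}, df = {df}, critical value = {crit:.3f}, "
      f"p = {p_value:.4f}, reject H0: {reject}")

# Cross-check with SciPy's built-in routine
res = chi2_contingency(table)
scipy_stat, scipy_p = res[0], res[1]
```

SciPy's `chi2_contingency` does the same work in one call; by default it applies Yates' continuity correction only when df = 1, so for larger tables its statistic matches the hand calculation.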

LO15-3: Perform a chi-square test for independence
on a contingency table (continued, 6).

Test of Two Proportions


• For a 2 × 2 contingency table, the chi-square test is equivalent to a two-tailed z test for two proportions, provided the samples are large enough to ensure normality.
• The hypotheses for a two-tailed test are:
H0: π1 = π2
H1: π1 ≠ π2
Figure 14.6
LO15-3: Perform a chi-square test for independence
on a contingency table (continued, 7).

Test of Two Proportions, continued


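• The z test statistic is computed from the following formula, where p1 = x1/n1, p2 = x2/n2, and p̄ = (x1 + x2)/(n1 + n2) is the pooled proportion:
$$z_{calc} = \frac{p_1 - p_2}{\sqrt{\bar{p}(1-\bar{p})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}$$
• Reject H0 if |zcalc| > zα/2.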

Figure 14.6
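The equivalence can be checked numerically: for a 2 × 2 table, the square of the pooled two-proportion z statistic equals the uncorrected chi-square statistic. A minimal sketch, assuming NumPy and SciPy; the helper names and the sample counts are hypothetical.

```python
import math

import numpy as np
from scipy.stats import norm


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic for H0: pi1 = pi2."""
    p1, p2 = x1 / n1, x2 / n2
    pbar = (x1 + x2) / (n1 + n2)   # pooled proportion
    return (p1 - p2) / math.sqrt(pbar * (1 - pbar) * (1 / n1 + 1 / n2))


def chi_square_2x2(table):
    """Uncorrected chi-square statistic for a 2 x 2 table of counts."""
    f = np.asarray(table, dtype=float)
    e = np.outer(f.sum(axis=1), f.sum(axis=0)) / f.sum()
    return float(((f - e) ** 2 / e).sum())


# Hypothetical data: 40 of 100 successes in sample 1, 25 of 100 in sample 2
z = two_proportion_z(40, 100, 25, 100)
chi_sq = chi_square_2x2([[40, 60],    # sample 1: successes, failures
                         [25, 75]])   # sample 2: successes, failures
p_two_tailed = 2 * norm.sf(abs(z))
print(f"z = {z:.4f}, z^2 = {z * z:.4f}, chi-square = {chi_sq:.4f}, p = {p_two_tailed:.4f}")
```

Because z² = χ² and a chi-square variable with df = 1 is the square of a standard normal, the two-tailed z test and the right-tailed chi-square test give the same p-value.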
LO15-3: Perform a chi-square test for independence
on a contingency table (continued, 8).

Small Expected Frequencies


• The chi-square test is unreliable if the expected frequencies are too
small.
• Rules of thumb:
  – Cochran’s Rule requires that ejk > 5 for all cells.
  – Another rule of thumb is that up to 20% of the cells may have ejk < 5.
• Most agree that a chi-square test is infeasible if ejk < 1 in any cell.
• If this happens, try combining adjacent rows or columns to enlarge
the expected frequencies.
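The rules of thumb above are easy to check by computer before running the test. A minimal sketch, assuming NumPy; the report keys and the table are hypothetical, and merging the first two rows stands in for "combining adjacent rows".

```python
import numpy as np


def expected_frequency_report(observed):
    """Summarize the small-expected-frequency rules of thumb for a contingency table."""
    f = np.asarray(observed, dtype=float)
    e = np.outer(f.sum(axis=1), f.sum(axis=0)) / f.sum()   # e_jk = R_j C_k / n
    return {
        "cochran_ok": bool((e > 5).all()),        # Cochran's Rule: all e_jk > 5
        "share_below_5": float((e < 5).mean()),   # alternative rule: at most 20% of cells
        "infeasible": bool((e < 1).any()),        # any e_jk < 1 makes the test infeasible
    }


# Hypothetical 3 x 2 table whose first two rows have small counts
table = np.array([[1, 4],
                  [2, 3],
                  [10, 10]])
print(expected_frequency_report(table))

# Combine the two adjacent sparse rows and check again
merged = np.vstack([table[0] + table[1], table[2]])
print(expected_frequency_report(merged))
```

Merging raises the smallest expected frequencies, though in this illustration one cell still falls below 5.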

LO15-3: Perform a chi-square test for independence
on a contingency table (continued, 9).

Cross-Tabulating Raw Data


• Chi-square tests for independence can also be used to analyze
quantitative variables by coding them into categories.
• For example, the variables Infant Deaths per 1,000 and Doctors
per 100,000 can each be coded into various categories:
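Coding two numerical variables into categories and counting the pairs produces an ordinary contingency table. A minimal NumPy sketch; the data values and cut-points are hypothetical and far too few for a real test, and `np.digitize` assigns each value to a category by its cut-points.

```python
import numpy as np


def cross_tabulate(x, y, x_cuts, y_cuts):
    """Code two numerical variables into categories and count the pairs in each cell."""
    rows = np.digitize(x, x_cuts)   # 0 if x < x_cuts[0], ..., len(x_cuts) if x >= x_cuts[-1]
    cols = np.digitize(y, y_cuts)
    table = np.zeros((len(x_cuts) + 1, len(y_cuts) + 1), dtype=int)
    for r, c in zip(rows, cols):
        table[r, c] += 1
    return table


# Hypothetical values for eight nations (not data from the text)
infant_deaths = [3, 5, 12, 25, 40, 8, 60, 15]     # per 1,000
doctors = [300, 250, 150, 80, 40, 200, 20, 120]   # per 100,000
table = cross_tabulate(infant_deaths, doctors,
                       x_cuts=[10, 30],            # rows: < 10, 10 to < 30, >= 30
                       y_cuts=[100, 200])          # columns: < 100, 100 to < 200, >= 200
print(table)
```

The resulting table can then be passed to the chi-square test of independence, subject to the expected-frequency rules.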

LO15-3: Perform a chi-square test for independence
on a contingency table (continued, 10).

Why Do a Chi-Square Test on Numerical Data?


• The researcher may believe there’s a relationship between X and Y, but doesn’t want to assume a specific form (linear, quadratic, etc.), as regression requires.
• There are outliers or anomalies that prevent us from assuming that
the data came from a normal population. Unlike correlation and
regression, the chi-square test does not require any normality
assumptions.
• The researcher has numerical data for one variable but not the
other. A chi-square test can be used if we convert the numerical
variable into categories.

LO15-3: Perform a chi-square test for independence
on a contingency table (continued, 11).

3-Way Tables and Higher


• More than two variables can be compared using contingency
tables.
• However, it is difficult to visualize a higher order table.
• For example, a 3-way table forms a cube, which you could visualize as a stack of tiled 2-way contingency tables.
• Major computer packages permit 3-way tables.

15.2 Chi-Square Test for
Goodness-of-Fit
LO15-4: Perform a goodness-of-fit (GOF) test for a
multinomial distribution.
Purpose of the Test
• The goodness-of-fit (GOF) test helps you decide whether your
sample resembles a particular kind of population.
• The chi-square test will be used because it is versatile and easy
to understand.

LO15-4: Perform a goodness-of-fit (GOF) test for a
multinomial distribution (continued, 2).

Multinomial GOF Test


• A multinomial distribution is defined by any k probabilities π1, π2, …, πk that sum to unity.
• For example, consider the following “official” proportions of M&M
colors.

LO15-4: Perform a goodness-of-fit (GOF) test for a
multinomial distribution (continued, 3).

Multinomial GOF Test, continued


• The hypotheses are
H0: π1 = .13, π2 = .13, π3 = .24, π4 = .20, π5 = .16, π6 = .14
H1: At least one of the πj differs from the hypothesized value.

• No parameters are estimated (m = 0) and there are c = 6 classes, so the degrees of freedom are df = c – m – 1 = 6 – 0 – 1 = 5.
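A multinomial GOF test against the M&M proportions can be sketched as follows. The observed color counts are hypothetical (the slides give no sample), and we do not assume which color corresponds to which πj. This assumes NumPy and SciPy; `scipy.stats.chisquare` is used only as a cross-check.

```python
import numpy as np
from scipy.stats import chi2, chisquare

pi0 = np.array([.13, .13, .24, .20, .16, .14])   # hypothesized proportions pi_1..pi_6 from H0
observed = np.array([30, 22, 52, 36, 30, 30])    # hypothetical color counts, n = 200
n = observed.sum()
expected = n * pi0                               # e_j = n * pi_j
stat = ((observed - expected) ** 2 / expected).sum()
m = 0                                            # no parameters estimated
df = len(observed) - m - 1                       # df = c - m - 1 = 5
p_value = chi2.sf(stat, df)
print(f"chi2 = {stat:.3f}, df = {df}, p = {p_value:.3f}")

# SciPy's chisquare should agree (its default ddof = 0 gives df = c - 1)
check = chisquare(observed, f_exp=expected)
```

With these hypothetical counts the statistic is about 2.23, well below the df = 5 critical value of 11.07 at α = .05, so H0 would not be rejected.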

LO15-4: Perform a goodness-of-fit (GOF) test for a
multinomial distribution (continued, 4).
Test Statistic and Degrees of Freedom for GOF
• Assuming n observations, the observations are grouped into c classes, and then the chi-square test statistic is found using:
$$\chi^2_{calc} = \sum_{j=1}^{c}\frac{(f_j - e_j)^2}{e_j}$$
where fj = the observed frequency of observations in class j
ej = the expected frequency in class j if H0 were true

LO15-4: Perform a goodness-of-fit (GOF) test for a
multinomial distribution (continued, 5).
Test Statistic and Degrees of Freedom for GOF,
continued
• If the proposed distribution gives a good fit to the sample, the test statistic will be near zero.
• The test statistic follows (approximately) the chi-square distribution with df = c – m – 1 degrees of freedom, where c is the number of classes (bins) used in the test and m is the number of parameters estimated.

Small Expected Frequencies


• Goodness-of-fit tests may lack power in small samples. As a guideline, a
chi-square goodness-of-fit test should be avoided if n < 25. Cochran’s Rule
that expected frequencies should be at least 5 (i.e., all ej ≥ 5) also provides
a guideline, although some experts would weaken the rule to require
only ej ≥ 2.
LO15-4: Perform a goodness-of-fit (GOF) test for a
multinomial distribution (continued, 6).
GOF Test for Other Distributions
• The hypotheses are:
H0: The population follows a _____ distribution
H1: The population does not follow a ______ distribution
• The blank may contain the name of any theoretical distribution (e.g.,
uniform, Poisson, normal).
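• In a GOF test, if we use sample data to estimate the distribution’s parameters, then the degrees of freedom are as follows:
Uniform: m = 0 parameters estimated, df = c – 1
Poisson: m = 1 parameter estimated (λ), df = c – 2
Normal: m = 2 parameters estimated (μ and σ), df = c – 3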
LO15-4: Perform a goodness-of-fit (GOF) test for a
multinomial distribution (continued, 7).
Data-Generating Situations
• Instead of “fishing” for a good-fitting model, visualize a priori the
characteristics of the underlying data-generating process.
• It is undoubtedly true that the most common GOF test is for the
normal distribution, simply because so many parametric tests
assume normality, and that assumption must be tested. Also, the
normal distribution may be used as a default benchmark for any
mound-shaped data that have centrality and tapering tails, as long
as you have reason to believe that a constant mean and variance
would be reasonable.
• However, you would not consider a Poisson distribution for
continuous data or certain integer variables because a Poisson
model only applies to integer data on arrivals or rare, independent
events.
• We remind you of this because software makes it possible to fit
inappropriate distributions all too easily.
LO15-4: Perform a goodness-of-fit (GOF) test for a
multinomial distribution (continued, 8).
Mixtures: A Problem
• Your sample may not resemble any known distribution. One common problem is mixtures, which occur when more than one data-generating process is superimposed on another.
• For example, adult heights for each sex might follow a normal distribution, but a combined sample of both sexes can be bimodal, and its mean and standard deviation may be unrepresentative of either sex.
• Obtaining a good fit is not sufficient justification for assuming a particular
model. Each probability distribution has its own logic about the nature of
the underlying process, so we also must examine the data-generating
situation and be convinced that the proposed model is both
logical and empirically apt.

LO15-4: Perform a goodness-of-fit (GOF) test for a
multinomial distribution (continued, 9).
Eyeball Tests
• A simple “eyeball” inspection of the histogram or dot plot may suffice
to rule out a hypothesized population.
• For example, if the sample is strongly bimodal or skewed, or if
outliers are present, we would anticipate a poor fit to a normal
distribution. The shape of the histogram can give you a rough idea
whether a normal distribution is a likely candidate for a good fit.
• You can be fairly sure that a formal test will agree with what your
common sense tells you, as long as the sample size is not too small.
• Yet a limitation of eyeball tests is that we may be unsure just how
much variation is expected for a given sample size. If anything, the
human eye is overly sensitive, causing us to commit α error
(rejecting a true null hypothesis) too often.
• People are sometimes unduly impressed by a small departure from
the hypothesized distribution, when actually it is within chance.
15.3 Uniform Goodness-of-Fit Test
LO15-5: Perform a goodness-of-fit (GOF) test for a
uniform distribution.

Uniform Distribution
• The uniform goodness-of-fit test is a special case of the multinomial
in which every value has the same chance of occurrence.
• The chi-square test for a uniform distribution compares all c groups
simultaneously.
• The hypotheses are:
H0: π1 = π2 = … = πc = 1/c
H1: Not all πj are equal

LO15-5: Perform a goodness-of-fit (GOF) test for a
uniform distribution (continued, 2).

Uniform GOF Test: Grouped Data


• The test can be performed on data that are already tabulated into
groups.
• Calculate the expected frequency ej = n/c for each cell.
• The degrees of freedom are df = c – 1 because no parameters are estimated to fit the uniform distribution.
• Obtain the critical value χ²α from Appendix E for the desired level of significance α.
• The p-value can be obtained from the Excel function =CHISQ.DIST.RT(χ²calc, df).
• Reject H0 if p-value ≤ α.
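For grouped data, the whole uniform GOF test fits in a few lines. A minimal sketch, assuming NumPy and SciPy; the group counts and the function name are hypothetical, and `chi2.sf` plays the role of =CHISQ.DIST.RT.

```python
import numpy as np
from scipy.stats import chi2


def uniform_gof(observed, alpha=0.05):
    """Chi-square GOF test of H0: all c groups are equally likely."""
    f = np.asarray(observed, dtype=float)
    c = len(f)
    e = np.full(c, f.sum() / c)       # e_j = n / c
    stat = ((f - e) ** 2 / e).sum()
    df = c - 1                        # no parameters estimated
    p_value = chi2.sf(stat, df)       # like =CHISQ.DIST.RT(stat, df)
    return stat, df, p_value, bool(p_value <= alpha)


# Hypothetical counts in c = 5 groups (n = 100)
stat, df, p_value, reject = uniform_gof([18, 25, 22, 15, 20])
print(f"chi2 = {stat:.2f}, df = {df}, p = {p_value:.4f}, reject H0: {reject}")
```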

LO15-5: Perform a goodness-of-fit (GOF) test for a
uniform distribution (continued, 3).

Uniform GOF Test: Raw Data


• First form c bins of equal width and create a frequency distribution.
• Calculate the observed frequency fj for each bin.
• Define ej = n/c.
• Perform the chi-square calculations.
• The degrees of freedom are df = c – 1 since no parameters are estimated for the uniform distribution.
• Obtain the critical value from Appendix E for a given significance level α and make the decision.
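For raw data, the first step is binning. The sketch below assumes the bins span a hypothesized range [low, high] and that every observation lies inside it (`np.histogram` silently drops values outside `range`). The data and function name are hypothetical; NumPy and SciPy are assumed.

```python
import numpy as np
from scipy.stats import chi2


def uniform_gof_raw(data, c, low, high):
    """Bin raw data into c equal-width bins on [low, high], then test for uniformity."""
    f, _ = np.histogram(data, bins=c, range=(low, high))   # observed f_j per bin
    e = len(data) / c                                      # e_j = n / c
    stat = ((f - e) ** 2 / e).sum()
    df = c - 1
    return f, stat, df, chi2.sf(stat, df)


# Hypothetical sample of n = 20 values, hypothesized uniform on [0, 100]
data = [2, 7, 11, 19, 24, 26, 31, 44, 48, 51,
        55, 60, 63, 70, 74, 77, 81, 90, 95, 99]
f, stat, df, p_value = uniform_gof_raw(data, c=4, low=0, high=100)
print(f"observed = {f.tolist()}, chi2 = {stat:.2f}, df = {df}, p = {p_value:.3f}")
```

With n = 20 this is below the n ≥ 25 guideline given earlier; the small sample only keeps the illustration readable.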

LO15-5: Perform a goodness-of-fit (GOF) test for a
uniform distribution (continued, 4).

Uniform GOF Test: Raw Data, continued


• Maximize the test’s power by defining bin width as

• As a result, the expected frequencies will be as large as possible.

LO15-5: Perform a goodness-of-fit (GOF) test for a
uniform distribution (continued, 5).

Uniform GOF Test: Raw Data (continued, 3)


• Calculate the mean and standard deviation of the uniform
distribution from:

• If the data are not skewed and the sample size is large (n > 30), then the sample mean is approximately normally distributed.
• So, test the hypothesized uniform mean μ using
$$z_{calc} = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
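A sketch of the z test for the uniform mean follows. The slide's formulas for μ and σ did not survive extraction, so the code supplies the standard results as our own assumption: μ = (a + b)/2 in both cases, σ = (b − a)/√12 for a continuous uniform on [a, b], and σ = √{[(b − a + 1)² − 1]/12} for a discrete uniform on the integers a, …, b. The digit sample is hypothetical.

```python
import math
from statistics import mean


def uniform_mean_z(data, a, b, discrete=False):
    """z statistic for H0: the population mean equals the uniform mean (a + b) / 2."""
    mu = (a + b) / 2
    if discrete:
        sigma = math.sqrt(((b - a + 1) ** 2 - 1) / 12)   # discrete uniform on a, a+1, ..., b
    else:
        sigma = (b - a) / math.sqrt(12)                  # continuous uniform on [a, b]
    n = len(data)
    return (mean(data) - mu) / (sigma / math.sqrt(n))


# Hypothetical sample of n = 32 digits, hypothesized uniform on 0, 1, ..., 9
digits = [3, 7, 0, 9, 4, 4, 1, 8, 6, 2, 5, 9, 0, 7, 3, 3,
          8, 1, 6, 5, 2, 9, 4, 7, 0, 6, 8, 1, 5, 2, 3, 9]
print(f"z = {uniform_mean_z(digits, 0, 9, discrete=True):.3f}")
```

Compare |z| with zα/2 (1.96 at α = .05) as in any two-tailed z test.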

15.4 Poisson Goodness-of-Fit Test
LO15-6: Explain the GOF test for a Poisson distribution.

Poisson Data-Generating Situations


• In a Poisson distribution model, X represents the number of events
per unit of time or space.
• X is a discrete nonnegative integer (X = 0, 1, 2, …)
• Event arrivals must be independent of each other.
• Sometimes called a model of rare events because X typically has a
small mean.

LO15-6: Explain the GOF test for a Poisson distribution
(continued, 2).

Poisson Goodness-of-Fit Test


• The mean λ is the only parameter. The initial steps for the test are:
• Step 1: Tally the observed frequency fj of each x-value.
• Step 2: If λ is unknown, estimate it from the sample.
• Step 3: Use the estimated λ to find the Poisson probability P(X = x) for each value of x.
• Step 4: Multiply P(X = x) by the sample size n to get the expected frequencies ej.
• Step 5: Perform the chi-square calculations.
• Step 6: Make the decision.
• You may need to combine classes until the expected frequencies become large enough for the test (at least until ej > 2).

LO15-6: Explain the GOF test for a Poisson distribution
(continued, 3).

Poisson GOF Test: Tabulated Data

• Calculate the sample mean as:
$$\hat{\lambda} = \bar{x} = \frac{1}{n}\sum_{j} x_j f_j$$
• Using this estimated mean, calculate the Poisson probabilities either by using the Poisson formula $P(x) = \frac{\lambda^x e^{-\lambda}}{x!}$ or Excel.
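The sample mean and the Poisson probabilities can be computed directly from tabulated data with the standard library. A minimal sketch; the frequency table is hypothetical.

```python
import math


def poisson_pmf(x: int, lam: float) -> float:
    """Poisson probability P(X = x) = lam**x * e**(-lam) / x!."""
    return lam ** x * math.exp(-lam) / math.factorial(x)


# Hypothetical tabulated data: values of X and how often each was observed
values = [0, 1, 2, 3, 4]
freqs = [10, 20, 12, 6, 2]
n = sum(freqs)
lam_hat = sum(x * f for x, f in zip(values, freqs)) / n   # sample mean = 70 / 50 = 1.4
probs = [poisson_pmf(x, lam_hat) for x in values]
expected = [n * p for p in probs]                          # e_j = n * P(X = x)
for x, p, e in zip(values, probs, expected):
    print(f"x = {x}: P = {p:.4f}, e = {e:.2f}")
```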

LO15-6: Explain the GOF test for a Poisson distribution
(continued, 4).

Poisson GOF Test: Tabulated Data, continued


• For c classes with m = 1 parameter estimated, the degrees of
freedom are df = c – m – 1 = c – 2.
• Obtain the critical value for a given α from Appendix E.
• Make the decision.
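Putting the Poisson steps together, including the df = c – 2 adjustment, gives a sketch like the one below. Assumptions (ours, not the text's): the x-values are consecutive integers 0, 1, …, k; the last class counts every observation with X ≥ k, so its probability is P(X ≥ k); and only upper-tail classes are combined, which is a simplification. NumPy and SciPy are assumed; the data are hypothetical.

```python
import numpy as np
from scipy.stats import chi2, poisson


def poisson_gof(values, freqs, min_expected=2.0):
    """Chi-square GOF test for a Poisson model with lambda estimated from the data."""
    values = np.asarray(values)
    f = np.asarray(freqs, dtype=float)
    n = f.sum()
    lam = (values * f).sum() / n               # Step 2: estimate lambda by the sample mean
    p = poisson.pmf(values, lam)               # Step 3: P(X = x) for each class
    p[-1] = poisson.sf(values[-1] - 1, lam)    # open-ended last class: P(X >= k)
    e = n * p                                  # Step 4: expected frequencies
    f, e = list(f), list(e)
    # Combine upper-tail classes until the last expected frequency exceeds min_expected
    while len(e) > 2 and e[-1] <= min_expected:
        last_f, last_e = f.pop(), e.pop()
        f[-1] += last_f
        e[-1] += last_e
    f, e = np.array(f), np.array(e)
    stat = ((f - e) ** 2 / e).sum()            # Step 5: chi-square statistic
    df = len(f) - 1 - 1                        # df = c - m - 1 with m = 1
    return lam, f, e, stat, df, chi2.sf(stat, df)


lam, f, e, stat, df, p_value = poisson_gof([0, 1, 2, 3, 4], [10, 20, 12, 6, 2])
print(f"lambda = {lam:.2f}, classes = {len(f)}, chi2 = {stat:.3f}, df = {df}, p = {p_value:.3f}")
```

Raising `min_expected` to 5 (Cochran's Rule) merges the last two classes here, leaving c = 4 and df = 2.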

15.5 Normal Chi-Square Goodness-of-Fit Test

LO15-7: Explain the chi-square GOF test for normality.

Normal Data Generating Situations


• Two parameters, the mean μ and the standard deviation σ, fully describe the normal distribution.
• Unless μ and σ are known a priori, they must be estimated from a sample by x̄ and s.
• Using these statistics, the chi-square goodness-of-fit test can be used.

LO15-7: Explain the chi-square GOF test for normality
(continued, 2).

Method 1: Standardizing the Data


• There are various ways to calculate the frequencies for a chi-square test. One way is to transform the sample observations x1, x2, . . . , xn into standardized values:
$$z_i = \frac{x_i - \bar{x}}{s}$$
• We could count the sample observations fj within intervals of the form
and compare them with the known frequencies ej based on the normal distribution, as illustrated in Figure 15.13 (on next slide).
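A sketch of Method 1 follows. The slide's interval formula did not survive extraction, so the z cut-points −1, 0, 1 below are our own illustrative choice, with open-ended first and last intervals. NumPy and SciPy are assumed, and the sample is hypothetical.

```python
import numpy as np
from scipy.stats import chi2, norm


def normal_gof_standardized(data, cuts=(-1.0, 0.0, 1.0)):
    """Method 1 sketch: count standardized values between fixed z cut-points."""
    x = np.asarray(data, dtype=float)
    z = (x - x.mean()) / x.std(ddof=1)                               # z_i = (x_i - xbar) / s
    f = np.bincount(np.digitize(z, cuts), minlength=len(cuts) + 1)   # observed f_j
    edges = np.concatenate(([-np.inf], cuts, [np.inf]))              # open-ended end intervals
    e = len(x) * np.diff(norm.cdf(edges))                            # e_j = n * normal area
    stat = ((f - e) ** 2 / e).sum()
    df = len(f) - 2 - 1                                              # m = 2 (mean, std. dev.)
    return f, e, stat, df, chi2.sf(stat, df)


# Hypothetical sample of n = 30 measurements
data = [52, 47, 61, 55, 49, 58, 44, 50, 53, 57, 46, 51, 60, 48, 54,
        56, 43, 59, 50, 52, 45, 62, 49, 53, 55, 51, 47, 58, 50, 54]
f, e, stat, df, p_value = normal_gof_standardized(data)
print("observed:", f, "expected:", np.round(e, 2))
print(f"chi2 = {stat:.3f}, df = {df}, p = {p_value:.3f}")
```

Four classes is the smallest number that leaves df > 0 after estimating two parameters.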

LO15-7: Explain the chi-square GOF test for normality
(continued, 3).

Method 1: Standardizing the Data, continued

Advantage is a standardized scale.
Disadvantage is that data are no longer in the original units.

LO15-7: Explain the chi-square GOF test for normality
(continued, 4).

Method 2: Equal Bin Widths


• To obtain equal-width bins, divide the exact data range into c groups of
equal width.
• Step 1: Count the sample observations in each bin to get observed
frequencies fj.
• Step 2: Convert the bin limits into standardized z-values by using the formula
$$z = \frac{x - \bar{x}}{s}$$

• Step 3: Find the normal area within each bin assuming a normal
distribution.
• Step 4: Find expected frequencies ej by multiplying each normal area by
the sample size n.
• Classes may need to be collapsed from the ends inward to enlarge
expected frequencies.
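Method 2 can be sketched as below. One assumption is ours: the first and last bins are treated as open-ended when computing normal areas, so that the expected frequencies sum to n. NumPy and SciPy are assumed; the sample is hypothetical.

```python
import numpy as np
from scipy.stats import chi2, norm


def normal_gof_equal_width(data, c):
    """Method 2 sketch: c equal-width bins over the data range, normal expected counts."""
    x = np.asarray(data, dtype=float)
    xbar, s = x.mean(), x.std(ddof=1)
    limits = np.linspace(x.min(), x.max(), c + 1)   # c equal-width bins over the exact data range
    f = np.histogram(x, bins=limits)[0]             # Step 1: observed f_j
    z = (limits - xbar) / s                         # Step 2: standardized bin limits
    z[0], z[-1] = -np.inf, np.inf                   # assumption: end bins open-ended
    e = len(x) * np.diff(norm.cdf(z))               # Steps 3-4: e_j = n * normal area
    stat = ((f - e) ** 2 / e).sum()
    df = c - 3                                      # m = 2 estimated parameters
    return f, e, stat, df, chi2.sf(stat, df)


# Hypothetical sample of n = 30 measurements
data = [52, 47, 61, 55, 49, 58, 44, 50, 53, 57, 46, 51, 60, 48, 54,
        56, 43, 59, 50, 52, 45, 62, 49, 53, 55, 51, 47, 58, 50, 54]
f, e, stat, df, p_value = normal_gof_equal_width(data, c=5)
print("observed:", f, "expected:", np.round(e, 2))
```

If the end bins have small expected frequencies, they would be merged inward before computing the statistic, as the slide notes.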
LO15-7: Explain the chi-square GOF test for normality
(continued, 5).

Method 3: Equal Expected Frequencies


• Define histogram bins in such a way that an equal number of observations
would be expected within each bin under the null hypothesis.
• Define bin limits so that ej = n/c
• A normal area of 1/c in each of the c bins is desired.
• The first and last classes must be open-ended for a normal distribution, so
to define c bins, we need c – 1 cut-points.
• The upper limit of bin j can be found directly by using the Excel function =NORM.INV(j/c, x̄, s).
• Alternatively, find zj for bin j using =NORM.S.INV(j/c) and then calculate the upper limit for bin j as x̄ + zjs.
• Once the bins are defined, count the observations fj within each bin and
compare them with the expected frequencies ej = n/c.
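Method 3 translates directly into code: `norm.ppf(j / c)` gives the same zj as =NORM.S.INV(j/c), and x̄ + zjs gives the cut-points. A minimal sketch, assuming NumPy and SciPy; the sample is hypothetical, and in this sketch a value exactly equal to a cut-point is counted in the higher bin.

```python
import numpy as np
from scipy.stats import chi2, norm


def normal_gof_equal_expected(data, c):
    """Method 3 sketch: c bins with equal expected frequency n/c under a fitted normal."""
    x = np.asarray(data, dtype=float)
    xbar, s = x.mean(), x.std(ddof=1)
    z = norm.ppf(np.arange(1, c) / c)                      # z_j for j = 1..c-1 (=NORM.S.INV(j/c))
    cuts = xbar + z * s                                    # c - 1 cut-points; end bins open-ended
    f = np.bincount(np.digitize(x, cuts), minlength=c)     # observed f_j
    e = np.full(c, len(x) / c)                             # e_j = n / c
    stat = ((f - e) ** 2 / e).sum()
    df = c - 2 - 1                                         # m = 2 estimated parameters
    return cuts, f, stat, df, chi2.sf(stat, df)


# Hypothetical sample of n = 30 measurements
data = [52, 47, 61, 55, 49, 58, 44, 50, 53, 57, 46, 51, 60, 48, 54,
        56, 43, 59, 50, 52, 45, 62, 49, 53, 55, 51, 47, 58, 50, 54]
cuts, f, stat, df, p_value = normal_gof_equal_expected(data, c=4)
print("cut-points:", np.round(cuts, 2), "observed:", f, f"chi2 = {stat:.3f}, df = {df}")
```

Because every ej equals n/c, choosing c so that n/c ≥ 5 satisfies Cochran's Rule automatically.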

LO15-7: Explain the chi-square GOF test for normality
(continued, 6).

Method 3: Equal Expected Frequencies, continued


• Table 15.17 shows some standard normal cutpoints for equal area
bins.

Table 15.16

LO15-7: Explain the chi-square GOF test for normality
(continued, 7).
Histograms
• The fitted normal histogram gives visual clues as to the likely
outcome of the GOF test.
• Histograms reveal any outliers or other non-normality issues.
• Formal tests are still needed, because a histogram’s appearance varies with the sample and with the choice of bins.

15.6 ECDF Tests (Optional)
LO15-8: Interpret ECDF tests and know their advantages
compared to chi-square GOF tests.

• There are many alternatives to the chi-square test for goodness-of-fit. These alternatives are based on the Empirical Cumulative Distribution Function (ECDF).
• The Anderson-Darling (A-D) test is the most widely used test for non-normality because of its power.
• The A-D test is based on a probability plot. When the data fit the
hypothesized distribution closely, the probability plot will be close to a
straight line.
• The A-D test is more powerful than a chi-square test if raw data are
available because it treats the observations individually. Also, the probability
plot has the attraction of revealing discrepancies between the sample and
the hypothesized distribution, and it is usually easy to spot outliers.
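The A-D statistic treats each sorted observation individually. The sketch below computes A² for normality (mean and standard deviation estimated from the sample) and compares it with SciPy's `scipy.stats.anderson`, which also reports critical values at several significance levels. NumPy and SciPy are assumed; the sample is hypothetical.

```python
import numpy as np
from scipy.stats import anderson, norm


def anderson_darling_normal(data):
    """A-D statistic A^2 for normality, with mean and std. dev. estimated from the data."""
    x = np.sort(np.asarray(data, dtype=float))
    n = len(x)
    w = (x - x.mean()) / x.std(ddof=1)   # standardized order statistics
    i = np.arange(1, n + 1)
    # A^2 = -n - (1/n) * sum of (2i - 1) * [ln F(w_i) + ln(1 - F(w_(n+1-i)))]
    return float(-n - np.sum((2 * i - 1) / n * (norm.logcdf(w) + norm.logsf(w[::-1]))))


# Hypothetical sample of n = 30 measurements
data = [52, 47, 61, 55, 49, 58, 44, 50, 53, 57, 46, 51, 60, 48, 54,
        56, 43, 59, 50, 52, 45, 62, 49, 53, 55, 51, 47, 58, 50, 54]
a2 = anderson_darling_normal(data)
result = anderson(data, dist="norm")
print(f"A^2 = {a2:.4f} (SciPy: {result.statistic:.4f})")
for cv, level in zip(result.critical_values, result.significance_level):
    print(f"{level}% level: critical value {cv}, reject normality: {result.statistic > cv}")
```

Large A² values (beyond the critical value) indicate departure from normality.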

LO15-8: Interpret ECDF tests and know their advantages
compared to chi-square GOF tests (continued, 2).

• Another such test is the Kolmogorov-Smirnov (K-S) test, which uses the
largest absolute difference between the actual and expected cumulative
relative frequency of the n data values.
• The K-S test assumes that no parameters are estimated. If parameters are
estimated, use a Lilliefors test whose test statistic is the same but with a
different table of critical values. Both tests are done by computer.
• The K-S test can be illustrated in the same probability plot as the A-D test
as shown in Figure 15.15 (see the next slide).
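The K-S statistic D is the largest vertical gap between the ECDF and the hypothesized CDF. The sketch checks gaps just after and just before each sorted observation and compares the result with `scipy.stats.kstest`. It assumes a fully specified hypothesized distribution, here a hypothetical N(50, 5), because the plain K-S table requires that no parameters be estimated. NumPy and SciPy are assumed; the sample is hypothetical.

```python
import numpy as np
from scipy.stats import kstest, norm


def ks_statistic(data, cdf):
    """Largest absolute gap between the ECDF of the data and a fully specified CDF."""
    x = np.sort(np.asarray(data, dtype=float))
    n = len(x)
    F = cdf(x)
    d_plus = np.max(np.arange(1, n + 1) / n - F)   # ECDF just after each observation
    d_minus = np.max(F - np.arange(0, n) / n)      # ECDF just before each observation
    return float(max(d_plus, d_minus))


# Hypothetical sample; H0: N(mu = 50, sigma = 5), with both parameters specified in advance
data = [52, 47, 61, 55, 49, 58, 44, 50, 53, 57, 46, 51, 60, 48, 54,
        56, 43, 59, 50, 52, 45, 62, 49, 53, 55, 51, 47, 58, 50, 54]
d = ks_statistic(data, lambda v: norm.cdf(v, loc=50, scale=5))
res = kstest(data, "norm", args=(50, 5))
print(f"D = {d:.4f} (SciPy: {res.statistic:.4f}, p = {res.pvalue:.3f})")
```

If μ and σ were instead estimated from the sample, the Lilliefors critical values would apply; statsmodels provides a `lilliefors` function in `statsmodels.stats.diagnostic` for that case.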

LO15-8: Interpret ECDF tests and know their advantages
compared to chi-square GOF tests (continued, 3).
