Goodness of Fit Test: A Multinomial Population Goodness of Fit Test: Poisson and Normal Distributions Test of Independence


Chapter 12

Tests of Goodness of Fit and Independence


• Goodness of Fit Test: A Multinomial Population
• Test of Independence
• Goodness of Fit Test: Poisson and Normal Distributions

© 2008 Thomson South-Western. All Rights Reserved Slide


1
Chi-Square Distribution

• Used when the data are categorical.

Applications:
1. Testing the equality of population proportions for
three or more populations
2. Testing the independence of two categorical
variables
3. Testing whether a population follows a specific
historical or theoretical probability distribution

Hypothesis (Goodness of Fit) Test
for Proportions of a Multinomial Population
1. Set up the null and alternative hypotheses.
2. Select a random sample and record the observed
frequency, fi , for each of the k categories.
3. Assuming H0 is true, compute the expected
frequency, ei , in each category by multiplying the
category probability by the sample size.

• p1 = population proportion for population 1
• p2 = population proportion for population 2
• pk = population proportion for population k

• The hypotheses for the equality of population
proportions for k ≥ 3 populations are as follows:

• H0: p1 = p2 = . . . = pk
• Ha: Not all population proportions are equal

• p1 = proportion likely to repurchase an Impala for the population of
Chevrolet Impala owners
• p2 = proportion likely to repurchase a Fusion for the population of
Ford Fusion owners
• p3 = proportion likely to repurchase an Accord for the population of
Honda Accord owners

• How do we determine whether H0: p1 = p2 = p3 should be rejected?

• Use a chi-square test to determine whether there is a significant
difference between the observed and expected frequencies.
• If H0 is true, the overall sample proportion, 312/500 = .624,
is the best estimate of the proportion likely to repurchase for
each of the automobile owner populations.

• Expected frequencies are computed from this estimate and
compared with the observed frequencies.

• All expected frequencies are at least 5

Hypothesis (Goodness of Fit) Test
for Proportions of a Multinomial Population
4. Compute the value of the test statistic.

\[ \chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i} \]

where:
fi = observed frequency for category i
ei = expected frequency for category i
k = number of categories
Note: The test statistic has a chi-square distribution
with k – 1 df provided that the expected frequencies
are 5 or more for all categories.

Hypothesis (Goodness of Fit) Test
for Proportions of a Multinomial Population
5. Rejection rule:
p-value approach: Reject H0 if p-value < α

Critical value approach: Reject H0 if $\chi^2 \geq \chi^2_\alpha$

where α is the significance level and
there are k - 1 degrees of freedom

Statistical Inferences:
With p-value ≤ .05, we reject H0 and conclude that the three
population proportions are not all equal; there is a
difference in brand loyalties among the Chevrolet Impala, Ford
Fusion, and Honda Accord owners. The Minitab or Excel
procedures provided in Appendix F can be used to show that
χ² = 7.89 with 2 degrees of freedom yields a p-value = .0193.

Multinomial Distribution Goodness of Fit Test

• Example: Finger Lakes Homes (A)


Finger Lakes Homes manufactures
four models of prefabricated homes,
a two-story colonial, a log cabin, a
split-level, and an A-frame. To help
in production planning, management
would like to determine if previous
customer purchases indicate that there
is a preference in the style selected.

Multinomial Distribution Goodness of Fit Test

• Example: Finger Lakes Homes (A)

The number of homes sold of each
model for 100 sales over the past two
years is shown below.

Model    Colonial   Log   Split-Level   A-Frame
# Sold   30         20    35            15

Multinomial Distribution Goodness of Fit Test

• Hypotheses

H0: pC = pL = pS = pA = .25
Ha: The population proportions are not
pC = .25, pL = .25, pS = .25, and pA = .25
where:
pC = population proportion that purchase a colonial
pL = population proportion that purchase a log cabin
pS = population proportion that purchase a split-level
pA = population proportion that purchase an A-frame

Multinomial Distribution Goodness of Fit Test

• Rejection Rule

Reject H0 if p-value < .05 or χ² > 7.815.

With α = .05 and k - 1 = 4 - 1 = 3 degrees of freedom

[Figure: chi-square distribution with the "Do Not Reject H0" region below 7.815 and the "Reject H0" region above it]

Multinomial Distribution Goodness of Fit Test

• Expected Frequencies

e1 = .25(100) = 25    e2 = .25(100) = 25
e3 = .25(100) = 25    e4 = .25(100) = 25

• Test Statistic

\[ \chi^2 = \frac{(30-25)^2}{25} + \frac{(20-25)^2}{25} + \frac{(35-25)^2}{25} + \frac{(15-25)^2}{25} = 1 + 1 + 4 + 4 = 10 \]
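
The multinomial goodness of fit steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `chi_square_gof` and its return layout are hypothetical choices, not part of the slides, and the data are the Finger Lakes Homes (A) sales counts.

```python
def chi_square_gof(observed, probabilities):
    """Return (statistic, degrees of freedom, expected frequencies).

    observed      -- observed frequency f_i for each of the k categories
    probabilities -- category probabilities claimed by H0
    """
    n = sum(observed)
    # Step 3: expected frequency = category probability * sample size
    expected = [p * n for p in probabilities]
    # The chi-square approximation assumes every expected frequency is >= 5
    if min(expected) < 5:
        raise ValueError("every expected frequency should be at least 5")
    # Step 4: sum of (f_i - e_i)^2 / e_i over the k categories
    statistic = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    return statistic, len(observed) - 1, expected


# Colonial, log cabin, split-level, A-frame
observed = [30, 20, 35, 15]
statistic, df, expected = chi_square_gof(observed, [0.25] * 4)
print(expected)       # [25.0, 25.0, 25.0, 25.0]
print(statistic, df)  # 10.0 3
```

The statistic of 10 with 3 degrees of freedom is then compared with the critical value 7.815 given in the rejection rule.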

Multinomial Distribution Goodness of Fit Test

• Conclusion Using the p-Value Approach

Area in Upper Tail    .10     .05     .025    .01      .005
χ² Value (df = 3)     6.251   7.815   9.348   11.345   12.838

Because χ² = 10 is between 9.348 and 11.345, the
area in the upper tail of the distribution is between
.025 and .01.
The p-value < α. We can reject the null hypothesis.

Note: A precise p-value can be found using
Minitab or Excel.
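
As an alternative to Minitab or Excel, the upper tail area can be approximated with the Python standard library. The sketch below is an assumption of this note, not the slides' method: the helper name `chi_square_upper_tail` is hypothetical, and it relies on the recurrence Q(a + 1, y) = Q(a, y) + y^a e^(-y) / Γ(a + 1) for the regularized upper incomplete gamma function, which holds for integer degrees of freedom.

```python
import math


def chi_square_upper_tail(x, df):
    """Upper tail area P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # df = 2: the upper tail area is exp(-x/2)
        a, area = 1.0, math.exp(-y)
    else:
        # df = 1: the upper tail area is erfc(sqrt(x/2))
        a, area = 0.5, math.erfc(math.sqrt(y))
    # Step the shape parameter up by 1 until it reaches df/2
    while a < df / 2.0:
        area += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return area


p_value = chi_square_upper_tail(10.0, 3)
print(round(p_value, 4))  # about 0.0186, inside the table's .01 to .025 range
```

Evaluating the helper at the tabled value 7.815 with 3 degrees of freedom returns approximately .05, which is a quick check against the chi-square table.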

Multinomial Distribution Goodness of Fit Test

• Conclusion Using the Critical Value Approach

χ² = 10 > 7.815

We reject, at the .05 level of significance,
the assumption that there is no home style
preference.

Test of Independence: Contingency Tables

1. Set up the null and alternative hypotheses.


2. Select a random sample and record the observed
frequency, fij , for each cell of the contingency table.
3. Compute the expected frequency, eij , for each cell.

\[ e_{ij} = \frac{(\text{Row } i \text{ Total})(\text{Column } j \text{ Total})}{\text{Sample Size}} \]

Test of Independence: Contingency Tables

4. Compute the test statistic.

\[ \chi^2 = \sum_i \sum_j \frac{(f_{ij} - e_{ij})^2}{e_{ij}} \]

5. Determine the rejection rule.

Reject H0 if p-value < α or $\chi^2 \geq \chi^2_\alpha$,

where α is the significance level and,
with n rows and m columns, there are
(n - 1)(m - 1) degrees of freedom.

Contingency Table (Independence) Test

• Example: Finger Lakes Homes (B)


Each home sold by Finger Lakes
Homes can be classified according to
price and to style. Finger Lakes’
manager would like to determine if
the price of the home and the style of
the home are independent variables.

Contingency Table (Independence) Test

• Example: Finger Lakes Homes (B)

The number of homes sold for
each model and price for the past two
years is shown below. For convenience,
the price of the home is listed as either
$99,000 or less or more than $99,000.

Price        Colonial   Log   Split-Level   A-Frame
≤ $99,000    18         6     19            12
> $99,000    12         14    16            3

Contingency Table (Independence) Test

• Hypotheses

H0: Price of the home is independent of the
style of the home that is purchased
Ha: Price of the home is not independent of the
style of the home that is purchased

Contingency Table (Independence) Test

• Expected Frequencies

Observed frequencies with the row and column totals used to
compute the expected frequencies:

Price     Colonial   Log   Split-Level   A-Frame   Total
≤ $99K    18         6     19            12        55
> $99K    12         14    16            3         45
Total     30         20    35            15        100
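
The expected frequency of each cell follows from the row and column totals in this table. A minimal Python sketch, assuming the observed counts are stored as a list of rows (the variable names are illustrative only):

```python
# Observed frequencies: rows are price classes, columns are home styles
observed = [
    [18, 6, 19, 12],   # $99,000 or less
    [12, 14, 16, 3],   # more than $99,000
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# e_ij = (row i total)(column j total) / sample size
expected = [[r * c / n for c in col_totals] for r in row_totals]

for row in expected:
    print(row)
# [16.5, 11.0, 19.25, 8.25]
# [13.5, 9.0, 15.75, 6.75]
```

Every expected frequency is at least 5, so the chi-square approximation applies to this table.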

Contingency Table (Independence) Test

• Rejection Rule
With α = .05 and (2 - 1)(4 - 1) = 3 d.f., $\chi^2_{.05} = 7.815$

Reject H0 if p-value < .05 or χ² > 7.815

• Test Statistic

\[ \chi^2 = \frac{(18-16.5)^2}{16.5} + \frac{(6-11)^2}{11} + \cdots + \frac{(3-6.75)^2}{6.75} = .1364 + 2.2727 + \cdots + 2.0833 = 9.149 \]
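
The full sum over all eight cells can be checked with a short Python sketch. It assumes the same observed table as the example and takes the critical value 7.815 from the slide rather than computing it; the variable names are illustrative.

```python
observed = [
    [18, 6, 19, 12],   # $99,000 or less
    [12, 14, 16, 3],   # more than $99,000
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Sum (f_ij - e_ij)^2 / e_ij over every cell of the contingency table
statistic = 0.0
for i, row in enumerate(observed):
    for j, f in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        statistic += (f - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 7.815  # chi-square value for alpha = .05, df = 3 (from the slide)

print(round(statistic, 3), df, statistic > critical_value)  # 9.149 3 True
```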

Contingency Table (Independence) Test

• Conclusion Using the p-Value Approach

Area in Upper Tail    .10     .05     .025    .01      .005
χ² Value (df = 3)     6.251   7.815   9.348   11.345   12.838

Because χ² = 9.149 is between 7.815 and 9.348, the
area in the upper tail of the distribution is between
.05 and .025.
The p-value < α. We can reject the null hypothesis.

Note: A precise p-value can be found using
Minitab or Excel.

Contingency Table (Independence) Test

• Conclusion Using the Critical Value Approach

χ² = 9.149 > 7.815

We reject, at the .05 level of significance,
the assumption that the price of the home is
independent of the style of home that is
purchased.

Test of independence

• With χ² = 6.45 and 2 degrees of freedom, the upper tail area,
and therefore the p-value, is between .05 and .025.
With p-value ≤ .05, we reject H0 and conclude that beer
preference is not independent of the gender of the beer drinker.

Critical value check: χ² = 6.45 > 5.991465 (α = .05, 2 degrees of freedom)

Test of independence

Goodness of fit

The goodness of fit test now focuses on the differences
between the observed frequencies and the expected
frequencies. Whether the differences between the observed
and expected frequencies are “large” or “small” is a question
answered with the aid of the following chi-square test statistic.
• We will reject the null hypothesis if the differences
between the observed and expected frequencies are
large.

• Thus the test of goodness of fit will always be an
upper tail test.

p-value = CHISQ.DIST.RT(7.34, 2) = 0.025476
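
The Excel result can be cross-checked without a statistics package, because with exactly 2 degrees of freedom the chi-square upper tail area has the closed form exp(-x/2). A small Python sketch (the function name `upper_tail_df2` is a hypothetical helper, valid only for df = 2):

```python
import math


def upper_tail_df2(x):
    """Upper tail area of a chi-square distribution with exactly 2 degrees of freedom."""
    return math.exp(-x / 2.0)


p_value = upper_tail_df2(7.34)
print(p_value)  # agrees with the Excel value 0.025476 to six decimal places
```

Because the p-value is below .05, the upper tail test rejects H0 at the .05 level of significance.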
Goodness of Fit Test: Normal Distribution

1. Set up the null and alternative hypotheses.
2. Select a random sample and
a. Compute the mean and standard deviation.
b. Define intervals of values so that the expected
frequency is at least 5 for each interval.
c. For each interval, record the observed frequency.
3. Compute the expected frequency, ei , for each interval.

Goodness of Fit Test: Normal Distribution

4. Compute the value of the test statistic.

\[ \chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i} \]

5. Reject H0 if $\chi^2 \geq \chi^2_\alpha$ (where α is the significance level
and there are k - 3 degrees of freedom).

Normal Distribution Goodness of Fit Test

• Example: IQ Computers
IQ Computers (one better than HP?)
manufactures and sells a general
purpose microcomputer. As part of
a study to evaluate sales personnel, management
wants to determine, at a .05 significance level, if the
annual sales volume (number of units sold by a
salesperson) follows a normal probability distribution.

Normal Distribution Goodness of Fit Test

• Example: IQ Computers
A simple random sample of 30 of
the salespeople was taken and their
numbers of units sold are below.

33 43 44 45 52 52 56 58 63 64
64 65 66 68 70 72 73 73 74 75
83 84 85 86 91 92 94 98 102 105

(mean = 71, standard deviation = 18.54)

Cumulative probability   z          mean + z*sigma
0.1667                   -0.96729   53.06648
0.3334                   -0.43054   63.01772
0.5001                   0.000251   71.00465
0.6668                   0.431094   78.99248
0.8335                   0.968089   88.94837

Normal Distribution Goodness of Fit Test

• Hypotheses
H0: The population of number of units sold
has a normal distribution with mean 71
and standard deviation 18.54.
Ha: The population of number of units sold
does not have a normal distribution with
mean 71 and standard deviation 18.54.

Normal Distribution Goodness of Fit Test

• Interval Definition
To satisfy the requirement of an expected
frequency of at least 5 in each interval we will
divide the normal distribution into 30/5 = 6
equal probability intervals.
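
The boundaries of the six equal probability intervals can be sketched with Python's `statistics.NormalDist`. The variable names are illustrative, and the sketch uses unrounded z values, so its boundaries differ from the slide figures in the second decimal place because the slides round z to .97 and .43.

```python
from statistics import NormalDist

mean, sd, n = 71, 18.54, 30

# An expected frequency of 5 per interval gives 30 / 5 = 6 intervals
k = n // 5

dist = NormalDist(mu=mean, sigma=sd)

# The k - 1 interior boundaries sit at cumulative probabilities 1/k, 2/k, ..., (k-1)/k
boundaries = [dist.inv_cdf(i / k) for i in range(1, k)]

print([round(b, 2) for b in boundaries])
```

Each interval then carries probability 1/6, so its expected frequency is 30(1/6) = 5.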

Normal Distribution Goodness of Fit Test

• Interval Definition

Area of each interval = 1.00/6 = .1667

Interval boundaries:
71 - .97(18.54) = 53.02
71 - .43(18.54) = 63.03
71
71 + .43(18.54) = 78.97
71 + .97(18.54) = 88.98

Normal Distribution Goodness of Fit Test

• Observed and Expected Frequencies

Interval           fi    ei    fi - ei
Less than 53.02    6     5     1
53.02 to 63.03     3     5     -2
63.03 to 71.00     6     5     1
71.00 to 78.97     5     5     0
78.97 to 88.98     4     5     -1
More than 88.98    6     5     1
Total              30    30

Normal Distribution Goodness of Fit Test

• Rejection Rule
With α = .05 and k - p - 1 = 6 - 2 - 1 = 3 d.f.
(where k = number of categories and p = number
of population parameters estimated), $\chi^2_{.05} = 7.815$

Reject H0 if p-value < .05 or χ² > 7.815.

• Test Statistic

\[ \chi^2 = \frac{(1)^2}{5} + \frac{(-2)^2}{5} + \frac{(1)^2}{5} + \frac{(0)^2}{5} + \frac{(-1)^2}{5} + \frac{(1)^2}{5} = 1.600 \]
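
The observed frequencies and the test statistic can be reproduced from the raw sample. The Python sketch below assumes the slide's rounded interval boundaries and uses `bisect_right` to find each value's interval; the variable names are illustrative.

```python
from bisect import bisect_right

sales = [33, 43, 44, 45, 52, 52, 56, 58, 63, 64,
         64, 65, 66, 68, 70, 72, 73, 73, 74, 75,
         83, 84, 85, 86, 91, 92, 94, 98, 102, 105]

# Interval boundaries as rounded on the slides
boundaries = [53.02, 63.03, 71.00, 78.97, 88.98]

# bisect_right counts the boundaries at or below a value, which is its interval index
observed = [0] * (len(boundaries) + 1)
for units in sales:
    observed[bisect_right(boundaries, units)] += 1

# Six equal probability intervals, so each expected frequency is 30 / 6 = 5
expected = len(sales) / len(observed)
statistic = sum((f - expected) ** 2 / expected for f in observed)

print(observed)             # [6, 3, 6, 5, 4, 6]
print(round(statistic, 3))  # 1.6
```

Since 1.6 is well below the critical value 7.815, the statistic falls in the "do not reject" region.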

Normal Distribution Goodness of Fit Test

• Conclusion Using the p-Value Approach

Area in Upper Tail    .90    .10     .05     .025    .01
χ² Value (df = 3)     .584   6.251   7.815   9.348   11.345

Because χ² = 1.600 is between .584 and 6.251 in the
Chi-Square Distribution Table, the area in the upper tail
of the distribution is between .90 and .10.
The p-value > α. We cannot reject the null hypothesis.
There is little evidence to support rejecting the
assumption that the population is normally distributed with
μ = 71 and σ = 18.54.
A precise p-value can be found
using Minitab or Excel.
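
For a rough check of the precise p-value without Minitab or Excel, the upper tail area with exactly 3 degrees of freedom can be written in closed form as erfc(sqrt(x/2)) + sqrt(2x/π) exp(-x/2). The Python sketch below uses that identity; the function name `upper_tail_df3` is a hypothetical helper, valid only for df = 3.

```python
import math


def upper_tail_df3(x):
    """Upper tail area of a chi-square distribution with exactly 3 degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))


p_value = upper_tail_df3(1.6)
print(round(p_value, 2))  # about 0.66, inside the table's .10 to .90 range
```

A p-value of roughly .66 is far above α = .05, consistent with the conclusion that H0 cannot be rejected.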