STAT 1013 Statistics: Weeks 13 and 14


STAT 1013

STATISTICS
Weeks 13 and 14
Chi – Square and ANOVA
LEARNING OUTCOMES

At the end of the lesson, the students should be able to:


1. Explain how the test statistic is computed for a chi-square test;
2. Calculate the degrees of freedom for the chi-square goodness-of-fit
test and locate critical values in the chi-square table;
3. Compute the chi-square goodness-of-fit test and interpret the
results;
4. Identify the assumption and the restriction of expected frequency
size for the chi-square test;
LEARNING OUTCOMES

5. Calculate the degrees of freedom for the chi-square test for


independence and locate critical values in the chi-square table;
6. Compute the chi-square test for independence and interpret the
results;
7. Set up and perform one-way ANOVA;
8. Identify the information in the ANOVA table; and
9. Interpret the results from ANOVA output.
CHI - SQUARE

TEST FOR GOODNESS OF FIT


The chi-square statistic can be used to determine whether a frequency
distribution fits a specific pattern. The chi-square goodness-of-fit test is
used for this purpose.
CHI - SQUARE

For example, a researcher wants to determine whether consumers


have any preference among five flavors of ice cream. A sample of
100 people provided the following data.
Flavor Frequency
Chocolate 32
Strawberry 28
Mango 16
Cookies ’n Cream 14
Vanilla 10
CHI - SQUARE
If there were no preference, one would expect each flavor to be
selected with equal frequency. In this case, approximately 100/5 = 20 people
would select each flavor. Since the frequencies in the table were obtained from a
sample, they are called observed frequencies. The frequencies
obtained by calculation are called expected frequencies.
Flavor Observed Frequency Expected Frequency
Chocolate 32 20
Strawberry 28 20
Mango 16 20
Cookies ’n Cream 14 20
Vanilla 10 20
CHI - SQUARE

Flavor Observed Frequency Expected Frequency


Chocolate 32 20
Strawberry 28 20
Mango 16 20
Cookies ’n Cream 14 20
Vanilla 10 20

The observed frequencies will always differ from the expected frequencies
due to sampling error. But are these differences significant? The chi-square
goodness of fit test will enable one to determine the answer.
Formula for the Chi-Square Test

Formula:
χ² = Σ (O - E)² / E

with
df = degrees of freedom = number of categories - 1
O = observed frequencies
E = expected frequencies
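
For readers who want to check this formula numerically, here is a minimal Python sketch that applies it to the ice cream data from the earlier slide; the variable names are illustrative, and no statistics library is assumed.

```python
# Chi-square goodness-of-fit statistic for the ice cream preference data.
observed = {"Chocolate": 32, "Strawberry": 28, "Mango": 16,
            "Cookies 'n Cream": 14, "Vanilla": 10}

n = sum(observed.values())   # 100 people in the sample
k = len(observed)            # 5 flavor categories
expected = n / k             # 20 per flavor if there is no preference

# chi-square = sum of (O - E)^2 / E over all categories
chi_square = sum((o - expected) ** 2 / expected for o in observed.values())
df = k - 1                   # degrees of freedom = number of categories - 1

print(f"chi-square = {chi_square:.2f}, df = {df}")
```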
Chi-Square Test

Assumptions for the Chi-Square Test:
 The data must be obtained from a random sample, and the observations must be independent.
 The expected frequency for each category must be at least 5.

Before computing the test statistic, the hypotheses must be stated first. For the ice cream example:
Ho: Consumers show no preference.
Ha: Consumers show a preference.
Chi-Square Test

When there is close agreement between the values of O and E, the
chi-square statistic is small and the null hypothesis is not rejected. When there
are large differences between O and E, the chi-square statistic is large and the
null hypothesis is rejected.
EXAMPLE 1 : Chi - Square
A clothing manufacturer wants to determine whether customers prefer any
specific color over other colors in shirts. He selects a random sample of 102
shirts sold and notes the color. The table below shows the results. At α = 0.10,
is there a color preference?
Color Number Sold
White 43
Blue 22
Black 16
Red 10
Yellow 6
Green 5
EXAMPLE 1 : Chi - Square

STEP 1 : State the hypotheses.


Ho: Consumers have no color preference.
Ha: Consumers show a preference.
STEP 2: Level of Significance
α = 0.10
STEP 3 : Determine the critical value
Using α = 0.10 and df = 6 – 1 = 5
Hence, the critical value is 9.236.
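
If a statistics package is available, the critical value in Step 3 can be checked against the table; a short sketch, assuming SciPy is installed:

```python
from scipy.stats import chi2

alpha = 0.10
df = 6 - 1
# The test is right-tailed, so the critical value leaves an area of alpha in the right tail.
critical_value = chi2.ppf(1 - alpha, df)
print(round(critical_value, 3))   # about 9.236, matching the chi-square table
```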
EXAMPLE 1 : Chi - Square

STEP 4 : Compute the chi-square statistic


If there were no preference, one would expect 102 / 6 = 17 shirts of each
color to be sold.

Color Observed Frequency Expected Frequency


White 43 17
Blue 22 17
Black 16 17
Red 10 17
Yellow 6 17
Green 5 17
EXAMPLE 1 : Chi - Square
STEP 4 : Compute the chi-square statistic
Color    Observed Frequency    Expected Frequency
White    43    17
Blue     22    17
Black    16    17
Red      10    17
Yellow   6     17
Green    5     17

χ² = (43 - 17)²/17 + (22 - 17)²/17 + (16 - 17)²/17 + (10 - 17)²/17 + (6 - 17)²/17 + (5 - 17)²/17
   = 39.76 + 1.47 + 0.06 + 2.88 + 7.12 + 8.47
   = 59.76
EXAMPLE 1 : Chi - Square

STEP 5 : Decision
The decision is to reject the null hypothesis, since 59.76 > 9.236.

STEP 6: Conclusion
Therefore, there is enough evidence to reject the claim that the customers
show no preference for the colors of shirts.

When there is perfect agreement between the observed and expected
values, χ² = 0. Also, χ² can never be negative. Finally, the chi-square
goodness-of-fit test is always right-tailed.
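
For reference, the whole Example 1 test can be reproduced in a few lines; a sketch assuming SciPy is installed (scipy.stats.chisquare uses equal expected frequencies by default):

```python
from scipy.stats import chisquare, chi2

observed = [43, 22, 16, 10, 6, 5]      # white, blue, black, red, yellow, green
stat, p_value = chisquare(observed)    # expected frequencies default to 102/6 = 17 each

alpha = 0.10
critical_value = chi2.ppf(1 - alpha, df=len(observed) - 1)

print(f"chi-square = {stat:.2f}, p-value = {p_value:.4f}")
print("Reject Ho" if stat > critical_value else "Do not reject Ho")
```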
EXAMPLE 2 : Chi - Square
The dean of student affairs of a college wishes to test the claim that the distribution of
students is as follows: 40% nursing (N); 25% business (B); 15% computer science (CS);
10% engineering (E); 5% humanities (H); and 5% education (Ed). Last semester, the
program enrollment was distributed as shown below. At α = 0.05, is the distribution of
students the same as hypothesized?
Major Number
Nursing 72
Business 53
Computer Science 32
Engineering 20
Humanities 16
Education 7
EXAMPLE 2 : Chi - Square
STEP 1 : State the hypotheses.
Ho: The distribution of students is as follows: 40% nursing (N); 25%
business (B); 15% computer science (CS); 10% engineering (E); 5%
humanities (H); and 5% education (Ed).
Ha: The distribution is not the same as stated in the null hypothesis.
STEP 2: Level of Significance
α = 0.05
STEP 3 : Determine the critical value
Using α = 0.05 and df = 6 – 1 = 5
Hence, the critical value is 11.071.
EXAMPLE 2 : Chi - Square
STEP 4 : Compute the chi-square statistic
Since there are 200 students in the study, the expected values are
computed as follows:
E(N) = 0.40(200) = 80    E(B) = 0.25(200) = 50    E(CS) = 0.15(200) = 30
E(E) = 0.10(200) = 20    E(H) = 0.05(200) = 10    E(Ed) = 0.05(200) = 10

χ² = (72 - 80)²/80 + (53 - 50)²/50 + (32 - 30)²/30 + (20 - 20)²/20 + (16 - 10)²/10 + (7 - 10)²/10
   = 0.80 + 0.18 + 0.13 + 0 + 3.60 + 0.90
   = 5.61
EXAMPLE 2 : Chi - Square

STEP 5 : Decision
Since 5.61 < 11.071, the decision is not to reject the null hypothesis.

STEP 6: Conclusion
Therefore, the percentages are not significantly different from those given in
the null hypothesis.
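
A similar sketch (again assuming SciPy) for Example 2, where the expected frequencies come from the hypothesized percentages rather than being equal:

```python
from scipy.stats import chisquare, chi2

observed = [72, 53, 32, 20, 16, 7]                  # N, B, CS, E, H, Ed
proportions = [0.40, 0.25, 0.15, 0.10, 0.05, 0.05]  # hypothesized distribution
n = sum(observed)                                   # 200 students
expected = [p * n for p in proportions]             # 80, 50, 30, 20, 10, 10

stat, p_value = chisquare(observed, f_exp=expected)
critical_value = chi2.ppf(0.95, df=len(observed) - 1)

print(f"chi-square = {stat:.2f}, critical value = {critical_value:.3f}")
print("Reject Ho" if stat > critical_value else "Do not reject Ho")
```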
 
TEST FOR INDEPENDENCE

There are times when we might be interested in observing more than
one variable on each individual to find out whether a relationship exists between
these variables. As an example, for each person we might observe his
blood type and eye color and investigate whether these characteristics are
related in any way. Our goal is a test of independence: to determine
whether two observed characteristics of the members of a population are
independent.
TEST FOR INDEPENDENCE
Suppose we pick a sample of size n and classify the data in a two-way
table on the basis of the two variables. Such a table for determining
whether the distribution according to one variable is contingent on the
distribution of the other is called a contingency table. It is used to
arrange the data. The table is made up of R rows and C columns, and a
contingency table is designated as an R×C (rows by columns) table.
Each block in the table is called a cell, and it is designated by its row
and column position.
            Column 1    Column 2
Row 1
Row 2
TEST FOR INDEPENDENCE
 
The chi-square independence test can be used to test the independence of
two variables. The hypotheses are stated as follows:
Ho: The first variable is independent of the second variable.
Ha: The first variable is dependent on the second variable.

The degrees of freedom for any contingency table are
df = (number of rows - 1)(number of columns - 1). The reason for this formula
is that all the expected values except one are free to vary in each row and in
each column.
TEST FOR INDEPENDENCE
 

Computation of Expected Frequencies


Find the sum of each row and each column, and find the grand total.
For each cell, multiply the corresponding row sum by the column sum
and divide by the grand total to get the expected value:

expected value = (row sum × column sum) / grand total

Place the expected values in the corresponding cells along with the
observed values.
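
These two steps translate directly into code; a minimal sketch in plain Python (the function name expected_frequencies is chosen here for illustration):

```python
# Expected frequencies for a contingency table given as a list of rows.
def expected_frequencies(table):
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    grand_total = sum(row_sums)
    # E(cell) = (row sum * column sum) / grand total
    return [[r * c / grand_total for c in col_sums] for r in row_sums]

# The jogging / blood pressure table used in the next example
observed = [[34, 57, 21],
            [15, 63, 20]]
for row in expected_frequencies(observed):
    print([round(e, 2) for e in row])   # [26.13, 64.0, 21.87] and [22.87, 56.0, 19.13]
```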
EXAMPLE 3 : Chi - Square

 
A study is being conducted to determine whether there is a relationship
between jogging and blood pressure. A random sample of 210 subjects is
selected, and they are classified as shown in the table that follows.
Use α = 0.05.
                 Blood Pressure
Jogging status   Low   Moderate   High   Total
Joggers          34    57         21     112
Non-joggers      15    63         20     98
Total            49    120        41     210
EXAMPLE 3 : Chi - Square

STEP 1 : State the hypotheses.


Ho: The blood pressure of a person does not depend on whether he
jogs or not.
Ha: The blood pressure of a person depends on whether he jogs or
not.
STEP 2: Level of Significance
α = 0.05
EXAMPLE 3 : Chi - Square
STEP 3 : Determine the critical value
Using α = 0.05
df = (R – 1) (C – 1)
df = (2 – 1) (3 – 1)
df = 2
Hence, the critical value is 5.991
EXAMPLE 3: Chi - Square

STEP 4 : Compute the chi-square statistic

First find the expected frequency for each cell:
E(Joggers, Low) = (112 × 49)/210 = 26.13
E(Joggers, Moderate) = (112 × 120)/210 = 64
E(Joggers, High) = (112 × 41)/210 = 21.87
E(Non-joggers, Low) = (98 × 49)/210 = 22.87
E(Non-joggers, Moderate) = (98 × 120)/210 = 56
E(Non-joggers, High) = (98 × 41)/210 = 19.13
EXAMPLE 3: Chi - Square

Tabulate the summary of the results.


                 BLOOD PRESSURE
Jogging status   Low          Moderate   High         Total
Joggers          34 (26.13)   57 (64)    21 (21.87)   112
Non-joggers      15 (22.87)   63 (56)    20 (19.13)   98
Total            49           120        41           210
EXAMPLE 3 : Chi - Square

χ² = (34 - 26.13)²/26.13 + (57 - 64)²/64 + (21 - 21.87)²/21.87
   + (15 - 22.87)²/22.87 + (63 - 56)²/56 + (20 - 19.13)²/19.13
   = 2.370 + 0.766 + 0.035 + 2.708 + 0.875 + 0.040
   ≈ 6.79
EXAMPLE 3 : Chi - Square

STEP 5 : Decision
Since 6.79 > 5.991, the decision is to reject the null hypothesis.

STEP 6: Conclusion
Therefore, there is enough evidence to conclude that the blood pressure of a
person depends on whether he jogs or not.
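
For reference, the entire Example 3 test can be reproduced with SciPy's contingency-table routine; a sketch assuming SciPy is installed:

```python
from scipy.stats import chi2_contingency

observed = [[34, 57, 21],   # joggers: low, moderate, high
            [15, 63, 20]]   # non-joggers: low, moderate, high

chi2_stat, p_value, dof, expected = chi2_contingency(observed)
# dof = (2 - 1)(3 - 1) = 2; Yates' correction is only applied to 2x2 tables,
# so it does not affect this 2x3 table.
print(f"chi-square = {chi2_stat:.2f}, df = {dof}, p-value = {p_value:.4f}")
# The statistic is about 6.79, which exceeds the critical value 5.991, so Ho is rejected.
```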
ANALYSIS OF VARIANCE

The Analysis of Variance, commonly abbreviated ANOVA, is a
technique that uses the F test to test a hypothesis concerning the means
of three or more populations. Two different estimates of the population
variance are made. The first estimate is called the between-group
variance. It involves computing the variance by using the means of the
groups (the variation between the groups). The second estimate, the
within-group variance, is made by computing the variance using all the
data. It is not affected by differences in the means.
ANALYSIS OF VARIANCE
 
For a test of the differences among three or more means, the following
hypotheses are used:

Ho: μ1 = μ2 = … = μk
Ha: At least one mean is different from the others.
If there is no difference in the means, the between-group variance
estimate will be approximately equal to the within-group variance
estimate, and the F test value will be approximately equal to 1. When the
means differ significantly, the between-group variance will be much
larger than the within-group variance; the F test value will then be
significantly greater than 1, and the null hypothesis will be rejected.
ANALYSIS OF VARIANCE

 
The degrees of freedom for the F test are:
d.f.N. = k - 1, where k is the number of groups
d.f.D. = n - k, where n is the sum of the sample sizes of the groups

The sample sizes need not be equal.

The F test to compare the means is always a RIGHT-TAILED TEST.
ANALYSIS OF VARIANCE
 
F test statistic
The value of the test statistic F for an ANOVA test is calculated as

F = MSB / MSW

To calculate MSB and MSW, first compute the between-samples sum of
squares, denoted by SSB, and the within-samples sum of squares,
denoted by SSW.
The sum of SSB and SSW is called the total sum of squares, and it is
denoted by SST. That is, SST = SSB + SSW.
ANALYSIS OF VARIANCE
 
The values of SSB and SSW are calculated using the following formulas.
The between-samples sum of squares, denoted by SSB, is

SSB = Σ (Ti² / ni) - (Σx)² / n

where:
Ti = sum of the values in the ith group
ni = number of values in the ith group
Σx = sum of all the values, and n = total number of values in all groups
The within-samples sum of squares, denoted by SSW, is

SSW = Σx² - Σ (Ti² / ni)
ANALYSIS OF VARIANCE
 
The variance between samples, MSB, and the variance within samples, MSW,
are calculated using the following formulas:

MSB = SSB / (k - 1), where k is the number of groups

MSW = SSW / (n - k), where n is the sum of the sample sizes of the groups.
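
Putting these formulas together, a minimal plain-Python sketch of the one-way ANOVA computations; the function and variable names are illustrative:

```python
# One-way ANOVA: sums of squares, mean squares, and F, following the formulas above.
def one_way_anova(groups):
    T = [sum(g) for g in groups]          # Ti: sum of the values in each group
    sizes = [len(g) for g in groups]      # ni: number of values in each group
    n = sum(sizes)                        # total number of values
    k = len(groups)                       # number of groups
    grand_sum = sum(T)                    # sum of all the values
    sum_sq = sum(x * x for g in groups for x in g)   # sum of all squared values

    between = sum(t * t / m for t, m in zip(T, sizes))
    ssb = between - grand_sum ** 2 / n    # between-samples sum of squares
    ssw = sum_sq - between                # within-samples sum of squares
    msb = ssb / (k - 1)
    msw = ssw / (n - k)
    return ssb, ssw, msb, msw, msb / msw  # last value is the F statistic
```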
EXAMPLE 1 : Analysis of Variance

Consider the following data obtained for two samples selected at random
from two populations that are independent and normally distributed with
equal variances. Calculate the F test statistic.

Sample 1 Sample 2
32 27
26 37
31 33
29 36
27 38
34 31
EXAMPLE 1 : Analysis of Variance
 
First find the sum of the values and the sample size of each group:
T1 = 32 + 26 + 31 + 29 + 27 + 34 = 179, n1 = 6
T2 = 27 + 37 + 33 + 36 + 38 + 31 = 202, n2 = 6
Σx = 179 + 202 = 381, n = 12

Calculate Σx²: square all the values included in the two samples and then add.
Thus,
Σx² = 32² + 26² + 31² + 29² + 27² + 34² + 27² + 37² + 33² + 36² + 38² + 31² = 12,275

Substitute all the values in the formulas for SSB and SSW.
EXAMPLE 1 : Analysis of Variance
 
Therefore,
SSB = (179²/6 + 202²/6) - 381²/12 = 12,140.833 - 12,096.75 = 44.08
SSW = 12,275 - 12,140.833 = 134.167
MSB = SSB / (k - 1) = 44.08 / 1 = 44.08
MSW = SSW / (n - k) = 134.167 / 10 = 13.4167

The value of the F test statistic is
F = MSB / MSW = 44.08 / 13.4167 ≈ 3.29
EXAMPLE 1 : Analysis of Variance

These calculations are often recorded in a table called the ANOVA table:

Source of variation   Degrees of freedom   Sum of squares   Mean square   F-test statistic
Between               1                    44.08            44.08
Within                10                   134.167          13.4167       F = 3.29
Total                 11                   178.247
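
The same F value can be checked with SciPy's one-way ANOVA routine; a sketch assuming SciPy is installed:

```python
from scipy.stats import f_oneway

sample1 = [32, 26, 31, 29, 27, 34]
sample2 = [27, 37, 33, 36, 38, 31]

f_stat, p_value = f_oneway(sample1, sample2)
print(f"F = {f_stat:.2f}, p-value = {p_value:.4f}")   # F is about 3.29
```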
ANOVA Table

Source of variation   Degrees of freedom   Sum of squares   Mean square         F-test statistic
Between               k - 1                SSB              MSB = SSB/(k - 1)
Within                n - k                SSW              MSW = SSW/(n - k)   F = MSB/MSW
Total                 n - 1                SST
EXAMPLE 2 : ANOVA
Using the data in the previous example, at the α = 0.05 level of
significance, test the claim that the two populations have equal
means.

STEP 1 : State the hypotheses
Ho: μ1 = μ2
Ha: μ1 ≠ μ2

STEP 2: Level of Significance
α = 0.05
EXAMPLE 2 : ANOVA

STEP 3: Determine the critical value
Since d.f.N. = k - 1 = 2 - 1 = 1, d.f.D. = n - k = 12 - 2 = 10, and the test is
right-tailed, the critical value is 4.96.
EXAMPLE 2 : ANOVA

STEP 4: As computed in Example 1 of this section, F = 3.29.

STEP 5: Decision Rule
Reject Ho if F > 4.96. Since 3.29 < 4.96, the decision is not to reject the null hypothesis.

STEP 6: Conclusion
Hence, there is not enough evidence to reject the claim that the two populations
have equal means.
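
A short sketch (assuming SciPy) that checks the critical value and the decision for this example:

```python
from scipy.stats import f

alpha = 0.05
dfn, dfd = 2 - 1, 12 - 2                      # k - 1 and n - k
critical_value = f.ppf(1 - alpha, dfn, dfd)   # right-tail critical value, about 4.96

f_stat = 3.29                                 # computed in Example 1
print("Reject Ho" if f_stat > critical_value else "Do not reject Ho")
```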
