STAT 1013 Statistics: Weeks 13 and 14


STAT 1013

STATISTICS
Weeks 13 and 14
Chi – Square and ANOVA
LEARNING OUTCOMES

At the end of the lesson, the students should be able to:


1. Explain how the test statistic is computed for a chi-square test;
2. Calculate the degrees of freedom for the chi-square goodness-of-fit
test and locate critical values in the chi-square table;
3. Compute the chi-square goodness-of-fit test and interpret the
results;
4. Identify the assumption and the restriction of expected frequency
size for the chi-square test;
LEARNING OUTCOMES

5. Calculate the degrees of freedom for the chi-square test for


independence and locate critical values in the chi-square table;
6. Compute the chi-square test for independence and interpret the
results;
7. Set up and perform one-way ANOVA;
8. Identify the information in the ANOVA table; and
9. Interpret the results from ANOVA output.
CHI - SQUARE

TEST FOR GOODNESS OF FIT


The chi-square statistic can be used to determine whether a frequency
distribution fits a specific pattern. The chi-square goodness-of-fit test is
used for this purpose.
CHI - SQUARE

For example, a researcher wants to determine whether consumers


have any preference among five flavors of ice cream. A sample of
100 people provided the following data.
Flavor Frequency
Chocolate 32
Strawberry 28
Mango 16
Cookies ’n Cream 14
Vanilla 10
CHI - SQUARE
If there were no preference, one would expect each flavor to be
selected with equal frequency. In this case, approximately 100/5 = 20 people
would select each flavor. Since the frequencies in the table were obtained from a
sample, they are called observed frequencies. The frequencies
obtained by calculation are called expected frequencies.
Flavor Observed Frequency Expected Frequency
Chocolate 32 20
Strawberry 28 20
Mango 16 20
Cookies ’n Cream 14 20
Vanilla 10 20
CHI - SQUARE

Flavor Observed Frequency Expected Frequency


Chocolate 32 20
Strawberry 28 20
Mango 16 20
Cookies ’n Cream 14 20
Vanilla 10 20

The observed frequencies will always differ from the expected frequencies
due to sampling error. But are these differences significant? The chi-square
goodness of fit test will enable one to determine the answer.
Formula for the Chi-Square Test

Formula:
χ² = Σ (O - E)² / E

with
df = degrees of freedom = number of categories - 1
O = observed frequencies
E = expected frequencies
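
For readers who want to check this formula numerically, here is a minimal Python sketch that applies it to the ice cream data from the earlier slide; the variable names are illustrative, and no statistics library is assumed.

```python
# Chi-square goodness-of-fit statistic for the ice cream preference data.
observed = {"Chocolate": 32, "Strawberry": 28, "Mango": 16,
            "Cookies 'n Cream": 14, "Vanilla": 10}

n = sum(observed.values())   # 100 people in the sample
k = len(observed)            # 5 flavor categories
expected = n / k             # 20 per flavor if there is no preference

# chi-square = sum of (O - E)^2 / E over all categories
chi_square = sum((o - expected) ** 2 / expected for o in observed.values())
df = k - 1                   # degrees of freedom = number of categories - 1

print(f"chi-square = {chi_square:.2f}, df = {df}")
```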
Chi-Square Test

Assumptions for the Chi-Square Test:
 The data must be obtained from a random sample, and the observations must be independent.
 The expected frequency for each category must be at least 5.

Before computing the test statistic, the hypotheses must be stated first. For the ice cream example:
Ho: Consumers show no preference.
Ha: Consumers show a preference.
Chi-Square Test

When there is close agreement between the values of O and E, the
chi-square statistic is small and the null hypothesis is not rejected. When there
are large differences between O and E, the chi-square statistic is large and the
null hypothesis is rejected.
EXAMPLE 1 : Chi - Square
A clothing manufacturer wants to determine whether customers prefer any
specific color over other colors in shirts. He selects a random sample of 102
shirts sold and notes the color. The table below shows the results. At α = 0.10,
is there a color preference?
Color Number Sold
White 43
Blue 22
Black 16
Red 10
Yellow 6
Green 5
EXAMPLE 1 : Chi - Square

STEP 1 : State the hypotheses.


Ho: Consumers have no color preference.
Ha: Consumers show a preference.
STEP 2: Level of Significance
α = 0.10
STEP 3 : Determine the critical value
Using α = 0.10 and df = 6 – 1 = 5
Hence, the critical value is 9.236.
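
If a statistics package is available, the critical value in Step 3 can be checked against the table; a short sketch, assuming SciPy is installed:

```python
from scipy.stats import chi2

alpha = 0.10
df = 6 - 1
# The test is right-tailed, so the critical value leaves an area of alpha in the right tail.
critical_value = chi2.ppf(1 - alpha, df)
print(round(critical_value, 3))   # about 9.236, matching the chi-square table
```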
EXAMPLE 1 : Chi - Square

STEP 4 : Compute the chi-square statistic


If there were no preference, one would expect 102 / 6 = 17 shirts of each
color to be sold.

Color Observed Frequency Expected Frequency


White 43 17
Blue 22 17
Black 16 17
Red 10 17
Yellow 6 17
Green 5 17
EXAMPLE 1 : Chi - Square
STEP 4 : Compute the chi-square statistic
Color    Observed Frequency    Expected Frequency
White    43    17
Blue     22    17
Black    16    17
Red      10    17
Yellow   6     17
Green    5     17

χ² = (43 - 17)²/17 + (22 - 17)²/17 + (16 - 17)²/17 + (10 - 17)²/17 + (6 - 17)²/17 + (5 - 17)²/17
   = 39.76 + 1.47 + 0.06 + 2.88 + 7.12 + 8.47
   = 59.76
EXAMPLE 1 : Chi - Square

STEP 5 : Decision
The decision is to reject the null hypothesis, since 59.76 > 9.236.

STEP 6: Conclusion
Therefore, there is enough evidence to reject the claim that the customers
show no preference for the colors of shirts.

When there is perfect agreement between the observed and expected
values, χ² = 0. Also, χ² can never be negative. Finally, the chi-square
goodness-of-fit test is always right-tailed.
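
For reference, the whole Example 1 test can be reproduced in a few lines; a sketch assuming SciPy is installed (scipy.stats.chisquare uses equal expected frequencies by default):

```python
from scipy.stats import chisquare, chi2

observed = [43, 22, 16, 10, 6, 5]      # white, blue, black, red, yellow, green
stat, p_value = chisquare(observed)    # expected frequencies default to 102/6 = 17 each

alpha = 0.10
critical_value = chi2.ppf(1 - alpha, df=len(observed) - 1)

print(f"chi-square = {stat:.2f}, p-value = {p_value:.4f}")
print("Reject Ho" if stat > critical_value else "Do not reject Ho")
```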
EXAMPLE 2 : Chi - Square
The dean of student affairs of a college wishes to test the claim that the distribution of
students is as follows: 40% nursing (N); 25% business (B); 15% computer science (CS);
10% engineering (E); 5% humanities (H); and 5% education (Ed). Last semester, the
program enrollment was distributed as shown below. At α = 0.05, is the distribution of
students the same as hypothesized?
Major Number
Nursing 72
Business 53
Computer Science 32
Engineering 20
Humanities 16
Education 7
EXAMPLE 2 : Chi - Square
STEP 1 : State the hypotheses.
Ho: The distribution of students is as follows: 40% nursing (N); 25%
business (B); 15% computer science (CS); 10% engineering (E); 5%
humanities (H); and 5% education (Ed).
Ha: The distribution is not the same as stated in the null hypothesis.
STEP 2: Level of Significance
α = 0.05
STEP 3 : Determine the critical value
Using α = 0.05 and df = 6 – 1 = 5
Hence, the critical value is 11.071.
EXAMPLE 2 : Chi - Square
STEP 4 : Compute the chi-square statistic
Since there are 200 students in the study, the expected values are
computed as follows:
E(N) = 0.40(200) = 80    E(B) = 0.25(200) = 50    E(CS) = 0.15(200) = 30
E(E) = 0.10(200) = 20    E(H) = 0.05(200) = 10    E(Ed) = 0.05(200) = 10

χ² = (72 - 80)²/80 + (53 - 50)²/50 + (32 - 30)²/30 + (20 - 20)²/20 + (16 - 10)²/10 + (7 - 10)²/10
   = 0.80 + 0.18 + 0.13 + 0 + 3.60 + 0.90
   = 5.61
EXAMPLE 2 : Chi - Square

STEP 5 : Decision
Since 5.61 < 11.071, the decision is not to reject the null hypothesis.

STEP 6: Conclusion
Therefore, the percentages are not significantly different from those given in
the null hypothesis.
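
A similar sketch (again assuming SciPy) for Example 2, where the expected frequencies come from the hypothesized percentages rather than being equal:

```python
from scipy.stats import chisquare, chi2

observed = [72, 53, 32, 20, 16, 7]                  # N, B, CS, E, H, Ed
proportions = [0.40, 0.25, 0.15, 0.10, 0.05, 0.05]  # hypothesized distribution
n = sum(observed)                                   # 200 students
expected = [p * n for p in proportions]             # 80, 50, 30, 20, 10, 10

stat, p_value = chisquare(observed, f_exp=expected)
critical_value = chi2.ppf(0.95, df=len(observed) - 1)

print(f"chi-square = {stat:.2f}, critical value = {critical_value:.3f}")
print("Reject Ho" if stat > critical_value else "Do not reject Ho")
```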
 
TEST FOR INDEPENDENCE

There are times when we might be interested in observing more than
one variable on each individual to find out whether a relationship exists between
these variables. As an example, for each person we might observe his
blood type and eye color and investigate whether these characteristics are
related in any way. Our goal is a test of independence: to determine
whether two observed characteristics of the members of a population are
independent.
TEST FOR INDEPENDENCE
Suppose we pick a sample of size n and classify the data in a two-way
table on the basis of the two variables. Such a table for determining
whether the distribution according to one variable is contingent on the
distribution of the other is called a contingency table. It is used to
arrange the data. The table is made up of R rows and C columns, and a
contingency table is designated as an R×C (rows by columns) table.
Each block in the table is called a cell, and it is designated by its row
and column position.
            Column 1    Column 2
Row 1
Row 2
TEST FOR INDEPENDENCE
 
The chi-square independence test can be used to test the independence of
two variables. The hypotheses are stated as follows:
Ho: The first variable is independent of the second variable.
Ha: The first variable is dependent on the second variable.

The degrees of freedom for any contingency table are
df = (number of rows - 1)(number of columns - 1). The reason for this formula
is that all the expected values except one are free to vary in each row and in
each column.
TEST FOR INDEPENDENCE
 

Computation of Expected Frequencies


Find the sum of each row and each column, and find the grand total.
For each cell, multiply the corresponding row sum by the column sum
and divide by the grand total to get the expected value:

expected value = (row sum × column sum) / grand total

Place the expected values in the corresponding cells along with the
observed values.
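
These two steps translate directly into code; a minimal sketch in plain Python (the function name expected_frequencies is chosen here for illustration):

```python
# Expected frequencies for a contingency table given as a list of rows.
def expected_frequencies(table):
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    grand_total = sum(row_sums)
    # E(cell) = (row sum * column sum) / grand total
    return [[r * c / grand_total for c in col_sums] for r in row_sums]

# The jogging / blood pressure table used in the next example
observed = [[34, 57, 21],
            [15, 63, 20]]
for row in expected_frequencies(observed):
    print([round(e, 2) for e in row])   # [26.13, 64.0, 21.87] and [22.87, 56.0, 19.13]
```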
EXAMPLE 3 : Chi - Square

 
A study is being conducted to determine whether there is a relationship
between jogging and blood pressure. A random sample of 210 subjects is
selected, and they are classified as shown in the table that follows.
Use α = 0.05.
                 Blood Pressure
Jogging status   Low   Moderate   High   Total
Joggers          34    57         21     112
Non-joggers      15    63         20     98
Total            49    120        41     210
EXAMPLE 3 : Chi - Square

STEP 1 : State the hypotheses.


Ho: The blood pressure of a person does not depend on whether he
jogs or not.
Ha: The blood pressure of a person depends on whether he jogs or
not.
STEP 2: Level of Significance
α = 0.05
EXAMPLE 3 : Chi - Square
STEP 3 : Determine the critical value
Using α = 0.05
df = (R – 1) (C – 1)
df = (2 – 1) (3 – 1)
df = 2
Hence, the critical value is 5.991
EXAMPLE 3: Chi - Square

STEP 4 : Compute the chi-square statistic

First find the expected frequency for each cell:
E(Joggers, Low) = (112 × 49)/210 = 26.13
E(Joggers, Moderate) = (112 × 120)/210 = 64
E(Joggers, High) = (112 × 41)/210 = 21.87
E(Non-joggers, Low) = (98 × 49)/210 = 22.87
E(Non-joggers, Moderate) = (98 × 120)/210 = 56
E(Non-joggers, High) = (98 × 41)/210 = 19.13
EXAMPLE 3: Chi - Square

Tabulate the summary of the results.


                 BLOOD PRESSURE
Jogging status   Low          Moderate   High         Total
Joggers          34 (26.13)   57 (64)    21 (21.87)   112
Non-joggers      15 (22.87)   63 (56)    20 (19.13)   98
Total            49           120        41           210
EXAMPLE 3 : Chi - Square

χ² = (34 - 26.13)²/26.13 + (57 - 64)²/64 + (21 - 21.87)²/21.87
   + (15 - 22.87)²/22.87 + (63 - 56)²/56 + (20 - 19.13)²/19.13
   = 2.370 + 0.766 + 0.035 + 2.708 + 0.875 + 0.040
   ≈ 6.79
EXAMPLE 3 : Chi - Square

STEP 5 : Decision
Since 6.79 > 5.991, the decision is to reject the null hypothesis.

STEP 6: Conclusion
Therefore, there is enough evidence to conclude that the blood pressure of a
person depends on whether he jogs or not.
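
For reference, the entire Example 3 test can be reproduced with SciPy's contingency-table routine; a sketch assuming SciPy is installed:

```python
from scipy.stats import chi2_contingency

observed = [[34, 57, 21],   # joggers: low, moderate, high
            [15, 63, 20]]   # non-joggers: low, moderate, high

chi2_stat, p_value, dof, expected = chi2_contingency(observed)
# dof = (2 - 1)(3 - 1) = 2; Yates' correction is only applied to 2x2 tables,
# so it does not affect this 2x3 table.
print(f"chi-square = {chi2_stat:.2f}, df = {dof}, p-value = {p_value:.4f}")
# The statistic is about 6.79, which exceeds the critical value 5.991, so Ho is rejected.
```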
ANALYSIS OF VARIANCE

The Analysis of Variance, commonly abbreviated ANOVA, is a
technique that uses the F test to test a hypothesis concerning the means
of three or more populations. Two different estimates of the population
variance are made. The first estimate is called the between-group
variance. It involves computing the variance by using the means of the
groups (the variation between the groups). The second estimate, the
within-group variance, is made by computing the variance using all the
data. It is not affected by differences in the means.
ANALYSIS OF VARIANCE
 
For a test of the differences among three or more means, the following
hypotheses are used:

Ho: μ1 = μ2 = … = μk
Ha: At least one mean is different from the others.
If there is no difference in the means, the between-group variance
estimate will be approximately equal to the within-group variance
estimate, and the F test value will be approximately equal to 1. When the
means differ significantly, the between-group variance will be much
larger than the within-group variance; the F test value will then be
significantly greater than 1, and the null hypothesis will be rejected.
ANALYSIS OF VARIANCE

 
The degrees of freedom for the F test are:
d.f.N. = k - 1, where k is the number of groups
d.f.D. = n - k, where n is the sum of the sample sizes of the groups

The sample sizes need not be equal.

The F test to compare the means is always a RIGHT-TAILED TEST.
ANALYSIS OF VARIANCE
 
F test statistic
The value of the test statistic F for an ANOVA test is calculated as

F = MSB / MSW

To calculate MSB and MSW, first compute the between-samples sum of
squares, denoted by SSB, and the within-samples sum of squares,
denoted by SSW.
The sum of SSB and SSW is called the total sum of squares, and it is
denoted by SST. That is, SST = SSB + SSW.
ANALYSIS OF VARIANCE
 
The values of SSB and SSW are calculated using the following formulas.
The between-samples sum of squares, denoted by SSB, is

SSB = Σ (Ti² / ni) - (Σx)² / n

where:
Ti = sum of the values in the ith group
ni = number of values in the ith group
Σx = sum of all the values, and n = total number of values in all groups
The within-samples sum of squares, denoted by SSW, is

SSW = Σx² - Σ (Ti² / ni)
ANALYSIS OF VARIANCE
 
The variance between samples, MSB, and the variance within samples, MSW,
are calculated using the following formulas:

MSB = SSB / (k - 1), where k is the number of groups

MSW = SSW / (n - k), where n is the sum of the sample sizes of the groups.
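
Putting these formulas together, a minimal plain-Python sketch of the one-way ANOVA computations; the function and variable names are illustrative:

```python
# One-way ANOVA: sums of squares, mean squares, and F, following the formulas above.
def one_way_anova(groups):
    T = [sum(g) for g in groups]          # Ti: sum of the values in each group
    sizes = [len(g) for g in groups]      # ni: number of values in each group
    n = sum(sizes)                        # total number of values
    k = len(groups)                       # number of groups
    grand_sum = sum(T)                    # sum of all the values
    sum_sq = sum(x * x for g in groups for x in g)   # sum of all squared values

    between = sum(t * t / m for t, m in zip(T, sizes))
    ssb = between - grand_sum ** 2 / n    # between-samples sum of squares
    ssw = sum_sq - between                # within-samples sum of squares
    msb = ssb / (k - 1)
    msw = ssw / (n - k)
    return ssb, ssw, msb, msw, msb / msw  # last value is the F statistic
```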
EXAMPLE 1 : Analysis of Variance

Consider the following data obtained for two samples selected at random
from two populations that are independent and normally distributed with
equal variances. Calculate the F test statistic.

Sample 1 Sample 2
32 27
26 37
31 33
29 36
27 38
34 31
EXAMPLE 1 : Analysis of Variance
 
First find the sum of the values and the sample size of each group:
T1 = 32 + 26 + 31 + 29 + 27 + 34 = 179, n1 = 6
T2 = 27 + 37 + 33 + 36 + 38 + 31 = 202, n2 = 6
Σx = 179 + 202 = 381, n = 12

Calculate Σx²: square all the values included in the two samples and then add.
Thus,
Σx² = 32² + 26² + 31² + 29² + 27² + 34² + 27² + 37² + 33² + 36² + 38² + 31² = 12,275

Substitute all the values in the formulas for SSB and SSW.
EXAMPLE 1 : Analysis of Variance
 
Therefore,
SSB = (179²/6 + 202²/6) - 381²/12 = 12,140.833 - 12,096.75 = 44.08
SSW = 12,275 - 12,140.833 = 134.167
MSB = SSB / (k - 1) = 44.08 / 1 = 44.08
MSW = SSW / (n - k) = 134.167 / 10 = 13.4167

The value of the F test statistic is
F = MSB / MSW = 44.08 / 13.4167 ≈ 3.29
EXAMPLE 1 : Analysis of Variance

These calculations are often recorded in a table called the ANOVA table:

Source of variation   Degrees of freedom   Sum of squares   Mean square   F-test statistic
Between               1                    44.08            44.08
Within                10                   134.167          13.4167       F = 3.29
Total                 11                   178.247
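
The same F value can be checked with SciPy's one-way ANOVA routine; a sketch assuming SciPy is installed:

```python
from scipy.stats import f_oneway

sample1 = [32, 26, 31, 29, 27, 34]
sample2 = [27, 37, 33, 36, 38, 31]

f_stat, p_value = f_oneway(sample1, sample2)
print(f"F = {f_stat:.2f}, p-value = {p_value:.4f}")   # F is about 3.29
```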
ANOVA Table

Source of variation   Degrees of freedom   Sum of squares   Mean square         F-test statistic
Between               k - 1                SSB              MSB = SSB/(k - 1)
Within                n - k                SSW              MSW = SSW/(n - k)   F = MSB/MSW
Total                 n - 1                SST
EXAMPLE 2 : ANOVA
Using the data in the previous example, at the α = 0.05 level of
significance, test the claim that the two populations have equal
means.

STEP 1 : State the hypotheses
Ho: μ1 = μ2
Ha: μ1 ≠ μ2

STEP 2: Level of Significance
α = 0.05
EXAMPLE 2 : ANOVA

STEP 3: Determine the critical value
Since d.f.N. = k - 1 = 2 - 1 = 1, d.f.D. = n - k = 12 - 2 = 10, and the test is
right-tailed, the critical value is 4.96.
EXAMPLE 2 : ANOVA

STEP 4: As computed in Example 1 of this section, F = 3.29.

STEP 5: Decision Rule
Reject Ho if F > 4.96. Since 3.29 < 4.96, the decision is not to reject the null hypothesis.

STEP 6: Conclusion
Hence, there is not enough evidence to reject the claim that the two populations
have equal means.
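
A short sketch (assuming SciPy) that checks the critical value and the decision for this example:

```python
from scipy.stats import f

alpha = 0.05
dfn, dfd = 2 - 1, 12 - 2                      # k - 1 and n - k
critical_value = f.ppf(1 - alpha, dfn, dfd)   # right-tail critical value, about 4.96

f_stat = 3.29                                 # computed in Example 1
print("Reject Ho" if f_stat > critical_value else "Do not reject Ho")
```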
