
Chi-Square Analysis

Prepared by:
Prof. Dr Bahaman Abu Samah
Department of Professional Development and Continuing Education
Faculty of Educational Studies
Universiti Putra Malaysia
Serdang
Introduction
• The chi-square distribution has only one parameter –
degrees of freedom
• The shape of the distribution is skewed to the right
(positively skewed) for small df and becomes symmetric for
large df
• The chi-square statistic reflects the magnitude of the
discrepancies between observed and expected counts
• The entire chi-square distribution lies to the right of the y-axis
• The chi-square value can never be negative
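The shape claims above can be checked numerically. The sketch below relies on the standard closed-form moments of the chi-square distribution (mean = df, variance = 2df, skewness = sqrt(8/df)); the function name `chi_square_shape` is only illustrative.

```python
import math

def chi_square_shape(df):
    """Return (mean, variance, skewness) of a chi-square distribution with df degrees of freedom."""
    mean = df
    variance = 2 * df
    # Skewness is always positive (right skew) and shrinks toward 0 as df grows.
    skewness = math.sqrt(8 / df)
    return mean, variance, skewness

for df in (2, 7, 12, 100):
    mean, variance, skewness = chi_square_shape(df)
    print(f"df={df:>3}  mean={mean}  variance={variance}  skewness={skewness:.3f}")
```

The skewness falls steadily as df increases, which is the "becomes symmetric for large df" behaviour described in the bullets.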
Chi-Square Distribution
[Figure: chi-square distribution curves for df = 2, df = 7 and df = 12; horizontal axis χ2 from 1 to 21]
Types of Chi-Square
• Goodness-of-fit
Tests an assumption about the distribution of a categorical variable
• Test of Independence
Tests the association between two variables in a contingency table
• Test of Homogeneity
Tests the difference in proportions between groups
1. χ2 Goodness-of-Fit
Introduction
• Compares the observed frequencies in each category of a nominal variable against the frequencies expected under a model based on probability theory
• Developed by Karl Pearson in 1900
• Also known as the one-sample chi-square test
Purpose
• Test an assumption about the distribution of a categorical variable
• Example:
The ratio of male to female teachers in the teaching profession is 3:7
Requirements
• One categorical variable (nominal/ordinal) with two or more categories
• No more than 20% of the categories should have an expected frequency smaller than 5
• If more than 20% of the expected frequencies are smaller than 5, it is advisable to collapse adjacent categories
• Calculation is based on:
O - Observed frequency
E - Expected frequency
E = np
where n – sample size
p – probability/proportion
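As a rough illustration of E = np and the 20% rule, the sketch below uses the 3:7 male-to-female ratio from the earlier example. The sample size of 200 and the function names are assumptions made for this sketch only.

```python
def expected_counts(n, proportions):
    """Expected frequency for each category: E = n * p."""
    return [n * p for p in proportions]

def share_below_five(expected):
    """Fraction of categories whose expected frequency is smaller than 5."""
    small = sum(1 for e in expected if e < 5)
    return small / len(expected)

n = 200                   # hypothetical sample size
proportions = [0.3, 0.7]  # assumed 3:7 male-to-female ratio
expected = expected_counts(n, proportions)

print(expected)
# The requirement is met when no more than 20% of categories fall below 5.
print(share_below_five(expected) <= 0.20)
```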
What to Expect?
Manual hypothesis test:
State HO and HA → Calculate χ2 value → Determine critical value → Decision → Conclusion

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Criteria                Decision
χ2cal ≥ χ2critical      Reject HO
χ2cal < χ2critical      Fail to reject HO
5-Step Hypothesis Test
1 State HO and HA

2 Calculate χ2

3 Determine Critical Value

4 Decision

5 Conclusion
Step 1: State HO & HA
HO: Statement of assumption
HA: Statement opposite of the assumption
Step 2: Calculate Test Statistic
Calculation of χ2 is based on frequency counts:
1. Observed (O)
2. Expected (E)
E = np, where
n = sample size
p = probability/proportion

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
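A minimal sketch of this calculation in Python follows. The observed counts are hypothetical and the function name `chi_square_statistic` is chosen only for illustration.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [18, 22, 20]         # hypothetical observed counts
n = sum(observed)               # sample size
proportions = [1/3, 1/3, 1/3]   # assumed equal proportions under HO
expected = [n * p for p in proportions]

chi2 = chi_square_statistic(observed, expected)
print(round(chi2, 3))
```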
Step 3: Determine Critical Value
Critical value is based on:
• Significance level (α)
• Degrees of freedom
$$df = k - 1$$
where k = no. of levels/groups
The critical value is written $\chi^2_{\alpha,\,df}$
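Critical values are normally read from a chi-square table. As a cross-check, the sketch below computes them with the standard library only, using the series expansion of the regularized incomplete gamma function and a bisection search. The function names are illustrative, and this is one possible approach rather than the method used in the slides.

```python
import math

def chi2_cdf(x, df):
    """Lower-tail probability P(X <= x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z).
    term = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return min(total, 1.0)

def chi2_critical(alpha, df):
    """Value with upper-tail area alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while 1.0 - chi2_cdf(hi, df) > alpha:   # widen the bracket until it contains the answer
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if 1.0 - chi2_cdf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(round(chi2_critical(0.05, 4), 2))
print(round(chi2_critical(0.05, 2), 2))
```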
Step 4: Decision
– Only two (2) possible decisions.
– Reject or Fail to Reject HO

Manual:
Criteria                Decision
χ2cal ≥ χ2critical      Reject HO
χ2cal < χ2critical      Fail to reject HO

SPSS:
Criteria                Decision
sig-χ2 ≤ α              Reject HO
sig-χ2 > α              Fail to reject HO
Step 5: Conclusion
Reject HO:
Significantly different from the assumption
Fail to reject HO:
Not significantly different from the assumption
Example 1
The following table displays the age distribution for a sample
summoned for traffic violations. Test the hypothesis that the
proportion of people summoned for traffic violations is equal for
all age groups at .05 level of significance.

Age group        < 20   20 - 29   30 - 39   40 - 49   ≥ 50
No. of summons   32     25        19        16        8

Assumption:
p1 = p2 = p3 = p4 = p5 = .20
Step 1: Hypotheses
HO: The proportion of people involved in traffic
violations is the same for all age groups
HA: The proportion of people involved in traffic
violations is different by age groups
Step 2: Test Statistic
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Age      O     E     (O – E)   (O – E)2   (O – E)2/E
< 20     32    20    12        144        7.20
20-29    25    20    5         25         1.25
30-39    19    20    -1        1          .05
40-49    16    20    -4        16         .80
≥ 50     8     20    -12       144        7.20
Total    100   100                        χ2 = 16.50
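Step 3: Critical value
$$df = k - 1 = 5 - 1 = 4$$
$$\chi^2_{.05,\,4} = 9.49$$
[Figure: chi-square curve with the rejection region (α = .05) to the right of the critical value 9.49; fail to reject HO to the left, reject HO to the right]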
Step 4: Decision
Criteria                Decision
χ2cal ≥ χ2critical      Reject HO
χ2cal < χ2critical      Fail to reject HO

Since χ2cal (16.50) is greater than χ2critical (9.49), reject HO
Step 5: Conclusion
Conclude that the proportion of people summoned for traffic violations differs significantly by age group at the .05 level of significance.
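The arithmetic of Example 1 can be reproduced in a few lines. This sketch takes the observed counts and the critical value of 9.49 directly from the example; the variable names are illustrative.

```python
# Observed summons per age group, taken from Example 1.
observed = {"< 20": 32, "20-29": 25, "30-39": 19, "40-49": 16, ">= 50": 8}

n = sum(observed.values())   # sample size
p = 1 / len(observed)        # equal proportions under HO (.20 each)
expected = n * p             # E = np, the same for every group

chi2 = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1       # df = k - 1
critical = 9.49              # critical value at alpha = .05, df = 4, as given in the example

decision = "Reject HO" if chi2 >= critical else "Fail to reject HO"
print(f"chi2 = {chi2:.2f}, df = {df}, decision: {decision}")
```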
7 Basic Key
Information
Purpose
Test the assumption pertaining to the distribution
of a categorical variable

Requirements
► Only one categorical variable (Nominal/Ordinal)
► Calculation is based on:
O - Observed frequency
E - Expected frequency
E = np
where n – sample size
p – probability/proportion
How to run the chi-square test in SPSS
Results of Chi-Square

Decision Criteria
Reject HO: sig-χ2 ≤ α
Fail to reject HO: sig-χ2 > α
Interpretation
Reject HO
There is a significant difference from the assumption

Fail to Reject HO
There is no significant difference from the assumption
Example
• The table below, gift_type, provides the observed frequencies
(Observed N) for each gift, as well as the expected frequencies
(Expected N), which are the frequencies expected if the null
hypothesis is true. The difference between the observed and
expected frequencies is provided in the Residual column.
• Gift type:
• 1 = Gift Certificate
• 2 = Cuddly Toy
• 3 = Cinema Tickets
• The table below, Test Statistics, provides the actual result of the chi-square goodness-of-fit test.
Step 1: State HO & HA
HO: The proportion of respondents who received gifts is the same for all types of gifts
HA: The proportion of respondents who received gifts differs by type of gift
Step 2: Report test statistic and sig-value
χ2 = 49.4, sig-χ2 = .000
Step 3: Determine Significance Level (Alpha)
Set at either .05 or .01
In this case, .05
Step 4: Decision
– Only two (2) possible decisions.
– Reject or Fail to Reject HO

SPSS:
Criteria                Decision
sig-χ2 ≤ α              Reject HO
sig-χ2 > α              Fail to reject HO

Since sig-χ2 (.000) < .05, reject HO

Reject HO:
Significantly different from the assumption
Fail to reject HO:
Not significantly different from the assumption
Step 5: Conclusion
Conclude that the proportion of people who received gifts differs significantly by type of gift at the .05 level of significance.
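A sig-value printed as .000 does not mean the probability is exactly zero; it is a value that rounds to zero at three decimals. The sketch below illustrates this for the χ2 value of 49.4 on 2 degrees of freedom reported for this example, using the exact upper-tail formula for df = 2, P(X ≥ x) = exp(−x/2). The function name is illustrative.

```python
import math

def chi2_p_value_df2(chi2):
    """Upper-tail p-value for df = 2, where P(X >= x) = exp(-x / 2) holds exactly."""
    return math.exp(-chi2 / 2)

p = chi2_p_value_df2(49.4)
print(p)             # a very small positive number
print(f"{p:.3f}")    # what a three-decimal display shows
print(p < 0.05)      # decision rule: reject HO when sig <= alpha
```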
APA Reporting
A chi-square test of goodness-of-fit was performed to determine whether the three types of gifts were equally distributed. The three types of gifts were not equally distributed in the population, χ2 (2, N = 1000) = 49.4, p < .05.
2. χ2 Test of Independence
Introduction
• Aims to find out whether two qualitative variables are independent of or related to each other, based on the proportion of responses in each combination of categories of the two variables
• Also known as the two independent samples chi-square test
Purpose &
Requirement
Purpose:
Test association between two categorical variables and
determine the strength of the association

Requirement:
DV − Nominal/Ordinal
IV − Nominal/Ordinal

Note: At least one of the variables is on a nominal scale

Calculation is based on frequencies rather than numerical scores.
What to Expect?
Manual hypothesis test:
State HO and HA → Calculate χ2 value → Determine critical value → Decision → Conclusion

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Criteria                Decision
χ2cal ≥ χ2critical      Reject HO
χ2cal < χ2critical      Fail to reject HO
5-Step Hypothesis Test
1 State HO and HA

2 Calculate Test Statistics

3 Determine Critical Value

4 Decision

5 Conclusion

Step 1: State HO & HA
HO: DV is independent of IV
HA: DV is dependent on IV
Note:
Please follow the above stated format; otherwise the meaning is reversed
Example:
DV = Academic performance
IV = Student group
HO: Academic performance is independent of student group
HA: Academic performance is dependent on student group
Step 2: Calculate Test Statistic
1. Calculate the Expected Count (E) for each cell of the contingency table
$$E = \frac{RT \times CT}{GT}$$
where RT = row total, CT = column total, GT = grand total
[Figure: 2x3 contingency table]
2. Calculate the chi-square value using the following formula:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
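The two calculations in this step can be sketched as follows. The 2x2 table of observed counts is hypothetical and the function names are illustrative.

```python
def expected_table(observed):
    """Expected count for each cell: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

def chi_square_independence(observed):
    """Sum of (O - E)^2 / E over every cell of the contingency table."""
    expected = expected_table(observed)
    return sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )

observed = [[10, 20], [30, 40]]   # hypothetical observed counts
print(expected_table(observed))
print(round(chi_square_independence(observed), 3))
```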


Step 3: Determine Critical Value
Critical value is based on:
• Significance level (α)
• Degrees of freedom
$$df = (R - 1)(C - 1)$$
where R = no. of rows and C = no. of columns
The critical value is written $\chi^2_{\alpha,\,df}$
Step 4: Decision
– Only two (2) possible decisions.
– Reject or Fail to Reject HO

Manual:
Criteria                Decision
χ2cal ≥ χ2critical      Reject HO
χ2cal < χ2critical      Fail to reject HO

SPSS:
Criteria                Decision
sig-χ2 ≤ α              Reject HO
sig-χ2 > α              Fail to reject HO
Step 5: Conclusion
Reject HO:
DV is significantly dependent on IV
Fail to reject HO:
DV is not significantly dependent on
IV

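Measures of Association
1. Phi coefficient
$$\phi = \sqrt{\frac{\chi^2}{n}}$$
2. Contingency coefficient
$$C = \sqrt{\frac{\chi^2}{\chi^2 + n}}$$
3. Cramer's V coefficient
$$V = \sqrt{\frac{\chi^2}{n(k - 1)}}$$
where k = the smaller of the number of rows and the number of columns
Note:
a. Phi coefficient is used only for a 2x2 contingency table
b. Use Guilford's rule of thumb to interpret the magnitude of association between the two variables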
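The three coefficients translate directly into code. In the sketch below the χ2 value, sample size and table dimension are hypothetical, and k is taken as the smaller of the number of rows and columns.

```python
import math

def phi_coefficient(chi2, n):
    """Phi = sqrt(chi2 / n); intended for 2x2 tables only."""
    return math.sqrt(chi2 / n)

def contingency_coefficient(chi2, n):
    """C = sqrt(chi2 / (chi2 + n))."""
    return math.sqrt(chi2 / (chi2 + n))

def cramers_v(chi2, n, k):
    """V = sqrt(chi2 / (n * (k - 1))), with k the smaller of rows and columns."""
    return math.sqrt(chi2 / (n * (k - 1)))

chi2, n, k = 10.0, 250, 2   # hypothetical values for illustration
print(round(phi_coefficient(chi2, n), 3))
print(round(contingency_coefficient(chi2, n), 3))
print(round(cramers_v(chi2, n, k), 3))
```

When k = 2, Cramer's V reduces to the same expression as phi, which is why the two agree for this input.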

Guilford's Rule of Thumb
Coefficient value   Strength of Relationship
< .2                Negligible relationship
.2 - .4             Low relationship
.4 - .7             Moderate relationship
.7 - .9             High relationship
> .9                Very high relationship
Example/Exercise
A study was conducted to test the relationship between
gender and academic performance. Data collected from a
randomly selected sample.
1. Test the hypothesis on the relationship between the two
variables at .05 level of significance.
2. Calculate and describe an appropriate measure of
association between the two variables.
Academic Performance
Gender High Moderate Low

Male 93 70 12
Female 87 32 6
Q1. Hypothesis Test
a. Hypotheses
HO: Academic performance is independent of gender
HA: Academic performance is dependent on gender
b. Test statistic. Calculate the expected value for each cell (expected counts in parentheses)
                 Academic Performance
Gender           High          Moderate      Low           Row Totals
Male             93 (105.0)    70 (59.5)     12 (10.5)     175
Female           87 (75.0)     32 (42.5)     6 (7.5)       125
Column Totals    180           102           18            300


Chi-Square
Group   O     E       (O – E)   (O – E)2   (O – E)2/E
M-H     93    105.0   -12.0     144.00     1.371
M-M     70    59.5    10.5      110.25     1.853
M-L     12    10.5    1.5       2.25       .214
F-H     87    75.0    12.0      144.00     1.920
F-M     32    42.5    -10.5     110.25     2.594
F-L     6     7.5     -1.5      2.25       .300
Total   300                                χ2 = 8.252
c. Critical value
$$df = (R - 1)(C - 1) = (2 - 1)(3 - 1) = 1 \times 2 = 2$$
$$\chi^2_{.05,\,2} = 5.99$$
[Figure: chi-square curve with the rejection region (α = .05) to the right of the critical value 5.99; χ2cal = 8.252 falls in the rejection region]
d. Decision
Since χ2cal (8.252) > χ2critical (5.99), reject HO
e. Conclusion
Academic performance is significantly dependent on gender at the .05 level of significance
Q2. Measures of Association
For a 2 x 3 contingency table, both the contingency coefficient and Cramer's V are appropriate
$$C = \sqrt{\frac{\chi^2}{\chi^2 + n}} = \sqrt{\frac{8.252}{8.252 + 300}} = .164$$
$$V = \sqrt{\frac{\chi^2}{n(k - 1)}} = \sqrt{\frac{8.252}{300(2 - 1)}} = .166$$
Negligible association between gender and academic performance
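The whole gender and academic performance example can be re-run as a check. The sketch below uses the observed counts from the exercise; because it keeps unrounded cell contributions, χ2 comes out as 8.253 rather than the hand-summed 8.252. The p-value uses the exact df = 2 formula exp(−χ2/2), and the boundary handling in `guilford_strength` (lower bound inclusive) is an assumption.

```python
import math

observed = [[93, 70, 12],   # Male:   High, Moderate, Low
            [87, 32, 6]]    # Female: High, Moderate, Low

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# E = (row total * column total) / grand total, then sum (O - E)^2 / E over all cells.
chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        chi2 += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = math.exp(-chi2 / 2)          # exact upper-tail probability, valid for df = 2 only

k = min(len(observed), len(observed[0]))
c = math.sqrt(chi2 / (chi2 + n))       # contingency coefficient
v = math.sqrt(chi2 / (n * (k - 1)))    # Cramer's V

def guilford_strength(value):
    """Label a coefficient using the rule-of-thumb bands (lower bounds treated as inclusive)."""
    if value < 0.2:
        return "Negligible relationship"
    if value < 0.4:
        return "Low relationship"
    if value < 0.7:
        return "Moderate relationship"
    if value <= 0.9:
        return "High relationship"
    return "Very high relationship"

print(f"chi2 = {chi2:.3f}, df = {df}, p = {p_value:.3f}")
print(f"C = {c:.3f}, V = {v:.3f}, {guilford_strength(v)}")
```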
Invalid Chi Square
• Invalid if more than 1/5 (20%) of cells have an expected count less than 5
• Can be avoided by:
• Collecting larger samples
• Combining data for the smaller expected categories until their combined value is 5 or more
A study was conducted to test the relationship between gender and group participation. The data collected from a randomly selected sample follow.
1. Test the hypothesis on the relationship between the two
variables at .05 level of significance.
2. Calculate and describe an appropriate measure of
association between the two variables.
Group participation
Gender A B C

Female 6 16 8
Male 5 13 2
Q1. Hypothesis Test
a. Hypotheses
HO: Group participation is independent of gender
HA: Group participation is dependent on gender
b. Test statistic. Calculate the expected value for each cell (expected counts in parentheses)
                 Group participation
Gender           A           B            C          Row Totals
Female           6 (6.6)     16 (17.4)    8 (6.0)    30
Male             5 (4.4)     13 (11.6)    2 (4.0)    20
Column Totals    11          29           10         50
Chi-Square
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Group   O     E      (O – E)   (O – E)2   (O – E)2/E
F-A     6     6.6    -0.6      0.36       0.055
F-B     16    17.4   -1.4      1.96       0.113
F-C     8     6.0    2.0       4.0        0.667
M-A     5     4.4    0.6       0.36       0.082
M-B     13    11.6   1.4       1.96       0.169
M-C     2     4.0    -2.0      4.0        1.000
Total   50                                χ2 = 2.086
c. Critical value
$$df = (R - 1)(C - 1) = (2 - 1)(3 - 1) = 1 \times 2 = 2$$
$$\chi^2_{.05,\,2} = 5.99$$
[Figure: chi-square curve with the rejection region (α = .05) to the right of the critical value 5.99; χ2cal = 2.086 falls in the fail-to-reject region]
d. Decision
Since χ2cal (2.086) < χ2critical (5.99), fail to reject HO
e. Conclusion
Group participation is independent of gender at the .05 level of significance
Q2. Measures of Association
For a 2 x 3 contingency table, both the contingency coefficient and Cramer's V are appropriate
$$C = \sqrt{\frac{\chi^2}{\chi^2 + n}} = \sqrt{\frac{2.086}{2.086 + 50}} = .200$$
$$V = \sqrt{\frac{\chi^2}{n(k - 1)}} = \sqrt{\frac{2.086}{50(2 - 1)}} = .204$$
Negligible association between gender and group participation
Is this chi-square test of independence valid?
Look for expected values < 5
Group   O     E      (O – E)   (O – E)2   (O – E)2/E
F-A     6     6.6    -0.6      0.36       0.055
F-B     16    17.4   -1.4      1.96       0.113
F-C     8     6.0    2.0       4.0        0.667
M-A     5     4.4    0.6       0.36       0.082
M-B     13    11.6   1.4       1.96       0.169
M-C     2     4.0    -2.0      4.0        1.000
Total   50                                χ2 = 2.086

Cells with an expected value below 5: M-A (4.4) and M-C (4.0), i.e. 2 of 6 cells
$$\frac{2}{6} \times 100 = 33.33\%$$
Since 33.33% of cells have an expected count less than 5, the chi-square is invalid

Condition to be fulfilled:
Valid if no more than 20% of cells have an expected count less than 5
Invalid if more than 20% of cells have an expected count less than 5
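The validity check can be automated. The sketch below recomputes the expected counts for the gender and group participation table and reports the share of cells below 5; the function names and the `threshold` parameter are illustrative.

```python
def expected_table(observed):
    """Expected count for each cell: (row total * column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

def share_of_small_cells(expected, threshold=5):
    """Fraction of cells whose expected count is below the threshold."""
    cells = [e for row in expected for e in row]
    return sum(1 for e in cells if e < threshold) / len(cells)

observed = [[6, 16, 8],    # Female: A, B, C
            [5, 13, 2]]    # Male:   A, B, C

share = share_of_small_cells(expected_table(observed))
is_valid = share <= 0.20   # valid only if no more than 20% of cells fall below 5

print(f"{share * 100:.2f}% of cells have an expected count below 5")
print("valid" if is_valid else "invalid")
```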
Yates Correction
• When there is only 1 degree of freedom, the regular chi-square test should not be used
• Apply the Yates correction by subtracting 0.5 from the absolute value of each calculated O – E term, then continue as usual with the new corrected values
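A minimal sketch of the correction as described above, for a 2x2 table (the case with 1 degree of freedom). The observed counts are hypothetical and the function name is illustrative.

```python
def yates_chi_square(observed):
    """Chi-square for a 2x2 table with 0.5 subtracted from each |O - E| before squaring."""
    if len(observed) != 2 or any(len(row) != 2 for row in observed):
        raise ValueError("this sketch handles 2x2 tables only")
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            total += (abs(o - e) - 0.5) ** 2 / e
    return total

observed = [[10, 20], [30, 40]]   # hypothetical observed counts
print(round(yates_chi_square(observed), 3))
```

For this table the uncorrected statistic is about 0.794, so the correction lowers the value and makes the test more conservative.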
[Figure: SPSS crosstabulation output for the 2x3 contingency table, showing observed and expected values]
Hypothesis Testing: Chi-Square

Chi-Square Tests
                               Value     df   Asymp. Sig. (2-sided)
Pearson Chi-Square             8.253a    2    .016
Likelihood Ratio               8.370     2    .015
Linear-by-Linear Association   6.762     1    .009
N of Valid Cases               300
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 7.50.

Decision:
sig-χ2 (.016) < .05
Reject HO
Conclusion:
Performance is significantly dependent on student group at the .05 level of significance

Measures of association between variables
Symmetric Measures
                                               Value   Approx. Sig.
Nominal by Nominal   Phi                       .166    .016
                     Cramer's V                .166    .016
                     Contingency Coefficient   .164    .016
N of Valid Cases                               300
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.

Which measure is most appropriate?
For a 2x3 table, use the C or V coefficient
Negligible association between variables
SPSS OUTPUT
• Suppose we want to test for an association between smoking
behavior (nonsmoker, current smoker, or past smoker) and gender
(male or female) using a Chi-Square Test of Independence (we'll
use α = 0.05).

a. Hypotheses
HO: Smoking behavior is independent of gender
HA: Smoking behavior is dependent on gender
b. Report test statistic
χ2 = 3.171, sig-χ2 = .205
c. Set Alpha (significance level) = .05
d. Decision
Since the p-value (.205) is greater than the significance level (α = .05), fail to reject the null hypothesis
e. Conclusion
Conclude that there is not enough evidence to suggest an association between gender and smoking, or
No association was found between gender and smoking behavior (χ2(2) = 3.171, p = .205).
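The reported sig-value can be reproduced from the test statistic. With gender (2 levels) and smoking behavior (3 levels), df = 2, so the exact upper-tail formula exp(−χ2/2) applies; the variable names in this sketch are illustrative.

```python
import math

chi2 = 3.171    # Pearson chi-square reported for the smoking example
df = 2          # (2 - 1) * (3 - 1)
alpha = 0.05

# For df = 2 the upper-tail probability is exactly exp(-chi2 / 2).
p_value = math.exp(-chi2 / 2)
decision = "Reject HO" if p_value <= alpha else "Fail to reject HO"

print(f"p = {p_value:.3f}, decision: {decision}")
```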
How to run the chi-square test in SPSS

Results of Chi-Square

Decision Criteria
Reject HO: sig-χ2 ≤ α
Fail to reject HO: sig-χ2 > α
Interpretation
Reject HO
There is a significant association between IV and DV

Fail to Reject HO
There is no significant association between IV and DV
APA Reporting
A chi-square test of independence was performed to examine the relation between gender and smoking behavior. The relation between these variables was not significant, χ2 (2, N = 193) = 3.171, p > .05. Smoking behavior is not related to gender.
