5950 P11 Chi Square 2019
Analysis
Prepared by:
Prof. Dr Bahaman Abu Samah
Department of Professional Development and Continuing Education
Faculty of Educational Studies
Universiti Putra Malaysia
Serdang
Introduction
• The chi-square distribution has only one parameter –
degrees of freedom
• The shape of the distribution is skewed to the right
(positively skewed) for small df and becomes symmetric for
large df
• The chi-square statistic reflects the magnitude of the
discrepancies between observed and expected counts
• The entire chi-square distribution lies to the right of the y-
axis
• The chi-square value can never be negative
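The properties above can be illustrated with a small simulation. This is a minimal sketch, not part of the original slides: it relies on the fact that a chi-square variable with df degrees of freedom is a sum of df squared standard normal values. The function names, the number of draws, and the seed are illustrative assumptions.

```python
import random

def simulate_chi_square(df, n_draws=20000, seed=0):
    """Draw chi-square values as sums of df squared standard normals."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0, 1) ** 2 for _ in range(df)) for _ in range(n_draws)]

def sample_skewness(values):
    """Moment-based sample skewness (third central moment / variance^1.5)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

skewness_by_df = {}
smallest_draw = {}
for df in (2, 7, 12):
    draws = simulate_chi_square(df)
    smallest_draw[df] = min(draws)              # never below zero
    skewness_by_df[df] = sample_skewness(draws)
    print(df, round(skewness_by_df[df], 2))
```

The printed skewness shrinks as df grows, which matches the statement that the curve becomes more symmetric for large df.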
Chi-Square Distribution
[Figure: chi-square density curves for df = 2, df = 7, and df = 12, plotted against χ2 from 1 to 21]
Types of Chi-Square
Goodness-of-fit
To test certain assumption on the distribution of a
categorical variable
Test of Independence
Test on association between variables regarding
contingency tables
Test of Homogeneity
Test on the difference/proportion between groups
1 χ2 Goodness-of-Fit
Introduction
• Aims at comparing the actual frequencies within
each category of a nominal variable against its
expected frequency of a model based on the
probability theory
• Developed by Karl Pearson in 1900
• Also known as one sample Chi-Square Test
Purpose
• Test the assumption on the distribution of a
categorical variable
• Example:
The ratio of male to female teachers in the teaching
profession is 3:7
Requirements
• One categorical variable (nominal/ordinal) with two or more categories
• No more than 20% of the expected frequencies should be smaller than 5
• If more than 20% of the expected frequencies are smaller than 5, it is
advisable to collapse adjacent categories
• Calculation is based on:
O - Observed frequency
E - Expected frequency
E = np
where n – sample size
p – probability/proportion
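The requirements above can be checked before running the test. A minimal Python sketch, assuming a hypothetical sample of 50 teachers under the 3:7 ratio used in the earlier example; the function names and the sample size are illustrative, not from the slides.

```python
def expected_counts(n, proportions):
    """E = n * p for each category; the proportions must sum to 1."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return [n * p for p in proportions]

def share_of_small_cells(expected, threshold=5):
    """Fraction of categories whose expected count is below the threshold."""
    small = sum(1 for e in expected if e < threshold)
    return small / len(expected)

# Hypothetical sample of 50 teachers, assumed ratio 3:7 (male : female).
expected = expected_counts(50, [0.3, 0.7])
requirement_met = share_of_small_cells(expected) <= 0.20
print(expected, requirement_met)
```

If `requirement_met` is False, the slides advise collapsing adjacent categories before testing.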
What to Expect?
Manual
$\chi^2 = \sum \frac{(O - E)^2}{E}$
Criteria Decision
χ2cal ≥ χ2critical Reject HO
χ2cal < χ2critical Fail to reject HO
5-Step Hypothesis Test
1 State HO and HA
2 Calculate χ2
3 Determine critical value
4 Decision
5 Conclusion
Step 1: State HO & HA
HO: Statement of assumption
HA: Statement opposite of the assumption
Step 2: Calculate Test Statistic
Calculation of χ2 is based on frequency counts:
1. Observed (O)
2. Expected (E)
E = np where
n = Sample size
p = Probability/Proportion
$\chi^2 = \sum \frac{(O - E)^2}{E}$
Step 3: Determine Critical Value
Critical value is based on:
Significance level (α)
Degrees of freedom
$df = k - 1$
Where k = no. of levels/groups
Critical value: $\chi^2_{\alpha,\, df}$
Step 4: Decision
– Only two (2) possible decisions.
– Reject or Fail to Reject HO
Manual:
Criteria Decision
χ2cal ≥ χ2critical Reject HO
χ2cal < χ2critical Fail to reject HO
SPSS:
Criteria Decision
sig-χ2 ≤ α Reject HO
sig-χ2 > α Fail to reject HO
Assumption:
p1 = p2 = p3 = p4 = p5 = .20
Step 1: Hypotheses
HO: The proportion of people involved in traffic
violations is the same for all age groups
HA: The proportion of people involved in traffic
violations differs by age group
Step 2: Test Statistics
Age O E (O – E) (O – E)2 (O – E)2/E
< 20 32 20 12 144 7.20
20-29 25 20 5 25 1.25
30-39 19 20 -1 1 .05
40-49 16 20 -4 16 .80
≥ 50 8 20 -12 144 7.20
Total 100 χ2 = 16.50
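The table above can be reproduced in a few lines. A minimal Python sketch using the observed counts and the assumed proportions (.20 per age group) from this example; the function name is illustrative.

```python
def chi_square_gof(observed, proportions):
    """Return (chi2, expected, per-category contributions) for a goodness-of-fit test."""
    n = sum(observed)
    expected = [n * p for p in proportions]                          # E = np
    contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
    return sum(contributions), expected, contributions

observed = [32, 25, 19, 16, 8]            # age groups: <20, 20-29, 30-39, 40-49, >=50
chi2, expected, contributions = chi_square_gof(observed, [0.20] * 5)
df = len(observed) - 1                    # df = k - 1

print(round(chi2, 2), df)
print([round(c, 2) for c in contributions])
```

The contributions list corresponds to the last column of the table, and their sum is the χ2 value carried into Step 3.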
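Step 3: Critical value
$df = k - 1 = 5 - 1 = 4$
α = .05
$\chi^2_{4,\, .05} = 9.49$
[Figure: chi-square curve with the critical value 9.49 separating the "Fail to reject HO" region from the "Reject HO" region]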
Step 4: Decision
Criteria Decision
χ2cal ≥ χ2critical Reject HO
χ2cal < χ2critical Fail to reject HO
Decision
Since χ2cal (16.5) is bigger than χ2critical (9.49),
reject HO
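The same decision can be cross-checked with a sig value (p-value), as in the SPSS criterion. A minimal Python sketch, assuming an even df so that the upper-tail probability of the chi-square distribution has a simple closed form; the function names are illustrative and are not SPSS or library functions.

```python
import math

def chi_square_sf_even_df(x, df):
    """P(chi-square > x) for a positive even df, via the closed-form series."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    half = x / 2.0
    terms = sum(half ** j / math.factorial(j) for j in range(df // 2))
    return math.exp(-half) * terms

def decide(chi2_cal, df, alpha=0.05):
    """Apply the sig <= alpha rule and return (decision, sig)."""
    sig = chi_square_sf_even_df(chi2_cal, df)
    decision = "Reject HO" if sig <= alpha else "Fail to reject HO"
    return decision, sig

decision, sig = decide(16.5, 4)                  # traffic-violation example
tail_at_critical = chi_square_sf_even_df(9.49, 4)
print(decision, sig, tail_at_critical)
```

The tail probability at the critical value 9.49 comes out close to .05, so the critical-value rule and the sig rule lead to the same decision.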
Step 5: Conclusion
Conclude that the proportion of people summoned for
traffic violations differs significantly by age
group at the .05 level of significance.
7 Basic Key
Information
Purpose
Test the assumption pertaining to the distribution
of a categorical variable
Requirements
► Only one categorical variable (Nominal/Ordinal)
► Calculation is based on:
O - Observed frequency
E - Expected frequency
E = np
where n – sample size
p – probability/proportion
How to run the chi-square goodness-of-fit test in
SPSS
Results of Chi-Square
Decision Criteria
Reject HO: sig-χ2 ≤ α
Fail to reject HO: sig-χ2 > α
Interpretation
Reject HO
There is a significant difference from the
assumption
Fail to Reject HO
There is no significant difference from the
assumption
Example
• The table below, gift_type, provides the observed frequencies
(Observed N) for each gift, as well as the expected frequencies
(Expected N), which are the frequencies expected if the null
hypothesis is true. The difference between the observed and
expected frequencies is provided in the Residual column.
• Gift type:
• 1 = Gift Certificate
• 2 = Cuddly Toy
• 3 = Cinema Tickets
…cont.
• The table below, Test Statistics, provides the actual result of the chi-
square goodness-of-fit test.
Step 1: State HO & HA
HO: The proportions of respondents who
received each type of gift are the
same
HA: The proportions of respondents who
received each type of gift are
different
Step 2: Report test statistics and sig-
value
Step 3: Determine Significance Level
(Alpha)
Set at either .05 or .01
In this case, .05
Reject HO:
Significant different from
SPSS:
Criteria Decision
sig-χ2 ≤ α Reject HO
Test of
Independence
Introduction
• Aims at finding out whether the 2 qualitative
variables are independent of each other or related to
each other by taking into account the proportion of
responses found in the combination of different
categories of these two variables
• Also known as two independent samples chi-square
test
Purpose &
Requirement
Purpose:
Test association between two categorical variables and
determine the strength of the association
Requirement:
DV − Nominal/Ordinal
IV − Nominal/Ordinal
5-Step Hypothesis Test
1 State HO and HA
2 Calculate χ2
3 Determine critical value
4 Decision
5 Conclusion
Step 1: State HO & HA
HO: DV is independent of IV
HA: DV is dependent on IV
Note:
Please follow the above stated format; otherwise the
meaning is reversed
Example:
DV = Academic performance
IV = Student group
HO: Academic performance is
independent of student group
HA: Academic performance is dependent
on student group
Step 2: Calculate Test Statistic
1. Calculate Expected Count (E) for each of the cells
in the contingency table:
$E = \frac{RT \times CT}{GT}$
where RT = row total, CT = column total, GT = grand total
2. Calculate the Chi-Square value, using the following
formula:
$\chi^2 = \sum \frac{(O - E)^2}{E}$
Step 3: Determine Critical Value
Critical value is based on:
Significance level (α)
Degrees of freedom
$df = (R - 1)(C - 1)$
Critical value: $\chi^2_{\alpha,\, df}$
Step 4: Decision
– Only two (2) possible decisions.
– Reject or Fail to Reject HO
Criteria Decision
χ2cal ≥ χ2critical Reject HO
χ2cal < χ2critical Fail to reject HO
SPSS:
Reject HO: sig-χ2 ≤ α
Gender High Moderate Low
Male 93 70 12
Female 87 32 6
Q1. Hypothesis Test
a. Hypotheses
HO: Academic performance is independent of gender
HA: Academic performance is dependent on gender
b. Test statistic. Calculate expected value for each cell
Academic Performance (expected counts in parentheses)
Gender High Moderate Low Row Totals
Male 93 (105.0) 70 (59.5) 12 (10.5) 175
Female 87 (75.0) 32 (42.5) 6 (7.5) 125
Grand total 300
χ2 = 8.252
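The expected counts and the test statistic for this table can be recomputed as follows. A minimal Python sketch using the observed counts from this example; the function names are illustrative. The unrounded sum is about 8.2528, which the slide reports as 8.252.

```python
def expected_table(observed):
    """E = (row total * column total) / grand total for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

def chi_square_independence(observed):
    """Return (chi2, df, expected) for an R x C contingency table."""
    expected = expected_table(observed)
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(observed) - 1) * (len(observed[0]) - 1)   # (R - 1)(C - 1)
    return chi2, df, expected

# Rows: Male, Female; columns: High, Moderate, Low academic performance.
observed = [[93, 70, 12], [87, 32, 6]]
chi2, df, expected = chi_square_independence(observed)
print(expected)
print(round(chi2, 3), df)
```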
…Cont.
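c. Critical value
$df = (R - 1)(C - 1) = (2 - 1)(3 - 1) = 1 \times 2 = 2$
α = .05
$\chi^2_{2,\, .05} = 5.99$
[Figure: chi-square curve with the critical value 5.99 separating the "Fail to reject HO" and "Reject HO" regions; χ2cal = 8.252 falls in the "Reject HO" region]
d. Decision
Since χ2cal (8.252) > χ2critical (5.99),
reject HO
e. Conclusion
Academic performance is significantly dependent on gender at the
.05 level of significance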
…Cont.
Q2. Measures of Association
For a 2 x 3 contingency table, both contingency and
Cramer’s V coefficients are appropriate
$C = \sqrt{\frac{\chi^2}{\chi^2 + n}} = \sqrt{\frac{8.252}{8.252 + 300}} = .164$
$V = \sqrt{\frac{\chi^2}{n(k - 1)}} = \sqrt{\frac{8.252}{300(2 - 1)}} = .166$
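Both coefficients follow directly from χ2 and n. A minimal Python sketch with illustrative function names; it assumes k is the smaller of the number of rows and columns, which gives k = 2 for a 2 x 3 table as in the calculation above.

```python
import math

def contingency_coefficient(chi2, n):
    """C = sqrt(chi2 / (chi2 + n))."""
    return math.sqrt(chi2 / (chi2 + n))

def cramers_v(chi2, n, n_rows, n_cols):
    """V = sqrt(chi2 / (n * (k - 1))), with k = min(rows, columns)."""
    k = min(n_rows, n_cols)
    return math.sqrt(chi2 / (n * (k - 1)))

c = contingency_coefficient(8.252, 300)
v = cramers_v(8.252, 300, n_rows=2, n_cols=3)
print(round(c, 3), round(v, 3))
```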
Gender A B C
Female 6 16 8
Male 5 13 2
Q1. Hypothesis Test
a. Hypotheses
HO: Group participation is independent of gender
HA: Group participation is dependent on gender
b. Test statistic. Calculate expected value for each cell
Group participation (expected counts in parentheses)
Gender A B C Row Totals
Female 6 (6.6) 16 (17.4) 8 (6.0) 30
Male 5 (4.4) 13 (11.6) 2 (4.0) 20
Column Totals 11 29 10 50
Chi-Square: $\chi^2 = \sum \frac{(O - E)^2}{E}$
Group O E (O – E) (O – E)2 (O – E)2/E
F-A 6 6.6 -0.6 0.36 0.055
F-B 16 17.4 -1.4 1.96 0.113
F-C 8 6.0 2.0 4.0 0.667
M-A 5 4.4 0.6 0.36 0.082
M-B 13 11.6 1.4 1.96 0.169
M-C 2 4.0 -2.0 4.0 1.000
Total 50 2.086
…Cont.
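c. Critical value
$df = (R - 1)(C - 1) = (2 - 1)(3 - 1) = 1 \times 2 = 2$
α = .05
$\chi^2_{2,\, .05} = 5.99$
[Figure: chi-square curve with the critical value 5.99 separating the "Fail to reject HO" and "Reject HO" regions; χ2cal = 2.086 falls in the "Fail to reject HO" region]
d. Decision
Since χ2cal (2.086) < χ2critical (5.99),
fail to reject HO
e. Conclusion
Group participation is independent of gender at the .05
level of significance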
…Cont.
Q2. Measures of Association
For a 2 x 3 contingency table, both contingency and
Cramer’s V coefficients are appropriate
$C = \sqrt{\frac{\chi^2}{\chi^2 + n}} = \sqrt{\frac{2.086}{2.086 + 50}} = .200$
$V = \sqrt{\frac{\chi^2}{n(k - 1)}} = \sqrt{\frac{2.086}{50(2 - 1)}} = .204$
Negligible association between gender and group
participation
Look for cells with an expected count less than 5:
$\frac{2}{6} \times 100 = 33.33\%$
Condition to be fulfilled:
Valid if fewer than 20% of cells have an expected count less than 5
Invalid if more than 20% of cells have an expected count less than 5
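The 20% condition can be checked directly from the observed table. A minimal Python sketch using the gender by group participation counts from this example; the function names are illustrative.

```python
def expected_table(observed):
    """E = (row total * column total) / grand total for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

def small_expected_share(observed, threshold=5):
    """Return (small cells, total cells, percent) for expected counts below threshold."""
    cells = [e for row in expected_table(observed) for e in row]
    n_small = sum(1 for e in cells if e < threshold)
    return n_small, len(cells), 100 * n_small / len(cells)

# Rows: Female, Male; columns: groups A, B, C.
observed = [[6, 16, 8], [5, 13, 2]]
n_small, n_cells, percent = small_expected_share(observed)
condition_met = percent <= 20
print(n_small, n_cells, round(percent, 2), condition_met)
```

Here two of the six cells (expected counts 4.4 and 4.0) fall below 5, so the condition is not met for this table.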
Yates Correction
• When there is only 1 degree of freedom, the regular chi-square test should not be used
• Apply the Yates correction by subtracting 0.5 from the absolute value of each
calculated O − E term, then continue as usual with the new corrected values
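The correction can be sketched in Python as below. The 2 x 2 counts are hypothetical and chosen only for illustration; the function name and the `yates` flag are also assumptions, not SPSS options.

```python
def chi_square_2x2(observed, yates=True):
    """Chi-square for a 2 x 2 table, optionally with the Yates correction."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            diff = abs(o - e)
            if yates:
                diff -= 0.5          # subtract 0.5 from |O - E|
            chi2 += diff ** 2 / e
    return chi2

table = [[12, 8], [5, 15]]           # hypothetical counts, df = 1
corrected = chi_square_2x2(table, yates=True)
uncorrected = chi_square_2x2(table, yates=False)
print(round(corrected, 3), round(uncorrected, 3))
```

The corrected value is smaller than the uncorrected one, so the correction makes the test more conservative.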
[SPSS output: 2x3 contingency table showing observed values and expected values]
…Cont.
Hypothesis Testing: Chi-Square
Chi-Square Tests
Value df Asymp. Sig. (2-sided)
Pearson Chi-Square 8.253a 2 .016
Likelihood Ratio 8.370 2 .015
Linear-by-Linear Association 6.762 1 .009
N of Valid Cases 300
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 7.50.
Decision:
sig-χ2 (.016) < .05
Reject HO
Conclusion:
Performance is significantly dependent on student group at the .05 level of sig
Measures of Association between variables
Symmetric Measures
Value Approx. Sig.
Nominal by Nominal Phi .166 .016
Cramer's V .166 .016
Contingency Coefficient .164 .016
N of Valid Cases 300
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
Which measure is most appropriate?
For 2x3 table, use C or V coefficient
Negligible association between variables
SPSS OUTPUT
• Suppose we want to test for an association between smoking
behavior (nonsmoker, current smoker, or past smoker) and gender
(male or female) using a Chi-Square Test of Independence (we'll
use α = 0.05).
a. Hypotheses
Results of Chi-Square
Decision Criteria
Reject HO: sig-χ2 ≤ α
Fail to reject HO: sig-χ2 > α
Interpretation
Reject HO
There is significant association between
IV and DV
Fail to Reject HO
There is no significant association
between IV and DV
APA Reporting
A chi-square test of independence was
performed to examine the relation between
gender and smoking behavior. The relation
between these variables was not significant, χ2
(2, N = 193) = 3.171, p > .05. Smoking
behavior is not related to gender.