Homework Problems - Chi-Square Test of Independence


SW388R6

Data Analysis
and Computers I
Chi-square Test of Independence
Slide 1

Reviewing the Chi-square Test of Independence

Sample Homework Problem

Solving the Problem with SPSS

Logic for Chi-square Test of Independence


Chi-square Test of Independence

Slide 2

• The chi-square test of independence is one of the most frequently used hypothesis tests in the social sciences because it can be used with variables at any level of measurement.

• In this set of problems, we will use the chi-square test of independence to evaluate group differences when the dependent variable is nominal, dichotomous, ordinal, or grouped interval.

• The chi-square test of independence can be used for any variable; both the group (independent) and the test variable (dependent) can be nominal, dichotomous, ordinal, or grouped interval.

Chi-square test of independence vs. other statistical tests comparing groups - 1

Slide 3

• Tests of means (t-tests and ANOVA) compare the central tendency of groups so that we can say whether one group tends to score higher or lower on the dependent variable.

• The chi-square test of independence presumes that all variables are nominal, and tests for differences within individual categories of the dependent variable.

• For example, a chi-square test of independence will tell us whether there were more males in the satisfied category than we would have expected by chance, but this does not imply that males were generally more satisfied than females.

Chi-square test of independence vs. other statistical tests comparing groups - 2

Slide 4

• To demonstrate, look at the following table:

                    Males   Females   Row Total
1 = Dissatisfied      20        5         25
2 = Neutral           10       35         45
3 = Satisfied         30       20         50
Column Total          60       60        120

• Since the total number of males and females is the same (60), we would expect the 50 subjects in the Satisfied category to be evenly split (25 males and 25 females). Having 30 males instead of 25 implies that males are over-represented in this group.

• Yet females have a higher mean score than males (2.25 vs. 2.17), so the overall trend favors females even though males are over-represented in the Satisfied category.
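
The arithmetic behind this table can be checked with a short Python sketch. It uses only the counts shown in the table; the variable names are illustrative and are not part of the original slides.

```python
# Counts from the satisfaction table: score -> (males, females)
table = {1: (20, 5), 2: (10, 35), 3: (30, 20)}

males_total = sum(m for m, f in table.values())
females_total = sum(f for m, f in table.values())
grand_total = males_total + females_total

# Expected number of males in the Satisfied row if sex and satisfaction were unrelated
satisfied_total = sum(table[3])
expected_males_satisfied = satisfied_total * males_total / grand_total

# Mean satisfaction score per group, treating the codes 1 to 3 as numeric scores
mean_males = sum(score * m for score, (m, f) in table.items()) / males_total
mean_females = sum(score * f for score, (m, f) in table.items()) / females_total

print(expected_males_satisfied)                       # 25.0
print(round(mean_males, 2), round(mean_females, 2))   # 2.17 2.25
```

The two results point in different directions, which is the point of the slide: the cell-level comparison and the comparison of means answer different questions.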

Independence Defined

Slide 5

• Two variables are independent if, for all cases, the classification of a case into a particular category of the independent variable has no effect on the probability that the case will fall into any particular category of the dependent variable.

• When two variables are independent, there is no relationship between them. We would expect the frequency breakdowns of the dependent variable to be similar for all groups.

Independence Demonstrated

Slide 6

• Suppose we are interested in the relationship between gender and attending college.

• If there is no relationship between gender and attending college, and 40% of our total sample attend college, we would expect 40% of the males in our sample to attend college and 40% of the females to attend college.

• If there is a relationship between gender and attending college, we would expect a higher proportion of one group to attend college than the other group, e.g. 60% to 20%.

Displaying Independent and Dependent Relationships

Slide 7

When the variables are independent, the proportion in both groups is close to the same size as the proportion for the total sample.

When group membership makes a difference, the dependent relationship is indicated by one group having a higher proportion than the proportion for the total sample.

[Figure: two bar charts of Proportion Attending College (0% to 100%) for Males, Females, and Total. "Independent Relationship between Gender and College" shows 40% in every bar; "Dependent Relationship between Gender and College" shows one group at 60% and the other at 20%, against 40% for the total.]

Expected Frequencies

Slide 8

• Expected frequencies are computed as if there is no difference between the groups, i.e. both groups have the same proportion as the total sample in each category of the dependent variable.

• Since the proportion of subjects in each category of the independent variable can differ, we take group category into account in computing expected frequencies as well.

• To summarize, the expected frequencies for each cell are computed to be proportional to both the breakdown for the dependent variable and the breakdown for the independent variable.
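
One way to sketch this computation in Python is shown below. The function name `expected_counts` and the 2 x 2 counts are hypothetical; they are chosen so that 40% of the sample attends college, as in the gender and college example.

```python
def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Hypothetical counts: rows = attends college (yes, no), columns = (males, females)
observed = [[30, 10],
            [30, 30]]

expected = expected_counts(observed)
print(expected)  # [[24.0, 16.0], [36.0, 24.0]]
```

In the expected table, 24 of 60 males and 16 of 40 females attend college, which is 40% of each group, matching the proportion for the total sample.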

Expected Frequency Calculation

Slide 9

The data from “Observed Frequencies for Sample Data” is the source for information to compute the expected frequencies. Percentages are computed for the column of all students and for the row of all GPAs. For each cell, the corresponding column percentage and row percentage are multiplied together and then by the total number of students in the sample (453) to compute the expected frequency for that cell.

Expected Frequencies vs. Observed Frequencies

Slide 10

• The chi-square test of independence plugs the observed frequencies and expected frequencies into a formula which computes how the pattern of observed frequencies differs from the pattern of expected frequencies.

• Probabilities for the test statistic can be obtained from the chi-square probability distribution so that we can test hypotheses.

• The chi-square test of independence is a test of the influence or impact that a subject’s score on the independent variable has on the same subject’s score for the dependent variable.

Level of measurement and sample size requirements

Slide 11

• The chi-square test of independence can be used for variables at any level of measurement, including interval level variables grouped in a frequency distribution. It is most useful for nominal variables, for which we do not have another option.

• Sample size requirement:
  • No cell has an expected frequency less than 1.
  • No more than 20% of cells have an expected frequency less than 5.

• If these requirements are violated, the chi-square distribution will give us misleading probabilities.
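
The two sample size rules can be expressed as a small check. This is a minimal sketch; the function name `check_sample_size` and both example tables of expected frequencies are hypothetical.

```python
def check_sample_size(expected):
    """Return (ok, minimum expected frequency, share of cells with expected < 5)."""
    cells = [e for row in expected for e in row]
    min_expected = min(cells)
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    ok = min_expected >= 1 and share_below_5 <= 0.20
    return ok, min_expected, share_below_5

# Hypothetical table that meets both rules
good = check_sample_size([[12.5, 12.5], [22.5, 22.5], [25.0, 25.0]])

# Hypothetical table with one expected frequency below 1
bad = check_sample_size([[0.8, 9.2], [7.2, 82.8]])

print(good)  # (True, 12.5, 0.0)
print(bad)   # (False, 0.8, 0.25)
```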

Hypotheses

Slide 12

• The research hypothesis states that the two variables are dependent or related. This will be true if the observed counts for the categories of the variables in the sample are different from the expected counts.

• The null hypothesis is that the two variables are independent. This will be true if the observed counts in the sample are equal to the expected counts.

• The decision rule for the chi-square test of independence is the same as our other statistical tests: reject the null hypothesis if the probability of the test statistic is less than or equal to alpha.

Sampling distribution and test statistic

Slide 13

• To test the relationship, we use the chi-square test statistic, which follows the chi-square distribution.

• If we were calculating the statistic by hand, we would have to compute the degrees of freedom to identify the probability of the test statistic. SPSS will print out the degrees of freedom and the probability of the test statistic for us.
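
For a hand calculation, the degrees of freedom of a cross-tabulated table are (rows - 1) x (columns - 1). The sketch below is an illustration, not the SPSS procedure: the function names are hypothetical, and the probability function uses a closed form that is valid only when the degrees of freedom equal 2.

```python
import math

def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom for a chi-square test of independence."""
    return (n_rows - 1) * (n_cols - 1)

def p_value_df2(chi_square):
    """Upper-tail probability of a chi-square statistic; valid only for df = 2."""
    return math.exp(-chi_square / 2)

# A table with 2 rows and 3 columns has 2 degrees of freedom
df = degrees_of_freedom(2, 3)

# 5.99 is the familiar critical value for df = 2 at alpha = .05
p = p_value_df2(5.99)

print(df, round(p, 3))  # 2 0.05
```

For other degrees of freedom, a statistics package or a chi-square table is needed to obtain the probability.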

Computing the Test Statistic

Slide 14

• Conceptually, the chi-square test of independence statistic is computed by taking, for each cell in the table, the squared difference between the observed and expected frequencies divided by the expected frequency for the cell, and summing these values over all cells.

• We identify the value and probability for this test statistic from the SPSS statistical output.
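
As a rough illustration of the computation, the sketch below sums (observed - expected)² / expected over all cells. The function name is hypothetical, and the counts are the satisfaction by sex table from Slide 4, used here only as convenient sample data.

```python
def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over every cell of the table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under independence
            total += (o - e) ** 2 / e
    return total

# Rows: Dissatisfied, Neutral, Satisfied; columns: Males, Females
observed = [[20, 5],
            [10, 35],
            [30, 20]]

chi_square = chi_square_statistic(observed)
print(round(chi_square, 2))  # 24.89
```

Squaring the differences keeps positive and negative deviations from canceling, and dividing by the expected count scales each cell's contribution.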

Decision and Interpretation

Slide 15

• If the probability of the test statistic is less than or equal to the probability of the alpha error rate, we reject the null hypothesis and conclude that our data supports the research hypothesis. We conclude that there is a relationship between the variables.

• If the probability of the test statistic is greater than the probability of the alpha error rate, we fail to reject the null hypothesis. We do not have evidence of a relationship between the variables, so we treat them as independent.

Which Cell or Cells Caused the Difference

Slide 16

• One of the problems in interpreting chi-square tests is the determination of which cell or cells produced the statistically significant difference. Examination of percentages in the contingency table and expected frequency table can be misleading.

• The residual, or the difference, between the observed frequency and the expected frequency is a more reliable indicator, especially if the residual is converted to a z-score and compared to a critical value equivalent to the alpha for the problem.

• Like the post-hoc tests in one-way ANOVA, we only conduct post-hoc tests if the result of the chi-square test of independence is statistically significant.

Standardized Residuals

Slide 17

• SPSS prints out the standardized residual (converted to a z-score) computed for each cell. It does not produce the probability or significance.

• Without a probability, we will compare the size of the standardized residuals to the critical values that correspond to an alpha of 0.05 (+/-1.96) or an alpha of 0.01 (+/-2.58). The problems will tell you which value to use. This is equivalent to testing the null hypothesis that the actual frequency equals the expected frequency for a specific cell versus the research hypothesis that the two differ.

• There can be 0, 1, 2, or more cells with statistically significant standardized residuals to be interpreted.

Interpreting Standardized Residuals

Slide 18

• Standardized residuals that have a positive value mean that the cell was over-represented in the actual sample, compared to the expected frequency, i.e. there were more subjects in this category than we expected.

• Standardized residuals that have a negative value mean that the cell was under-represented in the actual sample, compared to the expected frequency, i.e. there were fewer subjects in this category than we expected.
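
The sign and size rules can be sketched in Python. This assumes the standardized residual is the observed count minus the expected count, divided by the square root of the expected count; the function name is hypothetical and the counts are the satisfaction by sex table from Slide 4.

```python
import math

def standardized_residuals(observed):
    """(observed - expected) / sqrt(expected) for every cell of the table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[(o - r * c / n) / math.sqrt(r * c / n)
             for o, c in zip(row, col_totals)]
            for row, r in zip(observed, row_totals)]

# Rows: Dissatisfied, Neutral, Satisfied; columns: Males, Females
observed = [[20, 5], [10, 35], [30, 20]]
residuals = standardized_residuals(observed)

critical_value = 1.96  # corresponds to alpha = .05
for label, row in zip(["Dissatisfied", "Neutral", "Satisfied"], residuals):
    for sex, z in zip(["Males", "Females"], row):
        direction = "over-represented" if z > 0 else "under-represented"
        significant = abs(z) >= critical_value
        print(label, sex, round(z, 2), direction, significant)
```

With these counts, the Dissatisfied and Neutral cells exceed the critical value, while the Satisfied cells (residuals of +1.0 and -1.0) do not.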

Interpreting Cell Differences in a Chi-square Test - 1

Slide 19

A chi-square test of independence of the relationship between sex and marital status finds a statistically significant relationship between the variables.

Interpreting Cell Differences in a Chi-square Test - 2

Slide 20

Researchers often try to identify which cell or cells are the major contributors to the significant chi-square test by examining the pattern of column percentages.

Based on the column percentages, we would identify cells on the married row and the widowed row as the ones producing the significant result because they show the largest differences: 8.2% on the married row (50.9%-42.7%) and 9.0% on the widowed row (13.1%-4.1%).

Interpreting Cell Differences in a Chi-square Test - 3

Slide 21

Using a level of significance of 0.05, the critical values for a standardized residual would be -1.96 and +1.96. Using standardized residuals, we would find that only the cells on the widowed row are the significant contributors to the chi-square relationship between sex and marital status.

If we interpreted the married row as a contributor, we would be mistaken. Basing the interpretation on column percentages can be misleading.

Chi-square test of independence: APA Style - 1

Slide 22

A chi-square test of independence was performed to examine the relation between religion and college interest. The relation between these variables was significant, χ²(2, N = 170) = 14.14, p < .01. Catholic teens were less likely to show an interest in attending college than were Protestant teens.

χ²(2, N = 170) = 14.14, p < .01

• 2: degrees of freedom (from SPSS output)
• N = 170: number of valid cases
• 14.14: value of statistic
• p < .01: significance of statistic

Source: depts.washington.edu/psywc/handouts/pdf/stats.pdf

Chi-square test of independence: APA Style - Example 2

Slide 23

A chi-square test of independence indicated that there is a significant relationship between the size of the group and helping behavior, χ²(2, N = 52) = 7.91, p < .05. As shown in Table 1, as group size increased, helping behavior decreased.

Table 1. The percentage of observers who assisted at each group size

                 Observer Behavior
Group Size   Helped     Did not Help   Total
             (n = 31)   (n = 21)
_________________________________________________
2            35%        9%             n = 13
3            52%        48%            n = 26
6            13%        43%            n = 13
_________________________________________________

Source: www.ithaca.edu/faculty/alynn/Chi square independence.pdf
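
As a plausibility check on the reported statistic, the sketch below rebuilds approximate cell counts from Table 1 and recomputes chi-square. The counts are an assumption: they are obtained by multiplying each column percentage by its column n and rounding, and are not given in the source. The function name is hypothetical.

```python
def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over every cell of the table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum((o - r * c / n) ** 2 / (r * c / n)
               for row, r in zip(observed, row_totals)
               for o, c in zip(row, col_totals))

# Column percentages from Table 1: group size -> (helped, did not help)
percentages = {2: (0.35, 0.09), 3: (0.52, 0.48), 6: (0.13, 0.43)}
n_helped, n_not_helped = 31, 21

# Assumed counts, reconstructed by rounding percentage * column n
observed = [[round(h * n_helped), round(d * n_not_helped)]
            for h, d in percentages.values()]

chi_square = chi_square_statistic(observed)
print(observed)              # [[11, 2], [16, 10], [4, 9]]
print(round(chi_square, 2))  # 7.91
```

The reconstructed row totals (13, 26, 13) match the totals in Table 1, and the recomputed value agrees with the reported 7.91.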


Sample homework problem: Chi-square test of independence

Slide 24

This problem uses the data set GSS2000R.Sav to compare the breakdown for the
variable "should marijuana be made legal" [grass] for groups of survey
respondents defined by the variable "general happiness" [happy]. Using a chi-
square test of independence, is the following statement true, true with caution,
false, or an incorrect application of a statistic? Use .05 as alpha and 1.96 as the
critical value for the post hoc test.

Within the group of survey respondents who said that overall they were not too
happy, there were significantly more who thought the use of marijuana should
be made legal than would have been expected based on the breakdown of
"should marijuana be made legal" by "general happiness".

o True
o True with caution
o False
o Incorrect application of a statistic

This is the general framework for the problems in the homework assignment on the chi-square test of independence. The description is similar to findings one might state in a research article.

Sample homework problem: Data set, variables, and criteria

Slide 25

This problem uses the data set GSS2000R.Sav to compare the breakdown for the
variable "should marijuana be made legal" [grass] for groups of survey
respondents defined by the variable "general happiness" [happy]. Using a chi-
square test of independence, is the following statement true, true with caution,
false, or an incorrect application of a statistic? Use .05 as alpha and 1.96 as the
critical value for the post hoc test.

Within the group of survey respondents who said that overall they were not too happy, there were significantly more who thought the use of marijuana should be made legal than would have been expected based on the breakdown of "should marijuana be made legal" by "general happiness".

o True
o True with caution
o False
o Incorrect application of a statistic

The first paragraph identifies:
• The data set to use, e.g. GSS2000R.Sav
• The independent variable that defines the groups to be compared in the analysis
• The dependent variable that is compared across groups
• The alpha level to use in the hypothesis test
• The critical value used for the post hoc test

Sample homework problem: Specifications for the test

Slide 26

This problem uses the data set GSS2000R.Sav to compare the breakdown for the
variable "should marijuana be made legal" [grass] for groups of survey
respondents defined by the variable "general happiness" [happy]. Using a chi-
square test of independence, is the following statement true, true with caution,
false, or an incorrect application of a statistic? Use .05 as alpha and 1.96 as the
critical value for the post hoc test.

Within the group of survey respondents who said that overall they were not too
happy, there were significantly more who thought the use of marijuana should
be made legal than would have been expected based on the breakdown of
"should marijuana be made legal" by "general happiness".

o True
o True with caution
o False
o Incorrect application of a statistic

The second paragraph states the finding that we want to verify with the chi-square test of independence. The finding identifies:
• The specific group of the independent variable that is over- or under-represented for the stated category of the dependent variable
• The direction of the relationship, i.e. over-representation (significantly more) or under-representation (significantly fewer).

Sample homework problem: Choosing an answer

Slide 27

This problem uses the data set GSS2000R.Sav to compare the breakdown for the variable "should marijuana be made legal" [grass] for groups of survey respondents defined by the variable "general happiness" [happy]. Using a chi-square test of independence, is the following statement true, true with caution, false, or an incorrect application of a statistic? Use .05 as alpha and 1.96 as the critical value for the post hoc test.

Within the group of survey respondents who said that overall they were not too happy, there were significantly more who thought the use of marijuana should be made legal than would have been expected based on the breakdown of "should marijuana be made legal" by "general happiness".

o True
o True with caution
o False
o Incorrect application of a statistic

The answer to a problem will be True if the test of independence supports the finding in the problem statement. The answer to a problem will be False if the test of independence does not support the finding in the problem statement.

Since it is legitimate to use ordinal variables in the chi-square test of independence, True with caution is not used for these problems.

The answer to a problem will be Incorrect application of a statistic if the test of independence violates the sample size requirement, i.e. no expected frequencies less than 1 and no more than 20% of the cells have expected frequencies less than 5.

Solving the problem with SPSS: Level of measurement

Slide 28

In the chi-square test of independence, the level of measurement for the independent and the dependent variable can be any level that defines groups (dichotomous, nominal, ordinal, or grouped interval). "Should marijuana be made legal" [grass] is dichotomous and "general happiness" [happy] is ordinal, so the level of measurement requirements are satisfied.

Solving the problem with SPSS: The chi-square test of independence - 1

Slide 29

The chi-square test of independence is computed for cross-tabulated tables.

Select Descriptive Statistics > Crosstabs… from the Analyze menu.

Solving the problem with SPSS: The chi-square test of independence - 2

Slide 30

The finding we are trying to verify is: Within the group of survey respondents who said that overall they were not too happy, there were significantly more who thought the use of marijuana should be made legal than would have been expected based on the breakdown of "should marijuana be made legal" by "general happiness".

We first create a cross-tabulated table with the independent variable in the columns and the dependent variable in the rows.

First, move the dependent variable grass to the Row(s) list box.

Second, move the independent variable happy to the Column(s) list box.

Third, click on the Statistics button to add the chi-square test.

Solving the problem with SPSS: The chi-square test of independence - 3

Slide 31

First, mark the check box for Chi-square in the Crosstabs: Statistics dialog box.

Second, click on the Continue button to close the dialog box.

Solving the problem with SPSS: The chi-square test of independence - 4

Slide 32

When we return to the Crosstabs dialog box, we click on the Cells button to specify what we want to include in the cells of the crosstabs table.

Solving the problem with SPSS: The chi-square test of independence - 5

Slide 33

First, we want the cells to show both the Observed and Expected counts for each cell, so we mark these check boxes.

Second, we mark the check box for Standardized residuals, which we will use for the post hoc test.

Third, we click on the Continue button to close the dialog box.

Solving the problem with SPSS: The chi-square test of independence - 6

Slide 34

Having completed the specifications for the test, we click on the OK button to generate the output.

Solving the problem with SPSS: Sample size requirement

Slide 35

The sample size requirement for the chi-square test of independence states that none of the cells should have an expected frequency less than 1.0 and no more than 20% of the cells should have an expected frequency less than 5.

The information for verifying this requirement is found in the footnote to the chi-square tests table.

The minimum expected frequency in any cell was 5.24, which is larger than the minimum requirement of 1. None of the cells had an expected frequency less than 5.

The sample size requirements for the chi-square test are satisfied.

If we fail to meet the sample size requirements, the probability for the chi-square statistic may be inaccurate, so it would be an inappropriate application of the statistic.

Solving the problem with SPSS: Overall significance of the test

Slide 36

The finding we are trying to verify is: Within the group of survey respondents who said that overall they were not too happy, there were significantly more who thought the use of marijuana should be made legal than would have been expected based on the breakdown of "should marijuana be made legal" by "general happiness".

This requires that the chi-square statistic for the table be statistically significant.

The chi-square test of independence supported the existence of a relationship between "general happiness" [happy] and "should marijuana be made legal" [grass], χ²(2, N = 169) = 17.14, p < .01.

If the overall test is not significant, we do not examine the tests of individual cells.

Solving the problem with SPSS: Significance of the specific relationship

Slide 37

The finding we are trying to verify is: Within the group of survey respondents who said that overall they were not too happy, there were significantly more who thought the use of marijuana should be made legal than would have been expected based on the breakdown of "should marijuana be made legal" by "general happiness".

This requires that the standardized residual is greater than the critical value.

The standardized residual for the cell for survey respondents who said that overall they were not too happy and who thought the use of marijuana should be made legal was 3.00, which was greater than the critical value of 1.96.

The specific relationship satisfies the requirement for statistical significance.

Solving the problem with SPSS: Direction of the specific relationship - 1

Slide 38

The finding we are trying to verify is: Within the group of survey respondents who said that overall they were not too happy, there were significantly more who thought the use of marijuana should be made legal than would have been expected based on the breakdown of "should marijuana be made legal" by "general happiness".

This requires that the standardized residual is greater than the critical value.

Since the sign of the standardized residual was a plus, the statement that "there were significantly more who thought the use of marijuana should be made legal than would have been expected based on the breakdown of 'should marijuana be made legal' by 'general happiness'" is correct.

Solving the problem with SPSS: Direction of the specific relationship - 2

Slide 39

The finding we are trying to verify is: Within the group of survey respondents who said that overall they were not too happy, there were significantly more who thought the use of marijuana should be made legal than would have been expected based on the breakdown of "should marijuana be made legal" by "general happiness".

This requires that the standardized residual is greater than the critical value.

We can verify the direction of the relationship by comparing the Count to the Expected Count.

Consistent with the proportion of subjects for General Happiness and Should Marijuana be Made Legal, we would have expected 5.2 in the specific combination for this problem. We actually had 12 subjects for this combination, more than we would have expected.

The answer to the problem is True.
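
The reported residual for this cell can be approximately reproduced from the two counts above. This sketch assumes the standardized residual is (observed - expected) / sqrt(expected); because the expected count of 5.2 is itself rounded in the slides, the result is only approximate.

```python
import math

observed_count = 12    # Count for the target cell, as reported in the slides
expected_count = 5.2   # Expected Count for the same cell (rounded in the slides)

std_residual = (observed_count - expected_count) / math.sqrt(expected_count)

print(round(std_residual, 1))  # 3.0
print(std_residual > 1.96)     # True
```

The positive sign confirms the direction (more than expected), and the size exceeds the critical value of 1.96.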

Solving the problem with SPSS: Direction of the specific relationship - 2

Slide 40

The finding stated in the problem was: Within the group of survey respondents who said that overall they were not too happy, there were significantly more who thought the use of marijuana should be made legal than would have been expected based on the breakdown of "should marijuana be made legal" by "general happiness".

Alternatively, the problem could have been framed as: Within the group of survey respondents who said that overall they were not too happy, there were significantly fewer who thought the use of marijuana should not be made legal than would have been expected based on the breakdown of "should marijuana be made legal" by "general happiness", since the standardized residual for that cell was -2.2. It is usually less awkward to state what people do think, rather than what they don’t think, though both statements would be statistically correct.

Logic for homework problems: Chi-square Test of Independence - 1

Slide 41

Compute chi-square test of independence, requesting standardized residuals. Since SPSS output includes the sample size information, we conduct the test at our initial step.

All expected frequencies ≥ 1?
  No: Inappropriate application of a statistic
  Yes: continue

No more than 20% of cells have expected frequencies < 5?
  No: Inappropriate application of a statistic
  Yes: continue

Logic for homework problems: Chi-square Test of Independence - 2

Slide 42

Probability of the chi-square statistic ≤ alpha?
  No: False
  Yes: continue

Standardized residual for target cell ≥ critical value?
  No: False
  Yes: continue

Correct interpretation of relationship (more or less than expected)?
  No: False
  Yes: True
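
The decision logic on the last two slides can be sketched as a single function. The function and parameter names are hypothetical. Two assumptions are made: the residual is compared to the critical value by absolute size, so that negative residuals can also be significant, and the p-value in the usage example is computed with a closed form that holds only for 2 degrees of freedom.

```python
import math

def evaluate_finding(min_expected, share_below_5, p_value, alpha,
                     std_residual, critical_value, claimed_direction):
    """Return the homework answer implied by the flowchart."""
    # Sample size requirement
    if min_expected < 1 or share_below_5 > 0.20:
        return "Incorrect application of a statistic"
    # Overall test must be significant
    if p_value > alpha:
        return "False"
    # Target cell must be significant (absolute size of the residual)
    if abs(std_residual) < critical_value:
        return "False"
    # Claimed direction must match the sign of the residual
    actual_direction = "more" if std_residual > 0 else "fewer"
    return "True" if claimed_direction == actual_direction else "False"

# Values from the sample problem: chi-square of 17.14 with 2 degrees of freedom
p_value = math.exp(-17.14 / 2)  # valid for df = 2 only

answer = evaluate_finding(min_expected=5.24, share_below_5=0.0,
                          p_value=p_value, alpha=0.05,
                          std_residual=3.00, critical_value=1.96,
                          claimed_direction="more")
print(answer)  # True
```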
