
CSSS/SOC/STAT 221: Statistical Concepts and Methods for the Social Sciences

Name: __________________________________________
Collaborators: __________________________________________

Problem Set 4: Using a chi-square test to identify differences in survival rates between
different populations
Early in the morning of Monday 15 April 1912, the ocean liner RMS Titanic sank in the
North Atlantic, on the ship’s first voyage. Tragically, approximately 68% of the ship’s
passengers and crew perished. The two-way table below describes the frequencies of those who
survived the catastrophe and those who perished, separated according to passenger class (1st, 2nd,
and 3rd class, plus crew).
Observed counts ($O_{i,j}$)   Perished   Survived   Row totals
1st class                          122        203          325
2nd class                          167        118          285
3rd class                          528        178          706
Crew                               673        212          885
Column totals                     1490        711         2201

The overall survival proportion was approximately 32% ($p_{\text{pooled}} = 711/2201$), but some
groups fared better than others: 62% of the 1st-class passengers and 41% of the 2nd-class
passengers survived, in contrast to 25% of the 3rd-class passengers and 24% of the ship’s crew.
Given this pattern in the sample, it is reasonable to wonder if different groups had unequal access
to the means of survival. Conversely, we might wonder if every group was exposed to the same
probability of survival, in which case the pattern of variability we see in the sample would be the
result of random sampling error alone.
We can treat the probability of survival $\pi$ as the focus of a null hypothesis. If survival is
independent of passenger class, then we can set our null value to $p_{\text{pooled}}$ and state our hypothesis as

$$H_0: \pi_{\text{1st}} = \pi_{\text{2nd}} = \pi_{\text{3rd}} = \pi_{\text{crew}} = p_{\text{pooled}}$$

which we can test with a chi-square test for independence.


The first step is to calculate the expected counts of those who would have perished and
those who would have survived for each group if the null hypothesis were true. The following
table gives expected frequencies for each combination of levels of the passenger class variable
and the survival outcome variable, $E_{i,j}$. These were calculated by multiplying the corresponding
row total $n_i$ and column total $n_j$ in the above table, then dividing by the total sample size $n = 2201$:

$$E_{i,j} = \frac{n_i \times n_j}{n}$$
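As a quick check of this formula, the Python sketch below recomputes the two expected counts for the 1st-class row from the row and column totals of the observed table. The function name `expected_count` is an illustrative choice, not part of the course materials.

```python
def expected_count(row_total, col_total, n):
    """Expected count for one cell under the null hypothesis of independence."""
    return row_total * col_total / n

n = 2201                     # total sample size
first_class_total = 325      # row total for 1st class
perished_total = 1490        # column total, perished
survived_total = 711         # column total, survived

e_perished = expected_count(first_class_total, perished_total, n)
e_survived = expected_count(first_class_total, survived_total, n)

print(round(e_perished, 2), round(e_survived, 2))  # 220.01 104.99
```

The two values add up to the 1st-class row total of 325, as the expected counts in any row must.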

Expected counts for the crew are missing.


Expected counts ($E_{i,j}$)   Perished   Survived   Row totals
1st                             220.01     104.99          325
2nd                             192.94      92.06          285
3rd                             477.94     228.06          706
Crew                                                       885
Column totals                     1490        711         2201

(1) Given this equation, calculate the missing values for the crew to complete the table above, to
the second decimal place. Based on the values that you calculate for the crew, did more crew
survive than expected, or fewer? By how much?
______________________________________________________________________________
______________________________________________________________________________

The chi-squared test is valid only if the expected frequency for all cells is greater than 5,
so we can use the test in this case.
If you compare the tables of observed frequencies ($O_{i,j}$) and expected frequencies ($E_{i,j}$)
above, you should notice that the observed frequencies of survivors between classes are not
exactly what we would expect given the null hypothesis that survival is independent of passenger
class. For example, we would have expected approximately 105 1st-class passengers to survive if
the null hypothesis were true, yet in reality 203 survived. To measure how different our observed
data are from what we would expect under our null hypothesis, we first need to calculate eight
squared Z-scores (or “squared standardized differences”), one for each cell in the table
below, given the following equation:

$$Z_{i,j}^2 = \frac{(O_{i,j} - E_{i,j})^2}{E_{i,j}}$$
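To see the equation at work, the Python sketch below computes the two squared standardized differences for the 1st-class row. It assumes the expected counts are kept unrounded before squaring, and `squared_z` is an illustrative name rather than anything defined in the course.

```python
def squared_z(observed, expected):
    """Squared standardized difference for one cell: (O - E)^2 / E."""
    return (observed - expected) ** 2 / expected

n = 2201
e_perished = 325 * 1490 / n   # expected 1st-class deaths, about 220.01
e_survived = 325 * 711 / n    # expected 1st-class survivors, about 104.99

z2_perished = squared_z(122, e_perished)   # 122 observed deaths
z2_survived = squared_z(203, e_survived)   # 203 observed survivors

print(f"{z2_perished:.2f} {z2_survived:.2f}")  # 43.66 91.50
```

Both cells share the same numerator, because the observed-minus-expected differences in a two-column row are equal in size and opposite in sign. The survived cell scores higher only because its expected count is smaller.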

These scores show which groups contribute the most and least to the pattern of
disproportionate survivorship. For example, the 1st-class passengers deviate far more from
expectation than the 2nd-class passengers. Squared standardized differences for the crew are
missing.

Squared Z-scores ($Z_{i,j}^2$)   Perished   Survived
1st                                 43.66      91.50
2nd                                  3.49       7.31
3rd                                  5.24      10.99
Crew


(2) Given this equation, calculate the missing values for the crew to complete the table above.
Calculate to the second decimal place. Based on the values that you calculate for the crew, are
the crew more different from expectation than the 1st class passengers? Why or why not?
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________

To calculate the observed chi-square test statistic, we sum these eight squared
standardized differences:

$$\chi^2_{\text{obs}} = 190.40$$
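The reported total can be checked with a short Python sketch that computes an expected count and a squared standardized difference for every cell of the observed table, then sums them. The function name `chi_square_statistic` is illustrative, and only the total is printed.

```python
def chi_square_statistic(table):
    """Sum of (O - E)^2 / E over all cells, with E = row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            total += (observed - expected) ** 2 / expected
    return total

# Observed counts as (perished, survived) for 1st, 2nd, 3rd class, and crew
observed = [(122, 203), (167, 118), (528, 178), (673, 212)]

chi2_obs = chi_square_statistic(observed)
print(f"{chi2_obs:.2f}")  # 190.40
```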

To identify a critical value for this test statistic, we will use the chi-square distribution. To do so,
we need to calculate this distribution’s degrees-of-freedom (d.f.) parameter, which is:

(3) d.f. = (number of rows − 1) × (number of columns − 1) = ______

(Note that the number of rows is the number of groups on the ship, while the number of columns
is the number of different survival outcomes – ‘perish’ or ‘survive.’)
To evaluate whether the difference between observed and expected data is significant,
you need to identify a critical value $\chi^2_{\text{crit}}$ based on the d.f. you calculated above, as well as a
pre-specified level of significance. Let’s use an exceptionally low level of significance, $\alpha = 0.001$.
Choosing such a low $\alpha$ makes it very hard to reject the null hypothesis, implying a high standard
for the alternative hypothesis that survivorship varied between classes. Based on the chi-squared
table presented on the next page, what is $\chi^2_{\text{crit}}$?

(4) $\chi^2_{\text{crit}} =$ ______

(5) Based on $\chi^2_{\text{obs}}$ and $\chi^2_{\text{crit}}$ above, should you reject or fail to reject the null hypothesis?
______________________________________________________________________________

(6) Depending on your decision about the null hypothesis in Question 5, what does this imply
about the relationship between the passenger/crew status of the individual and their survival
rates? In other words, are these two variables associated or unassociated?
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
