Chi-Square: Example Problem
The expected value within each cell, if the null hypothesis is true (i.e., if the factors have
no significant influence on observed frequencies in the population), depends on the test.
For the test of independence, it is the product of the row total and the column total
divided by the overall sample size N. For the test of goodness of fit with equally likely
levels, it is N divided by the number of levels of the single factor. If Oij is the observed
frequency and Eij the expected frequency for the cell corresponding to the ith condition
and the jth group, then chi-square is:
$$\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
HYPOTHESIS:
The decision rule for the χ2 test depends on the level of significance and the degrees of freedom.
For a goodness-of-fit test, degrees of freedom (df) = k-1 (where k is the number of response
categories); for a test of independence, as in this example, df = (rows-1)(columns-1). If the
null hypothesis is true, the observed and expected frequencies will be close in value and the χ2
statistic will be close to zero. If the null hypothesis is false, then the χ2 statistic will be large.
Critical values can be found in a table of probabilities for the χ2 distribution. Here we have
df=2 and a 5% level of significance. The appropriate critical value is 5.99, and the decision rule
is as follows: Reject H0 if χ2 > 5.99.
DEGREES OF FREEDOM
DF = (ROWS-1)(COLUMNS-1)
   = (2-1)(3-1)
   = (1)(2)
DF = 2
CRITICAL VALUE
C.V. = 5.991
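For df = 2 specifically, the critical value can be checked without a table, because the upper-tail probability of the χ2 distribution with 2 degrees of freedom has the closed form exp(-x/2). The following is a minimal Python sketch under that assumption; the function name is illustrative, and the shortcut does not apply to other degrees of freedom.

```python
import math

def chi2_critical_df2(alpha):
    """Critical value of the chi-square distribution for df = 2 only.

    For df = 2, P(X > x) = exp(-x / 2), so solving exp(-x / 2) = alpha
    gives x = -2 * ln(alpha).
    """
    return -2.0 * math.log(alpha)

critical_value = chi2_critical_df2(0.05)
print(round(critical_value, 3))  # 5.991
```

For other degrees of freedom, a printed table or a statistics package is the usual route.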
We now compute the expected frequencies from the row totals, the column totals, and the grand
total, as implied by the null hypothesis of independence. We then substitute the sample data
(observed counts) and the expected frequencies into the formula for the test statistic. The
computations can be organized as follows.
OBSERVED COUNTS (expected frequencies shown beneath each row)
                SHOPEE    LAZADA    OTHERS    Row total
MALE              19         6        11         36
  Expected       20.52      5.04     10.44
FEMALE            38         8        18         64
  Expected       36.48      8.96     18.56
Column Total      57        14        29        100
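EXPECTED VALUE:
$$E(\text{Group}, \text{Category}) = \frac{(\text{Group total})(\text{Category total})}{\text{Grand total}}$$
$$\begin{aligned}
E(\text{Male}, \text{Shopee}) &= \frac{(36)(57)}{100} = 20.52 \\
E(\text{Male}, \text{Lazada}) &= \frac{(36)(14)}{100} = 5.04 \\
E(\text{Male}, \text{Others}) &= \frac{(36)(29)}{100} = 10.44 \\
E(\text{Female}, \text{Shopee}) &= \frac{(64)(57)}{100} = 36.48 \\
E(\text{Female}, \text{Lazada}) &= \frac{(64)(14)}{100} = 8.96 \\
E(\text{Female}, \text{Others}) &= \frac{(64)(29)}{100} = 18.56
\end{aligned}$$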
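The same expected frequencies can be reproduced with a short Python sketch. The observed counts are those of the example; the dictionary layout and the function name are illustrative choices, not part of the original worked solution.

```python
# Observed counts from the example table.
observed = {
    "MALE":   {"SHOPEE": 19, "LAZADA": 6, "OTHERS": 11},
    "FEMALE": {"SHOPEE": 38, "LAZADA": 8, "OTHERS": 18},
}

def expected_frequencies(table):
    """Return E = (row total * column total) / grand total for every cell."""
    row_totals = {group: sum(cells.values()) for group, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for category, count in cells.items():
            col_totals[category] = col_totals.get(category, 0) + count
    grand_total = sum(row_totals.values())
    return {
        group: {
            category: row_totals[group] * col_totals[category] / grand_total
            for category in cells
        }
        for group, cells in table.items()
    }

expected = expected_frequencies(observed)
for group, cells in expected.items():
    print(group, {category: round(value, 2) for category, value in cells.items()})
```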
$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$
χ2 = 0.5086
p > .05 (χ2 = 0.5086 is below the critical value of 5.99)
Step 5. Conclusion.
The researcher failed to reject the null hypothesis because the p-value is greater than α
(equivalently, χ2 = 0.5086 is less than the critical value of 5.99).
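The test statistic and the decision can be checked with a minimal Python sketch. The observed and expected values are taken from the example; the p-value line relies on the closed form exp(-χ2/2), which holds only for df = 2, and the function name is illustrative.

```python
import math

# Observed counts and expected frequencies from the example (rows: male, female).
observed = [[19, 6, 11], [38, 8, 18]]
expected = [[20.52, 5.04, 10.44], [36.48, 8.96, 18.56]]

def chi_square_statistic(obs, exp):
    """Sum (O - E)^2 / E over every cell of the table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(obs, exp)
        for o, e in zip(obs_row, exp_row)
    )

chi2 = chi_square_statistic(observed, expected)
p_value = math.exp(-chi2 / 2)   # upper-tail probability, valid for df = 2 only
reject_h0 = chi2 > 5.99         # decision rule at the 5% level

print(round(chi2, 4), round(p_value, 2), reject_h0)  # 0.5086 0.78 False
```

Because the statistic falls well below 5.99, the sketch reports that H0 is not rejected.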
SPEARMAN RHO
Spearman’s Rank Correlation Coefficient is a statistical measure of the degree to which
two data sets are correlated, if at all. While a scatter graph of the two data sets may
give the researcher a hint as to whether the two are correlated, Spearman’s Rank gives
the researcher a numerical value for the degree of correlation, or indeed, the lack of
correlation. It is a relatively straightforward analysis for those researchers who are
not wholly confident in their mathematical skills.
In order to use Spearman’s Rank the researcher must have paired sets of data that are
in some way related (such as by the geographical site where they were collected in the
field). It is a good idea for the researcher to have at least ten pairs of data to use for the
analysis: with fewer than this, the result is unlikely to be statistically significant and
is more likely to be the result of chance than of true correlation.
Requirements:
Scale of measurement must be ordinal (or interval, ratio)
Data must be in the form of matched pairs
The association must be monotonic (i.e., variables increase in value together, or one
increases while the other decreases)
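Equation:
$$R = 1 - \frac{6\sum d^2}{n^3 - n}$$
where d is the difference between the two ranks of each pair and n is the number of pairs.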
EXAMPLE PROBLEM: 10 schools are ranked based on distance and
popularity as follows. Calculate the Spearman's Rank Correlation
Coefficient.
Step 1. The researcher should arrange the paired data in a table to allow for ease of
analysis. This can be done in a spreadsheet package or by hand.
Step 2. The researcher should then rank each set of data, starting with 1 for the smallest
figure and (in this case) 10 for the largest.
Step 3. The difference (𝑑) between the two ranks should then be calculated by
subtracting 𝑅1 from 𝑅2.
Step 4. 𝑑 should then be squared to remove any negative values. The total of all
the 𝑑² values can also be calculated at this stage.
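Steps 2 to 4 can be sketched in Python as follows. The five data pairs are hypothetical values invented purely for illustration (they are not the schools data of the example), the helper name is illustrative, and the ranking assumes no tied values.

```python
def rank_values(values):
    """Rank values from 1 (smallest) upward. Assumes there are no ties."""
    ordered = sorted(values)
    return [ordered.index(value) + 1 for value in values]

# Hypothetical illustrative data, NOT taken from the example problem.
distance = [50, 175, 270, 120, 310]
students = [1200, 2500, 4100, 3000, 3600]

r1 = rank_values(distance)                 # Step 2: ranks of the first variable
r2 = rank_values(students)                 # Step 2: ranks of the second variable
d = [b - a for a, b in zip(r1, r2)]        # Step 3: d = R2 - R1
d_squared = [value ** 2 for value in d]    # Step 4: square each difference
sum_d_squared = sum(d_squared)             # Step 4: total of the d^2 values

print(r1, r2, d, sum_d_squared)
```

When tied values occur, the usual convention is to give each tied value the average of the ranks it spans, which this sketch does not handle.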
[Figure: Scatterplot of Number of Students (vertical axis) against Distance (horizontal axis)]
Step 5. One should then apply the Spearman’s Rank equation to calculate the coefficient
value (𝑅), which tells the researcher the strength of the correlation:
$$R = 1 - \frac{6\sum d^2}{n^3 - n}$$
where 𝑛 is the number of pairs of data collected and used (in this case 10). The sum
of the 𝑑² values (Ʃ𝑑²) in this example is 42.
$$\begin{aligned}
R &= 1 - \frac{6(42)}{10^3 - 10} \\
  &= 1 - \frac{252}{990} \\
  &\approx 1 - 0.25 \\
R &\approx 0.75
\end{aligned}$$
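The arithmetic of Step 5 can be confirmed with a minimal Python sketch that uses the example's own figures (Ʃ𝑑² = 42 and 𝑛 = 10); the function name is illustrative.

```python
def spearman_from_sum_d2(sum_d_squared, n):
    """Spearman's rank coefficient R = 1 - 6 * sum(d^2) / (n^3 - n)."""
    return 1 - (6 * sum_d_squared) / (n ** 3 - n)

r = spearman_from_sum_d2(42, 10)
print(round(r, 2))  # 0.75
```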
The coefficient (𝑅) will lie between -1 and +1, where -1 indicates a perfect
negative correlation and +1 indicates a perfect positive correlation. A value between
-0.7 and +0.7 is generally seen as too weak to be regarded as a strong correlation.
Since 𝑅 = 0.75 exceeds +0.7, the data in this example show a strong positive correlation
between the number of students and the distance of the school.
Step 6. To check whether the result is meaningful or is just down to chance, the value
for 𝑅 can be compared with the critical value for 𝑛 in the Spearman’s Rank significance
table.
Below is the significance table for some values of 𝑛, but for analysis of larger sets of
data, extended significance tables can be found online.
The critical value for this example, where there are 10 pairs of data (𝑛 = 10), is 0.564.
As the value of 𝑅 (0.75) is greater than the critical value, the correlation is statistically
significant at the 5% level: a coefficient this large would be expected less than 5% of the
time if there were no true correlation. Conclusions can therefore be drawn from the result
with reasonable confidence.