
CHI-SQUARE

To conduct the chi-square test, the researcher enters the observed frequencies
corresponding to each combination of levels of the relevant factors (here called
"condition" and "group," but these are labels of convenience). The sums of the elements
within each row and within each column are then computed (call these the marginal Ns).
The chi-square test of independence tests the null hypothesis that the frequency within
each cell is what would be expected, given these marginal Ns. The chi-square test of
goodness of fit tests the hypothesis that the total sample N is distributed evenly among
all levels of the relevant factor.

The expected value within each cell, if the null hypothesis is true (i.e., if the factors
have no influence on the frequencies in the population), depends on the test. For the
test of independence, it is the product of the row total and the column total divided by
the overall sample N. For the test of goodness of fit, it is N divided by the number of
levels of the single factor. If Oij is the observed frequency and Eij the expected
frequency for the cell corresponding to the ith condition and the jth group, then
chi-square is:

$$\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
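
The expected-count rules and the chi-square sum described above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the function names and the small 2 x 2 table are hypothetical and chosen only for demonstration.

```python
def expected_independence(observed):
    """Expected counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def expected_goodness_of_fit(observed):
    """Expected counts when N is spread evenly over the k levels: N / k."""
    n, k = sum(observed), len(observed)
    return [n / k] * k


def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all cells (flat lists of equal length)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical 2 x 2 table (rows = conditions, columns = groups).
table = [[10, 20],
         [30, 40]]
expected = expected_independence(table)

# Flatten both tables so the cells line up one-to-one.
flat_obs = [o for row in table for o in row]
flat_exp = [e for row in expected for e in row]
stat = chi_square(flat_obs, flat_exp)
print(expected, round(stat, 4))
```

For a single factor, `expected_goodness_of_fit([5, 10, 15])` returns three equal expected counts of 10, which can be passed to `chi_square` together with the observed list.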

EXAMPLE PROBLEM: A survey of 100 young adults (36 male and 64 female) asks which
e-commerce app (Shopee, Lazada, or others) they prefer. Based on the data collected,
does a person's gender affect their preference of e-commerce app?

Step 1. Set up hypotheses and determine level of significance.

HYPOTHESIS:

H0: Preference of e-commerce app is independent of gender.

H1: Preference of e-commerce app depends on gender.

Step 2. Select the appropriate test statistic.

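The test statistic is:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O is an observed count and E is the corresponding expected count.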

Step 3. Set up decision rule.

The decision rule for the χ² test depends on the level of significance and the degrees of
freedom. For a test of independence, degrees of freedom (df) = (r-1)(c-1), where r is the
number of rows and c is the number of columns in the table. (For a goodness-of-fit test,
df = k-1, where k is the number of response categories.) If the null hypothesis is true,
the observed and expected frequencies will be close in value and the χ² statistic will be
close to zero. If the null hypothesis is false, the χ² statistic will tend to be large.
Critical values can be found in a table of probabilities for the χ² distribution. Here we
have df=2 and a 5% level of significance. The appropriate critical value is 5.99, and the
decision rule is as follows: Reject H0 if χ² > 5.99.

Significance level: α = 0.05

DEGREES OF FREEDOM

DF = (ROWS-1) (COLUMNS-1)
= (2-1) (3-1)
= (1) (2)
DF = 2

CRITICAL VALUE

C.V. = 5.991
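
The degrees of freedom and the tabulated critical value can be cross-checked numerically. The sketch below relies on a property that holds only for df = 2: the chi-square distribution with two degrees of freedom has upper-tail probability exp(-x/2), so the critical value is -2 ln(α). For any other df, a printed table or a statistics library is needed. The variable names are illustrative.

```python
import math

rows, columns = 2, 3          # gender x app categories, as in the example
alpha = 0.05                  # significance level

df = (rows - 1) * (columns - 1)

# Closed form valid only when df == 2: P(X > x) = exp(-x / 2).
if df != 2:
    raise ValueError("closed form below applies to df = 2 only")
critical_value = -2 * math.log(alpha)

print(df, round(critical_value, 3))
```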

Step 4. Compute the test statistic.

We now compute the expected frequencies from the row totals, the column totals, and the
overall sample size, as implied by the null hypothesis of independence. We then substitute
the sample data (observed counts) and the expected frequencies into the formula for the
test statistic identified in Step 2. The computations can be organized as follows.

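OBSERVED COUNTS (WITH EXPECTED COUNTS)

                SHOPEE    LAZADA    OTHERS    Row total
MALE              19         6        11         36
  Expected      20.52      5.04     10.44
FEMALE            38         8        18         64
  Expected      36.48      8.96     18.56
Column total      57        14        29        100

EXPECTED VALUE:

$$E(\text{Group}, \text{Category}) = \frac{(\text{Group total})(\text{Category total})}{\text{Grand total}}$$

$$E(\text{Male}, \text{Shopee}) = \frac{(36)(57)}{100} = 20.52$$

$$E(\text{Male}, \text{Lazada}) = \frac{(36)(14)}{100} = 5.04$$

$$E(\text{Male}, \text{Others}) = \frac{(36)(29)}{100} = 10.44$$

$$E(\text{Female}, \text{Shopee}) = \frac{(64)(57)}{100} = 36.48$$

$$E(\text{Female}, \text{Lazada}) = \frac{(64)(14)}{100} = 8.96$$

$$E(\text{Female}, \text{Others}) = \frac{(64)(29)}{100} = 18.56$$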
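
As a cross-check on the hand calculations, the same expected values can be recomputed from the marginal totals. This is only a sketch: the dictionary layout and variable names are assumptions made for the example, while the counts are the observed counts from the table above.

```python
observed = {
    "Male":   {"Shopee": 19, "Lazada": 6, "Others": 11},
    "Female": {"Shopee": 38, "Lazada": 8, "Others": 18},
}

# Marginal totals: one per group (row) and one per app (column).
row_totals = {g: sum(cells.values()) for g, cells in observed.items()}
col_totals = {}
for cells in observed.values():
    for app, count in cells.items():
        col_totals[app] = col_totals.get(app, 0) + count
grand_total = sum(row_totals.values())

# E(group, category) = row total * column total / grand total
expected = {
    g: {app: row_totals[g] * col_totals[app] / grand_total for app in col_totals}
    for g in observed
}

for g, cells in expected.items():
    print(g, {app: round(e, 2) for app, e in cells.items()})

# Sanity check: expected counts in each row add up to the observed row total.
for g in observed:
    assert abs(sum(expected[g].values()) - row_totals[g]) < 1e-9
```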

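TEST STATISTIC (CHI-SQUARE VALUE)

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

$$\chi^2 = \frac{(19-20.52)^2}{20.52} + \frac{(6-5.04)^2}{5.04} + \frac{(11-10.44)^2}{10.44} + \frac{(38-36.48)^2}{36.48} + \frac{(8-8.96)^2}{8.96} + \frac{(18-18.56)^2}{18.56}$$

$$\chi^2 = 0.5086$$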
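
The sum can be verified with a short script. It is a sketch under one stated assumption: the p-value line uses the closed form exp(-χ²/2), which is valid only for 2 degrees of freedom (the df of this example). The variable names are illustrative.

```python
import math

observed = [19, 6, 11, 38, 8, 18]
expected = [20.52, 5.04, 10.44, 36.48, 8.96, 18.56]

# Contribution of each cell: (O - E)^2 / E
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_sq = sum(contributions)

# Upper-tail probability; this closed form holds for df = 2 only.
p_value = math.exp(-chi_sq / 2)

print(round(chi_sq, 4))    # 0.5086
print(round(p_value, 2))   # 0.78
```

The statistic is far below the critical value of 5.99, and the upper-tail probability is well above 0.05.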

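p ≈ 0.78 (p > .05)

Step 5. Conclusion.

Because χ² = 0.5086 is smaller than the critical value of 5.99 (equivalently, the p-value
is greater than α = 0.05), the researcher fails to reject the null hypothesis. The data
provide no evidence that preference of e-commerce app depends on gender.

SPEARMAN RHO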

The Spearman’s Rank Correlation Coefficient is a statistic that measures the degree to
which two data sets are correlated, if at all. While a scatter graph of the two data sets
may hint at whether they are correlated, Spearman’s Rank gives the researcher a numerical
value for the degree of correlation or, indeed, of non-correlation. It is a relatively
straightforward analysis for researchers who are not wholly confident in their
mathematical skills.

In order to use Spearman’s Rank the researcher must have paired sets of data that are
in some way related (such as by the geographical site where they were collected in the
field). It is a good idea to have at least ten pairs of data for the analysis: with
fewer pairs, an apparent correlation is more likely to have arisen by chance, so the
result is unlikely to be statistically significant.

Requirements:
Scale of measurement must be ordinal (or interval, ratio)
Data must be in the form of matched pairs
The association must be monotonic (i.e., variables increase in value together, or one
increases while the other decreases)

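Equation:

$$R = 1 - \frac{6\sum d^2}{n^3 - n}$$

where d is the difference between the two ranks of a pair and n is the number of pairs.

EXAMPLE PROBLEM: Ten schools are ranked on distance and popularity (number of
students) as follows. Calculate the Spearman's Rank Correlation Coefficient.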

Step 1. The researcher should arrange the paired data in a table to allow for ease of
analysis. This can be done in a spreadsheet package or by hand.

Step 2. The researcher should then rank each set of data. In the table below, rank 1 is
given to the largest figure and (in this case) rank 10 to the smallest. Either direction
works, provided both sets are ranked the same way.

Step 3. The difference (𝑑) between the two ranks should then be calculated by
subtracting 𝑅2 from 𝑅1.

Step 4. 𝑑 should then be squared to remove any negative values. The total of all
the 𝑑² values can also be calculated at this stage.

SCHOOL   DISTANCE   RANK (R1)   STUDENTS   RANK (R2)   d (R1 - R2)   d²
   1        100         10         2000         9            1         1
   2        110          9         2500         8            1         1
   3        130          8         1000        10           -2         4
   4        150          7         3000         7            0         0
   5        160          6         4000         5            1         1
   6        180          5         5000         3            2         4
   7        200          4         5500         2            2         4
   8        250          3         4500         4           -1         1
   9        270          2         6000         1            1         1
  10        300          1         3500         6           -5        25
                                                          Σd² =       42
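
The ranking and differencing steps can be reproduced programmatically. The sketch below follows the table's convention (rank 1 for the largest value) and assumes there are no tied values, which holds for this data set. The helper name `rank_descending` is hypothetical.

```python
def rank_descending(values):
    """Rank 1 for the largest value; assumes all values are distinct."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


distance = [100, 110, 130, 150, 160, 180, 200, 250, 270, 300]
students = [2000, 2500, 1000, 3000, 4000, 5000, 5500, 4500, 6000, 3500]

r1 = rank_descending(distance)
r2 = rank_descending(students)
d = [a - b for a, b in zip(r1, r2)]        # d = R1 - R2
d_squared = [x ** 2 for x in d]

# One printed row per school: school, distance, R1, students, R2, d, d^2
for row in zip(range(1, 11), distance, r1, students, r2, d, d_squared):
    print(*row)
print("sum of d^2 =", sum(d_squared))
```

Data with tied values would need average ranks instead, which this helper does not handle.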

[Figure: scatterplot of Number of Students (vertical axis) against Distance (horizontal axis)]

Step 5. One should then apply the Spearman’s Rank equation to calculate the coefficient
value (𝑅), which tells the researcher the strength of the correlation:

$$R = 1 - \frac{6\sum d^2}{n^3 - n}$$

where 𝑛 is the number of pairs of data collected and used (in this case 10). The sum
of the 𝑑² values (Ʃ𝑑²) in this example is 42.

Therefore, the equation can be calculated as follows:

$$
\begin{aligned}
R &= 1 - \frac{6(42)}{10^3 - 10} \\
  &= 1 - \frac{252}{990} \\
  &= 1 - 0.2545 \\
  &\approx 0.75
\end{aligned}
$$
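
The arithmetic can be confirmed with a small function. This is a sketch: the function name `spearman_from_d2` is hypothetical, and it implements the formula for rankings without ties used in the calculation above.

```python
def spearman_from_d2(sum_d2, n):
    """Spearman's rank coefficient: R = 1 - 6 * sum(d^2) / (n^3 - n)."""
    if n < 2:
        raise ValueError("at least two pairs are required")
    return 1 - 6 * sum_d2 / (n ** 3 - n)


r = spearman_from_d2(sum_d2=42, n=10)
print(round(r, 4))   # 0.7455
```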

The coefficient (𝑅) will lie between -1 and +1, where -1 indicates a perfect negative
correlation and +1 indicates a perfect positive correlation. A value between -0.7 and
+0.7 is generally seen as too weak to be thought of as a significant result.
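
The two ends of the scale can be illustrated with toy rankings. In this sketch, the function name and the five-item rankings are made up for illustration: identical rankings give +1 and exactly reversed rankings give -1.

```python
def spearman_from_ranks(r1, r2):
    """R = 1 - 6 * sum(d^2) / (n^3 - n) for two untied rankings."""
    n = len(r1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n ** 3 - n)


ranks = [1, 2, 3, 4, 5]                    # hypothetical ranking of five items

same = spearman_from_ranks(ranks, ranks)             # identical order
reverse = spearman_from_ranks(ranks, ranks[::-1])    # opposite order
print(same, reverse)   # 1.0 -1.0
```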

Therefore, the data in this example show a strong positive correlation between the
number of students and the distance of the school.

Step 6. To check whether the result is meaningful or is just down to chance, the value
for 𝑅 can be compared with the critical value for 𝑛 in the Spearman’s Rank significance
table.

Below is the significance table for some values of 𝑛, but for analysis of larger sets of
data, extended significance tables can be found online.

The critical value for this example, where there are 10 pairs of data (𝑛 = 10), is 0.564.
As the value of 𝑅 (0.75) is greater than the critical value, the correlation is
statistically significant at the 5% level: if there were no true correlation, a
coefficient this large would arise by chance less than 5% of the time. Sound conclusions
can therefore be drawn from the results.
