Non Parametric Tests
Learning Outcomes
• 1. To differentiate parametric and non-parametric tests.
• 2. To determine the advantages and disadvantages of non-parametric tests.
• 3. To do complete hypothesis testing using the selected non-parametric tests:
• 3.1 The Wilcoxon Signed-Rank Test for Single Population
• 3.2 The Wilcoxon Signed-Rank Test for Dependent Samples (Paired Observation)
• 3.3 Mann-Whitney U Test
• 3.4 Kruskal-Wallis H-Test
• 3.5 Chi-Square Test
• 4. To differentiate the use of the selected non-parametric tests.
• 5. To utilize and interpret SPSS output for non-parametric tests.
In the previous chapter, Hypothesis Testing, we
discussed parametric statistics (Z-test, t-test, and
ANOVA). In doing parametric tests, normality of
4 Kruskal-Wallis H-Test
5 Chi-Square Test
6 Spearman’s Correlation
1. The Wilcoxon Signed-Rank Test for Single Population
Used to test a hypothesis regarding a single population.
Tests a hypothesis about the median instead of the mean.
An alternative to the t-test for a single population.
Formula: Normal Approximation
$$Z_c = \frac{W_s - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24} - \frac{\sum (t^3 - t)}{48}}}$$
Where: n = the number of observations different from the hypothesized median (mean)
Ws = the smaller of the sum of negative ranks and the sum of positive ranks
t = the number of tied ranks in each group of ties
Steps:
1. Determine the hypothesized mean or median.
2. Subtract the hypothesized mean or median from each observation. Observations equal to the hypothesized value are disregarded.
3. Affix the sign of the difference (+ or -) to each difference.
4. Rank the differences based on their absolute values. Tied values share the average of their ranks.
5. Find the sum of the negative ranks, W-, and the sum of the positive ranks, W+.
6. Find the Wilcoxon statistic Ws, the smaller of W- and W+.
7. Compute Zc and determine the p-value, 2[P(Z < Zc)].
Example:
• The Department of Trade and Industry (DTI) claims that, on average, the price of pork in Metro Manila is 220 pesos per kilo. A group of consumers investigated the DTI's claim and recorded the price of pork in 12 randomly selected supermarkets in Metro Manila. The data are as follows: 225, 218, 220, 235, 240, 215, 230, 225, 221, 228, 220, and 230.
Solution:
• 1. Ho: the average price is 220 (Median = 220)
Ha: the average price is not 220 (Median ≠ 220)
• 2. α = 0.05 (rule of thumb)
• 3. Computation
Observation (xi)   Difference (xi - 220)   Absolute value   Sign   Rank
225                  5                      5                +      4
218                 -2                      2                -      2
220                  0                      (disregarded)
235                 15                     15                +      9
240                 20                     20                +      10
215                 -5                      5                -      4
230                 10                     10                +      7.5
225                  5                      5                +      4
221                  1                      1                +      1
228                  8                      8                +      6
220                  0                      (disregarded)
230                 10                     10                +      7.5
Note: From the table, the observations that tie the hypothesized value are disregarded in the computation, so n = 10.
Note also that there are two groups with tied ranks: two observations ranked 7.5 and three observations ranked 4.
W- = 2 + 4 = 6     W+ = 4 + 9 + 10 + 7.5 + 4 + 1 + 6 + 7.5 = 49
Therefore, Ws = 6
$$Z_c = \frac{W_s - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24} - \frac{\sum (t^3 - t)}{48}}} = \frac{6 - \frac{10(11)}{4}}{\sqrt{\frac{10(11)(21)}{24} - \frac{(3^3 - 3) + (2^3 - 2)}{48}}}$$
Zc = -2.199, which is the same as the SPSS output.
To compute the p-value for the Zc statistic, determine 2[P(Z < Zc)]:
2[P(Z < Zc)] = 2[P(Z < -2.199)]
From the z-table, P(Z < -2.199) = 0.014
p-value = 2 × 0.014 = 0.028
Thus, P-value = 0.028
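The seven steps and the normal approximation above can be sketched in Python as a check on the hand computation. This is a minimal sketch, not the SPSS procedure: the function names `average_ranks` and `wilcoxon_signed_rank` are hypothetical, the observations are taken as tabulated in the example (including the value 221), and the p-value uses the standard normal CDF instead of a z-table.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from smallest to largest; tied values get their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(observations, hypothesized):
    """Return (W-, W+, Ws, Zc, two-tailed p) using the normal approximation."""
    # Steps 1-2: differences from the hypothesized value; zeros are disregarded.
    diffs = [x - hypothesized for x in observations if x != hypothesized]
    n = len(diffs)
    # Steps 3-4: rank the absolute differences; the sign stays with each difference.
    abs_diffs = [abs(d) for d in diffs]
    ranks = average_ranks(abs_diffs)
    # Step 5: sums of the negative and positive ranks.
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    # Step 6: the Wilcoxon statistic.
    w_s = min(w_minus, w_plus)
    # Step 7: Zc with the tie term sum(t^3 - t) / 48, then 2 * P(Z < Zc).
    tie_term = sum(t**3 - t for t in Counter(abs_diffs).values()) / 48
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24 - tie_term)
    z_c = (w_s - n * (n + 1) / 4) / sd
    p_value = math.erfc(-z_c / math.sqrt(2))  # P(Z < z) = erfc(-z / sqrt(2)) / 2
    return w_minus, w_plus, w_s, z_c, p_value


prices = [225, 218, 220, 235, 240, 215, 230, 225, 221, 228, 220, 230]
w_minus, w_plus, w_s, z_c, p_value = wilcoxon_signed_rank(prices, 220)
print(w_minus, w_plus, w_s, round(z_c, 3), round(p_value, 3))
```

Run on the pork-price data, the sketch gives W- = 6, W+ = 49, Zc ≈ -2.199 and a p-value of about 0.028, matching the values worked out by hand.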
2. The Wilcoxon Signed-Rank Test for Dependent Samples (Paired Observation)
Example:
Pre-training weight:
99 57 62 69 74 77 59 92 70 85
Post-training weight:
94 57 62 69 66 76 58 88 70 84
• Solution:
• 1. Ho: the pre-training weight equals the post-training weight
Ha: the pre-training weight is not equal to the post-training weight
• 2. α = 0.05
• 3. Computation:
Pre-test weight   Post-test weight   Difference   Sign   Rank
99                94                 5            +      5
57                57                 0
62                62                 0
69                69                 0
74                66                 8            +      6
77                76                 1            +      2
59                58                 1            +      2
92                88                 4            +      4
70                70                 0
85                84                 1            +      2
• W- = 0     W+ = 5 + 6 + 2 + 2 + 4 + 2 = 21     Ws = Min(W-, W+) = 0
• Note that there is one group with tied ranks: three observations were ranked 2. The four pairs with zero difference are disregarded, so n = 6.
$$Z_c = \frac{W_s - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24} - \frac{\sum (t^3 - t)}{48}}} = \frac{0 - \frac{6(7)}{4}}{\sqrt{\frac{6(7)(13)}{24} - \frac{3^3 - 3}{48}}} = -2.226$$
• To compute the p-value for the Zc statistic, determine 2[P(Z < Zc)]:
• 2[P(Z < Zc)] = 2[P(Z < -2.226)]
• From the z-table, P(Z < -2.226) = 0.013, so the p-value = 2 × 0.013 = 0.026
• Thus, P-value = 0.026
• 4. Decision and Conclusion
• Since the p-value of the computed Z statistic is less than 0.05 (0.026 < 0.05), the null hypothesis that the pre-training weight equals the post-training weight is rejected. Because every nonzero difference is positive (W- = 0), we can conclude that the training is effective in reducing weight.
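For paired data the same statistic is computed on the pre-minus-post differences. The script below is a compact sketch of that computation on the weights above; the variable names are illustrative, and the normal approximation is applied even though n = 6 is small, as the worked solution does.

```python
import math
from collections import Counter

pre = [99, 57, 62, 69, 74, 77, 59, 92, 70, 85]
post = [94, 57, 62, 69, 66, 76, 58, 88, 70, 84]

# Paired differences; pairs with zero difference are disregarded.
diffs = [a - b for a, b in zip(pre, post) if a != b]
n = len(diffs)

# Mean rank for each distinct absolute difference (ties share the average rank).
counts = Counter(abs(d) for d in diffs)
rank_of = {}
next_rank = 1
for value in sorted(counts):
    t = counts[value]
    rank_of[value] = next_rank + (t - 1) / 2
    next_rank += t

w_plus = sum(rank_of[abs(d)] for d in diffs if d > 0)
w_minus = sum(rank_of[abs(d)] for d in diffs if d < 0)
w_s = min(w_plus, w_minus)

# Normal approximation with the tie term sum(t^3 - t) / 48.
tie_term = sum(t**3 - t for t in counts.values()) / 48
sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24 - tie_term)
z_c = (w_s - n * (n + 1) / 4) / sd
p_value = math.erfc(-z_c / math.sqrt(2))  # 2 * P(Z < Zc)

print(n, w_minus, w_plus, round(z_c, 3), round(p_value, 3))
```

The sketch returns n = 6, W- = 0, W+ = 21, Zc ≈ -2.226 and a two-tailed p-value of about 0.026.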
DATA ANALYSIS USING SPSS:
3. Mann-Whitney U Test
Also called the Mann-Whitney-Wilcoxon test, an extension of the Wilcoxon Rank Sum test, it is a non-parametric test used to test whether 2 samples came from the same population.
An alternative to the t-test for independent samples.
Unlike the t-test for independent samples, equality or inequality of variances is not taken into consideration.
Formula: U statistic, the smaller of U1 and U2.
$$U_1 = n_1 n_2 + \frac{n_1(n_1+1)}{2} - W_1 \quad \text{and} \quad U_2 = n_1 n_2 + \frac{n_2(n_2+1)}{2} - W_2$$
Where:
n1 = number of samples for group 1; n2 = number of samples for group 2
W1 = sum of ranks for group 1; W2 = sum of ranks for group 2
U = smaller of U1 and U2
The p-value can be estimated using
$$Z_c = \frac{W_s - \frac{n_1(n_1+n_2+1)}{2}}{\sqrt{\frac{n_1 n_2 (n_1+n_2+1)}{12}}}$$
where Ws is the smaller of W1 and W2 and the p-value is 2[P(Z < Zc)], which is a two-tailed test.
Steps in MWUT
1. Rank the observations as one group from lowest to highest.
2. Compute the sum of ranks in each group.
3. Compute the U statistic and Z statistic.
4. Determine the p-value of the Z statistic.
Example:
1. Below are the daily sales, in hundred thousand pesos, of 2 fast-food chains located in the same city.
Jollybi 127.5 175.3 89.6 148.7 189.4 202.6 275.1
Makdough 100.3 125.2 134.4 161.8 172.9 210.1 218.3
Is there a significant difference between the average daily sales of the 2 fast-food chains?
Solution:
1. Ho: The 2 populations are equal (the medians of the 2 groups are the same)
Ha: The 2 populations are not equal (the medians of the 2 groups are not the same)
2. α = 0.05
3. Test Statistic: MWUT
4. Computation:
Jollybi   Rank   Makdough   Rank
89.6      1      100.3      2
127.5     4      125.2      3
148.7     6      134.4      5
175.3     9      161.8      7
189.4     10     172.9      8
202.6     11     210.1      12
275.1     14     218.3      13
W1 = 1 + 4 + 6 + 9 + 10 + 11 + 14 = 55
W2 = 2 + 3 + 5 + 7 + 8 + 12 + 13 = 50
$$U_1 = n_1 n_2 + \frac{n_1(n_1+1)}{2} - W_1 = 7(7) + \frac{7(8)}{2} - 55 = 22$$
$$U_2 = n_1 n_2 + \frac{n_2(n_2+1)}{2} - W_2 = 7(7) + \frac{7(8)}{2} - 50 = 27$$
$$Z_c = \frac{W_s - \frac{n_1(n_1+n_2+1)}{2}}{\sqrt{\frac{n_1 n_2 (n_1+n_2+1)}{12}}} = \frac{50 - \frac{7(15)}{2}}{\sqrt{\frac{7(7)(15)}{12}}} = \frac{-2.5}{\sqrt{61.25}} = -0.319$$
P-value = 2 × P(Z < -0.319) = 2 × 0.3745 = 0.7490 (Note: use the Z-table)
5. Decision and Conclusion: Since the p-value (0.749) is greater than 0.05, we fail to reject Ho; there is no evidence that the 2 populations differ.
Therefore, there is no significant difference between the average daily sales of the 2 fast-food chains.
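The four MWUT steps can be sketched in Python to check the hand computation. This is a rough sketch under stated assumptions: `mann_whitney_u` is a hypothetical helper, it assumes there are no tied observations, the Jollybi values come from the data row and the Makdough values from the rank table, and the Z statistic pairs the smaller rank sum with the size of its own group (the two sizes are equal in this example).

```python
import math


def mann_whitney_u(group1, group2):
    """Return (W1, W2, U1, U2, Zc, two-tailed p) via the normal approximation.

    Assumes no tied observations; ties would need average ranks.
    """
    n1, n2 = len(group1), len(group2)
    # Step 1: rank all observations as one group, lowest value = rank 1.
    pooled = sorted(group1 + group2)
    rank = {value: i + 1 for i, value in enumerate(pooled)}
    # Step 2: sum of ranks in each group.
    w1 = sum(rank[x] for x in group1)
    w2 = sum(rank[x] for x in group2)
    # Step 3: U statistics, and Zc built from the smaller rank sum.
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - w1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - w2
    w_s, n_s = (w1, n1) if w1 <= w2 else (w2, n2)
    z_c = (w_s - n_s * (n1 + n2 + 1) / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    # Step 4: two-tailed p-value, 2 * P(Z < -|Zc|).
    p_value = math.erfc(abs(z_c) / math.sqrt(2))
    return w1, w2, u1, u2, z_c, p_value


jollybi = [127.5, 175.3, 89.6, 148.7, 189.4, 202.6, 275.1]
makdough = [100.3, 125.2, 134.4, 161.8, 172.9, 210.1, 218.3]
w1, w2, u1, u2, z_c, p_value = mann_whitney_u(jollybi, makdough)
print(w1, w2, u1, u2, round(z_c, 3), round(p_value, 3))
```

On these data the sketch gives W1 = 55, W2 = 50, U1 = 22, U2 = 27, Zc ≈ -0.319 and a p-value of about 0.749.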
DATA ANALYSIS USING SPSS:
4. Kruskal-Wallis H-Test
A non-parametric test used to test for significant differences among more than 2 independent samples/groups.
Calculation is based on ranks.
An alternative test to One-Way ANOVA.
Observations must be on an ordinal or continuous scale. Each group must have at least 5 samples.
Formula: H statistic
H = 5.949
H0.05 = 7.377
Using Microsoft Excel, the p-value is:
P-value = CHISQ.DIST.RT(H, df) = CHISQ.DIST.RT(5.949, 3) = 0.114
5. Decision and Conclusion: The computed H is not greater than the tabular H0.05, so we fail to reject Ho. Therefore,
there is no significant difference in the percent increase in salary for middle managers among the 4 manufacturing
plants.
Note: In cases where there are many ties, the H statistic can be adjusted to yield a better result. SPSS uses this adjustment
in calculating the H statistic.
The adjusted H statistic, HA, is calculated as
$$H_A = \frac{H}{T}, \qquad T = 1 - \frac{\sum (t^3 - t)}{N^3 - N}$$
where the sum, Σ(t³ - t), is calculated over all the scores where ties exist, t is the number of ties at that level or
score, and N is the total number of observations. In example number 1, there are 3 levels where ties exist, and in each level there are 2 ties, thus
$$\sum (t^3 - t) = 3(2^3 - 2) = 18$$
Now, with T = 0.9977,
HA = H/T = 5.949/0.9977 = 5.962, which is exactly the same value computed using SPSS.
P-value = CHISQ.DIST.RT(HA, df) = CHISQ.DIST.RT(5.962, 3) = 0.113
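The H statistic, the tie adjustment HA = H/T and the chi-square p-value can be sketched in Python as follows. Several things here are assumptions rather than content from the slides: the helpers `chi2_sf` and `kruskal_wallis` are hypothetical names, H is taken in its standard textbook form H = 12/(N(N+1)) · Σ(Rj²/nj) - 3(N+1), and T = 1 - Σ(t³ - t)/(N³ - N) is the standard tie correction. The data are the gasoline fuel-efficiency figures from Practice Exercise 4.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df (like Excel's CHISQ.DIST.RT)."""
    half = x / 2
    if df % 2 == 0:
        terms = sum(half**k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * terms
    terms = sum(
        half ** (k - 0.5) / math.gamma(k + 0.5) for k in range(1, (df - 1) // 2 + 1)
    )
    return math.erfc(math.sqrt(half)) + math.exp(-half) * terms


def kruskal_wallis(groups):
    """Return (H, T, HA, p), where HA = H / T is the tie-adjusted statistic."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    counts = Counter(pooled)
    # Mean rank for each distinct value (tied values share the average rank).
    rank_of, next_rank = {}, 1
    for value in sorted(counts):
        t = counts[value]
        rank_of[value] = next_rank + (t - 1) / 2
        next_rank += t
    # Assumed standard form: H = 12 / (N(N+1)) * sum(R_j^2 / n_j) - 3(N+1)
    rank_sums = [sum(rank_of[x] for x in g) for g in groups]
    weighted = sum(r**2 / len(g) for r, g in zip(rank_sums, groups))
    h = 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
    # Assumed standard tie correction: T = 1 - sum(t^3 - t) / (N^3 - N)
    t_corr = 1 - sum(t**3 - t for t in counts.values()) / (n_total**3 - n_total)
    h_adj = h / t_corr
    p_value = chi2_sf(h_adj, len(groups) - 1)  # df = number of groups - 1
    return h, t_corr, h_adj, p_value


brand_a = [14, 15, 16, 17, 19, 20, 20]
brand_b = [18, 18.5, 19, 19, 20, 20, 20.5]
brand_c = [20, 23, 23, 23, 24, 25, 26]
h, t_corr, h_adj, p_value = kruskal_wallis([brand_a, brand_b, brand_c])
print(round(h, 3), round(t_corr, 4), round(h_adj, 3), round(p_value, 3))
```

For the gasoline data this gives H ≈ 12.846, HA ≈ 13.084 and a p-value of about 0.001 with df = 2. As a cross-check against the Excel values quoted above, `chi2_sf(5.949, 3)` evaluates to about 0.114 and `chi2_sf(5.962, 3)` to about 0.113.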
5. Chi-Square Test
• Used for the test of goodness of fit, the Chi-Square test of independence, and the test of proportions.
• In the Chi-Square test of independence, the frequency of one nominal variable is compared with different values of the second nominal variable.
• The Chi-Square test of independence is used when we want to test the association/relationship between an independent variable and a dependent variable which are both categorical variables.
• It tests the hypothesis:
Ho: the two variables are independent (there is no association)
against
Ha: the two variables are not independent (there is an association)
ASSUMPTIONS:
1. Independent random sampling.
2. The data used are the cross-tabulated frequencies.
3. Nominal/Ordinal level data.
4. No more than 20% of the cells have an expected frequency less than 5.
5. No empty cells.
Note: In case assumptions 4 and 5 are violated, merging of cases or cells is recommended.
Formula for the Chi-square statistic:
$$\chi^2 = \sum_{i} \sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
Where:
Oij = observed frequency in the i-th row and j-th column
Eij = expected frequency in the i-th row and j-th column
Note that the expected frequency (Eij) can be computed by multiplying the i-th row total frequency
and the j-th column total frequency and dividing the product by the grand total frequency:
$$E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{\text{grand total}}$$
Steps in Chi-Square test
1. Tabulate the frequencies into rows and columns, get the total frequency of each column and row, and get the
grand total frequency.
2. Compute the expected frequencies Eij.
3. Compute the χ² statistic.
4. Compare the computed χ² statistic with the critical value from the χ² table, or calculate the p-value from the χ² statistic,
with df = (number of rows - 1)(number of columns - 1). In Microsoft Excel, P-value = CHISQ.DIST.RT(χ², df).
Educational Level
Gender High School Bachelors Masters PhD Total
Female 60 54 46 41 201
Male 40 44 53 57 194
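The chi-square steps can be sketched in Python on the gender by educational level table above. This is an illustrative sketch only: `chi2_sf` and `chi_square_independence` are hypothetical helper names, the input is the 2 × 4 block of observed counts without the Total column, and the p-value helper assumes an integer df.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df (like Excel's CHISQ.DIST.RT)."""
    half = x / 2
    if df % 2 == 0:
        terms = sum(half**k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * terms
    terms = sum(
        half ** (k - 0.5) / math.gamma(k + 0.5) for k in range(1, (df - 1) // 2 + 1)
    )
    return math.erfc(math.sqrt(half)) + math.exp(-half) * terms


def chi_square_independence(observed):
    """Return (chi-square statistic, df, p-value) for a table of observed counts."""
    # Step 1: row totals, column totals and the grand total.
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Steps 2-3: expected frequencies E_ij and the chi-square statistic.
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / grand_total
            chi2 += (o_ij - e_ij) ** 2 / e_ij
    # Step 4: p-value with df = (rows - 1)(columns - 1).
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, chi2_sf(chi2, df)


# Rows: Female, Male. Columns: High School, Bachelors, Masters, PhD.
observed = [
    [60, 54, 46, 41],
    [40, 44, 53, 57],
]
chi2, df, p_value = chi_square_independence(observed)
print(round(chi2, 3), df, round(p_value, 4))
```

For this table the statistic comes out at about 8.006 with df = 3; the p-value can then be compared with the chosen level of significance.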
6. Spearman's Correlation
Formula: Spearman's rho
$$\rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$
Where:
n = the number of paired observations
di = the difference between ranks for each paired observation
Steps in calculating Spearman Rho
• Rank the observations in each group separately, with rank 1 for the highest value.
• Determine the difference, di, between the ranks for each paired observation.
• Square each di.
• Compute the Spearman rho coefficient using the formula shown above.
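The four steps can be sketched in Python. This is a minimal sketch that assumes the standard rank-difference formula ρ = 1 - 6Σdi²/(n(n² - 1)) and no tied values within either variable; `spearman_rho` is a hypothetical helper, and the small data set is made up purely for illustration (it is not taken from the example).

```python
def spearman_rho(x, y):
    """Spearman's rho from rank differences; assumes no ties within x or within y."""
    n = len(x)

    def ranks_desc(values):
        # Rank 1 goes to the highest value, ranked within each group separately.
        order = sorted(values, reverse=True)
        return [order.index(v) + 1 for v in values]

    rank_x, rank_y = ranks_desc(x), ranks_desc(y)
    # Difference d_i between the ranks of each pair, then squared.
    d_squared = [(a - b) ** 2 for a, b in zip(rank_x, rank_y)]
    return 1 - 6 * sum(d_squared) / (n * (n**2 - 1))


# Hypothetical illustration data (not from the slides).
x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]
rho = spearman_rho(x, y)
print(round(rho, 3))
```

Here the ranks differ by 0, 1, -1, 1 and -1, so Σdi² = 4 and ρ = 1 - 24/120 = 0.8.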
Example: Is there a correlation between the weekly sales and the number of employees of a certain fast-food chain?
A sample of 12 branches was included in the study. The data are presented below:
Weekly sales (100000) 6.4 7.5 3.4 5.7 8.8 6.1 9.8 7.2 5.8 6.7 4.3 7.6
Number of employees 12 15 7 10 14 13 20 18 8 11 6 17
Solution:
Weekly sales   No. of employees   Rank (sales)   Rank (employees)   di    di²
6.1            13                 8              6                  2     4
9.8            18                 1              2                  -1    1
7.2            20                 5              1                  4     16
5.8            8                  9              10                 -1    1
6.7            11                 6              8                  -2    4
4.3            6                  11             12                 -1    1
7.6            17                 3              3                  0     0
Sum of di² = 34
Practice Exercise: Answer each problem as indicated. Hint: Use 4-step hypothesis testing.
1. New recruits to a call center are given initial training in answering customer calls. Following this training they
are independently assessed on their competence, and are rated on a score of 1 to 10, 1 representing ‘totally
incompetent’ to 10 ‘totally competent’. It is usual for the trainees’ scores to be symmetrically distributed about a
median of 6. A new trainer has been appointed and the scores of her first 19 trainees are: 6, 5, 6, 9, 7, 3, 4, 6, 7, 2,
9, 8, 7, 4, 5, 6, 9, 5, and 7. Is there evidence at the 5% level that the new trainer has made any difference?
2. Samples of cream from each of 10 dairies (A-J) are each divided into two portions. One portion from each
is sent to Laboratory 1, the other to Laboratory 2, for bacterium counts. The counts, in thousand bacteria per ml,
are:
Dairy A B C D E F G H I J
Laboratory 1 11.7 12.1 13.3 15.1 15.9 15.3 11.9 16.2 15.1 13.6
Laboratory 2 10.9 10.9 11.9 13.4 15.4 14.8 12.3 15.0 14.2 13.1
Is there a significant difference between the 2 laboratory test results? Use the 5% level of significance.
3. A random sample of 12 basketball players were asked to shoot free throws before and after rigorous exercise. The
number of successful free throw attempts by each player was as follows:
Player 1 2 3 4 5 6 7 8 9 10 11 12
Before 18 12 7 21 19 14 8 11 19 16 8 11
After 16 10 8 23 13 10 8 13 9 8 8 5
Test the hypothesis that the free throw success rate tends to decrease when the players are tired. Use 5% level
of significance.
4. Below is the fuel efficiency of 3 brands of gasoline tested on 21 cars of identical make:
Brand A Brand B Brand C
14 18 20
15 18.5 23
16 19 23
17 19 23
19 20 24
20 20 25
20 20.5 26
Is there a significant difference in the fuel efficiency of the 3 brands of gasoline? Use the 1% level of
significance.
Answer to Practice Exercises
1. New recruits to a call center are given initial training in answering customer calls. Following this training they
are independently assessed on their competence, and are rated on a score of 1 to 10, 1 representing ‘totally
incompetent’ to 10 ‘totally competent’. It is usual for the trainees’ scores to be symmetrically distributed about a
median of 6. A new trainer has been appointed and the scores of her first 19 trainees are: 6, 5, 6, 9, 7, 3, 4, 6, 7, 2,
9, 8, 7, 4, 5, 6, 9, 5, and 7. Is there evidence at the 5% level that the new trainer has made any difference?
Solution:
1. Ho: The average rating on competence is 6. (Median = 6)
Ha: The average rating on competence is not 6. (Median ≠ 6)
2. Test Statistics: Use The Wilcoxon Signed-Rank Test for Single Population at 5% level of significance.
3. Computation: SPSS Computer Output
4. Decision and Conclusion: The p-value is 0.855, which is greater than 0.05; thus we fail to reject Ho. Therefore, there is no
evidence that the median competence rating differs from 6, indicating that the competence scores of the new trainer's trainees are not different from the others.
2. Samples of cream from each of 10 dairies (A-J) are each divided into two portions. One portion from each is sent to
Laboratory 1, the other to Laboratory 2, for bacterium counts. The counts, in thousand bacteria per ml, are as given in Practice Exercise 2.
Is there a significant difference between the 2 laboratory test results? Use the 5% level of significance.
Solution:
1. Ho: There is no significant difference in the laboratory test results. (The medians of the 2 lab results
are the same)
Ha: There is a significant difference in the laboratory test results. (The medians of the 2 lab results
are not the same)
2. Test Statistics: Use the Mann-Whitney U-test at 5% level of significance.
3. Computation: SPSS Computer Output
4. Decision and Conclusion: The p-value is 0.247, which is greater than 0.05; thus we fail to reject Ho. Therefore, there is no
evidence that the medians of the 2 lab results differ, indicating no significant difference between the bacterium counts of LAB 1 and LAB 2.
3. A random sample of 12 basketball players were asked to shoot free throws before and after rigorous exercise. The
number of successful free throw attempts by each player was as follows:
Player 1 2 3 4 5 6 7 8 9 10 11 12
Before 18 12 7 21 19 14 8 11 19 16 8 11
After 16 10 8 23 13 10 8 13 9 8 8 5
Test the hypothesis that the free throw success rate tends to decrease when the players are tired. Use 5% level
of significance.
Solution:
1. Ho: There is no significant difference in the free throw success rate before and after the rigorous exercise.
Ha: There is a significant difference in the free throw success rate before and after the rigorous exercise.
2. Test Statistics: Use the Wilcoxon Signed-Rank Test for Dependent Samples (Paired Observation)
at 5% level of significance.
3. Computation: SPSS Computer Output
4. Decision and Conclusion: The p-value is 0.045, which is less than 0.05; thus we reject Ho. Therefore, there is a
significant decrease in the free throw success rate after the rigorous exercise.
4. Below is the fuel efficiency of 3 brands of gasoline tested on 21 cars of identical make:
Brand A Brand B Brand C
14 18 20
15 18.5 23
16 19 23
17 19 23
19 20 24
20 20 25
20 20.5 26
Is there a significant difference in the fuel efficiency of the 3 brands of gasoline? Use the 1% level of
significance.
Solution:
1. Ho: There is no significant difference in the fuel efficiency of the three brands of gasoline.
Ha: There is a significant difference in the fuel efficiency of the three brands of gasoline.
2. Test Statistics: Use the Kruskal-Wallis H-Test at 1% level of significance.
4. Decision and Conclusion: The p-value is 0.001, which is less than 0.01; thus we reject Ho. Therefore, there is a
significant difference in the fuel efficiency of the three brands of gasoline.