Non Parametric Tests

Chapter 22
Nonparametric Methods
If assumptions such as normality or linearity are not satisfied, or there are
extreme outliers, it is sometimes appropriate to use nonparametric methods. These
methods do not involve statistical inference about parameters and are distribution-free.
Most of the methods considered in this chapter use ranks. They still require the
important assumption that observations are independent.
22.1 Ranks
We look at two nonparametric methods, both analogous to the two-sample t test, in
this section. One is the Wilcoxon rank-sum or Mann-Whitney statistic which is the
nonparametric version of the parametric (independent) two-sample t test. The other
is the Wilcoxon signed-rank test which is the nonparametric version of the (dependent)
paired t test.
Exercise 22.1 (Ranks)

1. Inference for two independent samples: blood cells.
A study is conducted to determine cellular response to progesterone in females.
Blood cells from four females are injected with progesterone; blood cells from
four different females are, for comparison purposes, left untreated.
(a) Review: two-sample (independent) t-test.
Test if average progesterone response is greater than average control
response at 5%. Assume normality with no outliers.
female progesterone (1) female control (2)
1 5.85 5 5.23
2 2.28 6 1.21
3 1.51 7 1.40
4 2.12 8 1.38
214 Chapter 22. Nonparametric Methods (lecture notes 12)
# INSTALL library package "gtools"; Import Dataset "chapter22.blood.independent"

library(gtools)
blood <- chapter22.blood.independent; attach(chapter22.blood.independent); head(chapter22.blood.independent)
treatment <- blood$treatment[!is.na(blood$treatment)]; treatment; length(treatment) # drop NA padding from treatment column
label.treatment <- c(rep("treatment",length(treatment))); label.treatment
control <- blood$control[!is.na(blood$control)]; control; length(control) # drop NA padding from control column
label.control <- c(rep("control",length(control))); label.control
label.combined <- c(label.treatment,label.control) # show steps in how ranks used
combined <- c(treatment,control); combined
combined.ranks <- rank(combined); combined.ranks # ranks of combined data
tapply(combined.ranks,label.combined,sum) # sum ranks of treatment, control separately
i. Statement. Choose one.

A. H0 : µ1 − µ2 = 0 versus H1 : µ1 − µ2 < 0
B. H0 : µ1 − µ2 = 0 versus H1 : µ1 − µ2 > 0
C. H0 : µ1 − µ2 = 0 versus H1 : µ1 − µ2 ≠ 0
ii. Test.
1 5.85 5 5.23
2 2.28 6 1.21
3 1.51 7 1.40
4 2.12 8 1.38
average x̄1 = 2.94 x̄2 = 2.305
Test statistic of x̄1 − x̄2 = 0.635 is t = 0.458 / 2.93 / 4.56,
with degrees of freedom 3.999 / 4.999 / 5.999
so chance observed x̄1 − x̄2 = 0.635 or more, if µ1 − µ2 = 0, is
p-value = P (x̄1 − x̄2 ≥ 0.635) ≈ P (t ≥ 0.458) = 0.13 / 0.25 / 0.33
level of significance α = (choose one) 0.01 / 0.05 / 0.10.
t.test(treatment,control,alternative="greater") # independent two-sample t test
data: treatment and control

t = 0.45817, df = 5.9996, p-value = 0.3315
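The t statistic and the fractional degrees of freedom in this output can be reproduced by hand. The following is a minimal sketch, assuming the dataset holds exactly the eight values in the table (typed in directly here); the names v1, v2, se, t.stat and df.welch are illustrative and not part of the original notes.

```r
# Welch two-sample t test by hand (values typed from the table above)
treatment <- c(5.85, 2.28, 1.51, 2.12)
control   <- c(5.23, 1.21, 1.40, 1.38)

# variance of each sample mean
v1 <- var(treatment) / length(treatment)
v2 <- var(control) / length(control)

se     <- sqrt(v1 + v2)                            # standard error of the difference
t.stat <- (mean(treatment) - mean(control)) / se   # test statistic

# Welch-Satterthwaite approximation to the degrees of freedom
df.welch <- (v1 + v2)^2 /
  (v1^2 / (length(treatment) - 1) + v2^2 / (length(control) - 1))

# one-sided p-value for alternative = "greater"
p.value <- pt(t.stat, df.welch, lower.tail = FALSE)

round(c(t = t.stat, df = df.welch, p = p.value), 4)
```

Because t.test() does not assume equal variances by default, the degrees of freedom are not the integer n1 + n2 − 2 = 6 but the slightly smaller Welch value.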
iii. Conclusion.
Since p–value = 0.33 > α = 0.050,
do not reject / reject null guess: H0 : µ1 − µ2 = 0.
Sample average difference x̄1 −x̄2 indicates population difference µ1 −µ2
is less than / equals / is greater than 0: H0 : µ1 − µ2 = 0.
In other words, progesterone population mean cellular response
is less than / equals / is greater than / is different from
control population mean cellular response.
iv. Comments.
Parameters µ1 , µ2 are / are not used in this analysis.
Normal distribution is / is not required assumption in analysis;
but Figure 22.1 indicates data is / is not normal.
Figure 22.1 indicates observations (5.23, 5.85) are / are not outliers.
Independence of data is / is not required in this analysis.
[Figure 22.1: Checking normality and outlier assumptions. Panels: Normal Q-Q Plot for Treatment and Normal Q-Q Plot for Control, each plotting Sample Quantiles against Theoretical Quantiles.]
par(mfrow=c(2,2)) # check normality and outlier assumptions

boxplot(treatment,horizontal=TRUE,outline=TRUE,frame=FALSE,col="green")
boxplot(control,horizontal=TRUE,outline=TRUE,frame=FALSE,col="green")
qqnorm(treatment, main="Normal Q-Q Plot for Treatment"); qqline(treatment, col = 2)
qqnorm(control, main="Normal Q-Q Plot for Control"); qqline(control, col = 2) # reference line must use control data
par(mfrow=c(1,1))
(b) Wilcoxon rank-sum or Mann-Whitney statistic, small sample (n ≤ 10).

Test if distribution of progesterone response is greater than distribution of
control response at 5%.
1 5.85 5 5.23
2 2.28 6 1.21
3 1.51 7 1.40
4 2.12 8 1.38
i. Statement. Choose one.
A. H0 : distributions same versus
H1 : progesterone distribution greater than control distribution
B. H0 : µ1 − µ2 = 0 versus H1 : µ1 − µ2 > 0
C. H0 : treatment distribution greater than control distribution versus
H1 : distributions same
ii. Test.
Combine and rank responses, then separately sum ranks:
female 6 8 7 3 4 2 5 1
response 1.21 1.38 1.40 1.51 2.12 2.28 5.23 5.85
rank 1 2 3 4 5 6 7 8
female progesterone rank female control rank
1 5.85 8 5 5.23 7
2 2.28 6 6 1.21 1
3 1.51 4 7 1.40 3
4 2.12 5 8 1.38 2
sum Ttreat = 23 Tcontrol = 13
Calculate missing Ttreat and Tcontrol for other possible rankings:
female 1 2 3 4 Ttreat 5 6 7 8 Tcontrol
rank 1 2 3 4 10 5 6 7 8 26
rank 5 6 7 8 1 2 3 4 10
rank 1 6 3 8 18 5 2 7 4
rank 8 6 4 5 23 7 1 3 2 13
There are P8,4 = 1680 different rankings, all chosen at random.
In some / all cases,
Ttreat + Tcontrol = 1 + 2 + 3 + · · · + 7 + 8 = 8(9)/2 = 16 / 26 / 36,
so knowing one, the other is known, Ttreat = 36 − Tcontrol ,
so let’s use Tcontrol = 13.
[Figure 22.2: Permutation histogram of Wilcoxon rank sum test. Histogram of all.ranks, with Frequency on the vertical axis.]
rank.permutations <- permutations(length(combined),
  min(length(treatment),length(control)),
  combined.ranks, set=TRUE, repeats.allowed=FALSE) # all ordered draws of 4 ranks from 8 (gtools)
all.ranks <- rowSums(rank.permutations); hist(all.ranks); abline(v=13, lwd=2, col="purple") # histogram of rank sums
p.value <- sum(all.ranks <= 13)/length(all.ranks); sum(all.ranks <= 13); length(all.ranks); p.value # count at or below 13, total count, p-value
Figure 22.2 indicates Tcontrol ≤ 13 in 168 of the 1680 cases;
so, chance observed Tcontrol = 13 or less, if distributions are the same,
p-value = 168/1680 = 0.03 / 0.10 / 0.13
level of significance α = (choose one) 0.01 / 0.05 / 0.10.
wilcox.test(treatment,control,alternative="greater") # p-value, use rank sum based on smallest sample size n
Wilcoxon rank sum test

W = 13, p-value = 0.1
alternative hypothesis: true location shift is greater than 0
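The same exact p-value can be obtained in base R without gtools, because the order in which the four control ranks are drawn does not change their sum. The following is a minimal sketch under that observation, with the data typed in from the table; the names t.control, subset.sums and p.exact are illustrative.

```r
# Exact rank-sum p-value by enumerating unordered subsets (base R only)
treatment <- c(5.85, 2.28, 1.51, 2.12)
control   <- c(5.23, 1.21, 1.40, 1.38)

ranks     <- rank(c(treatment, control))   # ranks of the 8 combined responses
t.control <- sum(ranks[5:8])               # observed rank sum for control

# every way to pick 4 of the 8 ranks, with the sum of each pick
subset.sums <- combn(ranks, 4, sum)

# proportion of picks with a rank sum as small as, or smaller than, observed
p.exact <- mean(subset.sums <= t.control)

c(T.control = t.control, subsets = length(subset.sums), p = p.exact)
```

Each of the 70 unordered subsets corresponds to 4! = 24 of the 1680 ordered rankings, so the proportion is unchanged.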
iii. Conclusion.
Since p–value = 0.10 > α = 0.05,
do not reject / reject null guess: distributions same.
sample rank sum Tcontrol indicates treatment population distribution
is less than / is the same as / is greater than
control population distribution.
iv. Comment: independent samples
Control blood samples for four women
depend on / are independent of
progesterone-injected blood samples of four other women. In general,
sampling is independent if individuals in one sample do not determine
individuals in other sample.
v. Comments.
Parameters µ1 , µ2 are / are not used in this analysis.
Normal distribution is / is not required assumption in analysis.
Random data is / is not required in this analysis.
Rank-sum test here deals better / worse with outliers than t-test.
(c) Wilcoxon rank-sum test, small sample (n ≤ 10), with ties.
1 5.85 4 5.23
2 1.50 5 1.40
3 1.50 6 1.40
7 1.40
B. H0 : µ1 − µ2 = 0 versus H1 : µ1 − µ2 > 0
ii. Test.
Combine and rank responses, then separately sum ranks:
female 5 6 7 2 3 4 1
response 1.40 1.40 1.40 1.50 1.50 5.23 5.85
rank 2 2 2 6 7
female progesterone rank female control rank
1 5.85 7 4 5.23 6
2 1.50 5 1.40 2
3 1.50 6 1.40 2
7 1.40 2
sum Ttreat = 16 Tcontrol = 12
There are P7,3 = 210 different rankings, all chosen at random.
In some / all cases,
Ttreat + Tcontrol = 1 + 2 + 3 + · · · + 7 = 7(8)/2 = 16 / 28 / 36,
choose T with smallest sample size, 3: Ttreat / Tcontrol .
so, chance observed Ttreat = 16 or more, if distributions are the same,
p-value = 0.03 / 0.10 / 0.13.
Level of significance α = (choose one) 0.01 / 0.05 / 0.10.
wilcox.test(treatment,control,alternative="greater") # p-value, use rank sum based on smallest sample size n
Wilcoxon rank sum test with continuity correction

W = 10, p-value = 0.09737
alternative hypothesis: true location shift is greater than 0
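With ties, wilcox.test() switches to a normal approximation (hence "with continuity correction" in the output), but the permutation distribution of the tied midranks can still be enumerated directly. The following is a minimal sketch, assuming the seven values in the table above; the names t.treat, subset.sums and p.exact are illustrative.

```r
# Permutation p-value for the rank-sum statistic when ranks are tied
treatment <- c(5.85, 1.50, 1.50)
control   <- c(5.23, 1.40, 1.40, 1.40)

ranks   <- rank(c(treatment, control))   # tied values share their average rank
t.treat <- sum(ranks[1:3])               # observed rank sum, smaller sample

# every way to pick 3 of the 7 (mid)ranks, with the sum of each pick
subset.sums <- combn(ranks, 3, sum)

# alternative is "greater", so large treatment rank sums count as extreme
p.exact <- mean(subset.sums >= t.treat)

c(T.treat = t.treat, subsets = length(subset.sums), p = p.exact)
```

The permutation value differs somewhat from the normal-approximation p-value printed by R, which is expected for samples this small.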
iii. Conclusion.
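Since p–value = 0.10 > α = 0.05,
do not reject / reject null guess: distributions same.
sample rank sum Ttreat indicates treatment population distribution
is less than / is the same as / is greater than
control population distribution.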
iv. Comments.
Wilcoxon rank-sum test can / cannot be done exactly using permutation
approach when there are ties.
2. Wilcoxon rank-sum or Mann-Whitney statistic, large sample (n > 10).

Consider generated sales (in $1000) for a random sample of older salespeople
and new hires. Test if two distributions are different at 5%.
old (1) 6.22 8.11 5.44 5.76 4.87 5.46 9.33 9.45 8.34 6.23
8.14 5.43 8.98 8.27 7.66 9.34 10.99 10.22 8.88 7.77
6.66 5.55 7.89 8.94 6.02 6.81 8 9 7
new (2) 4.23 2.11 1.11 3 3.87 2.03 4.55 4.31 3.78 5.95
2.16 3.33 3.79 4.1 5.67 4.44 3.32 4.77 8.44
sales <- chapter22.sales.independent; attach(chapter22.sales.independent); head(chapter22.sales.independent)
new <- sales$new[!is.na(sales$new)]; new; length(new) # drop NA padding from new column
label.new <- c(rep("new",length(new))); label.new
old <- sales$old[!is.na(sales$old)]; old; length(old) # drop NA padding from old column
label.old <- c(rep("old",length(old))); label.old
label.combined <- c(label.new,label.old) # show steps in how ranks used
combined <- c(new,old); combined
combined.ranks <- rank(combined); combined.ranks # ranks of combined data
tapply(combined.ranks,label.combined,sum) # sum ranks of new, old separately
(a) Statement. Choose one.

i. H0 : distributions same versus
H1 : old distribution different than new distribution
ii. H0 : µ1 − µ2 = 0 versus H1 : µ1 − µ2 > 0
iii. H0 : old distribution different from new distribution versus
H1 : distributions same
(b) Test.
number old salespeople nold = n1 = 19 / 29 / 221 / 955
number new salespeople nnew = n2 = 19 / 29 / 221 / 955
so smallest sample size is n1 = 29 / n2 = 19
Told = T1 = 19 / 29 / 221 / 955
Tnew = T2 = 19 / 29 / 221 / 955
so rank sum with smallest sample size T = T1 = 955 / T2 = 221
and $E(T) = E(T_2) = \frac{n_2(n_1 + n_2 + 1)}{2} = \frac{19(29 + 19 + 1)}{2}$ = 47.433 / 465.5
$SD(T) = SD(T_2) = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}} = \sqrt{\frac{29 \times 19 (29 + 19 + 1)}{12}}$ ≈ 47.433 / 465.5
so approximate test statistic is
$Z = \frac{T - E(T)}{SD(T)} = \frac{T_2 - E(T_2)}{SD(T_2)} \approx \frac{221 - 465.5}{47.433} \approx$
−5.15 / 0 / 5.15
so p-value = 2 × P (Z < −5.15) ≈ 0.00 / 0.10 / 0.13
notice approximate and exact answers are different from one another, but both very very small
Level of significance α = 0.01 / 0.05 / 0.10.
wilcox.test(new,old,alternative="two.sided") # exact p-value, use rank sum based on smallest sample size n
2*pnorm(-5.15) # approximate p-value: 2 times P(Z < -5.15) because two-sided test
Wilcoxon rank sum test
data: new and old

W = 31, p-value = 6.039e-09
alternative hypothesis: true location shift is not equal to 0
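The large-sample calculation can be checked numerically. The following is a minimal sketch that uses only the summary values from this exercise (n1 = 29, n2 = 19, T2 = 221) rather than the raw sales data; the names e.t, sd.t, z and p.approx are illustrative.

```r
# Normal approximation to the rank-sum statistic, from summary values
n1 <- 29    # number of old salespeople
n2 <- 19    # number of new salespeople (smaller sample)
t2 <- 221   # rank sum of the smaller sample

e.t  <- n2 * (n1 + n2 + 1) / 2               # expected rank sum under H0
sd.t <- sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # standard deviation under H0
z    <- (t2 - e.t) / sd.t                    # approximate test statistic

# two-sided p-value: both tails of the standard normal
p.approx <- 2 * pnorm(-abs(z))

round(c(E = e.t, SD = sd.t, Z = z), 3)
p.approx
```

The statistic W = 31 printed by wilcox.test() is the same rank sum shifted by n2(n2 + 1)/2 = 190, that is, 221 − 190.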
(c) Conclusion.
Since p–value = 0 < α = 0.05,
do not reject / reject null guess: distributions same.
sample rank sum Tnew indicates new salespeople population distribution
is the same as / is different from
old salespeople population distribution.
(d) Comments
approximate p-value does / does not equal exact p-value.
critical value at 5%, Tcrit = 19 / 29 / 182
qwilcox(0.05/2,29,19,lower.tail=TRUE) - 1
[1] 182
3. Inference for two dependent samples: blood cells.

Blood cells from female 1 are broken into two groups. One group of these
blood cells are injected with progesterone; the other group, the control, is, for
comparison purposes, left untreated. Blood cells of other females are handled
in same way.
(a) Review: paired t-test.

Test if mean progesterone response greater than mean control response at
5%. Assume normality with no outliers.
female progesterone (1) control (2)
1 5.85 5.23
2 2.28 1.21
3 1.40 1.51
4 2.12 1.38
# Import Dataset "chapter22.blood.dependent"
blood <- chapter22.blood.dependent; attach(chapter22.blood.dependent); head(chapter22.blood.dependent)
treatment <- blood$treatment; treatment; control <- blood$control; control
difference <- treatment - control; difference <- difference[difference!=0]; difference # remove zeros
label <- c(rep("improve",length(difference)));
label[difference < 0] <- c(rep("worsen",length(difference[difference<0]))); label # label improve/worsen
difference.abs <- abs(difference); difference.abs # remove negatives from ranks
difference.ranked <- rank(difference.abs); difference.ranked
tapply(difference.ranked,label,sum)
i. Statement.
If mean progesterone response, µ1 , is greater than average control
response, µ2 , µ1 > µ2 , difference in responses must be greater than zero,
µd = µ1 − µ2 > 0, so (circle one)
A. H0 : µd = 0 versus H1 : µd > 0
B. H0 : µd = 0 versus H1 : µd < 0
C. H0 : µd = 0 versus H1 : µd ≠ 0
ii. Test.
female progesterone (1) control (2) differences, di

1 5.85 5.23 d1 = 5.85 − 5.23 = 0.62
2 2.28 1.21 1.07
3 1.40 1.51 -0.11
4 2.12 1.38 0.74
Average of differences $\bar{d} = \frac{0.62 + 1.07 - 0.11 + 0.74}{4}$ = 0.355 / 0.58 / 0.635
and test statistic of d̄ = 0.58 is t ≈ 1.42 / 2.33 / 3.19,
with n − 1 = 4 − 1 = 1 / 2 / 3 degrees of freedom,
so chance observed d̄ = 0.58 or more, if µd = 0, is
p–value = P (d̄ ≥ 0.58) ≈ P (t ≥ 2.33) ≈ 0.013 / 0.025 / 0.051.
t.test(treatment,control,alternative="greater",paired=TRUE) # paired t test
Paired t-test

t = 2.3303, df = 3, p-value = 0.05106
iii. Conclusion.
Since p–value = 0.051 > α = 0.050,
(circle one) do not reject / reject null guess: H0 : µd = 0.
Sample average difference d¯ indicates population average difference µd
is less than / equals / is greater than 0: H1 : µd > 0.
In other words, progesterone population mean cellular response
is less than / equals / is greater than
control population mean cellular response.
iv. Comment: dependent samples
Control blood samples depend on / are independent of
progesterone blood samples.
(b) Wilcoxon signed-rank.
female progesterone (1) control (2) differences, di ranks |ranks|

1 5.85 5.23 0.62 2 2
2 2.28 1.21 1.07 4 4
3 1.40 1.51 -0.11 -1 1
4 2.12 1.38 0.74 3 3
B. H0 : µd = 0 versus H1 : µd > 0

ii. Test.
sum positive ranks: Ttreat > control = T + = 2 + 4 + 3 = 1 / 9 / 10,
sum negative ranks: |Ttreat < control | = T − = | − 1| = 1 / 9 / 10,
test statistic (with smallest ranked sum): T − / T +
chance observed T − = 1 or less, if distributions are the same,
p-value = 0.03 / 0.10 / 0.13
level of significance α = 0.01 / 0.05 / 0.10.
wilcox.test(treatment,control,alternative="greater",paired=TRUE) # based on smallest signed-rank sum
Wilcoxon signed rank test

V = 9, p-value = 0.125
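The exact signed-rank p-value comes from treating each ranked difference as equally likely to be positive or negative under H0. The following is a minimal sketch that enumerates all 2^4 sign patterns for the four pairs in the table; the names signs, neg.sums and p.exact are illustrative.

```r
# Exact Wilcoxon signed-rank p-value by enumerating sign patterns
treatment <- c(5.85, 2.28, 1.40, 2.12)
control   <- c(5.23, 1.21, 1.51, 1.38)

d <- treatment - control   # paired differences
r <- rank(abs(d))          # ranks of the absolute differences

t.plus  <- sum(r[d > 0])   # sum of ranks of positive differences
t.minus <- sum(r[d < 0])   # sum of ranks of negative differences

# each row is one pattern: 1 means that rank is assigned a negative sign
signs    <- as.matrix(expand.grid(rep(list(c(0, 1)), length(d))))
neg.sums <- drop(signs %*% r)   # negative-rank sum for every pattern

# chance of a negative-rank sum as small as, or smaller than, observed
p.exact <- mean(neg.sums <= t.minus)

c(T.plus = t.plus, T.minus = t.minus, p = p.exact)
```

R reports V as the sum of the positive ranks, so V = 9 corresponds to T− = 1 here.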
iii. Conclusion.
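Since p–value = 0.13 > α = 0.05,
do not reject / reject null guess: distributions same.
signed-rank sum T − indicates treatment population distribution
is less than / is the same as / is greater than
control population distribution.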
iv. Comments.
Parameter µd is / is not used in this analysis.
Signed-rank test deals better / worse with outliers than paired t-test.
(c) Wilcoxon signed-rank, with ties.
female progesterone (1) control (2) differences, di ranks |ranks|

1 5.85 5.23 0.62 2 2
2 2.28 2.28 0 0 0
3 1.40 1.51 -0.11 -1 1
4 2.12 1.38 0.74 3 3
B. H0 : µd = 0 versus H1 : µd > 0
ii. Test.
sum positive ranks: Ttreat > control = T + = 2 + 3 = 1 / 5 / 6,
sum negative ranks: = |Ttreat < control | = T − = | − 1| = 1 / 5 / 6,
ignore zero differences (treatment = control), so they are / are not counted as zero,
test statistic (with smallest ranked sum): T − / T +
chance observed T − = 1 or less, if distributions are the same,
p-value = 0.03 / 0.13 / 0.21
level of significance α = 0.01 / 0.05 / 0.10.
wilcox.test(treatment,control,alternative="greater",paired=TRUE) # based on smallest signed-rank sum
Wilcoxon signed rank test with continuity correction

V = 5, p-value = 0.2113
iii. Conclusion.
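Since p–value = 0.21 > α = 0.05,
do not reject / reject null guess: distributions same.
signed-rank sum T − indicates treatment population distribution
is less than / is the same as / is greater than
control population distribution.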
22.2 The Wilcoxon Rank-Sum Mann-Whitney Statistic
Material covered in previous section.
22.3 Kruskal-Wallis Test

We look at two nonparametric methods, both analogous to the parametric analysis
of variance (ANOVA) method. One is the Kruskal-Wallis test, which is the
nonparametric version of a completely randomized one-way ANOVA. The other is the
Friedman test, which is the nonparametric version of a randomized block two-way
ANOVA.
Exercise 22.3 (Kruskal-Wallis Test)
1. Review: test multiple means using one-way ANOVA.

Fifteen different patients, chosen at random, subjected to three drugs. Test if
at least one of the three mean patient responses to drug is different at α = 0.05.
drug 1 drug 2 drug 3

5.90 5.51 5.01
5.92 5.50 5.00
5.91 5.50 4.99
5.89 5.49 4.98
5.88 5.50 5.02
x̄1 ≈ 5.90 x̄2 ≈ 5.50 x̄3 ≈ 5.00
# Import dataset "chapter20.drugA.oneway"

y.drug.A <- chapter20.drugA.oneway; attach(y.drug.A); head(y.drug.A)
sapply(list(drug.1,drug.2,drug.3), mean) # mean responses for three drugs
(y.stack.A <- stack(y.drug.A)); names(y.stack.A) <- c("response.A", "drug.A"); attach(y.stack.A); head(y.stack.A)
tapply(response.A,drug.A,var)
(a) Statement.
i. H0 : µ1 = µ2 = µ3 vs H1 : µ1 ≠ µ2 , µ1 = µ3 .
ii. H0 : µ1 = µ2 = µ3 vs H1 : µ1 ≠ µ3 , µ1 ≠ µ2 .
iii. H0 : µ1 = µ2 = µ3 vs H1 : µi ≠ µj , i ≠ j; i, j = 1, 2, 3.
iv. H0 : means same vs H1 : at least one of the means different
(b) Test.
p–value = (circle one) 0.00 / 0.035 / 0.043.
summary(aov(response.A~drug.A))
Df Sum Sq Mean Sq F value Pr(>F)

drug.A 2 2.0333 1.0167 5545 <2e-16 ***
Residuals 12 0.0022 0.0002
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Level of significance α = (choose one) 0.01 / 0.05 / 0.10.
(c) Conclusion.
Since p–value = 0.00 < α = 0.05,
(circle one) do not reject / reject null H0 : means same.
Data indicates (circle one)
average drug responses same
at least one of average drug responses different
2. Test multiple population centers using Kruskal-Wallis Test.

Fifteen different patients, chosen at random, subjected to three drugs. Test if
at least one of three population patient response centers to drug is different at
α = 0.05. Fill in the missing ranks.
drug 1 rank drug 2 rank drug 3 rank

5.90 13 5.51 10 5.01 4
5.92 15 5.50 5.00 3
5.91 14 5.50 4.99 2
5.89 12 5.49 6 4.98 1
5.88 11 5.50 5.02 5
T1 = 65 T2 = T3 = 15
# Import Dataset "chapter20.drugA.oneway"

drug <- chapter20.drugA.oneway; attach(drug); head(drug)
drug.1 <- drug$drug.1[!is.na(drug$drug.1)]; drug.1; length(drug.1) # drop NA padding, if any
label.drug.1 <- c(rep("drug.1",length(drug.1))); label.drug.1 # label treatments
drug.2 <- drug$drug.2[!is.na(drug$drug.2)]; drug.2; length(drug.2)
label.drug.2 <- c(rep("drug.2",length(drug.2))); label.drug.2
drug.3 <- drug$drug.3[!is.na(drug$drug.3)]; drug.3; length(drug.3)
label.drug.3 <- c(rep("drug.3",length(drug.3))); label.drug.3
combined <- c(drug.1,drug.2,drug.3); combined # combine treatments, labels
label.combined <- factor(c(label.drug.1,label.drug.2,label.drug.3))
combined.ranks <- rank(combined); combined.ranks # ranks of combined data, ties share average rank
(a) Statement.
i. H0 : at least one center different vs H1 : centers same
ii. H0 : µ1 = µ2 = µ3 vs H1 : µ1 ≠ µ3 , µ1 ≠ µ2 .
iii. H0 : centers same vs H1 : at least one center different
(b) Test.
since T1 = 15 / 40 / 65
and T2 = 15 / 40 / 65
and T3 = 15 / 40 / 65
tapply(combined.ranks,label.combined,sum) # sum ranks of separate treatments
drug.1 drug.2 drug.3

65 40 15
and overall sample size N = 15 / 40 / 65
test statistic
$$H = \frac{12}{N(N+1)} \sum \frac{T_i^2}{n_i} - 3(N+1)$$
$$= \frac{12}{N(N+1)} \left( \frac{T_1^2}{n_1} + \frac{T_2^2}{n_2} + \frac{T_3^2}{n_3} \right) - 3(N+1)$$
$$= \frac{12}{15(15+1)} \left( \frac{65^2}{5} + \frac{40^2}{5} + \frac{15^2}{5} \right) - 3(15+1) =$$
12.5 / 15 / 17.3
so chance observed H = 12.5 or more, if distributions the same,
p–value = P (H > 12.5) ≈ 0.001 / 0.035 / 0.043
(calculated H here differs from the H reported by R because R adjusts for ties)
kruskal.test(combined,label.combined) # Kruskal-Wallis test statistic, p-value
Kruskal-Wallis rank sum test
data: combined and label.combined

Kruskal-Wallis chi-squared = 12.59, df = 2, p-value = 0.001846
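The gap between the hand-calculated H = 12.5 and R's 12.59 comes from the tie correction for the three responses equal to 5.50. The following is a minimal sketch that computes both versions from the table values typed in directly; the names h, h.adj and ties are illustrative, and the correction factor 1 − Σ(t³ − t)/(N³ − N) is the standard one for tied ranks.

```r
# Kruskal-Wallis H by hand, without and with the tie correction
drug.1 <- c(5.90, 5.92, 5.91, 5.89, 5.88)
drug.2 <- c(5.51, 5.50, 5.50, 5.49, 5.50)
drug.3 <- c(5.01, 5.00, 4.99, 4.98, 5.02)

response <- c(drug.1, drug.2, drug.3)
group    <- factor(rep(c("drug.1", "drug.2", "drug.3"), each = 5))

r      <- rank(response)             # tied values share their average rank
t.sums <- tapply(r, group, sum)      # rank sum per drug
n.i    <- tapply(r, group, length)   # sample size per drug
N      <- length(response)

# uncorrected statistic, as in the formula above
h <- 12 / (N * (N + 1)) * sum(t.sums^2 / n.i) - 3 * (N + 1)

# tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N), t = size of each tie group
ties  <- table(response)
h.adj <- h / (1 - sum(ties^3 - ties) / (N^3 - N))

p.value <- pchisq(h.adj, df = nlevels(group) - 1, lower.tail = FALSE)

t.sums
round(c(H = h, H.adjusted = h.adj, p = p.value), 4)
```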
(c) Conclusion.
Since p–value = 0.001 < α = 0.050,
do not reject / reject null H0 : centers same
Data indicates
drug responses same
at least one of drug responses different
(d) Comments.
Parameters are / are not used in this analysis.
Kruskal-Wallis test deals better / worse with outliers than one-way ANOVA.
upper critical value at 5% $\chi^2_{k-1,\alpha} = \chi^2_{3-1,0.05}$ = 5.99 / 6.99 / 7.99
qchisq(0.05,2,lower.tail=FALSE) # chi-square upper critical value at 5%
[1] 5.991465
3. Test multiple population centers with blocking using Friedman Test.

Five different patients, chosen at random, are each subjected to three drugs (at
different times). Test if at least one of three population patient response centers
to drug is different at α = 0.05. Fill in the missing ranks. Notice rank within
patient blocks.
patient drug 1 rank drug 2 rank drug 3 rank

1 5.90 2 6.31 3 4.52 1
2 4.42 2 3.54 1 6.93 3
3 7.51 3 4.73 2 4.48 1
4 7.89 3 7.20 2 5.55 1
5 3.78 5.72 3.52
T1 = 12 T2 = 11 T3 = 7
# Import Dataset "chapter20.drugB.oneway"

drug <- as.matrix(chapter20.drugB.oneway); head(drug)
(a) Statement.
i. H0 : at least one center different vs H1 : centers same
ii. H0 : µ1 = µ2 = µ3 vs H1 : µ1 ≠ µ3 , µ1 ≠ µ2 .
iii. H0 : centers same vs H1 : at least one center different
(b) Test.
since T1 = 7 / 11 / 12
and T2 = 7 / 11 / 12
and T3 = 7 / 11 / 12
and number of blocks b = 3 / 5
and number of treatments k = 3 / 5
test statistic
$$F = \frac{12}{bk(k+1)} \sum T_i^2 - 3b(k+1)$$
$$= \frac{12}{bk(k+1)} \left( T_1^2 + T_2^2 + T_3^2 \right) - 3b(k+1)$$
$$= \frac{12}{5(3)(3+1)} \left( 12^2 + 11^2 + 7^2 \right) - 3(5)(3+1) =$$
2.7 / 2.8 / 2.9

so chance observed F = 2.8 or more, if distributions the same,
p–value = P (F > 2.8) ≈ 0.00 / 0.25 / 0.43
friedman.test(drug) # Friedman test statistic, p-value
Friedman rank sum test
data: drug
Friedman chi-squared = 2.8, df = 2, p-value = 0.2466
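The Friedman statistic can be rebuilt from the within-patient ranks. The following is a minimal sketch, assuming the dataset is the 5 × 3 table above (typed in directly, with hypothetical column names drug.1 to drug.3); the names within.ranks, t.sums and f.stat are illustrative.

```r
# Friedman statistic by hand: rank within each patient (block), then sum by drug
drug <- matrix(c(5.90, 6.31, 4.52,
                 4.42, 3.54, 6.93,
                 7.51, 4.73, 4.48,
                 7.89, 7.20, 5.55,
                 3.78, 5.72, 3.52),
               nrow = 5, byrow = TRUE,
               dimnames = list(NULL, c("drug.1", "drug.2", "drug.3")))

within.ranks <- t(apply(drug, 1, rank))   # rank the three drugs inside each row
t.sums       <- colSums(within.ranks)     # rank sum per drug

b <- nrow(drug)   # number of blocks (patients)
k <- ncol(drug)   # number of treatments (drugs)

f.stat  <- 12 / (b * k * (k + 1)) * sum(t.sums^2) - 3 * b * (k + 1)
p.value <- pchisq(f.stat, df = k - 1, lower.tail = FALSE)

t.sums
round(c(F = f.stat, p = p.value), 4)
```

The last row of within.ranks gives the ranks for patient 5, which are left blank in the table.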
(c) Conclusion.
Since p–value = 0.25 > α = 0.05,
do not reject / reject null H0 : centers same
Data indicates
drug responses same
at least one of drug responses different
(d) Comments.
Parameters are / are not used in this analysis.
upper critical value at 5% $\chi^2_{k-1,\alpha} = \chi^2_{3-1,0.05}$ = 5.99 / 6.99 / 7.99
qchisq(0.05,2,lower.tail=FALSE) # chi-square upper critical value at 5%
[1] 5.991465
22.4 Paired Data: The Wilcoxon Signed-Rank Test
22.5 Friedman Test for a Randomized Block Design
Material covered in the previous section.
22.6 Kendall’s Tau: Measuring Monotonicity

We look at measures of association between two quantitative variables. After reviewing
the parametric Pearson’s correlation, given by
$$r_p = \frac{\sum x_i y_i - \frac{\sum x_i \sum y_i}{n}}{\sqrt{\left( \sum x_i^2 - \frac{(\sum x_i)^2}{n} \right) \left( \sum y_i^2 - \frac{(\sum y_i)^2}{n} \right)}},$$
we look at two analogous nonparametric statistics. One is Spearman’s correlation,
which is the correlation of the ranks of variables x and y, $r_x$ and $r_y$, and given by
$$r_s = \frac{\sum r_x r_y - \frac{\sum r_x \sum r_y}{n}}{\sqrt{\left( \sum r_x^2 - \frac{(\sum r_x)^2}{n} \right) \left( \sum r_y^2 - \frac{(\sum r_y)^2}{n} \right)}},$$
and the other is Kendall’s Tau, which looks at all pairs of points in a scatterplot and
compares the number discordant, $n_d$ (slopes between points are negative), and the number
concordant, $n_c$ (slopes between points are positive), using the following formula,
$$\tau = \frac{n_c - n_d}{\frac{1}{2} n(n-1)}.$$
In all three cases, −1 ≤ rp ≤ 1, −1 ≤ rs ≤ 1 and −1 ≤ τ ≤ 1. However, only
Pearson’s correlation measures linear association, whereas the other two measure
monotonic association, which need not be linear. Pearson’s correlation is sensitive to
outliers, whereas the other two are much less so.
Exercise 22.6 (Kendall’s Tau: Measuring Monotonicity)

1. Comparing measures of association.
[Figure 22.3: Three scatterplots. Reading Ability versus Brightness; Grain Yield versus Distance from Water’s Edge; Pizza Sales versus Number of Students.]
# Import dataset "chapter4.reading.brightness", "chapter4.grain.water", "chapter4.pizza.students"

reading <- chapter4.reading.brightness; attach(reading); head(reading)
grain <- chapter4.grain.water; attach(grain); head(grain)
pizza <- chapter4.pizza.students; attach(pizza); head(pizza)
par(mfrow=c(1,3))
plot(reading$brightness, reading$reading, pch=16,col="red",xlab="Brightness",ylab="Reading Ability")
plot( grain$distance, grain$yield,pch=16,col="red",ylab="Grain Yield",xlab="Distance from Water’s Edge")
plot(pizza$students, pizza$sales, pch=16,col="red",ylab="Pizza Sales",xlab="Number of Students")
par(mfrow=c(1,1))
(a) Reading ability versus brightness

brightness, x 1 2 3 4 5 6 7 8 9 10
reading ability, y 70 70 75 88 91 94 100 92 90 85
Pearson correlation rp ≈ 0.449 / 0.584 / 0.704
Spearman correlation rs ≈ 0.449 / 0.584 / 0.704
Kendall’s tau τ ≈ 0.449 / 0.584 / 0.704
So, in all three cases, association between reading ability and brightness is
negative / positive but it is strongest (closest to 1) for rp / rs / τ
cor(reading$brightness,reading$reading,method="pearson")
cor(reading$brightness,reading$reading,method="spearman")
cor(reading$brightness,reading$reading,method="kendall")
[1] 0.7043218
[1] 0.5835893
[1] 0.4494666
(b) Grain yield versus distance from water
dist, x 0 10 20 30 45 50 70 80 100 120 140 160 170 190

yield, y 500 590 410 470 450 480 510 450 360 400 300 410 280 350
Pearson correlation rp ≈ −0.791 / −0.785 / −0.589

Spearman correlation rs ≈ −0.791 / −0.785 / −0.589
Kendall’s tau τ ≈ −0.791 / −0.785 / −0.589
So, in all three cases, association between grain yield and distance from water is
negative / positive but it is strongest (closest to -1) for rp / rs / τ
cor(grain$distance,grain$yield,method="pearson")
cor(grain$distance,grain$yield,method="spearman")
cor(grain$distance,grain$yield,method="kendall")
[1] -0.7851085
[1] -0.7907508
[1] -0.5889252
(c) Annual pizza sales versus student number
student number, x 2 6 8 8 12 16 20 20 22 26
pizza sales, y 58 105 88 118 117 137 157 169 149 202
Pearson correlation rp ≈ 0.796 / 0.920 / 0.950
Spearman correlation rs ≈ 0.796 / 0.920 / 0.950
Kendall’s tau τ ≈ 0.796 / 0.920 / 0.950
So, in all three cases, association between pizza sales and student number is
negative / positive but it is strongest (closest to 1) for rp / rs / τ
cor(pizza$students,pizza$sales,method="pearson")
cor(pizza$students,pizza$sales,method="spearman")
cor(pizza$students,pizza$sales,method="kendall")
[1] 0.950123
[1] 0.9207488
[1] 0.7956601
2. Comparing Spearman and Pearson correlations.
brightness, x 1 2 3 4 5 6 7 8 9 10
rank(brightness) 1 2 3 4 5 6 7 8 9 10
rank(ability) 1.5 1.5 3 5 7 9 10 8 6 4
Consider reading data. Scatterplot of ranks of data more / less able to fit a
line than data itself and so consequently rs = 0.584 < rp = 0.704. Spearman’s
correlation rs measures association / linear association.
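That Spearman's correlation is just Pearson's correlation applied to ranks can be checked directly. The following is a minimal sketch, with the reading data typed in from the table rather than imported; the names r.pearson, r.spearman and r.ranks are illustrative.

```r
# Spearman's correlation equals Pearson's correlation of the ranks
brightness <- 1:10
ability    <- c(70, 70, 75, 88, 91, 94, 100, 92, 90, 85)

r.pearson  <- cor(brightness, ability)                        # on the raw data
r.spearman <- cor(brightness, ability, method = "spearman")   # built-in Spearman
r.ranks    <- cor(rank(brightness), rank(ability))            # Pearson on ranks

rank(ability)   # the two tied 70s share rank 1.5
round(c(pearson = r.pearson, spearman = r.spearman, pearson.on.ranks = r.ranks), 4)
```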
# Import dataset "chapter22.reading.brightness.ranks"
reading.ranks <- chapter22.reading.brightness.ranks; attach(reading.ranks); head(reading.ranks)
reading <- chapter4.reading.brightness; attach(reading); head(reading)
[Figure 22.4: Scatterplot of data and ranks of data. Left panel: Reading Ability versus Brightness; right panel: Rank(Reading Ability) versus Rank(Brightness).]
par(mfrow=c(1,2))
plot(reading$brightness, reading$reading, pch=16,col="red",xlab="Brightness",ylab="Reading Ability")
plot(reading.ranks$rank.brightness, reading.ranks$rank.reading, pch=16,col="red",xlab="Rank(Brightness)",ylab="Rank(Reading Ability)")
par(mfrow=c(1,1))
3. Kendall’s tau.
brightness, x 1 2 3 4 5 6 7 8 9 10
Consider reading data.

examples of concordant (positive) pairwise slopes are blue / red
examples of discordant (negative) pairwise slopes are blue / red
if number of concordant, nc = 32, number discordant, nd = 12, and since there
are n = 10 points,
$$\tau = \frac{n_c - n_d}{\frac{1}{2} n(n-1)} = \frac{32 - 12}{\frac{1}{2} \cdot 10(10-1)} =$$
0.44 / 0.50 / 0.55
the one green horizontal slope neither concordant nor discordant is ignored
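The counts of concordant and discordant pairs can be verified by looping over all 45 pairs of points. The following is a minimal sketch, with the reading data typed in from the table; the names pairs, s, n.c, n.d, tau.a and tau.b are illustrative. It also shows why cor(..., method="kendall") returns a slightly different value: R adjusts the denominator for the tied pair.

```r
# Kendall's tau by counting concordant and discordant pairs
x <- 1:10
y <- c(70, 70, 75, 88, 91, 94, 100, 92, 90, 85)

pairs <- combn(length(x), 2)   # column j holds the indices of the j-th pair

# sign of the slope between the two points of each pair (0 for a tie)
s <- sign(x[pairs[2, ]] - x[pairs[1, ]]) * sign(y[pairs[2, ]] - y[pairs[1, ]])

n.c <- sum(s > 0)   # concordant: positive slope
n.d <- sum(s < 0)   # discordant: negative slope

n     <- length(x)
tau.a <- (n.c - n.d) / (n * (n - 1) / 2)   # formula used in these notes
tau.b <- cor(x, y, method = "kendall")     # R's value, adjusted for ties

c(concordant = n.c, discordant = n.d, ties = sum(s == 0))
round(c(tau.a = tau.a, tau.b = tau.b), 4)
```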
22.7 Spearman’s Rho

Material covered in the previous section.
[Figure 22.5: Concordant and discordant pairwise slopes. Reading Ability versus Brightness, with pairwise slopes labeled c and d.]
22.8 When Should You Use Nonparametric Methods?
