Non Parametric Tests


Chapter 22

Nonparametric Methods

If assumptions such as normality or linearity are not satisfied, or there are
extreme outliers, it is sometimes appropriate to use nonparametric methods, which
do not involve statistical inference about parameters and are distribution-free.
Most of the methods considered in this chapter use ranks. These methods still
require the important assumption that observations are independent.

22.1 Ranks
We look at two nonparametric methods, both analogous to the two-sample t test, in
this section. One is the Wilcoxon rank-sum or Mann-Whitney statistic which is the
nonparametric version of the parametric (independent) two-sample t test. The other
is the Wilcoxon signed-rank test which is the nonparametric version of the (dependent)
paired t test.

Exercise 22.1 (Ranks)


1. Inference for two independent samples: blood cells.
A study is conducted to determine cellular response to progesterone in females.
Blood cells from four females are injected with progesterone; blood cells from
four different females are, for comparison purposes, left untreated.
(a) Review: two-sample (independent) t-test.
Test if average progesterone response is greater than average control response
at 5%. Assume normality with no outliers.
female progesterone (1) female control (2)
1 5.85 5 5.23
2 2.28 6 1.21
3 1.51 7 1.40
4 2.12 8 1.38

214 Chapter 22. Nonparametric Methods (lecture notes 12)

# INSTALL library package "gtools"; Import Dataset "chapter22.blood.independent"


library(gtools)
blood <- chapter22.blood.independent; attach(chapter22.blood.independent); head(chapter22.blood.independent)
treatment <- blood$treatment[!is.na(blood$treatment)]; treatment; length(treatment) # drop missing values (NA)
label.treatment <- c(rep("treatment",length(treatment))); label.treatment
control <- blood$control[!is.na(blood$control)]; control; length(control) # drop missing values (NA)
label.control <- c(rep("control",length(control))); label.control
label.combined <- c(label.treatment,label.control) # show steps in how ranks used
combined <- c(treatment,control); combined
combined.ranks <- rank(combined); combined.ranks # ranks of combined data
tapply(combined.ranks,label.combined,sum) # sum ranks of treatment, control separately

i. Statement. Choose one.


A. H0 : µ1 − µ2 = 0 versus H1 : µ1 − µ2 < 0
B. H0 : µ1 − µ2 = 0 versus H1 : µ1 − µ2 > 0
C. H0 : µ1 − µ2 = 0 versus H1 : µ1 − µ2 ≠ 0
ii. Test.
female progesterone (1) female control (2)
1 5.85 5 5.23
2 2.28 6 1.21
3 1.51 7 1.40
4 2.12 8 1.38
average x̄1 = 2.94 x̄2 = 2.305
Test statistic of x̄1 − x̄2 = 0.635 is t = 0.458 / 2.93 / 4.56,
with degrees of freedom 3.999 / 4.999 / 5.999
so chance observed x̄1 − x̄2 = 0.635 or more, if µ1 − µ2 = 0, is
p-value = P (x̄1 − x̄2 ≥ 0.635) ≈ P (t ≥ 0.458) = 0.13 / 0.25 / 0.33
level of significance α = (choose one) 0.01 / 0.05 / 0.10.
t.test(treatment,control,alternative="greater") # independent two-sample t test

data: treatment and control


t = 0.45817, df = 5.9996, p-value = 0.3315
iii. Conclusion.
Since p–value = 0.33 > α = 0.050,
do not reject / reject null guess: H0 : µ1 − µ2 = 0.
Sample average difference x̄1 −x̄2 indicates population difference µ1 −µ2
is less than / equals / is greater than 0: H0 : µ1 − µ2 = 0.
In other words, progesterone population mean cellular response
is less than / equals / is greater than / is different from
control population mean cellular response.
iv. Comments.
Parameters µ1 , µ2 are / are not used in this analysis.
Normal distribution is / is not required assumption in analysis;
but Figure 22.1 indicates data is / is not normal.
Figure 22.1 indicates observations (5.23, 5.85) are / are not outliers.
Independence of data is / is not required in this analysis.
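The t statistic and degrees of freedom reported by R in the Test step can be reproduced by hand. The following is a minimal sketch, assuming the eight responses are typed in directly rather than imported from the dataset; the names v1, v2, se and df.welch are illustrative and not from the notes. It uses the unequal-variance (Welch) form of the two-sample t test, which is what t.test applies by default.

```r
# Responses typed in from the table above (assumption: same values as the imported dataset)
treatment <- c(5.85, 2.28, 1.51, 2.12)
control   <- c(5.23, 1.21, 1.40, 1.38)

n1 <- length(treatment); n2 <- length(control)
v1 <- var(treatment) / n1          # squared standard error of the treatment mean
v2 <- var(control) / n2            # squared standard error of the control mean

se <- sqrt(v1 + v2)                # standard error of the difference in means
t.stat <- (mean(treatment) - mean(control)) / se

# Welch-Satterthwaite approximation to the degrees of freedom
df.welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# One-sided p-value, P(t >= t.stat)
p.value <- pt(t.stat, df.welch, lower.tail = FALSE)
c(t = t.stat, df = df.welch, p.value = p.value)
```

The non-integer degrees of freedom (about 6 rather than exactly n1 + n2 − 2 = 6) come from the Welch approximation; with nearly equal sample variances the two agree closely.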

[Figure 22.1 panels: "Normal Q-Q Plot for Treatment" and "Normal Q-Q Plot for Control"; x-axis: Theoretical Quantiles, y-axis: Sample Quantiles]

Figure 22.1: Checking normality and outlier assumptions

par(mfrow=c(2,2)) # check normality and outlier assumptions


boxplot(treatment,horizontal=T,outline=TRUE,frame=F,col="green")
boxplot(control,horizontal=T,outline=TRUE,frame=F,col="green")
qqnorm(treatment, main="Normal Q-Q Plot for Treatment"); qqline(treatment, col = 2)
qqnorm(control, main="Normal Q-Q Plot for Control"); qqline(control, col = 2)
par(mfrow=c(1,1))

(b) Wilcoxon rank-sum or Mann-Whitney statistic, small sample (n ≤ 10).


Test if distribution of progesterone response is greater than distribution of
control response at 5%.
female progesterone (1) female control (2)
1 5.85 5 5.23
2 2.28 6 1.21
3 1.51 7 1.40
4 2.12 8 1.38
i. Statement. Choose one.
A. H0 : distributions same versus
H1 : progesterone distribution greater than control distribution
B. H0 : µ1 − µ2 = 0 versus H1 : µ1 − µ2 > 0
C. H0 : treatment distribution > than control distribution versus
H1 : distributions same
ii. Test.
Combine and rank responses, then separately sum ranks:

female 6 8 7 3 4 2 5 1
response 1.21 1.38 1.40 1.51 2.12 2.28 5.23 5.85
rank 1 2 3 4 5 6 7 8
female progesterone rank female control rank
1 5.85 8 5 5.23 7
2 2.28 6 6 1.21 1
3 1.51 4 7 1.40 3
4 2.12 5 8 1.38 2
sum Ttreat = 23 Tcontrol = 13
Calculate missing Ttreat and Tcontrol for other possible rankings:
female 1 2 3 4 Ttreat 5 6 7 8 Tcontrol
rank 1 2 3 4 10 5 6 7 8 26
rank 5 6 7 8 1 2 3 4 10
rank 1 6 3 8 18 5 2 7 4
rank 8 6 4 5 23 7 1 3 2 13
There are P8,4 = 1680 different rankings, all chosen at random.
In some / all cases,
Ttreat + Tcontrol = 1 + 2 + 3 + · · · + 7 + 8 = 8(9)/2 = 16 / 26 / 36,
so knowing one, the other is known, Ttreat = 36 − Tcontrol ,
so let’s use Tcontrol = 13.

[Figure 22.2 plot: "Histogram of all.ranks"; x-axis: all.ranks (10 to 25), y-axis: Frequency]

Figure 22.2: Permutation histogram of Wilcoxon rank sum test

rank.permutations <- permutations(length(combined),
  min(length(treatment),length(control)),
  combined.ranks, set=TRUE, repeats.allowed=FALSE) # all ordered selections of 4 of the 8 ranks
all.ranks <- rowSums(rank.permutations); hist(all.ranks); abline(v=13, lwd=2, col="purple") # histogram of rank sums
p.value <- sum(all.ranks <= 13)/length(all.ranks); sum(all.ranks <= 13); length(all.ranks); p.value
Histogram 22.2 indicates Tcontrol ≤ 13 in 168 of the 1680 cases;
so, chance observed Tcontrol = 13 or less, if distributions are the same,
p-value = 168/1680 = 0.03 / 0.10 / 0.13
level of significance α = (choose one) 0.01 / 0.05 / 0.10.

wilcox.test(treatment,control,alternative="greater") # p-value, use rank sum based on smallest sample size n

Wilcoxon rank sum test

data: treatment and control


W = 13, p-value = 0.1
alternative hypothesis: true location shift is greater than 0
iii. Conclusion.
Since p–value = 0.10 > α = 0.05,
do not reject / reject null guess: distributions same.
sample rank sum Tcontrol indicates treatment population distribution
is less than / equals / is greater than / is different from
control population distribution.
iv. Comment: independent samples
Control blood samples for four women
depend on / are independent of
progesterone-injected blood samples of four other women. In general,
sampling is independent if individuals in one sample do not determine
individuals in other sample.
v. Comments.
Parameters µ1 , µ2 are / are not used in this analysis.
Normal distribution is / is not required assumption in analysis.
Random data is / is not required in this analysis.
Rank-sum test here deals better / worse with outliers than t-test.
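The exact p-value in the Test step can also be checked without the gtools package. The following is a minimal sketch in base R, assuming the eight responses are typed in directly; the names is.control and sums are illustrative. It enumerates the 70 unordered sets of four ranks with combn instead of the 1680 ordered selections. Each set corresponds to 4! = 24 orderings, so the proportion is the same.

```r
# Responses typed in from the table above (assumption: same values as the imported dataset)
treatment <- c(5.85, 2.28, 1.51, 2.12)
control   <- c(5.23, 1.21, 1.40, 1.38)

ranks <- rank(c(treatment, control))             # ranks 1..8 of the combined sample
is.control <- rep(c(FALSE, TRUE), c(length(treatment), length(control)))
T.control <- sum(ranks[is.control])              # observed control rank sum

# Rank sum of every possible set of 4 ranks that could have gone to the controls
sums <- combn(ranks, length(control), FUN = sum)

# Exact one-sided p-value: proportion of sets with a rank sum as small as observed
p.value <- mean(sums <= T.control)
c(T.control = T.control, sets = length(sums), p.value = p.value)
```

Seven of the 70 sets have a rank sum of 13 or less, which matches 168 of 1680.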
(c) Wilcoxon rank-sum test, small sample (n ≤ 10), with ties.
Test if distribution of progesterone response is greater than distribution of
control response at 5%.
female progesterone (1) female control (2)
1 5.85 4 5.23
2 1.50 5 1.40
3 1.50 6 1.40
7 1.40
i. Statement. Choose one.
A. H0 : distributions same versus
H1 : progesterone distribution greater than control distribution
B. H0 : µ1 − µ2 = 0 versus H1 : µ1 − µ2 > 0
C. H0 : treatment distribution > than control distribution versus
H1 : distributions same
ii. Test.
Combine and rank responses, then separately sum ranks:

female 5 6 7 2 3 4 1
response 1.40 1.40 1.40 1.50 1.50 5.23 5.85
rank 2 2 2 6 7
female progesterone rank female control rank
1 5.85 7 4 5.23 6
2 1.50 5 1.40 2
3 1.50 6 1.40 2
7 1.40 2
sum Ttreat = 16 Tcontrol = 12
There are P7,3 = 210 different rankings, all chosen at random.
In some / all cases,
Ttreat + Tcontrol = 1 + 2 + 3 + · · · + 7 = 7(8)/2 = 16 / 28 / 36,
choose T with smallest sample size, 3: Ttreat / Tcontrol,
so, chance observed Ttreat = 16 or more, if distributions are the same,
p-value = 0.03 / 0.10 / 0.13.
Level of significance α = (choose one) 0.01 / 0.05 / 0.10.
wilcox.test(treatment,control,alternative="greater") # p-value, use rank sum based on smallest sample size n

Wilcoxon rank sum test with continuity correction

data: treatment and control


W = 10, p-value = 0.09737
alternative hypothesis: true location shift is greater than 0
iii. Conclusion.
Since p–value = 0.10 > α = 0.05,
do not reject / reject null guess: distributions same.
sample rank sum Ttreat indicates treatment population distribution
is less than / equals / is greater than / is different from
control population distribution.
iv. Comments.
Wilcoxon rank-sum test can / cannot be done exactly using the
permutation approach when there are ties.

2. Wilcoxon rank-sum or Mann-Whitney statistic, large sample (n > 10).


Consider generated sales (in $1000) for a random sample of older salespeople
and new hires. Test if two distributions are different at 5%.

old (1) 6.22 8.11 5.44 5.76 4.87 5.46 9.33 9.45 8.34 6.23
8.14 5.43 8.98 8.27 7.66 9.34 10.99 10.22 8.88 7.77
6.66 5.55 7.89 8.94 6.02 6.81 8 9 7
new (2) 4.23 2.11 1.11 3 3.87 2.03 4.55 4.31 3.78 5.95
2.16 3.33 3.79 4.1 5.67 4.44 3.32 4.77 8.44

sales <- chapter22.sales.independent; attach(chapter22.sales.independent); head(chapter22.sales.independent)


new <- sales$new[!is.na(sales$new)]; new; length(new) # drop missing values (NA)
label.new <- c(rep("new",length(new))); label.new
old <- sales$old[!is.na(sales$old)]; old; length(old) # drop missing values (NA)
label.old <- c(rep("old",length(old))); label.old
label.combined <- c(label.new,label.old) # show steps in how ranks used
combined <- c(new,old); combined
combined.ranks <- rank(combined); combined.ranks # ranks of combined data
tapply(combined.ranks,label.combined,sum) # sum ranks of new, old separately

(a) Statement. Choose one.


i. H0 : distributions same versus
H1 : old distribution different than new distribution
ii. H0 : µ1 − µ2 = 0 versus H1 : µ1 − µ2 > 0
iii. H0 : old distribution different from new distribution versus
H1 : distributions same
(b) Test.
number old salespeople nold = n1 = 19 / 29 / 221 / 955
number new salespeople nnew = n2 = 19 / 29 / 221 / 955
so smallest sample size is n1 = 29 / n2 = 19
Told = T1 = 19 / 29 / 221 / 955
Tnew = T2 = 19 / 29 / 221 / 955
so rank sum with smallest sample size T = T1 = 955 / T2 = 221
and
$$E(T) = E(T_2) = \frac{n_2(n_1 + n_2 + 1)}{2} = \frac{19(29 + 19 + 1)}{2} =$$
47.433 / 465.5
$$SD(T) = SD(T_2) = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}} = \sqrt{\frac{29 \times 19\,(29 + 19 + 1)}{12}} \approx$$
47.433 / 465.5
so approximate test statistic is
$$Z = \frac{T - E(T)}{SD(T)} = \frac{T_2 - E(T_2)}{SD(T_2)} \approx \frac{221 - 465.5}{47.433} \approx$$
−5.15 / 0 / 5.15
so p-value = 2 × P (Z < −5.15) ≈ 0.00 / 0.10 / 0.13
notice approximate and exact answers are different from one another, but both very very small
Level of significance α = 0.01 / 0.05 / 0.10.
wilcox.test(new,old,alternative="two.sided") # exact p-value, use rank sum based on smallest sample size n
2*pnorm(-5.15) # approximate p-value: 2 times P(Z < -5.15) because two-sided test

Wilcoxon rank sum test

data: new and old


W = 31, p-value = 6.039e-09
alternative hypothesis: true location shift is not equal to 0
(c) Conclusion.
Since p–value = 0 < α = 0.05,

do not reject / reject null guess: distributions same.


sample rank sum Tnew indicates new salespeople population distribution
is less than / equals / is greater than / is different from
old salespeople population distribution.
(d) Comments
approximate p-value does / does not equal exact p-value.
critical value at 5%, Tcrit = 19 / 29 / 182
qwilcox(0.05/2,29,19,lower.tail=TRUE) - 1

[1] 182
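The large-sample calculation in the Test step can be carried out directly from the sales figures. The following is a minimal sketch, assuming the 48 values are typed in from the table rather than imported; the names T.new, E.T, SD.T and p.approx are illustrative. It computes the rank sum of the smaller sample, its mean and standard deviation under the null hypothesis, and the two-sided normal approximation.

```r
# Sales (in $1000) typed in from the table above (assumption: same values as the imported dataset)
old <- c(6.22, 8.11, 5.44, 5.76, 4.87, 5.46, 9.33, 9.45, 8.34, 6.23,
         8.14, 5.43, 8.98, 8.27, 7.66, 9.34, 10.99, 10.22, 8.88, 7.77,
         6.66, 5.55, 7.89, 8.94, 6.02, 6.81, 8, 9, 7)
new <- c(4.23, 2.11, 1.11, 3, 3.87, 2.03, 4.55, 4.31, 3.78, 5.95,
         2.16, 3.33, 3.79, 4.1, 5.67, 4.44, 3.32, 4.77, 8.44)

n1 <- length(old); n2 <- length(new)
ranks <- rank(c(new, old))                    # new hires occupy the first n2 positions
T.new <- sum(ranks[seq_len(n2)])              # rank sum of the smaller sample

E.T  <- n2 * (n1 + n2 + 1) / 2                # mean of T under H0
SD.T <- sqrt(n1 * n2 * (n1 + n2 + 1) / 12)    # standard deviation of T under H0
z <- (T.new - E.T) / SD.T

p.approx <- 2 * pnorm(-abs(z))                # two-sided normal approximation
c(T.new = T.new, E.T = E.T, SD.T = SD.T, z = z, p.approx = p.approx)
```

The statistic W = 31 reported by wilcox.test is this rank sum shifted by its minimum possible value, T.new − n2(n2 + 1)/2 = 221 − 190.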

3. Inference for two dependent samples: blood cells.


Blood cells from female 1 are broken into two groups. One group of these
blood cells are injected with progesterone; the other group, the control, is, for
comparison purposes, left untreated. Blood cells of other females are handled
in same way.

(a) Review: paired t-test.


Test if mean progesterone response greater than mean control response at
5%. Assume normality with no outliers.
female progesterone (1) control (2)
1 5.85 5.23
2 2.28 1.21
3 1.40 1.51
4 2.12 1.38
# Import Dataset "chapter22.blood.dependent"
blood <- chapter22.blood.dependent; attach(chapter22.blood.dependent); head(chapter22.blood.dependent)
treatment <- blood$treatment; treatment; control <- blood$control; control
difference <- treatment - control; difference <- difference[difference!=0]; difference # remove zeros
label <- c(rep("improve",length(difference)));
label[difference < 0] <- c(rep("worsen",length(difference[difference<0]))); label # label improve/worsen
difference.abs <- abs(difference); difference.abs # remove negatives from ranks
difference.ranked <- rank(difference.abs); difference.ranked
tapply(difference.ranked,label,sum)

i. Statement.
If mean progesterone response, µ1 , is greater than mean control response,
µ2 , µ1 > µ2 , difference in responses must be greater than zero,
µd = µ1 − µ2 > 0, so (circle one)
A. H0 : µd = 0 versus H1 : µd > 0
B. H0 : µd = 0 versus H1 : µd < 0
C. H0 : µd = 0 versus H1 : µd ≠ 0
ii. Test.

female progesterone (1) control (2) differences, di


1 5.85 5.23 d1 = 5.85 − 5.23 = 0.62
2 2.28 1.21 1.07
3 1.40 1.51 -0.11
4 2.12 1.38 0.74
Average of differences d̄ = (0.62 + 1.07 − 0.11 + 0.74)/4 = 0.355 / 0.58 / 0.635
and test statistic of d̄ = 0.58 is t ≈ 1.42 / 2.33 / 3.19,
with n − 1 = 4 − 1 = 1 / 2 / 3 degrees of freedom,
so chance observed d̄ = 0.58 or more, if µd = 0, is
p–value = P (d̄ ≥ 0.58) ≈ P (t ≥ 2.33) ≈ 0.013 / 0.025 / 0.051.
t.test(treatment,control,alternative="greater",paired=TRUE) # paired t test

Paired t-test

data: treatment and control


t = 2.3303, df = 3, p-value = 0.05106
Level of significance α = 0.01 / 0.05 / 0.10.
iii. Conclusion.
Since p–value = 0.051 > α = 0.050,
(circle one) do not reject / reject null guess: H0 : µd = 0.
Sample average difference d¯ indicates population average difference µd
is less than / equals / is greater than 0: H1 : µd > 0.
In other words, progesterone population mean cellular response
is less than / equals / is greater than / is different from
control population mean cellular response.
iv. Comment: dependent samples
Control blood samples depend on / are independent of
progesterone blood samples.
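The paired t statistic in the Test step is the average difference divided by its standard error. The following is a minimal sketch of that arithmetic, assuming the four pairs are typed in directly rather than imported; the names d.bar and se are illustrative.

```r
# Paired responses typed in from the table above (assumption: same values as the imported dataset)
treatment <- c(5.85, 2.28, 1.40, 2.12)
control   <- c(5.23, 1.21, 1.51, 1.38)

d <- treatment - control           # within-female differences
n <- length(d)

d.bar <- mean(d)                   # average difference
se <- sd(d) / sqrt(n)              # standard error of the average difference
t.stat <- d.bar / se

# One-sided p-value with n - 1 degrees of freedom, P(t >= t.stat)
p.value <- pt(t.stat, df = n - 1, lower.tail = FALSE)
c(d.bar = d.bar, t = t.stat, p.value = p.value)
```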
(b) Wilcoxon signed-rank.
Test if distribution of progesterone response is greater than distribution of
control response at 5%.

female progesterone (1) control (2) differences, di ranks |ranks|


1 5.85 5.23 0.62 2 2
2 2.28 1.21 1.07 4 4
3 1.40 1.51 -0.11 -1 1
4 2.12 1.38 0.74 3 3
i. Statement. Choose one.
A. H0 : distributions same versus
H1 : progesterone distribution greater than control distribution
B. H0 : µd = 0 versus H1 : µd > 0

C. H0 : treatment distribution > than control distribution versus


H1 : distributions same
ii. Test.
sum positive ranks: Ttreat > control = T + = 2 + 4 + 3 = 1 / 9 / 10,
sum negative ranks: |Ttreat < control | = T − = | − 1| = 1 / 9 / 10,
test statistic (with smallest ranked sum): T − / T +
chance observed T − = 1 or less, if distributions are the same,
p-value = 0.03 / 0.10 / 0.13
level of significance α = 0.01 / 0.05 / 0.10.
wilcox.test(treatment,control,alternative="greater",paired=TRUE) # R reports V = T+, the sum of positive signed ranks

Wilcoxon signed rank test

data: treatment and control


V = 9, p-value = 0.125
iii. Conclusion.
Since p–value = 0.13 > α = 0.05,
do not reject / reject null guess: distributions same.
signed-rank sum T − indicates treatment population distribution
is less than / equals / is greater than / is different from
control population distribution.
iv. Comments.
Parameter µd is / is not used in this analysis.
Normal distribution is / is not required assumption in analysis.
Random data is / is not required in this analysis.
Signed-rank test deals better / worse with outliers than paired t-test.
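The exact signed-rank p-value can be found by listing every way the signs could have fallen. The following is a minimal sketch in base R, assuming the four pairs are typed in directly; the names signs and neg.sums are illustrative. Under the null hypothesis each rank is equally likely to carry a plus or a minus sign, giving 2^4 = 16 equally likely sign patterns.

```r
# Paired responses typed in from the table above (assumption: same values as the imported dataset)
treatment <- c(5.85, 2.28, 1.40, 2.12)
control   <- c(5.23, 1.21, 1.51, 1.38)

d <- treatment - control
d <- d[d != 0]                     # zero differences carry no sign and are dropped
r <- rank(abs(d))                  # rank the absolute differences

T.plus  <- sum(r[d > 0])           # sum of ranks of positive differences
T.minus <- sum(r[d < 0])           # sum of ranks of negative differences

# Every pattern of signs: 1 marks a rank counted as negative, 0 as positive
signs <- as.matrix(expand.grid(rep(list(c(0, 1)), length(d))))
neg.sums <- as.vector(signs %*% r) # negative rank sum for each pattern

# Exact one-sided p-value: proportion of patterns with T- as small as observed
p.value <- mean(neg.sums <= T.minus)
c(T.plus = T.plus, T.minus = T.minus, p.value = p.value)
```

Only two of the 16 patterns give a negative rank sum of 1 or less (no negative ranks, or only rank 1 negative), so the p-value is 2/16.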
(c) Wilcoxon signed-rank, with ties.
Test if distribution of progesterone response is greater than distribution of
control response at 5%.

female progesterone (1) control (2) differences, di ranks |ranks|


1 5.85 5.23 0.62 2 2
2 2.28 2.28 0 0 0
3 1.40 1.51 -0.11 -1 1
4 2.12 1.38 0.74 3 3
i. Statement. Choose one.
A. H0 : distributions same versus
H1 : progesterone distribution greater than control distribution
B. H0 : µd = 0 versus H1 : µd > 0
C. H0 : treatment distribution > than control distribution versus
H1 : distributions same

ii. Test.
sum positive ranks: Ttreat > control = T + = 2 + 3 = 1 / 5 / 6,
sum negative ranks: |Ttreat < control | = T − = | − 1| = 1 / 5 / 6,
tied pairs (zero differences), Ttreat = control , are / are not counted in the ranking,
test statistic (with smallest ranked sum): T − / T +
chance observed T − = 1 or less, if distributions are the same,
p-value = 0.03 / 0.13 / 0.21
level of significance α = 0.01 / 0.05 / 0.10.
wilcox.test(treatment,control,alternative="greater",paired=TRUE) # R reports V = T+, the sum of positive signed ranks

Wilcoxon signed rank test with continuity correction

data: treatment and control


V = 5, p-value = 0.2113
iii. Conclusion.
Since p–value = 0.21 > α = 0.05,
do not reject / reject null guess: distributions same.
signed-rank sum T − indicates treatment population distribution
is less than / equals / is greater than / is different from
control population distribution.

22.2 The Wilcoxon Rank-Sum Mann-Whitney Statistic
Material covered in previous section.

22.3 Kruskal-Wallis Test

We look at two nonparametric methods, both analogous to the parametric analysis
of variance (ANOVA) method. One is the Kruskal-Wallis test, which is the
nonparametric version of a completely randomized one-way ANOVA. The other is the
Friedman test, which is the nonparametric version of a randomized block two-way
ANOVA.

Exercise 22.3 (Kruskal-Wallis Test)

1. Review: test multiple means using one-way ANOVA.


Fifteen different patients, chosen at random, subjected to three drugs. Test if
at least one of the three mean patient responses to drug is different at α = 0.05.

drug 1 drug 2 drug 3


5.90 5.51 5.01
5.92 5.50 5.00
5.91 5.50 4.99
5.89 5.49 4.98
5.88 5.50 5.02
x̄1 ≈ 5.90 x̄2 ≈ 5.50 x̄3 ≈ 5.00

# Import dataset "chapter20.drugA.oneway"


y.drug.A <- chapter20.drugA.oneway; attach(y.drug.A); head(y.drug.A)
sapply(list(drug.1,drug.2,drug.3), mean) # mean responses for three drugs
(y.stack.A <- stack(y.drug.A)); names(y.stack.A) <- c("response.A", "drug.A"); attach(y.stack.A); head(y.stack.A)
tapply(response.A,drug.A,var)

(a) Statement.
i. H0 : µ1 = µ2 = µ3 vs H1 : µ1 ≠ µ2 , µ1 = µ3 .
ii. H0 : µ1 = µ2 = µ3 vs H1 : µ1 ≠ µ3 , µ1 ≠ µ2 .
iii. H0 : µ1 = µ2 = µ3 vs H1 : µi ≠ µj , i ≠ j; i, j = 1, 2, 3.
iv. H0 : means same vs H1 : at least one of the means different
(b) Test.
p–value = (circle one) 0.00 / 0.035 / 0.043.
summary(aov(response.A~drug.A))

Df Sum Sq Mean Sq F value Pr(>F)


drug.A 2 2.0333 1.0167 5545 <2e-16 ***
Residuals 12 0.0022 0.0002
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Level of significance α = (choose one) 0.01 / 0.05 / 0.10.
(c) Conclusion.
Since p–value = 0.00 < α = 0.05,
(circle one) do not reject / reject null H0 : means same.
Data indicates (circle one)
average drug responses same
at least one of average drug responses different

2. Test multiple population centers using Kruskal-Wallis Test.


Fifteen different patients, chosen at random, subjected to three drugs. Test if
at least one of three population patient response centers to drug is different at
α = 0.05. Fill in the missing ranks.

drug 1 rank drug 2 rank drug 3 rank


5.90 13 5.51 10 5.01 4
5.92 15 5.50 5.00 3
5.91 14 5.50 4.99 2
5.89 12 5.49 6 4.98 1
5.88 11 5.50 5.02 5
T1 = 65 T2 = T3 = 15

# Import Dataset "chapter20.drugA.oneway"


drug <- chapter20.drugA.oneway; attach(drug); head(drug)
drug.1 <- drug$drug.1[!is.na(drug$drug.1)]; drug.1; length(drug.1) # drop missing values (NA)
label.drug.1 <- c(rep("drug.1",length(drug.1))); label.drug.1 # label treatments
drug.2 <- drug$drug.2[!is.na(drug$drug.2)]; drug.2; length(drug.2)
label.drug.2 <- c(rep("drug.2",length(drug.2))); label.drug.2
drug.3 <- drug$drug.3[!is.na(drug$drug.3)]; drug.3; length(drug.3)
label.drug.3 <- c(rep("drug.3",length(drug.3))); label.drug.3
combined <- c(drug.1,drug.2,drug.3); combined # combine treatments, labels
label.combined <- factor(c(label.drug.1,label.drug.2,label.drug.3))
combined.ranks <- rank(combined); combined.ranks # ranks of combined data

(a) Statement.
i. H0 : at least one center different vs H1 : centers same
ii. H0 : µ1 = µ2 = µ3 vs H1 : µ1 ≠ µ3 , µ1 ≠ µ2 .
iii. H0 : centers same vs H1 : at least one center different
(b) Test.
since T1 = 15 / 40 / 65
and T2 = 15 / 40 / 65
and T3 = 15 / 40 / 65
tapply(combined.ranks,label.combined,sum) # sum ranks of separate treatments

drug.1 drug.2 drug.3


65 40 15
and overall sample size N = 15 / 40 / 65
test statistic
$$H = \frac{12}{N(N+1)} \sum \frac{T_i^2}{n_i} - 3(N+1)$$
$$= \frac{12}{N(N+1)} \left( \frac{T_1^2}{n_1} + \frac{T_2^2}{n_2} + \frac{T_3^2}{n_3} \right) - 3(N+1)$$
$$= \frac{12}{15(15+1)} \left( \frac{65^2}{5} + \frac{40^2}{5} + \frac{15^2}{5} \right) - 3(15+1) =$$
12.5 / 15 / 17.3
(H calculated here differs from H reported by R because R adjusts for ties)
so chance observed H = 12.5 or more, if distributions the same,
p–value = P (H > 12.5) ≈ 0.001 / 0.035 / 0.043

kruskal.test(combined,label.combined) # Kruskal-Wallis test statistic, p-value

Kruskal-Wallis rank sum test

data: combined and label.combined


Kruskal-Wallis chi-squared = 12.59, df = 2, p-value = 0.001846
Level of significance α = 0.01 / 0.05 / 0.10.
(c) Conclusion.
Since p–value = 0.001 < α = 0.050,
do not reject / reject null H0 : centers same
Data indicates
drug responses same
at least one of drug responses different
(d) Comments.
Parameters are / are not used in this analysis.
Normal distribution is / is not required assumption in analysis.
Random data is / is not required in this analysis.
Kruskal-Wallis test better / worse with outliers than one-way ANOVA.
upper critical value at 5% $\chi^2_{k-1,\alpha} = \chi^2_{3-1,0.05}$ = 5.99 / 6.99 / 7.99
qchisq(0.05,2,lower.tail=FALSE) # chi-square upper critical value at 5%

[1] 5.991465
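The gap between the hand-calculated H = 12.5 and the 12.59 reported by R comes from the tie adjustment. The following is a minimal sketch, assuming the fifteen responses are typed in directly rather than imported; the names T.i, n.i, tie.sizes, correction and H.adj are illustrative. It computes H from the rank sums, then divides by the usual tie correction factor, which depends on how many observations share each tied value.

```r
# Responses typed in from the table above (assumption: same values as the imported dataset)
drug.1 <- c(5.90, 5.92, 5.91, 5.89, 5.88)
drug.2 <- c(5.51, 5.50, 5.50, 5.49, 5.50)
drug.3 <- c(5.01, 5.00, 4.99, 4.98, 5.02)

response <- c(drug.1, drug.2, drug.3)
group <- factor(rep(c("drug.1", "drug.2", "drug.3"), each = 5))

r <- rank(response)                       # tied values share their average rank
N <- length(response)
T.i <- tapply(r, group, sum)              # rank sum for each drug
n.i <- tapply(r, group, length)           # sample size for each drug

# Unadjusted Kruskal-Wallis statistic
H <- 12 / (N * (N + 1)) * sum(T.i^2 / n.i) - 3 * (N + 1)

# Tie correction: 1 - sum(t^3 - t) / (N^3 - N), t = size of each group of tied values
tie.sizes <- table(response)
correction <- 1 - sum(tie.sizes^3 - tie.sizes) / (N^3 - N)
H.adj <- H / correction

p.value <- pchisq(H.adj, df = nlevels(group) - 1, lower.tail = FALSE)
c(H = H, H.adj = H.adj, p.value = p.value)
```

Here the only tie is the three responses of 5.50, so the correction is small and the conclusion is unchanged.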

3. Test multiple population centers with blocking using Friedman Test.


Five different patients, chosen at random, are each subjected to three drugs (at
different times). Test if at least one of three population patient response centers
to drug is different at α = 0.05. Fill in the missing ranks. Notice rank within
patient blocks.

patient drug 1 rank drug 2 rank drug 3 rank


1 5.90 2 6.31 3 4.52 1
2 4.42 2 3.54 1 6.93 3
3 7.51 3 4.73 2 4.48 1
4 7.89 3 7.20 2 5.55 1
5 3.78 5.72 3.52
T1 = 12 T2 = 11 T3 = 7

# Import Dataset "chapter20.drugB.oneway"


drug <- as.matrix(chapter20.drugB.oneway); head(drug)

(a) Statement.
i. H0 : at least one center different vs H1 : centers same

ii. H0 : µ1 = µ2 = µ3 vs H1 : µ1 ≠ µ3 , µ1 ≠ µ2 .
iii. H0 : centers same vs H1 : at least one center different
(b) Test.
since T1 = 7 / 11 / 12
and T2 = 7 / 11 / 12
and T3 = 7 / 11 / 12
and number of blocks b = 3 / 5
and number of treatments k = 3 / 5
test statistic
$$F = \frac{12}{bk(k+1)} \sum T_i^2 - 3b(k+1)$$
$$= \frac{12}{bk(k+1)} \left( T_1^2 + T_2^2 + T_3^2 \right) - 3b(k+1)$$
$$= \frac{12}{5(3)(3+1)} \left( 12^2 + 11^2 + 7^2 \right) - 3(5)(3+1) =$$
2.7 / 2.8 / 2.9


so chance observed F = 2.8 or more, if distributions the same,
p–value = P (F > 2.8) ≈ 0.00 / 0.25 / 0.43
friedman.test(drug) # Friedman test statistic, p-value

Friedman rank sum test

data: drug
Friedman chi-squared = 2.8, df = 2, p-value = 0.2466
Level of significance α = 0.01 / 0.05 / 0.10.
(c) Conclusion.
Since p–value = 0.25 > α = 0.05,
do not reject / reject null H0 : centers same
Data indicates
drug responses same
at least one of drug responses different
(d) Comments.
Parameters are / are not used in this analysis.
Normal distribution is / is not required assumption in analysis.
Random data is / is not required in this analysis.
upper critical value at 5% $\chi^2_{k-1,\alpha} = \chi^2_{3-1,0.05}$ = 5.99 / 6.99 / 7.99
qchisq(0.05,2,lower.tail=FALSE) # chi-square upper critical value at 5%

[1] 5.991465
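The Friedman statistic in the Test step depends only on ranks taken within each patient. The following is a minimal sketch, assuming the fifteen responses are typed in as a matrix with one row per patient rather than imported; the names within.ranks, T.i and F.stat are illustrative.

```r
# Responses typed in from the table above: rows are patients (blocks), columns are drugs
# (assumption: same values as the imported dataset)
drug <- matrix(c(5.90, 6.31, 4.52,
                 4.42, 3.54, 6.93,
                 7.51, 4.73, 4.48,
                 7.89, 7.20, 5.55,
                 3.78, 5.72, 3.52),
               ncol = 3, byrow = TRUE)

b <- nrow(drug)                               # number of blocks (patients)
k <- ncol(drug)                               # number of treatments (drugs)

within.ranks <- t(apply(drug, 1, rank))       # rank the three drugs within each patient
T.i <- colSums(within.ranks)                  # rank sum for each drug

F.stat <- 12 / (b * k * (k + 1)) * sum(T.i^2) - 3 * b * (k + 1)
p.value <- pchisq(F.stat, df = k - 1, lower.tail = FALSE)
c(F = F.stat, p.value = p.value)
```

Because there are no ties within any patient, this hand calculation agrees with friedman.test without any adjustment.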

22.4 Paired Data: The Wilcoxon Signed-Rank Test
Material covered in previous section.

22.5 Friedman Test for a Randomized Block Design
Material covered in the previous section.

22.6 Kendall’s Tau: Measuring Monotonicity


We look at measures of association between two quantitative variables. After
reviewing the parametric Pearson’s correlation, given by

$$r_p = \frac{\sum x_i y_i - \dfrac{\sum x_i \sum y_i}{n}}{\sqrt{\left( \sum x_i^2 - \dfrac{(\sum x_i)^2}{n} \right) \left( \sum y_i^2 - \dfrac{(\sum y_i)^2}{n} \right)}},$$

we look at two analogous nonparametric statistics. One is Spearman’s correlation,
which is the correlation of the ranks of variables x and y, $r_x$ and $r_y$, and given by

$$r_s = \frac{\sum r_x r_y - \dfrac{\sum r_x \sum r_y}{n}}{\sqrt{\left( \sum r_x^2 - \dfrac{(\sum r_x)^2}{n} \right) \left( \sum r_y^2 - \dfrac{(\sum r_y)^2}{n} \right)}},$$

and the other is Kendall’s Tau, which looks at all pairs of points in a scatterplot and
compares the number discordant, $n_d$ (slope between the pair is negative), and the number
concordant, $n_c$ (slope between the pair is positive), using the following formula,

$$\tau = \frac{n_c - n_d}{\frac{1}{2} n(n-1)}.$$

In all three cases the statistic lies between −1 and 1: −1 ≤ rp ≤ 1, −1 ≤ rs ≤ 1 and
−1 ≤ τ ≤ 1. However, only Pearson’s correlation measures linear association
specifically, whereas the other two measure monotonic association. Pearson’s
correlation is sensitive to outliers, whereas the other two are far less so.

Exercise 22.6 (Kendall’s Tau: Measuring Monotonicity)


1. Comparing measures of association.

[Figure 22.3 panels: Reading Ability versus Brightness; Grain Yield versus Distance from Water’s Edge; Pizza Sales versus Number of Students]

Figure 22.3: Three scatterplots

# Import dataset "chapter4.reading.brightness", "chapter4.grain.water", "chapter4.pizza.students"


reading <- chapter4.reading.brightness; attach(reading); head(reading)
grain <- chapter4.grain.water; attach(grain); head(grain)
pizza <- chapter4.pizza.students; attach(pizza); head(pizza)

par(mfrow=c(1,3))
plot(reading$brightness, reading$reading, pch=16,col="red",xlab="Brightness",ylab="Reading Ability")
plot( grain$distance, grain$yield,pch=16,col="red",ylab="Grain Yield",xlab="Distance from Water’s Edge")
plot(pizza$students, pizza$sales, pch=16,col="red",ylab="Pizza Sales",xlab="Number of Students")
par(mfrow=c(1,1))

(a) Reading ability versus brightness


brightness, x 1 2 3 4 5 6 7 8 9 10
reading ability, y 70 70 75 88 91 94 100 92 90 85
Pearson correlation rp ≈ 0.449 / 0.584 / 0.704
Spearman correlation rs ≈ 0.449 / 0.584 / 0.704
Kendall’s tau τ ≈ 0.449 / 0.584 / 0.704
So, in all three cases, association between reading ability and brightness is
negative / positive but it is strongest (closest to 1) for rp / rs / τ
cor(reading$brightness,reading$reading,method="pearson")
cor(reading$brightness,reading$reading,method="spearman")
cor(reading$brightness,reading$reading,method="kendall")

[1] 0.7043218
[1] 0.5835893
[1] 0.4494666
(b) Grain yield versus distance from water

dist, x 0 10 20 30 45 50 70 80 100 120 140 160 170 190


yield, y 500 590 410 470 450 480 510 450 360 400 300 410 280 350

Pearson correlation rp ≈ −0.791 / −0.785 / −0.589


Spearman correlation rs ≈ −0.791 / −0.785 / −0.589
Kendall’s tau τ ≈ −0.791 / −0.785 / −0.589
So, in all three cases, association between grain yield and distance from water is
negative / positive but it is strongest (closest to -1) for rp / rs / τ
cor(grain$distance,grain$yield,method="pearson")
cor(grain$distance,grain$yield,method="spearman")
cor(grain$distance,grain$yield,method="kendall")

[1] -0.7851085
[1] -0.7907508
[1] -0.5889252
(c) Annual pizza sales versus student number

student number, x 2 6 8 8 12 16 20 20 22 26
pizza sales, y 58 105 88 118 117 137 157 169 149 202
Pearson correlation rp ≈ 0.796 / 0.920 / 0.950
Spearman correlation rs ≈ 0.796 / 0.920 / 0.950
Kendall’s tau τ ≈ 0.796 / 0.920 / 0.950
So, in all three cases, association between pizza sales and student number is
negative / positive but it is strongest (closest to 1) for rp / rs / τ
cor(pizza$students,pizza$sales,method="pearson")
cor(pizza$students,pizza$sales,method="spearman")
cor(pizza$students,pizza$sales,method="kendall")

[1] 0.950123
[1] 0.9207488
[1] 0.7956601

2. Comparing Spearman and Pearson correlations.

brightness, x 1 2 3 4 5 6 7 8 9 10
rank(brightness) 1 2 3 4 5 6 7 8 9 10
reading ability, y 70 70 75 88 91 94 100 92 90 85
rank(ability) 1.5 1.5 3 5 7 9 10 8 6 4

Consider reading data. Scatterplot of ranks of data more / less able to fit a
line than data itself and so consequently rs = 0.584 < rp = 0.704. Spearman’s
correlation rs measures association / linear association.
# Import dataset "chapter22.reading.brightness.ranks"
reading.ranks <- chapter22.reading.brightness.ranks; attach(reading.ranks); head(reading.ranks)
reading <- chapter4.reading.brightness; attach(reading); head(reading)
[Figure 22.4 panels: Reading Ability versus Brightness; Rank(Reading Ability) versus Rank(Brightness)]

Figure 22.4: Scatterplot of data and ranks of data

par(mfrow=c(1,2))
plot(reading$brightness, reading$reading, pch=16,col="red",xlab="Brightness",ylab="Reading Ability")
plot(reading.ranks$rank.brightness, reading.ranks$rank.reading, pch=16,col="red",xlab="Rank(Brightness)",ylab="Rank(Readi
par(mfrow=c(1,1))
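The statement that Spearman's correlation is the Pearson correlation of the ranks can be checked directly on the reading data. The following is a minimal sketch, assuming the ten pairs are typed in from the table rather than imported; the names r.x, r.y, r.p, r.s and r.s.builtin are illustrative.

```r
# Reading data typed in from the table above (assumption: same values as the imported dataset)
brightness <- 1:10
reading <- c(70, 70, 75, 88, 91, 94, 100, 92, 90, 85)

r.x <- rank(brightness)            # ranks of x
r.y <- rank(reading)               # ranks of y; the two 70s share rank 1.5

r.p <- cor(brightness, reading)                               # Pearson on the data
r.s <- cor(r.x, r.y)                                          # Pearson on the ranks
r.s.builtin <- cor(brightness, reading, method = "spearman")  # Spearman as reported by R

c(pearson = r.p, pearson.of.ranks = r.s, spearman = r.s.builtin)
```

The second and third values agree, and both are smaller than the Pearson correlation of the raw data for this example.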

3. Kendall’s tau.

brightness, x 1 2 3 4 5 6 7 8 9 10
reading ability, y 70 70 75 88 91 94 100 92 90 85

Consider reading data.


examples of concordant (positive) pairwise slopes are blue / red
examples of discordant (negative) pairwise slopes are blue / red
if number of concordant, nc = 32, number discordant, nd = 12, and since there
are n = 10 points,
$$\tau = \frac{n_c - n_d}{\frac{1}{2} n(n-1)} = \frac{32 - 12}{\frac{1}{2}\,10(10-1)} =$$
0.44 / 0.50 / 0.55
the one green horizontal slope neither concordant nor discordant is ignored
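The counts of concordant and discordant pairs can be obtained by checking the sign of the slope for every pair of points. The following is a minimal sketch, assuming the ten pairs are typed in from the table; the names pairs, n.c, n.d, n.tied, tau.a and tau.b are illustrative. It also shows why the hand value 0.44 differs slightly from the 0.449 printed by R earlier: to the best of my understanding, cor with method="kendall" adjusts the denominator for tied pairs (the tau-b form), whereas the formula in these notes divides by all n(n − 1)/2 pairs.

```r
# Reading data typed in from the table above (assumption: same values as the imported dataset)
brightness <- 1:10
reading <- c(70, 70, 75, 88, 91, 94, 100, 92, 90, 85)

n <- length(brightness)
pairs <- combn(n, 2)                                   # each column is one pair of point indices
dx <- brightness[pairs[2, ]] - brightness[pairs[1, ]]
dy <- reading[pairs[2, ]] - reading[pairs[1, ]]
s <- sign(dx * dy)                                     # +1 concordant, -1 discordant, 0 tied

n.c <- sum(s > 0)                                      # positive pairwise slopes
n.d <- sum(s < 0)                                      # negative pairwise slopes
n.tied <- sum(s == 0)                                  # horizontal slope, ignored

tau.a <- (n.c - n.d) / (n * (n - 1) / 2)               # formula used in these notes
tau.b <- cor(brightness, reading, method = "kendall")  # value reported by R

c(n.c = n.c, n.d = n.d, n.tied = n.tied, tau.a = tau.a, tau.b = tau.b)
```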

22.7 Spearman’s Rho


Material covered in the previous section.

[Figure 22.5 plot: Reading Ability versus Brightness, with pairwise slopes marked c (concordant) and d (discordant)]

Figure 22.5: Concordant and discordant pairwise slopes

22.8 When Should You Use Nonparametric Methods?
Material covered in previous section.
