
Nonparametric tests I

Back to basics

Lecture Outline
What is a nonparametric test?
Rank tests, distribution free tests and nonparametric tests
Which type of test to use

MTB > dotplot 'Male' 'Female';
SUBC> same.
.
: . . . .
. . :: :..:::.. :..:: :... .:.. .. . : . .
---+---------+---------+---------+---------+---------+---MALE
..: . : : : .
.: ::::::.::.:. ::.: : . : . .
---+---------+---------+---------+---------+---------+---FEMALE
  0.32      0.48      0.64      0.80      0.96      1.12

MTB > desc 'Male' 'Female'

Variable      N    Mean  Median  TrMean   StDev  SEMean
MALE         50  0.5908  0.5600  0.5770  0.1979  0.0280
FEMALE       50  0.5180  0.4950  0.5102  0.1315  0.0186

Variable    Min     Max      Q1      Q3
MALE     0.2900  1.1300  0.4275  0.7150
FEMALE   0.3200  0.8500  0.4100  0.6125

Lecture Outline
What is a nonparametric test?
  What is a parameter?
  What are examples of non-parametric tests?
Rank tests, distribution free tests and nonparametric tests
Which type of test to use

Parameters
are central to inference in GLM and
ANOVA
and represent assumptions about the
underlying processes

LET K1=4.7 # Group 1 mean minus grand mean
LET K2=-2.5 # Group 2 mean minus grand mean
LET K3=10.4 # The grand mean
LET K4=1.9 # Standard deviation of the error
RANDOM 30 'Error'
LET 'Y'=K3+K1*'DUM1'+K2*'DUM2'+K4*'Error'

Group   Fitted value
1       $\mu + \alpha_1$
2       $\mu + \alpha_2$
3       $\mu - \alpha_1 - \alpha_2$

(In the code above, $\mu$ is K3, $\alpha_1$ is K1, $\alpha_2$ is K2 and $\sigma$ is K4.)

Error has a Normal distribution with zero mean and standard deviation $\sigma$.
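
The simulation above can be sketched in Python. This is an illustration, not the lecturer's code. It assumes 10 observations per group (the source only gives 30 in total) and assumes that DUM1 and DUM2 use sum-to-zero coding, with group 3 coded as -1 on both, which is what the fitted values for group 3 imply. The function names and the seed are hypothetical.

```python
import random

# Sum-to-zero dummy coding (an assumption): group -> (DUM1, DUM2)
DUMMIES = {1: (1, 0), 2: (0, 1), 3: (-1, -1)}


def fitted_values(mu=10.4, alpha1=4.7, alpha2=-2.5):
    """Fitted value for each group under the parametric model."""
    return {g: mu + alpha1 * d1 + alpha2 * d2 for g, (d1, d2) in DUMMIES.items()}


def simulate(mu=10.4, alpha1=4.7, alpha2=-2.5, sigma=1.9, n_per_group=10, seed=1):
    """Return a list of (group, y) pairs: y = fitted value + sigma * N(0, 1) error."""
    rng = random.Random(seed)
    data = []
    for group, (d1, d2) in DUMMIES.items():
        for _ in range(n_per_group):
            error = rng.gauss(0.0, 1.0)  # standard Normal error, like RANDOM
            y = mu + alpha1 * d1 + alpha2 * d2 + sigma * error
            data.append((group, y))
    return data


if __name__ == "__main__":
    print(fitted_values())
    print(simulate()[:3])
```

Every number fed into the model (the two group effects, the grand mean, the error standard deviation, and the Normal shape of the error) is an assumption about how the data arose. That is the sense in which parameters "represent assumptions about the underlying processes".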

Parameters
are central to inference in GLM and
ANOVA
but represent assumptions about the
underlying processes
can be done without in some simple
situations BUT HOW?

Rnk   Wt    Sex   (Sex: 1 = male, 2 = female)
1     0.29  1
2     0.32  2
3     0.34  1
4     0.34  2
5     0.34  2
6     0.36  1
7     0.36  1
8     0.37  1
9     0.37  1
10    0.37  1
11    0.37  2
12    0.37  2
13    0.38  1
14    0.38  1
15    0.38  2
16    0.38  2
17    0.39  2
18    0.40  2
19    0.40  2
20    0.40  2
21    0.41  1
22    0.41  1
23    0.41  2
24    0.41  2
25    0.41  2
26    0.41  2
27    0.42  1
28    0.43  1
29    0.43  2
30    0.43  2
31    0.45  1
32    0.45  2
33    0.45  2
34    0.45  2
35    0.46  2
36    0.47  1
37    0.47  1
38    0.48  1
39    0.48  1
40    0.48  2
41    0.48  2
42    0.49  2
43    0.49  2
44    0.50  1
45    0.50  1
46    0.50  1
47    0.50  2
48    0.50  2
49    0.51  1
50    0.51  2
51    0.52  1
52    0.52  2
53    0.52  2
54    0.53  2
55    0.53  2
56    0.55  2
57    0.56  1
58    0.56  1
59    0.56  1
60    0.57  1
61    0.58  2
62    0.58  2
63    0.59  1
64    0.59  2
65    0.59  2
66    0.60  1
67    0.61  1
68    0.61  2
69    0.62  1
70    0.62  1
71    0.62  2
72    0.62  2
73    0.62  2
74    0.63  1
75    0.63  2
76    0.65  1
77    0.66  1
78    0.67  1
79    0.67  2
80    0.67  2
81    0.67  2
82    0.68  1
83    0.71  1
84    0.72  2
85    0.73  1
86    0.75  1
87    0.75  1
88    0.77  1
89    0.78  1
90    0.78  2
91    0.78  2
92    0.82  2
93    0.83  1
94    0.85  1
95    0.85  2
96    0.88  1
97    0.98  1
98    0.98  1
99    1.05  1
100   1.13  1

Remember ties: equal weights share the average of the ranks they occupy.
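
A minimal Python sketch of the ranking step, assuming tied weights are given the average of the rank positions they occupy (midranks). The two lists are the weights from the ranked table split by sex, and the helper name `midranks` is hypothetical.

```python
from collections import defaultdict

# Weights from the ranked table, split by sex (1 = male, 2 = female)
MALE = [
    0.29, 0.34, 0.36, 0.36, 0.37, 0.37, 0.37, 0.38, 0.38, 0.41, 0.41,
    0.42, 0.43, 0.45, 0.47, 0.47, 0.48, 0.48, 0.50, 0.50, 0.50, 0.51,
    0.52, 0.56, 0.56, 0.56, 0.57, 0.59, 0.60, 0.61, 0.62, 0.62, 0.63,
    0.65, 0.66, 0.67, 0.68, 0.71, 0.73, 0.75, 0.75, 0.77, 0.78, 0.83,
    0.85, 0.88, 0.98, 0.98, 1.05, 1.13,
]
FEMALE = [
    0.32, 0.34, 0.34, 0.37, 0.37, 0.38, 0.38, 0.39, 0.40, 0.40, 0.40,
    0.41, 0.41, 0.41, 0.41, 0.43, 0.43, 0.45, 0.45, 0.45, 0.46, 0.48,
    0.48, 0.49, 0.49, 0.50, 0.50, 0.51, 0.52, 0.52, 0.53, 0.53, 0.55,
    0.58, 0.58, 0.59, 0.59, 0.61, 0.62, 0.62, 0.62, 0.63, 0.67, 0.67,
    0.67, 0.72, 0.78, 0.78, 0.82, 0.85,
]


def midranks(values):
    """Map each distinct value to the average of the rank positions it occupies."""
    positions = defaultdict(list)
    for position, value in enumerate(sorted(values), start=1):
        positions[value].append(position)
    return {value: sum(p) / len(p) for value, p in positions.items()}


rank_of = midranks(MALE + FEMALE)
w_male = sum(rank_of[w] for w in MALE)      # rank sum for the males
w_female = sum(rank_of[w] for w in FEMALE)  # rank sum for the females

if __name__ == "__main__":
    print(rank_of[0.41])                # the six weights of 0.41 share one rank
    print(w_male, w_male / len(MALE))
    print(w_female, w_female / len(FEMALE))
```

Without the tie averaging, the rank sum for each sex would depend on the arbitrary order in which equal weights happen to be listed.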

[Figure: histogram with horizontal axis "Mean Rank" (0 to 100) and vertical axis running from 0 to 140]

The Male mean rank = 55.26
The Female mean rank = 45.74

W is the sum of the ranks of the MALE sample: a sum of ranks of 2763 corresponds to a mean rank of 2763/50 = 55.26.
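
The arithmetic linking the rank sum to the two mean ranks can be sketched as follows. The only inputs are W = 2763 and the two sample sizes of 50. The variable names are illustrative.

```python
n_male = 50
n_female = 50
w_male = 2763.0  # rank sum for the male sample (W)

n_total = n_male + n_female
total_rank_sum = n_total * (n_total + 1) / 2  # ranks 1..100 always sum to 5050
w_female = total_rank_sum - w_male            # the females get the rest

mean_rank_male = w_male / n_male
mean_rank_female = w_female / n_female
overall_mean_rank = total_rank_sum / n_total  # 50.5, whatever the data

if __name__ == "__main__":
    print(mean_rank_male, mean_rank_female, overall_mean_rank)
```

Because the ranks always sum to the same total, one group's mean rank fixes the other's. With equal group sizes the two mean ranks sit symmetrically about 50.5.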

MTB > mann-whitney male female

Mann-Whitney Test and CI: MALE, FEMALE

MALE     N = 50   Median = 0.5600
FEMALE   N = 50   Median = 0.4950
Point estimate for ETA1-ETA2 is 0.0500
95.0 Percent CI for ETA1-ETA2 is (-0.0100,0.1200)
W = 2763.0
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.1016
The test is significant at 0.1014 (adjusted for ties)

Cannot reject at alpha = 0.05

The null hypothesis is better expressed as: the distributions of male and female weights are the same.
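
The two significance levels in the output can be approximately reproduced with a Normal approximation to the rank sum. This is a sketch under assumptions: the source does not say how Minitab computes them, so the continuity correction of 0.5 and the standard tie adjustment to the variance are choices that happen to match the printed values. The function name is hypothetical, and the tie-group sizes were tallied from the ranked weights table.

```python
import math


def rank_sum_p_value(w, n1, n2, tie_sizes=()):
    """Two-sided p-value for a rank sum w, using a Normal approximation."""
    n = n1 + n2
    expected = n1 * (n + 1) / 2  # mean of W if the distributions are the same
    tie_term = sum(t ** 3 - t for t in tie_sizes) / (n * (n - 1))
    variance = n1 * n2 / 12 * ((n + 1) - tie_term)
    z = (abs(w - expected) - 0.5) / math.sqrt(variance)  # continuity correction
    return math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))


# Sizes of the groups of tied weights: 11 pairs, 7 triples, 4 fours, 3 fives, 1 six
TIE_SIZES = [2] * 11 + [3] * 7 + [4] * 4 + [5] * 3 + [6]

p_plain = rank_sum_p_value(2763.0, 50, 50)
p_ties = rank_sum_p_value(2763.0, 50, 50, TIE_SIZES)

if __name__ == "__main__":
    print(round(p_plain, 4), round(p_ties, 4))
```

Ties shrink the variance of the rank sum slightly, so the adjusted p-value is a little smaller than the unadjusted one. Neither falls below 0.05.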

Nonparametric vs Parametric

Nonparametric          Parametric
Sign Test              One-sample t-test
Mann-Whitney Test      Two-sample t-test
Spearman Rank Test     Correlation/Regression
Kruskal-Wallis Test    One-way ANOVA
Friedman Test          One-way blocked ANOVA

A rose by any other name...
Non-parametric tests lack parameters
Rank tests start by ranking the data
Distribution-free tests don't assume a Normal distribution (or any other)
These are mainly but not completely overlapping sets of tests (and some are scale-invariant too).

Fewer assumptions but...
still some assumptions (including independence)
limited range of situations
no more than 2 x-variables
can't mix continuous and categorical x-variables
provide p-values but estimation is dodgy
loss of efficiency if parametric assumptions are upheld
there is a grand scheme for parametric statistics (GLM) but a lot of separate strange names for nonparametrics

When is there a choice?
when there is a non-parametric test
fewer than two or three variables altogether
and prediction is not required

How to choose:
If the assumptions of the parametric test are upheld, use it on grounds of efficiency
If not upheld, consider fixing the assumptions (e.g. by transforming the data, as in the practical)
If the assumptions are not fixable, use the nonparametric test

MTB > dotplot 'LogM' 'LogF';
SUBC> same.
.
. .
.
. ::: :.. . :::.. :..::.:....: : : . : . .
+---------+---------+---------+---------+---------+-------LogM
.: . :
. .
. : ::.:: : :. ::.::. ::.:. : . : ..
+---------+---------+---------+---------+---------+-------LogF
-1.25     -1.00     -0.75     -0.50     -0.25      0.00

MTB > desc 'LogM' 'LogF'

Variable      N     Mean   Median   TrMean   StDev  SEMean
LogM         50  -0.5786  -0.5798  -0.5850  0.3248  0.0459
LogF         50  -0.6878  -0.7032  -0.6928  0.2453  0.0347

Variable     Min      Max       Q1       Q3
LogM     -1.2379   0.1222  -0.8499  -0.3355
LogF     -1.1394  -0.1625  -0.8916  -0.4902
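
A short Python sketch of what the log transformation does and does not change. It assumes natural logarithms, which is what the Min and Max values in the output imply. The sample is a handful of weights taken from the ranked table, and the helper name `midrank_list` is hypothetical.

```python
import math


def midrank_list(values):
    """Rank each value, giving tied values the average of their positions."""
    order = sorted(values)
    ranks = []
    for v in values:
        first = order.index(v) + 1                    # first position of v
        last = len(order) - order[::-1].index(v)      # last position of v
        ranks.append((first + last) / 2)
    return ranks


# A few weights from the ranked table, including one tie (0.56 twice)
weights = [0.29, 0.32, 0.56, 0.85, 1.13, 0.56]
logs = [math.log(w) for w in weights]

if __name__ == "__main__":
    print([round(x, 4) for x in logs])
    print(midrank_list(weights))
    print(midrank_list(logs))
```

Because the logarithm preserves order, the ranks (and so any rank test) are the same before and after transformation. Only the parametric analysis is affected by the change of scale.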

Last remarks
Nonparametric tests are an opportunity to revise the basic ideas of statistical inference
They are sometimes useful in biology
They are often used in biology
NEXT WEEK: more nonparametrics, including confidence intervals and randomisation tests. READ the handout
