
The European Research Journal 2023;9(4):687-696 Original Article

DOI: 10.18621/eurj.1037546 Biostatistics

Comparison of the performances of non-parametric k-sample test procedures as an alternative to one-way analysis of variance

Aslı Ceren Macunluoğlu1, Gökhan Ocakoğlu2

1Department of Biostatistics, Bursa Uludağ University, Institute of Health Sciences, Bursa, Turkey; 2Department of Biostatistics, Bursa Uludağ University, Faculty of Medicine, Bursa, Turkey

ABSTRACT
Objectives: This study compared, against the F test, how well the Kruskal-Wallis test, the van der Waerden test, the modified version of the Kruskal-Wallis test based on the permutation test, Mood's Median test, and the Savage test keep the Type-I error probability at the nominal level set at the beginning of the trial. These tests are among the non-parametric alternatives to one-way analysis of variance described in the literature.
Methods: Simulation scenarios were used to examine how the ability of the tests to protect the Type-I error is affected by homogeneous or heterogeneous variances, balanced or unbalanced sample sizes, normally or log-normally distributed data, and the number of groups to be compared.
Results: The Kruskal-Wallis test, the van der Waerden test, and the modified version of the Kruskal-Wallis test based on the permutation test were not affected by the distribution of the data, but they were affected by violation of the homogeneity of variances. The performance of Mood's Median test and the Savage test in protecting the Type-I error was not sufficient compared with the other tests.
Conclusions: The Kruskal-Wallis test, the van der Waerden test, and the modified version of the Kruskal-Wallis test based on the permutation test were not affected by the distribution of the data and tended to preserve the Type-I error when the variances were homogeneous.
Keywords: Analysis of variance, conformity of normal distribution, non-parametric k-sample tests

Data analysis methods applied to research data measured on at least an interval scale vary according to sample size, the distribution of the data, and the number of groups to be compared. One of the most critical steps of statistical data analysis is to decide whether the test procedure used to analyze the data will be a parametric or a non-parametric test. Parametric tests are statistical methods that require data to be measured on an interval or ratio scale and that can be applied only when certain assumptions hold. Non-parametric test procedures are preferred as alternatives when the assumptions necessary for parametric tests are not met.
One-way analysis of variance (ANOVA) or F-test,

Received: December 16, 2021; Accepted: August 10, 2022; Published Online: October 24, 2022

How to cite this article: Macunluoğlu AC, Ocakoğlu G. Comparison of the performances of non-parametric k-sample test procedures as an alternative
to one-way analysis of variance. Eur Res J 2023;9(4):687-696. DOI: 10.18621/eurj.1037546
e-ISSN: 2149-3189
Address for correspondence: Gökhan Ocakoğlu, PhD., Professor, Bursa Uludağ University, Faculty of Medicine , Department of Biostatistics, Bursa,
Turkey, E-mail: gocakoglu@gmail.com, Phone: +90 224 295 38 71
Copyright © 2023 by Prusa Medical Publishing
Available at http://dergipark.org.tr/eurj
info@prusamp.com

The European Research Journal Volume 9 Issue 4 July 2023 687


Eur Res J 2023;9(4):687-696 Non-parametric k-sample test procedures as an alternative to ANOVA

which is a parametric test, is used to compare the means of more than two populations and is one of the most frequently used and most important statistical methods for this purpose [1]. The assumptions of the F test are that the data are normally distributed, the sample variances are equal, and the samples are independent [2]. If the assumption of normality or of homogeneity of variance is violated, the Type-I error probability obtained at the end of the trials and the power of the test are adversely affected. This adverse effect becomes even more evident if the sample sizes of the groups compared are not balanced [3]. Non-parametric tests are statistical procedures preferred as an alternative to parametric tests when assumptions are not met. Non-parametric tests have fewer assumptions than parametric tests [4]. The data need not conform to a normal distribution. Non-parametric tests can be applied to data measured on a nominal or ordinal scale.
Pearson [5], Glass et al. [6], and Wilcox [7] examined the effect of violating the normality assumption on the Type-I error. Wilcox [7] concluded that samples that do not conform to the normal distribution have some impact on the Type-I error rate, but the effect is mini-
Table 1. Sample sizes of the groups

Number of groups: 3
Balanced sample: 3,3,3; 5,5,5; 10,10,10; 15,15,15; 20,20,20; 25,25,25; 30,30,30; 50,50,50; 80,80,80; 100,100,100
Non-balanced sample, observation combinations where the sample sizes are not equal: 3,5,7; 5,10,15; 20,25,30; 50,60,70; 65,75,85; 70,90,100
Non-balanced sample, observation combinations where the sample sizes differ excessively: 3,25,30; 3,80,80; 5,20,100
Non-balanced sample, observation combinations with inverse matching between variance and sample size: 7,5,3; 15,10,5; 30,25,20; 70,60,50; 85,75,65; 100,90,70

Number of groups: 5
Balanced sample: 3,3,3,3,3; 5,5,5,5,5; 10,10,10,10,10; 15,15,15,15,15; 20,20,20,20,20; 25,25,25,25,25; 30,30,30,30,30; 50,50,50,50,50; 80,80,80,80,80; 100,100,100,100,100
Non-balanced sample, observation combinations where the sample sizes are not equal: 3,5,7,9,11; 5,7,9,12,15; 20,22,24,28,30; 50,55,60,65,70; 55,65,75,85,95; 60,70,80,90,100
Non-balanced sample, observation combinations where the sample sizes differ excessively: 3,20,25,80,100; 3,5,30,80,100; 5,10,20,25,80; 3,5,10,15,100
Non-balanced sample, observation combinations with inverse matching between variance and sample size: 7,5,3; 15,10,5; 30,25,20; 70,60,50; 85,75,65; 100,90,70

Number of groups: 8
Balanced sample: 3,3,3,3,3,3,3,3; 5,5,5,5,5,5,5,5; 10,10,10,10,10,10,10,10; 15,15,15,15,15,15,15,15; 20,20,20,20,20,20,20,20; 25,25,25,25,25,25,25,25; 30,30,30,30,30,30,30,30; 50,50,50,50,50,50,50,50; 80,80,80,80,80,80,80,80; 100,100,100,100,100,100,100,100
Non-balanced sample, observation combinations where the sample sizes are not equal: 3,5,7,9,11,12,14,15; 20,22,24,25,26,28,29,30; 50,55,60,65,70,75,80,85; 60,65,75,80,85,90,95,100
Non-balanced sample, observation combinations where the sample sizes differ excessively: 3,5,10,20,25,30,80,100; 5,10,20,20,25,80,90,100; 3,5,10,80,80,90,100,100; 20,25,30,80,90,90,100,100
Non-balanced sample, observation combinations with inverse matching between variance and sample size: 15,14,12,11,9,7,5,3; 30,29,28,26,25,24,22,20; 85,80,75,70,65,60,55,50; 100,95,90,85,80,75,65,60


mal if the variances are homogeneous. Glass et al. [6] reported results similar to those of Wilcox [7] when the variances were homogeneous. Buning [8] examined the performances of the F test and of the Kruskal-Wallis test, the normal score test, and the Welch test, which he considered as alternatives to the F test, in terms of Type-I error and power. He evaluated the tests under various simulation scenarios covering homogeneous and heterogeneous variances, equal and unequal sample sizes, and normally and non-normally distributed data. Moder [2] stated that the location parameters of the groups should be investigated in detail when sample sizes are unbalanced.
In our study, we compared the performances of the Kruskal-Wallis test, Mood's Median test, the van der Waerden test, the modified version of the Kruskal-Wallis test based on the permutation test, and the Savage test, which are among the non-parametric alternatives to the F test, in protecting the Type-I error under various simulation scenarios.

METHODS

In our study, the Kruskal-Wallis test, the modified version of the Kruskal-Wallis test based on the permutation test, Mood's Median test, the van der Waerden test, and the Savage test were compared with the F test in terms of maintaining the Type-I error probability determined at the beginning of the experiment. Simulation scenarios were run in the R program [9].
The performance of the tests was evaluated in comparisons of three, five, and eight groups for simulation scenarios involving balanced/non-balanced sample sizes (Table 1), normal or log-normal distributions, and homogeneous or heterogeneous variances (Table 2). In addition to these conditions, observation combinations were included in which the sample sizes vary excessively among the groups, and in which variance and sample size are inversely matched, so that the group with the higher variance is assigned fewer observations and the group with the lower variance is assigned more observations.
In comparisons made to determine the Type-I error, group means were taken as equal. The Type-I error probability for each simulation scenario was obtained from the number of H0 hypotheses rejected over 50000 repetitions. The evaluation criterion proposed by Peterson [10] was adopted: a test whose Type-I error probability was between 4.49% and 5.49% was considered to maintain the Type-I error sufficiently.
Table 2 shows the variance ratios of the groups for the normal distribution and the scale parameter values of the groups for the log-normal distribution.

The F Test
One-way analysis of variance (ANOVA), or the F test, is used to compare the means of more than two populations. It is one of the most important and frequently used methods of applied statistics [1]. The null hypothesis H0: µ1=µ2=…=µk is tested against the alternative H1: at least one µi (i= 1, 2, . . ., k) is different. The F test statistic is

$$F = \frac{\sum_{i=1}^{k} n_i (\bar{X}_{i.} - \bar{X}_{..})^2 / (k-1)}{\sum_{i=1}^{k}\sum_{j=1}^{n_i} (X_{ij} - \bar{X}_{i.})^2 / (N-k)}$$

In this equation, k is the number of groups, N is the total number of observations, $X_{ij}$ is the jth observation (j = 1, 2, . . ., ni) in the ith group (i = 1, 2, . . ., k), $\bar{X}_{..}$ is the overall mean, and $\bar{X}_{i.}$ is the sample mean for the ith group. The F test is more powerful if the assumptions of normality and variance homogeneity hold. The null hypothesis H0: µ1=µ2=…=µk should then be rejected at the α level of significance when F ≥ F1-α;k-1,N-k.

The Kruskal-Wallis Test
One of the non-parametric alternatives to the F test is the Kruskal-Wallis (KW) test. The KW test is a non-parametric test procedure used to compare three or more independent groups [11]. It is carried out using ranks assigned to the observations instead of the actual observation values. To calculate the test statistic, the data are sorted from smallest to largest, and each observation is assigned a rank. $R_i = \sum_{j=1}^{n_i} R(X_{ij})$ is the sum of ranks assigned to the observations in the ith group. The null hypothesis H0: θ1= θ2=…= θk is tested against the alternative H1: at least one θi (i= 1, 2, . . ., k) is different. The test statistic is calculated as,


Table 2. Variance ratios of groups

Number of groups: 3
Normal distribution, homogeneous variance: 1,1,1; 2,2,2; 4,4,4; 8,8,8; 10,10,10
Normal distribution, heterogeneous variance: 1,1,2; 1,2,2; 1,1,4; 1,4,4; 1,1,8; 1,8,8; 1,1,10; 1,10,10; 1,4,8; 2,1,1; 2,2,1; 4,1,1; 4,4,1; 8,1,1; 8,8,1; 10,1,1; 10,10,1; 8,4,1
Log-normal distribution, homogeneous scale parameter (b): 0.1,0.1,0.1; 0.2,0.2,0.2; 0.3,0.3,0.3; 0.4,0.4,0.4; 0.5,0.5,0.5; 0.6,0.6,0.6; 0.7,0.7,0.7; 0.8,0.8,0.8
Log-normal distribution, heterogeneous scale parameter (b): 0.10,0.10,0.20; 0.10,0.20,0.20; 0.10,0.30,0.50; 0.10,0.40,0.50; 0.10,0.10,0.50; 0.10,0.50,0.60; 0.10,0.60,0.80; 0.20,0.10,0.10; 0.20,0.20,0.10; 0.50,0.30,0.10; 0.50,0.40,0.10; 0.50,0.10,0.10; 0.60,0.50,0.10; 0.80,0.60,0.10

Number of groups: 5
Normal distribution, homogeneous variance: 1,1,1,1,1; 2,2,2,2,2; 4,4,4,4,4; 8,8,8,8,8; 10,10,10,10,10
Normal distribution, heterogeneous variance: 1,1,2,2,2; 1,1,4,4,4; 1,1,8,8,8; 1,1,10,10,10; 1,2,4,8,10; 2,2,2,1,1; 4,4,4,1,1; 8,8,8,1,1; 10,10,10,1,1; 10,8,4,2,1
Log-normal distribution, homogeneous scale parameter (b): 0.1,0.1,0.1,0.1,0.1; 0.2,0.2,0.2,0.2,0.2; 0.3,0.3,0.3,0.3,0.3; 0.4,0.4,0.4,0.4,0.4; 0.5,0.5,0.5,0.5,0.5; 0.6,0.6,0.6,0.6,0.6; 0.7,0.7,0.7,0.7,0.7; 0.8,0.8,0.8,0.8,0.8
Log-normal distribution, heterogeneous scale parameter (b): 0.1,0.1,0.2,0.2,0.2; 0.1,0.1,0.4,0.4,0.4; 0.1,0.1,0.5,0.5,0.5; 0.1,0.1,0.6,0.7,0.8; 0.1,0.3,0.5,0.7,0.8; 0.2,0.2,0.2,0.1,0.1; 0.4,0.4,0.4,0.1,0.1; 0.5,0.5,0.5,0.1,0.1; 0.8,0.7,0.6,0.1,0.1; 0.8,0.7,0.5,0.3,0.1

Number of groups: 8
Normal distribution, homogeneous variance: 1,1,1,1,1,1,1,1; 2,2,2,2,2,2,2,2; 4,4,4,4,4,4,4,4; 8,8,8,8,8,8,8,8; 10,10,10,10,10,10,10,10
Normal distribution, heterogeneous variance: 1,1,1,1,1,1,1,2; 1,1,1,1,1,1,1,4; 1,1,1,1,1,1,1,8; 1,1,1,1,1,1,1,10; 1,1,1,2,2,2,4,4; 1,1,1,1,4,4,4,4; 1,1,1,1,8,8,10,10; 2,1,1,1,1,1,1,1; 4,1,1,1,1,1,1,1; 8,1,1,1,1,1,1,1; 10,1,1,1,1,1,1,1; 4,4,2,2,2,1,1,1; 4,4,4,4,1,1,1,1; 10,10,8,8,1,1,1,1; 10,10,8,8,4,4,2,1
Log-normal distribution, homogeneous scale parameter (b): 0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1; 0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2; 0.3,0.3,0.3,0.3,0.3,0.3,0.3,0.3; 0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4; 0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5; 0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6; 0.7,0.7,0.7,0.7,0.7,0.7,0.7,0.7; 0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8
Log-normal distribution, heterogeneous scale parameter (b): 0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.2; 0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.3; 0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.5; 0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.7; 0.1,0.1,0.1,0.3,0.3,0.3,0.5,0.5; 0.1,0.1,0.1,0.1,0.6,0.6,0.8,0.8; 0.2,0.3,0.4,0.5,0.6,0.7,0.7,0.8; 0.2,0.2,0.2,0.4,0.4,0.8,0.8,0.8; 0.2,0.1,0.1,0.1,0.1,0.1,0.1,0.1; 0.3,0.1,0.1,0.1,0.1,0.1,0.1,0.1; 0.5,0.1,0.1,0.1,0.1,0.1,0.1,0.1; 0.7,0.1,0.1,0.1,0.1,0.1,0.1,0.1; 0.5,0.5,0.3,0.3,0.3,0.1,0.1,0.1; 0.8,0.8,0.6,0.6,0.1,0.1,0.1,0.1; 0.8,0.7,0.7,0.6,0.5,0.4,0.3,0.2; 0.8,0.8,0.8,0.4,0.4,0.2,0.2,0.2
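Tables 1 and 2 pair a vector of group sizes with a vector of variances (normal case) or scale parameters (log-normal case). As a minimal sketch of how one such scenario could be generated, the Python snippet below draws k groups that share a common location. The authors worked in R [9] and their code is not shown, so the helper name `draw_groups`, the use of Python's standard `random` module, the reading of the log-normal scale parameter b as the standard deviation on the log scale, and the choice of log-mean 0 are all assumptions made for illustration.

```python
import math
import random


def draw_groups(sizes, spreads, dist="normal", rng=None):
    """Draw one simulated data set in which all groups share a location.

    sizes   -- observations per group, e.g. (7, 5, 3) from Table 1
    spreads -- per-group variances (normal) or scale parameters b (log-normal)
    dist    -- "normal" or "lognormal"
    Hypothetical helper for illustration; not the authors' R code.
    """
    if len(sizes) != len(spreads):
        raise ValueError("sizes and spreads need one entry per group")
    rng = rng or random.Random()
    groups = []
    for n, s in zip(sizes, spreads):
        if dist == "normal":
            # Table 2 lists variances, so the standard deviation is sqrt(s).
            groups.append([rng.gauss(0.0, math.sqrt(s)) for _ in range(n)])
        elif dist == "lognormal":
            # Assumption: b is the standard deviation of log(X), log-mean 0.
            # This equalises the group medians, not the arithmetic means.
            groups.append([rng.lognormvariate(0.0, s) for _ in range(n)])
        else:
            raise ValueError("dist must be 'normal' or 'lognormal'")
    return groups


# Inverse matching: the smallest group receives the largest variance.
example = draw_groups((7, 5, 3), (1, 4, 8), rng=random.Random(1))
print([len(group) for group in example])
```

Passing an explicit `random.Random(seed)` keeps a scenario reproducible, which matters when the same data set has to be fed to several tests.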


$$H = \frac{1}{S^2}\left[\sum_{i=1}^{k}\frac{R_i^2}{n_i} - \frac{N(N+1)^2}{4}\right]$$

where

$$S^2 = \frac{1}{N-1}\left[\sum_{i=1}^{k}\sum_{j=1}^{n_i} R(X_{ij})^2 - \frac{N(N+1)^2}{4}\right]$$

and $R(X_{ij})$ is the rank of observation $X_{ij}$. Note that, when there are no ties, S2 simplifies to N(N + 1)/12.
The null hypothesis H0: θ1= θ2=…= θk should then be rejected at the α level of significance when H ≥ χ2α;k-1.

The Modified Version of Kruskal-Wallis Test Based on Permutation Test
The permutation test is a test procedure presented by Fisher [12]. The probability values it yields are exact probabilities, and Hecke [13] also describes it as a simulation method used to determine the power of the test.
There are two methods for calculating the KW test: permutation and rank transformations. The modified version of the KW test based on the permutation test is obtained by combining the permutation method based on the F statistic with the rank method [11]. The process of obtaining the permutations starts by choosing the test statistic T and the acceptable significance level α. Let π1, π2, …, πn be the set of all distinct permutations of the ranks of the data set in the experiment. For permutation testing, the data are sorted from smallest to largest, each observation is given a rank, and the KW test statistic is calculated (H1=t0). A different permutation (πi) of the ranked data is then obtained, the KW test statistic is calculated for that permutation (Hi=H(πi)), and this process is repeated (i= 2, 3, ..., M).
The null hypothesis H0: θ1= θ2=…= θk is tested against the alternative H1: at least one θi (i= 1, 2, …, k) is different. The test statistic is calculated as [13],

$$p_0 = P(H \ge t_0) = \frac{1}{M}\sum_{i=1}^{M}\psi(H_i \ge t_0)$$

where

$$\psi(H_i \ge t_0) = \begin{cases} 1, & H_i \ge t_0 \\ 0, & H_i < t_0 \end{cases}$$

Under the empirical distribution, if p0 ≤ α, reject the null hypothesis.

The Mood's Median Test
Mood's Median (MM) test generalizes the median test for two independent groups to comparisons of three or more samples [15]. The null hypothesis H0: θ1= θ2=…= θk is tested against the alternative H1: at least one θi (i= 1, 2, …, k) is different.
To obtain the test statistic of the MM test, the common median of all data is first calculated. As a second step, for each sample, the number of observations greater than the common median and the number equal to or less than it are determined. As a result, a 2xk frequency table is obtained. The test statistic is calculated as,

$$\chi^2 = \sum_{i}\sum_{j}\frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

where $O_{ij}$ and $E_{ij}$ are the observed and expected frequencies in cell (i, j) of the 2xk table. The null hypothesis H0: θ1= θ2=…= θk should then be rejected at the α level of significance when χ2 ≥ χ2α;(i-1)*(j-1)

The van der Waerden Test
The advantage of the van der Waerden test is that it provides the high efficiency of the standard ANOVA analysis when the normality assumptions are in fact satisfied, but it also provides the robustness of the KW test when the normality assumptions are not satisfied [16]. The KW test is based on the ranks of the data. The van der Waerden test converts the ranks to quantiles of the standard normal distribution. These are called normal scores and the test is computed from these normal scores [17]. The null hypothesis H0: θ1= θ2=…= θk is tested against the alternative H1: at least one θi (i= 1, 2, …, k) is different. The formula for the van der Waerden test is

$$V = \frac{1}{s^2}\sum_{i=1}^{k} n_i \bar{A}_i^2$$

where

$$A_{ij} = \Phi^{-1}\left(\frac{R(X_{ij})}{N+1}\right), \quad \bar{A}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} A_{ij}, \quad s^2 = \frac{1}{N-1}\sum_{i=1}^{k}\sum_{j=1}^{n_i} A_{ij}^2$$

and $\Phi^{-1}(x)$


is the normal quantile of x. The null hypothesis should then be rejected at the α level of significance when V ≥ χ2α;k-1

The Savage Test
The Savage test is among the non-parametric alternatives to the F test used to test the differences between location parameters. The Savage test is powerful for comparing scale differences or position differences in the extreme value distribution, which are compatible with the exponential distribution [18].
The Savage test statistic is calculated from Savage scores. The null hypothesis H0: θ1= θ2=…= θk is tested against the alternative H1: at least one θi (i= 1, 2, …, k) is different. The formula for the Savage test is

[Equation: Savage test statistic TE; not recoverable from the source]

where

[Equation: definition of the Savage scores and their variance; not recoverable from the source]

The null hypothesis should then be rejected at the α level of significance when TE ≥ χ2α;k-1.

RESULTS

In this study, the tests were compared by means of simulation scenarios in terms of Type-I error protection. Simulation scenarios were performed in the R program [9]. The obtained Type-I errors are given in tables.

Comparisons in which sample size is balanced, the group variances are homogeneous, and the data follow the normal distribution (Table 3, Supplementary Table 1, 2)
Compared with the non-parametric alternatives, the F test shows the most successful performance in maintaining the predetermined Type-I error level.
In addition to the F test, the KW test also tends to maintain the Type-I error level in terms of observation combinations, and the increase in the number of groups to be compared (especially in the case of eight groups) has a positive effect on its performance.

Comparisons in which the sample size is not balanced, the group variances are homogeneous, and the data follow the normal distribution (Supplementary Table 3-8)
The F test and the modified version of the KW test based on the permutation test are the most successful tests in maintaining the Type-I error level initially determined. They are followed by the KW test, which gave a deviating estimate in only a single simulation scenario.
The other tests included in the study were found to be adversely affected by the imbalance of the sample sizes in the groups, and their performance in protecting the Type-I error determined at the beginning was not sufficient.
When simulation scenarios involving observation combinations in which the sample sizes in the groups differ excessively were examined, it was observed that the F test and the modified version of the KW test based on the permutation test were not affected by the extreme differences in sample size and tended to maintain the Type-I error level initially determined in all simulation scenarios according to the Peterson criterion.
On the other hand, the KW test, which performs at a level comparable to these two tests when the sample sizes in the groups vary moderately, was adversely affected and gave deviating results when the differences in sample size were excessive.

Comparisons in which the sample size is balanced, group variances are heterogeneous, but the data follow the normal distribution (Supplementary Table 9-11)
The tests included in the study generally gave deviating results in terms of protecting the Type-I error, and their performance was not found sufficient.

Comparisons in which the sample size is not balanced, group variances are heterogeneous, but the data follow the normal distribution (Supplementary Table 12-20)
It has been seen that the tests included in the study generally give deviating results in terms of protecting


Table 3. Type-I error rates (%) for k=3 groups where the group variances are equal (σ² = 1, 2, 4, 8, 10), µ1=µ2=µ3=0, and sample size is balanced (n1=n2=n3)

σ²   n     F      KW     Perm KW  MM     VW     Savage
1    3     4.70   8.51   5.03     0      1.11   0.71
1    5     4.98   5.49   4.90     6.67   4.85   2.74
1    10    5.03   5.11   4.42     3.48   4.47   3.97
1    15    5.09   5.21   4.74     4.69   4.78   4.29
1    20    5.07   5.15   4.78     4.74   4.87   4.43
1    25    5.08   5.14   4.83     4.77   4.78   4.48
1    30    4.93   4.92   4.75     4.15   4.68   4.63
1    50    5.11   5.13   4.99     4.88   4.99   4.68
1    80    5.07   5.08   4.98     5.54   5.04   4.89
1    100   5.04   5.05   5.00     5.12   5.00   4.84
2    3     4.82   8.31   4.60     0      1.02   0.71
2    5     4.93   5.51   4.82     6.80   3.86   2.64
2    10    4.88   5.15   4.29     3.37   4.42   3.71
2    15    5.03   5.20   4.64     4.67   4.77   4.19
2    20    4.80   4.99   4.51     4.61   4.62   4.24
2    25    5.15   5.26   4.95     4.88   5.01   4.67
2    30    5.03   5.09   4.86     4.40   4.96   4.62
2    50    4.93   5.01   4.84     4.79   4.93   4.79
2    80    4.99   5.06   4.92     5.48   4.95   4.78
2    100   5.00   4.97   4.95     5.10   4.98   4.76
4    3     5.08   8.57   4.70     0      1.03   0.67
4    5     4.96   5.46   4.79     6.59   3.83   2.64
4    10    5.06   5.29   4.46     3.31   4.55   3.82
4    15    4.77   4.90   4.60     4.49   4.73   4.19
4    20    5.10   5.10   4.78     4.74   4.87   4.43
4    25    4.95   4.94   4.84     4.81   4.85   4.61
4    30    5.21   5.13   4.60     4.14   4.62   4.52
4    50    5.10   5.01   4.80     4.84   4.78   4.87
4    80    4.89   4.85   4.95     5.42   4.97   4.83
4    100   4.86   4.92   4.85     5.09   4.88   4.80
8    3     4.97   8.59   4.84     0      1.13   0.72
8    5     4.78   5.29   4.67     6.66   4.84   2.73
8    10    5.07   5.20   4.46     3.43   4.55   3.85
8    15    5.15   5.35   4.76     4.72   4.84   4.42
8    20    4.90   4.93   4.62     4.79   4.56   4.37
8    25    5.01   5.12   4.79     4.70   4.85   4.55
8    30    5.00   5.04   4.82     4.36   4.76   4.53
8    50    5.08   5.03   4.97     4.82   4.95   4.84
8    80    5.21   5.22   5.13     5.51   5.12   4.99
8    100   5.09   4.93   5.05     5.14   5.07   5.01
10   3     5.10   8.66   4.90     0      1.16   0.72
10   5     4.99   5.47   4.91     6.74   3.90   2.71
10   10    4.88   5.08   4.29     3.46   4.40   3.76
10   15    5.01   5.05   4.61     4.47   4.61   4.23
10   20    5.02   5.09   4.73     4.85   4.77   4.28
10   25    5.08   5.08   4.85     4.82   4.83   4.55
10   30    4.90   4.99   4.73     4.38   4.82   4.52
10   50    5.06   4.99   4.84     4.85   4.86   4.60
10   80    5.08   5.10   4.95     4.88   4.92   4.68
10   100   5.09   5.06   5.04     4.95   5.00   4.80

F= F test, KW= Kruskal-Wallis test, Perm KW= The modified version of Kruskal-Wallis test based on permutation test, MM= Mood's Median test, VW= van der Waerden test
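Each entry in a table such as Table 3 is a rejection percentage over repeated simulated data sets, judged against the 4.49% to 5.49% interval quoted in the Methods. The Python sketch below shows that procedure for the Kruskal-Wallis test in the balanced, homogeneous, normal case. It is not the authors' R code: the function names are hypothetical, the statistic uses the no-ties form of H (S2 = N(N + 1)/12), the sketch is limited to three groups so that the chi-square critical value with 2 degrees of freedom can be written in closed form as -2 ln(α), and the repetition count is far below the 50000 used in the study.

```python
import math
import random


def kruskal_wallis_h(groups):
    """KW statistic without tie correction (continuous data assumed)."""
    pooled = sorted((value, g) for g, grp in enumerate(groups) for value in grp)
    rank_sums = [0.0] * len(groups)
    for rank, (_, g) in enumerate(pooled, start=1):
        rank_sums[g] += rank
    n_total = len(pooled)
    term = sum(r * r / len(grp) for r, grp in zip(rank_sums, groups))
    return 12.0 / (n_total * (n_total + 1)) * term - 3.0 * (n_total + 1)


def type1_error_kw3(n_per_group, reps, alpha=0.05, seed=0):
    """Share of rejections when three N(0, 1) groups have equal means."""
    rng = random.Random(seed)
    # Chi-square with 2 df has survival function exp(-x / 2),
    # so the critical value is -2 * ln(alpha), about 5.99 for alpha = 0.05.
    critical = -2.0 * math.log(alpha)
    rejections = 0
    for _ in range(reps):
        groups = [[rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
                  for _ in range(3)]
        if kruskal_wallis_h(groups) >= critical:
            rejections += 1
    return rejections / reps


def within_peterson(rate_percent):
    """Interval quoted in the Methods: 4.49% to 5.49%."""
    return 4.49 <= rate_percent <= 5.49


rate = 100.0 * type1_error_kw3(n_per_group=20, reps=2000, seed=42)
print(f"estimated Type-I error: {rate:.2f}% (sufficient: {within_peterson(rate)})")
```

With only a few thousand repetitions the estimate carries a standard error of roughly half a percentage point, so a single run can fall outside the interval by chance. That is one reason a large repetition count is needed before applying the criterion.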


the Type-I error and their performance is not sufficient.

Comparisons in which the sample size is balanced, group variances are homogeneous, and the data follow the log-normal distribution (Supplementary Table 21-23)
As expected, the F test shows the most successful performance in maintaining the Type-I error level determined at the beginning, compared with the non-parametric alternatives available. In addition to the F test, the KW test also tends to maintain the Type-I error level in terms of observation combinations, and the increase in the number of groups to be compared (especially in the case of eight groups) has a positive effect on its performance.
The performance of the MM test was also positively affected by the increase in the number of groups. Although its performance in protecting the Type-I error is lower than that of the KW test, its performance with eight groups increased significantly compared with three and five groups.

Comparisons in which the sample size is not balanced, group variances are homogeneous, and the data follow the log-normal distribution (Supplementary Table 24-29)
When simulation scenarios involving observation combinations in which the sample sizes in the groups are not equal are examined, the F test and the KW test show the most successful performance in maintaining the Type-I error level determined at the beginning. These tests are followed by the modified version of the KW test based on the permutation test.
When simulation scenarios involving observation combinations in which the sample sizes in the groups differ excessively were examined, it was observed that the permutation version of the F test and the KW test was not affected by the extreme differences in sample size among the groups.
The other tests included in the study were found to be adversely affected by the imbalance of the sample sizes in the groups, and their performance in maintaining the Type-I error level determined at the beginning was not sufficient.

Comparisons in which the sample size is balanced, group variances are heterogeneous, and the data follow the log-normal distribution (Supplementary Table 30-32)
The tests included in the study generally gave deviating results in terms of protecting the Type-I error, and their performance was not found sufficient.

Comparisons in which the sample size is not balanced, group variances are heterogeneous, and the data follow the log-normal distribution (Supplementary Table 33-41)
The tests included in the study generally gave deviating results in terms of protecting the Type-I error, and their performance was not found sufficient.

DISCUSSION

As expected, the F test shows the most successful performance when the data conform to the normal distribution and the variances are homogeneous. In the simulation scenarios where the assumption of homogeneity of variances is not met, the F test was, as expected, highly affected by the deterioration in group variances and failed to maintain the Type-I error at the nominal level (α = 0.05). These results are similar to those of the studies conducted by Buning [8] and Moder [2]. The F test also shows the most successful performance compared with the alternative tests when the data conform to the log-normal distribution and the variances are homogeneous. Blanca et al. [19], Clinch and Keselman [20], Gamage and Weerahandi [21], Lantz [22] and Schmider et al. [23] reported that the F test tends to protect the Type-I error in cases where the assumption of conformity to the normal distribution is violated. It was observed that violation of the homogeneity of variances affected the performance of the F test more than violation of the assumption of conformity to the normal distribution. Bishop and Dudewicz [3], Blanca et al. [19], Brown and Forsythe [24], Buning [8], Debeuckelaer [25], Lee and Ahn [26], Li et al. [27], Lu and Mathew [28], Markowski [29], Keselman et al. [30], Tomarken and Serlin [31] concluded that the F test is highly affected by the deterioration in group variances.


In this study, the KW test was not affected by the distribution of the data. It was concluded that violation of the homogeneity of variances and the sample sizes in the groups (equal or unequal) affected the performance of the KW test in protecting the Type-I error. Hoeffding [32] and Terry [33] concluded that the performance of the KW test was not sufficient in terms of protecting the Type-I error in cases where the variance was not homogeneous. Lantz [22], Luh and Guo [34], Jett and Speer [35] found in their studies that the KW test was not affected by the distribution of the data and that it tends to protect the Type-I error in cases where the variances are homogeneous.
The modified version of the KW test is not affected by the distribution of the data; like the KW test, it is highly affected by violation of the homogeneity of variances. It can be suggested as an alternative to the F test for observation combinations where the sample sizes in the groups are not equal and differ excessively. Odiase and Ogbonmwan [14] reported in their study that the permutation test does not require assumptions about the distribution of the data and that it performs well on both normally and non-normally distributed data.
The van der Waerden test was not affected by the distribution of the data and showed successful performance in protecting the Type-I error in observation combinations where the group variances were homogeneous and the sample sizes in the groups differed considerably. The van der Waerden test was greatly affected by the breakdown in group variance. Luepsen [1] stated that the van der Waerden test was the most successful test after the F test in maintaining the Type-I error level in cases where there is no relationship between group variances and the number of observations belonging to the groups.
Although the MM test performed well as the number of groups compared increased, it did not show a successful performance in protecting the Type-I error in general. Jett and Speer [35] stated in their simulation studies that the performance of the MM test was not sufficient to protect the Type-I error, findings that support our study.
The Savage test could not perform adequately to protect the Type-I error at the nominal level and gave biased results. There is no study in the literature regarding the Savage test. Our study aims to contribute to the literature by reporting that the Savage test's performance in protecting the Type-I error, compared with the other tests, gives very poor and biased results.

CONCLUSION

In conclusion, as stated in the literature, it was determined that the F test tends to maintain its robustness when the normal distribution is violated; however, it is more affected by violation of the homogeneity of variances. It was concluded that the distribution of the data did not affect the KW test's performance in protecting the Type-I error, whereas violation of the homogeneity of variances and the sample sizes in the groups did. The modified version of the KW test based on the permutation test is not affected by the distribution of the data; like the KW test, it is highly affected by violation of the homogeneity of variances. It can be suggested as an alternative to the F test for combinations of observations where the sample sizes in the groups are not equal and vary excessively. The van der Waerden test was not affected by the distribution of the data and showed successful performance in protecting the Type-I error in observation combinations where the group variances were homogeneous and the sample sizes in the groups differed considerably. In general, the MM test did not show a successful performance in protecting the Type-I error. The Savage test's performance in protecting the Type-I error, compared with the other tests, was found to give very poor and biased results.

Authors' Contribution
Study Conception: GO; Study Design: GO; Supervision: GO; Funding: N/A; Materials: N/A; Data Collection and/or Processing: ACM; Analysis and/or Data Interpretation: ACM, GO; Literature Review: ACM; Manuscript Preparation: ACM, GO and Critical Review: ACM, GO.

Conflict of interest
The authors disclosed no conflict of interest during the preparation or publication of this manuscript.

Financing
The authors disclosed that they did not receive any grant during conduction or writing of this study.




Supplementary Tables 1 to 41

REFERENCES

1. Luepsen H. Comparison of nonparametric analysis of variance methods: a vote for van der Waerden. Commun Stat Simul Comput 2018;47:2547-76.
2. Moder K. Alternatives to F-test in one way ANOVA in case of heterogeneity of variances (a simulation study). Psychol Test Assess Model 2010;52:343-53.
3. Bishop TA, Dudewicz EJ. Exact analysis of variance with unequal variances: test procedures and tables. Technometrics 1978;20:419-30.
4. McSeeney M, Katz B. Nonparametric statistics: use and nonuse. Percept Mot Skills 1978;46(3_suppl):1023-32.
5. Pearson ES. The analysis of variance in cases of non-normal variation. Biometrika 1931;23:114-33.
6. Glass G, Peckham P, Sande J. Consequences of failure to meet assumptions underlying the fixed effects analyses of variance and covariance. Rev Educ Res 1972;42:237-88.
7. Wilcox RR. ANOVA: a paradigm for low power and misleading measures of effect size? Rev Educ Res 1995;65:51-77.
8. Buning H. Robust analysis of variance. J Appl Stat 1997;24:319-32.
9. R Development Core Team. R: A Language and Environment for Statistical Computing [Computer software manual]. Vienna, Austria:. [cited 2018] Available from http://www.Rproject.org/
10. Peterson K. Six modifications of the aligned rank transform test for interaction. J Modern Appl Stat Methods 2002;1:100-9.
11. Kruskal WH, Wallis A. Use of ranks in one-criterion variance analysis. J Am Stat Assoc 1952;47:583-621.
12. Fisher RA. The Design of Experiments. Edinburgh: Oliver and Boyd; 1935.
13. Hecke TV. Power Study of Anova versus Kruskal-Wallis Test, 2010.
14. Odiase JI, Ogbonmwan SM. JMASM20: exact permutation critical values for the Kruskal-Wallis One-way ANOVA. J Modern Appl Stat Methods 2005;4:609-20.
15. Brown GW, Mood AM. On Median Tests for Linear Hypotheses. University of California Press, 1951: pp. 159-66.
16. Conover WJ. Practical Nonparametric Statistics. 3rd ed. Wiley; 1999: p. 396-406.
17. van der Waerden B. Order Tests for The Two-Sample Problem II, III, Proceedings of the Koninklijke Nederlandse Akademie van Wetenschappen. Serie A 1953;564:303-10 and 311-6.
18. Hajek J. A Course in Nonparametric Statistics. San Francisco: Holden-Day, 1969: p.83.
19. Blanca M, Alarcón R, Arnau J, Bono R, Bendayan R. Non-normal data: Is ANOVA still a valid option? Psicothema 2017;29:552-7.
20. Clinch J, Kesselman H. Parametric alternatives to the analysis of variance. J Educ Stat 1982;7:207-14.
21. Gamage J, Weerahandi S. Size performance of some tests in one-way ANOVA. Commun Stat Simul Comput 1998;27:625-40.
22. Lantz B. The impact of sample non-normality on ANOVA and alternative methods. Br J Math Stat Psychol 2013;66:224-44.
23. Schmider E, Ziegler M, Danay E, Beyer L, Bühner M. Is it really robust? Reinvestigating the robustness of ANOVA against violations of the normal distribution assumption. Methodology (Gott) 2010;6:147-51.
24. Brown MB, Forsythe AB. The small sample behavior of some statistics which test the equality of several means. Technometrics 1974;16:129-32.
25. De Beuckelaer A. A closer examination on some parametric alternatives to the ANOVA F-test. Stat Papers 1996;37:291-305.
26. Lee S, Ahn C. Modified ANOVA for unequal variances. Commun Stat Simul Comput 2003;32:987-1004.
27. Li X, Wang J, Liang H. Comparison of several means: a fiducial based approach. Comput Stat Data Analysis 2011;55:1993-2002.
28. Lu F, Mathew T. A parametric bootstrap approach for ANOVA with unequal variances: fixed and random models. Comput Stat Data Analysis 2007;51:5731-42.
29. Markowski CA. Conditions for the effectiveness of a preliminary test of variance. Am Stat 1990;44:322-6.
30. Keselman HJ, Rogan JC, Fier-Walsh BJ. An evaluation of some non-parametric and parametric tests for location equality. Br J Math Stat Psychol 1977;30:213-21.
31. Tomarken A, Serlin RC. Comparison of ANOVA alternatives under variance heterogeneity and specific noncentrality structures. Psychol Bull 1986;99:90-9.
32. Hoeffding W. "Optimum" nonparametric tests. Berkeley Symposium on Mathematical Statistics and Probability. University of California 2nd ed. 1951: pp.83-92.
33. Terry MH. Some rank order tests which are most powerful against specific parametric alternatives. Ann Math Stat 1952;23:346-66.
34. Luh W, Guo J. Approximate transformation trimmed mean methods to the test of simple linear regression slope equality. J Appl Stat 2000;27:843-57.
35. Jett D, Speer J. Comparison of parametric and nonparametric tests for differences in distribution. Proceedings of The National Conference On Undergraduate Research (NCUR) 2016, University of North Carolina-Asheville, Asheville, North Carolina, April 7-9, 2016: 1765-70.

This is an open access article distributed under the terms of the Creative Commons
Attribution-NonCommercial-NoDerivatives 4.0 International License.

