10.18621-eurj.1037546-2135935
1Department of Biostatistics, Bursa Uludağ University, Institute of Health Sciences, Bursa, Turkey; 2Department of Biostatistics, Bursa
ABSTRACT
Objectives: The Kruskal-Wallis test, the van der Waerden test, the modified version of the Kruskal-Wallis test based on the permutation test, Mood's Median test and the Savage test are among the non-parametric alternatives to one-way analysis of variance in the literature. Their performance in maintaining the Type-I error probability at the nominal level set at the beginning of the trial was compared with that of the F test.
Methods: Simulation scenarios were used to examine how the tests' ability to maintain the Type-I error is affected by homogeneous or heterogeneous variances, balanced or unbalanced sample sizes, normally or log-normally distributed data, and the number of groups compared.
Results: The Kruskal-Wallis test, the van der Waerden test and the modified version of the Kruskal-Wallis test based on the permutation test were not affected by the distribution of the data, but they were affected by violation of the homogeneity of variances. The performance of Mood's Median test and the Savage test in maintaining the Type-I error was not sufficient compared with the other tests.
Conclusions: The Kruskal-Wallis test, the van der Waerden test and the modified version of the Kruskal-Wallis test based on the permutation test were not affected by the distribution of the data and tended to maintain the Type-I error when the variances were homogeneous.
Keywords: Analysis of variance, conformity of normal distribution, non-parametric k-sample tests
Received: December 16, 2021; Accepted: August 10, 2022; Published Online: October 24, 2022
How to cite this article: Macunluoğlu AC, Ocakoğlu G. Comparison of the performances of non-parametric k-sample test procedures as an alternative
to one-way analysis of variance. Eur Res J 2023;9(4):687-696. DOI: 10.18621/eurj.1037546
e-ISSN: 2149-3189
Address for correspondence: Gökhan Ocakoğlu, PhD., Professor, Bursa Uludağ University, Faculty of Medicine , Department of Biostatistics, Bursa,
Turkey, E-mail: gocakoglu@gmail.com, Phone: +90 224 295 38 71
Copyright © 2023 by Prusa Medical Publishing
Available at http://dergipark.org.tr/eurj
info@prusamp.com
which is a parametric test, is used to compare the means of more than two populations and is one of the most frequently used and most important statistical methods for this purpose [1]. The assumptions of the F test are that the data are normally distributed, the sample variances are equal, and the samples are independent [2]. If the assumption of normality or of homogeneity of variance is violated, the Type-I error probability obtained at the end of the trials and the power of the test are adversely affected. This adverse effect becomes even more evident if the sample sizes in the groups compared are not balanced [3]. Non-parametric tests are statistical procedures that are preferred as an alternative to parametric tests when the assumptions are not met. Non-parametric tests have fewer assumptions than parametric tests [4]. The data need not conform to a normal distribution. Non-parametric tests can be applied to data measured on a nominal or ordinal scale.
Pearson [5], Glass et al. [6], and Wilcox [7] examined the effect of violating the normality assumption on the Type-I error. Wilcox [7] concluded that samples that do not conform to the normal distribution have some impact on the Type-I error rate, but the effect is minimal if the variances are homogeneous. Glass et al. [6] reported results similar to those of Wilcox [7] when the variances were homogeneous. Buning [8] examined the performance of the F test and of the Kruskal-Wallis test, the normal score test and the Welch test, which he included as alternatives to the F test, in terms of Type-I error and power. He evaluated the tests under various simulation scenarios with homogeneous or heterogeneous variances, equal or unequal sample sizes, and normally or non-normally distributed data. Moder [2] stated that the location parameters of the groups should be investigated in detail when the sample sizes are unbalanced.
In our study, we compared the performance of the Kruskal-Wallis test, Mood's Median test, the van der Waerden test, the modified version of the Kruskal-Wallis test based on the permutation test and the Savage test, which are among the non-parametric alternatives to the F test, in maintaining the Type-I error under various simulation scenarios.

METHODS
In our study, the Kruskal-Wallis test, the modified version of the Kruskal-Wallis test based on the permutation test, Mood's Median test, the van der Waerden test and the Savage test were compared with the F test in terms of maintaining the Type-I error probability set at the beginning of the experiment. Simulation scenarios were run in the R program [9].
The performance of the tests was evaluated by comparing three, five, and eight groups in simulation scenarios involving balanced or non-balanced sample sizes (Table 1), the normal or the log-normal distribution, and homogeneous or heterogeneous variances (Table 2). In addition to these conditions, observation combinations were included in which the sample sizes differ excessively among the groups, and in which variance and sample size are inversely matched, that is, the group with the higher variance is assigned fewer observations and the group with the lower variance is assigned more observations.

Table 1. Sample sizes of the groups

Number of groups: 3
Balanced sample: 3,3,3; 5,5,5; 10,10,10; 15,15,15; 20,20,20; 25,25,25; 30,30,30; 50,50,50; 80,80,80; 100,100,100
Non-balanced sample, observation combinations where the sample sizes are not equal: 3,5,7; 5,10,15; 20,25,30; 50,60,70; 65,75,85; 70,90,100
Non-balanced sample, observation combinations where the sample sizes differ excessively: 3,25,30; 3,80,80; 5,20,100
Non-balanced sample, observation combinations with inverse matching between variance and sample size: 7,5,3; 15,10,5; 30,25,20; 70,60,50; 85,75,65; 100,90,70

Number of groups: 5
Balanced sample: 3,3,3,3,3; 5,5,5,5,5; 10,10,10,10,10; 15,15,15,15,15; 20,20,20,20,20; 25,25,25,25,25; 30,30,30,30,30; 50,50,50,50,50; 80,80,80,80,80; 100,100,100,100,100
Non-balanced sample, observation combinations where the sample sizes are not equal: 3,5,7,9,11; 5,7,9,12,15; 20,22,24,28,30; 50,55,60,65,70; 55,65,75,85,95; 60,70,80,90,100
Non-balanced sample, observation combinations where the sample sizes differ excessively: 3,20,25,80,100; 3,5,30,80,100; 5,10,20,25,80; 3,5,10,15,100
Non-balanced sample, observation combinations with inverse matching between variance and sample size (listed with three values in the source): 7,5,3; 15,10,5; 30,25,20; 70,60,50; 85,75,65; 100,90,70

Number of groups: 8
Balanced sample: 3,3,3,3,3,3,3,3; 5,5,5,5,5,5,5,5; 10,10,10,10,10,10,10,10; 15,15,15,15,15,15,15,15; 20,20,20,20,20,20,20,20; 25,25,25,25,25,25,25,25; 30,30,30,30,30,30,30,30; 50,50,50,50,50,50,50,50; 80,80,80,80,80,80,80,80; 100,100,100,100,100,100,100,100
Non-balanced sample, observation combinations where the sample sizes are not equal: 3,5,7,9,11,12,14,15; 20,22,24,25,26,28,29,30; 50,55,60,65,70,75,80,85; 60,65,75,80,85,90,95,100
Non-balanced sample, observation combinations where the sample sizes differ excessively: 3,5,10,20,25,30,80,100; 5,10,20,20,25,80,90,100; 3,5,10,80,80,90,100,100; 20,25,30,80,90,90,100,100
Non-balanced sample, observation combinations with inverse matching between variance and sample size: 15,14,12,11,9,7,5,3; 30,29,28,26,25,24,22,20; 85,80,75,70,65,60,55,50; 100,95,90,85,80,75,65,60

In the comparisons made to determine the Type-I error, the group means were taken as equal. The Type-I error probability for each simulation scenario was obtained from the number of H0 hypotheses rejected in 50000 repetitions. The evaluation criterion proposed by Peterson [10] was adopted: a test was considered to maintain the Type-I error sufficiently if its Type-I error probability was between 4.49% and 5.49%.
Table 2 shows the variance rates of the groups that follow the normal distribution and the scale parameter values of the groups that follow the log-normal distribution.

The F Test
One-way analysis of variance (ANOVA), or the F test, is used to compare the means of more than two populations. It is one of the most important and frequently used methods of applied statistics [1]. The null hypothesis H0: µ1 = µ2 = … = µk is tested against the alternative H1: at least one µi (i = 1, 2, ..., k) is different. The F test statistic is

F = \frac{\sum_{i=1}^{k} n_i (\bar{X}_i - \bar{X})^2 / (k-1)}{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (X_{ij} - \bar{X}_i)^2 / (N-k)} \sim F_{k-1,\,N-k}

In this equation, k is the number of groups, N is the total number of observations, X_{ij} is the jth observation (j = 1, 2, ..., n_i) in the ith group (i = 1, 2, ..., k), \bar{X} = \frac{1}{N} \sum_{i=1}^{k} \sum_{j=1}^{n_i} X_{ij} is the overall mean, and \bar{X}_i is the sample mean of the ith group. The F test is more powerful if the assumptions of normality and variance homogeneity hold. The null hypothesis H0: µ1 = µ2 = … = µk should be rejected at the α level of significance when F ≥ F_{1-α; k-1, N-k}.

The Kruskal-Wallis Test
One of the non-parametric alternatives to the F test is the Kruskal-Wallis (KW) test. The KW test is a non-parametric procedure used to compare three or more independent groups [11]. It is carried out on the ranks of the observations instead of the observed values. To calculate the test statistic, the data are sorted from smallest to largest and each observation is assigned a rank. R_i = \sum_{j=1}^{n_i} R_{ij} is the sum of the ranks assigned to the observations in the ith group. The null hypothesis H0: θ1 = θ2 = … = θk is tested against the alternative H1: at least one θi (i = 1, 2, ..., k) is different. The test statistic is calculated from these rank sums.
Table 2. Variance rates of groups

Number of groups: 3
Normal distribution, homogeneous variance: 1,1,1; 2,2,2; 4,4,4; 8,8,8; 10,10,10
Normal distribution, heterogeneous variance: 1,1,2; 1,2,2; 1,1,4; 1,4,4; 1,1,8; 1,8,8; 1,1,10; 1,10,10; 1,4,8; 2,1,1; 2,2,1; 4,1,1; 4,4,1; 8,1,1; 8,8,1; 10,1,1; 10,10,1; 8,4,1
Log-normal distribution, homogeneous scale parameter (b): 0.1,0.1,0.1; 0.2,0.2,0.2; 0.3,0.3,0.3; 0.4,0.4,0.4; 0.5,0.5,0.5; 0.6,0.6,0.6; 0.7,0.7,0.7; 0.8,0.8,0.8
Log-normal distribution, heterogeneous scale parameter (b): 0.10,0.10,0.20; 0.10,0.20,0.20; 0.10,0.30,0.50; 0.10,0.40,0.50; 0.10,0.10,0.50; 0.10,0.50,0.60; 0.10,0.60,0.80; 0.20,0.10,0.10; 0.20,0.20,0.10; 0.50,0.30,0.10; 0.50,0.40,0.10; 0.50,0.10,0.10; 0.60,0.50,0.10; 0.80,0.60,0.10

Number of groups: 5
Normal distribution, homogeneous variance: 1,1,1,1,1; 2,2,2,2,2; 4,4,4,4,4; 8,8,8,8,8; 10,10,10,10,10
Normal distribution, heterogeneous variance: 1,1,2,2,2; 1,1,4,4,4; 1,1,8,8,8; 1,1,10,10,10; 1,2,4,8,10; 2,2,2,1,1; 4,4,4,1,1; 8,8,8,1,1; 10,10,10,1,1; 10,8,4,2,1
Log-normal distribution, homogeneous scale parameter (b): 0.1,0.1,0.1,0.1,0.1; 0.2,0.2,0.2,0.2,0.2; 0.3,0.3,0.3,0.3,0.3; 0.4,0.4,0.4,0.4,0.4; 0.5,0.5,0.5,0.5,0.5; 0.6,0.6,0.6,0.6,0.6; 0.7,0.7,0.7,0.7,0.7; 0.8,0.8,0.8,0.8,0.8
Log-normal distribution, heterogeneous scale parameter (b): 0.1,0.1,0.2,0.2,0.2; 0.1,0.1,0.4,0.4,0.4; 0.1,0.1,0.5,0.5,0.5; 0.1,0.1,0.6,0.7,0.8; 0.1,0.3,0.5,0.7,0.8; 0.2,0.2,0.2,0.1,0.1; 0.4,0.4,0.4,0.1,0.1; 0.5,0.5,0.5,0.1,0.1; 0.8,0.7,0.6,0.1,0.1; 0.8,0.7,0.5,0.3,0.1

Number of groups: 8
Normal distribution, homogeneous variance: 1,1,1,1,1,1,1,1; 2,2,2,2,2,2,2,2; 4,4,4,4,4,4,4,4; 8,8,8,8,8,8,8,8; 10,10,10,10,10,10,10,10
Normal distribution, heterogeneous variance: 1,1,1,1,1,1,1,2; 1,1,1,1,1,1,1,4; 1,1,1,1,1,1,1,8; 1,1,1,1,1,1,1,10; 1,1,1,2,2,2,4,4; 1,1,1,1,4,4,4,4; 1,1,1,1,8,8,10,10; 2,1,1,1,1,1,1,1; 4,1,1,1,1,1,1,1; 8,1,1,1,1,1,1,1; 10,1,1,1,1,1,1,1; 4,4,2,2,2,1,1,1; 4,4,4,4,1,1,1,1; 10,10,8,8,1,1,1,1; 10,10,8,8,4,4,2,1
Log-normal distribution, homogeneous scale parameter (b): 0.1 for all eight groups; 0.2 for all eight groups; 0.3 for all eight groups; 0.4 for all eight groups; 0.5 for all eight groups; 0.6 for all eight groups; 0.7 for all eight groups; 0.8 for all eight groups
Log-normal distribution, heterogeneous scale parameter (b): 0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.2; 0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.3; 0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.5; 0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.7; 0.1,0.1,0.1,0.3,0.3,0.3,0.5,0.5; 0.1,0.1,0.1,0.1,0.6,0.6,0.8,0.8; 0.2,0.3,0.4,0.5,0.6,0.7,0.7,0.8; 0.2,0.2,0.2,0.4,0.4,0.8,0.8,0.8; 0.2,0.1,0.1,0.1,0.1,0.1,0.1,0.1; 0.3,0.1,0.1,0.1,0.1,0.1,0.1,0.1; 0.5,0.1,0.1,0.1,0.1,0.1,0.1,0.1; 0.7,0.1,0.1,0.1,0.1,0.1,0.1,0.1; 0.5,0.5,0.3,0.3,0.3,0.1,0.1,0.1; 0.8,0.8,0.6,0.6,0.1,0.1,0.1,0.1; 0.8,0.7,0.7,0.6,0.5,0.4,0.3,0.2; 0.8,0.8,0.8,0.4,0.4,0.2,0.2,0.2
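The simulation design described in the Methods can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' R code: it draws three normal groups with equal means, applies the F test, and counts how often H0 is rejected. The sample sizes (7, 5, 3) and the variances (1, 1, 1) and (1, 4, 8) are taken from Tables 1 and 2, and the 4.49% to 5.49% band is the Peterson criterion adopted in the text. The function names, the seed, the reduced number of repetitions (the study used 50000) and the tabulated critical value F(0.95; 2, 12) ≈ 3.885 are assumptions made for the example.

```python
import random

F_CRIT_2_12 = 3.885  # assumed tabulated critical value F(0.95; 2, 12)


def f_statistic(groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def type1_error_rate(sizes, variances, reps, rng):
    """Share of repetitions in which H0 is rejected although all group means are equal."""
    rejections = 0
    for _ in range(reps):
        groups = [[rng.gauss(0.0, var ** 0.5) for _ in range(n)]
                  for n, var in zip(sizes, variances)]
        if f_statistic(groups) >= F_CRIT_2_12:
            rejections += 1
    return rejections / reps


def meets_peterson(rate):
    """True if the empirical rate lies between 4.49% and 5.49%."""
    return 0.0449 <= rate <= 0.0549


rng = random.Random(2023)  # hypothetical seed, for reproducibility only
homogeneous = type1_error_rate((7, 5, 3), (1, 1, 1), 5000, rng)
inverse = type1_error_rate((7, 5, 3), (1, 4, 8), 5000, rng)
print(f"homogeneous variances: {homogeneous:.3f}, meets criterion: {meets_peterson(homogeneous)}")
print(f"inverse matching:      {inverse:.3f}, meets criterion: {meets_peterson(inverse)}")
```

With only 5000 repetitions the estimate for the homogeneous case can fall slightly outside the 4.49% to 5.49% band by chance; the larger number of repetitions used in the study narrows that sampling error.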
[Equations: Kruskal-Wallis test statistic and the test statistics that follow it up to the van der Waerden statistic V; not legible in the source]
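For orientation, the two rank-based statistics named in this part of the text can be sketched in Python. The article's own equations are not legible here, so the code uses the standard textbook forms as an assumption: the Kruskal-Wallis statistic H = 12 / (N(N+1)) Σ R_i² / n_i − 3(N+1) without tie correction, and the van der Waerden statistic V = Σ n_i Ā_i² / s², where A_ij is the normal quantile of R_ij / (N+1) and s² = Σ A_ij² / (N−1). All function names are hypothetical.

```python
from statistics import NormalDist


def average_ranks(values):
    """Ranks 1..N of the values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the positions pos+1 .. end+1
        for t in range(pos, end + 1):
            ranks[order[t]] = shared
        pos = end + 1
    return ranks


def pooled_ranks(groups):
    """Rank all observations together, then split the ranks back into groups."""
    ranks = average_ranks([x for g in groups for x in g])
    out, start = [], 0
    for g in groups:
        out.append(ranks[start:start + len(g)])
        start += len(g)
    return out


def kruskal_wallis(groups):
    """Standard H statistic (no tie correction), built from the group rank sums."""
    ranked = pooled_ranks(groups)
    n_total = sum(len(r) for r in ranked)
    total = sum(sum(r) ** 2 / len(r) for r in ranked)
    return 12.0 / (n_total * (n_total + 1)) * total - 3.0 * (n_total + 1)


def van_der_waerden(groups):
    """Standard V statistic: ranks are replaced by normal quantiles of R / (N + 1)."""
    ranked = pooled_ranks(groups)
    n_total = sum(len(r) for r in ranked)
    normal = NormalDist()
    scores = [[normal.inv_cdf(r / (n_total + 1)) for r in grp] for grp in ranked]
    s2 = sum(a * a for grp in scores for a in grp) / (n_total - 1)
    return sum(len(grp) * (sum(grp) / len(grp)) ** 2 for grp in scores) / s2


sample = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(kruskal_wallis(sample), van_der_waerden(sample))
```

Both statistics are referred to the chi-square distribution with k − 1 degrees of freedom; for k = 3 and α = 0.05 the critical value is about 5.991.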
is the normal quantile of x. The null hypothesis should then be rejected at the α level of significance when V ≥ χ²_{α;k-1}.

The Savage Test
The Savage test is among the non-parametric alternatives to the F test used to test differences between location parameters. The Savage test is powerful for comparing scale differences or location differences in the extreme value distribution, which are compatible with the exponential distribution [18].
The Savage test statistic is calculated from Savage scores. The null hypothesis H0: θ1 = θ2 = … = θk is tested against the alternative H1: at least one θi (i = 1, 2, …, k) is different. The formula for the Savage test is

[Equations: Savage test statistic TE and the definition of the Savage scores; not legible in the source]

The null hypothesis should then be rejected at the α level of significance when TE ≥ χ²_{α;k-1}.

RESULTS
In this study, the tests were compared by means of simulation scenarios in terms of maintaining the Type-I error. The simulation scenarios were run in the R program [9]. The Type-I errors obtained are given in the tables.

Comparisons in which the sample size is balanced, the group variances are homogeneous, and the data follow the normal distribution (Table 3, Supplementary Table 1, 2)
Compared with the non-parametric alternatives, the F test showed the most successful performance in maintaining the predetermined Type-I error level.
In addition to the F test, the KW test also tends to maintain the Type-I error level across the observation combinations, and an increase in the number of groups to be compared (especially in the case of eight groups) has a positive effect on its performance.

Comparisons in which the sample size is not balanced, the group variances are homogeneous, and the data follow the normal distribution (Supplementary Table 3-8)
The F test and the modified version of the KW test based on the permutation test are the most successful tests in estimating the Type-I error level initially determined. They are followed by the KW test, whose estimates deviated in only a single simulation scenario.
The other tests included in the study were found to be adversely affected by the imbalance of the sample sizes in the groups, and their performance in maintaining the Type-I error level determined at the beginning was not sufficient.
When simulation scenarios involving observation combinations in which the sample sizes in the groups differ excessively were examined, it was observed that the F test and the modified version of the KW test based on the permutation test were not affected by the extreme differences in the sample sizes and tended to maintain the Type-I error level initially determined in all simulation scenarios according to the Peterson criterion.
On the other hand, the KW test, which performs at a level comparable to these two tests when the sample sizes in the groups vary in a balanced manner, was affected and gave deviating results when the difference in the sample sizes was excessive.

Comparisons in which the sample size is balanced, group variances are heterogeneous, but the data follow the normal distribution (Supplementary Table 9-11)
The tests included in the study generally gave deviating results in terms of maintaining the Type-I error, and their performance was not found sufficient.

Comparisons in which the sample size is not balanced, group variances are heterogeneous, but the data follow the normal distribution (Supplementary Table 12-20)
It has been seen that the tests included in the study generally give deviating results in terms of protecting
the Type-I error and their performance is not sufficient.

Table 3. Type-I error rates (%) for k = 3 groups where σ1² = σ2² = σ3² and the sample size is balanced (n1 = n2 = n3)

σ²    n     F      KW     Perm-KW  MM     VW     Savage
1     3     4.70   8.51   5.03     0      1.11   0.71
1     5     4.98   5.49   4.90     6.67   4.85   2.74
1     10    5.03   5.11   4.42     3.48   4.47   3.97
1     15    5.09   5.21   4.74     4.69   4.78   4.29
1     20    5.07   5.15   4.78     4.74   4.87   4.43
1     25    5.08   5.14   4.83     4.77   4.78   4.48
1     30    4.93   4.92   4.75     4.15   4.68   4.63
1     50    5.11   5.13   4.99     4.88   4.99   4.68
1     80    5.07   5.08   4.98     5.54   5.04   4.89
1     100   5.04   5.05   5.00     5.12   5.00   4.84
2     3     4.82   8.31   4.60     0      1.02   0.71
2     5     4.93   5.51   4.82     6.80   3.86   2.64
2     10    4.88   5.15   4.29     3.37   4.42   3.71
2     15    5.03   5.20   4.64     4.67   4.77   4.19
2     20    4.80   4.99   4.51     4.61   4.62   4.24
2     25    5.15   5.26   4.95     4.88   5.01   4.67
2     30    5.03   5.09   4.86     4.40   4.96   4.62
2     50    4.93   5.01   4.84     4.79   4.93   4.79
2     80    4.99   5.06   4.92     5.48   4.95   4.78
2     100   5.00   4.97   4.95     5.10   4.98   4.76
4     3     5.08   8.57   4.70     0      1.03   0.67
4     5     4.96   5.46   4.79     6.59   3.83   2.64
4     10    5.06   5.29   4.46     3.31   4.55   3.82
4     15    4.77   4.90   4.60     4.49   4.73   4.19
4     20    5.10   5.10   4.78     4.74   4.87   4.43
4     25    4.95   4.94   4.84     4.81   4.85   4.61
4     30    5.21   5.13   4.60     4.14   4.62   4.52
4     50    5.10   5.01   4.80     4.84   4.78   4.87
4     80    4.89   4.85   4.95     5.42   4.97   4.83
4     100   4.86   4.92   4.85     5.09   4.88   4.80
8     3     4.97   8.59   4.84     0      1.13   0.72
8     5     4.78   5.29   4.67     6.66   4.84   2.73
8     10    5.07   5.20   4.46     3.43   4.55   3.85
8     15    5.15   5.35   4.76     4.72   4.84   4.42
8     20    4.90   4.93   4.62     4.79   4.56   4.37
8     25    5.01   5.12   4.79     4.70   4.85   4.55
8     30    5.00   5.04   4.82     4.36   4.76   4.53
8     50    5.08   5.03   4.97     4.82   4.95   4.84
8     80    5.21   5.22   5.13     5.51   5.12   4.99
8     100   5.09   4.93   5.05     5.14   5.07   5.01
10    3     5.10   8.66   4.90     0      1.16   0.72
10    5     4.99   5.47   4.91     6.74   3.90   2.71
10    10    4.88   5.08   4.29     3.46   4.40   3.76
10    15    5.01   5.05   4.61     4.47   4.61   4.23
10    20    5.02   5.09   4.73     4.85   4.77   4.28
10    25    5.08   5.08   4.85     4.82   4.83   4.55
10    30    4.90   4.99   4.73     4.38   4.82   4.52
10    50    5.06   4.99   4.84     4.85   4.86   4.60
10    80    5.08   5.10   4.95     4.88   4.92   4.68
10    100   5.09   5.06   5.04     4.95   5.00   4.80

F: F test, KW: Kruskal-Wallis test, Perm-KW: the modified version of Kruskal-Wallis test based on permutation test, MM: Mood's Median test, VW: van der Waerden test

Comparisons in which the sample size is balanced, group variances are homogeneous, and the data follow the log-normal distribution (Supplementary Table 21-23)
As expected, compared with the non-parametric alternatives, the F test showed the most successful performance in estimating the Type-I error level determined at the beginning.
In addition to the F test, the KW test also tends to maintain the Type-I error level across the observation combinations, and an increase in the number of groups to be compared (especially in the case of eight groups) has a positive effect on its performance.
The performance of the MM test was also positively affected by the increase in the number of groups. Although its performance in maintaining the Type-I error is lower than that of the KW test, its performance with eight groups improved markedly compared with three and five groups.

Comparisons in which the sample size is not balanced, group variances are homogeneous, and the data follow the log-normal distribution (Supplementary Table 24-29)
When simulation scenarios involving observation combinations in which the sample sizes in the groups are not equal are examined, the F test and the KW test show the most successful performance in estimating the Type-I error level determined at the beginning. These tests are followed by the modified version of the KW test based on the permutation test.
When the simulation scenarios involving observation combinations in which the sample sizes in the groups differ excessively were examined, it was observed that the F test and the permutation version of the KW test were not affected by the extreme differences in the sample sizes in the groups.
The other tests included in the study were found to be adversely affected by the imbalance of the sample sizes in the groups, and their performance in maintaining the Type-I error level determined at the beginning was not sufficient.

Comparisons in which the sample size is balanced, group variances are heterogeneous, and the data follow the log-normal distribution (Supplementary Table 30-32)
The tests included in the study generally gave deviating results in terms of maintaining the Type-I error, and their performance was not found sufficient.

Comparisons in which the sample size is not balanced, group variances are heterogeneous, and the data follow the log-normal distribution (Supplementary Table 33-41)
The tests included in the study generally gave deviating results in terms of maintaining the Type-I error, and their performance was not found sufficient.

DISCUSSION
As expected, the F test showed the most successful performance in cases where normality and homogeneity of variances hold. In the simulation scenarios where the assumption of homogeneity of variances is not met, the F test was, as expected, highly affected by the deterioration in group variances and failed to maintain the Type-I error at the nominal level (α = 0.05). These results are similar to those of the studies conducted by Buning [8] and Moder [2]. The F test also showed the most successful performance compared with the alternative tests in cases where the data follow the log-normal distribution and the variances are homogeneous. Blanca et al. [19], Clinch and Keselman [20], Gamage and Weerahandi [21], Lantz [22] and Schmider et al. [23] reported that the F test tends to maintain the Type-I error in cases where the assumption of normality is violated. It was observed that violation of the homogeneity of variances affected the performance of the F test more than violation of the normality assumption did. Bishop and Dudewicz [3], Blanca et al. [19], Brown and Forsythe [24], Buning [8], Debeuckelaer [25], Lee and Ahn [26], Li et al. [27], Lu and Mathew [28], Markowski [29], Keselman et al. [30], and Tomarken and Serlin [31] concluded that the F test is highly affected by the deterioration in group variances.
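How the Results and Discussion judge a test against the Peterson criterion can be illustrated with one row of Table 3 (σ² = 1, n = 3 per group). The sketch below is only a worked reading of that row: the bounds of 4.49% and 5.49% come from the Methods, while the labels "conservative" and "liberal" and the function name are assumptions introduced for the example, not terms used by the authors.

```python
PETERSON_LOWER, PETERSON_UPPER = 4.49, 5.49  # percent, as adopted in the Methods

# Type-I error rates (%) from Table 3 for sigma^2 = 1 and n = 3 per group
row = {"F": 4.70, "KW": 8.51, "Perm-KW": 5.03, "MM": 0.0, "VW": 1.11, "Savage": 0.71}


def classify(rate_percent):
    """Place an empirical Type-I error rate relative to the Peterson band."""
    if rate_percent < PETERSON_LOWER:
        return "conservative"  # rejects H0 less often than the nominal level
    if rate_percent > PETERSON_UPPER:
        return "liberal"       # rejects H0 more often than the nominal level
    return "maintained"


verdicts = {test: classify(rate) for test, rate in row.items()}
for test, verdict in verdicts.items():
    print(f"{test:8s} {row[test]:5.2f}%  {verdict}")
```

For this row only the F test and the permutation-based KW test fall inside the band, which matches the text's observation that these two tests maintain the Type-I error even at very small sample sizes.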
In this study, the KW test was not affected by the distribution of the data. It was concluded that violation of the homogeneity of variances and the sample sizes in the groups (equal or unequal) affected the performance of the KW test in maintaining the Type-I error. Hoeffding [32] and Terry [33] concluded that the performance of the KW test was not sufficient in terms of maintaining the Type-I error in cases where the variances were not homogeneous. Lantz [22], Luh and Guo [34], and Jett and Speer [35] found that the KW test was not affected by the distribution of the data and that it tends to maintain the Type-I error in cases where the variances are homogeneous.
The modified version of the KW test is not affected by the distribution of the data; like the KW test, it is highly affected by violation of the homogeneity of variances. It can be suggested as an alternative to the F test for observation combinations where the sample sizes in the groups are unequal and differ excessively. Odiase and Ogbonmwan [14] reported that the permutation test does not require assumptions about the distribution of the data and that it performs well on both normally and non-normally distributed data.
The van der Waerden test was not affected by the distribution of the data and showed successful performance in maintaining the Type-I error in observation combinations where the group variances were homogeneous and the sample sizes in the groups differed significantly. The van der Waerden test was greatly affected by the breakdown in group variance. Luepsen [1] stated that the van der Waerden test was the most successful test after the F test in estimating the Type-I error level in cases where there is no relationship between group variances and the number of observations belonging to the groups.
Although the MM test performed better as the number of groups compared increased, it did not show a successful performance in maintaining the Type-I error in general. Jett and Speer [35] stated in their simulation study that the performance of the MM test was not sufficient to maintain the Type-I error, a finding that supports our study.
The Savage test could not perform adequately in maintaining the Type-I error at the nominal level and gave biased results. There is no study in the literature regarding the Savage test. Our study aims to contribute to the literature by reporting that the Savage test's performance in maintaining the Type-I error, compared with the other tests, gives very poor and biased results.

CONCLUSION
In conclusion, as stated in the literature, the F test tends to remain robust when normality is violated; however, it is more affected by violation of the homogeneity of variances. The distribution of the data did not affect the KW test's performance in maintaining the Type-I error, whereas violation of the homogeneity of variances and the sample sizes in the groups did. The modified version of the KW test based on the permutation test is not affected by the distribution of the data; like the KW test, it is highly affected by violation of the homogeneity of variances. It can be suggested as an alternative to the F test for combinations of observations where the sample sizes in the groups are not equal and vary excessively. The van der Waerden test was not affected by the distribution of the data and showed successful performance in maintaining the Type-I error in observation combinations where the group variances were homogeneous and the sample sizes in the groups differed significantly. In general, the MM test did not show a successful performance in maintaining the Type-I error. The Savage test's performance in maintaining the Type-I error, compared with the other tests, was found to be very poor and biased.

Authors' Contribution
Study Conception: GO; Study Design: GO; Supervision: GO; Funding: N/A; Materials: N/A; Data Collection and/or Processing: ACM; Analysis and/or Data Interpretation: ACM, GO; Literature Review: ACM; Manuscript Preparation: ACM, GO and Critical Review: ACM, GO.

Conflict of interest
The authors disclosed no conflict of interest during the preparation or publication of this manuscript.

Financing
The authors disclosed that they did not receive any grant during conduction or writing of this study.
This is an open access article distributed under the terms of Creative Common
Attribution-NonCommercial-NoDerivatives 4.0 International License.