ARTICLE INFO

Keywords: Multiple linear regression; Long-tailed symmetric distribution; Maximum likelihood; Modified maximum likelihood; Genetic algorithm

ABSTRACT

Maximum likelihood (ML) estimators of the model parameters in multiple linear regression are obtained using genetic algorithm (GA) when the distribution of the error terms is long-tailed symmetric (LTS). We compare the efficiencies of the ML estimators obtained using GA with the corresponding ML estimators obtained using other iterative techniques via an extensive Monte Carlo simulation study. Robust confidence intervals based on modified ML estimators are used as the search space in GA. Our simulation study shows that GA outperforms traditional algorithms in most cases. Therefore, we suggest using GA to obtain the ML estimates of the multiple linear regression model parameters when the distribution of the error terms is LTS. Finally, real data of the Covid-19 pandemic, a global health crisis in early 2020, is presented for illustrative purposes.
* Corresponding author.
E-mail address: ayalcinkaya@ankara.edu.tr (A. Yalçınkaya).
https://doi.org/10.1016/j.chemolab.2021.104372
Received 5 November 2020; Received in revised form 31 May 2021; Accepted 24 June 2021
Available online 29 June 2021
0169-7439/© 2021 Elsevier B.V. All rights reserved.
A. Yalçınkaya et al. Chemometrics and Intelligent Laboratory Systems 216 (2021) 104372
non-convergence of iterations, convergence to the wrong root, or slow convergence may arise in numerical methods, see Refs. [1,2,28,36,37].

Unlike earlier studies, we use the genetic algorithm (GA), a population-based heuristic optimization method, to obtain the ML estimates of the model parameters in multiple linear regression. GA is powerful yet easy to apply to an optimization problem and can comfortably be used for large-scale and complex nonlinear optimization problems, see Xia et al. [39]. Further, Lu et al. [24] state that GA easily yields satisfactory solutions for the given optimization objectives and that its characteristics, which give it outstanding advantages in iterative optimization, overcome the shortcomings of numerical optimization methods. Garcia et al. [8] also report that GA is one of the most robust automated heuristic methods for solving optimization problems. See also Yalcinkaya et al. [40,41] for the details and advantages of GA. The main reason for choosing GA to obtain the parameter estimates is that it is well suited to converging to the global optimal solution in optimization problems that are extremely complex and suspected to be multi-modal, see Goldberg [12]. We then compare the efficiencies of the ML estimators obtained using GA with the corresponding estimators obtained using traditional iterative techniques, such as Newton-Raphson (NR), Nelder-Mead (NM), and the iteratively re-weighting algorithm (IRA).

One of the main contributions of this paper is to propose confidence intervals based on Tiku's [29,30] modified maximum likelihood (MML) estimators of the multiple linear regression model parameters as the search space in GA. Yalcinkaya et al. [40,41] and Acitas et al. [3] use GA and particle swarm optimization to obtain the ML estimators of the parameters of the skew normal distribution and the Weibull distribution, respectively. Both use the confidence intervals based on the MML estimators of the parameters of interest as the search space and show that this approach provides a narrower search space, which improves the convergence performance of GA.

The usage of heuristic methods is very limited in the context of regression. Therefore, we aim to use these methods, instead of classical approaches, to solve optimization problems in multiple linear regression with non-normal error terms. To the best of our knowledge, this is the first study to obtain ML estimates of the parameters of a multiple linear regression model with LTS distributed error terms using GA.

It should be noted that there are many long-tailed symmetric distributions, see, for example, Lange et al. [21] and Lange and Sinsheimer [22]. Our results can easily be extended to these distributions, so they may be the topic of our future studies.

Covid-19, which emerged in the city of Wuhan, China, in late 2019, spread rapidly worldwide in early 2020. On July 7, 2020, the World Health Organization (WHO) reported that Covid-19 had infected more than 11.5 million people worldwide and killed more than 500 thousand of them. In the Covid-19 pandemic, which turned into a global health crisis, the growth rate of cases and the number of deaths per million vary from country to country because of the different characteristics of their governments and people. Modeling the mortality rate is important in fighting Covid-19. Therefore, in this study, we analyze Covid-19 data that include characteristics of governments and people, both to make a scientific contribution to the fight against Covid-19 and to demonstrate an implementation of our proposed methodology.

The rest of the paper is organized as follows. Section 2 presents the LTS distribution. Section 3 includes the ML estimation for multiple linear regression model parameters when the distribution of error terms is LTS. The details of GA, the procedure of identifying the search space in GA, and the details of IRA are also given in Section 3. The simulation study and its results are presented in Section 4. Real data of the Covid-19 pandemic is examined in Section 5 to show the implementation of the proposed methodology. In the final section, the concluding remarks are given.

2. Long-tailed symmetric distribution

$$f(\epsilon) = \frac{1}{\sigma \sqrt{k}\, B\left(\tfrac{1}{2},\, p - \tfrac{1}{2}\right)} \left(1 + \frac{\epsilon^{2}}{k\sigma^{2}}\right)^{-p}, \quad -\infty < \epsilon < \infty \tag{2}$$

where $k = 2p - 3$, $p \ge 2$, $\sigma > 0$, and $B(\cdot, \cdot)$ is the beta function. The expected value and variance of the LTS distribution are given by $E(\epsilon) = 0$ and $Var(\epsilon) = \sigma^{2}$, respectively. Also, the kurtosis value ($\gamma$) is evaluated by $\gamma = 3(p - 3/2)/(p - 5/2)$. Table 1 shows kurtosis values for some representative values of shape parameter p.

LTS distribution has the following special cases:

● Assume $X \sim LTS(\mu, \sigma, p)$; then $T = \sqrt{\nu/k}\,(X - \mu)/\sigma$ has the Student's t distribution with $\nu = 2p - 1$ degrees of freedom.
● When $p \to \infty$, LTS distribution converges to the well-known Normal distribution.

See Islam and Tiku [16] and Tiku and Kumra [32] for more details about LTS distribution.

Table 1
The kurtosis values for the LTS distribution.
p 2.5 3.0 3.5 5.0 10 ∞

3. Maximum likelihood estimation

In this section, we derive the maximum likelihood estimators of the parameters θ0, θ1, …, θq and σ in the multiple linear regression with LTS distributed error terms. The log-likelihood (lnL) function based on (2), where $\epsilon_i = y_i - \theta_0 - \sum_{j=1}^{q} \theta_j x_{ij}$, i = 1, …, n, is given by

$$\ln L = -n \ln\left(\sigma \sqrt{k}\, B\left(\tfrac{1}{2},\, p - \tfrac{1}{2}\right)\right) - p \sum_{i=1}^{n} \ln\left(1 + \frac{\left(y_i - \theta_0 - \sum_{j=1}^{q} \theta_j x_{ij}\right)^{2}}{k\sigma^{2}}\right) \tag{3}$$

To obtain the ML estimators of the unknown parameters, the partial derivatives of the lnL function with respect to the parameters of interest are taken and set equal to 0 as follows:

$$\frac{\partial \ln L}{\partial \theta_0} = 2p \sum_{i=1}^{n} \frac{y_i - \theta_0 - \sum_{j=1}^{q} \theta_j x_{ij}}{k\sigma^{2} + \left(y_i - \theta_0 - \sum_{j=1}^{q} \theta_j x_{ij}\right)^{2}} = 0, \tag{4}$$

$$\frac{\partial \ln L}{\partial \theta_l} = 2p \sum_{i=1}^{n} \frac{x_{il}\left(y_i - \theta_0 - \sum_{j=1}^{q} \theta_j x_{ij}\right)}{k\sigma^{2} + \left(y_i - \theta_0 - \sum_{j=1}^{q} \theta_j x_{ij}\right)^{2}} = 0, \tag{5}$$

(l = 1, …, q), and

$$\frac{\partial \ln L}{\partial \sigma} = -\frac{n}{\sigma} + \frac{2p}{\sigma} \sum_{i=1}^{n} \frac{\left(y_i - \theta_0 - \sum_{j=1}^{q} \theta_j x_{ij}\right)^{2}}{k\sigma^{2} + \left(y_i - \theta_0 - \sum_{j=1}^{q} \theta_j x_{ij}\right)^{2}} = 0. \tag{6}$$

The solutions of these likelihood equations are the ML estimators of the parameters of interest. However, the equations have no explicit solutions [16] since the likelihood equations include intractable functions such as $g(z_i) = z_i/[k + z_i^{2}]$ where $z_i = \epsilon_i/\sigma$, i = 1, …, n. Therefore, we resort to iterative techniques such as the GA, IRA, NR, and NM algorithms. The procedures of GA and IRA used here are introduced in the following subsections. See Ref. [40] for the details of NR and NM algorithms. Here, we
do not present the NR and NM algorithms because they are very common methods for obtaining the ML estimators in the literature.

3.1. Genetic algorithm

Genetic Algorithm (GA), an iterative population-based search technique proposed by Holland [13], is a very popular heuristic algorithm for finding the optimum of an objective function. The idea underlying GA is to imitate how the hereditary characteristics of chromosomes pass from one generation to the next through evolutionary mechanisms. Each solution is called a chromosome, and the set of solutions in a generation is called a population. GA proceeds through the following stages.

Generating Initial Population: GA starts with an initial population of N chromosomes generated from the search space via an initialization strategy. Assume that the initial population is denoted by $w_1^{(0)}, w_2^{(0)}, \ldots, w_N^{(0)}$, where $w = (\theta_0, \theta_1, \ldots, \theta_q, \sigma)'$ is a vector of unknown parameters. Here, q and N are the number of the independent variables in the multiple linear regression model and the population size, respectively. Also, the vector of $w_r^{(m)}$, r = 1, 2, …, N, m = 0, 1, 2, …

[…] operators are applied according to mutation probability (MP) and crossover probability (CP) to the candidate chromosomes selected from the chromosomes except elites via a selection strategy. Here, we prefer the roulette wheel selection strategy. The basic principle of this strategy is that the chromosomes with better fitness values have a greater chance of being selected.

Convergence Check: The new population $w_1^{(m+1)}, w_2^{(m+1)}, \ldots, w_N^{(m+1)}$ is thus obtained. If the convergence criterion is not met, this process is continued by setting m = m + 1. When the process stops, the values of the best chromosome in the last population are taken as the estimates of the parameters.

Identifying the search space of GA: We use the confidence intervals based on Tiku's MML estimators of the parameters θ0, θ1, …, θq and σ as the search space in GA. The asymptotic 100(1 − α)% confidence intervals for the parameters θ0, θ1, …, θq and σ are given as follows:

$$CI(\theta_0) = \left(\hat{\theta}_0 - z_{\alpha/2}\, se(\hat{\theta}_0),\; \hat{\theta}_0 + z_{\alpha/2}\, se(\hat{\theta}_0)\right), \tag{7}$$

$$CI(\theta_l) = \left(\hat{\theta}_l - z_{\alpha/2}\, se(\hat{\theta}_l),\; \hat{\theta}_l + z_{\alpha/2}\, se(\hat{\theta}_l)\right), \tag{8}$$

(l = 1, …, q), and

$$CI(\sigma) = \left(\hat{\sigma} - z_{\alpha/2}\, se(\hat{\sigma}),\; \hat{\sigma} + z_{\alpha/2}\, se(\hat{\sigma})\right). \tag{9}$$

Here, se(⋅) is the standard error of the estimator of interest. In this study, we evaluate the standard errors by using the asymptotic variance-covariance matrix of Tiku's MML estimators. The asymptotic variance-covariance matrix of the MML estimators is obtained by using the inverse of the expected Fisher information matrix given in Islam and Tiku [16].

MML estimators are explicit estimators of the model parameters and are derived by a non-iterative method, see Ref. [29]. To obtain the MML estimators, the likelihood equations given in (4)-(6) are first expressed in terms of the order statistics (for given θ0 and θj, 1 ≤ j ≤ q)

$$\epsilon_{(i)} = y_{[i]} - \theta_0 - \sum_{j=1}^{q} \theta_j x_{[i]j}, \quad 1 \le i \le n \tag{10}$$

where $(y_{[i]}, x_{[i]1}, \ldots, x_{[i]q})$ is called the concomitant vector of observations corresponding to the ith ordered $\epsilon_{(i)}$. Then the MML estimators of the multiple linear regression model parameters when the errors have LTS distribution are formulated as

$$\hat{\theta}_0 = \bar{y}_{[\cdot]} - \sum_{j=1}^{q} \hat{\theta}_j \bar{x}_{[\cdot]j}, \tag{11}$$

[…]

$$B = \frac{2p}{k} \sum_{i=1}^{n} \alpha_i \left\{ \left(y_{[i]} - \bar{y}_{[\cdot]}\right) - \sum_{j=1}^{q} K_j \left(x_{[i]j} - \bar{x}_{[\cdot]j}\right) \right\} \tag{17}$$

and

$$C = \frac{2p}{k} \sum_{i=1}^{n} \beta_i \left\{ \left(y_{[i]} - \bar{y}_{[\cdot]}\right) - \sum_{j=1}^{q} K_j \left(x_{[i]j} - \bar{x}_{[\cdot]j}\right) \right\}^{2}. \tag{18}$$

Here,

$$\alpha_i = \frac{(2/k)\, t_{(i)}^{3}}{\left\{1 + (1/k)\, t_{(i)}^{2}\right\}^{2}}, \tag{19}$$

$$\beta_i = \frac{1 - (1/k)\, t_{(i)}^{2}}{\left\{1 + (1/k)\, t_{(i)}^{2}\right\}^{2}} \tag{20}$$

where $t_{(i)}$ is the expected value of the order statistic $Z_{(i)} = \epsilon_{(i)}/\sigma$, i.e., $t_{(i)} = E(Z_{(i)})$. If C in (18) is negative, then we use the following $\alpha_i^{*}$ and $\beta_i^{*}$,

$$\alpha_i^{*} = \frac{(1/k)\, t_{(i)}^{3}}{\left\{1 + (1/k)\, t_{(i)}^{2}\right\}^{2}}, \tag{21}$$
Table 2
Simulated Mean, MSE and Def values for the estimators of the parameters θ0, θ1, θ2, θ3 and σ.
n Method Mean(θ̂0) MSE(θ̂0) Mean(θ̂1) MSE(θ̂1) Mean(θ̂2) MSE(θ̂2) Mean(θ̂3) MSE(θ̂3) Mean(σ̂) MSE(σ̂) Def
p = 3
20 GA 0.0143 0.2519 1.0041 0.2814 0.9888 0.3940 1.0260 0.4764 0.9123 0.0368 1.4404
IRA 0.0271 0.0088 0.9613 0.3480 0.9601 0.7383 1.1101 1.1324 0.9873 0.0442 2.2716
NR 326E2 679E10 356E2 957E10 441E2 148E11 246E2 550E10 166.3 228E5 367E11
NM 0.0134 0.5006 1.0084 0.4934 0.9985 0.6622 1.0102 0.7341 0.8802 0.0525 2.4427
50 GA 0.0101 0.0750 1.0151 0.1288 0.9945 0.1614 0.9977 0.1457 0.9737 0.0129 0.5238
IRA 0.0338 0.0047 1.0062 0.1751 1.0112 0.2299 0.9843 0.1971 1.0116 0.0162 0.6231
NR 0.0024 0.1228 1.0185 0.1872 0.9632 0.2176 1.0065 0.2240 0.9516 0.0135 0.7650
NM 0.0019 0.1237 1.0122 0.1836 0.9868 0.2196 0.9932 0.2089 0.9599 0.0175 0.7532
100 GA 0.0025 0.0397 1.0044 0.0634 0.9948 0.0704 0.9901 0.0703 0.9874 0.0063 0.2500
IRA 0.0201 0.0016 1.0134 0.0811 0.9949 0.0842 0.9945 0.0811 1.0098 0.0079 0.2559
NR 0.0046 0.0699 1.0023 0.0920 0.9889 0.1025 0.9840 0.0979 0.9775 0.0083 0.3706
NM 0.0046 0.0699 1.0023 0.0920 0.9889 0.1025 0.9840 0.0979 0.9775 0.0083 0.3705
200 GA 0.0039 0.0199 0.9992 0.0322 0.9955 0.0317 1.0099 0.0336 0.9968 0.0029 0.1203
IRA 0.0204 0.0011 1.0048 0.0391 0.9948 0.0383 1.0142 0.0381 1.0148 0.0041 0.1206
NR 0.0027 0.0351 0.9995 0.0464 0.9869 0.0459 1.0064 0.0475 0.9885 0.0041 0.1791
NM 0.0027 0.0351 0.9995 0.0464 0.9870 0.0459 1.0064 0.0475 0.9885 0.0041 0.1790
500 GA 0.0076 0.0084 1.0099 0.0102 0.9920 0.0128 1.0164 0.0144 1.0027 0.0009 0.0468
IRA 0.0138 0.0004 1.0156 0.0127 0.9952 0.0151 1.0205 0.0145 1.0150 0.0016 0.0444
NR 0.0063 0.0156 1.0069 0.0147 0.9896 0.0187 1.0192 0.0218 0.9950 0.0015 0.0722
NM 0.0063 0.0156 1.0069 0.0147 0.9896 0.0187 1.0191 0.0218 0.9950 0.0015 0.0722
p = 5
20 GA 0.0117 0.2931 0.9986 0.3237 1.0062 0.4645 1.0047 0.5582 0.9112 0.0322 1.6716
IRA 0.0174 0.0069 0.9717 0.3298 0.9858 0.7563 1.0595 1.1332 0.9518 0.0375 2.2637
NR 126E2 106E10 122E2 101E10 7462.9 951E9 17,779 107E10 81.019 174E5 410E10
NM 0.0114 0.5590 1.0010 0.5583 1.0171 0.7812 0.9898 0.8560 0.8854 0.0448 2.7993
50 GA 0.0010 0.0938 0.9940 0.1513 1.0158 0.1810 0.9964 0.1680 0.9647 0.0110 0.6051
IRA 0.0196 0.0037 0.9914 0.1853 1.0285 0.2459 0.9892 0.1992 0.9824 0.0135 0.6476
NR 0.0053 0.1548 0.9947 0.2147 1.0108 0.2519 0.9920 0.2412 0.9521 0.0152 0.8778
NM 0.0053 0.1547 0.9947 0.2147 1.0108 0.2519 0.9921 0.2412 0.9521 0.0152 0.8777
100 GA 0.0099 0.0485 1.0097 0.0762 1.0067 0.0795 1.0068 0.0821 0.9874 0.0053 0.2917
IRA 0.0142 0.0014 1.0120 0.0879 1.0045 0.0864 1.0037 0.0855 0.9979 0.0064 0.2676
NR 0.0102 0.0857 1.0108 0.1077 1.0074 0.1127 1.0053 0.1134 0.9803 0.0068 0.4263
NM 0.0102 0.0857 1.0107 0.1077 1.0074 0.1127 1.0054 0.1134 0.9803 0.0068 0.4262
200 GA 0.0107 0.0237 0.9941 0.0419 1.0246 0.0382 1.0031 0.0369 0.9956 0.0025 0.1432
IRA 0.0131 0.0008 0.9881 0.0456 1.0290 0.0409 0.9995 0.0407 1.0032 0.0033 0.1312
NR 0.0105 0.0420 0.9918 0.0588 1.0288 0.0538 1.0007 0.0513 0.9903 0.0034 0.2093
NM 0.0106 0.0420 0.9919 0.0588 1.0288 0.0538 1.0007 0.0513 0.9903 0.0034 0.2093
500 GA 0.0021 0.0112 0.9959 0.0169 0.9953 0.0172 1.0069 0.0189 0.9999 0.0009 0.0652
IRA 0.0063 0.0003 1.0024 0.0188 0.9968 0.0188 1.0145 0.0211 1.0048 0.0013 0.0602
NR 0.0081 0.0216 0.9925 0.0266 0.9890 0.0258 1.0060 0.0278 0.9960 0.0013 0.1032
NM 0.0082 0.0216 0.9925 0.0267 0.9890 0.0258 1.0060 0.0279 0.9960 0.0013 0.1032
E: the number before E is multiplied by 10 raised to the power that follows E (for example, 326E2 = 326 × 10^2).
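The definition of Def is not reproduced in this part of the text. As a quick arithmetic check, and only as an observation from the rows above rather than the authors' stated definition, the Def entries agree with the sum of the five MSE values up to rounding. The sketch below copies three rows of Table 2 (p = 3, n = 20); the dictionary and function names are illustrative.

```python
# MSE values copied from Table 2 (p = 3, n = 20), in the order
# theta_0, theta_1, theta_2, theta_3, sigma.
mse = {
    "GA": [0.2519, 0.2814, 0.3940, 0.4764, 0.0368],
    "IRA": [0.0088, 0.3480, 0.7383, 1.1324, 0.0442],
    "NM": [0.5006, 0.4934, 0.6622, 0.7341, 0.0525],
}
# Def values reported in the same rows of Table 2.
reported_def = {"GA": 1.4404, "IRA": 2.2716, "NM": 2.4427}


def total_mse(values):
    """Sum of the MSE values over all estimated parameters."""
    return sum(values)


for method, values in mse.items():
    # Print the recomputed total next to the reported Def value.
    print(method, round(total_mse(values), 4), reported_def[method])
```

The recomputed totals differ from the reported Def values only in the fourth decimal place, which is what rounding of the individual MSE entries would produce.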
" #
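$$\beta_i^{*} = \frac{1}{\left\{1 + (1/k)\, t_{(i)}^{2}\right\}^{2}}, \quad (1 \le i \le n) \tag{22}$$

[…] …, θq and σ.

Step 2: Compute the following equations

$$\theta_0^{(m+1)} = \left[ \sum_{i=1}^{n} y_i \gamma_i^{(m)} - \sum_{i=1}^{n} \sum_{j=1}^{q} \theta_j^{(m)} x_{ij} \gamma_i^{(m)} \right] \Big/ \sum_{i=1}^{n} \gamma_i^{(m)},$$

[…]

and $\gamma_i^{(m)} = 1/w_i^{(m)}$. Here, m = 0, 1, 2, … is the iteration number and l = 1, …, q is the index of the regression coefficients.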
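The surviving fragments of the IRA can be turned into a small runnable sketch. This is not the authors' implementation: the weight definition $w_i = k\sigma^2 + \epsilon_i^2$, the slope update (equation (5) solved for $\theta_l$), the scale update (a fixed point of equation (6)), the starting values, and the helper names `residuals`, `ira_lts` and `simulate` are all assumptions made for illustration. Only the intercept update follows the displayed formula for $\theta_0^{(m+1)}$, and the shape parameter p is treated as known.

```python
import math
import random


def residuals(theta, X, y):
    """eps_i = y_i - theta_0 - sum_j theta_j x_ij."""
    return [yi - theta[0] - sum(t * x for t, x in zip(theta[1:], xi))
            for xi, yi in zip(X, y)]


def ira_lts(X, y, p, n_iter=500, tol=1e-12):
    """Iteratively reweighted fit of (theta_0..theta_q, sigma) for fixed p."""
    k, n, q = 2.0 * p - 3.0, len(y), len(X[0])
    theta = [sum(y) / n] + [0.0] * q               # crude start (assumption)
    eps = residuals(theta, X, y)
    sigma = math.sqrt(sum(e * e for e in eps) / n)
    for _ in range(n_iter):
        previous = theta + [sigma]
        # gamma_i = 1 / w_i with the assumed weight w_i = k sigma^2 + eps_i^2
        gamma = [1.0 / (k * sigma * sigma + e * e) for e in eps]
        # intercept: the displayed theta_0 update
        theta[0] = sum(g * (yi - sum(t * x for t, x in zip(theta[1:], xi)))
                       for g, xi, yi in zip(gamma, X, y)) / sum(gamma)
        # slopes: equation (5) solved for theta_l (assumed form)
        for l in range(q):
            num = den = 0.0
            for g, xi, yi in zip(gamma, X, y):
                others = sum(theta[j + 1] * xi[j] for j in range(q) if j != l)
                num += g * xi[l] * (yi - theta[0] - others)
                den += g * xi[l] * xi[l]
            theta[l + 1] = num / den
        eps = residuals(theta, X, y)
        # scale: fixed point of equation (6) using the current weights
        sigma = math.sqrt(2.0 * p / n * sigma * sigma
                          * sum(g * e * e for g, e in zip(gamma, eps)))
        if max(abs(a - b) for a, b in zip(theta + [sigma], previous)) < tol:
            break
    return theta, sigma


def simulate(n, theta, sigma, p, seed=0):
    """Draw (X, y) with LTS errors via the Student's t relation of Section 2."""
    rng = random.Random(seed)
    k, nu = 2.0 * p - 3.0, int(2 * p - 1)          # integer nu assumed here
    X, y = [], []
    for _ in range(n):
        xi = [rng.gauss(0.0, 1.0) for _ in theta[1:]]
        chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))
        t = rng.gauss(0.0, 1.0) / math.sqrt(chi2 / nu)
        X.append(xi)
        y.append(theta[0] + sum(b * x for b, x in zip(theta[1:], xi))
                 + sigma * math.sqrt(k / nu) * t)
    return X, y
```

For example, `X, y = simulate(200, [0.0, 1.0, 1.0], 1.0, 3.0)` followed by `ira_lts(X, y, 3.0)` returns estimates at which the left-hand sides of (4)-(6) are numerically zero. Updating one coefficient at a time keeps the sketch free of matrix algebra; a joint weighted least-squares solve would be an equally reasonable choice.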
Fig. 1. Scatter plots and the correlation coefficients for the Covid-19 data.
linear regression model when the error terms have LTS distribution
because of the superior performance of the GA.
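The GA stages described in Section 3.1 (initial population drawn from the search space, elitism, roulette wheel selection, crossover and mutation governed by CP and MP) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the arithmetic crossover, the uniform mutation, the fixed number of generations used instead of a convergence criterion, all default settings, and the names `lts_loglik` and `ga_maximize` are choices made here. The `bounds` argument stands in for the MML-based confidence intervals (7)-(9), which are not computed in this sketch.

```python
import math
import random


def lts_loglik(w, X, y, p):
    """Log-likelihood (3) for a chromosome w = (theta_0, ..., theta_q, sigma)."""
    *theta, sigma = w
    k = 2.0 * p - 3.0
    log_beta = math.lgamma(0.5) + math.lgamma(p - 0.5) - math.lgamma(p)
    total = -len(y) * (math.log(sigma) + 0.5 * math.log(k) + log_beta)
    for xi, yi in zip(X, y):
        eps = yi - theta[0] - sum(t * x for t, x in zip(theta[1:], xi))
        total -= p * math.log1p(eps * eps / (k * sigma * sigma))
    return total


def ga_maximize(fitness, bounds, pop_size=40, generations=60,
                n_elite=2, cp=0.8, mp=0.1, seed=0):
    """Return (best chromosome, best fitness recorded per generation)."""
    rng = random.Random(seed)
    # initial population drawn uniformly from the search space
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scored = sorted(((fitness(w), w) for w in pop),
                        key=lambda fw: fw[0], reverse=True)
        history.append(scored[0][0])
        worst = scored[-1][0]
        # roulette wheel: shift fitness so that every weight is positive
        weights = [f - worst + 1e-9 for f, _ in scored]
        parents = [w for _, w in scored]
        new_pop = [w[:] for w in parents[:n_elite]]   # elites pass unchanged
        while len(new_pop) < pop_size:
            a, b = rng.choices(parents, weights=weights, k=2)
            child = a[:]
            if rng.random() < cp:                     # arithmetic crossover
                mix = rng.random()
                child = [mix * u + (1.0 - mix) * v for u, v in zip(a, b)]
            for g, (lo, hi) in enumerate(bounds):     # uniform mutation
                if rng.random() < mp:
                    child[g] = rng.uniform(lo, hi)
            new_pop.append(child)
        pop = new_pop
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history
```

A call such as `ga_maximize(lambda w: lts_loglik(w, X, y, 3.0), bounds)` searches only inside `bounds`, so a narrower box (for instance one built from confidence intervals) directly shrinks the region the population has to explore. The lower bound for σ must be positive, since (3) is undefined at σ = 0.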
5. Application
Fig. 4. LTS Q-Q Plot for the residuals after excluding the outliers (n = 49).
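The theoretical LTS quantiles behind a Q-Q plot of this kind, and the Kolmogorov-Smirnov statistic used in this section, can both be obtained from the LTS density (2). The following is a sketch under stated assumptions rather than the procedure used for Fig. 4: the cdf is computed by Simpson integration of the density, the quantile by bisection on the assumed bracket ±30σ (too narrow for extreme probabilities when p is close to 2), and the function names are illustrative. The shape parameter p and the scale σ are taken as known.

```python
import math


def lts_pdf(e, sigma, p):
    """Density (2) of the LTS distribution centred at zero."""
    k = 2.0 * p - 3.0
    log_beta = math.lgamma(0.5) + math.lgamma(p - 0.5) - math.lgamma(p)
    return math.exp(-math.log(sigma) - 0.5 * math.log(k) - log_beta
                    - p * math.log1p(e * e / (k * sigma * sigma)))


def lts_cdf(e, sigma, p, steps=400):
    """CDF via composite Simpson integration of the density on [0, |e|]."""
    h = abs(e) / steps
    area = lts_pdf(0.0, sigma, p) + lts_pdf(abs(e), sigma, p)
    for i in range(1, steps):
        area += (4.0 if i % 2 else 2.0) * lts_pdf(i * h, sigma, p)
    # symmetry about zero: F(e) = 1/2 + sign(e) * integral from 0 to |e|
    return 0.5 + math.copysign(area * h / 3.0, e)


def lts_quantile(u, sigma, p):
    """Quantile by bisection on the assumed bracket [-30 sigma, 30 sigma]."""
    lo, hi = -30.0 * sigma, 30.0 * sigma
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if lts_cdf(mid, sigma, p) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ks_statistic(errors, sigma, p):
    """Largest gap between the empirical cdf of the errors and the LTS cdf."""
    n = len(errors)
    d = 0.0
    for i, e in enumerate(sorted(errors), start=1):
        f0 = lts_cdf(e, sigma, p)
        # compare F0 with the empirical cdf just after and just before e
        d = max(d, i / n - f0, f0 - (i - 1) / n)
    return d
```

For p = 3 the relation $T = \sqrt{\nu/k}\,(X - \mu)/\sigma$ with ν = 5 and k = 3 gives a direct check: the 0.975 quantile returned by `lts_quantile(0.975, 1.0, 3.0)` should equal the Student's t value 2.5706 multiplied by $\sqrt{3/5}$.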
Table 3
Estimates of the regression parameters, bootstrap standard errors, AIC, RMSE, and Dks values (n = 49).
θ̂0 θ̂1 θ̂2 θ̂3 σ̂ AIC RMSE Dks

$$\tilde{\epsilon}_i = y_i - \tilde{\theta}_0 - \sum_{j=1}^{3} \tilde{\theta}_j x_{ij} \quad (1 \le i \le 49) \tag{29}$$

given in Fig. 4 indicates that LTS distribution is appropriate.

To identify whether the LTS distribution is appropriate for the distribution of the error terms, we also use the Kolmogorov-Smirnov (K–S) test, which is a well-known and widely used goodness of fit test. To test the following null hypothesis

H0: The error terms are distributed as LTS

versus the alternative

H1: The error terms are not distributed as LTS,

K–S test statistic Dks is obtained as follows:

$$D_{ks} = \sup_{\epsilon} \left( \left| F_n(\epsilon) - F_0(\epsilon) \right| \right) \tag{30}$$

where F0(⋅) is the cdf of an LTS distribution with a known shape parameter p and Fn(⋅) is the empirical cdf for the error terms. At significance level α, if the calculated value Dks is greater than the tabulated value dα given in Ref. [26], or equivalently if the corresponding p-value is less than α, then the null hypothesis H0 is rejected.

The estimates of the regression parameters, the bootstrap standard errors of the corresponding estimates (given in parentheses), AIC, RMSE, and the K–S test statistic (i.e. Dks) values are given in Table 3. According to the K–S results based on GA and IRA estimates of the model parameters, the null hypothesis is not rejected at the significance level α = 0.05 for both tests since the Dks values are less than the corresponding table value $d_{n = 49,\, \alpha = 0.05} = 0.17128$, see Table 3. This result indicates that the distribution of the error terms obtained from the regression equation based on GA and IRA estimates of the model parameters is consistent with LTS. The Q-Q plot given in Fig. 4 supports this result visually.

It is clear from Table 3 that the proposed method using GA has lower AIC and RMSE values than those of IRA in Case 1. Furthermore, GA has much smaller bootstrap standard errors than IRA for all parameter estimates. Therefore, GA is more reliable than IRA and preferable to it. Additionally, Table 3 shows that using GA is superior to using PL in model fitting according to the AIC and RMSE results given in Case 1 and Case 2. These conclusions show the superiority of the GA. As a result, we advise using GA to obtain the estimates of the parameters (including the shape parameter p) in the multiple linear regression model with LTS distributed error terms.

Table 4
Estimates of the regression parameters, bootstrap standard errors, AIC, RMSE, and Dks values (n = 52).
θ̂0 θ̂1 θ̂2 θ̂3 σ̂ AIC RMSE Dks

Additionally, to investigate the robustness of the ML estimates based on GA and IRA methods to outliers, we give the results of the regression analysis for the data including outliers (i.e., n = 52), see Table 4. It is seen from Table 3 and Table 4 that the model fitting
performance of GA is less sensitive to the outliers than that of IRA. It is clear that AIC and RMSE values decrease for the censored sample for both models based on GA and IRA, which indicates the negative effect of the outliers on the estimated regression equation. However, the reduction rate is much bigger for the model based on IRA.

6. Conclusion

In this study, we focus on the ML estimation of the parameters of the multiple linear regression model when the underlying distribution of error terms is LTS. It should be noted that the ML estimators of the parameters cannot be obtained analytically. Therefore, we resort to GA and the traditional NR, NM and IRA algorithms. To improve the performance of GA, we use the robust confidence intervals based on the MML estimators of the regression parameters as the search space. We compare the efficiencies of the ML estimators obtained by using the mentioned algorithms in terms of MSE and DEF criteria. Our simulation study shows that GA outperforms the traditional algorithms in most of the cases in obtaining the ML estimates. We therefore strongly advise using GA to obtain the ML estimates of the parameters of the multiple linear regression model when the error terms have LTS distribution because of the superior performance of the GA.

CRediT authorship contribution statement

Abdullah Yalçınkaya: Conceptualization, Methodology, Software, Formal analysis, Validation, Data curation, Investigation, Writing – original draft, Review & editing. İklim Gedik Balay: Conceptualization, Methodology, Software, Formal analysis, Resources, Validation, Writing – original draft, Review & editing. Birdal Şenoğlu: Conceptualization, Methodology, Supervision, Formal analysis, Validation, Writing – original draft, Review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] S. Acitas, P. Kasap, B. Senoglu, O. Arslan, One-step M-estimators: Jones and Faddy's skewed t-distribution, J. Appl. Stat. 40 (7) (2013) 1545–1560.
[2] S. Acitas, P. Kasap, B. Senoglu, O. Arslan, Robust estimation with the skew t2 distribution, Pakistan Journal of Statistics 29 (4) (2013) 409–430.
[3] S. Acitas, C.H. Aladag, B. Senoglu, A new approach for estimating the parameters of Weibull distribution via particle swarm optimization: an application to the strengths of glass fibre data, Reliab. Eng. Syst. Saf. 183 (2019) 116–127.
[4] H. Akaike, Maximum likelihood identification of Gaussian autoregressive moving average models, Biometrika 60 (2) (1973) 255–265.
[5] G.K. Bhattacharyya, The asymptotics of maximum likelihood and related estimators based on type II censored data, J. Am. Stat. Assoc. 80 (390) (1985) 398–404.
[6] N. Celik, B. Senoglu, Robust estimation and testing in one-way ANOVA for Type II censored samples: skew normal error terms, J. Stat. Comput. Simulat. 88 (7) (2018) 1382–1393.
[7] N. Celik, B. Senoglu, O. Arslan, Estimation and testing in one-way ANOVA when the errors are skew-normal, Rev. Colomb. Estadística 38 (1) (2015) 75–91.
[8] A.M. Garcia, I. Sante, M. Boullon, R. Crecente, Calibration of an urban cellular automaton model by using statistical techniques and a genetic algorithm. Application to a small urban settlement of NW Spain, Int. J. Geogr. Inf. Sci. 27 (8) (2013) 1593–1611.
[9] R.C. Geary, Testing for normality, Biometrika 34 (3/4) (1947) 209–242.
[10] M.J. Gelfand, J.C. Jackson, X. Pan, D. Nau, M.M. Dagher, P.V. Lange, C. Chiu, The importance of cultural tightness and government efficiency for understanding COVID-19 growth and death rates. Preprint on PsyArXiv. https://doi.org/10.31234/osf.io/m7f8a, 2020.
[11] S. Ghosal, S. Sengupta, M. Majumder, B. Sinha, Linear Regression Analysis to predict the number of deaths in India due to SARS-CoV-2 at 6 weeks from day 0 (100 cases-March 14th 2020), Diabetes & Metabolic Syndrome: Clin. Res. Rev. 14 (4) (2020) 311–315.
[12] D. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, MA, 1989.
[13] J. Holland, Adaptation in Natural and Artificial System: an Introduction with Application to Biology, Control and Artificial Intelligence, University of Michigan Press, Ann Arbor, 1975.
[14] P.J. Huber, Robust Statistics, Springer Berlin Heidelberg, 2011, pp. 1248–1251.
[15] M.Q. Islam, Estimation and hypothesis testing in multivariate linear regression models under non normality, Commun. Stat. Theor. Methods 46 (17) (2017) 8521–8543.
[16] M.Q. Islam, M.L. Tiku, Multiple linear regression model under nonnormality, Commun. Stat. Theor. Methods 33 (10) (2005) 2443–2467.
[17] M.Q. Islam, M.L. Tiku, F. Yildirim, Nonnormal regression. I. Skew distributions, Commun. Stat. Theor. Methods 30 (6) (2001) 993–1020.
[18] M.Q. Islam, F. Yildirim, M. Yazici, Inference in multivariate linear regression models with elliptically distributed errors, J. Appl. Stat. 41 (8) (2014) 1746–1766.
[19] S. Jomnonkwao, S. Uttra, V. Ratanavaraha, Forecasting road traffic deaths in Thailand: applications of time-series, curve estimation, multiple linear regression, and path analysis models, Sustainability 12 (1) (2020) 395.
[20] Y.M. Kantar, B. Senoglu, A comparative study for the location and scale parameters of the Weibull distribution with given shape parameter, Comput. Geosci. 34 (12) (2008) 1900–1909.
[21] K.L. Lange, R.J. Little, J. Taylor, Robust statistical modelling using the t-distribution, J. Am. Stat. Assoc. 84 (1989) 881–896.
[22] K.L. Lange, J.S. Sinsheimer, Normal/independent distributions and their applications in robust regression, J. Comput. Graph Stat. 2 (1993) 175–198.
[23] K. Liao, E.S. Park, J. Zhang, L. Cheng, D. Ji, Q. Ying, J.Z. Yu, A multiple linear regression model with multiplicative log-normal error term for atmospheric concentration data, Sci. Total Environ. 767 (2021) 144282.
[24] X. Lu, Y. Wu, J. Lian, Y. Zhang, C. Chen, P. Wang, L. Meng, Energy management of hybrid electric vehicles: a review of energy optimization of fuel cell hybrid power system based on genetic algorithm, Energy Convers. Manag. 205 (2020) 112474.
[25] K. McGarry, S.A. Siedlecki, J. Salisbury, S.R. Alin, Multiple linear regression models for reconstructing and exploring processes controlling the carbonate system of the northeast US from basic hydrographic data, J. Geophys. Res.: Oceans 126 (2) (2021), e2020JC016480.
[26] L.H. Miller, Table of percentage points of Kolmogorov statistics, J. Am. Stat. Assoc. 51 (1956) 111–121.
[27] E.S. Pearson, The analysis of variance in cases of non-normal variation, Biometrika 23 (1/2) (1931) 114–133.
[28] S. Puthenpura, N.K. Sinha, Modified maximum likelihood method for the robust estimation of system parameters from very noisy data, Automatica 22 (2) (1986) 231–235.
[29] M.L. Tiku, Estimating the mean and standard deviation from a censored normal sample, Biometrika 54 (1–2) (1967) 155–165.
[30] M.L. Tiku, Estimating the parameters of normal and logistic distribution from censored samples, Aust. J. Stat. 10 (2) (1968) 64–74.
[31] M.L. Tiku, M.Q. Islam, A.S. Selcuk, Nonnormal regression. II. Symmetric distributions, Commun. Stat. Theor. Methods 30 (6) (2001) 1021–1045.
[32] M.L. Tiku, S. Kumra, Expected values and variances and covariances of order statistics for a family of symmetric distributions (Student's t), Selected Tables in Mathematical Statistics 8 (1985) 141–270.
[33] M.L. Tiku, R.P. Suresh, A new method of estimation for location and scale parameters, J. Stat. Plann. Inference 30 (2) (1992) 281–292.
[34] M.L. Tiku, W.K. Wong, G. Bian, Estimating parameters in autoregressive models in non-normal situations: symmetric innovations, Commun. Stat. Theor. Methods 28 (2) (1999) 315–341.
[35] J.W. Tukey, A survey of sampling from contaminated distributions, Contributions to Probability and Statistics (1960) 448–485.
[36] D.C. Vaughan, On the Tiku-Suresh method of estimation, Commun. Stat. Theor. Methods 21 (2) (1992) 451–469.
[37] D.C. Vaughan, The generalized secant hyperbolic distribution and its properties, Commun. Stat. Theor. Methods 31 (2) (2002) 219–238.
[38] D.C. Vaughan, M.L. Tiku, Estimation and hypothesis testing for a nonnormal bivariate distribution with applications, Math. Comput. Model. 32 (1–2) (2000) 53–67.
[39] Z. Xia, K. Mao, S. Wei, X. Wang, Y. Fang, S. Yang, Application of genetic algorithm-support vector regression model to predict damping of cantilever beam with particle damper, J. Low Freq. Noise Vib. Act. Contr. 36 (2) (2017) 138–147.
[40] A. Yalcinkaya, B. Senoglu, U. Yolcu, Maximum likelihood estimation for the parameters of skew normal distribution using genetic algorithm, Swarm and Evolutionary Computation 38 (2018) 127–138.
[41] A. Yalcinkaya, U. Yolcu, B. Senoglu, Maximum likelihood and maximum product of spacings estimations for the parameters of skew-normal distribution under doubly type II censoring using genetic algorithm, Expert Syst. Appl. 168 (2021) 114407.