
Comparative Analysis of Meta-heuristic Feature Selection and Feature Extraction Approaches for Enhanced Chronic Kidney Disease Prediction
*1Pratham Yashwante, *1Yash Patil, *1Karan Nadar, *2Anindita Khade
*1Student, Department of Computer Engineering, SIES Graduate School of Technology, Nerul, Maharashtra, India
*2Professor, Department of Computer Engineering, SIES Graduate School of Technology, Nerul, Maharashtra, India

Abstract—Chronic Kidney Disease (CKD) has garnered significant attention over the past decades, primarily because it produces few symptoms in its early stages. The objective of this paper is to evaluate and contrast the effects of feature extraction methods, namely Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), and Independent Component Analysis (ICA), and meta-heuristic feature selection methods, namely Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO), and Artificial Bee Colony (ABC). The classification models used to evaluate the selected features are Artificial Neural Network (ANN), Random Forest Classifier (RF), Multilayer Perceptron Classifier (MLP), and K-Nearest Neighbors (KNN). The issue of overfitting and underfitting has been addressed. The results are computed as accuracy on both the training and testing sets and as AUC-ROC scores, which are visualized. We found that the meta-heuristic feature selection algorithms improve the performance of the models substantially compared to feature extraction techniques (test accuracy of 0.94 to 1.00 versus 0.76 to 0.96).

Keywords—chronic kidney disease, prediction, deep learning, machine learning, meta-heuristic feature selection, feature extraction.

I. INTRODUCTION

CKD silently impacts millions globally, necessitating early detection and intervention [1]. In 2017, 1.2 million people worldwide died from CKD [17]. Traditional diagnostic markers often miss early signs, but deep learning, leveraging extensive patient data, can reveal patterns that are otherwise hard to detect. With advancements in data analytics and computing, machine learning enhances our understanding of CKD risk factors, enabling early detection and management, potentially transforming lives [10]. Meta-heuristic algorithms for feature selection have proven helpful in building robust models for disease prediction [11]. Our work compares meta-heuristic feature selection algorithms (PSO, ACO, ABC) and feature extraction algorithms (LDA, PCA, ICA) in terms of their effect on the performance of CKD prediction models. These algorithms extract or select relevant features to give a refined data representation, which is then used as input to the classification models (ANN, RF, MLP, KNN). The hyperparameters of each model are kept the same throughout, enabling an evaluation of the effectiveness of the feature selection and extraction methods in improving CKD prediction accuracy and reliability. The evaluation of results in this study centres on two key metrics: the accuracy score and the Area Under the Receiver Operating Characteristic curve (AUC-ROC). The accuracy score provides a holistic measure of the overall correctness of the predictions made by the models. The AUC-ROC curve, on the other hand, offers a more nuanced evaluation by considering the trade-off between true positive and false positive rates across various classification thresholds [12].

II. BACKGROUND

Numerous approaches for chronic kidney disease prediction using intelligent algorithms have been developed. Iliyas Ibrahim et al. [2] applied a Deep Neural Network (DNN) to Bade General Hospital's dataset to predict CKD with 98% accuracy. The study highlights Creatinine and Bicarbonate as key attributes for effective CKD detection. Saurabh Pal [3] investigated CKD prediction using a machine learning model incorporating categorical and non-categorical attributes. The approach combines baseline classifiers through majority voting for a 3% accuracy improvement. Ibomoiye Domor Mienye et al. [4] introduced a novel method that integrates PSO to optimize the parameters of a Stacked Sparse Autoencoder (SSAE), tackling internal covariate shift challenges. The SSAE network connects the last autoencoder's hidden layer to a softmax classifier. Vijendra Singh et al. [5] proposed a deep learning model for early chronic disease diagnosis that uses Recursive Feature Elimination and outperforms five classifiers (SVM, KNN, Logistic Regression, Random Forest, Naive Bayes) with 100% accuracy. Compared with accuracies ranging from 85% to 98.5% in existing works, this positions the model as a promising tool for nephrologists. Hanyu Zhang et al. [6] addressed challenges in evaluating CKD patients' conditions by employing data preprocessing and ANN techniques. The study compares a classical MLP model with a LASSO-preselected MLP model, showing comparably high accuracy in mapping clinical factors to survivability. Chaity Mondol et al. [7] introduced a high-accuracy neural network method for detecting CKD, offering a promising tool for risk assessment. The study preprocesses the raw dataset, enhancing detection efficiency. Optimized neural network models (OCNN, OANN, OLSTM) outperform traditional methods, with OCNN reaching 98.75% accuracy. Manonmani M. et al. [8] enhanced the Teaching-Learning-Based Optimization (TLBO) algorithm for high-dimensional medical data analysis, addressing CKD diagnosis accuracy. The proposed ITLBO achieves a 36% reduction in features, surpassing TLBO's 25%. Experimental results show improved performance metrics, including accuracy gains of 6.75%, 6.25%, and 4.75% for SVM, Gradient Boosting, and CNN respectively. S. Belina et al. [9] aimed for optimal predictability of CKD by combining ACO-based feature selection with an Extreme Learning Machine (ELM). The proposed ACO algorithm reduces the number of features efficiently, improving CKD prediction accuracy and streamlining the diagnostic process.

In the above-mentioned studies, various feature selection and feature extraction methods were used to improve the accuracy of the models. We have selected some of these feature extraction and selection methods for a comparative analysis.

III. DATASET
The dataset used in this paper is the CKD dataset obtained from DY Patil Hospital. It has 22 attributes in total, including the class attribute. It is a binary classification dataset, with 'ckd' denoting presence of the disease and 'notckd' denoting its absence, and it contains 400 records of CKD patients and 400 records of non-CKD patients. The attributes are described in TABLE 1.

TABLE 1: Dataset Description

Feature          Description            Data type
Age              Age of the patient     int
Gender           Gender of the patient  categorical
vol              Volume                 float
sg               Specific gravity       float
freq             freq                   int
sod              Sodium                 int
pot              Potassium              float
chlo             Chloride               int
phos             Phosphorous            float
prot             Protein                float
Alb              Albumin                float
Glob             Glob                   float
urea             Urea level             float
creatinine       Creatinine level       float
Bun              Blood Urea Nitrogen    float
uric acid        Uric acid level        float
rbc              Red blood cells        float
Wbc              White blood cells      float
Pcv              Packed cell volume     float
pe               Pedal edema            categorical
Ane              Anemia                 categorical
classification   Class                  categorical

The data has multiple nominal attributes, so categorical data encoding is applied: categorical variables are transformed into binary values ('1' and '0') to facilitate calculations. Subsequently, the data is standardized to address differing value ranges. This entails centering each feature at a mean of 0 and scaling it to a variance of 1 using StandardScaler. These preprocessing steps collectively enhance the consistency and comparability of the dataset.

IV. METHOD

We received real-time data from DY Patil Hospital for this research, consisting of information from 800 patients. Initially, base models were trained on all the attributes in the dataset. Feature extraction methods (LDA, PCA, ICA) and meta-heuristic feature selection methods (PSO, ACO, ABC) were then applied to the dataset to retain only the necessary features. The best selected or extracted features were fed into the classification models (ANN, RF, MLP, KNN). Figure 1 illustrates the sequence of tasks in the paper's workflow.

Figure 1. Workflow

A. Feature Extraction

LDA is a dimensionality reduction technique commonly used for supervised classification problems. The goal of LDA is to project the dataset onto a lower-dimensional space while maximizing class separability. It seeks to maximize the distance between class means while minimizing the spread within each class.

PCA is a statistical procedure that uses an orthogonal transformation to turn a set of correlated feature observations into a set of linearly uncorrelated features. It is a dimensionality reduction technique that transforms data into a new coordinate system to capture its essential features.

ICA is used in machine learning to disentangle a multivariate signal into its independent non-Gaussian components. The goal of ICA is to estimate the unmixing matrix $A^{-1}$ (the inverse of the mixing matrix $A$) or, equivalently, the sources $S$, by maximizing the statistical independence of the estimated components.

B. Feature Selection

Meta-heuristic feature selection algorithms iteratively explore and evaluate different subsets of features to find an optimal or near-optimal subset that maximizes a performance metric. They leverage evolutionary or swarm intelligence principles to efficiently search the high-dimensional feature space and improve model performance.

PSO is a heuristic technique that searches for the best solution by mimicking the movement and clustering of birds. The algorithm optimizes a fitness function by iteratively adjusting the positions of particles in the search space [15].

$$v_i^{t+1} = w\,v_i^{t} + c_1 r_1 \left(pbest_i - x_i^{t}\right) + c_2 r_2 \left(gbest - x_i^{t}\right) \qquad (1)$$

$$x_i^{t+1} = x_i^{t} + v_i^{t+1} \qquad (2)$$
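Equations (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration, not the configuration used in the paper: the function name `pso_minimize`, the coefficient values (w = 0.7, c1 = c2 = 1.5), the bounds, and the toy `sphere` objective are all assumptions. For feature selection the fitness would instead score a classifier on a candidate feature subset, and the mapping from a continuous position to a feature mask is not specified in the text.

```python
import random


def pso_minimize(fitness, dim, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0), seed=0):
    """Minimize `fitness` with the velocity/position updates of eqs. (1)-(2)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial positions, zero initial velocities.
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in x]
    pbest_val = [fitness(p) for p in x]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Equation (1): inertia + pull towards pbest + pull towards gbest.
                v[i][d] = (w * v[i][d]
                           + c1 * r1 * (pbest[i][d] - x[i][d])
                           + c2 * r2 * (gbest[d] - x[i][d]))
                # Equation (2), with the position clipped to the search bounds
                # (the clipping is an addition of this sketch).
                x[i][d] = min(hi, max(lo, x[i][d] + v[i][d]))
            val = fitness(x[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = x[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = x[i][:], val
    return gbest, gbest_val


def sphere(p):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(c * c for c in p)


best, best_val = pso_minimize(sphere, dim=3)
print(best, best_val)
```

Here the fitness is minimized; to maximize a performance metric such as accuracy, one could pass its negative as the fitness.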
In equations (1) and (2), $x_i^{t}$ is the position and $v_i^{t}$ is the velocity of particle $i$ at iteration $t$, $w$ is the inertia weight, $c_1$ and $c_2$ are acceleration coefficients, $r_1$ and $r_2$ are random numbers in $[0, 1]$, $pbest_i$ is the best position found so far by particle $i$, and $gbest$ is the best position found by the whole swarm.

ACO is inspired by nature and is based on how ants forage for food. ACO iteratively simulates ant movements on a graph or network, updating pheromone levels on edges based on successful paths. Pheromone levels guide future ants, and over time the algorithm converges towards optimal solutions [9].

$$\tau_{ij} = (1 - \rho)\,\tau_{ij} + \sum_{k=1}^{m} \Delta\tau_{ij}^{k} \qquad (3)$$

$\tau_{ij}$ is the pheromone level on edge $(i, j)$, $\rho$ is the pheromone evaporation rate, and $\Delta\tau_{ij}^{k}$ is the pheromone deposited on that edge by ant $k$ out of $m$ ants.

ABC is a nature-inspired optimization algorithm based on the foraging behaviour of honeybees. ABC involves employed bees, onlookers, and scouts. Employed bees explore the solution space, and onlookers select better solutions based on the employed bees' information [16]. The algorithm combines exploration and exploitation to find optimal solutions.

$$x_{ij}^{t+1} = x_{ij}^{t} + \phi_{ij} \left(x_{ij}^{t} - x_{kj}^{t}\right) \qquad (4)$$

$x_{kj}^{t}$ is a randomly selected bee, and $\phi_{ij}$ is a random number in $[-1, 1]$.

C. Models

ANN is a computational model that draws inspiration from the architecture and operations of the human brain. An ANN is made up of layers of interconnected nodes, or neurons, that process information by using weighted connections to convert input signals into output. The network learns by fine-tuning these weights during training, which improves its capacity for precise classifications or predictions [6]. The network is composed of input and output layers, with hidden layers enabling intricate representations. ANNs are useful for a variety of tasks, including pattern recognition and regression, since they are excellent at identifying complex patterns and relationships in data.

RF is an ensemble learning method used for classification and regression tasks. During training, it builds several decision trees and combines their predictions to increase overall resilience and accuracy [5]. A random subset of the data is used to train each tree, and the final prediction is either the average (regression) or the majority vote (classification) of the individual trees' predictions. By aggregating many trees, Random Forest reduces overfitting and manages noisy data effectively; comparing the relative importance of each variable among the trees in the forest also offers insights into feature relevance.

Multi-Layer Perceptron is an artificial neural network containing an input layer, one or more hidden layers, and an output layer [13]. Each connection between neurons in one layer and those in a subsequent layer is weighted. The network modifies these weights during training in order to identify patterns in the incoming data. Each neuron's activation function adds non-linearity, which enables the network to model intricate interactions. Because of their adaptability, MLPs can be applied to a wide range of tasks, such as regression and classification, by varying the number of layers and neurons according to the difficulty of the problem.

KNN is a simple and intuitive machine learning algorithm used for classification and regression tasks. A data point is assigned either the majority class of its K nearest neighbours in the feature space (classification) or the average of their values (regression) [14]. The number of neighbours taken into account depends on the selection of K. KNN is predicated on the idea that comparable instances in the feature space have comparable values or labels. Since it is instance-based and non-parametric, it keeps the training dataset for predictions and doesn't make any assumptions about the underlying data distribution.

V. RESULTS

Accuracy, the Receiver Operating Characteristic (ROC) curve, and the Area Under the Curve (AUC) are the measures used for assessing the prediction models.

TABLE 2: Confusion matrix

                   Predicted 'ckd'   Predicted 'notckd'
Actual 'ckd'       TP                FN
Actual 'notckd'    FP                TN

$$TPR = \frac{TP}{TP + FN} \qquad (5)$$

$$AUC = \sum_{i=1}^{n-1} \frac{\left(TPR_{i+1} + TPR_{i}\right)\left(FPR_{i+1} - FPR_{i}\right)}{2} \qquad (6)$$

$$FPR = \frac{FP}{FP + TN} \qquad (7)$$

In all the classification models, 70% of the dataset is used for training and 30% for testing: of the 800 records, 560 are used for training and 240 for testing.

Initially the base models (ANN, RF, MLP, KNN) are trained using all the features in the dataset. The AUC-ROC curves for all the base models are given in Figure 2.

Figure 2. AUC-ROC curves for base models

The following are a few important parameters of each model used as a base model. The parameters for ANN are given in TABLE 3.

TABLE 3: Parameters for ANN

Parameter       Value
optimizer       adam
dropout rate    0.5
no. of layers   3

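As a numerical check of equations (5) to (7), the sketch below computes TPR and FPR from confusion-matrix counts and sums the trapezoids of equation (6) over the ROC points. The labels and scores are made-up illustrative values, not data from this study, and the helper names are hypothetical.

```python
def tpr_fpr(tp, fn, fp, tn):
    """Equations (5) and (7): true positive rate and false positive rate."""
    return tp / (tp + fn), fp / (fp + tn)


def roc_points(y_true, scores):
    """Return (FPR, TPR) points, one per distinct threshold, starting at (0, 0).

    Assumes y_true contains at least one positive (1) and one negative (0).
    """
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = fn = fp = tn = 0
        for label, score in zip(y_true, scores):
            predicted = 1 if score >= threshold else 0
            if label == 1 and predicted == 1:
                tp += 1
            elif label == 1:
                fn += 1
            elif predicted == 1:
                fp += 1
            else:
                tn += 1
        tpr, fpr = tpr_fpr(tp, fn, fp, tn)
        points.append((fpr, tpr))
    return points


def auc_trapezoid(points):
    """Equation (6): trapezoidal area under consecutive ROC points."""
    return sum((t1 + t0) * (f1 - f0) / 2
               for (f0, t0), (f1, t1) in zip(points, points[1:]))


# Hypothetical labels (1 = 'ckd', 0 = 'notckd') and classifier scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(y_true, scores)
auc = auc_trapezoid(points)
print(round(auc, 4))  # 0.8889, i.e. 8/9
```

In this toy case 8 of the 9 positive/negative pairs are ranked correctly, which matches the trapezoidal result of 8/9.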
The key parameters for RF are n_estimators, max_depth, and criterion; their values are given in TABLE 4.

TABLE 4: Parameters for Random Forest Classifier

Parameter      Value
n_estimators   [10, 200]
max_depth      [1, 20]
criterion      ['gini', 'entropy']

A few key parameters for the Multilayer Perceptron Classifier are the size of the hidden layers, max_iter, solver, and the activation function. The values of these parameters are given in TABLE 5.

TABLE 5: Parameters for MLP

Parameter           Value
hidden_layer_size   50
max_iter            1000
solver              adam
activation          relu

The key parameter for KNN is the number of neighbors. TABLE 6 gives the value for the number of neighbors.

TABLE 6: Parameters for KNN

Parameter     Value
n_neighbors   [1, 30]

Accuracy measures the overall correctness of the model's predictions. TABLE 7 and TABLE 8 give the accuracy scores for the training and testing sets respectively.

TABLE 7: Accuracy score for training set

Method   MLP      ANN      KNN      RF
Base     0.9643   0.9800   0.9300   1.0000
ICA      0.8125   0.8000   0.8268   1.0000
PCA      0.8250   0.8179   0.8054   0.9929
LDA      0.7786   0.7804   0.7786   0.7804
ACO      0.9661   0.9839   0.9839   1.0000
ABC      0.9964   0.9982   0.9929   1.0000
PSO      0.9929   0.9893   0.9839   1.0000

TABLE 8: Accuracy score for testing set

Method   MLP      ANN      KNN      RF
Base     0.9333   0.9600   0.8500   1.0000
ICA      0.8000   0.8000   0.8042   0.9584
PCA      0.8125   0.8167   0.7875   0.9333
LDA      0.7667   0.7625   0.7667   0.7667
ACO      0.9417   0.9625   0.9625   1.0000
ABC      0.9875   0.9958   0.9833   1.0000
PSO      0.9833   0.9750   0.9625   1.0000

As the accuracy scores show, meta-heuristic feature selection algorithms, particularly the Artificial Bee Colony algorithm, perform better than feature extraction algorithms. RF is the best performing classification model overall.

Figure 3, Figure 4, and Figure 5 show the AUC-ROC curves for all the above-mentioned models with the features selected by the feature extraction algorithms. PCA is the best performing feature extraction algorithm.

Figure 3. AUC-ROC for LDA

Figure 4. AUC-ROC for PCA

Figure 5. AUC-ROC for ICA

Figure 6, Figure 7, and Figure 8 show the AUC-ROC curves for all the models with the features selected by the feature selection algorithms. ABC is the best performing feature selection algorithm.

Figure 6. AUC-ROC for ABC

Figure 7. AUC-ROC for ACO

Figure 8. AUC-ROC for PSO

Comparing the accuracy and the AUC-ROC curves across all the feature selection and feature extraction methods shows that the Artificial Bee Colony algorithm performed best and that the Random Forest Classifier showed consistent accuracy overall. Thus, the meta-heuristic algorithms are better at selecting relevant and important features for prediction models than the feature extraction algorithms.

CONCLUSION

Chronic Kidney Disease is a significant concern for society. In this paper, we evaluated and compared the effects of feature extraction and meta-heuristic feature selection methods on various classification models. The experimental results showed that the meta-heuristic feature selection methods outperformed the feature extraction methods and matched or exceeded the base models on the testing set. All the classification models demonstrated robust accuracy when given the features chosen by the feature selection techniques. RF emerged as the best-performing model and ABC as the best selection technique. Classification models have limitations when trained on little data. The dataset used in this research comprised only 800 records, so performance may vary with changes in the dataset size. In the future, additional parameters could be considered to enhance prediction accuracy and optimize the behavior of the models.

REFERENCES

[1] Luyckx, Valerie A., Marcello Tonelli, and John W. Stanifer. "The global burden of kidney disease and the sustainable development goals." Bulletin of the World Health Organization 96.6 (2018): 414.
[2] Iliyas, Iliyas Ibrahim, et al. "Prediction of chronic kidney disease using deep neural network." arXiv preprint arXiv:2012.12089 (2020).
[3] Pal, Saurabh. "Prediction for chronic kidney disease by categorical and non_categorical attributes using different machine learning algorithms." Multimedia Tools and Applications (2023): 1-14.
[4] Mienye, Ibomoiye Domor, and Yanxia Sun. "Improved heart disease prediction using particle swarm optimization based stacked sparse autoencoder." Electronics 10.19 (2021): 2347.
[5] Singh, Vijendra, Vijayan K. Asari, and Rajkumar Rajasekaran. "A deep neural network for early detection and prediction of chronic kidney disease." Diagnostics 12.1 (2022): 116.
[6] Zhang, Hanyu, et al. "Chronic kidney disease survival prediction with artificial neural networks." 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE, 2018.
[7] Mondol, Chaity, et al. "Early prediction of chronic kidney disease: A comprehensive performance analysis of deep learning models." Algorithms 15.9 (2022): 308.
[8] Balakrishnan, Sarojini. "Feature selection using improved teaching learning based algorithm on chronic kidney disease dataset." Procedia Computer Science 171 (2020): 1660-1669.
[9] VJ Sara, S. Belina, and K. Kalaiselvi. "Ant colony optimization (ACO) based feature selection and extreme learning machine (ELM) for chronic kidney disease detection." International Journal of Advanced Studies of Scientific Research 4.1 (2019).
[10] Debal, Dibaba Adeba, and Tilahun Melak Sitote. "Chronic kidney disease prediction using machine learning techniques." Journal of Big Data 9.1 (2022): 1-19.
[11] Singh, Chandrabhan, Mohit Gangwar, and Upendra Kumar. "Analysis of Meta-Heuristic Feature Selection Techniques on classifier performance with specific reference to psychiatric disorder."
[12] Hajian-Tilaki, Karimollah. "Receiver operating characteristic (ROC) curve analysis for medical diagnostic test evaluation." Caspian Journal of Internal Medicine 4.2 (2013): 627.
[13] Djerioui, Mohamed, et al. "Heart Disease prediction using MLP and LSTM models." 2020 International Conference on Electrical Engineering (ICEE). IEEE, 2020.
[14] Devika, R., Sai Vaishnavi Avilala, and V. Subramaniyaswamy. "Comparative study of classifier for chronic kidney disease prediction using naive bayes, KNN and random forest." 2019 3rd International Conference on Computing Methodologies and Communication (ICCMC). IEEE, 2019.
[15] Sharma, Shaweta, et al. "Metaheuristics Algorithms for Complex Disease Prediction." Nature-Inspired Methods for Smart Healthcare Systems and Medical Data. Cham: Springer Nature Switzerland, 2023. 169-180.
[16] Tarle, Balasaheb, and Sudarson Jena. "Improved artificial neural network with aid of artificial bee colony for medical data classification." International Journal of Business Intelligence and Data Mining 15.3 (2019): 288-305.
[17] Bikbov, B., et al. "GBD Chronic Kidney Disease Collaboration: Global, regional, and national burden of chronic kidney disease, 1990-2017: A systematic analysis for the Global Burden of Disease Study 2017." Lancet 395 (2020): 709-733.
