An Improved Comparative Model for Chronic Kidney Disease (CKD) Prediction

2020 14th International Conference on Open Source Systems and Technologies (ICOSST)
DOI: 10.1109/ICOSST51357.2020.9333097 | 978-1-7281-9050-1/20/$31.00 ©2020 IEEE

Inayatullah
Department of Software Engineering
University of Engineering and Technology Taxila
Taxila, Pakistan
inayat.ullah@students.uettaxila.edu.pk

Huma Qayyum
Department of Software Engineering
University of Engineering and Technology Taxila
Taxila, Pakistan
huma.ayub@uettaxila.edu.pk
Abstract—This paper exploits machine learning (ML) techniques to find and diagnose chronic kidney disease (CKD) at the mild-damage stage. Kidney diseases are syndromes that impair the functions and the glomerular filtration rate (GFR) of the kidney. Nephrologists caution that the ratio of patients affected by CKD is increasing significantly, so more precise data mining and ML methods are required to predict and diagnose CKD successfully. In this paper we apply different ML classification procedures to a dataset obtained from the UCI repository, which comprises 400 instances, 24 features, and binary classification labels. The 7-fold and 10-fold cross-validation procedures are applied to the dataset to evaluate the models. The ML classification algorithms included in this paper are Random Forest (RF), Discriminant Analysis (DA), Naive Bayes (NB), Support Vector Machine (SVM), and K-Nearest Neighbor (K-NN). All experiments were performed in MATLAB. The statistical results of all algorithms show that RF performed better than DA, NB, SVM, and K-NN, with accuracies of 99.75%, 98.25%, 98%, 97%, and 92% respectively.

Keywords—Random Forest (RF), Support Vector Machine (SVM), Discriminant Analysis (DA), Naïve Bayes (NB), K-Nearest Neighbor (K-NN)

I. INTRODUCTION

Kidneys are the organs of the body that filter blood and excrete waste products, while also regulating blood pressure and boosting the production of red blood cells. Kidney diseases are syndromes that impair the functions and glomerular filtration rate (GFR) of the kidney. The estimated glomerular filtration rate (eGFR) has five stages. The 1st stage is mild kidney damage, in which eGFR ≥ 90 mL/min/1.73 m². In the 2nd, 3rd, and 4th stages, eGFR is between 60-89, 30-59, and 15-29 mL/min/1.73 m² respectively. The 5th stage is the near-failure or failed stage, in which eGFR < 15 mL/min/1.73 m². Nephrologists attribute the rapid growth of kidney disease to high blood pressure, diabetes, poor sleep, heart disease, hepatitis C, ingesting junk food, and self-medication. Experts of the kidney transplant unit at the Pakistan Institute of Medical Sciences (PIMS) reported in 2019 that 17 million people were suffering from kidney diseases. Hypertension and diabetes are risk factors for CKD; the global prevalence [1] of diabetes in adults was 6.4%, with most cases in developed countries, and is likely to increase to 7.7% by 2030, while hypertension affected 26% of adults (972 million) in 2000 and is expected to rise to 1.56 billion by 2025. In the diagnosis of CKD, predicting patient outcomes remains significant for researchers, individuals, physicians, and health care systems. For this purpose, machine learning algorithms play an important role and are applied to predict chronic kidney disease. RF, DA, KNN, SVM, and NB are familiar machine learning (ML) techniques which assist in classification problems, and they are of great importance in health care, weather prediction, digital image processing, and stock exchange market analysis. In this paper the above five techniques are used to classify CKD; RF gives the maximum accuracy of 99.75% compared to the other techniques. A detailed description of these techniques is given in Section III.

II. REVIEW OF RELATED WORK

Qing-Guo Wang and Adeola Ogunleye proposed an enhanced Extreme Gradient Boosting (XGBoost) method [2] with feature selection techniques to diagnose CKD accurately and on time. The raw dataset is pre-processed, and spot-checking ML procedures are applied to rapidly discover the algorithm that performs well. Fuzhe Ma et al. [3] introduced the HMANN model for the early diagnosis of CKD on the internet of medical things (IoMT). The chronic renal disease detection, segmentation, and diagnosis process operates on incoming raw images: the irregularity of the kidney is checked by segmenting the kidney area in the image and extracting features from it. In study [4], a nested ensemble ML methodology is used to predict breast cancer patients into benign and malignant categories using the Wisconsin diagnostic breast cancer (WDBC) dataset. In ref. [5], Murat Koklu and Kemal Tutuncu used well-known methods, such as SVM, Naïve Bayes, multilayer perceptron, and C4.5, for classification of CKD; comparing the results of these algorithms, the multilayer perceptron had the highest accuracy. In that study the perceptron algorithm is used for supervised learning of binary classifiers to decide whether the disease is CKD or not.

Shu Lih Oh, U. Raghavendra et al. [6] proposed a convolutional neural network (CNN) for Parkinson's disease (PD) using electroencephalogram (EEG) signals, which are commonly used to diagnose abnormalities in the brain. CNN is a type of deep neural network; experience shows that the more connected and deeper the network, the better its learning ability, which gives more accurate results. In a research study [7], Hanyu Zhang, Che-Lung Hung et al. also used ANN algorithms to predict CKD survival. Two ANN models were built: a classical multilayer perceptron (MLP) and a model integrating LASSO feature selection, which enhanced the prediction accuracy through variable selection and regularization. The LASSO operator is applied in the hidden layers and automatically picks out the relevant attributes. The results of this model are good, but its recall and sensitivity remained inadequate compared to other models. Current ML classifiers such as SVM, NN, DT, regression, and NB are applied in different combinations on a dataset to predict disease in time. Ensembling [8] techniques such as bagging, and majority voting over a combination of different algorithms, have been used to predict heart disease accurately; the same techniques can be applied to CKD using a kidney disease dataset. In another study [9], the authors predicted CKD using Naïve Bayes, SMO, and J48 as classification algorithms; among these, J48 gave the best results. The deep ANN [10] technique was used and compared with other techniques such as SVM and random forest to analyse CKD; the results demonstrated that the deep neural network had the maximum accuracy of 97% compared to the other models. Quan Zou, Kaiyang Qu et al. [11] worked on the chronic disease diabetes mellitus (DM), using NN, DT, and RF procedures to predict DM on a dataset of 14 features and 164,431 records. The decision trees were implemented in Weka and the neural network in MATLAB due to their dissimilar features. Experimental results showed that principal component analysis (PCA) gave poor results, while using all features with minimum redundancy maximum relevance (mRMR) gave improved results. In ref. [12], the authors caution that most models for predicting CKD are not accessible to researchers, clinicians, and practitioners, which makes decision making and resource allocation for high-risk CKD challenging. They suggest the need for a model that provides clinical intuition for CKD prediction, the potential for initial prognosis of the disease, and support for treatment decisions. The usage of medical decision support systems (MDSSs) [13] is increasing with time. Expert systems are used in the management of CKD because of their advantages in overcoming the complications of medical data. In CKD, the expert system has three main applications: prediction, classification of the different stages of CKD, and modelling of kidney transplantation surveillance. An expert system is made up of an inference engine and a knowledge base; the knowledge base comprises the rules which the inference engine uses to derive new facts. Diagnosing a disease with an expert system requires an electronic medical record of the patient [14], which is created by medical experts in a healthcare institution such as a hospital; this digital record makes the diagnosis process easy and precise. Data mining techniques operate on two concepts, i.e., predictive and descriptive models [15]. A predictive model is a supervised learning function which trains on data to predict unknown data labels, while a descriptive model is an unsupervised learning technique which assigns unlabelled data to clusters. In other words, supervised learning handles regression [16], categorization, and classification, while other techniques work on clustering [17], correlation analysis, and association. In this era, technology for the health care sector is advancing rapidly, and these innovations make healthcare problems easier and more secure in terms of individual life risk.

III. PROPOSED TECHNIQUES DESCRIPTION

The five ML techniques proposed in this paper are SVM, DA, NB, RF, and KNN. The data flow diagram (DFD) and an explanation of the five procedures are presented below.

[Fig. 1 Data Flow Diagram: Raw Data -> Data Pre-Processing -> 10-fold Cross-Validation of Dataset (Training Data 90%, Testing Data 10%) -> Classification Algorithms -> Trained Model -> Prediction -> Evaluation Metrics -> Results in percent: Accuracy, Precision, Recall, F1-score]
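
The pipeline of Fig. 1 can be expressed compactly in code. The following is a minimal sketch in Python with scikit-learn rather than the MATLAB environment used in the paper; the file name ckd.csv, the missing-value marker, and the class column values are assumptions about how the UCI dataset has been exported.

```python
# Sketch of the Fig. 1 pipeline: raw data -> pre-processing -> 10-fold
# cross-validation -> classification -> evaluation (mean accuracy).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical export of the UCI CKD data: 24 features plus a "class" column.
raw = pd.read_csv("ckd.csv", na_values="?")
X = pd.get_dummies(raw.drop(columns="class"))   # encode the nominal features
y = (raw["class"] == "ckd").astype(int)         # binary target label

model = make_pipeline(
    SimpleImputer(strategy="most_frequent"),    # data pre-processing
    StandardScaler(),
    RandomForestClassifier(n_estimators=100, random_state=0),
)

# 10-fold cross-validation: 90% training / 10% testing in each iteration.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
print(cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean())
```
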

A. Random Forest (RF)
1) Overview
RF is an ensemble classifier which uses decision-tree algorithms in a randomized fashion. It is applied to regression and multiclass categorization problems. In the random forest methodology, the algorithm generates decision trees, makes predictions on data samples, and finally decides the best solution by majority voting. The RF algorithm was created by the computer scientist Tin Kam Ho [18] in 1995. In Fig. 2, multiple random datasets (RD1, RD2, RD3, ..., RDn) are created in a randomized fashion from the original dataset (OD). In the 2nd step, the random datasets are used to build multiple decision trees. In the 3rd step, all decision trees are combined, and the result for a test instance is obtained by majority voting or averaging, assigning the test instance to a specific class.
2) Technical description
In general, a Random Forest splits up into the following hierarchical segments [19] (see Fig. 2).
a) Training data
In RF, the training data is in fact the original dataset (OD), which contains some attributes and one target or label attribute acting as a binary classification based on the other attributes of the OD. Let Y' = {y1, y2, y3, ..., yn} be the training set for class Cj. Subsets of this training set are created in the bootstrapping process (see Fig. 2).
b) Bootstrapping
Bootstrapping [20] is a statistical resampling procedure which picks instances of the OD at random in order to generate various distinct training sets, called bootstrapping datasets (BD) or random datasets (RD). In a BD, duplication of instances is permitted, but it is better if the frequency is minimal. Let L* = {RD1, RD2, RD3, ..., RDn} be the set of resamplings from the OD, as shown in Fig. 2. Multiple decision trees can then be created from the bootstrap samples L*.
c) Decision tree
A decision tree (DT) is a classification technique [21] for samples. The RF technique creates multiple decision trees, one from each subset of the OD. Each DT randomly selects a subset of the variables and provides its prediction for the target label of the OD. Let D_b = {DT1, DT2, DT3, ..., DTn} be the set of decision trees created from the bootstrap samples L*.
d) Majority voting
Majority voting is a weighted procedure that combines the predictions provided by the multiple DTs in an RF: a tuple is assigned the target label chosen by the majority of the DTs. In the case of classification, majority voting is applied as follows:

RF_c = majority vote [Mv(x*)]_1^D    (1)

where Mv(x*) is the majority vote for a novel point x*, used to pick the most-voted class as the final prediction, and D is the total number of decision trees in the RF.
e) Bagging
Bagging, also known as bootstrap aggregation [22], is an ensemble meta-algorithm used to improve the accuracy and decrease the misclassification error of ML algorithms. It is a specific case of model averaging. For regression problems, the bagging (averaging) over the total number of decision trees is given as follows:

RF_r = (1/D) Σ_{d=1}^{D} D_b(x*)    (2)

where D_b is the set of decision trees created from the bootstrap samples.
f) RF algorithm
I. Training phase
• For d = 1 to D (where D is the number of decision trees in the RF), bootstrap a sample L* of size n from the OD
• Create a tree D_b from each bootstrap sample
• Repeat the following steps for each terminal node of the tree D_b until the minimum node size N is reached:
  i. Select X attributes randomly out of the Y attributes
  ii. Select the best split point p out of the X attributes
  iii. Split the point into further binary daughter nodes
  iv. Repeat the above three steps until N nodes or splits are generated
  v. Repeat the above for all D trees
II. Testing phase
• In the testing phase, make a prediction for a novel point x*:
  1. For regression problems, take the average of all the tree outputs: RF_r = (1/D) Σ_{d=1}^{D} D_b(x*)
  2. For classification problems, take the majority vote for the most-voted class. Let Mv(x*) be the majority vote for a class from the D_b trees on the novel point x*: RF_c = majority vote [Mv(x*)]_1^D

[Fig. 2 Random Forest: bootstrap samples RD1 ... RDn are drawn from the OD, one decision tree is grown per sample, and their predictions are combined by majority voting.]
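
To make the training and testing phases above concrete, here is a minimal Python/scikit-learn sketch on synthetic data of the same shape as the CKD dataset (400 instances, 24 features); it grows D bootstrapped trees and reproduces the hard majority vote of equation (1) by hand. Note that scikit-learn's own predict averages tree probabilities rather than counting hard votes, so the two can differ on ties.

```python
# Random forest: D bootstrapped decision trees combined by majority voting.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=400, n_features=24, random_state=0)

rf = RandomForestClassifier(
    n_estimators=100,     # D, the number of decision trees
    max_features="sqrt",  # X attributes selected at random out of Y
    bootstrap=True,       # each tree is trained on a bootstrap sample L*
    random_state=0,
).fit(X, y)

# Hard majority vote over the individual trees D_b(x*), as in equation (1):
votes = np.stack([tree.predict(X[:5]) for tree in rf.estimators_])
rf_c = (votes.mean(axis=0) > 0.5).astype(int)
print(rf_c, rf.predict(X[:5]))  # sklearn averages class probabilities instead
```
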

B. Discriminant Analysis (DA)
1) Overview
Discriminant analysis (DA) is a procedure, developed by Sir Ronald Fisher in 1938 [23, 24], which allows the researcher to examine the distinctions between multiple clusters of objects with respect to several variables concurrently. The task of DA is to find the line that best separates the data points into the labels of the binary outcome.
2) Technical description
a) Linear discriminant analysis (LDA)
LDA is a specialized dimensionality-reduction form of DA which is commonly used for classification problems, as shown in Fig. 3. LDA tries to expand the separation between the known labels and to shrink their within-label variability. The following steps are applied in LDA (see Fig. 3). In the 1st phase, compute the separation between the distinct labels. The 2nd step is to compute the gap between the examples and the mean of each label. The 3rd step is to find the lower-dimensional space which expands the between-label variance and decreases the within-label variance; that is, we minimize scatter within classes and maximize scatter between classes.
b) Quadratic discriminant analysis (QDA)
QDA is also a statistical learning technique used, like LDA, to classify instances into a class. QDA assumes that each label has its own covariance matrix, whereas LDA assumes an equal covariance matrix for every class.

[Fig. 3 Linear discriminant analysis: the projection maximizes the separation between the different labels (distance between the means) relative to the within-class variance.]
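
The LDA/QDA contrast can be illustrated with a short sketch, again in Python with scikit-learn on synthetic data; scikit-learn's LinearDiscriminantAnalysis also exposes the lower-dimensional projection described in the three steps above.

```python
# LDA (one pooled covariance matrix) versus QDA (one matrix per label).
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=24, random_state=0)

for name, clf in [("LDA", LinearDiscriminantAnalysis()),
                  ("QDA", QuadraticDiscriminantAnalysis())]:
    print(name, cross_val_score(clf, X, y, cv=10).mean())

# LDA as dimensionality reduction: project onto <= (n_classes - 1) axes
# that maximize between-label scatter relative to within-label scatter.
Z = LinearDiscriminantAnalysis().fit(X, y).transform(X)
print(Z.shape)  # (400, 1) for a binary problem
```
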
C. Naïve Bayes (NB)
1) Overview
NB is a probabilistic ML procedure, grounded on Bayes' theorem, which is used for classification problems. It is a particular case of the Bayesian network and a classifier which works on the basis of probability. In this algorithm, changing one feature value does not change the value of the others, which makes the classifier well suited to high-dimensional datasets.
2) Technical description
The Naïve Bayes algorithm assumes that the presence of the pre-defined features in a specific class is unrelated to the presence of the other features, i.e., the feature values are conditionally independent given the target value [25]. A few initial steps build the Naïve Bayes model in mathematical form. First of all, the pre-defined features are given for a specific class to be identified; in this study, the CKD features are given in a dataset for the classes CKD and notCKD.

P(α|β) = P(β|α) P(α) / P(β)    (3)

P(β|Y) = P(β1, β2, β3, ..., βn | Y) = Π_{i=1}^{n} P(βi | Y)    (4)

where P(α|β) is the posterior probability of the target label α given the predictor feature β, P(α) is the prior probability of the labels, P(β|α) is the likelihood, which gives the probability of the predictor given the label, and P(β) is the prior probability of the predictor. In the next step, the product of all the feature likelihoods is computed for each instance, and the target value with the highest probability is chosen, following the Naïve Bayes assumption [26] shown in equation (4). The decision is therefore made on the highest probability value obtained from the product of the feature likelihoods for either of the two classes, i.e., the disease is predicted as CKD or notCKD.
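
The decision rule of equations (3)-(4) can be checked numerically. The sketch below uses scikit-learn's GaussianNB on synthetic data; the Gaussian likelihood is an assumption here, since the paper does not state which likelihood model its MATLAB implementation used.

```python
# Naive Bayes: pick the label with the largest posterior P(alpha | beta),
# computed from the prior times the product of per-feature likelihoods.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=400, n_features=24, random_state=0)
nb = GaussianNB().fit(X, y)

log_post = nb.predict_log_proba(X[:3])  # log-posterior per class, eq. (3)-(4)
print(np.argmax(log_post, axis=1))      # class with the highest posterior...
print(nb.predict(X[:3]))                # ...is exactly what predict returns
```
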
D. Support Vector Machine (SVM)
1) Overview
SVM is a supervised ML classifier for data regression and classification problems, and it is particularly applied to binary classification tasks. SVM [27] was presented in the 1990s by Vladimir Vapnik together with Boser and Cortes. SVM has many real-life uses, such as image classification, face recognition, text labelling, weather prediction, stock market analysis, and medical diagnosis [27]. The main operation of SVM is to separate the data on the basis of the binary labels by a line which attains the maximum distance between the classes [28]. Most ML techniques face challenges due to the curse of dimensionality, which occurs when there is an inadequate number of instances relative to the number of attributes; to some extent, the SVM model shrinks this problem [29]. For data which cannot be separated linearly, SVM utilizes kernel functions [30].
2) Technical description
The basic functionality of SVM is to enhance the prediction accuracy of the hypothesis. Some basic concepts which underlie SVM are the following [30].
a) Maximum margin hyperplane
The maximum margin hyperplane attains the maximal gap, on both sides of the hyperplane, to the adjacent data points on both margins. There can be multiple candidate hyperplanes separating a given dataset, but the hyperplane which gives the maximum separation between the labelled instances is selected [30], yielding the maximal margin to both classes. The distance to the closest point belonging to the upper (red circles) label and the distance to the closest point belonging to the lower (blue circles) label must be the same. The mathematical form of these points is given by the equations below [31].

ψ · xi + λ ≥ +1 when yi = +1    (5)
ψ · xi + λ ≤ −1 when yi = −1    (6)

where the xi lying on the margin hyperplanes are the support vectors, with xi above the hyperplane for positive points and below it for negative points. ψ is the weight vector, which is perpendicular to the hyperplane, and λ denotes the bias, a specific parameter of the SVM.

[Fig. 4 Maximal margin]

b) Soft margin
On a linear decision boundary, some data points are misclassified. The soft margin was introduced to allow occasionally misclassified data points: such points may be anomalies of the dataset, or instances which simply fall on the wrong side of the hyperplane.
c) Kernel functions
Kernel functions [32] are used to project data points from a low-dimensional space to a higher-dimensional space when the data points do not lie in a simple plane but in a complex higher-dimensional space and cannot be separated by a simple method.
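
The margin and kernel ideas can be illustrated as follows: a minimal sketch comparing a soft-margin linear SVM with an RBF-kernel SVM on synthetic data, not the paper's MATLAB configuration.

```python
# Soft-margin linear SVM (C controls tolerated margin violations) versus
# an RBF-kernel SVM for data that is not linearly separable.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=24, random_state=0)

for name, clf in [("linear", SVC(kernel="linear", C=1.0)),
                  ("rbf", SVC(kernel="rbf", C=1.0, gamma="scale"))]:
    print(name, cross_val_score(clf, X, y, cv=10).mean())

# The support vectors are the x_i lying on or inside the margin, i.e. the
# points constrained by equations (5)-(6).
clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(clf.support_vectors_.shape)
```
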
E. K-Nearest Neighbor (K-NN)
1) Overview
K-NN is an ML technique used for classification problems; it predicts a new sample by searching the whole training set for the K closest samples according to a distance function. K-NN is an instance-based learning procedure and a supervised ML technique which uses the target variable of labelled data to classify unknown instances. In classification, the prediction among the K neighbours is the mode, i.e., the most common label value. The algorithm uses the so-called Euclidean distance measure. In 1951, Hodges and Fix introduced the K-NN algorithm as a non-parametric technique for regression and pattern classification [33]. The algorithm learns from historical data, and a new data point is put in the class which has the most data points among its closest neighbours.

[Fig. 5 K-Nearest Neighbor: a query point (black square) and its k = 3 nearest neighbours among two classes of samples (dark red and blue circles).]

2) Technical description
The K-NN algorithm computes the gap between a novel instance and its neighbours among the training examples. Based on the K values, the majority voting technique is applied to assign the new instance to a target label. The mathematical form of the algorithm for a query point Ǭi, whose class is not yet categorized, is given by the Euclidean distance method [34]:

Dist(Ǭi, ρi) = √( Σ_{i=1}^{n} (Ǭi − ρi)² )    (7)

where ρi = (ρ1, ρ2, ρ3, ..., ρn) are points in the d-dimensional space. The distance between the query point Ǭi and every point of the training dataset is calculated. The classification decision for the unknown instance Ǭi is made by sorting the training points by distance, taking the k nearest data points with minimum distances, and using the simple majority of their classifications as the predicted value of the query example.
a) Euclidean distance
The K-NN algorithm finds the K nearest neighbours of the black square among the dark red and blue circles, which are examples of the two classes shown in Fig. 5. In the figure, we take k = 3 nearest neighbours among the blue and dark red circles using the Euclidean distance formula. The predicted sample is assigned to the class which has the most of those nearest samples: in Fig. 5, the dark red class has two and the blue class has one nearest neighbour of the black square, so the black square is assigned to the dark red class.
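
Equation (7) and the majority vote can be verified directly. The sketch below computes the Euclidean distances by hand for k = 3 and checks the decision against scikit-learn's KNeighborsClassifier, using synthetic data in place of the CKD features.

```python
# K-NN with k = 3: sort Euclidean distances, take the majority label
# of the three nearest training points.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=400, n_features=24, random_state=0)
query = X[:1]  # a query point whose class we pretend is unknown

# Dist(q, p) = sqrt(sum_i (q_i - p_i)^2), computed against every point:
dist = np.sqrt(((X - query) ** 2).sum(axis=1))
nearest = np.argsort(dist)[:3]            # the k = 3 minimum distances
print(np.bincount(y[nearest]).argmax())   # majority vote of the neighbours

knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean").fit(X, y)
print(knn.predict(query))                 # the same decision
```
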
IV. THE EXPERIMENTAL STUDIES

In this part of the paper, we explain the experiments conducted in this study. The statistical analysis and a comprehensive explanation of the dataset are carried out in this section.
A. The dataset specification
The experimentation of this study was executed on the CKD dataset obtained from the University of California, Irvine (UCI) ML repository [35]. The research scholar L. Jerline Rubini proposed this CKD dataset along with three algorithms: logistic regression, radial basis function network, and multilayer perceptron [36]. The dataset comprises 400 records, collected over a period of two months, consisting of 24 features and one target label for binary classification. It contains 13 nominal and 11 numerical features. The target label works as a binary classification: the patient either has CKD or does not. 250 instances (around 63%) are assigned to the CKD class and 150 instances (around 37%) to the notCKD class. In this study we used the most relevant features and condensed the dimension of the dataset, to reduce the required computational power and obtain maximum accuracy. The statistical analysis and a detailed description of the dataset are given in study [37].
B. Experimental arrangement
The experiment was conducted to evaluate the five ML techniques, which were executed in the MATLAB R2017b environment using the CKD dataset. In this study, 7-fold and 10-fold cross-validation techniques were applied to partition the data into training and testing examples. Cross-validation, also called rotation estimation [38], processes the available data in several iterations for a single training model. In 10-fold cross-validation, the available data samples are divided into 10 equal subsets. In the 1st iteration, the 1st subset is assigned to the test examples and the remaining nine subsets are the training examples. In subsequent iterations, subsets already tested are not reassigned for testing, so every example is considered in testing exactly once. The accuracy of all the classifiers was analysed from the confusion matrix. The accuracy results in TABLE IV show that 10-fold cross-validation gives higher accuracy than the 7-fold cross-validation results shown in TABLE I.
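
The rotation described above corresponds to the following loop: a sketch with scikit-learn's StratifiedKFold on synthetic data (the paper performed the equivalent 90%/10% split in MATLAB R2017b).

```python
# 10-fold cross-validation: each fold is tested exactly once, and the
# remaining nine folds (90% of the data) train the model in that iteration.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=400, n_features=24, random_state=0)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

for i, (train_idx, test_idx) in enumerate(cv.split(X, y), start=1):
    clf = RandomForestClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    print(f"fold {i}: {len(train_idx)} train / {len(test_idx)} test, "
          f"acc = {clf.score(X[test_idx], y[test_idx]):.3f}")
```
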
V. RESULTS AND DISCUSSION

In this study we utilized five machine learning (ML) techniques on the CKD dataset obtained from the UCI repository and compared all the results, as shown in TABLE IV.

In this work, each classifier gives different results. RF has the maximum classification accuracy of 99.75% and is considered the best classification algorithm compared to DA, NB, SVM, and KNN. In TABLE III the results of this study are also compared to previous models, showing the maximum performance relative to prior work. The results of NB and DA are almost the same, with a minor difference: NB and DA correctly classified 392 and 393 instances out of 400, respectively. The results are measured in the form of precision, recall, accuracy, and F1-score, and all results are obtained from the confusion matrix. The statistically measured results and correct predictions of each classifier from the confusion matrix are given in detail in TABLE II.

TABLE I. Different techniques using 7-fold cross-validation

S.No | Models | Precision | Recall | F1-score | Accuracy %
1    | K-NN   | 0.864     | 0.995  | 0.925    | 91.24
2    | SVM    | 0.968     | 0.983  | 0.975    | 96.99
3    | NB     | 0.972     | 0.956  | 0.963    | 95.50
4    | DA     | 0.975     | 0.994  | 0.979    | 98.24
5    | RF     | 1.00      | 0.995  | 0.997    | 99.74

TABLE II. Confusion matrix values from 10-fold cross-validation (following the paper's labelling: TP = actual CKD predicted CKD, FP = actual CKD predicted notCKD, FN = actual notCKD predicted CKD, TN = actual notCKD predicted notCKD)

S.No | Models | TP  | FP | FN | TN
1    | KNN    | 218 | 32 | 0  | 150
2    | NB     | 247 | 3  | 5  | 145
3    | DA     | 244 | 6  | 1  | 149
4    | SVM    | 244 | 6  | 6  | 144
5    | RF     | 250 | 0  | 1  | 149
[Fig. 6 Statistical measures column plot of the models. Underlying data:]

Metric      | K-NN  | SVM  | NB       | DA      | RF
Precision % | 87.2  | 97.6 | 98.8     | 97.6    | 100
Recall %    | 100   | 97.6 | 98.01    | 99.59   | 99.6
F1-score %  | 93.16 | 97.6 | 98.40638 | 98.5858 | 99.7996
Accuracy %  | 92    | 97   | 98       | 98.25   | 99.75

TABLE III. Random Forest comparison with previous techniques

S.No | Models    | Precision | Recall | F1-score | Accuracy %
1    | [2]       | 1.00      | 0.917  | 0.974    | 97.6
2    | [4]       | 0.977     | 0.977  | 0.977    | 98.07
3    | [7]       | 0.904     | 0.844  | 0.872    | 97.01
4    | [9]       | 0.99      | 0.99   | 0.99     | 99
5    | [10]      | 1.00      | 0.925  | 0.976    | 97
6    | Our Model | 1.00      | 0.996  | 0.998    | 99.75

A. Accuracy
As defined in ISO 5725-1 [39], accuracy is the closeness of a measurement result to the true value.

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (8)

Here TP is the true positives, TN the true negatives, FP the false positives, and FN the false negatives.
B. Precision
Precision is the degree to which repeated measurements give equivalent results under unchanged conditions [40].

Precision = TP / (TP + FP)    (9)

C. Recall
Recall represents the fraction of the total number of actual positive cases that are accurately predicted positive [41].

Recall = TP / (TP + FN)    (10)

D. F1-score

F1-score = 2 · Precision · Recall / (Precision + Recall)    (11)

The F1-score, also called the F1-measure, summarizes the accuracy of the model on the test data.
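
Equations (8)-(11) can be checked against the confusion-matrix counts reported above: the short script below reproduces the RF row of TABLE IV from the RF entries of TABLE II, using the paper's TP/FP/FN/TN labelling.

```python
# Metrics from raw confusion-matrix counts (RF row of TABLE II).
TP, FP, FN, TN = 250, 0, 1, 149

accuracy = (TP + TN) / (TP + TN + FP + FN)            # eq. (8)  -> 0.9975
precision = TP / (TP + FP)                            # eq. (9)  -> 1.0000
recall = TP / (TP + FN)                               # eq. (10) -> 0.9960
f1 = 2 * precision * recall / (precision + recall)    # eq. (11) -> 0.9980

print(f"accuracy={accuracy:.4f}  precision={precision:.4f}  "
      f"recall={recall:.4f}  f1={f1:.4f}")  # matches TABLE IV for RF
```
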

TABLE IV. Results of the different methods using the same dataset (CKD) in MATLAB

S.No | Classifier                   | Correctly classified (of 400) | Precision (wtd avg) | Recall (wtd avg) | F1-score (wtd avg) | Time to build model | Accuracy %
1    | K-Nearest Neighbor (K-NN)    | 368 | 0.8720 | 1.0000 | 0.9316 | 1.005 s | 92
2    | Support Vector Machine (SVM) | 388 | 0.9760 | 0.9760 | 0.9760 | 0.977 s | 97
3    | Naïve Bayes (NB)             | 392 | 0.9880 | 0.9801 | 0.9840 | 0.523 s | 98
4    | Discriminant Analysis (DA)   | 393 | 0.9760 | 0.9959 | 0.9859 | 0.471 s | 98.25
5    | Random Forest (RF)           | 399 | 1.0000 | 0.9960 | 0.9980 | 5.611 s | 99.75

VI. CONCLUSIONS

Numerous techniques have been used to predict CKD with different datasets and input variables. In this study we used the CKD dataset obtained from the UCI repository, with RF, DA, NB, SVM, and K-NN as the familiar ML techniques, each with its own advantages in different ML areas. RF, an ensembling technique for binary classification, is proposed in this paper as the prediction model to accurately predict CKD; the RF model correctly classified approximately all instances of the dataset. The dataset was pre-processed using the above techniques, and 10-fold cross-validation was used to split the data into 90% training and 10% testing. The evaluation metrics described above were used to report the results of the ML techniques. The results prove that RF performs best, with a maximum accuracy of 99.75% compared to the other classifiers. In future, this method can be applied to other diseases such as COVID-19.

REFERENCES
[1] V. Jha, G. Garcia-Garcia, K. Iseki, Z. Li, S. Naicker, B. Plattner, et al., "Chronic kidney disease: global dimension and perspectives," The Lancet, vol. 382, pp. 260-272, 2013.
[2] A. Ogunleye and Q.-G. Wang, "Enhanced XGBoost-based automatic diagnosis system for chronic kidney disease," in 2018 IEEE 14th International Conference on Control and Automation (ICCA), 2018, pp. 805-810.
[3] F. Ma, T. Sun, L. Liu, and H. Jing, "Detection and diagnosis of chronic kidney disease using deep learning-based heterogeneous modified artificial neural network," Future Generation Computer Systems, 2020.
[4] M. Abdar, M. Zomorodi-Moghadam, X. Zhou, R. Gururajan, X. Tao, P. D. Barua, et al., "A new nested ensemble technique for automated diagnosis of breast cancer," Pattern Recognition Letters, vol. 132, pp. 123-131, 2020.
[5] M. Koklu and K. Tutuncu, "Classification of chronic kidney disease with most known data mining methods," Int. J. Adv. Sci. Eng. Technol., vol. 5, pp. 14-18, 2017.
[6] S. L. Oh, Y. Hagiwara, U. Raghavendra, R. Yuvaraj, N. Arunkumar, M. Murugappan, et al., "A deep learning approach for Parkinson's disease diagnosis from EEG signals," Neural Computing and Applications, pp. 1-7, 2018.
[7] H. Zhang, C.-L. Hung, W. C.-C. Chu, P.-F. Chiu, and C. Y. Tang, "Chronic kidney disease survival prediction with artificial neural networks," in 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2018, pp. 1351-1356.
[8] C. B. C. Latha and S. C. Jeeva, "Improving the accuracy of prediction of heart disease risk based on ensemble classification techniques," Informatics in Medicine Unlocked, vol. 16, p. 100203, 2019.
[9] M. Arora and E. A. Sharma, "Chronic kidney disease detection by analyzing medical datasets in Weka," International Journal of Computer Application, vol. 6, pp. 20-26, 2016.
[10] H. Kriplani, B. Patel, and S. Roy, "Prediction of chronic kidney diseases using deep artificial neural network technique," in Computer Aided Intervention and Diagnostics in Clinical and Medical Images, Springer, 2019, pp. 179-187.
[11] Q. Zou, K. Qu, Y. Luo, D. Yin, Y. Ju, and H. Tang, "Predicting diabetes mellitus with machine learning techniques," Frontiers in Genetics, vol. 9, p. 515, 2018.
[12] M. J. Kadatz, E. S. Lee, and A. Levin, "Predicting progression in CKD: perspectives and precautions," American Journal of Kidney Diseases, vol. 67, pp. 779-786, 2016.
[13] A. Yadollahpour, "Applications of expert systems in management of chronic kidney disease: a review of predicting techniques," Oriental Journal of Computer Science and Technology, vol. 7, pp. 306-315, 2014.
[14] J.-J. Yang, J. Li, J. Mulder, Y. Wang, S. Chen, H. Wu, et al., "Emerging information technologies for enhanced healthcare," Computers in Industry, vol. 69, pp. 3-11, 2015.
[15] M. Kantardzic, Data Mining: Concepts, Models, Methods, and Algorithms. John Wiley & Sons, 2011.
[16] Z. Wu and C.-h. Li, "L0-constrained regression for data mining," in Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2007, pp. 981-988.
[17] P. Berkhin, "A survey of clustering data mining techniques," in Grouping Multidimensional Data, Springer, 2006, pp. 25-71.
[18] T. K. Ho, "Random decision forests," in Proceedings of the 3rd International Conference on Document Analysis and Recognition, 1995, pp. 278-282.
[19] A. Farrell, G. Wang, S. A. Rush, J. A. Martin, J. L. Belant, A. B. Butler, et al., "Machine learning of large-scale spatial distributions of wild turkeys with high-dimensional environmental data," Ecology and Evolution, vol. 9, pp. 5938-5949, 2019.
[20] B. Efron, "Bootstrap methods: another look at the jackknife," in Breakthroughs in Statistics, Springer, 1992, pp. 569-593.
[21] L. Rokach and O. Z. Maimon, Data Mining with Decision Trees: Theory and Applications, vol. 69. World Scientific, 2008.
[22] T. Hothorn and B. Lausen, "Double-bagging: combining classifiers by bootstrap aggregation," Pattern Recognition, vol. 36, pp. 1303-1309, 2003.
[23] W. R. Klecka, G. R. Iversen, and W. R. Klecka, Discriminant Analysis, vol. 19. Sage, 1980.
[24] S. B. Green and N. J. Salkind, Using SPSS for Windows and Macintosh: Analyzing and Understanding Data. Upper Saddle River, NJ: Pearson, 2008.
[25] T. M. Mitchell, Machine Learning. New York: McGraw-Hill, 1997.
[26] C. Piech, "Logistic Regression," 2017.
[27] R. G. Brereton and G. R. Lloyd, "Support vector machines for classification and regression," Analyst, vol. 135, pp. 230-267, 2010.
[28] T. S. Furey, N. Cristianini, N. Duffy, D. W. Bednarski, M. Schummer, and D. Haussler, "Support vector machine classification and validation of cancer tissue samples using microarray expression data," Bioinformatics, vol. 16, pp. 906-914, 2000.

[29] L.-P. Ni, Z.-W. Ni, and Y.-Z. Gao, "Stock trend prediction based on fractal feature selection and support vector machine," Expert Systems with Applications, vol. 38, pp. 5569-5576, 2011.
[30] W. S. Noble, "What is a support vector machine?," Nature Biotechnology, vol. 24, pp. 1565-1567, 2006.
[31] C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, pp. 273-297, 1995.
[32] A. Patle and D. S. Chouhan, "SVM kernel functions for classification," in 2013 International Conference on Advances in Technology and Engineering (ICATE), 2013, pp. 1-9.
[33] N. S. Altman, "An introduction to kernel and nearest-neighbor nonparametric regression," The American Statistician, vol. 46, pp. 175-185, 1992.
[34] Q. Kuang and L. Zhao, "A practical GPU based kNN algorithm," in Proceedings of the 2009 International Symposium on Computer Science and Computational Technology (ISCSCT 2009), 2009, p. 151.
[35] D. Dua and C. Graff, "UCI machine learning repository," School of Information and Computer Science, University of California, Irvine, CA, 2019.
[36] L. J. Rubini and P. Eswaran, "Generating comparative analysis of early stage prediction of chronic kidney disease," International Journal of Modern Engineering Research (IJMER), vol. 5, pp. 49-55, 2015.
[37] N. A. Almansour, H. F. Syed, N. R. Khayat, R. K. Altheeb, R. E. Juri, J. Alhiyafi, et al., "Neural network and support vector machine for the prediction of chronic kidney disease: a comparative study," Computers in Biology and Medicine, vol. 109, pp. 101-111, 2019.
[38] S. Geisser, Predictive Inference, vol. 55. CRC Press, 1993.
[39] ISO 5725-1:1994, Accuracy (trueness and precision) of measurement methods and results, Part 1: General principles and definitions. International Organization for Standardization, Geneva, 1994.
[40] Joint Committee for Guides in Metrology (JCGM), "JCGM 200:2008. International vocabulary of metrology: basic and general concepts and associated terms (VIM)," Working Group 2, France, 2008.
[41] D. M. Powers, "Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation," 2011.
