
2021 7th International Conference on Advanced Computing and Communication Systems (ICACCS)
978-1-6654-0521-8/21/$31.00 ©2021 IEEE | DOI: 10.1109/ICACCS51430.2021.9441935

Diabetes Prediction using Machine Learning Algorithms with Feature Selection and Dimensionality Reduction

Sivaranjani S, Ananya S, Aravinth J, Karthika R
Department of Electronics and Communication Engineering
Amrita School of Engineering, Coimbatore
Amrita Vishwa Vidyapeetham, India
siva01.11.1999@gmail.com, ananyasankar7@gmail.com, j_aravinth@cb.amrita.edu, r_karthika@cb.amrita.edu

Abstract— In today's world, diabetes has become one of the most life-threatening and at the same time most common diseases, not only in India but around the world. Diabetes is seen in all age groups these days and is attributed to lifestyle, genetic, stress and age factors. Whatever the reasons for diabetes, the outcome can be severe if left unnoticed. Currently, various methods are being used to predict diabetes and diabetes-inflicted diseases. In the proposed work, we have used the machine learning algorithms Support Vector Machine (SVM) and Random Forest (RF) to help identify the potential chances of being affected by diabetes-related diseases. After pre-processing the data, the features which influence the prediction are selected by implementing step forward and step backward feature selection. The Principal Component Analysis (PCA) dimensionality reduction method is analyzed after the selection of specific features, and the accuracy of the prediction is 83% implementing Random Forest (RF), which is significant in comparison with Support Vector Machine (SVM) with an accuracy of 81.4%.

Keywords—PIMA dataset, pre-processing, classifier, Random Forest (RF), Support Vector Machine (SVM), feature selection, dimensionality reduction.

I. INTRODUCTION

Diabetes is classified into two types, type 1 and type 2. The primary difference between the two is that people affected by type 1 diabetes do not produce insulin, while people affected by type 2 diabetes do not respond properly to insulin and, at a later stage, also do not make enough of it. The major symptoms of those affected by either type 1 or type 2 diabetes are frequent urination, feeling hungry and thirsty frequently, feeling fatigued and having blurry vision.

The global count of diabetic patients rises year after year. There are 382 million affected people worldwide [1]. More than 75% of the patients are type 2, and the T2D numbers are soaring every year, which is a real cause of concern to the WHO. Great efforts are being made by various organizations to stop the surge through awareness programs, healthcare systems, innovative diagnosing methods, improved medicines, etc. The idea of this proposed work is to improve the accuracy of detecting the potential long-term damage that a diabetic patient could encounter if the blood sugar is not kept under control.

There are several existing machine learning algorithms proposed to predict the onset of diabetes. A decision tree basically follows a tree-like structure in which each node denotes a test on a feature. Every single leaf node denotes a class label, and branches are conjunctions of features that lead to class labels. It can be used for both numerical and categorical data [2]. Naïve Bayes is another machine learning algorithm, which gives a prediction on the basis of the probability of a specific object; it assumes that a specific feature is unrelated to any other feature. The illness meter is predicted correctly [3]. KNN is also known as a lazy learning algorithm. It classifies based on a distance parameter; the distance measure can be Manhattan, Euclidean or various other calculations [4].

The collected data are processed to classify them into categories. The objective of the classification problem is to collect the structured and random data and classify them into a preferred order using the Random Forest and Support Vector Machine algorithms. Random Forest is a model that works on multiple decision trees. It does not work on simple averaging principles but on two key concepts: random sampling of data and random subsets of features. This enhances the accuracy of the predictions. Support Vector Machine (SVM) is a model for classification, regression and outlier detection. SVM is distinct from other classifiers in that it works on margin maximization principles.

In our proposed work we discuss the two major classifiers, RF and SVM. The classification model is implemented and the results are examined. Feature selection and dimensionality reduction are experimented with, the results are analyzed and the efficient methods are concluded.
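As a small illustration of the KNN distance measures mentioned above, the Euclidean and Manhattan distances between two feature vectors can be computed as follows (a minimal sketch; the function names and sample values are ours, not from the paper):

```python
# Minimal sketch of the two distance measures mentioned for KNN.
# The helper names and example vectors are illustrative only.

def euclidean(a, b):
    # square root of the sum of squared coordinate differences
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def manhattan(a, b):
    # sum of absolute coordinate differences
    return sum(abs(x - y) for x, y in zip(a, b))

patient_a = [148.0, 72.0, 33.6]   # e.g. glucose, blood pressure, BMI
patient_b = [85.0, 66.0, 26.6]

print(euclidean(patient_a, patient_b))
print(manhattan(patient_a, patient_b))
```

A KNN classifier would rank training samples by one of these distances and vote among the k nearest.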

This paper comprises four further sections: Section II deals with Related Work, which discusses the existing works; Section III describes the System Design, which covers the proposed workflow and methodologies; Section IV presents the Results and Discussion, which compares different techniques with their metric results; and Section V concludes.

II. RELATED WORK

Some values in a dataset can be missing. Pre-processing of missing values can be done by taking the mean of the existing values; this method is called "imputation". Normalization and other pre-processing steps improve the efficiency of the model. Principal Component Analysis (PCA) decreases the dimensionality of the features [5]. Various other pre-processing techniques, like normalization, scaling and combinations thereof, have been analyzed. Data pre-processing is very important in data mining and analysis. Eliminating missing values and transforming continuous data to finite values are two methods of pre-processing the diabetes dataset [6]; they improve timing efficiency.

More than 60% of Indians suffer from chronic diseases. The prediction of human diseases like diabetes, cancer, asthma and high blood pressure is done using various machine learning techniques [7]. The data presented is the complete source of information for the prediction of diseases, and the contribution of symptoms is of most significance. The prediction of diabetes types I and II is done using symptoms like weight loss, blurred vision and fatigue.

Decision Tree (DT), SVM, RF, K-Nearest Neighbor (KNN) and Naïve Bayes are the classification models used in [8], where the diabetes status and the surrounding statistics were explained. The tree structure of the DT algorithm explains the flow of the classification; a decision node can have two or more branches, and the best attribute is chosen based on the purity of the child nodes, i.e. their degree. Naïve Bayes is a probabilistic classification algorithm which depends on Bayes' theorem. SVM creates a hyperplane between the data points and classifies accordingly. RF is both a classification and a regression model; it uses the bagging method to draw random samples. KNN predicts the category based on similarity, i.e. a degree-of-distance measure. A modified approach is taken to select the most appropriate features, and the performance is evaluated using the metrics [9] [10].

Various classification algorithms have been used and implemented for predicting diabetes mellitus. Classification is done in [11] using three classifiers, namely Naïve Bayes, SVM and Decision Tree (DT), evaluated on various measures; Naïve Bayes was found to be the most efficient of the three, with an accuracy of 76.30%. The paper [12] classifies using K-Nearest Neighbor (KNN) and Logistic Regression (LR) classifiers. Gradient Boosting feature selection is carried out and results in an increase of accuracy; the gradient boost scores are analyzed across varying feature numbers.

Decision Tree, Random Forest and Neural Network classification algorithms are used in [13] for the model prediction, and the performance metrics sensitivity, specificity and accuracy were analyzed. Feature selection (Principal Component Analysis) was carried out and the results were verified using all features and using a single feature like glucose or skin thickness. The work [14] compares various machine learning classification algorithms, namely SVM, RF, LR, DT and KNN. The data set was pre-processed and classified and the accuracy was analyzed; the results proved RF to be the most efficient, with an accuracy of 75%.

The data set used in [15] is obtained from CPCSSN. Gradient Boosting Machine (GBM), LR, RF and Rpart are the algorithms used in the classification, maximizing the area under the Receiver Operating Characteristic (ROC) curve using the optimal hyperparameters. A 10-fold cross-validation is done to analyze the trend in the prediction of unseen data. Random Forest proves to be significant.

A Support Vector Machine (SVM) classifier is used to check healthcare parameters like heart rate and blood pressure, serving as a multi-parameter detector [16]; the classifier helps in health monitoring. An SVM classification algorithm is modified and used in software-defined radio [17]. SVM predicts the severity of leukaemia cancer by selecting the most influencing features [18].

The concept of the Random Forest classifier algorithm and of feature selection for the model's better performance is detailed in [19], where the improvement of the detection performance is implemented.

In the proposed work of classifying the inputs, the Random Forest (RF) classification algorithm and the Support Vector Machine (SVM) algorithm are used for diabetes prediction and classification [20]. Further, feature selection and dimensionality reduction are processed, and classification is done at each step to analyze the performance of the model after each modification. The performance of each model is analyzed using metrics like accuracy, sensitivity and specificity.

III. SYSTEM DESIGN

In this section, we first introduce the diabetes data set and its features and use them as the input of the Machine Learning (ML) model, and present the developed framework with model architecture and implementation details. The block diagram of the proposed model is given in Fig. 1.

The data set used in this experiment is the PIMA Indian diabetes dataset [21]. The data pre-processing is done such that the missing values are replaced by the non-zero mean values of each parameter. Then the data set is classified using the Random Forest (RF) and Support Vector Machine (SVM) classifiers. Feature selection is done to extract the most influencing parameters

and classification is done to check the improvement in classification. Further, dimensionality reduction is carried out to minimize the dimensional space. Finally, the data is classified using the above-mentioned classifiers.

Fig. 1. Block Diagram of the proposed model

A. Data Pre-processing
The dataset has missing values in certain attributes, which can cause inaccurate results in the prediction and reduce the accuracy of the model. In order to overcome this, the missing values of each attribute are treated with an efficient mean value: each '0' value in a column is replaced by the non-zero mean of that column, except for pregnancies, as a patient may have no pregnancy history.

B. Feature selection
The main use of feature selection in machine learning is that only the most important attributes, or the best set of parameters, are extracted, to maximize the performance of the model. The approach followed here is the wrapper method, which follows a greedy search: it evaluates the possible combinations of features against the given evaluation criterion. Here, forward feature selection and backward feature elimination are used.

Forward feature selection is an iterative selection method. It starts with the variable that best explains the target; then the next best performing variable is selected in combination with the previously selected one(s). This is repeated until a pre-set criterion is met. The pseudo code is as follows:

Step forward feature selection(n, p_values_of_n_features):
    fix a significance level (say SL = 0.05)
    start with an empty set of selected features
    repeat:
        for each remaining feature i, fit the model with the
            selected features plus feature i
        pick the feature with the lowest p-value
        if its p-value < SL, add it to the selected set
    until no remaining feature has p-value < SL

Backward feature elimination is exactly the opposite of the previous method. In this method, all the variables are selected first and the model is built; then the worst-performing variable is removed and the model is refitted. This is repeated until the criterion is met. The pseudo code is as follows:

Step backward feature elimination(n, p_values_of_n_features):
    fix a significance level (say SL = 0.05)
    fit the model with all n features
    repeat:
        find the feature with the highest p-value
        if its p-value > SL, remove it and refit the model with
            the remaining independent variables
    until every remaining feature has p-value <= SL

C. Dimensionality reduction
Dimensionality reduction means reducing the number of input parameters or input variables of the given dataset. Generally, more input parameters make the predictive modelling task more difficult; dimensionality reduction therefore reduces the number of input variables to increase efficiency. Principal component analysis is the proposed method of dimensionality reduction. It reduces dimensionality while minimizing the information loss and maximizing the interpretability, creating uncorrelated variables that capture the variance effectively. The PCA method, using the covariance method, transforms a matrix X into a matrix of another dimension Y, where Y is the Karhunen-Loeve transform (KLT) of X:

    Y = KLT{X}                                  (1)

The empirical mean of the data matrix is given by:

    u_j = (1/n) * sum_{i=1..n} X_ij             (2)

where u_j is the vector of empirical means, one mean for each column j of the data matrix X, whose n rows are the data vectors.
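As a quick numeric check of (2) (a toy example of ours, not from the paper), the column-wise empirical means of a small 3x2 data matrix can be computed directly:

```python
# Toy check of the empirical mean in (2): u_j is the mean of column j,
# averaging over the n row vectors of the data matrix X.
X = [[1.0, 10.0],
     [2.0, 20.0],
     [3.0, 30.0]]
n = len(X)

u = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
print(u)  # → [2.0, 20.0]
```

In PCA, this mean vector is subtracted from every row before the covariance matrix is formed.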

The eigenvectors and eigenvalues of the covariance matrix are given by the equation:

    V^(-1) C V = D                              (3)

where D is the diagonal matrix of eigenvalues of C, V is the matrix containing the set of all eigenvectors of C, and C is the covariance matrix.

D. Classification
After processing the data using the feature selection and dimensionality reduction methods, the model is classified and the results are analyzed. The classifier maps the input parameters to the output and draws conclusions based on the parameters and their influence in predicting the output. As the output states whether a person is healthy or not, binary classification is implemented. In this work, Random Forest (RF) and Support Vector Machine (SVM) classifiers are implemented for predicting the onset of diabetes based on the input parameters.

IV. RESULTS AND DISCUSSION

This section discusses the results obtained by implementing the proposed model and analyzing its efficiency. It is divided into four parts: A. PIMA dataset and pre-processing: understanding the dataset and pre-processing it to increase the efficiency; B. Performance of feature selection: step forward and step backward feature selection are implemented and analyzed; C. Performance of dimensionality reduction: the Principal Component Analysis (PCA) method is implemented to reduce the dimension and the results are analyzed; D. Overall performance.

A. PIMA dataset and pre-processing
The Pima Indian Diabetes Dataset, with a class variable of either '0' or '1' (binary class), is originally from the Research Centre of the National Institute of Diabetes and Digestive and Kidney Diseases, RMI Group Leader, Applied Physics Laboratory, The Johns Hopkins University. This dataset is used for the analysis of diabetes with various techniques and algorithms. The dataset contains only female patients, at least 21 years old, of Pima Indian heritage. As the glucose level may vary during pregnancies, it is normally one of the attributes. The total number of subjects (patients) in this data set is 768 and we use it completely for the experiment. There are in total 8 independent attributes or parameters and a single class variable which says whether a patient is healthy or diabetic. Each parameter is a numerical value with the following units: number of pregnancies (a count), plasma glucose concentration (2 hours in an oral glucose tolerance test), which is dimensionless, diastolic blood pressure (mm Hg), triceps skinfold thickness (mm), 2-hour serum insulin (mu U/ml), Body Mass Index (BMI, kg/m2) and age in years. These attributes are the independent parameters, and the outcome is the dependent parameter, which predicts whether a person is healthy or not.

Data pre-processing, as discussed, plays an important role in dealing with the missing values, and analysis of the metric results shows that the efficiency of the model increases after data pre-processing. The missing values are replaced with the mean value of the non-zero samples, with the data type of each parameter taken care of during calculation. The mean values of glucose, blood pressure, skin thickness, insulin, BMI, diabetes pedigree function and age are 122, 72, 29, 155, 32.46, 0.47 and 33 respectively.

The data set is classified after data pre-processing, and the results of each classifier are validated using a 5-fold cross-validation technique. Step forward feature selection and step backward feature elimination are done to compare the performance of the two approaches. Exhaustive feature selection is another approach, but as it is suitable for large amounts of sample data, it is not evaluated in our work since the data set has only 768 samples. Dimensionality reduction is implemented after either one of the feature selection approaches and the significant changes are noted. The performance tables of each method are given in Table I and Table II.

The data pre-processing implementation is the first step and is common to both the step forward and step backward wrapper approaches. The data set is split in a 7:3 ratio for the training and test sets respectively, i.e. 70% of the data set is randomly chosen as the training sample and the remaining 30% is the testing sample. This split was decided after trying various combinations, which proved it to be efficient. The test set is further divided randomly and equally, and validation is carried out on the untrained data. The performance metrics are the accuracy of the training set, the test accuracy and the validation accuracy; the validation accuracy is obtained after 5-fold cross-validation of 50% of the test set.

The sensitivity of a model is called the true positive rate (TPR) and is the proportion of samples that are actually positive and predicted as positive. The specificity of a model is referred to as the true negative rate (TNR), i.e. the proportion of samples that are predicted as negative and are also actually negative.

The training and testing accuracy of the Random Forest classifier are higher than those of the SVM classifier after data pre-processing, and the validation accuracies of the two classifiers are more or less equal. The sensitivity of the classifiers is 0.8, which shows that the true positive rate is around 80%; the specificity of the RF algorithm is 88% and of SVM 78%, showing that the true negative rate is significantly better for the RF classifier, which conveys its degree of correctness.

TABLE I. PERFORMANCE OF FEATURE SELECTION

Classifier             | Feature Selection | Test Accuracy (%) | Validation Accuracy (%) | Sensitivity | Specificity
Random Forest          | Step Forward      | 77.23             | 82.89                   | 0.83        | 0.82
Random Forest          | Step Backward     | 77.61             | 82.9                    | 0.83        | 0.82
Support Vector Machine | Step Forward      | 75.37             | 79.92                   | 0.817       | 0.75
Support Vector Machine | Step Backward     | 74.62             | 81.41                   | 0.82        | 0.79
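The TPR and TNR definitions above can be written as a small helper function; the following sketch (with made-up label/prediction vectors, not the paper's results) computes both from a confusion count:

```python
# Sensitivity (TPR) and specificity (TNR) from true labels and predictions,
# following the definitions in Section IV.A. The data here is made up.
def sensitivity_specificity(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)  # (TPR, TNR)

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]   # 1 = diabetic, 0 = healthy
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)  # → 0.8 0.8
```

Accuracy alone can hide a poor TNR on an imbalanced dataset like PIMA, which is why the tables report sensitivity and specificity separately.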

B. Performance of feature selection
After data pre-processing, the model selects the features that increase the efficiency of the overall performance with two methods, namely step forward and step backward feature selection. Classification is carried out using RF and SVM after each of these methods; the performance of this implementation is discussed in this section, with the results in Table I. The best performing parameters of step forward feature selection are obtained sequentially using the efficiency score and, per the given criterion, the four best contributing parameters are chosen: Number of Pregnancies, Glucose, BMI and Age. In the RF classifier, the train set and test set accuracy decrease slightly, by 1%, but the feature selection method helps in reducing the complexity of the model, and the number of contributing features or parameters is precisely reduced. The validation accuracy is unchanged, which shows that the prediction works as efficiently as before feature selection. The true positive rate increases by a subtle amount of 0.02 and the true negative rate decreases by 0.06; but predicting positive for a negative may make the patient more cautious, so it can be tolerated. In the SVM classifier, the train set, test set and validation accuracies decrease by a small amount of 0.8% after the feature selection, but the sensitivity and specificity of the model stay at the same rate as before feature selection. It follows that the true positives and negatives are predicted appropriately, given the reduction in the complexity of the model by reducing the features. In step backward feature elimination, the parameters are selected by comparing their measured values, eliminating the lowest performing feature until the criterion of reaching four features is met. The four parameters selected by this approach are Glucose, BMI, Diabetes Pedigree Function and Age. In the RF classifier, the overall performance is significantly more efficient compared to the step forward approach. The specificity in SVM remains the same compared to the other method. The changing trend in the different accuracies of the model remains the same as in step forward selection, but the efficiency proves to be higher.

TABLE II. PERFORMANCE OF DIMENSIONALITY REDUCTION

Classifier             | Dimensionality Reduction | Test Accuracy (%) | Validation Accuracy (%) | Sensitivity | Specificity
Random Forest          | After Step Forward       | 80.97             | 77.69                   | 0.81        | 0.67
Random Forest          | After Step Backward      | 77.98             | 79.92                   | 0.85        | 0.69
Support Vector Machine | After Step Forward       | 76.5              | 73.6                    | 0.817       | 0.75
Support Vector Machine | After Step Backward      | 76.5              | 74.35                   | 0.784       | 0.6

C. Performance of dimensionality reduction
Dimensionality reduction by the PCA method is implemented after each of the two feature selection methods individually. Dimensionality reduction decreases the dimensional space in computation and helps in classifying efficiently. Classification is done after the reduction method using the two classifiers; the results are in Table II. In the RF classifier, the train set accuracy remains the same and the test set accuracy increases by 3%, but the validation accuracy reduces by 2%, as the sample size is a key factor in this approach. The specificity, i.e. the true negative rate, decreases drastically, by a value of 0.12, but remains of minor concern as mentioned above. In the SVM classifier, after dimensionality reduction, the accuracy and the true prediction rates decline further, by 2-3% and 0.07 respectively; SVM thus proves inferior to RF with step forward feature selection in the given analysis. In step backward feature elimination, after dimensionality reduction, the accuracies decrease and the sensitivity and specificity show the same variations as in the other approach. Feature selection/elimination chooses the best attributes and reduces the complexity; dimensionality reduction is not significantly effective with the given data set, as the sample size is moderate.

TABLE III. COMPARISON WITH EXISTING METHODS

Algorithm                                              | Accuracy (%)
Support Vector Machine (SVM)                           | 77.73
Naive Bayes                                            | 76.30
Logistic Regression (LR)                               | 77.64
Random Forest (RF) without pre-processing              | 74.44
Random Forest (RF) with pre-processing (Proposed Work) | 83

D. Overall performance
Existing machine learning models and their comparison with the proposed work are given in Table III. Implementation of SVM and other algorithms results in a best accuracy of 77.73% in [8]. Similarly, Naïve Bayes has an accuracy of 76.30% in [11]; LR, when compared with other classifiers like KNN, DT and RF, is efficient with 77.64% in [12]; and among SVM, DT, RF, LR and KNN, RF was better than the others with 74.44% accuracy. In our proposed work, the model selects the four most contributing features, and the accuracy after step forward feature selection is 83% for the RF classifier and 80% for the SVM classifier. The dimensionality reduction increases the test set accuracy of the RF and SVM classifiers. The accuracies after step backward feature selection for the RF and SVM classifiers are 83% and 81.4%. After dimensionality reduction, the accuracy increases only in the test set of the SVM model. Feature selection proves to be significant and reduces the complexity of the model; dimensionality reduction is not of high significance, as the dataset is not very large.

References
[1] Naz, H., Ahuja, S. Deep learning approach for diabetes prediction using PIMA Indian dataset. J Diabetes Metab Disord 19, 391-403 (2020).
[2] K. Suresh, O. Obulesu and B. Venkata Ramudu, Diabetes Prediction using Machine Learning Techniques, Helix, 10(02), 136-142 (2020).

[3] M. Chen, Y. Hao, K. Hwang, L. Wang, and L. Wang, "Disease Prediction by Machine Learning Over Big Data From Healthcare Communities," IEEE Access, vol. 5, pp. 8869-8879, 2017.
[4] Jakka, Aishwarya and Jakka, Vakula, Performance Evaluation of Machine Learning Models for Diabetes Prediction, 2019. DOI: 10.35940/ijitee.K2155.0981119.
[5] S. Wei, X. Zhao and C. Miao, "A comprehensive exploration to the machine learning techniques for diabetes identification," 2018 IEEE 4th World Forum on Internet of Things (WF-IoT), Singapore, 2018, pp. 291-295.
[6] B. Giri, N. S. Ghosh, R. Majumdar and A. Ghosh, "Predicting Diabetes Implementing Hybrid Approach," 2020 8th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), Noida, India, 2020, pp. 388-391.
[7] P. K. Fahad and M. S. Pallavi, "Prediction of Human Health Using Machine Learning and Big Data," 2018 International Conference on Communication and Signal Processing (ICCSP), Chennai, 2018, pp. 0195-0199.
[8] Sneha, N., Gangil, T. Analysis of diabetes mellitus for early prediction using optimal features selection. J Big Data 6, 13 (2019).
[9] M. Rajeswari and P. Prabhu, A Review of Diabetic Prediction Using Machine Learning Techniques, International Journal of Engineering and Techniques, Volume 5, Issue 4, July 2019, ISSN: 2395-1303.
[10] Faruque, Md, Asaduzzaman and Sarker, Iqbal, Performance Analysis of Machine Learning Techniques to Predict Diabetes Mellitus, 2019.
[11] Deepti Sisodia and Dilip Singh Sisodia, Prediction of Diabetes using Classification Algorithms, Procedia Computer Science, Volume 132, 2018.
[12] Kayal Vizhi and Aman Dash, "Diabetes Prediction Using Machine Learning", IJAST, vol. 29, no. 06, pp. 2842-2852, May 2020.
[13] Zou Quan, Qu Kaiyang, Luo Yamei, Yin Dehui, Ju Ying and Tang Hua, Predicting Diabetes Mellitus With Machine Learning Techniques, Frontiers in Genetics, volume 9, 2018, 515, ISSN 1664-8021.
[14] Naveen Kishore G, V. Rajesh, A. Vamsi Akki Reddy, K. Sumedh and T. Rajesh Sai Reddy, Prediction Of Diabetes Using Machine Learning Classification Algorithms, International Journal of Scientific & Technology Research, Volume 9, Issue 01, January 2020, ISSN 2277-8616.
[15] Lai, H., Huang, H., Keshavjee, K. et al. Predictive models for diabetes mellitus using machine learning techniques. BMC Endocr Disord 19, 101 (2019).
[16] S. S. Sanjanaa Bose and C. Santhosh Kumar, "Combining the Multiple Features for Improving the Performance of Multi-Parameter Patient Monitor," 2019 5th International Conference on Advanced Computing & Communication Systems (ICACCS), Coimbatore, India, 2019, pp. 647-651.
[17] Abinav, Anil Kumar, Naveena Karthika, Pratibha, Ronsen, Gandhiraj R. and Soman K. P., "SVM based Classification of Digitally Modulated Signals for Software Defined Radio", International Conference on Embedded Systems, Coimbatore Institute of Technology, Coimbatore, 2010.
[18] Kavitha K. R., Gopinath, A. and Gopi, M., "Applying Improved SVM Classifier for Leukemia Cancer Classification Using FCBF", 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Udupi, India, 2017.
[19] S. V. Thambi, K. T. Sreekumar, C. S. Kumar and P. C. R. Raj, "Random forest algorithm for improving the performance of speech/non-speech detection," 2014 First International Conference on Computational Systems and Communications (ICCSC), Trivandrum, 2014, pp. 28-32.
[20] R. Karthika, Latha Parameswaran, B.K., P. and L.P., S., "Study of Gabor wavelet for face recognition invariant to pose and orientation", Proceedings of the International Conference on Soft Computing Systems, Advances in Intelligent Systems and Computing, vol. 397, Springer Verlag, pp. 501-509, 2016.
[21] Pima Indians Diabetes Database - Predict the onset of diabetes based on diagnostic measures, https://www.kaggle.com/uciml/pima-indians-diabetes-database
