
Diabetes disease prediction using machine learning techniques

ABSTRACT:
Because Diabetes disease is a long-lasting disease, it takes a long period to diagnose.
Diabetes disease is a threatening disease all over the world and is costly to diagnose;
because some cases of Diabetes disease go undiagnosed, the patient has to suffer throughout
his or her lifetime. Data on this kind of disease are available in large volumes in the
medical field, and data mining approaches are applied to make the work of the healthcare
system easier. In this project, five hundred Diabetes data records are taken and machine
learning approaches are applied. The objective of this research work is to introduce a new
decision support system to predict Diabetes disease. The aim of this work is to compare the
performance of Support Vector Machine (SVM), K-Nearest Neighbour (KNN) and decision tree
classifiers on the basis of their accuracy, precision and execution time for Diabetes
disease (DD) prediction.

INTRODUCTION

Machine Learning is a field concerned with the study of large, multivariate data. It evolved
from the study of pattern recognition and computational learning theory in artificial
intelligence and involves computational methods, algorithms, and techniques for analysis.
From the perspective of medical science, Machine Learning promises to help physicians make
near-perfect diagnoses, choose the best medications for their patients, identify patients at
high risk of poor outcomes, and improve patients' overall health while minimizing costs.
Machine learning and data mining techniques together have proved successful in the
prediction and diagnosis of various critical diseases. There are different applications for
Machine Learning, the most crucial of which is data mining. Machine learning alongside data
mining can often be applied effectively to such problems, as they improve the efficiency of
systems and their designs. Classification is a data mining function that assigns items in a
collection to target categories or classes. The goal of classification is to accurately
predict the target class for each case in the data. Various data mining classification
approaches and machine learning algorithms are applied for the prediction of Diabetes
disease. Here we are concerned with Diabetes disease (DD), also referred to as Diabetes renal
disease, which is an abnormal function of the kidney or a progressive failure of renal
function over a period of months or years. Often, Diabetes disease is diagnosed as a result
of screening people known to be at risk, such as those with high blood pressure or diabetes
and those with a blood relative with DD. It is differentiated from acute disease in that the
reduction in function must be present for over 3 months. This work focuses predominantly on
the prediction of Diabetes disease, which is predicted using classification techniques of
data mining. The classifiers used here are Support Vector Machine (SVM), K-Nearest Neighbor
(KNN) and a decision tree classifier. Their performance is then evaluated based on accuracy,
precision, and F-measure.

Objective:

The objective of this research work is to introduce a new decision support system to predict
Diabetes disease. The aim of this work is to compare the performance of Naive Bayes,
K-Nearest Neighbour (KNN) and decision tree classifiers on the basis of their accuracy,
precision and execution time for DD prediction. From the experimental results it is observed
that the KNN classifier performs better than Naive Bayes.

Existing process and its limitations

Nowadays, the health care industry provides several benefits, such as fraud detection in
health insurance, availability of medical facilities at inexpensive prices, identification of
smarter treatment methodologies, construction of effective healthcare policies, effective
hospital resource management, better customer relations, improved patient care and hospital
infection control. Disease detection is also one of the significant areas of research in
medicine. At present there is no automation for Diabetes disease prediction.
METHODOLOGY
The work proposed here uses three classification techniques to predict the presence of
Diabetes disease in humans. The classifiers used are Support Vector Machine (SVM), K-Nearest
Neighbour (KNN) and a decision tree classifier. The data set for Diabetes disease was
gathered and applied to each classifier to predict the disease, and the performance of each
classifier is evaluated based on accuracy, precision and F-measure.
Architecture of Predictive Data Mining: Proposed Approach.

[Figure: Data Set for Diabetes Kidney Disease -> Data Mining Classification (SVM, KNN) -> Performance Evaluation]
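The evaluation measures named in the methodology (accuracy, precision and F-measure) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the project's actual evaluation code; the function names and the label lists are hypothetical and the labels are made up purely to exercise the functions. Recall is computed as well because the F-measure is the harmonic mean of precision and recall.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def evaluate(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F-measure as a dict."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure: harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


# Hypothetical true and predicted outcomes (1 = diabetic, 0 = not diabetic).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
scores = evaluate(y_true, y_pred)
```

The same `evaluate` call could be applied to the predictions of each classifier in turn, so that the classifiers are compared on identical test labels.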

Diabetes disease has turned into a worldwide medical problem and is an area of concern. It
can lead to a condition in which the kidneys become damaged and cannot filter toxic wastes
from the body. The proposed system automates the prediction of Diabetes disease and its
stages using the classification techniques Decision Tree, K-Nearest Neighbour and SVM, and
compares the performance of these techniques based on their execution time.
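To make the execution-time comparison concrete, the sketch below shows one way a K-Nearest Neighbour prediction could be written and timed. It is an illustration under stated assumptions rather than the project's implementation: `knn_predict` and `timed` are hypothetical helper names, Euclidean distance with majority voting is assumed, and the two-feature points are made up. Only the Python standard library is used.

```python
import math
import time
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest
    training points, using Euclidean distance."""
    distances = sorted(
        (math.dist(row, query), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


def timed(predict_fn, *args, **kwargs):
    """Call predict_fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = predict_fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Hypothetical, already-scaled two-feature points (made up for illustration).
train_X = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.25),
           (0.80, 0.90), (0.90, 0.80), (0.85, 0.95)]
train_y = [0, 0, 0, 1, 1, 1]

label, seconds = timed(knn_predict, train_X, train_y, (0.82, 0.88), k=3)
```

Wrapping each classifier's prediction call in the same `timed` helper keeps the timing method identical across classifiers, which is what a comparison based on execution time needs.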

Literature review done in connection with the work


In 2015, Konstantina Kourou et al. [1] presented a study of machine learning applications in
cancer prognosis and prediction. The paper reviews various recent ML approaches applied to
cancer prediction, drawing on newly published work in cancer detection. In 2015, P. Swathi
Baby et al. [2] proposed a diagnosis and prediction system based on predictive mining. A
disease data set was analysed using the Weka and Orange software. Machine learning
algorithms such as AD Trees, J48, K-Star, Naïve Bayes and Random Forest were used, and the
performance of each algorithm was studied through statistical analysis and disease
prediction. Their observations show that K-Star and Random Forest were the best algorithms
for the data set used, with low model build times (0 sec and 0.6 sec) and ROC values of 1.

The following parameters are considered for this project:

• Pregnancies
• Glucose
• Blood Pressure
• Skin Thickness
• Insulin
• BMI
• Diabetes Pedigree Function
• Age
• Outcome
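As an illustration of how these parameters might be read into feature vectors and labels, the sketch below parses a small CSV with the Python standard library. The column header spellings (for example `BloodPressure` without a space) are an assumption about the file layout, `load_records` is a hypothetical helper, and the two data rows are made-up values, not records from the project's dataset. Outcome is treated as the class label and the remaining eight parameters as features.

```python
import csv
import io

# Assumed column names; the eight input parameters and the class label.
FEATURES = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]
TARGET = "Outcome"

# Two made-up rows, used only to show the expected shape of the data.
SAMPLE_CSV = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
2,120,70,25,80,28.5,0.35,33,0
5,165,82,30,150,34.1,0.62,51,1
"""


def load_records(handle):
    """Split each CSV row into a float feature vector and an integer label."""
    X, y = [], []
    for row in csv.DictReader(handle):
        X.append([float(row[name]) for name in FEATURES])
        y.append(int(row[TARGET]))
    return X, y


X, y = load_records(io.StringIO(SAMPLE_CSV))
```

With a real file, the same function could be called as `load_records(open(path, newline=""))`, where `path` is wherever the dataset has been saved.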
Problem statement
Patients who may have diabetes have to go through a series of tests and examinations to
diagnose the disease properly. These tests may include redundant or unnecessary medical
procedures, which lead to complications and waste time and resources. To overcome this, we
use machine learning, which helps predict accurately whether a person has diabetes and also
reduces cost.

System architecture

System design
[Figure: Data Collection -> Data Preprocessing -> Regression Algorithm (DCT, Naïve Bayes, KNN) -> Test data Model -> New Data Analysis -> Prediction -> Comparison Study]

Hardware Requirements

• Processor : Intel i3 3.30 GHz
• Hard Disk : 40 GB (min)
• RAM : 4 GB

Software Requirements

• Operating system : Windows 7 and above
• Coding Language : Python
• Database : MySQL
• Editor : Notepad++

Future scope:
In this project, more datasets can be considered for prediction, and different classification
and deep learning algorithms can be used for better accuracy. More parameters can be used,
and other diseases can be predicted. The application can also be deployed on mobile devices
so that it is user-friendly and accessible to everyone.

Expected output:
Diabetes datasets are collected from kaggle.com and data provider websites. The machine
learning steps, such as preprocessing, data analysis, splitting of data and data modelling,
are shown in a Jupyter notebook. The accuracy and a comparison study are shown. The web
application will be developed using the Flask framework.
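The preprocessing and data-splitting steps mentioned above could look roughly like the following. This is a sketch under assumptions, not the notebook's actual code: the 25% test fraction, the fixed seed, min-max scaling as the preprocessing step, the helper names and the sample rows are all illustrative choices. The scaling parameters are learned from the training rows only and then reused on the test rows, so that no information from the test set leaks into training.

```python
import random


def train_test_split(X, y, test_fraction=0.25, seed=0):
    """Shuffle row indices with a fixed seed and hold out a test portion."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(X) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


def fit_min_max(X_train):
    """Learn each column's minimum and span from the training rows."""
    columns = list(zip(*X_train))
    mins = [min(col) for col in columns]
    # A constant column has span 0; use 1.0 to avoid division by zero.
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    return mins, spans


def scale(X, mins, spans):
    """Apply previously learned min-max scaling to every row."""
    return [[(v - lo) / span for v, lo, span in zip(row, mins, spans)]
            for row in X]


# Hypothetical (Glucose, BMI) rows and outcomes, made up for illustration.
X = [[85, 22.1], [168, 35.4], [110, 27.0], [150, 31.2],
     [95, 24.3], [180, 38.9], [120, 29.5], [140, 33.0]]
y = [0, 1, 0, 1, 0, 1, 0, 1]

X_train, X_test, y_train, y_test = train_test_split(X, y)
mins, spans = fit_min_max(X_train)
X_train_scaled = scale(X_train, mins, spans)
X_test_scaled = scale(X_test, mins, spans)
```

Scaling matters most for distance-based classifiers such as KNN, where a feature with a large numeric range would otherwise dominate the distance.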
