Mini Project On Diabetes Prediction: Information Technology

You might also like

Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 19

A Report On

MINI PROJECT ON
DIABETES PREDICTION

INFORMATION TECHNOLOGY

SEMESTER VI

SUBMITTED BY
Shraddha Tamhane (61)
Tanaya Thakur (68)

UNDER THE GUIDANCE OF


Dr. J.E. Nalavade

DEPARTMENT OF INFORMATION TECHNOLOGY


PILLAI HOC COLLEGE OF ENGINEERING & TECHNOLOGY,
RASAYANI

UNIVERSITY OF MUMBAI
AY 2020-21
Pillai HOC College of Engineering & Technology, Rasayani
Year: 2020-2021

INFORMATION TECHNOLOGY
Certificate

This is to certify that the project entitled “MINI PROJECT ON DIABETES


PREDICTION” is successfully completed by following students:

Student Name Roll Number


Shraddha Tamhane 61
Tanaya Thakur 68

As per the syllabus & in partial fulfilment for the Mini Project of Business
Intelligence Lab in Information Technology from University of Mumbai, it is also
to certify that this is the original work of the candidate done during the
academic year 2020-2021.

Project Guide Head of Department

Principal

Internal Examiner External Examiner


ACKNOWLEDGEMENT

A project is never complete without the guidance of experts who already gone
through this in past before and hence become master of it and as a result, our
guides. So, we would like to take this opportunity to thank all those individuals
who helped us in visualizing our project.

We express our deep gratitude to our project guide Dr. J. E. Nalavade for
providing timely assistant to our query and guidance that he gave owing to his
experience in this field for past many years. He had indeed been a lighthouse for
us in this journey. We would also like to thank our guide for providing us with
his expert opinion and valuable suggestions at every stage of project.

We would also take this opportunity to thank our project coordinator Dr. J. E.
Nalavade for his guidance in selecting this project and also for providing us all
this details on proper presentation of this project.

We would like to take this opportunity to thank Dr. J. E. Nalavade, Head of


Information Technology for his motivation and valuable support. This
acknowledgement is incomplete without thanking teaching and non-teaching
staff of department of their kind support.

We extend our sincerity appreciation to all our Professor and Principal Dr. T. J.
Mathew, Principal of Pillai HOC College of Engineering and Technology,
Rasayani for providing he infrastructure and resources required for project.

Thanking You,
INDEX

Sr. No. Contents


1 Abstract
2 Introduction
3 Literature Survey
4 Proposed System
5 Model Building
6 Implementation
7 Experimental Result
8 Conclusion
1. Abstract

Diabetes is an illness caused because of high glucose level in a human body. Diabetes
should not be ignored if it is untreated then Diabetes may cause some major issues in a
person like: heart related problems, kidney problem, blood pressure, eye damage and it
can also affect other organs of human body. Diabetes can be controlled if it is predicted
earlier. It is one of the growing extremely fatal diseases all over the world. Various
traditional methods based on physical and chemical tests, are available for diagnosing
diabetes.
To achieve this goal, in this project work we will do early prediction of Diabetes in a
human body or a patient for a higher accuracy through applying, various Machine
Learning Techniques. Machine learning techniques provide better result for prediction
by constructing models from datasets collected from patients. In this work we will use
Machine Learning Classification and ensemble techniques on a dataset to predict
diabetes, which are K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Naïve
Bayes and Random Forest Classifier. The accuracy is different for every model when
compared to other models.
The Project work gives the accurate or higher accuracy model shows that the model is
capable of predicting diabetes effectively. Our result shows that SVM, Naïve Bayes and
Random Forest Classifier achieved higher accuracy compared to KNN. This research
focuses on the recent development in the machine learning which have made significant
impact in the detection and diagnosis of diabetes. This also aims to propose an effective
technique for earlier detection of diabetes disease.
2. Introduction

Diabetes is noxious diseases in the world. Diabetes caused because of obesity or high
blood glucose level, and so forth. It affects the hormone insulin, resulting in abnormal
metabolism of crabs and improves level of sugar in the blood. Diabetes occurs when
body does not make enough insulin. Diabetes is one of the frequent diseases that targets
the elderly population worldwide. The main cause of diabetes remains unknown, yet
scientists believe that both genetic factors and environmental lifestyle play a major role
in diabetes.
According to (WHO) World Health Organization about 422 million people suffering
from diabetes particularly from low- or idle-income countries. According to the
International Diabetes Federation, 451 million people across the world were diabetic in
2017. And this could be increased to 490 billion up to the year of 2030. However,
prevalence of diabetes is found among various Countries like Canada, China, and India
etc. Population of India is now more than 100 million so the actual number of diabetics
in India is 40 million. Diabetes is major cause of death in the world. Early prediction of
disease like diabetes can be controlled and save the human life.
To accomplish this, this work explores prediction of diabetes by taking various
attributes related to diabetes disease. For this purpose, we use the Pima Indian
Diabetes Dataset, we apply various Machine Learning classification and ensemble
Techniques to predict diabetes.
Machine Learning is a method that is used to train computers or machines explicitly.
Various Machine Learning Techniques provide efficient result to collect knowledge by
building various classification and ensemble models from collected dataset. Such
collected data can be useful to predict diabetes. Various techniques of Machine Learning
can capable to do prediction, however it’s tough to choose best technique. Thus, for this
purpose we apply popular classification and ensemble methods on dataset for
prediction.
3. Literature Survey

 Machine Learning Approach for Postprandial Blood Glucose Prediction in


Gestational Diabetes Mellitus (2020)

In this work, they developed and comprehensively described a data-driven blood


glucose model based on a decision tree gradient boosting algorithm to predict
different characteristics of postprandial glycaemic responses; the model utilized
meal-related data derived from a mobile app diary (including information on the
glycaemic index), food context (information on previous meals), characteristics of
the individual patients and patient behavioural questionnaires.

 Prediction of Diabetes Using Machine Learning Algorithms in Healthcare


(2018)

Predictive analytics in healthcare is a challenging task but ultimately can help


practitioners make big data-informed timely decisions about patient's health and
treatment. This paper discusses the predictive analytics in healthcare, six different
machine learning algorithms are used in this research work. This paper aims to help
doctors and practitioners in early prediction of diabetes using machine learning
techniques.

 A Decision Support System for Diabetes Prediction Using Machine


Learning and Deep Learning Techniques (2019)

In this paper, it proposes a DSS for diabetes prediction based on Machine Learning
(ML) techniques. It compared conventional machine learning with deep learning
approaches. For conventional machine learning method, we considered the most
commonly used classifiers: Support Vector Machine (SVM) and the Random Forest
(RF).
4. Proposed System

Goal of the paper is to investigate for model to predict diabetes with better accuracy. We
experimented with different classification and ensemble algorithms to predict diabetes.
In the following, we briefly discuss the phase.

i. Dataset Description:
The data is gathered form Kaggle which is named as Pima Indian Diabetes Dataset. The
dataset has many attributes of 768 patients.

Table 1: Dataset Description

Sr. No. Attributes

1 Pregnancy

2 Glucose

3 Blood Pressure

4 Skin thickness

5 Insulin

6 BMI (Body Mass Index)

7 Diabetes Pedigree Function

8 Age

The 9th attribute is class variable of each data points. This class variable shows the
outcome 0 and 1 for diabetics which indicates positive or negative for diabetics.
Distribution of Diabetic patient- We made a model to predict diabetes however the
dataset was slightly imbalanced having around 500 classes labelled as 0 means negative
means no diabetes and 268 labelled as 1 means positive means diabetic.
ii. Data Pre-processing:
Data pre-processing is most important process. Mostly healthcare related data contains
missing vale and other impurities that can cause effectiveness of data. To improve
quality and effectiveness obtained after mining process, Data pre-processing is done. To
use Machine Learning Techniques on the dataset effectively the process is essential for
accurate result and successful prediction. For Pima Indian diabetes dataset, we need to
perform pre-processing in two steps.
a. Missing Values removal: Remove all the instances that have zero (0) as worth.
Having zero as worth is not possible. Therefore, this instance is eliminated.
Through eliminating irrelevant features/instances we make feature subset and
this process is called features subset selection, which reduces dimensionality of
data and help to work faster.
b. Splitting of data- After cleaning the data, data is normalized in training and
testing the model. When data is spitted then we train algorithm on the training
data set and keep test data set aside. This training process will produce the
training model based on logic and algorithms and values of the feature in
training data. Basically, aim of normalization is to bring all the attributes
under same scale.

iii. Apply Machine Learning


When data has been ready, we apply Machine Learning Technique. We use different
classification and ensemble techniques, to predict diabetes. The methods applied on
Pima Indians diabetes dataset. Main objective to apply Machine Learning Techniques to
analyze the performance of these methods and find accuracy of them and also been able
to figure out the responsible/important feature which play a major role in prediction.
The Techniques are follows-
a. Support Vector Machine (SVM): Support Vector Machine also known as SVM
is a supervised machine learning algorithm. SVM is most popular classification
technique. SVM creates a hyperplane that separate two classes. It can create a
hyperplane or set of hyperplanes in high dimensional space. This hyper plane
can be used for classification or regression also. SVM differentiates instances
in specific classes and can also classify the entities which are not sup- ported
by data. Separation is done by through hyperplane performs the separation to
the closest training point of any class.

b. K-Nearest Neighbor (KNN): KNN is also a supervised machine learning


algorithm. KNN helps to solve both the classification and regression problems.
KNN is lazy prediction technique. KNN assumes that similar things are near to
each other. Many times, data points which are similar are very near to each
other. KNN helps to group new work based on similarity measure. KNN
algorithm record all the records and classify them according to their similarity
measure. For finding the distance between the points uses tree like structure.
To make a prediction for a new data point, the algorithm finds the closest data
points in the training data set its nearest neighbors. Here K= Number of
nearby neighbors, it’s always a positive integer. Neighbor’s value is chosen
from set of class.

c. Naïve Bayes: It is a classification technique based on Bayes’ theorem with an


assumption of independence between predictors. In simple terms, a Naive
Bayes classifier assumes that the presence of a particular feature in a class is
unrelated to the presence of any other feature

d. Random Forest Classifier: Random Forest is a trademark term for an


ensemble of decision trees. In Random Forest, we’ve collection of decision
trees (so known as “Forest”). To classify a new object based on attributes,
each tree gives a classification and we say the tree “votes” for that class. The
forest chooses the classification having the most votes (over all the trees in
the forest).
5. Model Building

This is most important phase which includes model building for prediction of
diabetes. In this we have implemented various machine learning algorithms which
are discussed above for diabetes prediction.

Procedure of Proposed Methodology-

Step 1: Import required libraries

Step 2: Import diabetes dataset.

Step 3: Pre-processing the data

Step 4: Machine Learning Modelling (Machine Learning Algorithm i.e., K-


Nearest Neighbor, Support Vector Machine, Naïve Bayes and Random Forest
Classifier).
6. Implementation and Result
7. Experimental Results

In this work different steps were taken. The proposed approach uses different
classification and ensemble methods and implemented using python. These methods
are standard Machine Learning methods used to obtain the best accuracy from data. In
this work we see that SVM, Random Forest Classifier and Naïve Bayes achieved higher
accuracy. Overall, we have used best Machine Learning techniques for prediction and
to achieve high performance accuracy.
The sum of the importance of each feature playing major role for diabetes have been
plotted, where X-axis represents the importance of each feature and Y-Axis the names of
the features.

8. Conclusion

The main aim of this project was to design and implement Diabetes Prediction Using
Machine Learning Methods and Performance Analysis of that methods and it has been
achieved successfully. The proposed approach uses various classification and ensemble
learning method in which SVM, KNN, Naïve Bayes and Random Forest Classifier are
used.
ML model KNN was able to classify patients as Diabetic or not with an accuracy of
99.35% and ML model SVM, Naïve Bayes and Random Forest Classifier were able to
classify patients as Diabetic or not with an accuracy of 100%.

You might also like