Diabetes Prediction

DIABETES PREDICTION WEB APP
(in partial fulfillment of the requirements for the degree of Master of Computer Applications)
Submitted by
SAURABH JETHURI
Under the guidance of
Ms. Anamika Arora
ASSISTANT PROFESSOR, GEHU
SCHOOL OF COMPUTING
GRAPHIC ERA HILL UNIVERSITY

Dec, 2023
ACKNOWLEDGEMENT
This is a great opportunity to acknowledge and to thank all those persons without whose support and help this project would have been impossible. I would like to add a few heartfelt words for the people who were part of this project in numerous ways.
I am obliged and thankful to my project supervisor, Ms. Anamika Arora, for her continuous encouragement, motivation and professional guidance during the work on this project, which have proven to be an integral part of it. Without her valuable support and guidance, this project could not have reached its present level of development. I would also like to thank all the faculty members of the School of Computing, Graphic Era Hill University, for their valuable time spent on the requirements analysis and evaluation of the project work.
I would like to express my sincere and cordial gratitude to the people who supported me directly, provided mental encouragement, and evaluated and criticized my work in several phases during the development of this project, and to those who helped indirectly in preparing this dissertation.
Saurabh Jethuri
22561091 (61)
ABSTRACT
Diabetes is a chronic disease with the potential to cause a worldwide health care crisis. According to the International Diabetes Federation, 382 million people are living with diabetes across the world, and by 2035 this number is expected to rise to 592 million. Diabetes is a disease caused by an increased level of blood glucose. This high blood glucose produces symptoms such as frequent urination, increased thirst and increased hunger. Diabetes is one of the leading causes of blindness, kidney failure, amputations, heart failure and stroke. When we eat, our body turns food into sugars, or glucose. At that point, our pancreas is supposed to release insulin. Insulin serves as a key that opens our cells, allowing glucose to enter so that we can use it for energy. With diabetes, this system does not work. Type 1 and type 2 diabetes are the most common forms of the disease, but there are also other kinds, such as gestational diabetes, which occurs during pregnancy. Machine learning is an emerging scientific field in data science that deals with the ways in which machines learn from experience. The aim of this project is to develop a system that can perform early prediction of diabetes for a patient with higher accuracy by comparing the results of different machine learning techniques. Algorithms such as K-nearest neighbour, logistic regression, random forest, support vector machine and decision tree are used. The accuracy of the model built with each algorithm is calculated, and the one with the best accuracy is taken as the model for predicting diabetes.
Keywords: Machine Learning, Diabetes, Decision Tree, K-Nearest Neighbour, Logistic Regression, Support Vector Machine, Accuracy.
TABLE OF CONTENTS
ACKNOWLEDGEMENT
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
1. INTRODUCTION
1.1 General
1.2 Purpose
1.3 Scope
1.4 Motivation
2. REQUIREMENT ANALYSIS
2.1 Introduction
2.2 Functional Requirements
2.3 Non-Functional Requirements
3. DATASET
4. PROPOSED METHOD
5. MODELING AND ANALYSIS
6. MEASUREMENTS
7. RESULTS AND DISCUSSION
8. RESULTS AND ANALYSIS
9. CONCLUSION
10. REFERENCES
CHAPTER 1
INTRODUCTION
1.1 GENERAL
According to a World Health Organization (WHO) report from November 2016, 422 million adults have diabetes, and 1.6 million deaths were attributed to diabetes. In 2012, high blood glucose was the cause of 2.2 million deaths. Diabetes leads to many other diseases that affect the kidneys, eyes, heart and other organs. To understand diabetes, we first need to learn how the body works without diabetes.
The food that we eat contains various kinds of components such as sugar, protein and fat. The sugar we obtain comes mainly from foods containing carbohydrates, which provide our body with energy. Foods such as bread, cereal, pasta, rice, fruit, dairy products and vegetables contain carbohydrates. When consumed, such foods are broken down into glucose, which is carried throughout the body by the bloodstream.
A large share of this glucose travels to the brain, which depends on it for thinking and for controlling the body's functions. The rest is supplied to the body's other cells, including those of the liver. Insulin is an important component required for the normal functioning of the human body. It is a hormone produced by the beta cells of the pancreas, and it allows glucose to move from the bloodstream into the cells of the body. The pancreas releases insulin when the blood glucose level rises. If the pancreas cannot produce enough insulin, glucose builds up in the blood, and this is how diabetes develops in an individual.
The signs and symptoms of diabetes include blurred vision, fatigue, weight loss, increased hunger and thirst, frequent urination, confusion, poor healing, frequent infections and difficulty in concentrating.
Type 1 Diabetes: Type 1 diabetes occurs when the immune system destroys the beta cells in the pancreas, the cells that make insulin. Without insulin, glucose builds up in the blood while the body's cells are starved of energy. Type 1 diabetes usually appears in people younger than 30 and accounts for about 5 to 10% of people with diabetes, but it can occur at any age. Contrary to the older belief that it is a juvenile disease, it is not only a childhood disease; it actually occurs in adults more often than in children.
Type 2 Diabetes: People with type 2 diabetes make insulin, but their cells do not respond to it as well as they should. The pancreas cannot keep up, and sugar builds up in the bloodstream.
Data analytics is the identification of hidden patterns in large amounts of data in order to draw conclusions. In health care, machine learning algorithms are used to analyze medical data and build models that carry out medical diagnoses. In this report, we use techniques such as Naive Bayes, Random Forest and Logistic Regression to predict diabetes using the PIMA dataset.
1.2 PURPOSE
The aim of the project is to determine the classification model or algorithm that gives the best achievable accuracy. The algorithm that proves to be the best can then be used to predict whether a person is diabetic or non-diabetic. This avoids the misclassifications that a poorly performing algorithm or model could cause if the best one were not chosen.
Diabetes is also one of the most common chronic diseases in India and elsewhere. Predicting it at an early stage, or even before onset, should make it easier to control and contain, possibly with a proper diet or a less severe treatment. Type 2 diabetes has a much stronger link to family history or lineage than type 1 diabetes. So, if a member of a family has type 2 diabetes, other members of the family are also at risk, and this risk should be addressed before the condition becomes complicated.
1.3 SCOPE
Type 2 diabetes is very different from type 1 diabetes, which was previously called insulin-dependent diabetes mellitus (IDDM). Before the 2000s, type 2 diabetes was considered a disease of elderly and middle-aged individuals (hence it was also called adult-onset diabetes). But once it started to show up in teenagers, that name faded away, as the disease was no longer confined to middle-aged adults.
According to the U.S. National Institutes of Health (NIH), type 2 diabetes contributes directly to the following conditions:
1. Stroke and heart disease. Adults with diabetes die of heart disease two to four times as often as adults without diabetes, and their risk of stroke is also two to four times higher.
2. Nervous system disease. About half of people with diabetes experience impaired sensation or pain in the hands or feet, carpal tunnel syndrome, slower digestion or other nerve problems.
3. High blood pressure. Most adults with diabetes have blood pressure higher than 130/80 mmHg.
4. Blindness. Diabetes is a leading cause of new cases of blindness among adults aged 20 to 74.
5. Amputation. About 60% of non-traumatic lower-limb amputations occur among people with diabetes.
6. Kidney disease. Diabetes is one of the leading causes of kidney failure. About 150,000 individuals with diabetes survive on chronic dialysis or with a kidney transplant.
7. Immune system disorder. Individuals with diabetes are less able to fight off viral and bacterial infections, and they are more likely to die of influenza or pneumonia than individuals who do not have diabetes.
8. Pregnancy complications. Mothers with diabetes have a higher rate of spontaneous abortion, and their babies have a greater risk of major birth defects and of developing diabetes later in life.
Diabetes is considered one of the leading causes of death in India. 72 million diabetes cases were recorded in 2017, and this number is expected to double by 2025. This poses a serious public health issue in a country whose population keeps increasing every year. Among the Indian states, Tamil Nadu has had the highest death rate from diabetes. Diabetes often leads to long-term disabilities and complications: it leads to heart attacks, kidney failure and blindness, and gestational diabetes causes birth defects in newborn babies.
Around 1.95 lakh crore will be needed as the annual cost of treating diabetes, and the urban poor in India spend 34% of their income on diabetes treatment.
These trends indicate a rise in premature deaths, which is a major threat to global development. Technological advancements have been useful in reducing hyperglycemia, but despite these advancements, diabetes still poses a serious threat to life.
We perform prediction and analysis on the PIMA dataset to measure the efficiency and accuracy of several algorithms, so that the most suitable algorithm, the one with the highest accuracy, can be identified. We split the dataset in four different ways (training/test): 60/40, 70/30, 80/20 and 65/35. For each split, the model is fitted on the training data and then used to predict the outcome for the test data. We then compute the accuracy by comparing the predicted values with the actual values in the test set.
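The four-split evaluation described above can be sketched with scikit-learn as follows. This is a minimal illustration, not the project's code: it uses `make_classification` to generate a synthetic stand-in for the PIMA data (768 rows, 8 features), logistic regression as an example classifier, and a hypothetical helper name, `accuracy_per_split`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Test-set fractions for the 60/40, 70/30, 80/20 and 65/35 splits.
TEST_SIZES = {"60/40": 0.40, "70/30": 0.30, "80/20": 0.20, "65/35": 0.35}


def accuracy_per_split(X, y, seed=42):
    """Fit a fresh model on each training split and score it on the held-out test split."""
    results = {}
    for name, test_size in TEST_SIZES.items():
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=test_size, random_state=seed
        )
        model = LogisticRegression(max_iter=1000)
        model.fit(X_train, y_train)
        # Compare the predicted labels with the actual test labels.
        results[name] = accuracy_score(y_test, model.predict(X_test))
    return results


if __name__ == "__main__":
    # Synthetic stand-in for the PIMA data: 768 rows, 8 numeric features, binary label.
    X, y = make_classification(n_samples=768, n_features=8, random_state=0)
    for split, acc in accuracy_per_split(X, y).items():
        print(f"{split}: accuracy = {acc:.3f}")
```

Fixing `random_state` is an assumption made here so that the four accuracies are reproducible between runs.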
1.4 MOTIVATION
Diabetes is one of the chronic health problems that are devastating yet have preventable consequences. The driving factor is a high blood glucose level caused by low insulin production. Type 2 diabetes affects men and women roughly equally: around 12 million men and 11.5 million women have diabetes. Improving quality of life means bringing one's own diabetes under control, for which patients need additional support and education.
Although technology has evolved and new treatments for controlling diabetes have been found, the challenges of self-management are the most overwhelming for most individuals. Each patient must manage the disease personally, which includes monitoring blood glucose levels, maintaining a healthy diet, taking medication and exercising regularly. Non-compliance with these self-management behaviors is common, usually because of the changes they require in the patient's routine life. To adapt to such changes, patients need the motivation to achieve their goals and to settle into a new way of living that lets them manage diabetes over the long term. Support, assistance and feedback play a very important role in achieving self-management goals. Organizations that run peer support groups for people with diabetes are a valuable resource for patients, who can draw on shared experiences and become more aware of their own condition.
CHAPTER 2
REQUIREMENT ANALYSIS
2.1 Introduction
The diabetes prediction web application aims to assist users in predicting their risk of
developing diabetes based on various health indicators and factors. By providing a user-
friendly interface and utilizing machine learning algorithms, the application will help
individuals make informed decisions about their health and take proactive measures to
prevent or manage diabetes.
2.2 Functional Requirements

a. Input Data:
• The application should allow users to input relevant health data, such as blood glucose levels, body mass index (BMI), age, family history and lifestyle factors.
• Users should be able to input data either manually or by importing it from external devices or health applications.
b. Diabetes Prediction:
• The application should employ machine learning algorithms to analyze the user's input data and predict the likelihood of developing diabetes.
• The prediction should be based on accurate and validated models that take various risk factors into account.
c. Prediction Results and Recommendations:
• The application should provide users with clear and understandable prediction results indicating their risk level (e.g., low, medium, high) for developing diabetes.
• Based on the prediction outcome, the application should offer personalized recommendations for lifestyle modifications, preventive measures and further medical consultations.
2.3 Non-Functional Requirements

a. User-Friendly Interface:
• The application should have an intuitive and visually appealing interface that is easy to navigate and understand.
• User input forms should be designed to be clear, concise and user-friendly.
b. Performance and Scalability:
• The application should be capable of handling multiple user requests simultaneously without significant performance degradation.
• It should be scalable to accommodate a growing user base and increasing data volumes.
c. Reliability and Availability:
• The application should be reliable, ensuring it operates smoothly without unexpected downtime or errors.
• Adequate server resources and backups should be in place to maintain availability.
d. Compatibility:
• The application should be compatible with commonly used web browsers and responsive across various devices (desktops, tablets, mobile phones).
e. Accessibility:
• The application should adhere to accessibility guidelines, making it accessible to users with disabilities.
• Consideration should be given to providing alternative text, keyboard navigation and other accessibility features.
f. Data Privacy and Compliance:
• The application should adhere to data privacy regulations, safeguarding user information and providing transparent privacy policies.
• Proper consent mechanisms should be in place for collecting and processing user data.
g. Performance Metrics:
• The application should be capable of providing prediction results within a reasonable response time.
• Accuracy metrics should be measured and reported periodically to ensure reliable predictions.
Hardware Requirements
S.No   Hardware Component     Requirement
1      CPU                    Intel i3 or higher
2      Internet connection    To retrieve the dataset
3      Storage                At least 4 GB
4      RAM                    4 GB

Software Requirements
S.No   Software               Name
1      Browser                Google Chrome, Firefox, etc.
2      Operating system       Windows, Mac OS
3      Programming IDE        VS Code and Google Colab
4      Server                 WSGI (Web Server Gateway Interface)

Technology Requirements
S.No   Technology                 Used for
1      HTML5                      Web page layout
2      CSS3                       Designing the web page
3      JavaScript                 Handling user actions on the web page
4      Python (version 3.7.0b5)   Developing the ML model and implementing the backend logic
5      Dataset                    Data required for predicting the result with the SVM algorithm
CHAPTER 3
DATASET
The dataset used in this project originally comes from the Pima Indians Diabetes Database and is available on Kaggle. It consists of several medical predictor variables and one target variable, and its objective is to predict whether or not a patient has diabetes. The independent variables include the number of pregnancies the patient has had, their BMI, insulin level, age and so on, as shown in Table 1; the dependent variable is the outcome.
Table 1 Dataset Description
Serial no   Attribute Name               Description
1           Pregnancies                  Number of times pregnant
2           Glucose                      Plasma glucose concentration (2 hours into an oral glucose tolerance test)
3           Blood Pressure               Diastolic blood pressure (mm Hg)
4           Skin Thickness               Triceps skin fold thickness (mm)
5           Insulin                      2-hour serum insulin (mu U/ml)
6           BMI                          Body mass index (kg/m²)
7           Diabetes Pedigree Function   Diabetes pedigree function
8           Outcome                      Class variable (0 or 1)
9           Age                          Age of patient (years)
Fig 3.1 Dataset Description
The diabetes dataset consists of 768 data points, each with 9 columns (8 input features and the outcome).
“Outcome” is the feature we are going to predict: 0 means no diabetes, and 1 means diabetes.
Correlation Matrix
Fig 3.2 Correlation Matrix
No single feature has a very high correlation with the outcome value. Some of the features are negatively correlated with the outcome, and others are positively correlated.
Skew of Data:
Fig 3.3 Skew of Data
The figure shows how each feature and the label are distributed over different ranges, which further confirms the need for scaling. Apart from the Outcome label, all the attributes are numeric rather than categorical, so they are scaled rather than encoded before applying machine learning. The outcome label has two classes: 0 for no disease and 1 for disease.
Bar Plot for Outcome Class
Fig 3.4 Bar Plot for Outcome Class
The graph above shows that the data is imbalanced towards data points with outcome value 0, meaning that diabetes was not present. The number of non-diabetic patients is almost twice the number of diabetic patients.
Fig 3.5 Graph
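The checks behind this chapter (record count, class balance and per-feature skew) can be reproduced with pandas. This is a minimal sketch that assumes the data sits in a pandas DataFrame with an `Outcome` column; in practice it might be loaded with `pd.read_csv("diabetes.csv")`, where the file name is an assumption. The helper `summarize` and the toy rows are hypothetical and only demonstrate the calls.

```python
import pandas as pd


def summarize(df: pd.DataFrame, target: str = "Outcome") -> dict:
    """Return the shape, class counts, majority/minority ratio and feature skew."""
    counts = df[target].value_counts()
    return {
        "shape": df.shape,
        "class_counts": counts.to_dict(),
        # Ratio of the larger class to the smaller one (the text reports roughly 2 here).
        "imbalance_ratio": counts.max() / counts.min(),
        # Sample skewness of each input feature; large absolute values mean long tails.
        "skew": df.drop(columns=[target]).skew().to_dict(),
    }


if __name__ == "__main__":
    # Hypothetical toy rows using two of the PIMA column names, for illustration only.
    toy = pd.DataFrame(
        {
            "Glucose": [90, 100, 110, 120, 160, 180],
            "BMI": [22.0, 24.0, 26.0, 28.0, 33.0, 35.0],
            "Outcome": [0, 0, 0, 0, 1, 1],
        }
    )
    print(summarize(toy))
```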
CHAPTER 4
PROPOSED METHOD
I] Dataset collection: This step covers collecting the data and understanding it in order to study the hidden patterns and trends that help in prediction and in evaluating the results. The dataset has 1405 rows (records) and 10 columns. The features include Pregnancies, Glucose, Blood Pressure, Skin Thickness, Insulin, BMI, Diabetes Pedigree Function and Age.
II] Data Pre-processing: This phase handles inconsistent data in order to get more accurate and precise results. In this dataset, the Id column only identifies each record and carries no predictive information, so we dropped it. The dataset contains no explicitly missing values. However, attributes such as Glucose, Blood Pressure, Skin Thickness, BMI and Age cannot have a value of zero, so zero entries in these columns were treated as missing and imputed. The data was then scaled using StandardScaler. Since there were only a few features and all of them were important for prediction, no feature selection was done.
III] Missing value identification: Using the pandas and scikit-learn libraries, we identified the missing values in the dataset, as shown in Fig 4.1. We replaced each missing value with the mean value of the corresponding attribute.
Fig 4.1 Missing Values Identified
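Steps II and III (dropping `Id`, treating impossible zeros as missing, mean imputation and standard scaling) might look like this with pandas and scikit-learn. This is a sketch under assumptions: the column names `Id`, `Glucose`, `BloodPressure`, `SkinThickness`, `BMI`, `Age` and `Outcome` follow the common Kaggle spelling and may differ from the project's file, and the function name `preprocess` is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Attributes that cannot physically be zero, per the pre-processing step.
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "BMI", "Age"]


def preprocess(df: pd.DataFrame, target: str = "Outcome"):
    """Return scaled features X, labels y and the per-column missing counts."""
    df = df.drop(columns=["Id"], errors="ignore")  # Id only identifies rows
    for col in ZERO_AS_MISSING:
        if col in df.columns:
            df[col] = df[col].replace(0, np.nan)   # impossible zeros -> missing
    missing = df.isna().sum()                      # what the missing-value check reports

    X = df.drop(columns=[target])
    y = df[target].to_numpy()
    X = SimpleImputer(strategy="mean").fit_transform(X)  # fill NaN with the column mean
    X = StandardScaler().fit_transform(X)                # zero mean, unit variance
    return X, y, missing
```

The imputer and scaler are fitted on the whole dataset here, mirroring the order of the steps described above; fitting them on the training split only would avoid leaking test-set statistics into training.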
IV] Feature selection:
Pearson's correlation method is a popular way to find the most relevant attributes (features). This method computes the correlation coefficient between each input attribute and the output attribute. The coefficient always lies between −1 and 1: a value above 0.5 or below −0.5 indicates a notable correlation, and a value of zero means no linear correlation.
Fig 4.2 Correlation Table
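Pearson's coefficient between a feature x and the outcome y is r = cov(x, y) / (σx σy). The sketch below computes it for every input column with pandas (`corrwith` uses Pearson by default) and flags the columns beyond the ±0.5 threshold given above. The function name `correlation_with_target` is an illustrative assumption.

```python
import pandas as pd


def correlation_with_target(df: pd.DataFrame, target: str = "Outcome", threshold: float = 0.5):
    """Pearson correlation of each feature with the target, plus the 'notable' features."""
    corr = df.drop(columns=[target]).corrwith(df[target])  # Pearson by default
    notable = corr[corr.abs() > threshold].index.tolist()   # |r| above the threshold
    return corr.sort_values(ascending=False), notable
```

Usage: `corr, notable = correlation_with_target(df)` on the loaded dataset returns the coefficients sorted from most positive to most negative, together with the list of notably correlated features.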
V] Scaling and Normalization:
We performed feature scaling by normalizing the data to the range 0 to 1, which sped up the algorithms' calculations.
Scaling means transforming the data so that it fits within a specific scale, such as 0-100 or 0-1. Scaling is needed when using methods based on measures of how far apart data points are, such as support vector machines (SVM) or k-nearest neighbours (KNN). With these algorithms, a change of 1 in any numeric feature is given the same importance, so without scaling, features with large ranges would dominate features with small ranges.
Fig 4.3 Scaling and Normalization
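Normalization to the 0 to 1 range, as described in step V, corresponds to min-max scaling, x' = (x − min) / (max − min), applied to each column. A short sketch with scikit-learn's `MinMaxScaler`; the toy values are made up. Note that step II names `StandardScaler`, which instead rescales each feature to zero mean and unit variance.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Two made-up features on very different ranges (a glucose-like and a pedigree-like column).
X = np.array([[80.0, 0.2], [120.0, 0.5], [200.0, 1.1]])

scaler = MinMaxScaler(feature_range=(0, 1))
X_scaled = scaler.fit_transform(X)  # each column: min -> 0, max -> 1
print(X_scaled)
```

After scaling, both columns become approximately [0, 1/3, 1], so neither dominates a distance computation.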
VI] Splitting of data:
After data cleaning and pre-processing, the dataset is ready for training and testing. In the train/test split method, we split the dataset randomly into a training set and a testing set. We used 1600 samples for training and 400 samples for testing.
Fig 4.4 Splitting data
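With 2000 samples in total (1600 for training plus 400 for testing, as stated above), the split corresponds to `test_size=0.2`; scikit-learn's `train_test_split` also accepts an absolute count. A sketch on placeholder random arrays; the fixed `random_state` is an assumption added for reproducibility.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 2000 rows x 8 features, with a binary label.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 8))
y = rng.integers(0, 2, size=2000)

# An integer test_size is an absolute sample count; here it equals test_size=0.2.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=400, random_state=42)
print(len(X_train), len(X_test))
```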
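VII] Design and implementation of classification model: In this research work, we carried out a comparative study by applying different ML classification techniques: Decision Tree (DT), K-Nearest Neighbors (KNN), Random Forest (RF), Naive Bayes (NB), Logistic Regression (LR) and Support Vector Machine (SVM).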
VIII] Machine learning classifier:
We developed the model using machine learning techniques, applying different classifiers and ensemble techniques to the diabetes dataset. We applied the SVM, LR, DT and RF machine learning classifiers and analyzed their performance by finding the accuracy of each classifier. All the classifiers are implemented using the scikit-learn library in Python. The implemented classification algorithms are described in the next chapter.
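A side-by-side comparison of the classifiers named in this chapter and described in Chapter 5 could be set up as below. This is a sketch, not the project's code: the hyperparameters come from the text where it gives them (RBF kernel with gamma 0.0001 for SVM, the Gini index for DT and RF, 100 trees for RF), while everything else (default K = 5 for KNN, the 70/30 split, fitting the scaler on the training split, the `compare` and `build_models` names, and the synthetic demo data) is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_models(seed=0):
    """Classifiers from the text, with the hyperparameters it states where available."""
    return {
        "LR": LogisticRegression(max_iter=1000),
        "KNN": KNeighborsClassifier(),  # default K = 5 (assumption: the text gives no K)
        "SVM": SVC(kernel="rbf", gamma=0.0001),
        "NB": GaussianNB(),
        "DT": DecisionTreeClassifier(criterion="gini", random_state=seed),
        "RF": RandomForestClassifier(n_estimators=100, criterion="gini", random_state=seed),
        "AdaBoost": AdaBoostClassifier(random_state=seed),
    }


def compare(X, y, test_size=0.3, seed=0):
    """Train every model on the same split and return accuracies, best first."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    scaler = StandardScaler().fit(X_train)  # fit on the training data only
    X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)
    scores = {}
    for name, model in build_models(seed).items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    # Synthetic stand-in data; replace with the pre-processed PIMA features and labels.
    X, y = make_classification(n_samples=768, n_features=8, random_state=0)
    for name, acc in compare(X, y).items():
        print(f"{name:8s} {acc:.3f}")
```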
CHAPTER 5
MODELING AND ANALYSIS
A] Logistic Regression: Logistic regression is a machine learning technique used when the dependent variable is categorical. Its output is computed from the available features, and the sigmoid function maps this output to a probability, which is then compared with a threshold to assign a class.
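The sigmoid step can be written out directly: the model forms a linear score z = b + Σ wᵢxᵢ, the sigmoid 1 / (1 + e^(−z)) turns it into a probability, and a threshold (0.5 here, an assumption) gives the class. The weights in the usage example are hypothetical, not values learned from the dataset.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: maps any real score to a probability in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable form for large negative z
    return e / (1.0 + e)


def predict_class(weights, bias, x, threshold=0.5):
    """Logistic-regression decision rule: linear score -> sigmoid -> threshold."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    p = sigmoid(z)
    return (1 if p >= threshold else 0), p
```

For example, `predict_class([0.8, -0.5], -0.1, [1.0, 2.0])` gives a score of about −0.3, a probability of about 0.43, and therefore class 0.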
B] K-Nearest Neighbors: The K-nearest neighbors (KNN) algorithm uses 'feature similarity' to predict the values of new data points, which means that a new data point is assigned a value based on how closely it matches the points in the training set. A prediction for a new instance (x) is made by searching through the entire training set for the K most similar instances (the neighbors) and summarizing the output variable for those K instances; for classification, this is typically a majority vote.
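The search-and-vote procedure above fits in a few lines of plain Python. This sketch assumes Euclidean distance and majority voting, which are common choices but are not stated in the text; the function names and toy points are illustrative.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train_X, train_y, x, k=3):
    """Predict the class of x by majority vote among its k nearest training points."""
    # Distance from x to every training point (a search through the entire training set).
    scored = sorted(zip((euclidean(row, x) for row in train_X), train_y))
    nearest_labels = [label for _, label in scored[:k]]
    # Summarize the neighbours' outputs: the most common class wins.
    return Counter(nearest_labels).most_common(1)[0][0]
```

Because every prediction scans the whole training set, KNN does no work at training time but is slow at prediction time, and, as noted in step V, it depends on the features being scaled.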
C] Support Vector Machine: SVM is a supervised learning algorithm used for classification. In SVM, we have to identify the right hyperplane to classify the data correctly, which requires setting the right parameter values. To find the right hyperplane, we have to find the right margin; for this, we chose the RBF kernel with a gamma value of 0.0001. Selecting a hyperplane with a low margin leads to misclassification.
D] Naive Bayes: Naive Bayes classifiers are a collection of classification algorithms based on Bayes' theorem. It is not a single algorithm but a family of algorithms that all share a common principle: every pair of features is assumed to be independent of each other, given the class.
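Written out, the independence assumption lets the class posterior factor into per-feature terms, where y is the class (0 or 1) and x₁, …, xₙ are the feature values; the predicted class is the one with the largest numerator, since the denominator is the same for both classes. For continuous features such as glucose, one common choice (used by scikit-learn's `GaussianNB`) models each P(xᵢ | y) as a normal distribution; which variant the project used is not stated.

```latex
P(y \mid x_1, \dots, x_n) = \frac{P(y)\,\prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)}
\quad\Longrightarrow\quad
\hat{y} = \arg\max_{y \in \{0, 1\}} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```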
E] Decision Tree: A decision tree is a non-parametric classifier used in supervised learning. In this method, all the details are represented in the form of a tree, where the leaves correspond to the class labels and the internal nodes correspond to the attributes. We used the Gini index for splitting the nodes.
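The Gini index that drives the splits is 1 − Σₖ pₖ², where pₖ is the share of class k in a node; a candidate split is scored by the size-weighted Gini of its two children, and the tree keeps the split with the lowest score. A plain-Python sketch with illustrative function names:

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum_k p_k^2, where p_k is the share of class k in the node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(left, right):
    """Size-weighted Gini impurity of a candidate split (lower is better)."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)
```

A pure node scores 0, and an evenly mixed two-class node scores 0.5, so `split_gini([0, 0], [1, 1])` is 0 (a perfect split) and `gini([0, 0, 1, 1])` is 0.5.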
F] Random Forest: Random forest is an ensemble learning method for classification. It builds many decision trees, each trained on a bootstrap sample of the data with a random subset of features considered at each split, and combines the predictions of all the trees. As in a single decision tree, the leaves correspond to the class labels and the internal nodes correspond to the attributes. The forest used here has 100 trees, and the Gini index is used for splitting the nodes.
G] AdaBoost Classifier: Boosting is an ensemble modeling technique that attempts to build a strong classifier from a number of weak classifiers. It does this by building weak models in series. First, a model is built from the training data. Then a second model is built that tries to correct the errors of the first model. This procedure continues, and models are added until either the complete training set is predicted correctly or the maximum number of models has been added.
AdaBoost was the first really successful boosting algorithm developed for binary classification. AdaBoost is short for Adaptive Boosting, a very popular boosting technique that combines multiple "weak classifiers" into a single "strong classifier". It was formulated by Yoav Freund and Robert Schapire, who won the 2003 Gödel Prize for this work.
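How each new model "tries to correct the errors" of the previous ones can be seen in one round of the classic binary AdaBoost update: the weak learner's weighted error ε gives it a vote α = ½ ln((1 − ε)/ε), and misclassified samples receive larger weights for the next round. This from-scratch sketch uses labels in {−1, +1}, a convention of the sketch rather than of the dataset's 0/1 Outcome; scikit-learn's `AdaBoostClassifier` is based on the multi-class SAMME variant, so its internal constants differ slightly.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """One AdaBoost step for labels in {-1, +1}.

    Returns the weak learner's vote alpha and the renormalized sample weights.
    """
    # Weighted error of the weak learner (weights are assumed to sum to 1).
    eps = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    eps = min(max(eps, 1e-10), 1 - 1e-10)  # guard against division by zero and log(0)
    alpha = 0.5 * math.log((1 - eps) / eps)
    # Misclassified samples (t * p = -1) are scaled by e^alpha, correct ones by e^-alpha.
    new_w = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(new_w)
    return alpha, [w / total for w in new_w]
```

With four equally weighted samples and one mistake (ε = 0.25), the misclassified sample's weight rises from 0.25 to 0.5 and each correct sample drops to 1/6, so the next weak learner concentrates on the error.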
CHAPTER 6
MEASUREMENTS
To find the most effective classifier for diabetes prediction, we applied two performance measures, the confusion matrix and accuracy, which are discussed as follows:
Confusion matrix: a matrix that gives a complete description of the model's performance by comparing the predicted classes with the actual classes. Here,
TP: True positive
FP: False positive
TN: True negative
FN: False negative
Fig 6.1 Actual values
The following performance metrics are used to evaluate the performance of the various algorithms.
True positive (TP): the person has the disease, and the prediction is also positive.
True negative (TN): the person does not have the disease, and the prediction is also negative.
False positive (FP): the person does not have the disease, but the prediction is positive.
False negative (FN): the person has the disease, but the prediction is negative.
TP and TN are used to calculate the accuracy rate, and the error rates can be computed from the FP and FN values.
True positive rate = TP / (TP + FN), i.e., TP divided by the total number of persons who actually have the disease.
False positive rate = FP / (FP + TN), i.e., FP divided by the total number of persons who actually do not have the disease.
Precision = TP / (TP + FP), i.e., TP divided by the total number of persons whose prediction result is yes.
Accuracy = (TP + TN) / (TP + TN + FP + FN), i.e., the ratio of the number of correct predictions to the total number of predictions made. We chose accuracy as the metric for measuring the performance of all the models.
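The definitions above translate directly into code. A sketch using scikit-learn's `confusion_matrix`, which for labels [0, 1] returns the counts in the order TN, FP, FN, TP once flattened; the function name `metrics_from_labels` is hypothetical, and each rate falls back to 0.0 when its denominator is zero.

```python
from sklearn.metrics import confusion_matrix


def metrics_from_labels(y_true, y_pred):
    """Compute TP/TN/FP/FN and the rates defined above for a 0/1 classifier."""
    # With labels=[0, 1], ravel() yields the counts in the order TN, FP, FN, TP.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "TP": tp, "TN": tn, "FP": fp, "FN": fn,
        "TPR": tp / (tp + fn) if tp + fn else 0.0,
        "FPR": fp / (fp + tn) if fp + tn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }
```

For example, with actual labels `[1, 1, 1, 0, 0, 0, 0, 1]` and predictions `[1, 0, 1, 0, 0, 1, 0, 1]`, there are 3 TP, 3 TN, 1 FP and 1 FN, so the true positive rate, precision and accuracy are all 0.75 and the false positive rate is 0.25.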
CHAPTER 7
RESULTS AND DISCUSSION
In this work, we compared machine learning classification algorithms developed for the early-stage prediction of diabetes. We used 70% of the data for training and 30% for testing. With this split, the Random Forest classifier achieved the highest accuracy on the dataset, 99%. The results of all the implemented classifiers are compared below.
Fig 7.1 Results
Create a User Interface for Accessibility
The last part of the project is the creation of a user interface for the model. This user interface is used to enter unseen data, which the model reads in order to make a prediction. The user interface is built as a web app using HTML and CSS.
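The path from the HTML form to the model can be sketched as a small WSGI application, since the software requirements list a WSGI server and a Python backend. Everything here is an assumption for illustration: the form field names (the eight PIMA features), the `make_app` helper, and the model object, which can be any fitted scikit-learn-style classifier with a `predict` method. Only Python's standard library is used.

```python
from urllib.parse import parse_qs

# Hypothetical form field names, one per PIMA input feature, in training-column order.
FIELDS = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
          "Insulin", "BMI", "DiabetesPedigreeFunction", "Age"]


def make_app(model):
    """Return a WSGI app that reads POSTed form values and returns the model's prediction."""
    def app(environ, start_response):
        try:
            size = int(environ.get("CONTENT_LENGTH") or 0)
        except ValueError:
            size = 0
        form = parse_qs(environ["wsgi.input"].read(size).decode("utf-8"))
        try:
            # One row in the same feature order the model was trained on.
            row = [float(form[name][0]) for name in FIELDS]
        except (KeyError, ValueError):
            start_response("400 Bad Request", [("Content-Type", "text/plain")])
            return [b"All eight numeric fields are required."]
        label = int(model.predict([row])[0])
        text = "Diabetic" if label == 1 else "Non-diabetic"
        start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
        return [text.encode("utf-8")]
    return app
```

For local testing, `wsgiref.simple_server.make_server("127.0.0.1", 8000, make_app(model)).serve_forever()` serves the app, and the HTML form would POST its fields to it. In a real deployment, the same scaling used during training must also be applied to `row` before calling `predict`.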
Project View
User Interface
Inputting Values
Predicting Result
CHAPTER 8
RESULTS AND ANALYSIS
The project predicts the onset of diabetes in a person based on the relevant medical details collected. When the person enters all the required medical data in the online web portal, the data is passed to the trained model, which predicts whether the person is diabetic or non-diabetic. The model makes this prediction with an accuracy of 80%, which is fairly good and reliable. The following figure shows the basic UI form, which requires the user to enter specific medical data fields. These parameters help determine whether the person is prone to developing diabetes. Our research has the added benefit of an associated web app, which makes the model more user-friendly and easier for a novice to understand.
CHAPTER 9
CONCLUSION
The objective of the project was to develop a model that could identify patients with diabetes who are at high risk of hospital admission. Predicting the risk of hospital admission is a fairly complex task, because many factors influence this process and its outcome. There is currently a serious need for methods that can increase healthcare institutions' understanding of what is important in predicting hospital admission risk.
This project is a small contribution to the existing methods of diabetes detection: it proposes a system that can be used as an assistive tool for identifying patients at greater risk of being diabetic. The project achieves this by analyzing key factors such as the patient's blood glucose level and body mass index, using various machine learning models and retrospective analysis of patients' medical records. It predicts the onset of diabetes in a person based on relevant medical details collected through a web application. When the user enters all the required medical data in the online web application, the data is passed to the trained model, which predicts whether the person is diabetic or non-diabetic. The model is developed using an artificial neural network consisting of six dense layers in total, each of which contributes to the efficient working of the model. The model makes its prediction with an accuracy of 98%, which is fairly good and reliable.
CHAPTER 10
REFERENCES
1. Sahoo, K.S., et al.: An evolutionary SVM model for DDOS attack detection in software defined networks. IEEE Access 8, 132502–132513 (2020)
2. Sahoo, K.S., et al.: A machine learning approach for predicting DDoS traffic in software defined networks. In: 2018 International Conference on Information Technology (ICIT). IEEE (2018)
3. Jakka, A., Vakula Rani, J.: Performance evaluation of machine learning models for diabetes prediction. Int. J. Innov. Technol. Explor. Eng. (IJITEE) 8(11) (2019). ISSN: 2278-3075
4. Zou, Q., Qu, K., Luo, Y., Yin, D., Ju, Y., Tang, H.: Predicting diabetes mellitus with machine learning techniques. Bioinform. Comput. Biol. Sect. J. Front. Genet., published: 06 2018