CHAPTER I
INTRODUCTION
1.1 PREAMBLE
Diabetes is a condition in which blood glucose, often known as blood sugar, is
abnormally high. Although there is no cure for diabetes, steps can be taken to manage it and
stay healthy. Diabetes comes in a variety of forms that share many symptoms. However,
earlier approaches to diabetes prediction have had several drawbacks. To overcome these
limitations, we employ machine learning techniques to predict diabetes accurately and quickly.
This chapter deals with the importance of detecting diabetes, the types of diabetes, and the
objective and significance of our work.
1.2 DIABETES
Diabetes mellitus is another name for diabetes [1]. Diabetes is a condition in which our
blood glucose, commonly known as blood sugar, is too high. Because glucose is a significant
source of energy for the cells that make up our muscles and tissues, it is essential to our health.
It’s also the brain’s primary fuel source [2].
Blood glucose, which comes from the food we eat, is the main source of energy [3].
Insulin is a hormone produced by the pancreas that acts as the key allowing glucose
from our diet to move from the bloodstream into our cells for energy production [4]. In short,
insulin aids glucose absorption into cells.
When our bodies don’t produce enough insulin or don’t use it properly, glucose lingers
in our blood and doesn’t reach our cells [5]. Having too much glucose in the blood can lead
to health problems over time. Although there is no cure for diabetes, we can take steps to manage
it and stay healthy. Diabetes is sometimes referred to as “a touch of sugar” or “borderline diabetes” [6].
These terms imply that someone does not really have diabetes or has a milder form of it;
however, every case of diabetes is serious.
The number of diabetic patients has risen dramatically in recent years, owing primarily
to the ageing population and irregular Western eating patterns [7]. According to the World Health
Organization, diabetes affects 346 million people worldwide, with type 2 diabetes being the
most common form. According to the International Diabetes Federation, the number of people
living with diabetes is steadily rising: in 2013, 382 million people had diabetes, and by 2035
that figure is predicted to rise to 592 million.
With the advancement of information technology, we may be able to utilize the large amount
of data in the health care industry to assist doctors in diagnosing diabetes [8].
According to the sixth edition of the IDF (International Diabetes Federation) Diabetes
Atlas, there are an estimated 382 million people worldwide who have diabetes with rapid
increases reported in nations around the world, and Type 2 diabetes accounts for the majority
of all diabetes cases. As a result, Type 2 diabetes has become a major global health concern.
Millions of lives could be saved if diabetes could be diagnosed and prevented as
early as feasible [9][10].
1.3 PROBLEM STATEMENT

For treatment, doctors rely on established clinical knowledge. When such knowledge is
lacking, conclusions are drawn after a small number of cases have been investigated. However,
this procedure takes time, whereas patterns can be detected faster using machine learning.
Machine learning, in turn, requires large data sets to train on, which must be comprehensive,
unbiased, and of high quality. Depending on the disease and on how many people are affected
by it, only a very limited quantity of data may be available, and researchers may have to wait
for new data to be generated.
Machine learning also requires a significant amount of time to reach a high level of
accuracy and relevance. Interpretation of results and high susceptibility to error are two
main challenges of diabetes prediction using machine learning.
The Prediction of Diabetes Empowered with Fused Machine Learning aims to address
the pressing need for accurate and early identification of diabetes. This research endeavors to
integrate diverse machine learning techniques, optimizing predictive models with
comprehensive health data. By leveraging a fused approach, the study seeks to enhance
prediction accuracy, thereby enabling timely interventions and personalized healthcare
strategies for individuals at risk of or affected by diabetes.
Predicting diabetes early allows patients to be treated before the condition becomes
critical. We therefore apply Fused Machine Learning to predict diabetes.
1.4 PURPOSE OF THIS WORK
Early prediction of diabetes means identifying whether someone is likely to develop
diabetes before the disease actually appears. This matters because early prediction makes it
possible to prevent health problems before they start.
By identifying individuals at risk in the nascent stages of diabetes, we empower timely
and targeted interventions, ushering in preventive measures that encompass lifestyle
adjustments and dietary changes. This proactive approach not only averts the onset of diabetes
but also mitigates the risk of severe complications such as cardiovascular diseases, kidney
failure, and vision impairment. Beyond individual health, early prediction optimizes healthcare
resource allocation, fostering efficiency and reducing the burden on healthcare systems. It
facilitates informed decision-making, enhancing the quality of life for those at risk, and
contributes to cost savings by prioritizing preventive strategies over high-cost complications.
Additionally, the data gleaned from early prediction efforts informs public health planning,
enabling tailored interventions, personalized treatment plans, and research that advances our
understanding of diabetes causes and progression. Ultimately, early diabetes prediction is a
catalyst for health equity, ensuring that diverse populations have access to timely detection,
preventive measures, and a more equitable distribution of healthcare resources.
1.5 SIGNIFICANCE OF THE WORK

One benefit of using machine learning to predict diabetes is that the model improves
continuously as more data becomes available. Another benefit is that the process is fully
automated and requires no human intervention. Many companies are employing these techniques
to improve medical diagnostics and early diabetes prediction, which indicates that these
techniques are well suited to predicting diabetes.
1.6 BACKGROUND STUDY

A background study on diabetes involves exploring the risk factors, pathophysiology, and
current trends related to the condition. Here's a comprehensive overview:
1.6.1 Risk Factors
• Genetic Predisposition: Family history plays a significant role in both type 1 and type
2 diabetes.
• Lifestyle Factors: Sedentary behavior, unhealthy diet, and obesity contribute to the
development of type 2 diabetes.
• Age and Ethnicity: The risk of type 2 diabetes increases with age, and certain ethnic
groups are more susceptible.
1.6.2 Pathophysiology
• In type 1 diabetes, the immune system mistakenly targets and destroys beta cells in the
pancreas, leading to insulin deficiency.
• Type 2 diabetes involves insulin resistance, where cells do not respond effectively to
insulin, coupled with insufficient insulin production.
1.6.3 Complications
• Diabetes can lead to a range of complications affecting various organs, including the
eyes (diabetic retinopathy), kidneys (diabetic nephropathy), nerves (diabetic
neuropathy), and cardiovascular system (increased risk of heart disease and stroke).
1.6.4 Current Trends and Innovations
• Advances in diabetes management include the development of continuous glucose
monitoring (CGM) systems, closed-loop insulin delivery systems (artificial pancreas),
and personalized medicine approaches.
• Research is ongoing to explore novel therapies, genetic factors, and the potential role
of artificial intelligence in predicting and managing diabetes.
1.6.5 Public Health Impact
• Diabetes places a significant burden on healthcare systems globally, with associated
costs related to treatment, complications, and lost productivity.
• Public health initiatives focus on prevention, early detection, and lifestyle interventions
to mitigate the impact of diabetes.
1.6.6 Challenges and Future Directions
• Challenges in diabetes management include the rising prevalence, disparities in access
to care, and the need for sustainable and scalable interventions.
• Future directions involve continued research into prevention, early detection, and
innovative technologies to enhance diabetes care and improve outcomes.
A thorough background study on diabetes provides the foundation for understanding the
multifaceted aspects of the condition, informing research, policy, and clinical practices aimed
at addressing this global health challenge.
1.7 NON-TECHNICAL EVALUATION
Diabetes is one of the major health problems all over the world. People with diabetes
have a higher chance of developing major health issues. High blood glucose levels over time
can lead to significant disorders of the heart and blood vessels, as well as the eyes, kidneys,
nerves, and teeth. In addition, people with diabetes are more likely to contract infections.
In almost every developed country, diabetes is a leading cause of cardiovascular illness,
blindness, kidney failure, and lower limb amputation. Many studies have suggested traditional
methods as predictors. Data mining predicts future outcomes through predictive modelling,
the process by which a model is created to predict an outcome. Algorithms used for the
prediction of diabetes include Decision Tree J48, the KNN classifier, Random Forest, and
Support Vector Machine [11].
Machine learning is an area of artificial intelligence that uses statistical analysis and
is recognized as a promising approach to prediction based on a given diabetes dataset. Most
earlier studies concentrated on heart disease and cancer. For diabetes detection, a
comparative analysis of several algorithms focused on the classification of diabetes mellitus
data. The algorithms have mostly concentrated on the identification of pre-diabetes, which
has been identified as a fairly strong indicator of the future development of diabetes. In a
recent study in this direction, the machine learning algorithms SVM and ANN are used to
predict pre-diabetes [12].
Diabetes classification is a critical and difficult topic in diabetes diagnosis and
data interpretation. Classification models are also crucial when dealing with complicated
healthcare data distributions. Most medical data is non-linear and non-normal and has an
inherent correlation structure. As a result, traditional and frequently used classification
approaches such as LDA, QDA, and NB are unable to classify the data appropriately. It is
also well understood that having a big training dataset does not ensure better
classification accuracy. Gaussian process classification (GPC) is one technique used to
predict diabetes [13].
Data mining is the process of discovering useful knowledge and interesting patterns in
large amounts of data in order to build a structure that can meaningfully interpret the data.
Data mining uses many machine learning techniques, including supervised, unsupervised, and
semi-supervised learning. A hybrid intelligent method that combines Principal
Component Analysis, CART, and EM has been proposed for disease diagnosis and prediction
using machine learning techniques [14].
Based on machine learning algorithms, it is possible to distinguish between diabetic
individuals and those who are not. In addition to using a feature vector drawn from both the
diabetes-affected and unaffected patient distributions, a best-feature selection step is
applied so that only a small number of features is needed, which makes the solution
deployable in real time on, for example, integrated devices [15].
As information technology advances, mobile health (mHealth) technologies can be used for
patient self-management, patient diagnosis, and determining the likelihood of being afflicted
by a disease. Diabetes mellitus is a chronic lifestyle disease that affects millions
of people throughout the world. To prevent or control diabetes, certain mobile applications
keep track of calories, sugar consumed, medicine doses, blood glucose, blood pressure, and
exercise. Because no app has been shown to predict the disease, an mHealth
monitoring system is used [16].
A hybrid prediction model is used to predict Type 2 diabetes with the help of K-means
and a decision tree. K-means is one of the most popular clustering algorithms; it follows a
simple procedure to partition a given data set into a chosen number of clusters. The decision
tree is an important algorithm in data mining whose main advantage is that it is readily
interpretable. Hence, these models could assist doctors and medical
professionals in making decisions and improve prediction [17].
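As a minimal sketch of the clustering half of such a hybrid model, the following pure-Python code runs K-means (Lloyd's algorithm) on a handful of points. The (glucose, BMI) readings are invented for illustration, the function name `kmeans` is hypothetical, and seeding the centroids with the first k points is a simplifying assumption (k must not exceed the number of points); the code is not taken from the cited study.

```python
import math

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm on a list of numeric tuples; returns (centroids, labels)."""
    # Deterministic start: the first k points serve as initial centroids
    # (an illustrative assumption; random or k-means++ seeding is more usual).
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels

# Hypothetical (glucose, BMI) readings, invented purely for illustration.
readings = [(85, 22.0), (168, 35.5), (90, 24.1),
            (172, 33.8), (88, 23.0), (180, 36.2)]
centroids, labels = kmeans(readings, k=2)
print(labels)
```

In a hybrid pipeline of the kind described, the cluster labels could then be passed to a decision tree as an additional input or used to filter the training data; which of these the cited work does is not stated here.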
Diabetes is influenced by a variety of factors such as height, weight, hereditary factors,
and insulin, but the most important aspect to consider is sugar concentration. The best approach
to avoiding complications is to identify the problem early. Diseases can be diagnosed using
various machine learning approaches such as J48, SVM, Naive Bayes, decision trees, and
decision tables. These classification algorithms are applied to and evaluated on the Pima
Indian Diabetes Database (PIDD) to predict diabetes in patients [18].
Predictive analysis draws on machine learning algorithms, data mining, and
statistical methods to find knowledge and predict future events. Predictive models are
constructed with supervised learning algorithms, which take a set of input data together
with the corresponding outputs and build a model to make predictions. Descriptive models are
built with unsupervised learning, which is used for transactional data. Semi-supervised
learning uses both labeled and unlabeled data in the training dataset. Decision Tree, K-means,
and Logistic Regression are popular techniques in machine learning [19].
Nowadays, chronic diseases have become a threat in all countries; reportedly, one third
of the people in every country suffer from chronic diseases. In the medical field, many
chronic disease datasets are collected and stored, and data mining helps in the early
detection of disease. A vast amount of healthcare data is accessible but is not being mined
efficiently and accurately enough to uncover hidden information for effective decision
making. The suggested technology uses data mining techniques for the early detection of
long-lasting diseases [20].
Type 2 diabetes has become one of the biggest challenges among diabetic individuals.
Many studies have employed data mining techniques, neural network approaches,
and genetic algorithms to predict type 2 diabetes. In this research, several machine learning
classification techniques are employed, including logistic regression, support vector
machine, and random forest. The Pima Indian Diabetes data set is used. With these
algorithms, type 2 diabetes can be predicted at an early stage with high accuracy and
precision [21].
Today, advances in clinical healthcare services are supported by digitalization in the
healthcare industry. Traditional techniques of studying patient outcomes in forecasting and
diagnosing chronic diseases are being replaced by technologies that capture the most relevant
insights from medical data by applying predictive analysis with the highly valuable tool of
machine learning. The use of machine learning methods in this model for diagnosis
demonstrates its capacity to achieve high classification accuracy while reducing computing
time [22].
Naive Bayes is useful for classification because it treats the attributes as independent
and, as a result, computes the probability of each attribute separately. The Naive Bayes
algorithm is therefore employed in the implementation for prediction. This classifier
determines whether or not a person has diabetes based on the resulting probabilities. The
severity of diabetes is determined by the range of each feature. The prediction is based on
the given default cut-offs, fixed prior to all calculations, so the forecast indicates the
level of diabetes risk. The results reveal whether or not a person is at risk of developing
diabetes [23].
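The per-attribute probability computation described above can be sketched as a small Gaussian Naive Bayes classifier. This is an illustrative assumption rather than the cited implementation: the Gaussian likelihood, the function names `fit_gaussian_nb` and `predict`, and the (glucose, BMI) rows are all invented for the example.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(rows, labels):
    """Estimate a prior plus a per-feature mean and variance for each class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        stats = []
        for column in zip(*members):  # one column per attribute
            mean = sum(column) / len(column)
            # Small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (len(members) / len(rows), stats)
    return model

def predict(model, row):
    """Return the class with the highest log-posterior.

    Attributes are treated as independent, so their log-likelihoods add up.
    """
    def log_posterior(label):
        prior, stats = model[label]
        total = math.log(prior)
        for value, (mean, var) in zip(row, stats):
            total += (-0.5 * math.log(2 * math.pi * var)
                      - (value - mean) ** 2 / (2 * var))
        return total
    return max(model, key=log_posterior)

# Hypothetical (glucose, BMI) rows; 1 = diabetic, 0 = non-diabetic.
X = [(85, 22.0), (90, 24.1), (88, 23.0), (168, 35.5), (172, 33.8), (180, 36.2)]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)
print(predict(model, (92, 25.0)), predict(model, (175, 35.0)))
```

Working in log space avoids numerical underflow when many attribute probabilities are multiplied together.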
1.8 UNADDRESSED CHALLENGES
The challenges that remain unresolved in the field of diabetes prediction empowered with
fused machine learning include:
• Data Privacy Concerns: Integration of machine learning for diabetes prediction may
raise privacy issues due to the sensitive nature of health data.
• Complex Model Interpretability: Fused machine learning models can be intricate,
making it challenging to interpret and explain their predictions, limiting user
understanding.
• Resource Intensiveness: Implementing and maintaining a fused machine learning
system for diabetes prediction may demand substantial computational resources,
impacting scalability.
The proposed method aims to address these challenges in the following ways:
• Data Privacy: The proposed method can incorporate privacy-preserving techniques
such as federated learning or differential privacy to ensure the protection of sensitive
health data while still enabling effective diabetes prediction.
• Model Interpretability: The proposed method can utilize techniques such as model
explainability and feature importance analysis to enhance the interpretability of the
fused machine learning models, providing insights into the factors contributing to
diabetes prediction.
• Resource Optimization: The proposed method can explore optimization techniques and
model simplification approaches to reduce the computational resources required for
implementing and maintaining the fused machine learning system, thereby improving
scalability.
By addressing these challenges, the proposed method aims to enhance the privacy,
interpretability, and scalability of diabetes prediction empowered with fused machine learning,
making it more effective and applicable in real-world healthcare settings.
1.9 AIM AND OBJECTIVE

The aim is to enhance the accuracy of diabetes risk prediction by integrating clinical
data with sleep health and lifestyle data.
The first objective of this work is to develop a novel predictive model that leverages a
fused machine learning approach to enable more accurate and early identification of
individuals at risk of diabetes.
The second objective is to optimize predictive models with comprehensive health data,
including clinical, sleep health, and lifestyle data, to enhance the accuracy of diabetes
risk prediction.
The third objective is to enable timely interventions and personalized healthcare
strategies for individuals at risk of or affected by diabetes by leveraging the enhanced
prediction accuracy of the fused machine learning approach.
1.10 APPLICATIONS
The "Prediction of Diabetes Empowered with Fused Machine Learning" project has several
potential applications in the healthcare and medical domain, including:
• Early Detection and Diagnosis: The predictive model developed through this project
can be used for early detection and diagnosis of diabetes, enabling healthcare
professionals to identify individuals at risk and initiate timely interventions.
• Personalized Healthcare Interventions: The comprehensive insights provided by the
fused machine learning approach can support the development of personalized
healthcare interventions for individuals at risk of or affected by diabetes, leading to
tailored treatment plans and improved patient care.
• Real-time Monitoring and Intervention: The empowered prediction system's real-time
applicability can facilitate continuous monitoring of diabetes risk factors and enable
timely interventions, contributing to proactive and targeted healthcare strategies.
• Research and Clinical Studies: The advanced predictive model can be utilized in
research studies and clinical trials to analyze diabetes risk factors, evaluate treatment
outcomes, and contribute to the advancement of diabetes management strategies.
• Public Health Initiatives: The insights generated by the fused machine learning
approach can inform public health initiatives aimed at addressing diabetes prevalence,
risk factors, and preventive measures within communities and populations.
Overall, the applications of this project extend to various aspects of diabetes management,
including early detection, personalized care, real-time monitoring, research, and public health
interventions, contributing to improved healthcare outcomes for individuals at risk of or
affected by diabetes.
1.11 STRUCTURAL OUTLINE
The report is organized as follows:
Chapter 1 – This chapter contains an introduction, problem statement, aim and objective
of the project, methodology, significance of the work, and conclusion.
Chapter 2 – This chapter contains a literature review and a comparison of various
techniques.
1.12 SUMMARY
This chapter gives a brief overview of diabetes prediction with the help of machine
learning. It covers the problem statement, the goal of the project, a brief introduction to
the methodology used, the importance of the project, and the organization of the report,
which outlines the content of each chapter. The following chapter is the Literature Review,
which examines journal papers relevant to the problem statement and assesses the work and
material they present in order to understand the current state of the problem in this
domain.
CHAPTER II
LITERATURE SURVEY
2.1 PREAMBLE
In the previous chapter, we went through the problem statement, the objective of the
work, the background study, and concepts such as diabetes and its classification. The goal of
the literature review is to form a clear picture of the current problem. In this literature
survey, we examine a number of reference papers related to this project. A literature
survey helps in gaining knowledge about a particular application domain and in identifying
the advantages and disadvantages of each paper.
2.2 LITERATURE REVIEW

2.2.1 Machine Learning Approaches
Francesco Mercaldo et al. [24] proposed a new method for the diagnosis of diabetes
based on machine learning techniques. Type 2 diabetes is most commonly seen in
people over the age of 35 and is associated with obesity, blood glucose levels, and plasma
concentration, among other factors. This work proposes the support vector machine and
k-means clustering algorithms. According to medical studies, diabetes has been on the rise
for decades and is not going away. Based on this, the authors proposed employing a set of WHO
characteristics to determine whether a patient is diabetic and how many people are affected
by the disease. The data were gathered on Pima Indians who live in the Gila River Indian
community in southern Arizona. The dataset has eight attributes, including plasma glucose
concentration, diastolic blood pressure, number of pregnancies, BMI, diabetes pedigree
function, and age. These characteristics were chosen because they have been linked to
an increased risk of diabetes in Pima and other populations. The dataset contains 768 separate
examples, with 500 cases classified as class 0 and 268 cases classified as class 1.
The classification analysis is carried out with the help of the Weka software tool.
Three types of analysis are performed: descriptive, hypothesis testing, and classification
analysis. The machine learning methods provide high precision and recall values.
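The class distribution quoted above implies a simple reference point against which reported accuracies can be read: a trivial model that always predicts the majority class. The arithmetic sketch below uses only the counts stated in the text; the variable names are illustrative.

```python
# Class counts for the Pima Indians dataset as stated in the text.
negatives = 500  # class 0 (non-diabetic)
positives = 268  # class 1 (diabetic)
total = negatives + positives

# Accuracy of a trivial model that always predicts the majority class.
majority_baseline = max(negatives, positives) / total
print(f"{total} samples, majority-class baseline accuracy = {majority_baseline:.3f}")
```

A classifier evaluated on this dataset therefore needs to exceed roughly 65% accuracy to improve on always predicting class 0, which is one reason precision and recall are reported alongside accuracy.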
Deepti Sisodia et al. [25] introduced a different approach for diabetes prediction
using classification algorithms. Diabetes is a disease that causes blood sugar levels to rise
and is one of the most fatal chronic diseases. If diabetes is left undiagnosed and untreated,
it can lead to many complications. Diabetes is more commonly encountered in young and senior
citizens. It is therefore important to predict diabetes at an early stage, and the rise
of machine learning approaches is addressing this essential issue. The major goal of this
research is to create a model that can accurately predict the likelihood of diabetes in
patients. Three machine learning classification techniques, decision tree, support vector
machine, and naïve Bayes, are used to detect diabetes at an early stage. The dataset used
was the Pima Indian Diabetes database from the UCI machine learning repository.
Precision, accuracy, F-measure, and recall are all used to evaluate the performance of the
three methods. Naïve Bayes had the highest accuracy of the three classification techniques,
at 76.30 percent. The authors conclude that diabetes can be predicted early on.
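For reference, the four evaluation measures named above can be computed from the counts of a binary confusion matrix. The sketch below is illustrative: the function name `classification_metrics` and the example counts are assumptions, not figures from the cited study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F-measure from confusion counts.

    tp/fp/fn/tn are true positives, false positives, false negatives and
    true negatives for the positive (diabetic) class.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, accuracy is 70/100, precision is 40/50, and recall is 40/60, which shows how a model can look acceptable on accuracy while still missing a third of the positive cases.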
Aishwarya Mujumdar et al. [26] used machine learning models to suggest a
new method for disease prediction. Machine learning, a pipeline model, and logistic regression
are used in this process. These strategies and procedures rely on current and historical data to
uncover knowledge and forecast future events. The goal of the pipeline is to ensure that all
of its steps are constrained to the data available for evaluation, such as the training
dataset or each cross-validation fold, in order to improve classification accuracy. The
authors also created a diabetes prediction pipeline model.
Jobeda Jamal Khanam et al. [27] proposed a comparison of machine learning models for
diabetes prediction. In this research, data mining, machine learning (ML), and neural
network (NN) strategies are combined to predict diabetes. The Pima Indian Diabetes (PID)
dataset from the UCI Machine Learning Repository was used in this work. Data mining and
machine learning have developed into reliable supporting technologies in the medical area
in recent years. Using the ANN approach, the study achieved 75.7 percent accuracy. The
authors also used the decision tree model, bootstrap aggregating, and adaptive boosting.
For data preprocessing, they used the WEKA tool, and feature reduction was applied to remove
three features. Machine learning is used to automate diabetes prediction, while data mining
is used to pre-process and extract key parts from healthcare data. The authors regard data
mining as essential for making accurate decisions. Overall, the approach provides greater
precision.
Yunlei Sun et al. [28] proposed a method that predicts diabetes using machine learning
algorithms, specifically a CNN model combined with a BN layer. The study’s key contribution
is twofold: (1) the CNN method is used to address the problem of how to perform convolution
on one-dimensional data sets whose features are unrelated to one another; (2) the CNN model
is combined with the BN layer to reduce gradient dispersion, speed up training, and improve
model accuracy. The model can also be applied to other one-dimensional unrelated data sets,
such as other electronic medical record information and related surveys. Using this model
improves training speed and accuracy.
Qian Wang et al. [29] sought a classification algorithm for diabetes mellitus that
addresses the difficult issues of missing values and class imbalance that are present in the
diabetes dataset. The algorithm covers both the data pre-treatment and classification steps.
By reducing class imbalance and compensating for missing values, a high-quality
dataset was produced. K-fold cross-validation is used to confirm the reliability of the Random
Forest classifier. The suggested DMP_MI algorithm has demonstrated excellent potential for
diabetes prediction by outperforming previous algorithms in terms of accuracy and other
classifier performance measures.
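K-fold cross-validation, mentioned above as the reliability check, splits the samples into k folds and uses each fold once as the test set. The following minimal sketch shows only the index bookkeeping; the helper name `k_fold_indices` is hypothetical, and the folds are contiguous and unshuffled for clarity, whereas practical use would typically shuffle or stratify first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs that cover range(n_samples) in k folds.

    Folds are contiguous and differ in size by at most one sample.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        # Every sample outside the current fold is used for training.
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop

folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

A classifier would be trained and scored once per fold, and the k scores averaged, so that every sample contributes to the test estimate exactly once.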
Therence Nibareke et al. [30] proposed a Big Data and machine learning approach for
analytics on airline delays and diabetes prediction. Three machine learning algorithms
(linear regression, naive Bayes, and decision tree) were used to tackle the problem. The
accuracy of these three models is the same (0.766), but the decision tree outperforms the
other two models with the highest score (1) and the lowest error (0). The article also
performed analytics on flight datasets in order to aid decision-making and forecast flight
delays.
N. Sneha et al. [31] evaluated the early prediction of diabetes mellitus using optimal
feature selection. The machine learning methods used include
Support Vector Machine, Random Forest, Naive Bayes, Decision Tree, and K-Nearest
Neighbors. With 98.20% and 98.00% accuracy, respectively, the decision tree and
random forest approaches are shown to have the best accuracy. The study also generalizes the
selection of the dataset's top attributes, which improves classification accuracy.
Mariwan Ahmed Hama Saeed [32] classified type 2 diabetes by employing up-sampling and
machine learning methods. The machine learning classifiers used in
this study include gradient boosting, AdaBoost, decision trees, and extra trees. The models
analyze the PIMA Indian Diabetes dataset (PIMA) and the Behavioural Risk Factor
Surveillance System (BRFSS) diabetes dataset to classify patients as having a positive or
negative diagnosis. 80% of each dataset is used for training, and the remaining 20% is used
for testing. The extra trees classifier outperformed the other models, with an area under
the curve of 0.96 for the PIMA dataset and 0.99 for the BRFSS dataset.
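The area under the ROC curve is a value between 0 and 1 and can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition; the function name `roc_auc` and the scores are invented for illustration, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical classifier scores and true labels (1 = diabetic).
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
print(f"AUC = {auc:.3f}")
```

Here 7 of the 9 positive-negative pairs are ordered correctly, so the AUC is 7/9; a value of 0.5 corresponds to random ranking and 1.0 to perfect separation.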
Nikos Fazakis et al. [33] suggested using machine learning tools for long-term
type 2 diabetes risk prediction. The system, which applies, evaluates,
and combines specific KDD process components, is focused on estimating the likelihood of
developing diabetes. Specifically, dataset generation, feature selection, and classification
are considered using a variety of supervised machine learning (ML) models. To improve the
prediction of diabetes, the authors recommend the ensemble WeightedVotingLRRFs machine
learning model, which has an Area Under the ROC Curve (AUC) of 0.884.
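The general idea behind a weighted voting ensemble can be sketched as a weighted average of the positive-class probabilities produced by the member models. The weights, probabilities, threshold, and the function name `weighted_vote` below are hypothetical; the exact weighting scheme of the cited model is not reproduced here.

```python
def weighted_vote(probabilities, weights, threshold=0.5):
    """Combine per-model positive-class probabilities by weighted averaging.

    Returns the combined probability and the resulting 0/1 decision.
    """
    if len(probabilities) != len(weights):
        raise ValueError("one weight is required per model")
    combined = sum(p * w for p, w in zip(probabilities, weights)) / sum(weights)
    return combined, int(combined >= threshold)

# Hypothetical outputs of three base models for one patient, for example
# one logistic regression and two random forests, with invented weights.
combined, label = weighted_vote([0.40, 0.70, 0.65], weights=[1.0, 2.0, 2.0])
print(f"combined probability = {combined:.2f}, predicted class = {label}")
```

Giving larger weights to the stronger base models lets them dominate the decision while still allowing weaker models to shift borderline cases.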
Liyan Jia et al. [34] created PE_DIM, an efficient Probabilistic Ensemble classification
technique for Diabetes that handles class Imbalance and Missing values. The technique
addresses the issues of missing values and class imbalance in order to improve
classification performance. On the Pima Indian diabetes dataset, a
combination of extreme gradient boosting, random forests, and weighted k-nearest neighbors
achieves the highest classification accuracy of 94.53%. The classifiers considered include
k-nearest neighbors, weighted k-nearest neighbors, logistic regression, decision trees,
naïve Bayes, random forests, and extreme gradient boosting.
Sajida Perveen et al. [35] proposed a predictive model for metabolic syndrome and the
onset of diabetes mellitus based on machine learning techniques. The model predicts the future
occurrence of diabetes on balanced and unbalanced datasets by combining significant risk
factors identified through logistic regression analysis with Naive Bayes and J48 decision trees.
The results showed that Naive Bayes with the K-medoids under-sampling technique was
superior to random under-sampling, over-sampling, and no sampling. The study's data, which
span the years 2003 to 2013, comprise 667,907 records, and the reported accuracy is 79%.
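Random under-sampling, the simplest of the sampling strategies compared above, can be sketched as follows. The toy records and the fixed seed are hypothetical, and this is the plain random variant, not the cluster-based variant that the study found to perform best.

```python
import random


def random_undersample(records, labels, seed=0):
    """Drop majority-class records at random until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for record, label in zip(records, labels):
        by_class.setdefault(label, []).append(record)
    target = min(len(group) for group in by_class.values())
    balanced = []
    for label, group in by_class.items():
        # Keep every minority record; sample `target` records from larger classes.
        kept = group if len(group) == target else rng.sample(group, target)
        balanced.extend((record, label) for record in kept)
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy data: eight non-diabetic (0) and two diabetic (1) records.
records = list(range(10))
labels = [0] * 8 + [1] * 2
balanced = random_undersample(records, labels)
print(len(balanced))
```

The balanced set keeps both diabetic records and two randomly chosen non-diabetic ones. The cost is that six majority-class records are discarded, which is why the choice of sampling strategy affects the results.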
Gaurav Tripathi et al. [36] carried out early prediction of diabetes mellitus using machine
learning. The four algorithms used in this model are Linear Discriminant Analysis (LDA),
K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Random Forest (RF). The
dataset used in this investigation was the Pima Indian Diabetes Database (PIDD), which records
outcomes for people with and without diabetes. The dataset has 768 entries, each with 8 key
features connected to diabetes and a class label. The results show that Random Forest (RF) has
the best accuracy, with a score of 87.66%.
Rakesh S Raj et al. [37] compared Support Vector Machine and Naive Bayes classifiers
for predicting diabetes. The paper applies Data Mining (DM) with Support Vector Machine
(SVM) and Naive Bayes (NB) classifiers. The datasets were built from the health reports of
various diabetic patients. On the sample datasets, the Naive Bayes algorithm predicts with
62.5% accuracy, while the Support Vector Machine predicts with 82% accuracy.
M. Raihan et al. [38] conducted an empirical study to predict diabetes mellitus using
K-means and hierarchical clustering techniques. Only these two well-known machine learning
techniques are employed in the research, which makes use of a diabetes dataset. The authors
suggest that the study can support the creation of an expert system that predicts diabetes more
effectively and precisely than currently available prediction tools.
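A minimal sketch of k-means, the first of the two clustering techniques used in [38], is given below. The data points are hypothetical, and the centroids are initialised from the first k points so that the run is deterministic; hierarchical clustering is not covered by this example.

```python
import math


def kmeans(points, k, iterations=10):
    """Lloyd's algorithm; centroids start at the first k points (deterministic)."""
    centroids = [tuple(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, assignments


# Hypothetical two-feature records forming two loose groups.
data = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.0), (9.0, 10.0)]
centroids, assignments = kmeans(data, k=2)
print(assignments)
```

Because clustering is unsupervised, the two clusters found here carry no diagnosis by themselves; a study would still need labelled cases to decide which cluster corresponds to diabetic patients.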
Melky Radja et al. [39] evaluated supervised machine learning algorithms for diabetes
prediction using various data set sizes. Four classification techniques are examined in this
work: Naive Bayes, SVM (SMO), decision table, and J48. In tests on the diabetes dataset with
768, 538, and 384 instances, the SVM method had the highest accuracy of the four supervised
classification algorithms.
Arwatki Chen Lyngdoh et al. [40] studied diabetes diagnosis, using machine learning
algorithms to forecast the development of diabetes. On the Pima Indians Diabetes Database,
this study compares the abilities of five supervised machine learning algorithms: K-Nearest
Neighbors, Naive Bayes, Decision Tree Classifier, Random Forest, and Support Vector
Machine. The K-Nearest Neighbors classifier achieved the highest accuracy, 76%. According
to the authors, an advantage of the study is that the same method can be used to forecast other
fatal diseases.
2.2.2 Deep Learning Approaches

Navaneeth Bhaskar et al. [41] developed automated detection of diabetes from exhaled
human breath. This work uses machine learning and deep learning techniques to generate
automatic predictions for the tested samples. The proposed medical system correctly predicted
diabetes 98.02% of the time. The model's benefit is that it works well and may be applied in
clinical settings to identify diabetes non-invasively.
Huaping Zhou et al. [42] proposed an improved Deep Neural Network (DNN) model for
diabetes prediction. The model was developed and trained on the Deep Learning Studio using
Deep Cognition AI. Two datasets are used: the diabetes type dataset, which had a best training
accuracy of 94.02174%, and the Pima Indians diabetes dataset, which had a best training
accuracy of 99.4112%. The authors stress that spotting the disease early is the key to finding
the best diabetes treatment.
Bala Manoj Kumar et al. [43] used a Deep Neural Network (DNN) classifier to predict
type 2 diabetes mellitus. The Pima Indian Diabetes (PID) dataset was obtained from the UCI
repository. The model was evaluated on its accuracy, specificity, sensitivity (recall), and
precision. It outperformed previous state-of-the-art approaches, achieving 98.16% accuracy
with a random train-test split.
Ammar Armghan et al. [44] designed a deep learning-based, highly sensitive biosensor for
synchronized diabetes identification. In this study, the biosensor detects diabetes using a
medical deep learning model. The suggested Multimodal Distance Metric Model (MDML)
achieved a 97.98% F1-score, an 89.90% diagnostic odds ratio, 96.21% accuracy, 91.53%
precision, and 94.21% recall. By identifying both types of diabetes in a single test, this
platform enables accurate and speedy diagnosis of diabetes.
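The evaluation measures quoted in this section all derive from the four confusion-matrix counts. The sketch below shows the standard definitions; the counts are hypothetical and are not taken from any of the cited studies.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (at least one case in each cell).
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # also called sensitivity
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    # Diagnostic odds ratio: odds of a positive result among the diseased
    # divided by the odds of a positive result among the healthy.
    odds_ratio = (tp * tn) / (fp * fn)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
        "diagnostic_odds_ratio": odds_ratio,
    }


# Hypothetical counts for illustration; not taken from any cited study.
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
print(metrics)
```

With these counts, precision, recall, specificity, accuracy, and F1 all equal 0.8, while the diagnostic odds ratio is 16. Unlike the other measures, the odds ratio is not bounded by 1.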
2.2.3 Hybrid Approaches

Radwa Marzouk et al. [45] developed analytical predictive models and a secure web-based
personalized diabetes monitoring system. The suggested method makes use of a number of
machine learning algorithms, including the Decision Tree (DT), Support Vector Classifier
(SVM), Random Forest (RF), Gradient Boosting (GB), Multi-layer Perceptron (MLP),
Artificial Neural Network (ANN), k-Nearest Neighbors (KNN), Logistic Regression (LR), and
Naïve Bayes (NB). The proposed analytical model is evaluated using a synthetic dataset as well
as the PIMA Diabetes Dataset. Gradient Boosting obtained 87.49% accuracy, 88% precision,
and 86% recall, while the Decision Tree Classifier also attained high accuracy. Gradient
Boosting and the Decision Tree Classifier may thus be more accurate at predicting diabetes
than the other methods.
Pratya Nuankaew et al. [46] proposed the Average Weighted Objective Distance-based
method for type 2 diabetes prediction. The suggested model is tested using two separate
open-source datasets of 392 entries each, Pima Indians Diabetes and Mendeley Data for
Diabetes. Several machine learning-based prediction techniques, including K-Nearest
Neighbors, Support Vector Machines, Random Forest, and Deep Learning, were also assessed
on both datasets. According to the comparative findings, the suggested technique achieved
accuracies of 93.22% and 98.95% on Datasets 1 and 2, respectively.
Usama Ahmed et al. [47] proposed machine learning-enabled diabetes prediction. Their
model uses a hybrid machine learning approach that employs both artificial neural network
(ANN) and support vector machine (SVM) models. These models analyze the data to determine
whether or not a diabetes diagnosis is positive. The dataset utilized in this study is divided
70:30 between the training and testing sets. The proposed fused machine learning model's
accuracy is 94.87%.
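A 70:30 division of the kind described above can be sketched as a shuffled split. The dataset size of 1000 rows and the seed are hypothetical choices for illustration; the function name is local to this example.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of `rows` and cut it into training and testing parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical dataset of 1000 row indices; the size is an assumption.
rows = list(range(1000))
train, test = train_test_split(rows)
print(len(train), len(test))
```

Shuffling before cutting prevents any ordering in the source file (for example, all diabetic cases listed first) from leaking into the split, and fixing the seed makes the reported accuracy reproducible.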
This review shows that diabetes can be predicted at an early stage using different models
and methods, such as data mining, machine learning, and deep learning, and that such early
identification should be available to everyone.
2.3 COMPARISON TABLE OF EXISTING METHODOLOGIES

| Ref No. | YEAR | AUTHOR | METHODOLOGY | MERITS | DEMERITS |
|---|---|---|---|---|---|
| [24] | 2017 | Francesco Mercaldo et al. | k-means clustering & SVM | Achieves high precision & recall values. | Limited discussion. |
| [25] | 2018 | Sisodia et al. | Naïve Bayes & Decision Tree | WEKA tool is personalized according to the requirements. | More data and algorithms are needed. |
| [26] | 2019 | Aishwarya Mujumdara et al. | Pipeline model & Logistic Regression | Finds knowledge and predicts future events. | Classification accuracy is not so high. |
| [27] | 2021 | Jobeda Jamal Khanam et al. | Logistic Regression & SVM | It gives better accuracy. | Missing value only for type 2 diabetes. |
| [28] | 2019 | Yunlei Sun et al. | CNN | High accuracy. | Not mentioned |
| [32] | 2023 | Mariwan Ahmed Hama Saeed et al. | Gradient Boosting, AdaBoost, Decision Tree & Extra Tree Classifiers | Superior performance with high area under the curve (AUC). | Not mentioned |
| [33] | 2021 | Nikos Fazakis et al. | WeightedVotingLRRFs | AI tools used to predict earlier. | Limited dataset size, limited generalizability & limited interpretability analysis. |
| [37] | 2019 | Rakesh et al. | Naïve Bayes & SVM | Relevant comparison of classification algorithms for diabetes prediction. | Limited scope, small dataset, lack of algorithm. |
| [38] | 2019 | M. Raihan et al. | K-means & Hierarchical Clustering | Provides valuable insights. | Limited number of instances and attributes are used in the dataset. |
| [39] | 2019 | Melky Radja et al. | Naïve Bayes, SVM, decision table & J48 | Comprehensive evaluation using different data set sizes and measurement variables. | Limited exploration of deep learning methods, potential lack of generalizability. |
2.4 DATASET COMPARISON

| Ref No. | YEAR | AUTHOR | DATASET |
|---|---|---|---|
| [28] | 2019 | Yunlei Sun et al. | Electronic Medical Record |
| [29] | 2019 | Qian Wang et al. | PID |
| [30] | 2020 | Therence et al. | PID |
| [31] | 2019 | N. Sneha et al. | UCI Machine Learning Repository |
| [32] | 2023 | Mariwan Ahmed Hama Saeed et al. | PID & BRFSS |
| [33] | 2021 | Nikos Fazakis et al. | ELSA database |
| [34] | 2022 | Liyan Jia et al. | PID |
| [35] | 2018 | Sajida Perveen et al. | - |
| [36] | 2020 | Gaurav et al. | PID |
| [37] | 2019 | Rakesh et al. | Different hospital data sources |
| [38] | 2019 | M. Raihan et al. | Real-time data |
| [41] | 2023 | Navaneeth Bhaskar et al. | - |
| [42] | 2020 | Huaping Zhou et al. | Diabetes dataset & PID |
| [43] | 2020 | Bala Manoj et al. | PID |
| [44] | 2023 | Ammar Armghan et al. | - |
2.5 RESULTS COMPARISON

| Ref No. | ALGORITHM | PRECISION | RECALL | F-SCORE | ACCURACY |
|---|---|---|---|---|---|
| [24] | K-NN | 0.804 | 0.794 | 0.798 | 79.42% |
| [25] | AdaBoost | - | - | 0.988 | 98.8% |
| [26] | Naïve Bayes | 0.759 | 0.763 | 0.760 | 76.30% |
| [27] | Hoeffding Tree | 0.757 | 0.762 | 0.759 | - |
| [28] | CNN | - | - | - | 97.56% |
| [32] | Extra Tree Classifier | 0.94 | 0.99 | 0.97 | 96% |
| [33] | WeightedVotingLRRFS | - | - | - | 88.8% |
| [37] | SVM | - | - | - | 82% |
| [38] | - | - | - | - | - |
| [39] | SVM | 0.74 | 0.54 | 0.76 | 77.3% |
2.6 PARAMETERS COMPARISON

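CLINICAL DATA: Glucose, BP, Skin Thickness, Insulin
LIFESTYLE DATA: Stress Level, Heart Rate, Physical Activity Level, Sleep Disorder

| Ref No. | Glucose | BP | Skin Thickness | Insulin | Stress Level | Heart Rate | Physical Activity Level | Sleep Disorder |
|---|---|---|---|---|---|---|---|---|
| [24] | | | | | | | | |
| [25] | | | | | | | | |
| [26] | | | | | | | | |
| [27] | | | | | | | | |
| [28] | | | | | | | | |
| [32] | | | | | | | | |
| [33] | | | | | | | | |
| [37] | | | | | | | | |
| [38] | | | | | | | | |
| [39] | | | | | | | | |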
2.7 GAP IDENTIFICATION FROM THE LITERATURE SURVEY
The gap identification from the literature survey reveals several areas where further
research and development are needed in the field of diabetes diagnosis using machine learning
techniques. Some of the identified gaps include:
• Limited discussion of potential applicability to other medical conditions.
• Need for more data and algorithms to improve accuracy and early identification of
diabetes.
• Lower classification and prediction accuracy in existing methods.
• Missing-value handling addressed only for type 2 diabetes, with limited datasets.
• Limited interpretability analysis, limited attention to ethical implications, and limited
consideration of dataset size and generalizability in predicting long-term risk of type 2
diabetes.
The proposed system is unique because it combines clinical data with sleep health and
lifestyle data to improve the accuracy of diabetes risk prediction. None of the existing systems
in the table use this combination of data. Therefore, the proposed system is novel and has the
potential to provide more accurate predictions of early-stage diabetes risk.
2.8 SUMMARY
This chapter summarizes the research work done on predicting diabetes at an early stage.
Various machine learning, deep learning, and hybrid approaches were reviewed and compared.
2.9 REFERENCES
[1] Mary Posonia, S. Vigneshwari, D. Jamuna Rani, “Machine Learning based Diabetes
Prediction using Decision Tree J48”, 2020.
[2] Priyanka Sonar, Prof. K. JayaMalini, “Diabetes Prediction using Different Machine
Learning Approaches”, 2019.
[3] Muhammad Azeem Sarwar, Nasir Kamal, Wajeeha Hamid, Munam Ali Shah,
“Prediction of Diabetes using Machine Learning Algorithms in Healthcare”, 2018.
[4] Md. Tanvir Islam, M. Raihan, Nasrin Aktar, Md. Shahabub Alam, Romana Rahman
Ema, Tajul Islam, “Diabetes Mellitus Prediction using Different Ensemble Machine
Learning Approaches”, 2020.
[5] Md Shafiqul Islam, Marwa K. Qaraqe, Hasan T. Addas, Madhav Erraguntla,
Muhammad Abdul-Ghani, “The Prediction of Diabetes Development: A Machine
Learning Framework”, 2020.
[6] Md. Tanvir Islam, M. Raihan, Fahmida Farzana, Nasrin Aktar, Promila Ghosh, Sajib
Kabiraj, “Typical and Non-Typical Diabetes Disease Prediction using Random Forest
Algorithm”, 2020.
[7] S. M. Mahedy Hasan, Md. Fazle Rabbi, Arifa Islam Champa, Md. Asif Zaman, “An
Effective Diabetes Prediction System Using Machine Learning Techniques”, 2020.
[8] Narendra Mohan, Vinod Jain, “Performance Analysis of Support Vector Machine in
Diabetes Prediction”, 2020.
[9] G. A. Pethunachiyar, “Classification of Diabetes Patients using Kernel Based Support
Vector Machines”, 2020.
[10] Veena Vijiayan V, Anjali C, “Prediction and Diagnosis of Diabetes Mellitus –A
Machine Learning Approach”, 2015.
[11] J. Pradeep Kandhasamy, S. Balamurali, “Performance Analysis of Classifier Models
to Predict Diabetes Mellitus”, 2015.
[12] Z. Tafa, N. Pervetica and B. Karahoda, “An intelligent system for diabetes
prediction”, 2015.
[13] Md. Maniruzzaman, Nishith Kumar, Md. Menhazul Abedin, Md. Shaykhul Islam,
Harman S. Suri, Ayman S. El-Baz, Jasjit S. Suri, “Comparative Approaches for
Classification of Diabetes Mellitus Data: Machine Learning Paradigm”, Computer
Methods and Programs in Biomedicine, 2017.
[14] Mehrbakhsh Nilashi, Othman bin Ibrahim, Hossein Ahmadi, Leila Shahmoradi,
“An analytical method for diseases prediction using machine learning techniques”,
2017.
[15] N. S. Khan, M. H. Muaz, A. Kabir and M. N. Islam, “Diabetes Predicting mHealth
Application Using Machine Learning”, 2017.
[16] W. Chen, S. Chen, H. Zhang and T. Wu, “A hybrid prediction model for type 2 diabetes
using K-means and decision tree”, 2017.
[17] M. Chen, J. Yang, J. Zhou, Y. Hao, J. Zhang and C. Youn, “5G-Smart Diabetes: Toward
Personalized Diabetes Diagnosis with Healthcare Big Data Clouds”, 2018.
[18] S. Ganiger and K. M. M. Rajashekharaiah, “Chronic Diseases Diagnosis using Machine
Learning”, 2018.
[19] Kevin Plis, Razvan Bunescu, Cindy Marling, Jay Shubrook, Frank Schwartz, “A
Machine Learning Approach to Predicting Blood Glucose Levels for Diabetes
Management”, 2018.
[20] Yuvaraj N, Sri Preethaa K. R., “Diabetes prediction in healthcare systems using machine
learning algorithms on Hadoop cluster”, 2019.
[21] Sharma N, Singh A, “Diabetes Detection and Prediction Using Machine Learning/IoT:
A Survey”, 2018.
[22] S. M. Sasubilli, A. Kumar and V. Dutt, “Machine Learning Implementation on Medical
Domain to Identify Disease Insights using TMS”, 2020.
[23] S. Alanazi and M. A. Mezher, “Using Machine Learning Algorithms For Prediction Of
Diabetes Mellitus”, 2020.
[24] Francesco Mercaldo, Vittoria Nardone, Antonella Santone, “Diabetes Mellitus
Affected Patients Classification and Diagnosis through Machine Learning
Techniques”, 2017.
[25] Deepti Sisodia, Dilip Singh Sisodia, “Prediction of Diabetes using Classification
Algorithms”, 2018.
[26] Aishwarya Mujumdara, Dr. Vaidehi V, “Diabetes Prediction using Machine Learning
Algorithms”, 2019.
[27] Jobeda Jamal Khanam, Simon Y, “A comparison of machine learning algorithms for
diabetes prediction”, 2021.
[28] Yunlei Sun, “The Neural Network of One-Dimensional Convolution-An Example of
the Diagnosis of Diabetic Retinopathy”, 2019.
[29] Qian Wang, Weijia Cao, Jiawel Guo, Jiadong Ren, Yongqiang Cheng, Darryl N. Davis,
“DMP_MI: An Effective Diabetes Mellitus Classification Algorithm on Imbalanced
Data With Missing Values”, 2019.
[30] Thérence Nibareke, Jalal Laassiri, “Using Big Data machine learning models for
diabetes prediction and fight delays analytics”, 2020.
[31] N. Sneha, Tarun Gangil, “Analysis of diabetes mellitus for early prediction using
optimal features selection”, 2019.
[32] Mariwan Ahmed Hama Saeed, “Diabetes type 2 classification using machine learning
algorithms with up-sampling technique”, 2023.
[33] Nikos Fazakis, Otilia Kocsis, Elias Dritsas, Sotiris Alexiou, Nikos Fakotakis,
Konstantinos Moustakas, “Machine Learning Tools for Long-Term Type 2 Diabetes
Risk Prediction”, 2021.
[34] Liyan Jia, Zhiping Wang, Siqi LV, Zhaohui XU, “PE_DIM: An Efficient Probabilistic
Ensemble Classification Algorithm for Diabetes Handling Class Imbalance Missing
Values”, 2022.
[35] Sajida Perveen, Muhammad Shahbaz, Karim Keshavijee, Aziz Guergachi, “Metabolic
Syndrome and Development of Diabetes Mellitus: Predictive Modeling Based on
Machine Learning Techniques”, 2018.
[36] Gaurav Tripathi, Rakesh Kumar, “Early Prediction of Diabetes Mellitus Using
Machine Learning”, 2020.
[37] Rakesh S Raj, Sanjay D S, Dr. Kusuma M, Dr. S Sampath, “Comparison of Support
Vector Machine and Naïve Bayes Classifiers for Predicting Diabetes”, 2019.
[38] M. Raihan, Md. Tanvir Islam, Fahmida Farzana, Md. Golam Morshed Raju, Himadri
Shekhar Mondal, “An Empirical Study to Predict Diabetes Mellitus using K-Means and
Hierarchical Clustering Techniques”, 2019.
[39] Melky Radja, Andi Wahju Rahardjo Emanuel, “Performance Evaluation of Supervised
Machine Learning Algorithms Using Different Data Set Sizes for Diabetes Prediction”,
2019.
[40] Arwatki Chen Lyngdoh, Nurul Amin Choudhury, Soumen Moulik, “Diabetes Disease
Prediction Using Machine Learning Algorithms”, 2020.
[41] Navaneeth Bhaskar, Vinayak Bairagi, Ekkarat Boonchieng, Mousami V. Munot,
“Automated Detection of Diabetes From Exhaled Human Breath Using Deep Hybrid
Architecture”, 2023.
[42] Huaping Zhou, Raushan Myrzashova, Rui Zheng, “Diabetes prediction model based on
an enhanced deep neural network”, 2020.
[43] Bala Manoj Kumar P, Srinivasa Perumal R, Nadesh R K, Arivuselvan K, “Type 2:
Diabetes mellitus prediction using Deep Neural Networks classifier”, 2020.
[44] Ammar Armghan, Jaganathan Logeshwaran, M. Sutharsan, Khaled Aliqab, Meshari
Alsharari, Shobhit K. Patel, “Design of biosensor for synchronized identification of
diabetes using deep learning”, 2023.
[45] Radwa Marzouk, Ala Saleh Alluhaidan, Sahar A. EL_Rahman, “An Analytical
Predictive Models and Secure Web-Based Personalized Diabetes Monitoring System”,
2022.
[46] Pratya Nuankaew, Supansa Chaising, Punnarumol Temdee, “Average Weighted
Objective Distance-Based Method for Type 2 Diabetes Prediction”, 2021.
[47] Usama Ahmed, Ghassan F. Issa, Muhammad Adnan Khan, Shabib Aftab, Muhammad
Farhan Khan, Raed A. T. Said, Taher M. Ghazal, Munir Ahmad, “Prediction of Diabetes
Empowered With Fused Machine Learning”, 2021.
