Predicting Factors Affecting Diabetes in Women Worldwide Using Logistic Regression
1 Introduction
2 Problem Statement
3 Objective
4 Methodology
Introduction
Cost is another reason cases remain undiagnosed: health care for diabetes is
expensive.
The rising prevalence of the disease calls for urgent action. Research studies
conducted in Ghana and across the world have provided valuable insights into
the epidemiology, risk factors, and management of diabetes. However, more
research and coordinated efforts are needed to effectively prevent and control
diabetes and its associated complications.
Problem Statement
Many people remain untested for diabetes because (a) diabetes is often
asymptomatic, (b) investigations are costly, and (c) the process of booking
doctor and diagnostic center appointments is slow. Data mining techniques can
provide a solution that addresses these problems and lessens the
workload of health service providers.
Objective
The study aims to build a logistic regression model that predicts the onset of
diabetes in women from a medical dataset, quickly and with a high level of
accuracy. Using logistic
regression, we can create a statistical model to better understand the
prevalence, risk factors, and complications associated with diabetes in Ghana
and worldwide. We can also compare the findings with global trends to gain
further insight into the current state of diabetes in Ghana and how it impacts
the global population. This research will be beneficial in helping us identify
the most effective interventions to reduce the burden of diabetes in Ghana
and beyond.
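For reference, the standard form of a logistic regression model can be written as below. The symbols ($y$, $p$, $x_j$, $\beta_j$) are notation assumed here for illustration and do not come from the study; the count of eight predictors follows the dataset description given in this presentation.

```latex
\[
P(y = 1 \mid x_1, \dots, x_8)
  = \frac{1}{1 + \exp\!\bigl(-(\beta_0 + \sum_{j=1}^{8} \beta_j x_j)\bigr)}
\]
\[
\log\frac{p}{1 - p} = \beta_0 + \sum_{j=1}^{8} \beta_j x_j,
\qquad p = P(y = 1 \mid x_1, \dots, x_8)
\]
```

Under this form, each coefficient $\beta_j$ is the change in the log-odds of diabetes for a one-unit increase in $x_j$, holding the other predictors fixed.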
Methodology
We built a machine learning model with the logistic regression algorithm
in the following steps:
(a) Pre-processing the data.
(b) Fitting logistic regression to the training set.
(c) Predicting the test result.
(d) Testing the accuracy of the result (creating a confusion matrix).
(e) Visualizing the test set result.
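Steps (a) to (d) can be sketched as follows. The study itself used R and SPSS, so this is only a minimal Python sketch under stated assumptions: the data are synthetic (two made-up predictors, not the study's dataset), the fit uses plain batch gradient descent rather than the routine the authors' software would use, and all function names are hypothetical. Step (e), visualization, is omitted.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_score(w, x):
    """Intercept w[0] plus the weighted sum of the predictors."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def fit_logistic(X, y, lr=0.5, epochs=1000):
    """(b) Fit coefficients by batch gradient descent on the mean log-loss."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(linear_score(w, xi)) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def predict(w, X, threshold=0.5):
    """(c) Classify as 1 when the predicted probability reaches the threshold."""
    return [int(sigmoid(linear_score(w, xi)) >= threshold) for xi in X]


def confusion_matrix(y_true, y_pred):
    """(d) Count true/false negatives and positives."""
    cm = {"TN": 0, "FP": 0, "FN": 0, "TP": 0}
    for t, p in zip(y_true, y_pred):
        key = ("T" if t == p else "F") + ("P" if p == 1 else "N")
        cm[key] += 1
    return cm


# (a) Pre-processing on SYNTHETIC data (an assumption, not the study's data):
# two standard-normal predictors and a noisy binary outcome.
random.seed(0)
X, y = [], []
for _ in range(200):
    x1, x2 = random.gauss(0, 1), random.gauss(0, 1)
    X.append([x1, x2])
    y.append(int(random.random() < sigmoid(3 * x1 - 2 * x2)))

# 80/20 train/test split; the rows were generated in random order.
split = int(0.8 * len(X))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

w = fit_logistic(X_train, y_train)
y_pred = predict(w, X_test)
cm = confusion_matrix(y_test, y_pred)
accuracy = (cm["TP"] + cm["TN"]) / len(y_test)
print(cm, round(accuracy, 3))
```

With real data, the predictors would normally be cleaned and scaled in step (a) before fitting; the synthetic predictors here are already on a common scale.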
The analysis for this study was conducted using R Statistical Software and
SPSS. Our data and sources of information were gathered from the internet,
libraries, personal notes, lecture notes, and other relevant sources such as the
World Health Organization (WHO). These sources provided
valuable insight into the research topic, allowing us to draw meaningful
conclusions.
GROUP 9 August 29, 2023 6 / 16
Data Analysis and Results
Our sample contains 768 instances: 268 (34.9 percent) with diabetes and 500
(65.1 percent) without. Each instance is logged under eight attributes as
independent variables and one dependent variable. Class "0" is negative for
diabetes and class "1" is positive for diabetes.
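The class shares can be checked from the two counts, and the same counts give the baseline log-odds of diabetes in the sample. The short Python sketch below uses only the counts 268 and 500 from this slide; the variable names are hypothetical.

```python
import math

n_pos, n_neg = 268, 500          # counts reported for the sample
n_total = n_pos + n_neg          # 768 instances

share_pos = n_pos / n_total      # fraction of the sample with diabetes
share_neg = n_neg / n_total      # fraction without diabetes

# Odds and log-odds of diabetes in the sample. The log-odds is the
# intercept that a logistic regression with no predictors would fit.
odds = n_pos / n_neg
log_odds = math.log(odds)

print(f"{100 * share_pos:.1f}% positive, {100 * share_neg:.1f}% negative")
print(f"odds = {odds:.3f}, log-odds = {log_odds:.3f}")
```

The negative log-odds reflects that the positive class is the minority, which is worth keeping in mind when reading accuracy figures for this dataset.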
Boxplots
Summary
Correlation Graph
[Figure: correlation graph of the variables]
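A correlation graph of this kind is typically drawn from a matrix of pairwise Pearson correlations. The Python sketch below shows how such a matrix could be computed; the column names and values are toy data invented for illustration, not the study's variables or results.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Pairwise correlations for a dict mapping column name -> values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}


# TOY data with hypothetical column names (not taken from the study).
toy = {
    "predictor_a": [85, 90, 120, 150, 170],
    "predictor_b": [22.0, 26.5, 24.0, 31.0, 35.5],
    "outcome": [0, 0, 0, 1, 1],
}
corr = correlation_matrix(toy)
for name, row in corr.items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```

Strong correlations between predictors are worth checking before fitting, since they can make individual logistic regression coefficients harder to interpret.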
Confusion Matrix
General
From the table, among the 768 patients, 268 have diabetes and 500 do not.
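To read a confusion matrix against these class totals, it helps to compare it with a trivial baseline. The Python sketch below computes accuracy, sensitivity, and specificity from the four cell counts and applies them to a hypothetical classifier that always predicts "no diabetes". Only the totals 268 and 500 come from this presentation; the function name and the baseline itself are assumptions for illustration, not the study's model.

```python
def metrics(tn, fp, fn, tp):
    """Summary metrics from the four cells of a 2x2 confusion matrix."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        # Share of actual positives that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Share of actual negatives that were correctly cleared.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical baseline: predict class 0 for all 768 patients.
baseline = metrics(tn=500, fp=0, fn=268, tp=0)
print({k: round(v, 3) for k, v in baseline.items()})
```

This baseline reaches about 65.1 percent accuracy while detecting no diabetes cases at all, so a useful model needs to beat that figure and should also be judged on sensitivity.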
ANOVA Model
Table 3: ANOVA
Model Evaluation
The overall prevalence of diabetes was 76.9 percent, which emphasizes that
diabetes is a common health problem among patients worldwide and
is higher than the average global prevalence of diabetes reported by many
authors. The analysis also found that factors including body weight,
abnormal cholesterol level, smoking, family history, physical inactivity, and
poor dietary habits were associated with diabetes. The validation shows that our
model has relatively good predictive performance.
Thank you!