Predicting Factors Affecting Diabetes in Women Worldwide Using Logistic Regression

A Logistic Regression Approach

BANSON DANIEL 9371719


ASIGBETSE FREDRICK ETSE WALTER 9370319
ISAAC ANKOMAH 9367919
ABUBAKARI ANDARATU 9364119

August 29, 2023

GROUP 9 August 29, 2023 1 / 16


Contents

1 Introduction

2 Problem Statement

3 Objective

4 Methodology

5 Data Analysis and Results


Boxplots
Correlation Graph
General Confusion Matrix

6 Conclusion and Recommendation

Introduction

Background of the study


Diabetes is a serious health problem worldwide. Almost 3.2 million deaths a
year are attributed to diabetes. The disease also leads to many complications,
such as cardiovascular disease, blindness and visual disability, and kidney
failure.

Cost is another reason why many cases remain undiagnosed: health care for
diabetes is expensive.

The rising prevalence of the disease calls for urgent action. Research studies
conducted in Ghana and across the world have provided valuable insights into
the epidemiology, risk factors, and management of diabetes. However, more
research and coordinated efforts are needed to effectively prevent and control
diabetes and its associated complications.

Problem Statement
Many people remain untested for diabetes because (a) diabetes is often
asymptomatic, (b) investigations are costly, and (c) the process of getting an
appointment with a doctor or diagnostic center is slow. Data mining techniques
can provide a solution that addresses these problems and lessens the workload
of health service providers.

The rapid increase in the number of diabetes cases is becoming a global
health concern. Early detection and prevention of diabetes are essential to
mitigate its impact on individuals and society.

Objective
The study aims to build a logistic regression model that predicts the onset of
diabetes in women from a medical dataset, quickly and with a high level of
accuracy. Using logistic regression, we can create a statistical model to
better understand the prevalence, risk factors, and complications associated
with diabetes in Ghana and worldwide. We can also compare the findings with
global trends to gain further insight into the current state of diabetes in
Ghana and how it affects the global population. This research will help us
identify the most effective interventions to reduce the burden of diabetes in
Ghana and beyond.

Logistic regression is performed on the Pima Indians Diabetes Database (PIDD)
to build a classification model. Ultimately, this will help inform medical and
public health professionals so that they can create more effective strategies
to tackle the growing problem of diabetes in Ghana and the world.
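
For reference, the standard binary logistic regression model can be written as
follows. The notation is ours rather than the study's: x_1, ..., x_8 stand for
the eight predictor attributes, y = 1 denotes a positive diabetes class, and
the beta coefficients are estimated from the data.

```latex
P(y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_8 x_8)}},
\qquad
\log \frac{P(y = 1 \mid x)}{1 - P(y = 1 \mid x)} = \beta_0 + \sum_{j=1}^{8} \beta_j x_j
```

The second form shows that each coefficient is the change in the log-odds of
diabetes for a one-unit increase in the corresponding attribute, holding the
others fixed.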

Methodology
We built a machine learning model with the logistic regression algorithm in
the following steps:
(a) Data pre-processing.
(b) Fitting logistic regression to the training set.
(c) Predicting the test-set results.
(d) Testing the accuracy of the results (creating a confusion matrix).
(e) Visualizing the test-set results.
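
Steps (a) to (d) can be sketched as follows. This is a minimal illustration,
not the study's code: the analysis itself was run in R and SPSS, whereas this
sketch uses plain Python, a small hand-written gradient descent fit, and
hypothetical synthetic data with two features instead of the PIDD attributes.
All names in it (`scale`, `prob`, `fit`, `cm`) are our own, and step (e) is
left out.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical synthetic data (NOT the PIDD data): two features,
# class 1 centred at (1.5, 1.5) and class 0 centred at (-1.5, -1.5).
data = [([random.gauss(1.5, 1.0), random.gauss(1.5, 1.0)], 1) for _ in range(100)]
data += [([random.gauss(-1.5, 1.0), random.gauss(-1.5, 1.0)], 0) for _ in range(100)]
random.shuffle(data)

# (a) Pre-processing: 80/20 train/test split, then standardise every
# feature with the mean and standard deviation of the training set.
split = int(0.8 * len(data))
train, test = data[:split], data[split:]
cols = list(zip(*[x for x, _ in train]))
means = [statistics.mean(c) for c in cols]
sds = [statistics.pstdev(c) for c in cols]

def scale(x):
    return [(v - m) / s for v, m, s in zip(x, means, sds)]

X_train = [scale(x) for x, _ in train]
y_train = [y for _, y in train]
X_test = [scale(x) for x, _ in test]
y_test = [y for _, y in test]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def prob(w, x):
    # w[0] is the intercept; w[1:] are the feature coefficients.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

# (b) Fit the model by batch gradient descent on the average log-loss.
def fit(X, y, lr=0.5, epochs=300):
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = prob(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

weights = fit(X_train, y_train)

# (c) Predict the test set with a 0.5 probability threshold.
y_pred = [1 if prob(weights, x) >= 0.5 else 0 for x in X_test]

# (d) Confusion matrix and accuracy on the test set.
cm = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
for t, p in zip(y_test, y_pred):
    key = ("t" if t == p else "f") + ("p" if p == 1 else "n")
    cm[key] += 1
accuracy = (cm["tp"] + cm["tn"]) / len(y_test)

print(cm, round(accuracy, 3))
```

Standardising with training-set statistics only keeps information from the
test set out of the fitted model, so the accuracy in step (d) is measured on
data the model has not seen.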

The analysis was conducted using R Statistical Software and SPSS. Our data and
sources of information were gathered from the internet, libraries, personal
notes, lecture notes, and other relevant sources such as the World Health
Organization (WHO). All of these sources provided valuable insight into the
research topic, allowing us to draw meaningful conclusions.

Data Analysis and Results


Below are tables and graphs generated from our analysis
Table 1: Descriptive Statistics

Our sample contains 268 patients with diabetes (34.9 percent) and 500 patients
without diabetes (65.1 percent). There are 768 instances, each recorded with
eight attributes as independent variables and one dependent variable. Class "0"
is negative for diabetes and class "1" is positive for diabetes.
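
The class percentages follow directly from the counts, as the short check below
shows. The majority-class baseline at the end is our addition, not a figure
reported by the study: it is the accuracy of a rule that always predicts
class "0", which gives a floor against which any classifier's accuracy on this
dataset can be read.

```python
n_positive = 268  # class "1": patients with diabetes (from the text)
n_negative = 500  # class "0": patients without diabetes (from the text)
n_total = n_positive + n_negative

pct_positive = round(100 * n_positive / n_total, 1)
pct_negative = round(100 * n_negative / n_total, 1)

# Accuracy (in percent) of always predicting the larger class.
baseline_accuracy = round(100 * max(n_positive, n_negative) / n_total, 1)

print(n_total, pct_positive, pct_negative, baseline_accuracy)
```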

Boxplots
Summary

Correlation Graph
Below is the correlation graph:

Confusion Matrix
General

Table 2: Generated Confusion Matrix

According to the table, of the 768 patients, 268 have diabetes and 500 do not
have diabetes.

Accuracy is defined as the percentage of correct predictions on the test data.
The logistic regression classifier achieved an accuracy of 77 percent.
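
As an illustration of how accuracy and the related measures are derived from a
confusion matrix, the sketch below computes them from four counts. The function
name `classification_metrics` and the example counts are hypothetical; the
counts are chosen for easy arithmetic and are not the values in the study's
Table 2.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts for illustration only; not the study's Table 2.
example = classification_metrics(tp=40, fp=10, fn=20, tn=80)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Accuracy counts both kinds of correct prediction, while precision and recall
look only at the positive class, so the measures can differ noticeably when the
classes are imbalanced.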

ANOVA Model

Table 3: ANOVA

Model Evaluation

Table 4: Evaluation measures of our model

Conclusion and Recommendation


Conclusion: Our study reported several performance metrics for the logistic
regression technique: precision, recall, accuracy, F1 score, and AUC. The
logistic regression classifier achieved 77 percent accuracy and an F1 score of
0.83. In the sample, 65.1 percent of the patients had no diabetes and 34.9
percent had diabetes.

The prevalence of diabetes in the sample was 34.9 percent, which emphasizes
that diabetes is a common health problem among patients worldwide and is
higher than the average global prevalence reported by many authors. Factors
including body weight, abnormal cholesterol level, smoking, family history,
physical inactivity, and poor eating habits were also associated with
diabetes. The validation shows that our model has relatively good predictive
performance.

Recommendation: Predicting diabetes at an early stage can address one of the
critical issues in the public health sector. In this study, a methodical
procedure was followed to develop a model that predicts illnesses such as
diabetes as accurately as possible. After computing the confusion matrix,
F-score, precision, recall, and accuracy, the logistic regression model
achieved an accuracy of 77 percent.

Thus, the binary logistic regression model is recommended for predicting the
onset of diabetes from medical records. The approach in this study should be
adopted for prediction in the health sector.

Thank you!
