
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/372859217

DISEASE PREDICTION WEB-APP USING MACHINE LEARNING: A SENIOR YEAR PROJECT REPORT

Book · August 2023


1 author:

Tatchemo GAUIAFAING Ronald


University of Verona

All content following this page was uploaded by Tatchemo GAUIAFAING Ronald on 03 August 2023.



DISEASE PREDICTION WEB-APP
USING
MACHINE LEARNING

A SENIOR YEAR PROJECT REPORT

By

TATCHEMO GUIAFAING RONALD

MATRICULE: 21SI-004973

SUBMITTED TO THE SCHOOL OF INFORMATION TECHNOLOGY IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE AWARD OF A BACHELOR OF SCIENCE (B.Sc.) DEGREE IN SOFTWARE ENGINEERING

SUPERVISOR: Mrs. TIAKO FANI MICHELE NDAMBOMVE

JULY, 2023


DECLARATION

I declare that this project on “DISEASE PREDICTION WEB-APP USING MACHINE LEARNING” is an original work done by me under the supervision of Mrs. TIAKO FANI MICHELE NDAMBOMVE of the School of Information Technology at the Catholic University Institute of Buea.

CERTIFICATION

This is to certify that this project titled “DISEASE PREDICTION WEB-APP USING
MACHINE LEARNING” was carried out by TATCHEMO GUIAFAING RONALD,
registration number 21SI-004973, an undergraduate senior year student in the School of
Information Technology of the Department of Software Engineering, Catholic
University Institute of Buea.

………………………………… Date…………………………………

Tatchemo Guiafaing Ronald

This Certification is confirmed by:

……………………………………… Date………………………………………..

Mrs. Tiako Fani Michele Ndambomve (Dean of SIT)

DEDICATION

I dedicate this project to the Lord Jesus Christ, who gave me the grace and strength to go through this
amazing degree program. I’m very grateful for the constant support and encouragement I have
received from family and friends. A special thanks to all the Academic Mentors I have come across
since I was a freshman; their lectures, support, and disapproval have contributed a lot to my adaptation
to this wide-ranging field of Information technology. The future would have never been that bright
without all of them.

ACKNOWLEDGMENTS

I am most grateful to the Almighty God for providing me with love, grace, mercy,
wisdom, and strength.

I appreciate all the guidance received from the V.P. of Academic Affairs, Research, and Cooperation, Dr. FELICITAS ATABONG MOKOM.

I am thankful to my family as a whole and particularly to my parents, Mr. GUIAFAING JEAN MARIE and Mrs. TCHOMTE LOUISE NOUTATIEM, for their words of encouragement and financial support.

I thank His Lordship, Bishop Michael Miabesue Bibi, Pro-Chancellor of the Catholic University Institute of Buea, for providing such a good and secure environment for our studies.

I thank the Dean of the School of Information Technology, Mrs. TIAKO FANI MICHELE NDAMBOMVE, for her motherly attention, encouragement, advice, and support.

I thank Dr. DOUMBU REINE CHARLYE GUIAFAING for her expertise in the domain of health care. I am grateful for her constant guidance, assistance, inspiration, advice, expertise, love, and support, which made this work a success.

I thank Engineer NOUTATIEM YOURI MIKHAIL GUIAFAING, CEO of Yimmiline SAS, whose moral and technical support have been of help.

I thank my classmates ALUFON WILFRIED MAWOH and ADRIEN DJEUFOUE LONTSI, whose motivation and teamwork helped me accomplish this project.

LIST OF ABBREVIATIONS
Kaggle: A subsidiary of Google and an online community of data scientists and machine learning engineers. It is on this platform that the models for all four sicknesses were trained and tested before being downloaded and inserted in the final application for future use.
PYTHON: A general-purpose language that can be used to create a variety of different programs.
ANACONDA: An open-source distribution of the Python and R programming languages for M.L that aims to simplify package management and deployment.
M.L: Machine Learning
Support Vector Machine: Supervised learning models with associated learning algorithms that analyze data for classification and regression analysis.
Datasets: A data set is a collection of data.
Accuracy score: Accuracy is one metric for evaluating classification models.
Disease prediction: Predicting the user's disease based on the symptoms that the user (doctor) provides as input.
NEURAL NETWORK: An artificial intelligence (AI) method that teaches computers to process data in a way that is inspired by the human brain; hence it is said to use deep learning.
Spyder: An open-source, cross-platform integrated development environment for scientific programming in the Python language.
DJANGO: A high-level Python web framework that encourages rapid development and clean, pragmatic design.
STREAMLIT: An open-source app framework for Machine Learning.
MVC: Model View Controller; implemented in the app using Django.
TENSORFLOW: An open-source library for machine learning and deep learning.
Template Folder: A folder where HTML files are placed for further use in the code using the Django framework.
Static Folder: A folder where CSS, image, JS, and SCSS files are placed for further use in the code using the Django framework.
TSL Key: A security key generated by Google based on the user's email address in order to ease user authentication before being able to predict a patient's sickness.
API: Application Programming Interface
LGR: Logistic Regression Model

TABLE OF CONTENTS
DECLARATION
CERTIFICATION
DEDICATION
ACKNOWLEDGMENTS
LIST OF ABBREVIATIONS
TABLE OF FIGURES
ABSTRACT
1. INTRODUCTION
1.1 Background of the Study
1.2 Problem Statement
1.3 Aims and Objectives
1.4 SCOPE AND LIMITATIONS OF THE WORK
1.4.1 SCOPE
1.4.2 LIMITATIONS OF THE WORK
2. LITERATURE REVIEW
3. METHODOLOGY
A. SOFTWARE DEVELOPMENT: SICKNESS EDUCATIONAL SITE
A.1 PROCEDURE
A.1.1 System Design
A.1.2 Implementation
A.1.3 Testing
A.1.4 Deployment and maintenance
A.2 Project Plan
3.1 Frameworks Used in this Project
B. MACHINE LEARNING MODEL: PREDICTION APP
B.1 ALGORITHM USE
B.2 WORKFLOW OF MACHINE LEARNING
4 IMPLEMENTATION
4.1 Configuration
4.2 Test Results
5 CONCLUSION
5.1 Future Work
REFERENCES

TABLE OF FIGURES

Figure 1: Iterative waterfall Model
Figure 2: Data flow of a Common User interaction with the system
Figure 3: Data flow of a Medical Doctor interaction with the system
Figure 4: ER-Diagram of the Prediction System Once the Doctor has Login
Figure 5: Class Diagram of the multiple prediction system
Figure 6: Django MVT Architecture
Figure 7: Workflow of Diabetes prediction Architecture for a Diabetic Case
Figure 8: Workflow of Prediction Architecture for Patients with Heart Diseases
Figure 9: Workflow of Parkinson's Prediction Architecture for Patients
Figure 10: Visualization of the Support Vector Machine Classifier
Figure 11: Screen-shoot of the 4 downloaded dataset for a possible prediction
Figure 12: Configuration of the Kaggle Notebook for the Heart Disease datasets
Figure 13: Downloading the Required Software
Figure 14: Representation of the spitted datasets in two
Figure 15: Accuracy score of the Training data of Diabetic dataset
Figure 16: Visualization of the Train model for the Diabetic Datasets
Figure 17: All the Train datasets have been downloaded for future use
Figure 18: Spyder IDE of the loaded train model
Figure 19: Spyder IDE of the loaded train model for the 4 Diseases
Figure 20: Visualization of the Final Version of the Prediction App
Figure 21: Implementing the Front-End website using Django Framework
Figure 22: Joining the Django front-end website with the Prediction Application
Figure 23: Front View of the web-app combine with the Streamlit (Prediction App)
Figure 24: Visualizing the Sign-In Option Menu
Figure 25: Visualizing the Sign-In and Sign-up page Option Menu

INDEX OF TABLES
Table 1: Comparative study using various algorithms in the literature review
Table 2: Used Software Tools
Table 3: Hardware’s components Used

ABSTRACT

Disease Prediction Using Machine Learning is a system that predicts diseases based on the symptoms given by patients or any other user. Disease prediction in humans also means estimating the probability that a patient has a disease after examining the combination of the patient’s symptoms. Such analysis in the medical industry would lead to streamlined and expedited treatment of patients. Previous researchers have primarily emphasized machine learning models, mainly Support Vector Machine (SVM) and K-nearest neighbors (KNN), for the detection of diseases with symptoms as parameters. However, the data used by prior researchers for training was not transformed, and their models depend entirely on the symptoms, while their accuracy is poor. There is therefore a need to design a modified model for better accuracy and early prediction of human disease. The proposed model improves efficacy and accuracy by resolving the issues of the earlier researchers’ models. It uses a medical dataset from Kaggle and transforms the data by assigning weights based on their rarity. The model is then trained on this dataset using a combination of machine learning algorithms, including SVM and neural networks. In parallel, the history of the patient can be analyzed using the LSTM algorithm. SVM is then used to conclude the possible disease. The proposed model has achieved better accuracy and reliability as compared to state-of-the-art methods. It can contribute to the development of automation in the healthcare industry. It will be connected to hospital websites, and only an authenticated user (a doctor) will be connected to the main predicting web application for a possible prediction.

1. INTRODUCTION
1.1 Background of the Study

Machine learning is the domain that uses past data to make predictions. It is the study of computer systems in which a model learns from data and experience. A machine learning algorithm has two phases: 1) training and 2) testing. For the past decades, machine learning technology has been struggling to predict disease from a patient’s symptoms and history. Healthcare issues can be solved efficiently by using machine learning technology. We apply machine learning concepts to keep track of patients’ health. Machine learning allows us to build models that quickly clean and process data and deliver results faster. By using this system, doctors will make good decisions about patient diagnoses, and good treatment will accordingly be given to the patient, which improves patient healthcare services. Healthcare is a prime example of a field where machine learning can be introduced. For the prediction of diseases, existing work relies on linear, KNN, and decision tree algorithms.

Specialists find it difficult to make decisions about illnesses because they may not have skills in all areas. To address this issue, it is necessary to develop a disease prediction system that combines medical knowledge with an integrated system to produce the best results and help society [1].

1.2 Problem Statement

✓ According to World-data Info, there are 0.09 doctors per 1,000 inhabitants in Cameroon.

✓ Specialists find it difficult to make decisions about illnesses because they may not have skills in all areas.

✓ Inadequate medical facilities distributed across various hospitals make it more time-consuming for a patient to know which disease they are suffering from.

1.3 Aims and Objectives

The project proposes to create a DISEASE PREDICTION WEB-APP USING MACHINE LEARNING that will provide the most accurate prediction of various patient illnesses. The prediction relies on datasets collected from Kaggle; models are trained on those datasets and mounted on a web app coded in Spyder and run using Streamlit. This multiple disease prediction system will be connected to a sample hospital website and can only be accessed after logging in. The prediction will be faster than using laboratory equipment, and accuracy is greater than 94% for each trained and tested model. I decided that the following modules could be included in the development process:
➢ Aim
Rapidly determine whether a patient is infected with a particular disease or not, with a prediction accuracy of up to 94% for each of the illnesses.
➢ Objectives
✓ Assist the general practitioner or specialist in their day-to-day life-saving battle.

✓ Educate the population about the major illnesses faced in our community (such as diabetes, breast cancer, Parkinson's, and heart disease), especially how they are transmitted, how they can be avoided, and the preventive measures put in place by the Government.

1.4 SCOPE AND LIMITATIONS OF THE WORK


1.4.1 SCOPE
In the future, more models could be trained and used in various sectors, enhancing efficiency by considering more symptoms to predict disease.

1.4.2 LIMITATIONS OF THE WORK:

Through the application, it will be possible to predict whether a user has a particular disease or not. The primary limitation of this application is that it can only be used by medical doctors or by people who can afford medical equipment at home for the collection of samples, whose measurements are then entered into the application.

2. LITERATURE REVIEW

Numerous studies have used different machine learning techniques and algorithms to predict disease, and their results can be used by medical institutions. This chapter reviews some of those studies, along with the techniques they used and the results they obtained. The reviews are given below:
A. Reviews:
Min Chen et al. [2] proposed a disease prediction system based on machine learning algorithms. For the prediction of disease, they used techniques such as the CNN-UDRP algorithm, the CNN-MDRP algorithm, Naive Bayes, K-Nearest Neighbor, and Decision Tree. The proposed system had an accuracy of 94.8%.

Sayali Ambekar et al. [3] recommended disease risk prediction and used a convolutional neural network to perform the task. In this paper, machine learning techniques such as the CNN-UDRP algorithm, Naive Bayes, and the KNN algorithm are used. The system is trained on structured data, and its highest accuracy, 82%, is achieved using Naïve Bayes.

Naganna Chetty et al. [4] developed a system that gives improved results for disease prediction using a fuzzy approach, with techniques such as the KNN classifier, fuzzy c-means clustering, and the fuzzy KNN classifier. In this paper, diabetes and liver disorder prediction is done; the accuracy is 97.02% for diabetes and 96.13% for liver disorder.

Dhiraj Dahiwade et al. [5] designed a model for the prediction of disease using machine learning approaches, with techniques such as KNN and CNN. A Perceptron model is used in this system. The system predicts heart disease based on basic symptoms like age, sex, pulse rate, etc. The accuracy of this suggested system is 91%.

Ankita Dewan et al. [8] recommended a disease prediction system that uses a hybrid data mining classification technique for predicting heart disease. This system uses techniques such as Neural Network, Decision Tree, and Naive Bayes. The accuracy of this system is 87%.
YEAR | AUTHOR | PURPOSE | METHOD USED | ACCURACY
2017 | Min Chen et al. [2] | Proposed a disease prediction system in his paper where he used machine learning algorithms | CNN-UDRP algorithm, CNN-MDRP algorithm, Naive Bayes, K-Nearest Neighbor, Decision Tree | 0.95
2018 | Sayali Ambekar et al. [3] | Recommended Disease Risk Prediction and used a convolution neural network to perform the task | CNN-UDRP algorithm, Naive Bayes and KNN algorithm | The highest accuracy of 82% is achieved by Naïve Bayes
2015 | Naganna Chetty et al. [4] | Developed a system that gives improved results for disease prediction and used a fuzzy approach | KNN classifier, Fuzzy c-means clustering, and Fuzzy KNN classifier | Diabetes: 97.02%; Liver disorder: 96.13%
2019 | Dhiraj Dahiwade et al. [5] | Designed a model for prediction of the disease using approaches of machine learning | K-Nearest Neighbor (KNN) and Convolutional neural network (CNN) | KNN: 95%; CNN: 98%
2017 | Lambodar Jena et al. [6] | Focused on risk prediction for chronic diseases by taking advantage of distributed machine learning classifiers | Naive Bayes; Multilayer Perceptron | Naive Bayes: 95%; Multilayer Perceptron: 99.7%
2016 | Dhomse Kanchan B. et al. [7] | Studied special disease prediction utilizing principal component analysis using machine learning algorithms | Naive Bayes classification, Decision Tree and Support Vector Machine | Diabetes Disease: 34.89%; Heart Disease: 53%
 |  |  |  | 84.42%; Heart Disease: 87.12%. Decision Tree: Breast Cancer: 94.29%; Diabetes: 74.03%; Heart Disease: 70.97%
2017 | Deeraj Shetty et al. [10] | Studied the uses of data mining for diabetes disease prediction | Naïve Bayes and KNN | KNN gives better accuracy, compared to Naïve Bayes
2017 | Rashmi G Saboji et al. [11] | Tried to find a scalable solution that can predict heart disease utilizing classification mining | Random Forest Algorithm | 0.98

Table 1: Comparative study using various algorithms in the literature review

3. METHODOLOGY

It is not easy to develop software. Designing software is a complicated and time-consuming process. During the program design life cycle, programmers customize software for clients, and they often change their designs according to customers’ requirements and other limitations. The design process is unstable and complex, while the requirements of customers are always idealistic. It is therefore imperative to make the right decision when choosing which software development methodology to use.

It is essential to know that building software is not simply about writing some code and then being done. To begin with, software developers talk to their clients about what they want the software to be and what its major functions are. Then a development methodology is needed. Most companies use public methodologies; some companies use their own. Awad divides those methodologies into two categories: heavyweight and lightweight. In Awad’s words, heavyweight methodology refers to traditional methodologies; the most classical traditional methodology is waterfall, which is the only one I will discuss and compare with agile methodology.

Implementing this Disease Prediction System Using Machine Learning combines two parts. The first is software development principles, applied to the website that will help educate communities about each pathology. The second is the machine learning model, which involves obtaining the dataset from Kaggle, pre-processing the data, splitting the data into training and testing data, and passing the split data into a machine learning algorithm (SVM, LGR, or Deep Neural Network, as the case may be). The trained classifier is then saved and applied to new data, and its output, 0 or 1, is passed into an if/else condition that reports "Patient is having..." or "Patient is not having...", as the case may be, for each of the 4 sicknesses.
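
The machine learning workflow described above can be sketched in a few lines of Python. This is a minimal, standard-library-only illustration under stated assumptions, not the project's code: the patient data is synthetic, a simple nearest-centroid classifier stands in for the SVM, LGR, or Deep Neural Network trained on Kaggle, and every function name (`train_test_split`, `fit_centroids`, `predict`, `accuracy_score`, `describe`) is hypothetical.

```python
import pickle
import random


def train_test_split(rows, labels, test_ratio=0.2, seed=42):
    """Shuffle the samples, then split them into training and testing sets."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    cut = int(len(order) * (1 - test_ratio))
    train, test = order[:cut], order[cut:]
    return ([rows[i] for i in train], [rows[i] for i in test],
            [labels[i] for i in train], [labels[i] for i in test])


def fit_centroids(rows, labels):
    """Training step of the stand-in classifier: mean feature vector per class."""
    centroids = {}
    for label in set(labels):
        members = [row for row, lab in zip(rows, labels) if lab == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def predict(centroids, row):
    """Return the label (0 or 1) of the closest class centroid."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: squared_distance(row, centroids[label]))


def accuracy_score(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def describe(prediction, disease="diabetes"):
    """The if/else step: turn the 0/1 output into a message."""
    if prediction == 1:
        return f"The patient has {disease}"
    else:
        return f"The patient does not have {disease}"


# Synthetic, made-up data: two features per patient; label 1 = sick, 0 = healthy.
rng = random.Random(0)
healthy = [[rng.gauss(90, 5), rng.gauss(22, 2)] for _ in range(50)]
sick = [[rng.gauss(160, 5), rng.gauss(33, 2)] for _ in range(50)]
X = healthy + sick
y = [0] * 50 + [1] * 50

# Split, train on the training part, and measure accuracy on the testing part.
X_train, X_test, y_train, y_test = train_test_split(X, y)
model = fit_centroids(X_train, y_train)
test_accuracy = accuracy_score(y_test, [predict(model, row) for row in X_test])

# Save the trained parameters, then reload them as the web app would.
saved = pickle.dumps(model)
loaded = pickle.loads(saved)
message = describe(predict(loaded, [165.0, 34.0]))
print(test_accuracy, message)
```

Keeping training, evaluation, saving, and the final if/else message as separate steps mirrors the order given in the text; swapping the stand-in classifier for a real SVM would only change the training and prediction functions.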

A. SOFTWARE DEVELOPMENT: SICKNESS EDUCATIONAL SITE

A.1 PROCEDURE

To achieve the objective of the system, the software development methodology needs to be chosen wisely so that the flow of the system's development can be planned better. There are various models in the Software Development Life Cycle (SDLC), including the V-Shaped Model, Evolutionary Prototyping Model, Spiral Model, Iterative Waterfall Model, and Agile Model. The model chosen for this development was the Iterative Waterfall Model.

Figure 1: Iterative waterfall Model

The figure above shows the methodology used in this project. The waterfall approach emphasizes a structured and well-defined progression between phases. Each phase consists of a defined set of activities and deliverables that must be accomplished before going to the next phase or step. The first phase captures what the system will do, which is its requirements; the second phase determines how it will be designed; the third phase is the actual programming; the fourth phase is the full system testing; and the fifth and final phase, which combines deployment and maintenance, focuses on the implementation tasks, such as documentation. The six stages involved in the iterative waterfall model are as follows:

1. Requirement Analysis

2. System Design

3. Implementation

4. Testing

5. Deployment

6. Maintenance

A.1.1 Requirement analysis

During this phase, existing systems are analyzed, and all the requirements needed to develop the new system are identified. Information regarding the system is gathered and studied in the form of journals, articles, or research papers. The findings are summarized and analyzed to derive the functional and non-functional requirements of the system.
A.1.1.1 Functional Requirements

The functional requirements include:

Community:

I) Any individual can educate himself or herself about various pathologies, especially how they are transmitted and how they can be prevented.

II) They can subscribe to hospital newsletters in order to receive information weekly.

III) They can leave a message for the hospital office by using the contact form.

Medical Doctor (Specialist):

I) Doctors can log in to the system or create an account, which has to be activated by IT personnel at each hospital.

II) Once the doctor has logged in, the system redirects them to the Multiple Disease Prediction App, where a form is presented for the doctor to fill out in order to obtain a prediction.
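
The access rule for doctors described above (an account must exist, be activated by IT personnel, and pass login before the prediction app is reachable) can be illustrated with a small, framework-free sketch. The `DoctorAccount` class, the `register`, `activate`, and `login` helpers, the fixed salt, and the redirect target `/prediction-app/` are all hypothetical names chosen for illustration; the project itself relies on Django for authentication, and a real deployment should use the framework's own password handling rather than this simplified version.

```python
import hashlib
import hmac
from dataclasses import dataclass


def hash_password(password: str, salt: str) -> str:
    """Derive a password hash with PBKDF2 (simplified for illustration)."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt.encode(), 100_000)
    return digest.hex()


@dataclass
class DoctorAccount:
    email: str
    password_hash: str
    salt: str
    is_active: bool = False  # new accounts wait for activation by IT personnel


def register(email: str, password: str, salt: str = "demo-salt") -> DoctorAccount:
    """Create an inactive account; a fixed salt is used only to keep the demo short."""
    return DoctorAccount(email, hash_password(password, salt), salt)


def activate(account: DoctorAccount) -> None:
    """Step performed by the hospital's IT personnel."""
    account.is_active = True


def login(account: DoctorAccount, password: str) -> str:
    """Return where the user goes next, or why access is refused."""
    candidate = hash_password(password, account.salt)
    if not hmac.compare_digest(candidate, account.password_hash):
        return "invalid credentials"
    if not account.is_active:
        return "account awaiting activation"
    return "redirect:/prediction-app/"


doctor = register("doctor@example.org", "s3cret")
before = login(doctor, "s3cret")      # correct password, but not yet activated
activate(doctor)
after = login(doctor, "s3cret")       # activated: forwarded to the prediction app
wrong = login(doctor, "bad-password")
print(before, after, wrong, sep="\n")
```

Checking the password before the activation flag means a wrong password never reveals whether an account has been activated.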
A.1.1 System Design
The system design process normally partitions the requirements into either hardware or software
systems. It builds the overall system architecture. Software Design involves identifying and
describing the fundamental abstractions of software systems and their relationships. Based on the
detailed requirements that were obtained from the first phase of this project, the software
architectural design task is performed, which includes identifying the data flow, the class
diagram of this application, and also a text plan. The data flow diagram shows the flow of
information and the transformation applied when data moves in and out of a system.

Figure 2: Data flow of a Common User interaction with the system

The diagram above shows how a common user interacts with the system.

Figure 3: Data flow of a Medical Doctor interaction with the system

The diagram above shows the data flow of the (approved) medical doctor's interaction with the system. The class diagram and E-R diagram are used to illustrate the objects, their properties and methods, and the relationships between objects once the user has authenticated and been redirected to the Prediction Application.

Figure 4: ER-Diagram of the Prediction System Once the Doctor has Login

Figure 5: Class Diagram of the multiple prediction system

A.1.2 Implementation
Based on the architectural views from the previous phase, the implementation task is performed. The whole application is partitioned into sets of program modules.
A.1.3 Testing
The system is built and tested as a complete system to ensure that the software requirements have been met. This is done at the end of the machine learning part, which consists of training the models and testing the trained models before inserting them into the Spyder application and connecting it to the Django web application. After my own testing of the full system, the system is released to the public to be tested by other users. In this phase the application is fully tested and carefully monitored to ensure that it is up and running without errors and works as expected.

A.1.4 Deployment and maintenance

This is the longest phase; the application is deployed to an online server and put into use. Maintenance involves correcting errors that were not discovered in the earlier stages, improving the implementation of the system units, and enhancing the system’s components. In summary, with the iterative waterfall model, the project is divided into phases; typically, the previous phase’s deliverables have to be completed before the next can be executed, but where something within a phase needs to be corrected, the model allows iterating back to the previous phases for modifications. This model was chosen because of the nature of the project, in which some phases need modifications in order for the other phases to be modified and enhanced. The approach also allows for a good estimation of the time allocated to each of the major phases of the project.

A.2 Project Plan

A project plan is a formal document designed to guide the control and execution of a project. The
primary uses of the project plan are to document planning assumptions and decisions, facilitate
communication among project stakeholders, and document approved scope, cost, and schedule
baselines. This project plan includes cost estimation and analysis and the project schedule.

Software | Reason of Usage
Linux (Ubuntu 23.04) | A widely used Linux distribution with good performance and security enhancements. It is a lightweight operating system and consumes fewer hardware resources.
Python (3.9.17) | Python is an easy-to-learn, object-oriented programming language that comes with many libraries and packages for developing software.
Kaggle | It provides an online machine learning environment used to train and test the models on the downloaded datasets before connecting them to the Spyder application.
Web browser (Google Chrome) | It is used to search for articles for the literature review (previous work).
Table 2: Software Tools Used

Hardware/Components | Specification/Model | Reason of Usage
Laptop | COMPAQ, Intel Core i5, 12 GB of RAM, 750 GB of storage | It was the workstation used to build the system.
WIFI box | MTN, ORANGE | Provides the internet connection.
Table 3: Hardware Components Used

The tables above list the software and hardware components used for both the web app and the
Machine Learning Model.

3.1 Frameworks Used in this Project
Bootstrap: It is the most popular HTML, CSS, and JS framework for developing responsive, mobile-
first projects on the web.
Django: Django is an open-source Python web framework that supports rapid development and a clean,
pragmatic, maintainable design, and helps build secure websites.
Django is based on the MVT (Model-View-Template) architecture. MVT is a software design pattern for
developing a web application.

Figure 6: Django MVT Architecture

MVT consists of three components: Model, View, and Template. The Model is the data access layer
that handles the database. The Template is the presentation layer that handles the user interface. The
View executes the business logic, interacts with the model to fetch data, and renders a template.
Django follows the MVC pattern but maintains its own conventions.
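
To make this division of responsibilities concrete, the following is a minimal sketch written in plain Python rather than actual Django code. The names PATIENTS, PATIENT_TEMPLATE, and patient_view, as well as the sample records, are hypothetical and only illustrate which part handles the data, which part handles the presentation, and which part ties the two together.

```python
from string import Template

# Model: the data access layer (an in-memory list stands in for a database table).
PATIENTS = [
    {"name": "Alice", "age": 42},
    {"name": "Bob", "age": 57},
]

# Template: the presentation layer, with placeholders that the view fills in.
PATIENT_TEMPLATE = Template("<p>$name is $age years old.</p>")


# View: runs the logic, fetches data from the model, and renders the template.
def patient_view(index):
    patient = PATIENTS[index]
    return PATIENT_TEMPLATE.substitute(name=patient["name"], age=patient["age"])


print(patient_view(0))
```

In a real Django project these three roles would typically live in separate files (models, templates, and views), but the flow of a request through them is the same.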
Streamlit (Spyder): Even though the project has an educational part, in which diseases are explained
to the user, especially how they are transmitted and how they can be avoided, the principal objective
of the research work is to ensure that doctors are able to determine, after filling out a form, whether
a patient is sick or not.
Spyder was built specifically for data science. Its interface allows the user to scroll through the
various data variables and also offers an online help option. The output of the code can be viewed in
the Python console on the same screen (for this research, Streamlit is used to display it). I can work
on different scripts at the same time and try them out one by one in the same console or in different
ones, and all the variables used are stored in the variable explorer tab. Spyder also provides an option
to view graphs and visualizations in the plot window.
The models are therefore trained and tested on Kaggle, downloaded as files with the .sav extension,
and connected to the Spyder app. A form is then created in the Spyder app so that the data entered in
the form is passed to the trained model once the user has been given access by the Django web app.

B. MACHINE LEARNING MODEL: PREDICTION APP

B.1 ALGORITHM USED

For the prediction of the three diseases, the SVM (Support Vector Machine) algorithm is used, which is
a supervised learning algorithm. In supervised learning, we feed our data to the Machine Learning
model, and the model learns from the data and its respective labels. In this case, we train our models
with several pieces of medical information (such as the blood glucose level and the insulin level of
patients), along with whether the person has diabetes or not. The latter acts as a label showing whether
the person is diabetic or not. Once this data is fed to the support vector machine, it plots the data and
then tries to find a hyperplane that separates the two groups.

B.2 WORKFLOW OF MACHINE LEARNING:

1: I download the datasets from Kaggle.com. Each dataset contains the medical information and the
respective labels that are used to train the models.
2: I pre-process and analyze the data. To make the data suitable for the machine learning model, it
needs to be standardized, because it contains many attributes (a lot of medical information).
3: Once the data is pre-processed, I split it into training data and testing data. The machine learning
model is trained with the training data.
4: The training data is fed to our Support Vector Machine models. These are classification models,
which classify whether the patient is diabetic or non-diabetic (in the case of diabetes prediction).
5: Once I have a trained Support Vector Machine classifier, I can give it new data and predict whether
the person is diabetic or non-diabetic.
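
The standardization mentioned in step 2 can be sketched as follows. The report does not show its pre-processing code, so this is only an illustration in plain Python: the function name standardize and the sample readings are hypothetical. Each column is rescaled to zero mean and unit standard deviation so that no attribute dominates the others because of its unit.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Rescale every column to zero mean and unit standard deviation (z-scores)."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Population standard deviation; constant columns fall back to 1.0
    # to avoid a division by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Made-up glucose and insulin readings, for illustration only.
readings = [[85.0, 0.0], [120.0, 90.0], [155.0, 180.0]]
scaled = standardize(readings)
print(scaled)
```

The same means and standard deviations computed on the training data would also have to be applied to any new input before prediction, otherwise the model would receive values on a different scale from the one it was trained on.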

Figure 7: Workflow of Diabetes prediction Architecture for a Diabetic Case.

This image shows the hyperplane, which separates the two groups of data. When new data is fed to
the model, the model assigns that data point to one of these two groups.

Figure 8: Workflow of Prediction Architecture for Patients with Heart Diseases

Figure 9: Workflow of Parkinson's Prediction Architecture for Patients


Figure 10: Visualization of the Support Vector Machine Classifier

The image above shows how the Support Vector Machine model works. It plots the
data points of the dataset in terms of X and Y (or Feature 1 and Feature 2), where one
set of points represents people with Parkinson's and the other set represents people
without Parkinson's (or any other disease the model is trained on). The SVM finds the
best line of separation between the two sets of data.

The support vectors are the data points that lie closest to this hyperplane. If the
orientation of the hyperplane changes, the predictions of the SVM classifier change as
well. When a new data point is given, the classifier finds out on which side of the
hyperplane the point lies. That is how SVM works. Two features are not sufficient in
some cases; we then need more than two dimensions to separate the data and obtain the
result of our testing.
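
The idea of deciding on which side of the hyperplane a point lies can be sketched in a few lines of plain Python. The hyperplane below (weights and bias) and the training points are hypothetical values chosen by hand for illustration; in the project they would be learned by the SVM from the dataset.

```python
import math


def signed_distance(point, weights, bias):
    """Signed distance from a point to the hyperplane weights . x + bias = 0."""
    norm = math.sqrt(sum(w * w for w in weights))
    return (sum(w * x for w, x in zip(weights, point)) + bias) / norm


def classify(point, weights, bias):
    """Return 1 for points on the positive side of the hyperplane, else 0."""
    return 1 if signed_distance(point, weights, bias) >= 0 else 0


# Hypothetical hyperplane x1 + x2 - 5 = 0 and made-up training points.
weights, bias = [1.0, 1.0], -5.0
training_points = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]]

# The points closest to the hyperplane play the role of support vectors.
closest = min(abs(signed_distance(p, weights, bias)) for p in training_points)
support_vectors = [
    p for p in training_points
    if math.isclose(abs(signed_distance(p, weights, bias)), closest)
]

print(support_vectors)
print(classify([4.5, 3.0], weights, bias))
```

With more than two features the code stays the same: the weight list simply gets one entry per feature, which is why the same approach extends beyond a two-dimensional plot.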

Figure 11: Screenshot of the 4 downloaded datasets for a possible prediction

The figure above is a capture of the disease datasets downloaded from Kaggle to my local repository
(computer). They are used locally in order to ensure that the web app performs with the same accuracy
as obtained during training for each disease.

These datasets and the final trained models are then moved into the Spyder project directory and
loaded in the Spyder app, where a form is generated for each disease; the forms are combined at the
end using the option menu of Streamlit.

4 IMPLEMENTATION

4.1 Configuration

As mentioned earlier in the Methodology section, this project is divided into two paths:
the Software Development section and the building of the Machine Learning Model.

I created an account on Kaggle, downloaded each dataset, and created a directory for
each of those projects.

Figure 12: Configuration of the Kaggle Notebook for the Heart Disease datasets

Figure 13: Downloading the Required Software

4.2 Test Results

The Outcome column is dropped from each dataset to form the features X, the Outcome column itself is
kept as the target Y, and each dataset is then split with
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.1, stratify = Y, random_state = 2).
Four variables are created, as seen above: X_train (the features of the training data), X_test (the
features of the test data), Y_train (the targets corresponding to X_train), and Y_test (the targets
corresponding to X_test).
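
To show what the stratify argument achieves, here is a minimal plain-Python sketch of a stratified split. It is not scikit-learn's implementation of train_test_split; the function name stratified_split and the sample data are hypothetical, and the shuffling will not reproduce the exact rows that scikit-learn selects for the same random_state.

```python
import random
from collections import defaultdict


def stratified_split(X, Y, test_size=0.1, seed=2):
    """Split X and Y so that each label keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for index, label in enumerate(Y):
        by_label[label].append(index)

    # Pick the test rows label by label, so no class is left out of the test set.
    test_indices = set()
    for indices in by_label.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_indices.update(indices[:n_test])

    X_train = [x for i, x in enumerate(X) if i not in test_indices]
    X_test = [x for i, x in enumerate(X) if i in test_indices]
    Y_train = [y for i, y in enumerate(Y) if i not in test_indices]
    Y_test = [y for i, y in enumerate(Y) if i in test_indices]
    return X_train, X_test, Y_train, Y_test


# Made-up data: 20 samples, 10 labelled 0 and 10 labelled 1.
X = [[float(i)] for i in range(20)]
Y = [0] * 10 + [1] * 10
X_train, X_test, Y_train, Y_test = stratified_split(X, Y, test_size=0.1, seed=2)
print(len(X_train), len(X_test), Y_test)
```

With test_size = 0.1, one tenth of each class goes to the test set, so the proportion of sick and healthy patients is the same in the training and test data.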

Figure 14: Representation of the datasets split in two

Figure 15: Accuracy score of the Training data of Diabetic dataset

Figure 16: Visualization of the Train model for the Diabetic Datasets

The pickle library is used to save these models. A variable called filename is declared with the value
trained_model.sav, and the trained model is held in the variable called classifier. The file named by
filename is opened in wb (write, binary) mode, so a file is written in binary format, and what is
written is nothing more than the classifier. This is, in effect, the model trained on the diabetic dataset.
In this case, I have used the PIMA diabetes dataset (diabetes is a chronic condition that causes a
person's blood sugar level to become too high). Running this code creates a file called
trained_model.sav. The same prediction is then made available through a webpage: a user interface is
put in place, and through this interface the user is prompted to enter the required details. The code
then predicts whether the person is diabetic or not. Each model is saved and downloaded in this way.
Spyder IDE has been used to design the User Interface.
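
The save-and-reload step described above can be sketched as follows. The dictionary used as classifier is a hypothetical stand-in for the trained model (any picklable Python object is written and read back the same way), and a temporary directory is used only so that the example leaves no file behind.

```python
import os
import pickle
import tempfile

# Stand-in for a trained classifier: any picklable Python object works the same way.
classifier = {"weights": [0.8, -0.3], "bias": 0.1}

# A temporary directory keeps the example from leaving files behind.
with tempfile.TemporaryDirectory() as folder:
    filename = os.path.join(folder, "trained_model.sav")

    # "wb" opens the file for writing in binary mode.
    with open(filename, "wb") as model_file:
        pickle.dump(classifier, model_file)

    # "rb" reads the same bytes back, as the prediction app would do.
    with open(filename, "rb") as model_file:
        loaded_model = pickle.load(model_file)

print(loaded_model == classifier)
```

Because pickle.load can execute code contained in the file, a .sav file should only be loaded when it comes from a trusted source, such as the author's own Kaggle notebook.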

Figure 17: All the Train datasets have been downloaded for future use

The image above shows both the trained models (which have the extension .sav) and the datasets
downloaded from Kaggle at the beginning of the project.

Figure 18: Spyder IDE of the loaded trained model

The implementation is essentially the same as the one done online on Kaggle.

A function is created that contains all of the prediction steps. Having created the function, I created
the interface (diabetic webpage), which presents the user with fields to fill in (the Outcome field is
left out, since it is the value the model was trained to predict). What the user enters is passed to the
trained model, and the model returns an answer that is either 0 or 1.
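
A prediction function of this kind might look like the sketch below. StubModel is a hypothetical stand-in for the loaded .sav model, and its threshold of 125 on the first value is made up so that the example runs on its own; it is not a medical rule and not the author's model. The function name diabetes_prediction and the returned messages are likewise assumptions.

```python
class StubModel:
    """Hypothetical stand-in for the loaded model; the threshold is made up."""

    def predict(self, rows):
        return [1 if row[0] > 125 else 0 for row in rows]


def diabetes_prediction(model, form_values):
    """Convert the form's text inputs to numbers and turn the 0/1 answer into a message."""
    features = [float(value) for value in form_values]
    # The model expects a list of rows, so the single patient is wrapped in a list.
    outcome = model.predict([features])[0]
    return "The person is diabetic" if outcome == 1 else "The person is not diabetic"


# Values as they would arrive from text fields (glucose, insulin); made up for illustration.
print(diabetes_prediction(StubModel(), ["148", "0"]))
print(diabetes_prediction(StubModel(), ["85", "94"]))
```

Converting the inputs with float matters because form fields deliver text, while the model was trained on numbers.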

Figure 19: Spyder IDE of the loaded trained models for the 4 Diseases

The figure above is a screenshot of the combined trained models loaded into the Spyder application.
The form is arranged using the col1, col2, and col3 definitions, which divide each row into three
columns. A row that would otherwise hold only one field now holds three numeric input fields, which
makes the application less cumbersome.

Having done this, the option-menu function is called, and the result is viewed using Streamlit, which
serves the application.
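
The three-column arrangement can be illustrated without Streamlit by grouping the form fields into rows of three. The helper rows_of_three is hypothetical; the field names are the columns of the PIMA diabetes dataset without Outcome, on the assumption that the form asks for exactly these values. In Streamlit, each such row would typically be laid out with st.columns(3).

```python
def rows_of_three(field_names):
    """Group form fields into rows of three, one field per column."""
    return [field_names[i:i + 3] for i in range(0, len(field_names), 3)]


# Field names of the PIMA diabetes dataset, excluding the Outcome column.
fields = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]

for row in rows_of_three(fields):
    print(row)
```

Eight fields therefore take three rows instead of eight, which is what keeps the form compact.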

Figure 20: Visualization of the Final Version of the Prediction App

Figure 21: Implementing the Front-End website using Django Framework

Figure 22: Joining the Django front-end website with the Prediction Application

Figure 23: Front View of the web-app combined with the Streamlit (Prediction App)

Figure 24: Visualizing the Sign-In Option Menu

Figure 25: Visualizing the Sign-In and Sign-up page Option Menu

5 CONCLUSION

This project aims to predict disease based on symptoms. The project is set up in such a
way that the system takes as input the symptoms entered by an approved user (a medical
doctor) and generates an output, which is the disease prediction. A prediction accuracy of
95% is obtained on average. The disease predictor was successfully incorporated into the
Django web application. In order not only to predict sickness but also to eradicate or
reduce its propagation, a front-end website was also implemented where any user can be
informed about various diseases, especially the most prevalent ones, and how they are
transmitted. If the user has more questions, the Contact Us page provides a form to fill
out, after which the user is contacted a few days later by a nurse, doctor, or other
healthcare professional, depending on the nature of his or her questions.

The application can therefore be mounted on or added to any hospital website where
needed.

5.1 Future Work

• I intend to replace the Contact Us page with a chatbot interface that, using the
principles of deep learning, assists any user.

• I am planning to work on each page of the website to include all possible
hospital information.

• I also aim to grant other users (and not only doctors) authorization to use the
system, since some users may have in their possession all the information that is
asked for in the prediction form.

• I intend to ensure that the data entered by the user in the Spyder application is
stored for future training of the model, since some diseases may mutate and become
more and more difficult to predict.

• I intend to put the final version of the application online, free of charge, for use in
underprivileged communities in Cameroon and across Africa.

REFERENCES

[1] M. Jiang, Y. Chen, M. Liu, S. T. Rosenbloom, S. Mani, J. C.


Denny, and H. Xu, “A study of machine-learning-based approaches to
extract clinical entities and their assertions from discharge summaries,”
J. Am Med Inform Assoc, vol. 18, no. 5, pp. 601–606, 2011.
[2] M. Chen, Y. Hao, K. Hwang, L. Wang, and L.
Wang,“Disease prediction by machine learning over big data
from healthcare communities” IEEE Access, vol. 5, no.1,
pp.8869–8879, 2017.
[3] Sayali Ambekar, Rashmi Phalnikar, “Disease RiskPrediction by
Using Convolutional Neural Network” IEEE, 978-1-5386-5257-2/18,
2018.
[4] Naganna Chetty, Kunwar Singh Vaisla and Nagamma Patil, “An
Improved Method for Disease Predictionusing Fuzzy Approach” IEEE,
DOI 10.1109/ICACCE.2015.67, pp. 569-572, 2015.
[5] Dhiraj Dahiwade, Gajanan Patle and Ektaa Meshram, “Designing
Disease Prediction Model Using Machine Learning Approach” IEEE
Xplore Part Number: CFP19K25-ART; ISBN: 978-1-5386-7808-4, pp.
1211-1215, 2019.
[6] Lambodar Jena and Ramakrushna Swain, “ChronicDisease Risk
Prediction using Distributed Machine Learning Classifiers” IEEE, 978-
1-5386-2924-6/17, pp. 170-173, 2017.
[7] Dhomse Kanchan B. and Mahale Kishor M., “Study of Machine
Learning Algorithms for Special Disease Prediction using Principal of
Component Analysis” IEEE, 978-1-5090-0467-6/16, pp. 5-10, 2016.
[8] Ankita Dewan and Meghna Sharma, “Prediction of Heart Disease
Using a Hybrid Technique in Data Mining Classification” IEEE, 978-9-
3805-4416-8/15, pp. 704-706, 2015.

42
[9]Pahulpreet Singh Kohli and Shriya Arora, “Application of Machine
Learning in Disease Prediction” IEEE, 978-1-5386-6947-1/18, pp. 1-4,
2018.
[10]Deeraj Shetty, Kishor Rit, Sohail Shaikh and Nikita Patil, ”
Diabetes DiseasePrediction Using Data Mining” IEEE, 978-1-5090-
3294-5/17, 2017.
[11]Rashmi G Saboji and Prem Kumar Ramesh,“A Scalable Solution
for Heart Disease Prediction using Classification Mining Technique”
IEEE, 978-1-5386-1887-5/17, pp. 1780-1785, 2017.
[12] Rati Shukla, Vikash Yadav, Parashu Ram Pal and Pankaj
Pathak, "Machine Learning Techniques for Detecting and Predicting
Breast Cancer" IJITEE, ISSN: 2278-3075, Volume-8, pp. 2658-2662,
2019.
[13] Servant Leadership from a Christian Perspective Essay | Bartleby.
(n.d.). Retrieved May22, 2022, from
https://www.bartleby.com/essay/Servant-Leadership-From-a-Christian-
Perspective-F3CVTBFZTC
