A Gui Based Disease Predictor Using Machine Learning - Mansi - Project PDF
Project Report
On
By
MANASI PATEL
20MCA073
(20PG030067)
GIET UNIVERSITY
GUNUPUR – 765022
2021 – 22
GIET UNIVERSITY
DEPARTMENT OF MCA
Gunupur-765022, Dist-Rayagada, Odisha, INDIA
www.giet.edu
CERTIFICATE
ACKNOWLEDGEMENT
I would like to express my gratitude to our supervisor, “Mr Jagdish Sahoo”, for his
guidance and constant supervision, for providing the necessary information regarding the
project, and for his support in completing it. Secondly, I would like to thank my
class teacher, “Mr. Sibo Prasad Patro”, for his encouragement to complete the project on time.
Thirdly, special gratitude goes to our HOD, CSE, “Dr. Sanjay Kumar Kuanar” and Assistant
on basic concepts that we learned and to implement that in our project. And lastly, thanks to
our project guide, “Mr. Sibo Prasad Patro”, for his continuous support.
MANSI PATEL
Index
In this digital world, most people are prone to diseases due to a lack
of healthy food, proper sleep and daily exercise. It is crucial to know that we
are suffering from a disease at an early stage rather than discovering it at a later
stage. Hence a disease prediction system plays an important role, as it predicts
diseases based on symptoms. The machine learning approach to disease prediction
is based on predictive modeling: it predicts the disease of a patient according to
the symptoms the user provides as input to the system. We propose a GUI based
disease prediction system that uses three Machine Learning algorithms: Decision
Tree, Random Forest and Naïve Bayes. The system also suggests the drugs most
commonly used to treat the disease. This paper gives an idea of predicting multiple
diseases using Machine Learning algorithms. We use supervised Machine Learning,
implemented with the Decision Tree, Random Forest and Naïve Bayes algorithms,
to support early and accurate prediction of disease and better patient care. The
results indicate that the system is functional and user oriented, helping patients
obtain a timely diagnosis.
Machine Learning is a subset of AI that mainly deals with the study of
algorithms which improve with the use of data and experience. Machine Learning
has two phases, i.e. training and testing. Machine Learning provides an efficient
platform in the medical field to solve various healthcare issues at a much faster rate.
Two main kinds of Machine Learning are Supervised Learning and
Unsupervised Learning. In supervised learning we build a model with the help of
data that is well labeled. An unsupervised learning model, on the other hand,
learns from unlabeled data. The intent is to arrive at a Machine Learning
algorithm that is efficient and accurate for the prediction of disease.
The main feature will be Machine Learning, in which we will use algorithms
such as Decision Tree, Random Forest and Naïve Bayes to support early and
accurate prediction of disease and better patient care.
Accurate and on-time analysis of any health related problem is important
for the prevention and treatment of the illness. The traditional way of diagnosis
may not be sufficient in the case of a serious ailment. Developing a medical
diagnosis system based on machine learning (ML) algorithms for prediction of
any disease can help in a more accurate diagnosis than the conventional method.
We have designed a disease prediction system using multiple ML algorithms. The
dataset used had more than 230 diseases for processing. Based on the symptoms,
age, and gender of an individual, the diagnosis system gives the output as the
disease that the individual might be suffering from. The weighted KNN algorithm
gave the best results as compared to the other algorithms. The accuracy of the
weighted KNN algorithm for the prediction was 93.5 %. Our diagnosis model can
act as a doctor for the early diagnosis of a disease to ensure the treatment can take
place on time and lives can be saved.
The healthcare domain is one of the prominent research fields in the
current scenario with the rapid improvement of technology and data. It is difficult
to handle the huge amount of data of the patients. It is easier to handle this data
through Big Data Analytics. There are a lot of procedures for the treatment of
multiple diseases across the world. Machine Learning is an emerging approach
that helps in prediction, diagnosis of a disease. This paper depicts the prediction
of disease based on symptoms using machine learning. Machine Learning
algorithms such as Naive Bayes, Decision Tree and Random Forest are employed
on the provided dataset to predict the disease. The implementation is done
in the Python programming language. The research identifies the best
algorithm based on accuracy, which is determined by each algorithm's
performance on the given dataset.
The “Disease Prediction” system, based on predictive modeling, predicts the
disease of the user on the basis of the symptoms that the user provides as input to
the system. The system analyzes the symptoms and gives the probability of the
disease as output. Disease prediction is done by implementing the Naïve Bayes
Classifier, which calculates the probability of the disease. An average prediction
accuracy of 60% is obtained.
INTRODUCTION
At present, when a person suffers from a particular disease, they have to visit
a doctor, which is time consuming and costly. If the person is out of reach
of doctors and hospitals, the disease may go unidentified. If this process can
be completed by an automated program, it saves time as well as money and
makes things easier for the patient. There are existing heart-related disease
prediction systems that use data mining techniques to analyze the risk level of
the patient. Disease Predictor is a web based application that predicts the
disease of the user from the symptoms the user provides.
As the use of the internet grows every day, people are always curious to
learn new things and turn to the internet whenever a problem arises. People
have easier access to the internet than to hospitals and doctors, and they often
have no immediate option when they suffer from a particular disease. This
system can therefore be helpful, as people have access to the internet 24 hours a day.
Machine learning is programming computers to optimize a performance criterion
using example data or past experience. It is the study of computer systems that
learn from data and experience. A machine learning algorithm has two phases:
training and testing. Predicting a disease from a patient's symptoms and history
with machine learning has been an active effort for decades. Machine learning
offers a powerful platform in the medical field so that healthcare
issues can be resolved efficiently. We apply machine learning to
maintained hospital data. Machine learning allows models to be built that
analyze data quickly and deliver results faster. With its help, doctors can
make better decisions about patient diagnosis and treatment options, which
leads to improved patient healthcare services. Healthcare is a prime example
of how machine learning is used in practice. To improve accuracy on massive data,
the existing work deals with unstructured and textual data. For the
prediction of diseases, the existing work uses linear, KNN and Decision Tree
algorithms.
➢ PURPOSE:
This system predicts disease according to symptoms and is intended for end
users. It uses Machine Learning technology: a decision tree classifier is used
for predicting diseases and evaluating the model. The system is aimed at people
who are always fretting about their health. For this reason, we provide features
that inform them and improve their mood too. One such health-awareness feature
is the 'Disease Predictor', which recognizes disease according to symptoms.
Disease prediction has the potential to benefit stakeholders such as the
government and health insurance companies, since it can identify patients at
risk of disease or health conditions. There are many tools related to disease
prediction, but they mostly analyze heart-related diseases and generate a risk
level. Generally there are no such tools for the prediction of general diseases,
so Disease Predictor helps with the prediction of general diseases.
The objectives are to implement a Naïve Bayes Classifier that classifies the
disease as per the input of the user, and to develop a web interface platform for
the prediction of the disease. There is a need to study and build a system that
makes it easy for an end user to predict permanent diseases without visiting a
physician or doctor for a diagnosis, and to detect various diseases by examining
a patient's symptoms using various Machine Learning models. There is no proper
method for managing text data and structured data together. The proposed
system will examine both structured and unstructured data. Prediction
accuracy will improve using Machine Learning.
➢ PROJECT SCOPE:
This project aims to provide a GUI based platform to predict the occurrence of
disease on the basis of various symptoms. The user can select various symptoms
and find the likely diseases with their probabilistic figures. There is a need to
study and build a system that makes it easy for an end user to predict
permanent diseases without visiting a physician or doctor for a diagnosis, and to
detect various diseases by examining a patient's symptoms using various
Machine Learning models. There is no proper method for managing text data and
structured data together, hence the data is converted to a structured form. The
proposed system will examine both structured and unstructured data.
Prediction accuracy is improved by using three different Machine Learning
algorithms: Random Forest Classifier, Decision Tree and Naïve Bayes
Classifier. In the proposed model the user inputs their health issues, and the
GUI predicts the disease and shows the result on the screen.
In this project we collected various disease symptoms, and the dataset was
used to train and test machine learning algorithms. Disease predictions are
carried out with the resulting models. In the future we will extend this
project with other machine learning algorithms combined through a voting classifier.
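As an illustration of the voting idea mentioned as future work, the following is a minimal sketch of hard (majority) voting over the labels produced by several classifiers. The function name `hard_vote` and the sample predictions are assumptions made for this example and are not part of the project code.

```python
from collections import Counter


def hard_vote(predictions):
    """Return the label predicted by the most classifiers (ties go to the earliest label)."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical outputs of Decision Tree, Random Forest and Naive Bayes for one patient.
print(hard_vote(["Pneumonia", "Common Cold", "Pneumonia"]))  # Pneumonia
```

With three classifiers, a label wins as soon as two of them agree on it.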
➢ PROJECT FEATURES:
Our project can identify patients at risk of disease or health conditions. Clinicians
can then take appropriate measures to avoid or minimize the risk and in turn,
improve quality of care and avoid potential hospital admissions. The advantages of
our project are as follows:
➢ USER REQUIREMENT:
➢ FUNCTIONAL REQUIREMENTS :
• Authorized user:
Users have to go through an authentication process in order to enter the web
app. This provides security so that other users cannot view your credentials.
• Model:
It is a machine learning web app, which means that the information passed in is
processed by an algorithm which, after training, can give efficient
results.
• Data:
The data fetched are the valid user credentials entered by users and stored in
a database. The database is Firebase, which is highly effective in managing large
databases.
➢ NON FUNCTIONAL REQUIREMENTS:
• Performance Requirements:
• Safety Requirements:
In order to prevent data loss in case of system failure, the information
provided by the user is saved in the database so that the system can resume the
prediction process on reboot.
In case the EA detects any security lapse in the system, he should be able to
shut down the server and close all connections immediately while preserving the
information already stored. The system should be capable of gracefully recovering
from earlier crashes and continuing the prediction process.
• Security Requirements:
• User Documentation:
Documentation for EA
The user interface (through a browser) is easy enough to use even for a lay
user, but minimal instructions may be provided at the bottom of each web page
as an aid for the uninitiated.
➢ HARDWARE AND SOFTWARE REQUIREMENT:-
Memory 128 GB
Analyze the problem in terms of what we want to predict and what kind of
observation data we have to make those predictions. Predictions are generally a
label or a target answer; it may be a yes/no label (binary classification) or a
category (multiclass classification) or a real number (regression).
After identifying what kind of historical data we have for prediction modeling, the next
step is to collect the data from datasets or from any other data sources.
Transform the data in the form that the Machine Learning system can understand.
A Graphical User Interface (GUI) is designed for taking input and displaying
output. There are five input fields, each a dropdown menu of symptoms, and the
user selects the symptoms one by one. The Python Tkinter package is
used for designing the GUI. On pressing the ‘Result’ button, the predicted disease
is shown in the output field. The drugs are also described in the specified field.
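The dropdown selections have to be turned into a numeric feature vector before a classifier can use them. The following is a minimal sketch of that encoding, assuming a short hypothetical symptom list. The names `SYMPTOMS`, `PLACEHOLDER` and `encode_symptoms` are introduced here for illustration only, and the real system works with the full symptom list.

```python
# Hypothetical symptom vocabulary; the real system uses the full symptom list.
SYMPTOMS = ["back_pain", "constipation", "abdominal_pain", "mild_fever", "chest_pain"]
PLACEHOLDER = "Select Here"  # default text of a dropdown the user has not touched


def encode_symptoms(selected, vocabulary=SYMPTOMS):
    """Return a 0/1 list with a 1 at the position of every selected symptom."""
    chosen = {s for s in selected if s != PLACEHOLDER}
    return [1 if symptom in chosen else 0 for symptom in vocabulary]


# Two dropdowns filled in, three left at their default value.
vector = encode_symptoms(["mild_fever", "back_pain", "Select Here", "Select Here", "Select Here"])
print(vector)  # [1, 0, 0, 1, 0]
```

Building a fresh vector for every prediction keeps symptoms from an earlier request out of the next one.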
Before training the model, it is essential to split the data into training and
evaluation sets, as we need to monitor how well a model generalizes to unseen
data. Now, the algorithm will learn the pattern and mapping between the feature
and the label.
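The split into training and evaluation sets can be sketched as follows. This is a simplified, standard-library illustration under assumed toy records; the helper name `train_test_split_rows` is hypothetical, and the project itself reads separate training and test files.

```python
import random


def train_test_split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of `rows` and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical records: (symptom vector, disease code).
records = [([1, 0, 0], 0), ([0, 1, 0], 1), ([0, 0, 1], 2), ([1, 1, 0], 0), ([0, 1, 1], 1)]
train, test = train_test_split_rows(records, test_fraction=0.2, seed=42)
print(len(train), len(test))  # 4 1
```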
Test the model on unknown data. After the system starts working properly, the
model is complete.
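Testing on unseen data usually comes down to comparing predicted labels with the true ones. The sketch below computes accuracy both as a fraction and as a count of correct predictions. The function name `accuracy` and the label values are assumptions for this example.

```python
def accuracy(y_true, y_pred, normalize=True):
    """Fraction of correct predictions, or their count when normalize is False."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true) if normalize else correct


# Hypothetical disease codes: four of the five predictions match.
y_true = [20, 21, 26, 27, 40]
y_pred = [20, 21, 26, 29, 40]
print(accuracy(y_true, y_pred))                   # 0.8
print(accuracy(y_true, y_pred, normalize=False))  # 4
```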
Level0:
• Flow Chart:
• UML Diagram:
It explains the sequence of the Disease Predictor. Initially the system shows the symptoms to be
selected. The user selects the symptoms and submits them to the system. The Disease Predictor
predicts the disease and displays the result.
It explains the different states of the system. First the user opens Disease Predictor. The user
selects the symptoms. When finished selecting symptoms, the user submits them. Disease
Predictor analyzes the symptoms and displays the result.
(State diagram)
Class Diagram:
It explains the classes used in the Disease Predictor. There are three classes in total:
• Symptoms Reader: Reads the user input and creates the list of symptoms.
• Symptoms Analyzer: Displays the subjective result according to the symptoms parameter.
• Calculate Values: Calculates the probabilistic model of the diseases.
(Class diagram)
➢ LOW LEVEL DESIGN
• Process Specification(Algorithms)
We propose a GUI based disease prediction system that uses three Machine Learning algorithms:
Decision Tree, Random Forest and Naïve Bayes. The system also suggests the drugs most
commonly used to treat the disease. This paper gives an idea of predicting multiple diseases
using Machine Learning algorithms. We use supervised Machine Learning, implemented with
the Decision Tree, Random Forest and Naïve Bayes algorithms, to support early and accurate
prediction of disease and better patient care. The results indicate that the system is functional
and user oriented, helping patients obtain a timely diagnosis.
✓ Decision Tree:
The decision tree type used in this research is the gain ratio decision tree. The gain ratio
decision tree is based on the entropy (information gain) approach, which selects the splitting
attribute that minimizes entropy and therefore maximizes information gain.
Information gain is the difference between the original information content and the amount of
information required after the split. The features are ranked by information gain, and the
top-ranked features are chosen as the potential attributes used in the classifier. To identify the
splitting attribute of the decision tree, one calculates the information gain for each attribute and
selects the attribute that maximizes it. Information gain is computed from the entropy, which
is given by:
$$E = -\sum_{i=1}^{k} p_i \log_2 p_i$$
where k is the number of classes of the target attribute and p_i is the number of occurrences of
class i divided by the total number of instances (i.e. the probability of class i occurring). To
reduce the bias resulting from the use of information gain, a variant known as gain ratio
was introduced by the Australian academic Ross Quinlan. The information gain measure is
biased toward tests with many outcomes. That is, it favours attributes having a
large number of values. Gain ratio normalizes the information gain of each attribute to account
for the breadth and uniformity of the attribute values.
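The quantities described above can be computed in a few lines. The following is a minimal sketch, assuming a toy dataset with one binary attribute. The function names and the sample data are illustrative and are not taken from the project code.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy E = -sum(p_i * log2(p_i)) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(labels, attribute_values):
    """Entropy of `labels` minus the weighted entropy after splitting on the attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(labels, attribute_values):
    """Information gain divided by the split information (entropy of the attribute itself)."""
    split_info = entropy(attribute_values)
    return information_gain(labels, attribute_values) / split_info if split_info else 0.0


# Hypothetical toy data: whether the patient has a fever (1/0), and the diagnosis.
fever = [1, 1, 0, 0]
diagnosis = ["flu", "flu", "cold", "cold"]
print(entropy(diagnosis))                  # 1.0
print(information_gain(diagnosis, fever))  # 1.0
print(gain_ratio(diagnosis, fever))        # 1.0
```

Here the attribute separates the two classes perfectly, so the split removes all of the one bit of uncertainty in the labels.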
Decision Trees are a supervised learning method used for regression and classification. A tree
learns simple decision rules inferred from the data features and uses them to predict the value
of the target variable.
There are various decision tree algorithms such as ID3, C4.5, C5.0 and CART. Our model uses
CART, in the optimized form implemented by scikit-learn's DecisionTreeClassifier.
(a) Gini impurity: It is used by the CART algorithm for classification trees. It is a measure of
how often a randomly chosen element from the set would be incorrectly labelled if it was
randomly labelled according to the distribution of labels in the subset.
(b) Information gain: It is used by the ID3, C4.5 and C5.0 tree generation algorithms. It is based
on the concept of entropy and information content from information theory. It is used to decide
which feature to split on at each step in building the tree.
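Gini impurity has a simple closed form, one minus the sum of squared class proportions. The following minimal sketch computes it for a list of labels. The function name `gini_impurity` and the sample labels are assumptions for this example.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_i ** 2) of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


print(gini_impurity(["flu", "flu", "flu", "flu"]))    # 0.0
print(gini_impurity(["flu", "flu", "cold", "cold"]))  # 0.5
```

A pure node scores 0, and an evenly mixed two-class node scores 0.5, the maximum for two classes.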
✓ Naïve Bayes:
The Naïve Bayes classifier is based on Bayes' theorem:
$$P(h \mid d) = \frac{P(d \mid h)\, P(h)}{P(d)}$$
where P(h|d) is the probability of hypothesis h given the data d. This is called the posterior
probability. P(d|h) is the probability of data d given that the hypothesis h was true. P(h) is the
probability of hypothesis h being true (regardless of the data). This is called the prior
probability of h. P(d) is the probability of the data (regardless of the hypothesis).
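These four quantities are tied together by Bayes' theorem, P(h|d) = P(d|h) P(h) / P(d). The sketch below evaluates it for one hypothesis. The probabilities are hypothetical numbers chosen only for illustration, not values from the project dataset.

```python
def posterior(prior_h, likelihood_d_given_h, evidence_d):
    """Bayes' theorem: P(h|d) = P(d|h) * P(h) / P(d)."""
    return likelihood_d_given_h * prior_h / evidence_d


# Hypothetical numbers, chosen only for illustration.
p_flu = 0.1              # P(h): prior probability of flu
p_fever_given_flu = 0.8  # P(d|h): probability of fever when the patient has flu
p_fever = 0.2            # P(d): overall probability of fever
print(posterior(p_flu, p_fever_given_flu, p_fever))  # approximately 0.4
```

Observing the fever raises the probability of flu from the prior of 0.1 to roughly 0.4. A Naïve Bayes classifier applies the same rule with several symptoms, treating them as independent given the disease.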
✓ Random Forest:
Random forest has nearly the same hyperparameters as a decision tree or a bagging
classifier. Fortunately, there’s no need to combine a decision tree with a bagging classifier
because you can easily use the classifier-class of random forest. With random forest, you can
also deal with regression tasks by using the algorithm’s regressor.
Random forest adds additional randomness to the model, while growing the trees.
Instead of searching for the most important feature while splitting a node, it searches for the
best feature among a random subset of features. This results in a wide diversity that generally
results in a better model.
Therefore, in random forest, only a random subset of the features is taken into
consideration by the algorithm for splitting a node. You can even make trees more random by
additionally using random thresholds for each feature rather than searching for the best possible
thresholds (like a normal decision tree does).
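The feature-subset idea can be sketched as follows: at each split only a random sample of the features is scored, and the best of that sample is used. This is a simplified illustration, not a full random forest. The function name `best_split_feature` and the scores are assumptions for this example.

```python
import random


def best_split_feature(feature_scores, max_features, rng):
    """Pick the best-scoring feature among a random subset of `max_features` candidates."""
    candidates = rng.sample(sorted(feature_scores), max_features)
    return max(candidates, key=feature_scores.get), candidates


# Hypothetical split-quality scores (for example, information gain) per symptom.
scores = {"mild_fever": 0.40, "chest_pain": 0.35, "back_pain": 0.10, "cramps": 0.05}
rng = random.Random(0)
for _ in range(3):
    chosen, considered = best_split_feature(scores, max_features=2, rng=rng)
    print(chosen, "chosen from", considered)
```

Because the globally best feature is not always among the candidates, different trees end up splitting on different features, which is the source of the diversity described above.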
Screenshot Diagram
CODING
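from tkinter import *
import numpy as np
import pandas as pd
# scikit-learn classifiers and the accuracy metric used by the functions below
from sklearn import tree
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score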
l1=['back_pain','constipation','abdominal_pain','diarrhoea','mild_fever','yellow_urine',
'yellowing_of_eyes','acute_liver_failure','fluid_overload','swelling_of_stomach',
'swelled_lymph_nodes','malaise','blurred_and_distorted_vision','phlegm','throat_irritation',
'redness_of_eyes','sinus_pressure','runny_nose','congestion','chest_pain','weakness_in_limbs',
'fast_heart_rate','pain_during_bowel_movements','pain_in_anal_region','bloody_stool',
'irritation_in_anus','neck_pain','dizziness','cramps','bruising','obesity','swollen_legs',
'swollen_blood_vessels','puffy_face_and_eyes','enlarged_thyroid','brittle_nails',
'swollen_extremeties','excessive_hunger','extra_marital_contacts','drying_and_tingling_lips',
'slurred_speech','knee_pain','hip_joint_pain','muscle_weakness','stiff_neck','swelling_joints',
'movement_stiffness','spinning_movements','loss_of_balance','unsteadiness',
'weakness_of_one_body_side','loss_of_smell','bladder_discomfort','foul_smell_of urine',
'continuous_feel_of_urine','passage_of_gases','internal_itching','toxic_look_(typhos)',
'depression','irritability','muscle_pain','altered_sensorium','red_spots_over_body','belly_pain',
'abnormal_menstruation','dischromic_patches','watering_from_eyes','increased_appetite',
'polyuria','family_history','mucoid_sputum','rusty_sputum','lack_of_concentration',
'visual_disturbances','receiving_blood_transfusion','receiving_unsterile_injections','coma',
'stomach_bleeding','distention_of_abdomen','history_of_alcohol_consumption','fluid_overload',
'blood_in_sputum','prominent_veins_on_calf','palpitations','painful_walking',
'pus_filled_pimples','blackheads','scurring','skin_peeling','silver_like_dusting',
'small_dents_in_nails','inflammatory_nails','blister','red_sore_around_nose','yellow_crust_ooze']
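# The entries for disease codes 0-19 are missing from this listing.
disease=[
'Hepatitis B','HepatitisC','HepatitisD','HepatitisE','Alcoholichepatitis','Tuberculosis',
'Common Cold','Pneumonia','Dimorphichemmorhoids(piles)',
'Heartattack','Varicoseveins','Hypothyroidism','Hyperthyroidism','Hypoglycemia','Osteoarthristis',
'Arthritis','(vertigo) Paroymsal Positional Vertigo','Acne','Urinary tract infection','Psoriasis',
'Impetigo']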
# Feature vector with one slot per symptom, initially all zeros.
l2=[0]*len(l1)
df=pd.read_csv("Prototype.csv")
# Replace the disease names in the "prognosis" column with integer codes using pandas' replace().
# The mappings for codes 0-19 are missing from this listing.
df.replace({'prognosis':{
'HepatitisB':20,'HepatitisC':21,'HepatitisD':22,'HepatitisE':23,'Alcoholichepatitis':24,
'Tuberculosis':25,'CommonCold':26,'Pneumonia':27,'Dimorphichemmorhoids(piles)':28,
'Heartattack':29,'Varicoseveins':30,'Hypothyroidism':31,'Hyperthyroidism':32,
'Hypoglycemia':33,'Osteoarthristis':34,'Arthritis':35,
'(vertigo) Paroymsal Positional Vertigo':36,'Acne':37,'Urinary tract infection':38,
'Psoriasis':39,'Impetigo':40
}},inplace=True)
#check the df
#print(df.head())
X= df[l1]
#print(X)
y = df[["prognosis"]]
y = np.ravel(y)  # flatten the labels to the 1-D array scikit-learn expects
#print(y)
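tr=pd.read_csv("Prototype 1.csv")
# Apply the same name-to-code mapping to the test data; only part of it survives in this listing.
tr.replace({'prognosis':{
# ...
'Migraine':11,'Cervical spondylosis':12,
# ...
'Hyperthyroidism':32,'Hypoglycemia':33,'Osteoarthristis':34,'Arthritis':35,
# ...
'Impetigo':40}},inplace=True)
X_test= tr[l1]
y_test = np.ravel(tr[["prognosis"]])  # flatten the labels to a 1-D array
#print(y_test)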
def DecisionTree():
    # Train a decision tree classifier on the training set.
    clf3 = tree.DecisionTreeClassifier()
    clf3 = clf3.fit(X,np.ravel(y))
    # Report test accuracy as a fraction, then as a count of correct predictions.
    y_pred=clf3.predict(X_test)
    print(accuracy_score(y_test, y_pred))
    print(accuracy_score(y_test, y_pred,normalize=False))
    psymptoms = [Symptom1.get(),Symptom2.get(),Symptom3.get(),Symptom4.get(),Symptom5.get()]
    # Rebuild the 0/1 vector so symptoms from an earlier prediction do not carry over.
    for k in range(0,len(l1)):
        l2[k] = 1 if l1[k] in psymptoms else 0
    inputtest = [l2]
    predict = clf3.predict(inputtest)
    predicted=predict[0]
    # Show the disease name if the predicted code is a valid index into the disease list.
    t1.delete("1.0", END)
    if 0 <= predicted < len(disease):
        t1.insert(END, disease[predicted])
def randomforest():
    # Train a random forest classifier on the training set.
    clf4 = RandomForestClassifier()
    clf4 = clf4.fit(X,np.ravel(y))
    # Report test accuracy as a fraction, then as a count of correct predictions.
    y_pred=clf4.predict(X_test)
    print(accuracy_score(y_test, y_pred))
    print(accuracy_score(y_test, y_pred,normalize=False))
    psymptoms = [Symptom1.get(),Symptom2.get(),Symptom3.get(),Symptom4.get(),Symptom5.get()]
    # Rebuild the 0/1 vector so symptoms from an earlier prediction do not carry over.
    for k in range(0,len(l1)):
        l2[k] = 1 if l1[k] in psymptoms else 0
    inputtest = [l2]
    predict = clf4.predict(inputtest)
    predicted=predict[0]
    # Show the disease name if the predicted code is a valid index into the disease list.
    t2.delete("1.0", END)
    if 0 <= predicted < len(disease):
        t2.insert(END, disease[predicted])
def NaiveBayes():
    # Train a Gaussian Naive Bayes classifier on the training set.
    gnb = GaussianNB()
    gnb=gnb.fit(X,np.ravel(y))
    # Report test accuracy as a fraction, then as a count of correct predictions.
    y_pred=gnb.predict(X_test)
    print(accuracy_score(y_test, y_pred))
    print(accuracy_score(y_test, y_pred,normalize=False))
    psymptoms = [Symptom1.get(),Symptom2.get(),Symptom3.get(),Symptom4.get(),Symptom5.get()]
    # Rebuild the 0/1 vector so symptoms from an earlier prediction do not carry over.
    for k in range(0,len(l1)):
        l2[k] = 1 if l1[k] in psymptoms else 0
    inputtest = [l2]
    predict = gnb.predict(inputtest)
    predicted=predict[0]
    # Show the disease name if the predicted code is a valid index into the disease list.
    t3.delete("1.0", END)
    if 0 <= predicted < len(disease):
        t3.insert(END, disease[predicted])
root = Tk()
root.configure(background='black')
Symptom1 = StringVar()
Symptom1.set("Select Here")
Symptom2 = StringVar()
Symptom2.set("Select Here")
Symptom3 = StringVar()
Symptom3.set("Select Here")
Symptom4 = StringVar()
Symptom4.set("Select Here")
Symptom5 = StringVar()
Symptom5.set("Select Here")
Name = StringVar()
w2 = Label(root, justify=LEFT, text="Disease Predictor using Machine Learning", fg="Red", bg="White")
w2.config(font=("Times",30,"bold italic"))
NameLb.config(font=("Times",15,"bold italic"))
S1Lb.config(font=("Times",15,"bold italic"))
S2Lb.config(font=("Times",15,"bold italic"))
S3Lb.config(font=("Times",15,"bold italic"))
S4Lb.config(font=("Times",15,"bold italic"))
S5Lb.config(font=("Times",15,"bold italic"))
lrLb.config(font=("Times",15,"bold italic"))
destreeLb.config(font=("Times",15,"bold italic"))
ranfLb.config(font=("Times",15,"bold italic"))
OPTIONS = sorted(l1)
NameEn.grid(row=6, column=1)
S1 = OptionMenu(root, Symptom1,*OPTIONS)
S1.grid(row=7, column=1)
S2 = OptionMenu(root, Symptom2,*OPTIONS)
S2.grid(row=8, column=1)
S3 = OptionMenu(root, Symptom3,*OPTIONS)
S3.grid(row=9, column=1)
S4 = OptionMenu(root, Symptom4,*OPTIONS)
S4.grid(row=10, column=1)
S5 = OptionMenu(root, Symptom5,*OPTIONS)
S5.grid(row=11, column=1)
dst.config(font=("Times",15,"bold italic"))
dst.grid(row=8, column=3,padx=10)
rnf.config(font=("Times",15,"bold italic"))
rnf.grid(row=9, column=3,padx=10)
lr.config(font=("Times",15,"bold italic"))
lr.grid(row=10, column=3,padx=10)
t1.config(font=("Times",15,"bold italic"))
t3.config(font=("Times",15,"bold italic"))
root.mainloop()
TESTING
The test case designed for the project is discussed below:
2. Select submit
Expected Result: The symptoms selected should be submitted and further analyzed to calculate the
probability of the disease.
CONCLUSION
To conclude, our system is helpful to people who constantly worry about their health
and need to know what is happening with their body. Our main motive in developing this
system is to keep them informed about their health, especially people suffering from mental
health conditions such as depression and anxiety, so that they can overcome these problems
and live their daily lives more easily. This project aims to predict disease on the basis of
symptoms. The system takes symptoms from the user as input and produces the predicted
disease as output. An average prediction accuracy of 55% is obtained. Disease
Predictor was successfully implemented using the Grails framework.