
A

Project Report
On

“A GUI BASED DISEASE PREDICTOR USING MACHINE LEARNING CLASSIFIERS”

Submitted in partial fulfillment of the requirements for the
4th Semester Sessional Examination (Major Project) of
Master of Computer Application

By

MANASI PATEL
20MCA073
(20PG030067)

Under the esteemed guidance of


Mr. Jagadish Sahoo

DEPARTMENT OF MASTER OF COMPUTER APPLICATION

GIET UNIVERSITY
GUNUPUR – 765022
2021 – 22
GIET UNIVERSITY
DEPARTMENT OF MCA
Gunupur-765022, Dist-Rayagada, Odisha, INDIA
www.giet.edu

CERTIFICATE

This is to certify that the project work entitled “A GUI BASED DISEASE PREDICTOR
USING MACHINE LEARNING CLASSIFIERS” is done by Mansi Patel in partial fulfillment
of the requirements for the 4th Semester Sessional Examination (Major Project) of
Master of Computer Application during the academic year 2021-22. This work is
submitted to the department as a part of the evaluation of the 4th Semester Project.

Mr. Jagadish Sahoo                    Prof. (Dr.) Sanjay Kumar Kuanar
Project Supervisor                    HoD, CSE

ACKNOWLEDGEMENT

I would like to express my gratitude to my supervisor, Mr. Jagadish Sahoo, for his
guidance and constant supervision, for providing the necessary information regarding
the project, and for his support in completing it. Secondly, I would like to thank my
class teacher, Mr. Sibo Prasad Patro, for his encouragement to complete the project on
time. Thirdly, I extend special gratitude to our HoD, CSE, Dr. Sanjay Kumar Kuanar, and
Assistant HoD, MCA, Dr. Neelamadhab Padhy, for giving us the opportunity to do this
project and to apply in it the basic concepts we have learned. Lastly, I thank
Mr. Sibo Prasad Patro for his continuous support and guidance throughout the project.

MANSI PATEL
Index

Sl No   Topic

1       Abstract
2       Introduction
3       Purpose
4       Project Scope
5       Project Features
6       System Analysis
7       Hardware and Software Requirement
8       System Design and Specification
9       Low Level Design
10      Screenshot of Project Output
11      Coding
12      Testing
13      Conclusion
14      Limitation
15      Reference
Abstract

In today's digital world, many people are prone to disease because of unhealthy
food, inadequate sleep and a lack of daily exercise. It is crucial to learn that we
are suffering from a disease at an early stage rather than discovering it later.
A disease prediction system therefore plays an important role, as it predicts
diseases from symptoms. Our approach is based on predictive modelling: it predicts
a patient's disease from the symptoms that the user supplies as input to the
system. We propose a GUI based disease prediction system that uses three Machine
Learning algorithms: Decision Tree, Random Forest and Naïve Bayes. The system also
suggests drugs that are most commonly used to treat the disease. This report gives
an idea of how multiple diseases can be predicted with Machine Learning
algorithms. We use supervised Machine Learning, implemented with the Decision
Tree, Random Forest and Naïve Bayes algorithms, to support accurate early
prediction of disease and better patient care. The results indicate that the system
is functional and user oriented, and that it supports timely diagnosis of diseases.
Machine Learning is a subset of AI that mainly deals with the study of
algorithms which improve with the use of data and experience. Machine Learning
has two phases: training and testing. It provides an efficient platform in the
medical field for solving various healthcare issues at a much faster rate. Two main
kinds of Machine Learning are supervised learning and unsupervised learning. In
supervised learning we build a model with the help of data that is well labeled.
An unsupervised learning model, on the other hand, learns from unlabeled data. The
intent is to arrive at a Machine Learning algorithm that is efficient and accurate
for the prediction of disease. The main feature of the system is its use of the
Decision Tree, Random Forest and Naïve Bayes algorithms, which help in accurate
early prediction of disease and better patient care.
Accurate and on-time analysis of any health related problem is important
for the prevention and treatment of the illness. The traditional way of diagnosis
may not be sufficient in the case of a serious ailment. Developing a medical
diagnosis system based on machine learning (ML) algorithms for prediction of
any disease can help in a more accurate diagnosis than the conventional method.
We have designed a disease prediction system using multiple ML algorithms. The
dataset used had more than 230 diseases for processing. Based on the symptoms,
age, and gender of an individual, the diagnosis system gives the output as the
disease that the individual might be suffering from. The weighted KNN algorithm
gave the best results as compared to the other algorithms. The accuracy of the
weighted KNN algorithm for the prediction was 93.5 %. Our diagnosis model can
act as a doctor for the early diagnosis of a disease to ensure the treatment can take
place on time and lives can be saved.
The healthcare domain is one of the prominent research fields in the
current scenario, given the rapid improvement of technology and data. It is
difficult to handle the huge amount of patient data by hand; it is easier to handle
this data through Big Data Analytics. There are many procedures for the treatment
of multiple diseases across the world. Machine Learning is an emerging approach
that helps in the prediction and diagnosis of a disease. This report describes the
prediction of disease based on symptoms using machine learning. Machine Learning
algorithms such as Naive Bayes, Decision Tree and Random Forest are applied to
the provided dataset to predict the disease. The implementation is done in the
Python programming language. The work identifies the best algorithm on the basis
of accuracy, which is determined by each algorithm's performance on the given
dataset.
The “Disease Prediction” system is based on predictive modelling: it predicts the
disease of the user on the basis of the symptoms that the user provides as input.
The system analyzes these symptoms and gives the probability of the disease as
output. Disease prediction is done by implementing the Naïve Bayes Classifier,
which calculates the probability of the disease. An average prediction accuracy of
60% is obtained.
INTRODUCTION

At present, when a person suffers from a particular disease, he or she has to visit
a doctor, which is time consuming and costly. If the person is out of reach of
doctors and hospitals, the disease may go unidentified. If this process can instead
be completed by an automated program, it saves time as well as money and makes
things easier for the patient. There are existing heart related disease prediction
systems that use data mining techniques to analyze the risk level of the patient.
Disease Predictor is a web based application that predicts the disease of the user
from the symptoms the user gives.
As the use of the internet grows every day, people are curious to learn new
things and turn to the internet whenever a problem arises. People have easier
access to the internet than to hospitals and doctors, and they often have no
immediate option when they suffer from a particular disease. This system can
therefore be helpful, as people have access to the internet 24 hours a day.
Machine learning is the programming of computers to optimize a performance
criterion using example data or past data; it is the study of computer systems that
learn from data and experience. A machine learning algorithm has two phases:
training and testing. Predicting a disease from a patient's symptoms and history
with machine learning has been pursued for decades. Machine learning gives the
medical field a platform on which healthcare issues can be resolved efficiently.
We apply machine learning to complete, maintained hospital data. Machine learning
allows models to be built that analyze data quickly and deliver results faster.
With it, doctors can make good decisions about patient diagnoses and treatment
options, which improves patient healthcare services. Healthcare is a prime example
of how machine learning is used in practice. To improve accuracy on massive data,
existing work has been done on unstructured and textual data. For the prediction
of diseases, existing work has used linear, KNN and Decision Tree algorithms.
➢ PURPOSE:

This system predicts disease according to symptoms. It is intended for end users
and uses Machine Learning technology; a decision tree classifier is used to make
the predictions and to evaluate the model. The system is for people who are always
fretting about their health, so we provide features that inform them and improve
their mood too. One such health-awareness feature is the 'Disease Predictor', which
recognizes disease according to symptoms. Disease prediction has the potential to
benefit stakeholders such as the government and health insurance companies, since
it can identify patients at risk of disease or health conditions. There are many
tools related to disease prediction, but most of them analyze heart related
diseases and generate a risk level. There are generally no such tools for the
prediction of general diseases, so Disease Predictor helps with the prediction of
general diseases.
The objectives of the project are:

• To implement a Naïve Bayes Classifier that classifies the disease according to
the user's input.
• To develop a web interface platform for the prediction of disease.
• To study and build a system that makes it easy for an end user to predict
permanent diseases without visiting a physician or doctor for a diagnosis.
• To detect various diseases by examining the patient's symptoms using various
Machine Learning models.
• To handle both text data and structured data, for which there is no proper
existing method; the recommended system will examine both structured and
unstructured data.
• To improve prediction accuracy using Machine Learning.
➢ PROJECT SCOPE:

This project aims to provide a GUI based platform that predicts the occurrence of
disease on the basis of various symptoms. The user can select various symptoms
and find the likely diseases with their probabilistic figures. There is a need to
study and build a system that makes it easy for an end user to predict permanent
diseases without visiting a physician or doctor for a diagnosis, and that detects
various diseases by examining the patient's symptoms using various Machine
Learning models. There is no proper method for managing text data together with
structured data, hence the data is converted to a structured form. The recommended
system will examine both structured and unstructured data. Prediction accuracy is
improved by using 3 different Machine Learning algorithms: Random Forest
Classifier, Decision Tree and Naïve Bayes Classifier. In the proposed model the
user inputs his or her health issues, and the GUI predicts the disease and shows
the result on the screen.
In this project we collected the symptoms of various diseases and used machine
learning algorithms to train and test on the dataset. Disease predictions are
carried out according to the trained model. In the future we will extend this
project with other machine learning algorithms combined through a voting
classifier.
➢ PROJECT FEATURES:

Our project can identify patients at risk of disease or health conditions. Clinicians
can then take appropriate measures to avoid or minimize the risk and, in turn,
improve the quality of care and avoid potential hospital admissions. The advantages
of our project are as follows:

1. An automated disease prediction system
2. An interactive user interface
3. A GUI based environment
4. Easy to use
5. The end user needs to input only a little information
6. The accuracy of the application is good
7. The data set is trained using supervised machine learning algorithms
8. A more reliable application
9. The application can run on all platforms
10. The main goal of this research is to find the best accuracy for the prediction
of disease by using major risk factors with different classifier
algorithms such as NB, RF and DT
SYSTEM ANALYSIS

➢ USER REQUIREMENT:

The SRS includes two sections:

• Overall description: describes the major components of the system, their
interconnections and the external interfaces.
• Specific requirements: describe the functions of the actors, their roles
in the system and the constraints.

➢ FUNCTIONAL REQUIREMENTS :

• Authorized user:

Users have to go through an authentication process in order to enter the web
app. This provides security, so that other users cannot view their credentials.

• Model:

It is a machine learning web app: the information passed in is processed by a
machine learning model which, after training, can give efficient results.

• Data:

The data stored are the valid user credentials entered by the users, which are
kept in a database. The database is Firebase, which is effective at managing large
databases.
➢ NON FUNCTIONAL REQUIREMENTS:

• Performance Requirements:

The web app is reliable and effective in predicting the user's heart condition.
The algorithm is fast: once trained, it produces a prediction as soon as it is
given the input, so the response time is short.

• Safety Requirements:

In order to prevent data loss in case of system failure, the information
provided by the user is saved in the database, so that the system can resume the
prediction process on reboot.

In case the EA detects any security lapse in the system, the EA should be able to
shut down the server and close all connections immediately while preserving the
information already stored. The system should be capable of recovering gracefully
from earlier crashes and continuing the prediction process.

• Security Requirements:

The system should provide basic security features such as password
authentication and encrypted transactions. All passwords generated and
communicated to the users should be stored on the server only in an encrypted
form for login management, to prevent misuse. Serial attacks should be avoided
by maintaining a minimum time gap between successive invalid log-in
attempts. Additional security features such as voter anonymity and threshold
schemes for multiple EAs might be provided later as an add-on feature to the
software.
• Project Documentation:
A detailed design document should be presented which describes the classes,
interfaces and control flow involved. The final code would contain a complete
description of the classes, their methods, and their inputs and outputs.

• User Documentation:

Documentation for EA

A step by step, cross-referenced, tutorial-like manual should be provided for
the EA in order to help set up a server and the e-voting system on it. All
features and GUI details should also be clearly described in the document.

Documentation for users

The user interface (through the browser) is easy enough to use even for a lay
user, but minimal instructions may be provided at the bottom of each web page
as an aid for the uninitiated.
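The two login requirements above (no plain-text passwords on the server, and a minimum time gap after an invalid attempt) can be illustrated with a short sketch. This is not code from the project: the class name `LoginGuard`, the 5 second gap and the iteration count are assumptions chosen for illustration, and salted PBKDF2 hashing from the Python standard library is used here in place of the "encrypted form" the requirement mentions.

```python
import hashlib
import hmac
import os
import time

MIN_GAP_SECONDS = 5.0  # assumed minimum gap after an invalid attempt


def hash_password(password, salt=None):
    """Return (salt, digest) using salted PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    return salt, digest


class LoginGuard:
    """Hypothetical helper: stores only salted hashes and throttles retries."""

    def __init__(self, min_gap=MIN_GAP_SECONDS, clock=time.monotonic):
        self._users = {}          # username -> (salt, digest)
        self._last_failure = {}   # username -> time of the last invalid attempt
        self._min_gap = min_gap
        self._clock = clock

    def register(self, username, password):
        self._users[username] = hash_password(password)

    def login(self, username, password):
        now = self._clock()
        last = self._last_failure.get(username)
        # Reject any attempt made too soon after an invalid one.
        if last is not None and now - last < self._min_gap:
            return "too_soon"
        record = self._users.get(username)
        if record is not None:
            salt, expected = record
            _, actual = hash_password(password, salt)
            if hmac.compare_digest(actual, expected):
                self._last_failure.pop(username, None)
                return "ok"
        self._last_failure[username] = now
        return "invalid"


guard = LoginGuard()
guard.register("alice", "s3cret")
print(guard.login("alice", "wrong"))   # invalid
print(guard.login("alice", "s3cret"))  # too_soon (retry came within the gap)
```

Passing the clock in as a parameter keeps the throttle easy to exercise without real waiting.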
➢ HARDWARE AND SOFTWARE REQUIREMENT:-

HARDWARE AND SOFTWARE SPECIFICATION

Component                 Specification
Memory                    128 GB
Processor                 Intel Xeon E5 2667 v4 @ 3.20 GHz * 16
GPU                       NVIDIA Tesla M60
Operating System          Windows
Labeling Software         Labeling
Deep Learning Library     TensorFlow
Programming Language      Python


SYSTEM DESIGN AND SPECIFICATION
➢ HIGH LEVEL DESIGN
• Project model

STEPS TO DEVELOP THE PROJECT:

a) Analyzing the problem statement & requirements

Analyze the problem in terms of what we want to predict and what kind of
observation data we have to make those predictions. Predictions are generally a
label or a target answer; it may be a yes/no label (binary classification) or a
category (multiclass classification) or a real number (regression).

b) Collect and clean the data

Identify what kind of historical data is available for prediction modelling. The
next step is to collect the data from datasets or from any other data sources.

c) Prepare data for ML application

Transform the data in the form that the Machine Learning system can understand.
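As an illustration of this step, the sketch below turns a list of selected symptom names into a 0/1 vector with one position per known symptom, which is the form a classifier can consume. The five-item vocabulary and the function name `encode_symptoms` are assumptions made for the example; the project's own symptom list is much longer.

```python
# Hypothetical, shortened symptom vocabulary used only for this example.
SYMPTOMS = ["back_pain", "constipation", "abdominal_pain", "diarrhoea", "mild_fever"]


def encode_symptoms(selected, vocabulary=SYMPTOMS):
    """Return a 0/1 list with one entry per symptom in `vocabulary`."""
    chosen = set(selected)
    return [1 if name in chosen else 0 for name in vocabulary]


# Unknown entries such as an untouched "Select Here" dropdown are simply ignored.
print(encode_symptoms(["mild_fever", "back_pain", "Select Here"]))  # [1, 0, 0, 0, 1]
```

Building a fresh vector on every call avoids carrying selections over from one prediction to the next.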

d) Prepare the Graphical User Interface (GUI) of the model

A Graphical User Interface (GUI) is designed for taking input and displaying
output. There are 5 input fields, each a dropdown menu of symptoms, and the
user selects the symptoms one by one. The Python Tkinter package is used for
designing the GUI. On pressing the ‘Result’ button, the predicted disease is
shown in the output field, and the suggested drugs are described in their own field.

e) Train the model

Before training the model, it is essential to split the data into training and
evaluation sets, as we need to monitor how well a model generalizes to unseen
data. Now, the algorithm will learn the pattern and mapping between the feature
and the label.
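The split described in this step can be sketched in plain Python as follows. The function name `split_rows`, the 20% evaluation fraction and the fixed seed are assumptions for illustration; scikit-learn's `train_test_split` in `sklearn.model_selection` offers the same idea as a library call, and the project code itself loads its training and testing data from two separate CSV files.

```python
import random


def split_rows(rows, labels, eval_fraction=0.2, seed=42):
    """Shuffle the row indices and hold out `eval_fraction` of them for evaluation."""
    if len(rows) != len(labels):
        raise ValueError("rows and labels must have the same length")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_eval = max(1, int(round(len(indices) * eval_fraction)))
    eval_idx, train_idx = indices[:n_eval], indices[n_eval:]
    train = ([rows[i] for i in train_idx], [labels[i] for i in train_idx])
    evaluation = ([rows[i] for i in eval_idx], [labels[i] for i in eval_idx])
    return train, evaluation


# Toy data: ten rows of two binary features with three possible labels.
rows = [[i % 2, (i // 2) % 2] for i in range(10)]
labels = [i % 3 for i in range(10)]
(train_X, train_y), (eval_X, eval_y) = split_rows(rows, labels)
print(len(train_X), len(eval_X))  # 8 2
```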

f) Evaluate and Improve model Accuracy

Accuracy is a measure of how well or badly a model is doing on an unseen
validation set. Based on the current learning, evaluate the model on validation
sets.
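Accuracy as used here is simply the fraction of validation examples whose predicted label equals the true label. A minimal sketch, with made-up labels and an assumed function name `accuracy`:

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


y_true = [0, 1, 2, 2, 1]   # made-up class indices
y_pred = [0, 1, 2, 1, 1]
print(accuracy(y_true, y_pred))  # 0.8
```

The project code obtains the same figure from scikit-learn's `accuracy_score`, and calls it a second time with `normalize=False` to get the count of correct predictions instead of the fraction.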

g) Test the model

Test the model on unknown data. After the system starts working properly, the
model is complete.

• DFD(Data Flow Diagram):

A data-flow diagram is a way of representing the flow of data through a process or
a system. The DFD also provides information about the outputs and inputs of
each entity and of the process itself. A data-flow diagram has no control flow:
there are no decision rules and no loops.

Level0:
• Flow Chart:
• UML Diagram:

Use Case Diagram:

It explains the sequence of the Disease Predictor. Initially the system shows the symptoms to be
selected. The user selects the symptoms and submits them to the system. The Disease Predictor
predicts and displays the result.

(Use case diagram)


State Diagram:

It explains the different states of the system. First the user opens Disease Predictor and selects
the symptoms. When finished selecting, the user submits the symptoms. Disease
Predictor analyzes the symptoms and displays the result.

(State diagram)
Class Diagram:

It explains the classes used in the Disease Predictor. There are three classes in total:

• Symptoms Reader: reads the user input and creates the list of symptoms.
• Symptoms Analyzer: displays the subjective result according to the symptoms parameter.
• Calculate Values: calculates the probabilistic model of the diseases.

(Class diagram)
➢ LOW LEVEL DESIGN

• Process Specification(Algorithms)
The proposed GUI based disease prediction system uses three supervised Machine Learning
algorithms: Decision Tree, Random Forest and Naïve Bayes. The system also suggests drugs that
are most commonly used to treat the predicted disease. Applying these algorithms supports
accurate early prediction of disease and better patient care, and the results indicate that the
system is functional and user oriented for the timely diagnosis of diseases. Each algorithm is
described below.

✓ Decision Tree:

The decision tree type used in this research is the gain ratio decision tree. The gain ratio
decision tree is based on the entropy (information gain) approach: it selects the splitting
attribute that minimizes the entropy after the split and therefore maximizes the information
gain. Information gain is the difference between the entropy of the data before the split and the
weighted entropy of the subsets produced by the split. The features are ranked by their
information gain, and the top-ranked features are chosen as the potential attributes used in the
classifier. To determine the splitting attribute of the decision tree, one calculates the
information gain for each attribute and selects the attribute that maximizes it. The entropy on
which the information gain is based is calculated using the following formula:

E = -\sum_{i=1}^{k} p_i \log_2 p_i

where k is the number of classes of the target attribute and p_i is the number of occurrences of
class i divided by the total number of instances (i.e. the probability of class i occurring). To
reduce the bias that results from using information gain, a variant known as gain ratio
was introduced by the Australian academic Ross Quinlan. The information gain measure is
biased toward tests with many outcomes; that is, it favours attributes having a
large number of values. Gain ratio normalizes the information gain of each attribute by
the breadth and uniformity of the attribute's values.
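To make the entropy, information gain and gain ratio calculations concrete, here is a small sketch in plain Python. The function names and the toy symptom data are assumptions for illustration and are not taken from the project code. Gain ratio is computed as information gain divided by the entropy of the attribute's own value distribution (the split information).

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def partition(values, labels):
    """Group the labels by the attribute value that accompanies them."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    return list(groups.values())


def information_gain(values, labels):
    """Entropy before the split minus the weighted entropy after it."""
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in partition(values, labels))
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """Information gain normalized by the split information of the attribute."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0


disease = ["flu", "flu", "cold", "cold"]
fever = [1, 1, 0, 0]                      # binary attribute that separates the classes
patient_id = ["a", "b", "c", "d"]         # many-valued attribute, also "perfect"

print(information_gain(fever, disease), gain_ratio(fever, disease))            # 1.0 1.0
print(information_gain(patient_id, disease), gain_ratio(patient_id, disease))  # 1.0 0.5
```

Both attributes reach the same information gain, but the many-valued `patient_id` is penalized by gain ratio, which is the bias correction described above.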

Decision trees are a supervised learning method used for classification and regression. A tree
learns simple decision rules inferred from the data features and uses them to predict the value
of the target variable.

There are various decision tree algorithms, such as ID3, C4.5, C5.0 and CART. CART is the
algorithm implemented (in an optimized form) by scikit-learn's decision tree classifier, and
hence it is the one used in our model.

(a) Gini impurity: It is used by the CART algorithm for classification trees. It is a measure of
how often a randomly chosen element from the set would be incorrectly labelled if it was
randomly labelled according to the distribution of labels in the subset.

(b) Information gain: It is used by the ID3, C4.5 and C5.0 tree generation algorithms. It is based
on the concept of entropy and information content from information theory. It is used to decide
which feature to split on at each step in building the tree.
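The Gini impurity in (a) can be computed as one minus the sum of squared class proportions. The sketch below shows this and the weighted impurity of a two-way split; the function names and toy labels are assumptions for illustration.

```python
from collections import Counter


def gini_impurity(labels):
    """1 - sum(p_i^2): chance of mislabelling a random element of `labels`."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())


def weighted_gini(left, right):
    """Impurity of a split, weighting each side by its share of the samples."""
    n = len(left) + len(right)
    return len(left) / n * gini_impurity(left) + len(right) / n * gini_impurity(right)


print(gini_impurity(["flu", "flu", "cold", "cold"]))    # 0.5
print(weighted_gini(["flu", "flu"], ["cold", "cold"]))  # 0.0
```

A split that lowers the weighted impurity the most is the one a CART-style tree would prefer at that node.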

✓ Naïve Bayes:

It is a machine learning algorithm for classification problems and is based on Bayes’
probability theorem. A common use is text classification, which involves high
dimensional training data sets. We used Bayes' theorem, which can be defined as:

P(h|d) = P(d|h) · P(h) / P(d)

where P(h|d) is the probability of hypothesis h given the data d. This is called the posterior
probability. P(d|h) is the probability of data d given that the hypothesis h was true. P(h) is the
probability of hypothesis h being true (regardless of the data). This is called the prior
probability of h. P(d) is the probability of the data (regardless of the hypothesis).
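A short worked example may help fix the roles of the four terms. The numbers below are invented for illustration only: a disease with prior P(h) = 0.01, a symptom seen in 90% of patients with the disease, and in 5% of patients without it. P(d) is obtained by summing over both cases.

```python
def posterior(prior_h, p_d_given_h, p_d_given_not_h):
    """Bayes' theorem: P(h|d) = P(d|h) * P(h) / P(d)."""
    # P(d) by the law of total probability over h and not-h.
    evidence = p_d_given_h * prior_h + p_d_given_not_h * (1.0 - prior_h)
    return p_d_given_h * prior_h / evidence


# Invented figures: P(h) = 0.01, P(d|h) = 0.9, P(d|not h) = 0.05.
print(round(posterior(0.01, 0.9, 0.05), 3))  # 0.154
```

Even with a symptom that is strongly associated with the disease, the low prior keeps the posterior modest, which is why the prior term matters. The naïve part of the classifier is the further assumption that symptoms are independent given the disease, so their likelihoods can be multiplied.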
✓ Random Forest:

Random forest is a supervised learning algorithm. The "forest" it builds is an ensemble
of decision trees, usually trained with the “bagging” method. The general idea of the bagging
method is that a combination of learning models improves the overall result. One big advantage
of random forest is that it can be used for both classification and regression problems, which
form the majority of current machine learning systems.

Let’s look at random forest in classification, since classification is sometimes
considered the building block of machine learning.

(Figure: a random forest with two trees)

Random forest has nearly the same hyperparameters as a decision tree or a bagging
classifier. Fortunately, there’s no need to combine a decision tree with a bagging classifier
because you can easily use the classifier-class of random forest. With random forest, you can
also deal with regression tasks by using the algorithm’s regressor.

Random forest adds additional randomness to the model, while growing the trees.
Instead of searching for the most important feature while splitting a node, it searches for the
best feature among a random subset of features. This results in a wide diversity that generally
results in a better model.

Therefore, in random forest, only a random subset of the features is taken into
consideration by the algorithm for splitting a node. You can even make trees more random by
additionally using random thresholds for each feature rather than searching for the best possible
thresholds (like a normal decision tree does).
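The two sources of randomness described above (bootstrap samples for bagging, and a random subset of features at each split) can be sketched in plain Python. To keep the example short, each "tree" here is only a one-level decision stump over binary features, which is a simplifying assumption and not how the project or scikit-learn builds its trees. All names and the toy data are illustrative.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_score(rows, labels, feature):
    """Weighted Gini impurity after splitting on one binary feature."""
    groups = {0: [], 1: []}
    for row, label in zip(rows, labels):
        groups[row[feature]].append(label)
    n = len(labels)
    return sum(len(g) / n * gini(g) for g in groups.values())


def train_stump(rows, labels, rng, n_candidates):
    # Bagging: draw a bootstrap sample of the same size, with replacement.
    idx = [rng.randrange(len(rows)) for _ in rows]
    b_rows = [rows[i] for i in idx]
    b_labels = [labels[i] for i in idx]
    # Feature randomness: only a random subset of features may be considered.
    candidates = rng.sample(range(len(rows[0])), n_candidates)
    best = min(candidates, key=lambda f: split_score(b_rows, b_labels, f))
    default = Counter(b_labels).most_common(1)[0][0]
    leaf = {}
    for value in (0, 1):
        group = [y for r, y in zip(b_rows, b_labels) if r[best] == value]
        leaf[value] = Counter(group).most_common(1)[0][0] if group else default
    return best, leaf


def train_forest(rows, labels, n_trees=25, n_candidates=2, seed=0):
    rng = random.Random(seed)
    return [train_stump(rows, labels, rng, n_candidates) for _ in range(n_trees)]


def predict(forest, row):
    """Majority vote over the predictions of all stumps."""
    votes = Counter(leaf[row[feature]] for feature, leaf in forest)
    return votes.most_common(1)[0][0]


# Toy data: three binary symptoms per patient; labels are invented.
rows = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0], [1, 1, 1], [0, 0, 0]]
labels = ["flu", "flu", "cold", "cold", "flu", "cold"]
forest = train_forest(rows, labels)
print(predict(forest, [1, 0, 0]))
```

Because every stump sees a different sample and a different feature subset, individual stumps disagree, and the vote averages out their errors.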
Screenshot Diagram
CODING
from tkinter import *

import numpy as np

import pandas as pd

#List of the symptoms is listed here in list l1.

l1=['back_pain','constipation','abdominal_pain','diarrhoea','mild_fever','yellow_urine',

'yellowing_of_eyes','acute_liver_failure','fluid_overload','swelling_of_stomach',

'swelled_lymph_nodes','malaise','blurred_and_distorted_vision','phlegm','throat_irritation',

'redness_of_eyes','sinus_pressure','runny_nose','congestion','chest_pain','weakness_in_limbs',

'fast_heart_rate','pain_during_bowel_movements','pain_in_anal_region','bloody_stool',

'irritation_in_anus','neck_pain','dizziness','cramps','bruising','obesity','swollen_legs',

'swollen_blood_vessels','puffy_face_and_eyes','enlarged_thyroid','brittle_nails',

'swollen_extremeties','excessive_hunger','extra_marital_contacts','drying_and_tingling_lips',

'slurred_speech','knee_pain','hip_joint_pain','muscle_weakness','stiff_neck','swelling_joints',

'movement_stiffness','spinning_movements','loss_of_balance','unsteadiness',

'weakness_of_one_body_side','loss_of_smell','bladder_discomfort','foul_smell_of urine',

'continuous_feel_of_urine','passage_of_gases','internal_itching','toxic_look_(typhos)',

'depression','irritability','muscle_pain','altered_sensorium','red_spots_over_body','belly_pain',

'abnormal_menstruation','dischromic_patches','watering_from_eyes','increased_appetite',
'polyuria','family_history','mucoid_sputum','rusty_sputum','lack_of_concentration',
'visual_disturbances','receiving_blood_transfusion','receiving_unsterile_injections','coma',
'stomach_bleeding','distention_of_abdomen','history_of_alcohol_consumption','fluid_overload',
'blood_in_sputum','prominent_veins_on_calf','palpitations','painful_walking',
'pus_filled_pimples','blackheads','scurring','skin_peeling','silver_like_dusting',
'small_dents_in_nails','inflammatory_nails','blister','red_sore_around_nose',
'yellow_crust_ooze']

#List of Diseases is listed in list disease.

# Display names, ordered so that the list index equals the integer class label.
# Spellings follow the labels used elsewhere in this program.
disease=['Fungal infection','Allergy','GERD','Chronic cholestasis','Drug Reaction',
'Peptic ulcer diseae','AIDS','Diabetes','Gastroenteritis','Bronchial Asthma','Hypertension',
'Migraine','Cervical spondylosis',
'Paralysis (brain hemorrhage)','Jaundice','Malaria','Chicken pox','Dengue','Typhoid','hepatitis A',
'Hepatitis B','Hepatitis C','Hepatitis D','Hepatitis E','Alcoholic hepatitis','Tuberculosis',
'Common Cold','Pneumonia','Dimorphic hemmorhoids(piles)',
'Heart attack','Varicose veins','Hypothyroidism','Hyperthyroidism','Hypoglycemia','Osteoarthristis',
'Arthritis','(vertigo) Paroymsal Positional Vertigo','Acne','Urinary tract infection','Psoriasis',
'Impetigo']

# One 0/1 slot per symptom in l1, initially all zero.
l2=[0]*len(l1)

df=pd.read_csv("Prototype.csv")

# Replace each disease name in the 'prognosis' column with its integer class index.
# The keys must match the labels in the CSV file exactly, including spaces.
df.replace({'prognosis':{
    'Fungal infection':0,'Allergy':1,'GERD':2,'Chronic cholestasis':3,'Drug Reaction':4,
    'Peptic ulcer diseae':5,'AIDS':6,'Diabetes':7,'Gastroenteritis':8,'Bronchial Asthma':9,
    'Hypertension':10,'Migraine':11,'Cervical spondylosis':12,
    'Paralysis (brain hemorrhage)':13,'Jaundice':14,'Malaria':15,'Chicken pox':16,
    'Dengue':17,'Typhoid':18,'hepatitis A':19,
    'Hepatitis B':20,'Hepatitis C':21,'Hepatitis D':22,'Hepatitis E':23,
    'Alcoholic hepatitis':24,'Tuberculosis':25,
    'Common Cold':26,'Pneumonia':27,'Dimorphic hemmorhoids(piles)':28,'Heart attack':29,
    'Varicose veins':30,'Hypothyroidism':31,
    'Hyperthyroidism':32,'Hypoglycemia':33,'Osteoarthristis':34,'Arthritis':35,
    '(vertigo) Paroymsal Positional Vertigo':36,'Acne':37,'Urinary tract infection':38,
    'Psoriasis':39,'Impetigo':40}},inplace=True)

#check the df

#print(df.head())

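# Feature matrix: one 0/1 column per symptom.
X= df[l1]
#print(X)

# Target labels flattened to 1-D; the original call discarded the result of np.ravel.
y = np.ravel(df[["prognosis"]])
#print(y)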

#Read the testing data from the csv named "Prototype 1.csv"
tr=pd.read_csv("Prototype 1.csv")

#Use the replace method in pandas to map disease names to integer class indices.
tr.replace({'prognosis':{
    'Fungal infection':0,'Allergy':1,'GERD':2,'Chronic cholestasis':3,'Drug Reaction':4,
    'Peptic ulcer diseae':5,'AIDS':6,'Diabetes ':7,'Gastroenteritis':8,'Bronchial Asthma':9,
    'Hypertension ':10,'Migraine':11,'Cervical spondylosis':12,
    'Paralysis (brain hemorrhage)':13,'Jaundice':14,'Malaria':15,'Chicken pox':16,
    'Dengue':17,'Typhoid':18,'hepatitis A':19,
    'Hepatitis B':20,'Hepatitis C':21,'Hepatitis D':22,'Hepatitis E':23,
    'Alcoholic hepatitis':24,'Tuberculosis':25,
    'Common Cold':26,'Pneumonia':27,'Dimorphic hemmorhoids(piles)':28,'Heart attack':29,
    'Varicose veins':30,'Hypothyroidism':31,
    'Hyperthyroidism':32,'Hypoglycemia':33,'Osteoarthristis':34,'Arthritis':35,
    '(vertigo) Paroymsal Positional Vertigo':36,'Acne':37,'Urinary tract infection':38,
    'Psoriasis':39,'Impetigo':40}},inplace=True)

X_test= tr[l1]

# Test labels flattened to 1-D; the original call discarded the result of np.ravel.
y_test = np.ravel(tr[["prognosis"]])
#print(y_test)

def DecisionTree():
    from sklearn import tree
    from sklearn.metrics import accuracy_score

    # Train the decision tree on the training data.
    clf3 = tree.DecisionTreeClassifier()
    clf3 = clf3.fit(X, np.ravel(y))

    # Report test accuracy: the fraction correct, then the number correct.
    y_pred = clf3.predict(X_test)
    print(accuracy_score(y_test, y_pred))
    print(accuracy_score(y_test, y_pred, normalize=False))

    # Build a fresh 0/1 symptom vector so earlier selections do not carry over.
    psymptoms = [Symptom1.get(), Symptom2.get(), Symptom3.get(),
                 Symptom4.get(), Symptom5.get()]
    inputtest = [[1 if symptom in psymptoms else 0 for symptom in l1]]

    predicted = clf3.predict(inputtest)[0]

    # Show the disease name for the predicted class index, if it is valid.
    t1.delete("1.0", END)
    if 0 <= predicted < len(disease):
        t1.insert(END, disease[predicted])
    else:
        t1.insert(END, "Not Found")


def randomforest():
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score

    # Train the random forest on the training data.
    clf4 = RandomForestClassifier()
    clf4 = clf4.fit(X, np.ravel(y))

    # Report test accuracy: the fraction correct, then the number correct.
    y_pred = clf4.predict(X_test)
    print(accuracy_score(y_test, y_pred))
    print(accuracy_score(y_test, y_pred, normalize=False))

    # Build a fresh 0/1 symptom vector so earlier selections do not carry over.
    psymptoms = [Symptom1.get(), Symptom2.get(), Symptom3.get(),
                 Symptom4.get(), Symptom5.get()]
    inputtest = [[1 if symptom in psymptoms else 0 for symptom in l1]]

    predicted = clf4.predict(inputtest)[0]

    # Show the disease name for the predicted class index, if it is valid.
    t2.delete("1.0", END)
    if 0 <= predicted < len(disease):
        t2.insert(END, disease[predicted])
    else:
        t2.insert(END, "Not Found")


def NaiveBayes():
    from sklearn.naive_bayes import GaussianNB
    from sklearn.metrics import accuracy_score

    # Train the Gaussian Naive Bayes classifier on the training data.
    gnb = GaussianNB()
    gnb = gnb.fit(X, np.ravel(y))

    # Report test accuracy: the fraction correct, then the number correct.
    y_pred = gnb.predict(X_test)
    print(accuracy_score(y_test, y_pred))
    print(accuracy_score(y_test, y_pred, normalize=False))

    # Build a fresh 0/1 symptom vector so earlier selections do not carry over.
    psymptoms = [Symptom1.get(), Symptom2.get(), Symptom3.get(),
                 Symptom4.get(), Symptom5.get()]
    inputtest = [[1 if symptom in psymptoms else 0 for symptom in l1]]

    predicted = gnb.predict(inputtest)[0]

    # Show the disease name for the predicted class index, if it is valid.
    t3.delete("1.0", END)
    if 0 <= predicted < len(disease):
        t3.insert(END, disease[predicted])
    else:
        t3.insert(END, "Not Found")


root = Tk()
root.configure(background='black')

# One StringVar per symptom drop-down, each starting on a placeholder value
Symptom1 = StringVar()
Symptom1.set("Select Here")
Symptom2 = StringVar()
Symptom2.set("Select Here")
Symptom3 = StringVar()
Symptom3.set("Select Here")
Symptom4 = StringVar()
Symptom4.set("Select Here")
Symptom5 = StringVar()
Symptom5.set("Select Here")

Name = StringVar()

# Title banners
w2 = Label(root, justify=LEFT, text="Disease Predictor using Machine Learning", fg="Red", bg="White")
w2.config(font=("Times",30,"bold italic"))
w2.grid(row=1, column=0, columnspan=2, padx=100)

w2 = Label(root, justify=LEFT, text="A Project by Shrimad Mishra", fg="Pink", bg="Blue")
w2.config(font=("Times",30,"bold italic"))
w2.grid(row=2, column=0, columnspan=2, padx=100)

# Labels for the patient name and the five symptom drop-downs
NameLb = Label(root, text="Name of the Patient", fg="Red", bg="Sky Blue")
NameLb.config(font=("Times",15,"bold italic"))
NameLb.grid(row=6, column=0, pady=15, sticky=W)

S1Lb = Label(root, text="Symptom 1", fg="Blue", bg="Pink")
S1Lb.config(font=("Times",15,"bold italic"))
S1Lb.grid(row=7, column=0, pady=10, sticky=W)

S2Lb = Label(root, text="Symptom 2", fg="White", bg="Purple")
S2Lb.config(font=("Times",15,"bold italic"))
S2Lb.grid(row=8, column=0, pady=10, sticky=W)

S3Lb = Label(root, text="Symptom 3", fg="Green",bg="white")
S3Lb.config(font=("Times",15,"bold italic"))
S3Lb.grid(row=9, column=0, pady=10, sticky=W)

S4Lb = Label(root, text="Symptom 4", fg="blue", bg="Yellow")
S4Lb.config(font=("Times",15,"bold italic"))
S4Lb.grid(row=10, column=0, pady=10, sticky=W)

S5Lb = Label(root, text="Symptom 5", fg="purple", bg="light green")
S5Lb.config(font=("Times",15,"bold italic"))
S5Lb.grid(row=11, column=0, pady=10, sticky=W)

# Labels for the three result fields (variables named after the classifier they describe)
destreeLb = Label(root, text="DecisionTree", fg="white", bg="red")
destreeLb.config(font=("Times",15,"bold italic"))
destreeLb.grid(row=15, column=0, pady=10,sticky=W)

ranfLb = Label(root, text="RandomForest", fg="Red", bg="Orange")
ranfLb.config(font=("Times",15,"bold italic"))
ranfLb.grid(row=17, column=0, pady=10, sticky=W)

nbLb = Label(root, text="NaiveBayes", fg="White", bg="green")
nbLb.config(font=("Times",15,"bold italic"))
nbLb.grid(row=19, column=0, pady=10, sticky=W)

# Entry for the patient name and one drop-down per symptom, listing all known symptoms
OPTIONS = sorted(l1)

NameEn = Entry(root, textvariable=Name)
NameEn.grid(row=6, column=1)

S1 = OptionMenu(root, Symptom1,*OPTIONS)
S1.grid(row=7, column=1)

S2 = OptionMenu(root, Symptom2,*OPTIONS)
S2.grid(row=8, column=1)

S3 = OptionMenu(root, Symptom3,*OPTIONS)
S3.grid(row=9, column=1)

S4 = OptionMenu(root, Symptom4,*OPTIONS)
S4.grid(row=10, column=1)

S5 = OptionMenu(root, Symptom5,*OPTIONS)
S5.grid(row=11, column=1)

# One button per classifier; each runs the matching prediction function
dst = Button(root, text="Prediction 1", command=DecisionTree,bg="Red",fg="yellow")
dst.config(font=("Times",15,"bold italic"))
dst.grid(row=8, column=3,padx=10)

rnf = Button(root, text="Prediction 2", command=randomforest,bg="White",fg="green")
rnf.config(font=("Times",15,"bold italic"))
rnf.grid(row=9, column=3,padx=10)

nb = Button(root, text="Prediction 3", command=NaiveBayes,bg="Blue",fg="white")
nb.config(font=("Times",15,"bold italic"))
nb.grid(row=10, column=3,padx=10)

# Output fields: t1 = DecisionTree, t2 = RandomForest, t3 = NaiveBayes
t1 = Text(root, height=1, width=40,bg="Light green",fg="red")
t1.config(font=("Times",15,"bold italic"))
t1.grid(row=15, column=1, padx=10)

t2 = Text(root, height=1, width=40,bg="White",fg="Blue")
t2.config(font=("Times",15,"bold italic"))
t2.grid(row=17, column=1 , padx=10)

t3 = Text(root, height=1, width=40,bg="red",fg="white")
t3.config(font=("Times",15,"bold italic"))
t3.grid(row=19, column=1 , padx=10)

root.mainloop()
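
The three prediction functions in the listing repeat the same steps: read the selected symptoms, build a 0/1 vector, call predict, and map the integer label back to a disease name. A minimal sketch of how that shared logic could be factored into one helper is given below. It is an illustration only: the names predict_disease and StubClassifier and the sample lists are hypothetical and do not appear in the project code, and the stub stands in for any fitted scikit-learn classifier.

```python
# Sketch only: predict_disease, StubClassifier and the sample data are
# hypothetical and are not part of the original listing.

def predict_disease(clf, selected, all_symptoms, diseases):
    """Encode the selected symptoms as a fresh 0/1 vector, ask the fitted
    classifier for a label, and map that label to a disease name."""
    chosen = set(selected)
    vector = [1 if s in chosen else 0 for s in all_symptoms]
    label = int(clf.predict([vector])[0])
    if 0 <= label < len(diseases):
        return diseases[label]
    return "Not Found"


class StubClassifier:
    """Stand-in for a fitted model; it only needs a predict() method."""

    def predict(self, rows):
        # Label 1 if the first symptom is present, otherwise label 0.
        return [1 if row[0] == 1 else 0 for row in rows]


symptoms = ["itching", "skin_rash", "cough"]
diseases = ["Common Cold", "Fungal infection"]

# "Select Here" is the placeholder of an unused drop-down and is ignored.
result = predict_disease(StubClassifier(), ["itching", "Select Here"], symptoms, diseases)
print(result)
```

With a helper of this kind, each button callback would only need to pass its own classifier and write the returned string to its own text field.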
TESTING
The test case designed for the project is described below.

Test Case I: Submit symptoms from the list

Precondition: The application is open.

Assumptions: The symptoms for the disease are available in the symptom list.

Test steps: 1. Select the symptoms from the drop-down lists.

2. Click one of the prediction buttons.

Expected Result: The selected symptoms should be submitted to the chosen classifier, and the
predicted disease should be displayed in the corresponding output field.
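
The manual test case above could be complemented by an automated check of the symptom-encoding step. The sketch below uses Python's unittest module. The function encode_symptoms and the three-symptom list are hypothetical stand-ins for the encoding loop and the l1 list in the project code.

```python
import unittest


def encode_symptoms(selected, all_symptoms):
    """Hypothetical helper: return a 0/1 list marking the selected symptoms."""
    chosen = set(selected)
    return [1 if s in chosen else 0 for s in all_symptoms]


class EncodeSymptomsTest(unittest.TestCase):
    SYMPTOMS = ["itching", "skin_rash", "cough"]  # illustrative subset

    def test_selected_symptoms_are_marked(self):
        self.assertEqual(
            encode_symptoms(["cough", "itching"], self.SYMPTOMS), [1, 0, 1]
        )

    def test_placeholder_is_ignored(self):
        # Unused drop-downs keep the "Select Here" placeholder.
        self.assertEqual(
            encode_symptoms(["Select Here"] * 5, self.SYMPTOMS), [0, 0, 0]
        )


# Run the suite explicitly so the script does not exit the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(EncodeSymptomsTest)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```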
CONCLUSION

To conclude, our system is helpful to people who worry about their health and want to understand
what is happening in their bodies. Our main motive in developing this system is to make them
aware of their health, especially people who suffer from mental health conditions such as
depression and anxiety, so that they can overcome these problems and manage their daily lives
more easily. This project aims to predict a disease on the basis of symptoms. The system takes
symptoms from the user as input and produces the predicted disease as output. An average
prediction accuracy of 55% is obtained. Disease Predictor was successfully implemented in Python,
using Tkinter for the GUI and scikit-learn for the classifiers.

Besides, our system provides better accuracy of disease prediction according to the symptoms of
the user, and it will also provide motivational thoughts and images. In the end, we can say that
our system places no restriction on its users, because everyone can use it.
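
The accuracy figures printed by the code are the fraction of test samples predicted correctly and, with normalize=False, the number of correct predictions. A minimal pure-Python sketch of both quantities, and of averaging the fraction across the three classifiers, is shown below. The function name accuracy and all label values are made up for illustration and are not the project's results.

```python
def accuracy(y_true, y_pred, normalize=True):
    """Fraction of matching labels, or the raw count when normalize is False."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true) if normalize else correct


# Illustrative labels only; these are not taken from the project's dataset.
y_true = [0, 1, 2, 1]
predictions = {
    "DecisionTree": [0, 1, 2, 1],
    "RandomForest": [0, 1, 0, 1],
    "NaiveBayes": [0, 0, 0, 1],
}

scores = {name: accuracy(y_true, pred) for name, pred in predictions.items()}
counts = {name: accuracy(y_true, pred, normalize=False) for name, pred in predictions.items()}
average = sum(scores.values()) / len(scores)

print(scores, counts, average)
```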
LIMITATION
1. In our proposed system we have used only selected data values; we expect to add more data
values in the future.
2. We have used machine learning classifiers because the dataset is small; deep learning
algorithms require large amounts of data to achieve higher accuracy.
3. Each narrow application needs to be specially trained.
4. Require large amounts of hand-crafted, structured training data.
5. Learning must generally be supervised: Training data must be tagged.
6. Require lengthy offline/ batch training.
7. Do not learn incrementally or interactively, in real time.
