
HEART DISEASE PREDICTION

Abstract-- According to the Department of Health, cardiac diseases claim the lives of 17.7 million people each year, accounting for 31% of all deaths worldwide. Heart disease has also become the leading cause of death in India. According to the 2016 Global Burden of Disease report, heart disease claimed the lives of 1.7 million Indians in 2016. We referred to around 15-20 IEEE papers and found that they all use the same attributes. We therefore propose a model, developed in consultation with a doctor, that uses new and easily manageable attributes: Gender, Breathlessness during activity, Breathlessness at rest, Awakened by breathlessness at night, Exercise Induced Angina (chest pain after exercise), History of cyanosis (bluish discoloration of fingers/around lips), Diabetes, Clubbing, and Blood Pressure (if more than 140/90). Our project's main goal is to build an intelligent system with machine learning algorithms, namely Naive Bayes, KNN, Random Forest, Decision Tree, Logistic Regression, XGBOOST, LSVM, and ANN. Based on the obtained results, the system can predict whether or not a person is likely to have heart disease. It is implemented as a web application. We implement this system specially for the age group under 50 for early prediction. With early prediction and proper medical treatment, one can reduce the cost of treatment and further damage. If the user does not know his/her diabetes or BP status, he/she can answer no to the main question; the type 1 symptoms of the condition are then shown on the screen, and the system makes its prediction based on his/her responses.

Keywords-- KNN, Naive Bayes, Decision Tree, XGBOOST, Random Forest, Logistic Regression, LSVM, ANN

1. INTRODUCTION

In most animals, the heart is a muscular organ that pumps blood through the blood vessels of the circulatory system. The pumped blood transports oxygen and nutrients to the body and carries metabolic waste, such as carbon dioxide, to the lungs. If the heart fails to function properly, the brain and several other organs will stop working, and the individual will die within minutes. Most cardiovascular diseases can be prevented by addressing observable risk factors, such as cigarette smoking, poor eating habits, obesity, physical inactivity, and harmful alcohol consumption. Individuals with cardiovascular disease, or who are at high cardiovascular risk (due to the presence of at least one risk factor, such as hypertension, diabetes, hyperlipidemia, or an already established illness), require early detection and management with prescribed medication, as advised. Cardiovascular disease is caused by the accumulation of fatty deposits inside the arteries and the formation of blood clots. It is also linked to damage to the brain, heart, kidneys, and eyes, among other organs. According to estimates made by the Health Organization, India has lost up to 237 billion dollars due to cardiovascular disease in the last decade. As a result, timely and precise prognosis of heart-related disease is essential. This system is a web-based application in which the user answers a sequence of questions. Data analytics is widely valued for controlling, contrasting, and managing large data sets, and it can be applied with much success to predict, prevent, and manage cardiovascular disease.

2. LITERATURE SURVEY

This section highlights ongoing research on using machine learning classifiers to predict chronic and persistent diseases. Yuanyuan [1] builds clusters for discovering anomalies, using the Silhouette approach to determine an optimal value of K. They then apply the five most prominent machine learning classification techniques to the data with the discovered anomalies removed. The work in [2] explores the Levy-based crow search algorithm (LCSA). ANFIS, which suits highly non-linear, complex, and dynamic computational processes, is used in this framework to make predictions on cardiac diseases. MSSO is used to optimize the learning parameters of ANFIS, leading to better results. In the RFRF-ILM model given in [3], the Decision Tree (DT) feature variables and criteria are used to cluster datasets. The classifier is then applied to each data set to estimate its performance. The best-performing models are selected based on the results
because they have a lower error rate. The output is further optimized by taking a decision tree cluster with a high error rate and eliminating its related class-type information. Chidambaram [4] first cleansed the dataset using the pandas tool and processed it with preprocessing techniques such as Data Integration, Data Reduction, Data Transformation, and Data Cleaning. Patient records were visualized. The cleansed data is split into 60 percent training and 40 percent test using the split criterion, and the dataset is then tested on five machine learning classifiers. The EDCNN model is being worked on by Rony [5]. This model focuses on a more detailed architecture that encompasses the multilayer perceptron model as well as regularization learning techniques. For the diagnosis, the UCI repository dataset was used, as well as the CNN classifier and multi-layer perceptron (MLP) module. Santhana Krishnan [6] uses a dataset retrieved from an online website. Decision Tree (DT) and Naive Bayes (NB) classifier algorithms are used; these two data mining algorithms were applied to the dataset to estimate the likelihood of a patient developing heart disease.

3. EXISTING SYSTEM

In this section we take a glance at the work that developers and researchers have done in heart disease prediction. Almost all of them have used the same data set, which includes 304 instances of 10 parameters such as sex, age, trestbps, cp, thalach, restecg, chol, fbs, ca, and target, and was collected from the University of California, Irvine (UCI) repository. The dataset is cleansed and processed at the first level using preprocessing techniques such as Data Integration, Data Transformation, Data Reduction, and Data Cleaning with the Pandas tool. In all, 304 patient records were viewed. Data visualization tools assist the data scientist in comprehending the dataset's viability. The cleansed data is divided into 60 percent training and 40 percent test groups, and five machine learning classifiers are then applied to it: Logistic Regression, Support Vector Machine, Decision Tree (DT), Random Forest (RF), and K-Nearest Neighbors (KNN). The confusion matrix was used to calculate the classifiers' accuracy. The best classifier is the one that achieves the maximum accuracy.

4. PROPOSED SYSTEM

Architecture Part A:
This is Architecture Part A of the heart disease prediction system. It gives a quick glance at the system. We have given two options to the user:
1) Quick predict, which doesn't require any registration
2) Sign up/in, which gives more features to users, like previous records and prevention.

Fig 4.1 Architecture part A

Fig 4.1 gives a glance at the system. As it shows, we give the user two options. The quick predict system can predict heart disease without registration of the user. Here the user answers the predefined questions; based on the input data, various machine learning algorithms work on the data, and the prediction of heart disease is given as the output. In the second option, the user needs to register with the system. Once the user's account is created, he/she can access some extra benefits compared to quick predict: the user can check his/her previous records, see some preventive measures for heart disease, and contact doctors and hospitals registered in the system. All the further steps are the same as in quick predict.

Architecture Part B:
The system's operation begins with the collection of data and the selection of relevant attributes. The relevant data is then preprocessed and converted to the required format. The data is then separated into two categories: training and testing. The algorithms are applied, and the model is trained with the training data. The system's correctness is determined by testing it using the testing data. Architecture Part B is shown in Fig 4.2.
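The steps of Architecture Part B can be sketched end to end as follows. This is a minimal, illustrative sketch and not the system's implementation: the records are synthetic, the 70/30 ratio is the one used in this project, and a small Hamming-distance k-nearest-neighbour vote stands in for the classifiers compared in this paper. All function names are hypothetical.

```python
import random

def train_test_split(rows, labels, test_fraction=0.3, seed=0):
    """Shuffle record indices, then split into training and testing parts."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [rows[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    x_test = [rows[i] for i in test_idx]
    y_test = [labels[i] for i in test_idx]
    return x_train, y_train, x_test, y_test

def knn_predict(train_rows, train_labels, row, k=3):
    """Majority vote among the k training records nearest in Hamming distance."""
    def distance(a, b):
        return sum(x != y for x, y in zip(a, b))
    nearest = sorted(range(len(train_rows)),
                     key=lambda i: distance(train_rows[i], row))[:k]
    positive_votes = sum(train_labels[i] for i in nearest)
    return 1 if 2 * positive_votes > k else 0

def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Synthetic stand-in data: 200 records of 8 binary answers. The label rule
# (three or more "yes" answers) is invented purely for this illustration.
rng = random.Random(42)
records = [[rng.randint(0, 1) for _ in range(8)] for _ in range(200)]
targets = [1 if sum(r) >= 3 else 0 for r in records]

# Split 70/30, "train" (k-NN simply stores the training part), then test.
x_train, y_train, x_test, y_test = train_test_split(records, targets)
predictions = [knn_predict(x_train, y_train, row) for row in x_test]
print(f"train={len(x_train)} test={len(x_test)} "
      f"accuracy={accuracy(predictions, y_test):.2f}")
```

Shuffling before the split keeps the training and testing parts from depending on the order in which records were collected; the fixed seed only makes the sketch repeatable.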

Fig 4.2 Architecture Part B

Collection of dataset:
We started by gathering data for our heart disease prediction system. After the dataset was collected, we divided it into training data and testing data. The training dataset is used to learn the prediction model, whereas the testing dataset is used to evaluate the model. For this project, 70% of the data is used for training and 30% for testing. The dataset used for this project was provided by Dr. Haresh Dhondi (MD).

Selection of attributes:
Attribute or feature selection is the selection of appropriate attributes for the prediction system. It is used to boost the system's efficiency. Various attributes of the patient, such as Gender, Breathlessness during activity, Breathlessness at rest, Awakened by breathlessness at night, Exercise Induced Angina (chest pain after exercise), History of cyanosis (bluish discoloration of fingers/around lips), Diabetes, Clubbing, Blood Pressure (if more than 140/90), etc., are selected for the prediction. The correlation matrix is used for attribute selection for this model.

Fig 4.3. Correlation matrix

A correlation matrix plot is a normalized covariance matrix in which each entry is a correlation coefficient. The coefficient represents the strength and direction of the linear relationship between two variables, with values ranging from -1 to +1. Each cell of the matrix shows the correlation coefficient between one pair of features. Showing the correlation matrix as a heat map is an effective way to check for relationships between features.

Data Pre-processing:
Preprocessing data is a crucial stage in the development of a machine learning model. At first, data may not be clean or in the appropriate format for the model, which can lead to inaccurate results. During pre-processing we change the data into the format we need. It is used to deal with noise, duplicates, and missing values in the dataset. Importing datasets, attribute scaling, and dividing datasets are all part of data pre-processing. Data preprocessing is essential to improve the model's accuracy.

Figure: Data Pre-processing

Balancing of Data:
Imbalanced datasets can be balanced in two ways: undersampling and oversampling.
● UnderSampling: The size of the abundant class is reduced to achieve dataset balance. This technique is considered when the amount of data is adequate.
● OverSampling: Dataset balance is achieved by increasing the size of the limited class. This method is considered when the amount of data available is insufficient.

Fig 4.4 Data Balancing

Here, count 0 shows the number of people who do not have heart disease, and count 1 shows the number of people who have been diagnosed with heart disease.

Prediction of Disease:
SVM, Naive Bayes, Decision Tree, Random Forest, Logistic Regression, Artificial Neural Network, and XGBoost are some of the machine learning methods used for classification. For heart disease prediction, a comparative analysis of the algorithms is performed, and the algorithm with the highest accuracy is used.

Data Set Creation:
Medical Expert: Dr. Haresh Dhondi (MD)

Parameters:
We referred to different IEEE papers and found that all the papers use the same attributes. We therefore propose a model, developed in consultation with an expert, Dr. Haresh Dhondi, that uses new and easily manageable attributes. All the parameters that we use are non-medical parameters, so anyone with basic knowledge of health can fill in his/her personal data, and based on the given input data the system predicts whether or not the person has heart disease. Table 4.1 shows the parameters used in the proposed model.

Table 4.1 Feature information of dataset

Sr.No  Attribute Name                                              Range of values
1      Age                                                         Int (years)
2      Gender                                                      Categorical code
3      Breathlessness during activity                              Binary
4      Breathlessness at rest                                      Binary
5      Awake by breathlessness at night                            Binary
6      Exercise induced angina (chest pain after exercise)         Binary
7      History of cyanosis (bluish discoloration of fingers/lips)  Binary
8      Diabetes                                                    Binary
       If user selects NO: 1. Excessive thirst 2. Excessive urination 3. Weight loss 4. Tingling hands or feet
9      Clubbing                                                    Binary
10     Blood pressure (if more than 140/90)                        Binary
       If user selects NO: 1. Severe headache 2. Fatigue 3. Unusual change in behavior 4. Nosebleed
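One possible way to turn a user's answers into the ten-value record of Table 4.1 is sketched below. It is an illustration under stated assumptions, not the system's actual logic: the paper does not say how the follow-up symptom answers are combined, so the two-symptom threshold is hypothetical, as are the gender codes, the field names, and the function names.

```python
# Attribute order follows Table 4.1; the short field names are hypothetical.
FEATURES = [
    "age", "gender", "breathless_activity", "breathless_rest",
    "breathless_night", "exercise_angina", "cyanosis_history",
    "diabetes", "clubbing", "high_bp",
]

# Follow-up symptoms shown when the user answers NO (from Table 4.1).
FOLLOW_UP = {
    "diabetes": ["excessive thirst", "excessive urination",
                 "weight loss", "tingling hands or feet"],
    "high_bp": ["severe headache", "fatigue",
                "unusual change in behavior", "nosebleed"],
}

# ASSUMPTION: treat the attribute as present when at least this many
# follow-up symptoms are reported. The paper gives no such rule.
SYMPTOM_THRESHOLD = 2

# ASSUMPTION: the paper says only "categorical code" and lists no values.
GENDER_CODES = {"female": 0, "male": 1}

def resolve_binary(attribute, answer, reported_symptoms=()):
    """Return 1/0 for a yes/no answer, consulting follow-up symptoms on NO."""
    if answer:
        return 1
    known = FOLLOW_UP.get(attribute, [])
    hits = sum(1 for symptom in reported_symptoms if symptom in known)
    return 1 if hits >= SYMPTOM_THRESHOLD else 0

def encode(response, symptoms=None):
    """Convert a questionnaire response into a feature vector."""
    symptoms = symptoms or {}
    vector = []
    for name in FEATURES:
        value = response[name]
        if name == "age":
            vector.append(int(value))
        elif name == "gender":
            vector.append(GENDER_CODES[value])
        else:
            vector.append(resolve_binary(name, bool(value),
                                         symptoms.get(name, ())))
    return vector

# Example: the user is unsure about diabetes, answers NO, then reports
# two of the listed symptoms.
example = {
    "age": 45, "gender": "male", "breathless_activity": True,
    "breathless_rest": False, "breathless_night": False,
    "exercise_angina": True, "cyanosis_history": False,
    "diabetes": False, "clubbing": False, "high_bp": False,
}
vector = encode(example, {"diabetes": ["excessive thirst", "weight loss"]})
print(vector)
```

Keeping the follow-up lists in a single table-like structure means the questions shown on screen and the encoding step read from the same source.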
5. RESULT

After testing all the above algorithms on our dataset, we obtained the following accuracies. We obtained the highest accuracy with the LSVM model, which is 99.20%.

Comparative Analysis:

Table 5.1 Obtained accuracy with different algorithms

SR.NO  ALGORITHM  ACCURACY
1      NN         99.20%
2      DT         94.35%
3      RF         97.18%
4      XGBOOST    98.31%
5      KNN        86.67%
6      NB         87.24%
7      LR         88.60%
8      LSVM       99.80%

Fig 5.1 Obtained Accuracy

Findings from study:
Young age (6.8%) + Middle age (39.9%) = 46.7% of people up to age 50 have heart-related disease. One striking finding of this research is that, in our dataset, around 6.8% of young persons and 39.9% of middle-aged persons were detected with heart disease, for a total of 46.7% in the young and middle age groups. This is an alarming situation. With early detection and proper medical assistance, these numbers can be reduced.

Fig 5.2 Heart Disease patients Age group wise

Fig 5.3 gives the data distribution: the X axis shows age and the Y axis shows the number of records.

Fig 5.3 Age Wise Distribution
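The comparative analysis behind Table 5.1 amounts to ranking the algorithms by accuracy and keeping the top one. The sketch below does that with the values reported in Table 5.1. The accuracy helper assumes accuracy is derived from confusion-matrix counts, as in the existing systems surveyed; the paper does not state how its own figures were computed, and the function and variable names are hypothetical.

```python
def accuracy_from_confusion(tp, tn, fp, fn):
    """Accuracy = correctly classified records / all classified records."""
    return (tp + tn) / (tp + tn + fp + fn)

# Accuracies in percent, copied from Table 5.1.
reported_accuracy = {
    "NN": 99.20, "DT": 94.35, "RF": 97.18, "XGBOOST": 98.31,
    "KNN": 86.67, "NB": 87.24, "LR": 88.60, "LSVM": 99.80,
}

# Rank from highest to lowest accuracy and keep the best algorithm.
ranking = sorted(reported_accuracy.items(),
                 key=lambda item: item[1], reverse=True)
best_name, best_accuracy = ranking[0]

for position, (name, value) in enumerate(ranking, start=1):
    print(f"{position}. {name:8s} {value:.2f}%")
print(f"selected: {best_name}")
```

Accuracy alone can hide poor performance on the smaller class, which is one reason the dataset is balanced before the algorithms are compared.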


6. CONCLUSION

In this application we have found the best optimized prediction models for heart disease with simple and easily manageable parameters, so that the system can identify heart disease at an early stage. We implement this system specially for the age group under 50 for early prediction. With early prediction and proper medical treatment, one can reduce the cost of treatment and further damage.

7. FUTURE SCOPE

A mechanism has been developed to determine the accuracy of heart disease prediction. The proposed technology has a high level of accuracy when it comes to diagnosing cardiac problems. In the future, we can integrate this web application with an Android app so that it is easily accessible to Android users. We also plan to form a chain of heart specialist hospitals and provide them with this system, so that patients can easily get an idea of the hospitals available for treatment.

REFERENCE

1. Devansh Shah; Samir Patel; Santosh Kumar Bharti, "Heart Disease Prediction using Machine Learning". https://doi.org/10.1007/s42979-020-00365-y

2. Archana Singh; Rakesh Kumar, "Heart Disease Prediction Using Machine Learning Algorithms". DOI: 10.1109/ICE348803.2020.9122958

3. Pranav Motarwar; Ankita Duraphe; G Suganya; M Premalatha, "Cognitive Approach for Heart Disease Prediction using Machine Learning". DOI: 10.1109/ic-ETITE47903.2020.242

4. Harshit Jindal; Sarthak Agrawal; Rishabh Khera; Rachna Jain; Preeti Nagrath, "Heart disease prediction using machine learning algorithms". DOI: 10.1088/1757-899X/1022/1/012072

5. R. Jane Preetha Princy; Saravanan Parthasarathy; P. , "Prediction of Cardiac Disease using Supervised Machine Learning Algorithms". DOI: 10.1109/ICICCS48265.2020.9121169

6. Santhana Krishnan; Geetha.S, "Prediction of Heart Disease Using Machine Learning Algorithms". DOI: 10.1109/ICIICT1.2019.8741465

7. Keerthi Samhitha; Sarika Priya; Jithina Jose, "Improving the Accuracy in Prediction of Heart Disease using Machine Learning Algorithms". DOI: 10.1109/ICCSP48568.2020.9182303

8. Nabaouia Louridi; Meryem Amar; Bouabid El Ouahidi, "Identification of Cardiovascular Disease Using Machine Learning". DOI: 10.1109/CMT.2019.8931411
