
Heart Disease Prediction Using Classification Algorithms

Aditya Tandon
Computer Science and Technology
Noida Institute of Engineering and Technology
Greater Noida, India
awesomeaditya0@gmail.com

Abhishek Pratap Singh
Computer Science and Technology
Noida Institute of Engineering and Technology
Greater Noida, India
singhabhi0106@gmail.com

Amit Rawat
Computer Science and Technology
Noida Institute of Engineering and Technology
Greater Noida, India
awattanay4@gmail.com

Ashutosh Tyagi
Computer Science and Technology
Noida Institute of Engineering and Technology
Greater Noida, India
tyagiashutosh629@gmail.com

Dr. Kumud Saxena
Computer Science and Technology
Noida Institute of Engineering and Technology
Greater Noida, India
saxena.kumud@gmail.com

Abstract— Cardiovascular disease is an illness that can cause sudden death. It occurs when the heart does not work properly because of factors such as obesity, high blood pressure, and cholesterol. The number of deaths due to heart disease has increased, and there is a need for methods that help predict the disease, aid in early diagnosis, and help doctors treat patients. The current study aims to estimate the risk of heart attack based on patient data. In practice, prediction and interpretation are the main goals of data discovery. Predictive data mining uses attributes or variables in datasets to determine unknown or future values of other variables. Interpretation refers to finding patterns that describe the data in a form humans can understand. Machine learning is now used in many fields, and healthcare is no exception. K-nearest neighbors, random forests, etc. are machine learning algorithms that can help predict heart disease in patients. Medical care concerns people's lives, so its decisions must be correct. Therefore, we need to create a system that can accurately predict the disease.

Keywords— Classification algorithm, KNN, random forest, data mining

I. INTRODUCTION
Heart attacks are still one of the leading causes of death worldwide, and early detection can prevent them. In many cases the disease is detected at a stage where it can no longer be treated. It has therefore become necessary to develop tools for the early detection of heart disease: if a person has even a one percent chance of any kind of heart disease and it is detected at an initial stage, a life can be saved.
There are various causes of heart attack, such as smoking, alcohol consumption, an unhealthy lifestyle, and lack of physical exercise. A lot of research already exists on the problem, but emerging technologies give new hope for better results and so must be explored.
Heart datasets contain a lot of confidential information that is available but not useful for prediction. The study uses several data mining techniques to transform this unused data into models. People die from symptoms they never expected. Doctors need to predict heart attacks in patients before they happen. Data mining is a method of analyzing data and transforming it into useful information. The current study aims to estimate the risk of heart attack based on patient data. In practice, prediction and interpretation are the main goals of data discovery. Predictive data mining uses attributes or variables in datasets to determine unknown or future values of other variables. A model that can predict the chance of heart disease with the utmost accuracy would be very helpful and is the goal of our research.

II. METHODS
We will create a program that uses various available machine learning models. We will train each model on the heart-related metrics in the dataset, and the trained model will predict the chance of a heart attack based on data extracted from the patient's report. The work also involves visualizing the data graphically and analyzing the relationships between attributes to better understand the key attributes. We will use classification techniques (K-Nearest Neighbor, Logistic Regression, Random Forest) to obtain an output of 0 or 1, which indicates whether the person has a heart condition. Finally, after processing the data with the different classification methods, we will choose the most accurate model and draw conclusions from its results.
The models are:
Logistic Regression: It is one of the most common machine learning models and is widely used in data science. It estimates a categorical dependent variable from a set of independent variables, so its output is categorical.
KNN: K-Nearest Neighbors is a way to teach computers how to sort things into groups. It looks at all the things that have already been sorted and finds the ones most similar to the new thing. Then it puts the new thing in the same group as those similar things. The method can be used to sort things into groups or to predict what a new thing might be like based on what similar things are like.
Random Forest: Random Forest is a way for computers to learn whether things belong in certain groups or how to predict values. It is like teamwork for computers: it combines many decision trees, each built from a randomly selected subset of the dataset, and merges their outputs to get a better result.
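As an illustration of the K-Nearest Neighbor idea described in the Methods section, the following is a minimal sketch in plain Python. It is not the implementation used in this study. The function name `knn_predict`, the Euclidean distance measure, the choice of k = 3, and the toy (age, cholesterol) rows are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_rows, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training rows."""
    # Euclidean distance from the query to every training row.
    distances = [
        (math.dist(row, query), label)
        for row, label in zip(train_rows, train_labels)
    ]
    # Keep the k closest rows and count how often each label occurs.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical (age, cholesterol) rows; 1 = heart condition, 0 = none.
train_rows = [(45, 180), (50, 190), (48, 200), (62, 280), (65, 300), (60, 290)]
train_labels = [0, 0, 0, 1, 1, 1]

prediction = knn_predict(train_rows, train_labels, query=(63, 285), k=3)
print("Predicted target:", prediction)
```

In practice the attributes would usually be scaled before computing distances, since an attribute with a large numeric range (such as cholesterol) otherwise dominates the result.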
A. WORKING PROCESS
The working process comprises 5 major steps, which form the basis of the data science life cycle. The process starts with data cleaning, moves on to analyzing the data, then to model building and model training, and lastly to predicting the result and comparing the output.

Data Cleaning: It is the process of removing bad or unnecessary data from the dataset, such as duplicate records, attributes with no value, and outliers. It makes the data more useful and suitable for statistical analysis.

EDA: It is the process of analyzing the dataset using data visualization and then finding the characteristics of the attributes, discovering hidden patterns, and identifying relationships (correlations) among the attributes, i.e. how one attribute affects another.
There are various types of exploratory data analysis; the most common and beneficial one is data visualization. Visualization techniques such as heatmaps, bar plots, and scatterplots are used to visualize the statistical data for a better understanding of the data.
Python provides various libraries for EDA. One of them is the pandas library, which offers functions for different requirements.
1. The shape attribute gives the shape of the dataset, i.e. the number of rows and the number of columns.
2. The info function displays the columns with their respective datatypes.
3. The nunique function gives the number of unique values in each column of the dataset.
4. The describe function describes the data; it computes statistics such as the mean and standard deviation.
5. The head function displays the first 5 rows of the dataset.

First five rows of the dataset.
Table 1

Age  sex  cp  chol  fbs  exang  thal  Target
52   1    0   212   0    0      3     0
53   1    0   203   1    1      3     0
59   1    1   221   0    1      2     1
47   1    0   205   0    1      2     0
60   0    0   254   0    0      2     1

A heatmap is used to show the correlation between the attributes of the dataset.

Fig 1.

Data Splitting: Data splitting is one of the important steps of the process. Because classification algorithms are supervised learning algorithms, a separate testing dataset is needed to check the trained model. The given dataset is divided in the ratio 80-20 into a training dataset and a testing dataset respectively. The model is trained on the training dataset and then tested on the testing dataset to analyze whether it predicts correctly and with what accuracy.

Model Training: After data splitting, we moved on to model training. We trained the various available machine learning models on the training dataset. Each model learns the patterns in the training data and uses them to make predictions for the testing dataset.

Model Evaluation: Next, each model is evaluated by measuring its accuracy. The most accurate model will be used for prediction.
Two further measures are important when evaluating a model. The first is precision, the ratio of true positives to all predicted positives. The second is recall, the ratio of true positives to all actual positives, which measures how well the model identifies the positive cases.

Logistic Regression Model: The precision and recall table for the logistic regression model is given below.

Target  Precision  Recall
0       0.92       0.79
1       0.83       0.93

Table 2

K Nearest Neighbor Model: The precision and recall table for the KNN model is given below.

Target  Precision  Recall
0       0.76       0.84
1       0.84       0.76

Table 3

Random Forest Model: The precision and recall table for the random forest model is given below.

Target  Precision  Recall
0       0.92       0.79
1       0.83       0.93

Table 4
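The precision and recall values reported per target class can be computed from the true and predicted labels. The following is a minimal sketch of that calculation in plain Python. The helper name `precision_recall` and the example labels are assumptions for illustration; they are not the study's data and do not reproduce the values in the tables.

```python
def precision_recall(y_true, y_pred, positive):
    """Return (precision, recall), treating `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    # True positives: the class was present and the model predicted it.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: the model predicted the class but it was absent.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: the class was present but the model missed it.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical test labels and predictions (1 = heart condition, 0 = none).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

for target in (0, 1):
    precision, recall = precision_recall(y_true, y_pred, positive=target)
    print(f"Target {target}: precision={precision:.2f} recall={recall:.2f}")
```

Computing the pair once for each target value, as in the loop, gives one row per class, which is the layout used in the precision and recall tables.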
III. FLOWCHART

IV. DATASET

V. RESULT AND OUTCOMES
Three machine learning models are used for the prediction of heart disease. Random forest is the most accurate of the models used.

Model                 Accuracy
Logistic Regression   0.8634
KNN                   0.795
Random Forest         1.0

Table 6

VI. CONCLUSION
The goal of this paper is to find the most accurate machine learning model for predicting the chance of heart disease. The program looks at a person's medical history and compares it to other people's histories to see whether the person might be at risk for heart disease. We used three supervised machine learning algorithms: KNN, random forest, and logistic regression. The predictive system runs all three models on the given input, and each model produces its own prediction. After observing the system for various inputs, we conclude that random forest was the most accurate and gave the best results of the three. It achieved the highest accuracy in our tests (1.0, Table 6).
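The selection procedure summarized above (an 80-20 split, an accuracy score per model, and choosing the most accurate model) can be sketched in plain Python as follows. The helper names `split_80_20`, `accuracy`, and `most_accurate` are hypothetical and this is not the study's code. The only values taken from the paper are the accuracies reported in Table 6.

```python
import random


def split_80_20(rows, labels, seed=0):
    """Shuffle the data and return (train_rows, train_labels, test_rows, test_labels)."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = len(indices) * 4 // 5  # first 80% of the shuffled indices go to training
    train_idx, test_idx = indices[:cut], indices[cut:]
    return (
        [rows[i] for i in train_idx],
        [labels[i] for i in train_idx],
        [rows[i] for i in test_idx],
        [labels[i] for i in test_idx],
    )


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def most_accurate(scores):
    """Return the name of the model with the highest accuracy."""
    return max(scores, key=scores.get)


# Accuracies as reported in Table 6.
reported = {"Logistic Regression": 0.8634, "KNN": 0.795, "Random Forest": 1.0}
best_model = most_accurate(reported)
print("Most accurate model:", best_model)
```

When the dataset contains duplicate records, removing them before the split (as in the data cleaning step) keeps identical rows from appearing in both the training and testing sets, which would otherwise inflate the measured accuracy.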
