
A Survey on Cardiac Disease Prediction Using Machine Learning Algorithms – A Review


Mukesh Raj1, Manan Agarwal2, Om Katiyar3, Manvendra Pathya4
Department of Computer Science and Engineering
JSS Academy of Technical Education, Noida-201301, India
E-mails: 1mukeshraj.2021t@gmail.com , 2mananagarwal1103@gmail.com ,
3omkatiyar604@gmail.com , 4manvendrapathya01@gmail.com

Abstract—As the vital organ regulating the rhythms of the human body, the heart demands precise diagnosis. Even minor diagnostic lapses can culminate in debilitating fatigue or tragic outcomes, and the number of cardiovascular deaths increases daily. Given the escalating global concern about heart disease, researchers have sought innovative solutions. Machine learning, a subfield of artificial intelligence, shows considerable potential to address this need. Its ability to learn patterns from large datasets and forecast future events makes it well suited to predicting potential cardiovascular diseases, which enables early intervention and preventative strategies. This review surveys scholarly literature that applies diverse machine learning algorithms to heart disease risk modelling, including k-nearest neighbours (classifies a new data point by the majority class among its neighbours), decision trees (a tree-like model that splits the dataset into smaller subsets), random forests (an ensemble learning method), logistic and linear regression (used for classification and regression respectively), and support vector machines (separate classes by maximizing the margin between them). The algorithms are trained on reputable datasets from sources such as the Cleveland Clinic, Hungarian, Swiss, Long Beach, and UCI repositories. By comprehensively surveying achievements to date, this review aims to outline the progress made in machine learning for cardiovascular prognosis. Advancing predictive accuracy through iterative research brings closer the goal of personalized heart health monitoring to preserve life and promote community wellness. Further research holds promise to transform cardiovascular prevention through precision machine learning.

Keywords—decision tree, heart disease, KNN, logistic regression, support vector machine, random forest.

I. INTRODUCTION

Cardiovascular diseases remain a leading global cause of mortality; the World Health Organization estimates 12 million deaths annually. Accurate and early diagnosis of these conditions, often referred to as heart diseases, is important for better prevention and treatment. However, traditional diagnostic methods are often time-consuming and costly. Recent advances in machine learning (ML) have prompted researchers to explore its potential for predicting heart disease, offering a non-invasive and possibly more efficient approach. This literature review examines the emerging field of ML applications in heart disease prediction. We conduct a systematic analysis of numerous research papers, evaluating the diverse ML algorithms and techniques employed across studies.

This review examines algorithm diversity, performance comparison, challenges and limitations, and future scope in ML-based heart disease prediction. The algorithm diversity analysis covers a range of ML techniques, including deep learning, support vector machines, logistic regression, and ensemble learning. The performance comparison assesses accuracy, sensitivity, specificity, and other metrics to identify promising approaches. The discussion of challenges and limitations addresses issues such as data quality, bias, and interpretability. Future directions highlight potential avenues to advance ML-based prediction. Through critical analysis, this review aims to provide insights into ML's potential and challenges in revolutionizing early heart disease diagnosis and management. It is intended as a resource for clinicians, researchers, and healthcare professionals.
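
The studies surveyed here are compared mainly by accuracy, sensitivity, and specificity. As a rough illustration of how these three metrics relate to a binary confusion matrix, the following Python sketch may help. The function name and the patient counts are hypothetical and are not taken from any of the reviewed papers.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity and specificity from confusion-matrix counts.

    tp: diseased patients correctly flagged      (true positives)
    fp: healthy patients wrongly flagged         (false positives)
    tn: healthy patients correctly cleared       (true negatives)
    fn: diseased patients wrongly cleared        (false negatives)
    """
    total = tp + fp + tn + fn
    return {
        # Share of all predictions that were correct.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of diseased patients that the model detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Share of healthy patients that the model cleared.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Hypothetical counts for a test set of 100 patients (illustrative only).
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A model can score well on accuracy while missing many diseased patients, which is why the surveyed papers usually report sensitivity and specificity alongside it.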
II. LOGISTIC REGRESSION

Logistic regression, a machine learning algorithm, is applied to classification problems with binary outcomes. It predicts the probability of a specific event based on various factors, outputting a value between 0 and 1 that represents the likelihood of the event occurring. It then classifies data points into two categories by comparing this probability with a threshold (e.g., 0.5).

Analysis of cardiac disease typically involves several key steps. Data acquisition collects relevant medical information using appropriate methods. Preprocessing then cleans the data by removing errors or inconsistencies. Feature selection identifies attributes that are highly correlated with the target variable (disease presence or absence). A logistic regression model is then trained and tested on these features to predict whether cardiac disease is present. In short, the workflow gathers suitable input, prepares the data, focuses on impactful predictors, and finally applies a standard classification algorithm to determine cardiac disease status. Overall, the approach aims to analyse cardiac conditions methodically through established machine learning techniques.

Fig. 1. Workflow of logistic regression model [1]

The datasets utilized for heart disease prediction contain variables representing waves observed in electrocardiogram (ECG) readings. The characteristic waveform morphology generated during ECG exams provides insights into cardiac functioning. The ‘P’ wave signifies atrial depolarization, the ‘QRS’ complex indicates ventricular depolarization, and the ‘T’ wave is linked to ventricular repolarization. Together, examination of these distinct deflections aids in the evaluation of atrial and ventricular conduction as well as myocardial repolarization. Deviations from normal wave patterns can indicate underlying pathologies such as arrhythmias, conduction delays, or ischemia. By incorporating ECG-derived features into predictive modelling, researchers hope to diagnose cardiac conditions more accurately and better stratify patient risk. Overall, the inclusion of variables representing ECG waveform components allows algorithms to leverage the physiological insights obtainable from electrocardiographic assessment of the heart.

Fig. 2. An Electrocardiogram [2]

Logistic regression (LR), a supervised machine learning algorithm for classification, is used when the dependent variable is binary or dichotomous. LR predicts discrete categorical variables that have two possible classes, such as 0 or 1. It can be represented using the sigmoid function, which serves as the activation function in LR. The sigmoid transforms the predicted real values into probabilities in the range 0 to 1.

The logistic sigmoid function is:

\[ P(x) = \frac{1}{1 + e^{-x}} \]

where P(x) is the estimated probability, whose range is from 0 to 1; x is the input to the probability function, namely the algorithm's real-valued prediction; and e is Euler's number, which has a value of approximately 2.71828.
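
The sigmoid and threshold steps of logistic regression described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in any surveyed paper: the function names, the weights, the bias, and the two-feature patient vector are all hypothetical, and a real model would learn its weights from training data.

```python
import math


def sigmoid(x):
    """Logistic sigmoid P(x) = 1 / (1 + e^(-x)), mapping any real x into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # Algebraically equivalent form that avoids overflow for large negative x.
    z = math.exp(x)
    return z / (1.0 + z)


def predict_proba(features, weights, bias):
    """Linear score (bias + sum of weight * feature) passed through the sigmoid."""
    score = bias + sum(w * f for w, f in zip(weights, features))
    return sigmoid(score)


def classify(probability, threshold=0.5):
    """Return 1 (disease present) if the probability reaches the threshold, else 0."""
    return 1 if probability >= threshold else 0


# Hypothetical, hand-picked parameters and one patient with two scaled features.
weights = [0.8, -0.5]
bias = -0.2
patient = [1.5, 0.4]

p = predict_proba(patient, weights, bias)
label = classify(p)
print(f"probability={p:.3f}, predicted class={label}")
```

Lowering the threshold below 0.5 trades specificity for sensitivity, which can be preferable in screening settings where a missed diagnosis is costlier than a false alarm.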
III. LITERATURE SURVEY

[3] Alkayyali, Z. K. et al. (2023) provide a comprehensive review of 40 studies employing machine learning techniques for the prediction of heart diseases. The authors found CNN to be the most prevalent technique in the reviewed studies, achieving accuracy exceeding 88%. They showed the importance of diverse datasets and suggested future studies of under-explored techniques such as reinforcement learning and semi-supervised learning, with more diverse datasets, to improve model generalizability.

[4] Bhatt, C. M. et al. (2023) show that machine learning algorithms employed for heart disease prediction reach up to 94% accuracy. However, the studies the authors mention used small sample sizes, so their results cannot be generalized to larger populations. In this paper, the authors converted continuous data into categorical data, which improved the accuracy of the machine learning algorithms. This is not the only approach, however; the best method depends on the specific dataset and the machine learning algorithm used. The authors used the following machine learning algorithms: random forests (which combine many decision trees to improve accuracy), decision trees (which classify data into different categories), and support vector machines (which are used for classification and regression).

[5] Gupta, C. et al. (2022) report using supervised machine learning algorithms to predict heart diseases with high precision. However, there is no one-size-fits-all approach; different algorithms work well on different datasets. Logistic regression, which is commonly employed for classification tasks, has been shown to be effective for predicting cardiac disease in several studies. Other machine learning algorithms that the authors used to predict cardiac diseases include random forests and support vector machines.

[6] Rahman, M. M. (2022) discusses a web-based cardiac disease prediction system employing machine learning algorithms. The system uses 13 health parameters that have been shown in other studies to be effective for predicting heart disease, and eight machine learning algorithms to make predictions. The results show that decision trees and random forests provide the best precision and effectiveness in the prediction of heart diseases. The system is hosted on a website so that people can check their heart condition from anywhere.

[7] Hasanova, H. et al. (2022) proposed the use of blockchain technology within a machine learning framework to enhance heart disease prediction. The paper argues that traditional methods of storing and analysing medical data are neither secure nor efficient, and that patients lack transparency and access regarding their own data. The authors proposed a blockchain-based system that includes decentralized data storage and patient control over their data. They introduced a machine learning technique called SCA_WKNN for predicting heart disease, which leverages the secure storage and transparency of blockchain technology. The paper thus presents a novel application of blockchain: using it to train an SCA_WKNN algorithm for improved heart disease prediction.

[8] Rajendran, R. et al. (2022) developed a new machine learning pipeline for enhancing accuracy in heart disease diagnosis through precise prediction. The authors combined four datasets to increase dataset volume and address bias-variance issues. They imputed missing values and removed outliers based on attribute relationships and Mahalanobis distance, and proposed a new entropy-based feature engineering (EFE) technique for improved data quality. They deployed a diverse ensemble of machine learning models (Naive Bayes, decision trees, SVMs, random forests, logistic regression) with various pre-processing and feature engineering techniques. An ensemble of Logistic Regression and Naive Bayes outperformed all other tested models, reaching 96.8% accuracy with high specificity (92.7%), precision (91.5%), and an F1 score of 0.931.

[9] Absar, N. et al. (2022) use four machine learning models (AdaBoost, KNN, random forest, decision tree) to diagnose heart disease. The study combined and analysed data from four sources (Switzerland, Cleveland, Long Beach, Hungary). The system achieved 99.03% accuracy on the combined dataset and 93.43% accuracy on the Cleveland data alone, showing its effectiveness across different datasets. The authors built the prediction system with Streamlit, which adds potential for wider accessibility and ease of use.

[10] Sarra, R. R. et al. (2022) introduce a cardiac disease prediction model based on the support vector machine (SVM) algorithm, with the objective of improving precision and minimizing computational load. With χ² statistical feature selection, the model's accuracy rises to 89.47% on the Cleveland dataset and 89.7% on Statlog, demonstrating the effectiveness of this approach in optimizing prediction performance. The χ² method selects only 6 important features out of the 14, reducing computational load and potentially improving generalizability.

[11] Nagavelli, U. et al. (2022) focus on applying machine learning techniques for enhanced detection and
precise heart disease prediction, particularly of early-stage heart failure (HFD). Existing research indicates the potential of ML in disease diagnosis, highlighting techniques such as SVM, XGBoost, and Naïve Bayes. Challenges mentioned include the scarcity of large-scale, diverse datasets for accurate model training and the difficulty of interpreting certain ML algorithms. Future research directions include exploring under-utilized techniques such as DBSCAN and SMOTE-ENN for data preparation and outlier handling, and investigating more diverse datasets to improve model generalizability.

[12] Raju, K. B. et al. (2022) explore heart disease diagnosis using IoT and deep learning techniques, and how such diagnosis suffers from limited accuracy or high computational cost. The diagnosis process relies on an Optimized Cascaded Convolutional Neural Network (CCNN), a specialized architecture designed for enhanced accuracy, whose hyperparameters were tuned with the Galactic Swarm Optimization (GSO) algorithm. Challenges identified include security and privacy concerns, data complexity, and the need for effective feature extraction and optimization techniques.

[13] Al Ahdal, A. et al. (2022) investigate the potential of machine learning for accurate heart disease prediction by analysing UCI medical datasets with various machine learning algorithms. The study aims to identify the best ML models for predicting heart disease. The authors compare and evaluate various models, validate their performance with accuracy and confusion matrix analysis, and improve efficiency through the handling of irrelevant attributes and data normalization. Future work includes using more advanced ML models and addressing data privacy and security concerns.

[14] Sharean, T. M. et al. (2022) review various deep learning techniques and methods for the prediction of heart diseases. The review highlights the potential of these methods for early and accurate diagnosis, potentially leading to improved patient outcomes. The deep learning approaches include CNN-based methods, which extract features from medical data (e.g., ECG signals) and achieve high accuracy in predicting heart disease (up to 99.1%), and DNN-based methods, which combine feature selection and deep learning for efficient and reliable prediction (up to 98.77%). Ensemble deep learning methods combine multiple deep learning models, which is reported to further improve accuracy (up to 98.5%). Future directions include exploring more advanced deep learning models such as LSTMs and capsule networks, applying transfer learning when data is limited, and addressing data privacy and security concerns.

[15] Ahsan, M. M. et al. (2022) focus on the challenges associated with imbalanced data in forecasting cardiac disease with machine learning algorithms and deep learning methodologies. While these methods show promise for early and accurate diagnosis, their performance can be negatively impacted by imbalanced datasets, where one class (e.g., healthy individuals) dominates others (e.g., heart disease patients). The authors conducted a systematic literature review (SLR) of 451 research papers on heart disease diagnosis using ML/DL. The key findings are that current research primarily focuses on improving model performance, neglecting crucial aspects such as interpretability and explainability; that existing heart disease datasets often contain imbalanced class ratios, leading to biased ML/DL performance; and that various data-level and algorithm-level techniques have been presented to deal with imbalanced data, but open challenges remain.

[16] Ahmad, G. N. et al. (2022) propose a system utilizing diverse machine learning algorithms for cardiac disease prediction. The system employs four algorithms: Logistic Regression, Support Vector Machine, K-Nearest Neighbours, and Gradient Boosting Classifier. Each algorithm is tuned with GridSearchCV hyperparameter search and evaluated on two datasets: the combined Switzerland, Cleveland, Long Beach V, and Hungary dataset, and the UCI Kaggle dataset. With GridSearchCV, the Extreme Gradient Boosting Classifier achieved 100% testing accuracy and 99.03% training accuracy on the combined dataset, and 98.05% testing accuracy and 100% training accuracy on the UCI Kaggle dataset. In a comparison with established heart disease prediction techniques, the proposed system achieved the highest testing accuracy.

[17] Chang, V. et al. (2022) propose an AI-based system for detecting cardiac diseases using machine learning algorithms implemented in the Python programming language and its libraries. The paper describes the potential of machine learning in assessing heart disease risk and cites Loku et al. (2020) on Python's safety and applications in healthcare. It adds value to the domain by demonstrating the potential of Python-based machine learning for heart disease detection and by introducing a random forest classifier approach with promising accuracy.

[18] Ansarullah, S. I. et al. (2022) suggest a heart disease risk prediction model leveraging machine learning techniques based on non-invasive factors such as age, blood pressure, BMI, and lifestyle habits. With 85% accuracy, the model is reported to achieve higher prediction accuracy and a lower misclassification rate than competing approaches. Specific findings in this paper
included that random forest achieved the best performance among decision tree, K-nearest neighbour, support vector machine, and Naive Bayes algorithms, and that the model was developed using a novel non-invasive dataset collected from Kashmir, India. Limitations include complex rules in the decision tree model and high misclassification rates in the K-nearest neighbour and Naive Bayes models.

[19] Al Bataineh, A. et al. (2022) propose a new technique for the prediction of cardiac diseases using a swarm-based multilayer perceptron (MLP-PSO) network. The method uses particle swarm optimization (PSO) to optimize the training of the MLP neural network and thereby improve prediction accuracy. The authors compare their method with 10 other machine learning algorithms and find that it outperforms them all, achieving an accuracy of 84.61%. The paper also highlights studies that have used algorithms such as Bayesian networks, decision trees, K-Nearest Neighbours, and neural networks with varying success. Future research could involve testing the method on other datasets and exploring its integration into clinical practice.

[20] Ali, M. M. et al. (2021) investigated the use of machine learning for predicting heart diseases at a very early stage. They compared several supervised learning algorithms on a heart disease dataset from Kaggle and various other sources and found that random forests (RF) achieved the highest accuracy, sensitivity, and specificity (all 100%). This suggests that even a relatively simple machine learning algorithm can be effective on a particular dataset. The paper emphasizes the importance of early-stage heart disease detection and the potential of data mining techniques for improved diagnosis, and it reviews previous research that used data mining for association rule extraction, classification, and clustering.

IV. CONCLUSION

In conclusion, this study examined machine learning's growing role in predicting cardiovascular disease through an extensive analysis. Various algorithms were surveyed, from established methods like logistic regression to modern techniques such as deep learning and ensemble approaches. These algorithms demonstrated differing levels of effectiveness across datasets. While some promising accuracy levels have been achieved, challenges remain regarding data quality, bias, and interpretability, necessitating further research. Future areas of research include exploring underutilized techniques, utilizing more comprehensive and diverse datasets, and addressing ethical considerations like data privacy and security. By advancing in these areas, machine learning has great potential to revolutionize early diagnosis and personalized intervention, ultimately transforming the prevention and management of cardiovascular disease.

REFERENCES

[1] Workflow of logistic regression model https://ars.els-cdn.com/content/image/1-s2.0-S2666285X22000449-gr1_lrg.jpg
[2] ACLS Medical Training – Basics of ECG https://www.aclsmedicaltraining.com/basics-of-ecg/
[3] Alkayyali, Z. K., Idris, S. A. B., & Abu-Naser, S. S. (2023). A Systematic Literature Review of Deep and Machine Learning Algorithms in Cardiovascular Diseases Diagnosis. Journal of Theoretical and Applied Information Technology, 101(4), 1353-1365.
[4] Bhatt, C. M., Patel, P., Ghetia, T., & Mazzeo, P. L. (2023). Effective heart disease prediction using machine learning techniques. Algorithms, 16(2), 88.
[5] Gupta, C., Saha, A., Reddy, N. S., & Acharya, U. D. (2022). Cardiac Disease Prediction using Supervised Machine Learning Techniques. In Journal of Physics: Conference Series (Vol. 2161, No. 1, p. 012013). IOP Publishing.
[6] Rahman, M. M. (2022). A web-based heart disease prediction system using machine learning algorithms. Network Biology, 12(2), 64.
[7] Hasanova, H., Tufail, M., Baek, U. J., Park, J. T., & Kim, M. S. (2022). A novel blockchain-enabled heart disease prediction mechanism using machine learning. Computers and Electrical Engineering, 101, 108086.
[8] Rajendran, R., & Karthi, A. (2022). Heart disease prediction using entropy-based feature engineering and ensembling of machine learning classifiers. Expert Systems with Applications, 207, 117882.
[9] Absar, N., Das, E. K., Shoma, S. N., Khandaker, M. U., Miraz, M. H., Faruque, M. R. I., ... & Pathan, R. K. (2022, June). The efficacy of machine-learning-supported smart system for heart disease prediction. In Healthcare (Vol. 10, No. 6, p. 1137). MDPI.
[10] Sarra, R. R., Dinar, A. M., Mohammed, M. A., & Abdulkareem, K. H. (2022). Enhanced heart disease prediction based on machine learning and χ2 statistical optimal feature selection model. Designs, 6(5), 87.
[11] Nagavelli, U., Samanta, D., & Chakraborty, P. (2022). Machine learning technology-based heart disease detection models. Journal of Healthcare Engineering, 2022.
[12] Raju, K. B., Dara, S., Vidyarthi, A., Gupta, V. M., & Khan, B. (2022). Smart heart disease prediction system with IoT and fog computing sectors enabled by cascaded deep learning model. Computational Intelligence and Neuroscience, 2022.
[13] Al Ahdal, A., Rakhra, M., Badotra, S., & Fadhaeel, T. (2022, March). An integrated machine learning techniques for accurate heart disease prediction. In 2022 International Mobile and Embedded Technology Conference (MECON) (pp. 594-598). IEEE.
[14] Sharean, T. M., & Johncy, G. (2022). Deep learning
models on Heart Disease Estimation-A review. Journal of
Artificial Intelligence, 4(2), 122-130.
[15] Ahsan, M. M., & Siddique, Z. (2022). Machine
learning-based heart disease diagnosis: A systematic
literature review. Artificial Intelligence in Medicine, 128,
102289.
[16] Ahmad, G. N., Fatima, H., Ullah, S., & Saidi, A. S.
(2022). Efficient medical diagnosis of human heart diseases
using machine learning techniques with and without
GridSearchCV. IEEE Access, 10, 80151-80173.
[17] Chang, V., Bhavani, V. R., Xu, A. Q., & Hossain, M.
A. (2022). An artificial intelligence model for heart disease
detection using machine learning algorithms. Healthcare
Analytics, 2, 100016.
[18] Ansarullah, S. I., Saif, S. M., Kumar, P., & Kirmani,
M. M. (2022). Significance of visible non-invasive risk
attributes for the initial prediction of heart disease using
different machine learning techniques. Computational
Intelligence and Neuroscience, 2022.
[19] Al Bataineh, A., & Manacek, S. (2022). MLP-PSO
hybrid algorithm for heart disease prediction. Journal of
Personalized Medicine, 12(8), 1208.
[20] Ali, M. M., Paul, B. K., Ahmed, K., Bui, F. M., Quinn,
J. M., & Moni, M. A. (2021). Heart disease prediction using
supervised machine learning algorithms: Performance
analysis and comparison. Computers in Biology and
Medicine, 136, 104672.
