A Prediction of Heart Disease Using Machine Learning Algorithms
Abstract. Heart disease is emerging as one of the deadliest diseases. According
to a report published by the World Health Organization (WHO), heart disease has
been one of the most hazardous diseases to humans, causing deaths all over the
world for the last 20 years. Approximately 12 million people die of it every
year, which makes early and more accurate diagnosis of heart disease the biggest
challenge for medical professionals. In this paper, we apply different machine
learning algorithms and compare their classification accuracies. We propose a
modified algorithm that combines logistic regression with principal component
analysis to predict heart disease more accurately from attributes such as age,
blood pressure, chest pain, serum cholesterol level, heart rate, and other
characteristic attributes; patients are classified according to varying degrees
of coronary artery disease.
1 Introduction
In a world of ever-advancing computer technology, Artificial Intelligence
(machine learning and deep learning), one of the major subfields of computer
science, is used in the medical field to predict whether heart disease is
present. The predictions are based on patients' medical records (image files or
.csv files) extracted from medical databases known as Electronic Health Records,
using various algorithms [1, 17].
Heart disease is now among the deadliest diseases. According to a report
published by the World Health Organization (WHO), heart disease has been one of
the most hazardous diseases to humans, causing deaths all over the world for the
last 20 years. Millions of people around the world suffer from heart disease.
Approximately 12 million people die every year, which makes early and more
accurate diagnosis of heart disease the biggest challenge for medical
professionals [2].
There are many traditional methods for predicting such illness, but they do not
appear to be sufficient; for example, data mining algorithms do not predict
heart disease as accurately as machine learning algorithms do (support vector
machine, logistic regression, naïve Bayes, random forest, and decision tree).
With data mining algorithms, problems arise from the very first step, data
extraction, in the form of incomplete data, missing values, and inconsistencies,
and the predicted results are not very accurate. The medical industry needs a
diagnosis system that can predict heart disease at an early stage and offers
more accurate diagnoses than traditional methods [3, 18].
© The Editor(s) (if applicable) and The Author(s), under exclusive license
to Springer Nature Switzerland AG 2021
J. I.-Z. Chen et al. (Eds.): ICIPCN 2020, AISC 1200, pp. 497–504, 2021.
https://doi.org/10.1007/978-3-030-51859-2_45
498 M. F. Ansari et al.
Following the promising success of machine learning algorithms in various
real-world industries, we have observed that machine learning can also be a
promising, highly accurate solution for medical diagnosis, and it can be seen as
a key application in the healthcare industry [4, 17].
In this paper, we apply machine learning algorithms and compare their accuracies
to determine which algorithm classifies most accurately. On this basis, we
propose a modified algorithm for predicting heart disease from attributes such
as age, blood pressure, chest pain, serum cholesterol level, heart rate, and
other characteristic attributes, and patients are classified according to
varying degrees of coronary artery disease. We use the heart disease dataset
from the UCI Machine Learning Repository, which contains 304 rows (one per
patient) and 14 columns.
2 Related Works
Many researchers are working on heart disease prediction, seeking ever better
accuracy with various algorithms [5]. According to the literature, various
techniques have been applied to large datasets for heart disease prediction in
order to find trends, patterns, and associations. A short literature review is
presented here. Recent work suggests that integrating more than one machine
learning technique can improve model performance in heart disease diagnosis.
Studies show that researchers use techniques such as data mining and machine
learning to identify the risk factors associated with heart disease. Statistical
analysis has identified these risk factors as age, blood pressure, cholesterol,
smoking, high blood pressure together with cholesterol levels, family history,
obesity, physical inactivity, and high stress. Sufficient information about
these risk factors helps health care professionals and doctors identify patients
at high risk of heart disease [6].
Many studies have also focused on the security issues that arise when data are
imported before mining. Specifically, they examine scenarios in which data
mining algorithms such as association rule mining and data clustering require
privacy safeguards [16]. Data mining is a promising approach to meeting this
challenging requirement [7].
Subsequently, machine learning emerged as a new research area for heart disease
diagnosis and prediction. Many authors have presented the concept of “heart
disease prediction using machine learning over data mining concepts” as a means
of extracting interesting patterns: machine learning algorithms focus on
prediction, based on a training dataset with known factors, while data mining
identifies previously unknown properties of the data [8]. The machine learning
concept is based on identifying unique patterns in data and extracting usable
knowledge from them. Support vector machines and logistic regression have been
applied to such data to improve prediction accuracy and build an expert
system [9].
3 Algorithms Used
This field was originally kindled by a cardiologist who sought to develop and
test a computational analogy of heart muscles. In the heart disease prediction
system, disease risk attributes are taken as input variables, and ‘disease
existing’ and ‘disease non-existing’ are the output variables.
4 Proposed Model
[Figure: flowchart of the proposed model. Stages: data collection; data
processing (EDA: visualization, cleaning, feature extraction, feature selection,
analysis); applied algorithms (logistic regression, support vector machines);
pattern evaluation; proposed algorithm (logistic regression with PCA); results.]
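The flow above ends in the proposed algorithm, logistic regression with PCA. A minimal sketch of that idea follows: standardize the features, project them onto the leading principal components, and fit logistic regression on the projected values. It is an illustration under stated assumptions rather than the authors' implementation. The data are synthetic stand-ins for patient records, the choice of two components is arbitrary, and every function name (`make_toy_data`, `pca_components`, `fit_logistic`, and so on) is hypothetical. PCA is computed here by power iteration with deflation so that the sketch needs only the standard library.

```python
import math
import random


def make_toy_data(n, seed=0):
    """Synthetic stand-in for patient records (NOT the UCI data):
    three correlated informative features plus one pure-noise feature."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # alternate classes so both splits are balanced
        shift = 2.0 if label else -2.0
        X.append([rng.gauss(shift, 1.0) for _ in range(3)] + [rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y


def column_stats(X):
    """Per-column mean and (population) standard deviation."""
    n, d = len(X), len(X[0])
    means = [sum(r[j] for r in X) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in X) / n) for j in range(d)]
    return means, stds


def standardize(X, means, stds):
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in X]


def pca_components(X, k, iters=300, seed=0):
    """Top-k principal directions of standardized X (power iteration + deflation)."""
    n, d = len(X), len(X[0])
    cov = [[sum(r[a] * r[b] for r in X) / n for b in range(d)] for a in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        eigenvalue = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
        components.append(v)
        # Deflate: remove the found direction so the next pass finds the next one.
        cov = [[cov[a][b] - eigenvalue * v[a] * v[b] for b in range(d)] for a in range(d)]
    return components


def project(X, components):
    return [[sum(c * x for c, x in zip(comp, r)) for comp in components] for r in X]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Logistic regression fitted by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(w * x for w, x in zip(weights, xi)) + bias) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(X, weights, bias):
    return [1 if sigmoid(sum(w * x for w, x in zip(weights, r)) + bias) >= 0.5 else 0
            for r in X]


X, y = make_toy_data(300)
X_train, y_train, X_test, y_test = X[:200], y[:200], X[200:], y[200:]
means, stds = column_stats(X_train)           # scaling learned on training rows only
Z_train = standardize(X_train, means, stds)
Z_test = standardize(X_test, means, stds)
components = pca_components(Z_train, k=2)     # k=2 is an arbitrary illustrative choice
weights, bias = fit_logistic(project(Z_train, components), y_train)
predictions = predict(project(Z_test, components), weights, bias)
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

The scaling statistics and principal directions are estimated on the training rows only and then reused for the held-out rows, so the evaluation does not leak test information. The accuracy printed here describes the synthetic data and says nothing about the results reported in this paper.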
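\[
\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}
\]
\[
\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}
\]
\[
\text{F1 Measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]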
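The evaluation measures above can be computed directly from confusion-matrix counts, where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives. The sketch below is illustrative only: the function name `classification_metrics` and the counts passed to it are hypothetical and are not results from this paper. Specificity, TN / (TN + FP), is included because it is reported in the conclusion, although no formula for it is given above.

```python
def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Compute the evaluation measures from confusion-matrix counts."""
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(tn, tn + fp),  # standard definition, not stated above
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Hypothetical counts chosen only to exercise the formulas.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```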
Attributes with average predictive power, as concluded in the EDA part: Age,
Exang, Slope
i. People older than 35 have a higher chance of heart disease.
ii. Groups of people who share the same characteristics have a higher chance of
having chest pain after exercise.
iii. The disease cohort is more likely to have a flat ST-segment slope than the
non-disease cohort.
Attributes with poor predictive power, as concluded in the EDA part: Trestbps,
Chol, Gender, FBS, Restecg
These attributes have no predictive power; they cannot differentiate between the
disease and non-disease groups.
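One way to make "cannot differentiate between the groups" concrete is to compare an attribute's mean in the disease group with its mean in the non-disease group, scaled by the pooled spread. The sketch below illustrates that idea and is not necessarily the procedure used in the EDA part. The function name `standardized_mean_difference` and all values are hypothetical toy numbers, not rows from the UCI dataset.

```python
import math
from statistics import mean, pvariance


def standardized_mean_difference(values, labels):
    """Difference of group means divided by the pooled standard deviation.

    labels: 1 = disease, 0 = non-disease. Values near 0 suggest the attribute
    does little to separate the two groups.
    """
    disease = [v for v, label in zip(values, labels) if label == 1]
    healthy = [v for v, label in zip(values, labels) if label == 0]
    pooled_std = math.sqrt((pvariance(disease) + pvariance(healthy)) / 2)
    if pooled_std == 0:
        return 0.0
    return (mean(disease) - mean(healthy)) / pooled_std


# Hypothetical toy values: first four records are disease, last four are not.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
age = [60, 62, 58, 65, 40, 45, 42, 41]
chol = [240, 200, 260, 220, 230, 250, 210, 230]

smd_age = standardized_mean_difference(age, labels)
smd_chol = standardized_mean_difference(chol, labels)
print(f"age:  {smd_age:.2f}")
print(f"chol: {smd_chol:.2f}")
```

In this toy data the age values separate the two groups clearly, while the cholesterol values have identical group means and so give a difference of zero.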
6 Conclusion
In this paper, we proposed a new model after applying two existing models. The
models were evaluated on a dataset from the UCI Machine Learning Repository, and
the aim was to predict whether a person has heart disease from attributes such
as blood pressure, heart rate, exang, and fbs, with better accuracy than other
models. First, we trained logistic regression in three ways: with all
attributes, with only the attributes found to have strong predictive power in
the EDA part, and after removing the least significant attributes. Second, we
applied support vector machines. Finally, we proposed a model that combines
logistic regression with principal component analysis. The logistic regression
model with all variables and the logistic regression model with PCA performed
best, with an accuracy of 86%, recall of 68%, specificity of 69%, precision of
77%, and F1 score of 72% [Table 1]. The output of the models is whether heart
disease is present or not, with different levels of presence.
References
1. World Health Organization: “New initiative launched to tackle cardiovascular disease, the
world number one killer” Intra-Health International (2017)
2. Shen, Z., Clarke, M., Jones, R.: Detecting the risk factors of coronary heart disease by use of
neural networks. In: Engineering in Medicine and Biology Society (1993)
3. Subhash, S., Patil, S.: Disease prediction using machine learning over big data. Int. J. Innov.
Res. Sci. Eng. Technol. 7 (2018)
4. Ambekar, S., Phalnikar, R.: Disease prediction by using machine learning. Int. J. Comput.
Eng. Appl. (2018). ISSN 2321-3469
5. Wilson, P.W.F., D’Agostino, R.B., Levy, D., Belanger, A.M.: Prediction of coronary heart
disease using risk factor categories. J. Am. Heart Assoc. 97, 1837–1847 (1998)
6. Amin, S.U., Agarwal, K.: Genetic neural network based data mining in prediction of heart
disease using risk factors. In: 2013 IEEE Conference on Information & Communication
Technologies (ICT) (2013)
7. Kumar, B.S.: Adaptive personalized clinical decision support system using effective data
mining algorithms. J. Netw. Commun. Emerg. Technol. (2018)
8. Stephen, J., Pejaver, V.: Big data in public health: terminology, machine learning, and
privacy. Annu. Rev. Public Health 39, 95–112 (2018)
9. Raj, J.S., Ananthi, J.V.: Recurrent neural networks and nonlinear prediction in support
vector machines. J. Soft Comput. Paradigm (JSCP) 1(01), 33–40 (2019)
10. Simons, L.A., Simons, J., Friedlander, Y.: Risk functions for prediction of cardiovascular
disease in elderly Australians: the Dubbo study. Med. J. Aust. (2003)
11. Bashar, A.: Survey on evolving deep learning neural network architectures. J. Artif. Intell. 1
(02), 73–82 (2019)
12. Kumar, B.S.: Data mining methods and techniques for clinical decision support systems.
J. Netw. Commun. Emerg. Technol. (JNCET) (2017)
13. Vapnik, V.N., Vapnik, V.: Statistical Learning Theory, vol. 2. Wiley, New York
14. Bharti, S.: Analytical study of heart disease prediction comparing with different algorithms.
In: International Conference on Computing, Communication and Automation (ICCA2015)
(2015)
15. Burges, J.C.: A tutorial on support vector machines for pattern recognition. Data Min.
Knowl. Discov. 2(2), 121–167 (1998)
16. Baboota, R., Kaur, H.: Predictive analysis and modelling football results using machine
learning approach for English premier league. Int. J. Forecast. 35(2), 745–755 (2019)
17. Kaur, H., Alam, M.A., Jameel, R., Mourya, A.K., Chang, V.: A proposed solution and future
direction for blockchain-based heterogeneous medicare data in cloud environment. J. Med.
Syst. 42(8), 1–11 (2018). https://doi.org/10.1007/s10916-018-1007-5
18. Kaur, H., Kumari, V.: Predictive modelling and analytics for diabetes using a machine
learning approach. Appl. Comput. Inf. (2018)