Heart Disease Prediction
Ministry of Education
King Faisal University
College of Computer Sciences & Information Technology
Department of Information System
4. Data Exploration:
The dataset contains 270 records with 14 variables: ‘age’, ‘sex’, ‘chest pain type’,
‘BP’, ‘cholesterol’, ‘FBS over 120’, ‘EKG results’, ‘max HR’, ‘exercise angina’, ‘ST
depression’, ‘slope of ST’, ‘Number of vessels fluro’, ‘Thallium’, and ‘Heart disease’.
‘Heart disease’ is the target variable, while the other 13 variables are the independent
variables that may influence the occurrence of heart disease.
Quartiles were used to identify outliers in the variables, as shown in the diagram below.
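As an illustration of quartile-based outlier detection, the sketch below flags values that fall more than 1.5 times the interquartile range outside the first and third quartiles. The 1.5 multiplier, the function name `iqr_outliers`, and the sample readings are assumptions made for this example; the report does not state which rule RapidMiner applied.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical cholesterol readings (illustrative, not taken from the dataset)
cholesterol = [210, 225, 230, 240, 245, 250, 260, 270, 564]
outliers = iqr_outliers(cholesterol)
print(outliers)
```

A larger `k` makes the rule more tolerant, so fewer values are flagged.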
5. Data Preparation:
The dataset was split into training and testing sets in a 75%/25% ratio. According to the data
analysis in RapidMiner, the dataset had no missing values, so no treatment was required to
fill in missing values. Normalization was also applied to some of the values.
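The preparation steps can be sketched in plain Python as follows. This is only an illustration: it assumes that the 75% portion is the training set and that normalization means min-max scaling to [0, 1], neither of which is stated explicitly in the report. The helper names `train_test_split` and `min_max_normalize` are hypothetical.

```python
import random

def train_test_split(rows, train_fraction=0.75, seed=42):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

rows = list(range(270))  # stand-in for the 270 records
train, test = train_test_split(rows)
print(len(train), len(test))

# Hypothetical blood pressure readings
print(min_max_normalize([120, 140, 160, 200]))
```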
From the heat map (correlation matrix) above, it is clear that no single attribute has a
high correlation with heart disease, which is the target variable.
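Each cell of a correlation matrix is a pairwise correlation coefficient. The sketch below computes the Pearson coefficient between one attribute and the target. It assumes the heat map was based on Pearson correlation; the sample values are hypothetical and chosen only to show the calculation.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values: maximum heart rate and heart disease label (1 = present)
max_hr = [150, 170, 120, 160, 110, 130]
heart_disease = [0, 0, 1, 0, 1, 1]
r = pearson(max_hr, heart_disease)
print(round(r, 3))
```

A coefficient close to +1 or -1 indicates a strong linear relationship, while values near 0 indicate a weak one.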
                      Predicted positive (1)   Predicted negative (0)
Actual positive (1)   TP                       FN
Actual negative (0)   FP                       TN
The confusion matrix shows 70 true negatives, 39 false positives, 50 false negatives, and
111 true positives. The KNN model achieved an accuracy of 66.80%.
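The counts in the confusion matrix above can be turned into accuracy, precision, and recall with the standard formulas. The sketch below is illustrative and not part of the original RapidMiner workflow; the function name `classification_metrics` is an assumption. Values computed directly from the counts may differ slightly from the reported percentages, for example if the reported figures were averaged over validation folds.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute standard metrics from the four confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,  # share of all predictions that are correct
        "precision": tp / (tp + fp),    # share of predicted positives that are real
        "recall": tp / (tp + fn),       # share of real positives that are found
    }

# Counts taken from the confusion matrix described above
metrics = classification_metrics(tp=111, fn=50, fp=39, tn=70)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```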
The confusion matrix shows 75 true negatives, 31 false positives, 45 false negatives, and
119 true positives. The decision tree model achieved an accuracy of 71.80%.
v. Random forest
Random forest is a machine-learning algorithm that can be used for both classification and
regression. It builds multiple decision trees on samples of the selected data and combines
their outputs to obtain the prediction, in this case the presence of heart disease.
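The idea of building many trees and combining their outputs can be sketched as follows. This is a deliberately simplified illustration, not the RapidMiner operator: each "tree" is a one-split decision stump trained on a bootstrap sample and a randomly chosen feature, and the forest predicts by majority vote. All function names and the sample records are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature):
    """Find the threshold on one feature that misclassifies the fewest rows."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        for above in (0, 1):  # label predicted when the value exceeds the threshold
            errors = sum(
                (above if row[feature] > threshold else 1 - above) != label
                for row, label in zip(rows, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, above)
    return best[1:]  # (feature, threshold, above)

def predict_stump(stump, row):
    feature, threshold, above = stump
    return above if row[feature] > threshold else 1 - above

def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        feature = rng.randrange(len(rows[0]))           # random feature per tree
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        forest.append(fit_stump(sample_rows, sample_labels, feature))
    return forest

def predict_forest(forest, row):
    """Majority vote over the predictions of all stumps."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]

# Hypothetical records: (age, max HR); label 1 = heart disease present
rows = [(35, 180), (40, 170), (44, 165), (48, 150),
        (58, 130), (60, 120), (63, 110), (67, 105)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(rows, labels)
print(predict_forest(forest, (70, 100)), predict_forest(forest, (30, 190)))
```

A full random forest grows deeper trees and samples a subset of features at every split; the stump version keeps only the bootstrap-and-vote structure.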
The confusion matrix shows 83 true negatives, 20 false positives, 37 false negatives, and
130 true positives. The random forest model achieved an accuracy of 78.73%.
The confusion matrix shows 95 true negatives, 18 false positives, 25 false negatives, and
132 true positives. The linear regression model achieved an accuracy of 83.93%.
From the analysis above, linear regression had the highest accuracy at 83.93%, followed by
random forest at 78.73%, the decision tree at 71.80%, and KNN at 66.80%. Linear regression
was also the most precise model, with a precision of 84.59%, and it had the highest recall,
at 83.59%. Of the four classification algorithms we implemented to predict heart disease,
linear regression is therefore the best model, since it had the highest accuracy, precision,
and recall. Heart disease is a serious and critical condition that has resulted in many
deaths, so a prediction model can play a big role in early detection and in reducing those
deaths. For this reason, accuracy and TP should be kept high, while FP should be kept as low
as possible.
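The comparison can be reproduced from the four confusion matrices reported above. In the sketch below, each model name is matched to its matrix through the accuracies quoted in the comparison, with "KN" read as KNN; the dictionary layout and the `accuracy` helper are assumptions for illustration. Accuracies computed from the raw counts may differ by a fraction of a percentage point from the reported ones.

```python
# (TP, FN, FP, TN) counts as reported in the confusion matrices
models = {
    "KNN": (111, 50, 39, 70),
    "Decision tree": (119, 45, 31, 75),
    "Random forest": (130, 37, 20, 83),
    "Linear regression": (132, 25, 18, 95),
}

def accuracy(tp, fn, fp, tn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fn + fp + tn)

# Rank the models from most to least accurate
ranking = sorted(models, key=lambda name: accuracy(*models[name]), reverse=True)
for name in ranking:
    tp, fn, fp, tn = models[name]
    print(f"{name}: accuracy={accuracy(tp, fn, fp, tn):.2%}, TP={tp}, FP={fp}")
```

The top-ranked model here also has the highest TP count and the lowest FP count, which matches the selection criterion stated above.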
Conclusion
Heart disease is a major concern in society due to poor lifestyle and dietary habits in most
countries. It is hard to manually determine the odds of developing heart disease based on the
risk factors. Machine learning, which can be deployed in different settings, plays a vital
role in the prediction of this disease. This implementation shows that machine learning
algorithms can be applied in the healthcare sector to predict and diagnose heart disease at
an early stage.
References
D’Agostino, R. B., Levy, D., Belanger, A. M., Silbershatz, H., & Kannel, W. B. (1998).
Prediction of Coronary Heart Disease Using Risk Factor Categories. Circulation, 97(18),
1837–1847. https://doi.org/10.1161/01.cir.97.18.1837
Mohan, S., Thirumalai, C., & Srivastava, G. (2019). Effective Heart Disease Prediction Using
Hybrid Machine Learning Techniques. IEEE Access, 7, 81542–81554.
https://doi.org/10.1109/access.2019.2923707
Purushottam, Saxena, K., & Sharma, R. (2016). Efficient Heart Disease Prediction System.
Procedia Computer Science, 85, 962–969. https://doi.org/10.1016/j.procs.2016.05.288
UCI Machine Learning Repository: Heart Disease Data Set. (2022).
https://archive.ics.uci.edu/ml/datasets/heart+disease