Heart Disease

DATA MINING AND DATA WAREHOUSING

HEART DISEASE PREDICTION USING RAPIDMINER

Student Name                  ID
Faddel Ali Sahool             218011753
Hadi Al-Naser                 218031879
Mohammed Ibrahim Alhussain    216016024
Hashem Zakaria Alsadah        218028932

INTRODUCTION

• In the modern technological world, heart disease has continued to increase and has become one of
the major causes of death worldwide. Heart attacks are especially tragic because they block the
flow of blood to the brain or the heart, and people at risk may show elevated blood pressure, blood
glucose, and stress levels. Other heart-related diseases include coronary heart disease and
hypertension (Mohan et al., 2019). Data mining has been used in a variety of applications, for
instance marketing, customer relationship management, engineering and medical analysis, mobile
computing, and web mining.
OBJECTIVE

• The main objective of this analysis is to develop a heart disease prediction system using
RapidMiner. The algorithms can discover and extract hidden information about the disease from the
historical data set.
• The system applies data mining methods to the available medical data to assist in heart disease
prediction.
IMPORTANCE

• Clinical decisions are often made on the basis of the doctor's experience and intuition rather
than the knowledge-rich data hidden in medical datasets. This can lead to unwanted biases, errors,
and excessive medical costs, which affect the quality of service given to patients. Prediction
models offer a new way to uncover the hidden patterns in the available heart disease data, and can
therefore play a big part in reducing the human biases that have long affected this sector.
• The project could help save lives by supporting education on a healthy lifestyle based on the
predictive models. It could also reduce the cost of medical tests, since some results can be
estimated using this technique.
DATA COLLECTION AND PREPARATION
• The dataset was obtained from the UCI Machine Learning Repository, where it is freely available.
• The data is considered suitable for model development because it reportedly contains few
missing values and outliers.
• The data was preprocessed and cleaned before training and testing.
• The dataset contains 270 records with 14 variables: ‘age’, ‘sex’, ‘chest pain type’, ‘BP’,
‘cholesterol’, ‘FBS over 120’, ‘EKG results’, ‘max HR’, ‘exercise angina’, ‘ST depression’,
‘slope of ST’, ‘Number of vessels fluro’, ‘Thallium’, and ‘Heart disease’.
• The target variable is ‘Heart disease’.
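In the project these preparation steps are carried out inside RapidMiner. As a rough, tool-independent illustration, the Python sketch below (standard library only) parses the data, counts missing cells per column, and separates the predictors from the target. The target header `Heart Disease`, the helper names, and the treatment of `?` as a missing-value marker are assumptions for illustration, not part of the original workflow.

```python
import csv
import io
from collections import Counter

# Assumed header for the target column; the slides call it 'Heart disease',
# so adjust this to match the actual CSV header.
TARGET = "Heart Disease"

# Assumed missing-value markers: blank cells and '?', which UCI files often use.
MISSING_MARKERS = ("", "?")


def load_rows(csv_text):
    """Parse CSV text (with a header row) into one dict per record."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def count_missing(rows):
    """Return {column: number of missing cells}, listing only columns with gaps."""
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            # DictReader fills short rows with None; treat that as missing too.
            if value is None or (isinstance(value, str) and value.strip() in MISSING_MARKERS):
                missing[column] += 1
    return dict(missing)


def split_features_target(rows, target=TARGET):
    """Separate the predictor columns from the target label."""
    features = [{k: v for k, v in row.items() if k != target} for row in rows]
    labels = [row[target] for row in rows]
    return features, labels
```

For example, `count_missing(load_rows(text))` returns `{}` when every cell is filled, a quick check on the "few missing values" claim before any model is trained.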
CORRELATION MATRIX
CONFUSION MATRIX
• Also known as the error matrix, the confusion matrix visualizes the performance of a
classification algorithm, typically a supervised learning one. Each row represents the instances
in an actual class, and each column the instances in a predicted class. The cells of a typical
confusion matrix are TN: true negative, TP: true positive, FN: false negative, and FP: false
positive.

                     Predicted Positive (1)   Predicted Negative (0)
Actual Positive (1)  TP                       FN
Actual Negative (0)  FP                       TN
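Accuracy, precision, and recall in the results section are all derived from these four cells. The minimal Python sketch below (standard library only) tallies a confusion matrix from actual and predicted labels and computes the three metrics for the positive class. The class and method names are illustrative, not part of RapidMiner, and RapidMiner may average precision and recall over both classes, so these positive-class formulas are not guaranteed to reproduce the reported figures.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Binary confusion matrix with rows = actual class, columns = predicted class."""

    tp: int  # actual positive, predicted positive
    fn: int  # actual positive, predicted negative
    fp: int  # actual negative, predicted positive
    tn: int  # actual negative, predicted negative

    @classmethod
    def from_labels(cls, actual, predicted, positive=1):
        """Tally the four cells from parallel sequences of actual and predicted labels."""
        tp = fn = fp = tn = 0
        for a, p in zip(actual, predicted):
            if a == positive and p == positive:
                tp += 1
            elif a == positive:
                fn += 1
            elif p == positive:
                fp += 1
            else:
                tn += 1
        return cls(tp=tp, fn=fn, fp=fp, tn=tn)

    @property
    def total(self):
        return self.tp + self.fn + self.fp + self.tn

    def accuracy(self):
        # Share of all instances that were classified correctly.
        return (self.tp + self.tn) / self.total

    def precision(self):
        # Of the instances predicted positive, the share that are actually positive.
        return self.tp / (self.tp + self.fp)

    def recall(self):
        # Of the actually positive instances, the share the model detected.
        return self.tp / (self.tp + self.fn)
```

Precision and recall raise `ZeroDivisionError` when the model predicts no positives or the data contains none; a fuller version would need to guard those cases.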
KNN AND DECISION TREE
RANDOM FOREST AND REGRESSION

• The random forest confusion matrix shows 83 true negatives, 20 false positives, 37 false
negatives, and 130 true positives. The random forest model achieved an accuracy of 78.73%.
• The linear regression confusion matrix shows 95 true negatives, 18 false positives, 25 false
negatives, and 132 true positives. The linear regression model achieved an accuracy of 83.93%.
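As an arithmetic check, the counts in each matrix sum to 270, the number of records in the dataset, and the pooled accuracy (TP + TN) / total can be computed directly from them. The result lands slightly above the reported figures (about 78.89% vs 78.73% for random forest, and 84.07% vs 83.93% for linear regression). One plausible reason, assuming the models were evaluated with RapidMiner's cross-validation, is that the reported accuracy is the mean across folds while the confusion matrix sums all folds; the sketch below only reproduces the pooled value, and the names in it are illustrative.

```python
# Confusion-matrix counts as reported in the two bullets above.
REPORTED = {
    "Random Forest": {"tn": 83, "fp": 20, "fn": 37, "tp": 130},
    "Linear Regression": {"tn": 95, "fp": 18, "fn": 25, "tp": 132},
}


def pooled_accuracy(counts):
    """Accuracy computed directly from the summed confusion-matrix counts."""
    total = sum(counts.values())
    return (counts["tp"] + counts["tn"]) / total


for model, counts in REPORTED.items():
    print(f"{model}: n={sum(counts.values())}, pooled accuracy={pooled_accuracy(counts):.2%}")
# Random Forest: n=270, pooled accuracy=78.89%
# Linear Regression: n=270, pooled accuracy=84.07%
```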
RESULTS AND ANALYSIS

Algorithm           Accuracy   Precision   Recall
KNN                 66.80%     66.58%      66.165%
Decision Tree       71.80%     71.655%     70.915%
Random Forest       78.73%     79.21%      77.92%
Linear Regression   83.93%     84.08%      83.59%


• From the analysis above, linear regression had the highest accuracy with 83.93%, followed by
random forest with 78.73%, the decision tree with 71.80%, and KNN with 66.80%.
• Linear regression was also the most precise model, with 84.08%, and had the highest recall,
with 83.59%. Hence, from the above analysis and comparison, linear regression achieved the highest
accuracy, precision, and recall, making it the best of the four models.
• Heart disease is a serious and critical condition that causes many deaths, so the model can play
a big role in its early prediction and in helping to reduce its prevalence. Hence, accuracy and TP
should be kept high, while FP should be kept as low as possible.
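The comparison above can be reproduced mechanically from the results table. The short Python sketch below stores the reported scores and picks the best model for each metric; it performs no modeling itself, and the dictionary layout and function names are illustrative choices.

```python
# Scores copied from the results table (percentages).
RESULTS = {
    "KNN": {"accuracy": 66.80, "precision": 66.58, "recall": 66.165},
    "Decision Tree": {"accuracy": 71.80, "precision": 71.655, "recall": 70.915},
    "Random Forest": {"accuracy": 78.73, "precision": 79.21, "recall": 77.92},
    "Linear Regression": {"accuracy": 83.93, "precision": 84.08, "recall": 83.59},
}


def ranking(metric):
    """Model names sorted from best to worst on the given metric."""
    return sorted(RESULTS, key=lambda name: RESULTS[name][metric], reverse=True)


def best_model(metric):
    """Model with the highest value of the given metric."""
    return ranking(metric)[0]


for metric in ("accuracy", "precision", "recall"):
    print(metric, "->", best_model(metric))
```

Each line printed names Linear Regression, matching the conclusion drawn in the bullets above.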
CONCLUSION

Heart disease is a major concern in society due to poor lifestyle and dietary habits in many
countries. It is hard to manually determine the odds of getting heart disease from the risk
factors. Machine learning, which can be deployed in different settings, can play a vital role in
predicting this disease. This implementation indicates that machine learning algorithms can be
applied in the healthcare sector to predict and diagnose heart disease at an early stage.
REFERENCES

• Mohan, S., Thirumalai, C., & Srivastava, G. (2019). Effective Heart Disease Prediction Using
Hybrid Machine Learning Techniques. IEEE Access, 7, 81542–81554.
https://doi.org/10.1109/access.2019.2923707
• Purushottam, Saxena, K., & Sharma, R. (2016). Efficient Heart Disease Prediction System.
Procedia Computer Science, 85, 962–969. https://doi.org/10.1016/j.procs.2016.05.288
• UCI Machine Learning Repository: Heart Disease Data Set. (2022).
https://archive.ics.uci.edu/ml/datasets/heart+disease