Muthu Sir Project
Project Report, 2020
On “Data Visualization and Prediction of Heart Disease by Machine Learning Algorithms”
Supervisor
Dr. (Mr) M.S. Muthu
Associate Professor in Pharmaceutics,
Department of Pharmaceutical Engg. & Technology,
Indian Institute of Technology (BHU)

Submitted by
Saurabh Kumar
Assistant Professor
Dept. of Pharmaceutical Engineering and Technology
IIT(BHU) Varanasi
27 May 2020
This is to certify that the present work entitled “Data Visualization and Prediction of
Heart Disease by Machine Learning Algorithms” has been carried out by Mr.
Saurabh Kumar under my direct supervision and guidance during the fourth semester of
his studies. He has conducted his studies sincerely, meticulously and methodically, and
the results of the work are embodied in this report. I wish him success in all his
future endeavors.
1 Acknowledgement
2 Abstract
3 Introduction
4 Objective
5 Types of Cardiovascular Diseases
6 Prevalence of Cardiovascular Diseases
7 Machine Learning Algorithms
8 Material and Methodology
9 Work Plan
10 Data Visualization
11 Result
12 Conclusion
1 Acknowledgement
During my project at this University, several respected and kind people helped me, directly
and indirectly. Without their support it would have been impossible for me to complete my
work, so I wish to dedicate this section to recognizing them.
First and foremost, I would like to thank my guide Prof. Shreyans Kumar Jain for guiding
me thoughtfully and efficiently through this project, giving me the opportunity to work at my
own pace and along my own lines while providing very useful directions whenever
necessary.
I offer my sincere thanks to everyone else who, knowingly or unknowingly, helped me
complete this project. I regard this project as a big milestone in my career development.
I will strive to use the skills and knowledge I have gained in the best possible way, and I will
continue to improve them in order to attain my career objectives. I hope to continue
cooperating with all of them in the future.
Saurabh Kumar
Dept. of Pharmaceutical Engineering & Technology
Indian Institute of Technology (BHU)
Varanasi-221005
2 Abstract
Heart-related diseases, or cardiovascular diseases, have been the main cause of a huge
number of deaths worldwide over the last few decades and have emerged as the most
life-threatening diseases, not only in India but across the whole world. There is therefore a
need for a reliable, accurate and feasible system to diagnose such diseases in time for proper
treatment. Machine learning algorithms and techniques have been applied to various medical
datasets to automate the analysis of large and complex data. In recent times, many
researchers have used machine learning techniques to help the health care industry and
professionals diagnose heart-related diseases. This project builds models based on several
such algorithms and compares their performance.
3 Introduction
The heart is one of the main organs of the human body. It pumps blood through the blood
vessels of the circulatory system. The circulatory system is extremely important because it
transports blood, oxygen and other materials to the different organs of the body, and the
heart plays the most crucial role in it. If the heart does not function properly, the result can
be serious health conditions, including death. Changes in lifestyle, work-related stress and
bad food habits contribute to the rising rate of several heart-related diseases.
Medical organizations all around the world collect data on various health-related issues.
These data can be exploited using machine learning techniques to gain useful insights.
However, the collected data are often massive and, many a time, very noisy. Datasets that
are too large for a person to comprehend can still be explored with machine learning
techniques. As a result, these algorithms have become very useful in recent times for
predicting the presence or absence of heart-related diseases.
4 Objective
The aim of the project is to explore a problem in the healthcare system, namely the timely
diagnosis of heart disease, and to build a suitable prototype that predicts heart disease from
patient data.
Heart diseases, or cardiovascular diseases, are a class of diseases that involve the heart and
blood vessels. Cardiovascular disease includes coronary artery diseases such as angina and
myocardial infarction (commonly known as a heart attack). In coronary heart disease, a
waxy substance called plaque builds up inside the coronary arteries, the arteries that supply
oxygen-rich blood to the heart muscle. This build-up of plaque is called atherosclerosis, and
it develops over many years. With time, the plaque can harden or rupture. Hardened plaque
narrows the coronary arteries, which in turn reduces the flow of oxygen-rich blood to the
heart. If the plaque ruptures, a blood clot can form on its surface, and a large clot can often
completely block blood flow through a coronary artery. Over time, the ruptured plaque also
hardens and narrows the coronary arteries. If blood flow is not restored quickly, the section
of heart muscle supplied by that artery begins to die. Without quick treatment, a heart attack
can lead to serious health problems and even death. Heart attacks are a common cause of
death worldwide. Some common symptoms of a heart attack are as follows.
Chest Pain: This is the most common symptom of a heart attack. A person who has a
blocked artery or is having a heart attack may feel pain, tightness or pressure in
the chest.
Nausea, Indigestion, Heartburn and Stomach Pain: These are often-overlooked
symptoms of a heart attack. Women tend to show these symptoms more often
than men.
Pain in the Arms: The pain often starts in the chest and then spreads towards the
arms, especially the left arm.
Feeling Dizzy and Light-Headed: A sense of losing one's balance.
Fatigue: Tiredness brought on by simple chores should not be
ignored.
Sweating
Some other common cardiovascular diseases are stroke, heart failure, hypertensive heart
disease, rheumatic heart disease, cardiomyopathy, cardiac arrhythmia, congenital heart
disease, valvular heart disease, aortic aneurysms, peripheral artery disease and venous
thrombosis. Heart diseases develop due to abnormalities in the functioning of the circulatory
system or may be aggravated by lifestyle choices such as smoking, certain eating habits and
a sedentary life. If heart disease is detected early, it can be treated properly and kept under
control; early detection is the key. Being well informed about the causes of heart disease also
helps in preventing it.
6 Prevalence of Cardiovascular Diseases
An estimated 17.5 million deaths occur due to cardiovascular diseases worldwide each year.
More than 75% of these deaths occur in low- and middle-income countries, and 80% of them
are caused by stroke and heart attack. India, too, adds a growing number of cardiovascular
disease patients every year. Currently, the number of heart disease patients in India is more
than 30 million, and over two lakh (200,000) open-heart surgeries are performed in India
each year. A matter of growing concern is that the number of patients requiring coronary
interventions has been rising at a rate of 20% to 30% for the past few years.
7 Machine Learning Algorithms
Research on machine learning has produced many algorithms. These algorithms can be
applied directly to a dataset to build models or to draw vital conclusions and inferences from
it. Some popular machine learning algorithms are regression, decision tree, k-nearest
neighbors, random forest and support vector machine. The algorithms used in this project
are:
Logistic Regression
K-nearest neighbors
Support Vector Machine (SVM)
Random Forest
Decision Tree
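To make one of the listed algorithms concrete, the sketch below implements k-nearest neighbors in plain Python. It is a teaching illustration, not the code used in the project's notebook: the function names `euclidean` and `knn_predict`, the default `k=3`, and the toy points are all hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    # Rank training indices by distance to the query and keep the k closest.
    nearest = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    # most_common(1) returns [(label, count)]; on a tie, the label seen first
    # (the one belonging to the closer neighbor) wins.
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two features per point, label 1 = disease, 0 = no disease.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict(train_X, train_y, (1.1, 1.0)))  # 0
print(knn_predict(train_X, train_y, (5.1, 5.0)))  # 1
```

Because k-nearest neighbors relies on raw distances, attributes measured on large scales (such as cholesterol in mg/dl) would dominate attributes such as sex or fasting blood sugar, so features are usually standardized before fitting.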
8 Material and Methodology
For predicting heart disease, the data were collected from the UC Irvine (UCI) Machine
Learning Repository.
Attribute Information:
age: The person's age in years
sex: The person's sex
value ‘1’ = male
value ‘0’ = female
cp: Chest pain type
value ‘0’ = asymptomatic
value ‘1’ = typical angina
value ‘2’ = atypical angina
value ‘3’ = non-anginal pain
trestbps: The person's resting blood pressure (mm Hg on admission to the hospital)
chol: The person's cholesterol measurement in mg/dl
fbs: The person’s fasting blood sugar > 120 mg/dl
value ‘0’ = false
value ‘1’ = true
restecg: Resting electrocardiographic measurement
value ‘0’ = normal
value ‘1’ = having ST-T wave abnormality
value ‘2’ = showing probable or definite left ventricular hypertrophy by
Estes' criteria
thalach: The person's maximum heart rate achieved
exang: Exercise induced angina
value ‘0’ = no
value ‘1’ = yes
oldpeak: ST depression induced by exercise relative to rest ('ST' relates to positions
on the ECG plot)
slope: The slope of the peak exercise ST segment
value ‘0’ = down sloping
value ‘1’ = up sloping
value ‘2’ = flat
ca: The number of major vessels (0-3) colored by fluoroscopy
thal: A blood disorder called thalassemia
value ‘1’ = normal
value ‘2’ = fixed defect
value ‘3’ = reversible defect
target: Heart Disease
value ‘0’ = no heart disease
value ‘1’ = heart disease present
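The code tables above can be captured directly in code, which makes it easy to turn a coded row into readable labels and to catch values outside the documented codes. The sketch below is a hypothetical helper (`CODES` and `decode` are illustrative names, and the sample row is invented). It assumes the flat slope is coded 2, since the three slope codes must be distinct, and it leaves numeric attributes unchanged.

```python
# Code tables for the categorical attributes listed in the report.
# Assumption: "flat" slope is coded 2 (the three slope codes must be distinct).
CODES = {
    "sex": {1: "male", 0: "female"},
    "cp": {0: "asymptomatic", 1: "typical angina", 2: "atypical angina", 3: "non-anginal pain"},
    "fbs": {0: "false", 1: "true"},
    "restecg": {
        0: "normal",
        1: "ST-T wave abnormality",
        2: "probable or definite left ventricular hypertrophy",
    },
    "exang": {0: "no", 1: "yes"},
    "slope": {0: "down sloping", 1: "up sloping", 2: "flat"},
    "thal": {1: "normal", 2: "fixed defect", 3: "reversible defect"},
    "target": {0: "no heart disease", 1: "heart disease present"},
}


def decode(record):
    """Return a copy of `record` with coded categorical values replaced by their labels.

    Numeric attributes (age, trestbps, chol, thalach, oldpeak, ca) pass through unchanged.
    A code missing from CODES raises ValueError so that bad rows surface early.
    """
    decoded = {}
    for column, value in record.items():
        table = CODES.get(column)
        if table is None:
            decoded[column] = value
        elif value in table:
            decoded[column] = table[value]
        else:
            raise ValueError(f"unexpected code {value!r} for attribute {column!r}")
    return decoded


# Hypothetical coded row (not taken from the dataset).
row = {"age": 54, "sex": 0, "cp": 2, "chol": 230, "fbs": 0, "slope": 2, "target": 1}
print(decode(row))
```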
9 Work Plan
The heart disease data were collected from the UCI Machine Learning Repository and first
analyzed with visualization plots and charts. The machine learning algorithms (logistic
regression, k-nearest neighbors, support vector machine, random forest and decision tree)
were then implemented on the extracted data, and each algorithm was used to make
predictions. A score was calculated for each algorithm, and the algorithms were ranked by
their scores. On the basis of these scores, the models were compared to assess how
efficiently and precisely each algorithm predicts heart disease.
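The workflow above (fit each algorithm, score it on held-out data, rank by score) can be sketched with scikit-learn. This is an illustration under assumptions, not the project's notebook: synthetic data from `make_classification` stands in for the UCI table, and the 80/20 split, feature scaling, hyperparameters and the helper names `build_models` and `rank_models` are hypothetical choices. The score used is scikit-learn's default classifier score, mean accuracy.

```python
# Assumes scikit-learn is installed; synthetic data stands in for the heart-disease table.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier


def build_models(seed=0):
    """Return the five algorithms named in the report with illustrative (hypothetical) settings."""
    return {
        # Distance- and margin-based models get standardized inputs.
        "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "Support Vector Machine": make_pipeline(StandardScaler(), SVC()),
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "K Nearest Neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }


def rank_models(X, y, test_size=0.2, seed=0):
    """Fit every model on a training split and return (name, accuracy %) sorted best-first."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    scores = {}
    for name, model in build_models(seed).items():
        model.fit(X_train, y_train)
        # .score() on a classifier returns mean accuracy on the given data.
        scores[name] = 100 * model.score(X_test, y_test)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # 13 features mirror the 13 predictor attributes; the values themselves are random.
    X, y = make_classification(n_samples=303, n_features=13, random_state=0)
    for name, score in rank_models(X, y):
        print(f"{name:<24}{score:6.2f}")
```

Because a ranking from a single random split can hinge on a few test patients, repeating the comparison with cross-validation (for example `sklearn.model_selection.cross_val_score`) gives a steadier picture.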
10 Data Visualization
The plot shows that most of the heart patients have atypical angina type chest pain.
The plot shows that most of the heart patients have a blood cholesterol level in the range of
200-250 mg/dl.
The plot shows that, in this dataset, patients whose fasting blood sugar is above 120 mg/dl
are less likely to have heart disease.
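These observations answer two different questions. The first two describe how heart patients are distributed across a category (what share of patients have a given chest pain type or cholesterol range), while the third compares the rate of heart disease between groups (fasting blood sugar above or below 120 mg/dl). The sketch below computes both kinds of quantity in plain Python; the helper names `share_by_category`, `disease_rate_by_category` and `band`, and the toy records, are hypothetical, and a plotting library would normally draw these values as bar charts.

```python
from collections import Counter, defaultdict


def share_by_category(records, column, target="target"):
    """Among records with target == 1, return the fraction having each value of `column`."""
    values = [r[column] for r in records if r[target] == 1]
    counts = Counter(values)
    # An empty patient list yields an empty dict, so there is no division by zero.
    return {value: counts[value] / len(values) for value in sorted(counts)}


def disease_rate_by_category(records, column, target="target"):
    """For each value of `column`, return the fraction of records with target == 1."""
    groups = defaultdict(list)
    for r in records:
        groups[r[column]].append(r[target])
    return {value: sum(ts) / len(ts) for value, ts in sorted(groups.items())}


def band(value, width=50):
    """Label a numeric value with its range, e.g. band(230) -> '200-250'."""
    low = (value // width) * width
    return f"{low}-{low + width}"


# Hypothetical coded records (not the project's data).
records = [
    {"cp": 2, "chol": 230, "fbs": 0, "target": 1},
    {"cp": 2, "chol": 210, "fbs": 0, "target": 1},
    {"cp": 1, "chol": 280, "fbs": 1, "target": 1},
    {"cp": 0, "chol": 240, "fbs": 1, "target": 0},
    {"cp": 0, "chol": 190, "fbs": 0, "target": 0},
]
for r in records:
    r["chol_band"] = band(r["chol"])

print(share_by_category(records, "cp"))          # share of patients per chest pain type
print(share_by_category(records, "chol_band"))   # share of patients per cholesterol range
print(disease_rate_by_category(records, "fbs"))  # disease rate for fbs = 0 and fbs = 1
```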
11 Result
Implementing the machine learning algorithms gave the following scores:
Algorithm                  Score (%)
Logistic Regression        86.89
Support Vector Machine     85.25
Decision Tree              80.33
K Nearest Neighbors        86.89
Random Forest              85.25
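Each reported score is a whole-number fraction with denominator 61 (53/61, 52/61 and 49/61). Because 61 is prime and these fractions are already in lowest terms, any test set producing these scores must contain a multiple of 61 records; the smallest case, 61 records, would fit a 20% hold-out from a dataset of roughly 300 records. This is an inference from the numbers, not something stated in the report. Read this way, the top two models beat Support Vector Machine and Random Forest by one correctly classified patient and Decision Tree by four. The sketch below checks the arithmetic; `implied_correct` and `smallest_test_size` are hypothetical helper names.

```python
# Reported scores (%) as printed by the project, at full precision.
REPORTED = {
    "Logistic Regression": 86.88524590163934,
    "Support Vector Machine": 85.24590163934426,
    "Decision Tree": 80.32786885245902,
    "K Nearest Neighbors": 86.88524590163934,
    "Random Forest": 85.24590163934426,
}


def implied_correct(score_percent, n_test):
    """Return the number of correct predictions implied by a % score on n_test records, or None."""
    correct = round(score_percent * n_test / 100)
    # Accept only if correct / n_test reproduces the score to floating-point precision.
    if abs(100 * correct / n_test - score_percent) < 1e-9:
        return correct
    return None


def smallest_test_size(scores, max_n=1000):
    """Smallest test-set size for which every score is a whole number of correct predictions."""
    for n in range(1, max_n + 1):
        if all(implied_correct(s, n) is not None for s in scores):
            return n
    return None


n = smallest_test_size(REPORTED.values())
for name, score in REPORTED.items():
    print(f"{name:<24}{implied_correct(score, n)}/{n}")
```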
The complete project, including the Jupyter Notebook, is available in the GitHub repository
below:
https://github.com/bhaveshpancholi/Heart-Disease-Prediction/blob/master/Heart%20Disease%20Prediction.ipynb
12 Conclusion
After implementing several machine learning algorithms, the highest score, about 0.869
(86.89%), was achieved by both Logistic Regression and K Nearest Neighbors. The lowest
score, about 0.803 (80.33%), was obtained by the Decision Tree.