DATA SCIENCE TRAINING REPORT

Heart Disease Prediction Project


Submitted in partial fulfilment of the Degree
of
Bachelor of Technology
(Computer Science and Engineering)

Submitted by:
Paras Kalyan
2004698

Submitted to:
Mr. Inderjit Singh
Mrs. Priyanka Arora
Mrs. Jasdeep Kaur
Training Co-ordinators
CSE Deptt.

Department of Computer Science & Engineering


GURU NANAK DEV ENGINEERING COLLEGE
LUDHIANA -141006

Acknowledgement

I take this occasion to thank God Almighty for blessing me with His grace and bringing my endeavour to a successful culmination. I extend my sincere and heartfelt thanks to our esteemed guide, Mr. Navjot Singh Tung, for providing me with the right guidance and advice at crucial junctures and for showing me the right way. I would also like to thank the other faculty members on this occasion.

I am also very thankful to my family and friends for their timely aid, without which I would not have finished my project successfully. I extend my thanks to all my well-wishers and to all those who have contributed directly and indirectly to the completion of this work.

Table of Contents

List of Figures
Chapter 1 – Introduction to the Project
i. Overview
ii. Problem Definition
iii. Objectives
Chapter 2 – Technical Workflow of Project
i. Requirements
Chapter 3 – The Dataset
Chapter 4 – Operations Performed
i. Data Processing
ii. Tools Used
Chapter 5 – Overview of Dashboard
i. Page 1
ii. Page 2
iii. Page 3
Chapter 6 – Conclusion

List of Figures

Figure 1: Technical workflow of the project
Figure 2: The dataset
Figure 3: Dashboard, page 1
Figure 4: Dashboard, page 2
Figure 5: Dashboard, page 3

Chapter 1
Introduction to Project

Overview:-
According to the World Health Organization, 12 million deaths occur worldwide every year due to heart disease. The burden of cardiovascular disease has been increasing rapidly all over the world over the past few years. Many studies have attempted to pinpoint the most influential risk factors for heart disease and to predict the overall risk accurately. Heart disease is often called a silent killer because it can lead to death without obvious symptoms. Early diagnosis of heart disease plays a vital role in guiding lifestyle changes in high-risk patients, which in turn reduces complications. This project aims to predict heart disease by analyzing patient data to classify whether or not a patient has heart disease.

Problem Definition:-
The major challenge with heart disease is its detection. Instruments that can predict heart disease are available, but they are either expensive or not efficient at estimating a person's chance of heart disease. Early detection of cardiac diseases can decrease the mortality rate and overall complications. However, it is not possible to monitor every patient accurately every day, and round-the-clock consultation with a doctor is not available, since it requires more knowledge, time and expertise. Since a large amount of data is available today, various techniques can be used to analyze it for hidden patterns, and these hidden patterns in medical data can be used for health diagnosis.

Objectives:-
The main objectives of developing this project are:
1. To determine, from a medical dataset, the significant risk factors that may lead to heart disease.
2. To detect the disease in its early stages.
3. To make patients aware of various cardiovascular diseases.

Chapter 2
Technical Workflow of project

Dataset
→ Dataset Processing Using Power BI (Data Harmonization, Data Cleansing, Getting Some Insights, Data Visualization)
→ Creating Dashboard

Fig 1
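The workflow in Fig 1 was carried out in Calc and Power BI. As a rough sketch of the same stages, the Python code below chains load, harmonize, cleanse and summarize steps over a tiny made-up CSV. The column names, the label map and the cleansing rules are assumptions chosen for illustration, not the project's actual settings.

```python
import csv
import io
from collections import Counter

# Made-up raw data with assumed column names: inconsistent spacing and labels,
# one exact duplicate and one row with an empty target.
RAW = """age,sex,target
54,M,1
 37 , female ,0
54,M,1
61,Male,
"""

def load(text):
    """Dataset: parse the CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def harmonize(rows):
    """Data harmonization: trim spaces and unify the labels used for sex."""
    labels = {"m": "M", "male": "M", "f": "F", "female": "F"}
    out = []
    for row in rows:
        clean = {k: v.strip() for k, v in row.items()}
        clean["sex"] = labels.get(clean["sex"].lower(), clean["sex"])
        out.append(clean)
    return out

def cleanse(rows):
    """Data cleansing: drop rows with an empty field and exact duplicates."""
    seen, kept = set(), []
    for row in rows:
        key = tuple(row.values())
        if "" in key or key in seen:
            continue
        seen.add(key)
        kept.append(row)
    return kept

def insights(rows):
    """Getting insights: patient counts per target value, as a visual would show."""
    return Counter(row["target"] for row in rows)

rows = cleanse(harmonize(load(RAW)))
print(len(rows), dict(insights(rows)))
```

Keeping each stage as a separate function mirrors the boxes in the diagram and makes it easy to inspect the data between stages.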

Requirements:-
Hardware:- A laptop or desktop with an i3 processor or above and at least 2 GB of RAM.
Software:-
• Microsoft Excel / LibreOffice Calc
• Microsoft Power BI

Chapter 3
The Dataset

The dataset is publicly available on the Kaggle website. Its creators are the Hungarian Institute of Cardiology, University Hospital (Zurich) and the VA Medical Centre. It provides patient data in 1190 rows and 14 columns. The attributes include age, gender, chest pain type, cholesterol level, resting blood pressure, fasting blood sugar, resting electrocardiographic results, maximum heart rate, exercise-induced angina, ST slope, slope of the peak exercise angina, and heart disease, which takes the value 0 (absence of heart disease) or 1 (presence of heart disease). The dataset is in CSV (Comma-Separated Values) format and was further processed for use in the project.

Fig 2
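To check that a downloaded copy matches the description above (the stated row count, the expected columns and a 0/1 target), a short script can count rows, columns and classes. This is only a sketch: the column names in the sample, including `target`, are assumed, and the three rows are made up. For the real file, its text would first be read with something like `open(path, newline="").read()`, where `path` is wherever the CSV was saved.

```python
import csv
import io
from collections import Counter

# Three illustrative rows with assumed column names; the real file is larger.
SAMPLE = """age,sex,chest pain type,cholesterol,target
40,1,2,289,0
49,0,3,180,1
37,1,2,283,0
"""

def describe(csv_text, target="target"):
    """Return (row count, column count, class counts) for a CSV string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = reader.fieldnames or []
    classes = Counter(row[target] for row in rows)
    return len(rows), len(columns), classes

n_rows, n_cols, classes = describe(SAMPLE)
print(f"{n_rows} rows x {n_cols} columns, classes: {dict(classes)}")
```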

Chapter 4
Operations Performed
Data Processing:-

Data harmonization:-
Data harmonization improves data quality and usability by making data from different sources consistent, for example in naming, formats and coding; some tools automate this with machine learning. It interprets the existing characteristics of the data, and the actions already taken on it, and uses that information to transform the data or to suggest further data quality improvements.
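A simple, rule-based form of harmonization (rather than a machine-learning one) is to map numeric codes onto one agreed set of readable labels, like the Flat, Upsloping, Downsloping and Not Available labels used on the dashboard. In the sketch below, the code tables are assumed to follow the UCI heart-disease documentation and should be checked against the dataset description; `harmonize_row` is a hypothetical helper.

```python
# Assumed code tables (UCI heart-disease coding); verify before relying on them.
CODE_TABLES = {
    "sex": {"1": "Male", "0": "Female"},
    "chest pain type": {"1": "Typical angina", "2": "Atypical angina",
                        "3": "Non-anginal pain", "4": "Asymptomatic"},
    "ST slope": {"1": "Upsloping", "2": "Flat", "3": "Downsloping",
                 "0": "Not Available"},
}

def harmonize_row(row):
    """Return a copy of row with coded columns replaced by readable labels.

    Values are converted to text and stripped, and a float-style code such
    as '2.0' is reduced to '2', so records exported by different tools agree.
    Codes missing from a table become 'Unknown'."""
    out = dict(row)
    for column, table in CODE_TABLES.items():
        raw = str(row[column]).strip()
        if raw.endswith(".0"):
            raw = raw[:-2]
        out[column] = table.get(raw, "Unknown")
    return out

print(harmonize_row({"sex": "1", "chest pain type": " 2.0", "ST slope": 0}))
```

Mapping unrecognized codes to "Unknown" instead of raising an error keeps the data loadable while making bad codes easy to spot in a visual.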
Data cleansing:-
Data cleansing is the act of correcting or removing inaccurate, broken, or erroneous data from your dataset. Think of this as giving your data a makeover. If you’ve ever corrected misspelled or mashed-together field names in a spreadsheet, congrats! You’ve cleansed data.
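A minimal cleansing sketch is below. It assumes a hypothetical rule: a resting blood pressure or cholesterol value of 0 is an impossible reading and is treated as a missing measurement. The function drops exact duplicate rows and rows with such readings, and reports how many of each were removed, so the effect of cleansing can be checked rather than assumed. The column names `resting bp s` and `cholesterol` are assumptions.

```python
# Hypothetical rule: 0 in these columns means the measurement is missing.
ZERO_MEANS_MISSING = ("resting bp s", "cholesterol")

def cleanse(rows):
    """Return (clean_rows, report); report counts the rows that were dropped."""
    report = {"duplicates": 0, "missing": 0}
    seen = set()
    clean = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            report["duplicates"] += 1
            continue
        seen.add(key)
        if any(float(row[c]) == 0 for c in ZERO_MEANS_MISSING):
            report["missing"] += 1
            continue
        clean.append(row)
    return clean, report

ROWS = [
    {"age": "54", "resting bp s": "140", "cholesterol": "239"},
    {"age": "54", "resting bp s": "140", "cholesterol": "239"},
    {"age": "61", "resting bp s": "0", "cholesterol": "0"},
    {"age": "45", "resting bp s": "130", "cholesterol": "0"},
]
clean, report = cleanse(ROWS)
print(len(clean), report)
```

Dropping rows is the simplest choice; filling in the missing values (imputation) is an alternative when dropping would discard too many rows.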

Data normalization:-
In this project, the terms data normalization and data harmonization are used interchangeably: both mean making the fundamental aspects of the data, such as formats, labels and codes, consistent.

Tools used for Data Analysis and Visualization:

LibreOffice Calc:- Calc is the spreadsheet component of LibreOffice. Spreadsheets allow us to organize, analyze and store data in tabular form, and to manipulate this data to produce results. In Calc, the files you create are called spreadsheets. A spreadsheet consists of a number of individual sheets, each containing a large number of cells arranged in rows and columns. A particular cell is identified by its column letter and row number (for example, cell B8 is in column B, row 8). Each cell can contain data in the form of text, numbers or formulas. In Calc, each sheet can have a maximum of 1,048,576 rows and 1,024 columns (LibreOffice 7.4 and later raise the column limit to 16,384).
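The cell-addressing scheme described above can be made concrete with a small parser that turns a reference such as B8 into a row and column number and checks it against the sheet limits. This is a sketch: `parse_cell` is a hypothetical helper, not part of LibreOffice, and it assumes the 1,024-column limit unless told otherwise. With that limit, AMJ is the last column.

```python
import re

MAX_ROWS = 1_048_576

def parse_cell(ref, max_cols=1024):
    """Convert an A1-style reference such as 'B8' to 1-based (row, column)."""
    match = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", ref)
    if match is None:
        raise ValueError(f"not a cell reference: {ref!r}")
    letters, digits = match.groups()
    column = 0
    for ch in letters.upper():
        # Column letters form a bijective base-26 number: A=1 ... Z=26, AA=27.
        column = column * 26 + (ord(ch) - ord("A") + 1)
    row = int(digits)
    if row > MAX_ROWS or column > max_cols:
        raise ValueError(f"{ref!r} lies outside the sheet")
    return row, column

print(parse_cell("B8"))
```

Passing `max_cols=16384` would model the larger sheets of newer LibreOffice versions.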

Power BI:- Power BI is a business intelligence and data visualization tool for converting data from various data sources into interactive dashboards and analysis reports. Power BI offers cloud-based services for interactive visualizations, with a simple interface that lets end users create their own reports and dashboards.

Different versions of Power BI, such as Power BI Desktop, the service-based (SaaS) Power BI service and the Power BI mobile apps, are used on different platforms. Power BI provides multiple software connectors and services for business intelligence.

Chapter 5
Overview of Dashboard
Page 1

Fig 3

➢ In this page of the dashboard, the total number of patients is first displayed using a Card visualization in Power BI. A slicer allows the visualizations to be filtered by the presence or absence of heart disease.

➢ Next, a line chart visualizes the age distribution of the patients in the data. A column chart displays the number of patients with each chest pain type, and next to it a pie chart shows the gender distribution.
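The numbers behind the page 1 visuals are simple aggregations. The Python sketch below computes them for a handful of made-up patient records with assumed field names; the `disease` argument plays the role of the slicer (None for all patients, 1 for presence and 0 for absence of heart disease). It reproduces only the counts the visuals display, not Power BI itself.

```python
from collections import Counter

# Made-up records with assumed field names; target 1 = heart disease present.
PATIENTS = [
    {"age": 40, "sex": "M", "chest pain type": 2, "target": 0},
    {"age": 49, "sex": "F", "chest pain type": 3, "target": 1},
    {"age": 54, "sex": "M", "chest pain type": 4, "target": 1},
    {"age": 54, "sex": "M", "chest pain type": 4, "target": 1},
    {"age": 37, "sex": "F", "chest pain type": 2, "target": 0},
]

def page1_summary(patients, disease=None):
    """Aggregates shown on dashboard page 1.

    disease=None keeps every patient; 0 or 1 mimics the slicer that keeps
    only patients without or with heart disease."""
    selected = [p for p in patients if disease is None or p["target"] == disease]
    return {
        # Card: number of patients in the current selection.
        "patient_count": len(selected),
        # Line chart: (age, patients of that age), sorted by age.
        "age_distribution": sorted(Counter(p["age"] for p in selected).items()),
        # Column chart: patients per chest pain type.
        "chest_pain_counts": Counter(p["chest pain type"] for p in selected),
        # Pie chart: patients per gender.
        "gender_split": Counter(p["sex"] for p in selected),
    }

summary = page1_summary(PATIENTS, disease=1)
print(summary["patient_count"], summary["age_distribution"])
```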

Page 2

Fig 4

➢ In this page of the dashboard, the first two elements are the same as on the previous page: the patient count and the slicer. Next, a histogram shows the cholesterol distribution by patient age, indicating which age range has the highest cholesterol levels.

➢ Next, a donut chart shows the percentage and count of patients whose fasting blood sugar is above 120. The field has two values, 0 and 1, where 0 means No and 1 means Yes.

➢ A stacked area chart shows the maximum heart rate distribution, i.e. how many patients have each value of maximum heart rate.
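The page 2 visuals can likewise be sketched as aggregations over made-up records with assumed field names. One assumption needs flagging: the report does not say how the histogram aggregates cholesterol within an age range, so this sketch takes the maximum in each 10-year band; Power BI could equally be summing or averaging.

```python
from collections import Counter, defaultdict

# Made-up records with assumed field names.
PATIENTS = [
    {"age": 42, "cholesterol": 230, "fasting blood sugar": 0, "max heart rate": 150},
    {"age": 47, "cholesterol": 290, "fasting blood sugar": 1, "max heart rate": 150},
    {"age": 55, "cholesterol": 260, "fasting blood sugar": 0, "max heart rate": 120},
    {"age": 58, "cholesterol": 310, "fasting blood sugar": 0, "max heart rate": 132},
]

def cholesterol_by_age_band(patients, band=10):
    """Highest cholesterol in each age band, e.g. '40-49' (assumed aggregation)."""
    bands = defaultdict(int)
    for p in patients:
        start = p["age"] // band * band
        bands[start] = max(bands[start], p["cholesterol"])
    return {f"{s}-{s + band - 1}": bands[s] for s in sorted(bands)}

def fasting_sugar_share(patients):
    """Donut chart: {value: (count, percentage)}; 1 = above 120, 0 = not."""
    counts = Counter(p["fasting blood sugar"] for p in patients)
    total = sum(counts.values())
    return {v: (n, round(100 * n / total, 1)) for v, n in sorted(counts.items())}

def max_hr_distribution(patients):
    """Area chart: (max heart rate, number of patients), sorted by rate."""
    return sorted(Counter(p["max heart rate"] for p in patients).items())

print(cholesterol_by_age_band(PATIENTS))
print(fasting_sugar_share(PATIENTS))
print(max_hr_distribution(PATIENTS))
```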

Page 3

Fig 5

➢ In this page, resting ECG results are visualized using a stacked column chart split by the presence and absence of heart disease; dark blue represents patients with heart disease. The field has three values: 0, 1 and 2.

➢ Next, ST slope is visualized using a clustered column chart split by the presence and absence of heart disease; dark red represents the count of patients with heart disease. The field has four values: Flat, Upsloping, Downsloping and 0, where 0 represents Not Available.
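Both page 3 charts are built on the same table: the number of patients for every (category, heart disease) pair. The sketch below computes that table for made-up records with assumed field names, and adds the share of patients with heart disease in each category, which is one simple way to look for risk factors (it shows association, not cause). `crosstab` and `disease_rate` are hypothetical helpers.

```python
from collections import Counter

# Made-up records with assumed field names; target 1 = heart disease present.
PATIENTS = [
    {"resting ecg": 0, "ST slope": "Upsloping", "target": 0},
    {"resting ecg": 0, "ST slope": "Flat", "target": 1},
    {"resting ecg": 1, "ST slope": "Flat", "target": 1},
    {"resting ecg": 2, "ST slope": "Upsloping", "target": 0},
    {"resting ecg": 0, "ST slope": "Flat", "target": 0},
]

def crosstab(patients, column, target="target"):
    """Count patients for every category of column, split by target.

    Each category is one column group in the chart; 'present' (target 1)
    and 'absent' (target 0) are its stacked or clustered bars."""
    counts = Counter((p[column], p[target]) for p in patients)
    categories = sorted({p[column] for p in patients}, key=str)
    return {c: {"present": counts[(c, 1)], "absent": counts[(c, 0)]}
            for c in categories}

def disease_rate(table):
    """Share of patients with heart disease in each category (2 decimals)."""
    return {c: round(v["present"] / (v["present"] + v["absent"]), 2)
            for c, v in table.items()}

print(crosstab(PATIENTS, "resting ecg"))
print(disease_rate(crosstab(PATIENTS, "ST slope")))
```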

Chapter 6
Conclusion

The early prognosis of cardiovascular diseases can aid decisions on lifestyle changes in high-risk patients and in turn reduce complications, which can be a great milestone in the field of medicine. Using these visualizations, we can identify patterns that indicate whether a patient is likely to have heart disease, which can save time and money. We can also provide recommendations to patients based on these visualizations.
