Batch-2 (Review 2)

Kallam Harandha Reddy
Institute Of Technology
(Autonomous)
Heart Disease
Prediction System
using Exploratory Data
Analysis
Mentored By-
Mr. Nagarjuna
Submitted By-
P. Pravallika (CSE/576)
N. Divya (CSE/570)
Y. Joshua (CSE/5A6)
N. Mukesh Krishna (CSE/569)
TABLE OF CONTENTS:
1. Abstract
2. Introduction
3. Existing System
4. Proposed System
5. System Specifications
6. System Architecture
7. Problem Statement
8. Data Collection
9. Technology Used
10. Data Cleaning
11. Libraries Used
12. Tableau Dashboard
13. Conclusion
Abstract
● With an abundance of data available, healthcare is being advanced by the application of
machine learning.
● Cardiovascular disease is one of the most fatal conditions in the world today. For heart
disease, a correct diagnosis at an early stage is important because time is a critical factor.
● We are building a Heart Disease Prediction system that estimates the chance of heart
disease. It uses algorithms such as Logistic Regression and Random Forest, taking inputs
such as age, gender, and blood pressure. As output, it gives the chance of developing
heart disease.
Introduction
Heart disease predictor is an offline platform designed and developed to explore
machine learning. The goal is to predict the health of a patient from collected data,
so as to detect configurations that put the patient at risk and, in cases requiring
emergency medical assistance, alert the appropriate medical staff to the patient's
situation.
We start with a dataset containing information on many patients, from which we can
consolidate the results and make predictions.
The results of the predictions, produced by the machine learning models, will be
presented through several graphical interfaces according to the datasets considered.
We will then critically assess the scope of our results.
Existing System
● Diagnosis of the disease depends solely on the doctor's intuition and the patient's records.
● Researchers have used several data mining techniques to help specialists and physicians
identify heart disease. One of them is the Naïve Bayes algorithm.
● The disadvantages of this approach are that its cardiovascular disease predictions are
not accurate and it cannot handle enormous datasets of patient records.
Disadvantages:
● This practice leads to unwanted biases, errors, and excessive medical costs, which affect
the quality of service provided to patients.
● There are many ways that a medical misdiagnosis can present itself.
Proposed System
● Machine learning techniques are used to increase the accuracy rate.
● The following machine learning algorithms can be applied to large datasets
to predict heart disease:
● 1. Logistic Regression
● 2. Random Forest
● The Logistic Regression algorithm is used to improve the accuracy of the system.
● With these, the proposed system acts as a decision support system and
predicts the chances of heart disease.
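As a rough illustration of how Logistic Regression turns inputs such as age and blood pressure into a chance of heart disease, here is a minimal from-scratch sketch using NumPy. The toy data, the two standardized features, and the helper names (fit_logistic, predict_proba) are assumptions made for this example. It is not the project's actual model, which would more likely be trained with a library on the Kaggle dataset.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression weights by plain gradient descent."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = sigmoid(Xb @ w)                # predicted probabilities
        grad = Xb.T @ (p - y) / len(y)     # gradient of the mean log-loss
        w -= lr * grad
    return w

def predict_proba(w, X):
    """Return the estimated chance of disease for each row of X."""
    X = np.asarray(X, dtype=float)
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return sigmoid(Xb @ w)

# Synthetic, made-up data: two standardized features standing in for
# age and blood pressure (assumption, not the real dataset).
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
true_w = np.array([-0.5, 1.5, 1.0])  # hypothetical "true" weights
y = (rng.random(n) < sigmoid(true_w[0] + X @ true_w[1:])).astype(int)

w = fit_logistic(X, y)
# Compare a high-risk profile with a low-risk profile.
risk = predict_proba(w, [[2.0, 2.0], [-2.0, -2.0]])
print("weights:", w)
print("estimated risk:", risk)
```

The output of predict_proba is a probability rather than a hard yes/no label, which matches the idea of a decision support system that reports the chance of heart disease.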
System Specifications
Software requirements:
● OS : Windows
● Python IDE : Python 2.7.x and above, Anaconda IDE
● Setuptools and pip to be installed for Python 3.6.x and above
Hardware requirements:
● RAM : 4 GB or higher
● Processor : Intel i3 or above
● Hard Disk : 500 GB minimum
System Architecture
Problem Statement
Machine learning allows building models that quickly analyze data and deliver
results. By leveraging historical and real-time data, machine learning helps
healthcare service providers make better decisions on patients' disease
diagnosis.
By analyzing the data, our project can predict the occurrence of the disease.
Such an intelligent system for disease prediction plays a major role in
controlling the disease and maintaining people's good health by predicting
disease risk accurately.
Machine learning algorithms can also help by providing doctors with vital
statistics, real-time data, and advanced analytics on the patient's disease, lab
test results, blood pressure, family history, clinical trial data, etc.
Data Collection
Data has been collected from Kaggle.
Data collection is the process of gathering and measuring information from many different sources.
In order to use the data we collect to develop practical Artificial Intelligence (AI) and Machine Learning
solutions, it must be collected and stored in a way that makes sense for the business problem at hand.
What is Kaggle?
Kaggle is an online community of data scientists and machine learners, owned by Google LLC.
Kaggle allows users to find and publish data sets, explore and build models in a web-based data science
environment, work with other data scientists and machine learning engineers, and enter
competitions to solve data science challenges.
Attributes in Dataset
● Age
● Sex
● Chest Pain (CP)
● Blood Pressure (Trestbps)
● Cholesterol (Chol)
● Fasting Blood Sugar (fbs)
● Heart Rate (Thalach)
● Resting electrocardiographic results (Rest
● Exercise induced angina (Exang)
● ST depression (oldpeak)
● Slope
● Major vessels (Ca)
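To show what a first exploratory pass over these attributes might look like, here is a small pandas sketch. The column names follow the abbreviations in the attribute list above, written in lowercase. The target column (1 = disease present) and every row value are made-up assumptions for illustration only and are not taken from the Kaggle dataset.

```python
import pandas as pd

# Made-up rows for illustration only; not real patient data.
df = pd.DataFrame({
    "age":      [63, 37, 41, 56, 57, 44],
    "sex":      [1, 1, 0, 1, 0, 1],
    "cp":       [3, 2, 1, 1, 0, 0],           # chest pain type (four types: 0-3)
    "trestbps": [145, 130, 130, 120, 120, 140],
    "chol":     [233, 250, 204, 236, 354, 257],
    "target":   [1, 1, 1, 0, 0, 0],           # hypothetical label column
})

# Summary statistics for the numeric attributes.
summary = df[["age", "trestbps", "chol"]].describe()

# Share of rows with disease for each chest pain type.
rate_by_cp = df.groupby("cp")["target"].mean()

print(summary)
print(rate_by_cp)
```

Grouping by chest pain type is only one possible view; the same pattern applies to any other attribute in the list.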
Testing Technologies
Anaconda(Python) - Anaconda is a free and open-source distribution of the Python and
R programming languages for scientific computing, that aims to simplify package
management and deployment.
Jupyter Notebook - The Jupyter Notebook is an open-source web application that allows
you to create and share documents that contain live code, equations, visualizations and
narrative text. Uses include: data cleaning and transformation, numerical simulation,
statistical modeling, data visualization, machine learning, and much more.
Data Cleaning
Data cleaning is essentially the task of removing errors and anomalies from data, or
replacing observed values with the true values, to get more value from analytics.
METHODS
● Get Rid of Extra Spaces.
● Select and Treat All Blank Cells.
● Convert Numbers Stored as Text into Numbers.
● Remove Duplicates.
● Highlight Errors.
● Change Text to Lower/Upper/Proper Case.
● Spell Check.
● Delete all Formatting.
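Several of the methods above can be sketched in pandas as follows. This is a minimal example under stated assumptions: the raw table, its column names, and the clean helper are hypothetical, and blank cells are simply dropped here, although filling them with a typical value is another common treatment.

```python
import pandas as pd

# Hypothetical messy input: extra spaces, mixed case, numbers stored
# as text, a blank cell, a missing value, and a duplicate row.
raw = pd.DataFrame({
    "age":  ["63", " 45 ", "45", None, "52"],
    "sex":  [" Male", "female ", "female", "MALE", "male"],
    "chol": ["233", "250", "250", "210", ""],
})

def clean(df):
    out = df.copy()
    # Get rid of extra spaces around text values.
    for col in out.columns:
        out[col] = out[col].map(lambda v: v.strip() if isinstance(v, str) else v)
    # Change text to a consistent (lower) case.
    out["sex"] = out["sex"].str.lower()
    # Convert numbers stored as text into numbers; blanks become NaN.
    for col in ["age", "chol"]:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    # Treat blank cells (dropped here) and remove duplicates.
    out = out.dropna().drop_duplicates().reset_index(drop=True)
    return out.astype({"age": int, "chol": int})

cleaned = clean(raw)
print(cleaned)
```

After cleaning, the second and third input rows collapse into one, because they differ only by stray spaces.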
Libraries Used
1. Pandas - pandas is a software library written for the Python programming language for
data manipulation and analysis. In particular, it offers data structures and operations
for manipulating numerical tables and time series. It is a Python package
providing fast, flexible, and expressive data structures designed to make working
with “relational” or “labeled” data both easy and intuitive. It aims to be the
fundamental high-level building block for doing practical, real-world data analysis
in Python.
2. NumPy - NumPy is a library for the Python programming language, adding
support for large, multi-dimensional arrays and matrices, along with a large
collection of high-level mathematical functions to operate on these arrays.
Tableau Dashboard
● Tableau is a business intelligence tool used to analyse data and
visualize the insights in the form of graphs and charts.
● Users can develop and share interactive dashboards that show the hidden
patterns, trends, density, and variation of data.
● Tableau uses a centroid-based k-means clustering algorithm that divides the data
into K clusters.
● Dashboards are created from the data set after applying the k-means algorithm.
● It provides visually appealing clusters in order to predict the occurrence of heart
disease from the given dataset.
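To make the idea of centroid-based k-means concrete, here is a minimal NumPy sketch of the standard iterative procedure (assign each point to its nearest centroid, then move each centroid to the mean of its points). It is not Tableau's implementation; the kmeans helper, the random initialization, and the toy (age, cholesterol) points are assumptions made for this example.

```python
import numpy as np

def kmeans(points, k, iters=100, seed=0):
    """Cluster points into k groups with basic k-means."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]

    def assign(cents):
        # Distance of every point to every centroid, then nearest index.
        dists = np.linalg.norm(points[:, None, :] - cents[None, :, :], axis=2)
        return dists.argmin(axis=1)

    for _ in range(iters):
        labels = assign(centroids)
        # Move each centroid to the mean of its assigned points;
        # keep the old centroid if a cluster ends up empty.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return assign(centroids), centroids

# Made-up (age, cholesterol) pairs forming two obvious groups.
data = [[40, 180], [42, 190], [41, 185], [65, 290], [67, 300], [66, 295]]
labels, centroids = kmeans(data, k=2)
print(labels)
print(centroids)
```

In practice the features would usually be scaled first, since k-means relies on distances and a wide-ranging attribute such as cholesterol would otherwise dominate.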
Dataset in Tableau:
Conclusion
● The models we used to predict the probability of heart disease are Logistic
Regression and Random Forest, as they are more accurate on numerical variables. The
model accuracy is 85% on both the test and train data sets. This model could be used in
the medical field, as it can predict heart disease.
● Heart stroke and vascular disease are major causes of disability and premature
death. Chest pain is key to recognizing heart disease. In this work, heart
disease is predicted by considering major factors with four
types of chest pain. K-means clustering is one of the simplest and most popular
unsupervised machine learning algorithms. Here the dataset is clustered, and the
occurrence of chest pain is predicted based on the clusters. Exploratory data
analysis using Tableau provided a visually appealing and accurate clustering
experience.
THANK YOU
