Batch-2 (Review 2)
Institute Of Technology
(Autonomous)
Heart Disease
Prediction System
using Exploratory Data
Analysis
Mentored By-
Mr.Nagarjuna
Submitted By-
P.Pravallika(CSE/576)
N.Divya(CSE/570)
Y.Joshua(CSE/5A6)
N.MukeshKrishna(CSE/569)
TABLE OF CONTENTS:
1. Abstract
2. Introduction
3. Existing System
4. Proposed System
5. System Specifications
6. System Architecture
7. Problem Statement
8. Data Collection
9. Technology Used
10. Data Cleaning
11. Library Used
12. Tableau Dashboard
13. Conclusion
Abstract
● With an abundance of data available, healthcare is being advanced by the application of
machine learning.
● Diagnosis of the disease depends solely on the doctor’s intuition and the patient’s records.
● Researchers have used several data mining techniques to help
specialists and physicians identify heart disease. One of them is the Naïve Bayes algorithm.
● The disadvantages of this approach are that its cardiovascular disease predictions are
not accurate and that it cannot handle very large datasets of patient records.
Disadvantages:
● This practice leads to unwanted biases, errors and excessive medical costs, which affect
the quality of service provided to patients.
● A medical misdiagnosis can present itself in many ways.
Proposed System
Software requirements:
● OS : Windows
● Python IDE : Anaconda IDE with Python 2.7.x and above
● Setuptools and pip to be installed for Python 3.6.x and above
Hardware requirements:
● RAM : 4GB and Higher
● Processor : Intel i3 and above
● Hard Disk : 500GB minimum
System Architecture
Problem Statement
Machine learning allows building models that quickly analyze data and deliver
results. By leveraging historical and real-time data, machine learning
can help healthcare service providers make better decisions on patients’
disease diagnosis.
In our project, we analyze the data to predict the occurrence of the
disease. This intelligent system for disease prediction plays a major role in
controlling the disease and maintaining people’s good health by
predicting disease risk accurately.
What is Kaggle?
Kaggle is an online community of data scientists and machine learners, owned by Google LLC.
Kaggle allows users to find and publish data sets, explore and build models in a web-based data science
environment, work with other data scientists and machine learning engineers, and enter
competitions to solve data science challenges.
Attributes in Dataset
● Age
● Sex
● Cholesterol (Chol)
● ST depression (oldpeak)
● Slope
● Major vessels (Ca)
Testing Technologies
Anaconda (Python) - Anaconda is a free and open-source distribution of the Python and
R programming languages for scientific computing, that aims to simplify package
management and deployment.
Jupyter Notebook - The Jupyter Notebook is an open-source web application that allows
you to create and share documents that contain live code, equations, visualizations and
narrative text. Uses include: data cleaning and transformation, numerical simulation,
statistical modeling, data visualization, machine learning, and much more.
Data Cleaning
Data cleaning is essentially the task of removing errors and anomalies from the data, or replacing
observed values with the true values, to get more value from analytics.
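A minimal sketch of what such cleaning could look like with pandas. The rows below are made up for illustration (they are not the project's data), and filling gaps with the column median is one common choice assumed here, not necessarily the procedure the project followed:

```python
import io
import pandas as pd

# Illustrative rows only (not real patient data); the column names follow
# the attribute list of the heart disease dataset.
raw_csv = """age,sex,chol
63,1,233
63,1,233
41,0,
57,1,-1
56,0,201
"""

df = pd.read_csv(io.StringIO(raw_csv))

# 1. Remove exact duplicate records.
df = df.drop_duplicates()

# 2. Treat impossible values (negative cholesterol) as missing.
df["chol"] = df["chol"].where(df["chol"] >= 0)

# 3. Replace missing values with the column median (an assumed strategy).
df["chol"] = df["chol"].fillna(df["chol"].median())

cleaned = df.reset_index(drop=True)
print(cleaned)
```

The duplicate row is dropped first so that it does not skew the median used to fill the gaps.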
METHODS
1. Pandas - a software library written for the Python programming language for
data manipulation and analysis. In particular, it offers data structures and operations
for manipulating numerical tables and time series. pandas is a Python package
providing fast, flexible, and expressive data structures designed to make working
with “relational” or “labeled” data both easy and intuitive. It aims to be the
fundamental high-level building block for doing practical, real-world data analysis
in Python.
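To illustrate the kind of exploratory analysis pandas supports, here is a small sketch on made-up rows. The `target` column (1 = disease present) is an assumed name for the outcome label and is not listed in the attributes above:

```python
import pandas as pd

# Made-up rows for illustration; "target" (1 = disease present) is an
# assumed column name for the outcome label.
df = pd.DataFrame({
    "age": [63, 41, 57, 56, 44, 67],
    "sex": [1, 0, 1, 0, 1, 1],
    "chol": [233, 204, 354, 201, 263, 286],
    "target": [1, 1, 0, 1, 0, 0],
})

# Summary statistics (count, mean, std, min, quartiles, max) per column.
summary = df.describe()

# Mean cholesterol within each outcome group.
mean_chol = df.groupby("target")["chol"].mean()

print(summary.loc[["mean", "min", "max"]])
print(mean_chol)
```

Grouped summaries like this are a typical first step before any modelling, since they show how an attribute differs between the outcome groups.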
Tableau Dashboard
● Tableau is a business intelligence tool used to analyse data and
visualize the insights in the form of graphs and charts.
● Users can develop and share interactive dashboards that show hidden
patterns, trends, density and variation in the data.
● Tableau uses a centroid-based k-means clustering algorithm that divides the data
into K clusters.
● Dashboards are created from the data set after applying the k-means algorithm.
● It provides visually appealing clusters that help predict the occurrence of heart
disease from the given dataset.
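The idea behind centroid-based k-means can be sketched in plain Python. This is an illustration of the general algorithm, not Tableau's internal implementation; the `kmeans` helper and the (age, cholesterol) pairs are hypothetical and made up for the example:

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Cluster 2-D points into k groups with plain k-means (Lloyd's algorithm)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                        + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, labels

# Made-up (age, cholesterol) pairs forming two visibly separate groups.
data = [(35, 180), (38, 190), (40, 185), (62, 290), (65, 300), (60, 310)]
centroids, labels = kmeans(data, k=2)
print(centroids, labels)
```

The features are left unscaled to keep the sketch short. In practice, attributes with very different ranges are usually standardised first so that no single attribute dominates the distance.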
Dataset in Tableau:
Conclusion
● The models we used to predict the probability of having heart disease are logistic
regression and random forest, as they are more accurate on numerical variables. The
model accuracy is 85% on both the test and train data sets. This model could be used in the
medical field as it can predict heart disease.
● Heart stroke and vascular disease are major causes of disability and premature
death. Chest pain is the key to recognizing heart disease. In this work, heart
disease is predicted by considering major factors along with four
types of chest pain. K-means clustering is one of the simplest and most popular
unsupervised machine learning algorithms. Here the dataset is clustered and, based
on the clusters, the occurrence of chest pain is predicted. Exploratory data
analysis using Tableau provided a visually appealing and accurate clustering
experience.
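For reference, a train/test accuracy comparison of the two model types named above might be set up as follows with scikit-learn. This is a sketch on synthetic data generated by `make_classification`, so the printed scores say nothing about the 85% reported for the project; the split ratio and hyperparameters are assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 300 samples with 6 numeric features. The project
# itself used the Kaggle heart disease dataset.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

# Hold out 25% of the rows for testing (an assumed split ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: test accuracy = {scores[name]:.2f}")
```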
THANK YOU