19bit0006 VL2022230101720 Pe005

A project report on
A WEBSITE TO PREDICT HEART DISEASE

USING MACHINE LEARNING
Submitted in partial fulfillment for the award of the degree of
B.Tech (Information Technology)

by
SIDDU BALA MALLIKARJUN REDDY (19BIT0006)
SCHOOL OF INFORMATION TECHNOLOGY &

ENGINEERING
April, 2023
A WEBSITE TO PREDICT HEART DISEASE
USING MACHINE LEARNING
Submitted in partial fulfillment for the award of the degree of
B.Tech (Information Technology)

by
SIDDU BALA MALLIKARJUN REDDY (19BIT0006)
SCHOOL OF INFORMATION TECHNOLOGY &

ENGINEERING
April, 2023
DECLARATION
I here by declare that the thesis entitled “A WEBSITE TO PREDICT HEART

DISEASE USING MACHINE LEARNING” submitted by me, for the award of the
degree of B.Tech (Information Technology) is a record of bonafide work carried out by
me under the supervision of Dr. Sumaiya Thaseen I.
I further declare that the work reported in this thesis has not been submitted and
will not be submitted, either in part or in full, for the award of any other degree or
diploma in this institute or any other institute or university.
Place: Vellore
Date: 03-04-2023 Signature of the Candidate

CERTIFICATE
This is to certify that the thesis entitled “A WEBSITE TO PREDICT HEART

DISEASE USING MACHINE LEARNING” submitted by SIDDU BALA
MALLIRKUN REDDY (19BIT0006), School of Information Technology &
Engineering, Vellore Institute of Technology, Vellore for the award of the degree
B.Tech (Information Technology) is a record of bonafide work carried out by him/her
under my supervision.
The contents of this report have not been submitted and will not be submitted
either in part or in full, for the award of any other degree or diploma in this institute or
any other institute or university. The Project report fulfils the requirements and
regulations of VELLORE INSTITUTE OF TECHNOLOGY, VELLORE and in my
opinion meets the necessary standards for submission.
Signature of the Guide Signature of the HoD
Internal Examiner External Examiner

ABSTRACT
Heart disease is a major cause of death worldwide. The ability to accurately predict the
risk of heart disease can help individuals take preventive measures and lead a healthier
life. Machine learning (ML) algorithms have shown promising results in predicting
heart disease risk. In this paper, we explore the use of ML techniques for heart disease
prediction. We use a dataset containing clinical and demographic information of
patients to train and evaluate various ML models. We experiment with several
classification algorithms such as logistic regression, random forest, and support vector
machines to predict the risk of heart disease. Our results show that ML models can
accurately predict the risk of heart disease, with an accuracy of up to 90%. The proposed
approach can be a useful tool for healthcare professionals to identify high-risk
individuals and provide early interventions.
ACKNOWLEDGEMENT
It is my pleasure to express with deep sense of gratitude to Sumaiya Thaseen I,

Associate Professor Grade 1, SITE, Vellore Institute of Technology, for his/her
constant guidance, continual encouragement, understanding; more than all, he taught
me patience in my endeavor. My association with him / her is not confined to academics
only, but it is a great opportunity on my part of work with an intellectual and expert in
the field of Machine Learning.
I would like to express my gratitude to DR.G.VISWANATHAN, Chancellor

VELLORE INSTITUTE OF TECHNOLOGY, VELLORE, MR. SANKAR
VISWANATHAN, DR. SEKAR VISWANATHAN, MR.G V SELVAM, Vice –
Presidents VELLORE INSTITUTE OF TECHNOLOGY, VELLORE, DR.
RAMBABU KODALI, Vice – Chancellor, DR. PARTHA SHARATHY MALLICK,
Pro-Vice Chancellor and Dr. S. Sumathy, Dean, School of Information Technology &
Engineering (SITE), for providing with an environment to work in and for his
inspiration during the tenure of the course.
In jubilant mood I express ingeniously my whole-hearted thanks to Dr.Sumaiya

Thaseen I, HoD/Professor, all teaching staff and members working as limbs of our
university for their not-self-centered enthusiasm coupled with timely encouragements
showered on me with zeal, which prompted the acquirement of the requisite knowledge
to finalize my course study successfully. I would like to thank my parents for their
support.
It is indeed a pleasure to thank my friends who persuaded and encouraged me to take

up and complete this task. At last but not least, I express my gratitude and appreciation
to all those who have helped me directly or indirectly toward the successful completion
of this project.
Place: Vellore
Date: 03/04/2023 Name of the student
SIDDU BALA MALLIKARUN REDDY

TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ACRONYMS
CHAPTER 1
INTRODUCTION
1.1BACKGROUND
1.2MOTIVATION
1.3 PROJECT STATEMENT
1.4 OBJECTIVES
1.5 SCOPE OF THE PROJECT
CHAPTER 2
LITERATURE SURVEY
2.1 SUMMARY OF THE EXISTING WORKS
2.2 CHALLENGES PRESENT IN EXISTING SYSTEM
CHAPTER 3
REQUIREMENTS
3.1 HARDWARE REQUIREMENTS
3.2 SOFTWARE REQUIREMENTS
3.3 BUDGET
3.4 GANTT CHART

CHAPTER 4
ANALYSIS & DESIGN

4.1 PROPOSED METHODOLOGY
4.2 SYSTEM ARCHITECTURE
4.3 MODULE DESCRIPTIONS
CHAPTER 5
IMPLEMENTATION & TESTING

5.1 DATA SET
5.2 SAMPLE CODE
5.3 SAMPLE OUTPUT
5.4 TEST PLAN & DATA VERIFICATION
CHAPTER 6
RESULTS
6.1 RESEARCH FINDINGS
6.2 RESULT ANALYIS & EVALUATION METRICS
CONCLUSIONS AND FUTURE WORK
REFERENCES
LIST OF FIGURES
Architecture of Proposed System
Comparison of F1-Scores of Algorithms
Feature Importance of Gradient-Boost Algorithm
LIST OF TABLES
Comparison of values with past work/models
LIST OF ACRONYMS
XGB - Extreme Gradient Boosting
KNN - K-Nearest Neighbours
SVM - Support Vector Machines
HD - heart disease
CHD - coronary heart disease
LMT - Logistic Model Tree

CHAPTER 1
INTRODUCTION
1.1 BACKGROUND
Machine Learning, an integral part of Artificial Intelligence, has begun penetrating

various industries, amongst which healthcare stands an obvious one. Currently, this
field is working on algorithms that reliably predict the presence or absence of lung
cancer, HD, and other ailments. Such data, if predicted ahead of time, can provide
valuable insights to clinicians, allowing them to tailor their diagnosis and treatment to
the individual patient.
The current situation is that the healthcare business collects vast amounts of data, but
not all of it is mined to uncover hidden patterns and make effective decisions. As a
result, the projections have huge variations from the true value.
1.2 MOTIVATION
In United States and many other developed countries, 50% of deaths are caused due to
cardiovascular diseases. Similarly, in many countries leading cause for deaths is heart
disease. Among many types of heart disease, coronary heart disease led to the highest
number of deaths. As these diseases occur suddenly or in most of the cases they are
diagnosed at the last stages, where the patients and doctors are helpless to cure the
disease.
So, we came up with this project idea of creating a website with good UI and more
accurate prediction of these diseases so that they can recognize the disease in the
starting stage itself and take measures accordingly. Technology should be used not only
for business but also for the better living of the people.
1.3 PROJECT STATEMENT
At the end of the day, it's our health what matters. Being proactive is the best
solution when it comes to taking care of health. With this objective, as we know mostly
"Heart-related problems" are the ones that occur suddenly, sometimes they might be
severe. Various factors of our health contribute to the disease occurrence. Our project
is a website that predicts the probability of coronary heart disease occurrence. Dataset
with valuable attributes that contribute to Heart problem has been considered. Various
ML models are applied to train and test pre-processed data and comparative analysis of
algorithms has been made. ML model is implemented at the backend and Flask server
is used to connect frontend and backend. Input features on the website are selected
based on their impact on the accuracy of the model. Validation of the input data user
provides is implemented using JavaScript. Ensuring that valid data is entered helps in
reducing the outliers and increases the accuracy of the product. A website like this
keeps us updated about our health condition and helps us change our lifestyle and habits
that improve our health.
1.4 OBJECTIVE
This project objective is to develop a website, in which users can provide their data
of health factors like Blood Pressure levels and Medication, Smoking/Drinking habits,
Body Mass Index, Heart Rate, Anxiety, Yellow Fingers and various other features that
have an important role in prediction of heart disease and more other common now-a-
days factors for this occurrence. The data collected will undergo prediction of “10 Year
Risk” for the occurrence of heart disease.
1.5 SCOPE OF THE PROJECT
The scope of the project "A Website to Predict Coronary Heart Disease Occurrence
Using ML in real life" would involve designing and developing a website that utilizes
machine learning algorithms to predict the occurrence of coronary heart disease in real
life. The website would need to collect relevant data from users, including personal and
medical information, and then use this data to train a machine learning model that can
accurately predict the likelihood of developing coronary heart disease.
The final product could be used by individuals who are concerned about their risk of
developing coronary heart disease, as well as healthcare professionals who could use
the predictions to guide treatment and prevention strategies. The project has the
potential to make a significant impact in the field of healthcare and could help reduce
the incidence of coronary heart disease.
CHAPTER 2
LITERATURE SURVEY
2.1 SUMMARY OF THE EXISTING WORKS
[1] Here, various classifiers are applied on heart disease (HD) dataset to find the most
accurate classifiers that works for dataset and they are compared based on accuracy
score. As there are many attributes, they minimalized the number and prioritized the
attributes. The algorithms used are KNN, SVM, Adaboost, SGD and Decision Table
(DT) classifiers to analyze the dataset and predict the disease.
[2] In this paper, there is clear explanation of pre-processing of unbalanced dataset and
training the dataset with machine learning models and predicted the risk of occurrence
of coronary heart disease. Random Forest algorithm acquired 96.80 % which is highest
of others. After comparative analysis of three supervised ML algorithms, to create
randomness in data K-Fold cross validation technique is carried out.
[3] Heart disorder occurrence is predicted by applying algorithms. ROC curve is used
to validate these methods. Logistic Regression acquired the maximum correctness. To
make sure that the model works for all diverse datasets, it should be trained & tested
over high dimensional datasets.
[4] In this paper, at first Support vector classifier and KNN classifier applied together
85% accuracy. Following this neural network & Naïve bayes classifier combination is
applied. To acquire more accuracy of model, Associate classification is applied as the
output is association of various models. It is proven that, Associate classification along
with Naïve bayes classifier, Decision tree & neural network is more reliable & will also
handle unstructured data.
[5] Classifiers applied together 85% accuracy. Following this neural network & Naïve
bayes classifier combination is applied. To acquire more accuracy of model, Associate
classification is applied as the output is association of various models. It is proven that,
Associate classification along with Naïve bayes classifier, Decision tree & neural
network is more reliable & will also handle unstructured data. To make sure that the
model works for all diverse datasets, it should be trained & tested over high dimensional
datasets.
[6] In this paper, survey of several research papers involving prediction of

Cardiovascular diseases by Data Mining, ML and DL techniques. Feature selection is
used to increase accuracy in many of the classification algorithms. When feature
selection applied, to decrease the search space, greedy based sequential forward &
backward selection is used. They also mentioned the algorithms and their accuracies in
a tabular column. Artificial Neural Network, regression classification & clustering
techniques are discussed.
[7] In this paper they applied Machine Learning to predict cardio vascular disease for
the patients who are undergoing dialysis. Amongst American and Italian datasets, many
ML algorithms are trained & tested. But as Italian dataset is biased, prediction results
might differ in accuracy.
[8] In this paper, they proposed an idea for occurrence which analyzes various
optimization algorithms, weight initialization techniques and their accuracy levels are
compared. In neural network, activation functions like ReLU is used. Comparative
analysis of combination of ReLU and various optimization algorithms like Adam,
Adagrad is carried out. Adagrad optimizing algorithm along with ReLu has shown 85%
accuracy which is the highest, when compared to other algorithms.
[9] In this paper, limited dataset is used. Discussed the functioning of every algorithm
used and why they used them for this dataset. Artificial Neural Network consists of 3
layers and in hidden layer Activation function is applied. To predict targeted label,
ReLU activation function is applied. ANN acquired highest accuracy of 85% when
compared to other algorithms.
[10] In this paper, they performed Classification techniques of Machine Learning for
accurate results which in return helps medical industry for faster detection of heart
disease. They implemented “Deep Neural Network” classifiers which analyzes various
optimization algorithms, weight initialization techniques and their accuracy levels are
compared. In neural network, activation functions like ReLU is used. Comparative
analysis of combination of ReLU and various optimization algorithms like Adam,
Adagrad is carried out. Adagrad optimizing algorithm along with ReLu has shown 85%
accuracy which is the highest, when compared to other algorithms.
2.2 CHALLENGES PRESENT IN EXISTING SYSTEM
Here are some challenges present in the existing system for a website to predict
coronary heart disease occurrence using ML:
1. Limited availability of data: One of the main challenges in developing a website

to predict coronary heart disease occurrence using ML is the limited availability
of high-quality data. Machine learning algorithms require a large amount of
accurate and diverse data to make accurate predictions.
2. Data quality: The quality of the data used to train the machine learning models
is essential for accurate predictions. The data collected from different sources
may have errors, missing values, and inconsistencies that can affect the
accuracy of the model.
3. Interpretability: The interpretability of machine learning models is a significant
concern in the healthcare industry. The ability to understand how a model
arrives at its predictions is crucial to building trust in the model's
recommendations.
4. Legal and ethical considerations: Collecting and using sensitive medical data
comes with legal and ethical considerations. Websites that collect personal data
are required to comply with data privacy laws and regulations, and healthcare
data has additional protections due to its sensitive nature.
5. Bias in the data: Machine learning algorithms can amplify biases present in the
data. If the training data is biased, the machine learning model will learn and
perpetuate that bias. This could lead to inaccurate predictions for certain
populations, such as minorities or underrepresented groups.
6. Limited user adoption: Developing a website to predict coronary heart disease
occurrence using ML may not be sufficient if users are not adopting it.
Encouraging users to provide accurate and complete data can be challenging.
Additionally, if the website is not user-friendly or accessible, users may
not use it at all.
CHAPTER 3
REQUIREMENTS
3.1 HARDWARE REQUIREMENTS
Laptop
Internet/Wi-Fi Hotspot
i3 Processor Based Computer or higher
3.2 SOFTWARE REQUIREMENTS
Front-End : HTML, CSS, BOOTSTRAP
Back-End : MYSQL
ML model training : PYTHON – Scikit-Learn/Keras(Google Colab)
ML model Deployment : Flask
3.3 BUDGET
as its software related all are almost free of cost
3.4 GANTT CHART

CHAPTER 4
ANALYSIS & DESIGN
4.1 PROPOSED METHODOLOGY
At the end of the day, it's our health what matters. Being proactive is the best solution
when it comes to taking care of health. With this objective, as we know, mostly "Heart-
related problems" are the ones that occur suddenly, sometimes they might be severe.
Various factors of our health contribute to the disease occurrence. Our project is a
website that predicts the probability of coronary heart disease occurrence.
The prior discovery of common diseases like diabetes, heart disease and pulmonary
cancer may control and reduce the likelihood of patient being fatal. As the machine
education and the artificial intelligence progresses, this is achieved by using several
classifiers and clustering algorithms. This paper presents an algorithm for machine
learning for prevention of coronary heart disease, which for many people is the leading
cause of death.
We would like to do some ensemble methods in this prediction.
4.2 SYSTEM ARCHITECTURE

4.3 MODULE DESCRIPTIONS
ALGORITHMS USED:
1. SUPPORT VECTOR CLASSIFIER:
This classifier works well on small datasets when compared to large datasets. All data
is divided into 2 sets. The goal is to mark a hyper plane which basically has maximum
margin value from the nearest data point in 2 sets. Margin is the distance between data
point and hyper plane. Problems based on subset solving, SVM is a better choice.
2. RANDOM FOREST CLASSIFIER:
Random forest comes under supervised algorithm category. It can be implemented for
both regression and classification. As it’s in the name, “Forest” basically comprises of
trees. The more the trees, denser and robust the forest. This classifier creates trees called
“Decision Trees” on data sample and result of every tree is considered. The result with
majority is treated as best solution. Random forest algorithm has an enormous
application in recommendation engines, image classification and feature selection.
3. GRADIENT BOOSTER CLASSIFIER:
Gradient boosting can be applied for both regression and classification problems, it
generally produces an ensemble of weak hypothesis, mostly it tries to minimalize the
function of cost generated by decision trees.
4. XG BOOST CLASSIFIER:
Extreme Gradient Boosting Algorithm is which is highly efficient and provides parallel
tree boosting. The main objective of Gradient Boost is to minimize the loss function by
adding weak learners using a gradient descent optimization algorithm.
5. LOGISTIC REGRESSION:
It is an algorithm to check the probability for an event occurrence. outcome or a binary

outcome with 2 classes. variable outcome which is categorical regression. Logit Link
function is being used here where data values are fitted.
PACKAGES USED:
1.Matplotlib
It is a plotting library for the Python. I used this library to print confusion matrix.
2.Keras
Keras is a powerful and simple free open-source Python library for evaluating and
developing deep learning models.
3.Sklearn
Scikit-learn, an effective library which comprises algorithms like support, random

forests, and KNN, and it also supports in data cleaning and changing the values in
dataset.
4.Pandas
Pandas’ library is used to load datasets as a data frame. It imports data from various
formats. It also helps in data cleaning and changing the values in dataset.
5.Seaborn
Seaborn is a data visualization library which helps in advance data exploratory analysis
which in turn add a great understanding about data to the data analyst.
CHAPTER 5
IMPLEMENTATION & TESTING
5.1 DATA SET
We are using “pandas” a machine learning library in our project for further processing.
This loads the dataset. Collecting dataset is primary task. Collecting a dataset containing
credible, diverse, and massive data is very important. As we give this data to machine
to learn and predict the future input based on its learning on current data, ensuring the
quality of data is very important. We collected a dataset with certain number of samples.
Training dataset contains of various attributes diversified from basic attributes like age,
gender, education to ingenious attributes like diabetes, heart rate, total cholesterol level,
systolic blood pressure, diastolic blood pressure, cigarettes per day etc. Normally the
number of these attributes varies from dataset to other. The dataset chosen is containing
all types of constraints varying form small to big that influences the coronary heart
disease.
5.2 SAMPLE CODE
DATA PRE-PROCESSING:
It is an important process as it ensures valid data is given to machine to learn. So, any
null values in data are replaced with other values like mean, median or less-dominant
value to balance the dataset and ensuring result is un-biased. We had to do ‘Feature
Scaling’ using ‘Standardization’ technique.
This technique rescales value such that it has distribution with mean equals 0 and
variance equal to 1. To ensure that machine learns from a quality data, we need to clean
up data. To clean data, we need to check if there are null values or values which are
impossible for an attribute to have (called as outliers).
For example, Age of a person is 500. We need to replace them with some other values
like mean, median or mode. As the dataset contains attributes with the imbalanced
values, we need to balance the null values or make sure that there are no gaps left in the
dataset such that values are close to the materiality.
Checking if there are any Null values in data:
Replacing Null values with most frequent for some attributes.
STANDARDIZATION:
Standardization, in the context of machine learning (ML), refers to the process of

scaling or normalizing the features of a dataset to ensure that they have the same range
and distribution. This is often done to improve the performance and accuracy of
machine learning models.
Standardization involves subtracting the mean value of a feature and dividing it by its
standard deviation, resulting in a new feature with a mean of zero and a standard
deviation of one. This process is also known as z-score normalization or standard score
normalization.
By standardizing the features, we can ensure that the machine learning model gives
equal importance to all features, and it prevents some features from having more
influence than others. Standardization can also make the model more robust to outliers,
as it reduces the impact of extreme values on the overall performance of the model.
Standardization is a common pre-processing step in many machine learning algorithms,

such as logistic regression, support vector machines (SVM), and artificial neural
networks.
TRAINING THE MODEL:
We have used many attributes in the training set which are more than required to
analyse the prediction of the coronary heart disease. We needed to do feature scaling
using Standardization strategy. This procedure rescales worth to such an extent that it
has conveyance with 0 mean worth and difference equivalent. After transformations it
also helps to manage the imbalance problem.
We train our model with a dataset of 4238 records and Supervised ML algorithms are
used. To train the model and their respective F1-Scores are recorded. In this level, Data
is divided to train validate and test. Training data is to train the machine using different
Machine Learning Algorithms. Testing data is to test the learnt algorithm by machine.
Heart disease dataset is trained with various Supervised Machine Learning Algorithms
and calculated the accuracy of models in concern of testing data and calculated the best
suitable algorithm for our dataset. We also conducted ‘Exploratory Data Analysis’
which conveys the relation between attributes in a better way. After fetching that, data
augmentation was done and dataset can increase to more extent than usual. The
accuracy depends on the algorithm to other.
LOGISTIC REGRESSION:
DECISION TREE 1:
DECISION TREE 2:
SUPPORT VECTOR CLASSIFIER:

RANDOM FOREST:
GRADIENT BOOSTING:
XGB:
ENSEMBLE TECHNIQUE 1:
’
TESTING THE MODEL:
After applying each algorithm, accuracy of test-data is recorded. In this module we are
generating validation data from data augmented training data. Number of impactful
attributes depends on their contribution to the validation result. In total, this is utilized
to choose specific highlights which are more effective on result, and which improves
the exhibition
EVALUATION:
For evaluating the obtained average precision, F1-score and average recall is calculated.
Evaluation Metric For this project, We have used following metrics for evaluation -
Precision:
Recall:
F1-score:
It is defined precision and recall harmonic mean.

Confusion Matrix:
MODEL DEPLOYMENT:
Convert. ipynb file to pickle file(.pkl) and load the file in “app.py”. We collect data
from user through web-interface, data collected is passed through “Flask” to the trained
model as input and the result predicted by the model is displayed to user. We collect
data from user through web-interface, data collected is passed through “Flask” to the
trained model as input and the result predicted by the model is displayed to user.
CHAPTER 6
RESULTS
6.1 RESEARCH FINDINGS

6.2 RESULT ANALYIS & EVALUATION METRICS
Logistic Regression
Decision Tree Classifier 1

Decision Tree Classifier 2
Support Vector Machine
Random Forest Classifier
Gradient Boosting Classifier

XGB Classifier
Ensemble Technique 1
STEP BY STEP EXPLANATION OF “HEALTHIFY” WEBSITE:
Creating Virtual FLASK environment:
Sign page:
HOME PAGE:
ABOUT WEBSITE:
USER DETAILS PAGE:
VALIDATIONS TO ENSURE CREDIBLE USERS:
Invalid Mobile Number is given.
Age of user is below 18.
User checked ‘NO’ for smoking, but Number of Cigars/Day is given 10.
User checked ‘NO’ for Diabetes, but Fasting Sugar levels aren’t normal.
User entered “NOT-SO-POSSIBLE” values for Tot Chol, SysBP and DiaBP, BMI and
Heart rate.
Validating user is inevitable to ensure accuracy of the prediction. Because the Machine
learning Model in the back end has been built over real time data. So, any impossible
input might affect the model accuracy which in turn effects Website. So, validated the
values using java script for better results. On clicking ‘PREDICT’ validation begins.
Validating Phone number:
It alerts user saying “Phone number must be a 10-digit number.

Validating Age:
It alerts user saying, “The website is only for people aged 18 and above”.
Validating wrongly checked value for ‘SMOKING’:
If user checks ‘NO’ and enters number of Cigars/days >0, It alerts user saying “You checked
'NO' for smoking & Please ensure Number of Cigars/day is entered 'ZERO’ or select 'YES' for
smoking.”
Validating Cholesterol:
Validating Systolic & Diastolic Blood Pressure:
If user enters Systolic Blood Pressure Ranging below 90 and greater than 180, It alerts user
saying, “Please enter valid systolic BP ranging from 90 to 180”.
If user enters Diastolic Blood Pressure Ranging below 60 and greater than 110, It alerts user
saying “"Please enter valid diastolic BP ranging from 60 to 110"”.
Validating BMI index value:
If value is not valid, it alerts the user saying, “Please enter BMI value ranging from 10 to 50”.
Validating Heart rate:
Example 2:
User checks ‘NO’ but his systolic and diastolic BP are not normal.
User checks ‘YES’ but his fasting glucose is normal.
User checks ‘YES’ for smoking & enters ZERO in No.of.Cigars/Day.

Validating Smoking & No.of.Cigars:
Validating Diabetes:
PREDICTION OF DISEASE FOR A NORMAL PERSON:

RESULT:
PREDICTION OF DISEASE FOR A DISEASED PERSON:

CONCLUSIONS AND FUTURE WORK
Coronary Disease is currently one of the life-threatening disease, for which millions of lives
losses occurs every year. The odds of anticipating coronary disease physically on danger factors
are hard to evaluate. Many advanced diagnosis techniques are available in clinical industry;
however, deep learning is considered to be the best of its choice in terms of accuracy. The
experimental study shows that CNN algorithm has achieved highest accuracy. In future, the
work can be extended to study ensemble models or combining different parameter for hidden
layers. Along with deep learning, it can be extended with feature selection algorithms.
REFERENCES
[1] Senthil Kumar Mohan, Chandrasekar Thirumalai, Gautam Srivastava, Effective Heart
Disease Prediction Using Hybrid Machine Learning Techniques, Smart Caching,
Communications, Computing and Cybersecurity for Information-Centric Internet of Things,
IEEE Access ( Volume: 7), Page(s): 81542 – 81554, 19 June 2019
[2] Devansh Shah, Samir Patel, Santosh Kumar Bharti; Heart Disease Prediction using Machine
Learning Techniques, Springer Nature Singapore Pte Ltd 2020, 2 October 2020.
[3] Abhishek Taneja; Heart Disease Prediction System Using Data Mining Techniques,
Department of Computer Science, S.A. Jain College, Ambala City, India
[4] Rahul Katarya & Sunit Kumar Meena; Machine Learning Techniques for Heart Disease
Prediction: A Compar
[5] Anjan Nikhil Repaka, Sai Deepak Ravikanti, Ramya G Franklin; Design and Implementing
Heart Disease Prediction Using Navies Bayesian, Proceedings of the Third International
Conference on Trends in Electronics and Informatics (ICOEI 2019) IEEE Xplore Part Number:
CFP19J32-ART; ISBN: 978-1-5386-9439-8, 11 October 2019.
[6] Puroshattam, Kanank Saxena, Richa Sharma; Efficient Heart Disease Prediction System,
Elsevier.
[7] V V Ramalingam; Heart disease prediction using machine learning techniques, March 2018,
International Journal of Engineering & Technology 7(2.8):684, DOI:10.14419/ijet.
v7i2.8.10557
[8] Marjia Sultana; Afrin Haider; Mohammad Shorif Uddin; Analysis of data mining techniques
for heart disease prediction, 2016 3rd International Conference on Electrical Engineering and
Information Communication Technology (ICEEICT), 09 March 2017
[9] Chaitrali S. Dangare, Sulabha S. Apte; Improved Study of Heart Disease Prediction System
using Data Mining Classification Techniques
[10] Archana Singh, Rakesh Kumar; Heart Disease Prediction Using Machine Learning
Algorithms, 2020 International Conference on Electrical and Electronics Engineering (ICE3-
2020), 23 June 2020
APPENDIX A

19bit0006 VL2022230101720 Pe005

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

19bit0006 VL2022230101720 Pe005

Uploaded by

Copyright:

Available Formats

A project report on

A WEBSITE TO PREDICT HEART DISEASE

B.Tech (Information Technology)

SIDDU BALA MALLIKARJUN REDDY (19BIT0006)

SCHOOL OF INFORMATION TECHNOLOGY &

B.Tech (Information Technology)

SIDDU BALA MALLIKARJUN REDDY (19BIT0006)

SCHOOL OF INFORMATION TECHNOLOGY &

I here by declare that the thesis entitled “A WEBSITE TO PREDICT HEART

Date: 03-04-2023 Signature of the Candidate

This is to certify that the thesis entitled “A WEBSITE TO PREDICT HEART

Signature of the Guide Signature of the HoD

Internal Examiner External Examiner

It is my pleasure to express with deep sense of gratitude to Sumaiya Thaseen I,

I would like to express my gratitude to DR.G.VISWANATHAN, Chancellor

In jubilant mood I express ingeniously my whole-hearted thanks to Dr.Sumaiya

It is indeed a pleasure to thank my friends who persuaded and encouraged me to take

Date: 03/04/2023 Name of the student

SIDDU BALA MALLIKARUN REDDY

1.3 PROJECT STATEMENT

1.5 SCOPE OF THE PROJECT

2.2 CHALLENGES PRESENT IN EXISTING SYSTEM

3.2 SOFTWARE REQUIREMENTS

3.4 GANTT CHART

ANALYSIS & DESIGN

4.2 SYSTEM ARCHITECTURE

4.3 MODULE DESCRIPTIONS

IMPLEMENTATION & TESTING

5.2 SAMPLE CODE

5.3 SAMPLE OUTPUT

5.4 TEST PLAN & DATA VERIFICATION

6.2 RESULT ANALYIS & EVALUATION METRICS

CONCLUSIONS AND FUTURE WORK

Architecture of Proposed System

Comparison of F1-Scores of Algorithms

Feature Importance of Gradient-Boost Algorithm

Comparison of values with past work/models

XGB - Extreme Gradient Boosting

KNN - K-Nearest Neighbours

SVM - Support Vector Machines

CHD - coronary heart disease

LMT - Logistic Model Tree

Machine Learning, an integral part of Artificial Intelligence, has begun penetrating

1.3 PROJECT STATEMENT

1.5 SCOPE OF THE PROJECT

2.1 SUMMARY OF THE EXISTING WORKS

[6] In this paper, survey of several research papers involving prediction of

2.2 CHALLENGES PRESENT IN EXISTING SYSTEM

1. Limited availability of data: One of the main challenges in developing a website

3.1 HARDWARE REQUIREMENTS

i3 Processor Based Computer or higher

3.2 SOFTWARE REQUIREMENTS

Front-End : HTML, CSS, BOOTSTRAP

ML model training : PYTHON – Scikit-Learn/Keras(Google Colab)

ML model Deployment : Flask

as its software related all are almost free of cost

3.4 GANTT CHART

ANALYSIS & DESIGN

4.1 PROPOSED METHODOLOGY

We would like to do some ensemble methods in this prediction.

4.2 SYSTEM ARCHITECTURE

1. SUPPORT VECTOR CLASSIFIER:

2. RANDOM FOREST CLASSIFIER: