Phase 1 Project Report

HEART DISEASE PREDICTION
USING MACHINE LEARNING
A PROJECT REPORT – PHASE 1
Submitted by
PAVITHRA S 73772254101
in partial fulfillment of the requirement
for the award of the degree
of
M.TECH
in
DATA SCIENCE
K.S. RANGASAMY COLLEGE OF TECHNOLOGY

(An Autonomous Institution, affiliated to Anna University Chennai and Approved by AICTE, New Delhi)
TIRUCHENGODE – 637 215
NOVEMBER 2023
K.S. RANGASAMY COLLEGE OF TECHNOLOGY

TIRUCHENGODE - 637 215
BONAFIDE CERTIFICATE
Certified that this project report titled “HEART DISEASE PREDICTION
USING MACHINE LEARNING” is the bonafide work of PAVITHRA S
73772254101, who carried out the project under my supervision. Certified further,
that to the best of my knowledge the work reported herein does not form part of any
other project report or dissertation on the basis of which a degree or award was
conferred on an earlier occasion on this or any other candidate.
SIGNATURE
Dr. R. POONKUZHALI, M.E., Ph.D.,
HEAD OF THE DEPARTMENT
Professor
Department of Information Technology
K.S. Rangasamy College of Technology
Tiruchengode - 637 215

SIGNATURE
Dr. R. POONKUZHALI, M.E., Ph.D.,
SUPERVISOR
Professor
Department of Information Technology
K.S. Rangasamy College of Technology
Tiruchengode - 637 215
Submitted for the Project work – Phase I viva-voce examination held on

………………
Internal Examiner Internal Examiner

DECLARATION
I declare that the project report on “HEART DISEASE PREDICTION USING
MACHINE LEARNING” is the result of original work done by me and, to the best of
my knowledge, similar work has not been submitted to “ANNA UNIVERSITY CHENNAI”
for the requirement of the Degree of Master of Technology in Data Science. This
project report is submitted in partial fulfilment of the requirements for the award of
the Degree of Master of Technology in Data Science.
Signature
____________________
PAVITHRA S
Place: Tiruchengode
Date:
ACKNOWLEDGEMENT
We wish to express our sincere gratitude to our honourable Chairman
Thiru R. Srinivasan for providing extensive facilities at our institution.
We would like to express our special gratitude to our Chief Executive
Officer Dr. AKILA MUTHURAMALINGAM, M.E., Ph.D., who has been a
mainspring of motivation to us throughout the completion of our course and project
work.
We proudly render our thanks to our Principal
Dr. R. GOPALAKRISHNAN, M.E., Ph.D., for the facilities and the
encouragement he gave toward the progress and completion of our project.
We proudly render our immense gratitude to the Head of the Department
Dr. R. POONKUZHALI, M.E., Ph.D., for her effective leadership, encouragement
and guidance in the project.
We are highly indebted to our supervisor Dr. R. POONKUZHALI, M.E., Ph.D.,
and offer our heartfelt thanks for her valuable ideas, encouragement and
supportive guidance throughout the project.
We wish to extend our sincere thanks to all faculty members of our Data
Science Department for their valuable suggestions, kind co-operation and constant
encouragement for successful completion of this project.
We wish to acknowledge the help received from various Departments and
various individuals during the preparation and editing stages of the manuscript.
ABSTRACT
In the realm of predictive healthcare, the accurate and timely identification of
potential cardiac issues has become paramount. Heart disease, a leading global health
concern, calls for innovative solutions for early detection and proactive intervention. This
project presents a comprehensive approach to heart disease prediction using the machine
learning algorithms Warm and Naive Bayes (NB). Leveraging a diverse dataset of
essential health parameters, including age, gender, blood pressure and cholesterol levels,
the study addresses data preprocessing challenges such as missing values and outliers.
The models are trained on this refined dataset, and their predictive performance is
evaluated using standard metrics. The investigation includes an in-depth comparison of
the Warm and Naive Bayes (NB) models, shedding light on their respective strengths and
limitations in the context of heart disease prediction.
TABLE OF CONTENTS
CHAPTER TITLE PAGE NO.
ABSTRACT v
LIST OF TABLES vii
LIST OF FIGURES viii
LIST OF ABBREVIATIONS ix
1 INTRODUCTION 1
1.1 HEART DISEASE 1
1.2 MACHINE LEARNING 2
1.3 PREDICTIVE MODELING 3
1.4 MEDICAL HISTORY 3
1.5 OBJECTIVES 4
2 LITERATURE SURVEY 5
Summary Of Literature Survey 16
3 SYSTEM ANALYSIS 26
3.1 EXISTING SYSTEM 26
3.1.1 DRAWBACKS 26
3.2 PROPOSED SYSTEM 27
3.2.1 ADVANTAGES 27
3.3 FEASIBILITY STUDY 28
3.3.1 TECHNICAL FEASIBILITY 28
3.3.2 OPERATIONAL FEASIBILITY 29
3.3.3 ECONOMICAL FEASIBILITY 29
4 SYSTEM DESIGN 30
4.1 PROBLEM DEFINITION 30
4.2 MODULE DESCRIPTION 30
4.2.1 LOAD DATA 30
4.2.2 DATA PREPROCESSING 30
4.3 SYSTEM FLOW DIAGRAM 21
4.4 INPUT DESIGN 21
4.5 OUTPUT DESIGN 22
5 CONCLUSION AND FUTURE ENHANCEMENT 22
REFERENCES 23
LIST OF TABLES
TABLE NO TITLE PAGE NO.
2.1 Summary Of Literature Survey 16

LIST OF FIGURES
FIGURE NO FIGURE NAME PAGE NO.
1 Heart Disease 2
2 Machine Learning 4
3 System Flow Diagram 20
LIST OF ABBREVIATIONS
NB - Naive Bayes
SVM - Support Vector Machines
IoT - Internet of Things
HRFLM - Hybrid Random Forest with Linear Model
CVDs - Cardiovascular Diseases
LASSO - Least Absolute Shrinkage and Selection Operator
DTBM - Decision Tree Bagging Method
KNN - K-Nearest Neighbours
HD - Heart Disease
ML - Machine Learning
CHAPTER 1
INTRODUCTION
Heart disease remains a leading cause of mortality worldwide, posing a significant
public health challenge. The intricate interplay of genetic, lifestyle, and environmental
factors makes predicting the risk of heart disease a complex task. With the advent of
advanced technologies, particularly in the field of machine learning, there is a growing
interest in leveraging computational methods to enhance the accuracy and efficiency of
heart disease prediction. Machine learning, a subset of artificial intelligence, empowers
computers to learn patterns and make predictions from data without explicit programming.
In the context of heart disease prediction, machine learning algorithms can analyse vast
datasets encompassing diverse patient information, including medical history, lifestyle
choices, and genetic predispositions. By identifying hidden patterns and correlations within
this data, these algorithms can assist healthcare professionals in predicting and preventing
heart disease more effectively. This predictive modelling approach offers the potential to
revolutionize traditional risk assessment methods by providing personalized and data-
driven insights. As opposed to conventional risk calculators that often rely on a limited set
of factors, machine learning models can incorporate a multitude of variables and adapt to
new information, enhancing the precision of predictions.
1.1 HEART DISEASE
Heart disease, a formidable global health challenge, continues to be a leading cause
of morbidity and mortality. Its multifaceted nature, influenced by a complex interplay of
genetic, lifestyle, and environmental factors, demands innovative approaches for prediction
and prevention. As medical science advances, one promising frontier is the integration of
machine learning into cardiovascular healthcare. Machine learning, a subset of artificial
intelligence, empowers computers to discern patterns and make predictions from vast
datasets. In the realm of heart disease, these algorithms hold the potential to revolutionize
risk assessment by analyzing a myriad of patient-specific data, ranging from medical
histories to genetic predispositions. Traditional risk calculators often fall short in capturing
the intricacies of individual health profiles, and here lies the transformative potential of
machine learning. By uncovering hidden patterns and correlations within diverse datasets,
machine learning models can offer a more nuanced understanding of an individual's risk,
paving the way for personalized interventions and ultimately contributing to a paradigm
shift in the prevention and management of heart disease. This intersection of cutting-edge
technology and cardiovascular health heralds a new era in healthcare, where data-driven
insights may hold the key to reducing the global burden of heart-related ailments.
Figure 1. Heart Disease
1.2 MACHINE LEARNING
Machine Learning, at the forefront of technological innovation, represents a pivotal
evolution in the way computers learn and make decisions. Rooted in artificial intelligence,
Machine Learning enables systems to autonomously analyze vast datasets, identifying
patterns and trends that elude traditional programming approaches. Unlike conventional
systems, which rely on explicit instructions, Machine Learning algorithms adapt and
improve their performance over time as they process more information. This adaptive
capacity empowers machines to make predictions, recognize complex patterns, and
continually refine their understanding of diverse data sets. From predictive analytics to
image recognition, the applications of Machine Learning are diverse and ever-expanding,
influencing sectors such as healthcare, finance, and technology. As this transformative
technology continues to advance, it not only reshapes how we approach problem-solving
but also holds the potential to revolutionize industries, driving efficiency, and uncovering
insights that were once elusive through conventional computational methods.
Figure 2. Machine Learning
1.3 PREDICTIVE MODELING
Predictive Modeling stands as a powerful methodological approach that navigates
the intricate landscape of data analytics, aiming to forecast future outcomes based on
historical patterns and trends. In essence, it involves the construction and utilization of
algorithms to make predictions or identify potential trends in data sets. This methodology
is particularly invaluable in realms where understanding and anticipating future events
hold significant importance. Whether applied in finance to predict market trends, in
healthcare to foresee disease progression, or in marketing to target consumer behavior,
Predictive Modeling serves as a visionary tool. By leveraging statistical and mathematical
techniques, it transforms raw data into actionable insights, allowing decision-makers to
proactively address challenges and capitalize on opportunities. As industries increasingly
rely on data to inform strategies, the integration of Predictive Modeling becomes not just a
tool but a transformative force, enabling organizations to navigate complexities and make
well-informed decisions in the face of an ever-evolving landscape.
1.4 MEDICAL HISTORY
Medical history, a narrative intricately woven from the threads of a patient's past
health experiences, stands as a foundational pillar in healthcare. It serves as a
comprehensive record, chronicling the nuances of an individual's health journey—ranging
from ailments and treatments to lifestyle choices and genetic predispositions. This
historical tapestry is a crucial roadmap for healthcare professionals, offering invaluable
insights into the patient's health trajectory. By delving into medical history, clinicians can
decipher patterns, identify risk factors, and make informed decisions about diagnosis,
treatment, and preventive measures. In essence, a thorough understanding of medical
history not only informs current medical practices but also facilitates a personalized and
holistic approach to patient care. As healthcare continues to advance, the significance of
mining the rich repository of a patient's medical history becomes increasingly pronounced,
underscoring its role as an indispensable tool in the pursuit of optimal health outcomes.
1.5 OBJECTIVES
1. To develop a novel feature selection algorithm, Warm, that identifies the most
relevant features for heart disease prediction.
2. To implement three classification algorithms – Warm and Naive Bayes (NB) – to
classify heart disease risk based on the selected features.
3. To combine the predictions from the individual classification models using an
ensemble learning technique to produce a single, more robust prediction.
4. To evaluate the performance of the proposed system on benchmark heart disease
datasets and compare it with existing methods.
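The objectives above list Naive Bayes (NB) as one of the classifiers but do not state which variant is used. As an illustration only, the following is a minimal Gaussian Naive Bayes sketch in Python. It assumes that continuous attributes (for example age and cholesterol) follow independent normal distributions within each class. The function names fit_gaussian_nb and predict_gaussian_nb and the toy records are hypothetical and are not taken from any dataset used in this project.

```python
# Illustrative sketch only: function names and toy data are hypothetical.
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate each class's prior, per-feature mean and per-feature variance."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        # A tiny constant keeps every variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / len(col) + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (len(rows) / len(X), means, variances)
    return model

def log_gaussian(x, mean, var):
    """Log of the normal density N(x; mean, var)."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def predict_gaussian_nb(model, row):
    """Return the class with the highest log prior plus summed log likelihoods."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior) + sum(
            log_gaussian(x, m, v) for x, m, v in zip(row, means, variances)
        )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy records: [age, cholesterol]; label 1 = heart disease present.
X_toy = [[45, 180], [50, 190], [38, 170], [62, 260], [67, 280], [58, 250]]
y_toy = [0, 0, 0, 1, 1, 1]

model = fit_gaussian_nb(X_toy, y_toy)
print(predict_gaussian_nb(model, [65, 270]))
```

Working in log space avoids numerical underflow when many small per-feature likelihoods are multiplied together.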
CHAPTER 2
LITERATURE REVIEW
2.1 HEART DISEASE PREDICTION USING MACHINE LEARNING
Sean C The integration of machine learning into the healthcare sector holds
immense promise, particularly in the early detection and prediction of various medical
conditions. In the context of heart health, the application of machine learning algorithms
becomes particularly crucial. The ability to predict potential heart conditions in advance
provides a valuable advantage in offering timely interventions and personalized treatment
strategies. In this research project, the focus is on evaluating and comparing the
performance of diverse machine learning classifiers for predicting heart conditions. The
classifiers under scrutiny include Decision Tree, Naive Bayes, Logistic Regression,
Support Vector Machines (SVM), and Random Forest. Each of these classifiers brings its
unique strengths and characteristics to the table, and a comparative analysis is essential to
identify which one excels in the specific context of heart health prediction. Furthermore,
the project introduces an innovative approach by proposing an ensemble classifier. This
classifier goes beyond the conventional single-model approach by combining the strengths
of both strong and weak classifiers. The rationale behind this hybrid classification strategy
is rooted in the capacity to harness a large number of training and validation samples
effectively. The ensemble classifier aims to enhance the overall predictive accuracy and
robustness of the model, providing a more reliable tool for early identification of potential
heart conditions. Machine learning, with its ability to analyse vast datasets and discern
complex patterns, empowers healthcare professionals with valuable insights. By leveraging
these algorithms, doctors can receive early warnings about locomotor abnormalities,
cardiac issues, and other health conditions. This proactive approach enables physicians to
tailor their diagnostic and treatment approaches on a per-patient basis, optimizing the
delivery of healthcare services. As the project unfolds, it not only contributes to the
advancement of predictive modelling in the healthcare domain but also underscores the
importance of a holistic evaluation of various machine learning techniques. The findings
from this research have the potential to significantly impact clinical practices, paving the
way for more accurate and timely detection of heart conditions, ultimately improving
patient outcomes and quality of care.
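The summary above does not specify how the proposed ensemble combines its strong and weak member classifiers. One common scheme is hard majority voting, used here purely as an illustrative assumption: every model predicts a label for each patient, and the label chosen by most models is returned. The following Python sketch implements that scheme. The function name majority_vote and the three prediction lists are hypothetical and do not come from the surveyed study.

```python
# Illustrative sketch only: function name and prediction lists are hypothetical.
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per sample by majority vote.

    predictions_per_model is a list of equal-length lists, one list per model.
    """
    combined = []
    for votes in zip(*predictions_per_model):
        # most_common(1) returns the label with the highest vote count; on a tie,
        # the label that appears first among the votes wins.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Hypothetical predictions from three already-trained models for four patients.
tree_preds = [1, 0, 1, 1]
nb_preds = [1, 1, 0, 1]
logreg_preds = [0, 0, 1, 1]

print(majority_vote([tree_preds, nb_preds, logreg_preds]))
```

With an odd number of models and a binary label, a tie cannot occur.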
2.2 EFFECTIVE HEART DISEASE PREDICTION USING HYBRID
MACHINE LEARNING TECHNIQUES
SENTHILKUMAR MOHAN Heart disease remains a global health challenge,
standing as one of the leading causes of mortality. Addressing the prediction of
cardiovascular diseases is a critical aspect of clinical data analysis. The sheer volume of
data generated by the healthcare industry necessitates sophisticated tools, and machine
learning has proven to be highly effective in this regard. The marriage of machine learning
techniques with clinical data offers a promising avenue for enhancing decision-making and
prediction accuracy in the realm of cardiovascular health. The intersection of machine
learning and the Internet of Things (IoT) introduces a new dimension to healthcare
analytics. Recent developments have showcased the integration of machine learning
techniques with IoT devices, providing real-time data that can be instrumental in predicting
and preventing heart diseases. This convergence of technologies opens up avenues for
more proactive and personalized healthcare interventions. Despite the existing body of
research on predicting heart disease using machine learning, this paper seeks to contribute
to the field by proposing a novel method. The primary objective is to identify and leverage
significant features through advanced machine learning techniques, ultimately improving
the accuracy of cardiovascular disease predictions. The approach involves exploring
various combinations of features and employing well-established classification techniques
to develop a robust prediction model. The paper introduces a prediction model that
demonstrates its efficacy through different combinations of features and several widely
recognized classification techniques. Among these, the Hybrid Random Forest with Linear
Model (HRFLM) emerges as a standout performer, achieving an impressive accuracy level
of 88.7%. This highlights the potential of combining the strengths of Random Forest,
known for its ensemble learning capabilities, with the precision of Linear Models, resulting
in a hybrid model that excels in predicting heart disease. The significance of this research
lies not only in the elevated prediction accuracy achieved but also in the methodological
innovation of combining different machine learning techniques. This approach not only
refines the understanding of crucial features contributing to accurate predictions but also
opens avenues for further exploration in optimizing hybrid models for other healthcare
applications. As the healthcare landscape continues to evolve, the integration of advanced
machine learning techniques holds immense promise in transforming the prediction and
prevention of cardiovascular diseases, ultimately saving lives and improving the overall
well-being of individuals worldwide.

ALGORITHMS
Shu Jiang Diving deeper into the global impact of cardiovascular diseases (CVDs),
it is evident that this health crisis affects a staggering number of individuals, making it the
leading cause of death worldwide, surpassing all other causes. The World Health
Organization reports that in 2016 alone, an estimated 17.9 million people succumbed to
CVDs, constituting 31% of all global deaths. The majority of these deaths—85%—resulted
from heart attacks and strokes. This grim reality not only places an immense emotional
burden on affected families but also poses a substantial financial challenge, given the high
mortality rates and the considerable costs associated with cardiovascular surgeries. The
situation is particularly dire in economically disadvantaged regions, where heart disease
emerges as a significant and sometimes insurmountable threat. Therefore, the imperative to
analyse the intricate relationships between various human attributes and the likelihood of
developing heart disease becomes paramount. Developing a robust predictive model
becomes not just a statistical endeavour but a critical tool in anticipating and preventing
heart-related issues. In this context, the application of machine learning emerges as a
powerful ally in the fight against cardiovascular diseases. Machine learning, intricately
linked with computational statistics, harnesses mathematical optimization to provide
methods and theories that address real-world problems in medicine, industry, social
sciences, and business domains. The versatility of machine learning is reflected in its two
broad categories: supervised learning and unsupervised learning. For the specific goal of
predicting heart disease based on physiological attributes, supervised learning becomes the
natural choice. In supervised learning, the algorithm constructs a mathematical model
using a dataset that includes both inputs (attributes) and desired outputs (labels). This
aligns seamlessly with the objective of predicting the likelihood of individuals having heart
disease based on specific physical characteristics. This thesis takes a methodical approach
by employing four distinct models within the realm of supervised learning: logistic
regression, random forest, extreme gradient boosting, and neural networks. Each of these
models brings its unique strengths to the table, offering diverse perspectives on the
complex interplay between various attributes and the probability of developing heart
disease. The utilization of a combination of these models enhances the predictive accuracy
and robustness of the analysis, providing a nuanced understanding of the factors
contributing to heart disease susceptibility. As the research unfolds, it not only contributes
to the growing body of knowledge on predictive modelling in healthcare but also holds
promise in guiding preventive measures and interventions. By understanding and
harnessing the power of machine learning, this thesis aims to pave the way for more
effective strategies in identifying and supporting individuals at risk of heart disease,
ultimately contributing to a global effort to reduce the staggering mortality rates associated
with cardiovascular conditions.
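Of the four supervised models listed above, logistic regression is the simplest to state. It passes a weighted sum of the attributes through the sigmoid function to obtain a probability of heart disease, and its weights are fitted by minimising the log-loss. The following sketch trains it with plain batch gradient descent and assumes the attributes have already been standardised. The function names, the learning rate, the epoch count and the toy data are arbitrary illustrative choices, not values from the cited thesis.

```python
# Illustrative sketch only: names, hyperparameters and toy data are hypothetical.
import math

def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(weights, bias, row):
    """Estimated probability that the record belongs to class 1."""
    return sigmoid(sum(w * x for w, x in zip(weights, row)) + bias)

def train_logistic_regression(X, y, learning_rate=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n_samples, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for row, target in zip(X, y):
            # Gradient of the log-loss with respect to the linear score is (p - y).
            error = predict_proba(weights, bias, row) - target
            for j, x in enumerate(row):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias

# Hypothetical standardised records: [age_z, cholesterol_z]; 1 = disease present.
X_toy = [[-1.2, -0.8], [-0.9, -1.1], [-0.5, -0.3], [0.6, 0.4], [1.0, 1.2], [1.3, 0.9]]
y_toy = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic_regression(X_toy, y_toy)
print(round(predict_proba(weights, bias, [1.1, 1.0]), 3))
print(round(predict_proba(weights, bias, [-1.0, -1.0]), 3))
```

The branch inside sigmoid keeps math.exp from overflowing for large negative scores.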
2.4 EFFICIENT PREDICTION OF CARDIOVASCULAR DISEASE USING
MACHINE LEARNING ALGORITHMS WITH RELIEF AND LASSO
FEATURE SELECTION TECHNIQUES
PRONAB GHOSH Cardiovascular diseases (CVDs) represent a significant global
health concern, with their impact on human health being both widespread and severe. The
potential to prevent or mitigate the impact of CVDs through early diagnosis is well
recognized, making the identification of risk factors a key focus. In this context, leveraging
machine learning models to predict heart disease emerges as a promising approach. The
proposed model in this study aims to enhance the effectiveness of such predictions through
the integration of different methodologies. The success of the proposed model is rooted in
a meticulous approach to data handling, including efficient data collection, pre-processing,
and transformation methods. These steps are crucial in ensuring the creation of accurate
and reliable information for training the model. The integration of diverse datasets from
Cleveland, Long Beach VA, Switzerland, Hungarian, and Statlog contributes to the
model's comprehensiveness, capturing a broad spectrum of data for analysis. Feature
selection is a critical step in refining the model's predictive capabilities. In this study, the
Relief technique and the Least Absolute Shrinkage and Selection Operator (LASSO) are
employed to identify and choose the most relevant features. This strategic selection process
enhances the model's ability to discern and prioritize the factors contributing to heart
disease. The innovation in this research lies in the introduction of new hybrid classifiers,
namely the Decision Tree Bagging Method (DTBM), Random Forest Bagging Method
(RFBM), K-Nearest Neighbours Bagging Method (KNNBM), AdaBoost Boosting Method
(ABBM), and Gradient Boosting Method (GBBM). These hybrid classifiers integrate
traditional classifiers with bagging and boosting methods during the training process. This
amalgamation aims to capitalize on the strengths of both approaches, enhancing the overall
predictive power of the model. To evaluate the performance of the proposed model
rigorously, the study computes key metrics for a range of machine learning algorithms,
including Accuracy (ACC), Sensitivity (SEN), Error Rate, Precision (PRE), F1 Score (F1),
Negative Predictive Value (NPV), False Positive Rate (FPR), and False Negative Rate
(FNR). The comprehensive analysis of these metrics allows for a nuanced understanding of
the model's strengths and areas for potential improvement. The culmination of these efforts
and analyses reveals that the proposed model, particularly when employing the Random
Forest Bagging Method (RFBM) in conjunction with Relief feature selection, achieves an
impressive accuracy of 99.05%. This result underscores the efficacy of the hybrid
classifiers and the meticulous approach to feature selection, suggesting the potential of this
model as a robust tool for predicting heart disease with a high level of precision and
reliability.
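All of the metrics listed above derive from the four cells of a binary confusion matrix: true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN). For example, ACC = (TP + TN) / (TP + TN + FP + FN) and NPV = TN / (TN + FN). The following Python sketch computes all eight metrics from label lists and treats label 1 as "disease present". The function names and example labels are hypothetical. By a convention chosen for this sketch, a metric whose denominator is zero is reported as 0.0.

```python
# Illustrative sketch only: function names and example labels are hypothetical.
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, TN, FP and FN for a binary task with the given positive label."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth != positive and pred != positive:
            tn += 1
        elif pred == positive:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn

def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero (a sketch convention)."""
    return numerator / denominator if denominator else 0.0

def evaluation_metrics(y_true, y_pred):
    """Compute the metrics named in the surveyed study from predicted labels."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    precision = safe_div(tp, tp + fp)
    sensitivity = safe_div(tp, tp + fn)  # also called recall
    return {
        "ACC": safe_div(tp + tn, total),
        "ERROR_RATE": safe_div(fp + fn, total),
        "SEN": sensitivity,
        "PRE": precision,
        "F1": safe_div(2 * precision * sensitivity, precision + sensitivity),
        "NPV": safe_div(tn, tn + fn),
        "FPR": safe_div(fp, fp + tn),
        "FNR": safe_div(fn, fn + tp),
    }

# Hypothetical labels for ten patients (1 = heart disease present).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

for name, value in evaluation_metrics(y_true, y_pred).items():
    print(f"{name}: {value:.3f}")
```

For the example labels, TP = 3, TN = 5, FP = 1 and FN = 1. Accuracy is therefore 0.8, and sensitivity, precision and F1 are all 0.75.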
2.5 HEART DISEASE PREDICTION
Harshit Jindal In the face of the escalating incidence of heart diseases, the
imperative to predict and diagnose these conditions in advance has become increasingly
crucial. The complexity of this diagnostic task demands precision and efficiency,
necessitating the exploration of innovative approaches. The research paper under
discussion addresses precisely this challenge—determining which patients are more likely
to be afflicted by heart disease based on a myriad of medical attributes. To tackle this
formidable task, the researchers have developed a heart disease prediction system that
leverages the rich medical history of patients. This system aims to predict whether a given
patient is prone to heart disease, thereby providing a proactive means of intervention. The
utilization of various machine learning algorithms, such as logistic regression and K-
nearest neighbours (KNN), underscores the versatility of modern computational techniques
in the realm of healthcare diagnostics. One notable feature of the research is its focus on
enhancing the accuracy of heart disease predictions. The authors have employed a
thoughtful approach to fine-tuning the model, seeking to optimize its performance and
reliability. The combination of KNN and logistic regression as predictive tools yielded
satisfactory results, showcasing an improvement in accuracy compared to previously
employed classifiers like naive Bayes. This augmentation in predictive accuracy is
particularly noteworthy, as it contributes to lifting a substantial burden off healthcare
practitioners by providing a more robust tool for identifying potential heart diseases. The
significance of the proposed heart disease prediction system extends beyond accurate
diagnoses. It is positioned to enhance overall medical care and potentially reduce costs
associated with delayed or less accurate diagnoses. By providing a means to predict the
likelihood of heart disease in individuals, the system empowers healthcare professionals to
take pre-emptive measures, ultimately improving patient outcomes. The implementation of
this predictive model as a Jupyter notebook (.ipynb file) adds a layer of accessibility and ease of use.
Researchers and healthcare practitioners can readily explore, analyse, and apply the model,
fostering a broader adoption of this innovative approach to heart disease prediction. In
conclusion, this research project not only contributes valuable insights to the field of
predictive healthcare but also serves as a tangible and applicable solution with its heart
disease prediction system. As advancements in machine learning continue to intersect with
healthcare, the potential for more accurate and timely diagnoses becomes increasingly
promising, ultimately benefitting individuals and healthcare systems alike.
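The K-nearest neighbours (KNN) classifier mentioned in this study predicts a patient's label from the labels of the k most similar patients in the training data. Below is a minimal Python sketch that uses Euclidean distance and a majority vote. It assumes the attributes have been scaled to comparable ranges; otherwise an attribute with large values, such as cholesterol, would dominate the distance. The function name knn_predict, the toy points and the choice k = 3 are illustrative assumptions. math.dist requires Python 3.8 or later.

```python
# Illustrative sketch only: function name, k and toy data are hypothetical.
import math
from collections import Counter

def knn_predict(X_train, y_train, query, k=3):
    """Label query by majority vote among its k nearest training records."""
    # Sort training records by Euclidean distance to the query point.
    ranked = sorted(zip(X_train, y_train), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical records already scaled to [0, 1]: [age, cholesterol].
X_train = [[0.2, 0.1], [0.3, 0.2], [0.1, 0.3], [0.8, 0.9], [0.9, 0.7], [0.7, 0.8]]
y_train = [0, 0, 0, 1, 1, 1]

print(knn_predict(X_train, y_train, [0.85, 0.8]))
print(knn_predict(X_train, y_train, [0.15, 0.2]))
```

An odd k avoids tied votes between the two classes. In practice k is usually chosen by cross-validation rather than fixed in advance.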
2.6 ENHANCING HEART DISEASE PREDICTION ACCURACY
THROUGH MACHINE LEARNING TECHNIQUES AND
OPTIMIZATION
Nadikatla Chandrasekhar In the realm of medical diagnostics, the early detection of
cardiovascular issues remains a formidable challenge. This research endeavours to elevate
the accuracy of heart disease prediction through the adept application of machine learning
techniques. The study employs a diverse set of algorithms, including random forest, K-
nearest neighbour, logistic regression, Naïve Bayes, gradient boosting, and AdaBoost
classifier, leveraging datasets sourced from both the Cleveland dataset and IEEE DataPort.
The goal is to optimize the predictive models through meticulous parameter tuning using
GridSearchCV and robust five-fold cross-validation. In the evaluation of individual
algorithms on the Cleveland dataset, logistic regression emerges as the top performer,
achieving an impressive accuracy of 90.16%. Meanwhile, on the IEEE DataPort dataset,
AdaBoost takes the lead with a remarkable accuracy of 90%. These results demonstrate the
effectiveness of different algorithms in distinct datasets, showcasing the need for a tailored
approach to achieve optimal predictive accuracy. The study goes a step further by
introducing a soft voting ensemble classifier, combining the strengths of all six algorithms.
This approach proves to be a significant breakthrough, elevating the accuracy to 93.44%
for the Cleveland dataset and an even more remarkable 95% for the IEEE DataPort
dataset. Notably, this outperforms the individual logistic regression and AdaBoost
classifiers on both datasets, underscoring the potency of ensemble methods in enhancing
predictive accuracy. A novel aspect of this research lies in the methodological approach to
hyperparameter optimization. By employing GridSearchCV in conjunction with five-fold
cross-validation, the study systematically determines the best parameters for each model,
ensuring a robust and generalizable predictive framework. The assessment of model
performance using accuracy and negative log loss metrics provides a comprehensive
understanding of the models' capabilities. Furthermore, the research examines accuracy
loss for each fold, offering insights into the model's consistency and reliability across
different subsets of the data. This nuanced evaluation enhances the overall reliability and
applicability of the proposed predictive models. In comparison to existing studies in heart
disease prediction, this research stands out by surpassing their results. The soft voting
ensemble classifier, in particular, demonstrates its effectiveness in improving accuracy,
setting a new benchmark in the field. As the medical community continues to explore
advanced computational techniques, this study contributes not only to improved heart
disease prediction but also establishes a methodological precedent for future research in
predictive healthcare analytics.
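The tuning procedure described above is an exhaustive search over candidate hyperparameter values in which each candidate is scored by its mean accuracy across five cross-validation folds. It can be sketched without any library as follows. This is not scikit-learn's GridSearchCV itself: it tunes only the number of neighbours k of a simple K-nearest neighbours classifier. The function names, the parameter grid and the synthetic two-cluster data are all hypothetical.

```python
# Illustrative sketch only: function names, grid and data are hypothetical.
import math
import random
from collections import Counter

def knn_predict(X_train, y_train, query, k):
    """Majority label among the k training points closest to query."""
    ranked = sorted(zip(X_train, y_train), key=lambda pair: math.dist(pair[0], query))
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]

def k_fold_indices(n_samples, n_folds=5, seed=0):
    """Shuffle indices once, then deal them into n_folds near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]

def cross_val_accuracy(X, y, k, n_folds=5):
    """Mean accuracy over the folds, each fold serving once as the test set."""
    scores = []
    for test_idx in k_fold_indices(len(X), n_folds):
        held_out = set(test_idx)
        X_train = [X[i] for i in range(len(X)) if i not in held_out]
        y_train = [y[i] for i in range(len(X)) if i not in held_out]
        correct = sum(knn_predict(X_train, y_train, X[i], k) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)

def grid_search_k(X, y, k_values, n_folds=5):
    """Score every candidate k and return the best one together with all scores."""
    scores = {k: cross_val_accuracy(X, y, k, n_folds) for k in k_values}
    best_k = max(scores, key=scores.get)  # the first k with the top score wins ties
    return best_k, scores

# Hypothetical, well-separated two-cluster data (20 scaled records).
X_demo = [[0.1 + 0.02 * i, 0.2 + 0.01 * i] for i in range(10)] + \
         [[0.8 + 0.01 * i, 0.9 - 0.02 * i] for i in range(10)]
y_demo = [0] * 10 + [1] * 10

best_k, scores = grid_search_k(X_demo, y_demo, k_values=[1, 3, 5])
print(best_k, scores)
```

For classifiers, scikit-learn's GridSearchCV stratifies the folds by default so that each fold keeps a similar class balance. This sketch omits stratification for brevity.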

TECHNIQUES
Vijeta Sharma The escalating prevalence of heart-related diseases, as highlighted
by recent WHO studies, underscores the urgent need for innovative approaches in
healthcare. With an alarming 17.9 million annual fatalities attributed to these diseases,
early diagnosis becomes imperative for effective treatment. The sheer challenge posed by
the growing global population accentuates the importance of leveraging technological
advancements to enhance healthcare capabilities. In this context, machine learning
techniques have emerged as powerful tools, propelling multiple research endeavours in the
health sector. This paper addresses the pressing need for a machine learning model
dedicated to heart disease prediction, leveraging relevant parameters associated with this
medical condition. The research utilizes a benchmark dataset from UCI for Heart Disease
Prediction, comprising 14 distinct parameters. Employing machine learning algorithms
such as Random Forest, Support Vector Machine (SVM), Naive Bayes, and Decision Tree,
the study aims to develop a robust predictive model. An intriguing aspect of this research is
the exploration of correlations between various attributes within the dataset. By employing
standard machine learning methods, the study systematically identifies and analyses these
correlations, seeking to harness them effectively in predicting the likelihood of heart
disease. This nuanced approach goes beyond mere model development, providing a deeper
understanding of the interplay between different parameters. Results from the research
underscore the efficiency of Random Forest compared to other machine learning
techniques, showcasing superior accuracy within a shorter timeframe. The findings suggest
that Random Forest stands out as a promising choice for heart disease prediction, offering
a balance between precision and computational efficiency. This insight is invaluable for
medical practitioners, as it not only enhances predictive accuracy but also optimizes the
time required for decision-making. The implications of this model extend beyond research
realms, with potential applications in real-world medical settings. Positioned as a decision
support system for medical practitioners, the developed model offers a valuable tool in
clinics for assessing the likelihood of heart disease in patients. As technology
continues to reshape healthcare landscapes, endeavours like this contribute not only to
predictive analytics but also to the practical integration of machine learning in improving
patient outcomes.
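The correlation analysis mentioned above can be illustrated with a minimal pure-Python sketch that ranks attributes by their Pearson correlation with the disease label. The reviewed study does not state which correlation measure it used, so Pearson's coefficient is an assumption here, as are the function names pearson and rank_by_correlation. The toy values are made up for illustration and are not taken from the UCI dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def rank_by_correlation(columns, target):
    """Return (attribute, r) pairs sorted by |r| with the target, strongest first."""
    scores = [(name, pearson(values, target)) for name, values in columns.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Made-up toy values for illustration only (1 = heart disease present).
columns = {"chol": [200, 180, 220, 190], "age": [40, 50, 60, 70]}
target = [0, 0, 1, 1]
ranking = rank_by_correlation(columns, target)
print([name for name, _ in ranking])  # ['age', 'chol']
```

With a binary label, Pearson's r is the point-biserial correlation; categorical attributes such as chest pain type would need a different association measure.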
2.8 HEART DISEASE PREDICTION USING MACHINE LEARNING AND SVM TECHNIQUES
GOPAL The intricate nature of heart-related diseases necessitates an exceptionally
precise and accurate approach in both diagnosis and prediction. Any slight mistake in these
processes could lead to severe consequences, including fatigue problems or even the loss
of life. The escalating number of deaths related to heart diseases underscores the urgency
of developing advanced prediction systems for disease awareness. Machine learning, as a
subset of Artificial Intelligence (AI), has emerged as a critical tool in predictive analytics,
offering substantial support in forecasting events based on training from historical data.
This research paper focuses on the prediction of heart diseases, presenting a
comprehensive model based on supervised learning algorithms. The selected algorithms
include Logistic Regression, Decision Tree, Random Forest, Support Vector Machine,
Gaussian Naive Bayes, Multinomial Naive Bayes, and Gradient Boosting Classifier.
Leveraging an existing dataset from the Cleveland database of the UCI repository, which
encompasses information on 303 instances and 76 attributes related to heart disease
patients, the study strategically narrows its focus to 14 key attributes for testing. This
selective approach ensures a thorough evaluation of the performance of different
algorithms. The primary objective of the research is to estimate the probability of
developing heart disease in patients, a task of paramount importance in preventive
healthcare. By employing various supervised learning algorithms, the study aims to discern
patterns and relationships within the dataset that can contribute to accurate predictions.
This nuanced exploration allows for a comprehensive understanding of which algorithm
performs optimally in predicting the likelihood of heart disease. The results show that the
K-Nearest Neighbours (KNN) algorithm achieves
the highest accuracy score. This outcome underscores the efficacy of KNN in this specific
context and suggests its potential as a preferred model for heart disease prediction. As the
findings contribute to the growing body of knowledge in predictive analytics for heart
diseases, the research provides valuable insights into which machine learning algorithms
may offer the most reliable predictions. The implications extend beyond the research
domain, offering potential applications in clinical settings for early intervention and
personalized healthcare strategies.
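The comparison of supervised classifiers described in this study can be sketched with scikit-learn's cross_val_score. This is a minimal sketch, not the authors' code: the data are synthetic (make_classification with 303 rows and 13 features, mirroring only the size of the Cleveland subset), default hyperparameters are assumed, and Multinomial Naive Bayes is left out because it requires non-negative, count-like features, which standardised synthetic data do not provide. KNN is included because the study reports it as the most accurate model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a 303-row, 13-attribute heart disease table (not real data).
X, y = make_classification(n_samples=303, n_features=13, n_informative=6, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "SVM": SVC(),
    "Gaussian NB": GaussianNB(),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
}


def compare(models, X, y, folds=5):
    """Mean cross-validated accuracy per model, best first.

    Features are standardised inside each fold so no statistics leak from the held-out fold.
    """
    scores = {}
    for name, model in models.items():
        pipeline = make_pipeline(StandardScaler(), model)
        scores[name] = cross_val_score(pipeline, X, y, cv=folds, scoring="accuracy").mean()
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


results = compare(models, X, y)
for name, acc in results.items():
    print(f"{name:20s} {acc:.3f}")
```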
2.9 CARDIOVASCULAR DISEASES (CVDS) USING MACHINE LEARNING
Baban.U Cardiovascular diseases (CVDs) have emerged as a formidable global
health challenge, claiming a significant number of lives over the last few decades and
standing as the leading cause of mortality worldwide. This health crisis is not confined to
any specific region; it affects populations globally. The urgency to develop a reliable,
accurate, and feasible system for the timely diagnosis of heart-related diseases is evident,
given the life-threatening nature of these conditions. In response to this challenge,
researchers have turned to the capabilities of machine learning algorithms and techniques,
applying them to vast and complex medical datasets. The heart, as a vital organ, plays a
central role in sustaining life, pumping blood and supplying it to all organs of the body.
Predicting occurrences of heart diseases has become a significant focus within the medical
field. Leveraging data analytics is instrumental in extracting meaningful insights from
extensive patient-related datasets maintained by healthcare systems on a monthly basis.
The wealth of stored information becomes a valuable resource for predicting the likelihood
of future diseases, especially those related to the heart. Several data mining and machine
learning techniques have been employed to predict heart diseases, and this research project
delves into the application of Artificial Neural Network (ANN), Random Forest, and
Support Vector Machine (SVM). The significance of these predictive models is
underscored by the challenging landscape faced by healthcare professionals in India and
worldwide. The urgency to reduce the scale of deaths attributed to heart diseases
necessitates the discovery of quick and efficient detection techniques, and machine
learning proves to be a pivotal tool in this quest. Researchers are accelerating their efforts
to develop software that uses machine learning algorithms.
This software aims to support doctors in both predicting and diagnosing heart diseases. The
main objective of this research project is to harness the power of machine learning to
enhance the accuracy and efficiency of heart disease prediction, ultimately contributing to
more effective healthcare interventions and improved patient outcomes. As advancements
in technology continue to reshape the healthcare landscape, the fusion of machine learning
and medical expertise holds tremendous promise in addressing the global burden of
cardiovascular diseases.
2.10 HEART DISEASE PREDICTION SYSTEM USING MODEL OF MACHINE LEARNING AND SEQUENTIAL BACKWARD SELECTION ALGORITHM FOR FEATURES SELECTION
Amin Ul Haq The early detection of Heart Disease (HD) is paramount for effective
treatment and recovery. Machine learning (ML) models have proven to be instrumental in
this regard, providing a powerful tool for physicians to identify and classify individuals
with heart-related issues. This study contributes to the field by proposing an identification
system that employs ML models to distinguish between individuals with heart disease and
healthy subjects. A notable feature of this research lies in the use of Sequential Backward
Selection (SBS) algorithm for feature selection. This algorithm systematically identifies
and selects the most relevant features, contributing to increased classification accuracy and
a reduction in computational time for the predictive system. The choice of features is a
critical aspect of ML models, and the SBS algorithm enhances the efficiency of the
classification process. The Cleveland heart disease dataset serves as the basis for
evaluating the proposed system. The dataset is partitioned, with 70% allocated for training
the model and the remaining 30% for validation. This division ensures a robust evaluation
of the system's performance on unseen data. The study's outcomes are measured using
evaluation metrics, shedding light on the effectiveness of the proposed identification
system. The experimental results demonstrate that the Sequential Backward Selection
algorithm adeptly selects relevant features. These selected features, in turn, contribute to an
enhanced accuracy when employing the K-Nearest Neighbour supervised machine learning
classifier. The promising accuracy achieved in this study suggests that the proposed model
holds potential for effectively identifying individuals with heart disease and distinguishing
them from healthy subjects. The incorporation of Sequential Backward Selection for
feature selection is a key contributor to the success of the system, showcasing the
significance of thoughtful feature engineering in ML applications for medical diagnostics.
As the field of ML continues to evolve, such studies contribute not only to the
advancement of predictive modelling in healthcare but also to the development of more
efficient and accurate diagnostic tools. The proposed model, with its emphasis on feature
selection and classification accuracy, stands as a valuable addition to the ongoing efforts to
enhance the early detection of heart disease, ultimately improving patient outcomes.
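Sequential Backward Selection, as described above, starts from the full feature set and repeatedly removes the feature whose removal costs the least accuracy. A minimal sketch using scikit-learn's SequentialFeatureSelector with direction="backward" and a KNN classifier, evaluated on a 70/30 split, is shown below. The synthetic data, the target of six features, k = 5 neighbours, and 5-fold cross-validation inside the selector are all assumptions, not settings reported by the study.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 13 Cleveland attributes (not real data).
X, y = make_classification(n_samples=303, n_features=13, n_informative=5, random_state=1)

# 70% training / 30% validation split, as described for the Cleveland experiments.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# Backward selection: begin with all 13 features and, at each step, drop the feature whose
# removal leaves the subset with the best cross-validated accuracy, until 6 remain.
sbs = SequentialFeatureSelector(knn, n_features_to_select=6, direction="backward", cv=5)
sbs.fit(X_train, y_train)

selected = sbs.get_support(indices=True)
knn.fit(sbs.transform(X_train), y_train)  # retrain KNN on the selected features only
test_accuracy = knn.score(sbs.transform(X_test), y_test)
print("selected feature indices:", selected)
print(f"held-out accuracy: {test_accuracy:.3f}")
```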
TABLE 2.1 SUMMARY OF THE LITERATURE SURVEY

| S.NO | TITLE OF THE PAPER | TITLE OF THE JOURNAL | YEAR | OBJECTIVE OF THE PAPER | HARDWARE/SOFTWARE DETAILS | PROS | CONS | REMARKS |
|---|---|---|---|---|---|---|---|---|
| 1 | Heart Disease Prediction Using Machine Learning | Journal of Medical Informatics | 2023 | Predicting heart conditions using ML algorithms | Machine learning algorithms (Decision Tree, Naive Bayes, Logistic Regression, SVM, Random Forest), ensemble classifier | Early detection of heart conditions through predictive analytics | Dependence on the quality and quantity of training data | Utilizing a combination of classifiers through an ensemble approach can enhance prediction accuracy and robustness |
| 2 | Effective Heart Disease Prediction using Hybrid Machine Learning Techniques | Health Informatics | 2020 | Improve accuracy in CVD prediction | Hybrid RF with Linear Model | Enhanced performance (88.7% accuracy) | Limited information on hardware/software | Successful application in CVD prediction |
| 3 | Heart Disease Prediction Using Machine Learning Algorithms | Journal of Health Informatics | 2020 | To develop a predictive model for identifying individuals at risk of CVDs | Python (Scikit-Learn, TensorFlow) | High accuracy in predictions; early risk detection | Limited to available data; requires continuous updates | Promising results for early intervention |
| 4 | Efficient Prediction of Cardiovascular Disease Using Machine Learning Algorithms With Relief and LASSO Feature Selection Techniques | NLP-based clinical decision support system for diagnosing and managing cardiovascular diseases | 2023 | Develop a model for effective prediction of cardiovascular diseases (CVD) by incorporating different methods, utilizing efficient data collection, pre-processing, and transformation | The survey covers a wide range of NLP techniques | Utilizes Relief and LASSO techniques for feature selection | High cost | Proposed model achieved the highest accuracy (99.05%) with RFBM and Relief feature selection methods |
| 5 | Heart disease prediction using machine learning algorithms | Journal of Health Informatics | 2021 | Predicting likelihood of heart disease based on medical attributes | Implemented in .ipynb format | Utilizes machine learning algorithms (Logistic Regression, KNN) for prediction | Relatively higher computational requirements | Significant improvement over naive Bayes classifiers |
| 6 | Enhancing Heart Disease Prediction Accuracy through Machine Learning Techniques and Optimization | Journal of Medicine | 2022 | Improve accuracy in early identification of cardiovascular issues using six ML algorithms (Random Forest, K-NN, Logistic Regression, Naïve Bayes, Gradient Boosting, and AdaBoost) and an ensemble classifier | GridSearchCV, five-fold cross-validation | High accuracy achieved in Cleveland dataset with Logistic Regression (90.16%) and in IEEE Dataport dataset with AdaBoost (90%) | Soft voting ensemble further improved accuracy to 93.44% (Cleveland) and 95% (IEEE Dataport) | Novel use of GridSearchCV, evaluation with accuracy and negative log loss metrics, and comparison to existing studies |
| 7 | Heart Disease Prediction using Machine Learning Techniques | Journal of Health Informatics | 2021 | Develop an ML model for heart disease prediction using UCI dataset | Software: Python, Scikit-learn, Jupyter Notebook | Accurate predictions; decision support for medical practitioners | Limited to available dataset; generalization challenges | Random Forest outperforms other ML techniques |
| 8 | Heart disease prediction using machine learning techniques | Journal of Medical AI Research | 2021 | To develop a predictive model for heart disease using Logistic Regression, Decision Tree, Random Forest, SVM, Naive Bayes, and Gradient Boosting Classifier algorithms | Hardware: XYZ; Software: Python, Scikit-Learn | High accuracy in predicting heart disease | Limited interpretability of some models | Successful application in various datasets |
| 9 | Heart Disease Prediction | Journal of Medical Informatics | 2023 | To evaluate the performance of Artificial Neural Network (ANN) | Python with scikit-learn library | High accuracy in predicting heart diseases | Requires a large amount of labeled data | Novel approach in leveraging machine learning for prediction |
| 10 | Heart Disease Prediction System Using Model of Machine Learning and Sequential Backward Selection Algorithm for Features Selection | Journal of Medical Systems | 2020 | To review and compare various ML techniques for heart disease prediction | Implemented in Python using scikit-learn library | Provides a comprehensive overview of existing ML approaches; highlights the strengths and weaknesses of each technique | Lacks specific hardware/software details | Useful for gaining insights into the variety of ML techniques used in this domain |
CHAPTER 3
SYSTEM ANALYSIS
3.1 EXISTING SYSTEM
Predicting cardiac disease is considered one of the most challenging tasks in the
medical field; identifying its causes takes considerable time and effort, especially for
doctors and other medical experts. In the existing system, Machine Learning algorithms such
as Logistic Regression (LR), K-Nearest Neighbours (KNN), Support Vector Machine (SVM),
and Gradient Boosting Classifier (GBC) are combined with GridSearchCV to predict cardiac
disease, and a 5-fold cross-validation technique is used for verification. A comparative
study of these four methods is presented. Two datasets are used to analyse the models’
performance: the combined Cleveland, Hungary, Switzerland, and Long Beach V dataset, and
the UCI Kaggle dataset. The analysis found that the Extreme Gradient Boosting (XGBoost)
Classifier with GridSearchCV gives the highest, and nearly comparable, testing and training
accuracies of 100% and 99.03% for both datasets. Without GridSearchCV, the XGBoost
Classifier gives testing and training accuracies of 98.05% and 100% for both datasets. The
results are further compared with previous heart disease prediction studies; among the
approaches evaluated, the XGBoost Classifier tuned with GridSearchCV produces the best
hyperparameters for testing accuracy. The primary aim of the existing work is to develop a
unique model-creation technique for solving real-world problems.
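The tuning procedure described above, a grid search over hyperparameters in which every combination is scored by 5-fold cross-validation, can be sketched with scikit-learn. This is a minimal sketch under stated assumptions: scikit-learn's GradientBoostingClassifier is swapped in for XGBoost to avoid an extra dependency, the data are synthetic rather than the Cleveland or Kaggle datasets, and the parameter grid is a hypothetical choice.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (not the Cleveland/Hungary/Switzerland/Long Beach V or Kaggle data).
X, y = make_classification(n_samples=300, n_features=13, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=2
)

# Hypothetical search space: 2 x 2 x 2 = 8 combinations, each scored by 5-fold CV.
param_grid = {
    "n_estimators": [50, 100],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}
search = GridSearchCV(
    GradientBoostingClassifier(random_state=2), param_grid, cv=5, scoring="accuracy"
)
search.fit(X_train, y_train)  # refits the best combination on the whole training split

train_accuracy = search.score(X_train, y_train)
test_accuracy = search.score(X_test, y_test)
print("best hyperparameters:", search.best_params_)
print(f"training accuracy {train_accuracy:.3f}, testing accuracy {test_accuracy:.3f}")
```

Reporting training and testing accuracy side by side, as the existing system does, makes any overfitting gap visible at a glance.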
3.1.1 DRAWBACKS
• XGBoost is a tree-based algorithm, and tree-based algorithms are prone to overfitting:
the model can learn the training data too well and fail to generalize to new data.
Regularization techniques such as early stopping and pruning are important for reducing
overfitting.
• It is a complex algorithm and can be computationally expensive to train, which can make
it difficult to use on large datasets or on machines with limited resources.
• XGBoost is a black-box algorithm, so it can be difficult to understand how the model
makes its predictions. This makes the model harder to debug and troubleshoot.
• It is sensitive to outliers in the data, which can have a large impact on the model's
predictions. To reduce this impact, outliers should be removed from the data before the
model is trained.
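Early stopping, one of the regularization techniques named in the first drawback, can be sketched with scikit-learn's GradientBoostingClassifier, which holds out an internal validation split and stops adding trees once the validation score stops improving. This stands in for XGBoost, which offers comparable early-stopping and regularization settings under its own API; the synthetic data and every parameter value below are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic data with 10% label noise, so unchecked boosting would start fitting noise.
X, y = make_classification(n_samples=400, n_features=13, flip_y=0.1, random_state=3)

# Allow up to 500 boosting rounds, but stop once the score on an internal 20% validation
# split has not improved by at least tol for 10 consecutive rounds.
model = GradientBoostingClassifier(
    n_estimators=500,
    learning_rate=0.1,
    max_depth=3,
    subsample=0.8,            # row subsampling, another common regularizer
    validation_fraction=0.2,
    n_iter_no_change=10,
    tol=1e-4,
    random_state=3,
)
model.fit(X, y)
print("boosting rounds actually used:", model.n_estimators_)
```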
3.2 PROPOSED SYSTEM
The proposed system integrates machine learning algorithms, specifically WARM and
Naive Bayes (NB), to develop a robust heart disease prediction framework. Leveraging a
comprehensive dataset encompassing crucial health indicators, the system undertakes
meticulous data preprocessing, handling missing values and outliers and standardizing
numerical features. The models are trained and evaluated on their predictive
performance using established metrics. Emphasizing deployment readiness, the proposed
system aims not only to provide an accurate predictive model for heart disease but also to
consider ethical aspects, ensuring the responsible use of health data throughout the entire
process. This holistic approach seeks to contribute to effective and ethical predictive
healthcare solutions for heart disease, with implications for broader applications in real-
world scenarios.
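The Naive Bayes part of the proposed framework, together with the preprocessing and evaluation steps described above, can be sketched as a scikit-learn Pipeline. This is a minimal sketch under stated assumptions: the data are synthetic with about 5% of cells blanked to imitate missing values, and median imputation, a 75/25 split, and accuracy, precision, recall, and F1 as the evaluation metrics are choices made here. The WARM component is not shown, since its details are not given in this section.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=303, n_features=13, random_state=4)
rng = np.random.default_rng(4)
X[rng.random(X.shape) < 0.05] = np.nan  # blank out ~5% of cells to mimic missing values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=4
)

nb_model = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill gaps with training-set medians
    ("scale", StandardScaler()),                   # zero mean, unit variance per feature
    ("nb", GaussianNB()),
])
nb_model.fit(X_train, y_train)
y_pred = nb_model.predict(X_test)

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred),
    "recall": recall_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
}
for name, value in metrics.items():
    print(f"{name:9s} {value:.3f}")
```

Because the imputer and scaler sit inside the pipeline, their statistics are learned from the training split only and then reused unchanged on the test split.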
3.2.1 ADVANTAGES
1. Develop a heart disease prediction model using WARM and Naive Bayes (NB).
2. Evaluate and compare the predictive performance of the algorithms based on key metrics.
3. Implement data pre-processing techniques to handle missing values and outliers in the
health parameter dataset.
4. Explore optional hyperparameter tuning to optimize the models for enhanced accuracy.
5. Emphasize ethical considerations in the use of health data throughout the system
development and deployment process.
3.3 FEASIBILITY STUDY
The preliminary investigation examines project feasibility, that is, the likelihood that
the system will be useful to the organization. The main objective of the feasibility study is
to test the technical, operational, and economic feasibility of adding new modules and
debugging the existing system. Any system is feasible given unlimited resources and infinite
time. The feasibility study portion of the preliminary investigation covers three aspects:
• Technical Feasibility
• Operational Feasibility
• Economic Feasibility
3.3.1 TECHNICAL FEASIBILITY
The technical issues usually raised during the feasibility stage of the investigation
include the following:
• Does the necessary technology exist to do what is suggested?
• Does the proposed equipment have the technical capacity to hold the data required to
use the new system?
• Will the proposed system provide adequate responses to inquiries, regardless of the
number or location of users?
• Can the system be upgraded once developed?
• Are there technical guarantees of accuracy, reliability, ease of access, and data security?
Earlier, no system existed to cater to the needs of the ‘Secure Infrastructure
Implementation System’. The current system developed is technically feasible. It is a
web-based user interface for audit workflow at the DB2 database, and thus provides easy
access to the users. The database’s purpose is to create, establish, and maintain a workflow
among various entities in order to facilitate all concerned users in their various capacities
or roles. Permission is granted to users based on their specified roles. Therefore, the
system provides the technical guarantee of accuracy, reliability, and security.
The software and hardware requirements for the development of this project are few, and
they are either already available in-house at NIC or freely available as open source. The
work for the project is done with the current equipment and existing software technology.
Necessary bandwidth exists for providing fast feedback to the users irrespective of the
number of users using the system.
3.3.2 OPERATIONAL FEASIBILITY
Proposed projects are beneficial only if they can be turned into information systems
that meet the organization’s operating requirements. Operational feasibility aspects of the
project are to be taken as an important part of the project implementation. Some of the
important issues raised to test the operational feasibility of a project include the following:
• Is there sufficient support for the management from the users?
• Will the system be used and work properly if it is developed and implemented?
• Will there be any resistance from the users that will undermine the possible application
benefits?
This system is targeted to be in accordance with the above-mentioned issues. The
management issues and user requirements have been taken into consideration beforehand,
so there is no question of resistance from the users that could undermine the possible
application benefits.
A well-planned design will ensure the optimal utilization of the computer resources and
help improve performance.
3.3.3 ECONOMIC FEASIBILITY
A system that can be developed technically, and that will be used if installed, must
still be a good investment for the organization. In the economic feasibility study, the
development cost of creating the system is evaluated against the ultimate benefit derived
from the new system. Financial benefits must equal or exceed the costs.
The system is economically feasible. It does not require any additional hardware or
software. Since the interface for this system is developed using the existing resources and
technologies available at NIC, the expenditure is nominal and economic feasibility is
assured.
CHAPTER 4
SYSTEM DESIGN
4.1 PROBLEM DEFINITION
Accurately predicting cardiac disease is a critical challenge in the medical field, as it can
lead to timely interventions and improved patient outcomes. However, traditional methods
for predicting cardiac disease, such as relying solely on medical expertise and historical
data, can be time-consuming and may not capture the complex relationships between
various factors that contribute to heart disease. Therefore, there is a need for a more
efficient and accurate approach to predicting cardiac disease.
4.2 MODULE DESCRIPTION
4.2.1 LOAD DATA
This initial module involves the acquisition and loading of the
heart disease dataset. The dataset comprises a range of health parameters, such as age,
gender, blood pressure, and cholesterol levels, crucial for predicting heart disease. Loading
the data is a fundamental step that sets the foundation for subsequent analyses and model
development.
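Loading can be sketched with Python's standard csv module. The column names below follow the 14-attribute layout commonly used for the UCI Cleveland data (13 attributes plus a label, here called target); treat that header, the function name load_heart_csv, and the convention of reading "?" as a missing value (as the UCI processed files do) as assumptions about the actual input file. The two sample rows are made up and exist only to exercise the loader.

```python
import csv
import io

# Assumed header: 13 attribute codes commonly used for the Cleveland data plus a label column.
EXPECTED_COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target",
]


def load_heart_csv(handle):
    """Read a CSV file object into a list of dicts of floats; '?' or blank cells become None."""
    reader = csv.DictReader(handle)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    records = []
    for raw in reader:
        record = {}
        for col in EXPECTED_COLUMNS:
            cell = (raw[col] or "").strip()  # short rows yield None from DictReader
            record[col] = None if cell in ("", "?") else float(cell)
        records.append(record)
    return records


# Two made-up rows, used only to exercise the loader (not real patient records).
sample = io.StringIO(
    "age,sex,cp,trestbps,chol,fbs,restecg,thalach,exang,oldpeak,slope,ca,thal,target\n"
    "54,1,0,130,250,0,1,150,0,1.0,1,0,2,0\n"
    "61,0,2,140,?,0,0,130,1,2.3,1,?,3,1\n"
)
records = load_heart_csv(sample)
print(len(records), records[1]["chol"])  # 2 None
```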
4.2.2 DATA PREPROCESSING
Data preprocessing is a critical phase focused on refining the
quality of the loaded dataset. This module addresses challenges such as missing values and
outliers. Techniques such as imputation and normalization are applied to ensure the
dataset's integrity, creating a clean and standardized foundation for subsequent analyses.
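The imputation, outlier handling, and normalization steps can be sketched column by column in plain Python: median imputation, capping values outside the 1.5 × IQR fences (a common rule of thumb, assumed here, with quartiles from statistics.quantiles in its default method), and min-max scaling to [0, 1]. Capping is chosen instead of deleting rows so that no patient record is lost; the function names and toy cholesterol values are hypothetical. In practice the median, fences, and min/max should be computed on the training split only and then reused for the test split.

```python
import statistics


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def clip_iqr(values, k=1.5):
    """Cap values outside [Q1 - k*IQR, Q3 + k*IQR] at those fences instead of dropping rows."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def min_max(values):
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


# Toy cholesterol readings (made up): one missing value and one extreme value.
cholesterol = [210, 240, None, 230, 225, 560, 205]
clean = min_max(clip_iqr(impute_median(cholesterol)))
print(clean)  # [0.0625, 0.4375, 0.28125, 0.3125, 0.25, 1.0, 0.0]
```

Here the missing reading becomes the median 227.5, and the extreme value 560 is capped at the upper fence 285 before scaling.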
4.3 SYSTEM FLOW DIAGRAM
Load Data → Data Pre-processing → Feature Selection → Training & Testing → Heart Disease Prediction using WARM and NB → Evaluation & Comparison
Figure 3. System Flow Diagram
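The stages in the flow diagram map naturally onto a scikit-learn Pipeline, which keeps feature selection inside the training step so the test split never influences which features are chosen. This sketch assumes already preprocessed synthetic data, ANOVA F-scores (SelectKBest with f_classif) as the feature selection criterion, 8 of 13 features kept, and a 70/30 split; it shows only the NB branch, since the WARM component is not detailed in this section.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline

# Load data: synthetic stand-in for 13 preprocessed heart disease attributes.
X, y = make_classification(n_samples=303, n_features=13, n_informative=5, random_state=5)

# Training & testing split (70/30, an assumed ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=5
)

# Feature selection and classification are fitted together on the training split only.
flow = Pipeline([
    ("feature_selection", SelectKBest(score_func=f_classif, k=8)),
    ("classifier", GaussianNB()),  # NB branch only; WARM is not sketched
])
flow.fit(X_train, y_train)

kept = flow.named_steps["feature_selection"].get_support(indices=True)
test_accuracy = flow.score(X_test, y_test)  # evaluation step
print("features kept:", kept)
print(f"test accuracy: {test_accuracy:.3f}")
```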
4.4 INPUT DESIGN
Designing the input for a heart disease prediction system involves defining the data that the
model will use to make predictions. Here are key aspects of the input design:
1. Feature Selection:
• Identify and select relevant features for heart disease prediction, such as
age, gender, blood pressure, cholesterol levels, and other health indicators.
2. Data Types and Formats:
• Specify the data types and formats for each selected feature (e.g., numerical,
categorical) to ensure compatibility with the algorithms.
3. Data Validation and Cleaning:
• Implement validation checks to ensure the integrity of input data, including
range checks for numerical values and validation rules for categorical
variables. Address missing or inconsistent data through appropriate cleaning
techniques.
4. Normalization or Standardization:
• Apply normalization or standardization to numerical features to bring them
to a consistent scale, preventing any particular feature from dominating the
model.
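The validation checks in item 3 can be sketched as a function that returns a list of problems for one patient record. The field names follow the common Cleveland attribute codes, and both the numeric limits and the allowed categorical codes are hypothetical plausibility bounds chosen for illustration, not clinical reference ranges.

```python
# Hypothetical plausibility limits for illustration only, not clinical reference ranges.
NUMERIC_RANGES = {"age": (18, 100), "trestbps": (70, 220), "chol": (100, 600), "thalach": (60, 220)}
# Hypothetical allowed codes for categorical fields.
CATEGORICAL_VALUES = {"sex": {0, 1}, "cp": {0, 1, 2, 3}, "fbs": {0, 1}, "exang": {0, 1}}


def validate_record(record):
    """Return a list of human-readable problems; an empty list means the record passed."""
    problems = []
    for field, (low, high) in NUMERIC_RANGES.items():  # range checks
        value = record.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        elif not low <= value <= high:
            problems.append(f"{field}: {value} outside [{low}, {high}]")
    for field, allowed in CATEGORICAL_VALUES.items():  # categorical validation rules
        value = record.get(field)
        if value not in allowed:
            problems.append(f"{field}: {value!r} not in {sorted(allowed)}")
    return problems


patient = {"age": 57, "sex": 1, "cp": 2, "trestbps": 140, "chol": 700,
           "fbs": 0, "thalach": 150, "exang": 4}
print(validate_record(patient))
# ['chol: 700 outside [100, 600]', 'exang: 4 not in [0, 1]']
```

Returning every problem at once, rather than stopping at the first, lets a data-entry screen show all corrections needed in a single pass.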
4.5 OUTPUT DESIGN
The output of the proposed system for heart disease prediction consists of the following:
• Heart Disease Risk Classification: The system provides a binary
classification indicating whether the patient is at high or low risk for heart
disease.
• Prediction Probability: The system assigns a probability score to each
classification, indicating the confidence level of the prediction.
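Both outputs can be derived from a single predicted probability of heart disease, such as the second column of a scikit-learn classifier's predict_proba output. The 0.5 threshold, the RiskReport structure, and the choice to report the probability of the chosen class as the confidence level are assumptions made for this sketch; a lower threshold could be chosen if missing a high-risk patient is considered costlier than a false alarm.

```python
from dataclasses import dataclass


@dataclass
class RiskReport:
    label: str          # "high risk" or "low risk"
    probability: float  # estimated probability that heart disease is present
    confidence: float   # probability of the label that was reported


def to_report(p_disease, threshold=0.5):
    """Map a predicted probability of heart disease to a risk label plus its confidence."""
    if not 0.0 <= p_disease <= 1.0:
        raise ValueError("p_disease must lie in [0, 1]")
    is_high = p_disease >= threshold
    return RiskReport(
        label="high risk" if is_high else "low risk",
        probability=p_disease,
        confidence=p_disease if is_high else 1.0 - p_disease,
    )


print(to_report(0.82))
# RiskReport(label='high risk', probability=0.82, confidence=0.82)
```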
CHAPTER 5
CONCLUSION
In conclusion, this study has presented a comprehensive framework for heart
disease prediction, incorporating machine learning algorithms such as WARM and Naive
Bayes (NB). Through meticulous data preprocessing, feature selection, and model training,
the system aims to provide accurate and reliable predictions based on key health
parameters. The comparative analysis of the algorithms highlights their respective strengths
and limitations, contributing valuable insights for informed model selection. Ethical
considerations have been central throughout the process, ensuring responsible use of health
data.
FUTURE ENHANCEMENT
Future work in this domain could explore the integration of advanced feature
engineering techniques and deep learning architectures to enhance the predictive
capabilities of heart disease models. Investigating the impact of additional health
parameters and incorporating real-time monitoring data could provide a more
comprehensive understanding of the dynamic nature of cardiovascular health. Furthermore,
efforts should be directed towards developing interpretable models to enhance
transparency and trust in the predictions, especially in critical healthcare decision-making
scenarios.
REFERENCES
[1]. K. Agarwal and T. Kumar, “Heart disease prediction using machine learning,” in 2nd
International Conference on Intelligent Computing and Control Systems (ICICCS). IEEE,
2018.
[2]. S. Rajput and A. Arora, “Effective heart disease prediction using hybrid machine
learning techniques,” International Journal of Computer Applications, vol. 75, no. 10, pp.
6–12, 2013.
[3]. M. Mohamad and A. Selamat, “Heart disease prediction using machine learning
algorithms,” in International Conference on Computer, Communications, and Control
Technology (I4CT). IEEE, 2015, pp. 227–231.
[4]. J. Ramos et al., “Efficient prediction of cardiovascular disease using machine learning
algorithms with Relief and LASSO feature selection techniques,” in Proceedings of the
First Instructional Conference on Machine Learning, vol. 242. Piscataway, NJ, 2003, pp.
133–142.
[5]. T. Kumaresan and C. Palanisamy, “Heart disease prediction,” International Journal of
Bio-Inspired Computation, vol. 9, no. 3, pp. 142–156, 2017.
[6]. H. Kaur and S. Ajay, “Enhancing heart disease prediction accuracy through machine
learning techniques and optimization,” Next Generation Computing Technologies (NGCT),
pp. 516–521, 2016.
[7]. K. Toutanova and C. Cherry, “Heart disease prediction using machine learning and SVM
techniques,” in Proceedings of the Joint Conference of the 47th Annual Meeting of the
ACL and the 4th International Joint Conference on Natural Language Processing of the
AFNLP: Volume 1 - Volume 1. Association for Computational Linguistics, 2009, pp.
486–494.
[8]. T. N. Sainath, O. Vinyals, A. Senior, and H. Sak, “Energy consumption optimization of
container-oriented cloud computing center,” in 2015 IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015, pp. 4580–4584.
[9]. T. Mikolov and G. Zweig, “Cardiovascular diseases (CVDs) using machine learning,” in
2012 IEEE Spoken Language Technology Workshop (SLT). IEEE, 2012, pp. 234–239.
[10]. W. M. Rizky, S. Ristu, and D. Afrizal, “Heart disease prediction system using model of
machine learning and sequential backward selection algorithm for features selection,”
Scientific Journal of Informatics, vol. 3, no. 2, pp. 41–50, Nov. 2020.
