
Prediction of Heart Disease Using Machine Learning

Course Number: Course Name

Module tutor: Shelagh Keogh

Assignment Title

Student Name: University Identifier

Institutional Affiliation

Program / Course of Study

Supervisor’s Name

Second Marker’s Name

Draft Outline Confirmation

Word Count
1. Table of Contents

2. Aim
3. Background Information, Motivation, Relevance and Literature Review
3.1. Background information
3.2. Motivation
3.3. Literature Review
3.3.1. Unsupervised Methods
3.3.2. Supervised Methods
4. Sources and Use of Knowledge
4.1. Journal
4.2. Specification for publication in Informatics in Medicine Unlocked
4.2.1. The Layout of the Page
4.2.2. Page Style
4.2.3. Reference format
4.2.4. The relevance of Authors and Journals
5. Scope, Objective, and Risk
5.1. Scope
5.2. Objective
5.2.1. A critical review of existing literature
5.2.2. Machine learning techniques
5.2.3. Classification
5.3. Risk Management Plan
6. Security, Ethical, Legal, Social and Professional Issues
6.1. Ethical Issues
6.2. Legal Issues
6.3. Social Issues
6.4. Security Issues
6.5. Professional Issues
7. Schedule of Activities
7.1. Work Breakdown Structure (WBS)
7.2. Task List
8. Reference

2. Aim

Machine learning has for some time been regarded as a significant tool in the healthcare industry for predicting heart disease and other health disorders. This study aims to investigate the prediction of heart disease using machine learning.

3. Background Information, Motivation, Relevance and Literature Review

3.1. Background information

Machine learning is the study of statistical models and algorithms that computer systems use to perform tasks by applying Artificial Intelligence techniques to learn from data without human intervention. As such, a machine learning system can adjust its actions as it learns. Machine learning has made substantial contributions to the healthcare industry, the first being the development of Clinical Decision Support Systems (CDSS). According to Eckhard (2018), a CDSS is health information technology designed to provide doctors and other medical personnel with knowledge that informs clinical decisions by linking health observations with health information.

For instance, machine learning has substantially improved prescribing practices and helped reduce medical errors. It also offers opportunities for quality improvement in areas such as data presentation, documentation, order sets, prescription facilitators, and care pathways. A key advantage of such a system is that end-users can evaluate it by completing the CDS user feedback log (Eckhard, 2018). Operational factors, such as the impact on workflow and efficiency in terms of turnaround time, are also taken into account. Patient outcomes can be evaluated too, especially for manageable conditions that can readily be assessed quantitatively.

Secondly, IBM Watson Genomics is a distinct example: it supports the early detection of cancer by integrating genome-based tumor sequencing with intelligent computing, and therapeutic treatments for oncology have been identified with its help. Microsoft, on the other hand, has developed diagnostic tools for image analysis, a significant step in medical imaging (FWS, 2016). ML has also improved differential reinforcement, a behavioral concept that encourages desired responses while extinguishing unwanted ones. It is implemented through an application that identifies the body expressions a person makes each day, allowing the individual to understand unconscious behavior and make essential changes.

Crowdsourced data collection is a research approach that gives researchers and medical practitioners access to information that other people have uploaded with their consent. It has resulted in improved medication and diagnosis. According to FWS (2016), AI can predict epidemics by simulating global outbreaks from an assortment of real-time social media updates, information collected from websites, and satellite data. Such predictions are made possible by tools such as artificial neural networks. This information helps developing countries that lack the proper infrastructure to prepare for such epidemics.

3.2. Motivation

With recent changes in lifestyle and eating habits, heart diseases and disorders have been on the rise across all age groups in society. According to WHO (2017), an estimated one-third of global deaths are due to heart-related complications, making them the leading cause of death. The majority of these deaths occur in developing economies. Notably, such conditions can result from health risk factors such as obesity, physical inactivity, smoking, poor eating habits, and excessive alcohol use.

It is important to note that, globally, men are more prone to heart disease than women. With large databases of heart disease cases and the resulting morbidity and mortality now available, heart disease has become easier to predict. Its prediction is regarded as a milestone in mitigating and preventing new cases of CVDs globally. Of particular importance, Coronary Heart Disease (CHD) accounts for most heart disease-related deaths in the UK, averaging 435 deaths daily (Griffin, 2018). Additionally, two-thirds of CVD deaths result from strokes and heart attacks. For instance, two hundred thousand hospital visits in Britain are due to heart attacks, with men accounting for 60% of them. Individuals with CVDs, as well as people at risk of CVDs because of diabetes or hypertension, require early screening and management to live a normal life. The high mortality is due to risks being identified late, or not being identified at all.

Classification is an AI method used in clinical data science to make predictions with significant accuracy (Nichenametla et al., 2018). Data mining helps to overcome the difficulty of predicting heart disease that arises from causative conditions such as diabetes, high cholesterol, and abnormal pulse rate. This study will use ML to establish whether a person has a CVD, using the Cleveland Heart Disease dataset.
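The Cleveland dataset is commonly distributed as a comma-separated file. The sketch below shows one way it might be prepared for classification, assuming the common 14-column UCI layout (13 attributes followed by the diagnosis `num`, with `?` marking missing values). The helper names `parse_row` and `train_test_split` are hypothetical, and collapsing `num` values 1-4 into a single "disease present" class is an assumption that matches this study's yes/no question.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Assumed column order of the UCI Cleveland heart disease file.
CLEVELAND_COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "num",
]


def parse_row(line: str) -> Optional[Tuple[List[float], int]]:
    """Turn one CSV line into (13 features, binary label), or None if unusable."""
    values = line.strip().split(",")
    if len(values) != len(CLEVELAND_COLUMNS) or "?" in values:
        return None  # skip malformed rows and rows with missing values
    *features, num = (float(v) for v in values)
    # num = 0 means no disease; values 1-4 are collapsed into "disease present".
    return features, 1 if num > 0 else 0


def train_test_split(rows: Sequence[T], test_fraction: float = 0.2,
                     seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the rows and hold out a fraction of them for testing."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```

Rows for which `parse_row` returns `None` would be dropped before splitting. On a dataset of this size, a stratified split (keeping the disease/no-disease ratio equal in both parts) may be preferable to the plain shuffle shown here.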

3.3. Literature Review

For a long time, machine learning has been used in many fields to enhance efficiency and effectiveness. In the healthcare industry, it is used to predict and establish the presence of locomotive disorders as well as heart diseases. Bhanot (2019) describes the use of algorithms such as Support Vector Machines (SVM), the Naïve Bayes Classifier, Neural Networks, and the Decision Tree Classifier for this purpose. Prior knowledge of such conditions can significantly help physicians make their diagnoses more insight-based. SVM has achieved a high accuracy of 92.1%, with Artificial Neural Networks recording an accuracy of 91% (Latha & Jeeva, 2019). The Decision Tree Classifier has a relatively lower accuracy of 89.6%.
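The figures above are accuracy values, that is, the share of all cases classified correctly. For a screening task it is often also worth reporting sensitivity (the share of diseased patients detected) and specificity (the share of healthy patients correctly cleared), because a missed case can be costlier than a false alarm. These two extra metrics are added here for illustration and are not the figures reported by the cited studies; the sketch assumes labels coded as 1 = disease and 0 = no disease, and the function name `binary_metrics` is hypothetical.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity for labels coded 1 (disease) / 0 (none)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # diseased and flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy and cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarm
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed case

    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0  # avoid division by zero

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


print(binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
# → {'accuracy': 0.6, 'sensitivity': 0.6666666666666666, 'specificity': 0.5}
```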

3.3.1. Unsupervised Methods

Support Vector Machines (SVM) have been widely used for classification problems; strictly speaking, SVM is a supervised method, since it learns from labeled examples. Essentially, the method uses a margin-maximization scheme whose solution is obtained by transforming the task into a quadratic programming problem. It separates the classes using a hyperplane (Peachap & Tchiotsop, 2019).
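The margin idea can be illustrated without a quadratic-programming solver. The sketch below swaps in a different technique from the one described above: a Pegasos-style stochastic subgradient method that minimizes the regularized hinge loss of a linear SVM directly. It assumes labels coded as -1/+1; the function names, the regularization strength `lam`, and the epoch count are illustrative assumptions, not tuned values.

```python
import random
from typing import List, Sequence


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 200, seed: int = 0) -> List[float]:
    """Pegasos-style subgradient descent on the L2-regularized hinge loss."""
    if any(label not in (-1, 1) for label in y):
        raise ValueError("labels must be -1 or +1")
    # Append a constant 1.0 to every sample so the last weight acts as the bias.
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    rng = random.Random(seed)
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # step size shrinks over time
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularizer: shrink weights
            if margin < 1.0:  # inside the margin or misclassified: push the hyperplane
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict_svm(w: Sequence[float], x: Sequence[float]) -> int:
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict_svm(w, x) for x in X])
```

Non-linear SVMs replace the dot product with a kernel function; in practice a library implementation would normally be used rather than this toy solver.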

According to Latha and Jeeva (2019), health risk factors such as obesity, smoking, physical inactivity, poor eating habits, and excessive alcohol use cause CVDs. In the UK, Coronary Heart Disease accounts for most deaths related to heart disease, averaging 435 deaths daily (Griffin, 2018), and two-thirds of CVD deaths result from strokes and heart attacks. For instance, two hundred thousand hospital visits in Britain are due to heart attacks, with men accounting for 60% of them.

According to WHO (2017), an estimated one-third of global deaths are due to heart-related complications, making them the leading cause of death. This amounts to an estimated 17 million people, with high prevalence on the Asian continent. The majority of these deaths occur in developing economies. Notably, such conditions can result from health risk factors such as smoking, obesity, physical inactivity, poor eating habits, and excessive alcohol use. It is important to note that, globally, men are more prone to heart disease than women.

3.3.2. Supervised Methods

Notably, existing literature has made extensive use of ensemble classifier methods to improve the accuracy of heart disease prediction. Combining neural networks with genetic algorithms, supported by fuzzy logic, increased accuracy by 6% to 99.97% (Bhanot, 2019). Additionally, supervised genetic algorithms based on a fuzzy-logic neural network diagnosed the likelihood of heart disease with an accuracy of 97.78%. However, changing the dataset was observed to reduce the accuracy of CVD risk prediction by 5%, to 93%, when a rough set-based classification was used.
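Ensemble classifiers combine the decisions of several base models. A minimal sketch of one common combination rule, (weighted) majority voting, is shown below. It assumes each base classifier is simply a function from a feature vector to a class label; the function name `ensemble_predict` is hypothetical, and the constant "classifiers" in the example are placeholders rather than the neural-network or genetic-algorithm models cited above.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence

Classifier = Callable[[Sequence[float]], int]


def ensemble_predict(classifiers: List[Classifier], x: Sequence[float],
                     weights: Optional[List[float]] = None) -> int:
    """Return the label with the largest (weighted) number of votes."""
    if not classifiers:
        raise ValueError("need at least one classifier")
    if weights is None:
        weights = [1.0] * len(classifiers)  # plain majority vote
    if len(weights) != len(classifiers):
        raise ValueError("one weight per classifier is required")
    votes: Counter = Counter()
    for clf, weight in zip(classifiers, weights):
        votes[clf(x)] += weight
    # On a tie, Counter.most_common keeps the label that received a vote first.
    return votes.most_common(1)[0][0]


def always_yes(x: Sequence[float]) -> int:  # placeholder base model
    return 1


def always_no(x: Sequence[float]) -> int:  # placeholder base model
    return 0


print(ensemble_predict([always_yes, always_yes, always_no], [0.0]))  # → 1
print(ensemble_predict([always_yes, always_yes, always_no], [0.0], [1.0, 1.0, 3.0]))  # → 0
```

Weights would usually reflect each base model's validation accuracy, so that stronger models carry more influence.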

According to Latha and Jeeva (2019), neural networks have previously been used to eliminate medical errors in the diagnosis of blood sugar levels. A design named the Coactive Neuro-Fuzzy Inference System (CANFIS), combined with genetic algorithms and neural networks, showed positive results in diagnosing CVDs. In this design, a genetic algorithm was purposely used to select a suitable feature set and to tune the CANFIS parameters automatically. The model was observed to provide doctors and other medical personnel with informative medical evidence for the diagnosis of heart disease.
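The genetic-algorithm step can be sketched as a search over binary feature masks (1 = keep the feature). The sketch below is a generic GA with tournament selection, single-point crossover, bit-flip mutation, and elitism; it is not the CANFIS procedure itself. The function name `ga_select_features` and the `fitness` function are hypothetical: in a real study the fitness would typically be the validation accuracy of a classifier trained on the selected features, whereas the example simply rewards agreement with a made-up target mask.

```python
import random
from typing import Callable, List, Tuple

Mask = List[int]


def ga_select_features(n_features: int, fitness: Callable[[Mask], float],
                       pop_size: int = 20, generations: int = 30,
                       mutation_rate: float = 0.05,
                       seed: int = 0) -> Tuple[Mask, float, List[float]]:
    """Evolve binary feature masks; return (best mask, its fitness, best-so-far history)."""
    if n_features < 2 or pop_size < 2:
        raise ValueError("need at least two features and two individuals")
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]

    def tournament() -> Mask:
        a, b = rng.sample(population, 2)  # pick two at random, keep the fitter
        return a if fitness(a) >= fitness(b) else b

    best = max(population, key=fitness)[:]
    best_score = fitness(best)
    history = [best_score]
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask so far always survives
        while len(children) < pop_size:
            parent1, parent2 = tournament(), tournament()
            cut = rng.randint(1, n_features - 1)  # single-point crossover
            child = parent1[:cut] + parent2[cut:]
            # Bit-flip mutation: each gene flips with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
        gen_best = max(population, key=fitness)
        if fitness(gen_best) > best_score:
            best, best_score = gen_best[:], fitness(gen_best)
        history.append(best_score)
    return best, best_score, history


target = [1, 0, 1, 1, 0, 0, 1, 0]  # made-up "ideal" mask, for the demo only


def toy_fitness(mask: Mask) -> float:
    return sum(1 for g, t in zip(mask, target) if g == t)


best, score, history = ga_select_features(len(target), toy_fitness)
print(best, score)
```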

Naïve Bayes is an algorithm based on conditional probability (Bayes' theorem) that uses a mathematical model to determine the category of a new feature vector. It is mainly used to categorize text-based data (Haq et al., 2018). The Decision Tree Classifier, on the other hand, is an algorithm represented as a tree structure in which each node stands for a decision or a connection to further nodes. Nodes are classified as internal or external: internal nodes form the decision-making part, testing a feature and passing the record on to one of their child nodes, while external (leaf) nodes hold the final classification.
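Both ideas can be made concrete with small from-scratch sketches. The first pair of functions is a Gaussian Naïve Bayes classifier, which assumes (as an illustrative choice) that each numeric feature is normally distributed within a class and independent of the other features given the class. The second finds the single best threshold split by Gini impurity, that is, the decision made at one internal node of a decision tree. All names and the toy data are hypothetical; a full tree would apply `best_split` recursively to each child node.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

NBModel = Dict[int, Tuple[float, List[float], List[float]]]


def fit_gaussian_nb(X: Sequence[Sequence[float]], y: Sequence[int]) -> NBModel:
    """Per class: prior P(c) plus the mean and variance of every feature."""
    by_class: Dict[int, List[Sequence[float]]] = defaultdict(list)
    for x, label in zip(X, y):
        by_class[label].append(x)
    model: NBModel = {}
    for label, rows in by_class.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # A tiny constant keeps the variance positive for constant features.
        variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9
                     for c, m in zip(cols, means)]
        model[label] = (len(rows) / len(X), means, variances)
    return model


def predict_gaussian_nb(model: NBModel, x: Sequence[float]) -> int:
    """Pick the class with the highest log prior + sum of log likelihoods."""
    def log_posterior(label: int) -> float:
        prior, means, variances = model[label]
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


def gini(labels: Sequence[int]) -> float:
    """Gini impurity: 0 for a pure node, larger for mixed nodes."""
    n = len(labels)
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values()) if n else 0.0


def best_split(X: Sequence[Sequence[float]],
               y: Sequence[int]) -> Tuple[Optional[int], Optional[float], float]:
    """Return (feature index, threshold, weighted Gini) of the best single split."""
    best: Tuple[Optional[int], Optional[float], float] = (None, None, gini(y))
    for j in range(len(X[0])):
        for threshold in sorted({x[j] for x in X}):
            left = [lab for x, lab in zip(X, y) if x[j] <= threshold]
            right = [lab for x, lab in zip(X, y) if x[j] > threshold]
            if not left or not right:
                continue  # a split must send records to both children
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[2]:
                best = (j, threshold, score)
    return best


X = [[1.0, 5.0], [1.2, 4.8], [0.9, 5.1], [3.0, 1.0], [3.2, 1.2], [2.9, 0.8]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [1.1, 5.0]))  # → 0
print(best_split(X, y))  # → (0, 1.2, 0.0)
```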

4. Sources and Use of Knowledge

4.1. Journal

A variety of journal articles on the use of machine learning to predict heart disease have been published. Informatics in Medicine Unlocked is the most suitable journal for this topic. It has published various articles on heart disease prediction that use different prediction methods, all within the last decade. Notably, several academics recommend the journal highly as a venue for heart disease prediction research.

4.2. Specification for publication in Informatics in Medicine Unlocked

4.2.1. The Layout of the Page

The page size is A4, with a top margin of 19 mm, a bottom margin of approximately 43 mm, a left margin of 14 mm, and a right margin of 32 mm.

4.2.2. Page Style

In line with the journal's standards, paragraphs should be justified and indented, which has been implemented throughout. Times New Roman is used as the text font for the document. The title should be set in 24 pt, the author names in 11 pt, and the body text in 10 pt. The author's email address should be set in 9 pt.

4.2.3. Reference format

References should follow the APA 7th edition format, or the previous edition for publications from years when the formatting guidelines prescribed a different style.

4.2.4. The relevance of Authors and Journals

According to journal publication guidelines, a journal must meet several standards. First, author guidelines must be available and implemented. Authors should have written several articles in the field under review and should be authorities in that field through professional qualifications. The journal should also display accurate and vetted details of its editorial board and an accurate description of its peer-review procedure. The journal selected for this study meets all of these standards. In addition, other relevant articles on various ML methods for predicting CVDs have been published by the same publisher.

5. Scope, Objective, and Risk

5.1. Scope

This study focuses on the diagnosis of heart diseases using artificial intelligence. Because the subject is human health, the results should be as reliable and accurate as possible: a wrong diagnosis of CVDs can be fatal to the patient and damaging to the organization's reputation. The research mainly investigates the diagnosis of CVDs using machine learning, a branch of artificial intelligence.

The topic is further divided into the following specific scopes:

I. To evaluate existing machine learning methods in the research literature.

II. To analyze and establish a more efficient and effective algorithm for the prediction of heart disease.

III. To analyze and assess the recommended system in contrast to existing solutions.

5.2. Objective

5.2.1. A critical review of existing literature.

1.1 Critically review the existing information on the prediction of heart disorders using machine learning.
1.2 Find an efficient machine learning method for the prediction of heart disease.
1.3 Study the classification, diagnosis, and treatment of heart disease.

To have sufficient knowledge about the types of CVDs, causes, signs, and symptoms as well as

5.2.2. Machine learning techniques

To eliminate unwanted symptoms and CVD-causative illnesses, such as diabetes and obesity, that could affect the prediction of heart disease.

To identify the ideal clinical symptoms of heart disease.

2.1. Decision tree classifier.

2.2. Support vector machine.

2.3. Neural networks.

2.4. Naïve Bayes algorithm.

5.2.3. Classification

The study will employ both supervised and unsupervised methods for CVD classification in order to establish the best techniques for CVD prediction. It will:

3.1. Evaluate the Decision Tree Classifier.

3.2. Evaluate Support Vector Machines.

3.3. Evaluate Neural Networks.

3.4. Evaluate the Naïve Bayes Algorithm.
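The comparison planned here could be organized as k-fold cross-validation, so that every classifier is scored on exactly the same folds. The sketch below assumes each method is wrapped as a hypothetical `train(X, y)` function that returns a predictor. The majority-class and nearest-centroid trainers are placeholder baselines standing in for the Decision Tree, SVM, neural network, and Naïve Bayes models, which would be plugged in the same way; the toy data are made up.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence

Predictor = Callable[[Sequence[float]], int]
Trainer = Callable[[List[Sequence[float]], List[int]], Predictor]


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k roughly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_accuracy(train: Trainer, X: Sequence[Sequence[float]],
                       y: Sequence[int], k: int = 5, seed: int = 0) -> float:
    """Train on k-1 folds, test on the held-out fold, and pool the hits."""
    correct = 0
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(X)) if i not in held_out]
        predict = train(train_X, train_y)
        correct += sum(1 for i in fold if predict(X[i]) == y[i])
    return correct / len(X)


def train_majority(X: List[Sequence[float]], y: List[int]) -> Predictor:
    """Baseline: always predict the most common training label."""
    label = Counter(y).most_common(1)[0][0]
    return lambda x: label


def train_nearest_centroid(X: List[Sequence[float]], y: List[int]) -> Predictor:
    """Placeholder model: predict the class whose mean feature vector is closest."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x: Sequence[float]) -> int:
        return min(centroids, key=lambda lab: sum(
            (a - b) ** 2 for a, b in zip(x, centroids[lab])))

    return predict


def compare(trainers: Dict[str, Trainer], X, y, k: int = 5) -> Dict[str, float]:
    """Cross-validated accuracy per method, all on the same folds (same seed)."""
    return {name: cross_val_accuracy(t, X, y, k) for name, t in trainers.items()}


X = [[i * 0.1, i * 0.1] for i in range(5)] + [[10 + i * 0.1, 10 + i * 0.1] for i in range(5)]
y = [0] * 5 + [1] * 5
print(compare({"majority": train_majority, "nearest_centroid": train_nearest_centroid}, X, y))
```

Using the same seed for every method means differences in score come from the models rather than from a lucky split.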

5.3. Risk Management Plan

A risk management plan is established to identify possible risks that might occur throughout the research. The plan identifies and evaluates each risk and sets out measures to mitigate it. It is, however, subject to change as the project progresses.

Risk Type | Risk Event | Risk Value (scale of 1-100) | Risk Monitoring | Risk Management Strategy | Review Date
TIME | When the need arises to work on weekends, the database will not be accessible | 55 | Delays in the research work | Maximize weekday work to cater for such an event | Research Period
FUNDS | No financial risk | / | / | / | /
TOPIC | The topic is new, and my knowledge is relatively limited, which could cause delays | 45 | Delays in research work | Reading extensively | Research Period
HUMAN | No risk to humans | / | / | / | /
ENVIRONMENTAL | The project does not harbor any environmental risk | / | / | / | /
TOPIC | Wrong analysis and/or findings could easily occur, as it is my first time implementing ML | 40 | Error | Inquiry with supervisor as well as peers | Research Period

Table 1: Risk Table

6. Security, Ethical, Legal, Social and Professional Issues

6.1. Ethical Issues

The research was conducted objectively, and both the choice of design and the scope of the study were free of bias. Additionally, the research is open to sharing key data and recommendations. The research has adopted human subject protection principles, and efforts have focused on reducing any possible harm to individuals while maximizing benefits for the collective good. The research has avoided data confidentiality concerns by using only information in the public domain.

6.2. Legal Issues

This research study does not raise legal issues, as the database is accessible to members of the public. Notably, the research complies with the medical laws and regulations relevant to the study, specifically the human body research act.

6.3. Social Issues

Primarily, the research entails a simulation of the actual process. As such, it poses no harm to society, as it does not involve any individuals or organizations.

6.4. Security Issues

Firstly, the research has employed honesty and integrity in its reporting, thus providing reliable information within the context and scope of the study. Secondly, the study poses no harm to society through individual or organizational participation, as it will primarily use a public dataset.

6.5. Professional Issues

This study is open to further research in which other researchers critique and enhance its methodology, data analysis, findings, conclusions, and recommendations. Additionally, the research has been carefully executed to ensure its credibility. There is no conflict between the research aim and personal or financial interests, as the research is purely for academic purposes.

Additionally, as stipulated, the ethics form has been duly completed for the research in accordance with the Northumbria University ethics registration guidelines.

7. Schedule of Activities

A schedule of activities is important not only for planning and executing tasks in an orderly manner but also for visualizing the overall project model and reflecting progress and performance throughout the expected project period. The project will commence on (INSERT DATE) and conclude on (INSERT DATE). There are (INSERT FIGURE) working days during the semester; weekends are excluded.

7.1. Work Breakdown Structure (WBS)

7.2. Task List

Task name | Duration | Start | End
1 Literature Review | 11 Days | |
1.1 Identifying methods used in the literature | 3 Days | |
1.2 Studying the classification of heart disease | 3 Days | |
1.3 Critique of the literature review | 5 Days | |
2 Machine learning techniques | 12 Days | |
2.1 Heart disease prediction | 4 Days | |
2.2 Evaluation of Artificial Neural Network | 5 Days | |
2.3 Investigating Decision Tree | 5 Days | |
2.4 Examining Support Vector Machine | 5 Days | |
2.5 Evaluation of Naïve Bayes Classifier | 2 Days | |
3 Research draft writing | 28 Days | |
3.1 Writing introduction and literature review | 7 Days | |
3.2 Writing methodology | 6 Days | |
3.3 Writing results and discussion | 4 Days | |
3.4 Writing conclusion and recommendations | 6 Days | |
3.5 Proofreading and reviewing the dissertation | 4 Days | |
3.6 Reviewing the thesis and making the poster | 7 Days | |
4 Submission date | | |
Meeting with the supervisor | | |

Appendix A - Ethics Form

[Complete after approval]



Department of Computer and Information Sciences Green


Section One: Registration [To be completed by


Title of project: Prediction of Heart Disease Using Machine Learning


Researcher’s name

Program of study
MSc Computer Science

Academic Year

Module code KF7028

Supervisor’s name

Second Marker’s name

Start date of the project

Short description of the project, including research methods and selection of any participants:

Sample Script

10. If yes [to 5, 6, 7, 8, or 9 above], have you identified steps to address the issues?

Statement by researcher

This statement should explain how any issues identified in the answers to the above questions will be addressed
and what steps will be taken to mitigate such risks or adverse impact

The project does not raise any such issues, as it uses a publicly available dataset and involves no participants.

I have read the University and the Faculty Ethics Policy and Procedures and confirm that the answers I have given above are correct. Where issues arise under items 5, 6, 7, 8 or 9 [above], I have described in writing how I intend to approach these issues in the research.



Section Two: Approval

[The form is reviewed by the supervisor and second marker. Approval may be given by either for green
projects; amber projects must be approved by the second marker. Red projects must be referred to the Faculty
Research Ethics Committee.]

Red: Vulnerable participants, sensitive data, risks to participants or researchers, NHS, etc.

Amber: Human participants, environmental issues, commercially sensitive information, etc.

Green: No participants involved, no sensitive data, etc.

For full definitions see section on Risk Categories in the Engineering and Environment Ethics Procedures.

Ethical approval

[Please tick as appropriate]

Green - Ethical approval is given without conditions

Amber - Ethical approval is given with the following conditions

Information to be provided to all participants

Participant consent to be obtained using the standard Research Participant Consent
Form or otherwise by Faculty procedures
Data to be stored and destroyed securely by University guidelines
Adherence to the Data Protection Act
Anonymity to be provided to participants
Commercial confidentiality to be provided to organisation(s)
Other (please state):

Red - Project is referred to FREC for approval

Name of Approver

Signature ……………………………………….


Outcome of FREC referral – Decision, minute and date of meeting, or signatures of two signatories, one of
whom is a member of FREC.

Appendix B - Key Words and Abbreviations
Data Mining: the practice of identifying data sets and patterns in a large database to generate new information. It may combine methods from various disciplines, such as machine learning and statistics.

Machine Learning: the study of statistical models and algorithms that computer systems use to perform tasks by applying Artificial Intelligence techniques to learn from data without human intervention. As such, it can adjust its actions as it learns.
Heart Diseases: a range of conditions affecting the heart, including heart rhythm disorders, blood vessel diseases, and heart defects.
Ensemble Classifier: a set of classifiers whose individual decisions are combined, mostly via a weighted vote, to classify new examples. Ensembles are used mainly to obtain better predictive performance as a collective than any of the individual classifiers could achieve alone.
Prediction Model: mainly employed in ensemble modeling, a process in which multiple different models are created to predict an outcome or event. The ensemble aggregates the predictions of the individual models into a single output for unseen data.

CDSS – Clinical Decision Support System.
AI – Artificial Intelligence.
ML – Machine Learning.
CVDs – Cardiovascular Diseases.
CHD – Coronary Heart Disease.
CANFIS - Coactive Neuro-Fuzzy Inference System.

8. Reference

Bhanot, K. (2019). Predicting the presence of heart diseases using machine learning.
Das, H., Naik, B., & Behera, H. (2020). Medical disease analysis using Neuro-Fuzzy with Feature Extraction Model for classification. Informatics in Medicine Unlocked, 18, 100288.
Eckhard, M. (2018). Clinical decision support systems. In Nursing informatics for the advanced practice nurse: Patient safety, quality, outcomes, and interprofessionalism.
FWS. (2016). Top 10 applications of machine learning in healthcare - FWS. Retrieved 25 March 2020, from
Griffin, S. (2018). Facts and the fiction surrounding heart attacks and heart disease. Retrieved 25 March 2020, from
Haq, A., Li, J., Memon, M., Nazir, S., & Sun, R. (2018). A hybrid intelligent system framework for the prediction of heart disease using machine learning algorithms. Mobile Information Systems, 2018, 1-21.
Latha, C., & Jeeva, S. (2019). Improving the accuracy of prediction of heart disease risk based on ensemble classification techniques. Informatics in Medicine Unlocked, 16, 100203.
Nichenametla, R., Maneesha, T., Hafeez, S., & Krishna, H. (2018). Prediction of heart disease using machine learning algorithms. International Journal of Engineering and Technology (UAE), 7, 363-366.
Peachap, A., & Tchiotsop, D. (2019). Epileptic seizures detection based on some new Laguerre polynomial wavelets, artificial neural networks, and support vector machines. Informatics in Medicine Unlocked, 16, 100209.
WHO. (2017). Cardiovascular diseases (CVDs). Retrieved 25 March 2020,

