
HEART DISEASE PREDICTION METHOD
USING ENSEMBLE CLASSIFIER
A project report submitted in partial fulfilment of the requirements
for the award of the Degree of

BACHELOR OF COMPUTER SCIENCE

Submitted by

R.PARAMESH 21CS1120
K.BHUVANKUMAR 21CS1111
V.DHANUSH 21CS1113
S.ANANDHARAJ 21CS1105

Under the guidance of

MS. T.SANTHIYA B.Sc., M.Sc.,


(Professor, Department of Computer Science)

DEPARTMENT OF COMPUTER SCIENCE


PSV COLLEGE OF ARTS AND SCIENCE
PONDICHERRY UNIVERSITY

PONDICHERRY, INDIA.

May 2024
PSV COLLEGE OF ARTS AND SCIENCE
PONDICHERRY UNIVERSITY
PUDUCHERRY – 607 402

DEPARTMENT OF COMPUTER SCIENCE

BONAFIDE CERTIFICATE

This is to certify that this project work entitled “HEART DISEASE

PREDICTION METHOD USING ENSEMBLE CLASSIFIER” is a
bonafide work done by R.PARAMESH (21CS1120), K.BHUVANKUMAR
(21CS1111), V.DHANUSH (21CS1113), and S.ANANDHARAJ (21CS1105) in
partial fulfillment of the requirements for the award of the B.Sc. Degree in
Computer Science by Pondicherry University during the academic year 2021-
2024.

PROJECT GUIDE HEAD OF THE DEPARTMENT

Submitted for the University Examination held on _____________________

INTERNAL EXAMINER EXTERNAL EXAMINER


ACKNOWLEDGEMENT

We are greatly thankful to all those who helped us in making this project successful.

We find great pleasure in thanking our Chairman and Managing Director
Thiru. S.SELVAMANI and our Secretary Dr. S.VIKNESH M.Tech, Ph.D., for all their
support and encouragement, and for providing us with a better environment for studying and
for equipping ourselves in the learning environment.

It is a great pleasure to thank our Director Dr. N.GOBU M.E., MBA, Ph.D., MISTE,

AMIE, MIIW for his valuable guidance, support and encouragement to do our project work.

It gives us great pleasure to convey our deep and sincere thanks to our Principal

Dr. K.KAMALAKKAN, Ph.D., of PSV College of Arts and Science for having given us

permission to take up this project and for his kind patronage.

We wish to express our deep sense of gratitude to our Head of the Department

Mr. R.BABU M.C.A., M.E., for his valuable guidance and encouragement throughout this

project work.

We wish to express our grateful thanks to our Project Guide Ms. T.SANTHIYA
B.Sc., M.Sc., Professor, for her able guidance and useful suggestions, which helped us in
completing the project work in time.

We also thank all our Department Faculty members and the Lab Administrator for their
timely guidance in the conduct of our project work and for all their valuable assistance in the
project work.

Finally, yet importantly, we would like to express our heartfelt thanks to our beloved
parents for their blessings, and to our friends, classmates, and seniors for their help and wishes
for the successful completion of this project.


CHAPTER NO    TITLE

ABSTRACT

I INTRODUCTION

II PROBLEM DEFINITION

III SYSTEM STUDY

3.1 EXISTING SYSTEM

3.2 PROPOSED SYSTEM

IV SYSTEM REQUIREMENTS SPECIFICATION

4.1 HARDWARE SPECIFICATION

4.2 SOFTWARE SPECIFICATION

V SYSTEM ANALYSIS

5.1 DATA FLOW DIAGRAM

5.2 ARCHITECTURE

VI SYSTEM IMPLEMENTATION

6.1 HARDWARE SPECIFICATION

6.2 SOFTWARE SPECIFICATION

VII SOFTWARE ENVIRONMENT

VIII SYSTEM TESTING

IX CODING

X SCREENSHOTS

XI CONCLUSION

XII FUTURE ENHANCEMENT

XIII BIBLIOGRAPHY
ABSTRACT

Heart disease prediction is crucial for informed healthcare decisions, enabling early
interventions and lifestyle adjustments. Traditional methods such as machine learning and
deep learning have been instrumental in developing predictive models for heart disease
detection. Commonly employed algorithms like Support Vector Machines (SVM), logistic
regression, XGBoost, and LightGBM have demonstrated accuracies ranging from 73.77% to
88.5%. In our project, we aim to enhance the efficiency of heart disease prediction by
leveraging the power of ensemble classifiers. Specifically, we employ the Convolutional
Neural Network (CNN) alongside Recurrent Neural Network (RNN) algorithms. This novel
approach integrates the strengths of both deep learning and traditional machine learning
methods. By combining these models through an ensemble classifier, we seek to improve
predictive accuracy and contribute to more effective and reliable heart disease risk
assessment. This research endeavors to advance the field by exploring innovative
methodologies for enhancing the performance of heart disease prediction models.
I. INTRODUCTION

The growth in medical data collection presents a new opportunity for physicians to
improve patient diagnosis. In recent years, practitioners have increased their usage of
computer technologies to improve decision-making support. In the health care industry, deep
learning is becoming an important solution to aid the diagnosis of patients. Deep learning is
an analytical tool used when a task is large and difficult to program, such as transforming
medical records into knowledge, pandemic prediction, and genomic data analysis. Recent
studies have used deep learning techniques to diagnose different cardiac problems and make
predictions. A major problem in deep learning is the high dimensionality of the dataset; in
our project we use an ensemble classifier method, built on CNN and RNN, to detect heart
disease accurately. The analysis of many features requires a large amount of memory and
leads to overfitting, so weighting features decreases redundant data and processing time, thus
improving the performance of the algorithm. Dimensionality reduction uses feature extraction
to transform and simplify data, while feature selection reduces the dataset by removing
useless features.
II PROBLEM DEFINITION

2.1 PROBLEM DEFINITION
An expert decision system based on machine learning classifiers and the application of artificial
fuzzy logic can detect heart disease (HD) effectively, so that the mortality ratio decreases. The
Cleveland heart disease dataset has also been used by various researchers for the HD
identification problem. The deep learning predictive models of the ensemble classifier need
appropriate data for training and testing. The performance of a model can be increased when a
balanced dataset is used for training and testing of the model. Moreover, the model's predictive
ability can be improved by using appropriate and related features from the data. Hence, data
balancing and feature selection are significantly important for improving model performance.
In the literature, different diagnosis strategies have been proposed by various researchers;
however, these strategies do not diagnose HD effectively.

2.2 EXISTING SYSTEM

Expert decision systems based on machine learning classifiers and the application of artificial
fuzzy logic can detect HD effectively, thereby reducing the mortality ratio, and the Cleveland
heart disease dataset has been used by various researchers for the HD identification problem.
Deep learning predictive models need appropriate data for training and testing. The current
heart disease prediction system relies on traditional
machine learning and potentially deep learning methods but operates on a standalone model
basis. Commonly used algorithms, such as Support Vector Machines (SVM), logistic
regression, XGBoost, and LightGBM, may have been employed individually to predict heart
disease risk based on specific datasets.

DISADVANTAGES:

• Lack of prediction accuracy


• High computation time for prediction of HD
• No hybrid algorithm has been applied successfully.
2.3 PROPOSED SYSTEM

Augment the existing dataset with additional relevant features and ensure its
comprehensiveness to capture a more nuanced representation of heart disease
risk factors. Implement advanced data preprocessing techniques to handle
outliers, imbalances, and complex relationships within the dataset. This includes
sophisticated methods for handling missing values and robust feature scaling.
Introduce an ensemble framework that combines the predictive power of the
Convolutional Neural Network (CNN) and the temporal understanding of
Recurrent Neural Network (RNN) algorithms. The ensemble utilizes an
ensemble classifier to merge the outputs of individual models. Capitalize on
the diversity of CNN and RNN algorithms, which bring distinct perspectives to
heart disease prediction. This diversity aims to enhance the overall robustness of
the predictive model.

ADVANTAGES:

• High accuracy in the diagnosis of heart disease.

• Uses an ensemble classifier method with CNN and RNN algorithms.

• Reduced computation time.


III SYSTEM REQUIREMENT SPECIFICATION

3.1 HARDWARE REQUIREMENTS

 Processor – Intel i3 / i5 / i7 or AMD

 RAM – 8 GB

 Hard Disk – 500 GB

3.2 SOFTWARE REQUIREMENTS


 Operating System – Windows 7/8/10

 Front End – HTML, CSS

 Framework – Flask

 Language – Python

PYTHON

INTRODUCTION TO PYTHON
Python is a high-level, object-oriented programming language that was created by Guido
van Rossum. It is also called a general-purpose programming language, as it is used in
almost every domain we can think of, as mentioned below:

 Web Development
 Software Development
 Game Development
 AI & ML
 Data Analytics
This list can go on, but why is Python so popular? Let's see in the next
topic.

WHY PYTHON PROGRAMMING?


You might have a question in mind: why Python? Why not another programming
language?

So let me explain:

Every programming language serves some purpose or use case according to a domain. For
example, JavaScript is the most popular language amongst web developers, as it gives the
developer the power to handle applications via different frameworks like React, Vue, and
Angular, which are used to build beautiful user interfaces. Every language has its pros and
cons at the same time. If we consider Python, it is general-purpose, which means it is widely
used in every domain; the reason is that it is very simple to understand and scalable, which
makes the speed of development very fast. Besides this, learning Python doesn't require any
programming background, so it is popular amongst developers as well. Python has a simpler
syntax, similar to the English language, and the syntax also allows developers to write
programs with fewer lines of code. Since it is open source, there are many libraries available
that make developers' jobs easier and ultimately result in high productivity. Developers can
easily focus on business logic, and Python is an in-demand skill in the digital era, where
information is available in large data sets.

HOW DO WE GET STARTED?

Now, in the era of the digital world, there is a lot of information available on the
internet, which might confuse us. What we can do is follow the documentation, which
is a good starting point. Once we are familiar with the concepts and terminology, we can dive
deeper into this.

Following are references where we can start our journey:


Official Website: https://www.python.org/

Udemy Course: https://www.udemy.com/course/python-the-complete-python-developer-


course/

YouTube: https://www.youtube.com/watch?v=_uQrJ0TkZlc

CodeAcademy: https://www.codecademy.com/catalog/language/python

I hope you are now excited to get started. You might be wondering where we can
start coding; there are a lot of options available in the market. We can use any IDE we
are comfortable with, but for those who are new to the programming world, I am listing some
IDEs for Python below:

1) Visual Studio: https://visualstudio.microsoft.com/

2) PyCharm: https://www.jetbrains.com/pycharm/

3) Spyder: https://www.spyder-ide.org/

4) Atom: https://atom.io/

5) Google Colab: https://research.google.com/colaboratory/

Real-World Examples:
1) NASA (National Aeronautics and Space Administration): One of NASA's Shuttle Support
Contractors, United Space Alliance, developed a Workflow Automation System (WAS)
which is fast. Internal resources within the critical project stated that:

“Python allows us to tackle the complexity of programs like the WAS without getting bogged
down in the language”.

NASA also published a website (https://code.nasa.gov/) where there are 400 open-source
projects which use Python.

2) Netflix: There are various projects in Netflix which use Python, as follows:
 Central Alert Gateway
 Chaos Gorilla
 Security Monkey
 Chronos

Amongst all these projects, Regional Failover stands out, as the system decreases
outage time from 45 minutes to 7 minutes with no additional cost.

3) Instagram: Instagram also uses Python extensively. They have built a photo-sharing
social platform using Django, which is a web framework for Python. Also, they were able to
successfully upgrade their framework without any technical challenges.

Applications of Python Programming:


1) Web Development: Python offers different frameworks for web development like Django,
Pyramid, and Flask. These frameworks are known for security, flexibility, and scalability.

2) Game Development: PySoy and PyGame are two Python libraries that are used for game
development.

3) Artificial Intelligence and Machine Learning: There is a large number of open-source


libraries which can be used while developing AI/ML applications.

4) Desktop GUI: Python offers many toolkits and frameworks with which we can
build desktop applications. PyQt, PyGTK, and PyGUI are some of the GUI frameworks.

How to Become a Better Programmer:


The last but most important thing for getting better at whatever programming language you
choose is practice, practice, practice. Practical knowledge is only acquired by playing with
things, so you will get more exposure to real-world scenarios. Consistency is more important
than anything, because if you practice for some days and then do nothing, it will be difficult to
practice consistently when you start again. So I request you to learn by doing projects; it will
help you understand how things get done, and the important thing is to have fun at the same
time.
Approach to be followed to master Python:
“Beginning is the end and end is the beginning.” I know what you are thinking about. It is
basically a famous quote from a web series named “Dark”. Now, how does it relate to Python
programming?

If you research on Google, YouTube, or any development communities out


there, you will find that people explain how you can master programming in,
let's say, some “x” number of days, and so on.

Well, the reality is like the symbol of infinity. In the
programming realm, there is no such thing as mastery. It's simply a trial-and-
error process. For example, yesterday I was writing some code where I was
trying to print the value of a variable before declaring it inside a function. There I
saw a new error named “UnboundLocalError”.

So the important thing to keep in mind is that programming is a surprising


realm. Throughout your entire career, you will be seeing new errors and
exceptions. Just remember the quote – “Practise makes a man perfect”.

Now here is the main part. What approach to follow in order to master Python
Programming?

Well here it is:

Step-1: Start with a “Hello World” Program


If you have learned some programming languages before, then I am sure you are
aware of what I am talking about. The “Hello World” program is like a tradition
in the developer community. If you want to master any programming language,
this should be the very first line of code you seek out.

Simple Hello World Program in Python:

print("Hello World")

Step-2: Start learning about variables


Now once we have mastered the “Hello World” program in Python, the next
step is to master variables in python. Variables are like containers that are used
to store values.

Variables in Python:

my_var = 100

As you can see here, we have created a variable named “my_var” and assigned the
value 100 to it.

Step-3: Start learning about Data Types and Data Structures


The next outpost is to learn about data types. Here I have seen that there is a lot
of confusion between data types and data structures. The important thing to
keep in mind here is that data types represent the type of data. For example, in
Python we have things like int, string, and float. These are called data types,
as they indicate the type of data we are dealing with.

While data structures are responsible for deciding how to store this data in a
computer’s memory.

String data type in Python:

my_str = "ABCD"
As you can see here, we have assigned a value “ABCD” to a variable my_str.
This is basically a string data type in Python.

Data Structure in Python:

my_dict={1:100,2:200,3:300}

This is known as a dictionary data structure in Python.

Again this is just the tip of the iceberg. There are lots of data types and data
structures in Python. To give a basic idea about data structures in Python, here
is the complete list:

1. Lists

2. Dictionary

3. Sets

4. Tuples

5. Frozenset

Step-4: Start learning about conditionals and loops


In any programming language, conditionals and loops are considered part of the
backbone.

Python is no exception for that as well. This is one of the most important
concepts that we need to master.

IF-ELIF-ELSE conditionals:

x = 5  # example value so the snippet runs

if x < 10:
    print("x is less than 10")
elif x > 10:
    print("x is greater than 10")
else:
    print("Do nothing")

As you can see in the above example, we have created what is known as the if-
elif-else ladder.

For loop:

for i in "Python":

print(i)

The above code is basically an example of a for loop in Python.

PRO Tip:
Once you start programming with Python, you will see that if we miss
any white spacing in Python, then Python will start giving errors. This is
known as indentation in Python. Python is very strict about indentation. Python was
created with a mindset to help everyone become a neat programmer. This
indentation scheme in Python was introduced in one of Python's early PEPs (Python
Enhancement Proposals).

THE PYTHON STANDARD LIBRARY

While The Python Language Reference describes the exact syntax and semantics of
the Python language, this library reference manual describes the standard library that is
distributed with Python. It also describes some of the optional components that are commonly
included in Python distributions.
Python’s standard library is very extensive, offering a wide range of facilities as
indicated by the long table of contents listed below. The library contains built-in modules
(written in C) that provide access to system functionality such as file I/O that would
otherwise be inaccessible to Python programmers, as well as modules written in Python that
provide standardized solutions for many problems that occur in everyday programming.
Some of these modules are explicitly designed to encourage and enhance the portability of
Python programs by abstracting away platform-specifics into platform-neutral APIs.

The Python installers for the Windows platform usually include the entire standard
library and often also include many additional components. For Unix-like operating systems
Python is normally provided as a collection of packages, so it may be necessary to use the
packaging tools provided with the operating system to obtain some or all of the optional
components.

In addition to the standard library, there is a growing collection of several thousand


components (from individual programs and modules to packages and entire application
development frameworks), available from the Python Package Index.

What Is a Python Package?


To understand Python packages, we’ll briefly look at scripts and modules. A “script” is
something you execute in the shell to accomplish a defined task. To write a script, you’d
type your code into your favorite text editor and save it with the .py extension. You can
then use the python command in a terminal to execute your script.
A module on the other hand is a Python program that you import, either in interactive
mode or into your other programs. “Module” is really an umbrella term for reusable code.
A Python package usually consists of several modules. Physically, a package is a folder
containing modules and maybe other folders that themselves may contain more folders and
modules. Conceptually, it’s a namespace. This simply means that a package’s modules are
bound together by a package name, by which they may be referenced.
Circling back to our earlier definition of a module as reusable, importable code, we note
that every package is a module — but not every module is a package. A package folder
usually contains one file named __init__.py that basically tells Python: “Hey, this directory
is a package!” The init file may be empty, or it may contain code to be executed upon
package initialization.
You’ve probably come across the term “library” as well. For Python, a library isn’t as
clearly defined as a package or a module, but a good rule of thumb is that whenever a
package has been published, it may be referred to as a library.

HOW TO USE A PYTHON PACKAGE

We’ve mentioned namespaces, publishing packages and importing modules. If any of these
terms or concepts aren’t entirely clear to you, we’ve got you! In this section, we’ll cover
everything you’ll need to really grasp the pipeline of using Python packages in your code.
Importing a Python Package
We’ll import a package using the import statement:

Let’s assume that we haven’t yet installed any packages. Python comes with a big
collection of pre-installed packages known as the Python Standard Library. It includes tools
for a range of use cases, such as text processing and doing math. Let’s import the latter:
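
For instance, importing the math module from the standard library looks like this (a minimal illustration):

import math

print(math.sqrt(16))   # prints 4.0; the module's functions are now available through the math name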

You might think of an import statement as a search trigger for a module. Searches are
strictly organized: At first, Python looks for a module in the cache, then in the standard
library and finally in a list of paths. This list may be accessed after importing sys (another
standard library module).
The sys.path command returns all the directories in which Python will try to find a package.
It may happen that you’ve downloaded a package but when you try importing it, you get an
error:

In such cases, check whether your imported package has been placed in one of Python’s
search paths. If it hasn’t, you can always expand your list of search paths:
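
A small sketch of inspecting and extending the search path; the appended directory is purely illustrative:

import sys

print(sys.path)                             # directories Python searches for modules
sys.path.append("/home/user/my_packages")   # illustrative extra search location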

At that point, the interpreter will have more than one more location to look for packages
after receiving an import statement.

Namespaces and Aliasing


When we imported the math module, we initialized the math namespace. This means
that we can now refer to functions and classes from the math module by way of “dot
notation”:
Assume that we were only interested in our math module’s factorial function, and that
we’re also tired of using dot notation. In that case, we can proceed as follows:

If you’d like to import multiple resources from the same source, you can simply comma-
separate them in the import statement:

There is, however, always a small risk that your variables will clash with other variables in
your namespace. What if one of the variables in your code was named log, too? It would
overwrite the log function, causing bugs. To avoid that, it’s better to import the package as
we did before. If you want to save typing time, you can alias your package to give it a
shorter name:

Aliasing is a pretty common technique. Some packages have commonly used aliases: For
instance, the numerical computation library NumPy is almost always imported as “np.”
Another option is to import all a module’s resources into your namespace:

However, this method poses serious risk since you usually don’t know all the names
contained in a package, increasing the likelihood of your variables being overwritten. It’s
for this reason that most seasoned Python programmers will discourage use of the wildcard
* in imports. Also, as the Zen of Python states, “namespaces are one honking great idea!”
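
The sketch below illustrates the dot notation, selective and comma-separated imports, aliasing, and the discouraged wildcard form discussed above; it assumes only the standard library plus NumPy (for the aliasing example) being installed:

import math
print(math.factorial(5))           # dot notation through the math namespace

from math import factorial         # import a single resource
from math import factorial, log    # comma-separated imports

import numpy as np                 # aliasing; "np" is the conventional short name

# from math import *               # wildcard import: discouraged, may overwrite your own names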
How to Install a Python Package

How about packages that are not part of the standard library? The official repository for
finding and downloading such third-party packages is the Python Package Index, usually
referred to simply as PyPI. To install packages from PyPI, use the package installer pip:

pip can install Python packages from any source, not just PyPI. If you installed Python
using Anaconda or Miniconda, you can also use the conda command to install Python
packages.

While conda is very easy to use, it’s not as versatile as pip. So if you cannot install a
package using conda, you can always try pip instead.
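
For example, to install a third-party package (the package name here is only illustrative) from PyPI with pip, or with conda inside an Anaconda/Miniconda environment:

python -m pip install requests     # install from PyPI using pip
conda install requests             # alternative when using Anaconda/Miniconda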
Reloading a Module
If you’re programming in interactive mode, and you change a module’s script, these
changes won’t be imported, even if you issue another import statement. In such case, you’ll
want to use the reload() function from the importlib library:
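
A short, illustrative sketch of reloading a module you have just edited (mymodule is a hypothetical module of your own):

import importlib
import mymodule                    # first import of the (hypothetical) module

# ... edit mymodule.py, then pick up the changes without restarting the session:
importlib.reload(mymodule)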

How to Create Your Own Python Package


Packaging your code for further use doesn’t necessarily mean you’ll want it published to
PyPI. Maybe you just want to share it with a friend, or reuse it yourself. Whatever your
aim, there are several files that you should include in your project. We’ve already
mentioned the __init__.py file.

Another important file is setup.py. Using the setuptools package, this file provides detailed
information about your project and lists all dependencies — packages required by your
code to run properly.
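
A minimal, illustrative setup.py built with setuptools; the project name, version, and dependency are assumptions, not values prescribed here:

from setuptools import setup, find_packages

setup(
    name="example_package",          # hypothetical project name
    version="0.1.0",
    packages=find_packages(),
    install_requires=["requests"],   # illustrative dependency
)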

Publishing to PyPI is beyond the scope of this introductory tutorial. But if you do have a
package for distribution, your project should include two more files: a README.md
written in Markdown, and a license. Check out the official Python Packaging User Guide
(PyPUG) if you want to know more.

INSTALLING PACKAGES

This section covers the basics of how to install Python packages.

It’s important to note that the term “package” in this context is being used to describe a
bundle of software to be installed (i.e. as a synonym for a distribution). It does not to refer to
the kind of package that you import in your Python source code (i.e. a container of modules).
It is common in the Python community to refer to a distribution using the term “package”.
Using the term “distribution” is often not preferred, because it can easily be confused with a
Linux distribution, or another larger software distribution like Python itself.

Requirements for Installing Packages

This section describes the steps to follow before installing other Python packages.

Ensure you can run Python from the command line

Before you go any further, make sure you have Python and that the expected version is
available from your command line. You can check this by running:

Unix/macOS
python3 --version

Windows

You should get some output like Python 3.6.3. If you do not have Python, please install the
latest 3.x version from python.org or refer to the Installing Python section of the Hitchhiker’s
Guide to Python.

Note

If you’re a newcomer and you get an error like this:

>>> python --version

Traceback (most recent call last):

File "<stdin>", line 1, in <module>

NameError: name 'python' is not defined

It’s because this command and other suggested commands in this tutorial are intended to be
run in a shell (also called a terminal or console). See the Python for Beginners getting started
tutorial for an introduction to using your operating system’s shell and interacting with
Python.
Note

If you’re using an enhanced shell like IPython or the Jupyter notebook, you can run system
commands like those in this tutorial by prefacing them with a ! character:

In [1]: import sys

!{sys.executable} --version
Python 3.6.3

It’s recommended to write {sys.executable} rather than plain python in order to ensure that
commands are run in the Python installation matching the currently running notebook (which
may not be the same Python installation that the python command refers to).
Note

Due to the way most Linux distributions are handling the Python 3 migration, Linux users
using the system Python without creating a virtual environment first should replace
the python command in this tutorial with python3 and the python -m pip command
with python3 -m pip --user. Do not run any of the commands in this tutorial with sudo: if
you get a permissions error, come back to the section on creating virtual environments, set
one up, and then continue with the tutorial as written.
Ensure you can run pip from the command line

Additionally, you’ll need to make sure you have pip available. You can check this by
running:

Unix/macOS

python3 -m pip --version

Windows

If you installed Python from source, with an installer from python.org, or via Homebrew you
should already have pip. If you’re on Linux and installed using your OS package manager,
you may have to install pip separately, see Installing pip/setuptools/wheel with Linux
Package Managers.
If pip isn’t already installed, then first try to bootstrap it from the standard library:

Unix/macOS

python3 -m ensurepip --default-pip

Windows

If that still doesn’t allow you to run python -m pip:

 Securely download get-pip.py.

 Run python get-pip.py. This will install or upgrade pip. Additionally, it will
install setuptools and wheel if they're not installed already.

Warning

Be cautious if you’re using a Python install that’s managed by your operating system
or another package manager. get-pip.py does not coordinate with those tools, and may
leave your system in an inconsistent state. You can use python get-pip.py --
prefix=/usr/local/ to install in /usr/local which is designed for locally-installed
software.
Ensure pip, setuptools, and wheel are up to date

While pip alone is sufficient to install from pre-built binary archives, up to date copies of
the setuptools and wheel projects are useful to ensure you can also install from source
archives:

Unix/macOS

python3 -m pip install --upgrade pip setuptools wheel

Windows

Optionally, create a virtual environment

See section below for details, but here's the basic venv command to use on a typical Linux
system:

Unix/macOS

python3 -m venv tutorial_env


source tutorial_env/bin/activate

Windows

This will create a new virtual environment in the tutorial_env subdirectory, and configure the
current shell to use it as the default python environment.
Creating Virtual Environments

Python “Virtual Environments” allow Python packages to be installed in an isolated location


for a particular application, rather than being installed globally. If you are looking to safely
install global command line tools, see Installing stand alone command line tools.

Imagine you have an application that needs version 1 of LibFoo, but another application
requires version 2. How can you use both these applications? If you install everything into
/usr/lib/python3.6/site-packages (or whatever your platform’s standard location is), it’s easy
to end up in a situation where you unintentionally upgrade an application that shouldn’t be
upgraded.

Or more generally, what if you want to install an application and leave it be? If an application
works, any change in its libraries or the versions of those libraries can break the application.

Also, what if you can’t install packages into the global site-packages directory? For instance,
on a shared host.

In all these cases, virtual environments can help you. They have their own installation
directories and they don’t share libraries with other virtual environments.

Currently, there are two common tools for creating Python virtual environments:

 venv is available by default in Python 3.3 and later, and


installs pip and setuptools into created virtual environments in Python 3.4 and later.
 virtualenv needs to be installed separately, but supports Python 2.7+ and Python 3.3+,
and pip, setuptools and wheel are always installed into created virtual environments
by default (regardless of Python version).

The basic usage is like so:

Using venv:

Unix/macOS

python3 -m venv <DIR>


source <DIR>/bin/activate

Windows

Using virtualenv:

Unix/macOS

python3 -m virtualenv <DIR>


source <DIR>/bin/activate

Windows

For more information, see the venv docs or the virtualenv docs.

The use of source under Unix shells ensures that the virtual environment’s variables are set
within the current shell, and not in a subprocess (which then disappears, having no useful
effect).

In both of the above cases, Windows users should _not_ use the source command, but should
rather run the activate script directly from the command shell like so:

<DIR>\Scripts\activate

Managing multiple virtual environments directly can become tedious, so the dependency
management tutorial introduces a higher level tool, Pipenv, that automatically manages a
separate virtual environment for each project and application that you work on.
Use pip for Installing

pip is the recommended installer. Below, we’ll cover the most common usage scenarios. For
more detail, see the pip docs, which includes a complete Reference Guide.

Installing from PyPI

The most common usage of pip is to install from the Python Package Index using
a requirement specifier. Generally speaking, a requirement specifier is composed of a project
name followed by an optional version specifier. PEP 440 contains a full specification of the
currently supported specifiers. Below are some examples.

To install the latest version of “SomeProject”:

Unix/macOS

python3 -m pip install "SomeProject"

Windows

To install a specific version:

Unix/macOS

python3 -m pip install "SomeProject==1.4"

Windows

To install greater than or equal to one version and less than another:

Unix/macOS
python3 -m pip install "SomeProject>=1,<2"

Windows

To install a version that’s “compatible” with a certain version:

Unix/macOS

python3 -m pip install "SomeProject~=1.4.2"

Windows

In this case, this means to install any version “==1.4.*” version that’s also “>=1.4.2”.

Source Distributions vs Wheels

pip can install from either Source Distributions (sdist) or Wheels, but if both are present on
PyPI, pip will prefer a compatible wheel. You can override pip's default behavior by, e.g.,
using its --no-binary option.

Wheels are a pre-built distribution format that provides faster installation compared to Source
Distributions (sdist), especially when a project contains compiled extensions.

If pip does not find a wheel to install, it will locally build a wheel and cache it for future
installs, instead of rebuilding the source distribution in the future.

Upgrading packages

Upgrade an already installed SomeProject to the latest from PyPI.

Unix/macOS
python3 -m pip install --upgrade SomeProject

Windows

Installing to the User Site

To install packages that are isolated to the current user, use the --user flag:

Unix/macOS

python3 -m pip install --user SomeProject

Windows

For more information see the User Installs section from the pip docs.

Note that the --user flag has no effect when inside a virtual environment - all installation
commands will affect the virtual environment.

If SomeProject defines any command-line scripts or console entry points, --user will cause
them to be installed inside the user base’s binary directory, which may or may not already be
present in your shell’s PATH. (Starting in version 10, pip displays a warning when installing
any scripts to a directory outside PATH.) If the scripts are not available in your shell after
installation, you’ll need to add the directory to your PATH:

 On Linux and macOS you can find the user base binary directory by running python -
m site --user-base and adding bin to the end. For example, this will typically
print ~/.local (with ~ expanded to the absolute path to your home directory) so you’ll
need to add ~/.local/bin to your PATH. You can set your PATH permanently
by modifying ~/.profile.
 On Windows you can find the user base binary directory by running py -m site --
user-site and replacing site-packages with Scripts. For example, this could
return C:\Users\Username\AppData\Roaming\Python36\site-packages so you would
need to set your PATH to
include C:\Users\Username\AppData\Roaming\Python36\Scripts. You can set your
user PATH permanently in the Control Panel. You may need to log out for
the PATH changes to take effect.
Requirements files

Install a list of requirements specified in a Requirements File.

Unix/macOS

python3 -m pip install -r requirements.txt

Windows

Installing from VCS

Install a project from VCS in “editable” mode. For a full breakdown of the syntax, see pip’s
section on VCS Support.

Unix/macOS

python3 -m pip install -e git+https://git.repo/some_pkg.git#egg=SomeProject # from git


python3 -m pip install -e hg+https://hg.repo/some_pkg#egg=SomeProject # from
mercurial
python3 -m pip install -e svn+svn://svn.repo/some_pkg/trunk/#egg=SomeProject # from
svn
python3 -m pip install -e git+https://git.repo/some_pkg.git@feature#egg=SomeProject #
from a branch
Windows

Installing from other Indexes

Install from an alternate index

Unix/macOS

python3 -m pip install --index-url http://my.package.repo/simple/ SomeProject

Windows

Search an additional index during install, in addition to PyPI

Unix/macOS

python3 -m pip install --extra-index-url http://my.package.repo/simple SomeProject

Windows

Installing from a local src tree

Installing from local src in Development Mode, i.e. in such a way that the project appears to
be installed, but yet is still editable from the src tree.

Unix/macOS

python3 -m pip install -e <path>


Windows

You can also install normally from src

Unix/macOS

python3 -m pip install <path>

Windows

Installing from local archives

Install a particular source archive file.

Unix/macOS

python3 -m pip install ./downloads/SomeProject-1.0.4.tar.gz

Windows

Install from a local directory containing archives (and don’t check PyPI)

Unix/macOS

python3 -m pip install --no-index --find-links=file:///local/dir/ SomeProject


python3 -m pip install --no-index --find-links=/local/dir/ SomeProject
python3 -m pip install --no-index --find-links=relative/dir/ SomeProject
Windows

Installing from other sources

To install from other data sources (for example Amazon S3 storage) you can create a helper
application that presents the data in a PEP 503 compliant index format, and use the --extra-
index-url flag to direct pip to use that index.

./s3helper --port=7777
python -m pip install --extra-index-url http://localhost:7777 SomeProject

Installing Prereleases

Find pre-release and development versions, in addition to stable versions. By default, pip
only finds stable versions.

Unix/macOS

python3 -m pip install --pre SomeProject

Windows

Installing Setuptools “Extras”

Install setuptools extras.

Unix/macOS
python3 -m pip install SomePackage[PDF]
python3 -m pip install SomePackage[PDF]==3.0

python3 -m pip install -e .[PDF] # editable project in current directory


IV SYSTEM ANALYSIS

4.1 ARCHITECTURE

4.2 DATA FLOW DIAGRAM

4.3 SEQUENCE DIAGRAM
SYSTEM IMPLEMENTATION
MODULES:

• Data pre-processing

• Feature selection and reduction

• Ensemble classifier

MODULE DESCRIPTION

DATA PRE-PROCESSING:

Heart disease data is pre-processed after collection of the various records. Some records in the
dataset contain missing values; those records have been removed, and the remaining patient
records are used in pre-processing. The multiclass variable and binary classification are
introduced for the attributes of the given dataset. The multiclass variable is used to check the
presence or absence of heart disease. If the patient has heart disease, the value is set to 1;
otherwise it is set to 0, indicating the absence of heart disease in the patient. The pre-processing
of data is carried out by converting medical records into diagnosis values. The results of data
pre-processing indicate that some patient records show a value of 1, establishing the presence
of heart disease, while the remaining reflect a value of 0, indicating its absence.
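
A minimal sketch of this pre-processing step, assuming a pandas DataFrame loaded from the project's CSV file with a 'target' column holding the multiclass disease indicator (the column and file names are illustrative):

import pandas as pd

# Load the collected patient records
df = pd.read_csv("heart-disease.csv")

# Remove records that contain missing values
df = df.dropna()

# Collapse the multiclass indicator into a binary diagnosis value:
# 1 = presence of heart disease, 0 = absence
df["target"] = (df["target"] > 0).astype(int)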

FEATURE SELECTION AND REDUCTION:

With a keen focus on data quality, this module addresses missing values, outliers, and applies
appropriate feature scaling. It adeptly encodes categorical variables, ensuring seamless
integration into machine learning models, and balances the dataset to rectify potential class
imbalances. For datasets with temporal components, it tactfully sequences data to align with
the temporal considerations of the Recurrent Neural Network (RNN) and Convolutional
Neural Network (CNN). The output is a refined and preprocessed dataset, primed for optimal
performance in subsequent modeling endeavors.
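
A hedged sketch of the scaling and splitting described here, using scikit-learn; the split ratio and random seed are arbitrary choices, not values fixed by this report:

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

X = df.drop("target", axis=1)        # predictor attributes
y = df["target"]                     # binary diagnosis value

# Scale the features so that no single attribute dominates training
X_scaled = StandardScaler().fit_transform(X)

# Stratified hold-out split to preserve the class balance
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y, test_size=0.2, stratify=y, random_state=42)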
ENSEMBLE CLASSIFIER:

The Ensemble Model Module is designed to amalgamate predictions from both the
Convolutional Neural Network (CNN) and the Recurrent Neural Network (RNN), harnessing
their unique strengths to fortify overall predictive accuracy. Executing the CNN, the module
employs deep learning techniques to generate predictions that capture nuanced patterns in the
data. Simultaneously, the RNN component accommodates temporal intricacies, offering a
comprehensive understanding of sequential patterns. The ensemble architecture, facilitated by
a VotingClassifier with 'soft' voting, skillfully integrates predictions from the CNN and RNN,
fostering a harmonious synergy. To enhance interpretability, the module integrates tools that
elucidate model decisions, unraveling the factors influencing predictions. The final
deliverable is a detailed report encapsulating the project's scope, methodology, findings, and
avenues for potential future enhancements.
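
A sketch of one way such an ensemble could be realised in Keras, assuming the X_train/X_test splits from the earlier sketch. Each record's scaled features are treated as a short one-channel sequence so that Conv1D and LSTM layers can process them, and instead of scikit-learn's VotingClassifier the 'soft' vote is computed directly by averaging the two networks' predicted probabilities. Layer sizes and epoch counts are illustrative assumptions:

import numpy as np
from tensorflow.keras import layers, models

# Reshape tabular records into (timesteps, channels) sequences
X_train_seq = X_train.reshape(-1, X_train.shape[1], 1)
X_test_seq = X_test.reshape(-1, X_test.shape[1], 1)

def build_cnn(n_features):
    m = models.Sequential([
        layers.Input(shape=(n_features, 1)),
        layers.Conv1D(32, kernel_size=3, activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Flatten(),
        layers.Dense(1, activation="sigmoid"),
    ])
    m.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return m

def build_rnn(n_features):
    m = models.Sequential([
        layers.Input(shape=(n_features, 1)),
        layers.LSTM(32),
        layers.Dense(1, activation="sigmoid"),
    ])
    m.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return m

cnn = build_cnn(X_train.shape[1])
rnn = build_rnn(X_train.shape[1])
cnn.fit(X_train_seq, y_train, epochs=20, batch_size=16, verbose=0)
rnn.fit(X_train_seq, y_train, epochs=20, batch_size=16, verbose=0)

# 'Soft' voting: average the two probability estimates, then threshold
p_ensemble = (cnn.predict(X_test_seq) + rnn.predict(X_test_seq)) / 2.0
y_pred = (p_ensemble.ravel() >= 0.5).astype(int)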

ALGORITHMS:

CONVOLUTIONAL NEURAL NETWORK:

A convolutional neural network (CNN or convnet) is a subset of machine learning. It is one


of the various types of artificial neural networks which are used for different applications and
data types. A CNN is a kind of network architecture for deep learning algorithms and is
specifically used for image recognition and tasks that involve the processing of pixel data.

There are other types of neural networks in deep learning, but for identifying and recognizing
objects, CNNs are the network architecture of choice. This makes them highly suitable for
computer vision (CV) tasks and for applications where object recognition is vital, such as
self-driving cars and facial recognition.

Artificial neural networks (ANNs) are a core element of deep learning algorithms. One type
of an ANN is a recurrent neural network (RNN) that uses sequential or time series data as
input. It is suitable for applications involving natural language processing (NLP), language
translation, speech recognition and image captioning.
The CNN is another type of neural network that can uncover key information in both time
series and image data. For this reason, it is highly valuable for image-related tasks, such as
image recognition, object classification and pattern recognition. To identify patterns within
an image, a CNN leverages principles from linear algebra, such as matrix multiplication.
CNNs can also classify audio and signal data.

A CNN's architecture is analogous to the connectivity pattern of the human brain. Just like
the brain consists of billions of neurons, CNNs also have neurons arranged in a specific way.
In fact, a CNN's neurons are arranged like the brain's frontal lobe, the area responsible for
processing visual stimuli. This arrangement ensures that the entire visual field is covered,
thus avoiding the piecemeal image processing problem of traditional neural networks, which
must be fed images in reduced-resolution pieces. Compared to the older networks, a CNN
delivers better performance with image inputs, and also with speech or audio signal inputs.

A deep learning CNN consists of three layers: a convolutional layer, a pooling layer and a
fully connected (FC) layer. The convolutional layer is the first layer while the FC layer is the
last.

From the convolutional layer to the FC layer, the complexity of the CNN increases. It is this
increasing complexity that allows the CNN to successively identify larger portions and more
complex features of an image until it finally identifies the object in its entirety.

Convolutional layer. The majority of computations happen in the convolutional layer, which
is the core building block of a CNN. A second convolutional layer can follow the initial
convolutional layer. The process of convolution involves a kernel or filter inside this layer
moving across the receptive fields of the image, checking if a feature is present in the image.

Over multiple iterations, the kernel sweeps over the entire image. After each iteration a dot
product is calculated between the input pixels and the filter. The final output from the series
of dots is known as a feature map or convolved feature. Ultimately, the image is converted
into numerical values in this layer, which allows the CNN to interpret the image and extract
relevant patterns from it.
Pooling layer. Like the convolutional layer, the pooling layer also sweeps a kernel or filter
across the input image. But unlike the convolutional layer, the pooling layer reduces the
number of parameters in the input and also results in some information loss. On the positive
side, this layer reduces complexity and improves the efficiency of the CNN.

Fully connected layer. The FC layer is where image classification happens in the CNN based
on the features extracted in the previous layers. Here, fully connected means that all the
inputs or nodes from one layer are connected to every activation unit or node of the next
layer.

All the layers in the CNN are not fully connected because it would result in an unnecessarily
dense network. It also would increase losses and affect the output quality, and it would be
computationally expensive.
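
A small Keras sketch of the three layer types just described: one convolutional layer, one pooling layer, and one fully connected output layer. The 64x64 grayscale input shape is an assumption made purely for illustration:

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(64, 64, 1)),                      # e.g. a 64x64 grayscale image
    layers.Conv2D(16, kernel_size=3, activation="relu"),  # convolutional layer
    layers.MaxPooling2D(pool_size=2),                     # pooling layer
    layers.Flatten(),
    layers.Dense(1, activation="sigmoid"),                # fully connected (FC) layer
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()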

How do convolutional neural networks work?

A CNN can have multiple layers, each of which learns to detect the different features of an
input image. A filter or kernel is applied to each image to produce an output that gets
progressively better and more detailed after each layer. In the lower layers, the filters can
start as simple features.

At each successive layer, the filters increase in complexity to check and identify features that
uniquely represent the input object. Thus, the output of each convolved image -- the partially
recognized image after each layer -- becomes the input for the next layer. In the last layer,
which is an FC layer, the CNN recognizes the image or the object it represents.

With convolution, the input image goes through a set of these filters. As each filter activates
certain features from the image, it does its work and passes on its output to the filter in the
next layer. Each layer learns to identify different features and the operations end up being
repeated for dozens, hundreds or even thousands of layers. Finally, all the image data
progressing through the CNN's multiple layers allow the CNN to identify the entire object.

Deep learning is a subset of machine learning that uses neural networks with at least three
layers. Compared to a network with just one layer, a network with multiple layers can deliver
more accurate results. Both RNNs and CNNs are used in deep learning, depending on the
application.

For image recognition, image classification and computer vision (CV) applications, CNNs
are particularly useful because they provide highly accurate results, especially when a lot of
data is involved. The CNN also learns the object's features in successive iterations as the
object data moves through the CNN's many layers. This direct (and deep) learning eliminates
the need for manual feature extraction (feature engineering).

CNNs can be retrained for new recognition tasks and built on preexisting networks. These
advantages open up new opportunities to use CNNs for real-world applications without
increasing computational complexities or costs.

As seen earlier, CNNs are more computationally efficient than regular NNs since they use
parameter sharing. The models are easy to deploy and can run on any device, including
smartphones.

RECURRENT NEURAL NETWORK(RNN):

Recurrent Neural Network (RNN) is a type of Neural Network where the output from the
previous step is fed as input to the current step. In traditional neural networks, all the inputs
and outputs are independent of each other. Still, in cases when it is required to predict the
next word of a sentence, the previous words are required and hence there is a need to
remember the previous words. Thus RNN came into existence, which solved this issue with
the help of a Hidden Layer. The main and most important feature of RNN is its Hidden state,
which remembers some information about a sequence. The state is also referred to as
Memory State since it remembers the previous input to the network. It uses the same
parameters for each input as it performs the same task on all the inputs or hidden layers to
produce the output. This reduces the complexity of parameters, unlike other neural networks.

Artificial neural networks that do not have looping nodes are called feed forward neural
networks. Because all information is only passed forward, this kind of neural network is also
referred to as a multi-layer neural network.

Information moves from the input layer to the output layer – if any hidden layers are present
– unidirectionally in a feedforward neural network. These networks are appropriate for image
classification tasks, for example, where input and output are independent. Nevertheless, their
inability to retain previous inputs automatically renders them less useful for sequential data
analysis.

The fundamental processing unit in a Recurrent Neural Network (RNN) is a Recurrent Unit,
which is not explicitly called a “Recurrent Neuron.” This unit has the unique ability to
maintain a hidden state, allowing the network to capture sequential dependencies by
remembering previous inputs while processing. Long Short-Term Memory (LSTM) and
Gated Recurrent Unit (GRU) versions improve the RNN’s ability to handle long-term
dependencies.

Training through RNN

1. A single time step of the input is provided to the network.

2. Its current state is then calculated using the current input and the previous state.

3. The current state ht becomes ht-1 for the next time step.

4. One can go through as many time steps as the problem requires and join the information from
all the previous states.

5. Once all the time steps are completed, the final current state is used to calculate the output.

6. The output is then compared to the actual output, i.e., the target output, and the error is
generated.

7. The error is then back-propagated to the network to update the weights, and hence the
network (RNN) is trained using Backpropagation Through Time (BPTT).
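
A minimal NumPy sketch of the recurrence described above; the layer sizes, random weights, and tanh activation are assumptions chosen only to illustrate how the hidden state is carried from one time step to the next:

import numpy as np

rng = np.random.default_rng(0)
n_input, n_hidden = 4, 8                      # assumed sizes
Wx = rng.normal(size=(n_hidden, n_input))     # input-to-hidden weights
Wh = rng.normal(size=(n_hidden, n_hidden))    # hidden-to-hidden weights
b = np.zeros(n_hidden)

def rnn_forward(inputs):
    # h_t = tanh(Wx x_t + Wh h_{t-1} + b), applied one time step at a time
    h = np.zeros(n_hidden)                    # initial hidden (memory) state
    for x_t in inputs:
        h = np.tanh(Wx @ x_t + Wh @ h + b)    # current state from input and previous state
    return h                                  # final state, used to calculate the output

sequence = rng.normal(size=(5, n_input))      # a toy 5-step input sequence
print(rnn_forward(sequence))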

ADVANTAGES AND DISADVANTAGES OF RECURRENT NEURAL NETWORK

ADVANTAGES
An RNN remembers each and every piece of information through time. It is useful in time
series prediction precisely because of this ability to remember previous inputs; the Long
Short-Term Memory (LSTM) variant extends this capability.

Recurrent neural networks are even used with convolutional layers to extend the effective
pixel neighborhood.

DISADVANTAGES:

 Gradient vanishing and exploding problems.


 Training an RNN is a very difficult task.
 It cannot process very long sequences if using tanh or relu as an activation function.

APPLICATIONS OF RECURRENT NEURAL NETWORK

 Language Modelling and Generating Text


 Speech Recognition
 Machine Translation
 Image Recognition, Face detection
 Time series Forecasting

VARIATION OF RECURRENT NEURAL NETWORK (RNN)

To overcome problems like the vanishing gradient and the exploding gradient, several
new advanced versions of RNNs have been formed; some of these are:

 Bidirectional Neural Network (BiNN)

 Long Short-Term Memory (LSTM)

Bidirectional Neural Network (BiNN)

A BiNN is a variation of a Recurrent Neural Network in which the input information flows in
both directions, and the outputs of both directions are then combined to produce the output. A
BiNN is useful in situations where the context of the input is more important, such as NLP tasks
and time-series analysis problems.

Long Short-Term Memory (LSTM)


Long Short-Term Memory works on the read-write-and-forget principle: given the
input information, the network reads and writes the most useful information from the data and
forgets the information that is not important in predicting the output. To do this,
three new gates are introduced in the RNN. In this way, only the selected information is
passed through the network.

Difference between RNN and Simple Neural Network

An RNN is considered to be the better version of a deep neural network when the data is
sequential. There are significant differences between RNNs and deep neural networks.

TECHNICAL DESCRIPTION

1.1 DOMAIN SPECIFICATION

DEEP NEURAL NETWORKS

A deep neural network is simply a shallow neural network with more than one hidden
layer. Each neuron in the hidden layer is connected to many others. Each arrow has a weight
property attached to it, which controls how much that neuron's activation affects the others
attached to it.

The word 'deep' in deep learning is attributed to these deep hidden layers, and the approach
derives its effectiveness from them. Selecting the number of hidden layers depends on the nature
of the problem and the size of the data set. The following figure shows a deep neural network
with two hidden layers.
In this section, we covered a high-level overview of how an artificial neural network works.
To learn more, see the article on how neural networks work from scratch. You can also take a
deeper look at neural networks in this neural networks deep dive.

APPLICATIONS

Deep learning has a plethora of applications in almost every field such as health care,
finance, and image recognition. In this section, let's go over a few applications.

 Health care: With easier access to accelerated GPU and the availability of huge
amounts of data, health care use cases have been a perfect fit for applying deep
learning. Using image recognition, cancer detection from MRI imaging and x-rays has
been surpassing human levels of accuracy. Drug discovery, clinical trial matching,
and genomics have been other popular health care-based applications.
 Autonomous vehicles: Though self-driving cars is a risky field to automate, it has
recently taken a turn towards becoming a reality. From recognizing a stop sign to
seeing a pedestrian on the road, deep learning-based models are trained and tried
under simulated environments to monitor progress.
 e-commerce: Product recommendations has been one of the most popular and
profitable applications of deep learning. With more personalized and accurate
recommendations, customers are able to easily shop for the items they are looking for
and are able to view all of the options that they can choose from. This also accelerates
sales and thus, benefits sellers.
 Personal assistant: Thanks to advancements in the field of deep learning, having a
personal assistant is as simple as buying a device like Alexa or Google Assistant.
These smart assistants use deep learning in various aspects such as personalized voice
and accent recognition, personalized recommendations, and text generation.

Clearly, these are only a small portion of the vast applications to which deep learning can
be applied. Stock market predictions and weather predictions are also equally popular fields
in which deep learning has been helpful.

CHALLENGES IN DEEP LEARNING

Though deep learning methods gained immense popularity in the last 10 years or so,
the idea has been around since the mid-1950s when Frank Rosenblatt invented the perceptron
on an IBM® 704 machine. It was a two-layer-based electronic device that had the ability to
detect shapes and do reasoning. Advancements in this field in recent years are primarily
because of the increase in computing power and high-performance graphical processing units
(GPUs), coupled with the large increase in the wealth of data these models have at their
disposal for learning, as well as interest and funding from the community for continued
research. Though deep learning has taken off in the last few years, it does come with its own
set of challenges that the community is working hard to resolve.
NEED FOR DATA

The deep learning methods prevalent today are very data hungry, and many complex
problems, such as language translation, do not have sophisticated data sets available. Deep
learning methods for neural machine translation to and from low-resource languages often
perform poorly, and techniques such as domain adaptation (applying learnings gained from
developing high-resource systems to low-resource scenarios) have shown promise in recent
years. For problems such as pose estimation, it can be arduous to generate a high volume of
data, and the synthetic data the model ends up training on can differ considerably from the
"in-the-wild" setting in which the model ultimately needs to perform.

EXPLAINABILITY AND FAIRNESS

Even though deep learning algorithms have surpassed human-level accuracy on some tasks,
there is no clear way to backtrack and provide the reasoning behind each prediction that is
made. This makes such models difficult to use in applications such as finance, where there
are mandates to provide the reasoning behind every loan that is approved or rejected.

Another dimension that tends to be an issue is the underlying bias in the data itself,
which can lead to poor performance of the model on crucial subsets of the data. Learning
agents trained with a reward-based mechanism can also stop behaving as intended, because
all they are optimized to do is maximize the reward they accrue. A well-known example is a
game-playing agent that stopped progressing through the game and instead looped endlessly
to collect reward points. While that might be acceptable in a game scenario, wrong or
unethical decisions can have a profound negative impact in the real world. A strong need
exists to allow models to learn in a balanced fashion.
VII APPENDICES
7.1 CODING
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

from sklearn.linear_model import LogisticRegression


from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

from sklearn.model_selection import train_test_split,cross_val_score


from sklearn.model_selection import RandomizedSearchCV,GridSearchCV
from sklearn.metrics import confusion_matrix,classification_report
from sklearn.metrics import precision_score, recall_score,f1_score
from sklearn.metrics import roc_curve
df = pd.read_csv("heart-disease.csv")
df.shape #(rows, columns)
(303, 14)
df.head()
df.tail()
df["target"].value_counts()
target
1 165
0 138
Name: count, dtype: int64df["target"].value_counts().plot(kind = "bar",
color=["salmon","lightblue"]);
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 303 entries, 0 to 302
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 age 303 non-null int64
1 sex 303 non-null int64
2 cp 303 non-null int64
3 trestbps 303 non-null int64
4 chol 303 non-null int64
5 fbs 303 non-null int64
6 restecg 303 non-null int64
7 thalach 303 non-null int64
8 exang 303 non-null int64
9 oldpeak 303 non-null float64
10 slope 303 non-null int64
11 ca 303 non-null int64
12 thal 303 non-null int64
13 target 303 non-null int64
dtypes: float64(1), int64(13)
memory usage: 33.3 KB

df.isna().sum()
age 0
sex 0
cp 0
trestbps 0
chol 0
fbs 0
restecg 0
thalach 0
exang 0
oldpeak 0
slope 0
ca 0
thal 0
target 0
dtype: int64

df.describe()
df.sex.value_counts()
sex
1 207
0 96
Name: count, dtype: int64

# Compare target column with sex column


pd.crosstab(df.target,df.sex)
# Create a plot of crosstab
pd.crosstab(df.target,df.sex).plot(kind="bar",
figsize=(10,6),
color=["salmon","lightblue"])
plt.title("Heart Disease Frequency for sex")
plt.xlabel("0 = No Disease, 1 = Disease")
plt.ylabel("Amount")
plt.legend(["Female", "Male"])
plt.xticks(rotation = 0);
# Check the distribution of the age column with a histogram
df.age.plot.hist();

pd.crosstab(df.cp, df.target)
pd.crosstab(df.cp, df.target).plot(kind="bar",
figsize=(10, 6),
color=["salmon", "lightblue"])

plt.title("Heart Disease Frequency Per Chest Pain Type")


plt.xlabel("Chest Pain Type")
plt.ylabel("Amount")
plt.legend(["No Disease", "Disease"])
plt.xticks(rotation=0);
df.corr()
corr_matrix = df.corr()
fig, ax = plt.subplots(figsize=(15, 10))
ax = sns.heatmap(corr_matrix,
annot=True,
linewidths=0.5,
fmt=".2f",
cmap="YlGnBu");
df.head()
X = df.drop("target", axis=1)

y = df["target"]
X.head()
y.head()
0    1
1    1
2    1
3    1
4    1
Name: target, dtype: int64

np.random.seed(42)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# KNN ALGORITHM
knn_classifier = KNeighborsClassifier(n_neighbors=5)

# Train the KNN classifier


knn_classifier.fit(X_train, y_train)
from sklearn.metrics import accuracy_score
y_pred_knn = knn_classifier.predict(X_test)

# Evaluate the accuracy


accuracy_KNN = accuracy_score(y_test, y_pred_knn)
print(f'KNN Accuracy: {accuracy_KNN}')
KNN Accuracy: 0.6885245901639344

print("KNN Classification Report:")


print(classification_report(y_test, y_pred_knn))

# Display the confusion matrix


print("KNN Confusion Matrix:")
print(confusion_matrix(y_test, y_pred_knn))

KNN Classification Report:
              precision    recall  f1-score   support

           0       0.69      0.62      0.65        29
           1       0.69      0.75      0.72        32

    accuracy                           0.69        61
   macro avg       0.69      0.69      0.69        61
weighted avg       0.69      0.69      0.69        61

KNN Confusion Matrix:
[[18 11]
 [ 8 24]]

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred_knn)

# Plot confusion matrix using seaborn heatmap


plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
xticklabels=np.unique(y_test), yticklabels=np.unique(y_test))
plt.title('KNN Confusion Matrix')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()
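KNN is distance based, so unscaled features with large ranges (for example cholesterol) can
dominate the distance computation and depress accuracy. The sketch below is an optional
variation, not part of the run above; it assumes a StandardScaler pipeline and keeps
n_neighbors=5.

# Optional sketch: scale the features before KNN, since distances are sensitive to feature ranges
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

knn_scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
knn_scaled.fit(X_train, y_train)
print("Scaled KNN accuracy:", accuracy_score(y_test, knn_scaled.predict(X_test)))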
# RANDOM FOREST ALGORITHM

random_forest_classifier = RandomForestClassifier(n_estimators=100,
random_state=42)

# Train the Random Forest classifier


random_forest_classifier.fit(X_train, y_train)
y_pred_RF = random_forest_classifier.predict(X_test)

# Evaluate the accuracy


accuracy_RF = accuracy_score(y_test, y_pred_RF)
print(f'Random Forest Accuracy: {accuracy_RF}')
Random Forest Accuracy: 0.8360655737704918

print("Random forest Classification Report:")


print(classification_report(y_test, y_pred_RF))

# Display the confusion matrix


print("Random forest Confusion Matrix:")
print(confusion_matrix(y_test, y_pred_RF))
Random forest Classification Report:
              precision    recall  f1-score   support

           0       0.83      0.83      0.83        29
           1       0.84      0.84      0.84        32

    accuracy                           0.84        61
   macro avg       0.84      0.84      0.84        61
weighted avg       0.84      0.84      0.84        61

Random forest Confusion Matrix:
[[24  5]
 [ 5 27]]
cm = confusion_matrix(y_test, y_pred_RF)

# Plot confusion matrix using seaborn heatmap


plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
xticklabels=np.unique(y_test), yticklabels=np.unique(y_test))
plt.title('Random Forest Confusion Matrix')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()
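RandomizedSearchCV and GridSearchCV are imported at the top of the listing but never used;
the sketch below shows one way the random forest could optionally be tuned with
RandomizedSearchCV. The parameter grid, n_iter, and cv values are assumptions chosen only
for illustration.

# Optional sketch: tune the random forest with the RandomizedSearchCV imported earlier
rf_grid = {"n_estimators": [100, 200, 500],
           "max_depth": [None, 5, 10],
           "min_samples_split": [2, 4, 6]}

rs_rf = RandomizedSearchCV(RandomForestClassifier(random_state=42),
                           param_distributions=rf_grid,
                           n_iter=10, cv=5, random_state=42)
rs_rf.fit(X_train, y_train)
print("Best parameters:", rs_rf.best_params_)
print("Tuned Random Forest accuracy:", accuracy_score(y_test, rs_rf.predict(X_test)))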

# ENSEMBLE MODEL (CNN AND RNN)


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix
from tensorflow.keras.models import Model
from tensorflow.keras.layers import (Input, Conv1D, MaxPooling1D, LSTM, Dense,
                                     Flatten, concatenate, Dropout, BatchNormalization)
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras import regularizers

X_reshaped = X.values.reshape((X.shape[0], X.shape[1], 1))  # each sample becomes a length-13 sequence with one channel

X_train, X_test, y_train, y_test = train_test_split(X_reshaped, y,
                                                    test_size=0.2, random_state=42)

# Scale the features: flatten to 2-D for the scaler, then restore the 3-D shape
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train.reshape(X_train.shape[0], -1)).reshape(X_train.shape)
X_test_scaled = scaler.transform(X_test.reshape(X_test.shape[0], -1)).reshape(X_test.shape)

# Model definition
# CNN branch: 1-D convolution over the feature sequence
cnn_input = Input(shape=(X_train_scaled.shape[1], X_train_scaled.shape[2]))
cnn_layer = Conv1D(filters=128, kernel_size=3, activation='relu')(cnn_input)
cnn_layer = MaxPooling1D(pool_size=2)(cnn_layer)
cnn_output = Flatten()(cnn_layer)

# RNN branch: stacked LSTMs over the same feature sequence
rnn_input = Input(shape=(X_train_scaled.shape[1], X_train_scaled.shape[2]))
rnn_layer = LSTM(units=100, activation='relu', return_sequences=True)(rnn_input)
rnn_layer = LSTM(units=100, activation='relu')(rnn_layer)

# Combine the CNN and RNN parts


combined_layer = concatenate([cnn_output, rnn_layer])
combined_layer = Dropout(0.5)(combined_layer)
combined_layer = BatchNormalization()(combined_layer)
final_layer = Dense(units=1, activation='sigmoid',
kernel_regularizer=regularizers.l2(0.01))(combined_layer)

model = Model(inputs=[cnn_input, rnn_input], outputs=final_layer)

model.compile(optimizer=Adam(learning_rate=0.0001),
loss='binary_crossentropy', metrics=['accuracy'])
early_stopping = EarlyStopping(monitor='val_loss', patience=10,
restore_best_weights=True)

# Train the model; both branches receive the same scaled input, so it is passed twice
history = model.fit([X_train_scaled, X_train_scaled], y_train, epochs=100,
                    batch_size=32, validation_split=0.2, callbacks=[early_stopping])

# Save the trained model
model.save("model.h5")
WARNING:tensorflow:From C:\Users\yamin\anaconda3\Lib\site-packages\keras\src\losses.py:2976: The name tf.losses.sparse_softmax_cross_entropy is deprecated. Please use tf.compat.v1.losses.sparse_softmax_cross_entropy instead.

WARNING:tensorflow:From C:\Users\yamin\anaconda3\Lib\site-packages\keras\src\backend.py:1398: The name tf.executing_eagerly_outside_functions is deprecated. Please use tf.compat.v1.executing_eagerly_outside_functions instead.

WARNING:tensorflow:From C:\Users\yamin\anaconda3\Lib\site-packages\keras\src\backend.py:6642: The name tf.nn.max_pool is deprecated. Please use tf.nn.max_pool2d instead.

Epoch 1/100

WARNING:tensorflow:From C:\Users\yamin\anaconda3\Lib\site-packages\keras\src\utils\tf_utils.py:492: The name tf.ragged.RaggedTensorValue is deprecated. Please use tf.compat.v1.ragged.RaggedTensorValue instead.

WARNING:tensorflow:From C:\Users\yamin\anaconda3\Lib\site-packages\keras\src\engine\base_layer_utils.py:384: The name tf.executing_eagerly_outside_functions is deprecated. Please use tf.compat.v1.executing_eagerly_outside_functions instead.

7/7 [==============================] - 6s 152ms/step - loss: 0.8930 - accuracy: 0.4611 - val_loss: 0.7086 - val_accuracy: 0.5714
Epoch 2/100
7/7 [==============================] - 0s 26ms/step - loss: 0.8791 - accuracy: 0.4663 - val_loss: 0.7059 - val_accuracy: 0.5714
Epoch 3/100
7/7 [==============================] - 0s 25ms/step - loss: 0.8316 - accuracy: 0.5285 - val_loss: 0.7029 - val_accuracy: 0.5714
Epoch 4/100
7/7 [==============================] - 0s 24ms/step - loss: 0.8555 - accuracy: 0.5492 - val_loss: 0.6990 - val_accuracy: 0.6531
Epoch 5/100
7/7 [==============================] - 0s 27ms/step - loss: 0.7544 - accuracy: 0.5803 - val_loss: 0.6955 - val_accuracy: 0.6531
Epoch 6/100
7/7 [==============================] - 0s 28ms/step - loss: 0.8032 - accuracy: 0.5544 - val_loss: 0.6917 - val_accuracy: 0.6531
Epoch 7/100
7/7 [==============================] - 0s 26ms/step - loss: 0.7810 - accuracy: 0.5596 - val_loss: 0.6875 - val_accuracy: 0.7143
Epoch 8/100
...
7/7 [==============================] - 0s 30ms/step - loss: 0.5342 - accuracy: 0.7617 - val_loss: 0.5368 - val_accuracy: 0.7143
Epoch 50/100
7/7 [==============================] - 0s 26ms/step - loss: 0.4601 - accuracy: 0.7824 - val_loss: 0.5333 - val_accuracy: 0.7347
Epoch 51/100
...

Epoch 52/100

7/7 [==============================] - 0s 28ms/step - loss: 0.5480 - accuracy: 0.7306 - val_loss: 0.5322 - val_accuracy: 0.7551
Epoch 53/100
7/7 [==============================] - 0s 27ms/step - loss: 0.5619 - accuracy: 0.7306 - val_loss: 0.5307 - val_accuracy: 0.7551
Epoch 54/100
7/7 [==============================] - 0s 27ms/step - loss: 0.4544 - accuracy: 0.8031 - val_loss: 0.5299 - val_accuracy: 0.7551
Epoch 55/100
7/7 [==============================] - 0s 28ms/step - loss: 0.4599 - accuracy: 0.7979 - val_loss: 0.5285 - val_accuracy: 0.7551
Epoch 56/100
7/7 [==============================] - 0s 26ms/step - loss: 0.4792 - accuracy: 0.7668 - val_loss: 0.5286 - val_accuracy: 0.7551
Epoch 57/100
7/7 [==============================] - 0s 30ms/step - loss: 0.5016 - accuracy: 0.7617 - val_loss: 0.5281 - val_accuracy: 0.7347
Epoch 58/100
7/7 [==============================] - 0s 28ms/step - loss: 0.5130 - accuracy: 0.7824 - val_loss: 0.5270 - val_accuracy: 0.7347
Epoch 59/100
7/7 [==============================] - 0s 27ms/step - loss: 0.4922 - accuracy: 0.7720 - val_loss: 0.5260 - val_accuracy: 0.7347
Epoch 60/100
7/7 [==============================] - 0s 27ms/step - loss: 0.5161 - accuracy: 0.7254 - val_loss: 0.5247 - val_accuracy: 0.7347
Epoch 61/100
7/7 [==============================] - 0s 28ms/step - loss: 0.4627 - accuracy: 0.7979 - val_loss: 0.5229 - val_accuracy: 0.7551
Epoch 62/100
7/7 [==============================] - 0s 27ms/step - loss: 0.4725 - accuracy: 0.7617 - val_loss: 0.5233 - val_accuracy: 0.7347
Epoch 63/100
7/7 [==============================] - 0s 27ms/step - loss: 0.4886 - accuracy: 0.7979 - val_loss: 0.5227 - val_accuracy: 0.7347
...
Epoch 78/100
7/7 [==============================] - 0s 27ms/step - loss: 0.4561 - accuracy: 0.7720 - val_loss: 0.5205 - val_accuracy: 0.7755
Epoch 79/100
7/7 [==============================] - 0s 28ms/step - loss: 0.4746 - accuracy: 0.7824 - val_loss: 0.5207 - val_accuracy: 0.7755
...

C:\Users\yamin\anaconda3\Lib\site-packages\keras\src\engine\training.py:3103: UserWarning: You are saving your model as an HDF5 file via `model.save()`. This file format is considered legacy. We recommend using instead the native Keras format, e.g. `model.save('my_model.keras')`.
  saving_api.save_model(

plt.plot(history.history['accuracy'], label='Train Accuracy')


plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.title('Model Training Accuracy')
plt.legend()
plt.show()

# Evaluate the ensemble model on the test set
accuracy_EN = model.evaluate([X_test_scaled, X_test_scaled], y_test)[1]
print(f'Test Accuracy: {accuracy_EN}')

2/2 [==============================] - 0s 13ms/step - loss: 0.4883 - accuracy: 0.7869
Test Accuracy: 0.7868852615356445
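For parity with the KNN and random forest sections, a classification report and confusion
matrix can also be produced for the ensemble from its predicted probabilities; the 0.5 decision
threshold below is the usual default and is assumed here rather than taken from the original
notebook run.

# Optional sketch: convert the ensemble's probabilities to class labels and report them
y_prob_EN = model.predict([X_test_scaled, X_test_scaled])
y_pred_EN = (y_prob_EN > 0.5).astype(int).ravel()

print("Ensemble Classification Report:")
print(classification_report(y_test, y_pred_EN))
print("Ensemble Confusion Matrix:")
print(confusion_matrix(y_test, y_pred_EN))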


model_scores = {'KNN': accuracy_KNN, 'Random Forest': accuracy_RF, 'CNN_LSTM': accuracy_EN}

# Create a DataFrame
model_compare = pd.DataFrame(model_scores, index=["accuracy"])

# Plot the bar graph for accuracy comparison


model_compare.T.plot.bar(rot=0)
plt.title('Model Comparison')
plt.ylabel('Accuracy')
plt.show()
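A single 80/20 split of 303 rows can make the comparison above sensitive to how the rows
happen to be divided. The sketch below, an optional addition using the cross_val_score
import from the top of the listing, averages accuracy over five folds for the two scikit-learn
models; the 5-fold setting is an assumption chosen only for illustration.

# Optional sketch: 5-fold cross-validated accuracy for the scikit-learn models
cv_knn = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5, scoring="accuracy")
cv_rf = cross_val_score(RandomForestClassifier(n_estimators=100, random_state=42),
                        X, y, cv=5, scoring="accuracy")
print("KNN 5-fold mean accuracy:", cv_knn.mean())
print("Random Forest 5-fold mean accuracy:", cv_rf.mean())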
SCREENSHOTS

NEGATIVE RESULT:
POSITIVE RESULT:
CONCLUSION
CONCLUSION:

The project successfully introduces an ensemble approach for heart disease prediction,
showcasing improved accuracy and interpretability. The ensemble framework, combining
CNN and RNN, proves to be a promising avenue for advancing predictive modeling in
healthcare. The comprehensive report serves as a valuable resource for stakeholders,
outlining methodologies, results, and future avenues for research and improvement.
FUTURE ENHANCEMENT

The Data Processing Module successfully handled missing values, outliers, and applied
feature scaling, resulting in a refined and cleaned dataset. The thorough data processing lays a
solid foundation for subsequent model training, ensuring that the models are fed with high-
quality and standardized inputs. The Ensemble Model Module effectively combined
predictions from the Convolutional Neural Network (CNN) and the Recurrent Neural
Network (RNN), showcasing improved predictive accuracy. By leveraging the strengths of
both CNN and RNN, the ensemble approach demonstrates the potential for enhanced
performance, capturing a more comprehensive understanding of heart disease risk factors.
The integration of CNN and RNN in an ensemble demonstrates the potential for harnessing
diverse modeling approaches. The project contributes to the field of healthcare analytics,
offering a robust tool for early heart disease detection and informed decision-making.
IX BIBLIOGRAPHY

[1] S. J. Pasha and E. S. Mohamed, "Novel feature reduction (NFR) model with machine learning
and data mining algorithms for effective disease risk prediction," IEEE Access, vol. 8, pp.
184087–184108, 2020.

[2] Y. Khan, U. Qamar, N. Yousaf, and A. Khan, "Machine learning techniques for heart disease
datasets: A survey," in Proc. 11th Int. Conf. Mach. Learn. Comput. (ICMLC), Zhuhai, China,
2019, pp. 27–35.

[3] S. Goel, A. Deep, S. Srivastava, and A. Tripathi, "Comparative analysis of various
techniques for heart disease prediction," in Proc. 4th Int. Conf. Inf. Syst. Comput. Netw.
(ISCON), Mathura, India, Nov. 2019, pp. 88–94.

[4] A. Lakshmanarao, Y. Swathi, and P. S. S. Sundareswar, "Machine learning techniques for
heart disease prediction," Int. J. Sci. Technol. Res., vol. 8, no. 11, p. 97, Nov. 2019.

[5] S. Mohan, C. Thirumalai, and G. Srivastava, "Effective heart disease prediction using
hybrid machine learning techniques," IEEE Access, vol. 7, pp. 81542–81554, 2019.

[6] A. K. Gárate-Escamila, A. Hajjam El Hassani, and E. Andrès, "Classification models for
heart disease prediction using feature selection and PCA," Informat. Med. Unlocked, vol. 19,
Jan. 2020, Art. no. 100330.

[7] D. W. Hosmer, S. Lemeshow, and E. D. Cook, Applied Logistic Regression, 2nd ed. New
York, NY, USA: Wiley, 2000.

[8] E. Nasarian, M. Abdar, M. A. Fahami, R. Alizadehsani, S. Hussain, M. E. Basiri, M.
Zomorodi-Moghadam, X. Zhou, P. Pławiak, U. R. Acharya, R.-S. Tan, and N. Sarrafzadegan,
"Association between work-related features and coronary artery disease: A heterogeneous
hybrid feature selection integrated with balancing approach," Pattern Recognit. Lett., vol.
133, pp. 33–40, May 2020.

[9] R. Atallah and A. Al-Mousa, "Heart disease detection using machine learning majority
voting ensemble method," in Proc. 2nd Int. Conf. New Trends Comput. Sci. (ICTCS), Oct.
2019, pp. 1–6.

[10] A. Gupta, L. Kumar, R. Jain, and P. Nagrath, "Heart disease prediction using
classification (naive Bayes)," in Proc. 1st Int. Conf. Comput., Commun., Cyber-Secur. (ICS).
Singapore: Springer, 2020, pp. 561–573.
