EFFICIENCY OF HEART DISEASE PREDICTION USING GENETIC ALGORITHM
Submitted By
B. SRI KAVYA
(182U1A0510)
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
BONAFIDE CERTIFICATE
This is to certify that the project work entitled “EFFICIENCY OF HEART DISEASE
PREDICTION USING GENETIC ALGORITHM” is a bonafide record of work done by B. SRI KAVYA
(182U1A0510), L. REKHA (182U1A0549), K. CHANDANA (182U1A0545), B. PRANAVI
(182U1A0509), in the Department of Computer Science & Engineering, Geethanjali Institute of
Science and Technology, Nellore, and is submitted to Jawaharlal Nehru Technological University,
Anantapur in partial fulfillment of the requirements for the award of the B.Tech degree in
Computer Science & Engineering. This work has been carried out under my supervision.
(2018-2022)
ACKNOWLEDGEMENTS
The satisfaction that accompanies the successful completion of the project would be
incomplete without the people who made it possible. Their constant guidance and
encouragement crowned the efforts with success.
We owe our gratitude to Dr. G. SUBBA RAO, M.Tech, Ph.D, MIE, LMISTE,
MSAE, PRINCIPAL, Geethanjali Institute of Science and Technology, Nellore, for his
consistent help and valuable suggestions.
Our special thanks to Dr. V. SIREESHA, M.E., Ph.D., Professor & Head of the
Department, Department of Computer Science & Engineering, Geethanjali Institute of Science
and Technology, Nellore, for her timely suggestions and help during the progress of project
work in spite of her busy schedule.
It is indeed our proud privilege to express our deep sense of gratitude and indebtedness
to our guide, V. GAYATRI, M.E., (Ph.D) Associate Professor Computer Science &
Engineering, Geethanjali Institute of Science and Technology, Nellore, for her keen interest,
critical, constructive and skillful guidance and constant encouragement throughout the course
and for successful completion of project.
During the entire course of dissertation work, we received valuable academic inputs
as well as moral support from other departments, general teaching and non-teaching faculty at
GEETHANJALI INSTITUTE OF SCIENCE AND TECHNOLOGY, Nellore. We
were motivated by the support and moral encouragement given to us by our beloved parents.
Finally, we wish to express our sincere thanks to all those who helped us directly or indirectly
to complete the work.
PROJECT ASSOCIATES
LIST OF FIGURES
LIST OF TABLES
LIST OF GRAPHS
7.3 Advantages
7.5 Operators
8. CONCLUSION
9. FUTURE ENHANCEMENT
10. BIBLIOGRAPHY
ABSTRACT
The human heart is a vital organ in the human body. By pumping blood throughout
the body, it supports body functioning and removes waste products. Whenever heart disease
or heart failure occurs, human lives are at serious risk. Machine Learning is one of the most
widely used concepts around the world. It will be essential in the healthcare sector, where it
will help doctors speed up diagnosis.
The objective of this project is to build a Machine Learning model for heart disease prediction
based on the related attributes. The heart disease prediction dataset consists of 14 different
parameters related to heart disease. The proposed work deals with Machine Learning
algorithms, specifically bio-inspired optimization algorithms. This family includes four
feature-optimizing algorithms: the Genetic Algorithm, the Bat algorithm, the Bee algorithm
and the ACO algorithm. Here we implement three of them: the Genetic, Bat and Bee
algorithms. Using these bio-inspired algorithms, we analyse the data and predict whether a
patient has heart disease or not, and the stage of the disease. The Genetic Algorithm gives
more accuracy in less time for the prediction, which makes diagnosis faster and more efficient
in healthcare sectors. This model can be helpful to medical practitioners at their clinics as a
decision support system.
LIST OF GRAPHS

S.NO GRAPH NO GRAPH NAME
1. 7.8.2 Data Analysis
2. 7.8.3 Accuracy Graph

LIST OF ABBREVIATIONS
Chapter 1
INTRODUCTION
1. INTRODUCTION
1.1 Introduction
Health diseases are increasing day by day due to lifestyle and heredity. Among
them, heart disease has been the most important cause of death in humankind over the past
few years. The human heart is a vital organ; by pumping blood throughout the body, it
supports body functioning and removes waste products, and human lives are at serious risk
whenever heart disease or heart failure occurs. Nearly 17.9 million people die every year as
an outcome of heart diseases. There is no single solution to the rising burden of heart
disease, given the massive transformation in ethnic as well as economic environments. Heart
failure prognosis has historically been an extremely challenging task because of its high cost:
the price of the wide range of modern imaging and clinical methodologies for the diagnosis
of heart disease is prohibitively high. Leading symptoms of cardiac disease include chest
discomfort, dyspnoea, fatigue, edema, palpitations, syncope, cough, hemoptysis, and
cyanosis.
A computer-based clinical decision support system can reduce medical errors, improve
patient safety, reduce unnecessary variation in practice, and improve prognosis by
integrating the patient’s medical history. Machine Learning is one of the most widely used
concepts around the world. It will be essential in the healthcare sector, where it will help
doctors speed up diagnosis.
The main objective of this study is to develop a prototype heart disease forecasting system
using bio-inspired algorithms. Rich knowledge and accurate data in the field not only help
users by providing effective treatment, but also help to reduce the cost of treatment and
improve the visualization and ease of explanation. The bio-inspired optimization family has
four feature-optimizing algorithms: the Genetic Algorithm, the Bat algorithm, the Bee
algorithm and the ACO algorithm. Here we implement three of them: the Genetic, Bat and
Bee algorithms. Using these bio-inspired algorithms, we analyse the data and predict whether
a patient has heart disease or not, and the stage of the disease. The Genetic Algorithm gives
more accuracy in less time for the prediction.
Machine Learning is the field of study that gives computers the capability to learn
without being explicitly programmed. ML is one of the most exciting technologies one
could come across. As is evident from the name, it gives the computer the quality that makes
it more similar to humans: the ability to learn. Machine learning is actively being used today,
perhaps in many more places than one would expect.
Machine Learning (ML) has proven to be one of the most game-changing technological
advancements of the past decade. In the increasingly competitive corporate world, ML is
enabling companies to fast-track digital transformation and move into an age of automation.
ML is required to stay relevant in some verticals, such as digital payments and fraud
detection in banking or product recommendations. The eventual adoption of machine learning
algorithms and its pervasiveness in enterprises is also well-documented, with different
companies adopting machine learning at scale across verticals. Today, every other app and
software all over the Internet uses machine learning in some form or the other.
Machine learning is fundamentally set apart from conventional software by its capability
to evolve. Using various programming techniques, machine learning algorithms are
able to process large amounts of data and extract useful information. In this way, they can
improve upon their previous iterations by learning from the data they are provided.
Machine learning is a subset of AI, which enables the machine to automatically learn
from data, improve performance from past experiences, and make predictions. Machine
learning contains a set of algorithms that work on a huge amount of data. Data is fed to these
algorithms to train them, and on the basis of training, they build the model & perform a
specific task. These ML algorithms help to solve different business problems like Regression,
Classification, Forecasting, Clustering, and Associations, etc.
The bio-inspired optimization family considered in this work comprises four algorithms:
Genetic Algorithm
BAT Algorithm
BEE Algorithm
ACO Algorithm
Data mining is the process of extracting and discovering patterns in large data
sets involving methods at the intersection of machine learning, statistics, and database
systems.
The actual data mining task is the semi-automatic or automatic analysis of large quantities of
data to extract previously unknown, interesting patterns such as groups of data records,
unusual records, and dependencies. This usually involves using database techniques such as
spatial indices. These patterns can then be seen as a kind of summary of the input data, and
may be used in further analysis or, for example, in machine learning and predictive analytics.
For example, the data mining step might identify multiple groups in the data, which can then
be used to obtain more accurate prediction results by a decision support system. Neither the
data collection, data preparation, nor result interpretation and reporting is part of the data
mining step, although they do belong to the overall KDD process as additional steps.
The term "data mining" is a misnomer because the goal is the extraction of patterns and
knowledge from large amounts of data, not the extraction of data itself. It also is
a buzzword and is frequently applied to any form of large-scale data or information
processing as well as any application of computer decision support system, including
artificial intelligence and business intelligence.
1.6 OVERVIEW
Machine Learning is one of the most widely used concepts around the world. It will
be essential in the healthcare sector, where it will help doctors speed up diagnosis.
The objective of this project is to build a Machine Learning model for heart disease prediction
based on the related attributes. The heart disease prediction dataset consists of 14 different
parameters related to heart disease. The raw data for heart disease prediction is a collection
of historical records that includes a variety of important attributes: age, sex, cp, trestbps,
chol, fbs, restecg, thalach, exang, oldpeak, slope, ca, thal and class are the 14 main attributes
taken in the dataset. This dataset then undergoes data preprocessing.
The proposed work deals with Machine Learning algorithms, specifically bio-inspired
optimization algorithms: the Genetic, Bat and Bee algorithms are implemented. Using these
bio-inspired algorithms, we analyse the data and predict whether a patient has heart disease
or not, and the stage of the disease. The result is displayed in the GUI. The Genetic
Algorithm gives more accuracy in less time for the prediction, which makes diagnosis faster
and more efficient in healthcare sectors.
The objective of this paper is to evaluate an imbalanced dataset with the help of
various machine learning models, namely the bio-inspired algorithms. The family has four
feature-optimizing algorithms: the Genetic Algorithm, the Bat algorithm, the Bee algorithm
and the ACO algorithm. We then determine which of these is the best-suited model for heart
disease prediction.
The major challenge in heart disease is its detection. Instruments are available
that can predict heart disease, but they are either expensive or not efficient enough to
calculate the chance of heart disease in humans. Early detection of cardiac diseases can
decrease the mortality rate and overall complications. However, it is not possible to monitor
patients accurately every day in all cases, and round-the-clock consultation with a doctor is
not available, since it requires more time and expertise. Since we have a good amount of data
in today’s world, we can use various machine learning algorithms to analyze the data for
hidden patterns. The hidden patterns can be used for health diagnosis in medicinal data.
Chapter 2
LITERATURE SURVEY
2. LITERATURE SURVEY
As per a recent study by the WHO, heart-related diseases are increasing: 17.9 million
people die every year due to them. With a growing population, it gets further difficult to
diagnose heart disease and start treatment at an early stage. But due to recent advancements
in technology, Machine Learning techniques have accelerated the health sector through
multiple research efforts. Thus, the objective of this project is to build an ML model for heart
disease prediction based on the related parameters. We have used the benchmark UCI heart
disease dataset for this research work, which consists of 14 different parameters related to
heart disease. Machine Learning algorithms such as Random Forest, Support Vector Machine
(SVM), Naive Bayes and Decision Tree have been used for the development of the model. In
our research we have also tried to find the correlations between the different attributes
available in the dataset with the help of standard Machine Learning methods, and then to use
them efficiently in the prediction of the chances of heart disease. Results show that, compared
to other ML techniques, Random Forest gives more accuracy in less time for the prediction.
This model can be helpful to medical practitioners at their clinics as a decision support
system.
2.2 Heart Disease Diagnosis and Prediction Using Machine Learning and
Data Mining Techniques
Heart disease is the main reason for death in the world over the last decade. Almost
one person dies of Heart disease about every minute in the United States alone. Researchers
have been using several data mining techniques to help health care professionals in the
diagnosis of heart disease. However, using data mining techniques can reduce the number of
tests that are required. In order to reduce the number of deaths from heart diseases, there has
to be a quick and efficient detection technique. Decision Tree is one of the effective data
mining methods used. This research compares different Decision Tree classification
algorithms, seeking better performance in heart disease diagnosis using WEKA. The
algorithms tested are the J48 algorithm, the Logistic Model Tree algorithm and the Random
Forest algorithm. The existing dataset of heart disease patients from the Cleveland database
of the UCI repository is used to test and justify the performance of the decision tree
algorithms. This dataset consists of 303 instances and 76 attributes. Subsequently, the
classification algorithm with the best performance is suggested for use on sizeable data. The
goal of this study is to extract hidden patterns noteworthy to heart diseases by applying data
mining techniques, and to predict the presence of heart disease in patients, where this
presence is valued from no presence to likely presence.
2.4 Decision support system for heart disease based on support vector
machine and artificial neural network
A decision support system can present medical diagnostic procedures in a rational, objective,
accurate and fast way. This paper presents a decision support system for heart disease
classification based on the support vector machine (SVM) and the Artificial Neural Network
(ANN). A multilayer perceptron neural network (MLPNN) with three layers is employed to
develop a decision support system for the diagnosis of heart disease. The multilayer
perceptron neural network is trained by the back-propagation algorithm, which is a
computationally efficient method. Results obtained show that an MLPNN with back-
propagation can diagnose heart disease more successfully than a support vector machine.
There is an increase in death rate yearly as a result of heart diseases. One of the major
factors that cause this increase is misdiagnoses on the part of medical doctors or ignorance on
the part of the patient. Heart diseases can be described as any kind of disorder that affects the
heart. In this research work, causes of heart diseases, the complications and the remedies for
the diseases have been considered. An intelligent system which can diagnose heart diseases
has been implemented. This system will prevent misdiagnosis, which is the major error that
may be made by medical doctors. The Statlog heart disease dataset has been used to carry out
this experiment. The dataset comprises attributes of patients diagnosed for heart diseases. The
diagnosis was used to confirm whether heart disease is present or absent in the patient. The
datasets were obtained from the UCI Machine Learning Repository. This dataset was divided
into a training set, a validation set and a testing set, to be fed into the network. The intelligent
system was modeled on a feed-forward multilayer perceptron and a support vector machine.
The recognition rates obtained from these models were then compared to ascertain the best
model for the intelligent system, given its significance in the medical field. The results
obtained are 85% and 87.5% for the feed-forward multilayer perceptron and the support
vector machine, respectively. From this experiment we discovered that the support vector
machine is the best network for the diagnosis of heart disease.
Data mining is a fast-developing technique that revolves around exploring and digging
out significant information from massive collections of data, which can be further beneficial
in examining and drawing out patterns for making business-related decisions. In the medical
domain, the implementation of data mining can yield the discovery and extraction of valuable
patterns and information which can prove beneficial in performing clinical diagnosis. The
research focuses on heart disease diagnosis by considering previous data and information. To
achieve this, SHDP (Smart Heart Disease Prediction) is built via Naïve Bayesian
classification in order to predict risk factors concerning heart disease. The speedy
advancement of technology has led to a remarkable rise in mobile health technology, one
form being the web application. The required data is assembled in a standardized form. For
predicting the chances of heart disease in a patient, the following attributes are fetched from
the medical profiles: age, BP, cholesterol, sex, blood sugar, etc. The collected attributes act as
input for the Naïve Bayesian classification for predicting heart disease. The dataset utilized is
split into two sections: 80% is utilized for training and the remaining 20% for testing. The
proposed approach includes the following stages: dataset collection, user registration and
login (application based), classification via Naïve Bayesian, prediction, and secure data
transfer by employing AES (Advanced Encryption Standard).
Thereafter the result is produced. The research elaborates and presents multiple knowledge
abstraction techniques by making use of data mining methods which are adopted for heart
disease prediction. The output reveals that the established diagnostic system effectively
assists in predicting risk factors concerning heart diseases.
2.7 Design of a hybrid system for the diabetes and heart diseases
The proposed method achieved accuracy values of 84.24% and 86.8% for the Pima Indians
diabetes dataset and the Cleveland heart disease dataset, respectively. It has been observed
that these results are among the best compared with the results obtained from related previous
studies and reported on the UCI web sites.
2.8 Intelligent heart disease prediction system using data mining techniques
Nowadays the artificial neural network (ANN) is widely used as a tool for solving
many decision modelling problems. A multilayer perceptron is a feed-forward ANN model
that is used extensively for the solution of a number of different problems. An ANN is a
simulation of the human brain. It is a supervised learning technique used for non-linear
classification. Coronary heart disease is a major epidemic in India, and Andhra Pradesh is at
risk of coronary heart disease. Clinical diagnosis is done mostly by the doctor’s expertise, and
patients are asked to take a number of diagnostic tests; but not all of the tests contribute
towards effective diagnosis of the disease. Feature subset selection is a pre-processing step
used to reduce dimensionality and remove irrelevant data. In this paper we introduce a
classification approach which uses an ANN and feature subset selection for the classification
of heart disease. PCA is used for pre-processing and to reduce the number of attributes, which
indirectly reduces the number of diagnostic tests a patient needs to take. We applied our
approach on the database. Our experimental results show that accuracy improved over
traditional classification techniques. This system is feasible, faster and more accurate for the
diagnosis of heart disease.
2.10 Heart disease prediction using data mining with map-reduce algorithm
The World Health Organization (WHO) estimated that cardiovascular diseases (CVD)
are the major cause of mortality globally, as well as in India. They are caused by disorders of
the heart and blood vessels, and include coronary heart disease (heart attacks). Data mining
plays a major role in the construction of an intelligent prediction model for healthcare
systems to detect heart disease (HD) using patient data sets, which supports doctors in
diminishing the mortality rate due to heart disease. Several research efforts have been carried
out to build models using data mining individually or combined with computational
techniques involving Decision Trees (DT) and Naïve Bayes (NB) along with meta-heuristic
approaches, trained neural networks (NN), machine intelligence or AI, and algorithms such as
KNN and the Support Vector Machine (SVM). In the proposed system, a large set of medical
instances is taken as input. From this medical dataset, the aim is to extract the needed
information from the records of heart patients using the Map-Reduce technique. The
performance of the proposed Map-Reduce algorithm’s implementation in parallel and
distributed systems was evaluated using the Cleveland dataset and compared with that of the
established ANN method. The trial results verify that the projected method achieves an
average prediction accuracy of 98%, which is greater than the conventional recurrent fuzzy
neural network. In addition, this Map-Reduce technique also performed better than previous
methods that reported prediction accuracies in the range of 95–98%. These findings suggest
that the Map-Reduce technique could be used to accurately predict HD risks in the clinic.
Chapter 3
METHODOLOGY
3. METHODOLOGY
The architecture of the proposed system is displayed in the figure below. The major
components of the architecture are as follows: patient database, preprocessing, training the
model, testing the model, algorithms, and prediction of heart disease. Many disease
prediction systems do not use some of the risk factors such as age, sex, blood pressure,
cholesterol, chest pain, etc. Without these vital risk factors, the result will not be very
accurate. In this proposed work, 14 important risk factors are used to predict heart disease in
an accurate manner. This makes data preparation the most important step in this process.
Along with the data, another important step is selecting the most suitable algorithms, such as
the Genetic, BAT and BEE algorithms.
1. Data Set: The dataset we use was obtained from the website kaggle.com. These are the
attributes taken in the dataset: age, sex, cp, trestbps, chol, fbs, restecg, thalach, exang,
oldpeak, slope, ca, thal, class.
2. Data Preprocessing: The collected dataset is divided into two sections, a training part and
a testing part.
3. Bio-Inspired Algorithms: The algorithms used here are the Genetic, BAT and BEE
algorithms, to predict the required output.
4. Model Evaluation: The trained model is evaluated on the testing part.
5. Prediction: This is the final step of the system, in which the output is displayed in the GUI.
Fig: System architecture — Data sets → Data retrieval → Data preprocessing →
Training/Testing → Bio-inspired algorithms (Genetic algorithm, BAT algorithm, BEE
algorithm) → Model evaluation.
3.3 DATASET
The raw data for heart disease prediction is a collection of historical records that
includes a variety of important attributes: age, sex, cp, trestbps, chol, fbs, restecg, thalach,
exang, oldpeak, slope, ca, thal and class are the 14 main attributes taken in the dataset. All
attributes are numeric-valued. The data was collected from the following location: Cleveland
Clinic Foundation.
3.3.2 DataSets
Here the dataset summary shows the count, mean, std, min and max values, etc. It is helpful
for understanding each particular attribute of the dataset.
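This summary step can be sketched with the pandas describe() method. The column names follow the dataset's attribute list, but the rows below are synthetic illustrative values, not real patient records:

```python
import pandas as pd

# Synthetic example rows (illustrative values only, not real patient data)
df = pd.DataFrame({
    "age":      [63, 37, 41, 56, 57],
    "trestbps": [145, 130, 130, 120, 120],
    "chol":     [233, 250, 204, 236, 354],
})

# describe() reports count, mean, std, min, quartiles and max per attribute
summary = df.describe()
print(summary)
```

Reading the count row against the expected 303 rows is also a quick way to spot attributes with missing values.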
Since the raw data are incomplete, preprocessing should be done to bring them into
complete form. Because data is collected from multiple sources in different formats, more
than half our time is spent dealing with data quality issues when working on a machine
learning problem. It is simply unrealistic to expect that the data will be perfect. There may be
problems due to human error, limitations of measuring devices, or flaws in the data collection
process. Let’s cover some of the techniques to deal with them.
a. Missing Values
A simple strategy is to eliminate data objects with missing values; this is sometimes
effective but fails if many objects have missing values. If a feature has mostly missing
values, then that feature itself can also be eliminated.
If only a reasonable percentage of values are missing, then we can also run simple
interpolation methods to fill in those values. However, the most common method we have
used to deal with missing values is by filling them in with the mean, median or mode value of
the respective feature.
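A minimal sketch of these strategies with pandas follows; the 50% drop threshold and the gap positions are illustrative assumptions, not values from the project:

```python
import numpy as np
import pandas as pd

# Synthetic frame with gaps (not real patient data)
df = pd.DataFrame({
    "chol": [233.0, np.nan, 204.0, 236.0],
    "thal": [3.0, 7.0, np.nan, 7.0],
})

# Drop any column that is mostly missing (here: fewer than 50% values present)
df = df.dropna(axis=1, thresh=int(0.5 * len(df)))

# Fill remaining gaps: mean for a continuous column, mode for a coded one
df["chol"] = df["chol"].fillna(df["chol"].mean())
df["thal"] = df["thal"].fillna(df["thal"].mode()[0])
print(df)
```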
b. Duplicate Values
A dataset may include data objects which are duplicates of one another. It may
happen when the same person submits a form more than once. The term deduplication is
often used to refer to the process of dealing with duplicates. In most cases, the duplicates are
removed so as to not give that particular data object an advantage or bias, when running
machine learning algorithms.
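Deduplication as described above is a one-liner in pandas (the duplicated rows below are a synthetic example of the same record being submitted twice):

```python
import pandas as pd

# The same record submitted twice (synthetic example)
df = pd.DataFrame({
    "age": [63, 63, 41],
    "sex": [1, 1, 0],
})

# drop_duplicates keeps the first occurrence of each identical row
deduped = df.drop_duplicates().reset_index(drop=True)
print(len(deduped))  # 2
```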
Feature Aggregation
Feature aggregation combines individual values into aggregated values in order to put
the data in a better perspective. This results in a reduction of memory consumption and
processing time. Aggregations provide us with a high-level view of the data, as the behavior
of groups or aggregates is more stable than that of individual data objects.
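Aggregation can be sketched with a pandas groupby. The multi-visit layout below is a hypothetical illustration (the project's dataset has one row per patient); the point is only that the aggregated value is more stable than any single reading:

```python
import pandas as pd

# Hypothetical repeated blood-pressure readings per patient
visits = pd.DataFrame({
    "patient":  ["p1", "p1", "p2", "p2"],
    "trestbps": [145, 135, 120, 124],
})

# The aggregated mean is more stable than any individual reading
agg = visits.groupby("patient")["trestbps"].mean()
print(agg)
```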
Feature Sampling
Sampling is a very common method for selecting a subset of the dataset that we are
analyzing. In most cases, working with the complete dataset can turn out to be too expensive
considering the memory and time constraints. Using a sampling algorithm can help us reduce
the size of the dataset to a point where we can use a better, but more expensive, machine
learning algorithm. The key principle here is that the sampling should be done in such a
manner that the sample generated should have approximately the same properties as the
original dataset, meaning that the sample is representative. This involves choosing the correct
sample size and sampling strategy.
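A sketch of simple random sampling with pandas; the 20% fraction and the random seed are illustrative choices, not values prescribed by the project:

```python
import pandas as pd

# A toy frame standing in for a larger dataset
df = pd.DataFrame({"age": range(100), "label": [0, 1] * 50})

# Draw a reproducible 20% random sample of the rows
sample = df.sample(frac=0.2, random_state=42)
print(len(sample))  # 20
```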
Feature Encoding
The whole purpose of data preprocessing is to encode the data in order to bring it to
such a state that the machine can now understand it. Feature encoding basically performs
transformations on the data such that it can be easily accepted as input for machine learning
algorithms while still retaining its original meaning. There are some general norms or rules
which are followed when performing feature encoding. For nominal (categorical) variables,
any one-to-one mapping which retains the meaning can be used, for instance a mapping of
values as in one-hot encoding.
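One-hot encoding of a nominal attribute can be sketched with pandas get_dummies, using the dataset's cp (chest pain type) code as the example:

```python
import pandas as pd

# 'cp' (chest pain type) is nominal: its codes have no numeric order
df = pd.DataFrame({"cp": [0, 1, 2, 1]})

# Each category becomes its own 0/1 indicator column
encoded = pd.get_dummies(df, columns=["cp"], prefix="cp")
print(list(encoded.columns))  # ['cp_0', 'cp_1', 'cp_2']
```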
In the training part, the algorithms mentioned above are implemented; this helps in
finding a better set of outputs. The training is done on the basis of the dataset input to the
system. The efficiency of the system can improve with every instance, with the number of
times the model is trained, the number of iterations, etc. The whole dataset provided, which
consists of 14 attributes and 303 rows, will help the model undergo training. Training can
also be implemented by splitting the data into equal, required data partitions. In the user-
interactive GUI, when the user selects the train-network option after entering his data, at the
backend the .csv file of the heart disease dataset is read and normalization is carried out so as
to classify the data into classes, which becomes easier. To generate a network, the train()
function is implemented so as to pass the inputs; this network is then stored.
3.4 ALGORITHMS
3.4.1 Genetic Algorithm
The genetic algorithm is a method for solving both constrained and unconstrained
optimization problems that is based on natural selection, the process that drives biological
evolution. The genetic algorithm repeatedly modifies a population of individual solutions.
At each step, the genetic algorithm selects individuals from the current population to be
parents and uses them to produce the children for the next generation. Over successive
generations, the population "evolves" toward an optimal solution. You can apply the
genetic algorithm to solve a variety of optimization problems that are not well suited for
standard optimization algorithms, including problems in which the objective function is
discontinuous, nondifferentiable, stochastic, or highly nonlinear. The genetic algorithm can
address problems of mixed integer programming, where some components are restricted to
be integer-valued. It deals with a population of individual input strings. First it selects input
strings and assigns each a fitness value. Based on those fitness values, new offspring are
generated. Then, through the crossover process, it generates a possibly fitter string so as to
obtain optimized weights. The new string generated at each stage is possibly better than the
previous one.
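The select / fitness / crossover / mutate cycle described above can be sketched as follows. The fitness here is a toy OneMax objective (count of 1-bits); in the project, a 13-bit string would instead mark which attributes are selected, and fitness would come from the classifier's accuracy:

```python
import random

random.seed(1)

N_BITS, POP, GENS = 13, 20, 40   # string length, population size, generations

def fitness(ind):
    # Toy objective: number of 1-bits (stand-in for classifier accuracy)
    return sum(ind)

def select(pop):
    # Tournament selection: the fitter of two random individuals is a parent
    a, b = random.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2):
    # Single-point crossover combines two parent strings
    cut = random.randrange(1, N_BITS)
    return p1[:cut] + p2[cut:]

def mutate(ind, rate=0.05):
    # Flip each bit with a small probability
    return [bit ^ 1 if random.random() < rate else bit for bit in ind]

pop = [[random.randint(0, 1) for _ in range(N_BITS)] for _ in range(POP)]
for _ in range(GENS):
    pop = [mutate(crossover(select(pop), select(pop))) for _ in range(POP)]

best = max(pop, key=fitness)
print(fitness(best))
```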
3.4.2 Bat Algorithm
The Bat algorithm is a metaheuristic algorithm for global optimization. It was inspired
by the echolocation behaviour of microbats, with varying pulse rates of emission and
loudness. The BA is widely used in various optimization problems because of its excellent
performance, and it has the advantage of simplicity and flexibility.
(1) Bats use echolocation to sense prey, predators, or any barriers in the path, and distance.
(2) Bats fly with a velocity and position. They have a frequency f and a loudness to reach
their prey.
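A minimal sketch of the frequency, velocity and loudness mechanics on a toy objective (the sphere function) follows; in the project the objective would instead score a candidate against the heart disease dataset, and all constants here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Toy objective to minimize: sphere function
    return float(np.sum(x ** 2))

n, dim, iters = 15, 5, 100
f_lo, f_hi = 0.0, 2.0            # pulse frequency range
loudness, pulse_rate = 0.9, 0.5

pos = rng.uniform(-5, 5, (n, dim))
vel = np.zeros((n, dim))
best = pos[np.argmin([f(p) for p in pos])].copy()

for _ in range(iters):
    for i in range(n):
        freq = f_lo + (f_hi - f_lo) * rng.random()
        vel[i] += (pos[i] - best) * freq        # frequency-tuned velocity
        cand = pos[i] + vel[i]
        if rng.random() > pulse_rate:           # local walk around the best bat
            cand = best + 0.01 * rng.normal(size=dim)
        # Accept the move with probability tied to loudness, if it improves
        if f(cand) < f(pos[i]) and rng.random() < loudness:
            pos[i] = cand
        if f(pos[i]) < f(best):
            best = pos[i].copy()

print(f(best))
```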
3.4.3 Bee Algorithm
The Bee algorithm performs a main search cycle which is iterated for a given number T of
times, or until a solution of acceptable fitness is found. Each search cycle is composed of five
procedures: recruitment, local search, neighbourhood shrinking, site abandonment, and
global search.
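The five procedures named above can be sketched on a toy 2-D objective (minimizing x² + y²); the site count, recruit count, shrink factor and abandonment limit are illustrative assumptions, and in the project the fitness would come from the heart disease classifier:

```python
import random

random.seed(3)

def f(p):
    # Toy objective to minimize
    return sum(v * v for v in p)

def rand_point():
    # Global search: scout bees sample the whole space
    return [random.uniform(-5, 5) for _ in range(2)]

N_SITES, RECRUITS, T = 5, 4, 60
sites = [{"p": rand_point(), "ngh": 1.0, "stag": 0} for _ in range(N_SITES)]

for _ in range(T):
    for s in sites:
        # Recruitment + local search: recruits sample the site's neighbourhood
        cands = [[v + random.uniform(-s["ngh"], s["ngh"]) for v in s["p"]]
                 for _ in range(RECRUITS)]
        best_c = min(cands, key=f)
        if f(best_c) < f(s["p"]):
            s["p"], s["stag"] = best_c, 0
        else:
            s["ngh"] *= 0.8                    # neighbourhood shrinking
            s["stag"] += 1
        if s["stag"] > 10:                     # site abandonment -> global search
            s.update(p=rand_point(), ngh=1.0, stag=0)

best = min(sites, key=lambda s: f(s["p"]))
print(f(best["p"]))
```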
Chapter 4
SYSTEM ANALYSIS
4. SYSTEM ANALYSIS
4.1 Existing System
By using data mining techniques like the BAT, BEE and ACO algorithms, the performance
was evaluated in terms of accuracy, sensitivity and specificity; compared to other well-known
data sets, it was low.
4.2 Proposed System
In this project we want to detect heart disease from the dataset using the four bio-inspired
feature-optimizing algorithms: the Genetic Algorithm, Bat, Bee and ACO. The ACO
algorithm here is designed in Python to solve the Travelling Salesman Problem, i.e. to find
the shortest path, and it cannot be implemented with the heart disease dataset, so we
implement three algorithms: Genetic, Bat and Bee.
Finally, the performance was evaluated in terms of accuracy, sensitivity and specificity and
also compare to other well-known data sets, it has been observed that these results are one of
the best results compared with the results obtained from related previous studies.
The feasibility of the project is analyzed in this phase, and a business proposal is put forth with a very general plan for the project and some cost estimates. During system analysis, the feasibility study of the proposed system is carried out, to ensure that the proposed system is not a burden to the company. For feasibility analysis, some understanding of the major requirements for the system is essential.
ECONOMICAL FEASIBILITY
TECHNICAL FEASIBILITY
SOCIAL FEASIBILITY
This study is carried out to check the economic impact that the system will have on the organization. The amount of funds that the company can pour into the research and development of the system is limited, and the expenditures must be justified. The developed system is well within the budget, which was achieved because most of the technologies used are freely available. Only the customized products had to be purchased.
This study is carried out to check the technical feasibility, that is, the technical requirements of the system. Any system developed must not place a high demand on the available technical resources, as this would lead to high demands being placed on the client. The developed system must therefore have modest requirements, and only minimal or no changes are required for implementing this system.
This study checks the level of acceptance of the system by the user. This includes the process of training the user to use the system efficiently. The user must not feel threatened by the system, but must instead accept it as a necessity. The level of acceptance by the users depends on the methods employed to educate the user about the system and to make him familiar with it. His level of confidence must be raised so that he is able to offer constructive criticism, which is welcomed, as he is the final user of the system.
Processor : i3
Chapter 5
SYSTEM DESIGN
5. SYSTEM DESIGN
5.1 INTRODUCTION
The most creative and challenging phase of the life cycle is system design. The term design describes a final system and the process by which it is developed. It refers to the technical specifications that will be applied in implementing the candidate system. Design may be defined as "the process of applying various techniques and principles for the purpose of defining a device, a process or a system in sufficient detail to permit its physical realization". The design's goal is to determine how the output is to be produced and in what format; samples of the output and input are also presented. Second, input data and database files have to be designed to meet the requirements of the proposed output. The processing phase is handled through program construction and testing. Finally, details related to justification of the system and an estimate of the impact of the candidate system on the users and the organization are documented and evaluated by management as a step toward implementation. The importance of software design can be stated in a single word: "Quality". Design provides us with a representation of software that can be assessed for quality. Design is the only way that we can accurately translate a customer's requirements into a finished software product or system; without design we risk building an unstable system that might fail if small changes are made, that may be difficult to test, or whose quality cannot be assessed. So, it is an essential phase in the development of a software product.
➢ The goal is for UML to become a common language for creating models of object-oriented computer software. In its current form, UML comprises two major components: a meta-model and a notation. In the future, some form of method or process may also be added to, or associated with, UML.
➢ The UML represents a collection of best engineering practices that have proven successful in the modeling of large and complex systems.
➢ The UML is a very important part of developing object-oriented software and the software development process. The UML uses mostly graphical notations to express the design of software projects.
GOALS:
1. Provide users a ready-to-use, expressive visual modeling Language so that they can
develop and exchange meaningful models.
5. Encourage the growth of the OO tools market.
6. Support higher-level development concepts such as collaboration frameworks, patterns and components.
1. Things.
2. Relationships.
3. Diagrams.
Things are the abstractions that are first-class citizens in a model. There are four kinds of things in the UML.
1. Structural things.
2. Behavioral things.
3. Grouping things.
4. Annotational things.
These things are the basic object-oriented building blocks of the UML. You use them to write well-formed models.
A use case diagram shows a set of use cases and actors (a special kind of class) and their relationships. Use case diagrams address the static use case view of a system. These diagrams are especially important in organizing and modeling the behavior of a system; both sequence and collaboration diagrams are kinds of interaction diagram.
(Use case diagram: actor "client" with the "accuracy graph" use case.)
Class diagrams are the most common diagrams employed in UML. A class diagram consists of classes, interfaces, associations and collaborations. Class diagrams primarily represent the object-oriented view of a system, which is static in nature. An active class is used in a class diagram to represent the concurrency of the system. Fig 6.2(b) shows the class diagram for the heart disease prediction system.
A class diagram represents the object orientation of a system. Therefore, it is generally used for development purposes. This is the most widely used diagram at the time of system construction.
(Class diagram: "client system" and "accuracy graph" classes.)
Chapter 6
Software Testing is a critical element of software quality assurance and represents the ultimate review of specification, design and coding. Testing presents an interesting anomaly for the software engineer.
Testing is a process of executing a program with the intent of finding an error. A good test case is one that has a high probability of finding an as-yet-undiscovered error. A successful test is one that uncovers an undiscovered error.
Testing Principles
All tests should be traceable to end user requirements.
Testing should begin on a small scale and progress towards testing on a large scale.
A strategy for software testing integrates software test cases into a series of well-planned steps that result in the successful construction of software. Software testing is part of a broader topic referred to as Verification and Validation. Verification refers to the set of activities that ensure that the software correctly implements a specific function. Validation refers to the set of activities that ensure that the software that has been built is traceable to the customer's requirements.
The purpose of testing is to discover errors. Testing is the process of trying to discover every conceivable fault or weakness in a work product. It provides a way to check the functionality of components, sub-assemblies, assemblies and/or a finished product. It is the process of exercising software with the intent of ensuring that the software system meets its requirements and user expectations and does not fail in an unacceptable manner. There are various types of tests. Each test type addresses a specific testing requirement.
Unit testing involves the design of test cases that validate that the internal program logic is functioning properly, and that program inputs produce valid outputs. All decision branches and internal code flow should be validated. It is the testing of individual software units of the application, done after the completion of an individual unit and before integration. This is structural testing that relies on knowledge of the unit's construction and is invasive. Unit tests perform basic tests at the component level and test a specific business process, application, and/or system configuration. Unit tests ensure that each unique path of a business process performs accurately to the documented specifications and contains clearly defined inputs and expected results.
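As an illustration of these ideas, a minimal unit test in Python's unittest framework might look like this; the classify function is a hypothetical unit under test, not part of the project code:

```python
import unittest

# Hypothetical unit under test: label a patient record from a model score.
def classify(score, threshold=0.5):
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    return "disease" if score >= threshold else "no disease"

class TestClassify(unittest.TestCase):
    # One test per decision branch, each with defined inputs and
    # expected results, as described above.
    def test_positive_branch(self):
        self.assertEqual(classify(0.9), "disease")

    def test_negative_branch(self):
        self.assertEqual(classify(0.1), "no disease")

    def test_invalid_input(self):
        with self.assertRaises(ValueError):
            classify(1.5)

# Run with: python -m unittest <module_name>
```

Each test method exercises one path through the unit, so a failure points directly at the branch that broke.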
Integration tests are designed to test integrated software components to determine if they actually run as one program. Testing is event driven and is more concerned with the basic outcome of screens or fields. Integration tests demonstrate that although the components were individually satisfactory, as shown by successful unit testing, the combination of components is correct and consistent. Integration testing is specifically aimed at exposing the problems that arise from the combination of components.
Functional tests provide systematic demonstrations that functions tested are available as
specified by the business and technical requirements, system documentation, and user
manuals.
System testing ensures that the entire integrated software system meets requirements. It tests
a configuration to ensure known and predictable results. An example of system testing is the
configuration oriented system integration test. System testing is based on process descriptions
and flows, emphasizing pre-driven process links and integration points.
White Box Testing is testing in which the software tester has knowledge of the inner workings, structure and language of the software, or at least its purpose. It is used to test areas that cannot be reached from a black box level.
Unit testing is usually conducted as part of a combined code and unit test phase of the
software lifecycle, although it is not uncommon for coding and unit testing to be conducted as
two distinct phases.
Integration Testing
Software integration testing is the incremental integration testing of two or
more integrated software components on a single platform to produce failures caused by
interface defects.
The task of the integration test is to check that components or software applications, e.g.
components in a software system or – one step up – software applications at the company level
– interact without error.
Test Results
All the test cases mentioned above passed successfully. No defects encountered.
Acceptance Testing
User Acceptance Testing is a critical phase of any project and requires significant participation
by the end user. It also ensures that the system meets the functional requirements.
Test Results:
All the test cases mentioned above passed successfully. No defects encountered.
Chapter 7
Python drew inspiration from other programming languages like C, C++, Java, Perl, and Lisp. Python's developers try to avoid premature changes to the language, and they avoid small patches to unimportant parts of the CPython reference implementation that would offer only marginal speed gains. When speed is important, a Python programmer can move performance-critical parts of the program into extension modules written in languages like C, or run the program under PyPy, a just-in-time compiler. Cython takes a further approach: it translates a Python script into C and makes direct C-level API calls into the Python interpreter.
Keeping Python fun to use is an important goal of Python’s developers. It reflects in the
language's name, a tribute to the British comedy group Monty Python. On occasions, there
are playful approaches to tutorials and reference materials, such as referring to spam and eggs
instead of the standard foo and bar.
7.2 Features
Changing print so that it is a built-in function, not a statement. This made it easier to change a module to use a different print function, as well as making the syntax more regular. In Python 2.6 and 2.7, print is available as a built-in but is masked by the print statement syntax, which can be disabled by entering from __future__ import print_function.
Renaming the raw_input function to input: Python 3's input behaves like Python 2's raw_input, in that the input is always returned as a string rather than being evaluated as an expression.
Moving reduce into the functools module (the rationale being that code using reduce is less readable than code that uses a for loop and an accumulator variable).
Adding support for optional function annotations that can be used for informal type declarations or other purposes.
Unifying the str/unicode types, representing text, and introducing a separate immutable bytes type, and a mostly corresponding mutable bytearray type, both of which represent arrays of bytes.
Removing backward-compatibility features, including old-style classes, string exceptions, and implicit relative imports.
A change in integer division functionality: in Python 2, dividing two integers always returns an integer; for example, 5/2 is 2, whereas in Python 3, 5/2 is 2.5. (In both Python 2, from 2.2 onwards, and Python 3, a separate // operator provides the old behavior.)
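The print-function and division changes above can be demonstrated directly in Python 3:

```python
# print is a function, so it accepts keyword arguments like sep.
print("spam", "eggs", sep=", ")

# / performs true division; // keeps the old floor-division behavior.
assert 5 / 2 == 2.5
assert 5 // 2 == 2

# str holds text; bytes is the separate immutable binary type.
text = "heart"
data = text.encode("utf-8")
assert isinstance(data, bytes)
assert data.decode("utf-8") == text
```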
Python is meant to be an easily readable language. Its formatting is visually uncluttered and
often uses English keywords where other languages use punctuation. Unlike many other
languages, it does not use curly brackets to delimit blocks, and semicolons after statements
are allowed but rarely used. It has fewer syntactic exceptions and special cases than C or
Pascal.
Python was designed for readability, and has some similarities to the English
language with influence from mathematics.
Python uses new lines to complete a command, as opposed to other programming
languages which often use semicolons or parentheses.
Python relies on indentation, using whitespace, to define scope; such as the scope of
loops, functions and classes. Other programming languages often use curly-brackets
for this purpose.
7.3 Advantages
Class
Objects
Polymorphism
Encapsulation
Inheritance
a) Class
A class is a collection of objects. A class contains the blueprints or the prototype from which
the objects are being created. It is a logical entity that contains some attributes and methods.
To understand the need for creating a class let’s consider an example, let’s say you wanted to
track the number of dogs that may have different attributes like breed, age. If a list is used,
the first element could be the dog’s breed while the second element could represent its age.
Let's suppose there are 100 different dogs; how would you know which element is which? What if you wanted to add other properties to these dogs? This lacks organization, and it is exactly this need that classes address.
The syntax of a class definition is:
class ClassName:
    # Statement-1
    ...
    # Statement-N
b) Objects
The object is an entity that has a state and behavior associated with it. It may be any real-
world object like a mouse, keyboard, chair, table, pen, etc. Integers, strings, floating-point
numbers, even arrays, and dictionaries, are all objects. More specifically, any single integer
or any single string is an object. The number 12 is an object, the string “Hello, world” is an
object, a list is an object that can hold other objects, and so on. You’ve been using objects all
along and may not even realize it.
An object consists of
State: It is represented by the attributes of an object. It also reflects the properties of an
object.
Behavior: It is represented by the methods of an object. It also reflects the response of
an object to other objects.
Identity: It gives a unique name to an object and enables one object to interact with other
objects.
To understand the state, behavior, and identity let us take the example of the class dog
(explained above).
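The dog example can be sketched as a small class; the attribute and method names are illustrative:

```python
class Dog:
    # Class attribute shared by every instance.
    species = "Canis familiaris"

    def __init__(self, name, breed, age):
        # State: attributes held by each object.
        self.name = name
        self.breed = breed
        self.age = age

    def bark(self):
        # Behavior: a method the object responds with.
        return f"{self.name} says woof"

# Identity: each object is a distinct instance with its own state.
rodger = Dog("Rodger", "Pug", 2)
tommy = Dog("Tommy", "Bulldog", 4)

print(rodger.bark())   # Rodger says woof
print(tommy.breed)     # Bulldog
```

Here rodger and tommy share the same blueprint but hold different state, which is exactly the organization that the flat list of attributes lacked.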
e) Encapsulation
Data Types
a) Strings
A string is a sequence of characters. It can be declared in Python by using single or double quotes. Strings are immutable, i.e., they cannot be changed.
b) Lists
Lists are one of the most powerful tools in Python. They are just like the arrays declared in other languages, but with the powerful difference that a list need not always be homogeneous: a single list can contain strings, integers, as well as objects. Lists can also be used for implementing stacks and queues. Lists are mutable, i.e., they can be altered once declared.
c) Tuples
A tuple is a sequence of immutable Python objects. Tuples are just like lists with the
exception that tuples cannot be changed once declared. Tuples are usually faster than lists.
d) Iterations
Iterations or looping can be performed in Python by 'for' and 'while' loops. Apart from iterating upon a particular condition, we can also iterate over strings, lists, and tuples.
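The four data types above can be exercised in a few lines:

```python
# Strings are immutable sequences of characters.
s = "heart"
assert s.upper() == "HEART"

# Lists are mutable and may hold mixed types.
items = ["age", 52, 3.5]
items.append("chol")
assert items == ["age", 52, 3.5, "chol"]

# Tuples are like lists but immutable once declared.
point = (120, 80)
try:
    point[0] = 130
except TypeError:
    pass  # tuples cannot be changed

# Iteration works over strings, lists, and tuples alike.
total = 0
for value in (1, 2, 3):
    total += value
assert total == 6
```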
7.5 Operators
Python operators in general are used to perform operations on values and variables. These are standard symbols used for the purpose of logical and arithmetic operations. In this section, we look at the different types of Python operators.
a) Arithmetic Operators
Arithmetic operators are used to perform mathematical operations like addition, subtraction, multiplication, and division.
b) Comparison Operators
<= (less than or equal to): True if the left operand is less than or equal to the right, e.g. x <= y.
c) Logical Operators
Logical operators perform Logical AND, Logical OR, and Logical NOT operations. It is
used to combine conditional statements.
d) Bitwise Operators
Bitwise operators act on bits and perform the bit-by-bit operations. These are used to operate
on binary numbers.
e) Assignment Operators
f) Identity Operators
is and is not are the identity operators; both are used to check whether two values are located in the same part of memory. Two variables being equal does not imply that they are identical.
is — True if the operands are identical
is not — True if the operands are not identical
g) Membership Operators
in and not in are the membership operators, used to test whether a value or variable is in a sequence.
in — True if the value is found in the sequence
not in — True if the value is not found in the sequence
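A short session exercising these operator families (the values are arbitrary):

```python
a, b = 10, 3

# Arithmetic and comparison operators.
assert a + b == 13 and a // b == 3 and a % b == 1
assert b <= a

# Logical operators combine conditions.
assert (a > 5) and not (b > 5)

# Bitwise operators act bit by bit: 10 is 0b1010, 3 is 0b0011.
assert a & b == 2 and a | b == 11

# Identity: equal values need not be the same object.
x = [1, 2]
y = [1, 2]
assert x == y
assert x is not y

# Membership tests on sequences.
assert 2 in x
assert 9 not in x
```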
7.6 RESULT AND DISCUSSIONS
This section present the results of the proposed method of heart disease prediction using Bio-
Inspired Algorithms. we applied machine learning algorithms on heart disease dataset to
predict heart disease, based on the data of each attribute for each patient. Our goal was to
compare different classification models and define the most efficient one. For the comparison
of the dataset, performance metrics after feature selection, parameter tuning and calibration
are used because this is a standard process of evaluating algorithms. We build a model with
two categories as training and testing set in the machine learning such that 70% of training set
and 30% of testing set involved in the proposed work. Here we implemented the bio-inspired
algorithms that are Genetic Algorithm, Bat algorithm, Bee algorithm. All this algorithms are
well-suited for this system and The highest accuracy is given by Genetic algorithm.
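The 70%/30% split mentioned above can be sketched with the standard library alone; the synthetic records below are a hypothetical stand-in for the real heart disease dataset:

```python
import random

# Hypothetical records: 13 attribute values plus a class label each.
random.seed(42)
records = [[random.random() for _ in range(13)] + [random.randint(0, 1)]
           for _ in range(100)]

# Shuffle, then split 70% for training and 30% for testing.
random.shuffle(records)
cut = int(len(records) * 0.7)
train, test = records[:cut], records[cut:]

assert len(train) == 70 and len(test) == 30
```

Shuffling before the cut ensures that any ordering in the source file (e.g. patients grouped by class) does not bias either set.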
To run this project double click on ‘run.bat’ file to get below screen
In the above screen, click on the 'Upload Heart Disease' button and upload the heart disease dataset. See the screen below.
In the above screen we are uploading the dataset file; after uploading, you will get the screen below.
Now click on the 'Run Genetic Algorithm' button to run the genetic algorithm on the dataset and obtain its accuracy details. While this algorithm runs, you can watch the black console to follow the feature selection process; it will open some empty windows, which you can simply close, keeping only the current window.
In the above screen, for GA we got 100% accuracy, precision and recall. Now click on the 'Run Bat' algorithm button; its accuracy is 51%.
Now click on the 'Upload & Predict Test Data' button to upload test data and predict its class.
In the above screen we are uploading a test file which contains test data without class labels; after uploading the test data, you will get the screen below.
In the above screen the application has predicted the disease stages. Now click on the 'Accuracy Graph' button to view the accuracy of all algorithms in graph format.
7.8.1 Correlation Matrix: The correlation matrix in machine learning is used for feature
selection. It represents dependency between various attributes.
Recall: It is the ratio of correct positive results to the total number of positive results predicted by the system.
F1-Score: It is the harmonic mean of Precision and Recall. It measures the test accuracy. The range of this metric is 0 to 1.
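These metrics follow directly from confusion-matrix counts; the counts below are hypothetical:

```python
# Hypothetical confusion-matrix counts.
tp, fp, fn = 40, 10, 5   # true positives, false positives, false negatives

precision = tp / (tp + fp)              # correct positives / predicted positives
recall = tp / (tp + fn)                 # correct positives / actual positives
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

assert abs(precision - 0.8) < 1e-9
assert 0.0 <= f1 <= 1.0
```

As a harmonic mean, the F1-score always lies between precision and recall and is pulled toward the smaller of the two.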
7.8.2 Data Analysis
The histogram below shows the distribution of the dataset over the different attributes, which helps to predict the required output.
The highest accuracy is given by the Genetic algorithm (100%), and the remaining algorithms got lower accuracy: Bat algorithm (51.61%), Bee algorithm (54.83%). The results are shown below.
The figure below shows the accuracy graph for the Genetic algorithm, Bat algorithm and Bee algorithm:
Chapter 8
CONCLUSION
8. CONCLUSION
In the proposed heart disease prediction system, bio-inspired algorithms work well for this model. The bio-inspired family considered here has four feature-optimizing algorithms: the Genetic Algorithm, Bat algorithm, Bee algorithm and ACO algorithm, of which we implement three: the Genetic, Bat and Bee algorithms. We analyse and predict whether a patient has heart disease or no disease, and the stage of disease, using the bio-inspired algorithms. After a comparative analysis of the various machine learning models, we can conclude that the Genetic algorithm is the best approach for predicting heart disease. Among all the algorithms used, the Genetic algorithm has the highest accuracy value, about 100%. Hence, we conclude that the Genetic algorithm is the most efficient model among all the algorithms used. Based on these results, it can be shown that the proposed system gives good performance in the category of optimization, and our project provides an easy and efficient system for heart disease prediction.
Chapter 9
FUTURE ENHANCEMENT
9. FUTURE ENHANCEMENT
Chapter 10
BIBLIOGRAPHY
10. BIBLIOGRAPHY
8. H. Kahramanli and N. Allahverdi, "Design of a hybrid system for the diabetes and heart diseases," Expert Systems with Applications, vol. 35, no. 1-2, pp. 82-89, 2008.
9. S. Palaniappan and R. Awang, "Intelligent heart disease prediction system using data mining techniques," in Proceedings of the IEEE/ACS International Conference on Computer Systems and Applications (AICCSA 2008), pp. 108-115, Doha, Qatar, March-April 2008.
10. E. O. Olaniyi and O. K. Oyedotun, "Heart diseases diagnosis using neural networks arbitration," International Journal of Intelligent Systems and Applications, vol. 7, no. 12, pp. 75-82, 2015.
11. R. Das, I. Turkoglu, and A. Sengur, "Effective diagnosis of heart disease through neural networks ensembles," Expert Systems with Applications, vol. 36, no. 4, pp. 7675-7680, 2009.
12. M. A. Jabbar, B. L. Deekshatulu, and P. Chandra, "Classification of heart disease using artificial neural network and feature subset selection," Global Journal of Computer Science and Technology Neural & Artificial Intelligence, vol. 13, no. 11, 2013.
13. Rajesh Tiwari, Manisha Sharma, Kamal K. Mehta and Mohan Awasthy, "Dynamic Load Distribution to Improve Speedup of Multi-core System using MPI with Virtualization," International Journal of Advanced Science and Technology, vol. 29, issue 12s, pp. 931-940, 2020, ISSN: 2005-4238.
14. T. Nagamani, S. Logeswari, B. Gomathy, "Heart Disease Prediction using Data Mining with MapReduce Algorithm," International Journal of Innovative Technology and Exploring Engineering (IJITEE), vol. 8, issue 3, January 2019, ISSN: 2278-3075.
15. Fahd Saleh Alotaibi, "Implementation of Machine Learning Model to Predict Heart Failure Disease," (IJACSA) International Journal of Advanced Computer Science and Applications, vol. 10, no. 6, 2019.
16. Anjan Nikhil Repaka, Sai Deepak Ravikanti, Ramya G Franklin, "Design and Implementation of Heart Disease Prediction Using Naive Bayesian," International Conference on Trends in Electronics and Informatics (ICOEI 2019).
17. Theresa Princy R, J. Thomas, "Human Heart Disease Prediction System using Data Mining Techniques," International Conference on Circuit, Power and Computing Technologies, Bangalore, 2016.
28. Prajakta Ghadge, Vrushali Girme, Kajal Kokane, and Prajakta Deshmukh, "Intelligent Heart Attack Prediction System Using Big Data," International Journal of Recent Research in Mathematics, Computer Science and Information Technology, vol. 2, issue 2, pp. 73-77, October 2015-March 2016.
29. Asha Rajkumar and Mrs G. Sophia Reena, "Diagnosis of Heart Disease using Data Mining Algorithms," Global Journal of Computer Science and Technology, vol. 10, issue 10, pp. 38-43, September 2010.
30. Purusothaman and P. Krishnakumari, "A Survey of Data Mining Techniques on Risk Prediction: Heart Disease," Indian Journal of Science and Technology, vol. 8(12), DOI: 10.17485/ijst/2015/v8i12/58385, pp. 1-5, June 2015.